CloudFormation for SageMaker instance

Amazon SageMaker helps data scientists and machine learning developers build, train and deploy machine learning models. It includes Jupyter notebooks for building and training models, as well as the SageMaker API for training and deploying models with a few lines of code.

AWS CloudFormation provisions AWS resources from code. You describe the resources in a YAML or JSON template, and CloudFormation automates provisioning and configuring them.

In this post, we look at how to create a SageMaker notebook using AWS CloudFormation and additionally do the following:

  1. Add a lifecycle configuration to the SageMaker notebook so that the notebook shuts down if it's idle for more than an hour.
  2. Attach a default repository and an additional GitHub repository to the notebook so that both are available inside the notebook.
  3. Create an IAM role for the notebook.
  4. Create an S3 bucket that the SageMaker notebook can access.
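
As a rough orientation, the steps above map onto five resources in a single template. The outline below is only a sketch of how the pieces fit together: it uses the logical IDs from the sections that follow, omits every Properties block, and is not deployable as written.

```yaml
# Outline only: Properties are omitted here and are filled in by the sections below.
Resources:
  notebookS3Bucket:                      # S3 bucket the notebook can access
    Type: AWS::S3::Bucket
  defaultRepository:                     # default GitHub repository
    Type: AWS::SageMaker::CodeRepository
  BasicNotebookInstanceLifecycleConfig:  # stops the notebook when it is idle
    Type: AWS::SageMaker::NotebookInstanceLifecycleConfig
  ExecutionRole:                         # IAM role, refers to notebookS3Bucket
    Type: AWS::IAM::Role
  sageMakerNotebook:                     # refers to the repository, lifecycle config and role
    Type: AWS::SageMaker::NotebookInstance
```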

Let's get started! The first step is to create the S3 bucket.

CloudFormation for S3 bucket

We first create the S3 bucket. S3 bucket names must be globally unique, so it's good practice to include the account ID in the bucket name to make a collision unlikely.

Resources:
  notebookS3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub 'ac${AWS::AccountId}-sagemaker-bucket'

CloudFormation for GitHub Repository

The default repository in our case is hosted on GitHub. We store its credentials in AWS Secrets Manager and reference the secret by its ARN.

  defaultRepository:
    Type: AWS::SageMaker::CodeRepository
    Properties: 
      CodeRepositoryName: defaultRepo
      GitConfig:
        RepositoryUrl:  https://github.com/MithilShah/aws-examples
        SecretArn: arn:aws:secretsmanager:us-east-1:xxxxxxxx:secret:github-pXnnSg
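
The ARN above is hard-coded, which ties the template to one account and Region. A minimal sketch of one alternative, assuming a template parameter (the name GitHubSecretArn is hypothetical and not part of the original template), passes the ARN in at deploy time instead. According to the CloudFormation documentation for GitConfig, the secret is expected to hold a JSON value of the form {"username": ..., "password": ...}.

```yaml
Parameters:
  GitHubSecretArn:              # hypothetical parameter name, not in the original template
    Type: String
    Description: ARN of the Secrets Manager secret that holds the GitHub credentials

Resources:
  defaultRepository:
    Type: AWS::SageMaker::CodeRepository
    Properties:
      CodeRepositoryName: defaultRepo
      GitConfig:
        RepositoryUrl: https://github.com/MithilShah/aws-examples
        # Resolved from the parameter instead of a literal ARN
        SecretArn: !Ref GitHubSecretArn
```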

CloudFormation for Lifecycle Configuration

  BasicNotebookInstanceLifecycleConfig:
    Type: "AWS::SageMaker::NotebookInstanceLifecycleConfig"
    Properties:
      NotebookInstanceLifecycleConfigName: stopidle
      OnStart:
      - Content:
          Fn::Base64: |
            #!/bin/bash
            
            set -e
            
            # OVERVIEW
            # This script stops a SageMaker notebook once it's idle for more than 1 hour (default time)
            # You can change the idle time for stop using the environment variable below.
            # If you want the notebook to stop only if no browsers are open, remove the --ignore-connections flag
            #
            # Note that this script will fail if either condition is not met
            #   1. Ensure the Notebook Instance has internet connectivity to fetch the example config
            #   2. Ensure the Notebook Instance execution role has permissions for sagemaker:StopNotebookInstance to stop the notebook
            #       and sagemaker:DescribeNotebookInstance to describe the notebook.
            #
            
            # PARAMETERS
            IDLE_TIME=3600
            
            echo "Fetching the autostop script"
            wget https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/master/scripts/auto-stop-idle/autostop.py
            
            echo "Starting the SageMaker autostop script in cron"
            
            (crontab -l 2>/dev/null; echo "*/5 * * * * /usr/bin/python $PWD/autostop.py --time $IDLE_TIME --ignore-connections") | crontab -
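
If the idle time should be configurable per stack, one possible variation is to feed it in through a template parameter. The sketch below assumes a parameter named IdleTimeSeconds, which is hypothetical and not part of the original template, and shortens the script to its essential lines. It relies on the fact that !Sub only substitutes ${...} references, so the plain $PWD and $IDLE_TIME shell variables reach bash untouched.

```yaml
Parameters:
  IdleTimeSeconds:              # hypothetical parameter, not in the original template
    Type: Number
    Default: 3600

Resources:
  BasicNotebookInstanceLifecycleConfig:
    Type: AWS::SageMaker::NotebookInstanceLifecycleConfig
    Properties:
      NotebookInstanceLifecycleConfigName: stopidle
      OnStart:
      - Content:
          # !Sub replaces only ${IdleTimeSeconds}; $PWD and $IDLE_TIME
          # are left for bash to expand when the script runs.
          Fn::Base64: !Sub |
            #!/bin/bash
            set -e
            IDLE_TIME=${IdleTimeSeconds}
            wget https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/master/scripts/auto-stop-idle/autostop.py
            (crontab -l 2>/dev/null; echo "*/5 * * * * /usr/bin/python $PWD/autostop.py --time $IDLE_TIME --ignore-connections") | crontab -
```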
 

CloudFormation for SageMaker Execution Role

  ExecutionRole: 
    Type: "AWS::IAM::Role"
    Properties: 
      RoleName: "sagemaker-notebook-role"
      AssumeRolePolicyDocument: 
        Version: "2012-10-17"
        Statement: 
        - 
          Effect: "Allow"
          Principal: 
            Service: 
              - "sagemaker.amazonaws.com"
          Action: 
          - "sts:AssumeRole"
      Path: "/"
      Policies: 
        - 
          PolicyName: "sagemaker-notebook-policy"
          PolicyDocument: 
            Version: "2012-10-17"
            Statement: 
              - Effect: "Allow"
                Action: 
                  - "cloudwatch:PutMetricData"
                  - "logs:CreateLogStream"
                  - "logs:PutLogEvents"
                  - "logs:CreateLogGroup"
                  - "logs:DescribeLogStreams"
                Resource: "*"
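              - Effect: "Allow"
                Action: 
                  - "s3:GetObject"
                  - "s3:PutObject"
                  - "s3:ListBucket"
                Resource: 
                # s3:ListBucket applies to the bucket ARN
                - !GetAtt notebookS3Bucket.Arn
                # s3:GetObject and s3:PutObject apply to the object ARNs
                - !Sub "${notebookS3Bucket.Arn}/*"
              # Required by the autostop script in the lifecycle configuration
              - Effect: "Allow"
                Action: 
                  - "sagemaker:DescribeNotebookInstance"
                  - "sagemaker:StopNotebookInstance"
                Resource: !Sub "arn:aws:sagemaker:${AWS::Region}:${AWS::AccountId}:notebook-instance/sample-notebook"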

CloudFormation for SageMaker Notebook

  sageMakerNotebook:
    Type: AWS::SageMaker::NotebookInstance
    Properties: 
      AdditionalCodeRepositories: 
      - https://github.com/synthetichealth/synthea.git
      DefaultCodeRepository: !GetAtt defaultRepository.CodeRepositoryName
      DirectInternetAccess: Enabled
      InstanceType: ml.t3.medium
      NotebookInstanceName: sample-notebook
      RoleArn: !GetAtt ExecutionRole.Arn
      VolumeSizeInGB: 10
      LifecycleConfigName: !GetAtt BasicNotebookInstanceLifecycleConfig.NotebookInstanceLifecycleConfigName
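
To make the created resources easy to find after deployment, an Outputs section can be added at the top level of the template. This is a small sketch rather than part of the original template: the output names NotebookName and BucketName are hypothetical, while the logical IDs are the ones used above.

```yaml
Outputs:
  NotebookName:                 # hypothetical output name
    Description: Name of the SageMaker notebook instance
    Value: !GetAtt sageMakerNotebook.NotebookInstanceName
  BucketName:                   # hypothetical output name
    Description: Name of the S3 bucket the notebook can access
    Value: !Ref notebookS3Bucket
```

Because the IAM role sets an explicit RoleName, deploying the template requires acknowledging the CAPABILITY_NAMED_IAM capability, for example with aws cloudformation deploy --template-file SageMakerNotebook.yaml --stack-name <stack-name> --capabilities CAPABILITY_NAMED_IAM.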

That’s it! The complete YAML file can be found at https://github.com/MithilShah/aws-examples/blob/master/CloudFormation/SageMakerNotebook.yaml
