This architecture describes how to: (1) preprocess an image (.jpg) dataset into the recommended RecordIO input format for image classification, (2) train and evaluate an MXNet binary image classification model using SageMaker, and (3) register the trained model in SageMaker Model Registry. This pattern also demonstrates how these ML workflow steps can be defined and automated using SageMaker Pipelines.
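The RecordIO conversion in step (1) typically starts from a tab-separated listing file (`.lst`) in the layout used by MXNet's `im2rec` tool: an integer index, a numeric class label, and a relative image path on each line. The following is a minimal sketch of building such a listing, not the repository's preprocessing script; the function name and the label mapping are illustrative assumptions.

```python
import os


def build_lst_lines(root_dir, class_to_label):
    """Return .lst lines of the form: index<TAB>label<TAB>relative/path.jpg"""
    lines = []
    index = 0
    # Sort class names and file names so the listing is reproducible.
    for class_name in sorted(class_to_label):
        class_dir = os.path.join(root_dir, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            if not file_name.lower().endswith((".jpg", ".jpeg")):
                continue  # skip anything that is not a JPEG image
            rel_path = f"{class_name}/{file_name}"
            lines.append(f"{index}\t{class_to_label[class_name]}\t{rel_path}")
            index += 1
    return lines


# Example (local folder name is a placeholder):
# lines = build_lst_lines("ImageData", {"Pizza": 1, "NotPizza": 0})
# with open("train.lst", "w") as f:
#     f.write("\n".join(lines) + "\n")
```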
- An active AWS account
- Download the Pizza or Not Pizza? public dataset
- Note: For this pattern, you will build a binary image classification model that detects whether an input image contains pizza. However, you can adapt this pattern to any image dataset that has two distinct classes (e.g., cat vs. dog).
- An Amazon Simple Storage Service (Amazon S3) bucket to store the image (.jpg) dataset
- Access to create and configure an Amazon SageMaker Domain and User Profile. For more information about this, see Onboard to Amazon SageMaker Domain in the Amazon SageMaker documentation
- Access to Amazon SageMaker Studio
- An understanding of Amazon SageMaker notebooks and Jupyter notebooks
- An understanding of how to create an AWS Identity and Access Management (IAM) role with basic SageMaker role permissions and S3 bucket access permissions
- Familiarity with Python
- Familiarity with common ML terms and concepts such as “binary classification”, “preprocessing”, “hyperparameters”, etc. For more information about this, see Machine Learning Concepts in the Amazon Machine Learning documentation
- To save processing time and cut costs, only a subset (1000 images) of the Pizza or Not Pizza? dataset is used to build the image classification model. You can choose to use more (or fewer) images or another dataset entirely (as mentioned above).
- Certain hyperparameters in the model training step are hard-coded (manually set). These are specified in the `image-classification-sagemaker-pipelines.ipynb` Jupyter notebook. For more information about this, see Image Classification Hyperparameters in the Amazon SageMaker documentation.
- You can extend the existing image classification ML workflow by adding steps (e.g., a model tuning step) as needed. For more information about this, see Pipeline Steps in the Amazon SageMaker documentation.
After registering the trained model to SageMaker Model Registry, you can choose to deploy the model to a SageMaker endpoint for real-time inference. For more information about this, see Deploy a Model from the Registry in the Amazon SageMaker documentation.
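As an illustration of the hard-coded hyperparameters mentioned above, the sketch below builds a hyperparameter dictionary for the SageMaker built-in Image Classification algorithm. The keys are documented hyperparameter names, but every value here is a placeholder assumption, not the value set in the notebook; the 80/20 train/validation split is also assumed.

```python
def build_hyperparameters(num_training_samples, epochs=5):
    """Illustrative hyperparameters for the built-in Image Classification algorithm."""
    return {
        "num_classes": 2,                  # two classes: pizza vs. not pizza
        "num_training_samples": num_training_samples,
        "image_shape": "3,224,224",        # channels,height,width
        "num_layers": 18,                  # ResNet depth (placeholder choice)
        "use_pretrained_model": 1,         # start from pretrained weights
        "mini_batch_size": 32,
        "epochs": epochs,
        "learning_rate": 0.001,
    }


# Assumption: 800 of the 1000 images are used for training.
hyperparameters = build_hyperparameters(num_training_samples=800)
```

Keeping these values in one function makes it easier to replace them later with pipeline parameters or a tuning step.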
- Amazon SageMaker — SageMaker is a fully managed ML service
- Amazon SageMaker Pipelines — SageMaker Pipelines help create, automate, and manage end-to-end ML workflows at scale
- Amazon SageMaker Model Registry — SageMaker Model Registry helps centrally catalog and manage trained ML models
- Amazon Simple Storage Service (Amazon S3) — Amazon S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance.
- Python — Python is a programming language.
- Create a new S3 bucket with default settings via the Amazon S3 console
- Create a new folder named “ImageData” within the newly created S3 bucket
- Within the “ImageData” folder, create two subfolders named “Pizza” and “NotPizza”.
- Download the Pizza or Not Pizza? dataset to your computer and unzip its contents
- You should see two subdirectories within the unzipped download: (1) “pizza” and (2) “not_pizza”
- Note: You may have to create a free account with Kaggle.com to access the dataset.
- Navigate to the “Pizza” folder in the S3 bucket and upload 500 randomly selected images from the “pizza” subdirectory from the locally downloaded dataset.
- Navigate to the “NotPizza” folder in the S3 bucket and upload 500 randomly selected images from the “not_pizza” subdirectory from the locally downloaded dataset.
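The random selection and upload described in the steps above can also be scripted. This is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the helper names, the bucket name, and the local directory paths are hypothetical placeholders.

```python
import random
from pathlib import Path


def sample_images(local_dir, n, seed=0):
    """Pick n random .jpg files from local_dir (reproducible via seed)."""
    images = sorted(Path(local_dir).glob("*.jpg"))
    if len(images) < n:
        raise ValueError(f"{local_dir} has only {len(images)} .jpg files")
    return random.Random(seed).sample(images, n)


def upload_class(bucket, local_dir, s3_folder, n=500):
    """Upload n sampled images to s3://<bucket>/ImageData/<s3_folder>/."""
    import boto3  # imported here so sample_images works without AWS access

    s3 = boto3.client("s3")
    for path in sample_images(local_dir, n):
        s3.upload_file(str(path), bucket, f"ImageData/{s3_folder}/{path.name}")


# Example (bucket name and local paths are placeholders):
# upload_class("my-image-bucket", "pizza_not_pizza/pizza", "Pizza")
# upload_class("my-image-bucket", "pizza_not_pizza/not_pizza", "NotPizza")
```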
- Create a new SageMaker Domain and User Profile via the Amazon SageMaker console
- Follow the instructions from Onboard to Amazon SageMaker Domain Using Quick setup from the Amazon SageMaker documentation.
- Note: When setting up the IAM role for the user profile, ensure that you give access to the Amazon S3 bucket you created earlier.
- Note: Ensure that the SageMaker Domain is created in the same AWS Region as the S3 bucket you created earlier
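To confirm that the bucket and the SageMaker Domain are in the same Region, you could compare the bucket's location with the Domain's Region. The sketch below assumes boto3 and configured credentials; the function names are illustrative. Note that the S3 `GetBucketLocation` API reports `us-east-1` as an empty (null) location constraint.

```python
def normalize_bucket_region(location_constraint):
    """Map the LocationConstraint from S3 GetBucketLocation to a Region name."""
    if location_constraint is None:
        return "us-east-1"  # S3 returns a null constraint for us-east-1
    if location_constraint == "EU":
        return "eu-west-1"  # legacy alias used by some older buckets
    return location_constraint


def bucket_matches_region(bucket, domain_region):
    """Return True if the bucket lives in the same Region as the Domain."""
    import boto3  # imported here so the helper above works without AWS access

    response = boto3.client("s3").get_bucket_location(Bucket=bucket)
    return normalize_bucket_region(response.get("LocationConstraint")) == domain_region


# Example (bucket name and Region are placeholders):
# bucket_matches_region("my-image-bucket", "us-east-1")
```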
- Launch the SageMaker Studio application via the User Profile
- Follow the instructions from Launch Studio Using the Amazon SageMaker Console from the Amazon SageMaker documentation
- Download the `image-classification-sagemaker-pipelines.ipynb` Jupyter notebook and `scripts` folder from this GitHub repository
- Upload the `image-classification-sagemaker-pipelines.ipynb` Jupyter notebook and `scripts` folder to the SageMaker Studio application
- Sequentially run the code cells from the `image-classification-sagemaker-pipelines.ipynb` Jupyter notebook within SageMaker Studio
- Note: Make sure to appropriately configure the `TODO` portions of the code as you run the code cells
- You can graphically monitor the pipeline execution in SageMaker Studio.
- Follow the instructions from View a Pipeline from the Amazon SageMaker documentation.
- After the pipeline is finished, you can view the registered model and associated metadata within SageMaker Studio.
- Follow the instructions from View the Details of a Model Version (Amazon SageMaker Studio) section of Amazon SageMaker documentation.
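Besides the graphical view in SageMaker Studio, the status of a pipeline run can be polled with the SageMaker `DescribePipelineExecution` API. The following is a sketch under stated assumptions: the function name, polling interval, and execution ARN are hypothetical, and the client is expected to be a boto3 SageMaker client.

```python
import time

# Statuses after which a pipeline execution no longer changes.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Stopped"}


def wait_for_execution(sm_client, execution_arn, poll_seconds=30, max_polls=120):
    """Poll DescribePipelineExecution until the run reaches a terminal status."""
    for _ in range(max_polls):
        response = sm_client.describe_pipeline_execution(
            PipelineExecutionArn=execution_arn
        )
        status = response["PipelineExecutionStatus"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"{execution_arn} still running after {max_polls} polls")


# Usage (requires AWS credentials; the ARN is a placeholder):
# import boto3
# status = wait_for_execution(boto3.client("sagemaker"), "<pipeline-execution-arn>")
```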
- Delete the S3 bucket with the image dataset and the default S3 bucket created by the SageMaker session
- For more information on this, follow the instructions from Deleting a bucket from the Amazon S3 documentation.
- Delete the default S3 bucket created by the SageMaker session
- Note: The default S3 bucket created by the SageMaker session has a name in the following format: "sagemaker-{region}-{aws-account-id}"
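If you prefer to script the bucket cleanup, the sketch below derives the default bucket name from the format in the note above and then empties and deletes a bucket. It assumes boto3, configured credentials, and a bucket without versioning (the default setting); the function names and example values are hypothetical. Deletion is irreversible, so double-check the bucket name first.

```python
def default_sagemaker_bucket(region, account_id):
    """Build the default bucket name: sagemaker-{region}-{aws-account-id}."""
    return f"sagemaker-{region}-{account_id}"


def empty_and_delete_bucket(bucket_name):
    """Delete every object in the bucket, then the bucket itself."""
    import boto3  # imported here so the naming helper works without AWS access

    bucket = boto3.resource("s3").Bucket(bucket_name)
    bucket.objects.all().delete()  # a bucket must be empty before it can be deleted
    bucket.delete()


# Example (Region and account ID are placeholders):
# empty_and_delete_bucket(default_sagemaker_bucket("us-east-1", "123456789012"))
```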
- Delete model group from SageMaker Model Registry
- Follow the instructions from Delete a Model Group from the Amazon SageMaker documentation.
- Note: The model group name should be “MXNet-Image-Classification”. This was previously defined in the `image-classification-sagemaker-pipelines.ipynb` Jupyter notebook.
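The model group can also be removed through the SageMaker API. A model group can only be deleted once it contains no model versions, so this sketch first deletes every model package in the group. It assumes a boto3 SageMaker client is passed in; the function name is illustrative, and the default group name is the one stated in the note above.

```python
def delete_model_group(sm_client, group_name="MXNet-Image-Classification"):
    """Delete all model package versions in the group, then the group itself."""
    # Collect every model package ARN first, following NextToken pagination.
    arns = []
    kwargs = {"ModelPackageGroupName": group_name}
    while True:
        page = sm_client.list_model_packages(**kwargs)
        arns.extend(s["ModelPackageArn"] for s in page["ModelPackageSummaryList"])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token

    # Remove each version, then the now-empty group.
    for arn in arns:
        sm_client.delete_model_package(ModelPackageName=arn)
    sm_client.delete_model_package_group(ModelPackageGroupName=group_name)
    return arns


# Usage (requires AWS credentials):
# import boto3
# delete_model_group(boto3.client("sagemaker"))
```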
- Delete SageMaker IAM execution role
- First, navigate to your SageMaker Domain via the Amazon SageMaker console. Next, choose the "Domain Settings" tab. Under "General settings," find the "Execution role" for your SageMaker Domain and copy its name (e.g., "AmazonSageMaker-ExecutionRole-XXXXX").
- Next, navigate to the AWS IAM console and delete the SageMaker Execution (IAM) role you just copied. For more information about this, refer to Deleting an IAM role (console) from the AWS IAM documentation.
- Delete SageMaker Domain
- Follow the instructions from Delete an Amazon SageMaker Domain (console) from the Amazon SageMaker documentation
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.
- Siddharth Kumaran -- Assoc. Machine Learning Engineer @ AWS Professional Services