An end-to-end example of a serverless machine learning pipeline for multiclass classification on AWS with SageMaker Pipelines, Data Wrangler, Athena and XGBoost. See this blog post for details.
- Node.js
- Python
- AWS CLI
Optional:
- MaxMind license key (free)
Before you proceed, set the MAXMIND_LICENSE_KEY environment variable to a valid license key. If it is not set, IP address lookup will be disabled.
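As a minimal sketch of this setup step, the snippet below defines a hypothetical helper, `maxmind_status` (not part of this repository), that reports whether the key is visible to the current shell before you run the install and deploy scripts. The placeholder key value is an assumption; substitute your own.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper; not part of this repository.
# Prints "enabled" if MAXMIND_LICENSE_KEY is set and non-empty, "disabled" otherwise.
maxmind_status() {
  if [ -n "${MAXMIND_LICENSE_KEY:-}" ]; then
    echo "enabled"
  else
    echo "disabled"
  fi
}

# To set the key for the current shell session (placeholder value), uncomment:
# export MAXMIND_LICENSE_KEY="your-license-key"

echo "IP address lookup: $(maxmind_status)"
```

Exporting the variable in the same shell session that runs the deployment keeps the key out of any committed file.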
Install all required dependencies with the command:
bash init.sh
The following command deploys all resources and launches an inference server:
bash deploy.sh
The deployed infrastructure is serverless and incurs no hourly costs when idle, except for the inference server, which costs $0.056 per hour. Consider shutting down the inference server when you don't need it (see npm run stop below). Shutting down the server does not remove any data.
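To put the hourly rate in perspective, here is a rough calculation of what the inference server costs if left running. The $0.056 rate is the one quoted above; the helper name `inference_cost` and the hour counts are illustrative assumptions, and actual AWS billing may differ.

```shell
#!/usr/bin/env bash
# Rough cost estimate for the inference server.
# HOURLY_RATE is the figure quoted in this README; the hour counts are assumptions.
HOURLY_RATE=0.056

# Prints the cost in USD (two decimals) for the given number of running hours.
inference_cost() {
  awk -v rate="$HOURLY_RATE" -v hours="$1" 'BEGIN { printf "%.2f\n", rate * hours }'
}

echo "8 hours: \$$(inference_cost 8)"
echo "30 days: \$$(inference_cost $((24 * 30)))"
```

Left running for a full 30 days, the server comes to roughly $40, which is why stopping it between experiments is worthwhile.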
All resources are deployed to the Oregon region (us-west-2) and are managed by three CloudFormation stacks.
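One way to confirm the deployment is to list the CloudFormation stacks in that region with the AWS CLI. This is a sketch that assumes the CLI is configured with credentials for the target account; the function name `list_stacks` is hypothetical, and the output will include any other stacks you have in us-west-2, not only the three created here.

```shell
#!/usr/bin/env bash
# Sketch: list successfully created or updated CloudFormation stacks in the deployment region.
# Assumes the AWS CLI is installed and configured; list_stacks is a hypothetical helper.
REGION="us-west-2"

list_stacks() {
  aws cloudformation list-stacks \
    --region "$REGION" \
    --stack-status-filter CREATE_COMPLETE UPDATE_COMPLETE \
    --query 'StackSummaries[].[StackName,StackStatus]' \
    --output table
}
```

Call `list_stacks` after the deployment finishes to see each stack name next to its status.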
To start the ML pipeline execution, run the command:
bash invoke.sh
It will return an AWS Console URL to the Step Functions pipeline that you can use to track the execution. Additionally, go to SageMaker and launch the Studio application to check the ML workflow progress.
Test data that will be deployed to S3.
AWS CDK project with infrastructure definition.
npm commands:
- npm run bootstrap: deploy the AWS CDK (required when deploying for the first time);
- npm run deploy: deploy the main infrastructure (no hourly costs);
- npm run runtime: deploy the runtime infrastructure (hourly costs incurred);
- npm run stop: delete the runtime infrastructure (the data will be retained).
SageMaker pipeline definitions and Python scripts.
Serverless.js project with Lambda API.
npm commands:
- npm run deploy: deploy the service.