End to end Machine Learning with Amazon SageMaker

Introduction

The Machine Learning process is an iterative process that consists of several steps:

  • Identifying a business problem and the related Machine Learning problem
  • Data ingestion, integration and preparation
  • Data visualization and analysis, feature engineering, model training and model evaluation
  • Model deployment, model monitoring and debugging

These steps are generally repeated multiple times to better meet business goals, for example following changes in the source data or a decrease in the performance of the model.

The process can be represented with the following diagram:

[Figure: ML Process]

After a model has been deployed, we might want to integrate it with our own application to provide insights to end users.

In this workshop we will go through the steps required to build a fully-fledged machine learning application on AWS. We will execute an iteration of the Machine Learning process to build, train and deploy a model using Amazon SageMaker, and then we will add inference capabilities to a demo application by deploying a REST API with Amazon API Gateway.

The final architecture will be:

[Figure: Architecture]

The Machine Learning task

We have been provided with a dataset (stored in an Amazon S3 bucket) containing data collected in a wind turbine plant, where each example includes several sensor measurements and a status indicating whether the plant was healthy or not.

โš ๏ธ Note: this is a synthetic dataset that oversimplifies the Predictive Maintenance task: however, it keeps this workshop easier to execute.

Our goal is to build a simple Machine Learning model that, given new sensor data, predicts whether the plant requires maintenance, so that maintenance can be performed before a breakdown happens (Predictive Maintenance).

Following is an excerpt from the dataset:

turbine_id  turbine_type  wind_speed  rpm_blade  oil_temperature  ...  breakdown
TID003      HAWT          85          78         36.0             ...  yes
TID009      HAWT          80          25         37.0             ...  no
TID005      HAWT          36          32         40.0             ...  no

The target variable is the breakdown attribute; since it is binary (yes/no), the task calls for a binary classification model.
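As a minimal sketch of this framing, assuming the data is read as CSV and restricted to the columns visible in the excerpt (the real dataset has more sensor columns, elided as "..."), the breakdown attribute can be mapped to 0/1 labels, which is the label encoding binary classifiers such as XGBoost work with. The names load_examples, NUMERIC_COLUMNS and LABEL_MAP are illustrative, not taken from the workshop code, and turbine_id and turbine_type are simply skipped here rather than engineered into features.

```python
import csv
import io

# Excerpt from the dataset, restricted to the columns shown above.
# The real dataset contains additional sensor columns.
EXCERPT = """turbine_id,turbine_type,wind_speed,rpm_blade,oil_temperature,breakdown
TID003,HAWT,85,78,36.0,yes
TID009,HAWT,80,25,37.0,no
TID005,HAWT,36,32,40.0,no
"""

# Hypothetical choice of feature columns and label encoding for illustration.
NUMERIC_COLUMNS = ["wind_speed", "rpm_blade", "oil_temperature"]
LABEL_MAP = {"yes": 1, "no": 0}


def load_examples(text):
    """Split CSV rows into numeric feature vectors and 0/1 breakdown labels."""
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(text)):
        features.append([float(row[col]) for col in NUMERIC_COLUMNS])
        labels.append(LABEL_MAP[row["breakdown"].strip().lower()])
    return features, labels


if __name__ == "__main__":
    X, y = load_examples(EXCERPT)
    print(X)
    print(y)
```

With only two classes, the model's output can be read as the probability that a turbine will break down, which is what the maintenance decision is based on.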

After building the model, we can host it and expose it as a REST API that responds to inference requests from client-side applications.
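From the client side, an inference request is just an HTTP POST to the API. The following Python sketch builds (but does not send) such a request with the standard library. The URL is a hypothetical placeholder, and sending a CSV line with only the columns visible in the excerpt is an assumption: the exact payload format is defined by the API and inference pipeline you deploy.

```python
import urllib.request

# Hypothetical API Gateway invoke URL; replace it with your deployed stage URL.
API_URL = "https://example.execute-api.eu-west-1.amazonaws.com/prod/predict"


def build_inference_request(api_url, row):
    """Serialize one example as a CSV line and wrap it in a POST request."""
    body = ",".join(str(value) for value in row).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


request = build_inference_request(API_URL, ["TID003", "HAWT", 85, 78, 36.0])
# Sending it requires a deployed API:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```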

Modules

This workshop consists of seven modules:

  • Module 01 - Open Amazon SageMaker Studio and clone the repository.
  • Module 02 - Using AWS Glue and Amazon Athena to execute data exploration, then performing data preprocessing and feature engineering using Amazon SageMaker Processing and SKLearn.
  • Module 03 - Training a binary classification model with the Amazon SageMaker open-source XGBoost container; the model predicts whether a wind turbine plant requires maintenance. Using Amazon SageMaker Debugger to monitor training progress with rules and to visualize training metrics such as accuracy and feature importance.
  • Module 04 - Deploying the feature engineering and ML models as a pipeline using Amazon SageMaker hosting (inference pipelines). Using Amazon SageMaker Model Monitor to track data drift violations against the training data baseline.
  • Module 05 - Building a REST API using Amazon API Gateway and implementing an AWS Lambda function that invokes the Amazon SageMaker endpoint for inference.
  • Module 06 - Using a single-page demo application to invoke the REST API and get inferences.
  • Module 07 - Using Amazon SageMaker Pipelines to orchestrate the model build workflow and store models in the model registry.
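The Lambda function in Module 05 sits between API Gateway and the SageMaker endpoint. A minimal sketch of such a function follows, assuming an API Gateway proxy integration that passes a CSV request body as a plain string, a hypothetical ENDPOINT_NAME environment variable, and the text/csv content type. The endpoint call is injected as a callable so the handler can be exercised without AWS access; this is an illustration, not the workshop's actual function.

```python
import json
import os

# Assumption: the endpoint name comes from a hypothetical ENDPOINT_NAME
# environment variable; the default below is only a placeholder.
ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "your-endpoint-name")


def make_handler(invoke_endpoint):
    """Return a Lambda handler that forwards a CSV request body to SageMaker.

    `invoke_endpoint` should behave like the invoke_endpoint method of a boto3
    "sagemaker-runtime" client; injecting it keeps the handler testable.
    """

    def handler(event, context):
        # Assumes an API Gateway proxy integration: the body is a plain string.
        payload = event["body"]
        response = invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="text/csv",
            Body=payload,
        )
        # The response Body is a stream; read and decode the prediction text.
        prediction = response["Body"].read().decode("utf-8").strip()
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"prediction": prediction}),
        }

    return handler
```

In Lambda, the handler could be wired up as lambda_handler = make_handler(boto3.client("sagemaker-runtime").invoke_endpoint). Error handling (for example, base64-encoded bodies or endpoint errors) is deliberately left out of this sketch.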

The modules must be completed in order, since the outputs of each module are inputs to the next one.

Getting started

This workshop has been designed assuming that each participant is using an AWS account that has been provided and pre-configured by the workshop instructor(s). However, you can also choose to use your own AWS account, but you'll have to execute some preliminary configuration steps as described here.

Once you are ready to go, please start with Module 01.

License

The contents of this workshop are licensed under the Apache 2.0 License.

Authors

Giuseppe A. Porcelli - Principal, ML Specialist Solutions Architect - Amazon Web Services EMEA
Antonio Duma - Sr. Startup Solutions Architect - Amazon Web Services EMEA
Hasan Poonawala - ML Specialist Solution Architect - Amazon Web Services EMEA
Nir Shney-Dor - Sr. Startup Solutions Architect - Amazon Web Services EMEA
