Getting Started with "Amazon SageMaker 101"

This repository accompanies a hands-on training event to introduce data scientists (and ML-ready developers / technical leaders) to core model training and deployment workflows with Amazon SageMaker.

Like a "101" course in the academic sense, this is unlikely to be the simplest introduction to SageMaker you can find, or the fastest way to get started with advanced features like optimized SageMaker Distributed training or SageMaker Clarify for bias and explainability analyses.

Instead, these exercises are chosen to demonstrate some core build/train/deploy patterns that we've found help new users to first get productive with SageMaker - and to later understand how the more advanced features fit in.

Agenda

An interactive walkthrough of the content with screenshots is available at:

https://sagemaker-101-workshop.workshop.aws/

Sessions in suggested order:

  1. builtin_algorithm_hpo_tabular: Explore some pre-built algorithms and tools for tabular data, including SageMaker Autopilot AutoML, the XGBoost built-in algorithm, and automatic hyperparameter tuning
  2. custom_script_demos: See how you can train and deploy your own models on SageMaker with custom Python scripts and the pre-built framework containers
    • (Optional) Start with sklearn_reg for an introduction if you're new to deep learning but familiar with Scikit-Learn
    • See huggingface_nlp (preferred) for a side-by-side comparison of in-notebook versus on-SageMaker model training and inference for text classification - or alternatively the custom CNN-based keras_nlp or pytorch_nlp examples.
  3. migration_challenge: Apply what you learned to port an in-notebook workflow to a SageMaker training job + endpoint deployment on your own
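As a rough illustration of the train-then-deploy pattern the custom_script_demos sessions cover, here is a minimal sketch assuming the version 2 interfaces of the SageMaker Python SDK and its Scikit-Learn framework container. The script name `train.py`, the instance types, the framework version, and both function names are assumptions chosen for illustration; they are not taken from the workshop notebooks.

```python
from typing import Any, Dict


def sklearn_estimator_kwargs(role_arn: str, entry_point: str = "train.py") -> Dict[str, Any]:
    """Collect the settings that describe the training job's infrastructure.

    All values here are illustrative assumptions, not workshop defaults.
    """
    return {
        "entry_point": entry_point,      # hypothetical training script
        "role": role_arn,                # IAM role the training job runs as
        "instance_type": "ml.m5.large",  # compute is sized per job, not per notebook
        "instance_count": 1,
        "framework_version": "1.2-1",    # assumed Scikit-Learn container version
        "py_version": "py3",
    }


def train_and_deploy(role_arn: str, train_s3_uri: str):
    """Run a training job on SageMaker, then deploy the model to an endpoint."""
    # Imported lazily so the helper above can be used without the SDK installed.
    from sagemaker.sklearn.estimator import SKLearn

    estimator = SKLearn(**sklearn_estimator_kwargs(role_arn))
    # Each key becomes a named input channel inside the training container.
    estimator.fit({"train": train_s3_uri})
    # Returns a predictor object bound to the newly created endpoint.
    return estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```

Calling `train_and_deploy` creates billable AWS resources, so remember to delete the endpoint when you are finished experimenting.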

Deploying in Your Own Account

The recommended way to explore these exercises is to onboard to SageMaker Studio. Once you've done this, you can download this repository by launching a System terminal (from the "Utilities and files" section of the launcher screen inside Studio) and running the following command: git clone https://github.com/aws-samples/sagemaker-101-workshop

If you prefer to use classic SageMaker Notebook Instances, you can find a CloudFormation template defining a simple setup at .simple.cf.yaml. This can be deployed via the AWS CloudFormation Console.

You can refer to the "How Are Amazon SageMaker Studio Notebooks Different from Notebook Instances?" docs page for more details on differences between the Studio and Notebook Instance environments.

Depending on your setup, you may be asked to choose a kernel when opening some notebooks. There should be guidance at the top of each notebook on suggested kernel types, but if you can't find any, Data Science 3.0 (Python 3) (on Studio) or conda_python3 (on Notebook Instances) are likely good options.
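If you are unsure which environment a notebook is running in, the fallback kernel choice above can be sketched as a small helper. This assumes that both environments expose a metadata file at `/opt/ml/metadata/resource-metadata.json` and that only Studio includes a `DomainId` field in it; treat both as assumptions to verify in your own setup. The function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

# Assumed location of the environment metadata file (verify in your own setup).
METADATA_PATH = Path("/opt/ml/metadata/resource-metadata.json")


def load_metadata(path: Path = METADATA_PATH) -> Optional[dict]:
    """Return the parsed metadata file, or None when it does not exist."""
    if not path.is_file():
        return None
    return json.loads(path.read_text())


def suggested_kernel(metadata: Optional[dict]) -> Optional[str]:
    """Map environment metadata to the fallback kernel named in this README."""
    if metadata is None:
        return None  # not running on SageMaker, so there is nothing to suggest
    if "DomainId" in metadata:  # assumption: only Studio reports a domain
        return "Data Science 3.0 (Python 3)"
    return "conda_python3"


if __name__ == "__main__":
    print(suggested_kernel(load_metadata()))
```

Guidance at the top of an individual notebook should still take priority over this fallback.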

Setting up widgets and code completion (JupyterLab extensions)

Some of the examples depend on ipywidgets and ipycanvas for interactive inference demo widgets (but do provide code-only alternatives).

We also usually enable some additional JupyterLab extensions powered by jupyterlab-lsp and jupyterlab-s3-browser to improve user experience. You can find more information about these extensions in this AWS ML blog post.

When we last tested, ipywidgets was available by default on SageMaker Studio but not on Notebook Instances. The other extensions require installation in both environments.
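To find out whether the widget packages are importable in your current kernel before running the interactive demos, a check along these lines may help. It is a minimal sketch using only the standard library, and the function name is hypothetical. Note that it only confirms the Python packages are installed; it cannot tell whether the matching JupyterLab front-end extensions are enabled.

```python
from importlib.util import find_spec
from typing import Dict, Iterable


def check_widget_packages(
    packages: Iterable[str] = ("ipywidgets", "ipycanvas"),
) -> Dict[str, bool]:
    """Map each top-level package name to whether this kernel can import it."""
    return {name: find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, available in check_widget_packages().items():
        print(f"{name}: {'available' if available else 'missing'}")
```

If a package is reported missing, the code-only alternatives in the notebooks remain an option.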

To see how we automate these extra setup steps for AWS-run events, you can refer to the lifecycle configuration scripts in our CloudFormation templates. For a Notebook Instance LCC, see the AWS::SageMaker::NotebookInstanceLifecycleConfig in .simple.cf.yaml. For a SageMaker Studio LCC, see the Custom::StudioLifecycleConfig in .infrastructure/template.sam.yaml.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

Further Reading

One major focus of this workshop is how SageMaker helps us right-size and segregate compute resources for different ML tasks, without sacrificing (and ideally accelerating!) data scientist productivity. For more information on this topic, see this post on the AWS Machine Learning Blog: Right-sizing resources and avoiding unnecessary costs in Amazon SageMaker.

For a workshop that starts with a similar migration-based approach, but dives further into automated pipelines and CI/CD, check out aws-samples/amazon-sagemaker-from-idea-to-production.

As you continue to explore Amazon SageMaker, you'll also find many more useful resources in:

More advanced users may also find it helpful to refer to:

Contributors

acere, amazon-auto, athewsey, bbonik, pedrojpaez, rosalieandico, tagekezo, tash-f, tom5610, yudho
