AWS Certified Machine Learning – Study Notes

These notes are written by a data scientist, so some basic topics may be glossed over.

Learning Path

  1. Linux Academy
  2. SageMaker FAQ
  3. Blog Posts
  4. Practice exams

Below is a high-level overview. More in-depth explanations are found in separate files.

1. Machine learning fundamentals

  • Machine learning lifecycle
  • Supervised vs Unsupervised vs Reinforcement learning
  • Optimisation
  • Regularisation (L1 Lasso & L2 Ridge)
  • Hyperparameters
  • Cross-validation
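
The L1 (Lasso) and L2 (Ridge) regularisation items above differ only in the penalty added to the loss. Below is a minimal sketch in plain Python, assuming a mean-squared-error loss and a single strength parameter `lam`; the function names are illustrative and do not come from any library.

```python
def l1_penalty(weights, lam):
    """Lasso penalty: lam * sum(|w|). Tends to push some weights to exactly zero."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """Ridge penalty: lam * sum(w^2). Shrinks all weights towards zero."""
    return lam * sum(w * w for w in weights)


def regularised_loss(errors, weights, lam, kind="l2"):
    """Mean squared error plus the chosen penalty (illustrative helper)."""
    mse = sum(e * e for e in errors) / len(errors)
    if kind == "l1":
        return mse + l1_penalty(weights, lam)
    return mse + l2_penalty(weights, lam)


# Hypothetical residuals and weights, for illustration only
example_loss = regularised_loss(errors=[1.0, -1.0], weights=[2.0], lam=0.5, kind="l2")
```

A larger `lam` penalises large weights more strongly, which trades a little training accuracy for a simpler model.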

2. Data

  • Feature selection
  • Feature engineering
  • Principal Component Analysis (PCA)
  • Missing and unbalanced data
  • Label encoding & one-hot encoding
  • Train-test splits & Randomisation
  • RecordIO format

3. Machine learning algorithms

  • Logistic Regression
  • Linear regression
  • SVM
  • Decision trees
  • Random Forest
  • K-means
  • KNN
  • Latent Dirichlet Allocation (LDA)

4. Deep learning

  • Neural Networks
  • Activation functions (sigmoid, tanh, ReLU)
  • Weights & biases
  • Forward & back propagation
  • Convolutional Neural Networks (CNN)
  • Filters
  • Transfer Learning
  • Recurrent Neural Networks (RNN)

5. Model evaluation

  • Sensitivity (Recall / TPR)
  • Specificity (TNR)
  • Precision
  • Accuracy
  • ROC / AUC
  • F1 Score
  • Gini impurity

6. Machine learning frameworks

  • PyTorch & scikit-learn
  • TensorFlow & Keras
  • MXNet & Gluon
  • Tensors & Graphs

7. AWS data and compute services

  • S3 Datalakes
  • Kinesis (Video Streams / Data Streams / Data Firehose / Data Analytics)
  • Glue
  • Athena
  • Elastic MapReduce (EMR) & Spark
  • EC2 instance types for ML

8. AWS AI and ML application services

  • Amazon Machine Learning service (deprecated)
  • Rekognition (images)
  • Rekognition (videos)
  • Polly (text2speech)
  • Transcribe (speech2text)
  • Translate
  • Comprehend
  • Lex (chatbots)
  • Step Functions
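
Several of the evaluation metrics named in the overview above (sensitivity, specificity, precision, accuracy, F1) are derived from the same four confusion-matrix counts. The sketch below shows how they relate for a binary classifier; the function name and the example counts are made up for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive common binary-classification metrics from confusion-matrix counts."""

    def safe_div(numerator, denominator):
        # Return 0.0 instead of raising when a denominator is zero
        return numerator / denominator if denominator else 0.0

    sensitivity = safe_div(tp, tp + fn)   # recall / true positive rate
    specificity = safe_div(tn, tn + fp)   # true negative rate
    precision = safe_div(tp, tp + fp)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    # F1 is the harmonic mean of precision and recall
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts, for illustration only
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Which metric matters most depends on the cost of each error type: sensitivity when false negatives are expensive, precision when false positives are.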

9. SageMaker -- VERY IMPORTANT TOPIC

  • SageMaker high level
  • Three stages: Build, train, deploy
  • SageMaker console
  • SageMaker API
  • SageMaker Python SDK
  • !!Define your problem first!!
  • Build process: Visualise, Explore, Feature engineering, Synthesize data, Convert data, Change structure (joins), Split data
  • Ground truth
  • SageMaker Algorithms: Built in, marketplace, custom
  • Algorithm types, e.g. BlazingText (text; compare Amazon Comprehend), Image classification (compare Amazon Rekognition)
  • Architecture behind SageMaker training: algorithms are packaged as Docker images stored in ECR; SageMaker spins up EC2 instances to run them
  • AWS Marketplace: algorithms must be trained by you; model packages are pre-trained
  • Where to access data: S3, EFS, FSx for Lustre
  • Input modes: File / Pipe (Pipe mode streams the data, e.g. RecordIO)
  • Instance types: ml.m4, ml.c4, ml.p2 (GPU)
  • Some algorithms only support GPU instances
  • Managed spot training & Checkpoints
  • Automated Hyperparameter tuning
  • Real-time inference
  • Batch inference
  • SageMaker root access
  • AmazonSageMakerFullAccess policy: Admin access to SageMaker + necessary access to other services
  • SageMaker can see objects in S3 by default, but cannot access them without additional permissions
  • Deployed into public VPC by default
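
The training items above (S3 input, File/Pipe input mode, instance type, managed spot training with checkpoints) all map onto fields of a single training job request. Here is a minimal sketch that only builds the request dictionary, using field names from the SageMaker `CreateTrainingJob` API. The helper function, bucket names, role ARN and image URI are hypothetical placeholders, and nothing is sent to AWS.

```python
def build_training_request(job_name, image_uri, role_arn, train_s3_uri,
                           output_s3_uri, checkpoint_s3_uri,
                           input_mode="File", instance_type="ml.m4.xlarge",
                           max_run_seconds=3600, max_wait_seconds=7200):
    """Assemble a CreateTrainingJob-style request (illustrative helper)."""
    if input_mode not in ("File", "Pipe"):
        raise ValueError("input_mode must be 'File' or 'Pipe'")
    # With spot training, the wait time covers both waiting for capacity and running
    if max_wait_seconds < max_run_seconds:
        raise ValueError("max_wait_seconds must be >= max_run_seconds")
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,       # Docker image of the algorithm
            "TrainingInputMode": input_mode,  # File copies data, Pipe streams it
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
        },
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_run_seconds,
            "MaxWaitTimeInSeconds": max_wait_seconds,
        },
        # Checkpoints let the job resume after a spot interruption
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri},
    }


# All values below are placeholders, not real resources
request = build_training_request(
    job_name="example-training-job",
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<algorithm>:<tag>",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
    checkpoint_s3_uri="s3://example-bucket/checkpoints/",
    input_mode="Pipe",
)
```

With valid resources and credentials, a dictionary like this could be passed to `boto3.client("sagemaker").create_training_job(**request)`.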

Other

  • AWS DeepLens – Deep learning enabled video camera for developers
  • AWS DeepRacer - Reinforcement learning enabled race-car

SageMaker FAQ notes

  • CloudTrail to see SageMaker API calls
  • Notebooks persist on the storage volume attached to the instance, so stopping the instance does not lose your work.
  • Managed spot training uses Spot Instances for training. You must specify how long you are willing to wait for Spot capacity.
    • Good when you have flexibility about when the training job runs
    • Uses checkpoints to save progress, so training can resume instead of failing when an instance is terminated.
  • BlazingText
  • Automated hyperparameter tuning is available for all algorithms (including custom ones).
    • Uses a custom Bayesian Optimization under the hood
  • Can currently only optimise for one objective (e.g. accuracy or speed)
  • Reinforcement learning is a machine learning technique that enables an agent to learn in an interactive environment by trial and error using feedback from its own actions and experiences
    • Available to train in SageMaker. Can use AWS RoboMaker, OpenAI Gym or commercial simulation environments to train
  • SageMaker Neo: Enables machine learning models to train once and run anywhere in the cloud and at the edge
    • Optimizes models built with popular deep learning frameworks that can be used to deploy on multiple hardware platforms
    • Two major components – a compiler and a runtime
    • Supports the most popular deep learning models for computer vision and decision tree models:
      • AlexNet, ResNet, VGG, Inception, MobileNet, SqueezeNet, and DenseNet models trained in MXNet and TensorFlow,
      • classification and random cut forest models trained in XGBoost
  • Model performance from multiple runs is available in the Management Console in tabular form giving you a leaderboard
  • Can't directly access the underlying hardware SageMaker runs on
  • Can scale manually, or automatically using Application Auto Scaling
  • CloudWatch Metrics to monitor SageMaker environment
    • Logs written to CloudWatch
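
The hyperparameter tuning notes above (Bayesian optimisation, a single objective) correspond to the tuning job configuration. Below is a minimal sketch that builds such a configuration as a plain dictionary, using field names from the SageMaker `CreateHyperParameterTuningJob` API. The helper name, the metric `validation:auc` and the ranges for `eta` and `max_depth` are illustrative assumptions for an XGBoost-style job, not recommended values.

```python
def build_tuning_config(metric_name, objective_type="Maximize",
                        max_jobs=20, max_parallel_jobs=2):
    """Assemble a HyperParameterTuningJobConfig-style dict (illustrative helper)."""
    if objective_type not in ("Maximize", "Minimize"):
        raise ValueError("objective_type must be 'Maximize' or 'Minimize'")
    if max_parallel_jobs > max_jobs:
        raise ValueError("max_parallel_jobs cannot exceed max_jobs")
    return {
        "Strategy": "Bayesian",
        # Exactly one objective metric is tuned
        "HyperParameterTuningJobObjective": {
            "Type": objective_type,
            "MetricName": metric_name,
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel_jobs,
        },
        # The API expects range bounds as strings
        "ParameterRanges": {
            "ContinuousParameterRanges": [
                {"Name": "eta", "MinValue": "0.01", "MaxValue": "0.5"},
            ],
            "IntegerParameterRanges": [
                {"Name": "max_depth", "MinValue": "3", "MaxValue": "10"},
            ],
        },
    }


tuning_config = build_tuning_config("validation:auc")
```

Fewer parallel jobs generally give the Bayesian search more completed results to learn from before it picks the next candidates, at the cost of a longer overall run.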

SageMaker Algorithms - Overview

  • Built-in algorithms:
    • linear regression
    • logistic regression
    • k-means clustering
    • principal component analysis (PCA)
    • factorization machines
    • neural topic modeling
    • latent Dirichlet allocation
    • gradient boosted trees
    • sequence2sequence
    • time series forecasting
    • word2vec
    • image classification
  • Optimized containers:
    • Apache MXNet
    • TensorFlow
    • Chainer
    • PyTorch
  • Custom algorithms by using Docker images

Contributors

mikegchambers, petur-bjss, thenicelander
