Giter VIP home page Giter VIP logo

model-training-and-deployment's Introduction

Training and deployment of deep learning models applied to satellite and aerial imagery.

How to use this repository: if you know exactly what you are looking for (e.g. you have the paper name) you can Control+F to search for it in this page (or search in the raw markdown).

Contents

Model training

This section discusses training machine learning models.

Metrics

A number of metrics are common to all model types (but can have slightly different meanings in contexts such as object detection), whilst other metrics are very specific to particular classes of model. The correct choice of metric is particularly critical for imbalanced dataset problems, e.g. object detection

  • TP = true positive, FP = false positive, TN = true negative, FN = false negative
  • Precision is the % of correct positive predictions, calculated as precision = TP/(TP+FP)
  • Recall or true positive rate (TPR), is the % of true positives captured by the model, calculated as recall = TP/(TP+FN). Note that FN is not possible in object detection, so recall is not appropriate.
  • The F1 score (also called the F-score or the F-measure) is the harmonic mean of precision and recall, calculated as F1 = 2*(precision * recall)/(precision + recall). It conveys the balance between the precision and the recall. Ref
  • The false positive rate (FPR), calculated as FPR = FP/(FP+TN) is often plotted against recall/TPR in an ROC curve which shows how the TPR/FPR tradeoff varies with classification threshold. Lowering the classification threshold returns more true positives, but also more false positives. Note that since FN is not possible in object detection, ROC curves are not appropriate.
  • Precision-vs-recall curves visualise the tradeoff between making false positives and false negatives
  • Accuracy is the most commonly used metric in 'real life' but can be a highly misleading metric for imbalanced data sets.
  • IoU is an object detection specific metric, being the average intersect over union of prediction and ground truth bounding boxes for a given confidence threshold
  • [email protected] is another object detection specific metric, being the mean value of the average precision for each class. @0.5 sets a threshold for how much of the predicted bounding box overlaps the ground truth bounding box, i.e. "minimum 50% overlap"
  • For more comprehensive definitions checkout Object-Detection-Metrics
  • Metrics to Evaluate your Semantic Segmentation Model

Best practice

This section includes tips and ideas I have picked up from other practitioners including ai-fast-track, FraPochetti & the IceVision community

Free online compute

A GPU is required for training deep learning models (but not necessarily for inferencing), and this section lists a couple of free Jupyter environments with GPU available. There is a good overview of online Jupyter development environments on the fastai site. I personally use Colab Pro with data hosted on Google Drive, or Sagemaker if I have very long running training jobs.

Google Colab

  • Collaboratory notebooks with GPU as a backend for free for 12 hours at a time. Note that the GPU may be shared with other users, so if you aren't getting good performance try reloading.
  • Also a pro tier for $10 a month -> https://colab.research.google.com/signup
  • Tensorflow, pytorch & fastai available but you may need to update them
  • Colab Alive is a chrome extension that keeps Colab notebooks alive.
  • colab-ssh -> lets you ssh to a colab instance like it’s an EC2 machine and install packages that require full linux functionality

Kaggle

  • Free to use
  • GPU Kernels - may run for 1 hour
  • Tensorflow, pytorch & fastai available but you may need to update them
  • Advantage that many datasets are already available

Deployment

This section discusses how to get a trained machine learning & specifically deep learning model into production. For an overview on serving deep learning models checkout Practical-Deep-Learning-on-the-Cloud. There are many options if you are happy to dedicate a server, although you may want a GPU for batch processing. For serverless use AWS lambda.

Rest API on dedicated server

A common approach to serving up deep learning model inference code is to wrap it in a rest API. The API can be implemented in python (flask or FastAPI), and hosted on a dedicated server e.g. EC2 instance. Note that making this a scalable solution will require significant experience.

Model serving with GRPC

GPRC is a framework for implementing Remote Procedure Call (RPC) via HTTP/2. Developed and maintained mainly by Google, it is widely used in the industry. It allows two machines to communicate, similar to HTTP but with better syntax and performance.

Framework specific model serving

If you are happy to live with some lock-in, these are good options:

Framework agnostic model serving

Using lambda functions - i.e. serverless

Using lambda functions allows inference without having to configure or manage the underlying infrastructure

Inferencing on large images

Models are typically trained and inferenced on relatively small images, e.g. 640x640 pixels for YOLOv5m. To inference on a large image it is necessary to use a sliding window over the image, inference on each window, then combine the results. However lower confidence predicitons will be made at the edges of the window where objects may be partially cropped. To overcome this a framework called sahi has been developed. An example of how to use sahi with yolo is here. For an example of using threading to process a large image see Fast-Large-Image-Object-Detection-yolov7

Models in the browser

The model is run in the browser itself on live images, ensuring processing is always with the latest model available and removing the requirement for dedicated server side inferencing

Model optimisation for deployment

The general approaches are outlined in this article from NVIDIA which discusses fine tuning a model pre-trained on synthetic data (Rareplanes) with 10% real data, then pruning the model to reduce its size, before quantizing the model to improve inference speed. There are also toolkits for optimisation, in particular ONNX which is framework agnostic.

MLOps

MLOps is a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently.

Model monitoring

Once your model is deployed you will want to monitor for data errors, broken pipelines, and model performance degradation/drift ref

Model tracking, versioning, specification & compilation

  • dvc -> a git extension to keep track of changes in data, source code, and ML models together
  • Weights and Biases -> keep track of your ML projects. Log hyperparameters and output metrics from your runs, then visualize and compare results and quickly share findings with your colleagues
  • geo-ml-model-catalog -> provides a common metadata definition for ML models that operate on geospatial data
  • hummingbird -> a library for compiling trained traditional ML models into tensor computations, e.g. scikit learn model to pytorch for fast inference on a GPU
  • deepchecks -> Deepchecks is a Python package for comprehensively validating your machine learning models and data with minimal effort
  • pachyderm -> Data Versioning and Pipelines for MLOps. Read Pachyderm + Label Studio which discusses versioning and lineage of data annotations

AWS

Google Cloud

  • For storage use Cloud Storage (AWS S3 equivalent)
  • For data warehousing use BigQuery (AWS Redshift equivalent). Visualize massive spatial datasets directly in BigQuery using CARTO
  • For model training use Vertex (AWS Sagemaker equivalent)
  • For containerised apps use Cloud Run (AWS App Runner equivalent but can scale to zero)

Microsoft Azure

State of the art engineering

  • Compute and data storage are on the cloud. Read how Planet and Airbus use the cloud
  • Traditional data formats aren't designed for processing on the cloud, so new standards are evolving such as COG and STAC
  • Google Earth Engine and Microsoft Planetary Computer are democratising access to 'planetary scale' compute
  • Google Colab and others are providing free acces to GPU compute to enable training deep learning models
  • No-code platforms and auto-ml are making ML techniques more accessible than ever
  • Serverless compute (e.g. AWS Lambda) mean that managing servers may become a thing of the past
  • Custom hardware is being developed for rapid training and inferencing with deep learning models, both in the datacenter and at the edge
  • Supervised ML methods typically require large annotated datasets, but approaches such as self-supervised and active learning require less or even no annotation
  • Computer vision traditionally delivered high performance image processing on a CPU by using compiled languages like C++, as used by OpenCV for example. The advent of GPUs are changing the paradigm, with alternatives optimised for GPU being created, such as Kornia
  • Whilst the combo of python and keras/tensorflow/pytorch are currently preeminent, new python libraries such as Jax and alternative languages such as Julia are showing serious promise

Web apps

Flask is often used to serve up a simple web app that can expose a ML model

Neural nets in space

Processing on board a satellite allows less data to be downlinked. e.g. super-resolution image might take 8 images to generate, then a single image is downlinked. Other applications include cloud detection and collision avoidance.

model-training-and-deployment's People

Contributors

robmarkcole avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.