machine-learning-operations's Introduction

Image classifier for detecting cats and dogs

Pipeline

[Figure: pipeline architecture]

Dataset

To pull and process the dataset, run the following command:

make data

Version control

The dataset is version-controlled with DVC. There are three tags for the dataset:

  • all_data: raw and processed dataset of 279 cats and 278 dogs for training and 70 cats and 70 dogs for testing
  • raw_only: raw dataset of 279 cats and 278 dogs for training and 70 cats and 70 dogs for testing
  • expanded_dataset: expands the dataset with ~25k pictures of dogs and cats, with a ~80/20 train/test split
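
As a quick sanity check on the counts listed above, the following sketch totals the images in the small dataset and computes its train/test ratio. The dictionary layout and variable names are illustrative only and are not taken from the repository.

```python
# Image counts for the all_data / raw_only tags, as listed above.
counts = {
    "train": {"cats": 279, "dogs": 278},
    "test": {"cats": 70, "dogs": 70},
}

train_total = sum(counts["train"].values())  # 557
test_total = sum(counts["test"].values())    # 140
total = train_total + test_total             # 697

# Share of all images that end up in the training split.
train_fraction = train_total / total
print(f"total={total}, train={train_total}, test={test_total}")
print(f"train fraction = {train_fraction:.3f}")
```

The total agrees with the 697 images quoted for the Kaggle dataset in the project plan, and the small dataset also comes out close to an 80/20 split.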

To select a specific dataset, check out the matching version of data.dvc before running make data, replacing <tag> with one of the tag names above:

git checkout <tag> data.dvc

Docker

Training can be containerized with Docker. To build the image from the included Dockerfile (trainer.dockerfile), run the following from the repository root:

docker build -f trainer.dockerfile . -t trainer:latest

The resulting image accepts the following arguments:

  • -lr: learning rate of the model. Default = 1e-4
  • -e: Number of epochs to train for. Default = 5
  • -bs: Batch size to use for the dataloader. Default = 16
  • -o: Optimizer to use in training. Default = Adam
  • -pt: Whether or not to use a pretrained ResNet50 CNN as the backbone. Default = True
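
The flags above could be parsed along the following lines. This is a minimal sketch using argparse, not the repository's actual training script: the function names build_parser and str_to_bool are hypothetical, and only the flag names and defaults are taken from the list above.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse 'True'/'False' style strings, since -pt is passed as text."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; flag names and defaults mirror the list above.
    parser = argparse.ArgumentParser(description="Train the cats-vs-dogs classifier")
    parser.add_argument("-lr", type=float, default=1e-4, help="learning rate")
    parser.add_argument("-e", type=int, default=5, help="number of epochs")
    parser.add_argument("-bs", type=int, default=16, help="batch size")
    parser.add_argument("-o", type=str, default="Adam", help="optimizer name")
    parser.add_argument("-pt", type=str_to_bool, default=True,
                        help="use a pretrained backbone")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["-lr", "0.001", "-pt", "False"])
print(args.lr, args.e, args.bs, args.o, args.pt)
```

Passing -pt through a small converter avoids the common pitfall where bool("False") evaluates to True.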

The training script logs and reports performance to wandb. Make sure you are logged in to wandb by running wandb login. Then, when running the Docker image, you must launch it through wandb docker-run. For example:

wandb docker-run --name experiment5 trainer:latest -lr 0.0001 -e 5 -bs 16 -o Adam -pt True
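
When launching several experiments, it can be convenient to assemble this command programmatically. The sketch below is one possible way to do that; build_run_command is a hypothetical helper, not part of the repository, and its defaults simply repeat the documented ones.

```python
import shlex


def build_run_command(name, image="trainer:latest", lr=1e-4, epochs=5,
                      batch_size=16, optimizer="Adam", pretrained=True):
    """Return the wandb docker-run invocation as an argument list."""
    return [
        "wandb", "docker-run",
        "--name", name,
        image,
        "-lr", str(lr),
        "-e", str(epochs),
        "-bs", str(batch_size),
        "-o", optimizer,
        "-pt", str(pretrained),
    ]


cmd = build_run_command("experiment5")
# Render the list as a single shell-safe string for inspection.
print(shlex.join(cmd))
# To launch it, pass the list to subprocess.run(cmd, check=True).
```

Keeping the command as a list rather than a single string avoids shell quoting issues if an experiment name contains spaces.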

Project plan

The project is done by 5 members: Abdulrahman Ramadan, Cristina Ailoaei, Jakob Ryttergaard Poulsen, Roza Hasso, Teakosheen Joulak

  1. Dataset: Cats and Dogs image classification (https://www.kaggle.com/datasets/samuelcortinhas/cats-and-dogs-image-classification), which consists of 697 images of cats and dogs.

  2. Project goal: Classify whether a given image contains a cat or a dog. We want to create a structured repository for training a neural network model, logging its results and performance, and running reproducible experiments.

  3. Framework: We will use PyTorch Image Models (timm), because it includes the necessary classes and code for initializing the neural network model.

  4. Deep learning model: We will use a neural network (NN) model to classify images of cats and dogs.
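
As an illustration of points 3 and 4, a two-class model could be created with timm roughly as follows. This is a sketch under assumptions: ModelConfig and build_model are hypothetical names, and the "resnet50" default is taken from the backbone named in the Docker arguments rather than from the project's code.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    # Illustrative defaults; "resnet50" matches the backbone named in the Docker arguments.
    name: str = "resnet50"
    pretrained: bool = True
    num_classes: int = 2  # cat, dog


def build_model(cfg: ModelConfig):
    """Create a timm model whose classifier head has cfg.num_classes outputs."""
    import timm  # imported lazily so the config can be used without timm installed

    return timm.create_model(
        cfg.name, pretrained=cfg.pretrained, num_classes=cfg.num_classes
    )


# Configuration only; calling build_model(cfg) requires timm and PyTorch.
cfg = ModelConfig(pretrained=False)
print(cfg)
```

Setting pretrained=True makes timm download pretrained weights on first use, so it needs network access.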

The tentative project plan is to use the following tools:

Code structure and versioning

  • Cookiecutter for a structured repository template
  • Git for version control of code
  • DVC for version control of data

Reproducibility

  • Docker for system configuration
  • Conda for Python environment configuration

Experiment logging and monitoring

  • Hydra for hyperparameter specification
  • Wandb for experiment logging and model performance

Code performance and structure

  • Snakeviz for inspecting code performance
  • flake8 for checking PEP 8 compliance in our code
  • isort for sorting and structuring imports

[Figure: project plan]
