checkpoints-tutorial's Introduction

dvc-checkpoints-mnist

This example DVC project demonstrates the different ways to employ Checkpoint Experiments with DVC.

This scenario uses DVCLive to generate checkpoints for iterative model training. The model is a simple convolutional neural network (CNN) classifier trained on the MNIST dataset of handwritten digits to predict the digit (0-9) in each image.

🔄 Switch between scenarios

This repo has several branches that show different methods for using checkpoints on a similar pipeline:

  • The live scenario introduces full-featured checkpoint usage, integrating with DVCLive.
  • The basic scenario uses single-checkpoint experiments to illustrate how checkpoints work in a simple way.
  • The Python-only variation features the make_checkpoint function from DVC's Python API.
  • Contrastingly, the signal file scenario shows how to make your own signal files (applicable to any programming language).
  • Finally, our full pipeline scenario elaborates on the full-featured usage with a more advanced process.

Setup

To try it out for yourself:

  1. Fork the repository and clone to your local workstation.
  2. Install the prerequisites in requirements.txt (if you are using pip, run pip install -r requirements.txt).

Experimenting

Start training the model with dvc exp run. It will train for 10 epochs (you can use Ctrl-C to cancel at any time and still recover the results of the completed epochs), each of which will generate a checkpoint.

DVCLive will track performance at each checkpoint. Open logs.html in your web browser during training to track performance over time (you will need to refresh after each epoch completes to see updates). Metrics will also be logged to .tsv files in the logs directory.
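The logged .tsv files can be read back with the Python standard library. The sketch below is a minimal illustration, assuming each file has a header row with a `step` column and one metric column (here `acc`). The exact column names and file layout depend on the DVCLive version, so inspect a real file in the logs directory first. The helper name `read_metric` is hypothetical, and the sample rows reuse the first three accuracy values from the results table in this README.

```python
import csv
import io

# Hypothetical sample of a per-metric TSV file. The column names are an
# assumption; check a real file under logs/ before relying on them.
SAMPLE_TSV = "step\tacc\n0\t0.0894\n1\t0.0904\n2\t0.188\n"


def read_metric(tsv_file, metric):
    """Return a list of (step, value) pairs from an open TSV file."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return [(int(row["step"]), float(row[metric])) for row in reader]


history = read_metric(io.StringIO(SAMPLE_TSV), "acc")
print(history)
```

To read a real file, pass an open file handle instead of the in-memory sample.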

Once the training script completes, you can view the results of the experiment with:

$ dvc exp show
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━┳━━━━━━━━┓
┃ Experiment    ┃ Created  ┃ step ┃    acc ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━╇━━━━━━━━┩
│ workspace     │ -        │    9 │ 0.4997 │
│ live          │ 03:43 PM │    - │        │
│ │ ╓ exp-34e55 │ 03:45 PM │    9 │ 0.4997 │
│ │ ╟ 2fe819e   │ 03:45 PM │    8 │ 0.4394 │
│ │ ╟ 3da85f8   │ 03:45 PM │    7 │ 0.4329 │
│ │ ╟ 4f64a8e   │ 03:44 PM │    6 │ 0.4686 │
│ │ ╟ b9bee58   │ 03:44 PM │    5 │ 0.2973 │
│ │ ╟ e2c5e8f   │ 03:44 PM │    4 │ 0.4004 │
│ │ ╟ c202f62   │ 03:44 PM │    3 │ 0.1468 │
│ │ ╟ eb0ecc4   │ 03:43 PM │    2 │  0.188 │
│ │ ╟ 28b170f   │ 03:43 PM │    1 │ 0.0904 │
│ ├─╨ 9c705fc   │ 03:43 PM │    0 │ 0.0894 │
└───────────────┴──────────┴──────┴────────┘

You can manage it like any other DVC experiment. For example:

  • Run dvc exp run again to continue training from the last checkpoint.
  • Run dvc exp apply [checkpoint_id] to revert to any of the prior checkpoints (which will update the model.pt output file and metrics to that point).
  • Run dvc exp run --reset to drop all the existing checkpoints and start from scratch.
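To decide which checkpoint to pass to dvc exp apply, one option is to compare the metric across checkpoints. The sketch below is illustrative only: it hard-codes the (checkpoint, acc) pairs from the dvc exp show table above rather than querying DVC, and the helper name `best_checkpoint` is hypothetical.

```python
# (checkpoint id, acc) pairs copied from the `dvc exp show` table above.
CHECKPOINTS = [
    ("exp-34e55", 0.4997),
    ("2fe819e", 0.4394),
    ("3da85f8", 0.4329),
    ("4f64a8e", 0.4686),
    ("b9bee58", 0.2973),
    ("e2c5e8f", 0.4004),
    ("c202f62", 0.1468),
    ("eb0ecc4", 0.188),
    ("28b170f", 0.0904),
    ("9c705fc", 0.0894),
]


def best_checkpoint(checkpoints):
    """Return the (id, acc) pair with the highest accuracy."""
    return max(checkpoints, key=lambda item: item[1])


name, acc = best_checkpoint(CHECKPOINTS)
# Print the command you would run by hand; this sketch does not call DVC.
print(f"dvc exp apply {name}  # acc={acc}")
```

In this run the final checkpoint happens to be the best one, but accuracy is not monotonic (step 5 dips to 0.2973), so an earlier checkpoint can be the better choice in other runs.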

Adding dvclive checkpoints to a DVC project

Using dvclive to add checkpoints to a DVC project requires a few additional lines of code.

In your training script, use dvclive.log() to log metrics and dvclive.next_step() to make a checkpoint with those metrics. See the train.py script for an example:

    # Iterate over training epochs.
    for i in range(1, EPOCHS+1):
        train(model, x_train, y_train)
        torch.save(model.state_dict(), "model.pt")
        # Evaluate and checkpoint.
        metrics = evaluate(model, x_test, y_test)
        for metric, value in metrics.items():
            dvclive.log(metric, value)
        dvclive.next_step()
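Continuing from the last checkpoint only works if the script reloads the checkpointed output when it starts. The following is a dependency-free sketch of that pattern, not the code from train.py: a JSON file stands in for model.pt, and `train_epochs` and the `epochs_done` counter are hypothetical names. In a PyTorch script the equivalent would be to call model.load_state_dict(torch.load("model.pt")) when the file exists.

```python
import json
import os
import tempfile


def train_epochs(state_path, epochs):
    """Run `epochs` more epochs, resuming from state_path if it exists."""
    # Resume: reuse the saved state when a previous checkpoint is present.
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    else:
        state = {"epochs_done": 0}

    for _ in range(epochs):
        state["epochs_done"] += 1  # stand-in for one real training epoch
        # Overwrite the checkpoint file after every epoch, mirroring the
        # torch.save(model.state_dict(), "model.pt") call in the loop above.
        with open(state_path, "w") as f:
            json.dump(state, f)
    return state


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.json")
    first = train_epochs(path, 3)   # fresh start: no state file yet
    second = train_epochs(path, 2)  # continues from the saved state
    print(first["epochs_done"], second["epochs_done"])
```

Because the state file is rewritten after every epoch, interrupting the loop loses at most the epoch in progress.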

Then, in dvc.yaml, add the checkpoint: true option to your model output and a live section to your stage. See dvc.yaml for an example:

stages:
  train:
    cmd: python train.py
    deps:
    - train.py
    outs:
    - model.pt:
        checkpoint: true
    live:
      logs:
        summary: true
        html: true

If you do not already have a dvc.yaml stage, you can use dvc stage add to create it:

$ dvc stage add -n train -d train.py -c model.pt --live logs python train.py

That's it! For users already familiar with logging metrics in DVC, note that you no longer need a metrics section in dvc.yaml since dvclive is already logging metrics.

checkpoints-tutorial's People

Contributors

flippedcoder
