DIY Data Version Control (DVC)

  • Goal: Learn how to use DVC for data versioning, pipeline management, experiment tracking, and more. Get practical experience by using it in your own project.

  • Dates: from 22nd May to 28th May.

  • Where: #project-of-the-week in DataTalks.Club (join Slack here: https://datatalks.club/slack.html)

For more information about the "Project of the Week" initiative at DataTalks.Club, see README.md.

If you want to receive reminders about this event, sign up here

Technologies

  • VS Code
  • Python
  • Git

Note: this is a suggested list of technologies; you can choose alternatives instead.

Plan

This is only a proposed plan; you don't have to follow it day by day.

Day 1 (22 May, Wednesday)

  • Create a GitHub repository.
  • Install DVC on your operating system by following [2].
  • Learn and try out DVC's data version control capabilities with your own data (for example, a txt or csv file); check out [1] and [3]. Friendly reminder: DVC is not just a version control tool; it also supports experiments, pipelines, and more.
  • Share your progress in Slack and on social media.
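To build some intuition for what "versioning data" means before running DVC itself, here is a minimal Python sketch. It is not DVC's implementation; it only illustrates the idea that a tracked file is identified by a hash of its content (DVC records such a hash in a small `.dvc` file that Git versions). The file name `data.csv` and its rows are made up for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo() -> tuple[str, str]:
    """Hash a small CSV, append one row, and hash it again."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data.csv"  # hypothetical dataset name
        data.write_text("id,value\n1,10\n2,20\n")
        before = file_md5(data)
        with data.open("a") as fh:
            fh.write("3,30\n")  # simulate a new version of the data
        after = file_md5(data)
    return before, after


if __name__ == "__main__":
    v1, v2 = demo()
    print("version 1:", v1)
    print("version 2:", v2)
    print("data changed:", v1 != v2)
```

Because the identifier depends only on the content, Git can track the tiny pointer file while the data itself lives in separate storage.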

Suggested material:

  1. 🗒️ Why and When to use DVC?
  2. 🗒️ Official Doc for installation
  3. 📺 Hands-on with DVC

Found good materials? Create a PR with links!

Day 2 (23 May, Thursday)

  • Create an ML project pipeline with processing, training, and evaluation steps. For dataset ideas, check the first link in the suggested materials [1]. I suggest using small datasets and lightweight libraries (sklearn and datasets); remember, the goal is to explore and learn the tool.
  • For ideas on how to split your ML pipeline, check the official example [2]. I also made a simple ML pipeline with a random forest on the iris data that you can copy [3].
  • Create a params.yml that stores the important parameters for the processing and training steps of your ML pipeline. Check these examples: the official one [4] and my simpler one [5].
  • Push your changes to GitHub.
  • Share your progress in Slack and on social media.

Note that the pipelines.py scripts do not depend on the DVC library (only the evaluation part does, so you can skip it for now).
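As a rough illustration of the split into processing, training, and evaluation steps driven by parameters, here is a small sketch. It is an assumption-heavy toy, not the linked examples: the `PARAMS` dict stands in for the contents of params.yml, the tiny nearest-neighbour model is a pure-Python stand-in for a sklearn estimator, and all names and values are hypothetical.

```python
import random

# Stand-in for params.yml: one section per pipeline stage (hypothetical values).
PARAMS = {
    "process": {"test_size": 0.25, "seed": 42},
    "train": {"k": 3},
}


def process(rows, test_size, seed):
    """Shuffle the rows and split them into train and test sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_size))
    return rows[n_test:], rows[:n_test]


def train(train_rows, k):
    """'Train' a k-nearest-neighbours model: it only stores the data and k."""
    return {"k": k, "rows": train_rows}


def predict(model, x):
    """Predict by majority vote among the k closest training points."""
    nearest = sorted(model["rows"], key=lambda r: abs(r[0] - x))[: model["k"]]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


def evaluate(model, test_rows):
    """Return the accuracy on the held-out rows."""
    hits = sum(predict(model, x) == y for x, y in test_rows)
    return hits / len(test_rows)


def run_pipeline(rows, params=PARAMS):
    """Run the three stages in order, each with its own parameter section."""
    train_rows, test_rows = process(rows, **params["process"])
    model = train(train_rows, **params["train"])
    return evaluate(model, test_rows)


if __name__ == "__main__":
    # Toy 1-D dataset: small values are class "a", large values are class "b".
    data = [(float(i), "a") for i in range(10)]
    data += [(float(i), "b") for i in range(20, 30)]
    print("accuracy:", run_pipeline(data))
```

Keeping each stage as its own function (or script) with its own parameter section is what later lets a pipeline tool rerun only the stages whose inputs changed.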

Suggested material:

  1. 💾 List of Dataset/Project ideas
  2. 💻 DVC Official ML pipeline example
  3. 💻 My GitHub's simpler ML pipeline example with iris dataset
  4. 💻 Params.yml from DVC example
  5. 💻 Params.yml more simplified

Found good materials? Create a PR with links!

Day 3 (24 May, Friday)

  • Finish preparing your own project if you haven't done so on the previous day.
  • Version your dataset with DVC: [1], [2]
  • To configure the storage, check [3]. If you don't want to spend time setting up remote storage, you can also use a local folder.
  • Play with DVC commands such as dvc status and dvc add (together with your git commands).
  • Push your changes to GitHub.
  • Share your progress in Slack and on social media.
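The order of the Git and DVC commands for tracking one data file can be sketched as follows. This is only a helper that lists (and optionally runs) the commands; the data path, the remote name `localremote`, and the local storage folder are placeholders you would replace with your own. By default it prints the commands without executing anything.

```python
import subprocess
from pathlib import PurePosixPath


def versioning_commands(data_path, remote_dir, remote_name="localremote"):
    """Return the Git and DVC commands for tracking one data file, in order."""
    gitignore = str(PurePosixPath(data_path).parent / ".gitignore")
    return [
        ["git", "init"],
        ["dvc", "init"],
        ["dvc", "add", data_path],  # writes <data_path>.dvc and a .gitignore entry
        ["git", "add", f"{data_path}.dvc", gitignore],
        ["git", "commit", "-m", "Track dataset with DVC"],
        ["dvc", "remote", "add", "-d", remote_name, remote_dir],  # local folder as remote
        ["dvc", "push"],  # copy the data to the remote storage
        ["dvc", "status"],  # check whether tracked data changed
    ]


def run(commands, cwd=".", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    # Hypothetical paths; nothing is executed in dry-run mode.
    run(versioning_commands("data/data.csv", "/tmp/dvc-storage"))
```

Running the same commands by hand in a terminal is equally fine; the point is the sequence: track with DVC, commit the pointer file with Git, then push the data to storage.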

Suggested material:

  1. 📺 Versioning data Official Hands-on
  2. 📺 Hands-On with DVC
  3. 🗒️ Remote Storage

Found good materials? Create a PR with links!

Day 4 (25 May, Saturday)

  • Try to build pipelines based on the official documentation [1]
  • You can follow the official video tutorial [2]. The video is from 3 years ago and some commands might have changed, so make sure you follow the docs in [1] in parallel.
  • Check out the summary in [3] to understand what this DVC feature solves.
  • Push your changes to GitHub.
  • Share your progress in Slack and on social media.

Note: It is important to have your params.yml ready and the dependencies set correctly in the dvc stage add flags. Post your dvc dag if you want 🙂
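To make the flags of dvc stage add more concrete, here is a small helper that assembles the command for each stage. The script paths, output paths, and parameter names are hypothetical. The `params.yml:` prefix is used because, as far as I know, DVC looks for `params.yaml` by default and a differently named file has to be given explicitly; check the official docs for your version.

```python
import shlex


def stage_add_command(name, command, deps=(), outs=(), params=(), metrics=()):
    """Build the argument list for `dvc stage add`."""
    argv = ["dvc", "stage", "add", "-n", name]
    for p in params:
        argv += ["-p", p]  # parameters the stage depends on
    for d in deps:
        argv += ["-d", d]  # files or folders the stage reads
    for o in outs:
        argv += ["-o", o]  # outputs cached by DVC
    for m in metrics:
        argv += ["-M", m]  # metrics file kept in Git instead of the cache
    return argv + shlex.split(command)


if __name__ == "__main__":
    # Hypothetical three-stage pipeline: process -> train -> evaluate.
    stages = [
        stage_add_command(
            "process", "python src/process.py",
            deps=["src/process.py", "data/raw.csv"],
            outs=["data/processed.csv"],
            params=["params.yml:process.test_size"],
        ),
        stage_add_command(
            "train", "python src/train.py",
            deps=["src/train.py", "data/processed.csv"],
            outs=["models/model.pkl"],
            params=["params.yml:train.n_estimators"],
        ),
        stage_add_command(
            "evaluate", "python src/evaluate.py",
            deps=["src/evaluate.py", "models/model.pkl"],
            metrics=["eval/metrics.json"],
        ),
    ]
    for argv in stages:
        print(shlex.join(argv))
```

The output of one stage appears as a dependency of the next; that chain is what dvc dag draws and what dvc repro uses to decide which stages to rerun.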

Suggested material:

  1. 🗒️ Data pipelines official doc
  2. 📺 Video Tutorial Hands-on
  3. 🗒️ What do DVC pipelines solve

Found good materials? Create a PR with links!

Day 5 (26 May, Sunday)

  • Create an evaluation.py script that prints and plots metrics from your ML model.
  • Install DVCLive and adapt the library functions to your script as in the official tutorial [1].
  • Check out the official hands-on tutorial: [2]
  • Compare the state committed in git with your local changes (in terms of params, metrics, and plots).
  • Push your changes to GitHub.
  • Share your progress in Slack and on social media.

Note: It is important to have your params.yml ready and the dependencies set correctly in the dvc stage add flags. Post your dvc dag if you want 🙂
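Before wiring in DVCLive, it can help to see what an evaluation script needs to produce. The sketch below writes the metrics and per-sample predictions by hand with the standard library; DVCLive can log this kind of information for you, so treat this as a stand-in and not as the DVCLive API. The file names `metrics.json` and `predictions.csv` and the labels are assumptions for the example.

```python
import json
import tempfile
from pathlib import Path


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(y_true, y_pred, out_dir="eval"):
    """Compute metrics, write them to <out_dir>/metrics.json, and return them."""
    metrics = {"accuracy": accuracy(y_true, y_pred), "n_samples": len(y_true)}
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "metrics.json").write_text(json.dumps(metrics, indent=2))
    # One row per sample: the kind of table a confusion-matrix plot is built from.
    rows = ["actual,predicted"] + [f"{t},{p}" for t, p in zip(y_true, y_pred)]
    (out / "predictions.csv").write_text("\n".join(rows) + "\n")
    return metrics


if __name__ == "__main__":
    # Made-up labels, only to exercise the functions.
    with tempfile.TemporaryDirectory() as tmp:
        result = evaluate(["a", "b", "a", "b"], ["a", "b", "b", "b"], out_dir=tmp)
        print(result)
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

Once files like these are declared as metrics and plots in your pipeline, commands such as dvc metrics diff and dvc plots diff can compare the committed version with your local changes.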

Suggested material:

  1. 🗒️ Official doc tutorial for metric parameter plots
  2. 📺 Hands-on tutorial

Found good materials? Create a PR with links!

Day 6 (27 May, Monday)

  • Continue working on yesterday's task.
  • Change some of the parameters in params.yml and run the pipeline. To compare those 'experiments', you can use this reference [1].
  • Run and compare different experiments using [2]
  • Push your changes to GitHub.
  • Share your progress in Slack and on social media.
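Comparing experiments boils down to looking at how the metrics move when a parameter changes. The sketch below mimics that idea in plain Python; DVC's own commands (for example dvc exp show and dvc exp diff) do this for you on real runs. The run names and numbers here are invented for illustration and are not real results.

```python
def diff_metrics(baseline, experiment):
    """Return {metric: (old, new, change)} for metrics present in both runs."""
    return {
        name: (
            baseline[name],
            experiment[name],
            round(experiment[name] - baseline[name], 6),
        )
        for name in baseline
        if name in experiment
    }


def best_run(runs, metric="accuracy"):
    """Return the name of the run with the highest value of `metric`."""
    return max(runs, key=lambda name: runs[name][metric])


if __name__ == "__main__":
    # Hypothetical metrics from two runs with different params.yml values.
    runs = {
        "baseline": {"accuracy": 0.90, "f1": 0.88},
        "n_estimators=200": {"accuracy": 0.93, "f1": 0.91},
    }
    changes = diff_metrics(runs["baseline"], runs["n_estimators=200"])
    for name, (old, new, change) in changes.items():
        print(f"{name:10s} {old:.3f} -> {new:.3f} ({change:+.3f})")
    print("best run:", best_run(runs))
```

In a real project the dictionaries would come from the metrics files your evaluation step writes for each experiment.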

Suggested material:

  1. 🗒️ Official doc tutorial for tracking
  2. 🗒️ Comparing experiments

Found good materials? Create a PR with links!

Day 7 (28 May, Tuesday)

  • Polish the documentation for your project.
  • Keep exploring this topic: check out the hands-on on sharing data and models [1] and the guide to sharing experiments [2].
  • Push your changes to GitHub.
  • Share your progress in Slack and on social media.
  • Give us feedback.
  • Add the link to your project to this project of the week GitHub page.

Suggested material:

  1. 📺 Sharing Data and Models Hands-on
  2. 🗒️ Sharing Experiments
  3. 📺 Model Registry Tutorial

Projects

List of projects from our participants:

  • ...
  • (Create a PR)

(We will put the projects here after the event finishes)
