
notebooks-academy's Introduction


Tip

Deploy AI apps for free on Ploomber Cloud!

Join our community | Newsletter | Contact us | Docs | Blog | Website | YouTube

Ploomber is the fastest way to build data pipelines ⚡️. Use your favorite editor (Jupyter, VSCode, PyCharm) to develop interactively and deploy ☁️ without code changes (Kubernetes, Airflow, AWS Batch, and SLURM). Do you have legacy notebooks? Refactor them into modular pipelines with a single command.

Installation

Compatible with Python 3.7 and higher.

Install with pip:

pip install ploomber

Or with conda:

conda install ploomber -c conda-forge

Getting started

Try the tutorial:

Community

Main Features

⚡️ Get started quickly

A simple YAML API to get started quickly, a powerful Python API for total flexibility.

get-started.mp4

⏱ Shorter development cycles

Automatically cache your pipeline’s previous results and only re-compute tasks that have changed since your last execution.
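To make the caching idea concrete, here is a toy sketch. It is not Ploomber's implementation. It fingerprints each task's bytecode and constants, chains that fingerprint with the upstream one, and skips a task whose fingerprint matches the previous run. `IncrementalRunner` and `code_fingerprint` are hypothetical names, and the sketch assumes the tasks form a linear chain where each task receives the results dict of earlier tasks.

```python
import hashlib
import types
from typing import Callable, Dict, List


def code_fingerprint(code: types.CodeType) -> str:
    """Hash a function's bytecode, names, and constants (recursing into nested code)."""
    parts = [code.co_code.hex(), repr(code.co_names)]
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            parts.append(code_fingerprint(const))  # e.g. lambdas, comprehensions
        else:
            parts.append(repr(const))
    return hashlib.sha256("|".join(parts).encode()).hexdigest()


class IncrementalRunner:
    """Toy incremental executor for a linear chain of tasks (hypothetical)."""

    def __init__(self) -> None:
        self.fingerprints: Dict[str, str] = {}  # task name -> fingerprint from last run
        self.results: Dict[str, object] = {}    # task name -> cached result

    def run(self, tasks: List[Callable[[Dict[str, object]], object]]) -> List[str]:
        """Run tasks in order, skipping unchanged ones; return names of executed tasks."""
        executed: List[str] = []
        upstream = ""
        for task in tasks:
            # Combine the task's own fingerprint with the upstream one, so a
            # change in an earlier task also invalidates every task after it.
            own = code_fingerprint(task.__code__)
            fingerprint = hashlib.sha256((upstream + own).encode()).hexdigest()
            if self.fingerprints.get(task.__name__) != fingerprint:
                self.results[task.__name__] = task(self.results)
                self.fingerprints[task.__name__] = fingerprint
                executed.append(task.__name__)
            upstream = fingerprint
        return executed
```

Running the same two tasks twice executes them only the first time; editing `load` re-runs both `load` and the downstream `total`. Ploomber also tracks products on disk and richer metadata, which this sketch deliberately leaves out.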


☁️ Deploy anywhere

Run as a shell script on a single machine or in a distributed fashion on Kubernetes, Airflow, AWS Batch, or SLURM.


📙 Automated migration from legacy notebooks

Bring your old monolithic notebooks, and we’ll automatically convert them into maintainable, modular pipelines.


I want to migrate my notebook.

Show me a demo.

Resources

About Ploomber

Ploomber is a big community of data enthusiasts pushing the boundaries of Data Science and Machine Learning tooling.

Whatever your skillset is, you can contribute to our mission. So whether you're a beginner or an experienced professional, you're welcome to join us on this journey!

Click here to learn how you can contribute to Ploomber.

notebooks-academy's People

Contributors

e1ha, edublancas, wsshawn


notebooks-academy's Issues

prettify landing page

This site is deployed here.

The landing page needs some work.

  • add some CSS, so the div for subscribing to the newsletter looks good
  • Replace the Twitter and LinkedIn links with the actual logos and center them

We're using jupyter-book, so check out its docs to see how to do this.

lesson 1

  • finish code examples
  • finish lesson summary
  • ensure building the website does not execute the example notebooks (but still indexes them)
  • record video
  • upload to youtube
  • publish
  • announce

Syllabus

Notebooks Academy: Write Production-Ready Code From Jupyter [DRAFT]

The course teaches how to use Jupyter to develop maintainable and production-ready code.

Please comment on this issue with your feedback! Are we missing any topics?

Lessons

  1. Why? The prototype, then refactor problem
  2. Writing clean notebooks
  3. Version control
  4. Hidden state
  5. Modularization
  6. Refactoring legacy pipelines
  7. Building data pipelines
  8. Integration testing
  9. Debugging
  10. Running pipelines in the cloud
  11. Notebook meta-analysis
  12. Using SQL in Jupyter
  13. Deployment

Format

11 video lessons, 20-30 minutes each.

I'm thinking of making this a project-based course so that, by the end of it, students have a pipeline up and running. This dataset looks interesting.

Pre-requisites

  • Experience working with standard open-source tools: Jupyter, pandas, and scikit-learn

Syllabus

1. Why? The prototype, then refactor problem

Introduction to the problem: developing projects in single notebooks causes a lot of trouble. Such notebooks are hard to maintain, test, and review. However, if we follow some best practices and use a few open-source tools, we can implement a workflow that takes us from Jupyter to production instantly.

Related material

2. Writing clean notebooks

This lesson shows best practices for writing clean notebooks (it takes most of its content from the blog post).

Related material

3. Version control

It's challenging to version control Jupyter notebooks because the .ipynb format is JSON that mixes code, outputs, and metadata, which makes diffs hard to read. This lesson shows how to change the underlying format to .py and still interact with those files as notebooks.
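jupytext handles this conversion in practice. As a rough illustration of what the .py representation looks like, the stdlib-only sketch below turns notebook JSON into jupytext-style "percent" format, where `# %%` starts a code cell and `# %% [markdown]` starts a Markdown cell whose lines are commented out. `to_percent_script` is a hypothetical helper and a simplification: it drops outputs and metadata and cannot round-trip back to .ipynb.

```python
import json
from typing import Any, Dict, List


def _as_text(source: Any) -> str:
    # nbformat stores cell sources either as a string or as a list of strings
    return "".join(source) if isinstance(source, list) else source


def to_percent_script(notebook_json: str) -> str:
    """Convert .ipynb JSON to a percent-format .py script (outputs are dropped)."""
    nb: Dict[str, Any] = json.loads(notebook_json)
    chunks: List[str] = []
    for cell in nb.get("cells", []):
        source = _as_text(cell.get("source", ""))
        if cell["cell_type"] == "code":
            chunks.append("# %%\n" + source)
        elif cell["cell_type"] == "markdown":
            # Comment out every Markdown line so the file stays valid Python
            commented = "\n".join("# " + line if line else "#" for line in source.splitlines())
            chunks.append("# %% [markdown]\n" + commented)
    return "\n\n".join(chunks) + "\n"
```

Because outputs never reach the .py file, a diff of that file only shows changes to code and prose.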

Notes

  • Show other alternatives such as nbdime and the jupyterlab-git plugin
  • Discuss jupytext's pairing feature to store the output in a separate file

Related material

4. Hidden state

Since notebooks are developed interactively, running cells out of order and editing them repeatedly often leaves notebooks that no longer run from top to bottom. This lesson introduces notebook smoke testing: on each git push, we use papermill to execute the notebooks with a sample of the data. It also shows how to set up GitHub Actions.
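papermill executes a notebook from top to bottom in a fresh kernel, which is what exposes hidden state. The stdlib-only sketch below mimics that core idea without a kernel: it runs every code cell in order in a new namespace and injects a `sample_size` variable, standing in for a papermill parameter that tells the notebook to use only a sample of the data. `smoke_test` and `sample_size` are hypothetical, and the sketch does not support IPython magics.

```python
import json
from typing import Any, Dict


def smoke_test(notebook_json: str, sample_size: int = 100) -> Dict[str, Any]:
    """Run all code cells top to bottom in a fresh namespace; return that namespace."""
    nb = json.loads(notebook_json)
    # A fresh namespace means no variables survive from earlier interactive sessions
    namespace: Dict[str, Any] = {"sample_size": sample_size}
    for index, cell in enumerate(nb.get("cells", [])):
        if cell["cell_type"] != "code":
            continue
        source = cell["source"]
        source = "".join(source) if isinstance(source, list) else source
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception as exc:
            raise RuntimeError(f"cell {index} failed: {exc!r}") from exc
    return namespace
```

A notebook whose first cell uses a variable defined only in a later cell passes interactively (if cells were run out of order) but fails this check at cell 0.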

Related material

5. Modularization

Modularizing code is critical to developing maintainable and testable software. This lesson shows how to create a package to modularize our work, define functions in Python modules, and unit test those functions using pytest.

Notes

  • Show how IPython auto-reloading works
  • Cover pytest's basic features: fixtures, parametrization, and testing exceptions
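A minimal sketch of a modularized function and its tests, assuming a hypothetical `clean_ages` helper. It exercises the three pytest features listed above: a fixture, `@pytest.mark.parametrize`, and `pytest.raises`. In a real project, the function would live in the package (for example, a `cleaning.py` module) and the tests in a separate file under `tests/`; both file names are illustrative.

```python
from typing import List, Optional

import pytest


def clean_ages(values: List[Optional[float]]) -> List[float]:
    """Drop missing values and reject impossible ages (hypothetical helper)."""
    cleaned = [v for v in values if v is not None]
    if any(v < 0 for v in cleaned):
        raise ValueError("ages must be non-negative")
    return cleaned


@pytest.fixture
def raw_ages() -> List[Optional[float]]:
    # Shared sample input, injected into any test that takes a `raw_ages` argument
    return [34.0, None, 27.0]


def test_drops_missing_values(raw_ages):
    assert clean_ages(raw_ages) == [34.0, 27.0]


@pytest.mark.parametrize("values, expected", [
    ([], []),
    ([None], []),
    ([1.0, None, 2.0], [1.0, 2.0]),
])
def test_clean_ages(values, expected):
    assert clean_ages(values) == expected


def test_negative_age_raises():
    with pytest.raises(ValueError, match="non-negative"):
        clean_ages([-1.0])
```

While iterating in Jupyter, running `%load_ext autoreload` followed by `%autoreload 2` makes edits to the module visible in the notebook without restarting the kernel.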

Related material

6. Refactoring legacy pipelines

A lot of existing pipelines live in notebooks. This lesson shows how to refactor a monolithic notebook-based project into a data pipeline.

Related material

7. Building data pipelines

Long notebooks are hard to manage because they involve many variables and a lot of code. Breaking our analysis down into multiple steps allows us to collaborate better and test our notebooks.

Notes

  • Why is structure important?
  • Mention the advantages of building a data pipeline: it enables integration testing and running tasks in parallel
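One way to see why structure matters is to model the pipeline as a dependency graph. The sketch below uses the standard library's `graphlib` (Python 3.9+) on a hypothetical five-task pipeline. Tasks returned together by `get_ready()` do not depend on each other, which is what makes running them in parallel possible.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline: each task maps to the set of tasks it depends on
PIPELINE: Dict[str, Set[str]] = {
    "load": set(),
    "clean": {"load"},
    "features_a": {"clean"},
    "features_b": {"clean"},
    "train": {"features_a", "features_b"},
}


def execution_stages(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into stages; tasks within a stage can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        # A real executor would dispatch these tasks to workers and call
        # done() as each finishes; here we mark them done immediately.
        sorter.done(*ready)
    return stages
```

For `PIPELINE`, the stages are `load`, then `clean`, then `features_a` and `features_b` together, then `train`.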

Related material

8. Integration testing

Garbage in, garbage out. Testing for data quality at each stage of our pipeline ensures that we meet a minimum level of data quality. This lesson shows how to do integration testing after executing each notebook.
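As a sketch, a data-quality check can be a plain function that runs on a task's output right after the task finishes and raises if the data violates basic expectations. The function name, the rows-as-dicts representation, and the column names in the example are hypothetical; with pandas, the same checks would usually be written against a DataFrame.

```python
from typing import Dict, Iterable, List


def check_quality(rows: List[Dict[str, object]], required: Iterable[str], key: str) -> None:
    """Raise ValueError if the output is empty, required columns have
    missing values, or the `key` column contains duplicates."""
    problems: List[str] = []
    if not rows:
        problems.append("output is empty")
    for column in required:
        missing = sum(1 for row in rows if row.get(column) is None)
        if missing:
            problems.append(f"{column!r} has {missing} missing value(s)")
    keys = [row.get(key) for row in rows]
    if len(keys) != len(set(keys)):
        problems.append(f"{key!r} contains duplicates")
    if problems:
        # Report every problem at once instead of stopping at the first one
        raise ValueError("data quality check failed: " + "; ".join(problems))
```

Failing loudly at the task that produced bad data keeps the error close to its cause instead of surfacing several steps later.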

Related material

9. Debugging

Debugging data pipelines is challenging; however, having a robust unit and integration test suite helps us debug more effectively. This lesson shows how to debug data pipelines by conducting root cause analysis using pytest and the Python debugger.

Notes

  • Show how to debug failing tests
  • Cover Jupyter's visual debugger
  • Show how to use ipdb
  • Debugging with IPython.embed()
  • Debugging code with breakpoints
  • debuglater feature in Ploomber
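The Python debugger's post-mortem mode is one way to do root cause analysis. When a task fails, it opens pdb in the frame where the exception was raised, with that frame's local variables available. The wrapper below is a hypothetical helper, not Ploomber's debuglater. It only starts the debugger when `debug=True`, so the same code stays non-interactive in CI.

```python
import pdb
import sys
from typing import Any, Callable


def run_task(task: Callable[..., Any], *args: Any, debug: bool = False, **kwargs: Any) -> Any:
    """Run `task`; on failure, optionally open pdb at the failing frame, then re-raise."""
    try:
        return task(*args, **kwargs)
    except Exception:
        if debug:
            # In pdb: `p name` prints a variable, `u` moves up the stack, `q` quits
            pdb.post_mortem(sys.exc_info()[2])
        raise
```

Inside a notebook, the `%debug` magic offers the same post-mortem experience after a cell raises. In scripts, a bare `breakpoint()` call (Python 3.7+) pauses execution at that line.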

Related material

10. Running pipelines in the cloud

When working with large datasets, we may want to run our pipeline in the cloud. This lesson shows how to use Ploomber to run a pipeline in AWS and Kubernetes and retrieve results.

11. Notebook meta-analysis

The .ipynb format is self-contained: it stores both code and output, and that output can be anything from text to tables or images. This lesson shows how to analyze the content of Jupyter notebooks and extract their outputs to evaluate and compare model experiments.
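Because .ipynb files are JSON, their outputs can be read back with the standard library. The sketch below collects each code cell's stream output and `text/plain` results, which could then be parsed to compare metrics across experiment notebooks. `extract_text_outputs` is a hypothetical helper that skips images and HTML. The nbformat library provides a more robust reader.

```python
import json
from typing import Any, Dict, List


def _as_text(value: Any) -> str:
    # nbformat allows multiline strings to be stored as lists of lines
    return "".join(value) if isinstance(value, list) else str(value)


def extract_text_outputs(notebook_json: str) -> Dict[int, List[str]]:
    """Map each code cell's index to its stream and text/plain outputs."""
    nb = json.loads(notebook_json)
    results: Dict[int, List[str]] = {}
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        texts: List[str] = []
        for output in cell.get("outputs", []):
            kind = output.get("output_type")
            if kind == "stream":  # e.g. print() output
                texts.append(_as_text(output.get("text", "")))
            elif kind in ("execute_result", "display_data"):
                data = output.get("data", {})
                if "text/plain" in data:
                    texts.append(_as_text(data["text/plain"]))
        if texts:
            results[index] = texts
    return results
```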

Related material

12. Using SQL in Jupyter

Related material

13. Deployment

This lesson shows how to generate a deployment artifact using a previously trained model to serve predictions.

Notes

  • Show how to use Ploomber's pipeline composition capabilities to create a serving pipeline
  • Discuss the importance of dependency locking
  • Generating a source distribution
  • When to use Docker (and when not to)
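A minimal sketch of a deployment artifact, assuming the trained model can be represented as picklable parameters. Here a hypothetical threshold classifier stands in for a real model. The artifact bundles those parameters with a `pip freeze`-style snapshot of installed package versions, which is the kind of information dependency locking preserves. `build_artifact`, `predict`, and the file names are hypothetical. A dedicated locking tool would normally replace the snapshot.

```python
import importlib.metadata
import pickle
import sys
from pathlib import Path
from typing import Dict, List


def build_artifact(params: Dict[str, float], out_dir: Path) -> Path:
    """Bundle trained parameters with a snapshot of installed package versions."""
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "model.pickle").write_bytes(pickle.dumps(params))
    pins = set()
    for dist in importlib.metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a name
            pins.add(f"{name}=={dist.version}")
    lock = f"# python {sys.version.split()[0]}\n" + "\n".join(sorted(pins)) + "\n"
    (out_dir / "requirements.lock.txt").write_text(lock)
    return out_dir


def predict(artifact_dir: Path, values: List[float]) -> List[int]:
    """Serving side: load the stored parameters and score new inputs."""
    # Only unpickle files you produced yourself: unpickling can run arbitrary code
    params = pickle.loads((artifact_dir / "model.pickle").read_bytes())
    return [int(v > params["threshold"]) for v in values]
```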

Related material

Basic materials

I may record these additional short lessons to cover the basics of dependency management and virtual environments.

Lessons to be considered

  • Profiling notebooks (memory, CPU, GPU)
  • Report generation (nbconvert, quarto)
  • Dashboards (Voilà)
  • Technical blogging (Jupyblog)

Optional lessons

Optional lessons I may record.
