Notebooks Academy: Write Production-Ready Code From Jupyter [DRAFT]
The course teaches how to use Jupyter to develop maintainable and production-ready code.
Please comment on this issue with your feedback! Are we missing any topics?
Lessons
- Why? The prototype, then refactor problem
- Writing clean notebooks
- Version control
- Hidden state
- Modularization
- Refactoring legacy pipelines
- Building data pipelines
- Integration testing
- Debugging
- Running pipelines in the cloud
- Notebook meta-analysis
- Using SQL in Jupyter
- Deployment
Format
13 video lessons, 20-30 minutes each.
I'm thinking of making this a project-based course, so by the end of it, students have a pipeline up and running (this dataset looks interesting).
Pre-requisites
- Experience working with standard open-source tools: Jupyter, pandas, and scikit-learn
Syllabus
1. Why? The prototype, then refactor problem
Introduction to the problem: developing projects in a single notebook causes a lot of trouble, since such notebooks are hard to maintain, test, and review. However, by following some best practices and with the help of some open-source tools, we can implement a workflow that allows us to go from Jupyter to production instantly.
Related material
2. Writing clean notebooks
This lesson shows best practices for writing clean notebooks (it takes most of its content from the blog post).
Related material
3. Version control
It's challenging to version-control Jupyter notebooks because the .ipynb format is JSON. This lesson shows how to change the underlying format to .py and still interact with those files as notebooks.
Notes
- Show other alternatives such as nbdime and the Jupyterlab-git plugin
- Discuss jupytext's pairing feature to store the output in a separate file
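As a minimal sketch, the conversion could be done with jupytext's Python API (assuming jupytext is installed; file names are illustrative):

```python
import jupytext

# Read the notebook and write it back as a .py file in the "percent" format,
# which represents cells as "# %%" comments and diffs cleanly in git.
nb = jupytext.read("analysis.ipynb")
jupytext.write(nb, "analysis.py", fmt="py:percent")
```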
Related material
4. Hidden state
Since notebooks are developed interactively, excessive editing often leads to broken notebooks. This lesson introduces notebook smoke testing: we execute the notebooks with a sample of the data on each git push using papermill. It also shows how to set up GitHub Actions.
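A minimal sketch of the smoke test step with papermill (the notebook and its sample_frac parameter are illustrative; the notebook would declare it in a parameters cell):

```python
import papermill as pm

# Execute the notebook with a small data sample so the smoke test runs fast;
# papermill injects the parameters below into the notebook's parameters cell.
pm.execute_notebook(
    "analysis.ipynb",
    "output/analysis.ipynb",
    parameters={"sample_frac": 0.01},
)
```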
Related material
5. Modularization
Modularizing code is critical to developing maintainable and testable software. This lesson shows how to create a package to modularize our work, define functions in Python modules, and unit test those functions using pytest.
Notes
- Show how IPython auto-reloading works
- Cover pytest's basic features: fixtures, parametrization, testing exceptions
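For instance, a minimal sketch of a function extracted into a module plus its unit tests (module and function names are illustrative):

```python
# my_project/features.py
def normalize(values):
    """Scale values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot normalize a constant sequence")
    return [(v - lo) / (hi - lo) for v in values]


# tests/test_features.py
import pytest
from my_project.features import normalize

@pytest.mark.parametrize("values, expected", [
    ([0, 5, 10], [0.0, 0.5, 1.0]),
    ([2, 4], [0.0, 1.0]),
])
def test_normalize(values, expected):
    assert normalize(values) == expected

def test_normalize_constant_raises():
    with pytest.raises(ValueError):
        normalize([3, 3, 3])
```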
Related material
6. Refactoring legacy pipelines
A lot of existing pipelines live in notebooks. This lesson shows how to refactor a monolithic notebook-based project into a data pipeline.
Related material
7. Building data pipelines
Long notebooks are hard to manage because they involve many variables and a lot of code. Breaking down our analysis into multiple steps allows us to collaborate better and test our notebooks.
Notes
- Why is structure important?
- Mention advantages of building a data pipeline: can do integration testing, run tasks in parallel
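A tool-agnostic sketch of the idea (file names are illustrative): each step reads its inputs from disk and writes its outputs (its products), so steps can be developed, run, and tested independently:

```python
import pandas as pd

def load(product="output/raw.parquet"):
    # First step: read the raw data and persist it as this step's product.
    df = pd.read_csv("data/input.csv")
    df.to_parquet(product)

def clean(upstream="output/raw.parquet", product="output/clean.parquet"):
    # Second step: consume the previous step's product and write a new one.
    pd.read_parquet(upstream).dropna().to_parquet(product)
```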
Related material
8. Integration testing
Garbage in, garbage out. Testing for data quality at each stage of our pipeline ensures that we meet a minimum level of data quality. This lesson shows how to do integration testing after executing each notebook.
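A minimal sketch of such a check, run after the corresponding notebook executes (the path and column names are illustrative):

```python
import pandas as pd

def test_clean_output():
    # Validate the product of the "clean" step before downstream tasks use it.
    df = pd.read_parquet("output/clean.parquet")
    assert not df.empty
    assert not df["customer_id"].duplicated().any()
    assert df["age"].between(0, 120).all()
```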
Related material
9. Debugging
Debugging data pipelines is challenging; however, having a robust unit and integration test suite helps us debug more effectively. This lesson shows how to debug data pipelines by conducting root cause analysis using pytest and the Python debugger.
Notes
- Show how to debug failing tests
- Cover Jupyter's visual debugger
- Show how to use ipdb
- Debugging with IPython.embed()
- Debugging code with breakpoints
- The debuglater feature in Ploomber
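As a minimal sketch, Python's built-in breakpoint() pauses execution at the failing spot; IPython.embed() works similarly but opens a full IPython shell instead (the function below is illustrative):

```python
def transform(df):
    if df.isna().any().any():
        # Pause here and inspect local variables interactively; set
        # PYTHONBREAKPOINT=ipdb.set_trace to get ipdb instead of pdb.
        breakpoint()
    return df.dropna()
```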
Related material
10. Running pipelines in the cloud
When working with large datasets, we may want to run our pipeline in the cloud. This lesson shows how to use Ploomber to run a pipeline on AWS and Kubernetes and retrieve the results.
11. Notebook meta-analysis
The .ipynb format is self-contained: it stores both code and outputs, and outputs can be anything from text and tables to images. This lesson shows how to analyze the contents of a Jupyter notebook, extracting its outputs to evaluate and compare model experiments.
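A minimal sketch using nbformat to extract the outputs of an executed notebook (the path is illustrative):

```python
import nbformat

nb = nbformat.read("output/train.ipynb", as_version=4)

# Collect the outputs of every code cell (e.g., printed metrics) so we can
# compare runs without opening each notebook by hand.
for cell in nb.cells:
    for output in cell.get("outputs", []):
        if output["output_type"] == "stream":
            print(output["text"])
        elif "data" in output:
            print(output["data"].get("text/plain", ""))
```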
Related material
12. Using SQL in Jupyter
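A minimal sketch of running SQL from a notebook cell, using the jupysql extension with an in-memory DuckDB database (one option among several; assumes jupysql and duckdb-engine are installed):

```python
# Run inside a Jupyter cell: load the SQL magic, connect to an in-memory
# DuckDB database, and query it without leaving the notebook.
%load_ext sql
%sql duckdb://
%sql SELECT 42 AS answer
```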
Related material
13. Deployment
This lesson shows how to generate a deployment artifact using a previously trained model to serve predictions.
Notes
- Show how to use Ploomber's pipeline composition capabilities to create a serving pipeline
- Discuss the importance of dependency locking
- Generating a source distribution
- When to use Docker (and when not to)
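A minimal sketch of the serving side, assuming the training pipeline saved a pickled scikit-learn model (the path and interface are illustrative):

```python
import pickle

# Load the artifact produced by the training pipeline.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

def predict(features):
    """Return a prediction for a single observation (a list of feature values)."""
    return model.predict([features])[0]
```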
Related material
Basic materials
I may record these additional short lessons to cover the basics of dependency management and virtual environments.
Lessons to be considered
- Profiling notebooks (memory, CPU, GPU)
- Report generation (nbconvert, quarto)
- Dashboards (Voilà)
- Technical blogging (Jupyblog)
Optional lessons
Additional lessons I may record.