dask-tutorial's Introduction

NCAR Dask Tutorial

Welcome to NCAR Dask Tutorial!

Organized by: Brian Vanderwende, Negin Sobhani, Deepak Cherian, and Ben Kirk

The materials and notebooks in this tutorial are published as a Jupyter Book here.

Here you will find the tutorial materials from the CISL/CSG Dask Tutorial. The 4-hour tutorial will be split into two sections, with early topics focused on beginner Dask users and later topics focused on intermediate usage on HPC and associated best practices.

This tutorial is open to non-UCAR staff. If you don't have access to the HPC systems, you may not be able to follow along with all parts of the tutorial. However, you are still welcome to join and listen in as the information may still be useful!

Video Recording: Will be available after the event

Course Outline

  1. Dask Overview
  2. Dask Data Arrays
  3. Dask DataFrames
  4. Dask + Xarray
  5. Dask Schedulers
  6. Dask on HPC Systems
  7. Dask Best Practices

Prerequisites

Before beginning any of the tutorials, it is highly recommended that you have a basic understanding of Python programming and Python libraries such as NumPy, pandas, and Xarray.

⌨️ Getting set up

Running the notebooks on the NCAR JupyterHub is the preferred way to interact with this tutorial. Users with access to Casper can run the notebooks interactively, save their work, and pull in new updates. To connect to the NCAR JupyterHub, open this link in a web browser: https://jupyterhub.hpc.ucar.edu/

Next, clone the repository to your local directory:

git clone https://github.com/NCAR/dask-tutorial

Finally, open the notebooks and interact with them. Make sure to choose the "NPL 2023a" kernel.

Local installation instructions

Users without access to the NCAR/UCAR Casper cluster can only run through the first few notebooks. To run the notebooks locally:

First clone this repository to your local machine via:

git clone https://github.com/NCAR/dask-tutorial

Next, download conda (if you haven't already)

If you do not already have the conda package manager installed, please follow the instructions here.

Now, create a conda environment:

Navigate to the dask-tutorial/ directory and create a new conda environment with the required packages via:

cd dask-tutorial
conda env update --file environment.yml

This will create a new conda environment named "dask-tutorial".

Next, activate the environment:

conda activate dask-tutorial

Finally, launch JupyterLab with:

jupyter lab
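
Once JupyterLab is running, a quick check like the one below can confirm that the environment works. This is a minimal sketch, assuming the environment provides Dask and NumPy; the array shape and chunk sizes are arbitrary values chosen for illustration and do not come from the tutorial notebooks.

```python
import dask
import dask.array as da

# Build a lazy 1000 x 1000 array of ones, split into 250 x 250 chunks.
x = da.ones((1000, 1000), chunks=(250, 250))

# Nothing is computed until .compute() is called.
total = x.sum().compute()

print("dask version:", dask.__version__)
print("chunks per axis:", x.numblocks)
print("sum:", total)
```

If the imports succeed and a sum is printed, the first few notebooks should run locally.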

Contributing

We welcome contributions from the community! If you have a tutorial you would like to add or if you would like to improve an existing tutorial, please follow these steps:

Fork the repository.

Clone the repository to your local machine:

git clone https://github.com/your-username/dask-tutorial.git

Create a new branch for your changes:

git checkout -b my-new-tutorial

Make your changes and commit them:

git add .
git commit -m "Add my new tutorial"

Push your changes to your fork:

git push origin my-new-tutorial

Submit a pull request to the original repository.

Support

If you have any questions or need help with the tutorials, please open a GitHub issue in the repository.

👍 Acknowledgments

  • NCAR CISL/CSG Team
  • ESDS Initiative

License

The tutorials in this repository are released under the MIT License.

dask-tutorial's People

Contributors

dcherian, negin513, vanderwb

dask-tutorial's Issues

Updates to materials

Users are still utilizing these materials for learning and have encountered some issues (for CISL folks, see RC-23349).

Notably, using the ib0 InfiniBand network interface now produces an error on Casper.

Also, the front README page should have the link to the presentation video recording added in.

Opening this issue as a reminder to commit some updates to the above.
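
Until the materials are updated, one defensive option is to check whether an interface exists before requesting it. The sketch below is an assumption about how this could be handled, not the fix adopted by the maintainers. `pick_interface` is a hypothetical helper, and its result is meant to be passed as the `interface` argument of a cluster constructor such as dask_jobqueue's PBSCluster, where None leaves the choice to the defaults.

```python
import socket


def available_interfaces():
    """Return the names of the network interfaces visible on this host."""
    return [name for _, name in socket.if_nameindex()]


def pick_interface(preferred="ib0"):
    """Return `preferred` if it exists on this host, otherwise None.

    Hypothetical helper: None is intended to mean "do not request a
    specific interface" when forwarded to a cluster constructor.
    """
    return preferred if preferred in available_interfaces() else None


print("interfaces:", available_interfaces())
print("selected:", pick_interface("ib0"))
```

This only checks that an interface with that name is present. It does not verify that Dask workers can communicate over it.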

Create an agenda for the tutorial

Since there are many topics that could be covered in the Dask tutorial and we have limited time, we need to define the scope of the tutorial and create a clear agenda for it.

From our conversation with @dcherian and @vanderwb: this is going to be a half-day event with topics and sessions for the following audiences:
(a) folks who are Python-aware but essentially total novices to Dask.
(b) those who use Dask regularly but would like optimization guidance and tips/tricks.

The following are my initial thoughts on what would be appropriate for this meeting:

  • Introductory Dask + Xarray:

    • What is Dask
    • Dask + Xarray
    • dask-backed Xarray objects (lazy computations, actual values)
    • Distributed clusters
    • Extract Dask arrays from Xarray objects and use Dask array directly.
    • Dask Delayed to parallelize any code (do we have time to include this?)
  • Intermediate Topics:

    • Dask chunking schemes (performance and rechunking)
    • Apply unvectorized functions (apply_ufunc)
    • More advanced collection of custom operations (map_blocks, map_partitions, map_overlap; do we have enough time for this?)
    • Blockwise computation
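
As a rough illustration of two of the intermediate topics listed above (rechunking and map_blocks), here is a minimal sketch, assuming Dask and NumPy are installed. The array size, the chunk sizes, and the `demean` function are hypothetical choices made for this example and are not taken from the tutorial notebooks.

```python
import numpy as np
import dask.array as da

# Twelve integers split into three chunks of four elements each.
x = da.arange(12, chunks=4)

# Rechunk into two chunks of six elements; this is still lazy.
y = x.rechunk(6)


def demean(block):
    """Subtract the mean of a single block from that block (NumPy in, NumPy out)."""
    return block - block.mean()


# map_blocks applies demean to each chunk independently.
z = y.map_blocks(demean, dtype=float)
result = z.compute()

print("original chunks:", x.chunks)
print("rechunked:", y.chunks)
print("result:", result)
```

Because `demean` sees one chunk at a time, the result depends on the chunking scheme. This is one reason chunk layout matters for correctness as well as performance.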

Since this is going to be a half-day tutorial, we need to be cautious of the time.

We are going to solicit more feedback on this from the community and ESDS forum.

HPC modifications from first run

After the first run of this tutorial, the following modifications seem useful in the HPC section:

  1. Make sure viewers can run through the example without YAML config files!
  2. Show comparison of various spill ratios on Casper and give guidance on recommended values.
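
Both items above could be handled by setting configuration values from Python rather than from YAML files. The following is a minimal sketch, assuming Dask is installed. The thresholds are illustrative values only and are not a recommendation for Casper.

```python
import dask

# Illustrative values only; suitable thresholds depend on the workload.
overrides = {
    "distributed.worker.memory.target": 0.70,
    "distributed.worker.memory.spill": 0.80,
}

# Used as a context manager, dask.config.set applies the overrides
# temporarily and restores the previous configuration on exit.
with dask.config.set(overrides):
    spill = dask.config.get("distributed.worker.memory.spill")
    print("spill threshold inside block:", spill)
```

Workers pick up these values only if the configuration is in effect when they are created, so in practice the cluster would be started inside the `with` block or after a plain `dask.config.set(...)` call.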

Would be good to have a new section on analyzing perf metrics in more depth (case study of a real workflow).

More to come!

publicize

I think we should submit PRs to link to this material from the dask docs, and the dask_jobqueue docs (given the HPC angle).

See dask/dask-tutorial#275

Revision 1

An issue for tracking the comments from #5
To do items:

  • Move why Dask above the Dask Components section in overview 1.
  • Check the array notebook for repetitiveness.
  • Add explanation of task graph.
  • Add a better explanation of Client.
