
Dask Tutorial

February 3, 2023

CISL’s Consulting Services Group and the NCAR Earth System Data Science (ESDS) Initiative held a half-day tutorial for users interested in using Dask effectively on HPC resources such as Casper and Cheyenne. The four-hour tutorial is split into two sections: the first covers topics for novice Dask users, and the second covers intermediate usage on HPC systems and the associated best practices. The topics covered include (but are not limited to):

Outline

  • Beginner section

    • High-level collections including dask.array and dask.dataframe
    • Distributed Dask clusters using HPC job schedulers
    • Earth Science data analysis using Dask with Xarray
    • Using the Dask dashboard to understand your computation
  • Intermediate section

    • Optimizing the number of workers and memory allocation
    • Choosing appropriate chunk shapes and sizes for Dask collections
    • Querying resource usage and debugging errors

This tutorial is open to non-UCAR staff. If you don't have access to the HPC systems, you may not be able to follow along with every part of the tutorial. However, you are welcome to join and listen in, as the information may still be useful!

Part 1

Part 2