High Throughput Computing with Dask
High-throughput (task-based) computing is a flexible approach to parallelization. It involves splitting a problem into loosely coupled tasks. A scheduler then orchestrates the parallel execution of those tasks, allowing programs to adaptively scale their resource usage. Individual tasks may themselves be parallelized using MPI or OpenMP, so the high-throughput approach adds a further level of parallelism on top of them and can therefore enable new levels of scalability.
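The following minimal sketch illustrates this pattern using only Python's standard library, not Dask. A hypothetical `simulate` function stands in for one independent task, and a `ThreadPoolExecutor` plays the role of the scheduler: it runs the tasks in parallel and collects results as they finish. Unlike a real task scheduler, the pool size here is fixed rather than adaptive, and tasks have no dependencies on each other.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def simulate(n: int) -> int:
    """Hypothetical stand-in for one independent task (sum of squares below n)."""
    return sum(i * i for i in range(n))


def run_tasks(inputs, max_workers: int = 4) -> dict:
    """Submit one task per input and gather results in completion order."""
    results = {}
    # The executor acts as a simple scheduler with a fixed number of workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the input that produced it.
        futures = {pool.submit(simulate, n): n for n in inputs}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results


if __name__ == "__main__":
    out = run_tasks(range(10))
    print(sorted(out.items()))
```

Because the tasks are independent, their completion order does not matter. Results are keyed by input rather than collected in a list, which keeps the output deterministic.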
Dask is a powerful Python tool for task-based computing. The Dask library was originally developed to provide parallel and out-of-core versions of common routines from data analysis packages such as NumPy and Pandas. However, the flexibility and usefulness of the underlying scheduler have led to extensions that enable users to write custom task-based algorithms and to execute those algorithms on high-performance computing (HPC) resources.
This workshop will be a series of virtual seminars/tutorials on tools in the Dask HPC ecosystem. The event will run online via Zoom for registered participants ("participate" tab) and it will be live streamed via YouTube at https://youtube.com/playlist?list=PLmhmpa4C4MzZ2_AUSg7Wod62uVwZdw4Rl.
- 21 January 2021, 3pm CET (2pm UTC): Dask - a flexible library for parallel computing in Python
YouTube link: https://youtu.be/Tl8rO-baKuY
- 4 February 2021, 3pm CET (2pm UTC): Dask-Jobqueue - a library that integrates Dask with standard HPC queuing systems, such as SLURM or PBS
YouTube link: https://youtu.be/iNxhHXzmJ1w
- 11 February 2021, 3pm CET (2pm UTC): Jobqueue-Features - a library that enables functionality aimed at enhancing scalability
YouTube link: https://youtu.be/FpMua8iJeTk
Registration is not required to attend, but registered participants will receive email reminders before each seminar.
David Swenson (École normale supérieure de Lyon) - Organiser
Alan O'Cais (Jülich Supercomputing Centre) - Organiser