Extended Software Development Workshop: Intelligent high throughput computing for scientific applications
- David Swenson (Universiteit van Amsterdam, The Netherlands)
- Alan O'Cais (Jülich Supercomputing Centre, Germany)
High throughput computing (HTC) is a computing paradigm focused on the execution of many loosely coupled tasks. It is a useful and general approach to parallelizing (nearly) embarrassingly parallel problems. Distributed computing middleware, such as Dask.distributed [1] or COMP Superscalar (COMPSs) [2], can include tools to facilitate HTC, although there may be challenges extending such approaches to the exascale.
Across scientific fields, HTC is becoming a necessary approach in order to fully utilize next-generation computer hardware. As an example, consider molecular dynamics: excellent work over the years has developed software that can simulate a single trajectory very efficiently using massive parallelization [3]. Unfortunately, for a fixed number of atoms, the extent of possible parallelization is limited. However, many methods, including semiclassical approaches to quantum dynamics [4,5] and some approaches to rare events [6,7], require running thousands of independent molecular dynamics trajectories. Intelligent HTC, which can treat each trajectory as a task and manage data dependencies between tasks, provides a way to run these simulations on hardware up to the exascale, thus opening the possibility of studying previously intractable systems.
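The idea of treating each trajectory as a task, with a later task depending on their results, can be sketched with Python's standard `concurrent.futures` module. This is only an illustration under stated assumptions: `run_trajectory` is a hypothetical toy (a 1D random walk standing in for a molecular dynamics run), `mean_square_displacement` is a hypothetical analysis step, and a thread pool is used purely to keep the sketch portable. A real application would hand the same tasks to processes or to HTC middleware.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_trajectory(seed, n_steps=1000):
    """Toy stand-in for one MD trajectory: a 1D random walk (assumption).

    Each task owns its random generator, so results do not depend on
    which worker runs it or in what order.
    """
    rng = random.Random(seed)
    x = 0.0
    for _ in range(n_steps):
        x += rng.gauss(0.0, 1.0)
    return x


def mean_square_displacement(finals):
    """Dependent task: it can only run once all trajectory results exist."""
    return sum(x * x for x in finals) / len(finals)


def run_ensemble(n_trajectories=200, n_steps=1000, max_workers=4):
    """Submit one task per trajectory, then run the dependent analysis."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_trajectory, seed, n_steps)
            for seed in range(n_trajectories)
        ]
        # Collecting results in submission order keeps the analysis reproducible.
        finals = [future.result() for future in futures]
    return mean_square_displacement(finals)
```

Calling `run_ensemble()` returns the mean square displacement over the ensemble. Because every trajectory is seeded independently, the result is the same for any number of workers, which is the property that makes such workloads a natural fit for HTC.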
In practice, many scientific programmers are not aware of the range of middleware available to facilitate parallel programming. When HTC-like approaches are implemented as part of a scientific software project, they are often implemented by hand, through custom scripts to manage SSH, or by running separate jobs and manually collating the results. Using the intelligent high-level approaches enabled by distributed computing middleware will simplify and speed up development.
Furthermore, middleware frameworks can meet the needs of many different computing infrastructures. For example, in addition to working within a single job on a cluster, Dask.distributed and COMPSs include support for working through a cluster's queueing system or working on a distributed grid. Moreover, architecting a software package such that it can take advantage of one HTC library will make it easier to adopt other HTC middleware later. Having all of these possibilities immediately available will enable developers to quickly create software that can meet the needs of many users.
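One way to read the point about architecture is that the scientific code should depend only on a small task-submission interface, not on a particular backend. The sketch below assumes that interface is the `submit` method of Python's standard `concurrent.futures.Executor`. `SerialExecutor`, `single_point_energy`, and `scan_surface` are hypothetical names introduced for illustration. Dask.distributed documents a futures interface with a similar `submit` call, but whether a given middleware can be dropped in unchanged is an assumption to verify against its documentation.

```python
from concurrent.futures import Executor, Future, ThreadPoolExecutor


class SerialExecutor(Executor):
    """Hypothetical fallback backend: runs each task immediately in the caller."""

    def submit(self, fn, /, *args, **kwargs):
        future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            # Store the error so it surfaces on future.result(), as with real pools.
            future.set_exception(exc)
        return future


def single_point_energy(x):
    """Toy stand-in for an expensive calculation (harmonic potential, assumption)."""
    return 0.5 * x * x


def scan_surface(points, executor):
    """Science code: relies only on executor.submit, not on a specific backend."""
    futures = [executor.submit(single_point_energy, x) for x in points]
    return [future.result() for future in futures]
```

With this structure, `scan_surface(points, SerialExecutor())` is convenient for debugging, while `scan_surface(points, ThreadPoolExecutor(max_workers=4))` runs the same tasks on a pool. Switching backends changes one argument rather than the scientific code.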
This E-CAM Extended Software Development Workshop (ESDW) will focus on intelligent HTC as a technique that crosses many domains within the molecular simulation community in general and the E-CAM community in particular. Teaching developers how to incorporate middleware for HTC matches E-CAM's goal of training scientific developers on the use of more sophisticated software development tools and techniques.
The primary goals of this workshop are:
1. To help scientific developers interface their software with HTC middleware.
2. To benchmark, and ideally improve, the performance of HTC middleware as applications approach extreme scale.
This workshop will aim to produce four or more software modules related to intelligent HTC, and to submit them, with their documentation, to the E-CAM software module repository. These will include modules adding HTC support to existing computational chemistry codes, where the participants will bring the codes they are developing. They may also include modules adding new middleware or adding features to existing middleware that facilitate the use of HTC by the computational chemistry community. This workshop will involve training both in the general topic of designing software to interface with HTC libraries, and in the details of interfacing with specific middleware packages.
The range of use for intelligent HTC in scientific programs is broad. For example, intelligent HTC can be used to select and run many single-point electronic structure calculations in order to develop approximate potential energy surfaces. Even more examples can be found in the wide range of methods that require many trajectories, where each trajectory can be treated as a task, such as:
* rare events methods, like transition interface sampling, weighted ensemble, committor analysis, and variants of the Bennett-Chandler reactive flux method
* semiclassical methods, including the phase integration method and the semiclassical initial value representation
* adaptive sampling methods for Markov state model generation
* approaches such as nested sampling, which use many short trajectories to estimate partition functions
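As a concrete illustration of the first item, committor analysis launches many short trajectories from one configuration and counts the fraction that reach state B before state A. The sketch below is a toy under stated assumptions: the "dynamics" is an unbiased random walk on a 1D lattice, with state A at site 0 and state B at site `n_sites`, and the names `shoot` and `estimate_committor` are hypothetical. A thread pool from the standard library stands in for HTC middleware.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def shoot(seed, start, n_sites):
    """One trajectory task: walk from `start` until state A (0) or B (n_sites).

    Returns True if the walker reaches B first.
    """
    rng = random.Random(seed)
    x = start
    while 0 < x < n_sites:
        x += rng.choice((-1, 1))
    return x == n_sites


def estimate_committor(start, n_sites=10, n_shots=2000, max_workers=4):
    """Estimate the committor at `start` from independent trajectory tasks."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each shot is an independent task identified only by its seed.
        hits = list(pool.map(lambda seed: shoot(seed, start, n_sites), range(n_shots)))
    return sum(hits) / n_shots
```

For this unbiased walk the exact committor is `start / n_sites`, so `estimate_committor(3)` should come out close to 0.3, with a statistical error that shrinks as `n_shots` grows. In a real application each shot would be a molecular dynamics trajectory, which is where distributing the tasks pays off.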
The challenge is that most developers of scientific software are not familiar with the way HTC middleware packages can simplify their development process, and the packages that exist may not scale to the exascale. This workshop will introduce scientific software developers to useful middleware packages, work on improving the scaling of those packages, and provide an opportunity for scientific developers to add support for HTC to their codes.
Major topics that will be covered include:
* Concepts of HTC; how to structure code for HTC
* Accessing computational resources to use HTC
* Interfacing existing C/C++/Fortran code with Python libraries
* Specifics of interfacing with Dask.distributed/COMPSs
* Challenges in using existing middleware at extreme scale
[1] Dask Development Team (2016). Dask: Library for dynamic task scheduling.
[2] R.M. Badia et al. SoftwareX 3-4, 32 (2015).
[3] S. Plimpton. J. Comput. Phys. 117, 1 (1995).
[4] W.H. Miller. J. Phys. Chem. A 105, 2942 (2001).
[5] J. Beutier et al. J. Chem. Phys. 141, 084102 (2014).
[6] Du et al. J. Chem. Phys. 108, 334 (1998).
[7] G.A. Huber and S. Kim. Biophys. J. 70, 97 (1996).