ECAM State-of-the-Art Workshop: Large-scale activated event simulations
- Christoph Dellago (University of Vienna, Austria)
- Peter Bolhuis (University of Amsterdam, The Netherlands)
- Gerhard Kahl (Vienna University of Technology, Austria)
Running on powerful computers, large-scale molecular dynamics (MD) simulations are routinely used to simulate systems of millions of atoms, providing crucial atomistic-level insights into a variety of processes of interest in physics, materials science, chemistry and biology. For instance, MD simulations are used extensively to study the dynamics and interactions of proteins, to understand the properties of solutions, and to investigate transport in and on solids. From a technological point of view, molecular dynamics simulations play an important role in many fields such as drug development, the discovery of new materials, oil extraction and energy production. Indeed, enormous amounts of data are produced every day by molecular dynamics simulations running on high-performance computers around the world. One of the big challenges related to such simulations is to make sense of these data and to obtain mechanistic understanding in terms of low-dimensional models that capture the crucial features of the processes under study. Another central challenge is the time scale problem that often affects molecular dynamics simulations: despite the exponential increase in computing power witnessed during the last decades and the development of efficient molecular dynamics algorithms, many processes have typical time scales that are still far beyond the reach of current computational capabilities. Addressing such time scale problems and developing scientific software able to overcome them is one of the central goals of Work Package 1 (WP1-Classical Molecular Dynamics) of the ECAM project.
Three fundamental problems are intimately tied to the time scale problem of classical molecular dynamics simulation:
1) The calculation of the populations of metastable states of an equilibrium system. Such populations can be expressed in terms of free energies and hence this problem boils down to the efficient calculation of free energies.
2) The sampling of transition pathways between long-lived (meta)stable states and the calculation of reaction rate constants.
3) The extraction of useful mechanistic information from the simulation data and the construction of low-dimensional models that capture the essential features of the process under study. Such models serve as the basis for the definition of reaction coordinates that enable in-depth studies of the process at hand, e.g. by computing the free energy and kinetics.
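The link between populations and free energies stated in item 1 can be made concrete with a small example. The following is a minimal sketch, not part of the workshop material: it assumes the populations are estimated as plain state counts from an equilibrium trajectory and uses the standard relation F_i - F_ref = -k_B T ln(p_i / p_ref). The function name, the example counts and the temperature of 300 K are hypothetical.

```python
import math

# Molar Boltzmann constant (gas constant) in kJ/(mol K).
K_B = 0.0083144626


def free_energies_from_populations(populations, temperature=300.0):
    """Return relative free energies in kJ/mol for a list of state populations.

    Uses F_i - F_ref = -k_B T ln(p_i / p_ref), with the most populated
    state as the reference (its relative free energy is zero).
    """
    total = sum(populations)
    if total <= 0 or any(p <= 0 for p in populations):
        raise ValueError("all populations must be positive")
    probs = [p / total for p in populations]
    p_ref = max(probs)
    kT = K_B * temperature
    return [-kT * math.log(p / p_ref) for p in probs]


# Hypothetical counts of trajectory frames assigned to three metastable states.
counts = [9000, 900, 100]
free_energies = free_energies_from_populations(counts, temperature=300.0)
for state, f in enumerate(free_energies):
    print(f"state {state}: {f:.2f} kJ/mol")
```

In this sketch a tenfold drop in population raises the free energy by k_B T ln 10. This also illustrates the time scale problem: a state that is rarely visited in a straightforward simulation yields very few counts, so its free energy estimate is poorly converged without enhanced sampling.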
The central goal of this workshop is to review new algorithmic developments that address the computational challenges mentioned above with a particular emphasis on implications for industrial applications. In particular, the workshop aims at identifying software modules that should be developed to make efficient and scalable algorithms available to the academic and industrial community. Another goal of the workshop is to identify specific collaboration projects with industrial partners. A dedicated half-day session will be organized specifically for this purpose. To establish the needs of the community and lay out possible directions for development, we will bring together a diverse group of people including software developers, users of HPC infrastructure and industrial researchers.
The proposed workshop is a follow-up to the first ECAM State-of-the-art Workshop of WP1, which took place in the summer of 2016 at the Lorentz Center in Leiden, The Netherlands. At that workshop, participants reviewed current rare event methods including path sampling, milestoning, metadynamics, Markov state modeling, diffusion maps, dimension reduction, reaction coordinate optimization, machine learning, and unsupervised cluster methods, and explored ways to improve these methods. Particular attention was devoted to the integration of popular MD packages such as Gromacs, NAMD, Charmm, Amber, ACEMD, MOIL and LAMMPS with enhanced analysis and advanced sampling tools including Plumed (a package for enhanced sampling and collective variable analysis), pyEmma, and MSMBuilder (packages for Markov state model analysis).
Notwithstanding the great capabilities of existing methods and software, several challenges remain and will be discussed at the proposed workshop in Vienna:
- Extracting order parameters from molecular simulations to construct low-dimensional models. This point is important because there is no straightforward recipe for reducing the dimensionality to a few meaningful variables, and progress in this area is urgently needed.
- Efficient methods for sampling rare pathways. Here the goal is to generate the molecular trajectory data using advanced sampling algorithms.
- Machine learning algorithms. Automatic analysis methods may offer new ways to guide simulations and construct reaction coordinates from molecular trajectories.
- Better ways to integrate simulations and experiments. It is important to connect the proposed computational methods to experimental probes and integrate experimental information into the analysis of computer simulation data.
More specifically, questions that will be addressed at the proposed workshop include:
1. How can we obtain the best low-dimensional model for the process of interest?
2. How can we use machine learning to find collective variables and reaction coordinates?
3. When can reaction coordinates, which often constitute the slow variables of a process, be used to coarse-grain the dynamics? When not?
4. What if multiple transitions are important? Do we resort to kinetic networks or use multiple reaction coordinates? Should one identify a single (possibly complicated) reaction coordinate, or try to construct a Markov state model (MSM) using many metastable states?
5. When is it possible to reduce a complex problem to diffusion on a one-dimensional free energy landscape, and when do we need a network Markov model?
6. How can experiments test reaction coordinate predictions? How do we connect to experiments?
7. How can extreme-scale computational resources be used efficiently to address these questions?
8. How can progress in these questions help to address problems of industrial interest?
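To illustrate what a Markov state model in questions 4 and 5 amounts to in the simplest case, the following minimal sketch estimates a transition matrix from a discretized trajectory and derives a relaxation time from it. It is an illustration under stated assumptions, not the estimator used by pyEmma or MSMBuilder: it uses plain row-normalized transition counts without enforcing detailed balance, the two-state relaxation formula applies only to two-state models, and the function names and the toy trajectory are hypothetical.

```python
import math


def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Row-normalized transition counts at the given lag time (in frames)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i][j] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a state has no outgoing transitions")
        matrix.append([c / total for c in row])
    return matrix


def two_state_relaxation_time(matrix, lag=1):
    """Implied time scale -lag / ln(lambda_2) of a two-state model.

    For a 2x2 row-stochastic matrix the second eigenvalue is
    lambda_2 = T[0][0] + T[1][1] - 1.
    """
    lambda_2 = matrix[0][0] + matrix[1][1] - 1.0
    if not 0.0 < lambda_2 < 1.0:
        raise ValueError("second eigenvalue must lie in (0, 1)")
    return -lag / math.log(lambda_2)


# Hypothetical trajectory already assigned to two metastable states, 0 and 1.
dtraj = [0] * 8 + [1] * 4 + [0] * 8 + [1] * 4 + [0]
T = estimate_transition_matrix(dtraj, n_states=2, lag=1)
relaxation_time = two_state_relaxation_time(T, lag=1)

# Stationary population of state 0 for a two-state chain.
pi_0 = T[1][0] / (T[0][1] + T[1][0])
print(T, relaxation_time, pi_0)
```

With many metastable states the same counting step yields a larger matrix whose leading eigenvalues give several implied time scales. Whether such a kinetic network or a single reaction coordinate is the better description is exactly the kind of question listed above.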