Machine Learning Augmented Sampling for the Molecular Sciences
CECAM-HQ-EPFL, Lausanne, Switzerland
Machine learning is rapidly becoming an indispensable tool for the computational sciences. As neural networks, generative modeling, and other machine learning techniques enter computational workflows, many important questions about efficiency, accuracy, and performance remain. Efforts to incorporate machine learning techniques to enhance and augment dynamics and sampling are among the most promising avenues for applications of machine learning in the computational molecular sciences; nevertheless, these techniques are still in their infancy. To date, research on this topic has followed two distinct paths: one route relies on generative neural networks to produce samples from a given target distribution, while a distinct strategy aims to enhance sampling or dynamics by biasing the dynamics. What is more, the literature on this topic is currently somewhat siloed: similar computational problems arise in applications in molecular dynamics and nonequilibrium statistical mechanics, quantum mechanics, high-energy physics, and Bayesian estimation, but there is not yet a robust dialogue among the participants in these different literatures.
With this proposed workshop, we seek to facilitate interactions among these areas and to accelerate the dissemination of machine learning strategies for sampling the high-dimensional and multimodal distributions that arise throughout the molecular sciences. The workshop will serve to recap recent advances across communities and reflect on the next steps to tackle several critical challenges in the field. Problems that arise in sampling physical distributions are among the major challenges on which we hope to focus:
- How do we learn non-local sampling procedures to target multiple modes? Currently, there are several strategies for using generative models to sample multimodal distributions, but there is no systematic understanding of the relative performance and contextual advantages of differing techniques.
- Which strategies can be used to leverage learning where no data is available a priori? In many molecular problems, we do not have a large initial training set with which to train a model, so seeking adaptive or active methodologies is a priority.
- What are good measures of success for sampling strategies involving some learning, and can we obtain performance guarantees? The objective functions used for training often do not suffice to evaluate the quality of sampling because they lack global information; we hope to examine strategies for rigorous evaluations and comparisons of algorithms.
- How do we ensure these methods scale to very high-dimensional, physically relevant systems? Most systems of biological or chemical relevance have many degrees of freedom; as such, we will aim to critically evaluate the scaling of machine learning strategies to high-dimensional spaces.
Giuseppe Carleo (EPFL) - Organiser
Juan P. Garrahan (University of Nottingham) - Organiser
Marylou Gabrié (Flatiron Institute) - Organiser
Grant Rotskoff (Stanford University) - Organiser