CECAM - Learning the Collective Variables of Biomolecular Processes

Classical molecular dynamics (MD) simulations of proteins are capable of describing biomolecular structure and dynamics at an atomistic level. This potential combined with advancements in computer hardware and algorithms has led to an ever-growing interest in simulations of increasing size and length. The interpretation of the resulting “Big Data” describing complicated multiscale molecular motion presents new challenges, as microsecond simulations may give rise to numerous complex conformational transitions that require careful statistical analysis [1].
To characterize the process of interest, it is common practice to choose some low-dimensional molecular observables, termed collective variables (CVs), and consider their mean or distribution (structural analysis) as well as their time evolution (dynamical analysis) [2,3]. In particular, CVs are used to represent the free energy landscape of the system, which reveals the relevant regions of low energy (corresponding to metastable states) as well as the barriers (accounting for transition states) between these regions [4–8]. By describing the molecule’s time evolution on this free energy surface via a Langevin equation, we can directly study the pathways of a biomolecular process [9,10]. Alternatively, we may employ some clustering method to identify the metastable states of the system [11–15], which define a Markov state model that describes protein dynamics in terms of memory-less jumps [16–19]. In a complementary way, CVs are often used in algorithms designed to enhance the sampling of rare events in simulations [20–26].
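As a minimal illustration of the free energy representation mentioned above, the sketch below estimates a one-dimensional profile F(s) = -kT ln P(s) from a histogram of CV samples. The function name `free_energy_profile`, the bin count, the reduced units (kT = 1), and the synthetic two-state data are all assumptions made for illustration; they are not taken from the workshop material.

```python
import numpy as np

def free_energy_profile(cv_samples, bins=50, kT=1.0):
    """Estimate F(s) = -kT ln P(s) from samples of a one-dimensional CV."""
    hist, edges = np.histogram(cv_samples, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    populated = hist > 0  # empty bins have an undefined (infinite) free energy
    free_energy = np.full(hist.shape, np.inf)
    free_energy[populated] = -kT * np.log(hist[populated])
    free_energy -= free_energy[populated].min()  # shift the global minimum to zero
    return centers, free_energy

# Synthetic stand-in for a CV time series (assumption): two metastable
# states centred near -1 and +1, separated by a sparsely sampled barrier.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-1.0, 0.2, 5000),
                          rng.normal(1.0, 0.2, 5000)])
centers, free_energy = free_energy_profile(samples)
```

In this toy setting the two minima of `free_energy` correspond to the metastable states and the region between them to the barrier; with real trajectories, the quality of the estimate depends on how well the barrier region is sampled.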
The construction of CVs requires a dimensionality reduction strategy that can be achieved using linear transformations (e.g., principal component analysis (PCA) [27,28] and time-lagged independent component analysis (TICA) [29]) or nonlinear maps such as kernel PCA, multidimensional scaling and diffusion maps [2,30,31]. Recently, a variety of machine learning approaches have been proposed for this purpose [23,25,26,32–34]. Moreover, adaptive schemes exist that iterate between exploring a CV space and using the new data to refine the reduced description [24,32]. While some of those methods are mature and well-known across many disciplines, their performance depends on the specifics of the molecular process at play, the choice of the coordinate subspace, and the available statistical sampling of the data. Conversely, other methods such as deep learning have only recently been introduced in the molecular simulation field and are still developing rapidly; so far, there are few real-world applications by which to gauge their usefulness.
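To make the simplest of the linear transformations above concrete, the following sketch computes PCA-based CVs by diagonalizing the covariance matrix of the input coordinates. The function name `pca_collective_variables`, the choice of two components, and the synthetic six-dimensional data are hypothetical; a real analysis would start from aligned Cartesian or internal coordinates of a trajectory.

```python
import numpy as np

def pca_collective_variables(coords, n_components=2):
    """Project frames onto the leading principal components.

    coords: array of shape (n_frames, n_features), e.g. flattened, aligned
    Cartesian coordinates or internal coordinates (hypothetical input).
    """
    centered = coords - coords.mean(axis=0)
    covariance = np.cov(centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)  # ascending order
    order = np.argsort(eigenvalues)[::-1][:n_components]    # largest variance first
    components = eigenvectors[:, order]
    explained = eigenvalues[order] / eigenvalues.sum()
    return centered @ components, explained

# Synthetic data (assumption): one large-amplitude motion along a fixed
# direction in a six-dimensional coordinate space, plus small isotropic noise.
rng = np.random.default_rng(1)
large_mode = rng.normal(0.0, 3.0, size=(2000, 1))
direction = np.array([[1.0, 1.0, 0.0, 0.0, 0.0, 0.0]]) / np.sqrt(2.0)
coords = large_mode @ direction + rng.normal(0.0, 0.1, size=(2000, 6))
cvs, explained = pca_collective_variables(coords)
```

Note that PCA ranks directions by variance, not by timescale, so the leading components are not guaranteed to capture the slowest motions; methods such as TICA instead use time-lagged correlations for that purpose.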
Bringing together experts from basic method development all the way to “real-world” applications, the workshop aims to cover the many aspects of the construction and application of CVs. In particular, we will discuss virtues and shortcomings as well as common pitfalls of CV approaches to faithfully represent free energy landscapes, to allow for accurate clustering and the construction of Markov state models, and to facilitate efficient enhanced sampling protocols.

The deadline for applications is set to March 31, 2019. Please send an abstract with your application. There will be space for posters. A small number of abstracts will be selected for contributed talks.


Location: INRIA Paris

Organisers

References