Learning the Collective Variables of Biomolecular Processes

July 10, 2019 to July 12, 2019
Location: INRIA Paris


  • Lucie Delemotte (KTH Science for Life Laboratory, Sweden)
  • Jérôme Hénin (CNRS, IBPC, France)
  • Gerhard Stock (Albert Ludwig University, Freiburg, Germany)
  • Tony Lelievre (INRIA and Ecole des Ponts ParisTech, France)


Laboratoire de Biochimie Théorique




Classical molecular dynamics (MD) simulations of proteins can describe biomolecular structure and dynamics at an atomistic level. This potential, combined with advances in computer hardware and algorithms, has led to an ever-growing interest in simulations of increasing size and length. Interpreting the resulting “Big Data”, which describe complicated multiscale molecular motion, presents new challenges, as microsecond simulations may give rise to numerous complex conformational transitions that require careful statistical analysis [1].
To characterize the process of interest, it is common practice to choose some low-dimensional molecular observables, termed collective variables (CVs), and consider their mean or distribution (structural analysis) as well as their time evolution (dynamical analysis) [2,3]. In particular, CVs are used to represent the free energy landscape of the system, which reveals the relevant regions of low free energy (corresponding to metastable states) as well as the barriers (accounting for transition states) between these regions [4–8]. By describing the molecule’s time evolution on this free energy surface via a Langevin equation, we can directly study the pathways of a biomolecular process [9,10]. Alternatively, we may employ a clustering method to identify the metastable states of the system [11–15], which define a Markov state model that describes protein dynamics in terms of memoryless jumps [16–19]. In a complementary way, CVs are often used in algorithms designed to enhance the sampling of rare events in simulations [20–26].
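As a minimal illustration of the two analyses mentioned above, the following Python sketch estimates a one-dimensional free energy profile from sampled CV values via F(x) = -k_B T ln P(x), and builds a row-normalized transition matrix from a trajectory of discrete state labels. The function names, the histogram-based density estimate, the bin count and the simple count-based estimator are assumptions made for illustration; they are not taken from any of the cited works, and production Markov state modeling uses more careful estimators and validation.

```python
import numpy as np

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def free_energy_profile(cv, temperature=300.0, bins=50):
    """Estimate F(x) = -k_B T ln P(x) from samples of a one-dimensional CV."""
    density, edges = np.histogram(cv, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        # Empty bins have zero density and therefore infinite free energy.
        free_energy = -K_B * temperature * np.log(density)
    # Shift the profile so that its lowest finite value is zero.
    free_energy -= free_energy[np.isfinite(free_energy)].min()
    return centers, free_energy


def transition_matrix(states, n_states, lag=1):
    """Count transitions i -> j at the given lag time and normalize each row."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (states[:-lag], states[lag:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of states that were never visited are left as zeros.
    return counts / np.where(row_sums == 0, 1.0, row_sums)


# Toy usage with synthetic data (not simulation output).
rng = np.random.default_rng(0)
cv_samples = np.concatenate([rng.normal(-1.0, 0.3, 5000),
                             rng.normal(1.0, 0.3, 5000)])
centers, profile = free_energy_profile(cv_samples)
toy_states = np.array([0, 0, 1, 1, 0, 1])
T = transition_matrix(toy_states, n_states=2)
```

In this toy setting the two Gaussian populations appear as two minima of the profile, and each row of T gives the estimated probabilities of jumping out of one state within one lag time.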
The construction of CVs requires a dimensionality reduction strategy, which can be achieved using linear transformations (e.g., principal component analysis (PCA) [27,28] and time-lagged independent component analysis (TICA) [29]) or nonlinear maps such as kernel PCA, multidimensional scaling and diffusion maps [2,30,31]. Recently, a variety of machine learning approaches have been proposed for this purpose [23,25,26,32–34]. Moreover, adaptive schemes exist that iterate between exploring a CV space and using the new data to refine the reduced description [24,32]. While some of these methods are mature and well known across many disciplines, their performance depends on the specifics of the molecular process at play, the choice of the coordinate subspace, and the available statistical sampling of the data. Conversely, other methods such as deep learning have only recently been introduced in the molecular simulation field and are still developing rapidly; there are as yet few real-world applications with which to gauge their usefulness.
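To make the difference between the two linear transformations concrete, here is a small NumPy sketch, under simplifying assumptions, of PCA (directions of largest variance) and TICA (directions of largest autocorrelation at a chosen lag time). The function names, the symmetrized covariance estimator, the regularization threshold and the synthetic two-coordinate data are all illustrative choices rather than the procedure of any cited reference.

```python
import numpy as np


def pca(X, n_components=2):
    """Project mean-free data onto the directions of largest variance."""
    X0 = X - X.mean(axis=0)
    cov = X0.T @ X0 / (len(X0) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return X0 @ eigvecs[:, order], eigvals[order]


def tica(X, lag, n_components=2):
    """Project mean-free data onto the directions of largest autocorrelation."""
    X0 = X - X.mean(axis=0)
    A, B = X0[:-lag], X0[lag:]
    # Symmetrized instantaneous and time-lagged covariance matrices.
    c0 = (A.T @ A + B.T @ B) / (2 * len(A))
    ct = (A.T @ B + B.T @ A) / (2 * len(A))
    # Solve ct v = lambda c0 v by whitening with c0^(-1/2).
    s, U = np.linalg.eigh(c0)
    keep = s > 1e-10 * s.max()  # drop near-singular directions
    W = U[:, keep] / np.sqrt(s[keep])
    lam, V = np.linalg.eigh(W.T @ ct @ W)
    order = np.argsort(lam)[::-1][:n_components]
    return X0 @ (W @ V[:, order]), lam[order]


# Synthetic data: a slow, small-amplitude coordinate and a fast, large one.
rng = np.random.default_rng(1)
n = 10000
noise = rng.normal(0.0, 0.1, n)
slow = np.zeros(n)
for t in range(1, n):
    slow[t] = 0.99 * slow[t - 1] + noise[t]  # strongly autocorrelated
fast = rng.normal(0.0, 3.0, n)               # uncorrelated in time
X = np.column_stack([slow, fast])

pcs, variances = pca(X, n_components=1)
tics, autocorrelations = tica(X, lag=10, n_components=1)
```

On this toy data the first principal component follows the fast, high-variance coordinate, whereas the first TICA component follows the slow one, which illustrates why the choice of dimensionality reduction method depends on the process under study.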
Bringing together experts from basic method development all the way to “real-world” applications, the workshop aims to cover the many aspects of the construction and application of CVs. In particular, we will discuss the virtues, shortcomings and common pitfalls of CV approaches when they are used to faithfully represent free energy landscapes, to allow for accurate clustering and the construction of Markov state models, and to facilitate efficient enhanced sampling protocols.

The deadline for applications is set to March 31, 2019.


1. D. E. Shaw et al., Atomic-level characterization of the structural dynamics of proteins, Science 330, 341 (2010).
2. M. A. Rohrdanz, W. Zheng, and C. Clementi, Discovering mountain passes via torchlight: Methods for the definition of reaction coordinates and pathways in complex macromolecular reactions, Annu. Rev. Phys. Chem. 64, 295 (2013).
3. B. Peters, Reaction coordinates and mechanistic hypothesis tests, Annu. Rev. Phys. Chem. 67, 669 (2016).
4. P. G. Bolhuis, C. Dellago, and D. Chandler, Reaction coordinates of biomolecular isomerization, Proc. Natl. Acad. Sci. USA 97, 5877 (2000).
5. A. K. Faradjian and R. Elber, Computing time scales from reaction coordinates by milestoning, J. Chem. Phys. 120, 10880 (2004).
6. R. B. Best and G. Hummer, Reaction coordinates and rates from transition paths, Proc. Natl. Acad. Sci. USA 102, 6732 (2005).
7. S. V. Krivov and M. Karplus, Diffusive reaction dynamics on invariant free energy profiles, Proc. Natl. Acad. Sci. USA 105, 13841 (2008).
8. W. E and E. Vanden-Eijnden, Transition-path theory and path-finding algorithms for the study of rare events, Annu. Rev. Phys. Chem. 61, 391 (2010).
9. O. F. Lange and H. Grubmuller, Collective Langevin dynamics of conformational motions in proteins, J. Chem. Phys. 124, 214903 (2006).
10. N. Schaudinnus, B. Bastian, R. Hegger, and G. Stock, Multidimensional Langevin modeling of nonoverdamped dynamics, Phys. Rev. Lett. 115, 050602 (2015).
11. B. Keller, X. Daura, and W. F. van Gunsteren, Comparing geometric and kinetic cluster algorithms for molecular simulation data, J. Chem. Phys. 132, 074110 (2010).
12. A. Rodriguez and A. Laio, Clustering by fast search and find of density peaks, Science 344, 1492 (2014).
13. F. Sittel and G. Stock, Robust density-based clustering to identify metastable conformational states of proteins, J. Chem. Theory Comput. 12, 2426 (2016).
14. L. Martini, A. Kells, R. Covino, G. Hummer, N.-V. Buchete, and E. Rosta, Variational identification of Markovian transition states, Phys. Rev. X 7, 031060 (2017).
15. A. M. Westerlund and L. Delemotte, Effect of Ca2+ on the promiscuous target-protein binding of calmodulin, PLoS Comput. Biol. 14, e1006072 (2018).
16. J. D. Chodera, W. C. Swope, J. W. Pitera, and K. A. Dill, Obtaining long-time protein folding dynamics from short-time molecular dynamics simulations, Multiscale Modeling & Simulation 5, 1214 (2006).
17. G. R. Bowman, K. A. Beauchamp, G. Boxer, and V. S. Pande, Progress and challenges in the automated construction of Markov state models for full protein systems, J. Chem. Phys. 131, 124101 (2009).
18. J.-H. Prinz, H. Wu, M. Sarich, B. Keller, M. Senne, M. Held, J. D. Chodera, C. Schutte, and F. Noe, Markov models of molecular kinetics: generation and validation, J. Chem. Phys. 134, 174105 (2011).
19. G. R. Bowman, V. S. Pande, and F. Noe, An Introduction to Markov State Models, Springer, Heidelberg, 2013.
20. C. Chipot and A. Pohorille, Free Energy Calculations, Springer, Berlin, 2007.
21. G. Fiorin, M. L. Klein, and J. Henin, Using collective variables to drive molecular dynamics simulations, Mol. Phys. 111, 3345 (2013).
22. G. A. Tribello, M. Bonomi, D. Branduardi, C. Camilloni, and G. Bussi, PLUMED 2: New feathers for an old bird, Comput. Phys. Commun. 185, 604 (2014).
23. R. Galvelis and Y. Sugita, Neural network and nearest neighbor algorithms for enhancing sampling of molecular dynamics, J. Chem. Theory Comput. 13, 2489 (2017).
24. E. Chiavazzo, R. Covino, R. R. Coifman, C. W. Gear, A. S. Georgiou, G. Hummer, and I. G. Kevrekidis, Intrinsic map dynamics exploration for uncharted effective free-energy landscapes, Proc. Natl. Acad. Sci. USA 114, E5494 (2017).
25. M. M. Sultan, H. K. Wayment-Steele, and V. S. Pande, Transferable neural networks for enhanced sampling of protein dynamics, J. Chem. Theory Comput. 14, 1887 (2018).
26. J. M. L. Ribeiro, P. Bravo, Y. Wang, and P. Tiwary, Reweighted autoencoded variational Bayes for enhanced sampling (RAVE), J. Chem. Phys. 149, 072301 (2018).
27. A. Amadei, A. B. M. Linssen, and H. J. C. Berendsen, Essential dynamics of proteins, Proteins 17, 412 (1993).
28. A. Altis, M. Otten, P. H. Nguyen, R. Hegger, and G. Stock, Construction of the free energy landscape of biomolecules via dihedral angle principal component analysis, J. Chem. Phys. 128, 245102 (2008).
29. G. Perez-Hernandez, F. Paul, T. Giorgino, G. De Fabritiis, and F. Noe, Identification of slow molecular order parameters for Markov model construction, J. Chem. Phys. 139, 015102 (2013).
30. P. Das, M. Moll, H. Stamati, L. E. Kavraki, and C. Clementi, Low-dimensional, free-energy landscapes of protein-folding reactions by nonlinear dimensionality reduction, Proc. Natl. Acad. Sci. USA 103, 9885 (2006).
31. B. Hashemian, D. Millan, and M. Arroyo, Modeling and enhanced sampling of molecular systems with smooth and nonlinear data-driven collective variables, J. Chem. Phys. 139, 214101 (2013).
32. W. Chen, A. R. Tan, and A. L. Ferguson, Collective variable discovery and enhanced sampling using autoencoders: Innovations in network architecture and error function design, J. Chem. Phys. 149, 072312 (2018).
33. A. Mardt, L. Pasquali, H. Wu, and F. Noe, VAMPnets for deep learning of molecular kinetics, Nat. Commun. 9, 5 (2018).
34. S. Brandt, F. Sittel, M. Ernst, and G. Stock, Machine learning of biomolecular reaction coordinates, J. Phys. Chem. Lett. 9, 2144 (2018).