Data Modeling and Computation: Capturing Biomolecular Processes
CECAM-HQ-EPFL, Lausanne, Switzerland
Understanding the molecular basis of life requires synergy among molecular simulations, experiments, data analysis, and theory. There is, however, a significant disconnect among these disciplines: for example, oversimplified phenomenological theories are commonly used to analyze both experimental and simulation data. This workshop is motivated by recent theoretical, computational, and experimental advances that offer novel opportunities for connecting these disciplines. These advances include achieving overlap of experimental and simulation timescales [Yoo 2020, Makarov 2021, Mazal 2019, Netz 2021, Elber 2020], development of data-driven computational approaches to single-molecule signals [Tavakoli 2020, Kilic 2021, Yoo 2020, Mazal 2019, Komatsuzaki 2019], advances in machine learning and its applications to molecular data [Glielmo 2021, Noe 2020, Rotskoff 2018], and breakthroughs in the fundamental theory of molecular kinetics and stochastic dynamics [Satija 2020, Best 2016, Thorneywork 2020, Elber 2020, Lapolla 2021].
Tools from data science, including neural networks, Bayesian and likelihood-based methods, and even information theory, collectively offer new approaches to longstanding challenges in biomolecular dynamics. This workshop will promote synergy between those who simulate the dynamics of molecules and those who infer their behavior from raw data, thereby facilitating interaction between computational scientists and experimentalists who can capture dynamics at state-of-the-art spatiotemporal scales. The timescales accessible to molecular simulations remain severely limited from above, while both the spatial and temporal resolution of single-molecule data remain severely limited from below. This workshop is specifically intended to bridge this gap and to leverage novel tools to unravel life at the length scale (single molecule) and across the timescales at which it occurs. Specific open questions and focus themes that will be explored in the workshop are as follows:
- Can thermodynamic and kinetic properties of a molecular system be reliably estimated from molecular trajectories (simulated or experimental), given the sparse sampling imposed by computational limitations and experimental constraints (e.g., because a molecular motor detaches from its track or a dye photobleaches)? Moreover, can accurate low-dimensional models of the stochastic evolution of single-molecule signals be inferred from such sparse data? Such models will likely need to go beyond currently used phenomenological models (e.g., one-dimensional diffusion along a reaction coordinate) to capture, say, the complex assembly of many biomolecular actors.
- Recent studies [Thorneywork 2020, Satija 2020] indicate that distributions of first-passage times for molecular transitions contain information about the number of kinetic intermediates and the underlying dimensionality of the process. This motivates the following, more general inverse problem: what information about the underlying high-dimensional molecular processes is encoded in the observed low-dimensional signals (often just single-photon arrivals)?
- What are optimal ways of combining data from molecular simulations and experiments?
- How can we combine neural network approaches with more traditional computational statistics approaches, given their relative strengths and weaknesses, to unravel biomolecular events down to microsecond timescales?
Irina Gopich (National Institutes of Health) - Organiser
Dmitrii Makarov (University of Texas) - Organiser
Steve Presse (Arizona State University) - Organiser