Challenges in Large Scale Biomolecular Simulations 2019: Bridging Theory and Experiments
- Samuela Pasquali (Paris Descartes University, France)
- Ivan Coluzza (Center for Cooperative Research in Biomaterials, Spain)
- Fabio Sterpone (IBPC and University Paris, France)
- Leulliot Nicolas (Université Paris Descartes, France)
- Yassmine Chebaro (CNRS, Institut de Génétique et de Biologie Moléculaire et Cellulaire, Strasbourg, France)
- Tamar Schlick (New York University, USA)
- Elisa Frezza (Université Paris Descartes, France)
Motivation and novelty of the proposal
Deciphering the molecular mechanisms that govern the progression of cancers and other common diseases represents a grand challenge for science. Understanding such mechanisms also holds great promise for improving our quality of life and for curing or alleviating these diseases. These mechanisms involve proteins, nucleic acids, and other biological molecules that are the cell's workhorses. This extremely complex problem presents characteristic length scales spanning several orders of magnitude, from small molecules regulating cell functions to cell ensembles responsible for the generation of tissues and organs working together as an organism [Lodish 2010, Wright 2015, Zhang 2017]. Moreover, essential processes in biology are carried out by large macromolecular assemblies (such as protein-nucleic acid complexes) whose structures are often difficult to determine by traditional methods (X-ray crystallography and NMR). To overcome these limitations, more advanced theoretical and experimental approaches are needed. In the last decade, researchers have increasingly developed integrative approaches combining information from different types of experiments, physical theories, and statistical analyses to compute structural models of large biological macromolecules and their assemblies [Albert 2007, Wang 2009, Kim 2018]. All this suggests that a stronger link between theoreticians, computational chemical physicists, bioinformaticians and experimentalists is highly desirable to build robust methods capable of enhancing the understanding of the cell’s functioning. This workshop aims to bring together experts from these fields to provide a wide and up-to-date overview of current bioinformatics tools and simulation techniques, and to present the most recent experimental results and methods.
The meeting will be an opportunity to build an interdisciplinary community to bring new insights into complex biological systems and to boost the development of an exchange program between Europe and the USA in the framework of this consortium.
State of the art
The essential challenge posed by human health requires the understanding of the cell’s machinery at a molecular level. The interplay among proteins, DNA and RNA is key for vital functions such as DNA transcription, translation and epigenetics. To understand these processes, many experimental techniques are employed, spanning a very wide range of spatial resolution, temporal resolution and level of detail with which they can observe the macromolecules. This is necessary if we consider that DNA alone spans 9 orders of magnitude in space, with regulating mechanisms occurring from the level of single base pairs all the way up to chromosomes, and with times ranging from picoseconds for base-pair formation to hours for large structural rearrangements such as those of G-quadruplexes on the telomeres of chromosomes.
As in all branches of science, theoretical (modeling) and experimental approaches have been developed over the years to study these systems and, unsurprisingly, the most successful strategies are those in which the two approaches come together to give a full picture of the system [Lasker 2012, Pérard 2013]. Indeed, because of the diversity and complementarity of the experimental techniques, molecular modeling becomes a necessary tool to decode experimental data, bridging different sources of information and building a coherent structural model compatible with experiments.
From the modeling perspective, the starting point for understanding a molecular structure, and for obtaining hints about its function, is the molecule's sequence. Over the last 30 years a multitude of bioinformatic tools have been developed to exploit this information to infer protein and nucleic acid structures [Rother 2011, Webb 2016]. These methods, however, are based on relatively simple and empirical scoring functions and reach their limits for large and complex molecules [Miao 2017, Lensink 2018]. Physical models, on the other hand, provide a more realistic picture of the molecule and, despite being more computationally expensive, are better suited for the study of large, complex systems. Once more, the combination of the two approaches is often beneficial [Lasker 2012, Olsson 2017], especially if the bioinformatics model, the physical model, or both are able to incorporate experimental data from the start.
Current bioinformatics methods analyse the large amount of protein and nucleic acid sequence evolution data, searching for conservation or correlation patterns [Berman 2000, Cheng 2015, Finn 2016, Ho 2012, Lever 2010, McGinnis 2004, Marks 2012]. The significance of amino acid and nucleotide evolutionary covariance is based on the hypothesis that mutations of interacting residues are correlated: a single point mutation at an interacting position would not conserve the molecule's stability, so compensating alterations must occur together among the interacting residues [Kortemme 2004, Pires 2017]. Co-evolution events could involve residues that are crucial for the activity of a protein (e.g. catalytic site residues), for the stability of the native structure (e.g. hydrophobic core residues) or in some cases for both. For single-stranded nucleic acids, co-evolution has been the basis for inferring the secondary structures of large ribosomal RNAs, and it is commonly used to propose possible RNA secondary structures.
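As an illustration of the covariance idea described above, the following minimal sketch scores how strongly two columns of a multiple sequence alignment vary together, using mutual information. It is not the method of any of the cited works: the function name `column_mutual_information` and the four-sequence toy alignment are assumptions made purely for this example, and practical co-evolution analyses apply further corrections (for instance for phylogenetic bias and finite sampling) that are omitted here.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `msa` is a list of equal-length aligned sequences (hypothetical input).
    """
    n = len(msa)
    counts_i = Counter(seq[i] for seq in msa)            # single-column counts
    counts_j = Counter(seq[j] for seq in msa)
    counts_ij = Counter((seq[i], seq[j]) for seq in msa)  # joint counts
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        ratio = p_ab * n * n / (counts_i[a] * counts_j[b])
        mi += p_ab * math.log2(ratio)
    return mi


# Toy alignment (illustrative only): positions 0 and 4 always form a
# complementary pair, while positions 1 and 2 never change.
toy_msa = ["GCAUC", "CCAUG", "ACAUU", "UCAUA"]

covarying = column_mutual_information(toy_msa, 0, 4)
conserved = column_mutual_information(toy_msa, 1, 2)
```

In this toy case the covarying pair of columns receives a high score, while the fully conserved columns receive a score of zero: conservation alone carries no covariance signal, which is why conservation and correlation patterns are analysed as separate sources of information.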
For nucleic acids, other bioinformatic methods based on the nearest-neighbor thermodynamic model [SantaLucia 1998] are used to propose secondary structures for smaller systems [Turner 2010, Chou 2016], nowadays accounting for chemical probing reactivity maps [Low 2010] that test experimentally whether or not a nucleotide is involved in a base pair.
Bioinformatic methods therefore contribute to the understanding of a biomolecule by providing a substantial reduction of the conformational space to be explored, based on experimental data obtained either from sequence analysis or from direct structural probing. This greatly simplifies the task of physical modeling, whose main drawback is the extremely large conformational space to be explored.
From the early days, physical modeling has played an important role in the two principal experimental methods for high-resolution structure determination: X-ray crystallography, requiring an initial model for phasing, and nuclear magnetic resonance (NMR), requiring a multi-dimensional minimization process on a model to infer possible structures.
With the current capabilities of molecular dynamics (MD) simulations, the contribution of modeling can now go much further than structure refinement. MD simulations have evolved from the first 1-microsecond simulation of the villin headpiece in 1998 [Duan 1998] to current simulations of much larger biomolecular systems (e.g., an entire satellite tobacco mosaic virus with one million atoms [Freddolino 2006]) and of longer time frames: over 1 microsecond for systems such as a B-DNA dodecamer [Pérez 2007], ubiquitin [Maragakis 2008], and the beta2 AR protein receptor [Dror 2009], and 1 millisecond for small proteins with specialized MD programs and dedicated supercomputers [Shaw 2010]. For some proteins, fully atomistic folding simulations can be very successful [Day 2010, Freddolino 2009, Voelz 2010], and similarly for nucleic acids, double helical DNA in particular [Schlick 2009, Clauvelin 2015, Collepardo 2015]. At the same time, coarse-grained models and combinations of enhanced sampling methods are emerging as viable alternatives for simulating complex biomolecular systems [Coluzza 2014, Lei 2007, Maisuradze 2010, Klein 2008, Schlick 2009]. Coarse-graining at various scales has made it possible to address fundamental questions in protein folding with applications to diseases such as Alzheimer's [Sterpone 2014], RNA folding [Yesselman 2016, Denesyuk 2013, Cragnolini 2015], DNA assemblies and topologies [Ouldridge 2010], protein-protein interactions [Baaden 2013], DNA chromatin structure and condensation [Grigoryev 2016, Collepardo 2015, Bascom 2016, Bascom 2018], and many others. The quantitative accuracy reached at all levels of description allows information to flow from the atomistic detail up to complex simulations of cellular mechanisms performed with event-driven algorithms, and up to simulations of whole cells [Dans 2016].
In recent years all these modeling and simulation techniques have started to be coupled to experimental data in order to obtain an understanding of biomolecular systems from an atomistic description all the way up to the meso-scale. For example, simulations have been used to obtain an atomic resolution structure from data coming from low-resolution techniques such as Small Angle X-ray Scattering (SAXS) or Cryo-Electron Microscopy (Cryo-EM) [Lasker 2012, Kim 2018], or to make sense of reactivity maps from SHAPE and other chemical probing data for single-stranded RNA molecules [Pinamonti 2015, Kirmizialtin 2015].
Previous workshops have highlighted three main areas of research in relation to the simulation of large biomolecules:
· Model building
It comprises the development of models at different scales from atomistic to mesoscopic.
At present, atomistic force fields for proteins appear to have reached a satisfactory level and are indeed used for long simulations of large systems, while nucleic acid force fields are still an active area of development [Bergonzo 2015, Ivani 2016, Šponer 2018], in particular for the study of systems departing from double helical DNA. A variety of coarse-grained models of different resolutions have been developed for both proteins and nucleic acids for folding and rational design [Coluzza 2014, Collepardo 2015, Rao 2017, Ozer 2015]. Similarly, mesoscopic models are able to address the dynamics of proteins such as molecular motors as a whole, adopting a continuum description of the system, or to study the properties of long stretches of DNA [Hanson 2015]. Winning strategies in model building integrate different levels of description of the system into multi-scale simulations [Sterpone 2018].
Simulations of large macromolecular objects require the use and further development of enhanced sampling techniques [Laio 2002, Nguyen 2013, Sugita 1999]. When the initial and final states are known, path sampling and biased dynamics are efficient tools to study the transition and unveil transition pathways, kinetic barriers and metastable states [Cazals 2015, Joseph 2017]. Experimental information can also be integrated into simulations, in particular by coarse-grained and mesoscopic models, limiting the space to be explored to experimentally compatible conformations [Pitera 2012, White 2014]. Lately, simulations focus more on generating ensembles of conformations rather than on obtaining a single structural prediction, exploiting their ability to generate a multitude of possible states for a given system to be compared to the different experimental data.
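One of the enhanced sampling schemes cited above is temperature replica exchange [Sugita 1999], in which copies of the system are simulated at different temperatures and neighbouring copies periodically attempt to swap configurations. The sketch below shows only the Metropolis acceptance rule for such a swap, under stated assumptions: the function name, the kcal/mol energy unit and the numerical values are illustrative choices for this example and are not taken from any particular simulation package.

```python
import math

# Boltzmann constant in kcal/(mol*K); the unit choice is an assumption
# and must match the units of the potential energies passed in.
K_B = 0.0019872041


def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for swapping the configurations
    of two replicas held at temperatures temp_i and temp_j (in kelvin).

    energy_i and energy_j are the current potential energies of the
    configurations in replica i and replica j, respectively.
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    # Log of the ratio of Boltzmann weights after and before the swap.
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if exponent >= 0.0 else math.exp(exponent)


# Illustrative values only: the colder replica currently holds the
# higher-energy configuration, so the swap is always accepted.
p_favourable = exchange_probability(-100.0, 300.0, -105.0, 310.0)

# Reversed energies: the swap is accepted only with some probability.
p_unfavourable = exchange_probability(-105.0, 300.0, -100.0, 310.0)
```

The rule depends only on the energy difference and the difference in inverse temperatures, so closely spaced temperatures give higher acceptance rates; this is one reason why the number of replicas needed grows with system size, a practical concern for the large systems discussed here.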
· Analysis tools
All the non-standard models and methods described above require a specific treatment of the data they generate through trajectory descriptors, order parameters, and topology and architecture descriptors [Humphrey 1996]. New technologies open the way to innovative tools for analyzing simulation data, through the interplay between state-of-the-art visualization tools (3D, virtual reality,...) [Doutreligne 2015, Mazzanti 2017] and embedded analysis, making it possible to integrate simulation and experimental data on a single platform [http://www.baaden.ibpc.fr/umol/].
At this stage, the interplay between experiments and simulations opens up more opportunities than ever before. As both experimental and simulation methods are dramatically increasing the amount of data they can generate in a short time, it is necessary to build robust methods capable of exploiting this information and going beyond proofs of principle and ad hoc developments for specific systems or techniques.
Concluding discussions of simulation meetings often highlight the need to tighten the links between experiments and simulations. Theoreticians and experimentalists rarely have the chance to come together and exchange their points of view on the common problem of large molecular systems. With this workshop, we intend to create a long-lasting forum for discussion, making simulations a tool available to experimentalists both through collaborative efforts and through the development of new integrative software.
[Albert 2007] F. Alber, S. Dokudovskaya, L. Veenhoff, W. Zhang, J. Kipper, D. Devos, A. Suprapto, O. Karni-Schmidt, R. Williams, B. Chait, et al. Nature 450:695 (2007).
[Baaden 2013] M. Baaden and S. J. Marrink, Current Opinion in Structural Biology 23:878 (2013).
[Bascom 2016] G. Bascom, K. Sanbonmatsu, and T. Schlick, J. Phys. Chem. B, Special Issue: J. Andrew McCammon Festschrift, 120:8642-8653 (2016).
[Bascom 2018] G. Bascom and T. Schlick, Special Memorial issue dedicated to Jorg Langowski, Biophys. J. 114: 2376 (2018).
[Bergonzo 2015] C. Bergonzo, N. M. Henriksen, D. R. Roe, T. E. Cheatham, RNA 21:1578 (2015)
[Berman 2000] Berman, H. M et al. Nucleic Acids Research, 28(1), 235–242 (2000)
[Chou 2016] F.-C. Chou, W. Kladwang, K. Kappel, R. Das Proc. Natl. Acad. Sci. USA 113:8430 (2016)
[Clauvelin 2015] N. Clauvelin and W. K. Olson, Biophysical Journal 108:399a (2015).
[Cazals 2015] F. Cazals et al., J. Comp. Chem. 36:16 (2015)
[Cheng 2014] Cheng, R. R., Morcos, F., Levine, H., & Onuchic, J. N. PNAS 111, E563–E571 (2014)
[Cheng 2015] Cheng, C.Y., Chou, F.-C., and Das, R. (2015), Methods in Enzymology 553:35-64
[Collepardo 2015] Collepardo-Guevara R, Portella G, Vendruscolo M, Frenkel D, Schlick T, Orozco M., JACS 137:10205 (2015)
[Coluzza 2014] I. Coluzza PLoS One 9(12):e112852. (2014)
[Cragnolini 2015] T. Cragnolini, Y. Laurin, P. Derreumaux, and S. Pasquali, JCTC 11:3510 (2015)
[Dans 2016] P. D. Dans, J. Walther, H. Gomez, M. Orozco Curr. Opin. Struct. Biol. 37:29 (2016)
[Day 2010] R. Day, D. Paschek, and A. E. Garcia, Proteins: Structure, Function, and Bioinformatics 78:1889 (2010).
[De Juan 2013] De Juan, D., Pazos, F., & Valencia, A. Nature Reviews Genetics, 14(4), 249 (2013)
[Denesyuk 2013] N. A. Denesyuk and D. Thirumalai, J. Phys. Chem. B 117:4901 (2013)
[Doutreligne 2015] S. Doutreligne, C. Gageat, T. Cragnolini, A. Taly, S. Pasquali, P. Derreumaux, M. Baaden, Virtual and Augmented Reality for Molecular Science, 2015 IEEE 1st International Workshop on. IEEE. 1–6. (2015)
[Dror 2009] R. O. Dror, D. H. Arlow, D. W. Borhani, M. Ø Jensen, S. Piana, and D. E. Shaw, PNAS 106:4689 (2009).
[Duan 1998] Y. Duan and P. A. Kollman, Science 282:740 (1998).
[Freddolino 2006] P. L. Freddolino, A. S. Arkhipov, S. B. Larson, A. McPherson, and K. Schulten, Structure 14:437 (2006).
[Freddolino 2009] P. L. Freddolino and K. Schulten, Biophysical journal 97:2338 (2009).
[Finn 2016] Finn et al. Nucleic Acids Research, 44(D1), D279–D285 (2016)
[Grigoryev 2016] S. Grigoryev, G. Bascom, J. M. Buckwalter, M. Schubert, C. L. Woodcock, and T. Schlick, PNAS 113:1238-1243 (2016).
[Hales 2017] J. Haleš, A. Héliou, J. Maňuch, Y. Ponty, and L. Stacho, Algorithmica, Springer Verlag, 79:835 (2017)
[Ho 2012] Ho, B. K., Perahia, D., & Buckle, A. M. Current Opinion in Structural Biology, 22(3), 386–393 (2012)
[Humphrey 1996] Humphrey, W., Dalke, A. and Schulten, K., J. Molec. Graphics, 14:33 (1996)
[Ivani 2016] I. Ivani et al. Nature Methods 13:55 (2016)
[Jack 1978] A. Jack and M. Levitt, Acta Cryst. A 34:931 (1978).
[Joseph 2017] J. A. Joseph, K. Röder, D. Chakraborty, R. G. Mantell, D. J. Wales, Chemical Communications 53:6974 (2017)
[Kalinin 2012] S. Kalinin, T. Peulen, S. Sindbert, P.J. Rothwell, S. Berger, T. Restle, R.S. Goody, H. Gohlke, C.A.M. Seidel Nat. Meth. 9:1218 (2012)
[Kim 2018] S. J. Kim et al. Nature 555:475 (2018)
[Kirmizialtin 2015] S. Kirmizialtin, S. P. Hennelly, A. Schug, J. N. Onuchic, K. Y. Sanbonmatsu Methods Enzymol. 553:15 (2015)
[Klein 2008] M. L. Klein and W. Shinoda, Science 321:798 (2008).
[Konnert 1980] J. H. Konnert and W. A. Hendrickson, Acta Cryst. A 36:344 (1980).
[Kortemme 2004] Kortemme, T., Joachimiak, L. A., Bullock, A. N., Schuler, A. D., Stoddard, B. L., & Baker, D. Nature Structural & Molecular Biology, 11(4), 371–379 (2004)
[Laio 2002] A. Laio and M. Parrinello, PNAS 99:12562 (2002)
[Lasker 2012] K. Lasker, F. Forster, S. Bohn, T. Walzthoeni, E. Villa, P. Unverdorben, F. Beck, R. Aebersold, A. Sali, and W. Baumeister, Proc. Natl. Acad. Sci. USA 109:1380 (2012)
[Lei 2007] H. Lei and Y. Duan, Current opinion in structural biology 17:187 (2007).
[Lensink 2018] M. F. Lensink, S. Velankar, M. Baek, L. Heo, C. Seok, S. J. Wodak Proteins 86:257 (2018)
[Lever 2010] Lever, E., & Sheer, The Journal of Pathology, 220, 114–125 (2010)
[Lodish 2010] H. Lodish, A. Berk, S. L. Zipursky, P. Matsudaira, D. Baltimore, J. Darnell Molecular Cell Biology, New York: W.H. Freeman (2000).
[Low 2010] J. T. Low, K. M. Weeks Methods 52:150 (2010)
[Maisuradze 2010] G. G. Maisuradze et al., J. Phys. Chem. A 114:4471 (2010).
[Maragakis 2008] P. Maragakis et al. The Journal of Physical Chemistry B 112:6155 (2008).
[Marks 2012] Marks, D. S., Hopf, T. A., & Sander, C. Nature Biotechnology, 30(11), 1072 (2012).
[Mazzanti 2017] L. Mazzanti, S. Doutreligne, C. Gageat, P. Derreumaux, A. Taly, M. Baaden, S. Pasquali, Biophys J. 2017 pii: S0006-3495 (2017)
[McCammon 1977] J. A. McCammon, B. R. Gelin, and M. Karplus, Nature 267:585 (1977).
[McGinnis 2004] McGinnis, S., & Madden, T. L. Nucleic Acids Research, 32(Web Server issue), 20–25 (2004)
[Miao 2017] Z. Miao RNA 23:655 (2017)
[Nguyen 2013] P.H. Nguyen, Y. Okamoto, and P. Derreumaux, J. Chem. Phys. 138:061102 (2013)
[Noé 2009] F. Noé, C. Schütte, E. Vanden-Eijnden, L. Reich, and T. R. Weikl, PNAS 106:19011 (2009).
[Olsson 2017] S. Olsson, H. Wu, F. Paul, C. Clementi, F. Noé Proc. Natl. Acad. Sci. USA 114:8265 (2017)
[Ouldridge 2010] T. E. Ouldridge, A. A. Louis, and J. P. K. Doye, Physical Review Letters 104:178101 (2010).
[Ozer 2015] G. Ozer, A. Luque and T. Schlick, Curr. Opin. Struc. Biol. 31:124-139 (2015).
[Pérard 2013] J. Pérard, C. Leyrat, F.Baudin, E. Drouet, M. Jamin Nat. Comm. 4:1612 (2013)
[Pérez 2007] A. Pérez, F. J. Luque, and M. Orozco, JACS 129:14739 (2007).
[Pinamonti 2015] G. Pinamonti, S. Bottaro, C. Micheletti, G. Bussi Nuc. Acid Res. 43:7260 (2015)
[Pires 2017] D. E. V. Pires, D. B. Ascher Nucleic Acids Res. 45:W241 (2017)
[Pitera 2012] J. W. Pitera, J. D. Chodera, J. Chem. Theory Comput., 8:3445 (2012)
[Rother 2011] K. Rother, M. Rother, M. Boniecki, T. Puton, and J. M. B J. Mol. Model., 17:2325 (2011).
[Rao 2017] S. S. P. Rao et al. Cell 171:305-320 (2017).
[SantaLucia 1998] J. SantaLucia Proc. Natl. Acad. Sci. USA 95:1460 (1998)
[Schlick 1992] T. Schlick and W. K. Olson, Science 257:1110-1115 (1992).
[Schlick 2009] T. Schlick, F1000 biology reports 1:51 (2009).
[Shaw 2010] D. E. Shaw et al., Science 330:341 (2010).
[Šponer 2018] J. Šponer, G. Bussi, M. Krepl, P. Banáš, S. Bottaro, R. A. Cunha, A. Gil-Ley, G. Pinamonti, S. Poblete, P. Jurečka, N. G. Walter, M. Otyepka Chem. Rev. 118:4177 (2018)
[Sterpone 2014] F. Sterpone, S. Melchionna, P. Tuffery, S. Pasquali, N. Mousseau, T. Cragnolini, Y. Chebaro, J. St-Pierre, M. Kalimeri, A. Barducci, et al, Chemical Society reviews 43:4871 (2014).
[Sterpone 2018] F. Sterpone, S. Doutreligne, T. T. Tran, S. Melchionna, M. Baaden, P. H. Nguyen, P. Derreumaux Biochem. Biophys. Res. Co. 498:296 (2018)
[Sugita 1999] Y. Sugita and Y. Okamoto, Chemical Physics Letters. 314:141 (1999)
[Turner 2010] D. H. Turner, D. H. Mathews Nucleic Acids Res. 38:D280 (2010)
[Voelz 2010] V. A. Voelz, G. R. Bowman, K. Beauchamp, and V. S. Pande, JACS 132:1526 (2010).
[Wang 2009] H. W. Wang et al. Nat Struct Mol Biol. 16:1148 (2009)
[Warshel 1976] A. Warshel and M. Levitt, Journal of molecular biology 103:227 (1976).
[White 2014] A. D. White and G. A. Voth, J. Chem. Theory Comput. 10:3023 (2014)
[Webb 2016] B. Webb, A. Sali Curr. Prot. Protein Science 86:2.9.1 (2016)
[Wright 2015] P. E. Wright, H. J. Dyson Nat Rev Mol Cell Biol. 16:18 (2015)
[Yesselman 2016] J.D. Yesselman and R. Das, Methods in Molecular Biology 1490:187 (2016)
[Zhang 2017] W. Zhang, M. Ben-David and S. S. Sidhu, Curr Opin Struct Biol 45:25 (2017).