Frontiers of computational biomolecular spectroscopy and mass spectrometry
Forschungszentrum Jülich, Germany
In recent years, impressive methodological advances have pushed computational structural biology towards ever larger time and length scales, bridging the gap to the relevant experimental scales. The applicability of computational spectroscopy has been continuously expanded towards increasingly complex problems in molecular biology and medicine. At the same time, the importance and necessity of understanding biological problems from a physical perspective has been recognized. This challenging task often requires an accurate description of the interactions at the atomistic level, based on quantum mechanical laws. Meanwhile, experimental static and time-resolved spectroscopies are progressively enhancing their capabilities to reveal new insights into molecular properties and biochemical processes in unprecedented detail. Conversely, the increasing complexity of problems addressed in molecular biology and medicine poses severe challenges to the corresponding computational approaches, which must reconcile requirements on accuracy, length scale and time scale simultaneously.
This CECAM workshop is aimed at bridging the gap between quantum biology and life sciences. To this end we intend to bring together active researchers in the field of computational biomolecular spectroscopy. As the experimental validation of computational predictions is of crucial importance, each topical session will be complemented by experimentalists employing advanced spectroscopic techniques. Prospects and limitations of current methodologies will be discussed to define requirements for new developments to successfully address upcoming challenges for computational spectroscopy in molecular biology and medicine.
The elucidation of structural, dynamical and thus mechanistic properties of biomolecular systems and processes mostly rests on the application of X-ray scattering in conjunction with a variety of spectroscopic techniques. In order to arrive at accurate information, experiment and theory have established a symbiotic relationship. The interpretation of experimental signals and their relationship with structure and dynamics inevitably relies on simulations. Vice versa, computational biology requires initial input as well as validation against experiments.
Building on methodological progress and advances in hardware technology, computational biology is increasingly capable of bridging the gap between experimentally relevant and computationally accessible length and time scales. Recently, the length scale of fully atomistic simulations has been extended to ~100 million atoms, approaching entire cell organelles [Ma11]. Simulation times, previously limited to a few microseconds, have recently been extended by about two orders of magnitude to the millisecond range for all-atom classical simulations of protein folding and conformational interconversion of medium-sized proteins [Sh10, Li11]. This progress has been achieved by designing a special-purpose machine for molecular dynamics (Anton) [Sh10]. Access to the millisecond time scale opens the opportunity to address outstanding questions in biomedicine such as protein folding or protein-protein interactions.
Computational spectroscopy is well established in chemistry. For small to medium-sized molecules, NMR, (harmonic) IR and Raman vibrational spectra are nowadays routinely simulated with sufficient accuracy. The calculation of optical absorption and fluorescence spectra requires more advanced methods and analyses, but has also become common practice, at least for comparably small systems, although it is often restricted to gas phase calculations.
Spectroscopic signals usually depend critically on the environment due to specific interactions as well as conformational flexibility. Understanding how the environment of spectroscopic probes determines their signatures, i.e. frequency positions, intensities or band shapes, is therefore crucial to derive structural and mechanistic conclusions from spectroscopy [Ba08, Ba11]. Despite remarkable progress in recent years, the extension of computational spectroscopy to structural and mechanistic biology still faces a number of formidable challenges due to the complexity and size of biologically relevant systems.
Aromatic chromophores are sensitive optical probes of their environment, encountered in electronic absorption and fluorescence spectroscopy. In cell biology, the optical properties of aromatic chromophores contained in tryptophan, tyrosine and phenylalanine amino acids are important indicators for the analysis of fundamental biochemical processes such as signaling, metabolism or aberrant processes. To take environmental effects on optical properties into account, quantum chemical excited state methods are coupled either with continuum solvation models [Me12], with discrete molecular mechanics methods [Se09] or with a combination of both [Ba10, Pe11]. Among continuum solvation methods, state-specific polarizable continuum models (PCM) have been particularly successful in simulating absorption [Ba10] as well as fluorescence spectra [Ba11, Pe11]. When the environment is to be included explicitly, the best compromise between accuracy and efficiency is often provided by combined quantum mechanical / molecular mechanics (QM/MM) approaches [Pe07, Se09]. The QM part is typically restricted to the spectroscopically relevant region, usually limited to a few hundred atoms. Very often, density functional theory (DFT) for ground states or time-dependent DFT (TD-DFT) [Ca09] for the description of electronically excited states is employed as the QM method. TD-DFT methods are computationally very efficient and have been used successfully, in particular in conjunction with Car-Parrinello molecular dynamics methods [Su05, Ca11, Ta11, Fi12, Mu12]. Nevertheless, the quality of results obtained by TD-DFT calculations depends on the system under investigation and on the functional used to describe the exchange and correlation interactions. TD-DFT is known to be particularly problematic for excited states involving charge transfer, and thus the method is not generally applicable.
Accurate and more generally applicable wavefunction-based quantum chemical techniques for excited states such as CASSCF/CASPT2, CASSCF/MRCI or coupled-cluster methods are as yet mostly limited to small molecules in the gas phase [Go12]. Linear scaling approaches extend the range of applicability, but system sizes remain limited to the order of 100-200 atoms [Ad09]. Many-body perturbation theory (MBPT) or quantum Monte Carlo approaches are emerging as promising alternatives for accurate calculations of quantum systems, offering favorable parallel scaling across large numbers of CPUs. A frequently applied MBPT scheme combines the Green’s function GW method to calculate single quasi-particle energies with the Bethe-Salpeter equation (BSE) to introduce excitonic effects (GW-BSE) [On02, Ma10].
As yet, most QM/MM investigations of optical chromophore properties have focused on solvent effects on absorption spectra [Ja11], whereas studies of biomolecular environmental shifts of optical spectra are still relatively rare [Ca11, Su10, Fi12, Mu12]. That is, the full conformational flexibility of large protein environments at ambient temperature, sampled by molecular dynamics (MD), and its influence on absorption or emission band positions and shapes, e.g. inhomogeneous broadening of spectra, has been investigated only rudimentarily.
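The inhomogeneous broadening mentioned above can be illustrated with a minimal sketch: each MD snapshot contributes one vertical excitation energy and oscillator strength, each stick is dressed with a narrow Gaussian, and the ensemble average gives the band shape. The function name, the Gaussian width and all numerical values below are illustrative assumptions and are not taken from any of the cited studies.

```python
import math


def ensemble_spectrum(energies_ev, osc_strengths, grid_ev, sigma_ev=0.05):
    """Average Gaussian-broadened stick spectra over MD snapshots.

    energies_ev   : vertical excitation energy of each snapshot (eV)
    osc_strengths : oscillator strength of each snapshot
    grid_ev       : energies at which the spectrum is evaluated (eV)
    sigma_ev      : width of the Gaussian applied to each stick (assumed value)
    """
    if not energies_ev or len(energies_ev) != len(osc_strengths):
        raise ValueError("need one oscillator strength per snapshot energy")
    norm = 1.0 / (sigma_ev * math.sqrt(2.0 * math.pi))
    spectrum = []
    for e in grid_ev:
        total = 0.0
        for e0, f in zip(energies_ev, osc_strengths):
            # Normalized Gaussian centered at the snapshot excitation energy
            total += f * norm * math.exp(-0.5 * ((e - e0) / sigma_ev) ** 2)
        spectrum.append(total / len(energies_ev))
    return spectrum


# Hypothetical snapshot data (made-up numbers, for illustration only)
snapshot_energies = [4.40, 4.52, 4.47, 4.61, 4.35]
snapshot_strengths = [0.10, 0.12, 0.11, 0.09, 0.13]
grid = [4.0 + 0.01 * i for i in range(101)]

spectrum = ensemble_spectrum(snapshot_energies, snapshot_strengths, grid)
peak_energy = grid[spectrum.index(max(spectrum))]
```

In this picture the spread of snapshot energies, rather than the width of the individual Gaussians, determines the overall band width; in practice many more snapshots than the five used here would be needed for a converged band shape.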
At the experimental level, multidimensional NMR spectroscopy is a powerful tool for the structural analysis of biological systems. In-cell NMR has been demonstrated to be a very effective method to probe the in vivo binding mode of low-molecular-weight substrates, including metal-based drugs, and their interaction with target proteins [In09]. This technique can also provide information about protein conformational changes occurring upon binding [Re07]. In recent years, paramagnetism-assisted NMR spectroscopy has emerged as a valuable tool to investigate protein-protein interactions and the conformational flexibility of proteins [Be11]. The unique possibilities offered by paramagnetism are paramagnetic relaxation enhancement (PRE), pseudocontact shifts (PCS) and paramagnetic residual dipolar couplings (RDC).
Nevertheless, a complete structural assignment is not always possible. Here, computational NMR spectroscopy has proven to be most useful in complementing and extending experimental studies by providing an accurate description of the relationship between NMR parameters and structural features. First-principles calculations of nuclear magnetic resonance parameters have already reached a mature stage [Bu11] and have been successfully applied in structural biology [Mu10]. In recent years, the simulation of NMR parameters has been extended to open-shell compounds, i.e. to paramagnetic compounds [Bu11, Mo04], which is traditionally the domain of electron paramagnetic resonance (EPR) spectroscopy [Sc10]. NMR spectra can be simulated at essentially all levels of first-principles methods. For larger systems, though, DFT methods tend to be the best choice with respect to feasibility and accuracy. Relative chemical shifts can often already be described with medium-sized basis sets. NMR experiments in solution are affected by temperature-dependent thermal motion and solvation effects. These can be modeled by performing molecular dynamics simulations and averaging the NMR parameters over a number of snapshots of the trajectory [Se04]. Such effects tend to be rather small for light nuclei, i.e. 1H and 13C, but more noticeable for heavier nuclei such as transition metals [Bu11]. Although there are still sizable deviations from experiment, the overall agreement is satisfactory for the 1H and 15N NMR shifts [Ko07].
Coherent two-dimensional IR (2D-IR) spectroscopy is emerging as a new and unique bioanalytical tool, as it combines spatial with femtosecond time resolution [At10, Gu11]. Whereas earlier studies have focused on small peptides, the 2D-IR method has lately been extended to more complex systems. Recent 2D-IR spectroscopy investigations have provided information on protein folding [Se12], on the structure of a transmembrane helix dimer [Re11], on an amyloid fibril inhibitor [Mi12], and on ion selectivity or transport in the potassium channel [Ga11] and the proton channel M2 [Gh11]. Experimental investigations are commonly accompanied by molecular simulations to interpret and predict the spectroscopic signatures. For instance, 2D-IR spectra have been simulated from classical MD simulations for proton transport in a transmembrane ion channel [Li11, Li12].
Other promising spectroscopies for resolving fine structural details of biomacromolecules and their environments exploit the selective vibrational response of chiral centers, namely vibrational circular dichroism (VCD) and vibrational Raman optical activity (ROA) [Po07, Bl03, Bl12]. Such techniques have attracted increasing interest in both the scientific and industrial communities, as they allow not only the assignment of absolute configurations of biopolymers but also, in principle, access to information on the local environment of the various oscillating modes of the molecular systems. However, the full potential of vibrational chiroptical spectroscopy strongly depends on the development of sophisticated theoretical methodologies that integrate high-level quantum mechanical calculations with enhanced sampling approaches and thus allow one to reproduce and interpret the complex spectroscopic signals intrinsic to these techniques.
Protein mass spectrometry has evolved into an indispensable tool in biochemistry, structural biology and proteomics research over the last two decades [Sh07, Ch08]. Crucial to the success of mass spectrometry is the observation that vaporization of proteins from aqueous solution into the gas phase under mild conditions, as present during electrospray ionization (ESI), preserves the characteristic structural determinants in most cases ([Me09] and references therein). Even non-covalent protein complexes often stay intact in the gas phase [Ae01]. Thus, powerful insights into protein structure and dynamics relevant for structural biology in solution can be obtained from mass spectrometry data. In fact, mass spectrometry has distinct advantages over standard methods such as X-ray crystallography and NMR spectroscopy: it does not require protein crystallization, which is quite difficult for important protein classes such as membrane proteins or intrinsically disordered proteins; it is sensitive to low protein concentrations; and, in particular, it is capable of detecting very large proteins or even protein aggregates. The limitation of mass spectrometry in resolving the three-dimensional structure of proteins can be conveniently overcome by combining experiments with molecular simulations [Pa07, Me09, Ma10a].
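The snapshot averaging described above can be sketched in a few lines, under stated assumptions: the isotropic shielding of one nucleus is assumed to have been computed for each MD snapshot, the chemical shift is taken relative to a reference shielding using the common approximation delta = sigma_ref - <sigma>, and all numbers are made up for illustration. The function name is hypothetical and the sketch does not reproduce the protocol of any cited work.

```python
import math
import statistics


def averaged_shift(snapshot_shieldings_ppm, reference_shielding_ppm):
    """Return (chemical shift, standard error) from per-snapshot shieldings.

    Uses delta = sigma_ref - <sigma>, a common approximation for light
    nuclei where the reference shielding is much smaller than 1e6 ppm.
    """
    n = len(snapshot_shieldings_ppm)
    if n == 0:
        raise ValueError("at least one snapshot is required")
    mean_sigma = statistics.fmean(snapshot_shieldings_ppm)
    delta = reference_shielding_ppm - mean_sigma
    # Standard error of the mean; this assumes uncorrelated snapshots,
    # so closely spaced MD frames would underestimate the true error.
    sem = statistics.stdev(snapshot_shieldings_ppm) / math.sqrt(n) if n > 1 else 0.0
    return delta, sem


# Hypothetical 1H shieldings (ppm) from five snapshots and a reference value
shieldings = [27.1, 26.8, 27.4, 26.9, 27.3]
reference = 31.5

delta, sem = averaged_shift(shieldings, reference)
```

Reporting the standard error alongside the averaged shift gives a rough indication of whether the number of snapshots is sufficient for the nucleus in question.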
Paolo Carloni (Forschungszentrum Jülich and RWTH Aachen University) - Organiser & speaker
Jens Dreyer (German Research School for Simulation Sciences) - Organiser & speaker
Emiliano Ippoliti (Forschungszentrum Jülich) - Organiser
Vincenzo Barone (Scuola Normale Superiore) - Organiser
Giuseppe Brancato (Scuola Normale Superiore) - Organiser & speaker