Integrated Software for Integrative Structural Biology
- Chris Morris (STFC, United Kingdom)
- Martyn Winn (STFC Daresbury Laboratory, United Kingdom)
- Alexandre M.J.J. Bonvin (Utrecht University, The Netherlands)
- Jose-Maria Carazo (Spanish National Center for Biotechnology, Spain)
- Keith Wilson (University of York, United Kingdom)
Structural biologists use a variety of software tools to support their work, from data collection, through the creation of structural models, to finding biological significance in the results. Some of these tools work well together, with seamless data transfer and a consistent user interface. Others do not, often because they were developed separately by groups in different subdisciplines of structural biology, e.g. X-ray crystallography and nuclear magnetic resonance spectroscopy.
Structural biologists are now targeting mesoscale structures, including the macromolecular machinery of the cell. Increasingly, they combine different techniques in a single large research project, aiming to create multiscale models. This challenges software developers to work together to create an integrated and extensible toolset that supports a range of experimental techniques as well as modelling and simulation methods.
Such a toolset would also enable synergy between researchers beyond planned collaborations, for example by ensuring that a model deposited in a public database can easily be reused in an investigation based on complementary techniques.
This workshop will discuss progress towards these goals and challenges along the way. The workshop is timely as there is now a strong drive for structural biologists to look beyond their own subdiscipline. The European ESFRI projects, such as INSTRUCT (http://www.structuralbiology.eu/) and ELIXIR (http://www.elixir-europe.org/), are encouraging multi-disciplinary approaches, and there is also a desire to fit individual experimental results into a systems view of the cell or organism. These scientific drivers must be supported by a suitable software environment. While this is widely recognised, there is as yet no coherent effort in this direction.
The workshop will consist of invited talks from leading computational structural biologists and modellers, supplemented by talks selected from responses to a Call for Papers. We take this approach in order to ensure input from younger software developers. Areas to be covered in the Call include (but are not limited to):
* work to connect existing software packages
* novel software to support combined techniques
* formats for structural data
* the "last mile" problem: securing community take-up of innovative tools
* position papers about future challenges
Submissions that report not only successes but also problems that remain to be solved will be particularly welcome. We will also schedule significant time for discussion.
Individual techniques are well supported by existing software, for example CCP4 (Winn et al., 2011) and ARP/wARP (Langer et al., 2008) for macromolecular X-ray crystallography, CCPN for NMR spectroscopy (Vranken et al., 2005), Xmipp for electron microscopy (Sorzano et al., 2004), SWISS-MODEL (Arnold et al., 2006) for protein structure homology modelling, Gromacs (Hess et al., 2008) for molecular simulation, and many others.
Software for interdisciplinary studies is patchier and usually covers only specific cases. For example, HADDOCK (Dominguez et al., 2003) is primarily a docking program, but it can use restraints derived from a wide variety of experiments, such as NMR chemical shift perturbations or mutagenesis data. Fitting atomistic models from macromolecular crystallography (MX) or NMR into lower-resolution electron microscopy maps is increasingly relevant as cryoEM gains in importance, and a number of software tools are dedicated to this, for example VEDA (Jorge Navaza). The molecular dynamics flexible fitting (MDFF) method implemented in NAMD can be used to flexibly fit atomic structures into EM density maps (Frauenfeld et al., 2011).
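As a rough illustration of what fitting an atomic model into a density map involves, the Python sketch below blurs atom positions into a simulated map and scores candidate placements by normalized cross-correlation with a target map. It is a generic rigid-body translational search under simplifying assumptions (isotropic Gaussians of width `sigma`, a cubic grid with its origin at zero, translations only); it is not the algorithm used by VEDA, nor MDFF, which instead adds forces derived from the map to a molecular dynamics simulation. All function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]  # (x, y, z) in Å
Shape = Tuple[int, int, int]       # (nx, ny, nz) voxels


def simulate_density(atoms: Sequence[Vec3], shape: Shape,
                     voxel: float, sigma: float) -> List[float]:
    """Sum one isotropic Gaussian per atom on a cubic grid (flattened, x fastest).

    Assumes the grid origin is at (0, 0, 0) and every voxel is `voxel` Å wide.
    """
    nx, ny, nz = shape
    grid = [0.0] * (nx * ny * nz)
    for ax, ay, az in atoms:
        for k in range(nz):
            for j in range(ny):
                for i in range(nx):
                    dx, dy, dz = i * voxel - ax, j * voxel - ay, k * voxel - az
                    r2 = dx * dx + dy * dy + dz * dz
                    grid[(k * ny + j) * nx + i] += math.exp(-r2 / (2.0 * sigma ** 2))
    return grid


def normalized_cc(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two maps sampled on the same grid."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def best_translation(atoms: Sequence[Vec3], target: Sequence[float], shape: Shape,
                     voxel: float, sigma: float,
                     shifts: Sequence[Vec3]) -> Tuple[Vec3, float]:
    """Brute-force search: return the shift whose simulated map best matches `target`."""
    if not shifts:
        raise ValueError("need at least one candidate shift")
    scored = []
    for s in shifts:
        moved = [(x + s[0], y + s[1], z + s[2]) for x, y, z in atoms]
        scored.append((s, normalized_cc(simulate_density(moved, shape, voxel, sigma), target)))
    return max(scored, key=lambda item: item[1])


atoms = [(4.0, 4.0, 4.0), (6.0, 5.0, 4.0), (5.0, 6.0, 6.0)]
shape, voxel, sigma = (10, 10, 10), 1.0, 1.0
# Stand-in "experimental" map: the same atoms displaced by +1 Å along x.
target = simulate_density([(x + 1.0, y, z) for x, y, z in atoms], shape, voxel, sigma)
shift, cc = best_translation(atoms, target, shape, voxel, sigma,
                             [(d, 0.0, 0.0) for d in (-1.0, 0.0, 1.0, 2.0)])
print(shift)  # (1.0, 0.0, 0.0)
```

Real fitting tools also search rotations, account for map resolution and origin, and use much faster search or local-optimization schemes; the explicit loops here only keep the example self-contained.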
Different experimental techniques relate to each other through aspects of the structural model: coordinate data (e.g. from MX or NMR), distance restraints (NMR or FRET), volume data (EM, SAXS/SANS or low-resolution MX), or features (segmentation of EM volumes or tomograms). A prerequisite for interdisciplinary software is a common understanding of structural features, for example of how to interpret multiple side-chain conformations (e.g. from high-resolution crystallography) or ensembles of models (e.g. from NMR or from modelling).
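To make the point about interpreting ensembles concrete, the following Python sketch relates two of the data types listed above: an ensemble of coordinate models and an upper-bound distance restraint of the kind derived from NMR or FRET. The data model (`Model`, `Ensemble`, `DistanceRestraint` and the atom key) is hypothetical and deliberately minimal; it is not an existing library's API. It contrasts two plausible ways of asking whether an ensemble "satisfies" a restraint.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical atom key: (chain ID, residue number, atom name). A fuller model
# would also carry an alternate-location indicator so that multiple side-chain
# conformations from high-resolution crystallography can coexist.
AtomId = Tuple[str, int, str]
Vec3 = Tuple[float, float, float]


@dataclass
class Model:
    """One coordinate model: an MX structure or one member of an NMR ensemble."""
    coords: Dict[AtomId, Vec3]


@dataclass
class Ensemble:
    models: List[Model] = field(default_factory=list)


@dataclass
class DistanceRestraint:
    """Upper-bound distance restraint, e.g. derived from an NOE or a FRET efficiency."""
    a: AtomId
    b: AtomId
    upper: float  # Å


def distance(m: Model, a: AtomId, b: AtomId) -> float:
    return math.dist(m.coords[a], m.coords[b])


def satisfied_fraction(ens: Ensemble, r: DistanceRestraint) -> float:
    """Fraction of ensemble members that individually satisfy the restraint."""
    ok = sum(1 for m in ens.models if distance(m, r.a, r.b) <= r.upper)
    return ok / len(ens.models)


def r6_averaged_distance(ens: Ensemble, r: DistanceRestraint) -> float:
    """Ensemble-averaged distance <r^-6>^(-1/6), as often used for NOE restraints."""
    mean = sum(distance(m, r.a, r.b) ** -6 for m in ens.models) / len(ens.models)
    return mean ** (-1.0 / 6.0)


h10, h42 = ("A", 10, "H"), ("A", 42, "H")
ens = Ensemble([
    Model({h10: (0.0, 0.0, 0.0), h42: (4.0, 0.0, 0.0)}),
    Model({h10: (0.0, 0.0, 0.0), h42: (7.0, 0.0, 0.0)}),
])
noe = DistanceRestraint(h10, h42, upper=5.0)
print(satisfied_fraction(ens, noe))    # 0.5
print(r6_averaged_distance(ens, noe))  # about 4.46, i.e. below the 5 Å bound
```

The two summaries disagree: only half of the members meet the bound individually, yet the r^-6 average, dominated by the shorter distance, does. Which reading is appropriate depends on what the ensemble is taken to represent.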
Interdisciplinary studies using multiple software packages can be aided to some extent by standard data formats. Despite some well-known limitations, the Protein Data Bank (PDB) format is widely used for representing atomic coordinate data. The MRC format for cryoEM volume data is derived from the CCP4 format for electron density, and in principle the two are interoperable. Nevertheless, formats diverge, and work is needed to maintain interoperability.
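The shared ancestry of the CCP4 and MRC map formats is visible in their fixed 1024-byte main header. The Python sketch below reads the leading fields the two have in common, following our reading of the MRC2014 layout: 32-bit integers NX, NY, NZ, MODE, the start indices and sampling intervals, six 32-bit floats for the unit cell, then MAPC/MAPR/MAPS, with the "MAP " stamp at byte offset 208 and the machine stamp at offset 212. Inferring byte order from the first byte of the machine stamp is a simplifying assumption, `read_map_header` and `MapHeader` are hypothetical names, and the extended header and data block are not read.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

HEADER_BYTES = 1024  # fixed-size main header shared by CCP4 and MRC maps


@dataclass
class MapHeader:
    dims: Tuple[int, ...]        # NX, NY, NZ (columns, rows, sections)
    mode: int                    # data type code, e.g. 2 = 32-bit float
    start: Tuple[int, ...]       # NXSTART, NYSTART, NZSTART
    sampling: Tuple[int, ...]    # MX, MY, MZ
    cell: Tuple[float, ...]      # a, b, c (Å) and alpha, beta, gamma (degrees)
    axis_order: Tuple[int, ...]  # MAPC, MAPR, MAPS
    has_map_stamp: bool          # b"MAP " at offset 208 (missing in some older files)


def read_map_header(raw: bytes) -> MapHeader:
    if len(raw) < HEADER_BYTES:
        raise ValueError("truncated map header")
    # Machine stamp: 0x11 as the first byte is taken to mean big-endian;
    # anything else is treated as little-endian (an assumption, see lead-in).
    endian = ">" if raw[212] == 0x11 else "<"
    words = struct.unpack_from(endian + "10i6f3i", raw, 0)  # first 19 words
    return MapHeader(
        dims=tuple(words[0:3]),
        mode=words[3],
        start=tuple(words[4:7]),
        sampling=tuple(words[7:10]),
        cell=tuple(words[10:16]),
        axis_order=tuple(words[16:19]),
        has_map_stamp=raw[208:212] == b"MAP ",
    )


# Build a synthetic little-endian header and read it back.
hdr = bytearray(HEADER_BYTES)
struct.pack_into("<10i6f3i", hdr, 0, 64, 64, 64, 2, 0, 0, 0, 64, 64, 64,
                 96.0, 96.0, 96.0, 90.0, 90.0, 90.0, 1, 2, 3)
hdr[208:212] = b"MAP "
hdr[212:216] = bytes([0x44, 0x41, 0, 0])
h = read_map_header(bytes(hdr))
print(h.dims, h.mode, h.axis_order)  # (64, 64, 64) 2 (1, 2, 3)
```

Divergence shows up exactly in fields like the stamps and origin handling, which is why a reader that works for one producer's files may need adjusting for another's.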
Beyond file formats, there is a need for ontologies that cover broad areas of structural biology, and for formats that capture metadata for different experiment types as well as the raw experimental data. One of the most comprehensive data formats in current use for structural biology experiments is the pepcDB data interchange format.
The problems of working on a multi-disciplinary structural biology project with a diverse set of software tools are well known, but it is not clear how to address them. The workshop will seek to identify the specific areas where progress can be made and discuss possible solutions. Questions for consideration include:
* multi-disciplinary software versus data conversion software to transfer between stand-alone packages
* the need for standardisation of data formats or ontologies
* validation and comparison of results between different techniques
* ensuring easy availability and user-friendliness of software
The proceedings will be published in a special issue of Acta Crystallographica, Section D.
Winn et al. (2011) "Overview of the CCP4 suite and current developments." Acta Cryst. D67, 235-242.
Langer, G., Cohen, S.X., Lamzin, V.S. & Perrakis, A. (2008) "Automated macromolecular model building for X-ray crystallography using ARP/wARP version 7." Nature Protocols 3, 1171-1179.
Vranken et al. (2005) "The CCPN data model for NMR spectroscopy: development of a software pipeline." Proteins 59(4), 687-696.
Sorzano, C.O.S. et al. (2004) "XMIPP: a new generation of an open-source image processing package for electron microscopy." J. Struct. Biol. 148(2), 194-204.
Arnold, K., Bordoli, L., Kopp, J. & Schwede, T. (2006) "The SWISS-MODEL Workspace: a web-based environment for protein structure homology modelling." Bioinformatics 22, 195-201.
Hess et al. (2008) "GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation." J. Chem. Theory Comput. 4, 435-447.
Dominguez et al. (2003) J. Am. Chem. Soc. 125, 1731-1737.
Frauenfeld et al. (2011) Nat. Struct. Mol. Biol. 18, 614-621.