Life sciences such as chemistry and molecular biology are developing into one of the largest e-Infrastructure user communities in Europe, in part due to the ever-growing amount of biological data. Modern drug design typically combines sequence bioinformatics, in-silico virtual screening, and free energy calculations, e.g. for drug binding. This development is accelerating tremendously and puts high demands on simulation software and support services [1,2,3,4,5].
Typical computational chemistry and molecular biology applications concern small-to-medium-size molecules. The ScalaLife project, within whose scope we would like to organize this workshop, focuses on implementing new techniques for efficient small-system parallelization, combined with throughput and ensemble computing, to enable the life science community to exploit the European e-Infrastructures efficiently.
We seek to establish new approaches that target the entire pipeline from problem formulation through algorithms, simulations, and analysis by focusing heavily on actual application problems. In particular, ScalaLife provides long-term support for major European software for the life science community, fostering Europe's role as a major software provider.
High performance computing (HPC) has been instrumental in the development of molecular simulations in biology and chemistry. Clusters have reached Teraflops performance, supercomputer centers Petaflops, and initiatives exist to build Exaflops supercomputers. Additionally, large efforts are being made to develop new codes that make efficient use of special-purpose processors, such as graphics accelerator cards or ASICs. As a consequence, we are now on the edge of a new age for molecular simulation, where major advances will arise from:
a) improvements in the potential energy function,
b) increases in the size of the systems to be studied, bringing them closer to real ones, and
c) extension of the sampling obtained by simulation.
European groups have developed several leading software codes used worldwide in computational chemistry and molecular biology, but so far not enough emphasis has been put on software maintenance and on integrating new results, putting user support, and to some extent the further development of these codes, at risk.
While relatively large efforts have been made in recent years to make codes more efficient and parallel, little has been done to prepare additional interfaces and other infrastructure for massive proteome-scale simulations, both in terms of set-up and analysis. Additionally, little has yet been done to integrate simulation tools into the workflows and pipelines emerging from bioinformatics projects.
Moreover, one of the most important trends in the application domain is the new focus on massive data acquisition followed by analysis and modeling on the genomics/proteomics scale. Surprisingly, very little has been done to exploit this implicit data parallelism efficiently beyond trivial "embarrassing" parallelization, either in bioinformatics or in molecular simulation.