CECAM - Macromolecular simulation software workshop

Biomolecular simulation continues to grow in popularity and scope of application. It is no longer the preserve of a few specialist groups but is widespread: part of the ‘toolkit’ used by researchers in a wide variety of fields, often closely integrated with experimental research techniques. As the use of biomolecular simulation grows, a corresponding boom in (decentralised) software development is taking place. Much of this happens within multidisciplinary research groups and is highly focussed on the specific needs of that community. Development is often done by researchers with little or no formal training in software engineering or programming. This is not entirely a negative: it encourages practical solutions to real-life problems and ‘thinking outside the box’. However, it is clearly not the ideal route to producing and maintaining high-quality, flexible, sustainable, and usable software that others can adopt and modify. Giving researchers the right tools and training to develop such software will reduce the amount of ‘wheel-reinvention’ and consequently improve research productivity.

A number of recent developments make it easier to see a route to this goal. Firstly, there is the growth of the Open Source, Open Development paradigm, and the establishment of a number of widely recognised platforms supporting such activities, such as SourceForge, GitHub, and Bitbucket. Secondly, there has been a move towards object-oriented coding styles that facilitate reuse and extension. In the biomolecular simulation field in particular, we see an increasing focus on high-level scripting languages like Python, which combines power and flexibility with a relatively shallow learning curve and promotes a paradigm of code sharing, reuse, and extension. There is a growing number of software development projects related to biomolecular simulation that are based around Python toolkits, for example:

OpenMM (https://simtk.org/home/openmm),
MMTK (http://dirac.cnrs-orleans.fr/MMTK/),
MDAnalysis (https://code.google.com/p/mdanalysis/),
MDTraj (http://mdtraj.org/latest/),
SIRE (http://siremol.org/Sire/Home.html),
Bookshelf (http://sbcb.bioch.ox.ac.uk/bookshelf/),
pyEMMA (http://www.pyemma.org),

as well as more general-purpose Python-based tools that have clear applicability to biosimulation – e.g. RADICAL-Cybertools (http://radical-cybertools.github.com).

This CECAM Software Development Workshop will give representatives of these projects, and of many others that are as yet less well known, an opportunity to interact with end users and tackle the most challenging problems in this domain, particularly how best to exploit the capabilities of future generations of massive, heterogeneous, and sometimes distributed computational resources. Somewhat in the spirit of a “hackathon”, the code developers will be challenged to maximise the interoperability of their projects by applying them to real-life simulation problems. Through this process they will identify gaps in provision and opportunities for optimisation, integration, and collaboration, and the workshop will provide a showcase for the rich diversity of activity in this area.

The event is organised as six consecutive workshops, each two days long. Participants may attend all of them or only a subset. The outline programme is as follows:

Week 1 (Basic):

1. “Software Carpentry workshop” (instructors: Philip Fowler, Oxford and David Dotson, ASU) Mon 12/10-Tue 13/10

Objective: Introduce shell scripting (bash), Python, unit testing and version control (git / GitHub), which are prerequisites for the remaining workshops. Participants who are already experienced in these topics need not attend. Please see http://philipwfowler.github.io/2015-01-13-oxford/ for an example Software Carpentry workshop.

Target audience: participants who wish to attend one of the following workshops but would first like to improve their software skills and knowledge, or who would like a refresher.

Software Carpentry Bootcamp website:

http://philipwfowler.github.io/2015-10-12-cecam-julich/

Coffee breaks kindly sponsored by:

The Software Sustainability Institute.

2. “Analysing simulation data” (lead: Philip Fowler, Oxford) Wed 14/10-Thu 15/10

Objective: Introduce participants to different Python-based approaches to analysing the data produced by molecular dynamics codes, focussing on MDAnalysis (http://www.mdanalysis.org) and pmx (https://code.google.com/p/pmx/). The workshop will be structured around a "hack day" where participants will pitch problems to solve, form teams, and then work on them, with expert help, in an intensive environment. The hack day will be preceded by talks from invited speakers and then tutorials on both Python modules.

The following invited speakers have confirmed attendance:

Oliver Beckstein (Arizona State University)

Bert de Groot (Max Planck Institute for Biophysical Chemistry, Göttingen)

Tyler Reddy (University of Oxford)

Detailed website:

http://philipwfowler.me/cecam-analysing-simulation-data-mini-workshop/

Coffee breaks kindly sponsored by:

The Software Sustainability Institute.

3. “Setting up simulations” (lead: Charlie Laughton, Nottingham) Fri 16/10-Sat 17/10

Objective: Introduce participants to alternative paradigms and toolkits for setting up molecular dynamics simulations, including validating starting structures and repairing "errors", embedding solutes in solvents and membranes, relaxation and equilibration protocols, and setting up complex simulations such as free energy perturbation calculations. Participants will have the opportunity to explore the capabilities of a range of web-based (e.g. http://mmb.irbbarcelona.org/MDWeb/) and Python-toolkit-based solutions (e.g. http://www.hecbiosim.ac.uk/fesetup) and, in a "hackathon"-type activity, to work in small groups with the code owners on problems related to unmet needs, flexibility, interoperability, or performance.

The following invited speakers have confirmed attendance:

Modesto Orozco & Adam Hospital (University of Barcelona)

Hannes Loeffler (Daresbury Laboratory)

Phill Stansfeld (Oxford)

Detailed website: https://bitbucket.org/claughton/cecam-setup/wiki/Home

-- Sunday is left free for participants --

Week 2 (Advanced):

4. “Developing interoperable and portable molecular simulation software libraries” (lead: Julien Michel, Edinburgh) Mon 19/10-Tue 20/10

Workshop website:

http://jmichel80.github.io

This workshop will bring together developers to share their experience of developing molecular simulation software toolkits. Emphasis will be placed on discussing best practices for code sharing, code reuse and code extensibility, as well as strategies to facilitate software maintenance. In addition, two ‘hands-on’ mini-workshops will be organised to teach delegates how to develop code with the molecular simulation library OpenMM (https://simtk.org/home/openmm) and the free energy calculations library Plumed (http://www.plumed-code.org/). Delegates will also have the opportunity to suggest coding problems they would like to work on with OpenMM and/or Plumed during the workshop.

Target audience

* Software developers with expertise in molecular simulation software development

* Computational scientists with an interest in learning how to use OpenMM and/or Plumed to write new code.

The following invited speakers have confirmed attendance:

John Chodera (Memorial Sloan Kettering Cancer Center)

Peter Eastman (Stanford University)

Gareth Tribello (Queen’s University Belfast)

Christopher Woods (University of Bristol)

See http://jmichel80.github.io for details

5. “High Performance Distributed Computing tools for Biomolecular Simulations” (lead: Shantenu Jha, Rutgers) Wed 21/10-Thu 22/10

https://github.com/radical-cybertools/tutorials/wiki/CECAM-2015

How do you run O(1000) ensemble simulations on a large cluster or a supercomputer? How can you use a similar approach to run replica-exchange with thousands of replicas? Or even more sophisticated ensemble-based simulations? How can you separate the details of the high-performance computing platform from the details of the method?

This mini-workshop has two tracks. The first will provide an introduction to high-performance computing and then introduce the "pilot abstraction", which will be the basis for ensemble-based simulations. It will teach participants how to run large-scale simulations on clusters and supercomputers.

The second track will help participants understand the basics of writing ensemble-based "workflows", introduce some exercises, and then provide hands-on experience and guidance as participants develop their own ensemble-based methodology.

Compulsive eagerness to run bigger, better and faster simulations is the only prerequisite. Your introduction to RADICAL computing begins at:

http://radical-cybertools.github.io/

https://github.com/radical-cybertools/tutorials/wiki/CECAM-2015

On the second day, participants will be introduced to data-flow concepts and how to formulate high-level simulation problems as data-flow problems. They will also learn how to use the Copernicus simulation workflow package to launch, manage, and analyze large numbers of simulations in an automated fashion using pre-supplied control "modules" implementing adaptive sampling and swarms algorithms.

The following invited speakers have confirmed attendance:

Shantenu Jha, Mark Santcroos and Andre Merzky (RADICAL, Rutgers)

Peter Kasson (Virginia)

6. “Advanced Sampling and Long Timescale Molecular Dynamics” (lead: Cecilia Clementi, Rice) Fri 23/10-Sat 24/10

This workshop will bring together developers involved in long timescale molecular dynamics, with a focus on advanced sampling simulation and analysis. On the first morning, a few lectures will illustrate the need for advanced sampling and present some applications. Then, three ‘hands-on’ mini-workshops will be organised on: 1) how to run metadynamics simulations with Plumed (http://www.plumed-code.org/), 2) how to use the software package ExTASY (http://extasy-project.org/), and 3) how to analyze simulations with pyEMMA (pyemma.org). Delegates will have the opportunity to interact with software developers and discuss their coding problems in this context.

The following invited speakers have confirmed attendance:

Giovanni Bussi (SISSA, Trieste)

Frank Noé (Freie Universität Berlin)

Ardita Shkurti (University of Nottingham, UK)

Detailed website: https://bitbucket.org/extasy-project/extasy/wiki/CECAM%20Tutorial

Location: CECAM-DE-JUELICH
