CECAM Flagship Workshop Proposal FAIR and TRUE Data Processing for Soft Matter Simulations
Date: September 25-27, 2023
Location: Max Planck Institute for Polymer Research, Mainz, Germany
Registration Deadline: August 23, 2023
Registration Fee: 200 Euro, to be transferred to the following bank account after registration confirmation:
- Recipient: Max Planck Society c/o MPI for Polymer Research
- IBAN: DE80 7007 0010 0195 1383 16
- BIC: DEUTDEMMXXX
Overview: The FAIRmat consortium and the MoSDeF group are holding a CECAM flagship workshop to highlight efforts towards FAIR data management for molecular simulations and to discuss standardization of metadata and interoperability within the community. The speakers represent a range of perspectives, including FAIR-data projects and consortia as well as developers of simulation engines. There will also be proposals for metadata schemes and ontologies, as well as research talks focusing on the curation and use of large datasets.
Tutorials for FAIR and TRUE simulations: The workshop will be directly followed by a tutorial series (September 28-29) for students and researchers to gain hands-on experience with the tools presented and discussed throughout the workshop. Beyond learning the basics, there will be time set aside to talk to the software developers and expert users about your own data or use cases. Registration for the tutorials is separate from the workshop itself, and can be done here: https://www.cecam.org/workshop-details/1261
The emergence of big-data-driven techniques as a fundamental paradigm of science has forced a re-evaluation of how researchers manage, document, and share their data. As a result, a variety of domain-specific projects have developed innovative tools for ensuring FAIR (Findable, Accessible, Interoperable, Reusable) or TRUE (Transparent, Reproducible, Usable by others, Extensible) data management. For example, the Novel Materials Discovery (NOMAD) Laboratory is a user-driven platform for sharing and exploiting computational materials science data, with a focus on data from ab initio calculations. In practice, four main challenges, known as the 4 V's of big data, arise when developing data management procedures: volume (the amount of data), variety (the heterogeneity of form and meaning of data), velocity (the rate at which data may change or new data arrive), and veracity (the uncertainty of data quality). In contrast to ab initio data, the data generated by soft matter simulations (e.g., atomistic molecular dynamics simulations and multiscale modeling techniques) pose a particular challenge, due primarily to issues associated with the first two V's, volume and variety.
To address the reproducibility of soft matter simulations, Cummings, McCabe, and coworkers have developed the Molecular Simulation Design Framework (MoSDeF), an open-source Python software stack that simplifies the use of multiple open-source molecular simulation engines while ensuring maximum reproducibility [4,5]. This suite provides support for constructing topologies and configurations, implementing and saving force fields, and generating simulation input files for popular molecular simulation software. In this way, researchers can implement complex simulation workflows in a fully scriptable fashion that is maximally reproducible.
In the context of accessibility and data sharing, various communities have developed niche repositories and management tools. Recently, FAIRmat, a consortium of the German National Research Data Infrastructure (NFDI), was formed to continue to raise awareness and acceptance of FAIR data practices. One of the primary tasks of FAIRmat is to extend the NOMAD infrastructure to a wide variety of materials science data, including data from soft matter simulations. Additionally, FAIRmat aims to assist the community in advancing metadata schemas and ontologies, enabling efficient exchange of FAIR research data and big-data analyses that aim to revolutionize the development of novel materials.
Proper interoperability requires some standardization of simulation data and workflows. To achieve this, it is essential that the involved parties, including leading software developers outside the data management sphere, come together to set appropriate standards. Communication between individual projects will facilitate efficient development of tools and avoid duplicated work and "reinventing the wheel".
Martin Girard (Max-Planck-Institut für Polymerforschung) - Organiser
Joseph Rudzinski (Humboldt University) - Organiser
Clare McCabe (Vanderbilt University (USA)) - Organiser
Peter Cummings (Vanderbilt University) - Organiser