Machine-Actionable Data Interoperability for Chemical Sciences (MADICES): Bridging experiments, simulations, and machine learning for spectral data
Online, hosted by CECAM-HQ-EPFL
At the CECAM MADICES workshop, we will bring together developers, scientists, and data specialists to discuss the hurdles and opportunities of data interoperability in the context of the chemical and materials sciences. We will strive for general technical recommendations, with X-ray absorption spectroscopy as the first prototype use case.
The workshop is free to attend! The latest logistical details can be found on the conference website. You can also contribute to the discussion of the workshop themes through the discussions in our GitHub repository.
To prepare a perspective paper targeted at providing guidelines and recommendations for new projects in our field that work with data.
- For example, how best to prepare and disseminate datasets, databases and APIs such that they are interoperable with existing projects.
- We will focus on X-ray absorption spectroscopy as a concrete example of a domain where a sample, its context/metadata, and its relations with other samples and computational experiments (e.g. peak assignment) are all necessary for the science.
To identify particular challenges that can be overcome via new collaborative software libraries/infrastructure, and to motivate their creation through follow-up meetings/hackathons.
- For example, services that aid discoverability of apps/schemas, and ways of discovering whether two apps are interoperable.
- For example, strategies for progressively adopting open linked data principles that are accessible to non-experts.
To motivate follow-up meetings and hackathons, and to foster new cross-initiative collaborations.
Recent advances in the computational sciences allow us to simulate many spectra (e.g., X-ray absorption, infrared/Raman, NMR) in silico. In principle, this could open up unprecedented possibilities for the interpretation of experimental data.
Experimental data, however, comes in various, often undocumented or proprietary formats. In recent efforts, experimental data is being recorded in electronic lab notebooks and archived in open data formats, which aids and automates the capture of crucial metadata. However, most of these lab notebooks have no mechanism for exchanging data with each other, and even less so with our simulation tools. Typically, exporting data from such a notebook still requires lossy conversion to a chosen file format.
Standardization is an arduous process, and for a wide enough domain, it is infeasible. Nevertheless, without significant effort, there is a danger that we will not escape the local minimum of “★★★/★★★★★” (three-star out of five-star) linked open data (as defined by Tim Berners-Lee).
In the case of interoperability between experimental and computational data, there is the additional difficulty that computational systems are completely described, idealized systems with implicit assumptions, whereas for experimental systems many parameters are ill-defined, unknown, or uncertain. Moreover, we often lack a link between the spectral data and the (meta)data contextualising the sample and its history.
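As a minimal illustration of the missing link described above, the following Python sketch keeps a spectrum together with the sample it was recorded on and that sample's parent, and round-trips the whole record through JSON without loss. Everything in it is an assumption made for illustration: the class names `Sample` and `Spectrum`, the field names, and the helper functions are hypothetical and do not correspond to any existing schema or to a recommendation of the workshop.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Sample:
    """Hypothetical description of a sample and where it came from."""
    sample_id: str
    description: str
    parent_id: Optional[str] = None  # sample this one was derived from, if any


@dataclass
class Spectrum:
    """Hypothetical spectrum record that carries its sample context with it."""
    technique: str                   # e.g. "XAS"
    origin: str                      # "experiment" or "simulation"
    x_unit: str
    x: list[float]
    y: list[float]
    sample: Sample
    # Experimental parameters may be unknown; None marks them explicitly
    # instead of silently dropping them.
    parameters: dict[str, Optional[float]] = field(default_factory=dict)


def to_json(spectrum: Spectrum) -> str:
    """Serialize the spectrum and its nested sample metadata to JSON."""
    return json.dumps(asdict(spectrum), sort_keys=True)


def from_json(text: str) -> Spectrum:
    """Rebuild a Spectrum, including its Sample, from to_json() output."""
    data = json.loads(text)
    data["sample"] = Sample(**data["sample"])
    return Spectrum(**data)


# Illustrative placeholder values only, not measured data.
measured = Spectrum(
    technique="XAS",
    origin="experiment",
    x_unit="eV",
    x=[7100.0, 7110.0, 7120.0],
    y=[0.1, 0.8, 1.0],
    sample=Sample("s-002", "annealed film", parent_id="s-001"),
    parameters={"temperature_K": None},  # not recorded during measurement
)

restored = from_json(to_json(measured))
```

The design choice worth noting is that the sample history (`parent_id`) and the explicitly unknown parameter survive the export, which is exactly the information a lossy conversion to a bare two-column file would discard.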
How and where can we be interoperable in this setting? How can we make sure that experimental data can readily be consumed by computational tools, and vice versa, from the bottom up? How can we share, contextualise and disseminate analysis (e.g., post-processing, peak assignment) in a reproducible way (on platforms such as Materials Cloud or the Chemotion repository)? What new paradigms could such interoperability enable?
Due to COVID-related restrictions, we anticipate that the workshop will be held online only. The registration form may ask you to provide a motivation; you may leave this field empty.
Matthew Evans (UCLouvain) - Organiser
Sebastiaan Huber (EPFL) - Organiser
Kevin Jablonka (EPFL) - Organiser
Carlo Antonio Pignedoli (Swiss Federal Laboratories for Materials Science and Technology, Empa) - Organiser
Stefan Kuhn (De Montfort University) - Organiser
Shyam Dwaraknath (Lawrence Berkeley Lab) - Organiser