MADICES 3: Machine-actionable Data Interoperability for the Chemical Sciences
Location: PSI, Villigen, Switzerland
The research community increasingly recognizes FAIR data (Findable, Accessible, Interoperable, Reusable) as pivotal for the future of research, and with it the need to invest in making data easy for others to access and use. Metadata is central to this goal because it enables search and discovery. Making metadata machine-actionable improves interoperability and helps users understand the context and significance of the data. Semantic metadata, which links metadata concepts to ontology terms and so provides context beyond plain textual descriptions, is essential to this effort [1, 2].
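To make the distinction concrete, the snippet below contrasts a plain key-value record with a semantically annotated one. It is a minimal sketch assuming schema.org vocabulary serialized as JSON-LD; the ChEBI term for water (CHEBI_15377) and the QUDT unit for degree Celsius are real identifiers, but the record itself and the choice of properties are illustrative only.

```python
import json

# Plain textual metadata: readable by a person, but a machine cannot tell
# which "water" or which temperature scale is meant.
plain = {"sample": "water", "temperature": "25 C"}

# Semantic version (illustrative): each concept is tied to an ontology term
# by IRI, so software can resolve its meaning unambiguously.
semantic = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example measurement",
    "about": {
        "@id": "http://purl.obolibrary.org/obo/CHEBI_15377",  # ChEBI: water
        "name": "water",
    },
    "variableMeasured": {
        "@type": "PropertyValue",
        "name": "temperature",
        "value": 25,
        "unitCode": "http://qudt.org/vocab/unit/DEG_C",  # QUDT: degree Celsius
    },
}

print(json.dumps(semantic, indent=2))
```

The textual labels are kept alongside the IRIs, so the record stays readable for humans while becoming resolvable by machines.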
Recognizing the importance of metadata, the scientific community has developed a range of tools and standards to improve metadata practices. The RO-Crate specification defines a standard for packaging research data together with machine-actionable metadata expressed in JSON-LD. The schema.org initiative maintains community-driven schemas that cover metadata concepts across many domains. Ontotext Refine converts tabular data into RDF knowledge graphs, a step that is integral to the semantic annotation workflow.
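As an illustration of the packaging idea, the following sketch builds the smallest useful RO-Crate 1.1 metadata document using only the Python standard library: a metadata descriptor, a root Dataset and one File entity. The dataset name, date and file name are hypothetical; in practice the output is saved as ro-crate-metadata.json at the root of the crate directory, and libraries such as ro-crate-py can generate it for you.

```python
import json


def build_ro_crate(name, description, date_published, license_url, files):
    """Return a minimal RO-Crate 1.1 metadata document as a dict."""
    # Descriptor entity: states which spec the metadata file conforms to
    # and points at the root dataset it describes.
    metadata_descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    # Root dataset with the properties RO-Crate 1.1 requires.
    root_dataset = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "datePublished": date_published,
        "license": {"@id": license_url},
        "hasPart": [{"@id": f} for f in files],
    }
    # One data entity per payload file.
    file_entities = [{"@id": f, "@type": "File"} for f in files]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [metadata_descriptor, root_dataset, *file_entities],
    }


# Hypothetical example dataset.
crate = build_ro_crate(
    name="XRD measurement of sample S1",
    description="Powder diffraction pattern and instrument settings.",
    date_published="2024-05-01",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    files=["pattern.xy"],
)
print(json.dumps(crate, indent=2))
```

Because every entity lives in a flat @graph and refers to others by @id, the same structure can be extended with authors, instruments or ontology-linked properties without changing the packaging.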
Despite these efforts, several challenges persist. Complexity remains a barrier: identifying the relevant data elements and aligning them with suitable ontologies can be labor-intensive [3]. The ontology landscape is also vast and heterogeneous, which causes confusion, and significant gaps remain in specialized subdisciplines [4-7]. In addition, accurately mapping data to domain-specific ontology terms often requires domain expertise, a difficulty compounded by ontology concepts that share similar names but differ in meaning [8,9].
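The naming problem can be sketched with a toy lookup. Below, three hypothetical ontology terms (made-up example.org IRIs, not taken from any published ontology) share the label "cell"; a naive label match returns all three, and only additional context narrows the choice. The keyword-overlap rule is a deliberately crude stand-in for the domain expertise described above, not a recommended annotation method.

```python
# Hypothetical mini-ontology: IRIs are invented for illustration only.
ONTOLOGY = {
    "https://example.org/onto#UnitCell": {
        "label": "cell",
        "context": {"crystal", "lattice", "diffraction", "symmetry"},
    },
    "https://example.org/onto#ElectrochemicalCell": {
        "label": "cell",
        "context": {"electrode", "electrolyte", "battery", "voltage"},
    },
    "https://example.org/onto#BiologicalCell": {
        "label": "cell",
        "context": {"membrane", "organism", "culture", "tissue"},
    },
}


def candidates(label):
    """Naive lookup: every term whose label matches the metadata field name."""
    return [iri for iri, term in ONTOLOGY.items() if term["label"] == label.lower()]


def disambiguate(label, context_words):
    """Pick the candidate sharing the most context words; None if nothing overlaps."""
    scored = [
        (len(ONTOLOGY[iri]["context"] & set(context_words)), iri)
        for iri in candidates(label)
    ]
    best_score, best_iri = max(scored, default=(0, None))
    return best_iri if best_score > 0 else None


print(len(candidates("cell")))  # 3: the label alone is ambiguous
print(disambiguate("Cell", ["battery", "voltage", "cycling"]))
# https://example.org/onto#ElectrochemicalCell
print(disambiguate("cell", ["sample"]))  # None: no context to decide
```

Returning None rather than guessing mirrors the point in the text: when context is missing, a human with domain knowledge still has to make the call.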
There is also a notable lack of user-friendly tools for robust and accurate semantic annotation and metadata generation, and addressing this problem requires further technological innovation [8,9]. Recent advances in AI/ML, such as large language models (LLMs), show promise for accurate semantic annotation [10,11], suggesting ways to integrate these methods into open science practices.
Another critical challenge is achieving interoperability of FAIR data across diverse Research Data Management (RDM) platforms. Within the PREMISE project, part of the ETH Board's Open Research Data (ORD) program, work is underway to explore interoperability between systems such as the AiiDA workflow management system (WFMS) and the openBIS ELN/LIMS platform. However, such pairwise solutions often rely on bespoke APIs, which are cumbersome and costly to develop. A more scalable approach requires a platform-agnostic solution that lets platforms adopt general data interoperability standards through a unified API. Work towards such solutions began at MADICES 2, and progress is documented in the MADICES GitHub repositories.
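The scaling argument can be made concrete: connecting n platforms pairwise needs n(n-1)/2 bespoke bridges, whereas a shared exchange format needs only one adapter per platform. The sketch below assumes a hypothetical adapter interface (InteropAdapter, export_record, import_record) with a plain dict standing in for a common format such as an RO-Crate; it is not the MADICES API and does not call AiiDA or openBIS.

```python
from abc import ABC, abstractmethod
from itertools import combinations


def pairwise_bridges(platforms):
    """Bespoke bidirectional bridges needed if every pair is connected directly."""
    return list(combinations(platforms, 2))


class InteropAdapter(ABC):
    """Hypothetical interface each platform implements once, against a
    shared exchange format (a plain dict here)."""

    @abstractmethod
    def export_record(self, record_id: str) -> dict: ...

    @abstractmethod
    def import_record(self, document: dict) -> str: ...


class InMemoryPlatform(InteropAdapter):
    """Toy stand-in for a real platform such as a WFMS or an ELN."""

    def __init__(self, name):
        self.name = name
        self.records = {}

    def export_record(self, record_id):
        # Serialize a local record into the shared format, tagging its origin.
        return {"id": record_id, "source": self.name, **self.records[record_id]}

    def import_record(self, document):
        # Store the payload under a new local id, dropping transport fields.
        new_id = f"{self.name}-{len(self.records) + 1}"
        self.records[new_id] = {
            k: v for k, v in document.items() if k not in ("id", "source")
        }
        return new_id


def transfer(src, dst, record_id):
    """Any two adapters can exchange data without a pair-specific bridge."""
    return dst.import_record(src.export_record(record_id))


platforms = ["AiiDA", "openBIS", "platform C", "platform D"]
print(len(pairwise_bridges(platforms)))  # 6 bridges, versus 4 adapters

wfms = InMemoryPlatform("wfms")
eln = InMemoryPlatform("eln")
wfms.records["calc-1"] = {"title": "Band structure", "value": 1.0}
new_id = transfer(wfms, eln, "calc-1")
print(new_id, eln.records[new_id])  # eln-1 {'title': 'Band structure', 'value': 1.0}
```

The gap widens quickly: ten platforms would need 45 pairwise bridges but only ten adapters, which is the motivation for a platform-agnostic standard.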
In conclusion, although significant progress has been made in embracing FAIR principles and improving metadata practices, substantial challenges remain. Addressing them will require continued interdisciplinary collaboration and innovative use of emerging technologies to realize the full potential of FAIR data in modern research.
Organisers
Belgium
Matthew Evans (UCLouvain) - Organiser
Germany
Simon Bekemeier (Bundesanstalt für Materialforschung und -prüfung (BAM)) - Organiser
Sebastian Brückner (IKZ Berlin and HU Berlin) - Organiser
Switzerland
Edan Bainglass (Paul Scherrer Institut) - Organiser
Caterina Barillari (ETH Zurich) - Organiser
United Kingdom
Samantha Pearman-Kanza (University of Southampton) - Organiser