Open Databases Integration for Materials Design
Location: CECAM-FR-RA
Organisers
The traditional approach to designing custom materials for specific applications is typically trial-and-error. Researchers heavily rely on intuition and expertise to suggest compounds that are subsequently synthesized and tested to ascertain if they have the desired properties. This iterative process can be both time-consuming and expensive, often necessitating numerous rounds of experimentation to achieve the target material characteristics. Recognizing the limitations and inefficiencies of this approach, President Obama introduced the Materials Genome Initiative on June 24, 2011, with the primary objective of fostering and prioritizing cutting-edge research and development within the United States of America. This initiative aimed to revolutionize the way materials are designed and developed, moving away from traditional methods towards more innovative and efficient strategies.
In recent years, materials design has advanced significantly, driven by increased computational power and the development of sophisticated electronic structure codes. These developments enable researchers to carry out complex calculations with unprecedented speed and accuracy. Automating these calculations has made high-throughput ab initio computation a powerful tool in materials research, allowing vast numbers of materials to be screened rapidly for specific desired properties [1,2]. The results of these high-throughput calculations are curated in extensive databases containing the properties of both existing and hypothetical materials. Recently, open-domain databases accessible to the scientific community have proliferated. By leveraging these databases, researchers can efficiently search for materials that exhibit specific characteristics, streamlining the materials design process and minimizing their reliance on traditional trial-and-error methods. Moreover, the data stored in these databases can be used to develop predictive machine learning models that enhance the efficiency and accuracy of materials design. There has also been increasing interest and engagement in materials discovery from industrial machine learning labs, which can leverage the extreme scale of their computational resources to create large, high-quality datasets. With initiatives from Meta, DeepMind and Microsoft Research, among others [3-5], it is crucial that the outputs of such initiatives can be integrated into the community in an interoperable way, with OPTIMADE providing a clear “unified front” to which these labs can contribute.
The experimental community has also been actively involved in creating databases that contain material properties [6-11]. These databases vary in accessibility, with some being openly available like those offered by the National Institute of Standards and Technology, while others are commercially available. Due to the abundance of materials databases, it is not feasible to compile a comprehensive list, but specific initiatives provide links to a wide range of databases (see References [12] and [13] for more information).
The integration of computational tools and materials databases has not only accelerated the pace of materials discovery but has also facilitated collaboration and knowledge sharing within the research community. As researchers continue to harness the power of computational tools and data-driven approaches, the landscape of materials design is poised for further innovation and advancement. The synergy between computational modeling, experimental validation, and data sharing is driving a new era of materials research that holds great promise for the development of novel materials with tailored properties and functionalities.
Recent initiatives to accelerate materials science research, often at the national level, have been funded in Europe and around the world to develop numerical and data infrastructures. These infrastructures provide a suite of open-source codes and workflows, including adaptable AI tools, and allow existing or newly developed codes and workflows to be made available to the materials community. The description scales they cover range from ab initio electronic structure calculations and atomistic calculations with machine-learning potentials to finite elements, materials properties, and thermodynamics. Close collaboration between the OPTIMADE consortium and these infrastructures adds great value by allowing large databases to be exploited in an automated way through workflows. Given the diversity of approaches, the question of interoperable workflows is of utmost importance.
Despite these advancements, the landscape of materials databases has long remained fragmented. While some databases offer a Representational State Transfer (REST) Application Programming Interface (API) for interaction (see e.g. Refs. [14-16]), the lack of standardized protocols makes it challenging to access curated materials data on a large scale. To address this issue, the OPTIMADE consortium [17] was established to develop a common API through which all materials databases can be accessed. To streamline data access and sharing, it brings together a growing number of developers and maintainers of leading databases:
- AFLOW distributed materials property repository: http://aflow.org
- Alexandria: https://alexandria.icams.rub.de
- BioExcel: https://bioexcel-cv19.bsc.es
- ChemDataExtractor in Cambridge: http://chemdataextractor.org
- Computational Materials Repository: http://cmr.fysik.dtu.dk
- Crystallography Open Database: http://www.crystallography.net/cod
- JARVIS: https://jarvis.nist.gov/optimade/jarvisdft
- Materials Cloud: https://materialscloud.org
- Materials Platform for Data Science: http://mpds.io
- Materials Project: http://materialsproject.org
- Materials Properties Open Database: http://mpod.cimav.edu.mx
- MPDD: The Material-Property-Descriptor Database: https://mpdd.org/
- MPDD-OPTIMADE Interface: http://mpddoptimade.phaseslab.org/
- Matterverse: https://matterverse.ai/
- NOMAD: https://nomad-lab.eu/prod/rae/gui/search
- Open Quantum Materials Database: http://oqmd.org
- Open Materials Database: http://openmaterialsdb.se
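To illustrate what a common API buys in practice, the sketch below builds the same structures query for two providers using only the Python standard library. The `/v1/structures` endpoint and the `filter` and `page_limit` query parameters follow the OPTIMADE specification; the two base URLs are placeholders (an assumption for illustration, not the real endpoints of any database listed above), and the helper names `build_structures_url` and `build_all` are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder base URLs (assumption): real OPTIMADE base URLs differ from
# the home pages listed above and should be taken from each provider.
PROVIDERS = {
    "provider_a": "https://provider-a.example.org/optimade",
    "provider_b": "https://provider-b.example.org/optimade/",
}

# An OPTIMADE filter: binary compounds containing both Si and O.
FILTER = 'elements HAS ALL "Si","O" AND nelements=2'


def build_structures_url(base_url: str, filter_expr: str, page_limit: int = 10) -> str:
    """Return a /v1/structures query URL for an OPTIMADE-style base URL."""
    # urlencode percent-encodes quotes, commas and '=' inside the filter.
    query = urlencode({"filter": filter_expr, "page_limit": page_limit})
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


def build_all(filter_expr: str = FILTER) -> dict:
    """Build the same query for every provider in PROVIDERS."""
    return {
        name: build_structures_url(base, filter_expr)
        for name, base in PROVIDERS.items()
    }


if __name__ == "__main__":
    for name, url in build_all().items():
        print(name, url)
```

Only the base URL changes from one provider to the next; the filter string and query parameters stay the same, which is the point of a shared specification.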
Through the OPTIMADE initiative, a community has been established through eight workshops. This community drives the development of the OPTIMADE API, establishes future plans to broaden the community, and enhances the API's scope. Discussions held during these workshops, at monthly virtual meetings, and via a community mailing list have led to the release of three stable versions (v1.0, v1.1, and v1.2) of the OPTIMADE API specifications [18,19]. All the databases mentioned have effectively integrated the OPTIMADE API, providing scientists with instant access to a wide array of materials data.
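Because every implementation returns responses in the same JSON layout, client code for reading results can be shared across databases. The following minimal sketch parses a hand-written sample response; the entries and their `id` values are invented for illustration and do not come from any real database. The top-level `data`, `links` and `meta` fields and the `chemical_formula_reduced` attribute follow the OPTIMADE specification, while the `summarize` helper is hypothetical.

```python
import json

# Hand-written sample in the OPTIMADE response layout (illustrative only).
SAMPLE_RESPONSE = json.loads("""
{
  "data": [
    {"id": "example-1", "type": "structures",
     "attributes": {"chemical_formula_reduced": "O2Si", "nelements": 2}},
    {"id": "example-2", "type": "structures",
     "attributes": {"chemical_formula_reduced": "O2Ti", "nelements": 2}}
  ],
  "links": {"next": null},
  "meta": {"data_returned": 2, "more_data_available": false}
}
""")


def summarize(response: dict):
    """Return (id, reduced formula) pairs and whether more pages exist."""
    rows = [
        (entry["id"], entry.get("attributes", {}).get("chemical_formula_reduced"))
        for entry in response.get("data", [])
    ]
    # meta.more_data_available signals that further pages can be fetched.
    more = bool(response.get("meta", {}).get("more_data_available", False))
    return rows, more


if __name__ == "__main__":
    rows, more = summarize(SAMPLE_RESPONSE)
    for entry_id, formula in rows:
        print(entry_id, formula)
    print("more pages:", more)
```

For real use, the optimade-python-tools package offers a maintained client and server implementation; this sketch only shows the shape of the data.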
In 2021, the OPTIMADE specifications were published as a research paper in the peer-reviewed journal Scientific Data [20], spurring increased adoption and use of the API. This momentum led to the recent publication of a second research paper in Digital Discovery [21]. Additionally, a dedicated website [17] has been launched, featuring a mailing list, a wiki, and a GitHub repository. The optimade-python-tools module [22] was also created and is now available as the official reference implementation for Python servers, simplifying the onboarding process for new OPTIMADE adopters. Several online tutorials [23-25] have been conducted, with one hybrid tutorial held at CECAM in 2023 [26].
References
Belgium
Gian-Marco Rignanese (Université catholique de Louvain) - Organiser
France
Noel Jakse (Université Grenoble Alpes) - Organiser
Germany
Rickard Armiento (University of Bayreuth) - Organiser
Tilmann Hickel (MPIE) - Organiser
Lithuania
Saulius Gražulis (Vilnius University Life Science Center Institute of Biotechnology) - Organiser
Switzerland
Giovanni Pizzi (Paul Scherrer Institute PSI) - Organiser
United Kingdom
Gareth Conduit (University of Cambridge) - Organiser
Matthew Evans (University of Cambridge) - Organiser
United States
Zi-Kui Liu (Pennsylvania State University) - Organiser
