Open Databases Integration for Materials Design
Location: Vilnius, Lithuania
The traditional approach to designing materials for specific applications is largely trial and error. Researchers rely on intuition and expertise to propose compounds, which are then synthesized and tested for the desired properties. This iterative process is slow and expensive, often requiring many rounds of experimentation to reach the target characteristics. To address these inefficiencies, President Obama launched the Materials Genome Initiative on June 24, 2011, with the primary objective of fostering and prioritizing cutting-edge materials research and development in the United States. The initiative aimed to move materials design and development away from traditional methods towards more efficient strategies.
In recent years, materials design has advanced significantly, driven by increased computational power and the development of sophisticated electronic structure codes. Researchers can now run complex calculations faster and more accurately than before, and automating these calculations has made high-throughput ab initio computation a powerful tool for rapidly screening large numbers of materials for specific properties [1]. The results are curated in extensive databases that hold properties of both existing and hypothetical materials, and many such open databases are now accessible to the scientific community. Researchers can search these databases for materials with specific characteristics, streamlining the design process and reducing reliance on trial and error. The stored data can also be used to train predictive machine learning models that further improve the efficiency and accuracy of materials design. Together, computational tools and materials databases have accelerated materials discovery and facilitated collaboration and knowledge sharing within the research community. The combination of computational modeling, experimental validation, and data sharing holds great promise for the development of novel materials with tailored properties and functionalities.
The experimental community has also been active in creating databases of material properties [2]. These databases vary in accessibility: some are openly available, such as those offered by the National Institute of Standards and Technology, while others are commercial. Because materials databases are so numerous, compiling a comprehensive list is not feasible, but some initiatives maintain links to a wide range of them (see Refs. [3] and [4]).
Despite these advancements, the landscape of materials databases has long remained fragmented. While some databases offer a Representational State Transfer (REST) Application Programming Interface (API) [5], the lack of a standardized protocol makes it difficult to access curated materials data at scale. To address this issue, the OPTIMADE consortium [6] was established to develop a common API through which materials databases can be accessed. To streamline data access and sharing, it brings together a growing number of developers and maintainers of leading databases:
AFLOW distributed materials property repository: http://aflow.org
Alexandria: https://alexandria.icams.rub.de
BioExcel: https://bioexcel-cv19.bsc.es
ChemDataExtractor in Cambridge: http://chemdataextractor.org
Computational Materials Repository: http://cmr.fysik.dtu.dk
Crystallography Open Database: http://www.crystallography.net/cod
JARVIS: https://jarvis.nist.gov/optimade/jarvisdft
Materials Cloud: http://materialscloud.org
Materials Platform for Data Science: http://mpds.io
Materials Project: http://materialsproject.org
Materials Properties Open Database: http://mpod.cimav.edu.mx
Matterverse: https://matterverse.ai/
NOMAD: https://nomad-lab.eu/prod/rae/gui/search
Open Quantum Materials Database: http://oqmd.org
Open Materials Database: http://openmaterialsdb.se
The OPTIMADE community has been built up over eight workshops. It drives the development of the OPTIMADE API, plans how to broaden the community, and extends the API's scope. Discussions at these workshops, at monthly virtual meetings, and on a community mailing list have led to the release of three stable versions (v1.0, v1.1, and v1.2) of the OPTIMADE API specification [7,8]. All the databases listed above have implemented the OPTIMADE API, giving scientists access to a wide array of materials data through a common interface.
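As an illustration of what a common API makes possible, the sketch below builds a query against the structures endpoint of one provider and summarizes a response. It is a minimal example under stated assumptions, not a reference client: the base URL is the JARVIS entry from the list above, the /v1/structures path, the filter and response_fields parameters, and the filter string follow our reading of the OPTIMADE specification, and the helper names (build_structures_url, summarize, fetch) and the sample response are hypothetical and written only for this example.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Base URL taken from the JARVIS entry in the list above. Other providers
# publish their own OPTIMADE base URLs, which may differ from their homepages.
JARVIS_BASE_URL = "https://jarvis.nist.gov/optimade/jarvisdft"


def build_structures_url(base_url, filter_expr, page_limit=5, response_fields=None):
    """Return a /v1/structures query URL for the given filter expression."""
    params = {"filter": filter_expr, "page_limit": page_limit}
    if response_fields:
        params["response_fields"] = ",".join(response_fields)
    # quote_via=quote percent-encodes spaces as %20 rather than '+'.
    query = urlencode(params, quote_via=quote)
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


def summarize(response):
    """Extract (id, reduced formula) pairs from the 'data' list of a response."""
    return [
        (entry["id"], entry["attributes"].get("chemical_formula_reduced"))
        for entry in response.get("data", [])
    ]


def fetch(url, timeout=30):
    """Perform the HTTP GET. Needs network access and a live provider."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Binary compounds containing both Si and O.
url = build_structures_url(
    JARVIS_BASE_URL,
    'elements HAS ALL "Si","O" AND nelements=2',
    response_fields=["chemical_formula_reduced", "nelements"],
)
print(url)

# Hand-written response fragment for illustration only; not real database output.
sample_response = {
    "data": [
        {
            "id": "example-1",
            "type": "structures",
            "attributes": {"chemical_formula_reduced": "O2Si", "nelements": 2},
        }
    ]
}
print(summarize(sample_response))
```

To run the query for real, call summarize(fetch(url)). Because the request format is shared, pointing build_structures_url at another provider's OPTIMADE base URL should in principle be the only change needed, although providers may differ in which optional fields they serve.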
In 2021, the OPTIMADE specification was published in the peer-reviewed journal Scientific Data [9], which spurred wider adoption of the API. A second paper has since been published in Digital Discovery [10]. A dedicated website [6] has also been launched, featuring a mailing list, a wiki, and a GitHub repository. The optimade-python-tools package [11] is available as the official reference implementation for Python servers, simplifying onboarding for new OPTIMADE adopters. Several online tutorials [12-14] have been held, along with one hybrid tutorial at CECAM in 2023 [15].
References
[1] S. Curtarolo et al., Nat. Mater. 12, 191 (2013); N. Marzari, Nat. Mater. 15, 381 (2016)
[2] K.H. Hellwege and L.C. Green, Am. J. Phys. 35, 291 (1967); A. Belsky et al., Acta Cryst. B 58, 364 (2002); P. Villars et al., J. Alloys Compd. 367, 293 (2004); S. Gražulis et al., J. Appl. Cryst. 42, 726 (2009); G. Pepponi et al., Nucl. Instr. Meth. B 284, 10 (2012); Y. Xu, M. Yamazaki, and P. Villars, Jpn. J. Appl. Phys. 50, 11RH02 (2011); A. Zakutayev, Sci. Data 5, 180053 (2018).
[3] https://github.com/tilde-lab/awesome-materials-informatics
[4] https://github.com/blaiszik/Materials-Databases
[5] see e.g. R.H. Taylor et al., Comput. Mater. Sci. 93, 178 (2014); S.P. Ong et al., Comput. Mater. Sci. 97, 209 (2015); F. Rose et al., Comput. Mater. Sci. 137, 362 (2017)
[7] http://www.optimade.org/optimade
[8] C. W. Andersen et al., The OPTIMADE Specification (Version 1.0). Zenodo (2020). http://doi.org/10.5281/zenodo.4195051
[9] C. W. Andersen et al., Sci. Data 8, 217 (2021). https://doi.org/10.1038/s41597-021-00974-z
[10] M. L. Evans et al., Digital Discovery (2024). https://doi.org/10.1039/D4DD00039K
[11] M. L. Evans et al., J. Open Source Softw. 6, 3458 (2021).
[12] https://th.fhi-berlin.mpg.de/meetings/nomad-tutorials/index.php?n=Meeting.Tutorial6
[13] https://www.youtube.com/watch?v=1OflR9qBP_A
Organisers
Belgium
Matthew Evans (UCLouvain) - Organiser
Gian-Marco Rignanese (Université catholique de Louvain) - Organiser
Germany
Joseph Rudzinski (Humboldt-Universität zu Berlin) - Organiser
Lithuania
Saulius Gražulis (Vilnius University Life Science Center Institute of Biotechnology) - Organiser
Sweden
Rickard Armiento (Linköping University) - Organiser
Switzerland
Giovanni Pizzi (Swiss Federal Institute of Technology Lausanne (EPFL)) - Organiser
United Kingdom
Gareth Conduit (University of Cambridge) - Organiser
United States
Zi-Kui Liu (Pennsylvania State University) - Organiser