Open Databases Integration for Materials Design
CECAM-HQ-EPFL, Lausanne, Switzerland
Trial and error is the traditional approach to designing bespoke materials for specific applications. Typically, researchers use their intuition and experience to propose compounds, which are then synthesized and tested to ascertain whether they fulfill the target properties. It usually takes months to test a single compound, and the outcome is most often negative, so many cycles of trial and improvement are required to finalize the material. The typical research project is therefore complex, time-consuming, and costly. In a speech on June 24, 2011, President Obama announced the Materials Genome Initiative as a major US research priority to foster innovation more effectively.
The last decade has witnessed game-changing improvements in materials design. The exponential growth of computer power and the development of robust first-principles electronic structure codes make it possible to perform large sets of calculations automatically. This has provided a launchpad for the flourishing field of high-throughput (HT) ab initio computations. The concept is simple yet powerful. The results of HT calculations are curated in large databases (DBs) that record the computed properties of existing and hypothetical materials. These DBs can then be screened for materials with the desired properties, removing much of the guesswork from materials design. Furthermore, these DBs can be used to build machine learning models that can design materials. Thanks to such HT calculations, many open-domain DBs have emerged. In parallel, the experimental community has continued to develop its own DBs of material properties. Some of these experimental DBs are open (e.g. at the National Institute of Standards and Technology), whilst others are marketed by companies. Given the number of available materials DBs, it is impossible to be exhaustive, but some initiatives provide links to as many as possible (see e.g., Refs.  and ).
Nonetheless, the landscape of materials DBs is fragmented. In some cases, a Representational State Transfer (REST) Application Programming Interface (API) is available to interrogate the DB through scripts, though it is not always documented. Until recently, however, it was only possible to interrogate one DB at a time, and the APIs varied from one DB to another. Furthermore, the lack of standards makes it complicated to access large-scale curated materials data. Flexible, uniform, computer-readable data standards should be established to enable data to be shared and systematically mined.
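To make the fragmentation problem concrete, the following minimal Python sketch shows the kind of per-database adapter code a script needs when every DB returns records in its own shape. Everything in it is an assumption made for illustration: the two sources ("db_a" and "db_b"), their field names, and their units are hypothetical and do not correspond to any real database.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class MaterialRecord:
    """Common, hypothetical schema used once records have been normalized."""
    source: str
    formula: str
    band_gap_ev: float


def from_db_a(raw: dict) -> MaterialRecord:
    # Hypothetical layout: flat record, band gap already in eV.
    return MaterialRecord("db_a", raw["formula"], float(raw["gap_eV"]))


def from_db_b(raw: dict) -> MaterialRecord:
    # Hypothetical layout: nested properties, band gap in meV.
    gap_ev = raw["properties"]["bandgap_meV"] / 1000.0
    return MaterialRecord("db_b", raw["composition"], gap_ev)


# One adapter per source: this is the boilerplate a shared standard removes.
ADAPTERS: Dict[str, Callable[[dict], MaterialRecord]] = {
    "db_a": from_db_a,
    "db_b": from_db_b,
}


def normalize(source: str, raw_records: List[dict]) -> List[MaterialRecord]:
    """Convert raw records from one source into the common schema."""
    adapter = ADAPTERS[source]
    return [adapter(raw) for raw in raw_records]


def screen(records: List[MaterialRecord], min_gap: float, max_gap: float) -> List[MaterialRecord]:
    """Keep records whose band gap lies in [min_gap, max_gap] (in eV)."""
    return [r for r in records if min_gap <= r.band_gap_ev <= max_gap]


# Made-up sample data, for illustration only.
RAW_A = [{"formula": "X2Y", "gap_eV": 0.4}, {"formula": "XY", "gap_eV": 1.5}]
RAW_B = [{"composition": "XZ3", "properties": {"bandgap_meV": 2900.0}},
         {"composition": "YZ", "properties": {"bandgap_meV": 1200.0}}]

all_records = normalize("db_a", RAW_A) + normalize("db_b", RAW_B)
candidates = screen(all_records, min_gap=1.0, max_gap=2.0)
```

Each additional DB requires another adapter, and each adapter must track changes in that DB's interface, which is the maintenance burden a uniform standard is meant to remove.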
To take advantage of these disparate databases, the OPTIMADE consortium was formed with the goal of creating a common API that can access all materials databases. The consortium has gathered the developers and maintainers of the following DBs:
- AFLOW distributed materials property repository: http://aflow.org
- ChemDataExtractor in Cambridge: http://chemdataextractor.org
- Materials Cloud: http://materialscloud.org
- Materials Project: http://materialsproject.org
- NOMAD (Novel Materials Discovery) Archive: https://nomad-lab.eu/prod/rae/gui/search
- Open Quantum Materials Database: http://oqmd.org
- Computational Materials Repository: http://cmr.fysik.dtu.dk
- Open Materials Database: http://openmaterialsdb.se
- Crystallography Open Database: http://www.crystallography.net/cod
Under the OPTIMADE banner, six workshops have built a community that develops the OPTIMADE API and aims to expand further in the future. Thanks to the discussions that took place during the workshops and through the mailing list, the first stable version (v1.0) of the OPTIMADE API specification has been released [7,8]. All of these databases have now implemented the OPTIMADE API, providing scientists with immediate access to a wealth of materials data.
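As a rough illustration of what a shared API makes possible, the sketch below builds the same structures query for several providers and reads formulas out of a response. It relies on the OPTIMADE v1 conventions of a `/v1/structures` endpoint, a `filter` query parameter, and a JSON response with a top-level `data` list. The base URLs are placeholders rather than real provider addresses, the sample response is hand-written, and no network request is made.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Placeholder base URLs (assumptions): real addresses must be looked up
# in each provider's own documentation.
PROVIDERS: Dict[str, str] = {
    "provider_one": "https://example.org/optimade",
    "provider_two": "https://example.com/optimade",
}

# OPTIMADE filter: binary compounds that contain both Si and O.
FILTER = 'elements HAS ALL "Si","O" AND nelements=2'


def build_structures_url(base_url: str, filter_expr: str, page_limit: int = 10) -> str:
    """Return a /v1/structures query URL with the filter URL-encoded."""
    query = urlencode({"filter": filter_expr, "page_limit": page_limit})
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


def extract_formulas(response: dict) -> List[str]:
    """Collect chemical_formula_reduced from each entry in the response."""
    return [
        entry["attributes"]["chemical_formula_reduced"]
        for entry in response.get("data", [])
    ]


# The same filter string is reused unchanged for every provider.
urls = {name: build_structures_url(base, FILTER) for name, base in PROVIDERS.items()}

# Hand-written sample in the shape of an OPTIMADE response (not real data).
SAMPLE_RESPONSE = {
    "data": [
        {
            "type": "structures",
            "id": "example-1",
            "attributes": {"chemical_formula_reduced": "O2Si"},
        }
    ]
}
formulas = extract_formulas(SAMPLE_RESPONSE)
```

In practice, each URL in `urls` would be fetched with an HTTP client and the decoded JSON passed to `extract_formulas`. The point of the sketch is that neither the query nor the parsing code changes from one provider to the next.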
A paper describing the API was recently published in the peer-reviewed journal Scientific Data, which should act as a launchpad for future adoption and usage of the API. Furthermore, a website has been set up with a mailing list, a wiki, and a GitHub repository. The optimade-python-tools package has been developed and released as the reference implementation for Python servers, lowering the barrier for new OPTIMADE adopters. A few tutorials have also been organized online [11,12].
S. Curtarolo et al., Nat. Mater. 12, 191 (2013); N. Marzari, Nat. Mater. 15, 381 (2016).
K.H. Hellwege and L.C. Green, Am. J. Phys. 35, 291 (1967); A. Belsky et al., Acta Cryst. B 58, 364 (2002); P. Villars et al., J. Alloys Compd. 367, 293 (2004); S. Gražulis et al., J. Appl. Cryst. 42, 726 (2009); Y. Xu, M. Yamazaki, and P. Villars, Jpn. J. Appl. Phys. 50, 11RH02 (2011); A. Zakutayev, Sci. Data 5, 180053 (2018).
See e.g. R.H. Taylor et al., Comput. Mater. Sci. 93, 178 (2014); S.P. Ong et al., Comput. Mater. Sci. 97, 209 (2015); F. Rose et al., Comput. Mater. Sci. 137, 362 (2017).
C. W. Andersen et al., The OPTIMADE Specification (Version 1.0). Zenodo (2020). http://doi.org/10.5281/zenodo.4195051
C. W. Andersen et al., Sci. Data 8, 217 (2021).
M. Evans et al., J. Open Source Softw. 6, 3458 (2021).
Gian-Marco Rignanese (Université catholique de Louvain) - Organiser
Markus Scheidgen (Humboldt-Universität zu Berlin) - Organiser
Saulius Gražulis (Vilnius University Life Science Center Institute of Biotechnology) - Organiser
Rickard Armiento (Linköping University) - Organiser
Giovanni Pizzi (PSI) - Organiser
Gareth Conduit (University of Cambridge) - Organiser
Cormac Toher (Duke University) - Organiser