Open Databases Integration for Materials Design
Location: CECAM-HQ-EPFL, Lausanne, Switzerland
The conventional method for designing materials for specific applications is based on trial and error. Researchers typically rely on intuition and expertise to suggest compounds, which are then synthesized and examined to determine whether they possess the target properties. Testing a single compound typically takes several months, and more often than not the results are unfavorable, so multiple rounds of experimentation are needed to arrive at the desired material. Consequently, research and development are intricate, time-intensive, and expensive. Recognizing these challenges, President Obama announced the Materials Genome Initiative on June 24, 2011, with the aim of prioritizing innovative research in the United States.
Over the past decade there have been groundbreaking advances in materials design. The significant increase in computer power and the creation of robust first-principles electronic structure codes have enabled the automation of extensive calculations. This breakthrough paved the way for the thriving field of high-throughput ab initio computations [1]. The strategy is simple yet powerful. The outcomes of high-throughput calculations are carefully curated in extensive databases (DBs) that document the computed properties of both existing and hypothetical materials. These DBs can then be screened to identify materials or chemicals that possess specific desired properties, eliminating the need for guesswork in materials and chemicals design. Moreover, these DBs can be leveraged to construct predictive machine learning models that can be harnessed for design. This potential motivated the development of numerous open-domain DBs. Simultaneously, the experimental community has actively developed its own DBs containing material properties [2]. Some of these DBs are openly accessible, such as those provided by the National Institute of Standards and Technology, while others are commercially marketed by companies. Given the multitude of available materials DBs, it is impractical to provide an exhaustive list, but certain initiatives offer links to as many DBs as possible (refer to Refs. [3] and [4], for example).
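The screening step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Entry` record, the two properties (band gap and energy above the convex hull), the thresholds, and the four placeholder records are all hypothetical and are not taken from any real database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One computed record, as a DB might store it (hypothetical schema)."""
    formula: str
    band_gap_ev: float           # computed band gap, in eV
    energy_above_hull_ev: float  # stability proxy, in eV/atom


def screen(entries, gap_range, max_hull):
    """Keep entries whose band gap lies in gap_range and that are stable enough."""
    low, high = gap_range
    return [
        e for e in entries
        if low <= e.band_gap_ev <= high and e.energy_above_hull_ev <= max_hull
    ]


# Placeholder records with made-up values, for illustration only.
DB = [
    Entry("compound-A", 0.0, 0.00),
    Entry("compound-B", 1.4, 0.02),
    Entry("compound-C", 1.6, 0.30),
    Entry("compound-D", 3.2, 0.01),
]

hits = screen(DB, gap_range=(1.0, 2.0), max_hull=0.05)
print([e.formula for e in hits])
```

In practice the records would come from one of the DBs rather than from a hard-coded list, and the filter criteria would depend on the target application.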
However, the materials database landscape remains fragmented. In some instances, a Representational State Transfer (REST) Application Programming Interface (API) is available [5] for interacting with the database through scripts, although the documentation is not always comprehensive. Until recently, it was only possible to query one database at a time, and the APIs differed from one database to another. In addition, the absence of a standardized protocol made it challenging to access curated materials data on a large scale. Flexible, uniform, and machine-readable data standards are crucial to facilitate data sharing and systematic data mining.
To address the issue of scattered databases, the OPTIMADE consortium [6] was established to create a comprehensive API capable of accessing all materials databases. The consortium brought together the developers and maintainers of the leading databases:
- AFLOW distributed materials property repository: http://aflow.org
- ChemDataExtractor in Cambridge: http://chemdataextractor.org
- Materials Cloud: http://materialscloud.org
- Materials Project: http://materialsproject.org
- NOMAD (Novel Materials Discovery) Archive: https://nomad-lab.eu/prod/rae/gui/search
- Open Quantum Materials Database: http://oqmd.org
- Computational Materials Repository: http://cmr.fysik.dtu.dk
- Open Materials Database: http://openmaterialsdb.se
- Crystallography Open Database: http://www.crystallography.net/cod
Under the OPTIMADE initiative, a community has formed over seven workshops, has developed the OPTIMADE API, and plans to expand both its membership and the scope of the API in the future. Through discussions held during these workshops, in monthly video calls, and on the mailing list, the first (v1.0) and second (v1.1) stable versions of the OPTIMADE API specification were released [7,8]. All the aforementioned databases have successfully implemented the OPTIMADE API, granting scientists immediate access to an extensive range of materials data.
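To illustrate what a common API means in practice, the following minimal sketch builds the same structures query for several providers. The `/v1/structures` endpoint, the `filter` and `page_limit` query parameters, and the `elements HAS ALL ...` filter syntax follow the OPTIMADE specification; the helper name `build_structures_url` and the two provider base URLs are hypothetical placeholders, not real endpoints.

```python
from urllib.parse import parse_qs, quote, urlencode, urlparse


def build_structures_url(base_url, elements, nelements=None, page_limit=10):
    """Return an OPTIMADE structures query URL for one provider (hypothetical helper)."""
    quoted = ", ".join(f'"{el}"' for el in elements)
    clauses = [f"elements HAS ALL {quoted}"]
    if nelements is not None:
        clauses.append(f"nelements={nelements}")
    # Percent-encode the filter string so it is safe inside a URL.
    query = urlencode(
        {"filter": " AND ".join(clauses), "page_limit": page_limit},
        quote_via=quote,
    )
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


# Placeholder base URLs: real providers publish their own OPTIMADE base URLs.
PROVIDERS = {
    "provider-a": "https://provider-a.example/optimade",
    "provider-b": "https://provider-b.example/optimade/",
}

# The same filter is reused unchanged for every provider.
urls = {
    name: build_structures_url(base, ["Si", "O"], nelements=2)
    for name, base in PROVIDERS.items()
}

for name, url in urls.items():
    decoded_filter = parse_qs(urlparse(url).query)["filter"][0]
    print(name, url)
    print("  filter:", decoded_filter)
```

The sketch only constructs URLs and sends no requests. Fetching and paginating results would need an HTTP client and handling of each provider's JSON response.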
A paper describing the API has been published in the peer-reviewed journal Scientific Data [9], which has catalyzed further adoption and use of the API. In addition, a website [6] has been established, featuring a mailing list, a wiki, and a GitHub repository. The optimade-python-tools library [10] has been developed and made available as the official reference implementation for Python servers, making it straightforward for new adopters of OPTIMADE to join the initiative. Several online tutorials [11,12] have been organized, and one hybrid tutorial took place at CECAM in 2023 [13]. Thanks to the work of Johan Bergsma, a postdoc at CECAM, the specification has been extended to trajectories. This extension will appear in the third (v1.2) stable version of the OPTIMADE API specification, which should be released in the coming months.
References
[1] S. Curtarolo et al., Nat. Mater. 12, 191 (2013); N. Marzari, Nat. Mater. 15, 381 (2016)
[2] K.H. Hellwege and L.C. Green, Am. J. Phys. 35, 291 (1967); A. Belsky et al., Acta Cryst. B 58, 364 (2002); P. Villars et al., J. Alloys Compd. 367, 293 (2004); S. Gražulis et al., J. Appl. Cryst. 42, 726 (2009); Y. Xu, M. Yamazaki, and P. Villars, Jpn. J. Appl. Phys. 50, 11RH02 (2011); A. Zakutayev, Sci. Data 5, 180053 (2018)
[3] https://github.com/tilde-lab/awesome-materials-informatics
[4] https://github.com/blaiszik/Materials-Databases
[5] see e.g. R.H. Taylor et al., Comput. Mater. Sci. 93, 178 (2014); S.P. Ong et al., Comput. Mater. Sci. 97, 209 (2015); F. Rose et al., Comput. Mater. Sci. 137, 362 (2017)
[7] http://www.optimade.org/optimade
[8] C. W. Andersen et al., The OPTIMADE Specification (Version 1.0). Zenodo (2020). http://doi.org/10.5281/zenodo.4195051
[9] C. W. Andersen et al., Sci. Data 8, 217 (2021). https://doi.org/10.1038/s41597-021-00974-z
[10] M. Evans et al., J. Open Source Softw. 6, 3458 (2021).
[11] https://th.fhi-berlin.mpg.de/meetings/nomad-tutorials/index.php?n=Meeting.Tutorial6
Organisers
Belgium
Gian-Marco Rignanese (Université catholique de Louvain) - Organiser
Germany
Markus Scheidgen (Humboldt-Universität zu Berlin) - Organiser
Lithuania
Saulius Gražulis (Vilnius University Life Science Center Institute of Biotechnology) - Organiser
Sweden
Rickard Armiento (Linköping University) - Organiser
Switzerland
Giovanni Pizzi (Paul Scherrer Institute PSI) - Organiser
United Kingdom
Gareth Conduit (University of Cambridge) - Organiser
United States
Cormac Toher (University of Texas at Dallas) - Organiser