Open Databases Integration for Materials Design
Location: CECAM-HQ-EPFL, Lausanne, Switzerland
Organisers
Designing bespoke materials for specific applications is a long, complex, and costly process. Researchers propose materials based on intuition and experience. Synthesising and evaluating those materials is expensive: testing a single new material can take months, and most often the outcome is negative. Researchers therefore have to undertake many cycles of trial and improvement before settling on a final material. In a speech on June 24, 2011, President Obama announced the Materials Genome Initiative as a major US research priority to foster innovation more effectively.
In the last decade, materials design has changed substantially. Thanks to the exponential growth of computer power and the development of robust first-principles electronic structure codes, it has become possible to perform large sets of calculations automatically. This is the flourishing field of high-throughput (HT) ab initio computations [1]. The concept, though simple, is powerful. The results of HT calculations are curated to create large databases (DBs) containing the calculated properties of existing and hypothetical materials. These DBs can then be exploited, either through intelligent interrogation or through machine learning, to direct researchers towards promising materials, thereby removing much of the guesswork from materials design. In this framework, various open-domain DBs have become available online. A similar trend has been observed in the experimental community, where some DBs are open (e.g. at the National Institute of Standards and Technology) whilst others are marketed by companies. There are now so many DBs that it has become impossible to cite them all, and some initiatives have been started to collect as many links to them as possible (see e.g. Refs. [3] and [4]).
The current materials data landscape is quite fragmented. Some DBs provide a Representational State Transfer (REST) Application Programming Interface (API) [4] that allows them to be interrogated through scripts, although it is not always documented. Until recently, however, it was only possible to interrogate one DB at a time, and the APIs varied from one DB to another. Furthermore, the lack of data standards in materials science makes it harder to gain insights from large-scale materials data. Flexible, uniform, computer-readable data standards should be established so that data can be shared and systematically mined.
To overcome this situation, the OPTIMADE consortium was created to define a single, common API. The consortium has gathered the developers and maintainers of the following DBs:
AFLOW distributed materials property repository: http://aflow.org
ChemDataExtractor in Cambridge: http://chemdataextractor.org
Materials Cloud: http://materialscloud.org
Materials Project: http://materialsproject.org
NOMAD (Novel Materials Discovery) Archive: https://nomad-lab.eu/prod/rae/gui/search
Open Quantum Materials Database: http://oqmd.org
Computational Materials Repository: http://cmr.fysik.dtu.dk
Open Materials Database: http://openmaterialsdb.se
Theoretical Crystallography Open Database: http://www.crystallography.net/tcod
Under the OPTIMADE banner, five workshops have created a community around the development of the OPTIMADE API, which is meant to be expanded further by involving more people. Thanks to the discussions that have taken place during the workshops and through the mailing list, the first stable version (v1.0) of the OPTIMADE API specification has been released [6,7]. All of these databases have now implemented the OPTIMADE API, providing scientists with ready access to a wealth of materials data.
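To make the idea of a single API more concrete, the sketch below builds the same structures query for two providers and reads the reduced formulas out of a response. It is a minimal illustration, not a reference client: the provider base URLs are hypothetical placeholders, the sample response is hand-written rather than fetched, and the helper names `build_structures_url` and `reduced_formulas` are introduced here for illustration only. The `/v1/structures` endpoint, the `filter` and `page_limit` query parameters, and the `chemical_formula_reduced` attribute are assumed to follow the v1.0 specification.

```python
import json
from urllib.parse import quote, urlencode


def build_structures_url(base_url, optimade_filter, page_limit=5):
    """Return a /v1/structures query URL for one provider (hypothetical helper)."""
    query = urlencode(
        {"filter": optimade_filter, "page_limit": page_limit},
        quote_via=quote,  # encode spaces as %20 rather than '+'
    )
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


def reduced_formulas(response_text):
    """Extract (id, chemical_formula_reduced) pairs from a JSON response body."""
    payload = json.loads(response_text)
    return [
        (entry["id"], entry.get("attributes", {}).get("chemical_formula_reduced"))
        for entry in payload.get("data", [])
    ]


# Hypothetical base URLs: replace with the real endpoints of the chosen providers.
PROVIDERS = [
    "https://provider-a.example.org/optimade",
    "https://provider-b.example.org/optimade/",
]

# One filter string, reused unchanged for every provider.
FILTER = 'elements HAS ALL "Si", "O" AND nelements=2'

urls = [build_structures_url(base, FILTER) for base in PROVIDERS]

# Hand-written sample body (an assumption, not real data from any database).
SAMPLE_RESPONSE = json.dumps(
    {
        "data": [
            {
                "type": "structures",
                "id": "example-1",
                "attributes": {"chemical_formula_reduced": "O2Si"},
            }
        ]
    }
)

formulas = reduced_formulas(SAMPLE_RESPONSE)

for url in urls:
    print(url)
print(formulas)
```

The point of the sketch is that only the base URL changes between providers; the filter string and the parsing code stay the same. Actually sending the requests (for example with `urllib.request.urlopen`) is left out so that the example runs without network access.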
A paper about the API has recently been accepted for publication in the peer-reviewed journal Scientific Data [8], which should act as a launchpad for future adoption and usage of the API. Furthermore, a website has been set up [5] with a mailing list, a wiki, and a GitHub repository. The optimade-python-tools package has been developed and released as the reference implementation for Python servers, lowering the barrier for new OPTIMADE adopters.
References
Belgium
Gian-Marco Rignanese (Université catholique de Louvain) - Organiser
Germany
Markus Scheidgen (Humboldt-Universität zu Berlin) - Organiser
Lithuania
Saulius Gražulis (Vilnius University Life Science Center Institute of Biotechnology) - Organiser
Sweden
Rickard Armiento (Linköping University) - Organiser
Switzerland
Giovanni Pizzi (Paul Scherrer Institute PSI) - Organiser
United Kingdom
Gareth Conduit (University of Cambridge) - Organiser
United States
Cormac Toher (University of Texas at Dallas) - Organiser