Extended Software Development Workshop: Scaling Electronic Structure Applications
- Nick Papior (DTU Nanotech, Denmark)
- Yann Pouillon (University of Cantabria, Spain)
- Micael Oliveira (Max Planck Institute for the Structure and Dynamics of Matter, Hamburg, Germany)
- Fabiano Corsetti (Synopsys QuantumWise, Denmark)
- Volker Blum (Duke University, Durham, NC, USA)
- Emilio Artacho (Cavendish Laboratory, University of Cambridge, United Kingdom)
The evolutionary pressure on electronic structure software development is greatly increasing, due to the emergence of new paradigms, new kinds of users, new processes, and new tools. The large feature-rich codes that were once developed within one field are now undergoing heavy restructuring to reach much broader communities, including companies and non-scientific users. More and more use cases and workflows are performed by highly automated frameworks instead of humans: high-throughput calculations and computational materials design, large data repositories, and multiscale/multi-paradigm modeling, for instance. At the same time, High-Performance Computing Centers are paving the way to exascale, with a cascade of effects on how to operate, from computer architectures to application design. The disruptive paradigm of quantum computing is also putting a big question mark on the relevance of all the ongoing efforts.
All these trends are highly challenging for the electronic structure community. Computer architectures have become rapidly moving targets, forcing a global paradigm shift. As a result, well-established but long-ignored good software practices, summarised in the Agile Manifesto nearly 20 years ago, are now being adopted at an accelerating pace by more and more software projects. With time, this kind of migration is becoming a question of survival, the key to a successful transformation being to enable and preserve enhanced collaboration between the increasing number of disciplines involved. Significant integration efforts from code developers are also necessary, since both hardware and software paradigms have to change at once.
Two major issues are also coming from the community itself. Hybrid developer profiles, with people fluent both in computational and scientific matters, are still difficult to find and retain. In the long run, the numerous ongoing training initiatives will gradually improve the situation, while in the short run, the issue is becoming more salient and painful, because the context evolves faster than ever. Good practices have usually been the first element sacrificed in the "publish or perish" race. New features have usually been bound to the duration of a post-doc contract and have been left undocumented and poorly tested, favoring the unsustainable "reinventing the wheel" syndrome.
Addressing these issues requires coordinated efforts at multiple levels:
- from a methodological perspective, mainly through the creation of open standards and the use of co-design, both for programming and for data;
- regarding documentation, with a significant leap in content policies, helped by tools like Doxygen and Sphinx, as well as publication platforms like ReadTheDocs;
- for testing, by introducing test-driven development concepts and systematically publishing test suites together with software;
- considering deployment, by creating synergies with popular software distribution systems;
- socially, by disseminating the relevant knowledge and training the community, through the release of demonstrators and giving all stakeholders the opportunity to meet regularly.
This is what the Electronic Structure Library (ESL) has been doing since 2014, through a wiki, a data-exchange standard, the refactoring of code of global interest into integrated modules, and regularly organised workshops, within a wider movement led by the European eXtreme Data and Computing Initiative (EXDCI).
Since 2014, the Electronic Structure Library has been steadily growing and developing to cover most fundamental tasks required by electronic structure codes. In February 2018 an extended software development workshop will be held at CECAM-HQ with the purpose of building demonstrator codes providing powerful, non-trivial examples of how the ESL libraries can be used. These demonstrators will also provide a platform to test the performance and usability of the libraries in an environment as close as possible to real-life situations. This marks a milestone and enables the next step in the ESL development: going from a collection of libraries with a clear set of features and stable interfaces to a bundle of highly efficient, scalable and integrated implementations of those libraries.
Many libraries developed within the ESL perform low-level tasks or very specific steps of more complex algorithms and are not capable, by themselves, of reaching exascale performance. Nevertheless, if they are to be used as efficient components of exascale codes, they must provide some level of parallelism and be as efficient as possible on a wide variety of architectures. During this workshop, we propose to perform advanced performance and scalability profiling of the ESL libraries. With that knowledge in hand, it will be possible to select and implement the best strategies for parallelizing and optimizing the libraries. Assistance from HPC experts will be essential and is a unique opportunity to foster collaborations with other Centres of Excellence, like PoP (https://pop-coe.eu/) and MaX (http://www.max-centre.eu/).
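As an illustration of what a basic scalability analysis reports, the sketch below computes strong-scaling speedup and parallel efficiency from measured wall times. It is a minimal example under stated assumptions, not a description of the profiling tools that will be used at the workshop: the function name scaling_table and the timing values are hypothetical and are not taken from any ESL library.

```python
def scaling_table(timings):
    """Return (processes, speedup, efficiency) rows for a strong-scaling run.

    `timings` maps a process count to the measured wall time in seconds
    for the same fixed problem size. The run with the fewest processes
    is used as the reference.
    """
    base_procs = min(timings)
    base_time = timings[base_procs]
    rows = []
    for procs in sorted(timings):
        # Speedup relative to the reference run.
        speedup = base_time / timings[procs]
        # Efficiency is 1.0 when the speedup matches the increase in processes.
        efficiency = speedup * base_procs / procs
        rows.append((procs, speedup, efficiency))
    return rows


# Hypothetical wall times (seconds), for illustration only.
example_timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}

for procs, speedup, efficiency in scaling_table(example_timings):
    print(f"{procs:4d} processes: speedup {speedup:5.2f}, efficiency {efficiency:5.2f}")
```

An efficiency that drops quickly as the process count grows indicates where a library may need a different parallelization strategy before it can serve as a component of a larger code.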
Based on the successful experience of the previous ESL workshops, we propose to divide the workshop into two parts. The first two days will be dedicated to initial discussions between the participants and other invited stakeholders, and to presentations on state-of-the-art methodological and software developments, performance analysis and scalability of applications. The remainder of the workshop will consist of a 12-day coding effort by a smaller team of experienced developers. Both the discussion and software development will take advantage of the ESL infrastructure (wiki, gitlab, etc.) that was set up during the previous ESL workshops.
 See http://www.nanogune.eu/es/projects/spanish-initiative-electronic-simulations-thousands-atoms-codigo-abierto-con-garantia-y and
 See http://pymatgen.org/ and http://www.aiida.net/ for example.
 https://arxiv.org/pdf/1405.4464.pdf (sustainable software engineering)
 Several long-running projects routinely use modern bug trackers and continuous integration, e.g.: http://gitlab.abinit.org/, https://gitlab.com/octopus-code/octopus, http://qe-forge.org/, https://launchpad.net/siesta
 Transition of HPC Towards Exascale Computing, Volume 24 of Advances in Parallel Computing, E.H. D'Hollander, IOS Press, 2013, ISBN: 9781614993247
 See https://en.wikipedia.org/wiki/Open_standard and https://en.wikipedia.org/wiki/Participatory_design
 See http://www.doxygen.org/, http://www.sphinx-doc.org/, and http://readthedocs.org/
 See https://en.wikipedia.org/wiki/Test-driven_development and http://agiledata.org/essays/tdd.html
 See e.g. http://www.etp4hpc.eu/en/esds.html
 See e.g. https://easybuilders.github.io/easybuild/, https://github.com/LLNL/spack, https://github.com/snapcore/snapcraft, and https://www.macports.org/ports.php?by=category&substr=science