Exploiting heterogeneous multi-core and many-core platforms for atomic and molecular simulations
- Mike Ashworth (STFC Daresbury Laboratory, United Kingdom)
- Godehard Sutmann (Research Center Jülich, Germany)
For many years the speed of high-end computer systems used for atomic and molecular simulations has increased following Moore’s Law (doubling roughly every eighteen months) simply through the increased speed of the hardware. This has been principally through increases in clock speed and in instruction-level parallelism (ILP): the number of operations per clock cycle. Thus, the same programs have delivered increased scientific output with little need for software development. In the last few years, clock speed has peaked at around 2-3 GHz due to limitations in chip power densities, and further increases in ILP are not expected. Computer system performance is still increasing, and is still following Moore’s Law, but now entirely due to massive increases in parallelism. This requires a software revolution.
Driven by power constraints, computer systems are now built from many-core CPUs with increasing numbers of cores: 16-32 per node is typical. In addition, heterogeneous nodes are available with GPU or Intel Xeon Phi accelerators; the former contain hundreds of simple cores and the latter currently around 60 cores. These core counts are all expected to increase in future generations. Looking forward to Exascale systems, it is predicted that tens or hundreds of millions of threads of execution will be required to exploit such a system.
Programming models have been relatively static over the last two decades: the combination of a serial language (Fortran or C) with the MPI message-passing library has enabled the exploitation of massively parallel systems with up to 100,000 cores. With fatter many-core nodes, it is becoming critically important to introduce an additional level of parallelism at the thread level. Unfortunately, there is no single programming model for this, and codes are now being written using OpenMP, CUDA, OpenCL and OpenACC while retaining distributed-memory parallelism using MPI.
The purpose of the workshop on “Exploiting multi-core and many-core platforms for atomic and molecular simulations” is to provide a forum for open discussion to better understand the programming models and strategies that will be needed for the new generation of atomic, molecular, physics, chemistry and materials simulations on current and near-future multi-core and many-core platforms. Participants will present and discuss their most recent findings in an informal open format.
A large number of papers have appeared over the last five years or so describing the use of GPUs and, more recently, Intel Xeon Phi (MIC) processors to accelerate simulations of atomic and molecular systems. In many cases this requires a partial or complete redesign of the code and the algorithms used. Many of these activities have been successful in demonstrating performance improvements, sometimes dramatic, though in some cases only for a limited part of the full functionality of the codes. Thus, the use of these emerging architectures is still a somewhat specialist activity and has not yet reached the mainstream for production scientific research.
Please register at the STFC event page, which also includes details about local arrangements: https://eventbooking.stfc.ac.uk/news-events/many-core-platforms
Qiang Wu, Canqun Yang, Tao Tang, Liquan Xiao, MIC acceleration of short-range molecular dynamics simulations, COSMIC '13 Proceedings of the First International Workshop on Code Optimisation for Multi- and Many-Cores, Article No. 2, ACM New York, NY, USA, 2013
Samuli Hakala, Ville Havu, Jussi Enkovaara, Risto Nieminen, Parallel Electronic Structure Calculations Using Multiple Graphics Processing Units (GPUs), Applied Parallel and Scientific Computing, Lecture Notes in Computer Science Volume 7782, 2013, pp. 63-76
R.M. Caplan, NLSEmagic: Nonlinear Schrödinger equation multi-dimensional Matlab-based GPU-accelerated integrators using compact high-order schemes, Computer Physics Communications Volume 184, Issue 4, April 2013, pp. 1250-1271
Stuart D. C. Walsh and Martin O. Saar, Developing Extensible Lattice-Boltzmann Simulators for General-Purpose Graphics-Processing Units, Commun. Comput. Phys., Vol. 13, No. 3, pp. 867-879, March 2013
Andreas W. Götz, Mark J. Williamson, Dong Xu, Duncan Poole, Scott Le Grand, and Ross C. Walker, Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs: 1. Generalized Born, J Chem Theory Comput. 2012 May 8; 8(5) pp. 1542-1555
Joshua A. Anderson, Chris D. Lorenz, A. Travesset, General purpose molecular dynamics simulations fully implemented on graphics processing units, Journal of Computational Physics, Volume 227, Issue 10, 1 May 2008, pp. 5342-5359