Scientific computing is becoming more and more important in university curricula. In practice this involves teaching numerical methods, the implementation of algorithms, and programming practice. In order to work on modern computational challenges and to compete with other groups at an international level, students should learn about parallel programming at an early stage of their career, i.e. before entering the master's or PhD level.
Learning about the possibilities and capabilities of parallel computing at this stage gives students a wide perspective and opens interesting paths for their scientific career. It is therefore necessary not only to teach the theory of parallel computing but also to provide access to modern HPC architectures and to give students the opportunity to gain experience in practical parallel programming. This also produces young academics who are well prepared to work on a thesis in compute-intensive fields.
There are two main directions in parallel computing, both of which have their pros and cons. The first is based on a distributed memory model, which requires explicit communication between processes in order to share data. The standard approach is the use of MPI (Message Passing Interface). The second is based on a shared memory model, where processes or threads can access each other's data via a global address space. The standard approach for this programming model is OpenMP. The latter approach, however, is only suited to multi-core architectures, where the cores have access to the node's memory. It is therefore usually applied to parallelism with a small number of cores (O(10)). Both approaches are sometimes combined into a hybrid programming model, where OpenMP is used within the nodes and MPI between the nodes. Depending on the application, this sometimes increases the range of scalability.
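The shared memory model described above can be illustrated with a minimal OpenMP sketch in C. The function name `shared_memory_sum` is hypothetical and chosen only for this illustration; it is not taken from the course material. All threads read the same array through the shared address space, and the `reduction` clause combines their private partial sums without any explicit communication.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (name is an assumption for this sketch):
 * sums n values with a shared-memory parallel loop. */
double shared_memory_sum(const double *values, int n)
{
    /* An empty array may be passed as NULL; otherwise data is required. */
    assert(n >= 0);
    assert(values != NULL || n == 0);

    double sum = 0.0;

    /* Every thread reads `values` directly from the node's memory.
     * Each thread accumulates a private copy of `sum`; OpenMP adds the
     * private copies together when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += values[i];
    }

    return sum;
}
```

With GCC or Clang the sketch would typically be compiled with `-fopenmp`; without that flag the pragma is ignored and the loop simply runs serially. In the distributed memory model the same operation needs explicit communication: each process would sum its own local part of the data and the partial results would then be combined with a call such as `MPI_Reduce`.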
There are also extensions of parallel languages and interfaces aimed at new hardware architectures. For example, the standard for GPGPU architectures is CUDA, and for Cell/BE it is CellSs, although generalizations are in progress, namely OpenCL and StarSs, which will be available for different architectures and might evolve into new standards.
The programme starts with an introductory course on the techniques of parallel computing and the use of the Juelich supercomputers (Jugene - IBM Blue Gene/P and Juropa - Bull/Nehalem cluster). The course consists of lectures and practical hands-on sessions. Each student is then assigned to a scientific core group and works on a project within the context of that group's ongoing research interests. Ideally, the student already states in the application which group they would like to work in. During the programme the student is supervised by a researcher. At the end of the programme a 2-day seminar is organized in which each student gives a presentation of about 30 minutes on their work. In addition, each student prepares a final report (~10 pages), which is published as a technical report [8-18].