Kompetenznetzwerk für wissenschaftliches Höchstleistungsrechnen in Bayern


OMI4papps: Optimization, Modeling and Implementation for highly parallel applications

Project summary

In the emerging era of multi-/many-core technologies, the computational speed of a specific application or numerical problem can be improved either by new numerical methods or by code optimization and parallelization. The latter approach often requires deep insight into computer architectures, optimization strategies, parallel programming models and parallel libraries. Most numerical research groups cannot keep track of the rapid changes brought by the advent of multi-/many-core technologies and require comprehensive high-level support to address the problems arising from massive parallelism, shared on-chip resources or ccNUMA data locality, to name only a few.

This KONWIHR-II project addresses those problems and is intended to provide a central HPC user support resource for HLRB and KONWIHR projects and for other research groups that rely on computationally intensive simulations. The project is hosted by the HPC support groups of the Regional Computing Center Erlangen (RRZE) and the Leibniz Supercomputing Centre (LRZ). These support groups have proven expertise in parallelizing and optimizing user applications, programming and evaluating new (highly parallel) architectures, optimizing data access, and establishing performance models for complete applications and numerical kernels.

KONWIHR funding and follow-up projects

  • OMI4papps is a follow-up project of RRZE’s cxHPC
  • OMI4papps is also a follow-up project of LRZ’s previous support project
  • KONWIHR funding of OMI4papps: 9/2008 – 8/2013


  • Dr. Matthias Brehm, LRZ-München
  • Prof. Dr. Gerhard Wellein, Regionales Rechenzentrum Erlangen, Uni-Erlangen

Project staff:

  • Dr. Jan Treibig, Regionales Rechenzentrum Erlangen, Uni-Erlangen
  • Dr. Volker Weinberg, LRZ-München

Publications and presentations

  • Momme Allalen, Ferdinand Jamitzky, Helmut Satzger: Real World Application Acceleration with GPGPUs, inSiDE, Vol. 8 No. 1 (2010). http://inside.hlrs.de/htm/Edition_01_10/article_13.html
  • H. Stüben, M. Allalen: Extreme Scaling of the BQCD Benchmark, Jülich Blue Gene/P Extreme Scaling Workshop 2010, Technical Report FZJ-JSC-IB-2010-03, (2010). http://www.fz-juelich.de/jsc/docs/printable/ib/ib-10/ib-2010-03.pdf
  • J. Treibig, G. Hager, G. Wellein: Multi-core architectures: Complexities of performance prediction and the impact of cache topology, KONWIHR/HLRB Springer volume 2010, Springer (Berlin, Heidelberg), (2010). Preprint
  • J. Treibig, G. Hager, G. Wellein: LIKWID: A lightweight performance-oriented tool suite for x86 multicore environments, accepted for First International Workshop on Parallel Software Tools and Tool Infrastructures, (2010). Preprint
  • J. Treibig, G. Wellein, G. Hager: Efficient multicore-aware parallelization strategies for iterative stencil computations, submitted to Journal of Computational Science (Ed: P.M.A. Sloot, P.V. Coveney, J. Dongarra), Elsevier, (2010). Preprint
  • J. Treibig, M. Meier, G. Hager, G. Wellein: LIKWID Performance Tools, inSiDE, Vol. 8 No. 1 (2010) 50-53. pdf
  • Volker Weinberg, Matthias Brehm, Iris Christadler: OMI4papps: Optimisation, Modelling and Implementation for Highly Parallel Applications, HLRB, KONWIHR and Linux-Cluster Review and Results Workshop, to be published by Springer, (2010). http://arxiv.org/abs/1001.1860
  • M. Wittmann, G. Hager, J. Treibig, G. Wellein: Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters, submitted, (2010). arXiv:1006.3148
  • Iris Christadler, Volker Weinberg: RapidMind: Portability across Architectures and its Limitations, Technical Report, LRZ Garching, (2009). http://arxiv.org/abs/1001.1902
  • Erbacci, Cavazzoni, Spiga, Christadler: Report on petascale software libraries and programming models, Report, PRACE Project, (2009). http://www.prace-project.eu/documents/public-deliverables/d6-6.pdf
  • J. Treibig, G. Hager: Introducing a Performance Model for Bandwidth-Limited Loop Kernels, Proceedings of the Workshop "Memory issues on Multi- and Manycore Platforms" at PPAM 2009, the 8th International Conference on Parallel Processing and Applied Mathematics, (2009). Preprint
  • G. Wellein, G. Hager, T. Zeiser, M. Wittmann, H. Fehske: Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization, Proceedings of COMPSAC 2009, the 33rd Annual IEEE International Computer Software and Applications Conference, Seattle, (2009). DOI:10.1109/COMPSAC.2009.82

See cxHPC for further publications and presentations related to the project and its forerunner at the RRZE.