Georg Hager's Blog

Random thoughts on High Performance Computing

ISC16 Workshop

2nd International Workshop on Performance Modeling: Methods and Applications (PMMA16)

at ISC High Performance 2016, Frankfurt, Germany, June 23, 2016

Location: Room “Gold 2”, Marriott Hotel

Time: 9:00am-1:00pm

Workshop Scope

Understanding the performance characteristics of computer programs has been the subject of intense research over several decades. The growing heterogeneity, parallelism, and general complexity of computer architectures have made this venture more and more challenging. Numerous approaches provide insights into different aspects of performance; they range from resource-based analytical modeling of single loops to complex statistical models or machine learning concepts that deal with whole groups of full applications in production environments. In recent years, the energy consumption aspects of computing have received attention due to rising infrastructure costs and operational boundary conditions, adding even more complexity to the model-building process.
This workshop aims to bring together experts in performance modeling and related fields, such as modeling tools, model-guided code optimization, systems characterization, and future architectures, to present their latest research. We explicitly want to address all layers of hardware from the core up to the exascale level, and all layers of software from the OS to large-scale applications.
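
To make the first end of that spectrum concrete: a minimal example of resource-based analytic modeling of a single loop is a simple Roofline estimate, which bounds performance by either the peak execution rate or the memory bandwidth times the loop's computational intensity. The sketch below uses assumed machine numbers (clock speed, FLOPs per cycle, bandwidth), not measurements of any particular system:

```python
# Minimal Roofline estimate for a streaming loop; all machine numbers
# are illustrative assumptions, not measurements.
P_peak = 2.3e9 * 16      # assumed peak: 2.3 GHz x 16 FLOPs/cycle per core (FLOP/s)
b_mem = 50e9             # assumed attainable memory bandwidth (bytes/s)

# Triad-like kernel a[i] = b[i] + s * c[i]: 2 FLOPs per iteration,
# 24 bytes of memory traffic (load b and c, store a; write-allocate ignored)
intensity = 2.0 / 24.0   # computational intensity (FLOP/byte)

predicted = min(P_peak, intensity * b_mem)  # Roofline: min of the two limits
print(f"predicted: {predicted / 1e9:.2f} GFLOP/s (core peak: {P_peak / 1e9:.1f})")
```

For a bandwidth-bound kernel like this triad, the model predicts roughly 4 GFLOP/s per core, far below the compute peak; more refined analytic models (e.g., ECM) also account for in-cache data transfers.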

Preliminary Agenda (see below for abstracts)

Time | Speaker | Title
9:00am-10:00am | Darren J. Kerbyson, Pacific Northwest National Laboratory | Keynote presentation: Integrating Prediction, Provenance, and Optimization into High Energy Physics Workflows
10:00am-10:20am | Marat Dukhan, Georgia Institute of Technology | Wanted: Floating-Point Add Round-off Error instruction
10:20am-10:40am | Stephan Eidenbenz, Los Alamos National Laboratory | Simulation-Based and Analytical Models for Energy Use Prediction
10:40am-11:00am | Danilo Guerrera, University of Basel | Reproducible Stencil Compiler Benchmarks Using PROVA!
11:00am-11:30am | Coffee Break |
11:30am-12:30pm | John D. McCalpin, Texas Advanced Computing Center | Keynote presentation: The Surprising Effectiveness of Non-Overlapping, Sensitivity-Based Performance Models
12:30pm-1:00pm | Harald Köstler, University of Erlangen-Nuremberg | Invited talk: Performance Engineering in the ExaStencils project

Keynote Talks

  • Darren J. Kerbyson: Integrating Prediction, Provenance, and Optimization into High Energy Physics Workflows

    Abstract: For a long time we have seen the benefit and impact that performance modeling can have in the optimization of applications and systems for scientific discovery. Models can be used early in design to explore alternatives and guide implementations. They can be used to explain deficiencies and hence guide optimizations. And they can be used to dynamically steer processing during execution. As we approach exascale levels of computation, the complexity of modeling, and its possible impact, increases. Additionally, we are seeing an upturn in the generation of large data volumes from signature instruments that require processing on large-scale, geographically distributed systems. One such example is the Belle II experiment, which is designed to probe the interactions of the fundamental constituents of our universe using the KEK accelerator in Tsukuba, and which will generate 25 petabytes of raw data per year. Over the course of the experiments, the necessary storage is expected to exceed 350 petabytes. Users, data, storage, and compute resources are geographically distributed across the world, creating a complex data-intensive workflow.

    In this talk we describe the approach we are taking to develop efficient execution and optimization methods for extreme-scale workflows on distributed resources, integrating provenance, performance modeling, and optimization-based scheduling. Its key components include modeling and simulation methods to quantitatively predict workflow component behavior, leveraging the PALM modeling tool at PNNL; optimized decision making, such as choosing an optimal subset of resources to meet demand, assigning tasks to resources, and placing data to minimize data movement; prototypical testbeds for workflow execution on distributed resources; and provenance methods for collecting appropriate performance data. This brings together expertise in multiple domains, but central to the approach is performance modeling, which is used both to explore design choices in advance of implementation and to dynamically optimize workflow execution at run time.

    Speaker Bio: Darren Kerbyson is a Laboratory Fellow and Associate Division Director for High Performance Computing at the Pacific Northwest National Laboratory. Before that he spent 10 years at Los Alamos, where he led the Performance and Architecture Lab (PAL). He received his BSc in Computer Systems Engineering in 1988 and his PhD in Computer Science in 1993, both from the University of Warwick (UK). Between 1993 and 2001 he was a senior faculty member in Computer Science at Warwick. His research interests include performance evaluation, performance and power modeling, and optimization of applications on high performance systems, as well as image analysis. He currently leads several projects in these areas and has published over 150 papers in them over the last 25 years. He is a member of the IEEE Computer Society and the Association for Computing Machinery.
  • John D. McCalpin: The Surprising Effectiveness of Non-Overlapping, Sensitivity-Based Performance Models

    Abstract: Understanding the performance of full-scale applications on modern HPC clusters is challenging in the best of circumstances. Detailed tracing with manual code instrumentation and analysis is too labor-intensive to be practical at a major supercomputing center with dozens of significant workloads across a wide variety of application areas. Characterizing applications with hardware performance counters requires much less human effort, but is quickly blocked by a lack of information about what the hardware is supposed to be doing, what the counters are supposed to be counting, and whether the counters report correct values. An alternative approach is presented here, based on additive (non-overlapping) performance models whose coefficients are determined by the sensitivity of execution time measurements to variations in hardware parameters such as CPU frequency, number of cores used, DRAM frequency, and interconnect bandwidth. These models were initially developed using proprietary system settings while the author was working in the performance teams at SGI, IBM, and AMD, but over time these control features have become available on standard high-volume server processors. This talk reviews the assumptions of the technique and demonstrates its effectiveness for two-component and three-component execution time models using the SPECfp_rate2000 and SPECfp_rate2006 benchmarks on older hardware. New results for single-node runs of WRF (mesoscale weather), FLASH4 (forced 3D turbulence), NAMD (molecular dynamics), and the NERSC MiniDFT (Density Functional Theory) benchmark on Xeon E5 v3 (Haswell) processors are also presented. The performance results across a variety of hardware configurations can be used to derive robust bounds on the model coefficients if the assumption of non-overlapping execution time components is relaxed. These bounds are typically rather weak, but the technique still provides excellent fits to the data. The talk concludes with some speculation on the performance characteristics that allow the models to work so well despite the weak formal bounds.

    Speaker Bio: John joined TACC in 2009 as a Research Scientist in the High Performance Computing Group after a twelve-year career in performance analysis and system architecture in the computer industry. His industrial experience includes 3 years at SGI (performance analysis and optimization on the Origin2000 and performance lead on the architecture team for the Altix3000), 6 years at IBM (performance analysis for HPC, processor and system design for Power4/4+ and Power5/5+), and 3 years at AMD (accelerated computing technologies and performance analysis). Prior to his industrial career, John was an oceanographer (Ph.D., Florida State), spending six years as an assistant professor at the University of Delaware engaged in research and teaching on numerical simulation of the large-scale circulation of the oceans.
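
To illustrate the core idea of McCalpin's sensitivity-based approach, here is a minimal sketch with made-up numbers (not his actual data or tooling): if execution time is assumed to split additively into a core-bound part that scales inversely with CPU frequency and a frequency-insensitive part, then runs at a few fixed clock settings suffice to fit both coefficients by least squares.

```python
import numpy as np

# Hypothetical two-component, non-overlapping model: T(f) = a/f + b,
# where a/f is the core-bound time (scales with CPU clock f) and b is
# the frequency-insensitive time (e.g., DRAM-bound work). All numbers
# below are invented for illustration.
freqs = np.array([1.2e9, 1.6e9, 2.0e9, 2.4e9, 2.8e9])  # fixed CPU clocks (Hz)
times = np.array([98.0, 85.6, 78.1, 72.9, 69.5])       # measured runtimes (s)

# Ordinary least squares fit of T = a * (1/f) + b
A = np.vstack([1.0 / freqs, np.ones_like(freqs)]).T
(a, b), residuals, rank, sv = np.linalg.lstsq(A, times, rcond=None)

print(f"core-bound component at 2.0 GHz: {a / 2.0e9:.1f} s")
print(f"frequency-insensitive component: {b:.1f} s")
```

In the same spirit, sweeping DRAM frequency or the number of cores would expose the additional coefficients of the three-component models mentioned in the abstract.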

See below for the original workshop CfP.


Topics of Interest

Suitable submissions should come from the following or related areas:

  • Analytic (first-principles) performance, energy, and scalability modeling of algorithmic building blocks and/or applications
  • Microbenchmarking for exploring hardware architectures and performance-related software properties
  • Statistical methods and curve-fitting for analyzing and predicting performance behavior
  • Machine learning approaches for (semi-)automated generation of performance insight
  • Model-guided code/algorithm optimization for real-world applications on the core, the node, or the highly parallel level
  • Hardware design space exploration for performance and energy aspects of computation
  • Hardware infrastructure for supporting and validating performance and power modeling
  • Approaches for evaluating energy efficiency across architectures in a well-defined way
  • Model-based evaluation of performance portability
  • Tool development, including visualization, for supporting all of the above

Submissions dealing with algorithms and architectures outside the scientific computing community are strongly encouraged.

Submission Guidelines

Authors are invited to submit full papers presenting unpublished, original work, with no more than 12 pages of single-column text in the Springer LNCS style. All accepted papers (subject to post-review revisions) will be published in a separate ISC16 Workshop Proceedings volume by Springer (approval pending).

Authors of selected papers will be invited to submit extended versions of their manuscripts to a journal special issue.

Submissions must be made via EasyChair at: https://www.easychair.org/conferences/?conf=pmma16

Important Dates

Paper submission deadline: Monday, April 4, 2016
Author notification: Friday, April 29, 2016
Camera-ready submission: Monday, May 16, 2016
Workshop day: Thursday, June 23, 2016

Workshop Committee

Georg Hager, RRZE (organizer)
Gerhard Wellein, FAU Erlangen-Nuremberg (organizer)
Kevin Barker, PNNL
Jeffrey Vetter, ORNL
Laura Carrington, SDSC
William Jalby, UVSQ
Judit Gimenez, BSC
Sven Apel, University of Passau
Kengo Nakajima, University of Tokyo
Felix Wolf, TU Darmstadt
Trevor Carlson, Uppsala University
Harald Köstler, FAU Erlangen-Nuremberg