In 1991, David H. Bailey published his insightful “Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers.” In that humorous article, Bailey pinpointed typical “evade and disguise” techniques for presenting mediocre performance results in the best possible light. These are the original 12 ways:
- Quote only 32-bit performance results, not 64-bit results.
- Present performance figures for an inner kernel, and then represent these figures as the performance of the entire application.
- Quietly employ assembly code and other low-level language constructs.
- Scale up the problem size with the number of processors, but omit any mention of this fact.
- Quote performance results projected to a full system.
- Compare your results against scalar, unoptimized code on Crays.
- When direct run time comparisons are required, compare with an old code on an obsolete system.
- If MFLOPS rates must be quoted, base the operation count on the parallel implementation, not on the best sequential implementation.
- Quote performance in terms of processor utilization, parallel speedups or MFLOPS per dollar.
- Mutilate the algorithm used in the parallel implementation to match the architecture.
- Measure parallel run times on a dedicated system, but measure conventional run times in a busy environment.
- If all else fails, show pretty pictures and animated videos, and don’t talk about performance.
There are further explanations in the original paper for each item.
After two decades, it’s high time for an update. In 1991 the supercomputing landscape was governed by the “chicken vs. oxen” debate: the famous question “If you were plowing a field, which would you rather use?… Two strong oxen or 1024 chickens?” is attributed to Seymour Cray, who couldn’t have said it better. Cray’s machines were certainly dominating in the oxen department, but competition from massively parallel systems like the Connection Machine was building up. At that time, users were much more used to diving into system-specific optimizations; with no MPI or OpenMP standards, portability of parallel programs was pretty much restricted to a single vendor. And the use of double precision floating point was probably not as much a matter of course as it is today.
In the past two decades, hybrid, hierarchical systems, multi-core processors, accelerator technology, and the dominating presence of commodity hardware have reshaped the landscape of High Performance Computing. It’s also not so much oxen vs. chickens anymore; ants have received more than their share of hype. However, some things never change. My points (which I prefer to call “stunts”) are derived from Bailey’s original collection, and some are identical or merely reformulated. Others are new, reflecting today’s boundary conditions.
Although these musings are certainly inspired by experience with many publications and talks in HPC, I wish to point out that (i) no offense is intended, (ii) I am not immune to the inherent temptations myself, and (iii) this is all still just meant to be fun.
This is the list of stunts. It will be extended along the way:
- Report speedup instead of absolute performance!
- Slow down code execution!
- The log scale is your friend!
- Quietly employ weak scaling to show off!
- Instead of performance, plot absolute runtime versus CPU count!
- Ignore affinity and topology issues!
- Be creative when comparing scaled performance!
- Impress your audience with awe-inspiring accuracy!
- Boast massive speedups with accelerators!
- Always emphasize the “interesting” part of your work!
- Show data! Plenty. And then some.
- Redefine “performance” to suit your needs!
- If they get you cornered, blame it all on OS jitter!
- Secretly use fancy hardware setups and software tricks!
- Play mysterious!
- Worship the God of Automation!
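To give a flavor of why the first two stunts go so well together, here is a minimal sketch of a toy runtime model. It is purely illustrative and not based on any measurement: the function names, the assumption of perfectly parallel compute work, and the fixed communication overhead are all my own made-up simplifications.

```python
# Toy model (an assumption, not a measurement): compute work parallelizes
# perfectly, and a fixed communication overhead is paid whenever more than
# one worker is used. All numbers are arbitrary time units.

def runtime(n_workers, compute_time, comm_time):
    """Modeled wall-clock time on n_workers workers."""
    overhead = comm_time if n_workers > 1 else 0.0
    return compute_time / n_workers + overhead


def speedup(n_workers, compute_time, comm_time):
    """Speedup relative to the same code running on one worker."""
    return runtime(1, compute_time, comm_time) / runtime(
        n_workers, compute_time, comm_time
    )


# Two hypothetical versions of the same program: the "slow" one needs
# four times longer for the compute part, with identical communication.
fast = {"compute_time": 100.0, "comm_time": 5.0}
slow = {"compute_time": 400.0, "comm_time": 5.0}

n = 16
for label, code in (("fast", fast), ("slow", slow)):
    print(
        f"{label}: runtime on {n} workers = {runtime(n, **code):.2f}, "
        f"speedup = {speedup(n, **code):.2f}"
    )
```

In this model the slow code shows the better speedup on 16 workers, simply because the communication overhead is small compared to its bloated compute time. It still takes much longer to finish, which is exactly what a speedup-only plot keeps out of sight.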