Georg Hager's Blog

Intel has announced recently that their popular Architecture Code Analyzer (IACA) “has reached its End of Life” (sic!). Frankly speaking, it was never an official product anyway, but performance-aware bitfiddlers like my colleagues and me found it extremely useful. It’s strange that Intel decided to dump it right after a complete rewrite with version 3.0. Big mistake. Think “A380”.

Given a piece of object code, the latest version of IACA was able to calculate a prediction about its runtime, assuming no dependencies between instructions and full pipeline throughput. This is quite an optimistic assumption – earlier versions (here’s another useful thing they dumped) could also produce a “pessimistic” prediction based on the instruction latencies along the critical path. In reality, the actual runtime was typically in between, and an experienced performance engineer could read a lot out of the IACA output. Furthermore, the IACA predictions were one input to the ECM performance model and Kerncraft, our loop performance modeling tool.

Fortunately, alternatives exist. Besides LLVM-MCA, which may or may not be useful for some, our OSACA tool set out to become a full-fledged replacement for IACA, batteries included. Right now it can handle throughput analysis for Intel and AMD CPUs [1]; work on critical path analysis and support for ARM architectures is ongoing. Some undisclosed insight that was coded into IACA is unavailable to us, so predictions may differ. It’s work in progress, but you can check it out. Feedback is always welcome!

[1] J. Laukemann, J. Hammer, J. Hofmann, G. Hager, and G. Wellein: Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures. 2018 IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), Dallas, TX, USA, 2018, pp. 121-131. DOI: 10.1109/PMBS.2018.8641578. Preprint: arXiv:1809.00912

Our PhD student Christie Louis Alappat, by winning the ACM Student Research Competition (SRC) at SC18 at the graduate level last year, has advanced to the ACM SRC Grand Finals, where the winners from 26 ACM conferences contend for the Grand Prize. For this last round he had to prepare a five-page paper about his research. This paper, and the whole body of his work, was evaluated again by a panel of judges.

We are now happy to announce that Christie has come second place in the Grand SRC Finale. Together with his advisor, Prof. Gerhard Wellein, he is invited to the awards ceremony which will take place in San Francisco on June 15. This is the very same ceremony at which Yoshua Bengio, Geoffrey Hinton, and Yann LeCun will receive the prestigious ACM Turing Award 2018 for their seminal work on deep learning algorithms. Talk about good company!

Christie’s research revolves around a long-standing problem in computer science: How must a graph be colored to enable parallel processing in the presence of dependencies? His solution, the “Recursive Algebraic Coloring Engine,” can be used to parallelize many sparse algorithms in a hardware-efficient way, taking the specific properties of modern multicore chips into account. It outperforms existing approaches and libraries by a significant margin at such an important operation as symmetric sparse matrix-vector multiplication (SymmSpMV), but its range of applicability is much broader. Christie has prepared a walk-through of his SC18 poster to explain the details:

Georg Hager's Blog

Random thoughts on High Performance Computing

Content

Intel Architecture Code Analyzer (IACA) R.I.P.! All hail OSACA!

Christie Alappat comes second place in ACM Student Research Competition Grand Finals