Intel has announced recently that their popular Architecture Code Analyzer (IACA) “has reached its End of Life” (sic!). Frankly speaking, it was never an official product anyway, but performance-aware bitfiddlers like my colleagues and me found it extremely useful. It’s strange that Intel decided to dump it right after a complete rewrite with version 3.0. Big mistake. Think “A380”.
Given a piece of object code, the latest version of IACA was able to calculate a prediction about its runtime, assuming no dependencies between instructions and full pipeline throughput. This is quite an optimistic assumption – earlier versions (here’s another useful thing they dumped) could also produce a “pessimistic” prediction based on the instruction latencies along the critical path. In reality, the actual runtime was typically in between, and an experienced performance engineer could read a lot out of the IACA output. Furthermore, the IACA predictions were one input to the ECM performance model and Kerncraft, our loop performance modeling tool.
Fortunately, alternatives exist. Besides LLVM-MCA, which may or may not be useful for some, our OSACA tool set out to become a full-fledged replacement for IACA, batteries included. Right now it can handle throughput analysis for Intel and AMD CPUs [1]; work on critical path analysis and support for ARM architectures is ongoing. Some undisclosed insight that was coded into IACA is unavailable to us, so predictions may differ. It’s work in progress, but you can check it out. Feedback is always welcome!
[1] J. Laukemann, J. Hammer, J. Hofmann, G. Hager, and G. Wellein: Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures. 2018 IEEE/ACM Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), Dallas, TX, USA, 2018, pp. 121-131. DOI: 10.1109/PMBS.2018.8641578. Preprint: arXiv:1809.00912