(See the prelude for some general information on what this is all about)
Ok, so you’re in real trouble: The pimpled smarty-pants in the audience has figured out that the data you have shown in your talk was a pile of sugar-coated white noise. Or the conference deadline draws near, all you have is a pile of useless data, and your boss desperately needs another publication. Or your paper is already beyond the page limit, and you need a convenient shortcut.
In such a situation it is good to have a backup plan that will allow you to save your neck and still look good. Fortunately, computers are so complex that there is ample opportunity to blame something that is not your fault.
Enter the “technical-detail-not-under-my-control.” Here are your friends – each is followed by a typical usage example:
Stupid compilers: “Our version of the code shows slightly worse single-thread performance, which is presumably due to the limited optimization capabilities of the compiler.”
If the compiler can’t fix it, who can? And nobody wants to inspect assembly code anyway.
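Should you ever be tempted to check instead of speculate, a look at the generated assembly is cheap. A minimal sketch (assuming gcc on x86-64; clang accepts the same flags):

```c
/* Compile with:  gcc -O3 -S -fverbose-asm daxpy.c
   and inspect daxpy.s for packed SIMD instructions (e.g., mulpd/addpd,
   or vfmadd with -march=native).                                        */
void daxpy(double a, const double *x, double *y, long n) {
    for (long i = 0; i < n; ++i)
        y[i] += a * x[i];   /* scalar-only mulsd/addsd in the .s file would
                               mean the compiler really did drop the ball */
}
```

If you find nicely vectorized code in there, your excuse just evaporated, so better not look too closely.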
Out-of-order execution (or lack thereof): “Processor A shows better performance than processor B possibly because of A’s superior out-of-order processing capabilities.”
That’s a truly ingenious statement, mainly because it is so very hard to prove. Among friends you may just bluntly state that B has a lousy design.
L1 instruction cache misses: “As shown in Table 1, our optimized code version B is faster because it has 20% fewer L1 instruction cache misses than version A.”
Hardly anyone discusses L1 instruction cache misses, because they rarely do any harm. Thus they are the ideal scapegoat if you don’t know the exact reason for some effect. Be careful with presenting performance counter data, though: If the instruction count reveals that there is only one L1I miss every 10,000 instructions, that’s not what you want to show in public…
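In case you do want to know the number before deciding whether to show it, here is a minimal sketch using PAPI (assuming the library and the PAPI_L1_ICM preset event are available on your machine; the summation loop is just a hypothetical stand-in for your kernel):

```c
/* Compile: gcc -O2 l1i.c -lpapi */
#include <stdio.h>
#include <papi.h>

static double a[1000000];

int main(void) {
    int evset = PAPI_NULL;
    long long c[2];

    PAPI_library_init(PAPI_VER_CURRENT);
    PAPI_create_eventset(&evset);
    PAPI_add_event(evset, PAPI_L1_ICM);   /* L1 instruction cache misses */
    PAPI_add_event(evset, PAPI_TOT_INS);  /* retired instructions        */
    PAPI_start(evset);

    double s = 0.0;                       /* stand-in for your kernel    */
    for (int i = 0; i < 1000000; ++i)
        s += a[i];

    PAPI_stop(evset, c);
    printf("L1I misses per instruction: %.2e (s=%g)\n",
           (double)c[0] / (double)c[1], s);
    return 0;
}
```

If it prints something in the 1e-04 range or below, see above: keep it to yourself.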
TLB misses: “Performance shows a gradual breakdown with growing problem size. This may be caused by excessive penalties due to TLB misses.”
TLB misses are always a good choice because they have a reputation for being even more evil than cache misses! You may want to add that time was too short to prove your statement by measuring the misses and estimating their actual impact, and then hunting down the real cause.
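For the record, the measurement you had no time for takes about five minutes. A sketch with PAPI (assuming the PAPI_TLB_DM preset is available; the 30-cycle miss penalty is an assumed, machine-dependent number):

```c
/* Compile: gcc -O2 tlb.c -lpapi */
#include <stdio.h>
#include <papi.h>

#define N (1 << 22)            /* 32 MiB of doubles, spread over many pages */
static double a[N];

int main(void) {
    int evset = PAPI_NULL;
    long long c[2];

    PAPI_library_init(PAPI_VER_CURRENT);
    PAPI_create_eventset(&evset);
    PAPI_add_event(evset, PAPI_TLB_DM);   /* data TLB misses */
    PAPI_add_event(evset, PAPI_TOT_CYC);  /* total cycles    */
    PAPI_start(evset);

    double s = 0.0;
    for (int k = 0; k < 512; ++k)         /* stride of 512 doubles = one   */
        for (int i = k; i < N; i += 512)  /* 4 KiB page per access: about  */
            s += a[i];                    /* as TLB-hostile as it gets     */

    PAPI_stop(evset, c);
    const double penalty = 30.0;          /* assumed cycles per TLB miss   */
    printf("estimated TLB share of runtime: ~%.1f%% (s=%g)\n",
           100.0 * penalty * c[0] / (double)c[1], s);
    return 0;
}
```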
Bad prefetching: “Performance does not scale beyond four cores on a socket. We attribute this to problems with the prefetching hardware.”
Prefetching is necessary. Bad prefetching means bad performance because the latency (gasp!) cannot be hidden. Other reasons for performance saturation, such as a bandwidth bottleneck or load imbalance, are much too mundane to be interesting.
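Before pointing at the prefetcher, the mundane alternative can be checked with a STREAM-like triad loop (an assumed stand-in, not the official STREAM benchmark). If the bandwidth figure saturates as you add threads, there goes your excuse:

```c
/* Compile: gcc -O2 -fopenmp triad.c
   Run with OMP_NUM_THREADS=1,2,4,... and watch where the GB/s flatten out. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1L << 25)   /* three 256 MiB arrays: far too large for any cache */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    for (long i = 0; i < N; ++i) { b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; ++i)
        a[i] = b[i] + 1.5 * c[i];
    double t = omp_get_wtime() - t0;

    /* 24 bytes of traffic per iteration (32 with write-allocate on the store) */
    printf("%d threads: ~%.1f GB/s\n",
           omp_get_max_threads(), 24.0 * N / t / 1e9);
    free(a); free(b); free(c);
    return 0;
}
```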
Bank conflicts: “Processor X has only [sic!] eight cache banks, which may explain the large fluctuations in performance vs. problem size.”
Wow. Cache banks. And you are the one to know that eight just won’t cut it.
Transient network errors: “In contrast to other high-performance networks such as Cray’s Gemini, InfiniBand does not have link-level error detection, which impacts the scalability of our highly parallel code.”
You are a good boy/girl, so you have read all of Cray’s marketing brochures. Link-level error detection is soooo cool, it just has to be what you need to scale.
OS jitter: “Beyond eight nodes our implementation essentially stops scaling. Since the cluster runs vanilla [insert your dearly hated distro here] Linux OS images, operating system noise (“OS jitter”) is the likely cause for this failure.”
My all-time favorite. This Linux c**p is beyond contempt.
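If you fear that someone might actually measure the noise, better do it yourself first. A fixed-work quantum (FWQ) sketch: time the same small chunk of work many times and look at the spread (pinning the process, e.g. with taskset, keeps thread migration out of the picture):

```c
/* Compile: gcc -O2 fwq.c */
#include <stdio.h>
#include <time.h>

static double now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void) {
    volatile double s = 0.0;
    double tmin = 1e30, tmax = 0.0;
    for (int rep = 0; rep < 10000; ++rep) {
        double t0 = now();
        for (int i = 0; i < 100000; ++i)   /* the fixed work quantum */
            s += 1e-9;
        double dt = now() - t0;
        if (dt < tmin) tmin = dt;
        if (dt > tmax) tmax = dt;
    }
    printf("min %.3e s  max %.3e s  max/min %.2f (s=%g)\n",
           tmin, tmax, tmax / tmin, s);
    return 0;
}
```

A tight distribution means the jitter excuse won’t fly either, so you may want to lose these numbers.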
The list can be extended, of course. Whatever technical detail you choose, make sure to pick one that is widely misunderstood or has a reputation for being evil, very complicated, and hard to understand, and whose responsibility for your problem is next to impossible to prove. Note the peculiar diction: be noncommittal and use “may explain,” “could be,” “likely,” “presumably,” etc., so you won’t be held accountable in case you’re proven wrong.