(See the prelude for some general information about what this is all about)
Common sense dictates that whenever you speed up any part of an application, be it computation, communication, or I/O, time to solution must go down. Why should one then try to slow down computations? In a sense, this stunt is similar to Stunt 1, but there's more to it: Whenever there is some parallel overhead that adds to pure code execution time, the denominator in our "speedup" formula from Stunt 1 gets larger, impeding scalability. To make the discussion more general, let's also scale the parallel problem size with a factor proportional to N^α:

$$S_\alpha(N) = \frac{s + (1-s)N^\alpha}{s + (1-s)N^{\alpha-1} + c_\alpha(N)}$$
Here, c_α(N) summarizes all the overhead that is not directly related to code execution: communication, I/O, synchronization, etc. Now, if all parts of this expression except c_α(N) get larger (e.g., by a factor of μ > 1, because the machine is μ times slower), the relative impact of the overhead goes down. In layman's terms, this effect can be summarized as "A slow machine scales better," and it is one of the key reasons why Stunt 1 works.
Three corollaries immediately follow from this:
- Do not use high compiler optimization levels or the latest compiler versions. This is always possible if the machine you use just isn’t slow enough.
- Use a convoluted C++ framework that hides all performance complexities behind neatly overloaded operators and template mechanics. You can then claim that, since the compiler will generate "optimal" code anyway, performance is no longer your concern.
- If scalability is still bad, parallelize some short loops with OpenMP. That way you can get some extra bonus for a scalable hybrid code! Everyone knows today that “one should go hybrid”, even if there’s no indication that this will do any good.
If someone asks for time to solution, answer that if you had a bigger machine, you could get the solution as fast as you want. This is of course due to the superior scalability of your code!
However, let’s not forget that there are valid arguments for machines with slow processors like the IBM Blue Gene. Apart from the power consumption issue (a core that is μ times slower than a standard x86 core consumes far less than 1/μ times the power), it can be beneficial to use μN slow CPUs instead of N fast ones, if communication overhead has a certain dependence on N. See our book for a theoretical treatment of “slow computing”.