x87 not completely dead yet

The current Intel compilers do not generate x87 code in favor of SSE instructions for floating-point operations. According to the documentation the Intel compilers can only be forced to do so by generating code explicitly for the IA32 architecture (via -mia32 compiler switch). Surprisingly exactly these x87 instructions were found in a physics code where explicitly a SSE2 capable CPU was targeted (via -xsse4.2). Together with Georg Hager we found that the reason are complex double-precision floating-point divisions.

Complex division of two complex numbers e = a + bi and f = c + di is carried out as
\frac{a + b i}{c + d i} =
\frac{(a + b i) (c – d i)}{(c + d i) (c – d i)} =
\frac{ac + bd}{c^2 + d^2} + \frac{bc – ad}{c^2 + d^2} i.
The intermediate result of c² + d² can exceed its range if the exponents of c or d are already large [1]. To increase the range the compiler performs this computations on the x87 FPU which can use (IEEE) 80-bit extended double precision instead of the (IEEE) 64-bit double precision.

If you are sure the ranges will not exceed during a complex division the usage of x87 can be turned off, so that only SSE/AVX instructions are used.

Intel Compiler Options[2]:

  • -no-complex-limited-range (default): use x87 for complex division.
  • -complex-limited-range: do not use x87 for complex division.

GCC uses Smith’s method instead [4] via a call to __divsc3 for single precision and a call to __divdc3 for double precision in libm [5]. The options are [3]:

  • -fno-cx-limited-range (default): use Smith’s method for complex division.
  • -fcx-limited-range: do not use Smith’s method for complex division; this is automatically turned on with -ffast-math.

[1] M. Baudin and R. L. Smith. A Robust Complex Division in Scilab. arXiv:1210.4539.
[4] Robert L. Smith. Algorithm 116: Complex division. Commun. ACM, 5(8):435, 1962, doi:10.1145/368637.368661.