Markus Wittmann's Blog

The current Intel compilers do not generate x87 code in favor of SSE instructions for floating-point operations. According to the documentation the Intel compilers can only be forced to do so by generating code explicitly for the IA32 architecture (via -mia32 compiler switch). Surprisingly exactly these x87 instructions were found in a physics code where explicitly a SSE2 capable CPU was targeted (via -xsse4.2). Together with Georg Hager we found that the reason are complex double-precision floating-point divisions.

Complex division of two complex numbers e = a + bi and f = c + di is carried out as
[latex]
\frac{a + b i}{c + d i} =
\frac{(a + b i) (c – d i)}{(c + d i) (c – d i)} =
\frac{ac + bd}{c^2 + d^2} + \frac{bc – ad}{c^2 + d^2} i.
[/latex]
The intermediate result of c² + d² can exceed its range if the exponents of c or d are already large [1]. To increase the range the compiler performs this computations on the x87 FPU which can use (IEEE) 80-bit extended double precision instead of the (IEEE) 64-bit double precision.

If you are sure the ranges will not exceed during a complex division the usage of x87 can be turned off, so that only SSE/AVX instructions are used.

Intel Compiler Options[2]:

-no-complex-limited-range (default): use x87 for complex division.
-complex-limited-range: do not use x87 for complex division.

GCC uses Smith’s method instead [4] via a call to __divsc3 for single precision and a call to __divdc3 for double precision in libm [5]. The options are [3]:

-fno-cx-limited-range (default): use Smith’s method for complex division.
-fcx-limited-range: do not use Smith’s method for complex division; this is automatically turned on with -ffast-math.

[1] M. Baudin and R. L. Smith. A Robust Complex Division in Scilab. arXiv:1210.4539.
[2] https://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/2011Update/cpp/win/copts/common_options/option_complex_limited_range.htm
[3] https://gcc.gnu.org/onlinedocs/gcc-4.9.1/gcc/Optimize-Options.html#Optimize-Options
[4] Robert L. Smith. Algorithm 116: Complex division. Commun. ACM, 5(8):435, 1962, doi:10.1145/368637.368661.
[5] https://sourceware.org/git/?p=glibc.git;a=blob;f=math/divtc3.c;hb=HEAD

Inhalt