Current Intel compilers generate SSE instructions rather than x87 code for floating-point operations. According to the documentation, they can only be forced to emit x87 code by explicitly targeting the IA-32 architecture (via the -mia32 compiler switch). Surprisingly, exactly these x87 instructions turned up in a physics code that explicitly targeted an SSE2-capable CPU (via -xsse4.2). Together with Georg Hager, we found that the cause is complex double-precision floating-point division.
Complex division of two complex numbers e = a + bi and f = c + di is carried out as
[latex]
\frac{a + b i}{c + d i} =
\frac{(a + b i) (c - d i)}{(c + d i) (c - d i)} =
\frac{ac + bd}{c^2 + d^2} + \frac{bc - ad}{c^2 + d^2} i.
[/latex]
The intermediate result c² + d² can overflow the double-precision range if the exponents of c or d are already large [1]. To extend the range, the compiler performs this computation on the x87 FPU, which can use the 80-bit extended double-precision format (15-bit exponent) instead of the 64-bit IEEE double-precision format (11-bit exponent).
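The overflow can be reproduced with a few lines of C. The following is only a sketch: the function names div_naive, div_wide and long_double_is_wider are made up for illustration, and the code assumes a target where double expressions are evaluated in double precision (the default with SSE on x86-64) and where long double maps to the x87 extended format.

```c
#include <float.h>

/* Textbook formula with every intermediate held in a double. */
void div_naive(double a, double b, double c, double d,
               double *re, double *im)
{
    double den = c * c + d * d;   /* +inf once |c| or |d| exceeds roughly 1.3e154 */
    *re = (a * c + b * d) / den;
    *im = (b * c - a * d) / den;
}

/* Same formula with long double intermediates, mimicking what the
   x87 code path gains from the wider exponent range. */
void div_wide(double a, double b, double c, double d,
              double *re, double *im)
{
    long double la = a, lb = b, lc = c, ld = d;
    long double den = lc * lc + ld * ld;
    *re = (double)((la * lc + lb * ld) / den);
    *im = (double)((lb * lc - la * ld) / den);
}

/* Assumption check: div_wide only helps if long double really has a
   larger exponent range than double, which is not true everywhere. */
int long_double_is_wider(void)
{
    return LDBL_MAX_EXP > DBL_MAX_EXP;
}
```

For e = f = 1e200 + 1e200i the exact quotient is 1. Under the assumptions above, div_naive yields NaN in both parts (inf/inf), whereas div_wide returns 1 + 0i.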
If you are sure that no intermediate result of a complex division will overflow, the use of x87 can be turned off so that only SSE/AVX instructions are used.
Intel compiler options [2]:
-no-complex-limited-range (default): use x87 for complex division.
-complex-limited-range: do not use x87 for complex division.
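A minimal test kernel for checking the effect of these options might look like the following C99 sketch. The function name cdiv is made up for illustration. Compiling it to assembly (for instance with -S) once per option and searching the output for x87 mnemonics such as fld or fdivp should show which path the compiler chose.

```c
#include <complex.h>

/* One complex double-precision division; the compiler decides how the
   quotient is evaluated, depending on the limited-range setting. */
double complex cdiv(double complex e, double complex f)
{
    return e / f;
}
```

Keeping the division in a separate function makes the generated code easy to locate in the assembly listing.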
GCC uses Smith’s method [4] instead, via a call to __divsc3 for single precision and to __divdc3 for double precision. These routines are provided by libgcc; a comparable implementation can be found in glibc [5]. The options are [3]:
-fno-cx-limited-range (default): use Smith’s method for complex division.
-fcx-limited-range: do not use Smith’s method for complex division; this is automatically turned on with -ffast-math.
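The idea behind Smith’s method [4] is to divide numerator and denominator by the larger of |c| and |d|, so that c² + d² is never formed. The following C sketch illustrates the basic algorithm only. It is not the code of __divdc3, which does additional work for special values, and the function name div_smith is made up for illustration.

```c
#include <math.h>

/* Basic Smith algorithm for (a + bi) / (c + di).
   The ratio r has magnitude <= 1, so the denominator stays in the
   same range as c and d instead of growing like c*c + d*d.
   Special cases (c == d == 0, infinities, NaNs) are not handled. */
void div_smith(double a, double b, double c, double d,
               double *re, double *im)
{
    if (fabs(c) >= fabs(d)) {
        double r = d / c;
        double den = c + d * r;
        *re = (a + b * r) / den;
        *im = (b - a * r) / den;
    } else {
        double r = c / d;
        double den = c * r + d;
        *re = (a * r + b) / den;
        *im = (b * r - a) / den;
    }
}
```

For e = f = 1e200 + 1e200i this returns 1 + 0i using plain double arithmetic, because the largest intermediate is c + d·r = 2e200. The price is an extra division and a branch compared with the textbook formula.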
[1] M. Baudin and R. L. Smith. A Robust Complex Division in Scilab. arXiv:1210.4539.
[2] https://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/2011Update/cpp/win/copts/common_options/option_complex_limited_range.htm
[3] https://gcc.gnu.org/onlinedocs/gcc-4.9.1/gcc/Optimize-Options.html#Optimize-Options
[4] Robert L. Smith. Algorithm 116: Complex division. Commun. ACM, 5(8):435, 1962, doi:10.1145/368637.368661.
[5] https://sourceware.org/git/?p=glibc.git;a=blob;f=math/divtc3.c;hb=HEAD