The current Intel compilers do not generate x87 code in favor of SSE instructions for floating-point operations. According to the documentation the Intel compilers can only be forced to do so by generating code explicitly for the IA32 architecture (via
-mia32 compiler switch). Surprisingly exactly these x87 instructions were found in a physics code where explicitly a SSE2 capable CPU was targeted (via
-xsse4.2). Together with Georg Hager we found that the reason are complex double-precision floating-point divisions.
Complex division of two complex numbers e = a + bi and f = c + di is carried out as
The intermediate result of c² + d² can exceed its range if the exponents of c or d are already large . To increase the range the compiler performs this computations on the x87 FPU which can use (IEEE) 80-bit extended double precision instead of the (IEEE) 64-bit double precision.
If you are sure the ranges will not exceed during a complex division the usage of x87 can be turned off, so that only SSE/AVX instructions are used.
Intel Compiler Options:
-no-complex-limited-range(default): use x87 for complex division.
-complex-limited-range: do not use x87 for complex division.
GCC uses Smith’s method instead  via a call to
__divsc3 for single precision and a call to
__divdc3 for double precision in
libm . The options are :
-fno-cx-limited-range(default): use Smith’s method for complex division.
-fcx-limited-range: do not use Smith’s method for complex division; this is automatically turned on with
 M. Baudin and R. L. Smith. A Robust Complex Division in Scilab. arXiv:1210.4539.
 Robert L. Smith. Algorithm 116: Complex division. Commun. ACM, 5(8):435, 1962, doi:10.1145/368637.368661.