PHP5 benchmarks on a Sun T2000

Note: This article is in English since it might be of interest to the international community.

Sebastian Bergmann benchmarked different PHP versions built with different compilers on an x86 platform. Since we’re in the process of adding one of the Sun Fire T2000 monsters to our webfarm, we decided to run similar tests before moving the system into production, only this time on the T2000 hardware with Solaris 10. To keep things simple, we only used the CALL Zend Virtual Machine.

We built PHP 5.0.5 and PHP 5.2.0 with GCC 3.4.5 (Blastwave package), GCC 4.0.2 (Blastwave package) and Sun’s own CC 5.8.
sun-t2000> /opt/csw/gcc3/bin/gcc -v
Thread model: posix
gcc version 3.4.5

sun-t2000> /opt/csw/gcc4/bin/gcc -v
Thread model: posix
gcc version 4.0.2

sun-t2000> /opt/SUNWspro/bin/cc -V
cc: Sun C 5.8 Patch 121015-03 2006/10/18

After a lot of experimenting, we settled on the following optimization flags:

  • CC: CFLAGS: -fast -xarch=v8, plus
    one extra PHP 5.2.0 build with CFLAGS: -fast -xarch=v8plusa -xipo -fsimple=0 -fns=no
  • GCC3: CFLAGS: -pipe -mcpu=v9 -O3, CXXFLAGS: -O3
  • GCC4: CFLAGS: -pipe -mcpu=v9 -O1, CXXFLAGS: -O3

Why only -O1 with GCC4? Because the PHP binary segfaulted when built with GCC4 at a higher optimization level (-O2 or -O3).

For the actual benchmarks, we used bench.php (just like Sebastian did), so the results are comparable and as-statistically-valid-as-possible ™.

Here’s the result – explanations and more details below.

Far left: PHP 5.2.0 with cc and CFLAGS='-fast -xarch=v8',
second from left: PHP 5.2.0 with cc and CFLAGS='-fast -xarch=v8plusa -xipo -fsimple=0 -fns=no'.

And the winner is: PHP 5.2.0 with GCC 3.4.5
Compared to 5.0.5 it wins by far: it’s about twice as fast. Nice job by the core programmers!
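As a rough check of the "twice as fast" claim, the ratio can be computed from the raw test data listed further down. This is a sketch and not part of the original benchmark: the speedup helper is a hypothetical name, and it assumes the figures use a decimal comma (so 134,063 means 134.063).

```shell
# speedup SLOW FAST: print SLOW / FAST with two decimals.
# Assumption: both values use a decimal comma, as in the raw test data.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN {
        gsub(",", ".", slow)   # 134,063 -> 134.063
        gsub(",", ".", fast)
        printf "%.2f\n", slow / fast
    }'
}

# GCC 3.4.5 builds: PHP 5.0.5 mean vs. PHP 5.2.0 mean
speedup 134,063 64,281
```

The same calculation for the Sun cc (-fast -xarch=v8) and GCC 4.0.2 builds gives a very similar factor, a little over 2.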

More details:
Each binary ran the benchmark script ten times; the numbers shown are the mean of the ten runs.
Monitoring the benchmark runs with top showed the T2000 scheduling each PHP process on the CPU with the least usage. Since the T2000 has 32 CPUs (yes, no typo, see below), this means that 32 PHP processes started simultaneously should all get the same optimal performance.
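The ten-run averaging described above could be scripted roughly as follows. This is a minimal sketch and not the script we actually used: mean_total is a hypothetical helper, and it assumes the benchmark prints a final line of the form "Total <seconds>", which is how bench.php reports its overall time.

```shell
# mean_total RUNS CMD [ARGS...]: run CMD RUNS times and print the mean
# of the number found on each run's "Total" line (three decimals).
mean_total() {
    runs=$1; shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@"                    # one benchmark run, output goes to awk
        i=$((i + 1))
    done | awk '$1 == "Total" { sum += $2; n++ }
                END { if (n > 0) printf "%.3f\n", sum / n }'
}
```

A call such as mean_total 10 ./sapi/cli/php bench.php would then give one number per binary; the path to the freshly built CLI binary is an assumption about the build tree.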

The configure options for each build were --disable-all --disable-cgi.
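To make the build matrix concrete, here is a sketch that prints the configure command line for each compiler. The compiler paths, flags and configure options are the ones given in this article; the labels suncc, gcc3 and gcc4, the helper names, and passing the flags through env are assumptions for illustration, not a record of the exact commands used.

```shell
# build_flags LABEL: print the compiler settings listed in the article.
# The labels (suncc, gcc3, gcc4) are hypothetical names.
build_flags() {
    case "$1" in
        suncc) echo "CC=/opt/SUNWspro/bin/cc CFLAGS='-fast -xarch=v8'" ;;
        gcc3)  echo "CC=/opt/csw/gcc3/bin/gcc CFLAGS='-pipe -mcpu=v9 -O3' CXXFLAGS='-O3'" ;;
        gcc4)  echo "CC=/opt/csw/gcc4/bin/gcc CFLAGS='-pipe -mcpu=v9 -O1' CXXFLAGS='-O3'" ;;
        *)     echo "unknown compiler label: $1" >&2; return 1 ;;
    esac
}

# configure_cmd LABEL: print (not run) the full configure command line.
configure_cmd() {
    flags=$(build_flags "$1") || return 1
    echo "env $flags ./configure --disable-all --disable-cgi"
}

configure_cmd gcc3
```

Printing the command instead of running it keeps the sketch harmless outside a PHP source tree; inside one, the printed line can be pasted into the shell and followed by make.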

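Raw test data (binary sizes use "." as thousands separator; the last line of each entry is the mean bench.php total time in seconds with "," as decimal separator, lower is better):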
PHP 5.2.0, cc: Sun C 5.8 Patch 121015-03 2006/10/18,
CFLAGS: -fast -xarch=v8,
Binary Size (in bytes): 3.488.164, VM: CALL
68,475

PHP 5.2.0, cc: Sun C 5.8 Patch 121015-03 2006/10/18,
CFLAGS: -fast -xarch=v8plusa -xipo -fsimple=0 -fns=no,
Binary Size (in bytes): 5.356.576, VM: CALL
70,603

PHP 5.2.0, gcc version 3.4.5 Thread model: posix,
CFLAGS: -pipe -mcpu=v9 -O3, CXXFLAGS: -O3,
Binary Size (in bytes): 2.435.536, VM: CALL
64,281

PHP 5.2.0, gcc version 4.0.2 Thread model: posix,
CFLAGS: -pipe -mcpu=v9 -O1, CXXFLAGS: -O3,
Binary Size (in bytes): 2.477.412, VM: CALL
68,417

PHP 5.0.5, cc: Sun C 5.8 Patch 121015-03 2006/10/18,
CFLAGS: -fast -xarch=v8,
Binary Size (in bytes): 2.427.148, VM: CALL
141,822

PHP 5.0.5, gcc version 3.4.5 Thread model: posix,
CFLAGS: -pipe -mcpu=v9 -O3, CXXFLAGS: -O3,
Binary Size (in bytes): 1.455.536, VM: CALL
134,063

PHP 5.0.5, gcc version 4.0.2 Thread model: posix,
CFLAGS: -pipe -mcpu=v9 -O1, CXXFLAGS: -O3,
Binary Size (in bytes): 1.380.096, VM: CALL
142,281

Interesting to note here is that even though PHP 5.0.5 compiled with GCC4 results in the smallest binary, it is also the slowest build.
Update (see the comments): the sentence above is misleading. Binary size affects the start-up time of the interpreter, not performance itself.

Sun Fire T2000:

sun-t2000> /usr/platform/`uname -m`/sbin/prtdiag
System Configuration:  Sun Microsystems  sun4v Sun Fire T200
System clock frequency: 200 MHz
Memory size: 4088 Megabytes

========================= CPUs ===============================================

                            CPU                 CPU
Location     CPU   Freq     Implementation      Mask
------------ ----- -------- ------------------- -----
MB/CMP0/P0       0 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P1       1 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P2       2 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P3       3 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P4       4 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P5       5 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P6       6 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P7       7 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P8       8 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P9       9 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P10     10 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P11     11 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P12     12 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P13     13 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P14     14 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P15     15 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P16     16 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P17     17 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P18     18 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P19     19 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P20     20 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P21     21 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P22     22 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P23     23 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P24     24 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P25     25 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P26     26 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P27     27 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P28     28 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P29     29 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P30     30 1000 MHz  SUNW,UltraSPARC-T1
MB/CMP0/P31     31 1000 MHz  SUNW,UltraSPARC-T1