ABINIT v5.2 : Benchmarks
The three most demanding routines of ABINIT in the most common situation (a total-energy SCF calculation), namely fourwf.F90, nonlop.F90 and projbd.F90, have been analyzed on different platforms. Tests were run on seven data sets with an increasing number of plane waves, for which the three-dimensional FFT grids range from 20x20x20 to 96x96x96. For the fourwf.F90 routine, all seven results are shown. For nonlop.F90 and projbd.F90, an average over the seven sets is given. In all cases, the CPU time for one routine call has been divided by the size of the data set (the number of 3D FFT points for fourwf.F90, and the number of plane-wave coefficients for nonlop.F90 and projbd.F90). Results are given in microseconds per data point.
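The normalization described above can be illustrated with a minimal Python sketch. The function names and the timing value are hypothetical and chosen only for illustration; they are not taken from the ABINIT sources or from the actual benchmark harness.

```python
def fft_grid_points(n: int) -> int:
    """Number of points in an n x n x n FFT grid."""
    return n ** 3


def microseconds_per_point(cpu_seconds: float, n_points: int) -> float:
    """CPU time of one routine call divided by the data-set size, in microseconds."""
    if n_points <= 0:
        raise ValueError("n_points must be positive")
    return cpu_seconds * 1.0e6 / n_points


# Hypothetical measurement (not a benchmark result): one routine call
# taking 2.0e-4 s of CPU time on a 20x20x20 FFT grid.
example = microseconds_per_point(2.0e-4, fft_grid_points(20))
print(f"{example:.3f} microseconds per FFT point")
```

For nonlop.F90 and projbd.F90 the same division would apply, with the number of plane-wave coefficients in place of the number of FFT points.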
NB: these tests are for the SEQUENTIAL version of the code.
Routine fourwf.F90 (3D FFT - part related to the treatment of the potential)
| Machine | Processor | Freq. (MHz) / grid | 20 | 30 | 36 | 48 | 64 | 80 | 96 |
|---|---|---|---|---|---|---|---|---|---|
| AMD Opteron (Lemaitre FE) | AMD-64 | 2400 | 0.025 | 0.033 | 0.036 | 0.038 | 0.042 | 0.046 | 0.051 |
| Apple Xserve (Max) | PowerPC G5 (v3.0) | 2000 | 0.016 | 0.026 | 0.028 | 0.026 | 0.031 | 0.033 | 0.041 |
| IBM p5-570 (Fock) | PWR5 | 1650 | 0.021 | 0.032 | 0.030 | 0.028 | 0.034 | 0.031 | 0.038 |
| HP Integrity (Chpit) | IA 64 | 1500 | 0.025 | 0.028 | 0.024 | 0.025 | 0.029 | 0.037 | 0.043 |
| Intel Xeon / Ifort (Sleepy) | Intel Xeon | 2800 | 0.049 | 0.056 | 0.051 | 0.057 | 0.059 | 0.065 | 0.070 |
| Intel Xeon / Pathscale (Sleepy) | Intel Xeon | 2800 | 0.038 | 0.054 | 0.053 | 0.053 | 0.056 | 0.062 | 0.072 |
| Intel Xeon / g95 (Sleepy) | Intel Xeon | 2800 | 0.056 | 0.074 | 0.073 | 0.073 | 0.074 | 0.081 | 0.094 |
| Intel Xeon / PGI (Sleepy) | Intel Xeon | 2800 | 0.070 | 0.089 | 0.081 | 0.087 | 0.083 | 0.095 | 0.097 |
| HP/Compaq ES40 (Decci1) | EV67 | 667 | 0.054 | 0.071 | 0.069 | 0.067 | 0.081 | 0.094 | 0.112 |
| DEC/Alpha (Tux front-end) | EV56 | 500 | 0.125 | 0.170 | 0.171 | 0.178 | 0.228 | 0.275 | 0.333 |
| IBM RS6000/44P (Dirac) | PWR3+ | 375 | 0.065 | 0.092 | 0.082 | 0.093 | 0.112 | 0.123 | 0.144 |
| SGI (Spinoza) | R14K | 600 | 0.079 | 0.096 | 0.114 | 0.129 | 0.148 | 0.178 | 0.221 |
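Since the fourwf.F90 figures are per FFT point, an approximate time per call can be recovered by multiplying by the number of grid points. The sketch below does this for the AMD Opteron row. The variable and function names are hypothetical, and because the tabulated values are rounded to three decimals the results are only rough estimates.

```python
# Grid sizes N (for N x N x N grids) and the AMD Opteron (Lemaitre FE) row
# of the fourwf.F90 table, in microseconds per FFT point.
GRID_SIZES = [20, 30, 36, 48, 64, 80, 96]
OPTERON_US_PER_POINT = [0.025, 0.033, 0.036, 0.038, 0.042, 0.046, 0.051]


def estimated_call_time_us(us_per_point: float, n: int) -> float:
    """Estimated time of one call, in microseconds, for an n x n x n grid."""
    return us_per_point * n ** 3


# Map each grid size to its estimated per-call time.
estimates = {
    n: estimated_call_time_us(t, n)
    for n, t in zip(GRID_SIZES, OPTERON_US_PER_POINT)
}

for n, t_us in estimates.items():
    print(f"{n}x{n}x{n}: about {t_us / 1000.0:.2f} ms per call")
```

On this row, the estimate grows from roughly 0.2 ms per call on the 20x20x20 grid to roughly 45 ms on the 96x96x96 grid.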
Routine nonlop.F90 (Non-local potential - part related to the treatment of the energy)
Routine projbd.F90 (Orthogonalisation)
| Machine | Processor | Freq. (MHz) | nonlop | projbd |
|---|---|---|---|---|
| AMD Opteron (Lemaitre FE) | AMD-64 | 2400 | 0.072 | 0.014 |
| Apple Xserve (Max) | PowerPC G5 (v3.0) | 2000 | 0.076 | 0.016 |
| IBM p5-570 (Fock) | PWR5 | 1650 | 0.059 | 0.007 |
| HP Integrity (Chpit) | IA 64 | 1500 | 0.057 | 0.006 |
| Intel Xeon / Ifort (Sleepy) | Intel Xeon | 2800 | 0.104 | 0.023 |
| Intel Xeon / Pathscale (Sleepy) | Intel Xeon | 2800 | 0.117 | 0.020 |
| Intel Xeon / g95 (Sleepy) | Intel Xeon | 2800 | 0.145 | 0.024 |
| Intel Xeon / PGI (Sleepy) | Intel Xeon | 2800 | 0.128 | 0.027 |
| HP/Compaq ES40 (Decci1) | EV67 | 667 | 0.154 | 0.026 |
| DEC/Alpha (Tux front-end) | EV56 | 500 | 0.426 | 0.073 |
| IBM RS6000/44P (Dirac) | PWR3+ | 375 | 0.192 | 0.018 |
| SGI (Spinoza) | R14K | 600 | 0.307 | 0.053 |
Brief description of the hardware/software/compilers
- AMD Opteron ("Lemaitre Front-End"): 2 Opteron @ 2.4 GHz; Linux SLES9, kernel 2.6; compiler Ifort 9.1
- Apple Xserve PowerPC G5 ("Max"): 2 PowerPC G5 @ 2.0 GHz, L2 = 512 KB; Mac OS X 10.3.8; compiler IBM xlf for OS X
- IBM p5-570 ("Fock"): 4 PWR5 (dual-core) @ 1.65 GHz; Linux SLES9, kernel 2.6; compiler IBM XLF 9.1 (Fortran)
- HP Integrity ("Chpit"): 4 IA-64 @ 1.5 GHz; Linux Debian; compiler Ifort 8.1
- Intel Xeon ("Sleepy"): 2 Intel Xeon @ 2.8 GHz; 1 MB cache; compilers Ifort 9.1, Pathscale 2.4, g95 0.90, PGI 4.0
- HP/Compaq ES40 ("Decci1"): EV67 @ 667 MHz; Tru64; Digital compiler
- DEC/Alpha ("Tux front-end"): EV56 @ 500 MHz; Linux RedHat; Digital compiler
- IBM RS6000/44P ("Dirac"): 44P/Power3 @ 375 MHz; AIX 5.1; compiler xlf 7.x
- SGI ("Spinoza"): R14K @ 600 MHz; Octane 2, IRIX 6.5

