ABINIT v4.3 : Benchmarks
The three most demanding routines of ABINIT in the most common situation (a total-energy SCF calculation), namely fourwf.f, nonlop.f and projbd.f, have been analyzed on different platforms. Tests were run on seven data sets with an increasing number of plane waves, for which the three-dimensional FFT grids range from 20x20x20 to 96x96x96. For the fourwf.f routine, all seven results are shown. For nonlop.f and projbd.f, an average over the seven sets is given. In all cases, the CPU time for one routine call has been divided by the size of the data set (the number of 3D FFT points for fourwf.f, and the number of plane-wave coefficients for nonlop.f and projbd.f). Results are given in microseconds per data point.
NB : these tests are for the SEQUENTIAL version of the code.
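The normalization described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the 176 microsecond call time is not a measured value from these benchmarks but a number back-computed so that the result matches a 0.022 table entry on a 20x20x20 grid.

```python
def fft_points(n: int) -> int:
    """Number of points of an n x n x n FFT grid."""
    return n ** 3


def per_point_time_us(cpu_seconds_per_call: float, n_data: int) -> float:
    """CPU time of one routine call divided by the data-set size, in microseconds."""
    if n_data <= 0:
        raise ValueError("n_data must be positive")
    return cpu_seconds_per_call * 1e6 / n_data


# Hypothetical timing: one fourwf.f call on a 20x20x20 grid taking 176 microseconds.
example = per_point_time_us(176e-6, fft_points(20))
print(f"{example:.3f} microseconds per FFT point")
```

Read in the other direction, multiplying a table entry by the number of data points gives the approximate CPU time of a single call.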
Routine fourwf.f (3D FFT - part related to the treatment of the potential)
Columns 20 to 96 give the FFT grid size N (for an NxNxN grid); entries are in microseconds per FFT point.
| Machine | CPU | freq (MHz) | 20 | 30 | 36 | 48 | 64 | 80 | 96 |
| SGI Altix 3700 | IA64-2 | 1300 | 0.022 | 0.024 | 0.023 | 0.024 | 0.028 | 0.032 | 0.035 |
| HP/COMPAQ (ES45) | EV68 | 1250 | 0.026 | 0.034 | 0.033 | 0.034 | 0.038 | 0.042 | 0.054 |
| IBM p5-570 | PWR5 | 1650 | 0.029 | 0.029 | 0.025 | 0.024 | 0.027 | 0.026 | 0.032 |
| IBM p690 | PWR4 | 1300 | 0.036 | 0.038 | 0.035 | 0.034 | 0.036 | 0.039 | 0.049 |
| IBM p630 | PWR4 | 1000 | 0.039 | 0.042 | 0.046 | 0.041 | 0.046 | 0.049 | 0.063 |
| AMD Opteron 246 | AMD-64 | 2000 | 0.029 | 0.038 | 0.039 | 0.040 | 0.043 | 0.049 | 0.054 |
| AMD Opteron 244 | AMD-64 | 1800 | 0.033 | 0.044 | 0.044 | 0.044 | 0.048 | 0.055 | 0.060 |
| Apple | PPC G5 | 1800 | 0.026 | 0.046 | 0.046 | 0.046 | 0.047 | 0.049 | 0.060 |
| Apple | PPC G4 | 800 | 0.117 | 0.145 | 0.155 | 0.166 | 0.204 | 0.233 | 0.275 |
| Intel(Xeon) | Xeon | 3060 | 0.081 | 0.091 | 0.098 | 0.098 | 0.093 | 0.102 | 0.104 |
| Intel(FSB800) | PIV | 2800 | 0.081 | 0.099 | 0.094 | 0.098 | 0.099 | 0.110 | 0.109 |
| Intel(VortX) | PIV | 2400 | 0.096 | 0.117 | 0.123 | 0.120 | 0.118 | 0.130 | 0.132 |
| Intel(Cox) | PIII | 933 | 0.151 | 0.194 | 0.201 | 0.209 | 0.217 | 0.255 | 0.321 |
| HP N4000(Turing) | PA-RISC8500 | 360 | 0.130 | 0.164 | 0.154 | 0.156 | 0.157 | 0.176 | 0.215 |
| HP C360 | PA-RISC8500 | 367 | 0.185 | 0.192 | 0.199 | 0.224 | 0.248 | 0.266 | 0.308 |
| SGI(Spinoza) | R14K | 600 | 0.078 | 0.096 | 0.113 | 0.130 | 0.146 | 0.177 | 0.224 |
| IBM RS6000/44P | PWR3+ | 375 | 0.086 | 0.119 | 0.127 | 0.185 | 0.262 | 0.292 | 0.314 |
| Fujitsu | VPP/8 | 142 | 0.496 | 0.343 | 0.270 | 0.199 | 0.166 | 0.159 | 0.160 |
| SUN Sunfire V750 | USIII | 750 | 0.229 | 0.246 | 0.243 | 0.283 | 0.323 | 0.368 | 0.389 |
| Fujitsu | PIII | 600 | 0.299 | 0.355 | 0.353 | 0.366 | 0.397 | 0.438 | 0.492 |
| Microway | EV67 | 500 | 0.078 | 0.109 | 0.107 | 0.111 | 0.145 | 0.160 | 0.201 |
| AlphaStation 7000 | EV56 | 600 | 0.123 | 0.166 | 0.163 | 0.171 | 0.216 | 0.263 | 0.327 |
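One way to read the fourwf.f table is to compare the per-point cost on the largest grid with that on the smallest one. The sketch below does this for three rows whose values are copied from the table above; the dictionary layout and the `growth_ratio` helper are illustrative assumptions, not part of the benchmark suite.

```python
# Per-point fourwf.f times in microseconds, copied from the table above:
# (value for the 20x20x20 grid, value for the 96x96x96 grid).
fourwf_us = {
    "SGI Altix 3700": (0.022, 0.035),
    "IBM p5-570": (0.029, 0.032),
    "Fujitsu VPP/8": (0.496, 0.160),
}


def growth_ratio(small_grid_us: float, large_grid_us: float) -> float:
    """Ratio of the per-point cost on the large grid to that on the small grid."""
    return large_grid_us / small_grid_us


for machine, (t20, t96) in fourwf_us.items():
    print(f"{machine}: {growth_ratio(t20, t96):.2f}")
```

A ratio above 1 means the per-point cost grows with the grid size. Among the three rows chosen here, only the Fujitsu VPP entry falls below 1.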
Routine nonlop.f (Non-local potential - part related to the treatment of the energy)
Routine projbd.f (Orthogonalisation)
Entries are in microseconds per plane-wave coefficient, averaged over the seven data sets.
| Machine | CPU | freq (MHz) | nonlop | projbd |
| SGI Altix 3700 | IA64-2 | 1300 | 0.078 | 0.016 |
| HP/COMPAQ (ES45) | EV68 | 1250 | 0.080 | 0.013 |
| IBM p5-570 | PWR5 | 1650 | 0.055 | 0.0053 |
| IBM p690 | PWR4 | 1300 | 0.010 | 0.010 |
| IBM p630 | PWR4 | 1000 | 0.095 | 0.015 |
| AMD Opteron 246 | AMD-64 | 2000 | 0.113 | 0.018 |
| AMD Opteron 244 | AMD-64 | 1800 | 0.102 | 0.015 |
| Apple | PPC G5 | 1800 | 0.179 | 0.013 |
| Apple | PPC G4 | 800 | 0.524 | 0.102 |
| Intel(Xeon) | Xeon | 3060 | 0.118 | 0.013 |
| Intel(FSB800) | PIV | 2800 | 0.144 | 0.024 |
| Intel(VortX) | PIV | 2400 | 0.156 | 0.024 |
| Intel(Cox) | PIII | 933 | 0.533 | 0.132 |
| HP N4000(Turing) | PA-RISC8500 | 360 | 0.852 | 0.046 |
| HP C360 | PA-RISC8500 | 367 | 0.334 | 0.072 |
| SGI(Spinoza) | R14K | 600 | 0.303 | 0.041 |
| IBM RS6000/44P | PWR3+ | 375 | 0.233 | 0.023 |
| Fujitsu | VPP/8 | 142 | 0.611 | 0.049 |
| SUN Sunfire V750 | USIII | 750 | 0.544 | 0.085 |
| Fujitsu | PIII | 600 | 0.923 | 0.193 |
| Microway | EV67 | 500 | 0.309 | 0.038 |
| AlphaStation 7000 | EV56 | 600 | 0.508 | 0.068 |
Brief description of the hardware
- SGI Altix 3700: 28 x Intel Itanium2 (1.3 GHz), Linux 2.4.21-sgi230r7, 55 GB RAM
- HP/COMPAQ ES45: 4 x Alpha EV68 (1.25 GHz), TRU64 5.1B, Cache L1 I/D 64/64 kB, Cache L2 8 MB, 32 GB RAM
- IBM p5-570: 4 CPUs @ 1.65 GHz, Linux SLES9, Kernel 2.6, IBM XLF 9.1 Fortran compiler
- IBM Power4 pSeries p690 Turbo 1.3 GHz (regatta), AIX 5.0, Cache L1 I/D 32/128 kB, Cache L2 (N.A.)
- IBM Power4 pSeries p630 1.0 GHz, AIX 5.0, Cache L1 I/D 32/128 kB, Cache L2 (N.A.)
- AMD Opteron 246 ("e325") 2 GHz, Fedora Core 2 Linux-64, Cache L2 1 MB, PGI x86-64 5.1
- AMD Opteron 244 ("hyperion.enge") 1.8 GHz, Suse Linux-64, Cache L2 1 MB, PGI x86-64 5.1
- SGI ("spinoza") Octane 2 - IRIX 6.5
- IBM ("dirac") 44P/Power3 - AIX 5.1 - xlf 7.x
- Intel Xeon ("tsunami") cluster Cenaero bi-Xeon 3.06 GHz, Cache L2 512 KB, 2 GB RAM, PGI 5.1
- Intel FSB800 ("lowdin") P4 2.8 GHz, Cache L2 512 KB, 1 GB RAM, PGI 5.1
- Intel ("VortX") P4 2.4 GHz, Cache L2 512 KB, 1 GB RAM (RAMBUS RR400 PC800), PGI 3.2-3
- Intel ("Cox") P3 933 MHz, Cache L2 256 KB, 1 GB RAM, PGI 3.2-3
- HP N4000 ("Turing") PA-8600 - HP-UX 11
- HP C360
- Fujitsu VPP 8 x 142 MHz, theoretical peak performance 2.2 GFlops, vector processor with 8 add + 8 mult per clock cycle, one processing element VX-1S. NB: the CPU test for Fujitsu was performed on the Fujitsu machine (VX-1S) at Mitsubishi Chemical Corp.
- Apple PowerPC G5: 2 x PPC G5 (version = 2.2), Bus speed: 900 MHz, L2 cache size: 512KB (times 2), L3 cache size: 2MB (times 2), Memory size: 512MB, Mac OS X 10.3.6, Compiler IBM xlf for OSX ver.8.
- Apple PowerPC G4 2 x PPC G4 (version = 2.1) Bus speed: 800 MHz, L2 cache size: 256KB (times 2), L3 cache size: 2MB (times 2), Memory size: 1.25GB, Mac OS X 10.2.2 (6F21), Compiler Absoft Pro Fortran for OSX ver.8.0
- Intel PIII 600 MHz/Fujitsu, M/B Intel 440BX AGPset ATX M/B, SCSI Onboard Adaptec AIC7890 Chip ("compatible" with Adaptec 2940U2W), HDD 9.1GB (Ultra2-Wide SCSI), Memory 1GB, NIC PCI 10/100Mbps Intel PILA8460B Management Adaptor, GNU/Linux (Kernel v.2.2.5), F90 v2.0 Fujitsu "Fortran & C" package
- SUN SunFire V750: 2 x US III 750 MHz, Solaris 2.9, Cache L1 I/D 16/16 kB, Cache L2 8 MB, Workshop 6.0
- Alpha ev56 ("boop")
- Alpha ev56 ("deepflow")
- SGI Octane1 ("Zebulon")

