Benchmark

1. Benchmark test using Mario 3D (3D grain growth) – compiled with Intel Parallel Studio XE 2018 (Fortran)

NMM-Gibbs: Intel i9-7980XE CPU (18-core) + 128 GB RAM (DDR4-2400, PC4-19200) + 2-way GTX 1080 Ti (SLI) + 1G NVMe SSD

NMM-Duhem (post-processing, access to Mozart): Intel i7-8700 (6-core) + 32 GB RAM (DDR4-2400, PC4-19200) + GTX 1050 Ti + 250 GB Samsung SSD (860 EVO)

2. Double precision vs Single precision

  • Hardware: Mac Pro (Late 2013) 2.7 GHz 12-Core Intel Xeon E5, 64 GB DDR3
  • Compiler: Intel(R) Fortran Intel(R) 64 Compiler Version 16.0.1.111
  • Programming Language: Fortran95

Double precision result:

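PI in reference: 3.1415926535

PI from 4.0*atan(1.0): 3.14159274101257

Matches only to the sixth decimal place. The literals 4.0 and 1.0 are default (single-precision) REAL, so atan is evaluated in single precision before the result is stored in the double-precision variable; 4.0d0*atan(1.0d0) gives the full double-precision value 3.141592653589793.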
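A minimal sketch of the literal-kind effect described above. The module and function names are hypothetical and the benchmark's own source is not reproduced here. It assumes the usual compiler defaults, where default REAL is 4 bytes; building with a flag that promotes default REAL to 8 bytes would change the first result. It uses the `real64` kind constant from `iso_fortran_env` (Fortran 2008) instead of the Fortran95-era `d0` suffix.

```fortran
! Sketch: the kind of the literals decides the precision of 4.0*atan(1.0).
module pi_literals
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  private
  public :: pi_default_kind, pi_double_kind

contains

  ! 4.0 and 1.0 are default REAL literals (single precision under the usual
  ! defaults), so ATAN runs in single precision and the already-rounded
  ! result is only widened when it is assigned to the real64 result.
  function pi_default_kind() result(p)
    real(dp) :: p
    p = 4.0*atan(1.0)
  end function pi_default_kind

  ! real64 literals keep the whole expression in double precision.
  function pi_double_kind() result(p)
    real(dp) :: p
    p = 4.0_dp*atan(1.0_dp)
  end function pi_double_kind

end module pi_literals
```

Printed with `print '(f18.15)', ...`, `pi_default_kind()` gives 3.141592741012573 (the value in the table, which is pi rounded to single precision) and `pi_double_kind()` gives 3.141592653589793. Keeping the precision in one named kind constant makes it hard to mix kinds by accident.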

PI value: 3.14158260 when dx = 0.10E-04

PI value: 3.14159165 when dx = 0.10E-05

PI value: 3.14159238 when dx = 0.10E-06

PI value: 3.14159263 when dx = 0.10E-07

PI value: 3.14159265 when dx = 0.10E-08

With the smallest step (dx = 0.10E-08), all eight printed decimals match the reference.

Single precision result:

PI in reference: 3.1415926535

PI from 4.0*atan(1.0): 3.1415927

Matches to the sixth decimal place.

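PI value: 3.1415827 when dx = 0.10E-04

PI value: 3.1415935 when dx = 0.10E-05

PI value: 3.1419716 when dx = 0.10E-06

At best (dx = 0.10E-05) the single-precision result matches to the fifth decimal place. Refining dx further makes it worse, not better: at dx = 0.10E-06 only three decimals match, because round-off error grows with the number of single-precision additions.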

Benchmark code download: link
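The integration code is not shown, so the sketch below assumes one scheme whose behaviour resembles the double-precision numbers above: a right-endpoint Riemann sum of 4/(1+x²) over [0, 1]. Its truncation error is about −dx, which fits the 3.14159165 reported for dx = 0.10E-05. The same loop is written once in each precision so the round-off difference can be reproduced; the module and function names are illustrative.

```fortran
! Sketch (assumed scheme): pi = integral of 4/(1+x**2) over [0,1],
! approximated by a right-endpoint Riemann sum with n = 1/dx intervals.
module pi_riemann
  use, intrinsic :: iso_fortran_env, only: sp => real32, dp => real64
  implicit none
  private
  public :: pi_sum_sp, pi_sum_dp

contains

  ! Every operation, including the running sum, is single precision.
  function pi_sum_sp(n) result(p)
    integer, intent(in) :: n          ! number of intervals (n = 1/dx)
    real(sp) :: p, dx, x, s
    integer :: i
    dx = 1.0_sp / real(n, sp)
    s = 0.0_sp
    do i = 1, n
      x = real(i, sp)*dx              ! right endpoint of interval i
      s = s + 4.0_sp/(1.0_sp + x*x)
    end do
    p = s*dx
  end function pi_sum_sp

  ! Same algorithm, double precision throughout (note the _dp literals).
  function pi_sum_dp(n) result(p)
    integer, intent(in) :: n
    real(dp) :: p, dx, x, s
    integer :: i
    dx = 1.0_dp / real(n, dp)
    s = 0.0_dp
    do i = 1, n
      x = real(i, dp)*dx
      s = s + 4.0_dp/(1.0_dp + x*x)
    end do
    p = s*dx
  end function pi_sum_dp

end module pi_riemann
```

`pi_sum_sp(10**7)` and `pi_sum_dp(10**7)` correspond to dx = 0.10E-06, the case where accumulated single-precision round-off is expected to dominate the truncation error. Accumulating the sum in double precision, even when the data are single precision, is the usual remedy.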

3. Hello World! program with OpenMP

  • Hardware: Mac Pro (Late 2013) 2.7 GHz 12-Core Intel Xeon E5, 64 GB DDR3
  • Compiler: Intel(R) Fortran Intel(R) 64 Compiler Version 16.0.1.111
  • Programming Language: Fortran95

 The number of processors available =          24

 The number of threads available =          12

 Outside the parallel region.

Hello World! from thread  0

 Start parallel region

Hello World! from thread  0

Hello World! from thread  8

Hello World! from thread  4

Hello World! from thread  3

Hello World! from thread  2

Hello World! from thread  9

Hello World! from thread 11

Hello World! from thread 10

Hello World! from thread  7

Hello World! from thread  6

Hello World! from thread  1

Hello World! from thread  5

 Finish parallel region

Elapsed wall clock time =        0.00168 s

Prototype code download: link
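The prototype source behind this output is not shown. The following sketch (hypothetical module and function names) prints output of the same shape using the standard `omp_lib` routines. The `!$` sentinel lines are compiled only when OpenMP is enabled (`-qopenmp` for Intel Fortran 16, `-fopenmp` for gfortran), so without OpenMP it runs on a single thread. `omp_get_num_procs` counts logical processors, which is presumably why the 12-core Xeon above reports 24.

```fortran
! Sketch: OpenMP "Hello World!" with thread counts and wall-clock timing.
! Lines starting with the !$ sentinel are compiled only under OpenMP.
module omp_hello
  !$ use omp_lib
  implicit none
  private
  public :: hello_parallel

contains

  ! Greets from the serial part and from every thread of one parallel
  ! region; returns how many threads greeted inside the region.
  function hello_parallel() result(ngreet)
    integer :: ngreet
    integer :: ncount, nprocs, maxthr, tid
    integer :: c0, c1, rate

    nprocs = 1
    maxthr = 1
    !$ nprocs = omp_get_num_procs()     ! processors visible to the runtime
    !$ maxthr = omp_get_max_threads()   ! team size for the next region
    print *, 'The number of processors available = ', nprocs
    print *, 'The number of threads available = ', maxthr

    call system_clock(c0, rate)
    tid = 0
    !$ tid = omp_get_thread_num()
    print *, 'Outside the parallel region.'
    print '(a,i3)', 'Hello World! from thread', tid

    ncount = 0
    print *, 'Start parallel region'
    !$omp parallel private(tid) reduction(+:ncount)
    tid = 0
    !$ tid = omp_get_thread_num()
    print '(a,i3)', 'Hello World! from thread', tid
    ncount = ncount + 1                 ! each thread contributes one greeting
    !$omp end parallel
    print *, 'Finish parallel region'

    call system_clock(c1)
    print '(a,f12.5,a)', 'Elapsed wall clock time = ', real(c1 - c0)/real(rate), ' s'
    ngreet = ncount
  end function hello_parallel

end module omp_hello
```

Calling `hello_parallel()` from a main program reproduces the listing. The greetings inside the parallel region come out in a different order from run to run, as in the output above, because the threads print concurrently.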

4. Compiler optimization options

  • Hardware: Mac Pro (Late 2013) 2.7 GHz 12-Core Intel Xeon E5, 64 GB DDR3
  • Compiler: Intel(R) Fortran Intel(R) 64 Compiler Version 16.0.1.111
  • Programming Language: Fortran95

Compiler optimization flags : run time

-O0 : 46.978 s

-O1 : 18.562 s

-O2 : 8.715 s

-O3 : 8.497 s

-O3 -xHost : 4.356 s

-O3 -no-vec : 18.959 s

-O3 -xAVX : 4.654 s

Benchmark code download: link
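The benchmarked kernel is not shown. As a hypothetical illustration of why `-no-vec` costs so much at `-O3` (18.959 s vs. 8.497 s), the loop below has unit stride and independent iterations, which is the kind of loop the compiler can vectorize; Fortran's restriction on aliased dummy arguments lets it do so without run-time overlap checks. The module name, the kernel, and the sizes are assumptions.

```fortran
! Hypothetical kernel for comparing optimization flags: repeated y = a*x + y.
module axpy_bench
  use, intrinsic :: iso_fortran_env, only: dp => real64, i8 => int64
  implicit none
  private
  public :: axpy_repeat

contains

  ! Runs y = a*x + y nrep times and returns the wall-clock time in seconds.
  ! The inner loop is unit-stride with no dependence between iterations,
  ! so it is a candidate for SIMD code; -no-vec forces scalar code instead.
  subroutine axpy_repeat(a, x, y, nrep, seconds)
    real(dp), intent(in)    :: a, x(:)
    real(dp), intent(inout) :: y(:)
    integer,  intent(in)    :: nrep
    real(dp), intent(out)   :: seconds
    integer(i8) :: c0, c1, rate
    integer :: i, r

    call system_clock(c0, rate)
    do r = 1, nrep
      do i = 1, size(y)
        y(i) = a*x(i) + y(i)
      end do
    end do
    call system_clock(c1)
    seconds = real(c1 - c0, dp)/real(rate, dp)
  end subroutine axpy_repeat

end module axpy_bench
```

To compare flags, compile the same file with each flag set (for example `ifort -O3 -no-vec` and `ifort -O3 -xHost`) and choose the array size and `nrep` so the measured time is well above the clock resolution.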

5. Heat equation solver – OpenMP

  • Hardware: Mac Pro (Late 2013) 2.7 GHz 12-Core Intel Xeon E5, 64 GB DDR3
  • Compiler: Intel(R) Fortran Intel(R) 64 Compiler Version 16.0.1.111
  • Programming Language: Fortran95

# of threads : run time

1 : 16.16277 s

2 : 9.27891 s

3 : 7.0657 s

4 : 5.99156 s

5 : 5.37791 s

6 : 5.03487 s

10 : 3.87756 s

12 : 3.74724 s

20 : 4.29448 s
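Speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p summarize the Mac Pro timings above; the arithmetic, using only the measured values, is:

```latex
S(12) = \frac{T(1)}{T(12)} = \frac{16.16277\,\mathrm{s}}{3.74724\,\mathrm{s}} \approx 4.31,
\qquad
E(12) = \frac{S(12)}{12} \approx 0.36

S(20) = \frac{T(1)}{T(20)} = \frac{16.16277\,\mathrm{s}}{4.29448\,\mathrm{s}} \approx 3.76
```

So 12 threads recover a little over a third of ideal linear speedup on this machine, and 20 threads are slower than 12.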

  • Hardware: NMM-Duhem – Intel i7-8700 (6-core) + 32 GB DDR4 RAM + GTX 1050 Ti + 250 GB Samsung SSD (860 EVO)
  • Compiler: Intel(R) Fortran Intel(R) 64 Compiler Version 18.0.2
  • Programming Language: Fortran95

# of threads : run time

1 : 4.30457 s

6 : 3.08603 s

10 : 2.91980 s

12 : 3.69964 s

  • Hardware: NMM-Gibbs – workstation for CUDA: Intel i9-7980XE CPU (18-core) + 128 GB RAM (DDR4-2400, PC4-19200) + 2-way GTX 1080 Ti (SLI) + 1G NVMe SSD + Samsung SSD (860 EVO)
  • Compiler: Intel(R) Fortran Intel(R) 64 Compiler Version 18.0.2
  • Programming Language: Fortran95

# of threads : run time

12 : 2.473 s

18 : 2.425 s

Benchmark code download: link
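The solver source is not reproduced here. Assuming an explicit finite-difference (FTCS) discretization on a periodic 2D grid, one time step parallelizes as below: each thread updates whole columns, so writes never conflict, and the inner loop walks contiguous memory. The module name, the periodic boundaries, and the parameter `r` = D·dt/dx² are illustrative assumptions.

```fortran
! Sketch (assumed scheme): one explicit FTCS step of u_t = D*(u_xx + u_yy)
! on a periodic grid, parallelized over columns with OpenMP.
module heat_ftcs
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  private
  public :: ftcs_step

contains

  ! unew must have the same shape as u; r = D*dt/dx**2 (equal dx and dy).
  subroutine ftcs_step(u, unew, r)
    real(dp), intent(in)  :: u(:,:)
    real(dp), intent(out) :: unew(:,:)
    real(dp), intent(in)  :: r
    integer :: i, j, ip, im, jp, jm, nx, ny

    nx = size(u, 1)
    ny = size(u, 2)
    ! Columns are independent, so they are shared out among the threads;
    ! the inner i loop runs down a column, which is contiguous in memory.
    !$omp parallel do private(i, ip, im, jp, jm)
    do j = 1, ny
      jp = modulo(j, ny) + 1          ! periodic neighbour j+1
      jm = modulo(j - 2, ny) + 1      ! periodic neighbour j-1
      do i = 1, nx
        ip = modulo(i, nx) + 1
        im = modulo(i - 2, nx) + 1
        unew(i, j) = u(i, j) + r*(u(ip, j) + u(im, j) + u(i, jp) + u(i, jm) - 4.0_dp*u(i, j))
      end do
    end do
    !$omp end parallel do
  end subroutine ftcs_step

end module heat_ftcs
```

A driver would call `ftcs_step(u, unew, r)` and then swap `u` and `unew` each step, with the thread count set through `OMP_NUM_THREADS` or `omp_set_num_threads`. For this scheme r must not exceed 0.25 for stability.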

6. Heat equation solver – Intel MKL FFTW wrappers vs. user-compiled FFTW

  • Hardware: Mac Pro (Late 2013) 2.7 GHz 12-Core Intel Xeon E5, 64 GB DDR3
  • Compiler: Intel(R) Fortran Intel(R) 64 Compiler Version 16.0.1.111
  • Programming Language: Fortran95

Intel MKL FFTW wrappers: 4.82680 s

User-compiled FFTW: 9.45596 s

On this machine the MKL-wrapped version runs about twice as fast as the user-compiled library.

  • Hardware: NMM-Duhem – Intel i7-8700 (6-core) + 32 GB DDR4 RAM + GTX 1050 Ti + 250 GB Samsung SSD (860 EVO)
  • Compiler: Intel(R) Fortran Intel(R) 64 Compiler Version 18.0.2
  • Programming Language: Fortran95

Intel MKL FFTW wrappers: 3.02607 s

Benchmark code download: link
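The solver's source is not reproduced here. Assuming it advances the diffusion equation ∂c/∂t = D∇²c with an implicit (backward-Euler) Fourier spectral step, where c, D, Δt and the wave vector k are illustrative symbols, each time step is one forward FFT, a pointwise division, and one inverse FFT:

```latex
\hat{c}^{\,n} = \mathcal{F}\left[c^{\,n}\right],
\qquad
\hat{c}^{\,n+1}(\mathbf{k}) = \frac{\hat{c}^{\,n}(\mathbf{k})}{1 + D\,\Delta t\,|\mathbf{k}|^{2}},
\qquad
c^{\,n+1} = \mathcal{F}^{-1}\left[\hat{c}^{\,n+1}\right]
```

Since the update between the transforms is a single division per mode, nearly all the arithmetic would sit in the FFT calls, which is consistent with the run time roughly halving when only the FFT library changes.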

 

7. Heat equation solver – Intel MKL FFTW wrappers with OpenMP

  • Hardware: Mac Pro (Late 2013) 2.7 GHz 12-Core Intel Xeon E5, 64 GB DDR3
  • Compiler: Intel(R) Fortran Intel(R) 64 Compiler Version 16.0.1.111
  • Programming Language: Fortran95

 

# of threads : run time

1 : 6.97241 s

2 : 4.16089 s

3 : 3.32926 s

4 : 2.75631 s

5 : 2.46943 s

6 : 2.28327 s

10 : 1.85547 s

12 : 1.78854 s

20 : 1.79737 s

Benchmark code download: link

 

  • Hardware: Dell Precision T5600 (Intel Xeon(R) CPU E5-2667 2.90 GHz, 24 cores, 64 GB RAM)
  • Compiler: Intel(R) Fortran Intel(R) 64 Compiler Version 16.0.1.111
  • Programming Language: Fortran95

# of threads : run time

1 : 12.00364 s

2 : 7.43734 s

3 : 5.59979 s

4 : 4.84057 s

5 : 4.06444 s

6 : 3.60835 s

10 : 2.87825 s

12 : 3.12203 s

 

  • Hardware: NMM-Duhem – Intel i7-8700 (6-core) + 32 GB DDR4 RAM + GTX 1050 Ti + 250 GB Samsung SSD (860 EVO)
  • Compiler: Intel(R) Fortran Intel(R) 64 Compiler Version 18.0.2
  • Programming Language: Fortran95

# of threads : run time

10 : 1.94229 s

12 : 1.91429 s

14 : 1.89164 s

16 : 1.94960 s

Benchmark code download: link

 

8. Heat equation solver – Intel MKL FFTW wrappers with OpenMP (12 cores) – planner flag

  • Hardware: Mac Pro (Late 2013) 2.7 GHz 12-Core Intel Xeon E5, 64 GB DDR3
  • Compiler: Intel(R) Fortran Intel(R) 64 Compiler Version 16.0.1.111
  • Programming Language: Fortran95

Planner flag : run time

FFTW_ESTIMATE : 1.76446 s

FFTW_MEASURE : 1.77204 s

FFTW_PATIENT : 1.77307 s

FFTW_EXHAUST : 1.77289 s
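In FFTW, the planner flag only controls how much effort goes into choosing an algorithm: FFTW_ESTIMATE picks a plan heuristically, while FFTW_MEASURE, FFTW_PATIENT and FFTW_EXHAUST time trial transforms over progressively wider sets of candidates. The relative spread of the four timings above is:

```latex
\frac{T_{\mathrm{PATIENT}} - T_{\mathrm{ESTIMATE}}}{T_{\mathrm{ESTIMATE}}}
= \frac{1.77307 - 1.76446}{1.76446} \approx 0.0049 \approx 0.5\,\%
```

A spread of about half a percent from single runs is too small to rank the flags; repeated runs would be needed to tell whether any difference is real.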