This project and the research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under grant agreement n° 288777.
http://www.montblanc-project.eu
26th June 2013, Ter@tec Forum
The Mont-Blanc Project
Daniele Tafani, Leibniz Supercomputing Centre
compilers
Figure: processor performance in MFLOPS (axis from 100 to 1,000,000) by year, 1990 to 2015. Series: Alpha, Intel, AMD, NVIDIA Tegra, Samsung Exynos, 4-core ARMv8 @ 1.5 GHz.
Figure: double-precision (DP) operations per cycle (axis from 1 to 16) by year, 1999 to 2015. Series: Intel SSE, IBM BG/P, Intel AVX, IBM BG/Q, ARM Cortex-A9, ARM Cortex-A15, ARMv8. Annotation: 8 DP ops / cycle.
Figure: energy efficiency in GFLOPS/Watt for ARM11 @ 482 MHz, Cortex-A9 @ 1 GHz, Cortex-A15 @ 2 GHz* and BG/Q @ 1.6 GHz.
* Based on ARM Cortex-A9 @ 2 GHz power consumption on 45 nm; not an ARM commitment.
Diagram labels: 10-40 Gb/s, 1 Gb/s
Exynos 5 compute card: 1 x Samsung Exynos 5 Dual (2 x Cortex-A15 @ 1.7 GHz, 1 x Mali T604 GPU); 6.8 + 25.5 GFLOPS (peak); ~10 Watts; 3.2 GFLOPS/W (peak)
Carrier blade: 15 x compute cards; 485 GFLOPS; 1 GbE to 10 GbE; 200 Watts (?); 2.4 GFLOPS/W
7U blade chassis: 9 x carrier blades; 135 x compute cards; 4.3 TFLOPS; 2 kW; 2.2 GFLOPS/W
1 rack: 4 x blade cabinets; 36 blades; 540 compute cards; 2 x 36-port 10GbE switch; 8-port 40GbE uplink; 17.2 TFLOPS (peak); 8.2 kW; 2.1 GFLOPS/W (peak)
80 Gb/s
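The per-level figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original slides: it takes the per-card peak performance and the power figures exactly as quoted, and recomputes the card counts and GFLOPS/W at each level. The names `QUOTED_WATTS`, `cards_at`, `peak_gflops` and `efficiency` are hypothetical helpers introduced for illustration only.

```python
# Figures quoted on the slide (peak values). The helper names are illustrative.
CARD_CPU_GFLOPS = 6.8    # 2 x Cortex-A15 @ 1.7 GHz
CARD_GPU_GFLOPS = 25.5   # 1 x Mali T604 GPU

CARDS_PER_BLADE = 15
BLADES_PER_CHASSIS = 9
CHASSIS_PER_RACK = 4

# Power per level as quoted on the slide, in watts.
# The card value is approximate ("~10 Watts") and the blade value is marked "(?)".
QUOTED_WATTS = {"card": 10.0, "blade": 200.0, "chassis": 2000.0, "rack": 8200.0}


def cards_at(level: str) -> int:
    """Number of compute cards contained in one unit of the given level."""
    counts = {
        "card": 1,
        "blade": CARDS_PER_BLADE,
        "chassis": CARDS_PER_BLADE * BLADES_PER_CHASSIS,
        "rack": CARDS_PER_BLADE * BLADES_PER_CHASSIS * CHASSIS_PER_RACK,
    }
    return counts[level]


def peak_gflops(level: str) -> float:
    """Peak GFLOPS of one unit: cards times (CPU + GPU) peak per card."""
    return cards_at(level) * (CARD_CPU_GFLOPS + CARD_GPU_GFLOPS)


def efficiency(level: str) -> float:
    """Peak GFLOPS per watt, using the power quoted on the slide."""
    return peak_gflops(level) / QUOTED_WATTS[level]


for level in ("card", "blade", "chassis", "rack"):
    print(f"{level:8s} cards={cards_at(level):4d} "
          f"peak={peak_gflops(level):9.1f} GFLOPS "
          f"eff={efficiency(level):.2f} GFLOPS/W")
```

Rounded to one decimal, the recomputed efficiencies agree with the quoted 3.2, 2.4, 2.2 and 2.1 GFLOPS/W. The recomputed chassis and rack totals come out slightly above the quoted 4.3 and 17.2 TFLOPS; the quoted rack figure equals four times the rounded chassis figure, which may explain the difference.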
Bull Newsca compute unit (coldplate)
LRZ SuperMUC compute unit (cooling pipeline)
x86 systems.
Figure: software stack diagram.
Source files (C, C++, FORTRAN, …)
gcc, gfortran, OmpSs, …
Native compiler(s)
Executable(s)
OmpSs runtime library (NANOS++), CUDA, OpenCL, MPI, GASNet
Scientific libraries: FFTW, HDF5, ATLAS, …
Developer tools: Scalasca, Paraver, …
Cluster management (Slurm)
Linux
CPU, GPU
BQCD: Particle physics
BigDFT *
COSMO: Weather forecast
EUTERPE: Fusion
MP2C: Multi-particle collisions
PEPC: Coulomb + gravitational forces
ProFASI: Protein folding
Quantum ESPRESSO *
SMMP *: Protein folding
SPECFEM3D *: Wave propagation
YALES2: Combustion
* Already GPU capable (CUDA