
NVIDIA Application Lab at Jülich (Dirk Pleiter, Jülich Supercomputing Centre)



  1. NVIDIA Application Lab at Jülich
     Dirk Pleiter | Jülich Supercomputing Centre (JSC)
     Member of the Helmholtz Association

  2. Forschungszentrum Jülich at a Glance (status 2010)
     - Budget: 450 million euros
     - Staff: 4,800 (of whom 1,630 are scientists)
     - Visiting scientists: 900 per year
     - Trainees: 90
     - Publications: 1,800
     - Protective rights and licences: 14,800
     - Research fields: health, energy and environment, and information technology; key technologies for tomorrow
     14.11.2012 Dirk Pleiter | NVIDIA Application Lab at Jülich

  3. Jülich Supercomputing Centre
     Supercomputer operation for:
     - Centre: FZJ
     - Regional: JARA
     - Helmholtz and national: NIC, GCS
     - Europe: PRACE, EU projects
     Application support
     - User support; coordination with SimLabs
     - Scientific visualization
     - Peer review support and coordination
     R&D work
     - Algorithms, performance analysis and tools
     - Community data management service
     - Computer architectures, Exascale Laboratories: EIC, ECL, NVIDIA
     Education and training

  4. Supercomputer Systems: Dual Track Approach
     Two tracks: general-purpose and highly scalable
     - 2004: IBM Power 4+ JUMP, 9 TFlop/s
     - 2006-8: IBM Blue Gene/L JUBL, 45 TFlop/s; IBM Power 6 JUMP, 9 TFlop/s
     - 2009: JUROPA, 200 TFlop/s; HPC-FF, 100 TFlop/s; IBM Blue Gene/P JUGENE, 1 PFlop/s
     - JUDGE, 240 TFlop/s
     - 2012: IBM Blue Gene/Q JUQUEEN, 5.7 PFlop/s (target)
     - 2014: JUROPA++ cluster, 1-2 PFlop/s, + Booster
     - File server: GPFS, Lustre

  5. JUDGE Cluster
     System
     - 206 IBM iDataPlex nodes
     - 2 Tesla M2050 or M2070 GPUs per node
     - InfiniBand QDR network
     - Peak performance: 239 TFlop/s
     Users
     - Institute for Advanced Simulation
     - Molecular dynamics and mechanics, micro-magnetism simulations, medical image reconstruction
     - JuBrain partition
     - Milky Way partition

  6. NVIDIA Application Lab at Jülich
     Collaboration between JSC and NVIDIA since July 2012
     - Enable scientific applications for GPU-based architectures
     - Provide support for their optimization
     - Investigate performance and scaling
     Work focus
     - Application requirements analysis
     - Kepler and CUDA feature analysis
     - Parallelization on many GPUs
     - Collaboration with performance tools developers
     - Training

  7. Pilot Application: JuBrain
     Application developed at the Institute of Neuroscience and Medicine (INM-1) at Forschungszentrum Jülich: Katrin Amunts, Markus Axer, Marcel Huysegoms
     Research goal
     - Accurate, highly detailed computer model of the human brain

  8. Brain Section Images
     Blockface pictures
     - Created while cutting the brain into sections
     Histological images
     - Polarized light images
     - Low resolution vs. high resolution
       - 100 μm → 3 μm pixel size
       - 30 MBytes → 40 GBytes of data (exceeds GPU memory capacity)
     Challenge: 3D reconstruction
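
A rough consistency check of the figures on this slide, assuming the same imaged area and the same number of bytes per pixel at both resolutions (neither is stated on the slide):

```latex
\left(\frac{100\,\mu\mathrm{m}}{3\,\mu\mathrm{m}}\right)^{2} \approx 1111,
\qquad
30\,\mathrm{MB} \times 1111 \approx 33\,\mathrm{GB}
```

Under those assumptions the estimate lands in the same range as the 40 GBytes quoted, and well beyond the 3 GB or 6 GB of memory on a Tesla M2050 or M2070.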

  9. 3D Reconstruction
     Registration components (diagram): fixed image, moving image, metric, optimizer, interpolator, transformation
     Registration algorithms
     - Rigid registration → 3 parameters
     - Affine registration → 6 parameters
     - Elastic registration → O(100) parameters
     O(30) speedup on GPU
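
The interplay of the components named on this slide (transformation, interpolator, metric, optimizer) can be sketched in a few lines of C. This is a deliberately simplified illustration, not the JuBrain implementation: the transformation is an integer translation only (2 parameters, one fewer than the slide's rigid case, which adds a rotation angle), the interpolator is an exact pixel lookup, the metric is a mean squared difference, and the optimizer is an exhaustive search. The names `msd_metric` and `register_translation` are hypothetical.

```c
#include <assert.h>
#include <float.h>

/* Metric: mean squared difference between the fixed image and the moving
   image shifted by (dx, dy). Only overlapping pixels are compared.
   Images are row-major, w pixels wide and h pixels high. */
double msd_metric(const float *fixed, const float *moving,
                  int w, int h, int dx, int dy)
{
    assert(w > 0 && h > 0);
    double sum = 0.0;
    long n = 0;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            /* Transformation + interpolator: fixed(x, y) is compared
               with moving(x - dx, y - dy), an exact pixel lookup. */
            int mx = x - dx, my = y - dy;
            if (mx < 0 || mx >= w || my < 0 || my >= h)
                continue;
            double d = (double)fixed[y * w + x] - (double)moving[my * w + mx];
            sum += d * d;
            ++n;
        }
    }
    return n > 0 ? sum / (double)n : DBL_MAX;
}

/* Optimizer: try every shift in [-r, r] x [-r, r] and keep the one with
   the smallest metric value. */
void register_translation(const float *fixed, const float *moving,
                          int w, int h, int r, int *best_dx, int *best_dy)
{
    double best = DBL_MAX;
    *best_dx = 0;
    *best_dy = 0;
    for (int dy = -r; dy <= r; ++dy) {
        for (int dx = -r; dx <= r; ++dx) {
            double m = msd_metric(fixed, moving, w, h, dx, dy);
            if (m < best) {
                best = m;
                *best_dx = dx;
                *best_dy = dy;
            }
        }
    }
}
```

The metric evaluation touches every pixel for every candidate transformation, which is the part of such a loop that lends itself to data-parallel execution. With more parameters (affine, elastic) an exhaustive search is no longer practical, so an iterative optimizer would take its place.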

  10. Fluid dynamics on Fermi and Kepler
      Lattice Boltzmann method
      - D2Q37 model
      - Application developed at U Rome Tor Vergata/INFN, U Ferrara/INFN, TU Eindhoven
      - Reproduces the dynamics of a fluid by simulating virtual particles which collide and propagate
      - Simulation of large systems requires double-precision computation on many GPUs

  11. Collide kernel on Fermi
      - Kernel dominated by arithmetic operations
      - [Plot: floating-point performance [GFlop/s] as a function of the number of threads/block]
      Excellent performance on Fermi
      Implementation: F. Schifano (U Ferrara/INFN)

  12. Kepler Performance Tuning
      Performance analysis observations
      - Significant increase of L1 cache misses: 17% (Tesla M2090) → 67% (Tesla K20)
      - SM performance increased, but L1 cache capacity remained unchanged
      Problem mitigation by simple code change
      - Enforce loop unrolling to eliminate indirect memory accesses
      Original loop:
          for (i = 0; i < NPOP-1; i++) {
              lPop = p_prv[i*NX*NY + idx];
              u = u + param_cx[i] * lPop;
              v = v + param_cy[i] * lPop;
          }
      Loop with enforced unrolling:
          #pragma unroll
          for (i = 0; i < NPOP-1; i++) {
              lPop = p_prv[i*NX*NY + idx];
              u = u + param_cx[i] * lPop;
              v = v + param_cy[i] * lPop;
          }
      J. Kraus (NVIDIA Lab)
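
To make the effect of the unrolling concrete, here is a small host-side C sketch of the same accumulation, once as a loop and once written out by hand. It illustrates what full unrolling produces: every index into `param_cx`, `param_cy` and `p_prv` becomes a compile-time constant. The sizes and velocity values below are toy assumptions chosen for brevity (the D2Q37 model has 37 populations), and the function names are hypothetical.

```c
#include <assert.h>

/* Toy sizes for illustration only; not the values used in the talk. */
enum { NX = 4, NY = 2, NPOP = 5 };

/* Hypothetical velocity components, not the D2Q37 set. */
static const double param_cx[NPOP - 1] = { 1.0, 0.0, -1.0,  0.0 };
static const double param_cy[NPOP - 1] = { 0.0, 1.0,  0.0, -1.0 };

/* Loop form, as on the slide: the parameter arrays are indexed by the
   runtime loop variable i. In the CUDA kernel, "#pragma unroll" would be
   placed directly above this loop. */
void moments_loop(const double *p_prv, int idx, double *u_out, double *v_out)
{
    assert(idx >= 0 && idx < NX * NY);
    double u = 0.0, v = 0.0;
    for (int i = 0; i < NPOP - 1; i++) {
        double lPop = p_prv[i * NX * NY + idx];
        u = u + param_cx[i] * lPop;
        v = v + param_cy[i] * lPop;
    }
    *u_out = u;
    *v_out = v;
}

/* Hand-unrolled form: the same operations in the same order, but every
   array index is now a constant known at compile time. */
void moments_unrolled(const double *p_prv, int idx, double *u_out, double *v_out)
{
    assert(idx >= 0 && idx < NX * NY);
    double u = 0.0, v = 0.0, lPop;
    lPop = p_prv[0 * NX * NY + idx];
    u = u + param_cx[0] * lPop;  v = v + param_cy[0] * lPop;
    lPop = p_prv[1 * NX * NY + idx];
    u = u + param_cx[1] * lPop;  v = v + param_cy[1] * lPop;
    lPop = p_prv[2 * NX * NY + idx];
    u = u + param_cx[2] * lPop;  v = v + param_cy[2] * lPop;
    lPop = p_prv[3 * NX * NY + idx];
    u = u + param_cx[3] * lPop;  v = v + param_cy[3] * lPop;
    *u_out = u;
    *v_out = v;
}
```

Both functions perform the same floating-point operations in the same order, so they return identical results. The difference is only in what the compiler can resolve ahead of time.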

  13. Collide kernel on Kepler GK110
      Comparison Fermi vs. Kepler
      - Grid size considered here: 252 x 16384
      - [Plot: floating-point performance as a function of the number of threads/block]
      Performance improvement: 1.7x

  14. Propagate kernel
      Kernel dominated by memory access
      - Grid size considered here: 252 x 16384
      - [Plot: memory bandwidth [GByte/s] as a function of the number of threads/block]
      Performance improvement: 1.4x
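
Why the propagate step is bound by memory access can be seen from a minimal C sketch of it: each population value is read once and written once, with no arithmetic on the data itself. The sketch below is a serial illustration under stated assumptions, not the code from the talk: periodic boundaries, the site layout `idx = x*ny + y`, and the names `propagate` and `propagate_bytes` are all hypothetical. Only the population-major indexing `i*nx*ny + idx` follows the loop shown on slide 12.

```c
#include <assert.h>

/* Pull-style propagate: for each population i, every site (x, y) gathers
   the value from the site it streams in from, (x - cx[i], y - cy[i]).
   Periodic wrap-around is an assumption of this sketch. */
void propagate(const double *prv, double *nxt,
               const int *cx, const int *cy,
               int npop, int nx, int ny)
{
    assert(nx > 0 && ny > 0 && npop > 0);
    for (int i = 0; i < npop; i++) {
        for (int x = 0; x < nx; x++) {
            for (int y = 0; y < ny; y++) {
                /* Wrap source coordinates into [0, nx) and [0, ny). */
                int sx = ((x - cx[i]) % nx + nx) % nx;
                int sy = ((y - cy[i]) % ny + ny) % ny;
                nxt[i * nx * ny + x * ny + y] = prv[i * nx * ny + sx * ny + sy];
            }
        }
    }
}

/* Bytes read plus bytes written per propagate step, assuming each
   double-precision population value is moved exactly once each way. */
long long propagate_bytes(int npop, int nx, int ny)
{
    return 2LL * (long long)sizeof(double) * npop * nx * ny;
}
```

For the 252 x 16384 grid and 37 populations, `propagate_bytes` gives roughly 2.4 GB of memory traffic per step under these assumptions. Dividing that figure by a measured kernel time is one way to arrive at an effective bandwidth in GByte/s.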

  15. Summary
      NVIDIA Application Lab at Jülich
      - New and fruitful model for collaboration
      - We are just at the beginning ...
      Application requirements analysis
      - JuBrain: project aiming for a realistic model of the human brain
      Kepler feature analysis
      - Initial performance results for Lattice Boltzmann application on GK110
      - Very high performance level reached on Fermi can be sustained
