SLIDE 1

Member of the Helmholtz Association (Helmholtz-Gemeinschaft)

NVIDIA Application Lab at Jülich

Dirk Pleiter | Jülich Supercomputing Centre (JSC)

SLIDE 2

14.11.2012 Dirk Pleiter | NVIDIA Application Lab at Jülich 2

Forschungszentrum Jülich at a Glance (status 2010)

  • Budget: 450 million euros
  • Staff: 4,800 (of whom 1,630 are scientists)
  • Visiting scientists: 900 per year
  • Trainees: 90
  • Publications: 1,800
  • Intellectual property rights and licences: 14,800
  • Research fields: health, energy and environment, and information technology; key technologies for tomorrow

SLIDE 3

Jülich Supercomputing Centre

Supercomputer operation for:

  • Centre – FZJ
  • Regional – JARA
  • Helmholtz & National – NIC, GCS
  • Europe – PRACE, EU projects

Application support

  • User support; coordination with SimLabs
  • Scientific visualization
  • Peer review support and coordination

R&D work

  • Algorithms, performance analysis and tools
  • Community data management service
  • Computer architectures; Exascale Laboratories: EIC, ECL, NVIDIA

Education and Training

SLIDE 4

Supercomputer Systems: Dual Track Approach

[Timeline figure: 2004, 2006-8, 2009, 2012, 2014; shared file servers GPFS, Lustre]

General-Purpose

  • IBM Power 4+ JUMP, 9 TFlop/s
  • IBM Power 6 JUMP, 9 TFlop/s
  • HPC-FF, 100 TFlop/s; JUROPA, 200 TFlop/s
  • JUROPA++ Cluster, 1-2 PFlop/s + Booster

Highly-Scalable

  • IBM Blue Gene/L JUBL, 45 TFlop/s
  • IBM Blue Gene/P JUGENE, 1 PFlop/s
  • IBM Blue Gene/Q JUQUEEN, 5.7 PFlop/s (target)

Also shown: JUDGE, 240 TFlop/s

SLIDE 5

JUDGE Cluster

System

  • 206 IBM iDataPlex nodes
  • 2 Tesla M2050 or M2070 per node
  • Infiniband QDR network
  • Peak performance: 239 TFlop/s

Users

  • Institute for Advanced Simulation
  • Molecular dynamics and mechanics, micro-magnetism simulations, medical image reconstruction
  • JuBrain partition
  • Milky Way partition

SLIDE 6

NVIDIA Application Lab at Jülich

Collaboration between JSC and NVIDIA since July 2012

  • Enable scientific applications for GPU-based architectures
  • Provide support for their optimization
  • Investigate performance and scaling

Work focus

  • Application requirements analysis
  • Kepler and CUDA feature analysis
  • Parallelization on many GPUs
  • Collaboration with performance tools developers
  • Training

SLIDE 7

Pilot Application: JuBrain

Application developed at the Institute of Neuroscience and Medicine (INM-1) at Forschungszentrum Jülich: Katrin Amunts, Markus Axer, Marcel Huysegoms

Research goal

Accurate, highly detailed computer model of the human brain

SLIDE 8

Brain Section Images

Blockface pictures

  • Created while cutting the brain into sections

Histological images

  • Polarized light images
  • Low resolution vs. high resolution
  • 100 μm → 3 μm pixel size
  • 30 MByte → 40 GByte of data

Challenge: 3D reconstruction

The high-resolution data exceed GPU memory capacity

SLIDE 9

3D Reconstruction

Registration algorithms

  • Rigid registration → 3 parameters
  • Affine registration → 6 parameters
  • Elastic registration → O(100) parameters

[Figure: registration framework connecting moving image, fixed image, metric, interpolator, transformation and optimizer]

O(30) speedup on GPU

SLIDE 10

Fluid dynamics on Fermi and Kepler

Lattice Boltzmann method

  • D2Q37 model
  • Application developed at U Rome Tor Vergata/INFN, U Ferrara/INFN, TU Eindhoven
  • Reproduces the dynamics of a fluid by simulating virtual particles that collide and propagate
  • Simulation of large systems requires double-precision computation on many GPUs

SLIDE 11

Collide kernel on Fermi

  • Kernel dominated by arithmetic operations
  • Floating-point performance [GFlop/s] as a function of the number of threads/block

Implementation:

  • F. Schifano (U Ferrara/INFN)

Excellent performance on Fermi

SLIDE 12

Kepler Performance Tuning

Performance analysis observations

  • Significant increase of L1 cache misses
  • 17% (Tesla M2090) → 67% (Tesla K20)

SM performance increased, but L1 cache capacity remained unchanged.

Problem mitigation by a simple code change: enforce loop unrolling to eliminate indirect memory accesses.

Original loop:

    // Velocity moments u, v at site idx
    for (i = 0; i < NPOP-1; i++) {
        lPop = p_prv[i*NX*NY + idx];
        u = u + param_cx[i] * lPop;
        v = v + param_cy[i] * lPop;
    }

With enforced unrolling:

    #pragma unroll  // fully unrolled if NPOP is a compile-time constant
    for (i = 0; i < NPOP-1; i++) {
        lPop = p_prv[i*NX*NY + idx];
        u = u + param_cx[i] * lPop;
        v = v + param_cy[i] * lPop;
    }

  • J. Kraus (NVIDIA Lab)

SLIDE 13

Collide kernel on Kepler GK110

Comparison Fermi vs. Kepler

  • Grid size considered here: 252 x 16384
  • Floating-point performance as a function of the number of threads/block

Performance improvement 1.7x

SLIDE 14

Propagate kernel

Kernel dominated by memory access

  • Grid size considered here: 252 x 16384
  • Memory bandwidth [GByte/s] as a function of the number of threads/block

Performance improvement 1.4x

SLIDE 15

Summary

NVIDIA Application Lab at Jülich

  • New and fruitful model for collaboration
  • We are just at the beginning ...

Application requirements analysis

  • JuBrain: Project aiming for realistic model of the human brain

Kepler feature analysis

  • Initial performance results for Lattice Boltzmann application on GK110
  • Very high performance level reached on Fermi can be sustained