High Performance In-Situ Visualization on Thousands of GPUs


Jeroen Bédorf, Evghenii Gaburov, Simon Portegies Zwart, Peter Messmer. Leiden Observatory.


slide-1
SLIDE 1

High Performance In-Situ Visualization on Thousands of GPUs

Jeroen Bédorf Simon Portegies Zwart

Leiden Observatory

Peter Messmer Evghenii Gaburov

slide-2
SLIDE 2
slide-3
SLIDE 3

Compute machine Simulation Ex-situ visualization machine I/O layer I/O layer

analysis & visualization software

disk I/O software Storage disk I/O software

slide-4
SLIDE 4

Compute & in-situ visualization machine Simulation

analysis & visualization, simulation steering sw

I/O layer Storage disk I/O software

slide-5
SLIDE 5
slide-6
SLIDE 6

Discovered at SC14! “Hoax object”

slide-7
SLIDE 7

Gravitational tree code :: Bonsai

http://github.com/treecode/Bonsai

Features:

  • Scales up to 25 Pflops on Titan supercomputer
  • Async parallel I/O
  • In-situ (parallel) visualization

Showcased at GTC12 & SC14. Gordon Bell Prize Finalist (2014).

slide-8
SLIDE 8


slide-9
SLIDE 9

Compute & in-situ visualization machine Bonsai

analysis & visualization, simulation steering sw

I/O layer

In-situ visualization pipeline:

[Timeline figure, 10 ms per tick, stages run back to back: 1. Simulation step (80 ms), 2. Data partition (50 ms), 3. OpenGL rendering (60 ms), 4. Compositing (50 ms), then the next simulation step; a frame is displayed every 240 ms.]

  • 1. Simulation step
  • 2. Data partitioning
  • 3. OpenGL rendering
  • 4. Parallel compositing
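
The four stage durations on the timeline add up to the 240 ms display interval. A minimal sketch of that bookkeeping, assuming the stages run strictly one after another as drawn; the function name `sequential_schedule` is illustrative and not taken from Bonsai:

```python
# Stage durations in milliseconds, as read off the timeline on this slide.
STAGES = [
    ("Simulation step", 80),
    ("Data partition", 50),
    ("OpenGL rendering", 60),
    ("Compositing", 50),
]

def sequential_schedule(stages):
    """Return (name, start_ms, end_ms) for stages run strictly one after another."""
    schedule, t = [], 0
    for name, duration in stages:
        schedule.append((name, t, t + duration))
        t += duration
    return schedule

schedule = sequential_schedule(STAGES)
frame_time_ms = schedule[-1][2]  # end of the last stage = time per displayed frame
for name, start, end in schedule:
    print(f"{name:18s} {start:3d} -> {end:3d} ms")
print(f"frame time: {frame_time_ms} ms")
```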
slide-10
SLIDE 10
  • 2. Data partitioning
slide-11
SLIDE 11
slide-12
SLIDE 12
slide-13
SLIDE 13
slide-14
SLIDE 14

1 2 3 4 5 7 8 9 6

slide-15
SLIDE 15

1 2 3 4 5 7 8 9 6

slide-16
SLIDE 16

Space Filling Curve (SFC) Domain decomposition in Bonsai
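
A space-filling-curve decomposition can be sketched in a few lines: map every particle to a key along the curve, sort by key, and cut the sorted list into contiguous pieces of near-equal size. The sketch below is an illustration only. It assumes 2D points in [0, 1) and uses a Morton (Z-order) key for simplicity; the slide does not say which curve Bonsai uses, and the names `morton_key_2d` and `sfc_decompose` are hypothetical.

```python
import random

def morton_key_2d(ix, iy, bits=16):
    """Interleave the bits of integer cell coordinates (ix, iy) into a Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key

def sfc_decompose(points, n_domains, bits=16):
    """Sort points in [0, 1)^2 along the curve and cut it into contiguous pieces."""
    scale = 1 << bits
    ordered = sorted(
        points,
        key=lambda p: morton_key_2d(int(p[0] * scale), int(p[1] * scale), bits),
    )
    n = len(ordered)
    # Integer cut positions keep the pieces within one particle of equal size.
    return [ordered[i * n // n_domains:(i + 1) * n // n_domains]
            for i in range(n_domains)]

random.seed(0)
points = [(random.random(), random.random()) for _ in range(900)]
domains = sfc_decompose(points, n_domains=9)
print([len(d) for d in domains])
```

Pieces cut this way balance the particle count well, but their shapes follow the curve and are generally not axis-aligned boxes.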

slide-17
SLIDE 17
slide-18
SLIDE 18

Ray casting

depth

slide-19
SLIDE 19

Ray casting Sampling data

depth

slide-20
SLIDE 20

Ray casting Sampling data Shading

depth

slide-21
SLIDE 21

4 3 2 5 1

Ray casting Sampling data Shading Compositing

depth
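
The four steps named on this slide (ray casting, sampling, shading, compositing along depth) can be put together in a short CPU sketch. This is a generic illustration of front-to-back compositing, not Bonsai's OpenGL renderer; the `density` field and the `shade` transfer function are made-up stand-ins.

```python
import math

def density(x, y, z):
    """Hypothetical scalar field: a Gaussian blob centred at the origin."""
    return math.exp(-4.0 * (x * x + y * y + z * z))

def shade(rho):
    """Hypothetical transfer function: map density to (r, g, b, alpha)."""
    alpha = min(1.0, 0.2 * rho)
    return (rho, 0.5 * rho, 1.0 - rho, alpha)

def cast_ray(origin, direction, n_samples=64, t_max=4.0):
    """March along one ray, sampling, shading, and compositing front to back."""
    color = [0.0, 0.0, 0.0]
    alpha = 0.0
    dt = t_max / n_samples
    for i in range(n_samples):
        t = (i + 0.5) * dt  # depth of this sample along the ray
        x, y, z = (origin[k] + t * direction[k] for k in range(3))
        r, g, b, a = shade(density(x, y, z))
        weight = (1.0 - alpha) * a  # part not yet hidden by nearer samples
        color[0] += weight * r
        color[1] += weight * g
        color[2] += weight * b
        alpha += weight
        if alpha > 0.99:  # early termination: the pixel is effectively opaque
            break
    return (*color, alpha)

pixel = cast_ray(origin=(0.0, 0.0, -2.0), direction=(0.0, 0.0, 1.0))
miss = cast_ray(origin=(10.0, 10.0, -2.0), direction=(0.0, 0.0, 1.0))
print(pixel, miss)
```

Because each sample is weighted by what nearer samples left visible, the result depends on depth order, which is why the order of the numbered pieces matters when the data is spread over several GPUs.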

slide-22
SLIDE 22

P Q L

P Q

1 2 3 4 5 7 8 9 6

slide-23
SLIDE 23

P Q L

P Q

1 2 3 4 5 7 8 9 6

slide-24
SLIDE 24

P Q L

P Q

1 2 9 1 2 3 4 5 7 8 9 6

slide-25
SLIDE 25

1 2 3 4 5 7 8 9 6

P Q L

P Q

1 2 9

slide-26
SLIDE 26

1 2 3 4 5 7 8 9 6

P Q L

P Q

1 2 9 3 4 3 5 4 5 6 7

slide-27
SLIDE 27

2 3 4 5 6 7 8 9

P Q L

P Q

1

slide-28
SLIDE 28

1 2 3 4 5 6 7 8 9

P Q L

P

1 4 8 7

Q

2 3 5 6 9

slide-29
SLIDE 29

Recursive multi-section domain decomposition
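
A recursive multi-section decomposition can be sketched as follows: split the particles into equal-count slabs along one axis, then split every slab along the next axis. The sketch is 2D and the name `multisection` is illustrative; how Bonsai picks the number of sections per axis is not stated on the slide.

```python
import random

def multisection(points, sections, axis=0):
    """Recursively split points into equal-count slabs, one axis per level.

    sections = (3, 3) cuts into 3 slabs along x, then each slab into 3 along y.
    """
    if not sections:
        return [points]
    n_parts = sections[0]
    ordered = sorted(points, key=lambda p: p[axis])
    n = len(ordered)
    domains = []
    for i in range(n_parts):
        slab = ordered[i * n // n_parts:(i + 1) * n // n_parts]
        domains.extend(multisection(slab, sections[1:], axis + 1))
    return domains

random.seed(1)
points = [(random.random(), random.random()) for _ in range(900)]
domains = multisection(points, sections=(3, 3))
print([len(d) for d in domains])
```

Every domain produced this way is an axis-aligned box. One plausible reason to prefer it for rendering is that boxes can be put in a consistent front-to-back order for any viewpoint.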

slide-30
SLIDE 30

Every new in-situ data update

SFC → Recursive multi-section

Both a CPU- and interconnect-heavy operation

slide-31
SLIDE 31


slide-32
SLIDE 32
slide-33
SLIDE 33

GPU-0

slide-34
SLIDE 34

GPU-1

slide-35
SLIDE 35

GPU-2

slide-36
SLIDE 36

GPU-3

slide-37
SLIDE 37

GPU-4

slide-38
SLIDE 38

GPU-5

slide-39
SLIDE 39

GPU-6

slide-40
SLIDE 40

GPU-7

slide-41
SLIDE 41

GPU-8

slide-42
SLIDE 42

GPU-0 GPU-1 GPU-2 GPU-3 GPU-4 GPU-5 GPU-6 GPU-7 GPU-8

slide-43
SLIDE 43

Final image

slide-44
SLIDE 44
  • 4. Parallel compositing

P

1 4 8 7

Q

2 3 5 6 9

slide-45
SLIDE 45
slide-46
SLIDE 46

proc 0 proc 1 proc 2 proc 3 proc 4 proc 5 proc 7 proc 6

slide-47
SLIDE 47

proc 0 proc 1 proc 2 proc 3 proc 4 proc 5 proc 7 proc 6

slide-48
SLIDE 48

G1 G3 G6 G7

proc 0 proc 1 proc 2 proc 3 proc 4 proc 5 proc 7 proc 6

slide-49
SLIDE 49

G1 G3 G6 G7

proc 0 proc 1 proc 2 proc 3 proc 4 proc 5 proc 7 proc 6

slide-50
SLIDE 50

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 7 7 7 7 7 1 1 1 1,3 1,3 3 7 7 7 7 7 1 1 1,6 1,3,6 1,3,6 3 7 7 7 7 7 6 3,6 3,6 3 7 7 7 7 7 6 6 6 6 6 6

proc 0 proc 1 proc 2 proc 3 proc 4 proc 5 proc 7 proc 6

A bit of math & data exchange is done with a single operation: MPI_Alltoallv(..)
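
The "bit of math" before the exchange is presumably working out how many pixels each rank owes every other rank. A minimal sketch without MPI, under two assumptions that are not on the slide: every rank owns an equal-height band of scan-lines, and each rank sends the pixels of its locally rendered bounding box. The resulting lists have the shape of the send counts and displacements that `MPI_Alltoallv` takes (counted in pixels here); the helper names are hypothetical.

```python
def band(rank, n_ranks, height):
    """Half-open range of scan-lines owned by `rank` (assumed equal-height bands)."""
    return rank * height // n_ranks, (rank + 1) * height // n_ranks

def alltoallv_send_layout(rect, n_ranks, height):
    """Pixel counts and offsets one rank would send to every other rank.

    rect = (x0, y0, x1, y1) is the half-open bounding box of the local image.
    """
    x0, y0, x1, y1 = rect
    sendcounts, sdispls, offset = [], [], 0
    for dest in range(n_ranks):
        lo, hi = band(dest, n_ranks, height)
        rows = max(0, min(y1, hi) - max(y0, lo))  # rows of rect inside dest's band
        sendcounts.append(rows * (x1 - x0))
        sdispls.append(offset)
        offset += sendcounts[-1]
    return sendcounts, sdispls

counts, displs = alltoallv_send_layout(rect=(100, 150, 400, 450), n_ranks=8, height=800)
print(counts)
print(displs)
```

Ranks whose band does not intersect the bounding box get a count of zero, so a single collective call covers both the sparse and the dense cases.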

slide-51
SLIDE 51

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 7 7 7 7 7 1 1 1 1,3 1,3 3 7 7 7 7 7 1 1 1,6 1,3,6 1,3,6 3 7 7 7 7 7 6 3,6 3,6 3 7 7 7 7 7 6 6 6 6 6 6

P2: blends pixels from G1 & G3
P3: blends pixels from G1, G3 & G6
P4: blends pixels from G3 & G6

proc 0 proc 1 proc 2 proc 3 proc 4 proc 5 proc 7 proc 6
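
Blending the contributions that several GPUs make to one pixel can be sketched as a depth-sorted "over" operation. The sketch assumes premultiplied-alpha colors and a single depth value per contribution; the fragment values are invented for illustration, and the slide does not specify the blend function Bonsai uses.

```python
def blend_over(fragments):
    """Blend (depth, (r, g, b, a)) fragments with premultiplied alpha, nearest first."""
    out = [0.0, 0.0, 0.0, 0.0]
    for _, rgba in sorted(fragments, key=lambda f: f[0]):
        remaining = 1.0 - out[3]  # transparency left after nearer fragments
        for c in range(4):
            out[c] += remaining * rgba[c]
    return tuple(out)

# Hypothetical fragments for one pixel covered by G1, G3 and G6.
pixel = blend_over([
    (2.0, (0.0, 0.25, 0.0, 0.25)),  # from G3
    (1.0, (0.5, 0.0, 0.0, 0.5)),    # from G1 (nearest)
    (3.0, (0.0, 0.0, 1.0, 1.0)),    # from G6 (opaque, farthest)
])
print(pixel)
```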

slide-52
SLIDE 52

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 7 7 7 7 7 1 1 1 1+3 1+3 3 7 7 7 7 7 1 1 1+6 1+3+6 1+3+6 3 7 7 7 7 7 6 3+6 3+6 3 7 7 7 7 7 6 6 6 6 6 6

Glue scan-lines together with a single operation: MPI_Gather(..)

proc 0 proc 1 proc 2 proc 3 proc 4 proc 5 proc 7 proc 6
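
Gluing the scan-lines together amounts to concatenating the per-rank bands in rank order. A toy sketch of what the root rank ends up with after the gather, assuming every rank contributes the same number of rows (with unequal bands a variable-count gather would be needed); the function name is hypothetical and no MPI is involved.

```python
def gather_scanlines(bands_by_rank):
    """Concatenate per-rank scan-line bands in rank order, as the root would hold them."""
    image = []
    for rank in sorted(bands_by_rank):
        image.extend(bands_by_rank[rank])
    return image

# Toy 4-pixel-wide image split over 3 ranks, 2 scan-lines each.
bands = {
    2: [[6, 6, 6, 6], [6, 6, 6, 6]],
    0: [[1, 1, 1, 1], [1, 1, 3, 3]],
    1: [[1, 6, 3, 7], [6, 6, 3, 7]],
}
final_image = gather_scanlines(bands)
for row in final_image:
    print(row)
```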

slide-53
SLIDE 53

[Timeline figure, 10 ms per tick, stages run back to back: 1. Simulation step (80 ms), 2. Data partition (50 ms), 3. OpenGL rendering (60 ms), 4. Compositing (50 ms), then the next simulation step; a frame is displayed every 240 ms.]

slide-54
SLIDE 54

[Timeline figure, 10 ms per tick, one pass through the stages on separate rows: 1. Simulation step (80 ms), 2. Data partition (50 ms), 3. OpenGL rendering (60 ms), 4. Compositing (50 ms), Display.]

slide-55
SLIDE 55

[Timeline figure, 10 ms per tick: row 1 now repeats the Simulation step (80 ms) back to back; rows 2 to 4 still show a single Data partition (50 ms), OpenGL rendering (60 ms) and Compositing (50 ms), followed by Display.]

slide-56
SLIDE 56

[Timeline figure, 10 ms per tick: row 1 repeats the Simulation step (80 ms) and row 2 repeats Data partition (50 ms), overlapping in time; rows 3 and 4 still show a single OpenGL rendering (60 ms) and Compositing (50 ms), followed by Display.]

slide-57
SLIDE 57

[Timeline figure, 10 ms per tick, all stages overlapped: row 1 repeats the Simulation step (80 ms), row 2 Data partition (50 ms), row 3 OpenGL rendering (60 ms), row 4 Compositing (50 ms); a frame is displayed every 60 ms.]

slide-58
SLIDE 58

[Timeline figure, 10 ms per tick, stages run back to back: 1. Simulation step (80 ms), 2. Data partition (50 ms), 3. OpenGL rendering (60 ms), 4. Compositing (50 ms), then the next simulation step; a frame is displayed every 240 ms: 4 fps.]

[Timeline figure, 10 ms per tick, all stages overlapped: row 1 repeats the Simulation step (80 ms), row 2 Data partition (50 ms), row 3 OpenGL rendering (60 ms), row 4 Compositing (50 ms); a frame is displayed every 60 ms: 16 fps.]
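
The two frame rates on this slide follow from the display intervals in the two timelines. A small arithmetic sketch, assuming the frame rate is simply the inverse of the display interval; the 60 ms value is read off the overlapped timeline, and the function name is illustrative:

```python
def frames_per_second(display_interval_ms):
    """Frame rate implied by the time between two consecutive displayed frames."""
    return 1000.0 / display_interval_ms

# Stages run back to back: the interval is the sum of all stage times.
sequential_fps = frames_per_second(80 + 50 + 60 + 50)
# Stages overlapped: display interval as shown in the second timeline.
pipelined_fps = frames_per_second(60)
speedup = pipelined_fps / sequential_fps
print(f"{sequential_fps:.1f} fps -> {pipelined_fps:.1f} fps ({speedup:.0f}x)")
```

The slide rounds these down to 4 fps and 16 fps.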

slide-59
SLIDE 59

[Timeline figure, 10 ms per tick, stages run back to back: 1. Simulation step (80 ms), 2. Data partition (50 ms), 3. OpenGL rendering (60 ms), 4. Compositing (50 ms), then the next simulation step; a frame is displayed every 240 ms: 4 fps.]

[Timeline figure, 10 ms per tick, all stages overlapped: row 1 repeats the Simulation step (80 ms), row 2 Data partition (50 ms), row 3 OpenGL rendering (60 ms), row 4 Compositing (50 ms); Display (60 ms): 15 fps.]

  • 16 bit colors
  • delegated MPI_Alltoallv with MPI rank placement
  • dedicated remote displaying machine to gather final image
  • image compression

http://github.com/treecode/Bonsai

slide-60
SLIDE 60
  • In-situ visualization as I/O workflow (e.g. ADIOS)
  • Interoperability with job schedulers (e.g. Slurm)
  • Take advantage of existing software (e.g. ParaView)
  • More use cases (astro, chem, bio, automotive, aerospace)
slide-61
SLIDE 61
slide-62
SLIDE 62
slide-63
SLIDE 63

High Performance In-Situ Visualization on Thousands of GPUs

Jeroen Bédorf Simon Portegies Zwart

Leiden Observatory

Peter Messmer Evghenii Gaburov