Blasting Sand with CUDA: MPM Sand Simulation for VFX, Gergely Klár (PowerPoint presentation)

SLIDE 1

Blasting Sand with CUDA: MPM Sand Simulation for VFX

Gergely Klár, DreamWorks Animation

SLIDE 2
SLIDE 3

Figure: one time step, tₙ → tₙ₊₁.

SLIDE 4

Figure: one time step, tₙ → tₙ₊₁.

SLIDE 5

Figure: one time step, tₙ → tₙ₊₁.

SLIDE 6

Grid influence
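The slides do not name the interpolation kernel. As a minimal sketch, assuming the quadratic B-spline that is common in MPM, each particle influences the 3 nearest grid nodes per axis (a 3×3×3 block in 3D), with weights that sum to one. `Stencil1D`, `quadratic_stencil` and `weight3d` are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical per-axis stencil for a quadratic B-spline kernel (an assumption:
// the slides do not name the kernel). Grid nodes sit at integer coordinates and
// x is the particle position in grid units, i.e. p / dx.
struct Stencil1D {
    int base;                 // index of the first influenced node
    std::array<double, 3> w;  // weights of nodes base, base + 1, base + 2
};

Stencil1D quadratic_stencil(double x) {
    Stencil1D s{};
    s.base = static_cast<int>(std::floor(x - 0.5));
    const double fx = x - s.base;  // distance from node `base`, in [0.5, 1.5)
    s.w[0] = 0.5 * (1.5 - fx) * (1.5 - fx);
    s.w[1] = 0.75 - (fx - 1.0) * (fx - 1.0);
    s.w[2] = 0.5 * (fx - 0.5) * (fx - 0.5);
    return s;
}

// In 3D a particle influences a 3x3x3 block of nodes; the weight of node
// (base_x + a, base_y + b, base_z + c) is the product of the per-axis weights.
double weight3d(const Stencil1D& sx, const Stencil1D& sy, const Stencil1D& sz,
                int a, int b, int c) {
    return sx.w[a] * sy.w[b] * sz.w[c];
}
```

For a particle in cell c = ⌊x⌋, `base` is either c−1 or c, so under this kernel the particles of one cell together touch nodes c−1 through c+2, i.e. 4 nodes per axis.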

SLIDE 7

Naïve Particles-to-Grid
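The slide gives no details, but a common naïve particles-to-grid transfer is a scatter: one thread per particle adds its weighted contribution to every node it influences. Because neighbouring particles hit the same nodes, a GPU version needs an atomic add for each of those writes. The 1D CPU sketch below only shows the data flow; it assumes a quadratic B-spline kernel and transfers mass only, and `bspline2` and `p2g_scatter` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Quadratic B-spline weight of a node at signed distance d, in cells (assumed kernel).
double bspline2(double d) {
    d = std::fabs(d);
    if (d < 0.5) return 0.75 - d * d;
    if (d < 1.5) return 0.5 * (1.5 - d) * (1.5 - d);
    return 0.0;
}

// Naive scatter P2G in 1D, transferring mass only. One loop iteration per
// particle writes to the 3 nodes it influences. On a GPU, with one thread per
// particle, every `+=` below would have to be an atomicAdd, because particles
// that share nodes write to them concurrently.
std::vector<double> p2g_scatter(const std::vector<double>& x,  // positions in grid units
                                const std::vector<double>& mass,
                                int num_nodes) {
    std::vector<double> grid(num_nodes, 0.0);
    for (std::size_t p = 0; p < x.size(); ++p) {
        const int base = static_cast<int>(std::floor(x[p] - 0.5));
        for (int i = base; i < base + 3; ++i) {
            if (i >= 0 && i < num_nodes) grid[i] += mass[p] * bspline2(x[p] - i);
        }
    }
    return grid;
}
```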

SLIDE 8

Gather Particles-to-Grid
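A gather inverts the loop: one thread per grid node sums the contributions of the particles around it. This removes the write conflicts, and with them the atomics, but each particle is read once by every node it influences (3 times in 1D, 27 times in 3D with a quadratic kernel), and each node needs a way to find nearby particles. In this hypothetical 1D sketch every node simply scans all particles, which is correct but O(nodes × particles); it assumes the same quadratic B-spline and mass-only transfer, and gives the same grid values as a scatter up to floating-point rounding.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Quadratic B-spline weight of a node at signed distance d, in cells (assumed kernel).
double bspline2(double d) {
    d = std::fabs(d);
    if (d < 0.5) return 0.75 - d * d;
    if (d < 1.5) return 0.5 * (1.5 - d) * (1.5 - d);
    return 0.0;
}

// Gather P2G in 1D: one loop iteration per node, each writing only its own
// entry, so a GPU version needs no atomics. Particles outside the kernel
// support contribute a zero weight; a real implementation would look up only
// nearby particles instead of scanning all of them.
std::vector<double> p2g_gather(const std::vector<double>& x,  // positions in grid units
                               const std::vector<double>& mass,
                               int num_nodes) {
    std::vector<double> grid(num_nodes, 0.0);
    for (int i = 0; i < num_nodes; ++i) {
        double sum = 0.0;
        for (std::size_t p = 0; p < x.size(); ++p) sum += mass[p] * bspline2(x[p] - i);
        grid[i] = sum;
    }
    return grid;
}
```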

SLIDE 9

Our Solution

  • Each particle is read only once,
  • We efficiently use shared memory for the grids,
  • We significantly reduce the number of atomic operations,
  • And our secret sauce: a special data structure for particle queries.

SLIDE 10
SLIDE 11
SLIDE 12
SLIDE 13

Figure: the domain is split into tiles, and each tile is processed by 1 CUDA block.
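To see why one tile per block can fit in shared memory, here is a back-of-the-envelope sizing under assumptions that are not from the talk: TILE_SIZE = 8 (from the 3-bit in-tile coordinates on the key-layout slide), a kernel under which a particle in cell c touches nodes c−1..c+2 per axis (true for a quadratic B-spline), and four floats per node (mass plus 3D momentum). The constant and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Back-of-the-envelope sizing of one tile's local grid in shared memory.
// All three constants are assumptions, not values given in the talk.
constexpr int kTileSize = 8;    // 3 bits per in-tile coordinate -> 8 cells per axis
constexpr int kExtraNodes = 3;  // particles in cell c touch nodes c-1..c+2 (quadratic B-spline)
constexpr std::size_t kBytesPerNode = 4 * sizeof(float);  // mass + 3D momentum

// Nodes per axis touched by the particles of one tile: cells 0..T-1 reach nodes -1..T+1.
constexpr int nodes_per_axis(int tile_size) { return tile_size + kExtraNodes; }

constexpr std::size_t tile_grid_bytes(int tile_size) {
    return static_cast<std::size_t>(nodes_per_axis(tile_size)) * nodes_per_axis(tile_size) *
           nodes_per_axis(tile_size) * kBytesPerNode;
}

// 48 KB is the per-block shared memory limit on Kepler-class GPUs.
static_assert(tile_grid_bytes(kTileSize) <= 48u * 1024u, "tile grid must fit in shared memory");
```

Under these assumptions a tile's local grid has 11³ = 1,331 nodes, or 21,296 bytes, which fits within the 48 KB a single block can use on a Kepler-class GPU such as the Quadro K5200 used for the results.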

SLIDE 14
SLIDE 15

Figure: CellBins → ParticleIDs → actual particle data.

SLIDE 16

Figure: TileBins → CellBins → ParticleIDs → actual particle data.

SLIDE 17
  • In each block/tile:
    – Get blockIdx.
    – The cells in the tile are TileBins[blockIdx-1]..TileBins[blockIdx]-1.
    – Get a cellId for each warp from this list:
      • Each thread works on two affected grid nodes.
      • The particles of the cell are CellBins[cellId-1]..CellBins[cellId]-1.
      • Compute the contribution from each particle.
      • Store it in shared memory.
    – Write back to global memory.
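The index arithmetic of this kernel can be checked on the CPU. The sketch below is a hypothetical serial emulation, not the talk's kernel: the outer range plays the role of one block's tile, the middle loop the per-warp cells, and the inner loop the particles of a cell. It assumes TileBins and CellBins hold inclusive prefix sums (as produced by the sort-RLE-scan binning), so the slide's `X[i-1]` is read as 0 when i = 0. As a side note, if a quadratic B-spline kernel is assumed, the particles of one cell touch a 4×4×4 = 64-node block, which would match the two grid nodes per thread for a 32-thread warp.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open index range [begin, end).
struct Range {
    uint32_t begin, end;
};

// Bin i of an inclusive prefix-sum array spans [sums[i-1], sums[i]), with sums[-1] = 0.
Range bin_range(const std::vector<uint32_t>& inclusive_sums, uint32_t i) {
    const uint32_t begin = (i == 0) ? 0u : inclusive_sums[i - 1];
    return {begin, inclusive_sums[i]};
}

// Hypothetical serial emulation: returns the particle IDs that the block
// assigned to tile `tile_idx` would visit, in visit order.
std::vector<uint32_t> particles_of_tile(const std::vector<uint32_t>& tile_bins,
                                        const std::vector<uint32_t>& cell_bins,
                                        const std::vector<uint32_t>& particle_ids,
                                        uint32_t tile_idx) {
    std::vector<uint32_t> visited;
    const Range cells = bin_range(tile_bins, tile_idx);  // non-empty cells of this tile
    for (uint32_t cell = cells.begin; cell < cells.end; ++cell) {  // one warp per cell on the GPU
        const Range parts = bin_range(cell_bins, cell);  // particles of this cell
        for (uint32_t k = parts.begin; k < parts.end; ++k) {
            visited.push_back(particle_ids[k]);  // index into the actual particle data
        }
    }
    return visited;
}
```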

SLIDE 18

Tile & Cell Keys

  • Particle coordinates: (px, py, pz)
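  • Cell coordinates, where Δx is the grid cell size:

(ci, cj, ck) = ⌊(px, py, pz)/Δx⌋

  • Tile and in-tile coordinates:

(ci, cj, ck) = (ti, tj, tk)∙TILE_SIZE + (ri, rj, rk)

  • Key layout, a 32-bit unsigned integer: the tile coordinates ti, tj, tk take 7 bits each in the high bits, and the in-tile coordinates ri, rj, rk take 3 bits each in the low bits (30 bits in total). With 3 bits per in-tile coordinate, TILE_SIZE = 8.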
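A minimal sketch of the key construction, assuming non-negative particle coordinates and at most 1024 cells per axis (7 tile bits + 3 in-tile bits). The slide fixes the field widths and puts the tile coordinates above the in-tile coordinates; the order of the three fields within each group used here (i, j, k) is an assumption, and `make_key` and `tile_of` are hypothetical names. Because TILE_SIZE = 2³, the split c = t·TILE_SIZE + r is just a shift and a mask.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Field widths from the slide; the order within each group is an assumption.
constexpr uint32_t kTileBits = 7;
constexpr uint32_t kCellBits = 3;
constexpr uint32_t kTileSize = 1u << kCellBits;  // 8 cells per tile axis
constexpr uint32_t kCellMask = kTileSize - 1;    // low 3 bits of a cell coordinate
constexpr uint32_t kTileMask = (1u << kTileBits) - 1;

uint32_t make_key(double px, double py, double pz, double dx) {
    // Cell coordinates (ci, cj, ck) = floor(p / dx); assumes p >= 0 and c < 1024.
    const uint32_t c[3] = {static_cast<uint32_t>(std::floor(px / dx)),
                           static_cast<uint32_t>(std::floor(py / dx)),
                           static_cast<uint32_t>(std::floor(pz / dx))};
    uint32_t key = 0;
    // Tile coordinates t = c / TILE_SIZE fill bits 29..9 ...
    for (int a = 0; a < 3; ++a) key = (key << kTileBits) | ((c[a] >> kCellBits) & kTileMask);
    // ... and in-tile coordinates r = c % TILE_SIZE fill bits 8..0.
    for (int a = 0; a < 3; ++a) key = (key << kCellBits) | (c[a] & kCellMask);
    return key;
}

// Clearing the 9 in-tile bits leaves a value shared by every cell of a tile.
constexpr uint32_t kTileKeyMask = ~((1u << (3 * kCellBits)) - 1);
uint32_t tile_of(uint32_t key) { return key & kTileKeyMask; }
```

Since the tile bits are the most significant, keys of one tile sort next to each other, and a mask like `kTileKeyMask` is what a per-tile RLE needs.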

SLIDE 19

Figure: the binning pipeline. The Tile & Cell Keys are sorted together with the initial Particle IDs, giving the sorted Particle IDs. An RLE followed by an inclusive sum produces the Cell Bins; a masked RLE followed by an inclusive sum produces the Tile Bins.

  • When sorted as uint32s, keys of the same tile are consecutive.
  • Run-length encoding (RLE) of the sorted keys counts the number of particles per cell.
  • The running sum of these counts gives the offsets to the particles.
  • RLE of the resulting unique cell keys, with a mask for the tile bits, counts the number of non-empty cells per tile.
  • The running sum of these counts gives the offsets to the cells.
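The sort-RLE-scan binning can be sketched serially with standard algorithms. This is a hypothetical CPU stand-in, not the talk's code: `std::stable_sort` plays the role of a GPU radix sort of (key, particle ID) pairs, a hand-written loop plays the run-length encoder, and `std::partial_sum` is the inclusive scan. On a GPU, libraries such as CUB offer corresponding primitives (`cub::DeviceRadixSort::SortPairs`, `cub::DeviceRunLengthEncode::Encode`, `cub::DeviceScan::InclusiveSum`); the slides do not say which were used. `Bins`, `run_length_encode` and `build_bins` are hypothetical names, and `tile_mask` keeps only the tile bits of a key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical output of the binning step.
struct Bins {
    std::vector<uint32_t> particle_ids;  // particle indices, sorted by key
    std::vector<uint32_t> cell_bins;     // inclusive sum of particles per non-empty cell
    std::vector<uint32_t> tile_bins;     // inclusive sum of non-empty cells per non-empty tile
};

// Run-length encode the masked values of an already sorted key array.
void run_length_encode(const std::vector<uint32_t>& keys, uint32_t mask,
                       std::vector<uint32_t>& uniq, std::vector<uint32_t>& counts) {
    uniq.clear();
    counts.clear();
    for (uint32_t k : keys) {
        const uint32_t v = k & mask;
        if (!uniq.empty() && uniq.back() == v) {
            ++counts.back();
        } else {
            uniq.push_back(v);
            counts.push_back(1);
        }
    }
}

Bins build_bins(const std::vector<uint32_t>& keys, uint32_t tile_mask) {
    Bins b;
    // 1. Sort: order particle IDs 0..n-1 by key (stable, like a radix sort).
    b.particle_ids.resize(keys.size());
    std::iota(b.particle_ids.begin(), b.particle_ids.end(), 0u);
    std::stable_sort(b.particle_ids.begin(), b.particle_ids.end(),
                     [&keys](uint32_t a, uint32_t c) { return keys[a] < keys[c]; });
    std::vector<uint32_t> sorted_keys(keys.size());
    for (std::size_t i = 0; i < keys.size(); ++i) sorted_keys[i] = keys[b.particle_ids[i]];

    // 2. RLE of the full keys: one run per non-empty cell, counting its particles.
    std::vector<uint32_t> cells, particles_per_cell;
    run_length_encode(sorted_keys, ~0u, cells, particles_per_cell);
    // 3. Inclusive sum: cell_bins[c] is one past the last particle of cell c.
    b.cell_bins.resize(particles_per_cell.size());
    std::partial_sum(particles_per_cell.begin(), particles_per_cell.end(), b.cell_bins.begin());

    // 4. Masked RLE of the unique cell keys: one run per non-empty tile,
    //    counting its non-empty cells.
    std::vector<uint32_t> tiles, cells_per_tile;
    run_length_encode(cells, tile_mask, tiles, cells_per_tile);
    // 5. Inclusive sum: tile_bins[t] is one past the last cell of tile t.
    b.tile_bins.resize(cells_per_tile.size());
    std::partial_sum(cells_per_tile.begin(), cells_per_tile.end(), b.tile_bins.begin());
    return b;
}
```

For example, keys {513, 0, 5, 0, 513, 512} with a mask of 0xFFFFFE00 (tile bits above 9 in-tile bits) give ParticleIDs {1, 3, 2, 5, 0, 4}, CellBins {2, 3, 4, 6} and TileBins {2, 4}.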

SLIDE 20

Results

SLIDE 21

Overall

Chart: milliseconds per time step vs. number of particles (262K, 884K, 2,097K, 7,000K), GPU vs. CPU.

Milliseconds per time step. Smaller is better. GPU: NVIDIA Quadro K5200. CPU: Intel Xeon E5-2697 v3 @ 2.60 GHz, 28 cores.

SLIDE 22

Particles to Grids

Chart: milliseconds per time step vs. number of particles (262K, 884K, 2,097K, 7,000K).

Grids to Particles

Chart: milliseconds per time step vs. number of particles (262K, 884K, 2,097K, 7,000K).

Milliseconds per time step. Smaller is better.

SLIDE 23

Summary

  • Particle binning with sort-RLE-scan
  • Breaking the domain into tiles that fit in shared memory
  • Processing the particles of a cell with a single warp
SLIDE 24

Special thanks to:

  • Ken Museth
  • Stephen Jones
  • Jeff Budsberg
  • Lawrence Lee
  • Rob Tesdahl
  • David Tonnesen
  • Ibrahim Sani
SLIDE 25
SLIDE 26

Thank you!