Blasting Sand with CUDA: MPM Sand Simulation for VFX, Gergely Klár

Jan 19, 2024

SLIDE 1

Blasting Sand with CUDA: MPM Sand Simulation for VFX

Gergely Klár DreamWorks Animation

SLIDE 2

SLIDE 3

[Figure: time steps t_n and t_{n+1}]

SLIDE 4

[Figure: time steps t_n and t_{n+1}]

SLIDE 5

[Figure: time steps t_n and t_{n+1}]

SLIDE 6

Grid influence

SLIDE 7

Naïve Particles-to-Grid

SLIDE 8

Gather Particles-to-Grid

SLIDE 9

Our Solution

Each particle is read only once,
We efficiently use shared memory for the grids,
We significantly reduce the number of atomic operations,
And our secret sauce: a special data structure for particle queries.

SLIDE 10

SLIDE 11

SLIDE 12

SLIDE 13

[Figure: nine regions, each labeled "1 CUDA Block"]

SLIDE 14

SLIDE 15

[Figure: CellBins index into ParticleIDs, which index into the actual particle data]

SLIDE 16

[Figure: TileBins index into CellBins, which index into ParticleIDs, which index into the actual particle data]

SLIDE 17

In each block/tile:

– Get blockIdx
– Cells in the tile are TileBins[blockIdx-1]..TileBins[blockIdx]-1
– Get a cellId for each warp from this list
  Each thread works on two affected grid nodes
  Particles of a cell are CellBins[cellId-1]..CellBins[cellId]-1
  Compute the contribution from the particle
  Store in shared memory
– Write back to global memory
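The traversal order above can be sketched serially on the host. This is a minimal C++ illustration, not the presented kernel: the `Bins` struct, `binBegin`, and `particlesToCells` are hypothetical names, the entry before index 0 of a bin array is assumed to be 0, and the "contribution" is a placeholder that simply sums particle mass per non-empty cell rather than applying MPM interpolation weights to grid nodes.

```cpp
#include <cstdint>
#include <vector>

using std::uint32_t;

// Hypothetical flat layout of the binning data structure.
struct Bins {
    std::vector<uint32_t> tileBins;     // inclusive running sum: non-empty cells per tile
    std::vector<uint32_t> cellBins;     // inclusive running sum: particles per non-empty cell
    std::vector<uint32_t> particleIds;  // particle indices sorted by key
};

// Start of range i in an inclusive running sum (assumed 0 for the first entry).
inline uint32_t binBegin(const std::vector<uint32_t>& bins, uint32_t i) {
    return i == 0 ? 0u : bins[i - 1];
}

// Returns one accumulated value per non-empty cell.
std::vector<float> particlesToCells(const Bins& b, const std::vector<float>& mass) {
    std::vector<float> global(b.cellBins.size(), 0.0f);
    // Outer loop: on the GPU, one CUDA block per tile.
    for (uint32_t tile = 0; tile < b.tileBins.size(); ++tile) {
        const uint32_t cellBegin = binBegin(b.tileBins, tile);
        const uint32_t cellEnd = b.tileBins[tile];
        // Tile-local buffer standing in for shared memory.
        std::vector<float> local(cellEnd - cellBegin, 0.0f);
        // Middle loop: on the GPU, one warp per cell.
        for (uint32_t cell = cellBegin; cell < cellEnd; ++cell) {
            for (uint32_t k = binBegin(b.cellBins, cell); k < b.cellBins[cell]; ++k) {
                const uint32_t p = b.particleIds[k];  // each particle is read once
                local[cell - cellBegin] += mass[p];   // placeholder contribution
            }
        }
        // Write the tile-local results back to the global array.
        for (uint32_t cell = cellBegin; cell < cellEnd; ++cell) {
            global[cell] = local[cell - cellBegin];
        }
    }
    return global;
}
```

In this sketch no atomics are needed because each cell is owned by one inner loop; how the actual kernel resolves writes to grid nodes shared between neighboring cells is not specified on the slide.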

SLIDE 18

Tile & Cell Keys

Particle coordinates: (px, py, pz)
Cell coordinates: (ci, cj, ck) = ⌊(px, py, pz)/Δx⌋
Tile and in-tile coordinates: (ci, cj, ck) = (ti, tj, tk)∙TILE_SIZE + (ri, rj, rk)

Key layout, packed into a 32-bit unsigned integer:
tj (7 bits) | ti (7 bits) | tk (7 bits) | rj (3 bits) | rk (3 bits) | ri (3 bits)
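A possible packing of these keys is sketched below in C++. Several details are assumptions rather than facts from the slide: TILE_SIZE is taken to be 8 (implied by the 3 in-tile bits), the fields are placed from high to low bits in the order listed (tj, ti, tk, rj, rk, ri), particle coordinates are assumed non-negative, and the names `cellCoord`, `makeKey`, and `tileOfKey` are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

using std::uint32_t;

constexpr uint32_t kCellBits = 3;                // bits per in-tile coordinate (from the slide)
constexpr uint32_t kTileBits = 7;                // bits per tile coordinate (from the slide)
constexpr uint32_t kTileSize = 1u << kCellBits;  // assumed TILE_SIZE: 8 cells per axis

// Cell coordinate along one axis: floor(p / dx). Assumes p >= 0.
inline uint32_t cellCoord(float p, float dx) {
    return static_cast<uint32_t>(std::floor(p / dx));
}

// Pack cell coordinates into a key laid out, from high to low bits, as
// [tj | ti | tk | rj | rk | ri]. Assumes ci, cj, ck are each below 1024.
inline uint32_t makeKey(uint32_t ci, uint32_t cj, uint32_t ck) {
    const uint32_t ti = ci / kTileSize, ri = ci % kTileSize;
    const uint32_t tj = cj / kTileSize, rj = cj % kTileSize;
    const uint32_t tk = ck / kTileSize, rk = ck % kTileSize;
    const uint32_t cellPart = (rj << (2 * kCellBits)) | (rk << kCellBits) | ri;
    const uint32_t tilePart = (tj << (2 * kTileBits)) | (ti << kTileBits) | tk;
    return (tilePart << (3 * kCellBits)) | cellPart;
}

// Tile bits of a key: the upper 21 of the 30 bits in use.
inline uint32_t tileOfKey(uint32_t key) {
    return key >> (3 * kCellBits);
}
```

Because the tile bits occupy the high end of the key, sorting keys as plain unsigned integers groups all cells of a tile together, which is the property the binning step relies on.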

SLIDE 19

[Diagram: Tile & Cell Keys and Initial Particle IDs → sort → Particle IDs; sorted keys → RLE → inc. sum → Cell Bins; sorted keys → masked RLE → inc. sum → Tile Bins]

When sorted as uint32s, keys of the same tile will be consecutive
RLE encoding counts the number of particles per cell
The running sum of the counts gives the offsets to particles
RLE encoding with a mask for the tile bits counts the number of non-empty cells per tile
The running sum of these counts gives the offsets to cells
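The sort, RLE, and running-sum steps can be illustrated with a serial C++ sketch. It is a host-side stand-in for what would be parallel primitives on the GPU. The names `ParticleBins`, `runLengths`, and `buildBins` are hypothetical, the tile mask assumes the low 9 bits of a key hold the in-tile coordinates, and applying the masked RLE to the list of distinct cell keys is one reading of the slide, not a confirmed detail.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

using std::uint32_t;

constexpr uint32_t kTileMask = ~uint32_t{0x1FF};  // assumed: clears the 9 in-tile bits

struct ParticleBins {
    std::vector<uint32_t> particleIds;  // particle indices sorted by key
    std::vector<uint32_t> cellKeys;     // key of each non-empty cell, ascending
    std::vector<uint32_t> cellBins;     // inclusive running sum of particles per cell
    std::vector<uint32_t> tileBins;     // inclusive running sum of non-empty cells per tile
};

// Run-length encode a sorted sequence, comparing elements under `mask`.
static std::vector<uint32_t> runLengths(const std::vector<uint32_t>& sorted, uint32_t mask) {
    std::vector<uint32_t> counts;
    for (std::size_t i = 0; i < sorted.size(); ++i) {
        if (i == 0 || (sorted[i] & mask) != (sorted[i - 1] & mask)) counts.push_back(0);
        ++counts.back();
    }
    return counts;
}

ParticleBins buildBins(const std::vector<uint32_t>& keys) {
    ParticleBins b;
    // Sort the initial particle IDs (0, 1, 2, ...) by their keys.
    b.particleIds.resize(keys.size());
    std::iota(b.particleIds.begin(), b.particleIds.end(), 0u);
    std::stable_sort(b.particleIds.begin(), b.particleIds.end(),
                     [&keys](uint32_t a, uint32_t c) { return keys[a] < keys[c]; });
    std::vector<uint32_t> sortedKeys(keys.size());
    for (std::size_t i = 0; i < keys.size(); ++i) sortedKeys[i] = keys[b.particleIds[i]];

    // RLE counts particles per cell; the running sum gives offsets to particles.
    b.cellBins = runLengths(sortedKeys, 0xFFFFFFFFu);
    std::partial_sum(b.cellBins.begin(), b.cellBins.end(), b.cellBins.begin());

    // Distinct keys, one per non-empty cell.
    b.cellKeys = sortedKeys;
    b.cellKeys.erase(std::unique(b.cellKeys.begin(), b.cellKeys.end()), b.cellKeys.end());

    // Masked RLE counts non-empty cells per tile; the running sum gives offsets to cells.
    b.tileBins = runLengths(b.cellKeys, kTileMask);
    std::partial_sum(b.tileBins.begin(), b.tileBins.end(), b.tileBins.begin());
    return b;
}
```

With inclusive sums, the particles of non-empty cell c occupy positions cellBins[c-1] to cellBins[c]-1 of particleIds, with the entry before index 0 taken as 0.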

SLIDE 20

Results

SLIDE 21

Overall

[Chart: milliseconds per time step (axis up to 1000) against # of particles (262K, 884K, 2,097K, 7,000K), GPU vs. CPU]

Milliseconds per time step. Smaller is better.
GPU: NVIDIA Quadro K5200
CPU: Intel Xeon CPU E5-2697 v3 @ 2.60GHz w/ 28 cores

SLIDE 22

Particles to Grids

[Chart: milliseconds per time step (axis up to 600) at 262K, 884K, 2,097K, and 7,000K particles]

Grids to Particles

[Chart: milliseconds per time step (axis up to 600) at 262K, 884K, 2,097K, and 7,000K particles]

Milliseconds per time step. Smaller is better.

SLIDE 23

Summary

Particle binning with sort-RLE-scan
Breaking the domain into tiles that fit in shared memory
Processing the particles of a cell with a single warp

SLIDE 24

Special thanks to:

Ken Museth
Stephen Jones
Jeff Budsberg
Lawrence Lee
Rob Tesdahl
David Tonnesen
Ibrahim Sani

SLIDE 25

SLIDE 26

Thank you!