
Blasting Sand with CUDA: MPM Sand Simulation for VFX, Gergely Klár



  1. Blasting Sand with CUDA: MPM Sand Simulation for VFX Gergely Klár DreamWorks Animation

  2. (Figure: time step from t^n to t^{n+1})

  3. (Figure: time step from t^n to t^{n+1})

  4. (Figure: time step from t^n to t^{n+1})

  5. Grid influence

  6. Naïve Particles-to-Grid

  7. Gather Particles-to-Grid

  8. Our Solution
     • Each particle is read only once,
     • We efficiently use shared memory for the grids,
     • We significantly reduce the number of atomic operations,
     • And our secret sauce: a special data structure for particle queries.

  9. (Figure: 3 × 3 array of tiles, each labeled "1 CUDA Block")

  10. CellBins → ParticleIDs → Actual particle data

  11. TileBins → CellBins → ParticleIDs → Actual particle data

  12. • In each block/tile:
        – Get blockIdx
        – Cells in the tile are TileBins[blockIdx-1] .. TileBins[blockIdx]-1
        – Get a cellId for each warp from this list
      • Each thread works on two affected grid nodes
      • Particles of a cell are CellBins[cellId-1] .. CellBins[cellId]-1
      • Compute the contribution from the particle
      • Store in shared
        – Write back to global
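The lookup pattern on this slide can be sketched on the CPU as follows. This is a minimal, sequential illustration, not the talk's kernel: the function names `bin_range` and `particles_per_tile` are hypothetical, the bins are assumed to be inclusive running sums (so bin 0 implicitly starts at offset 0), and the per-thread grid-node work and shared-memory accumulation are left out.

```python
def bin_range(bins, i):
    """Half-open range [start, end) of bin i in an inclusive running sum."""
    # Assumption: bin 0 starts at offset 0, since bins[i - 1] has no entry for i == 0.
    start = bins[i - 1] if i > 0 else 0
    return start, bins[i]


def particles_per_tile(tile_bins, cell_bins, particle_ids):
    """For each tile, list the particle IDs of each of its non-empty cells."""
    result = []
    for block_idx in range(len(tile_bins)):       # one CUDA block per tile on the GPU
        first_cell, end_cell = bin_range(tile_bins, block_idx)
        tile = []
        for cell_id in range(first_cell, end_cell):   # one warp per cell on the GPU
            lo, hi = bin_range(cell_bins, cell_id)
            tile.append(particle_ids[lo:hi])          # particles of this cell
        result.append(tile)
    return result
```

Because both levels are plain offset arrays, a block needs only its own index to find its cells, and a warp needs only a cell index to find its particles.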

  13. Tile & Cell Keys
      ● Particle coordinates: (px, py, pz)
      ● Cell coordinates: (ci, cj, ck) = ⌊(px, py, pz) / Δx⌋, where Δx is the grid spacing
      ● Tile and in-tile coordinates: (ci, cj, ck) = (ti, tj, tk) ∙ TILE_SIZE + (ri, rj, rk)
      ● Key layout in a 32-bit unsigned integer: ti (7 bits), tj (7 bits), tk (7 bits), ri (3 bits), rj (3 bits), rk (3 bits)
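As a rough sketch of the key construction, the snippet below packs cell coordinates into a single integer. Several details are assumptions rather than facts from the slides: the fields are taken to be ordered ti, tj, tk, ri, rj, rk from most to least significant bit, TILE_SIZE is taken to be 8 (implied by the 3-bit in-tile fields), coordinates are assumed non-negative, and the helper names `cell_coords`, `pack_key`, and `tile_of` are hypothetical.

```python
import math

TILE_BITS = 7                # bits per tile coordinate (ti, tj, tk)
CELL_BITS = 3                # bits per in-tile coordinate (ri, rj, rk)
TILE_SIZE = 1 << CELL_BITS   # assumed: 8 cells per tile edge


def cell_coords(p, dx):
    """Integer cell coordinates (ci, cj, ck) of a position p = (px, py, pz)."""
    return tuple(math.floor(c / dx) for c in p)


def pack_key(cell):
    """Pack (ci, cj, ck) into a 30-bit key with tile bits above in-tile bits."""
    key = 0
    for c in cell:           # tile coordinates first (most significant)
        t = c // TILE_SIZE
        assert 0 <= t < (1 << TILE_BITS), "tile coordinate out of range"
        key = (key << TILE_BITS) | t
    for c in cell:           # then in-tile coordinates
        key = (key << CELL_BITS) | (c % TILE_SIZE)
    return key


def tile_of(key):
    """Tile part of a key: drop the 9 in-tile bits."""
    return key >> (3 * CELL_BITS)
```

With the tile bits in the most significant positions, all cells of one tile share a key prefix, which is what makes sorted keys of the same tile consecutive.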

  14. Tile & Cell Keys
      ● When sorted as uint32s, keys of the same tile will be consecutive
      ● RLE encoding counts the number of particles per cell
      ● The running sum of the counts gives the offsets to particles
      ● RLE encoding with a mask for the tile bits counts the number of non-empty cells per tile
      ● The running sum of these counts gives the offsets to cells
      (Diagram: Initial Particle IDs → sort → Particle IDs → RLE → inc. sum → Cell Bins → masked RLE → inc. sum → Tile Bins)
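The sort, RLE, and running-sum pipeline can be illustrated with a small sequential sketch. It assumes keys whose low 9 bits are the in-tile bits (three 3-bit fields), and the function name `build_bins` and its return layout are hypothetical. On the GPU each step would instead be a parallel sort, run-length-encode, or scan primitive.

```python
from itertools import accumulate, groupby

TILE_SHIFT = 9  # assumed: three 3-bit in-tile fields sit below the tile bits


def build_bins(keys):
    """Return (particle_ids, cell_keys, cell_bins, tile_bins) for per-particle keys."""
    # Sort particle IDs by key so particles of a cell, and cells of a tile, are contiguous.
    particle_ids = sorted(range(len(keys)), key=lambda i: keys[i])
    sorted_keys = [keys[i] for i in particle_ids]

    # RLE over the full keys: one run per non-empty cell.
    runs = [(k, len(list(g))) for k, g in groupby(sorted_keys)]
    cell_keys = [k for k, _ in runs]
    # Inclusive running sum: end offsets of each cell in particle_ids.
    cell_bins = list(accumulate(n for _, n in runs))

    # Masked RLE: compare tile bits only, counting non-empty cells per tile.
    cells_per_tile = [len(list(g))
                      for _, g in groupby(cell_keys, key=lambda k: k >> TILE_SHIFT)]
    # Inclusive running sum: end offsets of each tile in cell_bins.
    tile_bins = list(accumulate(cells_per_tile))
    return particle_ids, cell_keys, cell_bins, tile_bins
```

For example, `build_bins([513, 0, 513, 7, 0, 1024])` groups six particles into four non-empty cells spread over three tiles.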

  15. Results

  16. Overall (chart: GPU vs. CPU; x-axis: # of particles at 262K, 884K, 2,097K, 7,000K; y-axis: milliseconds per time step, 0 to 1000; smaller is better). GPU: NVIDIA Quadro K5200. CPU: Intel Xeon CPU E5-2697 v3 @ 2.60GHz w/ 28 cores.

  17. Particles to Grids; Grids to Particles (two charts; x-axis: # of particles at 262K, 884K, 2,097K, 7,000K; y-axis: milliseconds per time step, 0 to 600; smaller is better)

  18. Summary
      • Particle binning with sort-RLE-scan
      • Breaking the domain into tiles that fit in shared memory
      • Processing the particles of a cell with a single warp

  19. Special thanks to: • Ken Museth • Rob Tesdahl • Stephen Jones • David Tonnesen • Jeff Budsberg • Ibrahim Sani • Lawrence Lee

  20. Thank you!
