GPU Optimizations of Material Point Method and Collision Detection (PowerPoint PPT Presentation)



SLIDE 1

GPU Optimizations of Material Point Method and Collision Detection

Xinlei Wang (王鑫磊), Zhejiang University (浙江大学)

SLIDE 2

Material Point Method

  • Fluid
    • Smoothed-Particle Hydrodynamics
    • Grid-based Methods
  • Solid
    • Finite Element Method
    • Finite Difference Method
  • Material Point Method
    • large deformation, complex topology changes
    • multi-material & multiphase coupling
    • (self) collision handling
SLIDE 3

MPM Pipeline Overview

Particle state (m_p^n, v_p^n, x_p^n) → [particle to grid] → grid mass and momentum (m_i^n, p_i^n) → [time integration] → grid velocity v_i^{n+1} → [grid to particle] → particle velocity and deformation gradient (v_p^{n+1}, F_p^{n+1}) → [advection] → position x_p^{n+1}

Maintain Structures

  • Particle: Sort & Order
  • Sparse Grid: Generate Sparse Blocks
  • Particle – Grid Mapping

Rasterize

  • Material Stress Computation
  • Particle-to-Grid Transfer (mass, momentum, etc.)

Time Integration

  • Explicit: v_i^{n+1} = (p_i^n + Δt · f_ext) / m_i^n
  • Implicit: solve for v_i^{n+1}

Resample

  • Grid-to-Particle Transfer (velocity)

Advection

  • Update Particle Attributes (position, deformation gradient, etc.)

Lagrangian material particles ↔ Eulerian Cartesian grids

(Figure note: explicit vs. implicit time integration; the transfer stages can take up to 90% of the total step time.)

SLIDE 4

Performance is the Solution

  • “dx gap”
    • a gap of roughly one cell width appears between adjacent models when they collide
    • increasing grid resolution to shrink it requires correspondingly more particles for the same quality
  • CFL Condition
    • required for simulation stability and robust collision handling
    • more time steps per frame means more work to compute a frame
  • Performance is the key!
SLIDE 5

Gather (node based) Scatter (particle based)

(Diagram: grid nodes n … n+3 and particles 1–7; gather loops over nodes pulling from nearby particles, scatter loops over particles pushing to nearby nodes.)

SLIDE 6

Hardware Friendly Solutions

  • MLS MPM
    • [2018 SIGGRAPH, Hu, et al.] A Moving Least Squares Material Point Method with Displacement Discontinuity and Two-Way Rigid Body Coupling
  • Async MPM
    • [2018 SCA, Fang, et al.] A Temporally Adaptive Material Point Method with Regional Time Stepping
  • GVDB
    • [2018 EG, Wu, et al.] Fast Fluid Simulations with Sparse Volumes on the GPU
  • Warp for Cell
    • [2017 GTC, Museth, et al.] Blasting Sand with NVIDIA CUDA: MPM Sand Simulation for VFX
    • http://on-demand.gputechconf.com/gtc/2017/video/s7298-ken-museth-blasting-sand-with-nvidia-cuda-mpm-sand-simulation-for-vfx.mp4

  • Bottleneck: Particle-to-Grid Transfer
SLIDE 7

The Alternative of Transfer

(Diagram: particles grouped into regions 1–3; a shared-memory reduction with stride 1 at iteration 0 and stride 2 at iteration 1, accumulating region sums onto nodes n … n+3; implemented with the warp intrinsics ballot, clz, and shfl.)

SLIDE 8

Comparison

Optimized Scatter

  • No auxiliary structures or memory
  • Uniform workload for each thread
  • Very few ‘atomicAdd’ write conflicts

Gather

  • Additional particle list for each grid node
  • Divergent workload
  • No write-conflicts at all
SLIDE 9
Performance Benchmarks

  • vs. FLIP [Gao et al. 2017] (CPU-based, gather-style): ~16x speedup
  • vs. MLS [Hu et al. 2018] (CPU-based, scatter-style): ~8x speedup
  • vs. naïve scatter (GPU-based, scatter-style): ~10–24x speedup
  • vs. GVDB [Wu et al. 2018] (GPU-based, gather-style): ~7–15x speedup

CPU: 18-core Intel Xeon Gold 6140 (¥16000); GPU: NVIDIA Titan Xp (¥8000)

SLIDE 10

Fundamental Implementation Choices

  • Data Structure for Particles
    • Arrays in the SoA (Structure of Arrays) layout
  • Data Structure for Space
    • Conceptually, a sparse uniform grid
    • Supports efficient interpolation operations
    • GSPGrid vs. GVDB
  • Sort
    • Radix sort vs. Histogram sort
SLIDE 11

Performance Factors

  • Particle distribution doesn’t matter much
  • The number of particles matters

(Chart: per-stage times in ms (Mapping, Stress, P2G, G2P, Re-Sorting) for Gaussian vs. uniform particle distributions at μ = 10 and μ = 18 particles per cell.)

  • When the number of particles is fixed, higher particles-per-cell (ppc) means fewer active grid nodes and better performance
SLIDE 12

Delayed Ordering Speedup

(Chart: per-stage times with and without reordering (Mapping, Stress, P2G, Solver, G2P, Sorting, Others), time axis 2–10 ms.)

SLIDE 13

Delayed Ordering

  • Particle Attributes Classification
    • By Perception
      • Intrinsics: Mass, Physical Property (Constitutive Model, etc.)
      • Extrinsics: Position, Velocity, Deformation Gradient, Affine Velocity Field (or Velocity Gradient)
    • By Access (Write/Read) Frequency
      • Mass: static after initialization, read once per timestep
      • Position: updated and kept in order after each timestep
      • Everything else (Velocity, Deformation Gradient, Affine Velocity Field, etc.)
SLIDE 14

Ordering Strategy

(Diagram: particle orderings at steps n-1, n, and n+1. Masses m_0 … m_7 remain in their original slots across steps, while velocities v_p and positions x_p are permuted along with the sorted particle indices, illustrating which particle attributes actually need reordering each step.)

SLIDE 15

Ordering Strategy

Access times per particle per timestep:

Reorder Everything

| Particle Attribute (Dimension) | Read (arbitrary) | Read (contiguous) | Write (arbitrary) | Write (contiguous) |
| mass (1)                       | 1                | 1                 |                   | 1                  |
| position (d)                   | 1                | 3                 |                   | 1+1                |
| velocity (d)                   | 1                | 1                 |                   | 1+1                |
| deformation gradient (d·d)     | 1                | 1                 |                   | 1+1                |
| …                              | …                | …                 |                   | …                  |

Delayed Ordering

| Particle Attribute (Dimension) | Read (arbitrary) | Read (contiguous) | Write (arbitrary) | Write (contiguous) |
| mass (1)                       | 1                |                   |                   |                    |
| position (d)                   | 1                | 3                 |                   | 1+1                |
| velocity (d)                   |                  | 1                 |                   | 1                  |
| deformation gradient (d·d)     |                  | 1                 |                   | 1                  |
| …                              |                  | …                 |                   | …                  |

SLIDE 16

Delayed Ordering Speedup

(Chart: per-stage times with and without reordering (Mapping, Stress, P2G, Solver, G2P, Sorting, Others), time axis 2–10 ms.)

SLIDE 17

Summary:

  • GPU MPM pipeline
    • efficient, extensible, cross-platform
    • supports multiple materials
    • https://github.com/kuiwuchn/GPUMPM
  • What’s next?
    • Multi-GPU MPM
    • Distributed GMPM
SLIDE 18

SLIDE 19

Collision Detection

  • Broad-phase Collision Detection
    • Look for AABB (axis-aligned bounding box) intersections
    • Typical memory-bound CUDA kernels!
SLIDE 20

BVH (Bounding Volume Hierarchy) Construction

  • BVH Construction
    • [2012 Karras] builds all nodes in parallel
    • [2014 Apetrei] builds & refits in one iteration
  • BVH Stackless Traversal
    • [2007 Damkjaer] depth-first order traversal using escape indices

(Figure: a linear BVH built on top of primitives sorted by their Morton codes.)

SLIDE 21

Stackless BVH Traversal

  • BVH Construction
    • [2012 Karras] builds all nodes in parallel
    • [2014 Apetrei] builds & refits in one iteration
  • BVH Stackless Traversal
    • [2007 Damkjaer] depth-first order traversal using escape indices

(Figure: the depth-first traversal track of primitive 1, assuming it collides with all the other primitives.)

SLIDE 22

BVH-based Collision Detection

  • Full traversal of the internal nodes
  • Original BVH (internal nodes numbered 4 2 1 0 3 6 5 in memory)
  • Ordered BVH (internal nodes numbered 0 1 2 3 4 5 6, matching traversal order)
  • How to compute the BVH order
    • Calculate the LCL-value of each leaf node
    • Compute prefix sums of the LCL-values
    • Assign the indices from the LCA, from top to bottom

(Figure: leaf LCL-values, their prefix sums, and the node indices before and after the sort.)

SLIDE 23

Effectiveness of ordering

  • Without ordering
    • L2 Cache Hit Rate (L1 Reads): 88%
    • Global Load L2 Transactions/Access: 31.7
    • Maximum Divergence: 99.9%
  • With ordering
    • L2 Cache Hit Rate (L1 Reads): 92%
    • Global Load L2 Transactions/Access: 23.4
    • Maximum Divergence: 65.7%
  • The overhead of the histogram sort is low (~1 ms)

2–3x speedup!

SLIDE 24

Thanks!

https://github.com/littlemine Xinlei Wang, 王鑫磊

SLIDE 25

GPU Execution Model

https://www.3dgep.com/cuda-thread-execution-model/

SLIDE 26

Other Useful Engineering Tips

  • For Performance:
    • SoA memory layout
    • Per-material computation; separate material properties from particle attributes
  • For Code Reusability:
    • Entity-Component System
      • Particle extrinsics formulation relies on certain components (MLS/non-MLS, PIC/FLIP/APIC)
    • Functional Programming
      • Implicit time integration involves lots of similar grid operations
      • Transfer schemes can be composed from various submodules (kernel, transfer method)
      • Easier to make task-parallel