SPARSE FLUID SIMULATION IN DIRECTX, ALEX DUNN, NVIDIA DEV. TECH. (PowerPoint presentation)

SLIDE 1

ALEX DUNN – NVIDIA - DEV. TECH.

SPARSE FLUID SIMULATION IN DIRECTX

SLIDE 2

AGENDA

  • Fluid in games.
  • Eulerian (grid-based) fluid.
  • Sparse Eulerian fluid.
  • Feature Level 11.3 enhancements!

SLIDE 3

WHY DO WE NEED FLUID IN GAMES?

Replace particle kinematics!

  • more realistic == better immersion

Game mechanics?

  • Occlusion: smoke grenades, physical interaction.
  • Dispersion: air ventilation systems, poison, smoke.

Endless opportunities!

SLIDE 4

EULERIAN SIMULATION #1

My (simple) DX11.0 Eulerian fluid simulation:

Inject → Advect → Pressure → Vorticity → Evolve

Resources: 2x Velocity, 2x Pressure, 1x Vorticity

SLIDE 5

EULERIAN SIMULATION #2

Inject → Advect → Pressure → Vorticity → Evolve

  • Inject: add fluid to the simulation.
  • Advect: move data at XYZ to (XYZ + Velocity).
  • Pressure: calculate localized pressure.
  • Vorticity: calculate localized rotational flow.
  • Evolve: tick the simulation.
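Here is a rough CPU illustration of the Advect step: semi-Lagrangian advection with ping-pong buffers. Each destination cell traces back along the velocity and samples the source buffer with trilinear filtering, and then the two buffers swap roles. This is the common backward-trace formulation rather than a literal "push data forward". It assumes a cubic grid of scalars, a uniform velocity in cells per tick, and clamped borders. `Grid`, `sampleTrilinear` and `advect` are hypothetical names, not code from the talk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical cubic grid of n^3 scalar cells, stored x-fastest.
struct Grid {
    int n;
    std::vector<float> v;
    explicit Grid(int size) : n(size), v(static_cast<std::size_t>(size) * size * size, 0.0f) {}
    float& at(int x, int y, int z) { return v[(static_cast<std::size_t>(z) * n + y) * n + x]; }
    float at(int x, int y, int z) const { return v[(static_cast<std::size_t>(z) * n + y) * n + x]; }
};

// Trilinear sample at a fractional cell position, clamped to the grid border.
float sampleTrilinear(const Grid& g, float x, float y, float z) {
    const float hi = static_cast<float>(g.n - 1);
    x = std::min(std::max(x, 0.0f), hi);
    y = std::min(std::max(y, 0.0f), hi);
    z = std::min(std::max(z, 0.0f), hi);
    const int x0 = static_cast<int>(x), y0 = static_cast<int>(y), z0 = static_cast<int>(z);
    const int x1 = std::min(x0 + 1, g.n - 1), y1 = std::min(y0 + 1, g.n - 1), z1 = std::min(z0 + 1, g.n - 1);
    const float fx = x - x0, fy = y - y0, fz = z - z0;
    auto lerp = [](float a, float b, float t) { return a + (b - a) * t; };
    const float c00 = lerp(g.at(x0, y0, z0), g.at(x1, y0, z0), fx);
    const float c10 = lerp(g.at(x0, y1, z0), g.at(x1, y1, z0), fx);
    const float c01 = lerp(g.at(x0, y0, z1), g.at(x1, y0, z1), fx);
    const float c11 = lerp(g.at(x0, y1, z1), g.at(x1, y1, z1), fx);
    return lerp(lerp(c00, c10, fy), lerp(c01, c11, fy), fz);
}

// Semi-Lagrangian advection by a uniform velocity (vx, vy, vz) in cells per tick:
// every destination cell traces backwards and samples the source buffer.
void advect(const Grid& src, Grid& dst, float vx, float vy, float vz, float dt) {
    assert(src.n == dst.n);
    for (int z = 0; z < src.n; ++z)
        for (int y = 0; y < src.n; ++y)
            for (int x = 0; x < src.n; ++x)
                dst.at(x, y, z) = sampleTrilinear(src, x - vx * dt, y - vy * dt, z - vz * dt);
}
```

In a full solver, the velocity would come from the velocity volume itself. The same routine would advect every quantity, swapping source and destination after each pass (`std::swap(src, dst)`).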
SLIDE 6

**(some imagination required)**

SLIDE 7
SLIDE 8

TOO MANY VOLUMES SPOIL THE…

Fluid isn’t box shaped.

  • clipping
  • wastage

Simulated separately.

  • authoring
  • GPU state
  • no volume-to-volume interaction

Tricky to render.

SLIDE 9
SLIDE 10

PROBLEM!

N³-order problem: the cell count grows with the cube of the grid dimension.

64³ = ~0.25M cells, 128³ = ~2M cells, 256³ = ~16M cells, …

Applies to:

  • computational complexity
  • memory requirements

[Chart: Memory (MB) vs. dimensions (X = Y = Z), Texture3D, 4x16F]

And that’s just 1 texture…
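The memory curve on this slide follows from the cell count. A 4x16F texel is four 16-bit floats (8 bytes), so a single N³ Texture3D needs 8·N³ bytes. Below is a small helper (hypothetical name) that reproduces the range plotted here.

```cpp
#include <cassert>
#include <cstdint>

// Bytes in one 4x16F texel: four 16-bit half floats.
constexpr std::uint64_t kBytesPerTexel4x16F = 4 * 2;

// Memory for a single N x N x N Texture3D in 4x16F format, in MB (2^20 bytes).
constexpr std::uint64_t texture3DMiB(std::uint64_t n) {
    return n * n * n * kBytesPerTexel4x16F / (1024 * 1024);
}
```

For example, 256³ costs 128 MB and 1024³ costs 8,192 MB, and that is for this one texture only.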

SLIDE 11

BRICKS

Split simulation space into groups of cells (each known as a brick). Simulate each brick independently.
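Here is a minimal sketch of the bookkeeping behind bricks, assuming cubic bricks with a configurable edge length. Integer division gives the brick that owns a cell, and the remainder gives the cell's position inside that brick. The brick count per axis rounds up so a partial brick is still covered. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Integer 3D coordinate (cells or bricks).
struct UInt3 { std::uint32_t x, y, z; };

// Bricks needed along one axis; rounds up so a partial brick still gets a slot.
std::uint32_t bricksAlongAxis(std::uint32_t cells, std::uint32_t brickSize) {
    assert(brickSize > 0);
    return (cells + brickSize - 1) / brickSize;
}

// Brick that owns a cell.
UInt3 cellToBrick(UInt3 cell, std::uint32_t brickSize) {
    assert(brickSize > 0);
    return { cell.x / brickSize, cell.y / brickSize, cell.z / brickSize };
}

// Cell position relative to its brick's origin.
UInt3 cellInBrick(UInt3 cell, std::uint32_t brickSize) {
    assert(brickSize > 0);
    return { cell.x % brickSize, cell.y % brickSize, cell.z % brickSize };
}
```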

SLIDE 12

BRICK MAP

Need to track which bricks contain fluid.

  • Texture3D<uint>, 1 voxel per brick.
  • 0 → Unoccupied, 1 → Occupied.

Could also use packed binary grids [Gruen15], but this requires atomics.
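For comparison, a packed binary grid keeps one bit per brick (32 bricks per 32-bit word) instead of one uint per brick. Because neighbouring bricks share a word, setting a bit has to be an atomic OR (InterlockedOr in HLSL). The CPU sketch below uses `std::atomic` to show the idea. It is not the [Gruen15] implementation, and `PackedBrickMap` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per brick, 32 bricks per word. Setting a bit must be atomic because
// bricks that share a word may be marked concurrently.
class PackedBrickMap {
public:
    PackedBrickMap(std::uint32_t bx, std::uint32_t by, std::uint32_t bz)
        : bx_(bx), by_(by), bz_(bz), words_((static_cast<std::size_t>(bx) * by * bz + 31) / 32) {
        for (auto& w : words_) w.store(0, std::memory_order_relaxed);
    }
    void markOccupied(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
        const std::size_t i = index(x, y, z);
        words_[i / 32].fetch_or(1u << (i % 32), std::memory_order_relaxed);
    }
    bool isOccupied(std::uint32_t x, std::uint32_t y, std::uint32_t z) const {
        const std::size_t i = index(x, y, z);
        return ((words_[i / 32].load(std::memory_order_relaxed) >> (i % 32)) & 1u) != 0;
    }
    std::size_t wordCount() const { return words_.size(); }

private:
    std::size_t index(std::uint32_t x, std::uint32_t y, std::uint32_t z) const {
        assert(x < bx_ && y < by_ && z < bz_);
        return (static_cast<std::size_t>(z) * by_ + y) * bx_ + x;
    }
    std::uint32_t bx_, by_, bz_;
    std::vector<std::atomic<std::uint32_t>> words_;
};
```

For an 8³ brick map, this is 16 words instead of 512 uints.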

SLIDE 13

TRACKING BRICKS

Initialise with emitter.

Expansion (unoccupied → occupied):

  • if $|V_a| > |D_{brick}|$ for an axis $a \in \{x, y, z\}$, expand in that axis.

Reduction (occupied → unoccupied):

  • inverse of Expansion, handled automatically.
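Here is one possible reading of the expansion rule, sketched on the CPU. V is treated as a cell's per-tick displacement in cells. D_brick is treated as the (conservative) distance from that cell to the last cell of its brick in the direction of motion. When the displacement exceeds that distance, the neighbouring brick along that axis is marked occupied. Both interpretations, the set-based brick map and the function name are assumptions. Reduction needs no counterpart here if the brick map is rebuilt from scratch each frame, as the Clear Bricks step on the next slide suggests.

```cpp
#include <cassert>
#include <cmath>
#include <set>
#include <tuple>

using BrickCoord = std::tuple<int, int, int>;

// For one cell, mark the neighbouring brick along each axis whose boundary the
// cell's per-tick displacement would cross. 'cell' is in grid cells (assumed
// non-negative), 'disp' is velocity * dt in cells. Out-of-range neighbours are
// left for the caller to clip.
void expandFromCell(const int cell[3], const float disp[3], int brickSize,
                    std::set<BrickCoord>& occupied) {
    assert(brickSize > 0);
    int brick[3];
    for (int a = 0; a < 3; ++a) brick[a] = cell[a] / brickSize;
    for (int a = 0; a < 3; ++a) {
        const int local = cell[a] - brick[a] * brickSize;
        // Conservative D_brick: cells left inside this brick in the direction of motion.
        const float dBrick = disp[a] >= 0.0f ? float(brickSize - 1 - local) : float(local);
        if (std::fabs(disp[a]) > dBrick) {
            int n[3] = { brick[0], brick[1], brick[2] };
            n[a] += disp[a] >= 0.0f ? 1 : -1;
            occupied.insert(std::make_tuple(n[0], n[1], n[2]));
        }
    }
}
```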

SLIDE 14

SPARSE SIMULATION

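Inject, Advect, Pressure, Vorticity, Evolve*, Clear Bricks, Fill List

Clear Bricks: reset all bricks to 0 (unoccupied) in the brick map.

Fill List: read the value from the brick map and append the brick coordinate to the list if occupied.

    Texture3D<uint> g_BrickMapRO;
    AppendStructuredBuffer<uint3> g_ListRW;

    // One thread per brick-map voxel; the group size is an illustrative choice.
    [numthreads(4, 4, 4)]
    void FillList(uint3 idx : SV_DispatchThreadID)
    {
        if (g_BrickMapRO[idx] != 0) { g_ListRW.Append(idx); }
    }

*Includes expansion.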
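Once the list exists, later passes only need to touch listed bricks. The sketch below emulates that on the CPU, assuming one thread group per list entry and one thread per cell of a cubic brick. `forEachCellInListedBricks` is a hypothetical name. On the GPU, the group count would usually come from the list's hidden counter, for example via CopyStructureCount into an indirect-args buffer and DispatchIndirect. The slides do not say how the talk does it.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

struct UInt3 { std::uint32_t x, y, z; };

// Emulates "one thread group per listed brick": group g reads its brick
// coordinate from the list, and each thread maps to one cell of that brick.
void forEachCellInListedBricks(const std::vector<UInt3>& brickList, std::uint32_t brickSize,
                               const std::function<void(UInt3)>& kernel) {
    for (const UInt3& brick : brickList)                 // one "thread group" per brick
        for (std::uint32_t z = 0; z < brickSize; ++z)    // "threads" within the group
            for (std::uint32_t y = 0; y < brickSize; ++y)
                for (std::uint32_t x = 0; x < brickSize; ++x)
                    kernel({ brick.x * brickSize + x, brick.y * brickSize + y,
                             brick.z * brickSize + z });
}
```

Work is proportional to the number of listed bricks, not to the size of the whole grid.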

SLIDE 15

UNCOMPRESSED STORAGE

Allocate everything; forget about unoccupied cells.

Pros:

  • simulation is coherent in memory.
  • works in DX11.0.

Cons:

  • no reduction in memory usage.
SLIDE 16

COMPRESSED STORAGE

Similar to List<Brick>.

Pros:

  • good memory consumption.
  • works in DX11.0.

Cons:

  • allocation strategies.
  • indirect lookup.
  • “software translation”
  • filtering particularly costly

[Figure: Indirection Table → Physical Memory]
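Here is a minimal CPU sketch of an indirection table plus physical memory pool. Per brick, the table stores either an invalid marker or the brick's slot in a densely packed pool, and every access pays for the "software translation" lookup. The append-only allocation strategy and all names are assumptions chosen for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Marker for "no physical storage allocated for this brick".
constexpr std::uint32_t kInvalidSlot = 0xFFFFFFFFu;

class CompressedGrid {
public:
    CompressedGrid(std::uint32_t bricksPerAxis, std::uint32_t brickSize)
        : bpa_(bricksPerAxis), bs_(brickSize),
          table_(static_cast<std::size_t>(bricksPerAxis) * bricksPerAxis * bricksPerAxis, kInvalidSlot) {}

    // Allocate physical storage for a brick (simplest strategy: append to the pool).
    // Note: this may invalidate pointers previously returned by lookup().
    void allocate(std::uint32_t bx, std::uint32_t by, std::uint32_t bz) {
        std::uint32_t& slot = table_[brickIndex(bx, by, bz)];
        if (slot != kInvalidSlot) return;
        slot = static_cast<std::uint32_t>(pool_.size() / brickVolume());
        pool_.resize(pool_.size() + brickVolume(), 0.0f);
    }

    // Software translation: cell -> indirection table -> address in the pool.
    float* lookup(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
        const std::uint32_t slot = table_[brickIndex(x / bs_, y / bs_, z / bs_)];
        if (slot == kInvalidSlot) return nullptr;  // unoccupied brick: no storage
        const std::size_t local = (static_cast<std::size_t>(z % bs_) * bs_ + y % bs_) * bs_ + x % bs_;
        return &pool_[slot * brickVolume() + local];
    }

    std::size_t physicalCells() const { return pool_.size(); }

private:
    std::size_t brickVolume() const { return static_cast<std::size_t>(bs_) * bs_ * bs_; }
    std::size_t brickIndex(std::uint32_t bx, std::uint32_t by, std::uint32_t bz) const {
        assert(bx < bpa_ && by < bpa_ && bz < bpa_);
        return (static_cast<std::size_t>(bz) * bpa_ + by) * bpa_ + bx;
    }
    std::uint32_t bpa_, bs_;
    std::vector<std::uint32_t> table_;
    std::vector<float> pool_;
};
```

Only point lookups are shown. Filtering across a brick boundary would also need the neighbouring bricks' slots, which is one source of the filtering cost listed above.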

SLIDE 17

1 Brick = 4³ = 64 cells

SLIDE 18

1 Brick = (1 + 4 + 1)³ = 216 cells

  • New problem:
  • the “6n² + 12n + 8” problem.

Can we do better?
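The “6n² + 12n + 8” figure is the number of extra cells added when an n³ brick gets a one-cell border on every side. The identity is (n + 2)³ − n³ = 6n² + 12n + 8: six faces, twelve edges and eight corners. Below is a quick check of how much that inflates storage (helper names are hypothetical).

```cpp
#include <cassert>
#include <cstdint>

// Cells in an n^3 brick padded with a one-cell border on every side.
constexpr std::uint64_t paddedCells(std::uint64_t n) { return (n + 2) * (n + 2) * (n + 2); }

// Extra border cells: 6 faces (n^2 each) + 12 edges (n each) + 8 corners.
constexpr std::uint64_t borderCells(std::uint64_t n) { return 6 * n * n + 12 * n + 8; }

// Storage inflation factor relative to the unpadded brick.
constexpr double paddingOverhead(std::uint64_t n) {
    return static_cast<double>(paddedCells(n)) / static_cast<double>(n * n * n);
}
```

For 4³ bricks, the padded brick holds 3.375 times the simulated cells. For 16³ bricks, the factor drops to about 1.42.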

SLIDE 19

ENTER: FEATURE LEVEL 11.3

Volume Tiled Resources (VTR)!

  • Extends the 2D functionality in FL11.2.
  • Must check HW support (DX11.3 != FL11.3):

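    ID3D11Device3* pDevice3 = nullptr;
    D3D11_FEATURE_DATA_D3D11_OPTIONS2 support = {};
    m_UseTiledResources = false;
    if (SUCCEEDED(pDevice->QueryInterface(&pDevice3)))
    {
        // Volume (Texture3D) tiled resources require Tier 3.
        if (SUCCEEDED(pDevice3->CheckFeatureSupport(D3D11_FEATURE_D3D11_OPTIONS2, &support, sizeof(support))))
            m_UseTiledResources = support.TiledResourcesTier >= D3D11_TILED_RESOURCES_TIER_3;
        pDevice3->Release();
    }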

SLIDE 20

TILED RESOURCES #1

Pros:

  • only mapped memory is allocated in VRAM
  • “hardware translation”
  • logically a volume texture
  • all samplers supported
  • 1 Tile = 64KB (= 1 Brick)
  • fast loads
SLIDE 21

TILED RESOURCES #2

Gotcha: Tile mappings must be updated from CPU

1 Tile = 64KB (= 1 Brick)

| BPP | Tile Dimensions |
|-----|-----------------|
| 8   | 64x32x32        |
| 16  | 32x32x32        |
| 32  | 32x32x16        |
| 64  | 32x16x16        |
| 128 | 16x16x16        |
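Each row of the BPP table describes exactly 64 KB: width × height × depth × (BPP / 8) = 65,536 bytes. Here is a check of that arithmetic, with the table transcribed into a hypothetical `tileShapeForBpp` helper.

```cpp
#include <cassert>
#include <cstdint>

struct TileShape { std::uint32_t w, h, d; };

// Tile dimensions (in texels) for volume tiled resources, transcribed from the table.
TileShape tileShapeForBpp(std::uint32_t bpp) {
    switch (bpp) {
        case 8:   return { 64, 32, 32 };
        case 16:  return { 32, 32, 32 };
        case 32:  return { 32, 32, 16 };
        case 64:  return { 32, 16, 16 };
        case 128: return { 16, 16, 16 };
        default:  assert(false && "unsupported bits per pixel"); return { 0, 0, 0 };
    }
}

// Size of one tile in bytes.
std::uint32_t tileBytes(std::uint32_t bpp) {
    const TileShape s = tileShapeForBpp(bpp);
    return s.w * s.h * s.d * (bpp / 8);
}
```

For the 4x16F (64 BPP) format from the earlier memory chart, one tile (and so one brick) is 32x16x16 texels.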

SLIDE 22

CPU READ-BACKS

CPU read-backs are often considered taboo in real-time graphics, but they are fine if done correctly! (And bad if not.)

  • 2 frames of latency (more for AFR in SLI).
  • Profile Map/Unmap calls.

[Diagram: CPU and GPU timelines across frames N to N+3, marking when each frame's read-back data is ready ("N; Data Ready", "N+1; Data Ready", "N+2; Data Ready") and when frame N's tiles are mapped ("N; Tiles Mapped").]
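This kind of latency typically comes from buffering the read-back. The copy issued in frame N is only mapped once the GPU has finished with it, a couple of frames later, so several copies are kept in flight. The CPU-side bookkeeping can be sketched with a ring of slots. The ring depth, the `ReadbackRing` name and the plain `std::vector` payload are assumptions. A D3D11 version would instead copy into staging resources and Map them, for example with D3D11_MAP_FLAG_DO_NOT_WAIT so that a not-yet-ready copy does not stall.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Ring of read-back slots with a fixed latency in frames: data submitted in
// frame N becomes readable in frame N + kLatency.
template <std::size_t kLatency>
class ReadbackRing {
public:
    // GPU side (emulated): queue this frame's brick list for read-back.
    void submit(std::uint64_t frame, const std::vector<std::uint32_t>& data) {
        Slot& s = slots_[frame % kSlots];
        s.frame = frame;
        s.data = data;
        s.valid = true;
    }
    // CPU side: fetch the data submitted kLatency frames ago, if it is there.
    bool tryRead(std::uint64_t frame, std::uint64_t& srcFrame, std::vector<std::uint32_t>& out) {
        if (frame < kLatency) return false;
        Slot& s = slots_[(frame - kLatency) % kSlots];
        if (!s.valid || s.frame != frame - kLatency) return false;
        srcFrame = s.frame;
        out = s.data;
        s.valid = false;
        return true;
    }

private:
    static constexpr std::size_t kSlots = kLatency + 1;  // one extra slot for the frame being written
    struct Slot { std::uint64_t frame = 0; std::vector<std::uint32_t> data; bool valid = false; };
    std::array<Slot, kSlots> slots_;
};
```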

SLIDE 23

LATENCY RESISTANT SIMULATION #1

Naïve Approach:

  • clamp velocity to Vmax.
  • CPU read-back: occupied bricks.
  • 2 frames of latency! Extrapolate “probable” tiles.
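With velocity clamped to Vmax, fluid can travel at most Vmax × latency cells before the CPU sees the read-back. One conservative way to extrapolate is therefore to dilate every occupied brick by that distance, rounded up to whole bricks. Below is a sketch over a dense brick map, assuming Vmax is in cells per frame; the names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dense brick occupancy (0/1), x-fastest, 'dim' bricks per axis.
using BrickMap = std::vector<std::uint8_t>;

std::size_t countOccupied(const BrickMap& m) {
    std::size_t n = 0;
    for (std::uint8_t v : m) n += v ? 1 : 0;
    return n;
}

// Naive prediction: fluid moves at most vMax cells per frame, so after
// 'latencyFrames' frames it can be up to ceil(vMax * latency / brickSize) bricks away.
BrickMap dilateUniform(const BrickMap& occ, int dim, float vMax, int latencyFrames, int brickSize) {
    assert(brickSize > 0 && occ.size() == static_cast<std::size_t>(dim) * dim * dim);
    const int r = static_cast<int>(std::ceil(vMax * latencyFrames / brickSize));
    BrickMap out(occ.size());
    auto idx = [dim](int x, int y, int z) { return (static_cast<std::size_t>(z) * dim + y) * dim + x; };
    for (int z = 0; z < dim; ++z)
        for (int y = 0; y < dim; ++y)
            for (int x = 0; x < dim; ++x) {
                if (!occ[idx(x, y, z)]) continue;
                for (int dz = -r; dz <= r; ++dz)
                    for (int dy = -r; dy <= r; ++dy)
                        for (int dx = -r; dx <= r; ++dx) {
                            const int nx = x + dx, ny = y + dy, nz = z + dz;
                            if (nx < 0 || ny < 0 || nz < 0 || nx >= dim || ny >= dim || nz >= dim) continue;
                            out[idx(nx, ny, nz)] = 1;
                        }
            }
    return out;
}
```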

SLIDE 24

LATENCY RESISTANT SIMULATION #2

Tight Approach:

  • CPU read-back: occupied bricks and max{|V|} within each brick.
  • 2 frames of latency! Extrapolate “probable” tiles.
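The tight variant replaces the global Vmax with each brick's own max{|V|}, so slow regions predict fewer tiles. The sketch assumes the read-back yields one record per occupied brick (coordinate plus max speed in cells per frame). The record layout and names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <set>
#include <tuple>
#include <vector>

using Brick = std::tuple<int, int, int>;

// One read-back record: an occupied brick and the largest |V| found inside it.
struct BrickRecord { int x, y, z; float maxSpeed; };

// Tight prediction: dilate each brick by its own reach after 'latencyFrames'.
std::set<Brick> predictTight(const std::vector<BrickRecord>& records, int dim,
                             int latencyFrames, int brickSize) {
    assert(brickSize > 0);
    std::set<Brick> predicted;
    for (const BrickRecord& b : records) {
        const int r = static_cast<int>(std::ceil(b.maxSpeed * latencyFrames / brickSize));
        for (int dz = -r; dz <= r; ++dz)
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx) {
                    const int x = b.x + dx, y = b.y + dy, z = b.z + dz;
                    if (x < 0 || y < 0 || z < 0 || x >= dim || y >= dim || z >= dim) continue;
                    predicted.insert(std::make_tuple(x, y, z));
                }
    }
    return predicted;
}
```

A still brick keeps only itself, while a fast one also claims its 26 neighbours.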

SLIDE 25

LATENCY RESISTANT SIMULATION #3

[Flowchart spanning CPU and GPU: Sparse Eulerian Simulation, Readback Brick List, “CPU Readback Ready?” (Yes / No), Prediction Engine, Emitter Bricks, Update Tile Mappings.]
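The CPU half of this flow can be sketched as a set difference. First, build this frame's wanted bricks from the prediction engine's output plus the emitter bricks. Then map only newly wanted tiles and unmap the ones no longer wanted. This sketch assumes the previous prediction is reused while no read-back is ready. That choice, the names and the set representation are assumptions. In D3D11, the resulting lists would feed ID3D11DeviceContext2::UpdateTileMappings.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <tuple>
#include <utility>
#include <vector>

using Brick = std::tuple<int, int, int>;

struct TileUpdate {
    std::vector<Brick> toMap;    // bricks that need a tile mapped
    std::vector<Brick> toUnmap;  // bricks whose tile can be released
};

// Computes one frame's tile-mapping changes. 'predicted' is the prediction
// engine's output when a read-back was ready; otherwise the last prediction is
// kept (an assumption of this sketch). Emitter bricks are always wanted.
class TileMapper {
public:
    TileUpdate update(bool readbackReady, const std::set<Brick>& predicted,
                      const std::set<Brick>& emitterBricks) {
        if (readbackReady) lastPrediction_ = predicted;
        std::set<Brick> wanted = lastPrediction_;
        wanted.insert(emitterBricks.begin(), emitterBricks.end());

        TileUpdate u;
        for (const Brick& b : wanted)
            if (!mapped_.count(b)) u.toMap.push_back(b);
        for (const Brick& b : mapped_)
            if (!wanted.count(b)) u.toUnmap.push_back(b);
        mapped_ = std::move(wanted);
        return u;
    }
    std::size_t mappedCount() const { return mapped_.size(); }

private:
    std::set<Brick> lastPrediction_;
    std::set<Brick> mapped_;
};
```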

SLIDE 26

DEMO

SLIDE 27

PERFORMANCE #1

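NOTE: Numbers captured on a GeForce GTX 980.

Simulation time (ms) by grid resolution:

| Grid Resolution | Full Grid (ms) | Sparse Grid (ms) |
|-----------------|----------------|------------------|
| 128             | 2.3            | 0.4              |
| 256             | 19.9           | 1.8              |
| 384             | 64.7           | 2.7              |
| 512             | not shown      | 2.9              |
| 1024            | not shown      | 6.0              |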

SLIDE 28

PERFORMANCE #2

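NOTE: Numbers captured on a GeForce GTX 980.

Memory (MB) by grid resolution:

| Grid Resolution | Full Grid (MB) | Sparse Grid (MB) |
|-----------------|----------------|------------------|
| 128             | 80             | 11               |
| 256             | 640            | 46               |
| 384             | 2,160          | 57               |
| 512             | 5,120          | 83               |
| 1024            | 40,960         | 138              |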
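The full-grid figures are consistent with a constant 40 bytes per cell (80 MB / 128³ cells, taking MB as 2²⁰ bytes), so they grow with N³. The sparse figures track the occupied bricks rather than N. The 40-byte figure is derived from these numbers, not stated on the slide. Here is a quick check.

```cpp
#include <cassert>
#include <cstdint>

// Full-grid memory in MB (2^20 bytes), assuming a constant number of bytes per cell.
constexpr std::uint64_t fullGridMiB(std::uint64_t n, std::uint64_t bytesPerCell) {
    return n * n * n * bytesPerCell / (1024 * 1024);
}

// How many times smaller the sparse grid is, given both sizes in MB.
constexpr double savingFactor(double fullMB, double sparseMB) { return fullMB / sparseMB; }
```

At 1024³, the sparse grid is roughly 297 times smaller (40,960 MB vs. 138 MB).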

SLIDE 29

SCALING

Ratio (in time) of 1 Brick = ~75% across grid resolutions.

[Chart: Time{Full} vs. Time{Sparse}]

SLIDE 30

SUMMARY

  • Let’s see more fluid in games.
  • Fluid is not box shaped!
  • One large volume is better than many small ones.
  • Uncompressed/compressed storage is a viable fallback.
  • CPU read-backs are useful if done right!
  • VTRs are great for fluid simulation.
  • Other latency-resistant algorithms with tiled resources?

SLIDE 31

THANK YOU

ALEX DUNN - ADUNN@NVIDIA.COM TWITTER: @ALEXWDUNN