A GPU-Accelerated Node Based Framework for Hair Simulation and Rendering




SLIDE 1

A GPU-Accelerated Node Based Framework for Hair Simulation and Rendering

Francesco Giordana, Sarah Macdonald, Gianluca Vatinno

Double Negative VFX

SIGGRAPH 2013

SLIDE 2

Hair

  • Creatures:
      • Digi-double hair / facial hair (100k - 150k)
      • Digital creature fur and feathers (a few million)
  • Environments:
      • Grass, moss, seaweed, etc. (many millions)

SLIDE 3

Hair Pros: Why hair on the GPU?

  • Many repetitions of very similar data
  • Each hair can be computed in parallel
  • Partially uniform domain
  • No need for high precision

SLIDE 4

Hair Cons: Why NOT hair on the GPU?

  • Inter-dependency between hairs
  • Need to walk down the hair to propagate constraints

  • Number of curves can change
  • Arbitrary spatial extension

SLIDE 5

What language / library?

Thrust:

  • Fast and easy to use: STL-style containers and algorithms
  • Has lots of fancy iterators that keep code cleaner
  • Can handle host code, which makes code reuse easier
  • CUDA backend is well optimized (automatically chooses grid size, block size and shared memory usage)
  • Has CPU backends (TBB and OpenMP)
  • Limitations: no streams, no manual control of shared memory
  • We prototype quickly in Thrust, then, if needed, optimize by writing specific CUDA kernels
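
To illustrate the "STL-style" point, here is a minimal sketch using the standard library only. The names `Scale` and `scaleLengths` are hypothetical and do not come from the talk. Thrust offers `thrust::transform` with the same iterator-based call shape over a `thrust::device_vector`, and there the functor's call operator would additionally be marked `__host__ __device__`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical functor: multiplies a value by a constant factor.
struct Scale {
    float factor;
    float operator()(float x) const { return x * factor; }
};

// Scale every hair length in place with an STL algorithm.
// With Thrust, the container and the algorithm namespace would change,
// but the call would keep the same begin/end/output/functor shape.
inline void scaleLengths(std::vector<float>& lengths, float factor) {
    assert(factor >= 0.0f);  // illustrative precondition, an assumption
    std::transform(lengths.begin(), lengths.end(), lengths.begin(),
                   Scale{factor});
}
```

Because the algorithm call does not mention how the loop is executed, moving such code between a CPU container and a device container is mostly a matter of changing types, which is what makes prototyping quick.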

SLIDE 6

Furball

  • Procedural node-graph
  • Custom node graph editor
  • Embedded in existing 3D software packages (Maya, Houdini, ...)
  • High-quality previews in viewport
  • Modular
  • C++ Core
  • Qt / PyQt UI

[Image caption: Historical first render with Furball]

SLIDE 7

Furball Framework

[Diagram: framework components, some of them marked "GPU Accelerated"]

  • C++: dnSynapse, dnSubdiv, dnQt, dnFurball, qFurball
  • Python: PyQt, pyFurball, dnPublishing
  • Tools integration: Maya, PRMan, Houdini, Qt

SLIDE 8

FurShop - Maya Integration

  • Real-time preview in Maya viewport
  • Custom Graph Editor
  • Embedded in Maya Dependency Graph

SLIDE 9

FurShop - Tools

  • Mask painting tool
  • Interactive brush tool
  • Custom UI elements
  • Attribute publishing

SLIDE 10

FurShop - Example Workflow

[Figure labels, as extracted: Final, Guides, Hairs, Follicles, Density mask, Static geometry]

SLIDE 11

FurShop - Example workflow

Blue:

  • external inputs (Maya curves, Houdini simulation data, curve manipulation tools, etc.)
  • stored in data caches

Red:

  • procedural networks
  • GPU accelerated elements

Purple:

  • rendering environment (PRMan DSO, OpenGL, etc.)

[Diagram nodes: Static Mesh, Dynamic Network, Groom Network, Render Node, LookDev Network, Groom Curves, Dynamic Curves, Anim Mesh, Extra Inputs]

SLIDE 12

FurShop - Maya Nodes

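[Diagram: Maya node classes and the Furball objects they hold]

  • Furball classes shown: FurNetworkNode, FurNetwork, FurConversionNode, FurCache, FurRenderNode, FurSystem, FurAttributePtr
  • Maya API base classes shown: MPxData (once), MPxNode (three times)
  • Three FurNode boxes, captioned "Merged computation chain"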

SLIDE 13

dnSynapse

  • DAG with lazy-pull computation model
  • Two types of objects: Node, Attribute
  • Data flows through Attributes
  • Nodes perform the computation
  • SubGraphs: nodes can contain an entire graph inside
  • Proxy attributes: attributes from a subgraph can be exposed to the upper layer

[Node controls shown: Enable / disable node, Enable / disable CUDA, Make output node]
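
A lazy-pull model can be sketched as follows. This is only an illustration under assumptions: `LazyNode`, `markDirty` and `evalCount` are hypothetical names, the node carries a single float rather than Attributes, and nothing here reflects the actual dnSynapse API.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical node with one cached output that is computed only when pulled.
class LazyNode {
public:
    using Compute = std::function<float(const std::vector<float>&)>;

    LazyNode(Compute compute, std::vector<LazyNode*> inputs)
        : compute_(std::move(compute)), inputs_(std::move(inputs)) {
        for (LazyNode* in : inputs_) {
            assert(in != nullptr);
            in->dependents_.push_back(this);
        }
    }

    // Pull: recompute only if dirty, pulling upstream values first.
    float value() {
        if (dirty_) {
            std::vector<float> args;
            for (LazyNode* in : inputs_) args.push_back(in->value());
            cached_ = compute_(args);
            dirty_ = false;
            ++evalCount;
        }
        return cached_;
    }

    // Only the dirty flag is pushed downstream; no computation happens here.
    void markDirty() {
        if (dirty_) return;  // dependents of a dirty node are already dirty
        dirty_ = true;
        for (LazyNode* d : dependents_) d->markDirty();
    }

    int evalCount = 0;  // number of real evaluations, for inspection

private:
    Compute compute_;
    std::vector<LazyNode*> inputs_;
    std::vector<LazyNode*> dependents_;
    float cached_ = 0.0f;
    bool dirty_ = true;
};
```

Pulling the same output twice evaluates the chain once; after an upstream `markDirty()`, the next pull re-evaluates only the nodes that depend on the change.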

SLIDE 14

dnSynapse - Device Controller

  • Initialize and select device
  • Create CUDA Context
  • Handle resources (e.g. available memory)
  • Enable / disable GPU acceleration

    struct DeviceController
    {
        void enableGPU( bool enable );
        bool isEnabledGPU();      // a query must return the state, so bool rather than void
        void selectBestDevice();
        bool canHandle( const DataGPU* data );
    };

SLIDE 15

dnSynapse - Dual Data

  • Abstract Data wrapper with interface exposed to user
  • Two separate implementations for CPU and GPU
  • Data conversion triggered with getDataGPU() or getDataCPU()

    struct DataGPU
    {
        thrust::device_vector<...> ...;
        void clear();
        void copyTo( DataCPU* dst );
        void copyFrom( const DataCPU* src );
    };

    struct Data
    {
        DataCPU* dataCPU;
        DataGPU* dataGPU;
        void clear();
        ...
        DataCPU* getDataCPU();
        DataGPU* getDataGPU();
    };

    struct DataCPU
    {
        thrust::host_vector<...> ...;
        void clear();
        void save( char* filename );
        void load( char* filename );
    };
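
How a getter can trigger the conversion is not spelled out on the slide. One possible scheme, sketched here under assumptions, keeps a validity flag per side and copies only when the requested side is stale. `DualData`, `HostBuffer`, `DeviceBuffer` and `transfers` are hypothetical names, and a plain `std::vector` stands in for device memory so the sketch runs without a GPU.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Stand-ins for the two storage sides. A real device buffer would live in
// GPU memory; a std::vector is used here only to keep the sketch runnable.
struct HostBuffer   { std::vector<float> values; };
struct DeviceBuffer { std::vector<float> values; };

class DualData {
public:
    explicit DualData(std::vector<float> init) { host_.values = std::move(init); }

    // Requesting a side copies from the other side only if this one is stale.
    // The caller may then modify it, so the other side is marked stale.
    HostBuffer* getDataCPU() {
        if (!hostValid_) { host_.values = device_.values; ++transfers; }
        hostValid_ = true;
        deviceValid_ = false;
        return &host_;
    }

    DeviceBuffer* getDataGPU() {
        if (!deviceValid_) { device_.values = host_.values; ++transfers; }
        deviceValid_ = true;
        hostValid_ = false;
        return &device_;
    }

    int transfers = 0;  // counts simulated host/device copies

private:
    HostBuffer host_;
    DeviceBuffer device_;
    bool hostValid_ = true;
    bool deviceValid_ = false;
};
```

With this scheme, consecutive GPU accesses cost a single transfer, and the data moves back only when something asks for the CPU side.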

SLIDE 16

dnSynapse - Dual Nodes

  • Nodes have a CPU compute and a GPU compute (optional)
  • Try GPU compute first, fallback to CPU compute
  • On the first GPU compute, data is transferred to the Device
  • Data will stay on Device until the next CPU compute
  • Can enable / disable GPU computation with flags (for debugging)

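    void compute( Data* outData, Context* inContext )
    {
        bool result_CUDA = false;

        // Try the GPU path first when it is enabled and supported.
        if ( cudaEnabled() && canUseCUDA() )
            result_CUDA = computeCUDA( outData->getDataGPU(), inContext );

        // Fall back to the CPU path if the GPU path was skipped or failed.
        if ( ! result_CUDA )
            computeCPU( outData->getDataCPU(), inContext );
    }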

SLIDE 17

Furball - Hair

  • Follicles
      • Surface Patch ID
      • Surface Patch ST
      • Surface Reference Orient
      • Follicle Position
      • Follicle Orient
      • Follicle Reference Position
      • Follicle Reference Orient
      • Follicle UV
  • Curves
      • n Curve Points

SLIDE 18

Main families of operators

  • Per-point:
      • Each point in a separate thread
      • No need for info about neighbors
      • Example: scale
  • Per-curve:
      • Compute a whole curve in a single thread
      • Accumulate constraints walking along the curve
      • Example: curl
  • One-curve-to-many:
      • Relationships between one curve and a set of curves
      • Per-curve kernel with information about neighbors
      • Example: guide interpolation
  • Many-curves-to-many:
      • Potentially constraints between all curves in a set
      • Example: hair-hair collisions
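
The difference between the first two families can be sketched on the CPU, where each loop iteration stands for what one GPU thread would do. The function names are hypothetical, and arc-length accumulation is used as a simple stand-in for a per-curve operator such as curl, since it also needs state carried from root to tip.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Float3 { float x, y, z; };

// Per-point operator (example: scale). Every iteration is independent,
// so each point could be handled by its own thread.
inline void scalePoints(std::vector<Float3>& points, float s) {
    for (Float3& p : points) { p.x *= s; p.y *= s; p.z *= s; }
}

// Per-curve operator: the outer iteration is the unit of parallelism, and
// the inner loop walks one curve sequentially because each result depends
// on the previous point. Returns the length from the root to each point.
inline std::vector<float> arcLengthFromRoot(const std::vector<Float3>& points,
                                            std::size_t pointsPerCurve) {
    assert(pointsPerCurve > 0 && points.size() % pointsPerCurve == 0);
    std::vector<float> length(points.size(), 0.0f);
    const std::size_t numCurves = points.size() / pointsPerCurve;
    for (std::size_t c = 0; c < numCurves; ++c) {
        const std::size_t root = c * pointsPerCurve;
        for (std::size_t i = 1; i < pointsPerCurve; ++i) {
            const Float3& a = points[root + i - 1];
            const Float3& b = points[root + i];
            const float dx = b.x - a.x, dy = b.y - a.y, dz = b.z - a.z;
            length[root + i] = length[root + i - 1]
                             + std::sqrt(dx * dx + dy * dy + dz * dz);
        }
    }
    return length;
}
```

The per-point version exposes one unit of parallel work per point, while the per-curve version exposes only one per curve and serializes the walk inside it.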

SLIDE 19

Memory Layout

  • Follicles are sorted per patch
  • Curves are sorted per patch, in the same order as follicles
  • Curve points are ordered per curve, root to tip
  • Each attribute is stored in a separate compact array
  • Components can be split into separate arrays to maximize memory access efficiency
  • 1 million curves, 32 segments: 36 ms for a per-point operator, 96 ms for a per-curve operator

[Diagram: the points array holding curve1 followed by curve2, with a coalesced and an uncoalesced access pattern]

Uncoalesced access (each thread walks its own curve, so threads running side by side read addresses n points apart):

    Thread1 { for (int i=0; i<n; ++i) float3 p = points[i + n]; }
    Thread2 { for (int i=0; i<n; ++i) float3 p = points[i + 2*n]; }
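
The two access patterns can be expressed as index arithmetic. This is a sketch with hypothetical helper names and 0-based thread numbers, assuming the layout listed on this slide (points stored curve after curve, n points per curve).

```cpp
#include <cassert>
#include <cstddef>

// Per-point operator: thread t handles point t, so neighboring threads
// read neighboring array elements.
inline std::size_t perPointIndex(std::size_t thread) { return thread; }

// Per-curve operator: thread t walks curve t and reads this index at step i.
inline std::size_t perCurveIndex(std::size_t thread, std::size_t i, std::size_t n) {
    assert(i < n);
    return thread * n + i;
}

// Distance in elements between what thread 0 and thread 1 read at step i.
inline std::size_t neighborStridePerCurve(std::size_t i, std::size_t n) {
    return perCurveIndex(1, i, n) - perCurveIndex(0, i, n);
}
```

In the per-point case adjacent threads are one element apart at any instant, while in the per-curve case they are n elements apart at every step.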

SLIDE 20

Caching

  • Problem: caching occupies memory resources
      • Must cache on Host: reading the cache needs an H->D transfer (slow)
      • Can’t use too much pinned memory, or system performance will degrade
  • Solution: cache follicles
      • Limited data set: no curve points
      • Can build a k-d tree and cache it alongside
      • Transfer of follicle data is quick; the data set is small enough to use pinned memory
      • Recompute hairs on the device
  • 1 million curves, 32 segments per curve:
      • Follicles and hairs on host, non-pinned memory: size 420 MB, H->D 120 ms
      • Follicles on host (pinned), hairs on device: size 50 MB, H->D 10 ms, hair generation 14 ms
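
The comparison can be worked through directly from the figures quoted on this slide. `CacheOption`, `totalMs` and `bandwidthGBs` are hypothetical helpers, and decimal units are assumed (1 MB per ms equals 1 GB per s).

```cpp
#include <cassert>

// One caching strategy: cached size, H->D transfer time, and the time to
// regenerate anything that was not cached.
struct CacheOption {
    double sizeMB;
    double transferMs;
    double regenMs;
};

// Total cost of reading the cache and getting hairs onto the device.
inline double totalMs(const CacheOption& o) {
    assert(o.transferMs >= 0.0 && o.regenMs >= 0.0);
    return o.transferMs + o.regenMs;
}

// Effective transfer rate; MB per ms is numerically GB per s.
inline double bandwidthGBs(const CacheOption& o) {
    assert(o.transferMs > 0.0);
    return o.sizeMB / o.transferMs;
}
```

With the slide's numbers, the full cache costs 120 ms at an effective 3.5 GB/s, while the follicle-only cache costs 10 ms + 14 ms = 24 ms at an effective 5 GB/s: roughly a 5x reduction while holding 50 MB instead of 420 MB.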

SLIDE 21

Test Computer

Mirrors current artists’ computers:

  • Xeon X5690 @ 3.47 GHz, 6 cores
  • 48 GB RAM
  • Quadro 4000

Test configurations:

  • CPU: single-threaded, using STL containers
  • CUDA: compute capability 2.0, using Thrust

Soon to test: multi-threaded CPU, and CUDA on a Tesla K20

SLIDE 22

Filter Frizz

Inputs:

  • Hairs
  • Ramp
  • Mask
  • Randomization

Steps:

  1) Generate random sequence per-hair
  2) Generate random sequence per-point
  3) Apply random displacement to each curve point
  4) Weight the effect of the frizz by mask value, ramp value and random sequences

Improvement:

  • Combine mask and random values per-curve before launching the main kernel
  • Reduce texture accesses from (numSegments x numCurves) to numCurves
  • 10% performance gain
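
The steps and the improvement above can be sketched as a two-pass CPU version. Everything here is illustrative: the function names are hypothetical, the random sequences are passed in rather than generated, and the ramp is assumed to be linear from root to tip, which the slide does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Float3 { float x, y, z; };

// Pre-pass (the "improvement"): fold the mask value and the per-hair random
// value into one weight per curve, so the mask is read once per curve.
inline std::vector<float> combineCurveWeights(const std::vector<float>& mask,
                                              const std::vector<float>& hairRandom) {
    assert(mask.size() == hairRandom.size());
    std::vector<float> weight(mask.size());
    for (std::size_t c = 0; c < mask.size(); ++c)
        weight[c] = mask[c] * hairRandom[c];
    return weight;
}

// Main pass: displace each point by its per-point random offset, scaled by
// the curve weight and by a ramp that is 0 at the root and 1 at the tip.
inline void applyFrizz(std::vector<Float3>& points,
                       const std::vector<Float3>& pointRandom,
                       const std::vector<float>& curveWeight,
                       std::size_t pointsPerCurve, float amplitude) {
    assert(pointsPerCurve > 1);
    assert(points.size() == pointRandom.size());
    assert(points.size() == curveWeight.size() * pointsPerCurve);
    for (std::size_t i = 0; i < points.size(); ++i) {
        const std::size_t c = i / pointsPerCurve;
        const float t = static_cast<float>(i % pointsPerCurve)
                      / static_cast<float>(pointsPerCurve - 1);
        const float w = amplitude * curveWeight[c] * t;
        points[i].x += w * pointRandom[i].x;
        points[i].y += w * pointRandom[i].y;
        points[i].z += w * pointRandom[i].z;
    }
}
```

The main pass is a per-point operator: each iteration reads only its own point plus one precomputed per-curve weight.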

SLIDE 23

FilterFrizz

  • Total 4-5x speedup
  • More data -> more performance gain

[Chart: time (ms) vs. number of hairs (240k and 1.2M) for four series: CPU-60seg, CPU-30seg, CUDA-60seg, CUDA-30seg. Data labels in extraction order: 402, 60, 512, 74, 690, 137, 1340, 264.]

SLIDE 24

Wisps

Inputs:

  • Hairs
  • Wisps center curves
  • Envelope profiles
  • Masks
  • Randomization

Steps:

  1) Generate envelope for each wisp
  2) Distance computation hair follicle - wisp root
  3) Randomly pick one of the overlapping wisps for each hair
  4) Parallel transport of distance vector along the curve
  5) Rescale vector so that it fits the envelope
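
Steps 2 and 3 can be sketched as a brute-force search over all wisp roots. This is an illustration only: `pickWisp` is a hypothetical name, the envelope is reduced to a single radius at the wisp root, and the per-hair random value is assumed to be supplied in [0, 1).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Float3 { float x, y, z; };

// Returns the index of the wisp chosen for this follicle, or -1 if no wisp
// envelope overlaps it. Every wisp root is tested (brute force).
inline int pickWisp(const Float3& follicle,
                    const std::vector<Float3>& wispRoots,
                    const std::vector<float>& wispRadius,
                    float random01) {
    assert(wispRoots.size() == wispRadius.size());
    std::vector<int> overlapping;
    for (std::size_t i = 0; i < wispRoots.size(); ++i) {
        const float dx = follicle.x - wispRoots[i].x;
        const float dy = follicle.y - wispRoots[i].y;
        const float dz = follicle.z - wispRoots[i].z;
        // Compare squared distances to avoid a square root.
        if (dx * dx + dy * dy + dz * dz <= wispRadius[i] * wispRadius[i])
            overlapping.push_back(static_cast<int>(i));
    }
    if (overlapping.empty()) return -1;
    // Map the random value to one of the overlapping candidates.
    std::size_t k = static_cast<std::size_t>(
        random01 * static_cast<float>(overlapping.size()));
    if (k >= overlapping.size()) k = overlapping.size() - 1;
    return overlapping[k];
}
```

Each follicle is processed independently of the others, so the search maps onto one thread per hair; a spatial structure such as a k-d tree would reduce the number of roots tested per hair.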

SLIDE 25

Wisps

[Chart, 10k wisps: time (ms) vs. number of hairs (50k and 200k) for four series: CPU-60seg, CPU-30seg, CUDA-60seg, CUDA-30seg. Data labels in extraction order: 450, 125, 500, 142, 2000, 530, 2580, 672.]

[Chart, 100 wisps: same axes and series. Data labels in extraction order: 50, 13, 78, 20, 485, 118, 855, 210.]

CPU k-d tree - CUDA brute force, 10x speedup

SLIDE 26

Future Work

  • Try Kepler cards
  • Multiple streams
  • Multi-threaded CPU
  • Compile portions of graph
  • Kernel fusion
  • GPU k-d trees
  • Try CPU backends (OpenMP, TBB)
  • Include dynamics simulation system inside Furball

SLIDE 27

Questions?

Francesco Giordana - fg@dneg.com
Sarah Macdonald - svm@dneg.com
Gianluca Vatinno - gv@dneg.com
