OPTIX UPDATE: David K. McAllister, Ph.D., OptiX Engineering Manager

SLIDE 1

OPTIX UPDATE

David K. McAllister, Ph.D. OptiX Engineering Manager

SLIDE 2

OPTIX UPDATE

  • OptiX Introduction
  • Current Goodness
  • Upcoming Goodness
  • Special Guest

SLIDE 3

RENDERING

SLIDE 4

RENDERING WITH RAY TRACING

  • Sampling

— What rays to trace?

  • Ray tracing

— What do the rays hit?

  • Shading

— What color are the rays?

SLIDE 5

RAY TRACING IN THE ABSTRACT

  • Given a ray (O, D), with origin O and direction D, and a geometric dataset, find:

— any hit
— the closest hit
— all hits

  • Current datasets: ~1M to 100M primitives, usually triangles
  • Use a spatial data structure optimized for these operations
  • Datasets can also include gigabytes of other data, such as textures
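The three query types above can be illustrated on the CPU with a brute-force loop over triangles and the Möller–Trumbore ray/triangle test. This is a minimal sketch, not OptiX code. The `Vec3`, `Ray`, and `Triangle` types and the `closestHit`, `anyHit`, and `allHits` helpers are hypothetical names chosen for illustration. A real tracer would replace the linear loop with the spatial data structure mentioned above.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct Ray { Vec3 O, D; float tmin, tmax; };  // origin O, direction D, valid t interval
struct Triangle { Vec3 v0, v1, v2; };

// Moller-Trumbore: true (and t in tOut) if O + t*D hits the triangle with tmin <= t <= tmax.
bool intersect(const Ray& r, const Triangle& tri, float& tOut) {
    Vec3 e1 = sub(tri.v1, tri.v0), e2 = sub(tri.v2, tri.v0);
    Vec3 p = cross(r.D, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < 1e-7f) return false;  // ray (nearly) parallel to the triangle
    float inv = 1.0f / det;
    Vec3 s = sub(r.O, tri.v0);
    float u = dot(s, p) * inv;                  // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return false;
    Vec3 q = cross(s, e1);
    float v = dot(r.D, q) * inv;                // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return false;
    float t = dot(e2, q) * inv;
    if (t < r.tmin || t > r.tmax) return false;
    tOut = t;
    return true;
}

// Closest hit: index of the nearest triangle hit (distance in *tHit), or -1 on a miss.
int closestHit(const Ray& r, const std::vector<Triangle>& tris, float* tHit) {
    int best = -1;
    float tBest = r.tmax;
    for (int i = 0; i < (int)tris.size(); ++i) {
        Ray shrunk = r;
        shrunk.tmax = tBest;  // only accept hits nearer than the current best
        float t;
        if (intersect(shrunk, tris[i], t)) { best = i; tBest = t; }
    }
    if (best >= 0 && tHit) *tHit = tBest;
    return best;
}

// Any hit: stop at the first intersection found (enough for a shadow query).
bool anyHit(const Ray& r, const std::vector<Triangle>& tris) {
    float t;
    for (const Triangle& tri : tris)
        if (intersect(r, tri, t)) return true;
    return false;
}

// All hits: every triangle index the ray intersects, in dataset order.
std::vector<int> allHits(const Ray& r, const std::vector<Triangle>& tris) {
    std::vector<int> hits;
    float t;
    for (int i = 0; i < (int)tris.size(); ++i)
        if (intersect(r, tris[i], t)) hits.push_back(i);
    return hits;
}
```

Shrinking `tmax` as nearer hits are found is the same interval-narrowing idea that lets a closest-hit traversal discard anything farther away.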

SLIDE 6

ACCELERATION STRUCTURES

  • Bounding Volume Hierarchy (BVH)
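A BVH lets a query skip every subtree whose bounding box the ray misses. The sketch below shows that pruning with the standard slab test for ray/box overlap and an explicit-stack traversal over a flat node array. The node layout (`BVHNode` holding either two child indices or a leaf primitive range) and the function names are assumptions made for illustration, not OptiX's internal format. The traversal only returns candidate primitives; a tracer would then run an exact ray/primitive test on them.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct AABB { float lo[3], hi[3]; };

// Hypothetical flat node layout: interior nodes store two child indices,
// leaves store a contiguous range [first, first + count) into a primitive index array.
struct BVHNode {
    AABB box;
    int left = -1, right = -1;  // children (interior nodes only)
    int first = 0, count = 0;   // primitive range (leaves only, count > 0)
};

// Slab test: does O + t*D pass through the box for some t in [tmin, tmax]?
// invD holds 1/D per axis; IEEE infinities handle axis-parallel rays.
bool hitAABB(const AABB& b, const float O[3], const float invD[3], float tmin, float tmax) {
    for (int a = 0; a < 3; ++a) {
        float t0 = (b.lo[a] - O[a]) * invD[a];
        float t1 = (b.hi[a] - O[a]) * invD[a];
        if (t0 > t1) std::swap(t0, t1);
        tmin = std::max(tmin, t0);
        tmax = std::min(tmax, t1);
        if (tmax < tmin) return false;  // slab intervals do not overlap
    }
    return true;
}

// Returns the primitives of every leaf whose box the ray enters; missed subtrees are skipped.
std::vector<int> candidatePrims(const std::vector<BVHNode>& nodes, const std::vector<int>& primIdx,
                                const float O[3], const float D[3], float tmax) {
    float invD[3];
    for (int a = 0; a < 3; ++a) invD[a] = 1.0f / D[a];
    std::vector<int> out, stack{0};  // node 0 is the root
    while (!stack.empty()) {
        const BVHNode& n = nodes[stack.back()];
        stack.pop_back();
        if (!hitAABB(n.box, O, invD, 0.0f, tmax)) continue;
        if (n.count > 0) {  // leaf: report its primitives
            for (int i = 0; i < n.count; ++i) out.push_back(primIdx[n.first + i]);
        } else {            // interior: visit both children
            stack.push_back(n.left);
            stack.push_back(n.right);
        }
    }
    return out;
}
```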
SLIDE 7

GPUS – THE PROCESSOR FOR RAY TRACING

  • Abundant parallelism, massive computational power
  • GPUs excel at shading
  • Opportunity for hybrid algorithms
SLIDE 8

RAY CASTING (APPEL, 1968)
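Ray casting shoots one primary ray per pixel from the eye through the image plane. The sketch below shows one way to compute that direction for a pinhole camera. `PinholeCamera`, its `U`/`V`/`W` basis convention, and `primaryRayDir` are hypothetical. The later path tracer slides call a similar helper, `cameraGetRayDir()`, without showing its body, and this is not claimed to be that function.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(float s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }
Vec3 normalize(Vec3 a) {
    float len = std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
    return {a.x / len, a.y / len, a.z / len};
}

// Hypothetical pinhole camera: W points from the eye to the image-plane center,
// U and V span half the image plane horizontally and vertically.
struct PinholeCamera { Vec3 eye, U, V, W; };

// Direction of the primary ray through the center of pixel (px, py) of a width x height image.
Vec3 primaryRayDir(const PinholeCamera& cam, int px, int py, int width, int height) {
    // Map the pixel center to normalized device coordinates in [-1, 1].
    float dx = 2.0f * (px + 0.5f) / width - 1.0f;
    float dy = 2.0f * (py + 0.5f) / height - 1.0f;
    return normalize(dx * cam.U + dy * cam.V + cam.W);
}
```

Each primary ray then starts at `cam.eye` and is handed to a closest-hit query.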

SLIDE 9

REAL TIME PATH TRACING

  • What would it take?

— 4 rays / sample
— 50 samples / pixel
— 2M pixels / frame
— 30 frames / second
— Total: 12B rays / second

[Figure: renderings at increasing sample counts (shading samples / AA samples): 1/1, 9/1, 18/2, 36/4, 72/8, 144/16]

  • GeForce GTX Titan:

— 350M rays / second
— Need 34X speedup
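Multiplying out the budget on this slide, using only the figures above:

```latex
\underbrace{4}_{\text{rays/sample}} \times \underbrace{50}_{\text{samples/pixel}}
\times \underbrace{2\times10^{6}}_{\text{pixels/frame}} \times \underbrace{30}_{\text{frames/s}}
= 1.2\times10^{10}\ \text{rays/s}

\frac{1.2\times10^{10}\ \text{rays/s}}{3.5\times10^{8}\ \text{rays/s}} \approx 34.3
```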

SLIDE 10

DESIGN GARAGE: ITERATIVE PATH TRACER

  • Closest hit programs do:

— Direct lighting (next event estimation with a shadow query ray)
— Compute next ray (sample the BSDF for reflected/refracted ray info)
— Return direct light and next-ray info to the ray generation program

  • Ray generation program iterates along the path
SLIDE 11

DESIGN GARAGE: ITERATIVE PATH TRACER

```cpp
RT_PROGRAM void closestHit()
{
    // Calculate BSDF sample for next path ray
    float3 ray_direction, ray_weight;
    sampleBSDF( wo, N, ray_direction, ray_weight );

    // Recurse
    float3 indirect_light = tracePathRay( P, ray_direction, ray_weight );

    // Perform direct lighting
    ...

    prd.result = indirect_light + direct_light;
}

RT_PROGRAM void rayGeneration()
{
    float3 ray_dir = cameraGetRayDir();
    float3 result  = tracePathRay( camera.pos, ray_dir, make_float3( 1.0f ) );
    output_buffer[ launch_index ] = result;
}
```

SLIDE 12

DESIGN GARAGE: ITERATIVE PATH TRACER

```cpp
RT_PROGRAM void closestHit()
{
    // Calculate BSDF sample for next path ray
    float3 ray_direction, ray_weight;
    sampleBSDF( wo, N, ray_direction, ray_weight );

    // Return sampled ray info and let ray_gen iterate
    prd.ray_dir    = ray_direction;
    prd.ray_origin = P;
    prd.ray_weight = ray_weight;

    // Perform direct lighting
    ...
    prd.direct = direct_light;
}

RT_PROGRAM void rayGeneration()
{
    PerRayData prd;
    prd.ray_dir    = cameraGetRayDir();
    prd.ray_origin = camera.position;

    float3 weight = make_float3( 1.0f );  // path throughput so far
    float3 result = make_float3( 0.0f );  // accumulated radiance

    for( int i = 0; i < MAX_DEPTH; ++i ) {
        traceRay( prd.ray_origin, prd.ray_dir, prd );
        result += prd.direct * weight;
        weight *= prd.ray_weight;
    }
    output_buffer[ launch_index ] = result;
}
```
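The loop gives the same answer as the recursion. The recursive closest-hit program computes L_k = direct_k + ray_weight_k · L_{k+1} at each bounce, assuming `tracePathRay` scales its result by the weight it is given. Unrolling that recurrence gives Σ_k (Π_{j<k} ray_weight_j) · direct_k, which is what the loop accumulates with its running `weight`. The host-side sketch below checks this with one scalar channel standing in for `float3` and made-up per-bounce values. `Bounce`, `shadeRecursive`, and `shadeIterative` are hypothetical names, not OptiX API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-bounce record; one color channel stands in for float3.
struct Bounce {
    float direct;      // direct light gathered at this hit (prd.direct)
    float ray_weight;  // BSDF sample weight for the continuation ray (prd.ray_weight)
};

// Recursive form: L_k = direct_k + ray_weight_k * L_{k+1}; an ended path contributes 0.
float shadeRecursive(const std::vector<Bounce>& path, std::size_t k = 0) {
    if (k == path.size()) return 0.0f;
    return path[k].direct + path[k].ray_weight * shadeRecursive(path, k + 1);
}

// Iterative form: add each bounce's direct light scaled by the throughput so far.
float shadeIterative(const std::vector<Bounce>& path) {
    float weight = 1.0f, result = 0.0f;
    for (const Bounce& b : path) {
        result += b.direct * weight;
        weight *= b.ray_weight;
    }
    return result;
}
```

One practical effect of the restructuring is that trace calls are no longer nested, so per-ray stack use does not grow with path depth.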

SLIDE 13

WHAT’S THE LATEST

SLIDE 14

REGISTERED DEVELOPER PROGRAM

  • Access latest OptiX version
  • Access private beta releases
  • Tighter communication with OptiX developers
  • 800 registered developers in 4 months
SLIDE 15

OPTIX 3.6

  • CUDA 6 support
  • Maxwell (SM 5.0) support

— Great performance

  • Callable Program IDs

— Use with Bindless Buffer IDs and Texture IDs
— Finishes our composable shading solution
— See Pixar, Danny Nahmias, NVIDIA Theater, Tue. 3:20

  • Optimized CPU ray tracing in OptiX Prime (Commercial)
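The idea behind Callable Program IDs, together with bindless buffer and texture IDs, is that data such as a material stores plain integer handles instead of having programs and resources bound to it. Shading can then be composed at run time by looking those handles up. The host-only C++ analogy below models this with a table of functions indexed by ID. `ShaderFn`, `ProgramTable`, and `Material` are hypothetical names, and this is not how OptiX implements the feature. In OptiX device code the handle type is `rtCallableProgramId<...>`, with IDs obtained on the host via `rtProgramGetId`; check the OptiX programming guide for exact usage.

```cpp
#include <functional>
#include <utility>
#include <vector>

// A shading "program": maps a cosine term to a scalar contribution.
using ShaderFn = std::function<float(float)>;

// Hypothetical registry: a program's index in this table plays the role of its ID.
struct ProgramTable {
    std::vector<ShaderFn> programs;
    int add(ShaderFn f) {
        programs.push_back(std::move(f));
        return static_cast<int>(programs.size()) - 1;
    }
    float call(int id, float cosTheta) const { return programs.at(id)(cosTheta); }
};

// A material holds only integer handles, so layers can be recombined without rebinding.
struct Material {
    std::vector<int> layer_ids;  // IDs of the programs to evaluate and sum
};

float shade(const ProgramTable& table, const Material& m, float cosTheta) {
    float sum = 0.0f;
    for (int id : m.layer_ids) sum += table.call(id, cosTheta);
    return sum;
}
```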
SLIDE 16

ACCELERATION BUILDER OPTIONS

[Chart: builders Bvh, Sbvh, MedianBvh, Lbvh, and Trbvh placed on axes of build speed (slow build to fast build) and render speed (slow render to fast render)]
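Linear BVH builders such as Lbvh generally trade some tree quality for build speed by sorting primitives along a space-filling curve. A common first step is to quantize each primitive centroid to a 10-bit-per-axis grid, interleave the bits into a 30-bit Morton code, and sort by that code. The sketch below shows only that step, as a generic illustration rather than OptiX's builder. `expandBits`, `morton3D`, and `mortonOrder` are names chosen here, and centroids are assumed to be pre-normalized to the unit cube.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Spread the low 10 bits of v so that bit i moves to bit 3i.
uint32_t expandBits(uint32_t v) {
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

// 30-bit Morton code for a point with coordinates in [0, 1].
uint32_t morton3D(float x, float y, float z) {
    auto quantize = [](float c) {
        return static_cast<uint32_t>(std::min(std::max(c * 1024.0f, 0.0f), 1023.0f));
    };
    return expandBits(quantize(x)) * 4 + expandBits(quantize(y)) * 2 + expandBits(quantize(z));
}

struct Point { float x, y, z; };

// Primitive indices sorted by the Morton code of their centroids. Neighbors in this
// order tend to be close in space, which lets a builder split the array into subtrees.
std::vector<int> mortonOrder(const std::vector<Point>& centroids) {
    std::vector<uint32_t> codes;
    for (const Point& p : centroids) codes.push_back(morton3D(p.x, p.y, p.z));
    std::vector<int> order(centroids.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return codes[a] < codes[b]; });
    return order;
}
```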

SLIDE 17

TRBVH MEMORY FOOTPRINT IMPROVED

  • Scratch space used to limit build size
  • Also, 2 GB max build size
  • For a 40 million triangle model

— Before: 4.5 GB
— Now: 2.8 GB

  • Builds Trbvh in chunks

— Chunk size parameter
— Virtually the same ray tracing performance

  • Both OptiX and OptiX Prime
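Per triangle, the before/after figures work out as follows. This takes 1 GB as 10^9 bytes; the slide does not break down what the totals include.

```latex
\frac{4.5\times10^{9}\ \text{bytes}}{40\times10^{6}\ \text{triangles}} = 112.5\ \text{bytes/triangle},
\qquad
\frac{2.8\times10^{9}\ \text{bytes}}{40\times10^{6}\ \text{triangles}} = 70\ \text{bytes/triangle}

\frac{4.5 - 2.8}{4.5} \approx 0.38
```

That is roughly 38% less memory for the same model.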
SLIDE 18

OPTIX PRIME

  • Specialized for ray tracing (no shading)
  • Replaces rtuTraversal (rtuTraversal is still available)
  • Improved performance

— Uses latest algorithms from NVIDIA Research
    • Ray tracing kernels [Aila and Laine 2009; Aila et al. 2012]
    • Treelet Reordering BVH (TRBVH) [Karras 2013]
— Can use CUDA buffers as input/output
— Support for asynchronous computation

  • Ships with OptiX 3.6
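OptiX Prime's model is a batch query: an array of rays goes in, an array of hit records comes out, and shading is left to the caller. The CPU sketch below mimics that shape and runs the batch asynchronously with `std::async`, so the caller can do other work before waiting on the result. The `Ray` layout (origin, tmin, direction, tmax) and `Hit` layout (t, triangle ID, barycentric u, v) mirror Prime's `RTP_BUFFER_FORMAT_RAY_ORIGIN_TMIN_DIRECTION_TMAX` and `RTP_BUFFER_FORMAT_HIT_T_TRIID_U_V` buffer formats. Everything else is a hypothetical stand-in, not the Prime API: `traceOne`, `traceBatch`, and the miss convention (`t = -1`, `triId = -1`). See the OptiX Prime API reference for how real queries request asynchronous execution.

```cpp
#include <cmath>
#include <future>
#include <utility>
#include <vector>

// Layouts modeled on Prime's ray and hit buffer formats (see lead-in).
struct Ray { float ox, oy, oz, tmin, dx, dy, dz, tmax; };
struct Hit { float t; int triId; float u, v; };
struct Tri { float vert[3][3]; };  // three vertices, xyz each

// Brute-force Moller-Trumbore closest hit for one ray; a miss gets t = -1, triId = -1.
Hit traceOne(const Ray& r, const std::vector<Tri>& tris) {
    Hit best{-1.0f, -1, 0.0f, 0.0f};
    float tBest = r.tmax;
    const float d[3] = {r.dx, r.dy, r.dz}, o[3] = {r.ox, r.oy, r.oz};
    for (int i = 0; i < (int)tris.size(); ++i) {
        const float* a = tris[i].vert[0];
        const float* b = tris[i].vert[1];
        const float* c = tris[i].vert[2];
        float e1[3], e2[3], s[3];
        for (int k = 0; k < 3; ++k) { e1[k] = b[k] - a[k]; e2[k] = c[k] - a[k]; s[k] = o[k] - a[k]; }
        float p[3] = {d[1] * e2[2] - d[2] * e2[1], d[2] * e2[0] - d[0] * e2[2], d[0] * e2[1] - d[1] * e2[0]};
        float det = e1[0] * p[0] + e1[1] * p[1] + e1[2] * p[2];
        if (std::fabs(det) < 1e-7f) continue;  // parallel to this triangle
        float inv = 1.0f / det;
        float u = (s[0] * p[0] + s[1] * p[1] + s[2] * p[2]) * inv;
        if (u < 0.0f || u > 1.0f) continue;
        float q[3] = {s[1] * e1[2] - s[2] * e1[1], s[2] * e1[0] - s[0] * e1[2], s[0] * e1[1] - s[1] * e1[0]};
        float v = (d[0] * q[0] + d[1] * q[1] + d[2] * q[2]) * inv;
        if (v < 0.0f || u + v > 1.0f) continue;
        float t = (e2[0] * q[0] + e2[1] * q[1] + e2[2] * q[2]) * inv;
        if (t < r.tmin || t > tBest) continue;  // outside [tmin, current closest]
        tBest = t;
        best = Hit{t, i, u, v};
    }
    return best;
}

// Trace the whole batch on another thread; the caller collects hits with .get().
// tris is captured by reference and must outlive the returned future.
std::future<std::vector<Hit>> traceBatch(std::vector<Ray> rays, const std::vector<Tri>& tris) {
    return std::async(std::launch::async, [rays = std::move(rays), &tris]() {
        std::vector<Hit> hits;
        hits.reserve(rays.size());
        for (const Ray& r : rays) hits.push_back(traceOne(r, tris));
        return hits;
    });
}
```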

SLIDE 19

OPTIX PRIME RAY TRACING PERFORMANCE

[Chart: primary-ray throughput in Mrays/s (axis 0 to 800) for the test scenes Arabic, Armadillo, Babylonian, Bar, Blade, Bubs, Buddha, City, Conference, Crown, Crytek-Sp, Dragon, Fairy, Hairball, Italian, Motor, Mustang, PowerPla-16, Sibenik, Soda, Vegetation, and Veyron-NG]

SLIDE 20

OPTIX PRIME RAY TRACING PERFORMANCE

[Chart: diffuse-ray throughput in Mrays/s (axis 0 to 350) for the test scenes Arabic, Armadillo, Babylonian, Bar, Blade, Bubs, Buddha, City, Conference, Crown, Crytek-Sp, Dragon, Fairy, Hairball, Italian, Motor, Mustang, PowerPla-16, Sibenik, Soda, Vegetation, and Veyron-NG]

SLIDE 21

OPTIX PRIME RAY TRACING PERFORMANCE

  • This is the best single-chip ray tracing performance ever reported.
  • By more than 2X.
SLIDE 22

WHAT’S COMING

SLIDE 23

MAJOR ARCHITECTURAL RENOVATION

  • LLVM-based OptiX compiler
  • Priorities

— Better GPU ray tracing performance
— More fluid interactive rendering
— Better multi-GPU scaling
— More efficient complex node graphs
— Additional input languages
— CPU backend

SLIDE 24

UNIFIED VIRTUAL MEMORY

  • Merges CPU and GPU memory spaces
  • Full read/write access from both processors
  • Eliminates GPU memory footprint barrier
  • Coming in Pascal architecture (2016)
SLIDE 25

NVIDIA VCA Under the Hood

  • GPUs: 8 x K6000-VCA
  • GPU Memory: 12 GB per GPU
  • CUDA Cores: 23,040
  • CPU Cores: 20
  • System Memory: 256 GB
  • Storage: 4 x 512 GB SSD
  • Network: 2 x 1GigE, 2 x 10GigE (SFP+), 1 x InfiniBand
  • Installed Software: CentOS Linux + Iray IQ + VCA Cluster Manager
  • U.S. MSRP: $50,000

SLIDE 26

OptiX on VCA Coming Next Year

[Diagram: an OptiX app connected to the VCA over Ethernet or Internet, exchanging incremental updates and an interactive image stream]

  • Custom OptiX applications
  • All processing on VCA
  • OptiX leveraging same infrastructure as Iray (using DiCE)
  • Minimal work within the OptiX app

SLIDE 27

MOVING PICTURE COMPANY

  • Damien Fagnou

— Global Head, VFX Operations

SLIDE 28

OPTIX AT SIGGRAPH

  • OptiX on Titan Z and VCA, NVIDIA booth, Tue. – Thu.
  • Iray on NVIDIA VCA, NVIDIA booth, Tue. – Thu.

  • Recent Advances in Light-Transport, room 109, Tue. 2:00
  • Pixar, Danny Nahmias, NVIDIA Theater, Tue. 3:20
  • Bunkspeed with Iray Interactive, NVIDIA Theater, Wed. 12:00
  • MPC, Damien Fagnou, NVIDIA Theater, Wed. 12:40