ADVANCES IN OPTIX. DAVID K. MCALLISTER, PH.D., OPTIX MANAGER



slide-1
SLIDE 1

DAVID K. MCALLISTER, PH.D. OPTIX MANAGER

ADVANCES IN OPTIX

slide-2
SLIDE 2
slide-3
SLIDE 3
slide-4
SLIDE 4

OPTIX EXECUTION MODEL

rtContextLaunch

Launch

Ray Generation Program

Traverse Shade

slide-5
SLIDE 5

SAMPLE DEVICE CODE

RT_PROGRAM void dome_camera()
{
  size_t2 screen = output_buffer.size();

  float2 d = make_float2(launch_index) / make_float2(screen) * make_float2(2.0f, 2.0f) - make_float2(1.0f, 1.0f);
  float3 angle = make_float3(d.x, d.y, sqrtf(1.0f - (d.x*d.x + d.y*d.y)));
  float3 ray_origin = eye;
  float3 ray_direction = normalize(angle.x*normalize(U) + angle.y*normalize(V) + angle.z*normalize(W));

  optix::Ray ray(ray_origin, ray_direction, radiance_ray_type, scene_epsilon);

  PerRayData_radiance prd;
  prd.importance = 1.f;
  prd.depth = 0;

  rtTrace(top_object, ray, prd);

  output_buffer[launch_index] = make_color(prd.result);
}
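
The pixel-to-direction mapping in dome_camera() can be checked on the host without OptiX. The following is a minimal sketch, assuming a hypothetical Vec3 type and a hypothetical helper dome_direction(); neither is part of the OptiX API. It also adds a guard, not present on the slide, for pixels outside the unit disk, where the square root on the slide would receive a negative argument.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical host-side vector type standing in for float3.
struct Vec3 { float x, y, z; };

static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static float length(Vec3 a) { return std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z); }
static Vec3 normalize(Vec3 a) { return scale(a, 1.0f / length(a)); }

// Maps pixel (px, py) to a unit ray direction using the same arithmetic as
// dome_camera(). Returns false for pixels outside the unit disk (this guard
// is an addition for illustration, not part of the slide code).
bool dome_direction(unsigned px, unsigned py, unsigned width, unsigned height,
                    Vec3 U, Vec3 V, Vec3 W, Vec3* dir)
{
    // Rescale the pixel index from [0, size) to [-1, 1).
    float dx = float(px) / float(width) * 2.0f - 1.0f;
    float dy = float(py) / float(height) * 2.0f - 1.0f;

    float r2 = dx * dx + dy * dy;
    if (r2 > 1.0f)
        return false;  // no point on the hemisphere projects to this pixel

    // Lift the 2D point onto the unit hemisphere.
    float dz = std::sqrt(1.0f - r2);

    // Express the hemisphere point in the camera basis (U, V, W).
    Vec3 d = add(add(scale(normalize(U), dx), scale(normalize(V), dy)),
                 scale(normalize(W), dz));
    *dir = normalize(d);
    return true;
}
```

With a 512x512 image, the center pixel (256, 256) maps to the normalized W axis, and the corner pixel (0, 0) is rejected by the guard.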

slide-6
SLIDE 6

OPTIX EXECUTION MODEL

[Diagram labels]

Stages: Launch, Traverse, Shade

API entry points: rtContextLaunch, rtTrace

Traversal: Node Graph Traversal, Acceleration Traversal

Program types: Ray Generation Program, Exception Program, Selector Visit Program, Intersection Program, Any Hit Program, Closest Hit Program, Miss Program, Callable Program

slide-7
SLIDE 7

OPTIX ENCAPSULATES THE ALGORITHM

OptiX is a to-the-algorithm API

Processor Algorithm Software

To-the-metal To-the-algorithm

slide-8
SLIDE 8

GOLDENROD

slide-9
SLIDE 9

MAJOR ARCHITECTURAL RENOVATION

  • LLVM-based OptiX compiler
  • Better GPU ray tracing performance
  • More fluid interactive rendering
  • Better multi-GPU scaling
  • More efficient complex node graphs
  • Additional input languages
  • CPU backend

slide-10
SLIDE 10

UNIFIED VIRTUAL MEMORY

  • Merges CPU and GPU memory spaces
  • Full read/write access from both processors
  • Eliminates GPU memory footprint barrier
  • Coming in Pascal architecture (2016)

slide-11
SLIDE 11

OPTIX 3.7

slide-12
SLIDE 12

OPTIX PRIME

  • Specialized for ray tracing
  • Latest algorithms from NVIDIA Research
    – Ray tracing kernels
    – Treelet Reordering BVH (TRBVH)
  • Support for asynchronous computation
  • CPU support
  • No programming model support for shading
  • No support for Quadro VCA
  • No support for dynamic materials
  • Triangles only
  • No ability to target different architectures

slide-13
SLIDE 13

INSTANCING IN PRIME

A model is a set of instances:

  • RTP_BUFFER_FORMAT_INSTANCE_MODEL
  • RTP_BUFFER_FORMAT_TRANSFORM_FLOAT4x3

New API call

rtpModelSetInstances

Hit result formats

  • RTP_BUFFER_FORMAT_HIT_T_TRIID_INSTID
  • RTP_BUFFER_FORMAT_HIT_T_TRIID_INSTID_U_V

[Diagram: Context, Model, and BufferDesc objects. A Model receives an "instances" BufferDesc and a "transforms" BufferDesc; the instances refer to other Models.]

slide-14
SLIDE 14

INSTANCING IN PRIME

std::vector<instInfo_t> instanceData;
std::vector<RTPmodel> instanceList;
std::vector<SimpleMatrix4x3> transformList;
createInstances(numInstances, models, instanceList, transformList, instanceData);

RTPbufferdesc instances, transforms;
rtpBufferDescCreate(context, RTP_BUFFER_FORMAT_INSTANCE_MODEL, RTP_BUFFER_TYPE_HOST, &instanceList[0], &instances);
rtpBufferDescSetRange(instances, 0, instanceList.size());
rtpBufferDescCreate(context, RTP_BUFFER_FORMAT_TRANSFORM_FLOAT4x3, RTP_BUFFER_TYPE_HOST, &transformList[0], &transforms);
rtpBufferDescSetRange(transforms, 0, transformList.size());

RTPmodel scene;
rtpModelCreate(context, &scene);
rtpModelSetInstances(scene, instances, transforms);
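
To illustrate what a per-instance transform does to a model's geometry, here is a minimal host-side sketch. It assumes a hypothetical Matrix4x3 type holding 12 floats as three row-major rows of four values, with the translation in the last column; that layout is an assumption for illustration and should be checked against the OptiX Prime documentation for RTP_BUFFER_FORMAT_TRANSFORM_FLOAT4x3. The helper names are also hypothetical.

```cpp
#include <cassert>

// Hypothetical affine transform: 3 rows x 4 columns, row-major.
// The layout is an assumption for illustration only.
struct Matrix4x3 { float m[12]; };

struct Point { float x, y, z; };

// Builds a pure translation in the assumed layout.
Matrix4x3 make_translation(float tx, float ty, float tz)
{
    return Matrix4x3{{1.0f, 0.0f, 0.0f, tx,
                      0.0f, 1.0f, 0.0f, ty,
                      0.0f, 0.0f, 1.0f, tz}};
}

// Applies the transform to a point (implicit w = 1), placing a vertex of the
// instanced model into the scene's coordinate system.
Point transform_point(const Matrix4x3& t, Point p)
{
    return Point{
        t.m[0] * p.x + t.m[1] * p.y + t.m[2]  * p.z + t.m[3],
        t.m[4] * p.x + t.m[5] * p.y + t.m[6]  * p.z + t.m[7],
        t.m[8] * p.x + t.m[9] * p.y + t.m[10] * p.z + t.m[11]};
}
```

One transform per instance lets many copies of the same triangle model appear at different positions without duplicating the triangle data.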

slide-15
SLIDE 15
slide-16
SLIDE 16

OPTIX PRIME IN MENTAL RAY 3.12

slide-17
SLIDE 17

OPTIX 3.8

slide-18
SLIDE 18

PROGRESSIVE API

  • Render all subframes in a single API call
  • Encapsulate even more of the algorithm

slide-19
SLIDE 19

STREAM BUFFERS

RTbuffer output_buffer, stream_buffer;

rtBufferCreate(context, RT_BUFFER_OUTPUT, &output_buffer);
rtBufferCreate(context, RT_BUFFER_PROGRESSIVE_STREAM, &stream_buffer);

rtBufferSetSize2D(output_buffer, width, height);
rtBufferSetSize2D(stream_buffer, width, height);

rtBufferSetFormat(output_buffer, RT_FORMAT_FLOAT4);
rtBufferSetFormat(stream_buffer, RT_FORMAT_UNSIGNED_BYTE4);

rtBufferBindProgressiveStream(stream_buffer, output_buffer);

slide-20
SLIDE 20

PROGRESSIVE API

rtContextLaunchProgressive2D(context, width, height, num_subframes);

while(!finished) {
  int ready;
  rtBufferGetProgressiveUpdateReady(stream_buffer, &ready, 0, 0);
  if(ready) {
    rtBufferMap(stream_buffer, &data);
    display(data);
    rtBufferUnmap(stream_buffer);
  }

  if(scene_changed()) {
    // Update OptiX state
    rtVariableSet(...);
  }

  rtContextLaunchProgressive2D(context, width, height, num_subframes);
}
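
The slides do not say how OptiX combines subframes internally. As a rough mental model only, the sketch below keeps a running mean of float samples and converts it to 8-bit values for display, mirroring the float output buffer and byte stream buffer formats shown earlier. ProgressiveAccumulator and its members are hypothetical names, not OptiX API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of progressive accumulation.
struct ProgressiveAccumulator {
    std::vector<float> mean;   // running mean of all subframes so far
    unsigned subframes = 0;    // number of subframes accumulated

    explicit ProgressiveAccumulator(std::size_t n) : mean(n, 0.0f) {}

    // Folds one subframe into the running mean (incremental average).
    void add_subframe(const std::vector<float>& sample)
    {
        ++subframes;
        for (std::size_t i = 0; i < mean.size(); ++i)
            mean[i] += (sample[i] - mean[i]) / float(subframes);
    }

    // Clamps to [0, 1] and quantizes to 8 bits for display.
    std::vector<std::uint8_t> to_bytes() const
    {
        std::vector<std::uint8_t> out(mean.size());
        for (std::size_t i = 0; i < mean.size(); ++i) {
            float c = std::min(1.0f, std::max(0.0f, mean[i]));
            out[i] = static_cast<std::uint8_t>(c * 255.0f + 0.5f);
        }
        return out;
    }

    // Restarts accumulation, e.g. after the scene changes.
    void reset()
    {
        std::fill(mean.begin(), mean.end(), 0.0f);
        subframes = 0;
    }
};
```

In this model, a scene change corresponds to reset(): previously accumulated samples no longer describe the current scene.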

slide-21
SLIDE 21

PROGRESSIVE API (DEVICE)

rtDeclareVariable(unsigned int, subframe_idx, rtSubframeIndex, );

unsigned int seed = rand_seed(launch_index, frame, subframe_idx);
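
The slide does not define rand_seed(). One plausible shape, shown here as a host-side sketch, hashes the pixel coordinates, frame number, and subframe index together so that every subframe draws a different random sequence per pixel. The mixing function and this particular rand_seed() signature are assumptions for illustration, not the code used in the talk.

```cpp
#include <cassert>
#include <cstdint>

// Integer mixing function built from xorshift and odd-multiply steps.
// Each step is invertible, so distinct inputs give distinct outputs.
static std::uint32_t mix(std::uint32_t h)
{
    h ^= h >> 16;
    h *= 0x7feb352dU;
    h ^= h >> 15;
    h *= 0x846ca68bU;
    h ^= h >> 16;
    return h;
}

// Hypothetical seed function: chains the hash over pixel x, pixel y,
// frame number, and subframe index.
std::uint32_t rand_seed(std::uint32_t x, std::uint32_t y,
                        std::uint32_t frame, std::uint32_t subframe)
{
    std::uint32_t h = mix(x);
    h = mix(h ^ y);
    h = mix(h ^ frame);
    return mix(h ^ subframe);
}
```

Because the subframe index enters the hash, two subframes of the same pixel and frame never share a seed, which is what keeps accumulated samples from repeating.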

slide-22
SLIDE 22

Quadro VCA Under the Hood

GPUs: 8 x M6000-VCA GPUs
GPU Memory: 12 GB per GPU
CUDA Cores: 23,040
CPU Cores: 20 Physical
System Memory: 256 GB
Storage: 4 x 512GB SSD
Network: 2 x 1GigE, 2 x 10GigE (SFP+), 1 x InfiniBand
Installed Software: Iray IQ + CentOS Linux + VCA Cluster Manager
U.S. MSRP: $50,000

slide-23
SLIDE 23

  • Custom OptiX Applications
  • All Processing on VCA
  • OptiX Leveraging Same Infrastructure as Iray (using DiCE)
  • Minimal Work within the OptiX App

[Diagram: OptiX App connected to the VCA over Ethernet or Internet; labels: Incremental Updates, Interactive Image Stream]

slide-24
SLIDE 24

CONNECTION API

RTremotedevice rdev;
rtRemoteDeviceCreate("url", "user", "password", &rdev);

// Query compatible VCA configurations
unsigned int num_configs;
rtRemoteDeviceGetAttribute(rdev, RT_REMOTEDEVICE_ATTRIBUTE_NUM_CONFIGURATIONS, sizeof(unsigned int), &num_configs);
int vca_config_index = chooseConfig(num_configs);

// Reserve nodes and wait until they are ready
rtRemoteDeviceReserve(rdev, vca_num_nodes, vca_config_index);
int ready;
do {
  rtRemoteDeviceGetAttribute(rdev, RT_REMOTEDEVICE_ATTRIBUTE_STATUS, sizeof(int), &ready);
  if(ready != RT_REMOTEDEVICE_STATUS_READY)
    sleep(10);
} while(ready != RT_REMOTEDEVICE_STATUS_READY);

// context is assumed to be an RTcontext* declared elsewhere (not shown on the slide)
rtContextCreate(context);
rtContextSetRemoteDevice(*context, rdev);

slide-25
SLIDE 25

JOHN STONE

slide-26
SLIDE 26

NIH BTRC for Macromolecular Modeling and Bioinformatics, Beckman Institute, U. Illinois at Urbana-Champaign
http://www.ks.uiuc.edu/

S5246—Innovations in OptiX

Guest Presentation: Integrating OptiX in VMD John E. Stone

Theoretical and Computational Biophysics Group Beckman Institute for Advanced Science and Technology University of Illinois at Urbana-Champaign http://www.ks.uiuc.edu/

S5246, GPU Technology Conference 15:00-15:50, Room LL21E, San Jose Convention Center, San Jose, CA, Wednesday March 18, 2015

slide-27
SLIDE 27


VMD – “Visual Molecular Dynamics”

Goal: A Computational Microscope
Study the molecular machines in living cells

[Images: Ribosome (target for antibiotics); Poliovirus]

slide-28
SLIDE 28


Lighting Comparison

  • Two lights, no shadows
  • Two lights, hard shadows, 1 shadow ray per light
  • Ambient occlusion + two lights, 144 AO rays/hit

slide-29
SLIDE 29


VMD Chromatophore Rendering on Blue Waters

  • New representations, GPU-accelerated molecular surface calculations, memory-efficient algorithms for huge complexes
  • VMD GPU-accelerated ray tracing engine w/ CUDA+OptiX+MPI+Pthreads
  • Each revision: 7,500 frames render on ~96 Cray XK7 nodes in 290 node-hours, 45GB of images prior to editing
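
The per-revision figures above imply some useful derived numbers. The sketch below works them out, assuming the node count is exactly 96 (the slide says approximately 96) and treating 1 GB as 1000 MB; RenderBudget and its helper functions are hypothetical names used only for this arithmetic.

```cpp
#include <cassert>
#include <cmath>

// Figures quoted on the slide for one movie revision.
struct RenderBudget {
    double frames;      // 7,500 frames
    double nodes;       // ~96 Cray XK7 nodes (assumed exactly 96 here)
    double node_hours;  // 290 node-hours
    double image_gb;    // 45 GB of images
};

// Average compute cost of one frame, in node-seconds.
double node_seconds_per_frame(const RenderBudget& b)
{
    return b.node_hours * 3600.0 / b.frames;
}

// Wall-clock time if all nodes run concurrently for the whole job.
double wall_clock_hours(const RenderBudget& b)
{
    return b.node_hours / b.nodes;
}

// Average image size per frame, in MB (1 GB = 1000 MB).
double megabytes_per_frame(const RenderBudget& b)
{
    return b.image_gb * 1000.0 / b.frames;
}
```

Under these assumptions each frame costs about 139 node-seconds, a full revision takes roughly 3 hours of wall-clock time, and each frame averages about 6 MB of image data.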

  • GPU-Accelerated Molecular Visualization on Petascale Supercomputing Platforms. J. E. Stone, K. L. Vandivort, and K. Schulten. UltraVis’13, 2013.
  • Visualization of Energy Conversion Processes in a Light Harvesting Organelle at Atomic Detail. M. Sener, et al. SC'14 Visualization and Data Analytics Showcase, 2014. Winner of the SC'14 Visualization and Data Analytics Showcase.

slide-30
SLIDE 30


VMD 1.9.2 Interactive GPU Ray Tracing

  • Ray tracing heavily used for VMD publication-quality images/movies
  • High quality lighting, shadows, transparency, depth-of-field focal blur, etc.
  • VMD now provides –interactive– ray tracing on laptops, desktops, and remote visual supercomputers

slide-31
SLIDE 31


VMD TachyonL-OptiX Interactive RT w/ Progressive Rendering

[Diagram labels]

Scene Graph

RT Rendering Pass

Seed RNGs

TrBvh RT Acceleration Structure

Accumulate RT samples

Normalize+copy accum. buf

Compute ave. FPS, adjust RT samples per pass

Output Framebuffer

Accum. Buf
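
The diagram's "Compute ave. FPS, adjust RT samples per pass" step amounts to a feedback loop. The slides do not give VMD's actual policy, so the following is only a sketch under stated assumptions: a hypothetical SampleRateController that smooths the measured frame rate and nudges the sample count by one per pass, with a 10% dead band around the target.

```cpp
#include <cassert>

// Hypothetical controller; thresholds and smoothing factor are illustrative.
struct SampleRateController {
    double target_fps;
    int min_samples;
    int max_samples;
    int samples;           // current RT samples per pass
    double avg_fps = 0.0;  // smoothed frame rate (0 means no data yet)

    SampleRateController(double target, int lo, int hi, int start)
        : target_fps(target), min_samples(lo), max_samples(hi), samples(start) {}

    // Call once per rendering pass with the measured pass time in seconds.
    // Returns the number of samples to use for the next pass.
    int update(double frame_seconds)
    {
        double fps = 1.0 / frame_seconds;
        // Exponential moving average damps single-frame spikes.
        avg_fps = (avg_fps == 0.0) ? fps : 0.75 * avg_fps + 0.25 * fps;

        if (avg_fps < 0.9 * target_fps && samples > min_samples)
            --samples;  // too slow: trade quality for interactivity
        else if (avg_fps > 1.1 * target_fps && samples < max_samples)
            ++samples;  // headroom available: converge faster
        return samples;
    }
};
```

Changing the sample count by only one step per pass keeps the controller from oscillating when the frame time is noisy.
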
slide-32
SLIDE 32


VMD TachyonL-OptiX: Multi-GPU on a Desktop or Single Node

Scene Data Replicated, Image Space Parallel Decomposition onto GPUs

[Diagram labels]

VMD Scene

GPU 0

TrBvh RT Acceleration Structure

GPU 3 GPU 2 GPU 1
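
Image space parallel decomposition means each GPU holds the full scene and renders a different part of the image. OptiX handles this distribution internally and the slides do not describe its tiling scheme, so the sketch below only illustrates the idea with the simplest possible split: contiguous bands of rows, one band per GPU. RowRange and split_rows() are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Half-open range of image rows [begin, end) assigned to one GPU.
struct RowRange { unsigned begin, end; };

// Splits `height` rows into `num_gpus` contiguous bands whose sizes differ
// by at most one row. Requires num_gpus > 0.
std::vector<RowRange> split_rows(unsigned height, unsigned num_gpus)
{
    std::vector<RowRange> ranges;
    unsigned base = height / num_gpus;   // rows every GPU gets
    unsigned extra = height % num_gpus;  // first `extra` GPUs get one more
    unsigned row = 0;
    for (unsigned g = 0; g < num_gpus; ++g) {
        unsigned count = base + (g < extra ? 1u : 0u);
        ranges.push_back(RowRange{row, row + count});
        row += count;
    }
    return ranges;
}
```

For the four GPUs in the diagram and a 1080-row image, each GPU would receive 270 rows. A real renderer would likely prefer smaller interleaved tiles, since equal-sized bands do not guarantee equal rendering cost.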

slide-33
SLIDE 33


VMD TachyonL-OptiX Interactive RT w/ OptiX 3.8 Progressive API

[Diagram labels]

Scene Graph

RT Rendering Pass

Seed RNGs

TrBvh RT Acceleration Structure

Accumulate RT samples

Normalize+copy accum. buf

Compute ave. FPS, adjust RT samples per pass

Output Framebuffer

Accum. Buf
slide-34
SLIDE 34


VMD TachyonL-OptiX Interactive RT w/ OptiX 3.8 Progressive API

[Diagram labels]

Scene Graph

RT Progressive Subframe

rtContextLaunchProgressive2D()

TrBvh RT Acceleration Structure

rtBufferGetProgressiveUpdateReady()

Draw Output Framebuffer

Check for User Interface Inputs, Update OptiX Variables

rtContextStopProgressive()

slide-35
SLIDE 35


VMD TachyonL-OptiX: Multi-GPU on NVIDIA VCA Cluster

Scene Data Replicated, Image Space / Sample Space Parallel Decomposition onto GPUs

[Diagram labels]

VMD Scene

VCA 0: 8 K6000 GPUs

VCA N: 8 K6000 GPUs

slide-36
SLIDE 36


Future Work

  • Improved performance / quality trade-offs in interactive RT stochastic sampling strategies
  • Optimize GPU scene DMA and BVH regen speed for time-varying geometry, e.g. MD trajectories
  • Continue tuning of GPU-specific RT intersection routines, memory layout
  • GPU-accelerated movie encoder back-end
  • Interactive RT combined with remote viz on HPC systems, much larger data sizes

slide-37
SLIDE 37


Acknowledgements

  • Theoretical and Computational Biophysics Group, University of Illinois at Urbana-Champaign
  • NVIDIA CUDA Center of Excellence, University of Illinois at Urbana-Champaign
  • NVIDIA CUDA team
  • NVIDIA OptiX team
  • NCSA Blue Waters Team
  • Funding:
    – DOE INCITE, ORNL Titan: DE-AC05-00OR22725
    – NSF Blue Waters: NSF OCI 07-25070, PRAC “The Computational Microscope”, ACI-1238993, ACI-1440026
    – NIH support: 9P41GM104601, 5R01GM098243-02

slide-38
SLIDE 38

slide-39
SLIDE 39

REGISTERED DEVELOPER PROGRAM

  • Access latest OptiX version
  • Access private beta releases
  • Tighter communication with OptiX developers

https://developer.nvidia.com/optix

slide-40
SLIDE 40

MORE OPTIX TALKS

Session | Title | Day | Start | End | Room | Speaker
S5659 | Accelerating Mountain Bike Development with Optimized Design Visualization | Tuesday | 13:30 | 13:55 | LL21A | Geoff Casey
S5188 | FurryBall RT: New OptiX Core and 30x Speed Up | Tuesday | 15:00 | 15:25 | LL21D | Jan Tománek
S5643 | Advanced Rendering Solutions from NVIDIA | Tuesday | 15:30 | 16:20 | LL21E | Phillip Miller
S5622 | Dekko: A Framework for Real-Time Preview for VFX | Wednesday | 9:30 | 9:55 | LL21D | Damien Fagnou
S5644 | Flexible Cluster Rendering with NVIDIA VCA | Wednesday | 10:00 | 10:50 | LL21E | Phillip Miller
S5541 | CATIA Live Rendering Iray and NVIDIA VCA | Wednesday | 10:00 | 10:50 | LL21A | Pierre Maheut
S5409 | Custom Iray Applications and MDL for Consistent Visual Appearance | Wednesday | 14:00 | 14:50 | LL21E | Dave Hutchinson
S5246 | Innovations in OptiX | Wednesday | 15:00 | 15:50 | LL21E | David McAllister
S5628 | Simulation-Based CGI for Automotive Applications | Wednesday | 16:00 | 16:25 | LL21A | Benoit Deschamps
S5386 | VMD: Publication-Quality Ray Tracing of Molecular Graphics with OptiX | Thursday | 9:00 | 9:25 | LL21E | John Stone
S5416 | Accelerad: Daylight Simulation for Architectural Spaces Using GPU Ray Tracing | Thursday | 14:00 | 14:25 | LL21E | Nathaniel Jones
S5210 | GPU-Accelerated Spectral Caustic Rendering of Homogeneous Caustic Objects | Thursday | 14:30 | 14:55 | LL21E | Budianto Tandianus