ADVANCES IN OPTIX
DAVID K. MCALLISTER, PH.D., OPTIX MANAGER
OPTIX EXECUTION MODEL
[Diagram: rtContextLaunch starts the Launch stage, which runs the Ray Generation Program; traced rays then pass through Traverse and Shade.]
SAMPLE DEVICE CODE
RT_PROGRAM void dome_camera()
{
    size_t2 screen = output_buffer.size();

    // Map the launch index to [-1, 1] x [-1, 1]
    float2 d = make_float2(launch_index) / make_float2(screen) * make_float2(2.0f, 2.0f) - make_float2(1.0f, 1.0f);
    // Lift the 2D point onto the unit hemisphere
    float3 angle = make_float3(d.x, d.y, sqrtf(1.0f - (d.x*d.x + d.y*d.y)));
    float3 ray_origin = eye;
    float3 ray_direction = normalize(angle.x*normalize(U) + angle.y*normalize(V) + angle.z*normalize(W));

    optix::Ray ray(ray_origin, ray_direction, radiance_ray_type, scene_epsilon);

    PerRayData_radiance prd;
    prd.importance = 1.f;
    prd.depth = 0;

    rtTrace(top_object, ray, prd);

    output_buffer[launch_index] = make_color(prd.result);
}
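The pixel-to-direction mapping in dome_camera() can be checked on the CPU. The following is a minimal sketch in plain C++, not OptiX device code: Vec3, normalize, and dome_direction are hypothetical names introduced here, and the explicit rejection of pixels outside the unit disk is an added assumption (the slide code would produce NaN there).

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

// Maps a pixel index to a dome-camera ray direction using the same
// arithmetic as the slide's dome_camera(). Returns false when the pixel
// lies outside the unit disk, where sqrt(1 - r^2) is undefined.
bool dome_direction(unsigned px, unsigned py, unsigned width, unsigned height,
                    Vec3 U, Vec3 V, Vec3 W, Vec3* dir) {
    // Map the pixel index to [-1, 1] x [-1, 1].
    float dx = float(px) / float(width) * 2.0f - 1.0f;
    float dy = float(py) / float(height) * 2.0f - 1.0f;
    float r2 = dx * dx + dy * dy;
    if (r2 > 1.0f) return false;

    // Lift the point onto the unit hemisphere.
    float dz = std::sqrt(1.0f - r2);

    // Express the direction in the camera basis (U, V, W).
    Vec3 u = normalize(U), v = normalize(V), w = normalize(W);
    *dir = normalize({dx * u.x + dy * v.x + dz * w.x,
                      dx * u.y + dy * v.y + dz * w.y,
                      dx * u.z + dy * v.z + dz * w.z});
    return true;
}
```

With an axis-aligned basis, the center pixel looks straight along W, and corner pixels fall outside the dome and are rejected.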
OPTIX EXECUTION MODEL
[Diagram: the full pipeline. rtContextLaunch enters Launch; rtTrace enters Traverse, followed by Shade.]
Programs and traversal steps shown: Ray Generation Program, Exception Program, Selector Visit Program, Node Graph Traversal, Acceleration Traversal, Intersection Program, Any Hit Program, Closest Hit Program, Miss Program, Callable Program
OPTIX ENCAPSULATES THE ALGORITHM
OptiX is a to-the-algorithm API
Processor Algorithm Software
To-the-metal To-the-algorithm
GOLDENROD
MAJOR ARCHITECTURAL RENOVATION
- LLVM-based OptiX compiler
- Better GPU ray tracing performance
- More fluid interactive rendering
- Better multi-GPU scaling
- More efficient complex node graphs
- Additional input languages
- CPU backend
UNIFIED VIRTUAL MEMORY
- Merges CPU and GPU memory spaces
- Full read/write access from both processors
- Eliminates GPU memory footprint barrier
- Coming in Pascal architecture (2016)
OPTIX 3.7
OPTIX PRIME
- Specialized for ray tracing
- Latest algorithms from NVIDIA Research
  - Ray tracing kernels
  - Treelet Reordering BVH (TRBVH)
- Support for asynchronous computation
- CPU support
- No programming model support for shading
- No support for Quadro VCA
- No support for dynamic materials
- Triangles only
- No ability to target different architectures
INSTANCING IN PRIME
A model is a set of instances:
  RTP_BUFFER_FORMAT_INSTANCE_MODEL
  RTP_BUFFER_FORMAT_TRANSFORM_FLOAT4x3
New API call:
  rtpModelSetInstances
Hit result formats:
  RTP_BUFFER_FORMAT_HIT_T_TRIID_INSTID
  RTP_BUFFER_FORMAT_HIT_T_TRIID_INSTID_U_V
[Diagram: a Context; a Model with two BufferDescs (instances, transforms); the instanced Models, each with its own BufferDesc]
INSTANCING IN PRIME
std::vector<instInfo_t> instanceData;
std::vector<RTPmodel> instanceList;
std::vector<SimpleMatrix4x3> transformList;
createInstances(numInstances, models, instanceList, transformList, instanceData);

RTPbufferdesc instances, transforms;
rtpBufferDescCreate(context, RTP_BUFFER_FORMAT_INSTANCE_MODEL, RTP_BUFFER_TYPE_HOST, &instanceList[0], &instances);
rtpBufferDescSetRange(instances, 0, instanceList.size());
rtpBufferDescCreate(context, RTP_BUFFER_FORMAT_TRANSFORM_FLOAT4x3, RTP_BUFFER_TYPE_HOST, &transformList[0], &transforms);
rtpBufferDescSetRange(transforms, 0, transformList.size());

RTPmodel scene;
rtpModelCreate(context, &scene);
rtpModelSetInstances(scene, instances, transforms);
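The two parallel buffers above pair instance i with transform i. The sketch below illustrates that pairing in plain C++ without OptiX Prime. Xform4x3, InstanceList, transform_point, and make_translation are hypothetical names, and the layout (3 rows of 4 floats, row-major, translation in the fourth column) is an assumption that should be checked against the Prime documentation.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct Point3 { float x, y, z; };

// Assumed layout: 3 rows of 4 floats, row-major; column 4 is translation.
using Xform4x3 = std::array<float, 12>;

// Applies the affine transform to a point (implicit w = 1).
Point3 transform_point(const Xform4x3& m, Point3 p) {
    return {m[0] * p.x + m[1] * p.y + m[2]  * p.z + m[3],
            m[4] * p.x + m[5] * p.y + m[6]  * p.z + m[7],
            m[8] * p.x + m[9] * p.y + m[10] * p.z + m[11]};
}

Xform4x3 make_translation(float tx, float ty, float tz) {
    return {1.0f, 0.0f, 0.0f, tx,
            0.0f, 1.0f, 0.0f, ty,
            0.0f, 0.0f, 1.0f, tz};
}

// Instance i pairs model_ids[i] with transforms[i], mirroring the
// instance buffer and transform buffer passed to the API on the slide.
struct InstanceList {
    std::vector<int> model_ids;
    std::vector<Xform4x3> transforms;

    void add(int model_id, const Xform4x3& xf) {
        model_ids.push_back(model_id);
        transforms.push_back(xf);
    }
    std::size_t size() const { return model_ids.size(); }
};
```

Keeping the two arrays the same length is the invariant that matters: both buffer ranges on the slide are set from lists of equal size.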
OPTIX PRIME IN MENTAL RAY 3.12
OPTIX 3.8
PROGRESSIVE API
- Render all subframes in a single API call
- Encapsulate even more of the algorithm
STREAM BUFFERS
RTbuffer output_buffer, stream_buffer;
rtBufferCreate(context, RT_BUFFER_OUTPUT, &output_buffer);
rtBufferCreate(context, RT_BUFFER_PROGRESSIVE_STREAM, &stream_buffer);
rtBufferSetSize2D(output_buffer, width, height);
rtBufferSetSize2D(stream_buffer, width, height);
rtBufferSetFormat(output_buffer, RT_FORMAT_FLOAT4);
rtBufferSetFormat(stream_buffer, RT_FORMAT_UNSIGNED_BYTE4);
rtBufferBindProgressiveStream(stream_buffer, output_buffer);
PROGRESSIVE API
rtContextLaunchProgressive2D(context, width, height, num_subframes);
while(!finished) {
    int ready;
    rtBufferGetProgressiveUpdateReady(stream_buffer, &ready, 0, 0);
    if(ready) {
        rtBufferMap(stream_buffer, &data);
        display(data);
        rtBufferUnmap(stream_buffer);
    }
    if(scene_changed()) {
        // Update OptiX state
        rtVariableSet(...);
    }
    rtContextLaunchProgressive2D(context, width, height, num_subframes);
}
PROGRESSIVE API (DEVICE)
rtDeclareVariable(unsigned int, subframe_idx, rtSubframeIndex, );
unsigned int seed = rand_seed(launch_index, frame, subframe_idx);
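The slide does not show rand_seed(). The sketch below is one possible way to derive a distinct seed per pixel, frame, and subframe, so that each subframe draws different samples. make_seed is a hypothetical stand-in, not the presenter's function; fmix32 is the 32-bit finalizer from MurmurHash3.

```cpp
#include <cstdint>

// 32-bit finalizer from MurmurHash3: a bijective bit mixer.
inline std::uint32_t fmix32(std::uint32_t h) {
    h ^= h >> 16; h *= 0x85ebca6bu;
    h ^= h >> 13; h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

// Hypothetical stand-in for the slide's rand_seed(): chains the mixer
// over pixel coordinates, frame number, and subframe index.
inline std::uint32_t make_seed(std::uint32_t px, std::uint32_t py,
                               std::uint32_t frame, std::uint32_t subframe) {
    std::uint32_t h = fmix32(px);
    h = fmix32(h ^ py);
    h = fmix32(h ^ frame);
    h = fmix32(h ^ subframe);
    return h;
}
```

Because each mixing step is a bijection, changing only the subframe index (or only the frame) always changes the resulting seed.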
Quadro VCA Under the Hood
GPUs: 8 x M6000-VCA GPUs
GPU Memory: 12 GB per GPU
CUDA Cores: 23,040
CPU Cores: 20 Physical
System Memory: 256 GB
Storage: 4 x 512GB SSD
Network: 2 x 1GigE, 2 x 10GigE (SFP+), 1 x InfiniBand
Installed Software: Iray IQ + CentOS Linux + VCA Cluster Manager
U.S. MSRP: $50,000
[Diagram: the OptiX App exchanges Incremental Updates and an Interactive Image Stream with the VCA over Ethernet or Internet]
- Custom OptiX Applications
- All Processing on VCA
- OptiX Leveraging Same Infrastructure as Iray (using DiCE)
- Minimal Work within the OptiX App
CONNECTION API
RTremotedevice rdev;
rtRemoteDeviceCreate("url", "user", "password", &rdev);

// Query the available configurations and pick one
unsigned int num_configs;
rtRemoteDeviceGetAttribute(rdev, RT_REMOTEDEVICE_ATTRIBUTE_NUM_CONFIGURATIONS, sizeof(unsigned int), &num_configs);
int vca_config_index = chooseConfig(num_configs);

// Reserve nodes, then poll until the device reports ready
rtRemoteDeviceReserve(rdev, vca_num_nodes, vca_config_index);
int ready;
do {
    rtRemoteDeviceGetAttribute(rdev, RT_REMOTEDEVICE_ATTRIBUTE_STATUS, sizeof(int), &ready);
    if(ready != RT_REMOTEDEVICE_STATUS_READY)
        sleep(10);
} while(ready != RT_REMOTEDEVICE_STATUS_READY);

// Create the context and attach it to the remote device
RTcontext context;
rtContextCreate(&context);
rtContextSetRemoteDevice(context, rdev);
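The reservation loop above polls without an upper bound. A bounded variant is sketched below in plain C++; wait_until_ready is a hypothetical helper, and the is_ready callback stands in for the status query (for example, a wrapper around the rtRemoteDeviceGetAttribute status check), which is an assumption rather than part of the slide.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Polls `is_ready` until it returns true or `max_polls` attempts have
// been made, sleeping `interval` between attempts. Returns whether the
// resource became ready within the allowed attempts.
bool wait_until_ready(const std::function<bool()>& is_ready,
                      int max_polls,
                      std::chrono::milliseconds interval) {
    for (int i = 0; i < max_polls; ++i) {
        if (is_ready()) return true;
        // Skip the sleep after the final failed attempt.
        if (i + 1 < max_polls) std::this_thread::sleep_for(interval);
    }
    return false;
}
```

Bounding the attempts lets an application report a failed reservation instead of hanging if the cluster never becomes available.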
JOHN STONE
NIH BTRC for Macromolecular Modeling and Bioinformatics http://www.ks.uiuc.edu/ Beckman Institute, U. Illinois at Urbana-Champaign
S5246—Innovations in OptiX
Guest Presentation: Integrating OptiX in VMD John E. Stone
Theoretical and Computational Biophysics Group Beckman Institute for Advanced Science and Technology University of Illinois at Urbana-Champaign http://www.ks.uiuc.edu/
S5246, GPU Technology Conference 15:00-15:50, Room LL21E, San Jose Convention Center, San Jose, CA, Wednesday March 18, 2015
VMD – “Visual Molecular Dynamics”
Goal: A Computational Microscope
Study the molecular machines in living cells
[Images: Ribosome (target for antibiotics); Poliovirus]
Lighting Comparison
- Two lights, no shadows
- Two lights, hard shadows, 1 shadow ray per light
- Ambient occlusion + two lights, 144 AO rays/hit
VMD Chromatophore Rendering on Blue Waters
- New representations, GPU-accelerated molecular surface calculations, memory-efficient algorithms for huge complexes
- VMD GPU-accelerated ray tracing engine w/ CUDA+OptiX+MPI+Pthreads
- Each revision: 7,500 frames rendered on ~96 Cray XK7 nodes in 290 node-hours, 45GB of images prior to editing
GPU-Accelerated Molecular Visualization on Petascale Supercomputing Platforms. J. E. Stone, K. L. Vandivort, and K. Schulten. UltraVis'13, 2013.
Visualization of Energy Conversion Processes in a Light Harvesting Organelle at Atomic Detail. M. Sener, et al. SC'14 Visualization and Data Analytics Showcase, 2014.
***Winner of the SC'14 Visualization and Data Analytics Showcase
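For scale, the per-revision figures quoted on this slide work out roughly as follows. This is back-of-the-envelope arithmetic from the slide's numbers only; the wall-clock estimate assumes all ~96 nodes are busy for the whole job, which the slide does not state.

```latex
\[
\frac{290\ \text{node-hours}}{7500\ \text{frames}}
  \approx 0.0387\ \text{node-hours per frame}
  \approx 139\ \text{node-seconds per frame}
\]
\[
\frac{290\ \text{node-hours}}{96\ \text{nodes}} \approx 3.0\ \text{hours of wall-clock time}
\]
\[
\frac{45\ \text{GB}}{7500\ \text{frames}} = 6\ \text{MB per frame}
\]
```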
VMD 1.9.2 Interactive GPU Ray Tracing
- Ray tracing heavily used for VMD publication-quality images/movies
- High quality lighting, shadows, transparency, depth-of-field focal blur, etc.
- VMD now provides interactive ray tracing on laptops, desktops, and remote visual supercomputers
VMD TachyonL-OptiX Interactive RT w/ Progressive Rendering
[Diagram components:]
- Scene Graph
- RT Rendering Pass
- Seed RNGs
- TrBvh RT Acceleration Structure
- Accumulate RT samples
- Normalize+copy accum. buf
- Compute ave. FPS, adjust RT samples per pass
- Output Framebuffer
- Accum. Buf
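The accumulate, normalize, and adjust steps named in the diagram can be sketched on the CPU as below. This is an illustration only, not VMD's implementation: Accumulator and adjust_samples_per_pass are hypothetical names, and the step-by-one adjustment toward a target frame rate is an assumed heuristic.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Accumulator {
    std::vector<float> sum;   // running per-pixel sum of samples
    unsigned samples = 0;     // samples accumulated so far per pixel

    explicit Accumulator(std::size_t pixels) : sum(pixels, 0.0f) {}

    // Adds one rendering pass; pass_sum[i] is the sum of n new samples.
    void add_pass(const std::vector<float>& pass_sum, unsigned n) {
        for (std::size_t i = 0; i < sum.size(); ++i) sum[i] += pass_sum[i];
        samples += n;
    }

    // "Normalize+copy": divides by the sample count into a display buffer.
    std::vector<float> resolve() const {
        std::vector<float> out(sum.size(), 0.0f);
        if (samples == 0) return out;
        for (std::size_t i = 0; i < sum.size(); ++i)
            out[i] = sum[i] / float(samples);
        return out;
    }

    // Called when the scene or camera changes, so stale samples are dropped.
    void reset() {
        std::fill(sum.begin(), sum.end(), 0.0f);
        samples = 0;
    }
};

// Assumed heuristic: nudge samples per pass toward a target frame rate.
unsigned adjust_samples_per_pass(unsigned current, double avg_fps,
                                 double target_fps, unsigned max_samples) {
    if (avg_fps < target_fps && current > 1) return current - 1;
    if (avg_fps > target_fps && current < max_samples) return current + 1;
    return current;
}
```

Storing sums rather than averages keeps each pass a simple addition; the division happens once per displayed frame.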
VMD TachyonL-OptiX: Multi-GPU on a Desktop or Single Node
Scene Data Replicated, Image Space Parallel Decomposition onto GPUs
[Diagram components:]
- VMD Scene
- TrBvh RT Acceleration Structure
- GPU 0, GPU 1, GPU 2, GPU 3
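Image-space decomposition means each GPU renders a different part of the same frame from its own copy of the scene. The slide does not say how the image is partitioned, and OptiX distributes work across devices internally, so the contiguous row bands below are purely an illustrative assumption; RowRange and split_rows are hypothetical names.

```cpp
#include <vector>

struct RowRange { unsigned begin, end; };  // half-open interval [begin, end)

// Splits `height` image rows into `num_gpus` contiguous bands whose
// sizes differ by at most one row.
std::vector<RowRange> split_rows(unsigned height, unsigned num_gpus) {
    std::vector<RowRange> bands;
    if (num_gpus == 0) return bands;
    unsigned base = height / num_gpus;   // rows every GPU gets
    unsigned extra = height % num_gpus;  // first `extra` GPUs get one more
    unsigned row = 0;
    for (unsigned g = 0; g < num_gpus; ++g) {
        unsigned count = base + (g < extra ? 1u : 0u);
        bands.push_back({row, row + count});
        row += count;
    }
    return bands;
}
```

For 10 rows on 4 GPUs this yields bands of 3, 3, 2, and 2 rows. Real schedulers often use smaller interleaved tiles instead, since ray tracing cost varies across the image.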
VMD TachyonL-OptiX Interactive RT w/ OptiX 3.8 Progressive API
[Diagram components:]
- Scene Graph
- RT Rendering Pass
- Seed RNGs
- TrBvh RT Acceleration Structure
- Accumulate RT samples
- Normalize+copy accum. buf
- Compute ave. FPS, adjust RT samples per pass
- Output Framebuffer
- Accum. Buf
VMD TachyonL-OptiX Interactive RT w/ OptiX 3.8 Progressive API
[Diagram components:]
- Scene Graph
- RT Progressive Subframe
- rtContextLaunchProgressive2D()
- TrBvh RT Acceleration Structure
- rtBufferGetProgressiveUpdateReady()
- Draw Output Framebuffer
- Check for User Interface Inputs, Update OptiX Variables
- rtContextStopProgressive()
VMD TachyonL-OptiX: Multi-GPU on NVIDIA VCA Cluster
Scene Data Replicated, Image Space / Sample Space Parallel Decomposition onto GPUs
[Diagram components:]
- VMD Scene
- VCA 0: 8 K6000 GPUs
- VCA N: 8 K6000 GPUs
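In sample-space decomposition, each node renders the whole image with its own set of samples, and the per-node images are then merged. A minimal sketch of that merge is shown below, assuming each node reports a per-pixel mean and its sample count and that all images have the same size; NodeResult and combine_sample_space are hypothetical names, and the slide does not describe how the VCA performs this step.

```cpp
#include <cstddef>
#include <vector>

struct NodeResult {
    std::vector<float> mean;  // per-pixel mean over this node's samples
    unsigned samples;         // number of samples per pixel on this node
};

// Merges full-resolution images from several nodes into one image,
// weighting each node's mean by its sample count.
std::vector<float> combine_sample_space(const std::vector<NodeResult>& nodes) {
    std::vector<float> out;
    if (nodes.empty()) return out;
    out.assign(nodes[0].mean.size(), 0.0f);
    unsigned total = 0;
    for (const NodeResult& n : nodes) {
        for (std::size_t i = 0; i < out.size(); ++i)
            out[i] += n.mean[i] * float(n.samples);
        total += n.samples;
    }
    if (total == 0) return out;
    for (float& v : out) v /= float(total);
    return out;
}
```

Weighting by sample count matters when nodes finish different numbers of subframes; a plain average of means would over-weight the slower nodes.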
Future Work
- Improved performance / quality trade-offs in interactive RT stochastic sampling strategies
- Optimize GPU scene DMA and BVH regen speed for time-varying geometry, e.g. MD trajectories
- Continue tuning of GPU-specific RT intersection routines, memory layout
- GPU-accelerated movie encoder back-end
- Interactive RT combined with remote viz on HPC systems, much larger data sizes
Acknowledgements
- Theoretical and Computational Biophysics Group, University of Illinois at Urbana-Champaign
- NVIDIA CUDA Center of Excellence, University of Illinois at Urbana-Champaign
- NVIDIA CUDA team
- NVIDIA OptiX team
- NCSA Blue Waters Team
- Funding:
  – DOE INCITE, ORNL Titan: DE-AC05-00OR22725
  – NSF Blue Waters: NSF OCI 07-25070, PRAC “The Computational Microscope”, ACI-1238993, ACI-1440026
  – NIH support: 9P41GM104601, 5R01GM098243-02
REGISTERED DEVELOPER PROGRAM
- Access latest OptiX version
- Access private beta releases
- Tighter communication with OptiX developers
https://developer.nvidia.com/optix
MORE OPTIX TALKS
Session | Title | Day | Start | End | Room | Speaker
S5659 | Accelerating Mountain Bike Development with Optimized Design Visualization | Tuesday | 13:30 | 13:55 | LL21A | Geoff Casey
S5188 | FurryBall RT: New OptiX Core and 30x Speed Up | Tuesday | 15:00 | 15:25 | LL21D | Jan Tománek
S5643 | Advanced Rendering Solutions from NVIDIA | Tuesday | 15:30 | 16:20 | LL21E | Phillip Miller
S5622 | Dekko: A Framework for Real-Time Preview for VFX | Wednesday | 9:30 | 9:55 | LL21D | Damien Fagnou
S5644 | Flexible Cluster Rendering with NVIDIA VCA | Wednesday | 10:00 | 10:50 | LL21E | Phillip Miller
S5541 | CATIA Live Rendering Iray and NVIDIA VCA | Wednesday | 10:00 | 10:50 | LL21A | Pierre Maheut
S5409 | Custom Iray Applications and MDL for Consistent Visual Appearance | Wednesday | 14:00 | 14:50 | LL21E | Dave Hutchinson
S5246 | Innovations in OptiX | Wednesday | 15:00 | 15:50 | LL21E | David McAllister
S5628 | Simulation-Based CGI for Automotive Applications | Wednesday | 16:00 | 16:25 | LL21A | Benoit Deschamps
S5386 | VMD: Publication-Quality Ray Tracing of Molecular Graphics with OptiX | Thursday | 9:00 | 9:25 | LL21E | John Stone
S5416 | Accelerad: Daylight Simulation for Architectural Spaces Using GPU Ray Tracing | Thursday | 14:00 | 14:25 | LL21E | Nathaniel Jones
S5210 | GPU-Accelerated Spectral Caustic Rendering of Homogeneous Caustic Objects | Thursday | 14:30 | 14:55 | LL21E | Budianto Tandianus