GPU Enhanced Remote Collaborative Scientific Visualization

SLIDE 1

GPU Enhanced Remote Collaborative Scientific Visualization

Benjamin Hernandez (OLCF), Tim Biedert (NVIDIA)
March 20th, 2019

ORNL is managed by UT-Battelle LLC for the US Department of Energy.

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

SLIDE 2

Contents

  • Part I
    – GPU Enhanced Remote Collaborative Scientific Visualization
  • Part II
    – Hardware-Accelerated Multi-Tile Streaming for Realtime Remote Visualization

SLIDE 3

Oak Ridge Leadership Computing Facility (OLCF)

  • Provide the computational and data resources required to solve the most challenging problems.
  • Highly competitive user allocation programs (INCITE, ALCC).
    – OLCF provides 10x to 100x more resources than other centers.
  • We collaborate with users of diverse expertise and geographic locations.

SLIDE 4

What are these collaborations like?

  • Collaborations
    – Extend through the life cycle of the data, from computation to analysis and visualization.
    – Are structured around data.
  • Data analysis and visualization
    – An iterative and sometimes remote process involving students, visualization experts, PIs, and stakeholders.

[Diagram: iterative cycle of Simulation → Pre-visualization → Evaluation → Visualization → Discovery, driven by team needs, PI feedback, and viz. expert feedback]

SLIDE 5

What are these collaborations like?

  • The collaborative future must be characterized by [1]:
    – Discovery: resources easy to find
    – Connectivity: no resource is an island
    – Portability: resources widely and transparently usable
    – Centrality: resources efficiently and centrally supported

“Web-based immersive visualization tools with the ease of a virtual reality game.” “Visualization ideally would be combined with user-guided as well as template-guided automated feature extraction, real-time annotation, and quantitative geometrical analysis.” “Rapid data visualization and analysis to enable understanding in near real time by a geographically dispersed team.” …

[1] U.S. Department of Energy. (2011). Scientific Collaborations for Extreme-Scale Science (Workshop Report). Retrieved from https://indico.bnl.gov/event/403/attachments/11180/13626/ScientificCollaborationsforExtreme-ScaleScienceReportDec2011_Final.pdf

SLIDE 6

What are these collaborations like?

  • INCITE “Petascale simulations of short pulse laser interaction with metals,” PI Leonid Zhigilei, University of Virginia
    – Laser ablation in vacuum and liquid environments
    – Hundred-million- to billion-atom atomistic simulations, dozens of time steps
  • INCITE “Molecular dynamics of motor-protein networks in cellular energy metabolism,” PI Abhishek Singharoy, Arizona State University
    – Hundred-million-atom atomistic simulations, hundreds of time steps

SLIDE 7

What are these collaborations like?

  • The data is centralized at OLCF.
  • SIGHT is a custom platform for interactive data analysis and visualization.
    – Support for collaborative features is needed:
      • Low-latency remote visualization streaming
      • Simultaneous and independent user views
      • Collaborative multi-display environments
SLIDE 8

SIGHT: Exploratory Visualization of Scientific Data

  • Designed around user needs.
  • Lightweight tool:
    – Load your data
    – Perform exploratory analysis
    – Visualize/save results
  • Heterogeneous scientific visualization:
    – Advanced shading to enable new insights into data exploration
    – Multicore and manycore support
  • Remote visualization:
    – Server/client architecture to provide high-end visualization on laptops, desktops, and powerwalls
  • Multi-threaded I/O.
  • Supports interactive/batch visualization:
    – In-situ (some effort)
  • Designed with OLCF infrastructure in mind.

SLIDE 9

Publications

  • M. V. Shugaev, C. Wu, O. Armbruster, A. Naghilou, N. Brouwer, D. S. Ivanov, T. J.-Y. Derrien, N. M. Bulgakova, W. Kautek, B. Rethfeld, and L. V. Zhigilei, Fundamentals of ultrafast laser-material interaction, MRS Bull. 41 (12), 960-968, 2016.
  • C.-Y. Shih, M. V. Shugaev, C. Wu, and L. V. Zhigilei, Generation of subsurface voids, incubation effect, and formation of nanoparticles in short pulse laser interactions with bulk metal targets in liquid: Molecular dynamics study, J. Phys. Chem. C 121, 16549-16567, 2017.
  • C.-Y. Shih, R. Streubel, J. Heberle, A. Letzel, M. V. Shugaev, C. Wu, M. Schmidt, B. Gökce, S. Barcikowski, and L. V. Zhigilei, Two mechanisms of nanoparticle generation in picosecond laser ablation in liquids: the origin of the bimodal size distribution, Nanoscale 10, 6900-6910, 2018.

SLIDE 10

SIGHT’s System Architecture

[Diagram: the SIGHT client (web browser, JavaScript) communicates over WebSockets with SIGHT running at OLCF. SIGHT combines a multi-threaded dataset parser, ray tracing backends¹ (NVIDIA OptiX, Intel OSPRay), data-parallel analysis², and a frame server with SIMD encoding (TurboJPEG) and GPU encoding (NVENC via NvPipe)]

¹ S7175: Exploratory Visualization of Petascale Particle Data in NVIDIA DGX-1, NVIDIA GPU Technology Conference 2017
² P8220: Heterogeneous Selection Algorithms for Interactive Analysis of Billion Scale Atomistic Datasets, NVIDIA GPU Technology Conference 2018

SLIDE 11

Design of Collaborative Infrastructure - Alternative 1

[Diagram: each user (User 1, User 2, User 3) runs a separate SIGHT instance on its own node(s), all accessing the same data]

SLIDE 12

Design of Collaborative Infrastructure - Alternative 1

[Diagram: the same setup, with each SIGHT instance running as a separate batch job (Job 1, Job 2, Job 3, ...) on its own node(s)]

SLIDE 13

Design of Collaborative Infrastructure - Alternative 1

[Diagram: the same setup, highlighting its problems]

Issues: co-scheduling, job coordination, communication.

SLIDE 14

Design of Collaborative Infrastructure - Alternative 2

[Diagram: a single SIGHT instance, running as one job (Job 1) on shared node(s), serves all users (User 1, User 2, User 3) and accesses the data directly]

SLIDE 15

Design of Collaborative Infrastructure

  • ORNL Summit system overview:
    – 4,608 nodes
    – Dual-port Mellanox EDR InfiniBand network
    – 250 PB IBM file system transferring data at 2.5 TB/s
  • Each node has:
    – 2 IBM POWER9 processors
    – 6 NVIDIA Tesla V100 GPUs
    – 608 GB of fast memory (96 GB HBM2 + 512 GB DDR4)
    – 1.6 TB of NV memory

[Diagram: the Alternative 2 layout, with SIGHT on Summit node(s) serving Users 1-3 from centralized data]

SLIDE 16

Design of Collaborative Infrastructure

  • Each NVIDIA Tesla V100 GPU:
    – 3 NVENC chips
    – Unrestricted number of concurrent sessions
  • NVPipe:
    – Lightweight C API library for low-latency video compression
    – Easy access to NVIDIA's hardware-accelerated H.264 and HEVC video codecs

SLIDE 17

Enhancing SIGHT Frame Server

Low latency encoding

[Diagram: the SIGHT client (web browser, JavaScript) sends GUI events ((x, y, button), keyboard) over WebSockets to the frame server; the ray tracing backend renders into a framebuffer that is NVENC-encoded and returned as the visualization stream]

Sharing the OptiX buffer (OptiXpp):

Buffer frameBuffer = context->createBuffer(RT_BUFFER_OUTPUT, RT_FORMAT_UNSIGNED_BYTE4, m_width, m_height);
frameBufferPtr = frameBuffer->getDevicePointer(optxDevice);
...
compress(frameBufferPtr);

SLIDE 18

Enhancing SIGHT Frame Server

Low latency encoding

Opening the encoding session:

myEncoder = NvPipe_CreateEncoder(NVPIPE_RGBA32, NVPIPE_H264, NVPIPE_LOSSY, bitrateMbps * 1000 * 1000, targetFps);
if (!myEncoder) return error;
myCompressedImg = new unsigned char[w * h * 4];

Framebuffer compression:

bool compress(myFramebufferPtr) {
    ...
    myCompressedSize = NvPipe_Encode(myEncoder, myFramebufferPtr, w * 4, myCompressedImg, w * h * 4, w, h, true);
    if (myCompressedSize == 0) return error;
    ...
}

Closing the encoding session:

NvPipe_Destroy(myEncoder);

[Diagram: the same client/server pipeline; the client sends GUI events ((x, y, button), keyboard) and camera parameters over WebSockets and receives the compressed framebuffer as the visualization stream]
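Pieced together, the fragments above amount to the flow below. This is a minimal, self-contained sketch (not SIGHT's actual code) assuming the NvPipe C API as published on GitHub at the time (NvPipe_CreateEncoder, NvPipe_Encode, NvPipe_GetError, NvPipe_Destroy); the resolution and the host-side test frame are hypothetical stand-ins for SIGHT's OptiX device buffer.

#include <NvPipe.h>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    const uint32_t w = 1920, h = 1080;   // hypothetical stream resolution
    const float bitrateMbps = 32.0f;     // matches the 32 Mbps setting used on the next slide
    const uint32_t targetFps = 30;

    // Open the encoding session (RGBA32 input, H.264, lossy).
    NvPipe* myEncoder = NvPipe_CreateEncoder(NVPIPE_RGBA32, NVPIPE_H264, NVPIPE_LOSSY,
                                             bitrateMbps * 1000 * 1000, targetFps);
    if (!myEncoder) {
        std::cerr << "Failed to create encoder: " << NvPipe_GetError(nullptr) << "\n";
        return 1;
    }

    // Worst-case output buffer: one uncompressed RGBA frame.
    std::vector<uint8_t> myCompressedImg(w * h * 4);

    // Stand-in input frame; NvPipe also accepts CUDA device pointers,
    // which is how SIGHT feeds the shared OptiX framebuffer directly.
    std::vector<uint8_t> frame(w * h * 4, 128);

    // Compress one frame: pitch is w*4 bytes; 'true' forces an I-frame.
    uint64_t myCompressedSize = NvPipe_Encode(myEncoder, frame.data(), w * 4,
                                              myCompressedImg.data(), myCompressedImg.size(),
                                              w, h, true);
    if (myCompressedSize == 0) {
        std::cerr << "Encode failed: " << NvPipe_GetError(myEncoder) << "\n";
        return 1;
    }
    std::cout << "Encoded frame: " << myCompressedSize << " bytes\n";

    // Close the encoding session.
    NvPipe_Destroy(myEncoder);
    return 0;
}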

SLIDE 19

Enhancing SIGHT Frame Server

Low latency encoding

  • System configuration:
    – DGX-1 Volta
    – Connection bandwidth: 800 Mbps (ideally 1 Gbps)
    – NVENC encoder: H.264 baseline profile, 32 Mbps, 30 FPS
    – TurboJPEG: SIMD instructions enabled, JPEG quality 50
    – Decoders: Broadway.js (Firefox 65), Media Source Extensions (Chrome 72), built-in JPEG decoder (Chrome 72)

Average encoding results:

                      NVENC    NVENC+MP4    TJPEG
Encoding HD (ms)       4.65        6.05     16.71
Encoding 4K (ms)      12.13       17.89     51.89
Frame size HD (KB)   116.00      139.61    409.76
Frame size 4K (KB)   106.32      150.65    569.04

Average decoding results:

                   Broadway.js     MSE    Built-in JPEG
Decoding HD (ms)         43.28   39.97            78.15
Decoding 4K (ms)         87.40   53.10           197.63
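As a rough consistency check on these averages (simple arithmetic, not from the slide): at 30 FPS, the 116 KB NVENC HD frames correspond to about 116 × 8 × 30 ≈ 27,840 kbps ≈ 27.8 Mbps, in line with the 32 Mbps encoder setting and well within the 800 Mbps connection, while the 409.76 KB TurboJPEG frames would need roughly 98 Mbps for the same stream.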

SLIDE 20

Enhancing SIGHT Frame Server

Low latency encoding

SLIDE 21

Enhancing SIGHT Frame Server

Low latency encoding

SLIDE 22

Enhancing SIGHT Frame Server

Simultaneous and independent user views

  • A Summit node can produce 4K visualizations with NVIDIA OptiX at interactive rates.

[Diagram: a single 4K-capable node drives four independent HD views for Users 1-4, or the EVEREST powerwall (16:9, 16:9, and 32:9 displays) for Users A and B]

SLIDE 23

Enhancing SIGHT Frame Server Simultaneous and independent user views

SLIDE 24

Enhancing SIGHT Frame Server

Simultaneous and independent user views

[Diagram: Users 1-3 each connect to a dedicated frame server thread (Thread 0, Thread 1, Thread 2). GUI events ((x, y, button), keyboard) and camera parameters, tagged with a user ID, feed a shared GUI event queue consumed by the ray tracing backend; rendered viz. frames, tagged with the same user ID, are placed in a frame buffer queue and dispatched back through the matching frame server thread]
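The routing in the diagram can be sketched as two thread-safe queues whose records carry a user ID. Everything below (type names, the renderForUser stub) is hypothetical illustration, not SIGHT's actual code:

#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <vector>

// Hypothetical records, both tagged with the originating user.
struct GuiEvent { int userId; int x, y, button; /* keyboard, camera parameters, ... */ };
struct VizFrame { int userId; std::vector<uint8_t> pixels; };

// Minimal thread-safe queue shared between frame server threads and the ray tracer.
template <typename T>
class SyncQueue {
    std::queue<T> q;
    std::mutex m;
    std::condition_variable cv;
public:
    void push(T v) {
        { std::lock_guard<std::mutex> lock(m); q.push(std::move(v)); }
        cv.notify_one();
    }
    T pop() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return !q.empty(); });
        T v = std::move(q.front());
        q.pop();
        return v;
    }
};

SyncQueue<GuiEvent> guiEvents;   // filled by the per-user frame server threads
SyncQueue<VizFrame> vizFrames;   // drained by them, matching on userId

// Stub renderer standing in for the OptiX backend.
std::vector<uint8_t> renderForUser(const GuiEvent&) {
    return std::vector<uint8_t>(1920 * 1080 * 4);
}

// Single ray tracing backend: consumes events from any user, renders with that
// user's camera, and hands the frame back tagged with the same user ID.
void rayTracerLoop() {
    for (;;) {
        GuiEvent e = guiEvents.pop();
        vizFrames.push(VizFrame{e.userId, renderForUser(e)});
    }
}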
SLIDE 25

Enhancing SIGHT Frame Server

Simultaneous and independent user views

  • Video
SLIDE 26

Discussion

  • Further work:
    – Multi-perspective/orthographic projections (traditional case: stereoscopic projection)
    – Annotations, sharing content between users, saving sessions
  • How AI could help improve remote visualization performance:
    – NVIDIA NGX tech
      • AI Up-Res: improve compression rates; handle different resolutions (render and stream at 720p, decode at HD, 2K, or 4K according to each user's display)
      • AI Slow-Mo: render at low frame rates
    – OptiX denoiser: ray tracing converges faster

SLIDE 27

Thanks!

Benjamin Hernandez
Advanced Data and Workflows Group, Oak Ridge National Laboratory
hernandezarb@ornl.gov

Acknowledgments

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. Datasets provided by Cheng-Yu Shih and Leonid Zhigilei, Computational Materials Group at the University of Virginia, INCITE award MAT130.

SLIDE 28

Tim Biedert, 03/20/2019

HARDWARE-ACCELERATED MULTI-TILE STREAMING FOR REALTIME REMOTE VISUALIZATION

SLIDE 29

INSTANT VISUALIZATION FOR FASTER SCIENCE

Traditional (slower time to discovery): a CPU supercomputer runs the simulation (~1 week), the data is transferred to a separate viz cluster for visualization (~1 day), and the cycle repeats over multiple iterations. Time to discovery = months.

Tesla platform (faster time to discovery): a GPU-accelerated supercomputer visualizes while you simulate, without data transfers, and restarts the simulation instantly over multiple iterations. Time to discovery = weeks, even days.

Value proposition:
  • Interactivity
  • Scalability
  • Flexibility

SLIDE 30

SUPPORTING MULTIPLE VISUALIZATION WORKFLOWS

  • Legacy workflow: separate compute & vis systems; communication via the file system
  • Co-processing: compute and visualization on the same GPU; communication via host-device transfers or memcpy
  • Partitioned system: different nodes for different roles; communication via high-speed network

SLIDE 31

VISUALIZATION-ENABLED SUPERCOMPUTERS

Examples: CSCS Piz Daint, NCSA Blue Waters, and ORNL Titan, used for galaxy formation, molecular dynamics, and cosmology visualization.

  • http://blogs.nvidia.com/blog/2014/11/19/gpu-in-situ-milky-way/
  • http://devblogs.nvidia.com/parallelforall/hpc-visualization-nvidia-tesla-gpus/
  • http://www.sdav-scidac.org/29-highlights/visualization/66-accelerated-cosmology-data-anal.html

SLIDE 32

GEFORCE NOW

You listen to music on Spotify. You watch movies on Netflix. GeForce Now lets you play games the same way. Instantly stream the latest titles from our powerful cloud-gaming supercomputers. Think of it as your game console in the sky. Gaming is now easy and instant.

ROAD TO EXASCALE

Volta to Fuel Most Powerful US Supercomputers

[Chart: V100 performance normalized to P100 across HPC applications, roughly 1.4x-1.7x; 1.5X HPC performance in 1 year. System config info: 2X Xeon E5-2690 v4, 2.6 GHz, with 2X Tesla P100 or V100]

Summit supercomputer: 200+ PetaFlops, ~3,400 nodes, 10 Megawatts.

SLIDE 33

VISUALIZATION TRENDS

New Approaches Required to Solve the Remoting Challenge

  • Increasing data set sizes
  • In-situ scenarios
  • Interactive workflows
  • New display technologies
  • Globally distributed user bases

SLIDE 34

STREAMING

Benefits of Rendering on the Supercomputer

  • Scale with the simulation: no need to scale a separate vis cluster
  • Cheaper infrastructure: all heavy lifting performed on the server
  • Interactive high-fidelity rendering: improves perception and scientific insight

SLIDE 35

FLEXIBLE GPU ACCELERATION ARCHITECTURE

Independent CUDA Cores & Video Engines

[Diagram of the GPU's compute and video engines]
* Diagram represents support for the NVIDIA Turing GPU family
** 4:2:2 is not natively supported on HW
*** Support is codec dependent

SLIDE 36

CASE STUDY

  • Streaming of large tile counts
  • Frame rates / latency / bandwidth
  • Synchronization
  • Comparison against CPU-based compressors
  • Strong scaling (direct-send sort-first compositing)

Hardware-Accelerated Multi-Tile Streaming for Realtime Remote Visualization. Tim Biedert, Peter Messmer, Tom Fogal, Christoph Garth. Eurographics Symposium on Parallel Graphics and Visualization (EGPGV) 2018 (Best Paper). DOI: 10.2312/pgv.20181093

SLIDE 37

CONCEPTUAL OVERVIEW

Asynchronous Pipelines
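The pipeline diagram is not reproduced in this transcript, but the payoff of asynchronous pipelining can be shown with illustrative stage times (assumed numbers for the arithmetic, not measurements from the paper): suppose render, encode, network, and decode take 11 + 5 + 8 + 4 = 28 ms. Executed synchronously, each frame blocks the next, giving 1000 / 28 ≈ 36 fps. Executed as an asynchronous pipeline, every stage works on a different frame concurrently, so throughput is limited only by the slowest stage, 1000 / 11 ≈ 90 fps, while the end-to-end latency of any single frame stays near 28 ms.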

SLIDE 38

BENCHMARK SCENES

NASA Synthesis, 4K:
  • Space: low complexity
  • Orbit: medium complexity
  • Ice: high complexity
  • Streamlines: extreme complexity

SLIDE 39

CODEC PERFORMANCE

SLIDE 40

BITRATE VS QUALITY

Structural Similarity Index (SSIM)

[Chart: SSIM (0.6-1.0) vs. bitrate at 90 Hz (1-256 Mbps) for the Space, Orbit, Ice, and Streamlines scenes, each with H.264 and HEVC]
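For reference, the quality axis here is the Structural Similarity Index, a standard image-similarity metric (its general definition, not something specific to this paper). For two image patches x and y:

\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}

where \mu_x, \mu_y are local means, \sigma_x^2, \sigma_y^2 local variances, \sigma_{xy} the covariance, and C_1, C_2 small stabilizing constants. A value of 1 means the decoded frame is pixel-identical to the rendered one, so the chart reads as how close to lossless each codec gets at a given bitrate.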

SLIDE 41

HARDWARE LATENCY

[Chart: encode and decode latency (roughly 2-20 ms) vs. bitrate at 90 Hz (1-256 Mbps) for H.264 and HEVC across the Space, Orbit, Ice, and Streamlines scenes]

SLIDE 42

HARDWARE LATENCY

Latency Decreases with Resolution

[Chart: encode and decode latency (roughly 2-16 ms) for H.264 and HEVC vs. resolution, from 3840x2160 down to 480x270]

SLIDE 43

CPU-BASED COMPRESSION

Encode/Decode Latency and Bandwidth

[Chart: encode latency, decode latency (ms), and bandwidth (MB/s) for HEVC (GPU), H.264 (GPU), HEVC (CPU), H.264 (CPU), TurboJPEG, BloscLZ, LZ4, and Snappy]

SLIDE 44

FULL TILES STREAMING

SLIDE 45

N:N WITH SIMULATED NETWORK DELAY

Mean Frame Rates + Min/Max Ranges

Server: Piz Daint. Client: Piz Daint. Scene: Ice 4K. H.264: 32 Mbps; HEVC: 16 Mbps. Delay + 10% jitter.

[Chart: frame rate (Hz) vs. network delay (0, 50, 150, 500 ms) for 1-256 tiles]

SLIDE 46

N:N STREAMING

Pipeline Latencies

Server: Piz Daint. Client: Piz Daint. Scene: Ice 4K. H.264: 32 Mbps. MPI-based synchronization.

[Chart: per-stage latency (synchronize servers, encode, network, decode, synchronize clients; ms) vs. tile count (1-256)]

SLIDE 47

N:1 STREAMING

Client-Side Frame Rate

Server: Piz Daint. Clients: Site A (5 ms), Site B (25 ms). Scene: Ice 4K. H.264: 32 Mbps.

[Chart: frame rate (Hz) vs. tile count (1-256) for Site A (1x GP100), Site B (1x GP100), and Site B (2x GP100)]

SLIDE 48

STRONG SCALING

SLIDE 49

N:1 STRONG SCALING

Client-Side Frame Rate

Server: Piz Daint. Clients: Site A (5 ms), Site B (25 ms). Scene: Ice 4K. H.264: 32 Mbps.

[Chart: frame rate (Hz, up to ~400) vs. tile count (1-256) for Site A (1x GP100), Site B (1x GP100), and Site B (2x GP100)]

SLIDE 50

N:1 STRONG SCALING

Client-Side Frame Rate for Different Bitrates

Server: Piz Daint. Client: Site A (5 ms). Scene: Ice 4K. H.264.

[Chart: frame rate (Hz) vs. tile count (1-256) at 4, 72, and 256 Mbps @ 90 Hz]

SLIDE 51

EXAMPLE: OPTIX PATHTRACER

Rendering Highly Sensitive to Tile Size

Server: Piz Daint. Client: Site A (5 ms). H.264.

[Chart: frame rate (Hz) vs. tile count (1-256) for regular tiling vs. auto-tuned tiling]

SLIDE 52

INTEROPERABILITY

SLIDE 53

STANDARD-COMPLIANT BITSTREAM

Web Browser Streaming Example

  • EGL-based shim for GLUT
  • Stream the unmodified simpleGL example from a headless node to a web browser (with interaction!)
  • JavaScript client
  • WebSocket-based bidirectional communication
  • On-the-fly MP4 wrapping of H.264
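The EGL-based shim hinges on creating an OpenGL context with no X server on the render node. A minimal headless-EGL sketch using standard EGL 1.4 calls follows; this shows the general pattern, not the shim's actual implementation:

#include <EGL/egl.h>

int main() {
    // Headless EGL: no X server or window system required on the render node.
    EGLDisplay dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    eglInitialize(dpy, nullptr, nullptr);

    const EGLint cfgAttribs[] = {
        EGL_SURFACE_TYPE, EGL_PBUFFER_BIT,
        EGL_RENDERABLE_TYPE, EGL_OPENGL_BIT,
        EGL_NONE
    };
    EGLConfig cfg;
    EGLint numConfigs;
    eglChooseConfig(dpy, cfgAttribs, &cfg, 1, &numConfigs);

    // Offscreen surface standing in for the window GLUT would normally create.
    const EGLint pbAttribs[] = { EGL_WIDTH, 1920, EGL_HEIGHT, 1080, EGL_NONE };
    EGLSurface surf = eglCreatePbufferSurface(dpy, cfg, pbAttribs);

    eglBindAPI(EGL_OPENGL_API);
    EGLContext ctx = eglCreateContext(dpy, cfg, EGL_NO_CONTEXT, nullptr);
    eglMakeCurrent(dpy, surf, surf, ctx);

    // ... render with regular OpenGL here, read back or map frames,
    //     encode with NVENC/NvPipe, and stream to the browser client ...

    eglTerminate(dpy);
    return 0;
}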

SLIDE 54

RESOURCES

SLIDE 55

VIDEO CODEC SDK

APIs for Hardware-Accelerated Video Encode/Decode

What's new with Turing GPUs and Video Codec SDK 9.0:

  • Up to 3x decode throughput with multiple decoders on professional cards (Quadro & Tesla)
  • Higher quality encoding (H.264 & H.265)
  • Higher encoding efficiency (15% lower bitrate than Pascal)
  • HEVC B-frames support
  • HEVC 4:4:4 decoding support

NVIDIA GeForce Now is made possible by leveraging NVENC in the datacenter and streaming the result to end clients.

https://developer.nvidia.com/nvidia-video-codec-sdk

SLIDE 56

NVPIPE

A Lightweight Video Codec SDK Wrapper

  • Simple C API
  • H.264, HEVC
  • RGBA32, uint4, uint8, uint16
  • Lossy, lossless
  • Host/device memory, OpenGL textures/PBOs

https://github.com/NVIDIA/NvPipe (issues? suggestions? feedback welcome!)
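On the receiving end the wrapper is symmetric. A minimal decode sketch, again assuming the NvPipe API as published on GitHub at the time (NvPipe_CreateDecoder, NvPipe_Decode); the function and buffer names are hypothetical:

#include <NvPipe.h>
#include <cstdint>
#include <iostream>
#include <vector>

// Decodes one compressed H.264 frame back into RGBA pixels.
bool decodeFrame(const uint8_t* compressed, uint64_t compressedSize,
                 uint32_t w, uint32_t h, std::vector<uint8_t>& rgbaOut) {
    // Create the decoder once and reuse it across frames.
    static NvPipe* decoder = NvPipe_CreateDecoder(NVPIPE_RGBA32, NVPIPE_H264);
    if (!decoder) {
        std::cerr << "Failed to create decoder: " << NvPipe_GetError(nullptr) << "\n";
        return false;
    }

    rgbaOut.resize(static_cast<size_t>(w) * h * 4);
    uint64_t size = NvPipe_Decode(decoder, compressed, compressedSize, rgbaOut.data(), w, h);
    if (size == 0) {
        std::cerr << "Decode failed: " << NvPipe_GetError(decoder) << "\n";
        return false;
    }
    return true;
}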

SLIDE 57

Conclusion

GPU-accelerated video compression opens up novel and fast solutions to the large-scale remoting challenge.

Video Codec SDK: https://developer.nvidia.com/nvidia-video-codec-sdk
NvPipe: https://github.com/NVIDIA/NvPipe

We want to help you solve your large-scale vis problems on NVIDIA!

Tim Biedert, tbiedert@nvidia.com

SLIDE 58