NVIDIA VIDEO TECHNOLOGIES
Abhijit Patait, 3/20/2019


AGENDA
➢ NVIDIA Video Technologies Overview
➢ Turing Video Enhancements
➢ Video Codec SDK Updates
➢ Benchmarks
➢ Roadmap

NVIDIA VIDEO TECHNOLOGIES

NVIDIA GPU VIDEO CAPABILITIES
[Diagram: CPU, buffer, NVDEC, NVENC and CUDA cores]
Decode HW* (NVDEC)
• Formats: MPEG-2, VC1, VP8, VP9, H.264, H.265, Lossless
• Bit depth: 8/10/12 bit
• Color**: YUV 4:2:0, YUV 4:4:4
• Resolution: up to 8K***
Encode HW* (NVENC)
• Formats: H.264, H.265, Lossless
• Bit depth: 8 bit, 10 bit
• Color**: YUV 4:2:0, YUV 4:4:4
• Resolution: up to 8K***
* See support diagram for previous NVIDIA HW generations
** 4:4:4 is supported only on HEVC for Turing; 4:2:2 is not natively supported in HW
*** Support is codec dependent

VIDEO CODEC SDK
A comprehensive set of APIs for GPU-accelerated video encode and decode
➢ NVENCODE API for video encode acceleration
➢ NVDECODE API for video & JPEG decode acceleration (formerly called NVCUVID API)
➢ Independent of the GPU's CUDA/3D cores, which remain available for pre-/post-processing
Use cases: Gamestream, video transcoding, remote desktop streaming, intelligent video analytics, video archiving, video editing

NVIDIA VIDEO TECHNOLOGIES
Easy access to GPU video acceleration
SOFTWARE
➢ DeepStream SDK; cuDNN, TensorRT, DALI; cuBLAS, cuSPARSE
➢ VIDEO CODEC, OPTICAL FLOW SDK: video encode and decode for Windows and Linux; CUDA, DirectX, OpenGL interoperability
➢ CUDA TOOLKIT: APIs, libraries, tools, samples
➢ NVIDIA DRIVER
HARDWARE
➢ NVENC: video encode
➢ NVDEC: video decode
➢ CUDA: high-performance computing on GPU

VIDEO CODEC SDK UPDATE

VIDEO CODEC SDK UPDATE
➢ SDK 7.x (2016): Pascal; 10-bit encode; 4K60 HEVC encode; ME-only for VR; quality++
➢ SDK 8.0 (2017): 10-bit transcode; 10/12-bit decode; OpenGL; WP, AQ, encode quality
➢ SDK 8.1 (Q1 2018): B-as-ref; QP/emphasis map; FFmpeg; reusable classes & new sample apps
➢ SDK 8.2 (Q3 2018): decode + inference optimizations; decode optimizations
➢ SDK 9.0 (2019): Turing; multi-NVDEC; HEVC 4:4:4 decode; HEVC B-frames; encode quality++

VIDEO CODEC SDK 9.0
Feature: benefit (who it benefits)
➢ Higher video encode quality; HEVC B-frames: higher encode quality (cloud gaming; game broadcasting, e.g. Twitch; video transcoding, e.g. YouTube, Facebook; OTT/M&E)
➢ HEVC 4:4:4 decode: end-to-end high-quality remote desktop
➢ Multiple NVDECs: higher decode + inference throughput
➢ Direct output to vidmem: higher perf with post-processing
➢ Power 9 + Tesla V100 SXM2: Video SDK for IBM platforms

TURING UPDATES - NVDEC

MULTIPLE NVDECS IN TURING GPU
Number of NVDECs per GPU
  Volta, Pascal & earlier: 1
  Turing – GeForce (RTX): 1
  Turing – Quadro & Tesla (TU106): 3
  Turing – Quadro & Tesla (TU104): 2
  Turing – others: 1
➢ Quadro & Tesla feature
➢ Auto-load-balanced by driver
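The driver balances decode work across NVDECs automatically, so applications do not pick an engine themselves. To illustrate the idea only, here is a minimal sketch of greedy least-loaded assignment, assuming each stream's cost is known up front (for example its pixel rate). The function names are hypothetical and this is not the driver's actual algorithm.

```python
import heapq


def assign_streams(stream_costs: list[float], num_engines: int) -> list[int]:
    """Greedy least-loaded assignment (illustrative, not the driver's method):
    each stream goes to the engine with the smallest accumulated load.
    Returns the chosen engine index for every stream, in input order."""
    if num_engines < 1:
        raise ValueError("need at least one engine")
    # Min-heap of (current_load, engine_index); ties go to the lower index.
    heap = [(0.0, e) for e in range(num_engines)]
    heapq.heapify(heap)
    assignment = []
    for cost in stream_costs:
        load, engine = heapq.heappop(heap)
        assignment.append(engine)
        heapq.heappush(heap, (load + cost, engine))
    return assignment


def engine_loads(stream_costs, assignment, num_engines):
    """Total cost placed on each engine by a given assignment."""
    loads = [0.0] * num_engines
    for cost, engine in zip(stream_costs, assignment):
        loads[engine] += cost
    return loads
```

For example, with costs in pixels per second, a 1080p30 stream costs 1920 × 1080 × 30 = 62,208,000 and a 720p30 stream 27,648,000; feeding a mix of both into `assign_streams` with `num_engines=2` spreads them roughly evenly, as a TU104 part with two NVDECs would.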

PASCAL & EARLIER
Single NVDEC
[Diagram: several high-res bitstreams (e.g. 1080p, 720p) all decoded by one NVDEC, which becomes the bottleneck; decoded frames are scaled and fed to low-res inference (e.g. 300 × 200)]

TURING
Multiple NVDECs
[Diagram: high-res bitstreams (e.g. 1080p, 720p) spread across NVDEC 0 … NVDEC N; each decoded stream is scaled and fed to low-res inference (e.g. 300 × 200)]
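Why extra NVDECs help can be seen with simple throughput arithmetic: a decode, scale, infer pipeline runs no faster than its slowest stage. The sketch below assumes decode capacity scales linearly with the number of NVDECs (an idealization); all per-stage rates are hypothetical inputs, not measured figures.

```python
def stage_rates(decode_fps_per_engine: float, num_nvdecs: int,
                scale_fps: float, infer_fps: float) -> dict:
    """Frames/s each stage can sustain; decode is idealized as linear in NVDEC count."""
    return {
        "decode": decode_fps_per_engine * num_nvdecs,
        "scale": scale_fps,
        "infer": infer_fps,
    }


def pipeline_fps(decode_fps_per_engine, num_nvdecs, scale_fps, infer_fps) -> float:
    """Steady-state throughput is limited by the slowest stage."""
    return min(stage_rates(decode_fps_per_engine, num_nvdecs, scale_fps, infer_fps).values())


def bottleneck(decode_fps_per_engine, num_nvdecs, scale_fps, infer_fps) -> str:
    """Name of the stage that limits throughput."""
    rates = stage_rates(decode_fps_per_engine, num_nvdecs, scale_fps, infer_fps)
    return min(rates, key=rates.get)
```

With made-up rates of 600 fps per NVDEC, 2000 fps for scaling and 1500 fps for inference, a single NVDEC caps the pipeline at 600 fps (decode-bound), while three NVDECs move the limit to inference at 1500 fps.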

END-TO-END 4:4:4 IN TURING
➢ Preserves chroma: text and thin lines
➢ Valuable in desktop streaming
[Figure: 4:2:0 vs. 4:4:4 comparison]
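The reason 4:2:0 blurs colored text is sample count: each Cb and Cr sample covers a 2 × 2 block of pixels, so a one-pixel-wide colored edge cannot keep its own color. A minimal sketch that counts samples per frame for planar YUV formats (the function name is illustrative):

```python
def samples_per_frame(width: int, height: int, chroma: str) -> dict:
    """Count luma and chroma samples in one planar YUV frame."""
    # (horizontal, vertical) chroma subsampling factors per format
    factors = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}
    sx, sy = factors[chroma]
    luma = width * height
    # Two chroma planes (Cb, Cr); ceil division handles odd dimensions.
    chroma_plane = -(-width // sx) * -(-height // sy)
    return {"luma": luma, "chroma": 2 * chroma_plane, "total": luma + 2 * chroma_plane}
```

For 1920 × 1080, 4:2:0 carries 1,036,800 chroma samples (3,110,400 in total), while 4:4:4 carries 4,147,200 (6,220,800 in total): four times the chroma detail, at the cost of twice the raw frame size.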

END-TO-END 4:4:4 IN TURING
HEVC 4:4:4 HW encode & 4:4:4 HW decode
[Diagram: desktop capture → HW encode → network stream → decode → render. Pascal & earlier: CPU decode; Turing: HW decode]

TURING NVENC ENHANCEMENTS

NVENC - ENCODING QUALITY
Focus for Turing NVENC
Enhancement: how to use
  Rate distortion optimization (RDO): Turing only, always ON
  Multiple reference frames: preset-dependent
  HEVC B-frames: NVENCODE API
  Others
➢ Higher throughput at same quality as Pascal
➢ Turing GPUs have a single NVENC engine with higher quality

TURING NVENC QUALITY
➢ Focus on quality: RDO, multi-ref, HEVC B-frames, …
➢ Quality vs. performance trade-off
➢ Quality is content dependent
➢ 600+ videos of 10-20 secs each: natural, animation, gaming, video conference, movies
➢ 720p, 1080p, 4K, 8K
➢ Quality: PSNR, SSIM, VMAF, subjective
➢ Perf: fps, number of 1080p streams per GPU
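Of the quality metrics listed, PSNR is the simplest to state: 10 · log10(MAX² / MSE) in dB, where MSE is the mean squared error between reference and decoded samples. A minimal sketch over flat sample lists; the slides do not say how PSNR was averaged (per plane, per frame, per sequence), so that part is left to the caller.

```python
import math


def psnr(ref: list[int], test: list[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample arrays."""
    if len(ref) != len(test) or not ref:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

For 10-bit content, pass `max_value=1023`. SSIM and VMAF need dedicated implementations and are not reproduced here.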

H.264 ENCODE BENCHMARK
Non latency critical – Turing vs Pascal vs x264
[Charts: two panels titled "H.264 - non latency critical", plotting bitrate ratio @ iso quality (higher = more bitrate savings) and #1080p30 streams (higher = more perf) for T4 medium, T4 fast, P4 slow, P4 medium, x264 slow, x264 medium, x264 fast]
"Iso" quality = x264 medium
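The slides do not say exactly how "bitrate ratio @ iso quality" was computed. A minimal sketch under stated assumptions: each configuration is measured at several bitrates, giving (bitrate, quality) points; the bitrate that reaches a target quality is interpolated linearly in log-bitrate; and the ratio is reference bitrate divided by test bitrate, so values above 1 mean the test configuration saves bits. This is a single-point comparison, simpler than a full BD-rate calculation, and the function names are hypothetical.

```python
import math


def bitrate_at_quality(points, target_quality):
    """Bitrate needed to reach target_quality on a rate-quality curve given as
    (bitrate, quality) pairs, interpolating linearly in log-bitrate.
    Assumes quality increases monotonically with bitrate."""
    pts = sorted(points, key=lambda p: p[1])  # order by quality
    for (r0, q0), (r1, q1) in zip(pts, pts[1:]):
        if q0 <= target_quality <= q1:
            if q1 == q0:
                return r0
            t = (target_quality - q0) / (q1 - q0)
            return math.exp(math.log(r0) + t * (math.log(r1) - math.log(r0)))
    raise ValueError("target quality outside the measured range")


def bitrate_ratio(reference_points, test_points, target_quality):
    """Reference bitrate / test bitrate at equal quality (> 1: test saves bits)."""
    return (bitrate_at_quality(reference_points, target_quality)
            / bitrate_at_quality(test_points, target_quality))
```

With x264 medium as the reference, a configuration that reaches the same PSNR at half the bitrate would score 2.0 on this scale.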

H.264 ENCODE BENCHMARK
Non latency critical – FFmpeg commands
NVENC slow: -preset slow -bufsize BITRATE*2 -maxrate BITRATE*1.5 -profile:v high -bf 3 -b_ref_mode 2 -temporal-aq 1 -rc-lookahead 20 -vsync 0
x264 slow: -preset slow -tune psnr -threads 4 -vsync 0
NVENC medium: -preset medium -rc vbr -profile:v high -bf 3 -b_ref_mode 2 -temporal-aq 1 -rc-lookahead 20 -vsync 0
x264 medium: -preset medium -tune psnr -threads 4 -vsync 0
NVENC fast: -preset fast -rc vbr -profile:v high -bf 3 -b_ref_mode 2 -temporal-aq 1 -rc-lookahead 20 -vsync 0
x264 fast: -preset fast -tune psnr -threads 4 -vsync 0
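BITRATE in these commands is a placeholder: -maxrate BITRATE*1.5 caps the peak rate at 1.5× the target, and -bufsize BITRATE*2 sets a VBV buffer of twice the target. A minimal sketch that expands the "NVENC slow" H.264 line into an argument list for `subprocess.run`. The input/output names are hypothetical, and the explicit `-b:v` target is an assumption, since the slide lists only -maxrate and -bufsize for this configuration.

```python
def nvenc_h264_slow_args(src: str, dst: str, bitrate_bps: int) -> list[str]:
    """FFmpeg arguments mirroring the slide's 'NVENC slow' H.264 settings."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", "h264_nvenc",
        "-preset", "slow",
        "-b:v", str(bitrate_bps),                 # assumption: not on the slide
        "-maxrate", str(int(bitrate_bps * 1.5)),  # BITRATE*1.5
        "-bufsize", str(bitrate_bps * 2),         # BITRATE*2 (VBV buffer)
        "-profile:v", "high",
        "-bf", "3",
        "-b_ref_mode", "2",     # 2 = "middle": use B-frames as references
        "-temporal-aq", "1",
        "-rc-lookahead", "20",
        "-vsync", "0",
        dst,
    ]
```

Usage: `subprocess.run(nvenc_h264_slow_args("in.mp4", "out.mp4", 4_000_000), check=True)`. Recent FFmpeg releases deprecate `-vsync` in favor of `-fps_mode`, so the last option may need adjusting depending on the FFmpeg version.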

HEVC ENCODE BENCHMARK
Non latency critical – Turing vs Pascal vs x265
[Charts: two panels titled "HEVC – non latency critical", plotting bitrate ratio @ iso quality (higher = more bitrate savings) and #1080p30 streams (higher = more perf) for T4 fast, T4 medium, P4 medium, x265 fast, x265 medium, x265 slow]
"Iso" quality = x265 medium
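The performance axis counts 1080p30 streams per GPU. The slides do not describe the conversion, so here is a minimal sketch assuming encoder throughput scales with pixel rate. That makes a measurement at another resolution expressible in 1080p30 equivalents, which is only a rough approximation, since per-frame overheads do not scale this way.

```python
def streams_1080p30(measured_fps: float, width: int = 1920, height: int = 1080) -> float:
    """Aggregate encode throughput expressed as a number of 1080p30 streams,
    assuming cost is proportional to pixels per second."""
    pixel_rate = measured_fps * width * height
    return pixel_rate / (1920 * 1080 * 30)
```

An aggregate 300 fps at 1080p corresponds to 10 streams; 60 fps at 3840 × 2160 corresponds to 8.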

HEVC ENCODE BENCHMARK
Non latency critical – FFmpeg commands
NVENC slow: -preset slow -rc vbr_hq -b:v BITRATE -profile:v 4 -bf 2 -rc-lookahead 20 -g 250 -vsync 0
x265 slow: -preset slow -b:v BITRATE -bf 2 -tune psnr -threads 4 -vsync 0
NVENC medium: -preset medium -rc vbr_hq -b:v BITRATE -profile:v 4 -bf 2 -rc-lookahead 20 -g 250 -vsync 0
x265 medium: -preset medium -b:v BITRATE -bf 2 -tune psnr -threads 4 -vsync 0
NVENC fast: -preset fast -rc vbr_hq -b:v BITRATE -profile:v 4 -bf 2 -temporal-aq 1 -rc-lookahead 20 -g 250 -vsync 0
x265 fast: -preset fast -b:v BITRATE -bf 2 -tune psnr -threads 4 -vsync 0

SOFTWARE UPDATES

RECONFIGURE DECODER
Video Codec SDK 8.2
➢ No init time, reuses the decoder context, lowers memory fragmentation
Can be reconfigured:
✓ Input resolution
✓ Scaling resolution
✓ Cropping rectangle
Cannot be reconfigured:
✗ Codecs
✗ Bit-depth and chroma format
✗ Deinterlace mode
✗ Input resolution beyond max width or max height
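In the NVDECODE API this feature is exposed as `cuvidReconfigureDecoder()`. The sketch below models only the decision an application makes before calling it: reuse the decoder when codec, bit depth, chroma format and deinterlace mode are unchanged and the new resolution fits the maximum set at creation; otherwise destroy and recreate it. `DecoderConfig` and its fields are hypothetical, not SDK structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    """Hypothetical summary of a decoder session (not an SDK struct)."""
    codec: str
    bit_depth: int
    chroma_format: str
    deinterlace_mode: str
    width: int
    height: int
    max_width: int   # upper bounds fixed when the decoder was created
    max_height: int


def can_reconfigure(current: DecoderConfig, new: DecoderConfig) -> bool:
    """True if the slide's rules allow reusing the decoder for `new`.
    Scaling resolution and cropping rectangle may change freely, so they
    are not modeled here."""
    fixed_fields_equal = (
        current.codec == new.codec
        and current.bit_depth == new.bit_depth
        and current.chroma_format == new.chroma_format
        and current.deinterlace_mode == new.deinterlace_mode
    )
    within_limits = new.width <= current.max_width and new.height <= current.max_height
    return fixed_fields_equal and within_limits
```

A practical consequence is to create the decoder with max width/height large enough for the biggest stream you expect, so that later resolution changes stay on the cheap reconfigure path.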

DIRECT OUTPUT TO VIDMEM
Video Codec SDK 9.0
[Diagram: CUDA pre-process → NVENC. SDK 8.2 & earlier: NVENC output crosses PCIe to host/system memory for CPU processing. SDK 9.0: NVENC output stays in video memory for CUDA post-processing]

OTHER UPDATES
➢ Video Codec SDK now supported on Power 9 + Tesla V100 SXM2
➢ High-level NVDEC error status

OPTICAL FLOW
New HW functionality
➢ One optical flow vector per 4 × 4 block, up to 4K × 4K
➢ Close to true motion
➢ Robust to intensity changes
➢ 10x faster than CPU; same quality
New Optical Flow SDK
➢ Action recognition, object tracking, video inter/extrapolation, frame-rate upconversion
➢ Legacy ME-only mode support
More information: http://developer.nvidia.com/opticalflow-sdk
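Two pieces of arithmetic sit behind these bullets: how large the flow field is at 4 × 4 granularity, and how a flow vector places content at an intermediate time for frame-rate upconversion. The sketch assumes linear motion between the two frames and ignores occlusions, which real interpolation must handle. It illustrates the idea and does not describe the Optical Flow SDK's method; the function names are hypothetical.

```python
def flow_grid_shape(width: int, height: int, grid: int = 4) -> tuple[int, int]:
    """(rows, cols) of flow vectors when each vector covers a grid x grid block."""
    return (-(-height // grid), -(-width // grid))  # ceil division


def interpolate_position(x: float, y: float, flow: tuple[float, float], t: float):
    """Position at fractional time t (0 = first frame, 1 = second frame) of a
    pixel that moves by `flow` between the frames, assuming linear motion."""
    dx, dy = flow
    return (x + t * dx, y + t * dy)
```

A 3840 × 2160 frame yields a 540 × 960 grid of vectors, and a 4096 × 4096 frame a 1024 × 1024 grid. Doubling the frame rate places each block at `t = 0.5`.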

TIPS FOR NVENC OPTIMIZATION

OPTIMIZATION STRATEGIES
General Guidelines
➢ Minimize PCIe transfers
  ➢ Eliminate, if possible
  ➢ Use CUDA for video pre-/post-processing
➢ Multiple threads/processes to balance enc/dec utilization
  ➢ Monitor using nvidia-smi: nvidia-smi dmon -s uc -i <GPU_index>
  ➢ Analyze using GPUView on Windows
➢ Minimize disk I/O
➢ Optimize encoder settings for quality/perf balance
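To act on the `nvidia-smi dmon` guideline programmatically, a small parser can average the `enc` and `dec` utilization columns over a run. The sketch assumes dmon's whitespace-separated layout, where the first `#` line names the columns and a second `#` line gives units. Column sets vary across driver versions, so values are looked up by name; the function names are hypothetical.

```python
import subprocess


def parse_dmon(text: str) -> list[dict[str, str]]:
    """Parse `nvidia-smi dmon` output into one dict per sample row.
    Assumes the first '#' line holds column names; later '#' lines
    (units, repeated headers) are skipped."""
    columns = None
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            if columns is None:
                columns = stripped.lstrip("#").split()
            continue
        if columns is None:
            raise ValueError("data row before header")
        rows.append(dict(zip(columns, stripped.split())))
    return rows


def mean_utilization(rows, column: str) -> float:
    """Average a percentage column such as 'enc' or 'dec', skipping '-' entries."""
    values = [float(r[column]) for r in rows if r.get(column, "-") != "-"]
    return sum(values) / len(values) if values else 0.0


def sample_gpu(index: int = 0, count: int = 10) -> list[dict[str, str]]:
    """Collect `count` samples of utilization and clocks for one GPU."""
    out = subprocess.run(
        ["nvidia-smi", "dmon", "-s", "uc", "-i", str(index), "-c", str(count)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_dmon(out)
```

If `enc` sits near 100% while `dec` is low (or the reverse), the workload is engine-bound on one side, and shifting sessions between threads or processes can rebalance it, as the guideline above suggests.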

FFMPEG VIDEO TRANSCODING
Tips
➢ Read the FFmpeg user guide in the NVIDIA Video Codec SDK package
➢ Use the -hwaccel option to keep the entire transcode pipeline on the GPU
➢ Run multiple 1:N transcode sessions to achieve M:N transcode at high perf
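A minimal sketch of the 1:N pattern: decode once with `-hwaccel cuda -hwaccel_output_format cuda` so frames stay in video memory, split them, resize each branch with the `scale_cuda` filter, and encode each branch with `h264_nvenc`. Running several such processes side by side gives the M:N case. Filter availability depends on how FFmpeg was built, audio is not mapped, and the helper names and output tuple layout are hypothetical.

```python
import subprocess


def one_to_n_args(src: str, outputs: list[tuple[int, int, str, str]]) -> list[str]:
    """FFmpeg arguments for a 1:N GPU transcode.
    Each output is (width, height, bitrate, filename); video only."""
    n = len(outputs)
    labels = [f"v{i}" for i in range(n)]
    # Decode once, split into N branches, scale each branch on the GPU.
    graph = f"[0:v]split={n}" + "".join(f"[{lab}]" for lab in labels)
    for i, (w, h, _, _) in enumerate(outputs):
        graph += f";[{labels[i]}]scale_cuda={w}:{h}[o{i}]"
    args = ["ffmpeg", "-y",
            "-hwaccel", "cuda", "-hwaccel_output_format", "cuda",  # frames stay on GPU
            "-i", src, "-filter_complex", graph]
    for i, (_, _, bitrate, filename) in enumerate(outputs):
        args += ["-map", f"[o{i}]", "-c:v", "h264_nvenc", "-b:v", bitrate, filename]
    return args


def run_m_to_n(jobs):
    """Run M independent 1:N transcodes concurrently; returns exit codes.
    `jobs` is a list of (src, outputs) pairs."""
    procs = [subprocess.Popen(one_to_n_args(src, outs)) for src, outs in jobs]
    return [p.wait() for p in procs]
```

For example, `one_to_n_args("in.mp4", [(1280, 720, "3M", "out720.mp4"), (640, 360, "1M", "out360.mp4")])` builds one command that decodes `in.mp4` once and writes both renditions.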

LOW LATENCY STREAMING (1/3)
Optimization tips
➢ Low latency ≠ low encoding time
➢ Latency determined by
  ➢ B-frames
  ➢ Look-ahead
  ➢ VBV buffer size & available bandwidth

LOW LATENCY STREAMING (2/3)
Optimization tips
➢ For 1-2 frame latency (e.g. cloud gaming), use
  ➢ RC_CBR_LOWDELAY_HQ & low VBV buffer size
    ➢ Minimizes frame-to-frame variations
  ➢ Any preset (Default, HQ, HP preferred); LL presets have resolution-dependent behavior
  ➢ No look-ahead
  ➢ No B-frames
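A common way to get a "low VBV buffer size" is to size the buffer to about one average frame, bitrate ÷ fps. In NVENCODE API terms these guidelines map onto settings such as the `NV_ENC_PARAMS_RC_CBR_LOWDELAY_HQ` rate-control mode, `rcParams.vbvBufferSize`, `rcParams.enableLookahead = 0` and `frameIntervalP = 1` (no B-frames). The Python sketch below only computes the buffer size and flags deviations from the slide's guidelines; the helper names and the 2-frame threshold are assumptions.

```python
def single_frame_vbv_bits(bitrate_bps: float, fps: float, frames: float = 1.0) -> int:
    """VBV size holding `frames` average-sized frames at the target bitrate."""
    return int(round(bitrate_bps * frames / fps))


def low_latency_issues(b_frames: int, lookahead_frames: int,
                       vbv_buffer_bits: float, bitrate_bps: float, fps: float,
                       max_frames: float = 2.0) -> list[str]:
    """Deviations from the 1-2 frame latency guidelines (threshold assumed)."""
    issues = []
    if b_frames > 0:
        issues.append("disable B-frames")
    if lookahead_frames > 0:
        issues.append("disable look-ahead")
    if vbv_buffer_bits > single_frame_vbv_bits(bitrate_bps, fps, max_frames):
        issues.append("reduce VBV buffer size")
    return issues
```

At 6 Mbps and 60 fps, a single-frame VBV is 100,000 bits. A configuration with 2 B-frames, a 20-frame look-ahead and a 1 Mbit buffer would be flagged on all three counts.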
