NVIDIA VIDEO TECHNOLOGIES
Abhijit Patait, 5/8/2017

AGENDA
➢ NVIDIA Video Technologies
➢ New SDK Release
➢ Major Focus Areas
➢ Video SDK Features
➢ Software Flow
➢ FFmpeg
➢ Performance and Benchmarking Tips
➢ Benchmarks

NVIDIA VIDEO TECHNOLOGIES

VIDEO CODEC SDK
A comprehensive set of APIs for GPU-accelerated video encode and decode
The SDK consists of two hardware acceleration interfaces:
➢ NVENCODE API for video encode acceleration
➢ NVDECODE API for video decode acceleration (formerly called NVCUVID API)
NVIDIA Video Codec SDK technology is used to stream video with NVIDIA ShadowPlay running on NVIDIA GPUs
Independent of CUDA/3D cores on GPU

NVIDIA VIDEO TECHNOLOGIES
SOFTWARE
➢ FFMPEG & LIBAV: Easy access to NVIDIA GPU hardware acceleration
➢ VIDEO CODEC SDK: A comprehensive set of APIs for GPU-accelerated video encode and decode for Windows and Linux; CUDA, DirectX, OpenGL interoperability
➢ NVIDIA DRIVER
HARDWARE
➢ NVENC: Independent hardware encoder function
➢ NVDEC: Independent hardware decoder function

NVIDIA VIDEO TECHNOLOGIES
Decode HW* (NVDEC)
➢ Formats: MPEG-2, VC1, VP8, VP9, H.264, H.265, Lossless
➢ Bit depth: 8 bit, 10 bit
➢ Color**: YUV 4:2:0
➢ Resolution: Up to 8K***
Encode HW* (NVENC)
➢ Formats: H.264, H.265, Lossless
➢ Bit depth: 8 bit, 10 bit
➢ Color**: YUV 4:4:4, YUV 4:2:0
➢ Resolution: Up to 8K***
[Diagram elements: CPU, Buffer, CUDA Cores, NVDEC, NVENC]
* See support diagram for previous NVIDIA HW generations
** 4:2:2 is not natively supported on HW
*** Support is codec dependent

VIDEO SDK EVOLUTION
[Timeline figure, 2014 to 2017: Video SDK releases 4.0, 5.0, 6.0, 7.x, and 8.0 across the Maxwell 1, Maxwell 2, and Pascal GPU generations. Features named on the timeline: H.264, HEVC, 4:4:4, lossless, ME-only, ARGB, 10-bit encode, 10-bit transcode, 10/12-bit decode, FFmpeg, ME-only for VR, OpenGL, Dec+Enc, Dec. optimizations, WP, AQ, Enc. Quality, Quality+, Quality++, Perf++.]

MAJOR FOCUS AREAS

VIDEO TRANSCODING
Performance/Watt
➢ Content variety
➢ Codecs, resolutions, quality, bitrate
➢ Live, VOD, ultra-low-latency, broadcast, archives
➢ Pre-encoded or encoded-on-demand
➢ Performance/Watt

GAME/APP STREAMING
Ultra-low-latency
Stream
➢ Interactive, single frame latency
➢ Capture: NvFBC, Encode: NvENC, Decode: NvDEC
➢ 4K, HDR
Record, Broadcast
➢ Quality

GPU VIRTUALIZATION
Quality & reliability
➢ Capture + encode
➢ Low-latency
➢ H.264, HEVC
➢ 4:2:0, 4:4:4, lossless
➢ Multiple displays

MOTION-ESTIMATION ONLY MODE
Accuracy
➢ Video frame interpolation
➢ Camera stitching (mono to stereo)
➢ Camera stabilization
➢ Computer vision
[Figure: frames # N, N+1, N+1.5, N+2]
Frame #(N+1.5) is interpolated based on motion vectors between frame #N and frame #(N+1)

VIDEO SDK FEATURES

ENCODE FEATURES (1/2)
H.264 | HEVC | Use-case
Base, Main, High | Main, Main10 | Baseline standards
8-bit | 8-bit, 10-bit | 10-bit for HDR
B-frames | No B-frames | Higher compression & quality
Up to 4096 × 4096 | Up to 8192 × 8192 | High-res
YUV 4:2:0, 4:4:4 (H.264 and HEVC) | Subsampled or full-res chroma (e.g. wireframes)
Lossless (H.264 and HEVC) | High-quality archiving
Error resiliency: Intra refresh, LTR, ref-pic invalidation (H.264 and HEVC) | Handle streaming bit errors

ENCODE FEATURES (2/2)
Feature (H.264 and HEVC) | Use-case
Rate control modes: 1-pass, 2-pass | Quality vs performance
Look-ahead | Efficient bit distribution across GOP; higher quality
Adaptive quantization, ∆QP | Finer quality control
Weighted prediction (SDK 8.0) | Fade-in/fade-out, explosion
RGB inputs | Direct NVFBC interoperability
ME-only mode, MV-hints (SDK 8.0) | Motion stabilization, optical flow for VR stereo stitching, frame interpolation
1-3 NVENCs per chip | High throughput
CUDA, DX, OGL (Linux) (SDK 8.0) | Easy integration

DECODE FEATURES
Feature | Use-case
MPEG2, VC-1, MPEG-4, H.264, HEVC, VP8, VP9 | Baseline standards
8-bit (all codecs), 10/12 bit (HEVC, VP9) (SDK 8.0) | HDR decoding
Up to 8192 × 8192 for HEVC, 4096 × 4096 for H.264 | High-res
Error resiliency and concealment | Internet streaming

VIDEO SDK – CONTENTS (1/2)
➢ Header, documentation, sample applications
➢ Binaries (.dll, .so) in NVIDIA display driver
➢ Unified API for Windows & Linux
➢ NVIDIA developer zone
➢ Encode limitations
    ➢ Unconstrained: Tesla, GRID, Quadro ≥ X2000 (X = K, M, P)
    ➢ 2 sessions/system: GeForce, Quadro < X2000
➢ No decode limitations

VIDEO SDK – CONTENTS (2/2)
Sample Applications
➢ Decode: DX9, DX11, CUDA, OpenGL
➢ Encode: Basic functionality, features (NvEncoder)
➢ Encode: Performance (NvEncoderPerf)
➢ Encode: CUDA interop, D3D interop, OGL interop
➢ Encode: Low-latency (NvEncoderLowLatency)
➢ Transcode (NvTranscoder)
➢ Coming soon: Reusable classes

FFMPEG/LIBAV
➢ Major SW focus area for past 6 months
➢ Feature parity with Video SDK 7.1, SDK 8.0 post GTC
➢ End-to-end FFmpeg transcoding @ best possible quality & perf

SOFTWARE FLOW

ENCODE APP FLOW
[Diagram elements: client application (initialize, configure, encode; receives the encoded bitstream), NVENC API, driver (OpenGL, DirectX, CUDA; configures the HW), NVENC firmware + hardware (HW encode). Interop paths: OpenGL-CUDA interop, NVENC-CUDA interop.]

ENCODE APP FLOW
Step | API functions | Structures | Details
Open encode session | NvEncOpenEncodeSessionEx | | Device type: CUDA, DirectX, OpenGL
Query capabilities | NvEncGetEncodeCaps, NvEncGetInputFormats, NvEncGetEncodePresetGUIDs | | Codec, presets, features
Initialize encoder | NvEncInitializeEncoder | NV_ENC_INITIALIZE_PARAMS, NV_ENC_CONFIG_H264/HEVC, NV_ENC_RC_PARAMS | W/H, framerate, preset, RC, codec-specific params
Allocate buffers | NvEncRegisterResource | NV_ENC_REGISTER_RESOURCE | Internal/external: DIRECTX, CUDADEVICEPTR, OPENGL_TEX
Encode picture | NvEncEncodePicture | NV_ENC_PIC_PARAMS | Picture-level config parameters
Retrieve bitstream | NvEncLockBitstream, NvEncUnlockBitstream | | Synchronous (Win/Linux), Async (Win)
Clean-up | NvEncUnregisterResource | | Buffers, session, device
APIs and structures are defined in nvEncodeAPI.h

DECODE APP FLOW
[Diagram elements: video source, demux, parser, callbacks, client application, NVDECODE API, driver, NVDEC, bitstream, YUV/RGB frames, DX, CUDA. Legend: data flow, decode API calls.]

DECODE APP FLOW
Step | API functions | Structures | Details
Query capabilities | cuvidGetDecoderCaps() | CUVIDDECODECAPS | Codecs, resolutions supported
Create decoder | cuvidCreateDecoder() | CUVIDDECODECREATEINFO | W/H, scaling, bit-depth
Decode picture | cuvidDecodePicture() | CUVIDPICPARAMS | Picture parameters from bitstream parser
Post-processing | CUDA kernels | | Scaling, CSC, etc.
Clean-up | cuvidDestroyDecoder() | |
APIs and structures are defined in dynlink_nvcuvid.h, dynlink_cuviddec.h

FFMPEG APP FLOW
ffmpeg -y -vsync 0 -hwaccel cuvid -c:v h264_cuvid -i input.mp4 -c:a copy -vf scale_npp=1280:720 -c:v h264_nvenc -b:v 5M output.mp4
➢ Chain of filters: Input → Decode (h264_cuvid) → Post-processing: Scale (scale_npp=x:y) → Encode (h264_nvenc) → Output
➢ -hwaccel cuvid: Use end-to-end NVIDIA hardware acceleration
➢ h264_cuvid: Use NVCUVID/NVDECODE
➢ h264_nvenc: Use NVENCODE
➢ scale_npp: High-perf CUDA scaling

HARDWARE ACCELERATED TRANSCODE USING FFMPEG

PERFORMANCE CONSIDERATIONS - FFMPEG
➢ Minimize memory (PCIe) transfers
➢ Saturate on-chip encoder/decoder
➢ Efficient M:N command line
➢ Minimize I/O
➢ Encode settings
➢ GPU clocks

SW TRANSCODE
ffmpeg -c:v h264 -i input.mp4 -c:a copy -c:v h264 -b:v 5M output.mp4
[Diagram: bitstream → SW decode → YUV → SW encode → bitstream, all in system memory]
32 fps*
*1:2 transcode, fps per session; 4 GHz Intel i7-6700K

SW TRANSCODE + SCALE
ffmpeg -c:v h264 -i input.mp4 -vf scale=1280:720 -c:a copy -c:v h264 -b:v 5M output.mp4
[Diagram: bitstream → SW decode → YUV → preprocess (e.g. scaling) → YUV → SW encode → bitstream, all in system memory]
29 fps*
*1:2 transcode, fps per session; 4 GHz Intel i7-6700K

GPU UNOPTIMIZED TRANSCODE
ffmpeg -y -vsync 0 -c:v h264_cuvid -i input.mp4 -c:a copy -c:v h264_nvenc -b:v 5M output.mp4
[Diagram: bitstreams in system memory; NVDEC decode and NVENC encode with YUV frames in GPU memory; two PCIe transfers between system memory and the GPU]
288 fps*
*1:2 transcode, fps per session; GP104 GPU

GPU UNOPTIMIZED TRANSCODE + CPU SCALE
ffmpeg -y -vsync 0 -c:v h264_cuvid -i input.mp4 -c:a copy -vf scale=1280:720 -c:v h264_nvenc -b:v 5M output.mp4
[Diagram: bitstreams and preprocess (e.g. scaling) in system memory; NVDEC decode and NVENC encode with YUV frames in GPU memory; two PCIe transfers between system memory and the GPU]
76 fps*
*1:2 transcode, fps per session; GP104 GPU

HIGH-PERF GPU OPTIMIZED TRANSCODE
ffmpeg -y -vsync 0 -hwaccel cuvid -c:v h264_cuvid -i input.mp4 -c:a copy -vf scale_npp=1280:720 -c:v h264_nvenc -b:v 5M output.mp4
[Diagram: bitstreams in system memory; NVDEC decode → YUV → preprocess (scaling in CUDA) → YUV → NVENC encode, all in GPU memory]
472 fps*
*1:2 transcode, fps per session; GP104 GPU
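The per-session frame rates quoted on the five transcode slides can be compared directly. The short Python sketch below, which is an illustration and not part of the deck, computes the ratios between them. The dictionary keys and the speedup helper are hypothetical names; the numbers are the fps figures from the slides, measured on the hardware each slide names (4 GHz Intel i7-6700K for software, GP104 GPU for hardware).

```python
# Frames per second per session, as quoted on the slides.
FPS = {
    "sw": 32,                    # SW transcode
    "sw_scale": 29,              # SW transcode + scale
    "gpu_unoptimized": 288,      # GPU unoptimized transcode
    "gpu_cpu_scale": 76,         # GPU unoptimized transcode + CPU scale
    "gpu_optimized_scale": 472,  # high-perf GPU optimized transcode (CUDA scale)
}


def speedup(fast, slow):
    """Ratio of two quoted frame rates (hypothetical helper)."""
    return FPS[fast] / FPS[slow]


print(f"GPU vs SW, no scaling:            {speedup('gpu_unoptimized', 'sw'):.1f}x")
print(f"GPU + CPU scale vs SW + scale:    {speedup('gpu_cpu_scale', 'sw_scale'):.1f}x")
print(f"GPU optimized vs SW + scale:      {speedup('gpu_optimized_scale', 'sw_scale'):.1f}x")
print(f"GPU optimized vs GPU + CPU scale: {speedup('gpu_optimized_scale', 'gpu_cpu_scale'):.1f}x")
```

The last ratio isolates the effect of keeping scaling on the GPU: the decode and encode hardware are the same in both cases, so the difference comes from where the frames are scaled and how often they cross PCIe.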

PERFORMANCE CONSIDERATIONS
Saturating encoder/decoder
➢ Pipelining
➢ Input/output buffers
➢ Tools: nvidia-smi, Microsoft GPUView

ANALYZING PERFORMANCE BOTTLENECKS
Microsoft GPUView (Windows only)

ANALYZING PERFORMANCE BOTTLENECKS
nvidia-smi (Windows & Linux)
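One way to check whether the encoder and decoder are saturated is to log utilization with nvidia-smi and inspect the enc and dec columns. The sketch below is an assumption-laden illustration, not something shown in the deck: it assumes the text was captured with a command such as nvidia-smi dmon -s u, that the first "#" line names the columns, and that enc and dec columns are present (column sets vary by driver version). The SAMPLE text is made up to illustrate the layout, and parse_dmon and peak are hypothetical helper names.

```python
# Made-up sample in the layout assumed for `nvidia-smi dmon -s u` output.
SAMPLE = """\
# gpu    sm   mem   enc   dec
# Idx     %     %     %     %
    0    12     8    97    45
    0    10     7    99    48
"""


def parse_dmon(text):
    """Parse dmon-style text into a list of dicts keyed by column name."""
    columns = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The first header line names the columns; later "#" lines
            # (units, repeated headers) are skipped.
            if columns is None:
                columns = line.lstrip("#").split()
            continue
        values = line.split()
        if columns is None or len(values) != len(columns):
            continue
        rows.append(dict(zip(columns, values)))
    return rows


def peak(rows, column):
    """Highest numeric value seen in a column, or None if there is none."""
    values = [int(r[column]) for r in rows if r.get(column, "").isdigit()]
    return max(values) if values else None


rows = parse_dmon(SAMPLE)
print("peak enc %:", peak(rows, "enc"))
print("peak dec %:", peak(rows, "dec"))
```

If the peak encoder or decoder utilization stays well below 100% during a transcode, the pipeline is likely limited elsewhere (I/O, PCIe transfers, or too few concurrent sessions) rather than by NVENC or NVDEC.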
