HIGH-PERFORMANCE GPU VIDEO ENCODING: ABHIJIT PATAIT, SR. MANAGER, NVIDIA


slide-1
SLIDE 1

ABHIJIT PATAIT

  • SR. MANAGER, NVIDIA

HIGH-PERFORMANCE GPU VIDEO ENCODING

slide-2
SLIDE 2

AGENDA

  • GPU Video Encoding Overview
  • NVIDIA Video Encoding Capabilities
      • Kepler, Maxwell Gen 1, Maxwell Gen 2
  • Software API
  • Performance & Quality
  • Roadmap

slide-3
SLIDE 3

WHY GPU VIDEO ENCODING?

slide-4
SLIDE 4

BENEFITS

  • Low power
      • Fixed function hardware, free CPU
      • Reduced memory transfers
  • Low latency
  • High performance
  • Higher density
  • Scalability
      • Automatic benefit from improvements in hardware
  • Ease of programming
      • Linux, Windows, C/C++, Application portability

slide-5
SLIDE 5

NVIDIA VIDEO ENCODER CAPABILITIES

slide-6
SLIDE 6

MAIN FEATURES

Feature                          | Benefits
H.264 base, main, high profiles  | Wide range of use-cases
H.265/HEVC main profile          | Lower bitrates at same quality
High performance (4K @ 60 fps)   | “Blazing-speed” encoding
YUV 4:2:0 and 4:4:4 support      | High quality encoding without chroma subsampling
QP maps                          | Customizable quality, region of interest encoding
4K encoding in hardware          | High resolution encode
API - NV Encode SDK & GRID SDK   | Flexible, Win/Linux, DirectX/CUDA
Independent of CUDA              | Use CUDA and encode simultaneously

slide-7
SLIDE 7

FEATURE COMPARISON

Kepler | Maxwell Gen 1 (GM10x) | Maxwell Gen 2 (GM20x)
H.264 only | H.264 only | H.264 and HEVC/H.265
Planar 4:4:4 & proprietary 4:4:4; no lossless encoding | Standard 4:4:4 and H.264 lossless encoding | Standard 4:4:4 and H.264 lossless encoding
~240 fps 2-pass encoding @ 720p | ~500 fps 2-pass encoding @ 720p | ~900 fps 2-pass encoding @ 720p
GRID K340/K520, K1/K2, Quadro, Tesla K10/K20 | Maxwell-based GRID & Quadro products | TBA
GeForce – 2 full-speed encode sessions/system | GeForce – 2 full-speed encode sessions/system | GeForce – 2 full-speed encode sessions/system
NV Encode SDK 1.0-5.0 (Now) | NV Encode SDK 4.0+ (Now) | NV Encode SDK 5.0+ (Now)
GRID SDK 1.x, 2.2, 2.3 (Now) | GRID SDK 3.0+ (Now) | In development

slide-8
SLIDE 8

WHAT’S NEW – HARDWARE

HEVC

  • 8-bit encoding
  • Main8 profile
  • Optimized for low-latency applications (I and P frames)
  • > 300 fps at very high quality, 720p

H.264

  • Improved performance (~80% higher compared to 1st Gen Maxwell)
  • 4:4:4 and lossless
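
As a quick sanity check on the "~80% higher" figure, the sketch below recomputes the generation-over-generation speedup from the approximate 2-pass 720p rates quoted on the feature comparison slide (~240, ~500, ~900 fps). It assumes the 80% claim refers to that same 2-pass 720p measurement, which the slides do not state explicitly; the names FPS_720P_2PASS and speedups are illustrative only.

```python
# Approximate 2-pass 720p encode rates (fps) from the feature comparison slide.
FPS_720P_2PASS = {
    "Kepler": 240,
    "Maxwell Gen 1": 500,
    "Maxwell Gen 2": 900,
}


def speedups(rates):
    """Return (older, newer, ratio) for each consecutive pair of generations."""
    names = list(rates)
    return [(a, b, rates[b] / rates[a]) for a, b in zip(names, names[1:])]


for older, newer, ratio in speedups(FPS_720P_2PASS):
    percent = (ratio - 1) * 100
    print(f"{newer} vs {older}: {ratio:.2f}x ({percent:.0f}% higher)")
```

Under that assumption, Maxwell Gen 2 comes out at 1.8x Maxwell Gen 1, which agrees with the ~80% figure.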

slide-9
SLIDE 9

WHAT’S NEW - SOFTWARE

NVENC SDK 5.0

  • NVIDIA GPU driver 347.18 and above
  • HEVC
      • Unified API for H.264 and HEVC
      • Linux & Windows
      • Intra refresh, ref-pic invalidation, etc. for H.264 and HEVC
  • Support for all NVENC hardware up to GM20x
  • Adaptive quantization
  • Quality improvements
  • All-new sample applications, including a performance application

slide-10
SLIDE 10

SOFTWARE API

slide-11
SLIDE 11

USING NVENC

NVENC SDK (Direct Encode)

  • No capture
  • Transcoding
  • Archiving
  • Video editing
  • CUDA pre-process + encoding
  • Granular encoder settings
  • D3D, CUDA interop

GRID SDK (Capture + Encode)

  • Capture + encode
  • Optimized for low-latency apps
  • Capture + CUDA pre-process + encoding
  • Encoder settings
  • Optimized for streaming
  • D3D, CUDA interop

slide-12
SLIDE 12

DIRECT ENCODE (NVENC SDK)

[Block diagram. Components: Client application; NVENC API; NVENC Driver, DirectX Driver, CUDA Driver; NVENC firmware + hardware. Arrow labels: Configure, Encode; Initialize, Configure HW; HW Encode; Encoded bitstream.]

slide-13
SLIDE 13

CAPTURE & ENCODE (GRID SDK)

[Block diagram. Components: Client application; NvFBC/NvIFR; NVENC Driver; DirectX/OGL Driver; NVENC Hardware; GPU 3D Engine. Arrow labels: DX/OGL Present; Capture; YUV; Encode; Encoded Bitstream.]

slide-14
SLIDE 14

NVENC SDK (1/2)

  • Available on NVIDIA developer zone
      • https://developer.nvidia.com/nvidia-video-codec-sdk
  • Current release: 5.0
  • Interface header, documentation, sample application
  • .dll/.so included in the driver
  • Unified API for Windows and Linux
  • Works on x86/x64
  • APIs, presets, rate control modes for
      • Low-latency streaming
      • Transcoding
      • Video conferencing

slide-15
SLIDE 15

NVENC SDK (2/2)

  • Unified API for H.264 and HEVC
  • Flexibility
      • Dynamic resolution/bitrate change
      • Low-level encoder settings
      • Windows, Linux, DirectX, CUDA, OGL (via CUDA)
      • Works on GeForce (2 sessions/system)
  • Error concealment
      • Reference picture invalidation
      • Intra-refresh
  • Greater flexibility for quality/performance trade-off
  • Lossless encoding only in NVENC SDK

slide-16
SLIDE 16

GRID SDK ENCODE

  • NDA only – older release available on NV developer zone
      • https://developer.nvidia.com/grid-app-game-streaming
  • Current release: 3.1 (Now – NDA), 2.3 (Public)
  • Interface header, documentation, sample apps
  • .dll/.so included in the driver
  • Windows and Linux
  • Works on x86/x64
  • Presets and APIs for
      • Remote graphics (Cloud gaming, remote desktop, capture & stream)
      • Optimized for low latency

slide-17
SLIDE 17

QUALITY

slide-18
SLIDE 18

H.264 QUALITY – 1-PASS ENCODING

[Chart: H.264 quality with 1-pass rate control. X axis: bitrate (Mbps), 6 to 20. Y axis: PSNR (dB), 34 to 46. One curve per encoder preset: Default, LL-Default, HP, HQ, BD, LL-HQ.]
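
The quality charts report PSNR in dB. For reference, a minimal sketch of the standard PSNR definition for 8-bit samples follows. The slides do not say how PSNR was measured (for example luma only or all planes, per frame or averaged), so the function below is a generic illustration and the sample values are made up, not data from the presentation.

```python
import math


def psnr(reference, decoded, peak=255):
    """PSNR in dB between two equal-length sequences of 8-bit sample values."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)


# Toy 8-sample "frame" (hypothetical values, not taken from the slides).
original = [16, 32, 64, 128, 200, 220, 235, 240]
decoded = [17, 31, 66, 127, 198, 221, 236, 238]
print(f"PSNR = {psnr(original, decoded):.2f} dB")
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.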

slide-19
SLIDE 19

H.264 QUALITY – 2-PASS ENCODING

[Chart: H.264 quality with 2-pass rate control. X axis: bitrate (Mbps), 6 to 20. Y axis: PSNR (dB), 34 to 46. One curve per encoder preset: Default, LL-Default, HP, HQ, BD, LL-HQ.]

slide-20
SLIDE 20

COMPARISON: 1-PASS VS 2-PASS

[Chart: H.264 quality comparison: 1-pass vs 2-pass. X axis: encoder preset (Default, LL-Default, HP, HQ, BD, LL-HQ). Y axis: PSNR (dB), 37.5 to 40.5. Two bars per preset: 1-pass and 2-pass.]

slide-21
SLIDE 21

BITRATE SAVINGS

Bitrate savings - Default preset

PSNR     | H.264 bitrate | HEVC bitrate | Bitrate savings
39.5 dB  | 6             | 4            | 33%
41.0 dB  | 8             | 6            | 25%
42.0 dB  | 9.8           | 8            | 18%

Bitrate savings - HQ preset

PSNR     | H.264 bitrate | HEVC bitrate | Bitrate savings
39.5 dB  | 5.8           | 3.9          | 33%
41.0 dB  | 7.8           | 5.8          | 26%
42.0 dB  | 9.7           | 7.9          | 19%
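
The savings percentages on this slide can be reproduced from the bitrate pairs on the chart. The sketch below assumes that the larger value in each pair is the H.264 bitrate and the smaller one the HEVC bitrate at the same PSNR, and that savings are expressed relative to H.264; the slide does not spell either out. The names DEFAULT_PRESET, HQ_PRESET and savings_percent are illustrative.

```python
# (PSNR in dB, H.264 bitrate, HEVC bitrate) as read from the chart.
# Assumption: the larger value of each pair belongs to H.264.
DEFAULT_PRESET = [(39.5, 6.0, 4.0), (41.0, 8.0, 6.0), (42.0, 9.8, 8.0)]
HQ_PRESET = [(39.5, 5.8, 3.9), (41.0, 7.8, 5.8), (42.0, 9.7, 7.9)]


def savings_percent(h264_bitrate, hevc_bitrate):
    """Bitrate saved by HEVC, as a percentage of the H.264 bitrate."""
    return (h264_bitrate - hevc_bitrate) / h264_bitrate * 100


for name, rows in (("Default", DEFAULT_PRESET), ("HQ", HQ_PRESET)):
    for psnr_db, h264, hevc in rows:
        print(f"{name:7s} {psnr_db} dB: {savings_percent(h264, hevc):.0f}% saved")
```

Rounded to whole percents, this gives 33/25/18 for the Default preset and 33/26/19 for the HQ preset, matching the numbers on the slide.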

slide-22
SLIDE 22

H.264 VS HEVC

Courtesy: Vanguard video

slide-23
SLIDE 23

H.264 VS HEVC

Courtesy: Vanguard video

slide-24
SLIDE 24

PERFORMANCE

slide-25
SLIDE 25

H.264 PERFORMANCE – GM20X

H.264 Performance (1080p), GM20x, encode FPS

Preset | Single pass | Two pass
HP     | 464 fps     | 306 fps
LL-HP  | 342 fps     | 246 fps
HQ     | 291 fps     | 171 fps
LL-HQ  | 293 fps     | 181 fps
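
To relate these rates to encode density, the sketch below estimates the throughput cost of 2-pass encoding and how many real-time 1080p streams each rate could sustain. It assumes 30 fps per stream and that the aggregate rate divides evenly across concurrent sessions; neither assumption comes from the slides, and the function names are illustrative.

```python
# 1080p H.264 encode rates on GM20x (fps): preset -> (single pass, two pass).
H264_1080P_FPS = {
    "HP": (464, 306),
    "LL-HP": (342, 246),
    "HQ": (291, 171),
    "LL-HQ": (293, 181),
}


def two_pass_slowdown(single_fps, two_fps):
    """Fraction of single-pass throughput lost when the second pass is enabled."""
    return 1 - two_fps / single_fps


def realtime_streams(fps, stream_fps=30):
    """Whole real-time streams that fit in the rate (assumes linear scaling)."""
    return fps // stream_fps


for preset, (single, two) in H264_1080P_FPS.items():
    loss = two_pass_slowdown(single, two) * 100
    print(f"{preset:6s} 2-pass costs {loss:.0f}%; "
          f"~{realtime_streams(single)} streams 1-pass, "
          f"~{realtime_streams(two)} streams 2-pass at 30 fps")
```

For the HP preset this works out to about 15 single-pass or 10 two-pass 1080p30 streams, with the second pass costing roughly a third of the throughput.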

slide-26
SLIDE 26

H.264/HEVC PERF COMPARISON

H.264/HEVC Performance: 2-pass, encode FPS

Preset | H.264   | H.265
HP     | 306 fps | 220 fps
LL-HP  | 246 fps | 214 fps
HQ     | 171 fps | 102 fps
LL-HQ  | 181 fps | 153 fps
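
The relative cost of HEVC can be read off these numbers directly. The sketch below computes HEVC throughput as a fraction of H.264 throughput per preset, using only the 2-pass values on this slide; TWO_PASS_FPS and hevc_relative are illustrative names.

```python
# 2-pass encode rates (fps): preset -> (H.264, HEVC/H.265).
TWO_PASS_FPS = {
    "HP": (306, 220),
    "LL-HP": (246, 214),
    "HQ": (171, 102),
    "LL-HQ": (181, 153),
}


def hevc_relative(h264_fps, hevc_fps):
    """HEVC throughput as a fraction of H.264 throughput."""
    return hevc_fps / h264_fps


for preset, (h264, hevc) in TWO_PASS_FPS.items():
    print(f"{preset:6s} HEVC runs at {hevc_relative(h264, hevc) * 100:.0f}% of H.264 speed")
```

By this measure HEVC retains between roughly 60% (HQ) and 87% (LL-HP) of the H.264 encode rate.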

slide-27
SLIDE 27

PERFORMANCE - TREND

[Chart: Performance trend. Y axis: encode rate in fps, up to 1800 fps. X axis: Kepler (2011), Maxwell Gen 1 (2013), Maxwell Gen 2 (2014), Future.]

slide-28
SLIDE 28

ROADMAP

slide-29
SLIDE 29

ROADMAP

  • Core GPU chip IP
  • Motion estimation only mode – 2H2015
  • SAO, 10/12-bit, HEVC B-frames
  • Lossless/4:4:4
  • Improved quality for screen content encoding
  • ME performance and quality enhancements
  • Today: 4K@60fps
  • Next: 8K@??
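
To put "Today: 4K@60fps" and "Next: 8K@??" in perspective, the sketch below compares raw pixel rates. It assumes 4K means 3840x2160 and 8K means 7680x4320, which the slides do not specify, and it treats encoder load as proportional to pixel rate, which is only a rough approximation.

```python
def pixel_rate(width, height, fps):
    """Pixels per second the encoder must process at this resolution and rate."""
    return width * height * fps


# Assumed resolutions (not stated on the slides).
UHD_4K = (3840, 2160)
UHD_8K = (7680, 4320)

rate_4k60 = pixel_rate(*UHD_4K, 60)
print(f"4K @ 60 fps: {rate_4k60 / 1e6:.1f} Mpixel/s")

# Frame rate at 8K that would need the same pixel rate as 4K @ 60 fps.
same_budget_fps = rate_4k60 / pixel_rate(*UHD_8K, 1)
print(f"8K at the same pixel budget: {same_budget_fps:.0f} fps")
```

Since an 8K frame has four times the pixels of a 4K frame, today's 4K@60 pixel budget corresponds to about 15 fps at 8K under these assumptions.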

slide-30
SLIDE 30

THANK YOU

APATAIT@NVIDIA.COM