S9884 USER EXPERIENCE IS KEY TO VDI SUCCESS, COLOR ACCURACY IS THE KEY TO USER EXPERIENCE



slide-1
SLIDE 1

Nachiket Karmakar – Sr. Performance Engineer - NVIDIA

S9884 USER EXPERIENCE IS KEY TO VDI SUCCESS, COLOR ACCURACY IS THE KEY TO USER EXPERIENCE

slide-2
SLIDE 2 2

SESSION TARGET

  • CITRIX PROTOCOL OVERVIEW
  • PROTOCOL/CODEC USAGE SCENARIOS
  • IMAGE QUALITY HUMAN EYE & SSIM MEASUREMENT FOR H.264
  • BANDWIDTH COMPARISON FOR VIDEO USE CASE
  • VDI ON SCALE TESTING
  • WRAP-UP

Why is it key to choose the right protocol to get the best user experience

slide-3
SLIDE 3 3

PROTOCOL & CODECS

Citrix XenDesktop 7.18

Video Codec Policy | Region | Visual Quality | Codecs Used | HW ENC*
Do Not Use | Region optimized | Medium | Static: JPEG (90) + 2D/MDRLE; Video: Adaptive JPEG (10-65) | No
For Entire Screen | Entire Screen | Medium | H.264 4:2:0 | Yes
For act. changing regions | Region optimized | Medium | Static: JPEG (90) + 2D/MDRLE; Video: H.264 4:2:0 | Yes
H.264+TextOptimization** | Entire Screen | Medium | H.264 4:2:0 + Lossless Text | No
For Entire Screen | Entire Screen | Build To Lossless | H.264 4:2:0 during activity, 2D/MDRLE when stationary | Yes
For Entire Screen | Entire Screen | Visual lossless: Medium | H.264 4:4:4 | Yes
For Entire Screen (H.265) | Entire Screen | Medium | H.265 4:2:0 | Yes
For act. changing regions (H.265) | Region optimized | Medium | Static: JPEG (90) + 2D/MDRLE; Video: H.265 4:2:0 | Yes
For act. changing regions (H.265) | Entire Screen | Build To Lossless | H.265 4:2:0 during activity, 2D/MDRLE when stationary | Yes

* video codec (H.264/H.265) part via NVENC
** no policy available for TextOpt

slide-4
SLIDE 4 4

CODECS & USE CASE

Bitmap (JPG, RLE)

  • 2DRLE/MDRLE for text/crisp areas, JPEG for photographic imagery
  • "Build to Lossless" and "Always Lossless" policies for pixel perfect quality
  • Many compression policies (image quality, color depth, etc.)
  • Can utilize client side bitmap cache
  • No hardware encoding (NVENC)
  • Very bandwidth efficient for static content
  • What to use when: Office VDI usage

H.264 YUV 4:2:0

  • Good compression and visual quality
  • Hardware encoding (NVENC)
  • Chroma subsampling yields blurred text
  • Bandwidth efficient for video/moving images
  • What to use when: 3D VDI usage

H.264 YUV 4:4:4

  • Very good visual quality
  • Hardware encoding (NVENC)
  • No chroma subsampling
  • Great for sharp graphics as well as text
  • Increase in bandwidth
  • What to use when: 3D VDI usage with high color accuracy requirements

H.265 YUV 4:2:0

  • Better compression at the same visual quality, or the same quality at lower bandwidth (compared to H.264)
  • Requires hardware encoding (NVENC)
  • No CPU encoding, as it would be too cost-intensive (~8x CPU load compared to H.264)
  • Requires specific endpoint capabilities to decode H.265. Use 3rd party tools like DXVAChecker to see if your endpoint is capable
  • What to use when: 3D VDI usage in low bandwidth scenarios
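The note that chroma subsampling yields blurred text can be illustrated with a minimal sketch. It assumes a deliberately simplified 4:2:0 model (average each 2x2 block of a chroma plane, then replicate it back on decode); real H.264/H.265 encoders use their own filters and add quantization on top. The function names are made up for this illustration.

```python
def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (simplified 4:2:0 model)."""
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def upsample_nearest(plane):
    """Replicate every chroma sample into a 2x2 block, as a naive decoder might."""
    out = []
    for row in plane:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# 4x4 chroma plane with a one-pixel-wide colored vertical stroke
# (think: colored text on gray). 128 is neutral chroma, 240 is saturated.
chroma = [[128, 240, 128, 128] for _ in range(4)]

roundtrip_420 = upsample_nearest(subsample_420(chroma))
roundtrip_444 = [list(row) for row in chroma]  # 4:4:4 keeps every chroma sample

print("4:2:0:", roundtrip_420[0])  # the stroke is smeared across two pixels
print("4:4:4:", roundtrip_444[0])  # the stroke is untouched
```

In this toy model the single saturated column is spread over two columns at reduced strength, which is the color fringing seen on thin text and wireframe lines with 4:2:0.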

slide-5
SLIDE 5 5

CODECS & USE CASE

Mixed Mode (Video and Bitmap)

Adaptive Display / Selective H.264/H.265

  • "Hybrid": Use the best available codec for a specific screen "region"
  • Leverages hardware encoding H.264/H.265 (NVENC) for video regions (a.k.a. "Selective H.264"). If HW encoding is not available, software H.264 encoding is used.
  • Very good image quality for static content (Bitmap) and low bandwidth requirement for moving images/video (H.264/H.265)
  • What to use when: Office VDI usage with multimedia content

H.264/H.265 / Build to Lossless (NEW with 7.18)

  • Hardware encoding (NVENC) for video codec usage
  • "Sharpening" effect when changing from moving to static content but pixel perfect quality
  • Chroma subsampling less problematic as it is used only for moving images/video
  • What to use when: 3D VDI usage with high color accuracy requirements and low bandwidth
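The "What to use when..." guidance from the two CODECS & USE CASE slides can be collected into a small lookup. This is only a sketch: the dictionary keys and the `recommend` helper are hypothetical names, and the pairing of use cases with codecs on the first slide is assumed from the column order of the slide.

```python
# Use case -> codec recommendation, transcribed from the "What to use when..." rows.
# Keys are illustrative labels, not Citrix policy names.
RECOMMENDATIONS = {
    "office": "Bitmap (JPG, RLE)",
    "office_multimedia": "Adaptive Display / Selective H.264/H.265",
    "3d": "H.264 YUV 4:2:0",
    "3d_color_accurate": "H.264 YUV 4:4:4",
    "3d_low_bandwidth": "H.265 YUV 4:2:0",
    "3d_color_accurate_low_bandwidth": "H.264/H.265 Build to Lossless",
}


def recommend(use_case):
    """Return the codec the slides suggest for a use case label."""
    try:
        return RECOMMENDATIONS[use_case]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDATIONS))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {known}")


print(recommend("3d_color_accurate"))
```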

slide-6
SLIDE 6 6

IMAGE QUALITY COMPARISON

slide-7
SLIDE 7 7

COMPARISON H.264

YUV4:2:0 and YUV4:4:4 (Reference Image)

slide-8
SLIDE 8 8

COMPARISON H.264

YUV4:2:0 and YUV4:4:4

Citrix YUV420 Citrix YUV444 Citrix YUV420 Citrix YUV444

slide-9
SLIDE 9 9

H.264 (STATIC TEXT)

YUV4:2:0 YUV4:4:4

slide-10
SLIDE 10 11

IMAGE QUALITY

Static Text

Image Quality (Static Text), measured as SSIM

Protocol | SSIM (Static Text)
H.264 YUV 4:2:0 (Entire Screen) | 0.83086
H.264 YUV 4:4:4 (Entire Screen) | 0.98362
Bitmap MDRLE | 0.99995
H.264 YUV 4:2:0 (Active Regions) | 0.99994
H.264 YUV 4:2:0 (Entire Screen, VQ: BTL) | 0.9999
H.264 YUV 4:2:0 (TextOptimization) | 0.99111
H.265 YUV 4:2:0 (Entire Screen) | 0.83118
H.265 YUV 4:2:0 (Entire Screen, VQ: BTL) | 0.99872
H.265 YUV 4:2:0 (Active Regions) | 0.99993
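For readers unfamiliar with SSIM, the following is a minimal sketch of the metric. It assumes a single global window over 8-bit grayscale pixels with the commonly used constants K1 = 0.01 and K2 = 0.03; the slides do not say which tool or window settings produced the scores above, and production implementations average SSIM over local sliding windows instead.

```python
def global_ssim(reference, test, data_range=255.0):
    """SSIM computed once over the whole image (no sliding window)."""
    if len(reference) != len(test) or not reference:
        raise ValueError("images must be non-empty and the same size")
    n = len(reference)
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x = sum(reference) / n
    mu_y = sum(test) / n
    var_x = sum((a - mu_x) ** 2 for a in reference) / n
    var_y = sum((b - mu_y) ** 2 for b in test) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(reference, test)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Flattened pixel rows: crisp black/white text edges vs. a softened copy.
sharp = [0, 255, 0, 255, 0, 255, 0, 255]
soft = [64, 191, 64, 191, 64, 191, 64, 191]

print(f"identical: {global_ssim(sharp, sharp):.5f}")
print(f"softened:  {global_ssim(sharp, soft):.5f}")
```

A score of 1 means the remoted image matches the reference; loss of edge contrast, as seen on text under 4:2:0, pulls the score down.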

slide-11
SLIDE 11 12

IMAGE QUALITY

Heatmaps

  • H.264 YUV 4:2:0 (Entire Screen)
  • H.264 YUV 4:2:0 (BTL)
  • H.264 YUV 4:2:0 (TextOptimization)
  • H.264 YUV 4:4:4 (Entire Screen)
  • H.265 YUV 4:2:0 (Entire Screen)
  • Bitmap Encoding (JPEG/RLE)

slide-12
SLIDE 12 13

COMPARISON H.264

YUV4:2:0 and YUV4:4:4 (Reference Image)

slide-13
SLIDE 13 14

COMPARISON H.264

YUV4:2:0 and YUV4:4:4

slide-14
SLIDE 14 15

H.264 (WIREFRAME)

YUV4:2:0 YUV4:4:4

slide-15
SLIDE 15 17

IMAGE QUALITY

Wireframe

Image Quality (Wireframe), measured as SSIM

Protocol | SSIM (Wireframe)
H.264 YUV 4:2:0 (Entire Screen) | 0.99083
H.264 YUV 4:4:4 (Entire Screen) | 0.99738
Bitmap MDRLE | 0.99158
H.264 YUV 4:2:0 (Active Regions) | 0.98559
H.264 YUV 4:2:0 (Entire Screen, VQ: BTL) | 0.99992
H.264 YUV 4:2:0 (TextOptimization) | 0.9915
H.265 YUV 4:2:0 (Entire Screen) | 0.99162
H.265 YUV 4:2:0 (Entire Screen, VQ: BTL) | 0.99994
H.265 YUV 4:2:0 (Active Regions) | 0.99144

slide-16
SLIDE 16 18

BANDWIDTH COMPARISON (VIDEO)

slide-17
SLIDE 17 19

BANDWIDTH COMPARISON

Video playback scenario

  • 141408x592 window size
  • 2:30 min duration
  • Win10 with 1920x1200 resolution, 2 vCPUs @ 3.5 GHz, P40-1B profile

slide-18
SLIDE 18 20

BANDWIDTH COMPARISON

Video playback @ 30fps

Codec | Visual Quality | Encoder CPU | Total FPS | MB transferred
Bitmap JPG/RLE | Medium | 7% | 3693 | 355 MB
H.264 YUV420 | Medium | 2% | 3736 | 220 MB
H.264 YUV444 | Medium | 3% | 3728 | 655 MB
H.264/Bitmap* | Medium | 7% | 3698 | 205 MB
H.264 | Build To Lossless | 5% | 3642 | 195 MB
H.264 TextOpt | Medium | 23% | 3448 | 160 MB
H.265 YUV420 | Medium | 2% | 3766 | 180 MB
H.265/Bitmap* | Medium | 8% | 3721 | 185 MB
H.265 | Build To Lossless | 5% | 3796 | 175 MB

*Adaptive Display (active changing regions)

slide-19
SLIDE 19 21

BANDWIDTH COMPARISON

Video playback @ 30fps

Codec | Visual Quality | Encoder CPU | Total FPS | MB transferred
Bitmap JPG/RLE | High | 8% | 3633 | 610 MB
H.264 YUV420 | High | 2% | 3719 | 210 MB
H.264 YUV444 | High | 4% | 3716 | 690 MB
H.264/Bitmap* | High | 5% | 3671 | 215 MB
H.264 | Build To Lossless | 5% | 3642 | 195 MB
H.264 TextOpt | High | 22% | 3508 | 160 MB
H.265 YUV420 | High | 3% | 3780 | 185 MB
H.265/Bitmap* | High | 7% | 3627 | 175 MB
H.265 | Build To Lossless | 5% | 3796 | 175 MB

*Adaptive Display (active changing regions)
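The totals in these tables are easier to compare with network planning figures once converted to averages. The sketch below assumes that the "Total FPS" column is the total number of frames delivered during the clip, that the 2:30 min duration from the test setup applies to every run, and that 1 MB means 10^6 bytes (the slides do not specify). Only four rows of the "High" table are included.

```python
DURATION_S = 150  # 2:30 min clip, taken from the video playback scenario


def average_mbit_per_s(megabytes, seconds=DURATION_S):
    """Average bitrate, assuming 1 MB = 10**6 bytes."""
    return megabytes * 8 / seconds


def average_fps(total_frames, seconds=DURATION_S):
    """Average delivered frame rate over the clip."""
    return total_frames / seconds


# (codec, total frames, MB transferred) from the "High" visual quality table
high_quality_runs = [
    ("Bitmap JPG/RLE", 3633, 610),
    ("H.264 YUV420", 3719, 210),
    ("H.264 YUV444", 3716, 690),
    ("H.265 YUV420", 3780, 185),
]

for codec, frames, megabytes in high_quality_runs:
    print(
        f"{codec:15s} {average_fps(frames):5.1f} fps "
        f"{average_mbit_per_s(megabytes):5.1f} Mbit/s"
    )
```

Under these assumptions, H.264 YUV444 needs roughly three times the average bitrate of H.264 YUV420 for a similar delivered frame rate.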

slide-20
SLIDE 20 22

VDI ON SCALE TESTING 24 VMS ON 1 TESLA P40

slide-21
SLIDE 21 23

TEST SYSTEM

Configuration Details

Host Configuration

  • Cisco UCS C240 M5
  • Intel Xeon Gold 6154 @ 3.00 GHz
  • VMware ESXi 6.7
  • Number of CPUs: 36 (2 x 18)
  • Memory: 768 GB
  • Storage: All-Flash SAN (iSCSI)
  • Hyperthreading, Turbo boost
  • Power Setting: High Performance
  • GPU: 1 x P40
  • GPU Scheduling Policy: Best Effort
  • NVIDIA vGPU Driver 6.2 390.72

VDI Configuration

  • vCPU: 4
  • vRAM: 4096 MB
  • NIC: 1 (E1000)
  • Hard Disk: 40 GB
  • vGPU: P40-1B
  • Virtual Hardware: vmx-14
  • FRL enabled: Yes
  • VDI agent: CITRIX XenDesktop 7.18
  • CITRIX HDX
  • Number of Screens: 2
  • Screen Resolution: 1920 x 1080

Cirrus Knowledge Worker Workload (Excel, Word, PowerPoint, Chrome, Media Player, PDF)
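The figure of 24 VMs on one Tesla P40 follows from the vGPU profile size. A quick sketch of the arithmetic, assuming density is limited only by GPU framebuffer (24 GB on the P40, 1 GB per P40-1B vGPU) and using the host and VM sizes listed above; the variable and function names are illustrative.

```python
P40_FRAMEBUFFER_MB = 24 * 1024  # Tesla P40: 24 GB of GPU memory
PROFILE_1B_MB = 1024            # P40-1B: 1 GB framebuffer per vGPU
VCPUS_PER_VM = 4                # from the VDI configuration
PHYSICAL_CORES = 36             # 2 x 18 cores, from the host configuration


def max_vms_per_gpu(gpu_framebuffer_mb, profile_mb):
    """Number of vGPUs of one profile that fit into the GPU framebuffer."""
    return gpu_framebuffer_mb // profile_mb


vms = max_vms_per_gpu(P40_FRAMEBUFFER_MB, PROFILE_1B_MB)
vcpu_per_core = vms * VCPUS_PER_VM / PHYSICAL_CORES

print(f"{vms} VMs per P40, {vcpu_per_core:.2f} vCPUs per physical core")
```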

slide-22
SLIDE 22 24

END USER LATENCY (CLICK TO PHOTON)

End User Latency, in milliseconds

Protocol | End User Latency (ms)
H.264 YUV 4:2:0 (Entire Screen) | 115
H.264 YUV 4:4:4 (Entire Screen) | 166
Bitmap JPG/RLE | 199
H.264 YUV 4:2:0 (Active Regions) | 199
H.264 YUV 4:2:0 (Entire Screen, VQ: BTL) | 116
H.264 YUV 4:2:0 (TextOptimization) | 132
H.265 YUV 4:2:0 (Entire Screen) | 115
H.265 YUV 4:2:0 (Entire Screen, VQ: BTL) | 132
H.265 YUV 4:2:0 (Active Regions) | 201

slide-23
SLIDE 23 25

TOTAL REMOTED FRAMES

Protocol | Remoted Frames (chart series: Total FPS)
H.264 YUV 4:2:0 (Entire Screen) | 11684.33333
H.264 YUV 4:4:4 (Entire Screen) | 11799.08333
Bitmap MDRLE | 13347.625
H.264 YUV 4:2:0 (Active Regions) | 13165.20833
H.264 YUV 4:2:0 (Entire Screen, VQ: BTL) | 20278.20833
H.264 YUV 4:2:0 (TextOptimization) | 11608.41667
H.265 YUV 4:2:0 (Entire Screen) | 11564.33333
H.265 YUV 4:2:0 (Entire Screen, VQ: BTL) | 20006.45833
H.265 YUV 4:2:0 (Active Regions) | 13220.5

slide-24
SLIDE 24 26

BANDWIDTH H.264

[Chart: ESX Server - Transmitted Bandwidth (Cumulative Mbits). Y axis: Mbits, up to 25000. X axis: 1 to 768.]

Series: H.264 YUV 4:2:0 (Entire Screen), H.264 YUV 4:4:4 (Entire Screen), Bitmap JPG/RLE, H.264 YUV 4:2:0 (Active Regions), H.264 YUV 4:2:0 (Entire Screen, VQ: BTL)
slide-25
SLIDE 25 27

BANDWIDTH H.265

[Chart: ESX Server - Transmitted Bandwidth (Cumulative Mbits). Y axis: Mbits, up to 25000. X axis: 1 to 769.]

Series: Bitmap JPG/RLE, H.265 YUV 4:2:0 (Entire Screen), H.265 YUV 4:2:0 (Entire Screen, VQ: BTL), H.265 YUV 4:2:0 (Active Regions)
slide-26
SLIDE 26 28

WRAP-UP

Analyzing the data led to the following...

  • H.264 BTL is a very interesting addition for different use cases. If users get used to the "sharpening" effect in their session, it is the best possible compromise between visual quality, performance and bandwidth consumption, which ultimately leads to the best achievable USER EXPERIENCE.
  • Bitmap (Thinwire+) is still a good solution for the pure office VDI use case; the same applies to Mixed Mode (Adaptive Display).
  • H.265 leads to slightly reduced bandwidth consumption and is therefore interesting for 3D use cases with limited bandwidth.
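The wrap-up's view of BTL as the best compromise can be cross-checked against the numbers in the deck. The sketch below combines end user latency and remoted frames from the scale test with the static-text SSIM scores. The equal-weight score is an assumption made for illustration, not a method from the talk, and bandwidth is left out because the scale-test bandwidth is only available as charts.

```python
# protocol: (end user latency in ms, remoted frames, SSIM on static text)
results = {
    "H.264 YUV 4:2:0 (Entire Screen)": (115, 11684.33333, 0.83086),
    "H.264 YUV 4:4:4 (Entire Screen)": (166, 11799.08333, 0.98362),
    "Bitmap": (199, 13347.625, 0.99995),
    "H.264 YUV 4:2:0 (Active Regions)": (199, 13165.20833, 0.99994),
    "H.264 YUV 4:2:0 (Entire Screen, VQ: BTL)": (116, 20278.20833, 0.9999),
    "H.264 YUV 4:2:0 (TextOptimization)": (132, 11608.41667, 0.99111),
    "H.265 YUV 4:2:0 (Entire Screen)": (115, 11564.33333, 0.83118),
    "H.265 YUV 4:2:0 (Entire Screen, VQ: BTL)": (132, 20006.45833, 0.99872),
    "H.265 YUV 4:2:0 (Active Regions)": (201, 13220.5, 0.99993),
}

best_latency = min(r[0] for r in results.values())  # lower is better
best_frames = max(r[1] for r in results.values())   # higher is better
best_ssim = max(r[2] for r in results.values())     # higher is better


def score(latency_ms, frames, ssim):
    """Mean of three ratios to the best value; 1.0 means best on every metric."""
    return (best_latency / latency_ms + frames / best_frames + ssim / best_ssim) / 3


ranking = sorted(results, key=lambda name: score(*results[name]), reverse=True)

for name in ranking:
    print(f"{score(*results[name]):.3f}  {name}")
```

With equal weights the two Build To Lossless configurations come out on top, H.264 BTL first. A different weighting, for example one that emphasizes bandwidth, could change the order.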

slide-27
SLIDE 27 29

USEFUL TECHNICAL RESOURCES

  • http://sschaber.de/blog/
  • https://www.nvidia.com/object/better-ux.html
  • https://www.nvidia.com/object/quantifying-impact-of-vgpu-whitepaper.html
  • https://www.nvidia.com/en-us/design-visualization/solutions/virtualization/resources/

Blogs, white papers and everything vGPU

slide-28
SLIDE 28 30

NVIDIA VIRTUAL GPU RESOURCES

  • Virtual GPU Test Drive: https://www.nvidia.com/tryvgpu
  • NVIDIA Virtual GPU Website: www.nvidia.com/virtualgpu
  • NVIDIA Virtual GPU YouTube Channel: http://tinyurl.com/gridvideos
  • Questions? Ask on our Forums: https://gridforums.nvidia.com
  • NVIDIA Virtual GPU on LinkedIn: http://linkd.in/QG4A6u
  • Follow us on Twitter: @NVIDIAVirt

slide-29
SLIDE 29 31

Q & A