EVALUATING WINDOWS 10: LEARN WHY YOUR USERS NEED GPU ACCELERATION

May 8-11, 2017 | Silicon Valley. Jason Kyungho Lee, Sr. Performance Engineer, NVIDIA GRID @ NVIDIA; Hari Sivaraman, Staff Engineer @ VMware


slide-1
SLIDE 1

May 8-11, 2017 | Silicon Valley

EVALUATING WINDOWS 10: LEARN WHY YOUR USERS NEED GPU ACCELERATION

Jason Kyungho Lee, Sr. Performance Engineer, NVIDIA GRID @ NVIDIA
Hari Sivaraman, Staff Engineer @ VMware

slide-2
SLIDE 2

AGENDA

  • Introduction
  • Latest Announcements
  • Windows 10 vs. Windows 7
  • Performance Testing
  • Summary
slide-3
SLIDE 3

TESLA LINEUP FOR GRID

The most powerful data center GPUs targeted at graphics virtualization

                        M10                                 M6                        M60
GPU                     Quad Mid-level Maxwell              Single High-end Maxwell   Dual High-end Maxwell
CUDA Cores              2560 (640 per GPU)                  1536                      4096 (2048 per GPU)
Memory Size             32 GB GDDR5 (8 GB per GPU)          8 GB GDDR5                16 GB GDDR5 (8 GB per GPU)
H.264 1080p30 streams   28                                  18                        36
Max vGPU instances      64                                  16                        32
Form Factor             PCIe 3.0 Dual Slot (rack servers)   MXM (blade servers)       PCIe 3.0 Dual Slot (rack servers)
Power                   225W                                100W (75W opt)            240W / 300W (225W opt)
Thermal                 passive                             bare board                active / passive
Optimized for           USER DENSITY                        BLADE                     PERFORMANCE
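
The per-GPU and per-instance figures implied by the lineup table can be checked with a little arithmetic. The following is a minimal sketch, not anything from the original deck: the dictionary layout and the function name `derived_metrics` are made up for illustration, the GPU counts (4, 1, 2) are read from the "Quad", "Single" and "Dual" labels, and splitting total memory evenly across the maximum number of vGPU instances is a simplifying assumption rather than a description of how GRID profiles are actually assigned.

```python
# Figures copied from the Tesla lineup table; the data layout is hypothetical.
BOARDS = {
    "M10": {"gpus": 4, "cuda_cores": 2560, "memory_gb": 32, "max_vgpu": 64},
    "M6": {"gpus": 1, "cuda_cores": 1536, "memory_gb": 8, "max_vgpu": 16},
    "M60": {"gpus": 2, "cuda_cores": 4096, "memory_gb": 16, "max_vgpu": 32},
}


def derived_metrics(board):
    """Return per-GPU and per-instance figures for one board entry."""
    return {
        "cores_per_gpu": board["cuda_cores"] // board["gpus"],
        "memory_gb_per_gpu": board["memory_gb"] / board["gpus"],
        # Assumption: memory is split evenly at maximum vGPU density.
        "memory_mb_per_instance": board["memory_gb"] * 1024 / board["max_vgpu"],
    }


if __name__ == "__main__":
    for name, board in BOARDS.items():
        print(name, derived_metrics(board))
```

Under these assumptions every board works out to 512 MB of frame buffer per instance at maximum density, and the per-GPU core counts match the values given in parentheses in the table.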

slide-4
SLIDE 4

LATEST ANNOUNCEMENTS

slide-5
SLIDE 5

LATEST ANNOUNCEMENTS

  • Instant Clone Support (VMware Horizon 7.1)
      • Allows ultra-fast provisioning of virtual machines
      • NVIDIA is the only GPU vendor supported
  • High Availability Support (VMware vSphere 6.5)
      • vSphere 6.5 supports HA for NVIDIA GRID vGPU enabled virtual machines
  • Multi Monitor support with Blast Extreme H.264 HW (VMware Horizon 7.1)
      • Offload the H.264 encode to the NVIDIA GPU for improved and predictable UX

S7763 - DELIVER A TRANSFORMATIVE 3D GRAPHICS USER EXPERIENCE WITH VMWARE HORIZON, BLAST EXTREME ADAPTIVE TRANSPORT, AND NVIDIA GRID
S7429 - EXPERT AND CUSTOMER ROUNDTABLE: REAL-WORLD TALES OF GPU-ACCELERATED DESKTOPS AND APPS - IMPLEMENTERS SHARE BEST PRACTICES

slide-6
SLIDE 6

WINDOWS 10

slide-7
SLIDE 7

WINDOWS 10 NEW CHANGES

  • Visually compelling Modern UI / menu with transparency
  • Modern UI cannot be disabled; Windows 10 assumes a GPU is present
  • GPU-accelerated virtual desktops / Task View / Alt-Tab preview
  • GPU-accelerated video playback in the default media player
  • GPU-accelerated font (DPI) and display scaling at ultra-high-definition resolution
  • Windows Display Driver Model (WDDM) 2.0 / DirectX 12 supported
  • Microsoft Edge GPU acceleration
slide-8
SLIDE 8

WINDOWS 10 REQUIRES MORE RESOURCES FOR AN IMPROVED USER EXPERIENCE

Windows 10 requires more GPU frame buffer
[Chart: GPU frame buffer usage for Windows 7 (single 1920x1080), Windows 10 (single 1920x1080), Windows 10 (single 2560x1600) and Windows 10 (dual 1920x1080)]

Windows 10 requires more CPU cycles
[Chart: host CPU utilization (%) over time, Windows 7 vs. Windows 10]

15% more CPU utilization

64 x Tesla M10-1B VMs on a host running LoginVSI knowledge worker workload

slide-9
SLIDE 9

WINDOWS START BUTTON EXPERIENCE

This is Side-by-Side

slide-10
SLIDE 10

PERFORMANCE TESTING

slide-11
SLIDE 11

TEST SETUP - SUBJECTIVE USER TESTING

  • Two identical servers run LoginVSI Knowledge Worker to create a realistic customer environment
  • CPU utilization of the hosts is around 60-80%
  • Testers don’t know which session is GPU accelerated
  • Testers do the same tasks on both systems
  • Access devices (thin client, monitor, mouse, keyboard) are the same, with a single screen at 1080p resolution
  • Predefined scenarios plus freestyle at the end
  • Scenarios include browsing, YouTube, creating a PowerPoint, Google Maps and WebGL

slide-12
SLIDE 12

CPU ONLY VS. NVIDIA GRID

GPU with NVENC provides an average UX improvement of 34%

[Chart: user experience score per scenario (scale 0.0 to 5.0, higher is better), Horizon 7 with PCoIP - No GPU vs. Horizon 7 with Blast Extreme and H.264 HW]

Per-scenario improvement labels on the chart: +20% +5% +19% +65% +6% +21% +55% +26% +9% +13% +13% +30% +68% +133%

Testing ran on two identical systems. The CPU-only system was loaded up to 60-80% utilization; the GPU system ran the same workload.

User Experience Scale
  1  Unacceptable, unusable - fire someone in IT!
  2  Barely useable, borderline, but I’ll get tired of this soon
  3  Tolerable, I guess I can make do
  4  Pretty good for a virtual desktop
  5  Outstanding - as good (or almost) as physical
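
As a sanity check on the quoted average, the percentage labels listed on this slide can simply be averaged. This is a rough sketch under one assumption: that the fourteen labels are the per-scenario gains behind the headline number. The scenario names did not survive extraction, so the values are kept in slide order, and the names `UX_GAINS_PERCENT` and `average_gain` are illustrative.

```python
from statistics import mean

# Per-scenario UX gains (%) in the order they appear on the slide.
# Assumption: these are the inputs to the quoted average.
UX_GAINS_PERCENT = [20, 5, 19, 65, 6, 21, 55, 26, 9, 13, 13, 30, 68, 133]


def average_gain(gains):
    """Arithmetic mean of the per-scenario gains, in percent."""
    return mean(gains)


if __name__ == "__main__":
    avg = average_gain(UX_GAINS_PERCENT)
    print(f"{len(UX_GAINS_PERCENT)} scenarios, average gain {avg:.1f}%")
```

The arithmetic mean of the rounded labels is 34.5%, which is in line with the roughly 34% quoted on the slide.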

slide-13
SLIDE 13

CLICK TO PHOTON

What it is and why it matters

  • Click-to-Photon is more than network latency
  • Click-to-Photon is a key metric that contributes to the overall user experience
  • Click-to-Photon defines how interactive/snappy the solution is
  • Click-to-Photon measures the overall latency from the user's perspective
  • Click-to-Photon measures the time from the mouse click until the action is visible to the user
      • Includes the latency of USB device processing, rendering the frame, displaying the frame, etc.
  • Click-to-Photon in remote environments (VDI, etc.) additionally includes
      • encode latency, network latency and decode latency
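
The components listed above can be read as a simple additive latency budget. The sketch below illustrates that reading only: every millisecond value is a hypothetical placeholder rather than a measurement from the talk, the stage names are invented for the example, and a real pipeline may overlap stages instead of running them strictly one after another.

```python
# All millisecond values are hypothetical placeholders, not measurements.
LOCAL_STAGES_MS = {
    "usb_input_processing": 8.0,
    "frame_render": 16.0,
    "frame_display": 16.0,
}

# Extra stages that only apply to remote sessions (VDI, etc.).
REMOTE_STAGES_MS = {
    "encode": 10.0,
    "network": 50.0,  # total network latency for the interaction
    "decode": 10.0,
}


def click_to_photon_ms(local_stages, remote_stages=None):
    """Sum stage latencies; remote stages are added only when provided."""
    total = sum(local_stages.values())
    if remote_stages:
        total += sum(remote_stages.values())
    return total


if __name__ == "__main__":
    print("local PC:", click_to_photon_ms(LOCAL_STAGES_MS), "ms")
    print("remote session:", click_to_photon_ms(LOCAL_STAGES_MS, REMOTE_STAGES_MS), "ms")
```

Even in this toy model the network term is only one contributor among several, which is the point of the first bullet: shrinking the encode and decode stages lowers the total without touching the network.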
slide-14
SLIDE 14

CLICK TO PHOTON SIMPLIFIED

CLICK-TO-PHOTON CAPTURES THE OVERALL LATENCY

Access Device: Mouse button released → Mouse click processed → Packetized and encoded → Packet transmitted
Network latency on the WAN (e.g. 50 ms)
Server: Packet received → Packet decoded → Application: Mouse click processed → New frame rendered → Frame captured via NVIDIA NVFBC → Frame encoded via NVIDIA NVENC → Frame transmitted
Network latency on the WAN (e.g. 50 ms)
Access Device: Packet received → Packet decoded → Frame displayed

slide-15
SLIDE 15

CLICK TO PHOTON SIMPLIFIED

CLICK-TO-PHOTON CAPTURES THE OVERALL LATENCY

[Build of the diagram on the previous slide, adding the label CLICK-TO-PHOTON LATENCY]

slide-16
SLIDE 16

CLICK TO PHOTON SIMPLIFIED

CLICK-TO-PHOTON CAPTURES THE OVERALL LATENCY

[Build of the diagram on the previous slides, adding the labels CLICK-TO-PHOTON LATENCY and Network Latency]

slide-17
SLIDE 17

CLICK TO PHOTON LATENCY

Blast Extreme with NVENC decreases latency up to 140ms at <1ms network latency

Latency in ms (lower is better)

Local PC with Integrated GPU                65
Blast Extreme No GPU - JPEG/PNG            185
Blast Extreme M10-1B - JPEG/PNG            155
Blast Extreme No GPU - H.264 Software      165
Blast Extreme M10-1B - H.264 Software      125
Blast Extreme M10-1B - H.264 Hardware      107

slide-18
SLIDE 18

CLICK TO PHOTON LATENCY

Comparing latency of single VM and at scale at <1ms network latency

Latency in ms (lower is better)

                                           Idle, 1 VM   Scale, 64 VMs
Local PC with Integrated GPU                   65            -
Blast Extreme No GPU - JPEG/PNG               185           250
Blast Extreme M10-1B - JPEG/PNG               155           170
Blast Extreme No GPU - H.264 Software         165           240
Blast Extreme M10-1B - H.264 Software         125           160
Blast Extreme M10-1B - H.264 Hardware         107           110

63 x Tesla M10-1B VMs on a host running LoginVSI knowledge worker workload and 1 additional VM measuring latency
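
The latency figures on this slide lend themselves to a quick comparison. The sketch below assumes that the chart values map to the legend entries in the order listed (first six values idle, last five at scale, with no at-scale value for the local PC); the names `LATENCY_MS` and `reduction_ms` are illustrative and not from the deck.

```python
# Click-to-photon latency in ms as (idle with 1 VM, at scale with 64 VMs).
# Assumption: chart values map to the legend entries in the order listed.
LATENCY_MS = {
    "No GPU - JPEG/PNG": (185, 250),
    "M10-1B - JPEG/PNG": (155, 170),
    "No GPU - H.264 Software": (165, 240),
    "M10-1B - H.264 Software": (125, 160),
    "M10-1B - H.264 Hardware": (107, 110),
}
IDLE, SCALE = 0, 1


def reduction_ms(baseline, target, column):
    """Latency saved by moving from `baseline` to `target` in one column."""
    return LATENCY_MS[baseline][column] - LATENCY_MS[target][column]


def scale_penalty_ms(config):
    """Extra latency a configuration picks up going from idle to 64 VMs."""
    idle, scale = LATENCY_MS[config]
    return scale - idle


if __name__ == "__main__":
    hw = "M10-1B - H.264 Hardware"
    for config in LATENCY_MS:
        print(f"{config}: saves {reduction_ms(config, hw, SCALE)} ms at scale "
              f"with NVENC, scale penalty {scale_penalty_ms(config)} ms")
```

With this mapping, the largest at-scale gap is 250 - 110 = 140 ms, which matches the "up to 140ms" claim made for Blast Extreme with NVENC, and the hardware-encoded configuration degrades by only 3 ms between idle and 64 VMs.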

slide-19
SLIDE 19

HOST CPU OFFLOADING

Blast Extreme decreases CPU utilization on the host, up to 42%

[Chart: host CPU utilization (%, lower is better) for NOGPU-PCoIP, GPU-PCoIP, NoGPU-JPEG, GPU-JPEG, NOGPU-Blast-H.264 CPU, GPU-BLAST-H.264 CPU and GPU-BLAST-NVENC]

63 x Tesla M10-1B VMs on a host running LoginVSI knowledge worker workload and 1 additional VM measuring latency

slide-20
SLIDE 20

GUEST VM, REMOTING PROCESS CPU OFFLOADING

Blast Extreme decreases CPU utilization on the VM

[Chart: remoting process utilization (PCoIP_server.exe or BlastW.exe) in the guest VM, in percent of one CPU core over time (lower is better), for NOGPU-PCoIP, GPU-PCoIP, NoGPU-JPEG, GPU-JPEG, NOGPU-Blast-H.264 CPU, GPU-BLAST-H.264 CPU and GPU-BLAST-NVENC]

63 x Tesla M10-0B VMs on a host running LoginVSI knowledge worker workload and 1 additional VM measuring latency

slide-21
SLIDE 21

VIDEO PLAYBACK

Up to 52% improved User Experience due to GRID vGPU and H.264

FPS is remoted FPS

slide-22
SLIDE 22

VIDEO PLAYBACK

[Chart: Average FPS for a set of videos, FPS vs. number of VMs; series: JPG + vGPU, HW-H264 + vGPU, JPG - no vGPU, SW-H264]

[Chart: Total FPS for a set of videos, FPS vs. number of VMs; series: JPG + vGPU, HW-H264 + vGPU, JPG - no vGPU, SW-H264]

slide-23
SLIDE 23

VIDEO PLAYBACK

[Chart: CPU utilization (%) for a set of videos, CPU utilization vs. number of VMs; series: JPG + vGPU, HW-H264 + vGPU, JPG - no vGPU, SW-H264]

slide-24
SLIDE 24

VIDEOS

slide-25
SLIDE 25

POWERPOINT ANIMATION

This is Side-by-Side

slide-26
SLIDE 26

VIDEO PLAYBACK AND OFF LOADING CPU

This is Side-by-Side

slide-27
SLIDE 27

SUMMARY

slide-28
SLIDE 28

WINDOWS 10 IS DIFFERENT

Windows 10 is Microsoft’s most graphical operating system

  • Windows 10 differs from Windows 7
      • Requires more CPU resources
      • Leverages the GPU more
  • NVIDIA GRID vGPU
      • Improves user experience (as Microsoft intended)
      • Reduces Click-to-Photon latency (snappy user interaction)
      • Predictable and consistent user experience
      • Reduces CPU cycles to allow higher user density

6/9/2017

slide-29
SLIDE 29

May 8-11 2017 | Silicon Valley

THANK YOU