May 8-11 2017 | Silicon Valley
EVALUATING WINDOWS 10 LEARN WHY YOUR USERS NEED GPU ACCELERATION
Jason Kyungho Lee, Sr Performance Engineer, NVIDIA GRID @ NVIDIA
Hari Sivaraman, Staff Engineer @ VMware
Introduction
Latest Announcements
Specification | M10 | M6 | M60
GPU | Quad Mid-level Maxwell | Single High-end Maxwell | Dual High-end Maxwell
CUDA Cores | 2560 (640 per GPU) | 1536 | 4096 (2048 per GPU)
Memory Size | 32 GB GDDR5 (8 GB per GPU) | 8 GB GDDR5 | 16 GB GDDR5 (8 GB per GPU)
H.264 1080p30 streams | 28 | 18 | 36
Max vGPU instances | 64 | 16 | 32
Form Factor | PCIe 3.0 Dual Slot (rack servers) | MXM (blade servers) | PCIe 3.0 Dual Slot (rack servers)
Power | 225W | 100W (75W opt) | 240W / 300W (225W opt)
Thermal | passive | bare board | active / passive
Optimized for | USER DENSITY | BLADE | PERFORMANCE
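The board figures above imply how each card is divided at maximum vGPU density. The following is a minimal sketch of that arithmetic, assuming the frame buffer is split evenly across the maximum number of instances. The GPU counts are read from "Quad", "Single" and "Dual", and the names `BOARDS` and `per_instance` are illustrative, not part of any NVIDIA tool.

```python
# Figures copied from the M10 / M6 / M60 comparison; the even split per
# instance is an assumption made for illustration, not a published number.
BOARDS = {
    "M10": {"gpus": 4, "memory_gb": 32, "max_vgpu": 64},
    "M6": {"gpus": 1, "memory_gb": 8, "max_vgpu": 16},
    "M60": {"gpus": 2, "memory_gb": 16, "max_vgpu": 32},
}


def per_instance(board):
    """Return instances per physical GPU and frame buffer (MB) per instance."""
    spec = BOARDS[board]
    return {
        "vgpu_per_gpu": spec["max_vgpu"] // spec["gpus"],
        "memory_mb_per_vgpu": spec["memory_gb"] * 1024 // spec["max_vgpu"],
    }


if __name__ == "__main__":
    for name in BOARDS:
        print(name, per_instance(name))
```

Under these assumptions all three boards work out to 16 instances per physical GPU with 512 MB of frame buffer each, so the boards differ in how many GPUs they carry rather than in per-instance memory at maximum density.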
S7763 - DELIVER A TRANSFORMATIVE 3D GRAPHICS USER EXPERIENCE WITH VMWARE HORIZON, BLAST EXTREME ADAPTIVE TRANSPORT, AND NVIDIA GRID
S7429 - EXPERT AND CUSTOMER ROUNDTABLE: REAL-WORLD TALES OF GPU-ACCELERATED DESKTOPS AND APPS - IMPLEMENTERS SHARE BEST PRACTICES
resolution
Windows 10 requires more GPU frame buffer
[Chart: Windows 7 (single 1920x1080), Windows 10 (single 1920x1080), Windows 10 (single 2560x1600), Windows 10 (dual 1920x1080)]
Windows 10 requires more CPU cycles
[Chart: CPU host utilization (%) over time, Windows 7 vs Windows 10]
64 x Tesla M10-1B VMs on a host running LoginVSI knowledge worker workload
15% more CPU utilization
to create a realistic customer environment
Client/Monitor/Mouse/Keyboard) are the same with a single screen and 1080p resolution
PowerPoint, Google Maps, WebGL)
[Chart: user experience score, 0.0 to 5.0, higher is better. Series: Horizon 7 with PCoIP - No GPU; Horizon 7 with Blast Extreme and H.264 HW]
Testing ran on two identical systems. The CPU system was loaded up to 60-80% utilization, and the GPU system ran the same workload.
User Experience Scale
1 Unacceptable, unusable - fire someone in IT!
2 Barely usable, borderline, but I’ll get tired of this soon
3 Tolerable, I guess I can make do
4 Pretty good for a virtual desktop
5 Outstanding - as good (or almost) as physical
Data labels on the chart: +20%, +5%, +19%, +65%, +6%, +21%, +55%, +26%, +9%, +13%, +13%, +30%, +68%, +133%
user
etc.
Access Device: Mouse button released -> Mouse click processed -> Packetized and encoded -> Packet transmitted
Network Latency on the WAN (e.g. 50 ms)
Server: Packet received -> Packet decoded -> Mouse click processed (Application) -> New frame rendered (Application) -> Frame captured via NVIDIA NVFBC -> Frame encoded via NVIDIA NVENC -> Frame transmitted
Network Latency on the WAN (e.g. 50 ms)
Access Device: Packet received -> Packet decoded -> Frame displayed
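The click path in the diagram can be read as a latency budget: each stage adds time, and the WAN is crossed once in each direction. Below is a minimal sketch of that bookkeeping. Only the 50 ms WAN figure comes from the slide, and it is assumed here to apply per direction; every other duration is a made-up placeholder, and `PIPELINE`, `click_to_photon_ms` and `share_by_location` are hypothetical names.

```python
# Stage names follow the diagram. All durations except WAN_ONE_WAY_MS are
# invented placeholders for illustration, NOT measurements from the talk.
WAN_ONE_WAY_MS = 50.0  # the slide's example WAN latency, assumed per direction

PIPELINE = [
    # (location, stage, milliseconds)
    ("client", "mouse click processed, packetized, transmitted", 1.0),
    ("network", "WAN, access device to server", WAN_ONE_WAY_MS),
    ("server", "packet received, decoded, click processed", 2.0),
    ("server", "new frame rendered", 8.0),
    ("server", "frame captured via NVFBC", 1.0),
    ("server", "frame encoded via NVENC", 4.0),
    ("network", "WAN, server to access device", WAN_ONE_WAY_MS),
    ("client", "packet received and decoded", 4.0),
    ("client", "frame displayed", 8.0),
]


def click_to_photon_ms(stages):
    """Sum every stage from mouse button release to frame display."""
    return sum(ms for _, _, ms in stages)


def share_by_location(stages):
    """Group the budget by where the time is spent."""
    totals = {}
    for where, _, ms in stages:
        totals[where] = totals.get(where, 0.0) + ms
    return totals


if __name__ == "__main__":
    print("click-to-photon:", click_to_photon_ms(PIPELINE), "ms")
    print(share_by_location(PIPELINE))
```

Grouping by location separates the part of the budget a GPU can influence (render, capture, encode on the server) from the network portion, which no server-side change removes.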
CLICK-TO-PHOTON LATENCY spans the whole path, from mouse button released to frame displayed. Network Latency is the WAN portion of that path.
Latency in ms (lower is better)
Configuration | Idle, 1 VM | Scale, 64 VMs
Local PC with Integrated GPU | 65 | -
Blast Extreme No GPU - JPEG/PNG | 185 | 250
Blast Extreme M10-1B - JPEG/PNG | 155 | 170
Blast Extreme No GPU - H.264 Software | 165 | 240
Blast Extreme M10-1B - H.264 Software | 125 | 160
Blast Extreme M10-1B - H.264 Hardware | 107 | 110
Scale test: 63 x Tesla M10-1B VMs on a host running LoginVSI knowledge worker workload and 1 additional VM measuring latency
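The latency chart is easier to compare as differences: how much each remoting configuration adds over the local PC, and how much it degrades when the host is loaded. Here is a minimal sketch of that arithmetic, assuming the chart values pair with the legend entries in the order listed (65 ms for the local PC, then idle and 64-VM values for the five Blast Extreme configurations). The function names are illustrative.

```python
# Values taken from the chart, paired with the legend in listed order
# (an assumption about how the bars map to labels).
LOCAL_PC_MS = 65

# configuration -> (idle with 1 VM, at scale with 64 VMs), in milliseconds
LATENCY_MS = {
    "Blast Extreme, no GPU, JPEG/PNG": (185, 250),
    "Blast Extreme, M10-1B, JPEG/PNG": (155, 170),
    "Blast Extreme, no GPU, H.264 software": (165, 240),
    "Blast Extreme, M10-1B, H.264 software": (125, 160),
    "Blast Extreme, M10-1B, H.264 hardware": (107, 110),
}


def overhead_vs_local_ms(config):
    """Extra latency of an idle remote session compared with the local PC."""
    idle, _ = LATENCY_MS[config]
    return idle - LOCAL_PC_MS


def growth_at_scale_pct(config):
    """Percentage increase in latency going from 1 VM to 64 VMs."""
    idle, scale = LATENCY_MS[config]
    return (scale - idle) / idle * 100


if __name__ == "__main__":
    for config in LATENCY_MS:
        print(
            f"{config}: +{overhead_vs_local_ms(config)} ms vs local PC, "
            f"{growth_at_scale_pct(config):.1f}% higher at scale"
        )
```

With this pairing, hardware H.264 on the M10-1B adds 42 ms over the local PC and grows by about 3% at scale, while the no-GPU JPEG/PNG configuration adds 120 ms and grows by about 35%.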
[Chart, lower is better. Series: NoGPU-PCoIP, GPU-PCoIP, NoGPU-JPEG, GPU-JPEG, NoGPU-Blast-H.264 CPU, GPU-Blast-H.264 CPU, GPU-Blast-NVENC]
63 x Tesla M10-1B VMs on a host running LoginVSI knowledge worker workload and 1 additional VM measuring latency
[Chart, lower is better: Remoting process utilization (PCoIP_server.exe or BlastW.exe) in Guest VM, as percent of one CPU core over time. Series: NoGPU-PCoIP, GPU-PCoIP, NoGPU-JPEG, GPU-JPEG, NoGPU-Blast-H.264 CPU, GPU-Blast-H.264 CPU, GPU-Blast-NVENC]
63 x Tesla M10-0B VMs on a host running LoginVSI knowledge worker workload and 1 additional VM measuring latency
FPS is remoted FPS
[Chart: Average FPS for a set of Videos. Axes: FPS vs #VM. Series: JPG + vGPU, HW-H264 + vGPU, JPG - NO vGPU, SW-H264]
[Chart: Total FPS for a set of Videos. Axes: FPS vs #VM. Series: JPG + vGPU, HW-H264 + vGPU, JPG - NO vGPU, SW-H264]
[Chart: CPU-Util (%) for a set of Videos. Axes: CPU-Util (%) vs #VM. Series: JPG + vGPU, HW-H264 + vGPU, JPG - NO vGPU, SW-H264]
6/9/2017