April 4-7, 2016 | Silicon Valley
Manvender Rawat, NVIDIA Jason K. Lee, NVIDIA Uday Kurkure, VMware Inc.
REAL PERFORMANCE RESULTS WITH VMWARE HORIZON AND VIEWPLANNER
Overview of VMware Horizon 7 and NVIDIA GRID 2.0
Overview of VMware View Planner
Blast Protocol Performance and Scaling Results with Knowledge Worker Workloads
Blast Extreme (GPU) vs. Blast Extreme (CPU) vs. PCoIP
Figure: NVIDIA GRID vGPU architecture. On the server, Virtual PC VMs run the NVIDIA Graphics Driver and Virtual Workstation VMs run the NVIDIA Quadro Driver, each on its own vGPU. The NVIDIA GRID vGPU Manager in the hypervisor shares the NVIDIA GPU (including its H.264 encoder) and the CPUs across the VMs through the hardware virtualization layer.
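Figure: remote display data flow between the CLIENT and the SERVER with GRID GPU over the IP network. Server side: CPU, NIC and GRID GPU, running the GPU workload and the non-GPU workload, with render, capture and encode stages. Client side: decode and render, plus keyboard/mouse input sent back to the server.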
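Read as a latency budget, this data flow means that what a user perceives is the sum of every stage one frame passes through, from keyboard/mouse input on the client to the decoded frame rendered back on the client. The Python sketch below adds up per-stage times; the stage names and all millisecond values are hypothetical placeholders, not measurements from these tests.

```python
# Minimal latency-budget sketch for one remote-display round trip.
# All stage names and timings are hypothetical placeholders (milliseconds).
STAGES_MS = {
    "client input (kybd/mse)": 1.0,
    "network, client to server": 10.0,
    "server render": 8.0,
    "capture": 2.0,
    "encode": 5.0,
    "network, server to client": 10.0,
    "client decode": 4.0,
    "client render": 2.0,
}


def end_to_end_latency_ms(stages: dict[str, float]) -> float:
    """For a single frame the stages run one after another, so latencies add."""
    return sum(stages.values())


if __name__ == "__main__":
    print(f"estimated round trip: {end_to_end_latency_ms(STAGES_MS):.1f} ms")
```

Because the stages are sequential for any single frame, speeding up one stage (for example the encode) lowers latency only by that stage's share of the total.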
Figure: frame pipeline with CPU-based encoding. Stages run one after another: Load App and Execute CPU workload (CPU); Load GPU data in FB and Execute GPU workload (GPU); then Capture Display (transfer image to sys-mem), Encode, and Packetize & transmit (CPU).
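In this serial path every stage waits for the previous one, so the time per frame is the sum of all stages, and the CPU carries the encode work on top of the application. A minimal Python sketch with hypothetical stage timings (illustrative numbers only; the one-time Load App step is left out) computes the per-frame time, the resulting frame-rate ceiling, and how long the CPU is busy per frame:

```python
# Serial frame pipeline with CPU-side encoding.
# Stage timings are hypothetical placeholders in milliseconds.
CPU_ENCODE_PATH = [
    # (stage, processor, ms)
    ("execute CPU workload", "CPU", 6.0),
    ("load GPU data in FB", "GPU", 2.0),
    ("execute GPU workload", "GPU", 8.0),
    ("transfer image to sys-mem", "GPU->CPU", 4.0),
    ("encode", "CPU", 12.0),
    ("packetize & transmit", "CPU", 1.0),
]


def frame_time_ms(path):
    """Stages run back to back, so one frame takes the sum of all stages."""
    return sum(ms for _, _, ms in path)


def max_fps(path):
    """Upper bound on frame rate when frames are processed strictly one at a time."""
    return 1000.0 / frame_time_ms(path)


def cpu_busy_ms(path):
    """Time per frame spent on stages that run on the CPU."""
    return sum(ms for _, unit, ms in path if unit == "CPU")


if __name__ == "__main__":
    print(f"frame time {frame_time_ms(CPU_ENCODE_PATH):.1f} ms, "
          f"max {max_fps(CPU_ENCODE_PATH):.1f} FPS, "
          f"CPU busy {cpu_busy_ms(CPU_ENCODE_PATH):.1f} ms/frame")
```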
Figure: frame pipeline with GPU-based capture and encoding. After Load App and Execute CPU workload on the CPU, the GPU repeats Load GPU data in FB, Execute GPU workload, Capture Display and Encode for successive frames, while the CPU only packetizes and transmits the encoded stream.
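When capture and encode move onto the GPU, successive frames can overlap, so steady-state throughput is set by the slowest stage rather than by the sum of all stages, while each frame's latency is still the sum. A minimal Python sketch under an idealized assumption (every stage can work on a different frame at the same time, which real hardware only partly allows, since stages sharing an engine contend with each other) and hypothetical timings:

```python
# Idealized comparison of serial vs. pipelined frame processing.
# Stage timings are hypothetical placeholders in milliseconds.
GPU_ENCODE_STAGES_MS = {
    "load GPU data in FB": 2.0,
    "execute GPU workload": 8.0,
    "capture display": 1.0,
    "encode": 4.0,
    "packetize & transmit": 1.0,
}


def serial_fps(stages):
    """One frame at a time: a new frame starts only after the previous one finishes."""
    return 1000.0 / sum(stages.values())


def pipelined_fps(stages):
    """Frames overlap across stages: throughput is limited by the slowest stage."""
    return 1000.0 / max(stages.values())


def pipelined_latency_ms(stages):
    """Overlap raises throughput, but each frame still passes through every stage."""
    return sum(stages.values())


if __name__ == "__main__":
    print(f"serial: {serial_fps(GPU_ENCODE_STAGES_MS):.1f} FPS")
    print(f"pipelined: {pipelined_fps(GPU_ENCODE_STAGES_MS):.1f} FPS, "
          f"latency {pipelined_latency_ms(GPU_ENCODE_STAGES_MS):.1f} ms")
```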
Selection of Workloads/Applications
Automation
Performance Metrics
Scaling
Simplicity: ease of use through a simple web interface
Expandability: new workloads can be added easily
Elasticity: scaling is easy with View and View Planner
Select the workload applications.
Provision the desired number of desktop virtual machines with View and View Planner.
Automatically launch the Horizon clients to connect to the desktops.
Automatically start the workload on each desktop VM.
Measure the response times on the remote clients.
Analyze the response times and resource utilization.
Run the scaling experiments.
Figure: run phases: ramp up, steady state, ramp down.
For accurate results, scores are computed over the steady-state range only; results from ramp-up and ramp-down iterations are excluded.
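The steady-state rule amounts to trimming a fixed number of iterations from each end of a run before averaging. A minimal Python sketch, assuming each iteration already yields one numeric score (how View Planner computes per-iteration scores is not shown here, so the values are hypothetical):

```python
import statistics


def steady_state_score(iteration_scores, ramp_up, ramp_down):
    """Mean of the per-iteration scores with ramp-up and ramp-down iterations removed.

    ramp_up / ramp_down are the number of iterations dropped from the start
    and the end of the run.
    """
    n = len(iteration_scores)
    if ramp_up < 0 or ramp_down < 0 or ramp_up + ramp_down >= n:
        raise ValueError("ramp counts must be non-negative and leave at least one iteration")
    steady = iteration_scores[ramp_up:n - ramp_down]
    return statistics.mean(steady)


if __name__ == "__main__":
    # Hypothetical per-iteration response times (seconds) from one run.
    scores = [2.0, 1.5, 1.0, 1.1, 0.9, 1.0, 1.6]
    print(steady_state_score(scores, ramp_up=2, ramp_down=1))
```

Including the slow first and last iterations here would pull the average well above the steady-state value, which is why they are excluded.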
Figure: test bed with remote display protocol Blast Extreme / PCoIP, storage, virtual client VMs and virtual VDI desktop VMs, on two servers:
SuperMicro SYS-2027GR-TRFH: 2 x Intel Xeon E5-2690 v2 @ 3.00 GHz (Intel Ivy Bridge, 2 x 10-core sockets, 20 cores), 256 GB RAM, 2 x NVIDIA GRID K1
SuperMicro SYS-2028GR-TRT: 2 x Intel Xeon E5-2698 v3 @ 2.30 GHz (Intel Haswell, 2 x 16-core sockets, 32 cores), 256 GB RAM, 2 x NVIDIA GRID M60
3D intensive app
notice the difference once you exceed 30 FPS.
become unusable by the time you hit 5 FPS.
screen, the NVEnc encoder was not fully utilized (around 50% throughout the benchmark).
Figure: host CPU utilization (%) over time, NVEnc vs. PCoIP, with NvEnc encoder utilization also plotted; lower is better. Totals: NVEnc 10913 vs. PCoIP 10570, very similar.
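A quick calculation puts a number on "very similar". Using the two chart totals as given (the chart does not say what aggregate of the sampled utilization they represent, so the unit is left open), the NVEnc total is about 3% above PCoIP:

```python
# Totals read off the host CPU utilization chart.
NVENC_TOTAL = 10913
PCOIP_TOTAL = 10570


def relative_difference(value, baseline):
    """Fractional difference of value from baseline (positive means value is higher)."""
    return (value - baseline) / baseline


if __name__ == "__main__":
    diff = relative_difference(NVENC_TOTAL, PCOIP_TOTAL)
    print(f"NVEnc total vs. PCoIP: {diff:+.1%}")
```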
Figure: GPU utilization and GPU memory utilization (%) over time.
we can scale beyond 32 VMs; further testing is planned.
Figure: AutoCAD average FPS, M60-1Q, 32 VMs, Blast Extreme (GPU) vs. PCoIP (higher is better): NvEnc (build3) 36.81 FPS, PCoIP 36.49 FPS. The chart also marks the minimum FPS for user experience (UX).
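The 32-VM count follows from framebuffer partitioning. A minimal Python sketch, assuming an M60 board carries two GPUs with 8 GB (8192 MB) of framebuffer each and that an M60-1Q vGPU gets 1 GB (1024 MB); it ignores any per-profile VM cap NVIDIA may set below the framebuffer limit, and the 512 MB case is only a hypothetical what-if:

```python
# vGPU density from framebuffer partitioning (assumed M60 figures).
GPUS_PER_M60_BOARD = 2
FB_PER_GPU_MB = 8192  # assumed: 8 GB framebuffer per physical GPU


def max_vgpus(boards, profile_fb_mb):
    """Each physical GPU's framebuffer is split into equal, fixed-size vGPU slices."""
    if profile_fb_mb <= 0:
        raise ValueError("profile framebuffer must be positive")
    per_gpu = FB_PER_GPU_MB // profile_fb_mb
    return boards * GPUS_PER_M60_BOARD * per_gpu


if __name__ == "__main__":
    print(max_vgpus(boards=2, profile_fb_mb=1024))  # 1 GB profile (M60-1Q) on 2 boards
    print(max_vgpus(boards=2, profile_fb_mb=512))   # hypothetical 512 MB profile
```

This bounds density by framebuffer only; whether more, smaller desktops stay above the minimum FPS is a question for measurement.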
Figure: test bed with remote display protocol Blast Extreme / PCoIP, storage, virtual client VMs and virtual VDI desktop VMs, on two identical servers:
Dell R730: 2 x Intel Xeon E5-2680 v3 (Intel Haswell, 2 x 12-core sockets, 24 cores), 384 GB RAM, 2 x NVIDIA GRID M60
Knowledge Worker Applications in View Planner 3.6:
Office apps: Word, Excel, PowerPoint, Outlook
Adobe Acrobat Reader, Firefox, 7-Zip
Windows Media Player
Operations are split into groups
4/20/2016
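One common way to turn grouped operation latencies into a pass/fail result is a percentile threshold per group. The Python sketch below assumes the thresholds often cited for View Planner (95th-percentile latency of at most 1 s for Group A interactive operations and 6 s for Group B I/O-bound operations, with other groups not counted); treat these values as assumptions and check the View Planner documentation for the version in use. It uses a nearest-rank percentile.

```python
import math

# Assumed per-group QoS thresholds (seconds) at the 95th percentile.
QOS_THRESHOLDS_S = {"A": 1.0, "B": 6.0}
QOS_PERCENTILE = 95


def nearest_rank_percentile(values, pct):
    """Smallest sample such that at least pct% of the samples are <= it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def passes_qos(latencies_by_group):
    """True if every thresholded group meets its limit; other groups are ignored."""
    for group, limit in QOS_THRESHOLDS_S.items():
        if nearest_rank_percentile(latencies_by_group[group], QOS_PERCENTILE) > limit:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical latencies (seconds) measured on the remote clients.
    run = {
        "A": [0.5] * 19 + [2.0],  # one slow interactive operation out of 20
        "B": [3.0] * 10,
        "C": [30.0],              # background group: not counted
    }
    print(passes_qos(run))
```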
Measures True Remote User Experience
are recorded on the remote client, as the remote client sees them.
counted
Figure: latencies (seconds) and latencies normalized wrt PCoIP vs. number of VMs (1, 8, 16, 32, 48, 64) for BlastGPU, BlastCPU and PCoIP, with BlastGPU/PCoIP and BlastCPU/PCoIP ratios; lower is better.
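The normalized series divide each protocol's value by the PCoIP value at the same VM count. A minimal Python sketch with hypothetical latencies (not the measured data) shows the computation; for latency, a ratio below 1.0 means faster than PCoIP:

```python
# Hypothetical mean latencies (seconds) keyed by number of VMs.
PCOIP = {8: 0.50, 32: 0.60, 64: 0.80}
BLAST_GPU = {8: 0.40, 32: 0.45, 64: 0.60}


def normalize_wrt(values, baseline):
    """Ratio of each value to the baseline at the same VM count, in VM order."""
    return {vms: values[vms] / baseline[vms] for vms in sorted(baseline)}


if __name__ == "__main__":
    for vms, ratio in normalize_wrt(BLAST_GPU, PCOIP).items():
        print(f"{vms:>3} VMs: BlastGPU/PCoIP = {ratio:.2f}")
```

Normalizing at each VM count puts all protocols on one scale, so the comparison stays readable even as absolute latencies grow with load.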
Figure: latencies (seconds, axis up to 3.5 s) and latencies normalized wrt PCoIP vs. number of VMs (1, 8, 16, 32, 48, 64) for BlastGPU, BlastCPU and PCoIP, with BlastGPU/PCoIP and BlastCPU/PCoIP ratios.
Figure: cumulative FPS and FPS normalized wrt PCoIP vs. number of VMs (8, 16, 32, 48) for BlastGPU, BlastCPU and PCoIP, with BlastGPU/PCoIP and BlastCPU/PCoIP ratios and a linear trendline for BlastGPU/PCoIP; higher is better.
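The chart label suggests that cumulative FPS is the sum of the per-VM frame rates across all desktops in a run; that reading is an assumption, since the deck does not define it. A minimal Python sketch under that assumption, with hypothetical per-VM values:

```python
def cumulative_fps(per_vm_fps):
    """Aggregate frame rate delivered across all desktops in one run."""
    return sum(per_vm_fps)


def normalized_fps(candidate_per_vm, baseline_per_vm):
    """Cumulative FPS relative to the baseline; above 1.0 means more frames delivered."""
    return cumulative_fps(candidate_per_vm) / cumulative_fps(baseline_per_vm)


if __name__ == "__main__":
    # Hypothetical per-VM average FPS for an 8-VM run.
    blast_gpu = [30.0] * 8
    pcoip = [20.0] * 8
    print(cumulative_fps(blast_gpu), cumulative_fps(pcoip), normalized_fps(blast_gpu, pcoip))
```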
Figure: average CPU utilization (%) and CPU utilization normalized wrt PCoIP vs. number of VMs (8, 16, 32, 48) for BlastGPU, BlastCPU and PCoIP, with BlastGPU/PCoIP and BlastCPU/PCoIP ratios and a linear trendline for BlastGPU/PCoIP; lower is better.
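A "Linear" trendline on such a chart is normally an ordinary least-squares line. The Python sketch below fits one against the VM count using hypothetical BlastGPU/PCoIP ratios; note that a spreadsheet trendline drawn on a category axis may use the point positions (1, 2, 3, 4) instead of the VM counts, which changes the slope.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical BlastGPU/PCoIP CPU-utilization ratios at each VM count.
    vms = [8, 16, 32, 48]
    ratio = [0.92, 0.84, 0.68, 0.52]
    slope, intercept = linear_fit(vms, ratio)
    print(f"ratio ~ {intercept:.2f} + ({slope:.4f} * VMs)")
```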
Better User Experience
More Frames/Second
Lower Latencies: Better Response Times
Lower CPU Utilization
Better Scalability
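Figure: GRID GPU encode path. Apps send graphics commands to the GPU's 3D engine, which renders into the framebuffer; frames are captured either from the render target (context capture) or from the front buffer (display capture) and passed to the hardware encoder, which sends H.264 / H.265 streams to the remote client.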