S5393 - EVOLUTION OF AN NVIDIA GRID DEPLOYMENT
ERIK BOHNHORST, SR. GRID SOLUTION ARCHITECT, NVIDIA
RONALD GRASS, SR. SYSTEMS ENGINEER, CITRIX SYSTEMS

What we will cover
Who implemented NVIDIA GRID with Citrix XenDesktop
Why did they want to move to a remote desktop solution
How did they evaluate and implement NVIDIA GRID: sales pitch & TechDemo, proof of concept, production environment
Challenges and learnings
How will they move forward
Who are we talking about
Manufacturing vertical
NVIDIA QUADRO customer
Competitive market
Wide range of CAD/CAE applications
Experienced with remote desktop solutions
Business Drivers and initiatives
Growing globalization within the company → enabling remote sites across the globe
Increasing competition to hire the best → allowing employees, partners and contractors to work from anywhere
Increasing competition to design and build faster with better quality → increasing productivity and flexibility, enabling collaboration between internal and external teams
Increasing security breaches → increasing security and compliance
German law ("Arbeitnehmerueberlassung", the German act on temporary agency work) → enabling contractors to work off premise
Wouldn’t it be great if…
…you could work on any device, from anywhere
…with collaboration and a productivity increase
…with security & compliance
…with less redundant infrastructure
Project Start – early 2013
Once upon a time… when the customer started
Evaluation of multiple remote solutions
Interest in HP blades due to the high density of GPUs
Customer received a sales pitch on NVIDIA vGPU & XenDesktop
Overall plan: evaluate NVIDIA vGPU in early beta under NDA and compare NVIDIA vGPU vs. GPU passthrough
Somewhere in between
Citrix & NVIDIA partnership since 2008
May 2012: GRID announced during NVIDIA GTC keynote
May 2013: Citrix vGPU announced during Synergy keynote
Sep 2013: NVIDIA RTM
Oct 2013: vGPU Tech Preview
Dec 2013: vGPU General Availability
That’s our vGPU object as seen by the hypervisor (XenServer 6.2):

[root@SM01 ~]# xe vm-list name-label=Win7-vGPU-01
uuid ( RO)          : 831ab2f3-8e23-e876-d92a-16810a85499e
    name-label ( RW): Win7-vGPU-01
    power-state ( RO): halted
[root@SM01 ~]# xe vgpu-create vm-uuid=831ab2f3-8e23-e876-d92a-16810a85499e gpu-group-uuid=d840caad-2ce0-6395-78a5-9ac984667412 vgpu-type-uuid=5514073f-6d7b-90c6-6648-2335ad1cc81a
23908c99-eecb-835e-fd46-5936e0a3bf652
Evolution of Nvidia GRID / vGPU : 2013 vGPU beta
Only 5 vGPU profiles + passthrough available: K100, K140Q, K200, K240Q, K260Q, passthrough
Limited to Windows 7 only
Creating passthrough or vGPU objects was possible through the CLI only :(
No way to use passthrough and vGPU VMs at the same time
XenServer 6.2 only (specially patched)
Very limited hardware available
Evolution of Nvidia GRID / vGPU : 2013 RTM
Same 5 vGPU profiles + passthrough available: K100, K140Q, K200, K240Q, K260Q, passthrough
Creation of vGPUs and monitoring of pGPUs through CLI or XenCenter (GUI)
Mass creation of vGPU-enabled VMs through Desktop Studio (XenDesktop ≥7.1)
Passthrough and vGPU VMs can be run simultaneously
XenServer 6.2 SP1 with 64 vGPUs
(Forum banter: "I can‘t get it to work :-(" / "w0rk5 f0r m3 ... s0 ch3ck the uuids, bl00dy n00b !!")
Lifecycle of a successful GRID implementation
Phase 1 (TechDemo)
Conduct a TechDemo for CAD/CAM stakeholders / engineers that leads to a "WOW" effect.
Phase 2 (Assessment & small and focussed PoC) Phase 3 (widened PoC based on feedback) Phase 4 (Implementation/User Acceptance/Production) Phase 5 (Maintenance / Update / Daily Use)
Sales pitch & TechDemo – create the "WOW effect"
We did a sales pitch on NVIDIA GRID and a very convincing TechDemo of Citrix XenDesktop with vGPU on XenServer to create the WOW effect
Demo applications like NVIDIA Hair, NVIDIA FaceWorks, Design Garage, Blender, VRRender, Autodesk 30-day trials or JT2Go were used because of the lack of licenses and of deep CATIA / SolidEdge / Siemens NX knowledge
Demonstrated access from mobile platforms (Android – Galaxy Tab 10.1 and iOS – iPad)
We used a cloud-hosted demo center, which proved that the solution works over WAN as well
Focused on user experience and used peripherals (e.g. SpacePilot)
From WOW to HOW? Next steps
Phase 1 (TechDemo) Phase 2 (Assessment and very focussed PoC)
Start with a strictly defined use case (LAN only, specific applications, small user group)
Collect feedback on user experience and network behavior
Phase 3 (widened PoC based on feedback)
Evaluate user feedback
Widen use cases, e.g. remote access (WAN)
Use more complex drawings / models and higher-end use cases (Engineer vs. Viewer only)
Phase 4 (Implementation/User Acceptance/Production) Phase 5 (Maintenance / Update / Daily Use)
Components involved
CAD applications: Dassault CATIA, Siemens NX, Autodesk products, PTC Creo, JT2Go
Server: dual socket, 2x Intel E5-2690 v2, 256 GB RAM, SSDs, 2x NVIDIA GRID K2
Hypervisor: Citrix XenServer 6.2 SP1 with the corresponding NVIDIA vGPU Manager version
Guest: NVIDIA display driver 332.83, Citrix Virtual Desktop Agent, CAD application
Broker: Citrix XenDesktop 7.1 or 7.5
POC – Define virtual Workstations
User Segmentation | OS | vCPUs | Virtual GPU | Frame Buffer (MB) | GPU Mode | Remoting Stack | VMs per host (2x GRID K2)
Entry | Windows 7 | 4 | GRID K220Q | 512 | NVIDIA vGPU | Citrix XenDesktop | 32
Medium | Windows 7 | 4 | GRID K240Q | 1024 | NVIDIA vGPU | Citrix XenDesktop | 16
Advanced | Windows 7 | 4 | GRID K260Q | 2048 | NVIDIA vGPU | Citrix XenDesktop | 8
Expert | Windows 7 | 4 | GRID “K280Q” | 4096 | Passthrough | Citrix XenDesktop | 4
Medium | Linux | 4 | GRID K2 | 4096 | Passthrough | NICE DCV, HP RGS | 4
Expert | Linux | 4 | GRID K2 | 4096 | Passthrough | NICE DCV, HP RGS | 4
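The densities in the last column follow directly from the K2 hardware: each GRID K2 card carries two physical GPUs with 4 GB of frame buffer each, and a physical GPU runs only one (homogeneous) vGPU profile at a time. A minimal sketch of that arithmetic (the function name is ours, not from the deck):

```python
# Sketch: derive the "VMs per host" column from GRID K2 hardware.
# Assumptions: 2x GRID K2 cards per host, each card has 2 physical
# GPUs with 4096 MB of frame buffer each; all vGPUs on one physical
# GPU must use the same profile (homogeneous profiles).

GPUS_PER_K2 = 2
FB_PER_GPU_MB = 4096

def vms_per_host(profile_fb_mb, k2_cards=2):
    """Number of vGPU-enabled VMs a host can run for one profile."""
    physical_gpus = k2_cards * GPUS_PER_K2
    vgpus_per_gpu = FB_PER_GPU_MB // profile_fb_mb
    return physical_gpus * vgpus_per_gpu

for profile, fb in [("K220Q", 512), ("K240Q", 1024),
                    ("K260Q", 2048), ("passthrough", 4096)]:
    print(profile, vms_per_host(fb))  # 32, 16, 8, 4 - matches the table
```

The same arithmetic explains why halving the frame buffer per profile doubles the achievable VM density per host.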
Technical challenges
Physical laws (latency, bandwidth, packet loss)
Matching workstation-like user experience
Server- vs. client-side rendered mouse cursor
Endpoint devices & endpoint performance (e.g. thin clients)
High screen resolutions – lots of data (UHD/4K)
Frame rate / low bandwidth / graphics quality
API support
Distributed locations
Peripheral devices
Bandwidth, Latency, Network Quality
Quality and performance correlate closely with the available network bandwidth and the distance (latency)
Average user: ~1-2 Mbps *
Expert user: ~4-5 Mbps *
20 Mbps for ~15 CAD/CAM engineers *
Influencing parameters
Window size and number of monitors
Screen resolution
Size of models, different usage patterns (VR, CAD, DMU, 3D viewing, etc.)
Individual perception / level of acceptance (user experience)
* average measurements
Source: Customer presentation
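The group figure is smaller than the per-user peaks suggest because not every engineer draws peak bandwidth at once. A rough sizing sketch consistent with the averages above (the concurrency factor is our hypothetical planning assumption, not a number from the deck):

```python
# Rough WAN sizing sketch. 15 engineers at a ~2 Mbps average would
# naively need 30 Mbps, but concurrent usage is lower, which is how
# ~15 CAD/CAM engineers fit into roughly 20 Mbps.
# The concurrency factor 0.67 is an illustrative assumption.

def link_estimate_mbps(users, avg_mbps=2.0, concurrency=0.67):
    """Estimated shared-link bandwidth for a group of average users."""
    return users * avg_mbps * concurrency

print(round(link_estimate_mbps(15)))  # ~20 Mbps rule of thumb
```

Real deployments should measure actual protocol bandwidth per user group, since model size, resolution and usage pattern shift these numbers considerably.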
Technical Pitfalls we experienced
64-bit hardware (MMIO BAR mapping)
Server and GRID card BIOSes
NUMA server architecture
Endpoint devices & performance (e.g. thin clients + supported protocols)
Framebuffer grabbing (NVFBC / Monterey API)
POC – End user feedback
Source: Customer presentation
[Same virtual workstation table as in "POC – Define virtual Workstations", shown again annotated with end-user feedback per segment]
POC – IT administrator evaluation
Too little GPU frame buffer and not enough CPU resources
Great performance but doesn’t build the business case
Great performance and great scalability for most users
Great performance and good scalability for many users
POC – Sizing learning
[Diagram: NVIDIA QUADRO with a dedicated frame buffer and 3D engine vs. NVIDIA GRID vGPU with a dedicated frame buffer per vGPU and a time-shared 3D engine]
Time scheduling allows the highest densities without compromising performance
Customers need to understand the GPU requirements of their applications
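The QUADRO vs. GRID vGPU comparison above boils down to one point: frame buffer is partitioned statically, but the 3D engine is time-sliced among vGPUs that are actively rendering. A toy illustration of that effect (our simplification, not NVIDIA's actual scheduler):

```python
# Illustrative sketch of time-sliced 3D engine sharing under GRID vGPU.
# Each vGPU keeps its dedicated frame buffer slice; the 3D engine is
# shared only among vGPUs that are rendering right now. Idle neighbors
# cost nothing, which is why high densities can still feel fast.
# This is a simplified equal-share model, not the real scheduler.

def engine_share(active_vgpus):
    """Fraction of the physical 3D engine each active vGPU receives."""
    return 1.0 if active_vgpus <= 1 else 1.0 / active_vgpus

print(engine_share(1))  # a lone renderer owns the whole engine: 1.0
print(engine_share(8))  # eight K260Q-style vGPUs all rendering: 0.125
```

This is why the assessment phase must measure how often users actually saturate the GPU: viewers who render occasionally pack far more densely than engineers running constant DMU sessions.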
POC – Organizational challenges
Source: Customer presentation
Clarification of support by the software vendors
Decision on the license model for CAx applications on virtual machines (international usage, usage by external partners, etc.)
Adjusting the applications or the associated environment for an optimal use of the applications on virtual machines
Support model for company-internal and external users
Targets / project schedule: the project result must be a validated technical solution which will be provided to the customer’s internal departments and their external development partners as an IT service
Lifecycle of a successful GRID implementation
Phase 1 (TechDemo) Phase 2 (Assessment & small and focussed PoC) Phase 3 (widened PoC based on feedback) Phase 4 (Implementation/User Acceptance/Production)
Educate support engineers / introduce a support matrix
Implement daily management processes like provisioning of new VMs and patching of existing VMs
Phase 5 (Maintenance / Update / Daily Use)
Meanwhile things changed …
Evolution of Nvidia GRID / vGPU : 2014 vGPU 1.1
Introduced 2 additional vGPU profiles (now K100, K120Q, K140Q, K200, K220Q, K240Q, K260Q, passthrough)
PowerShell interface available
nView and NVWMI supported on all vGPUs
Signed drivers for Windows 8.1 and Windows Server 2012 R2 included
Various bugfixes
Expanded lists of certified servers and certified applications
Evolution of Nvidia GRID / vGPU : 2015 vGPU 1.2
Introduced 3 additional vGPU profiles (now K100, K120Q, K140Q, K160Q, K180Q, K200, K220Q, K240Q, K260Q, K280Q, passthrough)
Supported on XenServer 6.2 SP1 and XenServer 6.5
96 vGPUs per host on XenServer 6.5
...and more to come ... stay connected
Many customers are now in full production
Did we succeed? How can we improve further?
Have we been successful?
√ Growing globalization within the company → remote sites enabled across the globe
√ Competition to hire the best → employees, partners and contractors can work from anywhere
√ Competition to design and build faster with better quality → increased productivity and flexibility
√ Collaboration enabled between internal and external teams
√ Increasing security breaches → increased security and compliance
√ German law ("Arbeitnehmerueberlassung") → contractors can work off premise
Lifecycle of a successful GRID implementation
Phase 1 (TechDemo) Phase 2 (Assessment & small and focussed PoC) Phase 3 (widened PoC based on feedback) Phase 4 (Implementation/User Acceptance/Production) Phase 5 (Maintenance / Update / Daily Use)
Maintenance / Update / Daily Use
Upgrade to XenServer 6.5
Upgrade to XenDesktop 7.6
Upgrade to the new GRID vGPU Manager and in-guest drivers
Lifecycle of applications, VMs & the associated base image
Room for improvement
Educate the NVIDIA/Citrix partners
Higher density (use H.264 hardware encoding)
User experience in hostile network environments (Framehawk)
Provide Linux-based VMs for CAE etc.
Collaboration features
Best practices and application-specific whitepapers
Q & A
Summary - How successful projects lift off
Familiarize yourself with GRID (self-paced learning, demo/test system)
Do a proper assessment of existing workstations (real GPU usage)
Leverage or build a close relationship with the vendors (Citrix, NVIDIA, etc.)
Set the right expectations
Find a sponsor with a need to change the traditional workplace
Involve ALL people (IT, CAD/CAM department, end users, decision makers, experienced virtualization partner)
Leverage partners who are familiar with desktop virtualization
Specify phases (TechDemo, PoC, Implementation, Production)
Continuously listen to end-user feedback