S5393 - EVOLUTION OF AN NVIDIA GRID DEPLOYMENT
ERIK BOHNHORST, SR. GRID SOLUTION ARCHITECT, NVIDIA
RONALD GRASS, SR. SYSTEMS ENGINEER, CITRIX SYSTEMS

What we will cover
Who implemented NVIDIA GRID with Citrix XenDesktop
Why did they want to move to a remote desktop solution
How did they evaluate and implement NVIDIA GRID: sales pitch & TechDemo, proof of concept, production environment
Challenges and learnings
How will they move forward
Who are we talking about
Manufacturing vertical
NVIDIA QUADRO customer
Competitive market
Wide range of CAD/CAE applications
Experienced with remote desktop solutions
Business Drivers and initiatives
Growing globalization within the company → enabling remote sites across the globe
Increasing competition to hire the best → allowing employees, partners and contractors to work from anywhere
Increasing competition to design and build faster with better quality → increasing productivity and flexibility, enabling collaboration between internal and external teams
Increasing security breaches → increasing security and compliance
German law ("Arbeitnehmerueberlassung", the German act on temporary agency work) → enabling contractors to work off premise
Wouldn’t it be great if…
…you could work on any device, from anywhere
…with collaboration and a productivity increase
…with security & compliance
…with less redundant infrastructure
Project Start – early 2013
Once upon a time… when the customer started
Evaluation of multiple remote solutions
Interest in HP blades due to the high density of GPUs
Customer received a sales pitch on NVIDIA vGPU & XenDesktop
Overall plan: evaluate NVIDIA vGPU in early beta under NDA and compare NVIDIA vGPU vs. GPU passthrough
Somewhere in between
Citrix & NVIDIA partnership since 2008
May 2012: GRID announced during NVIDIA GTC keynote
May 2013: Citrix vGPU announced during Synergy keynote
Sep 2013: NVIDIA RTM
Oct 2013: vGPU Tech Preview
Dec 2013: vGPU General Availability
That’s our vGPU object as seen by the hypervisor (XenServer 6.2):

[root@SM01 ~]# xe vm-list name-label=Win7-vGPU-01
uuid ( RO)          : 831ab2f3-8e23-e876-d92a-16810a85499e
    name-label ( RW): Win7-vGPU-01
    power-state ( RO): halted
[root@SM01 ~]# xe vgpu-create vm-uuid=831ab2f3-8e23-e876-d92a-16810a85499e gpu-group-uuid=d840caad-2ce0-6395-78a5-9ac984667412 vgpu-type-uuid=5514073f-6d7b-90c6-6648-2335ad1cc81a
23908c99-eecb-835e-fd46-5936e0a3bf652
Evolution of Nvidia GRID / vGPU : 2013 vGPU beta
Only 5 vGPU profiles + passthrough available: K100, K140Q, K200, K240Q, K260Q, passthrough
Limited to Windows 7 only
Creating passthrough or vGPU objects was possible through the CLI only :(
No way to use passthrough and vGPU VMs at the same time
XenServer 6.2 only (specially patched)
Very limited hardware available
Evolution of Nvidia GRID / vGPU : 2013 RTM
Same 5 vGPU profiles + passthrough available: K100, K140Q, K200, K240Q, K260Q, passthrough
Creation of vGPUs and monitoring of pGPUs through CLI or XenCenter (GUI)
Mass creation of vGPU-enabled VMs through Desktop Studio (XenDesktop ≥7.1)
Passthrough and vGPU VMs can be run simultaneously
XenServer 6.2 SP1 with 64 vGPUs
(Forum banter: "I can‘t get it to work :-(" / "w0rk5 f0r m3 ... s0 ch3ck the uuids, bl00dy n00b !!")
Lifecycle of a successful GRID implementation
Phase 1 (TechDemo)
Conduct a TechDemo for CAD/CAM stakeholders / engineers that leads to a "WOW" effect.
Phase 2 (Assessment & small and focussed PoC) Phase 3 (widened PoC based on feedback) Phase 4 (Implementation/User Acceptance/Production) Phase 5 (Maintenance / Update / Daily Use)
Sales pitch & TechDemo – create the "WOW effect"
We did a sales pitch on NVIDIA GRID and a very convincing TechDemo of Citrix XenDesktop with vGPU on XenServer to create the WOW effect
Demo applications like NVIDIA Hair, NVIDIA FaceWorks, Design Garage, Blender, VRRender, Autodesk 30-day trials or JT2Go were used because of the lack of licenses and of deep CATIA / SolidEdge / Siemens NX knowledge
Demonstrated access from mobile platforms (Android – Galaxy Tab 10.1 and iOS – iPad)
We used a cloud-hosted demo center, which proved that the solution works over WAN as well
Focused on user experience and used peripherals (e.g. SpacePilot)
From WOW to HOW? Next steps
Phase 1 (TechDemo) Phase 2 (Assessment and very focussed PoC)
Start with a strictly defined use case (LAN only, specific applications, small user group)
Collect feedback on user experience and network behavior
Phase 3 (widened PoC based on feedback)
Evaluate user feedback
Widen use cases, e.g. remote access (WAN)
Use more complex drawings / models and higher-end use cases (Engineer vs. Viewer only)
Phase 4 (Implementation/User Acceptance/Production) Phase 5 (Maintenance / Update / Daily Use)
Components involved
CAD applications: Dassault CATIA, Siemens NX, Autodesk products, PTC Creo, JT2Go
Server: dual socket, 2x Intel E5-2690 v2, 256 GB RAM, SSDs, 2x NVIDIA GRID K2
Hypervisor: Citrix XenServer 6.2 SP1 with the corresponding NVIDIA vGPU Manager version
Guest: NVIDIA display driver 332.83, Citrix Virtual Desktop Agent, CAD application
Broker: Citrix XenDesktop 7.1 or 7.5
POC – Define virtual Workstations
User Segmentation | OS | vCPUs | Virtual GPU | Frame Buffer (MB) | GPU Mode | Remoting Stack | VMs per host (2x GRID K2)
Entry | Windows 7 | 4 | GRID K220Q | 512 | NVIDIA vGPU | Citrix XenDesktop | 32
Medium | Windows 7 | 4 | GRID K240Q | 1024 | NVIDIA vGPU | Citrix XenDesktop | 16
Advanced | Windows 7 | 4 | GRID K260Q | 2048 | NVIDIA vGPU | Citrix XenDesktop | 8
Expert | Windows 7 | 4 | GRID “K280Q” | 4096 | Passthrough | Citrix XenDesktop | 4
Medium | Linux | 4 | GRID K2 | 4096 | Passthrough | NICE DCV, HP RGS | 4
Expert | Linux | 4 | GRID K2 | 4096 | Passthrough | NICE DCV, HP RGS | 4
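The densities in the last column follow directly from the K2 hardware: each GRID K2 card carries two physical GPUs with 4 GB of frame buffer each, and a physical GPU runs only one (homogeneous) vGPU profile at a time. A minimal sketch of that arithmetic (the function name is ours, not from the deck):

```python
# Sketch: derive the "VMs per host" column from GRID K2 hardware.
# Assumptions: 2x GRID K2 cards per host, each card has 2 physical
# GPUs with 4096 MB of frame buffer each; all vGPUs on one physical
# GPU must use the same profile (homogeneous profiles).

GPUS_PER_K2 = 2
FB_PER_GPU_MB = 4096

def vms_per_host(profile_fb_mb, k2_cards=2):
    """Number of vGPU-enabled VMs a host can run for one profile."""
    physical_gpus = k2_cards * GPUS_PER_K2
    vgpus_per_gpu = FB_PER_GPU_MB // profile_fb_mb
    return physical_gpus * vgpus_per_gpu

for profile, fb in [("K220Q", 512), ("K240Q", 1024),
                    ("K260Q", 2048), ("passthrough", 4096)]:
    print(profile, vms_per_host(fb))  # 32, 16, 8, 4 - matches the table
```

The same arithmetic explains why halving the frame buffer per profile doubles the achievable VM density per host.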
Technical challenges
Physical laws (latency, bandwidth, packet loss)
Matching workstation-like user experience
Server- vs. client-side rendered mouse cursor
Endpoint devices & endpoint performance (e.g. thin clients)
High screen resolutions – lots of data (UHD/4K)
Frame rate / low bandwidth / graphics quality
API support
Distributed locations
Peripheral devices
Bandwidth, Latency, Network Quality
Quality and performance correlate closely with the available network bandwidth and the distance (latency)
Average user: ~1-2 Mbps *
Expert user: ~4-5 Mbps *
20 Mbps for ~15 CAD/CAM engineers *
Influencing parameters
Window size and number of monitors
Screen resolution
Size of models, different usage patterns (VR, CAD, DMU, 3D viewing, etc.)
Individual perception / level of acceptance (user experience)
* average measurements
Source: Customer presentation
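The group figure is smaller than the per-user peaks suggest because not every engineer draws peak bandwidth at once. A rough sizing sketch consistent with the averages above (the concurrency factor is our hypothetical planning assumption, not a number from the deck):

```python
# Rough WAN sizing sketch. 15 engineers at a ~2 Mbps average would
# naively need 30 Mbps, but concurrent usage is lower, which is how
# ~15 CAD/CAM engineers fit into roughly 20 Mbps.
# The concurrency factor 0.67 is an illustrative assumption.

def link_estimate_mbps(users, avg_mbps=2.0, concurrency=0.67):
    """Estimated shared-link bandwidth for a group of average users."""
    return users * avg_mbps * concurrency

print(round(link_estimate_mbps(15)))  # ~20 Mbps rule of thumb
```

Real deployments should measure actual protocol bandwidth per user group, since model size, resolution and usage pattern shift these numbers considerably.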
Technical Pitfalls we experienced
64-bit hardware (MMIO BAR mapping)
Server and GRID card BIOSes
NUMA server architecture
Endpoint devices & performance (e.g. thin clients + supported protocols)
Framebuffer grabbing (NVFBC / Monterey API)
POC – End user feedback
Source: Customer presentation
[Same virtual workstation table as in "POC – Define virtual Workstations", shown again annotated with end-user feedback per segment]
POC – IT administrator evaluation
Too little GPU frame buffer and not enough CPU resources
Great performance but doesn’t build the business case
Great performance and great scalability for most users
Great performance and good scalability for many users
POC – Sizing learning
[Diagram: NVIDIA QUADRO with a dedicated frame buffer and 3D engine vs. NVIDIA GRID vGPU with a dedicated frame buffer per vGPU and a time-shared 3D engine]
Time scheduling allows the highest densities without compromising performance
Customers need to understand the GPU requirements of their applications
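The QUADRO vs. GRID vGPU comparison above boils down to one point: frame buffer is partitioned statically, but the 3D engine is time-sliced among vGPUs that are actively rendering. A toy illustration of that effect (our simplification, not NVIDIA's actual scheduler):

```python
# Illustrative sketch of time-sliced 3D engine sharing under GRID vGPU.
# Each vGPU keeps its dedicated frame buffer slice; the 3D engine is
# shared only among vGPUs that are rendering right now. Idle neighbors
# cost nothing, which is why high densities can still feel fast.
# This is a simplified equal-share model, not the real scheduler.

def engine_share(active_vgpus):
    """Fraction of the physical 3D engine each active vGPU receives."""
    return 1.0 if active_vgpus <= 1 else 1.0 / active_vgpus

print(engine_share(1))  # a lone renderer owns the whole engine: 1.0
print(engine_share(8))  # eight K260Q-style vGPUs all rendering: 0.125
```

This is why the assessment phase must measure how often users actually saturate the GPU: viewers who render occasionally pack far more densely than engineers running constant DMU sessions.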
POC – Organizational challenges
Source: Customer presentation
Clarification of support by the software vendors
Decision on the license model for CAx applications on virtual machines (international usage, usage by external partners, etc.)
Adjusting the applications or the associated environment for an optimal use of the applications on virtual machines
Support model for company-internal and external users
Targets / project schedule: the project result must be a validated technical solution which will be provided to the customer’s internal departments and their external development partners as an IT service
Lifecycle of a successful GRID implementation
Phase 1 (TechDemo) Phase 2 (Assessment & small and focussed PoC) Phase 3 (widened PoC based on feedback) Phase 4 (Implementation/User Acceptance/Production)
Educate support engineers / introduce a support matrix
Implement daily management processes like provisioning of new VMs and patching of existing VMs
Phase 5 (Maintenance / Update / Daily Use)
Meanwhile things changed …
Evolution of Nvidia GRID / vGPU : 2014 vGPU 1.1
Introduced 2 additional vGPU profiles (now K100, K120Q, K140Q, K200, K220Q, K240Q, K260Q, passthrough)
PowerShell interface available
nView and NVWMI supported on all vGPUs
Signed drivers for Windows 8.1 and Windows Server 2012 R2 included
Various bugfixes
Expanded lists of certified servers and certified applications
Evolution of Nvidia GRID / vGPU : 2015 vGPU 1.2
Introduced 3 additional vGPU profiles (now K100, K120Q, K140Q, K160Q, K180Q, K200, K220Q, K240Q, K260Q, K280Q, passthrough)
Supported on XenServer 6.2 SP1 and XenServer 6.5
96 vGPUs per host on XenServer 6.5
...and more to come ... stay connected
Many customers are now in full production
Did we succeed? How can we improve further?
Have we been successful?
√ Growing globalization within the company → remote sites enabled across the globe
√ Competition to hire the best → employees, partners and contractors can work from anywhere
√ Competition to design and build faster with better quality → increased productivity and flexibility
√ Collaboration enabled between internal and external teams
√ Increasing security breaches → increased security and compliance
√ German law ("Arbeitnehmerueberlassung") → contractors can work off premise
Lifecycle of a successful GRID implementation
Phase 1 (TechDemo) Phase 2 (Assessment & small and focussed PoC) Phase 3 (widened PoC based on feedback) Phase 4 (Implementation/User Acceptance/Production) Phase 5 (Maintenance / Update / Daily Use)
Maintenance / Update / Daily Use
Upgrade to XenServer 6.5
Upgrade to XenDesktop 7.6
Upgrade to the new GRID vGPU Manager and in-guest drivers
Lifecycle of applications, VMs & the associated base image
Room for improvement
Educate the NVIDIA/Citrix partners
Higher density (use H.264 hardware encoding)
User experience in hostile network environments (Framehawk)
Provide Linux-based VMs for CAE etc.
Collaboration features
Best practices and application-specific whitepapers
Q & A
Summary - How successful projects lift off
Familiarize yourself with GRID (self-paced learning, demo/test system)
Do a proper assessment of existing workstations (real GPU usage)
Leverage or build a close relationship with the vendors (Citrix, NVIDIA, etc.)
Set the right expectations
Find a sponsor with a need to change the traditional workplace
Involve ALL people (IT, CAD/CAM department, end users, decision makers, experienced virtualization partner)
Leverage partners who are familiar with desktop virtualization
Specify phases (TechDemo, PoC, Implementation, Production)
Continuously listen to end-user feedback