April 4-7, 2016 | Silicon Valley
Erik Bohnhorst, GRID Performance Architect, 04/04/2016
UNVEILING THE IMPACT OF TIME SLICING WITH NVIDIA GRID
AGENDA
- GRID vGPU resource sharing (simplified)
- How time slicing works
- Using “benchmarks” to evaluate GRID
- Impact of realistic recommendations
Tesla GPU (simplified view): vGPU-1, vGPU-2, ... vGPU-n share the GPU engines (Graphics/Compute, Video Encode, Video Decode, Copy Engine), while the framebuffer is partitioned into per-VM regions (VM-1 FB, VM-2 FB, ... VM-n FB). Each vGPU is assigned a fixed range of framebuffer for its exclusive use.

The GPU engines (Graphics/Compute, Video Encode, Video Decode, and Copy Engine) are time sliced (t=1, t=2, ... t=16), and the different engines can execute in parallel. During its time slice, each vGPU has exclusive access to the entire engine (all CUDA cores).
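The engine-level sharing described above can be sketched as a simple round-robin scheduler. This is an illustrative model only: the slice count and the rotation policy here are assumptions for the sketch, not the actual GRID scheduler implementation.

```python
from collections import deque

def round_robin(vgpus, total_slices):
    """Round-robin time slicing: each vGPU in turn gets exclusive use
    of the whole engine (all CUDA cores) for one time slice."""
    schedule = []
    queue = deque(vgpus)
    for t in range(total_slices):
        vgpu = queue[0]
        queue.rotate(-1)          # move the current vGPU to the back
        schedule.append((t, vgpu))
    return schedule

# Three vGPUs sharing 6 slices of the 3D engine.
for t, vgpu in round_robin(["vGPU-1", "vGPU-2", "vGPU-3"], 6):
    print(f"t={t}: {vgpu} owns the engine")
```

Because only one vGPU owns an engine per slice, a vGPU that has no work pending simply wastes its slice in this naive model; the later slides show why real, bursty workloads leave exactly that kind of reusable gap.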
Framebuffer: comparing framebuffer usage (in MB) against benchmark scores, the 2Q, 4Q and 8Q vGPU profiles perform equally when each has sufficient framebuffer and the entire time of the 3D engine.
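Since each vGPU owns a fixed framebuffer range, a larger profile only helps once the workload needs more memory than the smaller one provides. A rough sanity check, assuming the common convention that a Q profile's number is its framebuffer in GB (the exact usable sizes vary by GRID version and reserve some overhead, so verify against your documentation):

```python
# Assumed framebuffer sizes per profile (Q number ~ GB); check your GRID docs.
PROFILE_FB_MB = {"2Q": 2048, "4Q": 4096, "8Q": 8192}

def smallest_sufficient_profile(workload_fb_mb):
    """Return the smallest profile whose framebuffer covers the workload,
    or None if even the largest profile is too small."""
    for profile, fb in sorted(PROFILE_FB_MB.items(), key=lambda kv: kv[1]):
        if workload_fb_mb <= fb:
            return profile
    return None

print(smallest_sufficient_profile(1500))  # a 1.5 GB workload fits in 2Q
```

This mirrors the slide's point: if the workload fits in 2Q, then 4Q and 8Q add framebuffer the workload never touches, so the scores are equal.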
JT2GO – Zooming/Panning/Rotating
Panning and rotating: at this timescale, the GPU seems to be constantly utilized during zooming, panning and rotating.
Zooming in on the GPU and CPU utilization traces, however, there is lots of unused time in between the spikes.
NVIDIA uses ultra high-end GPUs with GRID for maximum available time: the unused time between spikes is time a different process or virtual machine can use the GPU.
A GPU-heavy process (such as zooming) within a human workflow is bursty, while a benchmark such as the VP12 Catia viewset is a synthetic workload that keeps the GPU continuously busy.
Unrealistic users-per-host recommendation using benchmarks: on a performance-over-time chart, every user is Active the entire time, so their demands all overlap.

Realistic users-per-host recommendation using human workflows: each user's timeline is mostly Idle with short Active bursts (e.g. Idle, Idle, Idle, Active, Idle), so the bursts of many users can interleave on the same GPU.
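The difference between the two charts can be put into rough numbers. A hypothetical sizing sketch: if a benchmark keeps its vGPU 100% active while a human workflow is active only a fraction of the time, the GPU can be oversubscribed roughly in proportion to the inverse of that duty cycle (the 25% figure below is an assumption for illustration; framebuffer limits and scheduling overhead cap the real number):

```python
def users_per_gpu(duty_cycle, target_utilization=1.0):
    """Rough estimate: how many users' active bursts fit into one GPU's
    time, given the fraction of time each user keeps the engine busy."""
    return int(target_utilization / duty_cycle)

print(users_per_gpu(1.0))    # benchmark: always active -> 1 user per GPU
print(users_per_gpu(0.25))   # human workflow active 25% of the time -> 4
```

This is why sizing from synthetic benchmarks systematically understates how many real users a host can serve.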
Realistic users per host (UPH) recommendations can only be generated with human workflows. Evaluate GRID by monitoring the human workflows of a small group of real end users on GRID vGPU.
This evaluation is an iterative cycle: Monitor → Configure/Change → Run.
                               “BENCHMARKS”   REAL USERS
Cost per server                $30,000        $40,000
Users per host                 8              16
Software costs                 per user       per user
Cost per user (server HW)      $3,750         $2,500
“Cost per user” drops significantly when evaluating GRID with real users.
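The table's cost-per-user figures follow from simple division over server hardware only; the per-user software costs are identical in both columns and cancel out of the comparison:

```python
def hw_cost_per_user(server_cost, users_per_host):
    # Server hardware cost divided across the users that host serves.
    return server_cost / users_per_host

print(hw_cost_per_user(30_000, 8))   # benchmark-based sizing: 3750.0
print(hw_cost_per_user(40_000, 16))  # real-user sizing:       2500.0
```

Even though the real-user server costs more, doubling the users per host cuts the hardware cost per user by a third.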
Time slicing the 3D engine allows sharing based on actual need, delivering great performance at scale. Leveraging benchmarks results in unrealistic, too low users-per-host recommendations. Too low users-per-host recommendations create unrealistic TCO/ROI assumptions, and too high TCO/ROI assumptions could delay or kill the project.
JOIN THE NVIDIA DEVELOPER PROGRAM AT developer.nvidia.com/join