S7105 ADAS/AD CHALLENGES: GPU SCHEDULING & SYNCHRONIZATION
Venugopala Madumbu, NVIDIA GTC 2017 – 210D
ADVANCED DRIVING ASSIST SYSTEMS (ADAS) & AUTONOMOUS DRIVING (AD)
High Compute Workloads Mapped to GPU
ADAS/AD Requirements & Challenges

Real-Time Behavior
- Determinism
- Freedom from Interference
- Priority of Functionalities

Performance
- Maximum Throughput
- Minimal Latency

Compute platforms: Multi-Core CPU, GPU/DSP/HWA
ADAS/AD WORKLOADS
Challenges Illustrated
Scenario#1 – Standalone Execution: GL Workload completes in X msec
Scenario#2 – Standalone Execution: CUDA Workload completes in Y msec
Scenario#3 – Concurrent Execution: GL + CUDA Workloads on a time-shared GPU take > (X+Y) msec

If so, how to
- Achieve determinism
- Achieve freedom from interference
- Prioritize one workload over the other
while also having
- maximum throughput
- minimum latency
GPU IN TEGRA
High-Level Tegra SoC Block Diagram

(Diagram: the CPU and other clients (ISP, Display, etc.) share DRAM through a memory controller; the GPU attaches via its own memory interface and contains a Host unit and engines.)

- CPU submits jobs/work to the GPU
- GPU runs asynchronously to the CPU
- GPU has its own hardware scheduler (Host)
- The Host switches between workloads without CPU involvement
GPU SCHEDULING
Concepts

- Channel – independent stream of work on the GPU
- Command Push Buffer – command buffer written by software and read by hardware
- Channel Switching – save/restore of GPU state on a channel switch
- Semaphores/Syncpoints – synchronization mechanism for events within the GPU
- Time Slice – how long the GPU executes commands of a channel before a channel switch
- Run-list – an ordered list of channels that software wants the GPU to execute
GPU SCHEDULING
Timesharing by Channel Switching

Channel switching occurs when any ONE of the following happens:
- Time slice expires
- Engine runs out of work (no more commands)
- Blocked on a semaphore

Channel Switch time = Drain Time + Save/Restore time
Preemption can reduce channel switch times drastically

(Diagram: GPU occupancy over time under timesliced round-robin scheduling of App1, App2, App3, App4.)
GPU SCHEDULING
Preemption: Channel Switching with Time Slice Scenarios

1. Channel finishes before time slice expires
- Context switch to the next channel

2. Channel preemption (on channel switch timeout)
- Stop all commands in the pipeline
- Wait for engines to idle
- Higher context switch time

3. Channel reset (on channel switch timeout)
- Engine could not idle and context could not be saved before the channel switch timeout
- Callback notifies the kernel of the channel reset event

(Diagrams: Channel 1 running against its time slice and the channel switch timeout in each scenario.)
CHALLENGE REVISITED
How can we achieve both?

Real-Time behavior:
- Determinism
- Freedom from Interference
- Priority of Functionalities
Performance:
- Maximum Throughput
- Minimal Latency
GPU SYNCHRONIZATION & SCHEDULING
Software Control
1. User Driver Level (GPU Synchronization Approach)
- Syncpoints/Semaphores for Synchronization
- Through EGLStreams, EGLSync, etc.
2. Kernel Driver Level (GPU Priority Scheduling Approach)
- Run-List Engineering
- How long channel runs
- Order of Channel execution
GPU SYNCHRONIZATION APPROACH
No Synchronization Case
(Timeline, 0–35 msec: CPU tasks launch GPU kernels without synchronization; the priority GPU task and other GPU tasks execute concurrently, causing latency due to concurrent execution.)
GPU SYNCHRONIZATION APPROACH
Synchronization on CPU: Not good for GPU
(Timeline, 0–35 msec: GPU work is synchronized from the CPU side; kernel launches are serialized through the CPU, leaving the GPU waiting between tasks.)
GPU SYNCHRONIZATION APPROACH
Synchronization on GPU: No Context Switches
(Timeline, 0–35 msec: GPU semaphores order the work on the device; the priority GPU task runs without context switches, and a dependent CPU task has a delayed start.)

Achieves: Determinism, Freedom from Interference, Priority of Functionalities
GPU PRIORITY SCHEDULING APPROACH
Hypothetical Example
TASK  PRIORITY         FPS  WORST CASE EXECUTION TIME (WCET)
H1    High             60   9 ms
M1    Medium           30   4 ms
M2    Medium           30   4 ms
L1    Low/Best Effort  30   10 ms
GPU PRIORITY SCHEDULING APPROACH
Engineered Run-list and Time Slice Ensuring FPS and Latency
Run-List (repeating): H1, M1, M2, H1, M1, L1, M2

H1 (Max Exec Time = 9 ms): Time slice = 9 ms
M1 (Max Exec Time = 4 ms): Time slice = 3 ms
M2 (Max Exec Time = 4 ms): Time slice = 3 ms
L1 (Max Exec Time = 10 ms): Time slice = 1 ms

(Timeline: work on the GPU cycles through the run-list.)
Ensures the gap between H1 slots is not >16 ms, for 60 fps operation
GPU PRIORITY SCHEDULING APPROACH
Reduce Latency for GPU Work Completion

- Ensure the time slice is long enough to complete work
- Ensure work is submitted continually and well ahead of time, to avoid:
  - GPU idle time
  - Unnecessary context switches
GPU SCHEDULING
Best Practices to Keep the GPU Busy

- Submit work in advance, so the GPU has work to execute at any point in time
- Try to reduce/eliminate work dependencies
- Have a contingency plan for work overload: if feedback shows the budget is exceeded, submit work a few frames ahead and spread it out
- Plan for the worst-case scenario: deal with the GPU reset case, especially for low-priority work (GL robustness extensions)
CONCLUSION
GPU Synchronization & Scheduling Approaches
Real-Time behavior:
- Determinism
- Freedom from Interference
- Priority of Functionalities
Performance:
- Maximum Throughput
- Minimal Latency
ACKNOWLEDGEMENTS
- Scott Whitman, NVIDIA
- Vladislav Buzov, NVIDIA
- Amit Rao, NVIDIA
- Yogesh Kini, NVIDIA
GTC Instructor-led Lab: L7105 – EGLSTREAMS: INTEROPERABILITY OF CAMERA, CUDA AND OPENGL
11th May 2017, 9:30–11:30 AM, LL21D