Making OpenVX Really “Real Time”


SLIDE 1

Making OpenVX Really “Real Time”

Ming Yang¹, Tanya Amert¹, Kecheng Yang¹,², Nathan Otterness¹, James H. Anderson¹, F. Donelson Smith¹, and Shige Wang³

¹The University of North Carolina at Chapel Hill  ²Texas State University  ³General Motors Research

SLIDE 2

700 ms: the time an alert driver takes to react.

SLIDE 3
SLIDE 4

A new approach for graph scheduling

SLIDE 5

Shorter response time + Less capacity loss

SLIDE 6
  • 1. State of the art
  • 2. Our approach
  • 3. Future work


SLIDE 7

Source: https://www.khronos.org/openvx/

[Figure: an example OpenVX graph, a chain of OpenVX nodes running between native camera control and downstream application processing, with applications mapped onto diverse hardware (GPU, FPGA, DSP).]

Graph-based architecture
Portability to diverse hardware

Does OpenVX really target “real-time” processing?
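As a concrete illustration of the graph-based model, such a graph can be represented as a plain DAG and executed in dependency order. This is a minimal Python sketch with hypothetical node names, not the OpenVX C API (which is built around vx_graph objects):

```python
# Minimal sketch of an OpenVX-style graph as a DAG (hypothetical node
# names, not the OpenVX C API). Each edge points from a producer node
# to a consumer node.
graph = {
    "CameraInput": ["Blur"],
    "Blur": ["Gradient"],
    "Gradient": ["Threshold"],
    "Threshold": [],  # final output for downstream application processing
}

def topological_order(dag):
    """Return the nodes in an order that respects every producer/consumer edge."""
    indegree = {n: 0 for n in dag}
    for succs in dag.values():
        for s in succs:
            indegree[s] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    order = []
    while ready:
        node = ready.pop()
        order.append(node)
        for s in dag[node]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order

print(topological_order(graph))  # ['CameraInput', 'Blur', 'Gradient', 'Threshold']
```

A graph runtime is free to run independent nodes in parallel; a topological order is only the simplest legal schedule.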

SLIDE 8

  • 1. It lacks real-time concepts
  • 2. Entire graphs = monolithic schedulable entities

Does OpenVX really target “real-time” processing?

SLIDE 9

  • 1. It lacks real-time concepts
  • 2. Entire graphs = monolithic schedulable entities

[Figure: an example four-node graph with nodes A, B, C, and D.]

Does OpenVX really target “real-time” processing?

SLIDE 10

  • 1. It lacks real-time concepts
  • 2. Entire graphs = monolithic schedulable entities

[Figure: under monolithic scheduling, the entire four-node graph (A, B, C, D) executes as a single schedulable entity on the timeline.]

Does OpenVX really target “real-time” processing?

SLIDE 11

Prior Work

  • OpenVX nodes = schedulable entities [23, 51]


Coarse-grained scheduling

[Figure: under coarse-grained scheduling, the four-node graph is split into Tasks A-D, each scheduled as its own entity on the timeline.]

SLIDE 12

Prior Work

  • OpenVX nodes = schedulable entities [23, 51]


Coarse-grained scheduling

Remaining problems:
  • 1. More parallelism could still be exploited
  • 2. Suspension-oblivious analysis was applied, causing capacity loss

SLIDE 13

Fine-Grained Scheduling

This Work

SLIDE 14


1. Coarse-grained vs. fine-grained
2. Response-time bounds analysis
3. Case study

SLIDE 15


1. Coarse-grained vs. fine-grained
2. Response-time bounds analysis
3. Case study

SLIDE 16

Coarse-Grained Scheduling


[Figure: two schedules of the same graph. Under coarse-grained scheduling, Tasks A-D run as CPU tasks that self-suspend during GPU execution. Under fine-grained scheduling, the GPU work is split off into separate tasks E, F, and G, leaving CPU tasks A, C, and D.]

Fine-Grained Scheduling
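The difference between the two schedulings can be sketched as a graph transformation: a coarse-grained node with an embedded GPU phase is split into a CPU subtask and a GPU subtask joined by a precedence edge, so the CPU is not held while the GPU runs. A minimal Python sketch; the node name and costs (in ms) are hypothetical illustrative values, not measurements from the paper:

```python
# Sketch: turning a coarse-grained node with an embedded GPU phase into
# fine-grained CPU and GPU subtasks. The name and the costs (ms) are
# hypothetical illustrative values.
coarse_node = {"name": "B", "cpu_ms": 4, "gpu_ms": 6}

def split_node(node):
    """Emit a CPU subtask and a GPU subtask for one node, plus a
    precedence edge so the GPU part starts only after the CPU part."""
    cpu = (node["name"] + "_cpu", node["cpu_ms"], "CPU")
    gpu = (node["name"] + "_gpu", node["gpu_ms"], "GPU")
    return [cpu, gpu], [(cpu[0], gpu[0])]

subtasks, precedence = split_node(coarse_node)
print(subtasks)    # [('B_cpu', 4, 'CPU'), ('B_gpu', 6, 'GPU')]
print(precedence)  # [('B_cpu', 'B_gpu')]
```

Under coarse-grained scheduling the 6 ms GPU phase would be a self-suspension inside task B; here it becomes an independently schedulable GPU task, avoiding the capacity loss of suspension-oblivious analysis.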

SLIDE 17


1. Coarse-grained vs. fine-grained
2. Response-time bounds analysis
3. Case study

SLIDE 18

Deriving Response-Time Bounds for a DAG*

Step 1: Schedule the nodes as sporadic tasks
Step 2: Compute bounds for every node
Step 3: Sum the bounds of nodes on the critical path

* C. Liu and J. Anderson, “Supporting Soft Real-Time DAG-based Systems on Multiprocessors with No Utilization Loss,” in RTSS, 2013.
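Step 3 can be made concrete with a small computation: given a per-node response-time bound for each node (hypothetical numbers standing in for what Steps 1 and 2 would produce), the end-to-end bound is the largest sum of bounds over any source-to-sink path. A Python sketch over a six-node example DAG:

```python
from functools import lru_cache

# Hypothetical per-node response-time bounds (ms); in the real analysis
# these come from Steps 1 and 2 (sporadic-task response-time bounds).
bounds_ms = {"A": 10, "B": 12, "C": 8, "D": 9, "E": 11, "F": 7}
# Edges of the example DAG (producer -> consumers).
edges = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": ["F"], "F": []}

@lru_cache(maxsize=None)
def end_to_end_bound(node):
    """Largest sum of per-node bounds over any path from `node` to a sink
    (Step 3: sum the bounds along the critical path)."""
    tail = max((end_to_end_bound(s) for s in edges[node]), default=0)
    return bounds_ms[node] + tail

print(end_to_end_bound("A"))  # 38, via the critical path A -> B -> D -> F
```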

SLIDE 19


Deriving Response-Time Bounds for a DAG

[Figure: an example six-node DAG with nodes A-F.]

SLIDE 20


Deriving Response-Time Bounds for a DAG

[Figure: the example six-node DAG with nodes A-F.]

SLIDE 21

Deriving Response-Time Bounds for a DAG

[Figure: the example DAG with nodes A, C, and D scheduled on the CPU and nodes B, E, and F scheduled on the GPU.]

Need a response-time bound analysis for GPU tasks.

SLIDE 22

A System Model of GPU Tasks

τi = (Ci, Ti, Bi, Hi), where
  • Ci: per-block worst-case workload
  • Ti: period
  • Bi: number of blocks
  • Hi: number of threads per block (or block size)

Example: τ1 = (3, 6, 2, 1024), i.e., C1 = 3, T1 = 6, B1 = 2, and H1 = 1024.

[Figure: a schedule of τ1’s blocks on two SMs (SM0 and SM1), each SM providing 2048 threads.]
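The tuple model can be written down directly. In this Python sketch the parameter names follow the slide; the 2048-threads-per-SM capacity and the concrete values of τ1 are taken from the slide’s figure as best they can be recovered, so treat them as illustrative:

```python
from typing import NamedTuple

class GpuTask(NamedTuple):
    """GPU task model from the slide: tau_i = (C_i, T_i, B_i, H_i)."""
    C: int  # per-block worst-case workload
    T: int  # period
    B: int  # number of blocks
    H: int  # number of threads per block (block size)

SM_THREADS = 2048  # per-SM thread capacity assumed in the slide's figure

def blocks_per_sm(task: GpuTask) -> int:
    """How many of this task's blocks fit concurrently on one SM,
    considering only the thread-capacity constraint."""
    return SM_THREADS // task.H

tau1 = GpuTask(C=3, T=6, B=2, H=1024)
print(blocks_per_sm(tau1))  # 2: two 1024-thread blocks fill one 2048-thread SM
```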

SLIDE 23

Response-Time Bounds Proof Sketch

23

  • 1. We first show the necessity of a total utilization bound and intra-task parallelism via counterexamples.

SLIDE 24

Response-Time Bounds Proof Sketch


  • 1. We first show the necessity of a total utilization bound and intra-task parallelism via counterexamples.

[Figure: the same job releases 1-5 on a timeline, scheduled without and with intra-task parallelism.]

SLIDE 25

Response-Time Bounds Proof Sketch

25

  • 1. We first show the necessity of a total utilization bound and intra-task parallelism via counterexamples.
  • 2. We then bound the unfinished workload from jobs released before rk,j.
  • 3. We prove the job τk,j finishes before rk,j + Rk.

[Figure: a schedule on SM0 and SM1 showing job τk,j released at time rk,j and completing within the response-time bound Rk.]

SLIDE 26


1. Coarse-grained vs. fine-grained
2. Response-time bounds analysis
3. Case study

SLIDE 27

Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling


  • Application: Histogram of Oriented Gradients (HOG)

[Figure: the HOG graph (Resize Image → Compute Gradients → Compute Orientation Histograms → Normalize Orientation Histograms), built from vxHOGCells and vxHOGFeatures nodes. Coarse-grained: CPU+GPU execution; fine-grained: GPU execution.]

SLIDE 28
  • Application: Histogram of Oriented Gradients (HOG)
  • 6 instances
  • 33 ms period
  • 30,000 samples
  • Platform: NVIDIA Titan V GPU + two eight-core Intel CPUs
  • Schedulers: G-EDF, G-FL (fair-lateness)

Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling

SLIDE 29


[Figure: CDFs of response times (x-axis: time; y-axis: % of samples; left is better). 50% of samples have a response time under 60 ms.]

Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling
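The “50% of samples under 60 ms” claim is just one point read off the CDF. A Python sketch with hypothetical sample values (the study itself used 30,000 measured samples):

```python
# Sketch: reading one point off a response-time CDF. The sample values
# here are hypothetical; the case study collected 30,000 real samples.
samples_ms = [40, 55, 58, 62, 70, 90, 48, 59, 61, 75]

def fraction_below(samples, threshold_ms):
    """Fraction of samples with response time strictly below the threshold,
    i.e. the empirical CDF evaluated just below threshold_ms."""
    return sum(1 for s in samples if s < threshold_ms) / len(samples)

print(fraction_below(samples_ms, 60))  # 0.5 -> 50% of these samples are under 60 ms
```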

SLIDE 30


FL: fair-lateness

                              [1] Fine-grained (G-FL)   [2] Coarse-grained (G-EDF)   [3] Monolithic (G-EDF)
Average Response Time (ms)    65.99                     136.57                       84669.47
Maximum Response Time (ms)    125.66                    427.07                       170091.06

Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling

SLIDE 31


Fine-grained [1] achieves half the average response time of coarse-grained [2].

Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling

SLIDE 32


Fine-grained [1] achieves half the average response time and one-third the maximum response time of coarse-grained [2].

Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling

SLIDE 33


Monolithic [3] fares far worse: its average and maximum response times are orders of magnitude larger than those of [1] and [2].

Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling

SLIDE 34


Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling

SLIDE 35


Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling

SLIDE 36


Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling

SLIDE 37


                              [1] Fine-grained (G-FL)   [2] Coarse-grained (G-EDF)   [3] Monolithic (G-EDF)
Average Response Time (ms)    65.99                     136.57                       84669.47
Maximum Response Time (ms)    125.66                    427.07                       170091.06
Analytical Bound (ms)         542.39                    N/A                          N/A

Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling

SLIDE 38


An alert driver takes 700 ms to react.

Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling

SLIDE 39


  • The fair-lateness-based scheduler (G-FL) is beneficial: it reduced node response times by up to 9.9%.
  • The overhead of supporting fine-grained scheduling was 14.15%.

Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling

SLIDE 40

Conclusions

  • 1. Fine-grained scheduling
  • 2. Response-time bounds analysis for GPU tasks
  • 3. Case study

SLIDE 41

Future Work

  • 1. Cycles in the graph
  • 2. Other resource constraints
  • 3. Schedulability studies


SLIDE 42

Thanks!