Making OpenVX Really “Real Time”
Ming Yang1, Tanya Amert1, Kecheng Yang1,2, Nathan Otterness1, James H. Anderson1, F. Donelson Smith1, and Shige Wang3
1The University of North Carolina at Chapel Hill 2Texas State University 3General Motors Research
Source: https://www.khronos.org/openvx/
[Figure: example OpenVX graph: native camera control → OpenVX nodes → downstream application processing.]
Graph-based architecture
[Figure: applications deployed on GPU, FPGA, and DSP back ends.]
Portability to diverse hardware
Does OpenVX really target “real-time” processing?
[Figure: an OpenVX graph with four nodes, A, B, C, and D.]
Monolithic scheduling
[Figure: timeline in which the entire graph is scheduled as a single unit. One invocation executes A, B, C, and D in turn before the next invocation begins.]
Coarse-grained scheduling
[Figure: each node becomes its own task (Task A, Task B, Task C, Task D). On the timeline, nodes from successive graph invocations can execute concurrently.]
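The following minimal Python sketch makes the monolithic versus coarse-grained contrast concrete. It rests on simplifying assumptions of our own: a chain A → B → C → D with hypothetical fixed execution times, periodic invocations, and, in the coarse-grained case, one dedicated processor per node task, so tasks never interfere with each other. It is not the scheduler used in this work. It only shows how pipelining can keep response times bounded when the whole graph takes longer than a period but no single node does.

```python
"""Sketch: monolithic vs. coarse-grained (pipelined) execution of a chain graph.

Assumptions (hypothetical, for illustration only):
- the graph is a chain A -> B -> C -> D with fixed per-node execution times;
- the graph is invoked periodically every `period` time units;
- under coarse-grained scheduling each node is its own task with a dedicated
  processor, so nodes of different invocations can overlap (pipelining).
"""
from typing import List


def monolithic(exec_times: List[float], period: float, n_jobs: int) -> List[float]:
    """One task runs all nodes back-to-back; invocation j waits for j-1."""
    total = sum(exec_times)
    finish_prev = 0.0
    responses = []
    for j in range(n_jobs):
        release = j * period
        start = max(release, finish_prev)
        finish_prev = start + total
        responses.append(finish_prev - release)
    return responses


def coarse_grained(exec_times: List[float], period: float, n_jobs: int) -> List[float]:
    """Each node is a separate task. Node k of invocation j starts once node k-1
    of invocation j and node k of invocation j-1 have both finished."""
    last_finish = [0.0] * len(exec_times)  # per-node finish time of the previous invocation
    responses = []
    for j in range(n_jobs):
        release = j * period
        ready = release
        for k, c in enumerate(exec_times):
            start = max(ready, last_finish[k])
            last_finish[k] = start + c
            ready = last_finish[k]
        responses.append(ready - release)
    return responses


if __name__ == "__main__":
    times = [2.0, 3.0, 2.0, 1.0]  # hypothetical execution times of A, B, C, D
    print("monolithic:", monolithic(times, period=5.0, n_jobs=5))
    print("coarse-grained:", coarse_grained(times, period=5.0, n_jobs=5))
```

With these numbers the whole graph needs 8 time units per invocation but is released every 5. Monolithic response times therefore grow with every invocation. The coarse-grained pipeline stays at 8 because the longest node (3 units) fits within the period.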
Coarse-grained scheduling. Remaining problems:
causes capacity loss.
This Work
1. Coarse-grained vs. fine-grained
2. Response-time bounds analysis
3. Case study
Coarse-Grained Scheduling
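[Figure: coarse-grained timeline for Task A, Task B, Task C, and Task D. A task is suspended while its GPU work executes.]
Fine-Grained Scheduling
[Figure: fine-grained timeline for Task A, Task E, Task C, Task D, Task F, and Task G. GPU execution (E, F, G) is scheduled as separate tasks alongside A, C, and D.]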
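Below is a minimal sketch of the fine-grained transformation. It uses a hypothetical node representation, a list of CPU/GPU segments, which is not part of the OpenVX API. Every segment becomes its own task. Each original edge is rewired to connect the last segment of the producer to the first segment of the consumer. The generated task names (e.g., `B.1`) are illustrative only; the slides label the resulting tasks differently (E, F, G).

```python
"""Sketch: splitting coarse-grained nodes into fine-grained CPU/GPU tasks.

Hypothetical representation (not an OpenVX API): each graph node is a
non-empty list of (resource, execution_time) segments, e.g. CPU work that
launches a GPU kernel followed by CPU post-processing.
"""
from typing import Dict, List, Tuple

Segment = Tuple[str, float]  # ("cpu" or "gpu", execution time)


def split_fine_grained(nodes: Dict[str, List[Segment]],
                       edges: List[Tuple[str, str]]):
    """Return (tasks, task_edges); tasks maps task name -> (resource, time)."""
    tasks: Dict[str, Segment] = {}
    task_edges: List[Tuple[str, str]] = []
    first: Dict[str, str] = {}
    last: Dict[str, str] = {}
    for name, segments in nodes.items():
        prev = None
        for i, seg in enumerate(segments):
            task = f"{name}.{i}" if len(segments) > 1 else name
            tasks[task] = seg
            if prev is None:
                first[name] = task
            else:
                task_edges.append((prev, task))  # segments of one node run in order
            prev = task
        last[name] = prev
    for u, v in edges:
        # An original edge u -> v now links u's last segment to v's first segment.
        task_edges.append((last[u], first[v]))
    return tasks, task_edges


if __name__ == "__main__":
    nodes = {
        "A": [("cpu", 1.0)],
        "B": [("cpu", 0.5), ("gpu", 4.0), ("cpu", 0.5)],  # hypothetical GPU-using node
        "C": [("cpu", 2.0)],
    }
    tasks, task_edges = split_fine_grained(nodes, [("A", "B"), ("B", "C")])
    print({t: s for t, s in tasks.items() if s[0] == "gpu"})  # GPU tasks only
    print(task_edges)
```

With this split, GPU work is no longer hidden inside a CPU task as a suspension. It becomes a task that a scheduler can reason about directly.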
* C. Liu and J. Anderson, “Supporting Soft Real-Time DAG-based Systems on Multiprocessors with No Utilization Loss,” in RTSS, 2013.
[Figure: DAG with nodes A, B, C, D, E, and F, with some nodes executing on the CPU and others on the GPU.]
Need a response-time bound analysis for GPU tasks
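Once each CPU and GPU task has its own response-time bound, a DAG-level bound can be composed from them. The sketch below assumes each task is released only after the bounds of all its predecessors have elapsed. Under that assumption, the end-to-end bound is the longest source-to-sink path weighted by per-task bounds. This is a simplification in the spirit of the cited DAG work, not its exact analysis. The task names and bound values are hypothetical.

```python
"""Sketch: composing per-task response-time bounds into an end-to-end bound.

Assumption: a task's job is released once every predecessor's job is
guaranteed complete (i.e., after the predecessors' bounds), so the
end-to-end bound is a longest path weighted by per-task bounds.
"""
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, Iterable, Set, Tuple


def end_to_end_bound(bounds: Dict[str, float],
                     edges: Iterable[Tuple[str, str]]) -> float:
    """bounds maps each task to its response-time bound; every task named in
    edges must appear in bounds."""
    preds: Dict[str, Set[str]] = {v: set() for v in bounds}
    for u, v in edges:
        preds[v].add(u)
    finish: Dict[str, float] = {}
    for v in TopologicalSorter(preds).static_order():  # predecessors come first
        offset = max((finish[u] for u in preds[v]), default=0.0)
        finish[v] = offset + bounds[v]
    return max(finish.values())


if __name__ == "__main__":
    bounds = {"A": 2.0, "B": 5.0, "C": 3.0, "D": 1.0}  # hypothetical per-task bounds
    edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
    print(end_to_end_bound(bounds, edges))  # longest path A -> B -> D
```

This composition is why per-task bounds are needed for every node, including the GPU tasks.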
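GPU task model: $\tau_i = (C_i, T_i, B_i, H_i)$
$C_i$: per-block worst-case workload
$T_i$: period
$B_i$: number of blocks
$H_i$: number of threads per block (block size)
Example: $\tau_1 = (3076, 6, 2, 1024)$, so $H_1 = 1024$.
[Figure: τ1's blocks on SM0 and SM1 (each SM labeled 2048), with C1, B1, H1, and T1 annotated on a time axis marked at 3 and 6.]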
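The task tuple maps directly onto a small data type. The helper methods are illustrative assumptions of our own, not definitions from the slides. `blocks_per_sm` considers only the thread limit and ignores registers and shared memory. `thread_utilization` is one possible normalization of thread-time demand. The default of 2048 threads per SM follows the figure's labels, and the example parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPUTask:
    """GPU task tau_i = (C_i, T_i, B_i, H_i)."""
    C: float  # per-block worst-case workload
    T: float  # period
    B: int    # number of blocks per job
    H: int    # threads per block (block size)

    def blocks_per_sm(self, sm_thread_capacity: int = 2048) -> int:
        """Blocks of this task that can be resident on one SM (thread limit only)."""
        return sm_thread_capacity // self.H

    def thread_utilization(self, num_sms: int, sm_thread_capacity: int = 2048) -> float:
        """Thread-time demanded per period divided by the GPU's total thread
        capacity (an illustrative metric, not one defined in the slides)."""
        demand = self.B * self.H * self.C
        return demand / (self.T * num_sms * sm_thread_capacity)


if __name__ == "__main__":
    task = GPUTask(C=2.0, T=8.0, B=4, H=512)  # hypothetical parameters
    print(task.blocks_per_sm())
    print(task.thread_utilization(num_sms=2))
```

For the hypothetical task above, four 512-thread blocks fit on one 2048-thread SM. Its demand of 4 × 512 × 2 thread-time units per period of 8 fills one eighth of two such SMs.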
and intra-task parallelism via counterexamples.
[Figure: releases of jobs 1 to 5, scheduled without intra-task parallelism and with intra-task parallelism.]
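The minimal simulation sketch below shows how disallowing overlap between jobs of the same task can affect response times. Its simplifications are our own: a single task, each SM runs one block at a time, every block takes exactly C time units, and blocks are dispatched FIFO to the SM that frees up first. Without intra-task parallelism, a job's blocks may not start until the previous job of the task has completed. This is not the analysis from the slides.

```python
"""Sketch: one GPU task with and without intra-task parallelism.

Simplifications (assumptions for illustration): each SM executes one block
at a time, every block runs for exactly C time units, and blocks are
dispatched FIFO to whichever SM becomes idle first.
"""
import heapq
from typing import List


def simulate(C: float, T: float, B: int, num_sms: int, n_jobs: int,
             intra_task_parallel: bool) -> List[float]:
    """Return the response time of each of n_jobs periodic jobs."""
    sm_free = [0.0] * num_sms  # min-heap: time at which each SM becomes idle
    heapq.heapify(sm_free)
    prev_job_finish = 0.0
    responses = []
    for j in range(n_jobs):
        release = j * T
        # Without intra-task parallelism, job j waits for job j-1 to complete.
        earliest = release if intra_task_parallel else max(release, prev_job_finish)
        job_finish = 0.0
        for _ in range(B):
            free_at = heapq.heappop(sm_free)
            end = max(free_at, earliest) + C
            heapq.heappush(sm_free, end)
            job_finish = max(job_finish, end)
        prev_job_finish = job_finish
        responses.append(job_finish - release)
    return responses


if __name__ == "__main__":
    # Hypothetical task: one 3-unit block every 2 units, on 2 SMs.
    print("with:   ", simulate(3.0, 2.0, 1, 2, 5, intra_task_parallel=True))
    print("without:", simulate(3.0, 2.0, 1, 2, 5, intra_task_parallel=False))
```

In this example, allowing overlap keeps every response time at 3. Forbidding it serializes the jobs onto effectively one SM, and response times grow by 1 with each job.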
[Figure: schedule on SM0 and SM1 for a job of $\tau_k$ released at $r_{k,j}$, with response-time bound $R_k$.]
Proof sketch: using the unfinished workload from jobs released at $r_{k,j}$, show that the job finishes before $r_{k,j} + R_k$.
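The property being proved can also be stated as a check on a schedule trace: every job of $\tau_k$ released at $r_{k,j}$ must finish by $r_{k,j} + R_k$. The sketch below performs that check, which can be handy for comparing measured traces against a computed bound. The trace values are hypothetical. A trace check can only expose violations; it cannot prove a bound.

```python
from typing import List, Sequence


def response_times(releases: Sequence[float], finishes: Sequence[float]) -> List[float]:
    """Response time of job j: finish time minus release time r_{k,j}."""
    return [f - r for r, f in zip(releases, finishes)]


def bound_violations(releases: Sequence[float], finishes: Sequence[float],
                     R: float) -> List[int]:
    """Indices of jobs that do not finish by r_{k,j} + R_k."""
    return [j for j, (r, f) in enumerate(zip(releases, finishes)) if f > r + R]


if __name__ == "__main__":
    releases = [0.0, 6.0, 12.0]  # hypothetical trace of one task
    finishes = [4.0, 11.0, 19.0]
    print(response_times(releases, finishes))
    print(bound_violations(releases, finishes, R=6.0))
```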
Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling
[Figure: HOG processing graph in which each branch performs Resize Image → Compute Gradients → Compute Orientation Histograms → Normalize Orientation Histograms. Coarse-grained version: CPU+GPU execution using vxHOGCellsNode and vxHOGFeaturesNode nodes. Fine-grained version: GPU execution of the individual steps.]
CPUs.
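In the fine-grained version, each processing step is scheduled as its own task. The sketch below shows one way such a graph's dependency structure could be assembled as independent branches. It assumes one branch per image scale; the number of scales and the node names are illustrative. It builds only the DAG and creates no OpenVX objects.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, Set

# Processing steps of one HOG branch, in order (from the case-study figure).
STEPS = [
    "Resize Image",
    "Compute Gradients",
    "Compute Orientation Histograms",
    "Normalize Orientation Histograms",
]


def hog_dag(num_scales: int) -> Dict[str, Set[str]]:
    """Return {node: set of predecessors} for num_scales independent branches."""
    graph: Dict[str, Set[str]] = {}
    for s in range(num_scales):
        prev = None
        for step in STEPS:
            node = f"{step} [scale {s}]"  # hypothetical naming scheme
            graph[node] = {prev} if prev else set()
            prev = node
    return graph


if __name__ == "__main__":
    for node in TopologicalSorter(hog_dag(3)).static_order():
        print(node)
```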
[Figure: cumulative distribution of response times (x-axis: time; y-axis: % of samples); left is better.]
50% of samples have a response time less than 60 ms.
FL: fair-lateness (G-FL: global fair-lateness; G-EDF: global earliest-deadline-first)

| Scheduling | Average response time (ms) | Maximum response time (ms) |
|---|---|---|
| [1] Fine-grained (G-FL) | 65.99 | 125.66 |
| [2] Coarse-grained (G-EDF) | 136.57 | 427.07 |
| [3] Monolithic (G-EDF) | 84669.47 | 170091.06 |
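G-FL differs from G-EDF only in where each job's priority point is placed. The sketch below uses the usual G-FL definition from the fair-lateness literature, which places the relative priority point at $D_i - \frac{m-1}{m} C_i$ on $m$ processors. It is applied here to two hypothetical jobs; neither the formula nor the jobs are taken from the slides.

```python
from fractions import Fraction


def priority_point(release, rel_deadline, wcet, m, policy):
    """Absolute priority point of a job (earlier means higher priority).

    G-EDF uses the absolute deadline. G-FL moves it earlier by
    (m - 1) / m times the job's worst-case execution time, where m is
    the number of processors.
    """
    if policy == "G-EDF":
        return release + rel_deadline
    if policy == "G-FL":
        return release + rel_deadline - Fraction(m - 1, m) * wcet
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    m = 4
    long_job = dict(release=0, rel_deadline=10, wcet=8)  # hypothetical jobs
    short_job = dict(release=0, rel_deadline=9, wcet=1)
    for policy in ("G-EDF", "G-FL"):
        p_long = priority_point(m=m, policy=policy, **long_job)
        p_short = priority_point(m=m, policy=policy, **short_job)
        print(policy, p_long, p_short)
```

Under G-EDF the short job has the earlier priority point (9 vs. 10). Under G-FL the long job does (4 vs. 33/4), because jobs with larger execution times are pulled forward.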
Fine-grained [1] vs. coarse-grained [2]: half the average response time and one-third the maximum response time.
Analytical response-time bound (ms): [1] fine-grained (G-FL) 542.39; [2] coarse-grained (G-EDF) N/A; [3] monolithic (G-EDF) N/A.
An alert driver takes 700 ms to react.
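The case-study comparisons can be checked against the table's own numbers. The short calculation below reuses only values stated in the case study. The exact ratios, about 0.48 and 0.29, are what the slides summarize as "half" and "one-third".

```python
# Values from the case-study table (ms).
AVG = {"fine": 65.99, "coarse": 136.57, "monolithic": 84669.47}
MAX = {"fine": 125.66, "coarse": 427.07, "monolithic": 170091.06}
FINE_GRAINED_BOUND = 542.39  # analytical bound for fine-grained (G-FL)
DRIVER_REACTION = 700.0      # reaction time of an alert driver

avg_ratio = AVG["fine"] / AVG["coarse"]  # fine-grained vs. coarse-grained, average
max_ratio = MAX["fine"] / MAX["coarse"]  # fine-grained vs. coarse-grained, maximum
margin = DRIVER_REACTION - FINE_GRAINED_BOUND

if __name__ == "__main__":
    print(f"average ratio: {avg_ratio:.2f}")
    print(f"maximum ratio: {max_ratio:.2f}")
    print(f"bound below driver reaction time: {FINE_GRAINED_BOUND < DRIVER_REACTION}")
    print(f"margin: {margin:.2f} ms")
```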
beneficial as it reduced node response times by up to 9.9%.
scheduling was 14.15%.
GPU tasks