
Response Time Analysis of Synchronous Data Flow Programs on a Many-Core Processor. Hamza Rihani, Matthieu Moy, Claire Maiza, Robert I. Davis, Sebastian Altmeyer. RTNS 2016, October 19, 2016.


  1-2. Application Model
  (Figure: a directed acyclic task graph with tasks τ1-τ6 mapped on nodes NA-NF, inputs i1, i2 and output o; and a timeline showing a release date, a response time and the interference over 0-160.)
  ◦ Directed acyclic task graph
  ◦ Mono-rate (or at least harmonic rates)
  ◦ Fixed mapping and execution order
  Each task τ_i has a processor demand, a memory demand, a release date rel_i and a response time R_i.
  ⇒ Find R_i (including the interference) and find rel_i respecting the precedence constraints.

  3. Outline: 1 Motivation and Context; 2 Models Definition (Architecture Model, Execution Model, Application Model); 3 Multicore Response Time Analysis of SDF Programs; 4 Evaluation; 5 Conclusion and Future Work

  4-11. Response Time Analysis
  R = PD + I^BUS(R) + I^PROC(R) + I^DRAM(R)
  ◦ R: response time
  ◦ PD: processor demand
  ◦ I^BUS: bus interference (given a model of the bus arbiter)
  ◦ I^PROC: interference from preempting tasks (no preemption here, so I^PROC = 0)
  ◦ I^DRAM: interference from DRAM refreshes (out of scope, so I^DRAM = 0)
  ◦ The formula is recursive ⇒ fixed-point algorithm.
  ◦ Multiple shared resources (memory banks): I^BUS(R) = Σ_{b ∈ B} I_b^BUS(R), where B is the set of memory banks.
  ⇒ Requires a model of the bus arbiter.
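  To make the fixed-point computation concrete, here is a minimal Python sketch. The function names and the toy interference model are illustrative assumptions, not part of the talk; a real analysis would plug in the per-bank arbiter model of the next slides.

```python
def response_time(pd, i_bus, max_iter=1000):
    """Fixed-point iteration R_{l+1} = PD + I_BUS(R_l).
    I_PROC and I_DRAM are 0 here (no preemption, DRAM refreshes out of scope)."""
    r = pd
    for _ in range(max_iter):
        r_next = pd + i_bus(r)
        if r_next == r:        # fixed point reached
            return r
        r = r_next
    raise RuntimeError("no fixed point within max_iter iterations")

# Toy interference model (illustrative only): the task's own accesses plus at
# most one interfering access per competing core, 10 cycles per bus access.
# It does not depend on r, so the iteration converges immediately.
def toy_i_bus(r, own_accesses=3, competing_cores=1, delay=10):
    return (own_accesses + competing_cores) * delay

print(response_time(pd=10, i_bus=toy_i_bus))   # prints 50
```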

  12-19. Model of the MPPA Bus
  (Figure: the multi-level arbiter in front of each shared memory bank, with round-robin and fixed-priority stages. Lv1: the core of the task of interest, P0. Lv2: round-robin 16-to-1 among the cores P0-P15, group G1. Lv3: round-robin with group G2, including the RM and DSU. Lv4: fixed priority, where group G3 (NoC Rx) is served with high priority.)
  ◦ I_b^BUS: delay from all of the task's own accesses plus the concurrent ones.
  ◦ S_i^b: number of accesses of task τ_i to bank b, i.e. the memory demand to bank b.
  ◦ A_i^{y,b}: number of concurrent accesses from core y to bank b, i.e. the accesses that overlap with the task of interest. A_i^{y,b} depends on rel_i and R_i.
  Level-by-level bound:
  Lv1 = S_i^b
  Lv2 = Lv1 + Σ_{y=1}^{15} min(A_i^{y,b}, Lv1)
  Lv3 = Lv2 + min(A_i^{G2,b}, Lv2)
  Lv4 = Lv3 + A_i^{G3,b}
  I_b^BUS = Lv4 × BusDelay
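  The level-by-level bound translates directly into code. The sketch below follows the Lv1-Lv4 equations above; the 10-cycle bus delay matches the assumption stated later in the talk, while the access counts in the example call and the function names are ours, for illustration only.

```python
def bus_interference(s_ib, a_cores, a_g2, a_g3, bus_delay=10):
    """Per-bank bound I_b^BUS following the four arbitration levels.

    s_ib    -- S_i^b: accesses of the task of interest to bank b
    a_cores -- A_i^{y,b} for the 15 competing cores (round-robin 16-to-1, Lv2)
    a_g2    -- A_i^{G2,b}: concurrent accesses from group G2 (Lv3)
    a_g3    -- A_i^{G3,b}: accesses from G3, served with high priority (Lv4)
    """
    lv1 = s_ib
    lv2 = lv1 + sum(min(a, lv1) for a in a_cores)   # round-robin among cores
    lv3 = lv2 + min(a_g2, lv2)                      # round-robin with G2
    lv4 = lv3 + a_g3                                # G3 has fixed high priority
    return lv4 * bus_delay

# Total bus interference sums the per-bank bounds over the banks B.
def total_bus_interference(per_bank_args):
    return sum(bus_interference(**kw) for kw in per_bank_args)

# Illustrative numbers (not from the slides):
print(bus_interference(s_ib=22, a_cores=[3] * 15, a_g2=5, a_g3=2))  # 740
```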

  20-26. Response Time Analysis with Dependencies
  (Figure: tasks τ0, τ1, τ2 on PE0, τ3 on PE1, and τ4, τ5 on PE2, scheduled along a timeline.)
  1. Start with initial release dates rel_i^0.
  2. WCRT analysis: for all i, compute R_i^{l+1} ← PD_i + I^BUS(R_i^l, rel_i) and iterate while R_i^{l+1} ≠ R_i^l, until a fixed point is reached.
  3. Update the release dates: for all i, rel_i ← latest finish time of all the dependencies of τ_i.
  4. Repeat from step 2 until no release date changes (another fixed-point iteration).
  Return (rel_i, R_i).
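  A compact Python sketch of this double fixed point, assuming placeholder names (i_bus, preds, initial_rel) that are not from the slides; the inner loop is the WCRT fixed point and the outer loop updates the release dates until they stabilize, which the convergence argument on the next slides justifies.

```python
def analyse(tasks, pd, preds, i_bus, initial_rel):
    """Double fixed point: inner loop = response times, outer loop = release dates.

    pd[i]            -- processor demand of task i
    preds[i]         -- dependencies (predecessors) of task i
    i_bus(i, R, rel) -- bus interference of task i given the current
                        response-time and release-date estimates
    """
    rel = dict(initial_rel)
    R = {i: pd[i] for i in tasks}
    while True:
        # Inner fixed point: R_i^{l+1} = PD_i + I_BUS(R_i^l, rel_i) for all i.
        while True:
            R_next = {i: pd[i] + i_bus(i, R, rel) for i in tasks}
            if R_next == R:
                break
            R = R_next
        # Outer update: release date = latest finish time of the dependencies.
        new_rel = {i: max((rel[j] + R[j] for j in preds[i]), default=rel[i])
                   for i in tasks}
        if new_rel == rel:       # no release date changed: second fixed point
            return rel, R
        rel = new_rel
```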

  27-36. Convergence Toward a Fixed-point
  ◦ Convergence of the 1st fixed-point iteration (response times): monotonic and bounded ✓
  ◦ Convergence of the 2nd fixed-point iteration (release dates): no monotonicity, since R_i and rel_i may grow or shrink at each iteration.
  Theorem: at each iteration, at least one task finds its final release date. Hence the outer loop terminates after at most as many iterations as there are tasks.
  Full proof in our technical report: http://www-verimag.imag.fr/TR/TR-2016-1.pdf

  37. Outline (repeat of slide 3)

  38-41. Evaluation: ROSACE Case Study [Pagetti et al., RTAS 2014]
  (Figure: the ROSACE task graph, where sensor inputs h, az, vz, q, va at 200 Hz feed the filters h_filter, az_filter, vz_filter, q_filter, va_filter at 100 Hz, which feed the controllers altitude, vz_control and va_control at 50 Hz and produce the outputs δ_ec and δ_thc; and its schedule over one hyper-period: Rx (receive, 200 Hz), Tx (transmit, 50 Hz), P4: h_filter, vz_control, altitude; P3: az_filter; P2: vz_filter; P1: q_filter; P0: va_filter, va_control.)
  ◦ Flight management system controller.
  ◦ Receives from sensors and transmits to actuators.
  ◦ Assumptions: tasks are mapped on 5 cores; the Debug Support Unit is disabled; context switches are over-approximated by constants.

  42-45. Evaluation: ROSACE Case Study, Task Profiles
  Task       | Processor Demand (cycles) | Memory Demand (accesses)
  altitude   | 275 | 22
  az_filter  | 274 | 22
  h_filter   | 326 | 24
  va_control | 303 | 24
  va_filter  | 301 | 23
  vz_control | 320 | 25
  vz_filter  | 334 | 25
  Table: Task profiles of the FMS controller.
  ◦ Profiles obtained from measurements.
  ◦ Memory demand: data and instruction cache misses plus communications.
  ◦ Moreover, the NoC Rx writes 5 words and the NoC Tx reads 2 words.
  ⇒ Experiments: find the smallest schedulable hyper-period.

  46-50. Evaluation: Experiments
  (Bar chart: smallest schedulable hyper-period in processor cycles, for 1 bank vs 5 banks and for the MPPA and RR bus policies, under the analyses E5: Pessimistic, E4: 1-Phase (w/o release), E3: 2-Phase (w/o release), E2: 1-Phase, E1: 2-Phase.)
  ◦ Pessimistic assumption: high-priority tasks are bounded by 1 access per bank.
  ◦ E5: all accesses interfere.
  ◦ E4 and E3: the release dates are not used.
  ◦ E2 and E1: our approach, using the release dates.
  ◦ The 1-Phase and 2-Phase models differ in the memory access pattern; phases are modeled as sub-tasks.

  51-56. Evaluation: Experiments (results)
  Taking the memory banks into account improves the analysis by a factor in [1.77, 2.52].
  Speedup factors:
        | E5/E1 | E5/E2 | E3/E1 | E4/E2 | E2/E1 | E4/E3
  MPPA  | 4.15  | 4.12  | 1.68  | 1.29  | ~1.01 | 0.77
  RR    | 3.3   | 3.29  | 1.24  | 1.13  | ~1.01 | 0.91

  57. Outline (repeat of slide 3)

  58-63. Conclusion
  ◦ A response time analysis of SDF programs on the Kalray MPPA-256.
  ◦ Given: task profiles, the mapping of tasks, the execution order, and a model of the multi-level arbiter.
  ◦ We compute: tight response times taking the interference into account, and release dates respecting the dependency constraints (double fixed-point algorithm).
  ◦ Not restricted to SDF.

  64-72. Future Work
  (Figure: one compute cluster with NoC Rx, NoC Tx, 2 × 8 shared memory banks, RM, DSU and cores P0-P15.)
  ◦ Model of the Resource Manager: tighter estimation of context switches and other interrupts.
  ◦ Model of the NoC accesses: use the output of any NoC analysis.
  ◦ Memory access pipelining (current assumption: the bus delay is 10 cycles).
  ◦ Model blocking and non-blocking accesses: reads are blocking, writes are non-blocking.
  Questions?

  73. BACKUP

  74-76. Multicore Response Time Analysis, Backup Example [Altmeyer et al., RTNS 2015]
  Fixed-priority bus arbiter with PE1 > PE0 and a bus access delay of 10 cycles.
  (Figure: T1 on PE1 issues 2 accesses per activation; the task of interest T0 on PE0 issues 3 accesses; timeline 0-160.)
  ◦ Task of interest T0 on PE0, in isolation: R_0 = 10 + 3 × 10 = 40.
  ◦ Accounting for the 2 interfering accesses from PE1: R_0 = 10 + 3 × 10 + 2 × 10 = 60.
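  A tiny Python check of the arithmetic above, under the stated fixed-priority arbiter (PE1 has priority over PE0, 10 cycles per bus access); the helper name is ours.

```python
def rt_fixed_priority(pd, own_accesses, hp_accesses, delay=10):
    # Each bus access costs `delay`; higher-priority accesses add to the window.
    return pd + (own_accesses + hp_accesses) * delay

print(rt_fixed_priority(pd=10, own_accesses=3, hp_accesses=0))  # 40: in isolation
print(rt_fixed_priority(pd=10, own_accesses=3, hp_accesses=2))  # 60: with PE1's 2 accesses
```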
