
Response Time Analysis of Synchronous Data Flow Programs on a Many-Core Processor
Hamza Rihani, Matthieu Moy, Claire Maiza, Robert I. Davis, Sebastian Altmeyer
RTNS'16, October 19, 2016


SLIDE 1

[RTNS Artifact Evaluated badge: Consistent, Complete, Well documented, Easy to reuse]

Response Time Analysis of Synchronous Data Flow Programs on a Many-Core Processor

Hamza Rihani, Matthieu Moy, Claire Maiza, Robert I. Davis, Sebastian Altmeyer

RTNS'16, October 19, 2016

SLIDE 2

Execution of Synchronous Data Flow Programs

[Dataflow graph: nodes NA, NB, NC, ND, NE, NF on tasks τ1..τ6, with inputs i1 and i2]

  • High-level representation
  • Code generation: single-core static non-preemptive scheduling

    int main_app(i1, i2) {
        na = NA(i1);
        ne = NE(i2);
        nb = NB(na);
        nd = ND(na);
        nf = NF(ne);
        o  = NC(nb, nd, nf);
        return o;
    }


SLIDE 5

Execution of Synchronous Data Flow Programs

[Dataflow graph: nodes NA, NB, NC, ND, NE, NF on tasks τ1..τ6, with inputs i1 and i2]

  • High-level representation
  ✓ Respect the dependency constraints
  ✓ Set the release dates to get precise upper bounds on the interference
  • Code generation: multi/many-core static non-preemptive scheduling

    int NA (...) { /* task τ1 */ return (...); }
    int NB (...) { /* task τ2 */ return (...); }
    int NC (...) { /* task τ3 */ return (...); }
    int ND (...) { /* task τ4 */ return (...); }
    int NE (...) { /* task τ5 */ return (...); }
    int NF (...) { /* task τ6 */ return (...); }

[Gantt chart: tasks τ0..τ5 with their worst-case response times (wcrt0..wcrt5) scheduled across cores PE0..PE2]


SLIDE 8

Contributions

1. Precise accounting for interference on shared resources in a many-core processor

[Timeline: memory accesses of the task of interest on core P0 over 0..80 cycles]

2. Model of a multi-level arbiter to the shared memory

[Cluster diagram: cores P0..P15, RM, DSU, NoC Rx/Tx connected to 2 × 8 shared memory banks]

3. Response time and release date analysis respecting dependencies.

SLIDE 9

Outline

1. Motivation and Context
2. Models Definition
   • Architecture Model
   • Execution Model
   • Application Model
3. Multicore Response Time Analysis of SDF Programs
4. Evaluation
5. Conclusion and Future Work


SLIDE 11

Architecture Model

[Chip diagram: I/O Ethernet 0/1 and I/O DDR 0/1 clusters around the compute clusters]

  • Kalray MPPA-256 Bostan
  • 16 compute clusters + 4 I/O clusters
  • Dual NoC


SLIDE 14

Architecture Model

[Cluster diagram: cores P0..P15, RM, DSU, NoC Rx/Tx connected to 2 × 8 shared memory banks]

Per cluster:

  • 16 cores + 1 Resource Manager (RM)
  • NoC Tx, NoC Rx, Debug Unit (DSU)
  • 16 shared memory banks (total size: 2 MB)
  • Multi-level bus arbiter per memory bank

[Arbiter diagram: cores P0..P15 (group G1) arbitrated RR 16→1; Rx, Tx, DSU (group G2) arbitrated RR 3→1; the two merged RR 2→1; the RM (group G3) enters with fixed priority (FP, high priority) into the shared memory bank]

[Core diagram: data cache (D$) and instruction cache (I$) arbitrated RR 2→1]


SLIDE 20

Execution Model

[Dataflow graph: nodes NA, NB, NC, ND, NE, NF on tasks τ1..τ6, with inputs i1 and i2]

[Cluster diagram: each core P0, P1, P2 goes through its own arbiter to its memory bank b0, b1, b2 (128 KB each)]

  • Task mapping on cores
  • Static non-preemptive scheduling
  • Spatial isolation: different tasks go to different memory banks
  • Interference from communications
  • Execution model:
    • execute in a "local" bank
    • write to a "remote" bank

Single phase: execute and write data.
Two phases: execute, then write data.

[Memory access patterns of the single-phase and two-phase models]

SLIDE 26

Application Model

[Dataflow graph: nodes NA, NB, NC, ND, NE, NF on tasks τ1..τ6, with inputs i1 and i2]

  • Directed acyclic task graph
  • Mono-rate (or at least harmonic rates)
  • Fixed mapping and execution order

Each task τi has:

  • a Processor Demand and a Memory Demand
  • a release date (rel_i) and a response time (R_i)

[Timeline 0..160: τi is released at rel_i and completes within R_i; interference lengthens the response time]

  • Find R_i (including the interference)
  • Find rel_i respecting the precedence constraints



SLIDE 35

Response Time Analysis

R = PD + I^BUS(R) + I^PROC(R) + I^DRAM(R)

  • R: Response Time
  • PD: Processor Demand
  • I^BUS: Bus Interference (given a model of the bus arbiter)
  • I^PROC: Interference from preempting tasks (no preemption here, so I^PROC = 0)
  • I^DRAM: Interference from DRAM refreshes (out of scope, so I^DRAM = 0)
  • Recursive formula ⇒ fixed-point algorithm.
  • Multiple shared resources (memory banks):

        I^BUS(R) = Σ_{b ∈ B} I^BUS_b(R)

    where B is the set of memory banks.

Requires a model of the bus arbiter.

SLIDE 43

Model of the MPPA Bus

[Arbiter diagram: cores P0..P15 (group G1) arbitrated RR 16→1 at Lv1/Lv2; Rx, Tx, DSU (group G2) arbitrated RR 3→1, merged with G1 via RR 2→1 at Lv3; the RM (group G3) enters with fixed priority (FP, high priority) at Lv4, into the shared memory bank]

I^BUS_b: delay from all accesses, plus the concurrent ones.

  • S^b_i: number of accesses of task τi to bank b (S^b_i = Memory Demand to bank b)
  • A^{y,b}_i: number of concurrent accesses from core y to bank b (overlapping concurrent accesses)

    Lv1 = S^b_i
    Lv2 = Lv1 + Σ_{y=1}^{15} min(A^{y,b}_i, Lv1)
    Lv3 = Lv2 + min(A^{G2,b}_i, Lv2)
    Lv4 = Lv3 + A^{G3,b}_i

    I^BUS_b = Lv4 × Bus Delay

  • A^{y,b}_i depends on rel_i and R_i.

SLIDE 50

Response Time Analysis with Dependencies

[Gantt chart: tasks τ0..τ5 scheduled on cores PE0..PE2]

1. Start with initial release dates rel⁰_i.
2. Compute response times (WCRT analysis):
       for all i do
           R^{l+1}_i ← PD_i + I^BUS(R^l_i, rel_i)
       end for
   Iterate until R^{l+1}_i = R^l_i: a fixed point is reached.
3. Update the release dates:
       for all i do
           rel_i ← latest finish time of all the dependencies
       end for
4. Repeat with the new rel_i until no release date changes (another fixed-point iteration).

Return: (rel_i, R_i).


SLIDE 55

Convergence Toward a Fixed-point

[Gantt chart: tasks τ0..τ5 on PE0..PE2; each task eventually reaches its final release date]

  • Convergence of the 1st fixed-point iteration (on R_i): monotonic and bounded ✓
  • Convergence of the 2nd fixed-point iteration (on rel_i): no monotonicity; R_i and rel_i may grow or shrink at each iteration. ?

Theorem
At each iteration, at least one task finds its final release date.

Full proof in our technical report:
http://www-verimag.imag.fr/TR/TR-2016-1.pdf




SLIDE 65

Evaluation: ROSACE Case Study¹

[ROSACE task graph: filters va_filter, q_filter, vz_filter, az_filter, h_filter (100 Hz); altitude, vz_control, va_control (50 Hz); sensor inputs va, q, vz, az, h (200 Hz); outputs δec, δthe]

[Mapping over a hyper-period: receive (200 Hz), the filters (100 Hz), the controllers (50 Hz) and transmit (50 Hz) on Rx, Tx and cores P0..P4]

  • Flight management system controller
  • Receives from sensors and transmits to actuators
  • Assumptions:
    • tasks are mapped on 5 cores
    • the Debug Support Unit is disabled
    • context switches are over-approximated by constants

¹ Pagetti et al., RTAS 2014


SLIDE 69

Evaluation: ROSACE Case Study

Task         Processor Demand (cycles)   Memory Demand (accesses)
altitude               275                         22
az_filter              274                         22
h_filter               326                         24
va_control             303                         24
va_filter              301                         23
vz_control             320                         25
vz_filter              334                         25

Table: Task profiles of the FMS controller

  • Profiles obtained from measurements
  • Memory Demand: data and instruction cache misses + communications
  • Moreover:
    • NoC Rx: writes 5 words
    • NoC Tx: reads 2 words

Experiments: find the smallest schedulable hyper-period.


SLIDE 74

Evaluation: Experiments

[Bar chart: smallest schedulable hyper-period (processor cycles, 4000 to 16000) for 1 bank vs. 5 banks, under the MPPA and RR bus policies, for configurations E1..E5]

  • E5: Pessimistic
  • E4: 1-Phase (w/o release dates)
  • E3: 2-Phase (w/o release dates)
  • E2: 1-Phase
  • E1: 2-Phase

  • Pessimistic assumption: high-priority tasks are bounded by 1 access per bank
  • Phases are modeled as sub-tasks (2-Phase and 1-Phase memory access patterns)
  • E5: all accesses interfere
  • E4, E3: without the release dates
  • E2, E1: our approach, using the release dates

slide-75
SLIDE 75

Evaluation: Experiments

Taking into account the memory banks improves the analysis by a factor in [1.77, 2.52]

[Figure: smallest schedulable hyper-period (processor cycles) per bus policy (MPPA, RR), panels for 1 bank and 5 banks; legend: E1: 2-Phase, E2: 1-Phase, E3: 2-Phase (w/o release), E4: 1-Phase (w/o release), E5: Pessimistic]

[RTNS Artifact Evaluated badge: Consistent, Complete, Well documented, Easy to reuse]

Speedup factors:

       E5/E1  E5/E2  E3/E1  E4/E2  E2/E1  E4/E3
MPPA   4.15   4.12   1.68   1.29   ∼1.01  0.77
RR     3.3    3.29   1.24   1.13   ∼1.01  0.91

18 / 21

slide-81
SLIDE 81

Outline

1 Motivation and Context
2 Models Definition
  Architecture Model / Execution Model / Application Model
3 Multicore Response Time Analysis of SDF Programs
4 Evaluation
5 Conclusion and Future Work

19 / 21

slide-82
SLIDE 82

Conclusion

  • A response time analysis of SDF on the Kalray MPPA 256

20 / 21

slide-83
SLIDE 83

Conclusion

  • A response time analysis of SDF on the Kalray MPPA 256
  • Given:
  • Task profile
  • Mapping of Tasks
  • Execution Order

20 / 21

slide-84
SLIDE 84

Conclusion

  • A response time analysis of SDF on the Kalray MPPA 256
  • Given:
  • Task profile
  • Mapping of Tasks
  • Execution Order
  • We compute:
  • Tight response times taking into account the interference.
  • Release dates respecting the dependency constraints.

20 / 21

slide-85
SLIDE 85

Conclusion

  • A response time analysis of SDF on the Kalray MPPA 256
  • Given:
  • Task profile
  • Mapping of Tasks
  • Execution Order
  • We compute:
  • Tight response times taking into account the interference.
  • Release dates respecting the dependency constraints.

model of the multi-level arbiter

20 / 21

slide-86
SLIDE 86

Conclusion

  • A response time analysis of SDF on the Kalray MPPA 256
  • Given:
  • Task profile
  • Mapping of Tasks
  • Execution Order
  • We compute:
  • Tight response times taking into account the interference.
  • Release dates respecting the dependency constraints.

model of the multi-level arbiter; double fixed-point algorithm

20 / 21

slide-87
SLIDE 87

Conclusion

  • A response time analysis of SDF on the Kalray MPPA 256
  • Given:
  • Task profile
  • Mapping of Tasks
  • Execution Order
  • We compute:
  • Tight response times taking into account the interference.
  • Release dates respecting the dependency constraints.
  • Not restricted to SDF

model of the multi-level arbiter; double fixed-point algorithm

20 / 21
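The double fixed-point idea mentioned above can be illustrated with a small sketch. This is not the paper's exact formulation: `preds` and `wcrt` are hypothetical names, and the WCRT computation is abstracted behind a callback. The inner step bounds each task's response time given the current release dates; the outer loop propagates release dates along the dependency edges until nothing changes.

```python
def double_fixed_point(tasks, preds, wcrt):
    """tasks: task ids; preds[t]: tasks whose completion releases t;
    wcrt(t, release): upper bound on t's response time given release dates."""
    release = {t: 0 for t in tasks}
    while True:
        # inner step: response-time bounds for the current release dates
        rt = {t: wcrt(t, release) for t in tasks}
        # outer step: a task is released when all its predecessors complete
        new_release = {
            t: max((release[p] + rt[p] for p in preds[t]), default=0)
            for t in tasks
        }
        if new_release == release:  # outer fixed point reached
            return release, rt
        release = new_release

# toy dependency chain A -> B -> C with a constant 10-cycle WCRT
rel, rt = double_fixed_point(
    ["A", "B", "C"],
    {"A": [], "B": ["A"], "C": ["B"]},
    lambda t, release: 10,
)
print(rel)  # {'A': 0, 'B': 10, 'C': 20}
```

In the real analysis the `wcrt` step is itself a fixed-point computation over the interference (hence "double"), and it uses the release dates to exclude tasks that cannot overlap.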

slide-88
SLIDE 88

Future Work

  • Model of the Resource Manager.

[Diagram: MPPA compute cluster — PE0–PE15, Resource Manager (RM), DSU, NoC Rx/Tx, 2×8 shared memory banks]

21 / 21

slide-89
SLIDE 89

Future Work

  • Model of the Resource Manager.

tighter estimation of context switches and other interrupts

21 / 21

slide-90
SLIDE 90

Future Work

  • Model of the Resource Manager.
  • Model of the NoC accesses.

tighter estimation of context switches and other interrupts

21 / 21

slide-91
SLIDE 91

Future Work

  • Model of the Resource Manager.
  • Model of the NoC accesses.

tighter estimation of context switches and other interrupts
use the output of any NoC analysis

21 / 21

slide-92
SLIDE 92

Future Work

  • Model of the Resource Manager.
  • Model of the NoC accesses.
  • Memory access pipelining.

tighter estimation of context switches and other interrupts
use the output of any NoC analysis

21 / 21

slide-93
SLIDE 93

Future Work

  • Model of the Resource Manager.
  • Model of the NoC accesses.
  • Memory access pipelining.

tighter estimation of context switches and other interrupts
use the output of any NoC analysis
current assumption: bus delay is 10 cycles

21 / 21

slide-94
SLIDE 94

Future Work

  • Model of the Resource Manager.
  • Model of the NoC accesses.
  • Memory access pipelining.
  • Model blocking and non-blocking accesses.

tighter estimation of context switches and other interrupts
use the output of any NoC analysis
current assumption: bus delay is 10 cycles

21 / 21

slide-95
SLIDE 95

Future Work

  • Model of the Resource Manager.
  • Model of the NoC accesses.
  • Memory access pipelining.
  • Model blocking and non-blocking accesses.

tighter estimation of context switches and other interrupts
use the output of any NoC analysis
current assumption: bus delay is 10 cycles
reads are blocking; writes are non-blocking

21 / 21

slide-96
SLIDE 96

Future Work

  • Model of the Resource Manager.
  • Model of the NoC accesses.
  • Memory access pipelining.
  • Model blocking and non-blocking accesses.

tighter estimation of context switches and other interrupts
use the output of any NoC analysis
current assumption: bus delay is 10 cycles
reads are blocking; writes are non-blocking

Questions?

21 / 21

slide-97
SLIDE 97

BACKUP

slide-98
SLIDE 98

Multicore Response Time Analysis

Example: fixed-priority bus arbiter, PE1 > PE0; bus access delay = 10

[Timeline: task T0 on PE0; instances of T1 on PE1, each issuing 2 accesses; t = 0, 40, 80, 120, 160]

¹Altmeyer et al., RTNS 2015

slide-99
SLIDE 99

Multicore Response Time Analysis

Example: fixed-priority bus arbiter, PE1 > PE0; bus access delay = 10

  • Task of interest running on PE0:

R0 = 10 + 3×10 = 40 (response time in isolation)

¹Altmeyer et al., RTNS 2015

slide-100
SLIDE 100

Multicore Response Time Analysis

Example: fixed-priority bus arbiter, PE1 > PE0; bus access delay = 10

  • Task of interest running on PE0:

R0 = 10 + 3×10 = 40 (response time in isolation)
R1 = 10 + 3×10 + 2×10 = 60

¹Altmeyer et al., RTNS 2015

slide-101
SLIDE 101

Multicore Response Time Analysis

Example: fixed-priority bus arbiter, PE1 > PE0; bus access delay = 10

  • Task of interest running on PE0:

R0 = 10 + 3×10 = 40 (response time in isolation)
R1 = 10 + 3×10 + 2×10 = 60
R2 = 10 + 3×10 + 2×10 + 2×10 = 80

¹Altmeyer et al., RTNS 2015

slide-102
SLIDE 102

Multicore Response Time Analysis

Example: fixed-priority bus arbiter, PE1 > PE0; bus access delay = 10

Response time

  • Task of interest running on PE0:

R0 = 10 + 3×10 = 40 (response time in isolation)
R1 = 10 + 3×10 + 2×10 = 60
R2 = 10 + 3×10 + 2×10 + 2×10 = 80
R3 = 10 + 3×10 + 2×10 + 2×10 + 0 = 80 (fixed point)

¹Altmeyer et al., RTNS 2015
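The iteration on this backup slide can be reproduced with a small fixed-point loop. This is a simplified sketch, not the full multi-level analysis: it assumes a single higher-priority core whose task issues a fixed number of accesses at a fixed period (the period 40 is read off the timeline), and charges every such access within the response-time window as interference.

```python
import math

def response_time(exec_time, own_accesses, bus_delay,
                  hp_accesses, hp_period, max_iter=100):
    """Fixed-point response time under a fixed-priority bus arbiter.
    The window R grows until it accounts for all higher-priority
    bus accesses released within it."""
    isolation = exec_time + own_accesses * bus_delay  # R0, no interference
    r = isolation
    for _ in range(max_iter):
        # higher-priority instances released in a window of length r
        interference = math.ceil(r / hp_period) * hp_accesses * bus_delay
        r_new = isolation + interference
        if r_new == r:
            return r  # fixed point reached
        r = r_new
    raise RuntimeError("response-time iteration did not converge")

# slide example: C = 10, 3 own accesses, bus delay 10;
# PE1 releases a task with 2 accesses every 40 cycles
print(response_time(10, 3, 10, 2, 40))  # 80
```

The iterates match the slide: 40 in isolation, then 60, then 80, where the window stabilizes.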

slide-103
SLIDE 103

The Global Picture

[Diagram: the global picture — high-level program; code generation (tasks, dependencies, mapping, execution order); binary generation (executable binary); local WCRT analysis via timing models (static analysis) or probabilistic models, yielding tasks' WCRT and worst-case access counts; WCRT with interferences; static mapping/scheduling (release dates)]