Symbolic Computation of Latency for Dataflow Graphs

SLIDE 1

Symbolic Computation of Latency for Dataflow Graphs

Adnan Bouakaz Pascal Fradet Alain Girault

SYNCHRON International Workshop, Bamberg

December 7th, 2016

SLIDE 2

Introduction

Outline

  1. Introduction: application model, scheduling policy, symbolic analysis
  2. Preliminary results
  3. Graph $A \xrightarrow{p\;q} B$
  4. Generalization to chains and acyclic graphs
  5. Experiments
  6. Conclusion

1 Bouakaz, Fradet and Girault Symbolic Computation of Latency

SLIDE 3

Introduction: Application model

Data-flow models of computation

Stream-processing applications are found in many embedded systems:
  • video codecs, software-defined radio, ...
  • computationally intensive
  • strict quality-of-service requirements
  • low energy consumption
  • increasingly, these applications run on many-core platforms

Data-flow models of computation are good at:
  • expressing task-level parallelism
  • achieving efficient implementation
  • guaranteeing performance at compile time:
      • throughput: stream-oriented applications
      • latency: automatic-control-oriented applications
      • buffer sizes: all embedded applications

SLIDE 4

Introduction: Application model

Acyclic Synchronous Data-Flow (SDF) graphs [Lee and Messerschmitt, Proc. 1987]

[Figure: SDF graph $A \xrightarrow{3\;2} B \xrightarrow{1\;3} C$ with execution times $t_A = 15$, $t_B = 8$, $t_C = 17$. Callouts: actor, edge, rate, execution time.]

SLIDE 5

Introduction: Application model

Acyclic Synchronous Data-Flow (SDF) graphs [Lee and Messerschmitt, Proc. 1987]

[Figure: SDF graph $A \xrightarrow{3\;2} B \xrightarrow{1\;3} C$.]

System of balance equations:
  $z_A \cdot 3 = z_B \cdot 2$
  $z_B \cdot 1 = z_C \cdot 3$

  • Consistent SDF graph G: this system has a non-null solution
  • Repetition vector of G: $z = [A^2, B^3, C^1]$
  • Iteration = firing sequence that returns G to its initial state

[Figure: one iteration $A^2\,B^3\,C^1$, annotated with the token counts 6 and 3.]
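The balance equations above can be solved mechanically. The following is a minimal sketch, assuming the graph is a chain whose edges are listed from source to sink; the function name `repetition_vector` and the edge tuple layout are illustrative choices, not part of the talk.

```python
from fractions import Fraction
from math import lcm


def repetition_vector(edges):
    """Smallest positive integer solution of the balance equations of a chain.

    edges: list of (producer, production_rate, consumer, consumption_rate),
    listed in chain order so that each producer is already known.
    """
    ratio = {edges[0][0]: Fraction(1)}
    for src, prod, dst, cons in edges:
        # Balance equation of the edge: z[src] * prod = z[dst] * cons
        ratio[dst] = ratio[src] * prod / cons
    # Scale the rational solution to the smallest integer one.
    scale = lcm(*(r.denominator for r in ratio.values()))
    return {actor: int(r * scale) for actor, r in ratio.items()}


print(repetition_vector([("A", 3, "B", 2), ("B", 1, "C", 3)]))
```

For the example graph this yields 2 firings of A, 3 of B and 1 of C, which agrees with the repetition vector given on the slide.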

SLIDE 6

Introduction: Scheduling policy

Scheduling policy

  • As Soon As Possible (ASAP) [Sriram and Bhattacharyya 2000]
  • No auto-concurrency
  • Modeling techniques

[Figure: graph $A \xrightarrow{3\;2} B$ with $t_A = 15$, $t_B = 8$ and $z = [2, 3]$, shown next to its modeled version; annotations: buffer size (8), auto-concurrency.]

SLIDE 7

Introduction: Scheduling policy

Scheduling policy

  • As Soon As Possible (ASAP) [Sriram and Bhattacharyya 2000]
  • No auto-concurrency
  • Modeling techniques

[Figure: graph $A \xrightarrow{3\;2} B$ with $t_A = 15$, $t_B = 8$ and $z = [2, 3]$, shown next to its modeled version; annotations: buffer size (8), auto-concurrency.]

[Figure: ASAP schedule of A and B with time marks 15, 23, 30, 38, 45, 46, 54, 60, 68, 75, 76, 84, 90, 98, 106, divided into a transient phase and a steady state.]
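The schedule above can be reproduced with a small event-driven simulation. This is a sketch under stated assumptions: an actor reserves buffer space when it starts, tokens become available when the producer finishes, and space is released when the consumer finishes. The function `asap_finish_times` and its parameters are hypothetical names introduced here.

```python
import heapq
from math import gcd


def asap_finish_times(p, q, t_a, t_b, iterations=1, buffer_size=None):
    """ASAP schedule of A --(p, q)--> B without auto-concurrency.

    Returns the finish times of every firing of A and of B.
    buffer_size=None models an unbounded channel (assumption).
    """
    g = gcd(p, q)
    todo = {"A": iterations * q // g, "B": iterations * p // g}
    tokens = 0
    space = float("inf") if buffer_size is None else buffer_size
    busy = {"A": False, "B": False}
    finish = {"A": [], "B": []}
    events, now = [], 0
    while True:
        # Start every actor that is idle and enabled at the current time.
        if not busy["A"] and todo["A"] > 0 and space >= p:
            space -= p
            todo["A"] -= 1
            busy["A"] = True
            heapq.heappush(events, (now + t_a, "A"))
        if not busy["B"] and todo["B"] > 0 and tokens >= q:
            tokens -= q
            todo["B"] -= 1
            busy["B"] = True
            heapq.heappush(events, (now + t_b, "B"))
        if not events:
            break
        # Advance to the next completion time and process all completions.
        now = events[0][0]
        while events and events[0][0] == now:
            _, actor = heapq.heappop(events)
            busy[actor] = False
            finish[actor].append(now)
            if actor == "A":
                tokens += p
            else:
                space += q
    return finish


result = asap_finish_times(3, 2, 15, 8, iterations=3, buffer_size=8)
print(sorted(result["A"] + result["B"]))
```

With the slide's parameters, the merged finish times are exactly the fifteen time marks of the schedule, which suggests the assumed token and space conventions match the figure.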

SLIDE 8

Introduction: Scheduling policy

Scheduling policy

Definition: Multi-iteration latency of graph G: $L_G(n)$ = the finish time of the nth iteration.

[Figure: schedule of A and B marking $L_G(1)$ and $L_G(2)$.]

SLIDE 9

Introduction: Scheduling policy

Scheduling policy

Definition: Input-output latency of graph G: $\ell_G(n)$ = the duration between the start and the end of the nth iteration.

[Figure: schedule of A and B marking $\ell_G(1)$ and $\ell_G(2)$.]

SLIDE 10

Introduction: Scheduling policy

Scheduling policy

Definition: Period of graph G: $P_G$ = the average length of an iteration $= \lim_{n \to \infty} \frac{L_G(n)}{n}$

Definition: Throughput of graph G: $T_G = \frac{1}{P_G}$

[Figure: schedule of A and B marking two consecutive periods $P_G$.]
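As a worked example of these definitions, take the $A \xrightarrow{3\;2} B$ graph of the scheduling slides ($t_A = 15$, $t_B = 8$, $z = [2, 3]$). Reading the time marks of its schedule under the assumption that the nth iteration ends when B completes its 3n-th firing gives the values below; the closed form is an extrapolation that assumes the steady state continues unchanged.

```latex
\begin{align*}
L_G(1) &= 46, \quad L_G(2) = 76, \quad L_G(3) = 106
  &&\Rightarrow\quad L_G(n) = 30\,n + 16 \\
P_G &= \lim_{n \to \infty} \frac{30\,n + 16}{n} = 30
  &&\text{(equal to } z_A t_A = 2 \cdot 15\text{)} \\
T_G &= \frac{1}{P_G} = \frac{1}{30}
\end{align*}
```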

SLIDE 11

Introduction: Symbolic analysis

Symbolic analysis

[Diagram, numerical path: parametric dataflow graph / partially specified SDF graph → instantiation → SDF graph → numerical analysis → SDF graph results. Annotation: NP-complete for HSDF.]

SLIDE 12

Introduction: Symbolic analysis

Symbolic analysis

[Diagram, numerical path: parametric dataflow graph / partially specified SDF graph → instantiation → SDF graph → numerical analysis → SDF graph results. Annotation: NP-complete for HSDF.]

[Diagram, symbolic path: symbolic analysis → symbolic formulas → symbolic evaluation / numerical evaluation.]

SLIDE 13

Preliminary results

Outline

  1. Introduction
  2. Preliminary results: duality theorem
  3. Graph $A \xrightarrow{p\;q} B$
  4. Generalization to chains and acyclic graphs
  5. Experiments
  6. Conclusion

SLIDE 14

Preliminary results: Duality theorem

Duality theorem

Definition: The dual $G^{-1}$ of an SDF graph G is obtained by reversing all edges of G.

Duality theorem: Let G be any live graph (cyclic or not) and $G^{-1}$ be its dual. Then $T_G = T_{G^{-1}}$ and $\forall i.\ L_G(i) = L_{G^{-1}}(i)$.

[Figure: graph of A and B with $t_A = 10$ and $t_B = 12$ (edge labels 2, 3, 2, 3 and 7). Schedule of G (A then B) with time marks 30, 42, 60, 72; schedule of $G^{-1}$ (B then A) with time marks 24, 42, 48, 72; $L_G(n) = L_{G^{-1}}(n)$.]

SLIDE 15

Preliminary results: Duality theorem

Duality theorem

Definition: The dual $G^{-1}$ of an SDF graph G is obtained by reversing all edges of G.

Duality theorem: Let G be any live graph (cyclic or not) and $G^{-1}$ be its dual. Then $T_G = T_{G^{-1}}$ and $\forall i.\ L_G(i) = L_{G^{-1}}(i)$.

Proof: Using the SDF-to-HSDF transformation plus unfolding.

[Figure: HSDF(G) and HSDF($G^{-1}$), each over the firings A1, A2, A3, B1, B2.]
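The theorem can be spot-checked numerically on a two-actor graph. The sketch below is not the proof; it assumes ASAP execution, no auto-concurrency, an unbounded channel and no initial tokens, and it uses the parameters of the earlier scheduling example ($p = 3$, $q = 2$, $t_A = 15$, $t_B = 8$) rather than those of the figure above. The function `chain_latency` is a hypothetical helper.

```python
from math import gcd


def chain_latency(p, q, t_src, t_snk, n):
    """Finish time of the n-th iteration of src --(p, q)--> snk.

    The source fires back to back; the j-th sink firing can start once
    ceil(j*q/p) source firings have finished and the sink is free.
    """
    g = gcd(p, q)
    z_src, z_snk = q // g, p // g
    end_snk = 0
    for j in range(1, n * z_snk + 1):
        needed = -(-j * q // p)  # ceil(j * q / p)
        end_snk = max(end_snk, needed * t_src) + t_snk
    return max(n * z_src * t_src, end_snk)


for n in range(1, 6):
    direct = chain_latency(3, 2, 15, 8, n)   # G: A --(3, 2)--> B
    dual = chain_latency(2, 3, 8, 15, n)     # dual: B --(2, 3)--> A
    print(n, direct, dual)
```

Reversing the edge swaps the roles of the two actors and of the two rates, which is why the dual is obtained by calling the same function with swapped arguments.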

SLIDE 16

Graph $A \xrightarrow{p\;q} B$

Outline

  1. Introduction
  2. Preliminary results
  3. Graph $A \xrightarrow{p\;q} B$: enabling patterns, minimum latency
  4. Generalization to chains and acyclic graphs
  5. Experiments
  6. Conclusion

SLIDE 17

Graph $A \xrightarrow{p\;q} B$

Preliminaries about graph $A \xrightarrow{p\;q} B$

Four parameters: $p, q \in \mathbb{N}^+$ and $t_A, t_B \in \mathbb{R}^+$.

  • Repetition vector: $z_A = \frac{q}{\gcd(p, q)}$, $z_B = \frac{p}{\gcd(p, q)}$
  • ASAP period: $P_G = \max(z_A t_A,\ z_B t_B)$

Problem statement: What is $\theta_{A,B}$, the minimum size of channel $A \to B$ such that the ASAP execution achieves the maximum throughput?

Solution: $p + q - \gcd(p, q) < \theta_{A,B} \le 2\,(p + q - \gcd(p, q))$

Proof: 18 cases in total ($p, q$ → 6 cases; $t_A, t_B$ → 3 cases).
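The quantities on this slide are direct to compute. A minimal sketch follows; the function name `two_actor_parameters` and the dictionary keys are illustrative, and the bounds on the channel size are simply the two expressions stated in the solution.

```python
from math import gcd


def two_actor_parameters(p, q, t_a, t_b):
    """Repetition vector, ASAP period and channel-size bounds for A --(p, q)--> B."""
    g = gcd(p, q)
    z_a, z_b = q // g, p // g
    base = p + q - g
    return {
        "z_A": z_a,
        "z_B": z_b,
        "P_G": max(z_a * t_a, z_b * t_b),
        # theta_{A,B} lies above the first value and at most at the second.
        "theta_bounds": (base, 2 * base),
    }


print(two_actor_parameters(8, 5, 20, 7))
print(two_actor_parameters(3, 2, 15, 8))
```

For $p = 3$, $q = 2$ the upper bound is 8, which is the buffer size used in the earlier scheduling example.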

SLIDE 18

Graph $A \xrightarrow{p\;q} B$: Enabling patterns

Enabling patterns

A time-independent, analytic and parametric characterization of the data dependency A → B that covers one iteration.

Example: graph $A \xrightarrow{8\;5} B$ with $t_A = 20$ and $t_B = 7$.

[Figure: schedule A1 B1 A2 B2 B3 A3 B4 A4 B5 B6 A5 B7 B8 with its enabling points; the values 8, 11, 9, 12, 10 annotate the successive patterns $A\,B$, $A\,B^2$, $A\,B$, $A\,B^2$, $A\,B^2$.]

$A^i\,B^j$ ⇔ i firings of A enable j firings of B.

Unfolded pattern: $A\,B;\ A\,B^2;\ A\,B;\ A\,B^2;\ A\,B^2$
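The unfolded pattern can be obtained by counting tokens over one iteration, independently of the execution times. The sketch below assumes an initially empty channel; `unfolded_pattern` is a hypothetical name, and each block `(a, b)` is read as "a firings of A enable b firings of B".

```python
from math import gcd


def unfolded_pattern(p, q):
    """Enabling blocks and token counts at the enabling points for A --(p, q)--> B."""
    z_a = q // gcd(p, q)
    blocks, peaks = [], []
    tokens = a_run = 0
    for _ in range(z_a):
        tokens += p          # one firing of A
        a_run += 1
        if tokens >= q:      # enabling point: B can fire tokens // q times
            peaks.append(tokens)
            blocks.append((a_run, tokens // q))
            tokens %= q
            a_run = 0
    return blocks, peaks


blocks, peaks = unfolded_pattern(8, 5)
print(blocks)
print(peaks)
```

For $p = 8$, $q = 5$ the blocks reproduce the unfolded pattern of the slide, and the token counts at the enabling points are 8, 11, 9, 12, 10, the values shown in the figure.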

SLIDE 19

Graph $A \xrightarrow{p\;q} B$: Enabling patterns

Enabling patterns

Unfolded pattern: $\underbrace{A\,B;\ A\,B^2}_{\text{block}};\ \underbrace{A\,B;\ A\,B^2;\ A\,B^2}_{\text{block}}$

SLIDE 20

Graph $A \xrightarrow{p\;q} B$: Enabling patterns

Enabling patterns

Unfolded pattern: $\underbrace{A\,B;\ A\,B^2}_{\text{block}};\ \underbrace{A\,B;\ A\,B^2;\ A\,B^2}_{\text{block}}$

Factorized pattern: $\left[A\,B;\ (A\,B^2)^{f_i}\right]_{i=1..2}$ with $f_1 = 1$, $f_2 = 2$

SLIDE 21

Graph $A \xrightarrow{p\;q} B$: Enabling patterns

Enabling patterns

Unfolded pattern: $\underbrace{A\,B;\ A\,B^2}_{\text{block}};\ \underbrace{A\,B;\ A\,B^2;\ A\,B^2}_{\text{block}}$

Factorized pattern: $\left[A\,B;\ (A\,B^2)^{f_i}\right]_{i=1..2}$ with $f_1 = 1$, $f_2 = 2$

General case: $\left[A\,B^k;\ (A\,B^{k+1})^{\alpha_j}\right]_{j=1..\frac{q-r}{\gcd(p,q)}}$ with $p = kq + r$ and $\alpha_j = \left\lfloor \frac{jr}{q-r} \right\rfloor - \left\lfloor \frac{(j-1)r}{q-r} \right\rfloor$

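SLIDE 22

Graph $A \xrightarrow{p\;q} B$: Enabling patterns

Enabling patterns

Case A. $p \ge q$. Let $p = kq + r$ with $0 \le r < q$.
  • Case A.1. $r = 0$: $A\,B^k$
  • Case A.2. $q \le 2r$: $\left[A\,B^k;\ (A\,B^{k+1})^{\alpha_j}\right]_{j=1..\frac{q-r}{\gcd(p,q)}}$
  • Case A.3. $q > 2r$: $\left[(A\,B^k)^{\beta_j};\ A\,B^{k+1}\right]_{j=1..\frac{r}{\gcd(p,q)}}$

  $\alpha_j = \left\lfloor \frac{jr}{q-r} \right\rfloor - \left\lfloor \frac{(j-1)r}{q-r} \right\rfloor$
  $\beta_j = \left\lceil \frac{jq}{r} \right\rceil - \left\lceil \frac{(j-1)q}{r} \right\rceil - 1$

Case B. $p < q$. Let $q = kp + r$ with $0 \le r < p$.
  • Case B.1. $r = 0$: $A^k\,B$
  • Case B.2. $p \ge 2r$: $\left[A^{k+1}\,B;\ (A^k\,B)^{\gamma_j}\right]_{j=1..\frac{r}{\gcd(p,q)}}$
  • Case B.3. $p < 2r$: $\left[(A^{k+1}\,B)^{\lambda_j};\ A^k\,B\right]_{j=1..\frac{p-r}{\gcd(p,q)}}$

  $\gamma_j = \left\lfloor \frac{jp}{r} \right\rfloor - \left\lfloor \frac{(j-1)p}{r} \right\rfloor - 1$
  $\lambda_j = \left\lceil \frac{jr}{p-r} \right\rceil - \left\lceil \frac{(j-1)r}{p-r} \right\rceil$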
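The six cases can be cross-checked against direct token counting. The sketch below takes $\alpha_j$ and $\gamma_j$ with floors and $\beta_j$ and $\lambda_j$ with ceilings, a reading that the comparison supports; all function names are hypothetical, and a block `(a, b)` stands for $A^a\,B^b$.

```python
from math import gcd


def ceil_div(a, b):
    return -(-a // b)


def closed_form_pattern(p, q):
    """Enabling pattern of A --(p, q)--> B from the case analysis."""
    g = gcd(p, q)
    blocks = []
    if p >= q:
        k, r = divmod(p, q)
        if r == 0:                                    # Case A.1
            return [(1, k)]
        if q <= 2 * r:                                # Case A.2
            d = q - r
            for j in range(1, d // g + 1):
                alpha = j * r // d - (j - 1) * r // d
                blocks += [(1, k)] + [(1, k + 1)] * alpha
        else:                                         # Case A.3
            for j in range(1, r // g + 1):
                beta = ceil_div(j * q, r) - ceil_div((j - 1) * q, r) - 1
                blocks += [(1, k)] * beta + [(1, k + 1)]
    else:
        k, r = divmod(q, p)
        if r == 0:                                    # Case B.1
            return [(k, 1)]
        if p >= 2 * r:                                # Case B.2
            for j in range(1, r // g + 1):
                gamma = j * p // r - (j - 1) * p // r - 1
                blocks += [(k + 1, 1)] + [(k, 1)] * gamma
        else:                                         # Case B.3
            d = p - r
            for j in range(1, d // g + 1):
                lam = ceil_div(j * r, d) - ceil_div((j - 1) * r, d)
                blocks += [(k + 1, 1)] * lam + [(k, 1)]
    return blocks


def counted_pattern(p, q):
    """Same pattern obtained by counting tokens over one iteration."""
    blocks, tokens, a_run = [], 0, 0
    for _ in range(q // gcd(p, q)):
        tokens += p
        a_run += 1
        if tokens >= q:
            blocks.append((a_run, tokens // q))
            tokens %= q
            a_run = 0
    return blocks


mismatches = [(p, q) for p in range(1, 31) for q in range(1, 31)
              if closed_form_pattern(p, q) != counted_pattern(p, q)]
print(mismatches)
```

The comparison covers all rate pairs up to 30, which exercises every case including rates with a common divisor.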

SLIDE 23

Graph $A \xrightarrow{p\;q} B$: Minimum latency

Multi-iteration latency: case $z_A t_A \ge z_B t_B$

  • A imposes a higher load than B
  • A never gets idle ⟹ $P_G = z_A t_A$
  • $L_G(n) = n P_G + \Delta_{A,B}$ ⟺ $\frac{L_G(n)}{n} = \frac{n P_G + \Delta_{A,B}}{n} = P_G + \frac{\Delta_{A,B}}{n} \ge P_G$
  • $\Delta_{A,B}$ is the remaining execution time for actor B after actor A has finished its firings of the nth iteration
  • $\Delta_{A,B}$ is constant over all iterations, so $\lim_{n \to +\infty} \frac{\Delta_{A,B}}{n} = 0$

(graph $A \xrightarrow{5\;3} B$ with $t_A = 14$ and $t_B = 8$)
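For the example graph ($p = 5$, $q = 3$, $t_A = 14$, $t_B = 8$), the quantities can be worked out by hand. The values below are a hand computation that assumes ASAP execution, an initially empty channel and no blocking of A by the buffer.

```latex
\begin{align*}
z_A &= 3, \quad z_B = 5, \quad z_A t_A = 42 \ \ge\ z_B t_B = 40
  \quad\Rightarrow\quad P_G = 42 \\
\text{A finishes at } & 14,\ 28,\ 42; \qquad
\text{B finishes at } 22,\ 36,\ 44,\ 52,\ 60 \\
\Delta_{A,B} &= 60 - 42 = 18, \qquad L_G(1) = 60, \quad L_G(2) = 102, \quad L_G(3) = 144 \\
L_G(n) &= 42\,n + 18
\end{align*}
```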

SLIDE 24

Graph $A \xrightarrow{p\;q} B$: Minimum latency

Multi-iteration latency: case $z_A t_A \ge z_B t_B$

  • Case I. $\Delta_{A,B} = \left\lceil \frac{p}{q} \right\rceil t_B$
  • Case II.1. $\Delta_{A,B} = t_A + \left\lceil \frac{r}{q-r} \right\rceil \left((k+1)\,t_B - t_A\right)$
  • Case II.2. $\Delta_{A,B} = t_B + \left\lceil \frac{p-r}{r} \right\rceil \left(t_B - k\,t_A\right)$
  • Case III. ...
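The closed forms can be compared with a direct computation of $L_G(n) - n P_G$. The sketch below reads the lost brackets as ceilings, which is an assumption, and the conditions that select each case are not given on the slide, so the parameter sets were picked by hand. It assumes ASAP execution with an unbounded, initially empty channel; all names are hypothetical.

```python
from math import gcd


def ceil_div(a, b):
    return -(-a // b)


def delta_by_recurrence(p, q, t_a, t_b, n=5):
    """L_G(n) - n*P_G for A --(p, q)--> B when A carries the higher load."""
    g = gcd(p, q)
    z_a, z_b = q // g, p // g
    assert z_a * t_a >= z_b * t_b
    end_b = 0
    for j in range(1, n * z_b + 1):
        # The j-th firing of B waits for ceil(j*q/p) firings of A.
        end_b = max(end_b, ceil_div(j * q, p) * t_a) + t_b
    return end_b - n * z_a * t_a


def case_i(p, q, t_a, t_b):
    return ceil_div(p, q) * t_b


def case_ii_1(p, q, t_a, t_b):
    k, r = divmod(p, q)                 # p = k*q + r
    return t_a + ceil_div(r, q - r) * ((k + 1) * t_b - t_a)


def case_ii_2(p, q, t_a, t_b):
    k, r = divmod(q, p)                 # q = k*p + r
    return t_b + ceil_div(p - r, r) * (t_b - k * t_a)


print(delta_by_recurrence(8, 5, 20, 7), case_i(8, 5, 20, 7))
print(delta_by_recurrence(5, 3, 14, 8), case_ii_1(5, 3, 14, 8))
print(delta_by_recurrence(8, 5, 20, 12), case_ii_1(8, 5, 20, 12))
print(delta_by_recurrence(5, 7, 10, 13), case_ii_2(5, 7, 10, 13))
```

The third and fourth parameter sets have non-integer ratios inside the brackets, so they distinguish a ceiling from a floor.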

SLIDE 25

Graph $A \xrightarrow{p\;q} B$: Minimum latency

Multi-iteration latency: case $z_A t_A < z_B t_B$

  • B imposes a higher load than A
  • B never gets idle in the steady state (untrue in the transient phase)
  • $\Delta_{A,B}$ may not be constant over all iterations and diverges to infinity if the buffer is unbounded
  • Better solution: compute $\Delta_{A,B}$ with the duality theorem: $L_G(n) = L_{G^{-1}}(n) = n P_{G^{-1}} + \Delta_{B,A}$

(graph $A \xrightarrow{5\;3} B$ with $t_A = 14$ and $t_B = 12$)
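For the example graph ($p = 5$, $q = 3$, $t_A = 14$, $t_B = 12$), the dual is $B \xrightarrow{3\;5} A$, in which B is the source and fires back to back. The values below are a hand computation that assumes ASAP execution and an unbounded, initially empty channel.

```latex
\begin{align*}
z_A t_A &= 3 \cdot 14 = 42 \ <\ z_B t_B = 5 \cdot 12 = 60
  \quad\Rightarrow\quad P_{G^{-1}} = P_G = 60 \\
\text{In } G^{-1}:\ & \text{B finishes at } 12,\ 24,\ 36,\ 48,\ 60; \qquad
\text{A finishes at } 38,\ 62,\ 76 \\
\Delta_{B,A} &= 76 - 60 = 16 \\
L_G(n) &= L_{G^{-1}}(n) = 60\,n + 16
  \qquad (L_G(1) = 76,\ L_G(2) = 136)
\end{align*}
```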

SLIDE 26

Graph $A \xrightarrow{p\;q} B$: Minimum latency

Input-output latency

Case $z_A t_A \ge z_B t_B$
  • A imposes the highest load ⟹ $P_G = z_A t_A$
  • $\ell_G(n)$ is equal to the finish time of the nth iteration minus the start time of the first firing of A in the nth iteration
  • $\ell_G(n) = L_G(n) - (n-1)\,z_A t_A = L_G(n) - (n-1)\,P_G = P_G + \Delta_{A,B}$
  • Hence $\ell_G = P_G + \Delta_{A,B} = L_G(1)$

Case $z_A t_A < z_B t_B$
  • B imposes the highest load
  • Unbounded buffer: $\ell_G(n) = L_G(n) - (n-1)\,z_A t_A$. It diverges with n!
  • Bounded buffer: we compute an over-approximation with a (backward) linearization technique
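Both cases can be illustrated on the graph $A \xrightarrow{5\;3} B$ with $t_A = 14$. The latency expressions $L_G(n)$ used below are hand computations for $t_B = 8$ and $t_B = 12$ respectively, assuming ASAP execution and an unbounded, initially empty channel.

```latex
\begin{align*}
t_B = 8:\quad & L_G(n) = 42\,n + 18, \quad P_G = 42
  &&\Rightarrow\quad \ell_G = P_G + \Delta_{A,B} = 42 + 18 = 60 = L_G(1) \\
t_B = 12:\quad & L_G(n) = 60\,n + 16, \quad z_A t_A = 42
  &&\Rightarrow\quad \ell_G(n) = 60\,n + 16 - 42\,(n - 1) = 18\,n + 58
\end{align*}
```

In the second case the input-output latency grows by 18 time units per iteration, which is the divergence mentioned for the unbounded buffer.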

SLIDE 27

Generalization to chains and acyclic graphs

Outline

  1. Introduction
  2. Preliminary results
  3. Graph $A \xrightarrow{p\;q} B$
  4. Generalization to chains and acyclic graphs
  5. Experiments
  6. Conclusion

SLIDE 28

Generalization to chains and acyclic graphs

Multi-iteration latency of chain $A \xrightarrow{p\;q} B \xrightarrow{p'\;q'} C$

Forward linearization of B
  • First analyse the graph $A \xrightarrow{p\;q} B$
  • If B does not fire continuously, then build a fictive actor $B^u$ such that: $\forall i.\ f_B(i) \le f_{B^u}(i)\ \wedge\ \exists i.\ f_B(i) = f_{B^u}(i)$
  • Then analyse the graph $B^u \xrightarrow{p'\;q'} C$
  • Finally combine the two schedules
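The slide states the two conditions that $B^u$ must satisfy but not how it is built. One possible construction, given here only as a sketch, makes $B^u$ finish a firing every `slot` time units and shifts it by the smallest offset for which it never finishes before B. Reading $f_B(i)$ as the finish time of the i-th firing of B, the parameter `slot`, and the function name are all assumptions.

```python
def linear_upper_bound(finish_times, slot):
    """Finish times of a fictive actor that completes one firing every
    `slot` time units and never finishes earlier than the real actor."""
    # Smallest offset such that offset + i*slot >= f(i) for every firing i.
    offset = max(f - i * slot for i, f in enumerate(finish_times, start=1))
    return [offset + i * slot for i in range(1, len(finish_times) + 1)]


# First iteration of B in the earlier A --(3, 2)--> B example (t_B = 8),
# with slot = P_G / z_B = 30 / 3 = 10 as an illustrative choice.
real = [23, 38, 46]
bound = linear_upper_bound(real, 10)
print(bound)
```

Because the offset is the maximum gap, the bound is tight for at least one firing, which matches the existential condition on the slide.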

SLIDE 29

Generalization to chains and acyclic graphs

Multi-iteration latency of acyclic graphs

  • Acyclic graph G seen as a set of maximal chains $\mathcal{G}(G)$ (chains from a source actor to a sink actor)
  • Property: $\forall i.\ L_G(i) = \max_{g \in \mathcal{G}(G)} \{L_g(i)\}$
  • Proof: transform G into HSDF, then unfold i times
  • Compute the multi-iteration latency of each maximal chain
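The property reduces the analysis of an acyclic graph to enumerating its maximal chains and taking a maximum. A minimal sketch follows, assuming the graph is given as a successor map; `chain_latency` is a placeholder for whatever per-chain analysis is available, and the toy latency table is invented for illustration.

```python
def maximal_chains(successors):
    """All source-to-sink paths of a DAG given as {actor: [successors]}."""
    targets = {s for succ in successors.values() for s in succ}
    sources = sorted(set(successors) - targets)
    chains = []

    def extend(path):
        nxt = successors.get(path[-1], [])
        if not nxt:                 # reached a sink: the chain is maximal
            chains.append(path)
        for s in nxt:
            extend(path + [s])

    for src in sources:
        extend([src])
    return chains


def graph_latency(successors, chain_latency, i):
    """L_G(i) as the maximum of L_g(i) over all maximal chains g."""
    return max(chain_latency(chain, i) for chain in maximal_chains(successors))


dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
toy = {("A", "B", "D"): 40, ("A", "C", "D"): 55}   # made-up base latencies
print(maximal_chains(dag))
print(graph_latency(dag, lambda chain, i: toy[tuple(chain)] + 10 * i, 1))
```

The number of maximal chains can grow exponentially with the size of the graph, so this direct enumeration is only practical for small graphs.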

SLIDE 30

Generalization to chains and acyclic graphs

Input-output latency for the chain $A \xrightarrow{p\;q} B \xrightarrow{p'\;q'} C$

[Figure: linearized schedule (backward linearization).]

Conclusion: $\ell_G = 83$ and $\hat{\ell}_G = 89.8$, so we over-approximate by 8.2%
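The reported over-approximation follows from the two values by a relative-error computation:

```latex
\[
\frac{\hat{\ell}_G - \ell_G}{\ell_G} = \frac{89.8 - 83}{83} = \frac{6.8}{83} \approx 0.082 = 8.2\%
\]
```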

SLIDE 31

Experiments

Outline

  1. Introduction
  2. Preliminary results
  3. Graph $A \xrightarrow{p\;q} B$
  4. Generalization to chains and acyclic graphs
  5. Experiments
  6. Conclusion

SLIDE 32

Experiments

Multi-iteration latency computation for real benchmarks

graph                    P_G        L_G(1)     \hat{L}_G(1)/L_G(1)   \hat{L}_G(2)/L_G(2)
modem                    32         62         1                     1
sample rate converter    960        1000       1.022                 1.011
H.263 decoder            332046     369508     1                     1
FFT                      78844      94229      1                     1
TDE                      17740800   19314069   1                     1

SLIDE 33

Experiments

Multi-iteration latency for randomly generated chains

  • Randomly generated chains of 10 actors
  • $p, q \in [1, 10]$ and $t_X \in [1, 100]$
  • Total number of firings per iteration $< 2 \times 10^3$
  • We report $\frac{\hat{L}_{A_1 \to A_{10}}}{L_G(1)} = \frac{\text{approximate}}{\text{exact}}$
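A generator matching these constraints could look like the sketch below. The sampling procedure used in the experiments is not described on the slide, so the uniform integer draws, the rejection loop and the function name `random_chain` are all assumptions.

```python
import random
from fractions import Fraction
from math import lcm


def random_chain(n_actors=10, max_firings=2000, rng=None):
    """Draw a random chain and keep it only if one iteration has fewer
    than max_firings firings in total (rejection sampling)."""
    rng = rng or random.Random()
    while True:
        rates = [(rng.randint(1, 10), rng.randint(1, 10)) for _ in range(n_actors - 1)]
        times = [rng.randint(1, 100) for _ in range(n_actors)]
        # Solve the balance equations z[i] * p = z[i+1] * q along the chain.
        ratio = [Fraction(1)]
        for p, q in rates:
            ratio.append(ratio[-1] * p / q)
        scale = lcm(*(r.denominator for r in ratio))
        z = [int(r * scale) for r in ratio]
        if sum(z) < max_firings:
            return rates, times, z


rates, times, z = random_chain(rng=random.Random(0))
print(rates, times, z, sum(z))
```

Passing a seeded `random.Random` instance keeps a generated benchmark set reproducible.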

SLIDE 34

Experiments

Input-output latency for randomly generated chains

  • Randomly generated chains of 9 actors
  • $p, q \in [1, 10]$ and $t_X \in [1, 100]$
  • Total number of firings per iteration $< 2 \times 10^3$
  • $A_9$ imposes the highest load
  • Each channel size $A_i \xrightarrow{p\;q} A_{i+1}$ is equal to $2\,(p + q - \gcd(p, q))$
  • We report $\frac{\hat{\ell}_G}{\ell_G} = \frac{\text{approximate}}{\text{exact}}$

SLIDE 35

Conclusion

Outline

  1. Introduction
  2. Preliminary results
  3. Graph $A \xrightarrow{p\;q} B$
  4. Generalization to chains and acyclic graphs
  5. Experiments
  6. Conclusion

SLIDE 36

Conclusion

Related work

  • [Geilen 2011] and [Skelin et al. 2014]: (max, +) algebra to compute the token timestamp vector with the eigenvalue of the transition matrix
      ⟹ Requires the ceiling operator to be simplified
  • [Ghamarian et al. 2008]: parametric throughput analysis for SDF graphs with bounded parametric execution times of actors but constant rates
      ⟹ Parameter space divided into a set of convex polyhedra (throughput regions), each with a throughput expression
  • [Damavandpeyma et al. 2012]: extension to scenario-aware dataflow (SADF)
  • [Bodin et al. 2013]: lower bounds of the maximum throughput to compute strictly periodic schedules instead of ASAP schedules
      ⟹ Can handle some cyclic graphs, but usually our linearization methods provide better results

SLIDE 37

Conclusion

Conclusion

We presented:
  • An exact analytic solution for the $A \xrightarrow{p\;q} B$ SDF graph using enabling patterns
  • A safe generalization to acyclic graphs using forward and backward linearization

Still to solve:
  • Symbolic analysis of cyclic dataflow graphs