Synchronous Elastic Systems - Mike Kishinevsky and Jordi Cortadella - PowerPoint PPT Presentation



slide-1
SLIDE 1

Synchronous Elastic Systems

Mike Kishinevsky, Intel Strategic CAD Labs, Hillsboro, USA
Jordi Cortadella, Universitat Politecnica de Catalunya, Barcelona, Spain

slide-2
SLIDE 2

Contributors to SELF research

Performance analysis: Jorge Júlvez
Theory of elastic machines: Sava Krstic and John O’Leary
Micro-architectural pipelining: Timothy Kam, Marc Galceran Oms
Optimization: Dmitry Bufistov, Josep Carmona
Bill Grundmann

slide-3
SLIDE 3

Agenda

I. Basics of elastic systems
II. Early evaluation and performance analysis
III. Applications of elastic systems
IV. Demo of SELF compiler

slide-4
SLIDE 4

Synchronous Stream of Data

[Figure: a stream of data tokens (1, 4, 7, ...), one token per clock cycle.]

slide-5
SLIDE 5

Synchronous Elastic Stream

[Figure: the synchronous stream of tokens (1, 4, 7) next to an elastic stream in which the tokens are spread over clock cycles 1 to 5, with bubbles (cycles with no data) in between.]

slide-6
SLIDE 6

Synchronous Circuit

[Figure: an adder (+) consuming two input streams and producing an output stream. Latency = 0.]

slide-7
SLIDE 7

Synchronous Elastic Circuit

[Figure: the same adder with Latency = 0, and an elastic version (marked e) operating on the same streams, where latency can vary.]

slide-8
SLIDE 8

Ordinary Synchronous System

[Figure: two networks of blocks A, B, C, D that differ only in their latencies.]

Changing latencies changes behavior

slide-9
SLIDE 9

Synchronous Elastic (characteristic property)

[Figure: two networks of blocks A, B, C, D with elastic buffers (e) inserted; the two are equivalent.]

Changing latencies does NOT change behavior = time elasticity

slide-10
SLIDE 10

Elasticity?

Elasticity refers to elasticity of time, i.e. tolerance to changes in timing parameters, not to properties of materials
Luca Carloni et al., in the first systematic study of such systems, called them Latency Insensitive Systems
Other names in use:
– Latency tolerant systems
– Synchronous emulation of asynchronous systems
– Synchronous handshake circuits
We use the term “synchronous elastic” to link to asynchronous elastic systems that were developed earlier, e.g., David Muller’s pipelines of the late 1950s and Ivan Sutherland’s micro-pipelines (1989)
Both tolerate the variability of input data arrival and computation delays:
– Asynchronous elastic systems tolerate changes in continuous time
– Synchronous elastic systems tolerate changes in discrete time

slide-11
SLIDE 11

Why

Scalable
Modular (Plug & Play)
Better energy-delay trade-offs (design for typical case instead of worst case)
New micro-architectural opportunities in digital design
Not asynchronous: use existing design experience, CAD tools and flows... but have some advantages of asynchronous

slide-12
SLIDE 12

How to Design Synchronous Elastic Systems

Example of the implementation: SELF = Synchronous Elastic Flow
Others are possible

slide-13
SLIDE 13

Pipelined communication

[Figure: sender and receiver connected by a pipelined Data channel.]

What if the sender does not always send valid data?

slide-14
SLIDE 14

The Valid bit

[Figure: sender and receiver connected by Data and Valid.]

What if the receiver is not always ready?

slide-15
SLIDE 15

The Stop bit

[Figure: sender and receiver connected by Data, Valid and Stop.]

slide-16
SLIDE 16

The Stop bit

[Figure: animation frame of the same channel.]

slide-17
SLIDE 17

The Stop bit

[Figure: animation frame of the same channel.]

slide-18
SLIDE 18

The Stop bit

[Figure: animation frame of the same channel.]

Back-pressure

slide-19
SLIDE 19

The Stop bit

[Figure: the same channel, highlighting the Stop path.]

Long combinational path

slide-20
SLIDE 20

Cyclic structures

[Figure: Data, Valid and Stop wired in a loop, forming a combinational cycle.]

One can build circuits with combinational cycles (constructive cycles in the sense of Berry), but synthesis and timing tools do not handle them well

slide-21
SLIDE 21

Example: pipelined linear communication chain with transparent latches

[Figure: sender and receiver connected by a chain of latches labeled H and L alternately, ½ cycle each.]

Master and slave latches with independent control

slide-22
SLIDE 22

Shorthand notation (clock lines not shown)

[Figure: a latch with D, Q, clk and enable (En), and its shorthand symbol labeled En.]

slide-23
SLIDE 23

SELF (linear communication)

[Figure: sender and receiver connected through a chain of latches. Each stage has a data latch with an enable (En) and control signals Valid (V) and Stop (S); the channel carries Data, Valid and Stop. Slides 23 to 50 are animation frames of this same figure.]

slide-51
SLIDE 51

Elastic channel and its protocol

[Figure: Sender and Receiver connected by Data, Valid and Stop, with the state diagram of the channel.]

In every cycle the channel is in one of three states:
Transfer: Valid * not Stop
Idle: not Valid
Retry: Valid * Stop

slide-52
SLIDE 52

Elastic channel protocol

[Figure: Sender and Receiver connected by Data, Valid and Stop.]

Trace as drawn on the slide (oldest cycle on the right):

Data:   *  D  D  *  C  C  C  B  *  A
Valid:  0  1  1  0  1  1  1  1  0  1
Stop:   0  0  1  0  0  1  1  0  0  0

Each cycle is a Transfer, a Retry or an Idle cycle.
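The three channel states can be checked mechanically against the trace on this slide. The following is a minimal Python sketch; the function names are illustrative and not part of SELF, and the trace is rewritten oldest cycle first on the assumption that the slide draws time from right to left (this is the only reading in which every Retry is followed by the same data item).

```python
def channel_state(valid, stop):
    """Classify one cycle of an elastic channel from its Valid and Stop bits."""
    if not valid:
        return "Idle"                      # no token offered; Stop is a don't-care
    return "Retry" if stop else "Transfer"


def sender_is_persistent(data, valid, stop):
    """After a Retry the sender must offer the same data again in the next cycle."""
    for i in range(len(valid) - 1):
        if valid[i] and stop[i]:
            if not valid[i + 1] or data[i + 1] != data[i]:
                return False
    return True


# Trace from the slide, listed oldest cycle first ("*" marks a bubble).
data = list("A*BCCC*DD*")
valid = [1, 0, 1, 1, 1, 1, 0, 1, 1, 0]
stop = [0, 0, 0, 1, 1, 0, 0, 1, 0, 0]

states = [channel_state(v, s) for v, s in zip(valid, stop)]
transferred = [d for d, st in zip(data, states) if st == "Transfer"]
```

Here `transferred` holds the tokens A, B, C, D in order: C is retried twice and D once before each is accepted.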

slide-53
SLIDE 53

Basic VS block

[Figure: VS control block of stage i with ports V(i-1), V(i), S(i-1), S(i) and latch enable En(i).]

VS block + data-path latch = elastic HALF-buffer (EHB)
EHB + EHB = elastic buffer with capacity 2

slide-54
SLIDE 54

Control specification of the EB

54

slide-55
SLIDE 55

Two implementations

55

slide-56
SLIDE 56

Elastic buffer keeps data while stop is in flight

[Figure: sequence of buffer states W1R1, W2R1, W1R2, W2R2, W1R1.]

Cannot be done with Single Edge Flops without double pumping
Can use latches inside Master-Slave as shown before

EBs = FIFOs with two parameters:
– Forward latency
– Capacity
Backward latency for stop propagation is assumed (but need not be) equal to the forward latency
Typical case: (1,2), i.e. 1 cycle forward latency with capacity of 2
Replaces “normal” registers
Decoupling buffers
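The (1,2) case can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the SELF latch-level controller: the class `ElasticBuffer` and the helper `run` are hypothetical names, and the buffer is modeled as a FIFO whose Valid and Stop outputs depend only on registered state, which gives 1 cycle of forward latency and 1 cycle of backward (stop) latency.

```python
from collections import deque


class ElasticBuffer:
    """Behavioral sketch of an EB with forward latency 1 and capacity 2."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.fifo = deque()

    def outputs(self):
        """Return (valid_out, data_out, stop_out), computed from state only."""
        valid_out = len(self.fifo) > 0
        data_out = self.fifo[0] if valid_out else None
        stop_out = len(self.fifo) == self.capacity   # full: stop the sender
        return valid_out, data_out, stop_out

    def step(self, valid_in, data_in, stop_in):
        """Advance one clock cycle and return the outputs seen in that cycle."""
        valid_out, data_out, stop_out = self.outputs()
        if valid_out and not stop_in:      # Transfer on the output channel
            self.fifo.popleft()
        if valid_in and not stop_out:      # Transfer on the input channel
            self.fifo.append(data_in)
        return valid_out, data_out, stop_out


def run(tokens, stop_pattern):
    """Drive one EB with a persistent sender and a receiver that stalls."""
    eb = ElasticBuffer()
    pending = deque(tokens)
    received = []
    for stop_in in stop_pattern:
        valid_in = bool(pending)
        data_in = pending[0] if valid_in else None
        valid_out, data_out, stop_out = eb.step(valid_in, data_in, stop_in)
        if valid_in and not stop_out:
            pending.popleft()              # the sender's token was accepted
        if valid_out and not stop_in:
            received.append(data_out)      # the receiver accepted a token
    return received
```

With a receiver that stalls for three cycles, the second slot absorbs the token that is already in flight when Stop arrives, so `run(list("abcd"), [0, 1, 1, 1, 0, 0, 0, 0, 0, 0])` delivers every token in order.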

slide-57
SLIDE 57

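Join

[Figure: join control combining input channels (V1, S1) and (V2, S2) into one output channel (V, S), drawn with VS blocks around an adder (+).]

slide-58
SLIDE 58

(Lazy) Fork

[Figure: lazy fork control splitting the input channel (V, S) into output channels (V1, S1) and (V2, S2).]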
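The gate-level figures for the join and the lazy fork did not survive extraction. The sketch below uses one common formulation of their control equations; it is an assumption rather than a reading of the slides, and the function names are illustrative. The property it is meant to show is that a transfer happens on all channels of the block in the same cycle or on none of them.

```python
def join(v1, v2, s):
    """Join control (assumed equations): the output is valid only when both
    inputs are valid, and both inputs are stopped unless the output transfers."""
    v = v1 and v2
    transfer = v and not s
    s1 = not transfer
    s2 = not transfer
    return v, s1, s2


def lazy_fork(v, s1, s2):
    """Lazy fork control (assumed equations): a branch sees Valid only when the
    other branch is ready, so both branches transfer in the same cycle."""
    v1 = v and not s2
    v2 = v and not s1
    s = s1 or s2
    return v1, v2, s
```

When an input of the join is not valid, its Stop output is a don't-care under the channel protocol, which is why stopping it unconditionally is harmless in this sketch.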

slide-59
SLIDE 59

Eager Fork

[Figure: eager fork control with input channel (V, S) and output channels (V1, S1) and (V2, S2).]

slide-60
SLIDE 60

Eager fork (another implementation)

[Figure: eager fork built from VS blocks.]

slide-61
SLIDE 61

Variable Latency Units

[Figure: a unit taking [0 - k] cycles, with V/S channels and go, done and clear signals.]

slide-62
SLIDE 62

Coarse grain control

62

slide-63
SLIDE 63

Elasticization

[Figure: a synchronous design and its elastic counterpart.]

slide-64
SLIDE 64

[Figure: clocked pipeline driven by CLK.]

slide-65
SLIDE 65

[Figure: pipeline registers PC, IF/ID, ID/EX, EX/MEM, MEM/WB driven by CLK, with JOIN and FORK blocks between them.]

slide-66
SLIDE 66

[Figure: the same pipeline with V/S controllers attached to the registers and to the JOIN and FORK blocks.]

slide-67
SLIDE 67

[Figure: the same pipeline with the controllers initialized (marked 1).]

slide-68
SLIDE 68

[Figure: the same pipeline with its control layer.]

Elastic control layer
Generation of gated clocks

slide-69
SLIDE 69

Equivalence

Synchronous: stream of data

D: a b c d e d f g h i j …

SELF: elastic stream of data

D: a * b * * c d e * d f * g h * * i j …
V: 1 0 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 …
S: 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0 …

Transfer sub-stream = original stream

Called: transfer equivalence, flow equivalence, or latency equivalence
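Transfer equivalence can be checked directly on the streams shown on this slide: keep the data of every cycle with Valid = 1 and Stop = 0 and compare with the synchronous stream. A minimal Python sketch follows; the function name is illustrative.

```python
def transfer_substream(data, valid, stop):
    """Keep the data items of cycles in which a transfer occurs (V=1, S=0)."""
    return [d for d, v, s in zip(data, valid, stop) if v and not s]


# Streams copied from the slide ("*" marks a cycle without a transfer).
D = "a * b * * c d e * d f * g h * * i j".split()
V = [1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1]
S = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

original = "a b c d e d f g h i j".split()
equivalent = transfer_substream(D, V, S) == original
```

For these streams `equivalent` is true: the eleven transfer cycles carry exactly the tokens of the synchronous stream, in the same order.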

slide-70
SLIDE 70

Marked Graph models of elastic systems

slide-71
SLIDE 71

Modelling elastic control with Petri nets

[Figure: Petri net places marked as data-token or bubble.]

slide-72
SLIDE 72

Modelling elastic control with Petri nets

[Figure: places holding a bubble, a data-token, or 2 data-tokens.]

Hiding internal transitions of elastic buffers

slide-73
SLIDE 73

Modelling elastic control with Marked Graphs

slide-74
SLIDE 74

Modelling elastic control with Marked Graphs

Forward (Valid or Request)
Backward (Stop or Acknowledgement)

slide-75
SLIDE 75

Elastic control with Timed Marked Graphs. Continuous time = asynchronous

[Figure: timed marked graph with delays d=250ps and d=151ps.]

Delays in time units (250, 151)

slide-76
SLIDE 76

Elastic control with Timed Marked Graphs. Discrete time = synchronous elastic

[Figure: timed marked graph with d=1 and d=1.]

Latencies in clock cycles (1, 1)

slide-77
SLIDE 77

Elastic control with Timed Marked Graphs. Discrete time. Multi-cycle operation

[Figure: timed marked graph with d=2 and d=1.]

slide-78
SLIDE 78

Elastic control with Timed Marked Graphs. Discrete time. Variable latency operation

[Figure: timed marked graph with d in {1,2} and d=1.]

e.g. discrete probabilistic distribution: average latency 0.8*1 + 0.2*2 = 1.2

slide-79
SLIDE 79

Modeling forks and joins

[Figure: marked graph of a fork and a join, d=1.]

slide-80
SLIDE 80

Modelling combinational elastic blocks

[Figure: marked graph with d=1 and d=0.]

slide-81
SLIDE 81

Elastic Marked Graphs

An Elastic Marked Graph (EMG) is a Timed MG such that for any arc a there exists a complementary arc a’ satisfying the following condition:

•a = a’• and •a’ = a•

Initial number of tokens on a and a’ (M0(a)+M0(a’)) = capacity of the corresponding elastic buffer
Similar forms of “pipelined” Petri Nets and Marked Graphs have been previously used for modeling pipelining in HW and SW (e.g. Patil 1974; Tsirlin, Rosenblum 1982)

slide-82
SLIDE 82

Reminder: Performance analysis of Marked Graphs

Th = operations / cycle = number of firings per time unit

The throughput is given by the minimum mean-weight cycle, i.e. the cycle with the smallest ratio of tokens to total delay:
Th = min(Th(A), Th(B), Th(C)) = 2/5

[Figure: marked graph with cycles A, B and C.]
Th(A)=3/7, Th(B)=3/5, Th(C)=2/5

Efficient algorithms: (Karp 1978), (Dasdan, Gupta 1998)
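The rule on this slide can be written as a one-line computation once the cycles are known. The sketch below takes each cycle as a (tokens, total delay) pair; the pairs are assumptions chosen only to reproduce the ratios 3/7, 3/5 and 2/5 shown on the slide, since the graph itself is not recoverable. It lists cycles by hand and is not an implementation of the Karp or Dasdan-Gupta algorithms.

```python
from fractions import Fraction


def throughput(cycles):
    """Throughput of a marked graph given its cycles as (tokens, delay) pairs."""
    return min(Fraction(tokens, delay) for tokens, delay in cycles.values())


# Hypothetical token counts and delays matching the ratios on the slide.
cycles = {"A": (3, 7), "B": (3, 5), "C": (2, 5)}
th = throughput(cycles)   # the critical cycle is C
```

Exact fractions avoid floating-point ties when two cycles have nearly equal ratios.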

slide-83
SLIDE 83

Early evaluation

Naïve solution: introduce choice places
– issue tokens at the choice node only into one (some) relevant path
– problem: tokens can arrive at merge nodes out of order; a later token can overtake an earlier one
Solution: change the enabling rule
– early evaluation
– issue negative tokens to input places without tokens, i.e. keep the same firing rule
– add symmetric sub-channels with negative tokens
– negative tokens kill positive tokens when they meet
Two related problems: early evaluation and exceptions (how to kill a data-token)

slide-84
SLIDE 84

Examples of early evaluation

MULTIPLEXOR (inputs a, b; select s with values T, F; output c)

if s = T then c := a -- don’t wait for b
else c := b -- don’t wait for a

MULTIPLIER (inputs a, b; output c)

if a = 0 then c := 0 -- don’t wait for b
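The multiplexor rule can be restated at the level of valid bits. The following Python sketch is illustrative only: the function name and the returned tuple are assumptions, and the anti-token flags follow the rule stated later in the deck that an early-enabled node generates anti-tokens on the inputs that had no token.

```python
def mux_early_eval(sel_valid, sel, a_valid, b_valid):
    """Early-evaluation enabling of a multiplexor.

    Returns (fire, anti_a, anti_b): whether the mux can fire in this cycle, and
    whether an anti-token must be sent to input a or b because the unselected
    input has not arrived yet.
    """
    if not sel_valid:
        return False, False, False         # the select value is always required
    if sel:                                # s = T: only input a is needed
        fire = a_valid
        return fire, False, fire and not b_valid
    fire = b_valid                         # s = F: only input b is needed
    return fire, fire and not a_valid, False
```

For example, `mux_early_eval(True, True, True, False)` fires without waiting for b and requests an anti-token on b, while `mux_early_eval(True, False, True, False)` must still wait because the selected input b is missing.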

slide-85
SLIDE 85

Related work

Petri nets
– Extensions to model OR causality: Kishinevsky et al., Change Diagrams [e.g. book of 1994]; Yakovlev et al., Causal Nets 1996

Asynchronous systems
– Reese et al 2002: Early evaluation
– Brej 2003: Early evaluation with anti-tokens
– Ampalam & Singh 2006: preemption using anti-tokens

slide-86
SLIDE 86

Dual Marked Graph

Marking: Arcs (places) −> Z (allow negative markings)
Some nodes are labeled as early-enabling
Enabling rules for a node:

– Positive enabling: M(a) > 0 for every input arc
– Early enabling (for early enabling nodes): M(a) > 0 for some input arcs
– Negative enabling: M(a) < 0 for every output arc

Firing rule: the same as in regular MG
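The three enabling rules and the shared firing rule are short enough to execute. This is a minimal Python sketch: the class name, the arc encoding and the five-arc example graph are assumptions made for illustration, and the data-dependent guard of an early-enabling node is abstracted away so that any marked input arc is enough.

```python
class DualMarkedGraph:
    """Marked graph whose arc markings may be negative (anti-tokens)."""

    def __init__(self, arcs, marking, early_nodes=()):
        self.arcs = dict(arcs)          # arc name -> (source node, target node)
        self.m = dict(marking)          # arc name -> integer marking
        self.early = set(early_nodes)   # nodes labeled as early-enabling

    def inputs(self, n):
        return [a for a, (_, dst) in self.arcs.items() if dst == n]

    def outputs(self, n):
        return [a for a, (src, _) in self.arcs.items() if src == n]

    def positive_enabled(self, n):
        return all(self.m[a] > 0 for a in self.inputs(n))

    def early_enabled(self, n):
        # Guard abstracted away: some marked input arc is sufficient.
        return n in self.early and any(self.m[a] > 0 for a in self.inputs(n))

    def negative_enabled(self, n):
        return all(self.m[a] < 0 for a in self.outputs(n))

    def fire(self, n):
        """Same firing rule as a regular MG, whatever the reason for enabling."""
        if not (self.positive_enabled(n) or self.early_enabled(n)
                or self.negative_enabled(n)):
            raise ValueError(f"node {n} is not enabled")
        for a in self.inputs(n):
            self.m[a] -= 1
        for a in self.outputs(n):
            self.m[a] += 1


# Hypothetical example: j joins a and b and is early-enabling.
g = DualMarkedGraph(
    arcs={"p1": ("a", "j"), "p2": ("b", "j"), "p3": ("j", "c"),
          "p4": ("c", "a"), "p5": ("c", "b")},
    marking={"p1": 1, "p2": 0, "p3": 0, "p4": 0, "p5": 1},
    early_nodes={"j"},
)
g.fire("j")                     # early firing leaves an anti-token on p2
after_j = dict(g.m)
b_pos, b_neg = g.positive_enabled("b"), g.negative_enabled("b")
g.fire("b")                     # the token from b cancels the anti-token
```

After `j` fires early, `p2` holds -1; node `b` is then both positive and negative enabled, and firing it gives the same marking either way, with `p2` back at 0.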

slide-87
SLIDE 87

Dual Marked Graphs

Early enabling can be associated with an external guard that depends on data variables (e.g., a select signal of a multiplexor)
Actual enabling guards are abstracted away (unless needed)
Anti-token generation: when an early enabled node fires, it generates anti-tokens in the predecessor arcs that had no tokens
Anti-token propagation (counterflow): when a negative enabled node fires, it propagates the anti-tokens from the successor to the predecessor arcs

slide-88
SLIDE 88

Dual Marked Graph model

[Figure: dual marked graph example with anti-tokens (marking -1) on several arcs and one node marked “Enabled !”.]

slide-89
SLIDE 89

Passive anti-token

Passive DMG = version of DMG without negative enabling
Negative tokens can only be generated due to early enabling, but cannot propagate
Let D be a strongly connected DMG such that all cycles have positive cumulative marking
Let Dp be a corresponding passive DMG.
If the environment (consumers) never generates negative tokens, then throughput(D) = throughput(Dp)

– If capacity of input places for early enabling transitions is unlimited, then active anti-tokens do not improve performance
– Active anti-tokens reduce activity in the data-path (good for power reduction)

slide-90
SLIDE 90

Properties of DMGs

Firing invariant: Let node n be simultaneously positive (early) and negative enabled in marking M. Let M1 be the result of firing n from M due to positive (early) enabling. Let M2 be the result of firing n from M due to negative enabling. Then, M1 = M2
Token preservation: Let c be a cycle of a strongly connected DMG with initial marking M0. For every reachable marking M: M(c) = M0(c)
Liveness: A strongly connected passive DMG is live iff for every cycle c: M(c) > 0.
– For DMGs this is a sufficient condition of liveness
– It is also a necessary condition for positive liveness
Repetitive behavior: In a SC DMG, a firing sequence s from M leads to the same marking iff every node fires in s the same number of times
DMGs have properties similar to regular MGs

slide-91
SLIDE 91

Implementing early enabling

slide-92
SLIDE 92

How to implement anti-tokens?

[Figure: positive tokens and negative tokens.]

slide-93
SLIDE 93

How to implement anti-tokens?

[Figure: positive tokens and negative tokens.]

slide-94
SLIDE 94

How to implement anti-tokens?

[Figure: channel with signals Valid+ and Stop+ for positive tokens and Valid– and Stop– for negative tokens.]

slide-95
SLIDE 95

Controller for elastic buffer

[Figure: Data path through H and L latches with enables (En), controlled by V and S signals on both sides.]

slide-96
SLIDE 96

Dual controller for elastic buffer

[Figure: controller with signals S+, V+, V-, S- on both sides and latch enables (En).]

slide-97
SLIDE 97

Dual Join and Fork

97

slide-98
SLIDE 98

Join with early evaluation

98

slide-99
SLIDE 99

Condition on Early Evaluation Function

Early evaluation function makes decision based on presence of valid bits, not on their absence
Formally: EE is positive unate with respect to data input
Example: legal EE function for a data-path MUX (s – select input)

slide-100
SLIDE 100

Passive anti-token (capacity one)

Bigger capacity can be achieved by “injecting” anti-token up-down counters on elastic channels

slide-101
SLIDE 101

Properties of elastic channels

Invariants: mutually exclusive
– Kill (V -) and Stop (S +)
– Valid (V +) and retain of a kill (S -)

slide-102
SLIDE 102

DLX processor model with slow bypass

[Figure: stages Fetch, Decode, Execute, Memory, Write-back and a Bypass path, with branch probabilities α, 1−α and β, 1−β.]

Th = operations / cycle

Throughput:
Late evaluation: Th = 0.5
Applying early evaluation on “Execution” and “Write-back”: Th = 0.7 (α=0.3; β=0.3)

slide-103
SLIDE 103

Conclusions

Early evaluation can increase performance beyond the min cycle ratio
The duality between positive and negative tokens suggests a clean and effective implementation
Dual Marked Graphs is a formal model for analytical analysis and optimization methods

slide-104
SLIDE 104

Performance analysis with early evaluation

(joint work with Jorge Júlvez)

slide-105
SLIDE 105

Revisit: Performance Analysis of Marked Graphs

The throughput can also be computed by means of linear programming

Average marking:

\[ \bar{m}_p = \lim_{t \to \infty} \frac{1}{t} \int_0^t m_p(\tau)\, d\tau \]

Throughput (minimum over the places p):

\[ th = \min_{p} \bar{m}_p \]

[Figure: transitions t1, t2, t3 and places p1, p2.]

\[ th = \min(\bar{m}_{p1}, \bar{m}_{p2}) \]

[Campos, Chiola, Silva 1991]

slide-106
SLIDE 106

Revisit: Performance Analysis of Marked Graphs

[Figure: marked graph with transitions a, b, c, d and places p1 to p5.]

max th

Reachability:
mp1 = 1 + tb – ta
mp2 = 0 + ta – tb
mp3 = 1 + td – ta
mp4 = 0 + ta – tc
mp5 = 1 + tc – td

th constraints:
th ≤ mp2 // transition b
th ≤ mp4 // transition c
th ≤ mp5 // transition d
th ≤ min(mp1, mp3) // transition a

Th = 0.5

slide-107
SLIDE 107

GMG = Multi-guarded Dual Marked Graph

Refinement of passive DMGs
Every node has a set of guards
Every guard is a set of input places (arcs)
Example:

[Figure: transitions t1, t2, t3, t4 and places p1, p2, p3.]

G(t4)={{p1,p3},{p2,p3}}
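Read as an enabling condition, a set of guards is a disjunction of conjunctions over input places. The sketch below assumes that a guard is satisfied when all of its places hold a token and that a node is enabled when at least one guard is satisfied; the function name is illustrative.

```python
def guard_enabled(guards, marking):
    """A node is enabled if every place of at least one guard is marked."""
    return any(all(marking[p] > 0 for p in guard) for guard in guards)


# Guards of t4 from the slide: G(t4) = {{p1,p3},{p2,p3}}
G_t4 = [{"p1", "p3"}, {"p2", "p3"}]

via_p1 = guard_enabled(G_t4, {"p1": 1, "p2": 0, "p3": 1})    # first guard holds
blocked = guard_enabled(G_t4, {"p1": 1, "p2": 1, "p3": 0})   # p3 is in every guard
```

Because p3 appears in both guards, t4 always waits for p3 and needs only one of p1 and p2.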

slide-108
SLIDE 108

Early evaluation

[Figure: model with branch probabilities α, 1-α and β, 1-β.]

slide-109
SLIDE 109

Early evaluation

Throughput as a function of α and β (one parameter indexes the rows, the other the columns):

        1.0   0.8   0.6   0.4   0.2   0.0
1.0     0.60  0.54  0.49  0.46  0.44  0.43
0.8     0.54  0.51  0.48  0.46  0.44  0.43
0.6     0.49  0.48  0.47  0.45  0.44  0.43
0.4     0.45  0.45  0.45  0.44  0.44  0.43
0.2     0.43  0.43  0.42  0.42  0.42  0.42
0.0     0.40  0.40  0.40  0.40  0.40  0.40

Values highlighted on the figure: (0.43), (0.60), (0.40)

slide-110
SLIDE 110

LP formulation for an upper bound of the throughput (by example)

[Figure: marked graph with transitions a, b, c, d and places p1 to p5; transition a uses p1 with probability α and p3 with probability 1-α.]

max th

mp1 = 1 + tb – ta
mp2 = 0 + ta – tb
mp3 = 1 + td – ta
mp4 = 0 + ta – tc
mp5 = 1 + tc – td
th ≤ mp2
th ≤ mp4
th ≤ mp5
th = α mp1 + (1-α) mp3

Th = (2 - α) / (3 - α)
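The closed form follows from the constraints in a few steps. The derivation below is a worked sketch that uses only the equations of this slide, writing m_{p1} ... m_{p5} for mp1 ... mp5: the reachability equations fix the token count of each cycle, and the inequalities on p2, p4 and p5 bound the remaining two markings.

```latex
\begin{align*}
m_{p1} + m_{p2} &= 1
  &&\Rightarrow\quad m_{p1} = 1 - m_{p2} \le 1 - th,\\
m_{p3} + m_{p4} + m_{p5} &= 2
  &&\Rightarrow\quad m_{p3} = 2 - m_{p4} - m_{p5} \le 2 - 2\,th,\\
th = \alpha\, m_{p1} + (1-\alpha)\, m_{p3}
  &\le \alpha (1 - th) + (1-\alpha)(2 - 2\,th)
  &&= (2-\alpha)(1 - th),\\
th\,(3-\alpha) &\le 2 - \alpha
  &&\Rightarrow\quad th \le \frac{2-\alpha}{3-\alpha}.
\end{align*}
```

At α = 1 the bound is 1/2, the value obtained without early evaluation when the cycle through p1 is critical, and at α = 0 it is 2/3.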

slide-111
SLIDE 111

Averaging cycle throughput or cycle times does not work

Th = (2 - α) / (3 - α)

[Figure: the same graph with transitions a, b, c, d and places p1 to p5; its two cycles have throughputs 1/2 and 2/3.]

Averaging throughput of individual cycles:
Th’ = α 1/2 + (1-α) 2/3 = (4 - α) / 6

Averaging effective cycle times of individual cycles:
1/Th” = 2α + (1-α) 3/2 = (3 + α) / 2
Th” = 2/(3 + α)
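The three formulas on this slide can be compared numerically. The short Python sketch below evaluates them with exact fractions; the function names are illustrative.

```python
from fractions import Fraction


def th_lp(alpha):
    """Throughput obtained from the LP formulation: (2 - a) / (3 - a)."""
    return (2 - alpha) / (3 - alpha)


def th_avg_throughput(alpha):
    """Average of the cycle throughputs 1/2 and 2/3: (4 - a) / 6."""
    return alpha * Fraction(1, 2) + (1 - alpha) * Fraction(2, 3)


def th_avg_cycle_time(alpha):
    """Inverse of the averaged cycle times 2 and 3/2: 2 / (3 + a)."""
    return 1 / (alpha * 2 + (1 - alpha) * Fraction(3, 2))


alpha = Fraction(1, 2)
values = (th_lp(alpha), th_avg_throughput(alpha), th_avg_cycle_time(alpha))
```

The three expressions agree at α = 0 and α = 1 but differ in between: at α = 1/2 they give 3/5, 7/12 and 4/7.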

slide-112
SLIDE 112

What can we do with synchronous elastic systems?

112

slide-113
SLIDE 113

Variable latency units

[Figure: ALU with start and done signals; latencies shown: L = 1, L = 3, L = 2, L = 1.]

slide-114
SLIDE 114

Benchmark “Patricia” from Media Bench

[Figure: statistics of operand sizes. Number of adds by significant bits of the 1st and 2nd operand, and number of adds by bits of adder used.]

12 bits of an adder do 95% of additions

slide-115
SLIDE 115

Power-delay for an adder

[Figure: comparison of a 64-bit VLA and a prefix adder; axis: relative delay (1, 1.25, 1.5).]

slide-116
SLIDE 116

Variable-latency cache hits

L1-cache: 1-cycle hit
L2-cache: 2-way associative, 32KB, 2-cycle hit
12-cycle miss

suggested by Joel Emer for ASIM experiment

slide-117
SLIDE 117

Variable-latency cache hits

L1-cache: 1-cycle hit
L2-cache: pseudo-associative, 32KB, {1-2} cycle hit
12-cycle miss

Sequential access: if hit in first access L = 1, if not L = 2
Trade-off: faster, or larger, or less power cache

slide-118
SLIDE 118

Variable-latency cache hits

L1-cache: 1-cycle hit
L2-cache: pseudo-associative, 64KB, {2-3} cycle hit
12-cycle miss

Sequential access: if hit in first access L = 1, if not L = 2
Trade-off: faster, or larger, or less power cache

slide-119
SLIDE 119

Correct-by-construction pipelining

Transforms:

– bypass
– retiming
– elasticize
– early enabling
– insert buffers and negative tokens
– size elastic buffer capacity

[Figure: SPEC with blocks ID, E1, E2, RF transformed correct-by-construction into IMP with the same blocks and token markings -1, 1, -1.]

[Joint work with Timothy Kam and Marc Galceran]

slide-120
SLIDE 120

Tree topology NoC

[Figure: tree of routers (R) with AGENT nodes attached.]

[In collaboration with Ken Stevens, Charles Dike, Bill Grundmann]

slide-121
SLIDE 121

Router node interface

[Figure: Router with ports A, B and C.]

slide-122
SLIDE 122

NoC Router

[Figure: router with ports A, B and C built from EHBs and blocks labeled S and M.]

Relative order of tokens between agents is preserved

slide-123
SLIDE 123

Switch and Merge

123

slide-124
SLIDE 124

Correctness (short story)

Developed theory of elastic machines (for late evaluation)
Verify correctness of any elastic implementation = check conformance with the definition of elastic machine
All SELF controllers are verified for conformance
Elasticization is correct-by-construction
Theory for early evaluation and negative delays is more challenging

– Sketch of a theory, but no fully satisfactory compositional properties found yet
– Verification done on concrete systems and controllers

slide-125
SLIDE 125

Summary

SELF gives a low cost implementation of elastic machines
Functionality is correct when latencies change
New micro-architectural opportunities
Compositional theory proving correctness
Early evaluation: mechanism for performance and power optimization
Optimization methods (that we did not discuss): retiming and recycling, buffer optimization and pipelining
Applications to design of NoC link layer

slide-126
SLIDE 126

See reference list for some relevant publications

126

slide-127
SLIDE 127

SELF compiler

[Figure: tool flow of the SELF compiler.]

Inputs: flow graph; parameterized library of controllers
Control generation produces a netlist of distributed controllers in Verilog, SMV and blif
Downstream tools: backend synthesis; NuSMV (verification); SIS & ABC (logic synthesis); simulator (performance)

slide-128
SLIDE 128

Example

slide-129
SLIDE 129

Example

Evaluation                      Throughput
No early evaluation             0.277
Passive anti-tokens M2 → W      0.280
Passive anti-tokens F3 → W      0.387
Active anti-tokens              0.400