Hunting Deadlocks Efficiently in Micro-Architectural Models of - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Hunting Deadlocks Efficiently in Micro-Architectural Models of Communication Fabrics

Freek Verbeek and Julien Schmaltz

slide-2
SLIDE 2

Growing number of cores (W. Tichy - Keynote ICST 2011)

  • Intel, 2 cores: ~167 million transistors on 1.1 cm²
  • Intel, 4 cores: ~582 million transistors on 2.86 cm²
  • Intel, 8 cores: ~2.3 billion transistors on 6.8 cm²
  • AMD Opteron, 12 cores: ~1.8 billion transistors on 2 x 3.46 cm²
  • Sun Niagara3, 16 cores: ~1 billion transistors on 3.7 cm²
  • Intel SCC, 48 cores: ~1.3 billion transistors on 5.6 cm²
  • Tilera TILEPro64, 64 cores
  • Intel Research, 80 cores: ~100 million transistors on 2.75 cm²

Usual:

  • verify cores
  • verify interconnect

slide-3
SLIDE 3

Networks-on-Chips: Example 1, HERMES

The topology:

  • Two dimensional mesh
slide-4
SLIDE 4
Networks-on-Chips: Example 1

The routing function:

  • XY: simple deterministic routing algorithm
  • First route to the destination column and then to the correct row
  • No cyclic dependencies and thus deadlock-free
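The XY routing rule on this slide can be sketched in a few lines of Python. This is only an illustration: the coordinate convention (a node is a pair of column x and row y) and the names `xy_next_hop` and `xy_route` are assumptions, not part of HERMES.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (column x, row y); hypothetical coordinate convention


def xy_next_hop(current: Node, dest: Node) -> Node:
    """Return the next node on the XY route, or `current` on arrival."""
    x, y = current
    dx, dy = dest
    if x != dx:
        # First move along X until the destination column is reached.
        return (x + (1 if dx > x else -1), y)
    if y != dy:
        # Then move along Y until the destination row is reached.
        return (x, y + (1 if dy > y else -1))
    return current


def xy_route(src: Node, dest: Node) -> List[Node]:
    """List every node visited from src to dest, both included."""
    path = [src]
    while path[-1] != dest:
        path.append(xy_next_hop(path[-1], dest))
    return path
```

Because a packet never turns from the Y direction back into the X direction, the routing function on its own creates no cyclic channel dependencies, which is the property the slide relies on.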

slide-5
SLIDE 5
Networks-on-Chips: Example 1

The high-level protocol:

  • Masters send requests and wait for responses
  • Slaves produce responses when receiving requests
  • Deadlock-free protocol

[Figure: Master and Slave automata, with labels "req!" and "active"]

slide-6
SLIDE 6

Networks-on-Chips: Example 1

The high-level protocol:

  • No message dependencies

[Figure: Master and Slave automata, with labels "req!" and "active"; message types req and rsp]

slide-7
SLIDE 7

Networks-on-Chips: Example 1

Network component (deadlock-free?):

  • Topology
  • Routing Function
  • High-level protocol
  • Message Dependencies

Deadlock-free system = ?

slide-8
SLIDE 8

Networks-on-Chips: Example 1

Core distribution:

  • Masters on the odd columns, slaves on the even columns

[Figure: mesh of Master and Slave cores]
slide-9
SLIDE 9
Networks-on-Chips: Example 1

  • Is the system deadlock-free?
  • No if there are at least four columns, yes otherwise.

[Figure: mesh of Master and Slave cores; green requests wait for blue responses (legend: Response, Request)]

slide-10
SLIDE 10

Networks-on-Chips: Example 1

Network component (cause of deadlock?):

  • Topology
  • Routing Function
  • High-level protocol
  • Message Dependencies

Deadlock-free system =

slide-11
SLIDE 11
Networks-on-Chips: Example 2, Spidergon from STMicroelectronics

Topology:

  • Design by STMicroelectronics
  • Simple shortest path routing algorithm
  • Regular for an even number of nodes
  • Packet, circuit, or wormhole switching

Routing logic:

    RelAd = (dest - current) mod (4 * N)
    if RelAd = 0 then stop
    elseif 0 < RelAd <= N then go clockwise
    elseif 3*N <= RelAd <= 4*N then go counter clockwise
    else go across
    endif

[Figure: Spidergon topology with nodes numbered 1 to 7; high-level protocol label "req!"]
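The Spidergon routing logic can be written as a small Python function. This is a sketch under stated assumptions: the network is taken to have 4*N nodes numbered 0 to 4*N-1 in clockwise order (the figures number nodes from 1), and the function name `spidergon_route` is hypothetical.

```python
def spidergon_route(current: int, dest: int, n: int) -> str:
    """Return the routing decision at `current` for a packet headed to `dest`.

    Assumes 4*n nodes numbered 0..4*n-1 clockwise (an illustration-only
    convention). The decision follows the pseudocode on the slide.
    """
    rel = (dest - current) % (4 * n)   # relative address, always in 0..4n-1
    if rel == 0:
        return "stop"                  # the packet has arrived
    if rel <= n:
        return "clockwise"             # destination is in the next quarter
    if rel >= 3 * n:
        return "counterclockwise"      # destination is in the previous quarter
    return "across"                    # otherwise take the cross link
```

With n = 4 (16 nodes), a packet at node 0 goes clockwise for destinations 1 to 4, counterclockwise for 12 to 15, and across for 5 to 11.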

slide-12
SLIDE 12

Networks-on-Chips: Example 2

Network component (cause of deadlock): Routing Function

[Figure: Spidergon with nodes numbered 1 to 7]

slide-13
SLIDE 13
Networks-on-Chips: Example 2

  • Is the system deadlock-free?

[Figure: Spidergon with nodes numbered 1 to 7 (legend: Idle cores, Send packets)]

slide-14
SLIDE 14
Networks-on-Chips: Example 2

  • Is the system deadlock-free?
  • Yes! None of the dependencies in the upper right quarter occur.

[Figure: Spidergon with nodes numbered 1 to 7 (legend: Idle cores, Send packets)]

slide-15
SLIDE 15
Networks-on-Chips: Example 2

  • Is the system deadlock-free?

[Figure: Spidergon with nodes numbered 1 to 15 (legend: Idle cores, Send packets)]

slide-16
SLIDE 16

Networks-on-Chips: Example 2

Network component (deadlock-free?):

  • Topology
  • Routing Function
  • High-level protocol
  • Message Dependencies
  • Core Distribution
  • Network size

Deadlock-free system =

slide-17
SLIDE 17

Networks-on-Chips: Example 3

Network component (deadlock-free?):

  • Topology
  • Routing Function
  • High-level protocol
  • Message Dependencies
  • Core Distribution
  • Network size
  • Queue sizes
  • Counter information
  • Virtual channel allocation

Deadlock-free system = ?

slide-18
SLIDE 18

Confusing ...

  • We need tools to (quickly) check for deadlocks
      – in large systems
      – with message dependencies
      – with the topology, routing and core behavior in one model
      – able to handle parameters such as queue size

slide-19
SLIDE 19

Outline

  • Intel's micro-architectural description language
      – xMAS language
      – Capturing high-level structure and message dependencies
  • Deadlock verification for xMAS
      – Definition of deadlocks
      – Labelled waiting graph
      – Feasible logically closed subgraph
  • Conclusion and future work
slide-20
SLIDE 20

Intel's abstraction for communication fabrics

slide-21
SLIDE 21

xMAS - Executable MicroArchitectural Specifications

  • Fair sinks and sometimes sources
  • Diagram is formal model
  • Friendly to microarchitects
slide-22
SLIDE 22

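xMAS example

[Figure: xMAS network with queues q0, q1 and q2 carrying req and rsp messages]

slide-23
SLIDE 23

xMAS example

[Figure: same network, animation step with one marker "P"]

slide-24
SLIDE 24

xMAS example

[Figure: same network, animation step with one marker "P"]

slide-25
SLIDE 25

xMAS example

[Figure: same network, animation step with two markers "P"]

slide-26
SLIDE 26

xMAS example

[Figure: same network, animation step with two markers "P"]

slide-27
SLIDE 27

xMAS example

[Figure: same network, animation step with one marker "P"]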

slide-28
SLIDE 28

Outline

  • Intel's micro-architectural description language
      – xMAS language
      – Capturing high-level structure and message dependencies
  • Deadlock verification for xMAS
      – Definition of deadlocks
      – Labelled dependency graph
      – Feasible logically closed subgraph
  • Conclusion and future work
slide-29
SLIDE 29
Formal definition of "deadlock" in xMAS

  • Intuition is a "dead" channel
  • Formal definition based on Linear Temporal Logic
      – Predicate logic
      – Temporal operators "eventually" (◇) and "globally" (□)
  • Channel c is dead iff ◇□(c.irdy ∧ ¬c.trdy)
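To make the dead-channel condition concrete, here is a minimal Python sketch. It assumes the condition reads "eventually, globally (c.irdy and not c.trdy)", and it assumes an infinite execution given in lasso form: a finite prefix followed by a loop that repeats forever. The names `ChannelState` and `is_dead` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ChannelState:
    irdy: bool  # the sender offers data on the channel
    trdy: bool  # the receiver is ready to accept it


def stuck(state: ChannelState) -> bool:
    """Data is offered but not accepted in this cycle."""
    return state.irdy and not state.trdy


def is_dead(prefix: Sequence[ChannelState], loop: Sequence[ChannelState]) -> bool:
    """Check 'eventually globally stuck' on the trace prefix + loop + loop + ...

    On a lasso-shaped trace the formula holds exactly when every state of the
    repeating loop is stuck; the finite prefix cannot change the verdict.
    """
    if not loop:
        raise ValueError("the loop part of a lasso trace must not be empty")
    return all(stuck(state) for state in loop)
```

A channel that transfers even once per loop iteration is therefore not dead, however long it stalls in between.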

slide-30
SLIDE 30
xMAS example

  • Inject two requests in q0
  • Fork creates two copies
  • One pair is sunk

[Figure: xMAS network with queues q0, q1 and q2 carrying req and rsp messages; labels: "requests", "dead channel"]

slide-31
SLIDE 31

General approach for deadlock detection in xMAS networks

  • Define Blocking Equations for all components
      – Equations capture the reason why a component is idle or blocking
  • Build a labelled waiting graph for each queue
      – Labels correspond to the equations
      – Graph captures the topology, i.e., the dependencies between the xMAS components
  • Search for a feasible logically closed subgraph
      – Corresponds to a deadlock situation
      – Feasibility checked using Linear Programming

slide-32
SLIDE 32

General approach for deadlock detection in xMAS networks

  • Define Blocking Equations for all components
      – Equations capture the reason why a component is idle or blocking
  • Build a labelled waiting graph for each queue
      – Labels correspond to the equations
      – Graph captures the topology, i.e., the dependencies between the xMAS components
  • Search for a feasible logically closed subgraph
      – Corresponds to a deadlock situation
      – Feasibility checked using Linear Programming

slide-33
SLIDE 33

Blocking Equations for a join

  • 2 cases
      – output is blocked
      – the other input is idle
  • Block(u) = Idle(v) + Block(w)

[Figure: join with inputs u and v and output w]

slide-34
SLIDE 34

Blocking Equations for a join

  • 2 cases
      – output is blocked
      – the other input is idle
  • Block(u) = Idle(v) + Block(w)
  • We need to know when a channel is idle!

[Figure: join with inputs u and v and output w]

slide-35
SLIDE 35

Idle equations for a fork

  • A fork output is idle if the input is idle or the other output is blocked
  • Idle(w) = Idle(u) + Block(v)

[Figure: fork with input u and outputs v and w]
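The join and fork equations can be read as Boolean functions. The sketch below assumes that "+" in the equations means logical OR, which matches the wording "idle or ... blocked" used for the fork; the function names are hypothetical.

```python
def block_join_input(idle_other_input: bool, block_output: bool) -> bool:
    """Block(u) = Idle(v) + Block(w) for a join with inputs u, v and output w.

    Input u is blocked if the other input v never delivers its half of the
    pair, or if the output w never accepts the joined result.
    """
    return idle_other_input or block_output


def idle_fork_output(idle_input: bool, block_other_output: bool) -> bool:
    """Idle(w) = Idle(u) + Block(v) for a fork with input u and outputs v, w.

    Output w is idle if nothing arrives on the input u, or if the other
    output v is blocked (a fork moves both copies together or none).
    """
    return idle_input or block_other_output
```

The two functions mirror each other: a join makes blocking depend on idleness, and a fork makes idleness depend on blocking, which is why the analysis has to track both properties.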

slide-36
SLIDE 36

General approach for deadlock detection in xMAS networks

  • Define Blocking Equations for all components
      – Equations capture the reason why a component is idle or blocking
  • Build a labelled waiting graph for each queue
      – Labels correspond to the equations
      – Graph captures the topology, i.e., the dependencies between the xMAS components
  • Search for a feasible logically closed subgraph
      – Corresponds to a deadlock situation
      – Feasibility checked using Linear Programming

slide-37
SLIDE 37

Step 2 / labelled dependency graph (1)

  • Start with a message in q1 and visit the join
  • Label: q1.req ≥ 1

[Figure: waiting graph with start node q1 and node join]

slide-38
SLIDE 38

Step 2 / labelled dependency graph (2)

  • Analyse the join according to its Blocking Equation
  • Block(u) = Idle(v) + Block(w)
  • We go forward to the merge and backward to the switch

[Figure: waiting graph with q1 (q1.req ≥ 1), join (channels u, v, w) and a "+" node leading to mrg2 and sw]

slide-39
SLIDE 39

Step 2 / labelled dependency graph (2)

  • Forwards to the switch, then the sink can never be blocked
  • Block(u) = Block(w)
  • We assume fair sinks

[Figure: waiting graph with q1 (q1.req ≥ 1), join, "+" node, mrg2, sw and sink; the sink is labelled false]

slide-40
SLIDE 40

Step 2 / labelled dependency graph (2)

  • Backwards to the switch
  • Idle(u) = Idle(w)

[Figure: waiting graph with q1 (q1.req ≥ 1), join, "+" node, mrg2, sw and sink (false)]

slide-41
SLIDE 41

Step 2 / labelled dependency graph (2)

  • Backwards to the queue
  • Idle(u) = Idle(w) . Empty(q2)
  • Label: q2.rsp = 0
  • Note that we forgot the Block(w') case

[Figure: waiting graph with q1 (q1.req ≥ 1), join, "+" node, mrg2, sw, sink (false) and q2]

slide-42
SLIDE 42

Step 2 / labelled dependency graph (2)

  • Backwards to the merge and branch
  • Idle(w) = Idle(u) . Idle(v)
  • Note: branching is bad for us

[Figure: waiting graph with q1 (q1.req ≥ 1), join, "+" node, mrg2, sw, sink (false), q2 (q2.rsp = 0) and mrg1 with inputs u and v]

slide-43
SLIDE 43

Step 2 / labelled dependency graph (2)

  • Backwards to the merge and branch to the source: idle if no type produced to the fork
  • Idle(u) = Block(v) + Idle(w)

[Figure: waiting graph with q1 (q1.req ≥ 1), join, "+" node, mrg2, sw, sink (false), q2 (q2.rsp = 0), mrg1, a "." node, frk and src2; new label: true]

slide-44
SLIDE 44

Step 2 / labelled dependency graph (2)

  • Backwards to q0 and the source
  • Idle(u) = Idle(w) . Empty(q0)
  • Label: q0.rsp = 0

[Figure: waiting graph with q1 (q1.req ≥ 1), join, "+" node, mrg2, sw, sink (false), q2 (q2.rsp = 0), mrg1, a "." node, frk, src2, q0 and src1; further labels: false, true]

slide-45
SLIDE 45

Step 2 / labelled dependency graph (2)

  • Forwards back to q1 and stop expansion
  • Block(u) = Block(w) . Full(q1)
  • Label: q1 = q1.size

[Figure: complete waiting graph with q1 (q1.req ≥ 1), join, "+" nodes, mrg2, sw, sink (false), q2 (q2.rsp = 0), mrg1, a "." node, frk, src2, q0 (q0.rsp = 0) and src1; further labels: false, true]
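The graph built on the preceding slides mixes "+" nodes (alternatives) and "." nodes (all children required), with leaves that carry labels such as q1.req ≥ 1, true or false. The sketch below is an illustration only, not the authors' algorithm: it assumes the graph has been unrolled into a finite tree (the slides stop expansion when a queue is revisited), and the names `Node` and `closed_label_sets` are hypothetical. It enumerates the label sets obtained by picking one child at each "+" node and all children at each "." node.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import FrozenSet, List


@dataclass
class Node:
    kind: str                      # "leaf", "or" (drawn as "+") or "and" (drawn as ".")
    label: str = ""                # leaf label, e.g. "q1.req >= 1", "true", "false"
    children: List["Node"] = field(default_factory=list)


def closed_label_sets(node: Node) -> List[FrozenSet[str]]:
    """Return one set of leaf labels per way of satisfying `node`."""
    if node.kind == "leaf":
        if node.label == "false":
            return []              # a false leaf rules this branch out
        if node.label == "true":
            return [frozenset()]   # a true leaf adds no constraint
        return [frozenset([node.label])]
    child_sets = [closed_label_sets(child) for child in node.children]
    if node.kind == "or":
        # Any single child is enough: concatenate the alternatives.
        return [labels for sets in child_sets for labels in sets]
    if node.kind == "and":
        # Every child is needed: merge one alternative from each child.
        return [frozenset().union(*combo) for combo in product(*child_sets)]
    raise ValueError(f"unknown node kind: {node.kind!r}")
```

Each returned set is a candidate deadlock configuration whose labels still have to be checked for feasibility together.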

slide-46
SLIDE 46

General approach for deadlock detection in xMAS networks

  • Define Blocking Equations for all components
      – Equations capture the reason why a component is idle or blocking
  • Build a labelled waiting graph for each queue
      – Labels correspond to the equations
      – Graph captures the topology, i.e., the dependencies between the xMAS components
  • Search for a feasible logically closed subgraph
      – Corresponds to a deadlock situation
      – Feasibility checked using Linear Programming

slide-47
SLIDE 47

Step 2 / logically closed subgraph 1

[Figure: labelled waiting graph over q1, join, mrg2, sink, sw, q2, mrg1, frk, src1, src2 and q0, with "+" and "." nodes and labels q1.req ≥ 1, q2.rsp = 0, q0.rsp = 0, q1 = q1.size, true and false]

slide-48
SLIDE 48

Step 2 / logically closed subgraph 1

[Figure: same labelled waiting graph, next animation step]

slide-49
SLIDE 49

Step 2 / logically closed subgraph 1

  • Not feasible

[Figure: same labelled waiting graph, next animation step]

slide-50
SLIDE 50

Step 2 / logically closed subgraph 2

[Figure: labelled waiting graph over q1, join, mrg2, sink, sw, q2, mrg1, frk, src1, src2 and q0, with "+" and "." nodes and labels q1.req ≥ 1, q2.rsp = 0, q0.rsp = 0, q1 = q1.size, true and false]

slide-51
SLIDE 51

Step 2 / logically closed subgraph 2

[Figure: same labelled waiting graph, next animation step]
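The feasibility step asks whether some queue contents satisfy all labels of a subgraph at once. The slides use Linear Programming for this; the sketch below swaps in plain exhaustive enumeration, which is only practical for tiny queue sizes but needs no solver. The queue sizes are hypothetical, and reading the label "q1 = q1.size" as "q1 is full" is an assumption.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence

State = Dict[str, Dict[str, int]]          # queue name -> {"req": n, "rsp": m}
Constraint = Callable[[State], bool]


def feasible_assignment(sizes: Dict[str, int],
                        constraints: Sequence[Constraint]) -> Optional[State]:
    """Return queue contents satisfying every constraint, or None if none exist."""
    queues = sorted(sizes)
    # A queue holds some req and some rsp packets, together at most its size.
    options = [
        [(req, rsp) for req in range(sizes[q] + 1) for rsp in range(sizes[q] + 1 - req)]
        for q in queues
    ]
    for combo in product(*options):
        state = {q: {"req": req, "rsp": rsp} for q, (req, rsp) in zip(queues, combo)}
        if all(check(state) for check in constraints):
            return state
    return None


SIZES = {"q0": 2, "q1": 2, "q2": 2}        # hypothetical queue sizes

LABELS = [
    lambda s: s["q1"]["req"] >= 1,                               # q1.req >= 1
    lambda s: s["q2"]["rsp"] == 0,                               # q2.rsp = 0
    lambda s: s["q0"]["rsp"] == 0,                               # q0.rsp = 0
    lambda s: s["q1"]["req"] + s["q1"]["rsp"] == SIZES["q1"],    # q1 is full
]

witness = feasible_assignment(SIZES, LABELS)
```

A subgraph that also contains a leaf labelled false corresponds to appending `lambda s: False` to the list, in which case no assignment exists and the function returns None.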

slide-52
SLIDE 52

Experimental Results

[Figure: xMAS model of a mesh node with ports L, N, S, E and W; routing conditions compare the header fields h.x and h.y with X and Y; a request (dst,src,req) is answered by a response (src,_,rsp)]

  • With deadlocks: a 14x14 mesh with 3724 components in 6.05 seconds
  • Without deadlocks: a 14x14 mesh with 3724 components in 1.31 seconds

slide-53
SLIDE 53

Experimental Results

  • With deadlocks: a 28 ring with 477 components in 0.5 seconds
  • Without deadlocks: a 28 ring with 477 components in 6.6 seconds

slide-54
SLIDE 54

Outline

  • Intel's micro-architectural description language
      – xMAS definition
      – examples
  • Deadlock verification for xMAS
      – definition of deadlocks
      – labelled dependency graph
      – feasible logically closed subgraph
  • Conclusion and future work
slide-55
SLIDE 55

Conclusion and future work

  • Tool to detect message-dependent deadlocks
      – Expressive language for routing, protocol, injection, etc.
      – Intricate deadlocks
      – Very efficient due to equations
      – Necessary and sufficient for structural deadlocks
      – Counterexamples
  • Future work:
      – Still needs to be formally proven
      – Composition/Hierarchy: check sub-networks first and then compose
slide-56
SLIDE 56

Thanks !

slide-57
SLIDE 57

Deadlock example 3

  • Channels with three signals
      – data, input ready (irdy), target ready (trdy)
  • Transfer cycle
      – both irdy and trdy are "true"
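The channel handshake can be illustrated with a short Python sketch. It assumes each clock cycle is summarised by a pair of Booleans (irdy, trdy); the helper names are hypothetical.

```python
from typing import Iterable, Tuple


def is_transfer(irdy: bool, trdy: bool) -> bool:
    """A data item moves across the channel only when both sides are ready."""
    return irdy and trdy


def count_transfers(cycles: Iterable[Tuple[bool, bool]]) -> int:
    """Count the transfer cycles in a sequence of (irdy, trdy) samples."""
    return sum(1 for irdy, trdy in cycles if is_transfer(irdy, trdy))
```

Cycles with irdy true and trdy false are the ones in which the sender waits, which is the situation the dead-channel definition is concerned with.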

slide-58
SLIDE 58

Networks-on-Chips: Example 1

Core distribution:

  • Masters on the right, slaves on the left

[Figure: Master and Slave cores]
slide-59
SLIDE 59

Networks-on-Chips: Example 1

  • The system is deadlock-free!

[Figure: Master and Slave cores (legend: Response, Request)]