Agnostic Reconfiguration In A Disconnected Network Environment



slide-1
SLIDE 1

ARIADNE Agnostic Reconfiguration In A Disconnected Network Environment

Konstantinos Aisopos (Princeton, MIT), Andrew DeOrio (Michigan), Li-Shiuan Peh (MIT), Valeria Bertacco (Michigan)

slide-2
SLIDE 2

What is “reconfiguration”?

Silicon technologies move into the nanometer regime …transistors become unreliable

for architects: permanent faults

Our focus in this talk: Network-on-Chip

  • cannot resend; need to re-route around the fault
  • reconfiguration: “the process of replacing the routing algorithm”

[Figure: network with source S and destination D]

Shekhar Borkar (Intel Fellow): In future chips of 100 billion transistors, 10% of transistors will eventually fail over the lifetime of the chip

[Figure: tile with blocks labeled P, P$, S$, NIC, and R]

slide-3
SLIDE 3

Why is reconfiguration challenging?

  • XY routing

[Figure: mesh with source S and destination D; the XY route travels along X first, then along Y]

slide-4
SLIDE 4

Why is reconfiguration challenging?

  • XY routing

[Figure: mesh with source S and destination D]

Agnostic Reconfiguration algorithm In A Disconnected Network Environment

slide-5
SLIDE 5

Why is reconfiguration challenging?

  • XY routing

[Figure: mesh with source S and destination D]

Agnostic Reconfiguration algorithm In A Disconnected Network Environment

slide-6
SLIDE 6

Outline

  • Motivation
  • Ariadne

    – Baseline
    – Deadlocks
    – Synchronization

  • Evaluation

    – Overhead
    – Performance
    – Reliability

  • Conclusions
slide-7
SLIDE 7

How will S find a path to D?

[Figure: faulty mesh with source S and destination D; the path between them is unknown (?)]

slide-8
SLIDE 8

How will S find a path to D?

[Figure: faulty mesh with source S and destination D; four routers show a routing table (RT) whose entry for D reads E, S, W, and N respectively]

slide-9
SLIDE 9

How will S find a path to D?

[Figure: faulty mesh with source S and destination D; six routers show a routing table (RT) whose entry for D reads E,S; S; W,N; E,N; W,S; and N respectively]

slide-10
SLIDE 10

How will S find a path to D?

[Figure: faulty mesh with source S and destination D]

slide-11
SLIDE 11

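How will S find a path to D?

[Figure: faulty mesh with destination D and source S; three routers show a routing table (RT) whose entry for D reads W, W, and S respectively]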

slide-12
SLIDE 12
ARIADNE: baseline

  • Upon a fault that changes the topology…

    – a node can let everyone know how it can be reached with a single broadcast
    – N nodes can let everyone know how they can be reached with N broadcasts
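
The baseline idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not the hardware design: the mesh model, the function names (`mesh_links`, `broadcast`, `reconfigure`, `route`), and the choice to keep a single port per table entry are all hypothetical. Each broadcast floods the surviving links, and every node records the port on which the flag first arrived as its route back to the broadcaster. Deadlock avoidance is deliberately left out here.

```python
from collections import deque

def mesh_links(width, height, faulty=()):
    """Return {node: {port: neighbor}} for a 2D mesh, skipping faulty links."""
    dead = {frozenset(link) for link in faulty}
    links = {(x, y): {} for x in range(width) for y in range(height)}
    steps = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
    for (x, y), ports in links.items():
        for port, (dx, dy) in steps.items():
            nbr = (x + dx, y + dy)
            if nbr in links and frozenset(((x, y), nbr)) not in dead:
                ports[port] = nbr
    return links

def broadcast(links, origin, tables):
    """Flood from `origin`; each node stores the port the flag first arrived on."""
    seen = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for nbr in links[node].values():
            if nbr not in seen:
                seen.add(nbr)
                # Port at `nbr` that leads back toward `node` (and so toward `origin`).
                back = next(p for p, n in links[nbr].items() if n == node)
                tables[nbr][origin] = back
                queue.append(nbr)

def reconfigure(links):
    """N nodes, N broadcasts: afterwards every node knows a port for every other node."""
    tables = {node: {} for node in links}
    for origin in sorted(links):
        broadcast(links, origin, tables)
    return tables

def route(links, tables, src, dst):
    """Follow the routing tables hop by hop from `src` to `dst`."""
    path = [src]
    while path[-1] != dst:
        port = tables[path[-1]][dst]
        path.append(links[path[-1]][port])
    return path
```

For example, `route(links, reconfigure(links), (0, 0), (3, 0))` on a 4x4 mesh with two broken horizontal links returns a detour that uses only surviving links.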

slide-13
SLIDE 13
ARIADNE: baseline

  • Upon a fault that changes the topology…

    – Every node broadcasts “in-turn” to let others know how it can be reached

[Figure: broadcast order starting at the fault detector: node 9 broadcasts 1st, node 10 2nd, node 11 3rd, …, node 8 last]

Every node has a statically assigned node ID

slide-14
SLIDE 14
ARIADNE: baseline

  • Upon a fault that changes the topology…

    – Every node broadcasts “in-turn” to let others know how it can be reached

[Figure: broadcast order starting at the fault detector: node 9 broadcasts 1st, node 10 2nd, node 11 3rd, …, node 8 last]

  • Issues:

    – deadlock avoidance
    – synchronization (when to broadcast, multiple detectors)

slide-15
SLIDE 15

ARIADNE: deadlocks

[Figure: faulty mesh with source S and destination D]

slide-16
SLIDE 16

ARIADNE: deadlocks

[Figure: faulty mesh with several source (S) and destination (D) pairs]

slide-17
SLIDE 17

ARIADNE: deadlocks

first bcast ONLY: nodes are assigned ranks

  • bcaster: “root” (r)
  • immediate neighbors: rank 1
  • 2-hop neighbors: rank 2
  • 3-hop neighbors: rank 3

up*/down*: disable routes where rank goes higher, then lower

  • unique ordering: among nodes with same rank, arbitrarily select a higher one
  • in every cycle: 1 node will have higher rank than its neighbors, breaking the circular route

slide-18
SLIDE 18

ARIADNE: deadlocks

first bcast ONLY: nodes are assigned ranks

  • bcaster: “root” (r)
  • immediate neighbors: rank 1
  • 2-hop neighbors: rank 2
  • 3-hop neighbors: rank 3

up*/down*: disable routes where rank goes higher, then lower

  • unique ordering: among nodes with same rank, arbitrarily select a higher one
  • in every cycle: 1 node will have higher rank than its neighbors, breaking the circular route

slide-19
SLIDE 19

ARIADNE: deadlocks

first bcast ONLY: nodes are assigned ranks

  • bcaster: “root” (r)
  • immediate neighbors: rank 1
  • 2-hop neighbors: rank 2
  • 3-hop neighbors: rank 3

up*/down*: disable routes where rank goes higher, then lower

  • unique ordering: among nodes with same rank, arbitrarily select a higher one
  • in every cycle: 1 node will have higher rank than its neighbors, breaking the circular route
  • connectivity: can reach any node via the root

[Figure: ranked mesh with source (S) and destination (D) pairs]
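
The ranking and the turn rule can be written down compactly. The sketch below is an assumption-laden illustration: it gives the root rank 0, uses the static node ID as the arbitrary tie-breaker among nodes of equal rank, and the names `assign_keys`, `turn_enabled`, and `cycle_is_broken` are made up for this example. A turn a -> b -> c is treated as disabled when b is ordered above both a and c, which is how the "higher, then lower" rule is read here.

```python
from collections import deque

def assign_keys(adj, root):
    """BFS from the root: key = (hop count from root, node ID).

    The node ID breaks ties, so no two nodes share a key (assumed tie-breaker).
    """
    rank = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in rank:
                rank[nbr] = rank[node] + 1
                queue.append(nbr)
    return {node: (r, node) for node, r in rank.items()}

def turn_enabled(key, a, b, c):
    """Turn a -> b -> c is disabled when the key rises into b and then falls."""
    return not (key[b] > key[a] and key[b] > key[c])

def cycle_is_broken(key, cycle):
    """True if at least one turn around `cycle` (a list of nodes) is disabled."""
    n = len(cycle)
    return any(
        not turn_enabled(key, cycle[i - 1], cycle[i], cycle[(i + 1) % n])
        for i in range(n)
    )
```

Because the keys are unique, every cycle contains exactly one node with the largest key, and the turn through that node is disabled in both directions. That is the sense in which one node per cycle breaks the circular route.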

slide-20
SLIDE 20
ARIADNE: deadlocks

  • Upon a fault that changes the topology…

    – Every node broadcasts “in-turn” to let others know how it can be reached

  • Issues:

    – deadlock avoidance
    – synchronization

slide-21
SLIDE 21
ARIADNE: deadlocks

  • Upon a fault that changes the topology…

    – Every node broadcasts “in-turn” to let others know how it can be reached

RULE:

  (i) first broadcast ranks nodes
  (ii) remaining broadcasts spread only via enabled turns

  • Issues:

    – synchronization
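
A minimal sketch of rule (ii), assuming the same ranking as rule (i) with the node ID as tie-breaker: the flood below only crosses a node when the turn it makes there is enabled. It tracks whole paths instead of per-router tables, because the slides do not say how table entries are stored; `restricted_broadcast` and the other names are hypothetical.

```python
from collections import deque

def assign_keys(adj, root):
    """Rule (i): BFS from the root; key = (hop count, node ID) orders all nodes."""
    rank = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in rank:
                rank[nbr] = rank[node] + 1
                queue.append(nbr)
    return {node: (r, node) for node, r in rank.items()}

def turn_enabled(key, a, b, c):
    """A turn is disabled when b is ordered above both of its path neighbors."""
    return not (key[b] > key[a] and key[b] > key[c])

def restricted_broadcast(adj, key, origin):
    """Rule (ii): flood from `origin` via enabled turns; return one path per node.

    Search states are (previous node, current node) because whether a flag may
    be forwarded depends on the link it arrived on.
    """
    start = (None, origin)
    parent = {start: None}
    first_state = {origin: start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        prev, node = state
        for nbr in adj[node]:
            nxt = (node, nbr)
            if nbr == prev or nxt in parent:
                continue
            if prev is not None and not turn_enabled(key, prev, node, nbr):
                continue
            parent[nxt] = state
            first_state.setdefault(nbr, nxt)
            queue.append(nxt)
    paths = {}
    for node, state in first_state.items():
        hops = []
        while state is not None:
            hops.append(state[1])
            state = parent[state]
        paths[node] = hops  # ordered from `node` to `origin`, the packet's direction
    return paths
```

The turn test is symmetric, so a path that the flag may travel from the broadcaster is also a legal packet path in the opposite direction. Every node stays reachable because a flag can always descend in key order to the root and then ascend to its target without ever passing through a local maximum.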

slide-22
SLIDE 22
ARIADNE: synchronization

  • How do I know completion of previous broadcast? Can broadcasts overlap?
  • How does the recipient of a flag know the broadcasting node?

slide-23
SLIDE 23

ARIADNE: synchronization

Solution: Atomic Broadcasts

  • Nodes utilize the cycle count as a global reference point
  • Each node is assigned a unique broadcast slot from the “global” cycle counter

slide-24
SLIDE 24

ARIADNE: synchronization

[Figure: 16-node network and a timeline of the cycle count (same for all nodes), split into two fields: bcast node (log(16) bits) | bcast cycle (log(16) bits)]

  • the bcast cycle field covers the longest (in hops) broadcast
  • node 5 waits for bcast node = 5, bcast cycle = 0
  • 5 | 0: 5 initiates bcast
  • 5 | 15: 5’s bcast completes
  • 6 | 0: 6 initiates bcast
  • 6 | 15: 6’s bcast completes
  • …
  • 4 | 15: 4’s bcast completes

reconfiguration completes in (16)^2 = (number of nodes)^2 cycles
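
The slot arithmetic can be illustrated with a short sketch. It assumes nodes are numbered 0 to n-1 and that each broadcast window is exactly n cycles long, which matches the 16-node example; the function names and the starting cycle used in the usage note are invented for illustration.

```python
def slot(cycle, n):
    """Split the shared cycle count into its (bcast node, bcast cycle) fields."""
    return (cycle // n) % n, cycle % n

def next_slot_start(cycle, node, n):
    """First cycle >= `cycle` whose fields read (node, 0)."""
    period = n * n
    return cycle + (node * n - cycle) % period

def schedule(first, start_cycle, n):
    """Broadcast windows as (node, first cycle, last cycle), beginning with `first`."""
    t0 = next_slot_start(start_cycle, first, n)
    return [((first + i) % n, t0 + i * n, t0 + i * n + n - 1) for i in range(n)]
```

With n = 16 and node 5 ready at (hypothetical) cycle 70, `schedule(5, 70, 16)` starts node 5 at cycle 80, where the fields read 5 | 0, and ends with node 4 at cycle 335, where they read 4 | 15. The windows span 256 = 16^2 cycles in total.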

slide-25
SLIDE 25

ARIADNE: synchronization

[Figure: the same cycle-count timeline (bcast node | bcast cycle) with two nodes, 5 and 8, waiting for their broadcast slots]

  • node 5 waits for 5 | 0, then initiates its bcast; 5’s bcast completes at 5 | 15
  • node 8 waits for 8 | 0; meanwhile 5’s bcast spreads (1st hop at 5 | 1, 2nd hop at 5 | 2)
  • 8 resigns from becoming the root node

(!) we need to reconfigure once even for multiple faults
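
One way to picture the resignation step is the sketch below. It is a simplification under stated assumptions: all detectors notice their faults by the same cycle, nodes are numbered 0 to n-1, each slot is n cycles long, and the first broadcast reaches every other detector before that detector's own slot opens. The function name `elect_root` is hypothetical.

```python
def elect_root(detectors, detect_cycle, n):
    """Return (root, resigned): the detector whose slot opens first becomes root.

    Every other detector sees the root's flag while still waiting for its own
    slot and gives up its claim, so only one reconfiguration runs.
    """
    period = n * n

    def wait(node):
        # Cycles until the counter fields next read (node, 0).
        return (node * n - detect_cycle) % period

    root = min(detectors, key=wait)
    return root, sorted(d for d in detectors if d != root)
```

For instance, `elect_root([5, 8], 70, 16)` picks node 5 as root and lists node 8 as resigned, while detection at cycle 100 (after node 5's slot has passed) would make node 8 the root instead.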

slide-26
SLIDE 26

Outline

  • Motivation
  • Ariadne

    – Baseline
    – Deadlocks
    – Synchronization

  • Evaluation

    – Overhead
    – Performance
    – Reliability

  • Conclusions
slide-27
SLIDE 27
Evaluation: Overhead

  • On-chip routing algorithms for irregular topologies

    – Immunet (V. Puente, ISCA’04): reserves an escape VC for deadlock freedom (routes deterministically in a ring)
    – Vicis routing algo (D. Fick, DATE’09): exceptions to turn model to apply it to an arbitrary topology
    – ARIADNE

  • Comparison criteria: overhead, performance, reliability

Synthesized a baseline 5-stage pipelined router (5 ports, 2 VCs, 5-flit buffer/VC) with Synopsys Design Compiler (IBM 130nm target library).

router area (mm²): baseline=2.708, Ariadne=2.761, Vicis=2.748, Immunet=2.870
area overhead over baseline: Vicis 1.5%, Ariadne 2.0%, Immunet 6.0%
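
The three percentages follow directly from the synthesized areas. The short check below recomputes them from the quoted figures; the variable and function names are illustrative only.

```python
# Router areas in mm^2, as quoted on the slide.
area_mm2 = {"baseline": 2.708, "Ariadne": 2.761, "Vicis": 2.748, "Immunet": 2.870}

def overhead_pct(design):
    """Area overhead of `design` relative to the baseline router, in percent."""
    base = area_mm2["baseline"]
    return 100.0 * (area_mm2[design] - base) / base

for design in ("Vicis", "Ariadne", "Immunet"):
    print(f"{design}: {overhead_pct(design):.1f}%")
```

Rounded to one decimal place, the results are 1.5% for Vicis, 2.0% for Ariadne, and 6.0% for Immunet.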

slide-28
SLIDE 28
Evaluation: Performance

  • Experimental Setup: Garnet + GEMS

Network Architecture (GARNET)

    network topology:     8x8 2D mesh
    memory controllers:   4 at chip corners
    channel width:        64 bits
    router architecture:  5-stage pipeline
    router ports, VCs:    5, 2 (private)
    router buffers/port:  5-flit for each VC

System Configuration (GEMS)

    processors:  In-order SPARC cores
    coherence:   MOESI protocol
    L1 caching:  private, unified, 32KB/node; ways: 2; latency: 3 cycles
    L2 caching:  shared, distributed, 1MB/node; ways: 16; latency: 15 cycles

[Plot: Average Latency (cycles) vs. Injected Faults for Ariadne, Vicis, and Immunet (averages over 100 topologies, 10 PARSEC benchmarks); lower is better. Annotations on the plot: “deadlocks” and “traffic routing in a ring”]

slide-29
SLIDE 29
Evaluation: Performance + Reliability

  • On-chip routing algorithms for irregular topologies

    – Immunet (V. Puente, ISCA’04): reserves an escape VC for deadlock freedom (routes deterministically in a ring)
    – Vicis routing algo (D. Fick, DATE’09): exceptions to turn model to apply it to an arbitrary topology
    – ARIADNE

  • Comparison criteria: overhead, performance, reliability
  • area overhead over baseline: Vicis 1.5%, Ariadne 2.0%, Immunet 6.0%

slide-30
SLIDE 30

Outline

  • Motivation
  • Ariadne

    – Baseline
    – Deadlocks
    – Synchronization

  • Evaluation

    – Overhead
    – Performance
    – Reliability

  • Conclusions
slide-31
SLIDE 31

Conclusions

We have presented Ariadne, which:

  • is a reconfiguration algorithm that provides deadlock-free routing paths in irregular network topologies that result from faulty links
  • is implemented in a fully distributed mode, resulting in simple hardware and low complexity
  • enables a trade-off between performance and reliable functionality on unreliable silicon

slide-32
SLIDE 32

Thank You!

Questions?

The Greek legend of Princess Ariadne

“Ariadne (Αριάδνη), was the daughter of King Minos of Crete. Minos attacked Athens after his son was killed there. The Athenians asked for terms, and were required to sacrifice seven young men and seven maidens every nine years to the Minotaur, a monster with the head of a bull on the body of a man. One year, the sacrificial party included Theseus, a young man who volunteered to come and kill the Minotaur. Ariadne fell in love at first sight, and helped him by giving him a ball of red fleece thread that she was spinning, to find his way out of the Minotaur's labyrinth.”

…similarly to Princess Ariadne, our Ariadne algorithm helps packets find their way in the labyrinth-like topology of a faulty network.

[source: wikipedia]