ARIADNE: Agnostic Reconfiguration In A Disconnected Network Environment

  1. ARIADNE: Agnostic Reconfiguration In A Disconnected Network Environment. Konstantinos Aisopos (Princeton, MIT), Andrew DeOrio (Michigan), Li-Shiuan Peh (MIT), Valeria Bertacco (Michigan)

  2. What is “reconfiguration”? • Silicon technologies move into the nanometer regime … transistors become unreliable • In future chips of 100 billion transistors, 10% of transistors will eventually fail over the lifetime of the chip (Shekhar Borkar, Intel Fellow) • For architects: permanent faults • Our focus in this talk: Network-on-Chip; cannot resend, need to re-route around the fault • Reconfiguration: “the process of replacing the routing algorithm” (figure labels: P, P$, S$, NIC, R, with source S and destination D)

  3. Why is reconfiguration challenging? • XY routing (figure: route from S to D, labeled X, then Y)

  4. Why is reconfiguration challenging? • XY routing • Agnostic Reconfiguration algorithm In A Disconnected Network Environment (figure: S and D)

  5. Why is reconfiguration challenging? • XY routing • Agnostic Reconfiguration algorithm In A Disconnected Network Environment (figure: S and D)

  6. Outline • Motivation • Ariadne – Baseline – Deadlocks – Synchronization • Evaluation – Overhead – Performance – Reliability • Conclusions

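  7. How will S find a path to D? (figure: source S and destination D)

  8. How will S find a path to D? (figure: routing tables (RT) gain an entry for D that names an output direction, e.g. D: S, D: E, D: W, D: N)

  9. How will S find a path to D? (figure: routing tables (RT) whose entry for D may list more than one direction, e.g. D: E,S; D: W,S; D: W,N; D: E,N; D: N)

  10. How will S find a path to D? (figure: source S and destination D)

  11. How will S find a path to D? (figure: routing tables (RT) with entries D: W, D: W, D: S leading from S to D)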

  12. ARIADNE: baseline • Upon a fault that changes the topology… – a node can let everyone know how it can be reached with a single broadcast – N nodes can let everyone know how they can be reached with N broadcasts
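The single-broadcast idea on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' hardware: it assumes a 2D mesh with `(x, y)` coordinates, models the broadcast as a breadth-first flood, and has each node record the direction from which the broadcast first arrived as its way back to the broadcaster. The names `broadcast`, `build_routing_tables` and `faulty_links` are hypothetical, and the sketch ignores deadlock avoidance and synchronization, which the later slides address.

```python
from collections import deque

# Direction names and their (dx, dy) offsets on a 2D mesh; y grows southward.
DIRS = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}


def broadcast(width, height, faulty_links, origin):
    """Flood a broadcast from `origin`; return {node: direction toward origin}."""
    toward = {}
    visited = {origin}
    queue = deque([origin])
    while queue:
        x, y = queue.popleft()
        for name, (dx, dy) in DIRS.items():
            nxt = (x + dx, y + dy)
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue  # off the edge of the mesh
            if frozenset(((x, y), nxt)) in faulty_links:
                continue  # the broadcast cannot cross a faulty link
            if nxt in visited:
                continue  # only the first arrival is recorded
            visited.add(nxt)
            # The broadcast travelled in direction `name`, so the way back
            # to the broadcaster is the opposite direction.
            toward[nxt] = OPPOSITE[name]
            queue.append(nxt)
    return toward


def build_routing_tables(width, height, faulty_links):
    """N nodes, N broadcasts: every node learns a direction toward every other."""
    tables = {(x, y): {} for x in range(width) for y in range(height)}
    for dest in tables:
        reached = broadcast(width, height, faulty_links, dest)
        for node, direction in reached.items():
            tables[node][dest] = direction
    return tables
```

For a 3x3 mesh whose link between `(0, 0)` and `(1, 0)` is faulty, `build_routing_tables(3, 3, {frozenset({(0, 0), (1, 0)})})` gives node `(0, 0)` the entry `"S"` for destination `(1, 0)`, so the route goes around the fault.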

  13. ARIADNE: baseline • Upon a fault that changes the topology… – Every node broadcasts “in-turn” to let others know how it can be reached • Every node has a statically assigned node ID (figure: node IDs …, 8, 9, 10, 11 with labels 1st, 2nd, 3rd, last and “fault detector”)

  14. ARIADNE: baseline • Upon a fault that changes the topology… – Every node broadcasts “in-turn” to let others know how it can be reached (figure: node IDs …, 8, 9, 10, 11 with labels 1st, 2nd, 3rd, last and “fault detector”) • Issues: – deadlock avoidance – synchronization (when to broadcast, multiple detectors)

  15. ARIADNE: deadlocks (figure: source S and destination D)

  16. ARIADNE: deadlocks (figure: three source/destination pairs, each labeled S and D)

  17. ARIADNE: deadlocks • up*/down*: disable routes where rank goes higher, then lower • first bcast ONLY: nodes are assigned ranks – bcaster: 0 (“root”) – immediate neighbors: 1 – 2-hop neighbors: 2 – 3-hop neighbors: 3 • in every circle: 1 node will have higher rank than its neighbors, breaking the circular route • unique ordering: among nodes with same rank, arbitrarily select a higher one

  19. ARIADNE: deadlocks • up*/down*: disable routes where rank goes higher, then lower • first bcast ONLY: nodes are assigned ranks – bcaster: 0 (“root”) – immediate neighbors: 1 – 2-hop neighbors: 2 – 3-hop neighbors: 3 • in every circle: 1 node will have higher rank than its neighbors, breaking the circular route • unique ordering: among nodes with same rank, arbitrarily select a higher one • connectivity: S can reach any node via the root (figure: S and D)
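The ranking and turn rule on the deadlock slides can be illustrated with a short sketch. It assumes the network is given as an adjacency dictionary, takes rank to be the hop distance from the first broadcaster, and breaks ties between equal ranks by node ID; the tie-break choice and the names `assign_ranks`, `order_key` and `turn_enabled` are assumptions made for illustration, not details taken from the slides.

```python
from collections import deque


def assign_ranks(adjacency, root):
    """Rank = hop distance from the root: root 0, neighbors 1, 2-hop 2, ..."""
    rank = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in rank:
                rank[nbr] = rank[node] + 1
                queue.append(nbr)
    return rank


def order_key(rank, node):
    # Unique ordering: equal ranks are separated by node ID
    # (an arbitrary but fixed choice, assumed here).
    return (rank[node], node)


def turn_enabled(rank, prev, node, nxt):
    """Disable the turn prev -> node -> nxt when the order goes higher, then lower."""
    went_higher = order_key(rank, node) > order_key(rank, prev)
    goes_lower = order_key(rank, nxt) < order_key(rank, node)
    return not (went_higher and goes_lower)
```

On a four-node ring `0-1-3-2-0` rooted at node 0, node 3 gets rank 2 and both of its neighbors get rank 1, so the turn `1 -> 3 -> 2` is disabled and the cycle is broken, while `3 -> 1 -> 0` and `0 -> 1 -> 3` stay enabled and every node can still be reached via the root.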

  20. ARIADNE: deadlocks • Upon a fault that changes the topology… – Every node broadcasts “in-turn” to let others know how it can be reached • Issues: – deadlock avoidance – synchronization

  21. ARIADNE: deadlocks • Upon a fault that changes the topology… – Every node broadcasts “in-turn” to let others know how it can be reached • RULE: (i) first broadcast ranks nodes (ii) remaining broadcasts spread only via enabled turns • Issues: – synchronization

  22. ARIADNE: synchronization • How does a node know that the previous broadcast has completed? Can broadcasts overlap? • How does the recipient of a flag know the broadcasting node?

  23. ARIADNE: synchronization Solution: Atomic Broadcasts • Nodes utilize the cycle count as a global reference point • Each node is assigned a unique broadcast slot from the “global” cycle counter

  24. ARIADNE: synchronization • cycle count (same for all nodes), read as two log(16)-bit fields in the 16-node example (nodes 0 to 15): bcast node and cycle • waits for 5: 0100 1111 (bcast node 4, cycle 15) • 5 initiates bcast: 0101 0000 (bcast node 5, cycle 0) • 5’s bcast completes: 0101 1111 (bcast node 5, cycle 15) • 6 initiates bcast: 0110 0000 (bcast node 6, cycle 0) • 6’s bcast completes: 0110 1111 (bcast node 6, cycle 15) • … • 4’s bcast completes: 0100 1111 (bcast node 4, cycle 15) • each slot covers the longest (in hops) broadcast • reconfiguration completes in (16)^2 = (number of nodes)^2 cycles
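The slot arithmetic in this example can be written out directly. The sketch below assumes, as in the 16-node example, that a slot lasts as many cycles as there are nodes and that the low bits of the shared cycle count are read as (broadcasting node, cycle within slot); the function names are hypothetical.

```python
def decode_cycle(cycle_count, num_nodes=16):
    """Split the global cycle count into (broadcasting node, cycle within slot)."""
    slot_length = num_nodes  # assumed: one slot lasts num_nodes cycles
    position = cycle_count % (num_nodes * slot_length)
    return position // slot_length, position % slot_length


def cycles_until_slot(cycle_count, node_id, num_nodes=16):
    """Cycles a node must wait until its own broadcast slot begins."""
    period = num_nodes * num_nodes  # a full round: (number of nodes)^2 cycles
    slot_start = node_id * num_nodes
    return (slot_start - cycle_count) % period
```

With these definitions, `decode_cycle(0b01010000)` is `(5, 0)`, the cycle at which node 5 initiates its broadcast, and `decode_cycle(0b01001111)` is `(4, 15)`, the last cycle of node 4's slot. A node that starts waiting at that point has `cycles_until_slot(0b01001111, 5) == 1` cycle to go before slot 5 opens.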

  25. ARIADNE: synchronization • cycle count (same for all nodes), read as bcast node and cycle • two detectors: 5 waits for 5 (0100 1111: bcast node 4, cycle 15) and 8 waits for 8 • 5 initiates bcast: 0101 0000 (bcast node 5, cycle 0); the broadcast spreads hop by hop (1st hop, 2nd hop, …) • at bcast node 5, cycle 2, node 8 resigns from becoming the root node (!) • 5’s bcast completes: 0101 1111 (bcast node 5, cycle 15) • we need to reconfigure once even for multiple faults

  26. Outline • Motivation • Ariadne – Baseline – Deadlocks – Synchronization • Evaluation – Overhead – Performance – Reliability • Conclusions

  27. Evaluation: Overhead • On-chip routing algorithms for irregular topologies: – Vicis routing algo (D. Fick, DATE’09): exceptions to turn model to apply it to an arbitrary topology – Immunet (V. Puente, ISCA’04): reserves an escape VC for deadlock freedom (routes deterministically in a ring) – ARIADNE • Comparison rows: overhead, performance, reliability • Overhead: Immunet 6.0%, ARIADNE 2.0%, Vicis 1.5% • Synthesized a baseline 5-stage pipelined router (5 ports, 2 VCs, 5-flit buffer/VC) with Synopsys Design Compiler (IBM 130nm target library) • Router area (mm^2): baseline=2.708, Ariadne=2.761, Vicis=2.748, Immunet=2.870
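The overhead percentages on this slide follow from the quoted router areas. The short calculation below reproduces them, assuming overhead is defined as the area increase relative to the baseline router; the variable and function names are illustrative.

```python
# Router areas in mm^2, as quoted on the slide.
areas_mm2 = {"baseline": 2.708, "Ariadne": 2.761, "Vicis": 2.748, "Immunet": 2.870}


def area_overhead_percent(area, baseline):
    """Area increase over the baseline router, in percent."""
    return 100.0 * (area - baseline) / baseline


overheads = {
    name: round(area_overhead_percent(area, areas_mm2["baseline"]), 1)
    for name, area in areas_mm2.items()
    if name != "baseline"
}
```

Rounded to one decimal place this gives 2.0% for Ariadne, 1.5% for Vicis and 6.0% for Immunet, which is how the three percentages on the slide map onto the three designs.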

  28. Evaluation: Performance • Experimental Setup: 10 PARSEC benchmarks, Garnet + GEMS, average over 100 topologies • System Configuration (GEMS): – processors: In-order SPARC cores – coherence: MOESI protocol – L1 caching: private unified 32KB/node, ways: 2, latency: 3 cycles – L2 caching: shared distributed 1MB/node, ways: 16, latency: 15 cycles • Network Architecture (GARNET): – network topology: 8x8 2D mesh – memory controllers: 4 at chip corners – channel width: 64 bits – router architecture: 5-stage pipeline – router ports, VCs: 5, 2 (private) – router buffers/port: 5-flit for each VC (figure: Average Latency (cycles), lower is better, vs. Injected Faults from 0 to 100, with curves for Ariadne (average), Vicis (average) and Immunet (average); annotations: “traffic deadlocks”, “routing in a ring”)

  29. Evaluation: Performance + Reliability • On-chip routing algorithms for irregular topologies: – Vicis routing algo (D. Fick, DATE’09): exceptions to turn model to apply it to an arbitrary topology – Immunet (V. Puente, ISCA’04): reserves an escape VC for deadlock freedom (routes deterministically in a ring) – ARIADNE • Comparison rows: overhead, performance, reliability • Overhead: Immunet 6.0%, ARIADNE 2.0%, Vicis 1.5%

  30. Outline • Motivation • Ariadne – Baseline – Deadlocks – Synchronization • Evaluation – Overhead – Performance – Reliability • Conclusions

  31. Conclusions We have presented Ariadne, a reconfiguration algorithm that: • provides deadlock-free routing paths in irregular network topologies that result from faulty links • is implemented in a fully distributed mode, resulting in simple hardware and low complexity • enables a trade-off between performance and reliable functionality on unreliable silicon

  32. Thank You! Questions? The Greek legend of Princess Ariadne [source: wikipedia]: “Ariadne (Αριάδνη) was the daughter of King Minos of Crete. Minos attacked Athens after his son was killed there. The Athenians asked for terms, and were required to sacrifice seven young men and seven maidens every nine years to the Minotaur, a monster with the head of a bull on the body of a man. One year, the sacrificial party included Theseus, a young man who volunteered to come and kill the Minotaur. Ariadne fell in love at first sight, and helped him by giving him a ball of red fleece thread that she was spinning, to find his way out of the Minotaur's labyrinth.” …similarly to Princess Ariadne, our Ariadne algorithm helps packets find their way in the labyrinth-like topology of a faulty network.
