A Low‐Overhead Asynchronous Interconnection Network for GALS Chip Multiprocessors
Michael N. Horak, University of Maryland
Steven M. Nowick, Columbia University
Matthew Carlberg, UC Berkeley
Uzi Vishkin, University of Maryland
In ACM/IEEE Int. Symp. on Networks-on-Chip (NOCS-10)
Challenges for Designing Networks‐on‐Chip
• Power Consumption
  – Will exceed future power budgets by a factor of 10x [1]
  – Global clocks: consume large fraction of overall power
• Performance Bottlenecks
  – Large network latencies cause performance degradation
• Increased Designer Resources
  – Many techniques are incompatible with current CAD tools
  – Difficulties integrating heterogeneous modules
    • Chips partitioned into multiple timing domains
[1] J.D. Owens, W.J. Dally, R. Ho, D.N. Jayasimha, S.W. Keckler, and L.‐S. Peh. Research challenges for on‐chip interconnection networks. IEEE Micro, 27(5):96‐108, 2007.
Potential Advantages of Asynchronous Design
• Lower Power
  – No clock power consumed: without clock gating
  – Idle components inherently consume low power
• Greater Flexibility/Modularity
  – No clock distribution
  – Easier integration between multiple timing domains
  – Supports reusable components
• Lower System Latency
  – End‐to‐end traffic without clock synchronization
• More Resilient to On‐Chip Variations
  – Correct operation depends on localized timing constraints
Mixed‐Timing (GALS) System
• Globally Asynchronous, Locally Synchronous [2]
• Asynchronous Network
  – Clockless network fabric
• Synchronous Terminals
  – Different unrelated clocks
• Mixed‐Timing Interfaces
  – Provide robust communication between Sync and Async domains
[2] D. Chapiro. Globally‐Asynchronous Locally‐Synchronous Systems. PhD thesis, Stanford Univ., 1984.
Advances in GALS Networks‐on‐Chip
• Commercial Designs
  – Silistix, Inc. (J. Bainbridge, S. Furber. IEEE Micro‐02)
    • CHAIN™works tool suite: heterogeneous SOCs
  – Fulcrum Microsystems (A. Lines. Micro‐04)
    • FocalPoint chips: high‐performance Ethernet routing
• Recent Work
  – Asynchronous Network‐on‐Chip (ANoC) (Beigne, Clermidy, Vivet et al. Async‐05)
    • Wormhole packet‐switched NoC with low‐latency service
  – MANGO Clockless Network‐on‐Chip (T. Bjerregaard. DATE‐05)
    • Offers quality‐of‐service (QoS) guarantees
  – RasP On‐Chip Network (S. Hollis, S.W. Moore. ICCD‐06)
    • Utilizes high‐speed pulse‐based signaling
  – SpiNNaker Project (Khan, Lester, Plana, Furber et al. IJCNN‐08)
    • Massively‐parallel neural simulation
GALS NOCs: Typical Current Targets
• Low‐ to Moderate‐Performance Embedded Systems
  – 200‐500 MHz
  – High system latency
• “Four‐Phase Return‐to‐Zero” Protocols
  – Two round‐trips/link per transaction
• “Delay‐Insensitive Data” Encoding (dual‐rail, 1‐of‐4)
  – Lower coding efficiency than single‐rail
• Complex‐Functionality Router Nodes
  – 5‐port routers with layered services (QoS, etc.)
  – High latency/high area
• Custom Circuit Techniques:
  – Pulse‐based signaling, low‐swing signaling
  – Dynamic logic, specialized cells
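As a rough illustration of the protocol and encoding bullets above, the following Python sketch models dual‐rail and 1‐of‐4 delay‐insensitive codes at the wire level and lists the events of one four‐phase return‐to‐zero transaction. It is not taken from the talk: the function names, the event labels, and the choice of which rail carries a logical 1 are assumptions made for illustration only.

```python
def dual_rail_encode(bits):
    """Dual-rail: each bit uses two wires; exactly one goes high.

    Assumed convention (hypothetical): (1, 0) encodes 1, (0, 1) encodes 0.
    """
    wires = []
    for b in bits:
        wires.extend((1, 0) if b else (0, 1))
    return wires


def one_of_four_encode(bits):
    """1-of-4: each pair of bits uses four wires; exactly one goes high."""
    if len(bits) % 2:
        raise ValueError("1-of-4 encoding needs an even number of bits")
    wires = []
    for i in range(0, len(bits), 2):
        value = bits[i] * 2 + bits[i + 1]  # 0..3 selects the hot wire
        group = [0, 0, 0, 0]
        group[value] = 1
        wires.extend(group)
    return wires


def four_phase_rtz_events():
    """Events on one link for one transaction (labels are illustrative)."""
    return [
        "data valid",                     # sender drives a codeword
        "ack+",                           # receiver acknowledges
        "data spacer (return to zero)",   # sender resets all data wires
        "ack-",                           # receiver acknowledges the reset
    ]


word = [1, 0, 1, 1, 0, 0, 1, 0]  # one 8-bit word
dr = dual_rail_encode(word)
o4 = one_of_four_encode(word)

# Both codes need 2 wires per bit, versus 1 data wire per bit for single-rail.
print(len(word), len(dr), len(o4))
# Wires that switch when the word is sent: one per bit vs. one per bit pair.
print(sum(dr), sum(o4))
# Each sender event plus its acknowledgment is one round trip on the link.
print(len(four_phase_rtz_events()) // 2)
```

In this model both delay‐insensitive codes carry half a bit per data wire, which is the sense in which their coding efficiency is lower than single‐rail, and the four events per transaction correspond to the two round‐trips per link noted above.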