SLIDE 1

A Low-Overhead Asynchronous Interconnection Network for GALS Chip Multiprocessors

Michael N. Horak, University of Maryland
Steven M. Nowick, Columbia University
Matthew Carlberg, UC Berkeley
Uzi Vishkin, University of Maryland

In ACM/IEEE Int. Symp. on Networks-on-Chip (NOCS-10)

SLIDE 2

Challenges for Designing Networks-on-Chip

• Power Consumption
  – Will exceed future power budgets by a factor of ~10x [1]
  – Global clocks: consume a large fraction of overall power
• Performance Bottlenecks
  – Large network latencies cause performance degradation
• Increased Designer Resources
  – Many techniques are incompatible with current CAD tools
  – Difficulties integrating heterogeneous modules
• Chips partitioned into multiple timing domains

[1] J.D. Owens, W.J. Dally, R. Ho, D.N. Jayasimha, S.W. Keckler, and L.-S. Peh. Research challenges for on-chip interconnection networks. IEEE Micro, 27(5):96-108, 2007.

SLIDE 3

Potential Advantages of Asynchronous Design

• Lower Power
  – No clock power consumed, even without clock gating
  – Idle components inherently consume low power
• Greater Flexibility/Modularity
  – No clock distribution
  – Easier integration between multiple timing domains
  – Supports reusable components
• Lower System Latency
  – End-to-end traffic without clock synchronization
• More Resilient to On-Chip Variations
  – Correct operation depends on localized timing constraints

SLIDES 4-7

Mixed-Timing (GALS) System

• Globally Asynchronous, Locally Synchronous [2]
• Asynchronous Network
  – Clockless network fabric
• Synchronous Terminals
  – Different unrelated clocks
• Mixed-Timing Interfaces
  – Provide robust communication between Sync and Async domains

[2] D. Chapiro. Globally-Asynchronous Locally-Synchronous Systems. PhD thesis, Stanford Univ., 1984.

SLIDE 8

Advances in GALS Networks-on-Chip

• Commercial Designs
  – Silistix, Inc. (J. Bainbridge, S. Furber. IEEE Micro-02)
    • CHAIN™ works tool suite: heterogeneous SOCs
  – Fulcrum Microsystems (A. Lines. Micro-04)
    • FocalPoint chips: high-performance Ethernet routing
• Recent Work
  – Asynchronous Network-on-Chip (ANoC) (Beigne, Clermidy, Vivet et al. Async-05)
    • Wormhole packet-switched NoC with low-latency service
  – MANGO Clockless Network-on-Chip (T. Bjerregaard. DATE-05)
    • Offers quality-of-service (QoS) guarantees
  – RasP On-Chip Network (S. Hollis, S.W. Moore. ICCD-06)
    • Utilizes high-speed pulse-based signaling
  – SpiNNaker Project (Khan, Lester, Plana, Furber et al. IJCNN-08)
    • Massively-parallel neural simulation

SLIDE 9

GALS NOCs: Typical Current Targets

• Low- to Moderate-Performance Embedded Systems
  – 200-500 MHz
  – High system latency
• "Four-Phase Return-to-Zero" Protocols
  – Two round-trips/link per transaction
• "Delay-Insensitive Data" Encoding (dual-rail, 1-of-4)
  – Lower coding efficiency than single-rail
• Complex-Functionality Router Nodes
  – 5-port routers with layered services (QoS, etc.)
  – High latency/high area
• Custom Circuit Techniques:
  – Pulse-based signaling, low-swing signaling
  – Dynamic logic, specialized cells

SLIDE 10

Outline

• Introduction
• Target GALS Network Design
• Background: XMT Processor / MoT Network
• Asynchronous Network Primitives
• Experimental Results
• Conclusions

SLIDE 11

Target GALS Network Design

• Shared-Memory Chip Multiprocessors
  – Medium- to High-Performance

SLIDE 12

Target GALS Network Design

• Shared-Memory Chip Multiprocessors
• "Heterochronous" Timing [3]
  – Most general GALS timing model
  – Supports multiple synchronous domains with unrelated clocking
  – Promotes reuse of Intellectual Property (IP) modules

[3] D. Messerschmitt, "Synchronization in Digital System Design", IEEE Journal on Selected Areas in Communications, October 1990.

SLIDE 13

Target GALS Network Design

• Shared-Memory Chip Multiprocessors
• "Heterochronous" Timing
• Transition Signaling (Two-Phase)
  – Most existing GALS NOCs use "four-phase handshaking"
    • 2 roundtrip link communications per transaction
  – Benefits of Two-Phase:
    • 1 roundtrip link communication per transaction
    • improved throughput, power, ...
  – Challenge of Two-Phase: designing lightweight implementations (the two protocols are sketched below)
    • Most existing 2-phase designs use:
      – complex, slow registers: double latch, double-edge-triggered, capture/pass [Seitz/Su "Mosaic" 93, Brunvand 91, Sutherland 89]
      – custom circuit components
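A minimal Verilog sketch of the protocol contrast above (our illustration, not a circuit from the talk; names are ours). With four-phase return-to-zero, req and ack must each rise and fall per item, costing two round trips; with two-phase transition signaling, any edge on req announces data and any edge on ack accepts it, so the XOR of the two phases tells whether the link is busy:

module two_phase_link_monitor (
    input wire req,   // transition-signaled request
    input wire ack    // transition-signaled acknowledge
);
    // Four-phase RZ: req up -> ack up -> req down -> ack down (2 round trips/item).
    // Two-phase: req toggles -> ack toggles (1 round trip/item).
    // An item is pending exactly when the request/acknowledge phases differ.
    wire busy = req ^ ack;
endmodule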

SLIDE 14

Target GALS Network Design

• Shared-Memory Chip Multiprocessors
• "Heterochronous" Timing
• Transition (Two-Phase) Signaling
• Single-Rail Bundled Data
  – Most existing GALS NOCs use "delay-insensitive" link encodings
    • provide great timing-robustness ==> cost = poor coding efficiency
    • examples: dual-rail, 1-of-4
  – "Single-Rail Bundled Data" benefits:
    • re-use synchronous datapaths: 1 wire/bit + added "request"
    • excellent coding efficiency
  – Challenge: requires matched delay for "request" signal (sketched below)
    • 1-sided timing constraint: "request" must arrive after data is stable
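A hedged sketch of a single-rail bundled-data sender (module and port names are ours, and the buffer chain merely stands in for a properly sized matched delay): the request edge is delayed so it reaches the receiver only after the data wires have settled, which is exactly the one-sided bundling constraint named above.

module bundled_data_sender #(parameter W = 32) (
    input  wire [W-1:0] data_in,
    input  wire         send,     // toggles once per new data word
    output wire [W-1:0] data_out,
    output wire         req_out   // delayed copy of send
);
    wire d1, d2;
    // Single rail: one wire per data bit (plus the one request wire).
    assign data_out = data_in;
    // Matched delay: in silicon, a delay line sized to exceed the
    // worst-case datapath delay; a small buffer chain stands in here.
    buf b0 (d1, send);
    buf b1 (d2, d1);
    buf b2 (req_out, d2);
endmodule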

SLIDE 15

Target GALS Network Design

• Shared-Memory Chip Multiprocessors
• "Heterochronous" Timing
• Transition (Two-Phase) Signaling
• Single-Rail Bundled Data
• High Performance
  – Low System-Level Latency
    • minimize end-to-end delay under light to moderate traffic
  – High Sustained Throughput
    • maximize steady-state throughput under heavy traffic

SLIDE 16

Target GALS Network Design

• Shared-Memory Chip Multiprocessors
• "Heterochronous" Timing
• Transition (Two-Phase) Signaling
• Single-Rail Bundled Data
• High Performance
• Standard Cell Methodology
  – Use existing standard cell libraries
    • only exception: analog arbiter circuit
  – Challenge: timing analysis using existing tools

SLIDE 17

Target GALS Network Design

• Shared-Memory Chip Multiprocessors
• "Heterochronous" Timing
• Transition (Two-Phase) Signaling
• Single-Rail Bundled Data
• High Performance
• Standard Cell Methodology
• Fine-Grained Network Topology
  – Lightweight network nodes
    • low-functionality, low-radix router components
    • avoids 5-port router with North/South/East/West/Local ports

SLIDE 18

Outline

• Introduction
• Target GALS Network Design
• Background: XMT Processor / MoT Network
  – eXplicit Multi-Threading (XMT) Architecture
  – Mesh-of-Trees (MoT) Network Topology
  – Synchronous Router Nodes
• Asynchronous Network Primitives
• Experimental Results
• Conclusions

SLIDE 19

XMT Parallel Architecture

• XMT = "eXplicit Multi-Threading" (1997-present) [4]
  – Led by Prof. Uzi Vishkin at University of Maryland, College Park
• Based on Parallel Random Access Model (PRAM)
  – Largest body of parallel algorithmic theory
• Ease of Programmability
  – XMT-C language + optimizing compiler
  – Single-Program Multiple-Data (SPMD) programming methodology
• Demonstrated to Provide Significant Speedups
  – Performs well on irregular computations (BFS, ray-tracing)
  – 100x speedup for VHDL circuit simulations compared to serial [5]

[4] D. Naishlos, J. Nuzman, C.-W. Tseng, and U. Vishkin. "Towards a first vertical prototyping of an extremely fine-grained parallel programming approach", SPAA 2001.
[5] P. Gu and U. Vishkin, "Case study of gate-level logic simulation on an extremely fine-grained chip multiprocessor", Journal of Embedded Computing, April 2006.

SLIDES 20-22

XMT Parallel Architecture

• Processing Clusters
  – Groups of simple pipelined cores, e.g. 16 Thread Control Units (TCUs)
  – Each TCU executes to completion with little to no synchronization
  – "IOS" = independence-of-order semantics: no WAW/WAR/RAW data hazards between threads
• Distributed Caches
  – Shared global L1 data cache
  – No cache coherence problem
• NOC Challenge: high-bandwidth/low-power requirements
  – Many concurrent memory requests (load/store)
  – Short packets: 1-2 flits; dynamically varying traffic
  – Low latency required for system performance

SLIDE 23

Proposed XMT Parallel Architecture with GALS Interconnection Network

[Figure: XMT processing clusters and cache modules connected through the GALS network.]

SLIDE 24

Mesh-of-Trees Network Topology

• Variant of classic MoT
• N fan-out trees
  – Routing only
  – Root at source terminals
• N fan-in trees
  – Arbitration only
  – Root at destination terminals

[Figure labels: Arbitration, Routing]

SLIDE 25

Mesh-of-Trees Network Topology

• High Throughput
  – Unique routing paths (source/sink)
  – Avoids interference penalties
• Fixed Path Length
  – Logarithmic depth
• Distributed Low-Radix Routing
  – Limited-functionality nodes
  – Wormhole deterministic routing
• Shown to Perform Well for CMPs
  – Provides very high sustained throughput [6]
  – High saturation throughput: ~91%

[6] A.O. Balkan, G. Qu, U. Vishkin, "Mesh-of-Trees and alternative interconnection networks for single-chip parallelism", IEEE Transactions on Very Large Scale Integration Systems, April 2009.

SLIDE 26

Synchronous Routing Primitive

• Fan-Out Component [7]
  – 1 Input, 2 Outputs
  – Synchronous Flow Control
    • Back-pressure mechanism
    • Signal to previous stage when new data can be accepted
• Based on "Latency-Insensitive Design" [Carloni et al., TCAD 01]
  – 2-Register FIFO: B0, B1 (sketched below)
  – Allows 1 flit/cycle in steady state
    • Accept new data and forward stored data concurrently
  – Cost: 1 extra auxiliary register (flip-flop-based)

[7] A.O. Balkan, G. Qu, U. Vishkin. "A Mesh-of-Trees Interconnection Network for Single-Chip Parallel Processing", IEEE ASAP Symposium (2006).
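For reference, a hedged RTL sketch of the 2-register buffering idea behind this primitive (port names and reset style are ours): B0 drives the output while the auxiliary register B1 catches the one in-flight flit that arrives as a stall appears, so the link still moves 1 flit/cycle in steady state.

module two_reg_buffer #(parameter W = 32) (
    input  wire         clk, rst,
    input  wire [W-1:0] in_data,
    input  wire         in_valid,
    output wire         in_ready,     // back-pressure to previous stage
    output reg  [W-1:0] out_data,     // B0: main register
    output reg          out_valid,
    input  wire         out_ready     // back-pressure from next stage
);
    reg [W-1:0] aux_data;             // B1: auxiliary register
    reg         aux_valid;
    assign in_ready = !aux_valid;     // derived from a register: a cycle of slack
    always @(posedge clk) begin
        if (rst) begin out_valid <= 1'b0; aux_valid <= 1'b0; end
        else if (out_ready || !out_valid) begin
            // output may advance: drain B1 first, otherwise take new input
            out_data  <= aux_valid ? aux_data : in_data;
            out_valid <= aux_valid | in_valid;
            aux_valid <= 1'b0;
        end else if (in_valid && !aux_valid) begin
            // output stalled: catch the in-flight flit in B1
            aux_data  <= in_data;
            aux_valid <= 1'b1;
        end
    end
endmodule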

SLIDE 27

Synchronous Arbitration Primitive

• Fan-In Component [7]
  – 2 Inputs, 1 Output
  – Synchronous Flow Control
    • Back-pressure mechanism
• Based on "Latency-Insensitive Design"
  – 2-Stage FIFOs at each input port
  – When empty, latency = 1 cycle
  – When stalled, latency = 2+ cycles
    • Depends on back-pressure and synchronous arbitration
  – Cost: total of 4 registers (flip-flop-based)

[7] A.O. Balkan, G. Qu, U. Vishkin. "A Mesh-of-Trees Interconnection Network for Single-Chip Parallel Processing", IEEE ASAP Symposium (2006).

SLIDE 28

Outline

• Introduction
• Target GALS Network Design
• Background: XMT Processor / MoT Network
• Asynchronous Network Primitives
  – Routing primitive (Fan-out)
  – Arbitration primitive (Fan-in)
  – Mixed-timing interfaces
• Experimental Results
• Conclusions

SLIDES 29-32

New Routing Primitive

[Figure: interface of the routing primitive: one input channel (Req, Ack, Data_In) with a binary routing signal B(oolean), and two output channels (Req0, Ack0, Data0 and Req1, Ack1, Data1). Highlighted in turn: the handshaking signals (request/acknowledge), the binary routing signal, and the data channels.]

SLIDES 33-37

New Routing Primitive (continued)

[Figure: internal design of the routing primitive: two latch controllers ("Latch Control 0" and "Latch Control 1"), each a toggle element driving a normally opaque latch register. The B signal steers the input request (Req) to Req0 or Req1, and the output acknowledges (Ack0, Ack1) merge into the input Ack. Annotations trace one transfer, where data and the B signal arrive (here B=0), and highlight the latency and throughput paths. A behavioral sketch follows.]
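The steering behavior can be abstracted in a few lines of Verilog (our behavioral model, not the gate-level design, which uses toggle elements and normally opaque latches): each transition on Req is steered to Req0 or Req1 by B, and, as a standard transition-signaling trick, the input acknowledge is simply the XOR of the two output acknowledges, since every flit goes to exactly one output.

module route_behav (
    input  wire req,          // transition-signaled input request
    input  wire b,            // binary routing signal (bundled with data)
    output reg  req0, req1,   // transition-signaled output requests
    input  wire ack0, ack1,   // output acknowledges
    output wire ack           // merged acknowledge back to the sender
);
    initial begin req0 = 1'b0; req1 = 1'b0; end
    always @(req)             // every edge on req is one new flit
        if (b) req1 <= ~req1;
        else   req0 <= ~req0;
    // The two streams are mutually exclusive, so the XOR of the output
    // ack phases toggles once per delivered flit, tracking req's phase.
    assign ack = ack0 ^ ack1;
endmodule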

SLIDES 38-40

New Arbitration Primitive

[Figure: interface of the arbitration primitive: two input channels (Req0, Ack0, Data0 and Req1, Ack1, Data1) and one output channel (Req_Out, Ack_In, Data_Out). Highlighted in turn: the handshaking signals (request/acknowledge) and the data channels.]

SLIDES 41-48

New Arbitration Primitive (continued)

[Figure: internal design of the arbitration primitive: a flow control unit built around a mutual exclusion element (mutex) and latches L1-L7, a latch controller, and a shared datapath with a Mux_Select into one latch register. Build-up highlights: the mutex; the request protection latches (normally opaque); the data + request latch register (only one bank of latches required); and the acknowledgment protection latches (normally transparent). Annotations trace one transfer, where new data arrives, followed by its request, while the protection latch is initially opaque, and highlight the latency path and the throughput cycle (best case, alternating inputs). A behavioral sketch of the mutex follows.]
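The mutex at the heart of the flow control unit is the design's one analog arbiter cell (see the tool-flow slide); everything else is standard cells. A behavioral stand-in (ours; it breaks exact ties deterministically instead of filtering metastability) shows the contract: overlapping requests in, at most one grant out.

module mutex_behav (
    input  wire r0, r1,   // level requests
    output reg  g0, g1    // at most one grant asserted at a time
);
    initial begin g0 = 1'b0; g1 = 1'b0; end
    always @(r0, r1, g0, g1) begin
        if (!g0 && !g1) begin
            // idle: issue one grant; a simultaneous tie goes to r0 here
            if (r0)      g0 <= 1'b1;
            else if (r1) g1 <= 1'b1;
        end
        if (g0 && !r0) g0 <= 1'b0;   // release when a request withdraws;
        if (g1 && !r1) g1 <= 1'b0;   // any waiting request is then granted
    end
endmodule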

SLIDE 49

Wormhole Routing Capability

• Goal: support transmission of multi-flit packets
  – example: XMT "store" packets = 2 flits (address + data)
• Solution: add 1 extra "glue bit" to each flit
  – Glue bit = 1 => not last flit in packet
  – Enhanced arbitration primitive: bias mutex decision
    • "winner-take-all" strategy [Dally/Towles] (one possible form sketched below)
    • header flit takes over mutex: glue = 1
    • last flit releases mutex: glue = 0
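One plausible shape for the glue-bit bias (entirely our guess at the mechanism, not the paper's netlist): while the winning port still presents a flit with glue = 1, its request into the mutex is held asserted, so the losing port cannot steal the output channel between flits of the same packet.

module glue_hold (
    input  wire req_level,   // this port's request, as a level
    input  wire granted,     // this port currently owns the mutex
    input  wire glue,        // glue bit of the flit in transfer
    output wire req_mutex    // request actually seen by the mutex
);
    // winner-take-all: ownership persists while glue = 1
    assign req_mutex = req_level | (granted & glue);
endmodule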

SLIDE 50

New Arbitration Primitive: Wormhole Control

[Figure: enhanced arbitration primitive: the enhanced flow control unit adds latches L8 and L9 and the glue0/glue1 bits, which bias the mutex decision for wormhole control; latch controller and datapath as before.]

SLIDES 51-53

Linear Pipeline Primitive

[Figure: one MOUSETRAP pipeline stage [8], with input channel (Req, Ack, Data) and output channel (Req_Out, Ack_In, Data_Out); highlighted in turn: the handshaking signals (request and acknowledgment) and the data channels.]

• Can be inserted for buffering: to improve system-level throughput
• Basis for design of new fan-in/fan-out primitives

[8] M. Singh and S.M. Nowick. "MOUSETRAP: High-Speed Transition-Signaling Asynchronous Pipelines," IEEE Transactions on VLSI Systems, vol. 15:11, pp. 1256-1269 (Nov. 2007).

SLIDE 54

Mixed-Timing Interfaces

• Use Existing Synchronizing FIFOs [9] (with small modifications)
  – Supports "heterochronous" timing domains
  – No modification to existing components
• Modular Design
  – Reusable Put and Get components (either Async or Sync)
  – Each FIFO is an array of identical cells
• Supports Low-Power Operation
  – Circular FIFO: data does not move (synchronizer sketch below)

[9] T. Chelcea and S. Nowick, "Robust Interfaces for Mixed-Timing Systems", IEEE Transactions on Very Large Scale Integration Systems, August 2004.
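In this FIFO the data stays put in its cell; only control information crosses between timing domains, with flags such as full/empty made safe by synchronizer stages. A generic two-flop synchronizer (a textbook cell, not code from [9]) looks like:

module sync2 (
    input  wire clk,   // receiving domain's clock
    input  wire d,     // flag generated in the other domain
    output reg  q      // safe to sample in this domain
);
    reg meta;
    always @(posedge clk) begin
        meta <= d;     // first flop may go metastable
        q    <= meta;  // second flop gives it a cycle to settle
    end
endmodule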

SLIDE 55

Outline

• Introduction
• Target GALS Network Design
• Background: XMT Processor / MoT Network
• Asynchronous Network Primitives
• Experimental Results
• Conclusions

SLIDE 56

Evaluation Methodology

• Direct Comparison with Synchronous MoT Network
  – Identical Technology: IBM 90nm CMOS process
  – Identical Functionality: same routing and arbitration primitives
  – Identical Topology: 8-terminal networks with same floorplan
• Evaluate at Multiple Levels of Integration
  – Isolated Asynchronous Primitives (post-layout)
  – 8-Terminal Asynchronous Network (pre-layout with wire estimates; interconnection of laid-out router primitives)
  – 8-Terminal GALS Network
  – XMT Architecture Co-Simulation on Parallel Kernels

SLIDE 57

Tool Flow

• Implemented in IBM 90nm technology
  – Placed and routed with Cadence SOC Encounter
  – Simulated as gate-level Verilog with extracted delays
• Standard Cell Methodology
  – ARM 90nm Standard Cells (IBM CMOS9SF)
• Exception: Mutual Exclusion Element
  – Designed using transistor models from IBM 90nm PDK
  – Simulated in Cadence Spectre
  – Measured delays to calibrate Verilog behavioral model

SLIDE 58

Routing Primitive Comparison: Area and Power

Component Type   Area (μm²)   Energy/Packet (pJ)   Leakage Power (μW)   Idle Power (μW)
Synchronous      988.6        2.06                 1.82                 225.6
Asynchronous     358.4        0.37                 0.56                 0.6

• Area:
  – 64% less area: result of lightweight data storage
    • 2 flip-flop registers + extra MUX/DEMUX (sync) vs. 2 latch registers (async)
    • MUX/DEMUX overhead (sync)
• Energy/Packet (1 flit):
  – 82% less energy per packet
  – Steady-state measurement on random traffic

SLIDE 59

Routing Primitive Comparison: Latency and Throughput

Component Type   Latency (ps)   Max. Throughput (GFPS)
                                Single   Random   Alternating
Synchronous      516            1.93     1.93     1.93
Asynchronous     546            1.07     1.34     1.70

• Synchronous: using max clock rate (1.93 GHz)
• Latency:
  – 546 ps (async) vs. 516 ps (sync)
• Max Throughput (Giga-flits/sec):
  – Single-ported traffic: 55% of sync max. (no concurrency)
  – Random traffic: 69% of sync max.
  – Alternating traffic: 88% of sync max. (most concurrency)
… expect significant future improvements by inserting a small # of FIFO stages

SLIDE 60

Arbitration Primitive Comparison: Area and Power

Component Type   Area (μm²)   Energy/Packet (pJ)   Leakage Power (μW)   Idle Power (μW)
Synchronous      2240.3       3.53                 4.13                 388.6
Asynchronous     349.3        0.33                 0.50                 0.5

• Area:
  – 84% less area
  – Due to low-overhead data storage
    • 4 flip-flop registers (sync) vs. 1 latch register (async)
• Energy/Packet (1 flit):
  – 91% less energy per packet
  – Measured steady-state packets arriving at both input ports

SLIDE 61

Arbitration Primitive Comparison: Latency and Throughput

Component Type   Latency (ps)   Max. Throughput (GFPS)
                                Single   Both Ports
Synchronous      474            2.09     2.09
Asynchronous     489            1.08     2.04

• Synchronous: using max clock rate (2.09 GHz)
• Latency:
  – 489 ps (async) vs. 474 ps (sync)
• Max. Throughput (Giga-flits/sec):
  – Single port only: 52% of synchronous max.
  – Traffic at both ports: 98% of synchronous max.
… expect significant future improvements by inserting a small # of FIFO stages

SLIDE 62

8-Terminal Network Evaluation

• Head-to-Head Comparison with Sync Network
• Projected Network Layout
  – Pre-layout async network
  – Uses post-layout primitives, treated as hard IP macros, with assigned wire delays
  – Extrapolate wire delays based on ASIC floorplan of Sync MoT
• Experimental Setup
  – Evaluate performance under uniformly random input traffic
  – 32-bit flits

SLIDE 63

Projected 8-Terminal Network Layout

• Based on Floorplan of Synchronous MoT Test ASIC
  – Designed/fabricated at UMD in March 2007 [10]
• Network divided into 4 partitions (P0, P1, P2, P3)
  – Fan-in trees exist entirely within one partition
  – Fan-out trees distributed among partitions
• Asynchronous Projection Methodology
  – Treat asynchronous primitives as hard IP macros
    • all routing and arbitration primitives have the same timing
  – Evenly distribute groups of primitives
  – Assign inter-primitive wire delays based on position
    • wire delays assigned based on technology specifications

[10] A.O. Balkan, M.N. Horak, G. Qu, U. Vishkin. "Layout-accurate design and implementation of a high-throughput interconnection network for single-chip parallel processing", Hot Interconnects, August 2007.

SLIDE 64

Projected 8-Terminal Network Layout

[Figure: example fan-out tree distributed across the four partitions P0, P1, P2, P3.]

SLIDE 65

Current CAD Tool Flows: Sync vs. Async

• Synchronous Synthesis:
  – Automatic place/route optimizations
  – Includes cell resizing / repeater insertion
• Asynchronous Synthesis:
  – Limited optimization: hard macros + regular manual placement
  – No cell resizing / repeater insertion
  … much potential for future performance improvement
• Async Flows Currently Do Not Define the Necessary Timing Constraints
  – No automatic path-length matching
  – Necessary to enforce the bundling constraint

SLIDE 66

Async Network Performance Comparison: 400 MHz Sync vs. Async

[Plot: latency vs. input rate for both networks]
• Comparable throughput over the entire Sync range
• Sync has at least 4.3x higher latency for all Sync input rates
• Sync max. input rate: 102.4 Gbps
Note: sync max. input rate limited by clock frequency

SLIDE 67

Async Network Performance Comparison: 800 MHz Sync vs. Async

[Plot: latency vs. input rate for both networks]
• Comparable throughput over the entire Sync range
• Sync has >1.7x higher latency for input rates up to 73% of Sync max. (150 Gbps)
• Sync max. input rate: 204.8 Gbps
Note: sync max. input rate limited by clock frequency

SLIDE 68

Async Network Performance Comparison: 1.36 GHz Sync vs. Async

[Plot: latency vs. input rate for both networks]
• Comparable throughput for rates up to 55% of Sync max. (190 Gbps)
• Lower latency for input rates up to 43% of Sync max. (150 Gbps)
• Sync max. input rate: 348.2 Gbps
Note: sync max. input rate limited by clock frequency

SLIDE 69

GALS Network Performance Comparison

• Experimental Setup
  – Create terminals to generate traffic and record measurements
  – Terminals generate uniformly random input traffic
• Results Normalized to Clock Rate
  – Throughput units (normalized): flits per cycle per port
  – Latency units (normalized): # clock cycles
  – Sync network results: always the same relative to clock cycles
  – Async network results: vary with clock rate (e.g., at 800 MHz one cycle is 1.25 ns, so a fixed 5 ns async latency is reported as 4 cycles)

SLIDE 70

GALS Network Performance Comparison: 400 MHz GALS vs. Sync

[Plot: normalized latency vs. input traffic]
• Comparable throughput for all traffic rates
• Sync has 52% higher latency up to 80% input traffic

SLIDE 71

GALS Network Performance Comparison: 600 MHz GALS vs. Sync

[Plot: normalized latency vs. input traffic]
• Comparable throughput up to 65% input traffic
• Lower latency up to 60% input traffic

SLIDE 72

GALS Network Performance Comparison: 800 MHz GALS vs. Sync

[Plot: normalized latency vs. input traffic]
• Comparable throughput up to 52% input traffic
• Lower latency up to 29% input traffic; comparable latency up to 40% input traffic

SLIDE 73

XMT Parallel Kernel Simulations

• Goal: Integrate with Synchronous XMT Parallel Architecture
  – XMT Verilog RTL description with GALS network
• XMT Parallel Kernels
  – Array Summation (add)
    • Compute sum of 3 million elements in array
  – Matrix Multiplication (mmul)
    • Compute product of two 64 x 64 matrices
  – Breadth-First Search (bfs)
    • Run XMT BFS algorithm with 100,000 vertices and 1 million edges
  – Array Increment (a_inc)
    • Increment all 32k elements of an array

SLIDE 74

XMT Parallel Kernel Simulations

• XMT Processor Configuration
  – 8 Processing Clusters (16 TCUs each) = 128 TCUs total
  – 8 Distributed L1 D-Cache Modules (64KB total)
• Simulate GALS XMT at Different Clock Frequencies
  – 200, 400, 700 MHz
• Compare Speedups Relative to Synchronous XMT
  – Values greater than 1.0 indicate better performance

SLIDE 75

GALS XMT Performance Comparison

[Bar chart: speedup of GALS XMT relative to synchronous XMT per kernel, arranged in order of increasing network utilization]
• GALS XMT has similar performance at 200 and 400 MHz
• Only moderate degradation at 700 MHz (a_inc: 37% decrease)

SLIDE 76

Conclusions

• New GALS Network for Chip Multiprocessors
  – Low-overhead network for "heterochronous" interfaces
• Design of Two New Asynchronous Router Cells
  – Routing and arbitration circuits
• Overview of Results
  – Router Primitives
    • 64-84% less area, 82-91% less energy/packet
    • Latency & throughput (for balanced traffic) = ~2 Gflits/sec
  – System-Level Performance
    • Async network comparison with 800 MHz sync network:
      – Comparable throughput across all input traffic
      – 1.7x lower latency up to 73% max input traffic
    • GALS network comparison with 800 MHz sync network:
      – Comparable throughput up to 52% max input traffic
      – Lower latency up to 29% max input traffic

SLIDE 77

Future Directions

• Architectural Optimization
  – Insert linear pipeline stages on long wires to improve throughput
• Circuit Optimization
  – Improve designs of routing/arbitration primitives
  – Mixed-timing FIFO optimizations
• Asynchronous Topology Optimization
  – Area improvements using hybrid MoT-Butterfly [Balkan et al., DAC-08]
• Integrate with Synchronous Physical CAD Tool Flow
  – Goal = leverage existing commercial techniques
    • Timing constraint specification and synthesis of unclocked timing paths
    • Build on automated async flow of [Quinton/Greenstreet/Wilton TVLSI '08]
    • Optimized placement, routing, gate resizing and repeater insertion
• Target Alternative Parallel Architectures/Memory Systems

SLIDE 78

SLIDE 79

BACKUP SLIDES

SLIDE 80

Types of Mixed-Timing (GALS) Systems

• Pseudochronous
  – Same frequency, constant phase difference
• Mesochronous
  – Same frequency, undefined phase difference
• Plesiochronous
  – Nearly exact frequency and phase difference
• Heterochronous
  – Undefined frequency and phase difference

SLIDE 81

MOUSETRAP Asynchronous Pipelines

• Fast Communication
  – Transition signaling (2-phase) handshaking
• Synchronous-Style Channel Encoding
  – Single-rail bundled-data protocol
• Low Latency
  – 1 transparent D latch delay for an empty stage
• Minimal-Overhead Latch Controller
  – 1 XNOR gate

SLIDES 82-83

MOUSETRAP: A Basic FIFO (no computation)

[Figure: stages N-1, N, N+1; each stage has a Data Latch and a Latch Controller, with input signals reqN/ackN-1, internal doneN, output signals reqN+1/ackN, and latch enable En. One data item flows through as a single transition.]

• Stages communicate using transition signaling: 1 transition per data item! (an RTL sketch follows)
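The stage on the slide reduces to a few lines of latch-level Verilog. This sketch follows the published MOUSETRAP structure [8] (signal names from the figure; the data width W is our addition): the single XNOR latch controller keeps the stage transparent while it is empty, and the same done transition serves as both the request forward and the acknowledge backward.

module mousetrap_stage #(parameter W = 8) (
    input  wire         req_in,    // reqN: transition from stage N-1
    input  wire [W-1:0] data_in,
    output wire         ack_out,   // ackN-1: acknowledge to stage N-1
    output wire         req_out,   // reqN+1: request to stage N+1
    output reg  [W-1:0] data_out,
    input  wire         ack_in     // ackN: acknowledge from stage N+1
);
    reg done;                      // doneN: latched copy of req_in
    initial done = 1'b0;
    // Latch controller: one XNOR gate. En = 1 (transparent) exactly
    // when the stage holds no unacknowledged data.
    wire en = ~(done ^ ack_in);
    always @* if (en) begin        // transparent D latches
        done     = req_in;
        data_out = data_in;
    end
    assign req_out = done;         // one transition both requests forward
    assign ack_out = done;         // and acknowledges backward
endmodule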

SLIDE 84

Basic Mixed-Clock FIFO (Sync-Sync)

[Figure: a ring of identical cells between a Put Controller (CLK_put, req_put, data_put, full) and a Get Controller (CLK_get, req_get, data_get, valid_get, empty), with Full and Empty Detectors.]

• Sync-Sync FIFO: uses Synchronous Put and Get Modules
  – Sync-Sync is one of 4 mixed-timing FIFOs
• Mixed Async + Sync FIFOs: modular changes
  – Sync-Async: uses Synchronous Put (top) and Asynchronous Get
  – Async-Sync: uses Synchronous Get (bottom) and Asynchronous Put