Designing Networks on Chip: Solutions and Challenges

Luca Benini
DEIS, Università di Bologna

Designing a micro-network
• Physical layer
  – signalling
  – synchronization
• Architecture and control
  – network topology
  – data flow: packetization, encoding
  – control flow: media access, switching, routing
• Software
  – communication API: implicit vs. explicit
  – run-time management
Luca Benini, MPSoC 2002

Physical layer: the channel
• Channel characteristics
  – Global wires: lumped → distributed models
    • Time of flight is non-negligible
  – Inductance effects
    • Reflections, matching issues
• Designing around the channel's transfer function
  – Current mode vs. voltage mode [Dally98,Burleson01]
    • Low swing vs. rail-to-rail
  – Repeater insertion [Friedman01,Burleson02]
  – Wire sizing [Cong01,Alpert01]
  – Pre-emphasis / post-filtering [Horowitz99]
  – Modulation [Dally98,Bogliolo01]

Case study: Low swing signalling
• Pseudodifferential interconnect [Zhang et al., JSSC00] (6x energy reduction vs. CMOS at Vdd = 2 V)
[Figure labels: Static FF; Low Vdd reference (0.5 V); Clocked SA]

Physical layer: synchronization
• Single, global timing reference is not realistic
  – Direct consequence of non-negligible time of flight (tof)
  – Isochronous regions are shrinking
• How and when to sample the channel?
  – Avoid a clock: asynchronous communication
  – The clock travels with the data
  – The clock can be reconstructed from the data
• Synchronization recovery has a cost
  – Cannot be abstracted away
  – Can cause errors (e.g., metastability)
[Figure labels: data lines B1 … Bn; CLK; flip-flop with D, Q, CK]

Case study: Asynchronous bus
• MARBLE SoC Bus [Bainbridge et al. Asynch01]
  – 1-of-4 encoding (4 wires for 2 bits)
  – Delay insensitive
    • No bus clock
    • Wire pipelining
  – High crosstalk immunity
  – Four-phase ACK protocol
[Figure labels: 00, 01, 10, 11; L1, L2, L3, L4]
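A minimal sketch of 1-of-4 encoding, to make "4 wires for 2 bits" concrete. Each 2-bit symbol raises exactly one wire of a 4-wire group, so the receiver can tell that data has arrived without a clock. The function names and the symbol-to-wire mapping (symbol value v drives wire v) are assumptions for illustration, not taken from the MARBLE design.

```python
def encode_1of4(value, width_bits):
    """Encode an integer as 4-wire one-hot groups, least significant pair first."""
    if width_bits % 2 != 0:
        raise ValueError("width_bits must be even: each group carries 2 bits")
    groups = []
    for shift in range(0, width_bits, 2):
        symbol = (value >> shift) & 0b11  # next 2-bit symbol
        # Exactly one of the four wires is high (assumed mapping: wire index = symbol).
        groups.append(tuple(1 if wire == symbol else 0 for wire in range(4)))
    return groups


def decode_1of4(groups):
    """Rebuild the integer; reject groups that are not valid one-hot code words."""
    value = 0
    for i, group in enumerate(groups):
        if len(group) != 4 or sum(group) != 1:
            raise ValueError("invalid code word: exactly one wire per group must be high")
        value |= group.index(1) << (2 * i)
    return value


word = encode_1of4(0b1001, 4)
print(word)               # [(0, 1, 0, 0), (0, 0, 1, 0)]
print(decode_1of4(word))  # 9
```

In a four-phase (return-to-zero) protocol the all-zero group would act as the spacer between symbols, which is why the decoder above treats it as "no data" rather than as a value.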

Physical layer: multiobjective optimization
• Communication is unreliable:
  – Crosstalk, supply noise, synchronization noise ⇒ P_bitflip > 0
• Maximizing S/N by maximizing S is highly suboptimal (energy-wise)
• High performance decreases reliability
  – Shorter eye opening
• Wire redundancy helps
  – But consumes wiring resources
Multiobjective design space: energy vs. performance vs. reliability tradeoffs

Case study: EC vs. ED codes
• Low swing signalling with redundant codes [Bertozzi et al. DATE02]: exploring energy vs. error rate tradeoff
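To illustrate the difference between error detection (ED) and error correction (EC), the sketch below contrasts a single parity bit with a Hamming(7,4) code. These are generic textbook codes chosen for illustration; they are not necessarily the codes evaluated in [Bertozzi et al. DATE02], and all function names are hypothetical.

```python
def parity_encode(data):
    """ED: append one even-parity bit (1 redundant wire for any data width)."""
    return list(data) + [sum(data) % 2]


def parity_ok(word):
    """True if no odd-weight error is detected; a failure would trigger a retransmission."""
    return sum(word) % 2 == 0


def hamming74_encode(data):
    """EC: 4 data bits -> 7-bit code word (3 redundant wires), positions p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(word):
    """Correct up to one flipped bit; return (data, error_position), position 0 = no error."""
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # 1-based index of the faulty bit
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], position


data = [1, 0, 1, 1]

detected = parity_encode(data)
detected[2] ^= 1                 # single bit flip on the link
print(parity_ok(detected))       # False: error detected, data must be resent

corrected = hamming74_encode(data)
corrected[4] ^= 1                # single bit flip on the link
print(hamming74_decode(corrected))  # ([1, 0, 1, 1], 5): error corrected in place
```

The ED scheme spends fewer wires but pays for each error with a retransmission, while the EC scheme spends more wires and decoding logic to recover without one; that is the kind of tradeoff the case study explores.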

NoC architecture: topology
• Point-to-point vs. shared medium
  – Shared medium: on-chip bus
    • Dominant today (e.g. AMBA, CoreConnect, etc.)
    • Unidirectional (vs. off-chip three state)
    • Bridged (high speed vs. peripherals)
    • Performance/power bottleneck
  – Point-to-point: dedicated links
    • Ad-hoc width
    • Ad-hoc control
    • Wiring bottleneck
• Towards multi-stage networks
  – Hierarchical + heterogeneous, e.g. Maia [Rabaey00]
  – Homogeneous, e.g. FPGAs, network processors, …

Topology optimization
• One size does not fit all...
  – Low-area, low performance systems
    • Shared medium (on-chip bus)
  – General-purpose, high performance
    • Homogeneous multi-stage networks
  – Domain-specific, low power
    • Heterogeneous multi-stage networks
• EDA support
  – Physical design (floorplanning, routing, layer assignment)
    • Heterogeneous solution requires strongest EDA support
  – IP-based approach (VSI: topology-neutral)

Case study: hierarchical networks
• AMBA [Flynn Micro97]: bridged bus architecture
• Hierarchical Mesh [Zhang et al. JSSC00]
[Figure labels: cluster; universal (intra-cluster) switch-boxes; hierarchical switch-boxes; cluster switch-boxes]

NoC control: data flow
• Packetization
  – Payload: single-word vs. multi-word
    • E.g. burst transactions in AMBA
  – Header-tail: in packet vs. dedicated channels
    • E.g. SPIN (in-packet) [Guerrier00] vs. AMBA (control signals)
  – Acknowledgement: blocking vs. non-blocking
    • E.g. split transaction bus in Daytona [Ackland00]
• Data representation/encoding
  – Fast hardware-based compression [Benini01]
  – Encoding for low energy/error resiliency […]

NoC data-flow optimization
• Packet size/format optimization
  – Payload vs. control
    • Larger payload ⇒ reduced control overhead
    • Smaller payload ⇒ improved error recovery
  – Dedicated control channels vs. in-packet
    • Control wires overhead (long and slow)
    • Smaller payload (reduced effective bandwidth)
    • Forward (data) and backward (ack) traffic
• Data representation
  – Payload/address compression, low power encodings
    • Compression-decompression cost (performance/power)
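The payload vs. control tradeoff can be put in numbers with a rough sketch. It assumes a fixed-size in-packet header and independent bit flips with probability p_bitflip; both the model and the function names are illustrative assumptions, not part of the original slides.

```python
def payload_efficiency(payload_bits, header_bits):
    """Fraction of transmitted bits that carry payload (higher = less control overhead)."""
    return payload_bits / (payload_bits + header_bits)


def packet_error_probability(payload_bits, header_bits, p_bitflip):
    """Probability that at least one bit of the packet is corrupted (independent flips assumed)."""
    total_bits = payload_bits + header_bits
    return 1.0 - (1.0 - p_bitflip) ** total_bits


HEADER_BITS = 32      # assumed header size
P_BITFLIP = 1e-6      # assumed per-bit error probability

for payload_bits in (32, 128, 1024):
    eff = payload_efficiency(payload_bits, HEADER_BITS)
    err = packet_error_probability(payload_bits, HEADER_BITS, P_BITFLIP)
    print(f"payload={payload_bits:5d} b  efficiency={eff:.3f}  P(packet error)={err:.2e}")
```

Under these assumptions efficiency rises with payload size while the chance that a packet must be recovered rises too, so the best packet size depends on the link error rate.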

Case study: STBus
• Daytona split transaction bus [Ackland JSSC00]
  – Pipelined 128b data, 32b address
  – Multiple outstanding transactions (8b transaction ID)
• Variable packet size (1 - 128 B)
• Multiple types of transactions
  – Explicit data transfers (e.g., IO): RD, WR
  – Cache coherency (modified MESI write-invalidate, snoopy)
• Four priority levels with RR: Instr, Data, Touch, DMA
[Figure: bus timing. Address bus access (A-Bus): arbitrate, drive transaction, compute response, signal status. Data bus access (D-Bus): arbitrate, drive ID, drive data]
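A minimal sketch of how a transaction ID lets an initiator keep several requests outstanding and match responses that return out of order. The class and method names are hypothetical and the bookkeeping is deliberately simplified; it is not a model of the actual Daytona bus logic.

```python
class SplitTransactionTable:
    """Tracks outstanding requests by transaction ID (8-bit IDs assumed, as on the slide)."""

    def __init__(self, id_bits=8):
        self.free_ids = list(range(2 ** id_bits))
        self.pending = {}  # transaction ID -> address of the outstanding request

    def issue(self, address):
        """Address phase: claim an ID and release the bus without waiting for data."""
        if not self.free_ids:
            raise RuntimeError("no free transaction ID; the initiator must stall")
        tid = self.free_ids.pop(0)
        self.pending[tid] = address
        return tid

    def complete(self, tid, data):
        """Data phase: the response carries the ID, so it can arrive in any order."""
        address = self.pending.pop(tid)
        self.free_ids.append(tid)
        return address, data


table = SplitTransactionTable()
first = table.issue(0x1000)
second = table.issue(0x2000)
# The second request is answered before the first one.
print(table.complete(second, "B"))  # (8192, 'B')
print(table.complete(first, "A"))   # (4096, 'A')
```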

NoC control flow
• Shared medium access ⇒ TDMA
  – Bus arbitration (e.g., AMBA)
  – Slot reservation (e.g., SiliconBackplane)
• Switching & routing (multi-stage NoCs)
  – Access
  – Switching
  – Routing

NoC control flow optimization
• Shared-medium protocol optimization
  – Define bus priorities [Lahiri01]
  – Decentralized/pipelined arbitration [Sonics]
  – Slotted access window assignment [Lahiri01]
• Multistage networks
  – Static routing, circuit switching ⇒ FPGAs
  – Dynamic routing, circuit switching ⇒ DPGAs
  – Static routing, packet switching
    • Burst-level switching (virtual circuit)
    • Single packet switching ⇒ STM Octagon [Dey01]
    • Cut-through switching ⇒ SPIN
  – Dynamic routing, packet switching (not yet)

Case study: Slot reservation
• Sonics µNetwork [Wingard DAC01]
  – Two-level arbitration mechanism
• First level: TDMA
  – Time wheel of 256 frames
  – Each frame can be pre-allocated to one initiator
• Second level: Round Robin
  – Only in idle reserved frames or unreserved frames
  – Token passing mechanism (distributed protocol)
• Use first level for regular, heavy traffic sources
• Use second level for sporadic, light traffic sources
[Figure: time wheel of frames 1, 2, 3, ..., 256, wrapping back to 1]
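The two-level scheme can be sketched as a small arbiter: a frame's reserved owner wins if it is requesting, otherwise a round-robin token decides among the remaining requesters. This is a centralized simplification for illustration (the slide describes token passing as a distributed protocol), and the class, method, and initiator names are hypothetical.

```python
class TwoLevelArbiter:
    """First level: TDMA time wheel. Second level: round robin over idle/unreserved frames."""

    def __init__(self, initiators, reservations, wheel_size=256):
        self.initiators = list(initiators)
        # wheel[f] is the initiator that owns frame f, or None if the frame is unreserved.
        self.wheel = [reservations.get(frame) for frame in range(wheel_size)]
        self.frame = 0
        self.token = 0  # index of the initiator that currently holds the token

    def arbitrate(self, requesting):
        """Return the initiator granted the current frame (or None) and advance the wheel."""
        owner = self.wheel[self.frame]
        self.frame = (self.frame + 1) % len(self.wheel)
        if owner is not None and owner in requesting:
            return owner  # first level: reserved frame used by its owner
        # Second level: the frame is unreserved, or reserved but idle.
        count = len(self.initiators)
        for offset in range(count):
            index = (self.token + offset) % count
            if self.initiators[index] in requesting:
                self.token = (index + 1) % count  # pass the token on
                return self.initiators[index]
        return None


arbiter = TwoLevelArbiter(["cpu", "dsp", "dma"], {0: "dsp"}, wheel_size=4)
print(arbiter.arbitrate({"cpu", "dsp"}))  # dsp (owns frame 0)
print(arbiter.arbitrate({"cpu", "dma"}))  # cpu (round robin)
print(arbiter.arbitrate({"cpu", "dma"}))  # dma (token moved past cpu)
```

A regular, heavy source such as the hypothetical "dsp" above gets guaranteed frames, while sporadic sources share whatever is left.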

Programming for NoCs
• The programmer's model
  – Implicit communication: a single-thread application; communication is to/from memory
  – Explicit communication: multiple threads/tasks; communication and synchronization are either fully explicit (message passing) or partially explicit (shared variables)
• Parallelism extraction vs. parallelism support
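The two flavours of explicit communication can be contrasted with a small sketch using Python threads: a queue stands in for message passing, and a lock-protected counter stands in for shared variables. This only illustrates the programming styles; the function names are hypothetical and nothing here models an actual NoC API.

```python
import queue
import threading


def run_message_passing(values):
    """Fully explicit: the producer sends each value, the consumer receives it."""
    channel = queue.Queue()
    received = []

    def producer():
        for value in values:
            channel.put(value)
        channel.put(None)  # end-of-stream marker

    def consumer():
        while True:
            item = channel.get()
            if item is None:
                break
            received.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return received


def run_shared_variable(increments, workers=4):
    """Partially explicit: data moves through memory, only synchronization is explicit."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            with lock:  # explicit synchronization around the shared variable
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return total


print(run_message_passing([1, 2, 3]))  # [1, 2, 3]
print(run_shared_variable(100))        # 400
```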

Explicit communication
• Explicitly parallel programming styles
  – Implicit communication (memory traffic) still relevant
  – Explicit communication (inter-process)
• APIs for explicit communication
  – From multiprocessors (e.g. MPI, pthreads)
  – Support for heterogeneous network fabrics
• Parallel programming API as HW abstraction layers
  – How much abstraction do we need?
[Figure labels: Applications; MPI; HW-abstraction layer; OS/driver; HW-specific layer]

Run-time infrastructure
• Traditional RTOSes
  – Single-processor master
  – Limited support for complex memory hierarchies
  – Focused on performance
• The NoC OS
  – Natively parallel
  – Supports heterogeneous memory, computation, communication
  – Energy/power aware

Case study: MPDSP SDE
• Daytona SDE [Kalavade DAC99]
  – Software design methodology and tools
[Figure labels: Algorithm design environment (Ptolemy/SPW/Matlab); Module design environment (compiler & assembler, module lib.); Static scheduling environment (static applications, parallelizing tools); Dynamic scheduling environment (dynamic application set; run-time kernel: low-overhead, preemptive, multiprocessor, guarantees performance); Simulation and debugging (simulator, debugger, profiling tools); Performance estimation (evaluate schedulers, select scheduling policy, set application priorities)]

Managing system energy
• Power is a primary constraint
• Hardware support for energy efficiency
  – Multiple shutdown states (idle, sleep, etc.)
  – Variable/multiple clock speed
  – Variable voltage
• The OS should manage the degrees of freedom
  – Dynamic power management policies
• In NoCs ⇒ distributed control issue
  – Multi-server systems
  – Interaction with application layer
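As one example of a dynamic power management policy, the sketch below evaluates a fixed-timeout policy: a node stays idle for a timeout, then drops into a sleep state at the cost of a transition energy. The two-state model, the parameter names, and the numbers are assumptions for illustration; the slides do not prescribe a specific policy.

```python
def idle_energy(idle_periods, timeout, p_idle, p_sleep, e_transition):
    """Energy spent over a list of idle periods under a fixed-timeout shutdown policy."""
    total = 0.0
    for period in idle_periods:
        if period <= timeout:
            total += p_idle * period  # never reached the timeout: stayed idle
        else:
            total += p_idle * timeout            # waited for the timeout
            total += e_transition                # sleep + wake-up overhead
            total += p_sleep * (period - timeout)  # remainder spent asleep
    return total


periods = [1.0, 10.0]  # assumed idle period lengths (time units)
always_on = idle_energy(periods, float("inf"), p_idle=1.0, p_sleep=0.1, e_transition=0.5)
with_timeout = idle_energy(periods, 2.0, p_idle=1.0, p_sleep=0.1, e_transition=0.5)
print(always_on, with_timeout)
```

With these assumed numbers the timeout policy saves energy only on the long idle period; on short periods the node never sleeps, which avoids paying the transition cost for nothing.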

Case study: node DPM
• Maia processor [Zhang JSSC00]
  – On-demand node activation (GALS)
[Figure labels: Interconnect; Satellite PE; Handshake & NI; signals D in, REQ in, REQ out, ACK in, ACK out, clk, done]

