SLIDE 1

Designing Networks on Chip: Solutions and Challenges

Luca Benini
DEIS – Università di Bologna

SLIDE 2

Luca Benini, MPSoC 2002

Designing a micro-network

  • Physical layer
    – signalling
    – synchronization
  • Architecture and control
    – network topology
    – data flow: packetization, encoding
    – control flow: media access, switching, routing
  • Software
    – communication API: implicit vs. explicit
    – run-time management

SLIDE 3

Physical layer: the channel

  • Channel characteristics
    – Global wires: lumped → distributed models
      • Time of flight is non-negligible
    – Inductance effects
      • Reflections, matching issues
  • Designing around the channel’s transfer function
    – Current mode vs. voltage mode [Dally98, Burleson01]
      • Low swing vs. rail-to-rail
    – Repeater insertion [Friedman01, Burleson02]
    – Wire sizing [Cong01, Alpert01]
    – Pre-emphasis / post-filtering [Horowitz99]
    – Modulation [Dally98, Bogliolo01]
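To give the "time of flight is non-negligible" point a rough scale, here is a minimal sketch that computes the electromagnetic time of flight along a wire and compares it with a clock period. The wire length, relative permittivity and clock frequency are illustration values assumed here, not figures from the slides, and a real RC-limited wire is slower than this lower bound.

```python
import math

C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s


def time_of_flight(length_m, eps_r):
    """Lower bound on wire delay: length / (c / sqrt(eps_r))."""
    return length_m * math.sqrt(eps_r) / C_VACUUM


def fraction_of_cycle(length_m, eps_r, clock_hz):
    """Time of flight expressed as a fraction of one clock period."""
    return time_of_flight(length_m, eps_r) * clock_hz


if __name__ == "__main__":
    # Assumed illustration values, not taken from the slides:
    # a 20 mm global wire, an SiO2-like dielectric, a 2 GHz clock.
    length_m, eps_r, clock_hz = 0.020, 3.9, 2e9
    tof = time_of_flight(length_m, eps_r)
    print(f"time of flight: {tof * 1e12:.0f} ps")
    print(f"fraction of clock period: "
          f"{fraction_of_cycle(length_m, eps_r, clock_hz):.2f}")
```

Under these assumptions the signal needs a noticeable fraction of a clock period just to cross the wire, which is why a lumped wire model stops being adequate.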


SLIDE 4

Case study: Low Swing signalling

  • Pseudodifferential interconnect [Zhang et al., JSSC00] (6× energy reduction vs. CMOS at Vdd = 2 V)

[Figure labels: static FF, clocked SA, low-Vdd reference (0.5 V)]

SLIDE 5

Physical layer: synchronization

  • A single, global timing reference is not realistic
    – Direct consequence of non-negligible time of flight
    – Isochronous regions are shrinking
  • How and when to sample the channel?
    – Avoid a clock: asynchronous communication
    – The clock travels with the data
    – The clock can be reconstructed from the data
  • Synchronization recovery has a cost
    – Cannot be abstracted away
    – Can cause errors (e.g., metastability)

[Figure labels: data lines B1…Bn, CLK; flip-flop pins D, Q, CK]

SLIDE 6

Case study: Asynchronous Bus

  • MARBLE SoC Bus [Bainbridge et al. Asynch01]
    – 1-of-4 encoding (4 wires for 2 bits)
    – Delay insensitive: no bus clock, wire pipelining
    – High crosstalk immunity
    – Four-phase ACK protocol

[Figure labels: 2-bit values 00, 01, 10, 11; lines L1, L2, L3, L4]
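The 1-of-4 code can be sketched in a few lines: every 2-bit symbol raises exactly one of four wires, so the receiver can tell that data has arrived without a clock. The sketch below is a generic illustration, not MARBLE's implementation; the wire ordering (value 0 on the first wire, value 3 on the fourth) and the function names are assumptions.

```python
SPACER = (0, 0, 0, 0)  # all wires low: the return-to-zero phase between symbols


def encode_1of4(value, n_bits):
    """Split value into 2-bit symbols (least significant first).

    Each symbol becomes a 4-wire group with exactly one wire high.
    """
    if n_bits % 2 != 0:
        raise ValueError("n_bits must be even")
    groups = []
    for shift in range(0, n_bits, 2):
        symbol = (value >> shift) & 0b11
        groups.append(tuple(1 if wire == symbol else 0 for wire in range(4)))
    return groups


def decode_1of4(groups):
    """Rebuild the integer; reject groups that are not valid code words."""
    value = 0
    for index, wires in enumerate(groups):
        if len(wires) != 4 or sum(wires) != 1:
            raise ValueError("not a valid 1-of-4 code word")
        value |= wires.index(1) << (2 * index)
    return value


if __name__ == "__main__":
    word = encode_1of4(0b1101, 4)
    print(word)
    print(decode_1of4(word))
```

Because a valid group always has exactly one wire high, validity itself signals "data present", and only one wire per group switches per transfer.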

SLIDE 7

Physical layer: multiobjective optimization

  • Communication is unreliable
    – Crosstalk, supply noise, synchronization noise ⇒ Pbitflip > 0
  • Maximizing S/N by maximizing S is highly suboptimal (energy-wise)
  • High performance decreases reliability
    – Shorter eye opening
  • Wire redundancy helps
    – But consumes wiring resources

Multiobjective design space: energy vs. performance vs. reliability tradeoffs
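A minimal sketch of the energy vs. reliability tension, under assumptions that are not in the slides: additive Gaussian noise with standard deviation sigma, a receiver threshold at half the swing, and switching energy proportional to C·Vswing². Under that model, the bit-flip probability is Q(Vswing / (2·sigma)).

```python
import math


def bit_flip_probability(swing_v, noise_sigma_v):
    """P(noise crosses the mid-swing threshold) for Gaussian noise."""
    return 0.5 * math.erfc(swing_v / (2.0 * noise_sigma_v * math.sqrt(2.0)))


def energy_per_transition(swing_v, cap_f):
    """Assumed model: energy proportional to C * Vswing^2."""
    return cap_f * swing_v ** 2


if __name__ == "__main__":
    sigma = 0.05   # assumed noise standard deviation, volts
    cap = 1e-12    # assumed wire capacitance, farads
    for swing in (0.2, 0.5, 1.0, 2.0):
        p = bit_flip_probability(swing, sigma)
        e = energy_per_transition(swing, cap)
        print(f"swing={swing:.1f} V  Pbitflip={p:.3e}  energy={e:.2e} J")
```

In this model, raising the swing lowers the error rate only at a quadratic energy cost, which is the sense in which pure signal maximization is energy-inefficient.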

SLIDE 8

Case study: EC vs. ED codes

  • Low swing signalling with redundant codes [Bertozzi et al. DATE02]: exploring the energy vs. error rate tradeoff
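To make the error-detecting (ED) vs. error-correcting (EC) distinction concrete, here is a small sketch with two textbook codes chosen for illustration: a single even-parity bit, which detects an error and leaves recovery to retransmission, and a Hamming(7,4) code, which corrects one flipped bit in place. These specific codes are an assumption of this sketch; the slide does not say which codes the cited work evaluates.

```python
def add_parity(bits):
    """Error detection: append one even-parity bit."""
    parity = 0
    for b in bits:
        parity ^= b
    return list(bits) + [parity]


def parity_ok(word):
    """True if the word has even parity (no error detected)."""
    parity = 0
    for b in word:
        parity ^= b
    return parity == 0


def hamming74_encode(d):
    """Error correction: encode 4 data bits into a 7-bit code word."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s4  # 1-based position; 0 means no error
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    sent = hamming74_encode(data)
    received = list(sent)
    received[4] ^= 1  # one bit flipped on the wire
    print(hamming74_decode(received) == data)
    print(parity_ok(add_parity(data)))
```

For 4 data bits, the parity scheme needs 5 wires plus a retransmission path, while the Hamming scheme needs 7 wires and a more complex decoder; which one costs less energy at a given error rate is the tradeoff the slide refers to.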
SLIDE 9

NoC Architecture: topology

  • Point-to-point vs. shared medium
    – Shared medium: on-chip bus
      • Dominant today (e.g. AMBA, CoreConnect, etc.)
      • Unidirectional (vs. off-chip three-state)
      • Bridged (high speed vs. peripherals)
      • Performance/power bottleneck
    – Point-to-point: dedicated links
      • Ad-hoc width
      • Ad-hoc control
      • Wiring bottleneck
  • Towards multi-stage networks
    – Hierarchical + heterogeneous, e.g. Maia [Rabaey00]
    – Homogeneous, e.g. FPGAs, network processors, …

[Figure labels: P1, P2, P3, A, mux]

SLIDE 10

Topology optimization

  • One size does not fit all...
    – Low-area, low-performance systems
      • Shared medium (on-chip bus)
    – General-purpose, high performance
      • Homogeneous multi-stage networks
    – Domain-specific, low power
      • Heterogeneous multi-stage networks
  • EDA support
    – Physical design (floorplanning, routing, layer assignment)
      • Heterogeneous solution requires strongest EDA support
    – IP-based approach (VSI: topology-neutral)

SLIDE 11

Case study: hierarchical networks

  • AMBA [Flynn Micro97]: bridged bus architecture
  • Hierarchical Mesh [Zhang et al. JSSC00]

[Figure labels: clusters; cluster switch-boxes; hierarchical switch-boxes; universal (intra-cluster) switch-boxes]

SLIDE 12

NoC control: data flow

  • Packetization
    – Payload: single-word vs. multi-word
      • E.g. burst transactions in AMBA
    – Header-tail: in packet vs. dedicated channels
      • E.g. SPIN (in-packet) [Guerrier00] vs. AMBA (control signals)
    – Acknowledgement: blocking vs. non-blocking
      • E.g. split-transaction bus in Daytona [Ackland00]
  • Data representation/encoding
    – Fast hardware-based compression [Benini01]
    – Encoding for low energy/error resiliency […]

SLIDE 13

NoC data-flow optimization

  • Packet size/format optimization
    – Payload vs. control
      • Larger payload ⇒ reduced control overhead
      • Smaller payload ⇒ improved error recovery
    – Dedicated control channels vs. in-packet
      • Control wires overhead (long and slow)
      • Smaller payload (reduced effective bandwidth)
      • Forward (data) and backward (ack) traffic
  • Data representation
    – Payload/address compression, low power encodings
      • Compression-decompression cost (performance/power)
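The payload vs. control tradeoff can be put into numbers with a simple model. The sketch below assumes independent bit errors with probability p and whole-packet retransmission on any error; both are assumptions made for illustration, as are the header size and error rate used in the demo.

```python
def control_overhead(payload_bits, header_bits):
    """Fraction of transmitted bits that are control, not payload."""
    return header_bits / (payload_bits + header_bits)


def packet_error_probability(payload_bits, header_bits, p_bit):
    """Probability that at least one bit of the packet is corrupted."""
    return 1.0 - (1.0 - p_bit) ** (payload_bits + header_bits)


def goodput_fraction(payload_bits, header_bits, p_bit):
    """Useful payload bits delivered per transmitted bit.

    With retransmission until success, the expected number of attempts
    is 1 / (1 - P_packet_error), so goodput = efficiency * (1 - P_err).
    """
    total = payload_bits + header_bits
    return (payload_bits / total) * (1.0 - p_bit) ** total


if __name__ == "__main__":
    header, p_bit = 32, 1e-4  # assumed values for illustration
    for payload in (32, 128, 512, 2048, 8192, 32768):
        print(payload,
              round(control_overhead(payload, header), 3),
              round(packet_error_probability(payload, header, p_bit), 3),
              round(goodput_fraction(payload, header, p_bit), 3))
```

In this model, overhead falls as the payload grows while the packet error probability rises, so goodput peaks at an intermediate payload size that depends on the header size and the bit error rate.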
SLIDE 14

Case study: STBus

  • Daytona split transaction bus [Ackland JSSC00]
    – Pipelined 128b data, 32b address
    – Multiple outstanding transactions (8b transaction ID)
  • Variable packet size (1 - 128 B)
  • Multiple types of transactions
    – Explicit data transfers (e.g., IO): RD, WR
    – Cache coherency (modified MESI write-invalidate, snoopy)
  • Four priority levels with RR: Instr, Data, Touch, DMA

[Figure: address bus access (arbitrate A-Bus, drive transaction, compute response, signal status); data bus access (arbitrate D-Bus, drive ID, drive data)]

SLIDE 15

NoC control flow

  • Shared medium access ⇒ TDMA
    – Bus arbitration (e.g., AMBA)
    – Slot reservation (e.g., SiliconBackplane)
  • Switching & routing (multi-stage NoCs)
    – Access
    – Switching
    – Routing

SLIDE 16

NoC control flow optimization

  • Shared-medium protocol optimization
    – Define bus priorities [Lahiri01]
    – Decentralized/pipelined arbitration [Sonics]
    – Slotted access window assignment [Lahiri01]
  • Multistage networks
    – Static routing, circuit switching ⇒ FPGAs
    – Dynamic routing, circuit switching ⇒ DPGAs
    – Static routing, packet switching
      • Burst-level switching (virtual circuit)
      • Single packet switching ⇒ STM Octagon [Dey01]
      • Cut-through switching ⇒ SPIN
    – Dynamic routing, packet switching (not yet)
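The practical difference between forwarding whole packets and cut-through switching shows up in zero-load latency. The sketch below uses a common textbook model with no contention. Reading "single packet switching" as store-and-forward is an assumption of this sketch, and the parameter names and values are illustrative only.

```python
def store_and_forward_latency(hops, flits, router_cycles):
    """Each hop waits for the whole packet before forwarding it."""
    return hops * (router_cycles + flits)


def cut_through_latency(hops, flits, router_cycles):
    """The header is forwarded as soon as it is routed.

    The header pays the router delay plus one link cycle per hop;
    the remaining flits follow in a pipeline behind it.
    """
    return hops * (router_cycles + 1) + (flits - 1)


if __name__ == "__main__":
    # Assumed illustration values: 4 hops, 2-cycle routers.
    for flits in (1, 8, 64):
        print(flits,
              store_and_forward_latency(4, flits, 2),
              cut_through_latency(4, flits, 2))
```

In this model the two schemes coincide for single-flit packets, and the cut-through advantage grows with packet length because the packet length is paid once instead of once per hop.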

SLIDE 17

Case study: Slot reservation

  • Sonics µNetwork [Wingard DAC01]
    – Two-level arbitration mechanism
  • First level: TDMA
    – Time wheel of 256 frames
    – Each frame can be pre-allocated to one initiator
  • Second level: Round Robin
    – Only in idle reserved frames or unreserved frames
    – Token passing mechanism (distributed protocol)
  • Use first level for regular, heavy traffic sources
  • Use second level for sporadic, light traffic sources

[Figure: time wheel of frames 1, 2, 3, …, 256, wrapping back to 1]
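The two-level scheme described on this slide can be sketched as a small behavioural model: a time wheel whose frames may be reserved for one initiator, with a round-robin token as fallback when the frame is unreserved or its owner is idle. The class and method names are hypothetical, and the model is centralized for simplicity, whereas the slide describes the token passing as a distributed protocol.

```python
from typing import Optional


class TwoLevelArbiter:
    """TDMA time wheel (first level) with round-robin fallback (second level)."""

    def __init__(self, n_initiators, wheel_size=256):
        self.n_initiators = n_initiators
        self.wheel = [None] * wheel_size  # frame index -> reserved initiator
        self.frame = 0                    # current position on the wheel
        self.token = 0                    # next initiator to be offered the token

    def reserve(self, frame, initiator):
        """Pre-allocate one frame of the wheel to one initiator."""
        if not 0 <= initiator < self.n_initiators:
            raise ValueError("unknown initiator")
        self.wheel[frame] = initiator

    def grant(self, requests) -> Optional[int]:
        """Return the initiator granted in the current frame, then advance."""
        owner = self.wheel[self.frame]
        self.frame = (self.frame + 1) % len(self.wheel)
        # First level: the frame owner wins if it actually has a request.
        if owner is not None and owner in requests:
            return owner
        # Second level: idle reserved frame or unreserved frame.
        for offset in range(self.n_initiators):
            candidate = (self.token + offset) % self.n_initiators
            if candidate in requests:
                self.token = (candidate + 1) % self.n_initiators
                return candidate
        return None


if __name__ == "__main__":
    arbiter = TwoLevelArbiter(n_initiators=3, wheel_size=4)
    arbiter.reserve(0, 2)  # initiator 2 owns frame 0
    for requests in ({0, 1, 2}, {0, 1}, {0, 1}, {0}, {0, 1}):
        print(sorted(requests), "->", arbiter.grant(requests))
```

Reserved frames give a heavy, regular source a guaranteed share of the bus, while the round-robin level lets sporadic sources reuse whatever the reservations leave idle.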

SLIDE 18

Programming for NoCs

  • The programmer’s model
    – Implicit communication: a single-thread application; communication is to/from memory
    – Explicit communication: multiple threads/tasks; communication and synchronization are either fully explicit (message passing) or partially explicit (shared variables)
  • Parallelism extraction vs. parallelism support
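The two flavours of explicit communication can be contrasted in a short sketch using Python's standard threading and queue modules as a stand-in for a NoC programming API; the function names are hypothetical. In the message-passing version both data movement and synchronization go through a channel; in the shared-variable version the data movement is an ordinary memory update and only the synchronization (a lock) is explicit.

```python
import queue
import threading


def message_passing_sum(values):
    """Fully explicit: every value travels as a message over a channel."""
    channel = queue.Queue()
    result = []

    def producer():
        for v in values:
            channel.put(v)
        channel.put(None)  # end-of-stream marker

    def consumer():
        total = 0
        while True:
            item = channel.get()
            if item is None:
                break
            total += item
        result.append(total)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


def shared_variable_sum(values, n_workers=2):
    """Partially explicit: shared memory carries the data, a lock orders access."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # explicit synchronization around the shared update
            total += partial

    threads = [threading.Thread(target=worker, args=(values[i::n_workers],))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    data = list(range(100))
    print(message_passing_sum(data), shared_variable_sum(data))
```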
SLIDE 19

Explicit communication

  • Explicitly parallel programming styles
    – Implicit communication (memory traffic) still relevant
    – Explicit communication (inter-process)
  • APIs for explicit communication
    – From multiprocessors (e.g. MPI, pthreads)
    – Support for heterogeneous network fabrics
  • Parallel programming API as HW abstraction layers
    – How much abstraction do we need?

[Figure labels: applications, MPI, OS/driver, HW-abstraction layer, HW-specific layer]

SLIDE 20

Run-time infrastructure

  • Traditional RTOSes
    – Single-processor master
    – Limited support for complex memory hierarchies
    – Focused on performance
  • The NoC OS
    – Natively parallel
    – Supports heterogeneous memory, computation, communication
    – Energy/power aware

SLIDE 21

Case study: MPDSP SDE

  • Daytona SDE [Kalavade DAC99]
    – Software design methodology and tools

[Figure: software design environment, with these blocks]
    – Algorithm design environment: Ptolemy/SPW/Matlab
    – Dynamic scheduling environment: run-time kernel (low-overhead, preemptive, multiprocessor, guarantees performance)
    – Static scheduling environment: parallelizing tools, performance estimation
    – Evaluate schedulers, select scheduling policy, set application priorities
    – Module design environment: compiler & assembler, simulation and debugging
    – Simulator, debugger, profiling tools
    – Static applications, module library, dynamic application set

SLIDE 22

Managing system energy

  • Power is a primary constraint
  • Hardware support for energy efficiency
    – Multiple shutdown states (idle, sleep, etc.)
    – Variable/multiple clock speed
    – Variable voltage
  • The OS should manage the degrees of freedom
    – Dynamic power management policies
  • In NoCs ⇒ distributed control issue
    – Multi-server systems
    – Interaction with application layer
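One of the simplest dynamic power management policies is a fixed timeout: a node that has been idle longer than the timeout is put to sleep, paying a transition energy to save idle power. The sketch below compares that policy with an always-on node over a list of idle periods. The policy choice, the zero-latency transition and all numeric values are assumptions made for illustration; the slides do not name a specific policy.

```python
def idle_energy_always_on(idle_periods, p_idle):
    """Energy spent if the node stays in the idle state throughout."""
    return p_idle * sum(idle_periods)


def idle_energy_timeout(idle_periods, timeout, p_idle, p_sleep, e_transition):
    """Energy spent under a fixed-timeout shutdown policy.

    The node idles for `timeout`, then sleeps for the rest of the period.
    e_transition covers entering and leaving the sleep state; transition
    time is ignored in this simplified model.
    """
    energy = 0.0
    for period in idle_periods:
        if period <= timeout:
            energy += p_idle * period
        else:
            energy += (p_idle * timeout
                       + e_transition
                       + p_sleep * (period - timeout))
    return energy


if __name__ == "__main__":
    # Assumed illustration values: seconds, watts, joules.
    periods = [1.0, 10.0, 0.5, 30.0]
    print(idle_energy_always_on(periods, p_idle=1.0))
    print(idle_energy_timeout(periods, timeout=2.0, p_idle=1.0,
                              p_sleep=0.1, e_transition=0.5))
```

Shutdown only pays off for idle periods long enough to amortize the transition energy, so a policy of this kind depends on how well idle-period lengths can be anticipated. In a NoC this decision is made per node, which is where the distributed control issue arises.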

SLIDE 23

Case study: node DPM

  • Maia processor [Zhang JSSC00]
    – On-demand node activation (GALS)

[Figure labels: interconnect; satellite PE; handshake & NI; signals REQin, ACKin, REQout, ACKout, Din, clk, done]

SLIDE 24

Summary

  • Trend toward NoCs
    – Physics/technology drives us there
  • A methodology to design/use NoCs
    – A layered approach
  • Some solutions are already out there
    – EDA support is essential
    – Software infrastructure is key