Addressing the System-on-a-Chip Interconnect Woes Through Communication-Based Design



slide-1
SLIDE 1

Addressing the System-on-a-Chip Interconnect Woes Through Communication-Based Design

J. Rabaey, M. Sgroi, M. Sheets, A. Mihal, K. Keutzer, S. Malik, A. Sangiovanni-Vincentelli

University of California, Berkeley and Princeton University

slide-2
SLIDE 2

The SOC Interconnect Challenge

slide-3
SLIDE 3

The SOC Interconnect Challenge

“Femme se coiffant”, Pablo Ruiz Picasso, 1940

slide-4
SLIDE 4

The SOC Interconnect Challenge

Ad-hoc Approach

Diagram labels (blocks): Bridge, DMA, CPU, DSP, Mem Ctrl., MPEG, C, I, O, O
Diagram labels (interconnect): System Bus, Peripheral Bus, Control Wires, Custom Interfaces

slide-5
SLIDE 5

The SOC Interconnect Challenge

Ad-hoc Approach

Diagram labels (blocks): Bridge, DMA, CPU, DSP, Mem Ctrl., MPEG, C, I, O, O
Diagram labels (interconnect): System Bus, Peripheral Bus, Control Wires, Custom Interfaces

Alternative:

A disciplined SOC interconnect design approach that addresses

  • reliability
  • predictability
  • performance
  • power dissipation

concerns caused by deep-submicron effects and complexity considerations, and exploits advanced communication techniques.

slide-6
SLIDE 6

The Network-on-a-Chip (NOC) Approach

Diagram labels: Embedded Processors, Memory Sub-system, Baseband Processing, Configurable Accelerators, Programmable Protocol Stack, Interconnect Backplane

Communication-based Design

  • Orthogonalizes function and communication
  • Builds on well-known models-of-computation and correct-by-construction synthesis flow
  • Parallels layered approach exploited by communications community

slide-7
SLIDE 7

How Does the Communication Network World Deal with these Problems?

  • Scalable clusters of heterogeneous networks
  • Wide range of data units at different levels of abstraction (streams, packets, bits)
  • With varying throughput, latency and reliability requirements

Diagram labels: Clusters, Massive Cluster, Gigabit Ethernet

Central tenet: Layered approach standardized as the ISO-OSI Reference Model.

slide-8
SLIDE 8

The ISO Protocol Stack

  • Reference model for wired and wireless protocol design
    – Also a useful guide for conception and decomposition of NOCs
  • Layered approach allows for orthogonalization of concerns and decomposition of constraints
  • Not required to implement all layers of the stack
    – depends upon application needs and technology
  • Layered structure need not be maintained in final implementation
    – e.g., multiple layers can be merged in implementation optimization

Stack (top to bottom): Presentation/Application, Session, Transport, Network, Data Link, Physical

slide-9
SLIDE 9

The ISO Protocol Stack

Physical layer: Transmit bits over physical interconnect medium (signal waveform, voltages, timing, synchronization)

Example: synchronous reduced-swing pulse-based signaling

slide-10
SLIDE 10

The ISO Protocol Stack

Data Link layer: Reliable transmission over physical link + media access control (MAC) (error detection and coding, multiple-access scheme, arbitration)

Example: Bus

slide-11
SLIDE 11

The ISO Protocol Stack

Network layer: Topology-independent end-to-end communication over multiple data links (routing, bridging, repeaters)

Example: Statically configured mesh network of an FPGA

slide-12
SLIDE 12

The ISO Protocol Stack

Transport layer: Establish and maintain end-to-end communications (flow control, message reordering, packet segmentation and reassembly)

Example: Establish, maintain and rip up connections in dynamically reconfigurable SOCs

slide-13
SLIDE 13

The ISO Protocol Stack

Session layer: Adds state to the end-to-end connection provided by the protocol stack

Example: Synchronous messaging, requiring sender and receiver to rendezvous using a semaphore
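
The rendezvous in this example can be sketched in software. The following is a minimal illustration, not the mechanism used in any of the designs in this talk: the class name `RendezvousChannel` and its methods are hypothetical, and two counting semaphores stand in for whatever synchronization primitive the platform provides.

```python
import threading


class RendezvousChannel:
    """Hypothetical synchronous (blocking) message channel.

    send() does not return until a receiver has taken the item,
    so sender and receiver meet at the same point in time.
    """

    def __init__(self):
        self._item = None
        self._full = threading.Semaphore(0)   # signalled when an item is waiting
        self._taken = threading.Semaphore(0)  # signalled when the item was read
        self._senders = threading.Lock()      # one sender in the slot at a time

    def send(self, item):
        with self._senders:
            self._item = item
            self._full.release()    # tell the receiver an item is available
            self._taken.acquire()   # block until the receiver has read it

    def receive(self):
        self._full.acquire()        # block until a sender has deposited an item
        item = self._item
        self._taken.release()       # let the sender continue
        return item
```

A sender thread calling `send()` and a receiver thread calling `receive()` each block until the other arrives, which is the state the session layer adds on top of a plain connection.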

slide-14
SLIDE 14

The ISO Protocol Stack

Presentation/Application layer: Exports communication architecture to system and performs data formatting and conversion

Example: Change byte-ordering of data to ensure compatibility
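
As a small illustration of the byte-ordering example, the sketch below re-encodes 32-bit words between little-endian and big-endian form using Python's standard `struct` module. The function names `swap32` and `convert_buffer` and the choice of 32-bit words are assumptions made for the illustration.

```python
import struct


def swap32(word):
    """Reverse the byte order of one 32-bit unsigned word."""
    return struct.unpack("<I", struct.pack(">I", word))[0]


def convert_buffer(raw, src_order, dst_order):
    """Re-encode a buffer of 32-bit words from one byte order to another.

    src_order and dst_order are struct prefixes:
    "<" for little-endian, ">" for big-endian.
    """
    if len(raw) % 4 != 0:
        raise ValueError("buffer length must be a multiple of 4 bytes")
    count = len(raw) // 4
    words = struct.unpack(f"{src_order}{count}I", raw)
    return struct.pack(f"{dst_order}{count}I", *words)
```

A module that produces little-endian data can then feed a big-endian consumer through `convert_buffer(data, "<", ">")` without either side changing its internal format.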

slide-15
SLIDE 15

Example: The Pleiades Network-on-a-Chip

Diagram labels: Configuration, Dedicated Arithmetic, Configuration Bus, Reconfigurable Interconnect Network, Embedded Processor, FPGA, Memory, Address Generator, Arithmetic Processor, Arithmetic Processor, Network Interface

  • Programmable/configurable platform intended for low-energy communication and signal-processing applications (wireless, media)
  • Allows for dynamic task-level reconfiguration of large-granularity modules into dedicated “data-flow” accelerators

[Zhang, ISSCC 00]

slide-16
SLIDE 16

Maia: Reconfigurable Baseband Processor for Wireless

slide-17
SLIDE 17

A Session-level Perspective

Code segment (between the start and end markers):

    for (i = 1; i <= L; i++)
        for (k = i; k <= L; k++)
            phi[i][k] = phi[i-1][k-1]
                      + in[NP-i] * in[NP-k]
                      - in[NA-1-i] * in[NA-1-k];

Diagram labels: Embedded processor; Code seg; AddrGen, MEM: in; AddrGen, MEM: phi; MPY, MPY; ALU, ALU

Set up connections
“Configure” modules

slide-18
SLIDE 18

The Network Layer

Hierarchical reconfigurable mesh network

Diagram labels: Universal Switchbox, Hierarchical Switchbox, Cluster, Level-1 Mesh, Level-2 Mesh

  • Network statically configured at start of session and ripped up at end
  • Structured approach reduces interconnect energy by a factor of 7 over a straightforward cross-bar

slide-19
SLIDE 19

The Physical Layer

Diagram labels: Reconfigurable Network; Co-Processor Module (µProc, ALU, MPY, SRAM…); signals din, reqin, ackin, dout, reqout, ackout, Din, REQin, done

Globally Asynchronous: 2-phase self-timed handshaking protocol

Allows individual modules to dynamically trade off performance for energy efficiency
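
To make the 2-phase handshake concrete, here is a behavioral sketch of generic 2-phase (transition-signalling) handshaking. It is not a model of the actual Pleiades circuit: the class `TwoPhaseLink` is hypothetical, and `req`, `ack` and `data` only loosely mirror the reqin/ackin/din signals in the diagram. In a 2-phase protocol every toggle of `req` announces new data and every toggle of `ack` confirms it was taken, so there is no return-to-zero phase.

```python
class TwoPhaseLink:
    """Behavioral model of a 2-phase (transition-signalling) handshake."""

    def __init__(self):
        self.req = 0      # toggled by the sender for each new data item
        self.ack = 0      # toggled by the receiver for each item consumed
        self.data = None

    def can_send(self):
        # The link is idle when both wires carry the same value.
        return self.req == self.ack

    def send(self, value):
        if not self.can_send():
            raise RuntimeError("previous item not yet acknowledged")
        self.data = value
        self.req ^= 1     # a single transition signals "data valid"

    def can_receive(self):
        # Data is pending when the wires differ.
        return self.req != self.ack

    def receive(self):
        if not self.can_receive():
            raise RuntimeError("no data pending")
        value = self.data
        self.ack ^= 1     # a single transition signals "data taken"
        return value
```

Because each side only waits for the other's transition, a slow module simply delays its toggle. That is what lets individual modules run at their own speed.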

slide-20
SLIDE 20

The Physical Layer

Diagram labels: Reconfigurable Network; Physical Layer Interface Module; Co-Processor Module (ALU, MPY, SRAM…); signals din, reqin, ackin, dout, reqout, ackout, clk, Din, REQin, Clk, done

Locally synchronous
Globally Asynchronous

slide-21
SLIDE 21

The Physical Layer

Diagram labels: Reconfigurable Network; Physical Layer Interface Module; Co-Processor Module (ALU, MPY, SRAM…); signals din, reqin, ackin, dout, reqout, ackout, clk, done; level-converters with clk, in, out, A, B; voltage levels 0.4 V and 1 V

Reduced voltage swing on interconnect reduces energy by a factor of 3.4

slide-22
SLIDE 22

Metropolis Design Methodology

Diagram labels: P, A, P1’, P2’, P1”, P2”

  • Orthogonalization of concerns: separation of communication and computation
  • Formal system representation (supporting multiple Models of Computation)
  • Formal Methodology for Communication Refinement: sequence of adaptation steps between objects (processes and channels) with incompatible behaviors

slide-23
SLIDE 23

Metropolis Design Methodology

Diagram labels: P1, P2; 3 kb, 1 kb

Behavior Adapter: Adapt communicating processes with incompatible behaviors

slide-24
SLIDE 24

Metropolis Design Methodology

Diagram labels: P1, P2; 3 kb, 1 kb; BA, BA; Segmentation

Behavior Adapter: Adapt communicating processes with incompatible behaviors
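
A behavior adapter that performs segmentation can be sketched as follows. This is an illustration only, not the Metropolis implementation. It assumes the "3 kb" and "1 kb" labels are message sizes of 3 × 1024 and 1024 bits, and the function names `segment` and `reassemble` are hypothetical.

```python
def segment(message, size):
    """Split a message (bytes) into chunks of at most `size` bytes."""
    if size <= 0:
        raise ValueError("segment size must be positive")
    return [message[i:i + size] for i in range(0, len(message), size)]


def reassemble(chunks):
    """Inverse of segment(): join in-order chunks back into one message."""
    return b"".join(chunks)


# P1 emits 3 kb messages, P2 accepts 1 kb messages (sizes assumed, in bytes).
P1_MESSAGE_BYTES = 3 * 1024 // 8   # 384 bytes
P2_MESSAGE_BYTES = 1024 // 8       # 128 bytes
```

With these sizes, `segment(msg, P2_MESSAGE_BYTES)` turns each message from P1 into three messages that P2 can accept, and neither process has to change.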

slide-25
SLIDE 25

Metropolis Design Methodology

Diagram labels: P1’, P2’; 1 kb, 1 kb

Behavior Adapter: Adapt communicating processes with incompatible behaviors

slide-26
SLIDE 26

Metropolis Design Methodology

Channel Selection: Select a (non-ideal) channel that physically transports messages

Diagram labels: P1’, P2’; Wireless Channel (BER = 10^-3); Globally Asynchronous Locally Synchronous Model
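
To see why a channel with a bit error rate of 10^-3 is "non-ideal" for these messages, the short calculation below estimates how often a whole packet is corrupted. It assumes independent bit errors and a 1000-bit packet (reading "1 kb" on the earlier slides as roughly 1000 bits). Neither assumption is stated in the slides.

```python
def packet_error_rate(ber, bits):
    """Probability that at least one of `bits` bits is flipped,
    assuming independent errors with probability `ber` each."""
    return 1.0 - (1.0 - ber) ** bits


def expected_transmissions(ber, bits):
    """Mean number of sends until one packet arrives error-free,
    assuming every corrupted packet is detected and resent."""
    return 1.0 / (1.0 - packet_error_rate(ber, bits))


per = packet_error_rate(1e-3, 1000)            # about 0.63
sends = expected_transmissions(1e-3, 1000)     # about 2.7
```

Under these assumptions, roughly two out of three packets arrive with at least one bit error, so the raw channel cannot meet a reliable-delivery requirement without further adaptation.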

slide-27
SLIDE 27

Metropolis Design Methodology

Channel Adaptation: Adapt the behaviors of processes and channel to meet communication requirements

Diagram labels: P1’, P2’; CA (four instances)

CA: CRC + retransmission of incorrect packets
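
A channel adapter of this kind can be sketched as a stop-and-wait loop: append a CRC, send, and resend until the receiver's check passes. The sketch below uses CRC-32 from Python's `zlib` module as a stand-in. The slides do not say which CRC or retransmission scheme is used, and the bit-flipping channel model, the function names and the error-free acknowledgement path are all assumptions.

```python
import random
import zlib


def frame(payload):
    """Append a 4-byte CRC-32 to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(framed):
    """Return the payload if its CRC matches, otherwise None."""
    payload, crc = framed[:-4], framed[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") == crc:
        return payload
    return None


def noisy_channel(framed, ber, rng):
    """Flip each bit independently with probability `ber` (assumed model)."""
    out = bytearray(framed)
    for bit in range(len(out) * 8):
        if rng.random() < ber:
            out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)


def send_reliably(payload, ber, rng, max_tries=50):
    """Resend until the receiver's CRC check passes.

    Returns (payload, attempts). The acknowledgement path is assumed ideal.
    """
    for attempt in range(1, max_tries + 1):
        received = check(noisy_channel(frame(payload), ber, rng))
        if received is not None:
            return received, attempt
    raise RuntimeError("no error-free delivery within max_tries")
```

The two adapters on either side of the channel correspond to `frame` plus the retry loop on the sender side and `check` on the receiver side. The processes themselves see a channel that appears reliable.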

slide-28
SLIDE 28

Metropolis Design Methodology

Optimization: Merge adapters and processes

Diagram labels: P1’’, P2’’

slide-29
SLIDE 29

Example: Wireless Application Protocol

Internet Protocol Stack: PAP, HTTP, TCP, IP
WAP Protocol Stack: OTA, WSP, WTLS, WTP

Annotations: “over-the-air”, “push-application protocol”

slide-30
SLIDE 30

Example: Wireless Application Protocol

Internet Protocol Stack: PAP, HTTP, TCP, IP
WAP Protocol Stack: OTA, WSP, WTLS, WTP

slide-31
SLIDE 31

Example: Wireless Application Protocol

Behavior Adaptation

Diagram labels: PAP, HTTP, TCP, IP, WSP, WTLS, OTA, WTP; Gateway: HTTP, Encoding, OTA, WSP, WTLS, WTP, TCP, IP

slide-32
SLIDE 32

Example: Wireless Application Protocol

Channel Selection

Diagram labels: PAP, HTTP, TCP, IP, WSP, WTLS, OTA, WTP; Gateway: HTTP, Encoding, OTA, WSP, WTLS, WTP, TCP, IP; channels: GSM-SMS, ATM, Fiber Optic, Ether

slide-33
SLIDE 33

Example: Wireless Application Protocol

Channel Adaptation

Diagram labels: PAP, HTTP, TCP, IP, WSP, WTLS, OTA, WTP; Gateway: HTTP, Encoding, OTA, WSP, WTLS, WTP, TCP, IP; channels: GSM-SMS, ATM; adapters: AAL, WDP

slide-34
SLIDE 34

Example: The PicoRadio II (TCI) Design

slide-35
SLIDE 35

Example: The PicoRadio II (TCI) Design

Diagram labels: 2 Mbit flash, Flash Ctrl., CPU (Xtensa), 1 kB I$, 64 kB IRAM, 64 kB DRAM, Network Protocol, MAC, Baseband, FPGA, I/O, to RF

slide-36
SLIDE 36

Example: The PicoRadio II (TCI) Design

Layout labels: DATA SRAM 64Kbit, INSTRUCTION SRAM 64Kbit, LINK/MAC, XTENSA, Network, I/O

Diagram labels: 2 Mbit flash, Flash Ctrl., CPU (Xtensa), 1 kB I$, 64 kB IRAM, 64 kB DRAM, Network Protocol, MAC, Baseband, FPGA, I/O, to RF

slide-37
SLIDE 37

Mapping Behavior onto Architecture*

Diagram panels: Behavior

Behavior described using CFSMs

*Using the VCC tools from Cadence Design Systems

slide-38
SLIDE 38

Mapping Behavior onto Architecture*

Diagram panels: Behavior, Architecture

Behavior described using CFSMs

*Using the VCC tools from Cadence Design Systems

slide-39
SLIDE 39

Mapping Behavior onto Architecture*

Diagram panels: Behavior, Architecture

Behavior described using CFSMs
Collection of interconnect models enables exploration

*Using the VCC tools from Cadence Design Systems

slide-40
SLIDE 40

Mapping Behavior onto Architecture*

Diagram panels: Behavior, Architecture

Behavior described using CFSMs
Collection of interconnect models enables exploration

*Using the VCC tools from Cadence Design Systems

slide-41
SLIDE 41

Choosing the Interconnect Architecture

“The Silicon Backplane” (Courtesy Sonics, Inc)

Diagram labels: DSP, MPEG, CPU, DMA, C, MEM, I, O; Open Core Protocol™; SiliconBackplane Agent™

slide-42
SLIDE 42

Choosing the Interconnect Architecture

“The Silicon Backplane” (Sonics, Inc)

Diagram labels: DSP, MPEG, CPU, DMA, C, MEM, I, O; Open Core Protocol™; SiliconBackplane Agent™

TDMA multiple access scheme guarantees bandwidth for time-critical links, combined with contention slots for other communications
slide-43
SLIDE 43

Modeling the Impact of Interconnect Choice

Silicon Backplane Model in VCC

Diagram labels: Initiator Core, OCP, Initiator Agent, Interconnect, Arbiter, Target Agent, OCP, Target Core

Flexible bandwidth arbitration model

  • TDMA slot map gives slot owner right of refusal
  • Unowned/unused slots fall to round-robin arbitration
  • Latency after slice granted is user-specified between 2 and 7 bus clock cycles
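
The arbitration rules on this slide can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the Sonics or VCC model. The class `SlotArbiter`, its slot-map encoding (`None` for an unowned slot) and the round-robin pointer policy are hypothetical, and the user-specified grant latency is not modeled.

```python
class SlotArbiter:
    """TDMA slot map with round-robin fallback (illustrative only)."""

    def __init__(self, slot_map, initiators):
        self.slot_map = list(slot_map)      # owner per slot, None = unowned
        self.initiators = list(initiators)  # fixed round-robin order
        self.cycle = 0
        self.rr_next = 0                    # next round-robin candidate index

    def grant(self, requesting):
        """Return the initiator granted this slot, or None if nobody asked."""
        owner = self.slot_map[self.cycle % len(self.slot_map)]
        self.cycle += 1
        # The slot owner has first claim but may decline (right of refusal).
        if owner is not None and owner in requesting:
            return owner
        # Unowned or unused slot: fall back to round-robin among requesters.
        n = len(self.initiators)
        for offset in range(n):
            idx = (self.rr_next + offset) % n
            candidate = self.initiators[idx]
            if candidate in requesting:
                self.rr_next = (idx + 1) % n
                return candidate
        return None
```

Giving a time-critical initiator enough owned slots in `slot_map` bounds its waiting time, while the slots it leaves unused still go to other traffic.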

slide-44
SLIDE 44

Example: The MESCAL Architecture*

  • A MESCAL Communication Architecture is a general, coarse-grained interconnection scheme for system components
  • Communicators are Processing Elements, I/Os, Memories, Switches, Reconfigurable fabrics

Diagram labels: IO, Processing Element, PE, SW, MEM, Reconfigurable

*MESCAL is a GSRC project, targeting fully programmable SOC design

slide-45
SLIDE 45

The MESCAL Interconnect Architecture

Diagram labels: Application, Transport, Network, Data Link, Physical; PE, Local Memory, Cache, Communication Assist; Communicator

  • Stack layers map to software and hardware components:
    – PE software
    – CA software
    – CA hardware
    – Channel Hardware

slide-46
SLIDE 46

Communication Architecture Design

  • Describe a stack at each node using a formal Ptolemy model
  • Describe the interconnect topology
  • Use a correct-by-construction synthesis approach to implement on a programmable platform

Diagram labels: Processing Element

slide-47
SLIDE 47

Summary

  • Designing a SOC has become a communications-design problem
  • Refinement-based formal methodology, inspired by OSI protocol stack, leads to predictable, verifiable and testable solution
  • Methodology opens the door for innovative solutions to the interconnect problem
    – Ultra-low swing signals with error-correction and retransmission
    – Data compression for high-rate links
    – Globally asynchronous design
    – Dynamic routing of data (see talk of Bill Dally)