System-on-Chip Communication Architecture
Dr.-Ing. Mohammad Abdullah Al Faruque
Chair for Embedded Systems (CES), Karlsruhe Institute of Technology
http://ces.univ-karlsruhe.de/
Columns of Embedded System Design
1. Embedded
Mohammad Abdullah Al Faruque
Chair for Embedded Systems
WS09/10
appropriate for ES since they offer a fair compromise between many constraints, but they cannot be adapted to the specific needs of ES
methodologies at higher level of abstraction
technology
[Figure: flexibility (1/time-to-market, …) versus “efficiency” ($/Mips, mW/MHz, Mips/area, …) across functionality/devices: ASIC (“Hardware solution”), ASIP (extensible processor), general purpose processor (“Software solution”)]
[Figure: Design Productivity Gap. Available gates versus used gates, in millions of gates, 1990 to 2006. Source: Gartner/Dataquest]
Application Parallelism
Huge Computational Power and Application Concurrency
[Figure: applications running concurrently on one device (Download File, TV – Channel …, Phone) with an "Incoming Video Call!" notification]
incoming call is accepted
Fclock > 4 GHz
Memory bandwidth: 25.6 GBytes per second
I/O bandwidth: 76.8 GBytes per second
Performance:
256 GFLOPS (single precision at 4 GHz)
256 GOPS (integer at 4 GHz)
25 GFLOPS (double precision at 4 GHz)
Area: 235 square mm
235 million transistors
Power consumption estimated at 60 - 80 W at 4 GHz
Chip (SoC) design. NoC-based systems accommodate multiple asynchronous clocking that many of today's complex SoC designs use. The NoC solution brings a networking method to
performance increase over conventional bus systems.”
[Figure: bus-based system with processing elements PE1 to PE4 attached to a shared bus, next to a NoC-based system in which PE1 to PE4 are connected through switches (S)]
wiring
application areas (e.g. embedded multimedia)
power efficient than a bus-based system
any possible recipient, whereas in a NoC-based system the information (packet) is sent only to the actual recipients
the NoC may be a major power consumer of an SoC
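The contrast between broadcast on a bus and point-to-point delivery in a NoC can be illustrated by counting how many elements are involved in one transfer. The sketch below is a minimal illustration and not a power model: the function names are hypothetical, and the 2D mesh with minimal (Manhattan-distance) routing is an assumption that the slides do not state at this point.

```python
def bus_receivers_reached(num_pes: int) -> int:
    """On a shared bus, every attached PE other than the sender sees the transfer."""
    return num_pes - 1


def noc_links_used(src: tuple, dst: tuple) -> int:
    """In an assumed 2D mesh with minimal routing, a packet occupies only
    the links on its path, i.e. the Manhattan distance in hops."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


if __name__ == "__main__":
    # 16 PEs on a bus: one transfer reaches 15 PEs, although only one is the recipient.
    print(bus_receivers_reached(16))
    # The same 16 PEs as a 4x4 mesh: a packet from (0, 0) to (1, 2) uses 3 links.
    print(noc_links_used((0, 0), (1, 2)))
```

Each hop also passes through a router, which is one reason the NoC itself can account for a large share of the power budget.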
[Figure: MPEG Core Graph. Core labels: vu, au, med cpu, rast, sdram, sram1, sram2, idct etc., adsp, up samp, risc, bab. Edge labels: 190, 0.5, 910, 0.5, 60, 40, 600, 40, 250, 500, 173, 670, 32]
vary widely, area is wasted
behavior (uneven wire length)
It's a tradeoff between performance and design costs
used for communication infrastructure
Higher channel width
[Figure: number of slices (axis 500 to 2500) for 16, 32, 64, and 128, presumably the channel width]
influence on the latency and the total area needed for the routers
There are algorithms to optimize the buffer size under special assumptions
Influence of the buffer size in NoC: increasing the input buffers from 2 to 3 words increases the router area by 30% or more
topologies)
step
In contrast to normal floorplanning problems, you have to take into account:
for predictable latency
communication architecture
[Figure: Application Characterization Graph (APCG) and a topology with empty tiles; core labels: CPU 1, CPU 2, DSP, MEM, MEM, I/O]
Determine a mapping function that maps the IP cores onto the topology
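The mapping step can be sketched as a search over assignments of cores to tiles. The example below is only an illustration under stated assumptions: the communication volumes in `APCG`, the 3x2 mesh, and the cost function (volume times hop count) are all hypothetical choices and are not taken from the slides. Exhaustive search is used only because the example is tiny.

```python
from itertools import permutations

# Hypothetical APCG: (source core, destination core) -> communication volume.
APCG = {
    ("CPU1", "MEM1"): 100,
    ("CPU2", "MEM2"): 80,
    ("DSP", "MEM1"): 60,
    ("CPU1", "IO"): 10,
}
CORES = ["CPU1", "CPU2", "DSP", "MEM1", "MEM2", "IO"]
# Assumed topology: a 3x2 mesh of empty tiles, addressed as (x, y).
TILES = [(x, y) for y in range(2) for x in range(3)]


def hops(a, b):
    """Hop count between two tiles with minimal routing in a mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def cost(mapping):
    """Assumed objective: sum of volume * hop count over all APCG edges."""
    return sum(vol * hops(mapping[s], mapping[d]) for (s, d), vol in APCG.items())


def best_mapping():
    """Try every assignment of cores to tiles and keep the cheapest one."""
    best_cost, best_map = None, None
    for perm in permutations(TILES):
        mapping = dict(zip(CORES, perm))
        c = cost(mapping)
        if best_cost is None or c < best_cost:
            best_cost, best_map = c, mapping
    return best_cost, best_map


if __name__ == "__main__":
    c, m = best_mapping()
    print(c, m)
```

For realistic core counts the number of permutations grows factorially, so heuristic or branch-and-bound methods would be needed instead of exhaustive search.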
[Figure: tasks t1 to t6]
Usable algorithms exist for static scheduling
Problem for applications with conditional branches but
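As an illustration of static scheduling, the sketch below implements one simple greedy list scheduler. It is not necessarily one of the algorithms the lecture refers to. The task durations, the dependencies, and the number of processing elements are hypothetical, and communication delays between processing elements are ignored.

```python
def list_schedule(durations, deps, num_pes):
    """Greedy list scheduling: repeatedly take a ready task (all predecessors
    finished) and place it on the PE where it can start earliest.
    Returns {task: (pe_index, start_time)}."""
    finish, schedule = {}, {}
    pe_free = [0] * num_pes          # time at which each PE becomes idle
    pending = set(durations)
    while pending:
        ready = sorted(t for t in pending
                       if all(p in finish for p in deps.get(t, ())))
        if not ready:
            raise ValueError("dependency cycle")
        task = ready[0]
        # A task cannot start before all of its predecessors have finished.
        earliest = max((finish[p] for p in deps.get(task, ())), default=0)
        pe = min(range(num_pes), key=lambda i: max(pe_free[i], earliest))
        start = max(pe_free[pe], earliest)
        finish[task] = start + durations[task]
        pe_free[pe] = finish[task]
        schedule[task] = (pe, start)
        pending.remove(task)
    return schedule


if __name__ == "__main__":
    durations = {"t1": 2, "t2": 3, "t3": 1, "t4": 2}   # hypothetical
    deps = {"t3": ["t1"], "t4": ["t1", "t2"]}          # hypothetical
    print(list_schedule(durations, deps, num_pes=2))
```

Because the schedule is fixed before execution, a task whose execution depends on a conditional branch has no single correct start time, which is the difficulty noted above.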
Which switching strategy should be used?
Which routing strategy should be used?
[Figure: a packet (Command, Address, Payload) travelling from source S to destination D is divided into flits; the first flit carries the routing info. Ref: QNoC group]
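The division of a packet into flits shown in the figure can be sketched as follows. This is a minimal illustration, not the QNoC packet format: the `Flit` class, the `packetize` function, the 4-byte flit size, and the one-byte command and address fields are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Flit:
    kind: str    # "head", "body", or "tail"
    data: bytes


def packetize(command: int, address: int, payload: bytes, flit_bytes: int = 4):
    """Split one packet into flits. The head flit carries the routing info
    (here: command and address, each assumed to fit in one byte, 0..255)."""
    head = Flit("head", bytes([command, address]))
    chunks = [payload[i:i + flit_bytes] for i in range(0, len(payload), flit_bytes)]
    flits = [head] + [Flit("body", c) for c in chunks]
    if len(flits) > 1:
        flits[-1].kind = "tail"   # the last flit marks the end of the packet
    return flits


def reassemble(flits):
    """Recover the payload at the destination by concatenating non-head flits."""
    return b"".join(f.data for f in flits[1:])


if __name__ == "__main__":
    for flit in packetize(1, 7, b"abcdefghij"):
        print(flit)
```

Only the head flit needs to be inspected by a router to choose an output port. The remaining flits follow the path that the head flit has opened.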
Suits on-chip interconnect
Small number of buffers
Low latency
Virtual Channels
The delivery
Packet transmission
We focus on packet
Packet delivery
Low-capacity link
[Figure: 4x4 mesh of tiles (0,0) to (3,3), each with a Router and a Processing Element, and the internals of one router. Five input ports (North, East, South, West, Local), each with an Input Buffer, Addr Decoder, and Channel Controller; a Crossbar Arbiter; a Crossbar Switch driving five Out Channels (North, East, South, West, Local)]
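The address decoder at each input port has to turn a destination coordinate into one of the five output channels. The sketch below shows one possible decision rule, XY (dimension-order) routing. This routing strategy is an assumption chosen for illustration, since the slides leave the choice of routing strategy open. The orientation (x grows to the East, y grows to the North) and the function names are also hypothetical.

```python
# Assumed orientation: x grows to the East, y grows to the North.
STEP = {"East": (1, 0), "West": (-1, 0), "North": (0, 1), "South": (0, -1)}


def xy_output_port(cur, dst):
    """XY routing: first correct the x coordinate, then the y coordinate."""
    (cx, cy), (dx, dy) = cur, dst
    if dx > cx:
        return "East"
    if dx < cx:
        return "West"
    if dy > cy:
        return "North"
    if dy < cy:
        return "South"
    return "Local"   # the packet has arrived at its destination tile


def route(src, dst):
    """List of tiles visited from src to dst under XY routing."""
    path, cur = [src], src
    while True:
        port = xy_output_port(cur, dst)
        if port == "Local":
            return path
        sx, sy = STEP[port]
        cur = (cur[0] + sx, cur[1] + sy)
        path.append(cur)


if __name__ == "__main__":
    print(route((0, 0), (2, 1)))
```

XY routing is deterministic, so all packets between the same pair of tiles take the same path. Adaptive strategies trade this simplicity for the ability to avoid congested links.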
Advantages
+ High resource utilization
+ Priority-aware service
Disadvantages
guarantees)
Philips Research Laboratories
Connection-oriented architecture
Æthereal NoC provides
Best-effort Service
Guaranteed Service
TDMA (Time Division Multiple Access)
[Figure: producer and consumer pairs exchanging request and response]
How is TDMA adopted in Æthereal NoC?
[Figure: TDMA slot tables T1, T2, T3 of routers with ports i0 to i3, carrying connections a, b, c; labels s=2 and s=3]
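The slot-table idea behind TDMA-based guaranteed service can be sketched as follows. This is a simplified model under stated assumptions, not the Æthereal implementation: the class `SlotTables`, the link names, and the table size are hypothetical. The model assumes that data forwarded in slot s on one link uses slot s+1 (modulo the table size) on the next link, so a connection is admitted only if that whole sequence of slots is free.

```python
class SlotTables:
    """One slot table per link; each entry names the connection that owns the slot."""

    def __init__(self, num_slots: int):
        self.num_slots = num_slots
        self.tables = {}   # link name -> list of owners (None means free)

    def _table(self, link):
        return self.tables.setdefault(link, [None] * self.num_slots)

    def reserve(self, conn, links):
        """Reserve consecutive slots along the path. Returns the start slot,
        or None if no conflict-free start slot exists."""
        for start in range(self.num_slots):
            slots = [(start + hop) % self.num_slots for hop in range(len(links))]
            if all(self._table(l)[s] is None for l, s in zip(links, slots)):
                for l, s in zip(links, slots):
                    self._table(l)[s] = conn
                return start
        return None

    def release(self, conn):
        """Tear down a connection by freeing all of its slots."""
        for table in self.tables.values():
            for s, owner in enumerate(table):
                if owner == conn:
                    table[s] = None


if __name__ == "__main__":
    t = SlotTables(num_slots=3)
    print(t.reserve("a", ["R1->R2", "R2->R3"]))
    print(t.reserve("b", ["R1->R2"]))
    print(t.reserve("c", ["R2->R3"]))
    print(t.tables)
```

Because every reserved slot belongs to exactly one connection, two guaranteed-service connections never contend for the same link at the same time. Slots left free remain available to best-effort traffic.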
[Figure: connection messages SetUp, Ack, Tear Down]
[Figure: router connected to modules and to another router. Input ports with Buffers for SIGNAL, RT, RD/WR, and BLOCK traffic and a CREDIT return; Routing, Scheduler, and Control logic; CROSS-BAR; output ports. Ref: QNoC group]
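The CREDIT signals in the router figure suggest credit-based flow control between an output port and the input buffer it feeds. The sketch below shows the general mechanism only and is not the QNoC design: the class `CreditLink`, its method names, and the single shared buffer (rather than one buffer per traffic class) are simplifying assumptions.

```python
from collections import deque


class CreditLink:
    """Sender-side credit counter plus the downstream input buffer it tracks."""

    def __init__(self, buffer_depth: int):
        self.credits = buffer_depth   # free slots the sender believes are downstream
        self.buffer = deque()         # downstream input buffer

    def send(self, flit) -> bool:
        """Forward a flit only if a credit is available; otherwise stall."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.buffer.append(flit)
        return True

    def consume(self):
        """Downstream router removes a flit and returns one credit to the sender."""
        if not self.buffer:
            return None
        flit = self.buffer.popleft()
        self.credits += 1
        return flit


if __name__ == "__main__":
    link = CreditLink(buffer_depth=2)
    print(link.send("f0"), link.send("f1"), link.send("f2"))
    print(link.consume())
    print(link.send("f2"))
```

Since a flit is sent only when a buffer slot is known to be free, the input buffer can never overflow and no flit has to be dropped inside the network.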
NoC Research Group