Networks on chip: Evolution or Revolution?

Luca Benini, lbenini@deis.unibo.it, DEIS, Università di Bologna

MPSOC 2004


The evolution of SoC platforms

2 cores: Philips' Nexperia™ PNX8850, an SoC platform for high-end digital video (2001)

Scalable VLIW media processor (TriMedia™):

  • 100 to 300+ MHz
  • 32-bit or 64-bit

Nexperia™ system buses:

  • 32-128 bit

General-purpose scalable RISC processor (MIPS™):

  • 50 to 300+ MHz
  • 32-bit or 64-bit

Library of device IP blocks:

  • Image coprocessors
  • DSPs
  • UART
  • 1394
  • USB

[Block diagram: a TriMedia CPU (TM-xxxx, I$/D$) and a MIPS CPU (PRxxxx, I$/D$), each with its own PI bus of device IP blocks, share SDRAM through the MMI over the DVP memory bus.]


Running forward…

6 cores: Motorola's MSC8126, an SoC platform for 3G base stations (late 2003)

  • Four 350/400 MHz StarCore SC140 DSP extended cores
  • 16 ALUs: 5600/6400 MMACS
  • 1436 KB of internal SRAM and a multi-level memory hierarchy
  • Internal DMA controller supporting 16 TDM unidirectional channels
  • Two internal coprocessors (TCOP and VCOP) providing special-purpose processing capability in parallel with the core processors


What's happening in SoCs?

Technology: no slow-down in sight!
  Faster and smaller transistors… but slower wires, lower voltage, more noise!

Design complexity: from 2 to 10 to 100 cores!
  Design reuse is essential… but differentiation/innovation is key for winning in the market!

Performance and power: GOPS for mWs!
  Performance requirements keep going up… but power budgets don't!


…and on-chip communication?

Starting point: the "on-chip bus"

  • Advances in protocols
  • Advances in topologies

Revolutionary approaches

  • Networks on chip

Things are moving FAST… but is it evolution or revolution?


Outline

  • Introduction and motivation
  • On-chip networking
  • The HW-SW interface


On-chip bus architectures

Many alternatives:

  • Large semiconductor firms (e.g. IBM CoreConnect, STMicro STBus)
  • Core vendors (e.g. ARM AMBA)
  • Interconnect IP vendors (e.g. Sonics SiliconBackplane)

Same topology, different protocols


AMBA bus

  • AHB: high-speed, high-bandwidth multi-master system bus
  • APB: simplified, low-cost bus for general-purpose peripherals

[Diagram: CPUs, execution units, and memories attach to the AMBA high-speed bus through master and slave ports; a bridge connects the system bus to the peripheral bus for I/O.]


AHB bus architecture

  • Different signal groups use different wires
  • Dedicated wires only
  • NO bidirectional wires


AMBA basic transfer

Each transfer has an address phase and a data phase, for both writes and reads; overlapping the address phase of one transfer with the data phase of the previous one (pipelining) increases bus bandwidth.
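To make the bandwidth claim concrete, here is a toy cycle-count model (ours, not AMBA-accurate) showing how overlapping the two phases approaches one transfer per cycle:

```cpp
// Toy model: each transfer needs a one-cycle address phase (A) and a
// one-cycle data phase (D). Pipelining overlaps the A phase of transfer
// i+1 with the D phase of transfer i.
#include <cstdio>

int cycles_unpipelined(int n) { return 2 * n; }          // A then D, serially
int cycles_pipelined(int n)   { return n ? n + 1 : 0; }  // A of i+1 under D of i

int main() {
    const int n = 1000;  // number of transfers
    std::printf("unpipelined: %d cycles\n", cycles_unpipelined(n));  // 2000
    std::printf("pipelined:   %d cycles\n", cycles_pipelined(n));    // 1001
    // Throughput approaches one transfer per cycle as n grows.
    return 0;
}
```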


Bus arbitration

[Diagram: each master drives a dedicated request wire (HBREQ_M1…HBREQ_M3) to the central ARBITER; the address bus is shared.]

The arbitration protocol is defined, but the arbitration policy is not.
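The protocol/policy split can be pictured as a pluggable function behind a fixed request/grant interface; a minimal sketch with our own naming (the real HBUSREQ/HGRANT signalling is more involved):

```cpp
// The grant interface is fixed, but the policy choosing among pending
// requests is implementation-defined. Names (pick, kMasters) are ours.
#include <bitset>
#include <cstdio>

constexpr int kMasters = 4;

// Fixed-priority policy: lowest-numbered requesting master wins.
// A round-robin or TDMA policy could be swapped in without touching
// the request/grant signalling around it.
int pick(const std::bitset<kMasters>& req) {
    for (int m = 0; m < kMasters; ++m)
        if (req.test(m)) return m;
    return -1;  // no request pending
}

int main() {
    std::bitset<kMasters> hbusreq;   // one request wire per master
    hbusreq.set(1); hbusreq.set(3);
    std::printf("grant -> master %d\n", pick(hbusreq));  // master 1
    return 0;
}
```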


The price for arbitration

  • Time for arbitration
  • Time for handshaking
  • Wait states


Burst transfers

Burst transfers amortize arbitration cost (see the worked example below):

  • Grant bus control for a number of cycles
  • Help with DMA and block transfers
  • Help hiding arbitration latency

Requires safeguards against starvation: split and error responses.
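A back-of-the-envelope view of the amortization (the numbers are ours, for illustration): if arbitration plus handshaking costs A cycles per grant and a burst carries B data beats, bus efficiency is roughly B / (A + B). With A = 3, single transfers (B = 1) spend only 25% of cycles moving data, while 8-beat bursts reach 8/11, about 73%.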


Critical analysis: bottlenecks

Protocol

  • Lacks parallelism: in-order completion; no multiple outstanding transactions, so slave wait states cannot be hidden
  • High arbitration overhead (on single transfers)
  • Bus-centric vs. transaction-centric: initiators and targets are exposed to the bus architecture (e.g. the arbiter)

Topology

  • Scalability limitation of the shared-bus solution!


STBus

On-chip interconnect solution by ST

  • Levels 1-3: increasing complexity (and performance)

Features

  • Higher parallelism: 2 channels (M-S and S-M)
  • Multiple outstanding transactions with out-of-order completion
  • Supports deep pipelining
  • Supports packets (request and response) for multiple data transfers
  • Support for protection, caches, locking

Deployed in a number of large-scale SoCs at STM


STBus protocol (Type 3)

[Diagram: an initiator port and a target port connected by a request channel and a response channel; transactions decompose into request/response packets, packets into cells, cells into signals (transaction, packet, cell, and signal levels).]


STBus bottlenecks

Protocol is not fully transaction-centric

  • Cannot connect initiator directly to target (e.g. the initiator has no flow control on the response channel)

Packets are atomic on the interconnect

  • Cannot initiate or receive multiple packets at the same time
  • Large data transfers may starve other initiators


AMBA AXI

Latest (2003) evolution of AMBA: Advanced eXtensible Interface

Features

  • Fully transaction-centric: can connect M to S with nothing in between
  • Higher parallelism: multiple channels
  • Supports bus-based power management
  • Support for protection, caches, locking

Deployment: ??


Multi-channel M-S interface

[Diagram: master and slave connected by four parallel channels: address channel, write channel, read channel, and write response channel, each handshaked with VALID/READY/DATA signals.]

Channel handshaking: 4 parallel channels are available!
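A minimal model of the per-channel VALID/READY handshake (our naming and a hypothetical address; the actual AXI signals are named per channel, e.g. AWVALID/AWREADY): a beat transfers exactly in a cycle where both sides assert.

```cpp
// One channel of the handshake: source asserts valid when it has a beat,
// sink asserts ready when it can accept one; the beat fires on valid&&ready.
#include <cstdio>

struct Channel {
    bool valid = false;
    bool ready = false;
    unsigned data = 0;
};

bool fires(const Channel& c) { return c.valid && c.ready; }

int main() {
    Channel addr;
    // Cycle 1: master presents an address, slave not ready -> stall.
    addr.valid = true; addr.data = 0x80001000u; addr.ready = false;
    std::printf("cycle 1: %s\n", fires(addr) ? "beat" : "stall");
    // Cycle 2: slave becomes ready -> the beat completes.
    addr.ready = true;
    std::printf("cycle 2: %s\n", fires(addr) ? "beat" : "stall");
    return 0;
}
```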


Multiple outstanding transactions

A transaction implies activity on multiple channels

  • E.g. a read uses the address and read channels

Channels are fully decoupled in time

  • Each transaction is labeled when it is started (address channel)
  • Labels, not signals, are used to track transaction opening and closing (see the sketch after this list)
  • Out-of-order completion is supported (tracking logic in the master), but the master can request in-order delivery

Burst support

  • Single-address burst transactions (multiple data-channel slots)
  • Bursts are not atomic!

Atomicity is tricky

  • Exclusive access is better than locked access
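A sketch of label-based tracking as a master might implement it; the ID values, the map, and the function names are ours, since AXI carries IDs on the wires but does not prescribe the tracking logic:

```cpp
// Label (ID) based transaction tracking on the master side. Two reads can
// be outstanding at once, and the slave may return them in any order.
#include <cstdio>
#include <map>

struct Outstanding { unsigned addr; int beats_left; };

std::map<int, Outstanding> inflight;  // keyed by transaction ID

void issue(int id, unsigned addr, int beats) {      // address-channel beat
    inflight[id] = {addr, beats};                   // transaction opened
}

void data_beat(int id) {                            // read-channel beat
    auto it = inflight.find(id);
    if (it != inflight.end() && --it->second.beats_left == 0) {
        std::printf("transaction %d complete\n", id);
        inflight.erase(it);                         // transaction closed
    }
}

int main() {
    issue(7, 0x1000, 2);   // two reads outstanding at once: wait states on
    issue(3, 0x2000, 1);   // one can be hidden behind progress on the other
    data_beat(3);          // ID 3 completes out of order, before ID 7
    data_beat(7);
    data_beat(7);
    return 0;
}
```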


Scalability: execution time

Highly parallel benchmark (no slave bottlenecks)

[Two bar charts of relative execution time for AHB, AXI, STBus, and STBus (B) at 2, 4, 6, and 8 cores: one with 1 kB caches (low bus traffic), one with 256 B caches (high bus traffic).]


Scalability: protocol efficiency

[Two bar charts of interconnect usage efficiency and interconnect busy time for AHB, AXI, STBus, and STBus (B) at 2, 4, 6, and 8 cores.]

Under increasing contention, AXI and STBus show 80%+ efficiency, AHB < 50%.


Scalability: latency

[Chart of average and minimum read/write completion latency (cycles) for STBus (B) and AXI at 2, 4, 6, and 8 cores.]

STBus management has less arbitration latency overhead, especially noticeable in low-contention conditions.


Topology

A single shared bus is clearly non-scalable.

Evolutionary path: "patch" the bus topology. Two approaches:

  • Clustering & bridging
  • Multi-layer / multibus


Clustering and bridging

  • Heterogeneous architectures with asymmetric traffic: the cost of going across a bridge is HIGH
  • Bus clusters for bandwidth & latency reasons

Example: EASY SoCs for WLAN


AMBA Multi-layer AHB

  • Enables parallel access paths between multiple masters and slaves
  • Fully compatible with AHB wrappers

[Diagram: Master1 and Master2 reach multiple slaves through an interconnect matrix with per-layer AHB links (AHB1, AHB2) and slave ports.]


Multi-layer AHB implementation

The matrix is made of slave ports

  • No explicit arbitration of slaves
  • Variable latency in case of destination conflicts

[Diagram: Master1 and Master2 connect through decode and mux stages to Slave1…Slave4; arbitration happens per crossbar output.]


STBus crossbar & partial crossbar

[Diagram: a full crossbar (FC) and a partial crossbar (PC) configuration.]


Topology speedup (AMBA AHB)

[Bar chart of execution cycles (1M to 7M) for shared-bus, bridging, and multi-layer topologies, with and without semaphore synchronization.]

Independent tasks (matrix multiply), with & without semaphore synchronization, 8 processors (small caches).


Crossbar: critical analysis

  • No bandwidth reduction
  • Scales poorly: N² area and delay
  • A lot of wires and a lot of gates in a bus-based crossbar (e.g. Area_cell_4x4 / Area_cell_bus ≈ 2 for STBus)
  • No locality
  • Does not scale beyond 10x10!


NoCs

More radical solutions in the long term: Nostrum, HiNoC, Linköping SoCBUS, SPIN, star-connected on-chip networks, Aethereal, Proteo, Xpipes… (at least 15 groups)

[Diagram: CPUs, a DSP, and memories attached through network interfaces to switches connected by links.]


NoCs vs. busses

Packet-based

  • No distinction between address and data, only packets (but of many types)
  • Complete separation between end-to-end transactions and data-delivery protocols

Distributed vs. centralized

  • No global control bottleneck
  • Better link with placement and routing

Bandwidth scalability, of course!

(STBus and AXI already move part-way in this direction.)


The "power of NoCs"

Design methodology: clean separation at the session layer

  1. Define end-to-end transactions
  2. Define quality-of-service requirements
  3. Design transport, network, link, physical layers

Modularity at the HW level: only 2 building blocks (a minimal sketch follows)

  1. Network interface
  2. Switch (router)

Scalability is supported from the ground up (not as an afterthought).
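The two-building-block modularity can be rendered as two small types that any topology simply instantiates and wires together; all names here are ours, for illustration:

```cpp
// Building block 2: switch. Forwards each flit toward its destination
// using a per-switch routing table (here: dest -> output port).
// Building block 1: network interface. Wraps a node's transaction into
// flits (packetization is sketched in the Xpipes NI section below).
#include <cstdio>
#include <vector>

struct Flit { int dest; long long bits; };

struct Switch {
    std::vector<int> route;  // route[dest] = output port
    int forward(const Flit& f) const { return route[f.dest]; }
};

struct NetworkInterface {
    int node_id;
    Flit wrap(int dest, long long payload) const { return {dest, payload}; }
};

int main() {
    NetworkInterface ni{0};
    Switch sw{{0, 1, 1, 2}};   // dests 1,2 via port 1; dest 3 via port 2
    Flit f = ni.wrap(2, 0xBEEF);
    std::printf("flit to node %d leaves on port %d\n", f.dest, sw.forward(f));
    return 0;
}
```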

  • L. Benini MPSOC 2004

34

Building blocks: NI

Front end: standardized node interface @ session layer; the initiator vs. target distinction is blurred

  1. Supported transactions (e.g. QoSread…)
  2. Degree of parallelism
  3. Session protocol control flow & negotiation

Back end: NoC-specific (layers 1-4), manages the interface with the switches

  1. Physical channel interface
  2. Link-level protocol
  3. Network layer (packetization)
  4. Transport layer (routing)


Building blocks: switch

Router: receives and forwards packets

  • NOTE: packet-based does not mean datagram!

Level 3 or level 4 routing

  • No consensus, but generally L4 support is limited (e.g. simple routing)

[Diagram: input buffers & flow control feed a crossbar steered by an allocator/arbiter with QoS & routing logic; output buffers & flow control drive data ports with flow-control wires.]


Xpipes: context

Typical applications targeted by SoCs:

  • Complex
  • Highly heterogeneous
  • Communication intensive

Xpipes is a synthesizable, high-performance, heterogeneous NoC infrastructure.

[Diagram: a task graph (Task1…Task5) mapped onto processors (P1…P5) attached through NIs to switches and links.]


Heterogeneous topology

SoC component specialization leads to the integration of heterogeneous cores

  • Ex. MPEG4 decoder
  • Non-uniform block sizes
  • SDRAM is the communication bottleneck

On a homogeneous fabric:

  • Many neighboring cores do not communicate
  • Risk of under-utilizing many tiles and links
  • Risk of localized congestion


Network interface

Open Core Protocol (OCP) end-to-end communication protocol:

  • Pipelining
  • Independence of request/response phases

Network protocol: transaction-centric. A packet consists of a HEADER, a PAYLOAD, and a TAIL, and is segmented into FLITs.

The header includes:

  • Path across the network
  • Source
  • Destination
  • Command type
  • Burst ID (MBurst)
  • Packet identifier within message (ID-PACKET)
  • Local target IP address (IP_ADDR)

A packet-layout sketch follows.
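An illustrative packet layout following the header fields listed above; the field widths and the flit framing are our assumptions, not the actual Xpipes bit-level format:

```cpp
// Header fields as listed on the slide; widths are illustrative.
#include <cstdint>
#include <cstdio>
#include <vector>

struct Header {
    uint32_t path;        // path across the network (source routing)
    uint8_t  source;
    uint8_t  destination;
    uint8_t  command;     // command type (e.g. read/write)
    uint8_t  mburst;      // burst ID
    uint16_t id_packet;   // packet identifier within the message
    uint32_t ip_addr;     // local address within the target IP
};

enum class FlitType { Head, Body, Tail };
struct Flit { FlitType type; uint64_t bits; };

// Segment a packet (header + payload words) into flits: one head flit
// carrying routing info, body flits carrying payload, one tail flit.
std::vector<Flit> packetize(const Header& h, const std::vector<uint64_t>& payload) {
    std::vector<Flit> flits;
    flits.push_back({FlitType::Head, (uint64_t(h.path) << 32) | h.ip_addr});
    for (uint64_t w : payload) flits.push_back({FlitType::Body, w});
    flits.push_back({FlitType::Tail, 0});  // tail closes the packet
    return flits;
}

int main() {
    Header h{0x2A, 1, 4, 0 /*write*/, 0, 7, 0x100};
    auto flits = packetize(h, {0xDEADBEEF, 0xCAFE});
    std::printf("%zu flits\n", flits.size());  // head + 2 body + tail = 4
    return 0;
}
```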


Switch (s-Xpipes)

  • Plain latching of inputs
  • Buffering resources are on the output ports
  • FIFOs for performance (tunable area/speed tradeoff)
  • Circular buffers for ACK/NACK management (minimal size if directly attached to the downstream component, can be larger for pipelined links)
  • ACK/NACK flow control (a sketch follows this list)
  • 2-stage pipeline
  • Tuned for high clock speeds

[Diagram: crossbar steered by an allocator/arbiter.]
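A toy model of ACK/NACK flow control in the spirit described above: unacknowledged flits are kept in a buffer and replayed on a NACK. Buffer sizing and all names are ours:

```cpp
// Sender side of an ACK/NACK link: keep a copy of every in-flight flit
// until it is ACKed; on a NACK, replay the whole unacknowledged window
// (go-back-N style).
#include <cstdio>
#include <deque>

struct Sender {
    std::deque<int> unacked;       // stand-in for the circular buffer
    void send(int flit) {
        unacked.push_back(flit);   // keep a copy until it is ACKed
        std::printf("tx flit %d\n", flit);
    }
    void on_ack() { unacked.pop_front(); }  // safe to drop the copy
    void on_nack() {                        // corrupted/dropped downstream
        for (int f : unacked) std::printf("retx flit %d\n", f);
    }
};

int main() {
    Sender s;
    s.send(1); s.send(2); s.send(3);
    s.on_ack();     // flit 1 delivered
    s.on_nack();    // flits 2 and 3 must be replayed
    return 0;
}
```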

Example: MPEG4 decoder

[Figure: core graph representation with annotated average communication requirements.]


NoC floorplans

[Floorplans: general-purpose mesh; application-specific NoC1 (centralized); application-specific NoC2 (distributed).]


Performance, area and power

  • Relative link utilization (custom NoC / mesh NoC): 1.5, 1.55
  • Relative area (mesh NoC / custom NoC): 1.52, 1.85
  • Relative power (mesh NoC / custom NoC): 1.03, 1.22

Less latency and better scalability for the custom NoCs.


NoC synthesis flow

In cooperation with Stanford University: SUNMAP

[Flow: the application plus a topology library feed topology selection and mapping onto topologies, with routing-function co-design, using area and power libraries and a floorplanner; the xpipes compiler then generates a SystemC design from the xpipes library for simulation.]


Outline

  • Introduction and motivation
  • On-chip networking
  • The HW-SW interface: session layer and above


Mapping applications

[Diagram: application task graphs (T1…T3) mapped onto an abstract parallel architecture: PEs, memories (M), and I/O connected by a NoC.]

Communication abstractions:

  • Shared memory (UMA vs. NUMA)
  • Message passing

What hardware support for these communication abstractions?


MPARM architecture

[Diagram: four ARM cores, each with a private memory (PRI MEM 1-4), plus shared memory, semaphores, and an interrupt controller on the interconnect.]

Interconnect: STBus or AMBA or Xpipes.


Basic architecture

[Diagram: processor tiles #1…#N, each with an ARM core, MMU, and I/D cache, attached to the interconnect together with shared memory and semaphores.]


Support for message passing

[Diagram: as the basic architecture, but each processor tile adds a scratch-pad memory and local semaphores next to the MMU and I/D cache; shared memory remains on the interconnect.]

A message-passing sketch over this hardware follows.
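A hedged sketch of send/receive over the tile-local scratch-pad and semaphore. On the real platform these would be memory-mapped regions; here plain variables stand in so the sketch runs on a host, and all names are ours:

```cpp
// Producer writes the payload straight into the consumer's scratch-pad
// (no shared-memory round trip), then raises the semaphore; the consumer
// spins on its local semaphore and reads at local-SRAM speed.
#include <cstddef>
#include <cstdio>
#include <cstring>

unsigned char scratchpad[256];   // consumer tile's scratch-pad (stand-in)
volatile int  msg_ready = 0;     // consumer tile's semaphore (stand-in)

void send(const char* msg, std::size_t len) {
    std::memcpy(scratchpad, msg, len);
    msg_ready = 1;               // signal: message available
}

void receive(char* buf, std::size_t len) {
    while (!msg_ready) { /* spin on the local semaphore */ }
    std::memcpy(buf, scratchpad, len);
    msg_ready = 0;               // free the scratch-pad slot
}

int main() {
    char buf[16];
    send("hello", 6);
    receive(buf, 6);
    std::printf("%s\n", buf);
    return 0;
}
```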


HW support for MP: results

[Two bar charts (8 cores) of relative execution time for shared, bridging, and multi-layer interconnects: matrix pipeline on the basic architecture (up to ~170%) vs. with message-passing support (down to ~20%).]

Send+receive cost: 35 Kcycles (basic architecture) vs. 4 Kcycles (MP support). Configuration: 4 processors, shared bus.


Support for UMA

[Diagram: in each processor tile, a snoop device sits beside the ARM core's cache and observes invalidate/update addresses and data on the bus*.]

*cannot be a generic interconnect!

A sketch of the snoop policies follows.
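A sketch of the two write-through snoop policies compared in the next slides (WTI = write-through invalidate, WTU = write-through update); the cache model is ours, reduced to an address-to-value map:

```cpp
// Snoop device reacting to writes observed on the bus from other masters:
// WTI evicts the local copy, WTU patches it in place.
#include <cstdint>
#include <unordered_map>

enum class Policy { WTI, WTU };

struct SnoopDevice {
    Policy policy;
    std::unordered_map<uint32_t, uint32_t> cache;  // addr -> cached value

    void on_bus_write(uint32_t addr, uint32_t data) {
        auto it = cache.find(addr);
        if (it == cache.end()) return;                // not cached: ignore
        if (policy == Policy::WTI) cache.erase(it);   // invalidate the line
        else                       it->second = data; // update it in place
    }
};

int main() {
    SnoopDevice s{Policy::WTU, {{0x100, 1}}};
    s.on_bus_write(0x100, 2);  // WTU: local copy becomes 2; WTI would evict
    return s.cache.at(0x100) == 2 ? 0 : 1;
}
```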


Readers-writers: varying cache size

[Four charts of relative cycles, energy, energy-delay product, and power for SW, WTI, and WTU coherence at cache sizes of 512 B to 4096 B.]


Readers-writers: varying buffer size

[Four charts of relative cycles, energy, energy-delay product, and power for SW, WTI, and WTU coherence at buffer sizes of 16, 256, and 1024.]


Conclusions

Evolutionary shift from bus-based interconnect to NoCs

  • Well underway (there's no stopping now)
  • Methodology/tooling is the main issue

Platform challenges

  • Programming abstraction
  • HW/SW tradeoffs in session-layer support