SLIDE 1

Exploring High‐Dimensional Topologies for NoC Design Through an Integrated Analysis and Synthesis Framework

F. Gilabert†, S. Medardoni‡, D. Bertozzi‡, L. Benini††, M.E. Gomez†, P. Lopez† and J. Duato†

†Universidad Politécnica de Valencia. ‡University of Ferrara. ††University of Bologna.

SLIDE 2

Multi‐dimension topologies

2D mesh frequently used for NoC design

  • perfectly matches 2D silicon surface
  • high level of modularity
  • controllability of electrical parameters

But its avg latency and resource consumption scale poorly with network size

Topologies with more than 2 dimensions are attractive:

  • higher bandwidth and lower avg latency
  • on-chip wiring more cost-effective than off-chip

But layout (routing) issues might impact their effectiveness and even feasibility (use of more metal layers, links with different latencies)

SLIDE 3

Objective

Explore the effectiveness and feasibility of multi-dimensional topologies. Several exploration methodology issues arise:

  • 1. Fast and accurate exploration tools required for system-level analysis and topology selection

Our approach: abstract the behaviour of all NoC architecture-level mechanisms (flow control, arbitration, switching, routing, buffering, injection and ejection) while retaining RTL clock-cycle accuracy

SLIDE 4

Objective

Explore the effectiveness and feasibility of multi-dimensional topologies. Several exploration methodology issues arise:

  • 2. Realistically capture traffic behavior

Traffic is usually abstracted as an average link bandwidth utilization, which may lead to highly inaccurate performance predictions (traffic peaks, different kinds of messaging, synchronization mismatches)

Our approach:

  • Project network traffic based on the latest advances in MPSoC communication middleware
  • Generate traffic patterns for the NoC “shaped” by the above communication middleware (e.g., synchronization, communication semantics)

SLIDE 5

Objective

Explore the effectiveness and feasibility of multi-dimensional topologies. Several exploration methodology issues arise:

  • 3. Backend synthesis flow required for assessment of layout effects

A single technology library no longer exists for standard cell design (e.g., 65nm LP-LVT vs. 65nm LP-HVT), and the spread increases as technology scales down

Our approach:

  • Silicon-aware topology exploration
  • Derive physical constraints that, if met, allow the superior theoretical properties of multi-dimensional topologies to be retained

SLIDE 6

Topology exploration framework

Reference NoC architecture

  • Transaction Level models
  • Traffic pattern generation

Exploration of multi‐dimensional topologies

  • System‐level performance analysis
  • Implementation space exploration
  • TLS driven physical synthesis
SLIDE 7

Topology exploration framework

Reference NoC architecture

  • Transaction Level models
  • Traffic pattern generation

Exploration of multi‐dimensional topologies

  • System‐level performance analysis
  • Implementation space exploration
  • TLS driven physical synthesis
SLIDE 8

[Block diagram of the xpipes-Lite switch: input/output latches, buffers, round-robin arbiters, muxes, path shifters and a flow control manager for 4 inputs and 4 outputs; switch traversal takes 1 clock cycle]

Xpipes-Lite switch architecture

Reference NoC architecture

  • Input and output sampling
  • Latency: 1 cycle in the switch, 1 cycle in the link
  • Wormhole switching
  • Round-robin arbitration on the output ports
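The round-robin policy on the output ports can be sketched in a few lines of plain C++ (illustrative only, not the SystemC/xpipes-Lite implementation; class and variable names are invented): each output port grants at most one requesting input per cycle, resuming the search just after the last grant.

```cpp
#include <cstdio>
#include <vector>

// Illustrative round-robin arbiter for one switch output port:
// each cycle it grants at most one of the requesting inputs,
// starting the search just after the input granted last time.
class RoundRobinArbiter {
public:
    explicit RoundRobinArbiter(int num_inputs)
        : num_inputs_(num_inputs), last_grant_(num_inputs - 1) {}

    // Returns the granted input index, or -1 if nobody requests.
    int arbitrate(const std::vector<bool>& request) {
        for (int offset = 1; offset <= num_inputs_; ++offset) {
            int candidate = (last_grant_ + offset) % num_inputs_;
            if (request[candidate]) {
                last_grant_ = candidate;
                return candidate;
            }
        }
        return -1;
    }

private:
    int num_inputs_;
    int last_grant_;
};

int main() {
    RoundRobinArbiter arb(4);
    std::vector<bool> req = {true, false, true, true};
    // Inputs 0, 2 and 3 keep requesting: grants rotate 0 -> 2 -> 3 -> 0 ...
    for (int cycle = 0; cycle < 6; ++cycle)
        std::printf("cycle %d: grant input %d\n", cycle, arb.arbitrate(req));
    return 0;
}
```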

SLIDE 9

Reference NoC architecture

[Block diagram of the network interface: the initiator core's OCP master interface connects to the NI's OCP slave interface (OCP CLK domain), whose back-end (NoC CLK domain) drives the NoC fabric]

  • Protocol conversion (from OCP to network)
  • Packetization
  • Clock domain crossing: the OCP clock is an integer divider of the NoC clock
  • Pre-computation of the routing path (source routing)
  • A symmetric network interface target exists
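A rough illustration of packetization with source routing follows (plain C++, not the xpipes network interface; the flit layout, field widths and route encoding are invented for this example): the NI prepends a header flit carrying the pre-computed output port for each hop, followed by payload flits for the OCP burst.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Illustrative packetization: one header flit carrying a pre-computed
// source route (the output port to take at each hop), then payload
// flits. Flit layout and field widths are invented for the example.
struct Flit {
    bool is_header;
    bool is_tail;
    uint32_t payload;  // route for the header flit, data otherwise
};

// Packs a source route (2 bits per hop here) into the header payload.
static uint32_t encode_route(const std::vector<int>& out_ports) {
    uint32_t route = 0;
    for (size_t hop = 0; hop < out_ports.size(); ++hop)
        route |= static_cast<uint32_t>(out_ports[hop] & 0x3) << (2 * hop);
    return route;
}

// Turns one OCP burst into a packet: header flit + payload flits.
static std::vector<Flit> packetize(const std::vector<uint32_t>& burst,
                                   const std::vector<int>& route) {
    std::vector<Flit> packet;
    packet.push_back({true, false, encode_route(route)});
    for (size_t i = 0; i < burst.size(); ++i)
        packet.push_back({false, i + 1 == burst.size(), burst[i]});
    return packet;
}

int main() {
    std::vector<uint32_t> burst = {0xAAAA0000u, 0xAAAA0001u, 0xAAAA0002u};
    std::vector<int> route = {1, 2, 0};  // output port to use at each switch
    for (const Flit& f : packetize(burst, route))
        std::printf("%s flit, payload 0x%08X%s\n",
                    f.is_header ? "header" : "payload",
                    static_cast<unsigned>(f.payload),
                    f.is_tail ? " (tail)" : "");
    return 0;
}
```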

SLIDE 10

Topology exploration framework

  • Reference NoC architecture

Transaction level models

  • Traffic pattern generation

Exploration of multi‐dimensional topologies

  • System‐level performance analysis
  • Implementation space exploration
  • TLS driven physical synthesis
SLIDE 11

Transaction Level Models

NoC architecture → Transaction Level Models (abstraction):

  • Architectural components → data structures
  • Component behavior → logical functions
  • Sensitivity → events

Our transaction level models thus achieve maximum accuracy. Each cycle, only components affected by an event require simulation time

  • Speed-up is dependent on the system idleness
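The event-driven principle stated above is the classic discrete-event simulation scheme; a minimal plain-C++ sketch (all names invented, not the actual TL simulator) shows why fully idle cycles cost no simulation time.

```cpp
#include <cstdio>
#include <functional>
#include <queue>
#include <vector>

// Minimal discrete-event kernel in the spirit of the TL models:
// a component is executed only when an event addressed to it fires,
// so idle cycles are simply skipped.
struct Event {
    long cycle;
    std::function<void()> action;
    bool operator>(const Event& other) const { return cycle > other.cycle; }
};

class Simulator {
public:
    void schedule(long cycle, std::function<void()> action) {
        queue_.push({cycle, std::move(action)});
    }
    void run(long last_cycle) {
        while (!queue_.empty() && queue_.top().cycle <= last_cycle) {
            Event ev = queue_.top();
            queue_.pop();
            now_ = ev.cycle;
            ev.action();  // only the affected component runs
        }
    }
    long now() const { return now_; }

private:
    std::priority_queue<Event, std::vector<Event>, std::greater<Event>> queue_;
    long now_ = 0;
};

int main() {
    Simulator sim;
    // A "processor" producing a burst every 10 cycles; everything in
    // between is idle and consumes no simulation time at all.
    for (long t = 10; t <= 50; t += 10)
        sim.schedule(t, [t] { std::printf("cycle %ld: burst injected\n", t); });
    sim.run(100);
    return 0;
}
```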
SLIDE 12

Network Interface Master

[Diagram: NI master between the processor (OCP side) and the network (network side), with buffers and a STALL flag]

Data structure

  • Buffers modeled as counters
  • Flow control status modeled as a flag

NI Slave is modeled using a similar data structure
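As a concrete illustration of that data structure (invented names and sizes, not the framework's own types), the NI master state reduces to a couple of occupancy counters plus a stall flag:

```cpp
#include <cstdio>

// Illustrative TL state of the NI master: buffers are just occupancy
// counters and the downstream flow-control status is a single flag,
// as described on the slide. Names and sizes are invented.
struct NiMasterState {
    int ocp_buffer_bursts = 0;   // bursts received from the processor
    int flit_buffer_flits = 0;   // flits waiting to enter the network
    int flit_buffer_depth = 8;   // capacity of the output buffer
    bool stall = false;          // STALL flag raised by the attached switch

    bool can_send_flit() const {
        return flit_buffer_flits > 0 && !stall;
    }
};

int main() {
    NiMasterState ni;
    ni.flit_buffer_flits = 3;
    ni.stall = true;  // switch buffer full: transmission blocked
    std::printf("can send: %s\n", ni.can_send_flit() ? "yes" : "no");
    return 0;
}
```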

SLIDE 13

Network Interface Master

[Diagram: flit flow from the processor (OCP side) through the NI master (network side) to the network, with buffer counters and a STALL flag]

Events

  • Send: notifies message availability in the processor and indicates the size of the data in bursts; schedules an OCP Transfer Event according to the processor rate
  • OCP Transfer: sends one burst of data from the processor to the NI and schedules a Packetization Event; an additional OCP Transfer Event is scheduled when the OCP is free and the processor has pending bursts
  • Packetization: packet header generation, packetization and path calculation; each time a flit is created it schedules a Transmit Event if possible
  • Transmit: if possible, moves flits from the NI to the attached switch
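This chain of events can be mimicked with a tiny scheduler (plain C++ sketch; the OCP clock ratio, burst count and flits per burst are made-up numbers): the Send event kicks off an OCP Transfer that re-schedules itself at the processor rate, and every flit produced by packetization schedules a Transmit.

```cpp
#include <cstdio>
#include <functional>
#include <queue>
#include <vector>

// Sketch of the NI master event chain: Send -> OCP Transfer (repeated at
// the processor rate) -> Packetization -> Transmit. The OCP clock ratio,
// burst count and flits per burst are invented for the example.
struct Ev {
    long cycle;
    std::function<void()> run;
    bool operator>(const Ev& o) const { return cycle > o.cycle; }
};
using Agenda = std::priority_queue<Ev, std::vector<Ev>, std::greater<Ev>>;

int main() {
    Agenda agenda;
    const int bursts = 3;           // message size announced by the Send event
    const long ocp_period = 4;      // OCP clock is 1/4 of the NoC clock here
    const int flits_per_burst = 2;

    // OCP Transfer: move one burst into the NI, packetize it into flits
    // (each flit schedules a Transmit), then re-schedule itself while the
    // processor still has pending bursts.
    std::function<void(long, int)> ocp_transfer = [&](long cycle, int burst) {
        std::printf("cycle %ld: OCP Transfer, burst %d\n", cycle, burst);
        for (int f = 0; f < flits_per_burst; ++f) {
            long tx_cycle = cycle + 1 + f;
            agenda.push({tx_cycle, [tx_cycle, burst, f] {
                std::printf("cycle %ld: Transmit flit %d of burst %d\n",
                            tx_cycle, f, burst);
            }});
        }
        if (burst + 1 < bursts) {
            long next = cycle + ocp_period;
            agenda.push({next, [&, next, burst] {
                ocp_transfer(next, burst + 1);
            }});
        }
    };

    // Send: the processor announces a pending message of `bursts` bursts.
    agenda.push({0, [&] {
        std::printf("cycle 0: Send, %d bursts pending\n", bursts);
        agenda.push({ocp_period, [&] { ocp_transfer(ocp_period, 0); }});
    }});

    while (!agenda.empty()) {
        Ev ev = agenda.top();
        agenda.pop();
        ev.run();
    }
    return 0;
}
```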

SLIDE 14

Switch Fabric

[Diagram: switch with input and output ports, each carrying a STALL flag]

Data structure

  • Input and output buffers modeled as counters
  • Flow control status at each port modeled as a flag
  • Internal variables used to represent switch status

SLIDE 15

Switch Fabric

[Diagram: switch with input and output ports, each carrying a STALL flag]

Events

  • Input Transmit: scheduled by the previous NoC component; moves a flit to an input port; if it is the first flit of a packet it schedules a Route Event, otherwise a Cross Event
  • Route: chooses, for each output, one input whose packet is not yet routed and wants to go through that output; if it is possible to move flits from input to output, it schedules a Cross Event
  • Cross: if possible, moves flits from input to output; frees the output when moving the tail flit; if possible, schedules an Output Transmit Event
  • Output Transmit: if possible, transmits a flit to the next NoC component
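A plain-C++ sketch of the Route/Cross steps over counter-modeled buffers follows (illustrative only; port count, buffer depth and all names are invented, and the Input/Output Transmit events are omitted):

```cpp
#include <cstdio>

// Illustrative TL switch state: per-port occupancy counters plus the
// output currently owned by the packet at each input. The Route/Cross
// logic mirrors the event descriptions above.
struct TlSwitch {
    static const int kPorts = 4;
    static const int kDepth = 4;
    int in_flits[kPorts] = {0, 0, 0, 0};
    int out_flits[kPorts] = {0, 0, 0, 0};
    int routed_to[kPorts] = {-1, -1, -1, -1};   // output owned by each input
    bool out_busy[kPorts] = {false, false, false, false};

    // Route event: give a free output to one unrouted input that wants it.
    void route(int input, int wanted_output) {
        if (routed_to[input] < 0 && !out_busy[wanted_output]) {
            routed_to[input] = wanted_output;
            out_busy[wanted_output] = true;
        }
    }

    // Cross event: move one flit from input to its output if there is room.
    void cross(int input, bool tail_flit) {
        int output = routed_to[input];
        if (output < 0 || in_flits[input] == 0 || out_flits[output] >= kDepth)
            return;
        --in_flits[input];
        ++out_flits[output];
        if (tail_flit) {             // tail frees the output for other packets
            out_busy[output] = false;
            routed_to[input] = -1;
        }
    }
};

int main() {
    TlSwitch sw;
    sw.in_flits[0] = 2;        // a 2-flit packet waiting at input 0
    sw.route(0, 3);            // Route event: claim output 3
    sw.cross(0, false);        // body flit crosses
    sw.cross(0, true);         // tail flit crosses and releases output 3
    std::printf("output 3 flits: %d, busy: %d\n", sw.out_flits[3], sw.out_busy[3]);
    return 0;
}
```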

SLIDE 16

TL Models Validation

System definition (topology description + traffic pattern)

[Flow diagram: the same system definition feeds the RTL simulator and the TL simulator; their results are compared for validation]

SLIDE 17

TL Models Validation

[Test setup: OCP processor with NI master and OCP shared memory with NI slave, connected through a chain of switches]

Network frequency: 1 GHz. OCP core frequency: 1 GHz, 500 MHz, 250 MHz, 125 MHz. Network/OCP clock ratio: 1, 2, 4, 8.

Several OCP traffic patterns (parameters: burst length and inter‐burst idle time)

SLIDE 18

TL Models Validation

  • Maximum error of all the tests was: 0.03%
  • Simulation speedup varied from 20x to 100x with respect to the RTL simulator

– Depends heavily on the number of idle cycles of the simulation

  • 4x4 mesh test:

– Maximum error: 0.01%
– Speed-up: ~100x

SLIDE 19

Topology exploration framework

  • Reference NoC architecture

Transaction level models

Traffic pattern generation

Exploration of multi‐dimensional topologies

  • System‐level performance analysis
  • Implementation space exploration
  • TLS driven physical synthesis
SLIDE 20

Tile Architecture

[Tile diagram: a processor core attached to a Network IF initiator and a local memory core attached to a Network IF target]

  • Processor core

– Connected through a Network Interface Initiator

  • Local memory core

– Connected through a Network Interface Target

  • Two network interfaces can be used in parallel
SLIDE 21

Communication protocol

  • Step 1: The producer checks a local semaphore for pending messages to the destination; if there is none, it writes the data to the local tile memory and unblocks a semaphore at the consumer tile. The producer is then free to carry out other tasks.

[Diagram: producer tile (write message, reset semaphore, local polling) and consumer tile (local polling, read operation), steps 1-4]

  • Step 2: The consumer detects the unblocked semaphore and requests the data from the producer.

  • Step 3: The consumer reads the data from the producer.

  • Step 4: The consumer sends a notification upon completion.

– This allows the producer to send another message to this consumer (a code sketch of this handshake follows the citation below)

  • Message sent only when consumer is ready to read it
  • Only one outstanding message for a producer-consumer pair
  • Low network bandwidth utilization
  • Tight latency constraints on the topology

Dalla Torre, A. et al., ”MP-Queue: an Efficient Communication Library for Embedded Streaming Multimedia Platform”, IEEE Workshop on Embedded Systems for Real-Time Multimedia, 2007.
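Below is a threaded C++ sketch of the four-step handshake. It is only illustrative: shared variables and a condition variable stand in for tile-local memory, NoC transfers and hardware semaphores, and none of the names come from the MP-Queue library. It does preserve the key property that only one message per producer-consumer pair is outstanding.

```cpp
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <string>
#include <thread>

// Sketch of the producer/consumer handshake described above. Shared
// variables stand in for tile-local memory and NoC semaphores; all
// names are invented (this is not the MP-Queue library).
struct Channel {
    std::mutex m;
    std::condition_variable cv;
    bool message_ready = false;   // semaphore unblocked at the consumer tile
    bool consumer_done = true;    // completion notification (step 4)
    std::string producer_memory;  // message staged in the producer tile
};

void producer(Channel& ch, int messages) {
    for (int i = 0; i < messages; ++i) {
        std::unique_lock<std::mutex> lk(ch.m);
        // Step 1: wait until no message is outstanding for this consumer,
        // then write the data locally and unblock the consumer semaphore.
        ch.cv.wait(lk, [&] { return ch.consumer_done; });
        ch.producer_memory = "message " + std::to_string(i);
        ch.consumer_done = false;
        ch.message_ready = true;
        ch.cv.notify_all();
        // The producer is now free to carry out other tasks.
    }
}

void consumer(Channel& ch, int messages) {
    for (int i = 0; i < messages; ++i) {
        std::unique_lock<std::mutex> lk(ch.m);
        // Step 2: detect the unblocked semaphore (modeled as a wait here).
        ch.cv.wait(lk, [&] { return ch.message_ready; });
        // Step 3: read the data from the producer tile.
        std::printf("consumer read: %s\n", ch.producer_memory.c_str());
        ch.message_ready = false;
        // Step 4: notify completion so the producer may send the next message.
        ch.consumer_done = true;
        ch.cv.notify_all();
    }
}

int main() {
    Channel ch;
    std::thread p(producer, std::ref(ch), 3);
    std::thread c(consumer, std::ref(ch), 3);
    p.join();
    c.join();
    return 0;
}
```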

SLIDE 22

Workload distribution

[Diagram: workload distribution across the tiles, with external I/O devices]

  • Producer, worker and consumer tasks
  • I/O devices dedicated to input OR output data
  • Modeling of layout constraints (I/O devices on one side of the chip)
SLIDE 23

Topology exploration framework

  • Reference NoC architecture

  • Transaction level models
  • Traffic pattern generation

Exploration of multi-dimensional topologies

System‐level performance analysis

  • Implementation space exploration
  • TLS driven physical synthesis
SLIDE 24

System‐level performance analysis

  • Tile‐based architecture
  • 16-tile system

– Up to 5 tiles used to access external I/O

  • Baseline topology 4x4 mesh (4‐ary 2‐mesh)

– Switch frequency: 1 GHz
– Tile frequency: 500 MHz
– External I/O frequency: 500 MHz

SLIDE 25

Topologies Under Test

                 4-ary 2-mesh   2-ary 4-mesh
Switches               16             16
Bis. Band.              4              8
Tiles x Switch          1              1
Switch Arity            6              6
Max. Hops               6              4

[Topology diagrams] 4-ary 2-mesh: baseline topology. 2-ary 4-mesh: high bandwidth.

SLIDE 26

Topologies Under Test

                 4-ary 2-mesh   2-ary 4-mesh   2-ary 3-mesh
Switches               16             16              8
Bis. Band.              4              8              4
Tiles x Switch          1              1              2
Switch Arity            6              6              7
Max. Hops               6              4              3

[Topology diagrams] 4-ary 2-mesh: baseline topology. 2-ary 3-mesh: low concentration degree.

SLIDE 27

Topologies Under Test

                 4-ary 2-mesh   2-ary 4-mesh   2-ary 3-mesh   2-ary 2-mesh
Switches               16             16              8              4
Bis. Band.              4              8              4              2
Tiles x Switch          1              1              2              4
Switch Arity            6              6              7             10
Max. Hops               6              4              3              2

[Topology diagrams] 4-ary 2-mesh: baseline topology. 2-ary 2-mesh: high concentration degree.
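The table entries follow standard k-ary n-mesh formulas; the short C++ sketch below (written for this document, not part of the framework) reproduces them for the 16-tile system. It assumes that the arity column counts up to two network ports per dimension (only one when k = 2) plus two NI ports per attached tile, and that bisection bandwidth is counted in links.

```cpp
#include <cstdio>

// Reproduces the table above from standard k-ary n-mesh formulas for a
// 16-tile system.
struct MeshTopology {
    int k;  // radix (nodes per dimension)
    int n;  // number of dimensions
};

static int ipow(int base, int exp) {
    int result = 1;
    for (int i = 0; i < exp; ++i) result *= base;
    return result;
}

static void report(MeshTopology t, int total_tiles) {
    int switches = ipow(t.k, t.n);
    int tiles_per_switch = total_tiles / switches;
    int bisection_links = ipow(t.k, t.n - 1);         // k even
    int net_ports = t.n * (t.k > 2 ? 2 : 1);          // worst-case switch
    int arity = net_ports + 2 * tiles_per_switch;     // + NI initiator/target
    int max_hops = t.n * (t.k - 1);                   // mesh diameter
    std::printf("%d-ary %d-mesh: %2d switches, bis. band. %d, "
                "%d tiles/switch, arity %2d, max hops %d\n",
                t.k, t.n, switches, bisection_links,
                tiles_per_switch, arity, max_hops);
}

int main() {
    const MeshTopology topologies[] = {{4, 2}, {2, 4}, {2, 3}, {2, 2}};
    for (MeshTopology t : topologies)
        report(t, 16);
    return 0;
}
```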

SLIDE 28

Scenarios

  • Performance scenarios defined by:

– Number of producer and consumer (I/O) tiles
– Computation time of the worker

  • 4 different performance scenarios

– Worker
– Consumer
– Producer
– Balanced

  • At first, only execution cycles are considered
  • Real frequencies are considered in the next step
SLIDE 29

Worker Scenario

  • Bottleneck in the workers
  • Producers are fast enough to feed workers with data
  • Consumers are fast enough to absorb output data from workers
  • Producers and consumers experience idleness

– Waiting for workers to process data

  • The network is not the bottleneck, so this scenario is not very topology-sensitive
  • Choice among topologies should be based on physical implementation considerations
SLIDE 30

Consumer Scenario

  • Bottleneck in the consumers
  • Producers are fast enough to feed workers with data
  • Consumers are NOT fast enough to absorb output data from workers

  • Workers wait almost 50% of the time to send data to the consumers
  • Network latency-sensitive scenario

– Concentrated topologies outperform the others (in execution cycles) by ~9%

SLIDE 31

Producer Scenario

  • Bottleneck in the producers
  • Producers are NOT fast enough to feed workers with data

  • The network is not the bottleneck; performance in this scenario is insensitive to topology
  • Choice among topologies should be based on physical implementation considerations
SLIDE 32

Balanced Scenario

[Chart: execution cycles per topology in the balanced scenario; ~3.5% spread]

  • Balanced scenario.

– Minimized idle time of all tiles in the system (producer, consumer and worker)

  • Highest bandwidth pressure to the network
  • 4-hypercube provides more bandwidth
  • Concentrated topologies trade bandwidth for low latency

– Worse performance than the 4-hypercube, but still outperform the 2D mesh

SLIDE 33

Topology exploration framework

  • Reference NoC architecture

Transaction level models

  • Traffic pattern generation

Exploration of multi‐dimensional topologies

  • System‐level performance analysis

Implementation space exploration

  • TLS driven physical synthesis
SLIDE 34

Implementation Space Exploration

  • Scenarios where topology is relevant for performance were selected

– Exploration of possible implementation effects
– Results will drive the synthesis process as optimization directives

  • Selected scenarios
  • Selected scenarios

– Consumer Scenario
– Balanced Scenario

SLIDE 35

Balanced Scenario

  • Implementation space restricted to the baseline 2D-mesh compared with the best performing topology: the 4-hypercube

  • Switch arity is the same for both topologies
  • Same switch frequency
  • Possible latency degradation at the express links

[Topology diagrams: 4-ary 2-mesh (baseline topology) and 2-ary 4-mesh (high bandwidth)]

SLIDE 36

Balanced Scenario

The breakeven point occurs with 5 cycles of latency on the express links

SLIDE 37

Consumer Scenario

  • Implementation space restricted to the baseline 2D-mesh compared with the best performing topology: the 2-hypercube

  • Possible physical degradation effects:

– Maximum achievable frequency (switch arity)
– Several layouts possible to connect more cores per switch:

  • Latency in the network links
  • Latency in the injection links

Different layouts might impact performance in very different ways

SLIDE 38

Network Link Constrained Layout

  • 4 tiles placed around the switch.
  • Network links might be multi‐cycle
  • 2‐ary 2‐mesh has switches with higher radix

[Diagram: 2-ary 2-mesh layout with 4 tiles placed around each switch]

Under which minimum frequency and link latency does the concentrated hypercube still outperform the 2D mesh?

SLIDE 39

Network Link Constrained Layout

  • A switch frequency of 900 MHz allows no additional delay
  • A switch frequency of 1 GHz allows up to 4 cycles of additional delay

[Chart: concentrated hypercube vs. 2D mesh performance]

SLIDE 40

Injection Link Constrained Layout

  • Central network around which all cores are placed
  • Injection and ejection links might be multi‐cycle
  • Test performed by scaling clock frequency together with latency in injection and ejection links

[Diagram: 2-ary 2-mesh layout with a central network and all cores placed around it]

SLIDE 41

Injection Link Constrained Layout

  • A switch frequency of 900 MHz allows no additional delay
  • A switch frequency of 1 GHz allows up to 2 cycles of additional delay

  • Injection link latency has a higher performance penalty

[Chart: concentrated hypercube vs. 2D mesh performance]

SLIDE 42

Topology exploration framework

  • Reference NoC architecture

  • Transaction level models
  • Traffic pattern generation

Exploration of multi-dimensional topologies

  • System‐level performance analysis
  • Implementation space exploration

TLS driven physical synthesis

SLIDE 43

Topology Physical Design

[Topology diagrams: 2-D mesh and 4-hypercube]

4-hypercube should require twice the wiring resources of the 2D-mesh

Asymmetric tile size makes the traditional assumptions on mesh and hypercube wiring questionable
SLIDE 44

Topology Physical Design

Tiles include

  • Processor
  • Memory

Placement-aware logic synthesis and place-and-route on a STMicroelectronics 65nm Low Power library

Tile size: 1mm x 2mm

  • Computation tiles are rendered as hard obstructions
  • Fences are defined to limit the area where the cells of each module of the interconnect can be placed
  • Worst case: we prevented over-the-cell routing for the hard obstructions, thus posing more constraints on network link routing

[Floorplan legend: switch, NI initiator, NI target]
SLIDE 45

Topology Physical Design

Physical synthesis for max performance

[Post-layout views of the 4-hypercube and the 2-D mesh]

4-hypercube: maximum network link delay 0.91 ns; the network can sustain the target 1 GHz.

2D mesh: maximum network link delay 0.45 ns; the network can sustain the target 1 GHz.

NI critical path: 0.65 ns. Switch maximum delay: 0.84 ns.

Both topologies can reach the target frequency (1 GHz) without pipelining the express links of the hypercube.

SLIDE 46

Topology Physical Design

SLIDE 47

Wire length report

Total wire length increases by only 25% in the 4-hypercube

– Limited size of the system
– Asymmetric physical size of the computation tiles
– A high percentage of the wiring is inside the components (switch, NI, etc.)

SLIDE 48

Conclusions

  • Development of an analysis framework for NoC topologies

– Fast and accurate TL simulation
– Exploration of the implementation space
– Drives the synthesis process
– Conservative BW utilization of the communication middleware taken into account

  • Methodology is applied to a 16‐tile system

– There is already performance differentiation despite the limited system scale

  • Automatic routing of these topologies might provide surprising results

– Physical degradation of hypercubes can be relieved by:
  – Smart switch placement
  – Asymmetric tile size

– Performance-wise, hypercubes can be an efficient and feasible solution

SLIDE 49

Future Work

  • Extend the analysis framework to highly integrated MPSoCs

– Hundreds of tiles

  • Set up a power characterization flow
  • Analysis of layout feasibility of concentrated topologies

– Possible trade‐off between power/area and performance

SLIDE 50

Backup Slides

SLIDE 51

Reference NoC architecture

[Diagram: sender and receiver connected through link repeaters, with FLIT, REQ and STALL wires]

Stall/go flow control: a simplified realization of an ON/OFF flow control protocol

  • A repeater is a simple two-stage FIFO
  • Two control wires: a forward one flagging data availability, and a backward one signaling either buffers filled (STALL) or buffers free (GO)
  • The sender needs two buffers to cope with stalls in the very first link repeater
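A behavioural C++ sketch of one such repeater follows (illustrative only, not the RTL; the buffer depth, the conservative stall condition and the modeled backward-wire delay are assumptions made for the example). The point it demonstrates is that the second FIFO stage absorbs the flit that is already in flight when STALL finally reaches the sender.

```cpp
#include <cassert>
#include <cstdio>
#include <deque>

// Behavioural sketch of one stall/go link repeater: a two-stage FIFO
// with a forward "valid" wire and a backward stall/go wire. STALL is
// raised conservatively whenever only one slot is left, so a flit
// already in flight on the link still finds room.
struct Repeater {
    std::deque<int> fifo;          // two-stage FIFO
    bool stall_upstream = false;   // backward wire driven this cycle

    // One clock cycle: accept the upstream flit (if any), try to send
    // downstream (unless stalled), then update the backward wire.
    void cycle(bool in_valid, int in_flit, bool downstream_stall,
               bool& out_valid, int& out_flit) {
        if (in_valid)
            fifo.push_back(in_flit);
        assert(fifo.size() <= 2 && "two stages are enough");
        out_valid = false;
        if (!downstream_stall && !fifo.empty()) {
            out_valid = true;
            out_flit = fifo.front();
            fifo.pop_front();
        }
        stall_upstream = fifo.size() >= 1;   // only one free slot left
    }
};

int main() {
    Repeater rep;
    bool v_out = false;
    int f_out = 0;
    // The backward wire reaches the sender with some delay (modeled here
    // as two registers); the second FIFO stage absorbs the flit that is
    // already in flight when STALL finally arrives.
    bool wire_q = false, wire_qq = false;
    const bool downstream_stall[6] = {true, true, true, false, false, false};
    for (int t = 0; t < 6; ++t) {
        bool in_valid = !wire_qq;   // sender obeys the (delayed) STALL wire
        rep.cycle(in_valid, 100 + t, downstream_stall[t], v_out, f_out);
        std::printf("cycle %d: sent=%d flit=%d fifo=%zu stall=%d\n",
                    t, v_out, v_out ? f_out : -1, rep.fifo.size(),
                    rep.stall_upstream);
        wire_qq = wire_q;
        wire_q = rep.stall_upstream;
    }
    return 0;
}
```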

SLIDE 52

Transaction Level Models

SystemC models

  • RTL-equivalent, cycle accurate, signal accurate
  • Very slow: not fast enough for topology exploration

Transaction Level Models (abstraction)

  • Must keep accuracy
  • Must be faster

SLIDE 53

Network Interface Slave

[Diagram: NI slave between the network (network side) and the memory (OCP side), with an ejection interface]

Events

  • Transmit: scheduled by the previous NoC component; moves a flit from the network to the NI slave; if there are enough network flits to form a complete OCP burst, it schedules a DePacketization Event
  • DePacketization: translates network flits into a single OCP burst, writes the burst into the OCP buffer of the NI and schedules OCP Transmit Events at the memory rate
  • OCP Transmit: moves an OCP burst from the NI to the memory

SLIDE 54

Flow Control

[Diagram: STALL flags at the injection interface, input port, output port and ejection interface along the path from processor to memory; Stall and Go events propagate backwards]

Events

  • Stall: scheduled when a Cross or a Transmit event fills the receiving buffer; sets the STALL flag (it is not possible to move flits to stalled buffers); schedules a new Stall Event in the previous component if the signal must be propagated
  • Go: scheduled when a Cross, Transmit or DePacketization event frees a slot in a stalled buffer; unsets the STALL flag; schedules a new Go Event in the previous component if the signal must be propagated
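In the TL model this back-pressure is just flag bookkeeping over the occupancy counters; a compact C++ sketch follows (names, chain length and buffer depths are invented, and the actual movement of flits between buffers is left out):

```cpp
#include <cstdio>
#include <vector>

// Illustrative Stall/Go propagation over counter-modeled buffers.
// Chain: NI master -> switch input -> switch output -> NI slave.
struct Component {
    int flits = 0;        // buffer occupancy (counter)
    int depth = 2;        // buffer capacity
    bool stalled = false; // STALL flag: no flit may be moved into it
};

// Stall event: the receiving buffer of component i just filled up.
void stall_event(std::vector<Component>& chain, int i) {
    chain[i].stalled = true;
    if (i > 0 && chain[i - 1].flits == chain[i - 1].depth)
        stall_event(chain, i - 1);   // propagate to the previous component
}

// Go event: a slot was freed in the (stalled) buffer of component i.
void go_event(std::vector<Component>& chain, int i) {
    chain[i].stalled = false;
    if (i > 0 && chain[i - 1].stalled)
        go_event(chain, i - 1);      // upstream components may resume too
}

static void print_flags(const std::vector<Component>& chain, const char* tag) {
    std::printf("%s:", tag);
    for (const Component& c : chain) std::printf(" %d", c.stalled ? 1 : 0);
    std::printf("\n");
}

int main() {
    std::vector<Component> chain(4);
    for (Component& c : chain) c.flits = c.depth;  // every buffer is full
    stall_event(chain, 3);                         // ejection side fills up
    print_flags(chain, "STALL flags after Stall");
    chain[3].flits--;                              // DePacketization frees a slot
    go_event(chain, 3);
    print_flags(chain, "STALL flags after Go   ");
    return 0;
}
```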

SLIDE 55

TL Models Validation

[Test setup: OCP processor with NI master and OCP shared memory with NI slave, connected through a single switch]

Network frequency: 1 GHz. OCP core frequency: 500 MHz. Network/OCP clock ratio: 2.

Tests designed to check intra‐switch communication mechanisms

SLIDE 56

TL Models Validation

[Test setup: OCP processor with NI master and OCP shared memory with NI slave, connected through a single switch]

Network frequency: 1 GHz. OCP core frequency: 1 GHz, 500 MHz, 250 MHz, 125 MHz. Network/OCP clock ratio: 1, 2, 4, 8.

Tests designed to check clock domain crossing mechanism

SLIDE 57

TL Models Validation

[Test setup: OCP processor with NI master and OCP shared memory with NI slave, connected through a chain of five switches]

Network frequency: 1 GHz. OCP core frequency: 500 MHz. Network/OCP clock ratio: 2.

Tests designed to check inter‐switch communication mechanisms

SLIDE 58

TL Models Validation

  • Every test evaluated with different OCP traffic configurations

  • Maximum error of all the tests was: 0.03%
  • Simulation speedup varied from 20x to 100x with respect to the RTL simulator

– Depends heavily on the number of idle cycles of the simulation

SLIDE 59

Validation of TL simulator

[Diagram: full system with processors and shared memories connected through the 4x4 mesh]

The final test consists of a full system with several traffic configurations. Maximum error: 0.01%. Speedup of about 100x.

SLIDE 60

Topology Physical Design

Physical synthesis for max performance

[Post-layout views of the 4-hypercube and the 2-D mesh]

4-hypercube: maximum network link delay 0.91 ns; the network can sustain the target 1 GHz.

2D mesh: maximum network link delay 0.45 ns; the network can sustain the target 1 GHz.

NI critical path: 0.65 ns. Switch maximum delay: 0.84 ns.

Link delay overhead is due to logic gates along the link (buffers, flow control cells).

Both topologies can reach the target frequency (1 GHz) without pipelining the express links of the hypercube.