Contrasting Topologies for Regular Interconnection Networks under the Constraints of Nanoscale Technologies
MPSoC Research Group @ University of Ferrara
Daniele Ludovici, Francisco Gilabert, Maria Gomez, Georgi Gaydadjiev, Davide Bertozzi
The execution of many multimedia and signal processing functions has historically been accelerated by means of specialized processing engines.
The performance of hardware accelerators is becoming accessible by combining multiple programmable processor tiles within a multicore system.
With the advent of MPSoC technology, system complexity is more a matter of instantiation and connectivity capability than of architecture development.
Connectivity patterns for large-scale systems are well known from off-chip networking.
Nanoscale silicon technologies
Growing gap between pre- and post-layout properties of topology connectivity patterns
Over-the-cell routing?
Latency in injection links?
Latency in express links?
Which switch frequency?
Can automatic routing tools handle this effectively?
How is routing congestion at each metal layer impacted?
Regularity is broken by asymmetric tile sizes or heterogeneous tiles!
8-ary 2-mesh => 2D mesh
4-ary 3-mesh
4-ary 2-mesh
Other concentrated variants 2-ary 6-mesh
8-Cmesh
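The k-ary n-mesh notation above can be unpacked with a small script. This is only an illustrative sketch, not the authors' topology generator: the function name `mesh_switch_radixes` and the `local_ports=2` default are assumptions, chosen because two local ports per switch reproduce the radix figures quoted later in these slides (4, 5, 6 for the 8-ary 2-mesh and 8 for the 2-ary 6-mesh). Concentrated variants attach more tiles per switch and would need a different `local_ports` value.

```python
from itertools import product
from collections import Counter

def mesh_switch_radixes(k, n, local_ports=2):
    """Return the radix (port count) of every switch in a k-ary n-mesh.

    local_ports is an assumption: the number of ports each switch
    devotes to its attached tile(s).
    """
    radixes = []
    for coord in product(range(k), repeat=n):
        # One port per existing neighbour: a mesh has no wrap-around links,
        # so a switch on a boundary loses a port in that dimension.
        network_ports = sum((c > 0) + (c < k - 1) for c in coord)
        radixes.append(network_ports + local_ports)
    return radixes

for k, n in [(8, 2), (4, 3), (2, 6)]:
    radixes = mesh_switch_radixes(k, n)
    counts = dict(Counter(radixes))
    print(f"{k}-ary {n}-mesh: {len(radixes)} switches, radix counts {counts}")
```

All three configurations have k^n = 64 switches; what changes is how the radix is distributed across them, which is what drives the post-synthesis frequency of each switch.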
[Figure: design flow. Stages and tools shown: topology specification; topology generation (RTL, SystemC/Verilog); floorplan; physical synthesis (placement, clock tree synthesis, power grid, routing, post-routing optimization); netlist and parasitic extraction; PrimeTime (SDF timing); simulation with an OCP traffic generator and a transactional simulator (VCD trace); power estimation with PrimeTime PX.]
Challenge: layout-aware physical modeling of large-scale NoC topologies
The critical path is determined by the switch-to-switch link in a NoC topology!
Most topologies are not competitive with the 2D mesh because of their long links, and some are even unusable!
Longest link determines the highest achievable frequency (post-layout).
Highest switch radix determines the maximum frequency (post-synthesis).
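One way to read the two bounds together is that the clock period is set by whichever is slower, the switch logic or the longest link. The sketch below assumes exactly that simplified model; the function name and all delay values are hypothetical and are not measurements from this work.

```python
def achievable_frequency_mhz(switch_period_ns, longest_link_delay_ns):
    """Clock frequency when the slower of the two paths sets the cycle time.

    Simplifying assumption: switch logic and link delay are treated as
    two independent paths, and the longer one is the critical path.
    """
    cycle_ns = max(switch_period_ns, longest_link_delay_ns)
    return 1000.0 / cycle_ns

# Hypothetical delays, for illustration only.
print(achievable_frequency_mhz(1.0, 0.8))  # short links: switch-limited
print(achievable_frequency_mhz(1.0, 2.5))  # long links: link-limited
```

Under this model a topology with a fast low-radix switch can still end up slower after layout if its longest link exceeds the switch period.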
consequently their final synthesis frequency
Key take-away: they are not more area-efficient than the 2D mesh, but because of their slowdown, their area footprint can be overly optimized…
…never forget the target frequency when considering area footprint!
4-ary 2-mesh: short links (3 mm) => small performance drop. Few switches (16): 20% saving.
E.g., the 2-ary 6-mesh has a slower final frequency than the 8-ary 2-mesh, but all of its switches have radix 8 vs. 4, 5, 6 => the 8-ary 2-mesh has a 10% area saving.
[Figure: link pipelining between two switches. Left: a pipeline stage made of flip-flops on the data path. Right: a pipeline stage with data and stall signals and control logic.]
Two-slot buffers are needed for stall/go flow control:
one slot for normal flit propagation
one backup slot to compensate for the propagation delay
[Figure: flow control stage signals: sel, en1, en2, stall, valid, data]
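Why a second slot is needed can be seen in a small cycle-level model. The sketch below is a behavioural approximation, not the authors' RTL: the class `StallGoStage`, the `simulate` helper, and the one-cycle registered stall are assumptions made for illustration. The sender only sees the stall one cycle late, so one flit is already in flight when the stage stops forwarding, and the backup slot absorbs it.

```python
from collections import deque

class StallGoStage:
    """Behavioural model of one stall/go pipeline stage (illustrative)."""

    def __init__(self, slots=2):
        self.slots = slots
        self.buf = deque()
        self.stall_out = False  # registered stall, seen upstream next cycle

    def tick(self, flit_in, stall_in):
        """Advance one clock; return the flit forwarded downstream, or None."""
        flit_out = None
        if self.buf and not stall_in:
            flit_out = self.buf.popleft()
        if flit_in is not None:
            if len(self.buf) >= self.slots:
                raise OverflowError("flit lost: no free slot")
            self.buf.append(flit_in)
        self.stall_out = stall_in  # upstream reacts one cycle later
        return flit_out

def simulate(slots, stall_pattern, n_flits):
    """Drive the stage with a sender that obeys the delayed stall signal."""
    stage = StallGoStage(slots)
    sent, received = 0, []
    for stall_in in stall_pattern:
        flit_in = None
        if not stage.stall_out and sent < n_flits:
            flit_in, sent = sent, sent + 1
        out = stage.tick(flit_in, stall_in)
        if out is not None:
            received.append(out)
    return received

pattern = [False, False, True, True, False, False, False, False, False]
print(simulate(2, pattern, 5))  # all flits arrive, in order
```

Running the same pattern with `slots=1` raises `OverflowError` in the first stalled cycle, which is the flit that the backup slot would otherwise hold.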
[Figure: area overhead of a flip-flop barrier vs. a flow control stage]
Key take-away: each topology has a different price to pay to restore the maximum achievable frequency dictated by its elementary switch block
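The "price to pay" can be approximated as the number of pipeline stages each link needs so that no link segment is slower than the switch. The sketch below assumes that link delay divides evenly across segments and ignores the timing overhead of the pipeline registers themselves; the function name and the delay values are hypothetical.

```python
import math

def pipeline_stages_needed(link_delay_ns, switch_period_ns):
    """Stages to insert so every link segment fits in one switch clock period.

    Assumptions: delay splits evenly across segments, and the setup and
    clock-to-output overhead of the added registers is ignored.
    """
    segments = math.ceil(link_delay_ns / switch_period_ns)
    return max(segments - 1, 0)

# Hypothetical link delays against a 1.0 ns switch period.
for link_delay_ns in (0.8, 2.0, 2.5):
    stages = pipeline_stages_needed(link_delay_ns, 1.0)
    print(f"link delay {link_delay_ns} ns -> {stages} pipeline stage(s)")
```

Since every stage carries a two-slot buffer per link, topologies with many long links accumulate area overhead faster than those with short links.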
[Figure: results at three levels: theoretical; layout-aware without link pipelining; layout-aware with link pipelining]
using link pipelining techniques:
Is the performance boost proportionate to the area overhead?
… is NOT cost-effective, in that the area overhead is disproportionate to the performance boost
… achieves better area efficiency
C-mesh topologies at different levels of abstraction: from system- to layout-level
from an area and timing viewpoint while pruning implementation time and memory requirements
C-mesh topologies preserve performance benefits…
…but this comes at a disproportionate area cost!
daniele.ludovici@unife.it