Andrew Clinton, Matt Liberty, Ian Kuon FPGA Routing (Interconnect) - PowerPoint PPT Presentation

Andrew Clinton, Matt Liberty, Ian Kuon

FPGA Routing (Interconnect)
FPGA routing consists of a network of wires and programmable switches
• Wire is modeled with a reduced RC network
• Drivers are modeled as a SPICE netlist
• 2-level pass gate mux is modeled with a capacitive load model
• Programmability comes through SRAM bits that control the pass gate switches

Routing Delay Annotation
Routing (interconnect) delay calculation contributes significantly to overall FPGA compiler runtime
• Timing graph topology and wire loading are not known in advance
• Due to this high degree of runtime configurability, we've previously relied on high-accuracy SPICE-like simulations to calculate routing delays
• These simulations have historically contributed as much as 10% to overall FPGA compiler runtime
  – For just signoff timing, the proportion of runtime is larger

Routing Tree Traversal (SPICE)
In software, routing is represented as a forest of trees
• Trees are sourced and sinked at timing cells (such as logic elements or DSPs)
• For each tree, delay annotation traverses the tree in depth first order
• Each driver/load pair is simulated using SPICE
  – Output waveform(s) are propagated to children
• Node delays are saved
[Figure: routing tree of driver/load pairs. Legend: SPICE simulation (rise and fall), Liberty cell evaluation]
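The traversal described on this slide can be sketched as below. This is a minimal illustration, not the authors' implementation: `RouteNode`, `annotate_delays`, and the `simulate` callback are hypothetical names, and `simulate` stands in for one driver/load simulation that returns a delay and the waveform handed to the children.

```python
from dataclasses import dataclass, field


@dataclass
class RouteNode:
    """One routing element (hypothetical structure for illustration)."""
    name: str
    children: list = field(default_factory=list)


def annotate_delays(root, input_waveform, simulate):
    """Visit a routing tree depth first and save one delay per node.

    simulate(node, waveform) -> (delay, output_waveform) is an assumed
    stand-in for a single driver/load simulation.
    """
    delays = {}
    stack = [(root, input_waveform)]
    while stack:
        node, waveform = stack.pop()
        delay, output_waveform = simulate(node, waveform)
        delays[node.name] = delay
        # Push children in reverse so they are visited left to right.
        for child in reversed(node.children):
            stack.append((child, output_waveform))
    return delays


def toy_simulate(node, slew):
    """Toy model: delay grows with input slew, and slew degrades by 1."""
    return 2.0 * slew, slew + 1.0


tree = RouteNode("a", [RouteNode("b", [RouteNode("d")]), RouteNode("c")])
delays = annotate_delays(tree, 1.0, toy_simulate)
```

An explicit stack is used instead of recursion so that deep routing trees do not hit Python's recursion limit.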

RICE – Rapid Interconnect Evaluation
An implementation of AWE (Asymptotic Waveform Evaluation)
• Black box that takes a circuit as input and provides the impulse response as output
  – In our case, always a grounded RC circuit
  – Sometimes containing resistor loops
• Impulse response is a sum of exponentials
  – Given impulse response, can calculate the output voltage waveform for an arbitrary input
• Generally O(n) in circuit complexity and number of moments
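To illustrate how a sum-of-exponentials impulse response yields a delay, the sketch below evaluates the step response from poles and residues and finds a threshold crossing by bisection. It assumes real, negative poles and a monotonic step response; the function names and the bisection search are illustrative choices, not taken from RICE.

```python
import math


def step_response(poles, residues, t):
    """Step response of h(t) = sum(k * exp(p * t)), poles real and negative."""
    return sum(k / p * math.expm1(p * t) for p, k in zip(poles, residues))


def crossing_time(poles, residues, threshold=0.5, iterations=200):
    """Time at which the step response first reaches `threshold`.

    Assumes a monotonic step response, so plain bisection is sufficient.
    """
    final_value = sum(-k / p for p, k in zip(poles, residues))
    if final_value <= threshold:
        raise ValueError("step response never reaches the threshold")
    lo, hi = 0.0, 1.0 / min(abs(p) for p in poles)
    while step_response(poles, residues, hi) < threshold:
        hi *= 2.0  # expand the bracket until it contains the crossing
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if step_response(poles, residues, mid) < threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Single-pole RC with time constant 100 ps (times in ps, pole in 1/ps).
tau = 100.0
delay_50 = crossing_time([-1.0 / tau], [1.0 / tau])
```

For the single-pole case the 50% delay reduces to the familiar `tau * ln(2)`.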

RICE vs SPICE, 84 Node RC Network Step Response

Algorithm          Delay (ps)   Error     Runtime (us)
RICE, order 1        94.554     6%           41.0
RICE, order 2       108.399     8%           50.9
RICE, order 3       100.137     <0.01%       57.0
RICE, order 4       100.139     <0.01%       63.0
SPICE, 50ps step     99.377     0.75%       143.3
SPICE, 10ps step    100.180     0.04%       264.6
SPICE, 5ps step     100.128     <0.01%      418.6
SPICE, 2ps step     100.135     <0.01%      872.0

Integrating RICE with Non-Linear Drivers
RICE can calculate accurate linear circuit delays approximately 1 order of magnitude faster than our SPICE simulator. However, it doesn't handle non-linear drivers
• The challenge is then to obtain sufficiently accurate driver delays without incurring the cost of simulations
• Our general approach involves pre-computing a table of voltage waveforms at the driver output, parameterized by:
  – Input waveform slew
  – Output load (pi model)
• Similar to Liberty cell models, we will query this table at runtime

Cumulative Approximation Sequence
The following slides will outline a sequence of approximations that help to break down the sources of error that arise from replacing SPICE with RICE:
• 3.1 Splitting Driver / Load Simulations
• 3.2 Reducing Input Waveforms to 1 Parameter
• 3.3 Using RICE for Loads
• 3.4 Reducing Driver Load Model to 3 Parameters
• 3.5 4D Driver Waveform Cache
• 3.6 2D Driver Waveform Cache

3.1 Splitting Driver and Load Simulations
Driver and load delay calculation need to be separate to substitute RICE for just the load
• As a first step toward this goal, split up the monolithic driver/load simulation into separate driver and load sims
• With a small step size, there should be little impact on delays
• Useful for sanity checking our flow
[Figure: driver and load simulated separately. Legend: SPICE simulation (rise and fall), Liberty cell evaluation]

3.2 Reducing Input Waveforms to 1 Parameter
To key our waveform cache on input waveforms, we need to reduce waveform dimensionality
• Routing waveforms are strongly exponential
  – We've chosen this shape as our fit target
• Some outliers don't fit well, resulting in bias/variance
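One way to reduce a waveform to a single parameter is a least-squares fit of an exponential time constant. The sketch below assumes a normalized rising waveform of the form `v(t) = 1 - exp(-t / tau)` and fits `tau` with a linearized least-squares regression through the origin. The linearization and the name `fit_time_constant` are assumptions for illustration; the slides do not say how the fit is performed.

```python
import math


def fit_time_constant(times, voltages):
    """Fit tau in v(t) = 1 - exp(-t / tau) by linearized least squares.

    Taking y = ln(1 - v) gives y = -t / tau, a line through the origin
    whose slope is sum(t * y) / sum(t * t).
    """
    num = 0.0
    den = 0.0
    for t, v in zip(times, voltages):
        if t > 0.0 and 0.0 <= v < 1.0:  # skip samples where the log is undefined
            y = math.log(1.0 - v)
            num += t * y
            den += t * t
    if num >= 0.0:
        raise ValueError("waveform does not rise; cannot fit a time constant")
    return -den / num


# Samples of an ideal exponential with tau = 40 ps, taken every 5 ps.
sample_times = [5.0 * i for i in range(1, 21)]
sample_volts = [1.0 - math.exp(-t / 40.0) for t in sample_times]
tau_fit = fit_time_constant(sample_times, sample_volts)
```

A waveform that is not exponential still produces a `tau`, which is where the bias and variance mentioned on the slide would come from.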

3.3 Using RICE for Loads
Our initial evaluation showed almost no RICE evaluation error (<0.01%) for step response
• Calculating the response to arbitrary input waveforms leads to some error due to our convolution implementation
  – We found it necessary to implement this convolution with discretization and an internal 5ps step size to improve runtime
• Low order could compromise accuracy
  – Order 4 seems to converge fairly completely in our tests
[Figure: driver and load. Legend: SPICE simulation (rise and fall), RICE evaluation, Liberty cell evaluation]
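The response to an arbitrary input can be computed on a fixed time step when the impulse response is a sum of exponentials. The sketch below uses a recursive convolution that treats the input as piecewise linear between samples and keeps one state per pole. It is one possible discretization under those assumptions, not necessarily the convolution used in the deck, and `convolve_pwl` is a hypothetical name.

```python
import math


def convolve_pwl(samples, dt, poles, residues):
    """Convolve a piecewise linear input with h(t) = sum(k * exp(p * t)).

    `samples` are input voltages at t = 0, dt, 2*dt, ... (non-empty).
    The circuit is assumed to start at rest, so the first output is 0.
    """
    states = [0.0] * len(poles)
    output = [0.0]
    for x0, x1 in zip(samples, samples[1:]):
        slope = (x1 - x0) / dt
        for i, (p, k) in enumerate(zip(poles, residues)):
            em1 = math.expm1(p * dt)  # exp(p * dt) - 1
            # Exact integral of k * exp(p * (dt - u)) * (x0 + slope * u), u in [0, dt].
            forced = k * (x0 * em1 / p + slope * (em1 - p * dt) / (p * p))
            states[i] = math.exp(p * dt) * states[i] + forced
        output.append(sum(states))
    return output


# 5 ps ramp into a single-pole RC with tau = 100 ps, sampled every 5 ps.
ramp = [0.0, 1.0, 1.0, 1.0]
response = convolve_pwl(ramp, 5.0, [-0.01], [0.01])
```

Each step costs one exponential per pole, so runtime scales with the number of samples times the model order, which is consistent with a coarser internal step being faster.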

3.4 Reducing Driver Load Model to 3 Parameters
To key our waveform cache on the output load, we need to reduce the dimensionality of the load
• A Pi model for the load is readily available from the first 4 moments in RICE
• Some inaccuracy in driver waveform shape is possible with this approximation
[Figure: Pi model]
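A three-parameter pi load (near capacitance, resistance, far capacitance) can be obtained by matching moments of the driving-point admittance. The sketch below uses the classical reduction from the first three nonzero admittance moments `y1`, `y2`, `y3` in `Y(s) = y1*s + y2*s^2 + y3*s^3 + ...`. Whether this corresponds exactly to the "first 4 moments" mentioned on the slide (for example if a zeroth moment is counted) is an assumption, and the function names are hypothetical.

```python
import math


def pi_model_from_moments(y1, y2, y3):
    """Return (c_near, r, c_far) matching three admittance moments."""
    c_far = y2 * y2 / y3
    r = -(y3 * y3) / (y2 ** 3)
    c_near = y1 - c_far
    return c_near, r, c_far


def moments_of_pi(c_near, r, c_far):
    """Admittance moments of a pi load: Y(s) = s*c_near + s*c_far / (1 + s*r*c_far)."""
    y1 = c_near + c_far
    y2 = -r * c_far ** 2
    y3 = r ** 2 * c_far ** 3
    return y1, y2, y3


# Round trip: 2 fF near, 500 ohm, 3 fF far.
original = (2e-15, 500.0, 3e-15)
recovered = pi_model_from_moments(*moments_of_pi(*original))
```

The round trip is exact up to floating-point error because a pi load has only three degrees of freedom; a larger RC tree is only approximated by its pi model.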

3.5 4D Driver Waveform Cache
Given an input waveform / load in reduced parameter space, we can tabulate driver waveforms
• Choose evaluation points on each axis
• Evaluate and store monotonic waveforms
• At runtime, interpolate/extrapolate waveforms in the cache
  – Interpolating time, not voltage, requires monotonicity
[Figure: driver and load. Legend: 4D cache evaluation, RICE evaluation, Liberty cell evaluation]
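Interpolating time rather than voltage can be sketched along a single cache axis as follows. Each cached waveform is assumed to be stored as crossing times at a fixed set of voltage levels, and monotonicity is re-imposed with a running maximum after blending. The storage layout, the running-maximum repair, and the name `interpolate_waveform` are assumptions for illustration; a 4D cache would repeat the same blend along each axis.

```python
from itertools import accumulate


def interpolate_waveform(p, p0, times0, p1, times1):
    """Blend two cached waveforms along one parameter axis.

    times0 and times1 hold crossing times at the same fixed voltage
    levels, for cache points p0 and p1. Values of p outside [p0, p1]
    extrapolate, which can break monotonicity, so the result is forced
    to be non-decreasing.
    """
    w = (p - p0) / (p1 - p0)
    blended = [(1.0 - w) * a + w * b for a, b in zip(times0, times1)]
    return list(accumulate(blended, max))


# Interpolation halfway between two cache points.
midway = interpolate_waveform(15.0, 10.0, [1.0, 2.0, 3.0], 20.0, [2.0, 4.0, 6.0])
# Extrapolation far outside the cache, where monotonicity must be forced.
forced = interpolate_waveform(40.0, 10.0, [1.0, 2.0, 3.0], 20.0, [2.0, 2.5, 3.0])
```

Because every waveform shares the same voltage levels, the blend is a plain element-wise operation over fixed-length arrays.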

3.5 4D Interpolation
Several sources of error creep in with interpolation:
• Interpolation error
  – Choice of evaluation points and cache resolution have a strong influence on error
• Extrapolation error
• Forced monotonicity
• Waveform simplification
  – For efficiency, choose fixed evaluation voltages and use vector CPU instructions

Results
We integrated IRICE (Intel's implementation of RICE) into our FPGA signoff timing engine in Quartus
• To generate test routes, we compiled a single large user design for the Stratix 10 device, resulting in routing with n=~1.3 million routing elements
• Each successive approximation (3.1 – 3.6) was statistically compared to the ground truth for both rising and falling delays
  – Ground truth delays were calculated using our custom SPICE simulator with a small step size (5ps)
  – We also compared against SPICE in the lower accuracy mode (50ps) that we have used in production in the past
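The statistical comparison against ground truth can be sketched as a bias (mean) and standard deviation of per-route percent error. The sign convention (approximate minus truth) and the use of a population standard deviation are assumptions, since the slides do not define them, and `error_stats` is a hypothetical name.

```python
import statistics


def error_stats(approx_delays, truth_delays):
    """Return (bias, standard deviation) of percent error versus ground truth."""
    errors = [
        100.0 * (approx - truth) / truth
        for approx, truth in zip(approx_delays, truth_delays)
    ]
    return statistics.mean(errors), statistics.pstdev(errors)


# Three toy delays (ps): per-route errors are +1%, -1%, +1%.
bias, spread = error_stats([101.0, 198.0, 404.0], [100.0, 200.0, 400.0])
```

A nonzero bias indicates a systematic shift that could in principle be calibrated out, while the standard deviation reflects route-to-route scatter.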

Accuracy – Rising Delays
[Figure: percent error per approximation step; vertical axis "Percent Error", -2.0% to 4.0%]

Approximation                           Bias     Standard Deviation
3.1 Split Simulations                    0.0%    0.1%
3.2 Simplify Input Waveforms             0.5%    0.6%
3.3 Simulate Load with RICE              0.7%    0.6%
3.4 Pi Model for Driver Load             0.7%    0.7%
3.5 4D Waveform Cache                    0.9%    1.5%
3.5 4D Waveform Cache (2x resolution)    0.9%    0.8%
3.6 2D Waveform Cache                    0.0%    3.6%
SPICE, 50ps Maximum Step                -0.4%    1.9%

Accuracy – Falling Delays
[Figure: percent error per approximation step; vertical axis "Percent Error", -2.0% to 4.0%]

Approximation                           Bias     Standard Deviation
3.1 Split Simulations                    0.0%    0.1%
3.2 Simplify Input Waveforms             0.6%    1.0%
3.3 Simulate Load with RICE              0.9%    1.0%
3.4 Pi Model for Driver Load             1.1%    1.1%
3.5 4D Waveform Cache                    0.1%    1.6%
3.5 4D Waveform Cache (2x resolution)    1.0%    1.2%
3.6 2D Waveform Cache                   -1.6%    3.6%
SPICE, 50ps Maximum Step                -0.9%    1.6%

Accuracy – Error Distribution (4D Cache with IRICE)
• Irregularity in distribution shape arises partly due to the summation of several distinct driver types into one distribution
• Worst case outliers (not shown):
  – -8.9%, +11.1% for rising delays
  – -9.0%, +15.9% for falling delays

Runtime Profile (4D Cache with IRICE)
More than 50% of runtime is spent in IRICE
• In particular, moment calculation followed by poles/residues calculation
• Outside IRICE, piecewise linear (PWL) waveform convolution has the highest runtime
When compared to SPICE, overall runtime is ~3x faster at a similar accuracy level

Subtask                          Share of runtime
RICE Build Circuit                9.3%
RICE Calculate Moments           36.7%
RICE Calculate Poles/Residues    18.0%
PWL Convolution                  10.7%
Least Squares Fit                 6.3%
4D Interpolation                  4.0%
4D Cache Initialization           4.6%
Other                            10.4%
