ASP-DAC'01 Olivier Coudert IV-1
Part IV: The Quest for Design Closure
Olivier Coudert, Monterey Design System
ASP-DAC 2001, tutorial 3
DSM Dilemma
- SOC: time to market, million gates, high density, larger die, higher clock speeds, long wires, project management, re-use, IPs, larger database, larger design space
  Abstraction: need abstraction levels to manage complexity
- DSM: higher resistance, higher cross-coupling, non-linear timing, power, electromigration, IR drop, inductances, etc.
  Accuracy: require detailed analyses to understand physical interactions
Logic Synthesis Flow
Behavioral spec. -> Behavioral synthesis -> RTL -> Logic synthesis -> Gate level netlist -> Layout
Behavioral spec. example:
    while (x<a) do
      x1 := x + dx;
      u1 := u - (3*x*u*dx) - (3*y*dx);
      y1 := y + (u*dx);
      x := x1; u := u1; y := y1;
    endwhile
RTL example:
    RC := ALU1(RX, a, comp);
    wait until clock AND RC;
    RX1 := ALU1(RX, RDX, ADD);
    RT1 := MULT1(RU, RX);
    RT2 := MULT2(3, RDX);
    wait until clock;
    RT3 := MULT1(RT1, RT2);
    RT4 := MULT2(RT2, RY);
Synthesis done with wire load models (WLM) and Elmore delay
Limits of Elmore Delay
[Figure: RC tree with resistors R1 to R4 and capacitors C1 to C4]
Elmore delay to C4:
$T = R_1 (C_1 + C_2 + C_3 + C_4) + R_2 (C_2 + C_3 + C_4) + R_4 C_4$
As R increases, Elmore delay becomes inaccurate, and cannot be trusted for guiding optimizations
Elmore delay to C4 is independent of R3!
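The independence from R3 can be checked numerically. The sketch below is illustrative only: it assumes a tree in which R1 and R2 are in series and R3 and R4 both branch off the node after R2 (a topology inferred from the slide's statement, not stated on it), and it uses made-up unit values. The names `elmore_delay`, `parent`, `R`, and `C` are hypothetical.

```python
def elmore_delay(parent, R, C, target):
    """Elmore delay from the driver to `target` in an RC tree.

    parent[n] is the upstream node of n (None if n is fed by the driver),
    R[n] is the resistance between parent[n] and n, C[n] is the cap at n.
    """
    def downstream_cap(node):
        # Capacitance at `node` plus everything below it.
        return C[node] + sum(downstream_cap(k) for k in parent if parent[k] == node)

    # Walk from the target back to the driver; each resistor on the path
    # is charged by all the capacitance downstream of it.
    delay = 0.0
    n = target
    while n is not None:
        delay += R[n] * downstream_cap(n)
        n = parent[n]
    return delay

# Assumed topology: driver -R1- n1 -R2- n2, then n2 -R3- n3 and n2 -R4- n4.
parent = {1: None, 2: 1, 3: 2, 4: 2}
R = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}   # hypothetical unit values
C = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}

t_small_r3 = elmore_delay(parent, R, C, 4)
R[3] = 100.0                            # make the side branch very resistive
t_large_r3 = elmore_delay(parent, R, C, 4)
```

Both calls return the same value: R3 is not on the path to C4, so the Elmore model cannot see that a large R3 shields C3 from the driver.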
Elmore Errors
[Figure: % error of the Elmore delay (max), 100 ps rise times]
1200 delay comparisons from a 0.35 µm CMOS µP
Wire Load Model
Based on number of pins of the nets
[Figure: number of k-pin nets vs. capacitance C]
Good prediction of average/median capacitance…
…But very large variance
Transistor vs. Wire Delays
[Figure: delay (psec, log scale from 1 to 1000) vs. technology generation (µ: 1.5, 1, 0.8, 0.6, 0.35, 0.25, 0.18, 0.1); curves: Transistor, Metal 2 (2mm)]
Timing & Interconnect
Wireload models were ALWAYS inaccurate
Post-synthesis signoff was possible when
interconnect contributed ~20% of the total capacitance
But now the interconnect capacitance is
becoming dominant over the total capacitance with each new process generation
Design Closure
- Placement/synthesis/routing interaction
- Congestion
- Timing Optimization
- Clock design
- Power design
- Signal integrity
- Design signoff
- Problem size & Computational resources
Timing in the pre-DSM flows
RTL to gate-level performance driven synthesis, then P&R
Flow: RTL -> instantiate architecture and perform technology mapping -> gate level netlist -> detailed placed netlist -> placed & routed netlist
Figure annotation: limited sizing
From gate level netlist to placed & routed netlist:
  interconnect information: #fanout -> accurate estimation -> detailed topology
  timing: good estimation -> accurate
  netlist flexibility: high -> low
Timing in a DSM flow
Timing is known after placement: synthesis and P&R cannot be independent
Flow: RTL -> instantiate architecture and perform technology mapping -> gate level netlist -> detailed placed netlist -> placed & routed netlist
Figure annotation: limited sizing
From gate level netlist to placed & routed netlist:
  interconnect information: #fanout -> accurate estimation -> detailed topology
  timing: large variance -> accurate
  netlist flexibility: high -> low
Timing in a DSM flow
Need enough P&R information, and enough netlist flexibility
Flow: RTL -> instantiate architecture and perform technology mapping -> gate level netlist -> detailed placed netlist -> placed & routed netlist
Figure annotation: limited sizing
From gate level netlist to placed & routed netlist:
  interconnect information: #fanout -> accurate estimation -> detailed topology
  timing: large variance -> good estimation -> accurate
  netlist flexibility: high -> medium -> low
Placement
- Quadratic placement
  fast
  restricted cost function, e.g., timing driven placement with net weighting
- Simulated annealing
  open cost function
  extremely slow
- Force directed
  semi-open cost function
  slower than quadratic placement
  tuning more difficult
- Bisection (mincut + partitioning)
  open cost function
  slower than quadratic placement
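To make "restricted cost function" and "net weighting" concrete, here is a minimal one-dimensional sketch of quadratic placement. It assumes two-pin nets only, fixed pads, and a plain Gauss-Seidel solver; `quadratic_place_1d` and all cell names and weights are hypothetical and are not taken from the tutorial.

```python
def quadratic_place_1d(nets, fixed, movable, sweeps=200):
    """Minimize sum of w * (x_a - x_b)**2 over two-pin nets (a, b, w).

    `fixed` maps pad names to positions; `movable` lists the free cells.
    Each sweep moves every cell to the weighted average of its neighbors,
    which is the Gauss-Seidel iteration for the optimality conditions.
    """
    x = dict(fixed)
    x.update({c: 0.0 for c in movable})
    for _ in range(sweeps):
        for c in movable:
            num = den = 0.0
            for a, b, w in nets:
                if a == c:
                    num += w * x[b]
                    den += w
                elif b == c:
                    num += w * x[a]
                    den += w
            if den > 0.0:
                x[c] = num / den
    return x

pads = {"p0": 0.0, "p1": 3.0}
uniform = [("p0", "a", 1.0), ("a", "b", 1.0), ("b", "p1", 1.0)]
weighted = [("p0", "a", 1.0), ("a", "b", 4.0), ("b", "p1", 1.0)]  # a-b assumed timing critical

x_uniform = quadratic_place_1d(uniform, pads, ["a", "b"])
x_weighted = quadratic_place_1d(weighted, pads, ["a", "b"])
```

With uniform weights the cells spread evenly between the pads. Raising the weight of the a-b net pulls the two cells together, which is the only way timing enters this cost function.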
Placement/Synthesis/Routing
[Figure: placement, synthesis, route; timing, congestion, area]
- Placement is needed to derive routing
- Routing is needed to derive timing
- Cell placement and net topology must be flexible to allow synthesis
Netlist Clustering
- Start placement by building a hierarchical tree of cell-clusters from the netlist (h-Metis DAC’97)
- A key to optimal placement is to optimize the size and locations of these clusters
- Both functional hierarchy and netlist topology need to be considered
[Figure: netlist grouped into clusters A, B, C, D, E, F]
Placement
- The clusters are sized and placed within bins and among megacells
- Minimize:
  wirelength
  intra-bin and inter-bin congestion
Placement
- This process continues to smaller clusters and smaller bins
- Long wires are probabilistically routed
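One common way to route a long two-pin wire "probabilistically" is to treat every shortest route across the bin grid as equally likely and accumulate, per bin, the probability that the wire crosses it. The sketch below shows that model only as an assumption; the slides do not say which model is used, and `route_probability` is a hypothetical name.

```python
from math import comb

def route_probability(w, h):
    """Per-bin crossing probability for a net from bin (0, 0) to bin (w-1, h-1).

    Assumes all monotone (shortest) routes over the w x h bin grid are
    equally likely. prob[y][x] is the chance the route uses bin (x, y).
    """
    total = comb(w + h - 2, w - 1)                 # number of shortest routes
    prob = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            to_bin = comb(x + y, x)                # routes from source to (x, y)
            from_bin = comb((w - 1 - x) + (h - 1 - y), w - 1 - x)  # (x, y) to sink
            prob[y][x] = to_bin * from_bin / total
    return prob

demand = route_probability(3, 3)
```

Summing such maps over all long nets gives an expected routing demand per bin, which can be compared with bin capacity before any real route exists.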
Placement
- Eventually one reaches a cluster and bin size for which timing and congestion are predictable (~ 1k to 10k gates per bin)
- Physical prototype:
  Timing optimization can start at this level ONLY
  Timing signoff can be done at this level ONLY
Design Closure
- Placement/synthesis/routing interaction
- Congestion
- Timing Optimization
- Clock design
- Power design
- Signal integrity
- Design signoff
- Problem size & Computational resources
Placement & Congestion
- Cells are nonuniformly distributed within bins
  Dynamic whitespace allocation addresses congestion at the global level
- Inter- and intra-bin congestion is predictable at the physical prototype level
Non-Uniform Whitespace Mgmt.
- Example of whitespace allocation after timing driven placement and optimization
[Figure: placement map with labels: white space added to relieve congestion; white space removed to help relieve congestion in other areas; movement of cells for timing optimization]
Congestion Management
- DSM creates a significant timing/congestion dependency
- Carefully manage congestion so that there are no surprises at the detailed routing (DR) stage!
- Wiring models and congestion estimates are strongly correlated from placement, through global routing (GR), to DR
Routing Correlation
- Global routing can utilize the whitespace to avoid long-distance couplings for critical nets
- Extra spacing, shielding, or space for rip-up and reroute
- No surprises for the detailed router after GR
- Advanced N-layer shape-based router
- Supports gridless and gridded routing
- Variable wire width for optimal delay constraints
- Cross-talk avoidance, antenna effects
- Clock tree sizing for tree balancing
- Power routing sizing for voltage drop and electromigration
Design Closure
- Placement/synthesis/routing interaction
- Congestion
- Timing Optimization
- Clock design
- Power design
- Signal integrity
- Design signoff
- Problem size & Computational resources
Timing Prediction
- As the routing models become more precise, so do the timing
predictions for the long wires
The timing/delay models and analyses are only as precise
as the physical information
Enforce correlation from front-end to back-end
Timing Optimization
- The first tech mapping was an approximation, since the wiring
capacitances were not known
- With sufficient physical information at the placement level, we
begin timing optimization
Placement/Synthesis/Routing
- Buffers are inserted for shielding, delay, and attenuation
- Global routing is used to place the buffers and inverters
- The design of long nets is “seeded” by buffers, driven by
accurate physical information
Incremental Synthesis Requirements
- “partial” placement and routing
- placement and routing must accommodate the incremental logic changes
- local transformation
- accurate delay estimation
  input slope and output capacitance dependent
  interconnect delay
  rising and falling signals
  crosstalk aware
- efficient incremental timing analysis
Incremental Synthesis
- Load and driver strength adaptation
  Sizing
  Buffering
  Pin swapping
  Cloning
- Timing boundary shifting
  Transparent latch
  Retiming
  Useful skew
- Technology remapping
- Re-synthesis
- Redundancy-based optimization
- Area recovery
Logic Optimization
- “Analytical” approaches
  Assume continuous size
  Fast
  Map a continuous solution onto a discrete library
  Use simple models (e.g., Elmore delay)
- “Refinement” approaches
  Can use complex and/or discrete models
  Can mix a wide range of transformations
  Slower
  Strategy/control more difficult
Resizing
[Figure: delay vs. load capacitance C for gate sizes s1, s2, s3, s4, with load breakpoints C1, C2, C3 and regions C < C1, C1 < C < C3, C > C3]
Load fitting is a rough resizing
Resizing
[Figure: delay and pin cap as functions of gate size s, for load C]
Resizing
[Figure: delay vs. gate size s for load C; the curve is non-convex]
- output load dependent
- rising/falling signal
- non-monotonic input pin cap
- input slope dependent
Logic Optimization
- Explore local solutions using a host of optimizations
  Buffering, sizing, remapping, etc...
- The selection criteria target “problem regions”:
  Timing critical regions
  Timing non-critical regions
  Congested regions
  …
[Figure: loop: Selection -> Exploration -> Application -> converge?]
Need to keep boundary conditions stable during the exploration phase
- The final objectives include:
  Fix slope violations
  Maximize global slack
  Congestion relief
  Area recovery
  …
Logic Optimization: Sizing
[Figure: loop: Filter timing paths -> Explore sizes -> Apply solution -> converge?]
- Each gate and a suitable window is extracted
- Explore different sizes for the gate and keep the best
- Always focused on the current critical paths
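The explore/apply/converge loop can be sketched on a single path. Everything in this example is an assumption made for illustration: the delay model (unit drive resistance divided by size, input cap equal to size, intrinsic delay of 1), the discrete library, and the names `path_delay` and `size_path`. The window here is just one gate with its neighbors held fixed.

```python
def path_delay(sizes, c_out, r_src=1.0):
    """Delay of a gate chain under a toy linear model (hypothetical units).

    The source drives the first gate's input cap; each gate of size s has
    intrinsic delay 1 and drive resistance 1/s, and presents input cap s.
    """
    total = r_src * sizes[0]
    for i, s in enumerate(sizes):
        load = sizes[i + 1] if i + 1 < len(sizes) else c_out
        total += 1.0 + load / s
    return total

def size_path(sizes, library, c_out):
    """Repeat: for each gate, try every library size, keep the best."""
    sizes = list(sizes)
    improved = True
    while improved:                          # the "converge?" test
        improved = False
        for i in range(len(sizes)):          # window: gate i, neighbors fixed
            current = path_delay(sizes, c_out)
            for s in library:                # explore sizes
                trial = sizes[:i] + [s] + sizes[i + 1:]
                trial_delay = path_delay(trial, c_out)
                if trial_delay < current - 1e-9:
                    sizes, current, improved = trial, trial_delay, True  # apply
    return sizes

library = [1, 2, 4, 8]
initial = [1, 1, 1]
final = size_path(initial, library, c_out=64.0)
```

Because each gate is resized with its neighbors frozen, the loop can stop at a local optimum, which is why the slides stress keeping boundary conditions stable and re-filtering the critical paths on every pass.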
Useful Skew
[Figure: useful skew example; labels: 0.3, 0.4, 0.4, +0.1, slacks S = −0.1, S = 0.0, S > 0.0, S > 0.1]
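A small worked example of the idea, using setup slack only (hold checks are ignored). The period, delays, and skew values are hypothetical and are not the ones in the slide's figure; `setup_slacks` is an illustrative name.

```python
def setup_slacks(period, delays, skews):
    """Setup slack of each stage in a linear register pipeline.

    delays[i] is the logic delay from register i to register i+1;
    skews[i] is the clock arrival offset at register i.
    """
    return [period + skews[i + 1] - skews[i] - d for i, d in enumerate(delays)]

period = 1.0
delays = [1.1, 0.8]                      # first stage is too slow

zero_skew = setup_slacks(period, delays, [0.0, 0.0, 0.0])
useful_skew = setup_slacks(period, delays, [0.0, 0.1, 0.0])
```

With zero skew the first stage has a slack of about -0.1 and the second about +0.2. Delaying the middle register's clock by 0.1 moves slack from the second stage to the first, giving roughly 0.0 and +0.1: the violation is fixed without touching the logic.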
Placement/Synthesis/Routing
- The flexibility of the placement and the continuous refinement allows logic optimization to continue throughout the flow
  Continual monitoring of “what is critical”
  From extensive to local logic optimization
[Figure: sequence: Global buffering & sizing -> Resynthesis/techmap -> Re-buffering & sizing -> Focused techmap & logic transformations -> Re-buffering & sizing]
Design Closure
- Placement/synthesis/routing interaction
- Congestion
- Timing Optimization
- Clock design
- Power design
- Signal integrity
- Design signoff
- Problem size & Computational resources
Clock Distribution
- Most clock tree synthesis algorithms attempt to build the clock tree post-placement
  This is too late: congestion could disturb timing closure
  But you can’t build it too early, since you don’t know where the latches are
[Figure: flow: Synthesis -> Floorplanning -> Placement -> Clock Tree Generation -> Clock Routing]
Clock Distribution
- The placement should provide enough information to know the
distribution of latches, but should be abstract enough to avoid being trapped by congestion caused by the clock wiring
Clock Distribution
- First clock tree is created at the physical prototyping level
A complete buffered/gated tree is automatically
synthesized
The user has the option to instantiate the top portions of a
tree based on the distribution of latches and flipflops
Clock Distribution
- The contribution of the clock tree to the congestion is taken
into account as early as it is meaningful
- The latch and flip-flop distribution will not change dramatically
after the physical prototype level
- The clock tree will be refined and adjusted as the placement and
optimization processes continue
Clock Distribution
- Accurate timing projections enable useful skew methods to be
applied at this level
- Placement is still coarse enough so that objects with common-skew
targets can be grouped
Design Closure
- Placement/synthesis/routing interaction
- Congestion
- Timing Optimization
- Clock design
- Power design
- Signal integrity
- Design signoff
- Problem size & Computational resources
Power/Ground Distribution
- The physical prototype also provides sufficient information to judge
the quality and integrity of the power/ground network
Power/ground network can have a huge impact on congestion
- Power rail currents will not change much as the placement is refined
- Yet there is enough space to add/widen stripes
API driven adjustment using incremental IR-drop analyses
Ultimately this optimization process can be automated
Power/Ground Distribution
Eventually the automation process will also have to consider more detailed analyses:
- Inductance of chip and packaging
- Resonance frequencies via AC analyses
- On-chip decoupling
Design Closure
- Placement/synthesis/routing interaction
- Congestion
- Timing Optimization
- Clock design
- Power design
- Signal integrity
- Design signoff
- Problem size & Computational resources
Crosstalk
[Figure: Coupling vs. inter-layer capacitance; ratio Cc/Cs (scale 2 to 8) for 1997, 2001, 2006, 2009, 2012]
Source: 1998 Update, International Technology Roadmap for Semiconductors
- The same-layer coupling capacitance is beginning to dominate the total net capacitance
- Makes cross-talk a dominant factor in achieving timing closure
Crosstalk - modeling
- Traditional model
  Replacing the coupling cap by a grounded cap with twice the value is inaccurate
[Figure: victim (VIC) net and aggressor (AGG) nets; labels: different directions, same directions, 2x cap]
Crosstalk - modeling
Aggressor switches at different times
- Delay of victim net depends on the switching time of the aggressor nets
- Dependency on arrival times of the signals
Crosstalk Delay Impact
- Simply modeling the coupling capacitance as grounded
capacitance scaled by ~2x is overly pessimistic
- Timer should model early and late arrival times at all nodes (for
each library) so that worst/best case switching can be determined during path traversal
TACO: Timing Analysis with Coupling (DAC 2000)
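Why early and late arrival times matter can be shown with a simplified switching-factor model. This is a textbook-style simplification written for illustration, not the TACO algorithm: the coupling cap is scaled by 0, 1, or 2 depending on whether the aggressor can switch while the victim does. The function name and the factor values are assumptions.

```python
def effective_cap(c_ground, c_couple, victim_window, aggressor_window, mode="late"):
    """Victim load with a switching-dependent factor on the coupling cap.

    Windows are (earliest, latest) arrival times. If they do not overlap,
    the aggressor is quiet while the victim switches (factor 1). If they
    overlap, late-mode analysis assumes opposite-direction switching
    (factor 2) and early-mode analysis assumes same-direction switching
    (factor 0).
    """
    v_lo, v_hi = victim_window
    a_lo, a_hi = aggressor_window
    overlap = v_lo <= a_hi and a_lo <= v_hi
    if not overlap:
        factor = 1.0
    elif mode == "late":
        factor = 2.0
    else:
        factor = 0.0
    return c_ground + factor * c_couple

quiet = effective_cap(1.0, 1.0, (0.0, 1.0), (2.0, 3.0))
worst = effective_cap(1.0, 1.0, (0.0, 1.0), (0.5, 1.5), mode="late")
best = effective_cap(1.0, 1.0, (0.0, 1.0), (0.5, 1.5), mode="early")
```

Applying the factor of 2 to every net regardless of its windows is the pessimistic shortcut; the windows let the timer reserve it for aggressors that can actually switch at the same time as the victim.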
Crosstalk Avoidance
- Neighboring-net switching can cause DR surprises
- Trying to solve this problem at DR is far too late!
- Passing constraints to DR to avoid routing certain nets in parallel is easy, but DR is already overconstrained!
- The right way is to attack the crosstalk problem starting at the proper placement level
Crosstalk Avoidance
- Histogram captures the combination of probabilistic routes, switching activities, and neighboring net probabilities for all nets which may pass through the edge
- Whitespace is now directed at regions of timing-congestion
- Overlay the congestion-edge models with switching activity windows for corresponding critical nets
[Figure: switching window densities over time]
IR drop
- Decrease in supply voltage at the gates
- Due to current flow through the power resistive network
- Effects of IR drop on circuit performance
[Figure: flip-flop (D, Q, clk) with waveforms for clk, in, and out, with and without IR drop; with IR drop the in signal is delayed and the out signal does not switch]
Traditional Approach
- Modeling of the power network
  extracted from the layout
  can be R only, RC
- Use verification tool at post-layout stage
  often done by simulation at transistor level
  very compute resource intensive and time consuming
  can cause unacceptable delay for tape-out
- Current approach
  over-design the power network
  wastes routing resource, impacts design closure
Better Approach
- Solve IR drop at the physical prototyping level
- Fast solution to be included in optimization
- Accurately predict effect on timing, congestion and
interconnect loading
Current source for all currents in the bin
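A fast bin-level estimate can be as simple as the sketch below, which lumps each bin's cells into one current source on a single power rail fed from a pad at one end. The rail topology, the segment resistance, and the name `rail_voltages` are assumptions for illustration; a real power grid is a two-dimensional mesh with several pads.

```python
def rail_voltages(vdd, seg_res, bin_currents):
    """Voltage seen by each bin along a rail fed from a pad at one end.

    seg_res is the rail resistance between neighboring bins (ohms) and
    bin_currents[k] is the total current drawn by the cells in bin k (A).
    """
    voltages = []
    v = vdd
    remaining = sum(bin_currents)      # current through the next rail segment
    for i_bin in bin_currents:
        v -= seg_res * remaining       # IR drop across this segment
        voltages.append(v)
        remaining -= i_bin             # this bin's current leaves the rail
    return voltages

# Hypothetical values: 1.8 V supply, 0.1 ohm per segment, 0.1 A per bin.
volts = rail_voltages(1.8, 0.1, [0.1, 0.1, 0.1])
drops = [1.8 - v for v in volts]
```

The drop grows toward the far end of the rail (about 30, 50, and 60 mV here), so the bins farthest from the pad are the ones whose timing should be derated or whose stripes should be widened.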
IR Drop
- IR Drop analysis
  Accurate extraction of power network
  Power network modeling using MRICE
  Switching activity of the circuit:
    VCD (Value Change Dump) file
    Simulate user input vectors
    Simulate with internal vector generators
- At the physical prototype stage:
  Current of cells within each bin will drive the power network
  Thermal color map is used to indicate the level of IR drop
  Power grid can be adjusted with GUI
Electromigration & Self Heating
- Metal interconnect disintegration due to high current density
  Can occur for power network and also signal nets
- Important DSM effect
  Higher current densities due to increased currents and finer wire widths/thicknesses
  Faster switching is increasing the di/dt’s
- Traditional Approach
  Over-design the power network
  Ignore the signal nets
  Post-layout verification
Electromigration & Self Heating
- Switching activity is used to determine the current
- During clock-tree synthesis, top level wires are automatically sized to satisfy E/M constraints
- Below 0.25 µm we expect similar constraints for signal nets
  Don’t wait until DR to determine layer assignments or find extra space for wide wires
  Appropriate routing resource assigned by global routing
  The wire sizes and layers should be modeled at the earliest possible placement level
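The link from switching activity to wire width can be sketched with two standard back-of-the-envelope formulas: average current as activity times C times Vdd times frequency, and minimum width as current divided by the allowed current density times the metal thickness. All numbers and names below are hypothetical; real E/M rules depend on the process, the layer, and the temperature.

```python
def average_current(activity, cap, vdd, freq):
    """Average charging current (A) of a net: activity * C * Vdd * f."""
    return activity * cap * vdd * freq

def min_wire_width(current, j_max, thickness):
    """Smallest width (um) keeping current density at or below j_max.

    current in A, j_max in A/um^2, thickness in um.
    """
    return current / (j_max * thickness)

# Hypothetical net: 50% activity, 2 pF load, 2 V supply, 1 GHz clock.
i_avg = average_current(0.5, 2e-12, 2.0, 1e9)
# Hypothetical limit of 0.01 A/um^2 on a 0.5 um thick layer.
width = min_wire_width(i_avg, 0.01, 0.5)
```

Running this estimate per net during global routing is one way to reserve wide-wire tracks early instead of discovering the need at DR.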
Inductance
- Inductance effect is becoming more important for designs over 500 MHz
- Both self- and mutual-inductance effects are important
  clocks should be analyzed for self-inductance effect
  bus signals should be analyzed for mutual-inductance effect
- Simulation is typically used to identify the effect of inductance
Design Closure
- Placement/synthesis/routing interaction
- Congestion
- Timing Optimization
- Clock design
- Power design
- Signal integrity
- Design signoff
- Problem size & Computational resources
Model refinement
- At the physical prototype level, timing becomes meaningful, and can be optimized
- Circuit refinement proceeds:
  smaller clusters and bins
  clock and power routing is refined
  logic optimization continues on a more local scale
  target timing, congestion, area, …
  …eventually power and signal integrity
Design Closure
[Figure: flow from System, RTL, and Synthesis, through repeated Place, Route, Timing, and Logic opt. steps, to Extraction & delay calculation and Final static timing analysis; axes: time, model accuracy, and transformation scale (global/estimate to local/accurate)]
Continuity and correlation are keys!
DSM Design Signoff
[Figure: flow stages: RTL, floorplan, Synthesis + opt., Place, Clock, Route, Timing, Delay Calculation, Static Timing Analysis, Physical implementation; annotations: "No physical information at that level" and "Timing, congestion, clock, etc, predictable at that level"]
Design signoff can only be done when DSM timing & congestion can be properly estimated: at the physical prototype level
Conclusion
A one-pass gate-level to GDSII design closure solution has to:
- simultaneously and continuously monitor and optimize all design variables:
  timing, area, congestion, clock tree, scan chain, power, ...
- cover all DSM physical effects:
  cross talk, electromigration, IR drop, ...
A long term solution should:
- be scalable
- handle block based and hierarchical design
- deliver a hierarchical design signoff
Perspectives
- Synthesis at the gate level is now also about placement and routing
- The future?
  (1) Fast behavioral synthesis + floorplanning
  (2) Fast RTL to gate synthesis
  (3) Physical prototyping signoff
  (4) Physical implementation
- New problems:
  Placement/synthesis/routing interaction
  Physical synthesis (includes placement)
  Hierarchy