The Quest for Design Closure


SLIDE 1

ASP-DAC'01 Olivier Coudert IV-1

Part IV: The Quest for Design Closure

ASP-DAC 2001, tutorial 3

Olivier Coudert

Monterey Design System

SLIDE 2

DSM Dilemma

SOC (system-on-chip) calls for abstraction:

  • Time to market
  • Million gates
  • High density, larger die
  • Higher clock speeds
  • Long wires
  • Project management
  • Re-use, IPs
  • Larger database
  • Larger design space

Need abstraction levels to manage complexity.

DSM (deep submicron) calls for accuracy:

  • Higher resistance
  • Higher cross-coupling
  • Non-linear timing
  • Power
  • Electromigration
  • IR drop
  • Inductances
  • etc.

Require detailed analyses to understand physical interactions.

SLIDE 3

Logic Synthesis Flow

Behavioral spec. → (behavioral synthesis) → RTL → (logic synthesis) → Gate level netlist → Layout

Behavioral spec.:

    while (x < a) do
      x1 := x + dx;
      u1 := u - (3*x*u*dx) - (3*y*dx);
      y1 := y + (u*dx);
      x := x1; u := u1; y := y1;
    endwhile

RTL:

    RC := ALU1(RX, a, comp);
    wait until clock AND RC;
    RX1 := ALU1(RX, RDX, ADD);
    RT1 := MULT1(RU, RX);
    RT2 := MULT2(3, RDX);
    wait until clock;
    RT3 := MULT1(RT1, RT2);
    RT4 := MULT2(RT2, RY);

Synthesis done with WLM (wire load model) and Elmore delay

SLIDE 4

Limits of Elmore Delay

(Figure: RC tree with resistors R1, R2, R3, R4 and capacitors C1, C2, C3, C4.)

Elmore delay to C4:

$$T = R_1 (C_1 + C_2 + C_3 + C_4) + R_2 (C_2 + C_3 + C_4) + R_4 C_4$$

As R increases, Elmore delay becomes inaccurate, and cannot be trusted for guiding optimizations

Elmore delay to C4 is independent of R3!
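
A minimal Python sketch of the Elmore computation for this slide. The topology (R1 and R2 in series from the driver, then one branch through R3 to C3 and one through R4 to C4) is inferred from the formula, and all component values are made up for illustration.

```python
def elmore_delay(tree, sink):
    """Elmore delay from the driver to `sink`.

    `tree` maps node -> (parent, edge_resistance, node_capacitance);
    the node attached to the driver has parent None.
    """
    children = {}
    for node, (parent, _, _) in tree.items():
        children.setdefault(parent, []).append(node)

    def downstream_cap(node):
        # Capacitance of the node plus everything below it in the tree.
        return tree[node][2] + sum(downstream_cap(c) for c in children.get(node, []))

    delay = 0.0
    node = sink
    while node is not None:
        parent, resistance, _ = tree[node]
        # Each resistor on the path charges all the capacitance downstream of it.
        delay += resistance * downstream_cap(node)
        node = parent
    return delay


def slide_tree(r3):
    # Hypothetical values: resistances in kOhm, capacitances in pF (delay in ns).
    return {
        "n1": (None, 1.0, 1.0),  # R1, C1
        "n2": ("n1", 2.0, 2.0),  # R2, C2
        "n3": ("n2", r3, 3.0),   # R3, C3
        "n4": ("n2", 3.0, 4.0),  # R4, C4
    }


print(elmore_delay(slide_tree(r3=5.0), "n4"))    # 40.0
print(elmore_delay(slide_tree(r3=500.0), "n4"))  # 40.0, R3 has no effect
```

R3 is not on the path from the driver to C4, so it never enters the sum, whatever its value. This is the blind spot the slide points out.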

SLIDE 5

Elmore Errors

(Figure: % error (max) of Elmore delay, 100 ps rise times; axis ticks run from 20 to 160 and from 100 to 700.)

1200 delay comparisons from a 0.35 µm CMOS µP

SLIDE 6

Wire Load Model

Based on number of pins of the nets

(Figure: number of k-pin nets versus capacitance C.)

Good prediction of average/median capacitance, but very large variance
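
As a rough illustration, a wire load model can be sketched as a table lookup keyed by pin count. The table entries and the extrapolation slope below are hypothetical numbers, not taken from any library.

```python
# Hypothetical wire load table: pin count of the net -> wire capacitance (pF).
WIRE_LOAD_TABLE = {2: 0.010, 3: 0.018, 4: 0.027, 5: 0.035}
SLOPE_PER_EXTRA_PIN = 0.009  # assumed extrapolation beyond the last entry (pF/pin)


def wlm_capacitance(num_pins):
    """Estimated wire capacitance of a net, from its pin count only."""
    if num_pins < 2:
        raise ValueError("a net needs at least two pins")
    if num_pins in WIRE_LOAD_TABLE:
        return WIRE_LOAD_TABLE[num_pins]
    last = max(WIRE_LOAD_TABLE)
    return WIRE_LOAD_TABLE[last] + (num_pins - last) * SLOPE_PER_EXTRA_PIN


print(wlm_capacitance(3))
```

Every 3-pin net gets the same estimate, whether its pins end up adjacent or on opposite sides of the die. That is where the large variance comes from.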

SLIDE 7

Transistor vs. Wire Delays

(Figure: delay (psec, log scale from 1 to 1000) versus technology generation (µ): 1.5, 1, 0.8, 0.6, 0.35, 0.25, 0.18, 0.1. Two curves: Transistor and Metal 2 (2mm).)

SLIDE 8

Timing & Interconnect

Wireload models were ALWAYS inaccurate

Post-synthesis signoff was possible when interconnect contributed ~20% of the total capacitance

But now, with each new process generation, the interconnect capacitance is becoming the dominant part of the total capacitance

SLIDE 9

Design Closure

  • Placement/synthesis/routing interaction
  • Congestion
  • Timing Optimization
  • Clock design
  • Power design
  • Signal integrity
  • Design signoff
  • Problem size & Computational resources

SLIDE 10

Timing in the pre-DSM flows

RTL to gate-level performance driven synthesis, then P&R

Flow: RTL → Gate level netlist → Detailed placed netlist → Placed & routed netlist

  • Instantiate architecture and perform technology mapping (RTL to gate level netlist)
  • Limited sizing

From gate level netlist to placed & routed netlist:

  • Interconnect information: #fanout → accurate estimation → detailed topology
  • Timing: good estimation → accurate
  • Netlist flexibility: high → low

SLIDE 11

Timing in a DSM flow

Timing is known after placement: synthesis and P&R cannot be independent

Flow: RTL → Gate level netlist → Detailed placed netlist → Placed & routed netlist

  • Instantiate architecture and perform technology mapping (RTL to gate level netlist)
  • Limited sizing

From gate level netlist to placed & routed netlist:

  • Interconnect information: #fanout → accurate estimation → detailed topology
  • Timing: large variance → accurate
  • Netlist flexibility: high → low

SLIDE 12

Timing in a DSM flow

Flow: RTL → Gate level netlist → Detailed placed netlist → Placed & routed netlist

  • Instantiate architecture and perform technology mapping (RTL to gate level netlist)
  • Limited sizing

From gate level netlist to placed & routed netlist:

  • Interconnect information: #fanout → accurate estimation → detailed topology
  • Timing: large variance → good estimation → accurate
  • Netlist flexibility: high → medium → low

Need enough P&R information, and enough netlist flexibility

SLIDE 13

Placement

  • Quadratic placement
      fast
      restricted cost function, e.g., timing driven placement with net weighting
  • Simulated annealing
      open cost function
      extremely slow
  • Force directed
      semi-open cost function
      slower than quadratic placement
      tuning more difficult
  • Bisection (mincut + partitioning)
      open cost function
      slower than quadratic placement
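
To make "restricted cost function" and "net weighting" concrete, here is a small one-dimensional sketch of quadratic placement. It minimizes the weighted sum of squared distances over two-pin connections using Gauss-Seidel sweeps. The solver choice, the cell names, and the coordinates are assumptions for illustration; production placers work in two dimensions with sparse linear solvers.

```python
def quadratic_place_1d(movable, fixed, nets, iterations=200):
    """Minimize sum(w * (x_a - x_b)**2) over two-pin nets (a, b, w).

    `fixed` maps pad names to coordinates. Each Gauss-Seidel step moves a cell
    to the weighted average of its neighbors, which is the minimizer of the
    quadratic cost for that cell. Assumes every movable cell has a connection.
    """
    x = dict(fixed)
    for cell in movable:
        x[cell] = 0.0
    for _ in range(iterations):
        for cell in movable:
            num = den = 0.0
            for a, b, w in nets:
                if a == cell:
                    num += w * x[b]
                    den += w
                elif b == cell:
                    num += w * x[a]
                    den += w
            x[cell] = num / den
    return {cell: x[cell] for cell in movable}


pads = {"left": 0.0, "right": 10.0}
uniform = [("left", "u", 1.0), ("u", "v", 1.0), ("v", "right", 1.0)]
# Net weighting: the u-v connection is treated as timing critical (weight 4).
weighted = [("left", "u", 1.0), ("u", "v", 4.0), ("v", "right", 1.0)]

print(quadratic_place_1d(["u", "v"], pads, uniform))
print(quadratic_place_1d(["u", "v"], pads, weighted))
```

Raising the weight of the u-v net pulls the two cells closer together. This is the only handle timing has on the result, since the cost function itself stays a sum of squared distances.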

SLIDE 14

Placement/Synthesis/Routing

(Figure: diagram relating placement, synthesis, and route to timing, congestion, and area.)

Placement is needed to derive routing

Routing is needed to derive timing

Cell placement and net topology must be flexible to allow synthesis

SLIDE 15

Netlist Clustering

  • Start placement by building a hierarchical tree of cell-clusters from the netlist (h-Metis DAC’97)
  • A key to optimal placement is to optimize the size and locations of these clusters
  • Both functional hierarchy and netlist topology need to be considered

(Figure: netlist grouped into clusters A, B, C, D, E, F.)

SLIDE 16

Placement

  • The clusters are sized and placed within bins and among megacells
  • Minimize:
      wirelength
      intra-bin and inter-bin congestion

SLIDE 17

Placement

  • This process continues to smaller clusters and smaller bins
  • Long wires are probabilistically routed

SLIDE 18

Placement

  • Eventually one reaches a cluster and bin size for which timing and congestion are predictable (~1k to 10k gates per bin)
  • Physical prototype:
      Timing optimization can start ONLY at this level
      Timing signoff can be done ONLY at this level

SLIDE 19

Design Closure

  • Placement/synthesis/routing interaction
  • Congestion
  • Timing Optimization
  • Clock design
  • Power design
  • Signal integrity
  • Design signoff
  • Problem size & Computational resources

SLIDE 20

Placement & Congestion

  • Cells are nonuniformly distributed within bins
      Dynamic whitespace allocation addresses congestion at the global level
  • Inter- and intra-bin congestion is predictable at the physical prototype level

SLIDE 21

Non-Uniform Whitespace Mgmt.

  • Example of whitespace allocation after timing driven placement and optimization

(Figure annotations: white space added to relieve congestion; white space removed to help relieve congestion in other areas; movement of cells for timing optimization.)

SLIDE 22

Congestion Management

  • DSM creates a significant timing/congestion dependency
  • Carefully manage congestion so that there are no surprises at the DR (detailed routing) stage!
  • Wiring models and congestion estimates are strongly correlated from placement, through GR (global routing), to DR

SLIDE 23

Routing Correlation

Global routing can utilize the whitespace to avoid long-distance couplings for critical nets
      Extra spacing, shielding, or space for rip-up and reroute

No surprises for the detailed router after GR
      Advanced N-layer shape-based router
      Supports gridless and gridded routing
      Variable wire width for optimal delay constraints
      Cross-talk avoidance, antenna effects
      Clock tree sizing for tree balancing
      Power routing sizing for voltage drop and electromigration

SLIDE 24

Design Closure

  • Placement/synthesis/routing interaction
  • Congestion
  • Timing Optimization
  • Clock design
  • Power design
  • Signal integrity
  • Design signoff
  • Problem size & Computational resources

SLIDE 25

Timing Prediction

  • As the routing models become more precise, so do the timing predictions for the long wires
      The timing/delay models and analyses are only as precise as the physical information
      Enforce correlation from front-end to back-end

SLIDE 26

Timing Optimization

  • The first tech mapping was an approximation, since the wiring capacitances were not known
  • With sufficient physical information at the placement level, we begin timing optimization

SLIDE 27

Placement/Synthesis/Routing

  • Buffers are inserted for shielding, delay, and attenuation
  • Global routing is used to place the buffers and inverters
  • The design of long nets is “seeded” by buffers, driven by accurate physical information

SLIDE 28

Incremental Synthesis Requirements

  • “Partial” placement and routing
  • Placement and routing must accommodate the incremental logic changes
  • Local transformation
  • Accurate delay estimation
      input slope and output capacitance dependent
      interconnect delay
      rising and falling signals
      crosstalk aware
  • Efficient incremental timing analysis

SLIDE 29

Incremental Synthesis

  • Load and driver strength adaptation
      Sizing
      Buffering
      Pin swapping
      Cloning
  • Timing boundary shifting
      Transparent latch
      Retiming
      Useful skew
  • Technology remapping
  • Re-synthesis
  • Redundancy-based optimization
  • Area recovery

SLIDE 30

Logic Optimization

“Analytical” approaches
      Assume continuous size
      Fast
      Map a continuous solution onto a discrete library
      Use simple models (e.g., Elmore delay)

“Refinement” approaches
      Can use complex and/or discrete models
      Can mix a wide range of transformations
      Slower
      Strategy/control more difficult

SLIDE 31

Resizing

(Figure: delay versus load capacitance C for gate sizes s1, s2, s3, s4, with load thresholds C1, C2, C3 delimiting the ranges C < C1, C1 < C < C3, and C > C3.)

Load fitting is a rough resizing
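
Load fitting can be sketched as a threshold lookup: the size is chosen from the load alone. The four sizes and three thresholds below are hypothetical and only loosely mirror the figure's labels.

```python
from bisect import bisect_right

# Hypothetical library: load thresholds C1 < C2 < C3 (pF) separate four sizes.
LOAD_THRESHOLDS = [0.05, 0.10, 0.20]
SIZES = ["s1", "s2", "s3", "s4"]


def fit_size(load):
    """Pick the gate size whose load range contains `load`."""
    return SIZES[bisect_right(LOAD_THRESHOLDS, load)]


for load in (0.03, 0.07, 0.15, 0.50):
    print(load, fit_size(load))
```

The lookup ignores input slope, the driver of the resized gate, and the change in input pin capacitance, which is why it is only a rough resizing.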

SLIDE 32

Resizing

(Figure: delay versus load C and size s; pin cap.)

SLIDE 33

Resizing

  • output load dependent
  • rising/falling signal
  • non-monotonic input pin cap
  • input slope dependent

(Figure: delay versus load C and size s; the delay is non-convex.)

SLIDE 34

Logic Optimization

(Flow: Selection → Exploration → Application → converge?)

  • The selection criteria target “problem regions”:
      Timing critical regions
      Timing non-critical regions
      Congested regions
      …
  • Explore local solutions using a host of optimizations
      Buffering, sizing, remapping, etc.
      Need to keep boundary conditions stable during the exploration phase
  • The final objectives include:
      Fix slope violations
      Maximize global slack
      Congestion relief
      Area recovery
      …

SLIDE 35

Logic Optimization: Sizing

(Flow: Filter timing paths → Explore sizes → Apply solution → converge?)

  • Each gate is extracted together with a suitable window
  • Explore different sizes for the gate and keep the best
  • Always focused on the current critical paths
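
The explore/apply/converge loop can be sketched on a toy example: a single chain of gates with a discrete size library. The delay model (a fixed intrinsic delay plus load divided by size), the library, and all constants are assumptions. For simplicity the "window" here is the whole path rather than a local neighborhood.

```python
LIBRARY_SIZES = [1, 2, 4, 8]  # hypothetical discrete drive strengths
UNIT_CIN = 1.0                # input capacitance of a size-1 gate (arbitrary units)
INTRINSIC = 1.0               # size-independent part of the gate delay


def path_delay(sizes, final_load, driver_res=1.0):
    """Delay of a gate chain under a toy model: intrinsic + load / size."""
    # A fixed driver sees the input capacitance of the first gate.
    delay = driver_res * UNIT_CIN * sizes[0]
    for i, size in enumerate(sizes):
        load = UNIT_CIN * sizes[i + 1] if i + 1 < len(sizes) else final_load
        delay += INTRINSIC + load / size
    return delay


def greedy_sizing(sizes, final_load):
    """Explore sizes gate by gate, apply improvements, stop when none is left."""
    sizes = list(sizes)
    best = path_delay(sizes, final_load)
    improved = True
    while improved:
        improved = False
        for i in range(len(sizes)):
            for candidate in LIBRARY_SIZES:
                trial = sizes[:i] + [candidate] + sizes[i + 1:]
                delay = path_delay(trial, final_load)
                if delay < best - 1e-12:  # apply only strict improvements
                    sizes, best, improved = trial, delay, True
    return sizes, best


print(path_delay([1, 1, 1], 64.0))     # 70.0
print(greedy_sizing([1, 1, 1], 64.0))
```

Upsizing a gate speeds up its own stage but loads the previous one, so each change is judged on the delay of the whole window, and the loop repeats until no single change helps.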

SLIDE 36

Useful Skew

(Figure: useful skew example; labels: S = −0.1, S > 0.1, 0.3, 0.4, 0.4, S = 0.0, S > 0.0, +0.1.)
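
A small worked example of useful skew, with hypothetical numbers that are not taken from the figure. The setup slack of a register-to-register path is the clock period, plus the clock offset at the capturing register, minus the offset at the launching register, minus the path delay. Hold checks are ignored in this sketch.

```python
def setup_slacks(period, paths, clock_offset):
    """Setup slack per path; `paths` maps (launch, capture) -> delay."""
    return {
        (src, dst): period + clock_offset[dst] - clock_offset[src] - delay
        for (src, dst), delay in paths.items()
    }


# Times in picoseconds. Registers A -> B -> C with unbalanced stage delays.
PERIOD = 400
PATHS = {("A", "B"): 500, ("B", "C"): 300}

zero_skew = setup_slacks(PERIOD, PATHS, {"A": 0, "B": 0, "C": 0})
useful_skew = setup_slacks(PERIOD, PATHS, {"A": 0, "B": 100, "C": 0})

print(zero_skew)    # {('A', 'B'): -100, ('B', 'C'): 100}
print(useful_skew)  # {('A', 'B'): 0, ('B', 'C'): 0}
```

Delaying the clock at B by 100 ps moves the spare time of the fast stage to the slow one, so both paths meet timing without any logic change.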

SLIDE 37

Placement/Synthesis/Routing

  • The flexibility of the placement and the continuous refinement allow logic optimization to continue throughout the flow
      Continual monitoring of “what is critical”
      From extensive to local logic optimization

(Figure: successive steps: Global buffering & sizing; Resynthesis/techmap; Re-buffering & sizing; Focused techmap & logic transformations; Re-buffering & sizing.)

SLIDE 38

Design Closure

  • Placement/synthesis/routing interaction
  • Congestion
  • Timing Optimization
  • Clock design
  • Power design
  • Signal integrity
  • Design signoff
  • Problem size & Computational resources

SLIDE 39

Clock Distribution

  • Most clock tree synthesis algorithms attempt to build the clock tree post-placement
      This is too late: congestion could disturb timing closure
      But you can’t build it too early, since you don’t know where the latches are

(Figure: flow: Synthesis → Floorplanning → Placement → Clock Tree Generation → Clock Routing.)

SLIDE 40

Clock Distribution

  • The placement should provide enough information to know the distribution of latches, but should be abstract enough to avoid being trapped by congestion caused by the clock wiring

SLIDE 41

Clock Distribution

  • First clock tree is created at the physical prototyping level
      A complete buffered/gated tree is automatically synthesized
      The user has the option to instantiate the top portions of a tree based on the distribution of latches and flipflops

SLIDE 42

Clock Distribution

  • The contribution of the clock tree to the congestion is taken into account as early as it is meaningful
  • The latch and flip-flop distribution will not change dramatically after the physical prototype level
  • The clock tree will be refined and adjusted as the placement and optimization processes continue

SLIDE 43

Clock Distribution

  • Accurate timing projections enable useful skew methods to be applied at this level
  • Placement is still coarse enough so that objects with common-skew targets can be grouped

SLIDE 44

Design Closure

  • Placement/synthesis/routing interaction
  • Congestion
  • Timing Optimization
  • Clock design
  • Power design
  • Signal integrity
  • Design signoff
  • Problem size & Computational resources

SLIDE 45

Power/Ground Distribution

  • The physical prototype also provides sufficient information to judge the quality and integrity of the power/ground network
      Power/ground network can have a huge impact on congestion
  • Power rail currents will not change much as the placement is refined
  • Yet there is enough space to add/widen stripes
      API driven adjustment using incremental IR-drop analyses
      Ultimately this optimization process can be automated

SLIDE 46

Power/Ground Distribution

Eventually the automation process will have to consider more detailed analyses too:
      Inductance of chip and packaging
      Resonance frequencies via AC analyses
      On-chip decoupling

SLIDE 47

Design Closure

  • Placement/synthesis/routing interaction
  • Congestion
  • Timing Optimization
  • Clock design
  • Power design
  • Signal integrity
  • Design signoff
  • Problem size & Computational resources

SLIDE 48

Crosstalk

(Figure: coupling vs. inter-layer capacitance, Cc/Cs on a scale from 2 to 8, for the years 1997, 2001, 2006, 2009, 2012. Source: 1998 Update, International Technology Roadmap for Semiconductors.)

The same-layer coupling capacitance is beginning to dominate the total net capacitance
      Makes cross-talk a dominant factor in achieving timing closure

SLIDE 49

Crosstalk - modeling

  • Traditional model
      Replacing the coupling capacitance by a grounded cap with twice its value is inaccurate

(Figure: victim net (VIC) between two aggressor nets (AGG), switching in different directions versus the same direction; 2x cap.)

SLIDE 50

Crosstalk - modeling

(Figure: aggressor switches at different times.)

Delay of victim net depends on the switching time of the aggressor nets

  • Dependency on arrival times of the signals

SLIDE 51

Crosstalk Delay Impact

  • Simply modeling the coupling capacitance as grounded capacitance scaled by ~2x is overly pessimistic
  • Timer should model early and late arrival times at all nodes (for each library) so that worst/best case switching can be determined during path traversal

TACO: Timing Analysis with Coupling (DAC 2000)
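
The role of early/late arrival times can be illustrated with switching windows. The sketch below is a simplified model, not the TACO algorithm. It assumes a coupling cap counts 1x when the aggressor cannot switch while the victim does, and 2x (worst case) or 0x (best case) when the windows overlap. The factors and all numbers are assumptions.

```python
def windows_overlap(w1, w2):
    """True if the [early, late] switching windows w1 and w2 intersect."""
    return w1[0] <= w2[1] and w2[0] <= w1[1]


def coupling_factor(victim_window, aggressor_window, worst_case=True):
    # Aggressor quiet during the victim transition: plain grounded cap (1x).
    if not windows_overlap(victim_window, aggressor_window):
        return 1.0
    # Aggressor may switch with the victim: opposite direction (2x) for the
    # late analysis, same direction (0x) for the early analysis.
    return 2.0 if worst_case else 0.0


def effective_cap(c_ground, couplings, victim_window, worst_case=True):
    """Grounded cap plus each coupling cap scaled by its window-based factor."""
    return c_ground + sum(
        cc * coupling_factor(victim_window, window, worst_case)
        for cc, window in couplings
    )


# Hypothetical net: 10 fF to ground, two 5 fF couplings with aggressor windows (ns).
couplings = [(5.0, (0.0, 2.0)), (5.0, (8.0, 9.0))]
victim = (1.0, 3.0)

print(effective_cap(10.0, couplings, victim, worst_case=True))   # 25.0
print(effective_cap(10.0, couplings, victim, worst_case=False))  # 15.0
```

A blanket 2x on both couplings would give 30 fF here. Knowing that the second aggressor switches long after the victim brings the worst case down to 25 fF.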

SLIDE 52

Crosstalk Avoidance

Neighboring-net switching can cause DR surprises
      Trying to solve this problem at DR is far too late!

Passing constraints to DR to avoid routing certain nets in parallel is easy, but DR is already overconstrained!

The right way is to attack the xtalk problem starting at the proper placement level

SLIDE 53

Crosstalk Avoidance

  • Histogram captures the combination of probabilistic routes, switching activities, and neighboring net probabilities for all nets which may pass through the edge
  • Whitespace is now directed at regions of timing-congestion
  • Overlay the congestion-edge models with switching activity windows for corresponding critical nets

(Figure: switching window densities over time.)

SLIDE 54

IR drop

  • Decrease in supply voltage at the gates
  • Due to current flow through the power resistive network
  • Effects of IR drop on circuit performance

(Figure: D flip-flop with in, clk, and out signals; waveforms without IR drop and with IR drop. With IR drop, the in signal is delayed and the out signal does not switch.)

SLIDE 55

Traditional Approach

  • Modeling of the power network extracted from the layout
      can be R only, RC
  • Use verification tool at post-layout stage
      often done by simulation at transistor level
      very compute resource intensive and time consuming
      can cause unacceptable delay for tape-out
  • Current approach
      over-design the power network
      wastes routing resource, impacts design closure

SLIDE 56

Better Approach

  • Solve IR drop at the physical prototyping level
  • Fast solution to be included in optimization
  • Accurately predict effect on timing, congestion and interconnect loading

(Figure: current source for all currents in the bin.)
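
A minimal sketch of the bin-level idea, assuming a single power stripe fed from one end, one lumped current source per bin, and purely resistive (DC) segments. The supply voltage, resistances, and currents are hypothetical.

```python
def rail_voltages(vdd, segment_res, bin_currents):
    """Voltage seen by each bin along a stripe fed from one end.

    segment_res[k] is the resistance of the segment that feeds bin k from the
    previous bin (or from the pad for k == 0); bin_currents[k] is the total
    current drawn by the cells of bin k.
    """
    voltages = []
    voltage = vdd
    downstream = sum(bin_currents)  # current flowing through the first segment
    for resistance, current in zip(segment_res, bin_currents):
        voltage -= resistance * downstream  # IR drop across this segment
        voltages.append(voltage)
        downstream -= current               # this bin's current leaves the rail
    return voltages


# Hypothetical stripe: 1.8 V supply, three 0.1 ohm segments, currents in amperes.
print(rail_voltages(1.8, [0.1, 0.1, 0.1], [0.2, 0.3, 0.5]))
```

The bin farthest from the pad sees the largest drop, because every upstream segment carries its current. Widening a stripe lowers its segment resistances, which is the kind of adjustment that can be explored incrementally at this level.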

SLIDE 57

IR Drop

  • IR Drop analysis
      Accurate extraction of power network
      Power network modeling using MRICE
      Switching activity of the circuit
          VCD file (Value Change Dump)
          Simulate user input vectors
          Simulate with internal vector generators
  • At the physical prototype stage:
      Current of cells within each bin will drive the power network
      Thermal color map is used to indicate the level of IR drop
      Power grid can be adjusted with GUI

SLIDE 58

Electromigration & Self Heating

  • Metal interconnect disintegration due to high current density
      Can occur for power network and also signal nets
  • Important DSM effect
      Higher current densities due to increased currents and finer wire widths/thicknesses
      Faster switching is increasing the di/dt’s
  • Traditional Approach
      Over-design the power network
      Ignore the signal nets
      Post-layout verification

SLIDE 59

Electromigration & Self Heating

  • Switching activity is used to determine the current
  • During clock-tree synthesis, top level wires are automatically sized to satisfy E/M constraints
  • Below 0.25 µm we expect similar constraints for signal nets
      Don’t wait until DR to determine layer assignments or find extra space for wide wires
      Appropriate routing resource assigned by global routing
      The wire sizes and layers should be modeled at the earliest possible placement level
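
Sizing a wire for an E/M constraint can be sketched as follows, assuming the constraint is expressed as a maximum current per unit of wire width on a given layer. The limit, the currents, and the manufacturing grid are hypothetical. Integer units (µA, nm) keep the rounding exact.

```python
def min_em_width_nm(current_ua, limit_ua_per_um, grid_nm=10):
    """Smallest on-grid width (nm) keeping the current per µm of width within the limit."""
    # Ceiling division: width in nm = current / (limit per µm) * 1000.
    raw_nm = -(-current_ua * 1000 // limit_ua_per_um)
    # Round up to the next multiple of the manufacturing grid.
    return -(-raw_nm // grid_nm) * grid_nm


# Hypothetical layer limit: 1000 µA per µm of width.
print(min_em_width_nm(2500, 1000))  # 2500
print(min_em_width_nm(2503, 1000))  # 2510
```

The resulting width has to be reserved as routing resource, which is why the slide asks for wire sizes and layers to be modeled early rather than discovered at DR.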

SLIDE 60

Inductance

  • Inductance effect is becoming more important for designs over 500 MHz
  • Both self- and mutual-inductance effects are important
      clocks should be analyzed for self-inductance effect
      bus signals should be analyzed for mutual-inductance effect
  • Simulation is typically used to identify the effect of inductance

SLIDE 61

Design Closure

  • Placement/synthesis/routing interaction
  • Congestion
  • Timing Optimization
  • Clock design
  • Power design
  • Signal integrity
  • Design signoff
  • Problem size & Computational resources

SLIDE 62

Model refinement

  • At the physical prototype level, timing becomes meaningful, and can be optimized
  • Circuit refinement proceeds:
      smaller clusters and bins
      clock and power routing is refined
      logic optimization continues on a more local scale
      target timing, congestion, area, …
      …eventually power and signal integrity

SLIDE 63

Design Closure

(Figure: flow from System and RTL through Synthesis, Place, Logic opt., Timing, and Route to Extraction & delay calculation and Final static timing analysis. Axes: model accuracy versus time; transformation scale from global/estimate to local/accurate.)

Continuity and correlation are keys!

SLIDE 64

DSM Design Signoff

(Figure: flow: RTL; Synthesis + opt.; floorplan; Place; Clock; Route; Timing; Delay Calculation; Static Timing Analysis.)

  • RTL: no physical information at that level
  • Physical prototype: timing, congestion, clock, etc., predictable at that level
  • Physical implementation

Design signoff can only be done when DSM timing & congestion can be properly estimated: at the physical prototype level

SLIDE 65

Conclusion

A one-pass gate-level to GDSII design closure solution has to:
      simultaneously and continuously monitor and optimize all design variables: timing, area, congestion, clock tree, scan chain, power, ...
      cover all DSM physical effects: cross talk, electromigration, IR drop, ...

A long term solution should
      be scalable
      handle block based and hierarchical design
      deliver a hierarchical design signoff

SLIDE 66

Perspectives

Synthesis at the gate level is now also about placement and routing

New problems:
      Placement/synthesis/routing interaction
      Physical synthesis (includes placement)
      Hierarchy

  • The future?
      (1) Fast behavioral synthesis + floorplanning
      (2) Fast RTL to gate synthesis
      (3) Physical prototyping signoff
      (4) Physical implementation

synthesis + placement + routing