Physical Design Closure, Olivier Coudert


DAC2000 (C) Monterey Design Systems 1

Physical Design Closure

DAC 2000

Olivier Coudert

Monterey Design Systems

DSM Dilemma

SOC (abstraction):

  • Time to market
  • Million gates
  • High density, larger die
  • Higher clock speeds
  • Long wires
  • Project management
  • Re-use, IPs
  • Larger database
  • Larger design space

Need abstraction levels to manage complexity

DSM (accuracy):

  • Higher resistance
  • Higher cross-coupling
  • Non-linear timing
  • Power
  • Electromigration
  • IR drop
  • Inductances
  • etc.

Require detailed analyses to understand physical interactions


Top 10 Impediments to Design Closure

  • Strong placement/timing dependency
  • Timing/congestion interaction
  • Timing signoff
  • Signal integrity
  • Power design
  • Problem size
  • Computational resources
  • Clock design
  • Modeling accuracy
  • Marketing hype


Timing & Placement

  • Interconnect dominance makes DSM netlist signoff difficult
  • Wireload models were ALWAYS inaccurate

Post-synthesis signoff was possible when interconnect contributed ~20% of the total capacitance

But now the interconnect capacitance is coming to dominate the total capacitance with each new process generation

[Figure: wire capacitance (fF/um), axis 0.05 to 0.3, versus year, 1992 to 2007]


Long-Wire Problems

  • For DSM designs the metal resistance further complicates timing prediction and closure for the global wires

Average long-wire length is not scaling with new technologies since the systems are becoming bigger

[Figure: occurrence rate (normalized) versus wire length relative to die size, showing local wires and global wires, with a marker at ~0.5]
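To make the long-wire point concrete, here is a minimal sketch that assumes a uniform distributed-RC wire and the Elmore delay approximation. The function name and the per-micron resistance and capacitance values are illustrative assumptions, not figures from the talk. It shows that the wire's own delay grows with the square of its length, which is why global wires that do not shrink become hard to predict.

```python
def elmore_wire_delay(length_um, r_per_um, c_per_um, r_driver=0.0, c_load=0.0):
    """Elmore delay (seconds) of a driver, a uniform RC wire, and a load.

    The distributed wire contributes 0.5 * R_wire * C_wire.
    """
    r_wire = r_per_um * length_um
    c_wire = c_per_um * length_um
    return r_driver * (c_wire + c_load) + r_wire * (0.5 * c_wire + c_load)

# Assumed, illustrative values: 0.1 ohm/um and 0.2 fF/um.
R_PER_UM = 0.1
C_PER_UM = 0.2e-15

local_delay = elmore_wire_delay(1_000, R_PER_UM, C_PER_UM)    # 1 mm wire
global_delay = elmore_wire_delay(10_000, R_PER_UM, C_PER_UM)  # 10 mm wire

# A wire 10x longer is 100x slower when only the wire RC is considered.
ratio = global_delay / local_delay
```

With a non-zero driver resistance or load the growth is a mix of linear and quadratic terms, but the quadratic term dominates for long wires.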


Placement

  • Quadratic placement
      fast
      restricted cost function, e.g., timing driven placement mimicked with net weighting
  • Simulated annealing
      open cost function
      extremely slow
  • Force directed
      semi-open cost function
      slower than quadratic placement
      tuning more difficult
  • Bisection (mincut + partitioning)
      open cost function
      slower than quadratic placement
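The remark that quadratic placement mimics timing-driven placement with net weighting can be sketched in one dimension. This is a minimal illustration under assumptions: two-pin nets only, fixed pads at both ends, and a plain Gauss-Seidel iteration instead of a real sparse solver. All names (`quadratic_place_1d`, `pad_l`, `u1`, ...) are hypothetical.

```python
def quadratic_place_1d(movable, fixed, nets, sweeps=200):
    """Minimize sum(w * (x_a - x_b)**2) over two-pin nets (a, b, w).

    At the optimum each movable cell sits at the weighted average of its
    neighbors, so repeated averaging (Gauss-Seidel) converges to it.
    """
    pos = dict(fixed)
    for cell in movable:
        pos[cell] = 0.0
    for _ in range(sweeps):
        for cell in movable:
            num = den = 0.0
            for a, b, w in nets:
                if a == cell:
                    other = b
                elif b == cell:
                    other = a
                else:
                    continue
                num += w * pos[other]
                den += w
            if den > 0:
                pos[cell] = num / den
    return pos

fixed = {"pad_l": 0.0, "pad_r": 10.0}

# Equal weights: the two cells spread evenly between the pads.
nets = [("pad_l", "u1", 1.0), ("u1", "u2", 1.0), ("u2", "pad_r", 1.0)]
uniform = quadratic_place_1d(["u1", "u2"], fixed, nets)

# Weight the u1-u2 net 4x to mimic a timing-critical connection.
critical = [("pad_l", "u1", 1.0), ("u1", "u2", 4.0), ("u2", "pad_r", 1.0)]
weighted = quadratic_place_1d(["u1", "u2"], fixed, critical)

gap_uniform = uniform["u2"] - uniform["u1"]
gap_weighted = weighted["u2"] - weighted["u1"]
```

The heavier weight pulls the two cells closer together, which is the only handle this restricted cost function offers on timing.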


Netlist Clustering

  • Start placement by building a hierarchical tree of cell-clusters from the netlist (hMetis, DAC’97)
  • A key to optimal placement is to optimize the size and locations of these clusters
  • Both functional hierarchy and netlist topology need to be considered

[Figure: example netlist with cells A, B, C, D, E, F]
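One level of such a cluster tree can be sketched with greedy heavy-edge matching, a coarsening step commonly used in multilevel partitioners. This is a swapped-in simplification for illustration, not hMetis itself, and it ignores functional hierarchy. The cell names follow the figure; the connection weights are made up.

```python
def heavy_edge_matching(cells, edges):
    """One coarsening pass: pair each unmatched cell with its heaviest
    unmatched neighbor. edges maps (a, b) to a connection weight."""
    neighbors = {c: {} for c in cells}
    for (a, b), w in edges.items():
        neighbors[a][b] = neighbors[a].get(b, 0) + w
        neighbors[b][a] = neighbors[b].get(a, 0) + w
    matched = set()
    clusters = []
    for c in cells:
        if c in matched:
            continue
        candidates = [(w, n) for n, w in neighbors[c].items() if n not in matched]
        if candidates:
            _, best = max(candidates)
            clusters.append((c, best))
            matched.update((c, best))
        else:
            clusters.append((c,))  # no free neighbor: singleton cluster
            matched.add(c)
    return clusters

cells = ["A", "B", "C", "D", "E", "F"]
edges = {
    ("A", "B"): 3, ("A", "C"): 2, ("B", "C"): 1,
    ("C", "D"): 1, ("D", "E"): 3, ("E", "F"): 2,
}
level1 = heavy_edge_matching(cells, edges)
```

Repeating the pass on the clustered netlist yields the next level of the tree. The result depends on visiting order, which is one reason real tools use more careful schemes.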


Placement

  • The clusters are sized and placed within partitions and among megacells
  • Long wires are modeled among partitions, and congestion is approximated within partitions

Initially, congestion is dominated by local wires

Early wireplanning for long wires will not work


Placement

  • This process continues to smaller clusters and smaller partitions
  • “Long” wires are not “planned”, but are “placed” probabilistically in terms of where the router is likely to want to route them
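A probabilistic wire model of this kind can be sketched as follows. The assumption here, which is not from the talk, is that a two-pin net takes one of its two L-shaped routes with equal probability, so each bin receives the expected number of wires crossing it. Function names are hypothetical.

```python
from collections import defaultdict

def l_route(src, dst, horizontal_first):
    """Set of bins (x, y) visited by an L-shaped route between two bins."""
    (x0, y0), (x1, y1) = src, dst
    xs = range(x0, x1 + 1) if x0 <= x1 else range(x0, x1 - 1, -1)
    ys = range(y0, y1 + 1) if y0 <= y1 else range(y0, y1 - 1, -1)
    if horizontal_first:
        path = [(x, y0) for x in xs] + [(x1, y) for y in ys]
    else:
        path = [(x0, y) for y in ys] + [(x, y1) for x in xs]
    return set(path)

def add_demand(demand, src, dst):
    """Spread one net's routing demand over its two candidate L routes."""
    routes = [l_route(src, dst, True), l_route(src, dst, False)]
    for route in routes:
        for bin_xy in route:
            demand[bin_xy] += 1.0 / len(routes)

demand = defaultdict(float)
add_demand(demand, (0, 0), (2, 1))  # bends once: two distinct L routes
add_demand(demand, (0, 1), (2, 1))  # straight net: both routes coincide
```

Bins that every candidate route must cross, such as the pin bins, get full demand; bins on only one candidate get half. Comparing the summed demand with each bin's track capacity gives an early congestion estimate without committing to a route.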


Placement

  • One eventually reaches a cluster and partition size for which timing and congestion are predictable

  • Timing signoff can be done at this level ONLY!

Placement

  • Cells are non-uniformly distributed into bins

Dynamic whitespace allocation addresses congestion at the global level


Placement

  • Cells are non-uniformly distributed at subfloorplan level

Dynamic whitespace allocation addresses congestion at the global level

  • Inter- and intra-partition congestion is predictable at this placement level


Non-Uniform Whitespace Management

  • Example of whitespace allocation after timing driven placement and optimization

[Figure: placement snapshot with annotations: white space added to relieve congestion; white space removed to help relieve congestion in other areas; movement of cells for timing optimization]


Placement

  • The generality of the placement algorithm and the common database provide for front-to-back logic optimization, control of wiring, etc.
  • These same features provide for powerful ECO capabilities too

Netlist can be adjusted via API at all levels of the placement progression

Design progress can be viewed and manipulated at every placement level


Timing Prediction

  • As the routing models become more precise, so do the timing predictions for the long wires
  • The timing/delay models and analyses are only as precise as the physical information
  • New metrics provide excellent correlation from front-end to back-end

  • Intra-partition wiring delays are accurately predicted at this partition size too


Timing Optimization

  • The first tech mapping was an approximation, since the wiring capacitances were not known
  • With sufficient physical information at the placement level, we begin timing optimization
  • Buffers are inserted for shielding, delay and attenuation


Timing Optimization

  • Buffers are added only when it is determined that they will not have to be removed
  • Global routing is used to place the buffers and inverters
  • Long wires are “seeded” by buffers

Long wire “design” is driven by accurate physical information
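Why long wires benefit from buffers while short ones do not can be sketched with the classical repeater trade-off. The model below is an assumption for illustration: equal-length segments, identical buffers, Elmore delay, and made-up technology values. It is not the buffering algorithm of the tool described in the talk.

```python
def buffered_delay(length_um, n_buffers, r_um, c_um, r_buf, c_buf, t_buf):
    """Elmore delay of a wire split into equal segments by identical buffers.

    Assumption: the driver and the receiver look like the same buffer
    (output resistance r_buf, input capacitance c_buf, intrinsic delay t_buf).
    """
    segments = n_buffers + 1
    seg_len = length_um / segments
    r_w = r_um * seg_len
    c_w = c_um * seg_len
    stage = r_buf * (c_w + c_buf) + r_w * (0.5 * c_w + c_buf)
    return segments * stage + n_buffers * t_buf

def best_buffer_count(length_um, max_buffers=50, **tech):
    """Exhaustively pick the buffer count with the smallest delay."""
    return min(range(max_buffers + 1),
               key=lambda k: buffered_delay(length_um, k, **tech))

# Illustrative technology numbers (assumed).
tech = dict(r_um=0.1, c_um=0.2e-15, r_buf=500.0, c_buf=5e-15, t_buf=20e-12)

k_short = best_buffer_count(500, **tech)      # local wire
k_long = best_buffer_count(10_000, **tech)    # global wire
```

Splitting the wire turns the quadratic wire term into a sum of much smaller quadratic terms, at the cost of one buffer delay per split. That cost only pays off once the wire is long enough, and the length is only known once placement provides physical information.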


Logic Optimization

“Analytical” approaches

  • Assume continuous “size”
  • Fast
  • Map a continuous solution onto a discrete library
  • Use oversimplified models (e.g., Elmore delay)

“Refinement” approaches

  • Can use complex and/or discrete models
  • Can mix a wide range of transformations
  • Slower
  • Strategy/control more difficult


Logic Optimization

  • Placement provides enough physical information to accurately buffer, resize, remap, resynthesize, etc.
  • Yet design is still abstract enough for global exploration
  • E.g.: Logic optimization for “global” congestion relief
  • “Placement” is coarse enough that resizing in one region does not require cells to be moved in another
  • More effective than completing a placement, feeding back custom wireload models, and iterating…


Logic Optimization

Buffering can help in reducing congestion too

  • Buffering targets slope fixing and timing
  • Several algorithms: slack driven, delay driven, and slope driven

[Figure: shielding buffer for timing optimization, showing a critical path and nodes numbered 1 to 8]


Logic Optimization

  • More aggressive for critical paths, e.g., logic collapsing and decomposition, logic duplication and logic sharing, logic remapping, logic resynthesis

[Figure: path 1 and path 2; both paths 1 & 2 are critical]


Logic Optimization

  • The generality of the placement algorithm allows logic optimization to continue throughout the flow

No net constraints

Continual monitoring of “what is critical”

  • Includes simple logic restructuring for congestion relief

Clock Distribution

  • Most clock tree synthesis algorithms attempt to build the clock tree post-placement

This is too late: congestion could disturb timing closure

But you can’t build it too early, since you don’t know where the latches are

[Figure: flow stages Synthesis, Floorplanning, Placement, Clock Tree Generation, Clock Routing]


Clock Distribution

  • The placement should provide enough information to know the distribution of latches, but should be abstract enough to avoid being trapped by congestion caused by the clock wiring


Clock Distribution

  • A first clock tree is created from the distribution of clock pins

A complete buffered/gated tree can be automatically synthesized

The user has the option to instantiate the top portions of a tree based on the distribution of latches and flipflops


Clock Distribution

  • This clock tree congestion is used to predict the overall congestion, since the latch distribution will not change substantially from this point forward
  • As the lower portions of the clock tree continue to grow, the top levels of the tree take root

The top levels will continue to adjust slightly as the placement and optimization processes continue


Clock Distribution

  • Accurate timing projections enable useful skew methods to be applied at this level
  • Placement is still coarse enough so that objects with common-skew targets can be grouped
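The idea behind useful skew can be illustrated with a setup-slack calculation on a two-stage pipeline. This is a minimal sketch under assumptions: only setup checks are modeled (hold checks are ignored), and the register names, delays, and clock period are invented for the example.

```python
def setup_slacks(period, paths, clock_arrival, t_setup=0.0):
    """Setup slack per register-to-register path.

    paths: list of (launch_reg, capture_reg, max_delay).
    clock_arrival: clock insertion delay at each register.
    """
    return {
        (src, dst): period + clock_arrival[dst] - clock_arrival[src] - delay - t_setup
        for src, dst, delay in paths
    }

# A slow stage A->B followed by a fast stage B->C, clock period 1.0.
paths = [("A", "B", 1.2), ("B", "C", 0.6)]

zero_skew = setup_slacks(1.0, paths, {"A": 0.0, "B": 0.0, "C": 0.0})

# Deliberately delay the clock at B so the slow stage borrows time
# from the fast one.
useful = setup_slacks(1.0, paths, {"A": 0.0, "B": 0.3, "C": 0.0})

worst_zero = min(zero_skew.values())
worst_useful = min(useful.values())
```

With a balanced clock the slow stage fails; shifting the clock at B makes both stages pass. Registers that need the same shift are natural candidates for the common-skew groups mentioned above.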


Power/Ground Distribution

  • The placement also provides sufficient information to judge the quality and integrity of the power/ground network

  • Power/ground network can have a huge impact on congestion
  • Power rail currents will not change much as the placement is refined
  • Yet there is enough space to add/widen stripes
  • API driven adjustment using incremental IR-drop analyses
  • Ultimately this optimization process can be automated
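An adjustment loop driven by IR-drop analysis can be sketched on a single power stripe. The model is an assumption for illustration: the stripe is fed from one end, taps are equally spaced, and the sheet resistance and currents are invented. The function names are hypothetical and do not refer to any real tool API.

```python
def worst_ir_drop(tap_currents, seg_len_um, width_um, sheet_res=0.05):
    """Worst-case voltage drop (V) on a stripe fed from one end.

    tap_currents[i] is the current (A) drawn at tap i, counted from the feed.
    sheet_res is in ohms per square (assumed value).
    """
    r_seg = sheet_res * seg_len_um / width_um
    drop = 0.0
    downstream = sum(tap_currents)
    for i_tap in tap_currents:
        # The segment feeding this tap carries all current drawn at or after it.
        drop += r_seg * downstream
        downstream -= i_tap
    return drop  # the far end of the stripe sees the largest drop

def widen_until_ok(tap_currents, seg_len_um, width_um, budget_v, step_um=1.0):
    """Widen the stripe in fixed steps until the drop budget is met."""
    while worst_ir_drop(tap_currents, seg_len_um, width_um) > budget_v:
        width_um += step_um
    return width_um

taps = [0.001] * 10                      # ten taps drawing 1 mA each
drop_before = worst_ir_drop(taps, 100.0, 2.0)
width = widen_until_ok(taps, 100.0, 2.0, budget_v=0.05)
drop_after = worst_ir_drop(taps, 100.0, width)
```

Because rail currents change little as placement is refined, a width chosen this way at a coarse level should remain roughly valid later, while there is still room to add or widen stripes.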

Power/Ground Distribution

Eventually the automation process will have to consider more detailed analyses too:

  • Inductance of chip and packaging
  • Resonance frequencies via AC analyses
  • On-chip decoupling


Model refinement

  • Once the quadrisection level’s results are acceptable, we proceed with a similar partition-based placement strategy
  • The cost function includes timing, area, congestion, power, and eventually xtalk and signal integrity

There are no timing constraints fed forward!

  • Logic optimization, buffering, whitespace allocation, etc., all continue on a more local scale


Design Closure

Continuity and correlation are keys!

[Figure: design flow over time, with labels System, RTL, Synthesis, Timing, Logic opt., Place, Route, Extraction & delay calculation, Final static timing analysis; axes: model accuracy versus time; transformation scale ranging from global/estimate to local/accurate]


Pre-DSM Design Flow

[Figure: pre-DSM flow split into a logical domain and a physical domain, with labels RTL, Synthesis, statistical WLM, custom WLM, Netlist Signoff, Placement, Clock Tree, Routing, Extraction, Delay Calculation, Static Timing Analysis, timing library (SDF, RC’s)]


DSM Design Signoff

Design signoff can only be done when DSM timing & congestion can be properly estimated: physical prototype level

[Figure: DSM flow with labels RTL, Synthesis + opt., floorplan, Timing, Place, Route, Remap, Physical implementation, Delay Calculation, Static Timing Analysis; annotations: “No physical information at that level” and “Timing, congestion, clock, etc., predictable at that level”]


Routing

Requirements for the DSM Router:

  • N-layer shape-based router
  • Supports gridless and gridded routing
  • Variable wire width for optimal delay constraints
  • Cross-talk avoidance, antenna effects
  • Clock tree sizing for tree balancing
  • Power routing sizing for voltage drop and electromigration


Routing Correlation

Global routing can utilize the whitespace to avoid long-distance couplings for critical nets

Extra spacing, shielding, or space for rip-up and reroute

No surprises for the detailed router after GR

  • Shape-based gridless area router
  • Timing and xtalk aware
  • Spacing wires non-uniformly within and among layers is handled without loss of generality

Router capabilities are also critical for delay optimization and satisfying reliability constraints


Crosstalk

[Figure: coupling vs. inter-layer capacitance, Cc/Cs (axis 2 to 8) by year, 1997 to 2012]

Source: 1998 Update, International Technology Roadmap for Semiconductors

Fact: the same-layer coupling capacitance is beginning to dominate the total net capacitance

Makes cross-talk a dominant factor in achieving timing closure


Crosstalk

Neighboring-net switching can cause detailed routing (DR) surprises

Trying to solve this problem at DR is far too late!

Passing constraints to detailed routing to avoid routing certain nets in parallel is easy, but DR is already overconstrained!

The right way is to attack the xtalk problem starting at the proper placement level


Crosstalk Delay Impact

  • Simply modeling the coupling capacitance as grounded capacitance scaled by ~2x is overly pessimistic
  • Timer should model early and late arrival times at all nodes (for each library) so that worst/best case switching can be determined during path traversal

TACO: Timing Analysis with Coupling (DAC 2000)
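The difference between a blanket 2x scaling and a window-aware model can be sketched as follows. This is a simplified illustration and not the TACO algorithm: it assumes a coupling capacitor counts 0x when the aggressor may switch in the same direction, 2x when it may switch in the opposite direction, and 1x when the aggressor cannot switch while the victim does. Capacitance values and windows are invented.

```python
def windows_overlap(a, b):
    """a and b are (early, late) arrival-time windows."""
    return a[0] <= b[1] and b[0] <= a[1]

def effective_cap_bounds(c_ground, couplings, victim_window):
    """Best/worst effective capacitance seen by a victim net.

    couplings: list of (coupling_cap, aggressor_window).
    """
    best = worst = c_ground
    for c_c, aggressor_window in couplings:
        if windows_overlap(victim_window, aggressor_window):
            best += 0.0 * c_c    # aggressor may switch in the same direction
            worst += 2.0 * c_c   # aggressor may switch in the opposite direction
        else:
            best += c_c          # aggressor is quiet while the victim switches
            worst += c_c
    return best, worst

# Two aggressors; only the first can switch while the victim does.
couplings = [(4.0, (0.0, 1.0)), (6.0, (5.0, 6.0))]
best, worst = effective_cap_bounds(10.0, couplings, victim_window=(0.5, 1.5))

# Blanket model: every coupling cap grounded and scaled by 2.
blanket = 10.0 + 2.0 * sum(c for c, _ in couplings)
```

Since the windows themselves depend on the delays being computed, a real timer would have to iterate or bound them conservatively; the sketch takes the windows as given.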


Electromigration

  • During clock-tree synthesis, top level wires are automatically sized to satisfy E/M constraints
  • Below 0.25um we expect similar constraints for signal nets

Don’t wait until DR to determine layer assignments or find extra space for wide wires

The wire sizes and layers should be modeled at the earliest possible placement level


IR drop

P = Pnet + Pint + Pleak

Simulation and/or probabilistic based dynamic power evaluation

Power distribution at the chip level, along with the quadrisection level

Consequently power distribution can be optimized along with the other design variables


Moore’s Law: Tapering off?

[Figure: thousands of transistors (log scale, 1 to 100000) versus year, 1970 to 2005, for the 4004, 8086, 80286, 80386, 80486, Pentium, P.Pro, and Merced; trend annotations: “2x in 2 years” and “2.5 years”]


Parallel Processing

Parallel processing: The process of breaking a problem into multiple pieces and executing them simultaneously

Speedup: Let T(n) = wall-clock time for executing the original task on n processors. Then speedup is T(1)/T(n)

Speedup depends on:

  • load balancing
  • inter-processor communication
  • scheduling

Load balancing

Objectives

  • Evenly balance computational loads among available processors
  • Minimize inter-processor communication

This is a hard problem

  • Load balancing is NP-complete!
  • The time taken to load-balance should be a small fraction of the total process time


Job Scheduling

10 independent jobs, 2 processors

Job sequence: p q p p p q p p p p

T(p) = 1, T(q) = 10

Serial runtime: 8 * 1 + 2 * 10 = 28

[Figure: thread scheduling chart for threads T1 and T2, time axis marked at 10 and 20]

Poor scheduling is detrimental for speedup

S(2) = 1.47

Improving Job Scheduling

[Figure: thread scheduling chart for threads T1 and T2 after rescheduling the same 10 jobs]

Simple scheduling algorithms improve speedup

S(2) = 2.00
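The effect of job order on speedup can be sketched with a greedy list scheduler. The "longest job first" rule used below is one simple scheduling algorithm chosen for illustration; the slides do not say which rule produced their charts, and the poor order here is a made-up one that gives a different number than the slide's 1.47.

```python
import heapq

def makespan(jobs, n_procs):
    """Greedy list scheduling: each job goes to the processor that frees up first."""
    finish = [0] * n_procs
    heapq.heapify(finish)
    for duration in jobs:
        heapq.heappush(finish, heapq.heappop(finish) + duration)
    return max(finish)

def speedup(jobs, n_procs, order=None):
    """T(1) / T(n), with an optional reordering of the job list."""
    ordered = order(jobs) if order else list(jobs)
    return sum(jobs) / makespan(ordered, n_procs)

# Eight jobs with T(p) = 1 and two with T(q) = 10, as on the slide.
# Poor order (assumed): one long job is only discovered at the very end.
jobs = [10, 1, 1, 1, 1, 1, 1, 1, 1, 10]

as_listed = speedup(jobs, 2)
longest_first = speedup(jobs, 2, order=lambda js: sorted(js, reverse=True))
```

Scheduling the long jobs first balances both processors at 14 time units, so the speedup is 28 / 14 = 2.00, the value quoted on the slide. In the poor order the last long job starts at time 8 and one processor sits idle while it runs.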


Inter-job Communication

[Figure: jobs labeled k1 and k2, each with a p portion and a q portion]

Reducing job-communication improves scaling

Partition the problem!

Global Routing

“Global” doesn’t lend itself to parallelism

“q” is a very big portion of each task

q = updating of “global” congestion map

Quality vs. speed trade-off:

  • Lazy update
  • Multi-level partitioning
  • ...


Global Routing: Lazy Update

Algorithm:

  • Each parallel task gets a list of nets to be routed
  • While routing a net, the “global” congestion map represents an earlier state
  • After a while, routing stops and the congestion map is updated

Cons:

  • Quality degradation
  • Possibility of slowing convergence due to the delayed congestion map
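The lazy-update scheme and its quality cost can be sketched with a toy router. Everything here is an assumption for illustration: each net has a fixed list of candidate routes given as bin names, the router picks the least congested candidate, and the tasks are run one after another to stand in for threads.

```python
from collections import Counter

def pick_route(candidates, congestion):
    """Choose the candidate route (a tuple of bins) with the least total congestion."""
    return min(candidates, key=lambda route: sum(congestion[b] for b in route))

def route_lazy(task_nets, congestion):
    """One round: every task routes against the same stale snapshot of the map."""
    snapshot = Counter(congestion)
    chosen = []
    for nets in task_nets:            # each inner list would run on its own thread
        local = Counter(snapshot)     # private copy, updated only by this task
        for candidates in nets:
            route = pick_route(candidates, local)
            local.update(route)
            chosen.append(route)
    # The lazy, batched update of the shared map.
    congestion.update(b for route in chosen for b in route)
    return chosen

def route_eager(task_nets, congestion):
    """Reference: every net sees an up-to-date map (serial behavior)."""
    chosen = []
    for nets in task_nets:
        for candidates in nets:
            route = pick_route(candidates, congestion)
            congestion.update(route)
            chosen.append(route)
    return chosen

net = [("a", "b"), ("c", "d")]        # two candidate routes for the same net shape
tasks = [[net], [net]]                # two tasks, one such net each

lazy_map, eager_map = Counter(), Counter()
route_lazy(tasks, lazy_map)
route_eager(tasks, eager_map)
```

With the stale snapshot both tasks pick the same route and the bins end up with twice the demand; with an up-to-date map the second net avoids them. A later round would have to repair the overlap, which is the convergence cost listed above.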

Global Routing: Multi-level partitioning

Algorithm:

  • Divide routing area into partitions; at each level, partitions are non-overlapping
  • Levels could be 1x1, 2x2, 3x3, 5x5 ...
  • At each level, routing within partitions can be threaded

Detail Routing

Detail router optimizes local interaction of routes

Localized, thus a simple partition-based threading scheme:

  • Divide chip into small partitions
  • Instantiate router on partitions in parallel

Quasi-linear speed-up


In reality, partitions will be overlapping

  • Better quality near partition boundaries
  • Cannot route adjacent partitions concurrently
  • To minimize locks, need a scheduler
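One simple way to schedule overlapping partitions without locks is to route them in waves in which no two partitions touch. The sketch below assumes a regular grid of partitions whose overlap reaches only the immediate neighbors, including diagonal ones; the four-wave parity scheme is an illustration, not the scheduler described in the talk.

```python
def schedule_waves(cols, rows):
    """Group grid partitions into 4 waves by the parity of their coordinates.

    Partitions in one wave differ by an even amount in x and in y, so two
    distinct ones are at least 2 apart and never share an overlap region.
    """
    waves = {}
    for x in range(cols):
        for y in range(rows):
            waves.setdefault((x % 2, y % 2), []).append((x, y))
    return list(waves.values())

def touches(p, q):
    """True if two distinct partitions are neighbors, diagonals included."""
    return p != q and abs(p[0] - q[0]) <= 1 and abs(p[1] - q[1]) <= 1

waves = schedule_waves(4, 4)

# Every wave can be handed to the thread pool as one batch.
conflict_free = all(
    not touches(p, q) for wave in waves for p in wave for q in wave
)
```

Static waves insert a barrier between batches, so a dynamic scheduler that starts a partition as soon as its neighbors are idle may keep processors busier; the waves only show that a conflict-free order exists.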

Speedup (n=4)

[Figure: speedup on 4 processors (axis 1 to 4) for: Global Placement, Congestion modeling, Place - Logic interaction, Sizing, Buffering, Technology mapping, Static Timing Analysis (with crosstalk), Clock generation, Power topology construction, Detailed placement, Global routing (with crosstalk), Shape-based detailed routing (with crosstalk)]
