Gain-based synthesis: enabler for ‘correct-by-construction’ design

Patrick Groeneveld (patrick@magma-da.com)
Magma Design Automation, Cupertino, CA
DAC'2000

Summary

• Iterative flow vs. stepwise refinement
• Derivation of a simple delay model
• Gain-based delay optimization
• Building a tool flow around this model
• Standard cell library issues
• Getting timing during routing
• Recommendations
• Latest slides at http://cas.et.tudelft.nl/~patrick/closure

Preliminaries

• Timing closure: obtaining a feasible layout of a circuit that meets the given timing specification.
• Objective: obtain closure as quickly and with as little effort as possible.
• Assumptions:
  - ASIC design style.
  - Standard cell abstraction.
  - Static CMOS.
• Other design issues are neglected.

Interconnect parasitics (C and R)

• Speed is entirely determined by parasitics.
• Parasitics are tiny.
• Parasitics depend on the exact layout.
• Therefore they are hard or impossible to estimate, especially before placement.

Timing Uncertainty

Gate-to-gate delay depends on:
• Wire length (unknown during synthesis)
• The layer of the wire (determined during routing)
• The configuration of the neighboring wires: distance, near/far (unknown before detailed routing)
• Timing window and slope of the neighboring wires

Meeting timing gets harder

[Figure: flip-flops (d, q) connected by logic paths with a 5 ns maximum delay.]

Timing is a result of the placement

[Figure: slack distribution, C_dream vs. C_real.]
• The bad news: the worst timing sets the clock speed!

Prediction vs. reality

[Figure: histogram of the number of nets vs. (real delay - predicted delay), from -100% to +100%. The average corresponds to the wireload model, i.e. what you designed for; the tails are the fastest/best and slowest/worst nets, and in the slow tail the circuit does not work.]

The end of the wire load model

• The model is used in conventional synthesis tools.
• It guesses the load based on the number of pins of the net.
• The average is correct, but...

You must iterate!

[Figure: Today's conventional flow: RTL → Logic Synthesis → Placement → Routing → Extraction → Timing Analysis → "Met timing?"; if NO, back to Logic Synthesis (multiple iterations), otherwise GDSII.]
• Synthesis does not accurately model interconnect.
• Cell sizes are fixed before placement.
• Place & route is unable to meet the timing goal.

The trial and error iteration

[Figure: iteration loop between logic synthesis and place & route.]

Methodology Problems

• To avoid endless iterations, the design must be ‘on the safe side’.
• Iterations are very slow and may not converge.
• You’re never sure if you’ll make it.
• Only a painful trial-and-error process reveals whether the design is feasible.

Ways to attack timing closure

• Iterate through SPEF or internally
• Post-placement optimization (ECO)
• Partition the design into smaller pieces
  - Variation in wire length will decrease
  - Better timing closure on each block if #gates < 50,000
• Gain-based synthesis

Hierarchy

• Make problems smaller
• Structure makes the problem more manageable
• Solve sub-problems independently
• Enables efficient re-use
• Enables consistent verification

Physical hierarchy and timing closure

• Wires need to slalom around blocks, or traverse through or over blocks
• How to set pin locations?
• Where to put the buffers?
• The automatic floorplanning problem is unsolved
• Large hidden inefficiency

Physical hierarchy is a necessary evil

[Figure: three ways to implement a 2,000,000-cell design with macros: flat; 4 blocks of 500,000 standard cells; or 20 blocks of approx. 100,000 standard cells.]
• If you can do it, do it as flat as possible!
• Also do the datapath flat.

Conventional layout synthesis

[Figure: slack distribution, C_dream vs. C_real.]

Gain-based synthesis:

[Figure: slack distribution, C_dream vs. C_real.]

Focus for timing closure

• Combine the logical and physical worlds.
• Crisp: focus on the main effect, skip irrelevant details.
• Enable blazingly fast optimization.
• Compact: memory-efficient for tomorrow’s 50M-gate chip.

Good practices, bad practices

• Use a simple model, and adapt reality to it.
• At each step, freeze a single constraint; postpone decisions on others.
• Allow sufficient freedom in future steps to fulfill all remaining constraints.
• Bail out early if there’s no use continuing.
• Fix multiple objectives at once.
• Iterate.
• Indulge in ‘accurate’ models.
• Attempt to be optimal.

Is ‘Optimal’ optimal??

Contacts in layout have parasitic resistance and affect reliability.
[Figure: a layout with 8 contacts, next to the result that is optimal with contact minimization (0 contacts). There are still 8 contacts!]

Compromises

• Flexible-die design is better:
  - Guarantees routing completion
  - Until the last moment we can trade off delay for area.
• Fixed-die design instead:
  - Need to guess initial utilization.
• This could result in an iteration.

Simple delay model of a gate

• Model a transistor by a resistor and a switch.
• We can assume that the rise delay and the fall delay are similar.
• Therefore the pull-up R_ui and the pull-down R_di become R_i.
• The transistor impedance depends on the transistor size (W/L).

[Figure: switch-resistor model of an inverter (pull-up R_ui, pull-down R_di) and its equivalent circuit: input capacitance C_in, output resistance R_gate, internal capacitance C_gate, driving C_load.]

• C_in: input capacitance of the gate
• C_gate: the internal parasitic capacitance (mostly diffusion)
• C_load: the external load that the gate is driving
• R_gate: effective output impedance

Gate delay and load

$d_{abs} = R_{gate} C_{load} + R_{gate} C_{gate}$

[Figure: the gate model (C_in, R_gate, C_gate) driving C_load, and a plot of delay vs. C_load: a straight line whose intercept at C_load = 0 is the parasitic delay R_gate C_gate.]
Delay dependency on load is often given as a table.

Let's double the gate size

[Figure: the gate model (C_in,0, R_gate,0, C_gate,0) driving C_load, next to two such gates in parallel driving the same C_load.]

$C_{in} = 2\,C_{in,0}, \quad R_{gate} = \tfrac{1}{2} R_{gate,0}, \quad C_{gate} = 2\,C_{gate,0}$

$d_{abs} = R_{gate} C_{load} + R_{gate} C_{gate} \;\Leftrightarrow\; d_{abs} = \tfrac{1}{2} R_{gate,0} C_{load} + R_{gate,0} C_{gate,0}$

Gate delay and size

• Assume a gate sizing factor α (= scaling relative to the smallest size):

$C_{in} = \alpha\, C_{in,0}, \quad R_{gate} = \frac{R_{gate,0}}{\alpha}, \quad C_{gate} = \alpha\, C_{gate,0}$

$d_{abs} = R_{gate} C_{load} + R_{gate} C_{gate} = \frac{R_{gate,0}}{\alpha} C_{load} + R_{gate,0} C_{gate,0}$

So keeping C_load constant results in:
[Figure: delay vs. gate size α at constant C_load; the delay falls as 1/α toward the parasitic delay R_gate,0 C_gate,0.]
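A minimal Python sketch of the size-scaling equations above, in arbitrary normalized units. It scales a reference gate by α (R_gate = R_gate,0/α, C_gate = α C_gate,0) and evaluates d_abs at a fixed C_load; the defaults R_gate,0 = C_gate,0 = 1 and the load of 8 are illustrative assumptions, not library data.

```python
def gate_delay(alpha, c_load, r_gate0=1.0, c_gate0=1.0):
    """d_abs = R_gate * C_load + R_gate * C_gate for a gate scaled by alpha.

    Scaling by alpha divides the output resistance and multiplies the
    internal (diffusion) capacitance, so the parasitic term R_gate * C_gate
    stays equal to r_gate0 * c_gate0 while the load term shrinks as 1/alpha.
    """
    if alpha <= 0:
        raise ValueError("sizing factor must be positive")
    r_gate = r_gate0 / alpha
    c_gate = alpha * c_gate0
    return r_gate * c_load + r_gate * c_gate


if __name__ == "__main__":
    c_load = 8.0  # fixed external load (normalized)
    for alpha in (1, 2, 4, 8, 16):
        print(f"alpha={alpha:2d}  d_abs={gate_delay(alpha, c_load):.2f}")
```

This prints delays of 9.00, 5.00, 3.00, 2.00 and 1.50: each doubling of the size helps less, and the delay never drops below the parasitic floor of 1.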

Delay and gain

• The gain is the ratio of the load capacitance to the input capacitance:

$h = \text{gain} = \frac{C_{load}}{C_{in}}$

• Now we can rewrite the previous equations in terms of gain:

$d_{abs} = R_{gate} C_{load} + R_{gate} C_{gate} \;\Leftrightarrow\; d_{abs} = R_{gate} C_{in} \frac{C_{load}}{C_{in}} + R_{gate} C_{gate} \;\Rightarrow\; d = g \cdot h + p$

with $g = R_{gate} C_{in} = R_{gate,0} C_{in,0}$ and $p = R_{gate} C_{gate} = R_{gate,0} C_{gate,0}$, both independent of the sizing factor α.

Making delay independent of load

• If the gain is constant, the delay is constant over a range of loads!
[Figure: as C_load grows, the gate size (= C_in) grows with it, so the delay stays flat over the range.]
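The constant-gain idea can be checked with a few lines of Python, assuming the normalized model d = g·h + p and, for illustration, a 2-input nand with g = 4/3 and p = 2 (the values given in the logical-effort slides). If C_in is always resized to C_load/h, the delay is the same for every load. The function and constant names are hypothetical.

```python
G_NAND2, P_NAND2 = 4.0 / 3.0, 2.0  # logical effort and parasitic delay (normalized)


def size_for_gain(c_load, h):
    """Input capacitance (cell size) that keeps the gain h = C_load / C_in."""
    return c_load / h


def delay(g, c_load, c_in, p):
    """Normalized gate delay d = g * (C_load / C_in) + p."""
    return g * (c_load / c_in) + p


if __name__ == "__main__":
    h = 4.0  # gain picked up front
    for c_load in (2.0, 10.0, 50.0, 250.0):
        c_in = size_for_gain(c_load, h)   # the cell grows with its load
        d = delay(G_NAND2, c_load, c_in, P_NAND2)
        print(f"C_load={c_load:6.1f}  C_in={c_in:6.2f}  d={d:.3f}")
```

Every line reports d = 7.333, while C_in ranges from 0.5 to 62.5: the size absorbs the load change instead of the delay.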

Fixed Timing Methodology

[Figure: delay as a function of load (C_load) and size (C_in); timing sign-off fixes the design on the fixed-timing plane of constant delay.]

Fixed Timing in a nutshell

• Goal:
  - Correct by construction (eliminate iterations)
  - Emphasis on timing, not on size.
• Map to size-independent supercells.
• Pick optimized delay up front = pick a gain.
  - If no feasible gain can be found: change your RTL.
• Fix this delay throughout placement and routing.
• Keep delay constant primarily by cell sizing.

“Fast circuit design on a napkin”

Ivan Sutherland (1991):

Delay = (g * h) + p

• g: logical effort, depends on the function of the gate.
• h = C_load / C_in: electrical effort, proportional to the output load C_load.
• g * h: delay of the gate plus its load.
• p: fixed part, the parasitic delay.

Logical effort: g

• To keep the same output drive strength, the 2 n-transistors in series must double their size.
• As a result, the input capacitance of the nand is larger.
• For the same output drive strength, an inverter needs less input capacitance: the inverter has a higher gain.
• More complex gates have less gain.

Inverter: C_in = 1; Nand 2: C_in = 4/3

Logical effort: g

• Assuming that in static CMOS gates the mobility of the p-transistor is half of the n-mobility:

Gate       1 input   2 inputs   3 inputs   n inputs
Inverter   1
Nand                 4/3        5/3        (n+2)/3
Nor                  5/3        7/3        (2n+1)/3

[Figure: transistor schematic of a 3-input nor.]
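The table follows from sizing each gate so that its worst-case pull-up and pull-down match the inverter's drive. A small Python sketch of that reasoning, assuming a reference inverter with an n-transistor of width 1 and a p-transistor of width `mobility_ratio` (2 on the slide). The function name and the gate labels are hypothetical.

```python
from fractions import Fraction


def logical_effort(gate, n, mobility_ratio=2):
    """Logical effort g = C_in(gate) / C_in(inverter) at equal drive strength.

    Reference inverter: nMOS width 1, pMOS width mobility_ratio.
    nand: n series nMOS (each width n), parallel pMOS (each width mobility_ratio).
    nor:  parallel nMOS (each width 1), n series pMOS (each width n * mobility_ratio).
    The capacitance of one input is the sum of the two transistor widths it drives.
    """
    mu = Fraction(mobility_ratio)
    c_inv = 1 + mu
    if gate == "inv":
        c_in = 1 + mu
    elif gate == "nand":
        c_in = n + mu
    elif gate == "nor":
        c_in = 1 + n * mu
    else:
        raise ValueError(f"unknown gate {gate!r}")
    return c_in / c_inv


if __name__ == "__main__":
    for gate in ("nand", "nor"):
        print(gate, [str(logical_effort(gate, n)) for n in (2, 3, 4)])
```

With `mobility_ratio=2` this reproduces (n+2)/3 for nand and (2n+1)/3 for nor; using exact fractions keeps the results comparable to the table entries.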

p: The parasitic delay

• Independent of size and load
• Dependent on process and logic function
• Can be ignored during optimization

$p = R_{gate} C_{gate}$

With the same input capacitance C_in: p_2nand = 2 p_inverter.

Gate            Relative parasitic delay
Inverter        1
n-input nand    n
n-input nor     n

Putting it together

$d = g \cdot h + p$, with $h = C_{load} / C_{in}$ (electrical effort = gain) and d normalized to an inverter.

• Inverter: g = 1, p = 1
• 2-input nand: g = 4/3, p = 2
• 2-input nor: g = 5/3, p = 2
• 4-input nor: g = 9/3, p = 4

[Figure: normalized delay d vs. electrical effort h (1 to 7) for these gates; each line starts at its parasitic delay p and rises with the effort delay g*h.]
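Because d = g·h + p is linear in h, it is easy both to evaluate and to invert; the inverse h = (d - p)/g is one way to translate a per-gate delay budget into a gain. A minimal Python sketch using the (g, p) values listed above, normalized to an inverter. The dictionary keys and function names are illustrative.

```python
from fractions import Fraction

# (logical effort g, parasitic delay p), normalized to an inverter
GATES = {
    "inv": (Fraction(1), Fraction(1)),
    "nand2": (Fraction(4, 3), Fraction(2)),
    "nor2": (Fraction(5, 3), Fraction(2)),
    "nor4": (Fraction(9, 3), Fraction(4)),
}


def delay(gate, h):
    """Normalized delay d = g * h + p for electrical effort h = C_load / C_in."""
    g, p = GATES[gate]
    return g * h + p


def gain_for_delay(gate, d):
    """Inverse of delay(): the gain h that exactly meets delay d (requires d > p)."""
    g, p = GATES[gate]
    if d <= p:
        raise ValueError("budget is below the parasitic delay: no feasible gain")
    return (d - p) / g


if __name__ == "__main__":
    for name in GATES:
        print(name, "d at h=3:", delay(name, 3), " h for d=6:", gain_for_delay(name, 6))
```

A budget smaller than p cannot be met at any size, which is where the error branch comes from.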

Optimizing speed

• Goal: drive the load as fast as possible
  - What is the optimal number of stages n?
  - What is the size ratio of the gates?
[Figure: a chain of stages from C_in to C_load.]

Tune for maximum speed

• Mead and Conway (1980), ignoring parasitic delay:

$H = \frac{C_{load}}{C_{in,1}}$ (total gain)

$h = \frac{C_{in,i+1}}{C_{in,i}} = H^{1/n}$ (gain per stage)

$n = \ln(H)$ (number of stages)

$h = e \approx 2.718$ (optimal gain per stage)

[Figure: chain of stages with input capacitances C_in,1, C_in,2, C_in,3, ..., C_in,n driving C_load.]

• With the parasitic delay p, the optimum ratio is 3.59.
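The 3.59 figure can be reproduced numerically. Assuming an inverter chain in which every stage has effort ρ = H^(1/N) and parasitic delay p (normalized, inverter p = 1), the path delay is D = N(ρ + p) with N = ln H / ln ρ; setting dD/dρ = 0 gives ρ(ln ρ - 1) = p. The Python sketch below solves that condition by bisection (our choice of solver, not the slide's): p = 0 recovers Mead and Conway's e, and p = 1 gives about 3.59.

```python
import math


def optimal_stage_effort(p, tol=1e-12):
    """Solve rho * (ln(rho) - 1) = p for rho > 1 by bisection.

    The left-hand side is increasing for rho > 1 (its derivative is ln(rho)),
    so the root is unique.
    """
    lo, hi = 1.0, 2.0
    while hi * (math.log(hi) - 1.0) < p:  # widen the bracket until it holds the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * (math.log(mid) - 1.0) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def chain_delay(H, n, p):
    """Delay of an n-stage inverter chain with total gain H: n * (H**(1/n) + p)."""
    return n * (H ** (1.0 / n) + p)


if __name__ == "__main__":
    H = 1000.0  # example total gain C_load / C_in,1
    for p in (0.0, 1.0):
        rho = optimal_stage_effort(p)
        print(f"p={p}: stage effort {rho:.3f}, stages {math.log(H) / math.log(rho):.2f}")
```

In practice the number of stages is rounded to an integer, and `chain_delay` can be used to compare the neighbouring choices.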

Maximum speed…

Tune a path for maximum speed

[Figure: a path of three gates a → b → c; gate c drives C_load,c = 20.]

• Maximum speed is obtained if the effort delay f = g*h is the same for each stage.
• The optimal effort delay is f = 3.59.
• The more complex the gate, the more capacitance will be propagated backwards.

$\frac{5}{4}\cdot\frac{C_{load,a}}{C_{in,a}} = 3.59, \quad \frac{6}{3}\cdot\frac{C_{load,b}}{C_{in,b}} = 3.59, \quad \frac{7}{3}\cdot\frac{C_{load,c}}{C_{in,c}} = 3.59$

Working backwards from $C_{load,c} = 20$, with $C_{load,b} = C_{in,c}$ and $C_{load,a} = C_{in,b}$:

$C_{in,c} = \frac{g_c\, C_{load,c}}{f} = \frac{7/3 \cdot 20}{3.59} = 13$

$C_{in,b} = \frac{6/3 \cdot 13}{3.59} = 7.2$

$C_{in,a} = \frac{5/4 \cdot 7.2}{3.59} = 2.5$
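The backward computation is mechanical enough to script. A minimal Python sketch, assuming the logical efforts are listed in path order (input to output) and that each stage's load is just the next stage's input capacitance, as in the a → b → c example; side loads and wire capacitance are ignored, and the function name is hypothetical.

```python
def size_path_backwards(efforts, c_load, f=3.59):
    """Return C_in,i for every stage so that g_i * C_load,i / C_in,i = f.

    efforts: logical efforts g_i, from path input to path output.
    c_load:  capacitance driven by the last stage.
    """
    c_ins = []
    load = c_load
    for g in reversed(efforts):
        c_in = g * load / f   # from g * C_load / C_in = f
        c_ins.append(c_in)
        load = c_in           # this input is the load of the preceding stage
    return c_ins[::-1]


if __name__ == "__main__":
    # gates a, b, c with g = 5/4, 6/3 and 7/3, driving C_load,c = 20
    for name, c in zip("abc", size_path_backwards([5 / 4, 6 / 3, 7 / 3], 20)):
        print(f"C_in,{name} = {c:.1f}")
```

This prints C_in,a = 2.5, C_in,b = 7.2 and C_in,c = 13.0, matching the hand calculation.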

Choosing the right number of stages (logical depth)

• During layout: adding inverters for long-wire delay minimization.
• The optimum depth depends on the path effort and process parameters.
• Not very critical: being 50% off results in less than 10% delay penalty.
• Logic depth is determined by synthesis.
• Pre-layout: adding buffers to high-fanout nets generally improves speed due to the high inverter gain.

Assigning delays

• Timing constraints determine the delay budget:
  - e.g. d_abcd < 2.0 ns, d_ed < 2.0 ns, d_fcd < 2.0 ns
• Spread each path's delay budget evenly over its gates
  - If paths collide, take the smallest delay budget
  - Relax others
• Translate delay budgets into gain.

[Figure: gates a, b, c, d, e, f between flip-flops. Path abcd gets 0.5 ns per gate, path ed 1.0 ns per gate and path fcd 0.66 ns per gate; c and d keep the smallest budget (0.5), so f is relaxed from 0.66 to 1.0 and e from 1.0 to 1.5.]
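One possible reading of this budgeting heuristic, as a Python sketch: paths are processed from the tightest per-gate share to the loosest; gates already claimed by a tighter path keep their budget, and each path's remaining budget is spread evenly over its unclaimed gates ("relax others"). The processing order and the relaxation rule are our assumptions, not a documented algorithm; on the example paths they reproduce the numbers in the figure.

```python
def assign_delay_budgets(paths, budgets):
    """Assign a delay budget to every gate.

    paths:   dict path name -> list of gates on that path
    budgets: dict path name -> maximum path delay
    """
    share = {name: budgets[name] / len(gates) for name, gates in paths.items()}
    assigned = {}
    for name in sorted(paths, key=share.get):          # tightest per-gate share first
        gates = paths[name]
        used = sum(assigned[g] for g in gates if g in assigned)
        free = [g for g in gates if g not in assigned]
        remaining = budgets[name] - used
        if remaining < 0 or (free and remaining == 0):
            raise ValueError(f"path {name}: budget exhausted (infeasible)")
        for g in free:                                  # relax: spread what is left
            assigned[g] = remaining / len(free)
    return assigned


if __name__ == "__main__":
    paths = {"abcd": list("abcd"), "ed": ["e", "d"], "fcd": ["f", "c", "d"]}
    budgets = {name: 2.0 for name in paths}  # ns
    print(assign_delay_budgets(paths, budgets))
```

The result gives a, b, c and d 0.5 ns each, f 1.0 ns and e 1.5 ns. Each budget can then be turned into a gain via d = g·h + p.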

Pre-layout sign-off

[Figure: a path between flip-flops (FF) in which each gate is assigned a 0.5 ns delay.]
• If there is no feasible gain assignment, the sizes literally ‘explode’.

Keeping delay constant during layout

• The gain ratio (= C_load/C_in) is maintained during placement.
• Sizes change during placement.
• As a result, the delay is (almost) constant.
• Sizes cannot ‘explode’: C_load/C_in = fixed.

Sizing-driven placement

• Gate sizes change gradually during placement to keep the delay constant.
• The placer must be able to cope with the netlist changes due to buffering, cloning, restructuring, clock insertion, etc.
• ... while producing a routable result.

Automatic Congestion Handling

• During placement
[Figure: routing congestion and utilization maps.]

What happened at the logical-physical boundary?

Gain-based synthesis:
• Delay fixed
• Cell area unknown
• Sum of areas determines chip size (additive)
• No iterations required
• Each gate has exactly the right drive strength:
  - Not too little (fanout violation, timing fails)
  - Not too much (waste of area)

Conventional:
• Cell area fixed
• Delay is a gamble
• Worst-case delay determines timing (max)
• Iterate to make ends meet.
• After timing finally closes, many gates will be too big:
  - waste of area
  - waste of power

Conventional way: worst case delay sets timing

• 99% of paths meet timing, 1% do not.
• Cell sizes do not change during place and route.
• Design conservatively to avoid excessive iterations. The WLM is also tuned conservatively.
• This oversizes all cells,
  - because cells on non-critical paths are also sized up.
• The chip is significantly bigger than necessary (10-30%).

What about in-place optimization?

• Do a post-placement ECO.
• Change only the cells on the critical paths.
• Conservatism is still required because of limited ECO capacity.
• All non-critical cells are still oversized.
• Chip still bigger than necessary.

Gain-based synthesis: area is additive

• Timing is fixed.
• As a result, cell sizes change.
• But large cells and small cells cancel out: some get bigger, others smaller.
• All cells have exactly the right drive strength: many paths are almost critical.
• Chip size remains small (10-30% smaller than the conventional way).

Logic (wireload) Synthesis

• For a simple function ((A’ + B) * C):
• Various logic structures are possible with one size.
• A conventional logic synthesis tool attempts to optimize the delay by:
  - Logic restructuring
  - Picking the proper sizes
• This is driven by a vague idea of the wire load.

Many sizing combinations

Heuristic tradeoffs: significantly slower than equation-based constant delay.

Gain-based synthesis: supercells

• Need a single ‘super’ cell representing all sizes in a logic function. Super!
• Contains:
  - g, h, p
  - size range

Gain-based mapping

• In timing-critical parts, the mapper picks super cells that have low parasitic delay and the highest maximum drive strength.
• In non-critical parts, ‘weaker’ super cells can be used.
  - Pick cells that potentially have the smallest size.
• Insert buffers on high-fanout nets.

Putting it together

• Map onto generic ‘super cells’ with flexible area.
• Optimize gains for all super cells such that maximum speed is achieved. This fixes all delays in the circuit!
• Give up if the (optimally conditioned) circuit does not meet the given timing criteria.
• Perform ‘sizing-driven placement’: keep delay constant by adapting cell size to the parasitic capacitance of the wires. Parasitic wire delay is based on coarse routing of the wires.
• Fix remaining timing problems through buffering, cloning and restructuring.
• Update the floor plan if the timing is still not met.
• For each supercell, pick the one standard cell that matches the required drive strength.
• Legalize the placement (a.k.a. detailed placement).
• Perform final routing under delay constraints.

That’s very nice in theory, but…

• The library only has a few drive strengths: is there a discretization error?
• How to account for differences in fall and rise time?
• Do I need a special library?
• What if a very large drive strength is needed?
• When are buffers inserted?
• Isn’t the model too simplistic?
• What about the parasitic wire resistance?

Library Analysis

/cmos18/NAND2 (A -> Z), inverting

Model         Hide  Typ load  Gain   Input cap  Area  Rise delay  Fall delay  Slew  Max slew
NAND2d1             25        2.51   10         1     161         102         66    2000
NAND2d2             54        2.71   20         1     153         100         67    2000
NAND2d3             110       2.69   41         2     153         100         67    2000
NAND2d4             186       2.66   70         5     153         99          67    2000
NAND2d5       D     370       18.52  20         9     254         293         57    2000
NAND2_SUPER         370       2.74                    148         108         67    2000

• Gain is averaged.
• Toss out ‘weird cells’.
• Typical load is the load the gate drives when optimized for maximum speed: g*h = 3.59.

[Figure: delay vs. C_load/C_in for the sizes d1 to d5.]
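In the table, the gain column equals typical load divided by input capacitance (for example 25/10 ≈ 2.51 for NAND2d1). A Python sketch of deriving a super-cell gain from it, assuming a simple outlier rule (drop cells whose gain is more than a factor 2 away from the median, which removes the hidden NAND2d5) followed by a plain average. The slide does not say how its average is weighted, so this sketch gives about 2.64 rather than the listed 2.74.

```python
from statistics import mean, median

# (typical load, input cap) per drive strength, from the /cmos18/NAND2 table
NAND2 = {
    "NAND2d1": (25, 10),
    "NAND2d2": (54, 20),
    "NAND2d3": (110, 41),
    "NAND2d4": (186, 70),
    "NAND2d5": (370, 20),
}


def super_cell_gain(cells, outlier_factor=2.0):
    """Average gain (typ load / input cap) over the 'normal' cells of a family.

    Returns (average gain, sorted names of the cells that were kept).
    """
    gains = {name: load / cap for name, (load, cap) in cells.items()}
    m = median(gains.values())
    kept = {n: g for n, g in gains.items()
            if m / outlier_factor <= g <= m * outlier_factor}  # toss out 'weird cells'
    return mean(kept.values()), sorted(kept)


if __name__ == "__main__":
    gain, kept = super_cell_gain(NAND2)
    print(f"super-cell gain {gain:.2f} from {kept}")
```

Using the median as the reference keeps a single extreme cell from distorting the outlier threshold.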

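Fixing cell sizes & keeping timing

[Figure: delay vs. C_load/C_in for a standard cell in sizes 1x, 2x and 4x, and for the corresponding supercell; loads within the permissible range can be served by one of the sizes, while larger loads cause a load violation.]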

The discretization error...

[Figure: ideal sizes (1.2x, 1.3x, 2.2x, 2.9x) mapped onto the available 1x, 2x and 4x cells; resulting stage gains of 0.3, 0.7 and 0.9.]

... is generally not a big problem

• The delay-versus-size curve is flat, because the size is optimized for maximum speed.
• The rounding error is absorbed by appropriate up- and downsizing of surrounding cells.
• On critical paths, buffer insertion and logic restructuring minimize the effect.

[Figure: path delay vs. size; the optimum delay is at 3.2x, but that size is not available (only 1x, 2x, 4x).]
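A toy Python illustration of why rounding rarely hurts. Assume (hypothetically) a cell of size x whose input capacitance is x, driven by a fixed unit-size gate and driving a fixed load of 10.24, with g = 1 for both stages, so that the effort delay of the path is x + 10.24/x. The continuous optimum then lies at x = 3.2, as in the figure; the sketch picks the best available size from 1x, 2x and 4x by evaluating the path delay for each.

```python
import math


def path_delay(x, c_load=10.24, c_in_driver=1.0, g_driver=1.0, g_cell=1.0):
    """Effort delay of a two-stage path around a cell of size x (input cap = x).

    Driver stage: g_driver * x / c_in_driver.  Cell stage: g_cell * c_load / x.
    Parasitic delays do not depend on x and are left out.
    """
    return g_driver * x / c_in_driver + g_cell * c_load / x


def best_available_size(sizes, **kw):
    """Pick the library size with the smallest path delay."""
    return min(sizes, key=lambda s: path_delay(s, **kw))


if __name__ == "__main__":
    x_opt = math.sqrt(10.24)  # x = sqrt(g_cell * c_load * c_in_driver / g_driver)
    pick = best_available_size([1, 2, 4])
    penalty = path_delay(pick) / path_delay(x_opt) - 1
    print(f"optimum {x_opt:.1f}x, picked {pick}x, delay penalty {penalty:.1%}")
```

It picks 4x with a delay penalty of about 2.5%, because the delay curve is flat around its minimum.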

Load violations

• The maximum drive strength in the library might be too small.
• Drive information is stored in the super cell, and managed pre-placement.
• Buffering, cloning and restructuring are used to maintain the delay during placement.

[Figure: delay vs. C_load/C_in for sizes 1x, 2x, 4x; beyond the permissible range of the largest cell, a load violation occurs.]

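Buffered wire: smallest delay

• Delay per stage (Elmore):

$d = \frac{R}{w}\left(C_w L + w C\right) + p\,\tau + \frac{R_w C_w L^2}{2} + R_w L\, w C$

where a size-w buffer has output resistance R/w and input capacitance wC, R_w and C_w are the wire resistance and capacitance per unit length, L is the distance between buffers, τ = RC, and pτ is the parasitic delay of the buffer.

• Optimum buffer distance:

$L_{opt} = \sqrt{\frac{2\tau(1+p)}{R_w C_w}}$

• Optimum buffer size:

$w_{opt} = \sqrt{\frac{\tau\, C_w}{R_w C^2}}$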
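A Python sketch that evaluates the Elmore stage delay, including the buffer's parasitic delay pτ, and checks the closed-form optimum numerically by perturbing it. Assumed notation: R and C are the output resistance and input capacitance of a unit buffer (so a size-w buffer has R/w and wC), R_w and C_w are wire resistance and capacitance per unit length, τ = RC, and p is the buffer's parasitic delay in units of τ. The parameter values are arbitrary, not process data.

```python
import math


def stage_delay(L, w, R, C, Rw, Cw, p):
    """Elmore delay of one buffered segment: a size-w buffer driving a wire of
    length L plus the input of the next (identical) buffer."""
    tau = R * C
    return ((R / w) * (Cw * L + w * C)   # buffer resistance charging wire + next input
            + p * tau                     # buffer parasitic delay
            + Rw * Cw * L ** 2 / 2        # distributed wire RC
            + Rw * L * w * C)             # wire resistance charging next input


def optimum(R, C, Rw, Cw, p):
    """Closed-form L_opt and w_opt that minimize delay per unit length d / L."""
    tau = R * C
    L_opt = math.sqrt(2 * tau * (1 + p) / (Rw * Cw))
    w_opt = math.sqrt(tau * Cw / (Rw * C ** 2))
    return L_opt, w_opt


if __name__ == "__main__":
    params = dict(R=1.0, C=1.0, Rw=0.01, Cw=0.02, p=1.0)
    L_opt, w_opt = optimum(**params)
    best = stage_delay(L_opt, w_opt, **params) / L_opt
    print(f"L_opt={L_opt:.1f}, w_opt={w_opt:.2f}, delay per unit length={best:.4f}")
    # Perturbing either parameter should only make delay per unit length worse.
    for fL in (0.8, 1.0, 1.25):
        for fw in (0.8, 1.0, 1.25):
            L, w = fL * L_opt, fw * w_opt
            assert stage_delay(L, w, **params) / L >= best - 1e-12
```

Dividing d by L separates into terms in w only and terms in L only, which is why the two optima can be found independently.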

Buffering in a typical 0.25 µm process

• The optimum buffer distance tends to be around 2000 µm.
• This works out to an area of 4 mm², or about 10-20K cells.
• But w_opt is much larger than what most libraries have available:

[Figure: delay per micron versus buffer size W (25x to 100x). The optimum is at 80x, outside the range of drive strengths available in the library.]
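The 4 mm² figure is the area of a square whose side is one optimum buffer distance. Dividing it by the quoted cell count gives the implied average area per cell, including its share of routing; this per-cell figure is derived here, not stated on the slide:

$$(2000\,\mu\text{m})^2 = (2\,\text{mm})^2 = 4\,\text{mm}^2 = 4\times10^{6}\,\mu\text{m}^2, \qquad \frac{4\times10^{6}\,\mu\text{m}^2}{(10\text{ to }20)\times10^{3}\ \text{cells}} \approx 200\text{ to }400\,\mu\text{m}^2\ \text{per cell}$$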


Library constrains performance

• Limited drive strength in standard cell libraries results in significantly longer delays at the chip level.
• This is true for ANY methodology, and is not exclusive to gain-based synthesis.
• Reasons for limited drive strength:
  - Concerns about signal electromigration.
  - The router doesn’t handle wide wires.
  - Huge cells (20x a ‘normal’ cell) frustrate the placer.
  - Folklore.


Parallel cells

• A simple way to test whether a better library would improve results: connect existing cells in parallel to emulate a higher drive strength.
• Issues:
  - testability
  - signal EM
  - congestion: detailed placer
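Electrically, n identical cells with tied inputs and tied outputs behave like one cell of n times the size. A minimal sketch of that equivalence, using a simple switch-resistance model with hypothetical normalized values:

```python
# Hypothetical, normalized unit-cell parameters.
R_UNIT = 1.0  # output resistance of a 1x cell
C_UNIT = 1.0  # input capacitance of a 1x cell


def cell(size):
    """(output resistance, input capacitance) of one cell of the given size."""
    return R_UNIT / size, C_UNIT * size


def parallel(size, n):
    """n identical cells with inputs tied and outputs tied: the output
    resistances combine in parallel and the input capacitances add."""
    r, c = cell(size)
    return r / n, c * n


# Four 4x cells in parallel emulate a 16x cell the library does not have.
r16, c16 = cell(16)
r_par, c_par = parallel(4, 4)
print(f"16x cell: R={r16:.4f}, Cin={c16:.1f}; 4 x 4x: R={r_par:.4f}, Cin={c_par:.1f}")
```

The equivalence ignores the wiring that ties the copies together, which is where the congestion and detailed-placement issues come from.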


Electromigration: wires wear out

Electrons move atoms.

[Figure: wire at a tungsten contact, showing the ‘reservoir’, the ‘end-of-line’ overhang, and ‘cavities’ in the wire.]


Dealing with Electromigration

• A statistical effect, resulting in a gradual increase of the wire resistance, followed by failure.
• The time at which 50% of the wires have failed is given by:

$$t_f = A \cdot \frac{1}{J^2} \cdot e^{E_a / kT}$$

  where A is a constant, E_a is the activation energy, k is Boltzmann's constant and T is the absolute temperature.
• It depends on the current density J:
  - Wider wires would help.
• The exponential dependency on temperature makes it hard to predict.
• Wires self-heat due to their resistance.
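To see why wire width and temperature matter so much, the sketch below evaluates ratios of the failure time from the equation above. The 0.7 eV activation energy and the temperatures are assumptions chosen for illustration; the constant A cancels in every ratio.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant [eV/K]


def t50(j, temp_k, ea_ev, a=1.0):
    """Median time to failure: t = A * J**-2 * exp(Ea / kT).
    A is process dependent; only ratios are meaningful in this sketch."""
    return a * j ** -2 * math.exp(ea_ev / (K_B * temp_k))


EA = 0.7  # hypothetical activation energy [eV]; real values depend on the metal

# Doubling the width at constant current halves J: lifetime grows 4x.
width_gain = t50(0.5, 378.0, EA) / t50(1.0, 378.0, EA)

# Self-heating from 378 K (105 C) to 398 K (125 C) at the same J.
heat_loss = t50(1.0, 398.0, EA) / t50(1.0, 378.0, EA)

print(f"2x wider wire: lifetime x{width_gain:.1f}")
print(f"+20 K: lifetime x{heat_loss:.2f}")
```

Under these assumptions a 20 K temperature rise costs roughly a factor of three in lifetime, which illustrates why self-heating makes the failure time hard to predict.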


What makes a good DSM library?

• Many drive strengths per function
  - No functions with few drive strengths
  - No holes or missing drive strengths
  - Also have drive strengths for flip-flops and latches
• High drive strengths
• Linear scaling of load and area
  - Avoid multi-stage cells
• Avoid multi-output cells
• Avoid single-stage gates with more than 4 inputs
• Not many different functions are needed.
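The first few criteria can be checked mechanically. A minimal sketch, assuming a library is summarized as a map from function name to available drive strengths; the cell names, sizes and thresholds (at least four sizes, no step larger than 2x) are hypothetical choices, not rules from the slide:

```python
def library_report(sizes_by_function, max_step=2.0, min_count=4):
    """Flag functions with too few drive strengths, or with 'holes':
    consecutive sizes whose ratio exceeds max_step."""
    issues = {}
    for func, sizes in sizes_by_function.items():
        s = sorted(sizes)
        problems = []
        if len(s) < min_count:
            problems.append(f"only {len(s)} drive strength(s)")
        for lo, hi in zip(s, s[1:]):
            if hi / lo > max_step:
                problems.append(f"hole between {lo}x and {hi}x")
        if problems:
            issues[func] = problems
    return issues


# Hypothetical library summary.
lib = {
    "INV": [1, 2, 3, 4, 6, 8, 12, 16],
    "NAND2": [1, 2, 4, 8],
    "DFF": [1],
    "AOI22": [1, 2, 8, 16],
}
report = library_report(lib)
for func, problems in report.items():
    print(func, problems)
```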


Buffering & wire sizing

• To tame the quadratic nature of wire delay
• To avoid load violations
• A static timer is run concurrently during (incremental) placement.
• Wire delay is estimated based on the most accurate information available at the time:
  - Elmore I (based on Steiner tree)
  - Elmore II (based on global routing)
  - 2nd-order AWE (post routing)
• Buffers are inserted where needed.
  - After buffer insertion the gains need to be re-distributed.
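A minimal sketch of an Elmore delay computation on an RC tree such as a Steiner tree, assuming one lumped resistance per edge and one lumped capacitance per node (coarser than modeling each wire segment as a π section). The tree, node names and values are hypothetical; resistances are in ohms and capacitances in fF, so delays come out in fs:

```python
from collections import defaultdict


def elmore_delays(parent, res, cap, root):
    """Elmore delay from `root` to every node of an RC tree.
    parent[n]: upstream node of n; res[n]: resistance of edge parent[n] -> n;
    cap[n]: capacitance lumped at node n."""
    children = defaultdict(list)
    for n, p in parent.items():
        children[p].append(n)

    # Downstream (subtree) capacitance of every node, computed post-order.
    down = {}

    def subtree_cap(n):
        total = cap.get(n, 0.0) + sum(subtree_cap(c) for c in children[n])
        down[n] = total
        return total

    subtree_cap(root)

    # delay(child) = delay(parent) + R(edge) * downstream capacitance.
    delay = {root: 0.0}
    stack = [root]
    while stack:
        n = stack.pop()
        for c in children[n]:
            delay[c] = delay[n] + res[c] * down[c]
            stack.append(c)
    return delay


# Driver resistance (drv -> out), then a Steiner point s feeding sinks a and b.
parent = {"out": "drv", "s": "out", "a": "s", "b": "s"}
res = {"out": 500, "s": 100, "a": 200, "b": 50}
cap = {"s": 10, "a": 20, "b": 5}
delays = elmore_delays(parent, res, cap, "drv")
print(delays)
```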


Wire delay optimization

• Delay after optimization by:
  - buffering,
  - cell sizing,
  - wire sizing.
• 0.18 micron technology
• A 64x increase in wire length results in less than a 3x increase in delay.

[Figure: delay (ps) versus wire length (µm) after optimization, on logarithmic axes.]

Data courtesy of Prof. Jason Cong, UCLA


Logic cloning and restructuring

• Used to keep timing fixed by adapting the reality to the model.
• Restructuring and rewiring of the critical path improve timing.


Gain-based synthesis flow

• A timing analysis tool runs concurrently during all steps.
• A strong infrastructure is necessary.
• The backend (routing) must make this come true.

[Figure: flow diagram from RTL to GDSII. Steps: library analysis, build supercells, logic mapping, gain assignment, sizing-driven placement, buffering, cloning and restructuring, clock insertion, scan insertion, detailed placement, track routing and detailed routing, with two ‘OK?’ checks. Phase annotations: ‘Delays fixed, sizes floating’ and ‘Delays fixed, sizes fixed’.]
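The "delays fixed, sizes floating" phase can be sketched for a simple chain of cells: once each cell's gain is assigned, its delay is fixed, and placement only determines the wire loads, from which the sizes follow. `size_chain` and its normalized inputs are hypothetical; real tools handle general netlists with fanout rather than a single chain.

```python
def size_chain(gains, wire_caps, c_out, c_unit=1.0):
    """Sizes for a chain of cells whose gains (and hence delays) are fixed.

    gains[i]: assigned gain C_load / C_in of cell i,
    wire_caps[i]: wire capacitance of the net driven by cell i,
    c_out: pin capacitance at the end of the chain.
    Works backwards, because each cell's load includes the next cell's input.
    """
    sizes = [0.0] * len(gains)
    c_next = c_out
    for i in reversed(range(len(gains))):
        c_load = wire_caps[i] + c_next  # everything cell i must drive
        c_in = c_load / gains[i]        # fixed gain sets the input capacitance
        sizes[i] = c_in / c_unit        # size follows from input capacitance
        c_next = c_in
    return sizes


gains = [4.0, 4.0, 4.0]
before = size_chain(gains, wire_caps=[2.0, 2.0, 8.0], c_out=24.0)
# Placement stretches the net driven by cell 1: only cells 0 and 1 are resized.
after = size_chain(gains, wire_caps=[2.0, 6.0, 8.0], c_out=24.0)
print(before, after)
```

Every stage keeps its gain, and so its delay; only the sizes move. The growth of the first cell's input capacitance has to be absorbed by whatever drives the chain.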


Objectives

• Implement a wire pattern that is:
  - LVS-correct: no shorts or unconnects
  - DRC-correct, including electromigration and antenna rules
  - Timing-correct: adapt the model to reality
  - Deals with special requirements for power and clock routing


Correct by Construction or Construct by Correction?

• Traditional tools are primarily focused on completion:
  - Correct by construction for LVS and DRC, but not for timing!
  - Timing violations are addressed by rip-up-and-reroute, i.e. ‘construct by correction’.
• Modern EDA flows should target ‘correct by construction’ for timing:
  - careful planning of the timing budget, and
  - variable-spacing and variable-width detailed routing.


Global routing

[Figure: the chip divided into global-routing buckets.]

• Finds a coarse path and layer assignment for each net, such that wire density is spread evenly.
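A minimal sketch of spreading wire density, assuming a 2D grid of buckets (no layer assignment), two-pin nets, and a hypothetical cost function that rises as a bucket boundary fills up. Nets are routed one at a time with Dijkstra's algorithm; this sequential scheme is chosen for brevity and is not presented as how any particular global router works.

```python
import heapq

OVERFLOW_PENALTY = 10.0  # hypothetical extra cost for a full bucket boundary


def route_nets(width, height, nets, capacity):
    """Sequentially route two-pin nets on a width x height grid of buckets.
    Returns (paths, usage), where usage counts wires per bucket boundary."""
    usage = {}

    def edge(a, b):
        return (a, b) if a < b else (b, a)

    def cost(a, b):
        u = usage.get(edge(a, b), 0)
        # Base length cost, a soft term that grows with occupancy, and a
        # large penalty once the boundary is at capacity.
        return 1.0 + u / capacity + (OVERFLOW_PENALTY if u >= capacity else 0.0)

    def shortest_path(src, dst):
        dist, prev = {src: 0.0}, {}
        heap = [(0.0, src)]
        while heap:
            d, node = heapq.heappop(heap)
            if node == dst:
                break
            if d > dist[node]:
                continue  # stale heap entry
            x, y = node
            for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= nxt[0] < width and 0 <= nxt[1] < height:
                    nd = d + cost(node, nxt)
                    if nd < dist.get(nxt, float("inf")):
                        dist[nxt], prev[nxt] = nd, node
                        heapq.heappush(heap, (nd, nxt))
        path = [dst]
        while path[-1] != src:
            path.append(prev[path[-1]])
        return path[::-1]

    paths = []
    for src, dst in nets:
        path = shortest_path(src, dst)
        for a, b in zip(path, path[1:]):
            usage[edge(a, b)] = usage.get(edge(a, b), 0) + 1
        paths.append(path)
    return paths, usage


# Two identical nets, capacity 1: the second one detours instead of overflowing.
paths, usage = route_nets(3, 2, [((0, 0), (2, 0)), ((0, 0), (2, 0))], capacity=1)
print(paths)
```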


Interconnect speed

[Figure: cross-section of three parallel wires above a ground plane (width w, height h, spacing d_lat, oxide thickness d_ox), and a top view showing the length l.]

Consider the middle wire:

$$C_{wire} = C_0 \left( \frac{l\,w}{d_{ox}} + \frac{2\,l\,h}{d_{lat}} \right) = C_{wire,gnd} + C_{wire,lat}$$

The first term is the capacitance to the ground plane, the second the lateral capacitance to both neighbors.

$$R_{wire} = R_0 \frac{l}{w\,h}$$

$$\tau_{wire} = R_{wire}\, C_{wire}, \text{ which is quadratic in the length } l$$
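Plugging numbers into these expressions shows the quadratic behavior: R_wire and C_wire are both linear in l, so doubling the length quadruples τ_wire. The normalized constants and dimensions below are hypothetical:

```python
# Hypothetical normalized constants; only ratios matter in this sketch.
C0 = 1.0  # capacitance scale factor
R0 = 1.0  # resistivity scale factor


def wire_rc(l, w, h, d_ox, d_lat):
    """Return (R_wire, C_wire, tau_wire) for the middle wire of the model."""
    c_gnd = C0 * l * w / d_ox        # parallel plate to the ground plane
    c_lat = C0 * 2 * l * h / d_lat   # two lateral neighbors
    r = R0 * l / (w * h)
    return r, c_gnd + c_lat, r * (c_gnd + c_lat)


taus = [wire_rc(length, w=1.0, h=2.0, d_ox=1.0, d_lat=1.0)[2]
        for length in (100.0, 200.0, 400.0)]
print(taus)  # each doubling of l multiplies tau by 4
```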


Applying Moore’s law

• Double the density by a lateral shrink:
  - l, w and d_lat shrink by a factor of √2.

[Figure: the same cross-section after the shrink.]

$$C_{wire} = C_0 \left( \frac{l\,w}{d_{ox}} + \frac{2\,l\,h}{d_{lat}} \right)$$

The ground term halves; the lateral term stays constant.

$$R_{wire} = R_0 \frac{l}{w\,h} = \text{constant}$$
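The scaling claims can be verified by evaluating the three terms before and after the shrink. As on the slide, l, w and d_lat shrink by √2, while h and d_ox are assumed unchanged; the dimensions are hypothetical normalized values:

```python
import math


def wire_terms(l, w, h, d_ox, d_lat, c0=1.0, r0=1.0):
    """Ground capacitance, lateral capacitance and resistance of the middle wire."""
    return c0 * l * w / d_ox, c0 * 2 * l * h / d_lat, r0 * l / (w * h)


S = math.sqrt(2)  # lateral shrink factor for 2x density
before = wire_terms(l=100.0, w=1.0, h=2.0, d_ox=1.0, d_lat=1.0)
after = wire_terms(l=100.0 / S, w=1.0 / S, h=2.0, d_ox=1.0, d_lat=1.0 / S)
ratios = [a / b for a, b in zip(after, before)]
print(ratios)  # approximately [0.5, 1.0, 1.0]: C_gnd, C_lat, R_wire
```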


Shrinking wires

[Figure: wire cross-sections, past, present and future.]

• Wire resistance and metal migration force lower-resistance, and therefore ‘taller’, wire geometries.
• Capacitance couples to the neighbors.
• Total capacitance does not get smaller!


Speedup due to shrink

[Figure: gate and wire RC after the shrink, labeled Cgate, Rgate, Rwire and Cwire, with the annotations ‘unchanged’, ‘unchanged’, ‘half’ and ‘hardly smaller’.]

With lateral capacitance the speedup is down to 1, instead of a factor of 2 without it.


Lateral capacitance is worse!

When a neighboring wire switches in the opposite direction, the lateral capacitance is effectively 2 × C_lat. This is the Miller effect.
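A common first-order way to model this is a Miller factor on each coupling capacitance: 0 when the neighbor switches in the same direction at the same time, 1 when it is quiet, and 2 when it switches in the opposite direction. A minimal sketch, assuming simultaneous transitions with equal slew; the function names are hypothetical:

```python
def miller_factor(victim_dir, aggressor_dir):
    """Multiplier on a coupling capacitance seen by a switching victim.
    Directions: +1 rising, -1 falling, 0 quiet. Assumes both nets switch
    at the same time with equal slew (first-order approximation)."""
    if victim_dir == 0:
        raise ValueError("a quiet victim sees noise, not extra delay")
    return 1 - aggressor_dir / victim_dir


def effective_cap(c_gnd, c_lat, victim_dir, neighbor_dirs):
    """Effective load of the middle wire: ground capacitance plus each
    lateral capacitance scaled by its Miller factor."""
    return c_gnd + sum(c_lat * miller_factor(victim_dir, a) for a in neighbor_dirs)


print(effective_cap(1.0, 1.0, +1, [0, 0]))    # neighbors quiet
print(effective_cap(1.0, 1.0, +1, [-1, -1]))  # both neighbors switch opposite
```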


Crosstalk: noise on wires

Crosstalk causes noise, which depends on:
• The size of the crosstalk capacitor
• The slope of the aggressor
• The threshold voltage
• The ratio between the victim and aggressor output resistances

[Figure: victim and aggressor nets with Rgate, Cgate and the lateral coupling capacitance Cwire,lat.]
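A simple lumped model makes several of these dependencies explicit. Assume the quiet victim is held by a driver of resistance r_victim with ground capacitance c_victim, and the aggressor ramps from 0 to Vdd in t_rise (a stronger aggressor driver means a shorter t_rise). Solving the single-node RC circuit gives the peak noise at the end of the ramp, which approaches the charge-sharing bound Vdd·C_c/(C_c + C_victim) for an instantaneous aggressor. This is a textbook-style approximation, not the method of any specific tool, and the values are hypothetical:

```python
import math


def peak_noise(vdd, c_couple, c_victim, r_victim, t_rise):
    """Peak noise on a quiet victim for a linear aggressor ramp 0 -> vdd.
    v_peak = r_victim*c_couple*vdd/t_rise * (1 - exp(-t_rise/tau)),
    with tau = r_victim * (c_couple + c_victim)."""
    tau = r_victim * (c_couple + c_victim)
    return (r_victim * c_couple * vdd / t_rise) * -math.expm1(-t_rise / tau)


def charge_sharing_bound(vdd, c_couple, c_victim):
    """Worst case: instantaneous aggressor (or floating victim)."""
    return vdd * c_couple / (c_couple + c_victim)


VDD, CC, CV = 1.2, 10e-15, 30e-15  # hypothetical: volts, farads
v = peak_noise(VDD, CC, CV, r_victim=1e3, t_rise=20e-12)
v_threshold = 0.5 * VDD  # hypothetical receiver switching threshold
print(f"peak noise {v:.3f} V, glitch: {v > v_threshold}")
```

A larger coupling capacitor, a faster aggressor and a weaker victim driver all raise the peak; whether it matters depends on how it compares with the receiver's threshold.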


Track Routing: maintaining timing

• Refines the global routing by fixing track positions.
• Timing is a given constraint: satisfy crosstalk by spacing apart ‘unfriendly’ wires. ‘Friendliness’ data is given by the timer.
• Use shielding for clocks, and spacing or shielding for signal wires.

[Figure: spacing between unfriendly nets is enlarged to meet the load budget.]

“Common Database” Architecture

[Figure: TOOL 1 to TOOL n, each with its own data model and algorithm (timing, extraction, placement, routing), connected through a database with translators on hard disk.]

• Each tool has its own data representation. Design data is shared by:
  - reading/writing (huge) files.
  - a data management layer that controls access to files and converts formats.
• Great for “integrating” many separate tools.
• Makes real-time sharing of data slow and inefficient.


Infrastructure is key

• Tools share a common data structure. They run directly on it.
• Let all design data live “in core” during the flow, attached to the data structure.
• Use only one format: the data structure.

[Figure: an in-core data model shared by the placement, routing, timing, verification and other algorithms, with TCL and GUI access, Volcano on disk, and external formats.]
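A minimal sketch of the in-core idea, assuming hypothetical `Cell` and `Design` classes: a stand-in placement step and a stand-in estimation step operate on the same Python objects, so the estimate sees every move immediately, with no files or translators in between. It illustrates the architecture only; it is not the Volcano data model.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    name: str
    x: float = 0.0
    y: float = 0.0


@dataclass
class Design:
    """One in-memory design database shared by every engine (hypothetical)."""
    cells: dict = field(default_factory=dict)
    nets: list = field(default_factory=list)  # (driver, sink) name pairs


def place(design, name, x, y):
    """Stand-in 'placement engine': edits the shared objects in place."""
    cell = design.cells[name]
    cell.x, cell.y = x, y


def total_wirelength(design):
    """Stand-in 'estimation engine': reads the same objects directly,
    with no file writing, reading or format translation in between."""
    return sum(
        abs(design.cells[a].x - design.cells[b].x)
        + abs(design.cells[a].y - design.cells[b].y)
        for a, b in design.nets
    )


design = Design(
    cells={"u1": Cell("u1"), "u2": Cell("u2", 1.0, 1.0)},
    nets=[("u1", "u2")],
)
```

After `place(design, "u2", 3.0, 4.0)`, a call to `total_wirelength(design)` reflects the move at once, because both functions work on the same objects.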


Track Re-ordering

• Crosstalk-aware wire ordering during routing
• Based on timing windows

[Figure: track order before (NET B, NET A, NET C) and after re-ordering (NET A, NET C, NET B); each net is annotated with its earliest (ET) and latest (LT) switching time.]
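One way to express the re-ordering is to treat two adjacent nets as unfriendly when their switching windows [ET, LT] overlap, and to search for the track order with the fewest unfriendly neighbors. A minimal sketch with hypothetical windows chosen to reproduce the B, A, C to A, C, B example; exhaustive search is only practical for a handful of nets and is not claimed to be the router's algorithm:

```python
from itertools import permutations


def overlaps(w1, w2):
    """True if two switching windows (ET, LT) overlap, so the nets may
    switch at the same time."""
    return w1[0] <= w2[1] and w2[0] <= w1[1]


def conflicts(order, windows):
    """Number of adjacent track pairs whose windows overlap."""
    return sum(overlaps(windows[a], windows[b]) for a, b in zip(order, order[1:]))


def best_order(windows):
    """Exhaustive search over track orders (fine for a few nets per panel)."""
    return min(permutations(sorted(windows)), key=lambda o: conflicts(o, windows))


windows = {"A": (0.0, 2.0), "B": (1.0, 3.0), "C": (5.0, 7.0)}  # hypothetical, ns
print(conflicts(("B", "A", "C"), windows), best_order(windows))
```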


How to get timing closure?

• Good placements and floor plans
  - Floorplanning is a hard and unsolved problem.
• Let the computer do the work for you
  - If you have no clue about the floor plan: flatten it!
• The EDA tool needs to:
  - Have massive capacity
  - Have a transparent data model
• Relaxing some parameters could help dramatically.


[Figure: chip layout of a 3-D Labs design, 0.18 µm, 266 MHz, with blocks Kdomain (3.2M gates), Odomain (2.5M gates), T1 (812K gates) and T2 (2.1M gates).]


Summary

• The gain-based synthesis model proves excellent for the logic-to-layout conversion.
• Timing is more important than actual gate size: therefore the delay is fixed before the size.
• The simplicity of the model allows scaling to larger chips (millions of placeable objects).