Part III: Gain-based synthesis, enabler for ‘correct-by-construction’ design (ASP-DAC’01 Tutorial, Patrick Groeneveld)


slide-1
SLIDE 1

ASP-DAC'01 - Patrick Groeneveld 1

Part III: Gain-based synthesis
Enabler for ‘correct-by-construction’ design

ASP-DAC’01 Tutorial
Patrick Groeneveld (patrick@magma-da.com)
Magma Design Automation, Cupertino, CA

slide-2
SLIDE 2

Summary

  • Iterative flow vs. stepwise refinement
  • Derivation of a simple delay model
  • Gain-based delay optimization
  • Building a tool flow around this model
  • Standard cell library issues
  • Getting timing during routing
  • Recommendations

slide-3
SLIDE 3

Preliminaries

  • Timing closure: obtaining a feasible layout of a circuit that meets the given timing specification.
  • Objective: obtain closure as quickly and effortlessly as possible.
  • Assumptions:
      • ASIC design style.
      • Standard cell abstraction.
      • Static CMOS.
  • Neglect other design issues.

slide-4
SLIDE 4

Interconnect parasitics (C and R)

  • Speed is entirely determined by parasitics.
  • Parasitics are tiny.
  • Parasitics depend on the exact layout.
  • Therefore they are hard or impossible to estimate, especially before placement.

slide-5
SLIDE 5

Timing Uncertainty

Gate-to-gate delay depends on:

  • Wire length (unknown during synthesis)
  • The layer of the wire (determined during routing)
  • The configuration of the neighboring wires: distance, near/far (unknown before detailed routing)
  • Timing window and slope of the neighboring wires.

slide-6
SLIDE 6

Meeting timing gets harder

[Figure: flip-flop stages (d inputs, q outputs) with a 5 ns max delay between them.]

slide-7
SLIDE 7

Timing is a result of the placement

  • The bad news: the worst timing sets the clock speed!

[Figure: slack; labels Cdream and Creal.]

slide-8
SLIDE 8

Prediction vs. reality

[Figure: distribution of the number of nets over (real delay - predicted delay), from -100% to +100%. The average is the wireload model, i.e. what you designed for. The extremes are marked fastest/best and slowest/worst; at the slow end the circuit does not work.]

slide-9
SLIDE 9

The end of the wire load model

  • The model is used in conventional synthesis tools.
  • It guesses the load based on the number of pins of the net.
  • The average is correct, but...

slide-10
SLIDE 10

You must iterate!

Today’s Conventional Flow

[Figure: flow from RTL to GDSII with the boxes Logic Synthesis, Placement, Extraction, Routing and Timing Analysis. "Met timing? NO" loops back to Logic Synthesis for multiple iterations.]

  • Synthesis does not accurately model interconnect.
  • Cell sizes are fixed before placement.
  • Place & route is unable to meet the timing goal.

slide-11
SLIDE 11

The trial and error iteration

[Figure: loop between logic synthesis and place & route.]

slide-12
SLIDE 12

Methodology Problems

  • To avoid endless iterations, the design must be ‘on the safe side’.
  • Iterations are very slow and may not converge.
  • You’re never sure if you’ll make it.
  • Only a painful trial and error process reports design feasibility.

slide-13
SLIDE 13

Ways to attack timing closure

  • Iterate through SPEF or internally
  • Post-placement optimization (ECO)
  • Partition the design into smaller pieces
      • Variation in wire length will decrease
      • Better timing closure on each block if #gates < 50,000
  • Gain-based synthesis

slide-14
SLIDE 14

Hierarchy: a solution?

  • Make problems smaller
  • Structure makes the problem more manageable
  • Solve sub-problems independently
  • Enables efficient re-use
  • Enables consistent verification

slide-15
SLIDE 15

Physical hierarchy and timing closure

  • Wires need to slalom around blocks or traverse through or over blocks
  • How to set pin locations?
  • Where to put the buffers?
  • The automatic floor planning problem is unsolved
  • Large hidden inefficiency

slide-16
SLIDE 16

Physical hierarchy is a necessary evil

[Figure: three floorplan options, each with macros: 2,000,000 standard cells flat? 4 blocks of 500,000 standard cells? 20 x approx. 100,000 standard cells?]

  • If you can do it, do it as flat as possible!
  • Also do the datapath flat.

slide-17
SLIDE 17

Conventional layout synthesis

[Figure: slack; labels Cdream and Creal.]

slide-18
SLIDE 18

Gain-based synthesis:

[Figure: slack; labels Cdream and Creal.]

slide-19
SLIDE 19

Focus for timing closure

  • Combine logical and physical worlds.
  • Crisp: focus on the main effect, skip irrelevant details.
  • Enable blazingly fast optimization.
  • Compact: memory efficient for tomorrow’s 50M gate chip.

slide-20
SLIDE 20

Good practices, bad practices

Good practices:

  • Use a simple model, and adapt reality to it.
  • At each step, freeze a single constraint, postpone decisions on others.
  • Allow sufficient freedom in future steps to fulfill all remaining constraints.
  • Bail out early if there’s no use continuing.

Bad practices:

  • Fix multiple objectives at once.
  • Iterate.
  • Indulge in ‘accurate’ models.
  • Attempt to be optimal.

slide-21
SLIDE 21

Is ‘Optimal’ optimal? An example

  • Contacts in layout have parasitic resistance and affect reliability.
  • Original layout: 8 contacts.
  • ‘Optimal’ with contact minimization: 0 contacts.
  • But... there are still 8 contacts! (They were just pushed into the neighboring regions.)

slide-22
SLIDE 22

Compromises

  • Flexible-die design is better:
      • Guarantee routing completion
      • Until the last moment we can trade off delay for area.
  • Fixed-die design instead:
      • Need to guess initial utilization.
  • This could result in an iteration.

slide-23
SLIDE 23

Simple delay model of a gate

  • Model the transistor by a resistor and a switch.
  • We can assume that the rise delay and the fall delay are similar.
  • Therefore pull-up Rui and pull-down Rdi become Ri.
  • The transistor impedance depends on the transistor size (W/L).

[Figure: a gate from "in" to "out". The switch-level model with Rui, Rdi, Cin, Cgate and Cload is simplified to a single Rgate with Cin, Cgate and Cload.]

  • Cin: input capacitance of the gate
  • Cgate: the internal parasitic capacitance (mostly diffusion)
  • Cload: the external load that the gate is driving
  • Rgate: effective output impedance

slide-24
SLIDE 24

Gate delay and load

$d_{abs} = R_{gate} C_{load} + R_{gate} C_{gate}$

  • The term $R_{gate} C_{gate}$ is the parasitic delay.
  • Delay dependency on load is often given as a table.

[Figure: gate model (in, Cin, Rgate, Cgate, out, Cload) and a plot of delay against Cload, with the parasitic delay marked.]

slide-25
SLIDE 25

Let's double the gate size

[Figure: two copies of the unit gate (each with Rgate,0, Cin,0 and Cgate,0) connected in parallel between "in" and "out", driving Cload.]

$C_{in} = 2 \ast C_{in,0}, \quad R_{gate} = \tfrac{1}{2} R_{gate,0}, \quad C_{gate} = 2 \ast C_{gate,0}$

$d_{abs} = \frac{R_{gate,0}}{2} C_{load} + \frac{R_{gate,0}}{2} \ast 2 C_{gate,0} \;\Leftrightarrow\; d_{abs} = \frac{R_{gate,0}}{2} C_{load} + R_{gate,0} C_{gate,0}$

slide-26
SLIDE 26

Gate delay and size

  • Assume a gate sizing factor α (= size relative to the smallest gate):

$C_{in} = \alpha \ast C_{in,0}, \quad R_{gate} = \frac{R_{gate,0}}{\alpha}, \quad C_{gate} = \alpha \ast C_{gate,0}$

So keeping Cload constant results in:

$d_{abs} = \frac{R_{gate,0}}{\alpha} C_{load} + R_{gate,0} C_{gate,0}$

[Figure: plot of delay against gate size for constant Cload, with the parasitic delay marked.]
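The sizing relations above can be checked numerically. This is a minimal sketch, assuming a hypothetical unit gate with R_gate,0 = 1 and C_gate,0 = 1 in arbitrary units; the function and variable names are illustrative and not from the tutorial.

```python
def gate_delay(alpha, c_load, r_gate0=1.0, c_gate0=1.0):
    """Absolute delay of a gate scaled by alpha while driving a fixed c_load."""
    r_gate = r_gate0 / alpha   # wider transistors: lower output resistance
    c_gate = alpha * c_gate0   # but proportionally more internal capacitance
    return r_gate * c_load + r_gate * c_gate

C_LOAD = 8.0  # hypothetical fixed load
delays = {alpha: gate_delay(alpha, C_LOAD) for alpha in (1, 2, 4, 8)}
for alpha, d in delays.items():
    print(f"alpha={alpha}: delay={d}")
```

Only the load-dependent term shrinks with α; the parasitic term R_gate,0 * C_gate,0 stays put, so upsizing has diminishing returns.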

slide-27
SLIDE 27

Delay and gain

  • The gain is the ratio of the load capacitance to the input capacitance:

$h = gain = \frac{C_{load}}{C_{in}}$

  • Now we can rewrite the previous equations in terms of gain:

$C_{in} = \alpha \ast C_{in,0} \;\Leftrightarrow\; \alpha = \frac{C_{in}}{C_{in,0}}, \qquad R_{gate} = \frac{R_{gate,0}}{\alpha} = \frac{R_{gate,0} C_{in,0}}{C_{in}}$

$d_{abs} = R_{gate} C_{load} + R_{gate} C_{gate} \;\Rightarrow\; d = R_{gate,0} C_{in,0} \ast \frac{C_{load}}{C_{in}} + R_{gate,0} C_{gate,0} \;\Leftrightarrow\; d = g \ast \frac{C_{load}}{C_{in}} + p = g \ast h + p$

[Figure: gate model with in, Cin, Rgate, Cgate, out and Cload.]

slide-28
SLIDE 28

Making delay independent of load

  • If the gain is constant, delay is constant over a range!

[Figure: two plots against Cload: gate size (= Cin) and delay.]
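A small numerical check of this claim: if the gate is resized so that Cload/Cin stays at a chosen gain h, the RC delay equals g*h + p regardless of the load. The unit-gate values below are hypothetical, and the helper name is illustrative only.

```python
R_GATE0, C_IN0, C_GATE0 = 2.0, 1.0, 1.5  # hypothetical unit-size gate

def delay_at_constant_gain(c_load, h):
    """Size the gate for gain h, then evaluate the RC delay model."""
    c_in = c_load / h            # required input capacitance (gate size)
    alpha = c_in / C_IN0         # sizing factor relative to the unit gate
    r_gate = R_GATE0 / alpha
    c_gate = alpha * C_GATE0
    return r_gate * (c_load + c_gate)

g = R_GATE0 * C_IN0              # load-dependent coefficient
p = R_GATE0 * C_GATE0            # parasitic delay
h = 3.0
delays = [delay_at_constant_gain(c, h) for c in (3.0, 12.0, 48.0)]
print(delays, g * h + p)
```

The load varies by a factor of 16 here, yet every entry matches g*h + p because the size tracks the load.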

slide-29
SLIDE 29

Fixed Timing Methodology

[Figure: plot with axes Delay, Load (Cload) and Size (Cin), showing the fixed timing plane and the timing sign-off point.]

slide-30
SLIDE 30

Fixed Timing in a nutshell

  • Goal:
      • Correct by construction (eliminate iterations)
      • Emphasis on timing, not on size.
  • Map to size-independent supercells
  • Pick optimized delay up-front = pick a gain
      • If no feasible gain can be found: change your RTL
  • Fix this delay throughout placement and routing
  • Keep delay constant primarily by cell sizing.

slide-31
SLIDE 31

“Fast circuit design on a napkin”

Ivan Sutherland (1991):

Delay = (g * h) + p

  • Delay: delay of the gate + its load
  • g, logical effort: depends on the function of the gate
  • h, electrical effort: proportional to the output load, Cload / Cin
  • p: fixed part, parasitic delay

For details, see the book ‘Logical Effort’ by Sutherland, Sproull, Harris, Morgan Kaufmann publishers, ISBN 1-55860-557-6.

slide-32
SLIDE 32

Logical effort: g

  • To keep the same output drive strength, the 2 n-transistors in series must double their size.
  • As a result, the input capacitance of the nand is larger.
  • For the same output drive strength, an inverter needs less input capacitance: the inverter has a higher gain.
  • More complex gates have less gain.

[Figure: Inverter: Cin = 1. Nand 2: Cin = 4/3.]

slide-33
SLIDE 33

Logical effort: g

  • Assuming that in static CMOS gates the mobility of the p-transistor is half of the n-mobility, the logical effort per number of inputs is:

Gate        1     2     3     n
Inverter    1     -     -     -
Nand        -     4/3   5/3   (n+2)/3
Nor         -     5/3   7/3   (2n+1)/3

  • 3-input nor [figure]
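The table entries follow from counting transistor widths on one input. The sketch below assumes the usual sizing argument (series transistors are widened to match the inverter's drive, and a p-transistor is made gamma = 2 times wider than an n-transistor); the function name and the `gamma` parameter are illustrative, not from the slides.

```python
from fractions import Fraction

def logical_effort(gate, n=1, gamma=2):
    """Input capacitance of one input relative to an equal-drive inverter."""
    inverter_cin = 1 + gamma          # nMOS width 1 + pMOS width gamma
    if gate == "inv":
        cin = 1 + gamma
    elif gate == "nand":              # n nMOS in series (width n), pMOS in parallel
        cin = n + gamma
    elif gate == "nor":               # n pMOS in series (width n*gamma), nMOS in parallel
        cin = 1 + n * gamma
    else:
        raise ValueError(f"unknown gate type: {gate}")
    return Fraction(cin, inverter_cin)

for kind, inputs in (("inv", 1), ("nand", 2), ("nand", 3), ("nor", 2), ("nor", 3)):
    print(kind, inputs, logical_effort(kind, inputs))
```

With gamma = 2 this reproduces (n+2)/3 for nand gates and (2n+1)/3 for nor gates.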

slide-34
SLIDE 34

p: The parasitic delay

  • Independent of size and load
  • Dependent on process and logic function
  • Can be ignored during optimization

$p = R_{gate,0} C_{gate,0}$

[Figure: an inverter and a 2-input nand with the same input cap Cin. Then p_2nand = 2 p_inverter.]

Gate            Relative parasitic delay
Inverter        1
n-input nand    n
n-input nor     n

slide-35
SLIDE 35

Putting it together

$d = g \ast h + p$

[Figure: delay d (normalized to inverter, axis 1 to 7) plotted against h (axis 1 to 3), where h is the electrical effort = gain = Cload / Cin. Each line is the sum of the parasitic delay p and the effort delay g*h.]

  • Inverter: g=1, p=1
  • 2-input nand: g=4/3, p=2
  • 2-input nor: g=5/3, p=2
  • 4-input nor: g=9/3, p=4
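As a quick worked example, the (g, p) pairs listed on this slide can be plugged into d = g*h + p. The dictionary and function names below are illustrative; only the numeric g and p values come from the slide.

```python
from fractions import Fraction

# (g, p) pairs as listed on the slide.
GATES = {
    "inverter":    (Fraction(1),    1),
    "2-input nand": (Fraction(4, 3), 2),
    "2-input nor":  (Fraction(5, 3), 2),
    "4-input nor":  (Fraction(9, 3), 4),
}

def delay(gate, h):
    """Normalized delay d = g*h + p for electrical effort h."""
    g, p = GATES[gate]
    return g * h + p

for name in GATES:
    print(name, [str(delay(name, h)) for h in (1, 2, 3)])
```

At h = 3 the inverter costs 4 units and the 2-input nor 7, which shows how both the slope (g) and the offset (p) grow with gate complexity.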

slide-36
SLIDE 36

Optimizing speed

  • Goal: drive the load as fast as possible
      • What is the optimal number of stages n?
      • What is the size ratio of the gates?

[Figure: chain of gates from Cin to Cload.]

slide-37
SLIDE 37

Tune for maximum speed

  • Mead and Conway (1980), ignoring parasitic delay:

$H = \frac{C_{load}}{C_{in}} = \frac{C_{load,n}}{C_{in,1}} = \text{total gain}$

$n \approx \ln(H) = \text{number of stages}$

$h = H^{1/n} = \frac{C_{load,i}}{C_{in,i}} = \frac{C_{in,i+1}}{C_{in,i}} = \frac{size(i+1)}{size(i)} = 2.71 = \text{stage gain}$

[Figure: chain of n stages with input capacitances Cin,1, Cin,2, Cin,3, ..., Cin,n driving Cload.]

  • With the parasitic delay p, the optimum ratio is 3.59
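The two numbers on this slide can be reproduced with a short sketch. The first helper sizes a buffer chain with n close to ln(H). The second assumes the standard result from the ‘Logical Effort’ book that the best stage effort f satisfies p_inv + f(1 - ln f) = 0, and solves it by bisection. All function names are illustrative.

```python
import math

def buffer_chain(c_in, c_load):
    """Stages, stage gain and input caps for a chain, ignoring parasitics."""
    total_gain = c_load / c_in
    n = max(1, round(math.log(total_gain)))   # n ~ ln(H)
    h = total_gain ** (1.0 / n)               # equal gain per stage
    sizes = [c_in * h ** i for i in range(n)]
    return n, h, sizes

def optimum_stage_effort(p_inv=1.0):
    """Solve p_inv + f*(1 - ln f) = 0 for f >= e by bisection."""
    lo, hi = math.e, 10.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if p_inv + mid * (1.0 - math.log(mid)) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, h, sizes = buffer_chain(1.0, 1000.0)
f_opt = optimum_stage_effort()
print(n, round(h, 2), round(f_opt, 2))
```

With p_inv = 0 the same solver converges to e, which is the 2.71 stage gain quoted for the parasitic-free case.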

slide-38
SLIDE 38

Maximum speed...

slide-39
SLIDE 39

Tune a path for maximum speed

  • Maximum speed is obtained if effort delay f = (g*h) is the same for each stage.
  • The optimal effort delay is f = 3.59
  • The more complex the gate, the more capacitance will be propagated backwards.

[Figure: path of three gates a, b and c.]

For each stage:

$\left(\tfrac{5}{4} \ast \frac{C_{load,a}}{C_{in,a}}\right) = 3.59, \quad \left(\tfrac{6}{3} \ast \frac{C_{load,b}}{C_{in,b}}\right) = 3.59, \quad \left(\tfrac{7}{3} \ast \frac{C_{load,c}}{C_{in,c}}\right) = 3.59$

Working backwards from $C_{load,c} = 20$:

$C_{in,c} = \frac{g_c \ast C_{load,c}}{f} = \frac{\tfrac{7}{3} \ast 20}{3.59} = 13$

$C_{in,b} = \frac{\tfrac{6}{3} \ast 13}{3.59} = 7.2$

$C_{in,a} = \frac{\tfrac{5}{4} \ast 7.2}{3.59} = 2.5$
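The backward propagation in this example can be sketched as a loop from the output towards the input. The logical efforts 5/4, 6/3 and 7/3 and the load of 20 are the slide's values; the function name is illustrative.

```python
def size_path(logical_efforts, c_load, f=3.59):
    """Input capacitance of each stage so that every stage has g*h = f."""
    c_in = []
    load = c_load
    for g in reversed(logical_efforts):
        load = g * load / f       # this stage's Cin is the previous stage's load
        c_in.append(load)
    return list(reversed(c_in))

caps = size_path([5 / 4, 6 / 3, 7 / 3], c_load=20.0)
print([round(c, 1) for c in caps])
```

The result is listed from gate a to gate c; carrying full precision instead of the rounded intermediate values changes the last digit only slightly.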

slide-40
SLIDE 40

Choosing the right number of stages (logical depth)

  • During layout: adding inverters for long-wire delay minimization.
  • The optimum depth depends on the path effort and process parameters.
  • Not very critical: being 50% off results in less than 10% delay penalty.
  • Logic depth is determined by synthesis.
  • Pre-layout: adding buffers to high-fanout nets generally improves speed due to the high inverter gain.

slide-41
SLIDE 41

Assigning delays

  • Timing constraints determine the delay budget:
      • e.g. d_abcd < 2.0ns, d_ed < 2.0ns, d_fcd < 2.0ns
  • Spread delay budgets evenly over all paths
      • If paths collide, take the smallest delay budget
      • Relax others
  • Translate delay budgets into gain.

[Figure: gates a to f between flip-flops (ff), annotated with delay budgets of 0.5, 0.66 and 1.0, and with the relaxations 0.66 -> 1.0 and 1.0 -> 1.5.]
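One way to read the budgeting rules is as a two-pass procedure: spread each path's budget evenly and keep the minimum where paths share a gate, then hand any leftover slack back to the remaining gates. The sketch below is an assumption about how the slide's numbers arise, not the tool's actual algorithm; the greedy relaxation order and the function name are illustrative.

```python
def assign_budgets(paths):
    """paths: list of (gate names along the path, delay budget in ns)."""
    budget = {}
    # Pass 1: spread evenly; where paths collide keep the smallest share.
    for gates, limit in paths:
        share = limit / len(gates)
        for gate in gates:
            budget[gate] = min(budget.get(gate, share), share)
    # Pass 2: relax each gate by the slack left on every path through it.
    for gate in budget:
        slack = min(limit - sum(budget[x] for x in gates)
                    for gates, limit in paths if gate in gates)
        budget[gate] += max(0.0, slack)
    return budget

paths = [("abcd", 2.0), ("ed", 2.0), ("fcd", 2.0)]
budgets = assign_budgets(paths)
print({gate: round(value, 2) for gate, value in budgets.items()})
```

Under this reading, gates a to d keep 0.5 ns each, f relaxes from 0.66 to 1.0 and e from 1.0 to 1.5, which matches the annotations in the figure.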

slide-42
SLIDE 42

Pre-layout sign-off

[Figure: flip-flops (ff) with 0.5ns delays between them.]

  • If there is no feasible gain assignment, the sizes literally ‘explode’.

slide-43
SLIDE 43

Keeping delay constant during layout

  • The gain ratio (= Cload/Cin) is maintained during placement.
  • Sizes change during placement.
  • As a result, delay is (almost) constant.
  • Sizes cannot ‘explode’.

Cload/Cin = fixed

slide-44
SLIDE 44

Sizing driven placement

  • Gate sizes change gradually during placement to keep delay constant.
  • The placer must be able to cope with the netlist changes due to buffering, cloning, restructuring, clock insertion, etc.
  • ... while producing a routable result.

slide-45
SLIDE 45

Automatic Congestion Handling

  • During placement

[Figure: routing congestion and utilization maps.]

slide-46
SLIDE 46

What happened at the logical-physical boundary?

Gain-based synthesis:

  • Delay fixed
  • Cell area unknown
  • Sum of areas determines chip size. (Additive)
  • No iterations required
  • Each gate has exactly the right drive strength:
      • Not too little (fanout violation, timing fails)
      • Not too much (waste of area)

Conventional flow:

  • Cell area fixed
  • Delay is a gamble
  • Worst case delay determines timing (max)
  • Iterate to make ends meet.
  • After timing finally closes, many gates will be too big:
      • waste of area
      • waste of power

slide-47
SLIDE 47

Conventional way: Worst case delay sets timing

  • 99% of paths meet timing, 1% do not.
  • Cell sizes do not change during Place and Route.
  • Design conservatively to avoid excessive iterations. The WLM is also tuned conservatively.
  • This oversizes all cells
      • because cells on non-critical paths are also sized up.
  • Chip significantly bigger than necessary (10-30%)

slide-48
SLIDE 48

What about In-place optimization?

  • Do a post-placement ECO.
  • Change only the cells on the critical paths.
  • Conservatism is still required because of limited ECO capacity.
  • All non-critical cells are still oversized.
  • Chip still bigger than necessary.

slide-49
SLIDE 49

Gain based synthesis: area is additive

  • Timing is fixed.
  • As a result, cell sizes change.
  • But large cells and small cells cancel out: some get bigger, others smaller.
  • All cells have exactly the right drive strength: many paths are almost critical.
  • Chip size remains small (10-30% smaller than conventional way)

slide-50
SLIDE 50

Logic (wireload) Synthesis

  • For a simple function ( (A’ + B) * C ) `
  • Various logic structures are possible with one size
  • Conventional logic synthesis tool attempts to optimize the delay by:
      • Logic restructuring
      • Picking the proper sizes
  • This is driven by a vague idea of the wire load

slide-51
SLIDE 51

Many sizing combinations

  • Heuristic tradeoffs: significantly slower than equation-based constant delay

slide-52
SLIDE 52

Gain-based synthesis: supercells

  • Need a single ‘super’ cell representing all sizes in a logic function.
  • Contains:
      • g, h, p
      • size-range

slide-53
SLIDE 53

Gain-based mapping

  • In timing-critical parts, the mapper picks super cells that have low parasitic delay and highest maximum drive strength.
  • In non-critical parts, ‘weaker’ super cells can be used.
      • Pick cells that have potentially the smallest size.
  • Insert buffers on high-fanout nets

slide-54
SLIDE 54

Putting it together

  • Map onto generic ‘super cells’ with flexible area.
  • Optimize gains for all super cells such that maximum speed is achieved. This fixes all delays in the circuit!
  • Give up if the (optimally conditioned) circuit does not meet the given timing criteria.
  • Perform ‘sizing driven placement’: keep delay constant by adapting cell size to parasitic capacitance of the wires. Parasitic wire delay is based on coarse routing of the wires.
  • Fix remaining timing problems through buffering, cloning, restructuring.
  • Update floor plan if the timing is still not met.
  • For each supercell, pick the one standard cell that matches the required drive strength.
  • Legalize the placement (a.k.a. detailed placement)
  • Perform final routing under delay constraints.

slide-55
SLIDE 55

That’s very nice in theory, but...

  • Library only has a few drive strengths: is there a discretization error?
  • How to account for differences in fall and rise time?
  • Do I need a special library?
  • What if a very large drive strength is needed?
  • When are buffers inserted?
  • Isn’t the model too simplistic?
  • What about the parasitic wire resistance?

slide-56
SLIDE 56

Library Analysis

/cmos18/NAND2 (A -> Z) inverting

model        hide  typ load  gain   input cap  area  rise delay  fall delay  slew  max slew
NAND2d1            25        2.51   10         1     161         102         66    2000
NAND2d2            54        2.71   20         1     153         100         67    2000
NAND2d3            110       2.69   41         2     153         100         67    2000
NAND2d4            186       2.66   70         5     153         99          67    2000
NAND2d5      D     370       18.52  20         9     254         293         57    2000
NAND2_SUPER        370       2.74                    148         108         67    2000

  • Gain is averaged
  • Toss out ‘weird cells’
  • Typical load is the load the gate drives when optimized for maximum speed: g*h = 3.59

[Figure: Cload plotted against Cin for the cells d1 to d5.]
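The gain column can be recomputed as typical load divided by input capacitance, and the ‘weird cell’ filter can be sketched by comparing each gain to the target 3.59/g with g = 4/3 for a 2-input nand. The 20% tolerance and all names below are assumptions for illustration; the (typ load, input cap) pairs come from the table.

```python
# (typical load, input capacitance) per cell, from the table above.
CELLS = {
    "NAND2d1": (25, 10),
    "NAND2d2": (54, 20),
    "NAND2d3": (110, 41),
    "NAND2d4": (186, 70),
    "NAND2d5": (370, 20),
}

def classify(cells, g=4 / 3, f=3.59, tolerance=0.2):
    """Split cells into those near the target gain f/g and the outliers."""
    target = f / g                      # about 2.69 for a 2-input nand
    keep, weird = {}, {}
    for name, (typ_load, c_in) in cells.items():
        gain = typ_load / c_in
        bucket = keep if abs(gain - target) <= tolerance * target else weird
        bucket[name] = gain
    return keep, weird

keep, weird = classify(CELLS)
print("kept:", {k: round(v, 2) for k, v in keep.items()})
print("tossed:", {k: round(v, 2) for k, v in weird.items()})
```

Only NAND2d5, whose gain is about 18.5, falls outside the band, which is consistent with that cell being the one marked in the hide column.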

slide-57
SLIDE 57

Fixing cell sizes & keeping timing

[Figure: Cload plotted against Cin for the SuperCell and for the Standard Cell sizes 1x, 2x and 4x, showing the permissible range of each size and the load violation region.]

slide-58
SLIDE 58

The discretization error...

[Figure: gates annotated with Gain = 0.3, 0.9, 0.7 and 0.9, and with the sizes 1x, 2x, 2x, 2.2x, 2.9x, 4x, 1.3x and 1.2x.]

SLIDE 59

.. is generally not a big problem

- Delay versus size curve is flat, because the size is optimized for maximum speed
- Rounding error is absorbed by appropriate up- and downsizing of surrounding cells.
- On critical paths, buffer insertion and logic restructuring minimize the effect.

[Figure: path delay versus size, with sizes 1x, 2x and 4x marked. Optimum delay at 3.2x, but that size is not available.]
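A small numeric sketch shows why the rounding error tends to be harmless. It assumes a hypothetical two-stage path in which a driver with input capacitance `c_in` drives the gate being sized, which in turn drives `c_load`; the values are chosen so that the continuous optimum lands at 3.2x, and delays are in arbitrary units.

```python
import math

def path_delay(size, c_in=1.0, c_load=10.24):
    """Two-stage delay: the driver sees `size` as load, the sized gate sees c_load."""
    return size / c_in + c_load / size

ideal = math.sqrt(1.0 * 10.24)   # continuous optimum for the assumed loads
available = [1, 2, 4]            # sizes offered by the library
chosen = min(available, key=path_delay)
penalty = path_delay(chosen) / path_delay(ideal) - 1.0
print(f"ideal {ideal:.1f}x, chosen {chosen}x, delay penalty {penalty:.1%}")
```

Because the curve is flat around its minimum, rounding 3.2x to the nearest available size costs only a few percent of path delay in this example.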

SLIDE 60

Load violations

- Maximum drive strength in the library might be too small
- Drive information is stored in the super cell, and managed pre-placement.
- Buffering, cloning and restructuring are used to maintain delay during placement

[Figure: Cload versus Cin for the 1x, 2x and 4x cells, showing the permissible range and the load violation region]
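A load violation check can be sketched as below, assuming that a cell is acceptable when its load divided by its input capacitance does not exceed the assigned gain. The function name and the target gain are illustrative; the input capacitances are those of NAND2d1 through NAND2d4 from the Library Analysis slide.

```python
def pick_size(c_load, gain, input_caps):
    """Return the smallest input capacitance with c_load / c_in <= gain.

    Returns None when even the largest cell is too weak, i.e. a load violation.
    """
    required = c_load / gain
    candidates = [c for c in sorted(input_caps) if c >= required]
    return candidates[0] if candidates else None

NAND2_CAPS = [10, 20, 41, 70]   # input caps of NAND2d1..d4
GAIN = 2.66                     # assumed target gain for this example

print(pick_size(150, GAIN, NAND2_CAPS))   # fits within the permissible range
print(pick_size(400, GAIN, NAND2_CAPS))   # exceeds the largest drive strength
```

A `None` result is the case where buffering, cloning or restructuring has to reduce the load seen by the cell.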

SLIDE 61

Buffered wire: smallest delay

- Delay per stage (Elmore):

  $$ d = \frac{R}{w}\,(C_w L + C w) + R_w L\,C w + \frac{R_w C_w L^2}{2} $$

- Optimum buffer distance:

  $$ L_{opt} = \sqrt{\frac{2\,\tau_{buffer}\,(1 + p)}{R_w C_w}} $$

- Optimum buffer size:

  $$ w_{opt} = \sqrt{\frac{\tau_{buffer}\,C_w}{R_w C^2}} $$
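The two optima can be checked numerically. The sketch below makes several assumptions that the slide does not state: R and C are taken to be the output resistance and input capacitance of a unit-size buffer, R_w and C_w the wire resistance and capacitance per unit length, tau_buffer = R * C, and p a parasitic (self-loading) factor that is added to the stage delay so that it matches the (1 + p) term in the optimum distance. The parameter values are illustrative and do not describe any real process.

```python
import math

def delay_per_length(L, w, R, C, Rw, Cw, p=0.0):
    """Elmore stage delay divided by stage length L, for buffer size w.

    With p = 0 the stage delay is the slide's expression; the p term is an
    assumption (buffer self-loading).
    """
    stage = (R / w) * (Cw * L + C * w * (1.0 + p)) + Rw * L * C * w + Rw * Cw * L**2 / 2.0
    return stage / L

def optimum(R, C, Rw, Cw, p=0.0):
    """Closed-form optimum buffer distance and size, taking tau_buffer = R * C."""
    L_opt = math.sqrt(2.0 * R * C * (1.0 + p) / (Rw * Cw))
    w_opt = math.sqrt(R * Cw / (Rw * C))
    return L_opt, w_opt

# Illustrative values only: ohm, farad, ohm/um, farad/um.
PARAMS = dict(R=10e3, C=2e-15, Rw=0.1, Cw=0.2e-15, p=1.0)
L_opt, w_opt = optimum(**PARAMS)
best = delay_per_length(L_opt, w_opt, **PARAMS)
print(f"L_opt = {L_opt:.0f} um, w_opt = {w_opt:.0f}x, delay = {best * 1e15:.1f} fs/um")
```

Setting the derivatives of the delay per unit length with respect to L and w to zero gives the two closed forms, so any perturbation of either value increases the delay per unit length.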

SLIDE 62

Buffering in a typical 0.25 µm process

- Optimum buffer distance tends to be around 2000 µm.
- This works out to an area of 4 mm², or about 10-20K cells.
- But w_opt is much larger than what most libraries have available:

[Figure: delay per micron versus buffer size W, from 25x to 100x. Optimal at 80x; the range of available drive strengths in the library is marked.]

SLIDE 63

Library constrains performance

- Limited drive strength in standard cell libraries results in significantly longer delays at the chip level.
- This is true for ANY methodology, and not exclusive to gain-based synthesis.
- Reasons for limited drive strength:
  - Concerns about signal electromigration.
  - Router doesn’t handle wide wires.
  - Huge cells (20x a ‘normal’ cell) frustrate the placer.
  - Folklore.

SLIDE 64

Parallel cells

- A simple way to test whether a better library would improve results: connect cells in parallel.
- Issues:
  - testability
  - signal-EM
  - congestion: detailed placer

SLIDE 65

Electromigration: wires wear out

Electrons move atoms

[Figure: wire with a tungsten contact, a ‘reservoir’, an ‘end-of-line’ overhang, and ‘cavities’ in the wire]

SLIDE 66

Dealing with Electromigration

- A statistical effect, resulting in a gradual increase of the wire resistance, followed by failure.
- The time at which 50% of the wires have failed is given by:

  $$ t_f = A \cdot \frac{1}{J^2} \cdot e^{E_a / (kT)} $$

- Depends on the current density J
  - Wider wires would help
- Exponential dependency on temperature makes it hard to predict.
- Wires self-heat due to resistance
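The sensitivities in this formula (commonly known as Black's equation) can be illustrated with a few lines of code. The prefactor `a` and the activation energy `e_a` below are placeholder values chosen for illustration, not measured data, so only the ratios between results are meaningful.

```python
import math

BOLTZMANN_EV = 8.617333e-5  # Boltzmann constant in eV/K

def median_time_to_failure(j, temp_k, a=1.0, e_a=0.7):
    """t_f = A / J^2 * exp(E_a / (k T)); a and e_a are illustrative values."""
    return a / j**2 * math.exp(e_a / (BOLTZMANN_EV * temp_k))

base = median_time_to_failure(j=1.0, temp_k=358.0)
wider = median_time_to_failure(j=0.5, temp_k=358.0)    # same current in a wire twice as wide
hotter = median_time_to_failure(j=1.0, temp_k=378.0)   # 20 K of additional self-heating

print(f"wider wire: {wider / base:.1f}x lifetime")
print(f"20 K hotter: {hotter / base:.2f}x lifetime")
```

Halving the current density quadruples the median lifetime, while a modest temperature rise shortens it substantially, which is why self-heating makes the prediction hard.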

SLIDE 67

What makes a good DSM library?

- Many drive strengths per function
  - No functions with few drive strengths
  - No holes or missing drive strengths
  - Also have drive strengths for flip-flops and latches
- High drive strengths
- Linear scaling of load and area
  - avoid multi-stage cells
- Avoid multi-output cells
- Avoid single stage gates with more than 4 inputs
- Not many different functions are needed.
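The first group of criteria lends itself to a simple automated audit. The sketch below is hypothetical: the library contents, the minimum count of three drive strengths and the maximum ratio of 2 between consecutive strengths are all assumptions made for the example.

```python
def find_holes(strengths, max_ratio=2.0):
    """Return pairs of consecutive drive strengths whose ratio exceeds max_ratio."""
    ordered = sorted(strengths)
    return [(lo, hi) for lo, hi in zip(ordered, ordered[1:]) if hi / lo > max_ratio]

# Hypothetical library: function name -> available drive strengths.
LIBRARY = {"NAND2": [1, 2, 4, 8, 16], "NOR3": [1, 2, 8], "DFF": [1]}

too_few = [name for name, s in LIBRARY.items() if len(s) < 3]
holes = {name: find_holes(s) for name, s in LIBRARY.items() if find_holes(s)}
print("few drive strengths:", too_few)
print("holes:", holes)
```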

SLIDE 68

Buffering & wire sizing

- To tame the quadratic nature of wire delay
- To avoid load violations
- A static timer is run concurrently during (incremental) placement
- Wire delay is estimated based on the most accurate information available at the time:
  - Elmore I (based on Steiner tree)
  - Elmore II (based on global routing)
  - 2nd order AWE (post routing)
- Buffers are inserted where needed
  - After buffer insertion the gains need to be re-distributed
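As a reminder of what the Elmore estimate computes, here is a minimal sketch on an RC tree such as one derived from a Steiner tree. The tree, node names and element values are invented for the example, and units are arbitrary. The delay to a node is the sum, over every resistor on the path from the driver, of that resistance times all capacitance downstream of it.

```python
def elmore_delays(parent, res, cap):
    """Elmore delay to every node of an RC tree.

    parent[n] is the parent of node n (None for the root), res[n] the resistance
    of the edge into n (for the root: the driver resistance), cap[n] the
    capacitance at n.
    """
    children = {n: [] for n in parent}
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)

    downstream = {}
    def total_cap(node):
        downstream[node] = cap[node] + sum(total_cap(c) for c in children[node])
        return downstream[node]

    root = next(n for n, par in parent.items() if par is None)
    total_cap(root)

    delay = {root: res[root] * downstream[root]}
    def walk(node):
        for child in children[node]:
            delay[child] = delay[node] + res[child] * downstream[child]
            walk(child)
    walk(root)
    return delay

# Driver "drv", one branch point "a", two sinks "s1" and "s2".
PARENT = {"drv": None, "a": "drv", "s1": "a", "s2": "a"}
RES = {"drv": 1.0, "a": 2.0, "s1": 1.0, "s2": 3.0}
CAP = {"drv": 0.0, "a": 1.0, "s1": 2.0, "s2": 1.0}
delays = elmore_delays(PARENT, RES, CAP)
print(delays["s1"], delays["s2"])
```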

SLIDE 69

Wire delay optimization

- Delay after optimization:
  - buffering,
  - cell sizing,
  - wire sizing.
- 0.18 micron technology
- Δ wire length 64x results in Δ delay < 3x

[Figure: delay (ps) versus wire length (µm), logarithmic axes. Data courtesy of Prof. Jason Cong, UCLA]

SLIDE 70

Logic cloning and restructuring

- To keep timing fixed by adapting the reality to the model
- Restructuring and rewiring of the critical path improves timing.
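Cloning can be pictured as splitting the fanout of an overloaded gate over several copies of that gate. The sketch below is a deliberately simplified illustration using a first-fit-decreasing partition of the sink loads; it ignores placement, whereas a real flow would presumably group sinks by location. All numbers are made up.

```python
def clone_groups(sink_loads, max_load):
    """Split sink loads over driver copies so that each copy drives at most max_load.

    A single sink larger than max_load still gets its own copy; fixing that
    case would need buffering or a larger cell instead.
    """
    groups = []
    for load in sorted(sink_loads, reverse=True):
        for group in groups:
            if sum(group) + load <= max_load:
                group.append(load)
                break
        else:
            groups.append([load])
    return groups

groups = clone_groups([40, 30, 30, 20, 10], max_load=70)
print(len(groups), "copies:", groups)
```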

SLIDE 71

Gain based synthesis flow

- Timing analysis tool runs concurrently during all steps
- Strong infrastructure is necessary
- Backend (routing) must make this come true

[Figure: flow chart from RTL to GDSII. Boxes: library analysis, build supercells, logic mapping, gain assignment, sizing-driven placement, buffering, cloning, restructuring, clock insertion, scan insertion, detailed placement, track routing, detailed routing, with two ‘OK?’ checks. The phases are labeled ‘Delays fixed, sizes floating’ and ‘Delays fixed, sizes fixed’.]

SLIDE 72

Objectives

- Implement wire pattern that is:
  - LVS-correct: no shorts nor unconnects
  - DRC-correct, includes electromigration and antenna rules
  - Timing-correct: adapt model to reality
  - Deals with special requirements for power and clock routing

SLIDE 73

Correct by Construction or Construct by Correction??

- Traditional tools are primarily focused on completion:
  - Correct by construction for LVS and DRC, but not for timing!
  - Timing violations addressed by rip-up-and-reroute, i.e. ‘construct by correction’.
- Modern EDA flows should target ‘correct by construction’ for timing:
  - careful planning for timing budget and
  - variable spacing and width detailed routing.

SLIDE 74

Global routing

Finds coarse path and layer assignment for each net, such that wire density is spread evenly

[Figure: routing area with a ‘bucket’ marked]
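The idea of spreading wire density over buckets can be sketched with a toy router. This is not the algorithm of any particular tool: it assumes two-pin nets on a bucket grid, considers only the two L-shaped paths per net, ignores layer assignment, and picks the path that crosses the least-used buckets.

```python
from collections import Counter

def l_routes(src, dst):
    """Return the two L-shaped bucket paths between src and dst."""
    (x0, y0), (x1, y1) = src, dst
    def span(a, b):
        step = 1 if b >= a else -1
        return range(a, b + step, step)
    horizontal_first = [(x, y0) for x in span(x0, x1)] + [(x1, y) for y in span(y0, y1)][1:]
    vertical_first = [(x0, y) for y in span(y0, y1)] + [(x, y1) for x in span(x0, x1)][1:]
    return [horizontal_first, vertical_first]

def global_route(nets):
    """Route nets one by one, choosing the path through the least congested buckets."""
    usage = Counter()
    routes = {}
    for name, (src, dst) in nets.items():
        best = min(l_routes(src, dst), key=lambda path: sum(usage[b] for b in path))
        routes[name] = best
        usage.update(best)
    return routes, usage

# Two nets with identical end points end up on different sides of the bounding box.
routes, usage = global_route({"n1": ((0, 0), (2, 2)), "n2": ((0, 0), (2, 2))})
print(routes["n1"])
print(routes["n2"])
```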

SLIDE 75

Interconnect speed

[Figure: cross-section and top view of parallel wires above a ground plane, with wire length l, width w, height h, oxide thickness d_ox and lateral spacing d_lat]

Consider the middle wire:

$$ C_{wire} = C_0 \left( \frac{l\,w}{d_{ox}} + \frac{2\,l\,h}{d_{lat}} \right) = C_{wire,gnd} + C_{wire,lat} $$

(the first term is the ground capacitance, the second the lateral capacitance)

$$ R_{wire} = R_0\,\frac{l}{w\,h} $$

$$ \tau_{wire} = R_{wire}\,C_{wire} $$

which is quadratic in the length l.
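The quadratic dependence on length follows directly from the two expressions, since both R_wire and C_wire are proportional to l. A quick check, with the constants C0 and R0 normalized to 1 and arbitrary geometry values chosen for the example:

```python
def wire_tau(l, w, h, d_ox, d_lat, c0=1.0, r0=1.0):
    """tau_wire = R_wire * C_wire for the middle wire, in normalized units."""
    c_wire = c0 * ((l * w) / d_ox + (2.0 * l * h) / d_lat)
    r_wire = r0 * l / (w * h)
    return r_wire * c_wire

GEOMETRY = dict(w=0.5, h=0.6, d_ox=0.8, d_lat=0.5)   # illustrative dimensions
ratio = wire_tau(2000.0, **GEOMETRY) / wire_tau(1000.0, **GEOMETRY)
print(f"doubling the length multiplies tau by {ratio:.1f}")
```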

SLIDE 76

Applying Moore’s law

- Double the density by a lateral shrink:
  - l, w and d_lat shrink by factor sqrt(2)

[Figure: wires above a ground plane, with w, h, d_ox and d_lat marked]

$$ C_{wire} = C_0 \left( \frac{l\,w}{d_{ox}} + \frac{2\,l\,h}{d_{lat}} \right) $$

After the shrink, the ground term is halved and the lateral term stays constant.

$$ R_{wire} = R_0\,\frac{l}{w\,h} = \text{constant} $$
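These scaling claims can be verified by evaluating the expressions before and after the shrink. The sketch below normalizes C0 and R0 to 1 and uses arbitrary starting dimensions; only the ratios matter.

```python
import math

def wire_rc(l, w, h, d_ox, d_lat):
    """Return (R_wire, ground capacitance, lateral capacitance) with C0 = R0 = 1."""
    return l / (w * h), (l * w) / d_ox, (2.0 * l * h) / d_lat

s = math.sqrt(2.0)
before = wire_rc(l=1000.0, w=0.5, h=0.6, d_ox=0.8, d_lat=0.5)
after = wire_rc(l=1000.0 / s, w=0.5 / s, h=0.6, d_ox=0.8, d_lat=0.5 / s)   # lateral shrink only

r_ratio, gnd_ratio, lat_ratio = (a / b for a, b in zip(after, before))
print(f"R x{r_ratio:.2f}, C_gnd x{gnd_ratio:.2f}, C_lat x{lat_ratio:.2f}")
```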

SLIDE 77

Speedup due to shrink

[Figure: circuit with Cgate, Rgate, Rwire and Cwire; annotations: unchanged, unchanged, half, hardly smaller]

The speedup with lateral capacitance is down to a factor of 1, instead of a factor of 2 without it.

SLIDE 78

Lateral capacitance is worse!

When a neighboring wire switches in the opposite direction, the lateral capacitance is effectively 2 x Clat. This is the Miller effect.

SLIDE 79

Crosstalk noise on wires

Crosstalk causes noise, which depends on:

- The size of the crosstalk capacitor
- Slope of the aggressor
- Threshold voltage
- Ratio between victim and aggressor output resistances

[Figure: circuit with Rgate, Cgate and Cwire,lat]
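A first-order estimate of the noise peak can be sketched with a lumped model. The assumptions here are not from the slides: the victim is held by a single resistance `r_victim`, it has a grounded capacitance `c_victim` plus the coupling capacitance `c_couple`, and the aggressor is a linear ramp of duration `t_rise`, so the aggressor's own output resistance only enters through its slope. All values are illustrative.

```python
import math

def peak_noise(vdd, c_couple, c_victim, r_victim, t_rise):
    """Peak victim noise for a ramp aggressor in a lumped RC coupling model."""
    tau = r_victim * (c_couple + c_victim)
    return vdd * r_victim * c_couple / t_rise * (1.0 - math.exp(-t_rise / tau))

def is_safe(noise, v_threshold):
    """The glitch is harmless when it stays below the receiver threshold."""
    return noise < v_threshold

VDD = 1.8
slow = peak_noise(VDD, c_couple=50e-15, c_victim=150e-15, r_victim=1e3, t_rise=100e-12)
fast = peak_noise(VDD, c_couple=50e-15, c_victim=150e-15, r_victim=1e3, t_rise=1e-15)
weak = peak_noise(VDD, c_couple=50e-15, c_victim=150e-15, r_victim=4e3, t_rise=100e-12)
print(f"slow aggressor {slow:.3f} V, fast aggressor {fast:.3f} V, weak victim driver {weak:.3f} V")
```

For a very fast aggressor the result approaches the charge-sharing limit Vdd * Cc / (Cc + Cv); a slower aggressor edge or a stronger victim driver lowers the peak.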

SLIDE 80

Track Routing: maintaining timing

- Refines the global routing by fixing track positions
- Timing is a given constraint: satisfy crosstalk by spacing apart ‘unfriendly’ wires. ‘Friendliness’ data is given by timer.
- Use shielding for clocks, spacing or shielding for signal wires.

Spacing between unfriendly nets is enlarged to meet load budget.

SLIDE 81

“Common Database” Architecture

[Figure: tools 1 through n (timing, placement, routing, extraction, ...), each an algorithm with its own data model, connected through a database and translators on hard disk]

- Each tool has its own data representation. Design data is shared by:
  - reading/writing (huge) files.
  - Data management layer controls access to files and converts formats
- Great for “integrating” many separate tools.
- Makes real-time sharing of data slow and inefficient.

SLIDE 82

Infrastructure is key

[Figure: placement, routing, timing, verification and other tool algorithms, together with TCL and GUI access, all operating on one in-core data model; the model connects to a Volcano on disk and to external formats]

- Tools share a common data structure. They run directly on it.
- Let all design data live “in core” during the flow, attached to the data structure.
- Use only one format: the data structure
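The difference with the file-based architecture can be illustrated with a toy in-core model. Every class and function name below is hypothetical and unrelated to any real tool's API; the point is only that the placement step and the estimator operate on the same objects, so no file is written or translated between them.

```python
from dataclasses import dataclass, field

@dataclass
class Cell:
    name: str
    x: float = 0.0
    y: float = 0.0

@dataclass
class Net:
    name: str
    pins: list   # names of the cells connected by this net

@dataclass
class DataModel:
    cells: dict = field(default_factory=dict)
    nets: dict = field(default_factory=dict)

def place(model, positions):
    """'Placement tool': writes coordinates straight into the shared model."""
    for name, (x, y) in positions.items():
        model.cells[name].x, model.cells[name].y = x, y

def half_perimeter(model, net_name):
    """'Timing tool' input: wire length estimate read straight from the shared model."""
    cells = [model.cells[c] for c in model.nets[net_name].pins]
    xs, ys = [c.x for c in cells], [c.y for c in cells]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

model = DataModel(
    cells={n: Cell(n) for n in ("u1", "u2", "u3")},
    nets={"n1": Net("n1", ["u1", "u2", "u3"])},
)
place(model, {"u1": (0, 0), "u2": (30, 10), "u3": (10, 40)})
print(half_perimeter(model, "n1"))
```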

SLIDE 83

Track Re-ordering

- Crosstalk aware wire ordering during routing
- Based on timing windows

[Figure: timing windows (ET to LT) of nets A, B and C, shown for two track orderings: B, A, C and A, C, B]
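A minimal sketch of ordering by timing windows follows. It assumes that ET and LT are the earliest and latest switching times of a net, that two nets can only disturb each other when their windows overlap, and that the number of tracks is small enough to try every permutation. The window values are invented.

```python
from itertools import permutations

def overlap(a, b):
    """Length of the overlap between two (ET, LT) timing windows."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def best_order(windows):
    """Track order that minimizes the window overlap between adjacent nets."""
    def cost(order):
        return sum(overlap(windows[p], windows[q]) for p, q in zip(order, order[1:]))
    return min(permutations(sorted(windows)), key=cost)

WINDOWS = {"A": (0.0, 4.0), "B": (2.0, 6.0), "C": (5.0, 9.0)}   # (ET, LT), arbitrary units
order = best_order(WINDOWS)
print(order)
```

In this example nets A and C never switch at the same time, so placing C between A and B removes the larger A/B overlap from adjacent tracks.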

SLIDE 84

How to get timing closure?

- Good placements and floor plans
  - Floorplanning is a hard and unsolved problem
- Let the computer do the work for you
  - If you have no clue about the floor plan: flatten it!
- EDA tool needs to:
  - Have massive capacity
  - Have a transparent data model
- Relaxing some parameters could help dramatically.

SLIDE 85

[Figure: 3-D labs design, 0.18u, 266 MHz. Blocks: Kdomain (3.2M gates), Odomain (2.5M gates), T1 (812K gates), T2 (2.1M gates)]

SLIDE 86

SLIDE 87

Summary

- The gain based synthesis model proves excellent for the logic to layout conversion.
- Timing is more important than actual gate size: therefore delay is fixed before size.
- The simplicity of the model allows scaling to larger chips (millions of placeable objects).