Part II: Timing Closure Today (Lou Scheffer, Cadence)

SLIDE 1

ASP-DAC'01

Part II: Timing Closure Today

Lou Scheffer
Cadence, San Jose, CA
Lou@cadence.com

SLIDE 2

Timing Closure Today

  • Timing more accurate as flow progresses
  • Sometimes an earlier stage thinks timing is OK, but it fails a later stage
  • Need to repeat one or more steps with tighter constraints
  • We have a timing closure problem when this process fails. Symptoms include:
      - Non-convergence
      - Too many iterations
      - Solution achievable, but this flow cannot find it.

Flow: Design Entry -> Synthesis -> Timing -> Place -> Timing -> Route -> Timing

SLIDE 3

The Timing Closure Problem

Chart: Performance of Circuit Test 7. Frequency (target 100 MHz, axis 75 to 100) by stage (PKS/WLM, P&R, IPO, P&R) for two series, "pks" and "regular". Data labels: 99, 96, 100, 78, 99, 83.

SLIDE 4

Examples of Problems

                     Worst slack / # misses
  Design   Synthesis      Placed         Cycle time   Tech
  C1       -1 / 2000      -12 / 38k      7.5 ns       .25 µm
  V1       0 / 0          -12 / 15k      7.5 ns       .18 µm
  T1       -0.5 / 2000    -48 / 164k     2.5-10 ns    .18 µm
  P1       -0.4 / 100     -97 / 43k      8 ns         .25 µm
  V2       -0.5 / 500     -11 / 2000     7.5 ns       .18 µm

SLIDE 5

Agenda

  • Traditional design flows
  • Summary of DSM Problems
  • Timing Analysis Overview
  • Timing Correction Overview
  • Approaches to Fixing Timing Closure
  • Experimental Results
  • Summary

SLIDE 6

Traditional Design Flows

Flow: Design Entry -> Synthesis -> Timing -> Place -> Timing -> Route -> Timing

Synthesis steps:
  1. Tech independent optimization
  2. Tech mapping
  3. Rudimentary timing correction

SLIDE 7

Logic Synthesis Flow

  • Technology independent optimization
      - General goal: reduce connections, literals, redundancies, area
  • Technology mapping
      - Map logic into technology library
  • Timing correction
      - Find and fix critical timing paths
      - Fix electrical violations (load, slew)

SLIDE 8

Traditional Design Flows

Flow: Design Entry -> Synthesis w/Timing -> Place w/Timing -> Route -> Timing

Integrate timing with synthesis and placement

  1. Tech independent optimization
  2. Tech mapping
  3. Timing correction

SLIDE 9

Agenda

  • Traditional design flows
  • Summary of DSM Problems
  • Analysis Methods Overview
  • Correction Methods Overview
  • Approaches to Fixing Timing Closure
  • Experimental Results
  • Summary

SLIDE 10

The Wall

  • Logic designers concentrate on logic and timing (as understood by synthesis)
  • Design work done in abstract world of gates and wire load models
  • Throw design over the wall when complete
  • Physical designers concentrate on layout and ability to route
  • Effective method for many years

SLIDE 11

General CMOS Problems

  • Low drive strengths / low power
      - Capacitance (not intrinsic delay) plays a large role in performance
      - Variability: range between slowest possible and fastest possible
  • Noise affects delay
      - IR drop a big percentage of supply
      - Crosstalk can change delay by a factor of 2

SLIDE 12

Additional DSM Problems

  • High density / huge designs
  • Very thin and resistive wires
  • Very high frequencies
      - Inductance becomes more important
  • Smaller voltages
      - IR drop a bigger fraction of signal swing
  • Clock skew and latency
  • Electromigration and noise

SLIDE 13

Clock Distribution Problems

  • Most common design approach requires close to zero skew
  • CMOS / DSM problems all affect clocks
  • Distribution problem increasing
      - Number of latches/flip-flops growing significantly
  • Power consumed in clock tree significant
      - ∆I and noise also of concern

SLIDE 14

Process Designers are trying to help

  • Many metal layers
  • Different metal pitches
      - Small pitch for local interconnect
      - Big pitch for long, fast wires
  • Copper wires, thick metal to lower R
  • SOI: Silicon On Insulator
  • Low k dielectrics
  • These help but are not enough

SLIDE 15

Agenda

  • Traditional design flows
  • Summary of DSM Problems
  • Analysis Methods Overview
  • Correction Methods Overview
  • Approaches to Fixing Timing Closure
  • Experimental Results
  • Summary

SLIDE 16

Timing Analysis

  • Give accurate time values on each pin/port of the network
  • Has to deal with design changes in optimization toolbox
  • Static Timing Analysis
      - Simulation far too slow in optimization environment
      - Accuracy is more than enough

SLIDE 17

Timing Analysis Requirements

  • Choose combination of timing analyzer and delay calculator which are appropriate for level of design
      - give the best accuracy
      - for performance that can be tolerated
  • Timing Analysis / Delay calculation must be able to cope with logic design changes
      - Incremental
      - Highest performance possible
      - Non-linear delay equations

SLIDE 18

Timing Analysis Requirements

  • Must handle…
      - Difference between rising and falling delays
      - Delay dependent on slew rate
      - Slew and delay dependent on output load
      - Non-linear delay equations
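One common way to make delay depend on both input slew and output load is a characterized lookup table with interpolation. The sketch below assumes that approach; the function name, the table layout, and all numbers are illustrative and do not come from the slides.

```python
from bisect import bisect_right


def interpolate_delay(slews, loads, table, slew, load):
    """Bilinear interpolation in a delay table indexed by input slew and output load."""
    def bracket(axis, value):
        # Index i of the cell [axis[i], axis[i + 1]] used for this value,
        # clamped so that out-of-range values extrapolate from the edge cell.
        i = bisect_right(axis, value) - 1
        return max(0, min(i, len(axis) - 2))

    i, j = bracket(slews, slew), bracket(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    low = table[i][j] * (1 - tl) + table[i][j + 1] * tl
    high = table[i + 1][j] * (1 - tl) + table[i + 1][j + 1] * tl
    return low * (1 - ts) + high * ts


# Hypothetical characterization data: rows are slews (ns), columns are loads (pF).
SLEWS = [0.1, 0.5]
LOADS = [0.01, 0.05]
RISE_DELAY = [[0.10, 0.30],
              [0.20, 0.50]]

mid_delay = interpolate_delay(SLEWS, LOADS, RISE_DELAY, 0.3, 0.03)
```

A separate table per output transition would cover the difference between rising and falling delays.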

SLIDE 19

Late Mode Analysis Definitions

  • Constraints: assertions at the boundaries
      - Arrival times: AT_a, AT_b
      - Required arrival time: RAT_x
  • Delay from a to x is the longest time it takes to propagate a signal from a to x
  • Slack is required arrival time - arrival time.

Diagram: circuit with nodes a, b, x, c, y, annotated with AT_a, AT_b, RAT_x, and the arc delay d_ax.
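The late-mode definitions can be sketched as a forward pass for arrival times and a backward pass for required times. The network, delays, and function name below are hypothetical and are not the circuit drawn on the slide; the sketch also assumes every node reaches an output that has a required time.

```python
def late_mode_slack(arcs, arrival, required):
    """arcs: (src, dst, delay) tuples, ordered so that every arc into a node
    appears before any arc out of it (topological order).
    arrival: AT assertions at inputs; required: RAT assertions at outputs."""
    at = dict(arrival)
    for src, dst, delay in arcs:
        # Late mode keeps the longest (latest) arrival at each node.
        at[dst] = max(at.get(dst, float("-inf")), at[src] + delay)
    rat = dict(required)
    for src, dst, delay in reversed(arcs):
        # A node must be ready in time for its most demanding fan-out.
        rat[src] = min(rat.get(src, float("inf")), rat[dst] - delay)
    # Slack is required arrival time minus arrival time.
    return {node: rat[node] - at[node] for node in at}


# Hypothetical example network.
ARCS = [("a", "x", 2), ("b", "x", 2), ("x", "y", 1), ("c", "y", 1)]
slack = late_mode_slack(ARCS, {"a": 0, "b": 1, "c": 0}, {"y": 3})
```

Negative entries in `slack` mark nodes on paths that miss the required time.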

SLIDE 20

Example

Circuit with nodes a, b, x, c, y (late mode):

  AT_a = 0,  AT_b = 1,  AT_c = 0
  AT_x = 3,  RAT_x = 2,  SL_x = 2 - 3 = -1
  AT_y = 2,  SL_y = 1 - 2 = -1
  SL_a = 0 - 0 = 0
  SL_b = 0 - 1 = -1
  SL_c = 1 - 0 = 1

SLIDE 21

Early mode analysis

  • Definitions change as follows
      - longest becomes shortest
      - slack = arrival - required

Circuit with nodes a, b, x, c, y (early mode):

  AT_a = 0,  AT_b = 1,  AT_c = 0
  AT_x = 1,  RAT_x = 2,  SL_x = 1 - 2 = -1
  AT_y = 1,  SL_y = 1 - 0 = 1
  SL_a = 0 - 0 = 0
  SL_b = 1 - 0 = 1
  SL_c = 0 - 1 = -1
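A minimal sketch of the early-mode variant, under the same assumptions as any simple graph-based timer: shortest arrival replaces longest, the backward pass keeps the largest requirement, and slack is arrival minus required. The network and numbers are hypothetical, not the slide's circuit.

```python
def early_mode_slack(arcs, arrival, required):
    """arcs: (src, dst, delay) tuples in topological order.
    arrival: earliest AT at inputs; required: earliest allowed time at outputs."""
    at = dict(arrival)
    for src, dst, delay in arcs:
        # Early mode keeps the shortest (earliest) arrival at each node.
        at[dst] = min(at.get(dst, float("inf")), at[src] + delay)
    rat = dict(required)
    for src, dst, delay in reversed(arcs):
        # The binding requirement is the one that forces the latest start.
        rat[src] = max(rat.get(src, float("-inf")), rat[dst] - delay)
    # Early-mode slack is arrival time minus required time.
    return {node: at[node] - rat[node] for node in at}


# Hypothetical example network with early (minimum) delays.
EARLY_ARCS = [("a", "x", 1), ("b", "x", 1), ("x", "y", 1), ("c", "y", 1)]
early = early_mode_slack(EARLY_ARCS, {"a": 0, "b": 1, "c": 0}, {"y": 2})
```

Here a negative slack means the signal can arrive too early, which is what padding early paths is meant to fix.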

SLIDE 22

Delay modeling

Diagram: timing model. Propagation arcs d_ax and d_bx run from inputs a and b to output x; test arcs relate data d and clock cl.

SLIDE 23

Agenda

  • Traditional design flows
  • Summary of DSM Problems
  • Analysis Methods Overview
  • Correction Methods Overview
  • Approaches to Fixing Timing Closure
  • Experimental Results
  • Summary

SLIDE 24

Timing Correction

  • Fix electrical violations (slew and load). Takes priority since needed for reliability.
      - Resize cells
      - Buffer nets
      - Copy (clone) cells
  • Fix timing problems
      - Local transforms (bag of tricks)
      - Path-based transforms

SLIDE 25

Local Transforms

  • Resize cells
  • Buffer or clone to reduce load on critical nets
  • Decompose large cells
  • Swap connections on commutative pins or among equivalent nets
  • Move critical signals forward
  • Pad early paths
  • Area recovery

SLIDE 26

Transform Example

Double Inverter Removal: Delay = 4 before, Delay = 2 after
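Double inverter removal can be sketched on a single path as cancelling adjacent inverter pairs. The gate names and unit delays below are made up for illustration, and the sketch assumes the net between the two inverters has no other fan-out, which a real transform would have to check.

```python
def remove_double_inverters(chain):
    """chain: gate types along one path, in order. Drops adjacent INV/INV pairs."""
    result = []
    for gate in chain:
        if gate == "INV" and result and result[-1] == "INV":
            result.pop()  # Two inversions in a row cancel logically.
        else:
            result.append(gate)
    return result


def path_delay(chain, delays):
    """Sum of per-gate delays along the path."""
    return sum(delays[gate] for gate in chain)


# Hypothetical path with unit gate delays.
UNIT_DELAYS = {"NAND2": 1, "NOR2": 1, "INV": 1}
before = ["NAND2", "INV", "INV", "NOR2"]
after = remove_double_inverters(before)
```

With these illustrative unit delays, `path_delay` drops from 4 to 2, in line with the figures quoted on the slide.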

SLIDE 27

Resizing

Plot: delay d (0.01 to 0.05) versus load (0.2 to 1) for cell sizes A, B, C.

Example: a gate with inputs a, b drives sinks d, e, f with loads 0.2, 0.2, 0.3. Which size to use? With size A the delay is 0.035; with size C it is 0.026.
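Resizing can be illustrated with a linear delay model per cell size, delay = intrinsic + slope × load. The coefficients below are assumptions, chosen only so that sizes A and C give 0.035 and 0.026 at the slide's total load of 0.7; the real curves are not recoverable from the slide text. The sketch also ignores the extra input load a larger cell puts on its own driver.

```python
# Hypothetical (intrinsic delay, delay per unit load) for each cell size.
CELLS = {
    "A": (0.014, 0.030),  # small cell: fast when lightly loaded
    "B": (0.016, 0.018),
    "C": (0.019, 0.010),  # large cell: slower intrinsically, strong drive
}


def cell_delay(cell, load):
    intrinsic, slope = CELLS[cell]
    return intrinsic + slope * load


def best_size(load):
    """Pick the cell size with the smallest delay at this load."""
    return min(CELLS, key=lambda cell: cell_delay(cell, load))


total_load = 0.2 + 0.2 + 0.3
choice = best_size(total_load)
```

Under these assumed coefficients the largest cell wins at a load of 0.7, while the smallest cell wins at light loads such as 0.1.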

SLIDE 28

Cloning

Plot: delay d (0.01 to 0.05) versus load (0.2 to 1) for cell sizes A, B, C.

Example: a gate with inputs a, b drives sinks d, e, f, g, h, each with load 0.2. After cloning, two copies of the gate (sizes A and B) share the sinks.

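SLIDE 29

Buffering

Plot: delay d (0.01 to 0.05) versus load (0.2 to 1) for cell sizes A, B, C.

Example: a gate with inputs a, b drives sinks d, e, f, g, h, each with load 0.2. After buffering, size-B buffers isolate part of the load from the gate (load labels in the figure: 0.1, 0.2, 0.2, 0.2, 0.2, 0.2).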
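Both cloning and buffering reduce the load seen by a critical driver. The following sketch only does the load bookkeeping; the function names and the buffer input load of 0.1 are assumptions for illustration.

```python
def split_for_clone(sink_loads):
    """Deal sinks alternately to two copies of the driver; return each copy's load."""
    ordered = sorted(sink_loads, reverse=True)
    return sum(ordered[0::2]), sum(ordered[1::2])


def buffered_driver_load(critical_loads, buffer_input_load):
    """Load left on the original driver when a buffer takes over the other sinks."""
    return sum(critical_loads) + buffer_input_load


sinks = [0.2, 0.2, 0.2, 0.2, 0.2]       # five sinks of 0.2 each
original = sum(sinks)                    # load before any transform
clone_a, clone_b = split_for_clone(sinks)
buffered = buffered_driver_load([0.2], buffer_input_load=0.1)
```

Cloning doubles the load on whatever drives the gate's inputs, and buffering adds the buffer's own delay to the non-critical sinks, so neither transform is free.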

SLIDE 30

Redesign Fan-in Tree

Diagram: inputs with arrival times Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0 feed a tree of gates with delay 1 each, producing e.
  Before: Arr(e)=6
  After redesign: Arr(e)=5
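One way to redesign a fan-in tree is to combine the earliest-arriving signals first, so late signals pass through fewer gates. The sketch below assumes 2-input gates with unit delay and compares that greedy order with an arrival-blind pairing; the pairing used for the "before" case is an assumption that happens to reproduce the slide's 6.

```python
import heapq


def paired_arrival(arrivals, gate_delay=1):
    """Pair signals in the order given, level by level, ignoring arrival times."""
    level = list(arrivals)
    while len(level) > 1:
        level = [
            max(level[i:i + 2]) + gate_delay if i + 1 < len(level) else level[i]
            for i in range(0, len(level), 2)
        ]
    return level[0]


def greedy_arrival(arrivals, gate_delay=1):
    """Always combine the two earliest signals (Huffman-style tree)."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        first, second = heapq.heappop(heap), heapq.heappop(heap)
        heapq.heappush(heap, max(first, second) + gate_delay)
    return heap[0]


arrivals = [4, 3, 1, 0]  # Arr(a), Arr(b), Arr(c), Arr(d) from the slide
```

With the slide's arrival times, pairing (a, b) and (c, d) gives an output arrival of 6, while the greedy order gives 5.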

SLIDE 31

Redesign Fan-out Tree

Diagram: two fan-out trees with per-stage delays labeled.
  Before: Longest Path = 5
  After: Longest Path = 4 (one buffer's delay rises from 1 to 2: slowdown of buffer due to load)

SLIDE 32

Decomposition

SLIDE 33

Swap Commutative Pins

Diagram: two assignments of signals a, b, c to the pins of the same gate, with pin delays of 1 and 2. The output arrival time is 3 with one assignment and 5 with the other.

Simple sorting on arrival times and delay works
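The sorting idea can be sketched as follows: sort signals by arrival time (latest first), sort pins by pin-to-output delay (fastest first), and pair them up. The arrival times and pin delays here are hypothetical, since the slide's exact values cannot be recovered from the text.

```python
def assign_pins(arrivals, pin_delays):
    """Pair the latest-arriving signal with the fastest pin, and so on."""
    signals = sorted(arrivals, key=arrivals.get, reverse=True)
    pins = sorted(pin_delays, key=pin_delays.get)
    return dict(zip(signals, pins))


def output_arrival(assignment, arrivals, pin_delays):
    """Latest arrival at the gate output for a signal-to-pin assignment."""
    return max(arrivals[s] + pin_delays[p] for s, p in assignment.items())


# Hypothetical arrival times and pin-to-output delays.
ARRIVALS = {"a": 1, "b": 2, "c": 3}
PIN_DELAYS = {"p1": 2, "p2": 1, "p3": 1}

naive = {"a": "p2", "b": "p3", "c": "p1"}   # latest signal on the slowest pin
sorted_assignment = assign_pins(ARRIVALS, PIN_DELAYS)
```

The swap is only legal when the pins are logically commutative, as they are on a plain AND, OR, NAND, or NOR gate.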

SLIDE 34

Move Critical Signals Forward

  • Based on ATPG
      - linear in circuit size
      - Detects redundancies efficiently
  • Efficiently find wires to be added and removed.
      - Based on mandatory assignments.

Diagram: circuit with signals a, b, c, d, e before and after the transform.

SLIDE 35

Path-based Transforms

  • Path-based resizing
  • Unmap / remap a path or cone
  • Slack stealing
  • Retiming

SLIDE 36

Slack Stealing

  • Take advantage of timing behavior of level sensitive registers (latches)

Diagram: logic stages between latches clocked by C1 and C2. Adjacent stages with Slack = +1 and Slack = -1 can be balanced to Slack = 0.

SLIDE 37

Retiming

Diagram: registers moved forward or backward across logic; one configuration has Delay=3, the other Delay=2.

SLIDE 38

Agenda

  • Traditional design flows
  • Summary of DSM Problems
  • Analysis Methods Overview
  • Correction Methods Overview
  • Approaches to Fixing Timing Closure
  • Experimental Results
  • Summary

SLIDE 39

Solutions to Timing Closure

  • Hand / Custom design
  • Improved analysis
  • More sophisticated clock design
  • Carry hierarchical logic design into physical
  • Modify existing flows
  • More physically knowledgeable tools
      - Many variations: combined synthesis/place/route, gain based synthesis, etc.

SLIDE 40

Hand/Custom Design

  • Mentioned for completeness
      - Hurts productivity
      - Yields highest performance
  • Can only fix a few things, for example:
      - Can realistically fix timing or crosstalk problems on a few nets
      - Cannot realistically change the size of blocks

SLIDE 41

Improved Analysis Helps

  • Plot shows slack by net for two designs
  • A 10% timing delta -> many more bad nets
      - Often the difference between success and failure

Plot: number of nets (up to 3500) versus slack relative to worst net (-5 to 20 ns), for Series1 and Series2.

SLIDE 42

More accurate analysis

  • Crosstalk induced delay
      - Old approach: overestimate coupling C
      - Better: compute nominal timing + xtalk delta
  • Customer example from CadMos
      - Ignore crosstalk completely: 400 MHz
          - Not an acceptable alternative
      - Coupling caps overestimated by 60%: 300 MHz
      - Nominal delays + computed crosstalk: 333 MHz
      - More accurate analysis gains 10% margin
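The margin figure follows from the two safe frequencies in the customer example. A small sketch of the arithmetic, using only the numbers quoted on the slide (the helper name is illustrative):

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


overestimated_mhz = 300.0   # coupling caps overestimated by 60%
computed_mhz = 333.0        # nominal delays + computed crosstalk

# Relative frequency gain from the more accurate analysis.
gain = computed_mhz / overestimated_mhz - 1.0

# The same gain seen as cycle time recovered, in nanoseconds.
recovered_ns = period_ns(overestimated_mhz) - period_ns(computed_mhz)
```

The gain works out to about 11%, which the slide rounds to 10%, or roughly a third of a nanosecond per cycle.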

SLIDE 43

Increased accuracy helps

  • Global/detailed route correlation
      - Any global route is better than Wire Load Models or Steiner trees, since global routes consider congestion
      - But to get that last 10%, need global/detailed router link
          - Knowing some nets must detour is good, but….
          - Which net takes which detour is needed for good correlation

SLIDE 44

Modified clock design

  • Zero skew is not necessary, and maybe not even desirable
  • We have the freedom to adjust clock arrival times at memory elements
      - This obtains more margin and thus helps convergence
  • Similar to retiming but less disruptive
  • Improvement very design dependent
      - If worst path is flip-flop to itself, doesn’t help
  • May impact scan chains
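Adjusting clock arrival times can be sketched with the usual setup relation: data launched at the source clock edge must arrive before the next edge at the destination. The flip-flop names, delays, and period below are hypothetical, and the sketch ignores setup time and hold checks.

```python
def setup_slacks(paths, period, clock_arrival):
    """paths: (src_ff, dst_ff, logic_delay). Returns setup slack per path.
    Launch happens at clock_arrival[src]; capture at clock_arrival[dst] + period."""
    return {
        (src, dst): period + clock_arrival[dst] - clock_arrival[src] - delay
        for src, dst, delay in paths
    }


# Hypothetical pipeline: a long stage followed by a short one, plus a self-loop.
PATHS = [("F1", "F2", 6), ("F2", "F3", 4), ("F4", "F4", 6)]
PERIOD = 5

zero_skew = setup_slacks(PATHS, PERIOD, {"F1": 0, "F2": 0, "F3": 0, "F4": 0})
# Delay the clock at F2 (and, to no effect, at F4) by one time unit.
adjusted = setup_slacks(PATHS, PERIOD, {"F1": 0, "F2": 1, "F3": 0, "F4": 1})
```

Delaying the clock at F2 moves one unit of margin from the short stage to the long one, while the self-loop on F4 keeps its negative slack whatever its clock arrival is.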

SLIDE 45

Hierarchy and Physical Design

  • Logical hierarchy can be carried over into physical design
  • Seems natural top-down approach, using floorplanning as a firm guide to physical design

SLIDE 46

Hierarchy and Physical Design Advantages…

  • Run time of P&R tools
  • Blocks can be built independently
  • Early (and valuable) knowledge of global wires
  • Wire delay within macro may be tolerable
  • Contains the problem size
  • Extends naturally to SOC and mixed A/D chips
  • May be the only real method available

SLIDE 47

Hierarchy and Physical Design Disadvantages

  • Placement solution bounded
  • Ability to find a routable solution hindered
  • Hierarchy usually logically-based, not physically-based
  • Boundary conditions explode and must be managed carefully to avoid surprises
  • Pin assignment problem for all macros

SLIDE 48

Hierarchy Example Plots

SLIDE 49

Hierarchy Example Plots

SLIDE 50

Hierarchy Example Plots

SLIDE 51

Previous attempts to fix closure

  • Modifications/Additions to existing flows
  • Allow placer to do sizing and buffering
  • Do post placement optimization
      - Simple transformations
      - Use existing placement
  • Do post placement re-synthesis
      - Complex transformations allowed
      - Needs incremental placement and extraction

SLIDE 52

Post-Placement Optimization

Flow: Design Entry -> Synthesis w/Timing -> Place -> Synthesis w/Timing -> Route -> Timing

The second Synthesis w/Timing step performs:
  1. In-place optimizations
  2. Minimally disturb placement optimizations

SLIDE 53

Post-Placement Optimization

  • In-place (little or no placement impact)
      - Resizing (carefully)
      - Pin swapping, some tree rebuilding
      - Wire sizing / typing
  • Minimally disruptive
      - Resizing
      - Buffering
      - Cloning
      - Tree rebuilding
      - Cell removal

SLIDE 54

In-place Optimization

  • Not too difficult
  • Can use extracted electrical data (C, RC) from placement tool
      - Some changes affect pin locations, but may be ignored
      - Tree rebuilding needs incremental extraction
  • Can use timing reports for timing data
      - But, accuracy suffers as changes are made

SLIDE 55

In-place Optimization

Flow: Placement & extraction -> placed netlist and C/RC data -> Optimization (resize, swap pins, rebuild trees) -> Opt’d netlist

SLIDE 56

Place-disruptive Optimization

  • Nets changing implies…
      - Must be able to recompute C and RC
      - May need to incrementally place new cells
      - Need incremental timing capability

SLIDE 57

Place-disruptive Optimization

Flow: Placement & extraction -> placed netlist and C/RC data -> Optimization with placer, timer, extractor (resize, buffer, clone, cell removal, rebuild trees) -> Opt’d netlist

SLIDE 58

Post-Placement Example - Buffering long wires

SLIDE 59

Post-Placement Challenges

  • Getting the timing right
      - Different timers used at different stages
      - Do the optimizer and placer see the same worst paths as the static timer?
  • Design size / tool capacity
      - Using synthesis technology on flat designs

SLIDE 60

Post-Placement Challenges

  • Incompatible tools, formats
      - Placer, synthesizer, timer may all use different file formats, may all be from different vendors
      - Basic interoperability issues
  • Incremental placer needed for new cells
      - Doesn’t have to be smart
      - But might produce some infeasible solutions
      - Must be integrated with optimizer

SLIDE 61

Post-Placement Challenges

  • Extraction/Estimation of net data
  • Any optimization which significantly alters net topology needs this ability
      - Insert cells
      - Remove cells
      - Move connections from one cell to another
  • Steiner tree estimation
  • Net C and delay (RC) calculator
  • Do results match other extraction tools?
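When an optimization changes a net, its load must be re-estimated from pin locations. The sketch below uses the half-perimeter of the pins' bounding box, a cheaper stand-in for a true Steiner tree estimate: it equals the rectilinear Steiner minimal tree length for nets with two or three pins and is a lower bound for larger nets. The capacitance per unit length and pin capacitances are assumed values.

```python
def half_perimeter(pins):
    """Half-perimeter wirelength of the bounding box around (x, y) pin locations."""
    xs, ys = zip(*pins)
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def estimate_net_cap(pins, cap_per_unit_length, pin_caps):
    """Wire capacitance from estimated length plus the sink pin capacitances."""
    return half_perimeter(pins) * cap_per_unit_length + sum(pin_caps)


# Hypothetical 3-pin net: one driver and two sinks, on an arbitrary grid.
pins = [(0, 0), (3, 4), (1, 2)]
length = half_perimeter(pins)
cap = estimate_net_cap(pins, cap_per_unit_length=0.01, pin_caps=[0.02, 0.02])
```

Whatever estimator is used, the slide's last question still applies: the result has to be checked against the extraction tool used for sign-off.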

SLIDE 62

Sample Optimization Results

                            Worst slack / # misses
  Design   Synthesized    Placed         Opt           Cycle time   Tech
  C1       -1 / 2000      -12 / 38k      -2 / 1400     7.5 ns       .25 µm
  V1       0 / 0          -12 / 15k      -0.3 / 100    7.5 ns       .18 µm
  T1       -0.5 / 2000    -48 / 164k     -6 / 62k      2.5-10 ns    .18 µm
  P1       -0.4 / 100     -97 / 43k      -13 / 20k     8 ns         .25 µm
  V2       -0.5 / 500     -11 / 2000     -4 / 1000     7.5 ns       .18 µm

SLIDE 63

Root Problem is Wire Load Models

  • Main problem: correlation between Pre-P&R estimates and Post-P&R extraction
  • If correlation is good…
      - Problems detected and potentially fixed early
  • If correlation is bad…
      - Problems detected late
      - Not a good situation! Needing to re-write RTL is the worst case for timing closure.

SLIDE 64

Why are Wire Load Models Used?

  • Can’t complete layout until logic design is complete
  • Can’t complete logic design without timing
  • Can’t time without load and net delay data
  • Can’t extract load and net delay data until layout is complete
  • Can’t complete layout …

SLIDE 65

WLM solution: use statistics

  • Don’t know specific layout data
  • But we know something about statistical properties
  • Average net load, average net delay
  • Further refine using other characteristics
      - Number of sinks
      - Size of design (number of circuits)
      - Physical size
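A wire load model of this kind can be sketched as a table of average net capacitance indexed by number of sinks, with a linear slope beyond the last entry. The table values, the slope, and the function name are assumptions for illustration; real models are typically selected by design size as well.

```python
def wlm_cap(fanout, cap_by_fanout, slope):
    """Average net capacitance for a given number of sinks.
    cap_by_fanout[i] is the entry for fanout i + 1; larger fanouts extrapolate."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    if fanout <= len(cap_by_fanout):
        return cap_by_fanout[fanout - 1]
    extra = fanout - len(cap_by_fanout)
    return cap_by_fanout[-1] + slope * extra


# Hypothetical model for one design-size bucket (capacitance in pF).
SMALL_BLOCK = [0.02, 0.04, 0.07]
SLOPE = 0.03

cap_fanout_2 = wlm_cap(2, SMALL_BLOCK, SLOPE)
cap_fanout_5 = wlm_cap(5, SMALL_BLOCK, SLOPE)
```

Every net with the same fanout gets the same estimate, which is exactly the limitation the following slides discuss.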

SLIDE 66

Correlation Pre/Post-P&R using averages

  • Wire load models give synthesis an estimate of physical design
  • We can correlate averages pre- and post-P&R as accurately as needed
  • If specific design has average behavior, its timing, on average, can be predicted
  • Otherwise, a pass through placement can provide correct WLM for a design

SLIDE 67

Timing and averages

  • WLMs OK for area, power (properties that are sums are well handled by statistics)
  • But, timing dictated by the worst specific path
  • That path is built of individual nets
  • One net can determine the speed of an entire design
  • Reality: poor correlation for relatively few nets can cause major headaches
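The contrast between sums and worst cases can be shown with a few made-up numbers: an average-based estimate can get the total exactly right while still being far off on the one net that matters. All values below are hypothetical.

```python
def total_error(estimated, actual):
    """Relative error of the summed estimate (what area or power would see)."""
    return abs(sum(estimated) - sum(actual)) / sum(actual)


def worst_net_error(estimated, actual):
    """Largest relative error on any single net (what timing can see)."""
    return max(abs(e - a) / a for e, a in zip(estimated, actual))


# Hypothetical net capacitances: the estimate is the average for every net.
actual = [0.6, 0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.4]
estimated = [1.0] * len(actual)

sum_err = total_error(estimated, actual)
net_err = worst_net_error(estimated, actual)
```

Here the total is predicted almost perfectly, yet one net is underestimated by nearly 30% and another overestimated by two thirds; if the underestimated net sits on the critical path, the predicted timing is optimistic.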

SLIDE 68

Correlation Pre/Post-P&R Averages and Wire Loads

Plot: Distribution of C / fan-out. Histogram of number of nets (up to 30000) versus pF per fan-out, with the median and mean marked.

SLIDE 69

Correlation Pre/Post-P&R Cwire Data by Logic Design

Plot: Cwire versus number of fan-outs.

SLIDE 70

Better Wire Load Models

  • How can we use information from one pass through physical design?
  • Adjust wire load model coefficients
  • Back annotate specific net load and delay data to the logic design
  • New problem: correlation of logic pre- and post-synthesis
  • But, there are fundamental limits to statistical models: a new approach is needed.
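Adjusting wire load model coefficients from one pass through physical design can be sketched as re-fitting the per-fanout averages from extracted data. The data format and function name are assumptions; a production flow would read extracted parasitics from the layout tool.

```python
from collections import defaultdict
from statistics import mean


def fit_wlm(extracted):
    """extracted: (fanout, capacitance) pairs from one placed or routed pass.
    Returns the average capacitance observed for each fanout."""
    by_fanout = defaultdict(list)
    for fanout, cap in extracted:
        by_fanout[fanout].append(cap)
    return {fanout: mean(caps) for fanout, caps in sorted(by_fanout.items())}


# Hypothetical extraction results (capacitance in pF).
extracted = [(1, 0.02), (1, 0.04), (2, 0.05), (2, 0.07), (2, 0.09), (4, 0.20)]
model = fit_wlm(extracted)
```

The fitted averages are design-specific, but they are still averages, so the spread within each fanout bucket remains invisible to synthesis.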

SLIDE 71

A better approach: Combine Synthesis, P & R

  • Don’t use wire load models at all
  • Synthesis does a trial placement as it runs
      - Loading found from estimated routes
  • Must include global routing
      - Then, feed global route to detailed router
      - Or, do detailed route itself
  • Much better correlation and timing closure
  • No inter-tool data transfer headaches

SLIDE 72

Example of Combined SP&R

Video Graphics Engine

  • 160k instances
  • 70 macros (blocks)
  • 5 layers, 0.18 micron
  • Target freq: 100 MHz

SLIDE 73

Conventional Flow

  • More than 20 Iterations
  • 89MHz best result w/manual changes

Flow diagram elements: Synthesis (DC); Static Timing (PT); syn2GCF; SE placement-based optimization; Global route; Detail route; Extraction; DRC; Delay calc (Pearl). Inputs: Floorplan DEF; Func. & Timing .lib; Func. & Timing .TLF; Physical LEF.

slide-74
SLIDE 74

ASP-DAC'01 Lou Scheffer II-74

Combined SP&R Flow

[Flow diagram. Box labels: SE-PKS, EDIF netlist, Floorplan DEF, PKS Optimization, Global Route, Static Timing, Detail route, Extraction, DRC, Delay calc, TCL Constraints (write_constraints). Library inputs: .TLF (Func. & Timing), Physical LEF. Tool labels: Pearl, HE, PT.]

  • 100 MHz final result, met timing
  • Correlation within ±2.1%
  • One pass
  • 12 hrs 20 min runtime
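The slide does not say how the ±2.1% correlation was measured. One plausible reading, sketched below purely as an assumption, is the percentage difference between the timing predicted before detailed routing and the timing measured after it. The function names and path delays are invented for illustration.

```python
def timing_error_percent(predicted_ns, routed_ns):
    """Signed error of the pre-route prediction relative to routed timing."""
    return 100.0 * (predicted_ns - routed_ns) / routed_ns


def within_correlation(pairs, bound_percent):
    """True if every (predicted, routed) pair agrees within +/- bound_percent."""
    return all(abs(timing_error_percent(p, r)) <= bound_percent
               for p, r in pairs)


# Made-up path delays in ns: (predicted before routing, measured after routing).
paths = [(10.2, 10.0), (9.9, 10.0), (7.05, 7.0)]
print(within_correlation(paths, bound_percent=2.1))
```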

slide-75
SLIDE 75

ASP-DAC'01 Lou Scheffer II-75

Slack Correlation

[Plot: slack correlation. Series labels: Wire Load Based, PKS, Routed.]

slide-76
SLIDE 76

ASP-DAC'01 Lou Scheffer II-76

Enlargement of SP&R slack

slide-77
SLIDE 77

ASP-DAC'01 Lou Scheffer II-77

Results from combined SP&R

Case  Size (k instances)  Macros  PKS timing error (%)  Max freq, conventional (MHz)  Max freq, SP&R (MHz)
1     350                 56      ±3                    140                           140
2     250                 50      ±3                    97                            100
3     50                  4       ±0.96                 93                            95
4     160                 70      ±2.1                  89                            100
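The frequency figures in the table can be turned into relative gains and cycle times with a few lines of arithmetic. The sketch below copies the conventional and SP&R frequencies from the table. The helper names are made up for illustration.

```python
# (case, conventional_mhz, spr_mhz) copied from the results table.
RESULTS = [(1, 140, 140), (2, 97, 100), (3, 93, 95), (4, 89, 100)]


def frequency_gain_percent(conventional_mhz, spr_mhz):
    """Relative improvement of SP&R over the conventional flow."""
    return 100.0 * (spr_mhz - conventional_mhz) / conventional_mhz


def cycle_time_ns(freq_mhz):
    """Clock period in ns for a frequency given in MHz."""
    return 1000.0 / freq_mhz


gains = {case: frequency_gain_percent(conv, spr) for case, conv, spr in RESULTS}
for case, conv, spr in RESULTS:
    print(f"case {case}: {conv} -> {spr} MHz, "
          f"{cycle_time_ns(conv):.2f} -> {cycle_time_ns(spr):.2f} ns, "
          f"gain {gains[case]:.1f}%")
```

Case 4 is the video graphics engine from the earlier example, where the conventional flow stopped at 89 MHz and the combined flow met the 100 MHz target.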

slide-78
SLIDE 78

ASP-DAC'01 Lou Scheffer II-78

Agenda

  • Traditional design flows
  • Summary of DSM Problems
  • Analysis Methods Overview
  • Correction Methods Overview
  • Approaches to Fixing Timing Closure
  • Experimental Results
  • Summary

slide-79
SLIDE 79

ASP-DAC'01 Lou Scheffer II-79

How do the approaches compare?

  • Jay McDougal of Agilent ran many flows on the same design
  • Overconstrain clock by various amounts
  • Accurate or conservative WLMs
      • Tried many levels of conservatism
  • Allow placer to size or not
  • Do post placement optimization or not
  • Physically knowledgeable synthesis

slide-80
SLIDE 80

ASP-DAC'01 Lou Scheffer II-80

Characteristics of sample design

  • Design not very difficult
      • ColdFire processor
      • 80K instances
      • 0.25 micron library
      • 5 layer process, not congestion dominated
      • Design goal was 180 MHz, known to be possible with this design
      • 85% of delay in gates; 15% in interconnect
  • 0.18/0.13 micron, bigger designs will show bigger differences between techniques
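For reference, the 180 MHz goal can be converted to a required cycle time, and an overconstrained clock can be derived from it. The sketch below assumes that overconstraining means shrinking the target period by a fixed fraction. That convention and the helper names are illustrative assumptions, not the procedure used in the experiment.

```python
def required_period_ns(freq_mhz):
    """Clock period in ns for a target frequency in MHz."""
    return 1000.0 / freq_mhz


def overconstrained_period_ns(period_ns, fraction):
    """Tighter period handed to synthesis, e.g. fraction=0.10 for 10%."""
    return period_ns * (1.0 - fraction)


goal = required_period_ns(180)                 # about 5.56 ns
tight = overconstrained_period_ns(goal, 0.10)  # 5.0 ns
print(f"required {goal:.2f} ns, overconstrained {tight:.2f} ns")
```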

slide-81
SLIDE 81

ASP-DAC'01 Lou Scheffer II-81

Key to the plot of results

  • Basic flow: Design Compiler & Qplace
  • TDD = timing driven design
      • In addition to minimizing wire length and congestion, placer is given timing constraints and allowed to change gate sizes
  • IPO and PBO are post placement optimizers
      • IPO: runs on synthesis DB with back annotation
      • PBO: runs on physical DB with synthesis transforms
  • PKS = Physically Knowledgeable Synthesis (combined Synthesis/Place/Route)

slide-82
SLIDE 82

ASP-DAC'01 Lou Scheffer II-82

Comparison of Approaches

[Plot. Axes: relative size (ticks 0.95 to 1.25) and clock cycle achieved (ticks 5 to 9.5). Legend: No WLM, 90% WLM, 3ns;50%WL, IPO 5ns NoWL, IPO 3ns NoWL, TDD/PBO 50%WL, TDD/PBO 90%WL, PKS. Reference line: required cycle time.]

slide-83
SLIDE 83

ASP-DAC'01 Lou Scheffer II-83

Comparison of Approaches

[Plot. Axes: relative size (ticks 0.95 to 1.25) and clock cycle achieved (ticks 5 to 9.5). Legend: No WLM, 90% WLM, 3ns;50%WL, IPO 5ns NoWL, IPO 3ns NoWL, TDD/PBO 50%WL, TDD/PBO 90%WL, PKS. Two callouts:]

  • Good area, but iterates between placement and synthesis, worst TTM, didn’t hit timing target
  • One tool, no iteration, better TTM, hit timing target

slide-84
SLIDE 84

ASP-DAC'01 Lou Scheffer II-84

Agenda

  • Traditional design flows
  • Summary of DSM Problems
  • Analysis Methods Overview
  • Correction Methods Overview
  • Approaches to Fixing Timing Closure
  • Experimental Results
  • Summary

slide-85
SLIDE 85

ASP-DAC'01 Lou Scheffer II-85

Good News

  • At least we understand the problem
      • Analysis of timing is well understood
      • Transformations that help timing are well understood
      • DSM effects are painful but can be controlled

slide-86
SLIDE 86

ASP-DAC'01 Lou Scheffer II-86

Bad News

  • Cycle time and technology advances demand more and more sophisticated optimization techniques
  • In previous flows, corrections must be applied in separate tools
  • Disconnects among the various tools involved increase turn-around time and limit optimization

slide-87
SLIDE 87

ASP-DAC'01 Lou Scheffer II-87

Good News

  • The Bad News is commonly recognized
  • Many tool vendors, academics, and in-house EDA researchers are working to solve these problems
  • A new generation of tools is already available that was designed from the ground up to address timing closure

slide-88
SLIDE 88

ASP-DAC'01 Lou Scheffer II-88

Bad News

  • These problems won’t be the last!
  • Each process generation brings new problems
      • Increased size
      • Weird process rules (antenna)
      • Possible new effects (single event upset)

slide-89
SLIDE 89

ASP-DAC'01 Lou Scheffer II-89

Summary

  • Timing closure is a very real problem
  • Incremental improvements help somewhat, but limiting factor is…
  • If synthesis does not understand placement, it must use wire load models, which have serious limitations
  • Best approach is combined synthesis/P&R
  • Experimental data backs this up

slide-90
SLIDE 90

ASP-DAC'01 Lou Scheffer II-90

Acknowledgements

  • Tony Drumm wrote the original set of slides for this lecture, including many of the examples. He credits:
      • Alex Suess
      • José Neves
      • Bill Joyner
      • IBM Rochester EDA folks
  • But the conclusions, and any mistakes, are mine

slide-91
SLIDE 91

ASP-DAC'01 Lou Scheffer II-91