Analysis and Optimization of Global Interconnects, Sachin Sapatnekar



slide-1
SLIDE 1

Analysis and Optimization of Global Interconnects

Sachin Sapatnekar ECE Department University of Minnesota Minneapolis, MN, USA sachin@umn.edu

slide-2
SLIDE 2

Acknowledgements

Many slides borrowed from

  • Chuck Alpert, IBM
  • Jiang Hu, Texas A&M
  • Prashant Saxena, Synopsys

2

slide-3
SLIDE 3

Outline of the talk

  • Interconnect delay metrics
  • Interconnects and scaling theory
  • Synthesis of signal interconnects
  • Noise and congestion issues

3

slide-4
SLIDE 4

Simple delay metrics

4

slide-5
SLIDE 5

Interconnect modeling

  • Precise model requires transmission line analysis
  • Break up wire into segments
  • Each segment (of length dx) can be modeled as

– L-model: series R(+sL), shunt C
– π-model: shunt C/2, series R(+sL), shunt C/2
– T-model: series R/2(+sL/2), shunt C, series R/2(+sL/2)

  • Other issues (crosstalk etc.) modeled using coupling caps
  • Interconnect extraction

– Most precise with a 3-D field solver (takes a long time!)
– Other faster approximate techniques useful for design analysis/optimization (R per square, C per unit area, 2.5-D models)

5

slide-6
SLIDE 6

Gate delay models

  • Traditionally: assume that the gate drives a capacitor

– Build macromodels for individual gates

  • Delay = f(widths, transition times, loads)
  • Example: K-factor equations
  • Similar idea used in standard cell characterization: Delay = f(transition times, load)

– Table lookup models: storage/accuracy tradeoff (e.g. .lib format)
– Fast circuit simulation – used in many delay calculators

  • More recently: effective capacitances, current source/voltage source models

6

slide-7
SLIDE 7

RC delay calculations

  • Delays can be calculated easily
  • For example: RC driven by a step excitation

Response: $V(t) = 1 - e^{-t/RC}$
Time constant = RC

Time constants for more complicated circuits?

[Figure: RC circuit with resistor R, capacitor C, and output V(t)]

7

slide-8
SLIDE 8

Elmore delay for an RC tree

$$T_{D,k} = \sum_{i \in \mathrm{Path}(k)} R_i \sum_{j \in \mathrm{downstream}(i)} C_j$$

[Figure: RC tree from the root with resistors Ra to Re and capacitors Ca to Ce]

Elmore delay from the root to node e
= Ra(Ca + Cb + Cc + Cd + Ce) + Rb(Cb + Cd + Ce) + Re·Ce
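The Elmore sum can be checked mechanically. The following is a minimal sketch, not the deck's own code: the tree topology (b and c hang off a; d and e hang off b) is inferred from the expression for node e, and the component values and the function name `elmore_delay` are purely illustrative.

```python
def elmore_delay(parent, R, C, sink):
    """Elmore delay from the root to `sink` in an RC tree.

    parent[n] is the upstream node of n (None for the node next to the root),
    R[n] is the resistance of the edge entering n, C[n] the capacitance at n.
    """
    def depth(n):
        d = 0
        while parent[n] is not None:
            n = parent[n]
            d += 1
        return d

    # Downstream capacitance = own capacitance plus that of all descendants.
    down = dict(C)
    for n in sorted(parent, key=depth, reverse=True):  # deepest nodes first
        if parent[n] is not None:
            down[parent[n]] += down[n]

    # Walk from the sink up to the root, adding R_i * C_downstream(i).
    delay, n = 0.0, sink
    while n is not None:
        delay += R[n] * down[n]
        n = parent[n]
    return delay


# Assumed topology and illustrative values (not from the slides).
parent = {"a": None, "b": "a", "c": "a", "d": "b", "e": "b"}
R = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0, "e": 5.0}
C = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0, "e": 1.0}
t_e = elmore_delay(parent, R, C, "e")
print(t_e)
```

With these values the result agrees with Ra(Ca+Cb+Cc+Cd+Ce) + Rb(Cb+Cd+Ce) + Re·Ce evaluated by hand.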

8

slide-9
SLIDE 9

Incrementally calculating the Elmore delay

[Figure: chain of nodes A, B, C; R1 between A and B, R2 between B and C; C1 at node B, C2 at node C]

$$\mathrm{Delay}(A \to C) = R_1 (C_1 + C_2) + R_2 C_2$$

9

slide-10
SLIDE 10

Model order reduction methods

  • Elmore delay: RC transfer function approximated with a single pole

$$H(s) \approx \frac{a_0}{b_0 + b_1 s}$$

  • Can approximate RC circuit transfer function as

$$H(s) \approx \frac{a_0 + a_1 s + \dots + a_{n-1} s^{n-1}}{b_0 + b_1 s + \dots + b_{n-1} s^{n-1} + b_n s^n}$$

– Response approximated as a sum of exponentials
– Useful for interconnect simulation
– Other variants: PVL, PRIMA, etc.
– Handles linear systems, but drivers may be nonlinear

[Figure: waveforms e(t) and e'(t) plotted against t, with td marked]

10

slide-11
SLIDE 11

Effective capacitance model

  • Includes the effects of gate nonlinearities
  • Gate driving RC interconnect

– Determine waveform at gate output; analyze interconnect as a linear system after that

  • Possible model for waveform at x

– Gate driving total capacitance of net?

  • Gives erroneous results due to resistive shielding

– Actual effective capacitance < total wiring capacitance
– Techniques exist for determining Ceffective, or modeling the gate using a voltage/current source

[Figure: gate output x driving a load made of C1, R, and C2]

11

slide-12
SLIDE 12

Computing Ceff: Overall flow

(1) Initialize Cnew = Ctot
(2) Set Ceff = Cnew
(3) Compute Thevenin model at Ceff
(4) Match charge to get Cnew
(5) Ceff = Cnew? If no, return to (2); if yes, compute delay, slew

[C. Kashyap]

slide-13
SLIDE 13

Current source model

  • Represents the transistor I-V curve as a function of input slew and output load
  • Linear Thevenin driver
  • CCSM (Synopsys), ECSM (Cadence)

[Figure: Thevenin driver with resistance rd, delay = f(slew, Cload), alongside a current source driver with Iout = f(slew, Cload), producing Vout]

[Amin, DAC06]

13

slide-14
SLIDE 14

Wire tapering and layer assignment

  • Elmore delay

$$T_{D,k} = \sum_{i \in \mathrm{Path}(k)} R_i \sum_{j \in \mathrm{downstream}(i)} C_j$$

– Wires near the root must have low resistances
– Wires near the leaves must have low capacitances
– Wider wires near root, narrower near leaves

  • In practice: # of wire widths limited to two or three
  • Same principle applies to layer assignment

slide-15
SLIDE 15

Simple buffer insertion problem

Given: Source and sink locations, sink capacitances and RATs (required arrival times), a buffer type, source delay rules, unit wire resistance and capacitance

[Figure: source s0 driving four sinks with RAT1 to RAT4, and a buffer]

15

slide-16
SLIDE 16

Simple buffer insertion problem

Find: Buffer locations and a routing tree such that slack at the source is maximized

$$q(s_0) = \min_{1 \le i \le 4} \{ \mathrm{RAT}(s_i) - \mathrm{delay}(s_0, s_i) \}$$

[Figure: source s0 and sinks with RAT1 to RAT4]

slide-17
SLIDE 17

Slack example

First tree: slack = -200

– Sink with RAT = 500, delay = 400
– Sink with RAT = 400, delay = 600

Second tree: slack = +100

– Sink with RAT = 500, delay = 350
– Sink with RAT = 400, delay = 300
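The two slack values follow directly from taking the minimum of RAT minus delay over the sinks. A small sketch using the numbers from this slide (the function name `source_slack` is just an illustrative choice):

```python
def source_slack(sinks):
    """Slack at the source: the minimum of (RAT - delay) over all sinks."""
    return min(rat - delay for rat, delay in sinks)


# (RAT, delay) pairs for the two trees in the slack example.
first_tree = [(500, 400), (400, 600)]
second_tree = [(500, 350), (400, 300)]

print(source_slack(first_tree))   # limited by the sink with RAT 400, delay 600
print(source_slack(second_tree))  # limited by the sink with RAT 400, delay 300
```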

17

slide-18
SLIDE 18

Interconnects and Scaling Theory

slide-19
SLIDE 19

A scaling primer

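[Figure: transistor (S, G, D) before and after shrink; wires with width w, height h, spacing S, and length l, and their scaled counterparts wσ, hσ, Sσ, lσ]

  • Ideal process scaling:

– Device geometries shrink by σ (= 0.7x)

  • Device delay shrinks by σ

– Wire geometries shrink by σ

  • Resistance (wire of fixed length l): $\rho l / (w\sigma \cdot h\sigma) = R/\sigma^2$
  • Coupling cap: $\varepsilon (h\sigma) l / (S\sigma)$ = same
  • Capacitance to ground: similar

  • In each process generation, R doubles; C and Cc unchanged
  • But it doesn’t quite work that way
  • h scales by less than σ to control R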

slide-20
SLIDE 20

Block scaling

  • Block area often stays same

– # cells, # nets doubles

  • Wiring histogram shape (almost) invariant

– Global interconnect lengths don’t shrink – Local interconnect lengths shrink by σ

slide-21
SLIDE 21

A typical chip cross-section

  • Wires become “fatter” as you move to upper layers
  • From one technology to the next, wire aspect ratios become more skewed
  • R is controlled, at the expense of coupling capacitance

[Intel]

21

slide-22
SLIDE 22

The role of interconnects

  • Short interconnect

– Used to connect nearby cells, Rdriver >> Rinterconnect
– Minimize wire C, i.e., use short minwidth wires

  • Medium to long-distance (“global”) interconnect

– Rdriver ≈ Rinterconnect
– Size wires to trade off area vs. delay
– Increasing width ⇒ capacitance increases, resistance decreases
– Need to find acceptable tradeoff: the wire sizing problem

  • “Fat” wires

– Thicker cross-sections in higher metal layers
– Useful for reducing delays for global wires
– Inductance issues, sharing of limited resource

slide-23
SLIDE 23

Interconnect delay scaling

  • Delay of a wire of length l: $\tau_{int} = (rl)(cl) = rcl^2$ (first order)
  • Local interconnects: $\tau_{int} \to (r/\sigma^2)(c)(l\sigma)^2 = rcl^2$

– Local interconnect delay unchanged (but devices get faster)

  • Global interconnects: $\tau_{int} \to (r/\sigma^2)(c)(l)^2 = rcl^2/\sigma^2$

– Global interconnect delay doubles – unsustainable!
– Problem somewhat mitigated using buffers, using nonideal scaling as outlined earlier

  • Interconnect delay increasingly dominant
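The two scaling results can be reproduced numerically. This sketch assumes ideal scaling (r grows by 1/σ², c unchanged, only local wires shrink in length) with normalised, illustrative starting values; `wire_delay` and `scale_generation` are hypothetical helper names.

```python
SIGMA = 0.7  # ideal per-generation shrink factor


def wire_delay(r, c, length):
    """First-order wire delay: (r * l) * (c * l)."""
    return (r * length) * (c * length)


def scale_generation(r, c, length, is_local, sigma=SIGMA):
    """Ideal scaling: r -> r / sigma^2, c unchanged; only local wires get shorter."""
    new_length = length * sigma if is_local else length
    return r / sigma**2, c, new_length


r, c, l = 1.0, 1.0, 1.0  # normalised, illustrative values
base = wire_delay(r, c, l)
local_ratio = wire_delay(*scale_generation(r, c, l, is_local=True)) / base
global_ratio = wire_delay(*scale_generation(r, c, l, is_local=False)) / base

print(local_ratio)   # about 1: local wire delay is unchanged
print(global_ratio)  # about 2.04: global wire delay roughly doubles
```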
slide-24
SLIDE 24

ITRS projections

[Figure: relative delay (log scale, 0.1 to 100) versus feature size (250, 180, 130, 90, 65, 45, 32 nm) for gate delay (fanout 4), local interconnect (M1,2), global interconnect with repeaters, and global interconnect without repeaters. Source: ITRS, 2003]

[Figure: ITRS ILD roadmap evolution: effective k versus technology node (0.25 to 0.045 µm) for the 1997, 1999, and 2003 ITRS, compared with the actual industry trend. Source: Chia Hong Jan, IEDM 2003 Interconnect Short Course]

ITRS projections are often a “best case scenario” projection

slide-25
SLIDE 25

Buffer insertion

  • Consider [Figure: line driven by source Vs]

  • A buffer effectively isolates the downstream capacitance

25

slide-26
SLIDE 26

Optimizing medium/long interconnects

  • Delays of interconnects may become very large
  • Wire sizing helps to control the delay
  • Repeater insertion is another effective technique
  • Effects of a buffer

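– Isolates load capacitances of different “stages”
– Adds a delay

[Figure: driver with resistance Rdriver feeding two subtrees with capacitances CL1 and CL2, shown without a buffer and with a buffer (input capacitance Cbuf) in front of the CL2 subtree. With the buffer, the downstream capacitance is CL1 + Cbuf (CL2 is isolated by the buffer)]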

slide-27
SLIDE 27

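Buffered global interconnects: Intuition

Unbuffered wire of length l: interconnect delay = $rcl^2$

With buffers dividing the wire into segments $l_1, l_2, \dots, l_n$ (where $l = \sum_j l_j$):

$$\text{interconnect delay} = \sum_j rc\,l_j^2 < rcl^2, \quad \text{since } \sum_j l_j^2 < \Big(\sum_j l_j\Big)^2$$

(Of course, account for intrinsic buffer delay also)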

slide-28
SLIDE 28

More precise analysis: Optimal inter-buffer length

  • First order (lumped parasitic, Elmore delay) analysis
  • Assume N identical buffers with equal inter-buffer length l (L = Nl)
$$T = N\left[R_d (C_g + cl) + rl\,(C_g + cl)\right] = L\left[rcl + (rC_g + R_d c) + \frac{1}{l} R_d C_g\right]$$

  • For minimum delay,

$$\frac{dT}{dl} = 0 \;\Rightarrow\; L\left[rc - \frac{R_d C_g}{l_{opt}^2}\right] = 0 \;\Rightarrow\; l_{opt} = \sqrt{\frac{R_d C_g}{rc}}$$

Rd – On resistance of inverter
Cg – Gate input capacitance
r, c – Resistance, cap. per micron

[Figure: line of total length L with identical buffers (Rd, Cg) spaced l apart]

slide-29
SLIDE 29

Optimal interconnect delay

  • Substituting lopt back into the interconnect delay expression:

$$T_{opt} = L\left[rc\,l_{opt} + (rC_g + R_d c) + \frac{1}{l_{opt}} R_d C_g\right] = L\left[2\sqrt{R_d C_g\, rc} + rC_g + R_d c\right]$$

  • Delay grows linearly with L (instead of quadratically)
  • Buffer-to-buffer spacing reduces in successive technology nodes, since $l_{opt} = \sqrt{R_d C_g / (rc)}$

[Figure: inter-buffer distances d and dσ, comparing a “dumb shrink” with a “smart shrink”]
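A quick numerical check of the first-order result: under the lumped Elmore model with N = L/l identical buffers, the delay should be smallest at l = sqrt(Rd·Cg/(r·c)) and equal L·[2·sqrt(Rd·Cg·r·c) + r·Cg + Rd·c] there. The parameter values below are illustrative assumptions, not process data, and the function names are hypothetical.

```python
import math


def buffered_delay(L, l, Rd, Cg, r, c):
    """Delay of a line of length L with identical buffers every l (N = L / l)."""
    n = L / l
    return n * (Rd * (Cg + c * l) + r * l * (Cg + c * l))


def optimal_spacing(Rd, Cg, r, c):
    """Inter-buffer length that minimizes buffered_delay."""
    return math.sqrt(Rd * Cg / (r * c))


def optimal_delay(L, Rd, Cg, r, c):
    """Closed-form delay at the optimal spacing; linear in L."""
    return L * (2.0 * math.sqrt(Rd * Cg * r * c) + r * Cg + Rd * c)


# Illustrative values: ohms, farads, ohms/micron, farads/micron, microns.
Rd, Cg, r, c, L = 1000.0, 2e-15, 0.1, 0.2e-15, 1e5

l_opt = optimal_spacing(Rd, Cg, r, c)
t_opt = buffered_delay(L, l_opt, Rd, Cg, r, c)
t_single_stage = buffered_delay(L, L, Rd, Cg, r, c)  # one driver, no repeaters

print(l_opt, t_opt, t_single_stage)
```

Evaluating `buffered_delay` at spacings on either side of `l_opt` gives larger delays, which is a simple way to sanity-check the derivative condition.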

slide-30
SLIDE 30

Critical inter-buffer lengths

  • Study based on exhaustive SPICE simulation and projected process files (Saxena et al. TCAD’04)
  • Buffers optimally sized (uniformly) for min delay

– Critical length: min distance at which inserting a buffer speeds up the line

  • “Ideally shrunk” circuit requires additional buffers (0.7x vs 0.57x)
  • In line with scaling theory: $\sigma\sqrt{\sigma} = 0.586$

[Figure: relative critical inter-buffer length (0.1 to 1) for M3 and M6 at the 90nm, 65nm, 45nm, and 32nm nodes; the trend is about 0.57x per node]
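The 0.586 figure can be reproduced from first-order scaling. The sketch below assumes ideal scaling, in which the buffer term $R_d C_g$ shrinks by σ (device delay) while the per-unit-length wire product $rc$ grows by $1/\sigma^2$, and uses the first-order optimal spacing $l_{opt} = \sqrt{R_d C_g / (rc)}$; primes denote next-generation values.

```latex
l_{opt}' = \sqrt{\frac{\sigma\, R_d C_g}{rc / \sigma^2}}
         = \sigma^{3/2} \sqrt{\frac{R_d C_g}{rc}}
         = \sigma\sqrt{\sigma}\; l_{opt}
         \approx 0.586\, l_{opt} \qquad (\sigma = 0.7)
```

Since block dimensions ideally shrink by only 0.7x, buffers must be placed more densely than a plain shrink would provide.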

slide-31
SLIDE 31

Buffer planning needed!

Past Present/Future

31

slide-32
SLIDE 32

Buffer block planning

32

slide-33
SLIDE 33

Buffer block planning

33

slide-34
SLIDE 34

Critical sequential lengths

  • Optimized for max distance in one clock period
  • Assumes:

– 2x frequency scaling
– Setup, hold, and skew are ignored

  • Even with 1.4x (“Moore”) frequency scaling, critical seq. lengths shrink at ~0.62x
  • “Ideally shrunk” circuit requires much new wire pipelining (0.7x vs 0.43x / 0.62x)

[Figure: relative critical seq. length for M3 and M6 at the 90nm, 65nm, 45nm, and 32nm nodes; the trend is about 0.43x per node]

slide-35
SLIDE 35

Architectural impact

  • Example processor floorplan shown below
  • Layout decisions affect # clock cycles required to convey a signal

– Architectural decisions must be made hand-in-hand with layout

35

slide-36
SLIDE 36

Longer term solution: architectural changes

  • Simplify interconnection complexity architecturally

– Modify wiring histogram shape (i.e. Rent’s parameters) of design

  • An example: multi-core microprocessors

– Goes counter to traditional approach of increased integration through block size scaling

[Figure: wiring histogram, # wires versus wirelength]

36

slide-37
SLIDE 37

Synthesis of Signal Interconnects

slide-38
SLIDE 38

Signal interconnect synthesis

  • Interconnect topology generation
  • Interconnect delay optimization
  • Noise optimization
  • Bus design
  • Congestion considerations
slide-39
SLIDE 39

Van Ginneken’s classic algorithm

  • Optimal for multi-sink nets
  • Quadratic runtime
  • Bottom-up from sinks to source
  • Generate list of candidates at each node
  • At source, pick the best candidate in list

39

slide-40
SLIDE 40

Key assumptions

  • Given routing tree
  • Given potential insertion points

40

slide-41
SLIDE 41

Generating candidates

(1) (2) (3)

41

slide-42
SLIDE 42

Pruning candidates

Both (a) and (b) “look” the same to the source. Throw out the one with the worse slack.

[Figure: candidates (a) and (b) at step (3); pruned result at step (4)]
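Pruning can be sketched as a dominance filter over (capacitance, slack) pairs: a candidate is discarded when another candidate has no more capacitance and at least as much slack. This is an illustrative implementation, not the deck's code; the sample candidates are the ones that appear in the Van Ginneken example later in the deck.

```python
def prune(candidates):
    """Keep only non-dominated (cap, slack) candidates.

    A candidate is inferior if some other candidate has cap <= its cap and
    slack >= its slack. Survivors are returned with increasing cap and slack.
    """
    survivors = []
    # Sort by cap ascending; among equal caps, try the larger slack first.
    for cap, slack in sorted(candidates, key=lambda cs: (cs[0], -cs[1])):
        if not survivors or slack > survivors[-1][1]:
            survivors.append((cap, slack))
    return survivors


kept = prune([(45, 50), (5, 0), (20, 100), (5, 70)])
print(kept)
```

Because the survivors come out sorted by both capacitance and slack, later steps can walk the list linearly.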

42

slide-43
SLIDE 43

Candidate example (continued)

(4) (5)

43

slide-44
SLIDE 44

Candidate example (continued)

[Figure: candidates at step (5), after pruning]

At driver, compute which candidate maximizes slack. Result is optimal.

44

slide-45
SLIDE 45

Merging branches

Right Candidates Left Candidates

45

slide-46
SLIDE 46

Pruning merged branches

Critical With pruning

46

slide-47
SLIDE 47

Combining the Options

  • Draw a plot of all (Ck, Dk) pairs for both children m and n (assuming a binary tree)

[Figure: plots of D(m) versus C(m) and D(n) versus C(n), each with candidate points 1 to 7, and the resulting plot of D(combined) versus C(combined)]
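One common way to combine two children is a linear walk over their pruned candidate lists, adding capacitances and taking the smaller slack. The sketch below assumes each list holds (cap, slack) pairs sorted by increasing cap and slack; `merge_branches` and the sample lists are illustrative.

```python
def merge_branches(left, right):
    """Merge two pruned candidate lists at a branch point.

    Combined cap = cap_left + cap_right; combined slack = min of the two slacks.
    The walk produces at most len(left) + len(right) - 1 candidates.
    """
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        (cap_l, q_l), (cap_r, q_r) = left[i], right[j]
        merged.append((cap_l + cap_r, min(q_l, q_r)))
        # Advance the branch that limits the slack; advancing the other one
        # would only add capacitance without improving slack.
        if q_l < q_r:
            i += 1
        elif q_r < q_l:
            j += 1
        else:
            i += 1
            j += 1
    return merged


combined = merge_branches([(1, 10), (3, 30)], [(2, 20), (5, 40)])
print(combined)
```

The candidate count grows additively rather than multiplicatively, which is what keeps the overall runtime quadratic.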

slide-48
SLIDE 48

Van Ginneken example

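Candidates are (capacitance, slack) pairs, starting from (20,400) at the sink.

First wire and insertion point:

– Wire C=10, d=150: (20,400) becomes (30,250)
– Buffer C=5, d=30: (30,250) becomes (5,220)

Second wire and insertion point:

– Wire C=15, d=200: (30,250) becomes (45,50); buffer C=5, d=50 then gives (5,0)
– Wire C=15, d=120: (5,220) becomes (20,100); buffer C=5, d=30 then gives (5,70)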

slide-49
SLIDE 49

Van Ginneken example (continued)

Candidates before pruning: (45,50), (5,0), (20,100), (5,70)

  • (5,0) is inferior to (5,70)
  • (45,50) is inferior to (20,100)

Candidates after pruning: (20,100), (5,70)

Next wire, C=10:

– (20,100) becomes (30,10)
– (5,70) becomes (15,-10)

Pick solution with largest slack, here (30,10), and follow arrows to get solution

49

slide-50
SLIDE 50

Van Ginneken recap

  • Generate candidates from sinks to source
  • Quadratic runtime

– Adding a buffer adds only one new candidate
– Merging branches additive, not multiplicative

  • Optimal for Elmore delay model
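For a single source-to-sink wire, the whole bottom-up procedure fits in a few lines. This is a minimal sketch under stated assumptions, not the original algorithm listing: the wire is split into equal segments with an insertion point between consecutive segments, wire delay uses the Elmore model, the buffer delay is modeled as `buf_delay + buf_res * load`, and all names and values are illustrative.

```python
def prune(cands):
    """Drop candidates dominated in (cap, slack); keep the buffer record."""
    out = []
    for cap, q, bufs in sorted(cands, key=lambda t: (t[0], -t[1])):
        if not out or q > out[-1][1]:
            out.append((cap, q, bufs))
    return out


def van_ginneken_two_pin(n_segments, seg_r, seg_c, sink_cap, sink_rat,
                         buf_cap, buf_res, buf_delay, drv_res):
    """Return (best slack at source, buffer positions) for a two-pin net.

    Position k means a buffer at the upstream end of segment k, with
    segments counted from the sink starting at 0.
    """
    cands = [(sink_cap, sink_rat, ())]
    for k in range(n_segments):
        # Add one wire segment (Elmore delay of the segment into its load).
        cands = [(cap + seg_c, q - seg_r * (seg_c / 2 + cap), bufs)
                 for cap, q, bufs in cands]
        if k < n_segments - 1:  # insertion points lie between segments
            # Only the best buffered candidate needs to be added.
            cap, q, bufs = max(cands, key=lambda t: t[1] - buf_delay - buf_res * t[0])
            cands.append((buf_cap, q - buf_delay - buf_res * cap, bufs + (k,)))
        cands = prune(cands)
    # At the source, charge the driver delay and pick the largest slack.
    return max(((q - drv_res * cap, bufs) for cap, q, bufs in cands),
               key=lambda t: t[0])


slack, buffers = van_ginneken_two_pin(
    n_segments=3, seg_r=1, seg_c=2, sink_cap=1, sink_rat=100,
    buf_cap=1, buf_res=2, buf_delay=1, drv_res=3)
print(slack, buffers)
```

With these illustrative numbers the unbuffered candidate reaches the source with slack 67, while a single buffer two segments up from the sink gives slack 72.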

50

slide-51
SLIDE 51

Extensions

  • Multiple buffer types
  • Inverters
  • Polarity constraints
  • Controlling buffer resources
  • Capacitance constraints
  • Blockage recognition
  • Wire sizing

51

slide-52
SLIDE 52

Multiple buffer types

[Figure: steps (1) and (2) with multiple buffer types]

Time complexity increases from O(n²) to O(n²B²), where B is the number of different buffer types

52

slide-53
SLIDE 53

Inverters

(1) (2)

  • Maintain a “+” and a “-” list of candidates
  • Only merge branches with same polarity
  • Throw out negative candidates at source

53

slide-54
SLIDE 54

Polarity constraints

  • Some sinks are positive, some negative
  • Put negative sinks into “-” list

[Figure: tree with sinks assigned to “-” and “+” lists]

54

slide-55
SLIDE 55

Controlling buffering resources

Before: maintain a list of (capacitance, slack) pairs

(C1, q1), (C2, q2), (C3, q3), (C4, q4), (C5, q5), (C6, q6), (C7, q7), (C8, q8), (C9, q9)

Now: store an array of lists, indexed by # of buffers

3: (C1, q1, 3), (C2, q2, 3), (C3, q3, 3)
2: (C4, q4, 2), (C5, q5, 2)
1: (C6, q6, 1), (C7, q7, 1), (C8, q8, 1)
0: (C9, q9, 0)

Prune candidates with inferior cap, slack, and # buffers

55

slide-56
SLIDE 56

Buffering resource trade-off

[Figure: slack (ps) versus # of buffers (1 to 7)]

56

slide-57
SLIDE 57

Blockage recognition

Delete insertion points that run over blockages

57

slide-58
SLIDE 58

Other extensions

  • Modeling effective capacitance
  • Higher-order interconnect delay
  • Slew constraints
  • Noise constraints

58

slide-59
SLIDE 59

π-models

  • Van Ginneken candidate: (Cap, slack)

[Figure: lumped capacitance C replaced by a π-model with Cn, R, and Cf]

  • Replace Cap with π-model (Cn, R, Cf)
  • Total capacitance preserved: Cn + Cf = C
  • R represents degree of resistive shielding

59

slide-60
SLIDE 60

Computing gate delay

  • When inserting buffer, compute effective capacitance (Ceff) from π-model
  • Use effective instead of lumped capacitance in gate delay equation
  • Optimality no longer guaranteed

60

slide-61
SLIDE 61

Higher-order interconnect delay

  • Moment matching with first 3 moments
  • Previously: candidate (π-model, slack)
  • Now: candidate (π-model, m1, m2, m3)
  • Given moments, compute slack on the fly
  • Bottom-up, efficient moment computation
  • Problem: guess slew rate

61

slide-62
SLIDE 62

Slew constraints

  • When inserting buffer, compute slews to gates driven by buffer
  • If slew exceeds target, prune candidate
  • Difficulty: unknown gate input slew

[Figure: sink slews of 300 ps and 350 ps; the input slew of the inserted buffer is unknown (“?”)]

62

slide-63
SLIDE 63

Timing-driven Steiner approaches

  • BRBC
  • Prim-Dijkstra
  • P-Tree
  • A-Tree (RSA)
  • SERT
  • MVERT

63

slide-64
SLIDE 64

Rectilinear Steiner arborescence

  • Assume all sinks in first quadrant
  • Iteratively

– Find sink pair p and q maximizing min(xp, xq) + min(yp, yq)
– Remove p and q from consideration
– Replace with r = (min(xp, xq), min(yp, yq))
– Connect p and q to r
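The iteration can be sketched directly. This is an illustrative O(n³) version that assumes the source sits at the origin and breaks ties by sorted order; practical implementations are faster, and the name `rsa_heuristic` is hypothetical.

```python
def rsa_heuristic(sinks):
    """Greedy rectilinear Steiner arborescence for first-quadrant sinks.

    Returns (edges, wirelength). Each edge is (upstream point, downstream
    point) and is routed with Manhattan length; the source is at (0, 0).
    """
    nodes = set(sinks)
    edges = []
    while len(nodes) > 1:
        best = None
        ordered = sorted(nodes)
        for a in range(len(ordered)):
            for b in range(a + 1, len(ordered)):
                p, q = ordered[a], ordered[b]
                r = (min(p[0], q[0]), min(p[1], q[1]))
                score = r[0] + r[1]  # distance of the merge point from the origin
                if best is None or score > best[0]:
                    best = (score, p, q, r)
        _, p, q, r = best
        nodes.discard(p)
        nodes.discard(q)
        for n in (p, q):
            if n != r:  # r may coincide with p or q
                edges.append((r, n))
        nodes.add(r)
    last = nodes.pop()
    if last != (0, 0):
        edges.append(((0, 0), last))
    length = sum(abs(u[0] - v[0]) + abs(u[1] - v[1]) for u, v in edges)
    return edges, length


edges, length = rsa_heuristic([(1, 3), (3, 1), (2, 2)])
print(length)
```

In the resulting tree every sink is reached by a shortest (monotone) path from the origin, which is the defining property of an arborescence.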

64

slide-65
SLIDE 65

RSA example

1 3 4 2 5 6

65

slide-66
SLIDE 66

RSA diagonal line sweep

1 2 3 4 5 6

66

slide-67
SLIDE 67

Prim-Dijkstra algorithm

Prim’s MST Dijkstra’s SPT Trade-off

67

slide-68
SLIDE 68

Prim’s and Dijkstra’s algorithms

  • d(i,j): length of the edge (i, j)
  • p(j): length of the path from source to j
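  • Prim: minimize d(i,j)
  • Dijkstra: minimize d(i,j) + p(j)

[Figure: tree path of length p(j) from the source to j, and candidate edge of length d(i,j)]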

68

slide-69
SLIDE 69

The Prim-Dijkstra trade-off

  • Prim: add edge minimizing d(i,j)
  • Dijkstra: add edge minimizing p(j) + d(i,j), where j is already in the tree
  • Trade-off: add edge minimizing c·p(j) + d(i,j) for 0 <= c <= 1
  • When c=0, trade-off = Prim
  • When c=1, trade-off = Dijkstra
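The trade-off is a one-line change to a Prim-style tree growth loop. In this sketch, j denotes the node already in the tree and i the node being added, following the definition of p(j) as the source-to-j path length; distances are Manhattan, and the function name and sample points are illustrative assumptions.

```python
def prim_dijkstra(points, c):
    """Grow a tree from points[0] (the source) with the Prim-Dijkstra trade-off.

    Each step adds the edge (j, i), with j in the tree and i outside it, that
    minimizes c * p(j) + d(i, j). Returns (parent, path_len) dictionaries.
    """
    def dist(a, b):
        return abs(points[a][0] - points[b][0]) + abs(points[a][1] - points[b][1])

    parent = {0: None}
    path_len = {0: 0}
    while len(path_len) < len(points):
        best = None
        for j in path_len:                  # nodes already in the tree
            for i in range(len(points)):    # nodes not yet in the tree
                if i in path_len:
                    continue
                cost = c * path_len[j] + dist(i, j)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        parent[i] = j
        path_len[i] = path_len[j] + dist(i, j)
    return parent, path_len


pts = [(0, 0), (-1, 3), (1, 3)]  # source, then two sinks
mst_parent, mst_len = prim_dijkstra(pts, 0.0)   # c = 0 behaves like Prim
spt_parent, spt_len = prim_dijkstra(pts, 1.0)   # c = 1 behaves like Dijkstra
print(mst_parent, mst_len)
print(spt_parent, spt_len)
```

With c = 0 the second sink hangs off the first (less wire, longer path); with c = 1 it connects straight to the source (shortest path, more wire). Intermediate values of c interpolate between the two.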

69

slide-70
SLIDE 70

Polarity problem

[Figure: net whose sinks have mixed polarities (+ and -)]

70

slide-71
SLIDE 71

A better solution?

[Figure: the same mixed-polarity net (+ and - sinks) with an alternative tree]

71

slide-72
SLIDE 72

Buffer aware trees

(1) (2) (3)

72

slide-73
SLIDE 73

C-Tree algorithm

  • Cluster sinks by

– Polarity
– Manhattan distance
– Criticality

  • Two-level tree

– Form tree for each cluster
– Form top-level tree

73

slide-74
SLIDE 74

C-Tree example

74

slide-75
SLIDE 75

Clustering distance metric

  • pDist(i,j) = |polarity(i) – polarity(j)|
  • sDist(i,j) = (|xi – xj| + |yi – yj|)/diam
  • tDist(i,j) scaled between 0 and 1, 0 for equal criticalities, 1 for opposite criticalities
  • Final distance metric d(i,j) = pDist(i,j) + β·sDist(i,j) + (1-β)·tDist(i,j)
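The metric translates directly into code. In this sketch the sink representation (a dict with `x`, `y`, `polarity`, `crit`) is hypothetical, polarity is assumed to be 0 or 1, and tDist is assumed to be the absolute difference of criticalities normalised to [0, 1]; the slides only state that tDist is scaled between 0 and 1.

```python
def c_tree_distance(a, b, diam, beta):
    """Clustering distance d = pDist + beta * sDist + (1 - beta) * tDist.

    a, b: dicts with keys 'x', 'y', 'polarity' (0 or 1), 'crit' (in [0, 1]).
    diam: diameter used to normalise the Manhattan distance.
    """
    p_dist = abs(a["polarity"] - b["polarity"])
    s_dist = (abs(a["x"] - b["x"]) + abs(a["y"] - b["y"])) / diam
    t_dist = abs(a["crit"] - b["crit"])  # assumed form of tDist
    return p_dist + beta * s_dist + (1 - beta) * t_dist


sink_a = {"x": 0, "y": 0, "polarity": 0, "crit": 1.0}
sink_b = {"x": 3, "y": 4, "polarity": 1, "crit": 0.0}
d_ab = c_tree_distance(sink_a, sink_b, diam=14, beta=0.5)
print(d_ab)
```

Because pDist is unweighted and is either 0 or 1, a polarity mismatch contributes as much as the largest possible combined spatial and timing term, so clusters tend to be polarity-pure.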

75

slide-76
SLIDE 76

Clustering – Finding centers

3 2 R 1 4

76

slide-77
SLIDE 77

Clustering – Group to centers

3

77

R 1 2 4

slide-78
SLIDE 78

Net n8702

78

slide-79
SLIDE 79

Don’t avoid all blockages!

79

slide-80
SLIDE 80

Buffer bays

80

slide-81
SLIDE 81

Blockage avoidance example

2-path1 2-path2 2-path3

81

slide-82
SLIDE 82

Blockage avoidance example

2-path1 2-path2 2-path3

82

slide-83
SLIDE 83

Blockage avoidance example

2-path1 2-path2 2-path3

83

slide-84
SLIDE 84

Noise and Congestion Issues

84

slide-85
SLIDE 85

Crosstalk

  • Crosstalk is caused by coupling between adjacent wires in a layout

– Wires have capacitors to GND and between each other
– Ccoupling is of the same order of magnitude as Csubstrate

  • Coupling can impact the circuit in two ways

– Increased noise
– Increased delays

  • “Chicken-and-egg” problem: do not know coupling cap unless delays are known; do not know delays unless coupling cap is known
  • Typically solved by iteration using min-max timing windows

85

slide-86
SLIDE 86

Intuition

  • Miller capacitance: equivalent capacitor to ground

– In reality, equivalent coupling caps of < 0 and > 2Cc may be seen; use of –C/C/3C has been proposed

[Figure: three aggressor/victim switching cases, with the coupling capacitor Cc replaced by an equivalent capacitor to ground on the victim (values up to 2Cc); only the victim is shown]

86

slide-87
SLIDE 87

Miller capacitors are an approximation!

  • Real picture

[Figure: aggressor and victim waveforms, showing the victim without noise, the victim with noise, and the induced noise]

Fanout gate acts as a low-pass filter! If the pulse is very sharp and occurs after the transition, it may be filtered out

87

slide-88
SLIDE 88

Parameters affecting coupling noise

  • “Near end” vs. “Far end”
  • RC model: Vfar end > Vnear end

[Figure: aggressor and victim lines with grounded terminations, illustrating near-end and far-end noise]

88

slide-89
SLIDE 89

Noise Optimization

  • Spacing
  • Track permutation

– Temporally non-adjacent signals made spatially adjacent

  • Shielding
  • Downsizing aggressor driver
  • Upsizing victim driver
  • Buffering victim net
  • Up-layering victim net
  • Changing topology of victim net
  • Splitting fanouts of victim net

[Figure: tracks labeled A (aggressor), V (victim), and Sh (shield)]

slide-90
SLIDE 90

Bus design

  • Bundles of signals treated symmetrically

– Identical electrical/physical environment for each bit

  • Abstraction of communication during early design

– Often integrated with floorplanning during µarch exploration

  • Global busses often pre-designed prior to detailed block implementation (esp. in microprocessors)
  • Several speed-up techniques unique to busses

– Staggered repeaters, swizzling, interleaving of signals traveling in opposite directions
– Relies on minimizing impact of coupling between adjacent bits

[Figure: adjacent bus bits with coupling contributions of -Cc and +Cc along the line]

slide-91
SLIDE 91

Congestion considerations

  • Designs increasingly wire-limited
  • Interconnect optimization: routing resource intensive

– Shielding, spacing, wide-wires, up-layering

  • Congestion can cause detours (or even unroutable designs)
  • Detours increase interconnect delay as well as interconnect delay unpredictability

– Wire delay models during tech-mapping, placement are based on shortest path routing
– Detours increase convergence problems because of poor upstream wire delay modeling

  • Need to model actual layers, routes for critical nets during placement

slide-92
SLIDE 92

Impact on synthesis

  • Wires cannot be ignored during synthesis

– Fanout based load models obsolete … but wireload models still very inaccurate
– Fanouts often isolated by buffers

  • Literal/gate count metrics often misleading

– Area is often wire-limited
– Area impact of wire-RC buffers

  • Pre-layout gate sizing is wasted
  • Dense encodings (vs. one-hot and other sparse encodings)
slide-93
SLIDE 93

Buffering and placement

  • # buffers needed on a net depends on its routing
  • Net routing depends on placement
  • Buffer management for intra-block vs global nets

– Too restrictive to treat global routes/buffers as fixed obstructions

[Figure: three alternative arrangements of cells a and b]

slide-94
SLIDE 94

Full-chip assembly issues

What if we reduce block area to avoid wire effects?

  • Many of the new physical synthesis problems go away, BUT # blocks increases! (and block assembly is the hardest part of chip design!)
  • Flat assembly (fragmentation of paths across blocks), OR
  • Increased hierarchy (lack of visibility across hierarchy levels)

[Figure: %age of repeaters (10 to 80) versus block area shrink factor (1, 0.9, 0.7, 0.5, 0.3) at 45nm and 32nm]

[Figure: normalized # blocks (5 to 40) versus block area shrink factor (1, 0.9, 0.7, 0.5, 0.3) at 45nm and 32nm]

slide-95
SLIDE 95

Integrated synthesis and placement

  • Since design metrics depend heavily on layout, generate a layout plan as early as possible
  • Evolve logic and its layout in tandem (“companion placement”)

– Integrate logic synthesis / tech mapping with global placement
– Embed nodes spatially through recursive logic partitioning and placement
– Long, critical wires and buffer needs identified early
– Wire loads obtained using embedding of nodes
– Hard to estimate area or delay of a Boolean node or FSM

  • Pin positions can help

– Somewhat easier at tech mapping stage…

  • Most industrial physical synthesis tools involve some integration between tech mapping and placement

slide-96
SLIDE 96

Congestion optimization

  • Congested layouts harder to converge or unroutable

– More delay from wires
– Detours make upstream wire delay models more inaccurate

  • Cannot model congestion by a single number characterizing entire block

– Spatial map required

  • Congestion can be addressed during placement

– Congestion cost in objective function
– Post-placement remedies

  • Recent work on congestion relief by modifying netlist structure during tech mapping

– Congestion map generated bottom-up during covering from partial maps propagated during matching

[Figure: two mappings of the same logic, one using an AOI33 cell, with track requirements of 12 and 20]

(Shelar, ISPD’05)

slide-97
SLIDE 97

Congestion driven supply/signal codesign

  • Interconnect resources increasingly scarce

– Global power and signal wires compete for routing resources

[Figure: flow diagram with blocks Power Grid, Macros or Cells, Signal Netlists, Global Router, Congestion Map, and Power Wire Removal + Power Grid Sizing]

slide-98
SLIDE 98

Removal illustration

  • Critical wires: 1, 2, 4 and 6
  • Non-critical wires: 3 and 5
  • Removal order: first 3, then 5

slide-99
SLIDE 99

Optimal power grid of “ac3”

slide-100
SLIDE 100

Conclusion

  • Interconnects are the primary bottleneck in design today
  • Many shifts in design methodology can be motivated by interconnect-related problems (including async or NoCs)
  • The objective of this tutorial was to

– explain why interconnects are important
– overview some fundamental algorithms in interconnect design
– outline issues that a designer must worry about

100