Placement ECE6133 Physical Design Automation of VLSI Systems Prof. - - PowerPoint PPT Presentation


Placement ECE6133 Physical Design Automation of VLSI Systems Prof. Sung Kyu Lim School of Electrical and Computer Engineering Georgia Institute of Technology Placement The process of arranging the circuit components on a layout surface.


slide-1
SLIDE 1

Placement

ECE6133 Physical Design Automation of VLSI Systems

  • Prof. Sung Kyu Lim

School of Electrical and Computer Engineering Georgia Institute of Technology

slide-2
SLIDE 2

Placement

  • The process of arranging the circuit components on a layout surface.
  • Inputs: A set of fixed modules, a netlist.
  • Goal: Find the best position for each module on the chip according to appropriate cost functions.
    – Considerations: routability/channel density, wirelength, cut size, performance, thermal issues, I/O pads.

[Figure: two placements of the same modules, one with wirelength = 10 and one with wirelength = 12. A channel example with modules A to H compares an ordering with density = 2 (2 tracks required) against an ordering with shorter wirelength that requires 3 tracks.]

slide-3
SLIDE 3

Estimation of Wirelength

  • Semi-perimeter method: Half the perimeter of the bounding rectangle that encloses all the pins of the net to be connected. Most widely used approximation!
  • Complete graph: Since the number of edges in a complete graph, $n(n-1)/2$, is $n/2$ times the number of tree edges, $n-1$, wirelength $\approx \frac{2}{n} \sum_{(i,j) \in net} dist(i,j)$.
  • Minimum chain: Start from one vertex and connect to the closest one, and then to the next closest, etc.
  • Source-to-sink connection: Connect one pin to all other pins of the net. Not accurate for uncongested chips.
  • Steiner-tree approximation: Computationally expensive.
  • Minimum spanning tree
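
The estimators listed above can be sketched in a few lines of Python. This is a minimal illustration, assuming Manhattan distance between pins and pins given as (x, y) tuples. All function names are hypothetical, and the minimum chain is interpreted here as a nearest-neighbor walk from the current end of the chain.

```python
from itertools import combinations

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def semi_perimeter(pins):
    # Half the perimeter of the bounding box of all pins.
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def complete_graph(pins):
    # Sum over all pin pairs, scaled by 2/n.
    total = sum(manhattan(p, q) for p, q in combinations(pins, 2))
    return 2.0 * total / len(pins)

def source_to_sink(pins, source=0):
    # Connect one pin (the source) directly to every other pin.
    return sum(manhattan(pins[source], p) for p in pins)

def minimum_chain(pins, start=0):
    # Assumed interpretation: always extend the chain to the closest unvisited pin.
    remaining = list(pins)
    current = remaining.pop(start)
    length = 0
    while remaining:
        nxt = min(remaining, key=lambda p: manhattan(current, p))
        length += manhattan(current, nxt)
        remaining.remove(nxt)
        current = nxt
    return length

def spanning_tree(pins):
    # Prim's algorithm on the complete Manhattan-distance graph.
    in_tree = {0}
    length = 0
    while len(in_tree) < len(pins):
        d, k = min((manhattan(pins[i], pins[j]), j)
                   for i in in_tree
                   for j in range(len(pins)) if j not in in_tree)
        length += d
        in_tree.add(k)
    return length

pins = [(0, 0), (4, 0), (0, 3), (4, 3)]
estimates = {
    "semi-perimeter": semi_perimeter(pins),
    "complete graph": complete_graph(pins),
    "chain": minimum_chain(pins),
    "source-to-sink": source_to_sink(pins),
    "spanning tree": spanning_tree(pins),
}
```

A Steiner-tree estimate is left out of the sketch because, as the list notes, it is computationally expensive.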
slide-4
SLIDE 4

[Figure: the six estimates for one example net. Semi-perimeter length = 11; complete graph length × 2/n = 17.5; chain length = 14; source-to-sink length = 17; Steiner tree length = 12; spanning tree length = 13.]

slide-5
SLIDE 5

Placement Methods

  • Constructive methods
    – Cluster growth algorithm
    – Force-directed method
    – Algorithm by Goto
    – Min-cut based method
  • Iterative improvement methods
    – Pairwise exchange
    – Simulated annealing: Timberwolf
    – Genetic algorithm
  • Analytical methods
    – Gordian, Gordian-L

slide-6
SLIDE 6

Min-Cut Placement

  • Breuer, “A class of min-cut placement algorithms,” DAC-77.
  • Quadrature: suitable for circuits with high density in the center.
  • Bisection: good for standard-cell placement.
  • Slice/Bisection: good for cells with high interconnection on the periphery.

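[Figure: numbered cut sequences for the three schemes. Quadrature: cuts 1, 2, 3a/3b, 4a/4b, ..., splitting n cells into groups of n/2 and then n/4. Bisection: cuts 1, 2a/2b, 3a to 3d, ..., each splitting a group into halves (n/2, n/2). Slice/bisection: cuts 1 to 7 slice off n/k cells at a time, leaving (k−1)n/k and then (k−2)n/k cells, followed by cuts 8, 9a/9b, 10a to 10d. C1 and C2 label the first two cuts.]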

slide-7
SLIDE 7

Algorithm for Min-Cut Placement

Algorithm: Min_Cut_Placement(N, n, C)
/* N: the layout surface */
/* n: # of cells to be placed */
/* n0: # of cells in a slot */
/* C: the connectivity matrix */
begin
    if (n ≤ n0) then PlaceCells(N, n, C);
    else
        (N1, N2) ← CutSurface(N);
        (n1, C1), (n2, C2) ← Partition(n, C);
        Call Min_Cut_Placement(N1, n1, C1);
        Call Min_Cut_Placement(N2, n2, C2);
end
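
The recursion above can be tried on a toy instance with the following Python sketch. It is only an illustration under stated assumptions: n0 = 1 cell per slot, a grid whose slot count equals the cell count, alternating vertical and horizontal cuts, and an exhaustive balanced bisection standing in for the K-L or F-M partitioner. The function names and the net representation (a list of sets of cell names) are hypothetical.

```python
from itertools import combinations

def partition(cells, nets):
    """Balanced bisection by exhaustive search (a stand-in for K-L or F-M)."""
    cells = sorted(cells)
    best = None
    for left in combinations(cells, len(cells) // 2):
        left = set(left)
        right = set(cells) - left
        # A net is cut if it has cells on both sides of this bisection.
        cut = sum(1 for net in nets if net & left and net & right)
        if best is None or cut < best[0]:
            best = (cut, left, right)
    return best[1], best[2]

def min_cut_placement(region, cells, nets, placement, vertical=True):
    """Recursively assign cells to unit slots in region = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = region
    if len(cells) <= 1:                      # n0 = 1 cell per slot
        for cell in cells:
            placement[cell] = (x0, y0)
        return
    # Alternate the cut direction, but never cut a side that is one slot wide.
    if (x1 - x0 > 1) and (vertical or y1 - y0 <= 1):
        xm = (x0 + x1) // 2
        r1, r2 = (x0, y0, xm, y1), (xm, y0, x1, y1)
    else:
        ym = (y0 + y1) // 2
        r1, r2 = (x0, y0, x1, ym), (x0, ym, x1, y1)
    c1, c2 = partition(cells, nets)
    min_cut_placement(r1, c1, nets, placement, not vertical)
    min_cut_placement(r2, c2, nets, placement, not vertical)

nets = [{"a", "b"}, {"c", "d"}, {"b", "c"}]
placement = {}
min_cut_placement((0, 0, 2, 2), {"a", "b", "c", "d"}, nets, placement)
```

In this toy run, b and c share a net but land in diagonally opposite slots, because each second-level partition only sees the nets inside its own region.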

slide-8
SLIDE 8

Quadrature Placement Example

  • Apply the K-L heuristic to partition, then quadrature placement: cost C1 = 4, C2L = C2R = 2, etc.

[Figure: a 16-cell circuit (cells 1 to 16) placed on a 4 × 4 grid. After cuts C1 and C2 the four groups are {2,4,5,7}, {8,12,13,14}, {1,3,6,9} and {10,11,15,16}. Cuts C3a, C3b, C4a and C4b then refine the groups down to single slots.]

slide-9
SLIDE 9

Min-Cut Placement with Terminal Propagation

  • Dunlop & Kernighan, “A procedure for placement of standard-cell VLSI circuits,” IEEE TCAD, Jan. 1985.
  • Drawback of the original min-cut placement: it does not consider the positions of terminal pins that enter a region.
    – What happens if we swap {1, 3, 6, 9} and {2, 4, 5, 7} in the previous example?

[Figure: region L is split into L1 and L2, and region R into R1 and R2. Net S has a cell in L1. For the cells of R on net S, the label reads: prefer to have them in R1.]

slide-10
SLIDE 10

Terminal Propagation

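  • We should use the fact that s is in L1!

[Figure: L is already split into L1 and L2, and R is being split into R1 and R2. A dummy cell p, representing the terminal of net s, is added on the R1 side. Keeping the cells of net s in R1 gives lower cost; moving them to R2 gives higher cost. p will stay in R1 for the rest of partitioning.]

  • When not to use p to bias partitioning? What if net s has cells in many groups?

[Figure: a region of height h with a window of height h/3. If p falls outside the window: use p. If p falls inside the window: don't use p to bias the solution in either direction. A further panel shows terminals p1, p2, p3 and a minimum rectilinear Steiner tree G.]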
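
The window rule can be written as a small decision function. This is a hedged sketch, assuming the window is the middle third of the region along the axis perpendicular to the cut, and that the terminal is summarized by one coordinate along that axis. The function name and the return values are hypothetical.

```python
def propagate_terminal(terminal, lo, hi):
    """Decide how a terminal at coordinate `terminal` biases a cut of [lo, hi].

    Returns "low" or "high" when the terminal lies outside the middle-third
    window, and None when it lies inside the window (no bias either way).
    """
    third = (hi - lo) / 3.0
    if lo + third <= terminal <= hi - third:
        return None                  # inside the window: do not use p
    return "low" if terminal < lo + third else "high"

decisions = [propagate_terminal(t, 0.0, 9.0) for t in (1.0, 4.5, 12.0)]
```

A terminal outside the region itself, such as 12.0 above, still pulls toward the nearer side.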

slide-11
SLIDE 11

Terminal Propagation Example

  • Partitioning must be done breadth-first, not depth-first.

[Figure: cells a, b, c, d on net S, with the chip cut by C1 into L and R and then into L1, L2, R1, R2. Two outcomes are compared: without terminal propagation and with terminal propagation (dummy terminal p1). One panel is labeled: unbiased partition of R.]
slide-12
SLIDE 12

Creating Rows

  • Terminal propagation reduces overall area by ~30%
  • Creating rows
    – Choose α and β, preferably to balance row length (during re-arrangement)

[Figure: groups C1, C2, C3 mapped onto rows (Row 1 to Row 4). Labels: cells in C1 → row 1; cells in C3 → row 1; cells in C2 are split between Row 1 and Row 2 in fractions α and β, with α + β = 1.]

slide-13
SLIDE 13

Creating Rows

  • Example

    – Partitioning of circuit into 32 groups
    – Each group is either assigned to a single row or divided into 2 rows

[Figure: a four-row standard cell design. Row labels of the 32 groups: 1, 1, 1, (1,2), (1,2), (1,2), (1,2), 2, 2, (2,3), (2,3), (2,3), (2,3), 3, 3, 3, (3,4), (3,4), (3,4), (3,4), 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, (4,5), (4,5).]

slide-14
SLIDE 14

Experimental Results

  • CMOS chip with 453 nets and 412 cells
  • Manual solution
    – track density = 147; feedthroughs = 184
  • Automated solution
    – without terminal propagation: t.d. = 313; f.t. = 591 (t.d. reduced to 235 by iterative interchanges)
    – with terminal propagation: t.d. = 186; f.t. = 182 (t.d. reduced to 152 by iterative interchanges)
    – Iterative interchange refinement is helpful
  • The program is in production use as part of an automatic placement system at AT&T Bell Labs.
    – Solutions within 10% of the best hand layout

slide-15
SLIDE 15

Remarks on Min-cut Placement

  • Also implemented F-M partitioning method
    – Much faster, but solutions appeared to be not as good as K-L
  • Use simulated annealing to do partitioning
    – Much slower. If restricted to a reasonable CPU time, solutions are of similar quality to those by the F-M method. Easy to implement
  • Seeking an elegant way to force some cells to be in particular positions
  • Investigate other algorithms for terminal propagation
    – Terminal propagation is the bottleneck of CPU time

slide-16
SLIDE 16

Practical Problems in VLSI Physical Design Mincut Placement (1/12)

Mincut Placement

Perform quadrature mincut onto 4 × 4 grid

Start with vertical cut first

Undirected graph model with k-clique weighting

Thin edges = weight 0.5, thick edges = weight 1

slide-17
SLIDE 17

Practical Problems in VLSI Physical Design Mincut Placement (2/12)

Cut 1 and 2

First cut has min-cutsize of 3 (not unique)

Both cuts 1 and 2 divide the entire chip

slide-18
SLIDE 18

Practical Problems in VLSI Physical Design Mincut Placement (3/12)

Cut 3 and 4

Each cut minimizes cutsize

Helps reduce overall wirelength

slide-19
SLIDE 19

Practical Problems in VLSI Physical Design Mincut Placement (4/12)

Cut 5 and 6

16 partitions generated by 6 cuts

HPBB wirelength = 27

slide-20
SLIDE 20

Practical Problems in VLSI Physical Design Mincut Placement (5/12)

Recursive Bisection

Start with vertical cut

Perform terminal propagation with middle third window

slide-21
SLIDE 21

Practical Problems in VLSI Physical Design Mincut Placement (6/12)

Cut 3: Terminal Propagation

Two terminals are propagated and are “pulling” nodes

Nodes k and o connect to n and j: p1 propagated (outside window)
Node g connects to j, f and b: p2 propagated (outside window)
Terminal p1 pulls k/o/g to the top partition, and p2 pulls g to the bottom

slide-22
SLIDE 22

Practical Problems in VLSI Physical Design Mincut Placement (7/12)

Cut 4: Terminal Propagation

One terminal propagated

Nodes n and j connect to o/k/g: p1 propagated
Nodes i and j connect to e/f/a: no propagation (inside window)
Terminal p1 pulls n and j to the right partition

slide-23
SLIDE 23

Practical Problems in VLSI Physical Design Mincut Placement (8/12)

Cut 5: Terminal Propagation

Three terminals propagated

Node i propagated to p1, j to p2, and g to p3
Terminal p1 pulls e and a to the left partition
Terminals p2 and p3 pull f/b/e to the right partition

slide-24
SLIDE 24

Practical Problems in VLSI Physical Design Mincut Placement (9/12)

Cut 6: Terminal Propagation

One terminal propagated

Nodes n and j are propagated to p1
Terminal p1 pulls o and k to the left partition

slide-25
SLIDE 25

Practical Problems in VLSI Physical Design Mincut Placement (10/12)

Cut 7: Terminal Propagation

Three terminals propagated

Nodes j/f/b propagated to p1, o/k to p2, and h/p to p3
Terminals p1 and p2 pull g and l to the left partition
Terminal p3 pulls l and d to the right partition

slide-26
SLIDE 26

Practical Problems in VLSI Physical Design Mincut Placement (11/12)

Cut 8 to 15

16 partitions generated by 15 cuts

HPBB wirelength = 23

slide-27
SLIDE 27

Practical Problems in VLSI Physical Design Mincut Placement (12/12)

Comparison

Quadrature vs recursive bisection + terminal propagation

Number of cuts: 6 vs 15
Wirelength: 27 vs 23

slide-28
SLIDE 28

Analytical Placement

  • Gordian package:
    – GORDIAN: J. M. Kleinhans, G. Sigl, F. M. Johannes, K. J. Antreich, “GORDIAN: VLSI Placement by Quadratic Programming and Slicing Optimization,” IEEE TCAD, 1991.
    – GORDIAN-L: G. Sigl, K. Doll, F. M. Johannes, “Analytical Placement: A Linear or a Quadratic Objective Function?,” DAC 1991.

  • Gordian: A Quadratic Placement Approach
    – Global optimization: solves a sequence of quadratic programming problems
    – Partitioning: enforces the non-overlap constraints

slide-29
SLIDE 29

[Figure: four placement snapshots, at iterations i = 0, i = 29, i = 58 and i = 87.]

slide-30
SLIDE 30

Adaptec1 Stats

  • Circuit stats
    – # cells/nets/pins: 210,863 / 219,687 / 19,205
    – chip size: 6000um × 6000um
    – bin size: 50um × 50um
    – # placement bins: 120 × 120
    – average bin occupancy: 210K / 120^2 = 14.6 gates/bin

  • Wirelength result (HPBB)
    – iteration 0: 34,069,060
    – iteration 29: 46,352,680
    – iteration 58: 80,783,336
    – iteration 87: 98,111,904

slide-31
SLIDE 31

Overview of Gordian Package

Procedure Gordian
    l := 1;
    global-optimize(l);
    while (there exists |Ml| > k)
        for each r ∈ R(l)
            partition(r, r’, r”);
        l++;
        setup-constraints(l);
        global-optimize(l);
        repartition(l);
    final-placement(l);
endprocedure

slide-32
SLIDE 32

Problem Definition

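[Figure: module u, centered at (x_u, y_u), with connections to other modules. Pin vu of module u lies at (x_uv, y_uv), and (a_vu, b_vu) is the offset of that pin from the center of u. Net node v lies at (x_v, y_v); l_vu labels the connection between pin vu and net node v.]

Squared wire length of net v:

$$L_v = \sum_{u \in M_v} \left[ (x_{uv} - x_v)^2 + (y_{uv} - y_v)^2 \right], \qquad x_{uv} = x_u + a_{vu}, \quad y_{uv} = y_u + b_{vu}$$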

slide-33
SLIDE 33

Cost Function

  • Minimize the following:

$$\phi(x, y) = \frac{1}{2} \sum_{v \in N} L_v w_v = X^T C X + d_x^T X + Y^T C Y + d_y^T Y$$

$$\phi(x) = X^T C X + d^T X$$
slide-34
SLIDE 34

Constraints

  • The center of gravity constraints
    – At level l, the chip is divided into q (≤ 2^l) regions
    – For region p, the center coordinates: (u_p, v_p)
    – M_p: set of modules in region p
    – Constraint for region p, where F_u is the area of module u:

$$\frac{\sum_{u \in M_p} F_u x_u}{\sum_{u \in M_p} F_u} = u_p$$

    – Matrix form for all regions:

$$A^{(l)} X = u^{(l)}, \qquad a_{pi} = \begin{cases} F_i / \sum_{u \in M_p} F_u & \text{if } i \in M_p \\ 0 & \text{otherwise} \end{cases}$$
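
As a small illustration of the center-of-gravity constraints, the sketch below builds one row per region, with entry F_i divided by the total area of the region for each module i in that region, and checks a candidate x-placement against the region centers. The function names and the list-based matrix layout are assumptions made for this example.

```python
def cog_constraints(areas, regions, centers):
    """Build (A, u) for the constraint A x = u.

    areas:   F_u for every module, indexed by module number
    regions: one list of module indices per region (the sets M_p)
    centers: x-coordinate u_p of each region center
    """
    A = []
    for members in regions:
        total = sum(areas[i] for i in members)
        row = [0.0] * len(areas)
        for i in members:
            row[i] = areas[i] / total   # zero for modules outside the region
        A.append(row)
    return A, list(centers)

def residual(A, u, x):
    # A x - u; all zeros when every region's center of gravity is on target.
    return [sum(a * xi for a, xi in zip(row, x)) - up for row, up in zip(A, u)]

areas = [1.0, 1.0, 2.0, 2.0]
A, u = cog_constraints(areas, regions=[[0, 1], [2, 3]], centers=[1.0, 3.0])
res = residual(A, u, x=[0.5, 1.5, 2.0, 4.0])
```

The same construction applies to the y-direction with the v_p coordinates.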

slide-35
SLIDE 35

Problem Formulation

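[Figure: modules A to G divided between region ρ, centered at (u_ρ, v_ρ), and region ρ′, centered at (u_ρ′, v_ρ′). The constraint matrix A^(l) has one row per region; each row has nonzero entries (*) only in the columns of the modules that belong to that region.]

Linearly constrained Quadratic Programming problem (LQP):

$$\text{LQP:} \quad \min_{x \in R^m} \{\Phi(x) = X^T C X + d^T X \;\text{ such that }\; A^{(l)} X = u^{(l)}\}$$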

slide-36
SLIDE 36

Solution Method

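  • Algebraic manipulation

Split the constraint matrix into a q × q block D and a q × (m − q) block B, and split X into dependent variables X_d and independent variables X_i:

$$A^{(l)} = [\, D \;\; B \,], \qquad X = \begin{bmatrix} X_d \\ X_i \end{bmatrix}, \qquad D X_d + B X_i = u^{(l)}$$

$$X_d = -D^{-1} B X_i + D^{-1} u^{(l)}$$

$$X = \begin{bmatrix} -D^{-1} B \\ I \end{bmatrix} X_i + \begin{bmatrix} D^{-1} u^{(l)} \\ 0 \end{bmatrix} = Z X_i + x_0$$

  • Unconstrained Quadratic Programming problem (UQP)

$$\text{UQP:} \quad \min_{X_i \in R^{m-q}} \{\Psi(X_i) = X_i^T Z^T C Z X_i + c^T X_i\}, \qquad \text{where } c = Z^T (2 C x_0 + d)$$

Here C is symmetric and the constant term of Φ(Z X_i + x_0) is dropped.

    – Solved by the conjugate-gradient method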
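
A minimal conjugate-gradient solver illustrates how an unconstrained problem of the form Ψ(X_i) = X_i^T H X_i + c^T X_i can be minimized when H is symmetric positive definite: setting the gradient 2 H X_i + c to zero gives the linear system 2 H X_i = −c. The sketch below uses dense Python lists for readability; the names H, conjugate_gradient and minimize_quadratic are assumptions for this example, not part of the GORDIAN package.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def conjugate_gradient(A, b, tol=1e-12, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the start x = 0
    p = list(r)                      # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)      # exact step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def minimize_quadratic(H, c):
    # Minimizer of x^T H x + c^T x: solve 2 H x = -c.
    return conjugate_gradient([[2.0 * h for h in row] for row in H],
                              [-ci for ci in c])

x_opt = minimize_quadratic([[2.0, 0.5], [0.5, 1.5]], [-1.0, -2.0])
```

A production placer would use sparse matrices and usually a preconditioner; the dense version above only shows the iteration itself.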

slide-37
SLIDE 37

3 Types of Quadratic Programming (QP) Problems:

  • Positive definite Hessian matrix (bowl):
    – One optimal objective value
    – Convex
    – Easy: one point
  • Semi-definite Hessian matrix (trough):
    – Line of optimal objective values
    – Convex
    – Moderate: any point on line
  • Indefinite Hessian matrix (saddle):
    – Optimal is on the boundaries
    – Non-convex
    – NP-hard

Hessian matrix

  • Second-order partial derivatives
  • Describes local curvature
  • Generalization of Laplacian
  • Trace(Hessian) = Laplacian

slide-38
SLIDE 38

GORDIAN Quadratic Programming (QP) Problem:

  • Here C is positive definite.
  • Hence LQP and UQP are convex.
  • Unique optimal X.

Hence, GORDIAN QP always finds a global optimum.

Thanks to: Anirudha Kurhade, Fall 2016

slide-39
SLIDE 39

Partitioning

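  • Recursive partitioning is needed
    – to resolve module overlap in global placement
    – the global placement problem will be solved again with two additional center-of-gravity constraints

$$(M_p) \rightarrow (M_{p'}, M_{p''}): \quad x_{u'} \le x_{u''} \;\text{ for } u' \in M_{p'} \text{ and } u'' \in M_{p''}, \qquad \frac{\sum_{u \in M_{p'}} F_u}{\sum_{u \in M_p} F_u} \approx \alpha = 0.5$$

$$\text{cut value:} \quad C_p(\alpha) = \sum_{v \in N_C} w_v$$

[Figure: plot of the cut value C_p(α) against α, with α from 0.0 to 1.0 and the cut value axis from 10 to 40.]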

slide-40
SLIDE 40

Repartitioning

  • Module exchange after each cut to improve cut size
    – terminal propagation using global placement positions

  • Repartitioning
    – to ‘undo’ the mistake made at the previous level:

Procedure repartition(l)
    if overlap exists
        for each r ∈ R(l-1)
            merge-regions(r, r’, r’’);
            partition(r, r’, r’’);
        setup-constraints(l);
        global-optimize(l);
    endif

slide-41
SLIDE 41

Summary of Gordian

[Flow diagram]

  • Global optimization: minimization of wire length; produces module coordinates
  • Partitioning of module set and dissection of placement region; produces position constraints for the next global optimization
  • Repeat until the regions have ≤ k modules
  • Final placement: adoption of style-dependent constraints, starting from the module coordinates

Complexity: space = O(m), time = O(m^1.5 log^2 m)

Final placement: standard cell, macro-cell & SOG

slide-42
SLIDE 42

Experimental Results

Comparison of Results for Standard Cell Blocks (area after routing, mm^2)

Circuit     scb1  scb2  scb3  scb4  scb5  scb6  scb7  scb8  scb9  CPU-time scb8  CPU-time scb9  ratio
GORDIAN      2.7   5.8  15.7  14.0  10.6  11.3  16.4  51.7  54.0           120s           135s      1
Min-Cut      3.1   5.3  25.6  16.9  11.3  12.7  20.2  89.2  98.6           366s           440s     :3
Annealing    2.6   5.0   9.1  13.2  10.9  12.8  19.8  59.5  80.0         39851s         34709s   :300

slide-43
SLIDE 43

Practical Problems in VLSI Physical Design GORDIAN Placement (1/21)

GORDIAN Placement

Perform GORDIAN placement

Uniform area and net weight, area balance factor = 0.5

Undirected graph model: each edge in k-clique gets weight 2/k
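
The 2/k clique weighting can be sketched as follows, assuming nets are given as sets of node names and that weights from different nets on the same node pair add up. The function name is hypothetical.

```python
from itertools import combinations
from collections import defaultdict

def clique_edges(nets):
    """Expand each k-pin net into a k-clique whose edges weigh 2/k."""
    weights = defaultdict(float)
    for net in nets:
        k = len(net)
        if k < 2:
            continue                          # single-pin nets add no edges
        for u, v in combinations(sorted(net), 2):
            weights[(u, v)] += 2.0 / k        # accumulate over shared pairs
    return dict(weights)

edges = clique_edges([{"a", "b"}, {"a", "b", "c", "d"}])
```

With this rule a 2-pin net contributes an edge of weight 1 and a 4-pin net contributes six edges of weight 0.5 each.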

slide-44
SLIDE 44

Practical Problems in VLSI Physical Design GORDIAN Placement (2/21)

IO Placement

Necessary for GORDIAN to work

slide-45
SLIDE 45

Practical Problems in VLSI Physical Design GORDIAN Placement (3/21)

Adjacency Matrix

Shows connections among movable nodes

Among nodes a to j

slide-46
SLIDE 46

Practical Problems in VLSI Physical Design GORDIAN Placement (4/21)

Pin Connection Matrix

Shows connections between movable nodes and IO

Rows = movable nodes, columns = IO (fixed)

slide-47
SLIDE 47

Practical Problems in VLSI Physical Design GORDIAN Placement (5/21)

Degree Matrix

Based on both adjacency and pin connection matrices

Sum of entries in the same row (= node degree)

slide-48
SLIDE 48

Practical Problems in VLSI Physical Design GORDIAN Placement (6/21)

Laplacian Matrix

Degree matrix minus adjacency matrix
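
The degree and Laplacian matrices can be built directly from the two connection matrices. The sketch below assumes weighted, symmetric matrices stored as Python lists: adj[i][j] for movable-to-movable connections and pins[i][j] for movable-to-IO connections. The function names are hypothetical.

```python
def degree_matrix(adj, pins):
    # Diagonal entry i = row sum of the adjacency matrix plus row sum of the
    # pin connection matrix.
    n = len(adj)
    return [[(sum(adj[i]) + sum(pins[i])) if i == j else 0.0
             for j in range(n)] for i in range(n)]

def laplacian(adj, pins):
    # Degree matrix minus adjacency matrix.
    D = degree_matrix(adj, pins)
    n = len(adj)
    return [[D[i][j] - adj[i][j] for j in range(n)] for i in range(n)]

# Two movable nodes connected to each other; each is also tied to one IO pad.
adj = [[0.0, 1.0], [1.0, 0.0]]
pins = [[1.0, 0.0], [0.0, 1.0]]
C = laplacian(adj, pins)
```

Because the IO connections only enter the diagonal, C is no longer singular once at least one movable node is tied to a fixed pad.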

slide-49
SLIDE 49

Practical Problems in VLSI Physical Design GORDIAN Placement (7/21)

Fixed Pin Vectors

Based on pin connection matrix and IO location

Y-direction is defined similarly

slide-50
SLIDE 50

Practical Problems in VLSI Physical Design GORDIAN Placement (8/21)

Fixed Pin Vectors (cont)

slide-51
SLIDE 51

Practical Problems in VLSI Physical Design GORDIAN Placement (9/21)

Fixed Pin Vectors (cont)

slide-52
SLIDE 52

Practical Problems in VLSI Physical Design GORDIAN Placement (10/21)

Level 0 QP Formulation

No constraint necessary
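
A tiny unconstrained example shows what a level 0 solve looks like. The sketch assumes the objective is written as (1/2) x^T C x + d^T x, with d_i equal to minus the weighted sum of the IO coordinates connected to node i, so that the optimum satisfies C x = −d. That sign and scaling convention is an assumption for illustration and may differ from the one used on the slides. The instance is a chain: pad at x = 0, node a, node b, pad at x = 3, all with unit weights.

```python
def solve(C, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(C, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

C = [[2.0, -1.0], [-1.0, 2.0]]   # Laplacian of the chain (movable nodes a, b)
d = [-0.0, -3.0]                 # a is tied to the pad at 0, b to the pad at 3
x = solve(C, [-di for di in d])  # stationarity condition C x = -d
```

The result spaces a and b evenly between the pads. In larger instances many nodes share nearly the same optimum, which is why cells with real dimensions overlap after this step.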

slide-53
SLIDE 53

Practical Problems in VLSI Physical Design GORDIAN Placement (11/21)

Level 0 Placement

Cells with real dimension will overlap

slide-54
SLIDE 54

Practical Problems in VLSI Physical Design GORDIAN Placement (12/21)

Level 1 Partitioning

Perform level 1 partitioning

Obtain center locations for center-of-gravity constraints

slide-55
SLIDE 55

Practical Problems in VLSI Physical Design GORDIAN Placement (13/21)

Level 1 Constraint

slide-56
SLIDE 56

Practical Problems in VLSI Physical Design GORDIAN Placement (14/21)

Level 1 LQP Formulation

slide-57
SLIDE 57

Practical Problems in VLSI Physical Design GORDIAN Placement (15/21)

Level 1 Placement

slide-58
SLIDE 58

Practical Problems in VLSI Physical Design GORDIAN Placement (16/21)

Verification

Verify that the constraints are satisfied in the left partition

slide-59
SLIDE 59

Practical Problems in VLSI Physical Design GORDIAN Placement (17/21)

Level 2 Partitioning

Add two more cut-lines

This results in p1={c,d}, p2={a,b,e}, p3={g,j}, p4={f,h,i}

slide-60
SLIDE 60

Practical Problems in VLSI Physical Design GORDIAN Placement (18/21)

Level 2 Constraint

slide-61
SLIDE 61

Practical Problems in VLSI Physical Design GORDIAN Placement (19/21)

Level 2 LQP Formulation

slide-62
SLIDE 62

Practical Problems in VLSI Physical Design GORDIAN Placement (20/21)

Level 2 Placement

Clique-based wiring is shown

slide-63
SLIDE 63

Practical Problems in VLSI Physical Design GORDIAN Placement (21/21)

Summary

Center-of-gravity constraint

Helps spread the cells evenly while monitoring wirelength
Removes overlaps among the cells (with real dimension)

slide-64
SLIDE 64

Linear vs. Quadratic Objective

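[Figure: modules A, B, C on a line, connected by three nets α, β, γ of lengths l_α, l_β, l_γ. Two of the modules are fixed and one is movable. Two panels: quadratic objective function and linear objective function.]

With l the distance between the two fixed modules and l_α = l_β = l − l_γ:

$$\phi_q = l_\alpha^2 + l_\beta^2 + l_\gamma^2 = 2(l - l_\gamma)^2 + l_\gamma^2$$

$$\phi_q' = -4(l - l_\gamma) + 2 l_\gamma = 0 \;\rightarrow\; l_\gamma = \tfrac{2}{3} l, \quad l_\alpha = l_\beta = \tfrac{l_\gamma}{2} = \tfrac{1}{3} l$$

$$\phi_l = l_\alpha + l_\beta + l_\gamma$$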
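
The difference between the two objectives can be checked numerically on a three-net example. The sketch assumes one movable module on a line between two fixed modules a distance l apart, with nets α and β both connecting it to one fixed module (so l_α = l_β = l − l_γ) and net γ connecting it to the other. The function names and the grid search are illustrative choices.

```python
def phi_q(l_gamma, l):
    # Quadratic objective: l_alpha^2 + l_beta^2 + l_gamma^2.
    return 2.0 * (l - l_gamma) ** 2 + l_gamma ** 2

def phi_l(l_gamma, l):
    # Linear objective: l_alpha + l_beta + l_gamma.
    return 2.0 * (l - l_gamma) + l_gamma

def argmin_on_grid(f, l, steps=300):
    # Evaluate f at evenly spaced values of l_gamma in [0, l].
    candidates = [l * i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda g: f(g, l))

l = 3.0
best_quadratic = argmin_on_grid(phi_q, l)   # expected near 2l/3
best_linear = argmin_on_grid(phi_l, l)      # expected at l
```

Under these assumptions the quadratic objective leaves the movable module one third of the way from the doubly connected module, while the linear objective places it directly on that module and lets the single net γ span the full distance.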

' 2 2 2 2 2

slide-65
SLIDE 65

Linear vs. Quadratic Objective

  • Quadratic objective function

    – tends to make very long nets shorter than the linear objective function does
    – lets short nets become slightly longer

[Figure: two four-row layouts (row1 to row4) with elements A and B, one placed with the linear objective function and one with the quadratic objective function.]

slide-66
SLIDE 66

Optimizing Linear Objective

  • Global placement with linear objective function

$$\phi_q = \sum_{v \in N} \sum_{u \in M_v} (x_{uv} - x_v)^2 \quad \rightarrow \text{quadratic objective function}$$

$$\phi_l = \sum_{v \in N} \sum_{u \in M_v} |x_{uv} - x_v| \quad \rightarrow \text{linear objective function}$$

  • Trick
    – use quadratic programming to minimize the linear objective function

$$\phi_l = \sum_{v \in N} \sum_{u \in M_v} \frac{(x_{uv} - x_v)^2}{|x_{uv} - x_v|} = \sum_{v \in N} \sum_{u \in M_v} \frac{(x_{uv} - x_v)^2}{g_{uv}}, \qquad g_{uv} = |x_{uv} - x_v|, \quad g_v = \sum_{u \in M_v} |x_{uv} - x_v|$$
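
The reweighting trick can be tried on a one-dimensional toy problem: a single movable point x connected to fixed pins, where each step minimizes the quadratic Σ (x − p)² / g with the factors g frozen at the previous distances |x − p|. This is a sketch under stated assumptions: per-connection factors, a small eps to avoid division by zero, and the closed-form minimizer of each quadratic step. The function names are hypothetical.

```python
def linear_objective(x, pins):
    return sum(abs(x - p) for p in pins)

def reweighted_quadratic(pins, iterations=60, eps=1e-9):
    """Approximate the minimizer of sum |x - p| by repeated quadratic steps."""
    x = sum(pins) / len(pins)        # start from the plain quadratic optimum
    for _ in range(iterations):
        # Freeze the factors g at the current distances.
        g = [max(abs(x - p), eps) for p in pins]
        # Minimizer of sum (x - p)^2 / g: a weighted mean with weights 1/g.
        x = (sum(p / gi for p, gi in zip(pins, g))
             / sum(1.0 / gi for gi in g))
    return x

pins = [0.0, 0.0, 3.0]               # two connections to a pad at 0, one to a pad at 3
x_quadratic = sum(pins) / len(pins)
x_linear = reweighted_quadratic(pins)
```

On this instance the iterate moves from the quadratic optimum at 1.0 toward the doubly connected pad at 0, and the linear objective drops from 4 to about 3.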

slide-67
SLIDE 67

Analytical Placement Results

[Figure: sum of wire lengths (mm) versus number of pins of a net, for circuit primary2, comparing Gordian and GordianL.]

slide-68
SLIDE 68

Analytical Placement Results

[Figure (a): global placement with 1 region, quadratic objective function vs. linear objective function.]

slide-69
SLIDE 69

Analytical Placement Results

[Figure (b): global placement with 4 regions, quadratic objective function vs. linear objective function.]

slide-70
SLIDE 70

Analytical Placement Results

[Figure (c): final placements, quadratic objective function vs. linear objective function.]