Large-Scale Circuit Placement: The Gap and Promise Jason Cong - - PowerPoint PPT Presentation


Large-Scale Circuit Placement: The Gap and Promise Jason Cong Computer Science Department University of California, Los Angeles cong@cs.ucla.edu Contributors: Chin-Chih Chang, Kenton Sze, Tim Kong, Michail Romesis, Joe Shinnerl, Min Xie, Xin


slide-1
SLIDE 1

Large-Scale Circuit Placement: The Gap and Promise

Jason Cong
Computer Science Department
University of California, Los Angeles
cong@cs.ucla.edu

Contributors: Chin-Chih Chang, Kenton Sze, Tim Kong, Michail Romesis, Joe Shinnerl, Min Xie, Xin Yuan

slide-2
SLIDE 2

Outline

  • Optimality and scalability study of the placement problem: gap analysis
  • Research on the multilevel large-scale placement problem
  • Our research plan

slide-3
SLIDE 3

Optimality and Scalability Study: Motivation

  • Lack of significant progress in wirelength reduction
      - Rate of reduction is about 5-10% every 2-3 years
      - Latest developments in placement differ mainly in runtime
  • Where do we stand?
      - How much room is there for further improvement?
      - Will existing placement engines scale well to 10+M-gate designs?
  • Need to quantify the optimality and scalability of state-of-the-art placement engines

slide-4
SLIDE 4

Optimality and Scalability Study: Motivation (II)

  • Most work compares only with existing heuristics
  • Use real-design-based benchmarks
      - ISPD98 [C. Alpert, 1998]
  • Use synthetic benchmarks
      - circ and gen [M. D. Hutton et al., 1998]
      - gnl [D. Stroobandt et al., 2000]
  • Little understanding of the gap from the optimal
slide-5
SLIDE 5

Optimality and Scalability Study: Related Work

  • Quantified Suboptimality of VLSI Layout Heuristics [L. Hagen et al., 1995]
  • Construct scaled instances with known upper bound
  • Over 10% area suboptimality in TimberWolf
  • Notable wirelength suboptimality in GORDIAN-L
  • But test cases are small; the largest netlist is less than 40K

slide-6
SLIDE 6

Our Contribution: Placement Example Construction with Known Optimal Wirelength

  • Construct instances with known optimal using the characteristics of the original problem

Optimality and Scalability Study of Existing Placement Algorithms [C. Chang et al., 2003]

  • Studied the optimality and scalability of existing algorithms on constructed instances
slide-7
SLIDE 7

Construction of Placement Examples with Known Optimal Wirelength (PEKO Examples)

  • Input
      - Desired number of placeable modules t
      - Net Distribution Vector (NDV) D = (d2, d3, …, dp), where dk is the number of k-pin nets in the circuit
      - t and D are extracted from a real circuit
  • Output
      - Cell library L
      - Netlist N with known optimal wirelength
  • Constraint
      - N has D as its NDV

slide-8
SLIDE 8

Our Algorithm for Constructing PEKO Examples

  • All the modules are of equal size, and there is no space between rows or between adjacent modules
  • For 2-pin nets, connect any two adjacent modules
  • For each n-pin net, connect the n modules in a rectangular region close to a square, i.e., the length of each side is close to sqrt(n)
  • The wirelength of each n-pin net is given by $\lceil \sqrt{n}\, \rceil + \lceil n / \lceil \sqrt{n}\, \rceil \rceil - 2$

slide-9
SLIDE 9

Illustration: PEKO Example Construction

Input: t = 64, D = {d2=34, d3=20, d4=7, d5=4, d6=2, d7=1}

  #2-pin nets = 34, WL = 34
  #3-pin nets = 20, WL = 40
  #4-pin nets = 7,  WL = 14
  #5-pin nets = 4,  WL = 12
  #6-pin nets = 2,  WL = 6
  #7-pin nets = 1,  WL = 4

Total WL = 110

  • Method first conceived by K. Boese (1995), but not implemented
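
The per-degree wirelengths in this illustration can be reproduced with a few lines of Python. This is a minimal sketch that assumes unit-size modules and half-perimeter wirelength measured in module pitches; the function names are hypothetical and not taken from the PEKO generator.

```python
import math


def optimal_net_wirelength(n):
    """Wirelength of an n-pin net packed into a near-square block of
    unit-size modules: ceil(sqrt(n)) + ceil(n / ceil(sqrt(n))) - 2."""
    side = math.isqrt(n - 1) + 1      # exact integer ceil(sqrt(n)) for n >= 1
    other = -(-n // side)             # integer ceil(n / side)
    return side + other - 2


def total_optimal_wirelength(ndv):
    """ndv maps net degree k to the number of k-pin nets (d_k)."""
    return sum(count * optimal_net_wirelength(k) for k, count in ndv.items())


# NDV of the t = 64 illustration.
ndv = {2: 34, 3: 20, 4: 7, 5: 4, 6: 2, 7: 1}
print({k: optimal_net_wirelength(k) for k in ndv})
print(total_optimal_wirelength(ndv))  # 110, the total stated on the slide
```

Integer arithmetic is used for both ceilings so that the result does not depend on floating-point rounding for large n.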
slide-10
SLIDE 10

White Space Insertion

  • Option 1: expanding one dimension of the chip
  • Option 2: removing some of the nets
  • Need for white space
      - Mimic real designs
      - Ease legalization

slide-11
SLIDE 11

Four New Suites of Placement Examples with Known Optimal Wirelength

  • Module number t and NDV extracted from ISPD98 [C. Alpert, 1998]
  • Two suites without pads (suite1 and suite2)
      - suite2 is derived by scaling t and NDV by a factor of 10
  • Two suites with pads (suite3 and suite4)
      - suite4 is derived by scaling t and NDV by a factor of 10
  • 15% white space by expanding one dimension of the chip
  • URL: http://ballade.cs.ucla.edu/~pubbench/peko.htm

slide-12
SLIDE 12

PEKO Characteristics

PEKO Suite1 (12.5k – 210k)

  ckt      #cell    #net     #row  Optimal WL
  Peko01   12506    13865    113   8.14E+05
  Peko02   19342    19325    140   1.26E+06
  Peko03   22853    27118    152   1.50E+06
  Peko04   27220    31683    166   1.75E+06
  Peko05   28146    27777    169   1.91E+06
  Peko06   32332    34660    181   2.06E+06
  Peko07   45639    47830    215   2.88E+06
  Peko08   51023    50227    227   3.14E+06
  Peko09   53110    60617    231   3.64E+06
  Peko10   68685    74452    263   4.73E+06
  Peko11   70152    81048    266   4.71E+06
  Peko12   70439    76603    266   5.00E+06
  Peko13   83709    99176    290   5.87E+06
  Peko14   147088   152255   385   9.01E+06
  Peko15   161187   186225   402   1.15E+07
  Peko16   182980   189544   429   1.25E+07
  Peko17   184752   188838   431   1.34E+07
  Peko18   210341   201648   460   1.32E+07

PEKO Suite2 (125k – 2.1M)

  ckt         #cell     #net      #row  Optimal WL
  Peko01x10   125060    138650    335   8.14E+06
  Peko02x10   193420    193250    441   1.26E+07
  Peko03x10   228530    271180    479   1.50E+07
  Peko04x10   272200    316830    523   1.75E+07
  Peko05x10   281460    277770    532   1.91E+07
  Peko06x10   323320    346600    570   2.06E+07
  Peko07x10   456390    478300    677   2.88E+07
  Peko08x10   510230    502270    715   3.14E+07
  Peko09x10   531100    606170    730   3.64E+07
  Peko10x10   686850    744520    830   4.73E+07
  Peko11x10   701520    810480    839   4.71E+07
  Peko12x10   704390    766030    840   5.00E+07
  Peko13x10   837090    991760    916   5.87E+07
  Peko14x10   1470880   1522550   1214  9.01E+07
  Peko15x10   1611870   1862250   1271  1.15E+08
  Peko16x10   1829800   1895440   1354  1.25E+08
  Peko17x10   1847520   1888380   1360  1.34E+08
  Peko18x10   2103410   2016480   1451  1.32E+08

slide-13
SLIDE 13

Tested Four State-of-the-Art Placers

  • Capo [A. E. Caldwell et al., 2000]
      - Based on multilevel partitioner
      - Aims to enhance the routability
  • Dragon [M. Wang et al., 2000]
      - Uses hMetis for initial partition
      - SA with bin-based swapping
  • mPL [T. Chan et al., 2000]
      - Nonlinear programming on the coarsest level
      - Goto-based relaxation
  • QPlace [Cadence Inc.]
      - Quadratic programming
      - Component of Silicon Ensemble

slide-14
SLIDE 14

Experiment with State-of-the-Art Placers Using PEKO Suite1

[Figure: two plots for Dragon v.2.20, Capo v.8.0, mPL v.1.2, and QPlace v.5.1.55 on PEKO Suite1: multiple of optimal vs. #cells, and runtime (s) vs. #cells.]

  • Existing algorithms are 66-153% away from the optimal on PEKO
  • On examples with pads
      - mPL and QPlace show improvements of 12% and 10%, respectively
      - Dragon and Capo do not benefit much from the additional information
  • There is significant room for improvement in placement algorithms!

slide-15
SLIDE 15

Experiment with State-of-the-Art Placers Using PEKO Suite1 & Suite2

[Figure: two plots for Dragon v.2.20, Capo v.8.0, mPL v.1.2, and QPlace v.5.1.55 on PEKO Suite1 and Suite2, with #cells on a logarithmic axis from 10,000 to 10,000,000: runtime (s) vs. #cells, and multiple of optimal vs. #cells.]

  • Capo, QPlace, and mPL scale well in runtime
  • Average solution quality of each tool deteriorates by an additional 4% to 25% when the problem size increases by a factor of 10
  • QoR of the existing placement algorithms can be 80%-180% away from the optimal for large designs

slide-16
SLIDE 16


slide-17
SLIDE 17

Limitation of PEKO Examples

  • Optimal solution includes local nets only
      - Unlikely for real designs
  • Measure wirelength only
      - Timing and routability are important objectives for placement algorithms as well

slide-18
SLIDE 18

Impact of Global Connections in Real Examples

  circuit  height  width  WL of longest net  WL contribution of longest 10%
  ibm01    8158    4530   7148               51%
  ibm02    8158    6430   14224              46%
  ibm03    8158    6740   10624              58%
  ibm04    8158    9140   15171              53%
  ibm05    8158    11055  19064              47%
  ibm06    8158    8715   13966              61%
  ibm07    8158    14605  14051              51%
  ibm08    8158    15895  16142              60%
  ibm09    8158    16395  13780              55%
  ibm10    8158    27890  30755              53%
  ibm11    16350   10925  19234              59%
  ibm12    16350   15545  26748              52%
  ibm13    16350   12230  19539              59%
  ibm14    16350   25475  26370              61%
  ibm15    16350   23785  27284              63%
  ibm16    16350   34015  42860              59%
  ibm17    16283   38895  45686              56%
  ibm18    16350   37065  52846              64%

Produced by Dragon on ISPD98

  • The wirelength contribution from global connections can be significant!
  • Need to consider the impact of global connections

slide-19
SLIDE 19

Placement Examples with Known Upperbounds (PEKU)

  • Extend PEKO by introducing non-local nets to mimic global connections
  • All the modules are of equal size, and there is no space between rows or between adjacent modules
  • For nets of degree i, a subset is generated by randomly connecting i modules; the rest are generated optimally as in PEKO

slide-20
SLIDE 20

Placement Examples with Known Upperbounds (PEKU)

Input: t = 64, D = {d2=34, d3=20, d4=7, d5=4, d6=2, d7=1}, α = 0.2

  • 2-pin nets: generate 28 optimally, 6 randomly
  • 3-pin nets: generate 16 optimally, 4 randomly
  • 4-pin nets: generate 6 optimally, 1 randomly
  • 5-pin nets: generate 4 optimally
  • 6-pin nets: generate 2 optimally
  • 7-pin nets: generate 1 optimally

Total WL = 160
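
The optimal/random split in this example is consistent with taking floor(α · dk) random nets for each degree k. The sketch below reproduces the counts under that assumption; the rounding rule is inferred from the numbers on the slide, not stated in the source, and `split_ndv` is a hypothetical helper name.

```python
import math
from fractions import Fraction


def split_ndv(ndv, alpha):
    """Return {k: (n_optimal, n_random)} for each net degree k.

    Assumption: the number of randomly generated k-pin nets is
    floor(alpha * d_k); the remaining nets are generated as in PEKO.
    """
    split = {}
    for k, count in ndv.items():
        n_random = math.floor(Fraction(alpha) * count)
        split[k] = (count - n_random, n_random)
    return split


ndv = {2: 34, 3: 20, 4: 7, 5: 4, 6: 2, 7: 1}
# alpha = 0.2, written as an exact fraction to avoid floating-point rounding.
for k, (n_opt, n_rand) in split_ndv(ndv, Fraction(1, 5)).items():
    print(f"{k}-pin nets: {n_opt} optimal, {n_rand} random")
```

For degrees 5 to 7 the rule yields zero random nets, which matches the slide listing only optimally generated nets for those degrees.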

slide-21
SLIDE 21

Placement Examples with Global Connections Only (G-PEKU)

  • Each net connects either a row or a column
  • Obvious upper bound
      - Sum the length of each row and column
  • Similar to datapath examples
  • Input: t = 64
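
A minimal sketch of the G-PEKU upper bound follows. It assumes the modules fill a rows × cols grid of unit-size cells and that a net's length is the center-to-center span of its row or column; both are simplifying assumptions for illustration, and `gpeku_upper_bound` is a hypothetical name.

```python
def gpeku_upper_bound(rows, cols):
    """Return (number of nets, wirelength upper bound) for a full grid.

    One net per row (span cols - 1) and one net per column (span rows - 1),
    measured in module pitches. The identity placement achieves this
    wirelength, so it bounds the optimum from above.
    """
    row_length = rows * (cols - 1)
    col_length = cols * (rows - 1)
    return rows + cols, row_length + col_length


# t = 64 modules arranged as an 8 x 8 grid.
num_nets, upper_bound = gpeku_upper_bound(8, 8)
print(num_nets, upper_bound)  # 16 nets, upper bound 112 pitches
```

Real instances measure length in layout units rather than pitches, so the numbers scale with the cell dimensions.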

slide-22
SLIDE 22

PEKU Suite

  • Module numbers and NDVs extracted from ISPD98
  • Remove connections with pads
  • Vary α from 0 to 10%
  • 15% white space by expanding one dimension of the chip

slide-23
SLIDE 23

PEKU Suite

  % non-local nets  circuit  #cell    #net     #row  Row utilization  LB        UB
  0%                Peku01   12506    14111    113   85%              8.14E+05  8.14E+05
  0%                Peku05   28146    28446    169   85%              1.91E+06  1.91E+06
  0%                Peku10   68685    75196    263   85%              4.73E+06  4.73E+06
  0%                Peku15   161187   186608   402   85%              1.15E+07  1.15E+07
  0%                Peku18   210341   201920   460   85%              1.32E+07  1.32E+07
  0.25%             Peku01   12506    14111    113   85%              8.14E+05  9.23E+05
  0.25%             Peku05   28146    28446    169   85%              1.91E+06  2.24E+06
  0.25%             Peku10   68685    75196    263   85%              4.73E+06  6.17E+06
  0.25%             Peku15   161187   186608   402   85%              1.15E+07  1.71E+07
  0.25%             Peku18   210341   201920   460   85%              1.32E+07  2.01E+07
  0.50%             Peku01   12506    14111    113   85%              8.14E+05  1.02E+06
  0.50%             Peku05   28146    28446    169   85%              1.91E+06  2.63E+06
  0.50%             Peku10   68685    75196    263   85%              4.73E+06  7.52E+06
  0.50%             Peku15   161187   186608   402   85%              1.15E+07  2.30E+07
  0.50%             Peku18   210341   201920   460   85%              1.32E+07  2.75E+07

(Suite continues up to 10% non-local nets.)

URL: http://cadlab.cs.ucla.edu/~pubbench/peku.htm

slide-24
SLIDE 24

G-PEKU Suite

  circuit   #cell    #net  #row  UB
  GPeku01   12506    224   113   7.93E+05
  GPeku05   28146    336   169   1.79E+06
  GPeku10   68685    525   263   4.38E+06
  GPeku15   161187   803   402   1.03E+07
  GPeku18   210341   918   460   1.34E+07

Module numbers extracted from ISPD98
URL: http://cadlab.cs.ucla.edu/~pubbench/peku.htm

slide-25
SLIDE 25

Studied Four State-of-the-Art Placers

  • Capo [A. Caldwell et al., 2000]
      - Based on multilevel partitioner
      - Aims to enhance the routability
  • Dragon [M. Wang et al., 2000]
      - Uses hMetis for initial partition
      - SA with bin-based swapping
  • mPL [T. Chan et al., 2000]
      - Nonlinear programming on the coarsest level
      - Goto-based relaxation
  • mPG [C. Chang et al., 2002]
      - Uses FC clustering and hierarchical density control
      - Incremental A-tree for routability

slide-26
SLIDE 26

Wirelength Comparison between State-of-the-Art Placers Using PEKU

[Figure: Quality Ratio (QR) vs. % of non-local nets (0.00%, 0.25%, 0.50%, 0.75%, 1.00%, 2.00%, 5.00%, 10.00%) for Capo v.8.5, Dragon v.2.20, mPG v.1.0, and mPL v.2.0.]

  • mPL's QR increases when α is increased from 0 to 0.75%, while the QRs of the other three placers decrease steadily
  • The absolute values of the QRs may not be meaningful, but they help to identify the technique that works best under each scenario

slide-27
SLIDE 27

Wirelength Comparison between State-of-the-Art Placers Using G-PEKU

  • The gap between their solutions and the upper bound varies between 79% and 102% in the worst case
  • Another validation that there is significant room for improvement for the placement problem

  circuit   Dragon v.2.20 QR  Capo v.8.5 QR  mPG v.1.0 QR  mPL v.2.0 QR
  GPeku01   1.98              1.56           1.91          1.69
  GPeku05   2.01              1.69           1.97          1.83
  GPeku10   2.02              1.72           1.98          1.94
  GPeku15   1.99              1.79           1.97          1.97
  GPeku18   2.02              1.78           1.98          1.98
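
The 79% to 102% range quoted on this slide can be recomputed from the quality ratios in the table, reading the gap as (QR − 1) × 100 and taking each placer's worst circuit. A short sketch with the values copied from the table:

```python
# Quality ratios (placer wirelength / upper bound) for GPeku01, 05, 10, 15, 18.
quality_ratio = {
    "Dragon v.2.20": [1.98, 2.01, 2.02, 1.99, 2.02],
    "Capo v.8.5": [1.56, 1.69, 1.72, 1.79, 1.78],
    "mPG v.1.0": [1.91, 1.97, 1.98, 1.97, 1.98],
    "mPL v.2.0": [1.69, 1.83, 1.94, 1.97, 1.98],
}

# Worst-case gap per placer, in percent above the upper bound.
worst_gap = {
    placer: round((max(ratios) - 1) * 100)
    for placer, ratios in quality_ratio.items()
}

print(worst_gap)
print(min(worst_gap.values()), max(worst_gap.values()))  # 79 102
```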

slide-28
SLIDE 28

Significant Opportunity for Better Circuit Placement Algorithms

  • Best available placement algorithms can be 80%-180% away from optimal
  • A significant research and business opportunity
      - Use of copper interconnect is equivalent to 30% wirelength reduction
      - One process generation (e.g., from 0.13um to 0.10um) is equivalent to 30% wirelength reduction
      - Both require multi-billion dollar investment!
  • Better placement may extend/accelerate Moore's law by 2-3 generations!

slide-29
SLIDE 29

Outline

  • Optimality and scalability study of the placement problem
  • Our research on the large-scale placement problem
  • Our research plan

slide-30
SLIDE 30

Overview of UCLA Research on the Large-Scale Placement Problem

  • Multilevel optimization: a highly scalable framework suitable for complex constraints
      - Hierarchy construction by recursive aggregation
      - Intralevel optimization by various techniques
      - Interpolation to transfer from level to level
  • Two parallel efforts
      - mPL: multilevel multiheuristic/hybrid optimization
      - mPG: multilevel simulated annealing with congestion control and mixed block support

slide-31
SLIDE 31

Multilevel Optimization Framework

[Figure: multilevel V-cycle. The initial fine-grain problem is aggregated through intermediate levels down to a coarse-grain problem, which receives a direct, nonhierarchical solution. The solution is then interpolated back up through the intermediate levels, with relaxation (refinement) at each level, to the final fine-grain problem, which receives thorough relaxation and a detailed solution.]

slide-32
SLIDE 32

Multilevel Methods in Scientific Computation

  • Originally developed to solve boundary-value PDE problems on spatial domains
      - A discretized elliptic PDE is a structured, positive-definite system of linear equations
  • Rapidly extended to problems without physical grids during the last 30 years
      - Algebraic Multigrid (AMG)
  • Most research has been for continuous models, but recent efforts have targeted large-scale discrete optimization
      - Graph/hypergraph partitioning, traveling salesman, VLSI physical design

slide-33
SLIDE 33

Overview of mPL

  • Recursive clustering
      - Version 1.0: Edge-Separability (CapForest)
      - Versions 1.1 – 2.0: FirstChoice
  • Nonlinear programming at coarsest level(s)
  • Slot assignment and discrete refinement at all levels
  • Recent enhancements to Version 2.0
      - AMG-based weighted disaggregation
      - Quadratic relaxation on subsets (QRS) at all levels
      - Distance-based reaggregation for iterated multilevel flow

slide-34
SLIDE 34

Multilevel Placement Study

  • Coarsening
      - ESC, AMG-based, First Choice Clustering (FC)
  • Relaxation (Intralevel Optimization)
      - Interior-point nonlinear programming
      - k-cycle Goto-style discrete exchange
      - Quadratic relaxation on subsets (QRS)
  • Interpolation
      - Declustering + partitioning + slot assignment
      - AMG-style weighted averaging
  • Iterated Multilevel Flow
      - Repeated/Recursive V-cycles with distance-based reaggregation

slide-35
SLIDE 35

Coarsening by Recursive Aggregation

  • Edge-Separability Clustering (ESC) [Cong and Lim, 2000]
      - Use CAPFOREST [Ibaraki and Nagamochi, 1992] to estimate all-pairs min-cut q(x,y) in N log N time
      - Rank pairs by connectivity and area
  • First-Choice Clustering (FC) [Karypis, 1999]
      - Match each vertex with a neighboring vertex with which it shares the most total hyperedge weight, subject to area-balance constraints
      - Clusters are connected components
  • Weighted Aggregation (AMG)
      - Split the nodes into C-points and F-points
      - Associate each F-point with several C-points by weighted average
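
The First-Choice rule described above can be sketched as follows. This is an illustrative simplification, not the mPL implementation: the area constraint is modeled as a hypothetical `max_area` cap per cluster, a vertex falls back to its next-best neighbor when the cap would be exceeded, and ties are broken by vertex name.

```python
from collections import defaultdict


def first_choice_clusters(areas, nets, max_area):
    """areas: {vertex: area}; nets: list of (weight, [vertices]).
    Returns {vertex: cluster representative}."""
    # Total hyperedge weight shared by each pair of vertices.
    shared = defaultdict(lambda: defaultdict(float))
    for weight, pins in nets:
        for i, u in enumerate(pins):
            for v in pins[i + 1:]:
                shared[u][v] += weight
                shared[v][u] += weight

    parent = {v: v for v in areas}      # union-find forest
    cluster_area = dict(areas)          # area of each cluster root

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for u in areas:
        # Neighbors ordered by shared weight, heaviest first.
        ranked = sorted(shared[u].items(), key=lambda kv: (-kv[1], str(kv[0])))
        for v, _ in ranked:
            ru, rv = find(u), find(v)
            if ru == rv:
                break                       # already merged with first choice
            if cluster_area[ru] + cluster_area[rv] <= max_area:
                parent[rv] = ru             # merge the two clusters
                cluster_area[ru] += cluster_area[rv]
                break
    # Clusters are the connected components of the accepted matches.
    return {v: find(v) for v in areas}


areas = {"a": 1, "b": 1, "c": 1, "d": 1}
nets = [(3.0, ["a", "b"]), (1.0, ["b", "c"]), (2.0, ["c", "d"])]
print(first_choice_clusters(areas, nets, max_area=2))
```

Union-find keeps the "clusters are connected components" step implicit: every accepted match merges two components, so no separate component search is needed.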

slide-36
SLIDE 36

Multilevel Placement Study

  • Coarsening
      - ESC, AMG, FC, FC+opt, PD-FC
  • Relaxation (Intralevel Optimization)
      - Interior-point nonlinear programming
      - k-cycle Goto-style discrete exchange
      - Quadratic relaxation on subsets (QRS)
  • Interpolation
      - Declustering + partitioning + slot assignment
      - AMG-style weighted averaging
  • Iterated Multilevel Flow
      - Repeated/Recursive V-cycles with reaggregation

slide-37
SLIDE 37

mPL Coarse-Level Formulation

  • Nonlinear-programming formulation
      - Direct formulation for the coarse placement problem
      - Cells are modeled as circular disks for smoothness
      - Quadratic wirelength objective on a clique model
      - Pairwise nonoverlap constraints
  • Can accelerate evaluation with an adaptation of the Fast Multipole Method
  • Nonuniform sizes are OK, but reshaping is difficult to incorporate efficiently
  • Reasonable performance for coarse-level sizes N <= 500 only

slide-38
SLIDE 38

Interior-Point-Based Nonlinear Programming

  • Analogous to force-directed methods, but with direct handling of nonconvex objectives and constraints
  • Effective but relatively expensive; affordable only at the coarsest level(s) of mPL 1.0 to 2.0
  • Net impact: 15% wirelength improvement overall
  • Global cell movement is not scalable to finer levels
  • Plan: restrict to cell subsets to produce improved, scalable local relaxations at ALL levels
      - To date, we have implemented Linear-Programming-based and Quadratic-Programming-based subset relaxations

slide-39
SLIDE 39

Goto-based Discrete Relaxation

  • Each cell's optimal location is readily calculated when all other cells are held fixed.
  • Compute a chain A, B, C, D, E, where B is a randomly selected neighbor of A's optimal location, etc.
  • Examine all permutations of the chain and take the best one.
  • Problem: the chain is not closed (A is not necessarily near any other cell's optimal location).
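
The permutation step can be illustrated with a small sketch. It assumes equal-size cells (so any reassignment of the chain's cells to the chain's slots is legal) and half-perimeter wirelength as the cost; both are assumptions for illustration, and the function names are hypothetical.

```python
from itertools import permutations


def hpwl(nets, position):
    """Half-perimeter wirelength; position maps cell -> (x, y)."""
    total = 0
    for pins in nets:
        xs = [position[c][0] for c in pins]
        ys = [position[c][1] for c in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def best_chain_permutation(chain, nets, position):
    """Try every assignment of the chain's cells to the chain's current
    slots and return (best cost, best placement)."""
    slots = [position[c] for c in chain]
    best_cost, best_position = hpwl(nets, position), dict(position)
    for perm in permutations(slots):
        trial = dict(position)
        trial.update(zip(chain, perm))
        cost = hpwl(nets, trial)
        if cost < best_cost:
            best_cost, best_position = cost, trial
    return best_cost, best_position


# Two movable cells A, B between fixed pads P and Q; swapping them helps.
position = {"P": (0, 0), "Q": (3, 0), "A": (2, 0), "B": (1, 0)}
nets = [["P", "A"], ["Q", "B"]]
cost, placed = best_chain_permutation(["A", "B"], nets, position)
print(cost, placed["A"], placed["B"])
```

Exhaustive search costs k! evaluations for a chain of length k, which is only practical for short chains such as the five-cell chain on the slide.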

slide-40
SLIDE 40

Quadratic Relaxation on Noncontiguous Subsets (QRS)

  • Select a subset M of cells to move.
      - M is obtained as segments of length 3 along a DFS vertex traversal of the netlist, with the DFS started at a vertex connected with the largest wirelength.
  • Identify other cells and pads, F, connected to M by nets in $E_M = \{ e \in E \mid e \cap M \neq \emptyset \}$.
  • Decouple the horizontal and vertical problems.

slide-41
SLIDE 41

Solving the Subproblem

  • Problem formulation (horizontal case): iteratively solve the weighted quadratic minimization problem, using the current solution to determine the weights (Gordian-L).

$$\min \sum_{e \in E_M} \sum_{v \in e} \frac{\left( x(v) - x_e \right)^2}{\left| x^{(k)}(v) - x_e^{(k)} \right| + \varepsilon}, \quad \text{where } x_e = \frac{1}{|e|} \sum_{v \in e} x(v) \text{ and } \varepsilon \text{ is a small number.}$$
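
The reweighting idea (solve a quadratic problem, then reset each weight to the reciprocal of the current length) can be shown on a deliberately tiny case: one movable cell connected to fixed pins on a line. This is a simplified illustration of Gordian-L-style iteration, not the QRS solver; the function name and the fixed iteration count are assumptions.

```python
def reweighted_quadratic_position(pins, iterations=50, eps=1e-6):
    """Place one movable cell connected to fixed 1-D pin coordinates.

    Each pass minimizes sum_i w_i * (x - p_i)^2, whose minimizer is the
    weighted mean, with w_i = 1 / (|x_k - p_i| + eps) taken from the
    previous iterate x_k. The weighted quadratic cost thereby approximates
    the linear cost sum_i |x - p_i|.
    """
    x = sum(pins) / len(pins)  # unweighted quadratic optimum as a start
    for _ in range(iterations):
        weights = [1.0 / (abs(x - p) + eps) for p in pins]
        x = sum(w * p for w, p in zip(weights, pins)) / sum(weights)
    return x


# The plain quadratic optimum is the mean (about 3.67); reweighting pulls
# the cell toward the median pin at 1.0, the minimizer of the linear cost.
print(reweighted_quadratic_position([0.0, 1.0, 10.0]))
```

The small constant eps plays the same role as ε in the formulation: it keeps the weights finite when the cell coincides with a pin.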

slide-42
SLIDE 42

Ripple-move Legalization [Hur and Lillis, 2000]

  • Calculate a max-gain monotone path on the bin grid
  • Because QRS ignores overlap constraints, post-QRS cell swaps are used to remove the overlap.

slide-43
SLIDE 43

Impact of QRS Relaxation

mPL 1.2: 2 V-cycles; AMG; no QRS
mPL 2.0 (with QRS): 2 V-cycles; AMG; QRS

           mPL 1.2           mPL 2.0 (with QRS)
  Circuit  WL        Time    WL        %impro  Time   Rel. Time  Bin size
  ibm04    7.20E+06  506     6.83E+06  5.14%   866    1.71       2x2
  ibm07    1.11E+07  749     1.02E+07  8.11%   1302   1.74       2x2
  ibm09    1.22E+07  860     1.13E+07  7.38%   1569   1.82       3x3
  ibm10    1.97E+07  1285    1.91E+07  3.05%   2419   1.88       3x3
  ibm14    4.22E+07  2524    4.03E+07  4.50%   5846   2.32       3x3
  ibm16    5.72E+07  4018    5.25E+07  8.22%   16760  4.17       4x4
  ibm17    7.19E+07  5051    6.78E+07  5.70%   12240  2.42       4x4
  ibm18    5.63E+07  4743    5.45E+07  3.20%   13507  2.85       4x4
  avg.                                 5.66%          2.36

slide-44
SLIDE 44

Multilevel Placement Study

  • Coarsening
      - ESC, AMG, FC, FC+opt, PD-FC
  • Relaxation (Intralevel Optimization)
      - Interior-point nonlinear programming
      - k-cycle Goto-style discrete exchange
      - Quadratic relaxation on subsets (QRS)
  • Interpolation
      - Declustering + partitioning + slot assignment
      - AMG-style weighted averaging
  • Iterated Multilevel Flow
      - Repeated/Recursive V-cycles with reaggregation

slide-45
SLIDE 45

AMG-style Linear Interpolation

  • The inherited position of a cluster component is determined by several cluster positions, not just its own.
slide-46
SLIDE 46

AMG-based Interpolation

  • Use the clique model (graph) to define connectivity weights
  • For each FC-cluster, select one node of maximal connectivity as a C-point
  • Each C-point is placed at its cluster's position.
  • Each F-point is placed at the weighted average of the C-points and F-points to which it is connected
  • The F-points' positions can be iteratively improved.
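
The weighted-average placement of F-points can be sketched in one dimension (apply it per axis). The sketch assumes every F-point has at least one C-point neighbor, initializes F-points from their C-point neighbors only, and then improves them with Jacobi-style sweeps over all neighbors; the data layout and the name `interpolate_f_points` are hypothetical.

```python
def interpolate_f_points(c_pos, f_neighbors, sweeps=10):
    """c_pos: {C-point: coordinate}, held fixed at cluster positions.
    f_neighbors: {F-point: {neighbor: connectivity weight}}.
    Returns coordinates for all C-points and F-points."""
    pos = dict(c_pos)
    # Initial guess: weighted average of the C-point neighbors only.
    for f, nbrs in f_neighbors.items():
        c_only = {n: w for n, w in nbrs.items() if n in c_pos}
        pos[f] = sum(w * c_pos[n] for n, w in c_only.items()) / sum(c_only.values())
    # Iterative improvement: average over all neighbors, C and F alike.
    for _ in range(sweeps):
        updated = dict(pos)
        for f, nbrs in f_neighbors.items():
            updated[f] = sum(w * pos[n] for n, w in nbrs.items()) / sum(nbrs.values())
        pos = updated
    return pos


c_pos = {"c1": 0.0, "c2": 10.0}
f_neighbors = {"f": {"c1": 1.0, "c2": 3.0}}
print(interpolate_f_points(c_pos, f_neighbors)["f"])  # 7.5
```

With weights 1 and 3, the F-point lands three quarters of the way toward the more strongly connected C-point, which is the behavior the slide describes.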

slide-47
SLIDE 47

Impact of AMG Interpolation

           mPL1.1 (1 V-cycle)   mPL1.1 AMG (1 V-cycle)   wirelength          runtime
  Circuit  dom. WL   Time       dom. WL   Time           %improved by AMG    %incr
  ibm04    7.71E+06  261        7.33E+06  427            4.93%               63.60%
  ibm07    1.18E+07  396        1.12E+07  437            5.08%               10.35%
  ibm09    1.29E+07  455        1.28E+07  563            0.78%               23.74%
  ibm10    2.11E+07  661        2.08E+07  744            1.42%               12.56%
  ibm14    4.54E+07  1982       4.28E+07  2226           5.73%               12.31%
  ibm16    5.88E+07  3187       5.73E+07  3615           2.55%               13.43%
  ibm17    8.17E+07  4173       8.08E+07  4416           1.10%               5.82%
  ibm18    5.81E+07  4051       5.78E+07  4496           0.52%               10.98%
  avg.                                                   2.76%               19.10%

slide-48
SLIDE 48

Multilevel Placement Study

  • Coarsening
      - ESC, AMG, FC, FC+opt, PD-FC
  • Relaxation (Intralevel Optimization)
      - Interior-point nonlinear programming
      - k-cycle Goto-style discrete exchange
      - Quadratic relaxation on subsets (QRS)
  • Interpolation
      - Declustering + partitioning + slot assignment
      - AMG-style weighted averaging
  • Iterated Multilevel Flow
      - Repeated/Recursive V-cycles with distance-based reaggregation

slide-49
SLIDE 49

Iterated Multilevel Flow

  • Iterated V-cycles: O(log N)
  • F-cycle: O(log N log N)
  • Backtracking V-cycle: O(log N)

slide-50
SLIDE 50

Adjustable Vertex Affinity for Reaggregation

  • Initially, affinity is connectivity + area balancing
  • Subsequently, distance is incorporated to retain the information from preceding cycles

slide-51
SLIDE 51

Impact of 2nd V-cycle

Using proximity and connectivity in the 2nd+ coarsening pass(es)
(NP + Goto-exchange only; AMG; FC coarsening)

  Circuit  Rel. Time  %improved
  ibm04    1.10       2.32%
  ibm07    1.40       3.57%
  ibm09    1.46       4.69%
  ibm10    1.49       1.44%
  ibm14    1.28       1.64%
  ibm16    1.31       2.27%
  ibm17    1.30       7.43%
  ibm18    1.26       5.36%
  Avgs.    1.33       3.59%

slide-52
SLIDE 52

Summary of Improvements: 12% wirelength reduction

Achieved using the following techniques (mPL2.0):

  • Coarsening by area-balanced first-choice clustering
  • Relaxation (Intralevel Optimization)
      - Interior-point nonlinear programming at coarsest level
      - k-cycle Goto-style discrete exchange at every level
      - Quadratic relaxation on subsets (QRS) at every level
  • Interpolation by AMG-style weighted averaging
  • Iterated V-cycles with distance-based reaggregation

slide-53
SLIDE 53

mPL2.0 vs. mPL1.1, Capo8.5, Dragon and Gordian-L

Uniform-Cell-Size IBM/ISPD 98 Circuits

            Wirelength/mPL2                  CPU time/mPL2
  Circuit   mPL1.1  Capo8.5  Dragon  Gor-L   mPL1.1  Capo8.5  Dragon  Gor-L
  ibm04     1.12    1.07     0.93    1.00    0.30    0.51     2.91    1.82
  ibm07     1.16    1.14     0.97    1.07    0.30    0.54     2.98    3.37
  ibm09     1.14    1.12     1.01    1.04    0.29    0.59     4.75    4.31
  ibm10     1.11    1.09     0.98    0.98    0.27    0.49     4.20    5.84
  ibm14     1.13    1.05     0.92    1.01    0.34    0.47     2.48    6.78
  ibm16     1.12    1.12     0.95    1.05    0.19    0.22     2.84    4.88
  ibm17     1.21    1.13     1.00    1.00    0.34    0.33     5.27    8.04
  ibm18     1.07    1.06     0.92    0.99    0.30    0.31     4.34    9.56
  Averages  1.13    1.10     0.96    1.02    0.29    0.43     3.72    5.58

slide-54
SLIDE 54

mPL2.0 vs. mPL1.1, Capo8.5, Dragon and Gordian-L

[Figure: scaled runtime vs. scaled wirelength for mPL2.0, mPL1.1, Capo8.5, Dragon, and Gordian-L.]

slide-55
SLIDE 55

Net Impact of Improvements

FC, QRS relax, AMG, 2 V-cycles

           mPL2.0-Dom.       mPL2.0-Dom. vs. mPL 1.1-Dom.
  Circuit  WL        CPU     WL     CPU
  ibm04    6.83E+06  866     0.89   3.32
  ibm07    1.02E+07  1302    0.86   3.29
  ibm09    1.13E+07  1569    0.88   3.45
  ibm10    1.91E+07  2419    0.91   3.66
  ibm14    4.03E+07  5846    0.89   2.95
  ibm16    5.25E+07  16760   0.89   5.26
  ibm17    6.78E+07  12240   0.83   2.93
  ibm18    5.45E+07  13507   0.94   3.33
  Average                    0.88   3.52

slide-56
SLIDE 56

Comparison of mPL on PEKO using PEKO Suite-I and Suite-II

[Figure: two plots for mPL v.1.2 and mPL v.2.0. Comparison with the optimal: multiple of optimal vs. #cells. Total runtime: runtime (s) vs. #cells.]

  • mPL v.2.0 improves the Quality Ratio by 8% on average
  • mPL v.2.0 increases the runtime by 131% on average

slide-57
SLIDE 57

Multilevel SA-based Coarse Placement: mPG

  • Multi-level coarse placement for physical hierarchy generation
      - A multi-level SA-based framework
      - Support mixed-size large-scale global placement
      - Support routing congestion control
      - Integrate retiming with placement

slide-58
SLIDE 58

Summary

  • There is significant opportunity for circuit placement improvement
      - Possibly equal to several process generations of advancement
  • The multilevel method is a promising approach to large-scale circuit placement

slide-59
SLIDE 59

Our Research Objectives

  • Fast, scalable, superior-quality placements by multilevel optimization
  • Aim at 20-30% improvement (= 1 technology generation!) in 3 years
  • Support
      - Large-scale mixed-size placement problems
      - Constraint-driven placement for delay, routing, etc.
      - Incremental placement to support dynamic netlist adjustment

slide-60
SLIDE 60

Related Publications

  • T. Chan, J. Cong, T. Kong and J. Shinnerl. "Multilevel Optimization for Large-scale Circuit Placement." ICCAD 2000.
  • Tim Kong. Novel Techniques for Large-Scale Circuit Placement. Ph.D. Thesis, CS Dept., UCLA, 2002.
  • Chin-Chih Chang, Jason Cong, David Pan, Xin Yuan. "Physical Hierarchy Generation with Routing Congestion Control." ISPD 2002.
  • Chin-Chih Chang, Jason Cong, Min Xie. "Optimality and Scalability Study of Existing Placement Algorithms." ASP-DAC 2003.
  • Chin-Chih Chang, Jason Cong, Xin Yuan. "Multi-level Placement for Large-Scale Mixed-Size IC Designs." ASP-DAC 2003.
  • Cong and Shinnerl (editors). Multilevel Optimization and VLSICAD. Kluwer Academic Publishers, 2002.
slide-61
SLIDE 61


http://www.wkap.nl/prod/b/1-4020-1081-8