Large Scale Circuit Placement: Gap and Progress



SLIDE 1

Large Scale Circuit Placement: Gap and Progress

Jason Cong

UCLA Computer Science Department
http://cadlab.cs.ucla.edu/~cong
cong@cs.ucla.edu

Joint work with Tony Chan, Joe Shinnerl, Kenton Sze, and Min Xie

SLIDE 2

3/29/2005 UCLA VLSICAD LAB 2

Outline

  • Introduction
      • Problem Description
      • Popular Methods
  • Gap Analysis of Existing Placement Algorithms
      • PEKO Benchmark Construction
      • Experiment Results
  • UCLA mPL5
      • Multiscale Optimization Framework
      • Generic Force-Directed Formulation
      • Multiscale Nonlinear-Programming Solution

SLIDE 3

Complex IC Design Example

  • High-end Automotive A/V application
  • 10M Gates
  • 70 clocks (320 MHz)
  • Technology: 0.13u, 6LM

Courtesy of Magma Design Automation

SLIDE 4

VLSI CAD

  • Computer-aided design (CAD) of very large-scale integrated (VLSI) circuits
  • Electronic design automation (EDA)

SLIDE 5

ITRS 2004 (Int'l Tech. Roadmap for Semiconductors)

Year of production                             2003   2004   2005   2006   2007   2008   2009
Functions/Chip at production (M transistors)    307    386    487    614    773    974   1227
Chip size at production (mm^2)                  280    280    280    280    280    280    280
On chip local clock (MHz)                      2976   4171   5204   6783   9285  10972  12369
Maximum wiring level                             13     14     15     15     15     16     16
DRAM ½ pitch (nm)                               100     90     80     70     65     57     50

SLIDE 6

Outline

  • Introduction
      • Problem Description
      • Popular Methods
  • Gap Analysis of Existing Placement Algorithms
  • UCLA mPL5

SLIDE 7

Circuit Placement Problem Statement

  • Given
      • A set of cells (modules) of fixed dimensions and the interconnections between them (a netlist)
  • Find
      • The position of each cell, such that
          • there is no overlap (and enough routing space)
          • the total length of all interconnections is minimized
          • routing congestion, delay, … are minimized
  • An NP-hard problem

[Figure: a netlist of cells A through I, annotated "a cell" and "a net", shown in a bad placement and in a good placement]

SLIDE 8

Popular Placement Methods

The placement problem has been studied extensively for over 30 years.

  • Iterative improvement
      • Repeatedly rearrange small subsets of modules
      • E.g. simulated annealing
  • Min-cut based placement
      • Recursively bi-partition modules in a way that minimizes connections between partition blocks
  • Quadratic placement with recursive legalization
      • Initial solution by unconstrained quadratic wirelength minimization
      • Gradually spread cells out to remove overlap

SLIDE 9

Simulated Annealing Based Placement

E.g. VPR [Betz and Rose, 1997]

Overview

  • 1. Select a module.
  • 2. Select one of its neighbors.
  • 3. Evaluate the wirelength change due to swapping them.
  • 4. If the swap decreases wirelength, accept it. Otherwise, accept the swap with probability exp(−ΔWL / T), where ΔWL is the wirelength increase and T is the current temperature.
  • 5. Repeat for reduced T until T approaches 0.

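
The annealing loop on this slide can be sketched in a few lines. This is a minimal illustration, not VPR's implementation: it assumes half-perimeter wirelength as the cost, picks the swap partner uniformly at random rather than from a neighborhood, uses the standard Metropolis acceptance rule exp(−Δ/T), and cools geometrically. The function names and all parameter values (`t_start`, `cooling`, `moves_per_t`, and so on) are hypothetical.

```python
import math
import random

def hpwl(nets, pos):
    """Total half-perimeter wirelength of all nets under placement `pos`."""
    total = 0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def anneal(nets, pos, t_start=5.0, t_end=0.01, cooling=0.9, moves_per_t=100, seed=0):
    """Swap-based simulated annealing; returns the best placement seen and its cost."""
    rng = random.Random(seed)
    pos = dict(pos)                      # work on a copy of the slot assignment
    cells = sorted(pos)
    cost = hpwl(nets, pos)
    best_pos, best_cost = dict(pos), cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            a, b = rng.sample(cells, 2)              # steps 1-2: pick two modules (assumed: uniformly)
            pos[a], pos[b] = pos[b], pos[a]          # tentatively swap their slots
            delta = hpwl(nets, pos) - cost           # step 3: wirelength change
            # step 4: always accept non-worsening swaps, otherwise Metropolis rule
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost += delta
                if cost < best_cost:
                    best_pos, best_cost = dict(pos), cost
            else:
                pos[a], pos[b] = pos[b], pos[a]      # reject: undo the swap
        t *= cooling                                 # step 5: reduce T
    return best_pos, best_cost
```

Recomputing the full wirelength per move keeps the sketch short; a practical placer would update only the nets touching the two swapped modules.
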
SLIDE 10

Cutsize-Driven Recursive Top-Down Partitioning

Initially, there is only netlist connectivity; no spatial information is available.

Apply a standard partitioning algorithm to the given netlist. Multilevel partitioning algorithms are the most effective.

After two stages, each cell has been assigned to one of four possible subregions. As few nets as possible have been cut.

After three stages, each cell has been assigned to one of eight possible subregions. As few nets as possible have been cut.

Iterative improvement by repartitioning with terminal propagation is essential.

SLIDE 11

Cutsize-Driven Partitioning-Based Placement

  • Cutsize = the number of nets not contained in just one side of the partition
  • Rent's rule shows that wirelength and cutsize correlate to within about X2 log N [Wang et al, 2000].
  • Fast FM-style iterations with terminal propagation
  • Careful cutline selection and multiway partitions can help
  • e.g. Capo, Feng-Shui, Dragon
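
The cutsize definition above translates directly into code. The sketch below is illustrative only; the data layout (nets as tuples of cell names, `side` as a dict from cell to partition block) is an assumption made for the example.

```python
def cutsize(nets, side):
    """Count the nets whose cells are not all contained in one side of the partition.

    nets: iterable of tuples of cell names (hypothetical layout)
    side: dict mapping each cell name to its partition block (e.g. 0 or 1)
    """
    return sum(1 for net in nets if len({side[c] for c in net}) > 1)

# Example: the first net stays inside block 0, the other two span both blocks.
example_nets = [("a", "b"), ("b", "c"), ("a", "c", "d")]
example_side = {"a": 0, "b": 0, "c": 1, "d": 1}
example_cut = cutsize(example_nets, example_side)
```

Because the set of blocks touched by a net is computed generically, the same function also counts cut nets for multiway partitions.
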

SLIDE 12

Quadratic Placement

Optimality Condition:

Example.

[Figure: example graph with numbered cells and edge weights]

Q is the graph Laplacian matrix: Q = D − G, where D is the degree matrix and G is the graph adjacency matrix.
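
To make the role of Q = D − G concrete, here is a small one-dimensional sketch. It assumes the usual quadratic objective, the sum of w·(x_i − x_j)² over connections, with pad coordinates held fixed; setting the gradient to zero then gives one linear equation per movable cell, whose coefficients are that cell's row of the Laplacian and whose right-hand side collects the pad terms. The Gauss-Seidel sweep used to solve it is a stand-in chosen to keep the example dependency-free, not the solver any particular placer uses, and all names are hypothetical.

```python
def quadratic_place_1d(movable, pads, edges, sweeps=200):
    """Minimize sum of w * (x_u - x_v)**2 over edges with pad positions fixed.

    movable: list of movable cell names (each must have at least one edge)
    pads:    dict pad name -> fixed coordinate
    edges:   list of (u, v, w) weighted two-pin connections
    Each sweep sets a cell to the weighted mean of its neighbors, which is
    Gauss-Seidel applied to the Laplacian system (degree * x_i - sum w * x_j = pad terms).
    """
    x = {c: 0.0 for c in movable}
    for _ in range(sweeps):
        for c in movable:
            num = den = 0.0
            for u, v, w in edges:
                if c == u:
                    other = v
                elif c == v:
                    other = u
                else:
                    continue
                num += w * (pads[other] if other in pads else x[other])
                den += w                      # den is the diagonal (degree) entry of Q
            x[c] = num / den
    return x

# Chain: pad p0 at 0, cells a and b, pad p1 at 3, unit weights.
placement = quadratic_place_1d(["a", "b"], {"p0": 0.0, "p1": 3.0},
                               [("p0", "a", 1.0), ("a", "b", 1.0), ("b", "p1", 1.0)])
```

In this chain the cells settle at equal spacing between the two pads. Without the pads every cell could collapse to the same coordinate at zero cost.
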

SLIDE 13

Quadratic Placement with Iterative Legalization

  • Unconstrained Optimality Condition:
  • Solve one large symmetric positive-definite linear system.
  • Pads prevent cells from collapsing to a single point.
  • Example: Gordian-L.
      • Minimize cutsize, but use the given placement to form initial partitions (e.g., using x- or y-coordinate median for cutline)
      • New subregions generate new center-of-mass constraints for subsequent iterations

SLIDE 14

Example: Gordian-L-style Placement

SLIDE 15

Outline

  • Introduction
  • Gap Analysis of Existing Placement Algorithms
      • PEKO Benchmark Construction
      • Experiment Results
  • Highlights from UCLA mPL5

SLIDE 16

Why Is Placement Still a Problem?

  • True, it has been studied for over 30 years, but …
  • We need good solutions more than ever
      • One of the most important steps in the IC implementation flow
      • Directly defines interconnects
  • Difficult
      • Problem size grows 2X every 18-24 months (Moore's Law)
      • Cannot place hierarchically without quality degradation

SLIDE 17

Optimality and Scalability Study: Motivation

  • Lack of significant progress in wirelength reduction
      • Rate of reduction is about 5-10% every 2-3 years
      • Latest developments in placement differ mainly in runtime
  • Where do we stand?
      • How much room is there for further improvement?
      • Will existing placement engines scale well to 10+M gate designs?
  • Most work compares only with existing heuristics
      • Use real-design-based benchmarks, e.g.
          • ISPD98 [C. Alpert 1998]
      • Use synthetic benchmarks, e.g.
          • circ and gen [M. D. Hutton et al, 1998]
          • gnl [D. Stroobandt et al, 2000]
      • Little understanding of the gap from the optimal

SLIDE 18

Our Contribution: Placement Example Construction with Known Optimal Wirelength

  • Construct instances with known optimal wirelength using the characteristics of the original problem
  • Optimality and Scalability Study of Existing Placement Algorithms [C. Chang et al, 2003]
      • Studied the optimality and scalability of existing algorithms on constructed instances

SLIDE 19

Placement Examples with Known Optimal Wirelength [Chang et al, 2003]

  • All the modules are of equal size, and there is no space between rows and adjacent modules
  • For 2-pin nets, connect any two adjacent modules
  • For each n-pin net, connect the n modules in a rectangular region close to a square, i.e., the length of each side is close to sqrt(n)
  • The wirelength of each n-pin net is given by ⌈sqrt(n)⌉ + ⌈n / ⌈sqrt(n)⌉⌉ − 2
  • Net degree distributions extracted from real industrial benchmarks
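
A short worked example of the per-net optimum helps when reading the benchmark tables. The sketch assumes unit-size cells, center-to-center half-perimeter measurement, and the near-square packing described above, so that an n-pin net occupies a block ⌈sqrt(n)⌉ cells wide and ⌈n / ⌈sqrt(n)⌉⌉ cells tall. The function name is hypothetical.

```python
import math

def optimal_net_wirelength(n):
    """Half-perimeter of n unit cells packed in a near-square block, center to center.

    Computes ceil(sqrt(n)) + ceil(n / ceil(sqrt(n))) - 2 using integer arithmetic only.
    """
    if n < 1:
        raise ValueError("a net needs at least one pin")
    width = math.isqrt(n - 1) + 1        # ceil(sqrt(n)) for n >= 1
    height = -(-n // width)              # ceil(n / width)
    return width + height - 2

# Two adjacent cells are one unit apart; a 2x2 block spans one unit in each direction.
small_cases = {n: optimal_net_wirelength(n) for n in (2, 3, 4, 5, 9)}
```

Summing this quantity over all nets of a constructed instance gives the known optimal total wirelength for that instance under the stated assumptions.
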

SLIDE 20

PEKO Characteristics

PEKO Suite1 (12.5k to 210k cells)

ckt        #cell     #net      #row   Optimal WL
Peko01     12506     13865     113    8.14E+05
Peko02     19342     19325     140    1.26E+06
Peko03     22853     27118     152    1.50E+06
Peko04     27220     31683     166    1.75E+06
Peko05     28146     27777     169    1.91E+06
Peko06     32332     34660     181    2.06E+06
Peko07     45639     47830     215    2.88E+06
Peko08     51023     50227     227    3.14E+06
Peko09     53110     60617     231    3.64E+06
Peko10     68685     74452     263    4.73E+06
Peko11     70152     81048     266    4.71E+06
Peko12     70439     76603     266    5.00E+06
Peko13     83709     99176     290    5.87E+06
Peko14     147088    152255    385    9.01E+06
Peko15     161187    186225    402    1.15E+07
Peko16     182980    189544    429    1.25E+07
Peko17     184752    188838    431    1.34E+07
Peko18     210341    201648    460    1.32E+07

PEKO Suite2 (125k to 2.1M cells)

ckt        #cell     #net      #row   Optimal WL
Peko01x10  125060    138650    335    8.14E+06
Peko02x10  193420    193250    441    1.26E+07
Peko03x10  228530    271180    479    1.50E+07
Peko04x10  272200    316830    523    1.75E+07
Peko05x10  281460    277770    532    1.91E+07
Peko06x10  323320    346600    570    2.06E+07
Peko07x10  456390    478300    677    2.88E+07
Peko08x10  510230    502270    715    3.14E+07
Peko09x10  531100    606170    730    3.64E+07
Peko10x10  686850    744520    830    4.73E+07
Peko11x10  701520    810480    839    4.71E+07
Peko12x10  704390    766030    840    5.00E+07
Peko13x10  837090    991760    916    5.87E+07
Peko14x10  1470880   1522550   1214   9.01E+07
Peko15x10  1611870   1862250   1271   1.15E+08
Peko16x10  1829800   1895440   1354   1.25E+08
Peko17x10  1847520   1888380   1360   1.34E+08
Peko18x10  2103410   2016480   1451   1.32E+08

SLIDE 21

Studied Four State-of-the-Art Placers

  • Capo [A. Caldwell et al, 2000]
      • Based on multilevel partitioner
      • Aims to enhance the routability
  • Dragon [M. Wang et al, 2000]
      • Uses hMetis for initial partition
      • SA with bin-based swapping
  • mPL [T. Chan et al, 2000]
      • Multilevel placer using NLP on the coarsest level
      • Goto based relaxation
  • QPlace [Cadence Inc.]
      • Leading edge industrial placer
      • Component of Silicon Ensemble

SLIDE 22

Experiment Results on PEKO, July 2004

[Figure: two plots against #cells (50,000 to 250,000) for Dragon 2.20, Capo 8.6, mPL4, and QPlace 5.1: wirelength as a multiple of the optimum (axis 0.5 to 3) and runtime in seconds (axis 5,000 to 40,000)]

  • Existing algorithms are 30-153% away from the optimal on PEKO
  • There is significant room for improvement in placement algorithms!
  • ROI can be huge: 30% wirelength reduction is equivalent to
      • Move from aluminum to copper, or
      • One process generation shrink

SLIDE 23

Experiment with State-of-the-Art Placers Using PEKO Suite1 & Suite2 (July 2004)

[Figure: two plots against #cells (10,000 to 10,000,000) for Dragon 2.20, Capo 8.6, mPL 4, and QPlace 5.1: runtime in seconds (axis 10,000 to 60,000) and wirelength as a multiple of the optimum (axis 1 to 2.8)]

  • Capo, QPlace, and mPL scale well in runtime
  • Average solution quality of each tool deteriorates by an additional 4% to 25% when the problem size increases by a factor of 10
  • QoR of the existing placement algorithms can be 40%-160% away from the optimal for large designs

SLIDE 24

SLIDE 25

Limitations of the PEKO Examples

  • Optimal solution includes local nets only
      • Unlikely for real designs
  • Measure wirelength only
      • Timing and routability are important objectives for placement algorithms as well

SLIDE 26

Impact of Global Connections in Real Examples

circuit   height   width   WL of longest net   WL contribution of longest 10%
ibm01     8158     4530    7148                51%
ibm02     8158     6430    14224               46%
ibm03     8158     6740    10624               58%
ibm04     8158     9140    15171               53%
ibm05     8158     11055   19064               47%
ibm06     8158     8715    13966               61%
ibm07     8158     14605   14051               51%
ibm08     8158     15895   16142               60%
ibm09     8158     16395   13780               55%
ibm10     8158     27890   30755               53%
ibm11     16350    10925   19234               59%
ibm12     16350    15545   26748               52%
ibm13     16350    12230   19539               59%
ibm14     16350    25475   26370               61%
ibm15     16350    23785   27284               63%
ibm16     16350    34015   42860               59%
ibm17     16283    38895   45686               56%
ibm18     16350    37065   52846               64%

  • Produced by Dragon on ISPD98
  • The wirelength contribution from global connections can be significant!
  • Need to consider the impact of global connections

SLIDE 27

Placement Examples with Known Upperbounds (PEKU)

  • Generate nets with optimal wirelength as in PEKO
  • Add random connections to emulate global nets

SLIDE 28

PEKU Suite

% non-local nets   circuit   #cell     #net      #row   Row utilization   LB         UB
0%                 Peku01    12506     14111     113    85%               8.14E+05   8.14E+05
                   Peku05    28146     28446     169    85%               1.91E+06   1.91E+06
                   Peku10    68685     75196     263    85%               4.73E+06   4.73E+06
                   Peku15    161187    186608    402    85%               1.15E+07   1.15E+07
                   Peku18    210341    201920    460    85%               1.32E+07   1.32E+07
0.25%              Peku01    12506     14111     113    85%               8.14E+05   9.23E+05
                   Peku05    28146     28446     169    85%               1.91E+06   2.24E+06
                   Peku10    68685     75196     263    85%               4.73E+06   6.17E+06
                   Peku15    161187    186608    402    85%               1.15E+07   1.71E+07
                   Peku18    210341    201920    460    85%               1.32E+07   2.01E+07
0.50%              Peku01    12506     14111     113    85%               8.14E+05   1.02E+06
                   Peku05    28146     28446     169    85%               1.91E+06   2.63E+06
                   Peku10    68685     75196     263    85%               4.73E+06   7.52E+06
                   Peku15    161187    186608    402    85%               1.15E+07   2.30E+07
                   Peku18    210341    201920    460    85%               1.32E+07   2.75E+07
Up to 10%

URL: http://cadlab.cs.ucla.edu/~pubbench/peku.htm

SLIDE 29

Experiment Results on PEKU, July 2004

[Figure: quality ratio (axis 1 to 2.2) versus % of non-local nets (0.00%, 0.25%, 0.50%, 0.75%, 1.00%, 2.00%, 5.00%, 10.00%) for Capo 8.6, Dragon 2.20, mPG 1.0, and mPL 4]

  • Absolute value of the QRs may not be meaningful, but it helps to identify the technique that works best under each scenario
  • No existing placer can consistently produce the best quality

SLIDE 30

In Preparation: PEKO-MS (Mixed-Size PEKO)

Center-to-center HPWL = 1029536. Pin-to-pin HPWL = 264944.

As of March 2005, the best result of mPL5 on this benchmark is still over 6X greater than optimal (in pin-to-pin half-perimeter wirelength)!

SLIDE 31

Observations from Gap Analysis

  • Significant opportunity in placement
      • Existing algorithms may produce solutions far away from the optimal
      • The result quality of the same placer varies for circuits of similar size but different characteristics
      • Scalability problem in runtime and solution quality
  • Significant ROI
      • Benefit equal to one to two generations of process scaling
      • But without requiring multi-billion dollar investment (we hope!)

SLIDE 32

Outline

  • Introduction
  • Gap Analysis of Existing Placement Algorithms
  • Highlights from UCLA mPL5
      • Multiscale Optimization Framework
      • Generic Force-Directed Formulation
      • Multiscale Nonlinear-Programming Algorithm

SLIDE 33

Multilevel Optimization Framework

[Figure: V-cycle diagram. Coarsening (clustering) descends from the given problem as the problem size decreases; interpolation and relaxation (optimization) ascend back to it.]

  • Multilevel coarsening generates smaller problem sizes at coarser levels, which gives faster optimization at coarser levels
  • May explore different aspects of the solution space at different levels
  • Gradual refinement on good solutions from coarser levels is very efficient
  • Successful in many applications
      • Originally developed for PDEs
      • Recent success in VLSI CAD: partitioning, placement, routing

SLIDE 34

Multilevel Placement

  • Coarsening: build a hierarchy of problem approximations by generalized clustering
  • Relaxation: improve the placement at each level by iterative optimization
  • Interpolation: transfer coarse-level solution to adjacent, finer level (generalized declustering)
  • Multilevel Flow: multiple traversals over multiple hierarchies (V-cycle variations)

SLIDE 35

Multilevel Methods: Coarsening by Recursive Aggregation

  • Recursive aggregation defines the hierarchy.
  • Different aggregation algorithms can be used on different levels and/or in different V-cycles.
  • Example: First-Choice Clustering (hMetis [Karypis 1999]).

[Figure: merge each vertex with its "best" neighbor; merged nets]
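
The "best neighbor" selection behind First-Choice clustering can be illustrated as follows. This is a sketch under stated assumptions, not hMetis or mPL code: the affinity between two cells is taken to be the sum of 1/(|net| − 1) over the nets they share, which is one common choice, and ties are broken alphabetically. Area balancing and the actual merge step are omitted, and the function name is hypothetical.

```python
def first_choice_neighbors(nets):
    """Return {cell: best_neighbor} under an assumed net-size-weighted affinity.

    nets: list of tuples of cell names.
    Affinity(a, b) = sum over shared nets of 1 / (net size - 1)  (assumption).
    """
    affinity = {}
    for net in nets:
        if len(net) < 2:
            continue                      # single-pin nets connect nothing
        weight = 1.0 / (len(net) - 1)
        for a in net:
            for b in net:
                if a != b:
                    row = affinity.setdefault(a, {})
                    row[b] = row.get(b, 0.0) + weight
    # max() returns the first maximal element, so sorting gives a deterministic tie-break
    return {a: max(sorted(nbrs), key=nbrs.get) for a, nbrs in affinity.items()}

# Cells a and b share two nets, so each is the other's first choice.
choices = first_choice_neighbors([("a", "b"), ("a", "b", "c")])
```

A coarsening pass would then visit cells in some order and merge each unmatched cell into the cluster of its chosen neighbor, producing the next, smaller level of the hierarchy.
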

SLIDE 36

Multilevel Methods: Relaxation (Intralevel Optimization)

  • Iterative improvement at each level by local or global fast computation
  • Additional global improvement comes from the multilevel hierarchy.
  • Example: Goto-based discrete exchange
      • Calculate A's optimal-wirelength location, holding other cells fixed.
      • Compute a chain A, B, C, D, E, where B is a randomly selected neighbor of A's optimal location, etc.
      • Examine all permutations of the chain and take the best one.

SLIDE 37

Multilevel Methods: Interpolation (Generalized Declustering)

[Figure: place representative components, then place the others by weighted interpolation]

  • Transfer a partial solution from a coarser level to its adjacent finer level
  • Example: place a component at the weighted average of the positions of the clusters containing its neighbors
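
The weighted-average rule in the example above can be written out directly. The sketch below is illustrative; the data layout (a list of neighbor and weight pairs, a cell-to-cluster map, and cluster coordinates) and the function name are assumptions, and the weights would typically come from connection strengths.

```python
def interpolate_position(neighbors, cluster_of, cluster_pos):
    """Place a component at the weighted average of its neighbors' cluster positions.

    neighbors:   list of (neighbor_name, weight) pairs with positive total weight
    cluster_of:  dict neighbor_name -> name of the coarse-level cluster containing it
    cluster_pos: dict cluster name -> (x, y) position from the coarser level
    """
    total_w = sum(w for _, w in neighbors)
    x = sum(w * cluster_pos[cluster_of[n]][0] for n, w in neighbors) / total_w
    y = sum(w * cluster_pos[cluster_of[n]][1] for n, w in neighbors) / total_w
    return (x, y)

# A component tied three times more strongly to v than to u lands closer to v's cluster.
position = interpolate_position(
    [("u", 1.0), ("v", 3.0)],
    {"u": "C1", "v": "C2"},
    {"C1": (0.0, 0.0), "C2": (4.0, 8.0)},
)
```

The result is only an initial position on the finer level; relaxation at that level then refines it.
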

SLIDE 38

Iterated Multilevel Flow

[Figure: two successive V-cycles. The first uses First Choice (FC) clustering; the second makes use of the placement solution from the 1st V-cycle through geometric-based FC clustering.]

SLIDE 39

Iterated Multilevel Flow

[Figure: three flow variants: Iterated V-Cycles, F-Cycle, and Backtracking V-Cycle]

SLIDE 40

A Brief History of mPL

[Figure: relative wirelength by year (2000, 2001, 2002, 2003, 2004), with versions grouped under UNIFORM CELL SIZE and NON-UNIFORM CELL SIZE]

mPL 1.0 [ICCAD00]

  • Recursive ESC clustering
  • NLP at coarsest level
  • Goto discrete relaxation
  • Slot Assignment legalization
  • Domino detailed placement

mPL 1.1

  • FC-Clustering
  • added partitioning to legalization

mPL 2.0

  • RDFL relaxation
  • primal-dual netlist pruning

mPL 3.0 [ICCAD 03]

  • QRS relaxation
  • AMG interpolation
  • multiple V-cycles
  • cell-area fragmentation

mPL 4.0

  • improved DP
  • better coarsening
  • backtracking V-cycle

mPL 5.0

  • Multilevel Force-Directed

SLIDE 41

Kraftwerk Framework for Force-Directed Placement [Eisenmann and Johannes 98]

  • Minimize quadratic wirelength
  • Incorporate density-gradient forces (f_k) acting on cells into the optimality condition:
  • Assume forces are zero at infinity.
  • Iteratively update v_k and f_k.
  • Key limitation: extensive tuning required for proper force scaling.

Cell density is a continuous but NON-SMOOTH function of position

SLIDE 42

Generalized Force Directed Method in mPL5

  • Our generalized force directed method
      • Minimize log-sum-exp wirelength W(x) [Naylor 01; Kahng and Wang 04] subject to even bin density constraints

        min W(x)  s.t.  d(x) = c,  where c = total cell area divided by chip area

SLIDE 43

mPL5 Generalized Force-Directed Placement

  • Basic formulation

        min W(x)  s.t.  d(x) = c,  where c = total cell area divided by chip area

  • Smooth the density constraints by Laplace transformation and solving a Poisson Equation:
  • Assume Neumann boundary conditions: forces pointing outside the chip boundary are zero.
  • Can solve a discretized version efficiently using fast discrete cosine transformation

SLIDE 44

Wirelength Estimation

[Figure: five wiring models for the same net, with their rectilinear lengths]

  (a) Steiner Tree: Rectilinear Length = 14
  (b) Steiner Tree with Trunk: Rectilinear Length = 15
  (c) Minimum Spanning Tree: Rectilinear Length = 16
  (d) Chain: Rectilinear Length = 17
  (e) Complete Graph: Rectilinear Length = 42

Approximation: half perimeter of the bounding box

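
The half-perimeter approximation is simple enough to state in code. The sketch below computes it for a single net from pin coordinates; the function name and the sample pins are made up for illustration and are not the pins of the net shown on the slide.

```python
def half_perimeter(pins):
    """Half-perimeter wirelength (HPWL) of one net: width plus height of its bounding box.

    pins: list of (x, y) pin coordinates belonging to the net.
    """
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

# Bounding box of these three pins is 3 wide and 4 tall.
sample_hpwl = half_perimeter([(0, 0), (3, 4), (1, 2)])
```

For two- and three-pin nets the half perimeter equals the rectilinear Steiner tree length; for larger nets it is a lower bound on it, which is why it is cheap but optimistic.
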
SLIDE 45

Objective Function Used in mPL5

  • Log-sum-exp smooth approximation to half-perimeter wirelength [Naylor 2001; Kahng and Wang 2004]:
  • Other approximations are also possible (using p-norm)
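
As an illustration of the log-sum-exp idea, the sketch below uses the form in which it is commonly written: for the pin x-coordinates of one net and a smoothing parameter γ, the net's x-extent is approximated by γ·log Σ exp(x_i/γ) + γ·log Σ exp(−x_i/γ), and likewise for y. The exact expression and parameter choices used in mPL5 are not recoverable from this transcript, so the form, the function name, and the value of `gamma` here are assumptions.

```python
import math

def lse_extent(coords, gamma):
    """Smooth approximation of max(coords) - min(coords) by log-sum-exp.

    Shifting by the max (or min) before exponentiating avoids overflow and
    leaves the mathematical value unchanged.
    """
    hi, lo = max(coords), min(coords)
    smooth_max = hi + gamma * math.log(sum(math.exp((c - hi) / gamma) for c in coords))
    smooth_neg_min = -lo + gamma * math.log(sum(math.exp(-(c - lo) / gamma) for c in coords))
    return smooth_max + smooth_neg_min

def lse_wirelength(pins, gamma):
    """Smooth half-perimeter estimate for one net given (x, y) pin coordinates."""
    return lse_extent([x for x, _ in pins], gamma) + lse_extent([y for _, y in pins], gamma)

pins = [(0.0, 0.0), (3.0, 4.0), (1.0, 2.0)]
smooth = lse_wirelength(pins, gamma=0.1)
```

The estimate never falls below the true half perimeter and exceeds it by at most 4·γ·log(k) for a k-pin net (2·γ·log(k) per axis), so a smaller γ tracks the bounding box more tightly at the cost of a less smooth function.
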

SLIDE 46

mPL5 Nonlinear-Programming Solution

  • Using the Uzawa algorithm to solve the above nonlinear constrained minimization problem, we iteratively solve
  • No matrix storage and no second derivatives are computed.
  • Use multilevel approach to speed up computation and improve quality
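
A generic Uzawa iteration on a toy problem may help make the idea concrete. The sketch below is not the mPL5 solver and does not use its wirelength or density functions: it minimizes f(x, y) = (x − 2)² + (y − 1)² subject to g(x, y) = x + y − 1 = 0, alternating a first-order minimization of the Lagrangian f + λ·g in the variables with the multiplier update λ ← λ + α·g. All names and step sizes are hypothetical.

```python
def uzawa(grad_f, g, grad_g, x0, alpha=0.5, inner_lr=0.1, inner_steps=200, outer_steps=100):
    """Uzawa iteration for min f(x) s.t. g(x) = 0 with a single scalar constraint.

    Uses only first derivatives and stores no matrix.
    """
    x = list(x0)
    lam = 0.0
    for _ in range(outer_steps):
        # Inner loop: gradient descent on the Lagrangian f + lam * g with lam held fixed.
        for _ in range(inner_steps):
            gf, gg = grad_f(x), grad_g(x)
            x = [xi - inner_lr * (a + lam * b) for xi, a, b in zip(x, gf, gg)]
        # Outer step: move the multiplier along the constraint violation.
        lam += alpha * g(x)
    return x, lam

# Toy problem (assumed for illustration only).
solution, multiplier = uzawa(
    grad_f=lambda v: [2.0 * (v[0] - 2.0), 2.0 * (v[1] - 1.0)],
    g=lambda v: v[0] + v[1] - 1.0,
    grad_g=lambda v: [1.0, 1.0],
    x0=[0.0, 0.0],
)
```

On this toy problem the iterates approach the constrained minimizer (1, 0) with multiplier 2. In a placement setting the scalar constraint would be replaced by one density constraint per bin, with one multiplier each.
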

SLIDE 47

mPL5 Framework

[Figure: multilevel flow across Level 1, Level 2, and Level 3, marking the levels at which GFD is applied. Legend: C = Coarsening, I = Interpolation.]

Keep coarsening until # cells less than 500

SLIDE 48

mPL5 VS other state-of-the-art placers on FastPlace IBM Standard Cell Placement Benchmarks (March 2005)

[Figure: scaled runtime (axis 1 to 13) versus scaled wirelength (axis 0.95 to 1.1) for Capo 9.0, Dragon 3.01, FastPlace 1.0, Fengshui 5.0, mPL5, and mPL5-fast. Point labels recovered from the plot: 12.38; 1; 1.09, 2.29; 1.08, 0.18; 1.06, 2.03; 1.07, 0.3]

slide-49
SLIDE 49

3/29/2005 UCLA VLSICAD LAB 49

Scalability plot of mPL5-fast vs. FastPlace1.0 on FastPlace IBM Benchmarks

[Figure: runtime (axis 0 to 800) versus #Cells (axis 0 to 200000) for FastPlace1.0 and mPL5-fast, with fitted curves.]

  • mPL5-fast: $y = 0.0001\,x^{1.2409}$
  • FastPlace1.0: $y = 5 \times 10^{-6}\,x^{1.4995}$

mPL5-fast is slightly more scalable than FastPlace1.0
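To make the scalability claim concrete, the two power-law fits printed on the plot (y = 0.0001 x^1.2409 for mPL5-fast and y = 5E-06 x^1.4995 for FastPlace1.0) can be evaluated directly. This is a small sketch: the coefficients are the rounded values shown on the slide, the runtime unit is not stated there, and the helper names are illustrative, so the crossover it reports is only approximate.

```python
# Power-law fits as printed on the slide: runtime = coeff * cells ** exponent.
FITS = {
    "mPL5-fast": (1.0e-4, 1.2409),
    "FastPlace1.0": (5.0e-6, 1.4995),
}

def fitted_runtime(placer, num_cells):
    """Runtime predicted by the slide's fitted curve (unit as on the plot)."""
    coeff, exponent = FITS[placer]
    return coeff * num_cells ** exponent

def crossover_cells():
    """Cell count at which the two fitted curves intersect."""
    (c1, e1), (c2, e2) = FITS["mPL5-fast"], FITS["FastPlace1.0"]
    # c1 * x**e1 == c2 * x**e2  =>  x = (c1 / c2) ** (1 / (e2 - e1))
    return (c1 / c2) ** (1.0 / (e2 - e1))

for n in (50_000, 100_000, 200_000):
    print(n, fitted_runtime("mPL5-fast", n), fitted_runtime("FastPlace1.0", n))
print("crossover near", round(crossover_cells()), "cells")
```

Because mPL5-fast has the smaller exponent, its fitted curve falls below the FastPlace1.0 curve once the design exceeds the crossover size, which is on the order of 10^5 cells with these rounded coefficients.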

slide-50
SLIDE 50

Placement Plot of Placers on IBM02

  • mPL5: Rel. WL = 1.00
  • Fengshui 5.0: Rel. WL = 1.11
  • Capo 9.0: Rel. WL = 1.17

slide-51
SLIDE 51

Placement Plot of Placers on IBM10

  • mPL5: Rel. WL = 1.00
  • Fengshui 5.0: Rel. WL = 1.15
  • Capo 9.0: Rel. WL = 1.28

slide-52
SLIDE 52

mPL5 vs. other state-of-the-art placers on PEKO-pad

[Figure: quality ratio (axis 1.00 to 2.80) versus #Cells (12506, 27220, 45639, 68685, 83709, 182980) for Capo9.0, Dragon3.01, Fengshui5.0, FastPlace1.0, mPL5, and mPL5-fast.]

slide-53
SLIDE 53

mPL movie

slide-54
SLIDE 54

Concluding Remarks

  • There is still significant opportunity to improve placement technologies.
  • mPL5 achieves improvement by incorporating density-constrained nonlinear programming into a multilevel framework.
  • ISPD 2005 placement contest
  • Multiscale Optimization Framework
  • Generic Force-Directed Formulation
  • Multiscale Nonlinear-Programming Algorithm

slide-55
SLIDE 55

ISPD 2005 Placement Contest

  • 9 teams worldwide competing
  • SUNY Binghamton (FengShui), T. Munich (FD), UCSD (Aplace), UCLA (Dragon, mPL), U Michigan (Capo), …
  • 8 new large-scale real industrial benchmarks released on March 20 at 5pm EST
  • Results are due March 25 at 5pm EST
  • Web site

slide-56
SLIDE 56

ISPD 2005 Circuit Benchmark Characteristics

Circuit    #Objects  #Fixed Objects  #Nets    #Pins    #Pins on Fixed Objects  Design Density  Design Utilization
adaptec1   211447    543             221142   944053   20540                   75.71%          57.34%
adaptec2   255023    566             266009   1069482  23783                   78.56%          44.32%
adaptec3   451650    723             466758   1875039  31187                   74.53%          33.66%
adaptec4   496045    1329            515951   1912420  35857                   62.67%          27.23%
bigblue1   278164    560             284479   1144691  12835                   54.19%          44.67%
bigblue2   557866    23084           577235   2122282  142685                  61.80%          37.94%
bigblue3   1096812   1293            1123170  3833218  43111                   85.65%          56.68%
bigblue4   2177353   8170            2229886  8900078  189411                  65.30%          44.35%

  • Design Density = (Total Object Area) / (Chip Area)
  • Design Utilization = (Total Movable Object Area) / (Unused Chip Area)
  • Adaptec1, Adaptec2, and BigBlue1 have perimeter I/O pads. All others employ fixed area-array I/O objects.
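The two ratios defined above can be inverted to estimate how much of each chip is covered by fixed versus movable objects. The sketch below assumes that "Unused Chip Area" means chip area minus fixed-object area; the slide does not spell this out, so treat that reading and the function name as assumptions. With D = (M + F) / A and U = M / (A - F), solving for the fixed fraction gives F / A = (D - U) / (1 - U).

```python
def area_fractions(density, utilization):
    """Return (fixed_area / chip_area, movable_area / chip_area).

    Assumes unused chip area = chip area - fixed-object area (an
    interpretation of the slide, not something it states explicitly).
    """
    fixed = (density - utilization) / (1.0 - utilization)
    movable = density - fixed
    return fixed, movable

# adaptec1 row from the table: density 75.71%, utilization 57.34%.
fixed, movable = area_fractions(0.7571, 0.5734)
print(f"fixed = {fixed:.1%} of chip area, movable = {movable:.1%}")
```

Under that assumption, roughly 43% of the adaptec1 chip area is occupied by fixed objects and roughly 33% by movable ones.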

slide-57
SLIDE 57

mPL5 Solution to Big Blue 1

slide-58
SLIDE 58

mPL5 Solution to Big Blue 2

slide-59
SLIDE 59

mPL5 Solution to Big Blue 3

slide-60
SLIDE 60

mPL5 Solution to Big Blue 4

slide-61
SLIDE 61

Acknowledgements

  • We gratefully acknowledge support from
  • Semiconductor Research Corporation (SRC)
  • National Science Foundation (NSF)
  • Industrial sponsors under the California MICRO programs (Altera, Intel, Magma, Xilinx)
  • The hard work of a number of former and current graduate students in developing and refining the mPL package