Large Scale Circuit Placement: Gap and Progress
Jason Cong
UCLA Computer Science Department
http://cadlab.cs.ucla.edu/~cong
3/29/2005 UCLA VLSICAD LAB 2
Outline
- Introduction
  - Problem Description
  - Popular Methods
- Gap Analysis of Existing Placement Algorithms
  - PEKO Benchmark Construction
  - Experiment Results
- UCLA mPL5
  - Multiscale Optimization Framework
  - Generic Force-Directed Formulation
  - Multiscale Nonlinear-Programming Solution
Complex IC Design Example
- High-end automotive A/V application
- 10M gates
- 70 clocks (320 MHz)
- Technology: 0.13 µm, 6LM
Courtesy of Magma Design Automation
VLSI CAD
- Computer-aided design (CAD) of very-large-scale integrated (VLSI) circuits
- Electronic design automation (EDA)
ITRS 2004 (Int'l Tech. Roadmap for Semiconductors)
Year of production | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009
Functions/chip at production (M transistors) | 307 | 386 | 487 | 614 | 773 | 974 | 1227
Chip size at production (mm^2) | 280 | 280 | 280 | 280 | 280 | 280 | 280
On-chip local clock (MHz) | 2976 | 4171 | 5204 | 6783 | 9285 | 10972 | 12369
Maximum wiring level | 13 | 14 | 15 | 15 | 15 | 16 | 16
DRAM 1/2 pitch (nm) | 100 | 90 | 80 | 70 | 65 | 57 | 50
Outline
- Introduction
  - Problem Description
  - Popular Methods
- Gap Analysis of Existing Placement Algorithms
- UCLA mPL5
Circuit Placement Problem Statement
- Given
  - A set of cells (modules) of fixed dimensions and the interconnections between them (a netlist)
- Find
  - The position of each cell, such that
    - there is no overlap (and enough routing space)
    - the total length of all interconnections is minimized
    - routing congestion, delay, ... are minimized
- An NP-hard problem
[Figure: a netlist of cells A to I connected by nets, shown with a bad placement and a good placement]
Popular Placement Methods
The placement problem has been studied extensively for over 30 years.
- Iterative improvement
  - Repeatedly rearrange small subsets of modules
  - E.g. simulated annealing
- Min-cut based placement
  - Recursively bi-partition modules in a way that minimizes connections between partition blocks
- Quadratic placement with recursive legalization
  - Initial solution by unconstrained quadratic wirelength minimization
  - Gradually spread cells out to remove overlap
Simulated Annealing Based Placement
E.g. VPR [Betz and Rose, 1997]
Overview
1. Select a module.
2. Select one of its neighbors.
3. Evaluate the wirelength change due to swapping them.
4. If the swap decreases wirelength, accept it. Otherwise, accept the swap with probability exp(-ΔW/T), where ΔW is the wirelength increase and T is the current temperature.
5. Repeat for reduced T until T approaches 0.
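The annealing loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not VPR's implementation: the cost is half-perimeter wirelength, the second module is drawn uniformly at random rather than from a neighborhood, the cooling schedule is a plain geometric one, and the function names, parameters, and tiny 3x3 example netlist are all hypothetical.

```python
import math
import random


def hpwl(net, pos):
    """Half-perimeter wirelength of one net (a tuple of cell names)."""
    xs = [pos[c][0] for c in net]
    ys = [pos[c][1] for c in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(nets, pos):
    return sum(hpwl(net, pos) for net in nets)


def anneal(nets, pos, t_start=5.0, t_end=0.01, cooling=0.9,
           moves_per_t=100, seed=0):
    """Swap-based annealing; returns the best placement seen and its cost."""
    rng = random.Random(seed)
    pos = dict(pos)
    cells = sorted(pos)
    cost = total_wirelength(nets, pos)
    best_pos, best_cost = dict(pos), cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            a, b = rng.sample(cells, 2)          # steps 1-2 (simplified)
            pos[a], pos[b] = pos[b], pos[a]      # tentative swap
            delta = total_wirelength(nets, pos) - cost   # step 3
            # Step 4: always accept improvements, otherwise accept
            # with probability exp(-delta / T).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost += delta
                if cost < best_cost:
                    best_pos, best_cost = dict(pos), cost
            else:
                pos[a], pos[b] = pos[b], pos[a]  # undo the swap
        t *= cooling                             # step 5: reduce T
    return best_pos, best_cost


# Hypothetical example: nine cells scrambled over a 3x3 grid of slots.
slots = [(x, y) for y in range(3) for x in range(3)]
start = dict(zip("DACGEHBFI", slots))
nets = [("A", "B"), ("B", "C"), ("D", "E"), ("E", "F"),
        ("G", "H"), ("H", "I"), ("A", "D"), ("D", "G")]
final_pos, final_cost = anneal(nets, start)
```

Recomputing the full wirelength for every move keeps the sketch short; a practical placer would update only the nets touched by the two swapped cells.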
Cutsize-Driven Recursive Top-Down Partitioning
- Initially, there is only netlist connectivity; no spatial information is available.
- Apply a standard partitioning algorithm to the given netlist. Multilevel partitioning algorithms are the most effective.
- After two stages, each cell has been assigned to one of four possible subregions. As few nets as possible have been cut.
- After three stages, each cell has been assigned to one of eight possible subregions. As few nets as possible have been cut.
- Iterative improvement by repartitioning with terminal propagation is essential.
Cutsize-Driven Partitioning-Based Placement
- Cutsize = the number of nets not contained in just one side of the partition
- Rent's rule shows that wirelength and cutsize correlate to within about X2 log N [Wang et al, 2000].
- Fast FM-style iterations with terminal propagation
- Careful cutline selection and multiway partitions can help
- e.g. Capo, Feng-Shui, Dragon
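The cutsize definition translates directly into code. The sketch below assumes a netlist given as tuples of cell names and a dictionary mapping each cell to a partition block; both representations and the function name are illustrative choices, not taken from any of the placers named here.

```python
def cutsize(nets, side):
    """Count nets whose cells are not all on one side of the partition."""
    return sum(1 for net in nets if len({side[c] for c in net}) > 1)


# Hypothetical four-cell example with a two-way partition.
nets = [("a", "b"), ("b", "c"), ("a", "c", "d")]
side = {"a": 0, "b": 0, "c": 1, "d": 1}
cut = cutsize(nets, side)  # nets ("b","c") and ("a","c","d") are cut
```

Because the test is "more than one distinct block", the same function also works for multiway partitions.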
Quadratic Placement
Optimality Condition:
Example: [Figure: a small graph with numbered vertices and edges]
Q is the graph Laplacian matrix: Q = D - G, where D is the degree matrix and G is the graph adjacency matrix.
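As a small worked example of the Laplacian formulation, the sketch below builds Q = D - G for a weighted graph and solves for the movable coordinates in one dimension. It assumes the usual objective of summed squared edge lengths with some vertices (pads) held fixed, so that the movable block of Q gives a symmetric positive-definite system; the helper names and the dense Gaussian elimination are illustrative only, since a real placer would use a sparse iterative solver.

```python
def laplacian(n, edges):
    """Q = D - G for an undirected weighted graph with vertices 0..n-1."""
    q = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        q[i][i] += w
        q[j][j] += w
        q[i][j] -= w
        q[j][i] -= w
    return q


def solve(a, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def quadratic_place_1d(n, edges, fixed):
    """Minimize the sum of w * (x_i - x_j)^2 with the `fixed` vertices pinned."""
    q = laplacian(n, edges)
    movable = [i for i in range(n) if i not in fixed]
    a = [[q[i][j] for j in movable] for i in movable]
    b = [-sum(q[i][j] * fixed[j] for j in fixed) for i in movable]
    return dict(zip(movable, solve(a, b)))


# Chain pad(0) - 1 - 2 - 3 - pad(4), with pads pinned at x = 0 and x = 4.
chain = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)]
x = quadratic_place_1d(5, chain, {0: 0.0, 4: 4.0})
```

In this example the three movable cells spread evenly between the two pads; without any fixed vertex the system would be singular and every cell could collapse to the same point.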
Quadratic Placement with Iterative Legalization
- Unconstrained Optimality Condition:
  - Solve one large symmetric positive-definite linear system.
  - Pads prevent cells from collapsing to a single point.
- Example: Gordian-L.
  - Minimize cutsize, but use the given placement to form initial partitions (e.g., using x- or y-coordinate median for cutline)
  - New subregions generate new center-of-mass constraints for subsequent iterations
Example: Gordian-L-style Placement
Outline
- Introduction
- Gap Analysis of Existing Placement Algorithms
  - PEKO Benchmark Construction
  - Experiment Results
- Highlights from UCLA mPL5
Why Is Placement Still a Problem?
- True, it has been studied for over 30 years, but ...
- We need good solutions more than ever
  - One of the most important steps in the IC implementation flow
  - Directly defines interconnects
- Difficult
  - Problem size grows 2X every 18-24 months (Moore's Law)
  - Cannot place hierarchically without quality degradation
Optimality and Scalability Study: Motivation
- Lack of significant progress in wirelength reduction
  - Rate of reduction is about 5-10% every 2-3 years
  - Latest developments in placement differ mainly in runtime
- Where do we stand?
  - How much room for further improvement?
  - Will existing placement engines scale well to 10+M gate designs?
- Most work compares only with existing heuristics
  - Uses real-design-based benchmarks, e.g.
    - ISPD98 [C. Alpert 1998]
  - Uses synthetic benchmarks, e.g.
    - circ and gen [M. D. Hutton et al, 1998]
    - gnl [D. Stroobandt et al, 2000]
- Little understanding of the gap from the optimal
Our Contribution: Placement Example Construction with Known Optimal Wirelength
- Construct instances with known optimal wirelength, using the characteristics of the original problem
- Optimality and Scalability Study of Existing Placement Algorithms [C. Chang et al, 2003]
  - Studied the optimality and scalability of existing algorithms on the constructed instances
Placement Examples with Known Optimal Wirelength [Chang et al, 2003]
- All the modules are of equal size, and there is no space between rows and adjacent modules
- For 2-pin nets, connect any two adjacent modules
- For each n-pin net, connect the n modules in a rectangular region close to a square, i.e., the length of each side is close to sqrt(n)
- The wirelength of each n-pin net is given by $\lceil \sqrt{n} \rceil + \lceil n / \lceil \sqrt{n} \rceil \rceil - 2$
- Net degree distributions are extracted from real industrial benchmarks
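A quick way to check the per-net wirelength expression is to evaluate it for small n. The sketch below assumes the expression reads ceil(sqrt(n)) + ceil(n / ceil(sqrt(n))) - 2 and that lengths are measured in units of one cell pitch; the function name is hypothetical.

```python
import math


def optimal_net_wirelength(n):
    """Half-perimeter of the near-square block that holds an n-pin net.

    Assumes unit-size cells packed with no gaps, so a block of
    `side` x `other` cells has center-to-center half-perimeter
    (side - 1) + (other - 1).
    """
    if n < 2:
        raise ValueError("a net needs at least two pins")
    side = math.isqrt(n)
    if side * side < n:
        side += 1                     # side = ceil(sqrt(n))
    other = -(-n // side)             # ceil(n / side)
    return side + other - 2


lengths = {n: optimal_net_wirelength(n) for n in (2, 3, 4, 5, 9)}
```

For a 2-pin net this gives 1 (two adjacent cells), and for a 9-pin net it gives 4 (a 3x3 block), matching the construction described on the slide.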
PEKO Characteristics
PEKO Suite1 (12.5k to 210k cells)
ckt | #cell | #net | #row | Optimal WL
Peko01 | 12506 | 13865 | 113 | 8.14E+05
Peko02 | 19342 | 19325 | 140 | 1.26E+06
Peko03 | 22853 | 27118 | 152 | 1.50E+06
Peko04 | 27220 | 31683 | 166 | 1.75E+06
Peko05 | 28146 | 27777 | 169 | 1.91E+06
Peko06 | 32332 | 34660 | 181 | 2.06E+06
Peko07 | 45639 | 47830 | 215 | 2.88E+06
Peko08 | 51023 | 50227 | 227 | 3.14E+06
Peko09 | 53110 | 60617 | 231 | 3.64E+06
Peko10 | 68685 | 74452 | 263 | 4.73E+06
Peko11 | 70152 | 81048 | 266 | 4.71E+06
Peko12 | 70439 | 76603 | 266 | 5.00E+06
Peko13 | 83709 | 99176 | 290 | 5.87E+06
Peko14 | 147088 | 152255 | 385 | 9.01E+06
Peko15 | 161187 | 186225 | 402 | 1.15E+07
Peko16 | 182980 | 189544 | 429 | 1.25E+07
Peko17 | 184752 | 188838 | 431 | 1.34E+07
Peko18 | 210341 | 201648 | 460 | 1.32E+07
PEKO Suite2 (125k to 2.1M cells)
ckt | #cell | #net | #row | Optimal WL
Peko01x10 | 125060 | 138650 | 335 | 8.14E+06
Peko02x10 | 193420 | 193250 | 441 | 1.26E+07
Peko03x10 | 228530 | 271180 | 479 | 1.50E+07
Peko04x10 | 272200 | 316830 | 523 | 1.75E+07
Peko05x10 | 281460 | 277770 | 532 | 1.91E+07
Peko06x10 | 323320 | 346600 | 570 | 2.06E+07
Peko07x10 | 456390 | 478300 | 677 | 2.88E+07
Peko08x10 | 510230 | 502270 | 715 | 3.14E+07
Peko09x10 | 531100 | 606170 | 730 | 3.64E+07
Peko10x10 | 686850 | 744520 | 830 | 4.73E+07
Peko11x10 | 701520 | 810480 | 839 | 4.71E+07
Peko12x10 | 704390 | 766030 | 840 | 5.00E+07
Peko13x10 | 837090 | 991760 | 916 | 5.87E+07
Peko14x10 | 1470880 | 1522550 | 1214 | 9.01E+07
Peko15x10 | 1611870 | 1862250 | 1271 | 1.15E+08
Peko16x10 | 1829800 | 1895440 | 1354 | 1.25E+08
Peko17x10 | 1847520 | 1888380 | 1360 | 1.34E+08
Peko18x10 | 2103410 | 2016480 | 1451 | 1.32E+08
Studied Four State-of-the-Art Placers
- Capo [A. Caldwell et al, 2000]
  - Based on multilevel partitioner
  - Aims to enhance the routability
- Dragon [M. Wang et al, 2000]
  - Uses hMetis for initial partition
  - SA with bin-based swapping
- mPL [T. Chan et al, 2000]
  - Multilevel placer using NLP on the coarsest level
  - Goto-based relaxation
- QPlace [Cadence Inc.]
  - Leading-edge industrial placer
  - Component of Silicon Ensemble
Experiment Results on PEKO, July 2004
[Figure: multiple of optimal wirelength versus #cells for Dragon 2.20, Capo 8.6, mPL4, and QPlace 5.1]
[Figure: runtime (s) versus #cells for Dragon 2.20, Capo 8.6, mPL4, and QPlace 5.1]
- Existing algorithms are 30-153% away from the optimal on PEKO
- There is significant room for improvement in placement algorithms!
- ROI can be huge: a 30% wirelength reduction is equivalent to
  - Move from aluminum to copper, or
  - One process generation shrink
Experiment with State-of-the-Art Placers Using PEKO Suite1 & Suite2 (July 2004)
[Figure: runtime (s) versus #cells (log scale, 10,000 to 10,000,000) for Dragon 2.20, Capo 8.6, mPL 4, and QPlace 5.1]
[Figure: multiple of optimal wirelength versus #cells (log scale, 10,000 to 10,000,000) for Dragon 2.20, Capo 8.6, mPL 4, and QPlace 5.1]
- Capo, QPlace and mPL scale well in runtime
- Average solution quality of each tool deteriorates by an additional 4% to 25% when the problem size increases by a factor of 10
- QoR of the existing placement algorithms can be 40%-160% away from the optimal for large designs
Limitations of the PEKO Examples
- Optimal solution includes local nets only
  - Unlikely for real designs
- Measures wirelength only
  - Timing and routability are important objectives for placement algorithms as well
Impact of Global Connections in Real Examples
circuit | height | width | WL of longest net | WL contribution of longest 10%
ibm01 | 8158 | 4530 | 7148 | 51%
ibm02 | 8158 | 6430 | 14224 | 46%
ibm03 | 8158 | 6740 | 10624 | 58%
ibm04 | 8158 | 9140 | 15171 | 53%
ibm05 | 8158 | 11055 | 19064 | 47%
ibm06 | 8158 | 8715 | 13966 | 61%
ibm07 | 8158 | 14605 | 14051 | 51%
ibm08 | 8158 | 15895 | 16142 | 60%
ibm09 | 8158 | 16395 | 13780 | 55%
ibm10 | 8158 | 27890 | 30755 | 53%
ibm11 | 16350 | 10925 | 19234 | 59%
ibm12 | 16350 | 15545 | 26748 | 52%
ibm13 | 16350 | 12230 | 19539 | 59%
ibm14 | 16350 | 25475 | 26370 | 61%
ibm15 | 16350 | 23785 | 27284 | 63%
ibm16 | 16350 | 34015 | 42860 | 59%
ibm17 | 16283 | 38895 | 45686 | 56%
ibm18 | 16350 | 37065 | 52846 | 64%
- Produced by Dragon on ISPD98
- The wirelength contribution from global connections can be significant!
- Need to consider the impact of global connections
Placement Examples with Known Upperbounds (PEKU)
- Generate nets with optimal wirelength as in PEKO
- Add random connections to emulate global nets
PEKU Suite
% non-local nets | circuit | #cell | #net | #row | Row utilization | LB | UB
0% (LB = UB) | Peku01 | 12506 | 14111 | 113 | 85% | 8.14E+05 | 8.14E+05
0% (LB = UB) | Peku05 | 28146 | 28446 | 169 | 85% | 1.91E+06 | 1.91E+06
0% (LB = UB) | Peku10 | 68685 | 75196 | 263 | 85% | 4.73E+06 | 4.73E+06
0% (LB = UB) | Peku15 | 161187 | 186608 | 402 | 85% | 1.15E+07 | 1.15E+07
0% (LB = UB) | Peku18 | 210341 | 201920 | 460 | 85% | 1.32E+07 | 1.32E+07
0.25% | Peku01 | 12506 | 14111 | 113 | 85% | 8.14E+05 | 9.23E+05
0.25% | Peku05 | 28146 | 28446 | 169 | 85% | 1.91E+06 | 2.24E+06
0.25% | Peku10 | 68685 | 75196 | 263 | 85% | 4.73E+06 | 6.17E+06
0.25% | Peku15 | 161187 | 186608 | 402 | 85% | 1.15E+07 | 1.71E+07
0.25% | Peku18 | 210341 | 201920 | 460 | 85% | 1.32E+07 | 2.01E+07
0.50% | Peku01 | 12506 | 14111 | 113 | 85% | 8.14E+05 | 1.02E+06
0.50% | Peku05 | 28146 | 28446 | 169 | 85% | 1.91E+06 | 2.63E+06
0.50% | Peku10 | 68685 | 75196 | 263 | 85% | 4.73E+06 | 7.52E+06
0.50% | Peku15 | 161187 | 186608 | 402 | 85% | 1.15E+07 | 2.30E+07
0.50% | Peku18 | 210341 | 201920 | 460 | 85% | 1.32E+07 | 2.75E+07
... (up to 10% non-local nets)
URL: http://cadlab.cs.ucla.edu/~pubbench/peku.htm
Experiment Results on PEKU, July 2004
[Figure: quality ratio versus % of non-local nets (0.00% to 10.00%) for Capo 8.6, Dragon 2.20, mPG 1.0, and mPL 4]
- The absolute value of the QRs may not be meaningful, but it helps to identify the technique that works best under each scenario
- No existing placer can consistently produce the best quality
In Preparation: PEKO-MS (Mixed-Size PEKO)
Center-to-center HPWL = 1029536. Pin-to-pin HPWL = 264944.
As of March 2005, the best result of mPL5 on this benchmark is still over 6X greater than optimal (in pin-to-pin half-perimeter wirelength)!
Observations from Gap Analysis
- Significant opportunity in placement
  - Existing algorithms may produce solutions far away from the optimal
  - The solution quality of the same placer varies for circuits of similar size but different characteristics
  - Scalability problem in runtime and solution quality
- Significant ROI
  - Benefit equal to one to two generations of process scaling
  - But without requiring multi-billion dollar investment (we hope!)
Outline
- Introduction
- Gap Analysis of Existing Placement Algorithms
- Highlights from UCLA mPL5
  - Multiscale Optimization Framework
  - Generic Force-Directed Formulation
  - Multiscale Nonlinear-Programming Algorithm
Multilevel Optimization Framework
[Figure: starting from the given problem, problem size decreases through coarsening (clustering); interpolation and relaxation (optimization) then return to the original problem]
- Multilevel coarsening generates smaller problem sizes at coarser levels, which gives faster optimization at coarser levels
- May explore different aspects of the solution space at different levels
- Gradual refinement on good solutions from coarser levels is very efficient
- Successful in many applications
  - Originally developed for PDEs
  - Recent success in VLSI CAD: partitioning, placement, routing
Multilevel Placement
- Coarsening: build a hierarchy of problem approximations by generalized clustering
- Relaxation: improve the placement at each level by iterative optimization
- Interpolation: transfer coarse-level solution to adjacent, finer level (generalized declustering)
- Multilevel Flow: multiple traversals over multiple hierarchies (V-cycle variations)
Multilevel Methods: Coarsening by Recursive Aggregation
- Recursive aggregation defines the hierarchy.
- Different aggregation algorithms can be used on different levels and/or in different V-cycles.
- Example: First-Choice Clustering (hMetis [Karypis 1999]).
[Figure: merge each vertex with its "best" neighbor; merged nets]
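One pass of "merge each vertex with its best neighbor" can be sketched as follows. The affinity measure used here, the sum over shared nets of 1 / (net size - 1), is an assumption chosen for illustration; the slide does not say how "best" is scored, and neither hMetis nor mPL is claimed to use exactly this rule. Cluster-size limits, which practical clusterers impose, are also omitted.

```python
from collections import defaultdict


def first_choice_clustering(num_cells, nets):
    """Return one cluster label per cell after a single first-choice pass."""
    # Assumed affinity: each net adds 1 / (|net| - 1) to every pair it joins.
    affinity = defaultdict(float)
    for net in nets:
        if len(net) < 2:
            continue
        w = 1.0 / (len(net) - 1)
        for i in net:
            for j in net:
                if i != j:
                    affinity[(i, j)] += w
    neighbors = defaultdict(list)
    for i, j in affinity:
        neighbors[i].append(j)

    parent = list(range(num_cells))      # union-find forest of clusters

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    matched = [False] * num_cells
    for v in range(num_cells):
        if matched[v] or not neighbors[v]:
            continue
        # The best neighbor may already belong to a cluster; v then joins it.
        best = max(sorted(neighbors[v]), key=lambda u: affinity[(v, u)])
        parent[find(v)] = find(best)
        matched[v] = matched[best] = True
    return [find(v) for v in range(num_cells)]


# Hypothetical netlist: cells 0-1 and 2-3 are strongly connected pairs.
labels = first_choice_clustering(4, [(0, 1), (0, 1), (2, 3), (2, 3), (1, 2)])
```

Applying the same pass repeatedly to the clustered netlist yields the hierarchy of coarser problems that the multilevel flow traverses.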
Multilevel Methods: Relaxation (Intralevel Optimization)
- Iterative improvement at each level by local or global fast computation
- Additional global improvement comes from the multilevel hierarchy.
- Example: Goto-based discrete exchange
  - Calculate A's optimal-wirelength location, holding other cells fixed.
  - Compute a chain A, B, C, D, E, where B is a randomly selected neighbor of A's optimal location, etc.
  - Examine all permutations of the chain and take the best one.
Multilevel Methods: Interpolation (Generalized Declustering)
[Figure: place representative components, then place the others by weighted interpolation]
- Transfer a partial solution from a coarser level to its adjacent finer level
- Example: place a component at the weighted average of the positions of the clusters containing its neighbors
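The weighted-average rule in the example can be written out directly. The sketch below assumes each component carries a list of (neighbor, weight) pairs and that every neighbor's cluster already has a position at the coarser level; the data layout, the weights, and the function name are hypothetical, and mPL's actual interpolation operator is not reproduced here.

```python
def interpolate_position(neighbors, cluster_of, cluster_pos):
    """Weighted average of the positions of the clusters holding the neighbors.

    neighbors   : list of (neighbor_name, weight) pairs, weights > 0
    cluster_of  : maps a neighbor name to its cluster id
    cluster_pos : maps a cluster id to its (x, y) position
    """
    total = sum(w for _, w in neighbors)
    x = sum(w * cluster_pos[cluster_of[n]][0] for n, w in neighbors) / total
    y = sum(w * cluster_pos[cluster_of[n]][1] for n, w in neighbors) / total
    return (x, y)


# Hypothetical example: neighbor "b" is connected three times as strongly as "a".
cluster_of = {"a": 0, "b": 1}
cluster_pos = {0: (0.0, 0.0), 1: (4.0, 8.0)}
placed = interpolate_position([("a", 1.0), ("b", 3.0)], cluster_of, cluster_pos)
```

The component lands three quarters of the way toward the cluster of its more strongly connected neighbor, which is the starting point that relaxation at the finer level then improves.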
Iterated Multilevel Flow
[Figure: the first V-cycle uses First Choice (FC) clustering; later cycles make use of the placement solution from the first V-cycle through geometric-based FC clustering]
[Figure: flow variants: iterated V-cycles, F-cycle, backtracking V-cycle]
A Brief History of mPL
[Figure: relative wirelength of successive mPL versions by year, 2000 to 2004, covering the uniform-cell-size and non-uniform-cell-size versions]
- mPL 1.0 [ICCAD00]
  - Recursive ESC clustering
  - NLP at coarsest level
  - Goto discrete relaxation
  - Slot Assignment legalization
  - Domino detailed placement
- mPL 1.1
  - FC-Clustering
  - added partitioning to legalization
- mPL 2.0
  - RDFL relaxation
  - primal-dual netlist pruning
- mPL 3.0 [ICCAD 03]
  - QRS relaxation
  - AMG interpolation
  - multiple V-cycles
  - cell-area fragmentation
- mPL 4.0
  - improved DP
  - better coarsening
  - backtracking V-cycle
- mPL 5.0
  - Multilevel Force-Directed
Kraftwerk Framework for Force-Directed Placement [Eisenmann and Johannes 98]
- Minimize quadratic wirelength
- Incorporate density-gradient forces (f_k) acting on cells into the optimality condition:
- Assume forces are zero at infinity.
- Iteratively update v_k and f_k.
- Key limitation: extensive tuning required for proper force scaling.
- Cell density is a continuous but NON-SMOOTH function of position
Generalized Force Directed Method in mPL5
- Our generalized force directed method
  - Minimize log-sum-exp wirelength W(x) [Naylor 01; Kahng and Wang 04] subject to even bin density constraints
  - $\min W(x) \ \text{s.t.}\ d(x) = c$, where $c$ = (total cell area) / (chip area)
mPL5 Generalized Force-Directed Placement
- Basic formulation: $\min W(x) \ \text{s.t.}\ d(x) = c$, where $c$ = (total cell area) / (chip area)
- Smooth the density constraints by Laplace transformation and solving a Poisson equation:
- Assume Neumann boundary conditions: forces pointing outside the chip boundary are zero.
- Can solve a discretized version efficiently using fast discrete cosine transformation
Wirelength Estimation
[Figure: five wiring models for the same net]
- (a) Steiner tree: rectilinear length = 14
- (b) Steiner tree with trunk: rectilinear length = 15
- (c) Minimum spanning tree: rectilinear length = 16
- (d) Chain: rectilinear length = 17
- (e) Complete graph: rectilinear length = 42
Approximation: half perimeter of the bounding box
Objective Function Used in mPL5
- Log-sum-exp smooth approximation to half-perimeter wirelength [Naylor 2001; Kahng and Wang 2004]:
- Other approximations are also possible (using p-norm)
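To make the log-sum-exp idea concrete, the sketch below uses the commonly cited form, in which the x-extent of a net is approximated by γ·ln Σ exp(x_i/γ) + γ·ln Σ exp(-x_i/γ) and the y-extent likewise. The smoothing parameter name `gamma`, the function names, and the three-pin example are assumptions for illustration; the exact scaling used inside mPL5 is not shown on the slide.

```python
import math


def smooth_max(values, gamma):
    """gamma * ln(sum(exp(v / gamma))), shifted by the max for stability."""
    m = max(values)
    return m + gamma * math.log(sum(math.exp((v - m) / gamma) for v in values))


def lse_wirelength(net, pos, gamma):
    """Smooth, differentiable over-estimate of a net's half-perimeter wirelength."""
    xs = [pos[c][0] for c in net]
    ys = [pos[c][1] for c in net]
    return (smooth_max(xs, gamma) + smooth_max([-x for x in xs], gamma)
            + smooth_max(ys, gamma) + smooth_max([-y for y in ys], gamma))


def hpwl(net, pos):
    xs = [pos[c][0] for c in net]
    ys = [pos[c][1] for c in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


# Hypothetical three-pin net.
pos = {"a": (0.0, 0.0), "b": (3.0, 1.0), "c": (1.0, 4.0)}
net = ("a", "b", "c")
exact = hpwl(net, pos)
coarse = lse_wirelength(net, pos, gamma=1.0)
tight = lse_wirelength(net, pos, gamma=0.01)
```

The smooth value never falls below the true half-perimeter and exceeds it by at most 4·γ·ln(n) for an n-pin net, so shrinking γ tightens the approximation at the price of a less smooth function.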
mPL5 Nonlinear-Programming Solution
- Using the Uzawa algorithm to solve the above nonlinear constrained minimization problem, we iteratively solve
- No matrix storage and no second derivatives are computed.
- Use a multilevel approach to speed up computation and improve quality
mPL5 Framework
[Figure: multilevel hierarchy (Level 1, Level 2, Level 3) marking the levels at which GFD is applied; C = coarsening, I = interpolation]
- Keep coarsening until the number of cells is less than 500
mPL5 vs. Other State-of-the-Art Placers on FastPlace IBM Standard Cell Placement Benchmarks (March 2005)
[Figure: scaled wirelength versus scaled runtime for Capo 9.0, Dragon 3.01, FastPlace 1.0, Fengshui 5.0, mPL5, and mPL5-fast]
Scalability Plot of mPL5-fast vs. FastPlace1.0 on FastPlace IBM Benchmarks
[Figure: runtime versus #cells (up to 200,000) for FastPlace1.0 and mPL5-fast, with fitted curves]
- mPL5-fast: y = 0.0001 x^1.2409
- FastPlace1.0: y = 5E-06 x^1.4995
- mPL5-fast is slightly more scalable than FastPlace1.0
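The two fitted curves can be compared numerically. The sketch below simply evaluates the power-law fits quoted on the slide and solves for the size at which they cross; it says nothing about measured runtimes, the function names are hypothetical, and anything beyond the plotted range is an extrapolation of the trend lines.

```python
def fitted_runtime(coeff, exponent, num_cells):
    """Evaluate a power-law fit y = coeff * x ** exponent."""
    return coeff * num_cells ** exponent


def crossover(c1, e1, c2, e2):
    """Size x at which c1 * x**e1 equals c2 * x**e2 (requires e1 != e2)."""
    return (c1 / c2) ** (1.0 / (e2 - e1))


MPL5_FAST = (0.0001, 1.2409)    # fit quoted for mPL5-fast
FASTPLACE = (5e-06, 1.4995)     # fit quoted for FastPlace1.0

x_cross = crossover(*MPL5_FAST, *FASTPLACE)
mpl_at_200k = fitted_runtime(*MPL5_FAST, 200_000)
fp_at_200k = fitted_runtime(*FASTPLACE, 200_000)
```

Under these fits the FastPlace1.0 curve is lower for small designs and the mPL5-fast curve is lower for large ones, with the crossing a little above 100,000 cells; the smaller exponent is what the slide means by "slightly more scalable".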
Placement Plot of Placers on IBM02
- mPL5: Rel. WL = 1.00
- Fengshui 5.0: Rel. WL = 1.11
- Capo 9.0: Rel. WL = 1.17
Placement Plot of Placers on IBM10
- mPL5: Rel. WL = 1.00
- Fengshui 5.0: Rel. WL = 1.15
- Capo 9.0: Rel. WL = 1.28
mPL5 vs. Other State-of-the-Art Placers on PEKO-pad
[Figure: quality ratio versus #cells (12506 to 182980) for Capo9.0, Dragon3.01, Fengshui5.0, FastPlace1.0, mPL5, and mPL5-fast]
mPL movie
Concluding Remarks
- There is still significant opportunity to improve placement technologies.
- mPL5 achieves improvement by incorporating density-constrained nonlinear programming into a multilevel framework.
  - Multiscale Optimization Framework
  - Generic Force-Directed Formulation
  - Multiscale Nonlinear-Programming Algorithm
- ISPD'2005 placement contest
ISPD 2005 Placement Contest
- 9 teams worldwide competing
  - SUNY Binghamton (FengShui), T. Munich (FD), UCSD (Aplace), UCLA (Dragon, mPL), U Michigan (Capo), ...
- 8 new large-scale real industrial benchmarks released on March 20 at 5pm EST
- Results are due March 25 at 5pm EST
- Web site
ISPD 2005 Circuit Benchmark Characteristics
Circuit | #Objects | #Fixed objects | #Nets | #Pins | #Pins on fixed objects | Design density | Design utilization
adaptec1 | 211447 | 543 | 221142 | 944053 | 20540 | 75.71% | 57.34%
adaptec2 | 255023 | 566 | 266009 | 1069482 | 23783 | 78.56% | 44.32%
adaptec3 | 451650 | 723 | 466758 | 1875039 | 31187 | 74.53% | 33.66%
adaptec4 | 496045 | 1329 | 515951 | 1912420 | 35857 | 62.67% | 27.23%
bigblue1 | 278164 | 560 | 284479 | 1144691 | 12835 | 54.19% | 44.67%
bigblue2 | 557866 | 23084 | 577235 | 2122282 | 142685 | 61.80% | 37.94%
bigblue3 | 1096812 | 1293 | 1123170 | 3833218 | 43111 | 85.65% | 56.68%
bigblue4 | 2177353 | 8170 | 2229886 | 8900078 | 189411 | 65.30% | 44.35%
- Design Density = (Total Object Area) / (Chip Area)
- Design Utilization = (Total Movable Object Area) / (Unused Chip Area)
- Adaptec1 and 2 and BigBlue1 have perimeter I/O pads. All others employ fixed area-array I/O objects.
mPL5 Solution to Big Blue 1
mPL5 Solution to Big Blue 2
mPL5 Solution to Big Blue 3
mPL5 Solution to Big Blue 4
Acknowledgements
- We would like to acknowledge support from
  - Semiconductor Research Corporation (SRC)
  - National Science Foundation (NSF)
  - Industrial sponsors under the California MICRO programs (Altera, Intel, Magma, Xilinx)
- Hard work of a number of former and current graduate