BonnPlace: A Self-Stabilizing Placement Framework
Ulrich Brenner, Anna Hermann, Nils Hoppmann, Philipp Ochsendorf
Research Institute for Discrete Mathematics, University of Bonn
ISPD 2015
Placement Problem
Constraints:
◮ Overlap-free
◮ Respect placement area A
◮ Respect movebounds
◮ Placement in rows
Objectives:
◮ Net length minimization
◮ Low power consumption
◮ Low manufacturing costs
◮ Timing optimization
◮ Routability
A placement minimizing quadratic netlength on Franziska (633 666 cells, 22 nm).
◮ Partition chip area recursively into regions.
◮ Assign cells to regions they fit into.
◮ Advantages:
  ◮ Very effective and efficient.
  ◮ Many different constraints can be considered accurately (e.g. bounds on density, blockages, movebounds etc.).
◮ Drawbacks:
  ◮ Lack of stability.
  ◮ Hard to reflect standard objective functions during partitioning (e.g. wirelength).
Assignments to regions in levels 1 to 3 (upper row) and 4 to 6 (lower row) on Franziska.
◮ Pull cells apart from each other in small steps.
◮ Integrate forces into objective function.
◮ Advantages:
  ◮ Very stable.
  ◮ Overall objective function is always considered.
  ◮ Produces very good results in practice.
◮ Drawbacks:
  ◮ Exact observance of density constraints is difficult.
  ◮ Placement decisions in a fragmented chip area can be arbitrary.
  ◮ Complex objective functions are hard to model (e.g. congestion and timing).
  ◮ Significant effort in the legalization may be necessary.
◮ Grid Warping [Xiu, Rutenbar ’07]:
  ◮ Minimize density violations in non-uniform grid and scale to regular bins.
◮ Partitioning-Based BonnPlace [Struzyna ’13]:
  ◮ Compute cell assignments using flow-based partitioning.
◮ NTUPlace4 [Hsu, Chou, Lin, Chang ’11]:
  ◮ Penalize violations of locally smoothed density functions.
◮ SimPL [Kim, Lee, Markov ’12] (incl. SimPLR, Ripple, ComPLx, Maple):
  ◮ Run rough but fast legalization.
  ◮ Pull cells towards legalized positions.
  ◮ Iterate with new analytical placement.
◮ ePlace [Lu, Chen, Chang, Sha, Huang, Teng, Cheng ’14]:
  ◮ Translate density violations to potential energy of an electrostatic system.
◮ Compute forces based on legal partitioning-based placement.
◮ Each iteration produces a competitive placement.
◮ The placements in subsequent iterations are similar.
◮ A single iteration is quite time-consuming.
◮ Incorporate position-based (un-)clustering scheme.
Input: cells C
Output: positions pos(c) for all cells c ∈ C

iter ← 0
while not BreakCondition(pos, iter) do
    foreach c ∈ C do
        Connect c to a new pin at position pos(c) via a virtual net of weight 0.01 · iter
    Partitioning-based GlobalPlacement with position-based (un-)clustering
    Legalization
    foreach c ∈ C do
        Store current location in pos(c)
    iter ← iter + 1
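The loop above can be illustrated on a one-dimensional toy instance. The following is a minimal sketch under strong assumptions, not the BonnPlace implementation: a small Gauss-Seidel quadratic placer stands in for the partitioning-based GlobalPlacement, a sort-into-slots routine stands in for Legalization, and a fixed iteration count stands in for BreakCondition. The names `quadratic_place`, `legalize` and `self_stabilizing_place` are hypothetical. Only the anchor weight 0.01 · iter is taken from the pseudocode.

```python
def quadratic_place(n_cells, nets, pins, sweeps=500):
    """Toy 1D quadratic placement solved by Gauss-Seidel sweeps.

    nets: (i, j, w) connections between movable cells i and j
    pins: (i, x, w) connections of cell i to a fixed position x
    """
    x = [0.0] * n_cells
    for _ in range(sweeps):
        for i in range(n_cells):
            num = den = 0.0
            for a, b, w in nets:
                if a == i:
                    num, den = num + w * x[b], den + w
                elif b == i:
                    num, den = num + w * x[a], den + w
            for c, p, w in pins:
                if c == i:
                    num, den = num + w * p, den + w
            if den > 0.0:
                # The weighted mean of all connected positions minimizes
                # the quadratic cost of cell i with the others held fixed.
                x[i] = num / den
    return x


def legalize(x):
    """Toy legalization: unit-width cells in a single row, slots 0, 1, 2, ..."""
    order = sorted(range(len(x)), key=lambda c: (x[c], c))
    pos = [0.0] * len(x)
    for slot, c in enumerate(order):
        pos[c] = float(slot)
    return pos


def self_stabilizing_place(n_cells, nets, pads, max_iter=10):
    pos = [0.0] * n_cells  # has no effect in iteration 0: anchor weight is 0
    history = []
    for it in range(max_iter):  # assumption: fixed count replaces BreakCondition
        # One virtual net per cell, pulling it to its last legal position.
        anchors = [(c, pos[c], 0.01 * it) for c in range(n_cells)]
        x = quadratic_place(n_cells, nets, pads + anchors)
        pos = legalize(x)
        history.append(pos)
    return pos, history
```

Because the anchor weight grows linearly with the iteration counter, later iterations are pulled more strongly towards the previous legal placement, which is what makes consecutive placements similar.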
[Panels: Iteration 0: Quadratic Placement | Iteration 0: Legal Placement | Iteration 0: Forces | Iteration 1: Quadratic Placement]
The impact of forces on the first iteration on Beate (41 287 cells, 22 nm).
[Panels: Iteration 0 | Iteration 1 | Iteration 2 | Iteration 9]
Cell spreading after Global QP on Beate.
[Panels: Iteration 0 | Iteration 1 | Iteration 2 | Iteration 9]
Cell spreading after Global QP on Renaud (324 595 cells, 45 nm).
Bounding box net length development of QP (red) and legal placement (blue) on Meinolf (392 920 cells, 22 nm).
◮ Perform partitioning-based GlobalPlacement on clustered netlist.
◮ Compute a new clustering in each iteration of the loop.
◮ Dissolve clusters during the levels of GlobalPlacement.
◮ Methods for both clustering and unclustering are position-based.
Levels of GlobalPlacement in a single iteration with coarse clustering on Beate.
◮ Iteratively unite neighboring clusters u and v maximizing a certain
◮ Stop clustering when a given target ratio α < 1 of cells is reached.
dc(u,v)
[Figure labels: u1, v1, w1, u2, v2, w2]
a(u)      total area of cluster u
w_N       weight of net N
|N|       number of cells in net N
BB(u, v)  half perimeter of the bounding box of all cells in u or v
s, h      chip-dependent constants
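The greedy merging scheme can be sketched as follows. The exact merge criterion is not given in the text above, so `example_score` is a purely hypothetical stand-in built from some of the listed ingredients (net weights w_N, net sizes |N|, areas a(u), bounding box BB(u, v)); it does not use the constants s and h and should not be read as the BonnPlace formula. Two further assumptions: "neighboring" is interpreted as "sharing at least one net", and the target ratio α is interpreted as the number of clusters divided by the number of cells.

```python
import math


def half_perimeter(points):
    """Half perimeter of the bounding box of a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def example_score(conn, area_u, area_v, bb):
    """Hypothetical score: strong connectivity, small area, small extent."""
    return conn / ((area_u + area_v) * (1.0 + bb))


def cluster_cells(positions, areas, nets, alpha, score=example_score):
    """Greedily unite connected clusters until at most alpha * n remain.

    positions: list of (x, y) per cell; areas: list of cell areas
    nets: list of (cells, weight) with cells a tuple of cell indices
    """
    n = len(positions)
    clusters = {c: [c] for c in range(n)}
    owner = list(range(n))  # owner[c] = id of the cluster containing cell c
    target = max(1, math.ceil(alpha * n))
    while len(clusters) > target:
        # Accumulate connectivity between cluster pairs sharing a net,
        # weighting each net by w_N / (|N| - 1) (an assumed clique weighting).
        conn = {}
        for cells, w in nets:
            ids = sorted({owner[c] for c in cells})
            for k, u in enumerate(ids):
                for v in ids[k + 1:]:
                    conn[(u, v)] = conn.get((u, v), 0.0) + w / (len(cells) - 1)
        if not conn:
            break  # no connected pair left to unite

        def value(pair):
            u, v = pair
            members = clusters[u] + clusters[v]
            return score(conn[pair],
                         sum(areas[c] for c in clusters[u]),
                         sum(areas[c] for c in clusters[v]),
                         half_perimeter([positions[c] for c in members]))

        u, v = max(conn, key=value)
        clusters[u].extend(clusters.pop(v))
        for c in clusters[u]:
            owner[c] = u
    return sorted(sorted(members) for members in clusters.values())
```

Because the bounding box enters the score, two cells that are connected but placed far apart are united late or not at all, which is the sense in which such a clustering is position-based.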
◮ Dissolve a cluster if its size is large w. r. t. window size in the current
◮ Leads to sharp unclustering in certain levels.
◮ Dissolve clusters with members tending into distinct areas.
◮ Keep cells clustered if their respective optimum positions are close
◮ Still dissolve clusters being very large w. r. t. window size.
Input: cells C
Output: positions pos(c) for all cells c ∈ C

iter ← 0
while not BreakCondition(pos, iter) do
    foreach c ∈ C do
        Connect c to a new pin at position pos(c) via a virtual net of weight 0.01 · iter
    Partitioning-based GlobalPlacement with position-based (un-)clustering
    Legalization
    if timing optimization enabled then
        TimingOptimization
    if routability driven placement enabled then
        CongestionAvoidance
    foreach c ∈ C do
        Store current location in pos(c)
    iter ← iter + 1
◮ Apply timing optimization steps at the end of each iteration:
  ◮ LayerAssignment: Assign critical nets to higher routing layers.
  ◮ RefinePlace: Locally move cells to straighten timing-critical paths.
◮ Increase net weights on remaining critical paths.
◮ Increase force weights of cells moved by RefinePlace.
A legalized placement on Ida (20 617 cells, 22 nm) after selected iterations with cells colored by their slack (in ps).
◮ Run a simplified version of BonnRouteGlobal as a congestion
◮ Inflate cells in routing-critical areas (for next iteration).
◮ Force router to be quite pessimistic, and forbid larger detours of nets.
◮ Dynamic adaptation of the target congestion:
  ◮ On very congestion-critical chips, the goal is to reduce the congestion to 100 % everywhere.
  ◮ On less critical chips, the target congestion is stepwise reduced to 90 %.
  ◮ Less congestion on uncritical chips helps to reduce routing detours.
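The interplay of target adaptation and cell inflation can be sketched as below. Only the two target values (100 % and 90 %) come from the slide; the step size, the inflation gain, the per-bin congestion model and the function names `next_target_congestion` and `inflate_cells` are assumptions made for illustration and are not BonnPlace parameters.

```python
def next_target_congestion(target, very_critical, step=0.02, floor=0.90):
    """Return the target congestion for the next iteration.

    very_critical: assumed flag for a very congestion-critical chip.
    step: assumed reduction per iteration on less critical chips.
    """
    if very_critical:
        return 1.00  # aim for at most 100 % congestion everywhere
    return max(floor, target - step)  # stepwise reduction down to 90 %


def inflate_cells(areas, cell_bin, congestion, target, gain=0.5):
    """Inflate cells that lie in bins whose congestion exceeds the target.

    areas: current (possibly already inflated) cell areas
    cell_bin: index of the routing bin containing each cell
    congestion: estimated congestion per bin (1.0 means 100 %)
    gain: assumed proportionality factor for the inflation
    """
    inflated = []
    for area, b in zip(areas, cell_bin):
        excess = max(0.0, congestion[b] - target)
        inflated.append(area * (1.0 + gain * excess))
    return inflated
```

The inflated areas would then be used by the next GlobalPlacement run, so that cells in routing-critical regions claim more space and are spread further apart.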
Congestion estimation on Renaud in iterations 0 to 2 (upper row) and 3 to 5 (lower row).
Internal pessimistic congestion estimation on superblue9 in iterations 0 and 1 (left, upper row) and iterations 2 and 3 (left, lower row); accurate estimation after placement (right).
Linear movement [m] between iterations on the 22 nm designs Ida (20 617 cells), Leo (31 590 cells), Antonio (103 795 cells) and Benedikt (370 210 cells). All runs with timing optimization. Note the logarithmic scaling.
              WL [m] after iteration                                        Final WL
Cl.   Uncl.   0     1     2     3     4     5     6     7     8     9       [m]    [%]
none  none    7.23  7.07  7.02  6.98  6.96  6.93  6.92  6.91  6.89  6.88    6.87   100.00
conn  pos     7.26  7.10  7.05  7.02  7.01  6.99  6.98  6.98  6.97  6.96    6.93   100.89
pos   size    7.48  7.15  7.05  7.02  6.98  6.95  6.94  6.93  6.92  6.91    6.88   100.16
pos   pos     7.26  6.98  6.91  6.87  6.85  6.82  6.80  6.78  6.78  6.78    6.76    98.37
Development of linear half-perimeter wirelength on Meinolf.
Cl.: clustering, Uncl.: unclustering; conn: connectivity-based, size: size-based, pos: position-based
◮ Netlength decreases by 1.63 % due to position-based clustering.
◮ Non-position-based clustering or unclustering increases netlength.
◮ Until level 5 (of 10) of GlobalPlacement, the number of cells is
◮ Runtime of GlobalPlacement decreases by 20 % due to clustering.
Experimental results on industrial 22 nm-designs: WS, SNES and WL per iteration (It.) on Ida (20 617 circuits), Leo (31 590 circuits), Antonio (103 795 circuits) and Benedikt (370 210 circuits).
WS: worst slack (in picoseconds)
SNES: overall sum of negative endpoint slacks (in 10^3 picoseconds)
WL: half-perimeter wirelength (in meters)
              Self-Stab. BonnPlace      NTUPlace4               [Cong et al. ’13]
Chip          sWL    RC [%]   IF [%]    sWL            RC [%]   sWL
superblue2    6.05   100.17   46.42     6.24 (+ 3.1)   100.68   6.14 (+ 1.5)
superblue3    3.22   100.00   58.03     3.62 (+12.3)   103.53   3.60 (+11.6)
superblue6    3.37   100.00   31.77     3.42 (+ 1.5)   101.21   3.40 (+ 0.8)
superblue7    4.07   100.00   19.23     3.99 (− 2.1)   100.68   3.95 (− 3.0)
superblue9    2.35   100.00   26.51     2.55 (+ 8.3)   102.48   2.50 (+ 6.3)
superblue11   3.44   100.02   15.97     3.42 (− 0.6)   100.02   3.40 (− 1.2)
superblue12   2.80   100.01   41.57     3.12 (+11.2)   100.02   3.04 (+ 8.4)
superblue14   2.26   100.00   15.53     2.26 (− 0.3)   100.07   2.45 (+ 8.3)
superblue16   2.65   100.00   31.85     2.80 (+ 5.8)   102.39   2.74 (+ 3.4)
superblue19   1.51   100.00   20.30     1.53 (+ 1.2)   100.61   1.51 (− 0.2)
Average              100.02   30.72          (+ 4.0)   101.17        (+ 3.6)
Experimental results on DAC’12 placement benchmarks.
sWL: Scaled wirelength ×10^8 (the objective function of the contest)
RC: Routing congestion in % (penalty for congestion)
IF: Inflation of the cell sizes relative to the initial cell areas in %