
SLIDE 1

BonnPlace: A Self-Stabilizing Placement Framework

Ulrich Brenner, Anna Hermann, Nils Hoppmann, Philipp Ochsendorf

Research Institute for Discrete Mathematics, University of Bonn

ISPD 2015


SLIDE 2

Placement Problem

Instance: Placement area A, blockages B, cells C.

Task: Compute a placement of cells C into A respecting given constraints and optimizing given objectives.

Constraints:

◮ Overlap-free
◮ Respect placement area A
◮ Respect movebounds
◮ Placement in rows

Objectives:

◮ Net length minimization
◮ Low power consumption
◮ Low manufacturing costs
◮ Timing optimization
◮ Routability

SLIDE 3

Analytical Placement

Minimize analytical global objective function (ignoring most constraints):

A placement minimizing quadratic netlength on Franziska (633 666 cells, 22 nm).

Task: Work towards an overlap-free placement. Two ideas: Partitioning-based and force-directed placement.
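As a rough illustration of what minimizing quadratic netlength means, the following Python sketch places movable cells in one dimension, with nets reduced to weighted two-pin connections. The function name `quadratic_placement_1d`, the edge-list input, and the Gauss-Seidel solver are assumptions made for this example; they are not BonnPlace's actual net model or solver.

```python
from typing import Dict, List, Tuple


def quadratic_placement_1d(
    movable: List[str],
    fixed: Dict[str, float],
    edges: List[Tuple[str, str, float]],
    sweeps: int = 200,
) -> Dict[str, float]:
    """Minimize the sum of w * (x_a - x_b)^2 over all edges (a, b, w).

    Fixed pins keep their coordinate. Overlaps between cells are ignored,
    as in unconstrained analytical placement.
    """
    # For each movable cell, collect its neighbours and the edge weights.
    neighbours: Dict[str, List[Tuple[str, float]]] = {c: [] for c in movable}
    for a, b, w in edges:
        if a in neighbours:
            neighbours[a].append((b, w))
        if b in neighbours:
            neighbours[b].append((a, w))

    x = dict(fixed)
    x.update({c: 0.0 for c in movable})  # arbitrary starting coordinates

    # Gauss-Seidel: at the optimum every movable cell sits at the weighted
    # average of its neighbours. Repeating that update converges when each
    # connected component contains at least one fixed pin.
    for _ in range(sweeps):
        for c in movable:
            total = sum(w for _, w in neighbours[c])
            if total > 0:
                x[c] = sum(w * x[n] for n, w in neighbours[c]) / total
    return {c: x[c] for c in movable}


# Chain: left pin (0.0) -- a -- b -- right pin (3.0), unit weights.
pos = quadratic_placement_1d(
    movable=["a", "b"],
    fixed={"left": 0.0, "right": 3.0},
    edges=[("left", "a", 1.0), ("a", "b", 1.0), ("b", "right", 1.0)],
)
```

In this toy chain the two cells end up evenly spaced between the fixed pins. Nothing in the objective prevents cells from overlapping, which is why a spreading step is needed afterwards.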


SLIDE 4

Partitioning-Based Placement

Idea:

◮ Partition chip area recursively into regions.
◮ Assign cells to regions they fit into.
◮ Advantages:
    ◮ Very effective and efficient.
    ◮ Many different constraints can be considered accurately (e.g. bounds on density, blockages, movebounds etc.).
◮ Drawbacks:
    ◮ Lack of stability.
    ◮ Hard to reflect standard objective functions during partitioning (e.g. wirelength).

SLIDE 5

Levels of Partitioning-Based BonnPlace

Assignments to regions in levels 1 to 3 (upper row) and 4 to 6 (lower row) on Franziska.


SLIDE 6

Force-Directed Placement

Idea:

◮ Pull cells apart from each other in small steps.
◮ Integrate forces into objective function.
◮ Advantages:
    ◮ Very stable.
    ◮ Overall objective function is always considered.
    ◮ Produces very good results in practice.
◮ Drawbacks:
    ◮ Exact observance of density constraints is difficult.
    ◮ Placement decisions in a fragmented chip area can be arbitrary.
    ◮ Complex objective functions hard to model (e.g. congestion and timing).
    ◮ Significant effort in the legalization may be necessary.

SLIDE 7

Previous Work

Partitioning-Based Placers:

◮ Grid Warping [Xiu, Rutenbar ’07]:
    ◮ Minimize density violations in non-uniform grid and scale to regular bins.
◮ Partitioning-Based BonnPlace [Struzyna ’13]:
    ◮ Compute cell assignments using flow-based partitioning.

Force-Directed Placers:

◮ NTUPlace4 [Hsu, Chou, Lin, Chang ’11]:
    ◮ Penalize violations of locally smoothed density functions.
◮ SimPL [Kim, Lee, Markov ’12] (incl. SimPLR, Ripple, ComPLx, Maple):
    ◮ Run rough but fast legalization.
    ◮ Pull cells towards legalized positions.
    ◮ Iterate with new analytical placement.
◮ ePlace [Lu, Chen, Chang, Sha, Huang, Teng, Cheng ’14]:
    ◮ Translate density violations to potential energy of an electrostatic system.

SLIDE 8

Our Approach: Self-Stabilizing BonnPlace

Idea:

Integrate a partitioning-based algorithm into a force-directed framework.

◮ Compute forces based on legal partitioning-based placement. ◮ Each iteration produces a competitive placement.

⇒ Timing and congestion evaluation possible.

◮ The placements in subsequent iterations are similar.

⇒ Transferred information on timing and routability is trustworthy.

◮ Single iteration quite time-consuming.

⇒ Only small numbers of iterations affordable.

◮ Incorporate position-based (un-)clustering scheme.


SLIDE 9

Basic Algorithm

Algorithm: Self-Stabilizing BonnPlace

Input: cells C
Output: positions pos(c) for all cells c ∈ C

iter ← 0
while not BreakCondition(pos, iter) do
    foreach c ∈ C do
        Connect c to a new pin at position pos(c) via a virtual net of weight 0.01 · iter
    Partitioning-based GlobalPlacement with position-based (un-)clustering
    Legalization
    foreach c ∈ C do
        Store current location in pos(c)
    iter ← iter + 1
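The control flow of the loop above can be sketched in Python as follows. This is only a skeleton under stated assumptions: `global_place`, `legalize` and `break_condition` are hypothetical callables standing in for partitioning-based GlobalPlacement, Legalization and BreakCondition, and the anchor dictionary is one possible way to represent the virtual nets.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Point = Tuple[float, float]
Positions = Dict[Hashable, Point]
# For each cell: (anchor position, weight of the virtual net to that anchor).
Anchors = Dict[Hashable, Tuple[Point, float]]


def self_stabilizing_place(
    cells: List[Hashable],
    global_place: Callable[[List[Hashable], Anchors], Positions],
    legalize: Callable[[Positions], Positions],
    break_condition: Callable[[Positions, int], bool],
    initial: Optional[Positions] = None,
) -> Positions:
    # In iteration 0 the anchor weight is 0, so the start positions do not
    # influence the first global placement; (0, 0) is an arbitrary default.
    pos: Positions = dict(initial) if initial else {c: (0.0, 0.0) for c in cells}
    it = 0
    while not break_condition(pos, it):
        # Each cell is tied to its previous legal position; the pull grows
        # linearly with the iteration count (weight 0.01 * iter).
        weight = 0.01 * it
        anchors: Anchors = {c: (pos[c], weight) for c in cells}
        rough = global_place(cells, anchors)  # stand-in for GlobalPlacement
        legal = legalize(rough)               # stand-in for Legalization
        pos = {c: legal[c] for c in cells}    # store current locations
        it += 1
    return pos
```

Because the anchor weight is zero in iteration 0, the first pass is a plain partitioning-based placement; later passes are pulled increasingly towards the previous legal result, which is what makes successive placements similar.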

SLIDE 10

Forces in Self-Stabilizing BonnPlace

[Figure panels: Iteration 0: Quadratic Placement; Iteration 0: Legal Placement; Iteration 0: Forces; Iteration 1: Quadratic Placement]

The impact of forces on the first iteration on Beate (41 287 cells, 22 nm).

SLIDE 11

Cell Spreading during Iterations (1)

[Figure panels: Iteration 0; Iteration 1; Iteration 2; Iteration 9]

Cell spreading after Global QP on Beate.

SLIDE 12

Cell Spreading during Iterations (2)

[Figure panels: Iteration 0; Iteration 1; Iteration 2; Iteration 9]

Cell spreading after Global QP on Renaud (324 595 cells, 45 nm).

SLIDE 13

Self-Stabilizing Behavior

[Plot: x-axis: Iterations (1 to 9); y-axis: Net length [in m] (2.0 to 9.0)]

Bounding box net length development of QP (red) and legal placement (blue) on Meinolf (392 920 cells, 22 nm).

SLIDE 14

Clustering in Self-Stabilizing BonnPlace

◮ Perform partitioning-based GlobalPlacement on clustered netlist.
◮ Compute a new clustering in each iteration of the loop.
◮ Dissolve clusters during the levels of GlobalPlacement.
◮ Methods for both clustering and unclustering are position-based.

[Figure: Levels of GlobalPlacement in a single iteration with coarse clustering on Beate.]

SLIDE 15

Position-based BestChoice Clustering

Overview of BestChoice clustering algorithm:

◮ Iteratively unite neighboring clusters u and v maximizing a certain clustering score d(u, v) ∈ R+.
◮ Stop clustering when a given target ratio α < 1 of cells is reached.

Connectivity-based clustering score [Alpert ’05]:

\[ d_c(u, v) = \frac{1}{a(u) + a(v)} \cdot \sum_{N \,:\, u, v \in N} \frac{w_N}{|N|} \]

BonnPlace position-based clustering score:

\[ d_p(u, v) = \begin{cases} \dfrac{d_c(u, v)}{BB(u, v) + s} & \text{if } BB(u, v) \le h, \\ 0 & \text{otherwise} \end{cases} \]

a(u): total area of cluster u
w_N: weight of net N
|N|: number of cells in net N
BB(u, v): half perimeter of the bounding box of all cells in u or v
s, h: chip-dependent constants

[Figure: illustration with labels u1, v1, w1, u2, v2, w2 and x1, x2]
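To make the two scores concrete, here is a small Python sketch that evaluates them on a toy netlist. The data layout (nets as weight/member pairs, cell positions as point lists) and the function names are assumptions for illustration. The sketch also assumes the position-based score is 0 once BB(u, v) exceeds h, and it counts |N| as the number of entries of the net.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Net = Tuple[float, Sequence[str]]  # (weight w_N, clusters connected by N)


def connectivity_score(u: str, v: str, area: Dict[str, float], nets: List[Net]) -> float:
    """d_c(u, v): sum of w_N / |N| over nets containing u and v, over a(u) + a(v)."""
    shared = sum(w / len(members) for w, members in nets
                 if u in members and v in members)
    return shared / (area[u] + area[v])


def half_perimeter(points: Sequence[Point]) -> float:
    """Half perimeter of the bounding box of a set of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def position_score(u: str, v: str, area: Dict[str, float], nets: List[Net],
                   cells: Dict[str, List[Point]], s: float, h: float) -> float:
    """d_p(u, v): connectivity score damped by the joint bounding box size."""
    bb = half_perimeter(list(cells[u]) + list(cells[v]))
    if bb > h:
        return 0.0  # assumed value for clusters that are too far apart
    return connectivity_score(u, v, area, nets) / (bb + s)


area = {"u": 1.0, "v": 1.0, "w": 2.0}
nets = [(1.0, ["u", "v"]), (3.0, ["u", "v", "w"]), (5.0, ["u", "w"])]
cells = {"u": [(0.0, 0.0)], "v": [(1.0, 1.0)], "w": [(9.0, 9.0)]}

dc = connectivity_score("u", "v", area, nets)                # (1/2 + 3/3) / 2
dp = position_score("u", "v", area, nets, cells, s=1.0, h=5.0)
```

Here u and v share two nets, so d_c is (1/2 + 3/3) / (1 + 1), and their joint bounding box has half perimeter 2, so d_p divides that value by 2 + s. Pairs that are strongly connected but far apart are penalized or excluded.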

SLIDE 16

Position-based Unclustering

Common Idea for Unclustering:

◮ Dissolve a cluster if its size is large w.r.t. window size in the current placement level.
◮ Leads to sharp unclustering in certain levels.

BonnPlace position-based unclustering:

◮ Dissolve clusters with members tending into distinct areas.
◮ Keep cells clustered if their respective optimum positions are close together.
◮ Still dissolve clusters being very large w.r.t. window size.

[Figure: two examples with cells A, B and cluster AB]
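One way to read the position-based unclustering rules is as a per-cluster predicate, sketched below in Python. Everything concrete here is an assumption made for illustration: the function name, measuring "tending into distinct areas" by the half perimeter of the members' optimum positions, and the two thresholds `max_spread` and `max_area_ratio`. The slides do not specify the actual criteria.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def should_dissolve(
    member_optima: Sequence[Point],
    cluster_area: float,
    window_area: float,
    max_spread: float,
    max_area_ratio: float,
) -> bool:
    """Decide whether a cluster is dissolved at the current placement level."""
    # Size rule: clusters that are very large relative to the window are
    # still dissolved, regardless of where their members want to go.
    if cluster_area > max_area_ratio * window_area:
        return True
    # Position rule: dissolve if the members' optimum positions are far
    # apart; keep the cluster if they lie close together.
    xs = [p[0] for p in member_optima]
    ys = [p[1] for p in member_optima]
    spread = (max(xs) - min(xs)) + (max(ys) - min(ys))
    return spread > max_spread


close = should_dissolve([(1.0, 1.0), (1.5, 1.2)], cluster_area=2.0,
                        window_area=100.0, max_spread=5.0, max_area_ratio=0.5)
apart = should_dissolve([(1.0, 1.0), (40.0, 30.0)], cluster_area=2.0,
                        window_area=100.0, max_spread=5.0, max_area_ratio=0.5)
```

With these hypothetical thresholds the first cluster stays intact, while the second is dissolved because its members tend to different parts of the chip.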

SLIDE 17

Complete Algorithm: Timing- and Congestion-Driven

Algorithm: Self-Stabilizing BonnPlace

Input: cells C
Output: positions pos(c) for all cells c ∈ C

iter ← 0
while not BreakCondition(pos, iter) do
    foreach c ∈ C do
        Connect c to a new pin at position pos(c) via a virtual net of weight 0.01 · iter
    Partitioning-based GlobalPlacement with position-based (un-)clustering
    Legalization
    if timing optimization enabled then
        TimingOptimization
    if routability driven placement enabled then
        CongestionAvoidance
    foreach c ∈ C do
        Store current location in pos(c)
    iter ← iter + 1

SLIDE 18

Timing-Driven BonnPlace

◮ Apply timing optimization steps at the end of each iteration.

Local optimization:

◮ LayerAssignment: Assign critical nets to higher routing layers.
◮ RefinePlace: Locally move cells to straighten timing-critical paths. [Bock, Held, Kämmerling, Schorr: DAC’15]

Global optimization:

◮ Increase net weights on remaining critical paths.
◮ Increase force weights of cells moved by RefinePlace.

SLIDE 19

Slack Distribution during Iterations

[Figure panels: Iteration 0; Iteration 1; Iteration 2; Iteration 9; color scale: Slack [ps]]

A legalized placement on Ida (20 617 cells, 22 nm) after selected iterations with cells colored by their slack.

SLIDE 20

Congestion-Driven BonnPlace

◮ Run a simplified version of BonnRouteGlobal as a congestion estimation at the end of each iteration. [Ahrens, Gester, Klewinghaus, Müller, Peyer, Schulte, Téllez ’15]
◮ Inflate cells in routing-critical areas (for next iteration).
◮ Force router to be quite pessimistic, and forbid larger detours of nets.
◮ Dynamic adaptation of the target congestion:
    ◮ On very congestion-critical chips, the goal is to reduce the congestion to 100 % everywhere.
    ◮ On less critical chips, the target congestion is stepwise reduced to 90 %.
    ◮ Less congestion on uncritical chips helps to reduce routing detours.
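The inflation and target-adaptation steps can be pictured with the following Python sketch. The inflation rule (scale a cell's area by the ratio of local congestion to target, capped by `max_factor`) and the step size of the target reduction are hypothetical choices for illustration only; the slides state the goals (100 % on critical chips, stepwise down to 90 % otherwise) but not the formulas.

```python
from typing import Dict


def inflate_cells(
    area: Dict[str, float],
    congestion: Dict[str, float],
    target: float = 1.0,
    max_factor: float = 2.0,
) -> Dict[str, float]:
    """Return cell areas for the next iteration (hypothetical inflation rule).

    congestion[c] is the estimated routing congestion around cell c,
    where 1.0 means 100 % of the routing capacity is used.
    """
    inflated = {}
    for cell, a in area.items():
        ratio = congestion[cell] / target
        # Only inflate cells in areas above the target; cap the growth.
        factor = min(max(ratio, 1.0), max_factor)
        inflated[cell] = a * factor
    return inflated


def lower_target(target: float, step: float = 0.02, floor: float = 0.90) -> float:
    """Reduce the target congestion by one step, but never below the floor."""
    return max(floor, target - step)


new_area = inflate_cells({"a": 2.0, "b": 2.0}, {"a": 1.5, "b": 0.8})
next_target = lower_target(1.0)
```

In this example cell `a` sits in an overloaded region and grows by 50 %, while cell `b` keeps its size. Inflated cells claim more placement area in the next iteration, which thins out routing-critical regions.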

SLIDE 21

Congestion during Iterations (1)

Congestion estimation on Renaud in iterations 0 to 2 (upper row) and 3 to 5 (lower row).


SLIDE 22

Congestion during Iterations (2)

Internal pessimistic congestion estimation on superblue9 in iterations 0 and 1 (left, upper row) and iterations 2 and 3 (left, lower row); accurate estimation after placement (right).


SLIDE 23

Self-Stabilizing Behavior

[Plot: x-axis: iteration transitions 0→1 to 8→9; y-axis: 0.1 to 100 (logarithmic)]

Linear movement [m] between iterations on the 22 nm designs Ida (20 617 cells), Leo (31 590 cells), Antonio (103 795 cells) and Benedikt (370 210 cells). All runs with timing optimization. Note the logarithmic scaling.

SLIDE 24

Comparison of Clustering and Unclustering Methods

             WL [m] after iteration                                         Final WL
Cl.   Uncl.     0     1     2     3     4     5     6     7     8     9      [m]     [%]
none  none   7.23  7.07  7.02  6.98  6.96  6.93  6.92  6.91  6.89  6.88     6.87  100.00
conn  pos    7.26  7.10  7.05  7.02  7.01  6.99  6.98  6.98  6.97  6.96     6.93  100.89
pos   size   7.48  7.15  7.05  7.02  6.98  6.95  6.94  6.93  6.92  6.91     6.88  100.16
pos   pos    7.26  6.98  6.91  6.87  6.85  6.82  6.80  6.78  6.78  6.78     6.76   98.37

Development of linear half-perimeter wirelength on Meinolf.

conn: connectivity-based
size: size-based
pos: position-based

◮ Netlength decreases by 1.63 % due to position-based clustering.
◮ Non-position-based clustering or unclustering increases netlength.
◮ Until level 5 (of 10) of GlobalPlacement, the number of cells is kept below 18 %.
◮ Runtime of GlobalPlacement decreases by 20 % due to clustering.

SLIDE 25

Results of Timing-Driven BonnPlace

     Ida                   Leo                   Antonio               Benedikt
     (20 617 circuits)     (31 590 circuits)     (103 795 circuits)    (370 210 circuits)
It.    WS     SNES    WL     WS     SNES    WL     WS     SNES    WL     WS     SNES    WL
0     143     13.7  0.42    500   -889.4  2.49    278   -724.5  2.61   1949  -7248.5  9.43
1      91      8.9  0.43    206   -107.3  2.52    129   -136.8  2.63    495    463.0  9.47
2      63      7.4  0.46     85     95.1  2.58     36     26.4  2.72     26      0.8  9.60
3      57      6.0  0.47     54     26.7  2.56     35     35.6  2.71     11      0.2  9.47
4      51      5.6  0.47     41     27.5  2.57     33     29.4  2.71     21      0.6  9.45
5      58      5.8  0.47     41     28.9  2.54     66     38.7  2.74     21      0.7  9.37
6      54      5.3  0.47     41     26.6  2.55     27     28.3  2.74     12      0.2  9.34
7      57      4.9  0.47     41     23.4  2.54     25     23.0  2.70     18      0.4  9.30
8      54      4.8  0.47     41     23.4  2.55     38     24.5  2.67      3      0.0  9.25
9      52      4.6  0.46     41     25.2  2.54     19     12.0  2.65      5      0.0  9.23

Experimental results on industrial 22 nm designs.

It.: iteration
WS: worst slack (in picoseconds)
SNES: overall sum of negative endpoint slacks (in 10^3 picoseconds)
WL: half-perimeter wirelength (in meters)

SLIDE 26

Results of Congestion-Driven BonnPlace

              Self-Stab. BonnPlace     NTUPlace4               [Cong et al. ’13]
Chip          sWL   RC [%]  IF [%]     sWL            RC [%]   sWL
superblue2    6.05  100.17   46.42     6.24 (+ 3.1)   100.68   6.14 (+ 1.5)
superblue3    3.22  100.00   58.03     3.62 (+12.3)   103.53   3.60 (+11.6)
superblue6    3.37  100.00   31.77     3.42 (+ 1.5)   101.21   3.40 (+ 0.8)
superblue7    4.07  100.00   19.23     3.99 (− 2.1)   100.68   3.95 (− 3.0)
superblue9    2.35  100.00   26.51     2.55 (+ 8.3)   102.48   2.50 (+ 6.3)
superblue11   3.44  100.02   15.97     3.42 (− 0.6)   100.02   3.40 (− 1.2)
superblue12   2.80  100.01   41.57     3.12 (+11.2)   100.02   3.04 (+ 8.4)
superblue14   2.26  100.00   15.53     2.26 (− 0.3)   100.07   2.45 (+ 8.3)
superblue16   2.65  100.00   31.85     2.80 (+ 5.8)   102.39   2.74 (+ 3.4)
superblue19   1.51  100.00   20.30     1.53 (+ 1.2)   100.61   1.51 (− 0.2)
Average             100.02   30.72          (+ 4.0)   101.17        (+ 3.6)

Experimental results on DAC’12 placement benchmarks.

sWL: Scaled wirelength ×10^8 (the objective function of the contest)
RC: Routing congestion in % (penalty for congestion)
IF: Inflation of the cell sizes relative to the initial cell areas in %