SALT: Provably Good Routing Topology by a Novel Steiner Shallow-Light Tree Algorithm
Gengjie Chen, Peishan Tu, Evangeline F. Y. Young
Department of Computer Science & Engineering The Chinese University of Hong Kong
Nov 15, 2017
◮ Timing and power are crucial in chip design.
◮ In a routing tree:
  ◮ Path length implies wire delay;
  ◮ Tree weight implies routing resource usage (routability), power consumption, cell delay.
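◮ In a spanning/Steiner $(\bar{\alpha}, \bar{\beta})$-shallow-light tree (SLT) $T$:
  ◮ Shallowness $\alpha = \max\left\{ \frac{d_T(r,v)}{d_G(r,v)} \,\middle|\, v \in V \setminus \{r\} \right\} \le \bar{\alpha}$;
    ◮ $d_G(r,v)$: distance from $v$ to root $r$ on graph/metric $G$.
  ◮ Lightness $\beta = \frac{w(T)}{w(MST(G))} \le \bar{\beta}$.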
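The two definitions above can be checked numerically on a small example. The following is a minimal sketch, assuming a spanning tree (no Steiner points) over pins in the Manhattan plane, with no pin placed on the root; the function names `manhattan`, `mst_weight`, and `shallowness_lightness` are illustrative and not from the slides.

```python
from collections import defaultdict

def manhattan(p, q):
    """Manhattan distance, used here as the metric d_G."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def mst_weight(points):
    """Weight of an MST of the complete Manhattan graph (simple O(n^2) Prim)."""
    n = len(points)
    in_tree = [False] * n
    best = [float("inf")] * n
    best[0] = 0
    total = 0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], manhattan(points[u], points[v]))
    return total

def shallowness_lightness(points, edges, root=0):
    """Return (alpha, beta) of the spanning tree given by `edges` (index pairs)."""
    adj = defaultdict(list)
    for u, v in edges:
        w = manhattan(points[u], points[v])
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {root: 0}            # d_T(r, v): path length inside the tree
    stack = [root]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    alpha = max(dist[v] / manhattan(points[root], points[v])
                for v in range(len(points)) if v != root)
    weight = sum(manhattan(points[u], points[v]) for u, v in edges)
    return alpha, weight / mst_weight(points)

points = [(0, 0), (2, 0), (2, 2), (0, 2)]
path_tree = [(0, 1), (1, 2), (2, 3)]   # light but deep
star_tree = [(0, 1), (0, 2), (0, 3)]   # shallow but heavier
print(shallowness_lightness(points, path_tree))
print(shallowness_lightness(points, star_tree))
```

On this square, the path-shaped tree is as light as the MST but makes the last pin three times farther from the root than necessary, while the star has shallowness 1 at the cost of extra weight.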
[Figure: five example routing trees, labeled (α = …/13, β = 182/39), (α = …/13, β = 54/39), (α = …/13, β = 39/39), (α = …/13, β = 61/39), (α = …/13, β = 44/39).]
◮ Spanning $(1+\epsilon, O(\frac{1}{\epsilon}))$-SLT
  ◮ ABP/BRBC $(1+2\epsilon, 1+\frac{2}{\epsilon})$ [Awerbuch, TR’91] [Cong, TCAD’92];
  ◮ KRY $(1+\epsilon, 1+\frac{2}{\epsilon})$ [Khuller, SODA’93, Algorithmica’95].
◮ Steiner $(1+\epsilon, O(\log \frac{1}{\epsilon}))$-SLT
  ◮ ES $(1+2\epsilon, 4+2\lceil \log \frac{2}{\epsilon} \rceil)$ [Elkin, FOCS’11, SICOMP’15].
◮ PD combines SPT and MST [Alpert, TCAD’95].
◮ Bonn trades off between cell and wire delay [Scheifele, ICCAD’16, Algorithmica’17].
◮ Propose SALT for general-graph Steiner SLT, whose shallowness-lightness bound is $(1+\epsilon, 2+\lceil \log \frac{2}{\epsilon} \rceil)$.
◮ Reduce runtime from $O(n^2)$ to $O(n \log n)$ in Manhattan space.
◮ Integrate SALT with classical RSMA and RSMT algorithms, which provides a …
◮ Propose several effective post processing methods.
◮ Construct MST $T_M$.
◮ Identify breakpoints $B$ on Hamiltonian path $P$.
◮ Obtain Steiner SPT $T_B$ on $G[B \cup \{r\}]$, and get graph $T_M \cup T_B$.
◮ Construct spanning SPT on $T_M \cup T_B$, which is the output $T$.
◮ Construct MST $T_M$.
◮ Identify breakpoints $B$ during DFS on $T_M$, which results in a forest $F$.
◮ Obtain Steiner SPT $T_B$ on $G[B \cup \{r\}]$; the output is $T = F \cup T_B$.
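The three steps above can be sketched in a simplified form. This is only an illustration under loud assumptions: pins live in the Manhattan plane; the breakpoint test used here (the path through the MST parent exceeds $(1+\epsilon)\, d_G(r,v)$) is a simplified stand-in, since the slides do not spell out SALT's exact criterion; and each breakpoint is wired straight to the root, which is a trivial shortest-path tree rather than the light Steiner SPT $T_B$. All function names are hypothetical.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def prim_mst(points, root=0):
    """Parent array of an MST of the complete Manhattan graph, rooted at `root`."""
    n = len(points)
    parent = [None] * n
    best = [float("inf")] * n
    best[root] = 0
    in_tree = [False] * n
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        for v in range(n):
            d = manhattan(points[u], points[v])
            if not in_tree[v] and d < best[v]:
                best[v], parent[v] = d, u
    return parent

def shallow_light_sketch(points, eps, root=0):
    """Return (parent, breakpoints, dist) of a tree with shallowness <= 1 + eps."""
    parent = prim_mst(points, root)
    children = [[] for _ in points]
    for v, p in enumerate(parent):
        if p is not None:
            children[p].append(v)
    dist = [0] * len(points)    # path length from the root in the output tree
    breakpoints = []
    stack = [root]              # DFS over the MST
    while stack:
        u = stack.pop()
        for v in children[u]:
            d_g = manhattan(points[root], points[v])
            via_parent = dist[u] + manhattan(points[u], points[v])
            if via_parent > (1 + eps) * d_g:
                # Simplified criterion (assumption): cut edge (u, v), which
                # leaves a forest, and reconnect v to the root directly.
                breakpoints.append(v)
                parent[v] = root
                dist[v] = d_g
            else:
                dist[v] = via_parent
            stack.append(v)
    return parent, breakpoints, dist

points = [(0, 0), (4, 0), (4, 3), (1, 3)]
print(shallow_light_sketch(points, eps=1))   # pin 3 becomes a breakpoint
print(shallow_light_sketch(points, eps=2))   # the MST already satisfies the bound
```

Because every vertex either keeps a path within the $(1+\epsilon)$ budget or is reset to its shortest distance, the shallowness bound holds by construction; the lightness guarantee of SALT depends on the Steiner SPT that this sketch replaces with direct root connections.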
◮ Breakpoints will be connected to $r$ by shortest paths.
◮ Other vertices also benefit.
◮ A full balanced binary tree.
◮ Constructed level by level from the bottom.
◮ Merge neighboring vertices pair by pair into Steiner vertices in each level.
  ◮ Determine each Steiner vertex by minimizing edge weights while preserving shortest paths.
  ◮ Select a light matching for pairing up along the (Hamiltonian) cycle.
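To make the merging step concrete, here is a minimal sketch for the special case of the Manhattan plane (an assumption; the slides describe the construction for general graphs). For two vertices `u` and `v`, the hypothetical helper `merge_point` returns the point farthest from the root that still lies on a shortest path from the root to both, so routing through it shares as much wire as possible without lengthening either path.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def merge_point(root, u, v):
    """Farthest point from `root` on a shortest root-u path and a shortest root-v path."""
    def coord(r, a, b):
        da, db = a - r, b - r
        if da * db <= 0:        # opposite sides of the root, or aligned with it
            return r
        step = min(abs(da), abs(db))
        return r + step if da > 0 else r - step
    return (coord(root[0], u[0], v[0]), coord(root[1], u[1], v[1]))

root, u, v = (0, 0), (3, 5), (4, 2)
z = merge_point(root, u, v)
print(z)  # (3, 2)

# Shortest paths are preserved through the Steiner point z ...
assert manhattan(root, z) + manhattan(z, u) == manhattan(root, u)
assert manhattan(root, z) + manhattan(z, v) == manhattan(root, v)
# ... while the shared segment root-z is paid for only once.
separate = manhattan(root, u) + manhattan(root, v)
merged = manhattan(root, z) + manhattan(z, u) + manhattan(z, v)
print(separate, merged)  # 14 9
```

Applying such a merge pair by pair, level by level, yields a binary tree of Steiner vertices; which pairs to merge is the job of the light matching mentioned above and is not modeled here.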
◮ Three differences compared to ES:
  ◮ Tighter criterion for breakpoints;
  ◮ Better initial topology (MST instead of Hamiltonian path);
  ◮ Much lighter Steiner SPT (with lightness bound …)
    ◮ ES: $1 + 2\lceil \log n \rceil$.
◮ SALT generates a Steiner $(1+\epsilon, 2+\lceil \log \frac{2}{\epsilon} \rceil)$-SLT.
  ◮ ES: $(1+2\epsilon, 4+2\lceil \log \frac{2}{\epsilon} \rceil)$.
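A quick numerical comparison of the two lightness bounds may help. The sketch below simply evaluates the stated formulas, assuming the logarithm is base 2 (the slides do not state the base) and $\epsilon > 0$; the function names are illustrative.

```python
import math

def salt_lightness_bound(eps):
    """2 + ceil(log(2 / eps)), with log assumed to be base 2."""
    return 2 + math.ceil(math.log2(2 / eps))

def es_lightness_bound(eps):
    """4 + 2 * ceil(log(2 / eps)), with log assumed to be base 2."""
    return 4 + 2 * math.ceil(math.log2(2 / eps))

for eps in (1, 0.5, 0.25):
    print(eps, salt_lightness_bound(eps), es_lightness_bound(eps))
# 1 3 6
# 0.5 4 8
# 0.25 5 10
```

For the same $\epsilon$ the SALT lightness bound is half of the ES bound, and SALT additionally guarantees shallowness $1+\epsilon$ where ES guarantees $1+2\epsilon$.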
◮ Construct RSMT $T_M$ by FLUTE [Chu, TCAD’08].
◮ Get breakpoints $B$ and forest $F$.
◮ Obtain RSMA $T_B$ on $G[B \cup \{r\}]$ by CL [Cordova, TR’94]; the output is $T = F \cup T_B$.
◮ Two differences compared to SALT:
  ◮ Better initial topology (RSMT by FLUTE instead of MST);
  ◮ Lighter Steiner SPT (RSMA by CL).
◮ Improve shallowness α and lightness β in practice.
◮ Very efficient: $O(n \log n)$ time.
◮ Canceling intersected edges
◮ L-shape flipping
◮ U-shape shifting
◮ Improve (i) path length, (ii) wirelength.
◮ Efficiently identified by R-tree.
◮ Best Steiner vertex $z$ should be a child corner of the intersection box.
  ◮ Child corner: among the four corners, the one closest to a child vertex.
◮ Improve (i) path length, (ii) wirelength.
◮ Optimal by dynamic programming [Ho, TCAD’90].
◮ $O(n)$ due to bounded vertex degree in SALT.
◮ Iterate until no improvement.
◮ Improve (i) path length, (ii) wirelength, (iii) Elmore delay [Boese, DAC’93].
◮ i.e., ¯
◮ ABP/BRBC (α = 1.90, β = 1.35);
◮ KRY (α = 1.43, β = 1.10);
◮ PD (α = 1.11, β = 1.15);
◮ Bonn (α = 1.22, β = 2.25).
◮ ICCAD 2015 Contest benchmarks with 2.4 million nets (excluding 2-pin nets).
◮ ε is set to 20 values ranging from 0 to 73.895.
◮ Three metrics for each tree:
  ◮ Shallowness α;
  ◮ Lightness $\beta' = \frac{w(T)}{w(FLUTE)}$ (instead of $\beta = \frac{w(T)}{w(MST)}$);
  ◮ Delay γ = longest Elmore delay among all paths, normalized by a lower bound.
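For readers unfamiliar with the delay metric, the standard Elmore delay of a routing tree can be sketched as follows. This is a generic illustration, not the evaluation code behind the slides: it assumes unit wire resistance and capacitance per length (`r_unit`, `c_unit`), ignores driver resistance, and leaves out the normalization by a lower bound, which the slides do not specify. The tree encoding (`parent`, `length`, `sink_cap`) is hypothetical.

```python
def elmore_delays(parent, length, sink_cap, r_unit=1.0, c_unit=1.0):
    """Elmore delay from the root to every node of a tree.

    parent[v]   : parent of node v (None for the root)
    length[v]   : wire length of edge (parent[v], v); ignored for the root
    sink_cap[v] : load capacitance attached at node v
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    root = None
    for v, p in enumerate(parent):
        if p is None:
            root = v
        else:
            children[p].append(v)

    order = []                  # preorder: every parent precedes its descendants
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children[u])

    down = list(sink_cap)       # total capacitance at and below each node
    for v in reversed(order):   # descendants are accumulated before ancestors
        if parent[v] is not None:
            down[parent[v]] += down[v] + c_unit * length[v]

    delay = [0.0] * n
    for v in order:
        if parent[v] is not None:
            r = r_unit * length[v]
            c = c_unit * length[v]
            # The wire's own capacitance counts half; everything downstream counts fully.
            delay[v] = delay[parent[v]] + r * (c / 2 + down[v])
    return delay

# Chain root(0) -> 1 -> 2 with wire lengths 2 and 1, and a unit load at node 2.
delays = elmore_delays(parent=[None, 0, 1], length=[0, 2, 1], sink_cap=[0, 0, 1])
print(delays)       # [0.0, 6.0, 7.5]
print(max(delays))  # 7.5, the longest Elmore delay among all paths
```

Because downstream capacitance multiplies the resistance of every upstream wire, both long paths (shallowness) and total wire (lightness) feed into this metric, which is why it is reported alongside α and β′.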
         SALT w/o post proc.      SALT w/ post proc.
ε        β′      α      γ         β′      α      γ
0.000    1.100   1.000  1.271     1.066   1.000  1.266
0.050    1.074   1.006  1.258     1.052   1.004  1.259
0.075    1.066   1.010  1.256     1.047   1.007  1.257
0.113    1.056   1.016  1.256     1.041   1.011  1.256
0.169    1.046   1.025  1.258     1.034   1.018  1.257
0.253    1.035   1.039  1.263     1.026   1.029  1.261
0.380    1.024   1.057  1.273     1.018   1.044  1.269
0.570    1.015   1.080  1.287     1.011   1.062  1.281
0.854    1.008   1.108  1.305     1.006   1.085  1.296
1.281    1.003   1.136  1.323     1.003   1.109  1.313
1.922    1.001   1.160  1.339     1.001   1.130  1.328
2.883    1.000   1.176  1.349     1.000   1.146  1.337
4.325    1.000   1.187  1.354     1.000   1.157  1.342
6.487    1.000   1.193  1.356     1.000   1.162  1.344
9.731    1.000   1.195  1.357     1.000   1.164  1.344
...      1.000   1.196  1.357     1.000   1.164  1.344
[Figure: comparison of FLUTE, CL, SALT w/o post proc., and SALT w/ post proc.]
◮ Post processing simultaneously improves shallowness α, lightness β′, and delay γ.
◮ Efficient: routing plus post processing on 2.4 million nets, repeated 20 times, takes 22.5 min.
[Figures: shallowness-lightness and delay-lightness trade-offs for FLUTE, CL, SALT, ABP, KRY, PD, ES, and Bonn.]
◮ SALT dominates other methods in the shallowness-lightness trade-off.
◮ Good in the delay-lightness trade-off.
◮ No parallel edges.
◮ Steiner $(1+\epsilon, 2+\lceil \log \frac{2}{\epsilon} \rceil)$-SLT for general graphs.
◮ Reduce runtime to $O(n \log n)$ in Manhattan space.
◮ Integration with classical RSMA and RSMT algorithms.
◮ Effective post processing methods.

◮ Be closer to RSMA for small ε.
◮ Consider routing congestion / blockage.