Rupesh S. Shelar
Technology & Manufacturing Group Intel Corporation, Hillsboro, OR
March 31st 2009, ISPD 2009, San Diego
An Algorithm for Routing With Capacitance/Distance Constraints for Clock Distribution in Microprocessors
2
Objective
Present an algorithm for routing with capacitance/distance constraints, a problem which arises in microprocessor clock distribution
4
Clock Distribution
– Noise, SI
– Skew
– Delay
– Slope
– Power
5
Clock Network Classification
– Grid + buffered trees
– Buffered trees
ISPD’07, Maßberg et al. JA’08
– Unbuffered trees
– Link-inserted (buffered) clock trees
6
Microprocessor Clock Hierarchy
Local Clock Network: CTS Solution Space
in most high speed processors:
– Distributed as a grid followed by trees
[Figure: clock hierarchy. Labels: PLL; global clock distribution using multiple spines; tunable grid buffers; clock grid; regional clock buffers (RCBs); local clock buffers (LCBs); to state elements]
Global Clock Routing = Post-grid Clock Distribution
7
Block-level Clock Layout
shielding/spacing
global clock distribution
8
Microprocessor Layout Hierarchy
layout areas
blocks
macros
– 100s to 1000s block-level clock ports
– Grid-wires and tracks reserved for clock routes, typically in upper metal layers
An example layout area
Grid wires
12
Post-grid Global Clock Distribution
areas
blocks
macros
– 100s to 1000s block-level clock ports
– Grid-wires and tracks reserved for clock routes, typically in upper metal layers
– Ports marked by blue squares
[Figure: zoomed-in picture (before routing) showing grid wires, vertical reserved tracks, and horizontal reserved tracks]
13
Post-grid Clock Distribution
[Figure: zoomed-in picture (post-routing) showing horizontal wires, vertical wires, and M8 grid wires; ports marked by blue squares]
14
Motivation for Fast Post-Grid Global Clock Distribution
simulations yield inaccurate arrival times at block-level clock pins
– Affects timing convergence: path reordering due to actual arrival times
power, …
– If loads are too high, the clock may not toggle
– Poor slopes at block-level ports may affect the chip frequency
– Block-level timing convergence does affect the load on the grid
15
Previous work: Post-grid Clock Distribution
limited to high-performance processors
– Mostly manual, using stable block-level data
– May have to be performed iteratively
level ECOs
17
Example
[Figure: two panels. "Global clock routing problem instance": M8 grid wires, M6 and M7 tracks, and M5 ports. "A routing solution": the same instance after routing]
18
Problem Formulation: Graph Construction
[Figure: the layout instance (M8 grid wires, M6 and M7 tracks, M5 ports) and the routing graph built from it, with nodes s1–s4, c1–c10, and p1–p9]
19
Problem Statement
p* nodes such that
– For any tree T, distance from sT* to pT* ≤ Distancelimit
– For any tree T, ∑Cap(pT*) ≤ Caplimit
– ∑ over all trees T* of Wirelength(T*) is minimum
now)
– Grid loading constraint: the load on a grid wire, including that of interconnects, is less than the specified limit
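The two per-tree constraints and the wirelength objective above can be checked mechanically for a candidate tree. The following is a minimal sketch, not the tool's code: the function name, the edge-list representation, and the choice to count only port capacitance (as in the ∑Cap(p) bullet, without wire capacitance) are assumptions made for illustration.

```python
from collections import deque

def check_tree(source, edges, port_cap, dist_limit, cap_limit):
    """Check one routed tree against the distance and capacitance limits.

    edges: list of (u, v, length) tuples forming a tree that contains `source`.
    port_cap: dict mapping each port node to its load capacitance.
    Returns (feasible, wirelength).
    """
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, []).append((v, length))
        adj.setdefault(v, []).append((u, length))

    # Path length from the source to every node; paths in a tree are unique.
    dist = {source: 0.0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, length in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + length
                queue.append(v)

    ports = [node for node in dist if node in port_cap]
    total_cap = sum(port_cap[p] for p in ports)          # assumed: port caps only
    max_dist = max((dist[p] for p in ports), default=0.0)
    wirelength = sum(length for _, _, length in edges)
    feasible = max_dist <= dist_limit and total_cap <= cap_limit
    return feasible, wirelength


# Hypothetical tree: source s1 feeds ports p1 and p2 through node c1.
tree = [("s1", "c1", 2), ("c1", "p1", 3), ("c1", "p2", 4)]
caps = {"p1": 5, "p2": 6}
print(check_tree("s1", tree, caps, dist_limit=6, cap_limit=11))  # (True, 9)
```

The total objective is then the sum of the returned wirelengths over all trees in a solution.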
[Figure: routing graph with nodes s1–s4, c1–c10, and p1–p9]
20
Trees to Global Clock Routes
[Figure: a routing solution on the routing graph, consisting of four trees T1–T4]
21
Trees to Global Clock Routes
[Figure: another routing solution on the routing graph, consisting of three trees T1–T3]
23
Algorithm for Routing with Capacitance and Distance Constraints
distance constraint
– Can be transformed to clustering with capacity constraints to minimize wirelength, which is NP-complete
problem
– Start growing trees from the source nodes
– Grow trees by adding edges till all ports are connected
– Edges are sorted in ascending order of wire-cap to minimize the total wire-cap due to global clock routes
– Add an edge iff:
  – Doing so does not violate the capacitance/distance constraints
  – The node is not already connected to the grid by some other route (tree)
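The greedy steps above can be sketched as a multi-source, Prim-style loop. This is one possible reading of the slide, not the C++ implementation it describes: the heap-based edge ordering, the use of edge length as a stand-in for wire-cap, the port-only capacitance sum, and all names are assumptions. The final pruning pass drops dead-end branches and trees that reach no port.

```python
import heapq

def grow_trees(sources, edges, port_cap, dist_limit, cap_limit):
    """Greedy tree growing from several sources (illustrative sketch).

    edges: list of (u, v, length); length stands in for wire capacitance.
    port_cap: dict mapping each port to its load capacitance.
    Returns (parent, unrouted): parent maps every kept node to its
    predecessor (None for a source); unrouted is the set of ports left out.
    Node names are assumed to be mutually comparable (e.g. all strings).
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    root, dist, parent, tree_cap, heap = {}, {}, {}, {}, []
    for s in sources:
        root[s], dist[s], parent[s], tree_cap[s] = s, 0, None, 0
        for v, w in adj.get(s, []):
            heapq.heappush(heap, (w, s, v))

    while heap:
        w, u, v = heapq.heappop(heap)      # cheapest frontier edge first
        if v in root:
            continue                       # already connected by some tree
        r = root[u]
        cap = port_cap.get(v, 0)
        if dist[u] + w > dist_limit or tree_cap[r] + cap > cap_limit:
            continue                       # edge would violate a constraint
        root[v], dist[v], parent[v] = r, dist[u] + w, u
        tree_cap[r] += cap
        for x, wx in adj.get(v, []):
            if x not in root:
                heapq.heappush(heap, (wx, v, x))

    # Prune: repeatedly remove leaves that are not ports.
    children = {n: 0 for n in parent}
    for n, p in parent.items():
        if p is not None:
            children[p] += 1
    stack = [n for n in parent if children[n] == 0 and n not in port_cap]
    while stack:
        n = stack.pop()
        p = parent.pop(n)
        if p is not None:
            children[p] -= 1
            if children[p] == 0 and p not in port_cap:
                stack.append(p)

    unrouted = set(port_cap) - set(parent)
    return parent, unrouted
```

Because a rejected edge can never become feasible later (tree capacitance and source distance only grow), rejected edges are simply discarded. Ports that no tree can accept are reported in `unrouted` rather than forced into a violating route.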
24
Tree Growing Heuristic: Example
[Figure: two panels. "Routing Graph": nodes s1–s4, c1–c10, and p1–p9. "Tree Growing": the same graph with edge weights (values between 4 and 22), on which trees are grown step by step from the source nodes]
39
Tree Growing Heuristic: Example
Remove unnecessary trees/edges
[Figure: the grown trees (panel "Tree Growing") and the same graph after unnecessary trees/edges are removed]
40
Tree Growing Heuristic: Example
[Figure: two panels, "Routing solution" and "Routing solution on graph"]
41
Time complexity and limitations
the routing graph
– |E| ≤ 2|P| + 4 mn = O(|P|+mn), where P is the set of ports, m = number of horizontal tracks or grid-wires, n = number of vertical tracks
10 seconds
– Typical problem size: 1000 ports; ~500 tracks each in the horizontal and vertical directions; area 1000 × 1000 µm²
– A special case of the underlying problem leads to a bin-packing problem
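As a rough check on problem size, the edge bound |E| ≤ 2|P| + 4mn can be evaluated for the typical instance quoted above. The function name is an illustrative assumption.

```python
def edge_bound(num_ports, m, n):
    """Upper bound on |E| from the slide: 2|P| + 4mn."""
    return 2 * num_ports + 4 * m * n

# Typical size quoted above: 1000 ports, ~500 tracks in each direction.
print(edge_bound(1000, 500, 500))  # 1002000
```

At this size the 4mn term from track crossings dominates, so the graph has on the order of a million edges.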
47
Comparison with Nearest Source Heuristic
[Figure: three panels on a small routing graph (nodes s1, s3, c1, c2, p1–p3; edge weights 2, 14, 14, 5): "Routing Graph", "Solution due to Nearest Source Heuristic", and "Solution due to Tree Growing Heuristic"]
Nearest source heuristic: wirelength = 30
Tree growing heuristic: wirelength = 21
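The surviving slide text does not define the nearest source (NS) heuristic, so the sketch below assumes one plausible reading: every port is routed along its shortest path to whichever source is closest, with no capacitance or distance check. The function name and the small instance are hypothetical and do not reproduce the slide's example.

```python
import heapq

def nearest_source_routes(sources, edges, ports):
    """Route each port to its nearest source along a shortest path.

    edges: list of (u, v, length). Returns (used_edges, wirelength), where
    used_edges is a set of (from_node, to_node, length) tuples.
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    # Multi-source Dijkstra: dist[v] is the distance to the closest source.
    dist = {s: 0 for s in sources}
    parent = {s: None for s in sources}
    heap = [(0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in adj.get(u, []):
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                parent[v] = (u, w)
                heapq.heappush(heap, (d + w, v))

    # Walk back from every reachable port and collect the edges on its path.
    used = set()
    for p in ports:
        node = p
        while node in parent and parent[node] is not None:
            prev, w = parent[node]
            used.add((prev, node, w))
            node = prev
    return used, sum(w for _, _, w in used)


# Hypothetical instance: p2 is slightly closer to s2 (6) than to s1 via p1 (7).
edges = [("s1", "p1", 2), ("p1", "p2", 5), ("s2", "p2", 6)]
print(nearest_source_routes(["s1", "s2"], edges, ["p1", "p2"])[1])  # 8
```

In this instance NS uses 8 units of wire, while extending the s1 route from p1 to p2 would need only 2 + 5 = 7. This is the kind of sharing that ordering edges by cost can exploit.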
49
Global Clock Distribution Flow
– Grid-wires, M6/M7 track locations from full-chip layout
– Location and cap. on clock ports
constraints on routes from grid
– Specified to ensure good slopes/delay/skew
constraints
[Figure: flow diagram. Inputs: clock ports with cap and location info; grid-wires and routing track locations from the FCL database; distance/cap constraints. Step: automated global clock distribution. Output: global clock routed database]
50
Implementation Overview
read/update full-chip layout
routing tool
solution for the problem of routing with capacitance and distance constraints
[Figure: implementation blocks. Labels: FCL tool; TCL interface; tree growing heuristic (C++ implementation); PV archive; layout archive]
51
Wirelength/Power Comparison
[Figure: Power Comparison (NS vs. TG); chart with series NS and TG, x-axis 1 to 11, y-axis ticks 5 to 25]
improves wirelength by 17%,
source (NS) heuristic
algorithms
time within ±4 ps
– Mostly, due to the ports close to and far away from grid drivers
[Figure: Wirelength Comparison (NS vs. TG); chart with series NS and TG, x-axis 1 to 11, y-axis ticks 5000 to 30000]
52
Global Clock Distribution: Layout Pictures
Layout area: with grid wires and tracks After global clock distribution
53
Global Clock Distribution: Layout Pictures
Layout area with grid wires and tracks After global clock distribution (with ports marked by blue squares)
54
Global Clock Distribution: Layout Pictures
[Figure: zoomed-in picture (from slide 13) after global clock distribution, with ports marked by blue squares; labels: horizontal wires, vertical wires, M8 grid wires]
56
Summary
constraints, which arises in microprocessor clock distribution
seconds and improves wirelength by 17%, on an average,
microprocessor, leading to
– Productivity improvement, since turn-around time is in minutes
– Risk reduction, since potentially heavily loaded grid-wires can be identified early on
– Power saving because of wirelength reduction
57
Future Directions
– Better heuristics
– Approximation algorithms
– Leads to a problem similar to bin-packing
58
Backup