An Algorithm for Routing With Capacitance/Distance Constraints for Clock Distribution in Microprocessors. Rupesh S. Shelar, Technology & Manufacturing Group, Intel Corporation, Hillsboro, OR. March 31st 2009, ISPD 2009, San Diego.

SLIDE 1

Rupesh S. Shelar

Technology & Manufacturing Group Intel Corporation, Hillsboro, OR

March 31st 2009, ISPD 2009, San Diego

An Algorithm for Routing With Capacitance/Distance Constraints for Clock Distribution in Microprocessors

SLIDE 2

Objective

  • Explain a problem of routing with capacitance and distance constraints, which arises in microprocessor clock distribution
  • Present a solution to the problem
SLIDE 3

Agenda

  • Introduction
  • Problem Formulation
  • Routing algorithm
  • Experimental Results
  • Conclusion
SLIDE 4

Clock Distribution

  • Most, if not all, digital circuits are synchronous
  • All signals are timed with respect to clocks
  • Clock distribution requirements
    – Noise, signal integrity (SI)
    – Skew
    – Delay
    – Slope
    – Power

SLIDE 5

Clock Network Classification

  • Can be classified depending on underlying structure
    – Grid + buffered trees
      • High-performance (GHz) processors; less skew, possibly at the cost of power
      • Relatively less automation; most of the design is manual
      • See Bailey et al., JSSC’98; Kurd et al., JSSC’01
    – Buffered trees
      • Most ASICs (~100s of MHz)
      • Supported by most modern physical design tools
      • See Vittal et al., DAC’95; Mehta et al., ICCD’97; Shelar, ISPD’07; Maßberg et al., JA’08; and many others
    – Unbuffered trees
      • Local distribution in ASICs/processors, using zero-skew routing, for example
      • See Tsay, ICCAD’93; Edahiro, DAC’94
    – Link-inserted (buffered) clock trees
      • See Rajaram et al., DAC’04, and many others
SLIDE 6

Microprocessor Clock Hierarchy

  • Clock network in most high speed processors:
    – Distributed as a grid followed by trees

[Figure: clock hierarchy diagram. Labels: PLL; Global Clock Distribution Using Multiple Spines; Tunable Grid Buffers; Clock Grid; Regional Clock Buffers (RCBs); Local Clock Buffers (LCBs); To State Elements. Annotations: "Global Clock Routing = Post-grid Clock Distribution"; "Local Clock Network: CTS Solution Space".]

SLIDE 7

Block-level Clock Layout

  • Replicate, place, size clock cells and route clock wires with shielding/spacing
  • Create block-level ports, aligned with tracks reserved for global clock distribution
  • Capacitance/delay limits on ports
SLIDE 8

Microprocessor Layout Hierarchy

  • Entire die divided into several layout areas
  • Each layout area contains many blocks
  • Each block contains std. cells or macros
  • Each layout area contains
    – 100s to 1000s of block-level clock ports
    – Grid-wires and tracks reserved for clock routes, typically in upper metal layers

[Figure: an example layout area, showing grid wires]

SLIDE 11

Post-grid Global Clock Distribution

  • Entire die divided into several layout areas
  • Each layout area contains many blocks
  • Each block contains std. cells or macros
  • Each layout area contains
    – 100s to 1000s of block-level clock ports
    – Grid-wires and tracks reserved for clock routes, typically in upper metal layers
    – Ports marked by blue squares

[Figure: an example layout area, showing grid wires, vertical reserved tracks, and horizontal reserved tracks]

SLIDE 12

Post-grid Global Clock Distribution

  • Entire die divided into several layout areas
  • Each layout area contains many blocks
  • Each block contains std. cells or macros
  • Each layout area contains
    – 100s to 1000s of block-level clock ports
    – Grid-wires and tracks reserved for clock routes, typically in upper metal layers
    – Ports marked by blue squares

[Figure: zoomed-in picture (before routing), showing grid wires, vertical reserved tracks, and horizontal reserved tracks]

SLIDE 13

Post-grid Clock Distribution

  • Entire die divided into several layout areas
  • Each area contains many blocks
  • Each block contains std. cells or macros
  • Each layout area contains 100s to 1000s of block-level clock ports
  • Contains grid-wires and tracks reserved for clock routes, typically in upper metal layers
    – Ports marked by blue squares

[Figure: zoomed-in picture (post-routing), showing M8 grid wires, horizontal wires, and vertical wires]

SLIDE 14

Motivation for Fast Post-Grid Global Clock Distribution

  • Global clock wires contribute significant load on the clock grid
  • Without accurate (estimated/extracted) global clock wire load, simulations yield inaccurate arrival times at block-level clock pins
    – Affects timing convergence: path reordering due to actual arrival times
  • Design space constrained by clock as well, and not just timing, area, power, …
    – If loads are too high, the clock may not toggle
    – Poor slopes at block-level ports may affect the chip frequency
    – Block-level timing convergence does affect the load on grid
      • Sizing of sequentials, placing them in one area, splitting clock gating latches for timing
      • Difficult to capture using block-level metric, since the load depends on other blocks as well
SLIDE 15

Previous work: Post-grid Clock Distribution

  • No published work, possibly because the problem is too specific and limited to high-performance processors
  • In practice (it was/is), …
    – Mostly manual, using stable block-level data
      • Employing the nearest source heuristic
      • May not be the best even from total cap. perspective
      • May violate distance/capacitance constraints, leading to slope violations
      • May lead to violation of load limit on grid-wires
    – May have to be performed iteratively
      • Partly, since with the nearest source heuristic, capacitances are ignored
      • If there are slope violations on the receivers or grid-loading issues or block-level ECOs
      • Weeks of effort during critical tape-in period
      • Affects timing convergence and schedule/time to market
SLIDE 16

Agenda

  • Introduction
  • Problem Formulation
  • Routing algorithm
  • Experimental Results
  • Conclusion
SLIDE 17

Example

[Figure: a global clock routing problem instance and a routing solution. Labels: M8 grid wires, M7 track, M6 track, M5 port]

SLIDE 18

Problem Formulation: Graph Construction

[Figure: the layout (M8 grid wires, M7 track, M6 track, M5 port) and the corresponding routing graph, with nodes labeled s1 to s4, c1 to c10, and p1 to p9]

SLIDE 19

Problem Statement

  • Find trees connecting s* nodes to p* nodes such that
    – For any tree T, the distance from its source s_T to each of its ports p_T is ≤ Distance_limit
    – For any tree T, ∑ Cap(p_T), summed over the ports p_T of T, is ≤ Cap_limit
    – ∑ Wirelength(T), summed over all trees T, is minimum
  • One more constraint (ignored for now)
    – Grid loading constraint: the load on a grid wire, including that of the interconnects, is less than a specified limit

[Figure: routing graph with nodes s1 to s4, c1 to c10, and p1 to p9]
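
The distance and capacitance constraints can be checked mechanically for one candidate tree. The Python sketch below is illustrative only and does not come from the presentation: it assumes that "distance" means the path length along the tree from the source to a port, and the names `path_length`, `tree_is_feasible`, `parent`, and `length` are hypothetical.

```python
def path_length(node, parent, length):
    """Sum the edge lengths on the tree path from `node` up to its source."""
    total = 0
    while parent[node] is not None:
        total += length[(node, parent[node])]
        node = parent[node]
    return total


def tree_is_feasible(parent, length, port_cap, dist_limit, cap_limit):
    """Check one tree against the distance and capacitance limits.

    parent:   dict node -> parent node (None for the tree's source)
    length:   dict (child, parent) -> edge length (assumed representation)
    port_cap: dict port -> capacitance; nodes missing from it are not ports
    """
    ports = [n for n in parent if n in port_cap]
    within_distance = all(
        path_length(p, parent, length) <= dist_limit for p in ports
    )
    within_cap = sum(port_cap[p] for p in ports) <= cap_limit
    return within_distance and within_cap
```

The grid loading constraint is left out here, as it is in the problem statement.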

SLIDE 20

Trees to Global Clock Routes

[Figure: a routing solution on the routing graph, consisting of trees T1, T2, T3, and T4]

SLIDE 21

Trees to Global Clock Routes

[Figure: another routing solution on the routing graph, consisting of trees T1, T2, and T3]

SLIDE 22

Agenda

  • Introduction
  • Problem Formulation
  • Routing algorithm
  • Experimental Results
  • Conclusion
SLIDE 23

Algorithm for Routing with Capacitance and Distance Constraints

  • Multi-source/multi-destination routing problem with capacitance and distance constraints
    – Can be transformed to clustering with capacity constraints to minimize wirelength, which is NP-complete
  • Efficient heuristics/approximation algorithms needed to solve the problem
  • Tree Growing Heuristic:
    – Start growing trees from the source nodes
    – Grow trees by adding edges till all ports are connected
    – Edges are sorted in ascending order of the wire-cap to minimize total wire-cap due to global clock routes
    – Add edges iff:
      • Doing so does not violate capacitance/distance constraints
      • The node is not already connected to the grid by some other route (tree)
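
The tree growing steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the C++ implementation described in the talk: the function name `grow_trees` and its data layout are hypothetical, the edge weight is used both as wire-cap and as length for the distance check, and only port capacitance counts toward the capacitance limit.

```python
from collections import defaultdict


def grow_trees(edges, sources, port_cap, dist_limit, cap_limit):
    """Greedy tree growing from the source nodes (illustrative sketch).

    edges:    list of (wire_cap, u, v) tuples of an undirected routing graph
    sources:  nodes on the grid wires from which the trees start
    port_cap: dict mapping each port node to its capacitance
    Returns (parent, root): the parent and the tree source of each
    connected node.
    """
    parent = {s: None for s in sources}
    root = {s: s for s in sources}
    dist = {s: 0 for s in sources}     # path length from the tree's source
    load = defaultdict(float)          # total port capacitance per tree
    ordered = sorted(edges)            # ascending order of wire-cap
    progress = True
    while progress and not all(p in root for p in port_cap):
        progress = False
        for w, u, v in ordered:
            if (u in root) == (v in root):
                continue               # both connected, or neither is
            if v in root:
                u, v = v, u            # make u the connected endpoint
            src = root[u]
            if dist[u] + w > dist_limit:
                continue               # would violate the distance limit
            if load[src] + port_cap.get(v, 0) > cap_limit:
                continue               # would violate the capacitance limit
            parent[v], root[v], dist[v] = u, src, dist[u] + w
            load[src] += port_cap.get(v, 0)
            progress = True
            break                      # rescan from the cheapest edge
    return parent, root
```

Rescanning from the cheapest edge after every addition keeps the sketch simple; it may leave ports unconnected when the limits are too tight, in which case the returned `root` simply lacks those ports.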

SLIDE 24

Tree Growing Heuristic: Example

[Figure: the routing graph and the trees being grown on it, with nodes s1 to s4, c1 to c10, and p1 to p9; numeric edge labels shown: 13, 13, 13, 4, 8, 5, 9, 10, 15, 10, 22, 8, 17, 15, 18, 13, 13]

SLIDE 39

Tree Growing Heuristic: Example

[Figure: two panels, "Tree Growing" and "Remove unnecessary trees/edges"; nodes s2, c4, c5, and c7 do not appear in the latter]
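
The removal of unnecessary trees and edges can be pictured as repeatedly peeling off leaves that are not ports. The Python sketch below is one plausible way to do this and is an assumption, not the presented implementation; the name `prune` and the `parent` dictionary layout are hypothetical.

```python
from collections import defaultdict


def prune(parent, ports):
    """Drop branches, and whole trees, that do not lead to any port.

    parent: dict node -> parent node (None for a tree's source)
    ports:  set of port nodes that must stay connected
    """
    children = defaultdict(set)
    for node, par in parent.items():
        if par is not None:
            children[par].add(node)
    kept = dict(parent)
    # Start from the leaves that are not ports and peel them off repeatedly.
    stack = [n for n in kept if not children[n] and n not in ports]
    while stack:
        node = stack.pop()
        par = kept.pop(node)
        if par is not None:
            children[par].discard(node)
            if not children[par] and par not in ports:
                stack.append(par)      # the parent has become a useless leaf
    return kept
```

A source with no remaining children is removed as well, which corresponds to dropping an entire unused tree.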

SLIDE 40

Tree Growing Heuristic: Example

[Figure: the routing solution in the layout and the same routing solution on the graph]

SLIDE 41

Time complexity and limitations

  • Time complexity: O(|E|² log |E|), where E is the set of edges in the routing graph
    – |E| ≤ 2|P| + 4mn = O(|P| + mn), where P is the set of ports, m = number of horizontal tracks or grid-wires, n = number of vertical tracks
  • Run-time, in practice, on microprocessor layout areas: up to 10 seconds
    – Typical problem size: 1000 ports; ~500 tracks each in horizontal/vertical direction; area 1000 × 1000 µm²
  • Does not consider grid loading limits yet
    – A special case of the underlying problem leads to a bin-packing problem
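
As a rough worked example, the typical problem size quoted above can be plugged into the edge bound. Taking m = n = 500 is an assumption, since the slide gives only an approximate track count in each direction.

```latex
|E| \le 2|P| + 4mn = 2(1000) + 4(500)(500) = 2{,}000 + 1{,}000{,}000 = 1{,}002{,}000
```

Under these assumptions the routing graph has at most about one million edges, and the bound is dominated by the track crossings rather than by the ports.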

SLIDE 47

Comparison with Nearest Source Heuristic

[Figure: three panels on a small routing graph with nodes s1, s3, c1, c2, p1, p2, and p3]

  • Routing graph: edge weights 2, 14, 14, 5
  • Solution due to Nearest Source Heuristic: edges of weight 2, 14, 14; wirelength = 30
  • Solution due to Tree Growing Heuristic: edges of weight 2, 14, 5; wirelength = 21
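
The gap between the two heuristics can be reproduced on a small graph. The Python sketch below uses a hypothetical four-edge graph chosen so that the totals come out as 30 and 21; it is not the graph drawn on the slide. Both functions ignore the distance and capacitance limits, and all names are illustrative assumptions.

```python
import heapq


def nearest_source_wirelength(edges, sources, ports):
    """Route every port to its nearest source along a shortest path."""
    adj = {}
    for w, u, v in edges:
        adj.setdefault(u, []).append((w, v))
        adj.setdefault(v, []).append((w, u))
    dist = {s: 0 for s in sources}
    prev = {s: None for s in sources}
    heap = [(0, s) for s in sources]
    heapq.heapify(heap)
    while heap:                               # multi-source Dijkstra
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for w, v in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    used = set()
    for node in ports:                        # union of the chosen paths
        while prev[node] is not None:
            used.add(frozenset((node, prev[node])))
            node = prev[node]
    weight = {frozenset((u, v)): w for w, u, v in edges}
    return sum(weight[e] for e in used)


def tree_growing_wirelength(edges, sources, ports):
    """Unconstrained tree growing: repeatedly add the cheapest edge that
    joins an unconnected node to an already connected one."""
    connected, total = set(sources), 0
    ordered = sorted(edges)
    progress = True
    while progress and not set(ports) <= connected:
        progress = False
        for w, u, v in ordered:
            if (u in connected) != (v in connected):
                connected.update((u, v))
                total += w
                progress = True
                break
    return total


# Hypothetical graph, not the one drawn on the slide.
edges = [(2, "s1", "p1"), (14, "s1", "p2"), (14, "s3", "p3"), (5, "p2", "p3")]
sources, ports = ["s1", "s3"], ["p1", "p2", "p3"]
print(nearest_source_wirelength(edges, sources, ports))  # 30
print(tree_growing_wirelength(edges, sources, ports))    # 21
```

In this example the nearest source heuristic gives p2 and p3 separate weight-14 connections, while tree growing reaches p3 through p2 over the weight-5 edge.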

SLIDE 48

Agenda

  • Introduction
  • Problem Formulation
  • Routing algorithm
  • Experimental Results
  • Conclusion
SLIDE 49

Global Clock Distribution Flow

  • Uses
    – Grid-wires, M6/M7 track locations from full-chip layout
    – Location and cap. on clock ports
  • Considers distance and cap. constraints on routes from grid
    – Specified to ensure good slopes/delay/skew
  • Generates routes obeying above constraints

[Figure: flow diagram. Inputs: clock ports with cap. and location info; grid-wires and routing track locations from the FCL database; distance/cap. constraints. Step: Automated Global Clock Distribution. Output: global clock routed database.]

SLIDE 50

Implementation Overview

  • Uses proprietary tool to read/update full-chip layout
  • TCL interface to integrate the routing tool
  • Tree Growing Heuristic: a solution for the problem of routing with capacitance and distance constraints

[Figure: block diagram with components FCL Tool, TCL Interface, Tree Growing Heuristic (C++ implementation), PV archive, and Layout Archive]

SLIDE 51

Wirelength/Power Comparison

  • Tree growing algorithm (TG) improves wirelength by 17%, on average, over the nearest source (NS) heuristic
  • Practical run-times for both algorithms
  • Deviations from ideal arrival time within ±4 ps
    – Mostly, due to the ports close to and far away from grid drivers

[Chart: Power Comparison (NS vs. TG); series NS and TG; horizontal axis 1 to 11; vertical axis 5 to 25]
[Chart: Wirelength Comparison (NS vs. TG); series NS and TG; horizontal axis 1 to 11; vertical axis 5000 to 30000]

SLIDE 52

Global Clock Distribution: Layout Pictures

[Figures: layout area with grid wires and tracks; after global clock distribution]

SLIDE 53

Global Clock Distribution: Layout Pictures

[Figures: layout area with grid wires and tracks; after global clock distribution (with ports marked by blue squares)]

SLIDE 54

Global Clock Distribution: Layout Pictures

[Figures: after global clock distribution (with ports marked by blue squares); zoomed-in picture (from slide 13), showing M8 grid wires, horizontal wires, and vertical wires]

SLIDE 55

Agenda

  • Introduction
  • Problem Formulation
  • Routing algorithm
  • Experimental Results
  • Conclusion
SLIDE 56

Summary

  • Defined a problem of routing with distance/capacitance constraints, which arises in microprocessor clock distribution
  • To solve the problem, presented an algorithm that runs in seconds and improves wirelength by 17%, on average, over the nearest source heuristic
  • Employed to carry out clock distribution in a 45 nm microprocessor, leading to
    – Productivity improvement, since turn-around time is in minutes
    – Risk reduction, since potentially heavily loaded grid-wires can be identified early on
    – Power saving because of wirelength reduction

SLIDE 57

Future Directions

  • Explore improvement possibilities
    – Better heuristics
    – Approximation algorithms
  • Consider load limits on the grid wires
    – Leads to a problem similar to bin-packing

SLIDE 58

Q&A

SLIDE 59

Backup