Ultra-Fast Interconnect Driven Cell Cloning For Minimizing Critical - - PowerPoint PPT Presentation



SLIDE 1

Ultra-Fast Interconnect Driven Cell Cloning For Minimizing Critical Path Delay

Zhuo Li¹, David A. Papa²,¹, Charles J. Alpert¹, Shiyan Hu³, Weiping Shi⁴, C. N. Sze¹ and Ying Zhou¹

  • ¹ IBM Austin Research Lab
  • ² Dept. EECS, University of Michigan
  • ³ Dept. ECE, Michigan Technological University
  • ⁴ Dept. ECE, Texas A&M University
SLIDE 2

3/18/2010 Optimal Timing-Driven Cloning - ISPD 2010 2

Best Value Toys

  • Global placement
  • Buffering
  • Gate Sizing
  • Cell movement
  • Vt assignment
  • Layer assignment
  • Routability analysis / recovery
  • Cloning?

$15K

SLIDE 3

Best Value Toys

  • Global placement
  • Buffering
  • Gate Sizing
  • Cell movement
  • Vt assignment
  • Layer assignment
  • Routability analysis / recovery
  • Cloning

SLIDE 4

Cloning in Logic Synthesis

[Figure: fan-out labels S2, S5, S3, S4, S1, S6, S7, S8, S9, S100, … and S21, S60, S35, S47, S1, S17, S80, S25, S71, S2]

  • Hwang, ICCAD 1992
  • J. Lillis, ISCAS 1996
  • A. Srivastava, TCAD 2001

  • Reduce net cut and total capacitance load (NP-hard)
  • Ignore physical information (interconnect, buffering, …)

SLIDE 5

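Interconnect Driven Cloning

AT(D1) = AT(D2) = 0, RAT(S1) = RAT(S2) = 5

[Figure: two panels over fan-ins F1, F2 and fan-outs S1, S2. With gate P alone (wire labels 1, 1, 3, 3): Slack: 1 and Slack: -1. With P and clone P’ (wire labels 3, 1, 1, 2.5): Slack: 1 and Slack: -0.5.]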

SLIDE 6

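Interconnect Driven Cloning

AT(D1) = AT(D2) = 0, RAT(S1) = RAT(S2) = 5

[Figure: two panels over fan-ins F1, F2 and fan-outs S1, S2. With gate P alone (wire labels 1, 1, 3, 3): Slack: 1 and Slack: -1. With P and clone P’ (wire labels 3, 3): Slack: 1 and Slack: 1.]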

SLIDE 7

Our Contribution

Find the “optimal” partitioning and placement of the original and duplicated gates

  • Assuming linear-buffer-delay model
  • O(n) algorithm when original gate is fixed
  • O(n log n) algorithm when original gate is movable
  • Just focus on worst slack
  • For interconnect delay dominant sub-circuit
  • Extensions

Back-of-envelope filter

  • Logic based cloning: high fan-outs/capacitive load
  • Physical based cloning: special fan-out location distributions

SLIDE 8

Cloning Problem

  • A sub-circuit
  • Two-pin timing arcs: D = σ · dis(G1, G2)
  • Clone P to P’, find the partitioning of S and locations of P and P’, to maximize sub-circuit slack

[Figure: gate P with F (Fan-ins) and S (Fan-outs)]

SLIDE 9

Cloning Problem

  • A sub-circuit
  • Two-pin timing arcs: D = σ · dis(G1, G2)
  • Clone P to P’, find the partitioning of S and locations of P and P’, to maximize sub-circuit slack

[Figure: gates P and P’ with F (Fan-ins) and S (Fan-outs)]
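The objective stated above can be written down directly. The following is a minimal sketch, assuming Manhattan distance for dis, zero gate delay, and made-up coordinates and timing values; the names gate_slack and subcircuit_slack are hypothetical and not taken from the slides.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Pin = Tuple[Point, float]  # (location, AT for a fan-in or RAT for a fan-out)


def dis(a: Point, b: Point) -> float:
    """Manhattan distance between two placed gates (assumed metric)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def gate_slack(loc: Point, fanins: List[Pin], sinks: List[Pin],
               sigma: float = 1.0) -> float:
    """Worst slack over the sinks driven by one gate placed at loc.

    Every timing arc costs D = sigma * dis(G1, G2); gate delay is ignored.
    """
    at_gate = max(at + sigma * dis(f, loc) for f, at in fanins)
    return min(rat - sigma * dis(loc, s) - at_gate for s, rat in sinks)


def subcircuit_slack(p: Point, p_clone: Point, fanins: List[Pin],
                     sinks_p: List[Pin], sinks_clone: List[Pin],
                     sigma: float = 1.0) -> float:
    """Worst slack when P drives sinks_p and the clone P' drives sinks_clone."""
    return min(gate_slack(p, fanins, sinks_p, sigma),
               gate_slack(p_clone, fanins, sinks_clone, sigma))


# Illustrative data only: two fan-ins with AT = 0, two fan-outs with RAT = 10.
fanins = [((0, 0), 0), ((4, 4), 0)]
s1, s2 = ((0, 6), 10), ((6, 0), 10)

single = gate_slack((2, 2), fanins, [s1, s2])                   # one gate drives both
cloned = subcircuit_slack((0, 4), (4, 0), fanins, [s1], [s2])   # P -> S1, P' -> S2
print(single, cloned)
```

In this toy instance a single gate at (2, 2) has to compromise between the two fan-outs, while the cloned pair lets each gate sit close to its own fan-out.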

SLIDE 10

Cloning Problem

  • Reduce to a gate placement problem when the partitioning is given (RUMBLE ISPD08 and Pyramids ICCAD08)
  • Perform real buffering after cloning

[Figure: labels O, Fanins, Fanouts]

SLIDE 11

Arrival Time Arc

  • Each fan-in gate has an arrival time AT(Fi)
  • For each physical point v, AT(v) = max_i (AT(Fi) + τ · Dis(Fi, v))
  • The set of points minimizing AT(v) is the arrival time arc K(F)
  • K(F) is either a Manhattan arc or a single point
  • Similar to Deferred Merge Embedding (DME)
  • K(F) is also the bottom of a trough AT(v) (overlapping of a set of reverse pyramids)

[Figure: examples of K(F) for fan-ins with AT = 1, 3, 5 and with AT = 1, 5]
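To make the definition of K(F) concrete, here is a brute-force sketch that evaluates AT(v) on an integer grid and keeps the minimizers. It assumes Manhattan distance and illustrative fan-in data; it is not the linear-time construction the slides refer to, and the function names are hypothetical.

```python
def dis(a, b):
    """Manhattan distance (assumed metric)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def arrival_time(v, fanins, tau=1.0):
    """AT(v) = max over fan-ins of AT(Fi) + tau * Dis(Fi, v)."""
    return max(at + tau * dis(f, v) for f, at in fanins)


def arrival_time_arc(fanins, grid, tau=1.0):
    """Return (minimum AT, sorted grid points attaining it)."""
    ats = {v: arrival_time(v, fanins, tau) for v in grid}
    best = min(ats.values())
    return best, sorted(v for v, a in ats.items() if a == best)


grid = [(x, y) for x in range(-2, 7) for y in range(-2, 7)]

# Two fan-ins on a diagonal: the minimizers form a Manhattan arc.
best_at, arc = arrival_time_arc([((0, 0), 0), ((4, 4), 0)], grid)
print(best_at, arc)

# Two fan-ins on a horizontal line: the minimizer is a single point.
point_at, point = arrival_time_arc([((0, 0), 0), ((4, 0), 0)], grid)
print(point_at, point)
```

The two calls illustrate both shapes named on the slide: a slope -1 segment in the first case and a single point in the second.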

SLIDE 12

Best Region and Best Arrival Time Arc

  • K(S): required arrival time arc (maximizing RAT(v))
  • Best region Z: every point inside this region has maximum sub-circuit slack (constructed with K(F) and K(S))
  • Best arrival time arc B: the intersection of the best region and the arrival time arc
  • Define K(Fi) as the arrival time arc for F1, …, Fi
  • O(n) time to compute K(F), K(S), Z and B
  • Also O(n) time to compute all K(Fi) and K(Si), instead of O(n²) time

[Figure: examples showing K(F), K(S), the best region Z and the best arrival time arc B]
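The definitions of Z and B can be checked on a small grid. This is a brute-force sketch under assumed Manhattan distance and made-up data, with RAT(v) taken as the minimum over sinks of RAT(Sj) − τ · Dis(v, Sj); it does not reproduce the O(n) construction, and best_region is a hypothetical name.

```python
def dis(a, b):
    """Manhattan distance (assumed metric)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def best_region(fanins, sinks, grid, tau=1.0):
    """Return (best slack, Z, B) by exhaustive search over grid points.

    Z: points with maximum slack RAT(v) - AT(v).
    B: points of Z that also lie on the arrival time arc (minimum AT).
    """
    ats = {v: max(at + tau * dis(f, v) for f, at in fanins) for v in grid}
    rats = {v: min(rat - tau * dis(v, s) for s, rat in sinks) for v in grid}
    slacks = {v: rats[v] - ats[v] for v in grid}
    best_slack = max(slacks.values())
    min_at = min(ats.values())
    z = sorted(v for v in grid if slacks[v] == best_slack)
    b = [v for v in z if ats[v] == min_at]
    return best_slack, z, b


# Illustrative data: two fan-ins with AT = 0 and one fan-out with RAT = 10.
fanins = [((0, 0), 0), ((4, 4), 0)]
sinks = [((0, 6), 10)]
grid = [(x, y) for x in range(-2, 9) for y in range(-2, 9)]

slack, z, b = best_region(fanins, sinks, grid)
print(slack, z, b)
```

Here Z is a short vertical segment and B is the one point of that segment that also lies on K(F).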

SLIDE 13

Case 1: P is movable

No matter what the partitioning is, one can place P and P’ on the best arrival time arc while still achieving the best slack

Divide the whole plane into 6 regions based on slack curves

[Figure: the plane around K(F) divided into regions H1 to H6, with one panel per region plotting slack between i and j]

SLIDE 14

Case 1: P is movable (Cont.)

If there are no gates in H6: O(n) time algorithm

[Figure: the plane around K(F) divided into regions H1 to H6, with one panel per region plotting slack between i and j]

SLIDE 15

Case 1: P is movable (Cont.)

  • If there are gates in H6, treat all slack curves as 3-segment trapezoid-like curves
  • O(n) time algorithm

[Figure: slack curves between i and j]

SLIDE 16

Case 1: P is movable (Cont.)

  • If there are gates in H6, treat all slack curves as 3-segment trapezoid-like curves
  • O(n) time algorithm

[Figure: slack curves between i and j, with P and P’ placed on the arc]
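As an illustration of placing P and P’ on the arc, the sketch below samples each fan-out's slack curve at the integer points of an arc and tries every pair of positions. It is an exhaustive stand-in, not the O(n) algorithm from the slides, and it assumes Manhattan distance, a constant arrival time along the arc, and made-up data; best_clone_pair is a hypothetical name.

```python
from itertools import combinations_with_replacement


def dis(a, b):
    """Manhattan distance (assumed metric)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def best_clone_pair(arc, at_on_arc, sinks, tau=1.0):
    """Try every pair of arc positions for P and P'.

    Each sink is driven by whichever gate gives it more slack; the pair is
    scored by the worst sink slack. Returns (slack, P location, P' location,
    owner per sink).
    """
    # curves[k][i] is the slack of sink k when its driver sits at arc[i].
    curves = [[rat - tau * dis(v, s) - at_on_arc for v in arc] for s, rat in sinks]
    best = None
    for i, j in combinations_with_replacement(range(len(arc)), 2):
        worst = min(max(c[i], c[j]) for c in curves)
        if best is None or worst > best[0]:
            owners = ["P" if c[i] >= c[j] else "P'" for c in curves]
            best = (worst, arc[i], arc[j], owners)
    return best


# Illustrative data: arc of two fan-ins at (0, 0) and (4, 4) with AT = 0.
arc = [(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]
sinks = [((0, 6), 10), ((6, 0), 10)]

slack, p_loc, clone_loc, owners = best_clone_pair(arc, 4, sinks)
print(slack, p_loc, clone_loc, owners)
```

The search costs O(m² · n) for m arc points and n sinks, which is why it only serves to illustrate the objective.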

SLIDE 17

Case 2: P is fixed

  • P may not be on K(F)
  • SlackP(i) = RAT(i) – τ · Dis(P, i) – AT(P)
  • At most O(n) partitions, since there are only n possible worst slack values for any partitioning
  • Sort Si accordingly
  • Let P drive the set of fan-outs {S1}, {S1, S2}, {S1, S2, S3}, …
  • O(n log n) time algorithm

[Figure: SlackP plotted against Dis(P, i)]
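The prefix enumeration can be sketched as follows. The sort by SlackP follows the slide; everything else is an assumption made for illustration: fan-outs are sorted in decreasing SlackP so that P keeps the easiest ones, the clone location is found by exhaustive search over a candidate grid rather than by the placement step the slides rely on, distances are Manhattan, and the data are made up.

```python
import math


def dis(a, b):
    """Manhattan distance (assumed metric)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def arrival_time(v, fanins, tau=1.0):
    return max(at + tau * dis(f, v) for f, at in fanins)


def best_partition_fixed_p(p, fanins, sinks, clone_candidates, tau=1.0):
    """Enumerate prefixes of the SlackP-sorted fan-outs for a fixed P.

    Returns (worst slack, sinks kept by P, sinks moved to P', P' location).
    """
    at_p = arrival_time(p, fanins, tau)

    def slack_p(pin):
        # SlackP(i) = RAT(i) - tau * Dis(P, i) - AT(P)
        loc, rat = pin
        return rat - tau * dis(p, loc) - at_p

    order = sorted(sinks, key=slack_p, reverse=True)
    best = None
    for k in range(1, len(order) + 1):
        kept, moved = order[:k], order[k:]
        worst_p = slack_p(kept[-1])  # smallest SlackP in the prefix
        if moved:
            def clone_slack(v):
                rat_v = min(rat - tau * dis(v, s) for s, rat in moved)
                return rat_v - arrival_time(v, fanins, tau)
            clone_loc = max(clone_candidates, key=clone_slack)
            worst_clone = clone_slack(clone_loc)
        else:
            clone_loc, worst_clone = None, math.inf
        worst = min(worst_p, worst_clone)
        if best is None or worst > best[0]:
            best = (worst, kept, moved, clone_loc)
    return best


# Illustrative data: P is fixed at (0, 4).
fanins = [((0, 0), 0), ((4, 4), 0)]
s1, s2 = ((0, 6), 10), ((6, 0), 10)
grid = [(x, y) for x in range(-2, 9) for y in range(-2, 9)]

slack, kept, moved, clone_loc = best_partition_fixed_p((0, 4), fanins, [s1, s2], grid)
print(slack, kept, moved, clone_loc)
```

Only n prefixes are examined, which reflects the O(n) bound on candidate partitions; the grid search inside the loop is the part a real implementation would replace.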

SLIDE 18

One Example

[Figure: two layouts of gates P, F1, F2, S1, S2, captioned “Original circuit” and “After buffering”; extracted labels: • 1.2, • 0.6]
SLIDE 19

One Example

[Figure: two layouts captioned “RUMBLE” (gates P, F1, F2, S1, S2) and “P is fixed” (gates P, P’, F1, F2, S1, S2 and a buffer); extracted labels: • 0.5, • 0.8, • 0.8]
SLIDE 20

One Example

[Figure: two layouts of gates F1, F2, P, P’, S1, S2, captioned “P is movable” and “Bad wirelength solution”; extracted label: • 0.5]

SLIDE 21

Experimental Results

100 random 65 nm sub-circuits

  • P is fixed: 279 ps better than pure buffering, 87 ps better than RUMBLE on average
  • P is movable: 309 ps better than pure buffering, 117 ps better than RUMBLE on average

65 nm macros | # objs | Single transform            | Compare to a flow with pure buffering | Area Increase
             |        | Slack Imprv. | FOM Imprv.   | Slack Imprv. | FOM Imprv.             |
Macro 1      | 91k    | 0.480 ns     | 438          | 0.097 ns     | • 8                    | 0.5%
Macro 2      | 231k   | 0.098 ns     |              | 0.081 ns     | 200                    | 0.8%
Macro 3      | 191k   | 0.383 ns     | 2837         | 0.124 ns     | 280                    | 1%

SLIDE 22

Extensions

Duplicate more than two gates

  • O(n²) algorithm

Be smart about Z regions

  • Latches
  • Blockages
  • Wire-length
  • FOM extension

[Figure: K(F), K(S), Z and B with a blockage]

SLIDE 23

Voronoi Diagram Partitioning

  • Best slack for every sink with blockages
  • If we know the locations of P and P’:
      • The optimal partitioning is the Voronoi diagram between two points, or a point and a diamond, in Manhattan space
      • Only O(n³) possible partitionings
      • Try all partitionings and find the best one

[Figure: two panels with P, P’ and points A, B; labels 5, 5 and 1, 5, 5]

SLIDE 24