Ultra-Fast Interconnect Driven Cell Cloning For Minimizing Critical Path Delay

  1. Ultra-Fast Interconnect Driven Cell Cloning For Minimizing Critical Path Delay. Zhuo Li (1), David A. Papa (2,1), Charles J. Alpert (1), Shiyan Hu (3), Weiping Shi (4), C. N. Sze (1), and Ying Zhou (1). Affiliations: (1) IBM Austin Research Lab; (2) Dept. EECS, University of Michigan; (3) Dept. ECE, Michigan Technological University; (4) Dept. ECE, Texas A&M University.

  2. Best Value Toys. [Figure: toolbox of optimization transforms: global placement, routability analysis / recovery, cell movement, buffering, Vt assignment, layer assignment, gate sizing; price tag $15K.] Cloning? Optimal Timing-Driven Cloning - ISPD 2010 3/18/2010

  3. Best Value Toys. [Figure: toolbox of optimization transforms: global placement, routability analysis / recovery, cell movement, buffering, Vt assignment, layer assignment, gate sizing, cloning.]

  4. Cloning in Logic Synthesis. Prior work: Hwang, ICCAD 1992; J. Lillis, ISCAS 1996; A. Srivastava, TCAD 2001; …
     • Reduce net cut and total capacitance load (NP-hard)
     • Ignore physical information (interconnect, buffering, …)
     [Figure: fan-out lists with sinks labeled S1 through S100.]

  5. Interconnect Driven Cloning. AT(D1) = AT(D2) = 0, RAT(S1) = RAT(S2) = 5. [Figure: fan-ins F1 and F2, gate P, and sinks S1 and S2, with wire delays labeled 1 and 3. Before cloning: slack 1 at S1 and slack -1 at S2. With clone P': slack 1 at S1 and slack -0.5 at S2, with one wire delay reduced to 2.5.]

  6. Interconnect Driven Cloning. AT(D1) = AT(D2) = 0, RAT(S1) = RAT(S2) = 5. [Figure: fan-ins F1 and F2, gate P, and sinks S1 and S2, with wire delays labeled 1 and 3. Before cloning: slack 1 at S1 and slack -1 at S2. With clone P' placed near F2 and S2: slack 1 at both S1 and S2.]

  7. Our Contribution
     Find the “optimal” partitioning and placement of the original and duplicated gates
     • Assuming linear-buffer-delay model
     • O(n) algorithm when original gate is movable
     • O(n log n) algorithm when original gate is fixed
     • Just focus on worst slack
     • For interconnect delay dominant sub-circuit
     Extensions
     Back-of-the-envelope filter
     • Logic based cloning: high fan-outs/capacitive load
     • Physical based cloning: special fan-out location distributions

  8. Cloning Problem. A sub-circuit with two-pin timing arcs, $D = \sigma \cdot dis(G_1, G_2)$. Clone P to P', and find the partitioning of S and the locations of P and P' that maximize sub-circuit slack. [Figure: fan-ins F, gate P, fan-outs S.]

  9. Cloning Problem. A sub-circuit with two-pin timing arcs, $D = \sigma \cdot dis(G_1, G_2)$. Clone P to P', and find the partitioning of S and the locations of P and P' that maximize sub-circuit slack. [Figure: fan-ins F, gates P and P', fan-outs S.]
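The objective on this slide can be made concrete with a small sketch. It evaluates the worst slack of one candidate solution (a partition of S plus locations for P and P') under the linear delay model. The assumptions here are not from the slides: gate delay is ignored, both P and P' are driven by every fan-in, and the coordinates are hypothetical, chosen so that the slacks come out as 1 and -1 before cloning and 1 and 1 after, the values shown on the Interconnect Driven Cloning slides.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Pin = Tuple[Point, int]  # (location, AT for a fan-in or RAT for a fan-out)


def dis(a: Point, b: Point) -> int:
    """Manhattan distance between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def group_slack(gate: Point, fanins: List[Pin], sinks: List[Pin], sigma: int) -> int:
    """Worst slack over the sinks driven by one gate (gate delay ignored)."""
    at_gate = max(at + sigma * dis(loc, gate) for loc, at in fanins)
    return min(rat - sigma * dis(gate, loc) - at_gate for loc, rat in sinks)


def cloning_slack(p: Point, p_clone: Point, fanins: List[Pin],
                  sinks_p: List[Pin], sinks_clone: List[Pin], sigma: int) -> int:
    """Sub-circuit slack when P drives sinks_p and P' drives sinks_clone."""
    return min(group_slack(p, fanins, sinks_p, sigma),
               group_slack(p_clone, fanins, sinks_clone, sigma))


# Hypothetical layout: F1 and S1 on the top row, F2 and S2 on the bottom row.
fanins = [((0, 2), 0), ((0, 0), 0)]      # AT = 0 at both fan-ins
s1, s2 = ((2, 2), 5), ((2, 0), 5)        # RAT = 5 at both sinks
before = group_slack((1, 2), fanins, [s1, s2], 1)               # P alone
after = cloning_slack((1, 2), (1, 0), fanins, [s1], [s2], 1)    # P and P'
print(before, after)
```

With P held at (1, 2), the far sink S2 limits the slack; giving S2 its own clone near the bottom row removes that limit.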

  10. Cloning Problem. Reduces to a gate placement problem when the partitioning is given (RUMBLE, ISPD08, and Pyramids, ICCAD08). Perform real buffering after cloning. [Figure: fan-ins, gate O, fan-outs.]

  11. Arrival Time Arc. Each fan-in gate has an arrival time $AT(F_i)$. For each physical point $v$, $AT(v) = \max_i \left( AT(F_i) + \tau \cdot dis(F_i, v) \right)$. The set of points minimizing $AT(v)$ is the arrival time arc K(F). K(F) is either a Manhattan arc or a single point, similar to Deferred Merge Embedding (DME). K(F) is also the bottom of the trough AT(v) (the overlap of a set of reverse pyramids). [Figure: examples of K(F) for fan-ins labeled AT = 1, AT = 3, and AT = 5.]
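As a rough illustration of the definition above, the sketch below evaluates AT(v) and finds the minimizing points by brute force over an integer grid. This is only a stand-in for intuition: the slides construct K(F) geometrically, and the grid, coordinates, and function names here are assumptions.

```python
from typing import List, Tuple

Point = Tuple[int, int]
FanIn = Tuple[Point, int]  # (location, arrival time AT(F_i))


def dis(a: Point, b: Point) -> int:
    """Manhattan distance between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def arrival_time(v: Point, fanins: List[FanIn], tau: int) -> int:
    """AT(v) = max over fan-ins of AT(F_i) + tau * dis(F_i, v)."""
    return max(at + tau * dis(loc, v) for loc, at in fanins)


def arrival_time_arc(fanins: List[FanIn], tau: int, grid: List[Point]):
    """Grid points that minimize AT(v), found by exhaustive search."""
    best = min(arrival_time(v, fanins, tau) for v in grid)
    return best, [v for v in grid if arrival_time(v, fanins, tau) == best]


# Hypothetical example: two fan-ins with equal arrival times.
grid = [(x, y) for x in range(-1, 4) for y in range(-1, 4)]
fanins = [((0, 0), 0), ((2, 2), 0)]
best_at, arc = arrival_time_arc(fanins, 1, grid)
print(best_at, arc)
```

In this example the minimizing points lie on a diagonal segment between the two fan-ins, which is the Manhattan-arc shape the slide describes.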

  12. Best Region and Best Arrival Time Arc. K(S): required arrival time arc (maximizing RAT(v)). Best region Z: every point inside this region has maximum sub-circuit slack (constructed with K(F) and K(S)). Best arrival time arc B is the intersection of the best region and the arrival time arc. Define K(F_i) as the arrival time arc for F_1, …, F_i. It takes O(n) time to compute K(F), K(S), Z and B, and also O(n) time to compute all K(F_i) and K(S_i), instead of O(n^2) time. [Figure: examples showing K(F), K(S), Z, and B.]
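The relationship between Z, B, and K(F) can be checked on a small case with a brute-force sketch. It assumes RAT(v) is the minimum over sinks of RAT(S_j) minus the wire delay from v, by symmetry with AT(v), and that slack at v is RAT(v) - AT(v). The exhaustive grid search and all coordinates are assumptions; it does not reproduce the O(n) construction claimed on the slide.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Pin = Tuple[Point, int]  # (location, AT for a fan-in or RAT for a fan-out)


def dis(a: Point, b: Point) -> int:
    """Manhattan distance between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def arrival_time(v: Point, fanins: List[Pin], tau: int) -> int:
    return max(at + tau * dis(loc, v) for loc, at in fanins)


def required_time(v: Point, sinks: List[Pin], tau: int) -> int:
    # Assumed definition: latest time a signal may leave v and still meet every RAT.
    return min(rat - tau * dis(v, loc) for loc, rat in sinks)


def best_region(fanins: List[Pin], sinks: List[Pin], tau: int, grid: List[Point]):
    """Return (best slack, Z, B) restricted to the given grid points."""
    slack = {v: required_time(v, sinks, tau) - arrival_time(v, fanins, tau) for v in grid}
    best = max(slack.values())
    z = [v for v in grid if slack[v] == best]
    at_min = min(arrival_time(v, fanins, tau) for v in grid)
    b = [v for v in z if arrival_time(v, fanins, tau) == at_min]  # Z intersected with K(F)
    return best, z, b


# Hypothetical example: fan-ins on the left, sinks on the right.
grid = [(x, y) for x in range(-1, 6) for y in range(-1, 4)]
fanins = [((0, 0), 0), ((0, 2), 0)]
sinks = [((4, 0), 10), ((4, 2), 10)]
best, z, b = best_region(fanins, sinks, 1, grid)
print(best, z, b)
```

Here Z is the horizontal segment joining K(F) at (0, 1) to K(S) at (4, 1), and B is its K(F) end.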

  13. Case 1: P is movable. No matter what the partitioning is, one can place P and P' on the best arrival time arc while still achieving the best slack. Divide the whole plane into 6 regions based on slack curves. [Figure: regions H1 to H6 around K(F), each with its slack curve plotted between positions i and j.]

  14. Case 1: P is movable (Cont.). If there are no gates in H6: O(n) time algorithm. [Figure: regions H1 to H6 around K(F), each with its slack curve plotted between positions i and j.]

  15. Case 1: P is movable (Cont.). If there are gates in H6, treat all slack curves as 3-segment trapezoid-like curves. O(n) time algorithm. [Figure: slack curves between positions i and j.]

  16. Case 1: P is movable (Cont.). If there are gates in H6, treat all slack curves as 3-segment trapezoid-like curves. O(n) time algorithm. [Figure: slack curves between positions i and j, with P and P' marked.]

  17. Case 2: P is fixed. P may not be on K(F). $Slack_P(i) = RAT(i) - \tau \cdot dis(P, i) - AT(P)$. There are at most O(n) partitions, since there are only n possible worst slack values for any partitioning. Sort S_i accordingly, then let P drive the set of fan-outs {S_1}, {S_1, S_2}, {S_1, S_2, S_3}, … O(n log n) time algorithm. [Figure: Slack_P plotted against dis(P, i).]
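The prefix sweep on this slide can be sketched as follows. It reads "sort accordingly" as sorting by Slack_P from best to worst, so that P keeps the sinks it serves well and P' takes the rest; that reading is an assumption. The slack of the clone's group is found here by brute force over a grid (the hypothetical `best_clone_slack` helper), which stands in for the arc-based placement from the earlier slides and does not achieve the O(n log n) bound.

```python
import math
from typing import List, Tuple

Point = Tuple[int, int]
Pin = Tuple[Point, int]  # (location, AT for a fan-in or RAT for a fan-out)


def dis(a: Point, b: Point) -> int:
    """Manhattan distance between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def arrival_time(v: Point, fanins: List[Pin], tau: int) -> int:
    return max(at + tau * dis(loc, v) for loc, at in fanins)


def best_clone_slack(fanins: List[Pin], sinks: List[Pin], tau: int, grid: List[Point]):
    """Best worst-slack for a freely placed clone (brute-force stand-in)."""
    if not sinks:
        return math.inf  # the clone drives nothing, so it constrains nothing
    return max(min(rat - tau * dis(v, loc) for loc, rat in sinks)
               - arrival_time(v, fanins, tau) for v in grid)


def fixed_p_cloning(p: Point, fanins: List[Pin], sinks: List[Pin], tau: int, grid: List[Point]):
    """Try every prefix of the sorted sinks as P's group; return the best split."""
    at_p = arrival_time(p, fanins, tau)

    def slack_p(sink: Pin) -> int:
        loc, rat = sink
        return rat - tau * dis(p, loc) - at_p

    order = sorted(sinks, key=slack_p, reverse=True)  # best Slack_P first
    best_slack, best_k = -math.inf, 0
    for k in range(1, len(order) + 1):
        # The worst sink in P's prefix is the last one in sorted order.
        worst = min(slack_p(order[k - 1]), best_clone_slack(fanins, order[k:], tau, grid))
        if worst > best_slack:
            best_slack, best_k = worst, k
    return best_slack, order[:best_k], order[best_k:]


# Hypothetical layout: P fixed on the top row next to F1 and S1.
grid = [(x, y) for x in range(0, 3) for y in range(0, 3)]
fanins = [((0, 2), 0), ((0, 0), 0)]
s1, s2 = ((2, 2), 5), ((2, 0), 5)
result = fixed_p_cloning((1, 2), fanins, [s1, s2], 1, grid)
print(result)
```

The last prefix (all sinks on P) corresponds to not cloning at all, so the sweep never returns a result worse than the original circuit.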

  18. One Example. [Figure: two panels, "Original circuit" and "After buffering", each showing fan-ins F1 and F2, gate P, and sinks S1 and S2; labeled slack values: -1.2 and -0.6.]

  19. One Example. [Figure: two panels, "RUMBLE" and "P is fixed", each showing fan-ins F1 and F2, gate P, and sinks S1 and S2, with a buffer and the clone P' marked; labeled slack values: -0.8, 0, -0.5, and -0.8.]

  20. One Example. [Figure: two panels, "P is movable" and "Bad wirelength solution", each showing fan-ins F1 and F2, gates P and P', and sinks S1 and S2; labeled slack values: 0, 0, 0, and -0.5.]

  21. Experimental Results
     100 random 65 nm sub-circuits
     • P is fixed: 279 ps better than pure buffering, 87 ps better than RUMBLE on average
     • P is movable: 309 ps better than pure buffering, 117 ps better than RUMBLE on average

     | 65 nm macros | # objs | Single transform: Slack Imprv. | Single transform: FOM Imprv. | Compare to a flow with pure buffering: Slack Imprv. | Compare to a flow with pure buffering: FOM Imprv. | Area Increase |
     |---|---|---|---|---|---|---|
     | Macro 1 | 91k | 0.480 ns | 438 | 0.097 ns | -8 | 0.5% |
     | Macro 2 | 231k | 0.098 ns | 0 | 0.081 ns | 200 | 0.8% |
     | Macro 3 | 191k | 0.383 ns | 2837 | 0.124 ns | 280 | 1% |

  22. Extensions
     Duplicate more than two gates
     • O(n^2) algorithm
     Be smart about Z regions
     • Latches
     • Blockages
     • Wire-length
     • FOM extension
     [Figure: K(S), Z, B, and K(F) with a blockage.]

  23. Voronoi Diagram Partitioning. Best slack for every sink with blockages.
     If we know the locations of P and P'
     • The optimal partitioning is the Voronoi diagram between two points or a point and a diamond in Manhattan space
     • Only O(n^3) possible partitionings
     • Try all partitionings and find the best one
     [Figure: gates P and P' with sinks A and B and distances labeled 1 and 5.]
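When the locations of P and P' are known, the partitioning step can be sketched as a per-sink comparison: each sink goes to whichever gate gives it the better slack. With equal arrival times at P and P', that reduces to the nearest gate in Manhattan distance, which is one reading of the Voronoi rule above. The sketch assumes no blockages and breaks ties toward P; both choices, along with the coordinates, are assumptions.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Pin = Tuple[Point, int]  # (location, AT for a fan-in or RAT for a fan-out)


def dis(a: Point, b: Point) -> int:
    """Manhattan distance between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def arrival_time(v: Point, fanins: List[Pin], tau: int) -> int:
    return max(at + tau * dis(loc, v) for loc, at in fanins)


def voronoi_partition(p: Point, p_clone: Point, fanins: List[Pin],
                      sinks: List[Pin], tau: int):
    """Split sinks between P and P' by the better per-sink slack."""
    at_p = arrival_time(p, fanins, tau)
    at_c = arrival_time(p_clone, fanins, tau)
    sinks_p, sinks_c = [], []
    for loc, rat in sinks:
        slack_from_p = rat - tau * dis(p, loc) - at_p
        slack_from_c = rat - tau * dis(p_clone, loc) - at_c
        # Ties go to the original gate P (an arbitrary choice).
        (sinks_p if slack_from_p >= slack_from_c else sinks_c).append((loc, rat))
    return sinks_p, sinks_c


# Hypothetical layout: P on the top row, P' on the bottom row.
fanins = [((0, 2), 0), ((0, 0), 0)]
s1, s2, s3 = ((2, 2), 5), ((2, 0), 5), ((2, 1), 5)
sinks_p, sinks_c = voronoi_partition((1, 2), (1, 0), fanins, [s1, s2, s3], 1)
print(sinks_p, sinks_c)
```

The middle sink s3 is equidistant from both gates, so it lands on the bisector and the tie-break decides its owner.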
