
INTEGRA: Fast Multi-Bit Flip-Flop Clustering for Clock Power Saving Based on Interval Graphs



  1. INTEGRA: Fast Multi-Bit Flip-Flop Clustering for Clock Power Saving Based on Interval Graphs
     - Iris Hui-Ru Jiang, Chih-Long Chang, Yu-Ming Yang (NCTU)
     - Evan Yu-Wen Tsai, Lancer Sheng-Fong Chen
     - IRIS Lab, Nat'l Chiao Tung Univ. / Faraday Tech Corp.

  2. Outline
     - Introduction
     - Problem & properties
     - Algorithm - INTEGRA
     - Experimental results
     - Conclusion
     - INTEGRA - ISPD'11

  3. Clock Power Dominates!
     - Power has become one bottleneck for circuit implementation
     - Clock power is the major dynamic power source
       - The clock signal toggles in each cycle
       - High switching activity
     - Clock power model (dynamic power, $P = C V^2 f$): $P_{clk} = C_{clk} V_{dd}^2 f_{clk}$
       - $C_{clk}$: switching capacitance charged/discharged by the clock
     - Figure: flip-flops and combinational circuits driven by a clock tree from the clock root; power breakdown of an ASIC, with clock tree power at 27%
     - Source: Chen et al. Using multi-bit flip-flop for clock power saving by DesignCompiler. SNUG, 2010.
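
The clock power model on this slide can be evaluated directly. The sketch below is only an illustration: the function name and the operating point (capacitance, supply voltage, frequency) are assumptions chosen for the example, not values from the presentation. It shows that, at fixed voltage and frequency, clock power scales linearly with the switching capacitance that MBFFs aim to reduce.

```python
def clock_dynamic_power(c_clk: float, v_dd: float, f_clk: float) -> float:
    """Dynamic clock power P_clk = C_clk * V_dd^2 * f_clk.

    Units: farads, volts, hertz -> watts.
    """
    return c_clk * v_dd ** 2 * f_clk


# Hypothetical operating point, not taken from the slides.
baseline = clock_dynamic_power(c_clk=100e-12, v_dd=1.0, f_clk=1e9)
# Same voltage and frequency, 20% less switching capacitance.
reduced = clock_dynamic_power(c_clk=80e-12, v_dd=1.0, f_clk=1e9)

print(f"baseline: {baseline:.3f} W, reduced: {reduced:.3f} W")
print(f"relative saving: {1 - reduced / baseline:.0%}")
```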

  4. Multi-Bit Flip-Flops
     - A multi-bit flip-flop (MBFF) clusters several single-bit flip-flops (sharing the drive strength)
     - Figure: a single-bit flip-flop (master latch, slave latch, D, Q, clk) beside a dual-bit flip-flop (two master/slave latch pairs with D1/Q1 and D2/Q2, sharing one clk with high drive strength)
     - Saves flip-flop power and area:

       Bit number                  1      2      4
       Normalized power per bit    1.000  0.860  0.780
       Normalized area per bit     1.000  0.960  0.713

  5. Clock Power Saving using MBFFs (1/2)
     - Reduce switching capacitance charged/discharged by clock
     - Clock sinks (flip-flops)
       - Clock power saving: small FF capacitance (share C into FF clock pins)
       - Other benefits: small area (share the inverter chain)
     - Clock network (wires, clock buffers)
       - Clock power saving: small wire/buffer capacitance (reduced #leaf, depth, #buffer)
       - Other benefits: regular topology and easy skew control
     - Figure labels: clock root; 8 C_FF; 3 C_FF
     - Source: Pokala et al. Physical synthesis for performance optimization. ASIC, 1992.

  6. Clock Power Saving using MBFFs (2/2)
     - Clock power reduction can be significant
       - FF clock pins, clock buffers/inverters, wires in clock network
     - Wire power overhead on data pins is small
       - Wirelength on data pins << total wirelength
     - Figure: flip-flops and combinational circuits driven by the clock network from the clock root; clock tree power 27%

  7. Prior Works on MBFF Clustering
     - Logic synthesis
       - [Chen et al., SNUG-10]
     - Early physical synthesis
       - [Hou et al., ISQED-09]
     - Post-placement: timing and routing
       - [Yan and Chen, ICGCS-10]
         - Minimum clique partitioning
         - Greedy clustering
         - Contiguous and infinite MBFF library
       - [Chang et al., ICCAD-10]
         - Window-based clustering
         - Maximum independent set
         - Discrete and finite MBFF library
     - Figure (design flow): logic synthesis w/ MBFF clustering, placement, timing analysis, post-placement MBFF clustering, legalization, clock tree synthesis, routing

  8. INTEGRA
     - Since post-placement MBFF clustering is NP-hard, our goal is to solve it effectively and efficiently instead of optimally.
       - Do not enumerate all possible combinations (maximal cliques)
       - Do not depend on the number of layout grids/bins
       - Do not operate on a general graph
     - Features:
       - Efficient representation: a pair of linear-size sequences
       - Fast operations: coordinate transformation
       - Few decision points: #decision points << #flip-flops
         - We cluster flip-flops only at decision points, which leads to an efficient clustering scheme.
       - Global relationships among flip-flops: cross bin boundaries

  9. Outline
     - Introduction
     - Problem & properties
     - Algorithm - INTEGRA
     - Experimental results
     - Conclusion

  10. The Multi-Bit Flip-Flop Clustering Problem
      - Clock power saving using multi-bit flip-flops
      - Given
        - MBFF library
        - Netlist & placement
        - Timing slack constraints (in terms of wirelength)
        - Placement density constraint
      - Find
        - MBFF clustering
      - Minimize
        - Clock dynamic power
        - Wirelength
      - Subject to
        - Timing slack constraints (in terms of wirelength)
        - Placement density constraints
      - Figure: flip-flops on the clock network, annotated with power and wirelength (WL)

  11. MBFF Library
      - MBFF library entries <bit number, power, area>, in lexicographical order: <1,100,100>, <2,172,192>, <4,312,285>

        Bit number  Power  Area  Normalized power per bit  Normalized area per bit
        1           100    100   1.00                      1.00
        2           172    192   0.86                      0.96
        4           312    285   0.78                      0.71
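
The normalized columns of this library table follow from the raw power and area values. A minimal sketch, assuming each library entry is a `(bits, power, area)` tuple and that the first entry is the single-bit reference cell; the function name is hypothetical:

```python
# Library entries from the slide: (bit number, power, area).
LIBRARY = [(1, 100, 100), (2, 172, 192), (4, 312, 285)]


def normalized_per_bit(library):
    """Return (bits, power per bit, area per bit), normalized to the first entry.

    Assumption: library[0] is the reference (single-bit) cell.
    """
    ref_bits, ref_power, ref_area = library[0]
    ref_power_per_bit = ref_power / ref_bits
    ref_area_per_bit = ref_area / ref_bits
    rows = []
    for bits, power, area in library:
        rows.append((
            bits,
            (power / bits) / ref_power_per_bit,
            (area / bits) / ref_area_per_bit,
        ))
    return rows


for bits, power_pb, area_pb in normalized_per_bit(LIBRARY):
    print(f"{bits}-bit: power/bit = {power_pb:.2f}, area/bit = {area_pb:.2f}")
```

With two decimals this matches the table; the 4-bit area per bit is 0.7125 before rounding, which is the 0.713 reported on the earlier Multi-Bit Flip-Flops slide.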

  12. Placement
      - Chip area = $W_c \times H_c$ bins = $W \times H$ grids, where $W = W_c W_b$ and $H = H_c H_b$
      - Flip-flops should be placed on grid points (left-bottom corner)
      - Placement density constraint for bin $b$: $A_{fb} \le T_b (W_b H_b A_g - A_{pb}) - A_{cb}$
        - $A_{fb}$: FF area
        - $A_{cb}$: combinational logic area
        - $A_{pb}$: macro area, $A_{pb} = A_{px} A_{py}$
        - $A_g$: grid area
        - $T_b$: target density
      - Figure: chip divided into bins and grids, with a grid point and a macro marked
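
The density constraint above is a simple inequality per bin. The following sketch checks it for one bin; the function names and all numeric values are assumptions made for illustration, and only the formula itself comes from the slide.

```python
def max_ff_area(t_b, w_b, h_b, a_g, a_pb, a_cb):
    """Right-hand side of the constraint: T_b * (W_b * H_b * A_g - A_pb) - A_cb."""
    return t_b * (w_b * h_b * a_g - a_pb) - a_cb


def density_ok(a_fb, t_b, w_b, h_b, a_g, a_pb, a_cb):
    """True if flip-flop area a_fb satisfies A_fb <= T_b (W_b H_b A_g - A_pb) - A_cb."""
    return a_fb <= max_ff_area(t_b, w_b, h_b, a_g, a_pb, a_cb)


# Hypothetical bin: 10 x 10 grids of unit area, target density 0.8,
# macro area 20 and combinational logic area 40.
bin_params = dict(t_b=0.8, w_b=10, h_b=10, a_g=1.0, a_pb=20.0, a_cb=40.0)

print(max_ff_area(**bin_params))        # flip-flop area budget of this bin
print(density_ok(20.0, **bin_params))   # fits within the budget
print(density_ok(30.0, **bin_params))   # exceeds the budget
```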

  13. Timing Slack and Feasible Region
      - Slack is expressed in terms of wirelength
      - Figure: flip-flop $i$ between its fanin gate and fanout gate; the input slack $S_{fi}(i)$ and the output slack $S_{fo}(i)$ bound regions whose edges have slope +1 and -1, and their overlap is the feasible region $F_r(i)$
      - Multiple fanout: multiple fanout diamonds

  14. Coordinate Transformation (1/3)
      - It's hard to determine if a grid point is located inside or outside the feasible region $F_r(i)$
      - Rotate 45° clockwise; we have rectangles instead
        - Easy checking!
      - Figure: in the $(x, y)$ plane, $F_r(i)$ lies between the fanin gate $f_i(i)$ (slack $S_{fi}(i)$) and the fanout gate $f_o(i)$ (slack $S_{fo}(i)$); in the $(x', y')$ plane it is the rectangle $I_{x'}(i) \times I_{y'}(i)$ bounded by $x' = s_{x'}(i)$, $x' = e_{x'}(i)$, $y' = s_{y'}(i)$ and $y' = e_{y'}(i)$

  15. Coordinate Transformation (2/3)
      - Coordinate transformation is done by integer operations
        - $x' = y + x$, $y' = y - x$
        - $x = (x' - y')/2$, $y = (x' + y')/2$
      - Scaling factor between $C$ and $C'$: $\sqrt{2}$ (rotation by $\pi/4$)
      - Bin corners in $C$ and in $C'$:
        - $(0, 0)_C = (0, 0)_{C'}$
        - $(W, 0)_C = (W, -W)_{C'}$
        - $(0, H)_C = (H, H)_{C'}$
        - $(W, H)_C = (H+W, H-W)_{C'}$
      - Figure: bin and grid drawn in both coordinate systems, with grid points and non-grid points marked
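
The transformation on this slide needs only integer additions, subtractions and a division by two. Below is a minimal sketch with hypothetical function names. The parity test in `is_grid_point` is an inference from the inverse formulas (both $x' - y'$ and $x' + y'$ must be even for $x$ and $y$ to be integers), not something stated on the slide.

```python
def to_rotated(x: int, y: int) -> tuple[int, int]:
    """Map (x, y) in C to (x', y') in C': x' = y + x, y' = y - x."""
    return y + x, y - x


def is_grid_point(xp: int, yp: int) -> bool:
    """An integer point in C' maps back to an integer point in C
    only if x' - y' is even (inferred from the inverse formulas)."""
    return (xp - yp) % 2 == 0


def from_rotated(xp: int, yp: int) -> tuple[int, int]:
    """Map (x', y') in C' back to C: x = (x' - y')/2, y = (x' + y')/2."""
    if not is_grid_point(xp, yp):
        raise ValueError("(x', y') does not correspond to a grid point in C")
    return (xp - yp) // 2, (xp + yp) // 2


# Bin corners for a hypothetical bin with W = 3, H = 5.
W, H = 3, 5
for corner in [(0, 0), (W, 0), (0, H), (W, H)]:
    rotated = to_rotated(*corner)
    assert from_rotated(*rotated) == corner  # round trip is exact
    print(corner, "->", rotated)
```

The printed corners follow the pattern listed on the slide: $(W, 0)$ maps to $(W, -W)$, $(0, H)$ to $(H, H)$ and $(W, H)$ to $(H+W, H-W)$.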

  16. Coordinate Transformation (3/3)
      - Figure (original coordinates): a diamond centered at $(x_0, y_0)$ with vertices $(x_0, y_0 + S)$, $(x_0 + S, y_0)$, $(x_0, y_0 - S)$ and $(x_0 - S, y_0)$; feasible regions $F_r(i)$ and $F_r(j)$ for $j \in \{j_1, j_2, j_3\}$, with fanin gate $f_i(i)$ and fanout gate $f_o(i)$
      - Figure (transformed coordinates): the diamond becomes a square of side $2S$ centered at $(x'_0, y'_0)$, bounded by $x' = x'_0 - S$, $x' = x'_0 + S$, $y' = y'_0 - S$ and $y' = y'_0 + S$; each feasible region becomes a pair of intervals, $I_{x'}(i)$ and $I_{y'}(i)$ for flip-flop $i$ and $I_{x'}(j)$ and $I_{y'}(j)$ for flip-flop $j$
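
Once every slack diamond is a square in the transformed plane, a feasible region is an intersection of squares, which is itself a rectangle described by two intervals. The sketch below illustrates this under stated assumptions: the function names, the `(x0, y0, slack)` pin representation and the sample coordinates are hypothetical, and slack is taken to be already expressed as a Manhattan wirelength.

```python
def diamond_to_square(x0, y0, s):
    """Diamond |x - x0| + |y - y0| <= s in C becomes, in C',
    the square [x0' - s, x0' + s] x [y0' - s, y0' + s]."""
    xp0, yp0 = y0 + x0, y0 - x0
    return (xp0 - s, xp0 + s), (yp0 - s, yp0 + s)


def feasible_rectangle(pins):
    """Intersect the squares of all (x0, y0, slack) pins.

    Returns ((s_x', e_x'), (s_y', e_y')) or None if the region is empty.
    """
    squares = [diamond_to_square(*pin) for pin in pins]
    sx = max(ix[0] for ix, _ in squares)
    ex = min(ix[1] for ix, _ in squares)
    sy = max(iy[0] for _, iy in squares)
    ey = min(iy[1] for _, iy in squares)
    if sx > ex or sy > ey:
        return None
    return (sx, ex), (sy, ey)


def contains(rect, x, y):
    """Check a point given in original coordinates against the rectangle."""
    (sx, ex), (sy, ey) = rect
    xp, yp = y + x, y - x
    return sx <= xp <= ex and sy <= yp <= ey


# Hypothetical fanin pin at (2, 2) and fanout pin at (4, 4), slack 3 each.
rect = feasible_rectangle([(2, 2, 3), (4, 4, 3)])
print(rect)
print(contains(rect, 3, 3), contains(rect, 0, 0))
```

The inside/outside test reduces to four integer comparisons, which is the "easy checking" the rotation is meant to provide.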

  17. Outline
      - Introduction
      - Problem & properties
      - Algorithm - INTEGRA
      - Experimental results
      - Conclusion

  18. Overview of INTEGRA
      - Flow: initialization, flip-flop clustering, flip-flop placement, then "Any more FFs?" (Y: repeat; N: done)
      - Steps:
        1. Initialization: analyzes the design intent
        2. Finds a decision point in X' and extracts the essential flip-flops and their related flip-flops
        3. Finds the maximal clique in the partial Y' for each essential flip-flop
        4. Clusters each essential flip-flop
        5. Places the clustered flip-flop at a legal location with routing cost and density consideration
        6. Repeats steps 2-5 until all flip-flops are investigated

  19. Example (1/5)
      - Figure: eight flip-flops, FF0 to FF7, shown in the initial $(x, y)$ plane and in the transformed $(x', y')$ plane

  20. Example (2/5) - Representation
      - Two interval graphs
      - Figure: feasible regions of FF0 to FF7 in the transformed $(x', y')$ plane, projected onto the x' and y' axes
      - x' intervals, labeled 0 to 7: [0,4], [1,3], [0,7], [1,9], [4,6], [0,9], [8,10], [2,8]
      - y' intervals: [7,10], [0,10], [5,9], [1,2], [0,5], [2,7], [7,8], [4,9]
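
With each feasible region stored as one x' interval and one y' interval, two regions overlap exactly when both pairs of intervals overlap. The sketch below shows that test only; it is not the INTEGRA clustering procedure itself. The flip-flop names and interval values are made up for illustration and are not the ones from the slide.

```python
from itertools import combinations


def intervals_overlap(a, b):
    """Closed intervals (start, end) overlap if neither ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]


def regions_overlap(ff_a, ff_b):
    """Each flip-flop is (x' interval, y' interval); rectangles overlap
    only if they overlap on both axes."""
    return intervals_overlap(ff_a[0], ff_b[0]) and intervals_overlap(ff_a[1], ff_b[1])


# Hypothetical feasible regions in transformed coordinates.
flip_flops = {
    "A": ((0, 4), (5, 9)),
    "B": ((1, 3), (7, 10)),
    "C": ((8, 10), (0, 5)),
}

compatible_pairs = [
    (p, q)
    for p, q in combinations(sorted(flip_flops), 2)
    if regions_overlap(flip_flops[p], flip_flops[q])
]
print(compatible_pairs)
```

Here only A and B share part of their feasible regions; C is separated from both along x'.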
