
An Exact Polynomial Time Algorithm for Clock Tree Sizing for Register Files

Alexander Berkovich, Lawrence D. Gonzales, Ataur Patwary (Intel Corporation); Rupesh S. Shelar (Synopsys Inc.)


  1. An Exact Polynomial Time Algorithm for Clock Tree Sizing for Register Files • Alexander Berkovich, Lawrence D. Gonzales, Ataur Patwary (Intel Corporation) • Rupesh S. Shelar (Synopsys Inc.)

  2. Agenda • Introduction • Clock Trees for Register Files • Sizing Algorithm • Experimental Results & Conclusion

  3. Background • Memory hierarchy: on-chip, DRAM, SSDs, (X-point), hard drives • On-chip memory: latches/flops, register files, SRAM • Register files provide fast on-chip storage – Employed in processors meant for low-power and high-performance applications – Have been around for decades

  4. Register Files • Contain bit-cells arranged in rows and columns • Bit-cell optimized for area, power, and speed in each technology • Circuitry for read/write address decoding • Rail-to-rail voltage swing compared to SRAM • Typically, 8-transistor cells for 1R1W • Bit-cells are larger • Use static domino latch instead of sense amplifiers • Commercial tools available for layout of different configurations

  5. Synchronous Operation • Most circuits are synchronous • Register files are also accessed in the same fashion • Requires clock tree sizing that considers races • Commercial tools avoid races by employing self-timed circuits – Comes at the cost of additional area and power – May not be desirable for high-volume processors

  6. Why clock trees for register files are important • Clock trees consume significant power in general • True for register files also, as all read/write accesses go through the clock network • Bit-cells are highly optimized for each technology • Layout is also optimized using a regular arrangement in rows and columns • Clock trees for RFs have additional constraints, apart from skew and slew – Races between read and precharge, and between read and write • Tedious and time-consuming to size manually

  7. Agenda • Introduction • Clock Trees for Register Files • Sizing Algorithm • Experimental Results & Conclusion

  8. High-level View • [Figure: layout summary of a 48-entry x 4-bit array, ENTRY 0 to ENTRY 47. Labeled blocks: MCLK, Write Address Protection, Read Address Protection, Latches/3:8 Decoder, RCB, WRDECC, RDDECC.]

  9. Clock Tree Topology • Clock cell instance naming convention (for this example only) – G = ganged NAND – B = buffer – R = RDWL – P0(1) = L0(1) precharge – A = AND – O = OR • Numbers in the suffix denote the reverse topological order in which solutions are created • [Figure: clock tree diagram rooted at MCLK, with RCBs, cells CKRDM1N22 and CKRDM1N44, and instances BR_1, GR_1, BR_2, GR_2, BP0_1, BP1_1 to BP1_4, AP1_1, and OP1_1. Annotated specs for L1PCH: relative spec 0: R L1PCH 0.05 BR INSTANCE[A:0]/RDBLL0; relative spec 1: F L1PCH 5ps AF INSTANCE[A:0]/RDBLL0. Annotated specs for L0PCH: relative spec 0: R L0PCH 0.05 BR RDWL[N:0]; relative spec 1: F L0PCH 5ps AF RDWL[N:0]. Absolute arrival specs are annotated on RDWL.]

  10. Relative Timing Constraints between L0PCH and RDWL • Relative timing constraint between RDWL and L0PCH • Rise transition: earliest (latest) L0PCH rises before the earliest of the earliest (latest) RDWL receivers – ENR: earliest among nearests – EFR: earliest among farthests • Fall transition: latest (earliest) L0PCH falls after the latest of the latest (earliest) RDWL receivers – LNF: latest among nearests – LFF: latest among farthests • [Figure: two panels, "Latest rise and fall" and "Earliest rise and fall", showing RDWL[0] to RDWL[7] driving memcell[0] to memcell[7] on Level0 Bitline[0] and Level0 Bitline[1], with rdbll0 and the Level0 precharge clock (left); the EFR, LFF, ENR, and LNF points are marked for both RDWL and L0PCH.]

  11. Additional Constraints for RF Clock Tree Sizing • Absolute required arrival time at a specific entry • Relative timing constraints for races between – L0PCH and RDWL – L1PCH and RDBL – RDWL and WRWL • Approach – Size one RDWL to the absolute arrival time – Let the rest of the RDWLs fall as they may – Size L0PCH, L1PCH, WRWL for relative timing constraints

  12. Agenda • Introduction • Clock Trees for Register Files • Sizing Algorithm • Experimental Results & Conclusion

  13. Overall Algorithm • Two-stage approach – Stage 1: size RDWLs and L0PCH – Relative timing constraints for L0PCHs will be met – Run timing analysis to find arrival times for RDBLs and RDWLs – Generate constraints for absolute arrival time requirements for L1PCHs and WRWLs – Stage 2: size L1PCH and WRWLs considering those constraints – Iterate to account for perturbations of RDWL timing during Stage 2 • Dynamic programming for clock tree sizing considering races – Specifically, will show how bottom-up solution generation is carried out – Most critical step – Also the dominant one from a time complexity perspective

  14. Example • Assume each clock buffer has 3 strengths – Power = 3 unit, delay = 22 unit – Power = 2 unit, delay = 24 unit – Power = 1 unit, delay = 26 unit • Generate solutions in reverse topological order • A solution associated with a clock buffer is characterized by a 10-element vector (EFR, LFF, ENR, LNF, RR_spc, RF_spc, rise-cell-delay, fall-cell-delay, cell-power, total-power) • [Figure: the clock tree topology diagram from slide 9, repeated.]
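The 10-element vector and the three buffer strengths on this slide can be written down as plain data. The following is a minimal sketch, not the authors' implementation: the class name `Solution`, the field names, and the constant `STRENGTHS` are hypothetical, and the reading of RR_spc and RF_spc as the relative rise and fall specs is an assumption, since the slide does not define them. The leaf value of 27 comes from the initialization given on the next slide.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Solution:
    """One candidate solution at a clock buffer (field names are illustrative)."""
    efr: float              # earliest among farthests
    lff: float              # latest among farthests
    enr: float              # earliest among nearests
    lnf: float              # latest among nearests
    rr_spc: float           # assumed: relative rise spec
    rf_spc: float           # assumed: relative fall spec
    rise_cell_delay: float
    fall_cell_delay: float
    cell_power: float
    total_power: float


# (power, delay) pairs for the three buffer strengths listed on the slide.
STRENGTHS = [(3, 22), (2, 24), (1, 26)]

# Leaf initialization from the next slide: required arrival time of 27 units,
# with no cell delay or power accumulated yet.
leaf = Solution(27, 27, 27, 27, 27, 27, 0, 0, 0, 0)
print(astuple(leaf))
```

A frozen dataclass keeps each solution immutable, which suits a dynamic program that builds new solutions from old ones instead of updating them in place.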

  15. Bottom-up Solution Generation • Assume – Required arrival time at the specified entry is 27 unit – L0PCH has to rise/fall before/after RDWL by ≥5 unit • Initialize the solutions at leaves, i.e., the receivers of BR_1[0:7] and BP0_1, to (27, 27, 27, 27, 27, 27, 0, 0, 0, 0) • Create solutions in the following order: – BR_1, GR_1, BP0_1, BR_2, GR_2, RCB – BP1_2, OP1_1, AP1_1, BP1_3, BP1_4, RCB

  16. Bottom-up Solution Generation (Cntd.) • 3 steps: – Combine solutions at the output – Consider sizes for the current clock buffer to create solutions – Prune inferior solutions
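The slides do not say when a solution counts as inferior. As an illustration of the pruning step only, the sketch below uses generic Pareto-dominance pruning and assumes that larger values in the six timing fields and lower total power are both preferable. That criterion is an assumption made for this example; the actual test for race constraints may differ. The function names `dominates` and `prune` are hypothetical, and the first three candidates are the sizing results shown later in the deck.

```python
def dominates(a, b):
    """Return True if a is no worse than b in every compared field and differs from b.

    Assumption for this sketch: larger timing values (fields 0..5) and
    lower total power (field 9) are better.
    """
    timing_no_worse = all(x >= y for x, y in zip(a[:6], b[:6]))
    power_no_worse = a[9] <= b[9]
    differs = a[:6] != b[:6] or a[9] != b[9]
    return timing_no_worse and power_no_worse and differs


def prune(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]


candidates = [
    (-1, -1, 3, 3, -1, -1, 22, 22, 3, 3),
    (-3, -3, 1, 1, -3, -3, 24, 24, 2, 2),
    (-5, -5, -1, -1, -5, -5, 26, 26, 1, 1),
    # Hypothetical extra candidate: same timing as the second one, more power.
    (-3, -3, 1, 1, -3, -3, 24, 24, 3, 3),
]
kept = prune(candidates)
print(kept)
```

Under this criterion the three sized solutions all survive, because each one trades timing against power, while the hypothetical fourth candidate is dropped.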

  17. Bottom-up Solution Generation: Combining Solutions at BR_1 • Assume max and min wire-delay to be 6 and 2 unit • Combine solutions at the output of BR_1 – (27-6, 27-6, 27-2, 27-2, 27-6, 27-6, 0, 0, 0, 0) = (21, 21, 25, 25, 21, 21, 0, 0, 0, 0) – For the 2nd and 3rd iterations, RDWL[i] may have different wire-delays, so use the appropriate ones for each BR_1[i] • [Figure: the clock tree topology diagram from slide 9, repeated.]
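The arithmetic on this slide can be reproduced in a few lines. This is a sketch under stated assumptions: the function name `combine_at_output` is hypothetical, and the choice of which wire delay applies to each field is read directly off the slide's numbers (maximum for EFR, LFF, RR_spc, and RF_spc; minimum for ENR and LNF) and is not taken from a stated rule.

```python
def combine_at_output(leaf, max_wire, min_wire):
    """Shift the leaf timing values back across the wire to the buffer output.

    Tuple layout: (EFR, LFF, ENR, LNF, RR_spc, RF_spc,
                   rise-cell-delay, fall-cell-delay, cell-power, total-power).
    """
    efr, lff, enr, lnf, rr_spc, rf_spc, rise_d, fall_d, cell_p, total_p = leaf
    return (
        efr - max_wire,     # farthest receivers see the maximum wire delay
        lff - max_wire,
        enr - min_wire,     # nearest receivers see the minimum wire delay
        lnf - min_wire,
        rr_spc - max_wire,  # per the slide's numbers, both specs use the max delay
        rf_spc - max_wire,
        rise_d, fall_d, cell_p, total_p,  # delay and power fields are unchanged
    )


leaf = (27, 27, 27, 27, 27, 27, 0, 0, 0, 0)
combined = combine_at_output(leaf, max_wire=6, min_wire=2)
print(combined)
```

For later iterations, where each RDWL[i] may have its own wire delay, the same function could be called once per receiver with that receiver's delays.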

  18. Bottom-up Solution Generation: Considering Choices for BR_1 • For the 1st iteration, assume slope; use actual values for later ones • 3 solutions corresponding to 3 choices – Solution 1: (21-22, 21-22, 25-22, 25-22, 21-22, 21-22, 22, 22, 3, 3) = (-1, -1, 3, 3, -1, -1, 22, 22, 3, 3) – Solution 2: (-3, -3, 1, 1, -3, -3, 24, 24, 2, 2) – Solution 3: (-5, -5, -1, -1, -5, -5, 26, 26, 1, 1) • [Figure: the clock tree topology diagram from slide 9, repeated.]
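The three solutions above follow from subtracting each strength's delay from the six timing fields of the combined solution (21, 21, 25, 25, 21, 21, 0, 0, 0, 0). The sketch below reproduces them. The function name `size_buffer` is hypothetical. Two details are assumptions that match the slide's numbers but are not stated on it: the rise and fall cell delays are taken as equal, and total power is taken as the downstream total plus the cell power.

```python
# (power, delay) pairs for the three buffer strengths from the Example slide.
STRENGTHS = [(3, 22), (2, 24), (1, 26)]


def size_buffer(combined, strengths):
    """Create one candidate solution per buffer strength.

    Tuple layout: (EFR, LFF, ENR, LNF, RR_spc, RF_spc,
                   rise-cell-delay, fall-cell-delay, cell-power, total-power).
    """
    downstream_power = combined[9]
    candidates = []
    for power, delay in strengths:
        # Move the six timing fields from the buffer output to its input.
        timing = tuple(t - delay for t in combined[:6])
        # Assumed: equal rise/fall delay, and power accumulates additively.
        candidates.append(timing + (delay, delay, power, downstream_power + power))
    return candidates


combined = (21, 21, 25, 25, 21, 21, 0, 0, 0, 0)
for solution in size_buffer(combined, STRENGTHS):
    print(solution)
```

The loop runs once per strength, so this step produces a number of candidates equal to the number of incoming solutions times the number of strengths before pruning.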
