An Exact Polynomial Time Algorithm for Clock Tree Sizing for Register Files
Alexander Berkovich, Lawrence D. Gonzales, Ataur Patwary (Intel Corporation) • Rupesh S. Shelar (Synopsys Inc.)
Agenda • Introduction • Clock Trees for Register Files • Sizing Algorithm • Experimental Results & Conclusion
Background • Memory hierarchy: on-chip memory, DRAM, SSDs, (X-point), hard drives • On-chip memory: latches/flops, register files, SRAM • Register files provide fast on-chip storage • Used in processors for low-power and high-performance applications • Have been around for decades
Register Files • Contain bit-cells arranged in rows and columns • Bit-cells optimized for area, power, and speed in each technology • Circuitry for read/write address decoding • Rail-to-rail voltage swing, in contrast to SRAM • Typically, 8-transistor cells for 1R1W (one read port, one write port) • Bit-cells are therefore larger • Static domino latches are used instead of sense amplifiers • Commercial tools are available for generating layouts of different configurations
Synchronous Operation • Most circuits are synchronous • Register files are accessed synchronously as well • This requires clock tree sizing that accounts for races • Commercial tools avoid races by employing self-timed circuits • This comes at the cost of additional area and power • May not be desirable for high-volume processors
Why clock trees for register files are important • Clock trees consume significant power in general • This also holds for register files, since every read/write access goes through the clock network • Bit-cells are highly optimized for each technology • Layout is also optimized through a regular arrangement of rows and columns • Clock trees for RFs have additional constraints beyond skew and slew • Races between read and precharge, and between read and write • Tedious and time-consuming to size manually
High-level View • [Figure: layout summary of a 48-entry x 4-bit array (ENTRY 0 to ENTRY 47) clocked by MCLK, with write and read address protection latches and 3:8 decoders, and the write/read decoder clock cells (WRDECC, RDDECC, RCB)]
Clock Tree Topology • Clock cell instance naming convention (for this example only): • G = Ganged NAND • B = buffer • R = RDWL • P0 (P1) = L0 (L1) precharge • A = AND • O = OR • Numbers in the suffix denote the reverse topological order in which solutions are created • [Figure: example clock tree rooted at MCLK, with cells GR 1, BR 1, GR 2, BR 2, BP0 1, BP1 1 to BP1 4, AP1 1, OP1 1 and RCB driving RDWL[N:0], L0PCH and L1PCH across INSTANCE[0] to INSTANCE[A]; annotated with relative specs "R L0PCH 0.05 BR RDWL[N:0]", "F L0PCH 5ps AF RDWL[N:0]", "R L1PCH 0.05 BR INSTANCE[A:0]/RDBLL0", "F L1PCH 5ps AF INSTANCE[A:0]/RDBLL0", and absolute arrival-time (ar) specs on RDWL[N:0]]
Relative Timing Constraints between L0PCH and RDWL • Relative timing constraint between RDWL and L0PCH • Rise transition: the earliest (latest) L0PCH rise must occur before the earliest of the earliest (latest) RDWL receivers • ENR: Earliest among nearests • EFR: Earliest among farthests • Fall transition: the latest (earliest) L0PCH fall must occur after the latest of the latest (earliest) RDWL receivers • LNF: Latest among nearests • LFF: Latest among farthests • [Figure: RDWL[0] to RDWL[7] driving memcell[0] to memcell[7] on Level0 Bitline[0] and Level0 Bitline[1], with rdbll0 and the Level0 precharge clock (L0PCH) entering from the left; the "Latest rise and fall" panel compares the latest L0PCH with EFR and LFF of RDWL, and the "Earliest rise and fall" panel compares the earliest L0PCH with ENR and LNF of RDWL]
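A minimal Python sketch of the race check described above, under stated assumptions: each RDWL[i] (and the L0PCH net) is summarized by the rise and fall arrival times at its nearest and farthest receivers, the nearest receiver is treated as the earliest and the farthest as the latest, and the 5-unit margin is borrowed from the example later in the deck. The names `Arrivals`, `race_metrics` and `l0pch_races_met` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class Arrivals:
    """Hypothetical summary of one net's receiver arrival times."""
    near_rise: float  # rise arrival at the nearest (earliest) receiver
    far_rise: float   # rise arrival at the farthest (latest) receiver
    near_fall: float  # fall arrival at the nearest receiver
    far_fall: float   # fall arrival at the farthest receiver


def race_metrics(rdwls: Sequence[Arrivals]) -> Tuple[float, float, float, float]:
    """Return (ENR, EFR, LNF, LFF) over all RDWL instances."""
    enr = min(r.near_rise for r in rdwls)  # Earliest among nearests (rise)
    efr = min(r.far_rise for r in rdwls)   # Earliest among farthests (rise)
    lnf = max(r.near_fall for r in rdwls)  # Latest among nearests (fall)
    lff = max(r.far_fall for r in rdwls)   # Latest among farthests (fall)
    return enr, efr, lnf, lff


def l0pch_races_met(l0pch: Arrivals, rdwls: Sequence[Arrivals],
                    margin: float = 5.0) -> Dict[str, bool]:
    """Check the four L0PCH/RDWL race constraints with an assumed fixed margin."""
    enr, efr, lnf, lff = race_metrics(rdwls)
    return {
        # Earliest (latest) L0PCH rise before ENR (EFR)
        "rise_early": l0pch.near_rise + margin <= enr,
        "rise_late": l0pch.far_rise + margin <= efr,
        # Earliest (latest) L0PCH fall after LNF (LFF)
        "fall_early": l0pch.near_fall >= lnf + margin,
        "fall_late": l0pch.far_fall >= lff + margin,
    }


if __name__ == "__main__":
    rdwls = [Arrivals(10, 14, 40, 44), Arrivals(11, 15, 41, 45)]
    print(race_metrics(rdwls))  # (10, 14, 41, 45)
    print(l0pch_races_met(Arrivals(4, 8, 47, 51), rdwls))
```

Pairing nearest with nearest and farthest with farthest mirrors the parenthesized "earliest (latest)" wording; a production checker would take these arrivals from static timing analysis rather than hand-entered numbers.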
Additional Constraints for RF Clock Tree Sizing • Absolute required arrival time at a specific entry • Relative timing constraints for races between • L0PCH and RDWL • L1PCH and RDBL • RDWL and WRWL • Approach • Size one RDWL to meet the absolute arrival time • Let the arrival times of the remaining RDWLs follow from that sizing • Size L0PCH, L1PCH, and WRWL to meet the relative timing constraints
Overall Algorithm • Two-stage approach • Stage 1: Size RDWLs and L0PCH • Relative timing constraints for L0PCHs are met in this stage • Run timing analysis to find arrival times for RDBLs and RDWLs • Generate absolute arrival-time constraints for L1PCHs and WRWLs • Stage 2: Size L1PCHs and WRWLs subject to those constraints • Iterate to account for perturbations of RDWL timing during Stage 2 • Dynamic programming for clock tree sizing considering races • The following slides show how bottom-up solution generation is carried out • The most critical step • Also the dominant step in terms of time complexity
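The two-stage flow can be sketched as a driver loop. This is a minimal sketch under assumptions: the sizing steps, timing analysis and constraint generation are passed in as hypothetical callables (`stage1`, `run_timing`, `derive_constraints`, `stage2`), and "iterate" is read as re-running Stage 2 with refreshed constraints until the RDWL arrival times move by no more than a tolerance `tol`.

```python
from typing import Callable, Dict, List, Optional

Arrivals = Dict[str, float]  # net name -> arrival time (hypothetical representation)


def two_stage_sizing(
    stage1: Callable[[], None],
    run_timing: Callable[[], Arrivals],
    derive_constraints: Callable[[Arrivals], Dict[str, float]],
    stage2: Callable[[Dict[str, float]], None],
    rdwl_nets: List[str],
    tol: float = 1.0,
    max_iters: int = 10,
) -> int:
    """Run the two-stage flow; return the number of Stage-2 passes performed."""
    stage1()  # Stage 1: size RDWL and L0PCH buffers (meets the L0PCH races)
    prev: Optional[List[float]] = None
    for passes in range(max_iters):
        arrivals = run_timing()  # timing analysis after the latest sizing
        rdwl = [arrivals[n] for n in rdwl_nets]
        if prev is not None and max(
            (abs(a - b) for a, b in zip(rdwl, prev)), default=0.0
        ) <= tol:
            return passes  # RDWL timing no longer perturbed by Stage 2
        # Turn RDBL/RDWL arrivals into absolute targets for L1PCH and WRWL
        stage2(derive_constraints(arrivals))  # Stage 2: size L1PCH and WRWL
        prev = rdwl
    return max_iters
```

Whether Stage 1 should also be re-run inside the loop is not stated on the slide; this sketch keeps Stage 1 fixed and only refreshes the Stage 2 constraints.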
Example • Assume each clock buffer has 3 strengths: – Power = 3 units, delay = 22 units – Power = 2 units, delay = 24 units – Power = 1 unit, delay = 26 units • Generate solutions in reverse topological order • A solution associated with a clock buffer is characterized by a 10-element vector (EFR, LFF, ENR, LNF, RR_spc, RF_spc, rise-cell-delay, fall-cell-delay, cell-power, total-power)
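To make the 10-element vector concrete, a minimal Python representation follows. The type and field names (`Solution`, `BufferSize`, `EXAMPLE_SIZES`) are hypothetical; the field order and the three (power, delay) strengths come from the example, and modeling delay as one fixed number is a simplification of a slope-dependent cell delay.

```python
from typing import List, NamedTuple


class Solution(NamedTuple):
    """One candidate for a clock buffer; field order follows the slide."""
    efr: float              # EFR timing field
    lff: float              # LFF timing field
    enr: float              # ENR timing field
    lnf: float              # LNF timing field
    rr_spc: float           # RR_spc timing field
    rf_spc: float           # RF_spc timing field
    rise_cell_delay: float
    fall_cell_delay: float
    cell_power: float
    total_power: float      # this cell plus everything below it (assumed meaning)


class BufferSize(NamedTuple):
    power: float
    delay: float  # fixed delay; a real library would depend on slope and load


# The three strengths assumed in the example, strongest first.
EXAMPLE_SIZES: List[BufferSize] = [
    BufferSize(power=3, delay=22),
    BufferSize(power=2, delay=24),
    BufferSize(power=1, delay=26),
]
```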
Bottom-up Solution Generation • Assume – the required arrival time at the specified entry is 27 units – L0PCH has to rise (fall) at least 5 units before (after) RDWL • Initialize the solutions at the leaves, i.e., the receivers of BR 1 [0:7] and BP0 1, to (27, 27, 27, 27, 27, 27, 0, 0, 0, 0) • Create solutions in the following order: – BR 1, GR 1, BP0 1, BR 2, GR 2, RCB – BP1 2, OP1 1, AP1 1, BP1 3, BP1 4, RCB
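The creation order above is a children-before-parents (reverse topological) order. A minimal sketch of producing such an order with an iterative post-order traversal, together with the leaf initialization, follows; `bottom_up_order` and `leaf_solution` are hypothetical names, and the tree used below is illustrative, not the topology in the figure.

```python
from typing import Dict, List, Tuple


def bottom_up_order(children: Dict[str, List[str]], root: str) -> List[str]:
    """Post-order traversal: every node is listed after all of its children."""
    order: List[str] = []
    stack: List[Tuple[str, bool]] = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if children_done:
            order.append(node)
            continue
        stack.append((node, True))  # revisit once the children are emitted
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return order


def leaf_solution(required: float) -> Tuple[float, ...]:
    """Leaf vector: six timing fields at the required time, zero delay and power."""
    return (required,) * 6 + (0.0,) * 4
```

For the hypothetical tree `{"root": ["x", "y"], "x": ["x1", "x2"], "y": ["y1"]}`, `bottom_up_order(tree, "root")` returns `["x1", "x2", "x", "y1", "y", "root"]`, and `leaf_solution(27)` equals the slide's `(27, 27, 27, 27, 27, 27, 0, 0, 0, 0)`. The explicit stack avoids Python's recursion limit on deep trees.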
Bottom-up Solution Generation (Cont'd) • 3 steps: – Combine solutions at the output – Consider sizes for the current clock buffer to create solutions – Prune inferior solutions
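The pruning step can be sketched as Pareto-dominance filtering. This sketch assumes, as an illustration rather than the authors' stated rule, that a larger value in each of the six timing fields means more slack and that lower total power is better; cell delays are not compared. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

# (EFR, LFF, ENR, LNF, RR_spc, RF_spc, rise, fall, cell_power, total_power)
Vector = Tuple[float, ...]
TIMING = slice(0, 6)  # the six timing fields
TOTAL_POWER = 9


def dominates(a: Vector, b: Vector) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere (assumed criteria)."""
    pairs = list(zip(a[TIMING], b[TIMING]))
    no_worse = all(x >= y for x, y in pairs) and a[TOTAL_POWER] <= b[TOTAL_POWER]
    better = any(x > y for x, y in pairs) or a[TOTAL_POWER] < b[TOTAL_POWER]
    return no_worse and better


def prune(solutions: Sequence[Vector]) -> List[Vector]:
    """Drop exact duplicates and every solution dominated by another one."""
    unique = list(dict.fromkeys(solutions))  # keeps first-seen order
    return [s for s in unique if not any(dominates(o, s) for o in unique)]
```

With the three BR 1 candidates from the example, nothing is pruned, since each one trades timing for power; a candidate with the same timing as another but higher total power would be removed.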
Bottom-up Solution Generation: Combining Solutions at BR 1 • Assume the max and min wire delays to be 6 and 2 units • Combine solutions at the output of BR 1 – (27-6, 27-6, 27-2, 27-2, 27-6, 27-6, 0, 0, 0, 0) = (21, 21, 25, 25, 21, 21, 0, 0, 0, 0) – The max wire delay is subtracted from EFR, LFF, RR_spc, and RF_spc; the min wire delay from ENR and LNF • For the 2nd and 3rd iterations, RDWL[i] may have different wire delays, so use the appropriate ones for each BR 1 [i]
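The combining arithmetic can be reproduced in code. The slide subtracts the maximum wire delay (6) from EFR, LFF, RR_spc and RF_spc and the minimum (2) from ENR and LNF. For later iterations, where each RDWL[i] has its own wire delay, this sketch assumes a natural generalization: take the minimum over receivers of (child value minus wire delay) for the first group and the maximum for the second, and sum the children's total power. The function and constant names are hypothetical.

```python
from typing import Sequence, Tuple

Vector = Tuple[float, ...]
FAR_FIELDS = (0, 1, 4, 5)  # EFR, LFF, RR_spc, RF_spc: slide subtracts the max wire delay
NEAR_FIELDS = (2, 3)       # ENR, LNF: slide subtracts the min wire delay
TOTAL_POWER = 9


def combine_at_output(children: Sequence[Vector], wire_delays: Sequence[float]) -> Vector:
    """Merge receiver solutions at a buffer output (assumed min/max generalization)."""
    if not children or len(children) != len(wire_delays):
        raise ValueError("need one wire delay per receiver solution")
    out = [0.0] * 10  # delays and cell power stay 0 until a size is chosen
    for i in FAR_FIELDS:
        out[i] = min(c[i] - w for c, w in zip(children, wire_delays))
    for i in NEAR_FIELDS:
        out[i] = max(c[i] - w for c, w in zip(children, wire_delays))
    out[TOTAL_POWER] = sum(c[TOTAL_POWER] for c in children)
    return tuple(out)
```

With eight leaves at `(27, 27, 27, 27, 27, 27, 0, 0, 0, 0)` and wire delays ranging from 2 to 6, this reproduces the slide's `(21, 21, 25, 25, 21, 21, 0, 0, 0, 0)`.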
Bottom-up Solution Generation: Considering Choices for BR 1 • For the 1st iteration, assume a slope; use actual values for later ones • 3 solutions corresponding to the 3 choices – Solution 1: (21-22, 21-22, 25-22, 25-22, 21-22, 21-22, 22, 22, 3, 3) = (-1, -1, 3, 3, -1, -1, 22, 22, 3, 3) – Solution 2: (-3, -3, 1, 1, -3, -3, 24, 24, 2, 2) – Solution 3: (-5, -5, -1, -1, -5, -5, 26, 26, 1, 1)
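Enumerating the three strengths reproduces the three solutions above. In this sketch each size is a (power, delay) pair whose single fixed delay is used for both rise and fall, which matches the example but stands in for the slope-dependent delay a real cell library would provide; `size_choices` and `EXAMPLE_SIZES` are hypothetical names. Total power is taken as the combined subtree power plus the cell's own power, consistent with the example's numbers.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
EXAMPLE_SIZES = [(3, 22), (2, 24), (1, 26)]  # (power, delay) from the example


def size_choices(combined: Vector,
                 sizes: Sequence[Tuple[float, float]] = EXAMPLE_SIZES) -> List[Vector]:
    """One candidate per buffer size: shift the six timing fields by the cell delay."""
    candidates = []
    for power, delay in sizes:
        timing = tuple(t - delay for t in combined[:6])  # required times move upstream
        # Same delay for rise and fall here; total power = subtree power + this cell.
        candidates.append(timing + (delay, delay, power, combined[9] + power))
    return candidates
```

Applied to the combined vector `(21, 21, 25, 25, 21, 21, 0, 0, 0, 0)`, it yields Solutions 1 to 3 in order, which then go to the pruning step.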