
Gated Precharging: Reducing Bitline Precharge in Deep-Submicron Caches



  1. Gated Precharging: Reducing Bitline Precharge in Deep-Submicron Caches
     Se-Hyun Yang and Babak Falsafi
     PowerTap: http://www.ece.cmu.edu/~powertap
     Computer Architecture Lab (CALCM), Carnegie Mellon University

  2. High Bitline Leakage in Caches
     - Deep-submicron high-performance caches use subarrays…
     - Precharge entire cache:
       - Active subarrays: bitlines discharge…
       - Idle subarrays: bitlines leak…
     - Energy wasted in idle subarrays!
     [Figure: cache subarrays and their bitlines (BL)]

  3. Exploit Temporal Locality in Subarrays
     - Observation:
       - All subarrays precharge/leak
       - But only a small # of subarrays are active
     [Figure: subarrays all labeled "Precharge", with only a few marked "Hot"]

  4. Contribution: Gated Precharging
     - Precharge only active subarrays
     - Detect temporal locality:
       - Decay counters
       - Threshold comparison logic
     - Reduce precharging:
       - by 89% in L1 d-cache
       - by 92% in L1 i-cache
       - with < 2% performance impact

  5. Outline
     - Overview
     - Bitline Leakage
     - Gated Precharging
     - Temporal Locality in Subarrays
     - Implementation
     - Gating Overhead
     - Related Work
     - Results
     - Conclusion

  6. Bitline Leakage in SRAM Cells
     [Figure: SRAM cell with bitlines (BL) and wordline]
     - More than 60% discharge in 0.10 µm technology

  7. How Much Temporal Locality?
     We evaluate, in a small window:
     1. How many accesses reuse subarrays?
     2. How many active subarrays?

  8. Subarray Reuse Ratio
     - Even in a small window, subarray reuse is high
     - e.g., gcc with a 32K L1D with 1K subarrays: 96% of accesses reuse subarrays in a 100-cycle window
     - For all benchmarks, in a 100-cycle window:
       - 95% for d-cache
       - 98% for i-cache
     [Figure: cumulative fraction of d-cache accesses (0% to 100%) vs. subarray access interval (cycles, log scale from 1 to 100,000)]
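The reuse ratio can be measured from a cache access trace. The Python sketch below is an illustration, not the authors' measurement tool: `reuse_ratio` is a hypothetical name, the trace is assumed to be a list of (cycle, subarray) pairs sorted by cycle, an access counts as a reuse when the same subarray was accessed at most `window` cycles earlier, and first-ever accesses are assumed to count as non-reuses.

```python
def reuse_ratio(trace, window):
    """Fraction of accesses whose subarray was also accessed at most
    `window` cycles earlier.

    Hypothetical input format: `trace` is a list of (cycle, subarray)
    pairs sorted by cycle. An access with no earlier access to the same
    subarray is counted as a non-reuse (an assumption).
    """
    if not trace:
        return 0.0
    last_seen = {}  # subarray -> cycle of its most recent access
    reused = 0
    for cycle, sub in trace:
        prev = last_seen.get(sub)
        if prev is not None and cycle - prev <= window:
            reused += 1
        last_seen[sub] = cycle
    return reused / len(trace)


# Toy trace: only the second access to subarray 0 falls within 100 cycles
# of the previous access to the same subarray.
trace = [(0, 0), (5, 0), (10, 1), (200, 0), (201, 1)]
print(reuse_ratio(trace, window=100))  # 0.2
```

Evaluating `reuse_ratio` for windows of 1, 10, 100, ... cycles gives points on a cumulative curve like the one plotted against subarray access interval.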

  9. Fraction of Subarrays Accessed
     - In a small window, only a small # of subarrays are active
     - e.g., gcc with a 32K L1D with 1K subarrays: 19% of subarrays accessed in a 100-cycle window
     - For all benchmarks, in a 100-cycle window:
       - < 29% for d-cache
       - < 22% for i-cache
     [Figure: fraction of subarrays touched in a window (0% to 100%) vs. window size (cycles, log scale from 1 to 100,000)]
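The fraction of subarrays touched per window can be computed from the same kind of trace. This is a minimal sketch under stated assumptions, not the authors' method: `fraction_touched` is a hypothetical name, windows are assumed to be non-overlapping and aligned at cycle 0, and windows with no accesses count as touching zero subarrays.

```python
from collections import defaultdict


def fraction_touched(trace, window, num_subarrays, total_cycles):
    """Average fraction of subarrays accessed per non-overlapping window.

    Assumptions (hypothetical): `trace` holds (cycle, subarray) pairs with
    0 <= cycle < total_cycles, windows are aligned at cycle 0, and windows
    with no accesses count as touching 0 subarrays.
    """
    touched = defaultdict(set)  # window index -> distinct subarrays accessed
    for cycle, sub in trace:
        touched[cycle // window].add(sub)
    num_windows = -(-total_cycles // window)  # ceiling division
    distinct = sum(len(subs) for subs in touched.values())
    return distinct / (num_windows * num_subarrays)


# Two 100-cycle windows over 4 subarrays: {0, 1} in the first, {2} in the second.
trace = [(0, 0), (10, 1), (50, 0), (150, 2)]
print(fraction_touched(trace, window=100, num_subarrays=4, total_cycles=200))  # 0.375
```

Sweeping `window` from 1 to 100,000 cycles reproduces the kind of curve plotted against window size.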

  10. Temporal Locality in Subarrays
      - In a 100-cycle window, >95% of cache accesses reuse < 30% of subarrays
      - Most accesses are temporally localized in a small # of subarrays

  11. Gated Precharging: Hardware
      - Decay counter per subarray [Kaxiras, et al.]
      - Threshold value to decide “when” to precharge
      - Algorithm:
        - if count < threshold: precharge
        - if count > threshold: no precharge
      [Figure: per-subarray counter (clocked by CLK, reset on a cache access to the subarray) compared (Comp) against the precharge threshold to drive the subarray's precharge control]
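The algorithm above can be sketched as a cycle-level model. This Python sketch is illustrative, not the authors' hardware: `GatedPrecharge` and `simulate` are hypothetical names; each counter is assumed to count up every cycle, saturate at 2^10 - 1 (matching the 10-bit counter listed under overhead), and reset on an access; `count == threshold` is treated as idle because the slide leaves that case open; and every access to an idle subarray is charged one extra cycle.

```python
from collections import defaultdict


class GatedPrecharge:
    """Per-subarray decay counters that gate bitline precharge.

    Simplifying assumptions (not from the slides): every counter counts up
    once per cycle and saturates at 2**bits - 1, an access resets it to 0,
    count == threshold is treated as idle, and any access to an idle
    subarray costs one extra cycle.
    """

    def __init__(self, num_subarrays, threshold, bits=10):
        self.limit = (1 << bits) - 1
        if not 0 < threshold <= self.limit:
            raise ValueError("threshold must fit in the counter")
        self.threshold = threshold
        # Cold start: every subarray begins idle (counter saturated).
        self.counts = [self.limit] * num_subarrays

    def precharged(self, sub):
        # Precharge only subarrays accessed within the last `threshold` cycles.
        return self.counts[sub] < self.threshold

    def access(self, sub):
        """Record an access and return its extra latency in cycles (0 or 1)."""
        penalty = 0 if self.precharged(sub) else 1
        self.counts[sub] = 0  # an access resets this subarray's counter
        return penalty

    def tick(self):
        # One clock cycle: all counters count up, saturating at `limit`.
        self.counts = [min(c + 1, self.limit) for c in self.counts]


def simulate(trace, num_subarrays, threshold, total_cycles):
    """Return (fraction of subarray-cycles precharged, total extra cycles).

    `trace` is a list of (cycle, subarray) pairs (hypothetical format); the
    baseline precharges every subarray on every cycle.
    """
    by_cycle = defaultdict(list)
    for cycle, sub in trace:
        by_cycle[cycle].append(sub)
    gp = GatedPrecharge(num_subarrays, threshold)
    precharged = extra = 0
    for cycle in range(total_cycles):
        for sub in by_cycle[cycle]:
            extra += gp.access(sub)
        precharged += sum(gp.precharged(s) for s in range(num_subarrays))
        gp.tick()
    return precharged / (num_subarrays * total_cycles), extra


# Subarray 0 of 4 is accessed at cycles 0 and 1; the run lasts 10 cycles.
print(simulate([(0, 0), (1, 0)], num_subarrays=4, threshold=3, total_cycles=10))  # (0.1, 1)
```

In this toy run subarray 0 stays precharged for cycles 0 through 3 and is then gated off, so 4 of 40 subarray-cycles are precharged instead of all 40; the only extra cycle comes from the cold first access.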

  12. Gated Precharging: Overhead
      - Minimal performance overhead
        - Hits on idle subarrays incur 1 extra cycle
        - Infrequent due to temporal locality (example: gcc < 8% of d-cache accesses)
      - Minimal energy overhead
        - 10-bit counter per subarray
        - Comparison logic
        - Existing precharge control logic
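The overhead figures can be checked with simple arithmetic. In this sketch `counter_overhead` and `extra_cycles_per_access` are hypothetical helpers. Because "1K subarrays" in a 32K cache could mean either 32 subarrays of 1 KB each or 1,024 subarrays, both readings are computed rather than assumed.

```python
def counter_overhead(num_subarrays, cache_bytes, counter_bits=10):
    """Return (total decay-counter bits, counter bits as a fraction of data-array bits)."""
    counter_total = num_subarrays * counter_bits
    return counter_total, counter_total / (cache_bytes * 8)


def extra_cycles_per_access(idle_hit_fraction, penalty_cycles=1):
    """Average extra latency per access when hits on idle subarrays pay `penalty_cycles`."""
    return idle_hit_fraction * penalty_cycles


# A 10-bit counter saturates at 1023 cycles, well above thresholds near 100 cycles.
print((1 << 10) - 1)  # 1023

# "1K subarrays" in a 32 KB cache could mean 32 subarrays of 1 KB or 1,024 subarrays.
for n in (32, 1024):
    bits, ratio = counter_overhead(n, cache_bytes=32 * 1024)
    print(f"{n} subarrays: {bits} counter bits ({ratio:.2%} of data bits)")

# gcc example: fewer than 8% of d-cache accesses hit idle subarrays.
print(extra_cycles_per_access(0.08))  # 0.08
```

Since 8% is an upper bound for gcc, the last line is only a rough bound on added latency per d-cache access, not a measured slowdown.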

  13. Related Work
      - Delayed precharging [Alpha 21264]
        - Precharges only required subarrays
        - Increases cache access latency by delaying precharge
      - Resizable caches [Albonesi] [Yang, et al.]
        - Capture working-set size variation and resize caches
        - Coarse switching granularity (time and space)
        - Relatively larger performance overhead
      - Way prediction [Powell, et al., Inoue, et al.]
        - Predicts the set-associative way for the next access
        - Orthogonal to gated precharging

  14. Methodology
      - Wattch [ISCA 2000]
      - 16 SPEC2000/Olden benchmarks
      - Performance impact < 2%
      - Base case:
        - 8-wide issue, 128-entry RUU
        - 32K direct-mapped L1 I & D with 1K subarrays
        - 512K 4-way unified L2
      - Threshold values determined by profiling
        - Threshold values ≈ 100 cycles
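The slide states only that thresholds come from profiling. One plausible procedure, sketched here under loudly stated assumptions, is to sweep candidate thresholds over a profiled trace and keep the smallest one whose fraction of accesses finding their subarray idle stays within a budget. `evaluate`, `choose_threshold`, and the budget are hypothetical; the budget is only a stand-in for the < 2% performance target. The model assumes a per-cycle decay counter: after an access, a subarray stays precharged for min(gap, threshold) cycles, where gap is the distance to its next access.

```python
from collections import defaultdict


def evaluate(trace, total_cycles, num_subarrays, threshold):
    """Return (precharge fraction, idle-access fraction) for one threshold.

    Assumed model: after an access at cycle a, the subarray stays
    precharged for min(gap, threshold) cycles, where gap is the distance
    to its next access (or to the end of the run). The first access to a
    subarray, or one arriving `threshold` or more cycles after the
    previous one, finds it idle. `trace` is a list of (cycle, subarray)
    pairs sorted by cycle (hypothetical format).
    """
    if not trace:
        return 0.0, 0.0
    times = defaultdict(list)  # subarray -> sorted access cycles
    for cycle, sub in trace:
        times[sub].append(cycle)
    precharged = idle = 0
    for cycles in times.values():
        idle += 1  # cold first access
        for a, b in zip(cycles, cycles[1:]):
            precharged += min(b - a, threshold)
            if b - a >= threshold:
                idle += 1
        precharged += min(total_cycles - cycles[-1], threshold)
    return precharged / (num_subarrays * total_cycles), idle / len(trace)


def choose_threshold(trace, total_cycles, num_subarrays, candidates, budget):
    """Smallest candidate whose idle-access fraction is within `budget`.

    Smaller thresholds gate precharge more aggressively but make more
    accesses find their subarray idle. Falls back to the largest candidate.
    """
    for t in sorted(candidates):
        _, idle = evaluate(trace, total_cycles, num_subarrays, t)
        if idle <= budget:
            return t
    return max(candidates)


profile = [(0, 0), (50, 0), (120, 0), (130, 1), (140, 1)]
print(choose_threshold(profile, 200, 2, candidates=[10, 60, 100], budget=0.5))  # 100
```

With this toy profile, thresholds of 10 and 60 cycles leave too many accesses hitting idle subarrays (100% and 60%), so the sweep settles on 100 cycles, which keeps the idle-access fraction at 40%.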

  15. Results: D-Cache
      - Reduces precharging by >85% for all benchmarks but vpr
      - On average by 89%
      [Figure: reduced fraction of subarray precharges (0% to 50%) for ammp, art, bh, bisort, bzip2, em3d, equake, gcc, health, mcf, mesa, treeadd, tsp, vortex, vpr, wupwise]

  16. Results: I-Cache
      - Reduces precharging by >90% for 13 benchmarks
      - On average by 92%
      [Figure: reduced fraction of subarray precharges (0% to 50%) for ammp, art, bh, bisort, bzip2, em3d, equake, gcc, health, mcf, mesa, treeadd, tsp, vortex, vpr, wupwise]

  17. Conclusions
      - High bitline leakage in deep-submicron caches
      - Energy wasted in idle subarrays
      - Gated precharging:
        - Exploits temporal locality in subarrays
        - Reduces precharging by 90%
        - With < 2% performance impact

  18. For More Information
      PowerTap Project: http://www.ece.cmu.edu/~powertap
      Computer Architecture Lab, Carnegie Mellon University
