Gated Precharging: Reducing Bitline Precharge in Deep-Sub Caches
Se-Hyun Yang and Babak Falsafi
PowerTap (http://www.ece.cmu.edu/~powertap)
Computer Architecture Lab (CALCM), Carnegie Mellon University
High Bitline Leakage in Caches
Deep-subµ high-performance caches
- Use subarrays
- Precharge entire cache
- Active subarrays: bitlines discharge
- Idle subarrays: bitlines leak

Energy wasted in idle subarrays!
[Figure: cache subarrays with bitline pairs (BL/BL)]
Exploit Temporal Locality in Subarrays
Observation
- All subarrays precharge and leak
- But only a small number of subarrays are active
[Figure: of many precharging subarrays, only a few are hot at a time]
Contribution: Gated Precharging
- Precharge only active subarrays
- Detect temporal locality with:
  - Decay counters
  - Threshold comparison logic
Reduce precharging:
- by 89% in L1 d-cache
- by 92% in L1 i-cache
- with < 2% performance impact
Outline
- Overview
- Bitline Leakage
- Gated Precharging
  - Temporal Locality in Subarrays
  - Implementation
  - Gating Overhead
- Related Work
- Results
- Conclusion
Bitline Leakage in SRAM Cells
Leakage discharges bitlines by more than 60% at 0.10µ

[Figure: SRAM cell with bitline pair (BL/BL) and wordline]
How Much Temporal Locality?
We evaluate, in a small window:
1. How many accesses reuse subarrays?
2. How many subarrays are active?
Subarray Reuse Ratio
Even in a small window, subarray reuse is high
- e.g., gcc, 32K L1D with 1K subarrays: 96% of accesses reuse subarrays within a 100-cycle window

For all benchmarks, in a 100-cycle window:
- 95% for d-cache
- 98% for i-cache

[Figure: cumulative fraction of d-cache accesses vs. subarray access interval (cycles, 1 to 100,000, log scale)]
Fraction of Subarrays Accessed
In a small window, only a small number of subarrays is active
- e.g., gcc, 32K L1D with 1K subarrays: 19% of subarrays accessed in a 100-cycle window

For all benchmarks, in a 100-cycle window:
- < 29% for d-cache
- < 22% for i-cache

[Figure: fraction of subarrays touched in a window vs. window size (cycles, 1 to 100,000, log scale)]
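The two window-based locality metrics above (subarray reuse ratio and fraction of subarrays touched) can be sketched as a simple trace analysis. This is an illustrative software model; the function name and trace format are assumptions, not from the talk:

```python
from collections import defaultdict

def subarray_locality(trace, num_subarrays, window=100):
    # trace: list of (cycle, subarray_id) cache accesses, in cycle order.
    # Returns (reuse_ratio, active_fraction):
    #   reuse_ratio     - fraction of accesses whose subarray was already
    #                     accessed within the previous `window` cycles
    #   active_fraction - average fraction of subarrays touched per
    #                     `window`-cycle interval
    last_access = {}                  # subarray -> cycle of last access
    touched = defaultdict(set)        # window index -> subarrays touched
    reused = 0

    for cycle, sub in trace:
        prev = last_access.get(sub)
        if prev is not None and cycle - prev <= window:
            reused += 1               # a reuse within the window
        last_access[sub] = cycle
        touched[cycle // window].add(sub)

    reuse_ratio = reused / len(trace)
    active_fraction = (sum(len(s) for s in touched.values())
                       / (len(touched) * num_subarrays))
    return reuse_ratio, active_fraction
```

A trace that cycles over a few subarrays, like gcc's behavior in the slides, yields a high reuse ratio and a small active fraction.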
Temporal Locality in Subarrays
In a 100-cycle window,

>95% of cache accesses reuse < 30% of the subarrays

Most accesses are temporally localized in a small number of subarrays
- Decay counter per subarray [Kaxiras, et al.]
- Threshold value decides when to precharge
- Algorithm:

    if count < threshold:
        precharge
    else:
        no precharge
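A minimal software sketch of this gating logic (the class name and interface are illustrative; the slides describe a hardware counter plus comparator, not software):

```python
class GatedPrecharger:
    # Software model of per-subarray decay-counter gating: each counter
    # resets to 0 when its subarray is accessed and increments every
    # cycle. While count < threshold the subarray is considered hot and
    # keeps precharging; beyond the threshold, precharging is gated off.

    def __init__(self, num_subarrays, threshold=100):
        self.threshold = threshold
        self.count = [0] * num_subarrays

    def tick(self, accessed=()):
        # Advance one clock cycle; `accessed` holds subarrays touched
        # this cycle. Returns the set of subarrays to precharge.
        for i in range(len(self.count)):
            self.count[i] = 0 if i in accessed else self.count[i] + 1
        return {i for i, c in enumerate(self.count) if c < self.threshold}
```

With a small threshold, a subarray drops out of the precharged set a few cycles after its last access and rejoins as soon as it is touched again.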
Gated Precharging: Hardware
[Figure: per-subarray hardware: a counter reset by cache accesses and incremented by CLK, compared against the threshold; the comparator output gates the subarray's precharge control]
Gated Precharging: Overhead
Minimal performance overhead
- Hits on idle subarrays incur 1 extra cycle
- Infrequent due to temporal locality (e.g., gcc: < 8% of d-cache accesses)

Minimal energy overhead
- 10-bit counter per subarray
- Comparison logic
- Existing precharge control logic (reused)
Related Work
Delayed precharging [Alpha 21264]
- Precharges only the required subarrays
- Increases cache access latency by delaying precharge

Resizable caches [Albonesi] [Yang, et al.]
- Capture working-set size variation and resize the cache
- Coarse switching granularity (in time & space)
- Relatively larger performance overhead

Way prediction [Powell, et al.] [Inoue, et al.]
- Predicts the set-associative way for the next access
- Orthogonal to gated precharging
Methodology
Wattch [ISCA 2000]
- 16 SPEC2000/Olden benchmarks
- Performance impact < 2%

Base Case
- 8-wide issue, 128-entry RUU
- 32K direct-mapped L1 I & D with 1K subarrays
- 512K 4-way unified L2

Threshold values determined by profiling; values ≅ 100 cycles
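The profiling step might be sketched as below. The coverage-based selection rule is an illustrative assumption; the slides only state that thresholds come from profiling and land near 100 cycles:

```python
import math

def pick_threshold(intervals, coverage=0.95):
    # Hypothetical profiling rule: choose the smallest decay threshold
    # such that a `coverage` fraction of the profiled subarray access
    # intervals falls at or below it, so most reuses still hit a
    # precharged (hot) subarray.
    ordered = sorted(intervals)
    idx = min(len(ordered) - 1, math.ceil(coverage * len(ordered)) - 1)
    return ordered[idx]
```

Applied to an interval distribution like the one in the reuse-ratio slide (most reuses within 100 cycles), this rule lands near the ≅ 100-cycle thresholds reported here.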
Results: D-Cache
Precharges reduced:
- on average by 89%
- by >85% for all benchmarks but vpr

[Figure: fraction of subarray precharges per benchmark, 0–50% scale (ammp, art, bh, bisort, bzip2, em3d, equake, gcc, health, mcf, mesa, treeadd, tsp, vortex, vpr, wupwise)]
Results: I-Cache
Precharges reduced:
- on average by 92%
- by >90% for 13 benchmarks

[Figure: fraction of subarray precharges per benchmark, 0–50% scale (ammp, art, bh, bisort, bzip2, em3d, equake, gcc, health, mcf, mesa, treeadd, tsp, vortex, vpr, wupwise)]