Leveraging High Performance Data Cache Techniques to Save Power in Embedded Systems



  1. 4/26/2012 Leveraging High Performance Data Cache Techniques to Save Power in Embedded Systems. Major Bhadauria, Sally A. McKee, Karan Singh, Gary S. Tyson
     Process Technology Leakage Problem
      Lower operating voltage
      Lower transistor threshold
      Exponential increase in leakage
     [Figure: Leakage vs. Temperature, plotting Ioff (nA/um) against temperature (30-110 C) for 0.25um, 0.18um, 0.13um, and 0.1um process technologies]
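The slide's point, that leakage grows exponentially with temperature, can be illustrated with a toy model. A commonly cited rule of thumb is that subthreshold leakage roughly doubles every ~10 degrees C; the reference values (`i0`, `t0`, `doubling_c`) below are illustrative assumptions, not numbers taken from the slide's measured curves.

```python
import math


def leakage_nA_per_um(temp_c, i0=1.0, t0=30.0, doubling_c=10.0):
    """Toy exponential leakage model: current doubles every `doubling_c`
    degrees C. `i0` is the (hypothetical) leakage at reference temperature
    `t0`; real Ioff also depends on process node and threshold voltage."""
    return i0 * 2.0 ** ((temp_c - t0) / doubling_c)


# Under this model, leakage grows 256x between 30 C and 110 C,
# matching the slide's "exponential increase in leakage" message.
ratio = leakage_nA_per_um(110.0) / leakage_nA_per_um(30.0)
```

This is why static (leakage) power, unlike dynamic power, worsens as the chip heats up, motivating the drowsy-cache techniques on the following slides.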

  2. Outline
      Cache Power Reduction Solutions
      Leakage Issue
      Possible Solutions
      Our Reuse Distance (RD) Policy
      Energy and Delay Performance
      Future Work
     Cache Power Reduction
      Reduce Dynamic Power
        Partition caches horizontally via cache banking or region caches [lee+cases00]
        Partition caches vertically using filter caches or line buffers [kamble+islped97, kin+ieeetc00]
      Reduce Static Power
        Utilize high-VT threshold transistors
        Dynamically turn off dead lines [kaxiras+isca01]
        Dynamically put unused lines to sleep [flautner+isca02]

  3. Region Caches
      Partition the data cache into stack, global, and heap regions*
      Steer accesses to cache structures using the virtual address*
     Multiple Access Caches
      Target way-associative performance without power overhead:
        Column-associative caches check a secondary cache line on a miss; an extra bit indicates whether the tag line was hashed
        MRU two-way associative caches check cache ways sequentially rather than in parallel; an extra bit records the MRU way
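The MRU two-way scheme above probes the ways one at a time, most-recently-used way first, so a hit in the MRU way costs a single way access instead of powering both ways in parallel. A minimal sketch, assuming a simple tag-only model (class and field names are mine; the slide only describes the idea):

```python
class MRUTwoWayCache:
    """Sketch of an MRU two-way set-associative cache with sequential
    way probing. One MRU bit per set records which way to probe first;
    `probes` counts way accesses as a rough proxy for dynamic energy."""

    def __init__(self, num_sets):
        self.num_sets = num_sets
        self.tags = [[None, None] for _ in range(num_sets)]
        self.mru = [0] * num_sets          # per-set MRU bit
        self.probes = 0                    # total way probes performed

    def access(self, addr):
        s = addr % self.num_sets
        tag = addr // self.num_sets
        first = self.mru[s]
        for way in (first, 1 - first):     # sequential: MRU way first
            self.probes += 1
            if self.tags[s][way] == tag:
                self.mru[s] = way          # hit: refresh the MRU bit
                return True
        victim = 1 - self.mru[s]           # miss: fill the non-MRU way
        self.tags[s][victim] = tag
        self.mru[s] = victim
        return False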

  4. Leakage Reduction
      High-VT Static Solution
        Replace transistors with high-VT ones
        Static increase in latency
      Gated-VDD Decay Caches (state-losing)
        Turn off unused cache lines (loses data)
        Requires sleeper transistors
      Adaptive Body Biasing (ABB) & Drowsy Caches (retain state)
        Significant delay and dynamic power consumption between wakeups for ABB
        Requires a special manufacturing process for ABB
        DVS for leakage reduction with drowsy caches
        Extra circuitry required for both
     Previous Drowsy Leakage Policies
      Simple
        Turn off all cache lines every X cycles
        Little overhead; power/performance is variable
      No Access
        Turn off a cache line if not accessed within X cycles
        Counters required per cache line
      Reuse Most Recently On (RMRO)
        No Access policy applied specifically to cache ways
        Requires a few bits per cache set, only one counter
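The No Access policy above can be sketched in simulation form: each line remembers when it was last touched, and a periodic sweep puts any line idle longer than the window into drowsy (low-voltage, state-retaining) mode. A drowsy hit pays a wakeup penalty. All structure and names here are my own illustration of the slide's description:

```python
class NoAccessDrowsy:
    """Sketch of the 'No Access' drowsy policy: a cache line is put into
    drowsy mode if it has not been accessed within `window` cycles.
    One counter (here, a last-access timestamp) per line, as the slide
    notes; real hardware would use small saturating counters instead."""

    def __init__(self, num_lines, window):
        self.window = window
        self.last_access = [0] * num_lines
        self.drowsy = [False] * num_lines

    def access(self, line, cycle):
        wakeup_penalty = self.drowsy[line]   # drowsy hit costs extra cycle(s)
        self.drowsy[line] = False
        self.last_access[line] = cycle
        return wakeup_penalty

    def tick(self, cycle):
        # Periodic sweep: any line idle for `window` or more cycles goes drowsy.
        for i, last in enumerate(self.last_access):
            if cycle - last >= self.window:
                self.drowsy[i] = True
```

The per-line bookkeeping is exactly the cost the slide flags ("counters required per cache line"), which is what RMRO and the RD policy on the next slide try to reduce.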

  5. Reuse Distance (RD) Policy
      Measures time using cache accesses to increment counters
      Keeps only the last N accesses "awake" for an RD of size N
      Ensures only N lines are ever awake
      Clock-cycle independent
      Gives an upper bound for the power envelope
     Reuse Distance (RD) LRU
      True LRU is too expensive; substitute with:
        Quasi-LRU via saturating counters
        Close approximations via timestamps
     [Figure: RD example with N=4, showing per-line counters, drowsy-bit checks on cache accesses, and counter increments]
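The RD policy above can be sketched with the saturating-counter (quasi-LRU) variant: time advances per cache access rather than per clock cycle, each access zeroes the touched line's counter and increments the others, and any counter that saturates at N puts its line to sleep. The exact counter management is my reconstruction of the slide's description:

```python
class ReuseDistanceDrowsy:
    """Sketch of the Reuse Distance (RD) policy: only the N most
    recently accessed lines stay awake. Each line has a saturating
    counter (a quasi-LRU stand-in for true LRU); on every cache access
    the touched line's counter resets to 0 and all other counters
    increment, going drowsy once they reach N. Because counting is per
    access, not per cycle, the policy is clock-cycle independent and at
    most N lines are ever awake, bounding the power envelope."""

    def __init__(self, num_lines, n):
        self.n = n
        self.counters = [n] * num_lines      # start saturated: all drowsy
        self.drowsy = [True] * num_lines

    def access(self, line):
        was_drowsy = self.drowsy[line]       # drowsy hit -> wakeup penalty
        for i in range(len(self.counters)):
            if i == line:
                self.counters[i] = 0
                self.drowsy[i] = False
            elif self.counters[i] < self.n:  # saturating increment
                self.counters[i] += 1
                if self.counters[i] >= self.n:
                    self.drowsy[i] = True
        return was_drowsy

    def awake_count(self):
        return sum(not d for d in self.drowsy)
```

Since each access assigns counter value 0 to one line and shifts the others up, the awake lines always hold distinct counter values in 0..N-1, which is what enforces the "only N lines ever awake" guarantee.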

  6. We Apply
      Region caches with the heap cache size reduced by half, plus a multiple-access cache to retain performance
      Drowsy cache using the RD policy
      Target embedded architecture and applications
     Experimental Setup
      Alpha 21264 architecture/ISA
      HotLeakage simulator
      1.5 GHz, 70 nm, 80 degrees C
     Simulator Parameters
      SPEC2000 benchmarks using SimPoints
      Two-level cache hierarchy
        32KB, 4-way L1 D-cache with 32-byte lines (1-cycle latency)
        4-way unified L2: 256KB/512KB/1MB/2MB
      Drowsy policies
        Simple policy, 4K cycles (NoAccess omitted)
        RMRO 256
        RD 15

  7. Column Associative and MRU [figure slide]

  8. Reuse Coverage; Performance
     [Figure: IPC normalized to DM Simple (range 0.97-0.992) for the simple and RD policies on CA and MRU caches]

  9. Dynamic Energy
     [Figure: power consumption normalized to a simple direct-mapped cache for simple 2-way associative, simple column-associative, and simple MRU configurations]
     Static Energy
     [Figure: leakage normalized to a non-drowsy DM cache (simple vs. RD) for the heap, stack, and global regions]

  10. Total Power Consumption
      [Figure: total cache power normalized to a non-drowsy DM cache (simple vs. RD) for DM, CA, and MRU configurations]
      Conclusion
       Cache Power Reductions
         Dynamic power reductions achieved via multiple-access caches
         Significant leakage reduction through the RD policy
         Minimal performance degradation
       Future Work
         Investigate cache interaction in CMP systems
         Use compiler hints for static cache assignments

  11. Q&A
