Mellow Writes: Extending Lifetime in Resistive Memories through Selective Slow Write Backs


Mellow Writes: Extending Lifetime in Resistive Memories through Selective Slow Write Backs Lunkai Zhang, Brian Neely, Diana Franklin, Dmitri Strukov, Yuan Xie, Frederic T. Chong Presented at ISCA 2016 EECS 573, University of Michigan, Ann


slide-1
SLIDE 1

Mellow Writes: Extending Lifetime in Resistive Memories through Selective Slow Write Backs

Lunkai Zhang, Brian Neely, Diana Franklin, Dmitri Strukov, Yuan Xie, Frederic T. Chong

Presented at ISCA 2016

EECS 573, University of Michigan, Ann Arbor
Presented by Nishil Talati and Tarunesh Verma

slide-2
SLIDE 2

Executive Summary

  • DRAM scaling problem
  • RRAM to the rescue
  • Improve RRAM lifetime

Trade off performance for extended lifetime

slide-3
SLIDE 3

RRAM to the Rescue

Resistive RAM (RRAM)

  • More scalable compared to DRAM
  • Does not solve the DRAM scaling problem
  • Shortcomings
      – Longer write latency
      – Higher write energy
      – Limited endurance
slide-4
SLIDE 4

Trade-off Between Endurance and Write Latency in RRAM

  • D. B. Strukov, “Endurance-write-speed tradeoffs in nonvolatile memories,” Applied Physics A, vol. 122, no. 4, pp. 1–4, 2016.

For typical resistive memory technologies, slower writes are predicted to have a quadratic endurance advantage!

Write operation:

  • High power: short time, low endurance
  • Low power: long time, high endurance

(Courtesy: Zhang et al., ISCA 2016)

slide-5
SLIDE 5

Is Having a Single Write Latency Wise?

  • Is it possible to let a system adaptively use different write latencies to improve the lifetime without loss of performance?

Short write latency

  • Limited lifetime

Long write latency

  • Poor performance
slide-6
SLIDE 6

Motivation – Typical Bank Utilization Trends

  • Memory banks are idle for most of the time
  • Is it possible to use the bank idle time to slowly write back the data?

(Courtesy: Zhang et al., ISCA 2016)

slide-7
SLIDE 7

Proposed Schemes

  • Mellow Writes
      – Bank-Aware Mellow Writes
      – Eager Mellow Writes
  • Wear Quota

slide-9
SLIDE 9

Motivation: Bank Level Imbalance

  • Bank 0 has only 1 memory block to be written back
  • Bank 2 has more memory blocks to be written back

[Figure: number of awaiting writes per bank; values shown: 3, 1, 2, 1]

(Courtesy: Zhang et al., ISCA 2016)

Intuition: slow writes to bank 0 and fast writes to bank 2

slide-10
SLIDE 10

Bank-Aware Mellow Writes

  • Proposed Approach: slowly write back a memory block only when there is no other memory block queued for the same bank

[Figure: number of awaiting writes per bank; values shown: 3, 1, 2, 1]

  • Write back the only memory block for Bank 0 at slow speed
  • Write back the current memory block for Bank 2 at normal speed

(Courtesy: Zhang et al., ISCA 2016)
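
The bank-aware rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' controller logic: the function name, the queue representation (a list of target bank numbers), and the example queue contents are all assumptions made here.

```python
from collections import Counter

NORMAL, SLOW = "normal", "slow"

def choose_write_speed(bank, write_queue_banks):
    """Pick the speed for the next write back to `bank`.

    `write_queue_banks` (hypothetical representation) lists the target bank
    of every queued write back, including the one about to be issued.
    The write is slow only if no other write back is queued for that bank.
    """
    pending = Counter(write_queue_banks)
    return SLOW if pending[bank] <= 1 else NORMAL

# Hypothetical queue: bank 0 has one pending write back, bank 2 has three.
queue = [0, 2, 2, 2, 1, 3, 3]
speeds = {bank: choose_write_speed(bank, queue) for bank in sorted(set(queue))}
print(speeds)
```

With this queue, banks 0 and 1 get slow writes and banks 2 and 3 get normal-speed writes.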

slide-11
SLIDE 11

Simulated System

  • OoO Alpha core
  • 32KB L1 I/D-$, 256KB L2$, 2MB L3$ (LLC)
  • 4GB Resistive Main Memory (ReRAM technology), 16 banks (across 4 ranks), 32-entry read/write queues, write drain, Start-Gap wear leveling
  • Write modes (1.00x latency = 150 ns, 1.00x endurance = 5.0 × 10^6):
      – Norm Writes (1.0x): 1.00x latency, 1.00x endurance
      – Slow Writes (3.0x): 3.00x latency, 9.00x endurance
  • Eight-year lifetime requirement
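
The two write modes are consistent with the quadratic endurance model cited earlier in the deck: a 3x slower write gives 3^2 = 9x endurance. A minimal sketch of that arithmetic follows; the function name and the `exponent` parameter are assumptions added here so that the linear and cubic cases mentioned in the backup slides can also be tried.

```python
BASE_LATENCY_NS = 150.0   # 1.00x write latency from the simulated system
BASE_ENDURANCE = 5.0e6    # 1.00x endurance from the simulated system

def slow_write(slowdown, exponent=2.0):
    """Return (latency_ns, endurance) for a write slowed down by `slowdown`.

    Assumes endurance scales as slowdown ** exponent; exponent=2 is the
    quadratic case used for the slow writes in the simulated system.
    """
    latency_ns = BASE_LATENCY_NS * slowdown
    endurance = BASE_ENDURANCE * slowdown ** exponent
    return latency_ns, endurance

latency_ns, endurance = slow_write(3.0)
print(latency_ns, endurance)  # 450.0 45000000.0
```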

slide-12
SLIDE 12

Effectiveness of Bank-Aware Mellow Writes

  • No noticeable performance degradation.
  • Geomean 87% lifetime improvement compared with All-Norm.
  • 4 out of 11 applications meet the 8-year lifetime requirement.

(Courtesy: Zhang et al., ISCA 2016)

slide-13
SLIDE 13

Schemes

  • Mellow Writes
      – Bank-Aware Mellow Writes
      – Eager Mellow Writes
  • Wear Quota

slide-14
SLIDE 14

Motivation: Write Scheduling Imbalance in a Memory Bank

[Figure: write timeline of one memory bank with Bank-Aware Mellow Writes, with intervals labeled "Wasted!" and "Too Crowded!", compared against a timeline in which the writes are evenly rescheduled]

Is it possible to reschedule the writes?

(Courtesy: Zhang et al., ISCA 2016)

slide-15
SLIDE 15

Eager Mellow Writes

  • Predict which dirty cache lines in the Last Level Cache will not be written again before their eviction, and eagerly and slowly write back these cache lines
  • In some sense, we treat the Last Level Cache as a large write buffer, in which we find suitable write backs to fill the idle memory intervals

slide-16
SLIDE 16

Choosing Cache Lines for Eager Mellow Writes

  • This paper chooses dirty cache lines that are predicted to be useless as the candidates for Eager Mellow Writes, that is, cache lines that will not be accessed again before their eviction

[Figure: Last Level Cache with Set 0 to Set 3; each line is marked "Predicted useless" or "Predicted useful"; predicted-useless lines are candidates for Eager Mellow Writes if dirty]

(Courtesy: Zhang et al., ISCA 2016)

slide-17
SLIDE 17

A Utility Based Approach To Predict Useless Cache Lines

For an LRU Set-associative Last Level Cache (LLC):

  • Add an access counter for each LRU stack position in the LLC
  • Increment the corresponding access counter whenever an access hits on an LRU position
  • Every time slice (500,000 cycles), choose the consecutive least-used LRU positions whose counters sum to less than 1/32 of the LLC accesses
  • In the next time slice, consider the cache lines at these LRU positions useless; they can be eagerly written back

Moinuddin K. Qureshi & Yale N. Patt, “Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches”, MICRO'06.

(Courtesy: Zhang et al., ISCA 2016)
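
The end-of-slice selection step can be sketched as follows. This is one reading of the slide, not the paper's hardware: it assumes that "consecutive least-used LRU positions" means a run starting at the LRU end of the stack, and the function name, argument layout, and example counter values are all hypothetical.

```python
def useless_lru_positions(hits_per_position, total_accesses, fraction=1 / 32):
    """Return the LRU stack positions to treat as useless in the next slice.

    hits_per_position[0] is the MRU position and the last entry is the LRU
    position (assumed layout). Starting from the LRU end, positions are added
    while their summed hit counters stay below `fraction` of all LLC accesses.
    """
    budget = total_accesses * fraction
    useless = []
    running = 0
    for pos in range(len(hits_per_position) - 1, -1, -1):
        running += hits_per_position[pos]
        if running >= budget:
            break
        useless.append(pos)
    return sorted(useless)

# Hypothetical counters for an 8-way LLC over one 500,000-cycle time slice.
hits = [5000, 2000, 600, 250, 100, 30, 15, 5]
positions = useless_lru_positions(hits, total_accesses=9600)
print(positions)  # [4, 5, 6, 7]
```

Here the budget is 9600 / 32 = 300 accesses, so the four positions nearest the LRU end (150 hits in total) qualify, and dirty lines sitting in them become candidates for eager slow write back.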

slide-18
SLIDE 18

Architectural Modifications

  • Eager Mellow Write Requests
  • Eager Mellow Queue
      – Lowest priority, no write drains, just slow writes

(Courtesy: Zhang et al., ISCA 2016)

slide-19
SLIDE 19

Effectiveness of Eager Mellow Writes

  • No performance degradation, and even some performance benefit.
  • Geomean 158% lifetime improvement compared with All-Norm.
  • 6 out of 11 applications meet the 8-year lifetime requirement.
  • 5 applications still suffer from short lifetime!

(Courtesy: Zhang et al., ISCA 2016)

slide-20
SLIDE 20

Schemes

  • Mellow Writes
      – Bank-Aware Mellow Writes
      – Eager Mellow Writes
  • Wear Quota

slide-21
SLIDE 21

  • Partition the time into time slices
  • Wear Quota (per bank): the average available wear of each time slice

[Figure: the total amount of available wear of the resistive main memory spread over the expected lifetime, giving one Wear Quota per time slice]

(Courtesy: Zhang et al., ISCA 2016)

slide-22
SLIDE 22

Wear Quota

  • Time Slice 1: Mellow Writes policy; wear within the Wear Quota
  • Time Slice 2: Mellow Writes policy; wear exceeding the Wear Quota
  • Time Slice 3: All-Slow Writes policy; wear within the Wear Quota
  • Time Slice 4: Mellow Writes policy; wear within the Wear Quota

(Courtesy: Zhang et al., ISCA 2016)
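
A rough sketch of the Wear Quota idea, under stated assumptions: the quota is the total available wear divided by the number of time slices in the expected lifetime, and a slice whose wear exceeds the quota forces the All-Slow policy in the following slice, as drawn on this slide. The function names, units, and numbers are hypothetical, and the actual scheme may track wear differently (for example, cumulatively).

```python
def wear_quota_per_slice(total_available_wear, expected_lifetime, slice_length):
    """Average wear a bank may consume in one time slice."""
    num_slices = expected_lifetime / slice_length
    return total_available_wear / num_slices

def choose_policies(wear_per_slice, quota):
    """Return the policy used in each slice, given the wear seen in each slice.

    The first slice starts with the Mellow Writes policy; every later slice
    uses All-Slow Writes if the previous slice exceeded the quota.
    """
    policies = ["mellow"]
    for wear in wear_per_slice[:-1]:
        policies.append("all-slow" if wear > quota else "mellow")
    return policies

# Hypothetical numbers in arbitrary units, chosen to mirror the slide.
quota = wear_quota_per_slice(total_available_wear=1000.0,
                             expected_lifetime=100.0, slice_length=10.0)
policies = choose_policies([80.0, 130.0, 60.0, 90.0], quota)
print(quota, policies)
```

Slice 2 exceeds the quota of 100 units, so only slice 3 falls back to All-Slow Writes; the other slices keep the Mellow Writes policy.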

slide-23
SLIDE 23

Effectiveness of Wear Quota

  • All 11 applications meet the 8-year lifetime requirement.
  • Does not degrade the performance if the lifetime requirement is already met.
  • Degrades the performance only when necessary!

(Courtesy: Zhang et al., ISCA 2016)

slide-24
SLIDE 24

Technical Insights and Conclusion

– A dynamic trade-off between write latency and endurance.
– Two Mellow Writes schemes which improve the lifetime without sacrificing the performance.
– Wear Quota scheme which guarantees a minimal lifetime with relatively small performance loss.
– Low hardware overhead and easy to implement.

slide-25
SLIDE 25

Discussion

  • System biased to support their hypothesis
      – Single-core vs. multi-core environment
      – Open page vs. relaxed closed page policy
  • Memory-controller-level write forwarding for the eager mellow write queue
  • Performance improves in case of eager mellow writes
  • Wear quota vs. remapping dead memory cells

slide-26
SLIDE 26

Backup Slides


slide-27
SLIDE 27

How About Energy?

  • Operation level: a 3x slow write consumes 66% more energy than a normal write.
  • Total memory energy consumption of the execution: on average, less than 50% more memory energy compared with the All-Norm policy. An affordable cost compared with the lifetime benefit.

(Courtesy: Zhang et al., ISCA 2016)

slide-28
SLIDE 28

Sensitivity to Analytic Model

  • In a typical ReRAM technology, compared with default-speed writes, slow writes are predicted to achieve a quadratic endurance benefit. Based on a wider range of device parameters, the endurance benefit could be anywhere from linear to cubic.
  • What will happen if we have a different endurance benefit?
  • Even with a pessimistic linear endurance benefit, we can still achieve 47% lifetime improvement.

  • D. B. Strukov, “Endurance-write-speed tradeoffs in nonvolatile memories,” Applied Physics A, vol. 122, no. 4, pp. 1–4, 2016.

(Courtesy: Zhang et al., ISCA 2016)