Amnesic Cache Management for Non-Volatile Memory, Dongwoo Kang et al. (PowerPoint presentation)



slide-1
SLIDE 1

Amnesic Cache Management for Non-Volatile Memory

Dongwoo Kang, Seungjae Baek, Jongmoo Choi
Dankook University, South Korea
{kangdw, baeksj, chiojm}@dankook.ac.kr

Donghee Lee
University of Seoul, South Korea
dhl_express@uos.ac.kr

Sam H. Noh
Hongik University, South Korea
samhnoh@hongik.ac.kr

Onur Mutlu
Carnegie Mellon University, USA
onur@cmu.edu

slide-2
SLIDE 2

Outline

Introduction & Motivation
Design
Evaluation
Conclusion

slide-3
SLIDE 3

Introduction : Volatility

Non-Volatile Memory

  • PCM (Phase Change Memory), STT-RAM (Spin Transfer Torque RAM), RRAM (Resistive RAM), Fe-RAM (Ferroelectric Random Access Memory)
  • Byte addressability and non-volatility
  • Uses: RAM, storage, file cache, CPU cache

Figure: volatility spectrum from volatile to non-volatile: DRAM, NVM (STT-RAM, PCM, ...), SSD & Flash, hard disk

slide-4
SLIDE 4

Introduction : Volatility

Non-Volatile Memory

  • PCM (Phase Change Memory), STT-RAM (Spin Transfer Torque RAM), RRAM (Resistive RAM), Fe-RAM (Ferroelectric Random Access Memory)
  • Byte addressability and non-volatility
  • Uses: RAM, storage, file cache, CPU cache
  • Limited retention capability, relaxation write

Figure: volatility spectrum from less retentive to more retentive: DRAM, NVM (STT-RAM, PCM, ...), SSD & Flash, hard disk; retention labels along the axis: 64 ms, 1 year, 10¹⁵ seconds

slide-5
SLIDE 5

Introduction : Phase Change Memory

States of PCM (Phase Change Memory)

Target band
  • A region of resistances that corresponds to valid bits

Write scheme
  • PCM adopts an iterative write scheme

Resistance drift
  • The resistance of a PCM cell tends to increase with time
  • When the resistance drifts up to the boundary of the next region, the state can be read incorrectly, leading to data loss

Figure: cell distribution over resistance for states '11', '10', '01', '00', annotated with margin, target band, and resistance drift

slide-6
SLIDE 6

Introduction : Tradeoff

Tradeoff between retention capability and write speed

Narrowing target bands
  • Requires more precise control over the iterative write mechanism
  • Demands a smaller ∆R, which increases write latency

Higher retention increases write latency
  • A 1.7x write speedup can be obtained by reducing the retention capability of PCM from 10⁷ to 10⁴ seconds [Liu et al.]

Figure (source: Liu et al.): two cell-distribution panels for states '11', '10', '01', '00'; write speedup (normalized performance, axis 0.55 to 2.2) versus non-volatility from 10⁷ down to 10² seconds

How can these characteristics of PCM be exploited?

slide-7
SLIDE 7

Motivation : What about NVM cache?

NVM cache
  • Employing an NVM cache provides performance improvements
  • Data is fetched from and evicted to the storage system

Retention capability for the cache
  • JEDEC recommends a retention capability of 10⁷ seconds
  • But data will be evicted from the NVM cache
  • Retention only needs to be ensured while the data is in the cache

How much retention capability does the NVM cache require?

Figure: application, NVM cache, and storage; data is fetched into the cache, spends some time in the cache, and is then evicted, although it was written with long retention capability

slide-8
SLIDE 8

Motivation : Caching time

Caching time on the NVM cache

  • We measure the caching time under the LRU scheme
  • For 75% of the data, the caching time is less than 10⁵ seconds
  • There is no need to ensure 10⁷ seconds of retention capability in the cache

T_Caching = T_Evict − T_First

Figure: caching time in seconds (log scale, 1 to 10⁶) versus cache size (128MB to 4GB), showing quartiles and median, for the hm_0 and proj_3 traces
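
The caching-time measurement above can be sketched in a few lines. This is only an illustration under assumptions not stated on the slide: the trace is a list of `(timestamp_seconds, block_id)` pairs, the cache capacity is counted in blocks, and the function name `lru_caching_times` is hypothetical. It records T_Caching = T_Evict − T_First for every block that an LRU cache evicts.

```python
from collections import OrderedDict

def lru_caching_times(trace, capacity):
    """Return T_Caching = T_Evict - T_First for every block evicted by LRU.

    trace: iterable of (timestamp_seconds, block_id), sorted by time (assumed format).
    capacity: number of blocks the cache can hold (assumed unit).
    """
    cache = OrderedDict()  # block_id -> time the block first entered the cache
    caching_times = []
    for now, block in trace:
        if block in cache:
            cache.move_to_end(block)  # hit: mark as most recently used
            continue
        if len(cache) >= capacity:
            # Cache full: evict the least recently used block and record its stay.
            _, t_first = cache.popitem(last=False)
            caching_times.append(now - t_first)
        cache[block] = now  # miss: insert and remember the first-entry time
    return caching_times

trace = [(0, "a"), (10, "b"), (20, "a"), (30, "c"), (45, "d")]
print(lru_caching_times(trace, capacity=2))
```

With this toy trace, block "b" is evicted at time 30 after 20 seconds in the cache and block "a" at time 45 after 45 seconds. Blocks still cached when the trace ends are not counted in this sketch.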

slide-9
SLIDE 9

Motivation : Reference interval

Reference interval

  • 90% of data are re-referenced within a 10⁵ second interval
  • Retention relaxation can enhance write performance
  • However, when data is re-referenced after its retention time has expired, the access misses, which reduces the hit ratio and triggers extra accesses to retrieve the data from storage

Figure: percentage of reference intervals per workload (usr0, stg0, src20, hm0, mds0, prn0, prn1, proj3), broken down into intervals of up to 10², 10³, 10⁴, 10⁵, and 10⁶ seconds
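
A reference-interval breakdown of this kind could be computed roughly as follows. This is a sketch, not the authors' tooling: the trace format `(timestamp_seconds, block_id)` and the helper names are assumptions, and each interval is assigned to the smallest power-of-ten bound (10² to 10⁶ seconds) that covers it.

```python
from collections import Counter

EXPONENTS = range(2, 7)  # bucket bounds 10^2 .. 10^6 seconds, as in the figure

def decade_bucket(interval):
    """Return the exponent k of the smallest bound 10**k that covers the interval.

    Intervals longer than 10^6 seconds fall into an overflow bucket (exponent 7),
    which is an assumption of this sketch.
    """
    for k in EXPONENTS:
        if interval <= 10 ** k:
            return k
    return EXPONENTS[-1] + 1

def reference_interval_histogram(trace):
    """Count re-reference intervals per bucket for a time-sorted trace."""
    last_seen = {}      # block_id -> time of the previous reference
    buckets = Counter()
    for now, block in trace:
        if block in last_seen:
            buckets[decade_bucket(now - last_seen[block])] += 1
        last_seen[block] = now
    return dict(buckets)

trace = [(0, "a"), (50, "a"), (60, "b"), (5050, "a"), (200060, "b")]
print(reference_interval_histogram(trace))
```

Dividing each count by the total number of re-references gives the percentages plotted per workload.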

slide-10
SLIDE 10

Outline

Introduction & Motivation
Design
  • REF
  • SACM
  • AACM
Evaluation
Conclusion

slide-11
SLIDE 11

Design : REF

REF (REFresh-based cache management scheme)

  • REF is similar to the LRU scheme
  • Two states: Free and Used
  • Enhances write speed by relaxing retention capability from 10⁷ to 10⁴ seconds
  • Write latency is reduced by a factor of 1.7
  • Performs refreshing for data whose retention time is about to expire

Issue
  • Refresh operations

State diagram: Free → Used on a relaxation write; Used → Used on a refresh with a relaxation write; Used → Free on eviction

slide-12
SLIDE 12

Design : SACM

Simple Amnesic Cache Management

Free State to Tentative State
  • On the initial write into the cache, the datum is written with the relaxed write (10⁴ seconds retention)

Tentative State to Confirmed State
  • If the datum is referenced again within the retention time, it is rewritten with 10⁷ seconds retention capability

Confirmed State to Free State
  • If the datum is not referenced again and the retention time expires

Issue
  • Additional writes

State diagram: Free → Tentative on a relaxation write; Tentative → Confirmed on a cache hit with a default write; two Expired transitions lead back to Free
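
The SACM transitions can be illustrated with a small state machine for a single cached datum. This is a minimal sketch under assumptions: the class and method names are hypothetical, time is given in seconds, expiry is checked lazily on access, and capacity-driven eviction is left out. The 10⁴ and 10⁷ second retention values come from the slide.

```python
from enum import Enum

RELAXED_RETENTION = 10 ** 4  # seconds, relaxed write (value from the slide)
DEFAULT_RETENTION = 10 ** 7  # seconds, default write (value from the slide)

class State(Enum):
    FREE = "free"
    TENTATIVE = "tentative"
    CONFIRMED = "confirmed"

class SacmEntry:
    """One cache entry following the Free/Tentative/Confirmed transitions."""

    def __init__(self):
        self.state = State.FREE
        self.expires_at = None
        self.writes = 0  # counts cache writes, including the extra rewrite

    def _expire_if_due(self, now):
        # Tentative and Confirmed data both fall back to Free once retention expires.
        if self.state is not State.FREE and now >= self.expires_at:
            self.state = State.FREE
            self.expires_at = None

    def access(self, now):
        """Reference the datum at time `now`; return True on a hit, False on a miss."""
        self._expire_if_due(now)
        if self.state is State.FREE:
            # Miss: the datum is fetched and written with the relaxed write.
            self.state = State.TENTATIVE
            self.expires_at = now + RELAXED_RETENTION
            self.writes += 1
            return False
        if self.state is State.TENTATIVE:
            # Hit within the retention time: rewrite with default retention.
            self.state = State.CONFIRMED
            self.expires_at = now + DEFAULT_RETENTION
            self.writes += 1
        return True

entry = SacmEntry()
print(entry.access(0), entry.state)    # first reference misses
print(entry.access(100), entry.state)  # re-reference within 10^4 seconds hits
print(entry.writes)                    # the confirmation costs a second write
```

The second write on confirmation is the "additional writes" issue listed on the slide.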

slide-13
SLIDE 13

Design : AACM (1/2)

Adaptive Amnesic Cache Management

Key idea
  • Estimate the next reference of each datum and write adaptively
  • Estimation by the IRG (inter-reference gap) model
  • Adaptive write: ensure an appropriate retention capability for each datum
  • Ghost buffer

Issue
  • Adaptive write
  • Estimation

State diagram: states Free, Tentative, and Confirmed (based on IRG); transitions labeled relaxation write, expired (twice), cache hit & adaptive write, and ghost hit & adaptive write

slide-14
SLIDE 14

Design : AACM (2/2)

Estimation of IRG

  • Uses a first-order Markov chain to estimate the IRG
  • Coarse-grained levels: 10², 10³, 10⁴, 10⁵, 10⁶, 10⁷ seconds
  • Estimation accuracy is higher than 90%
  • Memory overhead is 144 bytes per datum

Figure: estimation accuracy (0% to 100%) per workload (usr0, stg0, src20, hm0, mds0, prn0, prn1, homes, webmail, wm+online)
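
One way to picture a first-order Markov estimate over the six coarse levels is a per-datum transition-count table: the level of the last observed gap selects a row, and the most frequent successor level is the prediction. The sketch below is an assumption about how such an estimator could look, not the authors' implementation. The names `IrgPredictor`, `observe`, and `predict_retention` are hypothetical, and falling back to 10⁴ seconds when there is no history is a choice made here.

```python
LEVELS = [10 ** k for k in range(2, 8)]  # 10^2 .. 10^7 seconds, as on the slide

def to_level(interval):
    """Map an inter-reference gap to the index of the smallest level covering it."""
    for i, bound in enumerate(LEVELS):
        if interval <= bound:
            return i
    return len(LEVELS) - 1  # longer gaps are clamped to the top level (assumption)

class IrgPredictor:
    """Per-datum first-order Markov chain over coarse IRG levels."""

    def __init__(self):
        n = len(LEVELS)
        self.counts = [[0] * n for _ in range(n)]  # counts[prev_level][next_level]
        self.last_level = None
        self.last_ref = None

    def observe(self, now):
        """Record a reference to the datum at time `now` (seconds)."""
        if self.last_ref is not None:
            level = to_level(now - self.last_ref)
            if self.last_level is not None:
                self.counts[self.last_level][level] += 1
            self.last_level = level
        self.last_ref = now

    def predict_retention(self):
        """Return the retention (seconds) expected to cover the next gap."""
        if self.last_level is None:
            return LEVELS[2]  # no history yet: assume the relaxed 10^4 write
        row = self.counts[self.last_level]
        if sum(row) == 0:
            return LEVELS[self.last_level]  # unseen state: repeat the last level
        best = max(range(len(row)), key=lambda j: row[j])
        return LEVELS[best]

p = IrgPredictor()
for t in (0, 50, 5050, 5100, 10100):  # gaps alternate between 50 s and 5000 s
    p.observe(t)
print(p.predict_retention())
```

After the last 5000-second gap the table has only seen a short gap follow a long one, so the sketch predicts the 10² level. A 6×6 table of 4-byte counters would occupy 144 bytes, which happens to match the overhead quoted above, although the slide does not state the actual layout.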

slide-15
SLIDE 15

Outline

Introduction & Motivation
Design
Evaluation
Conclusion

slide-16
SLIDE 16

Evaluation : Environment

Simulator
  • Time-accurate in-house simulator
  • Storage simulator and trace replayer

Trace
  • MSR-Cambridge traces, 7 days
  • FIU traces, 21 days
  • Websearch3 trace, 3.1 days

Simulator parameters

                   PCM         SSD
  Read latency     16 us       50 us
  Write latency    91.2 us     900 us
  Read energy      81.9 nJ     14.25 uJ
  Write energy     4.73 uJ     256 uJ

  Retention (seconds)    Write speedup
  10⁷                    1X
  10⁶                    1.2X
  10⁵                    1.5X
  10⁴                    1.7X
  10³                    1.9X
  10²                    2.1X
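
The retention/speedup table can be turned into an effective write latency per retention level. The sketch below assumes that the 91.2 us PCM write latency corresponds to the default 10⁷ second retention and that a speedup of s simply divides that latency by s. Neither assumption is spelled out on the slide, and the function name is hypothetical.

```python
PCM_WRITE_LATENCY_US = 91.2  # from the simulator parameters; assumed to be the 10^7 s write

# Retention (seconds) -> write speedup, copied from the table above.
SPEEDUP = {
    10 ** 7: 1.0,
    10 ** 6: 1.2,
    10 ** 5: 1.5,
    10 ** 4: 1.7,
    10 ** 3: 1.9,
    10 ** 2: 2.1,
}

def relaxed_write_latency_us(retention_seconds):
    """Estimated PCM write latency (us) for a given retention level."""
    return PCM_WRITE_LATENCY_US / SPEEDUP[retention_seconds]

for retention in sorted(SPEEDUP, reverse=True):
    print(f"{retention:>10} s -> {relaxed_write_latency_us(retention):.1f} us")
```

Under these assumptions a relaxed 10⁴ second write takes roughly 54 us instead of 91.2 us.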

slide-17
SLIDE 17

Evaluation : Hit ratio

Hit ratio

  • Cache size is set to 25% of the working set of each workload
  • For example, the cache size is 1.95GB for the hm_0 trace (its working set is 7.8GB)
  • Hit ratios are comparable to LRU, slightly higher or lower depending on the workload

Figure: hit ratio of LRU, REF, SACM, and AACM

slide-18
SLIDE 18

Evaluation : Latency

Latency (normalized to that of LRU)

  • REF reduces latency by as much as 48% (36% on average)
  • SACM reduces latency by as much as 7% (4% on average)
  • AACM reduces latency by up to 40% (30% on average)

Figure: latency of LRU, REF, SACM, and AACM, normalized to LRU

slide-19
SLIDE 19

Evaluation : Latency with refresh

Latency (normalized to that of LRU)

  • With refresh operations included, REF increases normalized latency by up to 6X

Figure: latency of LRU, REF, SACM, and AACM, normalized to LRU

slide-20
SLIDE 20

Evaluation : Latency with refresh (without REF)

Latency (normalized to that of LRU)

  • With refresh operations included, REF increases normalized latency by up to 6X
  • SACM and AACM still perform better than LRU, though by a smaller margin
  • SACM decreases latency by 5% on average
  • AACM decreases latency by 15% on average

Figure: latency of LRU, SACM, and AACM, normalized to LRU

slide-21
SLIDE 21

Evaluation : Endurance

Endurance

  • REF harms endurance because of its refresh operations

Figure: write counts of LRU, REF, SACM, and AACM

slide-22
SLIDE 22

Evaluation : Endurance (without REF)

Endurance

  • REF harms endurance because of its refresh operations
  • SACM shows write counts similar to LRU
  • AACM incurs roughly 1% more writes than LRU (4% at maximum)
  • Considering the MLC PCM endurance (10⁵ writes) and the total amount of writes (wm+online workload), the estimated lifetime is around 26 years

Figure: write counts of LRU, SACM, and AACM

slide-23
SLIDE 23

Evaluation : Energy consumption

Energy consumption

Energy = N_read × E_read + N_write × E_write

  • REF consumes 9 times more energy than LRU (refresh overhead)

Figure: energy consumption of LRU, REF, SACM, and AACM
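
As a worked instance of the energy formula, the sketch below plugs in the PCM per-access energies from the simulator parameters (81.9 nJ per read, 4.73 uJ per write). The access counts are made-up illustration values, not results from the evaluation, and the function name is hypothetical.

```python
PCM_READ_ENERGY_UJ = 0.0819  # 81.9 nJ per read, from the simulator parameters
PCM_WRITE_ENERGY_UJ = 4.73   # 4.73 uJ per write, from the simulator parameters

def cache_energy_uj(n_read, n_write,
                    e_read=PCM_READ_ENERGY_UJ, e_write=PCM_WRITE_ENERGY_UJ):
    """Energy = N_read * E_read + N_write * E_write, in microjoules."""
    return n_read * e_read + n_write * e_write

# Hypothetical counts: 1000 reads and 100 writes to the PCM cache.
print(f"{cache_energy_uj(1000, 100):.1f} uJ")
```

Because a single PCM write costs far more energy than a read, extra writes such as refreshes weigh heavily in this total.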

slide-24
SLIDE 24

Evaluation : Energy consumption

Energy consumption

Energy = N_read × E_read + N_write × E_write

  • REF consumes 9 times more energy than LRU (refresh overhead)
  • SACM reduces energy consumption by 11% on average
  • AACM reduces energy consumption by 37% on average (and by as much as 49%)

Figure: energy consumption of LRU, SACM, and AACM

slide-25
SLIDE 25

Evaluation : Energy consumption

Energy consumption

Energy = N_read × E_read + N_write × E_write

  • REF consumes 9 times more energy than LRU (refresh overhead)
  • SACM reduces energy consumption by 11% on average
  • AACM reduces energy consumption by 37% on average (and by as much as 49%)
  • AACM also saves 13% of energy on average across the whole storage system, because of retention relaxation and fewer SSD accesses

Figure: energy consumption of LRU, REF, SACM, and AACM

slide-26
SLIDE 26

Evaluation : Hit ratio with various cache size

Hit ratio and latency with various cache size

  • AACM performs better when the cache size is small
  • As the cache size grows, both schemes show comparable performance, since LRU also keeps most of the cacheable data

Figure: hit ratio (axis 50% to 100%) of LRU and AACM at cache sizes of 25%, 50%, and 80%, for the hm_0, mds_0, prn_0, stg_0, usr_0, and webmail workloads

slide-27
SLIDE 27

Evaluation : Latency with various cache size

Hit ratio and latency with various cache size

  • AACM performs better when the cache size is small
  • As the cache size grows, both schemes show comparable performance, since LRU also keeps most of the cacheable data
  • In terms of latency, AACM outperforms LRU at all considered cache sizes because of retention relaxation

Figure: latency at cache sizes of 25%, 50%, and 80%

slide-28
SLIDE 28

Outline

Introduction & Motivation
Design
Evaluation
Conclusion

slide-29
SLIDE 29

Conclusion

Conclusion

  • We propose new cache management schemes that introduce the amnesic notion to balance limited retention capability against write speed
  • Experimental results show that our proposal is effective in terms of performance and energy consumption
  • AACM can reduce write latency by up to 40% (30% on average)
  • AACM can also reduce energy consumption by up to 49% (37% on average)

slide-30
SLIDE 30

Q&A

slide-31
SLIDE 31

Figure: two panels of cell distribution over resistance for states '11', '10', '01', '00'