 
Adaptive policies for balancing performance and lifetime of mixed SSD arrays through workload sampling
Sangwhan Moon, A. L. Narasimha Reddy
Texas A&M University
2 / 16 Outline
• Introduction
  – Mixed SSD Arrays
  – Workload distribution of mixed SSD array
• Problem Statement
• Selective caching policies
• Our approach
  – Online sampling
  – Adaptive workload distribution
• Evaluation
• Conclusion
3 / 16 Different classes of SSDs
[Figure: Cost ($/GB, log scale from 0.1 to 100) versus Device Writes Per Day (DWPD, higher is better; log scale from 0.1 to 100), with separate groups for low-end SSDs and high-end SSDs]
4 / 16 Mixed SSD array
• High-end SSDs cache
  – Faster: PCIe interface
  – Reliable: SLC, eMLC (write endurance = 100K)
  – Expensive per gigabyte
• Low-end SSDs main storage
  – Slower: Serial ATA interface
  – Less reliable: MLC, TLC (write endurance < 30K)
  – Cheap per gigabyte
5 / 16 Workload distribution of mixed SSD array
• LRU Caching Policy
Notation:
  – Read/write workload: $r$, $w$
  – Writes per flash cell: $w_C$, $w_S$
  – Cache read/write miss rate: $m_r$, $m_w$
  – The number of SSDs: $N_C$, $N_S$
  – The capacity of SSD: $C_C$, $C_S$
  – Write endurance of cache/storage: $l_C$, $l_S$
Data flows between the high-end SSDs and the low-end SSDs:
  1. read: $r$
  2. write: $w$
  3. read miss: $m_r r$
  4. read miss: $m_r r$
  5. dirty entry eviction: $(m_r r + m_w w) \cdot d$
Writes per flash cell and lifetime:
$$w_C = \frac{m_r r + w}{N_C \cdot C_C}, \qquad w_S = \frac{m_w w}{N_S \cdot C_S}$$
$$\text{Lifetime} = \min\left(\frac{l_C}{w_C}, \frac{l_S}{w_S}\right)$$
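The lifetime model on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the `Tier` class, the function names, and the choice of units (bytes per second for workload, bytes for capacity, so the result is in seconds) are assumptions, not part of the slides.

```python
import math
from dataclasses import dataclass


@dataclass
class Tier:
    """One tier of the array (hypothetical helper type)."""
    num_ssds: int      # N: number of SSDs in the tier
    capacity: float    # C: capacity of one SSD, in bytes
    endurance: float   # l: write endurance, in writes per flash cell


def writes_per_cell(write_rate: float, tier: Tier) -> float:
    """Write rate spread evenly over every cell of the tier (1/s)."""
    return write_rate / (tier.num_ssds * tier.capacity)


def _tier_lifetime(endurance: float, cell_write_rate: float) -> float:
    # A tier that receives no writes never wears out in this model.
    return math.inf if cell_write_rate == 0 else endurance / cell_write_rate


def lifetime(r: float, w: float, m_r: float, m_w: float,
             cache: Tier, storage: Tier) -> float:
    """Lifetime = min(l_C / w_C, l_S / w_S), in seconds."""
    w_c = writes_per_cell(m_r * r + w, cache)   # read-miss fills plus all writes
    w_s = writes_per_cell(m_w * w, storage)     # write misses, as in w_S above
    return min(_tier_lifetime(cache.endurance, w_c),
               _tier_lifetime(storage.endurance, w_s))
```

The array is only as durable as the tier that wears out first, which is why the model takes the minimum over the two tiers rather than an average.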
6 / 16 Workload distribution of mixed SSD array
• 1 high-end SSD cache for 3 low-end SSDs

Item                 Description                  Specification
High-end SSD (SLC)   Capacity                     100 GB
                     Write Endurance              100 K
Low-end SSD (MLC)    Capacity                     200 GB
                     Write Endurance              10 K
Workload             Read/write (MB/s)            100 / 250
                     Read/write cache hit rate    50% / 15%
                     Read / write length          4KB / 64KB

Data flows between the high-end SSDs and the low-end SSDs:
  1. read: $r$
  2. write: $w$
  3. read miss: $m_r r$
  4. read miss: $m_r r$
  5. dirty entry eviction: $(m_r r + m_w w) \cdot d$
$$w_C = \frac{0.5 \cdot 100\,\text{MB/s} + 250\,\text{MB/s}}{1 \cdot 100\,\text{GB}}$$
$$w_C = \frac{0.85 \cdot 250\,\text{MB/s}}{1 \cdot 100\,\text{GB}}$$
$$\text{Lifetime} = \min\left(1.47\ \text{years (high-end)},\ 6.34\ \text{years (low-end)}\right)$$
7 / 16 Problem statement
• High-end SSDs cache can wear out faster than low-end SSDs main storage
  – Caching less results in poor performance
  – Caching more results in poor reliability
• Static workload classifiers can be less efficient
• The characteristics of workload can change over time
• Objectives
  – Balance the performance and lifetime of cache and storage at the same time
  – Metric: latency over lifetime (less is better)
8 / 16 Selective caching policies
• Request Size based Caching Policy
  – I/O requests whose sizes are 4KB are dominating
• Hotness based Caching Policy
  – 90% of workload is referenced once and never accessed again
Static workload classifiers cannot distribute workload across cache and storage precisely
9 / 16 Selective caching policies
• Control trade-offs between performance and lifetime
  – $p$ (threshold): the probability of caching data
  – Larger $p$: cache wears out faster, performance improves
  – Smaller $p$: cache wears out more slowly, performance degrades
Probabilistic Caching Policy, data flows between the frontend cache and the backend storage:
  1. read: $h_r r$
  2. read miss: $m_r r$
  3. read miss: $m_r p r$
  4. write: $p w$
  5. bypassed writes: $(1 - p) w$
  6. dirty entry eviction: $m_w w p$
  7. bypassed read miss: $m_r (1 - p) r$
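A probabilistic admission policy of this kind can be sketched on top of a plain LRU cache. The sketch below is an illustration under stated assumptions, not the authors' simulator: the class name, the counters, and the decision to invalidate a cached copy when a write bypasses the cache are all hypothetical choices made to keep the example self-contained.

```python
import random
from collections import OrderedDict


class ProbabilisticLRUCache:
    """LRU cache that admits data with probability p (illustrative sketch)."""

    def __init__(self, capacity: int, p: float, seed: int = 0):
        self.capacity = capacity
        self.p = p
        self.rng = random.Random(seed)   # seeded for repeatable runs
        self.entries = OrderedDict()     # block -> dirty flag, oldest first
        self.cache_writes = 0            # writes that wear the cache
        self.storage_writes = 0          # writes that wear the storage

    def _admit(self, block: int, dirty: bool) -> None:
        self.entries[block] = dirty
        self.entries.move_to_end(block)
        self.cache_writes += 1
        if len(self.entries) > self.capacity:
            _, was_dirty = self.entries.popitem(last=False)  # evict LRU entry
            if was_dirty:
                self.storage_writes += 1                     # dirty eviction

    def read(self, block: int) -> bool:
        """Return True on a cache hit."""
        if block in self.entries:
            self.entries.move_to_end(block)
            return True
        if self.rng.random() < self.p:       # cache the missed block
            self._admit(block, dirty=False)
        return False                         # otherwise the miss is bypassed

    def write(self, block: int) -> None:
        if self.rng.random() < self.p:       # write goes to the cache
            self._admit(block, dirty=True)
        else:                                # bypassed write
            self.entries.pop(block, None)    # assumption: drop any stale copy
            self.storage_writes += 1
```

With `p = 1.0` the sketch degenerates to ordinary write-back LRU, and with `p = 0.0` every request bypasses the cache, which are the two ends of the trade-off described above.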
10 / 16 Online sampling
• Estimate latency over lifetime for each sampling cache
• Employ the best value of $p$, the probability of caching
• Sampling rate: 10%
[Figure: sampling caches, each receiving 1% of the workload and each using a different value of $p$ (0.1, 0.2, 0.3, ..., 0.9, 1.0), next to the selective cache, which receives the remaining 90%. Every cache is managed by LRU and sits in front of the main storage.]
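The sampling step can be sketched as two small functions: one that routes a request either to a sampling cache or to the selective cache, and one that picks the value of $p$ with the lowest latency over lifetime. The modular hashing of block addresses, the function names, and the shape of the `stats` dictionary are assumptions made for illustration; the slides do not specify them.

```python
from typing import Dict, List, Optional, Tuple


def route(block: int, candidates: List[float],
          sample_percent: int = 1) -> Optional[int]:
    """Return the index of the sampling cache for this block, or None
    when the block belongs to the selective cache.

    Assumption: blocks are spread over 100 buckets by a simple modulo,
    and each sampling cache owns `sample_percent` consecutive buckets.
    """
    bucket = block % 100
    index = bucket // sample_percent
    return index if index < len(candidates) else None


def select_p(stats: Dict[float, Tuple[float, float]]) -> float:
    """Pick the p whose sampling cache has the lowest latency / lifetime.

    `stats` maps p to (average latency, estimated lifetime); the
    lifetime is assumed to be positive.
    """
    return min(stats, key=lambda p: stats[p][0] / stats[p][1])


# Ten candidate thresholds, one per sampling cache: 0.1, 0.2, ..., 1.0
candidates = [k / 10 for k in range(1, 11)]
```

With ten candidates and one bucket each, 10% of the blocks land in sampling caches and the remaining 90% go to the selective cache, which would then run with the value returned by `select_p`.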
11 / 16 Simulation environment
• Trace-driven simulator
• Microsoft Research Cambridge I/O Block Trace
  – Traces of 13 enterprise applications collected over one week
• Cache provisioning = 5%
  – Cache size / Storage size
• Unique data size of workload / Storage size = 0.5
• Caching policies
  – LRU, size-based (+ sampling), hotness-based (+ sampling), probabilistic (+ sampling)
12 / 16 Adaptive threshold
[Figure: latency, lifetime, and metric (log scale from 0.1 to 10) versus threshold (0.9 down to 0.1), with "cache less" and "cache more" regions marked, for the Hardware monitoring and Web server traces. Panels: sampling based analysis and static threshold based analysis.]
13 / 16 Different workload traces
• Overall, reduced latency over lifetime by 60%.
  – Very effective on some traces (mds, stg, web, prn, usr, proj, src1, src2)
  – Less effective on very skewed workload (wdev, rsrch, ts, hm, prxy)
14 / 16 Different sampling rates
• Higher sampling rate results in more accurate estimation (beneficial) and less space for adaptive cache (harmful)
15 / 16 Conclusion
• We showed that high-end SSD cache can wear out faster than low-end SSD main storage.
• We proposed sampling based selective caching to balance the performance and lifetime of cache and storage.
• Trace-based simulation showed that the proposed caching policy is effective.
16 / 16 Q & A