ZombieNAND: Resurrecting Dead NAND Flash for Improved SSD Longevity

SLIDE 1

ZombieNAND: Resurrecting Dead NAND Flash for Improved SSD Longevity

Ellis H. Wilson III (1,2), Myoungsoo Jung (3), Mahmut Kandemir (1)

(1) Department of Computer Science and Engineering, The Pennsylvania State University

(2) Panasas, Inc.

(3) Department of Electrical Engineering, The University of Texas at Dallas

September 10th, 2014

SLIDE 2

Motivation Simulation Results

Before We Begin: Get the Slides and Paper

Slides and Paper are Available At:

www.ellisv3.com

ellis (www.ellisv3.com) ZombieNAND

SLIDE 3

Contents

1. Motivation and Background for ZombieNAND
  • Background on Flash
  • Proof-of-Concept
  • Problem Statement

2. Simulation Model and ZombieNAND Wear-Leveling
  • High-Fidelity Longevity Simulation
  • Fixing Current Wear-Leveling Shortcomings

3. Synthetic and Trace-Driven Simulation Results
  • Experimental Setup
  • Synthetic Experiment Results
  • Trace-Driven Experiment Results

SLIDE 4

The Present and Future of Flash

Well-Known Flash Dynamics
  • SLC: fast, long life, small capacity
  • MLC: medium speed, medium life, medium capacity
  • TLC: slow, short life, large capacity
  • Cells are getting smaller (i.e., slower and shorter-lived)!

Future Flash
  • As consumers push toward higher capacities and SSDs slowly replace HDDs, longevity will return to the forefront of the discussion.

SLIDE 5

Leveraging the Little-Known

A Little-Known Flash Fact
  • SLC, MLC, and TLC are purely logical distinctions: the underlying NAND material is the same!

Our Question
  • Given impending longevity concerns and increasing NAND diversity, can we develop a scheme that increases longevity without sacrificing manufacturer longevity guarantees or performance?
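A small sketch makes the "logical distinction" concrete. It relies on the textbook relationship that storing n bits per cell requires 2^n distinguishable threshold-voltage states. The block size of 1,048,576 cells is hypothetical, and neither number comes from the slides.

```python
# Bit-levels discussed in the slides: SLC = 1, MLC = 2, TLC = 3 bits per cell.
BIT_LEVELS = {"SLC": 1, "MLC": 2, "TLC": 3}

# Hypothetical number of cells in one physical block, for illustration only.
CELLS_PER_BLOCK = 1 << 20


def voltage_states(bits_per_cell: int) -> int:
    """Threshold-voltage states needed to encode bits_per_cell bits."""
    return 2 ** bits_per_cell


def block_capacity_bytes(cells: int, bits_per_cell: int) -> int:
    """Capacity of the same physical cells when programmed at a given bit-level."""
    return cells * bits_per_cell // 8


for name, bits in BIT_LEVELS.items():
    kib = block_capacity_bytes(CELLS_PER_BLOCK, bits) // 1024
    print(f"{name}: {voltage_states(bits)} states, {kib} KiB per block")
```

Reprogramming a TLC block as MLC keeps two-thirds of its capacity (256 of 384 KiB here) while using half as many voltage states. The usual explanation is that fewer states leave wider voltage margins between them.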

SLIDE 6

Proof-of-Concept on Real Hardware

SSD A (MLC):

[Figure: latency (µs) vs. P/E cycle (log scale, 1 to 131,072) for LSB and MSB pages, with the typical transition occurrence marked.]

SSD B (MLC):

[Figure: latency (µs) vs. P/E cycle (log scale, 1 to 131,072) for LSB and MSB pages, with the typical transition occurrence marked.]

Take-away: There is potential! But this workload (write all pages, erase, repeat) is an extremely simplified scenario.

SLIDE 7

Problem Statement

How Best to Leverage This Trick?
  • Sounds simple: just transition a block down one bit-level when it approaches death.

Open Problems Targeted
  • Upon a bit-level switch, how long will the new MLC (or SLC) block survive?
  • Can a block go through a double death (e.g., TLC to MLC to SLC)?
  • How much does ZombieNAND extend lifetime?
  • Do current-generation algorithms (e.g., wear-leveling) work with this?
  • What is the impact on performance (before and after rebirth)?
  • Don't break any manufacturer guarantees!
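The naive version of this policy can be stated in a few lines of code. The sketch is hypothetical: `Block`, `on_erase`, and `transition_points` are illustrative names. It assumes that a block dies at a bit-level once its cumulative P/E count reaches that level's rated lifetime from the timing table. That is a simplification, not the paper's method. It also sidesteps the open question of how long a reborn block actually survives.

```python
from dataclasses import dataclass

# Rated P/E lifetimes per bit-level (3 = TLC, 2 = MLC, 1 = SLC), from the timing table.
RATED_LIFETIME = {3: 1_000, 2: 6_000, 1: 75_000}


@dataclass
class Block:
    bit_level: int = 3   # start life as TLC
    pe_cycles: int = 0   # cumulative erases across all bit-levels
    dead: bool = False


def on_erase(block: Block) -> None:
    """Count one P/E cycle and step the block down a bit-level at its rated limit.

    Assumption (not the slides' stress model): death at a bit-level happens
    when the cumulative cycle count reaches that level's rating.
    """
    if block.dead:
        raise ValueError("erase on a retired block")
    block.pe_cycles += 1
    if block.pe_cycles >= RATED_LIFETIME[block.bit_level]:
        if block.bit_level > 1:
            block.bit_level -= 1  # "rebirth" at the next lower bit-level
        else:
            block.dead = True     # SLC death: retire the block for good


def transition_points(start_level: int = 3) -> list[int]:
    """Cycle counts at which the block changes bit-level or dies."""
    block = Block(bit_level=start_level)
    level = block.bit_level
    points = []
    while not block.dead:
        on_erase(block)
        if block.dead or block.bit_level != level:
            points.append(block.pe_cycles)
            level = block.bit_level
    return points


print(transition_points())
```

Under this naive counter, a TLC-born block steps down at 1,000 and 6,000 cycles and retires at 75,000. The slides flag simple counters like this as insufficient once bit-levels change, which motivates the stress model on the following slides.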

SLIDE 8

Simulation Framework

Existing Simulators Fall Short
  • Existing simulators track wear with simple per-block counters.
  • This works when bit-levels remain constant, but when you switch. . .

Extending DiskSim
  • Add a physics-accurate stress model.
  • Extend existing mechanics (e.g., garbage collection) to handle blocks with varying bit-levels.

SLIDE 9

ZombieNAND Oxide Stress Model

procedure calc_stress(cycle)
    A ← 0.08
    B ← 5.0
    Cox ← 2.15e−17
    q ← 1.6e−19
    δNit ← A × cycle^0.62
    δNot ← B × cycle^0.30
    δVit ← (δNit × q) / Cox
    δVot ← (δNot × q) / Cox
    return δVit + δVot
end procedure
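A direct Python transcription of the procedure may help when experimenting with the model. The constants are copied from the slide. Reading δNit and δNot as interface-trap and oxide-trap growth is an interpretation of the usual notation. `cycles_to_reach` is a hypothetical helper, not from the slides, that inverts the monotonic model by bisection.

```python
# Constants copied from the calc_stress procedure on the slide.
A = 0.08         # coefficient of the delta-N_it term
B = 5.0          # coefficient of the delta-N_ot term
C_OX = 2.15e-17  # Cox from the slide
Q = 1.6e-19      # elementary charge (C)


def calc_stress(cycle: float) -> float:
    """Threshold-voltage shift after `cycle` P/E cycles (sum of both trap terms)."""
    d_nit = A * cycle ** 0.62  # interface-trap growth (interpretation)
    d_not = B * cycle ** 0.30  # oxide-trap growth (interpretation)
    d_vit = (d_nit * Q) / C_OX
    d_vot = (d_not * Q) / C_OX
    return d_vit + d_vot


def cycles_to_reach(budget: float, hi: int = 10**7) -> int:
    """Hypothetical helper: smallest integer cycle count with stress >= budget.

    Bisection is valid because calc_stress increases monotonically for cycle >= 0.
    """
    if calc_stress(hi) < budget:
        raise ValueError("budget not reached within the search range")
    lo = 0
    while lo < hi:
        mid = (lo + hi) // 2
        if calc_stress(mid) >= budget:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For example, `cycles_to_reach(calc_stress(1_000))` recovers 1,000. The same inversion could map a per-bit-level voltage budget to a cycle count, but the slides do not give such budgets.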

Conservative estimate: we ignore charge leakage (cell recovery) because it is subject to manufacturing variability, so the model can only overstate the accumulated stress.

SLIDE 10

Limitations of Existing Wear-Leveling

Existing Wear-Leveling Algorithms Overdo It
  • Early experiments with the adapted DiskSim demonstrate limited improvements.
  • Problem: we actually don’t want all of the cells to switch simultaneously.
  • Solution: controlled wear-unleveling for lifetime:

    $\text{Early Blocks} \le \dfrac{(R - W) \times B}{2^{S-2}}$    (1)

    where R = reserved percentage, W = high-watermark percentage, B = number of blocks per element, and S = starting bit-level.

See the paper for the rest of the wear-leveling and GC algorithms

SLIDE 11

Experimental Setup: Timings

Fixed access latencies and lifetime by bit-level:

| Access Type (unit) | SLC (2KB) | MLC (4KB) | TLC (8KB) |
|---|---|---|---|
| Read (page) | 0.025 ms | 0.05 ms | 0.15 ms |
| Write (page) | 0.2 ms | 0.5 ms | 1.0 ms |
| Erase (block) | 1.5 ms | 1.5 ms | 3.0 ms |
| Lifetime (P/E cycles) | 75,000 | 6,000 | 1,000 |

Derived from Micron specification documents. Fixed access latencies are not realistic for small-scale performance studies, but they work fine for lifetime studies.
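The table supports some back-of-the-envelope arithmetic. This sketch rests on three assumptions: the 2KB/4KB/8KB figures are page sizes (treated as KiB), every bit-level uses the 128 pages per block from the SSD configuration table, and the helper names are hypothetical. It computes two quantities: the time to write every page of a block once and then erase it, and the data one block can absorb over its rated life at a fixed bit-level.

```python
PAGES_PER_BLOCK = 128  # from the SSD configuration table

# Latencies converted to microseconds so all arithmetic stays in exact integers.
LEVELS = {
    # name: (page size KiB, write us per page, erase us per block, rated P/E cycles)
    "SLC": (2, 200, 1_500, 75_000),
    "MLC": (4, 500, 1_500, 6_000),
    "TLC": (8, 1_000, 3_000, 1_000),
}


def full_block_cycle_us(level: str) -> int:
    """Time to program every page of a block once and then erase it."""
    _, write_us, erase_us, _ = LEVELS[level]
    return PAGES_PER_BLOCK * write_us + erase_us


def lifetime_write_kib(level: str) -> int:
    """Data one block can absorb over its rated life at a fixed bit-level."""
    page_kib, _, _, cycles = LEVELS[level]
    return PAGES_PER_BLOCK * page_kib * cycles


for name in LEVELS:
    print(name, full_block_cycle_us(name), "us per cycle,",
          lifetime_write_kib(name), "KiB over rated life")
```

At these table values, an SLC page holds a quarter of a TLC page, yet the 75x gap in rated cycles gives an SLC-mode block 18.75x the lifetime write volume of a TLC-mode block.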

SLIDE 12

Experimental Setup: SSD Configuration

Key experimental SSD configurations.

| Parameter | Synthetic | Trace-Driven |
|---|---|---|
| Flash Chips | 1 | 4 |
| Blocks per Element | 128 | 512 |
| Planes per Element | 8 | 8 |
| Blocks per Plane | 16 | 64 |
| Pages per Block | 128 | 128 |
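The two configurations can be captured as a small data structure. The `SsdConfig` class and its methods are hypothetical. The sketch checks that planes per element times blocks per plane equals blocks per element, which holds for both columns. It then computes raw per-element capacity for a given page size. The slide does not say how flash chips map to elements, so the sketch stops at the element level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SsdConfig:
    flash_chips: int
    blocks_per_element: int
    planes_per_element: int
    blocks_per_plane: int
    pages_per_block: int

    def check(self) -> None:
        """Blocks of an element are split evenly across its planes."""
        if self.planes_per_element * self.blocks_per_plane != self.blocks_per_element:
            raise ValueError("planes x blocks per plane must equal blocks per element")

    def element_capacity_kib(self, page_kib: int) -> int:
        """Raw capacity of one element for a given page size in KiB."""
        return self.blocks_per_element * self.pages_per_block * page_kib


SYNTHETIC = SsdConfig(flash_chips=1, blocks_per_element=128,
                      planes_per_element=8, blocks_per_plane=16, pages_per_block=128)
TRACE_DRIVEN = SsdConfig(flash_chips=4, blocks_per_element=512,
                         planes_per_element=8, blocks_per_plane=64, pages_per_block=128)

for cfg in (SYNTHETIC, TRACE_DRIVEN):
    cfg.check()
```

With 8 KiB TLC pages, a synthetic element holds 128 MiB. That matches the quoted 128MB size if the single chip is one element, which is an assumption.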

Yes, these are “small” configurations (128MB and 1GB SSDs) relative to modern drives (often 128GB to 1TB), because larger drives would take too long to simulate.

SLIDE 13

Synthetic TLC Results: 50% Read/Write Ratio

[Figure, two panels over working set size (10 to 90% of SSD) and reserved area (6 to 26% of SSD). "Normalized Lifetime": relative improvement over baseline (scale 1 to 17). "Normalized Latency": relative change from baseline (scale 0.2 to 1.6).]

Take-aways:
  1) All lifetimes are at least as long as the baseline.
  2) Latency degradations occur largely after the baseline would have died.
  3) Large lifetime gains (up to 16x) are not unreasonable given the huge differences in TLC/MLC/SLC P/E cycle ratings; latency gains are accordingly less drastic.
  4) Other R/W ratios follow the same trend (see paper).

SLIDE 14

Trace-Driven TLC Simulation Lifetime Results

[Figure: longevity improvement relative to baseline (1 to 12) per application trace (Fin-A, Fin-B, NFS-A, NFS-B, NFS-C, User-A, User-B, User-C, SQL-A, SQL-B), comparing Baseline, 10% Reserve, 20% Reserve, and 30% Reserve.]

Take-aways:
  1) Even in the worst case, we still match the baseline.
  2) Lifetime improvements vary widely across applications.
  3) Efficacy correlates strongly with the address reuse of writes (see paper for details).

SLIDE 15

Trace-Driven TLC Simulation Latency Changes Over Lifetime

For TLC, 20% Reserved Scenario (see paper for rest)

[Figure: latency relative to baseline (0.5 to 5.5) vs. normalized lifetime (100 = baseline death, 200 = death) for Fin-A, Fin-B, NFS-A, NFS-B, NFS-C, User-A, User-B, User-C, SQL-A, and SQL-B, with the baseline death point marked.]

Take-aways:
  1) We match or exceed baseline performance up until the last 5% of baseline life.
  2) After baseline death, some traces get even better and some get far worse, which is still preferable to complete death.
  3) Spill-over accesses drive the performance loss in the post-baseline region.

SLIDE 16

Questions?
