Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery
Yu Cai, Yixin Luo, Erich F. Haratsch*, Ken Mai, Onur Mutlu Carnegie Mellon University, *LSI Corporation
You Probably Know
Flash memory has many use cases
+ High performance, low energy consumption
– Requires erase before program (write)
– High raw bit error rate
[Figure: CPU, flash controller with ECC controller, and raw flash memory chips]
[Figure: raw bit error rate (RBER) vs. program/erase (P/E) cycles (or writes per cell); the ECC-correctable RBER limit determines the P/E cycle lifetime; the values ~2000 and ~3000 are marked]
Goal: Extend flash memory lifetime at low cost
Retention error: charge leakage from the flash cell over time
One dominant source of flash memory errors [DATE ‘12, ICCD ‘12]
Before I show you how we extend flash lifetime …
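[Figure: flash cells and the probability density function (PDF) of their normalized threshold voltage (Vth), read against a read reference voltage (Vref)]
MLC threshold voltage states, from lowest to highest normalized Vth: Erased (11), P1 (10), P2 (00), P3 (01)
Read reference voltages between adjacent states: ER-P1 Vref, P1-P2 Vref, P2-P3 Vref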
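The state layout above can be illustrated with a minimal sketch of how three read reference voltages map a cell's Vth to one of the four states. The state names and bit patterns come from the slide; the function name `read_cell` and the numeric Vref values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

# States in order of increasing threshold voltage, with the bit
# patterns shown on the slide.
STATES = [("ER", "11"), ("P1", "10"), ("P2", "00"), ("P3", "01")]

def read_cell(vth, vrefs):
    """Return (state, bits) for a cell with threshold voltage vth.

    vrefs is the ascending list [ER-P1 Vref, P1-P2 Vref, P2-P3 Vref].
    The number of Vrefs at or below vth selects the state.
    """
    if len(vrefs) != 3 or sorted(vrefs) != list(vrefs):
        raise ValueError("expected three ascending read reference voltages")
    return STATES[bisect_right(vrefs, vth)]

# Made-up normalized voltages, not measured values.
HYPOTHETICAL_VREFS = [1.0, 2.0, 3.0]

# A cell programmed to P2 just above the P1-P2 Vref ...
print(read_cell(2.1, HYPOTHETICAL_VREFS))
# ... is misread as P1 once charge leakage lowers its Vth.
print(read_cell(1.9, HYPOTHETICAL_VREFS))
```

In this toy setup the leaked cell flips from 00 to 10, which is one raw bit error.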
Fixed Read Reference Voltage Becomes Suboptimal
[Figure: normalized Vth PDFs of P1 (10), P2 (00), and P3 (01) before retention loss and after some retention loss]
After some retention loss, reading at the fixed P1-P2 Vref and P2-P3 Vref causes raw bit errors
Reading at the optimal read reference voltages (P1-P2 OPT, P2-P3 OPT) gives minimal raw bit errors
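A small numerical sketch can show why a fixed Vref becomes suboptimal once the distributions shift. It assumes each state's Vth is Gaussian and that the two neighboring states are equally likely; both are modeling assumptions, and the means and sigmas are made-up numbers, not measurements from the talk.

```python
import math

def tail_below(x, mean, sigma):
    """P(V < x) for a Gaussian with the given mean and sigma."""
    return 0.5 * math.erfc((mean - x) / (sigma * math.sqrt(2)))

def error_rate(vref, lo, hi):
    """Misread probability for two equally likely neighboring states.

    lo and hi are (mean, sigma) pairs. A hi-state cell below vref or
    a lo-state cell above vref is misread.
    """
    hi_below = tail_below(vref, *hi)
    lo_above = 1.0 - tail_below(vref, *lo)
    return 0.5 * (hi_below + lo_above)

def best_vref(lo, hi, steps=2000):
    """Grid search between the two means for the lowest error rate."""
    span = hi[0] - lo[0]
    candidates = [lo[0] + span * i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda v: error_rate(v, lo, hi))

# Hypothetical P2 and P3 distributions, as (mean, sigma).
FIXED_VREF = 2.5
after_p2, after_p3 = (1.9, 0.20), (2.7, 0.25)   # after some retention loss

opt = best_vref(after_p2, after_p3)
print("error rate at fixed Vref:", error_rate(FIXED_VREF, after_p2, after_p3))
print("OPT:", opt, "error rate at OPT:", error_rate(opt, after_p2, after_p3))
```

Under these assumptions the optimal voltage lies below the fixed one, and reading there gives a lower error rate.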
Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage
[Figure: normalized Vth PDFs of P1 (10), P2 (00), and P3 (01) read at the P1-P2 Vref and P2-P3 Vref]
After some retention loss: correctable errors
After significant retention loss: uncorrectable errors
Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage
Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors
To understand the effects of retention loss:
FPGA-based flash memory testing platform [Cai+,FCCM ‘11]
Characterize the effects of retention loss
[Figure: normalized Vth PDFs of P1, P2, and P3 at 0-day and 40-day retention age, with the 0-day OPT and 40-day OPT marked]
Finding: Cell’s threshold voltage decreases over time
Finding: OPT decreases over time
[Figure: RBER vs. P/E cycles when reading data with 7 days' worth of retention loss, for read reference voltages increasingly close to the actual OPT; the ECC-correctable RBER limit marks the nominal lifetime and the extended lifetime]
Finding: Using actual OPT achieves the longest lifetime
Due to retention loss
‐Cell’s threshold voltage (Vth) decreases over time
‐Optimal read reference voltage (OPT) decreases
Using the actual OPT for reading
‐Achieves the longest lifetime
Key idea: Read the data multiple times with different read reference voltages until the raw bit errors are correctable by ECC
+ Finds the optimal read reference voltage
– Requires many read-retries, which means higher read latency
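The read-retry idea can be sketched as a loop over candidate read reference voltages. This is only an illustration: `read_page` and `ecc_decode` are hypothetical callbacks rather than a real controller API, and the toy page model (raw error count equal to the distance from OPT, voltages in arbitrary integer steps) is an assumption made so the example runs on its own.

```python
def read_with_retry(read_page, ecc_decode, candidate_vrefs):
    """Try each candidate Vref until ECC can correct the raw read.

    read_page(vref) returns the raw read result; ecc_decode(raw)
    returns the corrected data, or None if the errors are uncorrectable.
    Returns (data, vref, attempts); data is None if every attempt fails.
    """
    attempts = 0
    for vref in candidate_vrefs:
        attempts += 1
        data = ecc_decode(read_page(vref))
        if data is not None:
            return data, vref, attempts
    return None, None, attempts

def make_toy_page(opt, ecc_limit):
    """Toy model: the raw read is just an error count, |vref - opt|."""
    def read_page(vref):
        return abs(vref - opt)
    def ecc_decode(raw_errors):
        return "data" if raw_errors <= ecc_limit else None
    return read_page, ecc_decode

read_page, ecc_decode = make_toy_page(opt=220, ecc_limit=5)

# Sweeping downward from a stale default voltage needs several retries.
print(read_with_retry(read_page, ecc_decode, range(250, 199, -5)))
# Starting near the optimal voltage succeeds on the first attempt.
print(read_with_retry(read_page, ecc_decode, range(225, 199, -5)))
```

Each extra attempt adds one more flash read plus one more ECC decode, which is the latency cost named above.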
[Table: flash read techniques (Fixed Vref, Sweeping Vref, Our Goal) compared on lifetime (P/E cycle) and performance (read latency)]
OPT decreases over time
Key idea: Record the old OPT as a prediction (Vpred) of the actual OPT
Benefit: Close to actual OPT, so fewer read retries
… within a flash block
Key idea: Record only one Vpred for each block
Benefit: Small storage overhead (768KB out of 512GB)
Components:
‐Periodically records a Vpred for each block
‐Utilizes the recorded Vpred to minimize read-retry count
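The two components can be pictured as a per-block table that is written periodically and consulted before each read. The sketch below is an assumption-laden illustration: the class name `VpredTable`, the default voltage, and the integer voltage units are all hypothetical. The overhead calculation only restates the 768KB out of 512GB figure as a fraction.

```python
class VpredTable:
    """Keeps one predicted read reference voltage (Vpred) per block."""

    def __init__(self, default_vref):
        self._default = default_vref
        self._vpred = {}                 # block id -> last recorded OPT

    def record(self, block, opt):
        """Store the most recently learned OPT as the block's Vpred."""
        self._vpred[block] = opt

    def lookup(self, block):
        """Voltage to start reading from; falls back to the default."""
        return self._vpred.get(block, self._default)

def storage_overhead_fraction(table_bytes, capacity_bytes):
    """Share of the device capacity taken by the Vpred table."""
    return table_bytes / capacity_bytes

KB, GB = 1024, 1024 ** 3

table = VpredTable(default_vref=250)     # 250 is a made-up default
table.record(block=7, opt=225)
print(table.lookup(7), table.lookup(8))
print(storage_overhead_fraction(768 * KB, 512 * GB))
```

Storing one value per block rather than per page is what keeps the table this small.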
[Figure: normalized Vth PDFs showing the old Vpred updated to a new Vpred, and the recorded Vpred very close to OPT]
[Table: flash read techniques (Fixed Vref, Sweeping Vref, ROR) compared on lifetime (P/E cycle) and performance (read latency)]
[Figure: normalized Vth PDFs of P2 and P3 with OPT marked; each cell is labeled S (slow-leaking cell) or F (fast-leaking cell)]
Fast-leaking cells have lower Vth
Slow-leaking cells have higher Vth
Key idea: Guess original state of the cell from its leakage speed property
Three steps
1. Identify risky cells
2. Identify fast-/slow-leaking cells
3. Guess original states
[Figure: normalized Vth PDFs of P2 and P3 with marks at OPT–σ, OPT, and OPT+σ; cells between OPT–σ and OPT+σ are risky cells, each then identified as S or F]
Key Formula: risky cell + S = P2; risky cell + F = P3
[Figure labels: 50% of raw bit errors; remaining errors]
Timeline: program with random data, then 28 days, then detect failure and back up data, then 12 additional days, then recover data
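The three recovery steps can be sketched over two Vth readings of the same cells, one taken when the failure is detected and one after additional retention time. This is a sketch under stated assumptions, not the authors' implementation: the function name `recover_states` is hypothetical, risky cells are taken to be those within σ of OPT, a risky fast-leaking cell is guessed to be P3 and a risky slow-leaking cell P2, and the median Vth shift among risky cells is an assumed cutoff between fast and slow.

```python
from statistics import median

def recover_states(vth_first, vth_second, opt, sigma):
    """Guess P2 or P3 for each cell from two Vth readings.

    vth_first:  Vth per cell when the failure is detected.
    vth_second: Vth of the same cells after additional retention time.
    """
    if len(vth_first) != len(vth_second):
        raise ValueError("both readings must cover the same cells")

    # How much each cell leaked between the two readings.
    shifts = [a - b for a, b in zip(vth_first, vth_second)]

    # Step 1: risky cells lie within sigma of OPT.
    risky = {i for i, v in enumerate(vth_first) if abs(v - opt) <= sigma}

    # Step 2: assumed cutoff; risky cells that leaked more than the
    # median shift are treated as fast-leaking.
    cutoff = median([shifts[i] for i in risky]) if risky else 0.0

    # Step 3: guess the original state.
    guesses = []
    for i, v in enumerate(vth_first):
        if i in risky:
            guesses.append("P3" if shifts[i] > cutoff else "P2")
        else:
            guesses.append("P3" if v > opt else "P2")
    return guesses

# Made-up normalized voltages for six cells.
first = [2.10, 2.90, 2.45, 2.55, 2.48, 2.52]
second = [2.08, 2.80, 2.30, 2.53, 2.36, 2.49]
print(recover_states(first, second, opt=2.5, sigma=0.1))
```

In this toy data a plain read at OPT would call the cell at 2.45 a P2 cell and the cell at 2.55 a P3 cell; the leakage-speed guess reverses both.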
Problem: Retention loss reduces flash lifetime
Overall Goal: Extend flash lifetime at low cost
Flash Characterization: Developed an understanding of the effects of retention loss
Retention Optimized Reading: A low-cost mechanism that dynamically finds the optimal read reference voltage
‐ 64% lifetime ↑, 70.4% read latency ↓
Retention Failure Recovery: An offline mechanism that recovers data after detecting uncorrectable errors
‐ Raw bit error rate 50% ↓, reduces data loss
Data loss can happen in many ways
1. High P/E cycle
2. High temperature accelerates retention loss
3. High retention age (lost power for a long time)
Key: RFR does not have to correct all errors Example:
(40 total errors ECC correctable)
[Figure: threshold voltage mean of P1, P2, and P3 over time]
Finding: Vth shifts faster in higher voltage states
P3: quickly decreases; P2: slowly decreases; P1: relatively constant
[Figure: RBER vs. read reference voltage, with the actual OPT marked, when reading data with 7-day retention age]
Finding: The actual OPT achieves the lowest RBER
RBER gradually decreases as read reference voltage approaches the actual OPT
[Figure: normalized Vth PDFs with OPT and V0 marked, for two cases: V0 < OPT and V0 > OPT]
Periodically learn and record OPT for page 255 as per-block starting read reference voltage (V0)
‐ Page 255 has the shortest retention age
‐ Other pages within the block have longer retention age, and retention age will increase over time
*ΔV is the smallest step size for changing read reference voltage.
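One way to picture the search from V0 is a step-by-step walk in units of ΔV that handles both cases, V0 < OPT and V0 > OPT. The sketch assumes the raw error count falls steadily as the read reference voltage approaches OPT; `errors_at` is a hypothetical callback that returns the raw bit error count for a read at a given voltage, and the integer voltage units are made up.

```python
def find_opt(errors_at, v0, delta_v, v_min, v_max):
    """Walk from v0 in steps of delta_v toward fewer raw bit errors.

    Tries the downward direction first, since OPT decreases over time,
    then the upward direction if stepping down did not help.
    Returns (best_vref, number_of_reads).
    """
    best_v, best_e = v0, errors_at(v0)
    reads = 1
    for direction in (-1, +1):
        v = best_v + direction * delta_v
        moved = False
        while v_min <= v <= v_max:
            e = errors_at(v)
            reads += 1
            if e >= best_e:
                break                     # errors stopped improving
            best_v, best_e, moved = v, e, True
            v += direction * delta_v
        if moved:
            break                         # found the improving direction
    return best_v, reads

# Toy error model with the optimal voltage at 220 (arbitrary units).
toy_errors = lambda v: abs(v - 220)

print(find_opt(toy_errors, v0=230, delta_v=5, v_min=150, v_max=300))  # case V0 > OPT
print(find_opt(toy_errors, v0=210, delta_v=5, v_min=150, v_max=300))  # case V0 < OPT
```

The closer V0 is to OPT, the fewer reads the walk needs, which is why a recently recorded starting voltage helps.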
Observation: Average ECC latency ∝ RBER
[Figure: timeline of reading page A (Stage-0), attempt 1: flash read latency followed by ECC latency]
Flash read latency = constant
ECC latency ∝ raw bit errors*
*We provide detailed analysis of ECC latency in the paper.
High temperature accelerates retention loss
1 year at room temperature (20°C) corresponds to 32 hours at high temperature (70°C)
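The two durations imply an acceleration factor, which a short calculation makes explicit. The sketch assumes a 365-day year and that retention age scales linearly with time at the high temperature; the slide states only the two endpoints, so the linear scaling and both function names are assumptions.

```python
HOURS_PER_YEAR = 365 * 24   # assumes a 365-day year

def acceleration_factor(room_hours=HOURS_PER_YEAR, bake_hours=32):
    """Room-temperature hours represented by one hour at 70°C."""
    return room_hours / bake_hours

def equivalent_room_days(bake_hours, factor):
    """Equivalent room-temperature retention age, in days."""
    return bake_hours * factor / 24

factor = acceleration_factor()
print(factor)
print(equivalent_room_days(32, factor))
```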
Threshold voltage marks after 28 days:
[Figure: normalized Vth PDF marked at μ, 1σ, 2σ, and 3σ]
[Figure: average Vth shift vs. retention age (days) for cells grouped by Vth range: (-∞,-3σ), (-3σ,-2σ), (-2σ,-1σ), (-1σ,μ), (μ,1σ), (1σ,2σ), (2σ,3σ), (3σ,+∞)]
Slow-leaking cells: end up in higher Vth
Fast-leaking cells: end up in lower Vth
*Similar trends are found in P2 state, as shown in the paper.
[Figure: flash cell structure: control gate (CG), inter-poly oxide, floating gate (FG), tunnel oxide, substrate, source (S), and drain (D)]