Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery

  1. Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery. Yu Cai, Yixin Luo, Erich F. Haratsch*, Ken Mai, Onur Mutlu. Carnegie Mellon University, *LSI Corporation

  2. You Probably Know: flash memory has many use cases. Advantages: high performance, low energy consumption

  3. NAND Flash Memory Challenges: requires erase before program (write); high raw bit error rate. [Figure labels: CPU, Flash Controller, ECC Controller, Raw Flash Memory Chips]

  4. Limited Flash Memory Lifetime. Goal: extend flash memory lifetime at low cost. [Plot: raw bit error rate (RBER) vs. program/erase (P/E) cycles (or writes per cell), with the ECC-correctable RBER and the P/E cycle lifetime marked; axis marks at ~2000 and ~3000 P/E cycles]

  5. Retention Loss: charge leakage over time. One dominant source of flash memory errors [DATE ‘12, ICCD ‘12]. [Figure labels: flash cell, retention error]

  6. Before I show you how we extend flash lifetime … NAND Flash 101

  7. Threshold Voltage (V_th). [Figure: two flash cells, storing 0 and 1, placed along a normalized V_th axis]

  8. Threshold Voltage (V_th) Distribution. [Plot: probability density function (PDF) of the 0 and 1 states over normalized V_th]

  9. Read Reference Voltage (V_ref). [Plot: PDF over normalized V_th, with V_ref between the 0 and 1 states]

  10. Multi-Level Cell (MLC). Four states in order of increasing V_th: Erased (11), P1 (10), P2 (00), P3 (01), separated by the ER-P1, P1-P2, and P2-P3 read reference voltages (V_ref). [Plot: PDF over normalized V_th]
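
The MLC read on this slide can be sketched as a comparison of a cell's V_th against the three read reference voltages. This is a minimal illustration only: the numeric V_ref values and the `read_cell` helper are hypothetical and do not come from the slides.

```python
from bisect import bisect_right

# Hypothetical normalized read reference voltages: ER-P1, P1-P2, P2-P3.
V_REFS = [0.25, 0.50, 0.75]

# States in increasing V_th order, with the bit values shown on the slide.
STATES = [("Erased", "11"), ("P1", "10"), ("P2", "00"), ("P3", "01")]


def read_cell(v_th, v_refs=V_REFS):
    """Return (state, bits) by counting how many V_ref values lie at or below v_th."""
    return STATES[bisect_right(v_refs, v_th)]


# A cell programmed to P2 reads correctly while its V_th stays above P1-P2 V_ref.
fresh = read_cell(0.55)
# After charge leakage lowers V_th below P1-P2 V_ref, the same cell reads as P1.
leaked = read_cell(0.45)
```

With fixed reference voltages, the second read returns the wrong state, which is the raw bit error the following slides discuss.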

  11. Threshold Voltage Reduces Over Time. [Plot: PDFs of P1 (10), P2 (00), and P3 (01) over normalized V_th, before retention loss and after some retention loss]

  12. Fixed Read Reference Voltage Becomes Suboptimal. [Plots: PDFs of P1 (10), P2 (00), and P3 (01) with the fixed P1-P2 and P2-P3 V_ref, before retention loss and after some retention loss; the latter shows raw bit errors]

  13. Optimal Read Reference Voltage (OPT). After some retention loss, the P1-P2 OPT and P2-P3 OPT differ from the fixed P1-P2 and P2-P3 V_ref; reading at OPT gives minimal raw bit errors. [Plot: PDFs of P1 (10), P2 (00), and P3 (01) over normalized V_th]
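
To make the idea of OPT concrete, the sketch below searches for the reference voltage that minimizes misreads between two adjacent states. It assumes each state's V_th is Gaussian and both states are equally likely; those modeling choices, the numeric parameters, and the helper names are illustrative assumptions, not taken from the slides.

```python
from statistics import NormalDist


def raw_error_rate(v_ref, low, high):
    """Fraction of misread cells for two equally likely adjacent states."""
    # Lower-state cells above v_ref and higher-state cells below it are misread.
    return 0.5 * (1.0 - low.cdf(v_ref)) + 0.5 * high.cdf(v_ref)


def find_opt(low, high, steps=1000):
    """Grid-search the voltage between the two state means with the fewest misreads."""
    lo, hi = low.mean, high.mean
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda v: raw_error_rate(v, low, high))


# Hypothetical distributions right after programming.
p1_fresh, p2_fresh = NormalDist(0.40, 0.04), NormalDist(0.60, 0.04)
# Hypothetical distributions after retention loss: shifted lower and wider.
p1_aged, p2_aged = NormalDist(0.37, 0.05), NormalDist(0.54, 0.05)

fixed_v_ref = find_opt(p1_fresh, p2_fresh)  # chosen once, before retention loss
opt_aged = find_opt(p1_aged, p2_aged)       # OPT after retention loss

errors_fixed = raw_error_rate(fixed_v_ref, p1_aged, p2_aged)
errors_opt = raw_error_rate(opt_aged, p1_aged, p2_aged)
```

Under these assumptions the aged OPT sits below the original reference voltage, and reading at it produces fewer raw bit errors than reading at the fixed value.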

  14. Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage

  15. Retention Failure. After some retention loss, the errors at the P1-P2 and P2-P3 V_ref are correctable; after significant retention loss, they become uncorrectable. [Plot: PDFs of P1 (10), P2 (00), and P3 (01) over normalized V_th]

  16. Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage. Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors

  17. To understand the effects of retention loss: characterize retention loss using real chips

  18. To understand the effects of retention loss: characterize retention loss using real chips. Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage. Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors

  19. Characterization Methodology: FPGA-based flash memory testing platform [Cai+, FCCM ‘11]

  20. Characterization Methodology • FPGA-based flash memory testing platform • Real 20- to 24-nm MLC NAND flash chips • 0 to 40 days' worth of retention loss • Room temperature (20 °C) • 0 to 50k P/E cycles

  21. Characterize the effects of retention loss: 1. Threshold Voltage Distribution 2. Optimal Read Reference Voltage 3. RBER and P/E Cycle Lifetime

  22. 1. Threshold Voltage (V_th) Distribution. [Plot: PDFs of P1, P2, and P3 over normalized V_th]

  23. 1. Threshold Voltage (V_th) Distribution. [Plot: 0-day and 40-day distributions of P1, P2, and P3] Finding: a cell's threshold voltage decreases over time

  24. 2. Optimal Read Reference Voltage (OPT). [Plot: 0-day and 40-day OPT marked between the P1, P2, and P3 distributions] Finding: OPT decreases over time

  25. 3. RBER and P/E Cycle Lifetime. [Plot: RBER vs. P/E cycles]

  26. 3. RBER and P/E Cycle Lifetime. Reading data with 7 days' worth of retention loss. [Plot: RBER vs. P/E cycles for V_ref values progressively closer to the actual OPT, with the ECC-correctable RBER, nominal lifetime, and extended lifetime marked] Finding: using the actual OPT achieves the longest lifetime
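
One way to read the plot on this slide: the P/E cycle lifetime is the highest cycle count at which RBER is still within the ECC-correctable RBER. The sketch below applies that reading to two made-up curves. All RBER values, the ECC limit, and the function name are hypothetical placeholders, not measurements from the characterization.

```python
def pe_cycle_lifetime(rber_by_cycles, ecc_correctable_rber):
    """Return the last measured P/E cycle count before RBER exceeds the ECC limit."""
    lifetime = 0
    for cycles, rber in sorted(rber_by_cycles.items()):
        if rber > ecc_correctable_rber:
            break
        lifetime = cycles
    return lifetime


# Hypothetical RBER measurements (P/E cycles -> RBER), for illustration only.
rber_fixed_v_ref = {1000: 2e-4, 2000: 8e-4, 3000: 2e-3, 4000: 5e-3}
rber_actual_opt = {1000: 1e-4, 2000: 3e-4, 3000: 9e-4, 4000: 3e-3}
ECC_CORRECTABLE_RBER = 1e-3  # hypothetical ECC limit

nominal = pe_cycle_lifetime(rber_fixed_v_ref, ECC_CORRECTABLE_RBER)
extended = pe_cycle_lifetime(rber_actual_opt, ECC_CORRECTABLE_RBER)
```

Because reading at OPT lowers the RBER curve, the curve crosses the ECC limit later, which is what the slide calls the extended lifetime.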

  27. Characterization Summary. Due to retention loss: ‐ Cell's threshold voltage (V_th) decreases over time ‐ Optimal read reference voltage (OPT) decreases over time. Using the actual OPT for reading: ‐ Achieves the longest lifetime

  28. To understand the effects of retention loss: characterize retention loss using real chips. Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage. Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors

  29. Naïve Solution: Sweeping V_ref. Key idea: Read the data multiple times with different read reference voltages until the raw bit errors are correctable by ECC. • Finds the optimal read reference voltage • Requires many read-retries → higher read latency
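
The sweep described on this slide can be sketched as a loop that lowers V_ref step by step until ECC can correct the page. The `errors_at` callable is a hypothetical stand-in for a real flash read plus ECC decode, and the toy error model and numeric values below are assumptions for illustration.

```python
def sweep_read(errors_at, v_start, v_end, step, ecc_limit):
    """Try V_ref values from v_start down to v_end until ECC can correct the page.

    errors_at: hypothetical callable returning the raw bit error count of a
    page read at a given V_ref.
    Returns (v_ref, retries), or (None, retries) if no voltage succeeds.
    """
    n_steps = int(round((v_start - v_end) / step))
    for retries in range(n_steps + 1):
        v_ref = v_start - retries * step
        if errors_at(v_ref) <= ecc_limit:
            return v_ref, retries
    return None, n_steps


def toy_errors(v_ref):
    """Toy model: raw bit errors grow with distance from a true OPT of 0.44."""
    return round(2000 * abs(v_ref - 0.44))


v_found, retries = sweep_read(toy_errors, v_start=0.60, v_end=0.30,
                              step=0.01, ecc_limit=45)
```

In this toy setup the sweep needs 14 retries before the read succeeds, which illustrates the latency cost the slide points out.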

  30. Comparison of Flash Read Techniques. [Table: Fixed V_ref, Sweeping V_ref, and Our Goal compared on lifetime (P/E cycles) and performance (read latency)]

  31. Observations. 1. The optimal read reference voltage gradually decreases over time. Key idea: Record the old OPT as a prediction (V_pred) of the actual OPT. Benefit: Close to actual OPT → fewer read retries. 2. The amount of retention loss is similar across pages within a flash block. Key idea: Record only one V_pred for each block. Benefit: Small storage overhead (768KB out of 512GB)
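
As a quick check of how small the quoted overhead is, the arithmetic below computes 768KB as a fraction of 512GB. It assumes binary units (1KB = 2^10 bytes, 1GB = 2^30 bytes), which the slide does not specify.

```python
# Assumption: binary units, since the slide does not say which convention it uses.
KB, GB = 2**10, 2**30

overhead_bytes = 768 * KB
capacity_bytes = 512 * GB

fraction = overhead_bytes / capacity_bytes
print(f"{fraction:.2e}")  # prints 1.43e-06
```

That is roughly 0.00014% of the drive's capacity.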

  32. Retention Optimized Reading (ROR). Components: 1. Online pre-optimization algorithm ‐ Periodically records a V_pred for each block 2. Improved read-retry technique ‐ Utilizes the recorded V_pred to minimize read-retry count

  33. 1. Online Pre-Optimization Algorithm • Triggered periodically (e.g., per day) • Find and record an OPT as per-block V_pred • Performed in background • Small storage overhead [Plot: old and new V_pred on the PDF over normalized V_th]

  34. 2. Improved Read-Retry Technique • Performed as normal read • V_pred already close to actual OPT • Decrease V_ref if V_pred fails, and retry [Plot: OPT and V_pred, very close to each other, on the PDF over normalized V_th]
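
The two ROR components can be sketched together as a small class: a background step that records one V_pred per block, and a read path that starts at V_pred and lowers V_ref only if ECC fails. The class name, method names, callables, and numeric values are hypothetical; this is a sketch of the idea on the slides, not the authors' implementation.

```python
class RetentionOptimizedReader:
    """Sketch of ROR: per-block V_pred plus a read-retry that starts from it."""

    def __init__(self, default_v_ref, step, ecc_limit):
        self.default_v_ref = default_v_ref
        self.step = step
        self.ecc_limit = ecc_limit
        self.v_pred = {}  # one recorded V_pred per block

    def pre_optimize(self, block_id, find_opt):
        """Background step: record the block's current OPT as its V_pred.

        find_opt: hypothetical callable returning the OPT found for a block.
        """
        self.v_pred[block_id] = find_opt(block_id)

    def read(self, block_id, errors_at, max_retries=32):
        """Read starting at V_pred; lower V_ref and retry while ECC fails.

        errors_at: hypothetical callable returning raw bit errors at a V_ref.
        """
        v_ref = self.v_pred.get(block_id, self.default_v_ref)
        for retries in range(max_retries + 1):
            if errors_at(v_ref) <= self.ecc_limit:
                return v_ref, retries
            v_ref -= self.step
        return None, max_retries


def toy_errors(v_ref):
    """Toy model: raw bit errors grow with distance from a true OPT of 0.44."""
    return round(2000 * abs(v_ref - 0.44))


reader = RetentionOptimizedReader(default_v_ref=0.60, step=0.01, ecc_limit=45)
_, cold_retries = reader.read(0, toy_errors)   # no V_pred recorded yet
reader.pre_optimize(0, lambda block_id: 0.45)  # an older OPT, recorded earlier
_, warm_retries = reader.read(0, toy_errors)   # starts close to the actual OPT
```

In this toy setup the read without a recorded V_pred needs 14 retries, while the read that starts from a slightly stale V_pred succeeds on the first attempt.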

  35. Retention Optimized Reading: Summary. [Table: flash read techniques compared on lifetime (P/E cycles) and performance (read latency)] Sweeping V_ref: lifetime 64% ↑. ROR: lifetime 64% ↑; read latency 2.4% ↓ at nominal lifetime, 70.4% ↓ at extended lifetime

  36. To understand the effects of retention loss: characterize retention loss using real chips. Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage. Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors

  37. Retention Failure. After some retention loss, the errors at the P1-P2 and P2-P3 V_ref are correctable; after significant retention loss, they become uncorrectable. [Plot: PDFs of P1 (10), P2 (00), and P3 (01) over normalized V_th]

  38. Leakage Speed Variation. [Plot: PDF over normalized V_th with cells marked S (slow-leaking cell) and F (fast-leaking cell)]

  39. Initially, Right After Programming. [Plot: PDFs of P2 and P3 over normalized V_th, each containing slow-leaking (S) and fast-leaking (F) cells]

  40. After Some Retention Loss. Fast-leaking cells have lower V_th; slow-leaking cells have higher V_th. [Plot: PDFs of P2 and P3 with S and F cells over normalized V_th]

  41. Eventually: Retention Failure. [Plot: PDFs of P2 and P3 with S and F cells around OPT over normalized V_th]

  42. Retention Failure Recovery (RFR). Key idea: Guess original state of the cell from its leakage speed property. Three steps: 1. Identify risky cells 2. Identify fast-/slow-leaking cells 3. Guess original states

  43. 1. Identify Risky Cells. Risky cells are those with V_th between OPT − σ and OPT + σ. Key formula: risky cell + S = P2; risky cell + F = P3. [Plot: PDF over normalized V_th with S and F cells around OPT]

  44. 2. Identifying Fast- vs. Slow-Leaking Cells. Key formula: risky cell + S = P2; risky cell + F = P3. [Plot: risky cells between OPT − σ and OPT + σ, leakage speed not yet known (?)]

  45. 2. Identifying Fast- vs. Slow-Leaking Cells. Key formula: risky cell + S = P2; risky cell + F = P3. [Plot: risky cells between OPT − σ and OPT + σ, some now identified as S or F]

  46. 3. Guess Original States. Key formula: risky cell + S = P2; risky cell + F = P3. [Plot: PDF over normalized V_th with risky cells labeled S and F]
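
The three RFR steps can be sketched as follows. The slides do not spell out how leakage speed is measured, so this sketch assumes each cell's V_th is sampled twice, some time apart, and that a cell counts as fast-leaking when its V_th drop exceeds a threshold. That classification rule, the function name, and all numeric values are assumptions for illustration.

```python
def recover_states(v_now, v_later, opt, sigma, shift_threshold):
    """Guess the original state (P2 or P3) of each cell.

    v_now: V_th of each cell when the failure is detected.
    v_later: V_th of the same cells after additional retention time.
    """
    guesses = []
    for before, after in zip(v_now, v_later):
        if abs(before - opt) > sigma:
            # Step 1: not a risky cell, so a plain read against OPT is trusted.
            guesses.append("P3" if before > opt else "P2")
            continue
        # Step 2 (assumed rule): a large V_th drop marks a fast-leaking cell.
        fast_leaking = (before - after) > shift_threshold
        # Step 3: risky + fast-leaking = P3, risky + slow-leaking = P2.
        guesses.append("P3" if fast_leaking else "P2")
    return guesses


# Hypothetical normalized V_th samples for five cells.
v_now = [0.30, 0.49, 0.48, 0.52, 0.70]
v_later = [0.29, 0.48, 0.43, 0.51, 0.66]

guesses = recover_states(v_now, v_later, opt=0.50, sigma=0.03,
                         shift_threshold=0.03)
```

In this example the third cell reads below OPT but leaks fast, so it is guessed to be P3, and the fourth reads above OPT but leaks slowly, so it is guessed to be P2. A plain read against OPT would have returned the opposite state for both.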

  47. RFR Evaluation • Expect to eliminate 50% of raw bit errors • ECC can correct remaining errors. Timeline: program with random data → 28 days → detect failure, back up data → 12 additional days → recover data

  48. To understand the effects of retention loss: characterize retention loss using real chips. Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage. Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors

  49. Conclusion. Problem: Retention loss reduces flash lifetime. Overall Goal: Extend flash lifetime at low cost. Flash Characterization: Developed an understanding of the effects of retention loss in real chips. Retention Optimized Reading: A low-cost mechanism that dynamically finds the optimal read reference voltage ‐ 64% lifetime ↑, 70.4% read latency ↓. Retention Failure Recovery: An offline mechanism that recovers data after detecting uncorrectable errors ‐ Raw bit error rate 50% ↓, reduces data loss

  50. Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery. Yu Cai, Yixin Luo, Erich F. Haratsch*, Ken Mai, Onur Mutlu. Carnegie Mellon University, *LSI Corporation

  51. Backup Slides

  52. RFR Motivation. Data loss can happen in many ways: 1. High P/E cycle 2. High temperature → accelerates retention loss 3. High retention age (lost power for a long time)
