Improving NAND Endurance
by Dynamic Program and Erase Scaling
NVRAMOS 2013
October 24, 2013
Jihong Kim
Department of Computer Science and Engineering Seoul National University, Korea
Trend 1: NAND Capacity +2x / 2 years
[Figure: NAND capacity trend for SLC (1 bit/cell), MLC (2 bits/cell), and TLC (3 bits/cell), 2000 to 2012 (ISSCC, VLSI)]
Reducing WAF (write amplification factor) by increasing the efficiency of an FTL algorithm (e.g., garbage collection, wear leveling)
① Data compression
② Deduplication
③ Dynamic throttling
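For reference, WAF is usually defined as the ratio of bytes programmed into NAND to bytes written by the host, and it divides the host-visible write budget of a drive. The sketch below is a first-order estimate only; the capacity, P/E endurance, and byte counts are hypothetical, and it assumes perfect wear leveling.

```python
GIB = 1 << 30


def write_amplification(nand_bytes: int, host_bytes: int) -> float:
    """WAF = bytes programmed into NAND / bytes written by the host."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return nand_bytes / host_bytes


def host_write_budget(capacity_bytes: int, pe_cycles: int, waf: float) -> float:
    """First-order lifetime estimate in host bytes.

    Assumes perfect wear leveling, so every block reaches its P/E limit together.
    """
    return capacity_bytes * pe_cycles / waf


# Hypothetical numbers: the FTL programmed 300 GiB while the host wrote 150 GiB.
waf = write_amplification(300 * GIB, 150 * GIB)
budget = host_write_budget(128 * GIB, 3000, waf)
print(waf, budget / GIB)  # 2.0 192000.0
```

In this model, halving WAF raises the host write budget exactly as much as doubling the P/E endurance would, which is why both WAF reduction and endurance improvement extend lifetime.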
[Figure: bit errors vs. P/E cycles, with the ECC limit]
[Figure: NAND flash cell structure: control gate, floating gate, tunnel oxide, substrate]
[Figure: (a) average retention BER (normalized) vs. number of P/E cycles [K] for r = 0.00, 0.07, and 0.14; (b) effective wearing vs. normalized erase voltage (1 − r)]
[Figure: total sum of effective wearing and endurance (3K P/E baseline); values shown: 3.00K, 6.52K, 3.00K, 1.38K]
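One reading of these values, offered as an interpretation rather than the slide's stated derivation: if the ECC limit of 3K P/E cycles at the full erase voltage corresponds to a budget of 3.00K units of effective wearing, and 3K cycles with a scaled erase voltage consume only 1.38K units, then the number of cycles reachable before the budget is exhausted grows by the ratio of the two. The arithmetic reproduces the 6.52K figure:

```latex
\text{Endurance}_{\text{scaled}}
  \approx 3.00\,\text{K} \times \frac{3.00\,\text{K}}{1.38\,\text{K}}
  = \frac{9.00}{1.38}\,\text{K}
  \approx 6.52\,\text{K P/E cycles}
```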
[Figure: width of Vth distributions]
[Figure: effect of the ISPP step voltage on program time and on the width of Vth distributions]
[Figure: (a) program time (normalized) vs. VISPP scaling ratio; (b) minimum program time (normalized) vs. normalized erase voltage (1 − r)]
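The program-time side of this trade-off can be sketched with an idealized ISPP model in which each program/verify loop raises the cell Vth by exactly one step voltage. All voltages below are hypothetical, and real cells do not shift by exactly one step per pulse; the sketch only shows the first-order relation between the VISPP scaling ratio and program time.

```python
def ispp_pulse_count(v_start_mv: int, v_target_mv: int, step_mv: int) -> int:
    """Program/verify loops needed for Vth to climb from v_start_mv to v_target_mv.

    Idealized ISPP: every pulse shifts the cell Vth by exactly step_mv.
    Integer millivolts keep the ceiling division exact.
    """
    if step_mv <= 0:
        raise ValueError("step_mv must be positive")
    delta = max(0, v_target_mv - v_start_mv)
    return max(1, -(-delta // step_mv))  # ceil(delta / step_mv), at least one pulse


def normalized_program_time(scaling_ratio: float,
                            v_start_mv: int = 0,
                            v_target_mv: int = 4000,
                            base_step_mv: int = 200) -> float:
    """Program time with the ISPP step scaled by scaling_ratio, relative to ratio 1.0.

    Assumes every loop (pulse + verify) takes the same time, so program time is
    proportional to the loop count.
    """
    base = ispp_pulse_count(v_start_mv, v_target_mv, base_step_mv)
    scaled_step = round(base_step_mv * scaling_ratio)
    return ispp_pulse_count(v_start_mv, v_target_mv, scaled_step) / base


for ratio in (1.0, 1.5, 2.0):
    # Fewer loops with a larger step, but in this idealized model the programmed
    # Vth distribution is roughly one step wide, so it also gets wider.
    print(ratio, normalized_program_time(ratio), round(200 * ratio), "mV step")
```

With the default (hypothetical) parameters, scaling ratios of 1.0, 1.5, and 2.0 give normalized program times of 1.0, 0.7, and 0.5.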
[Figure: five write modes (Wmode) and five erase voltage modes (EVmode), with the Vth voltage margin of a Vth distribution erased with a given EVmode]
[Figure: shallow erase. A block is erased with a low-voltage EVmode and then written with a Wmode.]
To write data with a faster write mode to the shallowly erased block, …
Erase voltage mode vs. minimum program time and wearing:
High erase voltage mode: short minimum program time, high damage
Low erase voltage mode: long minimum program time, low damage
[Figure: FTL architecture with a DPES Manager. Components: Write Request and Read paths; Circular Buffer (utilization); DPES Manager with a Mode Selector (Wmode Selector, Emode Selector), NAND Endurance Model, Per-Block Mode Table, and NAND Setting Table; Logical-to-Physical Mapping Table and Extended Mapping Table; Garbage Collector and Wear Leveler (background/foreground, number of pages to be copied); NAND Flash Memory (Program, Erase, DeviceSettings). Mode outputs: Wmode_i, EVmode_j, ESmode_k.]
K-entry circular buffer (enqueue/dequeue, head and tail pointers); the write mode is selected by buffer utilization (u):

Buffer utilization (u)    Write mode
u ≤ 20%                   4
20% < u ≤ 40%             3
40% < u ≤ 60%             2
60% < u ≤ 80%             1
u > 80%                   0

[Figure: program time under short vs. long idle times]
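The utilization-to-write-mode table can be sketched as a small ring buffer plus a lookup. The thresholds come from the slide; mode 0 for u > 80% is assumed from the 4, 3, 2, 1 progression. The class and function names are hypothetical, and the buffer is an illustration, not the FTL's actual implementation.

```python
# (upper bound of buffer utilization in %, write mode), from the slide's table.
# Mode 0 for u > 80% is assumed from the 4, 3, 2, 1 progression.
_THRESHOLDS = ((20, 4), (40, 3), (60, 2), (80, 1))


def select_write_mode(used: int, capacity: int) -> int:
    """Return the write mode for a buffer holding `used` of `capacity` entries."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    for upper_pct, mode in _THRESHOLDS:
        # Integer form of used/capacity <= upper_pct/100 avoids float rounding.
        if used * 100 <= upper_pct * capacity:
            return mode
    return 0


class CircularBuffer:
    """K-entry ring buffer of pending write requests (illustrative only)."""

    def __init__(self, k: int):
        self.slots = [None] * k
        self.head = 0    # index of the oldest entry (next to dequeue)
        self.count = 0   # number of occupied entries

    def enqueue(self, req) -> None:
        if self.count == len(self.slots):
            raise OverflowError("circular buffer is full")
        tail = (self.head + self.count) % len(self.slots)
        self.slots[tail] = req
        self.count += 1

    def dequeue(self):
        if self.count == 0:
            raise IndexError("circular buffer is empty")
        req = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return req

    def write_mode(self) -> int:
        return select_write_mode(self.count, len(self.slots))


buf = CircularBuffer(k=10)
for i in range(3):
    buf.enqueue(f"req{i}")
print(buf.write_mode())   # 3 of 10 entries (30%) -> write mode 3
```

Comparing integers instead of the fraction `used / capacity` keeps boundary cases such as exactly 20% on the intended side of each threshold.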
[Figure: DPES Manager, Per-Block Mode Table (indexed by Block_Addr), Address Translation Table (logical address → physical address), and NAND chips; a mode set precedes write/read requests, e.g., write (3) and read (2).]
Device setting for a mode: read/verify references, ISPP voltages, (erase voltage)
Time overhead ≪ tPROG (page program time)
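One reading of the figure is that the Per-Block Mode Table remembers which write mode each block was programmed with, so that the matching device settings (read/verify references, ISPP voltages) can be applied with a mode set before each command. The sketch below assumes that reading; `MockNandChip`, `mode_set`, skipping redundant mode sets, and the rule that every page of a block shares one mode are all hypothetical, not a real NAND command set.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MockNandChip:
    """Hypothetical NAND chip model that records commands and the active mode."""
    current_mode: Optional[int] = None
    mode_set_count: int = 0
    log: List[Tuple[str, int, int, Optional[int]]] = field(default_factory=list)

    def mode_set(self, mode: int) -> None:
        # Stands in for loading the mode's read/verify references and ISPP voltages;
        # the slide notes this overhead is much smaller than tPROG.
        self.current_mode = mode
        self.mode_set_count += 1

    def program(self, block: int, page: int) -> None:
        self.log.append(("program", block, page, self.current_mode))

    def read(self, block: int, page: int) -> None:
        self.log.append(("read", block, page, self.current_mode))


class PerBlockModeTable:
    """Remembers the write mode each block was programmed with (block -> mode)."""

    def __init__(self, chip: MockNandChip):
        self.chip = chip
        self.mode_of: Dict[int, int] = {}

    def _apply(self, mode: int) -> None:
        # Assumption: skip the mode set when the chip is already in that mode.
        if self.chip.current_mode != mode:
            self.chip.mode_set(mode)

    def write(self, block: int, page: int, mode: int) -> None:
        recorded = self.mode_of.setdefault(block, mode)
        if recorded != mode:
            raise ValueError(f"block {block} was opened in mode {recorded}")
        self._apply(mode)
        self.chip.program(block, page)

    def read(self, block: int, page: int) -> None:
        # Reads reuse the block's recorded write mode so the matching
        # read reference voltages are in effect.
        self._apply(self.mode_of[block])
        self.chip.read(block, page)


chip = MockNandChip()
table = PerBlockModeTable(chip)
table.write(7, 0, mode=3)
table.write(7, 1, mode=3)
table.write(9, 0, mode=2)
table.read(7, 0)          # switches the chip back to mode 3 before reading
print(chip.log[-1], chip.mode_set_count)
```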
Erase mode (EVmode) selection by case:
Foreground garbage collection
Background garbage collection, wear leveling: prediction based on the past utilization history
Incorrect prediction → lazy erase
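The prediction step can be illustrated with a simple stand-in. The slide says only that the choice is based on the past utilization history; the moving average, the window size, mapping the predicted utilization through the write-mode thresholds, and the fallback to mode 0 with no history are all assumptions made for this sketch.

```python
from collections import deque

# (upper bound of utilization in %, write mode), from the buffer utilization table;
# mode 0 for u > 80% is assumed from the 4 -> 1 progression.
THRESHOLDS = ((20, 4), (40, 3), (60, 2), (80, 1))


def mode_for_utilization(u_pct: float) -> int:
    for upper, mode in THRESHOLDS:
        if u_pct <= upper:
            return mode
    return 0


class UtilizationPredictor:
    """Hypothetical stand-in predictor: moving average of recent buffer utilization."""

    def __init__(self, window: int = 8):
        self.history = deque(maxlen=window)  # oldest samples fall off automatically

    def record(self, u_pct: float) -> None:
        self.history.append(u_pct)

    def predicted_write_mode(self) -> int:
        if not self.history:
            return 0  # assumption: with no history, plan for the fastest write mode
        return mode_for_utilization(sum(self.history) / len(self.history))


p = UtilizationPredictor(window=4)
for u in (10, 30, 10, 30):
    p.record(u)
print(p.predicted_write_mode())   # average 20% -> mode 4
```

A wrong prediction, which the slide handles with lazy erase, is not modeled here.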
[Figure: page copy with a selected Wmode from a victim block to a free block; circular buffer shown with the current utilization u and the effective utilization]
Extended FlashBench* configuration

                           Mobile           Server
NAND flash chip            128 blocks/chip, 8 KB/page
Chips/channel              2                8
# of channels              1                4
Size of circular buffer    80 KB            32 MB
NAND timing model          Timing-accurate emulation model using hrtimers (variation < 1%)
I/O traces                 2 traces         6 traces

* S. Lee et al., “FlashBench: A Workbench for a Rapid Development of Flash-Based Storage Devices,” IEEE Int. Symp. Rapid System Prototyping, 2012.
[Figure: utilization (0% to 100%) and effective inter-arrival time per trace]
[Figure: write mode distribution (mode 0 to mode 4, 0% to 100%) for the prxy_0 and proj_0 traces; Baseline vs. AutoFTL comparison (normalized, axis 0.5 to 3) across traces, with AutoFTL gains of +46%, +50%, +82%, +78%, +39%, +76%, +80%, and +37%]
[Figure: Baseline vs. AutoFTL comparison (normalized, axis 0.5 to 1.5), with the maximum marked (Max.)]