SLIDE 1
Long-Term JPEG Data Protection and Recovery for NAND Flash-Based Solid-State Storage
Yu-Chun Kuo, Ruei-Fong Chiu, and Ren-Shuo Liu
System and Storage Design Lab Department of Electrical Engineering National Tsing Hua University Taiwan
SLIDE 2 Overview
- SD cards and eMMC constitute massive storage
- Tens to hundreds of Exabytes per year
- JPEG pictures are among the most valuable data stored on them
- Leaving JPEG files in SD and eMMC for a long term is risky
- NAND flash is prone to retention errors
- Uncorrectable errors lead to corrupted pictures
A Few Years Later
SLIDE 3 Contributions
- Increase the robustness of JPEG stored in NAND flash
- At the cost of 9.9% storage overhead
- Rescue corrupted JPEG files
- Four techniques based on our observations
- Strong-page header protection
- Bit error propagation prevention
- DC error propagation mitigation
- Huffman-assisted error correction
- Compatible with existing JPEG viewers
SLIDE 4 Outline
- JPEG Background
- Observations and Design
- Evaluation
- Conclusion
SLIDE 5 JPEG Encoding Steps (Simplified)
- DCT: Discrete Cosine Transform
- DPCM: Differential Pulse Code Modulation
- JFIF: JPEG File Interchange Format
[Figure: encoding pipeline: 8×8 blocks → DCT → DC (DPCM) and AC → Huffman compression → JFIF]
SLIDE 6
[Figure: the image is divided into 8×8 pixel blocks]
SLIDE 7
[Figure: the DCT maps each 8×8 pixel block to an 8×8 coefficient block: one DC (mean value of the 8×8 block) and 63 ACs]
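To make the DC/AC split concrete, here is a minimal sketch of the JPEG-style 2D DCT of one block. The sample block is made up for illustration. Under the standard JPEG scaling, the DC coefficient comes out as 8 times the block mean, which is why the slide describes DC as the mean value.

```python
import math

def dct_coefficient(block, u, v):
    """JPEG-style 2D DCT-II coefficient F(u, v) of an 8x8 block."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    total = 0.0
    for x in range(8):
        for y in range(8):
            total += (block[x][y]
                      * math.cos((2 * x + 1) * u * math.pi / 16)
                      * math.cos((2 * y + 1) * v * math.pi / 16))
    return 0.25 * c(u) * c(v) * total

# Illustrative pixel values only (not from the slides).
block = [[(3 * x + 5 * y) % 17 for y in range(8)] for x in range(8)]
mean = sum(map(sum, block)) / 64
dc = dct_coefficient(block, 0, 0)   # the single DC coefficient
# The remaining 63 (u, v) pairs are the AC coefficients.
```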
SLIDE 8
[Figure: absolute DC values versus differential (DPCM) DC values]
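The DPCM step can be sketched as follows. The DC values are invented for illustration, and the predictor is assumed to start at 0.

```python
from itertools import accumulate

def dpcm_encode(dc_values):
    """Store each DC as its difference from the previous DC (predictor starts at 0)."""
    prev = 0
    diffs = []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

def dpcm_decode(diffs):
    """Rebuild absolute DC values as a running sum of the differences."""
    return list(accumulate(diffs))

absolute = [100, 102, 101, 105, 104]   # illustrative values
diffs = dpcm_encode(absolute)          # small numbers after the first entry
restored = dpcm_decode(diffs)
```

Neighboring blocks tend to have similar DC values, so the differences are small and compress well.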
SLIDE 9
- Popular values: fewer bits
- Less-popular values: more bits
- Example codes: 011 → 2, 111 → 3, 0011110 → 14, 111 → 3
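A minimal sketch of the variable-length idea, using the three example codes from this slide as a toy codebook. Real JPEG Huffman tables encode size and run-length symbols, so this is a simplification in the same spirit as the slide.

```python
# Toy codebook taken from the slide's example: value -> code bits.
CODEBOOK = {2: "011", 3: "111", 14: "0011110"}

def huffman_encode(values, codebook=CODEBOOK):
    """Concatenate the variable-length codes of all values."""
    return "".join(codebook[v] for v in values)

bits = huffman_encode([2, 3, 14, 3])
```

The popular values 2 and 3 cost 3 bits each, while the rare value 14 costs 7 bits.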
SLIDE 10
Header
- Picture width & height
- Sampling method
- Huffman tables
Body
- Huffman bits
SLIDE 11 Outline
- Background
- Observations and Design
- Evaluation
- Conclusion
SLIDE 12 Observations
- Unequal criticality of JPEG file contents
- Error propagation phenomena
- Bit error propagation
- DC error propagation
- Skewed reliability of NAND flash
SLIDE 13 Unequal Criticality of JPEG File Contents
Header
- Picture width & height
- Sampling method
- Huffman tables
Body
- Huffman bits of all 8×8 blocks
SLIDE 14 Unequal Criticality of JPEG Data
A single bit error in the header very likely corrupts the entire picture
SLIDE 15 Unequal Criticality of JPEG Data
A single bit error in the body: the result varies
- Nearly identical
- Horizontal stripes
- Totally corrupted
SLIDE 16 Observations
- Unequal criticality of JPEG
- Error propagation phenomena
- DC error propagation: horizontal stripes
- Bit error propagation: totally corrupted
SLIDE 17 Bit Error Propagation Phenomenon
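Huffman is a variable-length coding scheme
A bit error can change a code's length, so many of the following codes can be mis-decoded
[Figure: the original stream 011 111 0011110 111 decodes to 2 3 14 3; after a bit error the stream is parsed as 011 111 011 111 011 ..., and the following values are mis-decoded]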
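The phenomenon can be reproduced with a toy decoder built from the slide's three example codes. Which bit gets flipped (index 7 here) is an assumption chosen to match the parsing shown in the figure.

```python
# Toy codebook from the slide's example: code bits -> value.
CODEBOOK = {"011": 2, "111": 3, "0011110": 14}

def huffman_decode(bits, codebook=CODEBOOK):
    """Greedy prefix-code decoding; returns (values, leftover undecodable bits)."""
    values, current = [], ""
    for b in bits:
        current += b
        if current in codebook:
            values.append(codebook[current])
            current = ""
    return values, current

def flip(bits, i):
    """Return a copy of the bit string with bit i inverted."""
    return bits[:i] + ("1" if bits[i] == "0" else "0") + bits[i + 1:]

original = "0111110011110111"      # 011 111 0011110 111
corrupted = flip(original, 7)      # one bit error inside the long code

good = huffman_decode(original)
bad = huffman_decode(corrupted)
```

A single flipped bit turns the 7-bit code into shorter codes, so every code boundary after the error shifts and the block decodes to five values instead of four.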
SLIDE 18 DC Error Propagation Phenomenon
JPEG stores differential DC values
Once a bit error interferes with one value, the following values are also mis-decoded
[Figure: original values, DPCM-encoded values, and decoded values]
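A small sketch of why one damaged differential shifts every later DC. The values and the size of the error (+8) are invented for illustration.

```python
from itertools import accumulate

diffs = [100, 2, -1, 4, -1]        # illustrative DPCM-encoded DC values
corrupted = list(diffs)
corrupted[1] += 8                  # a bit error damages one differential

decoded_ok = list(accumulate(diffs))
decoded_bad = list(accumulate(corrupted))
errors = [bad - ok for ok, bad in zip(decoded_ok, decoded_bad)]
```

Because decoding is a running sum, the offset never goes away, which is consistent with the horizontal stripes seen in corrupted pictures.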
SLIDE 19 Observations
- Unequal criticality of JPEG
- Error propagation phenomena
- Bit error propagation
- DC error propagation
- Skewed reliability of NAND flash
SLIDE 20 Skewed Storage Reliability
- One third of flash pages can store data much more reliably than the other pages
- We refer to them as strong/weak pages
- This property is known to SD and eMMC vendors but is not exposed to users and applications
[Figure: flash address space divided into strong and weak pages]
SLIDE 21 Skewed Storage Reliability
- Bits are grouped into MSB, CSB, LSB pages
- LSB pages are strong pages for the flash we tested
SLIDE 22 Proposed Techniques
- Strong-page header protection
- Bit error propagation prevention
- DC error propagation mitigation
- Huffman-assisted error correction
SLIDE 23 Applications Oblivious to Strong/Weak Pages
[Figure: application data mapped to storage pages regardless of strength (weak, weak, strong)]
SLIDE 24 Strong-Page Header Protection
[Figure: application data mapped to storage pages; the header lands on strong pages (strong, strong, strong)]
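One way to picture the placement policy is the sketch below. It is not the authors' implementation: the function `place_chunks`, the chunk names, and the page-strength map (every third page strong) are all hypothetical, and the slides do not say how the strong pages are identified from the application side.

```python
def place_chunks(header_chunks, body_chunks, page_strengths):
    """Put header chunks on strong pages first; the body takes the remaining pages.

    page_strengths is a hypothetical per-page map of "strong"/"weak" labels.
    Returns a dict: page number -> chunk.
    """
    strong = [i for i, s in enumerate(page_strengths) if s == "strong"]
    weak = [i for i, s in enumerate(page_strengths) if s == "weak"]
    if len(header_chunks) > len(strong):
        raise ValueError("not enough strong pages for the header")
    layout = dict(zip(strong, header_chunks))
    remaining = weak + strong[len(header_chunks):]
    if len(body_chunks) > len(remaining):
        raise ValueError("not enough pages for the body")
    layout.update(zip(remaining, body_chunks))
    return layout

strengths = ["strong", "weak", "weak"] * 3   # assumed: one third of pages strong
layout = place_chunks(["H0", "H1"], ["B0", "B1", "B2", "B3"], strengths)
```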
SLIDE 25 Bit Error Propagation Prevention
- We additionally store the length of each 8×8 block in the JPEG header
- This stops bit errors from propagating
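[Figure: the stored block length (16 bits) marks where the block 011 111 0011110 111 (values 2 3 14 3) ends, so a bit error inside the block cannot shift the start of the next block]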
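A minimal sketch of how stored block lengths contain the damage, using the slide's toy codes. The second block, the flipped bit position, and the function names are assumptions for illustration.

```python
# Toy codebook from the slide's example: code bits -> value.
CODEBOOK = {"011": 2, "111": 3, "0011110": 14}

def decode_block(bits):
    """Greedy prefix-code decoding of one bit string; leftover bits are dropped."""
    values, current = [], ""
    for b in bits:
        current += b
        if current in CODEBOOK:
            values.append(CODEBOOK[current])
            current = ""
    return values

def decode_with_lengths(stream, block_lengths):
    """Cut the stream at the stored block lengths, then decode each block alone."""
    blocks, pos = [], 0
    for n in block_lengths:
        blocks.append(decode_block(stream[pos:pos + n]))
        pos += n
    return blocks

block_a = "0111110011110111"                 # 16 bits: 2 3 14 3
block_b = "111011"                           # 6 bits: 3 2 (made-up second block)
stream = block_a + block_b
corrupted = stream[:7] + "1" + stream[8:]    # bit error inside block_a

without_lengths = decode_block(corrupted)
with_lengths = decode_with_lengths(corrupted, [16, 6])
```

Without the lengths, the misalignment from the first block spills into the second. With them, the first block is still wrong but the second decodes correctly.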
SLIDE 26 DC Error Propagation Mitigation
- Thumbnail
- Small JPEG embedded in the header of the main JPEG
- Facilitates image preview
- We propose to set the width and height of the thumbnail to 1/8 of those of the main JPEG
- By doing so, thumbnail pixels approximate the DC values of the main JPEG
[Figure: thumbnail]
SLIDE 27 DC Error Propagation Mitigation
- Use thumbnail pixels to calibrate decoded DCs
- Error propagation is mitigated
[Figure: decoded DCs compared against thumbnail pixels; values shown: 10 16 17 19 21]
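The calibration idea might look like the sketch below. The rule used here (track a running offset and snap to the thumbnail pixel when the mismatch exceeds a tolerance) is an assumption, not necessarily the authors' method. It also assumes the decoded DCs and thumbnail pixels are already in the same units; all numbers are invented.

```python
def calibrate_dcs(decoded_dcs, thumbnail_pixels, tolerance=4):
    """Remove persistent DC offsets by comparing against thumbnail pixels."""
    calibrated, offset = [], 0
    for dc, ref in zip(decoded_dcs, thumbnail_pixels):
        value = dc - offset                  # undo the offset learned so far
        if abs(value - ref) > tolerance:     # too far from the thumbnail: new error
            offset += value - ref
            value = ref
        calibrated.append(value)
    return calibrated

true_dcs = [100, 102, 101, 105, 104]
thumbnail = [101, 101, 102, 104, 105]        # approximates the true DCs
decoded = [100, 110, 109, 113, 112]          # +8 error propagating from block 1
calibrated = calibrate_dcs(decoded, thumbnail)
worst = max(abs(c - t) for c, t in zip(calibrated, true_dcs))
```

The result is only as accurate as the thumbnail, so the error is mitigated rather than eliminated.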
SLIDE 28 Huffman-Assisted Error Correction
- Many 8×8 blocks contain only a single bit error
- An 8×8 block is around 100 bits
- The target bit error rate is 10^-2
- We propose to correct a single bit error per 8×8 block in a trial-and-error manner
[Figure: flip one bit → decode the block → check]
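The claim that many blocks contain only a single bit error can be checked with a quick binomial estimate, assuming bit errors are independent (an assumption not stated on the slide).

```python
from math import comb

def prob_k_errors(n_bits, ber, k):
    """Binomial probability of exactly k bit errors among n_bits independent bits."""
    return comb(n_bits, k) * ber**k * (1 - ber)**(n_bits - k)

p0 = prob_k_errors(100, 1e-2, 0)      # error-free block
p1 = prob_k_errors(100, 1e-2, 1)      # exactly one bit error
p_multi = 1 - p0 - p1                 # two or more bit errors
single_share = p1 / (1 - p0)          # share of erroneous blocks with one error
```

With 100-bit blocks at a bit error rate of 10^-2, roughly 37% of blocks are clean, 37% have exactly one error, and 26% have more, so single-bit correction covers more than half of the damaged blocks.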
SLIDE 29 Huffman-Assisted Error Correction
- We additionally store the number of Huffman codes of each 8×8 block in the header to check whether decoding is successful
[Figure: the stored code count is 4; the corrupted block decodes to 5 codes and fails the check, so the loop repeats: flip one bit → decode the block → check]
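The flip, decode, check loop can be sketched with the slide's toy codes. The function names are hypothetical, and the check used here (exact code count and no leftover bits) is an assumption about what counts as a successful decode.

```python
# Toy codebook from the slide's example: code bits -> value.
CODEBOOK = {"011": 2, "111": 3, "0011110": 14}

def decode_block(bits):
    """Greedy prefix-code decoding; returns (values, leftover undecodable bits)."""
    values, current = [], ""
    for b in bits:
        current += b
        if current in CODEBOOK:
            values.append(CODEBOOK[current])
            current = ""
    return values, current

def candidates(bits, expected_codes):
    """Try every single-bit flip and keep those whose decoding passes the check."""
    found = []
    for i in range(len(bits)):
        trial = bits[:i] + ("1" if bits[i] == "0" else "0") + bits[i + 1:]
        values, leftover = decode_block(trial)
        if leftover == "" and len(values) == expected_codes:
            found.append((i, values))
    return found

corrupted = "0111110111110111"        # decodes to 5 codes plus a stray bit
repairs = candidates(corrupted, expected_codes=4)
```

In this toy case exactly one flip passes the check and restores 2 3 14 3. In general several flips may pass, so a real decoder would need a further tie-breaking rule.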
SLIDE 30 Outline
- Background
- Observations and Design
- Evaluation
- Conclusion
SLIDE 31 Setup
- Platform
- Xilinx Zedboard FPGA
- 16nm, 3-bit-per-cell flash chip
- 105 JPEG files
- 100 from personal iPhone (3264×2448)
- Five from the Kodak suite (3072×2048)
- Temperature acceleration
- 70 hours at 85 °C = 10 years at 25 °C
- Assume bit error rates greater than 5×10^-3 are uncorrectable
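As a sanity check on the temperature acceleration, the sketch below computes the acceleration factor implied by the slide and, assuming an Arrhenius model (the slides do not name the model used), the activation energy that this factor would correspond to.

```python
import math

HOURS_PER_YEAR = 365 * 24
BOLTZMANN_EV_PER_K = 8.617333e-5

# Acceleration factor implied by "70 hours at 85 C = 10 years at 25 C".
acceleration = 10 * HOURS_PER_YEAR / 70

def implied_activation_energy(af, t_use_c, t_stress_c):
    """Activation energy (eV) for which an Arrhenius model (assumed) yields af."""
    t_use = t_use_c + 273.15
    t_stress = t_stress_c + 273.15
    return BOLTZMANN_EV_PER_K * math.log(af) / (1 / t_use - 1 / t_stress)

ea = implied_activation_energy(acceleration, 25, 85)
```

The factor is about 1250, which under the Arrhenius assumption corresponds to an activation energy of roughly 1.1 eV.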
SLIDE 32 Experiments
- Flash characterization
- Average bit error rate
- Percentage of uncorrectable 2KB data blocks
- JPEG image quality at retention times within 10 years
- PSNR (Peak Signal to Noise Ratio)
- SSIM (Structural Similarity Index)
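For reference, PSNR follows the standard definition 10·log10(peak² / MSE). A minimal sketch for 8-bit pixels, with images given as flat lists (the function name and sample values are illustrative):

```python
import math

def psnr(reference, test, peak=255):
    """PSNR in dB between two equally sized flat lists of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf            # identical images
    return 10 * math.log10(peak ** 2 / mse)

identical = psnr([10, 20], [10, 20])
shifted = psnr([100] * 4, [110] * 4)   # every pixel off by 10
```

Higher is better; a uniform error of 10 gray levels gives about 28 dB.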
SLIDE 33 Average Raw BERs (Within 10 Years at 25 °C)
[Figure: average raw BER, strong pages versus weak pages]
SLIDE 34 Average % of Uncorrectable 2KB Blocks
[Figure: average percentage of uncorrectable 2KB blocks, strong pages versus weak pages]
SLIDE 35 Image Quality (10 Years at 25 °C)
[Figure: image panels labeled Ideal JPEG, Baseline, "protection", "prevention", and This work (all four techniques)]
SLIDE 36 Average PSNR (Within 10 Years at 25 °C)
SLIDE 37 Average SSIM (Within 10 Years at 25 °C)
SLIDE 38 Concerns About Employing Extra ECC Parities
- Employing them at the flash chip level
- Cost per bit increases
- Vendors are reluctant to do so
- Employing them at the disk level
- Disk capacity becomes non-constant
- May be problematic for applications and operating systems
- Employing them at the application level
- Effectiveness of the extra parities is limited
- Modern ECCs rely heavily on low-level access to flash memory
SLIDE 39 Conclusion
- Increase the robustness of JPEG files and rescue corrupted JPEG files in flash-based storage
- Four techniques
- Strong-page header protection
- Bit error propagation prevention
- DC error propagation mitigation
- Huffman-assisted error correction
- Rescue corrupted JPEG files (10 years @ 25 °C)
- Up to 24.3 dB PSNR improvement
- At the cost of 9.9% storage overhead
- Backward compatible with existing JPEG viewers
SLIDE 41 JPEG Decoding and Recovery Speed
- It takes 12 seconds on average for our program to recover a corrupted (10-year) JPEG file
- Note that
- Speed is not a top concern for rescuing corrupted JPEG files
- It is easy to parallelize the recovery tasks of multiple corrupted JPEG files
SLIDE 42 Skewed Storage Reliability
[Figure: two cases: LSB pages are strong pages; MSB pages are strong pages]