SLIDE 1
Approximate Storage in Solid-State Memories
Adrian Sampson, Jacob Nelson, Karin Strauss, Luis Ceze
University of Washington; Microsoft Research & UW; University of Washington
MICRO 2013
SLIDE 2 Vector Processor
GPU CPU
Compiler Runtime
Accelerator
SLIDE 3
SLIDE 4 Compute
Display Disk Network
I/O Storage
Memory
SLIDE 5 Compute
Disk
I/O Storage
Memory
SLIDE 6
SLIDE 7
SLIDE 8
SLIDE 9
[Figures: output quality loss (0% to 100%) versus average write steps, shown for jmeint; results for main-memory applications using failed blocks (raytracer, zxing, fft, jmeint, mc, smm, ...)]
SLIDE 10 Themes in approximate computing
Interleaving: Programs are both approximate & precise [diagram labels: approx, precise]
Error mitigation: Exploit the hardware to minimize error [diagram labels: LO, HI, x ± y]
SLIDE 11
Phase-change memory (PCM)
:)
Surpasses DRAM’s scaling limits
Non-volatile
Faster than flash
“Almost” as fast as DRAM
SLIDE 12 Phase-change memory (PCM)
:(
Write speed & energy
Cells wear out
SLIDE 13 Phase-change memory (PCM) :(
Write speed & energy: Multi-level cells are denser but need more time and energy.
Cells wear out: Cells wear out over time and can no longer be used.
SLIDE 14
Phase-change memory (PCM) :(
Multi-level cells are denser but need more time and energy to protect against errors.
Cells wear out over time and can no longer be used for precise data storage.
SLIDE 15
Phase-change memory (PCM) :(
Fast Dense
SLIDE 16
Phase-change memory (PCM) :(
Fast Dense Accurate
SLIDE 17
Approximate storage in PCM
Trade off accuracy for performance in multi-level cell accesses. Use worn-out memory for approximate data instead of throwing it away.
SLIDE 18
Approximate storage in PCM
1. Trade off accuracy for performance in multi-level cell accesses.
2. Use worn-out memory for approximate data instead of throwing it away.
SLIDE 19
Approximate storage in PCM
1. Trade off accuracy for performance in multi-level cell accesses.
2. Use worn-out memory for approximate data instead of throwing it away.
SLIDE 20
Single-level cells
high low analog value digital value 1
SLIDE 21
Multi-level cells
high low analog value digital value 00 11 01 10
SLIDE 22 Writing to multi-level cells
high low analog value digital value 00 11 01 10
probability
SLIDE 23 Writing to multi-level cells, approximately
high low analog value digital value 00 11 01 10
probability
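The probability curves on the two slides above can be illustrated numerically. This is a minimal sketch, not the model used in the talk: it assumes a four-level cell whose analog range [0, 1] is split into equal bands, a write that aims at the centre of its band, and a Gaussian spread `sigma` around that target. The function names and the `sigma` values are illustrative assumptions.

```python
import math


def normal_cdf(x, mean, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def misread_probability(level, sigma, levels=4):
    """Probability that a write aimed at `level` is read back as another level.

    Assumption: the analog range [0, 1] is divided into `levels` equal bands
    and the written value is normally distributed around the band centre.
    """
    width = 1.0 / levels
    lo, hi = level * width, (level + 1) * width
    centre = (lo + hi) / 2.0
    if level == 0:
        # Anything below the upper boundary still reads as the lowest level.
        p_correct = normal_cdf(hi, centre, sigma)
    elif level == levels - 1:
        # Anything above the lower boundary still reads as the highest level.
        p_correct = 1.0 - normal_cdf(lo, centre, sigma)
    else:
        p_correct = normal_cdf(hi, centre, sigma) - normal_cdf(lo, centre, sigma)
    return 1.0 - p_correct


# A tight (precise) write versus a loose (approximate) write to an inner level.
p_precise = misread_probability(level=1, sigma=0.02)
p_approx = misread_probability(level=1, sigma=0.08)
print(f"precise write: {p_precise:.2e}, approximate write: {p_approx:.3f}")
```

Under these assumptions, widening the write distribution makes neighbouring levels overlap, so the misread probability rises from negligible to noticeable.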
SLIDE 24
Speed Density Accuracy
SLIDE 25 Iterative writes
high low 00 11 01 10 time
target range
SLIDE 26 Iterative writes, approximately
high low 00 11 01 10 time
target range
SLIDE 27 Iterative writes, approximately
high low 00 11 01 10 time
target range
SLIDE 28
wider target range → fewer iterations to converge → faster writes (or better density at the same speed)
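The chain from a wider target range to fewer iterations can be sketched with a toy program-and-verify loop. This is an illustrative model only, not the simulator behind the talk's results: it assumes each program pulse lands at the target plus Gaussian noise, and that the write stops as soon as a verify read finds the value inside the target range. All names and parameter values are assumptions.

```python
import random


def write_iterations(target, half_width, sigma, rng, max_iters=1000):
    """Count program-and-verify pulses until the cell lands in the target range."""
    for step in range(1, max_iters + 1):
        value = rng.gauss(target, sigma)       # program pulse with analog noise
        if abs(value - target) <= half_width:  # verify read
            return step
    return max_iters


def mean_iterations(half_width, sigma=0.1, writes=1000, seed=0):
    """Average pulse count over many simulated writes."""
    total = 0
    for i in range(writes):
        # One generator per write, so every range width sees the same noise.
        rng = random.Random(seed + i)
        total += write_iterations(0.5, half_width, sigma, rng)
    return total / writes


narrow = mean_iterations(half_width=0.025)  # precise write
wide = mean_iterations(half_width=0.1)      # approximate write
print(f"narrow range: {narrow:.2f} pulses, wide range: {wide:.2f} pulses")
```

Because both configurations see the same noise per write, the wide range can never need more pulses than the narrow one in this model; the saved pulses are what would translate into write speedup.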
SLIDE 29 Encoding to minimize error in approximate MLC
1 cell, 4 bits unreliable reliable
LO HI
x ± y
SLIDE 30 Encoding to minimize error in approximate MLC
4 cells, 16 bits: lots of errors
LO HI
x ± y
SLIDE 31 Encoding to minimize error in approximate MLC
4 cells, 16 bits: lots of errors
LO HI
x ± y
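The slides do not spell out the encoding, so the following is a hedged sketch of one way to keep errors in the low-order bits of a 16-bit value stored in four 4-bit cells. It assumes that the lowest bit of each cell is the least reliable and models an error as a flip of that bit, which is a simplification. The "concatenated" and "striped" layouts and all function names are illustrative assumptions.

```python
CELLS = 4
BITS_PER_CELL = 4
TOTAL_BITS = CELLS * BITS_PER_CELL


def concat_encode(value):
    """Cell 0 holds the top 4 bits of the value, cell 1 the next 4, and so on."""
    mask = (1 << BITS_PER_CELL) - 1
    return [(value >> (BITS_PER_CELL * (CELLS - 1 - i))) & mask for i in range(CELLS)]


def concat_decode(cells):
    value = 0
    for cell in cells:
        value = (value << BITS_PER_CELL) | cell
    return value


def striped_encode(value):
    """Spread the value's highest bits over the most reliable bit of each cell."""
    cells = [0] * CELLS
    for k in range(TOTAL_BITS):               # k = 0 is the value's top bit
        bit = (value >> (TOTAL_BITS - 1 - k)) & 1
        pos = BITS_PER_CELL - 1 - k // CELLS  # bit position inside the cell
        cells[k % CELLS] |= bit << pos
    return cells


def striped_decode(cells):
    value = 0
    for k in range(TOTAL_BITS):
        pos = BITS_PER_CELL - 1 - k // CELLS
        bit = (cells[k % CELLS] >> pos) & 1
        value |= bit << (TOTAL_BITS - 1 - k)
    return value


def worst_lsb_flip_error(encode, decode, value):
    """Largest numeric error when a single cell's lowest bit flips."""
    cells = encode(value)
    worst = 0
    for i in range(CELLS):
        damaged = list(cells)
        damaged[i] ^= 1
        worst = max(worst, abs(decode(damaged) - value))
    return worst


value = 0xBEEF
print(worst_lsb_flip_error(concat_encode, concat_decode, value))    # 4096
print(worst_lsb_flip_error(striped_encode, striped_decode, value))  # 8
```

In the concatenated layout the unreliable bit of the first cell is bit 12 of the value; in the striped layout every unreliable bit maps to one of the value's four lowest bits.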
SLIDE 32 Write speedup for approximate MLC
Writes are 1.7× faster on average with quality loss under 10%
[Bar chart: best write speedup (axis 0.5 to 2.5) for mc, smm, sor, fft, lu, zxing, jmeint, raytracer, pa, nn, ml, image, and the mean; bars grouped into main-memory benchmarks and persistent data]
SLIDE 33
Approximate storage in PCM
Trade off accuracy for performance in multi-level cell accesses. Use worn-out memory for approximate data instead of throwing it away.
SLIDE 34 Failed cells are a fact of life
0 1 1 0 1 0 0 1 1 1 0 1 0 0 1 0 1 1 1 0 0 0 1 1 0 0 1 0 1 1 0 0 1 0 0 0 1
a good block
SLIDE 35 Failed cells are a fact of life
0 1 1 0 1 0 0 1 1 1 0 1 0 0 1 0 1 1 1 0 0 0 1 1 0 0 1 0 1 1 0 0 1 0 0 0 1
a (tragically) failed block
SLIDE 36 Traditional error correction
0 1 1 0 1 0 0 1 1 1 0 1 0 0 1 0 1 1 1 0 0 0 1 1 0 0 1 0 1 1 0 0 1 0 0 0 1
corrected data block correction bits
SLIDE 37 Correction resources are exhaustible
0 1 1 0 1 0 0 1 1 1 0 1 0 0 1 0 1 1 1 0 0 0 1 1 0 0 1 0 1 1 0 0 1 0 0 0 1
uncorrectable (bad) block correction bits approximate
SLIDE 38 Prioritized error correction
0 1 1 0 1 0 0 1 1 1 0 1 0 0 1 0 1 1 1 0 0 0 1 1 0 0 1 0 1 1 0 0 1 0 0 0 1
uncorrectable (bad) block correction bits approximate
LO HI
x ± y
error exposed where it does the least harm
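Prioritized correction can be sketched as a budgeting problem: when a block has more failed bits than correction resources, repair the most significant failures first and leave the rest exposed. This is a minimal illustration, not the hardware mechanism from the talk. The stuck-at failure model, the correction budget, and the function names are all assumptions.

```python
def choose_corrections(failed_bits, budget):
    """Pick which failed bit positions to repair, most significant first."""
    return sorted(failed_bits, reverse=True)[:budget]


def read_word(true_value, stuck, corrected):
    """Read a word whose uncorrected failed bits return their stuck value.

    `stuck` maps a bit position to the value (0 or 1) that bit is stuck at.
    """
    value = true_value
    for pos, stuck_at in stuck.items():
        if pos in corrected:
            continue  # a correction resource restores this bit
        value = (value & ~(1 << pos)) | (stuck_at << pos)
    return value


true_value = 0xAAAA
stuck = {15: 0, 9: 0, 1: 0, 0: 1}  # four failed bits
budget = 2                         # but only two can be corrected

prioritized = choose_corrections(stuck, budget)  # repairs bits 15 and 9
low_first = sorted(stuck)[:budget]               # repairs bits 0 and 1

err_prioritized = abs(read_word(true_value, stuck, prioritized) - true_value)
err_low_first = abs(read_word(true_value, stuck, low_first) - true_value)
print(err_prioritized, err_low_first)  # 1 33280
```

With the same two corrections, spending them on the high-order failures leaves an error of 1, while spending them on the low-order failures leaves an error of 33280.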
SLIDE 39 Lifetime extension with block recycling
Lifetime extended by 23% on average, or from about 5.2 to 6.5 years
[Bar chart: normalized lifetime (writes), axis 0.2 to 2, for mc, smm, sor, fft, lu, zxing, jmeint, raytracer, pa, nn, ml, image, and the mean; bars grouped into main-memory benchmarks and persistent data]
SLIDE 40 Compute
Display Disk Network
I/O Storage
Memory
SLIDE 41 Compute
Display Disk Network
I/O Storage
Memory