The Unwritten Contract of Solid State Drives
Jun He, Sudarsun Kannan, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
Department of Computer Sciences, University of Wisconsin - Madison
Enterprise SSD revenue is expected to exceed enterprise HDD revenue in 2017
[Figure: enterprise HDD and enterprise SSD revenue (billion dollars) by year, 2012 to 2019E]
Source: Gartner, Stifel Estimates https://www.theregister.co.uk/2016/01/07/gartner_enterprise_ssd_hdd_revenue_crossover_in_2017/
http://crestingwave.com/sites/default/files/collateral/velobit_whitepaper_ssdperformancetips.pdf
Technologies (FAST ’10), San Jose, California, February 2010
Performance degradation
Performance fluctuation
Early end of device life
Block Device Interface: read(range), write(range), discard(range)
efficient than farther ones
MEMS-based storage devices and standard disk interfaces: A square peg in a round hole? Steven W. Schlosser, Gregory R. Ganger FAST’04
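The three operations of the block device interface can be illustrated with a toy in-memory device. This is only a sketch: the class name `ToyBlockDevice`, the 512-byte sector size, and the choice to return zeros for discarded sectors are assumptions made for illustration, not behavior stated in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBlockDevice:
    """Hypothetical in-memory device exposing read, write and discard on sector ranges."""
    num_sectors: int
    sector_size: int = 512                       # assumed sector size
    sectors: dict = field(default_factory=dict)  # sector number -> bytes

    def _check(self, start: int, count: int) -> None:
        # Reject empty ranges and ranges that fall outside the device.
        if start < 0 or count <= 0 or start + count > self.num_sectors:
            raise ValueError("range outside device")

    def write(self, start: int, data: bytes) -> None:
        count = len(data) // self.sector_size
        if count * self.sector_size != len(data):
            raise ValueError("data must be a whole number of sectors")
        self._check(start, count)
        for i in range(count):
            self.sectors[start + i] = data[i * self.sector_size:(i + 1) * self.sector_size]

    def read(self, start: int, count: int) -> bytes:
        self._check(start, count)
        zero = bytes(self.sector_size)  # assumption: unwritten or discarded sectors read as zeros
        return b"".join(self.sectors.get(start + i, zero) for i in range(count))

    def discard(self, start: int, count: int) -> None:
        # Tell the device the range no longer holds useful data.
        self._check(start, count)
        for i in range(start, start + count):
            self.sectors.pop(i, None)
```

Nothing in this interface tells the client which access patterns are cheap or expensive, which is why the performance rules remain an unwritten contract.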
5 apps: LevelDB, RocksDB, SQLite (RollBack), SQLite (WAL), Varmail
3 file systems: ext4, F2FS, XFS
Contract: Rule 1, Rule 2, Rule 3, Rule 4, Rule 5
To study the contract, we built a sophisticated SSD simulator and workload analyzer
Overview SSD Unwritten Contract Violations of the Unwritten Contract Conclusions
[Figure: SSD internals. Pages (P) are grouped into blocks, blocks are attached to channels, and a controller with RAM holds the mapping table and data cache]
SSD clients should issue large data requests or multiple concurrent requests
[Figure: a request arriving at an SSD with four channels]
Proceedings of the 17th International Symposium on High Performance Computer Architecture (HPCA-11), pages 266–277, San Antonio, Texas, February 2011.
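A rough way to see why request scale matters is to count how many channels a batch of in-flight requests can keep busy. The sketch below assumes logical pages are striped round-robin across four channels with 4KB pages; the striping policy, page size, and function names are illustrative assumptions, not details given in the slides.

```python
def channels_touched(offset_bytes, size_bytes, num_channels=4, page_bytes=4096):
    """Set of channels a request touches when logical pages are striped round-robin."""
    first = offset_bytes // page_bytes
    last = (offset_bytes + size_bytes - 1) // page_bytes
    return {page % num_channels for page in range(first, last + 1)}


def utilization(requests, num_channels=4, page_bytes=4096):
    """Fraction of channels kept busy by (offset, size) requests that are in flight together."""
    busy = set()
    for offset, size in requests:
        busy |= channels_touched(offset, size, num_channels, page_bytes)
    return len(busy) / num_channels


print(utilization([(0, 4096)]))                           # one small request: 0.25
print(utilization([(0, 16384)]))                          # one large request: 1.0
print(utilization([(i * 4096, 4096) for i in range(4)]))  # four concurrent small requests: 1.0
```

Under these assumptions, a single small request leaves three of the four channels idle, while either one large request or several concurrent small ones can reach all of them.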
SSD clients should access with locality
[Figure: the logical-to-physical mapping table spanning the SSD's RAM and flash pages (P)]
Optimize Address Translation in Flash Memory. In Proceedings of the EuroSys Conference (EuroSys ’15), Bordeaux, France, April 2015.
Details in the paper
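The benefit of locality can be sketched with a small cache of mapping entries. The model below assumes the SSD keeps only a few mapping-table pages in RAM, evicts them in LRU order, and stores 512 entries per mapping page; these parameters and the function name are hypothetical and chosen only to make the effect visible.

```python
import random
from collections import OrderedDict


def mapping_hit_ratio(lpns, cache_pages=8, entries_per_page=512):
    """Hit ratio of an LRU cache of mapping-table pages for a sequence of logical page numbers."""
    lpns = list(lpns)
    if not lpns:
        return 0.0
    cache = OrderedDict()  # mapping-page number -> True, ordered from oldest to newest use
    hits = 0
    for lpn in lpns:
        tpage = lpn // entries_per_page   # mapping page that holds this entry
        if tpage in cache:
            hits += 1
            cache.move_to_end(tpage)      # mark as most recently used
        else:
            cache[tpage] = True           # load the mapping page from flash
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / len(lpns)


rng = random.Random(0)
local = range(4096)                                        # accesses confined to 8 mapping pages
scattered = [rng.randrange(1_000_000) for _ in range(4096)]
print(mapping_hit_ratio(local))      # 4088 / 4096, only the first touch of each mapping page misses
print(mapping_hit_ratio(scattered))  # far lower, most accesses need a mapping page from flash
```

In this toy model every miss stands for an extra flash read to fetch mapping entries, which is the cost that access locality avoids.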
Data with similar death times should be placed in the same block.
[Animation: over a timeline from 1pm to 2pm, two placements of eight pages across two blocks. Grouped by death time: 1pm 1pm 1pm 1pm | 2pm 2pm 2pm 2pm. Mixed: 1pm 1pm 2pm 2pm | 1pm 1pm 2pm 2pm]
Performance impact: 4.8x write bandwidth, 1.6x throughput, 1.8x block erasure count
and Storage Technologies (FAST ’15), Santa Clara, California, February 2015.
J.-U. Kang, J. Hyun, H. Maeng, and S. Cho. The Multi-streamed Solid-State Drive. In 6th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage ’14), Philadelphia, PA, June 2014.
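The 1pm/2pm example can be worked through in a few lines. The sketch below counts how many still-live pages garbage collection would have to copy if, at a given hour, it reclaimed every block that contains dead pages. That reclamation policy, the function name, and the use of 24-hour integers (13 for 1pm, 14 for 2pm) are simplifying assumptions for illustration.

```python
def pages_moved_by_gc(blocks, now):
    """Live pages that must be copied out when every block holding dead pages is reclaimed at `now`."""
    moved = 0
    for death_times in blocks:
        dead = [t for t in death_times if t <= now]
        live = [t for t in death_times if t > now]
        if dead:                # the block has reclaimable space, so it gets cleaned
            moved += len(live)  # live pages must be relocated before the erase
    return moved


grouped = [[13, 13, 13, 13], [14, 14, 14, 14]]  # pages grouped by death time
mixed = [[13, 13, 14, 14], [13, 13, 14, 14]]    # death times mixed within each block

print(pages_moved_by_gc(grouped, now=13))  # 0: the 1pm block is entirely dead
print(pages_moved_by_gc(mixed, now=13))    # 4: two live 2pm pages per block must move
```

With grouped placement the block that dies at 1pm can be erased without copying anything, whereas the mixed placement forces data movement before either block can be erased.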
Clients of SSDs should create data with similar lifetimes
[Animation: usage counts across an SSD, first when all data has a lifetime of 1 day, then when lifetimes are a mix of 1 day and 1000 years]
Performance impact: 1.6x write latency
Technologies (FAST ’10), San Jose, California, February 2010.
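The lifetime example can be approximated with simple arithmetic. The sketch below assumes one piece of data per block, that a block is erased and rewritten each time its data dies, and that no wear leveling is applied; the function names and the 30-day window are hypothetical choices for illustration.

```python
def erase_counts(lifetimes_days, days):
    """Erase count per block after `days`, if each block is erased whenever its data dies."""
    return [days // lifetime for lifetime in lifetimes_days]


def wear_spread(counts):
    """Gap between the most-used and least-used block."""
    return max(counts) - min(counts)


similar = [1, 1, 1, 1]                     # every block holds 1-day data
mixed = [1, 1, 365_000, 365_000]           # two blocks hold roughly 1000-year data

print(erase_counts(similar, days=30), wear_spread(erase_counts(similar, days=30)))
# [30, 30, 30, 30] 0
print(erase_counts(mixed, days=30), wear_spread(erase_counts(mixed, days=30)))
# [30, 30, 0, 0] 30
```

In this simplified model, mixed lifetimes leave some blocks heavily used and others untouched. Evening out such a gap would require moving the long-lived data, which is one plausible source of the extra write latency reported above.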
Overview SSD Unwritten Contract Violations of the Unwritten Contract Conclusions
Block Trace
SSD Simulator: WiscSim
Analyzer: WiscSee
Rule violation?
Root Cause
2 of Our 24 Observations
[Figure: number of concurrent requests (axis 10 to 30) and request size in KB (axis 400 to 1200), for reads and writes, per application (leveldb, rocksdb, sqlite-rb, sqlite-wal, varmail), per workload (seq, rand, mix; small, large, mix for varmail), and per file system (ext4, f2fs, xfs)]
LevelDB & RocksDB: Insertions, Background Compactions
Median request size: ~100KB
Median number of concurrent requests: ~2
LevelDB and RocksDB can access files in large sizes.
[Figure: I/O stack of App, Page Cache, Block Layer, SSD. A 2MB read() issued by the app appears below as 128KB requests (128KB 128KB 128KB …), one request at a time]
Surprise! Even reading 2MB in your app will not utilize SSD well.
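The arithmetic behind this observation is small enough to check directly. The sketch below splits one application read into requests of at most 128KB, the size shown on the slide; the function name and the use of binary units (1KB = 1024 bytes) are assumptions.

```python
def split_request(offset, size, max_request=128 * 1024):
    """Split one read into (offset, length) requests no larger than `max_request` bytes."""
    chunks = []
    end = offset + size
    while offset < end:
        length = min(max_request, end - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


chunks = split_request(0, 2 * 1024 * 1024)
print(len(chunks))  # 16 requests of 128KB each
```

If those 16 requests are issued one at a time, as the slide depicts, the device sees a queue depth of one and a request size of 128KB, regardless of how large the application's read() was.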
2 of Our 24 Observations
[Figure: flash blocks containing valid and invalid pages, with valid ratios 1.0, 0.25, 0.75, 0.75]
[Figure: over-provisioned space, labeled "Ready to be used" in one case and "Move data before use" in the other]
[Figure: valid ratio (0.25 to 1.0) versus flash space (0.5 to 2) for ext4 and F2FS]
Stable-state curves characterize workloads.
Animations cannot be displayed in PDF. Please see the animations at http://pages.cs.wisc.edu/~jhe/zombie.html
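A valid-ratio curve of this kind can be derived from per-block page states. The sketch below computes each block's valid ratio and sorts the blocks from most to least valid; sorting in descending order is an assumption about how the curve is drawn, and the four example blocks simply reproduce the ratios 1.0, 0.25, 0.75, 0.75 shown earlier.

```python
def valid_ratio(block):
    """Fraction of pages in a block that still hold valid data (True means valid)."""
    return sum(block) / len(block)


def sorted_valid_ratios(blocks):
    """Valid ratios of all blocks, highest first (assumed ordering for the curve)."""
    return sorted((valid_ratio(b) for b in blocks), reverse=True)


blocks = [
    [True, True, True, True],      # 1.0
    [True, False, False, False],   # 0.25
    [True, True, True, False],     # 0.75
    [True, True, False, True],     # 0.75
]
print(sorted_valid_ratios(blocks))  # [1.0, 0.75, 0.75, 0.25]
```

Blocks with low valid ratios are cheap to reclaim, while blocks with high ratios require more data movement, so the shape of the curve hints at how much garbage collection work a workload leaves behind.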
Legacy file system allocation policies break locality
Application log structuring does not reduce GC
The SSD contract is multi-dimensional
workloads
Although not perfect, traditional file systems perform surprisingly well on SSDs
Myths spread if the unwritten contract is not clarified
Understanding the unwritten contract is crucial for designing high-performance applications and file systems
System design demands more vertical analysis
WiscSee (analyzer) and WiscSim (SSD simulator) are available at: http://research.cs.wisc.edu/adsl/Software/wiscsee