EvenDB: Optimizing Key-Value Storage for Spatial Locality
Eran Gilad, Edward Bortnikov, Anastasia Braginsky, Yonatan Gottesman, Eshcar Hillel (Yahoo Research), Idit Keidar (Technion), Nurit Moscovici (Outbrain), Rana Shahout (Technion)
Key-value store with put, get, and scan: k1 → v1, k2 → v2, ..., k9 → v9
○ Some keys are hotter than others
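The put/get/scan interface can be illustrated with a toy ordered map. This is a minimal sketch, not EvenDB's API: the class name `SortedKV` is hypothetical, and a sorted Python list with binary search stands in for a real storage engine.

```python
import bisect
from typing import Iterator, Optional


class SortedKV:
    """Hypothetical in-memory ordered map supporting put, get and scan."""

    def __init__(self) -> None:
        self._keys: list[str] = []  # kept sorted so range scans are cheap
        self._vals: list[str] = []

    def put(self, key: str, value: str) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value  # overwrite an existing key
        else:
            self._keys.insert(i, key)  # insert at its sorted position
            self._vals.insert(i, value)

    def get(self, key: str) -> Optional[str]:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._vals[i]
        return None

    def scan(self, lo: str, hi: str) -> Iterator[tuple[str, str]]:
        """Yield (key, value) pairs with lo <= key < hi, in key order."""
        i = bisect.bisect_left(self._keys, lo)
        while i < len(self._keys) and self._keys[i] < hi:
            yield self._keys[i], self._vals[i]
            i += 1
```

For example, after putting k1 → v1 through k9 → v9, `list(kv.scan("k3", "k6"))` returns the pairs for k3, k4 and k5.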
○ Some key ranges are hotter than others
  ○ e.g., complex keys
Complex keys: k1_l1 → v1, k1_l2 → v2, k1_l3 → v3, k2_l1 → v4, k2_l2 → v5, k3_l1 → v6, k3_l2 → v7, k3_l3 → v8, k3_l4 → v9; keys sharing a prefix form a contiguous range, so heat concentrates in whole ranges
○ Example: appname_timestamp
○ 1% of apps ⇒ 1% of key prefixes ⇒ 94% of events
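The 94% figure comes from production data. The sketch below only shows how such a statistic could be computed from a list of appname_timestamp keys; the function name `top_prefix_share` and the rule of splitting each key at its last underscore are assumptions for illustration.

```python
from collections import Counter


def top_prefix_share(keys: list[str], top_fraction: float, sep: str = "_") -> float:
    """Fraction of events whose key prefix (text before the last `sep`)
    is among the most popular `top_fraction` of distinct prefixes."""
    if not keys:
        raise ValueError("need at least one key")
    counts = Counter(k.rsplit(sep, 1)[0] for k in keys)  # events per app
    n_top = max(1, int(len(counts) * top_fraction))  # keep at least one prefix
    top_events = sum(c for _, c in counts.most_common(n_top))
    return top_events / len(keys)
```

With keys `["a_1", "a_2", "a_3", "b_1"]` and `top_fraction=0.5`, the single most popular prefix `a` accounts for 3 of 4 events, giving 0.75.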
[Figure: Mobile apps events distribution. Probability density (10^-8 to 10^-2) vs. app popularity ranking (10^0 to 10^4), log scale]
LSM tree: a MemTable in memory; levels L0, L1, L2 on disk, each holding files that cover key ranges k1..kn
○ Key ranges overlap across levels
○ Each level has more capacity than the one above it (e.g., 10x)
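Because key ranges overlap across levels, a single get may probe several files before finding the newest version of a key. The sketch below models this under simplifying assumptions: `Run` and `lsm_get` are hypothetical names, each file is a dict with a [lo, hi] key range, and levels are listed newest first (L0, L1, ...), with L0 files also newest first.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Run:
    """Hypothetical stand-in for an SSTable covering keys [lo, hi]."""
    lo: str
    hi: str
    data: dict[str, str] = field(default_factory=dict)


def lsm_get(memtable: dict[str, str], levels: list[list[Run]],
            key: str) -> tuple[Optional[str], int]:
    """Return (value, number of files probed). Newer data shadows older
    data, so search the MemTable, then L0, L1, ... and stop at the first hit."""
    if key in memtable:
        return memtable[key], 0
    probes = 0
    for level in levels:  # levels[0] is L0
        for run in level:  # L0 files may overlap each other
            if run.lo <= key <= run.hi:  # key falls in this file's range
                probes += 1
                if key in run.data:
                    return run.data[key], probes
    return None, probes
```

A key stored only in L1 costs one probe for every overlapping L0 file plus one for L1.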
LSM compaction (data flows from the MemTable down through L0, L1, L2 in update-time order):
○ Compactions merge hot and cold ranges, so cold data is rewritten along with hot data
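Why merging hot and cold ranges hurts can be seen in a toy compaction step: updating a few hot keys forces every co-located cold key to be rewritten. This is a minimal sketch; `compact` is a hypothetical name, dicts stand in for sorted files, and amplification is counted in entries rather than bytes.

```python
def compact(newer: dict[str, str], older: dict[str, str]) -> tuple[dict[str, str], float]:
    """Merge `newer` into `older` (newer values win). Returns the merged,
    key-sorted run and this step's write amplification, measured as
    entries written / entries actually updated."""
    if not newer:
        raise ValueError("nothing to compact")
    merged = {**older, **newer}  # newer values shadow older ones
    run = dict(sorted(merged.items()))  # output is a single sorted run
    return run, len(run) / len(newer)
```

With 8 cold keys and 2 hot keys in the older run, compacting 2 hot-key updates writes 10 entries, an amplification of 5.0 for that step alone.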
LSM scan:
○ Ranges are fragmented across the MemTable and all disk levels, so scan(...) must read and merge data from each of them
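A range scan over fragmented components is a k-way merge that keeps only the newest version of each key. This is a minimal sketch assuming each component is a key-sorted list of (key, value) pairs, passed newest first; `merged_scan` is a hypothetical name and deletions (tombstones) are ignored.

```python
import heapq
from typing import Iterator


def _tagged(comp: list[tuple[str, str]], age: int, lo: str, hi: str):
    """Yield (key, age, value) for entries in [lo, hi); age 0 is newest."""
    for k, v in comp:
        if lo <= k < hi:
            yield k, age, v


def merged_scan(components: list[list[tuple[str, str]]],
                lo: str, hi: str) -> Iterator[tuple[str, str]]:
    """Scan [lo, hi) over sorted components ordered newest first
    (MemTable, L0, L1, ...). Each key is emitted once, with its newest value."""
    streams = [_tagged(comp, age, lo, hi) for age, comp in enumerate(components)]
    last = None
    # heapq.merge orders by (key, age), so the newest version comes first.
    for key, _age, value in heapq.merge(*streams):
        if key != last:
            yield key, value
            last = key
```

Every component contributes a stream even if it holds only a few keys in the range, which is the cost the slide points at.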
EvenDB organizes data in chunks (contiguous key ranges):
○ Much smaller than shards
○ Much larger than blocks
Chunks are the unit of:
○ Disk I/O
○ Compaction
○ Memory caching
○ Concurrency control
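To make the granularity concrete, the sketch below greedily cuts a key-sorted dataset into contiguous chunks bounded by a size limit. It is an illustration under stated assumptions: `partition_into_chunks` and `max_chunk_bytes` are hypothetical, and the slide gives no concrete chunk size beyond "much smaller than shards, much larger than blocks".

```python
def partition_into_chunks(items: list[tuple[str, bytes]],
                          max_chunk_bytes: int) -> list[list[tuple[str, bytes]]]:
    """Greedily cut key-sorted (key, value) pairs into contiguous chunks
    whose payload stays within max_chunk_bytes (a single oversized entry
    still gets its own chunk)."""
    chunks: list[list[tuple[str, bytes]]] = []
    current: list[tuple[str, bytes]] = []
    size = 0
    for key, value in items:
        entry = len(key.encode()) + len(value)  # key bytes + value bytes
        if current and size + entry > max_chunk_bytes:
            chunks.append(current)  # close the chunk before it overflows
            current, size = [], 0
        current.append((key, value))
        size += entry
    if current:
        chunks.append(current)
    return chunks
```

Because chunks are contiguous in key order, a hot key range maps to a small number of chunks that can be cached, compacted, or locked as a unit.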
Chunks form a linked list (chunk → chunk → chunk)
○ Chunk objects live in RAM and hold metadata: versions, file handles, stats, etc.
○ The chunks' data resides on disk
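A linked list of chunk objects can be sketched with a small dataclass. The field names below are hypothetical stand-ins for the metadata listed on the slide (versions, file handles, stats), and the first chunk is assumed to start at the empty key so every key belongs to some chunk.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    """Hypothetical in-RAM metadata for one key range [min_key, next.min_key)."""
    min_key: str
    version: int = 0  # e.g., bumped when the chunk is rebalanced or split
    file_handle: Optional[str] = None  # placeholder for the on-disk data
    stats: dict[str, int] = field(default_factory=dict)
    next: Optional["Chunk"] = None


def build_chunk_list(min_keys: list[str]) -> Chunk:
    """Link chunks in key order and return the head."""
    if not min_keys:
        raise ValueError("need at least one chunk")
    head: Optional[Chunk] = None
    for k in sorted(min_keys, reverse=True):  # build back to front
        head = Chunk(min_key=k, next=head)
    return head


def find_chunk_linear(head: Chunk, key: str) -> Chunk:
    """Walk the list until the next chunk starts after `key` (O(n))."""
    cur = head
    while cur.next is not None and cur.next.min_key <= key:
        cur = cur.next
    return cur
```

The linear walk works but grows with the number of chunks, which motivates an index over the list.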
An index over the chunks quickly locates the chunk whose range includes the given key
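One simple way to "quickly locate the chunk whose range includes the given key" is binary search over the chunks' minimum keys. This sketch is an assumption, not EvenDB's actual index structure: `ChunkIndex` is hypothetical, and a sorted Python list replaces whatever concurrent structure a real system would use.

```python
import bisect


class ChunkIndex:
    """Hypothetical index mapping a key to the min_key of its chunk,
    where each chunk covers [min_key, next chunk's min_key)."""

    def __init__(self, min_keys: list[str]) -> None:
        if "" not in min_keys:
            raise ValueError("first chunk must start at the empty key")
        self._min_keys = sorted(min_keys)

    def locate(self, key: str) -> str:
        """O(log n): the last chunk whose min_key <= key."""
        i = bisect.bisect_right(self._min_keys, key) - 1
        return self._min_keys[i]

    def add_chunk(self, min_key: str) -> None:
        """Register a new chunk, e.g., after a split."""
        bisect.insort(self._min_keys, min_key)
```

For chunks starting at "", "g" and "p", `locate("m")` returns "g"; after `add_chunk("k")` it returns "k".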
Each chunk's on-disk data is a funk (file chunk): an SSTable plus a log. RAM holds the chunk objects, the index, Bloom filters, and a row cache.
Writes:
○ Immediately store in the log
○ Occasionally merge the log into the SST
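The funk write path (append to the log now, merge into the SST later) can be sketched as follows. This is a minimal illustration: `Funk` and the `log_limit` merge trigger are hypothetical, and in-memory containers stand in for files.

```python
class Funk:
    """Hypothetical on-disk chunk: a sorted SSTable plus an append-only log."""

    def __init__(self, log_limit: int = 4) -> None:
        self.sst: dict[str, str] = {}  # sorted; rewritten only on merge
        self.log: list[tuple[str, str]] = []  # append-only, unsorted
        self.log_limit = log_limit  # assumed merge trigger

    def put(self, key: str, value: str) -> None:
        self.log.append((key, value))  # immediately store in the log
        if len(self.log) >= self.log_limit:
            self.merge_log()  # occasionally merge the log into the SST

    def merge_log(self) -> None:
        merged = dict(self.sst)
        for key, value in self.log:  # replay oldest to newest; later wins
            merged[key] = value
        self.sst = dict(sorted(merged.items()))
        self.log.clear()
```

Appending keeps writes sequential and cheap; the merge rewrites only this chunk's SST, not a whole level.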
Reads (get):
○ #1: search the row cache
○ #2: search the log
○ #3: search the SST
○ Scans always search both the SST and the log
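The read order above can be sketched as two functions. Assumptions: `funk_get` and `funk_scan` are hypothetical names, Bloom filters are omitted (they would only let the get skip the log or SST when the key is definitely absent), and the row cache is assumed to be kept consistent by the write path.

```python
from typing import Optional


def funk_get(row_cache: dict[str, str], log: list[tuple[str, str]],
             sst: dict[str, str], key: str) -> Optional[str]:
    if key in row_cache:  # 1: search the row cache
        return row_cache[key]
    for k, v in reversed(log):  # 2: search the log, newest entry first
        if k == key:
            return v
    return sst.get(key)  # 3: search the SST


def funk_scan(log: list[tuple[str, str]], sst: dict[str, str],
              lo: str, hi: str) -> list[tuple[str, str]]:
    """Scans skip the row cache: combine SST and log, log entries winning."""
    merged = {k: v for k, v in sst.items() if lo <= k < hi}
    for k, v in log:  # replay oldest to newest
        if lo <= k < hi:
            merged[k] = v
    return sorted(merged.items())
```

Scanning a single chunk touches just two sources, its SST and its log, instead of every LSM level.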
Adding munks (memory chunks): popular chunks are also cached in RAM as munks, kept in a munk cache
Writes:
○ #1: store in the log
○ #2: store in the munk
○ #3: occasionally rebalance the munk
○ #4: rarely create an SST from the munk
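The four write steps can be sketched for a chunk that has a munk. Several details are assumptions for illustration: `MunkChunk` and both thresholds are hypothetical, the munk is modeled as a sorted part plus an unsorted tail of recent writes that rebalancing folds back in, and the munk is assumed to hold all of the chunk's data so it can be written out directly as an SST.

```python
class MunkChunk:
    """Hypothetical chunk whose data is cached in RAM as a munk."""

    def __init__(self, rebalance_every: int = 4, flush_log_at: int = 8) -> None:
        self.log: list[tuple[str, str]] = []  # on-disk log, for durability
        self.munk_sorted: list[tuple[str, str]] = []  # sorted part of the munk
        self.munk_tail: list[tuple[str, str]] = []  # unsorted recent writes
        self.sst: dict[str, str] = {}
        self.rebalance_every = rebalance_every
        self.flush_log_at = flush_log_at

    def put(self, key: str, value: str) -> None:
        self.log.append((key, value))  # 1: store in the log
        self.munk_tail.append((key, value))  # 2: store in the munk
        if len(self.munk_tail) >= self.rebalance_every:
            self.rebalance()  # 3: occasionally rebalance the munk
        if len(self.log) >= self.flush_log_at:
            self.flush_to_sst()  # 4: rarely create an SST from the munk

    def rebalance(self) -> None:
        latest = dict(self.munk_sorted)
        latest.update(self.munk_tail)  # newer writes win
        self.munk_sorted = sorted(latest.items())
        self.munk_tail.clear()

    def flush_to_sst(self) -> None:
        self.rebalance()
        self.sst = dict(self.munk_sorted)  # assumes munk holds all chunk data
        self.log.clear()  # the new SST makes the log redundant
```

Rebalancing happens in memory, so the expensive disk rewrite (step 4) can be deferred much longer than a funk-only chunk could defer its log merge.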
Reads and scans of a chunk that has a munk search/scan the munk
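Serving reads from a munk can be sketched with the same hypothetical layout: a sorted part searched by binary search plus an unsorted tail of recent writes checked newest first. `munk_get` and `munk_scan` are illustrative names, not EvenDB's API.

```python
import bisect
from typing import Optional


def munk_get(munk_sorted: list[tuple[str, str]],
             munk_tail: list[tuple[str, str]], key: str) -> Optional[str]:
    for k, v in reversed(munk_tail):  # newest unsorted writes first
        if k == key:
            return v
    # (key,) sorts before any (key, value), so this finds the first match.
    i = bisect.bisect_left(munk_sorted, (key,))
    if i < len(munk_sorted) and munk_sorted[i][0] == key:
        return munk_sorted[i][1]
    return None


def munk_scan(munk_sorted: list[tuple[str, str]],
              munk_tail: list[tuple[str, str]],
              lo: str, hi: str) -> list[tuple[str, str]]:
    latest = {k: v for k, v in munk_sorted if lo <= k < hi}
    for k, v in munk_tail:  # tail entries are newer, oldest to newest
        if lo <= k < hi:
            latest[k] = v
    return sorted(latest.items())
```

Neither function touches disk, which is how memory serves both reads and scans for popular chunks.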
○ Traces from an internal production system, 256GB DB (some presented next)
○ Standard and extended YCSB benchmarks (results in the paper)
EvenDB is 4.4x faster, with 4x lower (better) write amplification
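Write amplification is the number of bytes the store writes to storage per byte the application writes, so "4x lower" compares that ratio between the two systems. A one-function sketch, with a hypothetical name and made-up numbers used only to show the arithmetic:

```python
def write_amplification(device_bytes_written: int, user_bytes_written: int) -> float:
    """Bytes written to storage per byte written by the application."""
    if user_bytes_written <= 0:
        raise ValueError("user_bytes_written must be positive")
    return device_bytes_written / user_bytes_written
```

For instance, a store that writes 120 GB to disk while ingesting 10 GB of user data has a write amplification of 12.0.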
EvenDB runs much more smoothly; RocksDB throughput drops during compaction
EvenDB is 1.2x faster than RocksDB
○ ~38-minute stall after DB creation
○ RocksDB is faster once its storage is optimized
EvenDB handles spatially-local workloads better than LSM:
○ Lower write amplification
○ Single level of storage, no overlapping ranges
○ Memory serves both reads and writes
○ Faster when the workload is spatially local or most of the working set fits in RAM
○ On par otherwise
○ Demonstrated on a real workload and on synthetic YCSB benchmarks

Thank you! Questions?