EvenDB: Optimizing Key-Value Storage for Spatial Locality

Eran Gilad, Edward Bortnikov, Anastasia Braginsky, Yonatan Gottesman, Eshcar Hillel (Yahoo Research), Idit Keidar (Technion), Nurit Moscovici (Outbrain), Rana Shahout (Technion)


SLIDE 1

EvenDB: Optimizing Key-Value Storage for Spatial Locality

Eran Gilad, Edward Bortnikov, Anastasia Braginsky, Yonatan Gottesman, Eshcar Hillel (Yahoo Research), Idit Keidar (Technion), Nurit Moscovici (Outbrain), Rana Shahout (Technion)

SLIDE 2

Key-value stores

  • key -> value mapping
  • put, get, scan

k1 → v1, k2 → v2, k3 → v3, k4 → v4, k5 → v5, k6 → v6, k7 → v7, k8 → v8, k9 → v9

SLIDE 3

Key-value stores

  • key -> value mapping
  • skewed workload: some keys are hotter
  • put, get, scan

k1 → v1, k2 → v2, k3 → v3, k4 → v4, k5 → v5, k6 → v6, k7 → v7, k8 → v8, k9 → v9

[Figure: the keys are marked on a scale from hot to cold.]

SLIDE 4

Key-value stores

  • key -> value mapping
  • skewed workload: some keys are hotter
  • spatial locality: some ranges are hotter
      ○ e.g., complex keys
  • put, get, scan

k1_l1 → v1, k1_l2 → v2, k1_l3 → v3, k2_l1 → v4, k2_l2 → v5, k3_l1 → v6, k3_l2 → v7, k3_l3 → v8, k3_l4 → v9

[Figure: the keys are marked on a scale from hot to cold.]

SLIDE 5

Key-value stores

  • key -> value mapping
  • skewed workload: some keys are hotter
  • spatial locality: some ranges are hotter
      ○ e.g., complex keys
  • Sample production trace:
      ○ appname_timestamp
      ○ 1% of apps ⇒ 1% key prefixes ⇒ 94% of events

[Figure: Mobile apps events distribution. x-axis: app popularity ranking, 10^0 to 10^4. y-axis: probability density, 10^-8 to 10^-2. Log scale.]
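
To illustrate why composite keys such as appname_timestamp produce spatial locality, here is a minimal Python sketch. The helper names (`make_key`, `prefix_scan`), the zero-padded timestamp encoding, and the sample events are assumptions made for illustration; they are not taken from the production trace.

```python
from bisect import bisect_left

def make_key(app: str, timestamp: int) -> str:
    # Assumed encoding: zero-pad the timestamp so that lexicographic
    # order of keys matches numeric order of timestamps.
    return f"{app}_{timestamp:010d}"

def prefix_scan(sorted_keys, prefix):
    """Return every key that starts with `prefix` from a sorted key list."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys sharing a prefix are contiguous in sorted order
        matches.append(key)
    return matches

# Hypothetical events: (app name, timestamp).
events = [("maps", 7), ("chat", 3), ("maps", 2), ("chat", 10), ("mail", 5)]
keys = sorted(make_key(app, ts) for app, ts in events)
chat_keys = prefix_scan(keys, "chat_")
```

Because all events of one app share a key prefix, a popular app maps to one contiguous, hot key range rather than to scattered hot keys.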

SLIDE 6

LSM-trees

[Figure: an LSM-tree with a MemTable in memory and levels L0, L1, L2 on disk, each component covering keys k1..kn. Annotations: ranges overlap; more capacity (e.g., 10x).]
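
The structure on this slide can be sketched as a toy LSM-tree in Python. This is a simplified illustration, not RocksDB's design: the class name `ToyLSM`, the `memtable_limit` parameter, and the single on-disk level are assumptions, and compaction into deeper levels is omitted.

```python
class ToyLSM:
    """Toy LSM-tree: an in-memory MemTable plus a list of flushed sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.l0 = []  # sorted runs, newest first; their key ranges may overlap

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush the MemTable as an immutable sorted run.
            self.l0.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.l0:  # search newest to oldest
            for k, v in run:
                if k == key:
                    return v
        return None

db = ToyLSM()
db.put("k1", "v1")
db.put("k2", "v2")   # triggers a flush
db.put("k1", "v1b")
db.put("k3", "v3")   # triggers a second flush
```

After these calls both runs contain k1, which shows how flushed components end up with overlapping ranges.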

SLIDE 7

LSM-trees are designed for temporal locality

[Figure: MemTable in memory and levels L0, L1, L2 on disk, arranged by update time. Compactions merge hot and cold ranges.]

SLIDE 8

LSM-trees are less suited for spatial locality

[Figure: MemTable in memory and levels L0, L1, L2 on disk, with a scan(...) over them. Ranges are fragmented.]
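
A hedged Python sketch of why fragmentation hurts scans: a range scan has to merge every sorted run that may hold part of the range. The function name `scan`, the newest-first ordering of `runs`, and the sample data are assumptions for illustration.

```python
import heapq

def scan(runs, lo, hi):
    """Merge-scan the key range [lo, hi] over sorted runs ordered newest first."""
    # Tag each entry with its run's age so that, for equal keys,
    # the entry from the newest run sorts first.
    tagged = (
        [(k, age, v) for k, v in run if lo <= k <= hi]
        for age, run in enumerate(runs)
    )
    result = []
    for k, _age, v in heapq.merge(*tagged):
        if not result or result[-1][0] != k:
            result.append((k, v))  # keep only the newest version of each key
    return result

# Hypothetical runs: the first is newer and overrides k1.
runs = [
    [("k1", "v1b"), ("k3", "v3")],
    [("k1", "v1"), ("k2", "v2")],
]
merged = scan(runs, "k1", "k3")
```

Even this short range touches both runs, so the cost of a scan grows with the number of components the range is spread over.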

SLIDE 9

EvenDB

  • Ordered key-value store
  • Optimized for spatial locality
  • Low write amplification
  • Persistent, fast recovery
  • Atomic operations, including scan

SLIDE 10

Chunk-based organization

  • Key space dynamically partitioned into chunks
      ○ Much smaller than shards
      ○ Much larger than blocks
  • Chunks are the basic unit for
      ○ Disk I/O
      ○ Compaction
      ○ Memory caching
      ○ Concurrency control

SLIDE 11

Chunks metadata

  • Linked list of chunks
  • Chunk objects hold metadata: versions, sync. mechanisms, file handles, stats, etc.

[Figure: three chunks in a linked list; regions labeled RAM and disk.]

SLIDE 12

Chunks index

  • Quickly locate the chunk whose range includes the given key

[Figure: an index pointing to three chunks; regions labeled RAM and disk.]
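
The chunk list and index can be sketched as follows. This is a minimal illustration under stated assumptions: the names `Chunk`, `ChunkIndex`, and `locate`, the use of a sorted array with binary search, and the empty string as the lowest possible key are all hypothetical and do not describe EvenDB's actual index structure.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Chunk:
    min_key: str                                # inclusive lower bound of the range
    next: Optional["Chunk"] = None              # next chunk in the linked list
    stats: dict = field(default_factory=dict)   # placeholder for chunk metadata

class ChunkIndex:
    def __init__(self, min_keys):
        # Assumes min_keys is sorted and starts with "" so every key has a chunk.
        self.min_keys = list(min_keys)
        self.chunks = [Chunk(k) for k in self.min_keys]
        for left, right in zip(self.chunks, self.chunks[1:]):
            left.next = right

    def locate(self, key):
        # The owning chunk is the last one whose min_key is <= key.
        i = bisect_right(self.min_keys, key) - 1
        return self.chunks[i]

index = ChunkIndex(["", "g", "p"])
owner = index.locate("kiwi")
```

Once the owning chunk is found, a scan can continue into neighboring ranges by following the `next` links instead of going back to the index.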

SLIDE 13

Disk storage - updates

  • Immediately store in log
  • Occasionally merge log into SST

[Figure: in RAM, the index, chunks, Bloom filters, and row cache; on disk, a funk consisting of an SSTable and a log.]
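
The update path on this slide can be sketched in Python. This is a toy model: the class name `Funk`, the in-memory lists standing in for on-disk files, and the `log_limit` merge trigger are assumptions for illustration, not EvenDB's actual policy.

```python
class Funk:
    """Toy stand-in for one chunk's on-disk data: a sorted SSTable plus a log."""

    def __init__(self, log_limit=3):
        self.sst = []               # sorted list of (key, value) pairs
        self.log = []               # append-only; later entries are newer
        self.log_limit = log_limit  # hypothetical merge trigger

    def put(self, key, value):
        self.log.append((key, value))   # immediately store in log
        if len(self.log) >= self.log_limit:
            self.merge_log()            # occasionally merge log into SST

    def merge_log(self):
        merged = dict(self.sst)
        merged.update(self.log)         # later log entries overwrite older values
        self.sst = sorted(merged.items())
        self.log = []

funk = Funk(log_limit=3)
funk.put("b", 1)
funk.put("a", 2)
funk.put("b", 3)   # third write triggers the merge
```

Appending to the log keeps each write cheap, and the merge rewrites only this chunk's SSTable rather than data from other key ranges.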

SLIDE 14

Disk storage - lookups

  • #1 - search row cache
  • #2 - search log
  • #3 - search SST
  • Scans always search SST and log

[Figure: in RAM, the index, chunks, Bloom filters, and row cache; on disk, a funk consisting of an SSTable and a log.]
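
The lookup order above can be sketched as follows. The function names, the dict used as a row cache, and the choice to fill the cache on a read are assumptions; Bloom filters are left out to keep the sketch short.

```python
from bisect import bisect_left

def get(key, row_cache, log, sst):
    """Look up `key` in the order: row cache, then log, then SST."""
    if key in row_cache:                    # 1 - search row cache
        return row_cache[key]
    for k, v in reversed(log):              # 2 - search log, newest entry first
        if k == key:
            row_cache[key] = v              # assumption: fill cache on read
            return v
    # 3 - binary search in the sorted SST; (key,) sorts before (key, value).
    i = bisect_left(sst, (key,))
    if i < len(sst) and sst[i][0] == key:
        row_cache[key] = sst[i][1]
        return sst[i][1]
    return None

def scan(lo, hi, log, sst):
    """Scans combine SST and log; log entries are newer and win on conflicts."""
    merged = {k: v for k, v in sst if lo <= k <= hi}
    for k, v in log:                        # oldest to newest
        if lo <= k <= hi:
            merged[k] = v
    return sorted(merged.items())

# Hypothetical chunk contents.
sst = [("a", 1), ("c", 3)]
log = [("b", 2), ("c", 30)]
cache = {}
value = get("c", cache, log, sst)
rows = scan("a", "c", log, sst)
```

The row cache only helps single-key gets; a scan cannot rely on it because the cache gives no guarantee of holding every key in the range.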

SLIDE 15

Memory cache - updates

  • #1 - Store in log
  • #2 - Store in munk
  • #3 - Occasionally rebalance munk
  • #4 - Rarely create SST from munk

[Figure: RAM holds the index, chunks, Bloom filters, row cache, and a munk cache; some chunks have a munk in RAM; each chunk has a funk (SSTable and log) on disk.]

SLIDE 16

Memory cache - lookups

  • Search/scan munk

[Figure: RAM holds the index, chunks, Bloom filters, row cache, and a munk cache; some chunks have a munk in RAM; each chunk has a funk (SSTable and log) on disk.]
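
The memory cache paths for updates and lookups can be sketched together. This is a toy model under loud assumptions: the class name `CachedChunk`, the `rebalance_limit` trigger, and treating a rebalance as "sort and drop stale versions" are illustrative guesses, not EvenDB's actual implementation.

```python
class CachedChunk:
    """Toy chunk whose data is cached in memory (a munk) and logged on disk."""

    def __init__(self, rebalance_limit=4):
        self.log = []    # stand-in for the on-disk log
        self.sst = []    # stand-in for the on-disk SSTable
        self.munk = []   # in-memory entries; may hold stale versions
        self.rebalance_limit = rebalance_limit  # hypothetical trigger

    def put(self, key, value):
        self.log.append((key, value))    # 1 - store in log
        self.munk.append((key, value))   # 2 - store in munk
        if len(self.munk) >= self.rebalance_limit:
            self.rebalance()             # 3 - occasionally rebalance munk

    def rebalance(self):
        # Keep the newest version of each key and restore sorted order.
        self.munk = sorted(dict(self.munk).items())

    def create_sst(self):
        # 4 - rarely create SST from munk; the log is then no longer needed.
        self.rebalance()
        self.sst = list(self.munk)
        self.log = []

    def get(self, key):
        for k, v in reversed(self.munk):  # newest entry first
            if k == key:
                return v
        return None

    def scan(self, lo, hi):
        latest = {k: v for k, v in self.munk if lo <= k <= hi}
        return sorted(latest.items())

chunk = CachedChunk(rebalance_limit=3)
chunk.put("a", 1)
chunk.put("b", 2)
chunk.put("a", 3)   # third entry triggers a rebalance
```

In this model reads and scans of a cached chunk never touch the disk structures, while the log still records every write.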

SLIDE 17

Evaluation

  • 3 benchmark suites
      ○ Traces from internal production system, 256GB DB - some presented next
      ○ Standard and extended YCSB benchmarks - results in paper
  • State-of-the-art LSM: RocksDB

SLIDE 18

Real dataset ingestion

  • EvenDB 4.4x faster
  • Write amplification 4x lower (lower is better)

SLIDE 19

Compactions impact

  • EvenDB runs much smoother
  • RocksDB throughput drops during compaction

SLIDE 20

Real dataset scans

  • EvenDB 1.2x faster than RocksDB
  • ~38 minutes stall after DB creation
  • RocksDB faster after storage optimized

SLIDE 21

Summary

  • EvenDB introduces a novel key-value store architecture
  • Chunk arrangement better suited for spatially-local workloads than LSM:
      ○ Lower write amplification
      ○ Single level of storage, no overlapping
      ○ Memory serves reads and writes
  • EvenDB outperforms RocksDB when:
      ○ Workload is spatially-local or most of the working set fits in RAM
      ○ On par otherwise
      ○ Demonstrated in real workload and synthetic YCSB benchmarks

Thank you! Qs?