Hybrid Indexes: Huanchen Zhang, David G. Andersen, Andrew Pavlo. PowerPoint PPT Presentation.


slide-1
SLIDE 1

Reducing the Storage Overhead of Main-Memory OLTP Databases with

Hybrid Indexes

Huanchen Zhang

David G. Andersen, Andrew Pavlo, Michael Kaminsky, Lin Ma, Rui Shen

PARALLEL DATA LABORATORY

Carnegie Mellon University


slide-5
SLIDE 5

Part I: Initial Exploration

  • Hybrid Indexes [SIGMOD’16]

5

slide-6
SLIDE 6

You are running out of memory

6


slide-8
SLIDE 8

You are running out of memory

Buy more?

6

slide-9
SLIDE 9

[Figure: TPC-C on H-Store. Throughput (txn/s, 20K–60K) vs. transactions executed (2M–10M); memory use (GB, limit = 5 GB) broken into disk tuples, in-memory tuples, and indexes.]

7


slide-11
SLIDE 11

The better way: Use memory more efficiently

9

slide-12
SLIDE 12

Indexes are LARGE

Benchmark   % space for indexes   with Hybrid Indexes
TPC-C       58%                   34%
Voter       55%                   41%
Articles    34%                   18%

10

slide-13
SLIDE 13

Our Contributions [SIGMOD’16]

The hybrid index architecture
The Dual-Stage Transformation
Applied to 4 index structures:

  • B+tree
  • Masstree
  • Skip List
  • Adaptive Radix Tree (ART)

11

30 – 70% space savings with comparable performance

slide-14
SLIDE 14

[Figure: TPC-C on H-Store. Throughput (txn/s, 20K–60K) vs. transactions executed (2M–10M).]

Did we solve this problem? Stay tuned.

12

slide-15
SLIDE 15

How do hybrid indexes achieve memory savings?

13

slide-16
SLIDE 16

dynamic stage static stage

Hybrid Index: a dual-stage architecture

14

slide-17
SLIDE 17

dynamic stage static stage write merge

Inserts are batched in the dynamic stage

15

slide-18
SLIDE 18

dynamic stage static stage

Reads search the stages in order

16

slide-19
SLIDE 19

dynamic stage static stage read

A Bloom filter improves read performance

17
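Slides 16–19 sketch the dual-stage read and write paths. A minimal Python illustration of that flow (our own stand-in structures, not the paper’s implementation: a dict plays the dynamic stage, a sorted array plays the static stage, and a toy Bloom filter guards the dynamic stage):

```python
import hashlib
from bisect import bisect_left

class BloomFilter:
    """Tiny Bloom filter; a real hybrid index would size m and k from the
    expected key count and a target false-positive rate."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _hashes(self, key):
        for i in range(self.k):
            h = hashlib.blake2b(key.encode(), salt=bytes([i])).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for h in self._hashes(key):
            self.bits |= 1 << h

    def may_contain(self, key):
        return all(self.bits >> h & 1 for h in self._hashes(key))

class HybridIndex:
    """Dual-stage sketch: writes are batched in the dynamic stage;
    reads search the stages in order."""
    def __init__(self):
        self.dynamic = {}            # write-optimized stage (stand-in for a B+tree)
        self.static_keys = []        # compact sorted array = static stage
        self.static_vals = []
        self.bloom = BloomFilter()   # lets reads skip the dynamic stage

    def insert(self, key, val):
        self.dynamic[key] = val      # all inserts go to the dynamic stage
        self.bloom.add(key)

    def get(self, key):
        # Dynamic stage first (skipped when the Bloom filter says "definitely
        # not there"), then binary search in the static stage.
        if self.bloom.may_contain(key) and key in self.dynamic:
            return self.dynamic[key]
        i = bisect_left(self.static_keys, key)
        if i < len(self.static_keys) and self.static_keys[i] == key:
            return self.static_vals[i]
        return None

    def merge(self):
        # Part I's blocking merge: fold dynamic entries into the sorted arrays.
        merged = {**dict(zip(self.static_keys, self.static_vals)), **self.dynamic}
        items = sorted(merged.items())
        self.static_keys = [k for k, _ in items]
        self.static_vals = [v for _, v in items]
        self.dynamic.clear()
        self.bloom = BloomFilter()
```

After a merge the Bloom filter is rebuilt empty, so reads for merged keys go straight to the static stage.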

slide-20
SLIDE 20

dynamic stage static stage read write merge Memory-efficient Skew-aware

18


slide-21
SLIDE 21

dynamic stage static stage merge

The Dual-Stage Transformation

19


slide-23
SLIDE 23

The Dynamic-to-Static Rules: Compaction, Reduction, Compression

20


slide-25
SLIDE 25

[Figure: a half-full B+tree (keys 1–12, values a–n) repacked into fully-packed blocks.]

Compaction: minimize # of memory blocks

21
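The compaction rule can be illustrated with a hypothetical fixed-capacity block size (the helper name and capacity are ours):

```python
def compact_leaves(leaves, capacity=3):
    """Compaction: repack entries from half-full leaves into as few
    fully-packed blocks as possible (hypothetical block size `capacity`)."""
    entries = [kv for leaf in leaves for kv in leaf]   # leaves are in key order
    return [entries[i:i + capacity] for i in range(0, len(entries), capacity)]

# Half-full leaves in the style of the slide's diagram:
leaves = [[(1, 'a'), (2, 'b')], [(3, 'c'), (4, 'd')],
          [(5, 'e'), (6, 'f')], [(7, 'g'), (8, 'h')]]
blocks = compact_leaves(leaves)
# 8 entries now occupy 3 fully-packed blocks instead of 4 half-full leaves.
```

Since the static stage never accepts in-place inserts, it needs no slack space, which is what makes this dense packing safe.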

slide-28
SLIDE 28

[Figure: the compacted tree with pointer-heavy internal structure trimmed away.]

Reduction: minimize structural overhead

22

slide-31
SLIDE 31

dynamic stage static stage merge

The merge routine is a blocking process

23


slide-33
SLIDE 33

[Figure: TPC-C on H-Store. Throughput (txn/s, 20K–60K) vs. transactions executed (2M–10M); B+tree baseline.]

Did we solve this problem?

24

slide-34
SLIDE 34

[Figure: TPC-C on H-Store. Throughput (txn/s, 20K–60K) vs. transactions executed (2M–10M); B+tree vs. Hybrid.]

Yes, we improved the DBMS’s capacity!

24

slide-35
SLIDE 35

[Figure: TPC-C on H-Store, B+tree vs. Hybrid. Throughput (txn/s, 20K–60K) vs. transactions executed (2M–10M); memory use (GB, 4–8) broken into disk tuples, in-memory tuples, and indexes.]

25

Take Away: memory saved by indexes keeps a larger working set in memory, which yields higher throughput.

slide-41
SLIDE 41

Part I Recap

The hybrid index architecture
The Dual-Stage Transformation
Applied to 4 index structures:

  • B+tree
  • Masstree
  • Skip List
  • Adaptive Radix Tree (ART)

GENERAL PRACTICAL USEFUL

26

slide-42
SLIDE 42

Part II: Concurrent hybrid indexes with non-blocking merge

27

slide-43
SLIDE 43

dynamic stage static stage write merge

Building a Concurrent Hybrid Index?

28


slide-45
SLIDE 45

29

Use concurrent data structures for the dynamic stage

dynamic stage static stage write merge

slide-46
SLIDE 46

30

The static stage is read-only, so it is perfectly concurrent by default

dynamic stage static stage write merge

slide-47
SLIDE 47

31

Challenge: efficient non-blocking merge algorithm

dynamic stage static stage write merge

slide-48
SLIDE 48

Merge Algorithm Requirements

Efficient:

  • Fast
  • Bounded temporary memory use

Non-blocking:

  • All existing items are accessible during merge
  • New items can still enter

32

slide-49
SLIDE 49

Naïve Solution 1: Coarse-grained Locking

dynamic stage static stage write merge

33


slide-51
SLIDE 51

The intermediate stage unblocks write traffic: the dynamic stage is frozen into an intermediate stage, and new writes go to a fresh dynamic stage

static stage merge dynamic stage write

34


slide-54
SLIDE 54

static stage merge Intermediate stage

35

How do we unblock reads during merge?

slide-55
SLIDE 55

Naïve Solution 2: Full Copy-on-write

static stage merge Intermediate stage

36

slide-56
SLIDE 56

Key Observation

Merged-in items in the static stage will NOT be accessed until the intermediate stage is deleted.

Merge incrementally!

37
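A toy model of this observation (names and structures are ours, not the actual algorithm): the merge can move one item at a time into the static stage, because readers consult the frozen intermediate stage first and therefore never see a half-merged key.

```python
class IncrementalMerge:
    """Sketch: while the frozen intermediate stage is still installed, readers
    never look up newly merged-in items in the static stage, so the merge can
    proceed in small non-blocking steps instead of one long pass."""
    def __init__(self, intermediate, static):
        self.intermediate = dict(intermediate)   # frozen; still serves reads
        self.static = dict(static)               # still serves reads
        self._steps = iter(sorted(intermediate.items()))
        self.done = False

    def do_step(self):
        # Move one item into the static stage; readers are unaffected because
        # the intermediate stage still answers for that key.
        try:
            k, v = next(self._steps)
            self.static[k] = v
        except StopIteration:
            self.intermediate = {}               # retire the intermediate stage last
            self.done = True

    def get(self, key):
        # Reads search the stages in order: intermediate first, then static.
        return self.intermediate.get(key, self.static.get(key))
```

Retiring the intermediate stage only after the final step is what makes every intermediate state safe for concurrent readers.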

slide-57
SLIDE 57

Our Solution: Incremental Copy-on-write with Rapid GC

[Diagram: copy-on-write of a node; the old node becomes garbage once the new parent points at its copy.]

38

slide-58
SLIDE 58
Our Solution: Incremental Copy-on-write with Rapid GC

When can we safely reclaim the garbage?

38

slide-60
SLIDE 60
Our Solution: Incremental Copy-on-write with Rapid GC

When no thread still holds a reference to it!

Thread-local counters: C1, C2, C3, …, Cn

38

slide-62
SLIDE 62
Our Solution: Incremental Copy-on-write with Rapid GC

Each thread keeps a local counter Ci and advances it on every operation: ++Ci = MAX(Ci, Cmax) + 1. Retired nodes are tagged with the current Cmax. GC condition: reclaim when Cmin > garbage tag.

38
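A sketch of that counter scheme as we read the slide (class and method names are ours; calls are shown single-threaded for clarity, whereas a real implementation would use atomic counters):

```python
class RapidGC:
    """Epoch-like reclamation: each thread bumps its local counter past the
    global maximum on every operation; garbage is tagged with the current
    Cmax and reclaimed once the minimum counter exceeds the tag."""
    def __init__(self, nthreads):
        self.c = [0] * nthreads          # thread-local counters C1..Cn
        self.garbage = []                # (tag, object) pairs

    def enter(self, tid):
        # ++Ci = MAX(Ci, Cmax) + 1
        self.c[tid] = max(self.c[tid], max(self.c)) + 1

    def retire(self, obj):
        self.garbage.append((max(self.c), obj))   # tag with current Cmax

    def collect(self):
        # GC condition: Cmin > garbage tag means every thread has moved on
        # since the object was unlinked, so no one can still reference it.
        cmin = min(self.c)
        freed = [o for (t, o) in self.garbage if t < cmin]
        self.garbage = [(t, o) for (t, o) in self.garbage if t >= cmin]
        return freed
```

The point of bumping past Cmax (rather than just incrementing) is that any later operation on any thread is guaranteed to carry a counter larger than the tag of previously retired garbage.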

slide-63
SLIDE 63

A Quick Recap of the Merge Algorithm

  • The intermediate stage separates writes from the merge process
  • The incremental merge algorithm with rapid GC is non-blocking and space-efficient

39

slide-64
SLIDE 64

What we are building now: a Compact Radix Tree with Non-blocking Merge

40


slide-67
SLIDE 67

What we are building now: a Compact Radix Tree (static stage) with Non-blocking Merge, paired with a Bw-Tree, Skip List, or Masstree dynamic stage

40

slide-70
SLIDE 70

Part III: Super-compact static stage

41

slide-71
SLIDE 71

Go “crazy” on space-efficiency

42

Succinct Data Structures

  • Z + o(Z), where Z is the information-theoretic lower bound
  • Still allow for efficient query operations

rank1(x) = # of 1’s up to position x
select1(x) = position of the x-th 1

Example bit vector: 100011010000101…
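A naive (non-succinct) reference implementation of these two operations; a succinct bit vector answers both in O(1) by storing o(Z) bits of sampled counts alongside the raw bits:

```python
class BitVector:
    """Naive rank/select over a bitstring, for checking semantics only."""
    def __init__(self, bits):
        self.bits = bits                 # e.g. "100011010000101"

    def rank1(self, x):
        """Number of 1's in positions [0, x] (inclusive)."""
        return self.bits[:x + 1].count("1")

    def select1(self, x):
        """Position of the x-th 1 (1-based)."""
        seen = 0
        for i, b in enumerate(self.bits):
            seen += b == "1"
            if seen == x:
                return i
        raise ValueError("fewer than x ones in the bit vector")

bv = BitVector("100011010000101")
# Ones sit at positions 0, 4, 5, 7, 12, 14.
```

Note that rank and select are inverses in the sense that rank1(select1(x)) == x.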

slide-72
SLIDE 72

Encoding Radix Tree

43

[Figure: a radix tree over example strings (with $ terminators) and its bit-sequence encoding of node labels and structure.]
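One standard way to encode a tree with bitmaps is the LOUDS breadth-first layout; the sketch below is illustrative of the general idea, not necessarily the exact encoding on the slide, and the trie example is ours:

```python
from collections import deque

def louds_encode(tree):
    """LOUDS-style encoding: visit nodes breadth-first, emitting one '1' per
    child followed by a '0' terminator; child labels are stored in the same
    breadth-first order. Navigation then reduces to rank/select on the bits."""
    bits, labels = [], []
    q = deque([tree])
    while q:
        node = q.popleft()               # node = {label: child_subtree}
        for label, child in node.items():
            bits.append("1")
            labels.append(label)
            q.append(child)
        bits.append("0")
    return "".join(bits), "".join(labels)

# Hypothetical trie for the keys "an" and "at":
trie = {"a": {"n": {}, "t": {}}}
bits, labels = louds_encode(trie)
```

Each node costs roughly one bit per child plus one terminator bit, which is how the pointer overhead of an ordinary radix tree disappears.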

slide-73
SLIDE 73

44

Memory Savings with the New Encoding

[Figure: memory (MB, 200–1000) for 50M email keys with average length = 20 bytes: our encoding uses 84% less memory than ART.]

slide-74
SLIDE 74

The Takeaway Message

45

Hybrid indexes can save precious memory with minimal performance penalty.

slide-75
SLIDE 75

Toll-Free Hotline: 1-844-88-CMUDB

slide-76
SLIDE 76

Back-up Slides

slide-77
SLIDE 77

Latency (ms)

          50%   99%   MAX
Hybrid    10    50    115
B+tree    10    52    611

slide-78
SLIDE 78

YCSB-based Microbenchmark Evaluation

Workload: insert, then read/update (50/50)
Key: email; Value: 64-bit unsigned integer (pointer)
Single thread
50M entries, 10M queries (Zipf distributed)

slide-79
SLIDE 79

[Figure: memory (GB, 4–8) for B+tree, Masstree, Skip List, and ART, original vs. hybrid.]

Hybrid index saves 30 – 70% memory

slide-80
SLIDE 80

[Figure: throughput (txn/s, 2M–16M), original vs. hybrid, for insert-only and read/update (50/50) workloads.]

Hybrid index provides comparable throughput