after index creation Nikolaus Glombiewski 1 , Bernhard Seeger 1 , - - PowerPoint PPT Presentation

after index creation
SMART_READER_LITE
LIVE PREVIEW

after index creation Nikolaus Glombiewski 1 , Bernhard Seeger 1 , - - PowerPoint PPT Presentation

Waves of misery after index creation Nikolaus Glombiewski 1 , Bernhard Seeger 1 , Goetz Graefe 2 1 University of Marburg 2 Google Inc. 1 Outline Problem Assessment Basic Solution Ideal Solution Practical Remedies Experimental


slide-1
SLIDE 1

Waves of misery after index creation

Nikolaus Glombiewski1, Bernhard Seeger1, Goetz Graefe2

1University of Marburg 2Google Inc.

1

slide-2
SLIDE 2

Outline

  • Problem Assessment
  • Basic Solution
  • Ideal Solution
  • Practical Remedies
  • Experimental Evaluation
  • Conclusion

2

slide-3
SLIDE 3

Outline

  • Problem Assessment
  • Basic Solution
  • Ideal Solution
  • Practical Remedies
  • Experimental Evaluation
  • Conclusion

3

slide-4
SLIDE 4

Indexes: Pros & Cons

  • Pros
  • Fast lookups
  • Fast ordered range scans

➔ Best supported by bulk loading a perfect secondary b-tree

  • Cons
  • Maintenance cost
  • Robustness of performance over time

4

slide-5
SLIDE 5

Creation of a Perfect B-tree

1, 5, 8, 15, 17, 19, 41, 50, 90, 100, 120, 142 1, 5, 8 Building B-Tree Max Nodesize = 3 15, 17, 19 100, 120, 142 41, 50, 90 15, 41, 100 Sort

5

slide-6
SLIDE 6

Subsequent Insertions on a Perfect B-tree

1, 5, 8 15, 17, 19 100, 120, 142 41, 50, 90 15, 41, 100 Max Nodesize = 3 Insert Batch 4, 10, 22, 60, 102, 150

6

slide-7
SLIDE 7

Subsequent Insertions on a Perfect B-tree

1, 4, 5, 8, 10 14, 17, 19, 22 100, 102, 120, 142, 150 41, 50, 60, 90 15, 41, 100 Node Split Node Split Node Split Node Split Max Nodesize = 3

=> Immediate, widespread node splits after index creation

Insert Batch 4, 10, 22, 60, 102, 150

7

slide-8
SLIDE 8

Problem of Subsequent Insertions

  • Splits of almost all leaves within a short time period
  • high I/O load
  • low buffer utilization
  • low query performance due to contention
  • Status quo database solution: Leave free space (e.g. 30%)
  • Oracle, SQL Server, DB2, …

100% 50% 50% Split

8

slide-9
SLIDE 9

Creation of a Perfect B-tree with free space

1, 5, 8, 15, 17, 19, 41, 50, 90, 100, 120, 142 1, 5 Building B-Tree Max Nodesize = 3, Utilization = 70% 17, 19 120, 142 90, 100 8, 17, 41, 90, 120 Sort 8, 15 41, 50

9

slide-10
SLIDE 10

Subsequent insertions

1, 5 17, 19 120, 142 90, 100 8, 17, 41, 90, 120 8, 15 41, 50 4, 10, 22, 60, 102, 150 Max Nodesize = 3 Insert Batch

10

slide-11
SLIDE 11

Subsequent insertions

1, 4, 5 17, 19, 22 120, 142, 150 90, 100, 102 8, 17, 41, 90, 120 8, 10, 14 41, 50, 60 Max Nodesize = 3 Insert Batch 4, 10, 22, 60, 102, 150

10

slide-12
SLIDE 12

Continuation of insertions

1, 4, 5 17, 19, 22 120, 142, 150 90, 100, 102 8, 17, 41, 90, 120 8, 10, 14 41, 50, 60 Max Nodesize = 3 Insert Batch 6, 13, 38, 55, 95, 136

12

slide-13
SLIDE 13

Continuation of insertions

1, 4, 5, 6 17, 19, 22, 38 120, 136, 142, 150 90, 95, 100, 102 8, 17, 41, 90, 120 8, 10, 13, 14 41, 50, 55, 60 Max Nodesize = 3 Insert Batch 6, 13, 38, 55, 95, 136 Node Split

=> Delayed, widespread node splits after index creation

Node Split Node Split Node Split Node Split Node Split

11

slide-14
SLIDE 14

Limitations of the status quo

  • The problem of splits is merely delayed
  • Moreover, the problem occurs in waves

12

slide-15
SLIDE 15

Problem Assessment – When does it occur?

  • Loading Distribution = Insert Distribution
  • E.g.: Hash-Keys

13

slide-16
SLIDE 16

Outline

  • Problem Assessment
  • Basic Solution
  • Ideal Solution
  • Practical Remedies
  • Experimental Evaluation
  • Conclusion

14

slide-17
SLIDE 17

Basic Idea

  • Do not leave constant free space while loading

90% 60% 95% Data 50% 70% Bulk Loading

15

slide-18
SLIDE 18

Basic Idea – Insert Batch

90% 60% 95% 50% 70% Insert Batch Node Split Node Split

16

=> Distributing node splits over time

slide-19
SLIDE 19

Outline

  • Problem Assessment
  • Basic Solution
  • Ideal Solution
  • Practical Remedies
  • Experimental Evaluation
  • Conclusion

17

slide-20
SLIDE 20

Ideal solution for predictable splits

Moving hills into valleys

18

slide-21
SLIDE 21

Ideal solution (Leaf Nodes)

B/2 B B/2 1 Item B-5

  • qB = Probability of a split after insertion

qB

19

slide-22
SLIDE 22

Ideal solution (Leaf Nodes)

  • Fringe Analysis:
  • Insert-Operation:

Ԧ 𝑟 𝑜 ∗ 𝐽 +

1 𝑜+1 𝑈

= Ԧ 𝑟 𝑜 + 1

  • Goal: Ԧ

𝑟 𝑜 = Ԧ 𝑟 𝑜 + 𝑙 => A stable state

20

slide-23
SLIDE 23

Ideal solution (Leaf Nodes)

  • Goal: Ԧ

𝑟 𝑜 = Ԧ 𝑟 𝑜 + 𝑙 => A stable state

  • Analyze Transition:
  • Formula holds for qj = 1/(j+1)

21

slide-24
SLIDE 24

Ideal solution (Leaf Nodes)

  • Intuition:
  • Few full pages split immediately
  • Many half full pages eventually
  • Ideal solution
  • …for expected B-tree utilization
  • ...i.e., for Utilization of ln(2) = 69%

22

slide-25
SLIDE 25

Outline

  • Problem Assessment
  • Basic Solution
  • Ideal Solution
  • Practical Remedies
  • Experimental Evaluation
  • Conclusion

23

slide-26
SLIDE 26

Practical Remedies – Random

  • While loading: Randomly pick around target utilization

80% 80%+x 80%-y nextPage()

24

slide-27
SLIDE 27

Practical Remedies – Suffix Truncation

  • While loading: Search for shortest key within range
  • Added compression effect

"catchphrase" 100% 80% nextPage() "catchweight" "cattle" ...

25

slide-28
SLIDE 28

Outline

  • Problem Assessment
  • Basic Solution
  • Ideal Solution
  • Practical Remedies
  • Experimental Evaluation
  • Conclusion

26

slide-29
SLIDE 29

Experimental Evaluation – Setup

  • Procedure:
  • Records: 21 integers (84 bytes), normal distribution
  • Loading b-tree with 100,000 pages of 8KB
  • Inserting batches of 10,000 records
  • Workstation:
  • AMD Ryzen7 2700X
  • 16GB memory
  • Java indexing library XXL

27

slide-30
SLIDE 30

Experimental Evaluation – Random

28

slide-31
SLIDE 31

Experimental Evaluation – Buffer Utilization

29

slide-32
SLIDE 32

Outline

  • Problem Assessment
  • Basic Solution
  • Ideal Solution
  • Practical Remedies
  • Experimental Evaluation
  • Conclusion

30

slide-33
SLIDE 33

Waves of Misery after index creation

  • Loading secondary b-tree index in...
  • Write-intensive workloads
  • Loading distribution = Insert distribution
  • Want to achieve predictable split performance:

Don't just leave constant free space in your tree nodes! Work towards starting in the steady state of the b-tree.

31

slide-34
SLIDE 34

Thank you for your attention!

32