after index creation
play

after index creation Nikolaus Glombiewski 1 , Bernhard Seeger 1 , - PowerPoint PPT Presentation

Waves of misery after index creation Nikolaus Glombiewski 1 , Bernhard Seeger 1 , Goetz Graefe 2 1 University of Marburg 2 Google Inc. 1 Outline Problem Assessment Basic Solution Ideal Solution Practical Remedies Experimental


  1. Waves of misery after index creation Nikolaus Glombiewski 1 , Bernhard Seeger 1 , Goetz Graefe 2 1 University of Marburg 2 Google Inc. 1

  2. Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 2

  3. Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 3

  4. Indexes: Pros & Cons • Pros • Fast lookups • Fast ordered range scans ➔ Best supported by bulk loading a perfect secondary b-tree • Cons • Maintenance cost • Robustness of performance over time 4

  5. Creation of a Perfect B-tree Sort 1, 5, 8, 15, 17, 19, 41, 50, 90, 100, 120, 142 Building B-Tree Max Nodesize = 3 15, 41, 100 1, 5, 8 15, 17, 19 41, 50, 90 100, 120, 142 5

  6. Subsequent Insertions on a Perfect B-tree 4, 10, 22, 60, 102, 150 Insert Batch 15, 41, 100 Max Nodesize = 3 1, 5, 8 15, 17, 19 41, 50, 90 100, 120, 142 6

  7. Subsequent Insertions on a Perfect B-tree 4, 10, 22, 60, 102, 150 Insert Batch 15, 41, 100 Max Nodesize = 3 1, 4, 5, 8, 10 14, 17, 19, 22 41, 50, 60, 90 100, 102, 120, 142, 150 Node Split Node Split Node Split Node Split => Immediate , widespread node splits after index creation 7

  8. Problem of Subsequent Insertions • Splits of almost all leaves within a short time period 100% • high I/O load • low buffer utilization Split • low query performance due to contention 50% 50% • Status quo database solution: Leave free space (e.g. 30%) • Oracle, SQL Server, DB2, … 8

  9. Creation of a Perfect B-tree with free space Sort 1, 5, 8, 15, 17, 19, 41, 50, 90, 100, 120, 142 Building B-Tree Max Nodesize = 3, Utilization = 70% 8, 17, 41, 90, 120 1, 5 8, 15 17, 19 41, 50 90, 100 120, 142 9

  10. Subsequent insertions 4, 10, 22, 60, 102, 150 Insert Batch 8, 17, 41, 90, 120 Max Nodesize = 3 1, 5 8, 15 17, 19 41, 50 90, 100 120, 142 10

  11. Subsequent insertions 4, 10, 22, 60, 102, 150 Insert Batch 8, 17, 41, 90, 120 Max Nodesize = 3 1, 4, 5 8, 10, 14 17, 19, 22 41, 50, 60 90, 100, 102 120, 142, 150 10

  12. Continuation of insertions 6, 13, 38, 55, 95, 136 Insert Batch 8, 17, 41, 90, 120 Max Nodesize = 3 1, 4, 5 8, 10, 14 17, 19, 22 41, 50, 60 90, 100, 102 120, 142, 150 12

  13. Continuation of insertions 6, 13, 38, 55, 95, 136 Insert Batch 8, 17, 41, 90, 120 Max Nodesize = 3 1, 4, 5, 6 8, 10, 13, 14 17, 19, 22, 38 41, 50, 55, 60 90, 95, 100, 102 120, 136, 142, 150 Node Split Node Split Node Split Node Split Node Split Node Split => Delayed , widespread node splits after index creation 11

  14. Limitations of the status quo • The problem of splits is merely delayed • Moreover, the problem occurs in waves 12

  15. Problem Assessment – When does it occur? • Loading Distribution = Insert Distribution • E.g.: Hash-Keys 13

  16. Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 14

  17. Basic Idea • Do not leave constant free space while loading Data Bulk Loading 70% 50% 90% 60% 95% 15

  18. Basic Idea – Insert Batch Insert Batch 70% 50% 90% 60% 95% Node Split Node Split => Distributing node splits over time 16

  19. Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 17

  20. Ideal solution for predictable splits Moving hills into valleys 18

  21. Ideal solution (Leaf Nodes) 1 Item q B B B/2 B/2 B-5 • q B = Probability of a split after insertion 19

  22. Ideal solution (Leaf Nodes) • Fringe Analysis: 1 • Insert-Operation: 𝑟 𝑜 ∗ 𝐽 + Ԧ 𝑜+1 𝑈 = Ԧ 𝑟 𝑜 + 1 • Goal: Ԧ 𝑟 𝑜 = Ԧ 𝑟 𝑜 + 𝑙 => A stable state 20

  23. Ideal solution (Leaf Nodes) • Goal: Ԧ 𝑟 𝑜 = Ԧ 𝑟 𝑜 + 𝑙 => A stable state • Analyze Transition: • Formula holds for q j = 1/(j+1) 21

  24. Ideal solution (Leaf Nodes) • Intuition: • Few full pages split immediately • Many half full pages eventually • Ideal solution • …for expected B -tree utilization • ...i.e., for Utilization of ln(2) = 69% 22

  25. Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 23

  26. Practical Remedies – Random • While loading: Randomly pick around target utilization 80%+x nextPage() 80% 80%-y 24

  27. Practical Remedies – Suffix Truncation • While loading: Search for shortest key within range 100% "cattle" ... nextPage() "catchweight" "catchphrase" 80% • Added compression effect 25

  28. Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 26

  29. Experimental Evaluation – Setup • Procedure: • Records: 21 integers (84 bytes), normal distribution • Loading b-tree with 100,000 pages of 8KB • Inserting batches of 10,000 records • Workstation: • AMD Ryzen7 2700X • 16GB memory • Java indexing library XXL 27

  30. Experimental Evaluation – Random 28

  31. Experimental Evaluation – Buffer Utilization 29

  32. Outline • Problem Assessment • Basic Solution • Ideal Solution • Practical Remedies • Experimental Evaluation • Conclusion 30

  33. Waves of Misery after index creation • Loading secondary b-tree index in... • Write-intensive workloads • Loading distribution = Insert distribution • Want to achieve predictable split performance: Don't just leave constant free space in your tree nodes! Work towards starting in the steady state of the b-tree. 31

  34. Thank you for your attention! 32

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend