database cracking
play

Database Cracking Languages and Runtimes for Big Data CSE 662 - - PowerPoint PPT Presentation

Database Cracking Languages and Runtimes for Big Data CSE 662 - Database Languages & Runtimes 1 Row Stores A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 D4 Traditional DB: Lay out data on disk in rows CSE 662 - Database


  1. Database Cracking Languages and Runtimes for Big Data CSE 662 - Database Languages & Runtimes 1

  2. Row Stores A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 D4 Traditional DB: Lay out data on disk in rows CSE 662 - Database Languages & Runtimes 2

  3. Column Stores A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 B4 C4 D4 A4 Columnar DB: Lay out data on disk in columns CSE 662 - Database Languages & Runtimes 3

  4. Column Stores A1 Row1 B1 Row1 C1 Row1 D1 Row1 A2 B2 C2 D2 Row2 Row2 Row2 Row2 A3 B3 C3 Row3 Row3 Row3 D3 Row3 B4 C4 D4 A4 Row4 Row4 Row4 Row4 Store with Row ID to recover original table CSE 662 - Database Languages & Runtimes 4

  5. Why use a Column Store? CSE 662 - Database Languages & Runtimes 5

  6. Immediate Data Access Problem: Data is initially unsorted Query: Find all rows where 100 < A ≤ 200 What is the fastest way to answer this query? CSE 662 - Database Languages & Runtimes 6

  7. Immediate Data Access Problem: Data is initially unsorted What if you get 2 queries? … 3 queries? … 100 queries? CSE 662 - Database Languages & Runtimes 7

  8. Immediate Data Access Problem: Data is initially unsorted Strategy 1: Index the data then run queries First few queries are much slower (upfront indexing cost) Strategy 2: Linear scans over the data Last few queries are much slower (no indexing!) CSE 662 - Database Languages & Runtimes 8

  9. Immediate Data Access Problem: Data is initially unsorted Strategy 3: Index while you run queries! Re-use compute effort of scans. CSE 662 - Database Languages & Runtimes 9

  10. Cracking 0 9 3 5 4 8 7 1 2 6 Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 10

  11. Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 9 3 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 11

  12. Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 9 3 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 12

  13. Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 9 3 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 13

  14. Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 9 3 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 14

  15. Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 9 3 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 15

  16. Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 3 9 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 16

  17. Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 3 9 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 17

  18. Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 3 4 5 9 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 18

  19. Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 3 4 5 9 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 19

  20. Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 3 4 1 9 8 7 5 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 20

  21. Cracking Step 1: Split into 2 bins: > 4 and ≤ 4 Low 0 3 4 1 2 8 7 5 9 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 21

  22. Cracking Step 2: Split into 2 bins: > 7 and ≤ 7 High Low 0 3 4 1 2 8 7 5 9 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 22

  23. Cracking Step 2: Split into 2 bins: > 7 and ≤ 7 High Low 0 3 4 1 2 8 7 5 9 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 23

  24. Cracking Step 2: Split into 2 bins: > 7 and ≤ 7 Low High 0 3 4 1 2 7 8 5 9 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 24

  25. Cracking Step 2: Split into 2 bins: > 7 and ≤ 7 Low High 0 3 4 1 2 7 5 6 9 8 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 25

  26. Cracking Step 2: Split into 2 bins: > 7 and ≤ 7 Low High 0 3 4 1 2 7 5 6 9 8 Result Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 26

  27. Cracking Binary ≤ 4 ≤ 7 Tree 0 3 4 1 2 7 5 6 9 8 Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 27

  28. Cracking ≤ 4 ≤ 7 0 3 4 1 2 7 5 6 9 8 Query 2: Find 2 < X ≤ 5 CSE 662 - Database Languages & Runtimes 28

  29. Cracking ≤ 2 ≤ 4 ≤ 5 ≤ 7 0 1 2 3 4 5 7 6 9 8 Result Query 2: Find 2 < X ≤ 5 CSE 662 - Database Languages & Runtimes 29

  30. 3-Way Cracking Low High 0 9 3 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 30

  31. 3-Way Cracking Low High 0 9 3 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 31

  32. 3-Way Cracking Low High 0 9 3 5 4 8 7 1 2 6 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 32

  33. 3-Way Cracking Low High 0 6 3 5 4 8 7 1 2 9 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 33

  34. 3-Way Cracking Low High 0 3 6 5 4 8 7 1 2 9 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 34

  35. 3-Way Cracking Low High 0 3 4 2 1 5 6 7 8 9 Current Query 1: Find 4 < X ≤ 7 CSE 662 - Database Languages & Runtimes 35

  36. Discussion Questions… Does cracking work with a row-oriented database? CSE 662 - Database Languages & Runtimes 36

  37. 
 Discussion Questions… How would one crack a multi-attribute index? (e.g., a spatial index?) CSE 662 - Database Languages & Runtimes 37

  38. Discussion Questions… Can updates be performed efficiently on a cracker index? CSE 662 - Database Languages & Runtimes 38

  39. Discussion Questions… Can updates be performed efficiently on a cracker index? What constraints are required? CSE 662 - Database Languages & Runtimes 39

  40. Discussion Questions… What applications would cracking work well on? What applications would cracking work poorly on? CSE 662 - Database Languages & Runtimes 40

  41. Discussion Questions… Upfront Indexing vs Sequential Scan vs Cracking… Where is the cutoff? CSE 662 - Database Languages & Runtimes 41

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend