CSE 662 - Database Languages & Runtimes
Database Cracking
Languages and Runtimes for Big Data
1
Database Cracking Languages and Runtimes for Big Data CSE 662 - - - PowerPoint PPT Presentation
Database Cracking Languages and Runtimes for Big Data CSE 662 - Database Languages & Runtimes 1 Row Stores A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 D4 Traditional DB: Lay out data on disk in rows CSE 662 - Database
CSE 662 - Database Languages & Runtimes
Languages and Runtimes for Big Data
1
CSE 662 - Database Languages & Runtimes
2
A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 D4
Traditional DB: Lay out data on disk in rows
CSE 662 - Database Languages & Runtimes
3
A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 D4
Columnar DB: Lay out data on disk in columns
CSE 662 - Database Languages & Runtimes
Row1 Row2 Row3 Row4 Row1 Row2 Row3 Row4 Row1 Row2 Row3 Row4 Row1 Row2 Row3 Row4
4
B1 C1 D1 B2 C2 D2 B3 C3 D3 A1 A2 A3 A4 B4 C4 D4
Store with Row ID to recover original table
CSE 662 - Database Languages & Runtimes
5
CSE 662 - Database Languages & Runtimes
6
Problem: Data is initially unsorted Query: Find all rows where 100 < A ≤ 200 What is the fastest way to answer this query?
CSE 662 - Database Languages & Runtimes
7
Problem: Data is initially unsorted What if you get 2 queries? … 3 queries? … 100 queries?
CSE 662 - Database Languages & Runtimes
8
Problem: Data is initially unsorted Strategy 1: Index the data then run queries Strategy 2: Linear scans over the data
First few queries are much slower (upfront indexing cost) Last few queries are much slower (no indexing!)
CSE 662 - Database Languages & Runtimes
9
Problem: Data is initially unsorted Strategy 3: Index while you run queries!
Re-use compute effort of scans.
CSE 662 - Database Languages & Runtimes
10
9 3 5 4 8 7 1 2 6
Query 1: Find 4 < X ≤ 7
CSE 662 - Database Languages & Runtimes
11
9 3 5 4 8 7 1 2 6
Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4
CSE 662 - Database Languages & Runtimes
12
9 3 5 4 8 7 1 2 6
Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4
CSE 662 - Database Languages & Runtimes
13
9 3 5 4 8 7 1 2 6
Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4
CSE 662 - Database Languages & Runtimes
14
9 3 5 4 8 7 1 2 6
Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4
CSE 662 - Database Languages & Runtimes
15
9 3 5 4 8 7 1 2 6
Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4
CSE 662 - Database Languages & Runtimes
16
3 9 5 4 8 7 1 2 6
Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4
CSE 662 - Database Languages & Runtimes
17
3 9 5 4 8 7 1 2 6
Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4
CSE 662 - Database Languages & Runtimes
18
3 4 5 9 8 7 1 2 6
Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4
CSE 662 - Database Languages & Runtimes
19
3 4 5 9 8 7 1 2 6
Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4
CSE 662 - Database Languages & Runtimes
20
3 4 1 9 8 7 5 2 6
Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4
CSE 662 - Database Languages & Runtimes
21
3 4 1 2 8 7 5 9 6
Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4
CSE 662 - Database Languages & Runtimes
22
3 4 1 2 8 7 5 9 6
Query 1: Find 4 < X ≤ 7 Low Current Step 2: Split into 2 bins: > 7 and ≤ 7 High
CSE 662 - Database Languages & Runtimes
23
3 4 1 2 8 7 5 9 6
Query 1: Find 4 < X ≤ 7 Low Current Step 2: Split into 2 bins: > 7 and ≤ 7 High
CSE 662 - Database Languages & Runtimes
24
3 4 1 2 7 8 5 9 6
Query 1: Find 4 < X ≤ 7 Low Current Step 2: Split into 2 bins: > 7 and ≤ 7 High
CSE 662 - Database Languages & Runtimes
25
3 4 1 2 7 5 6 9 8
Query 1: Find 4 < X ≤ 7 Low Current Step 2: Split into 2 bins: > 7 and ≤ 7 High
CSE 662 - Database Languages & Runtimes
26
3 4 1 2 7 5 6 9 8
Query 1: Find 4 < X ≤ 7 Low Step 2: Split into 2 bins: > 7 and ≤ 7 High Result
CSE 662 - Database Languages & Runtimes
27
3 4 1 2 7 5 6 9 8
Query 1: Find 4 < X ≤ 7 ≤4 ≤7 Binary Tree
CSE 662 - Database Languages & Runtimes
28
3 4 1 2 7 5 6 9 8
Query 2: Find 2 < X ≤ 5 ≤4 ≤7
CSE 662 - Database Languages & Runtimes
29
1 2 3 4 5 7 6 9 8
Query 2: Find 2 < X ≤ 5 ≤4 ≤7 ≤2 ≤5 Result
CSE 662 - Database Languages & Runtimes
30
9 3 5 4 8 7 1 2 6
Query 1: Find 4 < X ≤ 7 Low Current High
CSE 662 - Database Languages & Runtimes
31
9 3 5 4 8 7 1 2 6
Query 1: Find 4 < X ≤ 7 Low Current High
CSE 662 - Database Languages & Runtimes
32
9 3 5 4 8 7 1 2 6
Query 1: Find 4 < X ≤ 7 Low Current High
CSE 662 - Database Languages & Runtimes
33
6 3 5 4 8 7 1 2 9
Query 1: Find 4 < X ≤ 7 Low Current High
CSE 662 - Database Languages & Runtimes
34
3 6 5 4 8 7 1 2 9
Query 1: Find 4 < X ≤ 7 Low Current High
CSE 662 - Database Languages & Runtimes
35
3 4 2 1 5 6 7 8 9
Query 1: Find 4 < X ≤ 7 Low Current High
CSE 662 - Database Languages & Runtimes
36
Does cracking work with a row-oriented database?
CSE 662 - Database Languages & Runtimes
37
How would one crack a multi-attribute index? (e.g., a spatial index?)
CSE 662 - Database Languages & Runtimes
38
Can updates be performed efficiently on a cracker index?
CSE 662 - Database Languages & Runtimes
39
Can updates be performed efficiently on a cracker index? What constraints are required?
CSE 662 - Database Languages & Runtimes
40
What applications would cracking work well on? What applications would cracking work poorly on?
CSE 662 - Database Languages & Runtimes
41
Upfront Indexing vs Sequential Scan vs Cracking… Where is the cutoff?