Database Cracking Languages and Runtimes for Big Data CSE 662 - - - PowerPoint PPT Presentation

database cracking
SMART_READER_LITE
LIVE PREVIEW

Database Cracking Languages and Runtimes for Big Data CSE 662 - - - PowerPoint PPT Presentation

Database Cracking Languages and Runtimes for Big Data CSE 662 - Database Languages & Runtimes 1 Row Stores A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 D4 Traditional DB: Lay out data on disk in rows CSE 662 - Database


slide-1
SLIDE 1

CSE 662 - Database Languages & Runtimes

Database Cracking

Languages and Runtimes for Big Data

1

slide-2
SLIDE 2

CSE 662 - Database Languages & Runtimes

Row Stores

2

A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 D4

Traditional DB: Lay out data on disk in rows

slide-3
SLIDE 3

CSE 662 - Database Languages & Runtimes

Column Stores

3

A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 D4

Columnar DB: Lay out data on disk in columns

slide-4
SLIDE 4

CSE 662 - Database Languages & Runtimes

Row1 Row2 Row3 Row4 Row1 Row2 Row3 Row4 Row1 Row2 Row3 Row4 Row1 Row2 Row3 Row4

Column Stores

4

B1 C1 D1 B2 C2 D2 B3 C3 D3 A1 A2 A3 A4 B4 C4 D4

Store with Row ID to recover original table

slide-5
SLIDE 5

CSE 662 - Database Languages & Runtimes

Why use a Column Store?

5

slide-6
SLIDE 6

CSE 662 - Database Languages & Runtimes

Immediate Data Access

6

Problem: Data is initially unsorted Query: Find all rows where 100 < A ≤ 200 What is the fastest way to answer this query?

slide-7
SLIDE 7

CSE 662 - Database Languages & Runtimes

Immediate Data Access

7

Problem: Data is initially unsorted What if you get 2 queries? … 3 queries? … 100 queries?

slide-8
SLIDE 8

CSE 662 - Database Languages & Runtimes

Immediate Data Access

8

Problem: Data is initially unsorted Strategy 1: Index the data then run queries Strategy 2: Linear scans over the data

First few queries are much slower (upfront indexing cost) Last few queries are much slower (no indexing!)

slide-9
SLIDE 9

CSE 662 - Database Languages & Runtimes

Immediate Data Access

9

Problem: Data is initially unsorted Strategy 3: Index while you run queries!

Re-use compute effort of scans.

slide-10
SLIDE 10

CSE 662 - Database Languages & Runtimes

Cracking

10

9 3 5 4 8 7 1 2 6

Query 1: Find 4 < X ≤ 7

slide-11
SLIDE 11

CSE 662 - Database Languages & Runtimes

Cracking

11

9 3 5 4 8 7 1 2 6

Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4

slide-12
SLIDE 12

CSE 662 - Database Languages & Runtimes

Cracking

12

9 3 5 4 8 7 1 2 6

Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4

slide-13
SLIDE 13

CSE 662 - Database Languages & Runtimes

Cracking

13

9 3 5 4 8 7 1 2 6

Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4

slide-14
SLIDE 14

CSE 662 - Database Languages & Runtimes

Cracking

14

9 3 5 4 8 7 1 2 6

Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4

slide-15
SLIDE 15

CSE 662 - Database Languages & Runtimes

Cracking

15

9 3 5 4 8 7 1 2 6

Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4

slide-16
SLIDE 16

CSE 662 - Database Languages & Runtimes

Cracking

16

3 9 5 4 8 7 1 2 6

Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4

slide-17
SLIDE 17

CSE 662 - Database Languages & Runtimes

Cracking

17

3 9 5 4 8 7 1 2 6

Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4

slide-18
SLIDE 18

CSE 662 - Database Languages & Runtimes

Cracking

18

3 4 5 9 8 7 1 2 6

Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4

slide-19
SLIDE 19

CSE 662 - Database Languages & Runtimes

Cracking

19

3 4 5 9 8 7 1 2 6

Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4

slide-20
SLIDE 20

CSE 662 - Database Languages & Runtimes

Cracking

20

3 4 1 9 8 7 5 2 6

Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4

slide-21
SLIDE 21

CSE 662 - Database Languages & Runtimes

Cracking

21

3 4 1 2 8 7 5 9 6

Query 1: Find 4 < X ≤ 7 Low Current Step 1: Split into 2 bins: > 4 and ≤ 4

slide-22
SLIDE 22

CSE 662 - Database Languages & Runtimes

Cracking

22

3 4 1 2 8 7 5 9 6

Query 1: Find 4 < X ≤ 7 Low Current Step 2: Split into 2 bins: > 7 and ≤ 7 High

slide-23
SLIDE 23

CSE 662 - Database Languages & Runtimes

Cracking

23

3 4 1 2 8 7 5 9 6

Query 1: Find 4 < X ≤ 7 Low Current Step 2: Split into 2 bins: > 7 and ≤ 7 High

slide-24
SLIDE 24

CSE 662 - Database Languages & Runtimes

Cracking

24

3 4 1 2 7 8 5 9 6

Query 1: Find 4 < X ≤ 7 Low Current Step 2: Split into 2 bins: > 7 and ≤ 7 High

slide-25
SLIDE 25

CSE 662 - Database Languages & Runtimes

Cracking

25

3 4 1 2 7 5 6 9 8

Query 1: Find 4 < X ≤ 7 Low Current Step 2: Split into 2 bins: > 7 and ≤ 7 High

slide-26
SLIDE 26

CSE 662 - Database Languages & Runtimes

Cracking

26

3 4 1 2 7 5 6 9 8

Query 1: Find 4 < X ≤ 7 Low Step 2: Split into 2 bins: > 7 and ≤ 7 High Result

slide-27
SLIDE 27

CSE 662 - Database Languages & Runtimes

Cracking

27

3 4 1 2 7 5 6 9 8

Query 1: Find 4 < X ≤ 7 ≤4 ≤7 Binary Tree

slide-28
SLIDE 28

CSE 662 - Database Languages & Runtimes

Cracking

28

3 4 1 2 7 5 6 9 8

Query 2: Find 2 < X ≤ 5 ≤4 ≤7

slide-29
SLIDE 29

CSE 662 - Database Languages & Runtimes

Cracking

29

1 2 3 4 5 7 6 9 8

Query 2: Find 2 < X ≤ 5 ≤4 ≤7 ≤2 ≤5 Result

slide-30
SLIDE 30

CSE 662 - Database Languages & Runtimes

3-Way Cracking

30

9 3 5 4 8 7 1 2 6

Query 1: Find 4 < X ≤ 7 Low Current High

slide-31
SLIDE 31

CSE 662 - Database Languages & Runtimes

3-Way Cracking

31

9 3 5 4 8 7 1 2 6

Query 1: Find 4 < X ≤ 7 Low Current High

slide-32
SLIDE 32

CSE 662 - Database Languages & Runtimes

3-Way Cracking

32

9 3 5 4 8 7 1 2 6

Query 1: Find 4 < X ≤ 7 Low Current High

slide-33
SLIDE 33

CSE 662 - Database Languages & Runtimes

3-Way Cracking

33

6 3 5 4 8 7 1 2 9

Query 1: Find 4 < X ≤ 7 Low Current High

slide-34
SLIDE 34

CSE 662 - Database Languages & Runtimes

3-Way Cracking

34

3 6 5 4 8 7 1 2 9

Query 1: Find 4 < X ≤ 7 Low Current High

slide-35
SLIDE 35

CSE 662 - Database Languages & Runtimes

3-Way Cracking

35

3 4 2 1 5 6 7 8 9

Query 1: Find 4 < X ≤ 7 Low Current High

slide-36
SLIDE 36

CSE 662 - Database Languages & Runtimes

Discussion Questions…

36

Does cracking work with a row-oriented database?

slide-37
SLIDE 37

CSE 662 - Database Languages & Runtimes

Discussion Questions…

37

How would one crack a multi-attribute index? 
 (e.g., a spatial index?)

slide-38
SLIDE 38

CSE 662 - Database Languages & Runtimes

Discussion Questions…

38

Can updates be performed efficiently on a cracker index?

slide-39
SLIDE 39

CSE 662 - Database Languages & Runtimes

Discussion Questions…

39

Can updates be performed efficiently on a cracker index? What constraints are required?

slide-40
SLIDE 40

CSE 662 - Database Languages & Runtimes

Discussion Questions…

40

What applications would cracking work well on? What applications would cracking work poorly on?

slide-41
SLIDE 41

CSE 662 - Database Languages & Runtimes

Discussion Questions…

41

Upfront Indexing vs Sequential Scan vs Cracking… Where is the cutoff?