SLIDE 1
ADVANCED DATABASE SYSTEMS
Lecture #15: Query Execution & Processing
@Andy_Pavlo // 15-721 // Spring 2019
CMU 15-721 (Spring 2019)

SLIDE 2
ARCHITECTURE OVERVIEW

[Diagram: a SQL Query enters the Networking Layer, then passes through the SQL Parser, Binder, and Rewriter, the Planner (Optimizer / Cost Models), the Compiler, and the Execution Engine (Scheduling / Placement, Operator Execution, Indexes, Concurrency Control), on top of the Storage Manager (Storage Models, Logging / Checkpoints). "We Are Here" marks Operator Execution.]

SLIDE 3
OPERATOR EXECUTION

Query Plan Processing
Application Logic Execution (UDFs)
Parallel Join Algorithms
Vectorized Operators
Query Compilation

SLIDE 4
QUERY EXECUTION

A query plan is comprised of operators.

An operator instance is an invocation of an operator on some segment of data.

A task is the execution of a sequence of one or more operator instances.

SLIDE 5
EXECUTION OPTIMIZATION

We are now going to start discussing ways to improve the DBMS's query execution performance for data sets that fit entirely in memory. There are other bottlenecks to target when we remove the disk.

SLIDE 6
OPTIMIZATION GOALS

Approach #1: Reduce Instruction Count
→ Use fewer instructions to do the same amount of work.

Approach #2: Reduce Cycles per Instruction
→ Execute more CPU instructions in fewer cycles.
→ This means reducing cache misses and stalls due to memory loads/stores.

Approach #3: Parallelize Execution
→ Use multiple threads to compute each query in parallel.

SLIDE 7

MonetDB/X100 Analysis
Processing Models
Parallel Execution

SLIDE 8
MONETDB/X100

Low-level analysis of execution bottlenecks for in-memory DBMSs on OLAP workloads.
→ Shows how DBMSs are designed incorrectly for modern CPU architectures.

Based on these findings, they proposed a new DBMS called MonetDB/X100.
→ Renamed to Vectorwise and acquired by Actian in 2010.
→ Rebranded as Vector and Avalanche.

MONETDB/X100: HYPER-PIPELINING QUERY EXECUTION
CIDR 2005

SLIDE 11
CPU OVERVIEW

CPUs organize instructions into pipeline stages. The goal is to keep all parts of the processor busy at each cycle by masking delays from instructions that cannot complete in a single cycle.

Super-scalar CPUs support multiple pipelines.
→ Execute multiple instructions in parallel in a single cycle if they are independent.
→ Flynn's Taxonomy: Single Instruction stream, Single Data stream (SISD)

SLIDE 12
DBMS / CPU PROBLEMS

Problem #1: Dependencies
→ If one instruction depends on another instruction, then it cannot be pushed immediately into the same pipeline.

Problem #2: Branch Prediction
→ The CPU tries to predict what branch the program will take and fills in the pipeline with its instructions.
→ If it gets it wrong, it has to throw away any speculative work and flush the pipeline.

SLIDE 13
BRANCH MISPREDICTION

Because of long pipelines, CPUs will speculatively execute branches. This potentially hides the long stalls between dependent instructions.

The most executed branching code in a DBMS is the filter operation during a sequential scan. But this is (nearly) impossible to predict correctly.

SLIDE 14
SELECTION SCANS

SELECT * FROM table
 WHERE key >= $(low)
   AND key <= $(high)

Source: Bogdan Raducanu


SLIDE 17
SELECTION SCANS

Scalar (Branching):

i = 0
for t in table:
  key = t.key
  if (key >= low) && (key <= high):
    copy(t, output[i])
    i = i + 1

Scalar (Branchless):

i = 0
for t in table:
  copy(t, output[i])
  key = t.key
  m = (key >= low ? 1 : 0) && (key <= high ? 1 : 0)
  i = i + m

Source: Bogdan Raducanu
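The two variants above can be sketched as runnable Python. This is a simplification of the slide's pseudocode: `keys` is a plain list of key values rather than tuples, and the bounds are ordinary arguments.

```python
def scan_branching(keys, low, high):
    """Branching scan: copy a key to the output only when the predicate holds."""
    out = []
    for k in keys:
        if low <= k <= high:          # data-dependent branch, hard to predict
            out.append(k)
    return out

def scan_branchless(keys, low, high):
    """Branchless scan: always copy, advance the cursor by a 0/1 mask."""
    out = [None] * len(keys)
    i = 0
    for k in keys:
        out[i] = k                              # unconditional copy
        m = int(k >= low) & int(k <= high)      # predicate as a 0/1 mask
        i += m                                  # cursor moves only on a match
    return out[:i]
```

Both produce the same result; the branchless form trades extra copies for eliminating the unpredictable branch in the loop body.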


SLIDE 19
SELECTION SCANS

[Performance graph comparing the scan variants, omitted.]

Source: Bogdan Raducanu

SLIDE 20
EXCESSIVE INSTRUCTIONS

The DBMS needs to support different data types, so it must check a value's type before it performs any operation on that value.
→ This is usually implemented as giant switch statements.
→ Also creates more branches that can be difficult for the CPU to predict reliably.

Example: Postgres' addition for NUMERIC types.
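A toy illustration of the pattern (not Postgres' actual code): when values carry their types at runtime, every single arithmetic call pays for the dispatch branches before doing any real work.

```python
from decimal import Decimal

def add_values(type_a, a, type_b, b):
    """Interpreted addition: a type 'switch' runs on every invocation."""
    if type_a == "int" and type_b == "int":
        return a + b
    elif type_a == "float" or type_b == "float":
        return float(a) + float(b)
    elif type_a == "decimal" and type_b == "decimal":
        return Decimal(a) + Decimal(b)
    else:
        raise TypeError(f"unsupported: {type_a} + {type_b}")
```

Each branch is one more opportunity for the CPU to mispredict; a compiled or specialized plan would hoist this dispatch out of the per-tuple loop.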


SLIDE 22
PROCESSING MODEL

A DBMS's processing model defines how the system executes a query plan.
→ Different trade-offs for different workloads.

Approach #1: Iterator Model
Approach #2: Materialization Model
Approach #3: Vectorized / Batch Model

SLIDE 23
ITERATOR MODEL

Each query plan operator implements a next function.
→ On each invocation, the operator returns either a single tuple or a null marker if there are no more tuples.
→ The operator implements a loop that calls next on its children to retrieve their tuples and then processes them.

Also called the Volcano or Pipeline Model.

SLIDE 24
ITERATOR MODEL

SELECT A.id, B.value
  FROM A, B
 WHERE A.id = B.id
   AND B.value > 100

Plan: π(A.id, B.value) over ⨝(A.id=B.id), with σ(value>100) on B.

Each operator pulls single tuples from its children via Next():

# Projection (π)
for t in child.Next():
  emit(projection(t))

# Hash Join (⨝)
for t1 in left.Next():
  buildHashTable(t1)
for t2 in right.Next():
  if probe(t2): emit(t1⨝t2)

# Filter (σ)
for t in child.Next():
  if evalPred(t): emit(t)

# Scans
for t in A: emit(t)
for t in B: emit(t)
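A minimal runnable sketch of this pull-based model, using Python generators as the `next` loop. The table contents and key/predicate functions are illustrative, not from the slides:

```python
def scan(table):
    # Leaf operator: emit one tuple at a time
    for t in table:
        yield t

def select_op(child, pred):
    # σ: pass through only tuples satisfying the predicate
    for t in child:
        if pred(t):
            yield t

def hash_join(left, right, lkey, rkey):
    # Build a hash table from the left child, then probe with the right
    ht = {}
    for t1 in left:
        ht.setdefault(lkey(t1), []).append(t1)
    for t2 in right:
        for t1 in ht.get(rkey(t2), []):
            yield {**t1, **t2}

def project(child, cols):
    # π: keep only the requested attributes
    for t in child:
        yield {c: t[c] for c in cols}

# SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND B.value > 100
A = [{"id": 1}, {"id": 2}]
B = [{"id": 1, "value": 150}, {"id": 2, "value": 50}]
plan = project(
    hash_join(scan(A),
              select_op(scan(B), lambda t: t["value"] > 100),
              lkey=lambda t: t["id"], rkey=lambda t: t["id"]),
    ["id", "value"])
result = list(plan)   # tuples flow up the plan one at a time
```

Each `yield` hands a single tuple to the parent, mirroring the per-tuple `emit` in the slide's pseudocode.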


SLIDE 30
ITERATOR MODEL

This is used in almost every DBMS. Allows for tuple pipelining.

Some operators have to block until their children emit all of their tuples.
→ Joins, Subqueries, Order By

Output control works easily with this approach.

SLIDE 31
MATERIALIZATION MODEL

Each operator processes its input all at once and then emits its output all at once.
→ The operator "materializes" its output as a single result.
→ The DBMS can push down hints to avoid scanning too many tuples.
→ Can send either a materialized row or a single column.

The output can be either whole tuples (NSM) or subsets of columns (DSM).

SLIDE 32
MATERIALIZATION MODEL

SELECT A.id, B.value
  FROM A, B
 WHERE A.id = B.id
   AND B.value > 100

Each operator consumes its child's entire output before returning its own:

# Projection (π)
out = { }
for t in child.Output():
  out.add(projection(t))
return out

# Hash Join (⨝)
out = { }
for t1 in left.Output():
  buildHashTable(t1)
for t2 in right.Output():
  if probe(t2): out.add(t1⨝t2)
return out

# Filter (σ)
out = { }
for t in child.Output():
  if evalPred(t): out.add(t)
return out

# Scans
out = { }
for t in A:
  out.add(t)
return out

out = { }
for t in B:
  out.add(t)
return out
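The same query under materialization can be sketched with operators that build and return whole lists instead of yielding tuples (tables and predicate are illustrative):

```python
def scan_all(table):
    # Leaf operator: materialize the whole table at once
    return list(table)

def select_all(rows, pred):
    # σ: filter the full input, return the full output
    return [t for t in rows if pred(t)]

def join_all(left, right, lkey, rkey):
    # Build from the entire left input, probe with the entire right input
    ht = {}
    for t1 in left:
        ht.setdefault(lkey(t1), []).append(t1)
    return [{**t1, **t2} for t2 in right for t1 in ht.get(rkey(t2), [])]

def project_all(rows, cols):
    # π: materialize the projected result in one pass
    return [{c: t[c] for c in cols} for t in rows]

A = [{"id": 1}, {"id": 2}]
B = [{"id": 1, "value": 150}, {"id": 2, "value": 50}]
result = project_all(
    join_all(scan_all(A),
             select_all(scan_all(B), lambda t: t["value"] > 100),
             lambda t: t["id"], lambda t: t["id"]),
    ["id", "value"])
```

Note that every intermediate list lives in memory at once, which is exactly why this model struggles with large OLAP intermediates.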


SLIDE 36
MATERIALIZATION MODEL

Better for OLTP workloads because queries only access a small number of tuples at a time.
→ Lower execution / coordination overhead.
→ Fewer function calls.

Not good for OLAP queries with large intermediate results.

SLIDE 37
VECTORIZATION MODEL

Like the Iterator Model, each operator implements a next function in this model. Each operator emits a batch of tuples instead of a single tuple.
→ The operator's internal loop processes multiple tuples at a time.
→ The size of the batch can vary based on hardware or query properties.

SLIDE 38
VECTORIZATION MODEL

SELECT A.id, B.value
  FROM A, B
 WHERE A.id = B.id
   AND B.value > 100

Each operator accumulates tuples and emits them once the batch reaches size n:

# Projection (π)
out = { }
for t in child.Output():
  out.add(projection(t))
  if |out|>n: emit(out)

# Hash Join (⨝)
out = { }
for t1 in left.Output():
  buildHashTable(t1)
for t2 in right.Output():
  if probe(t2): out.add(t1⨝t2)
  if |out|>n: emit(out)

# Filter (σ)
out = { }
for t in child.Output():
  if evalPred(t): out.add(t)
  if |out|>n: emit(out)

# Scans
out = { }
for t in A:
  out.add(t)
  if |out|>n: emit(out)

out = { }
for t in B:
  out.add(t)
  if |out|>n: emit(out)
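A runnable sketch of the batching loop, with a scan and a filter chained together (batch size and data are illustrative):

```python
def scan_batches(table, n):
    # Leaf operator: emit fixed-size batches instead of single tuples
    out = []
    for t in table:
        out.append(t)
        if len(out) >= n:
            yield out
            out = []
    if out:               # flush the final partial batch
        yield out

def select_batches(child, pred, n):
    # σ: filter each incoming batch, re-batch the survivors
    out = []
    for batch in child:
        for t in batch:
            if pred(t):
                out.append(t)
            if len(out) >= n:
                yield out
                out = []
    if out:
        yield out

rows = list(range(10))
batches = list(select_batches(scan_batches(rows, 4), lambda x: x % 2 == 0, 4))
```

One `next` invocation now moves a whole batch up the plan, which is where the reduction in per-operator call overhead comes from.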


SLIDE 40
VECTORIZATION MODEL

Ideal for OLAP queries because it greatly reduces the number of invocations per operator. Allows for operators to use vectorized (SIMD) instructions to process batches of tuples.

SLIDE 41
PLAN PROCESSING DIRECTION

Approach #1: Top-to-Bottom
→ Start with the root and "pull" data up from its children.
→ Tuples are always passed with function calls.

Approach #2: Bottom-to-Top
→ Start with leaf nodes and push data to their parents.
→ Allows for tighter control of caches/registers in pipelines.
→ We will see this later in HyPer and Peloton ROF.

SLIDE 42
INTER-QUERY PARALLELISM

Improve overall performance by allowing multiple queries to execute simultaneously.
→ Provide the illusion of isolation through a concurrency control scheme.

The difficulty of implementing a concurrency control scheme is not significantly affected by the DBMS's process model.

SLIDE 43
INTRA-QUERY PARALLELISM

Improve the performance of a single query by executing its operators in parallel.

Approach #1: Intra-Operator (Horizontal)
Approach #2: Inter-Operator (Vertical)

These techniques are not mutually exclusive. There are parallel algorithms for every relational operator.

SLIDE 44
INTRA-OPERATOR PARALLELISM

Approach #1: Intra-Operator (Horizontal)
→ Operators are decomposed into independent instances that perform the same function on different subsets of data.

The DBMS inserts an exchange operator into the query plan to coalesce results from children operators.

SLIDE 51
INTRA-OPERATOR PARALLELISM

SELECT A.id, B.value
  FROM A, B
 WHERE A.id = B.id
   AND A.value < 99
   AND B.value > 100

[Diagram: table A is split into partitions A1, A2, A3 and table B into B1, B2, B3. Each A partition runs its own σ and π and builds a hash table ("Build HT" ×3), coalesced by an Exchange operator. Each B partition runs its own σ and π and probes the hash table ("Probe HT" ×3); a second Exchange coalesces the probe results.]
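A minimal sketch of horizontal parallelism for a single filter operator: each worker runs the same function on its own partition, and an exchange step coalesces the per-partition outputs. The thread pool, partition sizes, and predicate are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def filter_partition(partition, pred):
    # One operator instance: same function, different subset of the data
    return [t for t in partition if pred(t)]

def exchange(per_partition_results):
    # Exchange operator: coalesce the outputs of the children instances
    out = []
    for r in per_partition_results:
        out.extend(r)
    return out

data = list(range(12))
partitions = [data[0:4], data[4:8], data[8:12]]
pred = lambda x: x % 2 == 0

with ThreadPoolExecutor(max_workers=3) as pool:
    per_part = list(pool.map(lambda p: filter_partition(p, pred), partitions))
result = exchange(per_part)
```

`pool.map` preserves partition order, so the exchange here is a simple ordered concatenation; a real DBMS exchange may also repartition or shuffle.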

SLIDE 52
INTER-OPERATOR PARALLELISM

Approach #2: Inter-Operator (Vertical)
→ Operations are overlapped in order to pipeline data from one stage to the next without materialization.

Also called pipelined parallelism.

AFAIK, this approach is not widely used in traditional relational DBMSs.
→ Not all operators can emit output until they have seen all of the tuples from their children.
→ It is more common in stream processing systems.

SLIDE 54
INTER-OPERATOR PARALLELISM

SELECT A.id, B.value
  FROM A, B
 WHERE A.id = B.id
   AND A.value < 99
   AND B.value > 100

Stage 1 (⨝):
for t1 ∊ outer:
  for t2 ∊ inner:
    emit(t1⨝t2)

Stage 2 (π):
for t ∊ incoming:
  emit(π(t))
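The two stages above can be sketched as concurrent threads connected by a queue, so join output is consumed by the projection as it arrives rather than being materialized (tables, key names, and the sentinel are illustrative):

```python
from queue import Queue
from threading import Thread

DONE = object()   # sentinel marking the end of the stream

def join_stage(outer, inner, out_q):
    # Stage 1: nested-loop join, pushing each match downstream immediately
    for t1 in outer:
        for t2 in inner:
            if t1["id"] == t2["id"]:
                out_q.put({**t1, **t2})
    out_q.put(DONE)

def project_stage(in_q, cols, results):
    # Stage 2: runs concurrently, consuming tuples as they arrive
    while True:
        t = in_q.get()
        if t is DONE:
            break
        results.append({c: t[c] for c in cols})

A = [{"id": 1, "value": 10}, {"id": 2, "value": 20}]
B = [{"id": 2, "bval": 200}]
q, results = Queue(), []
stage2 = Thread(target=project_stage, args=(q, ["id", "bval"], results))
stage2.start()
join_stage(A, B, q)   # stage 1 runs here, overlapped with stage 2
stage2.join()
```

The queue plays the role of the pipeline edge between the two operators; neither stage waits for the other to finish before doing work.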

slide-55
SLIDE 55 CMU 15-721 (Spring 2019)

O BSERVATIO N

Coming up with the right number of workers to use for a query plan depends on the number of CPU cores, the size of the data, and functionality

  • f the operators.

33

SLIDE 56
WORKER ALLOCATION

Approach #1: One Worker per Core
→ Each core is assigned one thread that is pinned to that core in the OS.
→ See sched_setaffinity

Approach #2: Multiple Workers per Core
→ Use a pool of workers per core (or per socket).
→ Allows CPU cores to be fully utilized in case one worker at a core blocks.
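Python's standard library exposes the same Linux syscall, so Approach #1 can be sketched directly. The guard is there because affinity calls are Linux-only, and core 0 is just an illustrative choice:

```python
import os

def pin_worker_to_core(core_id):
    """Pin the calling process/thread to a single core (Approach #1)."""
    if hasattr(os, "sched_setaffinity"):       # Linux only
        os.sched_setaffinity(0, {core_id})     # pid 0 = the calling process
        return os.sched_getaffinity(0)         # report the new affinity mask
    return None                                # unsupported platform

mask = pin_worker_to_core(0)
```

A one-worker-per-core DBMS would call something like this once per worker thread at startup, giving each thread exclusive use of its core's caches.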

SLIDE 57
TASK ASSIGNMENT

Approach #1: Push
→ A centralized dispatcher assigns tasks to workers and monitors their progress.
→ When the worker notifies the dispatcher that it is finished, it is given a new task.

Approach #2: Pull
→ Workers pull the next task from a queue, process it, and then return to get the next task.
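The pull approach maps directly onto a shared work queue; a minimal sketch where squaring a number stands in for real operator work:

```python
from queue import Empty, Queue
from threading import Thread

def worker(task_q, results):
    # Pull model: grab the next task, process it, come back for more
    while True:
        try:
            task = task_q.get_nowait()
        except Empty:
            return                      # queue drained, worker exits
        results.append(task * task)     # stand-in for executing a task

task_q = Queue()
for t in range(8):
    task_q.put(t)

results = []
threads = [Thread(target=worker, args=(task_q, results)) for _ in range(3)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

No dispatcher tracks progress here; load balancing falls out of faster workers simply pulling more tasks.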

SLIDE 58
PARTING THOUGHTS

The easiest way to implement something is not always going to produce the most efficient execution strategy for modern CPUs. We will see that vectorized / bottom-up execution will be the better way to execute OLAP queries.

SLIDE 59
NEXT CLASS

User-defined Functions
Stored Procedures