Cost-based Query Optimization Christoph Koch Computing the cost of - - PowerPoint PPT Presentation

cost based query optimization christoph koch computing
SMART_READER_LITE
LIVE PREVIEW

Cost-based Query Optimization Christoph Koch Computing the cost of - - PowerPoint PPT Presentation

Cost-based Query Optimization Christoph Koch Computing the cost of a query plan We know how to estimate the cost of each individual operator. But for doing this we need to know the size of the input. Compute the size of each relation


slide-1
SLIDE 1

Cost-based Query Optimization Christoph Koch

slide-2
SLIDE 2

2

Computing the cost of a query plan

We know how to estimate the cost of each individual operator. But for doing this we need to know the size of the input. Compute the size of each relation produced by some operator Bottom-up We need to know estimates of selectivities/reduction factors

for both selection and join conditions.

Given a fixed query plan, if we exchange a particular operator

implementation (e.g. NL join against Hashjoin), the output does not change and we do not have to recompute output sizes.

slide-3
SLIDE 3

3

Model of Query plans

We use a very powerful model. Ingredients: Algebra tree For each operation, a choice of implementation For some binary operations (e.g. NL joins), an annotation

saying which of the two inputs is the outer and which is the inner loop.

A choice of whether an operator’s output is to be written

back to disk or pipelined into the next operator (sometimes there is no choice).

An allocation of memory buffer pages to operators and their

input/output lines.

In case of pipelining, the allocation may not be operator by operator

but span several operations.

slide-4
SLIDE 4

4

A Remark on the cost function for block NL joins

On this slides I use the formula

pages_outer + round_up(pages_outer / (buffer_pages – 1)) * (pages_inner - 1) + 1

I explained this formula in class but it’s not in the book. You can use the formula from the book instead:

pages_outer + round_up(pages_outer / (buffer_pages – 1)) * pages_inner

slide-5
SLIDE 5

5

A University Database

room grad name id 226 Full Socrates 2125 7 Full Kant 2137 36 Full Curie 2136 309 Assoc Augustinus 2134 52 Assoc Popper 2133 310 Assoc Kopernikus 2127 232 Full Russel 2126

Professor (p)

semester name sid 18 Xenokrates 24002 2 Feuerbach 29555 2 Theophrastos 29120 3 Carnap 28106 6 Schopenhauer 27550 8 Aristoxenos 26830 10 Fichte 26120 12 Jonas 25403

Student (s)

2137 4 The three critiques 4630 2134 2 CS432 5022 2133 2 The Vienna Circle 5259 taught_by credits name cid 2137 4 Foundations 5001 2126 2 Bioethics 5216 2126 3 Theory of Science 5052 2125 4 Logics 4052 2125 2 Maieutics 5049 2126 3 Epistemology 5043 2125 4 Ethics 5041

Course (v)

5216 28106 5022 25403 5022 29555 5049 29120 5041 29120 cid sid 5001 26120 5001 29120 5259 28106 5052 28106 5041 28106 4052 27550 5001 27550

Takes (h)

slide-6
SLIDE 6

6

Assumptions

#tuples per page: p: b1024/50c =20 s: b1024/50c = 20 v: b1024/100c = 10 h: b1024/16c = 64 (page overhead is ignored) #pages: p: 800/20 = 40 s: 38000/20 = 1900 v: 2000/10 = 200 h: d60000/64e = 938 (we ignore the overhead of the primary index.) Relation sizes (# tuples) |p|= 800 |s|= 38000 |v|= 2000 |h|= 60000

  • Avg. Tuple size

p: 50 Bytes s: 50 Bytes v: 100 Bytes h: 16 Bytes selectivities Sel[sh] = 2.6 * 10^ -5 Sel[hv] = 5 * 10^ -4 Sel[vp] = 1.25 * 10^ -3 Sel[p.Name= ...] = 1.25 * 10^ -3 Page size 1024 Bytes Main memory buffers m = 20+ 1 pages

slide-7
SLIDE 7

7

Example 1

Access(p)

πs.semester

h v s (with duplicates) Joins.sid=h.sid Joinp.id=v.taught_by Joinv.cid=h.cid Selp.name=´Socrates´

slide-8
SLIDE 8

8

Example 1

Access(p)

πs.semester

s h v (with duplicates) Joinp.id=v.taught_by Joinv.cid=h.cid Joins.sid=h.sid Selp.name=´Socrates´

slide-9
SLIDE 9

9

Example 1

Access(p)

πs.semester

s h v (with duplicates) Joins.sid=h.sid Joinp.id=v.taught_by Joinv.cid=h.cid Selp.name=´Socrates´

slide-10
SLIDE 10

10

Example 1

Access(p)

πs.semester

s h v (with duplicates) Joins.sid=h.sid Joinp.id=v.taught_by Joinv.cid=h.cid Selp.name=´Socrates´

slide-11
SLIDE 11

11

Example 1

Access(p)

πs.semester

s h v (with duplicates) Joins.sid=h.sid Joinp.id=v.taught_by Joinv.cid=h.cid Selp.name=´Socrates´

slide-12
SLIDE 12

12

Example 1

Access(p)

πs.semester

s h v (with duplicates)

And now we have to allocate memory buffers...

Joins.sid=h.sid Joinp.id=v.taught_by Joinv.cid=h.cid Selp.name=´Socrates´

slide-13
SLIDE 13

13

Example 1a

Access(p) NL-Joins.sid=h.sid NL-Joinp.id=v.taught_by

π s.semester

Selp.name=´Socrates´ NL-Joinv.cid=h.cid s h v (with duplicates)

1 9 Buffer assignments in this color

pipl. pipl.

1

pipl.

9 1 (inner: recompute!)

slide-14
SLIDE 14

14

Example 1a

Access(p)

π s.semester

s h v (with duplicates) pipl. pipl.

1

pipl.

1 9 9 1 (inner: recompute!)

NL-Joins.sid=h.sid NL-Joinp.id=v.taught_by NL-Joinv.cid=h.cid Selp.name=´Socrates´

slide-15
SLIDE 15

15

Example 1a

s h v Access(p)

π s.semester

(with duplicates) NL-Joins.sid=h.sid NL-Joinp.id=v.taught_by NL-Joinv.cid=h.cid Selp.name=´Socrates´

slide-16
SLIDE 16

16

Example 1b

Access(p)

π s.semester

s h v (with duplicates)

1 9 Buffer assignments in this color

pipl. pipl.

1

pipl.

9 1

NL-Joins.sid=h.sid NL-Joinp.id=v.taught_by NL-Joinv.cid=h.cid Selp.name=´Socrates´

slide-17
SLIDE 17

17

Example 1b

Access(p)

π s.semester

s h v (with duplicates) pipl. pipl.

1

pipl.

1 9 9 1

NL-Joins.sid=h.sid NL-Joinp.id=v.taught_by NL-Joinv.cid=h.cid Selp.name=´Socrates´

slide-18
SLIDE 18

18

Example 1b

s h v Access(p)

π s.semester

(with duplicates) NL-Joins.sid=h.sid NL-Joinp.id=v.taught_by NL-Joinv.cid=h.cid Selp.name=´Socrates´

slide-19
SLIDE 19

19

Example 1c

Access(p)

π s.semester

s h v (with duplicates)

1 19 1 Buffer assignments in this color

pipl. pipl.

1

pipl.

18 1 Store intermediate result

NL-Joins.sid=h.sid NL-Joinp.id=v.taught_by NL-Joinv.cid=h.cid Selp.name=´Socrates´

slide-20
SLIDE 20

20

Example 1c

Access(p)

π s.semester

s h v (with duplicates)

1 19 1

pipl. pipl.

1

pipl.

18 1

NL-Joins.sid=h.sid NL-Joinp.id=v.taught_by NL-Joinv.cid=h.cid Selp.name=´Socrates´

slide-21
SLIDE 21

21

Example 1c

s h v Access(p)

π s.semester

(with duplicates) NL-Joins.sid=h.sid NL-Joinp.id=v.taught_by NL-Joinv.cid=h.cid Selp.name=´Socrates´

slide-22
SLIDE 22

22

Example 2

s h v Access(p) NL-Joins.sid=h.sid NL-Joinp.id=v.taught_by

π s.semester

NL-Joinv.cid=h.cid pipl. pipl.

1 1 5

Selp.name=´Socrates´

slide-23
SLIDE 23

23

Example 2

s h v Access(p) pipl.

1

pipl.

(Same query, different Join order)

1 5

π s.semester

NL-Joins.sid=h.sid NL-Joinp.id=v.taught_by NL-Joinv.cid=h.cid Selp.name=´Socrates´

slide-24
SLIDE 24

24

Example 2

s h v Access(p)

1

pipl. pipl.

1 5

π s.semester

NL-Joins.sid=h.sid NL-Joinp.id=v.taught_by NL-Joinv.cid=h.cid Selp.name=´Socrates´

slide-25
SLIDE 25

25

Example 2

s h v Access(p)

1 5

pipl.

1

pipl.

π s.semester

NL-Joins.sid=h.sid NL-Joinp.id=v.taught_by NL-Joinv.cid=h.cid Selp.name=´Socrates´

slide-26
SLIDE 26

26

Example 2

s h v Access(p)

5

pipl.

1

pipl.

1

π s.semester

NL-Joins.sid=h.sid NL-Joinp.id=v.taught_by NL-Joinv.cid=h.cid Selp.name=´Socrates´

slide-27
SLIDE 27

27

Example 2

s h v Access(p) pipl.

1

pipl.

1 5

π s.semester

NL-Joins.sid=h.sid NL-Joinp.id=v.taught_by NL-Joinv.cid=h.cid Selp.name=´Socrates´

slide-28
SLIDE 28

28

Example 3a

Access(p) Index-Joinp.id=v.taught_by Index-Joins.sid=h.sid Merge-Joinv.cid=h.cid

π s.semester

Sortv.cid Sorth.cid v h s Selp.name=´Socrates´

taught_by

slide-29
SLIDE 29

29

Example 3a

20+1

pipl.

1 1

pipl.

5 e.g. 2

pipl. Access(p)

π s.semester

Sortv.cid Sorth.cid v h s Index-Joinp.id=v.taught_by Index-Joins.sid=h.sid Merge-Joinv.cid=h.cid Selp.name=´Socrates´

taught_by

slide-30
SLIDE 30

30

Example 3a

20+1

pipl.

1 1

pipl.

e.g. 2

pipl.

5

Access(p)

π s.semester

Sortv.cid Sorth.cid v h s Index-Joinp.id=v.taught_by Index-Joins.sid=h.sid Merge-Joinv.cid=h.cid Selp.name=´Socrates´

taught_by

slide-31
SLIDE 31

31

Example 3a

20+1

pipl.

1 1

pipl.

e.g. 2

pipl.

5

Access(p)

π s.semester

Sortv.cid Sorth.cid v h s Index-Joinp.id=v.taught_by Index-Joins.sid=h.sid Merge-Joinv.cid=h.cid Selp.name=´Socrates´

taught_by

slide-32
SLIDE 32

32

Example 3a

20+1

pipl.

1 1

pipl.

e.g. 2

pipl.

5

Access(p)

πs.semester

Sortv.cid Sorth.cid v h s Index-Joinp.id=v.taught_by Index-Joins.sid=h.sid Merge-Joinv.cid=h.cid Selp.name=´Socrates´

taught_by

slide-33
SLIDE 33

33

Example 3a

20+1

pipl.

1 1

pipl.

e.g. 2

pipl.

5

Access(p)

πs.semester

Sortv.cid Sorth.cid v h s Index-Joinp.id=v.taught_by Index-Joins.sid=h.sid Merge-Joinv.cid=h.cid Selp.name=´Socrates´

taught_by

slide-34
SLIDE 34

34

Example 3a

pipl.

18

pipl.

20+1

pipl.

1 1

pipl.

e.g. 2

pipl.

5

Access(p)

π s.semester

Sortv.cid Sorth.cid v h s Index-Joinp.id=v.taught_by Index-Joins.sid=h.sid Merge-Joinv.cid=h.cid Selp.name=´Socrates´

taught_by

slide-35
SLIDE 35

35

Example 3a

pipl.

18

pipl.

20+1

pipl.

1 1

pipl.

e.g. 2

pipl.

5

Access(p)

π s.semester

Sortv.cid Sorth.cid v h s Index-Joinp.id=v.taught_by Index-Joins.sid=h.sid Merge-Joinv.cid=h.cid Selp.name=´Socrates´

taught_by

slide-36
SLIDE 36

36

Example 3a

20+1

pipl.

1 1

pipl.

e.g. 2

pipl.

5

Access(p)

π s.semester

Sortv.cid Sorth.cid v h s Index-Joinp.id=v.taught_by Index-Joins.sid=h.sid Merge-Joinv.cid=h.cid Selp.name=´Socrates´

taught_by

slide-37
SLIDE 37

37

Example 3b

pipl.

3

pipl.

20+1

pipl.

1 1

pipl.

e.g. 2

pipl.

5

Access(p)

π s.semester

Sortv.cid Sorth.cid v h s IndexDup Hash Index-Joinp.id=v.taught_by Index-Joins.sid=h.sid Merge-Joinv.cid=h.cid Selp.name=´Socrates´

taught_by

slide-38
SLIDE 38

38

Example 3b

pipl. pipl.

3

pipl.

20+1

pipl.

1 1

pipl.

e.g. 2 5

Access(p)

π s.semester

Sortv.cid Sorth.cid v h s IndexDup Hash Index-Joinp.id=v.taught_by Index-Joins.sid=h.sid Merge-Joinv.cid=h.cid Selp.name=´Socrates´

taught_by

slide-39
SLIDE 39

39

Example 3b

pipl. pipl.

3

pipl.

20+1

pipl.

1 1

pipl.

e.g. 2 5

Access(p)

π s.semester

Sortv.cid Sorth.cid v h s IndexDup Hash Index-Joinp.id=v.taught_by Index-Joins.sid=h.sid Merge-Joinv.cid=h.cid Selp.name=´Socrates´

taught_by

slide-40
SLIDE 40

40

Example 3c

pipl. pipl.

3

pipl. pipl.

1 e.g. 2 5 1 (15 hashbuckets)

Access(p)

π s.semester

v h s IndexDup Hash Index-Joinp.id=v.taught_by Index-Joins.sid=h.sid Hash-Joinv.cid=h.cid Selp.name=´Socrates´

taught_by

slide-41
SLIDE 41

41

Example 3d

pipl. pipl.

3

pipl. pipl.

1 e.g. 2 1 1 5

Access(p)

π s.semester

v h IndexDup Hash s Index-Joinp.id=v.taught_by Index-Joins.sid=h.sid NL-Joinv.cid=h.cid Selp.name=´Socrates´

taught_by

slide-42
SLIDE 42

42

Example 3e

pipl. pipl.

3

pipl. pipl.

1 e.g. 2 1 5 e.g. 2

Access(p)

π s.semester

v h IndexDup Hash s Index-Joinp.id=v.taught_by Index-Joins.sid=h.sid Index-Joinv.cid=h.cid Selp.name=´Socrates´

taught_by

slide-43
SLIDE 43

43

Observations

Join order has BIG impact on query evaluation time! Cost of Operations and size of intermediate result can be

computed separately.

The selection of a join operator has only local influence on

the runtime of the query, except if it destroys properties of the data (such as sort order) that are important for

  • perations later on in the pipeline or if it requires main

memory buffers that are allocated to other operators.

slide-44
SLIDE 44

44

Example 4: Selectivity estimates

Access(p) v

Now no selectivities are given, but we know that

  • Professors (p) are in a 1:n relationship with

Courses (v) (via foreign key taught_by).

  • Each Professor has either rank „Full“ or „Assoc“.

Selectp.grad = ´Full´ Index-Joinp.id=v.taught_by

(Unclustered B+-tree on taught_by

slide-45
SLIDE 45

45

Example 4a: Selectivity estimates

Access(p) v

1

taught_by

Index-Joinp.id=v.taught_by Selectp.grad = ´Full´

slide-46
SLIDE 46

46

Example 4b: Selectivity estimates

Access(p) v

It is better NOT to Use the index!

Cost: (Unclustered B+-tree on taught_by

NL-Joinp.id=v.taught_by Selectp.grad = ´Full´

slide-47
SLIDE 47

47

Using Histograms

5 0 5 0 1 0 0 1 0 0 1 5 0 1 5 0 2 0 0 2 0 0 2 5 0 2 5 0 3 0 0 3 0 0 3 5 0 3 5 0 4 0 0 4 0 0 1 6 - 1 6 -1 7 1 7 1 8 - 1 8 -1 9 1 9 2 0 - 2 0 -2 1 2 1 2 2 - 2 2 -2 3 2 3 2 4 - 2 4 -2 5 2 5 2 6 - 2 6 -2 7 2 7

St St udent udent .Age

Histogram computed by „analyze table“ statement

slide-48
SLIDE 48

48

Example 5: (co-)Clusters

p v

Prof 1 Course 1.1 Course 1.2 Prof 2 Course 2.1 Prof 3 Course 3.1 Course 3.2 Course 3.3

1 1 1

NL-Joinp.id=v.taught_by Selectp.grad = ´Full´

slide-49
SLIDE 49

49

Example 6: Seeks

p v

(Unclustered B+-tree on taught_by

NL-Joinp.id=v.taught_by Selectp.grad = ´Full´