

slide-1
SLIDE 1

Query Processing

5DV120 — Database System Principles
Umeå University, Department of Computing Science
Stephen J. Hegner
hegner@cs.umu.se
http://www.cs.umu.se/~hegner

Query Processing 20130530 Slide 1 of 62

slide-2
SLIDE 2

Overview

Question: How is a query in SQL processed by the system to produce the answer?

  • The block diagram below identifies the three main steps.

Lexer + Parser + View Resolver: In the first step, the (declarative) SQL query is translated to an operational query, usually but not always expressed in the relational algebra.
Query Optimizer: In the second step, the algebraic representation is augmented to be an execution plan, with possible rearrangement of the algebraic operations as well as implementation-specific information on how to evaluate in an efficient fashion.
Code Generator and Evaluator: In the final step, the plan of execution is carried out.

[Block diagram: SQL Source → Lexer + Parser + View Resolver → Algebraic Rep. → Query Optimizer → Execution Plan → Code Generator and Evaluator → Result]

Query Processing 20130530 Slide 2 of 62

slide-3
SLIDE 3

Scanning, Parsing, and View Resolution

  • Lexical analysis and parsing are carried out using well-known techniques

from translator design.

Lexical analysis: The lexer (or lexical analyzer or tokenizer) breaks the input stream up into tokens.
Parsing: The parser builds a parse tree for the tokens according to a grammar for the language.

  • These topics will not be considered in this course.

View resolution: corresponds to the generation of an intermediate representation in a programming-language translator.

  • In the case of SQL, the relational algebra is often but not always used as

the intermediate language.

  • The relationship between SQL and the relational algebra, as well as how

to translate queries from one to the other, has already been studied in the introductory course.

  • The topic of how to obtain an equivalent expression in the relational

algebra from a query in SQL will not be considered further here.

Query Processing 20130530 Slide 3 of 62

slide-4
SLIDE 4

Query Optimization

  • Query optimization involves at least two distinct processes.

Query rewriting: A given expression in the relational algebra (e.g., the output of the lexer + parser + view resolver) may be represented by an equivalent expression which is amenable to more efficient evaluation.

Annotation: An expression in the relational algebra may be annotated with specific information on how to carry out its steps, such as:

  • which algorithms to use;
  • which indices to use.
  • These topics will be examined in these slides.

[Block diagram: SQL Source → Lexer + Parser + View Resolver → Algebraic Rep. → Query Optimizer → Execution Plan → Code Generator and Evaluator → Result]

Query Processing 20130530 Slide 4 of 62

slide-5
SLIDE 5

Code Generation and Evaluation

  • Executing an execution plan is a relatively straightforward process,

although there are certainly nontrivial details which must be addressed.

  • This task will not be examined further in these slides.
  • Thus, the focus will be upon query optimization.

[Block diagram: SQL Source → Lexer + Parser + View Resolver → Algebraic Rep. → Query Optimizer → Execution Plan → Code Generator and Evaluator → Result]

Query Processing 20130530 Slide 5 of 62


slide-7
SLIDE 7

Executing Operations in the Relational Algebra

  • In these slides, the relational algebra will be used as the intermediate

language for (unannotated) execution plans.

  • Therefore, it is important to begin with a study of algorithms for the

following, under a variety of conditions for index availability:

  • projection
  • selection
  • join
  • removal of duplicates
  • ordering of results
  • aggregation

Query Processing 20130530 Slide 6 of 62

slide-8
SLIDE 8

Basic Measures of Cost for Data Access

  • Recall that access to secondary storage (usually hard disks) takes much

longer (thousands of times longer) than access to primary storage (main memory).

  • Minimizing the number of times that secondary storage must be accessed

is therefore paramount in the design of efficient algorithms for query processing.

  • It is useful to begin with some basic parameters for disk access.

tS: The (average) time required to access one block of data (seek time + rotational latency).
tT: The (average) time required to transfer one block of data from secondary to primary storage.

  • An operation which requires ns seeks to transfer nb blocks thus requires a

total time of ns · tS + nb · tT.

  • The relationship between ns and nb depends upon how the required

blocks are arranged on the secondary device (random vs. sequential neighbors).
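As a rough worked example of the formula ns · tS + nb · tT (the timing constants below are hypothetical illustrations, not values from these slides), sequential and random block access can be compared directly:

```python
# Hypothetical timing parameters: a commodity disk with ~4 ms average
# seek + rotational latency and ~0.1 ms per-block transfer time.
t_S = 4.0    # ms per seek (tS)
t_T = 0.1    # ms per block transferred (tT)

def access_time(n_seeks, n_blocks):
    """Total time ns * tS + nb * tT from the slide, in milliseconds."""
    return n_seeks * t_S + n_blocks * t_T

# Reading 1000 blocks sequentially: one seek, then contiguous transfers.
sequential = access_time(1, 1000)      # 1*4.0 + 1000*0.1 = 104.0 ms
# Reading the same 1000 blocks scattered randomly: one seek per block.
random_io = access_time(1000, 1000)    # 1000*4.0 + 1000*0.1 = 4100.0 ms
```

The forty-fold gap between the two cases is why the arrangement of blocks on the device matters so much.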

Query Processing 20130530 Slide 7 of 62

slide-9
SLIDE 9

Cost Measures Associated with Indices

  • If an attribute is indexed and a query involves that attribute, then it is often the case that an optimal evaluation algorithm will involve access via that attribute.

hi: For an index which is a B+-tree, the depth of the index; i.e., one less than the length of a path from the root to a leaf.

  • As noted previously, it is access to secondary storage which is the prime

consumer of time.

  • Therefore, it is appropriate to decompose hi = hipri + hisec as follows.

hipri: In a path from the root to a leaf, the (average) number of pointers whose destination is already in main memory.
hisec: In a path from the root to a leaf, the (average) number of pointers whose destination is not in main memory.

Disk access rule of thumb: In most cases, access times along hipri may be ignored, since they will be thousands of times less than for hisec.

Query Processing 20130530 Slide 8 of 62

slide-10
SLIDE 10

Cases for Selection

  • Selection is the most basic operation of the relational algebra which

involves the use of indices.

  • It is convenient to decompose access into a number of cases.

Simple cases: A is an attribute and e is a fixed expression which may be evaluated in constant time.
  • Equality on a single attribute: σA=e
  • Inequality on a single attribute: σA≠e
  • Simple range on a single attribute: σA≤e, σA≥e, σA<e, σA>e
Compound cases: each θi is a simple condition Ai ⊙ ei with ⊙ ∈ {=, ≠, ≤, <, ≥, >}.
  • Conjunction: σθ1∧θ2∧...∧θk
  • Disjunction: σθ1∨θ2∨...∨θk
  • Negation: σ¬θ (Not really needed as a separate case, since, for example, σ¬(A≤e) is the same as σA>e, but care must be taken to handle null values correctly.)

Query Processing 20130530 Slide 9 of 62

slide-11
SLIDE 11

Simple Selection without Index Support

  • The case labels used here match those of the textbook.

nblk: The number of blocks used for the file containing the relation.

A1: Linear search: The entire relation is searched for tuples which match the condition A ⊙ e, where ⊙ ∈ {=, ≠, ≤, <, ≥, >}.
Total cost (time): tS + nblk · tT.

  • tS only occurs once because the data are searched sequentially.

A1: Linear search; equality on a key: The entire relation is searched for tuples which match the condition A = e.
Worst-case total cost (time): tS + nblk · tT.
Best-case total cost (time): tS + tT.
Average-case total cost (time): tS + (nblk/2) · tT.

  • On the average, about half of the file must be searched before the

key is found.

Query Processing 20130530 Slide 10 of 62

slide-12
SLIDE 12

Simple Selection with Primary Index Support

A2: Primary B+-tree index on a key; equality search on the index attribute: This is the best of all possible cases.
All-cases total cost (time): (hisec + 1) · (tS + tT).

  • There is one seek plus disk access for each level of the index which is not in main memory, plus one more to reach the block containing the desired record.

A3: Primary B+-tree index on a non-key; equality search on the index attribute: This is almost as good, but more than one block may need to be retrieved.
All-cases total cost (time): (hisec + 1) · tS + (hisec + nrec) · tT.

  • The number of seeks is exactly as in the previous case.
  • The number of transfers is determined by the number nrec of blocks

which contain records matching the key.

  • The times will be even less if the block and/or more of the index is

cached.

Query Processing 20130530 Slide 11 of 62

slide-13
SLIDE 13

Simple Selection with Secondary Index Support

A4: Secondary B+-tree index on a key; equality search on the index attribute: The analysis is similar to the case for a primary index, since only one record is retrieved.
All-cases total cost (time): (hisec + 2) · (tS + tT).

  • There is one more seek+access than for a primary index — the leaves of the B+-tree will contain references, not actual records.

A4: Secondary B+-tree index on a non-key; equality search on the index attribute: Here the records to be found need not lie in the same block, or even in contiguous blocks.
Worst-case total cost (time): (hisec + 1 + nrec) · (tS + tT).

  • nrec is the number of records which are retrieved.
  • There is one seek and one access for each such record.
  • There is one additional seek+access per record because the leaves of

the B+-tree will contain references and not the actual records.

  • The average-case is likely not much better.
  • The times will be less if the blocks and/or more of the index is cached.

Query Processing 20130530 Slide 12 of 62

slide-14
SLIDE 14

Extension to Inequality and Simple Range Queries

A5: Primary B+-tree index; simple range search on the index attribute: Applies to both key and non-key attributes.
All-cases total cost (time): (hisec + 1) · tS + (hisec + nrec) · tT.

  • This time is identical to that of A3, but nrec will of course be larger in general.
  • The retrieved records will be stored contiguously.

Example: σA≥10. Here the retrieval starts at A = 10, and returns all blocks “to the right” in the sequential ordering.

A6: Secondary B+-tree index; simple range search on the index attribute: Applies to both key and non-key attributes.
Worst-case total cost (time): (hisec + 1 + nrec) · (tS + tT).

  • This time is identical to that of A4, but nrec will of course be larger

in general.

  • The retrieved records need not be stored contiguously.
  • The average-case is likely not much better.

Query Processing 20130530 Slide 13 of 62

slide-15
SLIDE 15

Extension to Conjunctive Selection Conditions

  • Considered here are queries of the form σθ1∧θ2∧...∧θk

A7: Conjunctive selection using one B+-tree index:

  • This approach requires that (at least) one of the conditions θi

involves an indexed attribute.

  • First, evaluate σθi using one of the approaches A2-A6.
  • Then, resolve the remaining conditions directly on the result of the

above evaluation.

  • If there are alternatives for θi, two heuristics may apply.
  • Choose the one which will return the fewest records for σθi.
  • Choose the one which is fastest for answering σθi.
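A minimal in-memory sketch of the A7 strategy (Python; the relation, the predicates, and the estimate function are hypothetical stand-ins for index lookups and catalog statistics):

```python
def conjunctive_select(conditions, index_lookup, estimated_matches):
    """A7 sketch: evaluate the most selective indexed condition via its
    index, then filter the remaining conditions on that smaller result.

    conditions: list of predicates (tuple -> bool)
    index_lookup: maps a condition to an iterable of matching tuples,
                  or None if no index supports that condition
    estimated_matches: maps a condition to an estimated result size
    """
    indexed = [c for c in conditions if index_lookup(c) is not None]
    best = min(indexed, key=estimated_matches)   # fewest estimated records
    rest = [c for c in conditions if c is not best]
    return [t for t in index_lookup(best) if all(c(t) for c in rest)]
```

The `min(..., key=estimated_matches)` line is the first heuristic from the slide; swapping the key for a cost estimate would give the second.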

A8: Conjunctive selection using a composite B+-tree index:

  • Here there is a composite index on two of the attributes, one for θi

and a second for θj.

  • The records which satisfy both conditions may be retrieved at once.
  • The details are similar to those of A7 and will not be developed in

detail here.

Query Processing 20130530 Slide 14 of 62

slide-16
SLIDE 16

Extension to Conjunctive Selection Conditions – 2

  • Considered here are queries of the form σθ1∧θ2∧...∧θk

A9: Conjunctive selection by intersection of pointers or identifiers:

  • Let J ⊆ {1, 2, . . . , k} with the property that for each j ∈ J, it is

possible to retrieve a set of pointers or identifiers which provides the solution to σθj.

  • Then the intersection of these sets provides pointers or identifiers to a solution of σ∧j∈J θj.

  • The resolution of the remaining subqueries of the form σθi with i ∉ J is achieved by examining the retrieved records directly, as in the case of A7.

  • The computation of intersection will be considered shortly.

Query Processing 20130530 Slide 15 of 62

slide-17
SLIDE 17

Extension to Disjunctive Selection Conditions

  • Considered here are queries of the form σθ1∨θ2∨...∨θk
  • As is the case with most other problems involving disjunction (e.g.,

satisfiability of logical expressions), there are in general no efficient algorithms.

  • One approach which may provide some improvement is the disjunctive version of A9.

A10: Disjunctive selection by union of pointers or identifiers:

  • The idea here is to retrieve a set of pointers or identifiers, one for each query σθi.
  • This approach only works if such a retrieval is possible for all of the conditions.
  • In the notation of A9, J must equal {1, 2, . . . , k}.
  • Then the union of these sets provides pointers or identifiers to the

desired result.

  • The computation of union will be considered shortly.

Query Processing 20130530 Slide 16 of 62

slide-18
SLIDE 18

Two-Way Sort-Merge

  • It is often the case that the set of records to be sorted is too large to fit

into main memory, so a special algorithm is needed. Sort-merge: The idea behind the sort-merge algorithm is shown below.

  • First the unsorted list of records is decomposed into blocks which are

small enough to fit in main memory (one at a time).

  • The blocks are next brought into memory and sorted, one at a time.
  • The blocks are finally merged repeatedly until a single list is obtained.

Unsorted records:        35 68 12 19 88 44 98 06 53 13 15 02
Decompose into blocks:   [35 68 12] [19 88 44] [98 06 53] [13 15 02]
Sort each block:         [12 35 68] [19 44 88] [06 53 98] [02 13 15]
Merge adjacent blocks:   [12 19 35 44 68 88] [02 06 13 15 53 98]
Merge adjacent blocks:   [02 06 12 13 15 19 35 44 53 68 88 98]
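The sort-merge idea can be sketched in a few lines of Python, with small in-memory lists standing in for disk blocks (a sketch, not a real external sort, since everything here stays in memory):

```python
def merge_two(a, b):
    """Two-way merge: repeatedly take the lesser head of the two runs."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]          # one run is exhausted; append the rest

def external_sort(records, block_size):
    """Sort-merge sketch: split into blocks that 'fit in memory', sort each,
    then repeatedly merge adjacent runs until a single sorted run remains."""
    runs = [sorted(records[i:i + block_size])
            for i in range(0, len(records), block_size)]
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            pair = runs[i:i + 2]
            merged.append(merge_two(*pair) if len(pair) == 2 else pair[0])
        runs = merged
    return runs[0] if runs else []
```

Running it on the slide's twelve records with block size 3 reproduces the two merge passes shown above.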

Query Processing 20130530 Slide 17 of 62

slide-19
SLIDE 19

Two-Way Sort-Merge with Limited Primary Memory

  • The algorithm just described involves the recursive merger of two

potentially large lists, too large to fit into primary memory.

  • The algorithm for execution in limited memory is as follows.
  • Only those records shown in coral need be in main memory.
  • In general, for efficiency, much larger blocks, as opposed to just one

record, will be brought into memory at once.

  • A pointer is kept to the first unused element in each list, and is advanced

as the elements are merged into the new list.

  • The complete processing for this tiny example is shown on the next slide.

[Figure: two sorted runs, 12 35 68 and 19 44 88, to be merged]

Query Processing 20130530 Slide 18 of 62

slide-20
SLIDE 20

Two-Way Sort-Merge with Limited Primary Memory – Example

  • Steps are left to right, then top to bottom.

[Figure: the runs 12 35 68 and 19 44 88 are merged step by step; at each step the lesser of the two pointed-to elements (12, 19, 35, 44, 68, 88 in turn) is appended to the output list and its pointer advanced]

Query Processing 20130530 Slide 19 of 62

slide-21
SLIDE 21

N-Way Sort-Merge

Two-way merging: The examples just shown merge two lists at a time; hence the name.

N-way merging: It is certainly possible to merge more than two lists at a time.

  • When N lists are merged at a time, it is called N-way merging.
  • A simple example of four-way merging is shown below.
  • Regardless of the number of ways, the number of blocks which may be

kept in main memory at one time is fixed.

  • With N-way merging, the number of blocks need not be a power of two.

[Figure: four sorted runs, [12 35 68] [19 44 88] [06 53 98] [02 13 15], merged in a single four-way pass into 02 06 12 13 15 19 35 44 53 68 88 98]
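An N-way merge is commonly implemented with a min-heap holding the current head of each run, so each step extracts the global minimum in O(log N). A sketch using Python's heapq (the run data is the slide's example):

```python
import heapq

def n_way_merge(runs):
    """Merge N sorted runs at once. The heap holds (value, run index,
    position) triples; popping yields the smallest outstanding value, and
    the next element of that run (if any) is pushed in its place."""
    heap = [(run[0], idx, 0) for idx, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        value, idx, pos = heapq.heappop(heap)
        out.append(value)
        if pos + 1 < len(runs[idx]):
            heapq.heappush(heap, (runs[idx][pos + 1], idx, pos + 1))
    return out
```

With the four runs above, a single call produces the fully sorted output, where two-way merging would need two passes.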

Query Processing 20130530 Slide 20 of 62

slide-22
SLIDE 22

Complexity of the Algorithm for External Sorting

nblk: Total number of blocks of records.
nbps: Number of blocks transferred between primary and secondary memory in one operation (requires only one seek).
M: Let M denote the number of blocks which will fit into the assigned buffer area of main memory (nbps < M).

  • A full formula for the complexity, based upon these parameters, may be found in the corrections to the textbook. (The formula in the book itself contains errors.)

Order of Complexity (number of seeks): After stripping away constants and smaller terms, the order of the number of seeks looks like this:

(nblk / nbps) · log⌊M/nbps⌋(nblk / M)

  • Observations: The smaller the ratios nblk/nbps and nblk/M, and the larger the merge fan-in M/nbps, the better.
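The dominant term can be evaluated numerically; a sketch (the parameter values are made up for illustration, and the ceilings stripped away by the slide are dropped here as well):

```python
import math

def seek_order(n_blk, n_bps, M):
    """Dominant term of the seek count for external sort-merge:
    (nblk / nbps) * log base floor(M / nbps) of (nblk / M).
    Constants and smaller terms are dropped, as in the slide."""
    fan_in = M // n_bps                    # runs that can be merged per pass
    passes = math.log(n_blk / M, fan_in)   # number of merge passes
    return (n_blk / n_bps) * passes

# e.g. 10,000 blocks, 10 blocks per transfer, 100 buffer blocks:
# fan-in 10, log_10(10000/100) = 2 passes, 1000 transfers per pass.
```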

Query Processing 20130530 Slide 21 of 62

slide-23
SLIDE 23

The Nested-Loop Algorithm for Join

  • Algorithms for computing the join are particularly important since the number of records to process is the product of the numbers of records in the two relations.

Nested loop: The brute-force nested-loop approach simply compares each tuple in the first relation r1 to each tuple in the second, r2.

  • In the worst case, this can result in n1 · n2 disk seeks, which is prohibitive for all but the smallest relations.

[Figure: relations r1 (n1 tuples) and r2 (n2 tuples) with common attribute A; n1 · n2 possible join tuples]
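A literal in-memory rendering of the nested loop (relations modeled as lists of dicts, a hypothetical representation; on disk, each tuple pair could cost a seek):

```python
def nested_loop_join(r1, r2, attr1, attr2):
    """Brute-force nested loop: compare every tuple of r1 with every tuple
    of r2 (n1 * n2 comparisons) and merge the matching pairs."""
    return [{**t1, **t2}
            for t1 in r1
            for t2 in r2
            if t1[attr1] == t2[attr2]]
```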

Query Processing 20130530 Slide 22 of 62

slide-24
SLIDE 24

The Block-Oriented Nested-Loop Algorithm for Join

  • The basic nested-loop algorithm can be improved by retrieving large

blocks of records from each relation at once.

  • While the number of comparisons is not changed, the number of disk

seeks is reduced greatly.

  • This will result in (n1/nrpb1) · (n2/nrpb2) disk seeks, which may or may

not be prohibitive, depending upon the number of blocks.

  • These algorithms are very basic, and for many cases better ones are

available.

[Figure: r1 (n1 tuples, nrpb1 records per block) and r2 (n2 tuples, nrpb2 records per block); (n1/nrpb1) · (n2/nrpb2) pairs of blocks to process]
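The block-oriented variant, sketched with Python list slices standing in for block fetches (a hypothetical in-memory model; the comparison count is unchanged, only the fetch count shrinks):

```python
def block_nested_loop_join(r1, r2, attr, block1, block2):
    """Block nested loop: fetch block1 tuples of r1 and block2 tuples of r2
    per 'disk access', so only (n1/block1) * (n2/block2) block pairs are
    fetched instead of n1 * n2 tuple pairs."""
    out = []
    for i in range(0, len(r1), block1):        # one fetch per r1 block
        for j in range(0, len(r2), block2):    # one fetch per r2 block
            for t1 in r1[i:i + block1]:        # all comparisons happen
                for t2 in r2[j:j + block2]:    # within the fetched pair
                    if t1[attr] == t2[attr]:
                        out.append((t1, t2))
    return out
```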

Query Processing 20130530 Slide 23 of 62

slide-25
SLIDE 25

Join with an Index on One of the Relations

  • Suppose that an index exists on the join attribute(s) of one of the

relations (r2 here).

  • The other relation (r1) has no index and the tuples need not be ordered on the join attribute(s).
  • The tuples of r1 may be processed sequentially, looking in the index to see if a matching tuple exists in r2.

[Figure: r1 (n1 tuples) scanned sequentially via pointer p1; an index on A is probed for matches in r2 (n2 tuples)]

Query Processing 20130530 Slide 24 of 62

slide-26
SLIDE 26

Join with an Index on One of the Relations — Complexity

  • The complexity in this case is formally Θ(n1), assuming that an index

lookup in r2 takes constant time.

  • If index lookup in r2 takes Θ(log(n2)) time, then the complexity becomes

Θ(n1 · log(n2)).

  • However, the constants are likely to be quite large, since disk seeks will

be involved.

  • The performance may of course be improved by retrieving large blocks of tuples from r1 at a time, and batching identical values for A with a single index search in r2.

[Figure: as before, r1 scanned sequentially via pointer p1, with an index on A used to probe r2]

Query Processing 20130530 Slide 25 of 62

slide-27
SLIDE 27

Merge Join

  • Merge join is a very effective approach, but applies only when both

relations have primary indices on the join attribute(s).

  • The algorithm is similar to that of external merging, although the lists

are not actually merged.

  • Pointers are maintained to the sorted lists of tuples for each relation.
  • The one pointing to the lesser value is advanced.
  • Equal attribute values for the two pointers identifies a joinable pair.
  • The complexity is Θ(n1 + n2), but the constant is likely to be much smaller than for the indexed join described previously.

[Figure: sorted relations r1 (A values a c c e g) and r2 (A values b c d e e f h), scanned in parallel via pointers p1 and p2]
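A sketch of the merge-join pointer discipline (inputs assumed to be pre-sorted lists of dicts; the Θ(n1 + n2) bound holds when join-attribute values repeat in at most one of the inputs):

```python
def merge_join(r1, r2, attr):
    """Merge join: both inputs sorted on attr. Advance whichever pointer
    holds the lesser value; on equality, pair the current r1 tuple with the
    whole group of equal-valued r2 tuples, then advance r1."""
    out, i, j = [], 0, 0
    while i < len(r1) and j < len(r2):
        v1, v2 = r1[i][attr], r2[j][attr]
        if v1 < v2:
            i += 1
        elif v1 > v2:
            j += 1
        else:
            k = j                      # scan the group of equal r2 values,
            while k < len(r2) and r2[k][attr] == v1:
                out.append((r1[i], r2[k]))
                k += 1
            i += 1                     # the next r1 tuple may match it too
    return out
```

With the values shown in the figure (a c c e g against b c d e e f h), the join produces four pairs: two for c and two for e.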

Query Processing 20130530 Slide 26 of 62

slide-28
SLIDE 28

Hybrid Merge Join

  • Here the records of only one relation are sorted, but the other has a

secondary (B+-tree) index.

  • The leaves of that B+-tree are pairs of the form (v, p), with v the value of attribute A and p a pointer to the actual record.
  • They are sorted on v only.
  • This gives effectively a sorted list of the records of r2, but it is only a list of pointers, not of physical records.

[Figure: r1 with records ordered on attribute A, scanned via pointer p1; r2 unordered, reached through a secondary B+-tree index on attribute A]

Query Processing 20130530 Slide 27 of 62

slide-29
SLIDE 29

Hybrid Merge Join — 2

Recall: The leaves of that B+-tree are pairs of the form (v, p), with v the value of attribute A and p a pointer to the actual record.

  • The records of r1 may be matched with the leaves of the B+-tree of r2

via a merge-like procedure, with sorting on the common attribute values.

  • This merged list may furthermore be optimized by doing a secondary sort on the pointer values p for each group of entries from r2 with the same value for the key v.

[Figure: as before, r1 ordered on attribute A and matched against the leaves of the secondary B+-tree index on r2]

Query Processing 20130530 Slide 28 of 62

slide-30
SLIDE 30

Double Hybrid Merge Join

  • This same approach may be applied in the case that both relations have

secondary B+-tree indices on the attributes to be joined.

  • Here the matching is effectively only on pointers to the actual records.

[Figure: both r1 and r2 unordered, each reached through its own secondary B+-tree index on attribute A]

Query Processing 20130530 Slide 29 of 62

slide-31
SLIDE 31

Hash Join

  • In (GRACE) hash join, for each relation, the values for the join

attribute(s) are hashed into buckets using the same hash function.

  • The buckets should be small enough so that each corresponding pair of

buckets fits into main memory, but otherwise, the bigger the better.

  • Matches for the join will always be found within the corresponding

buckets for each relation.

  • Usually, an index is built for each bucket to facilitate searching.
  • If the buckets are too large for main memory, recursive partitioning is employed.

[Figure: r1 and r2 each partitioned into matching sets of hash buckets on the join attribute A]
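A GRACE-style sketch in Python (hypothetical in-memory model: dict-of-lists buckets stand in for partitions written to disk, and each bucket pair is assumed to fit in memory, so no recursive partitioning is shown):

```python
from collections import defaultdict

def hash_join(r1, r2, attr, n_buckets=4):
    """GRACE hash join sketch: partition both relations with the same hash
    function, then join each pair of corresponding buckets independently,
    building a small in-memory index on the r1 side of each pair."""
    b1, b2 = defaultdict(list), defaultdict(list)
    for t in r1:                                  # partitioning phase
        b1[hash(t[attr]) % n_buckets].append(t)
    for t in r2:
        b2[hash(t[attr]) % n_buckets].append(t)
    out = []
    for k in b1:                                  # per-bucket-pair join phase
        index = defaultdict(list)                 # index on this r1 bucket
        for t in b1[k]:
            index[t[attr]].append(t)
        for t2 in b2[k]:
            for t1 in index[t2[attr]]:
                out.append((t1, t2))
    return out
```

Matches can only occur inside corresponding bucket pairs, since both sides use the same hash function on the join attribute.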

Query Processing 20130530 Slide 30 of 62

slide-32
SLIDE 32

Analysis of Hash Join

  • Hash join is widely used when there are no indices available on the join

attributes.

Complexity: A rough estimate of the number of disk seeks is not difficult to obtain.

nblki: The number of disk blocks for relation ri.
nbuf: The number of disk blocks which fit into the in-memory buffer.
nhash: The number of hash buckets per relation.

Result: Number of disk seeks: 2 · (⌈nblk1/nbuf⌉ + ⌈nblk2/nbuf⌉) + 2 · nhash
  • The first term arises because each block of each hash bucket must be

written once it is computed, and re-read when it is processed for computing join matches with its partner bucket.

  • The second term arises when computing the content of the buckets. One

sequential scan of each relation is required for each bucket.

  • Recursive partitioning adds a factor which is log in the number of blocks.

Query Processing 20130530 Slide 31 of 62

slide-33
SLIDE 33

Algorithms for Projection

  • There is little of a special nature which can be done for projection.
  • Each record must be processed in turn, discarding the undesired

attributes.

  • In the case that the retained attributes do not form a key, it may be

necessary to remove duplicates (if the query requires it).

Query Processing 20130530 Slide 32 of 62

slide-34
SLIDE 34

Algorithms for Removal of Duplicates

  • There are two basic ways to remove duplicates from a list.

Sort and scan: The list is first sorted on a key, and then processed sequentially, with all but the first occurrence of the key removed.

  • The (N-way) merge sort algorithm is used, and governs the

complexity. On-the-fly index creation: The list is processed sequentially, building an index of all keys which have occurred.

  • If the key which is found in a given step is already present in the

index, the containing tuple is discarded.

  • If the index would be too large for main memory, hashing (as in hash

join) can be used, with the duplicate-removal process applied to each bucket.

  • Complexity analysis is similar to that for hash join.
  • In general, the removal of duplicates is an expensive operation, and so

must be requested explicitly.
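Both duplicate-removal strategies can be sketched in a few lines (in-memory Python models; a set stands in for the on-the-fly index, and sorted() for the external merge sort):

```python
def dedup_sort_scan(records):
    """Sort and scan: after sorting, duplicates are adjacent, so one
    sequential pass keeps only the first occurrence of each value."""
    out = []
    for r in sorted(records):
        if not out or r != out[-1]:
            out.append(r)
    return out

def dedup_index(records):
    """On-the-fly index creation: remember every key seen so far; a record
    whose key is already present in the index is discarded."""
    seen, out = set(), []
    for r in records:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out
```

Note the observable difference: sort-and-scan returns the survivors in sorted order, while the index-based pass preserves the original arrival order.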

Query Processing 20130530 Slide 33 of 62

slide-35
SLIDE 35

Set Operations

  • The binary set operations include union (∪), intersection (∩), and

difference (\).

  • For each of these, records with identical key values must be found in each of the two sets.
  • To find these matching values, the main options are sorting and hashing.
  • Each of these approaches will be described in turn.

Query Processing 20130530 Slide 34 of 62

slide-36
SLIDE 36

Set Operations via Sorting

Set intersection using sorting: The most straightforward approach is to first sort the lists, and then use an approach similar to that for merge join to identify the matching elements.

Set union using sorting: The most straightforward approach is to combine the two sets of records and then sort the result, discarding duplicates.

  • If the two input sets are already sorted, then a procedure similar to

merge join may be used, but this time an element is kept if it occurs in either list.

  • The merging operation is used only to eliminate duplicates in this case.

Set difference using sorting: The approach is similar to that for intersection, except that a tuple from the first list is discarded rather than kept if a matching tuple is found in the second list.

  • Unlike union and intersection, this operation is not symmetric in the

two lists.

  • In all cases, the computation of complexity is straightforward, based upon

previous analyses.

Query Processing 20130530 Slide 35 of 62

slide-37
SLIDE 37

Set Operations via Hashing

  • Suppose that a set operation on relations r1 and r2 is to be performed.
  • The first step in each case is to construct matching sets of hash buckets

{ri1, ri2, . . . , rik} for i ∈ {1, 2}, as in the hash join.

  • Next, for each j, 1 ≤ j ≤ k, do the following:
  • Bring r1j and r2j into main memory.
  • Build an index of r1j.
  • Complete the step listed below.

Set intersection using hashing: To compute r1j ∩ r2j, for each tuple in r2j, probe the index of r1j and delete the tuple from r1j if no match is found.
Set difference using hashing: To compute r1j \ r2j, for each tuple in r2j, probe the index of r1j and delete the tuple from r1j if a match is found.
Set union using hashing: To compute r1j ∪ r2j, for each tuple in r2j, probe the index of r1j and add that tuple to r1j if no match is found.

  • The results for each j are then combined.
  • In all cases, the computation of complexity is straightforward, based upon

previous analyses.

Query Processing 20130530 Slide 36 of 62

slide-38
SLIDE 38

Aggregation

Example: Average salary by department:

SELECT dept_name, avg(salary) FROM instructor GROUP BY dept_name;

  • The difficult part, with respect to computational complexity, is to

partition the tuples into the groups.

  • The approach for partitioning is similar to that for join, and involves sorting and/or hashing.

Two main approaches:

First group, then aggregate: Compute the groups (group by dept_name in the example), and then perform the aggregation (averaging in the example).
Aggregate on the fly, as groups are computed: As elements are added to each group, update the aggregation.

  • This works directly for operations such as count, sum, min, and max.
  • For avg, count and sum are aggregated on the fly, and avg is

computed from them at the end.
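The on-the-fly treatment of avg, sketched in Python on hypothetical (dept, salary) rows (the table and values are illustrative, not from the slides): only a running count and sum are kept per group, and avg is derived at the end.

```python
def avg_salary_by_dept(rows):
    """On-the-fly aggregation for the slide's GROUP BY query: maintain a
    running [count, sum] per department; compute avg only at the end."""
    totals = {}                          # dept_name -> [count, sum]
    for dept, salary in rows:
        cnt_sum = totals.setdefault(dept, [0, 0.0])
        cnt_sum[0] += 1
        cnt_sum[1] += salary
    return {dept: s / c for dept, (c, s) in totals.items()}
```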

Query Processing 20130530 Slide 37 of 62

slide-39
SLIDE 39

Tree Representation of Expressions

  • An expression in the relational algebra may be represented by an operator

tree in which each vertex is an operator and the children of a vertex are the arguments of its operator.

  • For example, the expression

πname(σbuilding=’MIT-huset’(department) ⋈ instructor)

has the following tree representation.

[Tree: πname at the root, over ⋈, whose children are σbuilding=’MIT-huset’(department) and instructor]

  • Evaluation is usually bottom-up, and has several flavors.

Query Processing 20130530 Slide 38 of 62

slide-40
SLIDE 40

Materialized Evaluation of Expressions

  • In materialized evaluation, entire

subexpressions are evaluated before the parent operator is applied.

  • Thus, in the tree to the right, the entire

selection is evaluated, and the entire instructor relation is fetched, before applying them to the join operation.

[Tree: πname at the root, over ⋈, whose children are σbuilding=’MIT-huset’(department) and instructor]

  • Similarly, the entire join is computed before applying the projection operator.
  • This approach has at least two drawbacks:

No vertical concurrency: All subexpressions must be evaluated completely before an operator may be applied.
Intermediate results to disk: The subexpressions may be large enough to require writing to disk and then re-reading by the parent operator.

  • It is therefore important to look for more efficient approaches.

Query Processing 20130530 Slide 39 of 62

slide-41
SLIDE 41

Pipelined Evaluation of Expressions

  • In pipelined evaluation, partial results are

passed up the evaluation tree, thus permitting vertical concurrency.

Simple Example: Tuples from the join operation may be fed to the projection operation as the join is computed, where they may be projected immediately.

[Tree: πname at the root, over ⋈, whose children are σbuilding=’MIT-huset’(department) and instructor]

More Complex Example: Tuples from the select operation, as well as from the instructor relation, may be fed to the join operation before the select is completed.

  • The algorithms for computing the join incrementally are more

complex.

  • There are two principal models of execution.

Demand-driven pipelining: Lazy evaluation; supply tuples to the parent as they are requested. Producer-driven pipelining: Eager evaluation; create tuples even without requests from the parent and save them until a buffer is filled.
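Demand-driven pipelining maps naturally onto Python generators, where the parent pulls tuples lazily from its child; a sketch for a scan leaf, σ, and π (the incremental join operator is omitted, and the rows and attribute names are hypothetical):

```python
def scan(relation):
    """Leaf operator: yield the stored tuples one at a time."""
    yield from relation

def select(predicate, child):
    """Pipelined selection: pull a tuple from the child only when the
    parent demands one; pass it on if it satisfies the predicate."""
    for t in child:
        if predicate(t):
            yield t

def project(attrs, child):
    """Pipelined projection: each tuple is projected as it arrives."""
    for t in child:
        yield {a: t[a] for a in attrs}

# Nothing executes until the consumer starts pulling tuples from `plan`:
# plan = project(['name'],
#                select(lambda t: t['building'] == 'MIT-huset', scan(rows)))
```

Each `yield` is one tuple handed up the tree on demand, which is exactly the lazy-evaluation model described above.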

Query Processing 20130530 Slide 40 of 62

slide-42
SLIDE 42

Demand-Driven vs. Producer-Driven Pipelining

Demand-driven pipelining is most widely used on systems with limited facility for parallelism (i.e., only one vertex of the evaluation tree can be “operating” at a time).

[Tree: πname at the root, over ⋈, whose children are σbuilding=’MIT-huset’(department) and instructor]

Producer-driven pipelining is more appropriate in situations in which a significant degree of true (i.e., hardware-level) parallelism is possible. Reasons:

  • The implementation of producer-driven pipelining is substantially more

complex, since it requires buffers between the stages and protocols for ensuring that the buffers never overfill.

  • Having such buffers is not likely to provide a great speedup if there is only one process and the communication is not disk-bound (which it is not).
  • Producer-driven pipelining, on the other hand, is amenable to greater

parallelism, since a child can fill its output buffer and need not wait for the parent to require input before producing it.

Query Processing 20130530 Slide 41 of 62

slide-43
SLIDE 43

Algorithms for Pipelining

  • Algorithms for pipelining assume that the inputs come one at a time, or in small blocks.

Algorithms for σ and π: These are straightforward since there is only one input stream.

  • The inputs are selected or rejected (for σ) or projected (for π) one at a time.

Algorithms for ⋈: These are more complex, since there are two input streams, and any tuple from the first stream may be a potential match for any tuple in the second stream.

Rough idea: Tuples from the two input streams are cached until a “critical amount” is reached. Then the join on those tuples is computed.

  • When a second batch of inputs arrives, it must be integrated into the first.
  • Tuples which arrived previously must be tagged, so that they are not paired again.
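One concrete realization of this rough idea is a symmetric hash join, sketched below under invented toy data: each arriving tuple is probed against the tuples cached so far on the other side, then cached itself, so no pair is ever emitted twice (the tagging idea above comes for free from the arrival order).

```python
from collections import defaultdict

def symmetric_hash_join(left, right, key):
    """Sketch of a pipelined join on two input streams.  An arriving
    tuple joins only with previously-arrived tuples of the *other*
    stream, so no pair is emitted twice."""
    seen_l, seen_r = defaultdict(list), defaultdict(list)
    # Interleaving simulates tuples arriving over time from two streams.
    stream = [("L", t) for t in left] + [("R", t) for t in right]
    for side, tup in stream:
        k = tup[key]
        if side == "L":
            for other in seen_r[k]:
                yield {**tup, **other}
            seen_l[k].append(tup)           # cache for future R-tuples
        else:
            for other in seen_l[k]:
                yield {**other, **tup}
            seen_r[k].append(tup)           # cache for future L-tuples

dept  = [{"dept_name": "CS", "building": "MIT-huset"}]
instr = [{"name": "Ana", "dept_name": "CS"}]
print(list(symmetric_hash_join(dept, instr, "dept_name")))
```

Because results are yielded as soon as a match is found, the parent operator can begin consuming join tuples before either input is exhausted.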

Query Processing 20130530 Slide 42 of 62

slide-44
SLIDE 44

Optimization

  • The user of a DBMS is expected to write semantically correct queries,

but not to “optimize” the SQL so that it executes efficiently.

  • Rather, the task of optimization is an internal one.
  • The process of optimization involves a number of steps.

Transformation: Equivalence rules on relational expressions are used to identify appropriate ones for consideration.

Estimation of size: With the help of statistical information about the relations (found in the system catalog), estimates of how large the intermediate results will be are made.

Choice of a plan of evaluation: Using the size information as well as knowledge of which indices are available, an optimal plan of execution, or at least a very good one, may be chosen.

Query Processing 20130530 Slide 43 of 62

slide-45
SLIDE 45

Equivalence of Expressions in the Relational Algebra

  • Consider the following simple query on the university database schema.

SELECT name
FROM department NATURAL JOIN instructor
WHERE building = ’MIT-huset’;

  • The Lexer+Parser+View Resolver might produce the representation

shown to the left below.

Left:  πname(σbuilding=’MIT-huset’(department ⋈ instructor))
Right: πname(σbuilding=’MIT-huset’(department) ⋈ instructor)

  • However, the one on the right is equivalent, and better from an execution point of view in that it performs the selection before the join.

Question: How does the system know this and make the transformation?
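The equivalence itself can be checked on toy data: pushing the selection below the join does not change the answer when the predicate mentions only department attributes. This is an illustrative sketch with an invented instance, not the optimizer's actual mechanism.

```python
def nat_join(r, s):
    """Natural join of two lists of dict-tuples (nested loops)."""
    out = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in set(t) & set(u)):
                out.append({**t, **u})
    return out

department = [{"dept_name": "CS",   "building": "MIT-huset"},
              {"dept_name": "Math", "building": "Nat-huset"}]
instructor = [{"name": "Ana", "dept_name": "CS"},
              {"name": "Bo",  "dept_name": "Math"}]

sel = lambda t: t["building"] == "MIT-huset"

# Left tree: pi_name(sigma(department |x| instructor))
left = sorted(t["name"] for t in nat_join(department, instructor) if sel(t))
# Right tree: pi_name(sigma(department) |x| instructor)
right = sorted(t["name"] for t in
               nat_join([t for t in department if sel(t)], instructor))
assert left == right == ["Ana"]
```

The right-hand plan joins a one-tuple relation instead of all of department, which is the efficiency gain the slide describes.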

Query Processing 20130530 Slide 44 of 62

slide-46
SLIDE 46

Equivalence Rules Example 1

Equivalence rules: The system is able to make the kind of transformation shown on the previous slide (and below) by applying equivalence rules.

  • The specific equivalence rule applied here is identified as 7a in the textbook, and reads as follows:

σθ0(E1 ⋈θ E2) = (σθ0(E1)) ⋈θ E2

  • The equivalence holds provided that the attributes involved in θ0 form a subset of those involved in E1; i.e., Attrset(θ0) ⊆ Attrset(E1).

Left:  πname(σbuilding=’MIT-huset’(department ⋈ instructor))
Right: πname(σbuilding=’MIT-huset’(department) ⋈ instructor)

Query Processing 20130530 Slide 45 of 62

slide-47
SLIDE 47

Equivalence Rules Example 1 — 2

  • If the instructor relation is large, the operator tree to the right might

provide the basis for a more efficient solution than would the one to the left.

  • Although the equivalence is “obvious” to a human, it actually requires

the application of several of the rules from the textbook to obtain the equivalence.

  • The specific equivalence rules applied here are 3, 5, and 8b from the

textbook.

  • The way in which these rules are applied is discussed on the following

slides.

Left:  πname(σbuilding=’MIT-huset’(department) ⋈ instructor)
Right: πname(σbuilding=’MIT-huset’(department) ⋈ πname,dept name(instructor))

Query Processing 20130530 Slide 46 of 62

slide-48
SLIDE 48

Equivalence Rules Example 1 — 3

  • Rule 3 provides the obvious condition on nested projections, with Li ⊆ Lj for i ≤ j:

πL1(πL2(. . . (πLn(E)) . . .)) = πL1(E)

  • Rule 5 just states that the join is commutative:

E1 ⋈θ E2 = E2 ⋈θ E1

  • Applying these rules to the expression on the left yields the expression on

the right.

Left:  πname(σbuilding=’MIT-huset’(department) ⋈ instructor)
Right: πname(πname,dept name(instructor ⋈ σbuilding=’MIT-huset’(department)))

Query Processing 20130530 Slide 47 of 62

slide-49
SLIDE 49

Equivalence Rules Example 1 — 4

  • Rule 8b is complex and has the following form:

πL1∪L2(E1 ⋈θ E2) = πL1∪L2((πL1∪L3(E1)) ⋈θ (πL2∪L4(E2)))

with
L1 ⊆ Attrset(E1)    L3 = Attrset(E1) ∩ (Attrset(θ) \ (L1 ∪ L2))
L2 ⊆ Attrset(E2)    L4 = Attrset(E2) ∩ (Attrset(θ) \ (L1 ∪ L2)).

  • Applying this rule to the expression on the left with

E1 = instructor                            L1 = {name, dept name}      L3 = ∅
E2 = σbuilding=’MIT-huset’(department)     L2 = Attrset(department)    L4 = ∅

and Attrset(θ) = {dept name} yields the expression on the right.

Left:  πname(πname,dept name(instructor ⋈ σbuilding=’MIT-huset’(department)))
Right: πname(πname,dept name(πname,dept name(instructor) ⋈ σbuilding=’MIT-huset’(department)))
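A toy check of rule 8b's effect on a concrete instance: projecting instructor down to {name, dept name} before the join does not change the final answer, since the discarded attribute (here salary, an invented attribute for illustration) is needed neither by the join condition nor by the outer projection. A sketch, not the optimizer's derivation.

```python
def nat_join(r, s):
    """Natural join of two lists of dict-tuples (nested loops)."""
    out = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in set(t) & set(u)):
                out.append({**t, **u})
    return out

instructor = [{"name": "Ana", "dept_name": "CS",   "salary": 80000},
              {"name": "Bo",  "dept_name": "Math", "salary": 60000}]
# sigma_building='MIT-huset'(department), with building already used up:
sel_dept = [{"dept_name": "CS"}]

proj = lambda rows, cols: [{a: t[a] for a in cols} for t in rows]

# Without the pushed-down projection:
lhs = sorted(t["name"] for t in nat_join(instructor, sel_dept))
# With instructor pre-projected onto {name, dept_name} (rule 8b):
rhs = sorted(t["name"] for t in
             nat_join(proj(instructor, ("name", "dept_name")), sel_dept))
assert lhs == rhs == ["Ana"]
```

The benefit in practice is narrower intermediate tuples flowing through the join, at no cost to correctness.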

Query Processing 20130530 Slide 48 of 62

slide-50
SLIDE 50

Equivalence Rules Example 1 — 5

  • Applying rules 3 and 5 once again yields the desired result.

Moral: Even “obvious” equivalences can require the application of fairly complex rules in a formal derivation.

Left:  πname(πname,dept name(πname,dept name(instructor) ⋈ σbuilding=’MIT-huset’(department)))
Right: πname(σbuilding=’MIT-huset’(department) ⋈ πname,dept name(instructor))

Query Processing 20130530 Slide 49 of 62

slide-51
SLIDE 51

The Relevance of Form for Input Queries

  • Shown at the bottom of the page are two equivalent queries in SQL,

together with tree representations of corresponding operator expressions.

  • The tree on the left is “better” for the purposes of efficient evaluation.
  • However, this must not be interpreted as a need for the user to prefer

and to supply the SQL on the left.

  • It is the task of the query optimizer, not the user, to answer the query using the appropriately optimized operator tree.

Question: Can this transformation be made using the equivalences in the textbook?

Left:
SELECT name
FROM instructor
WHERE (dept_name = ’Comp.Sci.’)
  AND (salary > 70000);

Right:
SELECT name
FROM instructor
WHERE (dept_name = ’Comp.Sci.’)
INTERSECT
SELECT name
FROM instructor
WHERE (salary > 70000);

Left:  πname(σ(dept name=’Comp. Sci.’)∧(salary>70000)(instructor))
Right: πname(σdept name=’Comp. Sci.’(instructor)) ∩ πname(σsalary>70000(instructor))

Query Processing 20130530 Slide 50 of 62

slide-52
SLIDE 52

How to Find out What PostgreSQL Really Does

  • To find out what PostgreSQL really does for a given query, use the

EXPLAIN directive.

university=# EXPLAIN SELECT name
             FROM instructor
             WHERE (dept_name = ’Comp.Sci.’) AND (salary > 70000);

                           QUERY PLAN
---------------------------------------------------------------------------------
 Seq Scan on instructor  (cost=0.00..16.60 rows=1 width=58)
   Filter: ((salary > 70000::numeric) AND ((dept_name)::text = ’Comp.Sci.’::text))
(2 rows)

Question: Is the representation using INTERSECT handled the same way?

Query Processing 20130530 Slide 51 of 62

slide-53
SLIDE 53

How to Find out What PostgreSQL Really Does — 2

university=# EXPLAIN SELECT name FROM instructor WHERE (dept_name = ’Comp.Sci.’)
             INTERSECT
             SELECT name FROM instructor WHERE (salary > 70000);

                           QUERY PLAN
---------------------------------------------------------------------------------
 HashSetOp Intersect  (cost=0.00..32.86 rows=1 width=58)
   ->  Append  (cost=0.00..32.49 rows=149 width=58)
         ->  Subquery Scan on "*SELECT*1"  (cost=0.00..15.52 rows=2 width=58)
               ->  Seq Scan on instructor  (cost=0.00..15.50 rows=2 width=58)
                     Filter: ((dept_name)::text = ’Comp.Sci.’::text)
         ->  Subquery Scan on "*SELECT*2"  (cost=0.00..16.97 rows=147 width=58)
               ->  Seq Scan on instructor  (cost=0.00..15.50 rows=147 width=58)
                     Filter: (salary > 70000::numeric)
(8 rows)

Answer: Apparently not; it solves the two subqueries separately and then combines the results.

Query Processing 20130530 Slide 52 of 62

slide-54
SLIDE 54

How to Find out What PostgreSQL Really Does — 3

  • Here is a more complex query involving a join:

university=# EXPLAIN SELECT name
             FROM department NATURAL JOIN instructor
             WHERE building = ’Watson’ AND salary > 70000;

                           QUERY PLAN
----------------------------------------------------------------------------------------------------
 Hash Join  (cost=16.79..32.85 rows=1 width=58) (actual time=0.055..0.069 rows=3 loops=1)
   Output: instructor.name
   Hash Cond: ((instructor.dept_name)::text = (department.dept_name)::text)
   ->  Seq Scan on instructor  (cost=0.00..15.50 rows=147 width=116) (actual time=0.018..0.027 rows=8 loops=1)
         Output: instructor.id, instructor.name, instructor.dept_name, instructor.salary
         Filter: (salary > 70000::numeric)
   ->  Hash  (cost=16.75..16.75 rows=3 width=58) (actual time=0.018..0.018 rows=2 loops=1)
         Output: department.dept_name
         ->  Seq Scan on department  (cost=0.00..16.75 rows=3 width=58) (actual time=0.006..0.011 rows=2 loops=1)
               Output: department.dept_name
               Filter: ((building)::text = ’Watson’::text)
 Total runtime: 0.123 ms
(12 rows)

Query Processing 20130530 Slide 53 of 62

slide-55
SLIDE 55

How to Find out What PostgreSQL Really Does — 4

  • An equivalent formulation via INTERSECT:

university=# EXPLAIN SELECT name FROM department NATURAL JOIN instructor WHERE building = ’Watson’
             INTERSECT
             SELECT name FROM department NATURAL JOIN instructor WHERE salary > 70000;

                           QUERY PLAN
----------------------------------------------------------------------------------------------------
 HashSetOp Intersect  (cost=16.79..74.39 rows=2 width=58)
   ->  Append  (cost=16.79..74.02 rows=149 width=58)
         ->  Subquery Scan "*SELECT*1"  (cost=16.79..32.88 rows=2 width=58)
               ->  Hash Join  (cost=16.79..32.86 rows=2 width=58)
                     Hash Cond: ((public.instructor.dept_name)::text = (public.department.dept_name)::text)
                     ->  Seq Scan on instructor  (cost=0.00..14.40 rows=440 width=116)
                     ->  Hash  (cost=16.75..16.75 rows=3 width=58)
                           ->  Seq Scan on department  (cost=0.00..16.75 rows=3 width=58)
                                 Filter: ((building)::text = ’Watson’::text)
         ->  Subquery Scan "*SELECT*2"  (cost=22.15..41.14 rows=147 width=58)
               ->  Hash Join  (cost=22.15..39.67 rows=147 width=58)
                     Hash Cond: ((public.instructor.dept_name)::text = (public.department.dept_name)::text)
                     ->  Seq Scan on instructor  (cost=0.00..15.50 rows=147 width=116)
                           Filter: (salary > 70000::numeric)
                     ->  Hash  (cost=15.40..15.40 rows=540 width=58)
                           ->  Seq Scan on department  (cost=0.00..15.40 rows=540 width=58)
(16 rows)

  • Again, it solves the two subqueries separately and then combines the

results.

Query Processing 20130530 Slide 54 of 62

slide-56
SLIDE 56

Choosing a Plan of Evaluation

Question: How does a DBMS choose a plan of evaluation?

  • There are many things to be taken into consideration.

Choice of equivalent expression: There is a huge number of equivalent operator expressions which may in principle be considered.

  • Classical optimization techniques such as dynamic programming are used to reduce the number of possible choices.

Size of the components: The size of the relations, and of the parts of relations which will be used in solving the queries, has a profound effect upon performance.

  • The system maintains comprehensive statistics not only on the sizes of whole relations, but on the sizes of selected range queries as well.

Available indices: Indices should of course be used to advantage whenever possible.

Heuristics: Last but not least, heuristics are often employed.

Example: A very common (and natural) heuristic is to move operations which reduce size (e.g., selection and projection) to be evaluated before those which increase size (e.g., join).

Query Processing 20130530 Slide 55 of 62

slide-57
SLIDE 57

Implementation Requirements for Views

  • From a theoretical point of view, a view is just a query.
  • However, there are important differences which mandate a different form of implementation.

Persistence: Unlike queries, which provide “snapshot” information about the database at a single point in time, views are persistent and provide information which changes over time.

Implications of persistence: Persistence has at least two significant implications which impact implementation.

Multiple access: A view may be accessed over and over, possibly by several users.

  • Re-evaluation of the query at each access may not be a realistic option.

Presentation of updates: If the main schema is updated, a change in the view state may result.

  • Complete re-evaluation of the query at each update may not be a realistic option.

Query Processing 20130530 Slide 56 of 62

slide-58
SLIDE 58

Materialized Views and View Maintenance

Materialization of views: For the reasons just noted, views are often materialized; that is, a copy of the view is kept in storage (primary or secondary), just as if it were a true relation of the schema.

Updates: When the database is updated, the materialization of the view may also need to be changed.

  • It is not feasible to re-compute the entire view state after each update of the main schema.

Incremental view maintenance: Rather, only the part of the view which is affected by the update is changed.

  • The basic ideas for insertions and deletions for views defined by selection, projection, and join will be presented.
  • The techniques for modifications (SQL UPDATE) are similar and often even simpler.

Query Processing 20130530 Slide 57 of 62

slide-59
SLIDE 59

Algorithms for Incremental View Maintenance — Selection

Selection: Suppose that a view is defined by a single selection operation σθ on relation R with state r.

  • If the tuples in rins are added to R, so that the new state is r ∪ rins, then the new state of the view will be

σθ(r ∪ rins) = σθ(r) ∪ σθ(rins)

  • Thus, to reflect this new state in the materialized view, it is only necessary to add the tuples in σθ(rins) to the existing materialization.
  • If the tuples in rdel are deleted from R, so that the new state is r \ rdel, then the new state of the view will be

σθ(r \ rdel) = σθ(r) \ σθ(rdel)

since the selection operation is injective (one-to-one) on tuples.

  • Thus, to reflect this new state in the materialized view, it is only necessary to delete the tuples in σθ(rdel) from the existing materialization.
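The two maintenance steps for a selection view can be sketched in a few lines, keeping the materialization as a Python set of tuples. The predicate and tuples are invented for illustration.

```python
# Sketch of incremental maintenance for a view V = sigma_theta(R).
theta = lambda t: t[1] == "MIT-huset"       # building = 'MIT-huset'

r = {("CS", "MIT-huset"), ("Math", "Nat-huset")}
view = {t for t in r if theta(t)}           # initial materialization

def insert(r, view, r_ins):
    r |= r_ins
    view |= {t for t in r_ins if theta(t)}  # add sigma_theta(r_ins)

def delete(r, view, r_del):
    r -= r_del
    view -= {t for t in r_del if theta(t)}  # remove sigma_theta(r_del)

insert(r, view, {("Phys", "MIT-huset")})
delete(r, view, {("CS", "MIT-huset")})
assert view == {("Phys", "MIT-huset")}
```

Note that only the changed tuples are ever filtered through θ; the rest of r is never re-examined, which is the point of the incremental approach.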

Query Processing 20130530 Slide 58 of 62

slide-60
SLIDE 60

Algorithms for Incremental View Maintenance — Projection

Projection: A view is defined by a single projection operation πA on the set of attributes A of relation R, with the current state being r.

  • If the tuples in rins are added to R, so the new state is r ∪ rins, then the new state of the view will be

πA(r ∪ rins) = πA(r) ∪ πA(rins)

  • It must be noted here that, in general, if A does not contain a key for R, then

πA(r) ∩ πA(rins) = ∅

need not hold, and so duplicates will have to be eliminated before the physical insertion of πA(rins) into the materialization.

  • Deletion from a projection which does not contain a key of R is even more complex.
  • If the tuples in rdel are deleted from R, the new state is r \ rdel.
  • However, the new state of the view is not necessarily πA(r) \ πA(rdel), since a tuple in the view may arise from several distinct tuples in the main schema.
  • A solution is discussed on the next slide.

Query Processing 20130530 Slide 59 of 62

slide-61
SLIDE 61

Algorithms for Incremental View Maintenance — Projection 2

Projection: A view is defined by a single projection operation πA on the set of attributes A of relation R, with the current state being r.

  • The solution to the non-injectivity problem is to keep a count, in the view, of the number of ways that a view tuple arose from a tuple of the main schema.

Insertion: When an insertion is performed, the counts of all inserted tuples are increased.

  • If an inserted tuple is not already present, its count is set to 1.
  • If an inserted tuple is already present, its count is incremented by 1.

Deletion: When a deletion is performed, the counts of all deleted tuples are decreased by 1.

  • If the count of a deleted tuple is reduced to 0, that tuple is removed.
  • If the count of a deleted tuple remains 1 or greater, it is not removed.
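The counting solution can be sketched with a `Counter` as the materialization; the attribute names and tuples are invented for illustration.

```python
from collections import Counter

# Sketch of count-based maintenance for a view V = pi_A(R).
A = ("dept_name",)                       # projection attributes
proj = lambda t: tuple(t[a] for a in A)

counts = Counter()                       # materialized view with counts

def insert(tuples):
    for t in tuples:
        counts[proj(t)] += 1             # new view tuple starts at count 1

def delete(tuples):
    for t in tuples:
        counts[proj(t)] -= 1
        if counts[proj(t)] == 0:         # last witness gone: remove tuple
            del counts[proj(t)]

insert([{"name": "Ana", "dept_name": "CS"},
        {"name": "Bo",  "dept_name": "CS"}])
delete([{"name": "Ana", "dept_name": "CS"}])
assert dict(counts) == {("CS",): 1}      # Bo still witnesses ("CS",)
```

Deleting Ana does not remove ("CS",) from the view, because the count records that Bo's tuple still projects onto it; exactly the non-injectivity problem the slide describes.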

Query Processing 20130530 Slide 60 of 62

slide-62
SLIDE 62

Algorithms for Incremental View Maintenance — Join

Join: Suppose that a view is defined by a single join operation ⋈θ on relations R1 and R2 with instances r1 and r2, respectively.

  • If the tuples in rins are added to R2, so that the new state is r2 ∪ rins, then the new state of the view will be

r1 ⋈θ (r2 ∪ rins) = (r1 ⋈θ r2) ∪ (r1 ⋈θ rins)

  • Thus, to reflect this new state in the materialized view, it is only necessary to add the tuples in r1 ⋈θ rins to the existing materialization.
  • If the tuples in rdel are deleted from R2, so that the new state is r2 \ rdel, then the new state of the view will be

r1 ⋈θ (r2 \ rdel) = (r1 ⋈θ r2) \ (r1 ⋈θ rdel)

since the join operation is injective (one-to-one) on tuples.

  • Thus, to reflect this new state in the materialized view, it is only necessary to delete the tuples in r1 ⋈θ rdel from the existing materialization.
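The join case can be sketched the same way, joining only the delta against r1; the natural-join attribute and the tuples are invented, and the set-based deletion assumes (as the slide does) that each view tuple has a unique pair of witnesses.

```python
# Sketch of incremental maintenance for V = r1 |x| r2 (natural join on
# dept_name) under insertions into and deletions from r2.  View tuples
# are frozensets of (attr, value) pairs so they are hashable.
def join(r1, r2):
    out = set()
    for t in r1:
        for u in r2:
            if dict(t)["dept_name"] == dict(u)["dept_name"]:
                out.add(t | u)           # union of attribute-value pairs
    return out

row = lambda **kw: frozenset(kw.items())

r1 = {row(name="Ana", dept_name="CS")}
r2 = {row(dept_name="CS", building="MIT-huset")}
view = join(r1, r2)

r_ins = {row(dept_name="CS", building="Annex")}
r2 |= r_ins
view |= join(r1, r_ins)                  # add r1 |x| r_ins

r_del = {row(dept_name="CS", building="MIT-huset")}
r2 -= r_del
view -= join(r1, r_del)                  # remove r1 |x| r_del

assert view == {row(name="Ana", dept_name="CS", building="Annex")}
```

In both directions, only the delta (r_ins or r_del) is joined against r1, never the whole of r2.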

Query Processing 20130530 Slide 61 of 62

slide-63
SLIDE 63

Algorithms for Incremental View Maintenance — Other

SPJ-views: Combinations of selection, projection, and join (the so-called SPJ-views) are handled via a straightforward combination of the techniques just described.

Aggregation: Aggregation is handled in a manner similar to that for projection, with counts maintained and updated as necessary.

  • The details are not presented here.

Set Operations: These are also handled in a very straightforward fashion, using variants of the techniques already described.

Query Processing 20130530 Slide 62 of 62