Query Processing Query Processing Steps balance < 2500 ( - - PowerPoint PPT Presentation
Query Processing Query Processing Steps balance < 2500 ( - - PowerPoint PPT Presentation
Query Processing Query Processing Steps balance < 2500 ( balance ( account)) balance ( balance < 2500 ( account)) Step 1 SELECT balance FROM account WHERE balance < 2500 Step 2 Step 3 A B + -tree index on balance
CMPT 454: Database II -- Query Processing 2
Query Processing Steps
σbalance<2500(∏balance(account)) ∏balance(σbalance<2500(account)) Step 1 Step 2 Step 3 SELECT balance FROM account WHERE balance < 2500 A B+-tree index on balance
CMPT 454: Database II -- Query Processing 3
Query Cost Measures
- Query processing as an optimization problem
– Search space: all possible equivalent relational algebra expressions all possible query execution plans – Goal: find the most efficient query execution
- Efficiency: I/O or CPU?
– CPU processing time is often much smaller than I/O cost, and is hard to estimate (real systems do consider) – Each I/O access cost may slightly different – Number of block transfers is a measure of the dominant component of query answering cost – Communication cost in distributed database systems
CMPT 454: Database II -- Query Processing 4
Selection
- Table-scan: read the blocks one by one from disk
– Linear search: scan each file block,
- Cost = tS + br * tT, where br is the number of blocks in the file, tS is the average
seek time, and tT is the average block transfer time
- Search on a key attribute: an average cost of br / 2
- Can be used in any cases
– Binary search: the file is ordered on an attribute, the selection condition is an equality comparison on the attribute
- Cost: ⎡log2(br)⎤ * (tS + tT)
- If the attribute is not a key, some extra blocks may need to be read
- Index-scan: using an index in selection
– Primary index, equality on key: if a B+-tree is used, (hi + 1) * (tS + tT), where hi is the height of the tree – Primary index, equality on nonkey: hi * (tT + tS) + tS + tT * b, where b is the number of blocks containing records with the specified search key – Secondary index, equality on key: (hi + 1) * (tS + tT) – Secondary index, equality on nonkey: (hi + n) * (tT + tS), can be worse than linear search
CMPT 454: Database II -- Query Processing 5
Selection Involving Comparisons
- Primary index, comparison
– Case A > v or A ≥ v: use the index to find the first tuple having A ≥ v, scan the tuple up to the end of the file, cost hi * (tT + tS) + tS + tT * b – Case A < v or A ≤ v: scan from the beginning of the file until the condition is violated, the index is not used
- Secondary index, comparison: use the index to find the pointers to the
record, retrieve the data blocks
– Sort pointers to ensure each block is read once – Can be costly if the selectivity of a query is low (i.e., many tuples satisfy the condition)
- Conjunction σθ1∧ θ2∧. . . θn(r)
– Selection and test using one index on one attribute of composite search key – Selection by intersection of identifiers
- Disjunction σθ1∨ θ2 ∨. . . θn (r) by union of identifiers
CMPT 454: Database II -- Query Processing 6
Nested-Loop Join
- To compute the theta join r
θ s
for each tuple tr in r do begin for each tuple ts in s do begin test whether pair (tr,ts) satisfies the join condition θ if so, add tr • ts to the result end end
– r : the outer relation of the join – s : the inner relation of the join
- No indexes, can be used with any kind of join condition
- Cost
– Worst case – only one block of each relation in memory: nr ∗ bs + br – Best case: both relations are in memory: br + bs – If one relation can fit entirely in main memory, use that relation as the inner relation: br + bs
CMPT 454: Database II -- Query Processing 7
Block Nested-Loop Join
- Idea: once a block is read into main memory, the records in the block
should be utilized as much as possible for each block Br of r do begin for each block Bs of s do begin for each tuple tr in Br do begin for each tuple ts in Bs do begin check if (tr,ts) satisfies the join condition if so, add tr • ts to the result end end end end
- Cost
– Worst case – only one block for each relation : br ∗ bs + br – Best case: br + bs
CMPT 454: Database II -- Query Processing 8
Further Improvements
- In natural join or equi-join, if the join attributes form a key on the inner
relation, then for each outer relation tuple, the inner loop can stop as soon as the first match is found
- In the block nested-loop algorithm, if M blocks are available, use M-2
blocks for outer relation (why?)
– Total cost: ⎡br / (M-2)⎤ ∗ bs + br
- Scan the inner loop alternately forward and backward (similar to the
elevator algorithm), reuse the blocks remaining in the buffer
– How and why is it good?
- Indexed nested-loop join
– An index exists on the inner loop’s join attribute – use the index lookups to replace file scans – Cost: br (tT + tS) + nr ∗ c, where c is the cost of a single selection on s using the join condition, nr is the number of records in r
- If indices are available on the join attributes of both r and s, use the
relation with fewer tuples as the outer relation
CMPT 454: Database II -- Query Processing 9
Merge Join
- Can be used only for equi-joins and natural joins
- Sort both relations on their join attribute (if not already
sorted on the join attributes)
- Merge the sorted relations to join them
– Join step is similar to the merge stage of the sort-merge algorithm – Every pair with same value on join attribute must be matched
- Cost: br + bs block transfers + ⎡br / bb⎤ + ⎡bs / bb⎤
seeks + the cost of sorting if relations are unsorted
– After sorting, each block needs to be read only once – Suppose all tuples for any given value of the join attributes fit in memory – Can be further improved by combining the merge phase of merge-sort with the merge phase of merge- join – merge-join multiple sorted sublists
CMPT 454: Database II -- Query Processing 10
Hash Join: the Idea
CMPT 454: Database II -- Query Processing 11
Hash Join
- For equi-joins and natural joins only
- A hash function h depending only on the join attributes is
used to partition tuples of both relations,
– h maps JoinAttrs values to {0, 1, ..., n} – r0, r1, . . ., rn are partitions of r tuples, each tuple tr ∈ r is put in partition ri where i = h(tr [JoinAttrs]) – s0,, s1. . ., sn are partitions of s tuples, each tuple ts ∈s is put in partition si, where i = h(ts [JoinAttrs]) – r tuples in ri need only to be compared with s tuples in si
CMPT 454: Database II -- Query Processing 12
Setting Parameters of Hash Joins
- Algorithm: relation s: build input, relation r: probe input
Partition the relation s using hashing function h, when partitioning a relation,
- ne block of memory is reserved as the output buffer for each partition
Partition r similarly For each i do
Load si into memory and build an in-memory hash index on it using the join attribute
This hash index uses a different hash function than the earlier one h
Read the tuples in ri from the disk one by one For each tuple tr locate each matching tuple ts in si using the in-memory hash index Output the concatenation of their attributes
- n and the hash function h is chosen such that each si should fit in memory
– Use the smaller input relation as the build relation – The probe relation partitions ri need not fit in memory
- Typically n is chosen as ⎡bs/M⎤ * f where f is a “fudge factor”, typically around
1.2
CMPT 454: Database II -- Query Processing 13
Recursive Partitioning, Overflow
- For number of partitions n is greater than number of pages M of memory,
instead of partitioning n ways, use M – 1 partitions for s
- Further partition the M – 1 partitions using a different hash function, use same
partitioning method on r (Rarely required)
- Hash table overflow: a partition cannot fit in memory
– Many tuples with same value for join attributes due to bad hash function – Partitioning is said skewed if some partitions have significantly more tuples than some others
- Overflow resolution in build phase
– Partition si is further partitioned using different hash function – Partition ri must be similarly partitioned
- Overflow avoidance
– Performs partitioning carefully to avoid overflows during build phase – E.g. partition build relation into many partitions, then combine them
- Both approaches fail with large numbers of duplicates
– Fallback option: use block nested loops join on overflowed partitions
CMPT 454: Database II -- Query Processing 14
Performance Analysis
- Without recursive partitioning: 3 (br + bs) + 4 ∗ nh
- For recursive partitioning: 2 (br + bs) ⎡logM–1(bs) –
1⎤ + br + bs
– Cost of partitioning s: ⎡logM–1(bs) – 1⎤ – Similar cost for partitioning r – best to choose the smaller relation as the build relation
- If the entire build input can be kept in main
memory, then do not partition the relations into temporary files
– Cost: br + bs
CMPT 454: Database II -- Query Processing 15
Hybrid Hash Join
- Join the first partitions during partitioning the
tables
– Partition relation s, keep the first partition s0 in main memory – Partition relation r, join tuples in r0 with s0 in main memory – No need to store s0 and r0
- Most useful if M >>
s
b
CMPT 454: Database II -- Query Processing 16
Complex Joins
- Join with a conjunctive condition rθ1∧θ 2∧…∧θ n s
– Either use nested loops/block nested loops, or – Compute the result of one of the simpler joins rθis
- Final result comprises those tuples in the intermediate result
that satisfy the remaining conditions θ1∧...∧θi –1∧θi +1∧...∧θn
- Join with a disjunctive condition rθ1∨θ2∨...∨θns
– Either use nested loops/block nested loops, or – Compute as the union of the records in individual joins rθis
- Compute (rθ1s) ∪ (rθ2s) ∪ ... ∪ (rθns)
CMPT 454: Database II -- Query Processing 17
Sort-Based Algorithms
Operators Memory cost I/O cost Duplicate elimination, grouping and aggregation SQRT(B) 3B Union, intersection and difference SQRT(B(R)+B(S)) 3(B(R)+B(S)) Merge-join SQRT(MAX(B(R), B(S)) 5(B(R)+B(S)) Merge-join (improved) SQRT(B(R)+B(S)) 3(B(R)+B(S))
CMPT 454: Database II -- Query Processing 18
Hash-Based Algorithms
Operators Memory cost I/O cost Duplicate elimination, grouping and aggregation SQRT(B) 3B Union, intersection and difference SQRT(MIN(B(R),B(S))) 3(B(R)+B(S)) Simple hash-join SQRT(MIN(B(R),B(S))) 3(B(R)+B(S)) Hash-join (improved) SQRT(MIN(B(R),B(S)))
CMPT 454: Database II -- Query Processing 19
Duplicate Elimination Using Sorting
- Pass-1: sort tuples in sublists
- Pass-2: use the available main memory to hold one block
from each sorted sublist, repeatedly copy one to the output and ignore all tuples identical to it
- I/O cost: 3B(R)
– B(R) to read each block of R when creating the sorted sublists – B(R) to write each of the sorted sublists to disk – B(R) to read each block from the sublists back to generate the final results
- Memory usage:
– Each sublist can have up to M blocks (why?) – Up to M sublists can be processed in the second pass
CMPT 454: Database II -- Query Processing 20
Example
Memory Disk 1 2 2 2, 2 5 2 3 4 4, 4 5 1 1 2 3, 5 Sublists 2, 5, 2, 1, 2, 2 4, 5, 4, 3, 4, 2 1, 5, 2, 1, 3
Sorting
Memory Disk 2 2 2, 2 5 2 3 4 4, 4 5 2 3 5
Output “1” Output “2”
Memory Disk 5 3 4 4, 4 5 3 5 Memory Disk 5 4 4 4 5 5
Output “3” Output “4”
Memory Disk 5 5 5
O u t p u t “ 5 ”
Answer: 1, 2, 3, 4, 5
CMPT 454: Database II -- Query Processing 21
Duplicate Elimination Using Hashing
- Hash R to M-1 buckets
– Two duplicate tuples will be hashed to the same bucket
- Eliminate duplicates in each bucket
– Assumption: each bucket can fit into main memory – Memory usage:
- I/O cost: 3B(R)
– B(R) in each of the three phases: reading in, writing hashing result, processing each bucket
) (R B
CMPT 454: Database II -- Query Processing 22
Grouping and Aggregation
- The sort-based method – similar to duplicate
elimination
– Please study the algorithm and analysis by yourself
- The hash-based method
– Use a hash function depending only on the grouping attributes to hash all tuples to (M-1) buckets – Scan each bucket once to compute groups
CMPT 454: Database II -- Query Processing 23
Union Algorithms
- How to compute R∪S?
- The sort-based algorithm
– Sort R and S respectively using the same order – Merge the sorted sublists, remove duplicates
- The hash-based method
– Hashing S and R into buckets R1, …, RM-1, and S1, …, SM-1 using the same hash function – Compute Ri∪Si, get the union of the buckets
- Intersection and difference can be computed in a