1
Relational Query Optimization
UMass Amherst March 25 and 27, 2008
Slide Content Courtesy of R. Ramakrishnan, J. Gehrke, and J. Hellerstein
Relational Query Optimization UMass Amherst March 25 and 27, 2008 - - PowerPoint PPT Presentation
Relational Query Optimization UMass Amherst March 25 and 27, 2008 Slide Content Courtesy of R. Ramakrishnan, J. Gehrke, and J. Hellerstein 1 Overview of Query Evaluation Query Evaluation Plan : tree of relational algebra (R.A.) operators,
1
Slide Content Courtesy of R. Ramakrishnan, J. Gehrke, and J. Hellerstein
2
3
SELECT {DISTINCT} <list of columns> FROM <list of relations> {WHERE <list of "Boolean Factors">} {GROUP BY <list of columns> {HAVING <list of Boolean Factors>}} {ORDER BY <list of columns>};
4
SELECT {DISTINCT} <list of columns> FROM <list of relations> {WHERE <list of "Boolean Factors">} {GROUP BY <list of columns> {HAVING <list of Boolean Factors>}} {ORDER BY <list of columns>};
5
6
7
8
9
The algebraic expression partially specifies
Reserves Sailors
sid=sid bid=100 rating > 5 sname
10
Query evaluation plan is an
Cost: 500+500*1000 I/Os Misses several opportunities:
Reserves Sailors
sid=sid bid=100 rating > 5 sname
(Simple Nested Loops) (On-the-fly) (On-the-fly) (File scan) (File scan)
11
12
13
Push selections below the join. Materialization: store a
Reserves Sailors
sid=sid bid=100 sname
(On-the-fly)
rating > 5
(Scan; write to temp T1) (Sort-Merge Join) (File scan) (File scan)
With 5 buffers, cost of plan:
(Scan; write to temp T2)
14
Selection using index: clustered index
Indexed NLJ: pipelining the outer and
Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating > 5 (Hash index; Do not write to temp) (Index Nested Loops With pipelining) (On-the-fly)
(Hash index scan on bid)
Push rating>5 before the join? Need to use search arguments
Cost: Selection of Reserves tuples (10 I/Os); for each, must
(Hash index
15
16
1.
2.
17
18
For each block, the plans considered are:
C D B A
B A C D
B A C D
19
For each block, the plans considered are:
20
21
22
23
24
25
B A C D
26
27
28
ORDER BY, GROUP BY, aggregates etc. handled as a
29
Uniform distribution of values:
Often causes highly inaccurate estimates
Histogram: approximates a data distribution
30
Buckets Counts Frequency
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
8 4 15 3 15 8/3 4/3 15/3 3/3 15/3
Buckets Counts Frequency
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
9 10 10 7 9 9/4 10/4 10/2 7/4 9/1
favoring frequent values
31
Predicates are independent:
Often causes highly inaccurate estimates
Multi-dimensional histograms [PI’97, MVW’98, GKT’00]
Dependency-based histograms [DGR’01]
32
Nested query (block): a query that
Nested query with no correlation: the
33
Nested query with correlation: the
34
Implicit ordering of nested
The equivalent, non-nested
Query decorrelation is an
35
Query optimization is an important task in relational DBMS. Must understand optimization in order to understand the
Two parts to optimizing a query:
36