August 2002
Slide 1 PLASTIC Presentation (VLDB)
PLAN SELECTION based on QUERY CLUSTERING Antara Ghosh Jignashu - - PowerPoint PPT Presentation
PLAN SELECTION based on QUERY CLUSTERING Antara Ghosh Jignashu Parikh Vibhuti Sengar Jayant Haritsa Computer Science & Automation Indian Institute of Science Bangalore, INDIA Slide 1 PLASTIC Presentation (VLDB) August 2002
August 2002
Slide 1 PLASTIC Presentation (VLDB)
August 2002
Slide 2 PLASTIC Presentation (VLDB)
August 2002
Slide 3 PLASTIC Presentation (VLDB)
August 2002
Slide 4 PLASTIC Presentation (VLDB)
Query (Q) Query Optimizer Plan P(Q)
DB stats Cost Model
August 2002
Slide 5 PLASTIC Presentation (VLDB)
– Exact Match: Current commercial
– Similarity Match: PLASTIC (PLAn Selection Through Incremental Clustering)
plans (a plan template is the
Query Space Plan Space
August 2002
Slide 6 PLASTIC Presentation (VLDB)
– Makes it affordable to run optimizers at their highest
have to be explicitly optimized – Reduces workload on DBAs
August 2002
Slide 7 PLASTIC Presentation (VLDB)
Select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment From part p, supplier s, partsupp ps, nation n, region r Where p_partkey = ps_partkey and s_suppkey= ps_suppkey and p_size := :1 and p_type like :2 and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name := :3 and ps_supplycost := :4
August 2002
Slide 8 PLASTIC Presentation (VLDB)
Note: 80% of space occupied by 20% of the Plans
August 2002
Slide 9 PLASTIC Presentation (VLDB)
– Results in heterogeneous clusters making it difficult to classify new queries
August 2002
Slide 10 PLASTIC Presentation (VLDB)
select a.firstname, a.lastname ,b.projno, c.resume from employee as a, emp_act as b, emp_resume as c where a.empno=b.empno and b.empno=c.empno select * from employee as a, emp_act as b, emp_photo as c where a.empno=b.empno and b.empno=c.empno and a.empno>'000000' and b.empno<'000400‘ and c.empno between '000010' and '000390'
August 2002
Slide 11 PLASTIC Presentation (VLDB)
August 2002
Slide 12 PLASTIC Presentation (VLDB)
– Feature vectors have structural + statistical components (explained later) – Each cluster is defined by a single representative query – Clustering in Query Space may result in multiple clusters mapping to the same plan template
Query Space Plan Space
August 2002
Slide 13 PLASTIC Presentation (VLDB)
August 2002
Slide 14 PLASTIC Presentation (VLDB)
Query (Q) Query Optimizer Plan p(qi)
DecTree
q1,p(q1)
Seed queries
Feature Vector q2,p(q2) qk,p(qk) Sim Check
August 2002
Slide 15 PLASTIC Presentation (VLDB)
August 2002
Slide 16 PLASTIC Presentation (VLDB)
Feature Vector Extractor Similarity Check Plan Generator
Cluster Reorganization
Query Cluster Database
Plan Template Database System Catalogs Plan Template Generator Query Optimizer
Feature Vector Feature Vector Cluster Id
Match No Match Plan
Query
August 2002
Slide 17 PLASTIC Presentation (VLDB)
August 2002
Slide 18 PLASTIC Presentation (VLDB)
– Structural Features
– Statistical Features
– study of query optimization literature – characteristics of plans generated by commercial
– not involving computation of any plan specific information – not requiring additional inputs beyond those already available to the optimizer
August 2002
Slide 19 PLASTIC Presentation (VLDB)
– No. of Join Predicates in which the table is involved
– JIC[k] = Number of join predicates (in which the table participates) having k indexed attributes in the join predicate k = 0, 1 or 2
– Count of SARGable and Non-SARGable predicates in which the table is involved
– Set if all the selection attributes and projections on that table can be evaluated through indexes only ( i.e. Required information can be
tables)
August 2002
Slide 20 PLASTIC Presentation (VLDB)
August 2002
Slide 21 PLASTIC Presentation (VLDB)
Select A.a1,B.b1 from A, B Where A.a1 = B.b2 and A.a2 >100 and B.b3 <25
5000 200000 ETS 100000 400000 TS {0, 0,1} {0, 0,1} JIC PCnsarg 1 1 PCsarg 1 IF 1 1 DT Table B Table A Feature
August 2002
Slide 22 PLASTIC Presentation (VLDB)
August 2002
Slide 23 PLASTIC Presentation (VLDB)
be different
August 2002
Slide 24 PLASTIC Presentation (VLDB)
August 2002
Slide 25 PLASTIC Presentation (VLDB)
, ) , ) ( ) ( ) , (
2 1 2 1 2 2 1 1 2 1 j i j i j i j i ij
TS max(TS ETS ETS w TS TS w T T dist − ∗ + − ∗ =
i = Table size of ith Table of Query k
i = Estimated Table size of ith Table of Query k
– w1, w2 ε [0,1] and w2 = 1-w1
mapping with the mindist (minimum aggregate value of dist) is selected
August 2002
Slide 26 PLASTIC Presentation (VLDB)
G g g
August 2002
Slide 27 PLASTIC Presentation (VLDB)
– smaller percentage of error-causing clusters (i.e. clusters straddling plan boundaries in the plan diagram), – larger number of clusters increases the search space for classification
August 2002
Slide 28 PLASTIC Presentation (VLDB)
Q1: select * from nation, region where n_nationkey=r_regionkey Q2: select n_nationkey from nation, region where n_nationkey = r_regionkey Q3: select n_comment, r_comment from nation, region
Q1 and Q2 although they look similar!!
Q1 and Q3 although they seem different!
August 2002
Slide 29 PLASTIC Presentation (VLDB)
August 2002
Slide 30 PLASTIC Presentation (VLDB)
August 2002
Slide 31 PLASTIC Presentation (VLDB)
August 2002
Slide 32 PLASTIC Presentation (VLDB)
August 2002
Slide 33 PLASTIC Presentation (VLDB)
– Default optimization class of DB2 (level 5)
Queries
– Queries are uniformly distributed over the selectivity space (limited to 2D) – Static resource configuration
August 2002
Slide 34 PLASTIC Presentation (VLDB)
Select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment From part p, supplier s, partsupp ps, nation n, region r Where p_partkey = ps_partkey and s_suppkey= ps_suppkey and p_size := :1 and p_type like :2 and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name := :3 and ps_supplycost := :4
65 Clusters Generated with Threshold value of 0.01 W1= 0.7 and W2 = 0.3
August 2002
Slide 35 PLASTIC Presentation (VLDB)
3.96KB 1.97KB
0.00025 s 0.004s 0.1s Efficiency 88.8% 90.76% 100% Accuracy P-DB2 Decision Tree P-DB2 Leader DB2 Metric
159078 161814 5 158681 158577 4 1.1 188684 173913 3 2 246000 241054 2 1.9 266260 261209 1 Risk Factor (%) P-DB2 Cost timeron DB2 Cost timeron Error Case
Risk Factor
August 2002
Slide 36 PLASTIC Presentation (VLDB)
August 2002
Slide 37 PLASTIC Presentation (VLDB)
August 2002
Slide 38 PLASTIC Presentation (VLDB)
August 2002
Slide 39 PLASTIC Presentation (VLDB)
Select l_extendedprice, l_discount From customer, orders, lineitem, supplier, nation, region Where c_custkey = o_orderkey and L_COMMITDATE = O_ORDERDATE and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = ‘AFRICA’ and o_orderdate >= date (‘1997-01-01’) and year(o_orderdate) < (year (‘1997-01-01’)+1);
August 2002
Slide 40 PLASTIC Presentation (VLDB)
Select l_extendedprice, l_discount From customer, orders, lineitem, supplier, nation, region Where c_custkey = o_orderkey and L_COMMITDATE = O_ORDERDATE and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = ‘AFRICA’ and o_orderdate >= date (‘1997-01-01’) and year(o_orderdate) < (year (‘1997-01-01’)+1);
August 2002
Slide 41 PLASTIC Presentation (VLDB)
Select l_extendedprice, l_discount From customer, orders, lineitem, supplier, nation, region Where c_custkey = o_orderkey and L_SHIPDATE = O_ORDERDATE and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = ‘AFRICA’ and o_orderdate >= date (‘1997-01-01’) and year(o_orderdate) < (year (‘1997-01-01’)+1);
Counts in both queries remain same
August 2002
Slide 42 PLASTIC Presentation (VLDB)
August 2002
Slide 43 PLASTIC Presentation (VLDB)
– handle correlated nested queries, as well as GROUP BY and HAVING clauses – handle changes in the system resource availability between training and operational stages
– Error varies with table selectivities – Cluster sizes should thus be made sensitive to selectivities
August 2002
Slide 44 PLASTIC Presentation (VLDB)
– No attempt to optimize Queries – Instead, we aim to reuse previous optimization results – PLASTIC’s plan selection is not specific to a temporal window of queries
– We do not try to characterize the plan space for a given query – Our approach extends to sharing of plans across similar queries
August 2002
Slide 45 PLASTIC Presentation (VLDB)