Statistical Learning Techniques for Costing XML Queries
Ning Zhang (1), Peter J. Haas (2), Vanja Josifovski (2), Guy M. Lohman (2), Chun Zhang (2)
(1) University of Waterloo, (2) IBM Almaden Research Center
VLDB 2005
COMET: A New Cost-Modeling
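Cost estimation with an analytical cost model:
Production query -> Identify Features (# cache misses, Selectivity, …) -> Estimate feature values -> Apply cost function -> Cost estimate
RUNSTATS (Collect Statistics) fills the Catalog Statistics used to estimate feature values. The cost function comes from a developed analytical cost model, for example:
$\text{selectivity} = |R| / \text{ColCard}$
$\text{cost} = \text{selectivity} \times \text{hash\_cost}(\text{CPU\_speed})$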
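A minimal sketch of how the two example formulas combine into one cost estimate. The slide only names `hash_cost(CPU_speed)`; its body here, the `work_per_probe` parameter, and all numeric inputs are hypothetical placeholders, not values from the talk.

```python
def selectivity(relation_size: float, col_card: float) -> float:
    """selectivity = |R| / ColCard, as written on the slide."""
    if col_card <= 0:
        raise ValueError("col_card must be positive")
    return relation_size / col_card


def hash_cost(cpu_speed: float, work_per_probe: float = 1.0e6) -> float:
    """Hypothetical stand-in: per-probe cost inversely proportional to CPU speed."""
    if cpu_speed <= 0:
        raise ValueError("cpu_speed must be positive")
    return work_per_probe / cpu_speed


def analytical_cost(relation_size: float, col_card: float, cpu_speed: float) -> float:
    """cost = selectivity * hash_cost(CPU_speed)."""
    return selectivity(relation_size, col_card) * hash_cost(cpu_speed)


if __name__ == "__main__":
    # Illustrative inputs only.
    print(analytical_cost(relation_size=1000, col_card=50, cpu_speed=2.0e6))
```

Every term in such a model has to be worked out by hand for each operator, which is the effort a learned cost function aims to avoid.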
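Cost estimation with a learned cost model:
Identify Features (# cache misses, Selectivity, …) -> Estimate feature values -> Apply cost function -> Cost estimate
RUNSTATS (Collect Statistics) supplies the statistics used to estimate feature values. The cost function f is learned from training queries (Learn cost model) and applied to the estimated feature values:
$\text{cost} = f(\hat{v}_1, \ldots, \hat{v}_n)$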
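The idea of learning f from training queries can be sketched as follows. This sketch uses ordinary least squares as a stand-in learner, because this part of the deck does not say which learning method is applied. The function names and the training data are made up for illustration.

```python
from typing import Callable, List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def learn_cost_model(
    features: Sequence[Sequence[float]], costs: Sequence[float]
) -> Callable[[Sequence[float]], float]:
    """Fit cost = w . v + b on training queries and return the fitted f."""
    # Append a constant 1.0 to each feature vector for the intercept term.
    rows = [list(v) + [1.0] for v in features]
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, costs)) for i in range(k)]
    w = solve_linear_system(xtx, xty)

    def f(v: Sequence[float]) -> float:
        # zip stops at the shorter sequence, so w[-1] (the intercept) is added separately.
        return sum(wi * vi for wi, vi in zip(w, v)) + w[-1]

    return f


if __name__ == "__main__":
    # Synthetic training queries: two feature values each, with measured costs.
    train_features = [[1, 10], [2, 5], [3, 8], [4, 1], [5, 7]]
    train_costs = [10.0, 9.5, 13.0, 11.5, 16.5]
    f = learn_cost_model(train_features, train_costs)
    # Apply the learned function to estimated feature values of a new query.
    print(f([6, 4]))
```

The learner can be swapped out without changing the rest of the workflow, since only the function returned by `learn_cost_model` is applied at estimation time.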
[Figure: Predicted vs. Actual Values. Axes: Predicted (msec.), Actual (msec.)]
NRMSE = 0.084, R-sq = 0.997, OPD = 0.972, MUP = 1000.110 (14.6%)
[Figure: Predicted vs. Actual Values. Axes: Predicted (msec.), Actual (msec.)]
NRMSE = 0.099, R-sq = 0.980, OPD = 0.948, MUP = 6428.379 (14.3%)
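For readers unfamiliar with the first two accuracy measures, the sketch below computes them from actual and predicted costs. It assumes NRMSE is the root-mean-squared error divided by the mean actual cost, and R-sq is the coefficient of determination; the slides do not spell out either definition, so both are assumptions. OPD and MUP are left out because their definitions do not appear in this text. The sample values are invented.

```python
import math
from typing import Sequence


def nrmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """RMSE normalized by the mean of the actual values (assumed normalization)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    mean_actual = sum(actual) / n
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    return math.sqrt(mse) / mean_actual


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Invented costs in msec., for illustration only.
    actual_msec = [100.0, 200.0, 300.0]
    predicted_msec = [110.0, 190.0, 300.0]
    print(nrmse(actual_msec, predicted_msec))
    print(r_squared(actual_msec, predicted_msec))
```

A lower NRMSE and an R-sq closer to 1 both indicate predictions that track the measured costs more closely.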
[Figure: NRMSE vs. Number of training queries]