An Approach Based on Bayesian Networks for Query Selectivity Estimation
Max Halford¹,², Philippe Saint-Pierre¹, Franck Morvan²
¹ Toulouse Institute of Mathematics (IMT), ² Toulouse Institute of Informatics Research (IRIT)
DASFAA 2019, Chiang Mai
◮ Unidimensional [IC93, TCS13, HKM15] (textbook approach)
◮ Multidimensional [GKTD05, Aug17] (exponential number of
◮ Bayesian networks [GTK01, TDJ11] (complex compilation
◮ Single relation [PSC84, LN90] (works well but has a high
◮ Multiple relations [Olk93, VMZC15, ZCL+18] (empty-join
◮ Query feedback [SLMK01] (useless for unseen queries)
◮ Supervised learning [LXY+15, KKR+18] (not appropriate in
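◮ $P(\text{Swedish}) = \frac{4}{6}$
◮ $P(\text{Blond}) = \frac{3}{6}$
◮ $P(\text{Swedish}, \text{Blond}) \simeq P(\text{Swedish}) \times P(\text{Blond}) = \frac{2}{6} = 0.333$
◮ $P(\text{Blond} \mid \text{Swedish}) = \frac{3}{4}$
◮ $P(\text{Swedish}, \text{Blond}) = P(\text{Blond} \mid \text{Swedish}) \times P(\text{Swedish}) = \frac{3 \times 4}{4 \times 6} = 0.5$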
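The arithmetic above can be checked with a short Python sketch. The six-row table is hypothetical: the slides give only the fractions, so the rows below are simply one data set consistent with them (4 of 6 people Swedish, 3 of 6 blond, 3 of the 4 Swedes blond). The non-Swedish nationality and non-blond hair colour are placeholders.

```python
from fractions import Fraction

# Hypothetical rows (nationality, hair), chosen only to match the stated fractions.
rows = [
    ("Swedish", "Blond"),
    ("Swedish", "Blond"),
    ("Swedish", "Blond"),
    ("Swedish", "Brown"),
    ("American", "Brown"),
    ("American", "Brown"),
]
n = len(rows)

n_swedish = sum(1 for nat, _ in rows if nat == "Swedish")
n_blond = sum(1 for _, hair in rows if hair == "Blond")
n_swedish_blond = sum(1 for nat, hair in rows if nat == "Swedish" and hair == "Blond")

p_swedish = Fraction(n_swedish, n)
p_blond = Fraction(n_blond, n)

# Estimate under the attribute value independence assumption.
independent_estimate = p_swedish * p_blond

# Estimate using the conditional distribution P(Blond | Swedish).
p_blond_given_swedish = Fraction(n_swedish_blond, n_swedish)
conditional_estimate = p_blond_given_swedish * p_swedish

# Selectivity measured directly on the rows, for comparison.
true_selectivity = Fraction(n_swedish_blond, n)

print(independent_estimate, conditional_estimate, true_selectivity)
```

On this toy table the conditional factorisation recovers the measured selectivity exactly, while the independence assumption underestimates it.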
◮ 2D conditional distributions (1D for the root)
◮ Low memory footprint
◮ Low inference time
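As a rough illustration of why such a structure stays small, the sketch below stores a tree-shaped network as one 1D table for the root and one 2D table per remaining node, then multiplies one factor per node to get a joint probability. The attribute names, the `parent` and `cpt` dictionaries, and both helper functions are assumptions made for this example; they are not taken from the authors' implementation.

```python
from fractions import Fraction

# Hypothetical tree: "nationality" is the root, "hair" has "nationality" as parent.
parent = {"nationality": None, "hair": "nationality"}

# Root: 1D distribution P(nationality).
# Other nodes: 2D conditional distribution P(child | parent), indexed by parent value.
cpt = {
    "nationality": {"Swedish": Fraction(4, 6), "American": Fraction(2, 6)},
    "hair": {
        "Swedish": {"Blond": Fraction(3, 4), "Brown": Fraction(1, 4)},
        "American": {"Blond": Fraction(0), "Brown": Fraction(1)},
    },
}


def joint_probability(assignment):
    """Probability of a full assignment: one table lookup per node."""
    prob = Fraction(1)
    for node, value in assignment.items():
        par = parent[node]
        if par is None:
            prob *= cpt[node].get(value, Fraction(0))
        else:
            prob *= cpt[node][assignment[par]].get(value, Fraction(0))
    return prob


def stored_values():
    """Number of probabilities kept in memory across all tables."""
    total = 0
    for node, table in cpt.items():
        if parent[node] is None:
            total += len(table)
        else:
            total += sum(len(row) for row in table.values())
    return total


print(joint_probability({"nationality": "Swedish", "hair": "Blond"}))
print(stored_values())
```

Because every non-root node has exactly one parent, each table is at most two-dimensional, and a full assignment costs one lookup per attribute.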
◮ The “textbook approach” used by PostgreSQL
◮ Bernoulli sampling
◮ Bayesian networks (our method)
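For readers unfamiliar with the second baseline, here is a minimal sketch of Bernoulli sampling for selectivity estimation: each row is kept independently with a fixed probability, and the predicate is evaluated on the sample only. The function names, the `rate` and `seed` parameters, and the toy rows are assumptions for illustration; the slides do not describe the sampling setup used in the experiments.

```python
import random


def bernoulli_sample(rows, rate, seed=None):
    """Keep each row independently with probability `rate`."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < rate]


def estimate_selectivity(sample, predicate):
    """Fraction of sampled rows satisfying the predicate.

    Returning 0.0 for an empty sample is a choice made for this sketch.
    """
    if not sample:
        return 0.0
    return sum(1 for row in sample if predicate(row)) / len(sample)


# Hypothetical table of (nationality, hair) rows.
table = [("Swedish", "Blond")] * 300 + [("Swedish", "Brown")] * 100 + [("American", "Brown")] * 200

sample = bernoulli_sample(table, rate=0.1, seed=42)
estimate = estimate_selectivity(sample, lambda r: r == ("Swedish", "Blond"))
print(len(sample), estimate)
```

The sample size is itself random, and a low rate can leave a selective predicate with no matching rows at all, which is one reason sampling accuracy depends heavily on the rate.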
◮ Time needed to build the model
◮ Accuracy of the cardinality estimates
◮ Time needed to produce estimates
◮ Values needed to store each model