 
              Cardinality Estimation Torsten Grust Chapter 9 Cardinality Estimation How Many Rows Does a Query Yield? Cardinality Estimation Database Profiles Assumptions Architecture and Implementation of Database Systems Estimating Operator Summer 2016 Cardinality Selection σ Projection π Set Operations ∪ , \ , × Join � Histograms Equi-Width Equi-Depth Statistical Views Torsten Grust Wilhelm-Schickard-Institut für Informatik Universität Tübingen 1
Cardinality Estimation Cardinality Estimation Torsten Grust Web Forms Applications SQL Interface SQL Commands Cardinality Estimation Executor Parser Database Profiles Assumptions Operator Evaluator Optimizer Estimating Operator Cardinality Selection σ Files and Access Methods Projection π Transaction Set Operations ∪ , \ , × Manager Join � Recovery Buffer Manager Histograms Manager Lock Equi-Width Equi-Depth Manager Disk Space Manager Statistical Views DBMS data files, indices, . . . Database 2
Cardinality Estimation Cardinality Estimation Torsten Grust • A relational query optimizer performs a phase of cost-based plan search to identify the—presumably—“cheapest” Cardinality Estimation alternative among a a set of equivalent execution plans Database Profiles Assumptions ( ր Chapter on Query Optimization). Estimating Operator • Since page I/O cost dominates, the estimated cardinality Cardinality Selection σ of a (sub-)query result is crucial input to this search. Projection π Set Operations ∪ , \ , × • Cardinality typically measured in pages or rows. Join � Histograms • Cardinality estimates are also valuable when it comes to Equi-Width Equi-Depth buffer “right-sizing” before query evaluation starts ( e.g. , Statistical Views allocate B buffer pages and determine blocking factor b for external sort). 3
Cardinality Estimation Estimating Query Result Cardinality Torsten Grust There are two principal approaches to query cardinality estimation: 1 Database Profile. Maintain statistical information about numbers and sizes of Cardinality Estimation tuples , distribution of attribute values for base relations , as Database Profiles part of the database catalog (meta information) during Assumptions database updates . Estimating Operator Cardinality • Calculate these parameters for intermediate query Selection σ Projection π results based upon a (simple) statistical model during Set Operations ∪ , \ , × Join � query optimization. Histograms Equi-Width • Typically, the statistical model is based upon the Equi-Depth Statistical Views uniformity and independence assumptions . • Both are typically not valid , but they allow for � simple calculations ⇒ limited accuracy . • In order to improve accuracy, the system can record histograms to more closely model the actual value distributions in relations. 4
Cardinality Estimation Estimating Query Result Cardinality Torsten Grust 2 Sampling Techniques. Cardinality Estimation Gather the necessary characteristics of a query plan (base Database Profiles relations and intermediate results) at query execution time : Assumptions Estimating Operator • Run query on a small sample of the input. Cardinality • Extrapolate to the full input size. Selection σ Projection π Set Operations ∪ , \ , × Join � • It is crucial to find the right balance between sample Histograms Equi-Width size and the resulting accuracy. Equi-Depth Statistical Views These slides focus on 1 Database Profiles. 5
Cardinality Estimation Database Profiles Torsten Grust Keep profile information in the database catalog . Update whenever SQL DML commands are issued (database updates): Cardinality Estimation Typical database profile for relation R Database Profiles Assumptions Estimating Operator Cardinality | R | number of records in relation R Selection σ Projection π N R number of disk pages allocated for these records Set Operations ∪ , \ , × s ( R ) average record size (width) Join � Histograms V ( A , R ) number of distinct values of attribute A Equi-Width MCV ( A , R ) most common values of attribute A Equi-Depth Statistical Views MCF ( A , R ) frequency of most common values of attribute A . . . possibly many more 6
Cardinality Estimation Database Profiles: IBM DB2 Torsten Grust Excerpt of IBM DB2 catalog information for a TPC-H database 1 db2 => SELECT TABNAME, CARD, NPAGES 2 db2 (cont.) => FROM SYSCAT.TABLES Cardinality Estimation 3 db2 (cont.) => WHERE TABSCHEMA = ’TPCH’; Database Profiles Assumptions 4 TABNAME CARD NPAGES Estimating Operator 5 -------------- -------------------- -------------------- Cardinality Selection σ 6 ORDERS 1500000 44331 Projection π 7 CUSTOMER 150000 6747 Set Operations ∪ , \ , × 8 NATION 25 2 Join � 9 REGION 5 1 Histograms Equi-Width 10 PART 200000 7578 Equi-Depth 11 SUPPLIER 10000 406 Statistical Views 12 PARTSUPP 800000 31679 13 LINEITEM 6001215 207888 8 record(s) selected. 14 • Note : Column CARD ≡ | R | , column NPAGES ≡ N R . 7
Cardinality Estimation Database Profile: Assumptions Torsten Grust In order to obtain tractable cardinality estimation formulae, assume one of the following : Uniformity & independence (simple, yet rarely realistic) All values of an attribute uniformly appear with the same Cardinality Estimation probability (even distribution). Values of different attributes are Database Profiles Assumptions independent of each other. Estimating Operator Cardinality Selection σ Worst case (unrealistic) Projection π Set Operations ∪ , \ , × No knowledge about relation contents at all. In case of a Join � Histograms selection σ p , assume all records will satisfy predicate p . Equi-Width Equi-Depth (May only be used to compute upper bounds of expected cardinality.) Statistical Views Perfect knowledge (unrealistic) Details about the exact distribution of values are known. Requires huge catalog or prior knowledge of incoming queries. (May only be used to compute lower bounds of expected cardinality.) 8
Cardinality Estimation Cardinality Estimation for σ (Equality Predicate) Torsten Grust • Database systems typically operate under the uniformity assumption . We will come across this assumption multiple times below. Cardinality Estimation Database Profiles Assumptions Estimating Operator Query: Q ≡ σ A = c ( R ) Cardinality Selection σ Projection π Selectivity sel ( A = c ) MCF ( A , R )[ c ] if c ∈ MCV ( A , R ) Set Operations ∪ , \ , × Join � � Histograms Equi-Width Selectivity sel ( A = c ) 1 / V ( A , R ) Uniformity Equi-Depth Statistical Views Cardinality | Q | sel ( A = c ) · | R | Record size s ( Q ) s ( R ) 9
Cardinality Estimation Selectivity Estimation for σ (Other Predicates) Torsten Grust • Equality between attributes ( Q ≡ σ A = B ( R ) ): Approximate selectivity by sel ( A = B ) = 1 / max ( V ( A , R ) , V ( B , R )) . Cardinality Estimation (Assumes that each value of the attribute with fewer Database Profiles distinct values has a corresponding � Assumptions match in the other attribute.) Independence Estimating Operator Cardinality • Range selections ( Q = σ A > c ( R ) ): Selection σ Projection π In the database profile, maintain the minimum and Set Operations ∪ , \ , × Join � maximum value of attribute A in relation R , Low ( A , R ) and Histograms High ( A , R ) . Equi-Width � Equi-Depth Approximate selectivity by Uniformity Statistical Views sel ( A > c ) = High ( A , R ) − c  High ( A , R ) − Low ( A , R ) , Low ( A , R ) � c � High ( A , R )  0 , otherwise  10
Cardinality Estimation Cardinality Estimation for π Torsten Grust • For Q ≡ π L ( R ) , estimating the number of result rows is difficult ( L = � A 1 , A 2 , . . . , A n � : list of projection attributes): Cardinality Estimation Database Profiles Q ≡ π L ( R ) (duplicate elimination) Assumptions  Estimating Operator V ( A , R ) , if L = � A � Cardinality    Selection σ  | R | , if keys of R ∈ L  Projection π Cardinality | Q | Set Operations ∪ , \ , × | R | , no dup. elim. Join �    Histograms  � | R | , � � min A i ∈ L V ( A i , R ) , otherwise  Equi-Width � Equi-Depth Independence Statistical Views � Record size s ( Q ) A i ∈ L s ( A i ) 11
Cardinality Estimation Cardinality Estimation for ∪ , \ , × Torsten Grust Q ≡ R ∪ S | Q | | R | + | S | � s ( Q ) = s ( R ) = s ( S ) schemas of R , S identical Cardinality Estimation Database Profiles Assumptions Q ≡ R \ S Estimating Operator Cardinality Selection σ Projection π max ( 0 , | R | − | S | ) | Q | | R | � � Set Operations ∪ , \ , × Join � s ( Q ) = s ( R ) = s ( S ) Histograms Equi-Width Equi-Depth Q ≡ R × S Statistical Views | Q | = | R | · | S | s ( Q ) = s ( R ) + s ( S ) � V ( A , R ) , for A ∈ R V ( A , Q ) = V ( A , S ) , for A ∈ S 12
Recommend
More recommend