Chapter 9 Cardinality Estimation How Many Rows Does a Query Yield? - PowerPoint PPT Presentation

Cardinality Estimation Torsten Grust Chapter 9 Cardinality Estimation How Many Rows Does a Query Yield? Cardinality Estimation Database Profiles Assumptions Architecture and Implementation of Database Systems Estimating Operator Summer 2016 Cardinality Selection σ Projection π Set Operations ∪ , \ , × Join � Histograms Equi-Width Equi-Depth Statistical Views Torsten Grust Wilhelm-Schickard-Institut für Informatik Universität Tübingen 1

Cardinality Estimation Cardinality Estimation Torsten Grust Web Forms Applications SQL Interface SQL Commands Cardinality Estimation Executor Parser Database Profiles Assumptions Operator Evaluator Optimizer Estimating Operator Cardinality Selection σ Files and Access Methods Projection π Transaction Set Operations ∪ , \ , × Manager Join � Recovery Buffer Manager Histograms Manager Lock Equi-Width Equi-Depth Manager Disk Space Manager Statistical Views DBMS data files, indices, . . . Database 2

Cardinality Estimation Cardinality Estimation Torsten Grust • A relational query optimizer performs a phase of cost-based plan search to identify the—presumably—“cheapest” Cardinality Estimation alternative among a a set of equivalent execution plans Database Profiles Assumptions ( ր Chapter on Query Optimization). Estimating Operator • Since page I/O cost dominates, the estimated cardinality Cardinality Selection σ of a (sub-)query result is crucial input to this search. Projection π Set Operations ∪ , \ , × • Cardinality typically measured in pages or rows. Join � Histograms • Cardinality estimates are also valuable when it comes to Equi-Width Equi-Depth buffer “right-sizing” before query evaluation starts ( e.g. , Statistical Views allocate B buffer pages and determine blocking factor b for external sort). 3

Cardinality Estimation Estimating Query Result Cardinality Torsten Grust There are two principal approaches to query cardinality estimation: 1 Database Profile. Maintain statistical information about numbers and sizes of Cardinality Estimation tuples , distribution of attribute values for base relations , as Database Profiles part of the database catalog (meta information) during Assumptions database updates . Estimating Operator Cardinality • Calculate these parameters for intermediate query Selection σ Projection π results based upon a (simple) statistical model during Set Operations ∪ , \ , × Join � query optimization. Histograms Equi-Width • Typically, the statistical model is based upon the Equi-Depth Statistical Views uniformity and independence assumptions . • Both are typically not valid , but they allow for � simple calculations ⇒ limited accuracy . • In order to improve accuracy, the system can record histograms to more closely model the actual value distributions in relations. 4

Cardinality Estimation Estimating Query Result Cardinality Torsten Grust 2 Sampling Techniques. Cardinality Estimation Gather the necessary characteristics of a query plan (base Database Profiles relations and intermediate results) at query execution time : Assumptions Estimating Operator • Run query on a small sample of the input. Cardinality • Extrapolate to the full input size. Selection σ Projection π Set Operations ∪ , \ , × Join � • It is crucial to find the right balance between sample Histograms Equi-Width size and the resulting accuracy. Equi-Depth Statistical Views These slides focus on 1 Database Profiles. 5

Cardinality Estimation Database Profiles Torsten Grust Keep profile information in the database catalog . Update whenever SQL DML commands are issued (database updates): Cardinality Estimation Typical database profile for relation R Database Profiles Assumptions Estimating Operator Cardinality | R | number of records in relation R Selection σ Projection π N R number of disk pages allocated for these records Set Operations ∪ , \ , × s ( R ) average record size (width) Join � Histograms V ( A , R ) number of distinct values of attribute A Equi-Width MCV ( A , R ) most common values of attribute A Equi-Depth Statistical Views MCF ( A , R ) frequency of most common values of attribute A . . . possibly many more 6

Cardinality Estimation Database Profiles: IBM DB2 Torsten Grust Excerpt of IBM DB2 catalog information for a TPC-H database 1 db2 => SELECT TABNAME, CARD, NPAGES 2 db2 (cont.) => FROM SYSCAT.TABLES Cardinality Estimation 3 db2 (cont.) => WHERE TABSCHEMA = ’TPCH’; Database Profiles Assumptions 4 TABNAME CARD NPAGES Estimating Operator 5 -------------- -------------------- -------------------- Cardinality Selection σ 6 ORDERS 1500000 44331 Projection π 7 CUSTOMER 150000 6747 Set Operations ∪ , \ , × 8 NATION 25 2 Join � 9 REGION 5 1 Histograms Equi-Width 10 PART 200000 7578 Equi-Depth 11 SUPPLIER 10000 406 Statistical Views 12 PARTSUPP 800000 31679 13 LINEITEM 6001215 207888 8 record(s) selected. 14 • Note : Column CARD ≡ | R | , column NPAGES ≡ N R . 7

Cardinality Estimation Database Profile: Assumptions Torsten Grust In order to obtain tractable cardinality estimation formulae, assume one of the following : Uniformity & independence (simple, yet rarely realistic) All values of an attribute uniformly appear with the same Cardinality Estimation probability (even distribution). Values of different attributes are Database Profiles Assumptions independent of each other. Estimating Operator Cardinality Selection σ Worst case (unrealistic) Projection π Set Operations ∪ , \ , × No knowledge about relation contents at all. In case of a Join � Histograms selection σ p , assume all records will satisfy predicate p . Equi-Width Equi-Depth (May only be used to compute upper bounds of expected cardinality.) Statistical Views Perfect knowledge (unrealistic) Details about the exact distribution of values are known. Requires huge catalog or prior knowledge of incoming queries. (May only be used to compute lower bounds of expected cardinality.) 8

Cardinality Estimation Cardinality Estimation for σ (Equality Predicate) Torsten Grust • Database systems typically operate under the uniformity assumption . We will come across this assumption multiple times below. Cardinality Estimation Database Profiles Assumptions Estimating Operator Query: Q ≡ σ A = c ( R ) Cardinality Selection σ Projection π Selectivity sel ( A = c ) MCF ( A , R )[ c ] if c ∈ MCV ( A , R ) Set Operations ∪ , \ , × Join � � Histograms Equi-Width Selectivity sel ( A = c ) 1 / V ( A , R ) Uniformity Equi-Depth Statistical Views Cardinality | Q | sel ( A = c ) · | R | Record size s ( Q ) s ( R ) 9

Cardinality Estimation Selectivity Estimation for σ (Other Predicates) Torsten Grust • Equality between attributes ( Q ≡ σ A = B ( R ) ): Approximate selectivity by sel ( A = B ) = 1 / max ( V ( A , R ) , V ( B , R )) . Cardinality Estimation (Assumes that each value of the attribute with fewer Database Profiles distinct values has a corresponding � Assumptions match in the other attribute.) Independence Estimating Operator Cardinality • Range selections ( Q = σ A > c ( R ) ): Selection σ Projection π In the database profile, maintain the minimum and Set Operations ∪ , \ , × Join � maximum value of attribute A in relation R , Low ( A , R ) and Histograms High ( A , R ) . Equi-Width � Equi-Depth Approximate selectivity by Uniformity Statistical Views sel ( A > c ) = High ( A , R ) − c  High ( A , R ) − Low ( A , R ) , Low ( A , R ) � c � High ( A , R )  0 , otherwise  10

Cardinality Estimation Cardinality Estimation for π Torsten Grust • For Q ≡ π L ( R ) , estimating the number of result rows is difficult ( L = � A 1 , A 2 , . . . , A n � : list of projection attributes): Cardinality Estimation Database Profiles Q ≡ π L ( R ) (duplicate elimination) Assumptions  Estimating Operator V ( A , R ) , if L = � A � Cardinality    Selection σ  | R | , if keys of R ∈ L  Projection π Cardinality | Q | Set Operations ∪ , \ , × | R | , no dup. elim. Join �    Histograms  � | R | , � � min A i ∈ L V ( A i , R ) , otherwise  Equi-Width � Equi-Depth Independence Statistical Views � Record size s ( Q ) A i ∈ L s ( A i ) 11

Cardinality Estimation Cardinality Estimation for ∪ , \ , × Torsten Grust Q ≡ R ∪ S | Q | | R | + | S | � s ( Q ) = s ( R ) = s ( S ) schemas of R , S identical Cardinality Estimation Database Profiles Assumptions Q ≡ R \ S Estimating Operator Cardinality Selection σ Projection π max ( 0 , | R | − | S | ) | Q | | R | � � Set Operations ∪ , \ , × Join � s ( Q ) = s ( R ) = s ( S ) Histograms Equi-Width Equi-Depth Q ≡ R × S Statistical Views | Q | = | R | · | S | s ( Q ) = s ( R ) + s ( S ) � V ( A , R ) , for A ∈ R V ( A , Q ) = V ( A , S ) , for A ∈ S 12

Chapter 9 Cardinality Estimation How Many Rows Does a Query Yield? - PowerPoint PPT Presentation

Cardinality Estimation Torsten Grust Chapter 9 Cardinality Estimation How Many Rows Does a Query Yield? Cardinality Estimation Database Profiles Assumptions Architecture and Implementation of Database Systems Estimating Operator Summer

Topics 11/13/2006 Chapter 11, start Chapter 12 11/20/2006 Chapter 12 11/27/2006 Chapter 13

Topics 11/13/2006 Chapter 11, start Chapter 12 11/20/2006 Chapter 12 Inheritance Concepts

Chapter 13 Chapter 13 1 What is this? Chapter 13 2 What is this? Chapter 13 3 What is

CHAPTER CHAPTER VII CHAPTER CHAPTER VII VII VII MANAGEMENT AND MANAGEMENT AND

Appendix A Chapter 9 versus Chapter 1 1 at a Glance Chapter 9 Chapter 1 1 ( I n) voluntary Cannot

CHAPTER II I CHAPTER I Recurrent Neural Networks Recurrent Neural Networks CHAPTER II : I :

Pushdown Automata Chapter 5 Chapter 5 Chapter 5 Chapter 5

Chapter 6 Programme design and development Lets Recap Chapter 2: Chapter 3: Chapter 1:

OWASP London Chapter Meeting 27th July 2017 London Chapter Chapter Leaders: Sam

Constraint Satisfaction Problem s C t i t S ti f ti P bl Reading: Chapter 6 (3 rd ed );

Chapter 3 Chapter 3 Data Description McGraw-Hill, Bluman, 7 th ed, Chapter 3 1 Ch Chapter 3

OWASP London Chapter Meeting 23rd November 2017 London Chapter Chapter Leaders: Sam

A.I.S. Class 22: Outline I Learning Objectives for Chapter 8 I Chapter 8 Quiz I New ACCESS Features

A.I.S. Class 27: Outline I Learning Objectives for Chapter 8 I Chapter 8 Quiz I New ACCESS Features

Chapters for the Final Exam Chapter 20: Electric forces and fields (Conceptual Questions) Chapter

Chapter: 9 9 9 9 Chapter: Chapter: Chapter: High-Speed Downlink High-Speed Downlink Packet

Take your Wor ord tem emplates s to o the he ne next xt le level! l! Doc ocuments

CS145: INTRODUCTION TO DATA MINING Text Data: Topic Model Instructor: Yizhou Sun

1 Automating Machine Learning and Deep Learning Workflows 2 Information Name: Mourad

HOW TO SET UP A COMMUNITY-BASED ENVIRONMENTAL HEALTH SENSING PROJECT Dawn Nafus @dawnnafus

(R)DBMSs today DMBS Approx. market share (Microsoft Access) Database Systems Oracle

Trivariate Density Revisited Steven E. Shreve Carnegie Mellon University Conference in

CLIC CLient-Informed Caching for Storage Servers Xin Liu Ashraf Aboulnaga Ken Salem Xuhui Li

Dr. Pat Selinger IBM Fellow and VP of Data Management Architecture and Technology IBM, USA DB2