When to Optimize: Enumerating All Possible Plans




When to Optimize

The heuristic pipeline:
1. Selection Pushdown: always push down selections
2. Join Conversion: always convert joins
3. Join Reordering: which join order???
4. Pick a Join Algo: which join algo?

Which Plan is the Best? What makes a plan the best?
- Idea 1: CPU Cost
- Idea 2: IO Cost
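The first two pipeline steps are mechanical rewrites. A minimal sketch, assuming an invented nested-tuple plan encoding (none of these names come from a real optimizer), of pushing a selection below a cross product:

```python
# Plans are nested tuples: ("scan", name), ("cross", left, right),
# ("select", table_the_predicate_touches, child). This encoding is
# made up purely for illustration.

def push_down(plan):
    """Rewrite select(cross(R, S)) so the selection filters only the
    input it references, before the expensive cross product runs."""
    if plan[0] == "select" and plan[2][0] == "cross":
        _, table, (_, left, right) = plan
        if left == ("scan", table):
            return ("cross", ("select", table, left), right)
        if right == ("scan", table):
            return ("cross", left, ("select", table, right))
    return plan

before = ("select", "R", ("cross", ("scan", "R"), ("scan", "S")))
after = push_down(before)
# the selection now sits directly above the scan of R
```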

IO Cost

How do we measure IO Cost?
- Number of reads performed by each operator
- Number of writes performed by each operator

What about communicating between operators?
- Assume operators can communicate with each other for free.
- Costs only include:
  - The cost of materializing the data IF it needs to be materialized on disk
  - The cost of reading the data back in IF it needs to be read back in

What else do we need?
- For some of these estimates, we'll need to be able to estimate the size of each table (call the # of pages in R: |R|)
- Basic properties of the data: Key Columns, Distribution of Values

IO Costs by operator:
- File Scan (R): |R| IOs
- Selection (σ(R)): 0 IOs (never need to materialize a selection). How big is |σ(R)|? Return to it later.
- Index Lookup (σ(R) where R is a file scan):
  - Hash Index: |σ(R)| IOs
  - B+Tree Index with directory pages of size B: |σ(R)| + logB(|R|) IOs
- Projection (π(R)): 0 IOs (never need to materialize a projection)
- Union (R ∪ S): 0 IOs (never need to materialize a BAG union; see distinct for set union)
- Sort (τ(R)) — External Sort with B pages of memory:
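The basic operator costs can be sketched as small Python helpers. This is a rough model, not a benchmark; the external-sort line uses the standard ~2·|R|·logB(|R|/B) estimate, and all sizes are page counts:

```python
import math

# Rough per-operator IO-cost formulas. |R| = page count of R,
# sel_pages = page count of the selected output.

def file_scan(pages):
    return pages                              # |R|

def selection(pages):
    return 0                                  # never materialized

def hash_index_lookup(sel_pages):
    return sel_pages                          # |sigma(R)|

def btree_index_lookup(sel_pages, pages, B):
    return sel_pages + math.log(pages, B)     # |sigma(R)| + log_B(|R|)

def external_sort(pages, B):
    # standard estimate: every merge pass reads and writes all |R| pages,
    # and B-way merging needs about log_B(|R|/B) passes
    return 2 * pages * math.log(pages / B, B)
```

For example, sorting a 1,000-page table with 10 buffer pages costs about 2·1000·log10(100) = 4,000 IOs under this estimate.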


Number of IOs: ~2·|R|·logB(|R|/B)

Cross-Product (R × S) — BNLJ with B pages of memory for blocking R
- Need to write all of S to disk once: |S| pages
- Read B pages of data from source operator R: free
- Join the block with the materialized data in S, one tuple at a time: |S|
- Repeat (|R| / B) times…
- Number of IOs: |S| + (|R| / B)·|S|

Join (R ⋈ S) — 1-pass Hash/Tree Join
- Number of IOs: 0 (entirely in-memory)

Join (R ⋈ S) — 2-pass Hash Join
- Write all of R and S to disk, bucketizing: |R| + |S|
- Read in each bucket: |R| + |S|
- Number of IOs: 2·(|R| + |S|)

Join (τ(R) ⋈ τ(S)) — Sort/Merge Join
- Number of IOs: 0 + the cost of τ(R) and τ(S) (the merge step is free)

Join (R ⋈R.A = S.A S) — Index Nested Loop Join (assuming an index on S)
- Each inner loop is basically one index scan
- Number of IOs: |R| · [ cost of one index lookup: σS.A = const(S) ]

Aggregation (ɣ(R)) — In-Memory
- Number of IOs: 0

Aggregation (ɣ(R)) — On-Disk, Hash-Based
- Write each bucket out, read each bucket in
- Number of IOs: 2·|R|

Aggregation (ɣ(τ(R))) — On-Disk, Sort-Based
- Number of IOs: 0 + the cost of τ(R)

Distinct (δ(R)) — works EXACTLY like Aggregation

Most of the operators' output sizes are straightforward:
- π(R), τ(R): |R|
- R ∪ S: |R| + |S|
- R × S: |R| · |S|
- R ⋈ S: identical to σ(R × S)…

Some are hard: σ(R), ɣ(R) & δ(R)
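These join and aggregation costs can be sketched the same way (a rough model; sizes in pages, with sort costs passed in precomputed rather than derived here):

```python
# Join / aggregation IO-cost formulas; B = available memory pages.

def bnlj_cross_product(pages_R, pages_S, B):
    # write S once, then rescan it for each of the |R|/B blocks of R
    return pages_S + (pages_R / B) * pages_S

def one_pass_join(pages_R, pages_S):
    return 0                                  # entirely in memory

def two_pass_hash_join(pages_R, pages_S):
    return 2 * (pages_R + pages_S)            # bucketize out, read back in

def sort_merge_join(sort_R_cost, sort_S_cost):
    return sort_R_cost + sort_S_cost          # the merge step itself is free

def index_nested_loop_join(tuples_R, lookup_cost):
    return tuples_R * lookup_cost             # one index lookup per R tuple

def hash_aggregate_on_disk(pages_R):
    return 2 * pages_R                        # write and re-read each bucket
```

For instance, a 2-pass hash join of a 100-page and a 300-page table is estimated at 2·(100 + 300) = 800 IOs.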

Cardinality (Size) Estimation

Selection: Compute Selectivity (or % of tuples passed through)

R.A = [Const]
- Generic (Default) Heuristic: Selectivity = 0.5
  - Works… mostly well, 70% of the time. Very brittle and liable to break things.
  - Be wary: DBMSes actually do this!
- Idea: Collect stats: # of distinct values
  - Selectivity = 1 / (# of distinct values of R.A)
  - Given that, if R.A is a Key, then precisely 1 tuple passes through.
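The two equality estimates sketched in Python (function names are invented for illustration):

```python
# Selectivity of R.A = const: 0.5 as the stat-free default,
# 1/#distinct when distinct-value statistics are available.

def eq_selectivity(distinct_values=None):
    if distinct_values is None:
        return 0.5                    # generic default heuristic; brittle!
    return 1.0 / distinct_values      # 1 / # of distinct values of R.A

def estimated_matches(num_tuples, distinct_values=None):
    return num_tuples * eq_selectivity(distinct_values)

# If R.A is a key, # of distinct values == # of tuples, so exactly one
# tuple is expected to pass:
key_estimate = estimated_matches(10_000, distinct_values=10_000)
```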


The distinct-values stat works well… but only for discrete data (Strings, Ints, Dates). It also gives you "Key" for free, and also works for R.A IN [List].

R.A < [Const] (also works for other inequalities)
- Idea: Collect stats: Min/Max, and assume a uniform distribution of values
- Selectivity = ([Const] - Min) / (Max - Min)
- Works for continuous data (Floats)

R.A = R.B (the Equijoin condition)
- Idea 1: Assume no correlation
- For each row, you're testing whether R.B = some specific, somewhat arbitrary value
- The condition becomes identical to either R.A = const or R.B = const
- Both are an upper bound on the selectivity, so take whichever reduction gives you the lower value

C1 AND C2
- Assuming no correlation between C1 and C2: Selectivity(C1) · Selectivity(C2)
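A sketch of these three rules, each under the stated uniformity / no-correlation assumptions:

```python
# Selectivity estimates for range, equijoin, and conjunction predicates.

def range_selectivity(const, min_val, max_val):
    # R.A < const, with values assumed uniform on [min_val, max_val]
    return (const - min_val) / (max_val - min_val)

def equijoin_selectivity(distinct_A, distinct_B):
    # R.A = R.B: either side's 1/#distinct is an upper bound, so take
    # whichever reduction gives the lower value
    return min(1.0 / distinct_A, 1.0 / distinct_B)

def and_selectivity(sel_c1, sel_c2):
    # C1 AND C2, assuming no correlation between C1 and C2
    return sel_c1 * sel_c2
```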

Going more fancy: Histograms (See attached)
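As a taste of the histogram approach, here is a sketch of an equi-width histogram estimator for R.A < const; the bucket layout and counts are made-up sample data:

```python
def histogram_selectivity(bucket_counts, low, width, const):
    """Estimate selectivity of R.A < const from an equi-width histogram,
    assuming a uniform distribution *within* each bucket."""
    total = sum(bucket_counts)
    passed = 0.0
    for i, count in enumerate(bucket_counts):
        b_low = low + i * width
        b_high = b_low + width
        if const >= b_high:                    # whole bucket qualifies
            passed += count
        elif const > b_low:                    # bucket partially qualifies
            passed += count * (const - b_low) / width
    return passed / total

# 4 buckets covering [0, 40) with counts 10, 30, 40, 20: for R.A < 20,
# the first two buckets qualify, so selectivity = (10 + 30) / 100
sel = histogram_selectivity([10, 30, 40, 20], 0, 10, 20)
```

Unlike the flat Min/Max estimate, the histogram captures skew: buckets where values cluster contribute proportionally more tuples.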