Depth Estimation for Ranking Query Optimization

Karl Schnaitter, UC Santa Cruz
Joshua Spiegel, BEA Systems, Inc.
Neoklis Polyzotis, UC Santa Cruz

Relational Ranking Queries
    SELECT h.hid, r.rid, e.eid
    FROM Hotels h, Restaurants r, Events e
    WHERE h.city = r.city AND r.city = e.city
    RANK BY 0.3/h.price + 0.5*r.rating + 0.2*isMusic(e)
    LIMIT 10
• A base score for each table in [0,1]
  – Hotels: b_H(h) = 1/h.price
  – Restaurants: b_R(r) = r.rating
  – Events: b_E(e) = isMusic(e)
• Combined with a scoring function S: S(b_H, b_R, b_E) = 0.3*b_H + 0.5*b_R + 0.2*b_E
• Return top k results based on S; in this case, k = 10
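To make the query semantics concrete, here is a minimal Python sketch that evaluates the example by brute force: join on city, apply S to the three base scores, and keep the top k. The rows, the `b_e` stand-in for isMusic, and the assumption that prices are at least 1 (so that 1/price stays in [0,1]) are all made up for illustration and do not come from the presentation.

```python
import heapq
from itertools import product

# Hypothetical rows (id, city, attribute); not data from the presentation.
hotels = [("h1", "SF", 2.0), ("h2", "SF", 4.0), ("h3", "LA", 1.0)]   # price >= 1
restaurants = [("r1", "SF", 0.9), ("r2", "LA", 0.5)]                  # rating in [0,1]
events = [("e1", "SF", "music"), ("e2", "SF", "sports"), ("e3", "LA", "music")]

def b_h(price):
    return 1.0 / price                      # base score for Hotels

def b_r(rating):
    return rating                           # base score for Restaurants

def b_e(kind):
    return 1.0 if kind == "music" else 0.0  # stand-in for isMusic(e)

def S(bh, br, be):
    return 0.3 * bh + 0.5 * br + 0.2 * be   # scoring function from the RANK BY clause

def top_k(k):
    # Join all three tables on city, score every join result, keep the k best.
    joined = (
        (S(b_h(price), b_r(rating), b_e(kind)), hid, rid, eid)
        for (hid, hc, price), (rid, rc, rating), (eid, ec, kind)
        in product(hotels, restaurants, events)
        if hc == rc == ec
    )
    return heapq.nlargest(k, joined)

best = top_k(10)
for total, hid, rid, eid in best:
    print(f"{total:.3f} {hid} {rid} {eid}")
```

This reads and joins every tuple before ranking, which is exactly the cost that a rank-aware plan tries to avoid.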

Ranking Query Execution
    SELECT h.hid, r.rid, e.eid
    FROM Hotels h, Restaurants r, Events e
    WHERE h.city = r.city AND r.city = e.city
    RANK BY 0.3/h.price + 0.5*r.rating + 0.2*isMusic(e)
    LIMIT 10
[Figure: a conventional plan built from Join operators beside a rank-aware plan built from Rank join operators over inputs ordered by score]

Depth Estimation
• Depth: number of accessed tuples
  – Indicates execution cost
  – Linked to memory consumption
• Estimate depths for each operator in a rank-aware plan
[Figure: a rank join operator]

Depth Estimation Methods
• Ilyas et al. (SIGMOD 2004)
  – Uses probabilistic model of data
  – Assumes relations of equal size and a scoring function that sums scores
  – Limited applicability
• Li et al. (SIGMOD 2005)
  – Samples a subset of rows from each table
  – Independent samples give a poor model of join results

Our Solution: DEEP
• DEpth Estimation for Physical plans
• Strengths of DEEP
  – A principled methodology
    • Uses statistical model of data distribution
    • Formally computes depth over statistics
  – Efficient estimation algorithms
  – Widely applicable
    • Works with state-of-the-art physical plans
    • Realizable with common data synopses

Outline
• Preliminaries
• DEEP Framework
• Experimental Results

Monotonic Functions
• A function f(x_1, ..., x_n) is monotonic if ∀i (x_i ≤ y_i) ⇒ f(x_1, ..., x_n) ≤ f(y_1, ..., y_n)
[Figure: plot of a monotonic function f(x) against x]

Monotonic Functions
• A function f(x_1, ..., x_n) is monotonic if ∀i (x_i ≤ y_i) ⇒ f(x_1, ..., x_n) ≤ f(y_1, ..., y_n)
• Most scoring functions are monotonic
  – E.g. sum, product, avg, max, min
• Monotonicity enables bound on score
  – In example query, score was 0.3/h.price + 0.5*r.rating + 0.2*isMusic(e)
  – Given a restaurant r, upper bound is 0.3*1 + 0.5*r.rating + 0.2*1
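The upper bound for a known restaurant can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and it relies on the stated assumption that every base score lies in [0,1], so 1.0 can stand in for each unknown base score.

```python
def score(b_h, b_r, b_e):
    # Scoring function of the example query, applied to base scores in [0, 1].
    return 0.3 * b_h + 0.5 * b_r + 0.2 * b_e

def upper_bound_given_restaurant(rating):
    # Monotonicity: replacing the unknown hotel and event base scores by their
    # maximum value 1.0 can only raise the result, so this bounds every join
    # result that contains this restaurant.
    return score(1.0, rating, 1.0)

bound = upper_bound_given_restaurant(0.8)

# Spot check on a small grid of possible hotel and event base scores.
grid = [i / 10 for i in range(11)]
holds = all(score(bh, 0.8, be) <= bound for bh in grid for be in grid)
print(round(bound, 3), holds)
```

For a restaurant with rating 0.8 the bound is 0.3 + 0.4 + 0.2, roughly 0.9, and no combination of hotel and event scores in the grid exceeds it.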

Hash Rank Join [IAE04]
• The HRJN algorithm
  – Joins inputs sorted by score
  – Returns results with highest score
• Main ideas
  – Alternate between inputs based on a pull strategy
  – Score bounds allow early termination
Example query: top result from L ⋈ R with scoring function S(b_L, b_R) = b_L + b_R
    L: a   b_L        R: a   b_R
       x   1.0           y   1.0
       y   0.8           z   0.9
       w   0.7
    Bounds shown: 1.8 and 1.7
    Result: y, score 1.8

HRJN* [IAE04]
• The HRJN* pull strategy:
  a) Pull from the input with highest bound
  b) If (a) is a tie, pull from input with the smaller number of pulls so far
  c) If (b) is a tie, pull from the left
Example query: top result from L ⋈ R with scoring function S(b_L, b_R) = b_L + b_R
    L: a   b_L        R: a   b_R
       x   1.0           y   1.0
       y   0.8           z   0.9
       w   0.7
    Bounds shown as pulls proceed: 2.0, 1.8 and 2.0, 1.9, 1.7
    Result: y? (score 1.8)
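The pull strategy and the early-termination test can be sketched in Python as follows. This is a simplified illustration rather than the operator from [IAE04]: it assumes base scores in [0,1] (so 1.0 stands in for a score that has not been read yet), treats an exhausted input as having bound negative infinity, and assumes L and R contain only the rows shown on the slide. The function name `hrjn_star` and its return shape are hypothetical.

```python
import heapq

def hrjn_star(left, right, score, k=1):
    """left, right: lists of (join_key, base_score) sorted by base_score descending."""
    inputs = (left, right)
    pos = [0, 0]            # depth: number of tuples pulled from each input
    seen = ({}, {})         # per input: join_key -> base scores read so far
    buffered = []           # max-heap of join results (scores negated)
    output = []
    NEG_INF = float("-inf")

    def bounds():
        # Assumption: 1.0 replaces the top/last score of an input not yet read.
        top = [inp[0][1] if pos[i] else 1.0 for i, inp in enumerate(inputs)]
        last = [inp[pos[i] - 1][1] if pos[i] else 1.0 for i, inp in enumerate(inputs)]
        # Bound on results that use a not-yet-read tuple of each input.
        b_left = score(last[0], top[1]) if pos[0] < len(left) else NEG_INF
        b_right = score(top[0], last[1]) if pos[1] < len(right) else NEG_INF
        return b_left, b_right

    while len(output) < k:
        b_left, b_right = bounds()
        threshold = max(b_left, b_right)
        # Emit buffered results that no unread tuple can beat.
        while buffered and len(output) < k and -buffered[0][0] >= threshold:
            neg, key = heapq.heappop(buffered)
            output.append((key, -neg))
        if len(output) >= k or threshold == NEG_INF:
            break
        # Pull strategy: (a) highest bound, (b) fewer pulls so far, (c) left.
        if b_left != b_right:
            side = 0 if b_left > b_right else 1
        elif pos[0] != pos[1]:
            side = 0 if pos[0] < pos[1] else 1
        else:
            side = 0
        key, b = inputs[side][pos[side]]
        pos[side] += 1
        seen[side].setdefault(key, []).append(b)
        # Hash-join the new tuple against what was read from the other input.
        for other in seen[1 - side].get(key, []):
            s = score(b, other) if side == 0 else score(other, b)
            heapq.heappush(buffered, (-s, key))
    return output, tuple(pos)

L = [("x", 1.0), ("y", 0.8), ("w", 0.7)]
R = [("y", 1.0), ("z", 0.9)]
results, depths = hrjn_star(L, R, lambda bl, br: bl + br, k=1)
print(results, depths)
```

On this data the sketch returns y as the top result after reading two tuples from each input, so the tuple w is never accessed. The returned depths are the quantities that depth estimation tries to predict without running the operator.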

Supported Operators
• Evidence in favor of HRJN*
  – Pull strategy has strong properties
    • Within constant factor of optimal cost
    • Optimal for a significant class of inputs
    • More details in the paper
  – Efficient in experiments [IAE04]
• DEEP explicitly supports HRJN*
  – Easily extended to other join operators
  – Selection operators too

DEEP: Conceptual View
[Figure: Depth Computation is defined in terms of the Statistical Data Model; Estimation Algorithms are defined in terms of the Statistics Interface, shown alongside the Data Synopsis]

Statistics Model
• Statistics yield the distribution of scores for base tables and joins
    F_L:   b_L   F_L(b_L)
           1.0   5
           0.9   2
           0.8   3
           0.6   12
           0.4   8
    F_R:   b_R   F_R(b_R)
           1.0   3
           0.7   1
           0.5   2
    F_{L⋈R}:   b_L   b_R   F_{L⋈R}(b_L, b_R)
               1.0   1.0   6
               1.0   0.5   4
               1.0   0.7   3
               0.9   0.7   2
               0.6   0.7   2

Statistics Interface
• DEEP accesses statistics with two methods
  – getFreq(b): Return frequency of b
  – nextScore(b, i): Return next lowest score on dimension i
    b_L   b_R   F_{L⋈R}(b_L, b_R)
    1.0   1.0   6
    1.0   0.5   4
    1.0   0.7   3
    0.9   0.7   2
    0.6   0.7   2
  For b = (1.0, 0.7): getFreq(b) = 3, nextScore(b, 1) = 0.9, nextScore(b, 2) = 0.5
• The interface allows for efficient algorithms
  – Abstracts the physical statistics format
  – Allows statistics to be generated on-the-fly
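A toy version of this interface over an explicit frequency table might look like the Python sketch below. The class name `JoinStats` and the method names `get_freq` and `next_score` are hypothetical, the dictionary stands in for whatever synopsis a real system would use, and dimensions are numbered from 0 here although the slide counts from 1.

```python
class JoinStats:
    """Toy statistics over score vectors, backed by a plain dict (assumed layout)."""

    def __init__(self, freqs):
        # freqs maps a score vector such as (b_L, b_R) to its frequency.
        self.freqs = dict(freqs)

    def get_freq(self, b):
        # Frequency of score vector b; 0 if no tuple has exactly these scores.
        return self.freqs.get(tuple(b), 0)

    def next_score(self, b, i):
        # Largest score on dimension i (0-based) that is strictly below b[i],
        # or None when b[i] is already the lowest score on that dimension.
        lower = [vec[i] for vec in self.freqs if vec[i] < b[i]]
        return max(lower) if lower else None

stats = JoinStats({
    (1.0, 1.0): 6,
    (1.0, 0.5): 4,
    (1.0, 0.7): 3,
    (0.9, 0.7): 2,
    (0.6, 0.7): 2,
})
b = (1.0, 0.7)
print(stats.get_freq(b), stats.next_score(b, 0), stats.next_score(b, 1))
```

With the join table from the slide, the three calls give 3, 0.9, and 0.5, matching the values shown for the vector (1.0, 0.7).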

Statistics Implementation
• Interface can be implemented over common types of data synopses
• Can use a histogram if
  a) Base score function is invertible, or
  b) Base score measures distance
• Assume uniformity & independence if
  a) Base score function is too complex, or
  b) Sufficient statistics are not available

Depth Estimation Overview
[Figure: top-k query plan with rank join ⋈1 (depths l_1, r_1) above rank join ⋈2 (depths l_2, r_2) over inputs A, B, and C]
    Estimates made                                         Value
    1. Score of the k-th best tuple out of ⋈1              s_1
    2. Depths of ⋈1 needed to output score of s_1          l_1 and r_1
    3. Score of the l_1-th best tuple out of ⋈2            s_2
    4. Depths of ⋈2 needed to output score of s_2          l_2 and r_2

Estimating Terminal Score
• Suppose we want the 10th best score
• Idea
  – Sort by total score
  – Sum frequencies
    b_L   b_R   F_{L⋈R}(b_L, b_R)
    1.0   1.0   6
    1.0   0.5   4
    1.0   0.7   3
    0.9   0.7   2
    0.6   0.7   2
    b_L + b_R   F_{L⋈R}(b_L, b_R)   sum
    2.0         6                   6
    1.7         3                   9
    1.6         2                   11
    1.5         4
    1.3         2
    S_term = 1.6
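The sort-and-sum idea translates directly into a short Python sketch. The function name `terminal_score` and the dictionary layout are assumptions made for illustration; the frequencies are the ones in the table above.

```python
def terminal_score(freqs, score, k):
    # Sort score vectors by total score, highest first.
    rows = sorted(((score(*b), f) for b, f in freqs.items()), reverse=True)
    running = 0
    for total, freq in rows:
        running += freq          # cumulative number of join results so far
        if running >= k:
            return total         # score of the k-th best join result
    return None                  # fewer than k join results exist

join_freqs = {
    (1.0, 1.0): 6,
    (1.0, 0.5): 4,
    (1.0, 0.7): 3,
    (0.9, 0.7): 2,
    (0.6, 0.7): 2,
}
s_term = terminal_score(join_freqs, lambda bl, br: bl + br, k=10)
print(round(s_term, 3))
```

The cumulative sums are 6, 9, and 11, so the 10th best result falls in the group with total score 1.6.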

Estimation Algorithm
• Idea: Only process necessary statistics
[Figure: F_{L⋈R} laid out as a grid over the two score dimensions]
               b_R = 1.0   0.7   0.5
    b_L = 1.0        6      3     4
          0.9               2
          0.8
          0.6               2
    S_term = 1.6
• Algorithm relies solely on getFreq and nextScore
  – Avoids materializing complete table
• Worst-case complexity equivalent to sorting table
  – More efficient in practice
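One way to process only the necessary statistics is a best-first walk over the score grid that touches the statistics through the two interface methods alone. The Python sketch below is an assumption about how such a traversal could look, not the algorithm from the paper. The names `GridStats`, `top`, `get_freq`, and `next_score` are hypothetical, and the walk relies on the scoring function being monotonic.

```python
import heapq

class GridStats:
    """Toy statistics with a call counter (assumed layout, for illustration)."""

    def __init__(self, freqs):
        self.freqs = dict(freqs)
        dims = len(next(iter(self.freqs)))
        # Distinct scores per dimension, highest first.
        self.scores = [sorted({b[i] for b in self.freqs}, reverse=True)
                       for i in range(dims)]
        self.calls = 0

    def top(self):
        return tuple(s[0] for s in self.scores)

    def get_freq(self, b):
        self.calls += 1
        return self.freqs.get(b, 0)

    def next_score(self, b, i):
        lower = [s for s in self.scores[i] if s < b[i]]
        return lower[0] if lower else None

def terminal_score(stats, score, k):
    start = stats.top()
    frontier = [(-score(*start), start)]   # max-heap on total score
    visited = {start}
    counted = 0
    while frontier:
        neg, b = heapq.heappop(frontier)
        counted += stats.get_freq(b)
        if counted >= k:
            return -neg
        # With a monotonic score, each successor scores no higher than b,
        # so grid points leave the heap in non-increasing score order.
        for i in range(len(b)):
            nxt = stats.next_score(b, i)
            if nxt is None:
                continue
            succ = b[:i] + (nxt,) + b[i + 1:]
            if succ not in visited:
                visited.add(succ)
                heapq.heappush(frontier, (-score(*succ), succ))
    return None

stats = GridStats({(1.0, 1.0): 6, (1.0, 0.5): 4, (1.0, 0.7): 3,
                   (0.9, 0.7): 2, (0.6, 0.7): 2})
s_term = terminal_score(stats, lambda bl, br: bl + br, k=10)
print(round(s_term, 3), stats.calls)
```

On the example the walk stops after inspecting only part of the 3-by-3 grid of score combinations, whereas the sort-based approach has to look at every entry.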

Estimating Depth for HRJN*
• Depth of HRJN* is bounded below and above using the input score bounds
Example: S_term = 1.6
    b_L   F_L(b_L)        b_L + 1   F_L(b_L)
    1.0   5               2.0       5          > S_term
    0.9   2               1.9       2          > S_term
    0.8   3               1.8       3          > S_term
    0.6   12              1.6       4          = S_term
    0.4   8               1.4       8          < S_term
    11 ≤ depth ≤ 15
• Estimation algorithm
  – Access via getFreq and nextScore
  – Similar to estimation of S_term
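The example's range can be reproduced with a small Python sketch under one possible reading of the slide, which is an assumption rather than the paper's stated formula: the operator reads every tuple whose score bound exceeds S_term, may or may not read the tuples whose bound equals S_term, and in either case reads one further tuple before it can stop. The function name `depth_bounds` is hypothetical, and the frequencies are taken from the score-bound column of the example.

```python
def depth_bounds(bound_freqs, s_term):
    """bound_freqs: list of (score_bound, frequency) for one input."""
    total = sum(f for _, f in bound_freqs)
    above = sum(f for bound, f in bound_freqs if bound > s_term)
    at_or_above = sum(f for bound, f in bound_freqs if bound >= s_term)
    # Assumed reading: one extra pull in both cases, capped by the input size.
    return min(above + 1, total), min(at_or_above + 1, total)

# (b_L + 1, frequency) pairs from the score-bound column of the example.
left_bounds = [(2.0, 5), (1.9, 2), (1.8, 3), (1.6, 4), (1.4, 8)]
low, high = depth_bounds(left_bounds, s_term=1.6)
print(low, high)
```

Ten tuples have a bound above 1.6 and four more sit exactly at 1.6, which under this reading gives the range 11 to 15 shown on the slide.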

Experimental Setting
• TPC-H data set
  – Total size of 1 GB
  – Varying amount of skew
• Workloads of 250 queries
  – Top-10, top-100, top-1000 queries
  – One or two joins per query
• Error metric: [illegible]

Depth Estimation Techniques
• DEEP
  – Uses 150 KB TuG synopsis [SP06]
• Probabilistic [IAE04]
  – Uses same TuG synopsis
  – Modified to handle single-join queries with varying table sizes
• Sampling [LCIS05]
  – 5% sample = 4.6 MB
