Depth Estimation for Ranking Query Optimization - - PowerPoint PPT Presentation

depth estimation for ranking query optimization
SMART_READER_LITE
LIVE PREVIEW

Depth Estimation for Ranking Query Optimization - - PowerPoint PPT Presentation

Depth Estimation for Ranking Query Optimization KarlSchnaitter,UCSantaCruz JoshuaSpiegel,BEASystems,Inc. NeoklisPolyzotis,UCSantaCruz Relational Ranking Queries SELECT h.hid, r.rid, e.eid FROM Hotels h,


slide-1
SLIDE 1

Depth Estimation for Ranking Query Optimization

KarlSchnaitter,UCSantaCruz JoshuaSpiegel,BEASystems,Inc. NeoklisPolyzotis,UCSantaCruz

slide-2
SLIDE 2

2

Relational Ranking Queries

  • A foreachtablein[0,1]
  • Combinedwitha

(,,)=0.3* +0.5* +0.2*

  • Returntopresultsbasedon

Inthiscase, =10

RANK BY 0.3/h.price + 0.5*r.rating + 0.2*isMusic(e) LIMIT 10 (e) = isMusic(e) Events: (r) = r.rating Restaurants: (h) = 1/h.price Hotels: SELECT h.hid, r.rid, e.eid FROM Hotels h, Restaurants r, Events e WHERE h.city = r.city AND r.city = e.city

slide-3
SLIDE 3

3

Ranking Query Execution

SELECT h.hid, r.rid, e.eid FROM Hotels h, Restaurants r, Events e WHERE h.city = r.city AND r.city = e.city RANK BY 0.3/h.price + 0.5*r.rating + 0.2*isMusic(e) LIMIT 10

  • conventionalplan

rank4awareplan

  • Ordered

by score

Rank join Rank join Join Join

slide-4
SLIDE 4

4

Depth Estimation

  • : numberofaccessedtuples

– Indicatesexecutioncost – Linkedtomemoryconsumption

  • Estimatedepthsfor

each operatorinarank4awareplan

  • Rank join
slide-5
SLIDE 5

5

Depth Estimation Methods

  • Ilyasetal.(SIGMOD2004)

– Usesprobabilisticmodelofdata – Assumesrelationsofequalsizeanda scoringfunctionthatsumsscores – Limitedapplicability

  • Lietal.(SIGMOD2005)

– Samplesasubsetofrowsfromeachtable – Independentsamplesgiveapoormodel

  • fjoinresults
slide-6
SLIDE 6

6

Our Solution: DEEP

  • pthstimationforhysicalplans
  • StrengthsofDEEP

– Aprincipled methodology

  • Uses statisticalmodelofdatadistribution
  • Formallycomputesdepthoverstatistics

– Efficientestimationalgorithms – Widelyapplicable

  • Workswith state4of4the4artphysicalplans
  • Realizablewithcommondatasynopses
slide-7
SLIDE 7

7

  • Preliminaries
  • DEEPFramework
  • ExperimentalResults

Outline

slide-8
SLIDE 8

8

  • Preliminaries
  • DEEPFramework
  • ExperimentalResults

Outline

slide-9
SLIDE 9

9

Monotonic Functions

x f(x)

  • Afunction(1,...,)is if

(≤y) (1,...,)≤ (y1,...,y)

slide-10
SLIDE 10

10

Monotonic Functions

  • Afunction(1,...,)is if

(≤y) (1,...,)≤ (y1,...,y)

  • Mostscoringfunctionsaremonotonic

– E.g.sum,product,avg,max,min

  • Monotonicityenablesboundonscore

– Inexamplequery,scorewas

0.3/h.price+0.5*r.rating+0.2*isMusic(e)

– Givenarestaurant,upperboundis

0.3*1 +0.5*r.rating+0.2*1

slide-11
SLIDE 11

11

Hash Rank Join [IAE04]

  • The algorithm

– Joinsinputssortedbyscore – Returnsresultswithhighestscore

  • Mainideas

– Alternatebetweeninputsbasedon! – Scoreboundsallowearlytermination

— — — 0.8 y 1.0 x bL a L — — — 0.7 w 0.9 z 1.0 y bR a R Query: Top result from L R with scoring function S(bL, bR) = bL + bR Result: y Score: 1.8

Bound: 1.8 Bound: 1.7

slide-12
SLIDE 12

12

HRJN* [IAE04]

  • TheHRJN*pullstrategy:

a) Pullfromtheinputwithhighestbound b) If(a)isatie,pullfrominputwiththe smallernumberofpullssofar c) If(b)isatie,pullfromtheleft

bL a L bR a R

Bound: 2.0 Bound: 2.0 1.8 1.9 1.7

x 1.0 y 0.8 y 1.0 z 0.9 w 0.7 Result: y Score: 1.8 Query: Top result from L R with scoring function S(bL, bR) = bL + bR

?

slide-13
SLIDE 13

13

  • Preliminaries
  • DEEPFramework
  • ExperimentalResults

Outline

slide-14
SLIDE 14

14

Supported Operators

EvidenceinfavorofHRJN*

– Pullstrategyhasstrongproperties

  • Withinconstantfactorofoptimalcost
  • Optimalforasignificantclassofinputs
  • Moredetailsinthepaper

– Efficientin experiments[IAE04]

DEEPexplicitlysupportsHRJN*

– Easilyextendedtootherjoinoperators – Selectionoperatorstoo

slide-15
SLIDE 15

15

DEEP: Conceptual View

Depth Computation Statistical Data Model

defined in terms of

  • Estimation

Algorithms Statistics Interface

defined in terms of

  • Data

Synopsis

slide-16
SLIDE 16

16

Statistics Model

  • Statisticsyieldthedistributionof

scoresforbasetablesandjoins

1.0 0.5 0.7 0.7 0.7 bR 6 4 3 2 2 FL

R(bL,bR)

1.0 1.0 1.0 0.9 0.6 bL 5 2 3 12 8 "(") 1.0 0.9 0.8 0.6 0.4 " 3 1 2 () 1.0 0.7 0.5

  • FR

FL FL R

slide-17
SLIDE 17

17

Statistics Interface

  • DEEPaccessesstatisticswithtwomethods

– #():Returnfrequencyof – (,):Returnnextlowestscoreondimension

#()=3 (,1)=0.9 (,2)=0.5

1.0 0.5 0.7 0.7 0.7 bR 6 4 3 2 2 FL R(bL,bR) 1.0 1.0 1.0 0.9 0.6 bL

  • The interfaceallowsforefficientalgorithms

– Abstractsthephysical statisticsformat – Allowsstatisticstobegeneratedon4the4fly

slide-18
SLIDE 18

18

Statistics Implementation

  • Interfacecanbeimplementedover

commontypesofdatasynopses

  • Canuseahistogramif

a) Basescorefunctionisinvertible,or b) Basescoremeasuresdistance

  • Assumeuniformity&independenceif

a) Basescorefunctionistoocomplex,or b) Sufficientstatisticsarenotavailable

slide-19
SLIDE 19

19

Depth Estimation Overview

Value Estimatesmade

2 2 1 1

1 2

Top4 queryplan A B C

l2 r2 r1 l1 s2 s1 Score of the kth best tuple out of s1 1. Depths of needed to output score of s1 l1 and r1 2. Score of the l1

th best

tuple out of s2 3. Depths of needed to output score of s2 l2 and r2 4.

slide-20
SLIDE 20

20

  • Idea

– Sortbytotalscore – Sumfrequencies

Estimating Terminal Score

  • Supposewewant

the10th bestscore

sum FL R(bL,bR) bL + bR 6 9 11 1.0 0.5 0.7 0.7 0.7 bR 6 4 3 2 2 FL

R(bL,bR)

1.0 1.0 1.0 0.9 0.6 bL 6 3 2 4 2 2.0 1.7 1.6 1.5 1.3 term =1.6

slide-21
SLIDE 21

21

Estimation Algorithm

1 0.7 0.5 6 4 1 0.9 0.8 0.6 3 2 2

  • Idea:Onlyprocessnecessarystatistics

1.0 0.5 0.7 0.7 0.7 bR 6 4 3 2 2 FL

R(bL,bR)

1.0 1.0 1.0 0.9 0.6 bL

  • Algorithmreliessolelyon# and

– Avoidsmaterializingcompletetable

  • Worst4casecomplexityequivalenttosortingtable

– Moreefficientinpractice

term =1.6

slide-22
SLIDE 22

22

Depth Estimation Overview

Value Estimatesmade

2 2 1 1

1 2

Top4 queryplan A B C

l2 r2 r1 l1 s2 s1 Score of the kth best tuple out of s1 1. Depths of needed to output score of s1 l1 and r1 2. Score of the l1

th best

tuple out of s2 3. Depths of needed to output score of s2 l2 and r2 4.

slide-23
SLIDE 23

23

Estimating Depth for HRJN*

Example: term =1.6

11≤ depth≤ 15 5 2 3 12 8 "(") 1.0 0.9 0.8 0.6 0.4 "

<Sterm < Sterm <Sterm <Sterm <Sterm =Sterm = Sterm =Sterm >Sterm > Sterm > Sterm >Sterm Input Score Bounds

≤ ≤ ≤ depthofHRJN*≤ $

  • $
  • Estimationalgorithm

– Accessvia# and – Similartoestimationof term

5 2 3 4 8 "(") 2.0 1.9 1.8 1.6 1.4 " + 1

slide-24
SLIDE 24

24

  • Preliminaries
  • DEEPFramework
  • ExperimentalResults

Outline

slide-25
SLIDE 25

25

Experimental Setting

  • TPC4Hdataset

– Totalsizeof1GB – Varyingamountofskew

  • Workloadsof250queries

– Top410,top4100,top41000queries – Oneortwojoinsperquery

  • Errormetric:%
slide-26
SLIDE 26

26

Depth Estimation Techniques

  • DEEP

– Uses 150KBTuGsynopsis[SP06]

  • Probabilistic[IAE04]

– UsessameTuGsynopsis – Modifiedtohandlesingle4joinqueries withvaryingtablesizes

  • Sampling[LCIS05]

– 5%sample=4.6MB

slide-27
SLIDE 27

27

Error for Varying Skew

342% 3% 12% 2% 18% 16% 44% 39% 489% 501% 297%

1 10 100 1000 0.5 1 1.5 z DEEP Probabilistic Sampling

Zipfian Skew Parameter Percentage Error

slide-28
SLIDE 28

28

Error at Each Input

697% 683% 696% 679% 284% 695% 644% 28% 28% 28% 28% 28% 25% 35%

1 10 100 1000 10000 1 2 3 4 5 6 7 Input DEEP Sampling

Input of Two-Join Query

Percentage Error

slide-29
SLIDE 29

29

Conclusions

  • Depthestimationisnecessarytooptimize

relationalrankingqueries

  • DEEPisaprincipledandpractical solution

– Takesdatadistributionintoaccount – Appliestomanycommonscenarios – Integrates withdatasummarizationtechniques

  • NewtheoreticalresultsforHRJN*
  • Nextsteps

– Accuracyguarantees – Datasynopsesforcomplexbasescores (especiallytextpredicates)

slide-30
SLIDE 30

30

Thank You

slide-31
SLIDE 31

31

Related Work

  • Selectivityestimationisasimilaridea
  • Itistheinverseproblem

L R L R

depth? depth? selectivity = k depth = |L| selectivity? depth = |R|

Selectivity Estimation Depth Estimation

slide-32
SLIDE 32

32

Other Features

  • DEEPcanbeextendedtoNLRJand

selectionoperators

  • DEEPcanbeextendedtootherpulling

strategies

– Block4basedHRJN* – Block4basedalternation

slide-33
SLIDE 33

33

Analysis of HRJN*

  • WithintheclassofallHRJNvariants:

–HRJN*isoptimalfor manycases

  • Withnoties ofscoreboundbetweeninputs
  • Withno tiesofscoreboundwithinoneinput

–HRJN*isinstanceoptimal