[PDF] - OverviewofQueryEvaluation Chapter12 PDF Document, free download

SLIDE 1

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 1

OverviewofQueryEvaluation

Chapter12

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 2

OverviewofQueryEvaluation

Plan: TreeofR.A.ops,withchoiceof alg foreachop.

✁

Eachoperatortypicallyimplementedusinga`pull’ interface:whenanoperatoris`pulled’forthenextoutput tuples,it`pulls’onitsinputsandcomputesthem.

Twomainissuesinqueryoptimization:

✁

Foragivenquery,whatplansareconsidered?

Algorithmtosearchplanspaceforcheapest(estimated)plan.

✁

Howisthecostofaplanestimated?

Ideally:Wanttofindbestplan.Practically:Avoid

worstplans!

WewillstudytheSystemRapproach.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 3

SomeCommonTechniques

Algorithmsforevaluatingrelationaloperators

usesomesimpleideasextensively:

✂

Indexing: CanuseWHEREconditionstoretrieve smallsetof tuples (selections,joins)

✂

Iteration: Sometimes,fastertoscanall tuples evenif thereisanindex.(Andsometimes,wecanscanthe dataentriesinanindexinsteadofthetableitself.)

✂

Partitioning: Byusingsortingorhashing,wecan partitiontheinput tuples andreplaceanexpensive

perationbysimilaroperationsonsmallerinputs.

*Watchforthesetechniquesaswediscussqueryevaluation!

SLIDE 2

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 4

StatisticsandCatalogs

Needinformationabouttherelationsandindexes

involved.Catalogs typicallycontainatleast:

✁

# tuples (NTuples)and#pages(NPages)foreachrelation.

✁

#distinctkeyvalues(NKeys)and NPages foreachindex.

✁

Indexheight,low/highkeyvalues(Low/High)foreach treeindex.

Catalogsupdatedperiodically.

✁

Updatingwheneverdatachangesistooexpensive;lotsof approximationanyway,soslightinconsistencyok.

Moredetailedinformation(e.g.,histogramsofthe

valuesinsomefield)aresometimesstored.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 5

AccessPaths

Anaccesspath isamethodofretrieving tuples:

✂

Filescan,orindex thatmatches aselection(inthequery)

Atreeindexmatches (aconjunctionof)termsthat

involveonlyattributesinaprefix ofthesearchkey.

✁

E.g.,Treeindexon<a,b,c>matchestheselection a=5

AND b=3,anda=5AND b>6,butnot b=3.

Ahashindexmatches (aconjunctionof)termsthat

hasatermattribute=value foreveryattributeinthe searchkeyoftheindex.

✁

E.g.,Hashindexon<a,b,c>matchesa=5AND b=3AND c=5;butitdoesnotmatch b=3,or a=5AND b=3,or a>5

AND b=3AND c=5.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 6

ANoteonComplexSelections

Selectionconditionsarefirstconvertedtoconjunctive

normalform(CNF): (day<8/9/94OR bid=5OR sid=3)AND (rname=‘Paul’OR bid=5OR sid=3)

Weonlydiscusscasewithno ORs;seetextifyouare

curiousaboutthegeneralcase. (day<8/9/94AND rname=‘Paul’)OR bid=5OR sid=3

SLIDE 3

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 7

OneApproachtoSelections

Findthemostselectiveaccesspath,retrieve tuples using

it,andapplyanyremainingtermsthatdon’tmatch theindex:

✁

Mostselectiveaccesspath:Anindexorfilescanthatwe estimatewillrequirethefewestpageI/Os.

✁

Termsthatmatchthisindexreducethenumberof tuples retrieved;othertermsareusedtodiscardsomeretrieved tuples,butdonotaffectnumberof tuples/pagesfetched.

✁

Considerday<8/9/94ANDbid=5AND sid=3. AB+tree indexon daycanbeused;then,bid=5 and sid=3mustbe checkedforeachretrieved tuple.Similarly,ahashindexon <bid, sid>couldbeused;day<8/9/94 mustthenbechecked.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 8

UsinganIndexforSelections

Costdependson#qualifying tuples,and

clustering.

✁

Costoffindingqualifyingdataentries(typicallysmall) pluscostofretrievingrecords(couldbelargew/o clustering).

✁

Inexample,assuminguniformdistributionofnames, about10%of tuples qualify(100pages,10000 tuples). Withaclusteredindex,costislittlemorethan100I/Os; if unclustered, upto 10000I/Os!

SELECT * FROM

ReservesR

WHERE

R.rname <‘C%’

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 9

Projection

Theexpensivepartisremovingduplicates.

✂

SQLsystemsdon’tremoveduplicatesunlessthekeyword DISTINCTisspecifiedinaquery.

SortingApproach:Sorton<sid,bid>andremove

duplicates.(Canoptimizethisbydroppingunwanted informationwhilesorting.)

HashingApproach:Hashon<sid,bid>tocreate

partitions.Loadpartitionsintomemoryoneata time,buildin-memoryhashstructure,andeliminate duplicates.

IfthereisanindexwithbothR.sid andR.bidinthe

searchkey,maybecheapertosortdataentries!

SELECT DISTINCT

R.sid,R.bid

FROM

ReservesR

SLIDE 4

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 10

Join:IndexNestedLoops

Ifthereisanindexonthejoincolumnofonerelation

(sayS),canmakeittheinnerandexploittheindex.

✁

Cost:M+((MpR)costoffindingmatchingS tuples)

ForeachR tuple,costofprobingSindexisabout1.2

forhashindex,2-4forB+tree.CostofthenfindingS tuples (assumingAlt.(2)or(3)fordataentries) dependsonclustering.

✁

Clusteredindex:1I/O(typical), unclustered: upto 1I/O permatchingS tuple. foreachtuple rinRdo foreachtuple sinSwhere ri == sj do add<r,s>toresult

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 11

ExamplesofIndexNestedLoops

Hash-index(Alt.2)on sid ofSailors(asinner):

✁

ScanReserves:1000pageI/Os,100*1000 tuples.

✁

ForeachReserves tuple:1.2I/Ostogetdataentryin index,plus1I/Otoget(theexactlyone)matchingSailors tuple.Total:220,000I/Os.

Hash-index(Alt.2)on sid ofReserves(asinner):

✁

ScanSailors:500pageI/Os,80*500 tuples.

✁

ForeachSailors tuple:1.2I/Ostofindindexpagewith dataentries,pluscostofretrievingmatchingReserves tuples.Assuminguniformdistribution,2.5reservations persailor(100,000/40,000).Costofretrievingthemis1or 2.5I/Osdependingonwhethertheindexisclustered.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 12

Join:Sort-Merge(RS)

SortRandSonthejoincolumn,thenscanthemtodo

a``merge’’(onjoincol.),andoutputresult tuples.

✁

AdvancescanofRuntilcurrentR-tuple >=currentS tuple, thenadvancescanofSuntilcurrentS-tuple >=currentR tuple;dothisuntilcurrentR tuple =currentS tuple.

✁

Atthispoint,allR tuples withsamevaluein Ri (currentR group)andallS tuples withsamevaluein Sj (currentS group)match;output<r,s>forallpairsofsuch tuples.

✁

ThenresumescanningRandS.

Risscannedonce;eachSgroupisscannedonceper

matchingR tuple.(MultiplescansofanSgroupare likelytofindneededpagesinbuffer.)

i=j

SLIDE 5

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 13

ExampleofSort-MergeJoin

Cost:MlogM+NlogN+(M+N)

✁

Thecostofscanning,M+N,couldbeM*N(veryunlikely!)

With35,100or300bufferpages,bothReservesand

Sailorscanbesortedin2passes;totaljoincost:7500. sid sname rating age 22 dustin 7 45.0 28 yuppy 9 35.0 31 lubber 8 55.5 44 guppy 5 35.0 58 rusty 10 35.0 sid bid day rname 28 103 12/4/96 guppy 28 103 11/3/96 yuppy 31 101 10/10/96 dustin 31 102 10/12/96 lubber 31 101 10/11/96 lubber 58 103 11/12/96 dustin

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 14

HighlightsofSystemROptimizer

Impact:

✁

Mostwidelyusedcurrently;workswellfor<10joins.

Costestimation:Approximateartatbest.

✁

Statistics,maintainedinsystemcatalogs,usedtoestimate costofoperationsandresultsizes.

✁

ConsiderscombinationofCPUandI/Ocosts.

PlanSpace:Toolarge,mustbepruned.

✁

Onlythespaceofleft-deepplansisconsidered.

Left-deepplansallowoutputofeachoperatortobepipelined into

thenextoperatorwithoutstoringitinatemporaryrelation.

✁

Cartesianproductsavoided.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 15

CostEstimation

Foreachplanconsidered,mustestimatecost:

✁

Mustestimatecost ofeachoperationinplantree.

Dependsoninputcardinalities.
We’vealreadydiscussedhowtoestimatethecostof
perations(sequentialscan,indexscan,joins,etc.)

✁

Mustalsoestimatesizeofresultforeachoperation intree!

Useinformationabouttheinputrelations.
Forselectionsandjoins,assumeindependenceof

predicates.

SLIDE 6

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 16

SizeEstimationandReductionFactors

Consideraqueryblock:
Maximum# tuples inresultistheproductofthe

cardinalitiesofrelationsintheFROM clause.

Reductionfactor(RF)associatedwitheach term reflects

theimpactoftheterm inreducingresultsize.Result cardinality =Max# tuples *productofall RF’s.

✁

Implicitassumption thatterms areindependent!

✁

Term col=valuehasRF1/NKeys(I),givenindexIon col

✁

Termcol1=col2hasRF1/MAX(NKeys(I1), NKeys(I2))

✁

Term col>valuehasRF(High(I)-value)/(High(I)-Low(I))

SELECT attributelist FROM relationlist WHERE term1AND ...AND termk

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 17

SchemaforExamples

Similartooldschema; rname addedforvariations.
Reserves:

✁

Each tuple is40byteslong,100 tuples perpage,1000pages.

Sailors:

✁

Each tuple is50byteslong,80 tuples perpage,500pages. Sailors(sid:integer, sname:string,rating:integer,age:real) Reserves(sid:integer,bid:integer,day:dates, rname:string)

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 18

MotivatingExample

Cost:500+500*1000I/Os
Bynomeanstheworstplan!
Missesseveralopportunities:

selectionscouldhavebeen `pushed’earlier,nouseismade

fanyavailableindexes,etc.
Goalofoptimization:Tofindmore

efficientplansthatcomputethe sameanswer.

SELECT S.sname FROM ReservesR,SailorsS WHERE R.sid=S.sid AND

R.bid=100AND S.rating>5

Reserves Sailors

sid=sid bid=100 rating>5 sname

Reserves Sailors

sid=sid bid=100 rating>5 sname

(SimpleNestedLoops) (On-the-fly) (On-the-fly)

RATree: Plan:

SLIDE 7

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 19

AlternativePlans1 (NoIndexes)

Maindifference:pushselects.
With5buffers,costofplan:
ScanReserves(1000)+writetempT1(10pages,ifwehave100boats,

uniformdistribution).

ScanSailors(500)+writetempT2(250pages,ifwehave10ratings).
SortT1(2*2*10),sortT2(2*3*250),merge(10+250)
Total:3560pageI/Os.
IfweusedBNLjoin, joincost=10+4*250,totalcost=2770.
Ifwe`push’projections,T1hasonly sid,T2only sid and sname:
T1fitsin3pages,costofBNLdropstounder250pages,total<2000.

Reserves Sailors

sid=sid bid=100 sname(On-the-fly) rating>5

(Scan; writeto tempT1) (Scan; writeto tempT2) (Sort-MergeJoin)

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 20

AlternativePlans2 WithIndexes

Withclusteredindexonbidof

Reserves,weget100,000/100= 1000 tuples on1000/100=10pages.

INLwithpipelining (outerisnot

materialized).

v Decisionnottopushrating>5beforethejoinisbasedon

availabilityof sid indexonSailors.

v Cost:SelectionofReserves tuples (10I/Os);foreach,

mustgetmatchingSailors tuple (1000*1.2);total1210I/Os.

v Joincolumn sid isakeyforSailors. –Atmostonematching tuple, unclustered indexon sid OK. –Projectingoutunnecessaryfieldsfromouterdoesn’thelp.

Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating>5 (Usehash index;do notwrite resultto temp) (IndexNestedLoops, withpipelining) (On-the-fly) DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 21

Summary

Thereareseveralalternativeevaluationalgorithmsforeach

relationaloperator.

Aqueryisevaluatedbyconvertingittoatreeofoperatorsand

evaluatingtheoperatorsinthetree.

Mustunderstandqueryoptimizationinordertofully

understandtheperformanceimpactofagivendatabasedesign (relations,indexes)onaworkload(setofqueries).

Twopartstooptimizingaquery:
Considerasetofalternativeplans.
Mustprunesearchspace;typically,left-deepplansonly.
Mustestimatecostofeachplanthatisconsidered.
Mustestimatesizeofresultandcostforeachplannode.
Keyissues:Statistics,indexes,operatorimplementations.

OverviewofQueryEvaluation

Chapter12

OverviewofQueryEvaluation

Eachoperatortypicallyimplementedusinga`pull’ interface:whenanoperatoris`pulled’forthenextoutput tuples,it`pulls’onitsinputsandcomputesthem.

Foragivenquery,whatplansareconsidered?

Howisthecostofaplanestimated?

worstplans!

SomeCommonTechniques

usesomesimpleideasextensively:

Indexing: CanuseWHEREconditionstoretrieve smallsetof tuples (selections,joins)

Iteration: Sometimes,fastertoscanall tuples evenif thereisanindex.(Andsometimes,wecanscanthe dataentriesinanindexinsteadofthetableitself.)

Partitioning: Byusingsortingorhashing,wecan partitiontheinput tuples andreplaceanexpensive

*Watchforthesetechniquesaswediscussqueryevaluation!

StatisticsandCatalogs

involved.Catalogs typicallycontainatleast:

# tuples (NTuples)and#pages(NPages)foreachrelation.

#distinctkeyvalues(NKeys)and NPages foreachindex.

Indexheight,low/highkeyvalues(Low/High)foreach treeindex.

Updatingwheneverdatachangesistooexpensive;lotsof approximationanyway,soslightinconsistencyok.

valuesinsomefield)aresometimesstored.

AccessPaths

Filescan,orindex thatmatches aselection(inthequery)

involveonlyattributesinaprefix ofthesearchkey.

E.g.,Treeindexon<a,b,c>matchestheselection a=5

hasatermattribute=value foreveryattributeinthe searchkeyoftheindex.

E.g.,Hashindexon<a,b,c>matchesa=5AND b=3AND c=5;butitdoesnotmatch b=3,or a=5AND b=3,or a>5

ANoteonComplexSelections

normalform(CNF): (day<8/9/94OR bid=5OR sid=3)AND (rname=‘Paul’OR bid=5OR sid=3)

curiousaboutthegeneralcase. (day<8/9/94AND rname=‘Paul’)OR bid=5OR sid=3

OneApproachtoSelections

it,andapplyanyremainingtermsthatdon’tmatch theindex:

Mostselectiveaccesspath:Anindexorfilescanthatwe estimatewillrequirethefewestpageI/Os.

Termsthatmatchthisindexreducethenumberof tuples retrieved;othertermsareusedtodiscardsomeretrieved tuples,butdonotaffectnumberof tuples/pagesfetched.

Considerday<8/9/94ANDbid=5AND sid=3. AB+tree indexon daycanbeused;then,bid=5 and sid=3mustbe checkedforeachretrieved tuple.Similarly,ahashindexon <bid, sid>couldbeused;day<8/9/94 mustthenbechecked.

UsinganIndexforSelections

clustering.

Costoffindingqualifyingdataentries(typicallysmall) pluscostofretrievingrecords(couldbelargew/o clustering).

Inexample,assuminguniformdistributionofnames, about10%of tuples qualify(100pages,10000 tuples). Withaclusteredindex,costislittlemorethan100I/Os; if unclustered, upto 10000I/Os!

ReservesR

R.rname <‘C%’

Projection

SQLsystemsdon’tremoveduplicatesunlessthekeyword DISTINCTisspecifiedinaquery.

duplicates.(Canoptimizethisbydroppingunwanted informationwhilesorting.)

partitions.Loadpartitionsintomemoryoneata time,buildin-memoryhashstructure,andeliminate duplicates.

searchkey,maybecheapertosortdataentries!

R.sid,R.bid

ReservesR

Join:IndexNestedLoops

(sayS),canmakeittheinnerandexploittheindex.

Cost:M+((M*pR)*costoffindingmatchingS tuples)

forhashindex,2-4forB+tree.CostofthenfindingS tuples (assumingAlt.(2)or(3)fordataentries) dependsonclustering.

Clusteredindex:1I/O(typical), unclustered: upto 1I/O permatchingS tuple. foreachtuple rinRdo foreachtuple sinSwhere ri == sj do add<r,s>toresult

ExamplesofIndexNestedLoops

ScanReserves:1000pageI/Os,100*1000 tuples.

ForeachReserves tuple:1.2I/Ostogetdataentryin index,plus1I/Otoget(theexactlyone)matchingSailors tuple.Total:220,000I/Os.

ScanSailors:500pageI/Os,80*500 tuples.

ForeachSailors tuple:1.2I/Ostofindindexpagewith dataentries,pluscostofretrievingmatchingReserves tuples.Assuminguniformdistribution,2.5reservations persailor(100,000/40,000).Costofretrievingthemis1or 2.5I/Osdependingonwhethertheindexisclustered.

Join:Sort-Merge(RS)

a``merge’’(onjoincol.),andoutputresult tuples.

AdvancescanofRuntilcurrentR-tuple >=currentS tuple, thenadvancescanofSuntilcurrentS-tuple >=currentR tuple;dothisuntilcurrentR tuple =currentS tuple.

Atthispoint,allR tuples withsamevaluein Ri (currentR group)andallS tuples withsamevaluein Sj (currentS group)match;output<r,s>forallpairsofsuch tuples.

ThenresumescanningRandS.

matchingR tuple.(MultiplescansofanSgroupare likelytofindneededpagesinbuffer.)

ExampleofSort-MergeJoin

Thecostofscanning,M+N,couldbeM*N(veryunlikely!)

HighlightsofSystemROptimizer

Mostwidelyusedcurrently;workswellfor<10joins.

Statistics,maintainedinsystemcatalogs,usedtoestimate costofoperationsandresultsizes.

ConsiderscombinationofCPUandI/Ocosts.

Onlythespaceofleft-deepplansisconsidered.

Cartesianproductsavoided.

CostEstimation

Mustestimatecost ofeachoperationinplantree.

Mustalsoestimatesizeofresultforeachoperation intree!

SizeEstimationandReductionFactors

cardinalitiesofrelationsintheFROM clause.

theimpactoftheterm inreducingresultsize.Result cardinality =Max# tuples *productofall RF’s.

Implicitassumption thatterms areindependent!

Term col=valuehasRF1/NKeys(I),givenindexIon col

Termcol1=col2hasRF1/MAX(NKeys(I1), NKeys(I2))

Cost:M+((MpR)costoffindingmatchingS tuples)