OverviewofQueryEvaluation Chapter12 - - PDF document

overview of query evaluation
SMART_READER_LITE
LIVE PREVIEW

OverviewofQueryEvaluation Chapter12 - - PDF document

OverviewofQueryEvaluation Chapter12 DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 1 OverviewofQueryEvaluation Plan :


slide-1
SLIDE 1

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 1

OverviewofQueryEvaluation

Chapter12

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 2

OverviewofQueryEvaluation

  • Plan: TreeofR.A.ops,withchoiceof alg foreachop.

Eachoperatortypicallyimplementedusinga`pull’ interface:whenanoperatoris`pulled’forthenextoutput tuples,it`pulls’onitsinputsandcomputesthem.

  • Twomainissuesinqueryoptimization:

Foragivenquery,whatplansareconsidered?

  • Algorithmtosearchplanspaceforcheapest(estimated)plan.

Howisthecostofaplanestimated?

  • Ideally:Wanttofindbestplan.Practically:Avoid

worstplans!

  • WewillstudytheSystemRapproach.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 3

SomeCommonTechniques

  • Algorithmsforevaluatingrelationaloperators

usesomesimpleideasextensively:

Indexing: CanuseWHEREconditionstoretrieve smallsetof tuples (selections,joins)

Iteration: Sometimes,fastertoscanall tuples evenif thereisanindex.(Andsometimes,wecanscanthe dataentriesinanindexinsteadofthetableitself.)

Partitioning: Byusingsortingorhashing,wecan partitiontheinput tuples andreplaceanexpensive

  • perationbysimilaroperationsonsmallerinputs.

*Watchforthesetechniquesaswediscussqueryevaluation!

slide-2
SLIDE 2

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 4

StatisticsandCatalogs

  • Needinformationabouttherelationsandindexes

involved.Catalogs typicallycontainatleast:

# tuples (NTuples)and#pages(NPages)foreachrelation.

#distinctkeyvalues(NKeys)and NPages foreachindex.

Indexheight,low/highkeyvalues(Low/High)foreach treeindex.

  • Catalogsupdatedperiodically.

Updatingwheneverdatachangesistooexpensive;lotsof approximationanyway,soslightinconsistencyok.

  • Moredetailedinformation(e.g.,histogramsofthe

valuesinsomefield)aresometimesstored.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 5

AccessPaths

  • Anaccesspath isamethodofretrieving tuples:

Filescan,orindex thatmatches aselection(inthequery)

  • Atreeindexmatches (aconjunctionof)termsthat

involveonlyattributesinaprefix ofthesearchkey.

E.g.,Treeindexon<a,b,c>matchestheselection a=5

AND b=3,anda=5AND b>6,butnot b=3.

  • Ahashindexmatches (aconjunctionof)termsthat

hasatermattribute=value foreveryattributeinthe searchkeyoftheindex.

E.g.,Hashindexon<a,b,c>matchesa=5AND b=3AND c=5;butitdoesnotmatch b=3,or a=5AND b=3,or a>5

AND b=3AND c=5.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 6

ANoteonComplexSelections

  • Selectionconditionsarefirstconvertedtoconjunctive

normalform(CNF): (day<8/9/94OR bid=5OR sid=3)AND (rname=‘Paul’OR bid=5OR sid=3)

  • Weonlydiscusscasewithno ORs;seetextifyouare

curiousaboutthegeneralcase. (day<8/9/94AND rname=‘Paul’)OR bid=5OR sid=3

slide-3
SLIDE 3

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 7

OneApproachtoSelections

  • Findthemostselectiveaccesspath,retrieve tuples using

it,andapplyanyremainingtermsthatdon’tmatch theindex:

Mostselectiveaccesspath:Anindexorfilescanthatwe estimatewillrequirethefewestpageI/Os.

Termsthatmatchthisindexreducethenumberof tuples retrieved;othertermsareusedtodiscardsomeretrieved tuples,butdonotaffectnumberof tuples/pagesfetched.

Considerday<8/9/94ANDbid=5AND sid=3. AB+tree indexon daycanbeused;then,bid=5 and sid=3mustbe checkedforeachretrieved tuple.Similarly,ahashindexon <bid, sid>couldbeused;day<8/9/94 mustthenbechecked.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 8

UsinganIndexforSelections

  • Costdependson#qualifying tuples,and

clustering.

Costoffindingqualifyingdataentries(typicallysmall) pluscostofretrievingrecords(couldbelargew/o clustering).

Inexample,assuminguniformdistributionofnames, about10%of tuples qualify(100pages,10000 tuples). Withaclusteredindex,costislittlemorethan100I/Os; if unclustered, upto 10000I/Os!

SELECT * FROM

ReservesR

WHERE

R.rname <‘C%’

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 9

Projection

  • Theexpensivepartisremovingduplicates.

SQLsystemsdon’tremoveduplicatesunlessthekeyword DISTINCTisspecifiedinaquery.

  • SortingApproach:Sorton<sid,bid>andremove

duplicates.(Canoptimizethisbydroppingunwanted informationwhilesorting.)

  • HashingApproach:Hashon<sid,bid>tocreate

partitions.Loadpartitionsintomemoryoneata time,buildin-memoryhashstructure,andeliminate duplicates.

  • IfthereisanindexwithbothR.sid andR.bidinthe

searchkey,maybecheapertosortdataentries!

SELECT DISTINCT

R.sid,R.bid

FROM

ReservesR

slide-4
SLIDE 4

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 10

Join:IndexNestedLoops

  • Ifthereisanindexonthejoincolumnofonerelation

(sayS),canmakeittheinnerandexploittheindex.

Cost:M+((M*pR)*costoffindingmatchingS tuples)

  • ForeachR tuple,costofprobingSindexisabout1.2

forhashindex,2-4forB+tree.CostofthenfindingS tuples (assumingAlt.(2)or(3)fordataentries) dependsonclustering.

Clusteredindex:1I/O(typical), unclustered: upto 1I/O permatchingS tuple. foreachtuple rinRdo foreachtuple sinSwhere ri == sj do add<r,s>toresult

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 11

ExamplesofIndexNestedLoops

  • Hash-index(Alt.2)on sid ofSailors(asinner):

ScanReserves:1000pageI/Os,100*1000 tuples.

ForeachReserves tuple:1.2I/Ostogetdataentryin index,plus1I/Otoget(theexactlyone)matchingSailors tuple.Total:220,000I/Os.

  • Hash-index(Alt.2)on sid ofReserves(asinner):

ScanSailors:500pageI/Os,80*500 tuples.

ForeachSailors tuple:1.2I/Ostofindindexpagewith dataentries,pluscostofretrievingmatchingReserves tuples.Assuminguniformdistribution,2.5reservations persailor(100,000/40,000).Costofretrievingthemis1or 2.5I/Osdependingonwhethertheindexisclustered.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 12

Join:Sort-Merge(RS)

  • SortRandSonthejoincolumn,thenscanthemtodo

a``merge’’(onjoincol.),andoutputresult tuples.

AdvancescanofRuntilcurrentR-tuple >=currentS tuple, thenadvancescanofSuntilcurrentS-tuple >=currentR tuple;dothisuntilcurrentR tuple =currentS tuple.

Atthispoint,allR tuples withsamevaluein Ri (currentR group)andallS tuples withsamevaluein Sj (currentS group)match;output<r,s>forallpairsofsuch tuples.

ThenresumescanningRandS.

  • Risscannedonce;eachSgroupisscannedonceper

matchingR tuple.(MultiplescansofanSgroupare likelytofindneededpagesinbuffer.)

  • i=j
slide-5
SLIDE 5

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 13

ExampleofSort-MergeJoin

  • Cost:MlogM+NlogN+(M+N)

Thecostofscanning,M+N,couldbeM*N(veryunlikely!)

  • With35,100or300bufferpages,bothReservesand

Sailorscanbesortedin2passes;totaljoincost:7500. sid sname rating age 22 dustin 7 45.0 28 yuppy 9 35.0 31 lubber 8 55.5 44 guppy 5 35.0 58 rusty 10 35.0 sid bid day rname 28 103 12/4/96 guppy 28 103 11/3/96 yuppy 31 101 10/10/96 dustin 31 102 10/12/96 lubber 31 101 10/11/96 lubber 58 103 11/12/96 dustin

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 14

HighlightsofSystemROptimizer

  • Impact:

Mostwidelyusedcurrently;workswellfor<10joins.

  • Costestimation:Approximateartatbest.

Statistics,maintainedinsystemcatalogs,usedtoestimate costofoperationsandresultsizes.

ConsiderscombinationofCPUandI/Ocosts.

  • PlanSpace:Toolarge,mustbepruned.

Onlythespaceofleft-deepplansisconsidered.

  • Left-deepplansallowoutputofeachoperatortobepipelined into

thenextoperatorwithoutstoringitinatemporaryrelation.

Cartesianproductsavoided.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 15

CostEstimation

  • Foreachplanconsidered,mustestimatecost:

Mustestimatecost ofeachoperationinplantree.

  • Dependsoninputcardinalities.
  • We’vealreadydiscussedhowtoestimatethecostof
  • perations(sequentialscan,indexscan,joins,etc.)

Mustalsoestimatesizeofresultforeachoperation intree!

  • Useinformationabouttheinputrelations.
  • Forselectionsandjoins,assumeindependenceof

predicates.

slide-6
SLIDE 6

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 16

SizeEstimationandReductionFactors

  • Consideraqueryblock:
  • Maximum# tuples inresultistheproductofthe

cardinalitiesofrelationsintheFROM clause.

  • Reductionfactor(RF)associatedwitheach term reflects

theimpactoftheterm inreducingresultsize.Result cardinality =Max# tuples *productofall RF’s.

Implicitassumption thatterms areindependent!

Term col=valuehasRF1/NKeys(I),givenindexIon col

Termcol1=col2hasRF1/MAX(NKeys(I1), NKeys(I2))

Term col>valuehasRF(High(I)-value)/(High(I)-Low(I))

SELECT attributelist FROM relationlist WHERE term1AND ...AND termk

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 17

SchemaforExamples

  • Similartooldschema; rname addedforvariations.
  • Reserves:

Each tuple is40byteslong,100 tuples perpage,1000pages.

  • Sailors:

Each tuple is50byteslong,80 tuples perpage,500pages. Sailors(sid:integer, sname:string,rating:integer,age:real) Reserves(sid:integer,bid:integer,day:dates, rname:string)

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 18

MotivatingExample

  • Cost:500+500*1000I/Os
  • Bynomeanstheworstplan!
  • Missesseveralopportunities:

selectionscouldhavebeen `pushed’earlier,nouseismade

  • fanyavailableindexes,etc.
  • Goalofoptimization:Tofindmore

efficientplansthatcomputethe sameanswer.

SELECT S.sname FROM ReservesR,SailorsS WHERE R.sid=S.sid AND

R.bid=100AND S.rating>5

Reserves Sailors

sid=sid bid=100 rating>5 sname

Reserves Sailors

sid=sid bid=100 rating>5 sname

(SimpleNestedLoops) (On-the-fly) (On-the-fly)

RATree: Plan:

slide-7
SLIDE 7

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 19

AlternativePlans1 (NoIndexes)

  • Maindifference:pushselects.
  • With5buffers,costofplan:
  • ScanReserves(1000)+writetempT1(10pages,ifwehave100boats,

uniformdistribution).

  • ScanSailors(500)+writetempT2(250pages,ifwehave10ratings).
  • SortT1(2*2*10),sortT2(2*3*250),merge(10+250)
  • Total:3560pageI/Os.
  • IfweusedBNLjoin, joincost=10+4*250,totalcost=2770.
  • Ifwe`push’projections,T1hasonly sid,T2only sid and sname:
  • T1fitsin3pages,costofBNLdropstounder250pages,total<2000.

Reserves Sailors

sid=sid bid=100 sname(On-the-fly) rating>5

(Scan; writeto tempT1) (Scan; writeto tempT2) (Sort-MergeJoin)

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 20

AlternativePlans2 WithIndexes

  • Withclusteredindexonbidof

Reserves,weget100,000/100= 1000 tuples on1000/100=10pages.

  • INLwithpipelining (outerisnot

materialized).

v Decisionnottopushrating>5beforethejoinisbasedon

availabilityof sid indexonSailors.

v Cost:SelectionofReserves tuples (10I/Os);foreach,

mustgetmatchingSailors tuple (1000*1.2);total1210I/Os.

v Joincolumn sid isakeyforSailors. –Atmostonematching tuple, unclustered indexon sid OK. –Projectingoutunnecessaryfieldsfromouterdoesn’thelp.

Reserves Sailors sid=sid bid=100 sname (On-the-fly) rating>5 (Usehash index;do notwrite resultto temp) (IndexNestedLoops, withpipelining) (On-the-fly) DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 21

Summary

  • Thereareseveralalternativeevaluationalgorithmsforeach

relationaloperator.

  • Aqueryisevaluatedbyconvertingittoatreeofoperatorsand

evaluatingtheoperatorsinthetree.

  • Mustunderstandqueryoptimizationinordertofully

understandtheperformanceimpactofagivendatabasedesign (relations,indexes)onaworkload(setofqueries).

  • Twopartstooptimizingaquery:
  • Considerasetofalternativeplans.
  • Mustprunesearchspace;typically,left-deepplansonly.
  • Mustestimatecostofeachplanthatisconsidered.
  • Mustestimatesizeofresultandcostforeachplannode.
  • Keyissues:Statistics,indexes,operatorimplementations.