EvaluationofRelationalOperations: OtherTechniques Chapter12,PartB - - PDF document

▶

Dec 14, 2022 48 likes •90 views

EvaluationofRelationalOperations: OtherTechniques Chapter12,PartB DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 1

SLIDE 1

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 1

EvaluationofRelationalOperations: OtherTechniques

Chapter12,PartB

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 2

UsinganIndexforSelections

Costdependson#qualifyingtuples,andclustering.

✁

Costoffindingqualifyingdataentries(typicallysmall)plus costofretrievingrecords(couldbelargew/oclustering).

✁

Inexample,assuminguniformdistributionofnames,about 10%oftuplesqualify(100pages,10000tuples).Witha clusteredindex,costislittlemorethan100I/Os;if unclustered,upto10000I/Os!

Importantrefinementforunclusteredindexes:

1.Findqualifyingdataentries. 2.Sorttherid’softhedatarecordstoberetrieved. 3.Fetchridsinorder.Thisensuresthateachdatapageis lookedatjustonce(though#ofsuchpageslikelytobe higherthanwithclustering).

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 3

TwoApproachestoGeneralSelections

Firstapproach: Findthemostselectiveaccesspath,

retrievetuplesusingit,andapplyanyremaining termsthatdon’tmatch theindex:

✁

Mostselectiveaccesspath:Anindexorfilescanthatwe estimatewillrequirethefewestpageI/Os.

✁

Termsthatmatchthisindexreducethenumberoftuples retrieved;othertermsareusedtodiscardsomeretrieved tuples,butdonotaffectnumberoftuples/pagesfetched.

✁

Considerday<8/9/94ANDbid=5ANDsid=3. AB+tree indexon daycanbeused;then,bid=5 andsid=3mustbe checkedforeachretrievedtuple.Similarly,ahashindexon <bid,sid>couldbeused;day<8/9/94 mustthenbechecked.

SLIDE 2

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 4

IntersectionofRids

Secondapproach (ifwehave2ormorematching

indexesthatuseAlternatives(2)or(3)fordata entries):

✁

Getsetsofridsofdatarecordsusingeachmatchingindex.

✁

Thenintersect thesesetsofrids(we’lldiscussintersection soon!)

✁

Retrievetherecordsandapplyanyremainingterms.

✁

Considerday<8/9/94ANDbid=5ANDsid=3. Ifwehavea B+treeindexonday andanindexonsid,bothusing Alternative(2),wecanretrieveridsofrecordssatisfying day<8/9/94usingthefirst,ridsofrecssatisfyingsid=3 using thesecond,intersect,retrieverecordsandcheckbid=5.

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 5

TheProjectionOperation

Anapproachbasedonsorting:

✁

ModifyPass0ofexternalsorttoeliminateunwantedfields. Thus,runsofabout2Bpagesareproduced,buttuplesin runsaresmallerthaninputtuples.(Sizeratiodependson #andsizeoffieldsthataredropped.)

✁

Modifymergingpassestoeliminateduplicates.Thus, numberofresulttuplessmallerthaninput.(Difference dependson#ofduplicates.)

✁

Cost:InPass0,readoriginalrelation(sizeM),writeout samenumberofsmallertuples.Inmergingpasses,fewer tupleswrittenoutineachpass.UsingReservesexample, 1000inputpagesreducedto250inPass0ifsizeratiois0.25

SELECT DISTINCT

R.sid,R.bid

FROM

ReservesR

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 6

ProjectionBasedonHashing

Partitioningphase:ReadRusingoneinputbuffer.For

eachtuple,discardunwantedfields,applyhash functionh1 tochooseoneofB-1outputbuffers.

✁

ResultisB-1partitions(oftupleswithnounwantedfields). 2tuplesfromdifferentpartitionsguaranteedtobedistinct.

Duplicateeliminationphase:Foreachpartition,readit

andbuildanin-memoryhashtable,usinghashfnh2 (<>h1)onallfields,whilediscardingduplicates.

✁

Ifpartitiondoesnotfitinmemory,canapplyhash-based projectionalgorithmrecursivelytothispartition.

Cost:Forpartitioning,readR,writeouteachtuple,

butwithfewerfields.Thisisreadinnextphase.

SLIDE 3

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 7

DiscussionofProjection

Sort-basedapproachisthestandard;betterhandling
fskewandresultissorted.
Ifanindexontherelationcontainsallwanted

attributesinitssearchkey,candoindex-only scan.

✁

Applyprojectiontechniquestodataentries(muchsmaller!)

Ifanordered(i.e.,tree)indexcontainsallwanted

attributesasprefixofsearchkey,candoevenbetter:

✁

Retrievedataentriesinorder(index-onlyscan),discard unwantedfields,compareadjacenttuplestocheckfor duplicates.

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 8

SetOperations

Intersectionandcross-productspecialcasesofjoin.
Union(Distinct)andExceptsimilar;we’lldounion.
Sortingbasedapproachtounion:

✁

Sortbothrelations(oncombinationofallattributes).

✁

Scansortedrelationsandmergethem.

✁

Alternative:MergerunsfromPass0forboth relations.

Hashbasedapproachtounion:

✁

PartitionRandSusinghashfunctionh.

✁

ForeachS-partition,buildin-memoryhashtable(usingh2), scancorr.R-partitionandaddtuplestotablewhile discardingduplicates.

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 9

AggregateOperations(AVG,MIN,etc.)

Withoutgrouping:

✁

Ingeneral,requiresscanningtherelation.

✁

Givenindexwhosesearchkeyincludesallattributesinthe

SELECT orWHERE clauses,candoindex-onlyscan.

Withgrouping:

✁

Sortongroup-byattributes,thenscanrelationandcompute aggregateforeachgroup.(Canimproveuponthisby combiningsortingandaggregatecomputation.)

✁

Similarapproachbasedonhashingongroup-byattributes.

✁

Giventreeindexwhosesearchkeyincludesallattributesin

SELECT,WHEREandGROUPBYclauses,candoindex-only

scan;ifgroup-byattributesformprefixofsearchkey,can retrievedataentries/tuplesingroup-byorder.

SLIDE 4

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 10

ImpactofBuffering

Ifseveraloperationsareexecutingconcurrently,

estimatingthenumberofavailablebufferpagesis guesswork.

Repeatedaccesspatternsinteractwithbuffer

replacementpolicy.

✁

e.g.,InnerrelationisscannedrepeatedlyinSimple NestedLoopJoin.Withenoughbufferpagestohold inner,replacementpolicydoesnotmatter.Otherwise, MRUisbest,LRUisworst(sequentialflooding).

✁

DoesreplacementpolicymatterforBlockNestedLoops?

✁

WhataboutIndexNestedLoops?Sort-MergeJoin?

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 11

Summary

AvirtueofrelationalDBMSs:queriesarecomposedofa

fewbasicoperators;theimplementationofthese

peratorscanbecarefullytuned(anditisimportant

todothis!).

Manyalternativeimplementationtechniquesforeach
perator;nouniversallysuperiortechniqueformost
perators.
Mustconsideravailablealternativesforeach
perationinaqueryandchoosebestonebasedon

systemstatistics,etc.Thisispartofthebroadertask

foptimizingaquerycomposedofseveralops.