EvaluationofRelationalOperations: OtherTechniques Chapter12,PartB - - PDF document

evaluation of relational operations other techniques
SMART_READER_LITE
LIVE PREVIEW

EvaluationofRelationalOperations: OtherTechniques Chapter12,PartB - - PDF document

EvaluationofRelationalOperations: OtherTechniques Chapter12,PartB DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 1


slide-1
SLIDE 1

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 1

EvaluationofRelationalOperations: OtherTechniques

Chapter12,PartB

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 2

UsinganIndexforSelections

  • Costdependson#qualifyingtuples,andclustering.

Costoffindingqualifyingdataentries(typicallysmall)plus costofretrievingrecords(couldbelargew/oclustering).

Inexample,assuminguniformdistributionofnames,about 10%oftuplesqualify(100pages,10000tuples).Witha clusteredindex,costislittlemorethan100I/Os;if unclustered,upto10000I/Os!

  • Importantrefinementforunclusteredindexes:

1.Findqualifyingdataentries. 2.Sorttherid’softhedatarecordstoberetrieved. 3.Fetchridsinorder.Thisensuresthateachdatapageis lookedatjustonce(though#ofsuchpageslikelytobe higherthanwithclustering).

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 3

TwoApproachestoGeneralSelections

  • Firstapproach: Findthemostselectiveaccesspath,

retrievetuplesusingit,andapplyanyremaining termsthatdon’tmatch theindex:

Mostselectiveaccesspath:Anindexorfilescanthatwe estimatewillrequirethefewestpageI/Os.

Termsthatmatchthisindexreducethenumberoftuples retrieved;othertermsareusedtodiscardsomeretrieved tuples,butdonotaffectnumberoftuples/pagesfetched.

Considerday<8/9/94ANDbid=5ANDsid=3. AB+tree indexon daycanbeused;then,bid=5 andsid=3mustbe checkedforeachretrievedtuple.Similarly,ahashindexon <bid,sid>couldbeused;day<8/9/94 mustthenbechecked.

slide-2
SLIDE 2

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 4

IntersectionofRids

  • Secondapproach (ifwehave2ormorematching

indexesthatuseAlternatives(2)or(3)fordata entries):

Getsetsofridsofdatarecordsusingeachmatchingindex.

Thenintersect thesesetsofrids(we’lldiscussintersection soon!)

Retrievetherecordsandapplyanyremainingterms.

Considerday<8/9/94ANDbid=5ANDsid=3. Ifwehavea B+treeindexonday andanindexonsid,bothusing Alternative(2),wecanretrieveridsofrecordssatisfying day<8/9/94usingthefirst,ridsofrecssatisfyingsid=3 using thesecond,intersect,retrieverecordsandcheckbid=5.

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 5

TheProjectionOperation

  • Anapproachbasedonsorting:

ModifyPass0ofexternalsorttoeliminateunwantedfields. Thus,runsofabout2Bpagesareproduced,buttuplesin runsaresmallerthaninputtuples.(Sizeratiodependson #andsizeoffieldsthataredropped.)

Modifymergingpassestoeliminateduplicates.Thus, numberofresulttuplessmallerthaninput.(Difference dependson#ofduplicates.)

Cost:InPass0,readoriginalrelation(sizeM),writeout samenumberofsmallertuples.Inmergingpasses,fewer tupleswrittenoutineachpass.UsingReservesexample, 1000inputpagesreducedto250inPass0ifsizeratiois0.25

SELECT DISTINCT

R.sid,R.bid

FROM

ReservesR

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 6

ProjectionBasedonHashing

  • Partitioningphase:ReadRusingoneinputbuffer.For

eachtuple,discardunwantedfields,applyhash functionh1 tochooseoneofB-1outputbuffers.

ResultisB-1partitions(oftupleswithnounwantedfields). 2tuplesfromdifferentpartitionsguaranteedtobedistinct.

  • Duplicateeliminationphase:Foreachpartition,readit

andbuildanin-memoryhashtable,usinghashfnh2 (<>h1)onallfields,whilediscardingduplicates.

Ifpartitiondoesnotfitinmemory,canapplyhash-based projectionalgorithmrecursivelytothispartition.

  • Cost:Forpartitioning,readR,writeouteachtuple,

butwithfewerfields.Thisisreadinnextphase.

slide-3
SLIDE 3

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 7

DiscussionofProjection

  • Sort-basedapproachisthestandard;betterhandling
  • fskewandresultissorted.
  • Ifanindexontherelationcontainsallwanted

attributesinitssearchkey,candoindex-only scan.

Applyprojectiontechniquestodataentries(muchsmaller!)

  • Ifanordered(i.e.,tree)indexcontainsallwanted

attributesasprefixofsearchkey,candoevenbetter:

Retrievedataentriesinorder(index-onlyscan),discard unwantedfields,compareadjacenttuplestocheckfor duplicates.

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 8

SetOperations

  • Intersectionandcross-productspecialcasesofjoin.
  • Union(Distinct)andExceptsimilar;we’lldounion.
  • Sortingbasedapproachtounion:

Sortbothrelations(oncombinationofallattributes).

Scansortedrelationsandmergethem.

Alternative:MergerunsfromPass0forboth relations.

  • Hashbasedapproachtounion:

PartitionRandSusinghashfunctionh.

ForeachS-partition,buildin-memoryhashtable(usingh2), scancorr.R-partitionandaddtuplestotablewhile discardingduplicates.

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 9

AggregateOperations(AVG,MIN,etc.)

  • Withoutgrouping:

Ingeneral,requiresscanningtherelation.

Givenindexwhosesearchkeyincludesallattributesinthe

SELECT orWHERE clauses,candoindex-onlyscan.

  • Withgrouping:

Sortongroup-byattributes,thenscanrelationandcompute aggregateforeachgroup.(Canimproveuponthisby combiningsortingandaggregatecomputation.)

Similarapproachbasedonhashingongroup-byattributes.

Giventreeindexwhosesearchkeyincludesallattributesin

SELECT,WHEREandGROUPBYclauses,candoindex-only

scan;ifgroup-byattributesformprefixofsearchkey,can retrievedataentries/tuplesingroup-byorder.

slide-4
SLIDE 4

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 10

ImpactofBuffering

  • Ifseveraloperationsareexecutingconcurrently,

estimatingthenumberofavailablebufferpagesis guesswork.

  • Repeatedaccesspatternsinteractwithbuffer

replacementpolicy.

e.g.,InnerrelationisscannedrepeatedlyinSimple NestedLoopJoin.Withenoughbufferpagestohold inner,replacementpolicydoesnotmatter.Otherwise, MRUisbest,LRUisworst(sequentialflooding).

DoesreplacementpolicymatterforBlockNestedLoops?

WhataboutIndexNestedLoops?Sort-MergeJoin?

DatabaseManagementSystems3ed,R.RamakrishnanandJohannesGehrke 11

Summary

  • AvirtueofrelationalDBMSs:queriesarecomposedofa

fewbasicoperators;theimplementationofthese

  • peratorscanbecarefullytuned(anditisimportant

todothis!).

  • Manyalternativeimplementationtechniquesforeach
  • perator;nouniversallysuperiortechniqueformost
  • perators.
  • Mustconsideravailablealternativesforeach
  • perationinaqueryandchoosebestonebasedon

systemstatistics,etc.Thisispartofthebroadertask

  • foptimizingaquerycomposedofseveralops.