Set-difference () Tuples in reln.1,butnotin reln.2. Union - - PDF document

set difference tuples in reln 1 but not in reln 2 union
SMART_READER_LITE
LIVE PREVIEW

Set-difference () Tuples in reln.1,butnotin reln.2. Union - - PDF document

EvaluationofRelationalOperations Chapter14,PartA(Joins) DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 1


slide-1
SLIDE 1

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 1

EvaluationofRelationalOperations

Chapter14,PartA(Joins)

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 2

RelationalOperations

  • Wewillconsiderhowtoimplement:

Selection ()Selectsasubsetofrowsfromrelation.

Projection ()Deletesunwantedcolumnsfromrelation.

Join ()Allowsustocombinetworelations.

Set-difference () Tuples in reln.1,butnotin reln.2.

Union () Tuples in reln.1andin reln.2.

Aggregation (SUM,MIN,etc.)andGROUPBY

  • Sinceeachopreturnsarelation,opscanbecomposed!

Afterwecovertheoperations,wewilldiscusshowto

  • ptimizequeriesformedbycomposingthem.

σ

π

✂ ✄ ✄

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 3

SchemaforExamples

  • Similartooldschema; rname addedforvariations.
  • Reserves:

Each tuple is40byteslong,100 tuples perpage,1000pages.

  • Sailors:

Each tuple is50byteslong,80 tuples perpage,500pages. Sailors(sid:integer, sname:string,rating:integer,age:real) Reserves(sid:integer,bid:integer,day:dates, rname:string)

slide-2
SLIDE 2

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 4

EqualityJoinsWithOneJoinColumn

  • Inalgebra:RS.Common!Mustbecarefully
  • ptimized.RSislarge;so,RSfollowedbya

selectionisinefficient.

  • Assume:M tuples inR, pR tuples perpage,N tuples

inS, pS tuples perpage.

Inourexamples,RisReservesandSisSailors.

  • Wewillconsidermorecomplexjoinconditionslater.
  • Costmetric:#ofI/Os.Wewillignoreoutputcosts.

SELECT * FROM

ReservesR1,SailorsS1

WHERE R1.sid=S1.sid

  • ×

×

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 5

SimpleNestedLoopsJoin

  • Foreach tuple intheouter relationR,wescanthe

entireinner relationS.

Cost:M+ pR *M*N=1000+100*1000*500I/Os.

  • Page-orientedNestedLoopsjoin:Foreachpage ofR,

geteachpage ofS,andwriteoutmatchingpairsof tuples <r,s>,whererisinR-pageandSisinS- page.

Cost:M+M*N=1000+1000*500

Ifsmallerrelation(S)isouter,cost=500+500*1000 foreachtuple rinRdo foreachtuple sinSdo if ri == sj thenadd<r,s>toresult

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 6

IndexNestedLoopsJoin

  • Ifthereisanindexonthejoincolumnofonerelation

(sayS),canmakeittheinnerandexploittheindex.

Cost:M+((M*pR)*costoffindingmatchingS tuples)

  • ForeachR tuple,costofprobingSindexisabout1.2

forhashindex,2-4forB+tree.CostofthenfindingS tuples (assumingAlt.(2)or(3)fordataentries) dependsonclustering.

Clusteredindex:1I/O(typical), unclustered: upto 1I/O permatchingS tuple. foreachtuple rinRdo foreachtuple sinSwhere ri == sj do add<r,s>toresult

slide-3
SLIDE 3

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 7

ExamplesofIndexNestedLoops

  • Hash-index(Alt.2)on sid ofSailors(asinner):

ScanReserves:1000pageI/Os,100*1000 tuples.

ForeachReserves tuple:1.2I/Ostogetdataentryin index,plus1I/Otoget(theexactlyone)matchingSailors tuple.Total:220,000I/Os.

  • Hash-index(Alt.2)on sid ofReserves(asinner):

ScanSailors:500pageI/Os,80*500 tuples.

ForeachSailors tuple:1.2I/Ostofindindexpagewith dataentries,pluscostofretrievingmatchingReserves tuples.Assuminguniformdistribution,2.5reservations persailor(100,000/40,000).Costofretrievingthemis1or 2.5I/Osdependingonwhethertheindexisclustered.

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 8

BlockNestedLoopsJoin

  • Useonepageasaninputbufferforscanningthe

innerS,onepageastheoutputbuffer,anduseall remainingpagestohold``block’’ofouterR.

Foreachmatching tuple rinR-block,sinS-page,add <r,s>toresult.ThenreadnextR-block,scanS,etc.

✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✂ ✂ ✂ ✂ ✂

... ...

R&S

HashtableforblockofR (k<B-1pages) InputbufferforS Outputbuffer

✄ ✄ ✄ ✄ ✄ ✄ ✄ ✄ ☎ ☎ ☎ ☎ ☎ ☎ ☎ ☎ ✆ ✆ ✆ ✆

...

JoinResult

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 9

ExamplesofBlockNestedLoops

  • Cost:Scanofouter+#outerblocks*scanofinner

#outerblocks=

  • WithReserves(R)asouter,and100pagesofR:

CostofscanningRis1000I/Os;atotalof10blocks.

PerblockofR,wescanSailors(S);10*500I/Os.

Ifspaceforjust90pagesofR,wewouldscanS12times.

  • With100-pageblockofSailorsasouter:

CostofscanningSis500I/Os;atotalof5blocks.

PerblockofS,wescanReserves;5*1000I/Os.

  • Withsequentialreads considered,analysischanges:

maybebesttodividebuffersevenlybetweenRandS.

✝ ✞

# /

  • f pages of outer

blocksize

slide-4
SLIDE 4

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 10

Sort-MergeJoin(RS)

  • SortRandSonthejoincolumn,thenscanthemtodo

a``merge’’(onjoincol.),andoutputresult tuples.

AdvancescanofRuntilcurrentR-tuple >=currentS tuple, thenadvancescanofSuntilcurrentS-tuple >=currentR tuple;dothisuntilcurrentR tuple =currentS tuple.

Atthispoint,allR tuples withsamevaluein Ri (currentR group)andallS tuples withsamevaluein Sj (currentS group)match;output<r,s>forallpairsofsuch tuples.

ThenresumescanningRandS.

  • Risscannedonce;eachSgroupisscannedonceper

matchingR tuple.(MultiplescansofanSgroupare likelytofindneededpagesinbuffer.)

  • i=j

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 11

ExampleofSort-MergeJoin

  • Cost:MlogM+NlogN+(M+N)

Thecostofscanning,M+N,couldbeM*N(veryunlikely!)

  • With35,100or300bufferpages,bothReservesand

Sailorscanbesortedin2passes;totaljoincost:7500. sid sname rating age 22 dustin 7 45.0 28 yuppy 9 35.0 31 lubber 8 55.5 44 guppy 5 35.0 58 rusty 10 35.0 sid bid day rname 28 103 12/4/96 guppy 28 103 11/3/96 yuppy 31 101 10/10/96 dustin 31 102 10/12/96 lubber 31 101 10/11/96 lubber 58 103 11/12/96 dustin

(BNLcost:2500to15000I/Os)

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 12

RefinementofSort-MergeJoin

  • Wecancombinethemergingphasesinthesorting of

RandSwiththemergingrequiredforthejoin.

WithB>,whereListhesizeofthelargerrelation,using thesortingrefinementthatproducesrunsoflength2Bin Pass0,#runsofeachrelationis<B/2.

Allocate1pageperrunofeachrelation,and`merge’while checkingthejoincondition.

Cost:read+writeeachrelationinPass0+readeachrelation in(only)mergingpass(+writingofresult tuples).

Inexample,costgoesdownfrom7500to4500I/Os.

  • Inpractice,costofsort-mergejoin,likethecostof

externalsorting,islinear.

L

slide-5
SLIDE 5

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 13

Hash-Join

  • Partitionboth

relationsusinghash fnh:R tuples in partitioniwillonly matchS tuples in partitioni.

  • Readinapartition
  • fR,hashitusing

h2(<>h!).Scan matchingpartition

  • fS,searchfor

matches.

Partitions

  • fR&S

Inputbuffer for Si

Hashtableforpartition Ri (k<B-1pages) Bmainmemorybuffers Disk

Output buffer

Disk JoinResult hash fn h2 h2 Bmainmemorybuffers Disk Disk Original Relation

OUTPUT 2 INPUT 1 hash function

h

B-1

Partitions 1 2 B-1

...

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 14

ObservationsonHash-Join

  • #partitionsk<B-1(why?),andB-2>sizeoflargest

partition tobeheldinmemory.Assuminguniformly sizedpartitions,andmaximizingk,weget:

k=B-1,andM/(B-1)<B-2,i.e.,Bmustbe>

  • Ifwebuildanin-memoryhashtabletospeedupthe

matchingof tuples,alittlemorememoryisneeded.

  • Ifthehashfunctiondoesnotpartitionuniformly,one
  • rmoreRpartitionsmaynotfitinmemory.Can

applyhash-jointechniquerecursivelytodothejoinof thisR-partitionwithcorrespondingS-partition.

M

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 15

CostofHash-Join

  • Inpartitioningphase,read+writeboth relns;2(M+N).

Inmatchingphase,readboth relns;M+NI/Os.

  • Inourrunningexample,thisisatotalof4500I/Os.
  • Sort-MergeJoinvs.HashJoin:

Givenaminimumamountofmemory(whatisthis,foreach?) bothhaveacostof3(M+N)I/Os.HashJoinsuperioron thiscountifrelationsizesdiffergreatly.Also,HashJoin showntobehighly parallelizable.

Sort-Mergelesssensitivetodataskew;resultissorted.

slide-6
SLIDE 6

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 16

GeneralJoinConditions

  • Equalitiesoverseveralattributes(e.g.,R.sid=S.sid

AND R.rname=S.sname):

ForIndexNL,buildindexon <sid, sname>(ifSisinner);or useexistingindexeson sid or sname.

ForSort-MergeandHashJoin,sort/partitionon combinationofthetwojoincolumns.

  • Inequalityconditions(e.g.,R.rname <S.sname):

ForIndexNL,need(clustered!)B+treeindex.

  • Rangeprobesoninner;#matcheslikelytobemuchhigherthanfor

equalityjoins.

HashJoin,SortMergeJoinnotapplicable.

BlockNLquitelikelytobethebestjoinmethodhere.