[PDF] - Set-difference () Tuples in reln.1,butnotin reln.2. Union PDF Document

SLIDE 1

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 1

EvaluationofRelationalOperations

Chapter14,PartA(Joins)

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 2

RelationalOperations

Wewillconsiderhowtoimplement:

✁

Selection ()Selectsasubsetofrowsfromrelation.

✁

Projection ()Deletesunwantedcolumnsfromrelation.

✁

Join ()Allowsustocombinetworelations.

✁

Set-difference () Tuples in reln.1,butnotin reln.2.

✁

Union () Tuples in reln.1andin reln.2.

✁

Aggregation (SUM,MIN,etc.)andGROUPBY

Sinceeachopreturnsarelation,opscanbecomposed!

Afterwecovertheoperations,wewilldiscusshowto

ptimizequeriesformedbycomposingthem.

σ

π

−

✂ ✄ ✄

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 3

SchemaforExamples

Similartooldschema; rname addedforvariations.
Reserves:

✁

Each tuple is40byteslong,100 tuples perpage,1000pages.

Sailors:

✁

Each tuple is50byteslong,80 tuples perpage,500pages. Sailors(sid:integer, sname:string,rating:integer,age:real) Reserves(sid:integer,bid:integer,day:dates, rname:string)

SLIDE 2

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 4

EqualityJoinsWithOneJoinColumn

Inalgebra:RS.Common!Mustbecarefully
ptimized.RSislarge;so,RSfollowedbya

selectionisinefficient.

Assume:M tuples inR, pR tuples perpage,N tuples

inS, pS tuples perpage.

✁

Inourexamples,RisReservesandSisSailors.

Wewillconsidermorecomplexjoinconditionslater.
Costmetric:#ofI/Os.Wewillignoreoutputcosts.

SELECT * FROM

ReservesR1,SailorsS1

WHERE R1.sid=S1.sid

×

×

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 5

SimpleNestedLoopsJoin

Foreach tuple intheouter relationR,wescanthe

entireinner relationS.

✁

Cost:M+ pR MN=1000+1001000500I/Os.

Page-orientedNestedLoopsjoin:Foreachpage ofR,

geteachpage ofS,andwriteoutmatchingpairsof tuples <r,s>,whererisinR-pageandSisinS- page.

✁

Cost:M+MN=1000+1000500

✁

Ifsmallerrelation(S)isouter,cost=500+500*1000 foreachtuple rinRdo foreachtuple sinSdo if ri == sj thenadd<r,s>toresult

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 6

IndexNestedLoopsJoin

Ifthereisanindexonthejoincolumnofonerelation

(sayS),canmakeittheinnerandexploittheindex.

✁

Cost:M+((MpR)costoffindingmatchingS tuples)

ForeachR tuple,costofprobingSindexisabout1.2

forhashindex,2-4forB+tree.CostofthenfindingS tuples (assumingAlt.(2)or(3)fordataentries) dependsonclustering.

✁

Clusteredindex:1I/O(typical), unclustered: upto 1I/O permatchingS tuple. foreachtuple rinRdo foreachtuple sinSwhere ri == sj do add<r,s>toresult

SLIDE 3

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 7

ExamplesofIndexNestedLoops

Hash-index(Alt.2)on sid ofSailors(asinner):

✁

ScanReserves:1000pageI/Os,100*1000 tuples.

✁

ForeachReserves tuple:1.2I/Ostogetdataentryin index,plus1I/Otoget(theexactlyone)matchingSailors tuple.Total:220,000I/Os.

Hash-index(Alt.2)on sid ofReserves(asinner):

✁

ScanSailors:500pageI/Os,80*500 tuples.

✁

ForeachSailors tuple:1.2I/Ostofindindexpagewith dataentries,pluscostofretrievingmatchingReserves tuples.Assuminguniformdistribution,2.5reservations persailor(100,000/40,000).Costofretrievingthemis1or 2.5I/Osdependingonwhethertheindexisclustered.

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 8

BlockNestedLoopsJoin

Useonepageasaninputbufferforscanningthe

innerS,onepageastheoutputbuffer,anduseall remainingpagestohold``block’’ofouterR.

✁

Foreachmatching tuple rinR-block,sinS-page,add <r,s>toresult.ThenreadnextR-block,scanS,etc.

✁

✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✁ ✂ ✂ ✂ ✂ ✂

... ...

R&S

HashtableforblockofR (k<B-1pages) InputbufferforS Outputbuffer

✄ ✄ ✄ ✄ ✄ ✄ ✄ ✄ ☎ ☎ ☎ ☎ ☎ ☎ ☎ ☎ ✆ ✆ ✆ ✆

...

JoinResult

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 9

ExamplesofBlockNestedLoops

Cost:Scanofouter+#outerblocks*scanofinner

✁

#outerblocks=

WithReserves(R)asouter,and100pagesofR:

✁

CostofscanningRis1000I/Os;atotalof10blocks.

✁

PerblockofR,wescanSailors(S);10*500I/Os.

✁

Ifspaceforjust90pagesofR,wewouldscanS12times.

With100-pageblockofSailorsasouter:

✁

CostofscanningSis500I/Os;atotalof5blocks.

✁

PerblockofS,wescanReserves;5*1000I/Os.

Withsequentialreads considered,analysischanges:

maybebesttodividebuffersevenlybetweenRandS.

✝ ✞

# /

f pages of outer

blocksize

SLIDE 4

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 10

Sort-MergeJoin(RS)

SortRandSonthejoincolumn,thenscanthemtodo

a``merge’’(onjoincol.),andoutputresult tuples.

✁

AdvancescanofRuntilcurrentR-tuple >=currentS tuple, thenadvancescanofSuntilcurrentS-tuple >=currentR tuple;dothisuntilcurrentR tuple =currentS tuple.

✁

Atthispoint,allR tuples withsamevaluein Ri (currentR group)andallS tuples withsamevaluein Sj (currentS group)match;output<r,s>forallpairsofsuch tuples.

✁

ThenresumescanningRandS.

Risscannedonce;eachSgroupisscannedonceper

matchingR tuple.(MultiplescansofanSgroupare likelytofindneededpagesinbuffer.)

i=j

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 11

ExampleofSort-MergeJoin

Cost:MlogM+NlogN+(M+N)

✁

Thecostofscanning,M+N,couldbeM*N(veryunlikely!)

With35,100or300bufferpages,bothReservesand

Sailorscanbesortedin2passes;totaljoincost:7500. sid sname rating age 22 dustin 7 45.0 28 yuppy 9 35.0 31 lubber 8 55.5 44 guppy 5 35.0 58 rusty 10 35.0 sid bid day rname 28 103 12/4/96 guppy 28 103 11/3/96 yuppy 31 101 10/10/96 dustin 31 102 10/12/96 lubber 31 101 10/11/96 lubber 58 103 11/12/96 dustin

(BNLcost:2500to15000I/Os)

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 12

RefinementofSort-MergeJoin

Wecancombinethemergingphasesinthesorting of

RandSwiththemergingrequiredforthejoin.

✁

WithB>,whereListhesizeofthelargerrelation,using thesortingrefinementthatproducesrunsoflength2Bin Pass0,#runsofeachrelationis<B/2.

✁

Allocate1pageperrunofeachrelation,and`merge’while checkingthejoincondition.

✁

Cost:read+writeeachrelationinPass0+readeachrelation in(only)mergingpass(+writingofresult tuples).

✁

Inexample,costgoesdownfrom7500to4500I/Os.

Inpractice,costofsort-mergejoin,likethecostof

externalsorting,islinear.

L

SLIDE 5

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 13

Hash-Join

Partitionboth

relationsusinghash fnh:R tuples in partitioniwillonly matchS tuples in partitioni.

Readinapartition
fR,hashitusing

h2(<>h!).Scan matchingpartition

fS,searchfor

matches.

Partitions

fR&S

Inputbuffer for Si

Hashtableforpartition Ri (k<B-1pages) Bmainmemorybuffers Disk

Output buffer

Disk JoinResult hash fn h2 h2 Bmainmemorybuffers Disk Disk Original Relation

OUTPUT 2 INPUT 1 hash function

h

B-1

Partitions 1 2 B-1

...

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 14

ObservationsonHash-Join

#partitionsk<B-1(why?),andB-2>sizeoflargest

partition tobeheldinmemory.Assuminguniformly sizedpartitions,andmaximizingk,weget:

✁

k=B-1,andM/(B-1)<B-2,i.e.,Bmustbe>

Ifwebuildanin-memoryhashtabletospeedupthe

matchingof tuples,alittlemorememoryisneeded.

Ifthehashfunctiondoesnotpartitionuniformly,one
rmoreRpartitionsmaynotfitinmemory.Can

applyhash-jointechniquerecursivelytodothejoinof thisR-partitionwithcorrespondingS-partition.

M

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 15

CostofHash-Join

Inpartitioningphase,read+writeboth relns;2(M+N).

Inmatchingphase,readboth relns;M+NI/Os.

Inourrunningexample,thisisatotalof4500I/Os.
Sort-MergeJoinvs.HashJoin:

✁

Givenaminimumamountofmemory(whatisthis,foreach?) bothhaveacostof3(M+N)I/Os.HashJoinsuperioron thiscountifrelationsizesdiffergreatly.Also,HashJoin showntobehighly parallelizable.

✁

Sort-Mergelesssensitivetodataskew;resultissorted.

SLIDE 6

DatabaseManagementSystems3ed,R. Ramakrishnan andJ.Gehrke 16

GeneralJoinConditions

Equalitiesoverseveralattributes(e.g.,R.sid=S.sid

AND R.rname=S.sname):

✁

ForIndexNL,buildindexon <sid, sname>(ifSisinner);or useexistingindexeson sid or sname.

✁

ForSort-MergeandHashJoin,sort/partitionon combinationofthetwojoincolumns.

Inequalityconditions(e.g.,R.rname <S.sname):

✁

ForIndexNL,need(clustered!)B+treeindex.

Rangeprobesoninner;#matcheslikelytobemuchhigherthanfor

equalityjoins.

✁

HashJoin,SortMergeJoinnotapplicable.

✁