
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke

Hash-Based Indexes

Chapter 10


Introduction

  • As for any index, 3 alternatives for data entries k*:
      • Data record with key value k
      • <k, rid of data record with search key value k>
      • <k, list of rids of data records with search key k>
      • Choice orthogonal to the indexing technique
  • Hash-based indexes are best for equality selections.
      • Cannot support range searches.
  • Static and dynamic hashing techniques exist; trade-offs similar to ISAM vs. B+ trees.


Static Hashing

  • # primary pages fixed, allocated sequentially, never de-allocated; overflow pages if needed.
  • h(k) mod M = bucket to which data entry with key k belongs. (M = # of buckets)

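[Figure: a key is passed through h, and h(key) mod N selects one of the primary bucket pages 0 to N-1, each of which may have a chain of overflow pages. The figure writes N for the number of buckets, called M in the text.]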


Static Hashing (Contd.)

  • Buckets contain data entries.
  • Hash fn works on search key field of record r. Must distribute values over range 0 ... M-1.
      • h(key) = (a * key + b) usually works well.
      • a and b are constants; lots known about how to tune h.
  • Long overflow chains can develop and degrade performance.
      • Extendible and Linear Hashing: Dynamic techniques to fix this problem.
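
The static scheme above can be sketched in a few lines. This is a minimal illustration, not an implementation from the text: the class name `StaticHashFile`, the page capacity of 4 entries, and the constants a = 31, b = 7 are all assumptions chosen for the demo.

```python
class StaticHashFile:
    """Toy static hash file: M primary pages, overflow pages chained on demand."""

    def __init__(self, num_buckets, page_capacity=4, a=31, b=7):
        self.M = num_buckets
        self.capacity = page_capacity    # assumed page size, in data entries
        self.a, self.b = a, b            # assumed constants for h
        # Each bucket is a chain of pages; page 0 is the primary page.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def h(self, key):
        return self.a * key + self.b

    def bucket_of(self, key):
        return self.h(key) % self.M      # h(k) mod M

    def insert(self, key):
        chain = self.buckets[self.bucket_of(key)]
        if len(chain[-1]) == self.capacity:
            chain.append([])             # last page is full: add an overflow page
        chain[-1].append(key)

    def search(self, key):
        # Every page in the chain may have to be read.
        return any(key in page for page in self.buckets[self.bucket_of(key)])

    def chain_length(self, key):
        return len(self.buckets[self.bucket_of(key)])


f = StaticHashFile(num_buckets=4)
for k in range(0, 40, 4):                # ten keys that all land in one bucket
    f.insert(k)
print(f.bucket_of(0), f.chain_length(0))
```

With these assumed constants, every multiple of 4 maps to the same bucket, so the ten keys form a chain of three pages while the other three buckets stay empty. That is the overflow-chain problem the dynamic schemes address.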


Extendible Hashing

  • Situation: Bucket (primary page) becomes full. Why not re-organize file by doubling # of buckets?
      • Reading and writing all pages is expensive!
      • Idea: Use directory of pointers to buckets, double # of buckets by doubling the directory, splitting just the bucket that overflowed!
      • Directory much smaller than file, so doubling it is much cheaper. Only one page of data entries is split. No overflow page!
      • Trick lies in how hash function is adjusted!


Example

  • Directory is array of size 4.
  • To find bucket for r, take last `global depth' # bits of h(r); we denote r by h(r).
      • If h(r) = 5 = binary 101, it is in bucket pointed to by 01.
  • Insert: If bucket is full, split it (allocate new page, re-distribute).
  • If necessary, double the directory. (As we will see, splitting a bucket does not always require doubling; we can tell by comparing global depth with local depth for the split bucket.)

[Figure: directory with GLOBAL DEPTH 2 and elements 00, 01, 10, 11 pointing to four data pages, each with LOCAL DEPTH 2: Bucket A (4*, 12*, 32*, 16*), Bucket B (1*, 5*, 21*, 13*), Bucket C (10*), Bucket D (15*, 7*, 19*).]
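
The lookup rule (take the last `global depth' bits of h(r)) amounts to a bit mask. A minimal sketch follows; the function name `directory_slot` is hypothetical, and the slot-to-bucket mapping is assumed from the example figure.

```python
def directory_slot(hash_value, global_depth):
    """Return the directory index: the last `global_depth` bits of the hash value."""
    return hash_value & ((1 << global_depth) - 1)


# Directory of size 4 (global depth 2); mapping assumed from the example figure.
directory = {0b00: "Bucket A", 0b01: "Bucket B", 0b10: "Bucket C", 0b11: "Bucket D"}

slot = directory_slot(5, 2)              # 5 = binary 101, last two bits are 01
print(format(slot, "02b"), directory[slot])
```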


Insert h(r)=20 (Causes Doubling)

[Figure, after the split but before doubling: inserting 20* into the full Bucket A splits it into Bucket A (32*, 16*) and Bucket A2, the `split image' of Bucket A (4*, 12*, 20*). The directory still has GLOBAL DEPTH 2 with elements 00, 01, 10, 11.]

[Figure, after doubling: the directory has GLOBAL DEPTH 3 with elements 000 to 111. Bucket A (32*, 16*) and Bucket A2 (4*, 12*, 20*) have LOCAL DEPTH 3; Bucket B (1*, 5*, 21*, 13*), Bucket C (10*) and Bucket D (15*, 7*, 19*) keep LOCAL DEPTH 2.]

Points to Note

  • 20 = binary 10100. Last 2 bits (00) tell us r belongs in A or A2. Last 3 bits needed to tell which.
      • Global depth of directory: Max # of bits needed to tell which bucket an entry belongs to.
      • Local depth of a bucket: # of bits used to determine if an entry belongs to this bucket.
  • When does bucket split cause directory doubling?
      • Before insert, local depth of bucket = global depth. Insert causes local depth to become > global depth; directory is doubled by copying it over and `fixing' pointer to split image page. (Use of least significant bits enables efficient doubling via copying of directory!)
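
The split and doubling rules above can be sketched as follows. This is an illustrative in-memory model, not the textbook's code: the class names, the capacity of 4 entries per bucket, and the use of hash values directly as data entries are assumptions. It also does not handle more than `capacity` entries with the same hash value, which would make it split forever.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.entries = []


class ExtendibleHash:
    def __init__(self, global_depth=2, capacity=4):
        self.global_depth = global_depth
        self.capacity = capacity         # assumed number of entries per page
        self.directory = [Bucket(global_depth) for _ in range(1 << global_depth)]

    def _slot(self, h):
        return h & ((1 << self.global_depth) - 1)   # last global_depth bits

    def insert(self, h):
        bucket = self.directory[self._slot(h)]
        if len(bucket.entries) < self.capacity:
            bucket.entries.append(h)
            return
        if bucket.local_depth == self.global_depth:
            # Doubling via copying: the new half points to the same buckets.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split the full bucket on the next more significant bit.
        bit = 1 << bucket.local_depth
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        old, bucket.entries = bucket.entries, []
        for e in old:
            (image if e & bit else bucket).entries.append(e)
        # 'Fix' the directory elements that should now point to the split image.
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = image
        self.insert(h)                   # retry now that there is room


eh = ExtendibleHash()
for h in [4, 12, 32, 16, 1, 5, 21, 13, 10, 15, 7, 19]:
    eh.insert(h)
eh.insert(20)                            # Bucket A is full and local = global depth
print(eh.global_depth, eh.directory[0b000].entries, eh.directory[0b100].entries)
```

Loading the twelve entries of the running example and then inserting 20 doubles the directory to global depth 3, leaving 32 and 16 in the bucket at 000 and 4, 12 and 20 in its split image at 100. The other buckets are untouched and are each shared by two directory elements.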


Directory Doubling

Why use least significant bits in directory?

  • Allows for doubling via copying!

[Figure: a directory of depth 2 doubled to depth 3, shown for an entry 6* (6 = binary 110). Least Significant: elements 00, 01, 10, 11 become 000, 001, 010, 011, 100, 101, 110, 111. Most Significant: elements 00, 10, 01, 11 become 000, 100, 010, 110, 001, 101, 011, 111.]
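
The difference between the two bit choices can be seen on a toy directory. The sketch below uses hypothetical bucket labels and function names. It only shows where the old pointers end up when the directory doubles.

```python
def double_lsb(directory):
    """Least significant bits: new element i points where old element i mod len pointed."""
    return directory + directory


def double_msb(directory):
    """Most significant bits: new element i points where old element i // 2 pointed."""
    return [bucket for bucket in directory for _ in range(2)]


old = ["A", "B", "C", "D"]               # hypothetical bucket labels, depth 2
lsb = double_lsb(old)
msb = double_msb(old)
print(lsb)   # ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D']
print(msb)   # ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D']
```

With least significant bits the old directory stays in place and the new half is a plain copy appended after it. With most significant bits every existing element moves, because each old pointer has to be duplicated next to itself.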


Comments on Extendible Hashing

  • If directory fits in memory, equality search answered with one disk access; else two.
      • 100MB file, 100 bytes/rec, 4K pages contains 1,000,000 records (as data entries) and 25,000 directory elements; chances are high that directory will fit in memory.
      • Directory grows in spurts, and, if the distribution of hash values is skewed, directory can grow large.
      • Multiple entries with same hash value cause problems!
  • Delete: If removal of data entry makes bucket empty, can be merged with `split image'. If each directory element points to same bucket as its split image, can halve directory.
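
The sizing estimate in the first comment can be reproduced with a little arithmetic. The sketch assumes 1 MB = 10^6 bytes, completely full pages, and one directory element per bucket page; none of these is stated on the slide.

```python
file_bytes = 100 * 10**6                 # 100MB file (assuming 1 MB = 10^6 bytes)
record_bytes = 100                       # 100 bytes/rec
page_bytes = 4 * 1024                    # 4K pages

records = file_bytes // record_bytes     # data entries in the file
per_page = page_bytes // record_bytes    # entries that fit on one page
directory_elements = records // per_page # one element per (full) bucket page

print(records, per_page, directory_elements)
```

This gives 1,000,000 records at 40 per page, hence 25,000 bucket pages and the same number of directory elements.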


Linear Hashing

  • This is another dynamic hashing scheme, an alternative to Extendible Hashing.
  • LH handles the problem of long overflow chains without using a directory, and handles duplicates.
  • Idea: Use a family of hash functions h_0, h_1, h_2, ...
      • h_i(key) = h(key) mod (2^i N); N = initial # buckets
      • h is some hash function (range is not 0 to N-1)
      • If N = 2^d_0, for some d_0, h_i consists of applying h and looking at the last d_i bits, where d_i = d_0 + i.
      • h_{i+1} doubles the range of h_i (similar to directory doubling)
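
The family of hash functions can be written down directly. In this sketch the helper name `make_family` and the identity base hash are assumptions for illustration; N = 4 matches the later example.

```python
def make_family(h, n_initial):
    """Return h_i as a function of (i, key): h(key) mod (2^i * N)."""
    def h_i(i, key):
        return h(key) % ((2 ** i) * n_initial)
    return h_i


h_i = make_family(lambda key: key, n_initial=4)   # identity hash; N = 4 = 2^2, so d_0 = 2

print(h_i(0, 44), h_i(1, 44), h_i(2, 44))

# With N a power of two, h_i is just the last d_0 + i bits of h(key).
d_0 = 2
assert all(h_i(i, k) == k & ((1 << (d_0 + i)) - 1)
           for i in range(4) for k in range(200))
```

For key 44 (binary 101100) the three values are 0, 4 and 12, that is, the last 2, 3 and 4 bits.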


Linear Hashing (Contd.)

  • Directory avoided in LH by using overflow pages, and choosing bucket to split round-robin.
      • Splitting proceeds in `rounds'. Round ends when all N_R initial (for round R) buckets are split. Buckets 0 to Next-1 have been split; Next to N_R - 1 yet to be split.
      • Current round number is Level.
      • Search: To find bucket for data entry r, find h_Level(r):
          • If h_Level(r) in range `Next to N_R - 1', r belongs here.
          • Else, r could belong to bucket h_Level(r) or bucket h_Level(r) + N_R; must apply h_{Level+1}(r) to find out.
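
The search rule can be sketched as a small address function. The name `lh_bucket` and its parameters are hypothetical. It assumes buckets are numbered from 0 and that round Level starts with N_R = N * 2^Level buckets.

```python
def lh_bucket(hash_value, level, next_split, n_initial):
    """Return the bucket number for a hash value in a linear hashed file."""
    n_round = n_initial * (2 ** level)   # N_R: buckets at the start of this round
    b = hash_value % n_round             # h_Level
    if b < next_split:                   # this bucket was already split this round
        b = hash_value % (2 * n_round)   # h_{Level+1}: gives b or b + N_R
    return b


# Level = 0, N = 4, Next = 1: bucket 0 has been split into buckets 0 and 4.
print(lh_bucket(32, 0, 1, 4), lh_bucket(44, 0, 1, 4), lh_bucket(43, 0, 1, 4))
```

Here 32 stays in bucket 0, 44 goes to the split image at bucket 4, and 43 falls in the unsplit range and is found with h_Level alone.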


Overview of LH File

  • In the middle of a round.

[Figure: an LH file in the middle of a round. The buckets that existed at the beginning of this round are the range of h_Level. Next marks the bucket to be split. Buckets before Next have been split in this round: if h_Level(search key value) is in this range, must use h_{Level+1}(search key value) to decide if entry is in `split image' bucket. The `split image' buckets, created (through splitting of other buckets) in this round, follow at the end of the file.]


Linear Hashing (Contd.)

  • Insert: Find bucket by applying h_Level / h_{Level+1}:
      • If bucket to insert into is full:
          • Add overflow page and insert data entry.
          • (Maybe) Split Next bucket and increment Next.
  • Can choose any criterion to `trigger' split.
  • Since buckets are split round-robin, long overflow chains don't develop!
  • Doubling of directory in Extendible Hashing is similar; switching of hash functions is implicit in how the # of bits examined is increased.
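
The insert procedure can be sketched with a small in-memory model. Several things here are assumptions rather than part of the slides: the class name `LinearHash`, a capacity of 4 entries per primary page, hash values used directly as data entries, and the split trigger (split whenever an insert needs an overflow page). Each bucket is a single list; entries beyond the capacity stand in for overflow pages.

```python
class LinearHash:
    def __init__(self, n_initial=4, capacity=4):
        self.n0 = n_initial
        self.capacity = capacity         # assumed entries per primary page
        self.level = 0
        self.next = 0
        self.buckets = [[] for _ in range(n_initial)]

    def _address(self, h):
        n_round = self.n0 << self.level  # N_R for the current round
        b = h % n_round                  # h_Level
        if b < self.next:                # already split: use h_{Level+1}
            b = h % (2 * n_round)
        return b

    def insert(self, h):
        bucket = self.buckets[self._address(h)]
        overflowed = len(bucket) >= self.capacity
        bucket.append(h)                 # goes to an overflow page if the bucket was full
        if overflowed:                   # assumed trigger: any overflow causes a split
            self._split()

    def _split(self):
        n_round = self.n0 << self.level
        old = self.buckets[self.next]
        self.buckets.append([])          # split image is bucket Next + N_R
        # Re-distribute the Next bucket using h_{Level+1}.
        self.buckets[self.next] = [e for e in old if e % (2 * n_round) == self.next]
        self.buckets[self.next + n_round] = [e for e in old if e % (2 * n_round) != self.next]
        self.next += 1
        if self.next == n_round:         # every initial bucket is split: round ends
            self.level += 1
            self.next = 0

    def search(self, h):
        return h in self.buckets[self._address(h)]


lh = LinearHash()
for h in [32, 44, 36, 9, 25, 5, 14, 18, 10, 30, 31, 35, 7, 11]:
    lh.insert(h)
lh.insert(43)                            # bucket 11 is full, so bucket Next = 0 is split
print(lh.level, lh.next, lh.buckets[0], lh.buckets[4], lh.buckets[3])
```

Note that the bucket that overflows (bucket 3) is not the one that gets split (bucket 0). Its turn comes later in the round, which is why chains stay short without a directory.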


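Example of Linear Hashing

  • On split, h_{Level+1} is used to re-distribute entries.

[Figure, before the insert: Level=0, N=4, Next=0. PRIMARY PAGES: bucket 00 (32*, 44*, 36*), bucket 01 (9*, 25*, 5*), bucket 10 (14*, 18*, 10*, 30*), bucket 11 (31*, 35*, 7*, 11*). The h_1 and h_0 columns (000/00, 001/01, 010/10, 011/11) are for illustration only; the pages are the actual contents of the linear hashed file. 5* denotes a data entry r with h(r)=5 on a primary bucket page.]

[Figure, after inserting 43*: Level=0, Next=1. PRIMARY PAGES: bucket 00 (32*), bucket 01 (9*, 25*, 5*), bucket 10 (14*, 18*, 10*, 30*), bucket 11 (31*, 35*, 7*, 11*) with OVERFLOW PAGE (43*), and new bucket 100 (44*, 36*).]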


Example: End of a Round

[Figure, before the insert: Level=0, Next=3. PRIMARY PAGES: bucket 000 (32*), bucket 001 (9*, 25*), bucket 010 (66*, 18*, 10*, 34*), bucket 11 (31*, 35*, 7*, 11*) with OVERFLOW PAGE (43*), bucket 100 (44*, 36*), bucket 101 (5*, 37*, 29*), bucket 110 (14*, 30*, 22*).]

[Figure, after inserting 50*: Level=1, Next=0. PRIMARY PAGES: bucket 000 (32*), bucket 001 (9*, 25*), bucket 010 (66*, 18*, 10*, 34*) with OVERFLOW PAGE (50*), bucket 011 (43*, 35*, 11*), bucket 100 (44*, 36*), bucket 101 (5*, 37*, 29*), bucket 110 (14*, 30*, 22*), bucket 111 (31*, 7*).]

LH Described as a Variant of EH

  • The two schemes are actually quite similar:
      • Begin with an EH index where directory has N elements.
      • Use overflow pages, split buckets round-robin.
      • First split is at bucket 0. (Imagine directory being doubled at this point.) But elements <1,N+1>, <2,N+2>, ... are the same. So, need only create directory element N, which differs from 0, now.
          • When bucket 1 splits, create directory element N+1, etc.
  • So, directory can double gradually. Also, primary bucket pages are created in order. If they are allocated in sequence too (so that finding i'th is easy), we actually don't need a directory! Voila, LH.


Summary

  • Hash-based indexes: best for equality searches, cannot support range searches.
  • Static Hashing can lead to long overflow chains.
  • Extendible Hashing avoids overflow pages by splitting a full bucket when a new data entry is to be added to it. (Duplicates may require overflow pages.)
      • Directory to keep track of buckets, doubles periodically.
      • Can get large with skewed data; additional I/O if this does not fit in main memory.


Summary (Contd.)

  • Linear Hashing avoids directory by splitting buckets round-robin, and using overflow pages.
      • Overflow chains not likely to be long.
      • Duplicates handled easily.
      • Space utilization could be lower than Extendible Hashing, since splits not concentrated on `dense' data areas.
  • Can tune criterion for triggering splits to trade-off slightly longer chains for better space utilization.
  • For hash-based indexes, a skewed data distribution is one in which the hash values of data entries are not uniformly distributed!