Tree-StructuredIndexes Chapter9 - - PDF document

tree structured indexes
SMART_READER_LITE
LIVE PREVIEW

Tree-StructuredIndexes Chapter9 - - PDF document

Tree-StructuredIndexes Chapter9 DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 1 Introduction


slide-1
SLIDE 1

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 1

Tree-StructuredIndexes

Chapter9

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 2

Introduction

  • Asforanyindex,3alternativesfordataentriesk*:

Datarecordwithkeyvalue k

<k,ridofdatarecordwithsearchkeyvalue k>

<k,listofridsofdatarecordswithsearchkeyk>

  • Choiceisorthogonaltotheindexingtechnique

usedtolocatedataentries k*.

  • Tree-structuredindexingtechniquessupport

bothrangesearchesandequalitysearches.

  • ISAM:staticstructure; B+tree:dynamic,

adjustsgracefullyunderinsertsanddeletes.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 3

RangeSearches

  • ``Findallstudentswith gpa >3.0’’

Ifdataisinsortedfile,dobinarysearchtofindfirst suchstudent,thenscantofindothers.

Costofbinarysearchcanbequitehigh.

  • Simpleidea:Createan`index’file.

* Candobinarysearchon(smaller)indexfile!

Page1 Page2 PageN Page3

DataFile

k2 kN k1

IndexFile

slide-2
SLIDE 2

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 4

ISAM

  • Indexfilemaystillbequitelarge.Butwecan

applytheidearepeatedly!

* Leafpagescontaindataentries.

P0 K 1 P 1 K 2 P 2 K m P m

indexentry

Non-leaf Pages Pages Overflow page Primarypages Leaf DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 5

CommentsonISAM

  • Filecreation:Leaf(data)pagesallocated

sequentially,sortedbysearchkey;thenindex pagesallocated,thenspaceforoverflowpages.

  • Indexentries:<searchkeyvalue,pageid>;they

`direct’searchfordataentries,whichareinleafpages.

  • Search:Startatroot;usekeycomparisonstogotoleaf.

CostlogFN;F=#entries/indexpg,N=#leafpgs

  • Insert:Findleafdataentrybelongsto,andputitthere.
  • Delete:Findandremovefromleaf;ifemptyoverflow

page,de-allocate.

* Statictreestructure:inserts/deletesaffectonlyleafpages.

Data Pages IndexPages Overflowpages

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 6

ExampleISAMTree

  • Eachnodecanhold2entries;noneedfor

`next-leaf-page’pointers.(Why?)

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97* 20 33 51 63 40 Root

slide-3
SLIDE 3

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 7

AfterInserting23*,48*,41*,42*...

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97* 20 33 51 63 40 Root 23* 48* 41* 42* Overflow Pages Leaf Index Pages Pages Primary DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 8

...ThenDeleting42*,51*,97*

* Notethat51*appearsinindexlevels,butnotinleaf!

10* 15* 20* 27* 33* 37* 40* 46* 55* 63* 20 33 51 63 40 Root 23* 48* 41*

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 9

B+Tree:MostWidelyUsedIndex

  • Insert/deleteatlogF Ncost;keeptreeheight-

balanced.(F= fanout,N=#leafpages)

  • Minimum50%occupancy(exceptforroot).Each

nodecontainsd <=m <=2d entries.The parameterd iscalledtheorder ofthetree.

  • Supportsequalityandrange-searchesefficiently.

IndexEntries DataEntries ("Sequenceset") (Directsearch)

slide-4
SLIDE 4

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 10

ExampleB+Tree

  • Searchbeginsatroot,andkeycomparisons

directittoaleaf(asinISAM).

  • Searchfor5*,15*,alldataentries>=24*...

* Basedonthesearchfor15*,weknow itisnotinthetree!

Root 17 24 30 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 13 DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 11

B+TreesinPractice

  • Typicalorder:100.Typicalfill-factor:67%.
  • averagefanout=133
  • Typicalcapacities:
  • Height4:1334 =312,900,700records
  • Height3:1333 =2,352,637records
  • Canoftenholdtoplevelsinbufferpool:
  • Level1=1page=8Kbytes
  • Level2=133pages=1Mbyte
  • Level3=17,689pages=133MBytes

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 12

InsertingaDataEntryintoaB+Tree

  • FindcorrectleafL.
  • PutdataentryontoL.
  • IfLhasenoughspace,done!
  • Else,mustsplit L(intoLandanewnodeL2)
  • Redistributeentriesevenly,copyup middlekey.
  • InsertindexentrypointingtoL2intoparentofL.
  • Thiscanhappenrecursively
  • Tosplitindexnode,redistributeentriesevenly,but

pushup middlekey.(Contrastwithleafsplits.)

  • Splits“grow”tree;rootsplitincreasesheight.
  • Treegrowth:getswider oroneleveltallerattop.
slide-5
SLIDE 5

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 13

Inserting8*intoExampleB+Tree

  • Observehow

minimum

  • ccupancyis

guaranteedin bothleafand indexpgsplits.

  • Notedifference

betweencopy- up andpush-up; besureyou understandthe reasonsforthis.

2* 3* 5* 7* 8* 5 Entrytobeinsertedinparentnode. (Notethat5is continuestoappearintheleaf.) scopiedupand appearsonceintheindex.Contrast 5 24 30 17 13 Entrytobeinsertedinparentnode. (Notethat17ispushedupandonly thiswithaleafsplit.) DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 14

ExampleB+TreeAfterInserting8*

v Noticethatrootwassplit,leadingtoincreaseinheight. v Inthisexample,wecanavoidsplitbyre-distributing entries;however,thisisusuallynotdoneinpractice.

2* 3* Root 17 24 30 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 13 5 7* 5* 8* DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 15

DeletingaDataEntryfromaB+Tree

  • Startatroot,findleafL whereentrybelongs.
  • Removetheentry.
  • IfLisatleasthalf-full,done!
  • IfLhasonlyd-1entries,
  • Trytore-distribute,borrowingfromsibling (adjacent

nodewithsameparentasL).

  • Ifre-distributionfails,merge Landsibling.
  • Ifmergeoccurred,mustdeleteentry(pointingtoL
  • rsibling)fromparentofL.
  • Mergecouldpropagatetoroot,decreasingheight.
slide-6
SLIDE 6

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 16

ExampleTreeAfter(Inserting8*, Then)Deleting19*and20*...

  • Deleting19*iseasy.
  • Deleting20*isdonewithre-distribution.

Noticehowmiddlekeyiscopiedup.

2* 3* Root 17 30 14* 16* 33* 34* 38* 39* 13 5 7* 5* 8* 22* 24* 27 27* 29* DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 17

...AndThenDeleting24*

  • Mustmerge.
  • Observe`toss’of

indexentry(onright), and`pulldown’of indexentry(below).

30 22* 27* 29* 33* 34* 38* 39* 2* 3* 7* 14* 16* 22* 27* 29* 33* 34* 38* 39* 5* 8* Root 30 13 5 17 DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 18

ExampleofNon-leafRe-distribution

  • Treeisshownbelowduringdeletionof24*.(What

couldbeapossibleinitialtree?)

  • Incontrasttopreviousexample,canre-distribute

entryfromleftchildofroottorightchild.

Root 13 5 17 20 22 30 14* 16* 17* 18* 20* 33* 34* 38* 39* 22* 27* 29* 21* 7* 5* 8* 3* 2*

slide-7
SLIDE 7

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 19

AfterRe-distribution

  • Intuitively,entriesarere-distributedby`pushing

through’thesplittingentryintheparentnode.

  • Itsufficestore-distributeindexentrywithkey20;

we’vere-distributed17aswellforillustration.

14* 16* 33* 34* 38* 39* 22* 27* 29* 17* 18* 20* 21* 7* 5* 8* 2* 3* Root 13 5 17 30 20 22 DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 20

PrefixKeyCompression

  • Importanttoincreasefan-out.(Why?)
  • Keyvaluesinindexentriesonly`directtraffic’;

canoftencompressthem.

  • E.g.,Ifwehaveadjacentindexentrieswithsearch

keyvaluesDannonYogurt,DavidSmithand DevarakondaMurthy,wecanabbreviateDavid Smith toDav.(Theotherkeyscanbecompressedtoo...)

  • Isthiscorrect?Notquite!Whatifthereisadataentry

DaveyJones?(CanonlycompressDavidSmithtoDavi)

  • Ingeneral,whilecompressing,mustleaveeachindexentry

greaterthaneverykeyvalue(inanysubtree)toitsleft.

  • Insert/deletemustbesuitablymodified.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 21

BulkLoadingofaB+Tree

  • Ifwehavealargecollectionofrecords,andwe

wanttocreateaB+treeonsomefield,doingso byrepeatedlyinsertingrecordsisveryslow.

  • BulkLoading canbedonemuchmoreefficiently.
  • Initialization:Sortalldataentries,insertpointer

tofirst(leaf)pageinanew(root)page.

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44* Sortedpagesofdataentries;notyetinB+tree Root

slide-8
SLIDE 8

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 22

BulkLoading(Contd.)

  • Indexentriesforleaf

pagesalways enteredintoright- mostindexpagejust aboveleaflevel. Whenthisfillsup,it splits.(Splitmaygo upright-mostpath totheroot.)

  • Muchfasterthan

repeatedinserts, especiallywhenone considerslocking!

3* 4* 6* 9* 10* 11* 12*13* 20*22* 23* 31* 35*36* 38*41* 44* Root Dataentrypages notyetinB+tree 35 23 12 6 10 20 3* 4* 6* 9* 10* 11* 12*13* 20*22* 23* 31* 35*36* 38*41* 44* 6 Root 10 12 23 20 35 38 notyetinB+tree Dataentrypages DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 23

SummaryofBulkLoading

  • Option1:multipleinserts.
  • Slow.
  • Doesnotgivesequentialstorageofleaves.
  • Option2: BulkLoading
  • Hasadvantagesforconcurrencycontrol.
  • FewerI/Osduringbuild.
  • Leaveswillbestoredsequentially(andlinked,of

course).

  • Cancontrol“fillfactor”onpages.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 24

ANoteon`Order’

  • Order (d)conceptreplacedbyphysicalspace

criterioninpractice(`atleasthalf-full’).

  • Indexpagescantypicallyholdmanymoreentries

thanleafpages.

  • Variablesizedrecordsandsearchkeysmeandiffernt

nodeswillcontaindifferentnumbersofentries.

  • Evenwithfixedlengthfields,multiplerecordswith

thesamesearchkeyvalue(duplicates)canleadto variable-sizeddataentries(ifweuseAlternative(3)).

slide-9
SLIDE 9

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 25

Summary

  • Tree-structuredindexesareidealforrange-

searches,alsogoodforequalitysearches.

  • ISAMisastaticstructure.
  • Onlyleafpagesmodified;overflowpagesneeded.
  • Overflowchainscandegradeperformanceunlesssize
  • fdatasetanddatadistributionstayconstant.
  • B+treeisadynamicstructure.
  • Inserts/deletesleavetreeheight-balanced;logF Ncost.
  • Highfanout(F)meansdepthrarelymorethan3or4.
  • Almostalwaysbetterthanmaintainingasortedfile.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 26

Summary(Contd.)

  • Typically,67% occupancyonaverage.
  • UsuallypreferabletoISAM,modulo locking

considerations;adjuststogrowthgracefully.

  • Ifdataentriesaredatarecords,splitscanchangerids!
  • Keycompressionincreasesfanout,reducesheight.
  • Bulkloadingcanbemuchfasterthanrepeated

insertsforcreatingaB+treeonalargedataset.

  • Mostwidelyusedindexindatabasemanagement

systemsbecauseofitsversatility.Oneofthemost

  • ptimizedcomponentsofaDBMS.