[PDF] - OverviewofStorageandIndexing Chapter8 PDF Document, free download

SLIDE 1

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 1

OverviewofStorageandIndexing

Chapter8

“Howindex-learningturnsnostudentpale Yetholdstheeelofsciencebythetail.”

- AlexanderPope(1688-1744)

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 2

DataonExternalStorage

Disks: Canretrieverandompageatfixedcost

✁

Butreadingseveralconsecutivepagesismuchcheaperthan readingtheminrandomorder

Tapes: Canonlyreadpagesinsequence

✁

Cheaperthandisks;usedforarchivalstorage

Fileorganization: Methodofarrangingafileofrecords
nexternalstorage.

✁

Recordid(rid) issufficienttophysicallylocaterecord

✁

Indexes aredatastructuresthatallowustofindtherecordids

frecordswithgivenvaluesinindexsearchkey fields
Architecture: Buffermanager stagespagesfromexternal

storagetomainmemorybufferpool.Fileandindex layersmakecallstothebuffermanager.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 3

AlternativeFileOrganizations

Manyalternativesexist,eachidealforsome situations,andnotsogoodinothers:

✂

Heap(randomorder)files: Suitablewhentypical accessisafilescanretrievingallrecords.

✂

SortedFiles: Bestifrecordsmustberetrievedin someorder,oronlya`range’ofrecordsisneeded.

✂

Indexes: Datastructurestoorganizerecordsvia treesorhashing.

Likesortedfiles,theyspeedupsearchesforasubsetof

records,basedonvaluesincertain(“searchkey”)fields

Updatesaremuchfasterthaninsortedfiles.

SLIDE 2

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 4

Indexes

Anindexonafilespeedsupselectionsonthe

searchkeyfieldsfortheindex.

✂

Anysubsetofthefieldsofarelationcanbethe searchkeyforanindexontherelation.

✂

Searchkeyisnot thesameaskey (minimalsetof fieldsthatuniquelyidentifyarecordinarelation).

Anindexcontainsacollectionofdataentries,

andsupportsefficientretrievalofalldata entriesk*withagivenkeyvaluek.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 5

AlternativesforDataEntryk*inIndex

Threealternatives:

✁

Datarecordwithkeyvalue k

✁

<k,ridofdatarecordwithsearchkeyvalue k>

✁

<k,listofridsofdatarecordswithsearchkeyk>

Choiceofalternativefordataentriesis
rthogonaltotheindexingtechniqueusedto

locatedataentrieswithagivenkeyvaluek.

✁

Examplesofindexingtechniques:B+trees,hash- basedstructures

✁

Typically,indexcontainsauxiliaryinformationthat directssearchestothedesireddataentries

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 6

AlternativesforDataEntries(Contd.)

Alternative1:

✂

Ifthisisused,indexstructureisafileorganization fordatarecords(insteadofaHeapfileorsorted file).

✂

Atmostoneindexonagivencollectionofdata recordscanuseAlternative1.(Otherwise,data recordsareduplicated,leadingtoredundant storageandpotentialinconsistency.)

✂

Ifdatarecordsareverylarge,#ofpages containingdataentriesishigh.Impliessizeof auxiliaryinformationintheindexisalsolarge, typically.

SLIDE 3

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 7

AlternativesforDataEntries(Contd.)

Alternatives2and3:

✂

Dataentriestypicallymuchsmallerthandata records.So,betterthanAlternative1withlarge datarecords,especiallyifsearchkeysaresmall. (Portionofindexstructureusedtodirectsearch, whichdependsonsizeofdataentries,ismuch smallerthanwithAlternative1.)

✂

Alternative3morecompactthanAlternative2,but leadstovariablesizeddataentriesevenifsearch keysareoffixedlength.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 8

IndexClassification

Primary vs.secondary:Ifsearchkeycontains

primarykey,thencalledprimaryindex.

✂

Unique index:Searchkeycontainsacandidatekey.

Clustered vs. unclustered:Iforderofdatarecords

isthesameas,or`closeto’,orderofdataentries, thencalledclusteredindex.

✂

Alternative1impliesclustered;inpractice,clustered alsoimpliesAlternative1(sincesortedfilesarerare).

✂

Afilecanbeclusteredonatmostonesearchkey.

✂

Costofretrievingdatarecordsthroughindexvaries greatlybasedonwhetherindexisclusteredornot!

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 9

Clusteredvs. Unclustered Index
SupposethatAlternative(2)isusedfordataentries,

andthatthedatarecordsarestoredinaHeapfile.

✂

Tobuildclusteredindex,firstsorttheHeapfile(with somefreespaceoneachpageforfutureinserts).

✂

Overflowpagesmaybeneededforinserts.(Thus,orderof data recs is`closeto’,butnotidenticalto,thesortorder.)

Indexentries Dataentries directsearchfor (IndexFile) (Datafile) DataRecords dataentries Dataentries DataRecords CLUSTERED UNCLUSTERED

SLIDE 4

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 10

Hash-BasedIndexes

Goodforequalityselections.
Indexisacollectionofbuckets.Bucket=primary

page pluszeroormore overflow pages.

Hashingfunction h:h(r)=bucketinwhich

recordr belongs.h looksatthesearchkey fields

fr.
IfAlternative(1) isused,thebucketscontain

thedatarecords;otherwise,theycontain<key, rid>or<key,rid-list>pairs.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 11

B+TreeIndexes

Leafpagescontain dataentries,andarechained(prev &next)
Non-leafpagescontainindexentries anddirectsearches:

P0 K 1 P 1 K 2 P 2 K m P m

indexentry

Non-leaf Pages Pages Leaf DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 12

ExampleB+Tree

Find28*?29*?All>15*and<30*
Insert/delete:Finddataentryinleaf,then

changeit.Needtoadjustparentsometimes.

✁

Andchangesometimesbubblesupthetree

2* 3* Root

17

30 14* 16* 33* 34* 38* 39* 13 5 7* 5* 8* 22* 24* 27 27* 29*

Entries<=17 Entries>17

SLIDE 5

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 13

CostModelforOurAnalysis

WeignoreCPUcosts,forsimplicity:

✂

B:Thenumberofdatapages

✂

R:Numberofrecordsperpage

✂

D:(Average)timetoreadorwritediskpage

✂

MeasuringnumberofpageI/O’signoresgainsof pre-fetchingasequenceofpages;thus,evenI/O costisonlyapproximated.

✂

Average-caseanalysis;basedonseveralsimplistic assumptions. * Goodenoughtoshowtheoveralltrends!

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 14

ComparingFileOrganizations

Heapfiles(randomorder;insertat eof)
Sortedfiles,sortedon<age, sal>
ClusteredB+treefile,Alternative(1),search

key<age, sal>

Heapfilewith unclustered B+treeindexon

searchkey<age, sal>

Heapfilewith unclustered hashindexon

searchkey<age, sal>

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 15

OperationstoCompare

Scan:Fetchallrecordsfromdisk
Equalitysearch
Rangeselection
Insertarecord
Deletearecord

SLIDE 6

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 16

AssumptionsinOurAnalysis

HeapFiles:

✂

Equalityselectiononkey;exactlyonematch.

SortedFiles:

✂

Filescompactedafterdeletions.

Indexes:

✁

Alt(2),(3):dataentrysize=10%sizeofrecord

✂

Hash:Nooverflowbuckets.

80%pageoccupancy=>Filesize=1.25datasize

✂

Tree:67%occupancy(thisistypical).

Impliesfilesize=1.5datasize

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 17

CostofOperations

(a)Scan

(b) Equality (c)Range (d)Insert (e)Delete (1)Heap

(2)Sorted
(3)Clustered
(4)Unclustered

Treeindex

(5)Unclustered

Hashindex

* Severalassumptionsunderliethese(rough)estimates!

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 18

CostofOperations

(a)Scan

(b)Equality (c)Range (d)Insert (e)Delete (1)Heap BD

0.5BD BD 2D Search +D

(2)Sorted BD

Dlog2B Dlog2B+ #matches Search +BD Search +BD

(3)Clustered 1.5BD

DlogF1.5B DlogF1.5B +# matches Search +D Search +D

(4)Unclustered Treeindex BD(R+0.15) D(1+

logF0.15B) DlogF0.15B +# matches D(3+ logF0.15B) Search +2D

(5)Unclustered Hashindex

BD(R+0.1 25) 2D BD 4D Search +2D

* Severalassumptionsunderliethese(rough)estimates!

SLIDE 7

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 19

UnderstandingtheWorkload

Foreachqueryintheworkload:

✁

Whichrelationsdoesitaccess?

✁

Whichattributesareretrieved?

✁

Whichattributesareinvolvedinselection/joinconditions? Howselectivearetheseconditionslikelytobe?

Foreachupdateintheworkload:

✁

Whichattributesareinvolvedinselection/joinconditions? Howselectivearetheseconditionslikelytobe?

✁

Thetypeofupdate(INSERT/DELETE/UPDATE),andthe attributesthatareaffected.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 20

ChoiceofIndexes

Whatindexesshouldwecreate?

✁

Whichrelationsshouldhaveindexes?Whatfield(s) shouldbethesearchkey?Shouldwebuildseveral indexes?

Foreachindex,whatkindofanindexshouldit

be?

✁

Clustered?Hash/tree?

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 21

ChoiceofIndexes(Contd.)

Oneapproach: Considerthemostimportantqueries

inturn.Considerthebestplanusingthecurrent indexes,andseeifabetterplanispossiblewithan additionalindex.Ifso,createit.

✂

Obviously,thisimpliesthatwemustunderstandhowa DBMSevaluatesqueriesandcreatesqueryevaluationplans!

✂

Fornow,wediscusssimple1-tablequeries.

Beforecreatinganindex,mustalsoconsiderthe

impactonupdatesintheworkload!

✁

Trade-off: Indexescanmakequeriesgofaster,updates slower.Requirediskspace,too.

SLIDE 8

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 22

IndexSelectionGuidelines

AttributesinWHEREclausearecandidatesforindexkeys.

✁

Exactmatchconditionsuggestshashindex.

✁

Rangequerysuggeststreeindex.

Clusteringisespeciallyusefulforrangequeries;canalsohelp on

equalityqueriesiftherearemanyduplicates.

Multi-attributesearchkeysshouldbeconsideredwhena

WHEREclausecontainsseveralconditions.

✂

Orderofattributesisimportantforrangequeries.

✂

Suchindexescansometimesenableindex-only strategiesfor importantqueries.

Forindex-onlystrategies,clusteringisnotimportant!
Trytochooseindexesthatbenefitasmanyqueriesas

possible.Sinceonlyoneindexcanbeclusteredperrelation, chooseitbasedonimportantqueriesthatwouldbenefitthe mostfromclustering.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 23

ExamplesofClusteredIndexes

B+treeindexonE.agecanbe

usedtogetqualifying tuples.

✁

Howselectiveisthecondition?

✁

Istheindexclustered?

ConsidertheGROUPBYquery.

✁

Ifmany tuples haveE.age >10,using E.age indexandsortingtheretrieved tuples maybecostly.

✁

ClusteredE.dno indexmaybebetter!

Equalityqueriesandduplicates:

✁

ClusteringonE.hobby helps!

SELECT E.dno FROM Emp E WHERE E.age>40 SELECT E.dno, COUNT (*) FROM Emp E WHERE E.age>10 GROUPBYE.dno SELECT E.dno FROM Emp E WHERE E.hobby=Stamps

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 24

IndexeswithCompositeSearchKeys

CompositeSearchKeys:Search
nacombinationoffields.

✂

Equalityquery:Everyfield valueisequaltoaconstant value.E.g. wrt <sal,age>index:

age=20and sal =75

✂

Rangequery: Somefieldvalue isnotaconstant.E.g.:

age=20;orage=20and sal >10
Dataentriesinindexsorted

bysearchkeytosupport rangequeries.

✂

Lexicographicorder,or

✂

Spatialorder.

sue 13 75 bob cal joe 12 10 20 80 11 12 name age sal <sal,age> <age, sal> <age> <sal> 12,20 12,10 11,80 13,75 20,12 10,12 75,13 80,11 11 12 12 13 10 20 75 80

Datarecords sortedbyname Dataentriesinindex sortedby<sal,age> Dataentries sortedby<sal> Examplesofcompositekey indexesusinglexicographicorder.

SLIDE 9

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 25

CompositeSearchKeys

Toretrieve Emp recordswithage=30AND sal=4000,

anindexon<age,sal>wouldbebetterthananindex

nage oranindexon sal.

✁

Choiceofindexkeyorthogonaltoclusteringetc.

Ifconditionis:20<age<30AND 3000<sal<5000:

✁

Clusteredtreeindexon<age,sal>or<sal,age>isbest.

Ifconditionis:age=30AND 3000<sal<5000:

✁

Clustered<age,sal>indexmuchbetterthan<sal,age> index!

Compositeindexesarelarger,updatedmoreoften.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 26

Index-OnlyPlans

Anumberof

queriescanbe answered without retrievingany tuples fromone

rmoreofthe

relations involvedifa suitableindex isavailable.

SELECT D.mgr FROM DeptD, Emp E WHERE D.dno=E.dno SELECT D.mgr,E.eid FROM DeptD, Emp E WHERE D.dno=E.dno SELECT E.dno,COUNT(*) FROM Emp E GROUPBYE.dno SELECT E.dno,MIN(E.sal) FROM Emp E GROUPBYE.dno SELECT AVG(E.sal) FROM Emp E WHEREE.age=25AND

E.sal BETWEEN 3000AND5000 <E.dno> <E.dno,E.eid> Treeindex! <E.dno> <E.dno,E.sal> Treeindex! <E.age,E.sal>

r

<E.sal,E.age> Tree!

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 27

Index-OnlyPlans(Contd.)

Index-onlyplans

arepossibleifthe keyis<dno,age>

rwehaveatree

indexwithkey <age,dno>

✂

Whichisbetter?

✂

Whatifwe considerthe secondquery?

SELECT E.dno, COUNT (*) FROM Emp E WHERE E.age=30 GROUPBYE.dno SELECT E.dno, COUNT (*) FROM Emp E WHERE E.age>30 GROUPBYE.dno

SLIDE 10

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 28

Summary

Manyalternativefileorganizationsexist,each

appropriateinsomesituation.

Ifselectionqueriesarefrequent,sortingthe

fileorbuildinganindex isimportant.

✁

Hash-basedindexesonlygoodforequalitysearch.

✁

Sortedfilesandtree-basedindexesbestforrange search;alsogoodforequalitysearch.(Filesrarely keptsortedinpractice;B+treeindexisbetter.)

Indexisacollectionofdataentriesplusaway

toquicklyfindentrieswithgivenkeyvalues.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 29

Summary(Contd.)

Dataentriescanbeactualdatarecords,<key,

rid>pairs,or<key,rid-list>pairs.

✁

Choiceorthogonaltoindexingtechniqueusedto locatedataentrieswithagivenkeyvalue.

Canhaveseveralindexesonagivenfileof

datarecords,eachwithadifferentsearchkey.

Indexescanbeclassifiedasclusteredvs.

unclustered,primaryvs.secondary,and densevs.sparse.Differenceshaveimportant consequencesforutility/performance.

DatabaseManagementSystems3ed,R.RamakrishnanandJ.Gehrke 30

Summary(Contd.)

Understandingthenatureoftheworkload forthe

application,andtheperformancegoals,isessential todevelopingagooddesign.

✁

Whataretheimportantqueriesandupdates?What attributes/relationsareinvolved?

Indexesmustbechosentospeedupimportant

queries(andperhapssomeupdates!).

✁

Indexmaintenanceoverheadonupdatestokeyfields.

✁

Chooseindexesthatcanhelpmanyqueries,ifpossible.

✁

Buildindexestosupportindex-onlystrategies.

✁

Clusteringisanimportantdecision;onlyoneindexona givenrelationcanbeclustered!

✁