TheEvilsofRedundancy Redundancy isattherootofseveralproblems - - PDF document

the evils of redundancy
SMART_READER_LITE
LIVE PREVIEW

TheEvilsofRedundancy Redundancy isattherootofseveralproblems - - PDF document

TheEvilsofRedundancy Redundancy isattherootofseveralproblems associatedwithrelationalschemas: SchemaRefinementand redundantstorage,insert/delete/updateanomalies


slide-1
SLIDE 1

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 1

SchemaRefinementand NormalForms

Chapter15

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 2

TheEvilsofRedundancy

❖ Redundancy isattherootofseveralproblems

associatedwithrelationalschemas:

– redundantstorage,insert/delete/updateanomalies

❖ Integrityconstraints,inparticular functional

dependencies,canbeusedtoidentifyschemaswith suchproblemsandtosuggestrefinements.

❖ Mainrefinementtechnique:decomposition (replacing

ABCDwith,say,ABandBCD,orACDandABD).

❖ Decompositionshouldbeusedjudiciously:

– Istherereasontodecomposearelation? – Whatproblems(ifany)doesthedecompositioncause?

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 3

FunctionalDependencies(FDs)

❖ Afunctionaldependency XYholdsoverrelationR

if,foreveryallowableinstancer ofR:

– t1r,t2r,(t1)=(t2)implies(t1)=(t2) – i.e.,giventwo tuples inr,iftheXvaluesagree,thentheY

valuesmustalsoagree.(XandYaresets ofattributes.)

❖ AnFDisastatementaboutall allowablerelations.

– Mustbeidentifiedbasedonsemanticsofapplication. – Givensomeallowableinstancer1 ofR,wecancheckifit

violatessomeFDf,butwecannottelliff holdsoverR!

❖ KisacandidatekeyforRmeansthatKR

– However,KRdoesnotrequireKtobeminimal!

∈ ∈ π X

π X π Y πY →

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 4

Example:ConstraintsonEntitySet

❖ ConsiderrelationobtainedfromHourly_Emps:

– Hourly_Emps (ssn,name,lot,rating, hrly_wages,hrs_worked)

❖ Notation:Wewilldenotethisrelationschemaby

listingtheattributes:SNLRWH

– Thisisreallytheset ofattributes{S,N,L,R,W,H}. – Sometimes,wewillrefertoallattributesofarelationby

usingtherelationname.(e.g.,Hourly_Emps forSNLRWH)

❖ Some FDs onHourly_Emps:

– ssn isthekey:SSNLRWH – rating determines hrly_wages:RW

→ →

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 5

Example(Contd.)

❖ ProblemsduetoRW: – Updateanomaly:Can

wechangeWinjust the1st tuple ofSNLRWH?

– Insertionanomaly:Whatifwe

wanttoinsertanemployee anddon’tknowthehourly wageforhisrating?

– Deletionanomaly:Ifwedelete

allemployeeswithrating5, welosetheinformationabout thewageforrating5!

S N L R W H 123-22-3666 Attishoo 48 8 10 40 231-31-5368 Smiley 22 8 10 30 131-24-3650 Smethurst 35 5 7 30 434-26-3751 Guldu 35 5 7 32 612-67-4134 Madayan 35 8 10 40 S N L R H 123-22-3666 Attishoo 48 8 40 231-31-5368 Smiley 22 8 30 131-24-3650 Smethurst 35 5 30 434-26-3751 Guldu 35 5 32 612-67-4134 Madayan 35 8 40 R W 8 10 5 7

Hourly_Emps2 Wages

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 6

RefininganERDiagram

❖ 1stdiagramtranslated:

Workers(S,N,L,D,S) Departments(D,M,B)

– Lotsassociatedwithworkers.

❖ Supposeallworkersina

deptareassignedthesame lot:DL

❖ Redundancy;fixedby:

Workers2(S,N,D,S) Dept_Lots(D,L)

❖ Canfine-tunethis:

Workers2(S,N,D,S) Departments(D,M,B,L)

lot dname budget did since name Works_In Departments Employees ssn lot dname budget did since name Works_In Departments Employees ssn

Before: After:

slide-2
SLIDE 2

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 7

ReasoningAbout FDs

❖ Givensome FDs,wecanusuallyinferadditional FDs:

– ssn

did,didlotimplies ssn lot

❖ AnFDf isimpliedby asetof FDs F iff holds

wheneverall FDs inF hold.

=closureofFisthesetofall FDs thatareimpliedbyF.

❖ Armstrong’sAxioms(X,Y,Zaresetsofattributes):

– Reflexivity:IfXY,thenXY – Augmentation:IfXY,thenXZYZforanyZ – Transitivity:IfXYandYZ,thenXZ

❖ Thesearesound andcomplete inferencerulesfor FDs!

→ → →

F +

→ → → → → →

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 8

ReasoningAbout FDs (Contd.)

❖ Coupleofadditionalrules(thatfollowfromAA):

– Union:IfXYandXZ,thenXYZ – Decomposition:IfXYZ,thenXYandXZ

❖ Example:Contracts(cid,sid,jid,did,pid,qty,value),and:

– Cisthekey:CCSJDPQV – Projectpurchaseseachpartusingsinglecontract: JPC – Deptpurchasesatmostonepartfromasupplier: SDP

❖ JPC,CCSJDPQVimplyJPCSJDPQV ❖ SDPimpliesSDJJP ❖ SDJJP,JPCSJDPQVimplySDJCSJDPQV

→ → → → → → →

→ → → → → → → → → →

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 9

ReasoningAbout FDs (Contd.)

❖ Computingtheclosureofasetof FDs canbe

expensive.(Sizeofclosureisexponentialin# attrs!)

❖ Typically,wejustwanttocheckifagivenFDXYis

intheclosureofasetof FDs F.Anefficientcheck:

– Computeattributeclosure ofX(denoted) wrt F:

◆ SetofallattributesAsuchthatXAisin ◆ Thereisalineartimealgorithmtocomputethis.

– CheckifYisin

❖ DoesF={AB,BC,CDE}implyAE?

– i.e,isAEintheclosure?Equivalently,isEin ?

X+

X+ F+ A+ F+

→ → → → →

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 10

NormalForms

❖ Returningtotheissueofschemarefinement,thefirst

questiontoaskiswhetheranyrefinementisneeded!

❖ Ifarelationisinacertainnormalform (BCNF,3NF

etc.),itisknownthatcertainkindsofproblemsare avoided/minimized.Thiscanbeusedtohelpus decidewhetherdecomposingtherelationwillhelp.

❖ Roleof FDs indetectingredundancy:

– ConsiderarelationRwith3attributes,ABC.

◆ No FDs hold:Thereisnoredundancyhere. ◆ GivenAB:Several tuples couldhavethesameA

value,andifso,they’llallhavethesameBvalue!

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 11

Boyce-Codd NormalForm(BCNF)

❖ Reln Rwith FDs F isinBCNF if,forallXAin

– AX(calledatrivial FD),or – XcontainsakeyforR.

❖ Inotherwords,RisinBCNFiftheonlynon-trivial

FDs thatholdoverRarekeyconstraints.

– NodependencyinRthatcanbepredictedusing FDs alone. – Ifweareshowntwo tuples thatagreeupon

theXvalue,wecannotinfertheAvaluein

  • ne tuple fromtheAvalueintheother.

– IfexamplerelationisinBCNF,the2 tuples

mustbeidentical(sinceXisakey).

F+ →

X Y A x y1 a x y2 ?

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 12

ThirdNormalForm(3NF)

❖ Reln Rwith FDs F isin3NF if,forallXAin

– AX(calledatrivial FD),or – XcontainsakeyforR,or – AispartofsomekeyforR.

❖ Minimality ofakeyiscrucialinthirdconditionabove! ❖ IfRisinBCNF,obviouslyin3NF. ❖ IfRisin3NF,someredundancyispossible.Itisa

compromise,usedwhenBCNFnotachievable(e.g., no``good’’ decomp,orperformanceconsiderations).

– Lossless-join,dependency-preservingdecompositionofRintoa

collectionof3NFrelationsalwayspossible.

F+ →

slide-3
SLIDE 3

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 13

WhatDoes3NFAchieve?

❖ If3NFviolatedbyXA,oneofthefollowingholds:

– XisasubsetofsomekeyK

◆ Westore(X,A)pairsredundantly.

– Xisnotapropersubsetofanykey.

◆ Thereisachainof FDs KXA,whichmeansthat

wecannotassociateanXvaluewithaKvalueunlesswe alsoassociateanAvaluewithanXvalue.

❖ But:evenif reln isin3NF,theseproblemscouldarise.

– e.g.,ReservesSBDC,SC,CSisin3NF,butfor

eachreservationofsailorS,same(S,C)pairisstored.

❖ Thus,3NFisindeedacompromiserelativetoBCNF.

→ → → → →

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 14

DecompositionofaRelationScheme

❖ SupposethatrelationRcontainsattributesA1...An.

Adecomposition ofRconsistsofreplacingRbytwoor morerelationssuchthat:

– Eachnewrelationschemecontainsasubsetoftheattributes

  • fR(andnoattributesthatdonotappearinR),and

– EveryattributeofRappearsasanattributeofoneofthe

newrelations.

❖ Intuitively,decomposingRmeanswewillstore

instancesoftherelationschemesproducedbythe decomposition,insteadofinstancesofR.

❖ E.g.,CandecomposeSNLRWH intoSNLRH andRW.

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 15

ExampleDecomposition

❖ Decompositionsshouldbeusedonlywhenneeded.

– SNLRWHhas FDs SSNLRWHandRW – SecondFDcausesviolationof3NF;Wvaluesrepeatedly

associatedwithRvalues.Easiestwaytofixthisistocreate arelationRWtostoretheseassociations,andtoremoveW fromthemainschema:

◆ i.e.,wedecomposeSNLRWHintoSNLRHandRW

❖ TheinformationtobestoredconsistsofSNLRWH

tuples.Ifwejuststoretheprojectionsofthese tuples

  • ntoSNLRHandRW,arethereanypotential

problemsthatweshouldbeawareof? → →

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 16

ProblemswithDecompositions

❖ Therearethreepotentialproblemstoconsider:

❶ Somequeriesbecomemoreexpensive.

◆ e.g.,HowmuchdidsailorJoeearn?(salary=W*H)

❷ Giveninstancesofthedecomposedrelations,wemaynot

beabletoreconstructthecorrespondinginstanceofthe

  • riginalrelation!

◆ Fortunately,notintheSNLRWHexample.

❸ Checkingsomedependenciesmayrequirejoiningthe

instancesofthedecomposedrelations.

◆ Fortunately,notintheSNLRWHexample.

❖ Tradeoff:Mustconsidertheseissuesvs.redundancy.

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 17

Lossless JoinDecompositions

❖ DecompositionofRintoXandYis lossless-join w.r.t.

asetof FDs Fif,foreveryinstancer thatsatisfiesF:

(r)(r)=r

❖ Itisalwaystruethatr(r)(r)

– Ingeneral,theotherdirectiondoesnothold!Ifitdoes,the

decompositionis lossless-join.

❖ Definitionextendedtodecompositioninto3ormore

relationsinastraightforwardway.

❖ Itisessentialthatalldecompositionsusedtodealwith

redundancybe lossless!(AvoidsProblem(2).) π X π Y

⊆ π X

✂ ✄

π Y

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 18

Moreon Lossless Join

❖ ThedecompositionofRinto

XandYis lossless-join wrt F ifandonlyiftheclosureofF contains:

– XYX,or – XYY

❖ Inparticular,the

decompositionofRinto UVandR- Vis lossless-join ifUVholdsoverR. → →

∩ ∩

→ A B C 1 2 3 4 5 6 7 2 8 1 2 8 7 2 3 A B C 1 2 3 4 5 6 7 2 8 A B 1 2 4 5 7 2 B C 2 3 5 6 2 8

slide-4
SLIDE 4

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 19

DependencyPreservingDecomposition

❖ ConsiderCSJDPQV,Ciskey,JPCandSDP.

– BCNFdecomposition:CSJDQVandSDP – Problem:CheckingJPCrequiresajoin!

❖ Dependencypreservingdecomposition(Intuitive):

– IfRisdecomposedintoX,YandZ,andweenforcethe FDs

thatholdonX,onYandonZ,thenall FDs thatweregiven toholdonRmustalsohold.(AvoidsProblem(3).)

❖ Projectionofsetof FDs F:IfRisdecomposedintoX,...

projectionofFontoX(denotedFX )isthesetof FDs UVinF+ (closureofF) suchthatU,VareinX. → → → →

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 20

DependencyPreservingDecompositions(Contd.)

❖ DecompositionofRintoXandYisdependency

preserving if(FX unionFY)+=F +

– i.e.,ifweconsideronlydependenciesintheclosureF + that

canbecheckedinXwithoutconsideringY,andinY withoutconsideringX,theseimplyalldependenciesinF +.

❖ ImportanttoconsiderF +,notF,inthisdefinition:

– ABC,AB,BC,CA,decomposedintoABandBC. – Isthisdependencypreserving?IsCApreserved?????

❖ Dependencypreservingdoesnotimply lossless join:

– ABC,AB,decomposedintoABandBC.

❖ Andvice-versa!(Example?)

→ → → → →

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 21

DecompositionintoBCNF

❖ ConsiderrelationRwith FDs F.IfXYviolates

BCNF,decomposeRintoR- YandXY.

– Repeatedapplicationofthisideawillgiveusacollectionof

relationsthatareinBCNF; lossless joindecomposition,and guaranteedtoterminate.

– e.g.,CSJDPQV,keyC,JPC,SDP,JS – TodealwithSDP,decomposeintoSDP,CSJDQV. – TodealwithJS,decomposeCSJDQVintoJSandCJDQV

❖ Ingeneral,severaldependenciesmaycauseviolation

  • fBCNF.Theorderinwhichwe``dealwith’’them

couldleadtoverydifferentsetsofrelations! → → → → → →

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 22

BCNFandDependency Preservation

❖ Ingeneral, theremaynotbeadependencypreserving

decompositionintoBCNF.

– e.g.,CSZ,CSZ,ZC – Can’tdecomposewhilepreserving1stFD;notinBCNF.

❖ Similarly,decompositionofCSJDQVintoSDP,JS

andCJDQVisnotdependencypreserving(w.r.t.the FDs JPC,SDPandJS).

– However,itisa lossless joindecomposition. – Inthiscase,addingJPCtothecollectionofrelationsgives

usadependencypreservingdecomposition.

◆ JPC tuples storedonlyforcheckingFD!(Redundancy!)

→ → → → →

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 23

Decompositioninto3NF

❖ Obviously,thealgorithmfor lossless join decomp into

BCNFcanbeusedtoobtaina lossless join decomp into3NF(typically,canstopearlier).

❖ Toensuredependencypreservation,oneidea:

– IfXYisnotpreserved,addrelationXY. – ProblemisthatXYmayviolate3NF!e.g.,considerthe

additionofCJPto`preserve’JPC.Whatifwealso haveJC?

❖ Refinement:Insteadofthegivensetof FDs F,usea

minimalcoverforF. → → →

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 24

MinimalCoverforaSetof FDs

❖ Minimalcover Gforasetof FDs F:

– ClosureofF=closureofG. – RighthandsideofeachFDinGisasingleattribute. – IfwemodifyGbydeletinganFDorbydeletingattributes

fromanFDinG,theclosurechanges.

❖ Intuitively,everyFDinGisneeded,and``assmallas

possible’’inordertogetthesameclosureasF.

❖ e.g.,AB,ABCDE,EFGH,ACDFEG

hasthefollowingminimalcover:

– AB,ACDE,EFGandEFH

❖ M.C.→ Lossless-Join,Dep.Pres. Decomp!!!(inbook)

→ → → → → → → →

slide-5
SLIDE 5

DatabaseManagementSystems,R. Ramakrishnan andJ. Gehrke 25

SummaryofSchemaRefinement

❖ IfarelationisinBCNF,itisfreeofredundanciesthat

canbedetectedusing FDs.Thus,tryingtoensure thatallrelationsareinBCNFisagoodheuristic.

❖ IfarelationisnotinBCNF,wecantrytodecompose

itintoacollectionofBCNFrelations.

– Mustconsiderwhetherall FDs arepreserved.Ifa lossless-

join,dependencypreservingdecompositionintoBCNFis notpossible(orunsuitable,giventypicalqueries),should considerdecompositioninto3NF.

– Decompositionsshouldbecarriedoutand/orre-examined

whilekeepingperformancerequirements inmind.