Probabilistic Modeling and Joint Distribution Model



SLIDE 1

Probabilistic Modeling / Joint Distribution Model Haluk Madencioglu

Probabilistic Modeling and Joint Distribution Model

SLIDE 2

Elements of Probability Theory: Introduction

  • Concerned with the analysis of random phenomena
  • Originated from gambling and games
  • Uses ideas of counting, combinatorics and measure theory
  • Uses mathematical abstractions of non-deterministic events

SLIDE 3

Elements of Probability Theory: Introduction

  • Continuous probability theory deals with events that occur in a continuous sample space
  • Discrete probability deals with events that occur in countable sample spaces
  • Event: a set of outcomes of an experiment, i.e., a subset of the sample space

SLIDE 4

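Elements of Probability Theory: Axioms of Probability

  • Nonnegativity: $P(E) \ge 0$ for every event $E$
  • Additivity: for pairwise mutually exclusive events $E_1, E_2, \ldots, E_n$: $P(E_1 \cup E_2 \cup \cdots \cup E_n) = \sum_i P(E_i)$
  • Normalization (unit measure): $P(\Omega) = 1$, where $\Omega$ is the universe

Some consequences:
  • $0 \le P(E) \le 1$ and $P(\emptyset) = 0$
  • $P(\Omega \setminus E) = 1 - P(E)$
  • $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
  • $P(A \setminus B) = P(A) - P(B)$ if $B \subseteq A$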
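A quick way to check the consequences listed above is to enumerate a small finite sample space. This sketch assumes a fair six-sided die with the uniform measure; the helper `P` and the events `A`, `B`, `C` are illustrative names, not part of the slides.

```python
from fractions import Fraction

# Assumed example: one fair six-sided die, every outcome has probability 1/6.
OMEGA = frozenset(range(1, 7))

def P(event):
    """Probability of an event (a subset of OMEGA) under the uniform measure."""
    return Fraction(len(event & OMEGA), len(OMEGA))

A = frozenset({2, 4, 6})   # "even number"
B = frozenset({4, 5, 6})   # "greater than 3"

# Normalization and the complement rule
assert P(OMEGA) == 1 and P(frozenset()) == 0
assert P(OMEGA - A) == 1 - P(A)

# Inclusion-exclusion: P(A u B) = P(A) + P(B) - P(A n B)
assert P(A | B) == P(A) + P(B) - P(A & B)

# Difference rule when C is a subset of A: P(A \ C) = P(A) - P(C)
C = frozenset({4, 6})
assert C <= A and P(A - C) == P(A) - P(C)

print(P(A | B))  # 2/3
```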

SLIDE 5

Elements of Probability Theory: Conditional Probability

  • Conditional probability: $P(A \mid B) = P(A, B) / P(B)$, defined when $P(B) > 0$
  • Bayes' rule: $P(A \mid B) = P(B \mid A)\,P(A) / P(B)$
  • Independence condition: $P(A, B) = P(A)\,P(B)$
  • Mutually exclusive events: $P(A, B) = 0$, so $P(A \cup B) = P(A) + P(B)$, or equivalently $P(A \setminus B) = P(A)$
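Bayes' rule with concrete numbers: in this sketch the events and probabilities are hypothetical, and $P(B)$ is obtained from the law of total probability, $P(B) = P(B \mid A)P(A) + P(B \mid \neg A)P(\neg A)$, a step the slide does not show. Exact fractions keep the arithmetic readable, and the result agrees with the definition $P(A, B) / P(B)$.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: A = "condition present", B = "test positive".
p_a = Fraction(1, 100)             # prior P(A)
p_b_given_a = Fraction(9, 10)      # P(B | A)
p_b_given_not_a = Fraction(1, 20)  # P(B | not A)

# Joint P(A,B) = P(B|A) P(A); P(B) by total probability.
p_ab = p_b_given_a * p_a
p_b = p_ab + p_b_given_not_a * (1 - p_a)

posterior = bayes(p_b_given_a, p_a, p_b)
assert posterior == p_ab / p_b     # same as P(A|B) = P(A,B) / P(B)
print(posterior)                   # 2/13
```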

SLIDE 6

Elements of Probability Theory: Random Variables

  • A random variable is a function mapping the sample space of a random process to values
  • Values can be discrete or continuous
  • Each outcome value (or range of values) is assigned a probability

SLIDE 7

Elements of Probability Theory: Random Variables

  • A random variable is a function mapping the sample space of a random process to values
  • Values can be discrete or continuous
  • Discrete example: fair coin toss, X = {1 if heads, 0 if tails}
  • Or a fair die roll: X = {"the number shown on the die"}

SLIDE 8

Elements of Probability Theory: Random Variables

  • Continuous example: a spinner
  • The outcome can be any real number in $[0, 2\pi)$
  • Any specific value has zero probability, so we use ranges instead of single points
  • E.g., having a value in $[0, \pi/2]$ has probability 1/4

SLIDE 9

Elements of Probability Theory: Random Variables

  • In the case of discrete random variables we use a probability mass function $P_X(x)$, e.g., for the fair coin: $P_X(x) = 1/2$ if $x = 0$, $1/2$ if $x = 1$, and $0$ otherwise
  • Notice the use of uppercase for the random variable and lowercase for the mass function variable
  • Cumulative distribution function (CDF): $F_X(x) = P(X \le x)$
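A minimal sketch of a probability mass function and the CDF built from it, for the fair coin above and a fair die. The function names are illustrative; the CDF simply sums the mass function over all support values not exceeding x.

```python
from fractions import Fraction

def coin_pmf(x):
    """P_X(x) for a fair coin with X = 1 for heads, 0 for tails."""
    return Fraction(1, 2) if x in (0, 1) else Fraction(0)

def die_pmf(x):
    """P_X(x) for a fair die, X = the number shown."""
    return Fraction(1, 6) if x in range(1, 7) else Fraction(0)

def cdf(pmf, support, x):
    """F_X(x) = P(X <= x): sum the mass function over support values <= x."""
    return sum((pmf(v) for v in support if v <= x), Fraction(0))

print(cdf(coin_pmf, [0, 1], 0))        # 1/2
print(cdf(die_pmf, range(1, 7), 4))    # 2/3
print(cdf(die_pmf, range(1, 7), 3.5))  # 1/2
```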

SLIDE 10

Elements of Probability Theory: Random Variables

  • In the case of continuous variables, we use a probability density function $p_X(x)$, so that $P[a \le X \le b] = \int_a^b p_X(x)\,dx$
  • The CDF then becomes $F_X(x) = \int_{-\infty}^{x} p_X(u)\,du$
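For the spinner, assuming the uniform density $p_X(x) = 1/(2\pi)$ on $[0, 2\pi)$, the integral can be approximated numerically. This sketch uses a hand-written midpoint rule (an illustrative choice, not a method from the slides) and recovers the probability 1/4 for $[0, \pi/2]$; an interval of zero width gets probability 0, matching the remark that single points have zero probability.

```python
import math

def spinner_pdf(x):
    """Assumed uniform density on [0, 2*pi): 1 / (2*pi) inside, 0 outside."""
    return 1.0 / (2 * math.pi) if 0 <= x < 2 * math.pi else 0.0

def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def spinner_cdf(x):
    """F_X(x): the density is 0 below 0, so integrate from 0 up to x."""
    return integrate(spinner_pdf, 0.0, min(max(x, 0.0), 2 * math.pi))

prob = integrate(spinner_pdf, 0.0, math.pi / 2)
print(round(prob, 6))   # 0.25
```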

SLIDE 11

Elements of Probability Theory: Well-Known Distributions

Discrete uniform distribution

SLIDE 12

Elements of Probability Theory: Well-Known Distributions

Binomial distribution
Special case: n = 1 → Bernoulli distribution

SLIDE 13

Elements of Probability Theory: Well-Known Distributions

Special case: n = 1 → Bernoulli distribution

SLIDE 14

Elements of Probability Theory: Well-Known Distributions

Poisson distribution: the probability that n events occur in a fixed interval, when events occur with a known average rate $\lambda$ and independently of the time since the last event
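As a sketch, the standard probability mass functions of these distributions can be written as follows (function names are illustrative). The Bernoulli case is literally the binomial with n = 1, and each pmf sums to 1 over its support.

```python
from math import comb, exp, factorial

def discrete_uniform_pmf(x, values):
    """Each of the n listed values has probability 1/n."""
    return 1 / len(values) if x in values else 0.0

def binomial_pmf(k, n, p):
    """P(K = k): k successes in n independent trials, success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def bernoulli_pmf(k, p):
    """Special case n = 1 of the binomial distribution."""
    return binomial_pmf(k, 1, p)

def poisson_pmf(n, lam):
    """P(N = n): n events in an interval with average rate lam."""
    return lam**n * exp(-lam) / factorial(n)

print(binomial_pmf(2, 4, 0.5))  # 0.375
print(bernoulli_pmf(1, 0.3))    # 0.3
```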

SLIDE 15

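Elements of Probability Theory: Expected Value and Variance

  • Expected value: the probability-weighted average of the possible outcomes, $E[X] = \sum_x x\,P_X(x)$ (an integral over the density in the continuous case)
  • Variance: the expected value of the square of the deviation of the random variable from its expected value, $\mathrm{Var}(X) = E[(X - E[X])^2]$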
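Both definitions translate directly into sums over a probability mass function. This sketch uses a fair die as an assumed example and exact fractions; the function names are illustrative.

```python
from fractions import Fraction

def expectation(pmf):
    """E[X] = sum_x x * p(x) for a pmf given as {value: probability}."""
    return sum((x * p for x, p in pmf.items()), Fraction(0))

def variance(pmf):
    """Var(X) = E[(X - E[X])^2]."""
    mu = expectation(pmf)
    return sum(((x - mu) ** 2 * p for x, p in pmf.items()), Fraction(0))

die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expectation(die), variance(die))  # 7/2 35/12
```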

SLIDE 16

Elements of Probability Theory: Joint Distributions

  • More than one random variable on the same probability space (universe)
  • Events are defined in terms of all the variables
  • Called a multivariate distribution; bivariate if two variables are involved
  • Remembering Bayes' rule, the conditional distribution is $P(Y = y \mid X = x) = P(X = x, Y = y) / P(X = x)$

SLIDE 17

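Probabilistic Modeling: Joint Distributions

  • Similar to probabilities, if the variables are independent: $P(X = x, Y = y) = P(X = x)\,P(Y = y)$
  • Continuous distribution case: $p_{X,Y}(x, y) = p_X(x)\,p_Y(y)$
  • Marginal distributions: $P(X = x) = \sum_y P(X = x, Y = y)$, or $p_X(x) = \int p_{X,Y}(x, y)\,dy$ in the continuous case
  • If the variables are independent, the joint reduces to a simple product, and the marginal sum just returns the factor $P(X = x)$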
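The joint, marginal and conditional relations above can be checked on a small table. In this sketch the two joint tables are made-up examples: one factorizes into its marginals, the other does not. Function names are illustrative.

```python
from fractions import Fraction as F

# Hypothetical joint pmfs P(X = x, Y = y) for two binary variables,
# stored as {(x, y): probability}.
joint_indep = {(0, 0): F(3, 10), (0, 1): F(2, 10),
               (1, 0): F(3, 10), (1, 1): F(2, 10)}
joint_dep = {(0, 0): F(4, 10), (0, 1): F(1, 10),
             (1, 0): F(1, 10), (1, 1): F(4, 10)}

def marginal(joint, axis):
    """Marginal of variable `axis` (0 for X, 1 for Y): sum out the other one."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0) + p
    return out

def conditional_y_given_x(joint, x):
    """P(Y = y | X = x) = P(X = x, Y = y) / P(X = x)."""
    px = marginal(joint, 0)[x]
    return {y: p / px for (xx, y), p in joint.items() if xx == x}

def is_independent(joint):
    """True iff P(x, y) = P(x) P(y) for every cell of the table."""
    px, py = marginal(joint, 0), marginal(joint, 1)
    return all(p == px[x] * py[y] for (x, y), p in joint.items())

print(marginal(joint_indep, 1))  # {0: Fraction(3, 5), 1: Fraction(2, 5)}
print(is_independent(joint_indep), is_independent(joint_dep))  # True False
```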

SLIDE 18

Probabilistic Modeling: Random Configurations

  • In general, a set of n random variables: $V = (V_1, V_2, \ldots, V_n)$
  • With possible outcomes for each variable: $\{x_1, x_2, \ldots, x_m\}$
  • A configuration is a vector $x = (x_1, x_2, \ldots, x_n)$ that assigns one value to each variable

(CSCI6509 Notes, Fall 2009, Faculty of Computer Science, Dalhousie University)

SLIDE 19

Probabilistic Modeling: Random Configurations

  • In modeling we assume a sequence of configurations $x^{(1)}, \ldots, x^{(t)}$:
    $x^{(1)} = (x_{11}, x_{12}, \ldots, x_{1n})$
    $x^{(2)} = (x_{21}, x_{22}, \ldots, x_{2n})$
    $\vdots$
    $x^{(t)} = (x_{t1}, x_{t2}, \ldots, x_{tn})$
  • Here we assume a fixed number (n) of components in each configuration, and the $x_{ij}$ are values from a finite set

SLIDE 20

Probabilistic Modeling: Random Configurations

NLP uses probabilistic modeling as a framework for solving problems. Computational tasks:
  • Representation of models
  • Simulation: generating random configurations
  • Evaluation: computing the probability of a complete configuration
  • Marginalization: computing the probability of a partial configuration
  • Conditioning: computing the conditional probability of a completion given a partial observation
  • Completion: finding the most probable completion of a partial observation
  • Learning: parameter estimation

SLIDE 21

Probabilistic Modeling: Joint Distribution Model

  • A joint probability distribution $P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)$ specifies the probability of each complete configuration
  • In general it takes $m^n$ parameters (less one, because of the constraint that they sum to 1) to specify an arbitrary joint distribution on n random variables with m values each

SLIDE 22

Probabilistic Modeling: Joint Distribution Model

  • This can be captured in a lookup table $\theta_{x^{(1)}}, \theta_{x^{(2)}}, \ldots, \theta_{x^{(m^n)}}$, where $\theta_{x^{(k)}}$ gives the probability of the random variables jointly taking on the configuration $x^{(k)}$
  • So $P(X = x^{(k)}) = \theta_{x^{(k)}}$
  • Satisfying $\sum_{k=1}^{m^n} \theta_{x^{(k)}} = 1$
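To see how quickly the table grows, this sketch (with hypothetical helper names) builds one entry per configuration using `itertools.product` and counts the free parameters as $m^n - 1$. The uniform values are placeholders only; any nonnegative entries summing to 1 would do.

```python
from itertools import product

def free_parameters(m, n):
    """An arbitrary joint table over n variables with m values each has m**n
    entries, minus one because the entries must sum to 1."""
    return m**n - 1

def uniform_table(values, n):
    """Illustrative table theta: one entry per configuration, here uniform."""
    configs = list(product(values, repeat=n))
    return {x: 1 / len(configs) for x in configs}

theta = uniform_table(("N", "Y"), 3)      # m = 2 values, n = 3 variables
print(len(theta), free_parameters(2, 3))  # 8 7
print(free_parameters(10, 20))            # 99999999999999999999
```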

SLIDE 23

Probabilistic Modeling: More on Computational Tasks

  • Simulation: given the lookup table representation, compute the cumulative values of $\theta_{x^{(k)}}$ over the configurations, and select the $x^{(k)}$ whose cumulative probability interval contains a given value p (drawn uniformly from [0, 1))
  • Evaluation: evaluate the probability of a complete configuration $x = (x_1, x_2, \ldots, x_n)$. From the lookup table:
    $P(X_1 = x_1, \ldots, X_n = x_n) = \theta_{(x_1 x_2 \ldots x_n)}$
  • Marginalization: the probability of an incomplete configuration:
    $P(X_1 = x_1, \ldots, X_k = x_k) = \sum_{y_{k+1}} \cdots \sum_{y_n} P(X_1 = x_1, \ldots, X_k = x_k, X_{k+1} = y_{k+1}, \ldots, X_n = y_n)$
    From the lookup table:
    $P(X_1 = x_1, \ldots, X_k = x_k) = \sum_{y_{k+1}, \ldots, y_n} \theta_{(x_1 \ldots x_k\, y_{k+1} \ldots y_n)}$
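The three lookup-table operations can be sketched in a few lines. The table `THETA` is a hypothetical distribution over three binary variables (values "N"/"Y"), and configurations are scanned in a fixed order for simulation; p would normally come from a uniform draw such as `random.random()`.

```python
import random
from fractions import Fraction as F
from itertools import product

# Hypothetical lookup table over 3 binary variables with values "N"/"Y".
CONFIGS = list(product("NY", repeat=3))
THETA = dict(zip(CONFIGS, (F(c, 100) for c in (30, 10, 5, 5, 10, 10, 10, 20))))

def simulate(theta, p):
    """Return the configuration whose cumulative interval [c_before, c_after)
    contains p, where p lies in [0, 1)."""
    cumulative = 0
    for x, prob in theta.items():
        cumulative += prob
        if p < cumulative:
            return x
    raise ValueError("p must lie in [0, 1)")

def evaluate(theta, x):
    """Probability of a complete configuration: a single table lookup."""
    return theta[x]

def marginalize(theta, partial):
    """Probability of the partial configuration (x_1, ..., x_k): sum theta
    over all completions (y_{k+1}, ..., y_n)."""
    k = len(partial)
    return sum((prob for x, prob in theta.items() if x[:k] == partial), F(0))

print(simulate(THETA, 0.35))             # ('N', 'N', 'Y')
print(evaluate(THETA, ("Y", "N", "Y")))  # 1/10
print(marginalize(THETA, ("Y",)))        # 1/2
print(simulate(THETA, random.random()))  # a random draw from the table
```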

SLIDE 24

Probabilistic Modeling: More on Computational Tasks

  • Completion: compute the conditional probability of a possible completion $(y_{k+1}, \ldots, y_n)$ given an incomplete configuration $(x_1, \ldots, x_k)$
  • This requires evaluating a complete configuration and then dividing by a marginal sum:
    $P(X_{k+1} = y_{k+1}, \ldots, X_n = y_n \mid X_1 = x_1, \ldots, X_k = x_k) = \dfrac{\theta_{(x_1 \ldots x_k\, y_{k+1} \ldots y_n)}}{\sum_{z_{k+1}, \ldots, z_n} \theta_{(x_1 \ldots x_k\, z_{k+1} \ldots z_n)}}$
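Conditioning and completion use the same lookup table: one entry divided by a marginal sum. The table below is hypothetical, and `most_probable_completion` (an illustrative name) takes the argmax over the matching entries; the shared denominator does not change which completion wins.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical lookup table over 3 binary variables with values "N"/"Y".
CONFIGS = list(product("NY", repeat=3))
THETA = dict(zip(CONFIGS, (F(c, 100) for c in (30, 10, 5, 5, 10, 10, 10, 20))))

def completion_probability(theta, partial, completion):
    """theta[partial + completion] divided by the marginal sum of theta
    over all completions z of the same partial configuration."""
    k = len(partial)
    denominator = sum(p for x, p in theta.items() if x[:k] == partial)
    return theta[partial + completion] / denominator

def most_probable_completion(theta, partial):
    """Completion with the largest table entry among those matching partial."""
    k = len(partial)
    candidates = {x[k:]: p for x, p in theta.items() if x[:k] == partial}
    return max(candidates, key=candidates.get)

print(completion_probability(THETA, ("Y",), ("Y", "Y")))  # 2/5
print(most_probable_completion(THETA, ("N",)))           # ('N', 'N')
```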

SLIDE 25

Probabilistic Modeling: Example

Spam detection: is an arbitrary e-mail message spam or not? Variables:
  • Caps = 'Y' if the message subject line does not contain a lowercase letter, 'N' otherwise,
  • Free = 'Y' if the word 'free' appears in the message subject line (letter case is ignored), 'N' otherwise, and
  • Spam = 'Y' if the message is spam, and 'N' otherwise.

Randomly select 100 messages and count how many times each configuration appears.
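The three variables can be computed from raw messages, and the table estimated by relative frequencies, as in the 100-message experiment. This is a sketch under assumptions: `caps`, `free` and `estimate_joint` are hypothetical helpers, "the word free" is read as a whole-word, case-insensitive match, and the four messages are made-up toy data, not the slide's sample. Configurations never seen get no entry (probability 0), which illustrates the sparse data problem.

```python
import re
from collections import Counter
from fractions import Fraction

def caps(subject):
    """'Y' if the subject line contains no lowercase letter, else 'N'."""
    return "N" if any(ch.islower() for ch in subject) else "Y"

def free(subject):
    """'Y' if the word 'free' occurs in the subject (case ignored), else 'N'.
    Assumption: whole-word match, so 'freedom' does not count."""
    return "Y" if re.search(r"\bfree\b", subject, re.IGNORECASE) else "N"

def estimate_joint(messages):
    """Relative-frequency estimate of P(Free, Caps, Spam) from
    (subject, is_spam) pairs: count each configuration, divide by the total."""
    counts = Counter((free(s), caps(s), "Y" if spam else "N") for s, spam in messages)
    total = sum(counts.values())
    return {config: Fraction(c, total) for config, c in counts.items()}

# Toy, made-up sample (not the 100 messages from the slide).
sample = [
    ("FREE MONEY NOW", True),
    ("Meeting moved to 3pm", False),
    ("Get it free today", True),
    ("LUNCH?", False),
]
table = estimate_joint(sample)
print(table[("Y", "Y", "Y")])  # 1/4
```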

SLIDE 26

Probabilistic Modeling: Example

  • Given a fully specified joint distribution table, one can look up the probability of any configuration. For example:
    P(Free = Y, Caps = Y, Spam = Y) = 0.2
    P(Free = Y, Caps = N, Spam = N) = 0.0

SLIDE 27

Probabilistic Modeling: Joint Distribution Model

Drawbacks of the joint distribution model:
  • memory cost to store the table
  • running-time cost to do the summations
  • the sparse data problem in learning

SLIDE 28

Probabilistic Modeling: Generative Model

Idea for a traditional generative model: what does the automaton below generate?

[Figure: automaton over the words I, know, that, he, knows, sky, is, blue]

  • I know that sky is blue, I know that he knows that sky is blue, I know that I know that sky is blue, …
  • But not: sky is blue, I know he, I blue that …
  • This is the language of this automaton

SLIDE 29

Probabilistic Modeling: Generative Model

Idea for a probabilistic generative model: if instead each node has a probability distribution over generating different terms, we have a language model.

[Figure: one-state automaton Qi with P(STOP | Qi) = 0.2 (Manning, Raghavan & Schutze, 2009)]

  string   assigned probability
  the      0.2
  a        0.1
  frog     0.01
  toad     0.01
  said     0.03
  likes    0.02
  that     0.04
  …        …

SLIDE 30

Probabilistic Modeling: Generative Model

  • A language model is a function that puts a probability measure over strings drawn from some vocabulary $\Sigma$: $\sum_{s \in \Sigma^*} P(s) = 1$
  • Each $P(t_i)$ is a term emission probability in this unigram model
  • Such a model places a probability distribution over any sequence of words
  • By construction, it also provides a model for generating text according to its distribution

SLIDE 31

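Probabilistic Modeling: Generative Model

  • P(frog said that toad likes frog) = (0.01 × 0.03 × 0.04 × 0.01 × 0.02 × 0.01) {emission probabilities} × (0.8 × 0.8 × 0.8 × 0.8 × 0.8 × 0.2) {continue/stop probabilities} ≈ 0.000000000001573
  • The six words need five "continue" decisions (0.8 each) and one final "stop" (0.2)
  • Usually the continue/stop probabilities are omitted when comparing models
  • Based on the computed values, we can tell which model is more likely to have generated the string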
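The calculation above can be reproduced with a small function (a sketch; the names are illustrative). The emission probabilities are the ones listed for the model, and the stop probability 0.2 is applied once after n - 1 continuations.

```python
from math import prod

M1 = {"the": 0.2, "a": 0.1, "frog": 0.01, "toad": 0.01,
      "said": 0.03, "likes": 0.02, "that": 0.04}

def string_probability(words, model, p_stop=0.2, include_stop=True):
    """Unigram probability of a word sequence. With include_stop, multiply by
    (1 - p_stop) for each of the len(words) - 1 continuations and p_stop once."""
    emission = prod(model[w] for w in words)
    if not include_stop:
        return emission
    return emission * (1 - p_stop) ** (len(words) - 1) * p_stop

s = "frog said that toad likes frog".split()
print(f"{string_probability(s, M1):.4g}")                      # 1.573e-12
print(f"{string_probability(s, M1, include_stop=False):.4g}")  # 2.4e-11
```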

SLIDE 32

Probabilistic Modeling: Generative Model

Compare this model (M2) to the previous model (M1):

  string   assigned probability
  the      0.15
  a        0.12
  frog     0.0002
  toad     0.0001
  said     0.03
  likes    0.04
  that     0.04
  …        …

Omitting P(STOP): P(s | M1) = 0.00000000000048 and P(s | M2) = 0.000000000000000384, so model 1 is more likely.
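The sketch below compares the two models on the string "frog said that toad likes frog" (our own choice, built only from the listed terms), omitting P(STOP). Summing log probabilities avoids numerical underflow on long strings; the exponentiated difference shows how strongly M1 is preferred. Function names are illustrative.

```python
import math

M1 = {"the": 0.2, "a": 0.1, "frog": 0.01, "toad": 0.01,
      "said": 0.03, "likes": 0.02, "that": 0.04}
M2 = {"the": 0.15, "a": 0.12, "frog": 0.0002, "toad": 0.0001,
      "said": 0.03, "likes": 0.04, "that": 0.04}

def log_likelihood(words, model):
    """Sum of log emission probabilities (stop probability omitted)."""
    return sum(math.log(model[w]) for w in words)

s = "frog said that toad likes frog".split()
scores = {name: log_likelihood(s, m) for name, m in {"M1": M1, "M2": M2}.items()}
best = max(scores, key=scores.get)
print(best)                                          # M1
print(round(math.exp(scores["M1"] - scores["M2"])))  # 125000
```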

SLIDE 33

Probabilistic Modeling: Types of Generative Models

  • In general, for a sequence of events, each event is conditioned on all earlier events (the chain rule, obtained by applying the conditional probability rule repeatedly):
    $P(t_1 t_2 t_3 t_4) = P(t_1)\,P(t_2 \mid t_1)\,P(t_3 \mid t_1 t_2)\,P(t_4 \mid t_1 t_2 t_3)$
  • If total independence among events exists:
    $P_{uni}(t_1 t_2 t_3 t_4) = P(t_1)\,P(t_2)\,P(t_3)\,P(t_4)$
  • This is the unigram model

SLIDE 34

Probabilistic Modeling: Types of Generative Models

  • If the conditioning is only on the previous term:
    $P_{bi}(t_1 t_2 t_3 t_4) = P(t_1)\,P(t_2 \mid t_1)\,P(t_3 \mid t_2)\,P(t_4 \mid t_3)$
  • This is the bigram model
  • Unigram models are frequently used when sentence structure is not important, e.g., in IR but not in speech recognition
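A bigram model can be estimated by maximum likelihood from counts. This sketch assumes a tiny toy corpus (reusing the automaton sentences) and hypothetical function names; $P(t_1)$ comes from unigram relative frequencies, and unseen bigrams simply get probability 0, which is the sparse data problem in miniature.

```python
from collections import Counter

def train_unigram(sentences):
    """MLE unigram model: P(w) = count(w) / total number of tokens."""
    counts = Counter(w for sent in sentences for w in sent.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def train_bigram(sentences):
    """MLE bigram model: P(w2 | w1) = count(w1 w2) / count(w1 as a history)."""
    pair_counts, history_counts = Counter(), Counter()
    for sent in sentences:
        tokens = sent.split()
        for w1, w2 in zip(tokens, tokens[1:]):
            pair_counts[(w1, w2)] += 1
            history_counts[w1] += 1
    return {pair: c / history_counts[pair[0]] for pair, c in pair_counts.items()}

def p_bigram(tokens, uni, bi):
    """P_bi(t1..tn) = P(t1) * prod P(t_i | t_{i-1}); unseen bigrams give 0."""
    p = uni.get(tokens[0], 0.0)
    for w1, w2 in zip(tokens, tokens[1:]):
        p *= bi.get((w1, w2), 0.0)
    return p

corpus = ["I know that sky is blue", "I know that he knows that sky is blue"]
uni, bi = train_unigram(corpus), train_bigram(corpus)
print(p_bigram("I know that sky is blue".split(), uni, bi))  # about 0.0889 (= 4/45)
```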

SLIDE 35

Probabilistic Modeling: Types of Generative Models

  • Unigram models are of the "bag of words" type
  • Recall the multinomial distribution of probabilities over words:
    $P(d) = \dfrac{L_d!}{tf_{t_1,d}!\,tf_{t_2,d}! \cdots tf_{t_M,d}!}\,P(t_1)^{tf_{t_1,d}}\,P(t_2)^{tf_{t_2,d}} \cdots P(t_M)^{tf_{t_M,d}}$
  • where $L_d$ is the length of document d, M is the size of the vocabulary, and $tf_{t,d}$ is the frequency of term t in d
  • Observe that the positions of the terms are insignificant here
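The multinomial formula can be computed directly from term counts. In this sketch the vocabulary and probabilities are illustrative and the function name is hypothetical; reordering the tokens leaves the result unchanged, which is the bag-of-words property.

```python
from collections import Counter
from math import factorial, prod

def multinomial_doc_probability(doc_tokens, term_probs):
    """P(d) = L_d! / prod_t tf_{t,d}! * prod_t P(t)^{tf_{t,d}}."""
    tf = Counter(doc_tokens)
    L_d = len(doc_tokens)
    coefficient = factorial(L_d) // prod(factorial(c) for c in tf.values())
    return coefficient * prod(term_probs[t] ** c for t, c in tf.items())

probs = {"a": 0.5, "b": 0.5}
print(multinomial_doc_probability(["a", "a", "b"], probs))  # 0.375
```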

SLIDE 36

Probabilistic Modeling: Types of Generative Models

Fundamental question: which model to use?
  • Speech recognition: the model has to be general enough, beyond the observed data, to allow unknown sequences
  • IR: a document is finite and mostly fixed, so:
  • Get a representative sample
  • Build a language model for the document
  • Calculate the generative probabilities of sequences from the model
  • Rank documents by the probability ranking principle
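The IR steps above can be sketched as query-likelihood ranking: build a unigram model per document and rank by $P(q \mid M_d)$. Mixing each document model with a collection model (Jelinek-Mercer smoothing with weight `lam`) is an assumption added here so that query terms missing from a document do not force a zero score; the documents and function names are toy, hypothetical examples.

```python
from collections import Counter

def unigram_mle(tokens):
    """Relative-frequency unigram model of a token list."""
    counts = Counter(tokens)
    return {t: c / len(tokens) for t, c in counts.items()}

def query_likelihood(query, doc_model, collection_model, lam=0.5):
    """P(q | M_d) with each term probability mixed with the collection model
    (Jelinek-Mercer smoothing, an assumption of this sketch)."""
    p = 1.0
    for t in query:
        p *= lam * doc_model.get(t, 0.0) + (1 - lam) * collection_model.get(t, 0.0)
    return p

def rank(query, docs, lam=0.5):
    """Return document names sorted by descending query likelihood."""
    collection = unigram_mle([t for tokens in docs.values() for t in tokens])
    models = {name: unigram_mle(tokens) for name, tokens in docs.items()}
    scores = {n: query_likelihood(query, m, collection, lam) for n, m in models.items()}
    return sorted(scores, key=scores.get, reverse=True)

docs = {
    "d1": "frog said that toad likes frog".split(),
    "d2": "the sky is blue".split(),
}
print(rank(["frog", "toad"], docs))  # ['d1', 'd2']
```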

SLIDE 37

Probabilistic Approaches: Probability Ranking Principle

  • Rank documents by their estimated probability of relevance $P(R = 1 \mid d, q)$ for document d and query q
  • Basic case, 1/0 loss: rank the documents and return the top k
  • Nonrestrictive case, Bayes optimal decision rule: d is relevant iff $P(R = 1 \mid d, q) > P(R = 0 \mid d, q)$

SLIDE 38

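Probabilistic Approaches: Probability Ranking Principle

If cost is involved, d is the next document to retrieve if
$C_0 \cdot P(R = 0 \mid d) - C_1 \cdot P(R = 1 \mid d) \le C_0 \cdot P(R = 0 \mid d') - C_1 \cdot P(R = 1 \mid d')$
for every document d′ not yet retrieved, where
  • $C_1$ = cost of missing a relevant document
  • $C_0$ = cost of returning a nonrelevant document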
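The cost-based rule can be sketched by sorting documents on the cost term. The probabilities of relevance and the default costs $C_0 = 1$, $C_1 = 2$ are hypothetical. Because $P(R = 0 \mid d) = 1 - P(R = 1 \mid d)$, the term equals $C_0 - (C_0 + C_1)\,P(R = 1 \mid d)$, so with nonnegative costs this order is the same as ranking by $P(R = 1 \mid d)$.

```python
def expected_cost(p_rel, c0, c1):
    """C0 * P(R=0|d) - C1 * P(R=1|d), with P(R=0|d) = 1 - P(R=1|d)."""
    return c0 * (1 - p_rel) - c1 * p_rel

def retrieval_order(p_rel_by_doc, c0=1.0, c1=2.0):
    """Retrieve next the document whose cost term is <= that of every
    document not yet retrieved, i.e. sort by ascending cost term."""
    return sorted(p_rel_by_doc, key=lambda d: expected_cost(p_rel_by_doc[d], c0, c1))

p_rel = {"d1": 0.2, "d2": 0.9, "d3": 0.5}  # hypothetical P(R=1|d,q)
print(retrieval_order(p_rel))  # ['d2', 'd3', 'd1']
```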

SLIDE 39

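Probabilistic Modeling: Types of Other Generative Models

  • Rather than building a document model and checking its likelihood of generating the query, build a query model and check its likelihood of generating the document
  • OR: use both approaches together
  • This needs a measure of divergence between the document and query models, e.g., the Kullback-Leibler divergence:
    $R(d; q) = \sum_{t \in V} P(t \mid M_q) \log \dfrac{P(t \mid M_q)}{P(t \mid M_d)}$
  • Documents with a smaller divergence from the query model are ranked higher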
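The divergence formula as a sketch; the query and document models are made-up dictionaries and the function name is hypothetical. The document model must give every query term a nonzero probability (in practice via smoothing), otherwise the logarithm is undefined; here a missing term raises a `KeyError`.

```python
import math

def kl_query_document(query_model, doc_model):
    """R(d; q) = sum_t P(t|M_q) * log(P(t|M_q) / P(t|M_d)).
    Terms with P(t|M_q) = 0 contribute nothing."""
    total = 0.0
    for t, pq in query_model.items():
        if pq > 0:
            total += pq * math.log(pq / doc_model[t])
    return total

Mq = {"frog": 0.5, "toad": 0.5}
Md1 = {"frog": 0.4, "toad": 0.4, "sky": 0.2}
Md2 = {"frog": 0.1, "toad": 0.1, "sky": 0.8}
print(kl_query_document(Mq, Md1) < kl_query_document(Mq, Md2))  # True
```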

SLIDE 40

Probabilistic Modeling: Types of Other Generative Models

  • A translation model can generate query words that do not appear in a document by translating document terms into alternate terms with similar meaning
  • It needs to know the conditional probability distribution between vocabulary terms:
    $P(q \mid M_d) = \prod_{t \in q} \sum_{v \in V} P(v \mid M_d)\,T(t \mid v)$
  • where $P(q \mid M_d)$ is the query translation model, $P(v \mid M_d)$ is the document language model, and $T(t \mid v)$ is the conditional probability distribution between vocabulary terms
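The translation model as a sketch, with a hypothetical document model and translation table T stored as a dictionary keyed by (t, v). The query word "car" never occurs in the document, yet it receives probability through the translation from "automobile".

```python
def translation_query_likelihood(query, doc_model, T):
    """P(q | M_d) = prod_{t in q} sum_{v in V} P(v | M_d) * T(t | v).
    T maps (t, v) -> T(t | v); missing pairs are treated as probability 0."""
    p = 1.0
    for t in query:
        p *= sum(p_v * T.get((t, v), 0.0) for v, p_v in doc_model.items())
    return p

# Hypothetical example: the document says "automobile" but never "car".
doc_model = {"automobile": 0.5, "engine": 0.5}
T = {("car", "automobile"): 0.6, ("automobile", "automobile"): 0.4,
     ("engine", "engine"): 1.0}
print(translation_query_likelihood(["car"], doc_model, T))  # 0.3
```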

SLIDE 41

Sources:
  • CSCI6509 Notes, Fall 2009, Faculty of Computer Science, Dalhousie University, http://www.cs.dal.ca/~vlado/csci6509/coursecalendar.html
  • Manning, Raghavan & Schutze, 2009, An Introduction to Information Retrieval
  • Jurafsky, Martin, 2000, An Introduction to NLP, Computational Linguistics and Speech Recognition
  • Ghahramani, 2000, Fundamentals of Probability