MLE/MAP + Naive Bayes
10-601 Introduction to Machine Learning
Machine Learning Department, School of Computer Science, Carnegie Mellon University
Matt Gormley, Lecture 17, Mar. 20, 2020
Homework 5: Neural Networks
Out: Fri, Feb 28; Due: Sun, Mar 22 at 11:59pm
Homework 6: Learning Theory / Generative Models
Out: Fri, Mar 20; Due: Fri, Mar 27 at 11:59pm
TIP: Do the readings!
Today's in-class poll: http://poll.mlcourse.org
Matt's new after-class office hours (on Zoom)
Suppose we have N samples D = {x(1), x(2), ..., x(N)} from a random variable X.

The likelihood function:
Case 1: X is discrete with pmf p(x|θ): L(θ) = p(x(1)|θ) p(x(2)|θ) ... p(x(N)|θ)
Case 2: X is continuous with pdf f(x|θ): L(θ) = f(x(1)|θ) f(x(2)|θ) ... f(x(N)|θ)

The log-likelihood function:
Case 1: X is discrete with pmf p(x|θ): ℓ(θ) = log p(x(1)|θ) + ... + log p(x(N)|θ)
Case 2: X is continuous with pdf f(x|θ): ℓ(θ) = log f(x(1)|θ) + ... + log f(x(N)|θ)

In both cases (discrete/continuous), the likelihood tells us how likely one sample is relative to another.
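As a concrete sketch (a made-up example, not from the slides), here is the discrete case for Bernoulli samples in Python, with `theta` playing the role of θ:

```python
import math

def likelihood(samples, theta):
    # L(theta) = p(x(1)|theta) * ... * p(x(N)|theta) for i.i.d. Bernoulli samples
    L = 1.0
    for x in samples:
        L *= theta if x == 1 else (1.0 - theta)
    return L

def log_likelihood(samples, theta):
    # l(theta) = log p(x(1)|theta) + ... + log p(x(N)|theta)
    # (summing logs avoids the underflow the raw product suffers for large N)
    return sum(math.log(theta if x == 1 else 1.0 - theta) for x in samples)

data = [1, 0, 1, 1]
print(likelihood(data, 0.75))       # theta^3 * (1 - theta)
print(log_likelihood(data, 0.75))
```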
Now suppose we have N samples D = {(x(1), y(1)), ..., (x(N), y(N))} from a pair of random variables X, Y.

The conditional likelihood:
Case 1: Y is discrete with pmf p(y|x, θ): L(θ) = p(y(1)|x(1), θ) ... p(y(N)|x(N), θ)
Case 2: Y is continuous with pdf f(y|x, θ): L(θ) = f(y(1)|x(1), θ) ... f(y(N)|x(N), θ)

The joint likelihood:
Case 1: X and Y are discrete with pmf p(x, y|θ): L(θ) = p(x(1), y(1)|θ) ... p(x(N), y(N)|θ)
Case 2: X and Y are continuous with pdf f(x, y|θ): L(θ) = f(x(1), y(1)|θ) ... f(x(N), y(N)|θ)
For a pair of random variables X, Y, the likelihood can also mix the discrete and continuous cases:
Case 1: X and Y are discrete with pmf p(x, y|θ): L(θ) = p(x(1), y(1)|θ) ... p(x(N), y(N)|θ)
Case 2: X and Y are continuous with pdf f(x, y|θ): L(θ) = f(x(1), y(1)|θ) ... f(x(N), y(N)|θ)
Case 3: Y is discrete with pmf p(y|φ) and X is continuous with pdf f(x|y, θ): L(φ, θ) = f(x(1)|y(1), θ) p(y(1)|φ) ... f(x(N)|y(N), θ) p(y(N)|φ)
Case 4: Y is continuous with pdf f(y|φ) and X is discrete with pmf p(x|y, θ): L(φ, θ) = p(x(1)|y(1), θ) f(y(1)|φ) ... p(x(N)|y(N), θ) f(y(N)|φ)
Mixed discrete/continuous!
Maximum Likelihood Estimate (MLE)
[Figure: plot of L(θ) with the MLE marked at its maximum, and a surface plot of L(θ1, θ2) with the MLE at the joint maximum.]
1. Assume the data was generated i.i.d. from some model (i.e. write the generative story): x(i) ~ p(x|θ)
2. Write the log-likelihood: ℓ(θ) = log p(x(1)|θ) + ... + log p(x(N)|θ)
3. Compute the partial derivatives (i.e. the gradient): ∂ℓ(θ)/∂θ1 = ..., ∂ℓ(θ)/∂θ2 = ..., ..., ∂ℓ(θ)/∂θM = ...
4. Set the derivatives to zero and solve for θ: ∂ℓ(θ)/∂θm = 0 for all m ∈ {1, ..., M}; θ_MLE is the solution to this system of M equations in M variables.
5. Compute the second derivative and check that ℓ(θ) is concave down at θ_MLE.
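To make the five steps concrete, here is a sketch (a made-up example, not from the slides) that applies the recipe to a univariate Gaussian, where steps 3-4 yield the familiar closed forms: the mean is the sample mean and the variance is the (biased) sample variance:

```python
import math

def gaussian_mle(xs):
    # Step 4: setting dl/dmu = 0 and dl/dvar = 0 gives closed-form solutions
    n = len(xs)
    mu = sum(xs) / n                           # sample mean
    var = sum((x - mu) ** 2 for x in xs) / n   # biased sample variance
    return mu, var

def gaussian_log_likelihood(xs, mu, var):
    # Step 2: l(mu, var) = sum over i of log N(x(i) | mu, var)
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
               for x in xs)

xs = [1.0, 2.0, 4.0, 5.0]
mu, var = gaussian_mle(xs)   # mu = 3.0, var = 2.5
# Step 5 sanity check: the log-likelihood at the MLE beats nearby parameters.
assert gaussian_log_likelihood(xs, mu, var) >= gaussian_log_likelihood(xs, mu + 0.1, var)
assert gaussian_log_likelihood(xs, mu, var) >= gaussian_log_likelihood(xs, mu, var + 0.1)
```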
Question: Assume we have N samples x(1), x(2), ..., x(N) drawn from a Bernoulli(φ). What is the log-likelihood of the data, ℓ(φ)? Assume N1 = # of (x(i) = 1) and N0 = # of (x(i) = 0).

Answer:
A. ℓ(φ) = N1 log(φ) + N0 (1 - log(φ))
B. ℓ(φ) = N1 log(φ) + N0 log(1 - φ)
C. ℓ(φ) = log(φ) N1 + (1 - log(φ)) N0
D. ℓ(φ) = log(φ) N1 + log(1 - φ) N0
E. ℓ(φ) = N0 log(φ) + N1 (1 - log(φ))
F. ℓ(φ) = N0 log(φ) + N1 log(1 - φ)
G. ℓ(φ) = log(φ) N0 + (1 - log(φ)) N1
H. ℓ(φ) = log(φ) N0 + log(1 - φ) N1
I. ℓ(φ) = the most likely answer
Question: Assume we have N samples x(1), x(2), ..., x(N) drawn from a Bernoulli(φ). What is the derivative of the log-likelihood, ∂ℓ(φ)/∂φ? Assume N1 = # of (x(i) = 1) and N0 = # of (x(i) = 0).

Answer:
A. ∂ℓ(φ)/∂φ = N1 φ + (1 - φ) N0
B. ∂ℓ(φ)/∂φ = φ/N1 + (1 - φ)/N0
C. ∂ℓ(φ)/∂φ = N1/φ - N0/(1 - φ)
D. ∂ℓ(φ)/∂φ = log(φ)/N1 + log(1 - φ)/N0
E. ∂ℓ(φ)/∂φ = N1/log(φ) + N0/log(1 - φ)
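Setting the derivative N1/φ - N0/(1 - φ) to zero gives the Bernoulli MLE φ = N1/N. A small numerical sketch (made-up data, not from the slides) confirms the derivative vanishes there:

```python
def bernoulli_mle(samples):
    # Step 4: N1/phi - N0/(1 - phi) = 0  =>  phi = N1 / N
    return sum(samples) / len(samples)

def dldphi(samples, phi):
    # Derivative of l(phi) = N1 log(phi) + N0 log(1 - phi)
    n1 = sum(samples)
    n0 = len(samples) - n1
    return n1 / phi - n0 / (1 - phi)

data = [1, 1, 1, 0, 0]          # N1 = 3, N0 = 2
phi = bernoulli_mle(data)       # 3/5 = 0.6
assert abs(dldphi(data, phi)) < 1e-9   # the derivative vanishes at the MLE
```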
Maximum Likelihood Estimate (MLE): choose the parameters that maximize the likelihood of the data,
θ_MLE = argmax_θ L(θ)
Maximum a posteriori (MAP) estimate: choose the parameters that maximize the likelihood weighted by a prior p(θ),
θ_MAP = argmax_θ L(θ) p(θ)
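For a concrete instance (made-up numbers, not from the slides): with a Beta(α, β) prior on the Bernoulli parameter φ, the MAP estimate has the closed form (N1 + α - 1)/(N + α + β - 2), which reduces to the MLE under the uniform Beta(1, 1) prior:

```python
def bernoulli_map(samples, alpha, beta):
    # MAP of Bernoulli(phi) under a Beta(alpha, beta) prior:
    # argmax L(phi) p(phi) = (N1 + alpha - 1) / (N + alpha + beta - 2)
    n1 = sum(samples)
    n = len(samples)
    return (n1 + alpha - 1) / (n + alpha + beta - 2)

data = [1, 1, 1, 0, 0]
print(bernoulli_map(data, 1, 1))   # uniform Beta(1, 1) prior recovers the MLE, 0.6
print(bernoulli_map(data, 3, 3))   # a stronger prior pulls the estimate toward 0.5
```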
MLE/MAP: You should be able to...
1. Recall probability basics, including but not limited to: discrete and continuous random variables, probability mass functions, probability density functions, events vs. random variables, expectation and variance, joint probability distributions, marginal probabilities, conditional probabilities, independence, conditional independence
2. Describe common probability distributions such as the Beta, Dirichlet, Multinomial, Categorical, Gaussian, Exponential, etc.
3. State the principle of maximum likelihood estimation and explain what it tries to accomplish
4. State the principle of maximum a posteriori estimation and explain why we use it
5. Derive the MLE or MAP parameters of a simple model in closed form
Economist vs. Onion articles
Document → bag of words → binary feature vector
Generating synthetic "labeled documents"
Definition of the model
Naive Bayes assumption
Counting # of parameters with/without the NB assumption
Data likelihood
MLE for Naive Bayes
MAP for Naive Bayes
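A minimal sketch of the document-to-vector step in the outline above (the vocabulary and document here are made up for illustration):

```python
def binarize(document, vocab):
    # x_m = 1 iff vocabulary word m appears anywhere in the document
    words = set(document.lower().split())
    return [1 if w in words else 0 for w in vocab]

vocab = ["economy", "market", "onion", "satire"]
x = binarize("The economy and the market rallied", vocab)
print(x)   # [1, 1, 0, 0]
```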
We can pretend the natural process generating these vectors is stochastic...
Flip a weighted coin (for y). If HEADS, flip each red coin; if TAILS, flip each blue coin. Each red coin corresponds to a feature xm.
[Figure: table of sampled binary vectors y, x1, x2, x3, ..., xM produced by the coin flips.]
We can generate data in this fashion. Though in practice we never would, since our data is given. Instead, this provides an explanation of how the data was generated (albeit a terrible one).
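The coin-flipping story above can be sketched in Python; the probabilities below are made-up values, with `pi` as P(Y = HEADS) and `theta[y][m]` as the bias of the m-th red (y = 1) or blue (y = 0) coin:

```python
import random

def generate(pi, theta, M):
    # Flip the weighted coin for y; then flip one coin per feature x_m,
    # using the red coins (theta[1]) on HEADS or the blue coins (theta[0]) on TAILS.
    y = 1 if random.random() < pi else 0
    x = [1 if random.random() < theta[y][m] else 0 for m in range(M)]
    return y, x

random.seed(0)
pi = 0.5                                # P(Y = HEADS), made up
theta = {1: [0.9, 0.8, 0.1],            # red coins:  P(X_m = 1 | Y = 1)
         0: [0.2, 0.1, 0.7]}            # blue coins: P(X_m = 1 | Y = 0)
dataset = [generate(pi, theta, M=3) for _ in range(5)]
```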