

SLIDE 1

MLE/MAP + Naïve Bayes

10-601 Introduction to Machine Learning

Matt Gormley
Lecture 17
Mar. 20, 2020

Machine Learning Department
School of Computer Science
Carnegie Mellon University

SLIDE 2

Reminders

Homework 5: Neural Networks
Out: Fri, Feb 28
Due: Sun, Mar 22 at 11:59pm

Homework 6: Learning Theory / Generative Models
Out: Fri, Mar 20
Due: Fri, Mar 27 at 11:59pm

TIP: Do the readings!

Today's In-Class Poll: http://poll.mlcourse.org

Matt's new after-class office hours (on Zoom)

SLIDE 3

MLE AND MAP

SLIDE 4

Likelihood Function

Suppose we have N samples D = {x(1), x(2), …, x(N)} from a random variable X.

The likelihood function:
Case 1: X is discrete with pmf p(x|θ):
L(θ) = p(x(1)|θ) p(x(2)|θ) … p(x(N)|θ)
Case 2: X is continuous with pdf f(x|θ):
L(θ) = f(x(1)|θ) f(x(2)|θ) … f(x(N)|θ)

The log-likelihood function:
Case 1: X is discrete with pmf p(x|θ):
ℓ(θ) = log p(x(1)|θ) + … + log p(x(N)|θ)
Case 2: X is continuous with pdf f(x|θ):
ℓ(θ) = log f(x(1)|θ) + … + log f(x(N)|θ)

In both cases (discrete/continuous), the likelihood tells us how likely one sample is relative to another.

(One R.V.)
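As a concrete companion to these definitions (not from the original deck), here is a minimal Python sketch that evaluates the log-likelihood in both cases, using a Bernoulli pmf for the discrete case and an exponential pdf for the continuous case; all data and parameter values are made up:

```python
import numpy as np

# Case 1 (discrete): Bernoulli pmf p(x | phi) = phi^x * (1 - phi)^(1 - x)
def bernoulli_log_likelihood(samples, phi):
    x = np.asarray(samples)
    return np.sum(x * np.log(phi) + (1 - x) * np.log(1 - phi))

# Case 2 (continuous): exponential pdf f(x | lam) = lam * exp(-lam * x)
def exponential_log_likelihood(samples, lam):
    x = np.asarray(samples)
    return np.sum(np.log(lam) - lam * x)

print(bernoulli_log_likelihood([1, 0, 1, 1, 0], phi=0.6))
print(exponential_log_likelihood([0.3, 1.2, 0.7], lam=1.5))
```

Parameter values that match the data better yield a larger (less negative) log-likelihood, which is exactly what MLE exploits in the slides that follow.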

SLIDE 5

Likelihood Function

• Suppose we have N samples D = {(x(1), y(1)), …, (x(N), y(N))} from a pair of random variables X, Y.

• The conditional likelihood function:
Case 1: Y is discrete with pmf p(y|x, θ):
L(θ) = p(y(1)|x(1), θ) … p(y(N)|x(N), θ)
Case 2: Y is continuous with pdf f(y|x, θ):
L(θ) = f(y(1)|x(1), θ) … f(y(N)|x(N), θ)

• The joint likelihood function:
Case 1: X and Y are discrete with pmf p(x, y|θ):
L(θ) = p(x(1), y(1)|θ) … p(x(N), y(N)|θ)
Case 2: X and Y are continuous with pdf f(x, y|θ):
L(θ) = f(x(1), y(1)|θ) … f(x(N), y(N)|θ)

(Two R.V.s)

SLIDE 6

Likelihood Function

• Suppose we have N samples D = {(x(1), y(1)), …, (x(N), y(N))} from a pair of random variables X, Y.

• The joint likelihood function:
Case 1: X and Y are discrete with pmf p(x, y|θ):
L(θ) = p(x(1), y(1)|θ) … p(x(N), y(N)|θ)
Case 2: X and Y are continuous with pdf f(x, y|θ):
L(θ) = f(x(1), y(1)|θ) … f(x(N), y(N)|θ)
Case 3: Y is discrete with pmf p(y|φ) and X is continuous with pdf f(x|y, θ):
L(φ, θ) = f(x(1)|y(1), θ) p(y(1)|φ) … f(x(N)|y(N), θ) p(y(N)|φ)
Case 4: Y is continuous with pdf f(y|φ) and X is discrete with pmf p(x|y, θ):
L(φ, θ) = p(x(1)|y(1), θ) f(y(1)|φ) … p(x(N)|y(N), θ) f(y(N)|φ)

(Two R.V.s; mixed discrete/continuous!)
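Case 3 is exactly the factorization Naïve Bayes uses later in this lecture. As an illustration (not from the deck), here is a minimal Python sketch of the mixed-case log-likelihood with Y ~ Bernoulli(φ) and X | Y = y Gaussian; all data and parameter values are made up:

```python
import numpy as np

def mixed_log_likelihood(x, y, phi, mu, sigma):
    """Case 3: Y ~ Bernoulli(phi) (discrete), X | Y = y ~ Normal(mu[y], sigma[y]) (continuous).
    Returns log L(phi, theta) = sum_i [ log f(x_i | y_i, theta) + log p(y_i | phi) ]."""
    x, y = np.asarray(x), np.asarray(y)
    log_py = y * np.log(phi) + (1 - y) * np.log(1 - phi)                 # log p(y | phi)
    m, s = mu[y], sigma[y]                                               # class-conditional params
    log_fx = -0.5 * np.log(2 * np.pi * s**2) - (x - m)**2 / (2 * s**2)   # log Gaussian pdf
    return np.sum(log_fx + log_py)

# Made-up samples: two classes with different means
print(mixed_log_likelihood(x=[0.9, 2.1, -0.2], y=[1, 1, 0],
                           phi=0.5, mu=np.array([0.0, 2.0]), sigma=np.array([1.0, 1.0])))
```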

SLIDE 7

MLE

Principle of Maximum Likelihood Estimation: Choose the parameters that maximize the likelihood of the data.

Maximum Likelihood Estimate (MLE)

[Figure: a 1-D likelihood curve L(θ) with θ_MLE marked, and a contour plot of L(θ1, θ2) with its MLE marked.]

SLIDE 8

MLE

What does maximizing likelihood accomplish?
There is only a finite amount of probability mass (i.e. the sum-to-one constraint).
MLE tries to allocate as much probability mass as possible to the things we have observed…
…at the expense of the things we have not observed.

SLIDE 9

Recipe for Closed-form MLE

1. Assume data was generated i.i.d. from some model (i.e. write the generative story):
   x(i) ~ p(x|θ)
2. Write the log-likelihood:
   ℓ(θ) = log p(x(1)|θ) + … + log p(x(N)|θ)
3. Compute partial derivatives (i.e. the gradient):
   ∂ℓ(θ)/∂θ1 = …
   ∂ℓ(θ)/∂θ2 = …
   …
   ∂ℓ(θ)/∂θM = …
4. Set the derivatives to zero and solve for θ:
   ∂ℓ(θ)/∂θm = 0 for all m ∈ {1, …, M}
   θ_MLE = solution to the system of M equations in M variables
5. Compute the second derivative and check that ℓ(θ) is concave down at θ_MLE.

SLIDE 10

MLE

Example: MLE of Exponential Distribution

Goal: derive the closed-form MLE of the rate parameter λ.
Steps: follow the recipe above. (Worked on the whiteboard; see the sketch after Slide 12.)

SLIDE 11

MLE

Example: MLE of Exponential Distribution (continued)

SLIDE 12

MLE

Example: MLE of Exponential Distribution (continued)
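The worked example lives on the whiteboard, so these slides are blank in the deck. A standard sketch of the derivation, assuming the parameterization f(x|λ) = λe^(−λx):

```latex
\begin{align*}
\ell(\lambda) &= \sum_{i=1}^{N} \log\left(\lambda e^{-\lambda x^{(i)}}\right)
              = N \log \lambda - \lambda \sum_{i=1}^{N} x^{(i)} \\
\frac{d\ell}{d\lambda} &= \frac{N}{\lambda} - \sum_{i=1}^{N} x^{(i)} = 0
\quad\Longrightarrow\quad
\lambda_{\text{MLE}} = \frac{N}{\sum_{i=1}^{N} x^{(i)}} = \frac{1}{\bar{x}}
\end{align*}
```

The second derivative, −N/λ², is negative everywhere, so ℓ(λ) is concave and λ_MLE is indeed a maximum (step 5 of the recipe).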

SLIDE 13

MLE

In-Class Exercise: Show that the MLE of parameter φ for N samples drawn from Bernoulli(φ) is φ_MLE = N1/N, where N1 = # of (x(i) = 1).

Steps to answer:
1. Write the log-likelihood of the sample
2. Compute the derivative w.r.t. φ
3. Set the derivative to zero and solve for φ

SLIDE 14

MLE

Question: Assume we have N samples x(1), x(2), …, x(N) drawn from a Bernoulli(φ). What is the log-likelihood of the data, ℓ(φ)?
Assume N1 = # of (x(i) = 1) and N0 = # of (x(i) = 0).

Answer:
A. ℓ(φ) = N1 log(φ) + N0 (1 − log(φ))
B. ℓ(φ) = N1 log(φ) + N0 log(1 − φ)
C. ℓ(φ) = log(φ)^N1 + (1 − log(φ))^N0
D. ℓ(φ) = log(φ)^N1 + log(1 − φ)^N0
E. ℓ(φ) = N0 log(φ) + N1 (1 − log(φ))
F. ℓ(φ) = N0 log(φ) + N1 log(1 − φ)
G. ℓ(φ) = log(φ)^N0 + (1 − log(φ))^N1
H. ℓ(φ) = log(φ)^N0 + log(1 − φ)^N1
I. ℓ(φ) = the most likely answer

SLIDE 15

MLE

Question: Assume we have N samples x(1), x(2), …, x(N) drawn from a Bernoulli(φ). What is the derivative of the log-likelihood, ∂ℓ(φ)/∂φ?
Assume N1 = # of (x(i) = 1) and N0 = # of (x(i) = 0).

Answer:
A. ∂ℓ(φ)/∂φ = N1 φ + (1 − φ) N0
B. ∂ℓ(φ)/∂φ = φ/N1 + (1 − φ)/N0
C. ∂ℓ(φ)/∂φ = N1/φ + N0/(1 − φ)
D. ∂ℓ(φ)/∂φ = log(φ)/N1 + log(1 − φ)/N0
E. ∂ℓ(φ)/∂φ = N1/log(φ) + N0/log(1 − φ)
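For reference, differentiating ℓ(φ) = N1 log(φ) + N0 log(1 − φ) gives ∂ℓ(φ)/∂φ = N1/φ − N0/(1 − φ). A minimal Python sketch (with made-up samples) checks the analytic derivative against a central finite difference:

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1])                   # made-up Bernoulli samples
N1, N0 = x.sum(), len(x) - x.sum()

def loglik(phi):                                   # l(phi) = N1 log(phi) + N0 log(1 - phi)
    return N1 * np.log(phi) + N0 * np.log(1 - phi)

def dloglik(phi):                                  # analytic derivative
    return N1 / phi - N0 / (1 - phi)

phi, eps = 0.4, 1e-6
finite_diff = (loglik(phi + eps) - loglik(phi - eps)) / (2 * eps)
print(dloglik(phi), finite_diff)                   # the two values should agree closely
```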

SLIDE 16

Learning from Data (Frequentist)

Whiteboard:
Example: MLE of Bernoulli
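The whiteboard derivation is not reproduced in the deck; a standard sketch, with N1 = # of (x(i) = 1) and N0 = N − N1:

```latex
\begin{align*}
\ell(\phi) &= N_1 \log \phi + N_0 \log(1 - \phi) \\
\frac{d\ell}{d\phi} &= \frac{N_1}{\phi} - \frac{N_0}{1 - \phi} = 0
\quad\Longrightarrow\quad
N_1 (1 - \phi) = N_0 \phi
\quad\Longrightarrow\quad
\phi_{\text{MLE}} = \frac{N_1}{N_1 + N_0} = \frac{N_1}{N}
\end{align*}
```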

SLIDE 17

MLE vs. MAP

Principle of Maximum a posteriori (MAP) Estimation: Choose the parameters that maximize the posterior of the parameters given the data.

Principle of Maximum Likelihood Estimation: Choose the parameters that maximize the likelihood of the data.

Maximum Likelihood Estimate (MLE)
Maximum a posteriori (MAP) estimate

SLIDE 18

MLE vs. MAP

Principle of Maximum a posteriori (MAP) Estimation: Choose the parameters that maximize the posterior of the parameters given the data.

Principle of Maximum Likelihood Estimation: Choose the parameters that maximize the likelihood of the data.

Maximum Likelihood Estimate (MLE)
Maximum a posteriori (MAP) estimate
[Equations shown as images; the prior term in the MAP objective is highlighted.]

SLIDE 19

MLE vs. MAP

Principle of Maximum a posteriori (MAP) Estimation: Choose the parameters that maximize the posterior of the parameters given the data.

Principle of Maximum Likelihood Estimation: Choose the parameters that maximize the likelihood of the data.

Maximum Likelihood Estimate (MLE)
Maximum a posteriori (MAP) estimate
[Equations shown as images; the prior term in the MAP objective is highlighted.]

Important! Usually the parameters are continuous, so the prior is a probability density function.
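The objective functions on these three slides are equation images that did not survive extraction; in standard notation, the two estimates being contrasted are:

```latex
\begin{align*}
\theta_{\text{MLE}} &= \operatorname*{argmax}_{\theta} \; p(D \mid \theta) \\
\theta_{\text{MAP}} &= \operatorname*{argmax}_{\theta} \; p(\theta \mid D)
 = \operatorname*{argmax}_{\theta} \; p(D \mid \theta)\,\underbrace{p(\theta)}_{\text{prior}}
\end{align*}
```

(The second equality uses Bayes' rule and drops the constant p(D), which does not affect the argmax.)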

SLIDE 20

Learning from Data (Bayesian)

Whiteboard:
maximum a posteriori (MAP) estimation
Example: MAP of Bernoulli—Beta
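The whiteboard example is not reproduced in the deck; a standard sketch, assuming a Beta(α, β) prior on φ:

```latex
\begin{align*}
p(\phi) &\propto \phi^{\alpha - 1} (1 - \phi)^{\beta - 1}
&&\text{(Beta prior)} \\
p(\phi \mid D) &\propto \phi^{N_1 + \alpha - 1} (1 - \phi)^{N_0 + \beta - 1}
&&\text{(posterior is again a Beta)} \\
\phi_{\text{MAP}} &= \frac{N_1 + \alpha - 1}{N + \alpha + \beta - 2}
&&\text{vs.}\quad \phi_{\text{MLE}} = \frac{N_1}{N}
\end{align*}
```

The prior acts like α − 1 imaginary heads and β − 1 imaginary tails added to the observed counts; with α = β = 1 (a uniform prior), the MAP estimate reduces to the MLE.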

SLIDE 21

Recipe for Closed-form MLE

(Same recipe as Slide 9, shown again for reference.)

SLIDE 22

Learning from Data (Bayesian)

Whiteboard:
maximum a posteriori (MAP) estimation
Example: MAP of Bernoulli—Beta (continued)

SLIDE 23

Takeaways

• One view of what ML is trying to accomplish is function approximation.
• The principle of maximum likelihood estimation provides an alternate view of learning.
• Synthetic data can help debug ML algorithms.
• Probability distributions can be used to model real data that occurs in the world. (Don't worry, we'll make our distributions more interesting soon!)

SLIDE 24

Learning Objectives

MLE/MAP: You should be able to…
1. Recall probability basics, including but not limited to: discrete and continuous random variables, probability mass functions, probability density functions, events vs. random variables, expectation and variance, joint probability distributions, marginal probabilities, conditional probabilities, independence, conditional independence
2. Describe common probability distributions such as the Beta, Dirichlet, Multinomial, Categorical, Gaussian, Exponential, etc.
3. State the principle of maximum likelihood estimation and explain what it tries to accomplish
4. State the principle of maximum a posteriori estimation and explain why we use it
5. Derive the MLE or MAP parameters of a simple model in closed form

SLIDE 25

NAÏVE BAYES

SLIDE 26

Naïve Bayes Outline

• Real-World Dataset
  Economist vs. Onion articles
  Document → bag of words → binary feature vector
• Naive Bayes: Model
  Generating synthetic "labeled documents"
  Definition of model
  Naive Bayes assumption
  Counting # of parameters with/without NB assumption
• Naïve Bayes: Learning from Data
  Data likelihood
  MLE for Naive Bayes
  MAP for Naive Bayes
• Visualizing Gaussian Naive Bayes

SLIDE 27

Naïve Bayes

Why are we talking about Naïve Bayes?
• It's just another decision function that fits into our "big picture" recipe from last time.
• But it's our first example of a Bayesian Network, and it provides a clearer picture of probabilistic learning.
• Just like the other Bayes Nets we'll see, it admits a closed-form solution for MLE and MAP, so learning is extremely efficient (just counting).

SLIDE 28

Fake News Detector

[Figure: example articles from CNN and The Onion.]

Today's Goal: To define a generative model of emails of two different classes (e.g. real vs. fake news)
SLIDE 29

Fake News Detector

[Figure: the CNN and The Onion articles represented as feature vectors.]

We can pretend the natural process generating these vectors is stochastic…

SLIDE 30

Naive Bayes: Model

Whiteboard:
Document → bag of words → binary feature vector
Generating synthetic "labeled documents"
Definition of model
Naive Bayes assumption
Counting # of parameters with/without NB assumption

SLIDE 31

Model 1: Bernoulli Naïve Bayes

Generative story: flip a weighted coin to choose y. If HEADS, flip each red coin; if TAILS, flip each blue coin. Each red coin corresponds to an xm.

[Table residue: sampled binary values in columns y, x1, x2, x3, …, xM.]

We can generate data in this fashion. Though in practice we never would, since our data is given. Instead, this provides an explanation of how the data was generated (albeit a terrible one).
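A minimal Python sketch of this generative story (not from the deck; the class prior phi, the coin weights theta, and M = 3 features are made-up values):

```python
import numpy as np

rng = np.random.default_rng(0)

phi = 0.6                                    # the weighted coin: P(y = 1)
theta = np.array([[0.1, 0.7, 0.4],           # "blue" coins: P(x_m = 1 | y = 0)
                  [0.8, 0.2, 0.5]])          # "red"  coins: P(x_m = 1 | y = 1)

def generate(n):
    """Sample n (x, y) pairs from the Bernoulli Naive Bayes generative story."""
    y = (rng.random(n) < phi).astype(int)                         # flip the weighted coin
    x = (rng.random((n, theta.shape[1])) < theta[y]).astype(int)  # flip each coin for class y
    return x, y

X, y = generate(5)
print(np.column_stack([y, X]))               # columns: y, x1, x2, x3
```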

SLIDE 32

What's wrong with the Naïve Bayes Assumption?

The features might not be independent!!

Example 1: If a document contains the word "Donald", it's extremely likely to contain the word "Trump". These are not independent!

Example 2: If the petal width is very high, the petal length is also likely to be very high.

SLIDE 33

Naïve Bayes: Learning from Data

Whiteboard:
Data likelihood
MLE for Naive Bayes
Example: MLE for Naïve Bayes with Two Features
MAP for Naive Bayes
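The whiteboard derivation ends in closed-form count ratios; here is a minimal sketch of the resulting MLE for Bernoulli Naïve Bayes (function and variable names are mine, not the lecture's):

```python
import numpy as np

def nb_mle(X, y):
    """MLE for Bernoulli Naive Bayes is just counting:
    phi      = fraction of examples with y = 1
    theta[c] = per-feature fraction of x_m = 1 among examples with y = c
    """
    X, y = np.asarray(X), np.asarray(y)
    phi = y.mean()
    theta = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])
    return phi, theta

# Tiny made-up dataset: rows are documents, columns are word indicators
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 0, 0]
print(nb_mle(X, y))
```

The MAP variant with a Beta prior on each parameter simply adds pseudocounts to the numerators and denominators of these ratios, which also avoids zero estimates for words never seen in a class.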

SLIDE 34

Recipe for Closed-form MLE

(Same recipe as Slide 9, shown again in the Naïve Bayes context.)