MLE/MAP + Naïve Bayes




  1. 10-601 Introduction to Machine Learning, Machine Learning Department, School of Computer Science, Carnegie Mellon University. MLE/MAP + Naïve Bayes. Matt Gormley, Lecture 17, Mar. 20, 2020.

  2. Reminders. Homework 5: Neural Networks (Out: Fri, Feb 28; Due: Sun, Mar 22 at 11:59pm). Homework 6: Learning Theory / Generative Models (Out: Fri, Mar 20; Due: Fri, Mar 27 at 11:59pm). TIP: Do the readings! Today's In-Class Poll: http://poll.mlcourse.org. Matt's new after-class office hours (on Zoom).

  3. MLE AND MAP

  4. One R.V.: Likelihood Function. Suppose we have N samples D = {x^(1), x^(2), …, x^(N)} from a random variable X.
     The likelihood function:
     Case 1: X is discrete with pmf p(x|θ): L(θ) = p(x^(1)|θ) p(x^(2)|θ) … p(x^(N)|θ)
     Case 2: X is continuous with pdf f(x|θ): L(θ) = f(x^(1)|θ) f(x^(2)|θ) … f(x^(N)|θ)
     The log-likelihood function:
     Case 1: X is discrete with pmf p(x|θ): ℓ(θ) = log p(x^(1)|θ) + … + log p(x^(N)|θ)
     Case 2: X is continuous with pdf f(x|θ): ℓ(θ) = log f(x^(1)|θ) + … + log f(x^(N)|θ)
     In both cases (discrete/continuous), the likelihood tells us how likely one sample is relative to another.
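
A minimal sketch (not part of the slides) of how L(θ) and ℓ(θ) could be evaluated in Python; the Bernoulli pmf and exponential pdf below are illustrative stand-ins for p(x|θ) and f(x|θ), not models fixed by the slide:

```python
import numpy as np

def bernoulli_pmf(x, theta):
    # p(x | theta) for x in {0, 1}
    return theta ** x * (1.0 - theta) ** (1 - x)

def exponential_pdf(x, theta):
    # f(x | theta) = theta * exp(-theta * x) for x >= 0
    return theta * np.exp(-theta * x)

def likelihood(samples, density, theta):
    # L(theta) = p(x^(1)|theta) * ... * p(x^(N)|theta)
    return np.prod([density(x, theta) for x in samples])

def log_likelihood(samples, density, theta):
    # l(theta) = log p(x^(1)|theta) + ... + log p(x^(N)|theta)
    return np.sum([np.log(density(x, theta)) for x in samples])

# Case 1: discrete samples scored with a Bernoulli pmf
x_discrete = [1, 0, 1, 1, 0]
print(likelihood(x_discrete, bernoulli_pmf, theta=0.6))       # 0.6^3 * 0.4^2
print(log_likelihood(x_discrete, bernoulli_pmf, theta=0.6))

# Case 2: continuous samples scored with an exponential pdf
x_continuous = [0.3, 1.2, 0.7, 2.5]
print(log_likelihood(x_continuous, exponential_pdf, theta=1.0))
```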

  5. Two R.V.s: Likelihood Function. Suppose we have N samples D = {(x^(1), y^(1)), …, (x^(N), y^(N))} from a pair of random variables X, Y.
     The conditional likelihood function:
     Case 1: Y is discrete with pmf p(y | x, θ): L(θ) = p(y^(1) | x^(1), θ) … p(y^(N) | x^(N), θ)
     Case 2: Y is continuous with pdf f(y | x, θ): L(θ) = f(y^(1) | x^(1), θ) … f(y^(N) | x^(N), θ)
     The joint likelihood function:
     Case 1: X and Y are discrete with pmf p(x, y | θ): L(θ) = p(x^(1), y^(1) | θ) … p(x^(N), y^(N) | θ)
     Case 2: X and Y are continuous with pdf f(x, y | θ): L(θ) = f(x^(1), y^(1) | θ) … f(x^(N), y^(N) | θ)
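
A small sketch of the conditional likelihood, assuming an illustrative model in which y | x is Bernoulli with success probability sigmoid(θ·x); the logistic form is an assumption made for the example, not something the slide commits to:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conditional_log_likelihood(xs, ys, theta):
    # l(theta) = sum_i log p(y^(i) | x^(i), theta), where y | x is Bernoulli
    # with success probability sigmoid(theta * x)  (assumed example model).
    total = 0.0
    for x, y in zip(xs, ys):
        p_one = sigmoid(theta * x)        # p(y = 1 | x, theta)
        total += np.log(p_one if y == 1 else 1.0 - p_one)
    return total

xs = [0.5, -1.2, 2.0, 0.1]
ys = [1, 0, 1, 0]
print(conditional_log_likelihood(xs, ys, theta=0.8))
```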

  6. Two R.V.s: Likelihood Function (continued). Suppose we have N samples D = {(x^(1), y^(1)), …, (x^(N), y^(N))} from a pair of random variables X, Y.
     The joint likelihood function:
     Case 1: X and Y are discrete with pmf p(x, y | θ): L(θ) = p(x^(1), y^(1) | θ) … p(x^(N), y^(N) | θ)
     Case 2: X and Y are continuous with pdf f(x, y | θ): L(θ) = f(x^(1), y^(1) | θ) … f(x^(N), y^(N) | θ)
     Mixed discrete/continuous:
     Case 3: Y is discrete with pmf p(y | φ) and X is continuous with pdf f(x | y, θ): L(θ, φ) = f(x^(1) | y^(1), θ) p(y^(1) | φ) … f(x^(N) | y^(N), θ) p(y^(N) | φ)
     Case 4: Y is continuous with pdf f(y | φ) and X is discrete with pmf p(x | y, θ): L(θ, φ) = p(x^(1) | y^(1), θ) f(y^(1) | φ) … p(x^(N) | y^(N), θ) f(y^(N) | φ)
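
Case 3 is the shape used by generative classifiers with continuous features: a pmf over the discrete label times a pdf over the feature given the label. A sketch, assuming y ~ Bernoulli(φ) and x | y ~ N(μ_y, σ²) purely for illustration:

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    # f(x | mean, var) for a univariate Gaussian
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def mixed_log_likelihood(xs, ys, phi, means, var):
    # Case 3: l(theta, phi) = sum_i [ log f(x^(i) | y^(i), theta) + log p(y^(i) | phi) ]
    # Here y ~ Bernoulli(phi) and x | y ~ N(means[y], var)  (assumed example model).
    total = 0.0
    for x, y in zip(xs, ys):
        p_y = phi if y == 1 else 1.0 - phi
        total += np.log(gaussian_pdf(x, means[y], var)) + np.log(p_y)
    return total

xs = [0.2, 1.9, 2.3, -0.4]
ys = [0, 1, 1, 0]
print(mixed_log_likelihood(xs, ys, phi=0.5, means={0: 0.0, 1: 2.0}, var=1.0))
```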

  7. MLE. Principle of Maximum Likelihood Estimation: choose the parameters that maximize the likelihood of the data. The Maximum Likelihood Estimate is θ_MLE = argmax_θ L(θ). [Figure: a one-parameter likelihood curve L(θ) with its maximizer θ_MLE marked, and a two-parameter likelihood surface L(θ_1, θ_2) with its maximizer marked.]
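
To make the principle concrete, a brute-force sketch that evaluates the log-likelihood on a grid of candidate parameters and picks the maximizer; the Bernoulli sample is hypothetical, and closed-form recipes come next in the lecture:

```python
import numpy as np

# Hypothetical Bernoulli samples: N1 = 3 ones, N0 = 2 zeros.
x = np.array([1, 0, 1, 1, 0])

# Evaluate the log-likelihood on a grid of candidate parameters.
thetas = np.linspace(0.01, 0.99, 99)
log_lik = np.array([np.sum(x * np.log(t) + (1 - x) * np.log(1 - t)) for t in thetas])

# The MLE is the candidate with the largest (log-)likelihood.
theta_mle = thetas[np.argmax(log_lik)]
print(theta_mle)   # close to N1 / N = 0.6
```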

  8. MLE. What does maximizing likelihood accomplish? There is only a finite amount of probability mass (i.e., the sum-to-one constraint). MLE tries to allocate as much probability mass as possible to the things we have observed… at the expense of the things we have not observed.
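
One way to see that trade-off: the MLE of a categorical distribution is the vector of empirical frequencies, so any outcome that never appears in the data gets probability exactly zero. A sketch with hypothetical die-roll data:

```python
from collections import Counter

# Hypothetical rolls of a 3-sided die; face 3 is never observed.
rolls = [1, 2, 1, 1, 2]
counts = Counter(rolls)
N = len(rolls)

# MLE of a categorical distribution = empirical frequency of each outcome.
theta_mle = {face: counts.get(face, 0) / N for face in (1, 2, 3)}
print(theta_mle)   # {1: 0.6, 2: 0.4, 3: 0.0} -- the unseen face gets zero mass
```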

  9. Recipe for Closed-Form MLE.
     1. Assume the data was generated i.i.d. from some model (i.e., write the generative story): x^(i) ~ p(x|θ).
     2. Write the log-likelihood: ℓ(θ) = log p(x^(1)|θ) + … + log p(x^(N)|θ).
     3. Compute the partial derivatives (i.e., the gradient): ∂ℓ(θ)/∂θ_1 = …, ∂ℓ(θ)/∂θ_2 = …, …, ∂ℓ(θ)/∂θ_M = ….
     4. Set the derivatives to zero and solve for θ: ∂ℓ(θ)/∂θ_m = 0 for all m ∈ {1, …, M}; θ_MLE is the solution to this system of M equations in M variables.
     5. Compute the second derivative and check that ℓ(θ) is concave down at θ_MLE.
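
A numerical sketch of steps 2-5 of this recipe, assuming an exponential model f(x|λ) = λ e^(-λx); the samples and the symbol λ are illustrative choices, not taken from the slides:

```python
import numpy as np

x = np.array([0.3, 1.2, 0.7, 2.5, 0.9])   # hypothetical i.i.d. samples
N = len(x)

# Step 2: log-likelihood under the assumed exponential pdf f(x|lam) = lam * exp(-lam * x).
def log_lik(lam):
    return N * np.log(lam) - lam * x.sum()

# Steps 3-4: d l / d lam = N / lam - sum(x) = 0 gives the closed-form solution.
lam_mle = N / x.sum()

# Step 5: the second derivative -N / lam^2 is negative, so l is concave down at lam_mle.
print(lam_mle, -N / lam_mle ** 2 < 0)

# Sanity check: the closed-form estimate beats nearby parameter values.
print(log_lik(lam_mle) >= max(log_lik(0.9 * lam_mle), log_lik(1.1 * lam_mle)))
```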

  10. MLE. Example: MLE of the Exponential Distribution. Goal: derive the closed-form MLE of the exponential parameter. Steps: follow the recipe above (the worked equations were not captured in this transcript; a reconstruction follows after slide 12).

  11. MLE. Example: MLE of the Exponential Distribution (derivation continued).

  12. MLE. Example: MLE of the Exponential Distribution (derivation concluded).
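
The equations for this three-slide example did not survive extraction. As a reconstruction, the standard derivation under the rate parameterization f(x|λ) = λ e^(-λx) (the slides may use a different symbol for the parameter):

```latex
% Reconstruction of the standard exponential-MLE derivation.
\begin{align*}
f(x \mid \lambda) &= \lambda e^{-\lambda x}, \qquad x \ge 0 \\
\ell(\lambda) &= \sum_{i=1}^{N} \log f(x^{(i)} \mid \lambda)
              = N \log \lambda - \lambda \sum_{i=1}^{N} x^{(i)} \\
\frac{d\ell}{d\lambda} &= \frac{N}{\lambda} - \sum_{i=1}^{N} x^{(i)} = 0
  \;\Longrightarrow\;
  \lambda_{\mathrm{MLE}} = \frac{N}{\sum_{i=1}^{N} x^{(i)}} = \frac{1}{\bar{x}} \\
\frac{d^2\ell}{d\lambda^2} &= -\frac{N}{\lambda^2} < 0
  \quad \text{(concave down, so } \lambda_{\mathrm{MLE}} \text{ is a maximum)}
\end{align*}
```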

  13. MLE. In-Class Exercise. Show that the MLE of parameter θ for N samples drawn from Bernoulli(θ) is θ_MLE = N_1 / N, where N_1 is the number of samples equal to 1. Steps to answer: 1. Write the log-likelihood of the sample. 2. Compute the derivative with respect to θ. 3. Set the derivative to zero and solve for θ.

  14. MLE. Question: Assume we have N samples x^(1), x^(2), …, x^(N) drawn from a Bernoulli(θ). What is the log-likelihood of the data, ℓ(θ)? Assume N_1 = # of (x^(i) = 1) and N_0 = # of (x^(i) = 0). Answer choices:
      A. ℓ(θ) = N_1 log(θ) + N_0 (1 − log(θ))
      B. ℓ(θ) = N_1 log(θ) + N_0 log(1 − θ)
      C. ℓ(θ) = log(θ)^N_1 + (1 − log(θ))^N_0
      D. ℓ(θ) = log(θ)^N_1 + log(1 − θ)^N_0
      E. ℓ(θ) = N_0 log(θ) + N_1 (1 − log(θ))
      F. ℓ(θ) = N_0 log(θ) + N_1 log(1 − θ)
      G. ℓ(θ) = log(θ)^N_0 + (1 − log(θ))^N_1
      H. ℓ(θ) = log(θ)^N_0 + log(1 − θ)^N_1
      I. ℓ(θ) = the most likely answer
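
For reference, a worked sketch of the derivation the in-class exercise asks for; it confirms that the log-likelihood takes the N_1 log(θ) + N_0 log(1 − θ) form listed among the options above:

```latex
% Worked sketch of the Bernoulli MLE (theta is the Bernoulli parameter).
\begin{align*}
\ell(\theta) &= \sum_{i=1}^{N} \log p(x^{(i)} \mid \theta)
             = N_1 \log \theta + N_0 \log(1 - \theta) \\
\frac{d\ell}{d\theta} &= \frac{N_1}{\theta} - \frac{N_0}{1 - \theta} = 0
  \;\Longrightarrow\;
  N_1 (1 - \theta) = N_0 \,\theta
  \;\Longrightarrow\;
  \theta_{\mathrm{MLE}} = \frac{N_1}{N_1 + N_0} = \frac{N_1}{N}
\end{align*}
```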
