

SLIDE 1

Chapter 8.3. Maximum Likelihood Estimation

Prof. Tesler
Math 283, Fall 2019

SLIDE 2

Estimating parameters

Let Y be a random variable with a distribution of known type but unknown parameter value θ.

  • Bernoulli or geometric with unknown p.
  • Poisson with unknown mean µ.

Denote the pdf of Y by $P_Y(y; \theta)$ to emphasize that there is a parameter θ. Do n independent trials to get data $y_1, y_2, y_3, \ldots, y_n$. The joint pdf is

$$P_{Y_1,\ldots,Y_n}(y_1,\ldots,y_n;\theta) = P_Y(y_1;\theta) \cdots P_Y(y_n;\theta)$$

Goal: use the data to estimate θ.
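As a concrete illustration (a sketch of mine, not from the slides), here is minimal Python computing the joint pmf of independent trials as a product of per-trial Poisson pmfs; the function names and data are hypothetical.

```python
import numpy as np
from math import factorial

def poisson_pmf(y, mu):
    """P_Y(y; mu) for a Poisson random variable with mean mu."""
    return np.exp(-mu) * mu**y / factorial(y)

def joint_pmf(ys, mu):
    """Joint pmf of independent trials: the product of P_Y(y_i; mu)."""
    result = 1.0
    for y in ys:
        result *= poisson_pmf(y, mu)
    return result

data = [1, 3, 2, 2, 2]            # hypothetical observed counts
print(joint_pmf(data, mu=1.0))    # probability of this data if mu = 1.0
print(joint_pmf(data, mu=2.0))    # ... and if mu = 2.0
```

Viewed as a function of µ with the data held fixed, this same product is the likelihood of the next slide.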


SLIDE 3

Likelihood function

Previously, we knew the parameter θ and regarded the y's as unknowns (occurring with certain probabilities). Define the likelihood of θ given data $y_1, \ldots, y_n$ to be

$$L(\theta; y_1,\ldots,y_n) = P_{Y_1,\ldots,Y_n}(y_1,\ldots,y_n;\theta) = P_Y(y_1;\theta) \cdots P_Y(y_n;\theta)$$

It is exactly the same formula as the joint pdf; the difference is the interpretation. Now the data $y_1, \ldots, y_n$ are given, while θ is unknown.

Definition (Maximum Likelihood Estimate, or MLE)

The value θ = θ̂ that maximizes L is the Maximum Likelihood Estimate. Often it is found using calculus, by locating a critical point:

$$\frac{dL}{d\theta} = 0, \qquad \frac{d^2L}{d\theta^2} < 0$$

However, be sure to check for complications such as discontinuities and boundary values of θ.
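To make the definition concrete, here is a hedged sketch (mine, not from the slides) that finds the MLE numerically for Bernoulli trials with unknown p, by minimizing the negative log likelihood with scipy; for this likelihood the calculus answer is the sample proportion, and the numeric answer matches it.

```python
import numpy as np
from scipy.optimize import minimize_scalar

data = np.array([1, 0, 1, 1, 0, 1, 0, 1])  # hypothetical Bernoulli(p) trials

def neg_log_likelihood(p):
    # ln L(p; y) = sum_i [ y_i ln(p) + (1 - y_i) ln(1 - p) ]
    return -np.sum(data * np.log(p) + (1 - data) * np.log(1 - p))

# Maximize L by minimizing -ln L on the interior of [0, 1].
res = minimize_scalar(neg_log_likelihood, bounds=(1e-9, 1 - 1e-9),
                      method='bounded')
print(res.x)        # numeric MLE, approx. 0.625
print(data.mean())  # calculus answer: sample proportion 5/8 = 0.625
```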


SLIDE 4

MLE for the Poisson distribution

Y has a Poisson distribution with unknown parameter µ > 0. Collect data from independent trials: $Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n$.

Likelihood:

$$L(\mu; y_1,\ldots,y_n) = \prod_{i=1}^{n} \frac{e^{-\mu}\mu^{y_i}}{y_i!} = \frac{e^{-n\mu}\,\mu^{y_1+\cdots+y_n}}{y_1! \cdots y_n!}$$

The log likelihood is maximized at the same µ and is easier to work with:

$$\ln L(\mu; y_1,\ldots,y_n) = -n\mu + (y_1+\cdots+y_n)\ln\mu - \ln(y_1! \cdots y_n!)$$

Critical point: solve $d(\ln L)/d\mu = 0$:

$$\frac{d(\ln L)}{d\mu} = -n + \frac{y_1+\cdots+y_n}{\mu} = 0 \qquad\text{so}\qquad \hat\mu = \frac{y_1+\cdots+y_n}{n}$$
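A quick numeric check of this derivation (my own sketch, not in the slides): on a fine grid of µ values, the log likelihood peaks at approximately the sample mean.

```python
import numpy as np
from scipy.special import gammaln  # ln(y!) = gammaln(y + 1)

y = np.array([1, 2, 1, 2, 1, 4, 2, 3, 2, 1])  # hypothetical Poisson counts

def log_likelihood(mu):
    n = len(y)
    return -n * mu + y.sum() * np.log(mu) - gammaln(y + 1).sum()

mus = np.linspace(0.01, 10, 100001)
ll = log_likelihood(mus)    # vectorized over the whole grid
print(mus[np.argmax(ll)])   # grid argmax, approx. 1.9
print(y.mean())             # closed-form MLE: 1.9
```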


SLIDE 5

MLE for the Poisson distribution

The log likelihood is maximized at the same µ and is easier to work with:

$$\ln L(\mu; y_1,\ldots,y_n) = -n\mu + (y_1+\cdots+y_n)\ln\mu - \ln(y_1! \cdots y_n!)$$

Critical point: solve $d(\ln L)/d\mu = 0$:

$$\frac{d(\ln L)}{d\mu} = -n + \frac{y_1+\cdots+y_n}{\mu} = 0 \qquad\text{so}\qquad \hat\mu = \frac{y_1+\cdots+y_n}{n}$$

Check that the second derivative is negative there:

$$\frac{d^2(\ln L)}{d\mu^2} = -\frac{y_1+\cdots+y_n}{\mu^2}\,, \qquad \left.\frac{d^2(\ln L)}{d\mu^2}\right|_{\mu=\hat\mu} = -\frac{n^2}{y_1+\cdots+y_n} < 0$$

provided $y_1+\cdots+y_n > 0$. So it is a max unless $y_1+\cdots+y_n = 0$.

Boundaries for the range µ ≥ 0: must check µ → 0⁺ and µ → ∞. Both send ln L → −∞, so the µ̂ identified above gives the max.

The Maximum Likelihood Estimate for the Poisson distribution

$$\hat\mu = \frac{y_1+\cdots+y_n}{n} = \frac{0 \cdot (\#\text{ of } 0\text{'s}) + 1 \cdot (\#\text{ of } 1\text{'s}) + 2 \cdot (\#\text{ of } 2\text{'s}) + \cdots}{n}$$
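As a sanity check on the boundary analysis above (a sketch of mine, not from the slides), evaluating ln L near µ = 0 and at a large µ shows both tails falling far below the value at the sample mean.

```python
import numpy as np
from scipy.special import gammaln

y = np.array([1, 2, 1, 2, 1, 4, 2, 3, 2, 1])  # hypothetical counts
n, s = len(y), y.sum()

def log_likelihood(mu):
    return -n * mu + s * np.log(mu) - gammaln(y + 1).sum()

for mu in [1e-8, y.mean(), 1e4]:
    print(mu, log_likelihood(mu))
# ln L -> -inf as mu -> 0+ (from s*ln(mu)) and as mu -> inf (from -n*mu),
# so the interior critical point mu_hat = y.mean() is the global max.
```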


SLIDE 6

MLE for the Poisson distribution

The exceptional case on the previous slide was $y_1+\cdots+y_n = 0$, giving $y_1 = \cdots = y_n = 0$ (since all $y_i \geq 0$). In this case,

$$\ln L(\mu; y_1,\ldots,y_n) = -n\mu + (y_1+\cdots+y_n)\ln\mu - \ln(y_1! \cdots y_n!) = -n\mu + 0\ln\mu - \ln(0! \cdots 0!) = -n\mu$$

On the range µ ≥ 0, this is maximized at µ̂ = 0, which agrees with the main formula:

$$\hat\mu = \frac{y_1+\cdots+y_n}{n} = \frac{0+\cdots+0}{n} = 0$$


SLIDE 7

Repeating the estimation gives different results

Scenario: In a lab class, each student does 10 trials of an experiment and averages them. How do their results compare?

  • A does n trials $y_{A1}, y_{A2}, \ldots, y_{An}$, leading to MLE $\hat\theta_A$,
  • B does n trials $y_{B1}, y_{B2}, \ldots, y_{Bn}$, leading to MLE $\hat\theta_B$, etc.

How do $\hat\theta_A, \hat\theta_B, \ldots$ compare? Treat the n trials in each experiment as random variables $Y_1, \ldots, Y_n$ and the MLE as a random variable Θ̂.

Estimate Poisson parameter with n = 10 trials (secret: µ = 1.23)

Experiment   Observed counts (trials not shown were 0)   Θ̂
A            1, 3, 2, 2, 2                                1.0
B            1, 2, 1, 1, 3, 1                             0.9
C            3, 2, 2, 1, 1, 1, 1, 2, 1, 1                 1.5
D            1, 2, 1, 2, 1, 4, 2, 3, 2, 1                 1.9
E            3, 1, 1, 1, 2, 2                             1.0

Per-trial means ($Y_1$ through $Y_{10}$): 1.2, 1.8, 0.6, 1, 1.4, 1.6, 1, 1.6, 1, 1.4. Mean of the five Θ̂ values: 1.26.
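The table can be reproduced in spirit by simulation; the following sketch (mine, not from the slides; seed arbitrary) runs five experiments of n = 10 Poisson(1.23) trials and reports each student's MLE.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
mu_secret, n_trials = 1.23, 10

mles = []
for label in "ABCDE":
    y = rng.poisson(mu_secret, size=n_trials)
    mles.append(y.mean())       # Poisson MLE = sample mean
    print(label, y, y.mean())

print("mean of the MLEs:", np.mean(mles))  # scatters around 1.23
```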


SLIDE 8

Desirable properties of an estimator Θ̂

  • Θ̂ should be narrowly distributed around the correct value of θ.
  • Increasing n should improve the estimate.
  • The distribution of Θ̂ should be known.

The MLE often does this (though not always!).


SLIDE 9

Bias

Suppose Y is Poisson with secret parameter µ. The Poisson MLE from data is

$$\hat\mu = \frac{Y_1+\cdots+Y_n}{n}$$

If many MLEs are computed from independent data sets, the average tends to

$$E(\hat\mu) = E\!\left(\frac{Y_1+\cdots+Y_n}{n}\right) = \frac{E(Y_1)+\cdots+E(Y_n)}{n} = \frac{\mu+\cdots+\mu}{n} = \frac{n\mu}{n} = \mu$$

Since $E(\hat\mu) = \mu$, we say µ̂ is an unbiased estimator of µ.
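This expectation can be checked empirically; here is a hedged sketch (my own, under the stated Poisson model) averaging the MLE over many independent data sets.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
mu, n, reps = 1.23, 10, 200_000

# One MLE per data set: the mean of n Poisson(mu) observations.
mle_samples = rng.poisson(mu, size=(reps, n)).mean(axis=1)
print(mle_samples.mean())  # approx. mu = 1.23, consistent with E(mu_hat) = mu
```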


SLIDE 10

Bias

If $E(\hat\mu) = \mu$, then µ̂ is an unbiased estimator of µ. But if $E(\hat\mu) \neq \mu$, then µ̂ is a biased estimator of µ.

Contrived example: the estimator $\hat\mu' = 2Y_1$ has $E(\hat\mu') = 2\mu$, so it is biased (unless µ = 0).

We will soon see an example (normal distribution) where the MLE gives a biased estimator.
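A quick simulation of the contrived example (a sketch of mine, not in the slides) shows the average of µ̂′ = 2Y₁ landing near 2µ rather than µ.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
mu, reps = 1.23, 200_000

y1 = rng.poisson(mu, size=reps)  # this estimator only ever uses Y1
print((2 * y1).mean())           # approx. 2 * mu = 2.46: biased
```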


SLIDE 11

Efficiency (want estimates to have small spread)

Increasing n

Continue with the Poisson MLE $\hat\mu = \frac{Y_1+\cdots+Y_n}{n}$ and secret mean µ. The variance is

$$\operatorname{Var}(\hat\mu) = \operatorname{Var}\!\left(\frac{Y_1+\cdots+Y_n}{n}\right) = \frac{\operatorname{Var}(Y_1)+\cdots+\operatorname{Var}(Y_n)}{n^2} = \frac{n\operatorname{Var}(Y_1)}{n^2} = \frac{\operatorname{Var}(Y_1)}{n} = \frac{\mu}{n}$$

Increasing n makes the variance smaller (µ̂ is more efficient).

Another estimator

Set $\hat\mu' = \frac{Y_1 + 2Y_2}{3}$ (and ignore $Y_3, \ldots, Y_n$). Then

$$E(\hat\mu') = \frac{\mu + 2\mu}{3} = \mu \qquad\text{so it is unbiased,}$$

$$\operatorname{Var}(\hat\mu') = \frac{\operatorname{Var}(Y_1) + 4\operatorname{Var}(Y_2)}{9} = \frac{\mu + 4\mu}{9} = \frac{5\mu}{9}$$

so it has higher variance (it is less efficient) than the MLE.
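Both variance formulas can be checked by simulation; here is a sketch of mine comparing the MLE with µ̂′ = (Y₁ + 2Y₂)/3 under a Poisson(µ) model.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
mu, n, reps = 1.23, 10, 200_000

y = rng.poisson(mu, size=(reps, n))
mle = y.mean(axis=1)               # (Y1 + ... + Yn) / n
alt = (y[:, 0] + 2 * y[:, 1]) / 3  # (Y1 + 2*Y2) / 3

print(mle.var(), mu / n)      # both approx. 0.123
print(alt.var(), 5 * mu / 9)  # both approx. 0.683
```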
