


About this class

The next two lectures are really coming from a statistics perspective, but we're going to discover how useful it is for the problems we are interested in!

Chapter 7 of Casella and Berger is a good reference for this material (most of this lecture is based on that chapter).

Statistics thinks largely about samples, particularly random samples.

• Random variables (X_i): functions from the sample space to R
• Realized values of random variables: x_i
• Random sample of size n from population f(x): X_1, ..., X_n are independent and identically distributed (iid) random variables with pdf or pmf f(x)
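To make the sampling setup concrete, here is a minimal sketch (my illustration, not from the slides) of drawing an iid random sample with numpy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random sample of size n from the population f(x) = N(0, 1):
# X_1, ..., X_n are iid; the array below holds realized values x_1, ..., x_n.
n = 10
x = rng.normal(loc=0.0, scale=1.0, size=n)
print(x)
```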


Point Estimators

Let's say we have a stream of values all coming from the same population (not changing with time): x_1, ..., x_n.

Suppose the population is described by a pdf f(x|θ). We want to estimate θ.

An estimator is a function of the sample X_1, ..., X_n. An estimate is a number, which is a function of the realized values x_1, ..., x_n.

Think of an estimator as an algorithm that produces estimates when given its inputs.

Can you think of a good estimator for the population mean?
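One natural answer is the sample mean; the normal example later in the lecture confirms it is the MLE for a normal population mean. A minimal sketch (mine, not from the slides):

```python
import numpy as np

def mean_estimator(x):
    """An estimator is a function of the sample; applied to realized
    values x_1, ..., x_n it produces an estimate (a single number)."""
    return sum(x) / len(x)

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=1000)  # population mean is 5
print(mean_estimator(x))  # estimate: should be close to 5
```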



Maximum Likelihood

Method for deriving estimators. Let x denote a realized random sample.

Likelihood function:

L(θ|x) = L(θ|x_1, ..., x_n) = ∏_{i=1}^n f(x_i|θ)

If X is discrete, L(θ|x) = P_θ(X = x).

Intuitively, if L(θ_1|x) > L(θ_2|x), then θ_1 is in some ways a more plausible value for θ than is θ_2.

Can be generalized to multiple parameters θ_1, ..., θ_k.
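As an illustration (mine, not from the slides), here is L(θ|x) for an iid N(θ, 1) sample, evaluated at two candidate values of θ; the comparison mirrors the plausibility statement above:

```python
import numpy as np
from scipy.stats import norm

def likelihood(theta, x):
    """L(theta|x) = prod_i f(x_i|theta) for iid N(theta, 1) data."""
    return np.prod(norm.pdf(x, loc=theta, scale=1.0))

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=1.0, size=20)
print(likelihood(3.0, x), likelihood(0.0, x))  # L(3|x) should dwarf L(0|x)
```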


Maximum Likelihood

For a sample x = (x_1, ..., x_n), let θ̂(x) be the parameter value at which L(θ|x) attains its maximum (as a function of θ, with x held fixed). Then θ̂(x) is the maximum likelihood estimate of θ based on the realized sample x.

θ̂(X) is the maximum likelihood estimator based on the sample X.

Note that the MLE has the same range as the parameter, by definition.

Potential problems:

• How to find and verify the maximum of the function?
• Numerical sensitivity
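One standard way to approach both problems is to maximize the log likelihood numerically; here is a sketch (my choice of an exponential model, not from the slides), checked against the known closed form λ̂ = 1/x̄:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_likelihood(lam, x):
    # -log L(lambda|x) for iid Exponential(rate lambda):
    # log L = n log(lambda) - lambda * sum(x)
    return -(len(x) * np.log(lam) - lam * np.sum(x))

rng = np.random.default_rng(3)
x = rng.exponential(scale=1 / 2.5, size=200)  # true rate is 2.5
res = minimize_scalar(lambda lam: neg_log_likelihood(lam, x),
                      bounds=(1e-6, 50.0), method="bounded")
print(res.x, 1 / x.mean())  # numeric MLE vs the closed form 1/x-bar
```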


slide-3
SLIDE 3

Differentiable Likelihood Functions

Possible candidates are the values of θ_1, ..., θ_k that solve

∂/∂θ_i L(θ|x) = 0,   i = 1, ..., k

Must check whether any such value of θ is in fact a global maximum (it could be a minimum, an inflection point, or a local maximum, and the boundary needs to be checked).


Normal MLE

Suppose X_1, ..., X_n are iid N(θ, 1).

L(θ|x) = ∏_{i=1}^n (1/√(2π)) e^{−(x_i − θ)²/2}

Standard trick: work with the log likelihood.

log L(θ|x) = n log(1/√(2π)) − (1/2) ∑_{i=1}^n (x_i − θ)²

Take the derivative, etc.:

(d/dθ) log L(θ|x) = ∑_{i=1}^n (x_i − θ)



(d/dθ) log L(θ|x) = 0  ⇒  ∑_{i=1}^n (x_i − θ) = 0

The only zero of this is θ̂ = x̄.

To show that this is, in fact, the maximum likelihood estimate:

1. Show it is a maximum: (d²/dθ²) log L(θ|x) = −n < 0
2. It is the unique interior extremum, and a maximum – therefore a global maximum.
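A quick numerical check (my sketch, not part of the slides) that the N(θ, 1) log likelihood really peaks at the sample mean:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
x = rng.normal(loc=1.5, scale=1.0, size=100)

thetas = np.linspace(0.0, 3.0, 3001)
loglik = np.array([norm.logpdf(x, loc=t, scale=1.0).sum() for t in thetas])
print(thetas[loglik.argmax()], x.mean())  # grid maximizer vs x-bar
```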

Bernoulli MLE

Let X_1, ..., X_n be iid Bernoulli(p).

L(p|x) = ∏_{i=1}^n p^{x_i}(1 − p)^{1−x_i} = p^y(1 − p)^{n−y},  where y = ∑_i x_i

log L(p|x) = y log p + (n − y) log(1 − p)

If 0 < y < n:

(d/dp) log L(p|x) = y/p − (n − y)/(1 − p)

(d/dp) log L(p|x) = 0  ⇒  (1 − p)/p = (n − y)/y



Then p̂ = y/n.

Verify the maximum, and consider separately the cases where y = 0 (log likelihood is n log(1 − p)) and y = n (log likelihood is n log p).
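A minimal check (mine, not from the slides) that p̂ = y/n maximizes the Bernoulli log likelihood, with the boundary cases handled as above:

```python
import numpy as np

def bernoulli_loglik(p, x):
    y, n = sum(x), len(x)
    if y == 0:                       # boundary case: log L = n log(1 - p)
        return n * np.log(1 - p)
    if y == n:                       # boundary case: log L = n log(p)
        return n * np.log(p)
    return y * np.log(p) + (n - y) * np.log(1 - p)

x = [1, 0, 1, 1, 0, 1, 0, 1]         # y = 5, n = 8, so p-hat = 0.625
grid = np.linspace(0.001, 0.999, 999)
print(grid[np.argmax([bernoulli_loglik(p, x) for p in grid])])
```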

Binomial MLE, Unknown Number of Trials

Population is binomial(k, p) with known p and unknown k:

L(k|x, p) = ∏_{i=1}^n C(k, x_i) p^{x_i}(1 − p)^{k − x_i}

Maximizing by the differentiation approach is tricky: k is an integer with k ≥ max_i x_i. Instead, look for the k satisfying

L(k|x, p) ≥ L(k − 1|x, p)  and  L(k|x, p) > L(k + 1|x, p)



L(k|x, p) / L(k − 1|x, p) = (k(1 − p))^n / ∏_{i=1}^n (k − x_i)

Conditions for a maximum are:

(k(1 − p))^n ≥ ∏_{i=1}^n (k − x_i)   and   ((k + 1)(1 − p))^n < ∏_{i=1}^n (k + 1 − x_i)

Solution: solve the equation

(1 − p)^n = ∏_{i=1}^n (1 − x_i z)

for 0 ≤ z ≤ 1/max_i x_i (substituting z = 1/k). Call the solution ẑ. Then k̂ is the largest integer less than or equal to 1/ẑ.
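Here is a sketch of that recipe (my implementation, with made-up data; the function name and inputs are mine): it finds ẑ by root-finding on (0, 1/max_i x_i], where the two sides of the equation are guaranteed to cross, then takes k̂ = ⌊1/ẑ⌋.

```python
import numpy as np
from scipy.optimize import brentq

def khat(x, p):
    """MLE of k for binomial(k, p) samples x with known p."""
    x = np.asarray(x, dtype=float)
    g = lambda z: (1 - p) ** len(x) - np.prod(1 - x * z)
    # g(0) < 0 and g(1/max x_i) = (1-p)^n > 0, so a root lies between.
    zhat = brentq(g, 1e-12, 1.0 / x.max())
    return int(np.floor(1.0 / zhat))

x = [7, 5, 9, 6, 8]                  # hypothetical counts
print(khat(x, p=0.4))
```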

MLE Instability

Olkin, Petkau and Zidek [JASA 1981] give the following example. Suppose you are estimating the parameters of a binomial(k, p) distribution (both k and p unknown) and have the following data: 16, 18, 22, 25, 27.

It turns out the ML estimate of k is 99. Question – what do you think the ML estimate of p is?

But what if the data were slightly noisy, and the 27 should have been a 28? The ML estimate of k is now 190!

What's going on here? Most likely the likelihood function is very flat in the neighborhood of the maximum.
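One way to see the flatness (my sketch; the slides only state the estimates) is to profile the likelihood over integer k, plugging in the corresponding estimate p̂ = x̄/k for each k:

```python
import numpy as np
from scipy.special import gammaln

def profile_loglik(k, x):
    """Binomial(k, p) log likelihood at the profiled value p = mean(x)/k."""
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    p = xbar / k
    log_coef = gammaln(k + 1) - gammaln(x + 1) - gammaln(k - x + 1)
    return log_coef.sum() + n * xbar * np.log(p) + n * (k - xbar) * np.log(1 - p)

data = [16, 18, 22, 25, 27]
ks = np.arange(max(data), 2000)
print(ks[np.argmax([profile_loglik(k, data) for k in ks])])
# The slides report k-hat = 99 here, and 190 once the 27 becomes a 28;
# the profiled log likelihood is nearly constant over a huge range of k.
```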



Bayesian Estimators

Classical vs. Bayesian approach to statistics:

• Classical: θ is an unknown but fixed parameter
• Bayesian: θ is a quantity described by a distribution

The prior distribution describes one's beliefs about θ before any data is seen. A sample is taken, and the prior is then updated to take the data into account, leading to a posterior distribution.

Let the prior be π(θ) and the sampling distribution be f(x|θ). Then the posterior is given by

π(θ|x) = f(x|θ)π(θ)/m(x)

where m(x) is the marginal distribution of x:

m(x) = ∫ f(x|θ)π(θ) dθ

The posterior distribution can be used to make statements about θ, but it's still a distribution! For example, one could use the mean of this distribution as a point estimate of θ.
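The update π(θ|x) ∝ f(x|θ)π(θ) can be carried out numerically even without a conjugate prior; a minimal grid sketch (my choices of prior and model, not from the slides):

```python
import numpy as np
from scipy.stats import norm

# Posterior for the mean theta of N(theta, 1) data, with a N(0, 2^2) prior.
theta = np.linspace(-5.0, 5.0, 2001)
prior = norm.pdf(theta, loc=0.0, scale=2.0)

rng = np.random.default_rng(5)
x = rng.normal(loc=1.0, scale=1.0, size=30)

lik = np.exp([norm.logpdf(x, loc=t, scale=1.0).sum() for t in theta])
posterior = lik * prior
posterior /= posterior.sum()          # discrete stand-in for dividing by m(x)

print((theta * posterior).sum())      # posterior mean as a point estimate
```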


Binomial Bayes Estimation

Let X_1, ..., X_n be iid Bernoulli(p), and let Y = ∑_i X_i.

Suppose the prior distribution on p is beta(α, β) (really, I should subscript these, but for notational convenience I won't...).

Brief recap on the beta distribution – a family of continuous distributions defined on [0, 1] and governed by two shape parameters. A picture from Wikipedia... [figure not reproduced]

Probability density function:

f(x) = [Γ(α + β) / (Γ(α)Γ(β))] x^{α−1}(1 − x)^{β−1}

Nice fact: the mean is α/(α + β).


f(y|p) = C(n, y) p^y(1 − p)^{n−y}

π(p) = [Γ(α + β) / (Γ(α)Γ(β))] p^{α−1}(1 − p)^{β−1}

f(y) = ∫₀¹ f(y|p) π(p) dp
     = ∫₀¹ C(n, y) [Γ(α + β) / (Γ(α)Γ(β))] p^{y+α−1}(1 − p)^{n−y+β−1} dp
     = C(n, y) [Γ(α + β) / (Γ(α)Γ(β))] [Γ(y + α)Γ(n − y + β) / Γ(n + α + β)]

Then the posterior distribution is given by

π(p|y) = f(y|p)π(p)/f(y) = [Γ(n + α + β) / (Γ(y + α)Γ(n − y + β))] p^{y+α−1}(1 − p)^{n−y+β−1}

which is beta(y + α, n − y + β)!

The Bayes estimate combines prior information with the data. If we want to use a single number, we could use the mean of the posterior distribution, given by

(y + α)/(n + α + β)
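The conjugate update is one line of arithmetic; a minimal sketch (the prior parameters and data are mine, for illustration):

```python
def beta_binomial_update(alpha, beta, x):
    """Posterior is beta(y + alpha, n - y + beta) after Bernoulli data x."""
    y, n = sum(x), len(x)
    return alpha + y, beta + (n - y)

x = [1, 0, 1, 1, 1, 0, 1, 0, 1, 1]          # y = 7, n = 10
a_post, b_post = beta_binomial_update(2.0, 2.0, x)
print(a_post / (a_post + b_post))            # posterior mean (y+a)/(n+a+b) = 9/14
```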


Normal MLE when µ and σ Are Both Unknown

log L(θ, σ²|x) = −(n/2) log(2π) − (n/2) log σ² − (1/2) ∑_{i=1}^n (x_i − θ)²/σ²

Partial derivatives:

(∂/∂θ) log L(θ, σ²|x) = (1/σ²) ∑_{i=1}^n (x_i − θ)

(∂/∂σ²) log L(θ, σ²|x) = −n/(2σ²) + (1/(2σ⁴)) ∑_{i=1}^n (x_i − θ)²

Setting these to 0 and solving gives us:

θ̂ = x̄,   σ̂² = (1/n) ∑_{i=1}^n (x_i − x̄)²
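A quick check of the closed forms (my sketch, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(loc=4.0, scale=3.0, size=500)

theta_hat = x.mean()                          # theta-hat = x-bar
sigma2_hat = np.mean((x - theta_hat) ** 2)    # the 1/n (not 1/(n-1)) version
print(theta_hat, sigma2_hat)                  # near 4 and 9
# np.var(x, ddof=0) computes the same 1/n estimator directly.
```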
