SLIDE 1

Theory of Generalized Linear Models

◮ If Y has a Poisson distribution with parameter µ then

  P(Y = y) = \frac{\mu^y e^{-\mu}}{y!}

for y a non-negative integer.

◮ We can use the method of maximum likelihood to estimate µ if we have a sample Y_1, \ldots, Y_n of independent Poisson random variables all with mean µ.

◮ If we observe Y_1 = y_1, Y_2 = y_2 and so on then the likelihood function is

  P(Y_1 = y_1, \ldots, Y_n = y_n) = \prod_{i=1}^n \frac{\mu^{y_i} e^{-\mu}}{y_i!} = \frac{\mu^{\sum y_i} e^{-n\mu}}{\prod_i y_i!}

◮ This function of µ can be maximized by maximizing its logarithm, the log likelihood function.
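The point that the likelihood and its logarithm peak at the same µ can be checked numerically; the sketch below uses a made-up sample and a grid search (NumPy assumed):

```python
import numpy as np
from math import lgamma

y = np.array([2, 0, 3, 1, 2, 4, 1])   # hypothetical Poisson sample
n = len(y)

def loglik(mu):
    # log of the likelihood: sum_i [y_i log(mu) - mu - log(y_i!)]
    return y.sum() * np.log(mu) - n * mu - sum(lgamma(v + 1) for v in y)

# Because exp is strictly increasing, both functions peak at the same mu.
grid = np.linspace(0.1, 6.0, 1000)
lik_argmax = grid[np.argmax([np.exp(loglik(m)) for m in grid])]
loglik_argmax = grid[np.argmax([loglik(m) for m in grid])]
```

On this sample both grid searches land on the same grid point, close to the sample mean (which Slide 2 derives as the exact maximizer).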

Richard Lockhart, STAT 350: Estimating Equations

SLIDE 2

◮ Set the derivative of the log likelihood with respect to µ equal to 0.

◮ Get the likelihood equation:

  \frac{d}{d\mu}\left[ \sum y_i \log\mu - n\mu - \sum \log(y_i!) \right] = \sum y_i/\mu - n = 0.

◮ The solution \hat\mu = \bar{y} is the maximum likelihood estimate of µ.
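As a quick numerical check of this solution, one can evaluate the score \sum y_i/\mu - n at \bar{y} (made-up sample, NumPy assumed):

```python
import numpy as np

y = np.array([2, 0, 3, 1, 2, 4, 1])   # hypothetical Poisson sample
n = len(y)

def score(mu):
    # derivative of the log likelihood: sum(y_i)/mu - n
    return y.sum() / mu - n

mu_hat = y.mean()   # claimed root of the likelihood equation
```

The score vanishes (up to floating-point rounding) at the sample mean, confirming \hat\mu = \bar{y}.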


SLIDE 3

◮ In a regression problem all the Y_i will have different means µ_i.

◮ Our log-likelihood is now

  \sum y_i \log\mu_i - \sum \mu_i - \sum \log(y_i!)

◮ If we treat all n of the µ_i as unknown parameters we can maximize the log likelihood by setting each of the n partial derivatives with respect to µ_k, for k from 1 to n, equal to 0.

◮ The kth of these n equations is just

  y_k/\mu_k - 1 = 0.

◮ This leads to \hat\mu_k = y_k.

◮ In glm jargon this model is the saturated model.
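A tiny illustration of the saturated model with made-up counts (NumPy assumed); note that when y_k = 0 the kth log-likelihood term is just -\mu_k, so the maximum sits on the boundary at \mu_k = 0 rather than at a root of y_k/\mu_k - 1 = 0:

```python
import numpy as np

y = np.array([4, 1, 0, 3])   # made-up counts, one unknown mean mu_k per observation
mu_hat = y.astype(float)     # saturated-model estimates: mu_k-hat = y_k

# For y_k > 0 each likelihood equation y_k/mu_k - 1 = 0 is solved exactly.
resid = y[y > 0] / mu_hat[y > 0] - 1
```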


SLIDE 4

◮ A more useful model is one in which there are fewer than n parameters but more than 1.

◮ A typical glm model is

  \mu_i = \exp(x_i^T \beta)

where the x_i are covariate values for the ith observation.

◮ Often we include an intercept term just as in standard linear regression.

◮ In this case the log-likelihood is

  \sum y_i x_i^T \beta - \sum \exp(x_i^T \beta) - \sum \log(y_i!)

which should be treated as a function of β and maximized.

◮ The derivative of this log-likelihood with respect to β_k is

  \sum y_i x_{ik} - \sum \exp(x_i^T \beta) x_{ik} = \sum (y_i - \mu_i) x_{ik}

◮ If β has p components then setting these p derivatives equal to 0 gives the likelihood equations.
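A sketch checking the derivative formula against a finite difference, using a simulated design (the data and trial value below are made up for illustration; NumPy assumed):

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])   # intercept + one covariate
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3])))         # simulated Poisson counts

def loglik(beta):
    eta = X @ beta
    return float(y @ eta - np.exp(eta).sum() - sum(lgamma(v + 1) for v in y))

def score(beta):
    # d loglik / d beta_k = sum_i (y_i - mu_i) x_{ik}, stacked over k
    return X.T @ (y - np.exp(X @ beta))

beta = np.array([0.2, -0.1])   # arbitrary trial value
eps = 1e-6
fd = np.array([(loglik(beta + eps * e) - loglik(beta - eps * e)) / (2 * eps)
               for e in np.eye(2)])
```

The analytic score and the central finite difference of the log-likelihood agree to several decimal places.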


SLIDE 5

◮ It is no longer possible to solve the likelihood equations analytically.

◮ We have, instead, to settle for numerical techniques.

◮ One common technique is called iteratively re-weighted least squares.

◮ For a Poisson variable with mean µ_i the variance is \sigma_i^2 = \mu_i.

◮ Ignore for a moment the fact that if we knew σ_i we would know µ_i, and consider fitting our model by least squares with the \sigma_i^2 known.

◮ We would minimize (see our discussion of weighted least squares)

  \sum \frac{(Y_i - \mu_i)^2}{\sigma_i^2}

by taking the derivative with respect to β_k; (again ignoring the fact that \sigma_i^2 depends on β_k) we would get

  -2 \sum \frac{(Y_i - \mu_i)\, \partial\mu_i/\partial\beta_k}{\sigma_i^2} = 0


SLIDE 6

◮ But the derivative of µ_i with respect to β_k is \mu_i x_{ik},

◮ and replacing \sigma_i^2 by µ_i we get the equation

  \sum (Y_i - \mu_i) x_{ik} = 0

exactly as before.

◮ This motivates the following estimation scheme.

  1. Begin with a guess for the SDs σ_i (taking all to be 1 is easy).
  2. Do (non-linear) weighted least squares using the guessed weights. Get estimated regression parameters \hat\beta.
  3. Use these to compute estimated variances \hat\sigma_i^2. Go back and do weighted least squares with these weights.
  4. Iterate (repeat over and over) until the estimates stop changing.

◮ NOTE: if the estimation converges, then the final estimate is a fixed point of the algorithm, which solves the equation

  \sum (Y_i - \mu_i) x_{ik} = 0

derived above.
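The scheme above can be sketched as follows. This is a toy implementation on simulated data, with a plain Gauss-Newton inner solver for the weighted non-linear least squares step (the design, data, and solver details are all made up for illustration; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # made-up design: intercept + 1 covariate
y = rng.poisson(np.exp(X @ np.array([0.3, 0.6])))       # simulated Poisson counts

def wls_fit(y, X, w, beta):
    """Gauss-Newton for the weighted non-linear least squares problem
    minimize sum_i w_i (y_i - exp(x_i^T beta))^2, with the weights w held fixed."""
    def obj(b):
        r = y - np.exp(X @ b)
        return np.sum(w * r * r)
    for _ in range(100):
        mu = np.exp(X @ beta)
        J = mu[:, None] * X                              # d mu_i / d beta
        step = np.linalg.solve(J.T @ (w[:, None] * J), J.T @ (w * (y - mu)))
        while obj(beta + step) > obj(beta) and np.max(np.abs(step)) > 1e-12:
            step = step / 2                              # step halving for stability
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

beta = np.zeros(2)
sigma2 = np.ones(n)                  # step 1: guess all SDs equal to 1
for _ in range(30):                  # steps 2-4: WLS, update variances, repeat
    beta = wls_fit(y, X, 1.0 / sigma2, beta)
    sigma2 = np.exp(X @ beta)        # Poisson: sigma_i^2 = mu_i

# At the fixed point the likelihood equations sum_i (y_i - mu_i) x_ik = 0 hold.
score = X.T @ (y - np.exp(X @ beta))
```

At the fixed point the weights 1/\sigma_i^2 equal 1/\mu_i, so the weighted least squares stationarity condition collapses to the likelihood equations, as the NOTE above says.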


SLIDE 7

Estimating equations: an introduction via glim

Get estimates \hat\theta by solving h(X, \theta) = 0 for θ.

  1. The normal equations in linear regression:

     X^T Y - X^T X \beta = 0

  2. Likelihood equations; if \ell(\theta) is the log-likelihood:

     \partial\ell/\partial\theta = 0.

  3. Non-linear least squares:

     \sum (Y_i - \mu_i) \frac{\partial\mu_i}{\partial\theta} = 0

  4. The iteratively reweighted least squares estimating equation:

     \sum \frac{Y_i - \mu_i}{\sigma_i^2} \frac{\partial\mu_i}{\partial\theta} = 0;

     for a generalized linear model \sigma_i^2 is a known function of µ_i.
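For the first of these, a minimal check (made-up data, NumPy assumed) that solving the normal equations reproduces an off-the-shelf least squares fit:

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(30), rng.normal(size=30)])   # made-up design
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=30)        # simulated responses

# Estimating-equation form: solve h(X, beta) = X^T Y - X^T X beta = 0.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Same answer as a standard least-squares routine.
beta_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]
```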


SLIDE 8

Poisson regression revisited

◮ The likelihood function for a Poisson regression model is:

  L(\beta) = \prod_i \frac{\mu_i^{y_i}}{y_i!} \exp\left(-\sum_i \mu_i\right)

◮ The log-likelihood is

  \sum y_i \log\mu_i - \sum \mu_i - \sum \log(y_i!)

◮ A typical glm model is

  \mu_i = \exp(x_i^T \beta)

where x_i is the covariate vector for observation i (often including an intercept term as in standard linear regression).

◮ In this case the log-likelihood is

  \sum y_i x_i^T \beta - \sum \exp(x_i^T \beta) - \sum \log(y_i!)

which should be treated as a function of β and maximized.


SLIDE 9

◮ The derivative of this log-likelihood with respect to β_k is

  \sum y_i x_{ik} - \sum \exp(x_i^T \beta) x_{ik} = \sum (y_i - \mu_i) x_{ik}

◮ If β has p components then setting these p derivatives equal to 0 gives the likelihood equations.

◮ For a Poisson model the variance is given by

  \sigma_i^2 = \mu_i = \exp(x_i^T \beta)

◮ so the likelihood equations can be written as

  \sum \frac{(y_i - \mu_i) x_{ik}\, \mu_i}{\mu_i} = \sum \frac{y_i - \mu_i}{\sigma_i^2} \frac{\partial\mu_i}{\partial\beta_k} = 0

which is the fourth equation above.


SLIDE 10

IRWLS

◮ The equations are solved iteratively, as in non-linear regression, but the iteration now involves weighted least squares.

◮ The resulting scheme is called iteratively reweighted least squares.

  1. Begin with a guess for the SDs σ_i (taking all equal to 1 is simple).
  2. Do (non-linear) weighted least squares using the guessed weights. Get estimated regression parameters \hat\beta^{(0)}.
  3. Use these to compute estimated variances \hat\sigma_i^2. Re-do the weighted least squares with these weights; get \hat\beta^{(1)}.
  4. Iterate (repeat over and over) until the estimates are not really changing.


SLIDE 11

Fixed Points of Algorithms

◮ Suppose the \hat\beta^{(k)} converge as k → ∞ to something, say \hat\beta.

◮ Recalling

  \sum \frac{y_i - \mu_i(\hat\beta^{(k+1)})}{\sigma_i^2(\hat\beta^{(k)})} \frac{\partial\mu_i(\hat\beta^{(k+1)})}{\partial\beta} = 0

◮ we learn that \hat\beta must be a root of the equation

  \sum \frac{y_i - \mu_i(\hat\beta)}{\sigma_i^2(\hat\beta)} \frac{\partial\mu_i(\hat\beta)}{\partial\beta} = 0

which is the last of our example estimating equations.


SLIDE 12

Distribution of Estimators

◮ Distribution Theory: compute the distributions of statistics, estimators and pivots.

◮ Examples: the multivariate normal distribution; theorems about the chi-squared distribution of quadratic forms; theorems that F statistics have F distributions when the null hypothesis is true; theorems showing a t pivot has a t distribution.

◮ Exact Distribution Theory: exact results, as in the previous example, when the errors are assumed to have exactly normal distributions.

◮ Asymptotic or Large Sample Distribution Theory: the same sort of conclusions, but only approximately true and assuming n is large. Theorems of the form:

  \lim_{n\to\infty} P(T_n \le t) = F(t)

◮ For generalized linear models we do asymptotic distribution theory.


SLIDE 13

Uses of Asymptotic Theory: principles

◮ An estimate is normally only useful if it is equipped with a measure of uncertainty such as a standard error.

◮ A standard error is a useful measure of uncertainty provided the error of estimation \hat\theta - \theta has approximately a normal distribution and the standard error is the standard deviation of this normal distribution.

◮ For many estimating equations h(Y, \theta) = 0 the root \hat\theta is unique and has the desired approximate normal distribution, provided the sample size n is large.


SLIDE 14

Sketch of reasoning in special case

◮ Poisson example: p = 1.

◮ Assume Y_i has a Poisson distribution with mean \mu_i = e^{x_i \beta}, where now β is a scalar.

◮ The estimating equation (the likelihood equation) is

  U(\beta) = h(Y_1, \ldots, Y_n, \beta) = \sum (Y_i - e^{x_i \beta}) x_i = 0

◮ It is now important to distinguish between a value of β which we are trying out in the estimating equation and the true value of β, which I will call β_0.

◮ If we happen to try out the true value of β in U then we find

  E_{\beta_0}(U(\beta_0)) = \sum x_i E_{\beta_0}(Y_i - \mu_i) = 0


SLIDE 15

◮ On the other hand, if we try out a value of β other than the correct one we find

  E_{\beta_0}(U(\beta)) = \sum x_i \left( e^{x_i \beta_0} - e^{x_i \beta} \right) \neq 0.

◮ But U(β) is a sum of independent random variables, so by the law of large numbers (law of averages) it must be close to its expected value.

◮ This means: if we stick in a value of β far from the right value we will not get 0, while if we stick in a value of β close to the right answer we will get something close to 0.

◮ This can sometimes be turned into the assertion: the glm estimate of β is consistent, that is, it converges to the correct answer as the sample size goes to ∞.
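The law-of-large-numbers argument can be seen numerically: with simulated data, U(β)/n is tiny at the true β_0 and bounded away from 0 at a wrong β (all values below are made up; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
beta0 = 0.4                         # "true" scalar beta for the simulation
n = 50000
x = rng.uniform(-1, 1, size=n)      # made-up scalar covariates
Y = rng.poisson(np.exp(x * beta0))  # simulated Poisson responses

def U_bar(beta):
    # U(beta)/n: average of (Y_i - e^{x_i beta}) x_i
    return np.mean((Y - np.exp(x * beta)) * x)

at_true = U_bar(beta0)   # close to its expected value, 0
at_wrong = U_bar(1.5)    # close to a non-zero expected value
```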


SLIDE 16

◮ The next theoretical step is another linearization.

◮ If \hat\beta is the root of the equation, that is, U(\hat\beta) = 0, then

  0 = U(\hat\beta) \approx U(\beta_0) + (\hat\beta - \beta_0) U'(\beta_0)

◮ This is a Taylor expansion.

◮ In our case the derivative U' is

  U'(\beta) = -\sum x_i^2 e^{x_i \beta}

so that approximately

  \hat\beta - \beta_0 \approx \frac{\sum (Y_i - \mu_i) x_i}{\sum x_i^2 e^{x_i \beta_0}}

◮ The right hand side of this formula has expected value 0 and variance

  \frac{\sum x_i^2 \operatorname{Var}(Y_i)}{\left( \sum x_i^2 e^{x_i \beta_0} \right)^2}

which, since \operatorname{Var}(Y_i) = e^{x_i \beta_0}, simplifies to

  \frac{1}{\sum x_i^2 e^{x_i \beta_0}}


SLIDE 17

◮ This means that an approximate standard error of \hat\beta is

  \frac{1}{\sqrt{\sum x_i^2 e^{x_i \beta_0}}}

and that an estimated approximate standard error is

  \frac{1}{\sqrt{\sum x_i^2 e^{x_i \hat\beta}}}

◮ Finally, since the formula shows that \hat\beta - \beta_0 is approximately a sum of independent terms, the central limit theorem suggests that \hat\beta has an approximate normal distribution and that

  \sqrt{\sum x_i^2 e^{x_i \hat\beta}} \, (\hat\beta - \beta_0)

is an approximate pivot with approximately a N(0, 1) distribution.

◮ You should be able to turn this assertion into a 95% (approximate) confidence interval for β_0.
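One possible version of that confidence interval, on simulated data, solving the likelihood equation by Newton's method and using the estimated standard error (the data and starting value are made up; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
beta0 = 0.4                         # "true" beta used to simulate
n = 500
x = rng.uniform(-1, 1, size=n)      # made-up scalar covariates
Y = rng.poisson(np.exp(x * beta0))  # simulated Poisson responses

# Newton's method for U(beta) = sum (Y_i - e^{x_i beta}) x_i = 0,
# using U'(beta) = -sum x_i^2 e^{x_i beta}.
beta = 0.0
for _ in range(50):
    mu = np.exp(x * beta)
    beta -= np.sum((Y - mu) * x) / (-np.sum(x * x * mu))

se = 1.0 / np.sqrt(np.sum(x * x * np.exp(x * beta)))   # estimated approximate SE
ci = (beta - 1.96 * se, beta + 1.96 * se)              # approximate 95% CI for beta0
```

The 1.96 multiplier comes from the N(0, 1) approximation to the pivot above.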


SLIDE 18

Scope of these ideas

The ideas in the above calculation can be used in many contexts.

◮ We can get approximate standard errors in non-linear regression.

◮ We can get approximate standard errors in any model where we do maximum likelihood.

◮ We can show that the assumption of normal errors does not have too big an impact on the t and F tests in multiple regression.

◮ We can get approximate standard errors in generalized linear models.

◮ We can demonstrate that the role of the Error Sum of Squares in multiple regression can be replaced, approximately, by a function called the Deviance, whose derivative (with respect to the parameters) is the estimating equation.
