
Lecture 7: GLMs: Score equations, Residuals. Author: Nick Reich. Transcribed by Bing Miu and Yukun Li. Course: Categorical Data Analysis (BIOSTATS 743). Made available under the Creative Commons Attribution-ShareAlike 4.0 International License.


  1. Title slide: Lecture 7: GLMs: Score equations, Residuals

  2. Likelihood Equations for GLMs
  ◮ The GLM log-likelihood is
    $L(\beta) = \sum_i \log f(y_i \mid \theta_i, \phi) = \sum_i \left[ \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right] = \sum_i \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + \sum_i c(y_i, \phi)$
  ◮ $\phi$ is a dispersion parameter; it is not indexed by $i$ and is assumed to be fixed.
  ◮ $\theta_i$ contains $\beta$, through the linear predictor $\eta_i$.
  ◮ $c(y_i, \phi)$ comes from the random component.
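As a concrete check of this exponential-family form, the Poisson log-likelihood can be written with $\theta_i = \log \mu_i$, $b(\theta_i) = e^{\theta_i}$, $a(\phi) = 1$, and $c(y_i, \phi) = -\log(y_i!)$. A minimal Python sketch (the data values are made up for illustration) verifies that the two forms agree:

```python
import math

def poisson_loglik_direct(y, mu):
    # Usual Poisson log-likelihood: sum of y*log(mu) - mu - log(y!)
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

def poisson_loglik_expfam(y, mu):
    # Exponential-family form: (y*theta - b(theta)) / a(phi) + c(y, phi)
    # with theta = log(mu), b(theta) = exp(theta), a(phi) = 1, c = -log(y!)
    total = 0.0
    for yi, mi in zip(y, mu):
        theta = math.log(mi)       # canonical parameter
        b = math.exp(theta)        # b(theta) = mu
        c = -math.lgamma(yi + 1)   # c(y, phi) = -log(y!)
        total += yi * theta - b + c
    return total

y = [2, 0, 3, 1]           # hypothetical counts
mu = [1.5, 0.8, 2.2, 1.0]  # hypothetical fitted means
print(poisson_loglik_direct(y, mu))
print(poisson_loglik_expfam(y, mu))  # same value as the direct form
```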

  3. Score Equations
  ◮ Take the derivative of the log-likelihood and set it equal to zero:
    $\frac{\partial L(\beta)}{\partial \beta_j} = \sum_i \frac{\partial L_i}{\partial \beta_j} = 0, \quad \forall j$
  ◮ Since $\frac{\partial L_i}{\partial \theta_i} = \frac{y_i - \mu_i}{a(\phi)}$, $\mu_i = b'(\theta_i)$, $\mathrm{Var}(Y_i) = b''(\theta_i)\, a(\phi)$, and $\eta_i = \sum_j \beta_j x_{ij}$:
    $0 = \sum_i \frac{\partial L_i}{\partial \beta_j} = \sum_i \frac{y_i - \mu_i}{a(\phi)} \cdot \frac{a(\phi)}{\mathrm{Var}(Y_i)}\, x_{ij} \frac{\partial \mu_i}{\partial \eta_i} = \sum_i \frac{(y_i - \mu_i)\, x_{ij}}{\mathrm{Var}(Y_i)} \frac{\partial \mu_i}{\partial \eta_i}$
  ◮ $V(\theta) = b''(\theta)$ is the variance function of the GLM.
  ◮ $\mu_i = E[Y_i \mid x_i] = g^{-1}(X_i \beta)$. These equations are typically nonlinear in the $\beta$'s and therefore require iterative computational solutions.

  4. Example: Score Equation for a Binomial GLM (Ch. 5.5.1)
  $Y_i \sim \mathrm{Binomial}(n_i, \pi_i)$
  ◮ The joint probability mass function: $\prod_{i=1}^N \pi(x_i)^{y_i} [1 - \pi(x_i)]^{n_i - y_i}$
  ◮ The log-likelihood:
    $L(\beta) = \sum_i y_i \sum_j x_{ij} \beta_j - \sum_i n_i \log\left(1 + \exp\left(\sum_j \beta_j x_{ij}\right)\right)$
  ◮ The score equations:
    $\frac{\partial L(\beta)}{\partial \beta_j} = \sum_i (y_i - n_i \hat{\pi}_i)\, x_{ij}$, where $\hat{\pi}_i = \frac{e^{X_i \beta}}{1 + e^{X_i \beta}}$.
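The binomial score equation is straightforward to evaluate directly. A minimal pure-Python sketch, using hypothetical grouped data (an intercept plus one covariate):

```python
import math

def logistic_score(beta, X, y, n):
    """Score vector dL/dbeta_j = sum_i (y_i - n_i * pi_i) * x_ij
    for a binomial GLM with logit link."""
    p = len(beta)
    score = [0.0] * p
    for xi, yi, ni in zip(X, y, n):
        eta = sum(b * x for b, x in zip(beta, xi))  # linear predictor X_i beta
        pi = 1.0 / (1.0 + math.exp(-eta))           # pi_i = e^eta / (1 + e^eta)
        for j in range(p):
            score[j] += (yi - ni * pi) * xi[j]
    return score

# Hypothetical grouped data
X = [[1, 0], [1, 1], [1, 2]]  # design matrix (intercept, covariate)
y = [3, 6, 9]                 # successes per group
n = [10, 10, 10]              # trials per group
print(logistic_score([0.0, 0.0], X, y, n))  # -> [3.0, 9.0]
```

At $\beta = 0$ every $\pi_i = 0.5$, so the score is nonzero and an iterative solver (e.g., Fisher scoring) would update $\beta$ until the score vanishes.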

  5. Asymptotic Covariance of $\hat{\beta}$
  ◮ The likelihood function determines the asymptotic covariance of the ML estimate $\hat{\beta}$.
  ◮ The information matrix $\mathcal{I}$ has $(h, j)$ elements
    $\mathcal{I}_{hj} = E\left[ -\frac{\partial^2 L(\beta)}{\partial \beta_h\, \partial \beta_j} \right] = \sum_{i=1}^N \frac{x_{ih}\, x_{ij}}{\mathrm{Var}(Y_i)} \left( \frac{\partial \mu_i}{\partial \eta_i} \right)^2$
    where $w_i$ denotes
    $w_i = \frac{1}{\mathrm{Var}(Y_i)} \left( \frac{\partial \mu_i}{\partial \eta_i} \right)^2$

  6. Asymptotic Covariance Matrix of $\hat{\beta}$
  ◮ The information matrix is equivalently $\mathcal{I}_{hj} = \sum_{i=1}^N x_{ih}\, x_{ij}\, w_i$, i.e., $\mathcal{I} = X^T W X$.
  ◮ $W$ is a diagonal matrix with the $w_i$ as its diagonal elements. In practice, $W$ is evaluated at the MLE $\hat{\beta}$ and depends on the link function.
  ◮ The square roots of the main-diagonal elements of $(X^T W X)^{-1}$ are the estimated standard errors of $\hat{\beta}$.
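As an illustration, the standard errors can be computed by hand for a two-parameter binomial logit model, where the canonical link gives $w_i = n_i \hat{\pi}_i (1 - \hat{\pi}_i)$. A sketch with hypothetical data, writing out the 2x2 inverse explicitly (a real implementation would use a linear-algebra library):

```python
import math

def glm_standard_errors(beta_hat, X, n):
    """SEs for a two-parameter binomial logit GLM: square roots of the
    diagonal of (X^T W X)^{-1}, with w_i = n_i * pi_i * (1 - pi_i)."""
    # Accumulate the 2x2 matrix X^T W X
    a = b = d = 0.0
    for xi, ni in zip(X, n):
        eta = beta_hat[0] * xi[0] + beta_hat[1] * xi[1]
        pi = 1.0 / (1.0 + math.exp(-eta))
        w = ni * pi * (1.0 - pi)        # weight w_i under the logit link
        a += w * xi[0] * xi[0]
        b += w * xi[0] * xi[1]
        d += w * xi[1] * xi[1]
    det = a * d - b * b                 # 2x2 inverse by hand
    var0, var1 = d / det, a / det       # diagonal of (X^T W X)^{-1}
    return math.sqrt(var0), math.sqrt(var1)

X = [[1, 0], [1, 1], [1, 2]]  # hypothetical design matrix
n = [10, 10, 10]              # trials per group
print(glm_standard_errors([0.0, 0.0], X, n))
```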

  7. Analogous to SLR

                                    SLR                                              GLM
  $\widehat{\mathrm{Var}}(\hat{\beta}_i)$:   $\hat{\sigma}^2 / \sum_{i=1}^N (x_i - \bar{x})^2$    $i$th main-diagonal element of $(X^T W X)^{-1}$
  $\widehat{\mathrm{Cov}}(\hat{\beta})$:     $\hat{\sigma}^2 (X^T X)^{-1}$                        $(X^T W X)^{-1}$

  8. Residuals and Diagnostics
  ◮ Deviance tests
  ◮ A measure of goodness of fit in GLMs based on the likelihood.
  ◮ Most useful as a comparison between models (used as a screening method to identify important covariates).
  ◮ Use the saturated model as a baseline for comparison with other model fits.
  ◮ For a Poisson or binomial GLM: $D = -2[L(\hat{\mu} \mid y) - L(y \mid y)]$.
  ◮ Examples of deviances:
    Gaussian:  $D(y, \hat{\mu}) = \sum_i (y_i - \hat{\mu}_i)^2$
    Poisson:   $D(y, \hat{\mu}) = 2 \sum_i \left[ y_i \ln(y_i / \hat{\mu}_i) - (y_i - \hat{\mu}_i) \right]$
    Binomial:  $D(y, \hat{\mu}) = 2 \sum_i \left[ y_i \ln(y_i / \hat{\mu}_i) + (n_i - y_i) \ln\!\left(\frac{n_i - y_i}{n_i - \hat{\mu}_i}\right) \right]$
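The Poisson deviance from the table can be computed directly. A small Python sketch with made-up counts; the $y_i \ln(y_i / \hat{\mu}_i)$ term is taken as its limit, 0, when $y_i = 0$:

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance D = 2 * sum[ y_i*ln(y_i/mu_i) - (y_i - mu_i) ];
    the y*ln(y/mu) term is 0 (its limit) when y_i = 0."""
    D = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        D += term - (yi - mi)
    return 2.0 * D

y = [2, 0, 3, 1]           # hypothetical counts
mu = [1.5, 0.8, 2.2, 1.0]  # hypothetical fitted means
print(poisson_deviance(y, mu))
# The saturated model (mu = y) has deviance 0 by construction:
print(poisson_deviance([2, 1, 3, 1], [2.0, 1.0, 3.0, 1.0]))  # -> 0.0
```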

  9. Deviance Tests for Nested Models
  ◮ Consider two models, $M_0$ with fitted values $\hat{\mu}_0$ and $M_1$ with fitted values $\hat{\mu}_1$.
  ◮ $M_0$ is nested within $M_1$:
    $\eta_1 = \beta_0 + \beta_1 X_{11} + \beta_2 X_{12}$
    $\eta_0 = \beta_0 + \beta_1 X_{11}$
  ◮ Simpler models have smaller log-likelihoods and larger deviances: $L(\hat{\mu}_0 \mid y) \le L(\hat{\mu}_1 \mid y)$ and $D(y \mid \hat{\mu}_1) \le D(y \mid \hat{\mu}_0)$.
  ◮ The likelihood-ratio statistic comparing the two models is the difference between their deviances:
    $-2[L(\hat{\mu}_0 \mid y) - L(\hat{\mu}_1 \mid y)] = -2[L(\hat{\mu}_0 \mid y) - L(y \mid y)] - \left\{ -2[L(\hat{\mu}_1 \mid y) - L(y \mid y)] \right\} = D(y \mid \hat{\mu}_0) - D(y \mid \hat{\mu}_1)$

  10. Hypothesis Tests with Differences in Deviance
  ◮ $H_0$: $\beta_{i_1} = \dots = \beta_{i_j} = 0$; fit a full and a reduced model.
  ◮ Use the difference in deviances as the test statistic, where $df$ is the difference in the number of parameters between $\hat{\mu}_1$ and $\hat{\mu}_0$:
    $D(y \mid \hat{\mu}_0) - D(y \mid \hat{\mu}_1) \sim \chi^2_{df}$
  ◮ Reject $H_0$ if the calculated chi-square value is larger than $\chi^2_{df,\, 1-\alpha}$.
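Once the two deviances are in hand, the test itself is simple arithmetic. A sketch with hypothetical deviance values for the reduced and full models (the critical value $\chi^2_{2,\,0.95} \approx 5.99$ is the standard tabled value):

```python
# Difference-in-deviance test for nested GLMs, with hypothetical numbers.
dev_reduced = 42.7   # D(y | mu_hat_0), hypothetical reduced-model deviance
dev_full    = 37.2   # D(y | mu_hat_1), hypothetical full-model deviance
df = 2               # M1 has two more parameters than M0

lr_stat = dev_reduced - dev_full   # ~ chi^2_df under H0
chi2_crit = 5.991                  # chi^2_{2, 0.95} critical value
reject_h0 = lr_stat > chi2_crit
print(lr_stat, reject_h0)          # 5.5 < 5.991, so H0 is not rejected here
```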

  11. Residual Examinations
  ◮ Pearson residuals: $e^P_i = \frac{y_i - \hat{\mu}_i}{\sqrt{V(\hat{\mu}_i)}}$, where $\hat{\mu}_i = g^{-1}(\hat{\eta}_i) = g^{-1}(X_i \hat{\beta})$.
  ◮ Deviance residuals: $e^D_i = \mathrm{sign}(y_i - \hat{\mu}_i) \sqrt{d_i}$, where $d_i$ is the deviance contribution of the $i$th observation and $\mathrm{sign}(x) = 1$ if $x > 0$, $-1$ if $x \le 0$.
  ◮ Standardized residuals: $r_i = \frac{e_i}{\sqrt{1 - \hat{h}_i}}$, where $e_i = \frac{y_i - \hat{\mu}_i}{\sqrt{V(\hat{\mu}_i)}}$, $\hat{h}_i$ is the leverage, and approximately $r_i \sim N(0, 1)$.
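These formulas can be illustrated for a Poisson GLM, where $V(\mu) = \mu$ and each observation's deviance contribution is $d_i = 2[y_i \ln(y_i / \hat{\mu}_i) - (y_i - \hat{\mu}_i)]$. A minimal sketch with made-up data:

```python
import math

def poisson_residuals(y, mu):
    """Pearson and deviance residuals for a Poisson GLM (V(mu) = mu)."""
    pearson, deviance = [], []
    for yi, mi in zip(y, mu):
        pearson.append((yi - mi) / math.sqrt(mi))      # e^P_i
        # deviance contribution d_i; y*ln(y/mu) -> 0 when y = 0
        di = 2.0 * ((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi))
        sign = 1.0 if yi - mi > 0 else -1.0
        deviance.append(sign * math.sqrt(max(di, 0.0)))  # e^D_i
    return pearson, deviance

y = [2, 0, 3, 1]           # hypothetical counts
mu = [1.5, 0.8, 2.2, 1.0]  # hypothetical fitted means
ep, ed = poisson_residuals(y, mu)
print(ep)
print(ed)
```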

  12. Residual Plot
  Problem: residual plots are hard to interpret for logistic regression.
  [Figure: raw residuals vs. expected values for a logistic regression fit]

  13. Binned Residual Plot
  ◮ Group observations into ordered groups (by $x_j$, $\hat{y}$, or $x_{ij}$), with an equal number of observations per group.
  ◮ Compute the group-wise average of the raw residuals.
  ◮ Plot the average residuals vs. the predicted values. Each dot represents one group.
  [Figure: binned average residuals vs. expected values]
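The binning procedure above can be sketched in a few lines of Python. This is an illustration of the idea, not the arm package's implementation; the data are made up:

```python
def binned_residuals(fitted, resid, nbins):
    """Average raw residuals within bins of ordered fitted values.
    Returns one (mean fitted value, mean residual) pair per bin."""
    pairs = sorted(zip(fitted, resid))  # order observations by fitted value
    n = len(pairs)
    out = []
    for k in range(nbins):
        lo, hi = k * n // nbins, (k + 1) * n // nbins  # near-equal bin sizes
        chunk = pairs[lo:hi]
        if not chunk:
            continue
        mean_fit = sum(f for f, _ in chunk) / len(chunk)
        mean_res = sum(r for _, r in chunk) / len(chunk)
        out.append((mean_fit, mean_res))
    return out

# Hypothetical fitted values and raw residuals
fitted = [0.1, 0.9, 0.3, 0.7, 0.5, 0.2, 0.8, 0.4]
resid  = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 0.0, -0.3]
print(binned_residuals(fitted, resid, 2))  # one averaged point per bin
```

Each returned pair would be one dot on the binned residual plot.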

  14. Binned Residual Plot (Part 2)
  ◮ Red lines indicate ±2 standard-error bounds, within which one would expect about 95% of the binned residuals to fall.
  ◮ An R function is available:

    library(arm)
    binnedplot(x, y, nclass = ...)
    # x      <- expected values
    # y      <- residual values
    # nclass <- number of bins

  [Figure: binned residual plot with ±2 standard-error bounds]

  15. Binned Residual Plot (Part 3)
  ◮ In practice one may need to fiddle with the number of observations per group. By default, nclass is chosen from n as follows:
    – if n >= 100: nclass = floor(sqrt(length(x)))
    – if 10 < n < 100: nclass = 10
    – if n < 10: nclass = floor(n / 2)

  16. Example: Binned Residual Plots with Different Bin Sizes
  [Figure: four binned residual plots (average residuals vs. expected values) with bin sizes 10, 50, 100, and 500]
