  1. Lecture 12. Quasi-likelihood. Nan Ye, School of Mathematics and Physics, University of Queensland. 1 / 28

  2. Looking Back: Course Overview
  Generalized linear models (GLMs)
  • Building blocks: systematic and random components, exponential families
  • Prediction and parameter estimation
  • Specific models for different types of data: continuous response, binary response, count response...
  • Modelling process and model diagnostics
  Extensions of GLMs
  • Quasi-likelihood models
  • Nonparametric models
  • Mixed models and marginal models
  Time series
  2 / 28

  3. Extending GLMs
  [Diagram: GLMs extended in three directions: (a) quasi-likelihood models, (b) nonparametric models, (c) mixed/marginal models.]
  (a) Relax assumption on the random component.
  (b) Relax assumption on the systematic component.
  (c) Relax assumption on the data (independence).
  3 / 28

  4. Recall Gamma regression
  • When Y is a non-negative continuous random variable, we can choose the systematic and random components as follows:
    E(Y | x) = exp(β⊤x)   (systematic)
    Y | x is Gamma distributed   (random)
  • We further assume the variance of the Gamma distribution is μ²/ν (ν treated as known), thus
    Y | x ∼ Γ(μ = exp(β⊤x), var = μ²/ν),
  where Γ(μ = a, var = b) denotes a Gamma distribution with mean a and variance b.
  We have seen how to estimate β for Gamma regression. How do we estimate the dispersion parameter φ = 1/ν?
  4 / 28

  5. Poisson regression
  • Poisson regression requires the variance of the data to equal the mean, but this is seldom the case in real data.
  • Overdispersion: variance in data is larger than expected based on the model.
  • Underdispersion: variance in data is smaller than expected based on the model.
  • For count data, we used quasi-Poisson regression to allow both overdispersion and underdispersion.
  How is the quasi-Poisson model defined? How are the parameters estimated?
  5 / 28

  6. This Lecture • Estimation of dispersion parameter • Quasi-likelihood: derivation and parameter estimation 6 / 28

  7. Estimation of Dispersion Parameter
  Recall: Fisher scoring for Gamma regression
  • Consider the Gamma regression model Y | x ∼ Γ(μ = exp(β⊤x), var = μ²/ν).
  • Let μᵢ = exp(xᵢ⊤β). The gradient and Fisher information are
    ∇ℓ(β) = Σᵢ ν (yᵢ − μᵢ)/μᵢ xᵢ,   I(β) = Σᵢ ν xᵢ xᵢ⊤.
  • Fisher scoring updates β to β′ = β + I(β)⁻¹ ∇ℓ(β).
  The update of β does not depend on the dispersion parameter φ = 1/ν!
  7 / 28
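As a sanity check on the cancellation claim, here is a minimal sketch (not from the slides) of one Fisher-scoring step for Gamma regression with log link; ν multiplies both the gradient and the Fisher information, so it drops out of the update.

```python
import numpy as np

def gamma_fisher_step(beta, X, y, nu=1.0):
    """One Fisher-scoring step for Gamma regression with log link.

    Gradient:            sum_i nu * (y_i - mu_i) / mu_i * x_i
    Fisher information:  sum_i nu * x_i x_i^T
    nu scales both, so it cancels in I(beta)^{-1} grad(beta).
    """
    mu = np.exp(X @ beta)
    grad = X.T @ (nu * (y - mu) / mu)
    info = nu * (X.T @ X)
    return beta + np.linalg.solve(info, grad)
```

Calling the step with different values of ν from the same β returns the same new β, which is exactly the point of the slide.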

  8. Moment estimator for the dispersion parameter
  • We first estimate β with Fisher scoring.
  • Recall: if a GLM with var(Y) = φV(μ) is correct, then
    X²/φ = Σᵢ (yᵢ − μ̂ᵢ)² / (φ V(μ̂ᵢ)) ∼ χ²(n − p),
  where X² is the generalized Pearson statistic, n is the number of examples, and p is the number of parameters in β.
  • That is, we have E(X²/φ) = n − p.
  • This gives us the moment estimator
    φ̂ = X²/(n − p) = 1/(n − p) Σᵢ (yᵢ − μ̂ᵢ)²/V(μ̂ᵢ).
  The formula can be used for any GLM with unknown φ!
  8 / 28
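The estimator is a one-liner; this illustrative sketch checks it on simulated Gamma data with the true means supplied directly (so no fitting step, and p = 0):

```python
import numpy as np

def dispersion_moment_estimate(y, mu_hat, V, p):
    """phi_hat = X^2 / (n - p), with X^2 the generalized Pearson statistic."""
    X2 = np.sum((y - mu_hat) ** 2 / V(mu_hat))
    return X2 / (len(y) - p)

# Simulated check with a Gamma response: V(mu) = mu^2 and var(Y) = phi*mu^2.
# Gamma(shape=1/phi, scale=phi*mu) has mean mu and variance phi*mu^2.
rng = np.random.default_rng(0)
phi, mu = 0.5, 3.0
y = rng.gamma(shape=1 / phi, scale=phi * mu, size=20_000)
phi_hat = dispersion_moment_estimate(y, np.full_like(y, mu), lambda m: m ** 2, p=0)
```

With 20,000 draws, phi_hat lands close to the true φ = 0.5.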

  9. Example
  For Gamma regression, var(Y) = φμ², so V(μ) = μ².

  > fit.gam.inv = glm(time ~ lot * log(conc), data = clot, family = Gamma)
  (Dispersion parameter for Gamma family taken to be 0.002129707)
  > mu = predict(fit.gam.inv, type = "response")
  > sum((fit.gam.inv$y - mu)^2 / mu^2) / (length(mu) - length(coef(fit.gam.inv)))
  [1] 0.002129692

  Our estimate is consistent with the summary function.
  9 / 28

  10. Quasi-Likelihood
  Recall: Fisher scoring for GLM
  • Let μᵢ = E(Yᵢ | xᵢ, β), with link g satisfying g(μᵢ) = xᵢ⊤β, and Vᵢ = var(Yᵢ | xᵢ, β).
  • The gradient, or score function, is
    ∇ℓ(β) = Σᵢ (yᵢ − μᵢ) / (g′(μᵢ) Vᵢ) xᵢ.
  • The Fisher information is
    I(β) = Σᵢ 1 / (g′(μᵢ)² Vᵢ) xᵢ xᵢ⊤.
  • Fisher scoring updates β to β′ = β + I(β)⁻¹ ∇ℓ(β).
  10 / 28

  11. • Fisher scoring for GLM can thus be written as
    β′ = β + ( Σᵢ 1/(g′(μᵢ)² Vᵢ) xᵢ xᵢ⊤ )⁻¹ ( Σᵢ (yᵢ − μᵢ)/(g′(μᵢ) Vᵢ) xᵢ ).
  • We just need to know the link function g and the variances Vᵢ.
  • In particular, if we know Vᵢ = φV(μᵢ), then the update does not depend on φ.
  • Thus we can determine β even if φ is unknown.
  11 / 28
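The update above can be sketched directly. This illustrative implementation (not from the slides) takes only the link and variance information, matching the bullet points; the Poisson example below uses the log link, g(μ) = ln μ.

```python
import numpy as np

def fisher_scoring_glm(X, y, g_inv, g_prime, V, n_iter=25):
    """Fisher scoring for a GLM, given the inverse link g_inv, the link
    derivative g_prime(mu), and the variance function V(mu).
    A common dispersion factor in V cancels from the update."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = g_inv(X @ beta)
        w = 1.0 / (g_prime(mu) ** 2 * V(mu))              # Fisher info weights
        score = X.T @ ((y - mu) / (g_prime(mu) * V(mu)))  # gradient
        info = X.T @ (w[:, None] * X)
        beta = beta + np.linalg.solve(info, score)
    return beta

# Poisson regression: g(mu) = ln(mu), so g'(mu) = 1/mu, and V(mu) = mu.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(5000), rng.normal(size=5000)])
beta_true = np.array([1.0, 0.5])
y = rng.poisson(np.exp(X @ beta_true)).astype(float)
beta_hat = fisher_scoring_glm(X, y, np.exp, lambda m: 1 / m, lambda m: m)
```

For the Poisson case the weights reduce to w = μ and the score contributions to y − μ, recovering the familiar IRLS iteration.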

  12. Quasi-model via Fisher scoring
  • A GLM has the following structure:
    μ = E(Y | x) = h(β⊤x)   (systematic)
    Y | x follows an exponential family distribution   (random)
  • A quasi-model relaxes the assumption on the random component:
    μ = E(Y | x) = h(β⊤x)   (systematic)
    var(Y | x) = φV(μ)   (random)
  where φ is a dispersion parameter, V(μ) is a variance function, and β is determined using Fisher scoring!
  12 / 28

  13. Hi, I’m Quasimodo. 13 / 28

  14. Quasi-model via quasi-likelihood
  • A quasi-model relaxes the assumption on the random component:
    μ = E(Y | x) = h(β⊤x)   (systematic)
    var(Y | x) = φV(μ)   (random)
  where φ is a dispersion parameter, V(μ) is a variance function, and β is determined by maximizing quasi-likelihood!
  • Quasi-likelihood is a surrogate for the log-likelihood of the mean parameter μ given an observation y, when we only know var(Y | x) = φV(μ).
  14 / 28

  15. Construction of quasi-likelihood
  • Recall: a score function ℓ(μ) satisfies E(ℓ) = 0, var(ℓ) = −E(ℓ′).
  • Define S(μ) = (Y − μ)/(φV(μ)). Then S(μ) is similar to a score function:
    E(S) = 0,   var(S) = −E(S′) = 1/(φV(μ)).
  • S(μ) is thus called a quasi-score function.
  15 / 28
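A quick simulation (illustrative, not from the slides) confirms the two quasi-score properties for a Gamma response, where V(μ) = μ² and var(Y) = φμ²:

```python
import numpy as np

# S(mu) = (Y - mu) / (phi * V(mu)); check E(S) = 0 and var(S) = 1/(phi * V(mu)).
rng = np.random.default_rng(2)
mu, phi = 3.0, 0.5
V = lambda m: m ** 2
# Gamma(shape=1/phi, scale=phi*mu) has mean mu and variance phi*mu^2.
y = rng.gamma(shape=1 / phi, scale=phi * mu, size=200_000)
S = (y - mu) / (phi * V(mu))
mean_S = S.mean()   # should be near 0
var_S = S.var()     # should be near 1/(phi*mu^2) = 1/4.5
```

Both sample moments land within simulation error of the claimed values.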

  16. • The usual log-likelihood is an integral of the score function.
  • By analogy, the quasi-likelihood (quasi log-likelihood) is
    Q(μ; y) = ∫ᵧ^μ (y − t)/(φV(t)) dt.
  16 / 28
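The integral can be checked numerically. With V(t) = t and φ = 1, a midpoint-rule approximation matches the Poisson quasi-likelihood y ln μ − μ up to a term depending only on y (this sketch and the helper name `Q_numeric` are illustrative, not from the slides):

```python
import numpy as np

def Q_numeric(mu, y, V, phi=1.0, n=100_000):
    """Midpoint-rule approximation of Q(mu; y) = int_y^mu (y - t)/(phi*V(t)) dt."""
    t = np.linspace(y, mu, n + 1)
    mid = (t[:-1] + t[1:]) / 2          # midpoints of the n subintervals
    h = (mu - y) / n
    return float(np.sum((y - mid) / (phi * V(mid))) * h)

# Poisson case: V(t) = t gives Q = y*ln(mu) - mu - (y*ln(y) - y) exactly.
y, mu = 2.0, 5.0
q = Q_numeric(mu, y, lambda t: t)
closed_form = y * np.log(mu) - mu - (y * np.log(y) - y)
```

The constant y ln y − y is absorbed because the slide's integral starts at t = y, so Q(y; y) = 0.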

  17. Quasi-likelihood for some variance functions

  V(μ)          Q(μ; y)                                          distribution        constraint
  1             −(y − μ)²/2                                      normal              -
  μ             y ln μ − μ                                       Poisson             μ > 0, y ≥ 0
  μ²            −y/μ − ln μ                                      Gamma               μ > 0, y ≥ 0
  μ³            −y/(2μ²) + 1/μ                                   inverse Gaussian    μ > 0, y ≥ 0
  μᵐ            μ^(1−m) ( y/(1 − m) − μ/(2 − m) )                -                   μ > 0, m ≠ 0, 1, 2
  μ(1 − μ)      y ln(μ/(1 − μ)) + ln(1 − μ)                      binomial            μ ∈ (0, 1), 0 ≤ y ≤ 1
  μ²(1 − μ)²    (2y − 1) ln(μ/(1 − μ)) − y/μ − (1 − y)/(1 − μ)   -                   μ ∈ (0, 1), 0 ≤ y ≤ 1
  μ + μ²/k      y ln(μ/(k + μ)) + k ln(k/(k + μ))                negative binomial   μ > 0, y ≥ 0

  17 / 28

  18. Parameter estimation for quasi-model
  • In a quasi-model, μ is a function of β, and the quasi-likelihood is also a function of β:
    Q(β) = Σᵢ Q(μᵢ(β); yᵢ).
  • The Fisher scoring update for Q is given by
    β′ = β + (−E∇²Q(β))⁻¹ ∇Q(β)
       = β + ( Σᵢ 1/(g′(μᵢ)² φV(μᵢ)) xᵢ xᵢ⊤ )⁻¹ ( Σᵢ (yᵢ − μᵢ)/(g′(μᵢ) φV(μᵢ)) xᵢ ).
  The update is independent of φ.
  • φ is estimated as φ̂ = X²/(n − p) after β is estimated.
  18 / 28
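Putting the two steps together for a quasi-Poisson model with log link (an illustrative sketch, not the slides' code): Fisher scoring with V(μ) = μ, where φ cancels, followed by the moment estimate of φ.

```python
import numpy as np

def quasi_poisson_fit(X, y, n_iter=25):
    """Fisher scoring with V(mu) = mu (phi cancels, so the beta iteration is
    identical to ordinary Poisson regression), then phi_hat = X^2/(n - p)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        info = X.T @ (mu[:, None] * X)   # log link: weights reduce to mu
        beta = beta + np.linalg.solve(info, X.T @ (y - mu))
    mu = np.exp(X @ beta)
    phi_hat = np.sum((y - mu) ** 2 / mu) / (len(y) - X.shape[1])
    return beta, phi_hat

# Overdispersed counts: y = 4*Poisson(mu/4) has mean mu and variance 4*mu,
# i.e. a quasi-Poisson model with phi = 4.
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(5000), rng.normal(size=5000)])
mu_true = np.exp(X @ np.array([2.0, 0.3]))
y = 4.0 * rng.poisson(mu_true / 4.0)
beta_hat, phi_hat = quasi_poisson_fit(X, y)
```

The fit recovers β despite the overdispersion, and φ̂ lands near the true value 4, mirroring what `family = quasipoisson` reports in R.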

  19. Recall: quasi-Poisson regression
  • The quasi-Poisson regression model introduces an additional dispersion parameter φ.
  • It replaces the original model variance Vᵢ at xᵢ by φVᵢ.
  • φ > 1 is used to accommodate overdispersion relative to the original model.
  • φ < 1 is used to accommodate underdispersion relative to the original model.
  • φ is usually estimated separately after estimating β.
  19 / 28

  20. Estimating φ in quasi-Poisson regression

  > fit.qpo <- glm(Days ~ Sex + Age + Eth + Lrn, data = quine, family = quasipoisson)
  (Dispersion parameter for quasipoisson family taken to be 13.16691)
  > mu = predict(fit.qpo, type = "response")
  > sum((fit.qpo$y - mu)^2 / mu) / (length(mu) - length(coef(fit.qpo)))
  [1] 13.16684

  20 / 28

  21. Example Data

  Site \ Variety     1      2      3      4      5      6      7      8      9     10    Mean
  1               0.05   0.00   0.00   0.10   0.25   0.05   0.50   1.30   1.50   1.50    0.52
  2               0.00   0.05   0.05   0.30   0.75   0.30   3.00   7.50   1.00  12.70    2.56
  3               1.25   1.25   2.50  16.60   2.50   2.50   0.00  20.00  37.50  26.25   11.03
  4               2.50   0.50   0.01   3.00   2.50   0.01  25.00  55.00   5.00  40.00   13.35
  5               5.50   1.00   6.00   1.10   2.50   8.00  16.50  29.50  20.00  43.50   13.36
  6               1.00   5.00   5.00   5.00   5.00   5.00  10.00   5.00  50.00  75.00   16.60
  7               5.00   0.10   5.00   5.00  50.00  10.00  50.00  25.00  50.00  75.00   27.51
  8               5.00  10.00   5.00   5.00  25.00  75.00  50.00  75.00  75.00  75.00   40.00
  9              17.50  25.00  42.50  50.00  37.50  95.00  62.50  95.00  95.00  95.00   61.50
  Mean            4.20   4.77   7.34   9.57  14.00  21.76  24.17  34.81  37.22  49.33   20.72

  • Incidence of leaf blotch on 10 varieties of barley grown at 9 sites.
  • The response is the percentage leaf area affected.
  21 / 28

  22. Heatmap for the data
  [Heatmap: proportion of leaf area affected (0–75), by site (1–9) and variety (1–10).]
  22 / 28

  23. > fit.qbin = glm(proportions ~ as.factor(site) + as.factor(variety), family = quasibinomial)
  • A binomial model satisfies var(Y) = μ(1 − μ).
  • A quasibinomial model assumes that var(Y) = φμ(1 − μ), where φ is the dispersion parameter.
  • The probability of having leaf blotch for variety j at site i has the form
    p_ij = exp(b + αᵢ + βⱼ) / (1 + exp(b + αᵢ + βⱼ))
  23 / 28
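The additive logit structure can be sketched with hypothetical effect values (the numbers for b, the site effects, and the variety effects below are illustrative, not fitted from the barley data):

```python
import numpy as np

# p_ij = exp(b + alpha_i + beta_j) / (1 + exp(b + alpha_i + beta_j))
b = -2.0
alpha = np.array([0.0, 0.5, 1.0])         # hypothetical site effects
beta = np.array([0.0, 0.8])               # hypothetical variety effects
eta = b + alpha[:, None] + beta[None, :]  # linear predictor for each (site, variety)
p = 1.0 / (1.0 + np.exp(-eta))            # inverse logit, one probability per cell
```

Broadcasting builds the full site-by-variety grid of probabilities in one step, just as the model assigns one pᵢⱼ to every cell of the data table.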
