

SLIDE 1

Statistics for Applications Chapter 10: Generalized Linear Models (GLMs)

1/52

SLIDE 2

Linear model

A linear model assumes Y | X ∼ N(µ(X), σ²I), and E(Y | X) = µ(X) = X⊤β.

2/52

SLIDE 3

Components of a linear model

The two components (that we are going to relax) are

1. Random component: the response variable Y | X is continuous and normally distributed with mean µ = µ(X) = E(Y | X).

2. Link: between the random component and the covariates X = (X(1), X(2), · · · , X(p))⊤: µ(X) = X⊤β.

3/52

SLIDE 4

Generalization

A generalized linear model (GLM) generalizes normal linear regression models in the following directions.

1. Random component: Y ∼ some exponential family distribution.

2. Link: between the random component and the covariates: g(µ(X)) = X⊤β, where g is called the link function and µ(X) = E(Y | X).

4/52

SLIDE 5

Example 1: Disease Occurring Rate

In the early stages of a disease epidemic, the rate at which new cases occur can often increase exponentially through time. Hence, if µi is the expected number of new cases on day ti, a model of the form

µi = γ exp(δti)

seems appropriate.

◮ Such a model can be turned into GLM form by using a log link, so that log(µi) = log(γ) + δti = β0 + β1ti.

◮ Since this is a count, the Poisson distribution (with expected value µi) is probably a reasonable distribution to try.

5/52
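The following is a minimal sketch (synthetic data; the parameter values and the use of the statsmodels package are my additions, not part of the slides) of how this log-link Poisson model can be fit in practice.

```python
import numpy as np
import statsmodels.api as sm

# Simulate daily counts with mu_i = gamma * exp(delta * t_i),
# i.e. log(mu_i) = beta0 + beta1 * t_i with beta0 = log(gamma), beta1 = delta.
rng = np.random.default_rng(0)
t = np.arange(30, dtype=float)
mu = 2.0 * np.exp(0.1 * t)          # gamma = 2, delta = 0.1
cases = rng.poisson(mu)

X = sm.add_constant(t)              # design matrix [1, t_i]
fit = sm.GLM(cases, X, family=sm.families.Poisson()).fit()  # log link by default
print(fit.params)                   # approximately [log(2), 0.1]
```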

SLIDE 6

Example 2: Prey Capture Rate (1)

The rate of capture of prey, yi, by a hunting animal tends to increase with increasing density of prey, xi, but to eventually level off, when the predator is catching as much as it can cope with. A suitable model for this situation might be

µi = αxi / (h + xi),

where α represents the maximum capture rate, and h represents the prey density at which the capture rate is half the maximum rate.

6/52

SLIDE 7

Example 2: Prey Capture Rate (2)

[Figure: the capture-rate curve µ = αx/(h + x) plotted against prey density x; the rate rises with density and levels off toward the maximum rate α.]

7/52

SLIDE 8

Example 2: Prey Capture Rate (3)

◮ Obviously this model is non-linear in its parameters, but, by using a reciprocal link, the right-hand side can be made linear in the parameters:

g(µi) = 1/µi = 1/α + (h/α)(1/xi) = β0 + β1(1/xi).

◮ The standard deviation of the capture rate might be approximately proportional to the mean rate, suggesting the use of a Gamma distribution for the response.

8/52
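As a hedged sketch (synthetic data; names and parameter values are mine), this reciprocal-link Gamma model can be fit as follows. statsmodels' Gamma family uses the inverse link by default, so β0 estimates 1/α and β1 estimates h/α.

```python
import numpy as np
import statsmodels.api as sm

# Simulate capture rates with mean mu = alpha * x / (h + x) and Gamma noise,
# so the standard deviation is proportional to the mean.
rng = np.random.default_rng(1)
x = rng.uniform(0.05, 1.0, size=500)
alpha, h = 0.6, 0.2
mu = alpha * x / (h + x)
y = rng.gamma(shape=5.0, scale=mu / 5.0)   # E(y) = mu

X = sm.add_constant(1.0 / x)               # covariate is 1/x_i
fit = sm.GLM(y, X, family=sm.families.Gamma()).fit()  # inverse link by default
b0, b1 = fit.params
print(1.0 / b0, b1 / b0)                   # estimates of alpha and h
```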

SLIDE 9

Example 3: Kyphosis Data

The Kyphosis data consist of measurements on 81 children following corrective spinal surgery. The binary response variable, Kyphosis, indicates the presence or absence of a postoperative deformity. The three covariates are the Age of the child in months, the Number of the vertebrae involved in the operation, and the Start of the range of the vertebrae involved.

◮ The response variable is binary so there is no choice: Y | X is Bernoulli with expected value µ(X) ∈ (0, 1).

◮ We cannot write µ(X) = X⊤β because the right-hand side ranges through ℝ.

◮ We need an invertible function f such that f(X⊤β) ∈ (0, 1).

9/52

SLIDE 10

GLM: motivation

◮ Clearly, the normal LM is not appropriate for these examples;

◮ we need a more general regression framework to account for various types of response data:

  ◮ exponential family distributions;

◮ and we need methods for model fitting and inference in this framework:

  ◮ maximum likelihood estimation.

10/52

SLIDE 11

Exponential Family

A family of distributions {Pθ : θ ∈ Θ}, Θ ⊂ ℝ^k, is said to be a k-parameter exponential family on ℝ^q if there exist real-valued functions:

◮ η1, η2, · · · , ηk and B of θ,

◮ T1, T2, · · · , Tk, and h of x ∈ ℝ^q,

such that the density function (pmf or pdf) of Pθ can be written as

pθ(x) = exp[ ∑_{i=1}^k ηi(θ)Ti(x) − B(θ) ] h(x).

11/52

SLIDE 12

Normal distribution example

◮ Consider X ∼ N(µ, σ²), θ = (µ, σ²). The density is

pθ(x) = exp{ (µ/σ²)x − (1/(2σ²))x² − µ²/(2σ²) } · 1/(σ√(2π)),

which forms a two-parameter exponential family with

η1 = µ/σ², η2 = −1/(2σ²), T1(x) = x, T2(x) = x²,

B(θ) = µ²/(2σ²) + log(σ√(2π)), h(x) = 1.

◮ When σ² is known, it becomes a one-parameter exponential family on ℝ:

η = µ/σ², T(x) = x, B(θ) = µ²/(2σ²), h(x) = e^{−x²/(2σ²)} / (σ√(2π)).

12/52
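A quick numerical sanity check of this factorization (my own illustration, not from the slides): the two-parameter exponential-family form should reproduce the N(µ, σ²) density exactly.

```python
import numpy as np
from scipy.stats import norm

mu, sigma2 = 1.5, 0.8
eta1, eta2 = mu / sigma2, -1.0 / (2.0 * sigma2)
B = mu**2 / (2.0 * sigma2) + np.log(np.sqrt(sigma2 * 2.0 * np.pi))

x = np.linspace(-3.0, 5.0, 9)
family_form = np.exp(eta1 * x + eta2 * x**2 - B)   # h(x) = 1
assert np.allclose(family_form, norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)))
```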

SLIDE 13

Examples of discrete distributions

The following distributions form discrete exponential families of distributions with pmf:

◮ Bernoulli(p): p^x (1 − p)^{1−x}, x ∈ {0, 1};

◮ Poisson(λ): e^{−λ} λ^x / x!, x = 0, 1, . . . .

13/52

SLIDE 14

Examples of Continuous distributions

The following distributions form continuous exponential families of distributions with pdf:

◮ Gamma(a, b): (1/(Γ(a) b^a)) x^{a−1} e^{−x/b};

  ◮ above, a is the shape parameter and b the scale parameter;

  ◮ reparametrizing with the mean parameter µ = ab gives (a/µ)^a (1/Γ(a)) x^{a−1} e^{−ax/µ}.

◮ Inverse Gamma(α, β): (β^α/Γ(α)) x^{−α−1} e^{−β/x}.

◮ Inverse Gaussian(µ, σ²): √(σ²/(2πx³)) e^{−σ²(x−µ)²/(2µ²x)}.

Others: Chi-square, Beta, Binomial, Negative Binomial distributions.

14/52

SLIDE 15

Components of GLM

1. Random component: Y ∼ some exponential family distribution.

2. Link: between the random component and the covariates: g(µ(X)) = X⊤β, where g is called the link function and µ(X) = E(Y | X).

15/52

SLIDE 16

One-parameter canonical exponential family

◮ Canonical exponential family for k = 1, y ∈ ℝ:

fθ(y) = exp( (yθ − b(θ))/φ + c(y, φ) )

for some known functions b(·) and c(·, ·).

◮ If φ is known, this is a one-parameter exponential family with θ being the canonical parameter.

◮ If φ is unknown, this may or may not be a two-parameter exponential family; φ is called the dispersion parameter.

◮ In this class, we always assume that φ is known.

16/52

SLIDE 17

Normal distribution example

◮ Consider the following Normal density function with known variance σ²:

fθ(y) = (1/(σ√(2π))) e^{−(y−µ)²/(2σ²)}
      = exp{ (yµ − µ²/2)/σ² − (1/2)(y²/σ² + log(2πσ²)) }.

◮ Therefore θ = µ, φ = σ², b(θ) = θ²/2, and c(y, φ) = −(1/2)(y²/φ + log(2πφ)).

17/52

SLIDE 18

Other distributions

Table 1: Exponential Family

              Normal                       Poisson          Bernoulli
Notation      N(µ, σ²)                     P(µ)             B(p)
Range of y    (−∞, ∞)                      {0, 1, 2, . . .} {0, 1}
φ             σ²                           1                1
b(θ)          θ²/2                         e^θ              log(1 + e^θ)
c(y, φ)       −(1/2)(y²/φ + log(2πφ))      −log y!          0

18/52

SLIDE 19

Likelihood

Let ℓ(θ) = log fθ(Y) denote the log-likelihood function. The mean E(Y) and the variance var(Y) can be derived from the following identities:

◮ First identity:

E( ∂ℓ/∂θ ) = 0;

◮ Second identity:

E( ∂²ℓ/∂θ² ) + E[ (∂ℓ/∂θ)² ] = 0.

Both are obtained from ∫ fθ(y) dy ≡ 1.

19/52

SLIDE 20

Expected value

Note that

ℓ(θ) = (Yθ − b(θ))/φ + c(Y; φ).

Therefore

∂ℓ/∂θ = (Y − b′(θ))/φ.

The first identity yields

0 = E( ∂ℓ/∂θ ) = (E(Y) − b′(θ))/φ,

which leads to E(Y) = µ = b′(θ).

20/52

SLIDE 21

Variance

On the other hand, we have

∂²ℓ/∂θ² + (∂ℓ/∂θ)² = −b″(θ)/φ + (Y − b′(θ))²/φ²,

and from the previous result,

(Y − b′(θ))/φ = (Y − E(Y))/φ.

Together with the second identity, this yields

0 = −b″(θ)/φ + var(Y)/φ²,

which leads to var(Y) = V(Y) = b″(θ)φ.

21/52

SLIDE 22

Example: Poisson distribution

Example: Consider a Poisson likelihood,

f(y) = (µ^y / y!) e^{−µ} = e^{y log µ − µ − log(y!)}.

Thus

θ = log µ, b(θ) = e^θ, c(y, φ) = −log(y!), φ = 1,

so that

µ = e^θ = b′(θ), b″(θ) = e^θ = µ,

confirming E(Y) = µ and var(Y) = b″(θ)φ = µ.

22/52
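A small Monte Carlo check (my illustration, not from the slides) that E(Y) = b′(θ) and var(Y) = b″(θ)φ hold for this Poisson family:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 3.7
theta = np.log(mu)                    # canonical parameter theta = log(mu)
y = rng.poisson(mu, size=1_000_000)

print(y.mean(), np.exp(theta))        # E(Y) vs b'(theta) = e^theta = mu
print(y.var(), np.exp(theta))         # var(Y) vs b''(theta) * phi = mu
```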

SLIDE 23

Link function

◮ β is the parameter of interest, and needs to appear somehow in the likelihood function in order to use maximum likelihood.

◮ A link function g relates the linear predictor X⊤β to the mean parameter µ: X⊤β = g(µ).

◮ g is required to be monotone increasing and differentiable, so that µ = g⁻¹(X⊤β).

23/52

SLIDE 24

Examples of link functions

◮ For the LM, g(·) = identity.

◮ Poisson data. Suppose Y | X ∼ Poisson(µ(X)).

  ◮ µ(X) > 0;
  ◮ log(µ(X)) = X⊤β;
  ◮ in general, a link function for count data should map (0, +∞) to ℝ;
  ◮ the log link is a natural one.

◮ Bernoulli/Binomial data.

  ◮ 0 < µ(X) < 1;
  ◮ g should map (0, 1) to ℝ;
  ◮ 3 choices:

    1. logit: log( µ(X)/(1 − µ(X)) ) = X⊤β;
    2. probit: Φ⁻¹(µ(X)) = X⊤β, where Φ(·) is the normal cdf;
    3. complementary log-log: log(−log(1 − µ(X))) = X⊤β.

  ◮ The logit link is the natural choice.

24/52
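The three choices translate directly into code. The sketch below (mine, not the slides') checks that each link maps (0, 1) to ℝ with a well-defined inverse.

```python
import numpy as np
from scipy.stats import norm

def logit(mu):      return np.log(mu / (1.0 - mu))
def probit(mu):     return norm.ppf(mu)
def cloglog(mu):    return np.log(-np.log(1.0 - mu))

def inv_logit(eta):   return 1.0 / (1.0 + np.exp(-eta))
def inv_probit(eta):  return norm.cdf(eta)
def inv_cloglog(eta): return 1.0 - np.exp(-np.exp(eta))

mu = np.linspace(0.05, 0.95, 19)
for g, g_inv in [(logit, inv_logit), (probit, inv_probit), (cloglog, inv_cloglog)]:
    assert np.allclose(g_inv(g(mu)), mu)   # each g is invertible on (0, 1)
```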

SLIDE 25

Examples of link functions for Bernoulli response (1)

[Figure: two candidate inverse links, both mapping ℝ into (0, 1).]

◮ in blue: f1(x) = e^x / (1 + e^x);

◮ in red: f2(x) = Φ(x) (Gaussian CDF).

25/52

SLIDE 26

Examples of link functions for Bernoulli response (2)

[Figure: the two corresponding link functions, both mapping (0, 1) into ℝ.]

◮ in blue: g1(x) = f1⁻¹(x) = log( x/(1 − x) ) (logit link);

◮ in red: g2(x) = f2⁻¹(x) = Φ⁻¹(x) (probit link).

26/52

SLIDE 27

Canonical Link

◮ The function g that links the mean µ to the canonical parameter θ is called the canonical link: g(µ) = θ.

◮ Since µ = b′(θ), the canonical link is given by g(µ) = (b′)⁻¹(µ).

◮ If φ > 0, the canonical link function is strictly increasing. Why?

27/52

SLIDE 28

Example: the Bernoulli distribution

◮ We can check that b(θ) = log(1 + e^θ).

◮ Hence we solve

b′(θ) = e^θ / (1 + e^θ) = µ ⇔ θ = log( µ/(1 − µ) ).

◮ The canonical link for the Bernoulli distribution is the logit link.

28/52
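The inversion can also be checked symbolically; this is my own illustration, assuming SymPy is available.

```python
import sympy as sp

theta, mu = sp.symbols('theta mu')
b = sp.log(1 + sp.exp(theta))
# Solve b'(theta) = mu for theta: the solution is the logit of mu.
sol = sp.solve(sp.Eq(sp.diff(b, theta), mu), theta)
print(sol)   # [log(-mu/(mu - 1))], i.e. theta = log(mu/(1 - mu))
```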

SLIDE 29

Other examples

        Normal   Poisson   Bernoulli         Gamma
b(θ)    θ²/2     exp(θ)    log(1 + e^θ)      −log(−θ)
g(µ)    µ        log µ     log( µ/(1 − µ) )  −1/µ

29/52

SLIDE 30

Model and notation

◮ Let (Xi, Yi) ∈ ℝ^p × ℝ, i = 1, . . . , n, be independent random pairs such that the conditional distribution of Yi given Xi = xi has density in the canonical exponential family:

fθi(yi) = exp( (yiθi − b(θi))/φ + c(yi, φ) ).

◮ Y = (Y1, . . . , Yn)⊤, X = (X1⊤, . . . , Xn⊤)⊤.

◮ Here the mean µi is related to the canonical parameter θi via µi = b′(θi),

◮ and µi depends linearly on the covariates through a link function g: g(µi) = Xi⊤β.

30/52

SLIDE 31

Back to β

◮ Given a link function g, note the following relationship between β and θ:

θi = (b′)⁻¹(µi) = (b′)⁻¹( g⁻¹(Xi⊤β) ) ≡ h(Xi⊤β),

where h is defined as h = (b′)⁻¹ ∘ g⁻¹ = (g ∘ b′)⁻¹.

◮ Remark: if g is the canonical link function, h is the identity.

31/52

SLIDE 32

Log-likelihood

◮ The log-likelihood is given by

ℓn(β; Y, X) = ∑_{i} (Yiθi − b(θi))/φ = ∑_{i} ( Yi h(Xi⊤β) − b(h(Xi⊤β)) )/φ,

up to a constant term.

◮ Note that when we use the canonical link function, we obtain the simpler expression

ℓn(β, φ; Y, X) = ∑_{i} ( YiXi⊤β − b(Xi⊤β) )/φ.

32/52

SLIDE 33

Strict concavity

◮ The log-likelihood is strictly concave under the canonical link when φ > 0. Why?

◮ As a consequence, the maximum likelihood estimator is unique.

◮ On the other hand, if another parameterization is used, the likelihood function may not be strictly concave, leading to several local maxima.

33/52

SLIDE 34

Optimization Methods

Given a function f(x) defined on X ⊂ ℝ^m, find x* such that f(x*) ≥ f(x) for all x ∈ X. We will describe the following three methods:

◮ Newton–Raphson method;

◮ Fisher-scoring method;

◮ Iteratively Re-weighted Least Squares.

34/52

SLIDE 35

Gradient and Hessian

◮ Suppose f : ℝ^m → ℝ has two continuous derivatives.

◮ Define the gradient of f at a point x0, ∇f = ∇f(x0), as

∇f = ( ∂f/∂x1, . . . , ∂f/∂xm )⊤.

◮ Define the Hessian (matrix) of f at a point x0, Hf = Hf(x0), as

(Hf)ij = ∂²f/∂xi∂xj.

◮ For smooth functions, the Hessian is symmetric. If f is strictly concave, then Hf(x) is negative definite.

◮ The continuous map x ↦ Hf(x) is called the Hessian map.

35/52

SLIDE 36

Quadratic approximation

◮ Suppose f has a continuous Hessian map at x0. Then we can approximate f quadratically in a neighborhood of x0 using

f(x) ≈ f(x0) + ∇f(x0)⊤(x − x0) + (1/2)(x − x0)⊤Hf(x0)(x − x0).

◮ This leads to the following approximation to the gradient:

∇f(x) ≈ ∇f(x0) + Hf(x0)(x − x0).

◮ If x* is a maximum, we have ∇f(x*) = 0.

◮ We can solve for it by plugging in x*, which gives

x* = x0 − Hf(x0)⁻¹∇f(x0).

36/52

SLIDE 37

Newton-Raphson method

◮ The Newton–Raphson method for multidimensional optimization uses such approximations sequentially.

◮ We can define a sequence of iterates starting at an arbitrary value x^(0), updated using the rule

x^(k+1) = x^(k) − Hf(x^(k))⁻¹∇f(x^(k)).

◮ The Newton–Raphson algorithm is globally convergent at a quadratic rate whenever f is concave and has two continuous derivatives.

37/52
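A minimal sketch of this update rule (the toy objective and helper name are my own, not from the slides), assuming a strictly concave quadratic so convergence takes a single step:

```python
import numpy as np

def newton_raphson(grad, hess, x0, tol=1e-10, max_iter=100):
    """Iterate x_{k+1} = x_k - Hf(x_k)^{-1} grad f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess(x), grad(x))   # solve, don't invert
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

# Toy usage: maximize f(x) = -(x1 - 1)^2 - 2*(x2 + 3)^2, maximizer (1, -3).
grad = lambda x: np.array([-2.0 * (x[0] - 1.0), -4.0 * (x[1] + 3.0)])
hess = lambda x: np.diag([-2.0, -4.0])
print(newton_raphson(grad, hess, [0.0, 0.0]))      # -> [ 1. -3.]
```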

SLIDE 38

Fisher-scoring method (1)

◮ Newton–Raphson works in a deterministic setting, which does not have to involve random data.

◮ Sometimes, calculation of the Hessian matrix is quite complicated (we will see an example).

◮ Goal: use directly the fact that we are minimizing the KL divergence:

KL "=" −E[ log-likelihood ].

◮ Idea: replace the Hessian with its expected value. Recall that Eθ( Hℓn(θ) ) = −I(θ) is the Fisher information.

38/52

SLIDE 39

Fisher-scoring method (2)

◮ The Fisher information matrix is positive definite, and can serve as a stand-in for the Hessian in the Newton–Raphson algorithm, giving the update

θ^(k+1) = θ^(k) + I(θ^(k))⁻¹∇ℓn(θ^(k)).

This is the Fisher-scoring algorithm.

◮ It has essentially the same convergence properties as Newton–Raphson, but it is often easier to compute I than Hℓn.

39/52

SLIDE 40

Example: Logistic Regression (1)

◮ Suppose Yi ∼ Bernoulli(pi), i = 1, . . . , n, are independent 0/1 indicator responses, and Xi is a p × 1 vector of predictors for individual i.

◮ The log-likelihood is as follows:

ℓn(θ | Y, X) = ∑_{i=1}^n ( Yiθi − log(1 + e^{θi}) ).

◮ Under the canonical link,

θi = log( pi/(1 − pi) ) = Xi⊤β.

40/52

SLIDE 41

Example: Logistic Regression (2)

◮ Thus, we have

ℓn(β | Y, X) = ∑_{i=1}^n ( YiXi⊤β − log(1 + e^{Xi⊤β}) ).

◮ The gradient is

∇ℓn(β) = ∑_{i=1}^n ( Yi − e^{Xi⊤β}/(1 + e^{Xi⊤β}) ) Xi.

◮ The Hessian is

Hℓn(β) = −∑_{i=1}^n ( e^{Xi⊤β}/(1 + e^{Xi⊤β})² ) XiXi⊤.

◮ As a result, the updating rule is

β^(k+1) = β^(k) − Hℓn(β^(k))⁻¹∇ℓn(β^(k)).

41/52
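These formulas translate directly into code. Below is a sketch (synthetic data; the helper name fit_logistic is mine) of the gradient, the Hessian, and the Newton–Raphson updating rule above; note that e^{Xi⊤β}/(1 + e^{Xi⊤β})² = pi(1 − pi).

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))            # p_i = e^{x_i'b}/(1+e^{x_i'b})
        grad = X.T @ (y - p)                           # sum_i (y_i - p_i) x_i
        hess = -(X * (p * (1.0 - p))[:, None]).T @ X   # -sum_i p_i(1-p_i) x_i x_i'
        beta = beta - np.linalg.solve(hess, grad)      # Newton-Raphson update
    return beta

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 2.0])))))
print(fit_logistic(X, y))   # should be close to [-0.5, 2.0]
```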

SLIDE 42

Example: Logistic Regression (3)

◮ The score function is a linear combination of the Xi, and the Hessian or information matrix is a linear combination of the XiXi⊤. This is typical in exponential family regression models (i.e., GLMs).

◮ The Hessian is negative definite, so there is a unique local maximizer, which is also the global maximizer.

◮ Finally, note that Yi does not appear in Hℓn(β), which yields

Hℓn(β) = E[ Hℓn(β) ] = −I(β).

42/52

SLIDE 43

Iteratively Re-weighted Least Squares

◮ IRLS is an algorithm for fitting GLMs, obtained by Newton–Raphson/Fisher-scoring.

◮ Suppose Yi | Xi has a distribution from an exponential family with the following log-likelihood function:

ℓ = ∑_{i=1}^n (Yiθi − b(θi))/φ + c(Yi, φ).

◮ Observe that

µi = b′(θi),   Xi⊤β = g(µi),   dµi/dθi = b″(θi) ≡ Vi,

θi = (b′)⁻¹ ∘ g⁻¹(Xi⊤β) := h(Xi⊤β).

43/52

SLIDE 44

Chain rule

◮ According to the chain rule, we have

∂ℓn/∂βj = ∑_{i=1}^n (∂ℓi/∂θi)(∂θi/∂βj)

        = ∑_{i} ((Yi − µi)/φ) h′(Xi⊤β) Xi^(j)

        = ∑_{i} (Ỹi − µ̃i) Wi Xi^(j),   where Wi ≡ h′(Xi⊤β)/( g′(µi)φ ),

◮ and where Ỹ = ( g′(µ1)Y1, . . . , g′(µn)Yn )⊤ and µ̃ = ( g′(µ1)µ1, . . . , g′(µn)µn )⊤.

44/52

SLIDE 45

Gradient

◮ Define W = diag(W1, . . . , Wn).

◮ Then, the gradient is

∇ℓn(β) = X⊤W(Ỹ − µ̃).

45/52

SLIDE 46

Hessian

◮ For the Hessian, we have

∂²ℓ/∂βj∂βk = ∑_{i} ((Yi − µi)/φ) h″(Xi⊤β) Xi^(j)Xi^(k) − ∑_{i} (1/φ)(∂µi/∂βk) h′(Xi⊤β) Xi^(j).

◮ Note that

∂µi/∂βk = ∂b′(θi)/∂βk = ∂b′(h(Xi⊤β))/∂βk = b″(θi) h′(Xi⊤β) Xi^(k).

It yields

E( Hℓn(β) ) = −(1/φ) ∑_{i} b″(θi) [h′(Xi⊤β)]² XiXi⊤.

46/52

SLIDE 47

Fisher information

◮ Note that g⁻¹(·) = b′ ∘ h(·) yields

b″ ∘ h(·) · h′(·) = 1/( g′ ∘ g⁻¹(·) ).

Recalling that θi = h(Xi⊤β) and µi = g⁻¹(Xi⊤β), we obtain

b″(θi) h′(Xi⊤β) = 1/g′(µi).

◮ As a result,

E( Hℓn(β) ) = −∑_{i} ( h′(Xi⊤β)/( g′(µi)φ ) ) XiXi⊤.

◮ Therefore,

I(β) = −E( Hℓn(β) ) = X⊤WX,   where W = diag( h′(Xi⊤β)/( g′(µi)φ ) ).

47/52

SLIDE 48

Fisher-scoring updates

◮ According to Fisher-scoring, we can update an initial estimate β^(k) to β^(k+1) using

β^(k+1) = β^(k) + I(β^(k))⁻¹∇ℓn(β^(k)),

◮ which is equivalent to

β^(k+1) = β^(k) + (X⊤WX)⁻¹X⊤W(Ỹ − µ̃) = (X⊤WX)⁻¹X⊤W(Ỹ − µ̃ + Xβ^(k)).

48/52

SLIDE 49

Weighted least squares (1)

Let us open a parenthesis to talk about Weighted Least Squares.

◮ Assume the linear model Y = Xβ + ε, where ε ∼ Nn(0, W⁻¹) and W⁻¹ is an n × n diagonal matrix. When the variances are different, the regression is said to be heteroskedastic.

◮ The maximum likelihood estimator is given by the solution to

min_β (Y − Xβ)⊤W(Y − Xβ).

This is a Weighted Least Squares problem.

◮ The solution is given by β̂ = (X⊤WX)⁻¹X⊤WY.

◮ It is routinely implemented in statistical software.

49/52
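A direct sketch of the WLS solution (the synthetic heteroskedastic data are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, -2.0])
var = rng.uniform(0.1, 4.0, size=n)        # known, unequal noise variances
y = X @ beta_true + rng.normal(size=n) * np.sqrt(var)

W = np.diag(1.0 / var)                     # weight matrix W = diag(1/var_i)
beta_hat = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta_hat)                            # close to [1, -2]
```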

SLIDE 50

Weighted least squares (2)

Back to our problem. Recall that

β^(k+1) = (X⊤WX)⁻¹X⊤W(Ỹ − µ̃ + Xβ^(k)).

◮ This reminds us of Weighted Least Squares with

1. W = W(β^(k)) being the weight matrix,

2. Ỹ − µ̃ + Xβ^(k) being the response.

So we can obtain β^(k+1) using any system for WLS.

50/52

SLIDE 51

IRLS procedure (1)

Iteratively Reweighted Least Squares is an iterative procedure to compute the MLE in GLMs using weighted least squares. We show how to go from β^(k) to β^(k+1) (a code sketch follows the slide):

1. Fix β^(k) and µi^(k) = g⁻¹(Xi⊤β^(k));

2. Calculate the adjusted dependent responses

Zi^(k) = Xi⊤β^(k) + g′(µi^(k))(Yi − µi^(k));

3. Compute the weights

W^(k) = W(β^(k)) = diag( h′(Xi⊤β^(k))/( g′(µi^(k))φ ) );

4. Regress Z^(k) on the design matrix X with weights W^(k) to derive a new estimate β^(k+1).

We can repeat this procedure until convergence.

51/52
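Here is a sketch of the four steps (my illustration, specialized to the Bernoulli model with the canonical logit link, so φ = 1, h is the identity, h′ = 1, and the weights become µi(1 − µi)):

```python
import numpy as np

def irls_logistic(X, Y, n_iter=25):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))   # step 1: mu_i = g^{-1}(x_i' beta)
        g_prime = 1.0 / (mu * (1.0 - mu))      # g'(mu) for the logit link
        Z = X @ beta + g_prime * (Y - mu)      # step 2: adjusted responses
        W = mu * (1.0 - mu)                    # step 3: weights 1/(g'(mu) * phi)
        XtW = X.T * W                          # step 4: WLS regression of Z on X
        beta = np.linalg.solve(XtW @ X, XtW @ Z)
    return beta

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(400), rng.normal(size=400)])
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ np.array([0.3, 1.5])))))
print(irls_logistic(X, Y))   # converges to the MLE, near [0.3, 1.5]
```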

SLIDE 52

IRLS procedure (2)

◮ For this procedure, we only need to know X, Y, the link function g(·), and the variance function V(µ) = b″(θ).

◮ A possible starting value is µ^(0) = Y.

◮ If the canonical link is used, then Fisher scoring is the same as Newton–Raphson: E(Hℓn) = Hℓn, since there is no random component (Y) in the Hessian matrix.

52/52

SLIDE 53

MIT OpenCourseWare http://ocw.mit.edu

18.650 / 18.6501 Statistics for Applications
Fall 2016

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.