Logistic regression and Poisson regression Rasmus Waagepetersen - PowerPoint PPT Presentation

Logistic regression and Poisson regression Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark October 6, 2017

Outline ◮ Logistic regression ◮ Poisson regression

Binary and count data Linear mixed models very flexible and useful model for continuous response variables that can be well approximated by a normal distribution. If the response variable is binary a normal distribution is clearly inappropriate. For count response variables normal distribution may be OK approximation if counts are not too small. However this not so for small counts. Also problem with variance heterogeneity: typically larger variances for larger counts. This lecture: regression models for binary and count data.

Example: o-ring failure data Number of damaged O-rings (out of 6) and temperature was recorded for 23 missions previous to Challenger space shuttle disaster. Proportions of damaged O-rings versus temperature and least squares fit: Problems with least squares fit: ◮ predicts proportions outside 0.8 [0 , 1]. ◮ assumes variance 0.6 Fraction damaged homogeneity (same precision 0.4 for all observations). ◮ proportions not normally 0.2 distributed. 0.0 40 50 60 70 80 temperature

Modeling of o-ring data Poisson with mean 3 250 Number of damaged o-rings is a count 200 variable but restricted to be between 0 150 and 6 for each mission. Hence Poisson Frequency distribution not applicable (a Poisson 100 distributed variable can take any value 50 0 , 1 , 2 , . . . ). 0 0 2 4 6 8 10 12 To j th ring for i th mission we may associate binary variable I ij which is one if ring defect and zero otherwise. We assume the I ij independent with p i = P ( I ij = 1) depending on temperature. Then Y i = � 6 j =1 I ij follows a binomial b (6 , p i ) distribution.

Binomial model for o-ring data Y i number of failures and t i temperature for i th mission. Y i ∼ b (6 , p i ) where p i probability of failure for i th mission. Model for variance heterogeneity: V ar Y i = n i p i (1 − p i ) How do we model dependence of p i on t i ? Linear model: p i = α + β t i Problem: p i not restricted to [0 , 1] !

Logistic regression Consider logit transformation: p η = logit( p ) = log( 1 − p ) where p 1 − p is the odds of an event happening with probality p . Note: logit injective function from [0 , 1] to R . Hence we may apply linear model to η and transform back: exp( α + β t ) η = α + β t ⇔ p = exp( α + β t ) + 1 Note: p now guaranteed to be in [0 , 1]

Plots of logit and inverse logit functions 1.0 6 0.8 4 invlogit(eta) 2 0.6 logit(p) 0 0.4 −2 −4 0.2 −6 0.0 0.0 0.2 0.4 0.6 0.8 1.0 −10 −5 0 5 10 p eta

Logistic regression and odds Odds for a failure in i th mission is p i o i = = exp( η i ) 1 − p i and odds ratio is o i = exp( η i − η j ) = exp( β ( t i − t j )) o j Example: to double odds we need 2 = exp( β ( t i − t j )) ⇔ t i − t j = log(2) /β Example: exp( β ) is increase in odds ratio due to unit increase in t .

Estimation Likelihood function for simple logistic regression logit( p i ) = α + β x i : � p y i i (1 − p i ) n i − y i L ( α, β ) = i where exp( α + β x i ) p i = 1 + exp( α + β x i ) α, ˆ MLE (ˆ β ) found by iterative maximization (Newton-Raphson) More generally we may have multiple explanatory variables: logit( p i ) = β 1 x 1 i + . . . + β p x pi

Logistic regression in R > out=glm(cbind(damage,6-damage)~temp,family=binomial(logit)) > summary(out) ... Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 11.66299 3.29626 3.538 0.000403 *** temp -0.21623 0.05318 -4.066 4.78e-05 *** ... Null deviance: 38.898 on 22 degrees of freedom Residual deviance: 16.912 on 21 degrees of freedom ... Residual deviance: see later slide. Note response is a matrix with first rows numbers of damaged and second row number of undamaged rings. If we had the separate binary variables I ij in a vector y , say, this could be used as response instead: y~temp .

Hypothesis testing Wald test: Estimate Std. Error z value Pr(>|z|) (Intercept) 11.66299 3.29626 3.538 0.000403 *** temp -0.21623 0.05318 -4.066 4.78e-05 *** Temperature highly significant.

Same conclusion using likelihood ratio test: > out2=glm(cbind(damage,6-damage)~1,family=binomial(logit) > anova(out2,out,test="Chisq") Analysis of Deviance Table Model 1: cbind(damage, 6 - damage) ~ 1 Model 2: cbind(damage, 6 - damage) ~ temp Resid. Df Resid. Dev Df Deviance P(>|Chi|) 1 22 38.898 2 21 16.912 1 21.985 2.747e-06 (log likelihood ratio approximately χ 2 distributed) (alternatively you may use drop1(out,test="Chisq") )

Another example: radioactive decay Intensity of radioactive decay: λ ( t ) = A exp( at ) By theory of physics number of decays X i in time interval [ t i , t i +1 [ is a Poisson variable with mean � t i +1 λ ( t ) d t ≈ ∆ i λ ( t i ) = exp(log ∆ i + logA + at i ) t i where ∆ i = t i +1 − t i . NB: X i for disjoint intervals independent. Simulated radioactive decay x 0 , . . . , x 14 within unit intervals [ t , t + 1[, t = 0 , 1 , 2 , . . . : 5 9 5 5 2 1 4 0 0 2 0 0 0 0 1

Naive approach: log E X i ≈ log 1 + log A + at i = log A + at i , i = 0 , 1 , 2 , hence fit linear regression to ( t i , log x i ). Problems: ◮ log transformation of zero counts ? ◮ variance heterogeneity: larger counts have large variance ◮ linear model fits model for E log X i but this is different from log E X i Better approach: Poisson regression with log link.

Poisson regression Suppose X 1 , . . . , X n are Poisson distributed with associated covariates z 1 , . . . , z n . Let λ i > 0 denote expectation of X i . We might try linear model λ i = α + β z i but this may conflict with the requirement λ i > 0. Better alternative is log-linear model λ i = exp( α + β z i ) since this guarantees λ i > 0. Variance heterogeneity: for a Poisson variable, the variance is equal to the expectation: V ar X i = E X i = λ i .

Implementation in R - linear model > radiols=lm(log(x+0.001)~offset(log(deltat))+times) > summary(radiols) ... Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 2.1969 1.5489 1.418 0.17961 times -0.6152 0.1883 -3.267 0.00612 ** True log A and a are 2 . 08 and − 0 . 3.

Implementation in R - Poisson regression model > radiofit=glm(x~offset(log(deltat))+times,family=poisson(log)) > summary(radiofit) #offset to take into account lengths of ... #which may in general differ from 1 Min 1Q Median 3Q Max -1.5955 -1.0093 -0.7251 0.8709 1.5391 ... Estimate Std. Error z value Pr(>|z|) (Intercept) 2.08130 0.23835 8.732 < 2e-16 *** times -0.26287 0.05464 -4.811 1.5e-06 *** ... Residual deviance: 17.092 on 13 degrees of freedom True log A and a are 2 . 08 and − 0 . 3.

Data and fitted values plot(times,x) lines(times,fitted(radiofit)) lines(times,exp(fitted(radiols)),lty=2) legend(locator(1),lty=c(1,2),legend=c("Poisson regression","least 8 Poisson regression least squares 6 x 4 2 0 0 2 4 6 8 10 12 14 times Note problems with least squares fit: follows zeros too closely !

Model assessment for logistic and Poisson regression ◮ Pearson’s statistic n µ i ) 2 ( y i − ˆ X 2 = � V (ˆ µ i ) i =1 where V ( µ ) is variance of observation with mean µ ( µ = p or µ = λ , V ( p ) = np (1 − p ) or V ( λ ) = λ ). ◮ Plot Pearson residuals against predicted values and covariates = y i − ˆ µ i r P i � V (ˆ µ i ) NB: Pearson’s statistic approximately χ 2 ( n − p ) where p number of parameters - if µ i ’s not too small (larger than 5 say). NB: Pearson residuals not normal - can make interpretation difficult. Deviance closely related to Pearson’s statistic but more technical. Deviance residuals similar to Pearson residuals.

Residuals for o-rings devres=residuals(out) plot(devres~temp,xlab="temperature",ylab="residuals",ylim=c(-1.25,4)) pearson=residuals(out,type="pearson") points(pearson~temp,pch=2) 4 3 2 Much spurious structure due to residuals discreteness of data. 1 0 −1 55 60 65 70 75 80 temperature

Residuals for radioactive decay plot(residuals(radiofit),ylim=c(-1.6,1.8)) points(residuals(radiofit,type="pearson"),pch=2) 1.5 1.0 0.5 residuals(radiofit) Much spurious structure due to 0.0 discreteness of data. −0.5 −1.0 −1.5 2 4 6 8 10 12 14 Index

Generalized linear models Logistic and Poisson regression special cases of wide class of models called generalized linear models that can all be analyzed using the glm -procedure. We need to specify distribution family and link function. In practice Binomial/logistic and Poisson/log regression are the most commonly used examples of generalized linear models. SPSS: Analyze → Generalized linear models → etc.

Logistic regression and Poisson regression Rasmus Waagepetersen - PowerPoint PPT Presentation

Logistic regression and Poisson regression Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark October 6, 2017 Outline Logistic regression Poisson regression Binary and count data Linear mixed models very

Regression 3: Logistic Regression Marco Baroni Practical Statistics in R Outline Logistic

Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt

Todays lecture Logistic regression How can we use logistic regression for reranking? Shay

From Logistic Regression to Neural Networks CMSC 470 Marine Carpuat Logistic Regression What

LEARNING Outline Math Behind Logistic Regression Visualizing Logistic Regression Loss

Workshop 10.5a: Logistic regression Murray Logan August 23, 2016 Table of contents 1 Logistic

Workshop 10.6a: Poisson regression Murray Logan 12 Sep 2016 Section 1 Poisson regression

Poisson Distribution: Review Poisson Over Time Let B 1 Poisson( ) be the number of bikes

Poisson Regression Models for Count Data Outline Review Introduction to Poisson

Logistic Regression using OLS1D in Excel 2013 XL4D: V0H XL4D: V0H XL4D: V0H 2015 Schield

Workshop 10.5a: Logistic regression Murray Logan 05 Sep 2016 Section 1 Logistic regression

Lecture 3: Logistic Regression Feng Li Shandong University fli@sdu.edu.cn September 21, 2020

Regression Methods 1. Linear Regression and Logistic Regression: definitions, and a common

XL4B: Logistic Regression using OLS1B in Excel 2013 25 Feb 2018 V0C-2x XL4B: V0C-2x XL4B: V0C-2x

Logistic regression Shay Cohen (based on slides by Sharon Goldwater) 28 October 2019 Todays

Machine Learning Logistic Regression Hamid R. Rabiee Spring 2015

Linear Classification w T x i is the classifier score for the instance x i The score can be used

Replication and Consistency 07 The Universality of Consensus Annette Bieniusa AG Softech FB

Mixed-Integer Programming for Cycle Detection in Non-reversible Markov Processes Isabel

Transport properties - Boltzmann equation goal: calculation of conductivity Boltzmann transport

Math 211 Math 211 Lecture #19 Nullspaces and Subspaces October 10, 2001 2 Structure of the

Theory of Linear Ordinary Differential Equations Bernd Schr oder logo1 Bernd Schr oder

Lecture 3: Linear systems Habib Ammari Department of Mathematics, ETH Z urich Numerical

Higher-order iterated sums signatures Nikolas Tapia FG6 joint with J. Diehl (Greifswald) and K.

Sambuz

Useful Links

Newsletter

Mail Us