SLIDE 1
Logistic regression
Rasmus Waagepetersen
Department of Mathematics, Aalborg University, Denmark
October 6, 2020
SLIDE 2
Binary and count data
Linear mixed models are very flexible and useful models for continuous response variables that can be well approximated by a normal distribution.
SLIDE 3
Example: o-ring failure data
The number of damaged O-rings (out of 6) and the temperature were recorded for 23 missions prior to the Challenger space shuttle disaster. Proportions of damaged O-rings versus temperature and the least squares fit:
[Figure: proportion of damaged O-rings versus temperature with the least squares line]
Problems with the least squares fit:
- predicts proportions outside [0, 1]
- assumes variance homogeneity (same precision for all observations)
- proportions not normally distributed
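A minimal R sketch of the least squares fit, assuming the 23 mission records are available as the orings data in the faraway package (columns temp and damage, damage out of 6):

library(faraway)          # assumed source of the orings data (temp, damage)
data(orings)
lsfit <- lm(I(damage/6) ~ temp, data = orings)   # least squares fit to the proportions
## nothing constrains the fitted line to [0, 1]:
predict(lsfit, newdata = data.frame(temp = c(31, 90)))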
SLIDE 4
Modeling of o-ring data
Number of damaged o-rings is a count variable but restricted to be between 0 and 6 for each mission. Hence Poisson distribution not applicable (a Poisson distributed variable can take any value 0, 1, 2, . . .).
[Figure: proportion of damaged O-rings versus temperature, as on the previous slide]
To the $j$th ring on the $i$th mission we may associate a binary variable $I_{ij}$ which is one if the ring is defective and zero otherwise. We assume the $I_{ij}$ are independent with $p_i = P(I_{ij} = 1)$ depending on temperature. Then the count of defective rings, $Y_i = I_{i1} + I_{i2} + \cdots + I_{i6}$, follows a binomial $b(6, p_i)$ distribution.
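A quick simulation sketch confirming that a sum of six independent Bernoulli indicators follows the b(6, p) distribution (p = 0.3 is an arbitrary illustrative value):

## simulate counts as sums of 6 independent Bernoulli indicators
set.seed(1)
p <- 0.3
counts <- replicate(10000, sum(rbinom(6, size = 1, prob = p)))
rbind(simulated = table(factor(counts, levels = 0:6))/10000,
      binomial  = dbinom(0:6, size = 6, prob = p))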
SLIDE 5
Binomial model for o-ring data
$Y_i$ is the number of failures and $t_i$ the temperature for the $i$th mission, $Y_i \sim b(6, p_i)$, where $p_i$ is the probability of failure for the $i$th mission.
Model for variance heterogeneity: $\mathrm{Var}\, Y_i = n_i p_i (1 - p_i)$.
How do we model the dependence of $p_i$ on $t_i$?
Linear model: $p_i = \alpha + \beta t_i$. Problem: $p_i$ is not restricted to [0, 1]!
SLIDE 6
Logistic regression
Consider the logit transformation:
$$\eta = \mathrm{logit}(p) = \log\left(\frac{p}{1-p}\right)$$
where $p/(1-p)$ is the odds of an event happening with probability $p$. Note: logit is an injective function from $]0,1[$ to $\mathbb{R}$. Hence we may apply a linear model to $\eta$ and transform back:
$$\eta = \alpha + \beta t \quad\Leftrightarrow\quad p = \frac{\exp(\alpha + \beta t)}{\exp(\alpha + \beta t) + 1}$$
Note: $p$ is now guaranteed to be in $]0,1[$.
SLIDE 7
Plots of logit and inverse logit functions
[Figure: left panel, logit(p) as a function of p; right panel, the inverse logit invlogit(eta) as a function of eta]
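The two panels can be reproduced with the short sketch below; in base R the logit and inverse logit are available as qlogis and plogis (the quantile and distribution functions of the standard logistic distribution):

## logit and inverse logit: qlogis() and plogis()
par(mfrow = c(1, 2))
curve(qlogis(x), from = 0.001, to = 0.999, xlab = "p", ylab = "logit(p)")
curve(plogis(x), from = -10, to = 10, xlab = "eta", ylab = "invlogit(eta)")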
SLIDE 8
Logistic regression and odds
The odds of a failure in the $i$th mission is
$$o_i = \frac{p_i}{1 - p_i} = \exp(\eta_i) = \exp(\alpha + \beta t_i)$$
and the odds ratio is
$$\frac{o_i}{o_j} = \exp(\eta_i - \eta_j) = \exp(\beta(t_i - t_j))$$
Example: to double the odds we need $2 = \exp(\beta(t_i - t_j)) \Leftrightarrow t_i - t_j = \log(2)/\beta$.
Example: $\exp(\beta)$ is the odds ratio corresponding to a unit increase in $t$.
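As a numerical illustration, a small sketch using the coefficient estimates from the glm fit reported on the next slide (the values in the comments are approximate):

## odds calculations with the estimates from the glm fit below
alpha <- 11.66299
beta  <- -0.21623
exp(beta)        # odds ratio for a one-degree increase in temperature (about 0.81)
log(2)/beta      # temperature change that doubles the odds (about -3.2 degrees)
exp(alpha + beta*31)/(1 + exp(alpha + beta*31))  # estimated failure probability at 31 F (about 0.99)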
SLIDE 9
Logistic regression in R
> out=glm(cbind(damage,6-damage)~temp,family=binomial(logit))
> summary(out)
...
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  11.66299    3.29626   3.538 0.000403 ***
temp         -0.21623    0.05318  -4.066 4.78e-05 ***
...
    Null deviance: 38.898  on 22  degrees of freedom
Residual deviance: 16.912  on 21  degrees of freedom
...
Residual deviance: see a later slide. Note the response is a matrix whose first column contains the numbers of damaged rings and whose second column the numbers of undamaged rings. If we had the separate binary variables $I_{ij}$ in a vector y, say, this could be used as the response instead: y~temp.
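A sketch of the equivalent fit with one binary response per ring, assuming the same damage and temp vectors used in the glm call above (the expansion to one row per ring is just bookkeeping for illustration):

## expand each mission into 6 binary ring indicators and refit
y       <- as.vector(sapply(damage, function(d) rep(c(1, 0), c(d, 6 - d))))
tempbin <- rep(temp, each = 6)
out.bin <- glm(y ~ tempbin, family = binomial(logit))
coef(out.bin)    # same coefficient estimates as the aggregated fit; deviances differ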
SLIDE 10
Model assessment for logistic regression
Pearson's statistic ($N$ the number of binomial observations):
$$X^2 = \sum_{i=1}^{N} \frac{(y_i - \hat\mu_i)^2}{V(\hat\mu_i, n_i)}$$
where $V(\mu, n)$ is the variance of an observation with mean $\mu$ and number of trials $n$ (here $\mu = np$ and $V(\mu, n) = np(1 - p)$).
NB: Pearson's statistic is approximately $\chi^2(N - q)$, where $q$ is the number of parameters, provided the $\mu_i$'s are not too small (larger than 5, say).
Pearson's statistic is very close to the residual deviance reported in the glm output.
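A short sketch computing $X^2$ for the O-ring fit and comparing it with the residual deviance and the $\chi^2$ reference (assuming out is the glm fit from the previous slide; note the fitted counts here are small, so the $\chi^2$ approximation is dubious):

## Pearson's X^2 for the fitted model `out`
X2 <- sum(residuals(out, type = "pearson")^2)
X2                     # close to the residual deviance 16.912
deviance(out)          # residual deviance
df.residual(out)       # N - q = 21
1 - pchisq(X2, df.residual(out))   # approximate chi-square goodness-of-fit p-value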
SLIDE 11
Residuals for o-rings
Pearson residuals:
$$r^P_i = \frac{y_i - \hat\mu_i}{\sqrt{V(\hat\mu_i, n_i)}}$$
devres=residuals(out)
plot(devres~temp,xlab="temperature",ylab="residuals",ylim=c(-1.25,4))
pearson=residuals(out,type="pearson")
points(pearson~temp,pch=2)
[Figure: deviance and Pearson residuals plotted against temperature]
Much spurious structure due to discreteness of data.
SLIDE 12
Generalized linear models
Logistic regression is a special case of a wide class of models called generalized linear models, which can all be analyzed using the glm procedure. We need to specify the distribution family and the link function. In practice, binomial/logistic and Poisson/log regression are the most commonly used examples of generalized linear models. SPSS: Analyze → Generalized linear models → etc.
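Other generalized linear models are fitted the same way in R; only the family and link change. A small simulated Poisson/log sketch (the variables x and counts are hypothetical illustration data, not from the slides):

## Poisson regression with log link, analogous to binomial(logit) above
set.seed(1)
x      <- runif(50)
counts <- rpois(50, lambda = exp(0.5 + 1.2*x))   # simulated Poisson responses
pois.fit <- glm(counts ~ x, family = poisson(link = "log"))
summary(pois.fit)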
SLIDE 13
Overdispersion
Suppose Pearson's $X^2$ is large relative to its degrees of freedom $N - q$. This may be due either to a systematic deficiency of the model (misspecified mean structure) or to overdispersion, i.e. the variance of the observations is larger than the model predicts.
Overdispersion may, for example, be due to unobserved explanatory variables: genetic variation between subjects, variation between batches in laboratory experiments, or variation in the environment in agricultural trials. There are various ways to handle overdispersion - we will focus on a model-based approach: generalized linear mixed models.
SLIDE 14
Exercises
1. Suppose the probability that the race horse Flash wins is 10%. What are the odds that Flash wins?
2. Suppose that the logit of the probability p is 0, logit(p) = 0. What is then the value of p?
3. Consider a logistic regression model with P(X = 1) = p and logit(p) = 3 + 2z. What are the odds for the event X = 1 when z = 0.5? What is the increase in odds if z is increased by one?
4. Show that the mean and variance of a binomial variable Y ∼ b(n, p) are np and np(1 − p), respectively. Hint: use that Y = I_1 + I_2 + ... + I_n where the I_i are independent binary random variables with P(I_i = 1) = p.
SLIDE 15
5. Consider the wheezing data (available as the data set ohio in the faraway package or as ohio.sav at the course web page). The variables in the data set are resp (an indicator of wheeze status, 1=yes, 0=no), id (a numeric vector for subject id), age (a numeric vector of age, 0 is 9 years old), and smoke (an indicator of maternal smoking at the first year of the study). Fit a logistic regression model for the binary resp variable with age and smoke as factors. Check the significance of age and smoke. Compare with a model with age as a covariate (i.e. as a numerical variable).
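A possible starting point for exercise 5 (a sketch only; the model checking and interpretation are left to the exercise):

## fit wheeze status with age and smoke as factors, then age as a covariate
library(faraway)
data(ohio)
fit.factor <- glm(resp ~ factor(age) + factor(smoke), family = binomial, data = ohio)
summary(fit.factor)                              # significance of age and smoke
fit.covar <- glm(resp ~ age + factor(smoke), family = binomial, data = ohio)
anova(fit.covar, fit.factor, test = "Chisq")     # age as covariate vs. age as factor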