STAT 213 Logistic Regression II
Colin Reimer Dawson, Oberlin College (PowerPoint presentation)


SLIDE 1

STAT 213 Logistic Regression II

Colin Reimer Dawson

Oberlin College

28 April 2016

SLIDE 2

Outline

  • Logistic Regression
  • Fitting the Model
  • Assessment and Testing

SLIDE 3

Reading Quiz (Multiple Choice)

Two logistic models have the same β0 but different β1. For each of the following, state whether the statement must be true, might be true, or cannot be true.

  • b. The graphs of log(odds) versus X cross the Y-axis at the same value of Y.
  • d. The graphs of P(Y = 1) versus X cross the line P(Y = 1) = 0.5 at the same value of X.
  • e. The graphs of P(Y = 1) versus X cross the line X = 0.5 at the same value of Y.

SLIDE 4

For Tuesday

  • Write up: 9.14, 9.22, 9.26
  • Read: 10.1-10.2
  • Answer: 9.12, 10.2, 10.4
  • Soon: Project 3 (due on the last day of classes)
SLIDE 5

Quantitative vs. Categorical Predictor and Response

                      Response
Predictor        Quantitative    Categorical
Quantitative     Linear Reg.     Logistic Reg.
Categorical      ANOVA

SLIDE 6

Binary Logistic Regression

The response variable (Y) is categorical with two categories (i.e., binary).

  • Code Y as an indicator variable: 0 or 1
  • Assume (for now) a single quantitative predictor, X
SLIDE 7

Two Equivalent Forms of Logistic Regression

Probability Form:

  π = e^(β0 + β1X) / (1 + e^(β0 + β1X))

Logit Form:

  log(π / (1 − π)) = β0 + β1X

  • π : the probability that Y = 1
  • π / (1 − π) : the odds that Y = 1
  • log(π / (1 − π)) : the log odds, or logit, that Y = 1
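The two forms are algebraically equivalent: applying the logistic function to β0 + β1X recovers π, and applying the logit to π recovers β0 + β1X. A quick numerical sketch (Python rather than the deck's R; the coefficients here are hypothetical, chosen to resemble the putting example later in the deck):

```python
import math

def logistic(eta):
    """Probability form: pi = e^eta / (1 + e^eta)."""
    return math.exp(eta) / (1.0 + math.exp(eta))

def logit(p):
    """Logit form: log odds = log(pi / (1 - pi))."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients for illustration only
b0, b1 = 3.26, -0.57

for x in [3, 5, 7]:
    eta = b0 + b1 * x      # linear predictor = log odds
    pi = logistic(eta)     # probability that Y = 1
    odds = pi / (1 - pi)   # odds that Y = 1
    # Round-tripping through the logit recovers the linear predictor
    assert abs(logit(pi) - eta) < 1e-9
    print(x, round(pi, 3), round(odds, 3))
```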
SLIDE 8

Example: Golf Putts

Distance (ft)     3     4     5     6     7
# Made           84    88    61    61    44
# Missed         17    31    47    64    90
Total           101   119   108   125   134

  • 1. Estimate the probability of success at each length.
  • 2. Estimate the odds of success at each length.
  • 3. Estimate the log odds of success at each length.
  • 4. Plot each of these against distance.
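The first three steps are simple arithmetic on the table; a sketch (Python here rather than the deck's R, just to show the computation):

```python
import math

dist  = [3, 4, 5, 6, 7]
made  = [84, 88, 61, 61, 44]
total = [101, 119, 108, 125, 134]

for d, m, n in zip(dist, made, total):
    p = m / n                   # 1. estimated probability of success
    odds = p / (1 - p)          # 2. estimated odds (equals made/missed)
    log_odds = math.log(odds)   # 3. estimated log odds (logit)
    print(d, round(p, 3), round(odds, 2), round(log_odds, 2))
```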
SLIDE 9

Odds Ratios

Logit and Odds

  log(π / (1 − π)) = β0 + β1X        π / (1 − π) = e^(β0 + β1X)

  • In the model, for each 1-unit increase in X, the logit increases by β1.
  • Equivalently: for each 1-unit increase in X, the odds are multiplied by e^β1.
  • In other words, e^β1 is the odds ratio resulting from a one-unit change in X, and β1 is the log odds ratio.
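To see numerically that adding β1 to the logit is the same as multiplying the odds by e^β1 (a sketch with hypothetical coefficients):

```python
import math

b0, b1 = 3.26, -0.57   # hypothetical coefficients for illustration

def odds(x):
    """Model odds at X = x: odds = e^(b0 + b1*x)."""
    return math.exp(b0 + b1 * x)

# The ratio odds(x+1)/odds(x) equals e^b1 for every x
for x in [0.0, 2.5, 6.0]:
    assert abs(odds(x + 1) / odds(x) - math.exp(b1)) < 1e-12
print(math.exp(b1))   # the odds ratio per 1-unit increase in X
```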

SLIDE 10

Odds Ratios

The odds ratio associated with a binary response Y at two different predictor values, X = x2 vs. X = x1, is the ratio of the odds; that is:

  Odds Ratio(x2 vs. x1) = [π(x2) / (1 − π(x2))] / [π(x1) / (1 − π(x1))]

We can estimate this from a sample using:

  Estimated Odds Ratio(x2 vs. x1) = [π̂(x2) / (1 − π̂(x2))] / [π̂(x1) / (1 − π̂(x1))]
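As a sketch of the sample version of this formula (Python; the example proportions are the 4 ft and 3 ft success rates from the golf-putt table):

```python
def odds_ratio(p2, p1):
    """Sample odds ratio comparing X = x2 to X = x1, given the
    estimated probabilities p2 = pi-hat(x2) and p1 = pi-hat(x1)."""
    return (p2 / (1 - p2)) / (p1 / (1 - p1))

# e.g. success at 4 ft (88/119) vs. success at 3 ft (84/101)
print(odds_ratio(88 / 119, 84 / 101))   # about 0.575
```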

SLIDE 11

Example: Golf Putts

Distance (ft)     3      4      5      6      7
# Made           84     88     61     61     44
# Missed         17     31     47     64     90
Total           101    119    108    125    134
π̂             0.832  0.739  0.565  0.488  0.328
Odds            4.94   2.84   1.30   0.95   0.49
Log Odds        1.60   1.04   0.26  -0.05  -0.71

  • 5. Find the sample odds ratio for success for 4 ft vs. 3 ft; 5 vs. 4; 6 vs. 5; 7 vs. 6.
  • 6. Take the log of each of these to get the (additive) change in the logit. These should be the slopes of the lines "connecting the dots" (since ΔX = 1).

SLIDE 12

Example: Golf Putts

Distance (ft)     3      4      5      6      7
# Made           84     88     61     61     44
# Missed         17     31     47     64     90
Odds            4.94   2.84   1.30   0.95   0.49
Log Odds        1.60   1.04   0.26  -0.05  -0.71
OR                    0.575  0.457  0.734  0.513
Δ Log Odds            -0.56  -0.78  -0.31  -0.66

  • In the data, the successive ORs (changes in log odds) are different.
  • The model fits a constant ratio (a constant slope for the log odds).
  • 7. Draw a single line through your logit plot and read off an estimated slope and intercept. These are your β̂0 and β̂1.
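Steps 5-6 can be checked directly from the counts (a Python recomputation; note the slide's Δ log odds entries come from the rounded log-odds row, so they differ slightly in the last digit):

```python
import math

made   = [84, 88, 61, 61, 44]
missed = [17, 31, 47, 64, 90]

odds = [m / s for m, s in zip(made, missed)]

# Successive sample odds ratios: 4 vs 3 ft, 5 vs 4, 6 vs 5, 7 vs 6
ors = [odds[i + 1] / odds[i] for i in range(4)]
# ...and the corresponding (additive) changes in log odds
dlogit = [math.log(r) for r in ors]

print([round(r, 3) for r in ors])      # roughly 0.575, 0.457, 0.734, 0.513
print([round(d, 2) for d in dlogit])   # roughly -0.55, -0.78, -0.31, -0.67
```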

SLIDE 13

Example: Golf Putts

library("mosaic") Putts <- data.frame(Distance = 3:7, Made = c(84,88,61,61,44), Total = c(101,119,108,125,134)) Putts <- mutate(Putts, PropMade = Made / Total) (model <- glm(PropMade ~ Distance, weights = Total, data = Putts, family = "binomial")) Call: glm(formula = PropMade ~ Distance, family = "binomial", data = Putts, weights = Total) Coefficients: (Intercept) Distance 3.2568

  • 0.5661

Degrees of Freedom: 4 Total (i.e. Null); 3 Residual Null Deviance: 81.39 Residual Deviance: 1.069 AIC: 30.18
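Plugging the fitted coefficients back into the probability form reproduces probabilities close to the observed proportions (a Python cross-check; the coefficients are the ones R printed above):

```python
import math

b0, b1 = 3.2568, -0.5661   # fitted coefficients from the R output above

def p_hat(d):
    """Fitted P(make) at distance d, via the probability form."""
    eta = b0 + b1 * d
    return math.exp(eta) / (1 + math.exp(eta))

made  = [84, 88, 61, 61, 44]
total = [101, 119, 108, 125, 134]

for d in range(3, 8):
    obs = made[d - 3] / total[d - 3]
    print(d, round(p_hat(d), 3), round(obs, 3))
```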

SLIDE 14

Example: Golf Putts (Probabilities)

xyplot(PropMade ~ Distance, data = Putts)
f.hat <- makeFun(model)
plotFun(f.hat(Distance) ~ Distance, add = TRUE)

[Plot: observed PropMade versus Distance (3-7 ft), with the fitted logistic probability curve overlaid]

SLIDE 15

Example: Golf Putts (Odds)

f.hat <- makeFun(model, transform = function(p){ p / (1 - p) })
xyplot(PropMade / (1 - PropMade) ~ Distance, data = Putts)
plotFun(f.hat(Distance) ~ Distance, add = TRUE)

[Plot: observed odds versus Distance, with the fitted odds curve overlaid]

exp(-0.5661)   ## odds ratio for a one-foot increase in Distance
[1] 0.5677353

SLIDE 16

Example: Golf Putts (Log Odds)

f.hat <- makeFun(model, transform = logit)
xyplot(logit(PropMade) ~ Distance, data = Putts)
plotFun(f.hat(Distance) ~ Distance, add = TRUE)

[Plot: observed log odds versus Distance, with the fitted (straight) logit line overlaid]

-0.5661   ## log (odds ratio) / rate of change in log odds / slope of logit
SLIDE 17

Reconstructing Odds Ratio

  • The logistic regression output from R gives us β̂0 and β̂1. But unlike in linear regression, these are not very interpretable on their own.
  • We have seen that β1 corresponds to the "rate of change in log odds". It is better to convert this to an "odds ratio" per unit change in X.
  • What do we do to β1 to get this?
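Answer: exponentiate. For the putting model the fitted slope was β̂1 = -0.5661, so the estimated odds ratio per extra foot is:

```python
import math

b1_hat = -0.5661             # fitted slope (a log odds ratio) from the putting model
odds_ratio = math.exp(b1_hat)
print(odds_ratio)            # about 0.568: each extra foot cuts the odds to ~57%
```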
SLIDE 18

Choosing β̂0 and β̂1

Recall that in linear regression, we choose β̂0 and β̂1 to minimize

  RSS = Σ_i (Y_i − f(X_i))² = Σ_i (Y_i − β̂0 − β̂1X_i)²

For a logistic model, we choose β̂0 and β̂1 to maximize the probability of the data under the model:

  Pr(Data | Model) = ∏_{i=1}^{n} π̂_i^{Y_i} (1 − π̂_i)^{1−Y_i}
                   = ∏_{i=1}^{n} [ e^(β̂0 + β̂1X_i) / (1 + e^(β̂0 + β̂1X_i)) ]^{Y_i} · [ 1 / (1 + e^(β̂0 + β̂1X_i)) ]^{1−Y_i}
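For grouped data like the putts, this likelihood is easy to evaluate in log form. A sketch (Python, dropping the constant binomial coefficients) showing that the coefficients R reports do achieve a higher likelihood than other candidates:

```python
import math

dist  = [3, 4, 5, 6, 7]
made  = [84, 88, 61, 61, 44]
total = [101, 119, 108, 125, 134]

def log_lik(b0, b1):
    """Binomial log likelihood of the putt data (constants dropped)."""
    ll = 0.0
    for d, m, n in zip(dist, made, total):
        eta = b0 + b1 * d
        # each success contributes eta; every trial contributes -log(1 + e^eta)
        ll += m * eta - n * math.log(1 + math.exp(eta))
    return ll

# The fitted coefficients should beat other candidate pairs
assert log_lik(3.2568, -0.5661) > log_lik(0.0, 0.0)
assert log_lik(3.2568, -0.5661) > log_lik(3.2568, 0.0)
print(log_lik(3.2568, -0.5661))
```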

SLIDE 19

Maximum Likelihood

  • Pr(Data | Model) is called the likelihood of the model.
  • In fact, when we assume homoskedastic Normal residuals, the RSS is (up to constants) the negative log likelihood.
  • So we've secretly been doing maximum likelihood this whole time.
  • But whereas MLE for the Normal linear model was a calculus problem, MLE for logistic regression requires an iterative algorithm.
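The standard iterative algorithm is Newton's method (Fisher scoring, a.k.a. iteratively reweighted least squares). A bare-bones illustration on the grouped putt data (this is a sketch of the idea, not literally what R's glm runs internally); it reaches the coefficients R reported in a handful of iterations:

```python
import math

dist  = [3, 4, 5, 6, 7]
made  = [84, 88, 61, 61, 44]
total = [101, 119, 108, 125, 134]

b0, b1 = 0.0, 0.0
for _ in range(25):                    # Newton / Fisher scoring iterations
    U0 = U1 = I00 = I01 = I11 = 0.0
    for d, m, n in zip(dist, made, total):
        p = 1 / (1 + math.exp(-(b0 + b1 * d)))
        w = n * p * (1 - p)            # working weights
        U0 += m - n * p                # score for the intercept
        U1 += d * (m - n * p)          # score for the slope
        I00 += w; I01 += d * w; I11 += d * d * w   # Fisher information
    det = I00 * I11 - I01 * I01
    b0 += (I11 * U0 - I01 * U1) / det  # solve the 2x2 system I * step = U
    b1 += (I00 * U1 - I01 * U0) / det

print(round(b0, 4), round(b1, 4))      # about 3.2568 and -0.5661
```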

SLIDE 20

Conditions for Logistic Regression

  • 1. Linearity (the log odds depend linearly on X)
  • 2. Independence (no clustering or time/space dependence)
  • 3. Random (the data come from a random sample or random assignment)
  • 4. Normality no longer applies! (The response is binary, so it can't be Normal.)
  • 5. Homoskedasticity is no longer required! (In fact, there is more variance when π̂ is near 0.5.)

SLIDE 21

Checking Linearity

  • We can't just transform the response via the logit to check linearity...
  • ...unless the data are binned: then we can take the logit of the proportion in each bin.

SLIDE 22

Binned Data

xyplot(logit(PropMade) ~ Distance, data = Putts, type = c("p", "r"))

[Plot: logit(PropMade) versus Distance, with a straight line through the five binned points]

  • The logits are fairly linear.
SLIDE 23

Equivalent Model Code for Binned Data

Putts <- mutate(Putts, Missed = Total - Made)
(m2 <- glm(cbind(Made, Missed) ~ Distance,
           data = Putts, family = "binomial"))

Call:  glm(formula = cbind(Made, Missed) ~ Distance, family = "binomial",
    data = Putts)

Coefficients:
(Intercept)     Distance
     3.2568      -0.5661

Degrees of Freedom: 4 Total (i.e. Null);  3 Residual
Null Deviance:      81.39
Residual Deviance:  1.069        AIC: 30.18

SLIDE 24

Hypothesis Test for β1

In linear regression, we computed

  t_obs = (β̂1 − 0) / ŝe(β̂1)

and found P-value = Pr(|T_{n−2}| ≥ |t_obs|).

In logistic regression, we can use a Normal approximation:

  z_obs = (β̂1 − 0) / ŝe(β̂1)

and get P-value = Pr(|Z| ≥ |z_obs|).
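Using the estimate and standard error that R reports for the BA coefficient on the next slide (0.3705518 and 0.1142507), the z statistic and the two-sided Normal P-value can be reproduced by hand (a Python cross-check):

```python
import math

est, se = 0.3705518, 0.1142507   # BA coefficient and SE from summary(model)
z = est / se                      # the z statistic

# Two-sided Normal tail probability: P(|Z| >= z) = erfc(z / sqrt(2))
p_value = math.erfc(z / math.sqrt(2))
print(round(z, 3), p_value)       # z about 3.243, p about 0.00118
```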

SLIDE 25

In R

data("Election08") model <- glm(ObamaWin ~ BA, data = Election08, family = "binomial") summary(model) Call: glm(formula = ObamaWin ~ BA, family = "binomial", data = Election08) Deviance Residuals: Min 1Q Median 3Q Max

  • 1.7130
  • 0.8697

0.2436 0.7314 1.7952 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept)

  • 9.4667

2.9656

  • 3.192

0.00141 ** BA 0.3706 0.1143 3.243 0.00118 **

  • Signif. codes:

0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 69.737

  • n 50

degrees of freedom Residual deviance: 49.689

  • n 49

degrees of freedom

SLIDE 26

Confidence Interval for β1

The same principle applies for a confidence interval:

  CI(Δ logit):  β̂1 ± z* · ŝe(β̂1)

            Estimate    Std. Error  z value    Pr(>|z|)
(Intercept) -9.4667396  2.9655946   -3.192189  0.001411987
BA           0.3705518  0.1142507    3.243321  0.001181449

But β1 is the rate of change of the logit, which is hard to interpret. It is more common to report a CI for the odds ratio:

  CI(OR):  ( e^(β1 lower), e^(β1 upper) )
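A hand computation of the Wald version of this interval (Python; note that R's confint, shown on the next slide, profiles the likelihood instead, so its endpoints differ somewhat from this Normal-approximation version):

```python
import math

est, se = 0.3705518, 0.1142507    # BA coefficient and SE from summary(model)
z_star = 1.959964                  # z* for a 95% interval

lo = est - z_star * se             # Wald CI for beta1 (the change in logit)
hi = est + z_star * se
print(round(lo, 4), round(hi, 4))  # about 0.1466 and 0.5945

# CI for the odds ratio: exponentiate both endpoints
print(round(math.exp(lo), 3), round(math.exp(hi), 3))
```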

SLIDE 27

Or, in R...

confint(model)
                  2.5 %      97.5 %
(Intercept) -16.3335145  -4.4506636
BA            0.1789443   0.6366837

confint(model) %>% exp()
                   2.5 %      97.5 %
(Intercept) 8.062037e-08  0.01167082
BA          1.195954e+00  1.89020201

SLIDE 28

Logistic Analogs of F-test, R2, etc.

  • Rather than R², we can use the residual deviance to measure lack of fit (so smaller is better).

  Deviance(Model) = −2 log(likelihood(Model))
  Residual Deviance = Deviance(Fitted Model)
  Null Deviance = Deviance(Null Model)
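For grouped data the residual deviance compares the fitted model to the saturated model (one free probability per bin), and the null deviance does the same for the intercept-only model. The 81.39 and 1.069 that R reported for the putt data can be recomputed by hand (a Python sketch; the fitted probabilities use the coefficients quoted earlier):

```python
import math

dist  = [3, 4, 5, 6, 7]
made  = [84, 88, 61, 61, 44]
total = [101, 119, 108, 125, 134]

def dev_term(y, n, p):
    """One bin's contribution: 2*[y log(y/(np)) + (n-y) log((n-y)/(n(1-p)))]."""
    return 2 * (y * math.log(y / (n * p)) +
                (n - y) * math.log((n - y) / (n * (1 - p))))

# Residual deviance: fitted probabilities from the estimated coefficients
b0, b1 = 3.2568, -0.5661
resid_dev = sum(dev_term(m, n, 1 / (1 + math.exp(-(b0 + b1 * d))))
                for d, m, n in zip(dist, made, total))

# Null deviance: one common probability, the overall success rate
p_bar = sum(made) / sum(total)
null_dev = sum(dev_term(m, n, p_bar) for m, n in zip(made, total))

print(round(resid_dev, 3), round(null_dev, 2))   # about 1.069 and 81.39
```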

SLIDE 29

Logistic Analogs of F-test, R2, etc.

Call:
glm(formula = ObamaWin ~ BA, family = "binomial", data = Election08)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.7130  -0.8697   0.2436   0.7314   1.7952

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -9.4667     2.9656  -3.192  0.00141 **
BA            0.3706     0.1143   3.243  0.00118 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 69.737  on 50  degrees of freedom
Residual deviance: 49.689  on 49  degrees of freedom

AIC: 53.689

Number of Fisher Scoring iterations: 5

SLIDE 30

“Analysis of Deviance”: Likelihood Ratio Test

Instead of an F-statistic, we can compare two models using the (log) likelihood ratio.

  • As with R², the in-sample likelihood always goes up (the deviance goes down) if we add a predictor.
  • But if it goes up by more than would be expected by chance, that is evidence that the predictor matters.
  • 2 × the log of the likelihood ratio = the difference in deviance.
  • Instead of an F-distribution, this statistic has (for large samples) a χ² distribution.
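The P-value in the analysis-of-deviance table that follows is the upper tail of a χ² distribution with 1 df, evaluated at the deviance drop of 20.048. Since a χ² variable with 1 df is a squared standard Normal, it can be cross-checked with erfc (a Python sketch):

```python
import math

dev_drop = 69.737 - 49.689           # 20.048: null minus residual deviance

# P(chisq_1 >= d) = P(|Z| >= sqrt(d)) = erfc(sqrt(d / 2))
p_value = math.erfc(math.sqrt(dev_drop / 2))
print(p_value)                        # about 7.55e-06
```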

SLIDE 31

“Analysis of Deviance”: Likelihood Ratio Test

anova(model, test = "LRT")

Analysis of Deviance Table

Model: binomial, link: logit
Response: ObamaWin
Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
NULL                    50     69.737
BA    1   20.048        49     49.689 7.553e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1