Logistic regression, Susanne Rosthøj, Section of Biostatistics

University of Copenhagen, Department of Biostatistics

Logistic regression
Susanne Rosthøj
Section of Biostatistics, Institute of Public Health, University of Copenhagen
sr@biostat.ku.dk

Outline
• Risk, odds and odds-ratio
• Simple logistic regression:
  • One binary explanatory variable
  • One categorical
  • One quantitative
• Multiple logistic regression:
  • Two binary + interaction

Example 1: gender and CHD

Is the risk of CHD different for males and females?

               CHD = 0         CHD = 1        Total
    Females    616 (85.6%)     104 (14.4%)      720
    Males      479 (74.5%)     164 (25.5%)      643
    Total     1095 (80.3%)     268 (19.7%)     1363

The hypothesis of no difference in risk between the genders is rejected (p < 0.0001, chi-square test).
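The chi-square test quoted on this slide can be reproduced from the four cell counts alone. The following is a minimal sketch in base R; the object names `tab` and `res` are illustrative and do not come from the original slides.

```r
# 2 x 2 table of counts from the slide (rows: gender, columns: CHD status)
tab <- matrix(c(616, 104,
                479, 164),
              nrow = 2, byrow = TRUE,
              dimnames = list(gender = c("Females", "Males"),
                              CHD    = c("0", "1")))

# Row percentages: the proportion with and without CHD within each gender
round(100 * prop.table(tab, margin = 1), 1)

# Pearson chi-square test of no association between gender and CHD
# (chisq.test applies Yates' continuity correction to 2 x 2 tables by default)
res <- chisq.test(tab)
res$p.value
```

Whether the slide's p-value was computed with or without the continuity correction is not stated; either way the p-value is far below 0.0001.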

Quantifying the difference

Risk of CHD for males: $p_1 \approx 164/643 = 0.26$
Risk of CHD for females: $p_2 \approx 104/720 = 0.14$
Odds of CHD for males: $p_1/(1-p_1) \approx 164/479 = 0.34$ (≈ 1:3)
Odds of CHD for females: $p_2/(1-p_2) \approx 104/616 = 0.17$ (≈ 1:6)

Quantification of the difference in risk:
• Absolute Risk Reduction (ARR): $|p_1 - p_2| \approx 0.11$
• Risk Ratio (RR): $p_1/p_2 \approx 1.77$
• Odds-ratio (OR): $\dfrac{p_1/(1-p_1)}{p_2/(1-p_2)} \approx 2.03$

When $p_1$ and $p_2$ are small (< 0.1): RR ≈ OR.

We have seen that there is a difference between males and females: $p_1 \neq p_2$, i.e. ARR > 0, RR ≠ 1, OR ≠ 1.
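The three measures can be checked directly from the cell counts of Example 1. This is a small sketch in base R; all variable names are illustrative.

```r
# Cell counts from the gender-by-CHD table
chd_m <- 164; n_m <- 643   # males: CHD cases, total
chd_f <- 104; n_f <- 720   # females: CHD cases, total

p1 <- chd_m / n_m          # risk of CHD for males
p2 <- chd_f / n_f          # risk of CHD for females

odds1 <- p1 / (1 - p1)     # odds for males   (equals 164 / 479)
odds2 <- p2 / (1 - p2)     # odds for females (equals 104 / 616)

arr        <- abs(p1 - p2)     # absolute risk difference
rr         <- p1 / p2          # risk ratio
odds_ratio <- odds1 / odds2    # odds-ratio

round(c(ARR = arr, RR = rr, OR = odds_ratio), 2)
```

Computing from unrounded risks gives an ARR of about 0.11; subtracting the rounded risks 0.26 and 0.14 gives 0.12.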

The purpose of a logistic regression analysis

Relate a binary outcome variable, e.g.

$$Y_i = \begin{cases} 1 & \text{if } i \text{ developed CHD} \\ 0 & \text{if } i \text{ did not develop CHD} \end{cases}$$

to explanatory variables for individual $i$.

In logistic regression we formulate models for the log-odds

$$\log\left(\frac{p_i}{1-p_i}\right),$$

where $p_i = P(Y_i = 1)$ is the risk for individual $i$.

Odds and log-odds

[Figure: two panels showing the odds p/(1−p) and the log-odds as functions of p, for p between 0 and 1.]
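The two curves on this slide can be redrawn with a few lines of base R. The sketch below is an illustration, not the code used for the original figure; the grid of probabilities is an arbitrary choice.

```r
# Grid of probabilities strictly inside (0, 1), where odds and log-odds are finite
p <- seq(0.01, 0.99, by = 0.01)

odds     <- p / (1 - p)
log_odds <- log(odds)

# qlogis() is base R's logit function and gives the same values
stopifnot(isTRUE(all.equal(log_odds, qlogis(p))))

# Two panels side by side: odds (left) and log-odds (right) against p
op <- par(mfrow = c(1, 2))
plot(p, odds,     type = "l", xlab = "p", ylab = "Odds p/(1-p)")
plot(p, log_odds, type = "l", xlab = "p", ylab = "log-odds")
par(op)
```

The odds are bounded below by 0, while the log-odds range over the whole real line, which is what makes a linear model for the log-odds convenient.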

The logistic regression model

Explanatory variable:

$$\text{male}_i = \begin{cases} 0 & i \text{ is female} \\ 1 & i \text{ is male} \end{cases}$$

Model:

$$\log\left(\frac{p_i}{1-p_i}\right) = a + b \cdot \text{male}_i = \begin{cases} a & i \text{ is female} \\ a + b & i \text{ is male} \end{cases}$$

Determine $a$ and $b$ by hand.

The difference in log-odds between males and females is $b$ = (?)

Calculating OR using logistic regression

$$\log\left(\frac{p_i}{1-p_i}\right) = a + b \cdot \text{male}_i = \begin{cases} a & i \text{ is female} \\ a + b & i \text{ is male} \end{cases}$$

$$b = (a + b) - a = \log(\text{odds for males}) - \log(\text{odds for females}) = \log(\text{OR for males vs. females})$$

i.e. exp(b) = OR for males vs. females =

Now determine the OR of CHD for females vs. males.
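With a single binary explanatory variable, the two parameters follow directly from the observed odds in Example 1. One possible way to do the hand calculation in R is sketched below; the variable names are illustrative.

```r
# Observed odds of CHD from the gender-by-CHD table
odds_f <- 104 / 616   # females
odds_m <- 164 / 479   # males

a <- log(odds_f)                 # log-odds for females (male = 0)
b <- log(odds_m) - log(odds_f)   # difference in log-odds, males vs. females

round(c(a = a, b = b), 4)

exp(b)    # OR of CHD, males vs. females
exp(-b)   # OR of CHD, females vs. males (the reciprocal)
```

Swapping the reference group changes the sign of the log-odds difference, so the OR for females vs. males is 1/exp(b).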

Logistic regression in R

glm = Generalized Linear Model.

    > library(foreign)   # provides read.dbf
    > d <- read.dbf('framingham.dbf')
    > glm1 <- glm( chd01 ~ factor(sex), data=d, family=binomial )
    > summary( glm1 )

    Call:
    glm(formula = chd01 ~ factor(sex), family = binomial, data = d)

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -0.7674  -0.7674  -0.5586  -0.5586   1.9672

    Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
    (Intercept)   -1.7789     0.1060 -16.780  < 2e-16 ***
    factor(sex)1   0.7070     0.1394   5.073 3.92e-07 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 1351.2  on 1362  degrees of freedom
    Residual deviance: 1324.9  on 1361  degrees of freedom
      (43 observations deleted due to missingness)
    AIC: 1328.9

    Number of Fisher Scoring iterations: 4
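The file framingham.dbf is not needed to reproduce the coefficients: the same estimates and standard errors can be obtained from the four cell counts of Example 1 by fitting the model to grouped data. This is a sketch under that assumption; the data frame `counts` and its column names are made up for the illustration.

```r
# Grouped version of the data: one row per gender, with CHD / no-CHD counts
counts <- data.frame(male   = c(0, 1),
                     chd    = c(104, 164),
                     no_chd = c(616, 479))

# For grouped binomial data the response is a two-column matrix
# of (successes, failures)
fit <- glm(cbind(chd, no_chd) ~ male, data = counts, family = binomial)

round(coef(summary(fit)), 4)
```

The deviances and AIC of a grouped fit differ from those of the individual-level fit shown on the slide, but the coefficient table agrees.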

Finding OR and CI

    > # Estimates in terms of log-odds
    > coef( glm1 )
     (Intercept) factor(sex)1
      -1.7788561    0.7070219
    >
    > # ORs (the exponentiated intercept is the odds for the reference group, females):
    > exp( coef( glm1 ) )
     (Intercept) factor(sex)1
       0.1688312    2.0279428
    >
    > # Confidence intervals:
    > exp( confint.default( glm1 ) )
                     2.5 %    97.5 %
    (Intercept)  0.1371558 0.2078218
    factor(sex)1 1.5432055 2.6649413
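`confint.default` returns Wald intervals, i.e. estimate ± 1.96 × SE on the log-odds scale, which are then exponentiated. The sketch below redoes the interval for the gender effect by hand from the printed estimate and standard error; because the SE is only printed to four decimals, the result agrees with the slide to about three decimals.

```r
# Estimate and standard error for factor(sex)1, as printed in the glm summary
est <- 0.7070219
se  <- 0.1394

z <- qnorm(0.975)                       # approx. 1.96 for a 95% interval

ci_log_odds <- est + c(-1, 1) * z * se  # Wald interval on the log-odds scale
ci_or       <- exp(ci_log_odds)         # back-transform to the OR scale

round(c(OR = exp(est), lower = ci_or[1], upper = ci_or[2]), 3)
```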

Logistic regression with a quantitative variable

The model for the log-odds is linear:

$$\log\left(\frac{p_i}{1-p_i}\right) = a + b \cdot \text{age}_i$$

Compare two individuals aged 51 and 50:

$$OR = \frac{\text{odds}_{51\text{ years}}}{\text{odds}_{50\text{ years}}}$$

$$\log(OR) = \log(\text{odds}_{51\text{ years}}) - \log(\text{odds}_{50\text{ years}}) = (a + 51 \cdot b) - (a + 50 \cdot b) = b$$

i.e. $OR = \exp(b) = \exp(0.066) = 1.068$.

Interpretation?

Exercise: Odds ratios

Determine the OR comparing two individuals with a difference in age of two years. Three years? Ten years?

Discuss how to assess whether the linear model is plausible.
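Under the linear model, an age difference of k years corresponds to a difference of k·b in log-odds, so the OR is exp(k·b) = exp(b)^k. A short sketch using the slope b = 0.066 quoted on the previous slide; the names `years` and `or_k` are illustrative.

```r
b <- 0.066                  # estimated slope per year of age, from the slides

years <- c(1, 2, 3, 10)     # age differences to compare
or_k  <- exp(b * years)     # OR for a difference of k years
names(or_k) <- paste0(years, "y")

round(or_k, 3)

# Equivalent formulation: the one-year OR raised to the power k
stopifnot(isTRUE(all.equal(unname(or_k), exp(b)^years)))
```

The ORs multiply rather than add: the ten-year OR is the one-year OR to the tenth power, not ten times the one-year OR.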

Risk of CHD according to the model

Predictions:

$$p_i = \frac{\exp(a + b \cdot \text{age}_i)}{1 + \exp(a + b \cdot \text{age}_i)}$$

[Figure: predicted risk p (from 0 to 1) plotted against age (from 0 to 150 years).]
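The prediction formula is the inverse of the logit transformation, available in base R as `plogis`. The sketch below draws a curve of the same shape as the one on the slide. The slope b = 0.066 is taken from the slides, but the intercept is not reported there, so the value of `a` below is purely hypothetical and the resulting risks should not be read as estimates.

```r
b <- 0.066    # slope per year of age, from the slides
a <- -5       # HYPOTHETICAL intercept, chosen only for illustration

age <- seq(0, 150, by = 5)
lin_pred <- a + b * age                 # linear predictor (log-odds)

# Prediction formula written out, and the equivalent built-in inverse logit
p_manual <- exp(lin_pred) / (1 + exp(lin_pred))
p_plogis <- plogis(lin_pred)
stopifnot(isTRUE(all.equal(p_manual, p_plogis)))

plot(age, p_plogis, type = "l", ylim = c(0, 1),
     xlab = "Age", ylab = "p")
```

Whatever the intercept, the predicted risk always stays strictly between 0 and 1 and increases with age when b > 0.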

The additive model

Consider the additive model:

$$\log\left(\frac{p_i}{1-p_i}\right) = a + b \times \text{male}_i + c \times \text{hypertension}_i$$

or put in tabular form:

    log-odds    no hypertension    hypertension
    females     a                  a + c
    males       a + b              a + b + c

OR of CHD, hypertension vs no hypertension:
Males: exp(c)
Females: exp(c)

Interaction

Is there an interaction between gender and hypertension?

The interaction model:

    log-odds    no hypertension    hypertension
    females     a                  a + c
    males       a + b              a + b + c + d

OR of CHD, hypertension vs no hypertension:
Males: ?
Females: ?

Exercise

On the following slides you find three model outputs. Study the outputs and fill in the blanks on this and the next slide.

We use model ___ to test whether there is an interaction between sex and hypertension.

Hypothesis: no interaction, i.e. d = 0
Estimated interaction term (d) and SE: d = ___ , SE = ___
Test statistic (Wald): W = d / SE = ___ , P = ___
Conclude: ___

Exercise cont.

We use model ___ to compute ORs of CHD, hypertension vs no hypertension, for each gender.

Males OR: ___
Females OR: ___

Model 1

    > glm1 <- glm( chd01 ~ factor(sex)*factor(hyper), data=d, family=binomial )
    > summary( glm1 )

    Call:
    glm(formula = chd01 ~ factor(sex) * factor(hyper), family = binomial, data = d)

    Coefficients:
                                Estimate Std. Error z value Pr(>|z|)
    (Intercept)                  -2.5244     0.1867 -13.524  < 2e-16 ***
    factor(sex)1                  1.2147     0.2196   5.532 3.16e-08 ***
    factor(hyper)1                1.3812     0.2300   6.005 1.92e-09 ***
    factor(sex)1:factor(hyper)1  -0.6815     0.2977  -2.289   0.0221 *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 1351.2  on 1362  degrees of freedom
    Residual deviance: 1271.7  on 1359  degrees of freedom
      (43 observations deleted due to missingness)
    AIC: 1279.7

    Number of Fisher Scoring iterations: 5

    > exp( coef( glm1 ) )
                    (Intercept)                factor(sex)1
                     0.08010336                  3.36922654
                 factor(hyper)1 factor(sex)1:factor(hyper)1
                     3.97957459                  0.50585702
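The z value and p-value printed for the interaction term are a Wald test of d = 0. One way to redo that calculation by hand from the printed estimate and standard error is sketched below; the variable names are illustrative.

```r
# Interaction term factor(sex)1:factor(hyper)1 from the Model 1 output
d_hat <- -0.6815
se_d  <-  0.2977

W <- d_hat / se_d               # Wald test statistic
p_value <- 2 * pnorm(-abs(W))   # two-sided p-value from the standard normal

round(c(W = W, P = p_value), 4)
```

The result matches the `z value` and `Pr(>|z|)` columns of the summary up to rounding of the inputs.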

Model 2

    > glm2 <- glm( chd01 ~ factor(sex) + factor(sex):factor(hyper), data=d, family=binomial )
    > summary( glm2 )

    Call:
    glm(formula = chd01 ~ factor(sex) + factor(sex):factor(hyper), family = binomial, data = d)

    Coefficients:
                                Estimate Std. Error z value Pr(>|z|)
    (Intercept)                  -2.5244     0.1867 -13.524  < 2e-16 ***
    factor(sex)1                  1.2147     0.2196   5.532 3.16e-08 ***
    factor(sex)0:factor(hyper)1   1.3812     0.2300   6.005 1.92e-09 ***
    factor(sex)1:factor(hyper)1   0.6997     0.1890   3.701 0.000214 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 1351.2  on 1362  degrees of freedom
    Residual deviance: 1271.7  on 1359  degrees of freedom
      (43 observations deleted due to missingness)
    AIC: 1279.7

    Number of Fisher Scoring iterations: 5

    > exp( coef( glm2 ) )
                    (Intercept)                factor(sex)1
                     0.08010336                  3.36922654
    factor(sex)0:factor(hyper)1 factor(sex)1:factor(hyper)1
                     3.97957459                  2.01309573
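Models 1 and 2 are two parameterizations of the same fit (identical residual deviance and AIC). In the notation of the interaction table, Model 1 reports c and d, while Model 2 reports the gender-specific hypertension effects c (females) and c + d (males) directly. The sketch below checks this using the printed Model 1 coefficients; the variable names are illustrative.

```r
# Coefficients from the Model 1 output
c_hat <-  1.3812   # factor(hyper)1: hypertension effect for females (sex = 0)
d_hat <- -0.6815   # factor(sex)1:factor(hyper)1: interaction term

# OR of CHD, hypertension vs no hypertension, for each gender
or_females <- exp(c_hat)
or_males   <- exp(c_hat + d_hat)

round(c(females = or_females, males = or_males), 3)

# Model 2 reports c + d directly as factor(sex)1:factor(hyper)1 = 0.6997
stopifnot(abs((c_hat + d_hat) - 0.6997) < 1e-4)
```

An advantage of the Model 2 parameterization is that each gender-specific log-OR comes with its own standard error, so confidence intervals for both ORs can be read off without extra work.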
