Marcel Dettling, Institute for Data Analysis and Process Design - PowerPoint PPT Presentation

Applied Statistical Regression
HS 2010 – Week 10
Marcel Dettling
Institute for Data Analysis and Process Design
Zurich University of Applied Sciences
marcel.dettling@zhaw.ch
http://stat.ethz.ch/~dettling
ETH Zürich, November 29, 2010
Marcel Dettling, Zurich University of Applied Sciences

Logistic Regression Model
• $Y_i \in \{0, 1\}$ has a Bernoulli distribution.
• The parameter of this distribution is $p_i$, the success rate.
Now please note that: $p_i = P(Y_i = 1) = E[Y_i]$
→ The most powerful notion of the logistic regression model is to see it as a model where we try to find a relation between the expected value of $Y_i$ and the predictors!
Important: $p_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}$ is no good here, because the right-hand side can take any real value, while $p_i$ must lie in $[0, 1]$.
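A minimal R sketch of why the linear predictor cannot model $p_i$ directly. The vectors `x` and `y` are a small hypothetical binary data set, not data from these slides. A least-squares fit produces "probabilities" outside $[0, 1]$, while a logistic fit with `glm()` keeps every fitted value strictly inside $(0, 1)$.

```r
# Hypothetical binary data (not from the slides)
x <- 1:10
y <- c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)

# Linear model for p_i = E[Y_i]: nothing keeps the fit inside [0, 1]
fit_lm <- lm(y ~ x)
range(fitted(fit_lm))   # lower end is negative, upper end exceeds 1

# Logistic model: logit(p_i) = beta_0 + beta_1 * x_i
fit_glm <- glm(y ~ x, family = binomial)
range(fitted(fit_glm))  # always strictly between 0 and 1
```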

Example: Survival in Premature Birth
[Figure: scatterplot of age against log10(weight)]

Inference with GLMs
There are three tests that can be done:
• Goodness-of-fit test
  - based on comparing against the saturated model
  - not suitable for non-grouped, binary data
• Comparing two nested models
  - likelihood ratio test leads to deviance differences
  - the test statistic has an asymptotic Chi-Square distribution
• Global test
  - comparing versus an empty model with only an intercept
  - this is a nested model, take the null deviance
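A sketch of the nested-model comparison in R. Since the slides' data are not available here, it uses the built-in `mtcars` data as an illustrative assumption (`am` is a 0/1 response). The deviance difference is computed by hand and then checked against `anova(..., test = "Chisq")`.

```r
# Illustration on the built-in mtcars data (not the slides' data):
# am (0 = automatic, 1 = manual) is a binary response.
fit_small <- glm(am ~ wt,      family = binomial, data = mtcars)
fit_big   <- glm(am ~ wt + hp, family = binomial, data = mtcars)

# Likelihood ratio test = difference of residual deviances
dev_diff <- deviance(fit_small) - deviance(fit_big)
df_diff  <- df.residual(fit_small) - df.residual(fit_big)
p_value  <- pchisq(dev_diff, df = df_diff, lower.tail = FALSE)

# The same test via anova()
tab <- anova(fit_small, fit_big, test = "Chisq")
print(tab)
```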

Null Deviance
Smallest model:
- The smallest model is without predictors, only with an intercept.
- Fitted values will all be equal to $\hat{\pi}_0$ (the overall proportion of successes).
- Our best fit (F) and the smallest model (0) are nested.
A global test:
$$D\left(y, \hat{p}^{(0)}\right) - D\left(y, \hat{p}^{(F)}\right) = 2\left(l^{(F)} - l^{(0)}\right)$$
This difference is asymptotically Chi-Square distributed, with degrees of freedom equal to the difference in residual degrees of freedom of the two models.
Example:
    Null deviance: 319.28  on 246  degrees of freedom
Residual deviance: 235.94  on 244  degrees of freedom
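The global test for the example can be carried out from the two printed deviances alone. A minimal R sketch, using only the numbers shown on this slide:

```r
# Deviances printed in the premature-birth example
null_dev  <- 319.28; null_df  <- 246
resid_dev <- 235.94; resid_df <- 244

# Global test statistic D(y, p^(0)) - D(y, p^(F)), asymptotically chi-square
stat    <- null_dev - resid_dev   # 83.34
df_diff <- null_df - resid_df     # 2 predictors
p_value <- pchisq(stat, df = df_diff, lower.tail = FALSE)
p_value
```

With a fitted `glm` object `fit`, the same quantities are stored as `fit$null.deviance`, `fit$deviance`, `fit$df.null` and `fit$df.residual`.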

Model Diagnostics
Diagnostics are:
• as important with logistic regression as they are with multiple linear regression models
• again based on differences between fitted & observed values
→ we now have to take into account that the variances are not equal for the different instances, since $\mathrm{Var}(Y_i) = p_i(1 - p_i)$ depends on $p_i$.
→ we have to come up with novel types of residuals: Pearson and deviance residuals

Pearson Residuals
Take the difference between observed and fitted value and divide by an estimate of the standard deviation:
$$R_i = \frac{y_i - \hat{p}_i}{\sqrt{\hat{p}_i (1 - \hat{p}_i)}}$$
→ $R_i^2$ is the contribution of the $i$-th observation to the Pearson chi-square statistic for model comparison.
→ It is important to note that Pearson residuals exceeding a value of two in absolute value warrant a closer look.
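A sketch that computes the Pearson residuals from the formula and compares them with R's built-in version. The `mtcars` model is an illustrative assumption, not the slides' example.

```r
# Illustrative binary model on the built-in mtcars data
fit <- glm(am ~ wt, family = binomial, data = mtcars)

y     <- mtcars$am
p_hat <- fitted(fit)                      # estimated success probabilities

# R_i = (y_i - p_i) / sqrt(p_i (1 - p_i))
r_pearson <- (y - p_hat) / sqrt(p_hat * (1 - p_hat))

# Pearson chi-square statistic = sum of squared Pearson residuals
X2 <- sum(r_pearson^2)

# Observations that warrant a closer look
which(abs(r_pearson) > 2)
```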

Deviance Residuals
Take the contribution of the $i$-th observation to the deviance (minus two times its log-likelihood contribution), i.e. to the chi-square statistic for model comparison:
$$d_i = -2 \cdot \left( y_i \log(\hat{p}_i) + (1 - y_i) \log(1 - \hat{p}_i) \right)$$
For obtaining a well interpretable residual, we take the square root and the sign of the difference between true and fitted value:
$$D_i = \mathrm{sign}(y_i - \hat{p}_i) \cdot \sqrt{d_i}$$
→ It is important to note that deviance residuals exceeding a value of two in absolute value warrant a closer look.
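The same check for deviance residuals, again on an illustrative `mtcars` model (an assumption, not the slides' data). The squared deviance residuals should add up to the residual deviance.

```r
# Illustrative binary model on the built-in mtcars data
fit <- glm(am ~ wt, family = binomial, data = mtcars)

y     <- mtcars$am
p_hat <- fitted(fit)

# d_i = -2 * (y_i log(p_i) + (1 - y_i) log(1 - p_i)) for binary y_i
d_i <- -2 * (y * log(p_hat) + (1 - y) * log(1 - p_hat))

# D_i = sign(y_i - p_i) * sqrt(d_i)
r_dev <- sign(y - p_hat) * sqrt(d_i)

# Sum of squared deviance residuals vs. residual deviance
sum(r_dev^2)
deviance(fit)
```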

Tukey-Anscombe Plot
Remark: sometimes studentized residuals are used!
[Figure: Tukey-Anscombe Plot 1 shows Pearson residuals against fitted probabilities; Tukey-Anscombe Plot 2 shows Pearson residuals against the linear predictor]

Tukey-Anscombe Plot
The Tukey-Anscombe plots in R are not perfect. Better use:
xx <- predict(fit, type="response")      # fitted probabilities
yy <- residuals(fit, type="pearson")     # Pearson residuals
scatter.smooth(xx, yy, family="gaussian", pch=20)
abline(h=0, lty=3)
Reasons:
- using a non-robust smoother is a must (family="gaussian" switches off the robust fitting that scatter.smooth() uses by default)
- different types of residuals can be used
- on the x-axis: probabilities or linear predictor
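A self-contained sketch that draws both variants side by side, with fitted probabilities and with the linear predictor on the x-axis. The `mtcars` model stands in for `fit` as an assumption; the plotting calls follow the recipe above.

```r
# Illustrative binary model on the built-in mtcars data
fit <- glm(am ~ wt, family = binomial, data = mtcars)

prob <- predict(fit, type = "response")   # fitted probabilities
eta  <- predict(fit, type = "link")       # linear predictor
res  <- residuals(fit, type = "pearson")

op <- par(mfrow = c(1, 2))
# family = "gaussian" requests a non-robust loess smoother
scatter.smooth(prob, res, family = "gaussian", pch = 20,
               xlab = "fitted probabilities", ylab = "Pearson residuals",
               main = "Tukey-Anscombe Plot 1")
abline(h = 0, lty = 3)
scatter.smooth(eta, res, family = "gaussian", pch = 20,
               xlab = "linear predictor", ylab = "Pearson residuals",
               main = "Tukey-Anscombe Plot 2")
abline(h = 0, lty = 3)
par(op)
```

The two x-axes carry the same information, since the linear predictor is the logit of the fitted probability; the linear predictor simply spreads out points whose probabilities are close to 0 or 1.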

More Diagnostics
[Figure: "Residuals vs Leverage" plot of standardized Pearson residuals against leverage for glm(survival ~ I(log10(weight)) + age), with the Cook's distance contour at 0.5; observations 4, 68 and 165 are labelled]
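A sketch of the quantities behind a "Residuals vs Leverage" plot. The premature-birth data are not available here, so a two-predictor model on `mtcars` is used as an assumption. The flagging thresholds (2 for residuals, 0.5 for Cook's distance) mirror the rule of thumb and contour on the slides.

```r
# Illustrative two-predictor binary model on the built-in mtcars data
fit <- glm(am ~ wt + hp, family = binomial, data = mtcars)

h     <- hatvalues(fit)                     # leverages
r_std <- rstandard(fit, type = "pearson")   # standardized Pearson residuals
cook  <- cooks.distance(fit)                # Cook's distances

# Standardized Pearson residual = Pearson residual / sqrt(1 - h_i)
r_check <- residuals(fit, type = "pearson") / sqrt(1 - h)

# Observations worth inspecting (rule-of-thumb thresholds)
which(abs(r_std) > 2 | cook > 0.5)

# Residuals vs Leverage plot with Cook's distance contours
plot(fit, which = 5)
```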

Binomial Regression Models
Concentration (log of mg/l)   Number of insects n_i   Number of killed insects y_i
0.96                          50                      6
1.33                          48                      16
1.63                          46                      24
2.04                          49                      42
2.32                          50                      44
→ for the number of killed insects, we have $Y_i \sim \mathrm{Bin}(n_i, p_i)$
→ we are mainly interested in the proportion of insects surviving
→ these are grouped data: there is more than 1 observation for a given predictor setting
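A short R sketch of the grouped data from the table: observed proportions per concentration and their empirical logits. Plotting empirical logits against concentration is a common informal check of the logit-linearity assumption; it is an addition here, not a step prescribed by the slides.

```r
conc   <- c(0.96, 1.33, 1.63, 2.04, 2.32)   # log concentration
n      <- c(50, 48, 46, 49, 50)             # insects per group
killed <- c(6, 16, 24, 42, 44)              # killed insects

prop_killed   <- killed / n
prop_survived <- 1 - prop_killed

# Empirical logits (finite here because no proportion is 0 or 1)
emp_logit <- log(prop_killed / (1 - prop_killed))
plot(conc, emp_logit, pch = 20,
     xlab = "log concentration", ylab = "empirical logit")
```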

Model and Estimation
The goal is to find a relation:
$$p_i = P(Y_i = 1 \mid x_{i1}, \dots, x_{ip}) \sim \eta_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}$$
We will again use the logit link function such that $g(p_i) = \eta_i$:
$$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}$$
Here, $p_i$ is the expected value $E[Y_i / n_i]$, and thus, also this model here fits within the GLM framework. The log-likelihood is:
$$l(\beta) = \sum_{i=1}^{k} \left[ \log\binom{n_i}{y_i} + y_i \log(p_i) + (n_i - y_i) \log(1 - p_i) \right]$$
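To connect the log-likelihood with the fitted model, here is a sketch that codes $l(\beta)$ for the insecticide data, maximizes it numerically with `optim()` and compares the result with `glm()`. Note that `glm()` itself uses iteratively reweighted least squares; BFGS with the analytic score $X^\top(y - n p)$ is only an illustrative alternative.

```r
conc   <- c(0.96, 1.33, 1.63, 2.04, 2.32)
n      <- c(50, 48, 46, 49, 50)
killed <- c(6, 16, 24, 42, 44)
X      <- cbind(1, conc)                     # design matrix with intercept

# Binomial log-likelihood l(beta) under the logit link
loglik <- function(beta) {
  p <- plogis(drop(X %*% beta))              # p_i = exp(eta_i) / (1 + exp(eta_i))
  sum(lchoose(n, killed) + killed * log(p) + (n - killed) * log(1 - p))
}

# Score vector (gradient of l): X^T (y - n p)
score <- function(beta) {
  p <- plogis(drop(X %*% beta))
  drop(crossprod(X, killed - n * p))
}

# optim() minimizes, so pass the negative log-likelihood and gradient
opt <- optim(c(0, 0), fn = function(b) -loglik(b), gr = function(b) -score(b),
             method = "BFGS", control = list(reltol = 1e-12, maxit = 500))

fit <- glm(cbind(killed, n - killed) ~ conc, family = binomial)
rbind(optim = opt$par, glm = coef(fit))
```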

Fitting with R
We need to generate a two-column matrix where the first contains the "successes" and the second contains the "failures":
> killsurv
     killed surviv
[1,]      6     44
[2,]     16     32
[3,]     24     22
[4,]     42      7
[5,]     44      6
> fit <- glm(killsurv ~ conc, family="binomial")
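A self-contained version of this fit, building `killsurv` with `cbind()` from the table values. It also shows an equivalent specification with the proportion of killed insects as response and the group sizes as `weights`; both give the same coefficients.

```r
conc   <- c(0.96, 1.33, 1.63, 2.04, 2.32)
killed <- c(6, 16, 24, 42, 44)
surviv <- c(44, 32, 22, 7, 6)

# Two-column response: successes (killed) first, failures (surviv) second
killsurv <- cbind(killed, surviv)
fit <- glm(killsurv ~ conc, family = "binomial")
coef(fit)

# Equivalent: proportion response with the group sizes as prior weights
fit_w <- glm(killed / (killed + surviv) ~ conc, family = binomial,
             weights = killed + surviv)
coef(fit_w)
```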

Summary Output
The result for the insecticide example is:
> summary(glm(killsurv ~ conc, family = "binomial"))
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -4.8923     0.6426  -7.613 2.67e-14 ***
conc          3.1088     0.3879   8.015 1.11e-15 ***
---
    Null deviance: 96.6881  on 4  degrees of freedom
Residual deviance:  1.4542  on 3  degrees of freedom
AIC: 24.675
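A sketch that reproduces the main columns of this output by hand: Wald z statistics, their two-sided p-values and the AIC. Because these are grouped data, it also runs the residual-deviance goodness-of-fit test from the "Inference with GLMs" slide, which is not shown in the printed summary.

```r
conc     <- c(0.96, 1.33, 1.63, 2.04, 2.32)
killsurv <- cbind(killed = c(6, 16, 24, 42, 44), surviv = c(44, 32, 22, 7, 6))
fit <- glm(killsurv ~ conc, family = "binomial")

est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))          # standard errors
z   <- est / se                       # Wald z statistics
p_z <- 2 * pnorm(-abs(z))             # two-sided p-values

# AIC = -2 * log-likelihood + 2 * (number of parameters)
aic <- -2 * as.numeric(logLik(fit)) + 2 * length(est)

# Grouped data: residual deviance as goodness-of-fit test
p_gof <- pchisq(deviance(fit), df = df.residual(fit), lower.tail = FALSE)
p_gof
```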
