Linear and logistic regression models




  1. Linear and logistic regression models
  Søren Højsgaard
  Department of Mathematical Sciences
  Aalborg University, Denmark
  August 22, 2012
  Printed: August 22, 2012   File: models-slides.tex

  2. 2: August 22, 2012
  Contents
  1 Linear normal models 3
  2 Linear regression 4
  2.1 Fitting linear regression model with lm() 8
  2.2 Printing the object: print() 9
  2.3 Model objects 10
  2.4 Extractor methods and methods for further computing 13
  2.5 Plotting the object: plot() 16
  2.6 Summary information: summary() 17
  2.7 Confidence interval for model parameters: confint() 19
  2.8 Predicting new cases: predict() 21
  3 Regression with a factor 23
  3.1 Transforming data using transform() 29
  4 Model comparison 30
  4.1 Comparing two models with anova() 32
  4.2 Three commonly used tables for model comparisons 33
  4.3 Sequential ANOVA table: anova() 34
  4.4 On interpreting the anova() output 35
  4.5 Dropping each term in turn using drop1() 36
  4.6 On interpreting the drop1() output 37
  4.7 Investigating parameter estimates using coef() 39
  4.8 Which table to use?* 41
  5 Residuals and model checking 42
  5.1 Interpreting diagnostic plots 45
  6 Logistic regression 46

  3. 3: August 22, 2012
  1 Linear normal models
  • Linear normal models (regression models, analysis of variance models, analysis of covariance models, etc.) are fitted using the lm() function.
  • The lm() function is typically called as:
  R> lm(y ~ x1 + x2 + x3, dataset)
  • The result from calling the lm() function is an object (also called a model object) with a specific class.
  • Further analysis of the model is typically via additional R functions applied to the model object.
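  A minimal sketch of this fit-then-query workflow (the data frame and variable names here are hypothetical, not from the slides):
  R> fit <- lm(y ~ x1 + x2 + x3, data = dataset)  # fit the model and store the model object
  R> summary(fit)    # parameter estimates, standard errors and tests
  R> coef(fit)       # extract the estimated coefficients
  R> predict(fit)    # fitted values for the observed data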

  4. 4: August 22, 2012
  2 Linear regression
  The hellung dataset has 51 rows and 3 columns: the diameter and concentration of Tetrahymena cells grown with (coded as 1) and without (coded as 2) glucose added to the growth medium. Tetrahymena cells are often used as model organisms in experimental biology.

  5. 5: August 22, 2012
  R> data(hellung,package="ISwR")
  R> head(hellung,6)
    glucose   conc diameter
  1       1 631000     21.2
  2       1 592000     21.5
  3       1 563000     21.3
  4       1 475000     21.0
  5       1 461000     21.5
  6       1 416000     21.3
  R> sapply(hellung, class)
    glucose      conc  diameter
  "integer" "integer" "numeric"
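  Note that glucose is stored as an integer code. When it is later used as an explanatory variable it is usually preferable to treat it as a factor (this is the topic of transform() in Section 3.1 of the contents); a small sketch, where the level labels are illustrative assumptions:
  R> hellung <- transform(hellung, glucose = factor(glucose, levels = c(1, 2),
  +                       labels = c("yes", "no")))   # assumed labels: 1 = glucose added, 2 = no glucose
  R> sapply(hellung, class)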

  6. 6: August 22, 2012
  R> par(mfrow=c(1,2))
  R> plot(diameter ~ conc, data=hellung,
  +      col=glucose, pch=as.character(glucose))
  R> plot(log(diameter) ~ log(conc), data=hellung,
  +      col=glucose, pch=as.character(glucose))
  [Figure: two scatter plots, diameter versus conc and log(diameter) versus log(conc), with points labelled 1/2 according to glucose group]
  • On a log scale the curves look linear
  • but are they parallel?

  7. 7: August 22, 2012
  For now we ignore the glucose treatment. In the following let y = log(diameter) and x = log(conc). The plots suggest an approximately linear relationship on the log scale, which we can capture by a linear regression model
  y_i = α + β x_i + e_i,   e_i ∼ N(0, σ²),   i = 1, ..., 51

  8. 8: August 22, 2012
  2.1 Fitting linear regression model with lm()
  R> hm <- lm(log(diameter) ~ log(conc), data=hellung)
  Now hm is a linear model object.

  9. 9: August 22, 2012
  2.2 Printing the object: print()
  The print() method gives some information about the object:
  R> print(hm)
  Call:
  lm(formula = log(diameter) ~ log(conc), data = hellung)
  Coefficients:
  (Intercept)    log(conc)
      3.74703     -0.05451
  Instead of calling print() we may simply type the object's name:
  R> hm
  Call:
  lm(formula = log(diameter) ~ log(conc), data = hellung)
  Coefficients:
  (Intercept)    log(conc)
      3.74703     -0.05451

  10. 10: August 22, 2012
  2.3 Model objects
  Technically, a model object is a list with all sorts of information; for example the model specification, the data, the parameter estimates and so on.
  R> class(hm)
  [1] "lm"
  R> names(hm)
   [1] "coefficients"  "residuals"     "effects"       "rank"
   [5] "fitted.values" "assign"        "qr"            "df.residual"
   [9] "xlevels"       "call"          "terms"         "model"
  The names of the list are called attributes or slots.
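  A quick way to get an overview of what the list contains is str(); a minimal sketch (not from the slides):
  R> str(hm, max.level = 1)   # show only the top-level components of the model object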

  11. 11: August 22, 2012
  We may extract values from the model object using the $-operator as follows
  R> hm$coefficients
  (Intercept)   log(conc)
   3.74702641 -0.05451464
  R> hm$residuals
              1             2             3             4             5
   0.0350211011  0.0455948627  0.0335108932  0.0100606873  0.0319602834
              6             7             8             9            10
   0.0170150708 -0.0352928345  0.0665403222  0.0089021709  0.0182022593
             11            12            13            14            15
   0.0349529425  0.0214114524  0.0462117137  0.0518996737  0.0275888916
             16            17            18            19            20
   0.0191915976  0.0218841508  0.0061947069 -0.0004889143  0.0306645283
             21            22            23            24            25
   0.0059028969  0.0565348153  0.0212271549  0.0298302213  0.0260690999
             26            27            28            29            30
  -0.0098984554  0.0471016029  0.0193724964  0.0218186112  0.0256750284
             31            32            33            34            35
   0.0096540212  0.0298367056 -0.0641562641 -0.0611421290 -0.0683058568
             36            37            38            39            40
  -0.0177867867 -0.0368224409 -0.0443737505 -0.0468146393 -0.0932894058
             41            42            43            44            45
  -0.0149983152 -0.0164825214 -0.0395395538 -0.0230984646  0.0014197167
             46            47            48            49            50
  -0.0295345605 -0.0402017510 -0.0534922053 -0.0320022390 -0.0401489903
             51
  -0.0533796004

  12. 12: August 22, 2012

  13. 13: August 22, 2012
  2.4 Extractor methods and methods for further computing
  For some of the attributes there exist extractor functions, for example:

  14. 14: August 22, 2012
  R> coef(hm)
  (Intercept)   log(conc)
   3.74702641 -0.05451464
  R> residuals(hm)
              1             2             3             4             5
   0.0350211011  0.0455948627  0.0335108932  0.0100606873  0.0319602834
              6             7             8             9            10
   0.0170150708 -0.0352928345  0.0665403222  0.0089021709  0.0182022593
             11            12            13            14            15
   0.0349529425  0.0214114524  0.0462117137  0.0518996737  0.0275888916
             16            17            18            19            20
   0.0191915976  0.0218841508  0.0061947069 -0.0004889143  0.0306645283
             21            22            23            24            25
   0.0059028969  0.0565348153  0.0212271549  0.0298302213  0.0260690999
             26            27            28            29            30
  -0.0098984554  0.0471016029  0.0193724964  0.0218186112  0.0256750284
             31            32            33            34            35
   0.0096540212  0.0298367056 -0.0641562641 -0.0611421290 -0.0683058568
             36            37            38            39            40
  -0.0177867867 -0.0368224409 -0.0443737505 -0.0468146393 -0.0932894058
             41            42            43            44            45
  -0.0149983152 -0.0164825214 -0.0395395538 -0.0230984646  0.0014197167
             46            47            48            49            50
  -0.0295345605 -0.0402017510 -0.0534922053 -0.0320022390 -0.0401489903
             51
  -0.0533796004

  15. 15: August 22, 2012
  Moreover, various methods are available for model objects, and each of these methods performs a specific task. Some of these methods are print(), summary(), plot(), coef(), fitted(), predict(), ...
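  Two of these methods are worth sketching here because they appear in the contents (Sections 2.7 and 2.8): confint() for confidence intervals and predict() for predictions at new covariate values. The concentrations in newdat below are made-up values for illustration:
  R> confint(hm)                                      # 95% confidence intervals for the parameters
  R> newdat <- data.frame(conc = c(100000, 400000))   # hypothetical new concentrations
  R> predict(hm, newdata = newdat)                    # predicted log(diameter) at these concentrations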

  16. 16: August 22, 2012
  2.5 Plotting the object: plot()
  The plot() method for lm-objects produces illustrative diagnostic plots:
  R> par(mfrow=c(2,2),mar=c(2,4.5,2,2))
  R> plot(hm)
  [Figure: the four standard diagnostic plots: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage with Cook's distance; observations 8, 35 and 40 are flagged as the most extreme]
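  To look at a single panel rather than all four at once, the which argument of the plot() method can be used; a small sketch:
  R> plot(hm, which = 1)   # Residuals vs Fitted only
  R> plot(hm, which = 2)   # Normal Q-Q plot only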
