Workshop 2: Building from Linear Models to Generalised Linear Models (PowerPoint presentation)

  1. Workshop 2: Building from Linear Models to Generalised Linear Models
     Part 1: understanding LMs

  2. What are linear models?
     • Something you have met already!
     • A model that explains one response variable in terms of one or more explanatory variables, assuming a linear relationship
     • y ~ x

  3. What are linear models?

     Stage 1: continuous response - General Linear Model

     Procedure                  Formula      Response     Predictors
     Single linear regression   y ~ x        Continuous   1 continuous/discrete
     Two-sample t-test          y ~ x        Continuous   1 categorical (2 levels)
     One-way ANOVA              y ~ x        Continuous   1 categorical (2 or more levels)
     Two-way ANOVA              y ~ x1*x2    Continuous   2 categorical (2 or more levels each)

     Stage 2: other types of response included - Generalised Linear Model

  4. Key points
     • T-tests, ANOVA and regression are fundamentally the same model
     • Collectively they are the 'general linear model'
     • This can be extended to the 'generalised linear model' for different types of response

  5. Single linear regression

     Fitted line: y = 11.23 − 0.07x

     > model <- lm(data = mydata, y ~ x)
     > summary(model)

     Call:
     lm(formula = y ~ x)

     Residuals:
         Min      1Q  Median      3Q     Max
     -7.2875 -2.4868 -0.4081  2.2612 10.7125

     Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
     (Intercept) 11.23481    1.65035   6.808  3.9e-07 ***
     x           -0.07373    0.02933  -2.514   0.0187 *
     ---
     Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

     Residual standard error: 3.935 on 25 degrees of freedom
     Multiple R-squared:  0.2018,  Adjusted R-squared:  0.1699
     F-statistic: 6.321 on 1 and 25 DF,  p-value: 0.01874

     • (Intercept) row: the intercept (11.23) and the test of the intercept
     • x row: the slope (-0.074) and the test of the slope
     • Multiple R-squared: the proportion of variation in y explained by x (about 20%)
     • F-statistic: the test of the model. With one explanatory variable this is the same as the test of the slope (F = t² = 2.514² ≈ 6.32, with the same p-value)
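As a rough check of the annotations on this slide, the following R sketch fits a single linear regression to simulated data. The workshop's mydata is not available, so the values are invented and will not reproduce the slide's numbers. The sketch pulls the slope's t value, the overall F-statistic and R-squared out of summary(). It then confirms that, with one explanatory variable, F equals t squared and the two tests give the same p-value.

```r
# Simulated, hypothetical stand-in for 'mydata' (the workshop data are not available)
set.seed(1)
n <- 27                                   # 27 points gives 25 residual df, as on the slide
x <- runif(n, min = 0, max = 100)
y <- 11 - 0.07 * x + rnorm(n, sd = 4)
mydata <- data.frame(x = x, y = y)

model <- lm(y ~ x, data = mydata)
s <- summary(model)

# Slope row of the coefficient table
slope_t <- s$coefficients["x", "t value"]
slope_p <- s$coefficients["x", "Pr(>|t|)"]

# Overall F-test of the model: s$fstatistic holds value, numdf and dendf
f_stat <- s$fstatistic
f_p <- unname(pf(f_stat["value"], f_stat["numdf"], f_stat["dendf"], lower.tail = FALSE))

# With a single predictor, F equals t squared and the p-values coincide
print(c(t_squared = slope_t^2, F = unname(f_stat["value"])))
print(c(p_slope = slope_p, p_model = f_p))

# Proportion of variation in y explained by x
print(s$r.squared)
```

For a single continuous predictor, R-squared is also the squared correlation between x and y. This gives a quick sanity check on the summary() output.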

  6. Two-sample t-test

     General form:
     t.test(y ~ x, data = mydata, paired = FALSE, var.equal = TRUE)

     Is there a significant difference between the masses of male and female chaffinches?

     > t.test(mass ~ sex, data = chaff, paired = FALSE, var.equal = TRUE)

             Two Sample t-test

     data:  mass by sex
     t = -2.6471, df = 38, p-value = 0.01175
     alternative hypothesis: true difference in means is not equal to 0
     95 percent confidence interval:
      -3.167734 -0.422266
     sample estimates:
     mean in group females   mean in group males
                    20.480                22.275

     Does treatment with Compound X affect cell growth compared to the control treatment?

     > t.test(cell$growth ~ cell$treatment, paired = FALSE, var.equal = TRUE)

             Two Sample t-test

     data:  cell$growth by cell$treatment
     t = 2.6471, df = 38, p-value = 0.01175
     alternative hypothesis: true difference in means is not equal to 0
     95 percent confidence interval:
      0.422266 3.167734
     sample estimates:
     mean in group control   mean in group withx
                    22.275                20.480
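With var.equal = TRUE, t.test() runs Student's t-test using a pooled variance. The sketch below recomputes the statistic by hand on hypothetical simulated data standing in for chaff (20 birds per sex is an assumption). The formula is t = (mean1 - mean2) / sqrt(sp² (1/n1 + 1/n2)), where sp² is the pooled variance and df = n1 + n2 - 2. paired is left at its default (FALSE) rather than passed explicitly.

```r
# Hypothetical simulated stand-in for 'chaff': 20 birds of each sex is an assumption
set.seed(2)
chaff <- data.frame(
  sex  = factor(rep(c("females", "males"), each = 20)),
  mass = c(rnorm(20, mean = 20.5, sd = 2), rnorm(20, mean = 22.3, sd = 2))
)

# Student's (equal-variance) t-test; paired is left at its default, FALSE
tt <- t.test(mass ~ sex, data = chaff, var.equal = TRUE)

# The same statistic by hand, using the pooled variance of the two groups
fem <- chaff$mass[chaff$sex == "females"]
mal <- chaff$mass[chaff$sex == "males"]
n1 <- length(fem)
n2 <- length(mal)
sp2 <- ((n1 - 1) * var(fem) + (n2 - 1) * var(mal)) / (n1 + n2 - 2)
t_manual  <- (mean(fem) - mean(mal)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df_manual <- n1 + n2 - 2
p_manual  <- 2 * pt(abs(t_manual), df_manual, lower.tail = FALSE)

print(c(t_test = unname(tt$statistic), by_hand = t_manual))
print(c(p_test = unname(tt$p.value), by_hand = p_manual))
```

The sign of t depends on which group is the first factor level. This is why the two examples on the slide report t = -2.6471 and t = 2.6471 for the same numbers.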

  7. Two-sample t-test

     Using t.test():

     > t.test(mass ~ sex, data = chaff, paired = FALSE, var.equal = TRUE)

             Two Sample t-test

     data:  mass by sex
     t = -2.6471, df = 38, p-value = 0.01175
     alternative hypothesis: true difference in means is not equal to 0
     95 percent confidence interval:
      -3.167734 -0.422266
     sample estimates:
     mean in group females   mean in group males
                    20.480                22.275

     Using lm():

     > mod <- lm(mass ~ sex, data = chaff)
     > summary(mod)

     Call:
     lm(formula = mass ~ sex, data = chaff)

     Residuals:
         Min      1Q  Median      3Q     Max
     -5.2750 -1.7000 -0.3775  1.6200  4.1250

     Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
     (Intercept)  20.4800     0.4795  42.712   <2e-16 ***
     sexmales      1.7950     0.6781   2.647   0.0118 *
     ---
     Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

     Residual standard error: 2.144 on 38 degrees of freedom
     Multiple R-squared:  0.1557,  Adjusted R-squared:  0.1335
     F-statistic: 7.007 on 1 and 38 DF,  p-value: 0.01175

     • The intercept is the mean of the 'lowest' (first) level of the factor, here females (20.48). Its test only shows that the female mean differs significantly from 0, which is not important here.
     • sexmales is the difference between the intercept and the next level (i.e., the slope): 22.275 - 20.480 = 1.795. This difference is significant (p = 0.0118), matching the t-test; only the sign of t differs.
     • Why use lm()? Because it is extendable.
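The t.test()/lm() equivalence can be checked in a short sketch on simulated data. The values are hypothetical, because the real chaff data are not available. The sketch confirms the three points annotated on the slide. First, the intercept is the female mean. Second, sexmales is the male-minus-female difference. Third, the t and p-value for sexmales match the t-test, with t differing only in sign because t.test() reports females minus males.

```r
# Hypothetical simulated stand-in for 'chaff' (20 birds of each sex assumed)
set.seed(3)
chaff <- data.frame(
  sex  = factor(rep(c("females", "males"), each = 20)),
  mass = c(rnorm(20, mean = 20.5, sd = 2), rnorm(20, mean = 22.3, sd = 2))
)

tt  <- t.test(mass ~ sex, data = chaff, var.equal = TRUE)  # paired defaults to FALSE
mod <- lm(mass ~ sex, data = chaff)
co  <- summary(mod)$coefficients
group_means <- tapply(chaff$mass, chaff$sex, mean)

# Intercept: mean of the first ('lowest') level, females
print(c(intercept = co["(Intercept)", "Estimate"], female_mean = group_means[["females"]]))

# sexmales: males minus females, the difference the t-test examines
print(c(sexmales = co["sexmales", "Estimate"],
        difference = group_means[["males"]] - group_means[["females"]]))

# Same t up to sign (t.test reports females - males) and the same p-value
print(c(t_test = unname(tt$statistic), lm = co["sexmales", "t value"]))
print(c(t_test = unname(tt$p.value), lm = co["sexmales", "Pr(>|t|)"]))
```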

  8. One-way ANOVA

     General form:
     mod <- aov(y ~ x, data = mydata)
     summary(mod)

     > modc <- aov(diameter ~ medium, data = culture)
     > summary(modc)
                 Df Sum Sq Mean Sq F value  Pr(>F)
     medium       2 10.495  5.2473  6.1129 0.00646 **
     Residuals   27 23.177  0.8584
     ---
     Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
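To show where the numbers in an aov() table come from, this sketch rebuilds the one-way table by hand on simulated data. The data are a hypothetical stand-in for culture, with 10 replicates per medium assumed; this matches the slide's 2 and 27 df. The medium sum of squares measures how far the group means sit from the grand mean. The residual sum of squares measures spread within groups. F is the ratio of their mean squares.

```r
# Hypothetical simulated stand-in for 'culture': 3 media x 10 replicates assumed
set.seed(4)
culture <- data.frame(
  medium = factor(rep(c("control", "with sugar", "with sugar + amino acids"), each = 10)),
  diameter = rnorm(30, mean = rep(c(10.1, 10.2, 11.4), each = 10), sd = 0.9)
)

modc <- aov(diameter ~ medium, data = culture)
tab  <- summary(modc)[[1]]   # the ANOVA table as a data frame

# The same table by hand
grand  <- mean(culture$diameter)
means  <- ave(culture$diameter, culture$medium)     # each row's group mean
ss_med <- sum((means - grand)^2)                     # 'medium' Sum Sq
ss_res <- sum((culture$diameter - means)^2)          # 'Residuals' Sum Sq
df_med <- nlevels(culture$medium) - 1                # 2
df_res <- nrow(culture) - nlevels(culture$medium)    # 27
f_val  <- (ss_med / df_med) / (ss_res / df_res)      # ratio of mean squares
p_val  <- pf(f_val, df_med, df_res, lower.tail = FALSE)

print(tab)
print(c(F = f_val, p = p_val))
```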

  9. One-way ANOVA

     Using aov():

     > modc <- aov(diameter ~ medium, data = culture)
     > summary(modc)
                 Df Sum Sq Mean Sq F value  Pr(>F)
     medium       2 10.495  5.2473  6.1129 0.00646 **
     Residuals   27 23.177  0.8584
     ---
     Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

     Using lm():

     > modl <- lm(diameter ~ medium, data = culture)
     > summary(modl)

     Call:
     lm(formula = diameter ~ medium, data = culture)

     Residuals:
        Min     1Q Median     3Q    Max
     -1.541 -0.700 -0.080  0.424  1.949

     Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)
     (Intercept)                     10.0700     0.2930  34.370  < 2e-16 ***
     mediumwith sugar                 0.1700     0.4143   0.410  0.68483
     mediumwith sugar + amino acids   1.3310     0.4143   3.212  0.00339 **
     ---
     Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

     Residual standard error: 0.9265 on 27 degrees of freedom
     Multiple R-squared:  0.3117,  Adjusted R-squared:  0.2607
     F-statistic: 6.113 on 2 and 27 DF,  p-value: 0.00646

     • The intercept is the mean of the lowest level of the factor (control), so the control mean = 10.07
     • mediumwith sugar is the difference between the intercept and 'with sugar'
     • mediumwith sugar + amino acids is the difference between the intercept and 'with sugar + amino acids'
     • The t-tests on these rows show whether each difference is significant
     • The F-statistic shows whether the 'model' (the one factor) is significant; it matches the aov() result
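The same comparison can be run as a sketch on simulated data. The values are hypothetical; only the level names are borrowed from the slide. The lm() coefficients are the control mean and each medium's difference from it. The overall F in summary(lm()) equals the F in the aov() table.

```r
# Hypothetical simulated stand-in for 'culture': 3 media x 10 replicates assumed
set.seed(5)
culture <- data.frame(
  medium = factor(rep(c("control", "with sugar", "with sugar + amino acids"), each = 10)),
  diameter = rnorm(30, mean = rep(c(10.1, 10.2, 11.4), each = 10), sd = 0.9)
)

modc <- aov(diameter ~ medium, data = culture)
modl <- lm(diameter ~ medium, data = culture)

co    <- coef(modl)
means <- tapply(culture$diameter, culture$medium, mean)

# Intercept = control mean; the other coefficients = each medium's mean minus the control mean
print(co)
print(means - means[["control"]])

# The overall F in summary(lm) matches the F in the aov table
f_lm  <- summary(modl)$fstatistic[["value"]]
f_aov <- summary(modc)[[1]][["F value"]][1]
print(c(lm = f_lm, aov = f_aov))
```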

  10. Two-way ANOVA

      Effect of two factors on the wing length of butterflies
      - Response: wing length
      - Predictors: sex and spp (both categorical)

      mod <- lm(y ~ x1 * x2, data = mydata)    3 tests: 2 main effects and the interaction
      OR
      mod <- lm(y ~ x1 + x2, data = mydata)    2 tests: 2 main effects only

      Stage 1: aov(y ~ x1 * x2, data = mydata)

      Mean wing length:
                 F.concocti   F.flappa
      females         31.37      24.67
      males           24.97      23.45

      > mod <- aov(winglen ~ sex * spp, data = butter)
      > summary(mod)
                  Df Sum Sq Mean Sq F value   Pr(>F)
      sex          1 145.16 145.161  9.2717 0.004334 **
      spp          1 168.92 168.921 10.7893 0.002280 **
      sex:spp      1  67.08  67.081  4.2846 0.045692 *
      Residuals   36 563.63  15.656
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
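The * operator in a model formula is shorthand for both main effects plus their interaction. This can be checked without any data by inspecting the terms R builds from each formula. The variable names are taken from the slide, and nothing is fitted.

```r
# Variable names are taken from the slide; no data are needed to inspect terms
full     <- terms(winglen ~ sex * spp)
additive <- terms(winglen ~ sex + spp)
explicit <- terms(winglen ~ sex + spp + sex:spp)

print(attr(full, "term.labels"))      # "sex" "spp" "sex:spp": 3 tests
print(attr(additive, "term.labels"))  # "sex" "spp": 2 tests

# sex * spp expands to exactly the same terms as sex + spp + sex:spp
print(identical(attr(full, "term.labels"), attr(explicit, "term.labels")))  # TRUE
```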

  11. Two-way ANOVA

      Mean wing length:
                 F.concocti   F.flappa
      females         31.37      24.67
      males           24.97      23.45

      > mod <- aov(winglen ~ sex * spp, data = butter)
      > summary(mod)
                  Df Sum Sq Mean Sq F value   Pr(>F)
      sex          1 145.16 145.161  9.2717 0.004334 **
      spp          1 168.92 168.921 10.7893 0.002280 **
      sex:spp      1  67.08  67.081  4.2846 0.045692 *
      Residuals   36 563.63  15.656
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      > mod2 <- lm(winglen ~ sex * spp, data = butter)
      > summary(mod2)

      Call:
      lm(formula = winglen ~ sex * spp, data = butter)

      Residuals:
         Min     1Q Median     3Q    Max
      -7.770 -3.095  0.090  2.920  6.530

      Coefficients:
                           Estimate Std. Error t value Pr(>|t|)
      (Intercept)            31.370      1.251  25.071  < 2e-16 ***
      sexmales               -6.400      1.770  -3.617 0.000907 ***
      sppF.flappa            -6.700      1.770  -3.786 0.000560 ***
      sexmales:sppF.flappa    5.180      2.503   2.070 0.045692 *
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      Residual standard error: 3.957 on 36 degrees of freedom
      Multiple R-squared:  0.4034,  Adjusted R-squared:  0.3537
      F-statistic: 8.115 on 3 and 36 DF,  p-value: 0.0002949

      > anova(mod2)
      Analysis of Variance Table

      Response: winglen
                Df Sum Sq Mean Sq F value   Pr(>F)
      sex        1 145.16 145.161  9.2717 0.004334 **
      spp        1 168.92 168.921 10.7893 0.002280 **
      sex:spp    1  67.08  67.081  4.2846 0.045692 *
      Residuals 36 563.63  15.656

      • summary(mod) and anova(mod2) give the same table: a test of each explanatory variable
      • The F-statistic at the end of summary(mod2) is the test of the whole model
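This sketch checks the "same table" annotation on a hypothetical balanced data set simulated to stand in for butter. Ten butterflies per sex and species is an assumption. It confirms that summary(aov()) and anova(lm()) give the same F values. It also rebuilds the four cell means from the lm() coefficients:

- the intercept gives the reference cell (females, F.concocti);
- adding sexmales gives males;
- adding sppF.flappa gives F.flappa;
- adding the interaction as well gives males of F.flappa.

Applied to the slide's estimates, 31.370 - 6.400 - 6.700 + 5.180 = 23.45, which is the mean for F.flappa males.

```r
# Hypothetical balanced stand-in for 'butter': 2 sexes x 2 species x 10 each
set.seed(7)
butter <- expand.grid(rep = 1:10,
                      sex = factor(c("females", "males")),
                      spp = factor(c("F.concocti", "F.flappa")))
butter$winglen <- rnorm(nrow(butter), mean = 28, sd = 4) +
  ifelse(butter$sex == "males", -4, 0) + ifelse(butter$spp == "F.flappa", -4, 0)

mod  <- aov(winglen ~ sex * spp, data = butter)
mod2 <- lm(winglen ~ sex * spp, data = butter)

# The sequential ANOVA tables agree (the Residuals row has no F, hence NA)
f_aov <- summary(mod)[[1]][["F value"]]
f_lm  <- anova(mod2)[["F value"]]
print(rbind(aov = f_aov, lm = f_lm))

# Cell means rebuilt from the treatment-contrast coefficients
b <- coef(mod2)
cells <- c(
  females_concocti = b[["(Intercept)"]],
  males_concocti   = b[["(Intercept)"]] + b[["sexmales"]],
  females_flappa   = b[["(Intercept)"]] + b[["sppF.flappa"]],
  males_flappa     = sum(b)   # intercept + both main effects + interaction
)
print(cells)
print(with(butter, tapply(winglen, list(sex, spp), mean)))
```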

  12. What are linear models?
      • The response can be continuous, discrete or categorical
      • Predictors can be continuous or categorical
      • The type of response (and its "errors"), the type of predictors and the relationship between them determine the type of model
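In R, glm() takes a family argument that describes the errors, which is how the type of response selects the model. In this sketch the data are simulated and hypothetical. A count response is fitted with Poisson errors (log link), and a presence/absence response is fitted with binomial errors (logit link). These are common choices for such responses, not the only ones.

```r
# Hypothetical simulated data: one count response and one presence/absence (0/1) response
set.seed(8)
x <- runif(50, min = 0, max = 2)
counts  <- rpois(50, lambda = exp(0.5 + 0.8 * x))
present <- rbinom(50, size = 1, prob = plogis(-1 + 1.5 * x))
dat <- data.frame(x = x, counts = counts, present = present)

# Count response: Poisson errors, log link (the default link for poisson)
m_count <- glm(counts ~ x, family = poisson, data = dat)

# Binary response: binomial errors, logit link (the default link for binomial)
m_bin <- glm(present ~ x, family = binomial, data = dat)

# Coefficients are on the link scale (log counts, log odds)
print(summary(m_count)$coefficients)
print(summary(m_bin)$coefficients)
```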

  13. Key points
      • T-tests, ANOVA and regression are fundamentally the same
      • Collectively they form the 'general linear model'. Use:
          mod <- lm(response ~ explanatory1, data = mydata)
          summary(mod)

          mod <- lm(response ~ explanatory1 * explanatory2, data = mydata)
          summary(mod)
          anova(mod)
      • The model can be extended to the 'generalised linear model' for different types of response
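One way to see the extension from the 'general' to the 'generalised' linear model is that glm() with family = gaussian (normal errors, identity link) fits the same model as lm(). The sketch below demonstrates this on simulated data, using the slide's placeholder names response, explanatory1 and explanatory2.

```r
# Hypothetical simulated data using the slide's placeholder names
set.seed(9)
mydata <- data.frame(
  explanatory1 = factor(rep(c("a", "b"), times = 20)),
  explanatory2 = factor(rep(c("p", "q"), each = 20))
)
mydata$response <- rnorm(40, mean = 10) + 2 * (mydata$explanatory1 == "b")

# General linear model
mod_lm <- lm(response ~ explanatory1 * explanatory2, data = mydata)

# Generalised linear model with normal errors and an identity link
mod_glm <- glm(response ~ explanatory1 * explanatory2, family = gaussian, data = mydata)

# The estimates agree; other families change the error distribution and link
print(cbind(lm = coef(mod_lm), glm = coef(mod_glm)))
```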

  14. Summary
      • Regression, t-tests and ANOVA are all linear models and share the assumptions of classical linear models
      • In summary() output:
        - the intercept estimate is the mean of the lowest (reference) level of a factor, under R's default treatment contrasts; for a continuous predictor it is the predicted y at x = 0
        - there is a significance test for each estimate
        - there is a significance test for the model as a whole
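The classical assumptions include normally distributed residuals and equal variance across groups. The sketch below shows one common way to check them in R on hypothetical simulated data. It applies shapiro.test() to the residuals and bartlett.test() across groups. The diagnostic plots from plot() are usually the first thing to inspect, so that line is left in as a comment for interactive use.

```r
# Hypothetical simulated one-way data (three groups of 10)
set.seed(10)
dat <- data.frame(
  group = factor(rep(c("control", "trtA", "trtB"), each = 10)),
  y     = rnorm(30, mean = rep(c(10, 10.5, 11.5), each = 10), sd = 1)
)
fit <- lm(y ~ group, data = dat)

# Normality of residuals: Shapiro-Wilk test (a small p suggests non-normal residuals)
sw <- shapiro.test(residuals(fit))

# Equal variance across groups: Bartlett's test (a small p suggests unequal variances)
bt <- bartlett.test(y ~ group, data = dat)

print(sw)
print(bt)

# Residuals vs fitted and normal Q-Q plots; run interactively to view them
# plot(fit, which = 1:2)
```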
