Workshop 2
Building from Linear Models to Generalised Linear Models Part 1: understanding LMs
What are linear models?
Something you have met already!
Model to explain, with a linear relationship, one response variable with one or
Procedure                  Formula       Response     Predictors
Single linear regression   y ~ x         Continuous   1 continuous/discrete
Two-sample t-test          y ~ x         Continuous   1 categorical (2 levels)
One-way ANOVA              y ~ x         Continuous   1 categorical (2 or more levels)
Two-way ANOVA              y ~ x1*x2     Continuous   2 categorical (2 or more levels each)
Stage 1: response continuous - General Linear Model
Stage 2: including other types of response - Generalised Linear Model
T-tests, ANOVA and regression are fundamentally the same
Collectively ‘general linear model’
Can be extended to ‘generalised linear model’ for different types of response
> model <- lm(data = mydata, y ~ x)
> summary(model)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.23481    1.65035   6.808  3.9e-07 ***
x           -0.07373    0.02933  -2.514   0.0187 *

Residual standard error: 3.935 on 25 degrees of freedom
Multiple R-squared:  0.2018,    Adjusted R-squared:  0.1699
F-statistic: 6.321 on 1 and 25 DF,  p-value: 0.01874
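– Intercept: Estimate for (Intercept), 11.23
– Slope: Estimate for x, -0.07
– Test of intercept: t = 6.808, p = 3.9e-07
– Test of slope: t = -2.514, p = 0.0187
– % of variation in y explained by x: Multiple R-squared, 0.2018 (about 20%)
– Test of model: F = 6.321, p = 0.01874 (same as test of slope for one variable)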
y = 11.23 − 0.07*x
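The equation above is built from the two coefficient estimates. A minimal sketch of reading them out of a fitted model and checking a hand-computed prediction against predict(); the data frame here is simulated for illustration (an assumption, not the workshop's mydata), so its estimates will differ from those shown above.

```r
# Simulated data: an assumption for illustration, not the workshop dataset
set.seed(1)
mydata <- data.frame(x = runif(27, min = 20, max = 90))
mydata$y <- 11 - 0.07 * mydata$x + rnorm(27, sd = 4)

model <- lm(y ~ x, data = mydata)

# coef() returns a named vector: "(Intercept)" and "x"
b <- coef(model)
print(b)

# Fitted value at x = 50, computed by hand from y = intercept + slope * x
by_hand <- unname(b["(Intercept)"] + b["x"] * 50)

# The same fitted value from predict()
by_predict <- unname(predict(model, newdata = data.frame(x = 50)))

# The two should agree
print(all.equal(by_hand, by_predict))
```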
t.test(mass ~ sex, data = chaff, paired = F, var.equal = T)

        Two Sample t-test

data:  mass by sex
t = -2.6471, df = 38, p-value = 0.01175
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:

sample estimates:
mean in group females   mean in group males
               20.480                22.275
t.test(y ~ x, data = mydata, paired = F, var.equal = T)
t.test(cell$growth ~ cell$treatment, paired = F, var.equal = T)

        Two Sample t-test

data:  cell$growth by cell$treatment
t = 2.6471, df = 38, p-value = 0.01175
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.422266 3.167734
sample estimates:
mean in group control   mean in group withx
               22.275                20.480
Is there a significant difference between the masses of male and female chaffinches?
Does treatment with Compound X affect cell growth compared to control treatment?
> t.test(mass ~ sex, data = chaff, paired = F, var.equal = T)

        Two Sample t-test

data:  mass by sex
t = -2.6471, df = 38, p-value = 0.01175
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:

sample estimates:
mean in group females   mean in group males
               20.480                22.275

> mod <- lm(mass ~ sex, data = chaff)
> summary(mod)

Call:
lm(formula = mass ~ sex, data = chaff)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  20.4800     0.4795  42.712   <2e-16 ***
sexmales      1.7950     0.6781   2.647   0.0118 *

Residual standard error: 2.144 on 38 degrees of freedom
Multiple R-squared:  0.1557,    Adjusted R-squared:  0.1335
F-statistic: 7.007 on 1 and 38 DF,  p-value: 0.01175
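Using t.test(): first output. Using lm(): second output.
– (Intercept), 20.48: the intercept is the mean of the ‘lowest’ level of the factor (females)
– Test of intercept: female mean is significantly different from 0. Not important
– sexmales, 1.795: difference between the intercept and the next level (i.e., the slope), 22.275 − 20.480
– Difference is significant: p = 0.0118, matching the t-test p-value of 0.01175
Why use lm()? Because it is extendable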
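The correspondence between the two outputs can be checked directly. A minimal sketch, assuming simulated data in place of the chaffinch dataset (chaff_sim and its values are made up for illustration). The t-test is called without the paired argument, which is not needed for an independent two-sample test.

```r
# Simulated stand-in for the chaffinch data (an assumption, not the real dataset)
set.seed(42)
chaff_sim <- data.frame(
  sex  = factor(rep(c("females", "males"), each = 20)),
  mass = c(rnorm(20, mean = 20.5, sd = 2), rnorm(20, mean = 22.3, sd = 2))
)

# Equal-variance two-sample t-test and the equivalent linear model
tt  <- t.test(mass ~ sex, data = chaff_sim, var.equal = TRUE)
mod <- lm(mass ~ sex, data = chaff_sim)
co  <- summary(mod)$coefficients

# Intercept equals the mean of the first factor level (females)
same_intercept <- all.equal(unname(co["(Intercept)", "Estimate"]),
                            unname(tt$estimate[1]))

# 'sexmales' equals the difference: males mean minus females mean
same_slope <- all.equal(unname(co["sexmales", "Estimate"]),
                        unname(tt$estimate[2] - tt$estimate[1]))

# Same p-value; t has the opposite sign because t.test() uses females minus males
same_p <- all.equal(unname(co["sexmales", "Pr(>|t|)"]), tt$p.value)
same_t <- all.equal(unname(co["sexmales", "t value"]), -unname(tt$statistic))

print(c(same_intercept, same_slope, same_p, same_t))
```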
> modc <- aov(diameter ~ medium, data = culture)
> summary(modc)
            Df Sum Sq Mean Sq F value  Pr(>F)
medium       2 10.495  5.2473  6.1129 0.00646 **
Residuals   27 23.177  0.8584

mod <- aov(y ~ x, data = mydata)
summary(mod)
> modl <- lm(diameter ~ medium, data = culture)
> summary(modl)

lm(formula = diameter ~ medium, data = culture)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)
(Intercept)                     10.0700     0.2930  34.370  < 2e-16 ***
mediumwith sugar                 0.1700     0.4143   0.410  0.68483
mediumwith sugar + amino acids   1.3310     0.4143   3.212  0.00339 **

Residual standard error: 0.9265 on 27 degrees of freedom
Multiple R-squared:  0.3117,    Adjusted R-squared:  0.2607
F-statistic: 6.113 on 2 and 27 DF,  p-value: 0.00646

> modc <- aov(diameter ~ medium, data = culture)
> summary(modc)
            Df Sum Sq Mean Sq F value  Pr(>F)
medium       2 10.495  5.2473  6.1129 0.00646 **
Residuals   27 23.177  0.8584
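Using lm():
– Intercept is the mean of the lowest level of the factor (control): control mean = 10.07
– mediumwith sugar, 0.17: difference between intercept and ‘with sugar’
– mediumwith sugar + amino acids, 1.331: difference between intercept and ‘with sugar + amino acids’
– Pr(>|t|) column: whether differences are significant
– F-statistic: whether ‘model’ (the one factor) is significant
Using aov():
– Whether ‘model’ (the one factor) is significant: same F (6.113) and p-value (0.00646) as lm()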
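That aov() and lm() fit the same model can be confirmed by comparing their F statistics and by rebuilding the lm() coefficients from the group means. A minimal sketch, assuming simulated data in place of the culture dataset (culture_sim and its values are made up for illustration).

```r
# Simulated stand-in for the culture data (an assumption, not the real dataset)
set.seed(7)
culture_sim <- data.frame(
  medium = factor(rep(c("control", "with sugar", "with sugar + amino acids"),
                      each = 10)),
  diameter = c(rnorm(10, mean = 10.0), rnorm(10, mean = 10.2),
               rnorm(10, mean = 11.3))
)

modc <- aov(diameter ~ medium, data = culture_sim)
modl <- lm(diameter ~ medium, data = culture_sim)

# F statistic for 'medium' from the aov() table and from anova() on the lm() fit
f_aov <- summary(modc)[[1]][["F value"]][1]
f_lm  <- anova(modl)[["F value"]][1]
same_f <- all.equal(f_aov, f_lm)

# Intercept is the mean of the first level (control);
# the other coefficients are differences from the control mean
group_means <- tapply(culture_sim$diameter, culture_sim$medium, mean)
same_intercept <- all.equal(unname(coef(modl)[1]),
                            unname(group_means["control"]))
same_diffs <- all.equal(unname(coef(modl)[-1]),
                        unname(group_means[-1] - group_means["control"]))

print(c(same_f, same_intercept, same_diffs))
```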
Effect of two factors on wing length of butterflies
– Response: wing length
– Predictors: sex and spp (categorical)
mod <- aov(winglen ~ sex * spp, data = butter)
summary(mod)
            Df Sum Sq Mean Sq F value   Pr(>F)
sex          1 145.16 145.161  9.2717 0.004334 **
spp          1 168.92 168.921 10.7893 0.002280 **
sex:spp      1  67.08  67.081  4.2846 0.045692 *
Residuals   36 563.63  15.656
Mean wing length:
          F.concocti   F.flappa
females        31.37      24.67
males          24.97      23.45
mod <- lm(y ~ x1 * x2, data = mydata)    3 tests: 2 main effects and interaction
OR
mod <- lm(y ~ x1 + x2, data = mydata)    2 tests: 2 main effects
Stage 1: aov(y ~ x1 * x2, data = mydata)
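The difference between the two formulas is the interaction term: in R's formula syntax, x1 * x2 is shorthand for x1 + x2 + x1:x2. A short sketch that lists the terms each formula expands to, using the placeholder names y, x1 and x2 from above (no data are needed for this).

```r
# Terms in the model with interaction
terms_interaction <- attr(terms(y ~ x1 * x2), "term.labels")
print(terms_interaction)   # "x1" "x2" "x1:x2"

# Terms in the additive (main effects only) model
terms_additive <- attr(terms(y ~ x1 + x2), "term.labels")
print(terms_additive)      # "x1" "x2"
```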
Mean wing length:
          F.concocti   F.flappa
females        31.37      24.67
males          24.97      23.45
mod2 <- lm(winglen ~ sex * spp, data = butter)
summary(mod2)

Call:
lm(formula = winglen ~ sex * spp, data = butter)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)            31.370      1.251  25.071  < 2e-16 ***
sexmales               -6.400      1.770  -3.617 0.000907 ***
sppF.flappa            -6.700      1.770  -3.786 0.000560 ***
sexmales:sppF.flappa    5.180      2.503   2.070 0.045692 *

Residual standard error: 3.957 on 36 degrees of freedom
Multiple R-squared:  0.4034,    Adjusted R-squared:  0.3537
F-statistic: 8.115 on 3 and 36 DF,  p-value: 0.0002949

mod <- aov(winglen ~ sex * spp, data = butter)
summary(mod)
            Df Sum Sq Mean Sq F value   Pr(>F)
sex          1 145.16 145.161  9.2717 0.004334 **
spp          1 168.92 168.921 10.7893 0.002280 **
sex:spp      1  67.08  67.081  4.2846 0.045692 *
Residuals   36 563.63  15.656

anova(mod2)
Analysis of Variance Table

Response: winglen
            Df Sum Sq Mean Sq F value   Pr(>F)
sex          1 145.16 145.161  9.2717 0.004334 **
spp          1 168.92 168.921 10.7893 0.002280 **
sex:spp      1  67.08  67.081  4.2846 0.045692 *
Residuals   36 563.63  15.656
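– summary(mod) from aov() and anova(mod2) from lm(): the same
– F-statistic line in summary(mod2): test of the whole model
– Rows of the ANOVA table: test of each explanatory variable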
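The four coefficients in summary(mod2) reproduce the table of mean wing lengths: the intercept is the mean for the reference cell (females, F.concocti), and each other cell adds the relevant effects. A small arithmetic sketch using the estimates printed above; the variable names are chosen here for illustration.

```r
# Estimates copied from the summary(mod2) output
b0    <- 31.370   # (Intercept): females, F.concocti
b_sex <- -6.400   # sexmales
b_spp <- -6.700   # sppF.flappa
b_int <-  5.180   # sexmales:sppF.flappa

# matrix() fills column by column: F.concocti first, then F.flappa
cell_means <- matrix(
  c(b0,                         # females, F.concocti
    b0 + b_sex,                 # males,   F.concocti
    b0 + b_spp,                 # females, F.flappa
    b0 + b_sex + b_spp + b_int  # males,   F.flappa
  ),
  nrow = 2,
  dimnames = list(sex = c("females", "males"),
                  spp = c("F.concocti", "F.flappa"))
)
print(cell_means)
```

Without the interaction term, the males F.flappa cell would be 31.37 − 6.40 − 6.70 = 18.27; the interaction estimate of 5.18 brings it to the observed 23.45.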
T-tests, ANOVA and regression are fundamentally the same
Collectively ‘general linear model’
Can be extended to ‘generalised linear model’ for different types of response
Use:
mod <- lm(response ~ explanatory1 * explanatory2, data = mydata)
summary(mod)
anova(mod)
mod2 <- lm(response ~ explanatory1, data = mydata)
summary(mod2)
– regression, t-tests and anova are linear models and have the assumptions of classical linear models
– summary() output