Workshop 2 Building from Linear Models to Generalised Linear Models - - PowerPoint PPT Presentation

SLIDE 1

Workshop 2

Building from Linear Models to Generalised Linear Models Part 1: understanding LMs

SLIDE 2

What are linear models?

  • Something you have met already!
  • A model to explain, with a linear relationship, one response variable with one or more explanatory variables
  • In R formula notation: y ~ x

SLIDE 3

What are linear models?

    Procedure                  Formula     Response     Predictors
    Single linear regression   y ~ x       Continuous   1 continuous/discrete
    Two-sample t-test          y ~ x       Continuous   1 categorical (2 levels)
    One-way ANOVA              y ~ x       Continuous   1 categorical (2 or more levels)
    Two-way ANOVA              y ~ x1*x2   Continuous   2 categorical (2 or more levels each)

Stage 1 (continuous response): general linear model
Stage 2 (including other types of response): generalised linear model
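To make the Stage 1 / Stage 2 distinction concrete, here is a minimal R sketch, assuming made-up simulated data (the data frame `mydata` and its columns `x` and `y` are hypothetical, not the workshop's). It fits the same straight-line model with lm() and with glm() using a gaussian family and identity link; the coefficient estimates agree, which is one way to see the general linear model as a special case of the generalised linear model.

```r
# Hypothetical simulated data: one continuous predictor and a continuous response
set.seed(1)
mydata <- data.frame(x = runif(30, min = 0, max = 100))
mydata$y <- 11 - 0.07 * mydata$x + rnorm(30, sd = 4)

# Stage 1: general linear model
fit_lm <- lm(y ~ x, data = mydata)

# Stage 2: the same model written as a generalised linear model with
# normal ("gaussian") errors and an identity link
fit_glm <- glm(y ~ x, family = gaussian(link = "identity"), data = mydata)

print(coef(fit_lm))
print(coef(fit_glm))
print(all.equal(coef(fit_lm), coef(fit_glm)))  # estimates agree
```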

SLIDE 4

Key points

  • T-tests, ANOVA and regression are fundamentally the same
  • Collectively, they are the ‘general linear model’
  • This can be extended to the ‘generalised linear model’ for different types of response

SLIDE 5

Single linear regression

    > model <- lm(data = mydata, y ~ x)
    > summary(model)

    Call:
    lm(formula = y ~ x)

    Residuals:
        Min      1Q  Median      3Q     Max
    -7.2875 -2.4868 -0.4081  2.2612 10.7125

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 11.23481    1.65035   6.808  3.9e-07 ***
    x           -0.07373    0.02933  -2.514   0.0187 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.935 on 25 degrees of freedom
    Multiple R-squared:  0.2018,  Adjusted R-squared:  0.1699
    F-statistic: 6.321 on 1 and 25 DF,  p-value: 0.01874

What the output shows:
  • Intercept: 11.23481; test of intercept: t = 6.808, p = 3.9e-07
  • Slope: -0.07373; test of slope: t = -2.514, p = 0.0187
  • % of variation in y explained by x: Multiple R-squared = 0.2018 (about 20%)
  • Test of model: F = 6.321 on 1 and 25 DF, p = 0.01874 (the same as the test of the slope when there is only one explanatory variable)

y = 11.23 − 0.07*x
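The labelled parts of the regression output can also be pulled out programmatically. This is a sketch on hypothetical simulated data (27 observations, giving 25 residual degrees of freedom as on the slide; the estimates will not match the slide). It extracts each quantity from summary() and confirms that, with one explanatory variable, the model F-statistic equals the slope's t value squared and the two p-values coincide.

```r
# Hypothetical simulated data standing in for 'mydata' (values are made up)
set.seed(42)
mydata <- data.frame(x = runif(27, min = 0, max = 100))
mydata$y <- 11 - 0.07 * mydata$x + rnorm(27, sd = 4)

model <- lm(y ~ x, data = mydata)
s <- summary(model)

intercept <- coef(model)[["(Intercept)"]]
slope     <- coef(model)[["x"]]
t_slope   <- s$coefficients["x", "t value"]
p_slope   <- s$coefficients["x", "Pr(>|t|)"]
r_squared <- s$r.squared                     # proportion of variation in y explained by x
f_stat    <- s$fstatistic[["value"]]
# summary() prints the model p-value but does not store it, so recompute it
p_model   <- pf(f_stat, s$fstatistic[["numdf"]], s$fstatistic[["dendf"]],
                lower.tail = FALSE)

print(c(intercept = intercept, slope = slope, r_squared = r_squared))
# With one explanatory variable, F = t^2 and the p-values coincide
print(c(t_squared = t_slope^2, F = f_stat))
print(c(p_slope = p_slope, p_model = p_model))
```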

SLIDE 6

Two-sample t-test

General form:
    t.test(y ~ x, data = mydata, paired = F, var.equal = T)

Is there a significant difference between the masses of male and female chaffinches?
    > t.test(mass ~ sex, data = chaff, paired = F, var.equal = T)

        Two Sample t-test

    data:  mass by sex
    t = -2.6471, df = 38, p-value = 0.01175
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -3.167734 -0.422266
    sample estimates:
    mean in group females   mean in group males
                   20.480                22.275

Does treatment with Compound X affect cell growth compared with the control treatment?
    > t.test(cell$growth ~ cell$treatment, paired = F, var.equal = T)

        Two Sample t-test

    data:  cell$growth by cell$treatment
    t = 2.6471, df = 38, p-value = 0.01175
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     0.422266 3.167734
    sample estimates:
    mean in group control   mean in group withx
                   22.275                20.480
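The two outputs on this slide report the same t, df and p-value but with opposite signs, and a mirrored confidence interval. t.test() subtracts the mean of the second factor level from the first, so reversing the level order flips the sign without changing the test. A minimal sketch on hypothetical simulated data (this `chaff` data frame is made up, not the workshop's):

```r
# Hypothetical simulated data: 20 birds of each sex (values are made up)
set.seed(7)
chaff <- data.frame(
  sex  = factor(rep(c("females", "males"), each = 20)),
  mass = c(rnorm(20, mean = 20.5, sd = 2), rnorm(20, mean = 22.3, sd = 2))
)

# Default (alphabetical) level order: females, then males
t1 <- t.test(mass ~ sex, data = chaff, var.equal = TRUE)

# Same data with the level order reversed: males, then females
chaff$sex_rev <- factor(chaff$sex, levels = c("males", "females"))
t2 <- t.test(mass ~ sex_rev, data = chaff, var.equal = TRUE)

# t and the confidence interval change sign; df and p-value do not
print(c(t1 = unname(t1$statistic), t2 = unname(t2$statistic)))
print(rbind(t1 = t1$conf.int, t2 = t2$conf.int))
print(c(p1 = t1$p.value, p2 = t2$p.value))
```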

SLIDE 7

Two-sample t-test

Using t.test():
    > t.test(mass ~ sex, data = chaff, paired = F, var.equal = T)

        Two Sample t-test

    data:  mass by sex
    t = -2.6471, df = 38, p-value = 0.01175
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -3.167734 -0.422266
    sample estimates:
    mean in group females   mean in group males
                   20.480                22.275

Using lm():
    > mod <- lm(mass ~ sex, data = chaff)
    > summary(mod)

    Call:
    lm(formula = mass ~ sex, data = chaff)

    Residuals:
        Min      1Q  Median      3Q     Max
    -5.2750 -1.7000 -0.3775  1.6200  4.1250

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  20.4800     0.4795  42.712   <2e-16 ***
    sexmales      1.7950     0.6781   2.647   0.0118 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 2.144 on 38 degrees of freedom
    Multiple R-squared:  0.1557,  Adjusted R-squared:  0.1335
    F-statistic: 7.007 on 1 and 38 DF,  p-value: 0.01175

Reading the lm() output:
  • The intercept (20.48) is the mean of the ‘lowest’ level of the factor (females)
  • Its test shows the female mean is significantly different from 0, which is not important here
  • sexmales (1.795) is the difference between the intercept and the next level (males), playing the role of the slope
  • This difference is significant (t = 2.647, p = 0.0118), the same result as the t-test

Why use lm()? Because it is extendable.
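The correspondence between the two outputs can be verified directly. A minimal sketch, again on hypothetical simulated data (the `chaff` data frame is made up): it checks that the lm() intercept is the female mean, that `sexmales` is the difference in means, and that the t value (up to sign) and p-value match the t-test. The match relies on `var.equal = TRUE`; the default Welch t-test does not pool the variances and will generally give different numbers.

```r
# Hypothetical simulated data: 20 birds of each sex (values are made up)
set.seed(7)
chaff <- data.frame(
  sex  = factor(rep(c("females", "males"), each = 20)),
  mass = c(rnorm(20, mean = 20.5, sd = 2), rnorm(20, mean = 22.3, sd = 2))
)

tt  <- t.test(mass ~ sex, data = chaff, var.equal = TRUE)  # pooled-variance t-test
mod <- lm(mass ~ sex, data = chaff)
est <- summary(mod)$coefficients                            # Estimate, SE, t, p

group_means <- tapply(chaff$mass, chaff$sex, mean)

checks <- c(
  # Intercept = mean of the first ('lowest') level, females
  intercept_is_female_mean = isTRUE(all.equal(est["(Intercept)", "Estimate"],
                                              group_means[["females"]])),
  # sexmales = male mean minus female mean
  slope_is_difference = isTRUE(all.equal(est["sexmales", "Estimate"],
                                         group_means[["males"]] - group_means[["females"]])),
  # Same |t| and same p-value as the t-test
  same_t = isTRUE(all.equal(abs(est["sexmales", "t value"]), abs(unname(tt$statistic)))),
  same_p = isTRUE(all.equal(est["sexmales", "Pr(>|t|)"], tt$p.value))
)
print(checks)
```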

SLIDE 8

One-way ANOVA

    > modc <- aov(diameter ~ medium, data = culture)
    > summary(modc)
                Df Sum Sq Mean Sq F value  Pr(>F)
    medium       2 10.495  5.2473  6.1129 0.00646 **
    Residuals   27 23.177  0.8584
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

General form:
    mod <- aov(y ~ x, data = mydata)
    summary(mod)
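Each column of the ANOVA table follows from the ones before it: Mean Sq = Sum Sq / Df, F value = Mean Sq (medium) / Mean Sq (Residuals), and Pr(>F) is the upper tail of the F distribution on 2 and 27 df. A short R sketch using the numbers printed on the slide; differences in the last digit (e.g. 5.2475 vs 5.2473) are expected because the printed sums of squares are rounded.

```r
# Values copied from the summary(modc) table on the slide
ss_medium <- 10.495; df_medium <- 2
ss_resid  <- 23.177; df_resid  <- 27

ms_medium <- ss_medium / df_medium   # Mean Sq for medium
ms_resid  <- ss_resid / df_resid     # Mean Sq for Residuals
f_value   <- ms_medium / ms_resid    # F value
p_value   <- pf(f_value, df_medium, df_resid, lower.tail = FALSE)  # Pr(>F)

print(round(c(ms_medium = ms_medium, ms_resid = ms_resid,
              F = f_value, p = p_value), 5))
```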

SLIDE 9

Using lm():
    > modl <- lm(diameter ~ medium, data = culture)
    > summary(modl)

    Call:
    lm(formula = diameter ~ medium, data = culture)

    Residuals:
       Min     1Q Median     3Q    Max
    -1.541 -0.700 -0.080  0.424  1.949

    Coefficients:
                                   Estimate Std. Error t value Pr(>|t|)
    (Intercept)                     10.0700     0.2930  34.370  < 2e-16 ***
    mediumwith sugar                 0.1700     0.4143   0.410  0.68483
    mediumwith sugar + amino acids   1.3310     0.4143   3.212  0.00339 **
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.9265 on 27 degrees of freedom
    Multiple R-squared:  0.3117,  Adjusted R-squared:  0.2607
    F-statistic: 6.113 on 2 and 27 DF,  p-value: 0.00646

Using aov():
    > modc <- aov(diameter ~ medium, data = culture)
    > summary(modc)
                Df Sum Sq Mean Sq F value  Pr(>F)
    medium       2 10.495  5.2473  6.1129 0.00646 **
    Residuals   27 23.177  0.8584
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Reading the lm() output:
  • The intercept is the mean of the lowest level of the factor (control): control mean = 10.07
  • mediumwith sugar is the difference between the intercept and ‘with sugar’
  • mediumwith sugar + amino acids is the difference between the intercept and ‘with sugar + amino acids’
  • The t tests show whether these differences are significant (only the second is, p = 0.00339)
  • The F-statistic shows whether the ‘model’ (the one factor) is significant, and matches the aov() table (F = 6.113 on 2 and 27 DF, p = 0.00646)
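The agreement between aov() and lm() can be checked on any one-factor data set. A minimal sketch, assuming hypothetical simulated data laid out like `culture` (10 cultures per medium; the values are made up): it confirms that the aov() F value equals the lm() F-statistic, and that the lm() coefficients are the control mean followed by each medium's difference from it.

```r
# Hypothetical simulated data: 10 cultures in each of three media
set.seed(3)
lev <- c("control", "with sugar", "with sugar + amino acids")
culture <- data.frame(
  medium   = factor(rep(lev, each = 10), levels = lev),  # control is the first level
  diameter = rnorm(30, mean = rep(c(10.1, 10.2, 11.4), each = 10), sd = 0.9)
)

modc <- aov(diameter ~ medium, data = culture)
modl <- lm(diameter ~ medium, data = culture)

f_aov <- summary(modc)[[1]][["F value"]][1]    # F for 'medium' in the ANOVA table
f_lm  <- summary(modl)$fstatistic[["value"]]   # whole-model F from summary(lm)

means <- tapply(culture$diameter, culture$medium, mean)
b <- coef(modl)
from_means <- c(means[["control"]],
                means[["with sugar"]] - means[["control"]],
                means[["with sugar + amino acids"]] - means[["control"]])

print(c(f_aov = f_aov, f_lm = f_lm))
print(rbind(coefficient = b, from_cell_means = from_means))
```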

SLIDE 10

Effect of two factors on the wing length of butterflies
  • Response: wing length
  • Predictors: sex and spp (both categorical)

Two-way ANOVA

    > mod <- aov(winglen ~ sex * spp, data = butter)
    > summary(mod)
                Df Sum Sq Mean Sq F value   Pr(>F)
    sex          1 145.16 145.161  9.2717 0.004334 **
    spp          1 168.92 168.921 10.7893 0.002280 **
    sex:spp      1  67.08  67.081  4.2846 0.045692 *
    Residuals   36 563.63  15.656
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Mean wing length:
                 F.concocti  F.flappa
    females           31.37     24.67
    males             24.97     23.45

General form (either function):
    mod <- aov(y ~ x1 * x2, data = mydata)
    OR
    mod <- lm(y ~ x1 * x2, data = mydata)    3 tests: 2 main effects and the interaction
    mod <- lm(y ~ x1 + x2, data = mydata)    2 tests: 2 main effects only
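In an R model formula, `x1 * x2` is shorthand for `x1 + x2 + x1:x2`, which is why the `*` model gives three tests and the `+` model gives two. A quick check with base R's terms() function (`y`, `x1` and `x2` are placeholder names; no data are needed because the formula is only expanded, not fitted):

```r
# Expand each formula into the terms that will be tested
full     <- attr(terms(y ~ x1 * x2), "term.labels")
additive <- attr(terms(y ~ x1 + x2), "term.labels")
explicit <- attr(terms(y ~ x1 + x2 + x1:x2), "term.labels")

print(full)      # main effects x1 and x2, plus the interaction x1:x2
print(additive)  # main effects only
print(identical(full, explicit))
```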

SLIDE 11

Two-way ANOVA

Using lm():
    > mod2 <- lm(winglen ~ sex * spp, data = butter)
    > summary(mod2)

    Call:
    lm(formula = winglen ~ sex * spp, data = butter)

    Residuals:
       Min     1Q Median     3Q    Max
    -7.770 -3.095  0.090  2.920  6.530

    Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
    (Intercept)            31.370      1.251  25.071  < 2e-16 ***
    sexmales               -6.400      1.770  -3.617 0.000907 ***
    sppF.flappa            -6.700      1.770  -3.786 0.000560 ***
    sexmales:sppF.flappa    5.180      2.503   2.070 0.045692 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.957 on 36 degrees of freedom
    Multiple R-squared:  0.4034,  Adjusted R-squared:  0.3537
    F-statistic: 8.115 on 3 and 36 DF,  p-value: 0.0002949

Using aov():
    > mod <- aov(winglen ~ sex * spp, data = butter)
    > summary(mod)
                Df Sum Sq Mean Sq F value   Pr(>F)
    sex          1 145.16 145.161  9.2717 0.004334 **
    spp          1 168.92 168.921 10.7893 0.002280 **
    sex:spp      1  67.08  67.081  4.2846 0.045692 *
    Residuals   36 563.63  15.656
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Using anova() on the lm() model:
    > anova(mod2)
    Analysis of Variance Table

    Response: winglen
              Df Sum Sq Mean Sq F value   Pr(>F)
    sex        1 145.16 145.161  9.2717 0.004334 **
    spp        1 168.92 168.921 10.7893 0.002280 **
    sex:spp    1  67.08  67.081  4.2846 0.045692 *
    Residuals 36 563.63  15.656
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Mean wing length:
                 F.concocti  F.flappa
    females           31.37     24.67
    males             24.97     23.45

Reading the output:
  • summary(mod2): the F-statistic tests the whole model
  • anova(mod2): tests each explanatory variable, and gives the same table as summary() of the aov() model
  • With the interaction in the model, sexmales (-6.40) is the male minus female difference within F.concocti (the reference species), so its t test is not the same test as the sex row of the ANOVA table
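The four cell means can be rebuilt from the four lm() coefficients, which shows what each coefficient means when an interaction is present. A short R sketch using the estimates printed in summary(mod2):

```r
# Coefficient estimates copied from summary(mod2) on the slide
b0   <- 31.370  # (Intercept): females of F.concocti (the reference cell)
b_m  <- -6.400  # sexmales: male - female difference within F.concocti
b_f  <- -6.700  # sppF.flappa: F.flappa - F.concocti difference within females
b_mf <-  5.180  # sexmales:sppF.flappa: extra adjustment for males of F.flappa

cell_means <- matrix(
  c(b0,       b0 + b_f,
    b0 + b_m, b0 + b_m + b_f + b_mf),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("females", "males"), c("F.concocti", "F.flappa"))
)
print(cell_means)  # reproduces the table of mean wing lengths
```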

SLIDE 12

What are linear models?

  • The response can be continuous, discrete or categorical
  • Predictors can be continuous or categorical
  • The type of response (and of its “errors”), the type of predictors and the relationship between them determine the type of model
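As a sketch of how the type of response steers the choice of model, here is glm() applied to two non-continuous responses, assuming hypothetical simulated data. The Poisson family for counts and the binomial family for a 0/1 response are common choices used here for illustration only; they are not taken from this workshop, and the appropriate error distribution depends on the data.

```r
# Hypothetical simulated data: one continuous predictor and two kinds of response
set.seed(11)
n <- 50
dat <- data.frame(x = runif(n, min = 0, max = 10))
dat$count   <- rpois(n, lambda = exp(0.2 + 0.15 * dat$x))          # discrete counts
dat$present <- rbinom(n, size = 1, prob = plogis(-2 + 0.4 * dat$x))  # binary 0/1

fit_count <- glm(count ~ x, family = poisson, data = dat)     # log link by default
fit_bin   <- glm(present ~ x, family = binomial, data = dat)  # logit link by default

fams  <- sapply(list(fit_count, fit_bin), function(m) family(m)$family)
links <- sapply(list(fit_count, fit_bin), function(m) family(m)$link)
print(rbind(family = fams, link = links))
print(summary(fit_count)$coefficients)
```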

SLIDE 13

Key points

  • T-tests, ANOVA and regression are fundamentally the same
  • Collectively, they are the ‘general linear model’
  • This can be extended to the ‘generalised linear model’ for different types of response

Use:
    mod <- lm(response ~ explanatory1 * explanatory2, data = mydata)
    summary(mod)
    anova(mod)

    mod2 <- lm(response ~ explanatory1, data = mydata)
    summary(mod2)

SLIDE 14

Summary

  • Regression, t-tests and ANOVA are linear models and share the assumptions of classical linear models
  • In summary() output:
      • The intercept estimate is the mean of the lowest factor level (for a categorical predictor)
      • There is a significance test for each estimate
      • There is a significance test for the model as a whole
