Linear regression and t-tests Steve Bagley somgen223.stanford.edu


Nov 16, 2023


SLIDE 1

Linear regression and t-tests

Steve Bagley

somgen223.stanford.edu 1

SLIDE 2

Linear regression


SLIDE 3

Create data

library(tidyverse)  # provides tibble, ggplot2 and dplyr, used throughout
d <- tibble(height = 0:5, weight = 0.5 + 0:5 + runif(6, -0.5, 0.5))

In this dataset, weight = 0.5 + height + a random error.
runif generates random numbers from a uniform distribution; here each error lies between -0.5 and 0.5.
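Because runif draws new numbers on every run, the fitted values will differ from run to run unless a seed is set. A minimal sketch of a reproducible version follows; the seed value and the name d_demo are assumptions for illustration, and a base-R data frame stands in for the tibble.

```r
# Assumption: the seed value 1 is arbitrary; the slides set no seed for this dataset.
set.seed(1)

height <- 0:5
noise <- runif(6, min = -0.5, max = 0.5)  # six draws, each between -0.5 and 0.5

# Base-R data frame standing in for the tibble on the slide.
d_demo <- data.frame(height = height, weight = 0.5 + height + noise)

# Largest distance of any point from the noise-free line 0.5 + height.
max_dev <- max(abs(d_demo$weight - (0.5 + d_demo$height)))

print(d_demo)
print(max_dev)  # cannot exceed 0.5 by construction
```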


SLIDE 4

Plot the data

plot <- ggplot(d, aes(height, weight)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  expand_limits(y = 0)
plot

[Figure: scatter plot of weight (y-axis) against height (x-axis) with the fitted regression line.]

SLIDE 5

How to do a linear regression

reg <- lm(weight ~ height, data = d)
reg

Call:
lm(formula = weight ~ height, data = d)

Coefficients:
(Intercept)       height
     0.4463       1.1150

Note use of ~ here: weight ~ height
This is called the formula notation.
The variable on the left is the dependent variable.
The variable on the right is the independent variable.
They should be column names in the data argument.
The result shows the y-intercept and the coefficient of the height variable (the slope of the fitted line).
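For a single predictor, the two coefficients reported by lm can be checked against the closed-form least-squares formulas: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The sketch below uses hypothetical data (not the values from the slides) and the assumed names d_demo and reg_demo.

```r
# Hypothetical data for illustration; not the dataset from the slides.
d_demo <- data.frame(height = 0:5,
                     weight = c(0.4, 1.7, 2.3, 3.8, 4.4, 5.6))

# Formula notation: dependent variable ~ independent variable.
reg_demo <- lm(weight ~ height, data = d_demo)

# Closed-form least-squares estimates for one predictor.
slope <- cov(d_demo$height, d_demo$weight) / var(d_demo$height)
intercept <- mean(d_demo$weight) - slope * mean(d_demo$height)

print(coef(reg_demo))
print(c(intercept = intercept, slope = slope))

# The two routes should agree up to floating-point error.
print(all.equal(unname(coef(reg_demo)), c(intercept, slope)))
```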


SLIDE 6

How to get more information about the regression

summary(reg)

Call:
lm(formula = weight ~ height, data = d)

Residuals:
       1        2        3        4        5        6
-0.01798  0.14520 -0.27676  0.14624  0.04688 -0.04359

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.4463     0.1272    3.51   0.0247 *
height        1.1151     0.0420   26.55  1.2e-05 ***

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1757 on 4 degrees of freedom
Multiple R-squared: 0.9944, Adjusted R-squared: 0.9929
F-statistic: 704.8 on 1 and 4 DF, p-value: 1.197e-05
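The printed summary is meant for reading, but the same quantities can be pulled out of the summary object for use in code. This is a minimal sketch on hypothetical data; d_demo, reg_demo and the other variable names are assumptions for illustration.

```r
# Hypothetical data for illustration; not the dataset from the slides.
d_demo <- data.frame(height = 0:5,
                     weight = c(0.4, 1.7, 2.3, 3.8, 4.4, 5.6))
reg_demo <- lm(weight ~ height, data = d_demo)
s <- summary(reg_demo)

# Coefficient table as a matrix with columns
# Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- coef(s)
slope_p <- coef_table["height", "Pr(>|t|)"]

r2 <- s$r.squared                  # Multiple R-squared
rse <- s$sigma                     # residual standard error
df_resid <- df.residual(reg_demo)  # 6 observations - 2 coefficients

# With an intercept in the model, the residuals sum to zero
# (up to floating-point error).
resid_sum <- sum(residuals(reg_demo))

print(coef_table)
print(c(r2 = r2, rse = rse, df = df_resid, resid_sum = resid_sum))
```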

SLIDE 7

How to extract the coefficients

coefficients(reg)
(Intercept)      height
  0.4463429   1.1150495

coefficients(reg)[["(Intercept)"]]
[1] 0.4463429

coefficients(reg)[["height"]]
[1] 1.115049

coefficients returns a named vector.
Use [[ ]] to extract the values without the names.
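The difference between [ ] and [[ ]] on a named vector can be seen directly. The sketch below builds a named vector by hand from the coefficient values printed above; the variable names cf, with_name and bare_value are assumptions for illustration.

```r
# A named numeric vector holding the coefficient values printed on the slide.
cf <- c("(Intercept)" = 0.4463429, height = 1.1150495)

with_name <- cf["height"]     # single brackets keep the name
bare_value <- cf[["height"]]  # double brackets drop it

print(with_name)
print(bare_value)
print(names(with_name))   # "height"
print(names(bare_value))  # NULL
```

The bare value is usually what you want when passing a coefficient to sprintf or into arithmetic, since the name would otherwise be carried along.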


SLIDE 8

Add regression line information

plot + annotate("text", x = 1, y = 5,
                label = sprintf("y = %.4f + %.4f x",
                                coefficients(reg)[["(Intercept)"]],
                                coefficients(reg)[["height"]]))

[Figure: the scatter plot of weight against height with the text annotation "y = 0.4463 + 1.1150 x".]

SLIDE 9

Add regression line information (fancy)

plot + annotate("text", x = 1, y = 5,
                label = sprintf("italic(y) == %.4f + %.4f * italic(x)",
                                coefficients(reg)[["(Intercept)"]],
                                coefficients(reg)[["height"]]),
                parse = TRUE)

[Figure: the scatter plot of weight against height with the annotation typeset as a formula, y = 0.4463 + 1.115x, with y and x in italics.]

See ?plotmath for details
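With parse = TRUE the label string is treated as an R expression rather than as literal text, which is probably why the typeset label shows 1.115 where the plain-text label showed 1.1150: once parsed, 1.1150 is just the number 1.115. The sketch below reproduces that step with base R's parse; that ggplot2 parses the label in an equivalent way is my understanding, not something the slides state.

```r
# Build the same label string as on the slide, using the printed coefficients.
label <- sprintf("italic(y) == %.4f + %.4f * italic(x)", 0.4463429, 1.1150495)
print(label)  # the string still contains "1.1150"

# Parsing turns the string into an R call; numeric literals become numbers,
# so the trailing zero of 1.1150 is no longer part of the expression.
parsed <- parse(text = label)[[1]]
print(parsed)
print(class(parsed))
```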


SLIDE 10

Add other information (gratuitously ornate)

plot + annotate("text", x = 1, y = 5,
                label = "e^{pi * i} + 1 == 0", parse = TRUE)

[Figure: the scatter plot of weight against height with Euler's identity, e^(πi) + 1 = 0, drawn as a typeset annotation.]

See ?plotmath for details


SLIDE 11

Adding the regression info using package ggpubr

library(ggpubr)
ggscatter(d, x = "height", y = "weight",
          add = "reg.line", add.params = list(color = "blue")) +
  stat_regline_equation(label.x = 1, label.y = 5) +
  stat_cor(label.x = 1, label.y = 4.7)

[Figure: scatter plot of weight against height with a blue regression line and the labels "y = 0.45 + 1.1 x" and "R = 1, p = 1.2e-05".]
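The "R = 1" in the ggpubr label is a rounded value, not a perfect correlation. In a regression with a single predictor, the correlation coefficient is the square root of the Multiple R-squared (taking the sign of the slope), so it can be recovered from the summary output shown earlier. A short check, with variable names chosen for illustration:

```r
# Multiple R-squared as printed by summary(reg) on the earlier slide.
r_squared <- 0.9944

# The slope is positive, so R is the positive square root.
r <- sqrt(r_squared)

print(r)            # slightly below 1
print(round(r, 2))  # rounds to 1 at two decimal places
```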

SLIDE 12

Simple statistical tests


SLIDE 13

Create data

set.seed(13)
n <- 50
d2 <- tibble(value = c(rnorm(n, mean = 10, sd = 2), rnorm(n, mean = 11, sd = 2)),
             group = c(rep("control", times = n), rep("treatment", times = n)))
head(d2)
# A tibble: 6 x 2
  value group
  <dbl> <chr>
1 11.1  control
2  9.44 control
3 13.6  control
4 10.4  control
5 12.3  control
6 10.8  control

rnorm generates random numbers from a Gaussian distribution.
rep builds a vector by repeating values.
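A scaled-down sketch of the same construction may make the roles of rnorm and rep easier to see. It uses n = 3 instead of 50, and shows that the each argument of rep gives the same group labels as concatenating two calls with times; the variable names are assumptions for illustration.

```r
set.seed(13)
n <- 3  # small n so the whole vector is easy to read

# Two batches of Gaussian draws with different means, joined end to end.
value <- c(rnorm(n, mean = 10, sd = 2), rnorm(n, mean = 11, sd = 2))

# Group labels built as on the slide: two rep calls concatenated.
group <- c(rep("control", times = n), rep("treatment", times = n))

# Equivalent single call: repeat each element n times in turn.
group_each <- rep(c("control", "treatment"), each = n)

print(value)
print(group)
print(identical(group, group_each))
```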


SLIDE 14

Plot the data

ggplot(d2, aes(value, color = group)) +
  geom_histogram(aes(fill = group), position = "dodge", binwidth = 0.5)

[Figure: side-by-side histograms of value (x-axis) against count (y-axis), colored by group (control, treatment).]

SLIDE 15

Two sample t-test

d2_x <- d2 %>% filter(group == "control") %>% pull(value)
d2_y <- d2 %>% filter(group == "treatment") %>% pull(value)
t.test(d2_x, d2_y)

        Welch Two Sample t-test

data:  d2_x and d2_y
t = -2.247, df = 97.824, p-value = 0.02689
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.6164644 -0.1002824
sample estimates:
mean of x mean of y
 9.947163 10.805536

Here t.test is given two numeric vectors rather than a data frame, so pull is used to extract each group's values.
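As a cross-check, the Welch statistic can be computed by hand as the difference in means divided by sqrt(var(x)/n_x + var(y)/n_y). t.test also accepts a formula together with a data argument, which avoids splitting the data into vectors first. The sketch below uses base R only; d2_demo and the other names are assumptions for illustration, and a data frame stands in for the tibble.

```r
set.seed(13)
n <- 50
d2_demo <- data.frame(
  value = c(rnorm(n, mean = 10, sd = 2), rnorm(n, mean = 11, sd = 2)),
  group = c(rep("control", times = n), rep("treatment", times = n))
)

# Split into two numeric vectors, as on the slide but with base-R indexing.
x <- d2_demo$value[d2_demo$group == "control"]
y <- d2_demo$value[d2_demo$group == "treatment"]

by_vectors <- t.test(x, y)                           # two-vector interface
by_formula <- t.test(value ~ group, data = d2_demo)  # formula interface

# Welch t statistic by hand: difference in means over its standard error.
se <- sqrt(var(x) / length(x) + var(y) / length(y))
t_manual <- (mean(x) - mean(y)) / se

print(by_vectors$statistic)
print(by_formula$statistic)
print(t_manual)
```

All three values should agree, since the formula interface orders the groups alphabetically (control before treatment), matching the order of x and y here.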
