Rcourse: Linear model (Sonja Grath, Noémie Becker & Dirk Metzler)


slide-1
SLIDE 1

Rcourse: Linear model

Sonja Grath, Noémie Becker & Dirk Metzler
Winter semester 2014-15

slide-2
SLIDE 2

1. Background and basics
2. Analysis of variance
3. Model checking

slide-5
SLIDE 5

Background and basics

Intuitive linear regression

What is linear regression? It is the straight line that best approximates a set of points: y = a + b*x, where a is called the intercept and b the slope.

slide-7
SLIDE 7

Background and basics

Linear regression by eye

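We are given the following points:

x <- 0:8 ; y <- c(12,10,8,11,6,7,2,3,3) ; plot(x,y)

[Figure: scatter plot of y against x]

By eye, the line runs from about (0, 12) down to (8, 2), so we would say a = 12 and b = (2-12)/8 = -1.25. The slope is negative because y decreases as x increases.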
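To make the by-eye guess concrete, here is a minimal sketch that draws the guessed line over the points and measures how far the points lie from it. It assumes the guess is the line through (0, 12) and (8, 2); the names a_eye, b_eye and rss_eye are made up for this illustration.

```r
# Points from the slide
x <- 0:8
y <- c(12, 10, 8, 11, 6, 7, 2, 3, 3)

# By-eye guess (assumption): the line through (0, 12) and (8, 2)
a_eye <- 12
b_eye <- (2 - 12) / 8   # change in y divided by change in x

plot(x, y)
abline(a = a_eye, b = b_eye, lty = 2)   # dashed by-eye line

# Sum of squared vertical distances between the points and the by-eye line
rss_eye <- sum((y - (a_eye + b_eye * x))^2)
print(rss_eye)
```

The sum of squared distances gives a single number by which different candidate lines can be compared.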

slide-10
SLIDE 10

Background and basics

Best fit in R

y is modelled as a function of x. In R this job is done by the function lm(). Let's try it on the R console.

[Figure: plot of y against x]

The linear model does not explain all of the variation. The remaining error is called the "residual". Linear regression chooses the line that minimizes this error. But do you remember how we do this?
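As a sketch of what trying this on the console might look like, the following fits the example points with lm() and inspects the residuals. The object names fit, res and rss are chosen here for illustration only.

```r
# Points from the earlier slide
x <- 0:8
y <- c(12, 10, 8, 11, 6, 7, 2, 3, 3)

# Fit y as a linear function of x
fit <- lm(y ~ x)

# Estimated intercept and slope
print(coef(fit))

# Residuals: observed y minus the value on the fitted line
res <- residuals(fit)
print(round(res, 2))

# The quantity that the fit makes as small as possible
rss <- sum(res^2)
print(rss)
```

The residuals do not vanish, which is the sense in which the model does not explain all of the variation.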

slide-11
SLIDE 11

Background and basics

Statistics

We define the linear regression $y = \hat{a} + \hat{b} \cdot x$ by minimizing the sum of the squared residuals:

$$(\hat{a}, \hat{b}) = \arg\min_{(a,b)} \sum_i \left(y_i - (a + b \cdot x_i)\right)^2$$

This assumes that a and b exist such that, for all $(x_i, y_i)$,

$$y_i = a + b \cdot x_i + \varepsilon_i,$$

where all $\varepsilon_i$ are independent and follow the normal distribution with mean 0 and variance $\sigma^2$.
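The arg min can also be approximated numerically, which may help to see that least squares is an ordinary minimization problem. The sketch below hands the sum of squared residuals to R's general-purpose optimizer optim(); this is not how lm() computes the fit, and the names rss and opt as well as the starting point c(0, 0) are assumptions made for the illustration.

```r
# Points from the earlier slide
x <- 0:8
y <- c(12, 10, 8, 11, 6, 7, 2, 3, 3)

# Sum of squared residuals as a function of par = c(a, b)
rss <- function(par) sum((y - (par[1] + par[2] * x))^2)

# Generic numerical minimizer, started at a = 0, b = 0
opt <- optim(par = c(0, 0), fn = rss, method = "BFGS")
print(opt$par)

# For comparison: the least-squares estimates from lm()
print(coef(lm(y ~ x)))
```

Both approaches should agree up to the numerical tolerance of the optimizer.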

slide-13
SLIDE 13

Background and basics

Statistics

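We estimate a and b by calculating

$$(\hat{a}, \hat{b}) := \arg\min_{(a,b)} \sum_i \left(y_i - (a + b \cdot x_i)\right)^2$$

We can calculate $\hat{a}$ and $\hat{b}$ by

$$\hat{b} = \frac{\sum_i (y_i - \bar{y}) \cdot (x_i - \bar{x})}{\sum_i (x_i - \bar{x})^2} = \frac{\sum_i y_i \cdot (x_i - \bar{x})}{\sum_i (x_i - \bar{x})^2}$$

and

$$\hat{a} = \bar{y} - \hat{b} \cdot \bar{x}.$$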
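The closed-form estimates can be checked directly on the example points. The sketch below evaluates both versions of the slope formula (they agree because the deviations of x from its mean sum to zero) and compares the result with lm(). The variable names x_bar, y_bar, b_hat, b_hat_alt and a_hat are chosen for this illustration.

```r
# Points from the earlier slide
x <- 0:8
y <- c(12, 10, 8, 11, 6, 7, 2, 3, 3)

x_bar <- mean(x)
y_bar <- mean(y)

# Slope: first form of the formula
b_hat <- sum((y - y_bar) * (x - x_bar)) / sum((x - x_bar)^2)

# Slope: second form, without centering y
b_hat_alt <- sum(y * (x - x_bar)) / sum((x - x_bar)^2)

# Intercept
a_hat <- y_bar - b_hat * x_bar

print(c(a_hat, b_hat, b_hat_alt))

# For comparison: estimates from lm()
print(coef(lm(y ~ x)))
```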

slide-15
SLIDE 15

Background and basics

Back to our example

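The commands used to produce this graph are the following:

[Figure: plot of y against x with the regression line and vertical segments marking the residuals]

regr.obj <- lm(y ~ x)          # fit the regression of y on x
fitted <- predict(regr.obj)    # fitted values on the regression line
plot(x,y); abline(regr.obj)    # scatter plot plus fitted line
for(i in 1:9) { lines(c(x[i],x[i]),c(y[i],fitted[i])) }    # one vertical segment per residual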

slide-18
SLIDE 18

Analysis of variance

Reminder: ANOVA

I am sure you all remember from your statistics courses: we observe different mean values for different groups.

[Figure: two panels showing the observed values for Group 1, Group 2 and Group 3; first panel: high variability within groups; second panel: low variability within groups]

Could the difference be just by chance? It depends on the variability of the group means and of the values within groups.
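The role of within-group variability can be explored with simulated data. In the sketch below, both data sets use the same assumed group means (0, 1, 2) and the same random noise, scaled once by 2 and once by 0.3. All names, the group means, the scaling factors and the seed are assumptions for illustration and are not taken from the slides.

```r
set.seed(1)
n <- 10                                   # observations per group
group <- factor(rep(c("Group 1", "Group 2", "Group 3"), each = n))
mu <- rep(c(0, 1, 2), each = n)           # assumed true group means
z <- rnorm(3 * n)                         # shared standard normal noise

y_high <- mu + 2.0 * z                    # high variability within groups
y_low  <- mu + 0.3 * z                    # low variability within groups

tab_high <- anova(lm(y_high ~ group))
tab_low  <- anova(lm(y_low ~ group))
print(tab_high)
print(tab_low)

# Within-group (residual) sums of squares of the two data sets
rss_high <- tab_high["Residuals", "Sum Sq"]
rss_low  <- tab_low["Residuals", "Sum Sq"]
print(c(rss_high, rss_low))
```

Comparing the two printed tables shows how the same group means are judged against very different amounts of within-group scatter.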

slide-21
SLIDE 21

Analysis of variance

Reminder: ANOVA

ANOVA table ("ANalysis Of VAriance")

             Degrees of      Sum of          Mean sum of        F-value
             freedom (DF)    squares (SS)    squares (SS/DF)
Groups       1               88.82           88.82              30.97
Residuals    7               20.07           2.87

Under the hypothesis H0 "the group mean values are equal" (and the values are normally distributed), F is Fisher-distributed with 1 and 7 DF, and $p = \mathrm{Fisher}_{1,7}([30.97, \infty)) \approx 8 \cdot 10^{-4}$. We can reject H0.
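The sums of squares in this table coincide with those obtained from regressing y on x in the earlier example, so, assuming that is where the table comes from, it can be reproduced as sketched below. In the R output the row labelled x plays the role of the "Groups" row. The names tab, f_value and p_value are chosen for this illustration.

```r
# Points from the regression example
x <- 0:8
y <- c(12, 10, 8, 11, 6, 7, 2, 3, 3)

# ANOVA table of the regression
tab <- anova(lm(y ~ x))
print(tab)

# F value: mean sum of squares of the effect divided by that of the residuals
f_value <- tab["x", "F value"]

# p-value: upper tail of the F distribution with 1 and 7 degrees of freedom
p_value <- pf(f_value, df1 = 1, df2 = 7, lower.tail = FALSE)
print(p_value)
```

The p-value computed with pf() is the same number that anova() reports in its Pr(>F) column.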

slide-23
SLIDE 23

Analysis of variance

ANOVA in R

In R, ANOVA is performed using summary.aov() and summary(). Both functions are applied to a regression object, i.e. the result of the command lm(). summary.aov() gives you only the ANOVA table, whereas summary() also outputs other information such as the residuals, R-squared, etc. Let's see a couple of examples with self-generated data in R.
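One possible example with self-generated data is sketched below. The group labels, group means, sample size and seed are arbitrary assumptions; they are not the data used in the course.

```r
set.seed(42)

# Self-generated data: three groups of 10 values with assumed means 5, 6 and 8
group <- factor(rep(c("A", "B", "C"), each = 10))
value <- rnorm(30, mean = rep(c(5, 6, 8), each = 10), sd = 1)

# Linear model with the group as explanatory variable
fit <- lm(value ~ group)

# Only the ANOVA table
aov_tab <- summary.aov(fit)
print(aov_tab)

# Coefficients, residual summary, R-squared, overall F test
full <- summary(fit)
print(full)
print(full$r.squared)
```

With three groups of 10 values, the table has 2 degrees of freedom for the group effect and 27 for the residuals.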

slide-27
SLIDE 27

Model checking

Model checking

When you fit a linear model, you have to check not only the p-values of your effects but also the variance and the normality of the residuals. Why? Because we assumed in our model that the residuals are normally distributed and have the same variance. In R you can check this directly by calling plot() on your regression object. Let's try it on one example. We will focus on the first two graphs.
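A minimal sketch of this check, using the regression from the earlier example (an assumption; the course may have used other data). The which argument of plot() selects the first two diagnostic graphs, residuals against fitted values and the normal Q-Q plot.

```r
# Regression from the earlier example
x <- 0:8
y <- c(12, 10, 8, 11, 6, 7, 2, 3, 3)
regr.obj <- lm(y ~ x)

# Show the first two diagnostic graphs side by side
old_par <- par(mfrow = c(1, 2))
plot(regr.obj, which = 1:2)   # 1: residuals vs fitted, 2: normal Q-Q
par(old_par)                  # restore the previous plot layout
```

Calling plot(regr.obj) without which steps through further diagnostic graphs as well.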

slide-30
SLIDE 30

Model checking

Model checking: Good example

This is what it should look like: on the first graph, we should see no trend (equal variance). On the second graph, the points should be close to the line (normality).

slide-32
SLIDE 32

Model checking

Model checking: Bad example

This is a more problematic case: what do you conclude?
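The figure from the slide is not reproduced here. As a purely hypothetical illustration of one way the diagnostics can go wrong, the sketch below simulates data whose noise grows with x, which violates the equal-variance assumption. The data-generating formula, the seed and the name fit_bad are assumptions and do not correspond to the example shown in the course.

```r
set.seed(7)
n <- 100

# Hypothetical data: the standard deviation of the noise grows with x
x <- runif(n, min = 1, max = 10)
y <- 2 + 3 * x + rnorm(n, sd = 0.8 * x)

fit_bad <- lm(y ~ x)

# First two diagnostic graphs, as in the good example
old_par <- par(mfrow = c(1, 2))
plot(fit_bad, which = 1:2)   # 1: residuals vs fitted, 2: normal Q-Q
par(old_par)                 # restore the previous plot layout
```

Looking at how the spread of the residuals changes along the fitted values in the first graph is a good starting point for the question above.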