Rcourse: Basic statistics with R

Sonja Grath, Noémie Becker & Dirk Metzler, Winter semester 2014-15

Contents

1 Theory of statistical tests
2 Test for a difference in means
3 Testing for dependence: nominal variables, continuous variables, ordinal variables
4 Power of a test
5 Degrees of freedom

Theory of statistical tests

A simple example

You want to show that a treatment is effective. You have data for two groups of patients, one with and one without treatment. 80% of the patients with treatment recovered, whereas only 30% of the patients without treatment recovered. A pessimist would say that this difference just happened by chance. How do you convince the pessimist? You assume he is right and show that, under this hypothesis, data like yours would be very unlikely.
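To make the pessimist's argument concrete, here is a minimal R sketch. The group sizes are hypothetical (20 patients per group, so 16 of 20 treated and 6 of 20 untreated patients recover, matching 80% and 30%); the example above does not give them. If the pessimist is right, the treatment labels are irrelevant, so we shuffle them many times and count how often chance alone produces a difference at least as large as the observed one.

```r
set.seed(1)
# Hypothetical data (group sizes are an assumption): 1 = recovered, 0 = not
recovered <- c(rep(1, 16), rep(0, 4),   # treated: 16 of 20 recover (80%)
               rep(1, 6),  rep(0, 14))  # untreated: 6 of 20 recover (30%)
group <- rep(c("treated", "untreated"), each = 20)

# Test statistic: difference in the number of recoveries between groups
# (equal group sizes, so this is 20 times the difference in proportions)
diff_recovered <- function(labels) {
  sum(recovered[labels == "treated"]) - sum(recovered[labels == "untreated"])
}
obs_diff <- diff_recovered(group)   # 16 - 6 = 10

# Under H0 ("it happened by chance") the treatment labels are irrelevant,
# so shuffling them shows which differences chance alone produces
perm_diffs <- replicate(10000, diff_recovered(sample(group)))

# p-value: fraction of shuffles at least as extreme as the observation
p_perm <- mean(abs(perm_diffs) >= abs(obs_diff))
p_perm
```

With these hypothetical group sizes p_perm is far below 5%, so the pessimist is refuted; with only a handful of patients per group, the same percentages could easily arise by chance.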

In statistical words

What you want to show is the alternative hypothesis H1. The pessimist's claim (it happened by chance) is the null hypothesis H0.
Show that the observation, together with everything more 'extreme', is sufficiently unlikely under this null hypothesis. By convention, scientists accept this as sufficient if the probability is at most 5%. This refutes the pessimist.
In statistical language: we reject the null hypothesis at the 5% significance level.
p = P(observation or anything more 'extreme' | H0 is true)
If the p-value is above 5%, you cannot reject the null hypothesis (which does not prove that H0 is true).
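The definition of p can be checked numerically. In this minimal sketch the sample is simulated (the values are made up) and H0 states that the true mean is 0. t.test() reports a p-value, and we recompute it as the probability, under H0, of a t statistic at least as extreme as the observed one, using the t distribution function pt().

```r
set.seed(42)
# Made-up sample; H0: the true mean is 0
x <- rnorm(12, mean = 0.8, sd = 1)

res <- t.test(x, mu = 0)
t_obs <- unname(res$statistic)   # observed test statistic
df <- unname(res$parameter)      # degrees of freedom (n - 1 = 11)

# p = P(|T| >= |t_obs| given H0): both tails count as "more extreme"
p_manual <- 2 * pt(-abs(t_obs), df = df)

c(p_from_t.test = res$p.value, p_manual = p_manual)
res$p.value < 0.05   # TRUE means: reject H0 at the 5% level
```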

Statistical tests in R

R offers a huge variety of statistical tests. We cover the most basic ones in this lecture; you can find a non-exhaustive list in your lecture notes.

Test for a difference in means

Student's t-test: outline

What is given? Independent observations (x1, ..., xn) and (y1, ..., ym).
Null hypothesis: x and y are samples from distributions with the same mean.
R command: t.test(x, y)
Idea of the test: if the sample means are too far apart, relative to the variability in the data, reject the null hypothesis.
Approximate test, but rather robust.
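A minimal sketch of what t.test(x, y) computes by default, namely Welch's version, which does not assume equal variances. The data are simulated for illustration. The statistic divides the difference in sample means by its estimated standard error, and the degrees of freedom come from the Welch-Satterthwaite approximation.

```r
set.seed(7)
# Simulated samples (illustrative only)
x <- rnorm(15, mean = 10, sd = 2)
y <- rnorm(20, mean = 11, sd = 3)

res <- t.test(x, y)   # default: Welch two-sample t-test (var.equal = FALSE)

# Squared standard errors of the two sample means
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# t statistic: difference in means divided by its standard error
t_manual <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation of the degrees of freedom
df_manual <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

p_manual <- 2 * pt(-abs(t_manual), df = df_manual)
c(t = t_manual, df = df_manual, p = p_manual)
res$p.value
```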

Martian example

Dataset containing the heights of Martians of different colours. See the code on the R console. We cannot reject the null hypothesis. The test is unpaired because the two samples are independent.

Shoe example

Dataset containing the wear of shoes made of two materials, A and B. The same persons wore both types of shoes, and we have a measure of wear for each shoe. The test is paired because some persons cause more damage to their shoes than others; comparing A and B within each person removes this variation. See the code on the R console. We can reject the null hypothesis.
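The slides do not name the dataset. Assuming it is the `shoes` data from the MASS package (a list with components A and B, the wear of the two materials measured on the same persons), the paired and unpaired analyses can be compared as sketched below. A paired t-test is the same as a one-sample t-test on the within-person differences.

```r
library(MASS)   # assumption: the shoe data are MASS::shoes (list with A and B)

A <- shoes$A
B <- shoes$B

# Paired test: each person wore both materials
paired <- t.test(A, B, paired = TRUE)

# Equivalent one-sample test on the within-person differences
on_diffs <- t.test(A - B, mu = 0)

# Unpaired (Welch) test ignores the pairing
unpaired <- t.test(A, B)

c(paired = paired$p.value, unpaired = unpaired$p.value)
```

With these data, the unpaired test cannot detect the difference, because the person-to-person variation in wear is much larger than the difference between the materials; the paired test removes that variation.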

Test for (un)equality of variances

t.test() has an option var.equal that controls whether the variances of the two samples are assumed to be equal. The default value is FALSE, which gives Welch's t-test. If you have a good biological reason, you can assume that the variances are equal (var.equal = TRUE). You can test for equality of variances with an F test, using the command var.test. Let's see an example on the R console.
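A minimal sketch with simulated data (the sample sizes and standard deviations are arbitrary choices). var.test() performs an F test whose statistic is the ratio of the two sample variances. Setting var.equal = TRUE in t.test() switches from Welch's test to the classical pooled-variance Student's t-test, whose degrees of freedom are n1 + n2 - 2.

```r
set.seed(3)
# Simulated samples with clearly different spreads (illustrative only)
x <- rnorm(25, mean = 5, sd = 1)
y <- rnorm(25, mean = 5, sd = 3)

vt <- var.test(x, y)            # F test, H0: the two variances are equal
F_manual <- var(x) / var(y)     # the F statistic is the variance ratio

welch  <- t.test(x, y)                    # default: var.equal = FALSE
pooled <- t.test(x, y, var.equal = TRUE)  # assumes equal variances

vt$p.value
c(welch_df = unname(welch$parameter), pooled_df = unname(pooled$parameter))
```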

Testing for dependence

The test depends on the data type:
Nominal variables: not ordered, like eye colour or gender
Ordinal variables: ordered but not continuous, like the result of rolling a die
Continuous variables: like body height

Testing for dependence: nominal variables

Nominal variables: outline

What is given? Pairwise observations (x1, y1), (x2, y2), ..., (xn, yn).
Null hypothesis: x and y are independent.
Test: χ² test
R command: chisq.test(x, y) or chisq.test(contingency table)
Idea of the test: calculate the expected abundances under the assumption of independence. If the observed abundances deviate too much from the expected ones, reject the null hypothesis.
Approximate test; see the conditions in the lecture notes.

Nominal variables: Example

contingency <- matrix(c(47, 3, 8, 42, 60, 15, 8, 33, 3), nrow = 3)
chisq.test(contingency)$expected
See the R console. All expected abundances are above 5, so we may apply the test:
chisq.test(contingency)
We reject the null hypothesis that the two variables are independent.
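To see what chisq.test() does internally, this sketch recomputes the expected abundances and the test statistic by hand for the same contingency table. Under independence, the expected count in a cell is (row total × column total) / grand total; the statistic sums (observed - expected)² / expected over all cells and is compared with a χ² distribution with (rows - 1)(columns - 1) degrees of freedom.

```r
contingency <- matrix(c(47, 3, 8, 42, 60, 15, 8, 33, 3), nrow = 3)

n <- sum(contingency)
# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(contingency), colSums(contingency)) / n

# Pearson's chi-square statistic and its degrees of freedom
chi2 <- sum((contingency - expected)^2 / expected)
df <- (nrow(contingency) - 1) * (ncol(contingency) - 1)
p_manual <- pchisq(chi2, df = df, lower.tail = FALSE)

res <- chisq.test(contingency)
min(expected)   # smallest expected count; should exceed 5 for the approximation
c(chi2 = chi2, df = df, p = p_manual)
```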

Nominal variables: Fisher's exact test

For 2 × 2 contingency tables, the chi-square approximation is not needed and we can use Fisher's exact test:
tab <- matrix(c(14, 10, 21, 3), nrow = 2)
fisher.test(tab)
See the R console. We reject the null hypothesis that the two variables are independent.
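Fisher's test is exact because, with all row and column totals fixed, the count in the top-left cell follows a hypergeometric distribution under H0. This small sketch checks that for the same 2 × 2 table with phyper(); the one-sided alternative "less" is used only because its p-value has a simple closed form.

```r
tab <- matrix(c(14, 10, 21, 3), nrow = 2)

x <- tab[1, 1]        # observed top-left count (14)
m <- sum(tab[, 1])    # first column total (24)
n <- sum(tab[, 2])    # second column total (24)
k <- sum(tab[1, ])    # first row total (35)

# P(top-left count <= 14 | margins fixed, H0 true)
p_less_manual <- phyper(x, m, n, k)

p_less <- fisher.test(tab, alternative = "less")$p.value
p_two  <- fisher.test(tab)$p.value   # default two-sided test
c(p_less = p_less, p_less_manual = p_less_manual, p_two_sided = p_two)
```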

Testing for dependence: continuous variables

Continuous variables: outline

What is given? Pairwise observations (x1, y1), (x2, y2), ..., (xn, yn).
Null hypothesis: x and y are independent.
Test: Pearson's correlation test for independence
Assumption: x and y are samples from a normal distribution (strictly, the pairs (x, y) should be bivariate normal; then zero correlation is equivalent to independence).
R command: cor.test(x, y)

Continuous variables: Example

Stopping distance of cars as a function of their speed. This dataset is pre-installed in R and can be loaded with the command data(cars). We reject the null hypothesis that the correlation is equal to 0.
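For the cars data, a minimal sketch of what cor.test() computes with its default method = "pearson": the sample correlation r and the statistic t = r * sqrt((n - 2) / (1 - r²)), which follows a t distribution with n - 2 degrees of freedom when the true correlation is 0.

```r
data(cars)   # speed (mph) and stopping distance (ft) of 50 cars

res <- cor.test(cars$speed, cars$dist)   # Pearson's correlation test

r <- cor(cars$speed, cars$dist)
n <- nrow(cars)
t_manual <- r * sqrt((n - 2) / (1 - r^2))
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

c(r = r, t = t_manual, p = p_manual)
res$conf.int   # 95% confidence interval for the correlation
```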

Testing for normality

Pearson's correlation test assumes that the variables are normally distributed. When this is not the case, you can change the option method from its default "pearson" to "kendall" or "spearman" to use a rank-based correlation test instead. To test for deviation from normality, you can apply a Shapiro-Wilk test with the command shapiro.test. Let's see an example on the R console. The speed measurements do not deviate significantly from normality, but the distance variable does.
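The checks described above can be run on the cars data as in this sketch. exact = FALSE is set because cars contains tied values, for which R cannot compute exact rank-based p-values (it would otherwise warn and fall back to an approximation).

```r
data(cars)

# Shapiro-Wilk test, H0: the sample comes from a normal distribution
sw_speed <- shapiro.test(cars$speed)
sw_dist  <- shapiro.test(cars$dist)
c(speed = sw_speed$p.value, dist = sw_dist$p.value)

# Rank-based alternatives to Pearson's correlation test
spearman <- cor.test(cars$speed, cars$dist, method = "spearman", exact = FALSE)
kendall  <- cor.test(cars$speed, cars$dist, method = "kendall",  exact = FALSE)
c(rho = unname(spearman$estimate), tau = unname(kendall$estimate))
```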

Testing for dependence: ordinal variables

Ordinal variables: outline

What is given? Pairwise observations (x1, y1), (x2, y2), ..., (xn, yn); the values can be ordered.
Null hypothesis: x and y are uncorrelated.
Test: Spearman's rank correlation rho
R command: cor.test(x, y, method = "spearman")

Ordinal variables: Example

Number of important scientific discoveries or inventions per year. This dataset is pre-installed in R and can be loaded with the command data(discoveries).

[Figure: discoveries per year plotted against Time, 1860 to 1960]

We reject the null hypothesis that the correlation is equal to 0: there is a significant negative correlation between the year and the number of discoveries.
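A sketch of the Spearman test for this example, correlating the year with the number of discoveries (we assume this is the pairing behind the result above). Spearman's rho is Pearson's correlation applied to the ranks, which the last line checks; exact = FALSE again avoids the warning caused by ties.

```r
data(discoveries)   # yearly counts, 1860 to 1959 (a ts object)

year  <- as.numeric(time(discoveries))
count <- as.numeric(discoveries)

res <- cor.test(year, count, method = "spearman", exact = FALSE)
res$estimate   # Spearman's rho
res$p.value

# Spearman's rho = Pearson's correlation of the ranks
cor(rank(year), rank(count))
```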

Power of a test

Definition

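There are two types of error for a statistical test:
Type I error (error of the first kind, alpha error, false positive): rejecting H0 when it is true. Its probability is at most the significance level α.
Type II error (error of the second kind, beta error, false negative): failing to reject H0 when it is false. Its probability is denoted β.
Power of a test = 1 - β, the probability of rejecting H0 when it is false. If the power is 0, you will never reject H0, even when H1 is true.
The power is computed for a specific alternative H1 (for example a specific difference in means), so the choice of H1 matters. In general, the power increases with the sample size.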
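Both error types can be estimated by simulation. This minimal sketch repeats a t-test many times; the sample size of 20 per group, the standard deviation of 1 and the true difference of 1 are arbitrary choices. When H0 is true, the rejection rate estimates the type I error rate α; when H1 is true, it estimates the power 1 - β.

```r
set.seed(123)
n_sim <- 2000
alpha <- 0.05

# Fraction of simulated experiments in which t.test() rejects H0,
# when the true difference in means is `delta` (n observations per group)
reject_rate <- function(delta, n = 20) {
  rejected <- replicate(n_sim, {
    x <- rnorm(n, mean = 0, sd = 1)
    y <- rnorm(n, mean = delta, sd = 1)
    t.test(x, y)$p.value < alpha
  })
  mean(rejected)
}

type1 <- reject_rate(delta = 0)  # H0 true: estimates the type I error rate
pow   <- reject_rate(delta = 1)  # H1 true: estimates the power 1 - beta
c(type_I_error = type1, power = pow)
```

Rerunning reject_rate() with a larger n should show the power increasing while the type I error rate stays near 5%.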

Power in R

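Use the function power.t.test() to calculate the minimal sample size needed to detect a given difference in means with a given power. For Fisher's exact test, power.fisher.test() (in package statmod) estimates the power for given sample sizes by simulation; trying several sample sizes gives the minimal one. We will try this during the exercise session.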
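For example, a sketch using power.t.test(): leaving n unspecified asks R for the sample size per group needed to detect a difference in means delta = 1 with standard deviation sd = 1 (both values chosen for illustration) at significance level 0.05 with power 0.8. The result is not an integer, so round it up.

```r
res <- power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.8)
res$n              # required sample size per group (not an integer)
ceiling(res$n)     # round up to the next whole observation

# Conversely, with n fixed, power.t.test() returns the power
power.t.test(n = ceiling(res$n), delta = 1, sd = 1)$power
```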

Degrees of freedom

Concept

You may have noticed a value named df in our test results. Do you know what degrees of freedom are?
Let's try an example. How many degrees of freedom does the vector x = (x1, x2, x3, x4, x5) have? 5.
How many degrees of freedom does the vector x - mean(x) have? 4: its entries always sum to 0, so the last one is determined by the other four.
Definition: degrees of freedom of a sample = the sample size minus the number of parameters estimated from the sample.
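The example can be checked in R. In this minimal sketch the five values are arbitrary: the deviations from the mean always sum to zero, so the fifth deviation is fixed once the first four are known. This is also why var() divides by n - 1, and why a one-sample t-test on n values reports df = n - 1.

```r
x <- c(2.1, 3.4, 1.9, 4.2, 2.8)   # arbitrary example values
n <- length(x)

d <- x - mean(x)      # deviations from the estimated mean
sum(d)                # zero (up to rounding): one constraint
-sum(d[1:4])          # the fifth deviation is determined by the other four
d[5]

# var() divides by the n - 1 = 4 degrees of freedom, not by n = 5
c(var = var(x), by_hand = sum(d^2) / (n - 1))

t.test(x)$parameter   # df = n - 1 = 4
```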