SLIDE 1

R course Tuesday, March 12 2013

SOME STATISTICAL TESTS

SLIDE 2

Overview

  • Theory of statistical tests
  • Test for a difference in mean
  • Test for dependence

– Nominal variables
– Continuous variables
– Ordinal variables

  • Power of a test
  • Degrees of freedom
SLIDE 4

Theory of statistical tests

  • Read the corresponding section in the lecture notes
SLIDE 5

Overview

  • Theory of statistical tests
  • Test for a difference in mean
  • Test for dependence

– Nominal variables
– Continuous variables
– Ordinal variables

  • Power of a test
  • Degrees of freedom
SLIDE 6

Test for a difference in means: t-test

  • Outline of the test

– What is given? Independent observations (x1 , . . . , xn ) and (y1 , . . . , ym ).
– Null hypothesis: x and y are samples from distributions having the same mean.
– Test: t-test
– R command: t.test( x, y )
– Idea of the test: if the sample means are too far apart, then reject the null hypothesis.
– Approximate test, but rather robust

SLIDE 7

Test for a difference in means: t-test

  • Ex 1: Martians

– Dataset containing the height of Martians of different colors
– Reject the null hypothesis
– It was an unpaired t-test (no dependence between the 2 samples)

> mars <- read.table("mars.txt", header=TRUE)
> head(mars)
      size color
1 65.67974   red
2 65.90436   red
3 67.34730   red
4 60.42924   red
5 55.34526   red
6 62.85024   red
> attach(mars)
> t.test(size[color=="green"], size[color=="blue"])

        Two Sample t-test

data:  size[color == "green"] and size[color == "blue"]
t = -3.4244, df = 19.419, p-value = 0.002775
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -16.875514  -4.083647
sample estimates:
mean of x mean of y
 60.86840  71.34798

SLIDE 9

Test for a difference in means: t-test

  • Ex 2: shoe wear

– Dataset containing the wear of shoes made of 2 materials, A and B
– Paired test because some boys will cause more damage to their shoes than others
– Reject the null hypothesis

> data(shoes, package='MASS')
> attach(shoes)
> head(shoes)
$A
 [1] 13.2  8.2 10.9 14.3 10.7  6.6  9.5 10.8  8.8 13.3
$B
 [1] 14.0  8.8 11.2 14.2 11.8  6.4  9.8 11.3  9.3 13.6
> t.test(A, B, paired=TRUE)

        Paired t-test

data:  A and B
t = -3.3489, df = 9, p-value = 0.008539
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.6869539 -0.1330461
sample estimates:
mean of the differences
                  -0.41
SLIDE 11

Test for a difference in means: t-test

  • Related tests that might be of interest

– var.test() to test for equality of variances
  → this way you can set the var.equal option in t.test() accordingly
– shapiro.test() to test for normality, for example before doing a Pearson correlation
  (the null hypothesis of the Shapiro test is that the data are normally distributed)
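The two checks above can be chained before the t-test; a minimal sketch on simulated data (the means, standard deviations, and sample sizes are illustrative assumptions, not from the course):

```r
# Simulated data: same mean, different variances (an assumption of this sketch)
set.seed(42)
x <- rnorm(30, mean = 0, sd = 1)
y <- rnorm(30, mean = 0, sd = 2)

shapiro.test(x)                   # H0: x comes from a normal distribution
var.test(x, y)                    # H0: the two variances are equal
t.test(x, y, var.equal = FALSE)   # Welch t-test, safe when variances differ
```

If var.test() gives a large p-value, you may set var.equal = TRUE instead.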

SLIDE 12

Overview

  • Theory of statistical tests
  • Test for a difference in mean
  • Test for dependence

– Nominal variables
– Continuous variables
– Ordinal variables

  • Power of a test
  • Degrees of freedom
SLIDE 13

Test for dependence

  • The test depends on the data type

– Nominal variables (not ordered, like eye color or gender)
– Ordinal variables (ordered but not continuous, like the result of a die roll)
– Continuous variables (like body height)
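In R the three data types map onto unordered factors, ordered factors, and numeric vectors; a minimal sketch with hypothetical values:

```r
# Nominal: unordered factor (no ranking among levels)
eye_color <- factor(c("blue", "brown", "green", "brown"))
# Ordinal: ordered factor (levels have a rank but no distances)
dice <- factor(c(2, 5, 3, 6), levels = 1:6, ordered = TRUE)
# Continuous: plain numeric vector
height <- c(1.72, 1.80, 1.65, 1.90)

is.ordered(eye_color)  # FALSE
is.ordered(dice)       # TRUE
is.numeric(height)     # TRUE
```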

SLIDE 14

Overview

  • Theory of statistical tests
  • Test for a difference in mean
  • Test for dependence

– Nominal variables
– Continuous variables
– Ordinal variables

  • Power of a test
  • Degrees of freedom
SLIDE 15

Test for dependence Nominal (count) variables

  • Outline of the test

– What is given? Pairwise observations (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn )
– Null hypothesis: x and y are independent
– Test: χ2-test for independence
– R command: chisq.test( x, y ) or chisq.test( contingency.table )
– Idea of the test: calculate the expected counts under the assumption of independence. If the observed counts deviate too much from the expected counts, then reject the null hypothesis.
– Approximate test, see the conditions in the lecture notes

SLIDE 16

Test for dependence Nominal (count) variables

  • Ex 1: χ2-test

> contingency <- matrix( c(47,3,8,42,60,15,8,33,3), nrow=3 )
> chisq.test(contingency)$expected
          [,1]     [,2]      [,3]
[1,] 25.689498 51.82192 19.488584
[2,] 25.424658 51.28767 19.287671
[3,]  6.885845 13.89041  5.223744
# expected counts are all above 5, so we may apply the test
> chisq.test(contingency)

        Pearson's Chi-squared test

data:  contingency
X-squared = 58.5349, df = 4, p-value = 5.892e-12

  • Reject the null hypothesis that the two variables are independent
SLIDE 17

Test for dependence Nominal (count) variables

  • Fisher's exact test

– 2×2 contingency tables
– Example:

> table <- matrix( c(14,10,21,3), nrow=2 )
> fisher.test(table)

        Fisher's Exact Test for Count Data

data:  table
p-value = 0.04899
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.03105031 0.99446037
sample estimates:
odds ratio
 0.2069884

  • We reject the null hypothesis
SLIDE 18

Overview

  • Theory of statistical tests
  • Test for a difference in mean
  • Test for dependence

– Nominal variables
– Continuous variables
– Ordinal variables

  • Power of a test
  • Degrees of freedom
SLIDE 19

Test for dependence Continuous variables

  • Outline of the test

– What is given? Pairwise observations (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ); all values in some interval are possible
– Null hypothesis: x and y are independent
– Test: Pearson's correlation test for independence
– Assumption: x and y are samples from a normal distribution
– R command: cor.test( x, y )

SLIDE 20

Test for dependence Continuous variables

  • Ex:

– Distance needed to stop from a certain speed, for cars
– Reject the null hypothesis

> data(cars)
> attach(cars)
> str(cars)
> ?cars
> plot(speed, dist)
> cor.test(speed, dist)

        Pearson's product-moment correlation

data:  speed and dist
t = 9.464, df = 48, p-value = 1.49e-12
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6816422 0.8862036
sample estimates:
      cor
0.8068949

SLIDE 22

Overview

  • Theory of statistical tests
  • Test for a difference in mean
  • Test for dependence

– Nominal variables
– Continuous variables
– Ordinal variables

  • Power of a test
  • Degrees of freedom
SLIDE 23

Test for dependence Ordinal variables

  • Outline of the test

– What is given? Pairwise observations (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ); values can be ordered
– Null hypothesis: x and y are uncorrelated
– Test: Spearman's rank correlation rho
– R command: cor.test( x, y, method="spearman" )

> data(cars)
> attach(cars)
> cor.test(speed, dist, method="spearman")

        Spearman's rank correlation rho

data:  speed and dist
S = 3532.819, p-value = 8.825e-14
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho
0.8303568

Warning message:
In cor.test.default(speed, dist, method = "spearman") :
  Cannot compute exact p-values with ties

SLIDE 24

Overview

  • Theory of statistical tests
  • Test for a difference in mean
  • Test for dependence

– Nominal variables
– Continuous variables
– Ordinal variables

  • Power of a test
  • Degrees of freedom
SLIDE 25

The power of a test

  • Alternative hypothesis H1

– Ex: H0: µ=0 and H1: µ≠0

  • 2 types of error

– Type I error (or "error of the first kind" or "α error" or "false positive"): rejecting H0 when it is true
– Type II error (or "error of the second kind" or "β error" or "false negative"): failing to reject H0 when it is not true

  • Power is 1-β

– If power=0 you will never reject H0
– Ex: if the true value of µ is close to 0, the test has almost no chance to reject H0; rather plan for an effect size such as |µ|>=0.5

  • In general the power increases with sample size

– Use power.t.test() or power.fisher.test() (from the statmod package) to calculate the minimum sample size needed
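As a minimal sketch, power.t.test() can give the sample size per group needed to detect a given difference in means (the values delta = 0.5, sd = 1, power = 0.8 below are illustrative assumptions):

```r
# Minimum sample size per group for a two-sample t-test:
# detect delta = 0.5 (with sd = 1) at the 5% level with 80% power
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8)
# n comes out at about 64 observations per group
```

Leaving one of n, delta, power, or sig.level unspecified tells power.t.test() which quantity to solve for.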

SLIDE 26

Overview

  • Theory of statistical tests
  • Test for a difference in mean
  • Test for dependence

– Nominal variables
– Continuous variables
– Ordinal variables

  • Power of a test
  • Degrees of freedom → see your lecture notes