R course, Tuesday, March 12, 2013
SOME STATISTICAL TESTS
Overview
- Theory of statistical tests
- Test for a difference in mean
- Test for dependence
  – Nominal variables
  – Continuous variables
  – Ordinal variables
- Power of a test
- Degrees of freedom
Theory of statistical tests
- Read the corresponding section of the lecture notes
Test for a difference in mean: t-test
- Outline of the test
  – What is given? Independent observations (x1, ..., xn) and (y1, ..., ym).
  – Null hypothesis: x and y are samples from distributions having the same mean.
  – Test: t-test
  – R command: t.test(x, y)
  – Idea of the test: if the sample means are too far apart, reject the null hypothesis.
  – Approximate test, but rather robust
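To make "too far apart" concrete, here is a minimal sketch that recomputes the statistic behind t.test(x, y) by hand. It assumes the default Welch variant (unequal variances) and uses simulated data; the sample sizes, means and standard deviations are hypothetical values chosen only for illustration.

```r
# Simulated samples (hypothetical values, for illustration only)
set.seed(1)
x <- rnorm(12, mean = 60, sd = 8)
y <- rnorm(15, mean = 70, sd = 8)

# Squared standard errors of the two sample means
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# t statistic: difference of the sample means divided by its standard error
t_manual <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation of the degrees of freedom
df_manual <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# t.test() uses var.equal = FALSE by default, i.e. the Welch test
res <- t.test(x, y)
c(manual = t_manual, t.test = unname(res$statistic))
```

The larger the distance between the sample means relative to their standard error, the larger |t| and the smaller the p-value.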
Test for a difference in mean: t-test
- Ex 1: Martians
  – Dataset containing the height of Martians of different colors
  – Reject the null hypothesis
  – It was an unpaired t-test (no dependence between the 2 samples)
> mars <- read.table("mars.txt", header=TRUE)
> head(mars)
      size color
1 65.67974   red
2 65.90436   red
3 67.34730   red
4 60.42924   red
5 55.34526   red
6 62.85024   red
> attach(mars)
> t.test(size[color=="green"], size[color=="blue"])

        Welch Two Sample t-test

data:  size[color == "green"] and size[color == "blue"]
t = -3.4244, df = 19.419, p-value = 0.002775
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -16.875514  -4.083647
sample estimates:
mean of x mean of y
 60.86840  71.34798
Test for a difference in mean: t-test
- Ex 2: shoe wear
  – Dataset containing the wear of shoes of 2 materials A and B
  – Paired test because some boys will cause more damage to the shoe than others
  – Reject the null hypothesis
> data(shoes, package='MASS')
> attach(shoes)
> head(shoes)
$A
 [1] 13.2  8.2 10.9 14.3 10.7  6.6  9.5 10.8  8.8 13.3

$B
 [1] 14.0  8.8 11.2 14.2 11.8  6.4  9.8 11.3  9.3 13.6

> t.test(A, B, paired=TRUE)

        Paired t-test

data:  A and B
t = -3.3489, df = 9, p-value = 0.008539
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.6869539 -0.1330461
sample estimates:
mean of the differences
                  -0.41
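A paired t-test is a one-sample t-test on the pairwise differences. The sketch below illustrates this with the shoe-wear values printed above, typed in directly so that the MASS package is not needed; the variable names d, t_manual, paired and one_sample are introduced here for illustration.

```r
# Shoe-wear values copied from the output above (one pair per boy)
A <- c(13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3)
B <- c(14.0, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6)

# Pairwise differences remove the boy-to-boy variation
d <- A - B

# One-sample t statistic on the differences: mean divided by its standard error
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

# Both calls test the same null hypothesis (mean difference = 0)
paired     <- t.test(A, B, paired = TRUE)
one_sample <- t.test(d, mu = 0)

c(manual = t_manual,
  paired = unname(paired$statistic),
  one_sample = unname(one_sample$statistic))
```

All three values agree with t = -3.3489 from the slide, with df = 9 (number of pairs minus one).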
Test for a difference in mean: t-test
- Linked tests that might be of interest
  – var.test() tests for equality of variances; its result tells you how to set the option var.equal in t.test()
  – shapiro.test() tests for normality, for example before doing a Pearson correlation test. Its null hypothesis is that the data come from a normal distribution.
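A minimal sketch of how these linked tests might be combined, on simulated data. The samples, the seed and the 5% threshold used to set var.equal are assumptions made for this illustration, not a rule from the course.

```r
# Simulated samples (hypothetical values, for illustration only)
set.seed(42)
x <- rnorm(30, mean = 10, sd = 2)
y <- rnorm(30, mean = 11, sd = 2)

# Shapiro-Wilk test: H0 = the sample comes from a normal distribution
shapiro_x <- shapiro.test(x)
shapiro_y <- shapiro.test(y)

# F test: H0 = the two variances are equal
f_test <- var.test(x, y)

# Use the pooled-variance t-test only if equality of variances is not rejected
# (0.05 is an illustrative threshold)
equal_var <- f_test$p.value > 0.05
res <- t.test(x, y, var.equal = equal_var)

res$method   # tells which variant of the t-test was run
```

A large p-value in shapiro.test() or var.test() does not prove normality or equal variances; it only means the data give no evidence against them.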
Test for dependence
- The test depends on the data type
  – Nominal variables (not ordered, like eye color or gender)
  – Ordinal variables (ordered but not continuous, like the result of a die roll)
  – Continuous variables (like body height)
Test for dependence: Nominal (count) variables
- Outline of the test
  – What is given? Pairwise observations (x1, y1), (x2, y2), ..., (xn, yn)
  – Null hypothesis: x and y are independent
  – Test: χ²-test for independence
  – R command: chisq.test(x, y) or chisq.test(contingency.table)
  – Idea of the test: calculate the expected abundances under the assumption of independence. If the observed abundances deviate too much from the expected abundances, reject the null hypothesis.
  – Approximate test; see the conditions in the lecture notes
Test for dependence: Nominal (count) variables
- Ex 1: χ²-test
> contingency <- matrix(c(47,3,8,42,60,15,8,33,3), nrow=3)
> chisq.test(contingency)$expected
          [,1]     [,2]      [,3]
[1,] 25.689498 51.82192 19.488584
[2,] 25.424658 51.28767 19.287671
[3,]  6.885845 13.89041  5.223744
# expected abundances are all above 5, so we may apply the test
> chisq.test(contingency)

        Pearson's Chi-squared test

data:  contingency
X-squared = 58.5349, df = 4, p-value = 5.892e-12
- Reject the null hypothesis that the two variables are independent
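The expected abundances and the test statistic can be recomputed by hand, which shows what "expected under independence" means: each expected cell count is (row total × column total) / grand total. This is a sketch on the same contingency table; the names expected, x2, df and p_value are introduced here for illustration.

```r
# Same contingency table as in the example above
contingency <- matrix(c(47, 3, 8, 42, 60, 15, 8, 33, 3), nrow = 3)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(contingency), colSums(contingency)) / sum(contingency)

# Chi-squared statistic: sum over cells of (observed - expected)^2 / expected
x2 <- sum((contingency - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(contingency) - 1) * (ncol(contingency) - 1)

# p-value: upper tail of the chi-squared distribution
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

round(expected, 3)
c(X2 = x2, df = df, p = p_value)
```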
Test for dependence: Nominal (count) variables
- Fisher's exact test
  – 2×2 contingency tables
  – Example:
> table <- matrix(c(14,10,21,3), nrow=2)
> fisher.test(table)

        Fisher's Exact Test for Count Data

data:  table
p-value = 0.04899
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.03105031 0.99446037
sample estimates:
odds ratio
 0.2069884
- We reject the null hypothesis at the 5% level
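Why the test is called "exact": with the row and column totals fixed, the top-left cell follows a hypergeometric distribution under the null hypothesis, so the p-value is a finite sum of probabilities. The sketch below reproduces the two-sided p-value for the same 2×2 table. The table is named tab here (an assumption, to avoid masking the base function table()), and the small tolerance factor is there only to guard against floating-point ties.

```r
# Same 2x2 table as in the example above
tab <- matrix(c(14, 10, 21, 3), nrow = 2)

m <- sum(tab[, 1])   # total of column 1
n <- sum(tab[, 2])   # total of column 2
k <- sum(tab[1, ])   # total of row 1
x <- tab[1, 1]       # observed top-left cell

# All values the top-left cell can take given the fixed margins
support <- max(0, k - n):min(k, m)

# Hypergeometric probabilities of each possible table under independence
probs <- dhyper(support, m, n, k)
p_obs <- dhyper(x, m, n, k)

# Two-sided p-value: total probability of tables at most as likely as the observed one
p_manual <- sum(probs[probs <= p_obs * (1 + 1e-7)])

c(manual = p_manual, fisher.test = fisher.test(tab)$p.value)
```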
Test for dependence: Continuous variables
- Outline of the test
  – What is given? Pairwise observations (x1, y1), (x2, y2), ..., (xn, yn); all values in some interval are possible
  – Null hypothesis: x and y are independent
  – Test: Pearson's correlation test for independence
  – Assumption: x and y are samples from a normal distribution
  – R command: cor.test(x, y)
Test for dependence: Continuous variables
- Ex:
  – Distance cars need to stop from a given speed
  – Reject the null hypothesis
> data(cars)
> attach(cars)
> str(cars)
> ?cars
> plot(speed, dist)
> cor.test(speed, dist)

        Pearson's product-moment correlation

data:  speed and dist
t = 9.464, df = 48, p-value = 1.49e-12
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6816422 0.8862036
sample estimates:
      cor
0.8068949
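The t statistic and the degrees of freedom in this output follow directly from the sample correlation r and the number of pairs n: t = r·sqrt(n − 2) / sqrt(1 − r²) with n − 2 degrees of freedom. A short sketch on the built-in cars dataset; the names r, n, t_manual and p_manual are introduced here for illustration.

```r
# Built-in dataset: speed and stopping distance of 50 cars
r <- cor(cars$speed, cars$dist)   # Pearson correlation coefficient
n <- nrow(cars)                   # number of (speed, dist) pairs

# Under H0 (no correlation) this statistic follows a t distribution with n - 2 df
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

res <- cor.test(cars$speed, cars$dist)
c(manual = t_manual, cor.test = unname(res$statistic), df = n - 2)
```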
Test for dependence: Ordinal variables
- Outline of the test
  – What is given? Pairwise observations (x1, y1), (x2, y2), ..., (xn, yn); values can be ordered
  – Null hypothesis: x and y are uncorrelated
  – Test: Spearman's rank correlation rho
  – R command: cor.test(x, y, method="spearman")
> data(cars)
> attach(cars)
> cor.test(speed, dist, method="spearman")

        Spearman's rank correlation rho

data:  speed and dist
S = 3532.819, p-value = 8.825e-14
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho
0.8303568

Warning message:
In cor.test.default(speed, dist, method = "spearman") :
  Cannot compute exact p-values with ties
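Spearman's rho is the Pearson correlation computed on the ranks of the observations, which is why only the ordering of the values matters. The sketch below checks this on the cars dataset and also shows where the warning about ties comes from; the names rho_ranks, rho_builtin and any_ties are introduced here for illustration.

```r
# Pearson correlation of the ranks (ties get the average rank by default)
rho_ranks <- cor(rank(cars$speed), rank(cars$dist))

# Built-in Spearman correlation
rho_builtin <- cor(cars$speed, cars$dist, method = "spearman")

# Repeated values (ties) prevent cor.test() from computing an exact p-value
any_ties <- anyDuplicated(cars$speed) > 0 || anyDuplicated(cars$dist) > 0

c(rho_ranks = rho_ranks, rho_builtin = rho_builtin)
any_ties
```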
The power of a test
- Alternative hypothesis H1
  – Ex: H0: µ = 0 and H1: µ ≠ 0
- 2 types of error
  – Type I error (or "first kind", "α error", "false positive"): rejecting H0 when it is true
  – Type II error (or "second kind", "β error", "false negative"): failing to reject H0 when it is false
- Power is 1-β, the probability of rejecting H0 when H1 is true
  – If power = 0 you will never reject H0
  – Ex: if the true value is close to 0, the test has almost no chance to reject H0; rather choose H1: |µ| ≥ 0.5
- In general, the power increases with sample size
  – Use power.t.test() to calculate the minimum sample size needed for a t-test; power.fisher.test() (package statmod) estimates the power of Fisher's exact test for given sample sizes
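A minimal sketch of a sample-size calculation with power.t.test() from the stats package, for a two-sample t-test. The effect size (delta = 0.5), the standard deviation (sd = 1), the 5% significance level and the target power of 0.8 are assumptions chosen for illustration.

```r
# Leave n unspecified: power.t.test() then solves for the sample size per group
needed <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8)
n_per_group <- ceiling(needed$n)   # round up to whole observations

# Conversely, fix n and compute the power, to see how it grows with sample size
sizes <- c(10, 20, 50, 100)
powers <- sapply(sizes, function(n) {
  power.t.test(n = n, delta = 0.5, sd = 1, sig.level = 0.05)$power
})

n_per_group
round(setNames(powers, sizes), 3)
```

Exactly one of n, delta, sd, sig.level and power is left unspecified in each call, and power.t.test() computes that one from the others.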
Degrees of freedom
- See your lecture notes