Hypotheses with two variates: Paired data

R.W. Oldford
Common hypotheses

Recall some common circumstances and hypotheses:

◮ Hypotheses about the distribution of a single random variable Y for which we have sample values y1, y2, . . . , yn. Common hypotheses include:
  ◮ H0 : µ = E(Y) = µ0 for some specified value µ0, or
  ◮ H0 : Y ∼ FY(y) for some (possibly only partially specified) distribution FY(y).

◮ Hypotheses about the similarity of the distributions of two random variables X and Y for which we have a sample from each, namely x1, x2, . . . , xm and y1, y2, . . . , yn. Common hypotheses include:
  ◮ H0 : µY = µX; the distributions match on some key features (e.g. they share a specified distribution family, like N(µ, σ²), parameterized by these features).
  ◮ H0 : FY(y) = FX(x) without specifying either F().

◮ Hypotheses about the joint distribution of X and Y for which we have sample pairs of values (x1, y1), (x2, y2), . . . , (xn, yn). Common hypotheses include:
  ◮ H0 : ρX,Y = 0; that is, X and Y are uncorrelated.
  ◮ H0 : FX,Y(x, y) = FX(x) × FY(y); that is, X and Y are independent (written X ⊥⊥ Y).
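As a concrete instance of the first case, a test of H0 : µ = µ0 can be carried out with R's built-in one-sample t-test. A minimal illustrative sketch, with simulated data and an arbitrary µ0 = 10 (both are assumptions, not part of these slides):

set.seed(314)
y <- rnorm(25, mean = 11, sd = 3)  # a sample y1, ..., yn
t.test(y, mu = 10)$p.value         # tests H0 : mu = E(Y) = 10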
Common hypotheses - (X, Y ) pairs
Suppose we have a bivariate sample of paired data: (x1, y1), (x2, y2), . . . , (xn, yn). We model these as each being independent realizations of some pair of random variables, (X, Y ), having some joint distribution FX,Y (x, y). For example, consider the following dataset on counts of horseshoe crabs found at various U.S. beaches in 2011 and 2012.
horseshoe <- read.csv(path_concat(dataDirectory, "horseshoeCrabs.csv"))
knitr::kable(head(horseshoe))

Beach           Crabs   Year
Bennetts Pier   35282   2011
Big Stone      359350   2011
Broadkill       45705   2011
Cape Henlopen   49005   2011
Fortescue       68978   2011
Fowler           8700   2011
The pairs are (xi, yi) where xi is the 2011 horseshoe crab count at beach i, and yi is the 2012 count at the same beach i.
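Since the pairing is by beach, it is worth verifying that the rows really do line up before extracting the pairs. A small sketch, assuming the long layout shown above (one row per beach-year) with beaches listed in the same order within each year:

b2011 <- horseshoe$Beach[horseshoe$Year == 2011]
b2012 <- horseshoe$Beach[horseshoe$Year == 2012]
stopifnot(all(b2011 == b2012))  # pairs (xi, yi) are matched by beach i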
Horseshoe crab counts - some hypotheses.
Interest lies in understanding the relationship between the counts in the two successive years. We might be interested in a number of questions. For example,
◮ Are the counts correlated?
◮ Are the counts independent year to year?
◮ Can we predict the counts of one year from the counts of the previous year?
◮ Do the counts have the same distribution from one year to the next?

The horseshoe crab counts are unusual pairs in the sense that both x and y measure the same thing, namely horseshoe crab counts on the same beaches, just for different years. Determining whether X and Y have the same distribution (or distributional features) is rarely going to be of interest otherwise.
Paired data - the same distribution?
If the hypothesis of interest is H0 : FX(x) = FY(y), then we might proceed as before. Doing so, only the marginal distributions are being compared; no use, or even recognition, of the pairing is being made.

This is not generally what we want with paired data. Instead, we imagine the pairing has resulted in two different measurements of the same quantity on the same unit, but under different circumstances. For example, the two measurements might be the values of some response (e.g. systolic blood pressure, pulse, speed, yield of grain, etc.) taken on the same unit before and after some treatment or other change. Because of the inherent differences between units, this matching of measurements on the same unit should make detection of any effect of treatment on response easier by removing the unit-to-unit variability in the comparison.

The pairing (xi, yi) on the same unit i suggests that differences between the xi s and the yi s might be better studied through the difference within each pair, namely zi = xi − yi for i = 1, . . . , n. Under the null hypothesis of no difference, we might expect the distribution of Z to be symmetric about 0.
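One classical way to exploit this symmetry, not pursued on these slides but a natural companion to the tests that follow, is a sign-flip randomization test: under H0 each zi is as likely to be positive as negative, so randomly flipping signs generates new datasets consistent with the null. A minimal sketch (the function name and B are arbitrary choices here):

signFlipTest <- function(z, B = 10000) {
  d_obs <- abs(mean(z))
  # new null datasets: each z_i keeps its magnitude but gets a random sign
  d_sim <- replicate(B, abs(mean(z * sample(c(-1, 1), length(z), replace = TRUE))))
  mean(d_sim >= d_obs)  # proportion of null datasets at least as discrepant
}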
Horseshoe crab counts - the same distribution?
For the horseshoe crab counts, the units are the beaches. The thinking is that comparing the individual distributions will not be as powerful as looking at the differences. For example, just looking at the histograms of counts and of their paired differences suggests that this might indeed be the case.
[Figure: three histograms of frequency versus crab count — the 2011 counts, the 2012 counts, and the paired differences 2011 − 2012.]
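The plotting code is not shown on the slides; a minimal sketch that would produce comparable histograms from the horseshoe data:

x <- horseshoe$Crabs[horseshoe$Year == 2011]
y <- horseshoe$Crabs[horseshoe$Year == 2012]
z <- x - y
op <- par(mfrow = c(1, 3))
hist(x, xlab = "Crab count", main = "2011")
hist(y, xlab = "Crab count", main = "2012")
hist(z, xlab = "Differences in crab count", main = "2011 - 2012")
par(op)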
Horseshoe crab counts - paired t-test
The advantage of pairing can be seen in testing the hypothesis H0 : E(X) = E(Y) via a t-test.

First unpaired:

t.test(Crabs ~ Year, data = horseshoe)$p.value

## [1] 0.1124666

Then paired:

t.test(Crabs ~ Year, data = horseshoe, paired = TRUE)$p.value

## [1] 0.04528869

These were based on assuming normals (independent samples in the first case, not so in the second). The histograms for 2011 and 2012 did not look that symmetric, let alone Gaussian, although the differences looked a little better.
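Note that the paired t-test is identical to a one-sample t-test applied to the differences zi = xi − yi, which makes the connection to the previous slide explicit:

z <- horseshoe$Crabs[horseshoe$Year == 2011] - horseshoe$Crabs[horseshoe$Year == 2012]
t.test(z)$p.value  # same p-value as the paired test above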
Horseshoe crab counts - paired t-test
We could instead have transformed the counts, say to the square root scale, effectively testing a different hypothesis, namely H0 : E(√X) = E(√Y), via the t-tests.

First unpaired:

t.test(sqrt(Crabs) ~ Year, data = horseshoe)$p.value

## [1] 0.1467886

Then paired:

t.test(sqrt(Crabs) ~ Year, data = horseshoe, paired = TRUE)$p.value

## [1] 0.02448857

Normality seems easier to justify on the square root scale (independent samples in the first case, not so in the second). Normal or not, in both cases the pairing makes it easier to detect a difference.
Horseshoe crab counts - paired Wilcoxon signed rank test
Alternatively, we might choose a test based simply on the signed ranks of the |zi|s. A test based on this is the Wilcoxon signed rank test. Basically it replaces the zi s by wi = sign(zi) × rank(|zi|) and uses |w̄| as a discrepancy. The Wilcoxon signed rank test is often recommended when the distribution of the differences zi is not likely to be normal. This too can be carried out easily in R.

wilcox.test(Crabs ~ Year, data = horseshoe, paired = TRUE)$p.value

## [1] 0.01874471

Note that here too, had we first transformed to a square root scale, a different result would be obtained.

wilcox.test(sqrt(Crabs) ~ Year, data = horseshoe, paired = TRUE)$p.value

## [1] 0.01246649

All of these test slightly different hypotheses, and all of the paired tests indicate evidence against the hypothesis that the corresponding expectations are identical.
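The signed ranks are easy to compute directly, which makes the construction concrete; the statistic V reported by wilcox.test() is the sum of the positive signed ranks (assuming no ties or zero differences):

z <- horseshoe$Crabs[horseshoe$Year == 2011] - horseshoe$Crabs[horseshoe$Year == 2012]
w <- sign(z) * rank(abs(z))  # the signed ranks w_i
sum(w[w > 0])                # the V statistic of wilcox.test(z)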
Horseshoe crab counts - pairing and simulation
Since the pairing brings us back to a univariate distribution, we might use the zeroed-mean sample quantile function of the zi s to generate new data (from a distribution of the same shape but with zero mean). The discrepancy measure is simply d = |z̄|:

discrepancyAbsAve <- function(data) {abs(mean(data))}

And the test would be

x <- horseshoe$Crabs[horseshoe$Year == 2011]
y <- horseshoe$Crabs[horseshoe$Year == 2012]
z <- x - y
numericalTest(z, discrepancyFn = discrepancyAbsAve, generateFn = generateFromMeanShiftData)

## [1] 0.0055

Again indicating (even stronger) evidence against the hypothesis H0 : E(Z) = 0.

Note: This numerical test has not as yet been subjected to sufficient scrutiny for it to be generally recommended. Certainly, its value will improve with increasing n. Nevertheless, at least for exploratory work by a knowledgeable analyst, it may be helpful. Cf. bootstrapping and permuting paired t-test type statistics (Statistics and Computing, 2014).
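numericalTest() and generateFromMeanShiftData() are helpers from earlier in the course; their definitions do not appear on these slides. A rough sketch of the kind of thing they do, with the internal details (resampling scheme, number B of simulated datasets) being guesses rather than the course's actual code:

# Hypothetical sketch only -- not the course's actual definitions
generateFromMeanShiftData <- function(data) {
  # draw from the sample quantile function of the data shifted to mean zero
  sample(data - mean(data), length(data), replace = TRUE)
}
numericalTest <- function(data, discrepancyFn, generateFn, B = 2000) {
  d_obs <- discrepancyFn(data)
  d_sim <- replicate(B, discrepancyFn(generateFn(data)))
  mean(d_sim >= d_obs)  # simulated p-value
}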
More typically paired data
Unlike the horseshoe crab counts, (x, y) pairs are more typically measurements on two completely different variates, for example (wt, mpg) from the mtcars dataset, and so on. For such data, the more interesting questions include:

◮ Are the variates correlated? A hypothesis might be H0 : ρX,Y = 0.
◮ Are the variates independently distributed?
◮ Can we predict the value of one variate given values of the other?
◮ Can we provide a simple summary of the dependency relation between the two variates?

Of these, the last two really require some modelling of the data to answer. The first also requires some modelling: the problem is that it isn't clear how to generate data under the null hypothesis that ρX,Y = 0 without imposing some fairly detailed structure on FX,Y(x, y).

For example, if FX,Y(x, y) is a bivariate normal distribution, then ρX,Y = 0 if and only if FX,Y(x, y) = FX(x) × FY(y); that is, X and Y are independent (X ⊥⊥ Y). Note that X ⊥⊥ Y ⟹ ρX,Y = 0, but not vice versa; the bivariate normal is the exception, not the rule.
Paired data - testing for independence
The one question which can be addressed without imposing too much additional structure on FX,Y(x, y) is that of independence: H0 : X ⊥⊥ Y. That is because, under H0, any x might be paired with any y. This suggests that to generate data under H0 we could just randomly permute the values of one of them, say the yi s, and then pair these up with the xi s in their original order. A function which would do exactly this is

mixCoords <- function(data) {
  # Note that data needs to have named x and y components
  x <- data$x
  y <- data$y
  n <- length(x)
  stopifnot(n == length(y))
  new_y <- sample(y, n, replace = FALSE)
  list(x = x, y = new_y)
}

This can then be used with whatever discrepancy measure might be proposed.
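For instance, with toy values (illustrative only), each call re-pairs the same y values with the x values in a new random order:

set.seed(123)
mixCoords(list(x = 1:5, y = c(10, 20, 30, 40, 50)))
# x is unchanged; y comes back as a random permutation of 10, ..., 50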
Testing for independence - continuous data
For example, we could test whether mileage (mpg) and car weight (wt) are independently distributed using mtcars.

[Figure: scatterplot of mpg versus wt from mtcars.]

The scatterplot in itself suggests a strong dependence: as car weight increases, the mileage goes down. A simple summary of that dependence is the least-squares fitted line.
Testing for independence - continuous data
Since X ⊥⊥ Y ⟹ ρX,Y = 0, we might even use the sample correlation as a discrepancy measure. The sample correlation is

ρ̂X,Y = Σⁿᵢ₌₁ (xi − x̄)(yi − ȳ) / √( (Σⁿᵢ₌₁ (xi − x̄)²) × (Σⁿᵢ₌₁ (yi − ȳ)²) ).

If X ⊥⊥ Y then ρ̂X,Y should be near zero (since ρX,Y must be). The larger |ρ̂X,Y| is, the greater the evidence against the hypothesis.

correlationDiscrepancy <- function(data) {
  # Note that data needs to have named x and y components
  x <- data$x
  y <- data$y
  abs(cor(x, y))
}

Correlation chases a linear relationship. Not surprisingly, then, we might also consider fitting a straight line y = α + βx + r to the data pairs and using |β̂| as the discrepancy.

slopeDiscrepancy <- function(data) {
  # Note that data needs to have named x and y components
  fit <- lm(y ~ x, data = data)
  abs(fit$coef[2])
}
Testing for independence - continuous data
Using these discrepancies, and generating data under the assumption of independence, a numerical test of the independence of mileage (mpg) and car weight (wt) using mtcars is straightforward.

Using correlation as the discrepancy:

numericalTest(list(x = mtcars$wt, y = mtcars$mpg),
              discrepancyFn = correlationDiscrepancy,
              generateFn = mixCoords)

## [1] 0

Using the estimated slope as the discrepancy:

numericalTest(list(x = mtcars$wt, y = mtcars$mpg),
              discrepancyFn = slopeDiscrepancy,
              generateFn = mixCoords)

## [1] 0

Both give extraordinarily strong evidence against independence!
Testing for independence - problems with correlation
Of course, this test is based on using discrepancy measures that essentially test for a straight-line relationship as the departure from H0 : X ⊥⊥ Y. This may not be the best test. For example, the following pairs of variates all have the same correlation, the same slope estimate, the same R², . . . (Anscombe, 1973) but show very different departures from independence.

[Figure: Anscombe's quartet — four scatterplots of y versus x, each with r = 0.82.]
Testing for independence - problems with correlation
Moreover, it is easy to construct situations where ρX,Y = 0 but where it is obvious that X and Y are not independent!

[Figure: two scatterplots, each with r = 0, yet each showing a clear pattern relating x and y.]

No correlation, but clearly a relationship between X and Y!
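A quick way to see this for yourself (not necessarily the data in the plots above): if x is symmetric about 0, then x and x² have exactly zero sample correlation even though y is a deterministic function of x.

x <- seq(-40, 40, by = 1)
y <- x^2    # y is completely determined by x ...
cor(x, y)   # ... yet the sample correlation is exactly 0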
Testing for independence - choice of discrepancy measure
It is important to understand what kind of departure from the null hypothesis is actually being quantified by the chosen discrepancy measure. With a hypothesis like H0 : X ⊥⊥ Y, there are infinitely many ways in which the independence might fail. Expecting a single numerical discrepancy measure to capture them all seems a bit naive; we need some idea of the sort of departure that might matter most to us.

Correlation and straight lines capture a sort of simple dependence of the mean of one variate on the other. More generally, if X ⊥⊥ Y, then

µ(x) = E(Y | X = x) = E(Y).

That is, the average of the y values should be the same no matter what their x values are. The straight-line model (and correlation) simply restrict consideration to the possibility that µ(x) = α + βx, which will not depend on the value of x iff β = 0, hence the discrepancy measure |β̂|. (ρ̂X,Y sort of simultaneously assesses whether E(X | Y = y) = E(X), treating X and Y symmetrically.)
Testing for independence - a loess discrepancy measure
If departure from independence as measured by the mean dependency µ(x) = E(Y | X = x) is of some import, then we might want to choose a more flexible model than a simple straight line. We might, for example, choose a smooth model for µ(x) like that given by loess(). One such discrepancy measure would be

loessDiscrepancy <- function(data) {
  fit <- loess(y ~ x, data = data,        # default span
               family = "symmetric")      # robust family
  sd(fit$fitted) / sd(data$y)             # proportion of sd of the ys explained by mu
}

The larger this is, the better the smooth µ̂(x) fits the ys.

Note:
◮ we use a ratio of standard deviations instead of the usual variances
◮ we could have used square roots of sums of squares instead
◮ when using mixCoords() the sd(y) never changes, so we could just use the fitted values alone
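Evaluated directly on the data (rather than inside numericalTest()), the observed discrepancy is just a single number. For example, for mileage versus weight:

loessDiscrepancy(list(x = mtcars$wt, y = mtcars$mpg))  # the observed discrepancy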
Loess discrepancy measure - examples
Mileage versus weight (mtcars):
[Figure: scatterplot of mpg versus wt with the least-squares line and the loess smooth overlaid.]

numericalTest(list(x = mtcars$wt, y = mtcars$mpg),
              discrepancyFn = loessDiscrepancy,
              generateFn = mixCoords)

## [1] 0.001
Loess discrepancy measure - examples
Horseshoe crabs
[Figure: scatterplot of horseshoe crab counts, 2012 versus 2011, with the least-squares line and the loess smooth overlaid.]

y2011 <- horseshoe$Year == 2011
y2012 <- horseshoe$Year == 2012
numericalTest(list(x = horseshoe$Crabs[y2011],
                   y = horseshoe$Crabs[y2012]),
              discrepancyFn = loessDiscrepancy,
              generateFn = mixCoords)

## [1] 0.0055
Loess discrepancy measure - examples
Horseshoe crabs - square root scale
[Figure: scatterplot of the square roots of the horseshoe crab counts, 2012 versus 2011, with the least-squares line and the loess smooth overlaid.]

y2011 <- horseshoe$Year == 2011
y2012 <- horseshoe$Year == 2012
numericalTest(list(x = sqrt(horseshoe$Crabs[y2011]),
                   y = sqrt(horseshoe$Crabs[y2012])),
              discrepancyFn = loessDiscrepancy,
              generateFn = mixCoords)

## [1] 5e-04
Loess discrepancy measure - examples
Facebook data
[Figure: scatterplot of log10(like + 1) versus log10(Impressions + 1) for the Facebook data, with the least-squares line and the loess smooth overlaid.]

numericalTest(list(x = log10(facebook$Impressions + 1),
                   y = log10(facebook$like + 1)),
              discrepancyFn = loessDiscrepancy,
              generateFn = mixCoords)

## [1] NA
Graphical discrepancy measures
How about a simple scatterplot?

showScatter <- function(data, subjectNo) {
  plot(y ~ x, data = data,
       main = paste(subjectNo), cex.main = 4,
       xlab = "", ylab = "",
       pch = 19, cex = 2,
       col = adjustcolor("steelblue", 0.75),
       xaxt = "n", yaxt = "n"  # turn off axes
  )
}

The line-up test would then be

data <- list(x = mtcars$wt, y = mtcars$mpg)
lineup(data,
       generateSubject = mixCoords,
       showSubject = showScatter,
       layout = c(4, 5))
Graphical discrepancy measures - simple scatterplot
[Figure: a line-up of 20 scatterplots, numbered 1 to 20, exactly one of which shows the real (wt, mpg) data.]

True Location: log(1.25823822102331e+107, base=21) - 68 = 13
Graphical discrepancy measures - two dimensional density estimates
When there are a lot of points the pattern might be obscured by overstriking: high-density regions might appear the same as some lower-density regions. As with one-dimensional data, we could construct a kernel density estimate

f̂(x, y) = (1/n) Σⁿᵢ₌₁ Kh(xi, yi; x, y)

where, for example, Kh(· · ·) is a bivariate kernel function like a bivariate normal density:

Kh(xi, yi; x, y) = (1 / (2π hx hy)) exp( −(1/2) [ ((xi − x)/hx)² + ((yi − y)/hy)² ] ).

This function is radially symmetric about the point (x, y). Values of hx and hy are usually chosen separately to match the scale in x and in y, respectively. (These will appear to be radially symmetric in the plot.)

A two-dimensional kernel density estimate analogous to a one-dimensional density estimate is available in the R package MASS.
Graphical discrepancy measures - kde2d(...)
The function kde2d(...) from the MASS package constructs the density estimate which now must be plotted as contours of constant density.
library(MASS)
x <- mtcars$wt
y <- mtcars$mpg
# Two dimensional kernel density estimate
den <- kde2d(x = x, y = y, n = 100)
zlim <- range(den$z)
plot(x, y, pch = 19, xlab = "wt", ylab = "mpg",
     col = adjustcolor("steelblue", 0.5))
contour(den$x, den$y, den$z, col = "grey10",
        levels = pretty(zlim, 10), lwd = 1, add = TRUE)
[Figure: scatterplot of mpg versus wt with contours of constant estimated density (levels from about 0.005 to 0.04) overlaid.]