

  1. Tests for Multivariate Means Max Turgeon STAT 7200–Multivariate Statistics

  2. Objectives
• Construct tests for a single multivariate mean
• Discuss and compare confidence regions and confidence intervals
• Describe connection with Likelihood Ratio Test
• Construct tests for two multivariate means
• Present robust alternatives to these tests

  3. Test for a multivariate mean: Σ known
• Let Y_1, ..., Y_n ∼ N_p(µ, Σ) be independent.
• We saw in a previous lecture that
  Ȳ ∼ N_p(µ, (1/n)Σ).
• This means that
  n (Ȳ − µ)^T Σ^{-1} (Ȳ − µ) ∼ χ²(p).
• In particular, if we want to test H_0 : µ = µ_0 at level α, then we reject the null hypothesis if
  n (Ȳ − µ_0)^T Σ^{-1} (Ȳ − µ_0) > χ²_α(p).
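As a sketch, the known-Σ test above can be wrapped in a small helper (the function name `chisq_mean_test` and its interface are ours, for illustration; they are not from the lecture):

```r
# Sketch: level-alpha chi-square test of H0: mu = mu_0 with known Sigma.
# Function name and interface are illustrative, not from the lecture.
chisq_mean_test <- function(Y, mu_0, Sigma, alpha = 0.05) {
  n <- nrow(Y); p <- ncol(Y)
  Y_bar <- colMeans(Y)
  stat <- n * drop(t(Y_bar - mu_0) %*% solve(Sigma) %*% (Y_bar - mu_0))
  crit <- qchisq(1 - alpha, df = p)
  list(statistic = stat, critical = crit, reject = stat > crit)
}

# Simulated data where H0 holds: Y_i ~ N_2(0, I_2)
set.seed(7200)
Y <- matrix(rnorm(100 * 2), ncol = 2)
chisq_mean_test(Y, mu_0 = c(0, 0), Sigma = diag(2))
```

Under H_0 the statistic is χ²(p), so the test rejects in roughly 5% of such simulated samples.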

  4. Example i
library(dslabs)
library(tidyverse)
dataset <- filter(gapminder, year == 2012,
                  !is.na(infant_mortality))
dataset <- dataset[, c("infant_mortality", "life_expectancy",
                       "fertility")]
dataset <- as.matrix(dataset)

  5. Example ii
dim(dataset)
## [1] 178   3
# Assume we know Sigma
Sigma <- matrix(c(555, -170, 30,
                  -170, 65, -10,
                  30, -10, 2), ncol = 3)
mu_hat <- colMeans(dataset)
mu_hat

  6. Example iii
## infant_mortality  life_expectancy        fertility
##        25.824157        71.308427         2.868933
# Test mu = mu_0
mu_0 <- c(25, 50, 3)
test_statistic <- nrow(dataset) * t(mu_hat - mu_0) %*%
  solve(Sigma) %*% (mu_hat - mu_0)
c(drop(test_statistic), qchisq(0.95, df = 3))
## [1] 7153.275387    7.814728

  7. Example iv
drop(test_statistic) > qchisq(0.95, df = 3)
## [1] TRUE

  8. Test for a multivariate mean: Σ unknown i
• Of course, we rarely (if ever) know Σ, and so we use its MLE
  Σ̂ = (1/n) ∑_{i=1}^n (Y_i − Ȳ)(Y_i − Ȳ)^T
or the sample covariance S_n.
• Therefore, to test H_0 : µ = µ_0 at level α, we reject the null hypothesis if
  T² = n (Ȳ − µ_0)^T S_n^{-1} (Ȳ − µ_0) > c,
for a suitably chosen constant c that depends on α.
• Note: The test statistic T² is known as Hotelling's T².

  9. Test for a multivariate mean: Σ unknown ii
• We will show that (under H_0) T² has a simple distribution:
  T² ∼ ((n − 1)p / (n − p)) F(p, n − p).
• In other words, we reject the null hypothesis at level α if
  T² > ((n − 1)p / (n − p)) F_α(p, n − p).
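A minimal sketch wrapping Hotelling's T² with this critical value (the function name `hotelling_t2` and the p-value line are ours; the p-value merely restates the rejection rule through the scaled F distribution):

```r
# Sketch: Hotelling's T^2 test of H0: mu = mu_0 with unknown Sigma.
# Function name and interface are illustrative, not from the lecture.
hotelling_t2 <- function(Y, mu_0, alpha = 0.05) {
  n <- nrow(Y); p <- ncol(Y)
  Y_bar <- colMeans(Y)
  S_n <- cov(Y)
  T2 <- n * drop(t(Y_bar - mu_0) %*% solve(S_n) %*% (Y_bar - mu_0))
  crit <- (n - 1) * p / (n - p) * qf(1 - alpha, df1 = p, df2 = n - p)
  # Same decision expressed as a p-value via the scaled F distribution
  pval <- pf((n - p) / ((n - 1) * p) * T2,
             df1 = p, df2 = n - p, lower.tail = FALSE)
  list(statistic = T2, critical = crit, p.value = pval,
       reject = T2 > crit)
}
```

Rejecting when T² exceeds the critical value is equivalent to the p-value falling below α.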

  10. Example (revisited) i
n <- nrow(dataset); p <- ncol(dataset)
# Test mu = mu_0
mu_0 <- c(25, 50, 3)
test_statistic <- n * t(mu_hat - mu_0) %*%
  solve(cov(dataset)) %*% (mu_hat - mu_0)
critical_val <- (n - 1) * p * qf(0.95, df1 = p,
                                 df2 = n - p) / (n - p)

  11. Example (revisited) ii
c(drop(test_statistic), critical_val)
## [1] 5121.461370    8.059773
drop(test_statistic) > critical_val
## [1] TRUE

  12. Distribution of T²
We will prove a more general result that will also be useful for more than one multivariate mean.
Theorem: Let Y ∼ N_p(0, Σ), let mW ∼ W_p(m, Σ), and assume Y, W are independent. Define
  T² = Y^T W^{-1} Y.
Then
  ((m − p + 1) / (mp)) T² ∼ F(p, m − p + 1),
where F(α, β) denotes the F-distribution with α and β degrees of freedom.
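A quick Monte Carlo sketch (our simulation, with Σ = I_p) makes the theorem concrete: draw Y ∼ N_p(0, I_p) and mW ∼ W_p(m, I_p), form T² = Y^T W^{-1} Y, and compare the scaled statistic against the F(p, m − p + 1) distribution:

```r
# Monte Carlo check of the distribution of T^2 (our simulation setup)
set.seed(7200)
p <- 3; m <- 20; B <- 5000

scaled_t2 <- replicate(B, {
  Y <- rnorm(p)                          # Y ~ N_p(0, I_p)
  X <- matrix(rnorm(m * p), ncol = p)    # m independent N_p(0, I_p) rows
  W <- crossprod(X) / m                  # so that mW ~ W_p(m, I_p)
  T2 <- drop(t(Y) %*% solve(W) %*% Y)    # T^2 = Y^T W^{-1} Y
  (m - p + 1) / (m * p) * T2
})

# Empirical vs theoretical 95% quantile of F(p, m - p + 1)
c(quantile(scaled_t2, 0.95), qf(0.95, df1 = p, df2 = m - p + 1))
```

The two quantiles should agree up to Monte Carlo error.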

  13. Proof i
• First, if we write Σ = LL^T, we can replace Y by L^{-1} Y and W by L^{-1} W (L^{-1})^T without changing T².
• In other words, without loss of generality, we can assume Σ = I_p.
• Now, note that since Y and W are independent, the conditional distribution of mW given Y is also W_p(m, I_p).
• Consider Y a fixed quantity, and let H be an orthogonal matrix whose first column is Y (Y^T Y)^{-1/2}.
• The other columns can be chosen by finding a basis for the orthogonal complement of Y and applying Gram–Schmidt to obtain an orthonormal basis.

  14. Proof ii
• Define V = H^T W H. Conditional on Y, this is still distributed as (1/m) W_p(m, I_p).
• This distribution does not depend on Y, and therefore V and Y are independent.
• Decompose V as such:
  V = [ v_11  V_12
        V_21  V_22 ],
where v_11 is a (random) scalar.

  15. Proof iii
• By result A.2.4g of MKB (see supplementary materials), the (1, 1) element of V^{-1} is given by
  v_{11|2}^{-1} = (v_11 − V_12 V_22^{-1} V_21)^{-1}.
• Moreover, since mV ∼ W_p(m, I_p), note that m v_{11|2} ∼ χ²(m − p + 1).
• We now have
  T² = Y^T W^{-1} Y
     = (H^T Y)^T (H^T W H)^{-1} (H^T Y)
     = (H^T Y)^T V^{-1} (H^T Y)
     = (Y^T Y)^{1/2} v_{11|2}^{-1} (Y^T Y)^{1/2}
     = (Y^T Y) / v_{11|2}.

  16. Proof iv
• Therefore, we have
  ((m − p + 1) / (mp)) T² = ((Y^T Y) / p) / ((m v_{11|2}) / (m − p + 1)) ∼ F(p, m − p + 1).
• In other words, we have expressed ((m − p + 1) / (mp)) T² as a ratio of independent chi-squares, each divided by its degrees of freedom: Y^T Y ∼ χ²(p) and m v_{11|2} ∼ χ²(m − p + 1).

  17. Confidence region for µ i
• Analogously to the univariate setting, it may be more informative to look at a confidence region:
• the set of values µ_0 ∈ R^p that are supported by the data, i.e. whose corresponding null hypothesis H_0 : µ = µ_0 would not be rejected at level α.
• Let c² = ((n − 1)p / (n − p)) F_α(p, n − p). A 100(1 − α)% confidence region for µ is given by the ellipsoid around Ȳ such that
  n (Ȳ − µ)^T S_n^{-1} (Ȳ − µ) < c², µ ∈ R^p.

  18. Confidence region for µ ii
• We can describe the confidence region in terms of the eigendecomposition of S_n: let λ_1 ≥ ··· ≥ λ_p be its eigenvalues, and let v_1, ..., v_p be corresponding eigenvectors of unit length.
• The confidence region is the ellipsoid centered around Ȳ with half-axes
  ± c √(λ_i / n) v_i.
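A small sketch with simulated data (our code) computes these half-axes directly from `eigen()`; the c √(λ_i/n) scaling matches the confidence region above:

```r
# Sketch: half-axes of the 95% confidence ellipse (our simulated data)
set.seed(7200)
n <- 100; p <- 2
Y <- matrix(rnorm(n * p), ncol = p)     # N_2(0, I_2) sample
S_n <- cov(Y)

c2 <- (n - 1) * p / (n - p) * qf(0.95, df1 = p, df2 = n - p)
decomp <- eigen(S_n)

# Columns of `axes` are the half-axis vectors c * sqrt(lambda_i / n) * v_i,
# centered at colMeans(Y)
half_lengths <- sqrt(c2 * decomp$values / n)
axes <- sweep(decomp$vectors, 2, half_lengths, `*`)
```

Since the v_i are orthonormal, the columns of `axes` are orthogonal, with the longest axis along the leading eigenvector.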

  19. Visualizing confidence regions when p > 2 i
• When p > 2, we cannot easily plot the confidence regions.
• Therefore, we first need to project onto an axis or onto the plane.
• Theorem: Let c > 0 be a constant and A a p × p positive definite matrix. For a given vector u ≠ 0, the projection of the ellipse {y^T A^{-1} y ≤ c²} onto u is given by
  (c √(u^T A u) / (u^T u)) u.

  20. Visualizing confidence regions when p > 2 ii
• If we take u to be the standard unit vectors, we get confidence intervals for each component of µ:
  LB = Ȳ_j − √( ((n − 1)p / (n − p)) F_α(p, n − p) (s²_jj / n) ),
  UB = Ȳ_j + √( ((n − 1)p / (n − p)) F_α(p, n − p) (s²_jj / n) ).

  21. Example i
n <- nrow(dataset); p <- ncol(dataset)
critical_val <- (n - 1) * p * qf(0.95, df1 = p,
                                 df2 = n - p) / (n - p)
sample_cov <- diag(cov(dataset))
cbind(mu_hat - sqrt(critical_val * sample_cov / n),
      mu_hat + sqrt(critical_val * sample_cov / n))

  22. Example ii
##                       [,1]      [,2]
## infant_mortality 20.801776 30.846538
## life_expectancy  69.561973 73.054881
## fertility         2.565608  3.172257

  23. Visualizing confidence regions when p > 2 (cont'd) i
• Theorem: Let c > 0 be a constant and A a p × p positive definite matrix. For a given pair of perpendicular unit vectors u_1, u_2, the projection of the ellipse {y^T A^{-1} y ≤ c²} onto the plane defined by u_1, u_2 is given by
  {(U^T y)^T (U^T A U)^{-1} (U^T y) ≤ c²},
where U = (u_1, u_2).

  24. Example (cont'd) i
U <- matrix(c(1, 0, 0, 0, 1, 0), ncol = 2)
R <- n * solve(t(U) %*% cov(dataset) %*% U)
transf <- chol(R)

  25. Example (cont'd) ii
# First create a circle of radius c
theta_vect <- seq(0, 2 * pi, length.out = 100)
circle <- sqrt(critical_val) * cbind(cos(theta_vect),
                                     sin(theta_vect))
# Then turn into ellipse
ellipse <- circle %*% t(solve(transf)) +
  matrix(mu_hat[1:2], ncol = 2,
         nrow = nrow(circle), byrow = TRUE)

  26. Example (cont'd) iii
# Eigendecomposition
# To visualize the principal axes
# (the 1/sqrt(n) factor matches the scale of the ellipse above)
decomp <- eigen(t(U) %*% cov(dataset) %*% U)
first <- sqrt(decomp$values[1] / n) *
  decomp$vectors[, 1] * sqrt(critical_val)
second <- sqrt(decomp$values[2] / n) *
  decomp$vectors[, 2] * sqrt(critical_val)

  27. Example (cont'd) iv
[Figure: projected 95% confidence ellipse for (infant_mortality, life_expectancy), centered at the sample mean; infant_mortality on the x-axis (22–30), life_expectancy on the y-axis (69–74).]

  28. Simultaneous Confidence Statements i
• Let w ∈ R^p. We are interested in constructing confidence intervals for w^T µ that are simultaneously valid (i.e. have the right coverage probability) for all w.
• Note that w^T Ȳ and w^T S_n w are both scalars.
• If we were only interested in a particular w, we could use the following confidence interval:
  w^T Ȳ ± t_{α/2, n−1} √(w^T S_n w / n).

  29. Simultaneous Confidence Statements ii
• Or equivalently, the confidence interval contains the set of values w^T µ for which
  t²(w) = n (w^T Ȳ − w^T µ)² / (w^T S_n w) = n (w^T (Ȳ − µ))² / (w^T S_n w) ≤ F_α(1, n − 1).
• Strategy: Maximise t²(w) over all w:
  max_w t²(w) = max_w n (w^T (Ȳ − µ))² / (w^T S_n w).

  30. Simultaneous Confidence Statements iii
• Using the Cauchy–Schwarz inequality:
  (w^T (Ȳ − µ))² = (w^T S_n^{1/2} S_n^{−1/2} (Ȳ − µ))²
                 = ((S_n^{1/2} w)^T (S_n^{−1/2} (Ȳ − µ)))²
                 ≤ (w^T S_n w) ((Ȳ − µ)^T S_n^{-1} (Ȳ − µ)).
• Dividing both sides by w^T S_n w / n, we get
  t²(w) ≤ n (Ȳ − µ)^T S_n^{-1} (Ȳ − µ).
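The bound is attained at w ∝ S_n^{-1}(Ȳ − µ), so max_w t²(w) is exactly T². A numerical sketch with simulated data (our code) checks both the bound and the maximizer:

```r
# Numerical check of the Cauchy-Schwarz bound (our simulated data)
set.seed(7200)
n <- 50; p <- 3
Y <- matrix(rnorm(n * p), ncol = p)    # N_3(0, I_3) sample
mu <- rep(0, p)

Y_bar <- colMeans(Y)
S_n <- cov(Y)
T2 <- n * drop(t(Y_bar - mu) %*% solve(S_n) %*% (Y_bar - mu))

t2_w <- function(w) {
  n * drop(crossprod(w, Y_bar - mu))^2 / drop(t(w) %*% S_n %*% w)
}

# Direction attaining equality in Cauchy-Schwarz
w_star <- drop(solve(S_n) %*% (Y_bar - mu))

# t^2(w) <= T^2 for any w, with equality at w_star
c(t2_w(c(1, 0, 0)), t2_w(w_star), T2)
```

Any other choice of w gives a strictly smaller t²(w), which is what makes the T² critical value valid simultaneously over all linear combinations.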
