Tests for Multivariate Means Max Turgeon STAT 7200Multivariate - PowerPoint PPT Presentation

Tests for Multivariate Means Max Turgeon STAT 7200–Multivariate Statistics

Objectives • Construct tests for a single multivariate mean • Discuss and compare confidence regions and confidence intervals • Describe connection with Likelihood Ratio Test • Construct tests for two multivariate means • Present robust alternatives to these tests 2

3 reject the null hypothesis if • This means that • We saw in a previous lecture that Test for a multivariate mean: Σ known • Let Y 1 , . . . , Y n ∼ N p ( µ, Σ) be independent. µ, 1 � � ¯ Y ∼ N p n Σ . n ( ¯ Y − µ ) T Σ − 1 ( ¯ Y − µ ) ∼ χ 2 ( p ) . • In particular, if we want to test H 0 : µ = µ 0 at level α , then we n ( ¯ Y − µ 0 ) T Σ − 1 ( ¯ Y − µ 0 ) > χ 2 α ( p ) .

Example i library (dslabs) library (tidyverse) dataset <- filter (gapminder, year == 2012, !is.na (infant_mortality)) dataset <- dataset[, c (”infant_mortality”, ”life_expectancy”, ”fertility”)] 4 dataset <- as.matrix (dataset)

Example ii dim (dataset) ## [1] 178 3 # Assume we know Sigma Sigma <- matrix ( c (555, -170, 30, -170, 65, -10, 30, -10, 2), ncol = 3) mu_hat 5 mu_hat <- colMeans (dataset)

Example iii ## infant_mortality life_expectancy fertility ## 25.824157 71.308427 2.868933 # Test mu = mu_0 mu_0 <- c (25, 50, 3) solve (Sigma) %*% (mu_hat - mu_0) c ( drop (test_statistic), qchisq (0.95, df = 3)) ## [1] 7153.275387 7.814728 6 test_statistic <- nrow (dataset) * t (mu_hat - mu_0) %*%

Example iv drop (test_statistic) > qchisq (0.95, df = 3) ## [1] TRUE 7

8 null hypothesis if Test for a multivariate mean: Σ unknown i • Of course, we rarely (if ever) know Σ , and so we use its MLE n Σ = 1 ˆ ( Y i − ¯ Y )( Y i − ¯ Y ) T � n i =1 or the sample covariance S n . • Therefore, to test H 0 : µ = µ 0 at level α , then we reject the T 2 = n ( ¯ n ( ¯ Y − µ 0 ) T S − 1 Y − µ 0 ) > c, for a suitably chosen constant c that depends on α . • Note : The test statistic T 2 is known as Hotelling’s T 2 .

9 Test for a multivariate mean: Σ unknown ii • We will show that (under H 0 ) T 2 has a simple distribution: T 2 ∼ ( n − 1) p ( n − p ) F ( p, n − p ) . • In other words, we reject the null hypothesis at level α if T 2 > ( n − 1) p ( n − p ) F α ( p, n − p ) .

Example (revisited) i n <- nrow (dataset); p <- ncol (dataset) # Test mu = mu_0 mu_0 <- c (25, 50, 3) solve ( cov (dataset)) %*% (mu_hat - mu_0) df2 = n - p) / (n - p) 10 test_statistic <- n * t (mu_hat - mu_0) %*% critical_val <- (n - 1) * p *qf (0.95, df1 = p,

Example (revisited) ii c ( drop (test_statistic), critical_val) ## [1] 5121.461370 8.059773 drop (test_statistic) > critical_val ## [1] TRUE 11

We will prove a more general result that we will also be useful for more than one multivariate mean. Theorem independent. Define Then degrees of freedom. 12 Distribution of T 2 Let Y ∼ N p (0 , Σ) , let mW ∼ W p ( m, Σ) , and assume Y , W are T 2 = m Y T W − 1 Y . m − p + 1 T 2 ∼ F ( p, m − p + 1) , mp where F ( α, β ) denotes the non-central F -distribution with α, β

Proof i • In other words, without loss of generality, we can assume • The other columns can be chosen by finding a basis for the obtain an orthonormal basis. 13 • First, if we write Σ = LL T , we can replace Y by L − 1 Y and W with ( L − 1 ) T W ( L − 1 ) without changing T 2 . Σ = I p . • Now, note that since Y and W are independent, the conditional distribution of mW given Y is also W p ( m, I p ) . • Consider Y a fixed quantity, and let H be an orthogonal matrix whose first column is Y ( Y T Y ) − 1 / 2 . orthogonal complement of Y and applying Gram-Schmidt to

Proof ii 14 • Define V = H T WH . Conditional on Y , this is still distributed as 1 m W p ( m, I p ) . • This distribution does not depend on Y , and therefore V and Y are independent. • Decompose V as such:    v 11 V 12  , V 21 V 22 where v 11 is a (random) scalar.

Proof iii • We now have 15 • By result A.2.4g of MKB (see supplementary materials), the (1 , 1) element of V − 1 is given by v − 1 11 | 2 = ( v 11 − V 12 V − 1 22 V 21 ) − 1 . • Moreover, note that v 11 | 2 ∼ χ 2 ( m − p + 1) . 1 mT 2 = Y T W − 1 Y = ( H T Y ) T ( H T WH ) − 1 ( H T Y ) = ( H T Y ) T ( V ) − 1 ( H T Y ) = ( Y T Y ) 1 / 2 v − 1 11 | 2 ( Y T Y ) 1 / 2 = ( Y T Y ) /v 11 | 2 .

Proof iv • Therefore, we have 16 independent chi-squares. • In other words, we have expressed 1 m T 2 as a ratio of m − p + 1 T 2 = � � � � ( Y T Y ) /p / v 11 | 2 / ( m − p + 1) mp ∼ F ( p, m − p + 1) .

• Analogously to the univariate setting, it may be more informative to look at a confidence region : 17 Confidence region for µ i • The set of values µ 0 ∈ R p that are supported by the data, i.e. whose corresponding null hypothesis H 0 : µ = µ 0 would be rejected at level α . • Let c 2 = ( n − 1) p ( n − p ) F α ( p, n − p ) . A 100(1 − α ) % confidence region for µ is given by the ellipsoid around ¯ Y such that n ( ¯ n ( ¯ Y − µ ) T S − 1 Y − µ ) < c 2 , µ ∈ R p .

• We can describe the confidence region in terms of the of unit length. axes 18 Confidence region for µ ii eigendecomposition of S n : let λ 1 ≥ · · · ≥ λ p be its eigenvalues, and let v 1 , . . . , v p be corresponding eigenvectors • The confidence region is the ellipsoid centered around ¯ Y with � ± c λ i v i .

• Therefore, we first need to project onto an axis or onto the plane. 19 Visualizing confidence regions when p > 2 i • When p > 2 we cannot easily plot the confidence regions. • Theorem : Let c > 0 be a constant and A a p × p positive definite matrix. For a given vector u ̸ = 0 , the projection of the ellipse { y T A − 1 y ≤ c 2 } onto u is given by √ u T A u c u . u T u

20 Visualizing confidence regions when p > 2 ii • If we take u to be the standard unit vectors, we get confidence intervals for each component of µ : � � ( n − 1) p � LB = ¯ � ( n − p ) F α ( p, n − p )( s 2 Y j − jj /n ) � � ( n − 1) p � UB = ¯ � ( n − p ) F α ( p, n − p )( s 2 Y j + jj /n ) .

Example i n <- nrow (dataset); p <- ncol (dataset) df2 = n - p) / (n - p) cbind (mu_hat - sqrt (critical_val * sample_cov / n), mu_hat + sqrt (critical_val * sample_cov / n)) 21 critical_val <- (n - 1) * p *qf (0.95, df1 = p, sample_cov <- diag ( cov (dataset))

Example ii ## [,1] [,2] ## infant_mortality 20.801776 30.846538 ## life_expectancy 69.561973 73.054881 ## fertility 2.565608 3.172257 22

definite matrix. For a given pair of perpendicular unit vectors 23 Visualizing confidence regions when p > 2 (cont’d) i • Theorem : Let c > 0 be a constant and A a p × p positive u 1 , u 2 , the projection of the ellipse { y T A − 1 y ≤ c 2 } onto the plane defined by u 1 , u 2 is given by � ( U T y ) T ( U T AU ) − 1 ( U T y ) ≤ c 2 � , where U = ( u 1 , u 2 ) .

Example (cont’d) i U <- matrix ( c (1, 0, 0, 0, 1, 0), ncol = 2) R <- n *solve ( t (U) %*% cov (dataset) %*% U) 24 transf <- chol (R)

Example (cont’d) ii # First create a circle of radius c theta_vect <- seq (0, 2 * pi, length.out = 100) sin (theta_vect)) # Then turn into ellipse ellipse <- circle %*% t ( solve (transf)) + matrix (mu_hat[1 : 2], ncol = 2, nrow = nrow (circle), byrow = TRUE) 25 circle <- sqrt (critical_val) * cbind ( cos (theta_vect),

Example (cont’d) iii # Eigendecomposition # To visualize the principal axes decomp <- eigen ( t (U) %*% cov (dataset) %*% U) decomp $ vectors[,1] * sqrt (critical_val) decomp $ vectors[,2] * sqrt (critical_val) 26 first <- sqrt (decomp $ values[1]) * second <- sqrt (decomp $ values[2]) *

27 Example (cont’d) iv 74 73 72 life_expectancy 71 70 69 22 24 26 28 30 infant_mortality

Simultaneous Confidence Statements i following confidence interval: 28 • Let w ∈ R p . We are interested in constructing confidence intervals for w T µ that are simultaneously valid (i.e. right coverage probability) for all w . • Note that w T ¯ Y and w T S n w are both scalars. • If we were only interested in a particular w , we could use the � � w T ¯ � Y ± t α/ 2 ,n − 1 w T S n w/n .

Simultaneous Confidence Statements ii • Or equivalently, the confidence interval contains the set of 29 values w T µ for which t 2 ( w ) = n ( w T ¯ = n ( w T ( ¯ Y − w T µ ) 2 Y − µ )) 2 ≤ F α (1 , n − 1) . w T S n w w T S n w • Strategy : Maximise over all w : n ( w T ( ¯ Y − µ )) 2 t 2 ( w ) = max max . w T S n w w w

Simultaneous Confidence Statements iii • Using the Cauchy-Schwarz Inequality: 30 Y − µ )) 2 = ( w T S 1 / 2 ( w T ( ¯ n S − 1 / 2 ( ¯ Y − µ )) 2 n = (( S 1 / 2 n w ) T ( S − 1 / 2 ( ¯ Y − µ ))) 2 n ≤ ( w T S n w )(( ¯ Y − µ ) T S − 1 n ( ¯ Y − µ )) . • Dividing both sides by w T S n w/n , we get t 2 ( w ) ≤ n ( ¯ Y − µ ) T S − 1 n ( ¯ Y − µ ) .

Tests for Multivariate Means Max Turgeon STAT 7200Multivariate - PowerPoint PPT Presentation

Tests for Multivariate Means Max Turgeon STAT 7200Multivariate Statistics Objectives Construct tests for a single multivariate mean Discuss and compare confidence regions and confidence intervals Describe connection with

Outline Multivariate Data 1 Multivariate Parametric Methods Multivariate Normal Distribution 2

Multivariate t-distributions Surajit Ray Reader, University of Glasgow DataCamp Multivariate

Reading multivariate data Surajit Ray Reader, University of Glasgow DataCamp Multivariate

Tests for Multivariate Linear Models with the car Package John Fox McMaster University Hamilton,

Multivariate Ordination Analyses: Principal Component Analysis Dilys Vela Tatiana Boza Tatiana

Regression Diagnostics and the Forward Search 3. A Single Multivariate Sample Anthony Atkinson,

Multivariate Linear Regression Max Turgeon STAT 4690Applied Multivariate Analysis

Robust Statistics Part 2: Multivariate location and scatter Peter Rousseeuw LARS-IASC School,

Advanced PHP Dr. Steven Bitner A/B and Multivariate testing Why use multivariate testing If

Multivariate normal distribution Surajit Ray Reader, University of Glasgow DataCamp

Multivariate Normal Distribution Max Turgeon STAT 4690Applied Multivariate Analysis Building

Comparing User-Provided Tests to Developer-Provided Tests Ren Just, Chris Parnin, Ian Drosos,

On corrections of classical multivariate tests for high-dimensional data Jian-feng Yao with

K-MEANS++ OPTIMAL INITIALIZATION ALGORITHM An Improved K-means Clustering Method OVERVIEW

Lecture 23/Chapter 19 Diversity of Sample Means Means versus Proportions Behavior of

In vitro tests and experimental animal In vitro tests and experimental animal In vitro tests and

Lecture 3: Dependence measures using RKHS embeddings MLSS Cadiz, 2016 Arthur Gretton Gatsby

Prolific Permutations and Expected Breadth Cheyne Homberger Permutation Patterns 2018 Joint work

CS 211 Autumn 2006 Homework 2 Solutions Matousek 2.1.2 Induction: By induction on n: Assume

The 8M Algorithm from Todays Perspective Radim Belohlavek, Martin Trnecka DEPARTMENT OF

Sta$s$calMethodsforExperimental Par$clePhysics TomJunk

Multiple Comparisons October 18, 2019 October 18, 2019 1 / 17 After the ANOVA For an ANOVA, H

On the Optimum Number of Hypotheses to Test when the Number of Observations is Limited A.

Boolean Logic 01-1 Boolean values Are TRUE and FALSE 01-2 Boolean values Are TRUE and