Test for Covariances Max Turgeon STAT 7200Multivariate Statistics - PowerPoint PPT Presentation

Test for Covariances Max Turgeon STAT 7200–Multivariate Statistics

Objectives • Review general theory of likelihood ratio tests • Tests for structured covariance matrices • Test for equality of multiple covariance matrices 2

Likelihood ratio tests i • We will build our tests for covariances using likelihood ratios. • Therefore, we quickly review the asymptotic theory for regular models. • We are interested in the following hypotheses: 3 • Let Y 1 , . . . , Y n be a random sample from a density p θ with parameter θ ∈ R d . H 0 : θ ∈ Θ 0 , H 1 : θ ∈ Θ 1 , where Θ i ⊆ R d .

Likelihood ratio tests ii likelihood ratio 4 • Let L ( θ ) = � n i =1 p θ ( Y i ) be the likelihood, and define the max θ ∈ Θ 0 L ( θ ) Λ = max θ ∈ Θ 0 ∪ Θ 1 L ( θ ) . • Recall : we reject the null hypothesis H 0 for small values of Λ .

Likelihood ratio tests iii Theorem (Van der Wandt, Chapter 16) we have • Therefore, in practice, we need to count the number of free enough. 5 Assume Θ 0 , Θ 1 are locally linear . Under regularity conditions on p θ , − 2 log Λ → χ 2 ( k ) , where k is the difference in the number of free parameters between the null model Θ 0 and the unrestricted model Θ 0 ∪ Θ 1 . parameters in each model and hope the sample size n is large

Tests for structured covariance matrices i • We are going to look at several tests for structured covariance matrix. positive definite. • Like other exponential families, the multivariate normal distribution satisfies the regularity conditions of the theorem above. • Being positive definite implies that the unrestricted parameter space is locally linear , i.e. we are staying away from the 6 • Throughout, we assume Y 1 , . . . , Y n ∼ N p ( µ, Σ) with Σ boundary where Σ is singular.

Tests for structured covariance matrices ii • A few important observations about the unrestricted model: • The number of free parameters is equal to the number of • The maximised likelihood for the unrestricted model is given by 7 entries on and above the diagonal of Σ , which is p ( p + 1) / 2 . • The sample mean ¯ Y maximises the likelihood independently of the structure of Σ . exp( − np/ 2) L ( ˆ Y , ˆ Σ) = Σ | n/ 2 . (2 π ) np/ 2 | ˆ

Specified covariance structure i • We will start with the simplest hypothesis test: • Note that there is no free parameter in the null model. 8 H 0 : Σ = Σ 0 . • Write V = n ˆ Σ . Recall that we have � � − 1 Y , Σ) = (2 π ) − np/ 2 | Σ | − n/ 2 exp L ( ˆ 2tr(Σ − 1 V ) .

Specified covariance structure ii • Therefore, the likelihood ratio is given by 9 � � (2 π ) − np/ 2 | Σ 0 | − n/ 2 exp − 1 2 tr(Σ − 1 0 V ) Λ = exp( − np/ 2)(2 π ) − np/ 2 | ˆ Σ | − n/ 2 � � | Σ 0 | − n/ 2 exp − 1 2 tr(Σ − 1 0 V ) = exp( − np/ 2) | n − 1 V | − n/ 2 � e � np/ 2 � � − 1 0 V | n/ 2 exp | Σ − 1 2tr(Σ − 1 = 0 V ) . n • In particular, if Σ 0 = I p , we get � e � np/ 2 � � − 1 | V | n/ 2 exp Λ = 2tr( V ) . n

Example i library (tidyverse) # Winnipeg avg temperature url <- paste0 (”https://maxturgeon.ca/w20-stat7200/”, ”winnipeg_temp.csv”) dataset[1 : 3,1 : 3] ## temp_2010 temp_2011 temp_2012 ## 1 -25.57500 -16.25417 -6.379167 ## 2 -26.06250 -18.39583 -12.925000 ## 3 -20.56667 -19.45833 -5.791667 10 dataset <- read.csv (url)

Example ii n <- nrow (dataset) # Diag = 14^2 # Corr = 0.8 Sigma0 <- diag (0.8, nrow = p) diag (Sigma0) <- 1 Sigma0 <- 14 ^ 2 * Sigma0 Sigma0_invXV <- solve (Sigma0, V) 11 p <- ncol (dataset) V <- (n - 1) *cov (dataset)

Example iii lrt <- 0.5 * n * p * (1 - log (n)) df <- choose (p + 1, 2) c (lrt, qchisq (0.95, df)) ## [1] 5631.63409 73.31149 12 lrt <- lrt + 0.5 * n *log ( det (Sigma0_invXV)) lrt <- lrt - 0.5 *sum ( diag (Sigma0_invXV)) lrt <- -2 * lrt

Test for sphericity i • Note that there is one free parameter. • We have 13 • In other words, we are looking at the following null hypothesis: uncorrelated and have the same variance . • Sphericity means the different components of Y are σ 2 > 0 . H 0 : Σ = σ 2 I p , � � − 1 Y , σ 2 I p ) = (2 π ) − np/ 2 | σ 2 I p | − n/ 2 exp L ( ˆ 2tr(( σ 2 I p ) − 1 V ) � � − 1 = (2 πσ 2 ) − np/ 2 exp 2 σ 2 tr( V ) .

Test for sphericity ii • We then get • Taking the derivative of the logarithm and setting it equal to 14 zero, we find that L ( ˆ Y , σ 2 I p ) is maximised when σ 2 = tr V � np . � � − 1 σ 2 ) − np/ 2 exp L ( ˆ Y , � σ 2 I p ) = (2 π � σ 2 tr( V ) 2 � � tr V � − np/ 2 � � − np = (2 π ) − np/ 2 exp . np 2

Test for sphericity iii • Therefore, we have 15 (2 π ) − np/ 2 � � − np/ 2 exp � � − np tr V np 2 Λ = exp( − np/ 2)(2 π ) − np/ 2 | ˆ Σ | − n/ 2 � � − np/ 2 tr V np = | n − 1 V | − n/ 2 � � n/ 2 | V | = . (tr V/p ) p

Example (cont’d) i lrt <- -2 * 0.5 * n * ( log ( det (V)) - p *log ( mean ( diag (V)))) c (lrt, qchisq (0.95, df)) ## [1] 5630.79458 72.15322 16 df <- choose (p + 1, 2) - 1

Test for sphericity (cont’d) i • Recall that we have 17 � � n/ 2 | V | Λ = . (tr V/p ) p • We can rewrite this as follows: let l 1 ≥ · · · ≥ l p be the eigenvalues of V . We have | V | Λ 2 /n = (tr V/p ) p � p j =1 l j = � p ( 1 j =1 l j ) p p   � p p j =1 l 1 /p j   = . � p 1 j =1 l j p

Test for sphericity (cont’d) ii • A result of Srivastava and Khatri gives the exact distribution of 18 • In other words, the modified LRT ˜ Λ = Λ 2 /n is the ratio of the geometric to the arithmetic mean of the eigenvalues of V (all raised to the power p ). ˜ Λ : � 1 � 1 �� p − 1 � 2 + 1 ˜ Λ = B 2( n − j − 1) , j . p j =1

Example (cont’d) i B <- 1000 df1 <- 0.5 * (n - seq_len (p-1) - 1) # Critical values dist <- replicate (B, { prod ( rbeta (p-1, df1, df2)) }) 19 df2 <- seq_len (p-1) * (0.5 + 1 / p)

Example (cont’d) ii # Test statistic decomp <- eigen (V, symmetric = TRUE, only.values = TRUE) lrt_mod <- (geo_mean / ar_mean) ^ p c (lrt_mod, quantile (dist, 0.95)) ## 95% ## 1.181561e-07 8.967361e-01 20 ar_mean <- mean (decomp $ values) geo_mean <- exp ( mean ( log (decomp $ values)))

Test for independence i . . . . ... . . . . . . ... . . 21 • Decompose Y i into k blocks: Y i = ( Y 1 i , . . . , Y ki ) , � k where Y ji ∼ N p j ( µ j , Σ jj ) and j =1 p j = p . • This induces a decomposition on Σ and V :     Σ 11 · · · Σ 1 k V 11 · · · V 1 k         Σ =  , V =  .       Σ k 1 · · · Σ kk V k 1 · · · V kk

Test for independence ii . • Under the null hypothesis, the likelihood can be decomposed . . • We are interested in testing for independence between the ... . . . 22 different blocks Y 1 i , . . . , Y ki . This equivalent to   Σ 11 · · · 0     H 0 : Σ =  .    0 · · · Σ kk • Note that there are � k j =1 p j ( p j + 1) / 2 free parameters. into k likelihoods that can be maximised independently.

Test for independence iii • This gives us • Putting this together, we conclude that 23 k � exp( − np j / 2) max L ( ˆ Y , Σ) = (2 π ) np j / 2 | � Σ jj | n/ 2 j =1 exp( − np/ 2) = Σ jj | n/ 2 . (2 π ) np/ 2 � k j =1 | � � � n/ 2 | V | Λ = . � k j =1 | V jj |

Example i url <- paste0 (”https://maxturgeon.ca/w20-stat7200/”, ”blue_data.csv”) names (blue_data) ## [1] ”NumSold” ”Price” ”AdvCost” ”SalesAssist” dim (blue_data) ## [1] 10 4 24 blue_data <- read.csv (url)

Example ii # Let's test for independence between # all four variables n <- nrow (blue_data) V <- (n-1) *cov (blue_data) df <- choose (p + 1, 2) - p c (lrt, qchisq (0.95, df)) 25 p <- ncol (blue_data) lrt <- -2 * ( log ( det (V)) - sum ( log ( diag (V))))

Example iii ## [1] 5.635124 12.591587 lrt > qchisq (0.95, df) ## [1] FALSE 26

Test for equality of covariances i independent random samples from (potentially) different 27 • We now look at a different setting: assume that we collected K p -dimensional multivariate normal distributions: Y 1 k , . . . , Y n k k ∼ N p ( µ k , Σ k ) , k = 1 , . . . , K. • We are interested in the null hypothesis that all Σ k are equal to some unknown Σ : H 0 : Σ k = Σ , for all k = 1 , . . . , K.

Test for equality of covariances ii • First, note that since the samples are independent, the full likelihood is the product of the likelihoods for each sample: • Therefore, over the unrestricted model, the maximum likelihood estimators are • Note that the number of free parameters over the unrestricted 28 K � L ( µ 1 , . . . , µ K , Σ 1 , . . . , Σ K ) = L ( µ k , Σ k ) . k =1 Y k , ˆ ( ¯ Σ k ) . model is kp ( p + 1) / 2 .

Test for Covariances Max Turgeon STAT 7200Multivariate Statistics - PowerPoint PPT Presentation

Test for Covariances Max Turgeon STAT 7200Multivariate Statistics Objectives Review general theory of likelihood ratio tests Tests for structured covariance matrices Test for equality of multiple covariance matrices 2

ANDES progress Meeting WP2: Uncertainties and covariances of nuclear data Task 2.4: Covariances

Model-Based Testing (ISTQB Chapter 4) Arie van Deursen 1 4.1 ISTQB Test Design Test Scripts

200511316 200511316 Test plan Test design specification g p

FLSA DUTIES TEST Exemption/Duties Test Types of Duties/Exemption Test Executive Exemption

Engineering Best Practices Test, test, test, and test some more; test as you go Start from a

Test automation Building automatically repeatable test suites Test automation n Test automation

Nehemiah Prays Nehemiah 1-2 Here is some test text Here is some test text Here is some test

TEST ANXIETY Strategies to Handle Test Anxiety OVERVIEW What is test anxiety? Positive verses

TEST AUTOMATION AT BMAR BMAR TEST TEAM Test Automation Planning 1. Selection Of Test

TESTING EQUIPMENTS FOR SAFETY TEST LIST OF TEST EQUIPMENT TEST SETUP FOR AIR CONDITIONER 1.

God Rescues Daniel from the Lions Daniel 6 Here is some test text Here is some test text Here

The Good Samaritan Luke 10:25-37 Here is some test text Here is some test text Here is some

Esther and the Great Reversal Esther 6-9 Here is some test text Here is some test text Here is

The Handwriting of God Daniel 5 Here is some test text Here is some test text Here is some test

Jesus Calls His First Disciples Matthew 4:17-22; 9:9-13 Here is some test text Here is some test

God Calls a Spokesman The Book of JEREMIAH Here is some test text Here is some test text Here

Chapter 8: Hypothesis Testing STK4011/9011: Statistical Inference Theory Johan Pensar

Bayesian hypothesis testing (cont.) Dr. Jarad Niemi STAT 544 - Iowa State University March 7,

The Bernoulli Generalized Likelihood Ratio test (BGLR) for Non-Stationary Multi-Armed Bandits

13. hypothesis testing 1 competing hypotheses 2 competing hypotheses 3 competing hypotheses

Classification Fundamentals and Overview September 17, 2019 Classification Fundamentals

Internet Traffic Analysis: Mohammed Alasmar Co Cosen ener ers, , 2019 2019

Approximating likelihood ratios with calibrated classifiers Gilles Louppe June 22, 2016 MLHEP,

Weighted Classification Cascades for Optimizing Discovery Significance Lester Mackey