Multivariate Analysis of Variance Max Turgeon STAT 7200Multivariate - PowerPoint PPT Presentation

Multivariate Analysis of Variance Max Turgeon STAT 7200–Multivariate Statistics

Objectives • Present the four classical test statistics • Discuss approximations for their null distribution 2 • Introduce MANOVA as a generalization of Hotelling’s T 2

Quick Overview What do we mean by Analysis of Variance? • ANOVA is a collection of statistical models that aim to analyze and understand the differences in means between different subgroups of the data. • Note that there could be multiple, overlapping ways of defining the subgroups (e.g multiway ANOVA) • It also provides a framework for hypothesis testing. • Which can be recovered from a suitable regression model. • Most importantly , ANOVA provides a framework for understanding and comparing the various sources of variation in the data. 3 • As such, it can be seen as a generalisation of the t -test (or of Hotelling’s T 2 ).

Review of univariate ANOVA i . • Homoscedasticity . . ... . . . 4 • Assume the data comes from g populations: X 11 , . . . , X 1 n 1 X g 1 , . . . , X gn g • Assume that X ℓ 1 , . . . , X ℓn ℓ is a random sample from N ( µ ℓ , σ 2 ) , for ℓ = 1 , . . . , g . • We are interested in testing the hypothesis that µ 1 = . . . = µ g .

Review of univariate ANOVA ii • We can write our observations as there are infinitely many models that lead to the same data-generating mechanism. 5 • Reparametrisation : We will write the mean µ ℓ = µ + τ ℓ as a sum of an overall component µ (i.e. shared by all populations) and a population-specific component τ ℓ . • Our hypothesis can now be rewritten as τ ℓ = 0 , for all ℓ . X ℓi = µ + τ ℓ + ε ℓi , where ε ℓi ∼ N (0 , σ 2 ) . • Identifiability : We need to assume ∑ g ℓ =1 τ ℓ = 0 , otherwise • Sample statistics : Set n = ∑ g ℓ =1 n ℓ .

Review of univariate ANOVA iii • We get the following decomposition: get 6 • Overall sample mean: ¯ ∑ g ∑ n ℓ X = 1 i =1 X ℓi . n ℓ =1 • Population-specific sample mean: ¯ ∑ n ℓ 1 X ℓ = i =1 X ℓi . n ℓ ( ¯ ( X ℓi − ¯ ) X ℓ − ¯ ) ( X ℓi − ¯ ) X = X + X ℓ . • Squaring the left-hand side and summing over both ℓ and i , we ( ¯ g n ℓ g g n ℓ ) 2 = ) 2 . ) 2 + ( X ℓi − ¯ X ℓ − ¯ ( X ℓi − ¯ ∑ ∑ ∑ ∑ ∑ X n ℓ X X ℓ i =1 i =1 ℓ =1 ℓ =1 ℓ =1 • This is typically summarised as SS T = SS M + SS R : ) 2 • The total sum of squares : SS T = ∑ g ∑ n ℓ ( X ℓi − ¯ X i =1 ℓ =1

Review of univariate ANOVA iv • The model (or treatment) sum of squares : Total Residual Model Degrees of freedom Sum of Squares Source of Variation • Yet another representation is the ANOVA table : 7 • The residual sum of squares : ( ¯ ) 2 SS M = ∑ g X ℓ − ¯ ℓ =1 n ℓ X ) 2 SS R = ∑ g ( ∑ n ℓ X ℓi − ¯ X ℓ ℓ =1 i =1 SS M g − 1 SS R n − g SS T n − 1

Review of univariate ANOVA v • We could also instead reject the null hypothesis for small values of This is the test statistic that we will generalize to the multivariate setting. 8 • The usual test statistic used for testing τ ℓ = 0 for all ℓ is F = SS M / ( g − 1) SS R / ( n − g ) ∼ F ( g − 1 , n − g ) . SS R = SS R . SS R + SS M SS T

Multivariate ANOVA i . • We are again interested in testing the hypothesis that • Homoscedasticity is key here again. . . ... . . . populations: 9 • The setting is similar: Assume the data comes from g Y 11 , . . . , Y 1 n 1 Y g 1 , . . . , Y gn g • Assume that Y ℓ 1 , . . . , Y ℓn ℓ is a random sample from N p ( µ ℓ , Σ) , for ℓ = 1 , . . . , g . µ 1 = . . . = µ g . • Reparametrisation : We will write the mean as µ ℓ = µ + τ ℓ

Multivariate ANOVA ii • Instead of a decomposition of the sum of squares, we get a • The decomposition is given as decomposition of the outer product: 10 • Identifiability : We need to assume • Y ℓi = µ + τ ℓ + E ℓi , where E ℓi ∼ N p (0 , Σ) . ∑ g ℓ =1 τ ℓ = 0 . ( Y ℓi − ¯ Y )( Y ℓi − ¯ Y ) T . g n ℓ g Y ) T = ( Y ℓi − ¯ Y )( Y ℓi − ¯ n ℓ ( ¯ Y ℓ − ¯ Y )( ¯ Y ℓ − ¯ Y ) T ∑ ∑ ∑ i =1 ℓ =1 ℓ =1 g n ℓ ∑ ∑ ( Y ℓi − ¯ Y ℓ )( Y ℓi − ¯ Y ℓ ) T . + ℓ =1 i =1

Multivariate ANOVA iii • Within sum of squares and cross products matrix : are independent, and that under the null hypothesis that • Between sum of squares and cross products matrix : 11 B = ∑ g ℓ =1 n ℓ ( ¯ Y ℓ − ¯ Y )( ¯ Y ℓ − ¯ Y ) T . W = ∑ g ∑ n ℓ i =1 ( Y ℓi − ¯ Y ℓ )( Y ℓi − ¯ Y ℓ ) T . ℓ =1 • Note that W = ∑ g ℓ =1 ( n ℓ − 1) S ℓ , and therefore W p ( n − g, Σ) . • Moreover, using Cochran’s theorem, we can show that W and B τ ℓ = 0 for all ℓ = 1 , . . . , g , we also have B ∼ W p ( g − 1 , Σ) .

Multivariate ANOVA iv • Similarly as above, we have a MANOVA table : Source of Variation Sum of Squares Degrees of freedom Model Residual Total 12 B g − 1 W n − g B + W n − 1

Likelihood Ratio Test i • As the notation suggests, this is the likelihood ratio test statistic . pooled covariance: each mean parameter is maximised independently, and the • Under the unrestricted model (i.e. no constraint on the means), maximum likelihood estimator for the covariance matrix is the will use Wilk’s lambda as our test statistic: 13 • To test the null hypothesis H 0 : τ ℓ = 0 for all ℓ = 1 , . . . , g , we | W | Λ 2 /n = | B + W | . Σ = 1 ˆ µ ℓ = ¯ ˆ Y ℓ , nW.

Likelihood Ratio Test ii • Putting this together, we get • Under the null model (i.e. all means are equal), all observations 14 maximum likelihood estimators are Y ℓi come from a unique distribution N p ( µ, Σ) , and so the Σ = 1 ˆ µ = ¯ ˆ Y , n ( B + W ) . Λ = (2 π ) − np/ 2 exp( − np/ 2) | 1 n ( B + W ) | − n/ 2 (2 π ) − np/ 2 exp( − np/ 2) | 1 n W | − n/ 2 = | 1 n ( B + W ) | − n/ 2 | 1 n W | − n/ 2 ) n/ 2 ( | W | = . | B + W |

Likelihood Ratio Test iii • From the general asymptotic theory, we now that we reject the null hypothesis if 15 • Using Bartlett’s approximation, we can get an unbiased test: − 2 log Λ ≈ χ 2 (( g − 1) p ) . n − 1 − 1 ( ) log Λ ≈ χ 2 (( g − 1) p ) . − 2( p + g ) • In particular, if we let c = χ 2 α (( n − 1) p ) be the critical value, ( ) − c Λ ≤ exp . n − 1 − 0 . 5( p + g )

Example i ## Example on producing plastic film ## from Krzanowski (1998, p. 381) tear <- c (6.5, 6.2, 5.8, 6.5, 6.5, 6.9, 7.2, 6.9, 6.1, 6.3, 6.7, 6.6, 7.2, 7.1, 6.8, 7.1, 7.0, 7.2, 7.5, 7.6) 9.9, 9.5, 9.4, 9.1, 9.3, 8.3, 8.4, 8.5, 9.2, 8.8, 9.7, 10.1, 9.2) opacity <- c (4.4, 6.4, 3.0, 4.1, 0.8, 5.7, 2.0, 3.9, 1.9, 5.7, 2.8, 4.1, 3.8, 1.6, 3.4, 8.4, 5.2, 6.9, 2.7, 1.9) 16 gloss <- c (9.5, 9.9, 9.6, 9.6, 9.2, 9.1, 10.0,

Example ii Y <- cbind (tear, gloss, opacity) Y_low <- Y[1 : 10,] Y_high <- Y[11 : 20,] W <- ( nrow (Y_low) - 1) *cov (Y_low) + ( nrow (Y_high) - 1) *cov (Y_high) B <- (n-1) *cov (Y) - W (Lambda <- det (W) /det (W + B)) ## [1] 0.4136192 17 n <- nrow (Y); p <- ncol (Y); g <- 2

Example iii transf_lambda <- - (n - 1 - 0.5 * (p + g)) *log (Lambda) ## [1] TRUE # Or if you want a p-value pchisq (transf_lambda, p * (g-1), lower.tail = FALSE) ## [1] 0.002227356 18 transf_lambda > qchisq (0.95, p * (g-1))

Example iv # R has a function for MANOVA # But first, create factor variable rate <- gl (g, 10, labels = c (”Low”, ”High”)) fit <- manova (Y ~ rate) summary_tbl <- broom ::tidy (fit, test = ”Wilks”) # Or you can use the summary function knitr ::kable (summary_tbl, digits = 3) 19

Example v 3 - - - - - 18 Residuals 0.002 16 7.561 term 0.414 1 rate p.value den.df num.df statistic wilks df 20

Example vi # Check residuals for evidence of normality library (tidyverse) variable, residual) ggplot (data_plot, aes (sample = residual)) + stat_qq () + stat_qq_line () + facet_grid (. ~ variable) + theme_minimal () 21 resids <- residuals (fit) data_plot <- gather ( as.data.frame (resids),

Example vii 22 gloss opacity tear 2.5 sample 0.0 −2.5 −2 −1 0 1 2 −2 −1 0 1 2 −2 −1 0 1 2 theoretical

Example viii # Next: Chi-squared plot Sn <- cov (resids) qqplot ( qchisq ( ppoints (dists), df = df), qqline (dists, distribution = function (p) { qchisq (p, df = df) }) 23 dists <- mahalanobis (resids, colMeans (resids), Sn) df <- mean (dists) dists, xlab = ”Theoretical Quantiles”)

Example ix 24 7 6 5 4 dists 3 2 1 0 0 2 4 6 8 Theoretical Quantiles

Comments i • The output from R shows a different approximation to the Wilk’s lambda distribution, due to Rao. • There are actually 4 tests available in R : • Wilk’s lambda; • Pillai-Bartlett; • Hotelling-Lawley; • Roy’s Largest Root. 25

Comments ii • Since we only had two groups in the above example, we were only comparing two means. • But of course MANOVA is much more general. • We can assess the normality assumption by looking at the 26 • Wilk’s lambda was therefore equivalent to Hotelling’s T 2 . residuals E ℓi = Y ℓi − ¯ Y ℓ .

Multivariate Analysis of Variance Max Turgeon STAT 7200Multivariate - PowerPoint PPT Presentation

Multivariate Analysis of Variance Max Turgeon STAT 7200Multivariate Statistics Objectives Present the four classical test statistics Discuss approximations for their null distribution 2 Introduce MANOVA as a generalization of

Outline Multivariate Data 1 Multivariate Parametric Methods Multivariate Normal Distribution 2

Multivariate Analysis of Variance Max Turgeon STAT 4690Applied Multivariate Analysis Quick

Variance Will Perkins January 22, 2013 Variance Definition The variance of a random variable X

Analysis of variance and regression December 4, 2007 Variance component models Variance

Multivariate t-distributions Surajit Ray Reader, University of Glasgow DataCamp Multivariate

Reading multivariate data Surajit Ray Reader, University of Glasgow DataCamp Multivariate

Alex Psomas: Lecture 18. Random Variables: Variance 1. Variance 2. Distributions Variance Flip

Estimating Variance under Estimating Mean . . . Interval and Fuzzy Estimating Variance . . .

Variance = E[I 2 ] 2pE[I] + p 2 = E[I] 2p p + p 2 = 2 2 = p-2p+ p pq variance.1

Multivariate Ordination Analyses: Principal Component Analysis Dilys Vela Tatiana Boza Tatiana

Multivariate Linear Regression Max Turgeon STAT 4690Applied Multivariate Analysis

Multivariate Normal Distribution Max Turgeon STAT 4690Applied Multivariate Analysis Building

Module 15 Standard Costing and Variance Analysis Dr. Varadraj Bapat 1 Standard Costing

Regression Diagnostics and the Forward Search 3. A Single Multivariate Sample Anthony Atkinson,

Robust Statistics Part 2: Multivariate location and scatter Peter Rousseeuw LARS-IASC School,

Advanced PHP Dr. Steven Bitner A/B and Multivariate testing Why use multivariate testing If

Data Mining Techniques CS 6220 - Section 2 - Spring 2017 Lecture 4 Jan-Willem van de Meent (

Wireless Network Pricing Chapter 6: Oligopoly Pricing Jianwei Huang & Lin Gao Network

Nursing Home Choice, Family Bargaining and Optimal Policy in a Hotelling Economy M-L Leroux

in one cloud FIW Research Conference Verti-zontal Differentiation in Monopolistic

Using Facebook Data to Predict 2016 US Presidential Election Keng-Chi Chang Chun-Fang Chiang

The Green Paradox The Green Paradox Reyer Gerlagh Tilburg University Introduction Policy

The Multivariate Dustbin Phil Ender UCLA Statistical Consulting Group (Ret.) Stata Conference

Repeated Measures ANOVA Rick Balkin, Ph.D., LPC-S, NCC Department of Counseling Texas A & M

Multivariate Analysis of Variance Max Turgeon STAT 7200Multivariate - PowerPoint PPT Presentation

Multivariate Analysis of Variance Max Turgeon STAT 7200Multivariate Statistics Objectives Present the four classical test statistics Discuss approximations for their null distribution 2 Introduce MANOVA as a generalization of

Outline Multivariate Data 1 Multivariate Parametric Methods Multivariate Normal Distribution 2

Multivariate Analysis of Variance Max Turgeon STAT 4690Applied Multivariate Analysis Quick

Variance Will Perkins January 22, 2013 Variance Definition The variance of a random variable X

Analysis of variance and regression December 4, 2007 Variance component models Variance

Multivariate t-distributions Surajit Ray Reader, University of Glasgow DataCamp Multivariate

Reading multivariate data Surajit Ray Reader, University of Glasgow DataCamp Multivariate

Alex Psomas: Lecture 18. Random Variables: Variance 1. Variance 2. Distributions Variance Flip

Estimating Variance under Estimating Mean . . . Interval and Fuzzy Estimating Variance . . .

Variance = E[I 2 ] 2pE[I] + p 2 = E[I] 2p p + p 2 = 2 2 = p-2p+ p pq variance.1

Multivariate Ordination Analyses: Principal Component Analysis Dilys Vela Tatiana Boza Tatiana

Multivariate Linear Regression Max Turgeon STAT 4690Applied Multivariate Analysis

Multivariate Normal Distribution Max Turgeon STAT 4690Applied Multivariate Analysis Building

Module 15 Standard Costing and Variance Analysis Dr. Varadraj Bapat 1 Standard Costing

Regression Diagnostics and the Forward Search 3. A Single Multivariate Sample Anthony Atkinson,

Robust Statistics Part 2: Multivariate location and scatter Peter Rousseeuw LARS-IASC School,

Advanced PHP Dr. Steven Bitner A/B and Multivariate testing Why use multivariate testing If

Data Mining Techniques CS 6220 - Section 2 - Spring 2017 Lecture 4 Jan-Willem van de Meent (

Wireless Network Pricing Chapter 6: Oligopoly Pricing Jianwei Huang &amp; Lin Gao Network

Nursing Home Choice, Family Bargaining and Optimal Policy in a Hotelling Economy M-L Leroux

in one cloud FIW Research Conference Verti-zontal Differentiation in Monopolistic

Using Facebook Data to Predict 2016 US Presidential Election Keng-Chi Chang Chun-Fang Chiang

The Green Paradox The Green Paradox Reyer Gerlagh Tilburg University Introduction Policy

The Multivariate Dustbin Phil Ender UCLA Statistical Consulting Group (Ret.) Stata Conference

Repeated Measures ANOVA Rick Balkin, Ph.D., LPC-S, NCC Department of Counseling Texas A &amp; M

Wireless Network Pricing Chapter 6: Oligopoly Pricing Jianwei Huang & Lin Gao Network

Repeated Measures ANOVA Rick Balkin, Ph.D., LPC-S, NCC Department of Counseling Texas A & M