  1. Multivariate Analysis of Variance Max Turgeon STAT 7200–Multivariate Statistics

  2. Objectives

  • Present the four classical test statistics
  • Discuss approximations for their null distribution
  • Introduce MANOVA as a generalization of Hotelling's $T^2$

  3. Quick Overview

  What do we mean by Analysis of Variance?
  • ANOVA is a collection of statistical models that aim to analyze and understand the differences in means between different subgroups of the data.
    • Note that there could be multiple, overlapping ways of defining the subgroups (e.g. multiway ANOVA).
  • It also provides a framework for hypothesis testing.
    • Which can be recovered from a suitable regression model.
    • As such, it can be seen as a generalisation of the t-test (or of Hotelling's $T^2$).
  • Most importantly, ANOVA provides a framework for understanding and comparing the various sources of variation in the data.

  4. Review of univariate ANOVA i

  • Assume the data comes from $g$ populations:
    $$X_{11}, \ldots, X_{1n_1}$$
    $$\vdots$$
    $$X_{g1}, \ldots, X_{gn_g}$$
  • Assume that $X_{\ell 1}, \ldots, X_{\ell n_\ell}$ is a random sample from $N(\mu_\ell, \sigma^2)$, for $\ell = 1, \ldots, g$.
    • Homoscedasticity: the variance $\sigma^2$ is the same across all populations.
  • We are interested in testing the hypothesis that $\mu_1 = \cdots = \mu_g$.
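
  To make the setup concrete, here is a minimal simulation sketch in R (not from the slides; the sample sizes, means, and seed are arbitrary choices for illustration):

  # Simulate g = 3 populations sharing a common variance (homoscedasticity)
  set.seed(7200)
  g <- 3
  n_ell <- c(10, 15, 12)   # per-population sample sizes
  mu_ell <- c(5, 5.5, 6)   # population means
  sigma <- 1               # common standard deviation
  X <- unlist(lapply(seq_len(g), function(l) {
    rnorm(n_ell[l], mean = mu_ell[l], sd = sigma)
  }))
  pop <- factor(rep(seq_len(g), times = n_ell))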

  5. Review of univariate ANOVA ii

  • Reparametrisation: We will write the mean $\mu_\ell = \mu + \tau_\ell$ as a sum of an overall component $\mu$ (i.e. shared by all populations) and a population-specific component $\tau_\ell$.
  • We can write our observations as
    $$X_{\ell i} = \mu + \tau_\ell + \varepsilon_{\ell i}, \quad \text{where } \varepsilon_{\ell i} \sim N(0, \sigma^2).$$
  • Identifiability: We need to assume $\sum_{\ell=1}^g \tau_\ell = 0$; otherwise there are infinitely many models that lead to the same data-generating mechanism.
  • Our hypothesis can now be rewritten as $\tau_\ell = 0$, for all $\ell$.
  • Sample statistics: Set $n = \sum_{\ell=1}^g n_\ell$.

  6. Review of univariate ANOVA iii

  • Overall sample mean: $\bar{X} = \frac{1}{n} \sum_{\ell=1}^g \sum_{i=1}^{n_\ell} X_{\ell i}$.
  • Population-specific sample mean: $\bar{X}_\ell = \frac{1}{n_\ell} \sum_{i=1}^{n_\ell} X_{\ell i}$.
  • We get the following decomposition:
    $$(X_{\ell i} - \bar{X}) = (\bar{X}_\ell - \bar{X}) + (X_{\ell i} - \bar{X}_\ell).$$
  • Squaring the left-hand side and summing over both $\ell$ and $i$ (the cross term vanishes because $\sum_{i=1}^{n_\ell} (X_{\ell i} - \bar{X}_\ell) = 0$), we get
    $$\sum_{\ell=1}^g \sum_{i=1}^{n_\ell} (X_{\ell i} - \bar{X})^2 = \sum_{\ell=1}^g n_\ell (\bar{X}_\ell - \bar{X})^2 + \sum_{\ell=1}^g \sum_{i=1}^{n_\ell} (X_{\ell i} - \bar{X}_\ell)^2.$$
  • This is typically summarised as $SS_T = SS_M + SS_R$:
    • The total sum of squares: $SS_T = \sum_{\ell=1}^g \sum_{i=1}^{n_\ell} (X_{\ell i} - \bar{X})^2$.
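
  As a sanity check (not on the slides), the decomposition can be verified numerically, reusing X and pop from the simulation sketch above:

  # Numerical check that SS_T = SS_M + SS_R
  Xbar <- mean(X)
  Xbar_ell <- tapply(X, pop, mean)
  SS_T <- sum((X - Xbar)^2)
  SS_M <- sum(table(pop) * (Xbar_ell - Xbar)^2)
  SS_R <- sum((X - Xbar_ell[pop])^2)
  all.equal(SS_T, SS_M + SS_R)   # TRUE up to floating-point error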

  7. Review of univariate ANOVA iv

  • The model (or treatment) sum of squares: $SS_M = \sum_{\ell=1}^g n_\ell (\bar{X}_\ell - \bar{X})^2$.
  • The residual sum of squares: $SS_R = \sum_{\ell=1}^g \sum_{i=1}^{n_\ell} (X_{\ell i} - \bar{X}_\ell)^2$.
  • Yet another representation is the ANOVA table:

    Source of Variation   Sum of Squares   Degrees of freedom
    Model                 SS_M             g - 1
    Residual              SS_R             n - g
    Total                 SS_T             n - 1

  8. Review of univariate ANOVA v

  • The usual test statistic used for testing $\tau_\ell = 0$ for all $\ell$ is
    $$F = \frac{SS_M / (g - 1)}{SS_R / (n - g)} \sim F(g - 1, n - g).$$
  • We could also instead reject the null hypothesis for small values of
    $$\frac{SS_R}{SS_R + SS_M} = \frac{SS_R}{SS_T}.$$
  • This is the test statistic that we will generalize to the multivariate setting.
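
  For illustration (again assuming the simulated X and pop from the sketches above), the F statistic can be computed by hand and compared against R's built-in ANOVA:

  # F statistic by hand, reusing SS_M and SS_R from the previous check
  n <- length(X)
  F_stat <- (SS_M / (g - 1)) / (SS_R / (n - g))
  pf(F_stat, g - 1, n - g, lower.tail = FALSE)   # p-value
  # Compare with the F test reported by aov()
  summary(aov(X ~ pop))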

  9. Multivariate ANOVA i

  • The setting is similar: Assume the data comes from $g$ populations:
    $$Y_{11}, \ldots, Y_{1n_1}$$
    $$\vdots$$
    $$Y_{g1}, \ldots, Y_{gn_g}$$
  • Assume that $Y_{\ell 1}, \ldots, Y_{\ell n_\ell}$ is a random sample from $N_p(\mu_\ell, \Sigma)$, for $\ell = 1, \ldots, g$.
    • Homoscedasticity is key here again.
  • We are again interested in testing the hypothesis that $\mu_1 = \cdots = \mu_g$.
  • Reparametrisation: We will write the mean as $\mu_\ell = \mu + \tau_\ell$.

  10. Multivariate ANOVA ii

  • $Y_{\ell i} = \mu + \tau_\ell + E_{\ell i}$, where $E_{\ell i} \sim N_p(0, \Sigma)$.
  • Identifiability: We need to assume $\sum_{\ell=1}^g \tau_\ell = 0$.
  • Instead of a decomposition of the sum of squares, we get a decomposition of the outer product $(Y_{\ell i} - \bar{Y})(Y_{\ell i} - \bar{Y})^T$.
  • The decomposition is given as
    $$\sum_{\ell=1}^g \sum_{i=1}^{n_\ell} (Y_{\ell i} - \bar{Y})(Y_{\ell i} - \bar{Y})^T = \sum_{\ell=1}^g n_\ell (\bar{Y}_\ell - \bar{Y})(\bar{Y}_\ell - \bar{Y})^T + \sum_{\ell=1}^g \sum_{i=1}^{n_\ell} (Y_{\ell i} - \bar{Y}_\ell)(Y_{\ell i} - \bar{Y}_\ell)^T.$$
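
  As in the univariate case, this matrix identity can be checked numerically; here is a self-contained sketch (not from the slides) using MASS::mvrnorm to simulate two trivariate populations with arbitrary sizes and means:

  library(MASS)   # for mvrnorm
  set.seed(7200)
  p <- 3; n1 <- 10; n2 <- 15
  Sigma <- diag(p)
  Y_sim <- rbind(mvrnorm(n1, mu = c(0, 0, 0), Sigma = Sigma),
                 mvrnorm(n2, mu = c(1, 0, 0), Sigma = Sigma))
  grp <- rep(1:2, times = c(n1, n2))
  Ybar <- colMeans(Y_sim)
  Tot <- crossprod(sweep(Y_sim, 2, Ybar))   # sum of outer products around the overall mean
  W <- matrix(0, p, p); B <- matrix(0, p, p)
  for (l in 1:2) {
    Yl <- Y_sim[grp == l, , drop = FALSE]
    Ybar_l <- colMeans(Yl)
    W <- W + crossprod(sweep(Yl, 2, Ybar_l))       # within-group outer products
    B <- B + nrow(Yl) * tcrossprod(Ybar_l - Ybar)  # between-group outer products
  }
  all.equal(Tot, B + W)   # TRUE up to floating-point error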

  11. Multivariate ANOVA iii

  • Between sum of squares and cross products matrix:
    $$B = \sum_{\ell=1}^g n_\ell (\bar{Y}_\ell - \bar{Y})(\bar{Y}_\ell - \bar{Y})^T.$$
  • Within sum of squares and cross products matrix:
    $$W = \sum_{\ell=1}^g \sum_{i=1}^{n_\ell} (Y_{\ell i} - \bar{Y}_\ell)(Y_{\ell i} - \bar{Y}_\ell)^T.$$
  • Note that $W = \sum_{\ell=1}^g (n_\ell - 1) S_\ell$, and therefore $W \sim W_p(n - g, \Sigma)$.
  • Moreover, using Cochran's theorem, we can show that $W$ and $B$ are independent, and that under the null hypothesis that $\tau_\ell = 0$ for all $\ell = 1, \ldots, g$, we also have $B \sim W_p(g - 1, \Sigma)$.
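
  To build some intuition for the Wishart claim, here is a Monte Carlo sketch (my addition, not from the slides): since $E[W_p(m, \Sigma)] = m\Sigma$, the average of $W$ over repeated datasets should be close to $(n - g)\Sigma$:

  # Monte Carlo check that E[W] = (n - g) * Sigma
  set.seed(1)
  p <- 3; n_ell <- c(10, 15); g <- length(n_ell); n <- sum(n_ell)
  Sigma <- diag(p)
  grp <- rep(seq_len(g), times = n_ell)
  W_mean <- matrix(0, p, p)
  n_sim <- 2000
  for (s in seq_len(n_sim)) {
    Y_sim <- MASS::mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
    W <- Reduce(`+`, lapply(seq_len(g), function(l) {
      (n_ell[l] - 1) * cov(Y_sim[grp == l, , drop = FALSE])
    }))
    W_mean <- W_mean + W / n_sim
  }
  round(W_mean, 2)   # should be close to (n - g) * Sigma = 23 * I_3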

  12. Multivariate ANOVA iv

  • Similarly as above, we have a MANOVA table:

    Source of Variation   Sum of Squares   Degrees of freedom
    Model                 B                g - 1
    Residual              W                n - g
    Total                 B + W            n - 1

  13. Likelihood Ratio Test i

  • To test the null hypothesis $H_0: \tau_\ell = 0$ for all $\ell = 1, \ldots, g$, we will use Wilks' lambda as our test statistic:
    $$\Lambda^{2/n} = \frac{|W|}{|B + W|}.$$
  • As the notation suggests, this is the likelihood ratio test statistic.
  • Under the unrestricted model (i.e. no constraint on the means), each mean parameter is maximised independently, and the maximum likelihood estimator for the covariance matrix is the pooled covariance:
    $$\hat{\mu}_\ell = \bar{Y}_\ell, \quad \hat{\Sigma} = \frac{1}{n} W.$$

  14. Likelihood Ratio Test ii

  • Under the null model (i.e. all means are equal), all observations $Y_{\ell i}$ come from a unique distribution $N_p(\mu, \Sigma)$, and so the maximum likelihood estimators are
    $$\hat{\mu} = \bar{Y}, \quad \hat{\Sigma} = \frac{1}{n}(B + W).$$
  • Putting this together, we get
    $$\Lambda = \frac{(2\pi)^{-np/2} \exp(-np/2) \left|\tfrac{1}{n}(B + W)\right|^{-n/2}}{(2\pi)^{-np/2} \exp(-np/2) \left|\tfrac{1}{n} W\right|^{-n/2}} = \frac{\left|\tfrac{1}{n}(B + W)\right|^{-n/2}}{\left|\tfrac{1}{n} W\right|^{-n/2}} = \left(\frac{|W|}{|B + W|}\right)^{n/2}.$$

  15. Likelihood Ratio Test iii

  • From the general asymptotic theory, we know that
    $$-2 \log \Lambda \approx \chi^2((g - 1)p).$$
  • Using Bartlett's approximation, we can get an unbiased test:
    $$-\left(n - 1 - \tfrac{1}{2}(p + g)\right) \log \Lambda \approx \chi^2((g - 1)p).$$
  • In particular, if we let $c = \chi^2_\alpha((g - 1)p)$ be the critical value, we reject the null hypothesis if
    $$\Lambda \leq \exp\left(\frac{-c}{n - 1 - 0.5(p + g)}\right).$$
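
  As a quick R illustration of this rejection rule (a sketch, not from the slides; n, p, g, and α = 0.05 are chosen to match the upcoming example):

  # Critical value for Lambda under Bartlett's approximation
  n <- 20; p <- 3; g <- 2; alpha <- 0.05
  c_crit <- qchisq(1 - alpha, df = (g - 1) * p)
  (Lambda_crit <- exp(-c_crit / (n - 1 - 0.5 * (p + g))))
  # Reject H0 whenever the observed Lambda falls at or below this value (about 0.62)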

  16. Example i

  ## Example on producing plastic film
  ## from Krzanowski (1998, p. 381)
  tear <- c(6.5, 6.2, 5.8, 6.5, 6.5, 6.9, 7.2, 6.9, 6.1, 6.3,
            6.7, 6.6, 7.2, 7.1, 6.8, 7.1, 7.0, 7.2, 7.5, 7.6)
  gloss <- c(9.5, 9.9, 9.6, 9.6, 9.2, 9.1, 10.0, 9.9, 9.5, 9.4,
             9.1, 9.3, 8.3, 8.4, 8.5, 9.2, 8.8, 9.7, 10.1, 9.2)
  opacity <- c(4.4, 6.4, 3.0, 4.1, 0.8, 5.7, 2.0, 3.9, 1.9, 5.7,
               2.8, 4.1, 3.8, 1.6, 3.4, 8.4, 5.2, 6.9, 2.7, 1.9)

  17. Example ii

  Y <- cbind(tear, gloss, opacity)
  Y_low <- Y[1:10, ]
  Y_high <- Y[11:20, ]
  n <- nrow(Y); p <- ncol(Y); g <- 2

  W <- (nrow(Y_low) - 1) * cov(Y_low) +
    (nrow(Y_high) - 1) * cov(Y_high)
  B <- (n - 1) * cov(Y) - W

  (Lambda <- det(W) / det(W + B))

  ## [1] 0.4136192

  18. Example iii

  transf_lambda <- -(n - 1 - 0.5 * (p + g)) * log(Lambda)
  transf_lambda > qchisq(0.95, p * (g - 1))

  ## [1] TRUE

  # Or if you want a p-value
  pchisq(transf_lambda, p * (g - 1), lower.tail = FALSE)

  ## [1] 0.002227356

  19. Example iv

  # R has a function for MANOVA
  # But first, create factor variable
  rate <- gl(g, 10, labels = c("Low", "High"))
  fit <- manova(Y ~ rate)
  summary_tbl <- broom::tidy(fit, test = "Wilks")
  # Or you can use the summary function
  knitr::kable(summary_tbl, digits = 3)
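
  The summary function mentioned in the comment can be used directly; a minimal sketch (its output layout differs slightly from the broom::tidy table):

  summary(fit, test = "Wilks")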

  20. Example v

  term        df   wilks   statistic   num.df   den.df   p.value
  rate         1   0.414       7.561        3       16     0.002
  Residuals   18       -           -        -        -         -

  21. Example vi

  # Check residuals for evidence of normality
  library(tidyverse)

  resids <- residuals(fit)
  data_plot <- gather(as.data.frame(resids),
                      variable, residual)

  ggplot(data_plot, aes(sample = residual)) +
    stat_qq() + stat_qq_line() +
    facet_grid(. ~ variable) +
    theme_minimal()

  22. Example vii

  [Figure: Normal Q-Q plots of the residuals, one panel per variable (gloss, opacity, tear), with sample quantiles plotted against theoretical quantiles.]

  23. Example viii

  # Next: Chi-squared plot
  Sn <- cov(resids)
  dists <- mahalanobis(resids, colMeans(resids), Sn)
  df <- mean(dists)

  qqplot(qchisq(ppoints(dists), df = df),
         dists, xlab = "Theoretical Quantiles")
  qqline(dists, distribution = function(p) {
    qchisq(p, df = df)
  })
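
  A note on the df <- mean(dists) line (my reading; the slide does not explain it): under multivariate normality the squared Mahalanobis distances are approximately $\chi^2_p$ distributed, and since $E[\chi^2_p] = p$, the average distance estimates the degrees of freedom. A quick check on the example data:

  # mean(dists) equals (n - 1) * p / n = 2.85 here, close to p = 3
  c(mean_dist = mean(dists), p = ncol(resids))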

  24. Example ix

  [Figure: Chi-squared Q-Q plot of the squared Mahalanobis distances (dists, roughly 0 to 7) against theoretical chi-squared quantiles.]

  25. Comments i

  • The output from R shows a different approximation to the distribution of Wilks' lambda, due to Rao.
  • There are actually 4 tests available in R:
    • Wilks' lambda;
    • Pillai–Bartlett;
    • Hotelling–Lawley;
    • Roy's largest root.
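
  Continuing with the fit object from the example, each of the four statistics can be requested through summary()'s test argument; a minimal sketch:

  # Request each of the four classical test statistics in turn
  for (test in c("Wilks", "Pillai", "Hotelling-Lawley", "Roy")) {
    print(summary(fit, test = test))
  }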

  26. Comments ii

  • Since we only had two groups in the above example, we were only comparing two means.
    • Wilks' lambda was therefore equivalent to Hotelling's $T^2$.
  • But of course MANOVA is much more general.
  • We can assess the normality assumption by looking at the residuals $\hat{E}_{\ell i} = Y_{\ell i} - \bar{Y}_\ell$.
