Analysis of multivariate data depending on several factors: ANOVA-PLS




SLIDE 1

Analysis of multivariate data depending on several factors: ANOVA-PLS

A. El Ghaziri (1), E.M. Qannari (1), T. Moyon (2), M.-C. Alexandre-Gouabau (2)

(1) ONIRIS, Sensometrics and Chemometrics unit, Nantes
(2) INRA, Physiologie des Adaptations Nutritionnelles (PhAN), Nantes

SLIDE 2

Analysis of multivariate data depending on several factors Existing methods

ASCA APCA

ANOVA-PLS

Particular case

Comparison Benefits ANOVA-PLS Application

Factor Gestation Factor Lactation Interaction

Conclusion References

Metabolomics data and two factors

Context

Several metabolites (variables) measured on animals (rat pups) according to an experimental design involving two factors:

  • Gestation: nutritional protein restriction on the mothers during pregnancy ⇒ levels: YES/NO
  • Lactation: nutritional protein restriction on the mothers during lactation ⇒ levels: YES/NO

Aim

Study the effect of the two nutritional periods (fetal and post-natal) on the growth of the rat pups, as reflected by the metabolites.

SLIDE 4

Outline

1 Existing methods
    ASCA
    ANOVA-PCA
2 New method: ANOVA-PLS
3 Comparison of methods
4 Benefits of ANOVA-PLS
5 Application to metabolomics data
    Factor Gestation
    Factor Lactation
    Interaction
6 Conclusion

SLIDE 5

Existing methods

  • ANOVA on each metabolite
  • Multivariate-ANOVA (MANOVA)

  • ANOVA-Simultaneous Component Analysis (ASCA, Smilde et al. (2005))
  • ANOVA-PCA (APCA, Harrington et al. (2005))

SLIDE 7

Decomposition

  • For each variable (i.e. metabolite), the two-way ANOVA decomposition is: $x_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$
  • $\mu$ is the overall mean
  • $\alpha_i$ is the effect due to level $i$ of the first factor
  • $\beta_j$ is the effect due to level $j$ of the second factor
  • $\gamma_{ij}$ is the effect due to the interaction between the two factors
  • $\epsilon_{ijk}$ is the residual of observation $k$ in cell $(i, j)$

Extension to multivariate data for 2 factors G, L with interaction: $X = \bar{X} + X_G + X_L + X_{GL} + E$
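
To make the multivariate decomposition concrete, here is a minimal pure-Python sketch for a balanced two-factor design. The function name `anova_decompose`, the toy data and the level codes are illustrative assumptions, not taken from the talk. Each effect matrix simply repeats the relevant level means on the rows of the corresponding samples.

```python
from statistics import fmean


def column_means(rows):
    """Mean of each column of a list of equal-length rows."""
    return [fmean(col) for col in zip(*rows)]


def anova_decompose(X, g, l):
    """Return (XG, XL, XGL, E) such that Xc = XG + XL + XGL + E,
    where Xc is the column-centered version of X.

    Assumes a balanced design. X is a list of rows (samples x variables);
    g and l give the level of each factor for every sample.
    """
    grand = column_means(X)
    Xc = [[x - m for x, m in zip(row, grand)] for row in X]

    def level_means(labels):
        # Mean centered row for each distinct label.
        return {lab: column_means([r for r, k in zip(Xc, labels) if k == lab])
                for lab in set(labels)}

    cells = list(zip(g, l))
    mg, ml, mgl = level_means(g), level_means(l), level_means(cells)

    XG = [list(mg[a]) for a in g]
    XL = [list(ml[b]) for b in l]
    # Interaction: cell mean minus both main effects.
    XGL = [[c - p - q for c, p, q in zip(mgl[(a, b)], mg[a], ml[b])]
           for a, b in cells]
    # Residuals: what is left after removing all effects.
    E = [[x - p - q - r for x, p, q, r in zip(*rows)]
         for rows in zip(Xc, XG, XL, XGL)]
    return XG, XL, XGL, E


# Toy data (hypothetical): 8 samples, 2 variables, 2 samples per cell.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 4.0],
     [1.5, 2.5], [2.5, 0.5], [3.5, 5.5], [4.5, 3.0]]
g = ["R", "R", "C", "C", "R", "R", "C", "C"]
l = ["R", "C", "R", "C", "R", "C", "R", "C"]
XG, XL, XGL, E = anova_decompose(X, g, l)
```

With unbalanced designs the effect matrices are no longer mutually orthogonal, so this simple averaging scheme would need to be replaced by a proper least-squares fit.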

In the following, we assume that the data matrix $X$ is centered by column, i.e. $\bar{X} = 0$.

SLIDE 9

ANOVA-Simultaneous Component Analysis

$X = X_G + X_L + X_{GL} + E$, with a separate PCA performed on the effect matrices (PCA of $X_G$, PCA of $X_L$).

For each effect, maximization of the between-group variance.

Limitation

Lack of tools to assess the significance of the effects. To cope with this difficulty, permutation tests were proposed (Vis et al., 2007; Zwanenburg et al., 2011)
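
As an illustration of the permutation idea, the sketch below shuffles the labels of a single factor and compares the sum of squares of the resulting effect matrix with the observed one. This is a simplified, unrestricted scheme written for this transcript; the function names and toy data are assumptions, and the cited papers should be consulted for the schemes actually proposed for multi-factor designs.

```python
import random
from statistics import fmean


def effect_ssq(X, labels):
    """Sum of squares of the effect matrix (group means minus grand mean)."""
    grand = [fmean(c) for c in zip(*X)]
    ssq = 0.0
    for lab in sorted(set(labels)):
        rows = [r for r, k in zip(X, labels) if k == lab]
        means = [fmean(c) for c in zip(*rows)]
        # Every sample of the group contributes the same squared deviation.
        ssq += len(rows) * sum((m - g) ** 2 for m, g in zip(means, grand))
    return ssq


def permutation_p_value(X, labels, n_perm=999, seed=0):
    """Share of label permutations whose effect is at least as large as observed."""
    rng = random.Random(seed)
    observed = effect_ssq(X, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if effect_ssq(X, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself as one permutation.
    return (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): two clearly separated groups of 6 samples.
X = [[0.1, 1.0], [0.3, 1.2], [0.2, 0.9], [0.0, 1.1], [0.4, 1.3], [0.2, 1.0],
     [2.1, 3.0], [2.3, 3.2], [2.2, 2.9], [2.0, 3.1], [2.4, 3.3], [2.2, 3.0]]
labels = ["C"] * 6 + ["R"] * 6
p_value = permutation_p_value(X, labels)
```

A small `p_value` suggests that the observed between-group variation is unlikely to arise from a random assignment of the labels.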

SLIDE 10

ANOVA-PCA (APCA)

  • $Z_G = X_G + E$, followed by PCA($Z_G$)
  • $Z_L = X_L + E$, followed by PCA($Z_L$)
  • $Z_{GL} = X_{GL} + E$, followed by PCA($Z_{GL}$)

The rationale behind this strategy is that if a factor is significant, it is likely to emerge on the first principal components.

Problem

The noise may dominate the first component. Proposed remedy: gradual reduction of the residual variance combined with a permutation test (Climaco-Pinto et al., 2009).

SLIDE 11

New idea: ANOVA-PLS

  • $Z_G = X_G + E$, followed by PLS($X_G \sim Z_G$)
  • $Z_L = X_L + E$, followed by PLS($X_L \sim Z_L$)
  • $Z_{GL} = X_{GL} + E$, followed by PLS($X_{GL} \sim Z_{GL}$)

Rationale

If the factor has a significant effect, it will emerge; otherwise it will be “diluted” in the noise and the regression model will be irrelevant.
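
The first ANOVA-PLS component for the factor Gestation can be sketched as follows, in pure Python. This is only an illustration under stated assumptions: a balanced design, a NIPALS-style iteration chosen here for simplicity (the talk does not say which PLS algorithm was used), and hypothetical helper names and toy data.

```python
from math import sqrt
from statistics import fmean


def center(X):
    means = [fmean(c) for c in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]


def effect_matrix(Xc, labels):
    """Replace each row of the centered matrix Xc by the mean row of its group."""
    means = {lab: [fmean(c) for c in zip(*[r for r, k in zip(Xc, labels) if k == lab])]
             for lab in set(labels)}
    return [list(means[k]) for k in labels]


def unit(v):
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def times(A, v):
    """Matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def t_times(A, v):
    """Transposed product A^T v."""
    return [sum(a * b for a, b in zip(col, v)) for col in zip(*A)]


def first_pls_component(Z, Y, n_iter=50):
    """First PLS component of Y ~ Z: weights w and scores t = Z w.

    At convergence w maximizes cov^2(Z w, Y c) over unit-norm w and c.
    """
    # Start from the response column with the largest sum of squares.
    u = max(zip(*Y), key=lambda col: sum(x * x for x in col))
    for _ in range(n_iter):
        w = unit(t_times(Z, u))
        t = times(Z, w)
        c = unit(t_times(Y, t))
        u = times(Y, c)
    return w, t


# Toy data (hypothetical): 8 samples, 3 variables, 2 samples per cell.
X = [[1.1, 2.0, 0.1], [0.9, 2.2, -0.1], [1.0, -2.1, 0.2], [1.2, -1.9, 0.0],
     [-1.0, 2.1, 0.0], [-1.2, 1.9, 0.1], [-0.9, -2.0, -0.2], [-1.1, -2.2, -0.1]]
g = ["R", "R", "R", "R", "C", "C", "C", "C"]
l = ["R", "R", "C", "C", "R", "R", "C", "C"]

Xc = center(X)
XG = effect_matrix(Xc, g)
cell = effect_matrix(Xc, list(zip(g, l)))
# Z_G = X_G + E, where E = Xc - cell means in a balanced design.
ZG = [[x - c + a for x, c, a in zip(xr, cr, ar)]
      for xr, cr, ar in zip(Xc, cell, XG)]
w, t = first_pls_component(ZG, XG)
```

In this toy example the scores `t` take opposite signs for the two levels of Gestation, which is the "emergence" of the factor described in the rationale.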

SLIDE 12

The case of one factor: F

$X = X_F + E$

  • PLS($X_F \sim X$) ⇒ PLS-DA (Kemsley, 1996; Barker & Rayens, 2003; Nocairi et al., 2005)

Thus, our approach appears as an extension of PLS-DA to the case of several factors.

SLIDE 13

Comparison of methods

ASCA: PCA($X_G$)

$\max \mathrm{var}(u)$ with $u = X_G \nu$

APCA: PCA($X_G + E$)

$\max \mathrm{var}(t)$ with $t = (X_G + E)\omega$

ANOVA-PLS: PLS($X_G \sim (X_G + E)$)

$\max \mathrm{cov}^2(u, t) = \mathrm{var}(u)\,\mathrm{var}(t)\,\mathrm{cor}^2(u, t)$

SLIDE 14

Benefits of the ANOVA-PLS

Tools are available to highlight the relevance of the model (and the significance of the effects):

  • PLS principal components (scores)
  • RMSEP (Root Mean Square Error of Prediction)
  • $Q^2$ (index used in cross-validation to assess the significance of a new component)

  • VIP: Variable Importance in the Projection
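
The three numerical diagnostics listed above can be sketched in a few lines of pure Python. The definitions below are common textbook forms and are assumptions as far as this talk is concerned: in particular, $Q^2$ is written here as 1 − PRESS / total sum of squares, whereas the per-component variant divides by the residual sum of squares of the previous model. Function and argument names are illustrative.

```python
from math import sqrt
from statistics import fmean


def rmsep(y_true, y_pred):
    """Root Mean Square Error of Prediction."""
    return sqrt(fmean([(y - p) ** 2 for y, p in zip(y_true, y_pred)]))


def q2(y_true, y_cv_pred):
    """Q2 = 1 - PRESS / total sum of squares, from cross-validated predictions."""
    mean_y = fmean(y_true)
    press = sum((y - p) ** 2 for y, p in zip(y_true, y_cv_pred))
    total = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - press / total


def vip(weights, ss_explained):
    """Variable Importance in the Projection for each of the p variables.

    weights: one weight vector of length p per PLS component.
    ss_explained: response sum of squares explained by each component.
    """
    p = len(weights[0])
    total = sum(ss_explained)
    scores = []
    for j in range(p):
        # Squared normalized weight of variable j, weighted by explained SS.
        acc = sum(ss * w[j] ** 2 / sum(x * x for x in w)
                  for w, ss in zip(weights, ss_explained))
        scores.append(sqrt(p * acc / total))
    return scores


# Hypothetical numbers, only to show the calls.
error = rmsep([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
quality = q2([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
importance = vip([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
```

With this definition the squared VIP values sum to the number of variables, which is why a threshold around 1 is often used to flag influential variables.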

SLIDE 15

Application to metabolomics data

Experimental design on rat pups, covering the gestation and lactation stages:

Gestation                                               Lactation                     Group
Protein-Restricted dams (8 g of protein/100 g of food)  8 % Protein-Restricted dams   RR
                                                        20 % Protein-Control dams     RC
Protein-Control dams (20 g of protein/100 g of food)    8 % Protein-Restricted dams   CR
                                                        20 % Protein-Control dams     CC

Two factors + interaction

  • Factor Gestation (G), 2 levels: first letter R or first letter C
  • Factor Lactation (L), 2 levels: second letter R or second letter C

SLIDE 16

Factor Gestation

PLS($X_G \sim X_G + E$)

Cumulative percentage of variation

          $X_G$   $X_G + E$
comp 1    17.8    10.7
comp 2    36.4    20.4
comp 3    55.5    25.0
comp 4    70.4    27.8
comp 5    81.0    30.4
comp 6    87.8    32.7
comp 7    91.3    35.6
comp 8    93.0    40.7
comp 9    94.8    43.3

Percentage correctly classified by cross validation (LOO)

[Plot: percentage correctly classified (y-axis) against component index (x-axis)]

7 components

SLIDE 17

Factor Gestation

7 components were retained and submitted to Linear Discriminant Analysis (LDA).

[Boxplot of the discriminant component of LDA: LDA scores for groups GC and GR]

⇒ Discrimination between the two levels of the factor Gestation on LDA component

SLIDE 18

Factor Lactation

PLS($X_L \sim X_L + E$)

Cumulative percentage of variation

          $X_L$   $X_L + E$
comp 1    47.7     8.0
comp 2    60.8    21.1
comp 3    75.1    26.1
comp 4    84.0    29.6
comp 5    89.9    32.1
comp 6    92.8    34.7
comp 7    94.2    39.2

Percentage correctly classified by cross validation (LOO)

[Plot: percentage correctly classified (y-axis) against component index (x-axis)]

5 components

SLIDE 19

Factor Lactation

5 components were retained and submitted to LDA.

[Boxplot of the discriminant component of LDA: LDA scores for groups LC and LR]

⇒ Discrimination between the two levels of the factor Lactation on LDA component

SLIDE 20

Interaction

PLS($X_{GL} \sim X_{GL} + E$)

Cumulative percentage of variation

          $X_{GL}$   $X_{GL} + E$
comp 1    16.1       11.5
comp 2    32.4       20.6
comp 3    47.7       25.0
comp 4    56.9       28.6
comp 5    64.3       31.1
comp 6    68.4       34.7
comp 7    71.3       39.0
comp 8    73.5       41.8
comp 9    75.7       43.9

7 components retained and submitted to LDA.

[Scatter plot: LDA comp.1 against LDA comp.2, with groups CC, CR, RC and RR]

⇒ Discrimination on the first LDA component between levels {CC, RR} and {RC, CR}.
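
One way to see why the interaction opposes {CC, RR} to {RC, CR}: assuming the usual sum-to-zero constraints on the interaction terms of a balanced 2 × 2 design (the slides do not state these constraints explicitly, so this is an assumption), the four interaction terms reduce to a single free vector.

```latex
\[
\gamma_{CC} + \gamma_{CR} = 0, \qquad
\gamma_{RC} + \gamma_{RR} = 0, \qquad
\gamma_{CC} + \gamma_{RC} = 0
\;\Longrightarrow\;
\gamma_{CC} = \gamma_{RR} = -\gamma_{CR} = -\gamma_{RC}
\]
```

The rows of $X_{GL}$ therefore take only two opposite values, one shared by the CC and RR samples and the other by the RC and CR samples, which is consistent with the grouping observed on the first LDA component.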

SLIDE 21

Conclusion

A new method, ANOVA-PLS, is introduced to study multivariate data depending on several factors:

  → an extension of PLS-DA to the case of several factors
  → a compromise between the two existing methods ASCA and ANOVA-PCA
  → diagnostic tools available through the use of PLS regression
  → better prediction ability than ASCA and ANOVA-PCA.

SLIDE 22

References

Barker, M., & Rayens, W. (2003). Partial least squares for discrimination. Journal of Chemometrics, 17, 166–173.

Climaco-Pinto, R., Barros, A. S., Locquet, N., Schmidtke, L., & Rutledge, D. N. (2009). Improving the detection of significant factors using ANOVA-PCA by selective reduction of residual variability. Analytica Chimica Acta, 653, 131–142.

Harrington, P. d. B., Vieira, N. E., Espinoza, J., Nien, J. K., Romero, R., & Yergey, A. L. (2005). Analysis of variance-principal component analysis: A soft tool for proteomic discovery. Analytica Chimica Acta, 544, 118–127.

Kemsley, E. K. (1996). Discriminant analysis of high-dimensional data: a comparison of principal components analysis and partial least squares data reduction methods. Chemometrics and Intelligent Laboratory Systems, 33, 47–61.

Nocairi, H., Qannari, E. M., Vigneau, E., & Bertrand, D. (2005). Discrimination on latent components with respect to patterns. Application to multicollinear data. Computational Statistics and Data Analysis, 48, 139–147.

Smilde, A., Jansen, J., Hoefsloot, H., Lamers, R.-J., Greef, J., & Timmerman, M. (2005). ANOVA-simultaneous component analysis (ASCA): a new tool for analyzing designed metabolomics data. Bioinformatics, 21, 3043–3048.

Vis, D., Westerhuis, J., Smilde, A. K., & van der Greef, J. (2007). Statistical validation of megavariate effects in ASCA. BMC Bioinformatics, 8, 322.

Zwanenburg, G., Hoefsloot, H. C. J., Westerhuis, J. A., Jansen, J. J., & Smilde, A. K. (2011). ANOVA-principal component analysis and ANOVA-simultaneous component analysis: a comparison. Journal of Chemometrics, 25, 561–567.

SLIDE 23

Thank you!

SLIDE 24

Comparison between methods

We compare the ability of the three methods (ASCA, ANOVA-PCA and ANOVA-PLS) to predict a new sample, using leave-one-out cross validation. The comparison is shown for the prediction of the factor Lactation.

[Plot: percentage correctly classified (y-axis) against component index (x-axis), with one curve each for ANOVA-PLS, ANOVA-PCA and ASCA]

Better prediction for ANOVA-PLS.
