  1. On Mixtures of Factor Mixture Analyzers
     Cinzia Viroli, cinzia.viroli@unibo.it
     Department of Statistics, University of Bologna, Italy
     Compstat 2010, Paris, August 22-27 – slide 1

  2. State of the art (1)
     ■ In model based clustering the data are assumed to come from a finite mixture model (McLachlan and Peel, 2000), with each component corresponding to a cluster.
     ■ For quantitative data each mixture component is usually modelled as a multivariate Gaussian distribution (Fraley and Raftery, 2002):

         f(y; θ) = Σ_{i=1}^{k} w_i φ^(p)(y; µ_i, Σ_i)

     ■ However, when the number of observed variables is large, it is well known that Gaussian mixture models represent an over-parameterized solution.
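The mixture density on this slide can be evaluated directly; a minimal sketch in Python, with purely illustrative parameters (the helper names `dmvnorm` and `mixture_density` are my own, not from the talk):

```python
import numpy as np

def dmvnorm(y, mu, Sigma):
    """Multivariate Gaussian density phi^(p)(y; mu, Sigma)."""
    p = len(mu)
    diff = y - mu
    quad = diff @ np.linalg.solve(Sigma, diff)        # (y-mu)^T Sigma^{-1} (y-mu)
    norm = np.sqrt((2 * np.pi) ** p * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

def mixture_density(y, weights, means, covs):
    """f(y; theta) = sum_i w_i * phi^(p)(y; mu_i, Sigma_i)."""
    return sum(w * dmvnorm(y, m, S) for w, m, S in zip(weights, means, covs))

# Two clusters in p = 2 dimensions (illustrative values only).
weights = [0.6, 0.4]
means = [np.zeros(2), np.array([3.0, 3.0])]
covs = [np.eye(2), 2.0 * np.eye(2)]

print(mixture_density(np.zeros(2), weights, means, covs))
```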

  3. State of the art (2)
     Some solutions (among others):

     Model based clustering:
     ■ Banfield and Raftery (1993): proposed a parameterization of the generic component-covariance matrix based on its spectral decomposition: Σ_i = λ_i A_i^⊤ D_i A_i
     ■ Bouveyron et al. (2007): proposed a different parameterization of the generic component-covariance matrix.

     Dimensionally reduced model based clustering:
     ■ Ghahramani and Hinton (1997) and McLachlan et al. (2003): Mixtures of Factor Analyzers (MFA)
     ■ Yoshida et al. (2004), Baek and McLachlan (2008), Montanari and Viroli (2010): Factor Mixture Analysis (FMA)
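The spectral parameterization factors each component covariance into a volume (λ_i), an orientation, and a shape. A sketch of how such a covariance is assembled, assuming a 2-D rotation as the orthogonal orientation matrix and a diagonal shape matrix (function name and parameter values are illustrative, not from the talk):

```python
import numpy as np

def spectral_cov(lam, angle, shape_diag):
    """Build Sigma = lambda * A^T D A, where A is an orthogonal
    (rotation) matrix giving the orientation, D is a diagonal matrix
    giving the shape, and lambda controls the volume."""
    c, s = np.cos(angle), np.sin(angle)
    A = np.array([[c, -s], [s, c]])   # orientation (orthogonal)
    D = np.diag(shape_diag)           # shape (diagonal)
    return lam * A.T @ D @ A

# Volume 2, tilted 30 degrees, elongated 4:1 (illustrative values).
Sigma = spectral_cov(lam=2.0, angle=np.pi / 6, shape_diag=[4.0, 1.0])
```

Because A is orthogonal, the eigenvalues of Σ are simply λ times the shape entries, which is what makes constraining volume, shape, and orientation across components straightforward.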

  4. Mixture of factor analyzers (MFA)
     ■ Dimensionality reduction is performed through k factor models with Gaussian factors.
     ■ The distribution of each observation is modelled, with probability π_j (j = 1, ..., k), according to an ordinary factor analysis model

         y = η_j + Λ_j z + e_j,   with e_j ∼ φ^(p)(0, Ψ_j),

       where Ψ_j is a diagonal matrix and z ∼ φ^(q)(0, I_q).
     ■ In the observed space we obtain a finite mixture of multivariate Gaussians with heteroscedastic components:

         f(y) = Σ_{j=1}^{k} π_j φ^(p)(η_j, Λ_j Λ_j^⊤ + Ψ_j)
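The MFA generative process can be sketched directly: draw a component j, then latent factors and noise, then map to the observed space. The code below is an illustrative sampler with made-up parameters (all names are my own); `component_cov` returns the marginal covariance Λ_j Λ_j^⊤ + Ψ_j stated on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mfa(n, pis, etas, Lambdas, Psis):
    """Draw n observations from a mixture of factor analyzers:
    pick component j with prob pi_j, then y = eta_j + Lambda_j z + e_j
    with z ~ N(0, I_q) and e_j ~ N(0, Psi_j), Psi_j diagonal."""
    p = len(etas[0])
    ys = np.empty((n, p))
    for t in range(n):
        j = rng.choice(len(pis), p=pis)
        q = Lambdas[j].shape[1]
        z = rng.standard_normal(q)
        e = rng.normal(0.0, np.sqrt(np.diag(Psis[j])))
        ys[t] = etas[j] + Lambdas[j] @ z + e
    return ys

def component_cov(Lambda, Psi):
    """Marginal covariance of one component: Lambda Lambda^T + Psi."""
    return Lambda @ Lambda.T + Psi

# Illustrative setup: k = 2 components, p = 3 variables, q = 1 factor.
pis = [0.5, 0.5]
etas = [np.zeros(3), np.full(3, 5.0)]
Lambdas = [np.ones((3, 1)), np.ones((3, 1))]
Psis = [np.eye(3), np.eye(3)]
ys = sample_mfa(200, pis, etas, Lambdas, Psis)
```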

  5. Factor Mixture Analysis (FMA)
     ■ Dimensionality reduction is performed through a single factor model with factors modelled by a multivariate Gaussian mixture.
     ■ The observed centred data are described as y = Λ z + e, with e ∼ φ^(p)(0, Ψ), where Ψ is diagonal.
     ■ The q factors are assumed to be standardized and are modelled as a finite mixture of multivariate Gaussians:

         f(z) = Σ_{i=1}^{k} γ_i φ^(q)(µ_i, Σ_i).

     ■ In the observed space we obtain a finite mixture of multivariate Gaussians with heteroscedastic components:

         f(y) = Σ_{i=1}^{k} γ_i φ^(p)(Λ µ_i, Λ Σ_i Λ^⊤ + Ψ).
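The mapping from the latent mixture to the observed-space mixture is a linear push-forward: each component keeps its weight, its mean becomes Λµ_i, and its covariance becomes ΛΣ_iΛ^⊤ + Ψ. A small sketch with illustrative parameters (function and variable names are my own):

```python
import numpy as np

def fma_observed_params(Lambda, Psi, gammas, mus, Sigmas):
    """Map the FMA latent mixture to the observed-space mixture:
    component i keeps weight gamma_i, with mean Lambda mu_i and
    covariance Lambda Sigma_i Lambda^T + Psi."""
    means = [Lambda @ mu for mu in mus]
    covs = [Lambda @ S @ Lambda.T + Psi for S in Sigmas]
    return gammas, means, covs

# Illustrative setup: p = 3 observed variables, q = 2 factors.
Lambda = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Psi = 0.5 * np.eye(3)
gammas, means, covs = fma_observed_params(
    Lambda, Psi, [0.5, 0.5],
    [np.zeros(2), np.ones(2)],
    [np.eye(2), np.eye(2)])
```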

  6. MFA vs FMA
     MFA:
     ■ k factor models with q Gaussian factors;
     ■ the number of clusters corresponds to the number of factor models ⇒ 'local' dimension reduction within each group;
     ■ a flexible solution with fewer parameters than model based clustering.

     FMA:
     ■ one factor model with q non-Gaussian factors (distributed as a multivariate mixture of Gaussians);
     ■ the number of clusters is defined by the number of components of the Gaussian mixture ⇒ 'global' dimension reduction, and clustering is performed in the latent space;
     ■ a flexible solution with fewer parameters than model based clustering.
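The "fewer parameters" claim can be made concrete with a rough free-parameter tally. The sketch below counts weights, means, and covariance parameters for a full Gaussian mixture versus MFA; it deliberately ignores rotational-identifiability corrections for the loadings, so the MFA count is slightly generous, and the function names are my own:

```python
def gmm_params(k, p):
    """Free parameters of a k-component Gaussian mixture in p dims:
    (k-1) weights + k*p means + k full covariances."""
    return (k - 1) + k * p + k * p * (p + 1) // 2

def mfa_params(k, p, q):
    """Rough count for a mixture of k factor analyzers with q factors:
    (k-1) weights + k*p means + k loading matrices (p*q) +
    k diagonal uniqueness matrices (p); identifiability ignored."""
    return (k - 1) + k * p + k * p * q + k * p

# e.g. p = 50 variables, k = 4 clusters, q = 3 factors
print(gmm_params(4, 50), mfa_params(4, 50, 3))  # prints 5303 1003
```

With 50 variables the full-covariance mixture needs over five thousand parameters while the factor-analyzer version needs about a thousand, which is the over-parameterization problem motivating both approaches.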

  7. Mixtures of Factor Mixture Analyzers

  8. The model
     We assume the data can be described by k_1 factor models, with probability π_j (j = 1, ..., k_1):

         y = η_j + Λ_j z + e_j.   (1)

     Within all the factor models, the factors are assumed to be distributed according to a finite mixture of k_2 Gaussians:

         f(z) = Σ_{i=1}^{k_2} γ_i φ^(q)(µ_i, Σ_i),   (2)

     with mixture parameters supposed to be equal across the factor models j = 1, ..., k_1.
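Equations (1) and (2) define a two-index generative process: pick one of the k_1 factor models, pick one of the k_2 shared latent Gaussian components for the factors, then map to the observed space. A sampling sketch with illustrative parameters (all names are my own, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_mfma(n, pis, etas, Lambdas, Psis, gammas, mus, Sigmas):
    """Draw from a mixture of factor mixture analyzers:
    pick factor model j ~ pi (eq. 1), then latent component i ~ gamma
    for the factors (eq. 2), then y = eta_j + Lambda_j z + e_j."""
    p = len(etas[0])
    ys = np.empty((n, p))
    for t in range(n):
        j = rng.choice(len(pis), p=pis)          # which factor model
        i = rng.choice(len(gammas), p=gammas)    # which latent component
        z = rng.multivariate_normal(mus[i], Sigmas[i])
        e = rng.normal(0.0, np.sqrt(np.diag(Psis[j])))
        ys[t] = etas[j] + Lambdas[j] @ z + e
    return ys

# Illustrative setup: k_1 = 2 factor models, k_2 = 2 shared components,
# p = 4 observed variables, q = 2 factors.
pis = [0.5, 0.5]
etas = [np.zeros(4), np.full(4, 3.0)]
Lambdas = [np.ones((4, 2)), 0.5 * np.ones((4, 2))]
Psis = [np.eye(4), np.eye(4)]
gammas = [0.3, 0.7]
mus = [np.zeros(2), np.array([2.0, -2.0])]
Sigmas = [np.eye(2), 0.5 * np.eye(2)]
ys = sample_mfma(100, pis, etas, Lambdas, Psis, gammas, mus, Sigmas)
```

Note how the latent-mixture parameters (γ, µ_i, Σ_i) are shared across the k_1 factor models, exactly as the slide states.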
