
Unifying Data Units and Models in (Co-)Clustering, C. Biernacki



  1. Introduction | Units in model-based clustering | Units in model-based co-clustering | Conclusion. Unifying Data Units and Models in (Co-)Clustering. C. Biernacki, joint work with A. Lourme. 24th meeting (rencontres) of the Société Francophone de Classification, 28–30 June 2017, Lyon, France. 1/48

  2. Quiz! $y = \beta x^2 + e$. Is it a linear regression on the covariate $x^2$? Is it a quadratic regression on the covariate $x$? Both!
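
As a minimal sketch (Python with NumPy, on simulated data with an assumed true β = 1.5), the least-squares estimate of β is the same whether the model is read as a linear regression on the covariate $x^2$ or as a quadratic regression on $x$ without intercept or linear term: only the interpretation changes, not the fit.

```python
import numpy as np

# Simulated data from y = beta * x^2 + e (beta_true is an assumption of this sketch)
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-2.0, 2.0, size=n)
beta_true = 1.5
y = beta_true * x**2 + rng.normal(scale=0.1, size=n)

# View 1: linear regression (no intercept) on the transformed covariate z = x^2
z = x**2
beta_linear = np.sum(z * y) / np.sum(z * z)

# View 2: quadratic regression on x whose design keeps only the x^2 term
design = np.column_stack([x**2])
beta_quadratic, *_ = np.linalg.lstsq(design, y, rcond=None)

print(beta_linear, beta_quadratic[0])  # same estimate, up to floating-point rounding
```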

  3. Take-home message: units and models are entirely interrelated. In this talk: be aware that the interpretation of (“classical”) models depends on the units; models should even be revisited as pairs (units × “classical” models); this is an opportunity for a cheap, wide and meaningful enlargement of “classical” model families. The focus is on model-based (co-)clustering, but the potential impact is broader.

  4. Outline
1 Introduction
2 Units in model-based clustering: scale units and parsimonious Gaussians; non-scale units and Gaussians; class-conditional units and Gaussians; units and Poissons
3 Units in model-based co-clustering: model for different kinds of data; units and Bernoulli; units and multinomial
4 Conclusion: summary; units and other distributions

  5. General (model-based) statistical framework. Data: the whole data set consists of $n$ objects described by $d$ variables, $\mathbf{x} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)$ with $\mathbf{x}_i = (x_{i1}, \ldots, x_{id}) \in \mathcal{X}$. Each value $\mathbf{x}_i$ is expressed in a unit denoted id; we write “id” because units are often user defined (a kind of canonical unit). Model: a family of probability density functions (pdfs) indexed by $m \in \mathcal{M}$, $p_m = \{\cdot \in \mathcal{X} \mapsto p(\cdot; \theta) : \theta \in \Theta_m\}$, where $p(\cdot; \theta)$ is a (parametric) pdf and $\Theta_m$ is the space in which the parameter lives (as a shortcut, the index $m$ is often identified with the distribution family itself). Target: $\widehat{\text{target}} = f(\mathbf{x}, p_m)$. The unit id is hidden everywhere and can affect the estimation of the target!
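
A minimal sketch of how the unit can change the target, assuming the target is the partition given by a Gaussian mixture with equal proportions, a common spherical covariance and known centres (so the MAP rule reduces to the nearest centre). The centres and points are hypothetical; re-expressing the first variable in seconds changes the assignment of the first point.

```python
import numpy as np

# Hypothetical cluster centres and points, expressed in unit id = (min, min)
centers = np.array([[0.0, 0.0], [1.0, 10.0]])
points = np.array([[0.9, 4.0], [0.1, 1.0], [1.0, 9.0]])

def assign(points, centers):
    """Nearest centre, i.e. the MAP rule for equal proportions and a common spherical covariance."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # squared distances (n x g)
    return d2.argmin(axis=1) + 1  # clusters labelled 1..g

scale = np.array([60.0, 1.0])  # unit u: first variable in seconds, second still in minutes
z_id = assign(points, centers)
z_u = assign(points * scale, centers * scale)
print(z_id, z_u)
```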

  6. Changing the data units. Principle of a data unit transformation: $u : \mathcal{X} = \mathcal{X}^{\text{id}} \to \mathcal{X}^{u}$, $\mathbf{x} = \mathbf{x}^{\text{id}} = \text{id}(\mathbf{x}) \mapsto \mathbf{x}^{u} = u(\mathbf{x})$. The mapping $u$ is bijective so that the whole information in the data set is preserved. We denote by $u^{-1}$ its inverse, so $u^{-1} \circ u = \text{id}$; thus id is only one particular unit $u$. A meaningful restriction on $u$ is often that it acts individual by individual (row by row) and variable by variable (column by column): $u(\mathbf{x}) = (u(\mathbf{x}_1), \ldots, u(\mathbf{x}_n))$ with $u(\mathbf{x}_i) = (u_1(x_{i1}), \ldots, u_d(x_{id}))$. Here $u(\mathbf{x}_i)$ denotes $u$ applied to the data set restricted to the single individual $i$, and $u_j$ is the specific (bijective) unit transformation associated with variable $j$. The advantage is that the definition of each variable is respected: only its unit is transformed. This restriction can be relaxed, for instance to include the linear transformations involved in PCA (principal component analysis), but then the variable definition is no longer respected.
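
The row-by-row, variable-by-variable restriction can be sketched in Python as a list of per-variable bijections $u_j$ and their inverses. The particular units chosen here (minutes to seconds for variable 1, logarithm for a positive variable 2), the helper names `unit_maps`, `u` and `u_inv`, and the data values are illustrative assumptions.

```python
import numpy as np

# Hypothetical per-variable units u_j with their inverses, for d = 2 variables:
# variable 1 goes from minutes to seconds (scale), variable 2 to its logarithm (non-scale).
unit_maps = [
    (lambda v: 60.0 * v, lambda w: w / 60.0),
    (np.log, np.exp),
]

def u(x):
    """Apply u_j elementwise to column j of the n x d data matrix x."""
    return np.column_stack([f(x[:, j]) for j, (f, _) in enumerate(unit_maps)])

def u_inv(xu):
    """Apply the inverse maps u_j^{-1} column by column."""
    return np.column_stack([g(xu[:, j]) for j, (_, g) in enumerate(unit_maps)])

# Illustrative values only (n = 3 individuals, d = 2 variables, unit id)
x = np.array([[3.6, 79.0], [1.8, 54.0], [3.3, 74.0]])
xu = u(x)
print(np.allclose(u_inv(xu), x))  # u^{-1} o u = id
```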

  7. Revisiting units as a modelling component. Explicitly exhibiting the “canonical” unit id in the model gives $p_m = \{\cdot \in \mathcal{X} \mapsto p(\cdot; \theta) : \theta \in \Theta_m\} = \{\cdot \in \mathcal{X}^{\text{id}} \mapsto p(\cdot; \theta) : \theta \in \Theta_m\} = p_m^{\text{id}}$. Thus the variable space and the probability measure are tied together, as in standard probability theory, where a model is a pair (variable space, probability measure)! Changing id into $u$ while keeping $m$ is expected to produce a new model, $p_m^{u} = \{\cdot \in \mathcal{X}^{u} \mapsto p(\cdot; \theta) : \theta \in \Theta_m\}$. A model should therefore systematically be defined by a pair $(u, m)$, denoted $p_m^{u}$.
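
To make the pair $(u, m)$ explicit, one could store a model as a unit plus a density family and read it back in unit id through the change-of-variables formula. `UnitModel` is a hypothetical container (univariate case, fixed θ), not part of any package. With the same $m$ (a standard Gaussian), switching the unit from id to log gives a different density on the original scale, namely the log-normal one.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

Array = np.ndarray

@dataclass(frozen=True)
class UnitModel:
    """Hypothetical container for p^u_m: a unit u paired with a density family m (univariate case)."""
    unit: Callable[[Array], Array]              # u: unit id -> unit u
    log_abs_jacobian: Callable[[Array], Array]  # log|u'(x)|, evaluated in unit id
    logpdf_m: Callable[[Array], Array]          # log p(.; theta) of model m (fixed theta), in unit u

    def logpdf_id(self, x: Array) -> Array:
        """Log-density of the same model re-expressed in unit id (change of variables)."""
        return self.logpdf_m(self.unit(x)) + self.log_abs_jacobian(x)

def std_normal_logpdf(t: Array) -> Array:
    return -0.5 * t**2 - 0.5 * np.log(2.0 * np.pi)

# Same m (standard Gaussian), two different units
p_id_m = UnitModel(unit=lambda x: x, log_abs_jacobian=lambda x: np.zeros_like(x),
                   logpdf_m=std_normal_logpdf)
p_log_m = UnitModel(unit=np.log, log_abs_jacobian=lambda x: -np.log(x),
                    logpdf_m=std_normal_logpdf)

x = np.array([0.5, 1.0, 2.0, 4.0])
print(p_id_m.logpdf_id(x))
print(p_log_m.logpdf_id(x))  # log-normal log-density
```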

  8. Interpretation and identifiability of $p_m^{u}$. Standard probability theory (again): there exists a measure $u^{-1}(m)$ such that $u^{-1}(m) \in \{m' \in \mathcal{M} : p_{m'}^{\text{id}} = p_m^{u}\}$ (this set is usually reduced to a single element). There are thus two alternative interpretations of exactly the same model: $p_m^{u}$, data measured in unit $u$ arise from measure $m$; $p_{u^{-1}(m)}^{\text{id}}$, data measured in unit id arise from measure $u^{-1}(m)$. Two points of view follow. Statistician: the model $p_m^{u}$ is not identifiable over the pair $(m, u)$. Practitioner: freedom to choose the interpretation that is most meaningful to them.
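
A worked check of the two interpretations, assuming a univariate scale unit $u(x) = a x$ and $m = \mathcal{N}(\mu, \sigma^2)$ in unit $u$ (the values of $a$, $\mu$, $\sigma$ are hypothetical): pulling $p_m^{u}$ back to unit id by the change of variables gives exactly the density of $u^{-1}(m) = \mathcal{N}(\mu / a, \sigma^2 / a^2)$ in unit id.

```python
import numpy as np

def normal_pdf(t, mean, sd):
    return np.exp(-0.5 * ((t - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

a = 60.0                  # hypothetical scale unit u(x) = a * x (e.g. minutes -> seconds)
mu, sigma = 200.0, 30.0   # measure m: N(mu, sigma^2) for data expressed in unit u

x = np.linspace(1.0, 6.0, 11)  # data values in the canonical unit id

# p^u_m read in unit id: p(u(x); theta) * |du/dx|
pulled_back = normal_pdf(a * x, mu, sigma) * abs(a)
# The same model read as p^id_{u^{-1}(m)}: N(mu / a, (sigma / a)^2) in unit id
direct = normal_pdf(x, mu / a, sigma / a)

print(np.allclose(pulled_back, direct))
```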

  9. Opportunity for designing new models. This is a great opportunity to build, easily, numerous new meaningful models $p_m^{u}$: just combine a standard model family $\{m\}$ with a standard unit family $\{u\}$. The new family can be huge, and combinatorial problems can occur. In some (specific) cases a model is stable under the unit change, $m = u^{-1}(m)$, so the pair does not produce a new model.
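
A back-of-the-envelope count of the pairs $(u, m)$, assuming (hypothetically) three candidate units per variable (id, log and square root, the last two for positive variables) combined with the 14 Gaussian models of Celeux & Govaert; the count grows as $3^d \times 14$.

```python
from itertools import product

candidate_units = ("id", "log", "sqrt")  # hypothetical per-variable unit choices
n_gaussian_models = 14                   # the 14 spectral models of Celeux & Govaert (1995)

def family_size(d: int) -> int:
    """Number of pairs (u, m) when each of the d variables gets its own unit."""
    n_unit_vectors = sum(1 for _ in product(candidate_units, repeat=d))
    return n_unit_vectors * n_gaussian_models

for d in (1, 2, 5, 10):
    print(d, family_size(d))
```

For $d = 10$ this already gives 826,686 pairs. Because some pairs coincide when $m = u^{-1}(m)$, the raw count is only an upper bound on the number of distinct models.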

  10. Model selection. As for any model, it is possible to choose between $p_{m_1}^{u_1}$ and $p_{m_2}^{u_2}$. However, caution is needed with likelihood-based model selection criteria (such as BIC): comparing $m_1$ in unit $u_1$ with $m_2$ in unit $u_2$ is prohibited, but it is allowed after transforming both into the same unit id. Thus compare their equivalent expressions $p_{u_1^{-1}(m_1)}^{\text{id}}$ and $p_{u_2^{-1}(m_2)}^{\text{id}}$. For example, for absolutely continuous $\mathbf{x}$ and differentiable $u$, the density transformed into unit id is $p_{u^{-1}(m)}^{\text{id}} = \{\cdot \in \mathcal{X}^{\text{id}} \mapsto p(u(\cdot); \theta) \times |J_u(\cdot)| : \theta \in \Theta_m\}$, with $|J_u(\cdot)|$ the absolute value of the Jacobian determinant of the transformation $u$.
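
A minimal sketch of the comparison in unit id, assuming two univariate models on simulated positive data: (id, Gaussian) and (log, Gaussian). For $u = \log$, $|J_u(x)| = 1/x$, so the maximised log-likelihood computed on $\log x$ must be corrected by $-\sum_i \log x_i$ before BIC is computed. BIC is written here as $-2 \log L + k \log n$ (smaller is better); this sign convention is a choice of the sketch.

```python
import numpy as np

def gaussian_loglik(t):
    """Maximised log-likelihood of a univariate Gaussian (MLE mean and variance)."""
    n = t.size
    var = np.mean((t - t.mean()) ** 2)
    return -0.5 * n * (np.log(2.0 * np.pi * var) + 1.0)

def bic(loglik, n_params, n):
    # BIC as -2 log L + k log n (smaller is better in this convention)
    return -2.0 * loglik + n_params * np.log(n)

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)  # simulated positive data in unit id
n = x.size

ll_id = gaussian_loglik(x)                # model (id, Gaussian)
ll_log_in_u = gaussian_loglik(np.log(x))  # model (log, Gaussian), likelihood in unit u = log
# Jacobian of u = log is 1/x, so log|J_u(x_i)| = -log x_i: move the likelihood back to unit id
ll_log_in_id = ll_log_in_u - np.sum(np.log(x))

bic_id = bic(ll_id, 2, n)
bic_log = bic(ll_log_in_id, 2, n)
print(bic_id, bic_log)
```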

  11. Focus on the clustering target. A current challenge is to enlarge the model collection, and units could contribute to it! Model: mixture model $m$ with parameter $\theta = \{\pi_k, \alpha_k\}_{k=1}^{g}$, $p_m(\cdot; \theta) = \sum_{k=1}^{g} \pi_k \, p(\cdot; \alpha_k)$, where $g$ is the number of clusters. Clusters correspond to a hidden partition $\mathbf{z} = (z_1, \ldots, z_n)$, where $z_i \in \{1, \ldots, g\}$; $\pi_k = p(Z = k)$ and $p(\cdot; \alpha_k) = p(X = \cdot \mid Z = k)$. Target: estimate $\mathbf{z}$ (and often $g$). Estimate $\hat{\theta}_m$ by maximum likelihood (typically); estimate $\mathbf{z}$ by the MAP principle, $\hat{z}_i = \arg\max_{k \in \{1, \ldots, g\}} p(Z_i = k \mid X_i = \mathbf{x}_i; \hat{\theta}_m)$; estimate $g$ typically with the BIC or ICL criteria (maximum-likelihood-based criteria).
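
The MAP step can be sketched for a univariate Gaussian mixture, assuming the parameters $\hat\theta$ have already been estimated (in practice, typically by maximum likelihood with EM); the numerical values below are hypothetical. The conditional probabilities $p(Z_i = k \mid X_i = x_i; \hat\theta)$ are proportional to $\pi_k \, p(x_i; \alpha_k)$.

```python
import numpy as np

def normal_pdf(t, mean, sd):
    return np.exp(-0.5 * ((t - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def map_partition(x, pi, mu, sd):
    """MAP rule z_i = argmax_k p(Z_i = k | X_i = x_i; theta) for a univariate Gaussian mixture."""
    joint = pi[None, :] * normal_pdf(x[:, None], mu[None, :], sd[None, :])  # pi_k p(x_i; alpha_k)
    posterior = joint / joint.sum(axis=1, keepdims=True)                      # conditional probabilities
    return posterior.argmax(axis=1) + 1, posterior                           # clusters labelled 1..g

# Hypothetical estimated parameters for g = 2 clusters
pi = np.array([0.5, 0.5])
mu = np.array([0.0, 5.0])
sd = np.array([1.0, 1.0])
x = np.array([-0.5, 0.2, 4.8, 6.0, 2.4])

z_hat, t = map_partition(x, pi, mu, sd)
print(z_hat)
```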

  12. Outline (repeated from slide 4).

  13. 14 spectral models on $\Sigma_k$. With $\mathcal{X} = \mathbb{R}^d$, the $d$-variate Gaussian model $m$ is $p_m(\cdot; \alpha_k) = \mathcal{N}_d(\mu_k, \Sigma_k)$. [Celeux & Govaert, 1995] propose the eigendecomposition $\Sigma_k = \lambda_k D_k \Lambda_k D_k'$, where the scalar $\lambda_k = |\Sigma_k|^{1/d}$ controls the volume, the orthogonal matrix of eigenvectors $D_k$ controls the orientation, and the diagonal matrix $\Lambda_k$ with $|\Lambda_k| = 1$ controls the shape.
[Figure: density f(x) of a bivariate Gaussian over (x1, x2) centred at µ_k, annotated with λ_k, a_k and α_k.]
Reference: Celeux, G., and Govaert, G. Gaussian parsimonious clustering models. Pattern Recognition, 28(5), 781–793 (1995).
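
A minimal NumPy sketch of the decomposition for a single covariance matrix (the example matrix is hypothetical): the volume is $|\Sigma_k|^{1/d}$, the orientation is the matrix of eigenvectors returned by `numpy.linalg.eigh`, and the shape is the diagonal matrix of eigenvalues divided by the volume. `eigh` returns eigenvalues in ascending order; ordering them differently is only a convention.

```python
import numpy as np

def spectral_decomposition(sigma):
    """Split a covariance matrix into volume lambda, orientation D and shape Lambda."""
    d = sigma.shape[0]
    eigvals, eigvecs = np.linalg.eigh(sigma)  # ascending eigenvalues, orthonormal eigenvectors
    lam = np.prod(eigvals) ** (1.0 / d)       # volume: |Sigma|^(1/d)
    shape = np.diag(eigvals / lam)            # diagonal shape matrix with determinant 1
    return lam, eigvecs, shape

sigma_k = np.array([[4.0, 1.2], [1.2, 1.0]])  # hypothetical positive definite covariance
lam, D, Lam = spectral_decomposition(sigma_k)

# Reconstruct Sigma_k = lambda_k D_k Lambda_k D_k'
print(lam, np.allclose(lam * D @ Lam @ D.T, sigma_k))
```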

  14. Scale unit invariance. Consider the scale unit transformation $u(\mathbf{x}) = D\mathbf{x}$, with diagonal $D \in \mathbb{R}^{d \times d}$. This is a very common transformation: standard units (mm, cm), standardized units. [Biernacki & Lourme, 2014] list the models for which invariance holds (8 out of 14). The general model is invariant: $[\lambda_k D_k \Lambda_k D_k'] = u^{-1}([\lambda_k D_k \Lambda_k D_k'])$. An example of a non-invariant model, where $D$ now denotes an orientation matrix common to all clusters rather than the scaling: $[\lambda_k D \Lambda_k D'] \neq u^{-1}([\lambda_k D \Lambda_k D'])$. Do not forget to compare all models $m' = u^{-1}(m)$ in unit id for BIC/ICL validity. Use the Rmixmod package.
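
One way to see why the common-orientation model is not scale invariant: two symmetric matrices share an eigenbasis exactly when they commute. The sketch below (hypothetical covariance values; the scaling is called `scale` to avoid the clash with the orientation matrix `orient`) builds two covariances with a common orientation, applies $u(\mathbf{x}) = D\mathbf{x}$ with $D = \mathrm{diag}(60, 1)$, and measures the relative commutator before and after.

```python
import numpy as np

def rotation(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def commutator_ratio(a, b):
    """Relative size of AB - BA; ~0 iff symmetric A and B share an eigenbasis (common orientation)."""
    return np.linalg.norm(a @ b - b @ a) / (np.linalg.norm(a) * np.linalg.norm(b))

# Two clusters from a common-orientation model [lambda_k D Lambda_k D'] (hypothetical values)
orient = rotation(np.pi / 6)
sigma_1 = 1.0 * orient @ np.diag([2.0, 0.5]) @ orient.T
sigma_2 = 3.0 * orient @ np.diag([0.5, 2.0]) @ orient.T

# Scale unit u(x) = scale @ x: first variable in seconds, second in minutes
scale = np.diag([60.0, 1.0])
sigma_1_u = scale @ sigma_1 @ scale
sigma_2_u = scale @ sigma_2 @ scale

before = commutator_ratio(sigma_1, sigma_2)
after = commutator_ratio(sigma_1_u, sigma_2_u)
print(before, after)  # ~0 before, clearly non-zero after: the common orientation is lost
```

By contrast, the image $D\Sigma_k D$ of a free covariance is still a free covariance, which is why the general model is unaffected.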

  15. MASSICCC platform for the MIXMOD software: https://massiccc.lille.inria.fr/

  16. Illustration on the Old Faithful geyser data set. All models have free proportions ($\pi_k$), and all ICL values are expressed in the initial unit id = (min, min). The unit affects the ICL ranking for some models: a cheap opportunity to enlarge the model family!
Family        | u_scale1 = (sec, min)      | u_scale2 = (stand, stand)  | id = (min, min)
              | m                   ICL^id | m                   ICL^id | m                   ICL^id
All models    | [λ_k D Λ_k D′]      1160.3 | [λ_k D Λ_k D′]      1158.7 | [λ_k D_k Λ D_k′]    1160.3
General model | [λ_k D_k Λ_k D_k′]  1161.4 | [λ_k D_k Λ_k D_k′]  1161.4 | [λ_k D_k Λ_k D_k′]  1161.4

  17. Outline (repeated from slide 4).
