Model-based clustering of categorical data by relaxing conditional independence

Motivation Intra-block model I: Mixture of two extreme distributions Intra-block model II: Conditional dependency modes
Model-based clustering of categorical data by relaxing conditional independence
M. Marbac (3, 6), C. Biernacki (3, 4, 5), V. Vandewalle (1, 2, 3)
Classification Society meeting 2015, McMaster University, 5 June 2015
[Affiliation logos numbered 1 to 6 on the original slide]

Outline
1. Motivation
2. Intra-block model I: Mixture of two extreme distributions
3. Intra-block model II: Conditional dependency modes

Model-based clustering
Data $x = (x_1, \ldots, x_n)$ → clustering → partition $\hat{z} = (\hat{z}_1, \ldots, \hat{z}_n)$ into $\hat{g}$ clusters.
[Figure: two scatter plots in the $(X_1, X_2)$ plane, joined by a "clustering" arrow.]
Mixture model: a well-posed problem
$$p(x; \theta \mid g) = \sum_{k=1}^{g} \pi_k \, p(x; \theta_k \mid g), \quad \text{with } \theta = ((\pi_1, \ldots, \pi_k, \ldots, \pi_g), (\alpha_1, \ldots, \alpha_k, \ldots, \alpha_g))$$
The model can be used for:
- $x \to \hat{\theta} \to p(z \mid x, g; \hat{\theta}) \to \hat{z}$
- $x \to \hat{p}(g \mid x) \to \hat{g}$

Categorical data
$d$ categorical variables, each with $m_j$ response levels:
- $x_i = \{x_i^{j} : j = 1, \ldots, d\}$
- $x_i^{j} = \{x_i^{jh} : h = 1, \ldots, m_j\}$
- $x_i^{jh} = 1$ if individual $i$ has response level $h$ for variable $j$, and $x_i^{jh} = 0$ otherwise
Example ("Genes Diffusion" company):
- $n = 4270$ calves
- $d = 9$ variables, behavior-related (1) and health-related (2)
- Response levels of TRC ($j = 3$): TRC ∈ {"curative", "preventive", "no"} ($m_3 = 3$)
  $x_1^{3}$ = "curative" = (1 0 0)
  $x_2^{3}$ = "no" = (0 0 1)
  $x_3^{3}$ = "no" = (0 0 1)
  ...
(1) Aptitude for sucking (Apt), behavior of the mother just before the calving (Iso).
(2) Treatment against omphalite (TOC), respiratory disease (TRC) and diarrhea (TDC); umbilicus disinfection (Dis); umbilicus emptying (Emp); mother preventive treatment against respiratory disease (TRM) and diarrhea (TDM).
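The binary coding of a categorical variable described above can be sketched as follows. This is only an illustration: the helper name `one_hot` is hypothetical, and the level order is assumed to be the one listed for TRC on the slide.

```python
def one_hot(value, levels):
    """Return the binary indicator vector of `value` among the ordered `levels`."""
    if value not in levels:
        raise ValueError(f"unknown response level: {value!r}")
    return tuple(1 if value == level else 0 for level in levels)

# Response levels of TRC, in the order given on the slide (m_3 = 3).
TRC_LEVELS = ("curative", "preventive", "no")

# First three TRC observations shown on the slide.
observations = ["curative", "no", "no"]
encoded = [one_hot(value, TRC_LEVELS) for value in observations]
print(encoded)  # [(1, 0, 0), (0, 0, 1), (0, 0, 1)]
```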

Intra-class correlations
A topical issue:
- More frequent (in the population) when $d$ increases
- More observable (in the sample) when $n$ increases
- Risk of bias when models do not take such correlations into account
Bias example (on $z$) with Gaussians:
[Figure: two scatter plots in the $(X_1, X_2)$ plane; panel titles "Independent Gaussians" and "Correlated Gaussians".]

Classical categorical models
Conditional independence model (CIM): linked to some $\chi^2$ distance-based methods
$$p(x; \theta_k) = p(x; \alpha_k) = \prod_{j=1}^{d} p(x^{j}; \alpha_k^{j}) = \prod_{j=1}^{d} \prod_{h=1}^{m_j} (\alpha_k^{jh})^{x^{jh}}$$
where $\alpha_k = \{\alpha_k^{jh} : j = 1, \ldots, d,\ h = 1, \ldots, m_j\}$ and $\alpha_k^{jh} = p(x^{jh} = 1 \mid z = k)$
⊖ bias
Dependence trees: allow only certain dependencies
⊖ too many parameters and unstable estimation of the tree
Latent Trait Analyzers: a continuous variable explains the intra-class dependency
$$p(x; \alpha_k) = \int_{\mathbb{R}^{|c|}} \prod_{j=1}^{d} \prod_{h=1}^{m_j} p(x^{jh} \mid c; \alpha_k)\, p(c)\, dc$$
⊖ difficult to explain correlations meaningfully
The "gold rule": a model should be flexible + parsimonious + meaningful
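As a minimal sketch of the conditional independence model, the product over variables and levels can be evaluated on the log scale. The function name `cim_log_pmf` and the parameter values below are hypothetical, chosen only to illustrate the formula.

```python
import math

def cim_log_pmf(x, alpha_k):
    """log p(x; alpha_k) = sum_j sum_h x^{jh} * log(alpha_k^{jh}).

    x       : list of one-hot tuples, one per variable
    alpha_k : list of probability vectors, one per variable
    """
    total = 0.0
    for x_j, alpha_j in zip(x, alpha_k):
        for x_jh, a_jh in zip(x_j, alpha_j):
            if x_jh:  # only the observed level contributes
                if a_jh == 0.0:
                    return float("-inf")  # observed level has zero probability
                total += math.log(a_jh)
    return total

# Hypothetical parameters: one binary and one ternary variable.
alpha_k = [(0.2, 0.8), (0.5, 0.3, 0.2)]
x = [(0, 1), (1, 0, 0)]  # level 2 of variable 1, level 1 of variable 2
p = math.exp(cim_log_pmf(x, alpha_k))  # equals 0.8 * 0.5 up to rounding
```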

Dependence per blocks (1/3)
Conditionally on the class $k$, variables are grouped into $B_k$ independent blocks.
- Partition of the variables: $\sigma_k = (\sigma_{k1}, \ldots, \sigma_{kB_k})$ of $\{1, \ldots, d\}$
- Number of variables in block $b$ of component $k$: $d^{\{kb\}} = \mathrm{card}(\sigma_{kb})$
- Subset of $x$ associated with $\sigma_{kb}$: $x^{\{kb\}} = x^{\sigma_{kb}} = (x^{\{kb\}j};\ j = 1, \ldots, d^{\{kb\}})$
- Variable $j$ of block $b$ for component $k$: $x^{\{kb\}j} = (x^{\{kb\}jh};\ h = 1, \ldots, m_j^{\{kb\}})$
- Number of modalities of $x^{\{kb\}j}$: $m_j^{\{kb\}}$
- All partitions into blocks: $\sigma = (\sigma_1, \ldots, \sigma_g)$
Distribution per class:
$$p(x; \theta_k \mid \sigma_k, g) = \prod_{b=1}^{B_k} p(x^{\{kb\}}; \theta_{kb}), \quad \text{with } \theta_k = (\theta_{k1}, \ldots, \theta_{kB_k})$$
Inter-block model: $\sigma_k$ satisfies the "gold rule".

Dependence per blocks (2/3)
Example with $g = 2$, $d = 5$:
- $k = 1$, $B_1 = 2$: $\sigma_1 = (\{1, 2\}, \{3, 4, 5\})$
- $k = 2$, $B_2 = 3$: $\sigma_2 = (\{1, 5\}, \{2, 4\}, \{3\})$
The present work: the intra-block distribution $p(x^{\{kb\}}; \theta_{kb})$ should also satisfy the "gold rule".
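The class-specific partitions of this example can be checked and applied with a few lines of code. This is a sketch under assumptions: blocks are stored as tuples of 1-based variable indices, and the helper names and the observation `x` are hypothetical.

```python
def check_partition(sigma_k, d):
    """True if the blocks are non-empty, disjoint and cover {1, ..., d}."""
    flat = [j for block in sigma_k for j in block]
    return all(len(block) > 0 for block in sigma_k) and sorted(flat) == list(range(1, d + 1))

def split_by_blocks(x, sigma_k):
    """Return the sub-vectors x^{kb} of x, one per block of sigma_k."""
    return [tuple(x[j - 1] for j in block) for block in sigma_k]

# Partitions from the example with g = 2 and d = 5.
sigma_1 = ((1, 2), (3, 4, 5))
sigma_2 = ((1, 5), (2, 4), (3,))

x = ("a", "b", "c", "d", "e")  # hypothetical observation, one entry per variable
blocks_1 = split_by_blocks(x, sigma_1)  # [('a', 'b'), ('c', 'd', 'e')]
blocks_2 = split_by_blocks(x, sigma_2)  # [('a', 'e'), ('b', 'd'), ('c',)]
```

The same observation is therefore cut differently depending on the class, which is what lets each class carry its own dependency structure.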

Dependence per blocks (2/3), continued
The present work: two intra-block distributions are now proposed.


Maximum dependency distribution
Main idea:
- The "opposite" of the independence distribution according to Cramér's V criterion computed on all pairs of variables
- Knowing the variable with the largest number of modalities determines the others exactly
- Variables are ordered by decreasing number of modalities in each block
- Successive surjections from the space of $x^{\{kb\}j}$ to the space of $x^{\{kb\}j+1}$
$$p(x^{\{kb\}}; \tau_{kb}, \delta_{kb}) = \underbrace{p(x^{\{kb\}1}; \tau_{kb})}_{\text{1st variable}} \; \underbrace{\prod_{j=2}^{d^{\{kb\}}} p\big(x^{\{kb\}j} \mid x^{\{kb\}1}; \{\delta_{kb}^{hj}\}_{h=1,\ldots,m_1^{\{kb\}}}\big)}_{\text{other variables}}$$
$$= \prod_{h=1}^{m_1^{\{kb\}}} \Big( \tau_{kb}^{h} \prod_{j=2}^{d^{\{kb\}}} \prod_{h'=1}^{m_j^{\{kb\}}} \big(\delta_{kb}^{hjh'}\big)^{x^{\{kb\}jh'}} \Big)^{x^{\{kb\}1h}}$$
with $\tau_{kb}^{h} \in (0, 1)$, $\delta_{kb}^{hjh'} \in \{0, 1\}$, $\delta_{kb} = (\delta_{kb}^{hj})$, $\delta_{kb}^{hj} = (\delta_{kb}^{hjh'})$ and $\tau_{kb} = (\tau_{kb}^{h})$.

Example
- Block $\{11\}$: $m_1^{\{11\}} = 4$, $m_2^{\{11\}} = 3$; $\delta_{11}^{h1h} = 1$ for $h = 1, 2, 3$ and $\delta_{11}^{413} = 1$; $\tau_{11} = (0.1, 0.3, 0.2, 0.4)$
- Block $\{12\}$: $m_1^{\{12\}} = m_2^{\{12\}} = m_3^{\{12\}} = 2$; $\delta_{12}^{hjh'} = 1$ iff $h = h'$; $\tau_{12} = (0.5, 0.5)$
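The maximum dependency distribution can be sketched in code by storing each $\delta$ as a deterministic map from the level of the first variable to the level of another variable. The representation below (0-based levels, a list `maps` with one map per remaining variable) and the function name are assumptions made for illustration. The values reproduce the block with four and three modalities: levels 1, 2, 3 map to themselves, level 4 maps to level 3, and $\tau = (0.1, 0.3, 0.2, 0.4)$.

```python
from itertools import product

def max_dependency_pmf(x, tau, maps):
    """p(x; tau, delta) when the first variable determines all the others.

    x    : tuple of 0-based levels, x[0] being the first variable
    tau  : probabilities of the levels of the first variable
    maps : maps[j][h] is the only level of variable j + 2 allowed when
           the first variable takes level h (encodes delta^{hjh'} = 1)
    """
    h = x[0]
    consistent = all(m[h] == level for m, level in zip(maps, x[1:]))
    return tau[h] if consistent else 0.0

tau_11 = (0.1, 0.3, 0.2, 0.4)
maps_11 = [[0, 1, 2, 2]]  # levels 1, 2, 3 map to themselves, level 4 maps to level 3

# Marginal distribution of the second variable implied by the surjection.
marginal_2 = [
    sum(max_dependency_pmf((h, hp), tau_11, maps_11) for h in range(4))
    for hp in range(3)
]
total = sum(max_dependency_pmf(x, tau_11, maps_11) for x in product(range(4), range(3)))
```

Here `marginal_2` is approximately (0.1, 0.3, 0.6): the last level of the second variable collects the mass of the last two levels of the first one.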

Mixture of extreme distributions (CCM1)
CCM1:
$$p(x^{\{kb\}}; \theta_{kb}) = (1 - \rho_{kb}) \underbrace{p(x^{\{kb\}}; \alpha_{kb})}_{\text{independence}} + \rho_{kb} \underbrace{p(x^{\{kb\}}; \tau_{kb}, \delta_{kb})}_{\text{extreme dependency}}$$
where $\theta_{kb} = (\rho_{kb}, \alpha_{kb}, \tau_{kb}, \delta_{kb})$
Meaningful:
- $\rho_{kb}$: global inter-variable correlation in the block ($0 \le \rho_{kb} \le 1$)
- $\delta_{kb}$: intra-variable correlation in the block ($\in \{0, 1\}$)
Parsimony:
$$\nu_{\text{ccm1}} = \nu_{\text{cim}} + \sum_{\{(k,b) \,\mid\, d^{\{kb\}} > 1\}} m_1^{\{kb\}}$$
where $m_1^{\{kb\}}$ is the number of modalities of the 1st variable in the block.
Identifiable if $d^{\{kb\}} > 2$ or $m_2^{\{kb\}} > 2$ (additional constraints are added otherwise).
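A minimal sketch of the CCM1 block distribution is given below. It assumes 0-based levels and encodes $\delta_{kb}$ as deterministic maps from the first variable to each other variable. The function name and all numerical values are hypothetical.

```python
from itertools import product

def ccm1_pmf(x, rho, alpha, tau, maps):
    """(1 - rho) * independence pmf + rho * maximum dependency pmf.

    x     : tuple of 0-based levels, first variable first
    alpha : per-variable probability vectors (independence part)
    tau   : probabilities of the first variable (dependency part)
    maps  : maps[j][h] = level forced on variable j + 2 when x[0] == h
    """
    independence = 1.0
    for level, alpha_j in zip(x, alpha):
        independence *= alpha_j[level]
    consistent = all(m[x[0]] == level for m, level in zip(maps, x[1:]))
    dependency = tau[x[0]] if consistent else 0.0
    return (1.0 - rho) * independence + rho * dependency

# Hypothetical block of two binary variables.
alpha = [(0.5, 0.5), (0.5, 0.5)]
tau = (0.5, 0.5)
maps = [[0, 1]]  # delta^{hjh'} = 1 iff h = h'
rho = 0.6

table = {x: ccm1_pmf(x, rho, alpha, tau, maps) for x in product(range(2), repeat=2)}
```

With these values the two concordant cells each get probability of about 0.4 and the two discordant cells about 0.1, so $\rho$ shifts mass from the independence table toward the diagonal.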

$\rho_{kb}$ vs. Cramér's V
[Figure: empirical link between $\rho_{kb}$ and Cramér's V for two binary variables.]
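The link between $\rho_{kb}$ and Cramér's V can be checked by hand in one special case. The sketch below assumes two binary variables with uniform $\alpha_{kb}$, uniform $\tau_{kb}$ and $\delta_{kb}$ equal to the identity map; these settings are an assumption for illustration and are not taken from the slide's figure. In that symmetric case Cramér's V computed on the CCM1 joint table equals $\rho_{kb}$.

```python
import math

def ccm1_table_2x2(rho):
    """Joint table of two binary variables under CCM1 with uniform alpha,
    uniform tau and identity delta (assumed special case)."""
    return [[(1.0 - rho) * 0.25 + rho * 0.5 * (h == hp) for hp in range(2)]
            for h in range(2)]

def cramers_v(table):
    """Cramér's V of a joint probability table (population version)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    phi2 = sum(
        (table[i][j] - rows[i] * cols[j]) ** 2 / (rows[i] * cols[j])
        for i in range(len(rows))
        for j in range(len(cols))
    )
    return math.sqrt(phi2 / (min(len(rows), len(cols)) - 1))

values = [(rho, cramers_v(ccm1_table_2x2(rho))) for rho in (0.0, 0.3, 0.6, 1.0)]
```

With non-uniform margins the relation is no longer an exact identity, which is presumably why the slide describes the link as empirical.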

Estimation of θ (1/3)
$$\hat{\theta} = \operatorname{argmax}_{\theta} L(\theta; x \mid g, \sigma) \quad \text{with model } (g, \sigma) \text{ fixed}$$
Global GEM algorithm
E global step:
$$z_{ik}^{(r)} = \frac{\pi_k^{(r)}\, p(x_i; \sigma_k, \theta_k^{(r)})}{\sum_{k'=1}^{g} \pi_{k'}^{(r)}\, p(x_i; \sigma_{k'}, \theta_{k'}^{(r)})}$$
GM global step:
$$\pi_k^{(r+1)} = \frac{n_k^{(r)}}{n} \quad \text{where } n_k^{(r)} = \sum_{i=1}^{n} z_{ik}^{(r)}$$
$$\forall (k, b), \quad \theta_{kb}^{(r+1)} = \operatorname{argmax}_{\theta_{kb}} L(\theta_{kb}; x, z^{(r)} \mid g, \sigma) \quad \longrightarrow \text{MH algorithm}$$
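The E step and the update of the proportions can be sketched as follows. The component probabilities $p(x_i; \sigma_k, \theta_k^{(r)})$ are assumed to be already computed and passed in as a matrix. The function names and the numbers are hypothetical.

```python
def e_step(pi, comp_pmf):
    """Posterior membership probabilities z[i][k].

    pi       : current proportions pi_k
    comp_pmf : comp_pmf[i][k] = p(x_i; sigma_k, theta_k) under the current parameters
    """
    z = []
    for row in comp_pmf:
        weighted = [pi_k * p_ik for pi_k, p_ik in zip(pi, row)]
        total = sum(weighted)
        z.append([w / total for w in weighted])
    return z

def update_proportions(z):
    """pi_k = n_k / n with n_k = sum_i z[i][k]."""
    n = len(z)
    g = len(z[0])
    return [sum(row[k] for row in z) / n for k in range(g)]

# Hypothetical values: n = 3 observations, g = 2 components.
pi = [0.5, 0.5]
comp_pmf = [[0.4, 0.1], [0.1, 0.4], [0.4, 0.1]]
z = e_step(pi, comp_pmf)
new_pi = update_proportions(z)
```

Here each row of `z` sums to one and `new_pi` is approximately (0.6, 0.4). For long products of small probabilities, a log-scale version would be safer numerically.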

Estimation of θ (2/3)
$$\forall (k, b), \quad \theta_{kb}^{(r+1)} = \operatorname{argmax}_{\theta_{kb}} L(\theta_{kb}; x, z^{(r)} \mid g, \sigma) \quad \text{with } (z^{(r)}, g, \sigma) \text{ fixed}$$
Metropolis-Hastings algorithm (discrete parameters $\delta_{kb}$)
Proposal distribution:
- $\delta_{kb}^{(r, s+\frac{1}{2})} \sim$ uniform distribution in a neighborhood $\Delta(\delta_{kb}^{(r,s)})$
- $(\rho_{kb}, \alpha_{kb}, \tau_{kb})^{(r, s+\frac{1}{2})} = \operatorname{argmax}_{\bullet} L(\bullet; x, z^{(r)}, \delta_{kb}^{(r, s+\frac{1}{2})} \mid g, \sigma) \quad \longrightarrow$ EM algorithm
Acceptance distribution:
$$\mu^{(r,s+1)} = \min\left\{ \frac{\prod_{i=1}^{n} p\big(x_i^{\{kb\}}; \theta_{kb}^{(r, s+\frac{1}{2})}\big)^{z_{ik}^{(r)}} \, \big|\Delta\big(\delta_{kb}^{(r, s+\frac{1}{2})}\big)\big|}{\prod_{i=1}^{n} p\big(x_i^{\{kb\}}; \theta_{kb}^{(r,s)}\big)^{z_{ik}^{(r)}} \, \big|\Delta\big(\delta_{kb}^{(r,s)}\big)\big|},\ 1 \right\}$$
$$\theta_{kb}^{(r,s+1)} = \begin{cases} \theta_{kb}^{(r, s+\frac{1}{2})} & \text{with probability } \mu^{(r,s+1)} \\ \theta_{kb}^{(r,s)} & \text{otherwise} \end{cases}$$
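The accept/reject step can be sketched on the log scale. The code transcribes the acceptance ratio exactly as written on the slide, including the position of the two neighborhood sizes; when both neighborhoods have the same size that factor cancels. The weighted log-likelihoods $\sum_i z_{ik}^{(r)} \log p(x_i^{\{kb\}}; \theta_{kb})$ are assumed to be computed elsewhere, and all names are hypothetical.

```python
import math
import random

def acceptance_probability(loglik_candidate, loglik_current,
                           n_neigh_candidate, n_neigh_current):
    """mu = min(ratio, 1), with the ratio taken as on the slide.

    loglik_*  : sum_i z_ik * log p(x_i^{kb}; theta) for candidate / current
    n_neigh_* : neighborhood sizes |Delta(delta)| for candidate / current
    """
    log_ratio = (loglik_candidate + math.log(n_neigh_candidate)) \
        - (loglik_current + math.log(n_neigh_current))
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)

def mh_step(theta_current, theta_candidate, mu, rng=random):
    """Keep the candidate with probability mu, otherwise keep the current value."""
    return theta_candidate if rng.random() < mu else theta_current

# Hypothetical weighted log-likelihoods with equal neighborhood sizes.
mu_up = acceptance_probability(-10.0, -12.0, 3, 3)    # candidate is better
mu_down = acceptance_probability(-12.0, -10.0, 3, 3)  # candidate is worse
```

A better candidate is always accepted (`mu_up` is 1), while a worse one is accepted with probability `exp(-2)` in this example.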
