Recent results in model-based clustering via the cluster-weighted approach


SLIDE 1

Recent results in model-based clustering via the cluster-weighted approach

Salvatore Ingrassia

Department of Economics and Business University of Catania (Italy) s.ingrassia@unict.it

The National Institute for Astrophysics Catania Astrophysical Observatory 17 February 2016

Salvatore Ingrassia (University of Catania) Cluster Weighted Models CT Astrophysical Observatory 17/02/16 1 / 62

SLIDE 2

Outline

1. Mixture Modeling
2. Mixture Models with covariates
3. Cluster-Weighted Models: the original framework
4. CWM for model-based clustering
5. Gaussian and Student-t CWM
6. Decision boundaries
7. Generalized Cluster-Weighted Models
8. More recent developments

SLIDE 3

Mixture Modeling

Mixture modeling

Finite mixture models provide a flexible approach to the statistical modeling of a wide variety of random phenomena characterized by unobserved heterogeneity. Assume that a given population Ω can be partitioned into G disjoint subsets, i.e. Ω = Ω1 ∪ · · · ∪ ΩG. We aim at identifying the underlying groups and estimating the parameters of the group-conditional densities. Two main cases:

1. finite mixtures of distributions (FMD);
2a. finite mixtures of regression models (FMR), also known as mixture-of-experts models in machine learning, switching regression models in econometrics, latent class regression models in marketing, and mixed models in biology;
2b. finite mixtures of regression models with concomitant variables (FMRC).

SLIDE 4

Mixture Modeling

Mixtures of Distributions (FMD) 1/3

Let Z be a random vector defined on Ω with values in some space Z ⊆ Rd, and denote by p(z) the probability density function (pdf) of Z. Assume that Ω can be partitioned into G disjoint subsets, i.e. Ω = Ω1 ∪ · · · ∪ ΩG. We say that the density of Z is a finite mixture of distributions (FMD) if p(z) can be written in the form

p(z) = ∑_{g=1}^{G} p(z|Ωg) πg,

where p(z|Ωg) is the pdf of Z|Ωg and πg = p(Ωg) is the mixing weight of Ωg, for g = 1, . . . , G. Quite often, one considers mixtures of multivariate Gaussians (FMG), with Z|Ωg ∼ Nd(µg, Σg) for g = 1, . . . , G:

p(z) = ∑_{g=1}^{G} p(z|Ωg) πg = ∑_{g=1}^{G} φd(z; µg, Σg) πg,

where φd(z; µg, Σg) denotes the pdf of the multivariate Gaussian distribution with mean vector µg and covariance matrix Σg.
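As a concrete numerical sketch of the FMD density (univariate components, pure Python; the two components' weights, means and standard deviations are illustrative, not taken from a fitted model):

```python
import math

def gaussian_pdf(z, mu, sigma):
    """Univariate Gaussian density phi(z; mu, sigma^2)."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(z, params):
    """FMD density: p(z) = sum_g pi_g * phi(z; mu_g, sigma_g^2)."""
    return sum(pi * gaussian_pdf(z, mu, sigma) for pi, mu, sigma in params)

# two components with illustrative (pi_g, mu_g, sigma_g)
params = [(0.61, 56.8, 6.8), (0.39, 73.7, 9.8)]
density_at_60 = mixture_pdf(60.0, params)
```

Because the mixing weights sum to one, the mixture is itself a proper density: numerically integrating it over a wide enough grid recovers total mass 1.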

SLIDE 5

Mixture Modeling

Mixtures of Distributions (FMD) 2/3

Population of students: men and women.

[Histograms of the weight distribution (20–120 kg) for women and for men.]

Women: µw = 56.8 kg, σw = 6.8 kg, nw = 1680, πw = 0.61.
Men: µm = 73.7 kg, σm = 9.8 kg, nm = 1079, πm = 0.39.

SLIDE 6

Mixture Modeling

Mixtures of Distributions (FMD) 3/3

[Histogram of the weight distribution for men and women combined.]

p(z) = πw N(µw, σw²) + πm N(µm, σm²) = 0.61 · N(56.8, 6.8²) + 0.39 · N(73.7, 9.8²)

SLIDE 7

Mixture Models with covariates

Mixture models with covariates: the problem

Consider a pair (Y, X1, . . . , Xd) of a response variable Y and covariates (X1, . . . , Xd)′ defined on some population Ω with values in R × Rd. Assume we are provided with a sample of N i.i.d. realizations of (Y, X1, . . . , Xd), and that the dependence of Yn on xn is modeled by the multiple regression model

Yn = β0 + β1xn1 + · · · + βdxnd + εn = β′xn + εn,

where β = (β0, β1, . . . , βd)′ ∈ Rd+1 is the vector of unknown parameters, xn = (1, xn1, . . . , xnd)′ ∈ Rd+1 denotes the augmented covariate vector, and ε1, . . . , εN ∼ N(0, σε²) are i.i.d. errors.

The problem

In many circumstances, the assumption that the regression coefficients are fixed over all possible realizations of Y1, . . . , YN is inadequate, and models where the regression coefficients change are of practical interest.

SLIDE 8

Mixture Models with covariates

Example A: Student Data

Data come from a survey on N = 270 university students; see Ingrassia et al. (2014)¹. Consider the relationship between the student's height and the student's father's height. Two groups: males and females (blue = males, red = females).

[Scatter plot of height vs. father's height (left); single linear regression model (right).]

¹ Ingrassia S., Minotti S.C., Punzo A. (2014). Model-based clustering via linear cluster-weighted models, Computational Statistics & Data Analysis, 71, 159-182.

SLIDE 9

Mixture Models with covariates

Example B: Tourism Data

Data concern N = 180 monthly observations of the attendance at museums and monuments (Y, in millions) against tourist overnights (X, in millions) in Italy over the 15-year period from January 1996 to December 2010.

[Plot of the data (left); single linear regression model (right).]

SLIDE 10

Mixture Models with covariates

Example C: Star Data

Data concern N = 33 observations of the chromospheric activity index log RHK of stars hosting transiting hot Jupiters, which appears to be correlated with the planets' surface gravity.

[Plot of log RHK vs. g_p^−1 (left); single linear regression model (right).]

SLIDE 11

Mixture Models with covariates

Mixture models with covariates

Consider a pair (Y, X) of a response variable Y and covariates X defined on some heterogeneous population Ω partitioned into G disjoint homogeneous subpopulations, i.e. Ω = Ω1 ∪ · · · ∪ ΩG. We focus on modeling the dependence between Y and X based on data coming from a heterogeneous population. In this framework, mixture models provide a flexible approach for a wide variety of random phenomena characterized by unobserved heterogeneity.

Existing literature

Mixture of regression models (MR), Mixture of regression models with concomitant variables (MRC)

SLIDE 12

Mixture Models with covariates

Mixture of Regressions (MR)

The dependence between Y and X for data coming from a heterogeneous population can be modeled by a finite mixture of regressions (FMR); see e.g. McLachlan and Peel (2000)², Frühwirth-Schnatter (2006)³:

p(y|x, ψ) = ∑_{g=1}^{G} f(y|x, θg) πg,

where:
- f(y|x, θg) is the conditional density of Y given x in the group Ωg; the conditional densities belong to the same parametric family, indexed by θg ∈ Θ, g = 1, . . . , G;
- πg = p(Ωg) is the mixing weight of Ωg (πg > 0 and ∑_{g=1}^{G} πg = 1);
- ψ = (π1, . . . , πG, θ1′, . . . , θG′)′ ∈ Ψ is the vector of all parameters.

² McLachlan G.J., Peel D. (2000). Finite Mixture Models. Wiley, New York.
³ Frühwirth-Schnatter S. (2006). Finite Mixture and Markov Switching Models. Springer, Heidelberg.

SLIDE 13

Mixture Models with covariates

Mixture of Regressions with Concomitant (MRC)

An extension of FMR is the finite mixture of regressions with concomitant variables (FMRC); see Dayton and Macready (1988)⁴:

p(y|x, ψ) = ∑_{g=1}^{G} f(y|x, θg) p(Ωg|x, w),

where the mixing weight p(Ωg|x, w) is now a function depending on x through some parameters w, and ψ = (θ1, . . . , θG, w) denotes the set of all parameters of the model.

The probability p(Ωg|x, w) is usually modeled by a multinomial logistic distribution with the first component as baseline, that is:

p(Ωg|x, w) = exp(wg′x) / ∑_{h=1}^{G} exp(wh′x),

where wg = (wg0, wg1, . . . , wgd)′ ∈ Rd+1 and w = (w1′, . . . , wG′)′ ∈ RG(d+1).

⁴ Dayton C.M., Macready G.B. (1988). Concomitant-Variable Latent-Class Models, Journal of the American Statistical Association, 83, 173-178.
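The multinomial logistic mixing weights can be sketched in a few lines (pure Python, d = 1; the coefficient values are illustrative, and the max-subtraction is a standard numerical-stability trick, not part of the slide):

```python
import math

def mixing_weights(x, w):
    """p(Omega_g | x, w) = exp(w_g'x) / sum_h exp(w_h'x).
    x is the augmented covariate vector (1, x1, ..., xd); w is a list of
    coefficient vectors w_g, with the first component as baseline (w_1 = 0)."""
    scores = [sum(wg_i * x_i for wg_i, x_i in zip(wg, x)) for wg in w]
    m = max(scores)                      # stabilise the exponentials
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# G = 3 groups, d = 1 covariate; coefficients are illustrative
w = [[0.0, 0.0], [1.0, -0.5], [-2.0, 0.3]]
probs = mixing_weights([1.0, 4.0], w)   # augmented x = (1, x1)
```

By construction the weights are positive and sum to one for every x, so they define a valid covariate-dependent mixing distribution.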

SLIDE 14

Mixture Models with covariates

Two important cases

Gaussian linear components

In the Gaussian case, we consider

f(y|x, θg) = φ(y; βg′x, σ²ε,g),

where φ denotes the Gaussian density with mean βg′x and variance σ²ε,g.

Generalized linear components

In mixtures of Generalized Linear Models (GLMs), we have

f(y|x, θg) = f(y|x, βg, λg) = exp{ [y βg′x − b(βg′x)] / a(λg) + c(y, λg) },

for some specific functions a(·), b(·) and c(·), where λg is the dispersion parameter (constant in Ωg), with a(λg) > 0. Here y ∈ Y ⊆ R. The canonical links are the identity, log, logit, inverse and squared-inverse functions for the Normal, Poisson, binomial, gamma and inverse-Gaussian distributions, respectively.

SLIDE 15

Mixture Models with covariates

Parameter estimation and classification

Assume we are provided with a set of N independent observation pairs {(x1, y1), . . . , (xN, yN)} drawn from a mixture of regressions (MR/MRC). Then:

1. for fixed G, estimate the model parameters, usually by maximum likelihood via the EM algorithm;
2. if G is unknown:
   1. repeat step 1 for different numbers G of groups;
   2. select G according to model selection criteria like AIC, BIC, ICL;
3. based on the estimated ψ, compute the posterior probability τg(xn, yn; ψ) that the nth unit (xn, yn) belongs to the gth group Ωg:

MR : τg(xn, yn|ψ) = f(yn|xn, θg) πg / ∑_{h=1}^{G} f(yn|xn, θh) πh

MRC : τg(xn, yn|ψ) = f(yn|xn, θg) exp(wg′xn) / ∑_{h=1}^{G} f(yn|xn, θh) exp(wh′xn)

and classify the units into groups according to the maximum a posteriori probability (MAP) criterion.
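Step 3 can be sketched for a Gaussian MR (pure Python, d = 1; the component parameters below are illustrative placeholders, not EM estimates):

```python
import math

def gaussian_pdf(y, mean, sd):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def mr_posteriors(y, x, betas, sds, pis):
    """tau_g(x, y) = f(y|x, theta_g) pi_g / sum_h f(y|x, theta_h) pi_h for a
    Gaussian MR with component means beta_g'x (x already augmented with 1)."""
    numer = [gaussian_pdf(y, sum(b * xi for b, xi in zip(beta, x)), sd) * pi
             for beta, sd, pi in zip(betas, sds, pis)]
    total = sum(numer)
    return [n / total for n in numer]

def map_label(posteriors):
    """Classify according to the maximum a posteriori (MAP) criterion."""
    return max(range(len(posteriors)), key=lambda g: posteriors[g])

# two illustrative components with different regression lines
tau = mr_posteriors(3.0, [1.0, 2.0], betas=[[0.0, 1.0], [5.0, -1.0]],
                    sds=[1.0, 1.0], pis=[0.5, 0.5])
label = map_label(tau)  # index 1: the second line fits (x, y) = (2, 3) better
```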

SLIDE 16

Mixture Models with covariates

Identifiability about MR/MRC

Like any finite mixture model, mixtures of regression models suffer from nonidentifiability due to label switching and potential overfitting. Generic identifiability for MR does not, in general, follow directly from the generic identifiability of Gaussian mixtures, despite the close relationship between the two classes of models. A necessary condition for identifiability of MR is that the matrix X′X is of full rank, where X = (x1, . . . , xN)′.

Sufficient conditions for identifiability of MR are given in Hennig (2000)⁵, a generalization to identifiability of mixtures of GLMs in Grün and Leisch (2008)⁶, and identifiability results for mixtures of logistic regression models with concomitant variables in Wang (1994)⁷.

⁵ Hennig C. (2000). Identifiability of models for clusterwise linear regression. Journal of Classification, 17(2), 273–296.
⁶ Grün B., Leisch F. (2008). Identifiability of finite mixtures of multinomial logit models with varying and fixed effects. Journal of Classification, 25(2), 225–247.
⁷ Wang P. (1994). Mixed regression models for discrete data. Ph.D. Thesis, University of British Columbia, Vancouver.

SLIDE 17

Mixture Models with covariates

Example A: Student data (cont’d) 1/3

The relationship between height of the respondent and height of respondent’s father has been modeled according to both finite mixture of regression (MR) and finite mixture of regression with concomitant (MRC). The histogram of the X-variable height of respondent’s father does not show any cluster structure along the X-variable.

[Scatter plot of height vs. father's height (left); histogram of father's height (right).]

SLIDE 18

Mixture Models with covariates

Example A: Student data (cont’d) 2/3

[Scatter plots of height vs. father's height: data (left); MR fit (center); MRC fit (right).]

In practice, the two models give the same results.

SLIDE 19

Mixture Models with covariates

Example A: Student data (cont’d) 3/3

Confusion matrices (actual vs. predicted):

MR:  M: 112 M, 7 F;  F: 0 M, 151 F
MRC: M: 113 M, 6 F;  F: 0 M, 151 F

Misclassification error: 2.59% (MR), 2.22% (MRC). Adjusted Rand Index: 0.8986 (MR), 0.9127 (MRC).

Computations performed using the R package flexmix 2.3-8; see Grün and Leisch (2008)⁸. Best solutions over 100 runs.

⁸ Grün B., Leisch F. (2008). FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters, Journal of Statistical Software, 28(4), 1-35.

SLIDE 20

Mixture Models with covariates

Example B: Tourism data (cont’d - 1/5)

Cluster structure with respect to the covariate:

[Scatter plot of the Tourism data (left); histogram of tourist overnights (right).]

The data have been modeled without considering the time information (month labels)

SLIDE 21

Mixture Models with covariates

Example B: Tourism data (cont’d - 2/5)

[Scatter plots of the clustered data, units labeled by group: MR fit (left); MRC fit (right).]

Choice: G = 4, for economic reasons.

SLIDE 22

Mixture Models with covariates

Example B: Tourism data (cont’d - 3/5)

Data clustering (according to MAP): units labeled by month. Model: Mixture of Regressions with Concomitant (MRC).

Month-by-group cross-tabulation (15 observations per month):

Group | Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
  1   |  .   .   .   .   .  15   .   .  15   .   .   .
  2   |  .   1  12  15  15   .   .   .   .  15   .   .
  3   |  .   .   .   .   .   .  15  15   .   .   .   .
  4   | 15  14   3   .   .   .   .   .   .   .  15  15

The four groups identify the units almost perfectly from the time point of view. Group 1: units in June and September (early and late summer); Group 2: units in March, April, May and October (spring and early autumn); Group 3: units in July and August (summer); Group 4: units from November to February (late autumn and winter). Misclassification error: δ = 2.22%.

SLIDE 23

Mixture Models with covariates

Example D: Student data 2 (cont’d) 1/4

In the Student Data, consider the relationship between weight and height. Two groups: males and females (blue = males, red = females).

[Scatter plot of weight vs. height (left); histogram of weight (right).]

SLIDE 24

Mixture Models with covariates

Example D: Student data 2 (cont’d) 2/4

[Scatter plots of weight vs. height with fitted lines: MR model (left); MRC model (right).]

MR: δ = 46.30%, ARI = 0.0072. MRC: δ = 42.96%, ARI = 0.0105.

Question

Can we do better?

SLIDE 25

Cluster-Weighted Models: the original framework

Cluster Weighted Model (CWM) - 1/2

The Cluster-Weighted Model was proposed by Gershenfeld (1997)⁹ in the framework of machine learning, in particular for supervised learning of a probability density estimate of a joint set of input features and output target data. In the original setting, CWM was developed in the context of media technology under Gaussian assumptions, in order to build a digital violin with realistic sound, in the framework of non-linear time series. It can be interpreted as a flexible technique for nonlinear function fitting of an input-output relation (X, Y).

⁹ Gershenfeld N. (1997). Nonlinear inference and Cluster-Weighted Modeling, Annals of the New York Academy of Sciences, 808(1), 18–24.

SLIDE 26

Cluster-Weighted Models: the original framework

Cluster Weighted Model (CWM) - 2/2

The idea is to model the joint probability p(x, y) of (X, Y) as

p(x, y|ψ) = ∑_{g=1}^{G} p(x, y|ψg) πg = ∑_{g=1}^{G} f(y|x, θg) p(x|ξg) πg,

where f(y|x, θg) is the conditional density of Y|x in the group Ωg, p(x|ξg) is the density of X in the group Ωg, and πg is the mixing weight of Ωg.

Remarks

1. the joint density of (X, Y) can be viewed as a mixture of local models f(y|x; θg) weighted on both p(x; ξg) and πg;
2. the choice of the parametric families of the component densities f(y|x; θg) and p(x; ξg) yields a large family of models.

SLIDE 27

Cluster-Weighted Models: the original framework

Why "Cluster-Weighted" Model?

Consider the expected value of Y given x. Some algebra yields

E(Y|x) = ∫ y f(y|x, θ) dy
       = ∫ y [ ∑_{g=1}^{G} f(y|x, θg) p(x|ξg) πg ] / p(x|ξ) dy
       = ∑_{g=1}^{G} [ ∫ y f(y|x, θg) dy ] p(x|ξg) πg / ∑_{j=1}^{G} p(x|ξj) πj
       = ∑_{g=1}^{G} βg′x p(Ωg|x, ξ),

because

p(Ωg|x, ξ) = p(x|ξg) πg / ∑_{j=1}^{G} p(x|ξj) πj;

see Schöner (2000)¹⁰.

The result

E(Y|x) is computed as the sum of the local (linear) models βg′x weighted by the posterior probabilities of the components (clusters) Ωg, g = 1, . . . , G.

¹⁰ Schöner B. (2000). Probabilistic Characterization and Synthesis of Complex Data Driven Systems, Ph.D. Thesis, MIT.
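The posterior-weighted combination of local lines can be sketched numerically (pure Python, d = 1, Gaussian X-marginals; all parameter values below are illustrative):

```python
import math

def gaussian_pdf(z, mu, sigma):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def cwm_conditional_mean(x, comps):
    """E(Y|x) = sum_g (b0_g + b1_g x) * p(Omega_g | x), where the posterior
    weight of group g is p(x|xi_g) * pi_g normalised over the groups."""
    weights = [pi * gaussian_pdf(x, mu, sd) for pi, (mu, sd), _line in comps]
    total = sum(weights)
    return sum((w / total) * (b0 + b1 * x)
               for w, (_pi, _xdist, (b0, b1)) in zip(weights, comps))

# each component: (pi_g, (mu_g, sd_g) for X, (b0_g, b1_g) for the local line)
comps = [(0.5, (0.0, 1.0), (1.0, 2.0)),   # group 1: y = 1 + 2x, X around 0
         (0.5, (5.0, 1.0), (4.0, -1.0))]  # group 2: y = 4 - x,  X around 5
```

Near x = 0 the posterior weight of group 1 is essentially one, so E(Y|x) follows the line y = 1 + 2x there; near x = 5 it follows y = 4 − x. This is exactly the "cluster-weighted" behaviour of the slide's final formula.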

SLIDE 28

Cluster-Weighted Models: the original framework

CWM for model-based clustering

CWM is not new in statistics. In Hennig (2000)¹¹ this model is referred to as clusterwise linear regression with random covariates, and in Wedel (2002)¹² as a saturated mixture regression model. In Ingrassia, Minotti and Vittadini (2012)¹³, CWM has been developed in the framework of model-based clustering.

CWM emerges as a "competitor" of MR/MRC:

E(Y|x)_MR = ∑_{g=1}^{G} βg′x πg

E(Y|x)_MRC = ∑_{g=1}^{G} βg′x p(Ωg|x, w) = ∑_{g=1}^{G} βg′x exp(wg′x) / ∑_{h=1}^{G} exp(wh′x)

¹¹ Hennig C. (2000). Identifiability of models for clusterwise linear regression. Journal of Classification, 17, 273-296.
¹² Wedel M. (2002). Concomitant variables in finite mixture models. Statistica Neerlandica, 56(3), 362-375.
¹³ Ingrassia S., Minotti S.C., Vittadini G. (2012). Local Statistical Modeling via a Cluster-Weighted Approach with Elliptical Distributions, Journal of Classification, 29, 363-401.

SLIDE 29

CWM for model-based clustering

Relationship with MR

Proposition

If the probability density of X|Ωg does not depend on the group g, i.e. p(x|ξg) = p(x|ξ) for every g = 1, . . . , G, then

p(x, y|ψ) = p(x|ξ) ∑_{g=1}^{G} f(y|x, θg) πg,

where the sum is the MR density.

Corollary

Under the same assumptions, MR and CWM lead to the same posterior probabilities. We say that CWM contains MR.

SLIDE 30

CWM for model-based clustering

Relationship with MRC

Proposition

Assume that X|Ωg ∼ Nd(µg, Σg) for g = 1, . . . , G. If Σg = Σ and πg = π = 1/G for g = 1, . . . , G, then the density of the CWM can be written as

p(x, y|ψ) = p(x|ξ) ∑_{g=1}^{G} f(y|x, θg) p(Ωg|x, w),

for suitable wg ∈ Rd+1, g = 1, . . . , G, where the sum is the MRC density and p(x|ξ) = (1/G) ∑_{g=1}^{G} φd(x; µg, Σ).

Corollary

Under the same assumptions, MRC and CWM lead to the same posterior probabilities. Thus we say that CWM contains MRC.

SLIDE 31

Gaussian and Student-t CWM

The linear Gaussian CWM

In the traditional framework, both the marginal and the conditional densities are assumed to be Gaussian, with X|Ωg ∼ Nd(µg, Σg) and Y|x, Ωg ∼ N(βg′x, σ²ε,g), so that p(x|ξg) = φd(x; µg, Σg) and p(y|x, Ωg) = φ(y; βg′x, σ²ε,g). Thus we get:

p(x, y|ψ) = ∑_{g=1}^{G} φ(y; βg′x, σ²ε,g) φd(x; µg, Σg) πg.
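A minimal sketch of this joint density for d = 1 (pure Python; the component parameters are illustrative). With a single component the mixture reduces to the product of the two Gaussian factors, which gives a cheap sanity check:

```python
import math

def phi(z, mu, sigma):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def linear_gaussian_cwm_pdf(x, y, comps):
    """p(x, y) = sum_g phi(y; b0_g + b1_g x, s_g^2) phi(x; mu_g, sigma_g^2) pi_g."""
    return sum(pi * phi(y, b0 + b1 * x, s) * phi(x, mu, sigma)
               for pi, (mu, sigma), (b0, b1, s) in comps)

# each component: (pi_g, (mu_g, sigma_g) for X, (b0_g, b1_g, s_g) for Y|x)
comps = [(0.6, (0.0, 1.0), (1.0, 2.0, 0.5)),
         (0.4, (4.0, 1.5), (3.0, -1.0, 0.7))]
value = linear_gaussian_cwm_pdf(0.5, 2.0, comps)
```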

SLIDE 32

Gaussian and Student-t CWM

Example D: Student data 2 (cont'd) 3/4

[Scatter plots of weight vs. height: real data (left); CWM fit (right).]

CWM: δ = 5.93%, ARI = 0.7761.

SLIDE 33

Gaussian and Student-t CWM

Example D: Student data 2 (cont’d) 4/4

In summary:

[Scatter plots of weight vs. height: real data, CWM, MR and MRC fits.]

Why do we have such differences?

SLIDE 34

Gaussian and Student-t CWM

Data modeling via Student-t distributions

Data modeling via the Student-t distribution has been proposed in the literature in order to provide more robust fitting for groups of observations with longer-than-normal tails or atypical observations. A q-variate random vector Z has a multivariate t distribution with degrees of freedom ν ∈ (0, ∞), location parameter µ ∈ Rq and q × q positive definite inner product matrix Σ if its density is given by

tq(z; µ, Σ, ν) = Γ((ν + q)/2) ν^{ν/2} / { Γ(ν/2) |πΣ|^{1/2} [ν + δ(z, µ; Σ)]^{(ν+q)/2} },

where δ(z, µ; Σ) = (z − µ)′Σ⁻¹(z − µ) denotes the squared Mahalanobis distance between z and µ with respect to the matrix Σ, and Γ(·) is the Gamma function. We write Z ∼ tq(µ, Σ, ν). Moreover, mixtures of multivariate Student-t distributions (FMT) have density

p(z) = ∑_{g=1}^{G} tq(z; µg, Σg, νg) πg.
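The density above can be checked numerically in the q = 1 case (pure Python; for ν = 1 and unit scale it reduces to the Cauchy density 1/[π(1 + z²)]):

```python
import math

def t_pdf(z, mu, sigma2, nu):
    """Student-t density with location mu, scale sigma2 and df nu: the q = 1
    case of t_q(z; mu, Sigma, nu), with delta the squared Mahalanobis distance."""
    delta = (z - mu) ** 2 / sigma2
    return (math.gamma((nu + 1) / 2) * nu ** (nu / 2)
            / (math.gamma(nu / 2) * math.sqrt(math.pi * sigma2)
               * (nu + delta) ** ((nu + 1) / 2)))
```

The density is symmetric about µ and has heavier tails than the Gaussian, which is exactly the robustness property exploited by the Student-t CWM on the next slide.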

SLIDE 35

Gaussian and Student-t CWM

Student-t CWM: definition

As above, Student-t components provide more robust fitting for groups of observations with longer-than-normal tails or atypical observations. Let us assume:

X|Ωg ∼ td(µg, Σg, νg) and Y|x, Ωg ∼ t(βg′x, σ²ε,g, ζg),

where X|Ωg has a multivariate t distribution with location parameter µg, inner product matrix Σg and degrees of freedom νg, and Y|x, Ωg has a t distribution with location parameter βg′x, scale parameter σ²ε,g and degrees of freedom ζg. Thus the Student-t CWM is defined as:

p(x, y|ψ) = ∑_{g=1}^{G} t(y; βg′x, σ²ε,g, ζg) td(x; µg, Σg, νg) πg.

SLIDE 36

Gaussian and Student-t CWM

Example B: Tourism data (cont'd - 4/5)

The data have been modeled according to CWM. The BIC values of the different models can be represented in an mclust-type plot.

[BIC plot over G = 1, . . . , 6 for the models NN-VE, NN-EV, NN-VV, tN-VE, tN-EV, tN-VV, Nt-VE, Nt-EV, Nt-VV, tt-VE, tt-EV, tt-VV (left); scatter plot of the CWM clustering (right).]

We selected the NN-VV model with G = 4 groups.

SLIDE 37

Gaussian and Student-t CWM

Example B: Tourism data (cont'd - 5/5)

Data clustering by month (15 observations per month):

Group | Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
  1   |  .   .   .   .   .  15   .   .  15   .   .   .
  2   |  .   .  15  15  15   .   .   .   .  15   2   .
  3   | 15  15   .   .   .   .   .   .   .   .  13  15
  4   |  .   .   .   .   .   .  15  15   .   .   .   .

The four groups identify the units almost perfectly from the time point of view. Group 1: units in June and September (early and late summer); Group 2: units in March, April, May and October (spring and early autumn); Group 3: units from November to February (late autumn and winter); Group 4: units in July and August (summer). Misclassification error rate: δ = 1.11% (MRC: δ = 2.22%).

SLIDE 38

Gaussian and Student-t CWM

Example C: Star data (cont’d - 2/2)

Data have been modeled according to CWM.

[Scatter plots of log RHK vs. g_p^−1 with the CWM fit (left and right panels).]

SLIDE 39

Gaussian and Student-t CWM

Example D: Simdata1 1/2

Data classification according to the three models: MR, MRC, CWM.

[Scatter plots of y vs. x: data, MR, MRC and CWM classifications.]

Question (in other words)

Where do these three groupings come from?

SLIDE 40

Decision boundaries

Decision boundaries 1/2

Relationships among MR, MRC and CWM have been investigated in Ingrassia et al. (2015). More insight can be gained through a geometrical analysis¹⁴. Consider the posterior probabilities (with all parameters replaced by their estimates; s²ε,g denotes the estimate of σ²ε,g):

MR : τg(x, y|ψ) = φ(y; βg′x, s²ε,g) πg / ∑_{j=1}^{G} φ(y; βj′x, s²ε,j) πj

MRC : τg(x, y|ψ) = φ(y; βg′x, s²ε,g) exp(wg′x) / ∑_{j=1}^{G} φ(y; βj′x, s²ε,j) exp(wj′x)

CWM : τg(x, y|ψ) = φ(y; βg′x, s²ε,g) φd(x; µg, Σg) πg / ∑_{j=1}^{G} φ(y; βj′x, s²ε,j) φd(x; µj, Σj) πj

¹⁴ Ingrassia S., Punzo A. (2015). Decision boundaries for mixtures of regressions, Journal of the Korean Statistical Society, forthcoming.

SLIDE 41

Decision boundaries

Decision boundaries 2/2

Consider two groups Ω1 and Ω2 = Ω1ᶜ. The set

{(x, y) ∈ Rd+1 : τg(x, y|ψ) = 1/2}

is the decision boundary between Ω1 and Ω2.

In Gaussian models, the decision boundaries of MR/MRC/CWM belong to the family of quadrics, characterized by the equation

z′Az + b′z + c = 0,

where z = (x′, y)′, A is a (d + 1) × (d + 1) symmetric matrix, b ∈ Rd+1 and c ∈ R. For d = 2, examples of quadrics are spheres, circular cylinders, elliptic paraboloids, etc. For d = 1, quadrics reduce to conics: ellipses, parabolas, hyperbolas.

SLIDE 42

Decision boundaries

Geometric analysis of decision boundaries (d = 1) 1/5

When d = 1, decision boundaries are conics (ellipses, parabolas, hyperbolas), characterized by an equation of the type

a11x² + a22y² + 2a12xy + 2a13x + 2a23y + a33 = 0.

Consider the symmetric matrices

A = [ a11 a12 a13 ; a12 a22 a23 ; a13 a23 a33 ],  A33 = [ a11 a12 ; a12 a22 ].

If |A| ≠ 0 then the conic is not degenerate, and |A33| identifies the conic:
- if |A33| > 0 the decision boundary is an ellipse,
- if |A33| = 0 the decision boundary is a parabola,
- if |A33| < 0 the decision boundary is a hyperbola.
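This classification rule translates directly into code (pure Python; the numerical tolerance is an implementation detail, not from the slide):

```python
def classify_conic(a11, a22, a12, a13, a23, a33, tol=1e-12):
    """Classify the conic a11 x^2 + a22 y^2 + 2 a12 xy + 2 a13 x + 2 a23 y + a33 = 0
    from det(A) and det(A33), following the rule on this slide."""
    # determinant of the full 3x3 symmetric matrix A (cofactor expansion)
    det_A = (a11 * (a22 * a33 - a23 * a23)
             - a12 * (a12 * a33 - a23 * a13)
             + a13 * (a12 * a23 - a22 * a13))
    det_A33 = a11 * a22 - a12 * a12
    if abs(det_A) < tol:
        return "degenerate"
    if det_A33 > tol:
        return "ellipse"
    if det_A33 < -tol:
        return "hyperbola"
    return "parabola"

# x^2 + y^2 - 1 = 0 is a circle, i.e. a special ellipse
kind = classify_conic(1, 1, 0, 0, 0, -1)
```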

SLIDE 43

Decision boundaries

Geometric analysis of decision boundaries (d = 1) 2/5

Coefficients of the decision-boundary conic for MR, MRC and CWM (β_g0, β_g1: intercept and slope of component g; s²ε,g: regression-error variance; w_g0, w_g1: concomitant-variable parameters of MRC; µ_g, σ²_g: mean and variance of X|Ωg; π_g: mixing weight):

a11
  MR:  β²21/(2s²ε,2) − β²11/(2s²ε,1)
  MRC: β²21/(2s²ε,2) − β²11/(2s²ε,1)
  CWM: β²21/(2s²ε,2) − β²11/(2s²ε,1) + 1/(2σ²2) − 1/(2σ²1)

a12 (MR, MRC, CWM)
  β11/(2s²ε,1) − β21/(2s²ε,2)

a22 (MR, MRC, CWM)
  1/(2s²ε,2) − 1/(2s²ε,1)

a13
  MR:  β20β21/(2s²ε,2) − β10β11/(2s²ε,1)
  MRC: β20β21/(2s²ε,2) − β10β11/(2s²ε,1) + (w11 − w21)/2
  CWM: β20β21/(2s²ε,2) − β10β11/(2s²ε,1) + µ1/(2σ²1) − µ2/(2σ²2)

a23 (MR, MRC, CWM)
  β10/(2s²ε,1) − β20/(2s²ε,2)

a33
  MR:  β²20/(2s²ε,2) − β²10/(2s²ε,1) + ln(π1/π2)
  MRC: β²20/(2s²ε,2) − β²10/(2s²ε,1) + w10 − w20
  CWM: β²20/(2s²ε,2) − β²10/(2s²ε,1) + ln(π1/π2) + µ²2/(2σ²2) − µ²1/(2σ²1)
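As an illustration, the MR column of the table can be assembled numerically and combined with the conic-classification rule of the previous slide; the helper and its parameter values below are illustrative, not from the slides:

```python
import numpy as np

def mr_conic_matrix(b10, b11, s1, b20, b21, s2, pi1, pi2):
    """3x3 matrix A of the MR decision boundary between two components
    y = b_g0 + b_g1 * x + eps_g, eps_g ~ N(0, s_g^2), mixing weights pi_g.
    Entries follow the MR column of the coefficient table."""
    v1, v2 = s1 ** 2, s2 ** 2
    a11 = b21 ** 2 / (2 * v2) - b11 ** 2 / (2 * v1)
    a12 = b11 / (2 * v1) - b21 / (2 * v2)
    a22 = 1 / (2 * v2) - 1 / (2 * v1)
    a13 = b20 * b21 / (2 * v2) - b10 * b11 / (2 * v1)
    a23 = b10 / (2 * v1) - b20 / (2 * v2)
    a33 = b20 ** 2 / (2 * v2) - b10 ** 2 / (2 * v1) + np.log(pi1 / pi2)
    return np.array([[a11, a12, a13],
                     [a12, a22, a23],
                     [a13, a23, a33]])

# Two components with different slopes (b11 != b21) and equal variances:
A = mr_conic_matrix(0.0, 1.0, 1.0, 2.0, -1.0, 1.0, 0.5, 0.5)
print(np.linalg.det(A[:2, :2]))  # negative -> the boundary is a hyperbola
```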

slide-44
SLIDE 44

Decision boundaries

Geometric analysis of decision boundaries (d = 1) 3/5

Proposition

Let (x1, y1), . . . , (xN, yN) be a sample drawn from a population Ω = Ω1 ∪ Ω2 and assume that the conditional distribution of Y|x follows (Gaussian) MR. If β11 ≠ β21, then the decision boundary between Ω1 and Ω2 is a hyperbola.

  • Proof. The determinants of the matrices A and A33 are given by

|A| = −[ s²ε,2 (β²10 − β²20) + s²ε,1 ( 2 s²ε,2 ln(π1/π2) + β²10 − β²20 ) ] (β11 − β21)² / (8 s⁴ε,1 s⁴ε,2),

|A33| = −(β11 − β21)² / (4 s²ε,1 s²ε,2).

For β11 ≠ β21 it results |A| ≠ 0 and |A33| < 0, yielding hyperbolas. The case β11 = β21 occurs with probability equal to zero and yields |A| = 0, so that the decision boundary is a degenerate conic (straight lines).


slide-45
SLIDE 45

Decision boundaries

Geometric analysis of decision boundaries (d = 1) 4/5

Proposition

Let (x1, y1), . . . , (xN, yN) be a sample drawn from a population Ω = Ω1 ∪ Ω2 and assume that the conditional distribution of Y|x follows (Gaussian) MRC. If β11 ≠ β21, then the decision boundary between Ω1 and Ω2 is a hyperbola; otherwise it is a parabola.

  • Proof. Compute the determinants of the matrices A and A33. In particular, we get

|A| ≠ 0 and |A33| = −(β11 − β21)² / (4 s²ε,1 s²ε,2).

Thus, for β11 ≠ β21 it results |A33| < 0, yielding hyperbolas. The case β11 = β21 occurs with probability equal to zero (the decision boundary is then a parabola).


slide-46
SLIDE 46

Decision boundaries

Geometric analysis of decision boundaries (d = 1) 5/5

Proposition

Let (x1, y1), . . . , (xN, yN) be a sample drawn from a population Ω = Ω1 ∪ Ω2 and assume that the joint distribution of (X, Y) follows (Gaussian) CWM. If σ²1 ≠ σ²2 and (β11 − β21)² ≠ (s²ε,1 − s²ε,2)(σ²1 − σ²2)/(σ²1 σ²2), then the decision boundary between Ω1 and Ω2 is either a hyperbola or an ellipse; otherwise it is a parabola.

  • Proof. Compute the determinants of the matrices A and A33. In particular, we get

|A| ≠ 0 and |A33| = −(β11 − β21)² / (4 s²ε,1 s²ε,2) + (s²ε,1 − s²ε,2)(σ²1 − σ²2) / (4 s²ε,1 s²ε,2 σ²1 σ²2).

Thus, for (β11 − β21)² ≠ (s²ε,1 − s²ε,2)(σ²1 − σ²2)/(σ²1 σ²2) it results |A33| ≷ 0, yielding either hyperbolas or ellipses. The case (β11 − β21)² = (s²ε,1 − s²ε,2)(σ²1 − σ²2)/(σ²1 σ²2) occurs with probability equal to zero (the decision boundary is then a parabola).
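The expression for |A33| in the proof can be evaluated directly to see which conic arises for given parameters; the function name and the parameter values below are illustrative:

```python
def cwm_detA33(b11, b21, s2e1, s2e2, sig2_1, sig2_2):
    """|A33| for the Gaussian CWM decision boundary (d = 1), as in the proof:
    -(b11 - b21)^2 / (4 s2e1 s2e2)
    + (s2e1 - s2e2)(sig2_1 - sig2_2) / (4 s2e1 s2e2 sig2_1 sig2_2)."""
    return (-(b11 - b21) ** 2 / (4 * s2e1 * s2e2)
            + (s2e1 - s2e2) * (sig2_1 - sig2_2)
            / (4 * s2e1 * s2e2 * sig2_1 * sig2_2))

# Equal slopes but different error/X variances: |A33| > 0 -> ellipse
print(cwm_detA33(1.0, 1.0, 2.0, 0.5, 3.0, 0.5))
# Very different slopes dominate: |A33| < 0 -> hyperbola
print(cwm_detA33(3.0, -3.0, 2.0, 0.5, 3.0, 0.5))
```

Note that the second term is exactly what MR and MRC lack: without it, |A33| ≤ 0 and ellipses can never occur.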


slide-47
SLIDE 47

Decision boundaries

Example D: Simdata2 (cont’d) 2/2

MR MRC CWM

[Scatter plots of Simdata2 in the (x, y) plane with the fitted decision boundaries for MR, MRC and CWM.]

η = 30% η = 4% η = 2.67%


slide-48
SLIDE 48

Decision boundaries

Geometric analysis of decision boundaries (d = 2) 1/6

For d = 2, decision boundaries are characterized by the equation

a11 x1² + a22 x2² + a33 y² + 2a12 x1x2 + 2a13 x1y + 2a23 x2y + 2a14 x1 + 2a24 x2 + 2a34 y + a44 = 0.


slide-49
SLIDE 49

Decision boundaries

Geometric analysis of decision boundaries (d = 2) 2/6

Consider the symmetric matrices

A = | a11 a12 a13 a14 |
    | a12 a22 a23 a24 |
    | a13 a23 a33 a34 |
    | a14 a24 a34 a44 |

A44 = | a11 a12 a13 |
      | a12 a22 a23 |
      | a13 a23 a33 |

The shape of the decision boundary (i.e. the type of quadric) depends on: the sign of |A|, the sign of |A44|, and the signs of the eigenvalues of A44 (when |A| ≠ 0).
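A minimal sketch of this classification (labels and thresholds are illustrative simplifications, not from the slides):

```python
import numpy as np

def classify_quadric(A, tol=1e-10):
    """Rough classification of the quadric z'Az = 0 in homogeneous 4x4 form (d = 2),
    using |A44|, the eigenvalue signs of A44 and the sign of |A|."""
    A = np.asarray(A, dtype=float)
    A44 = A[:3, :3]
    if abs(np.linalg.det(A44)) < tol:
        return "paraboloid or cylinder (|A44| = 0)"
    eig = np.linalg.eigvalsh(A44)
    if np.all(eig > tol) or np.all(eig < -tol):
        return "ellipsoid (possibly imaginary)"
    # Mixed eigenvalue signs: a hyperboloid; |A| > 0 gives one sheet, |A| < 0 two.
    return ("hyperboloid of one sheet" if np.linalg.det(A) > 0
            else "hyperboloid of two sheets")

# Unit sphere x1^2 + x2^2 + y^2 - 1 = 0
A_sphere = np.diag([1.0, 1.0, 1.0, -1.0])
print(classify_quadric(A_sphere))  # ellipsoid (possibly imaginary)
# Hyperboloid of one sheet x1^2 + x2^2 - y^2 - 1 = 0
A_hyp1 = np.diag([1.0, 1.0, -1.0, -1.0])
print(classify_quadric(A_hyp1))    # hyperboloid of one sheet
```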


slide-50
SLIDE 50

Decision boundaries

Geometric analysis of decision boundaries (d = 2) 3/6

Proposition

Let (x1, y1), . . . , (xN, yN) be a sample drawn from a population Ω = Ω1 ∪ Ω2 and assume that the conditional distribution of Y|x follows (Gaussian) MR. If β11/β12 ≠ β21/β22, then the decision boundary between Ω1 and Ω2 is a hyperbolic paraboloid; otherwise it is a cylinder (degenerate quadric).

  • Proof. For MR we have:

|A| = (s²ε,2 β10 − s²ε,1 β20)² (β12 β21 − β11 β22)² / (4 s⁶ε,1 s⁶ε,2) and |A44| = 0.

If β11/β12 ≠ β21/β22 then |A| ≠ 0, and the decision boundaries are hyperbolic paraboloids. If β11/β12 = β21/β22 then |A| = 0, and the decision boundaries are cylinders (degenerate quadrics).


slide-51
SLIDE 51

Decision boundaries

Geometric analysis of decision boundaries (d = 2) 4/6

Proposition

Let (x1, y1), . . . , (xN, yN) be a sample drawn from a population Ω = Ω1 ∪ Ω2 and assume that the conditional distribution of Y|x follows MRC. Then the decision boundary between Ω1 and Ω2 is a hyperbolic paraboloid.

  • Proof. For MRC we have:

|A| = [ 2(s²ε,1 β20 − s²ε,2 β10)(β11 β22 − β12 β21) + s²ε,1 s²ε,2 ( (β22 − β12)(w11 − w21) + (β11 − β21)(w12 − w22) ) ]² / (16 s⁶ε,1 s⁶ε,2)

and |A44| = 0. In general |A| > 0, and then the decision boundaries are hyperbolic paraboloids.


slide-52
SLIDE 52

Decision boundaries

Geometric analysis of decision boundaries (d = 2) 5/6

Proposition

Let (x1, y1), . . . , (xN, yN) be a sample drawn from a population Ω = Ω1 ∪ Ω2 and assume that the joint distribution of (X, Y) follows CWM. If Σ1 ≠ Σ2, the decision boundary between Ω1 and Ω2 is either a hyperboloid of one sheet or an ellipsoid. If Σ1 = Σ2, the decision boundary between Ω1 and Ω2 is a hyperbolic paraboloid.

  • Proof. For CWM we have in general |A| ≠ 0.

If Σ1 ≠ Σ2, then it results |A44| ≠ 0 and the decision boundaries are hyperboloids of one sheet or ellipsoids.

If Σ1 = Σ2, then it results |A44| = 0 and the decision boundaries are hyperbolic paraboloids.


slide-53
SLIDE 53

Decision boundaries

Geometric analysis of decision boundaries (d = 2) 6/6

MR/MRC: CWM with Σ1 ≠ Σ2: CWM with Σ1 = Σ2:


slide-54
SLIDE 54

Decision boundaries

Geometric analysis of decision boundaries

No characterization of quadrics exists in more than three dimensions. In terms of number of parameters, MR < MRC < CWM. From a geometrical point of view, the more complex the model is, the more flexible the family of decision boundaries is.


slide-55
SLIDE 55

Decision boundaries

Geometric analysis of decision boundaries - MR

Student-t models do not yield quadrics, so the analysis is carried out via simulation studies. Degrees of freedom of X|Ωg: ν1 = ν2 = 50. Plots for different degrees of freedom of Y|x, Ωg, with ζ1 = ζ2:

[Plots for ζ1 = ζ2 = 3, ζ1 = ζ2 = 10, ζ1 = ζ2 = 30.] Dotted lines refer to the decision boundaries of Gaussian models with the same parameters (except the degrees of freedom).


slide-56
SLIDE 56

Decision boundaries

Geometric analysis of decision boundaries - CWM

Degrees of freedom of X|Ωg: ν1 = ν2 = 50. Plots for different degrees of freedom of Y|x, Ωg, with ζ1 = ζ2:

[Plots for ζ1 = ζ2 = 3, ζ1 = ζ2 = 30, ζ1 = ζ2 = 130.] Dotted lines refer to the decision boundaries of Gaussian models with the same parameters (except the degrees of freedom).

Why?

Unlike the Gaussian case, if X|Ωg and Y|x, Ωg are t-distributed, the joint distribution of (X, Y) is in general not t.


slide-57
SLIDE 57

Generalized Cluster-Weighted Models

Generalized Linear Gaussian CWM

Mixtures of GLMs can be extended in the cluster-weighted framework, yielding the model

p(x, y|ψ) = Σ_{g=1}^{G} f(y|x, βg, λg) φd(x; µg, Σg) πg,

which will be referred to as the generalized linear Gaussian CWM, see Ingrassia et al. (2015)15. Here y ∈ Y ⊆ R. For the sake of simplicity, we assume Gaussian marginals, i.e. X|Ωg ∼ Nd(µg, Σg) (but Student-t can be assumed as well). Discrete response variables Y can be modeled according to the Binomial and Poisson distributions. Such models will be referred to as the Binomial Gaussian CWM (Y = {0, 1, . . . , M}) and the Poisson Gaussian CWM (Y = N).

15Ingrassia S., Punzo A., Vittadini G., Minotti S.C. (2015), The Generalized Linear Mixed

Cluster-Weighted Model, Journal of Classification, 32, n.1, 85-113.
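A minimal sketch of the Poisson Gaussian CWM density for d = 1 with a log link, evaluated directly from its definition; the function name and parameter values are illustrative:

```python
from math import exp, lgamma, log, pi, sqrt

def poisson_gaussian_cwm_density(x, y, params):
    """p(x, y) = sum_g pi_g * Pois(y; lambda_g(x)) * N(x; mu_g, sigma_g^2), d = 1,
    with log link lambda_g(x) = exp(b_g0 + b_g1 * x)."""
    total = 0.0
    for (pi_g, b0, b1, mu, sigma) in params:
        lam = exp(b0 + b1 * x)                          # Poisson rate via log link
        log_pois = y * log(lam) - lam - lgamma(y + 1)   # log Pois(y; lam)
        log_norm = -0.5 * ((x - mu) / sigma) ** 2 - log(sigma * sqrt(2 * pi))
        total += pi_g * exp(log_pois + log_norm)
    return total

# Each tuple: (pi_g, beta_g0, beta_g1, mu_g, sigma_g) -- illustrative values
params = [(0.6, 0.2, 0.5, -1.0, 1.0),
          (0.4, 1.0, -0.3, 2.0, 0.5)]
print(poisson_gaussian_cwm_density(0.0, 1, params))
```

Summing this density over y recovers the Gaussian mixture marginal of X, which is a quick sanity check on the implementation.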


slide-58
SLIDE 58

More recent developments

CWMs for high-dimensional X-spaces

In order to extend the CWM to high-dimensional X-spaces, we assume a latent Gaussian factor structure for X, with q ≪ d factors in each mixture component, which leads to a factor regression model of Y on x. This yields the linear Gaussian Cluster-Weighted Factor Analyzers (CWFA) model16

p(x, y|θ) = Σ_{g=1}^{G} φ(y|x; βg, σ²g) φd(x; µg, Σg) πg,

where Σg = Λg Λ′g + Ψg and θ denotes the vector of all model parameters.

This model has been further extended in order to incorporate common/uncommon t-factor analyzers for the covariates, and a t-distribution for the response variable in each mixture component, see Subedi et al. (2015)17

16Subedi S., Punzo A., Ingrassia S., McNicholas P.D. (2013). Clustering and Classification via Cluster-Weighted Factor Analyzers, Advances in Data Analysis and Classification, 7, n.1, 5-40.

17Subedi S., Punzo A., Ingrassia S., McNicholas P.D. (2015). Cluster-Weighted t-Factor Analyzers for Robust Model-Based Clustering and Dimension Reduction, Statistical Methods and Applications, 24, n.4, 623-649.
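A short sketch of the factor-analytic covariance structure Σg = ΛgΛ′g + Ψg and the parameter savings it brings; the dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, q = 6, 2                                    # observed dimension, latent factors

Lambda = rng.normal(size=(d, q))               # factor loadings Lambda_g
Psi = np.diag(rng.uniform(0.1, 0.5, size=d))   # diagonal noise covariance Psi_g

# Component covariance with the factor structure Sigma_g = Lambda Lambda' + Psi
Sigma = Lambda @ Lambda.T + Psi

# Free covariance parameters: d*q + d under the factor structure
# versus d*(d+1)/2 for an unconstrained symmetric matrix.
n_factor = d * q + d
n_full = d * (d + 1) // 2
print(n_factor, n_full)  # 18 vs 21; the savings grow quickly with d
```

Since ΛΛ′ is positive semi-definite and Ψ has strictly positive diagonal entries, Σg is guaranteed to be a valid (positive-definite) covariance matrix.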


slide-59
SLIDE 59

More recent developments

Robust CWM 1/2

Current approaches to robust MR through trimming are based on a two-step procedure (along both X and Y), see García-Escudero et al. (2010)18, which extends the TCLUST methodology proposed in García-Escudero et al. (2008)19 to the context of robust MR:

1. In the first step, data are trimmed in order to avoid the effect of outliers in Y: the proportion 1 − α1 of observations which attain the highest values of maxg=1,...,G{f(yn|xn, βg, σg)πg} is retained;

2. in the second step, a second trimming of size α2 is applied, considering only the covariate values of the observations surviving the first trimming, in order to diminish the effect of leverage points.

18García-Escudero L.A., Gordaliza A., Mayo-Iscar A., San Martín R. (2010). Robust clusterwise linear regression through trimming, Computational Statistics and Data Analysis, 54, 3057-3069.

19García-Escudero L.A., Gordaliza A., Matrán C., Mayo-Iscar A. (2008). A general trimming approach to robust cluster analysis, Annals of Statistics, 36, 1324-1345.
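The two trimming steps can be sketched as follows; the scores and the leverage measure are placeholders for the actual model-based quantities, so this is only a structural illustration:

```python
import numpy as np

def two_step_trim(x, y, resp_scores, alpha1=0.1, alpha2=0.05):
    """Sketch of the two-step trimming: first drop the alpha1 fraction with the
    lowest scores max_g pi_g f(y_n | x_n, ...) (outliers in Y), then drop the
    alpha2 fraction of survivors with the most extreme covariates (leverage
    points).  Returns the indices of the retained observations."""
    n = len(y)
    order = np.argsort(resp_scores)                      # ascending scores
    keep1 = order[int(np.floor(alpha1 * n)):]            # step 1: drop worst-fit points
    lev = np.abs(x[keep1] - np.median(x[keep1]))         # crude leverage measure
    n_drop2 = int(np.floor(alpha2 * len(keep1)))
    keep2 = keep1[np.argsort(lev)[:len(keep1) - n_drop2]]  # step 2: drop extremes in X
    return np.sort(keep2)

x = np.array([0.0, 0.1, 0.2, 0.3, 9.0])   # last point: leverage point in X
y = np.array([1.0, 1.1, 1.2, 8.0, 1.4])   # fourth point: outlier in Y
scores = np.array([0.9, 0.8, 0.85, 0.01, 0.7])
print(two_step_trim(x, y, scores, alpha1=0.2, alpha2=0.25))
# -> [0 1 2]: the Y-outlier and the leverage point are both trimmed
```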


slide-60
SLIDE 60

More recent developments

Robust CWM 2/2

The CWM provides a natural framework to approach such problems through a single "global" model. In García-Escudero et al. (2016)20, constraints on the estimation of the Gaussian CWM are introduced to avoid not only the singularities of the objective function, but also the appearance of spurious solutions. The trimmed CWM methodology is based on the maximization of the log-likelihood

Σ_{n=1}^{N} z(yn, xn) log [ Σ_{g=1}^{G} φ(yn; β′g xn, σ²g) φd(xn; µg, Σg) πg ],

where z(·, ·) is a 0-1 trimming indicator function that indicates whether observation (yn, xn) is trimmed off (z(yn, xn) = 0) or not (z(yn, xn) = 1). A fixed fraction α of observations is allowed to remain unassigned by setting Σ_{n=1}^{N} z(yn, xn) = [N(1 − α)], where the parameter α denotes the trimming level.

20García-Escudero L.A., Gordaliza A., Greselin F., Ingrassia S., Mayo-Iscar A. (2016). Robust estimation of mixtures of regressions with random covariates, via trimming and constraints, Statistics and Computing, DOI 10.1007/s11222-016-9628-3, forthcoming.
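A sketch of the trimmed CWM log-likelihood for d = 1 with a fixed trimming indicator z; the component parameters are illustrative:

```python
from math import exp, log, pi, sqrt

def trimmed_loglik(x, y, z, comps):
    """Trimmed CWM log-likelihood (d = 1): sum over untrimmed observations of
    log sum_g pi_g * N(y; b0 + b1*x, s2) * N(x; mu, sig2).
    comps: list of (pi_g, b0, b1, s2, mu, sig2); z: 0-1 trimming indicator."""
    def npdf(v, m, var):
        return exp(-0.5 * (v - m) ** 2 / var) / sqrt(2 * pi * var)
    ll = 0.0
    for xn, yn, zn in zip(x, y, z):
        if zn == 0:
            continue                     # trimmed observation: no contribution
        ll += log(sum(pg * npdf(yn, b0 + b1 * xn, s2) * npdf(xn, mu, sg2)
                      for (pg, b0, b1, s2, mu, sg2) in comps))
    return ll

comps = [(0.5, 0.0, 1.0, 0.5, -1.0, 1.0),
         (0.5, 2.0, -1.0, 0.5, 2.0, 1.0)]
x = [-1.0, 2.1, 8.0]
y = [-1.0, -0.1, 0.0]
z = [1, 1, 0]   # trim the outlying third point; sum(z) = [N(1 - alpha)], alpha = 1/3
print(trimmed_loglik(x, y, z, comps))
```

Trimming the gross outlier removes a strongly negative log term, so the trimmed log-likelihood exceeds the untrimmed one on this toy sample.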


slide-61
SLIDE 61

More recent developments

Unsupervised Learning of Finite Mixtures with Covariates from Incomplete Data

Unsupervised learning of statistical models from incomplete data is a relevant topic in many practical problems, because real-world data quite often involve missing values. New research focuses on learning the relationship between a response variable and covariates from incomplete data coming from a heterogeneous population. The problem is approached by means of mixtures of regressions with random covariates, also referred to in the literature as cluster-weighted models21. To this end, an EM algorithm is proposed in the likelihood framework under the assumption of data missing at random. The performance of this approach is analysed through a large simulation study.

21Ingrassia S., Murray P.M., McNicholas P.D. (2016). Unsupervised Learning of Finite Mixtures with Covariates from Incomplete Data, submitted for publication.


slide-62
SLIDE 62

More recent developments

Thank you for your attention
