Recent results in model-based clustering via the cluster-weighted approach


SLIDE 1

Recent results in model-based clustering via the cluster-weighted approach

Salvatore Ingrassia

Department of Economics and Business University of Catania (Italy) s.ingrassia@unict.it

The National Institute for Astrophysics Catania Astrophysical Observatory 17 February 2016

Salvatore Ingrassia (University of Catania) Cluster Weighted Models CT Astrophysical Observatory 17/02/16 1 / 62

SLIDE 2

Outline

1. Mixture Modeling
2. Mixture Models with covariates
3. Cluster-Weighted Models: the original framework
4. CWM for model-based clustering
5. Gaussian and Student-t CWM
6. Decision boundaries
7. Generalized Cluster-Weighted Models
8. More recent developments

SLIDE 3

Mixture Modeling

Mixture modeling

Finite mixture models provide a flexible approach to the statistical modeling of a wide variety of random phenomena characterized by unobserved heterogeneity. Assume that a given population Ω can be partitioned into G disjoint subsets, i.e. Ω = Ω1 ∪ · · · ∪ ΩG. We aim at identifying the underlying groups and estimating the parameters of the group-conditional densities. Two main cases:

1. finite mixtures of distributions (FMD);
2a. finite mixtures of regression models (FMR), also known as mixture-of-experts models in machine learning, switching regression models in econometrics, latent class regression models in marketing, and mixed models in biology;
2b. finite mixtures of regression models with concomitant variables (FMRC).

SLIDE 4

Mixture Modeling

Mixtures of Distributions (FMD) 1/3

Let Z be a random vector defined on Ω with values in some space Z ⊆ Rd, and denote by p(z) the probability density function (pdf) of Z. Assume that Ω can be partitioned into G disjoint subsets, i.e. Ω = Ω1 ∪ · · · ∪ ΩG. We say that the density of Z is a finite mixture of distributions (FMD) if p(z) can be written in the form

p(z) = ∑_{g=1}^{G} p(z|Ωg) πg,

where p(z|Ωg) is the pdf of Z|Ωg and πg = p(Ωg) is the mixing weight of Ωg, for g = 1, . . . , G. Quite often, one considers mixtures of multivariate Gaussians (FMG), with Z|Ωg ∼ Nd(µg, Σg) for g = 1, . . . , G:

p(z) = ∑_{g=1}^{G} p(z|Ωg) πg = ∑_{g=1}^{G} φd(z; µg, Σg) πg,

where φd(z; µg, Σg) denotes the pdf of the multivariate Gaussian distribution with mean vector µg and covariance matrix Σg.
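As a concrete numerical sketch of the FMD density (univariate components, pure Python; the two components' weights, means and standard deviations are illustrative, not taken from a fitted model):

```python
import math

def gaussian_pdf(z, mu, sigma):
    """Univariate Gaussian density phi(z; mu, sigma^2)."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(z, params):
    """FMD density: p(z) = sum_g pi_g * phi(z; mu_g, sigma_g^2)."""
    return sum(pi * gaussian_pdf(z, mu, sigma) for pi, mu, sigma in params)

# two components with illustrative (pi_g, mu_g, sigma_g)
params = [(0.61, 56.8, 6.8), (0.39, 73.7, 9.8)]
density_at_60 = mixture_pdf(60.0, params)
```

Because the mixing weights sum to one, the mixture is itself a proper density: numerically integrating it over a wide enough grid recovers total mass 1.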

SLIDE 5

Mixture Modeling

Mixtures of Distributions (FMD) 2/3

Population of students: men and women.

[Histograms of the weight distribution (20–120 kg) for women and for men.]

Women: µw = 56.8 kg, σw = 6.8 kg, nw = 1680, πw = 0.61.
Men: µm = 73.7 kg, σm = 9.8 kg, nm = 1079, πm = 0.39.

SLIDE 6

Mixture Modeling

Mixtures of Distributions (FMD) 3/3

[Histogram of the weight distribution for men and women combined.]

p(z) = πw N(µw, σw²) + πm N(µm, σm²) = 0.61 · N(56.8, 6.8²) + 0.39 · N(73.7, 9.8²)

SLIDE 7

Mixture Models with covariates

Mixture models with covariates: the problem

Consider a pair (Y, X1, . . . , Xd) of a response variable Y and covariates (X1, . . . , Xd)′ defined on some population Ω with values in R × Rd. Assume we are provided with a sample of N i.i.d. realizations of (Y, X1, . . . , Xd), and that the dependence of Yn on xn is modeled by the multiple regression model

Yn = β0 + β1xn1 + · · · + βdxnd + εn = β′xn + εn,

where β = (β0, β1, . . . , βd)′ ∈ Rd+1 is the vector of unknown parameters, xn = (1, xn1, . . . , xnd)′ ∈ Rd+1 denotes the augmented covariate vector, and ε1, . . . , εN ∼ N(0, σε²) are i.i.d. errors.

The problem

In many circumstances, the assumption that the regression coefficients are fixed over all possible realizations of Y1, . . . , YN is inadequate, and models where the regression coefficients change are of practical interest.

SLIDE 8

Mixture Models with covariates

Example A: Student Data

Data come from a survey on N = 270 university students; see Ingrassia et al. (2014)¹. Consider the relationship between the student's height and the student's father's height. Two groups: males and females (blue = males, red = females).

[Scatter plot of height vs. father's height (left); single linear regression model (right).]

¹ Ingrassia S., Minotti S.C., Punzo A. (2014). Model-based clustering via linear cluster-weighted models, Computational Statistics & Data Analysis, 71, 159-182.

SLIDE 9

Mixture Models with covariates

Example B: Tourism Data

Data concern N = 180 monthly observations of the attendance at museums and monuments (Y, in millions) against tourist overnights (X, in millions) in Italy over the 15-year period from January 1996 to December 2010.

[Plot of the data (left); single linear regression model (right).]

SLIDE 10

Mixture Models with covariates

Example C: Star Data

Data concern N = 33 observations of the chromospheric activity index log RHK of stars hosting transiting hot Jupiters, which appears to be correlated with the planets' surface gravity.

[Plot of log RHK vs. g_p^−1 (left); single linear regression model (right).]

SLIDE 11

Mixture Models with covariates

Mixture models with covariates

Consider a pair (Y, X) of a response variable Y and covariates X defined on some heterogeneous population Ω partitioned into G disjoint homogeneous subpopulations, i.e. Ω = Ω1 ∪ · · · ∪ ΩG. We focus on modeling the dependence between Y and X based on data coming from a heterogeneous population. In this framework, mixture models provide a flexible approach for a wide variety of random phenomena characterized by unobserved heterogeneity.

Existing literature

Mixture of regression models (MR), Mixture of regression models with concomitant variables (MRC)

SLIDE 12

Mixture Models with covariates

Mixture of Regressions (MR)

The dependence between Y and X for data coming from a heterogeneous population can be modeled by a finite mixture of regressions (FMR); see e.g. McLachlan and Peel (2000)², Frühwirth-Schnatter (2006)³:

p(y|x, ψ) = ∑_{g=1}^{G} f(y|x, θg) πg,

where:
- f(y|x, θg) is the conditional density of Y given x in the group Ωg; the conditional densities belong to the same parametric family, indexed by θg ∈ Θ, g = 1, . . . , G;
- πg = p(Ωg) is the mixing weight of Ωg (πg > 0 and ∑_{g=1}^{G} πg = 1);
- ψ = (π1, . . . , πG, θ1′, . . . , θG′)′ ∈ Ψ is the vector of all parameters.

² McLachlan G.J., Peel D. (2000). Finite Mixture Models. Wiley, New York.
³ Frühwirth-Schnatter S. (2006). Finite Mixture and Markov Switching Models. Springer, Heidelberg.

SLIDE 13

Mixture Models with covariates

Mixture of Regressions with Concomitant (MRC)

An extension of FMR is the finite mixture of regressions with concomitant variables (FMRC); see Dayton and Macready (1988)⁴:

p(y|x, ψ) = ∑_{g=1}^{G} f(y|x, θg) p(Ωg|x, w),

where the mixing weight p(Ωg|x, w) is now a function depending on x through some parameters w, and ψ = (θ1, . . . , θG, w) denotes the set of all parameters of the model.

The probability p(Ωg|x, w) is usually modeled by a multinomial logistic distribution with the first component as baseline, that is:

p(Ωg|x, w) = exp(wg′x) / ∑_{h=1}^{G} exp(wh′x),

where wg = (wg0, wg1, . . . , wgd)′ ∈ Rd+1 and w = (w1′, . . . , wG′)′ ∈ RG(d+1).

⁴ Dayton C.M., Macready G.B. (1988). Concomitant-Variable Latent-Class Models, Journal of the American Statistical Association, 83, 173-178.
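The multinomial logistic mixing weights can be sketched in a few lines (pure Python, d = 1; the coefficient values are illustrative, and the max-subtraction is a standard numerical-stability trick, not part of the slide):

```python
import math

def mixing_weights(x, w):
    """p(Omega_g | x, w) = exp(w_g'x) / sum_h exp(w_h'x).
    x is the augmented covariate vector (1, x1, ..., xd); w is a list of
    coefficient vectors w_g, with the first component as baseline (w_1 = 0)."""
    scores = [sum(wg_i * x_i for wg_i, x_i in zip(wg, x)) for wg in w]
    m = max(scores)                      # stabilise the exponentials
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# G = 3 groups, d = 1 covariate; coefficients are illustrative
w = [[0.0, 0.0], [1.0, -0.5], [-2.0, 0.3]]
probs = mixing_weights([1.0, 4.0], w)   # augmented x = (1, x1)
```

By construction the weights are positive and sum to one for every x, so they define a valid covariate-dependent mixing distribution.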

SLIDE 14

Mixture Models with covariates

Two important cases

Gaussian linear components

In the Gaussian case, we consider

f(y|x, θg) = φ(y; βg′x, σ²ε,g),

where φ denotes the Gaussian density with mean βg′x and variance σ²ε,g.

Generalized linear components

In mixtures of Generalized Linear Models (GLMs), we have

f(y|x, θg) = f(y|x, βg, λg) = exp{ [y βg′x − b(βg′x)] / a(λg) + c(y, λg) },

for some specific functions a(·), b(·) and c(·), where λg is the dispersion parameter (constant in Ωg), with a(λg) > 0. Here y ∈ Y ⊆ R. The canonical links are the identity, log, logit, inverse and squared-inverse functions for the Normal, Poisson, binomial, gamma and inverse-Gaussian distributions, respectively.

SLIDE 15

Mixture Models with covariates

Parameter estimation and classification

Assume we are provided with a set of N independent observation pairs {(x1, y1), . . . , (xN, yN)} drawn from a mixture of regressions (MR/MRC). Then:

1. for fixed G, estimate the model parameters, usually by maximum likelihood via the EM algorithm;
2. if G is unknown:
   1. repeat step 1 for different numbers G of groups;
   2. select G according to model selection criteria like AIC, BIC, ICL;
3. based on the estimated ψ, compute the posterior probability τg(xn, yn; ψ) that the nth unit (xn, yn) belongs to the gth group Ωg:

MR : τg(xn, yn|ψ) = f(yn|xn, θg) πg / ∑_{h=1}^{G} f(yn|xn, θh) πh

MRC : τg(xn, yn|ψ) = f(yn|xn, θg) exp(wg′xn) / ∑_{h=1}^{G} f(yn|xn, θh) exp(wh′xn)

and classify the units into groups according to the maximum a posteriori probability (MAP) criterion.
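Step 3 can be sketched for a Gaussian MR (pure Python, d = 1; the component parameters below are illustrative placeholders, not EM estimates):

```python
import math

def gaussian_pdf(y, mean, sd):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def mr_posteriors(y, x, betas, sds, pis):
    """tau_g(x, y) = f(y|x, theta_g) pi_g / sum_h f(y|x, theta_h) pi_h for a
    Gaussian MR with component means beta_g'x (x already augmented with 1)."""
    numer = [gaussian_pdf(y, sum(b * xi for b, xi in zip(beta, x)), sd) * pi
             for beta, sd, pi in zip(betas, sds, pis)]
    total = sum(numer)
    return [n / total for n in numer]

def map_label(posteriors):
    """Classify according to the maximum a posteriori (MAP) criterion."""
    return max(range(len(posteriors)), key=lambda g: posteriors[g])

# two illustrative components with different regression lines
tau = mr_posteriors(3.0, [1.0, 2.0], betas=[[0.0, 1.0], [5.0, -1.0]],
                    sds=[1.0, 1.0], pis=[0.5, 0.5])
label = map_label(tau)  # index 1: the second line fits (x, y) = (2, 3) better
```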

SLIDE 16

Mixture Models with covariates

Identifiability about MR/MRC

Like any finite mixture model, mixtures of regression models suffer from nonidentifiability due to label switching and potential overfitting. Generic identifiability for MR does not, in general, follow directly from the generic identifiability of Gaussian mixtures, despite the close relationship between the two classes of models. A necessary condition for identifiability of MR is that the matrix X′X is of full rank, where X = (x1, . . . , xN)′.

Sufficient conditions for identifiability of MR are given in Hennig (2000)⁵, a generalization to identifiability of mixtures of GLMs in Grün and Leisch (2008)⁶, and identifiability results for mixtures of logistic regression models with concomitant variables in Wang (1994)⁷.

⁵ Hennig C. (2000). Identifiability of models for clusterwise linear regression. Journal of Classification, 17(2), 273–296.
⁶ Grün B., Leisch F. (2008). Identifiability of finite mixtures of multinomial logit models with varying and fixed effects. Journal of Classification, 25(2), 225–247.
⁷ Wang P. (1994). Mixed regression models for discrete data. Ph.D. Thesis, University of British Columbia, Vancouver.

SLIDE 17

Mixture Models with covariates

Example A: Student data (cont’d) 1/3

The relationship between height of the respondent and height of respondent’s father has been modeled according to both finite mixture of regression (MR) and finite mixture of regression with concomitant (MRC). The histogram of the X-variable height of respondent’s father does not show any cluster structure along the X-variable.

[Scatter plot of height vs. father's height (left); histogram of father's height (right).]

SLIDE 18

Mixture Models with covariates

Example A: Student data (cont’d) 2/3

[Scatter plots of height vs. father's height: data (left); MR fit (center); MRC fit (right).]

In practice, the two models give the same results.

SLIDE 19

Mixture Models with covariates

Example A: Student data (cont’d) 3/3

Confusion matrices (actual vs. predicted):

MR:  M: 112 M, 7 F;  F: 0 M, 151 F
MRC: M: 113 M, 6 F;  F: 0 M, 151 F

Misclassification error: 2.59% (MR), 2.22% (MRC). Adjusted Rand Index: 0.8986 (MR), 0.9127 (MRC).

Computations performed using the R package flexmix 2.3-8; see Grün and Leisch (2008)⁸. Best solutions over 100 runs.

⁸ Grün B., Leisch F. (2008). FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters, Journal of Statistical Software, 28(4), 1-35.

SLIDE 20

Mixture Models with covariates

Example B: Tourism data (cont’d - 1/5)

Cluster structure with respect to the covariate:

[Scatter plot of the Tourism data (left); histogram of tourist overnights (right).]

The data have been modeled without considering the time information (month labels)

SLIDE 21

Mixture Models with covariates

Example B: Tourism data (cont’d - 2/5)

[Scatter plots of the clustered data, units labeled by group: MR fit (left); MRC fit (right).]

Choice: G = 4, for economic reasons.

SLIDE 22

Mixture Models with covariates

Example B: Tourism data (cont’d - 3/5)

Data clustering (according to MAP): units labeled by month. Model: Mixture of Regressions with Concomitant (MRC).

Month-by-group cross-tabulation (15 observations per month):

Group | Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
  1   |  .   .   .   .   .  15   .   .  15   .   .   .
  2   |  .   1  12  15  15   .   .   .   .  15   .   .
  3   |  .   .   .   .   .   .  15  15   .   .   .   .
  4   | 15  14   3   .   .   .   .   .   .   .  15  15

The four groups identify the units almost perfectly from the time point of view. Group 1: units in June and September (early and late summer); Group 2: units in March, April, May and October (spring and early autumn); Group 3: units in July and August (summer); Group 4: units from November to February (late autumn and winter). Misclassification error: δ = 2.22%.

SLIDE 23

Mixture Models with covariates

Example D: Student data 2 (cont’d) 1/4

In the Student Data, consider the relationship between weight and height. Two groups: males and females (blue = males, red = females).

[Scatter plot of weight vs. height (left); histogram of weight (right).]

SLIDE 24

Mixture Models with covariates

Example D: Student data 2 (cont’d) 2/4

[Scatter plots of weight vs. height with fitted lines: MR model (left); MRC model (right).]

MR: δ = 46.30%, ARI = 0.0072. MRC: δ = 42.96%, ARI = 0.0105.

Question

Can we do better?

SLIDE 25

Cluster-Weighted Models: the original framework

Cluster Weighted Model (CWM) - 1/2

The Cluster-Weighted Model was proposed by Gershenfeld (1997)⁹ in the framework of machine learning, in particular for supervised learning of a probability density estimate of a joint set of input features and output target data. In the original setting, CWM was developed in the context of media technology under Gaussian assumptions, in order to build a digital violin with realistic sound, in the framework of non-linear time series. It can be interpreted as a flexible technique for nonlinear function fitting of an input-output relation (X, Y).

⁹ Gershenfeld N. (1997). Nonlinear inference and Cluster-Weighted Modeling, Annals of the New York Academy of Sciences, 808(1), 18–24.

SLIDE 26

Cluster-Weighted Models: the original framework

Cluster Weighted Model (CWM) - 2/2

The idea is to model the joint probability p(x, y) of (X, Y) as

p(x, y|ψ) = ∑_{g=1}^{G} p(x, y|ψg) πg = ∑_{g=1}^{G} f(y|x, θg) p(x|ξg) πg,

where f(y|x, θg) is the conditional density of Y|x in the group Ωg, p(x|ξg) is the density of X in the group Ωg, and πg is the mixing weight of Ωg.

Remarks

1. the joint density of (X, Y) can be viewed as a mixture of local models f(y|x; θg) weighted on both p(x; ξg) and πg;
2. the choice of the parametric families of the component densities f(y|x; θg) and p(x; ξg) yields a large family of models.

SLIDE 27

Cluster-Weighted Models: the original framework

Why "Cluster-Weighted" Model?

Consider the expected value of Y given x. Some algebra yields

E(Y|x) = ∫ y f(y|x, θ) dy
       = ∫ y [ ∑_{g=1}^{G} f(y|x, θg) p(x|ξg) πg ] / p(x|ξ) dy
       = ∑_{g=1}^{G} [ ∫ y f(y|x, θg) dy ] p(x|ξg) πg / ∑_{j=1}^{G} p(x|ξj) πj
       = ∑_{g=1}^{G} βg′x p(Ωg|x, ξ),

because

p(Ωg|x, ξ) = p(x|ξg) πg / ∑_{j=1}^{G} p(x|ξj) πj;

see Schöner (2000)¹⁰.

The result

E(Y|x) is computed as the sum of the local (linear) models βg′x weighted by the posterior probabilities of the components (clusters) Ωg, g = 1, . . . , G.

¹⁰ Schöner B. (2000). Probabilistic Characterization and Synthesis of Complex Data Driven Systems, Ph.D. Thesis, MIT.
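The posterior-weighted combination of local lines can be sketched numerically (pure Python, d = 1, Gaussian X-marginals; all parameter values below are illustrative):

```python
import math

def gaussian_pdf(z, mu, sigma):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def cwm_conditional_mean(x, comps):
    """E(Y|x) = sum_g (b0_g + b1_g x) * p(Omega_g | x), where the posterior
    weight of group g is p(x|xi_g) * pi_g normalised over the groups."""
    weights = [pi * gaussian_pdf(x, mu, sd) for pi, (mu, sd), _line in comps]
    total = sum(weights)
    return sum((w / total) * (b0 + b1 * x)
               for w, (_pi, _xdist, (b0, b1)) in zip(weights, comps))

# each component: (pi_g, (mu_g, sd_g) for X, (b0_g, b1_g) for the local line)
comps = [(0.5, (0.0, 1.0), (1.0, 2.0)),   # group 1: y = 1 + 2x, X around 0
         (0.5, (5.0, 1.0), (4.0, -1.0))]  # group 2: y = 4 - x,  X around 5
```

Near x = 0 the posterior weight of group 1 is essentially one, so E(Y|x) follows the line y = 1 + 2x there; near x = 5 it follows y = 4 − x. This is exactly the "cluster-weighted" behaviour of the slide's final formula.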

SLIDE 28

Cluster-Weighted Models: the original framework

CWM for model-based clustering

CWM is not new in statistics. In Hennig (2000)¹¹ this model is referred to as clusterwise linear regression with random covariates, and in Wedel (2002)¹² as a saturated mixture regression model. In Ingrassia, Minotti and Vittadini (2012)¹³, CWM has been developed in the framework of model-based clustering.

CWM emerges as a "competitor" of MR/MRC:

E(Y|x)_MR = ∑_{g=1}^{G} βg′x πg

E(Y|x)_MRC = ∑_{g=1}^{G} βg′x p(Ωg|x, w) = ∑_{g=1}^{G} βg′x exp(wg′x) / ∑_{h=1}^{G} exp(wh′x)

¹¹ Hennig C. (2000). Identifiability of models for clusterwise linear regression. Journal of Classification, 17, 273-296.
¹² Wedel M. (2002). Concomitant variables in finite mixture models. Statistica Neerlandica, 56(3), 362-375.
¹³ Ingrassia S., Minotti S.C., Vittadini G. (2012). Local Statistical Modeling via a Cluster-Weighted Approach with Elliptical Distributions, Journal of Classification, 29, 363-401.

SLIDE 29

CWM for model-based clustering

Relationship with MR

Proposition

If the probability density of X|Ωg does not depend on the group g, i.e. p(x|ξg) = p(x|ξ) for every g = 1, . . . , G, then

p(x, y|ψ) = p(x|ξ) ∑_{g=1}^{G} f(y|x, θg) πg,

where the sum is the MR density.

Corollary

Under the same assumptions, MR and CWM lead to the same posterior probabilities. We say that CWM contains MR.

SLIDE 30

CWM for model-based clustering

Relationship with MRC

Proposition

Assume that X|Ωg ∼ Nd(µg, Σg) for g = 1, . . . , G. If Σg = Σ and πg = π = 1/G for g = 1, . . . , G, then the density of the CWM can be written as

p(x, y|ψ) = p(x|ξ) ∑_{g=1}^{G} f(y|x, θg) p(Ωg|x, w),

for suitable wg ∈ Rd+1, g = 1, . . . , G, where the sum is the MRC density and p(x|ξ) = (1/G) ∑_{g=1}^{G} φd(x; µg, Σ).

Corollary

Under the same assumptions, MRC and CWM lead to the same posterior probabilities. Thus we say that CWM contains MRC.

SLIDE 31

Gaussian and Student-t CWM

The linear Gaussian CWM

In the traditional framework, both the marginal and the conditional densities are assumed to be Gaussian, with X|Ωg ∼ Nd(µg, Σg) and Y|x, Ωg ∼ N(βg′x, σ²ε,g), so that p(x|ξg) = φd(x; µg, Σg) and p(y|x, Ωg) = φ(y; βg′x, σ²ε,g). Thus we get:

p(x, y|ψ) = ∑_{g=1}^{G} φ(y; βg′x, σ²ε,g) φd(x; µg, Σg) πg.
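A minimal sketch of this joint density for d = 1 (pure Python; the component parameters are illustrative). With a single component the mixture reduces to the product of the two Gaussian factors, which gives a cheap sanity check:

```python
import math

def phi(z, mu, sigma):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def linear_gaussian_cwm_pdf(x, y, comps):
    """p(x, y) = sum_g phi(y; b0_g + b1_g x, s_g^2) phi(x; mu_g, sigma_g^2) pi_g."""
    return sum(pi * phi(y, b0 + b1 * x, s) * phi(x, mu, sigma)
               for pi, (mu, sigma), (b0, b1, s) in comps)

# each component: (pi_g, (mu_g, sigma_g) for X, (b0_g, b1_g, s_g) for Y|x)
comps = [(0.6, (0.0, 1.0), (1.0, 2.0, 0.5)),
         (0.4, (4.0, 1.5), (3.0, -1.0, 0.7))]
value = linear_gaussian_cwm_pdf(0.5, 2.0, comps)
```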

SLIDE 32

Gaussian and Student-t CWM

Example D: Student data 2 (cont'd) 3/4

[Scatter plots of weight vs. height: real data (left); CWM fit (right).]

CWM: δ = 5.93%, ARI = 0.7761.

SLIDE 33

Gaussian and Student-t CWM

Example D: Student data 2 (cont’d) 4/4

In summary:

[Scatter plots of weight vs. height: real data, CWM, MR and MRC fits.]

Why do we have such differences?

SLIDE 34

Gaussian and Student-t CWM

Data modeling via Student-t distributions

Data modeling via the Student-t distribution has been proposed in the literature in order to provide more robust fitting for groups of observations with longer-than-normal tails or atypical observations. A q-variate random vector Z has a multivariate t distribution with degrees of freedom ν ∈ (0, ∞), location parameter µ ∈ Rq and q × q positive definite inner product matrix Σ if its density is given by

tq(z; µ, Σ, ν) = Γ((ν + q)/2) ν^{ν/2} / { Γ(ν/2) |πΣ|^{1/2} [ν + δ(z, µ; Σ)]^{(ν+q)/2} },

where δ(z, µ; Σ) = (z − µ)′Σ⁻¹(z − µ) denotes the squared Mahalanobis distance between z and µ with respect to the matrix Σ, and Γ(·) is the Gamma function. We write Z ∼ tq(µ, Σ, ν). Moreover, mixtures of multivariate Student-t distributions (FMT) have density

p(z) = ∑_{g=1}^{G} tq(z; µg, Σg, νg) πg.
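The density above can be checked numerically in the q = 1 case (pure Python; for ν = 1 and unit scale it reduces to the Cauchy density 1/[π(1 + z²)]):

```python
import math

def t_pdf(z, mu, sigma2, nu):
    """Student-t density with location mu, scale sigma2 and df nu: the q = 1
    case of t_q(z; mu, Sigma, nu), with delta the squared Mahalanobis distance."""
    delta = (z - mu) ** 2 / sigma2
    return (math.gamma((nu + 1) / 2) * nu ** (nu / 2)
            / (math.gamma(nu / 2) * math.sqrt(math.pi * sigma2)
               * (nu + delta) ** ((nu + 1) / 2)))
```

The density is symmetric about µ and has heavier tails than the Gaussian, which is exactly the robustness property exploited by the Student-t CWM on the next slide.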

SLIDE 35

Gaussian and Student-t CWM

Student-t CWM: definition

As above, Student-t components provide more robust fitting for groups of observations with longer-than-normal tails or atypical observations. Let us assume:

X|Ωg ∼ td(µg, Σg, νg) and Y|x, Ωg ∼ t(βg′x, σ²ε,g, ζg),

where X|Ωg has a multivariate t distribution with location parameter µg, inner product matrix Σg and degrees of freedom νg, and Y|x, Ωg has a t distribution with location parameter βg′x, scale parameter σ²ε,g and degrees of freedom ζg. Thus the Student-t CWM is defined as:

p(x, y|ψ) = ∑_{g=1}^{G} t(y; βg′x, σ²ε,g, ζg) td(x; µg, Σg, νg) πg.

SLIDE 36

Gaussian and Student-t CWM

Example B: Tourism data (cont'd - 4/5)

The data have been modeled according to CWM. The BIC values of the different models can be represented in an mclust-type plot.

[BIC plot over G = 1, . . . , 6 for the models NN-VE, NN-EV, NN-VV, tN-VE, tN-EV, tN-VV, Nt-VE, Nt-EV, Nt-VV, tt-VE, tt-EV, tt-VV (left); scatter plot of the CWM clustering (right).]

We selected the NN-VV model with G = 4 groups.

SLIDE 37

Gaussian and Student-t CWM

Example B: Tourism data (cont'd - 5/5)

Data clustering by month (15 observations per month):

Group | Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
  1   |  .   .   .   .   .  15   .   .  15   .   .   .
  2   |  .   .  15  15  15   .   .   .   .  15   2   .
  3   | 15  15   .   .   .   .   .   .   .   .  13  15
  4   |  .   .   .   .   .   .  15  15   .   .   .   .

The four groups identify the units almost perfectly from the time point of view. Group 1: units in June and September (early and late summer); Group 2: units in March, April, May and October (spring and early autumn); Group 3: units from November to February (late autumn and winter); Group 4: units in July and August (summer). Misclassification error rate: δ = 1.11% (MRC: δ = 2.22%).

SLIDE 38

Gaussian and Student-t CWM

Example C: Star data (cont’d - 2/2)

Data have been modeled according to CWM.

[Scatter plots of log RHK vs. g_p^−1 with the CWM fit (left and right panels).]

SLIDE 39

Gaussian and Student-t CWM

Example D: Simdata1 1/2

Data classification according to the three models: MR, MRC, CWM.

[Scatter plots of y vs. x: data, MR, MRC and CWM classifications.]

Question (in other words)

Where do these three groupings come from?

SLIDE 40

Decision boundaries

Decision boundaries 1/2

Relationships among MR, MRC and CWM have been investigated in Ingrassia et al. (2015). More insight can be gained through a geometrical analysis¹⁴. Consider the posterior probabilities (with all parameters replaced by their estimates; s²ε,g denotes the estimate of σ²ε,g):

MR : τg(x, y|ψ) = φ(y; βg′x, s²ε,g) πg / ∑_{j=1}^{G} φ(y; βj′x, s²ε,j) πj

MRC : τg(x, y|ψ) = φ(y; βg′x, s²ε,g) exp(wg′x) / ∑_{j=1}^{G} φ(y; βj′x, s²ε,j) exp(wj′x)

CWM : τg(x, y|ψ) = φ(y; βg′x, s²ε,g) φd(x; µg, Σg) πg / ∑_{j=1}^{G} φ(y; βj′x, s²ε,j) φd(x; µj, Σj) πj

¹⁴ Ingrassia S., Punzo A. (2015). Decision boundaries for mixtures of regressions, Journal of the Korean Statistical Society, forthcoming.

SLIDE 41

Decision boundaries

Decision boundaries 2/2

Consider two groups Ω1 and Ω2 = Ω1ᶜ. The set

{(x, y) ∈ Rd+1 : τg(x, y|ψ) = 1/2}

is the decision boundary between Ω1 and Ω2.

In Gaussian models, the decision boundaries of MR/MRC/CWM belong to the family of quadrics, characterized by the equation

z′Az + b′z + c = 0,

where z = (x′, y)′, A is a (d + 1) × (d + 1) symmetric matrix, b ∈ Rd+1 and c ∈ R. For d = 2, examples of quadrics are spheres, circular cylinders, elliptic paraboloids, etc. For d = 1, quadrics reduce to conics: ellipses, parabolas, hyperbolas.

SLIDE 42

Decision boundaries

Geometric analysis of decision boundaries (d = 1) 1/5

When d = 1, decision boundaries are conics (ellipses, parabolas, hyperbolas), characterized by an equation of the type

a11x² + a22y² + 2a12xy + 2a13x + 2a23y + a33 = 0.

Consider the symmetric matrices

A = [ a11 a12 a13 ; a12 a22 a23 ; a13 a23 a33 ],  A33 = [ a11 a12 ; a12 a22 ].

If |A| ≠ 0 then the conic is not degenerate, and |A33| identifies the conic:
- if |A33| > 0 the decision boundary is an ellipse,
- if |A33| = 0 the decision boundary is a parabola,
- if |A33| < 0 the decision boundary is a hyperbola.
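This classification rule translates directly into code (pure Python; the numerical tolerance is an implementation detail, not from the slide):

```python
def classify_conic(a11, a22, a12, a13, a23, a33, tol=1e-12):
    """Classify the conic a11 x^2 + a22 y^2 + 2 a12 xy + 2 a13 x + 2 a23 y + a33 = 0
    from det(A) and det(A33), following the rule on this slide."""
    # determinant of the full 3x3 symmetric matrix A (cofactor expansion)
    det_A = (a11 * (a22 * a33 - a23 * a23)
             - a12 * (a12 * a33 - a23 * a13)
             + a13 * (a12 * a23 - a22 * a13))
    det_A33 = a11 * a22 - a12 * a12
    if abs(det_A) < tol:
        return "degenerate"
    if det_A33 > tol:
        return "ellipse"
    if det_A33 < -tol:
        return "hyperbola"
    return "parabola"

# x^2 + y^2 - 1 = 0 is a circle, i.e. a special ellipse
kind = classify_conic(1, 1, 0, 0, 0, -1)
```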

SLIDE 43

Decision boundaries

Geometric analysis of decision boundaries (d = 1) 2/5

Coefficients of the decision-boundary conic for MR, MRC and CWM (β_g0, β_g1: intercept and slope of component g; s²ε,g: regression-error variance; w_g0, w_g1: concomitant-variable parameters of MRC; µ_g, σ²_g: mean and variance of X|Ωg; π_g: mixing weight):

a11
  MR:  β²21/(2s²ε,2) − β²11/(2s²ε,1)
  MRC: β²21/(2s²ε,2) − β²11/(2s²ε,1)
  CWM: β²21/(2s²ε,2) − β²11/(2s²ε,1) + 1/(2σ²2) − 1/(2σ²1)

a12 (MR, MRC, CWM)
  β11/(2s²ε,1) − β21/(2s²ε,2)

a22 (MR, MRC, CWM)
  1/(2s²ε,2) − 1/(2s²ε,1)

a13
  MR:  β20β21/(2s²ε,2) − β10β11/(2s²ε,1)
  MRC: β20β21/(2s²ε,2) − β10β11/(2s²ε,1) + (w11 − w21)/2
  CWM: β20β21/(2s²ε,2) − β10β11/(2s²ε,1) + µ1/(2σ²1) − µ2/(2σ²2)

a23 (MR, MRC, CWM)
  β10/(2s²ε,1) − β20/(2s²ε,2)

a33
  MR:  β²20/(2s²ε,2) − β²10/(2s²ε,1) + ln(π1/π2)
  MRC: β²20/(2s²ε,2) − β²10/(2s²ε,1) + w10 − w20
  CWM: β²20/(2s²ε,2) − β²10/(2s²ε,1) + ln(π1/π2) + µ²2/(2σ²2) − µ²1/(2σ²1)
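As an illustration, the MR column of the table can be assembled numerically and combined with the conic-classification rule of the previous slide; the helper and its parameter values below are illustrative, not from the slides:

```python
import numpy as np

def mr_conic_matrix(b10, b11, s1, b20, b21, s2, pi1, pi2):
    """3x3 matrix A of the MR decision boundary between two components
    y = b_g0 + b_g1 * x + eps_g, eps_g ~ N(0, s_g^2), mixing weights pi_g.
    Entries follow the MR column of the coefficient table."""
    v1, v2 = s1 ** 2, s2 ** 2
    a11 = b21 ** 2 / (2 * v2) - b11 ** 2 / (2 * v1)
    a12 = b11 / (2 * v1) - b21 / (2 * v2)
    a22 = 1 / (2 * v2) - 1 / (2 * v1)
    a13 = b20 * b21 / (2 * v2) - b10 * b11 / (2 * v1)
    a23 = b10 / (2 * v1) - b20 / (2 * v2)
    a33 = b20 ** 2 / (2 * v2) - b10 ** 2 / (2 * v1) + np.log(pi1 / pi2)
    return np.array([[a11, a12, a13],
                     [a12, a22, a23],
                     [a13, a23, a33]])

# Two components with different slopes (b11 != b21) and equal variances:
A = mr_conic_matrix(0.0, 1.0, 1.0, 2.0, -1.0, 1.0, 0.5, 0.5)
print(np.linalg.det(A[:2, :2]))  # negative -> the boundary is a hyperbola
```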

slide-44
SLIDE 44

Decision boundaries

Geometric analysis of decision boundaries (d = 1) 3/5

Proposition

Let (x1, y1), . . . , (xN, yN) be a sample drawn from a population Ω = Ω1 ∪ Ω2 and assume that the conditional distribution of Y|x follows (Gaussian) MR. If β11 ≠ β21, then the decision boundary between Ω1 and Ω2 is a hyperbola.

  • Proof. The determinants of the matrices A and A33 are given by

|A| = −[ s²ε,2 (β²10 − β²20) + s²ε,1 ( 2 s²ε,2 ln(π1/π2) + β²10 − β²20 ) ] (β11 − β21)² / (8 s⁴ε,1 s⁴ε,2),

|A33| = −(β11 − β21)² / (4 s²ε,1 s²ε,2).

For β11 ≠ β21 it results |A| ≠ 0 and |A33| < 0, yielding hyperbolas. The case β11 = β21 occurs with probability equal to zero and yields |A| = 0, so that the decision boundary is a degenerate conic (straight lines).


slide-45
SLIDE 45

Decision boundaries

Geometric analysis of decision boundaries (d = 1) 4/5

Proposition

Let (x1, y1), . . . , (xN, yN) be a sample drawn from a population Ω = Ω1 ∪ Ω2 and assume that the conditional distribution of Y|x follows (Gaussian) MRC. If β11 ≠ β21, then the decision boundary between Ω1 and Ω2 is a hyperbola; otherwise it is a parabola.

  • Proof. Compute the determinants of the matrices A and A33. In particular, we get

|A| ≠ 0 and |A33| = −(β11 − β21)² / (4 s²ε,1 s²ε,2).

Thus, for β11 ≠ β21 it results |A33| < 0, yielding hyperbolas. The case β11 = β21 occurs with probability equal to zero (the decision boundary is then a parabola).


slide-46
SLIDE 46

Decision boundaries

Geometric analysis of decision boundaries (d = 1) 5/5

Proposition

Let (x1, y1), . . . , (xN, yN) be a sample drawn from a population Ω = Ω1 ∪ Ω2 and assume that the joint distribution of (X, Y) follows (Gaussian) CWM. If σ²1 ≠ σ²2 and (β11 − β21)² ≠ (s²ε,1 − s²ε,2)(σ²1 − σ²2)/(σ²1 σ²2), then the decision boundary between Ω1 and Ω2 is either a hyperbola or an ellipse; otherwise it is a parabola.

  • Proof. Compute the determinants of the matrices A and A33. In particular, we get

|A| ≠ 0 and |A33| = −(β11 − β21)² / (4 s²ε,1 s²ε,2) + (s²ε,1 − s²ε,2)(σ²1 − σ²2) / (4 s²ε,1 s²ε,2 σ²1 σ²2).

Thus, for (β11 − β21)² ≠ (s²ε,1 − s²ε,2)(σ²1 − σ²2)/(σ²1 σ²2) it results |A33| ≷ 0, yielding either hyperbolas or ellipses. The case (β11 − β21)² = (s²ε,1 − s²ε,2)(σ²1 − σ²2)/(σ²1 σ²2) occurs with probability equal to zero (the decision boundary is then a parabola).
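The expression for |A33| in the proof can be evaluated directly to see which conic arises for given parameters; the function name and the parameter values below are illustrative:

```python
def cwm_detA33(b11, b21, s2e1, s2e2, sig2_1, sig2_2):
    """|A33| for the Gaussian CWM decision boundary (d = 1), as in the proof:
    -(b11 - b21)^2 / (4 s2e1 s2e2)
    + (s2e1 - s2e2)(sig2_1 - sig2_2) / (4 s2e1 s2e2 sig2_1 sig2_2)."""
    return (-(b11 - b21) ** 2 / (4 * s2e1 * s2e2)
            + (s2e1 - s2e2) * (sig2_1 - sig2_2)
            / (4 * s2e1 * s2e2 * sig2_1 * sig2_2))

# Equal slopes but different error/X variances: |A33| > 0 -> ellipse
print(cwm_detA33(1.0, 1.0, 2.0, 0.5, 3.0, 0.5))
# Very different slopes dominate: |A33| < 0 -> hyperbola
print(cwm_detA33(3.0, -3.0, 2.0, 0.5, 3.0, 0.5))
```

Note that the second term is exactly what MR and MRC lack: without it, |A33| ≤ 0 and ellipses can never occur.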


slide-47
SLIDE 47

Decision boundaries

Example D: Simdata2 (cont’d) 2/2

MR MRC CWM

[Scatter plots of Simdata2 in the (x, y) plane with the fitted decision boundaries for MR, MRC and CWM.]

η = 30% η = 4% η = 2.67%


slide-48
SLIDE 48

Decision boundaries

Geometric analysis of decision boundaries (d = 2) 1/6

For d = 2, decision boundaries are characterized by the equation

a11 x1² + a22 x2² + a33 y² + 2a12 x1x2 + 2a13 x1y + 2a23 x2y + 2a14 x1 + 2a24 x2 + 2a34 y + a44 = 0.


slide-49
SLIDE 49

Decision boundaries

Geometric analysis of decision boundaries (d = 2) 2/6

Consider the symmetric matrices

A = | a11 a12 a13 a14 |
    | a12 a22 a23 a24 |
    | a13 a23 a33 a34 |
    | a14 a24 a34 a44 |

A44 = | a11 a12 a13 |
      | a12 a22 a23 |
      | a13 a23 a33 |

The shape of the decision boundary (i.e. the type of quadric) depends on: the sign of |A|, the sign of |A44|, and the signs of the eigenvalues of A44 (when |A| ≠ 0).
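A minimal sketch of this classification (labels and thresholds are illustrative simplifications, not from the slides):

```python
import numpy as np

def classify_quadric(A, tol=1e-10):
    """Rough classification of the quadric z'Az = 0 in homogeneous 4x4 form (d = 2),
    using |A44|, the eigenvalue signs of A44 and the sign of |A|."""
    A = np.asarray(A, dtype=float)
    A44 = A[:3, :3]
    if abs(np.linalg.det(A44)) < tol:
        return "paraboloid or cylinder (|A44| = 0)"
    eig = np.linalg.eigvalsh(A44)
    if np.all(eig > tol) or np.all(eig < -tol):
        return "ellipsoid (possibly imaginary)"
    # Mixed eigenvalue signs: a hyperboloid; |A| > 0 gives one sheet, |A| < 0 two.
    return ("hyperboloid of one sheet" if np.linalg.det(A) > 0
            else "hyperboloid of two sheets")

# Unit sphere x1^2 + x2^2 + y^2 - 1 = 0
A_sphere = np.diag([1.0, 1.0, 1.0, -1.0])
print(classify_quadric(A_sphere))  # ellipsoid (possibly imaginary)
# Hyperboloid of one sheet x1^2 + x2^2 - y^2 - 1 = 0
A_hyp1 = np.diag([1.0, 1.0, -1.0, -1.0])
print(classify_quadric(A_hyp1))    # hyperboloid of one sheet
```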


slide-50
SLIDE 50

Decision boundaries

Geometric analysis of decision boundaries (d = 2) 3/6

Proposition

Let (x1, y1), . . . , (xN, yN) be a sample drawn from a population Ω = Ω1 ∪ Ω2 and assume that the conditional distribution of Y|x follows (Gaussian) MR. If β11/β12 ≠ β21/β22, then the decision boundary between Ω1 and Ω2 is a hyperbolic paraboloid; otherwise it is a cylinder (degenerate quadric).

  • Proof. For MR we have:

|A| = (s²ε,2 β10 − s²ε,1 β20)² (β12 β21 − β11 β22)² / (4 s⁶ε,1 s⁶ε,2) and |A44| = 0.

If β11/β12 ≠ β21/β22 then |A| ≠ 0, and the decision boundaries are hyperbolic paraboloids. If β11/β12 = β21/β22 then |A| = 0, and the decision boundaries are cylinders (degenerate quadrics).


slide-51
SLIDE 51

Decision boundaries

Geometric analysis of decision boundaries (d = 2) 4/6

Proposition

Let (x1, y1), . . . , (xN, yN) be a sample drawn from a population Ω = Ω1 ∪ Ω2 and assume that the conditional distribution of Y|x follows MRC. Then the decision boundary between Ω1 and Ω2 is a hyperbolic paraboloid.

  • Proof. For MRC we have:

|A| = [ 2(s²ε,1 β20 − s²ε,2 β10)(β11 β22 − β12 β21) + s²ε,1 s²ε,2 ( (β22 − β12)(w11 − w21) + (β11 − β21)(w12 − w22) ) ]² / (16 s⁶ε,1 s⁶ε,2)

and |A44| = 0. In general |A| > 0, and then the decision boundaries are hyperbolic paraboloids.


slide-52
SLIDE 52

Decision boundaries

Geometric analysis of decision boundaries (d = 2) 5/6

Proposition

Let (x1, y1), . . . , (xN, yN) be a sample drawn from a population Ω = Ω1 ∪ Ω2 and assume that the joint distribution of (X, Y) follows CWM. If Σ1 ≠ Σ2, the decision boundary between Ω1 and Ω2 is either a hyperboloid of one sheet or an ellipsoid. If Σ1 = Σ2, the decision boundary between Ω1 and Ω2 is a hyperbolic paraboloid.

  • Proof. For CWM we have in general |A| ≠ 0.

If Σ1 ≠ Σ2, then it results |A44| ≠ 0 and the decision boundaries are hyperboloids of one sheet or ellipsoids.

If Σ1 = Σ2, then it results |A44| = 0 and the decision boundaries are hyperbolic paraboloids.


slide-53
SLIDE 53

Decision boundaries

Geometric analysis of decision boundaries (d = 2) 6/6

MR/MRC: CWM with Σ1 ≠ Σ2: CWM with Σ1 = Σ2:


slide-54
SLIDE 54

Decision boundaries

Geometric analysis of decision boundaries

No characterization of quadrics exists in more than three dimensions. In terms of number of parameters, MR < MRC < CWM. From a geometrical point of view, the more complex the model is, the more flexible the family of decision boundaries is.


slide-55
SLIDE 55

Decision boundaries

Geometric analysis of decision boundaries - MR

Student-t models do not yield quadrics, so the analysis is carried out via simulation studies. Degrees of freedom of X|Ωg: ν1 = ν2 = 50. Plots for different degrees of freedom of Y|x, Ωg, with ζ1 = ζ2:

[Plots for ζ1 = ζ2 = 3, ζ1 = ζ2 = 10, ζ1 = ζ2 = 30.] Dotted lines refer to the decision boundaries of Gaussian models with the same parameters (except the degrees of freedom).


slide-56
SLIDE 56

Decision boundaries

Geometric analysis of decision boundaries - CWM

Degrees of freedom of X|Ωg: ν1 = ν2 = 50. Plots for different degrees of freedom of Y|x, Ωg, with ζ1 = ζ2:

[Plots for ζ1 = ζ2 = 3, ζ1 = ζ2 = 30, ζ1 = ζ2 = 130.] Dotted lines refer to the decision boundaries of Gaussian models with the same parameters (except the degrees of freedom).

Why?

Unlike the Gaussian case, if X|Ωg and Y|x, Ωg are t-distributed, the joint distribution of (X, Y) is in general not t.


slide-57
SLIDE 57

Generalized Cluster-Weighted Models

Generalized Linear Gaussian CWM

Mixtures of GLMs can be extended in the cluster-weighted framework, yielding the model

p(x, y|ψ) = Σ_{g=1}^{G} f(y|x, βg, λg) φd(x; µg, Σg) πg,

which will be referred to as the generalized linear Gaussian CWM, see Ingrassia et al. (2015)15. Here y ∈ Y ⊆ R. For the sake of simplicity, we assume Gaussian marginals, i.e. X|Ωg ∼ Nd(µg, Σg) (but Student-t can be assumed as well). Discrete response variables Y can be modeled according to the Binomial and Poisson distributions. Such models will be referred to as the Binomial Gaussian CWM (Y = {0, 1, . . . , M}) and the Poisson Gaussian CWM (Y = N).

15Ingrassia S., Punzo A., Vittadini G., Minotti S.C. (2015), The Generalized Linear Mixed

Cluster-Weighted Model, Journal of Classification, 32, n.1, 85-113.
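A minimal sketch of the Poisson Gaussian CWM density for d = 1 with a log link, evaluated directly from its definition; the function name and parameter values are illustrative:

```python
from math import exp, lgamma, log, pi, sqrt

def poisson_gaussian_cwm_density(x, y, params):
    """p(x, y) = sum_g pi_g * Pois(y; lambda_g(x)) * N(x; mu_g, sigma_g^2), d = 1,
    with log link lambda_g(x) = exp(b_g0 + b_g1 * x)."""
    total = 0.0
    for (pi_g, b0, b1, mu, sigma) in params:
        lam = exp(b0 + b1 * x)                          # Poisson rate via log link
        log_pois = y * log(lam) - lam - lgamma(y + 1)   # log Pois(y; lam)
        log_norm = -0.5 * ((x - mu) / sigma) ** 2 - log(sigma * sqrt(2 * pi))
        total += pi_g * exp(log_pois + log_norm)
    return total

# Each tuple: (pi_g, beta_g0, beta_g1, mu_g, sigma_g) -- illustrative values
params = [(0.6, 0.2, 0.5, -1.0, 1.0),
          (0.4, 1.0, -0.3, 2.0, 0.5)]
print(poisson_gaussian_cwm_density(0.0, 1, params))
```

Summing this density over y recovers the Gaussian mixture marginal of X, which is a quick sanity check on the implementation.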


slide-58
SLIDE 58

More recent developments

CWMs for high-dimensional X-spaces

In order to extend the CWM to high-dimensional X-spaces, we assume a latent Gaussian factor structure for X, with q ≪ d factors in each mixture component, which leads to a factor regression model of Y on x. This yields the linear Gaussian Cluster-Weighted Factor Analyzers (CWFA) model16

p(x, y|θ) = Σ_{g=1}^{G} φ(y|x; βg, σ²g) φd(x; µg, Σg) πg,

where Σg = Λg Λ′g + Ψg and θ denotes the vector of all model parameters.

This model has been further extended in order to incorporate common/uncommon t-factor analyzers for the covariates, and a t-distribution for the response variable in each mixture component, see Subedi et al. (2015)17

16Subedi S., Punzo A., Ingrassia S., McNicholas P.D. (2013). Clustering and Classification via Cluster-Weighted Factor Analyzers, Advances in Data Analysis and Classification, 7, n.1, 5-40.

17Subedi S., Punzo A., Ingrassia S., McNicholas P.D. (2015). Cluster-Weighted t-Factor Analyzers for Robust Model-Based Clustering and Dimension Reduction, Statistical Methods and Applications, 24, n.4, 623-649.
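A short sketch of the factor-analytic covariance structure Σg = ΛgΛ′g + Ψg and the parameter savings it brings; the dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, q = 6, 2                                    # observed dimension, latent factors

Lambda = rng.normal(size=(d, q))               # factor loadings Lambda_g
Psi = np.diag(rng.uniform(0.1, 0.5, size=d))   # diagonal noise covariance Psi_g

# Component covariance with the factor structure Sigma_g = Lambda Lambda' + Psi
Sigma = Lambda @ Lambda.T + Psi

# Free covariance parameters: d*q + d under the factor structure
# versus d*(d+1)/2 for an unconstrained symmetric matrix.
n_factor = d * q + d
n_full = d * (d + 1) // 2
print(n_factor, n_full)  # 18 vs 21; the savings grow quickly with d
```

Since ΛΛ′ is positive semi-definite and Ψ has strictly positive diagonal entries, Σg is guaranteed to be a valid (positive-definite) covariance matrix.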


slide-59
SLIDE 59

More recent developments

Robust CWM 1/2

Current approaches to robust MR through trimming are based on a two-step procedure (along both X and Y), see García-Escudero et al. (2010)18, which extends the TCLUST methodology proposed in García-Escudero et al. (2008)19 to the context of robust MR:

1. In the first step, data are trimmed in order to avoid the effect of outliers in Y: the proportion 1 − α1 of observations which attain the highest values of maxg=1,...,G{f(yn|xn, βg, σg)πg} is retained;

2. in the second step, a second trimming of size α2 is applied, considering only the covariate values of the observations surviving the first trimming, in order to diminish the effect of leverage points.

18García-Escudero L.A., Gordaliza A., Mayo-Iscar A., San Martín R. (2010). Robust clusterwise linear regression through trimming, Computational Statistics and Data Analysis, 54, 3057-3069.

19García-Escudero L.A., Gordaliza A., Matrán C., Mayo-Iscar A. (2008). A general trimming approach to robust cluster analysis, Annals of Statistics, 36, 1324-1345.
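The two trimming steps can be sketched as follows; the scores and the leverage measure are placeholders for the actual model-based quantities, so this is only a structural illustration:

```python
import numpy as np

def two_step_trim(x, y, resp_scores, alpha1=0.1, alpha2=0.05):
    """Sketch of the two-step trimming: first drop the alpha1 fraction with the
    lowest scores max_g pi_g f(y_n | x_n, ...) (outliers in Y), then drop the
    alpha2 fraction of survivors with the most extreme covariates (leverage
    points).  Returns the indices of the retained observations."""
    n = len(y)
    order = np.argsort(resp_scores)                      # ascending scores
    keep1 = order[int(np.floor(alpha1 * n)):]            # step 1: drop worst-fit points
    lev = np.abs(x[keep1] - np.median(x[keep1]))         # crude leverage measure
    n_drop2 = int(np.floor(alpha2 * len(keep1)))
    keep2 = keep1[np.argsort(lev)[:len(keep1) - n_drop2]]  # step 2: drop extremes in X
    return np.sort(keep2)

x = np.array([0.0, 0.1, 0.2, 0.3, 9.0])   # last point: leverage point in X
y = np.array([1.0, 1.1, 1.2, 8.0, 1.4])   # fourth point: outlier in Y
scores = np.array([0.9, 0.8, 0.85, 0.01, 0.7])
print(two_step_trim(x, y, scores, alpha1=0.2, alpha2=0.25))
# -> [0 1 2]: the Y-outlier and the leverage point are both trimmed
```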


slide-60
SLIDE 60

More recent developments

Robust CWM 2/2

The CWM provides a natural framework to approach such problems through a single "global" model. In García-Escudero et al. (2016)20, constraints on the estimation of the Gaussian CWM are introduced to avoid not only the singularities of the objective function, but also the appearance of spurious solutions. The trimmed CWM methodology is based on the maximization of the log-likelihood

Σ_{n=1}^{N} z(yn, xn) log [ Σ_{g=1}^{G} φ(yn; β′g xn, σ²g) φd(xn; µg, Σg) πg ],

where z(·, ·) is a 0-1 trimming indicator function that indicates whether observation (yn, xn) is trimmed off (z(yn, xn) = 0) or not (z(yn, xn) = 1). A fixed fraction α of observations is allowed to remain unassigned by setting Σ_{n=1}^{N} z(yn, xn) = [N(1 − α)], where the parameter α denotes the trimming level.

20García-Escudero L.A., Gordaliza A., Greselin F., Ingrassia S., Mayo-Iscar A. (2016). Robust estimation of mixtures of regressions with random covariates, via trimming and constraints, Statistics and Computing, DOI 10.1007/s11222-016-9628-3, forthcoming.
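A sketch of the trimmed CWM log-likelihood for d = 1 with a fixed trimming indicator z; the component parameters are illustrative:

```python
from math import exp, log, pi, sqrt

def trimmed_loglik(x, y, z, comps):
    """Trimmed CWM log-likelihood (d = 1): sum over untrimmed observations of
    log sum_g pi_g * N(y; b0 + b1*x, s2) * N(x; mu, sig2).
    comps: list of (pi_g, b0, b1, s2, mu, sig2); z: 0-1 trimming indicator."""
    def npdf(v, m, var):
        return exp(-0.5 * (v - m) ** 2 / var) / sqrt(2 * pi * var)
    ll = 0.0
    for xn, yn, zn in zip(x, y, z):
        if zn == 0:
            continue                     # trimmed observation: no contribution
        ll += log(sum(pg * npdf(yn, b0 + b1 * xn, s2) * npdf(xn, mu, sg2)
                      for (pg, b0, b1, s2, mu, sg2) in comps))
    return ll

comps = [(0.5, 0.0, 1.0, 0.5, -1.0, 1.0),
         (0.5, 2.0, -1.0, 0.5, 2.0, 1.0)]
x = [-1.0, 2.1, 8.0]
y = [-1.0, -0.1, 0.0]
z = [1, 1, 0]   # trim the outlying third point; sum(z) = [N(1 - alpha)], alpha = 1/3
print(trimmed_loglik(x, y, z, comps))
```

Trimming the gross outlier removes a strongly negative log term, so the trimmed log-likelihood exceeds the untrimmed one on this toy sample.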


slide-61
SLIDE 61

More recent developments

Unsupervised Learning of Finite Mixtures with Covariates from Incomplete Data

Unsupervised learning of statistical models from incomplete data is a relevant topic in many practical problems, because real-world data quite often involve missing values. New research focuses on learning the relationship between a response variable and covariates from incomplete data coming from a heterogeneous population. The problem is approached by means of mixtures of regressions with random covariates, also referred to in the literature as cluster-weighted models21. To this end, an EM algorithm is proposed in the likelihood framework under the assumption of data missing at random. The performance of this approach is analysed through a large simulation study.

21Ingrassia S., Murray P.M., McNicholas P.D. (2016). Unsupervised Learning of Finite Mixtures with Covariates from Incomplete Data, submitted for publication.


slide-62
SLIDE 62

More recent developments

Thank you for your attention
