SLIDE 1

Flexible Mixture Modeling and Model-Based Clustering in R

Bettina Grün © September 2017 – Flexible Mixture Modeling and Model-Based Clustering in R

SLIDE 2

Outline

SLIDE 3

Program

Day 1: Afternoon 13:30–14:00 Registration 14:00–15:40 Lecture Introduction Estimation and inference 15:40–16:00 COFFEE BREAK 16:00–17:00 Practicals Day 2: Morning 09:00–10:40 Lecture Mixtures of normal distributions Package mclust 10:40–11:00 COFFEE BREAK 11:00–12:00 Practicals

SLIDE 4

Program / 2

Day 2: Afternoon 14:00–15:40 Lecture Latent class analysis Package flexmix Mixtures of regression models 15:40–16:00 COFFEE BREAK 16:00–17:00 Practicals Day 3: Morning 09:00–10:40 Lecture Extensions and variants Extending flexmix 10:40–11:00 COFFEE BREAK 11:00–12:00 Practicals

SLIDE 5

Material

Website:

http://www.stat.tugraz.at/friedl/FLEXMIX/

E-Mail: Bettina.Gruen@jku.at

SLIDE 6

Introduction

SLIDE 7

Finite mixture models

Flexible model class with different special cases depending on data types and application.

Types of application:
- Semi-parametric tool to approximate general distribution functions.
- Modeling of unobserved heterogeneity / detecting groups in data → model-based clustering.

Areas of application: astronomy, biology, economics, . . .

SLIDE 8

Finite mixture models / 2

[Figure: histogram of faithful$waiting with the density on the vertical axis.]

SLIDE 9

Finite mixture models / 3

[Figure: histogram of faithful$waiting with the density on the vertical axis.]

SLIDE 10

Finite mixture models / 4

[Figure: scatter plot of two-dimensional data.]

SLIDE 11

Finite mixture models / 5

[Figure: scatter plot of two-dimensional data with components 1–4 indicated.]

SLIDE 12

Finite mixture models / 6

[Figure: scatter plot of yn against x.]

SLIDE 13

Finite mixture models / 7

[Figure: scatter plot of yn against x.]

SLIDE 14

Model definition

A finite mixture model is given by

    H(y|x, Θ) = ∑_{k=1}^K π_k F_k(y|x, ϑ_k),

where

    ∑_{k=1}^K π_k = 1   and   π_k > 0 for all k.
SLIDE 15

Model definition / 2

In the following we assume that the component-specific density functions f_k() exist, so that the density h() of the mixture distribution is defined as

    h(y|x, Θ) = ∑_{k=1}^K π_k f_k(y|x, ϑ_k).

Arbitrary, and also different, densities f_k() can be used for each component. In most applications, however, one assumes that f_k() is from the same parametric distribution family or model for all k = 1, . . . , K; the component-specific densities then differ only in the values of the parameter vector ϑ_k.
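The mixture density above can be evaluated directly; a minimal sketch in base R for the univariate normal case (the function name dmix and its arguments are illustrative, not from any package):

```r
# Density of a K-component univariate normal mixture:
# h(y | Theta) = sum_k pi_k * dnorm(y, mu_k, sigma_k).
dmix <- function(y, pi, mu, sigma) {
  comp <- matrix(0, length(y), length(pi))
  for (k in seq_along(pi))
    comp[, k] <- pi[k] * dnorm(y, mu[k], sigma[k])
  rowSums(comp)   # sum over the K components for each observation
}

# Example: equal-weight mixture of N(0, 1) and N(4, 1)
dmix(c(0, 2, 4), pi = c(0.5, 0.5), mu = c(0, 4), sigma = c(1, 1))
```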

SLIDE 16

Estimation and inference

SLIDE 17

Estimation

The log-likelihood for N independent observations is given by

    log L(Θ) = ℓ(Θ) = ∑_{n=1}^N log ( ∑_{k=1}^K π_k f_k(y_n|x_n, ϑ_k) ).

Maximum-likelihood (ML) estimation:
- Direct optimization of the likelihood (mostly in simpler cases).
- Expectation-maximization (EM) algorithm for more complicated models (Dempster, Laird and Rubin, 1977).
- EM followed by direct optimization for an estimate of the Hessian.
- . . .

Bayesian estimation:
- Gibbs sampling (Diebolt and Robert, 1994), a Markov chain Monte Carlo algorithm. Applicable when the joint posterior distribution is not known explicitly, but the conditional posterior distributions of each variable / subset of variables are known.
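The observed-data log-likelihood above can be computed in one line; a sketch for the univariate normal case (the name mix_loglik is ours):

```r
# Observed-data log-likelihood l(Theta) = sum_n log( sum_k pi_k f_k(y_n) )
# for a univariate normal mixture; dnorm recycles mu and sigma over k.
mix_loglik <- function(y, pi, mu, sigma) {
  sum(log(sapply(y, function(yn) sum(pi * dnorm(yn, mu, sigma)))))
}
```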

SLIDE 18

EM algorithm

General method for ML estimation in models with unobserved latent variables. The complete-data log-likelihood combines the observed and the unobserved / missing data and is easier to maximize. The algorithm iterates between the E-step, which computes the expectation of the complete-data log-likelihood, and the M-step, in which this expected complete-data log-likelihood is maximized.

SLIDE 19

Missing data

The component label vectors z_n = (z_nk)_{k=1,...,K} are treated as missing data. It holds that z_nk ∈ {0, 1} and ∑_{k=1}^K z_nk = 1 for all n = 1, . . . , N.

The complete-data log-likelihood is given by

    log L_c(Θ) = ∑_{k=1}^K ∑_{n=1}^N z_nk [log π_k + log f_k(y_n|x_n, ϑ_k)].

SLIDE 20

EM algorithm: E-step

Given the current parameter estimates Θ^(i), replace the missing data z_nk by the estimated a-posteriori probabilities

    ẑ_nk^(i) = P(k|y_n, x_n, Θ^(i)) = π_k^(i) f_k(y_n|x_n, ϑ_k^(i)) / ∑_{u=1}^K π_u^(i) f_u(y_n|x_n, ϑ_u^(i)).

The conditional expectation of log L_c(Θ) at the i-th step is given by

    Q(Θ; Θ^(i)) = E_{Θ^(i)}[log L_c(Θ)|y, x]
                = ∑_{k=1}^K ∑_{n=1}^N ẑ_nk^(i) [log π_k + log f_k(y_n|x_n, ϑ_k)].

SLIDE 21

EM algorithm: M-step

The next parameter estimate is given by

    Θ^(i+1) = arg max_Θ Q(Θ; Θ^(i)).

The estimates for the component sizes are given by

    π_k^(i+1) = (1/N) ∑_{n=1}^N ẑ_nk^(i).

The component-specific parameter estimates are determined by

    ϑ_k^(i+1) = arg max_{ϑ_k} ∑_{n=1}^N ẑ_nk^(i) log f_k(y_n|x_n, ϑ_k).

⇒ Weighted ML estimation of the component-specific model.

SLIDE 22

Mixtures of two univ. normal distributions: Example

Finite mixture of two univariate normal distributions:

    h(y|Θ) = π f_N(y|µ_1, σ_1²) + (1 − π) f_N(y|µ_2, σ_2²),

where f_N(·|µ, σ²) is the density of the univariate normal distribution with mean µ and variance σ².

SLIDE 23

Mixtures of two univ. normal distributions: Example / 2

1. Initialize the parameters µ_1^(0), σ_1^(0), µ_2^(0), σ_2^(0) and π^(0). Set i = 0.

2. E-step: determine the a-posteriori probabilities. The contributions to the likelihood for each observation and component are

       z̃_n1^(i) = π^(i) f_N(y_n|µ_1^(i), (σ_1²)^(i)),
       z̃_n2^(i) = (1 − π^(i)) f_N(y_n|µ_2^(i), (σ_2²)^(i)).

   The a-posteriori probability is given by

       ẑ_n1^(i) = z̃_n1^(i) / (z̃_n1^(i) + z̃_n2^(i)).

SLIDE 24

Mixtures of two univ. normal distributions: Example / 3

3. M-step:

       π^(i+1) = (1/N) ∑_{n=1}^N ẑ_n1^(i),

       µ_1^(i+1) = ∑_{n=1}^N ẑ_n1^(i) y_n / (N π^(i+1)),
       µ_2^(i+1) = ∑_{n=1}^N (1 − ẑ_n1^(i)) y_n / (N (1 − π^(i+1))),

       (σ_1²)^(i+1) = ∑_{n=1}^N ẑ_n1^(i) (y_n − µ_1^(i+1))² / (N π^(i+1)),
       (σ_2²)^(i+1) = ∑_{n=1}^N (1 − ẑ_n1^(i)) (y_n − µ_2^(i+1))² / (N (1 − π^(i+1))).

   Increase i by 1.

4. Iterate between E- and M-step until convergence.
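The steps above can be sketched as a short base R function (em2norm and its arguments are our names; the course material does not prescribe this implementation):

```r
# EM for a mixture of two univariate normal distributions,
# following the E- and M-steps above.
em2norm <- function(y, pi, mu, sigma2, maxit = 500, tol = 1e-10) {
  ll_old <- -Inf
  for (i in seq_len(maxit)) {
    # E-step: likelihood contributions and a-posteriori probabilities
    a1 <- pi * dnorm(y, mu[1], sqrt(sigma2[1]))
    a2 <- (1 - pi) * dnorm(y, mu[2], sqrt(sigma2[2]))
    z1 <- a1 / (a1 + a2)
    # M-step: weighted ML estimates (note sum(z1) = N * pi^(i+1))
    pi <- mean(z1)
    mu <- c(sum(z1 * y) / sum(z1), sum((1 - z1) * y) / sum(1 - z1))
    sigma2 <- c(sum(z1 * (y - mu[1])^2) / sum(z1),
                sum((1 - z1) * (y - mu[2])^2) / sum(1 - z1))
    # stop when the log-likelihood no longer increases noticeably
    ll <- sum(log(a1 + a2))
    if (ll - ll_old < tol) break
    ll_old <- ll
  }
  list(pi = pi, mu = mu, sigma2 = sigma2, loglik = ll)
}

# Example on simulated, well-separated two-component data
set.seed(1)
y <- c(rnorm(200, 0, 1), rnorm(200, 5, 1))
em2norm(y, pi = 0.5, mu = c(-1, 6), sigma2 = c(1, 1))
```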

SLIDE 25

EM algorithm

Advantages:
- The likelihood is increased in each step → the EM algorithm converges for bounded likelihoods.
- Relatively easy to implement: different mixture models require only different M-steps.
- Weighted ML estimation of the component-specific model is sometimes already available in standard software.

Disadvantages:
- Standard errors have to be determined separately, as the information matrix is not required during the algorithm.
- Convergence only to a local optimum.
- Slow convergence.

SLIDE 26

Why does EM work?

By the definition of conditional probabilities it holds that

    P(z|y, Θ′) = P(z, y|Θ′) / P(y|Θ′).

This can be transformed to

    P(y|Θ′) = P(z, y|Θ′) / P(z|y, Θ′).

For the log-likelihoods it thus holds that

    ℓ(Θ′; y) = ℓ_c(Θ′; y, z) − ℓ_1(Θ′; z|y).

SLIDE 27

Why does EM work? / 2

Taking expectations with respect to the conditional density of z given y and Θ gives

    ℓ(Θ′; y) = E[ℓ_c(Θ′; y, z)|y, Θ] − E[ℓ_1(Θ′; z|y)|y, Θ]
             = Q(Θ′, Θ) − R(Θ′, Θ).

R(Θ′, Θ) is the expected value of the log-likelihood of a density with parameter Θ′, taken with respect to the same density with parameter value Θ. By Jensen's inequality, R(Θ′, Θ) is maximal for Θ′ = Θ. Hence, if Θ′ maximizes Q(Θ′, Θ), it holds that

    ℓ(Θ′; y) − ℓ(Θ; y) = (Q(Θ′, Θ) − Q(Θ, Θ)) − (R(Θ′, Θ) − R(Θ, Θ)) ≥ 0.

SLIDE 28

EM as a Max-Max method

The EM algorithm can also be interpreted as a joint maximization method. Consider the function

    F(Θ′, P̃) = E_P̃[ℓ_c(Θ′; y, z)] − E_P̃[log P̃(z)].

For fixed Θ′, F is maximized with respect to P̃(z) by P̃(z) = P(z|y, Θ′). In the M-step, F() is maximized with respect to Θ′, keeping P̃ fixed. The observed log-likelihood and F(Θ′, P̃) have the same value for P̃(z) = P(z|y, Θ′).

SLIDE 29

EM as a Max-Max method / 2

[Figure: contour plot of F over the latent data parameter and the model parameter.]

SLIDE 30

EM algorithm: Variants

Classification EM (CEM): assigns each observation to the component with the maximum a-posteriori probability.
- In general faster convergence than EM.
- Converges to the classification likelihood.

Stochastic EM (SEM): assigns each observation to one component by drawing from the multinomial distribution induced by the a-posteriori probabilities.
- Does not converge to the "closest" local optimum given the initial values.
- If started with too many components, some components will eventually become empty and will be eliminated.
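The variants differ only in how the a-posteriori probabilities are turned into component information; a small sketch (post_to_z is our name, not from any package):

```r
# Convert an N x K matrix of a-posteriori probabilities into the
# component information used by EM, CEM and SEM, respectively.
post_to_z <- function(post, method = c("EM", "CEM", "SEM")) {
  method <- match.arg(method)
  switch(method,
    EM  = post,                                # keep the soft weights
    CEM = t(apply(post, 1, function(p)         # hard MAP assignment
      as.numeric(seq_along(p) == which.max(p)))),
    SEM = t(apply(post, 1, function(p)         # one multinomial draw
      rmultinom(1, 1, p)[, 1])))
}
```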

SLIDE 31

Bayesian estimation

Determine the posterior density using Bayes' theorem,

    p(Θ|Y, X) ∝ h(Y|X, Θ) p(Θ),

where p(Θ) is the prior and Y = (y_n)_n, X = (x_n)_n.

Standard prior distributions:
- Proper priors: improper priors give improper posteriors.
- Independent priors for the component weights and the component-specific parameters.
- Conjugate priors for the complete likelihood:
  - Dirichlet distribution D(e_{0,1}, . . . , e_{0,K}) for the component weights, which is the conjugate prior for the multinomial distribution.
  - Priors on the component-specific parameters depend on the underlying distribution family.
- Invariant priors, e.g. the parameter of the Dirichlet prior is constant over all components: e_{0,k} ≡ e_0.

SLIDE 32

Estimation: Gibbs sampling

Starting with Z^(0) = (z_n^(0))_{n=1,...,N}, repeat the following steps for i = 1, . . . , I_0, . . . , I + I_0.

1. Parameter simulation conditional on the classification Z^(i−1):
   a. Sample π_1, . . . , π_K from D((∑_{n=1}^N z_nk^(i−1) + e_{0,k})_{k=1,...,K}).
   b. Sample the component-specific parameters from the complete-data posterior p(ϑ_1, . . . , ϑ_K|Z^(i−1), Y).
   Store the actual values of all parameters, Θ^(i) = (π_k^(i), ϑ_k^(i))_{k=1,...,K}.

2. Classification of each observation (y_n, x_n) conditional on knowing Θ^(i): sample z_n^(i) from the multinomial distribution with parameter equal to the posterior probabilities.

After discarding the burn-in draws, the draws I_0 + 1, . . . , I + I_0 can be used to approximate all quantities of interest (after resolving label switching).

SLIDE 33

Example: Gaussian distribution

Assume an independence prior,

    p(µ_k, Σ_k⁻¹) ∼ f_N(µ_k; b_0, B_0) f_W(Σ_k⁻¹; c_0, C_0).

1. Parameter simulation conditional on the classification Z^(i−1):
   a. Sample π_1^(i), . . . , π_K^(i) from D((∑_{n=1}^N z_nk^(i−1) + e_{0,k})_{k=1,...,K}).
   b. Sample (Σ_k⁻¹)^(i) in each group k from a Wishart W(c_k(Z^(i−1)), C_k(Z^(i−1))) distribution.
   c. Sample µ_k^(i) in each group k from a N(b_k(Z^(i−1)), B_k(Z^(i−1))) distribution.

2. Classification of each observation y_n conditional on knowing Θ^(i):

    P(z_nk^(i) = 1|y_n, Θ^(i)) ∝ π_k f_N(y_n; µ_k, Σ_k).
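For the univariate case the two Gibbs steps can be sketched in base R. This is a deliberately simplified sketch under our own assumptions: the component variances sigma2 are treated as known, so the Wishart draw is omitted, and all names are illustrative.

```r
# Gibbs sampler sketch for a univariate normal mixture with known
# component variances. Priors: pi ~ Dirichlet(e0,...,e0), mu_k ~ N(b0, B0).
gibbs_mix <- function(y, K, sigma2 = 1, e0 = 1, b0 = 0, B0 = 100,
                      I0 = 100, I = 400) {
  N <- length(y)
  z <- sample(K, N, replace = TRUE)          # random initial classification
  draws <- matrix(NA, I, 2 * K)
  for (i in seq_len(I0 + I)) {
    # (1a) weights from the Dirichlet full conditional (via gammas)
    Nk <- tabulate(z, K)
    g  <- rgamma(K, Nk + e0)
    pi <- g / sum(g)
    # (1b) means from the normal full conditionals
    mu <- numeric(K)
    for (k in seq_len(K)) {
      Bk <- 1 / (1 / B0 + Nk[k] / sigma2)
      bk <- Bk * (b0 / B0 + sum(y[z == k]) / sigma2)
      mu[k] <- rnorm(1, bk, sqrt(Bk))
    }
    # (2) classify each observation given the current parameters
    post <- sapply(seq_len(K), function(k)
      pi[k] * dnorm(y, mu[k], sqrt(sigma2)))
    z <- apply(post, 1, function(p) sample(K, 1, prob = p))
    if (i > I0) draws[i - I0, ] <- c(pi, mu)   # keep post burn-in draws
  }
  colnames(draws) <- c(paste0("pi", 1:K), paste0("mu", 1:K))
  draws
}
```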

SLIDE 34

Estimation: Gibbs sampling

Advantages:
- Relatively easy to implement: different mixture models differ only in the parameter simulation step.
- Parameter simulation conditional on the classification is sometimes already available.

Disadvantages:
- Might fail to escape the attraction area of one mode → not all posterior modes are visited.

SLIDE 35

Selecting the number of components

- A-priori known.
- Information criteria, e.g. AIC, BIC, ICL:

      AIC = −2 log(L) + 2p,
      BIC = −2 log(L) + log(N) p,
      ICL = −2 log(L) + log(N) p + 2 ent,

  where p is the number of estimated parameters and ent denotes the mean entropy of the a-posteriori probabilities.
- Likelihood-ratio test statistic in a ML framework: comparison of nested models ⇒ regularity conditions are not fulfilled.
- Bayes factors in a Bayesian framework.
- Sampling schemes with a varying number of components in a Bayesian framework: reversible-jump MCMC; inclusion of birth-and-death processes.
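The three criteria are straightforward to compute from a fitted model; a sketch following the formulas above, with ent taken as the mean entropy of the a-posteriori probabilities as on this slide (info_criteria is our name):

```r
# AIC, BIC and ICL from the maximized log-likelihood logL, the number of
# estimated parameters p, the number of observations N, and the N x K
# matrix of a-posteriori probabilities post. All criteria are minimized.
info_criteria <- function(logL, p, N, post) {
  # mean entropy; the ifelse() guards against 0 * log(0)
  ent <- -mean(rowSums(ifelse(post > 0, post * log(post), 0)))
  c(AIC = -2 * logL + 2 * p,
    BIC = -2 * logL + log(N) * p,
    ICL = -2 * logL + log(N) * p + 2 * ent)
}
```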

SLIDE 36

Initialization

Construct a suitable parameter vector Θ^(0):
- Random.
- Other estimation methods, e.g. moment estimators.

Classify observations / assign a-posteriori probabilities to each observation:
- Random.
- Cluster analysis results, e.g. hierarchical clustering, k-means.

Use short runs of EM, CEM or SEM with different initializations (Biernacki et al., 2003). Use different subsets of the complete data (Wehrens et al., 2004).

SLIDE 37

Resampling for model diagnostics

- Testing for the number of components: for example, the likelihood-ratio test using the parametric bootstrap (McLachlan, 1987).
- Standard errors for the parameter estimates: for example, using the parametric bootstrap with initialization in the solution (Basford et al., 1997).
- Identifiability: for example, by testing for unimodality of the component-specific parameter estimates using either the empirical or the parametric bootstrap with random initialization.
- Stability of induced partitions: for example, by comparing the results based on the Rand index corrected for chance (Hubert & Arabie, 1985) using either the empirical or the parametric bootstrap with random initialization.

SLIDE 38

Identifiability

Three kinds of identifiability issues:
- Label switching.
- Overfitting: leads to empty components or components with equal parameters.
- Generic unidentifiability.

SLIDE 39

Label switching

The posterior distribution is invariant under a permutation of the components with the same component-specific model.

⇒ Determine a unique labeling for component-specific inference:
- Impose a suitable ordering constraint, e.g. π_s < π_t for all s, t ∈ {1, . . . , K} with s < t.
- Minimize the distance to the maximum-a-posteriori (MAP) estimate.
- Fix the component membership for some observations.
- Relabeling algorithms.
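A minimal sketch of the ordering-constraint approach for MCMC output (relabel_by_mu is our name; here each draw is permuted so the component means increase, an alternative to ordering by the weights):

```r
# Relabel MCMC draws by ordering the component means increasingly.
# pi and mu are (draws x K) matrices of weight and mean draws.
relabel_by_mu <- function(pi, mu) {
  for (i in seq_len(nrow(mu))) {
    o <- order(mu[i, ])       # permutation that sorts the means
    mu[i, ] <- mu[i, o]
    pi[i, ] <- pi[i, o]       # apply the same permutation to the weights
  }
  list(pi = pi, mu = mu)
}
```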

SLIDE 40

Mixtures of distributions

SLIDE 41

Model definition

A finite mixture of distributions is given by

    H(y|Θ) = ∑_{k=1}^K π_k F_k(y|ϑ_k),

where

    ∑_{k=1}^K π_k = 1   and   π_k > 0 for all k.

In the case of mixtures of distributions sometimes one of the components is assumed to be from a different parametric family, e.g., to model noise.

SLIDE 42

Distributions in the components

We assume that we have p-dimensional data with:
- Metric variables ⇒ mixtures of multivariate normal distributions.
- Binary / categorical variables ⇒ latent class analysis: assumes that the variables within each component are independent.
- Both variable types ⇒ mixed-mode data: assuming again that the variables are independent within each component allows the two cases to be treated separately (Hunt and Jorgensen, 1988).

SLIDE 43

Generic identifiability

- Identifiable: (multivariate) normal, gamma, exponential, Cauchy and Poisson distributions in the components.
- Not identifiable: continuous or discrete uniform distributions in the components.
- Identifiable under certain conditions: mixtures of binomial and multinomial distributions are identifiable if N ≥ 2K − 1, where N is the parameter representing the number of trials.

See for example Everitt & Hand (1981), Titterington et al. (1985), McLachlan & Peel (2000).

SLIDE 44

Mixtures of normal distributions

SLIDE 45

Univariate normal distributions

The density of a finite mixture of univariate normal distributions is given by

    h(y|Θ) = ∑_{k=1}^K π_k f_N(y|µ_k, σ_k²).

The likelihood is unbounded for degenerate solutions with σ_k² = 0.

⇒ Add a constraint that
1. σ_k² > ε for all k, or
2. π_k > ε for all k (which in general achieves a similar effect).

SLIDE 46

Univariate normal distributions / 2

[Figure: histogram of faithful$waiting with the density on the vertical axis.]

SLIDE 47

Univariate normal distributions / 3

[Figure: histogram of faithful$waiting with the fitted mixture density.]

SLIDE 48

Univariate normal distributions / 4

The estimated model has the following parameters:

           Comp.1   Comp.2
    pi       0.64     0.36
    mu      80.10    54.63
    sigma    5.87     5.89

SLIDE 49

Multivariate normal distributions

For p-dimensional data the number of parameters to estimate is

    K − 1 + K · (p + p(p + 1)/2)

for unrestricted variance-covariance matrices.
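As a quick check, the count can be wrapped in a small helper (n_par is our name):

```r
# Free parameters of a K-component, p-dimensional normal mixture with
# unrestricted covariance matrices: (K - 1) weights plus, per component,
# p means and p(p + 1)/2 covariance entries.
n_par <- function(K, p) (K - 1) + K * (p + p * (p + 1) / 2)

n_par(3, 2)   # 3 components in 2 dimensions -> 17
```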

SLIDE 50

Multivariate normal distributions / 2

More parsimonious model specifications are:
- Restrict the variance-covariance matrices to be the same over components.
- Specify the structure of the variance-covariance matrix using its decomposition into volume, shape and orientation (Fraley & Raftery, 2002):

      Σ_k = λ_k D_k⊤ diag(a_k) D_k.

  For these three characteristics, restrictions can be imposed separately to be the same over components. In addition, the orientation can be assumed to be equal to the identity matrix and the shape to be spherical.

SLIDE 51

Multivariate normal distributions / 3

Specify a factor model for the variance-covariance matrix.
⇒ Mixtures of "factor analyzers" (McLachlan & Peel, 2000).
⇒ Analogous restrictions are possible to obtain "parsimonious Gaussian mixture models" (McNicholas & Murphy, 2008).

SLIDE 52

Multivariate normal distributions / 4

[Figure: scatter plot of two-dimensional data.]

SLIDE 53

Multivariate normal distributions / 5

[Figure: scatter plot with a four-component classification.]

SLIDE 54

Multivariate normal distributions / 6

[Figure: scatter plot with a five-component classification.]

SLIDE 55

Mixtures of normal distributions: Example

Example from Fraley & Raftery (2002), data from package spatstat. Estimation of a density for the occurrence of maple trees. 2-dimensional coordinates for 514 trees observed.

SLIDE 56

Mixtures of normal distributions: Example / 2

[Figure: scatter plot of the maple tree locations, x and y in the unit square.]

SLIDE 57

Mixtures of normal distributions: Example / 3

Mixtures with 1 to 9 components are fitted under all the different restrictions on the variance-covariance matrices. The best model has 7 components and spherical variance-covariance matrices with different volumes across components.

SLIDE 58

Mixtures of normal distributions: Example / 4

[Figure: classification of the maple trees (left) and contour plot of the fitted mixture density (right).]

SLIDE 59

Package mclust

SLIDE 60

Package mclust

- Written by Chris Fraley, Adrian Raftery, Luca Scrucca, Thomas Brendan Murphy.
- In particular suitable for mixtures of uni- and multivariate normal distributions.
- Uses model-based hierarchical clustering for initialization.
- Variance-covariance matrices can be specified via volume, shape and orientation.
- Function Mclust() returns the best model according to the BIC.

SLIDE 61

Package mclust: Model specification

univariate:
- "E": identical variance
- "V": different variances

multivariate: three-letter abbreviation for
- volume: equal (E) / variable (V)
- shape: identity (I) / equal (E) / variable (V)
- orientation: identity (I) / equal (E) / variable (V)

- "EII": spherical, same volume
- "VII": spherical, different volume
- "EEI": diagonal matrix, same volume & shape
- "VEI": diagonal matrix, different volume, same shape
- "EVI": diagonal matrix, same volume, different shape
- "VVI": diagonal matrix, different volume & shape

SLIDE 62

Package mclust: Model specification / 2

- "EEE": ellipsoidal, same volume, shape & orientation
- "EVE": ellipsoidal, same volume & orientation
- "VEE": ellipsoidal, same shape & orientation
- "VVE": ellipsoidal, same orientation
- "EEV": ellipsoidal, same volume & shape
- "VEV": ellipsoidal, same shape
- "EVV": ellipsoidal, same volume
- "VVV": ellipsoidal, different volume, shape and orientation

SLIDE 63

Package mclust: Model specification / 3

[Figure: covariance structures of the models EII, VII, EEI, VEI, EVI and VVI.]

SLIDE 64

Package mclust: Model specification / 4

[Figure: covariance structures of the models EEE, EVE, VEE, VVE, EEV, VEV, EVV and VVV.]

SLIDE 65

Package mclust: Function Mclust()

Mclust() has the following arguments:

- data: a numeric vector, a numeric matrix or a data frame of observations. Categorical variables are not allowed. If a matrix or a data frame is supplied, rows correspond to observations and columns to variables.
- G: an integer vector indicating the numbers of components for which the model should be fitted. The default is G = 1:9.
- modelNames: a character vector specifying the models (with different restrictions on the variance-covariance matrices) to fit.
- prior: the default is to use no prior. This argument allows conditionally conjugate priors on the means and the variance-covariance matrices of the components to be specified using priorControl().

SLIDE 66

Package mclust: Function Mclust() / 2

- control: a list of control parameters for the EM algorithm. The defaults are obtained by calling emControl().
- initialization: a list containing one or several of the elements "hcPairs", providing the results of the hierarchical clustering, and "subset", a logical or numeric vector specifying a subset.
- warn: a logical value indicating whether warnings (e.g., with respect to singularities) are issued. The default is to suppress warnings.
- ...: catches unused arguments.

Returns an object of class 'Mclust'.

SLIDE 67

Package mclust: Further methods

The fitted object of class 'Mclust' can be plotted using the associated plot method. The following plots can be generated:
- Comparison of the BIC values for all estimated models.
- Scatter plot indicating the classification for 2-dimensional data.
- 2-dimensional projection of the data with classification.
- 2-dimensional projection of the data with uncertainty of the classification.
- Estimated density for 1- and 2-dimensional data.

SLIDE 68

Package mclust: Maple trees

> data("lansing", package = "spatstat")
> maples <- subset(spatstat::as.data.frame.ppp(lansing),
+                  marks == "maple", c("x", "y"))
> library("mclust")
> maplesMix <- Mclust(maples)
> maplesMix
'Mclust' model object: best model: spherical, varying volume (VII) with 7 components
> par(mfrow = c(2, 2))
> plot(maplesMix, what = "BIC")
> plot(maplesMix, what = "classification")
> plot(maplesMix, what = "uncertainty")
> plot(maplesMix, what = "density")

SLIDE 69

Package mclust: Maple trees / 2

[Figure: four diagnostic plots for the maple trees fit: BIC values for all models and 1–9 components, classification, uncertainty, and a log-density contour plot.]

SLIDE 70

Latent class analysis

SLIDE 71

Latent class analysis

- Used for multivariate binary and categorical data.
- Often only local identifiability is ensured.
- Observed variables are correlated because of the unobserved group structure.

⇒ Within each component the variables are assumed to be independent, and the multivariate density is given by the product of the univariate densities.

SLIDE 72

Latent class analysis / 2

In general the model can be written as

h(y|Θ) = ∑_{k=1}^{K} π_k ∏_{j=1}^{p} f_{kj}(y_j | ϑ_{kj}).

In latent class analysis the densities f_{kj}(·) for all k and j are from the same univariate parametric distribution family. For binary variables the Bernoulli distribution is used: f(y|θ) = θ^y (1 − θ)^{1−y}.
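To make the formula concrete, here is a minimal numeric sketch in Python (used here only to illustrate the math — the course code itself is R, and all names below are invented for the example):

```python
import numpy as np

def lca_density(y, pi, theta):
    """Mixture of independent Bernoullis:
    h(y) = sum_k pi_k * prod_j theta_kj^y_j * (1 - theta_kj)^(1 - y_j)."""
    y = np.asarray(y)
    comp = np.prod(theta ** y * (1 - theta) ** (1 - y), axis=1)  # f_k(y), one value per component
    return float(pi @ comp)

pi = np.array([0.16, 0.84])                    # component sizes
theta = np.array([[0.58, 0.59, 0.22, 0.38],    # agreement probabilities, comp. 1
                  [0.02, 0.03, 0.04, 0.18]])   # agreement probabilities, comp. 2
p0000 = lca_density([0, 0, 0, 0], pi, theta)   # probability of answering "no" four times
```

Summing `lca_density` over all 2⁴ response patterns gives 1, which is a quick sanity check of the implementation.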


slide-73
SLIDE 73

Latent class analysis: Example

Example from Dayton (1999), data from package poLCA. Data from a survey among students about cheating: 319 students were questioned about their behavior during their bachelor studies. The following four questions are used, which ask about

1. lying to avoid taking an exam,
2. lying to avoid handing in a term paper on time,
3. purchasing a term paper to hand in as their own or obtaining a copy of an exam prior to taking the exam,
4. copying answers during an exam from someone sitting near to them.


slide-74
SLIDE 74

Latent class analysis: Example / 2

The relative frequencies of agreement with the four questions are 0.11, 0.12, 0.07 and 0.21. First it is assessed whether the answers to the different questions are independent given the marginal distributions. The density is given by

h(y|Θ) = ∏_{j=1}^{4} θ_j^{y_j} (1 − θ_j)^{1−y_j},

where θ_j is the probability that question j is agreed to. A χ²-goodness-of-fit test gives

χ² = 136.3, p-value < 2e−16.

This simple model does not fit the data well.

⇒ Extension to mixture models.


slide-75
SLIDE 75

Latent class analysis: Example / 3

The finite mixture model is given by

h(y|Θ) = ∑_{k=1}^{K} π_k ∏_{j=1}^{4} θ_{kj}^{y_j} (1 − θ_{kj})^{1−y_j},

where θ_{kj} is the probability of agreeing to question j if the respondent is from component k. This model is fitted with 2 components. The χ²-goodness-of-fit test gives

χ² = 8.3, p-value = 0.22.

The null hypothesis that the data come from this model cannot be rejected.
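Parameters of such a Bernoulli mixture can be obtained with EM; a compact Python sketch for illustration only (the slides use stepFlexmix() — the function name, initialization and weighting scheme below are invented for the example, with case weights mimicking the pattern frequencies of the cheating data):

```python
import numpy as np

def em_bernoulli_mixture(Y, freq, K=2, n_iter=200, seed=1):
    """EM for a K-component mixture of independent Bernoullis,
    with case weights `freq` (one row of Y per observed response pattern)."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    freq = np.asarray(freq, dtype=float)
    N, p = Y.shape
    pi = np.full(K, 1.0 / K)
    theta = rng.uniform(0.25, 0.75, size=(K, p))
    for _ in range(n_iter):
        # E-step: posterior probabilities z_nk proportional to pi_k * f_k(y_n)
        f = np.prod(theta[None] ** Y[:, None] * (1 - theta[None]) ** (1 - Y[:, None]), axis=2)
        z = pi * f
        z /= z.sum(axis=1, keepdims=True)
        w = z * freq[:, None]                  # weight posteriors by pattern frequency
        # M-step: weighted relative frequencies
        pi = w.sum(axis=0) / freq.sum()
        theta = (w.T @ Y) / w.sum(axis=0)[:, None]
    return pi, theta
```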


slide-76
SLIDE 76

Latent class analysis: Example / 4

The size of the first component is 0.16. The estimated parameters for both components are

                Comp.1 Comp.2
center.LIEEXAM    0.58   0.02
center.LIEPAPER   0.59   0.03
center.FRAUD      0.22   0.04
center.COPYEXAM   0.38   0.18


slide-77
SLIDE 77

Package flexmix


slide-78
SLIDE 78

Software: flexmix

(Leisch, 2004; Grün & Leisch, 2008a)
The function flexmix() provides the E-step and all data handling.
The M-step is supplied by the user, similar to glm() families.
Multiple independent responses from different families.
Currently bindings to several GLM families exist (Gaussian, Poisson, Gamma, binomial).
Weighted, hard (CEM) and random (SEM) classification.
Components with prior probability below a user-specified threshold are automatically removed during the iterations.


slide-79
SLIDE 79

flexmix design

Primary goal is extensibility: ideal for trying out new mixture models.
No replacement of specialized mixtures like Mclust(), but a complement.
Usage of S4 classes and methods.
Formula-based interface.
Multivariate responses:
Combination of univariate families: assumption of independence (given x); each response may have its own model formula, i.e., a different set of regressors.
Multivariate families: if the family handles the multivariate response directly, then arbitrary multivariate response distributions are possible.


slide-80
SLIDE 80

Function flexmix()

flexmix() takes the following arguments:

formula: A symbolic description of the model which is fitted. The general form is y ~ x | g, where y is the dependent variable, x the independent variables and g an optional categorical grouping variable for repeated observations.
data: An optional data frame used for evaluating the formula.
k: Number of components (not necessary if cluster is specified).
cluster: Either a matrix with k columns which contains the initial a-posteriori probabilities, or a categorical variable or integer vector indicating the component memberships of observations.
model: Object of class ‘FLXM’ or a list of these objects.
concomitant: Object of class ‘FLXP’.
control: Object of class ‘FLXcontrol’ or a named list.


slide-81
SLIDE 81

Function flexmix() / 2

Returns an object of class ‘flexmix’.
Repeated calls of flexmix() via:

stepFlexmix(): using random initialization.
initFlexmix(): combining short with long runs of EM.


slide-82
SLIDE 82

Controlling the EM algorithm

‘FLXcontrol’ for setting the specifications for the EM algorithm:
iter.max: maximum number of iterations.
minprior: minimum component sizes.
verbose: if larger than 0, flexmix() provides status information every verbose iterations of the EM algorithm.
classify: one of "auto", "weighted", "CEM" (or "hard"), "SEM" (or "random").
For convenience flexmix() also allows specification of the control argument through a named list, where the names are partially matched, e.g.,

flexmix(..., control = list(class = "r"))


slide-83
SLIDE 83

Variants of mixture models

Component-specific models: FLXMxxx()

Model-based clustering: FLXMCxxx()
FLXMCmvnorm(), FLXMCmvbinary(), FLXMCmvpois(), …

Clusterwise regression: FLXMRxxx()
FLXMRglm(), FLXMRglmfix(), FLXMRziglm(), …

Concomitant variable models: FLXPxxx()
FLXPconstant(), FLXPmultinom()


slide-84
SLIDE 84

Methods for ‘flexmix’ objects

show(), summary(): some information on the fitted model.
plot(): rootogram of posterior probabilities.
refit(): refits an estimated mixture model to obtain the variance-covariance matrix.
logLik(), BIC(), …: obtain the log-likelihood and model fit criteria.
parameters(), prior(): obtain component-specific model parameters and prior class probabilities / component weights.
posterior(), clusters(): obtain a-posteriori probabilities and assignments to the maximum a-posteriori probability.
fitted(), predict(): fitted and predicted (component-specific) values.


slide-85
SLIDE 85

Package flexmix: Cheating

Data preparation:

> data("cheating", package = "poLCA")
> X <- as.data.frame(table(cheating[, 1:4]))
> X <- subset(X, Freq > 0)
> X[, 1:4] <- sapply(X[, 1:4], as.integer) - 1
> head(X)
  LIEEXAM LIEPAPER FRAUD COPYEXAM Freq
1       0        0     0        0  207
2       1        0     0        0   10
3       0        1     0        0   13
4       1        1     0        0   11
5       0        0     1        0    7
6       1        0     1        0    1


slide-86
SLIDE 86

Package flexmix: Cheating / 2

Fit model with flexmix:

> set.seed(201107)
> FORMULA <-
+   cbind(LIEEXAM, LIEPAPER, FRAUD, COPYEXAM) ~ 1
> CONTROL <-
+   list(tolerance = 10^-12, iter.max = 500)
> cheatMix <-
+   stepFlexmix(FORMULA, k = 2, weights = ~ Freq,
+     model = FLXMCmvbinary(), data = X,
+     control = CONTROL, nrep = 3)
2 : * * *


slide-87
SLIDE 87

Package flexmix: Cheating / 3

> cheatMix
Call:
stepFlexmix(FORMULA, weights = ~Freq, model = FLXMCmvbinary(),
    data = X, control = CONTROL, k = 2, nrep = 3)

Cluster sizes:
  1   2
 54 265

convergence after 314 iterations


slide-88
SLIDE 88

Package flexmix: Cheating / 4

> summary(cheatMix)
Call:
stepFlexmix(FORMULA, weights = ~Freq, model = FLXMCmvbinary(),
    data = X, control = CONTROL, k = 2, nrep = 3)

       prior size post>0 ratio
Comp.1 0.161   54    319 0.169
Comp.2 0.839  265    319 0.831

’log Lik.’ -440 (df=9)
AIC: 898.1   BIC: 931.9
> plot(cheatMix)


slide-89
SLIDE 89

Package flexmix: Cheating / 5

[Figure: rootogram of posterior probabilities > 1e−04 for Comp. 1 and Comp. 2.]

slide-90
SLIDE 90

Mixtures of regression models


slide-91
SLIDE 91

Mixtures of regression models

aka clusterwise regression. Regression models are fitted for each component.

⇒ Weighted ML estimation of linear and generalized linear models is required.

Heterogeneity between observations with respect to their regression parameters. Random effects can be estimated in a semi-parametric way. Possible models:
Mixtures of linear regression models
Mixtures of generalized linear regression models
Mixtures of linear mixed regression models
…


slide-92
SLIDE 92

Mixtures of regression models / 2

The density of a mixture of regression models is given by

h(y|x, w, Θ) = ∑_{k=1}^{K} π_k f_k(y|x, ϑ_k),

where

∑_{k=1}^{K} π_k = 1 and π_k > 0 ∀k.

In the EM algorithm the weighted log-likelihoods of the regression models need to be maximized in the M-step.
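For a Gaussian regression component, that weighted M-step amounts to weighted least squares; a minimal sketch in plain NumPy (illustrative only, not the flexmix internals — `weighted_ls` is an invented name):

```python
import numpy as np

def weighted_ls(X, y, w):
    """One M-step update for a linear-regression component:
    minimize sum_n w_n (y_n - x_n' beta)^2 via sqrt-weight rescaling."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    resid = y - X @ beta
    sigma2 = np.sum(w * resid ** 2) / np.sum(w)   # weighted residual variance
    return beta, sigma2
```

With all weights equal to 1 this reduces to ordinary least squares; in the EM algorithm the weights are the posterior probabilities of the component.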


slide-93
SLIDE 93

Generic identifiability

Influencing factors:
Component distribution (see mixtures of distributions).
Model matrix.
Repeated observations for each individual / partly labeled observations.


slide-94
SLIDE 94

Identifiability: Model matrix

If only one observation per individual is available, full rank of the model matrix is not sufficient for identifiability.
Linear models: mixtures of linear models with normally distributed errors are identifiable if the number of components K is smaller than the minimum number of (feasible) hyperplanes necessary to cover all covariate points (Hennig, 2000).
Generalized linear models: an analogous condition holds for the linear predictor, with additional constraints depending on the distribution of the dependent variable (in particular for binomial and multinomial logit models, see Grün & Leisch, 2008b).

⇒ Coverage condition.


slide-95
SLIDE 95

Identifiability: Repetitions / labels

At least one of the hyperplanes in the coverage condition needs to cover all repeated / labeled observations where component membership is fixed. Violation of the coverage condition leads to intra-component label switching: if the labels are fixed in one covariate point according to some ordering constraint, then labels may switch in other covariate points for different parameterizations of the model.


slide-96
SLIDE 96

Illustration: Binomial logit models

We consider a mixture density

h(y|x, Θ) = π_1 f_B(y|N, (1, x)β_1) + (1 − π_1) f_B(y|N, (1, x)β_2)

with binomial logit models in the components and parameters

π_1 = 0.5, β_1 = (−2, 4)′, β_2 = (2, −4)′.

Even if N ≥ 3 the mixture is not identifiable if there are only 2 covariate points available. The second solution is then given by

π_1⁽²⁾ = 0.5, β_1⁽²⁾ = (−2, 0)′, β_2⁽²⁾ = (2, 0)′.
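This non-identifiability can be verified numerically. The Python sketch below (illustrative only; the two covariate points are taken as x ∈ {0, 1}, an assumption made for the example) evaluates both parameterizations and shows they yield the same mixture pmf at those points:

```python
import numpy as np
from math import comb

def binom_logit_pmf(y, N, eta):
    """Binomial pmf with success probability logit^-1(eta)."""
    th = 1.0 / (1.0 + np.exp(-eta))
    return comb(N, y) * th ** y * (1 - th) ** (N - y)

def mix_pmf(y, N, x, pi1, b1, b2):
    """Two-component mixture of binomial logit models with
    linear predictors b[0] + b[1] * x."""
    eta1 = b1[0] + b1[1] * x
    eta2 = b2[0] + b2[1] * x
    return pi1 * binom_logit_pmf(y, N, eta1) + (1 - pi1) * binom_logit_pmf(y, N, eta2)
```

At x = 0 and x = 1 both parameter sets produce the same pair of linear predictors {−2, 2}, so with equal weights the mixtures coincide.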


slide-97
SLIDE 97

Illustration: Binomial logit models / 2

Simulation design: Number of repetitions N ∈ {1, 10} in the same covariate point. Number of covariate points: #x ∈ {2, 5}, equidistantly spread across [0, 1]. 100 samples with 100 observations are drawn from the model for each combination of N and #x. Finite mixtures with 2 components are fitted to each sample. Solutions after imposing an ordering constraint on the intercept are reported.


slide-98
SLIDE 98

Illustration: Binomial logit models / 3

[Figure: relative frequency of the estimates plotted against x for the four settings N = 1 and N = 10 combined with #x = 2 and #x = 5.]

slide-99
SLIDE 99

Illustration: Binomial logit models / 4

[Figure: 5% and 95% quantiles of the estimates of the intercept, the coefficient of x and π for the four settings N ∈ {1, 10} and #x ∈ {2, 5}.]

slide-100
SLIDE 100

Mixtures of linear regressions: Example

[Figure: scatter plot of CO2 versus GNP.]

slide-101
SLIDE 101

Mixtures of linear regressions: Example / 2

Example from Hurn, Justel and Robert (2003), data from package mixtools. 28 observations of countries on their gross national product per capita (GNP) and the estimated carbon dioxide emission per capita (CO2) in 1996. The relationship between the two variables is analyzed using a linear model.


slide-102
SLIDE 102

Mixtures of linear regressions: Example / 3

[Figure: scatter plot of CO2 versus GNP with the single fitted regression line.]

slide-103
SLIDE 103

Mixtures of linear regressions: Example / 4

Call:
lm(formula = CO2 ~ GNP, data = CO2data)

Residuals:
   Min     1Q Median     3Q    Max
-5.550 -2.530 -0.470  0.682 11.007

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   7.5978     1.5192    5.00  3.4e-05
GNP           0.0778     0.0687    1.13     0.27

Residual standard error: 4.06 on 26 degrees of freedom
Multiple R-squared: 0.047, Adjusted R-squared: 0.010
F-statistic: 1.28 on 1 and 26 DF, p-value: 0.268


slide-104
SLIDE 104

Mixtures of linear regressions: Example / 5

It is assumed that the relationship between the two variables is not the same for all countries. A mixture of linear regression models is therefore estimated. The model is given by

h(CO2|GNP, Θ) = ∑_{k=1}^{K} π_k f_N(CO2 | α_{0k} + α_{1k} GNP, σ_k²),

where α_{0k} is the component-specific intercept, α_{1k} the component-specific regression coefficient for GNP and σ_k² the component-specific residual variance.


slide-105
SLIDE 105

Mixtures of linear regressions: Example / 6

Mixtures of linear regression models are fitted for 1 to 5 components.

> data("CO2data", package = "mixtools")
> co2mix_step <-
+   stepFlexmix(CO2 ~ GNP, data = CO2data, k = 1:5)
1 : * * *
2 : * * *
3 : * * *
4 : * * *
5 : * * *


slide-106
SLIDE 106

Mixtures of linear regressions: Example / 7

> co2mix_step
Call:
stepFlexmix(CO2 ~ GNP, data = CO2data, k = 1:5)

  iter converged k k0 logLik   AIC   BIC   ICL
1    2     TRUE 1  1 -77.98 162.0 166.0 166.0
2   22     TRUE 2  2 -66.98 148.0 157.3 160.2
3   41     TRUE 3  3 -63.94 149.9 164.5 179.4
4   35     TRUE 4  4 -63.94 157.9 177.9 204.9
5   30     TRUE 5  5 -63.86 165.7 191.0 219.5


slide-107
SLIDE 107

Mixtures of linear regressions: Example / 8

The best model according to the BIC is selected. This has 2 components.

> co2mix <- getModel(co2mix_step)
> co2mix
Call:
stepFlexmix(CO2 ~ GNP, data = CO2data, k = 2)

Cluster sizes:
 1  2
22  6

convergence after 22 iterations


slide-108
SLIDE 108

Mixtures of linear regressions: Example / 9

[Figure: scatter plot of CO2 versus GNP with the two fitted component regression lines.]

slide-109
SLIDE 109

Mixtures of linear regressions: Example / 10

The estimated regression coefficients for each component together with the estimated standard errors are given by

$Comp.1
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   8.6790     1.0257    8.46   <2e-16
GNP          -0.0233     0.0426   -0.55     0.58

$Comp.2
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   1.4151     0.6653    2.13    0.033
GNP           0.6766     0.0347   19.52   <2e-16


slide-110
SLIDE 110

Mixtures of linear regressions: Example / 11

The countries assigned to the smaller component are:

    GNP  CO2 country
6  2.83  2.7     TUR
2  3.67  3.9     MEX
1 19.02 14.7     CAN
4 20.09 16.0     AUS
5 24.51 16.6     NOR
3 28.20 20.8     USA


slide-111
SLIDE 111

Mixtures of binomial logit models: Example

[Figure: proportion of infected plants versus number of aphids released.]

slide-112
SLIDE 112

Mixtures of binomial logit models: Example / 2

Data from package mixreg; taken from Boiteau et al. (1998) and Turner (2000). Observations from 51 independent experiments. The number of aphids which were released in a room with 12 infected and 69 healthy tobacco plants was varied. After 24 hours the previously healthy plants are checked for infection. For each experiment the number of aphids released (n.aphids) and the number of newly infected plants (n.inf) were recorded. It is assumed that the data comes from a mixture of 2 binomial logit-models.


slide-113
SLIDE 113

Mixtures of binomial logit models: Example / 3

The model is given by

h(n.inf | n.aphids, 69, Θ) = ∑_{k=1}^{K} π_k f_B(n.inf | 69, θ_k(n.aphids)),

where f_B(·|N, θ) is the binomial distribution with number-of-trials parameter N and success probability θ. Furthermore,

logit(θ_k) = α_{0k} + α_{1k} n.aphids,

where α_{0k} is the component-specific intercept and α_{1k} the component-specific regression coefficient for the number of aphids.


slide-114
SLIDE 114

Mixtures of binomial logit models: Example / 4

> aphidmix <-
+   stepFlexmix(cbind(n.inf, 69 - n.inf) ~ n.aphids,
+     data = aphids, k = 2, nrep = 5,
+     model = FLXMRglm(family = "binomial"))
2 : * * * * *


slide-115
SLIDE 115

Mixtures of binomial logit models: Example / 5

> aphidmix
Call:
stepFlexmix(cbind(n.inf, 69 - n.inf) ~ n.aphids, data = aphids,
    model = FLXMRglm(family = "binomial"), k = 2, nrep = 5)

Cluster sizes:
 1  2
28 23

convergence after 6 iterations


slide-116
SLIDE 116

Mixtures of binomial logit models: Example / 6

> summary(aphidmix)
Call:
stepFlexmix(cbind(n.inf, 69 - n.inf) ~ n.aphids, data = aphids,
    model = FLXMRglm(family = "binomial"), k = 2, nrep = 5)

       prior size post>0 ratio
Comp.1  0.54   28     37 0.757
Comp.2  0.46   23     45 0.511

’log Lik.’ -127.6 (df=5)
AIC: 265.3   BIC: 274.9


slide-117
SLIDE 117

Mixtures of binomial logit models: Example / 7

[Figure: proportion of infected plants versus number of released aphids with the fitted curves of the two components.]

slide-118
SLIDE 118

Mixtures of binomial logit models: Example / 8

> aphidmix_ref <- refit(aphidmix)
> summary(aphidmix_ref)
$Comp.1
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.32054    0.31915  -13.54   <2e-16
n.aphids     0.00199    0.00190    1.05      0.3

$Comp.2
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.461623   0.141691  -17.37  < 2e-16
n.aphids     0.005300   0.000694    7.64  2.3e-14


slide-119
SLIDE 119

Extensions and variants


slide-120
SLIDE 120

Extensions and variants

Concomitant variable models.
Identical parameters for all components / groups of components: regression coefficients, variances.
Fixing some parameters of a component: zero-inflated models.
Mixtures of linear mixed models.


slide-121
SLIDE 121

Concomitant variable models

The component sizes depend on additional variables. These additional variables allow the composition of the components to be characterized. The density of the finite mixture model with concomitant variables is given by

h(y|w, x, Θ) = ∑_{k=1}^{K} π_k(w, α) f_k(y|x, ϑ_k),

where for all w

∑_{k=1}^{K} π_k(w, α) = 1 and π_k(w, α) > 0 ∀k.


slide-122
SLIDE 122

Concomitant variable models / 2

The E-step changes to

z_{nk}⁽ⁱ⁾ = P(k | y_n, w_n, Θ⁽ⁱ⁾) = π_k(w_n, α⁽ⁱ⁾) f_k(y_n | ϑ_k⁽ⁱ⁾) / ∑_{u=1}^{K} π_u(w_n, α⁽ⁱ⁾) f_u(y_n | ϑ_u⁽ⁱ⁾).

In the M-step α also needs to be estimated. This can be done separately from the estimation of the parameters ϑ_k, because the conditional expectation of log L_c(Θ) in the i-th step is given by

Q(Θ; Θ⁽ⁱ⁾) = E_{Θ⁽ⁱ⁾}[log L_c(Θ) | y, w] = ∑_{k=1}^{K} ∑_{n=1}^{N} z_{nk}⁽ⁱ⁾ [log π_k(w_n, α) + log f_k(y_n | ϑ_k)].
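This E-step can be sketched in a few lines of Python (illustrative only — F holds the component densities f_k(y_n) and Pi the concomitant-dependent priors π_k(w_n, α); all names are invented for the example):

```python
import numpy as np

def e_step(F, Pi):
    """E-step with concomitant variables.
    F[n, k]  = f_k(y_n | theta_k),  Pi[n, k] = pi_k(w_n, alpha).
    Returns posteriors z[n, k] = Pi * F / rowsum(Pi * F)."""
    z = Pi * F
    return z / z.sum(axis=1, keepdims=True)
```

Each row of the result sums to 1, as required for posterior probabilities.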


slide-123
SLIDE 123

Concomitant variable models / 3

Multinomial logit models are used most frequently (Dayton & Macready, 1988). α = (α′_k)_{k=1,…,K} collects the K component-specific coefficient vectors, where for identifiability one has to impose, for example, α_K ≡ 0. The component sizes are given by

π_k(w, α) = exp(w⊤α_k) / ∑_{u=1}^{K} exp(w⊤α_u).

In the M-step weighted ML estimation of multinomial logit models is required.
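The multinomial logit parameterization with the constraint α_K ≡ 0 can be sketched as follows (Python, for illustration only — flexmix handles this internally via FLXPmultinom(); which component serves as baseline is a convention, here the last one):

```python
import numpy as np

def concomitant_sizes(W, alpha):
    """pi_k(w, alpha) = exp(w' a_k) / sum_u exp(w' a_u), with a_K fixed
    to 0 for identifiability.  W: (N, d) concomitant design including an
    intercept column; alpha: (K-1, d) free coefficient vectors."""
    A = np.vstack([alpha, np.zeros(alpha.shape[1])])  # append a_K = 0
    eta = W @ A.T
    eta -= eta.max(axis=1, keepdims=True)             # numerical stability
    e = np.exp(eta)
    return e / e.sum(axis=1, keepdims=True)
```

With a positive coefficient on a concomitant variable, the corresponding component's size increases in that variable, as in the GPA example that follows.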


slide-124
SLIDE 124

Concomitant variable models: Example

Continuing the example using the cheating data. For 315 students the grade point average (GPA) is also available. The GPA was categorized into 5 categories:

(1) 2.99 or smaller
(2) 3.00–3.25
(3) 3.26–3.50
(4) 3.51–3.75
(5) 3.76–4.00

The scores 1–5 are used as metric variables in the analysis. A multinomial logit model is fitted for GPA as concomitant variable.


slide-125
SLIDE 125

Concomitant variable models: Example / 2

A likelihood ratio test between the models with and without the concomitant variable model gives

χ² = 20.8, p-value = 5.2e−06.

⇒ The larger model provides a significantly better fit.

The coefficients of the concomitant variable model are:

$Comp.2
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.113      0.524   -0.22   0.8285
GPA            0.842      0.269    3.14   0.0017


slide-126
SLIDE 126

Concomitant variable models: Example / 3

[Figure: a-priori probabilities of components 1 and 2 as a function of GPA (scores 1–5).]

slide-127
SLIDE 127

Concomitant variable models: Example / 4

Fit model with concomitant variables using flexmix:

> X <- as.data.frame(table(cheating))
> X <- subset(X, Freq > 0)
> X[, 1:4] <- sapply(X[, 1:4], as.integer) - 1
> X[, 5] <- as.integer(X[, 5])
> POST <- posterior(cheatMix, newdata = X[, 1:4])
> cheatMixConc <-
+   flexmix(FORMULA, weights = ~ Freq, data = X,
+     model = FLXMCmvbinary(),
+     concomitant = FLXPmultinom(~ GPA),
+     cluster = POST, control = CONTROL)


slide-128
SLIDE 128

Concomitant variable models: Example / 5

> summary(cheatMixConc)
Call:
flexmix(formula = FORMULA, data = X, cluster = POST,
    model = FLXMCmvbinary(), concomitant = FLXPmultinom(~GPA),
    control = CONTROL, weights = ~Freq)

       prior size post>0 ratio
Comp.1 0.178   47    315 0.149
Comp.2 0.822  268    315 0.851

’log Lik.’ -429.6 (df=10)
AIC: 879.3   BIC: 916.8


slide-129
SLIDE 129

Concomitant variable models: Example / 6

> cheatMixConcRef <- refit(cheatMixConc)
> summary(cheatMixConcRef, which = "concomitant")
$Comp.2
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.113      0.524   -0.22   0.8285
GPA            0.842      0.269    3.14   0.0017
> plot(cheatMixConcRef, which = "concomitant")


slide-130
SLIDE 130

Concomitant variable models: Example / 7

[Figure: estimated concomitant coefficients (Intercept and GPA) for Comp. 2 with confidence intervals.]

slide-131
SLIDE 131

Identical parameters

Estimation with the EM algorithm:
Identical variances: the regression coefficients are estimated in the same way; given the estimated regression coefficients, the variance is estimated using the weighted residuals of all components.
Identical regression coefficients: the regression coefficients cannot be estimated separately for each component in the M-step; a joint model matrix for all components needs to be generated.

⇒ The model matrix has KN rows.
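The construction of such a joint model matrix can be sketched as follows (Python, illustrative only — names are invented for the example): columns for coefficients shared by all components are stacked K times, component-specific columns are placed block-diagonally.

```python
import numpy as np

def joint_model_matrix(X_fixed, X_varying, K):
    """Stack the design K times: columns for coefficients shared by all
    components are stacked as-is; component-specific columns are placed
    block-diagonally so each component gets its own copy."""
    N, p = X_varying.shape
    blocks = np.zeros((K * N, K * p))
    for k in range(K):
        blocks[k * N:(k + 1) * N, k * p:(k + 1) * p] = X_varying
    return np.hstack([np.tile(X_fixed, (K, 1)), blocks])

# e.g. a common Drug effect with component-specific intercepts (K = 3):
Xf = np.array([[0.0], [1.0]])          # Drug indicator (hypothetical rows)
Xv = np.ones((2, 1))                   # intercept column
M = joint_model_matrix(Xf, Xv, K=3)    # KN = 6 rows, 1 + K columns
```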


slide-132
SLIDE 132

Identical parameters: Example

[Figure: Response/Total plotted by Site.]

slide-133
SLIDE 133

Identical parameters: Example / 2

Example from Aitkin (1999), data from package flexmix. 44 observations from a medical study with 22 locations. The observed variables are:

Response: number of responses at the location.
Total: total number of patients at the location.
Drug: binary variable indicating whether the drug was administered.
Site: location.

It is assumed that the probability of success follows a mixture distribution over locations, while the effect of the drug is assumed to be the same across locations.


slide-134
SLIDE 134

Identical parameters: Example / 3

> data("Mehta", package = "flexmix")
> mehtaMix <-
+   stepFlexmix(cbind(Response, Total - Response) ~
+     1 | Site, data = Mehta, k = 1:5,
+     model = FLXMRglmfix(family = "binomial",
+       fixed = ~ Drug), nrep = 5,
+     control = list(minprior = 0.04))
1 : * * * * *
2 : * * * * *
3 : * * * * *
4 : * * * * *
5 : * * * * *


slide-135
SLIDE 135

Identical parameters: Example / 4

> mehtaMix
Call:
stepFlexmix(cbind(Response, Total - Response) ~ 1 | Site,
    data = Mehta, model = FLXMRglmfix(family = "binomial",
    fixed = ~Drug), control = list(minprior = 0.04),
    k = 1:5, nrep = 5)

  iter converged k k0 logLik   AIC   BIC   ICL
1    2     TRUE 1  1 -78.81 161.6 165.2 165.2
2   36     TRUE 2  2 -71.76 151.5 158.7 161.5
3   27     TRUE 3  3 -66.81 145.6 156.3 166.8
4   39     TRUE 4  4 -66.81 149.6 163.9 181.1
5   43     TRUE 5  5 -66.81 153.6 171.5 205.2


slide-136
SLIDE 136

Identical parameters: Example / 5

> mehtaMix <- getModel(mehtaMix)
> summary(mehtaMix)
Call:
stepFlexmix(cbind(Response, Total - Response) ~ 1 | Site,
    data = Mehta, model = FLXMRglmfix(family = "binomial",
    fixed = ~Drug), control = list(minprior = 0.04),
    k = 3, nrep = 5)

       prior size post>0 ratio
Comp.1 0.4429   20     40   0.5
Comp.2 0.5114   22     44   0.5
Comp.3 0.0456    2      4   0.5

’log Lik.’ -66.81 (df=6)
AIC: 145.6   BIC: 156.3


slide-137
SLIDE 137

Identical parameters: Example / 6

> parameters(mehtaMix)
                 Comp.1 Comp.2 Comp.3
coef.DrugControl  1.759  1.759  1.759
coef.(Intercept) -4.760 -3.562 -1.406


slide-138
SLIDE 138

Identical parameters: Example / 7

> summary(refit(mehtaMix))
$Comp.1
            Estimate Std. Error z value Pr(>|z|)
DrugControl    1.760      0.337    5.22  1.8e-07
(Intercept)   -4.749      0.639   -7.44  1.0e-13

$Comp.2
            Estimate Std. Error z value Pr(>|z|)
DrugControl    1.760      0.337    5.22  1.8e-07
(Intercept)   -3.558      0.371   -9.58  < 2e-16

$Comp.3
            Estimate Std. Error z value Pr(>|z|)
DrugControl    1.760      0.337    5.22  1.8e-07
(Intercept)   -1.406      0.473   -2.98   0.0029


slide-139
SLIDE 139

Identical parameters: Example / 8

[Figure: observed Response/Total (0.0–0.8) by Site, with sites marked by assigned component Comp.1, Comp.2, Comp.3.]

SLIDE 140

Zero-inflated models

When fitting generalized linear regression models with the dependent variable following a binomial or Poisson distribution, often more zeros are observed than expected under the model.

In the mixture context an additional component is fitted which absorbs the excessive zeros.

Alternative model: hurdle model. A truncated distribution is used for the observations larger than zero.
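The two variants differ only in how the zero class enters the likelihood. A minimal numerical sketch (Python, not flexmix code; all parameter values are made up for illustration):

```python
import math

def zip_pmf(y, pi0, lam):
    """Zero-inflated Poisson: a point mass at zero mixed with a Poisson(lam)."""
    pois = math.exp(-lam) * lam**y / math.factorial(y)
    return pi0 + (1 - pi0) * pois if y == 0 else (1 - pi0) * pois

def hurdle_pmf(y, p0, lam):
    """Hurdle model: P(Y = 0) modeled directly; positive counts follow a
    zero-truncated Poisson(lam)."""
    if y == 0:
        return p0
    pois = math.exp(-lam) * lam**y / math.factorial(y)
    return (1 - p0) * pois / (1 - math.exp(-lam))

# Both place extra mass at zero relative to a plain Poisson(2.0):
print(zip_pmf(0, 0.3, 2.0))  # 0.3 + 0.7 * exp(-2) ≈ 0.395
```

In the zero-inflated mixture the Poisson component can still generate zeros; in the hurdle model the zero class is modeled separately and the truncated distribution covers only the positive counts.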

SLIDE 141

Zero-inflated models: Example

Example from Böhning et al. (1999), data from package flexmix. 797 observations on the dental status of children. The variables are:

End: Number of decayed, missing or filled teeth at the end of the study.
Begin: Number of decayed, missing or filled teeth at the beginning of the study.
Gender: Gender.
Ethnic: Coded as "brown", "white", "black".
Treatment: Indication which of the 6 treatments was applied.

SLIDE 142

Zero-inflated models: Example / 2

> data("dmft", package = "flexmix")
> dmft.formula <- End ~ Gender + Ethnic + Treatment +
+     log(Begin + 0.5)
> dmftMix <-
+     stepFlexmix(dmft.formula, data = dmft, k = 2,
+         control = list(minprior = 0.01),
+         model = FLXMRziglm(family = "poisson"))
2 : * * *

SLIDE 143

Zero-inflated models: Example / 3

> summary(dmftMix)

Call:
stepFlexmix(dmft.formula, data = dmft,
    control = list(minprior = 0.01),
    model = FLXMRziglm(family = "poisson"), k = 2)

        prior size post>0  ratio
Comp.1 0.0461   13    231 0.0563
Comp.2 0.9539  784    797 0.9837

'log Lik.' -1232 (df=11)
AIC: 2486   BIC: 2538

SLIDE 144

Zero-inflated models: Example / 4

> parameters(dmftMix)
                      Comp.1    Comp.2
coef.(Intercept)        -Inf -0.146498
coef.Gendermale            0  0.006741
coef.Ethnicwhite           0  0.050334
coef.Ethnicblack           0 -0.046786
coef.Treatmenteduc         0 -0.236596
coef.Treatmentall          0 -0.327348
coef.Treatmentenrich       0  0.017172
coef.Treatmentrinse        0 -0.240918
coef.Treatmenthygiene      0 -0.102710
coef.log(Begin + 0.5)      0  0.730067

SLIDE 145

Mixtures of linear mixed models

Repeated observations for individuals. Component membership is fixed for each individual. The model for each individual n with J_n observations is given by

\[
h(y \mid x, u, \Theta)
  = \sum_{k=1}^{K} \pi_k \int \prod_{j=1}^{J_n}
      f_N\bigl(y_{nj} \mid x_{nj}^\top \alpha_k + u_{nj}^\top b_{nk}, \sigma_k^2\bigr)\,
      f_N\bigl(b_{nk} \mid 0, \Psi_k\bigr)\, db_{nk}
  = \sum_{k=1}^{K} \pi_k\, f_N\bigl(y \mid x \alpha_k,\; u \Psi_k u^\top + \sigma_k^2 I_{J_n}\bigr).
\]
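The second equality (integrating the random effects out leaves a marginal normal with covariance u Ψ_k u^⊤ + σ_k² I) can be checked by simulation. A small sketch with made-up values (Python/NumPy, not flexmix code):

```python
import numpy as np

rng = np.random.default_rng(0)
Jn = 4                                                          # observations per individual
x = np.column_stack([np.ones(Jn), np.arange(Jn, dtype=float)])  # fixed-effects design
u = np.ones((Jn, 1))                                            # random-intercept design
alpha = np.array([250.0, 0.4])                                  # fixed effects (made up)
Psi = np.array([[120.0]])                                       # random-effect variance (made up)
sigma2 = 15.0                                                   # residual variance (made up)

# Simulate y = x alpha + u b + eps within a single component:
n = 200_000
b = rng.normal(0.0, np.sqrt(Psi[0, 0]), size=n)
eps = rng.normal(0.0, np.sqrt(sigma2), size=(n, Jn))
y = x @ alpha + np.outer(b, u.ravel()) + eps

# Claimed marginal covariance after integrating b out:
Sigma = u @ Psi @ u.T + sigma2 * np.eye(Jn)
print(np.max(np.abs(np.cov(y, rowvar=False) - Sigma)))  # small (sampling noise)
```

The empirical covariance of the simulated responses matches u Ψ u^⊤ + σ² I up to Monte Carlo error, which is exactly what the closed-form marginal on the slide asserts.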

SLIDE 146

Mixtures of linear mixed models / 2

The missing information can be seen to consist of:
  • Component membership z.
  • Individual coefficients b for u.

The complete-data log-likelihood is given by

\[
\log L_c(\Theta)
  = \sum_{k=1}^{K} \sum_{n=1}^{N} z_{nk}
    \Bigl[ \log \pi_k
      + \sum_{j=1}^{J_n} \log f_N\bigl(y_{nj} \mid x_{nj}^\top \alpha_k + u_{nj}^\top b_{nk}, \sigma_k^2\bigr)
    \Bigr].
\]

SLIDE 147

Mixtures of linear mixed models / 3

The a-posteriori probabilities are determined in the E-step of the EM algorithm using

\[
\hat z_{nk} \propto \pi_k\, f_N\bigl(y_n \mid x_n \alpha_k,\; u_n \Psi_k u_n^\top + \sigma_k^2 I_{J_n}\bigr).
\]

Further the mean and the variance of the random effects are required for the complete-data log-likelihood:

\[
\mu_{b_n,k} = E[b_n \mid y_n, x_n, u_n, \Theta, k]
  = \Bigl( \frac{1}{\sigma_k^2} u_n^\top u_n + \Psi_k^{-1} \Bigr)^{-1}
    \frac{1}{\sigma_k^2}\, u_n^\top (y_n - x_n \alpha_k),
\]
\[
\Sigma_{b_n,k} = \mathrm{VAR}[b_n \mid y_n, x_n, u_n, \Theta, k]
  = \Bigl( \frac{1}{\sigma_k^2} u_n^\top u_n + \Psi_k^{-1} \Bigr)^{-1}.
\]
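These posterior moments are the standard ridge-type formulas for a Gaussian random effect, and they agree with direct conditioning in the joint normal distribution of (b_n, y_n). A quick numerical check with arbitrary made-up values (Python/NumPy, not flexmix code):

```python
import numpy as np

rng = np.random.default_rng(0)
Jn, q = 5, 2
u = rng.standard_normal((Jn, q))             # random-effects design (arbitrary)
Psi = np.array([[2.0, 0.3], [0.3, 1.0]])     # random-effect covariance (made up)
sigma2 = 0.5
resid = rng.standard_normal(Jn)              # stands in for y_n - x_n alpha_k

# Precision form from the slide:
Sigma_b = np.linalg.inv(u.T @ u / sigma2 + np.linalg.inv(Psi))
mu_b = Sigma_b @ u.T @ resid / sigma2

# Equivalent form via conditioning in the joint normal of (b_n, y_n):
V = u @ Psi @ u.T + sigma2 * np.eye(Jn)      # marginal covariance of y_n
mu_alt = Psi @ u.T @ np.linalg.solve(V, resid)
print(np.allclose(mu_b, mu_alt))  # True
```

Both routes give identical posterior means (a Woodbury-identity equivalence), so either parameterization may be used in an implementation.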

SLIDE 148

Mixtures of linear mixed models / 4

The M-step can be solved in closed form.

\[
\hat\alpha_k
  = \Bigl( \sum_{n=1}^{N} z_{nk} \sum_{j=1}^{J_n} x_{nj} x_{nj}^\top \Bigr)^{-1}
    \sum_{n=1}^{N} z_{nk} \sum_{j=1}^{J_n} x_{nj} \bigl( y_{nj} - u_{nj}^\top \mu_{b_n,k} \bigr),
\]
\[
\hat\sigma_k^2
  = \frac{1}{\sum_{n=1}^{N} z_{nk} J_n}
    \sum_{n=1}^{N} z_{nk}
    \sum_{j=1}^{J_n} \Bigl[ \bigl( y_{nj} - u_{nj}^\top \mu_{b_n,k} - x_{nj}^\top \hat\alpha_k \bigr)^2
      + u_{nj}^\top \Sigma_{b_n,k}\, u_{nj} \Bigr],
\]
\[
\hat\Psi_k
  = \frac{1}{\sum_{n=1}^{N} z_{nk}}
    \sum_{n=1}^{N} z_{nk} \bigl( \Sigma_{b_n,k} + \mu_{b_n,k} \mu_{b_n,k}^\top \bigr).
\]
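As a concrete instance of these closed-form updates, the Ψ_k update is simply a responsibility-weighted average of the posterior second moments of the random effects. A toy sketch with made-up numbers (Python/NumPy, not flexmix code):

```python
import numpy as np

# Toy responsibilities z_nk and posterior moments of b_n for one component k
# (all numbers made up; in the EM algorithm they come from the E-step):
z = np.array([0.9, 0.2, 0.7])
mu = [np.array([1.0, -0.5]), np.array([0.2, 0.1]), np.array([-0.3, 0.8])]
Sig = [0.1 * np.eye(2) for _ in range(3)]

# Psi_k update: responsibility-weighted average of
# E[b_n b_n^T | y_n] = Sigma_{b_n,k} + mu_{b_n,k} mu_{b_n,k}^T
Psi_hat = sum(z_n * (S + np.outer(m, m))
              for z_n, m, S in zip(z, mu, Sig)) / z.sum()
print(Psi_hat[0, 0])  # (0.9*1.1 + 0.2*0.14 + 0.7*0.19) / 1.8 ≈ 0.6394
```

Because each summand Σ + μμ^⊤ is positive definite and the weights are non-negative, the update automatically yields a valid covariance matrix.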

SLIDE 149

Linear mixed models: Example

Data from package MEMSS. 176 observations on the weight of rats at different time points. The variable Diet is a categorical variable indicating what the animals were fed. In the following we assume that this information has not been observed and fit a mixture model.

SLIDE 150

Linear mixed models: Example / 2

[Figure: rat weight (300–600) over Time (20–60), one line per rat.]

SLIDE 151

Linear mixed models: Example / 3

> weightMix <-
+     stepFlexmix(weight ~ Time | Rat,
+         k = 2, nrep = 3,
+         model = FLXMRlmm(random = ~ 1),
+         data = BodyWeight,
+         control = list(iter.max = 500,
+             minprior = 0))
2 : * * *

SLIDE 152

Linear mixed models: Example / 4

> weightMix

Call:
stepFlexmix(weight ~ Time | Rat, model = FLXMRlmm(random = ~1),
    data = BodyWeight, control = list(iter.max = 500,
    minprior = 0), k = 2, nrep = 3)

Cluster sizes:
 1  2
88 88

convergence after 36 iterations

SLIDE 153

Linear mixed models: Example / 5

> parameters(weightMix)[1:4, ]
                   Comp.1   Comp.2
coef.(Intercept) 478.0202 251.6517
coef.Time          0.8117   0.3596
sigma2.Random   2476.1500 121.6707
sigma2.Residual   75.7345  14.6092
> xyplot(weight ~ Time | factor(clusters(weightMix)),
+     groups = Rat, data = BodyWeight,
+     type = "l", col = 1, lty = 1)

SLIDE 154

Linear mixed models: Example / 6

[Figure: weight over Time, one line per rat, in two panels given by the assigned cluster 1 and 2.]

SLIDE 155

Linear mixed models: Example / 7

Do the latent groups correspond to the diets?

> table(clusters(weightMix), BodyWeight$Diet)
   
     a  b  c
  1  0 44 44
  2 88  0  0

SLIDE 156

Extending flexmix

SLIDE 157

Extending flexmix

flexmix can be extended by providing new drivers for the M-step:
  • Alternative distributions / models in the components.
  • Alternative models for determining the component sizes depending on the concomitant variables.

SLIDE 158

Extending flexmix / 2

‘FLXM’ objects have the following slots:

fit: A function(x, y, w) which returns an object of class ‘FLXcomponent’.
defineComponent: A function(para) which constructs an object of class ‘FLXcomponent’.
weighted: Logical indicating if the driver is able to perform weighted ML estimation.
formula: Formula relative to the formula which is specified in flexmix().
name: A string giving the name of the driver.

SLIDE 159

Extending flexmix / 3

Objects of class ‘FLXcomponent’ have the following slots:

logLik: A function(x, y) which returns the log-likelihood values for the observations given by x and y.
predict: A function(x) which returns the predicted values for y given x.
df: The number of degrees of freedom used to estimate the component.
parameters: A list containing the estimated parameters.

SLIDE 160

Extending: mymclust()

mymclust <- function(formula = . ~ ., diagonal = TRUE) {
  z <- new("FLXMC", weighted = TRUE, formula = formula,
           dist = "mvnorm", name = "my driver")
  z@defineComponent <- function(para) {
    logLik <- function(x, y)
      mvtnorm::dmvnorm(y, para$center, para$cov, log = TRUE)
    predict <- function(x)
      matrix(para$center, nrow = nrow(x),
             ncol = length(para$center), byrow = TRUE)
    new("FLXcomponent",
        parameters = list(center = para$center, cov = para$cov),
        df = para$df, logLik = logLik, predict = predict)
  }

SLIDE 161

Extending: mymclust() / 2

  z@fit <- function(x, y, w) {
    para <- cov.wt(y, wt = w)[c("center", "cov")]
    para$df <- (3 * ncol(y) + ncol(y)^2) / 2
    if (diagonal) {
      para$cov <- diag(diag(para$cov))
      para$df <- 2 * ncol(y)
    }
    z@defineComponent(para)
  }
  z
}

SLIDE 162

Extending: mymclust() / 3

Using the new driver:

> data("Nclus", package = "flexmix")
> m1 <- flexmix(Nclus ~ 1, k = 4,
+     model = mymclust(diagonal = FALSE))
> m1

Call:
flexmix(formula = Nclus ~ 1, k = 4,
    model = mymclust(diagonal = FALSE))

Cluster sizes:
  1   2   3   4
 96 100 150 204

convergence after 28 iterations

SLIDE 163

Extending: mymclust() / 4

> summary(m1)

Call:
flexmix(formula = Nclus ~ 1, k = 4,
    model = mymclust(diagonal = FALSE))

       prior size post>0 ratio
Comp.1 0.177   96    131 0.733
Comp.2 0.182  100    112 0.893
Comp.3 0.272  150    176 0.852
Comp.4 0.368  204    249 0.819

'log Lik.' -2224 (df=23)
AIC: 4493   BIC: 4592

> plotEll(m1, Nclus)

SLIDE 164

Extending: mymclust() / 5

[Figure: scatter plot of the two response dimensions of Nclus with ellipses for the four fitted components.]

SLIDE 165

Summary

SLIDE 166

Summary

If unobserved heterogeneity is suspected, any statistical model can be included in the finite mixture framework. Estimation using the EM algorithm is straightforward if weighted ML estimation of the model is available. A range of add-on packages is available for R to fit mixture models. See also the CRAN Task View “Cluster Analysis & Finite Mixture Models” at

https://CRAN.R-project.org/view=Cluster.

SLIDE 167

Monographs

  • D. Böhning. Computer Assisted Analysis of Mixtures and Applications: Meta-Analysis, Disease Mapping, and Others. Chapman & Hall/CRC, London, 1999.
  • B. S. Everitt and D. J. Hand. Finite Mixture Distributions. Chapman and Hall, London, 1981.
  • S. Frühwirth-Schnatter. Finite Mixture and Markov Switching Models. Springer Series in Statistics. Springer, New York, 2006.
  • B. G. Lindsay. Mixture Models: Theory, Geometry, and Applications. The Institute for Mathematical Statistics, Hayward, California, 1995.
  • G. J. McLachlan and K. E. Basford. Mixture Models: Inference and Applications to Clustering. Marcel Dekker, New York, 1988.
  • G. J. McLachlan and D. Peel. Finite Mixture Models. Wiley, 2000.
  • D. M. Titterington, A. F. M. Smith and U. E. Makov. Statistical Analysis of Finite Mixture Distributions. Wiley, 1985.

SLIDE 168

Additional references

  • M. Aitkin. Meta-analysis by random effect modelling in generalized linear models. Statistics in Medicine, 18:2343–2351, 1999.
  • K. E. Basford, D. R. Greenway, G. J. McLachlan, and D. Peel. Standard errors of fitted component means of normal mixtures. Computational Statistics, 12:1–17, 1997.
  • C. Biernacki, G. Celeux, and G. Govaert. Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Computational Statistics & Data Analysis, 41:561–575, 2003.
  • D. Böhning, E. Dietz, P. Schlattmann, L. Mendonça, and U. Kirchner. The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. Journal of the Royal Statistical Society A, 162(2):195–209, 1999.
  • G. Boiteau, M. Singh, R. P. Singh, G. C. C. Tai, and T. R. Turner. Rate of spread of PVY-n by alate Myzus persicae (Sulzer) from infected to healthy plants under laboratory conditions. Potato Research, 41:335–344, 1998.

SLIDE 169

Additional references / 2

  • C. M. Dayton. Latent Class Scaling Analysis. Sage, 1999.
  • C. M. Dayton and G. B. Macready. Concomitant-variable latent-class models. Journal of the American Statistical Association, 83(401):173–178, 1988.
  • A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM-algorithm. Journal of the Royal Statistical Society B, 39:1–38, 1977.
  • C. Fraley and A. E. Raftery. Model-based clustering, discriminant analysis and density estimation. Journal of the American Statistical Association, 97(458):611–631, 2002.
  • B. Grün and F. Leisch. FlexMix version 2: Finite mixtures with concomitant variables and varying and constant parameters. Journal of Statistical Software, 28(4), 2008a. URL http://www.jstatsoft.org/v28/i04/.
  • B. Grün and F. Leisch. Identifiability of finite mixtures of multinomial logit models with varying and fixed effects. Journal of Classification, 25(2):225–247, 2008b.

SLIDE 170

Additional references / 3

  • C. Hennig. Identifiability of models for clusterwise linear regression. Journal of Classification, 17(2):273–296, 2000.
  • L. Hubert and P. Arabie. Comparing partitions. Journal of Classification, 2:193–218, 1985.
  • M. Hurn, A. Justel, and C. P. Robert. Estimating mixtures of regressions. Journal of Computational and Graphical Statistics, 12(1):55–79, 2003.
  • L. Hunt and M. Jorgensen. Mixture model clustering using the MULTIMIX program. Australian & New Zealand Journal of Statistics, 41(2):153–171, 1999.
  • F. Leisch. FlexMix: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software, 11(8), 2004. URL http://www.jstatsoft.org/v11/i08/.
  • G. J. McLachlan. On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. Journal of the Royal Statistical Society C, 36(3):318–324, 1987.

SLIDE 171

Additional references / 4

  • P. D. McNicholas and T. B. Murphy. Parsimonious Gaussian mixture models. Statistics and Computing, 18(3):285–296, 2008.
  • T. R. Turner. Estimating the rate of spread of a viral infection of potato plants via mixtures of regressions. Journal of the Royal Statistical Society C, 49(3):371–384, 2000.
  • R. Wehrens, L. M. C. Buydens, C. Fraley, and A. Raftery. Model-based clustering for image segmentation and large datasets via sampling. Journal of Classification, 21:231–253, 2004.
