
slide-1
SLIDE 1

Mixture models and EM algorithms Multivariate non-parametric “npEM” algorithms Further extensions

EM-like algorithms for nonparametric estimation in multivariate mixtures

Didier Chauveau

MAPMO - UMR 6628 - Université d’Orléans Joint work with D. Hunter & T. Benaglia (Penn State University, USA)

COMPSTAT 2010 – Paris, August 24th 2010

  • D. Chauveau – COMPSTAT 2010

Nonparametric multivariate mixtures

slide-2
SLIDE 2


Outline

1. Mixture models and EM algorithms
   • Motivations, examples and notation
   • Review of EM algorithm-ology

2. Multivariate non-parametric “npEM” algorithms
   • Model and algorithm
   • Examples
   • Adaptive bandwidths in the npEM algorithm

3. Further extensions

  • D. Chauveau – COMPSTAT 2010

Nonparametric multivariate mixtures



slide-7
SLIDE 7


Finite mixture estimation problem

Multivariate observation x = (x_1, . . . , x_r) ∈ ℝ^r from the mixture

g(x) = ∑_{j=1}^{m} λ_j f_j(x)

Assume independence of x_1, . . . , x_r conditional on the component from which x comes (Hall and Zhou 2003, . . . ):

g(x) = ∑_{j=1}^{m} λ_j ∏_{k=1}^{r} f_{jk}(x_k)

i.e., the dependence is induced by the mixture.

Goal: Estimate θ = (λ, f) given an i.i.d. sample from g.
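The conditionally independent mixture density evaluates directly in code. A minimal Python sketch (the talk's software is R/mixtools; the function and argument names here are ours, for illustration only):

```python
import numpy as np

def mixture_density(x, lam, pdfs):
    """Evaluate g(x) = sum_j lam_j * prod_k f_jk(x_k) at one point x.

    x    : length-r observation
    lam  : length-m mixing proportions (nonnegative, summing to 1)
    pdfs : pdfs[j][k] is the univariate density f_jk (a callable)
    """
    g = 0.0
    for j, lam_j in enumerate(lam):
        # conditional independence: the j-th joint density is a product
        g += lam_j * np.prod([pdfs[j][k](xk) for k, xk in enumerate(x)])
    return g
```

For instance, with Uniform[0, 1] densities in every cell, g is identically 1 on the unit square, whatever the mixing proportions.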



slide-9
SLIDE 9


Nonparametric mixture model

In the parametric case, f_j(·) ≡ f(·; φ_j) ∈ F, a parametric family indexed by a parameter φ ∈ ℝ^d. The parameter of the mixture model is θ = (λ, φ) = (λ_1, . . . , λ_m, φ_1, . . . , φ_m).

Usual example: the univariate Gaussian mixture model, f(x; φ_j) = f(x; (µ_j, σ²_j)) = the pdf of N(µ_j, σ²_j).

Motivations here: Do not assume any parametric form for the f_jk's (e.g., avoid assumptions on tails . . . ).



slide-11
SLIDE 11


Notational convention

We have:
n = # of individuals in the sample
m = # of Mixture components
r = # of Repeated measurements (coordinates)

Throughout, we use the subscripts 1 ≤ i ≤ n, 1 ≤ j ≤ m, 1 ≤ k ≤ r.

The log-likelihood given data x_1, . . . , x_n is

L(θ) = ∑_{i=1}^{n} log( ∑_{j=1}^{m} λ_j ∏_{k=1}^{r} f_{jk}(x_{ik}) )
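The log-likelihood translates line by line into code. A short Python sketch under the slide's notation (our names, not code from the talk):

```python
import numpy as np

def log_likelihood(X, lam, pdfs):
    """L(theta) = sum_i log( sum_j lam_j * prod_k f_jk(x_ik) ).

    X    : (n, r) array of observations
    lam  : length-m mixing proportions
    pdfs : pdfs[j][k] is the univariate density f_jk (a callable)
    """
    total = 0.0
    for xi in X:
        # inner sum over components, product over coordinates
        inner = sum(lam[j] * np.prod([pdfs[j][k](v) for k, v in enumerate(xi)])
                    for j in range(len(lam)))
        total += np.log(inner)
    return total
```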


slide-12
SLIDE 12


Motivating example: Water-level data

Example from Thomas, Lohaus and Brainerd (1993). The task: n = 405 subjects are shown r = 8 vessels, pointing at 1, 2, 4, 5, 7, 8, 10 and 11 o'clock. They draw the water surface for each. Measure: the (signed) angle formed by the surface with the horizontal.

[Figure: vessel tilted to point at 1:00]




slide-15
SLIDE 15


Review of standard EM for mixtures

For MLE in finite mixtures, EM algorithms are standard. A “complete” observation (X, Z) consists of:
• The observed, “incomplete” data X
• The “missing” vector Z, defined for 1 ≤ j ≤ m by Z_j = 1 if X comes from component j, 0 otherwise

What does this mean? In simulations, we generate Z first, then X | Z_j = 1 ∼ f_j. In real data, Z is a latent variable whose interpretation depends on context.



slide-19
SLIDE 19


Parametric (univariate) EM algorithm for mixtures

Let θ^t be an “arbitrary” value of θ.

E-step: Amounts to finding the conditional expectation of each Z:

Z^t_{ij} := P_{θ^t}[Z_{ij} = 1 | x_i] = λ^t_j f(x_i; φ^t_j) / ∑_{j′} λ^t_{j′} f(x_i; φ^t_{j′})

M-step: Maximize the “complete data” loglikelihood:

θ^{t+1} = arg max_θ ∑_{i=1}^{n} ∑_{j=1}^{m} Z^t_{ij} log( λ_j f(x_i; φ_j) )

Typically:

λ^{t+1}_j = (1/n) ∑_{i=1}^{n} Z^t_{ij},   µ^{t+1}_j = ∑_{i=1}^{n} Z^t_{ij} x_i / ∑_{i=1}^{n} Z^t_{ij},   . . .
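For the univariate Gaussian mixture, one full EM iteration fits in a few lines. A textbook-style Python sketch (not the talk's code; vectorized with NumPy broadcasting):

```python
import numpy as np

def em_step(x, lam, mu, sigma):
    """One EM iteration for a univariate Gaussian mixture.

    x : (n,) data; lam, mu, sigma : (m,) current parameter values.
    Returns updated (lam, mu, sigma) and the posteriors Z of shape (n, m).
    """
    # E-step: Z[i, j] = posterior probability that x_i comes from component j
    dens = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    Z = lam * dens
    Z /= Z.sum(axis=1, keepdims=True)
    # M-step: closed-form weighted updates, as on the slide
    Nj = Z.sum(axis=0)
    lam_new = Nj / len(x)
    mu_new = (Z * x[:, None]).sum(axis=0) / Nj
    sigma_new = np.sqrt((Z * (x[:, None] - mu_new) ** 2).sum(axis=0) / Nj)
    return lam_new, mu_new, sigma_new, Z
```

Iterating this map until L(θ) stabilizes gives the usual parametric EM fit.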



slide-22
SLIDE 22


Semiparametric univariate mixture & EM-like algorithm

Identifiability: does g(x) uniquely determine all the λ_j's and f_j's?
Parametric case: when f_j(x) = f(x; φ_j), generally OK.
Nonparametric case: some restrictions on the f_j's are needed.

Bordes, Mottelet and Vandekerkhove (2006) and Hunter, Wang and Hettmansperger (2007) both showed that, for f symmetric about the origin and λ_1 ≠ 1/2,

g_θ(x) = ∑_{j=1}^{2} λ_j f(x − µ_j)

is identifiable for the parameter θ = (λ, µ, f).

Bordes, Chauveau and Vandekerkhove (2007) introduced a stochastic EM-like algorithm that includes a Kernel Density Estimation (KDE) step.




slide-26
SLIDE 26


The blessing of dimensionality (!)

Recall the model in the multivariate case, r > 1:

g(x) = ∑_{j=1}^{m} λ_j ∏_{k=1}^{r} f_{jk}(x_k)

N.B.: Assume conditional independence of x_1, . . . , x_r.

Hall and Zhou (2003) show that when m = 2 and r ≥ 3, the model is identifiable under mild restrictions on the f_jk(·).

Hall et al. (2005): “. . . from at least one point of view, the ‘curse of dimensionality’ works in reverse.”

Allman et al. (2008) give mild sufficient conditions for identifiability whenever r ≥ 3.



slide-28
SLIDE 28


The notation gets even worse. . .

Suppose some of the r coordinates are identically distributed: let the r coordinates be grouped into B blocks of i.i.d. coordinates, and denote the block index of the kth coordinate by b_k ∈ {1, . . . , B}, k = 1, . . . , r. The model becomes

g(x) = ∑_{j=1}^{m} λ_j ∏_{k=1}^{r} f_{j b_k}(x_k)

Special cases:
• b_k = k for each k: fully general model, seen earlier (Hall et al. 2005; Qin and Leung 2006)
• b_k = 1 for each k: conditionally i.i.d. assumption (Elmore et al. 2004)
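A sketch of how the block labels b_k collapse the density products, in Python (illustrative names; 1-based block labels as on the slide):

```python
import numpy as np

def block_mixture_density(x, lam, b, f):
    """g(x) = sum_j lam_j * prod_k f_{j, b_k}(x_k) with shared block densities.

    b : length-r block labels, values in {1, ..., B}
    f : f[j][l - 1] is the single density shared by all coordinates of block l
    """
    g = 0.0
    for j, lam_j in enumerate(lam):
        # coordinates in the same block reuse the same density f_{j, b_k}
        g += lam_j * np.prod([f[j][b[k] - 1](xk) for k, xk in enumerate(x)])
    return g
```

With b = (1, . . . , 1) this reduces to the conditionally i.i.d. model; with b = (1, . . . , r) it is the fully general one.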



slide-31
SLIDE 31


Motivation: The water-level data example again

8 vessels, presented in order 11, 4, 2, 7, 10, 5, 1, 8 o'clock.

Assume that opposite clock-face orientations lead to conditionally i.i.d. responses (same behavior).

B = 4 blocks defined by b = (4, 3, 2, 1, 3, 4, 1, 2); e.g., b_4 = b_7 = 1, i.e., block 1 relates to coordinates 4 and 7, corresponding to clock orientations 1:00 and 7:00.

[Figure: vessel tilted to point at 1:00 and 7:00]



slide-34
SLIDE 34


The nonparametric “EM” (npEM) algorithm

E-step: Same as usual, but now f_{j b_k} is part of the parameter:

Z^t_{ij} ≡ E_{θ^t}[Z_{ij} | x_i] = λ^t_j ∏_{k=1}^{r} f^t_{j b_k}(x_{ik}) / ∑_{j′} λ^t_{j′} ∏_{k=1}^{r} f^t_{j′ b_k}(x_{ik})

M-step: Maximize the “complete data loglikelihood” for λ:

λ^{t+1}_j = (1/n) ∑_{i=1}^{n} Z^t_{ij}

WKDE-step: Update the estimate of f_{jℓ} (component j, block ℓ) by

f^{t+1}_{jℓ}(u) = 1/(n h C_ℓ λ^{t+1}_j) ∑_{k=1}^{r} ∑_{i=1}^{n} Z^t_{ij} I{b_k = ℓ} K( (u − x_{ik}) / h )

where C_ℓ = ∑_{k=1}^{r} I{b_k = ℓ} = # of coordinates in block ℓ.
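The WKDE step is an ordinary kernel density estimate whose weights are the current posteriors. A Python sketch with a Gaussian kernel (mixtools' npEM is the reference implementation; the names here are ours):

```python
import numpy as np

def wkde(u, X, Z, b, j, l, h):
    """f_{jl}^{t+1}(u): posterior-weighted KDE over the coordinates of block l.

    X : (n, r) data; Z : (n, m) posteriors Z_ij;
    b : length-r block labels (1-based); h : bandwidth.
    """
    n, r = X.shape
    lam_j = Z[:, j].mean()                        # lambda_j^{t+1} = (1/n) sum_i Z_ij
    C_l = sum(1 for k in range(r) if b[k] == l)   # C_l = # of coordinates in block l
    K = lambda v: np.exp(-0.5 * v * v) / np.sqrt(2.0 * np.pi)  # Gaussian kernel
    s = sum(Z[i, j] * K((u - X[i, k]) / h)
            for k in range(r) if b[k] == l for i in range(n))
    return s / (n * h * C_l * lam_j)
```

When m = 1 (all posteriors equal to 1) this collapses to a plain KDE over the pooled block data.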



slide-36
SLIDE 36


Advertising!

All computational techniques in this talk are implemented in the mixtools package for the R Statistical Software

www.r-project.org cran.cict.fr/web/packages/mixtools



slide-38
SLIDE 38


Simulated trivariate benchmark models

Comparisons with the Hall et al. (2005) inversion method: m = 2, r = 3, conditional independence (no blocks).

For j = 1, 2 and k = 1, 2, 3, we compute, as in Hall et al.,

MISE_{jk} = (1/S) ∑_{s=1}^{S} ∫ ( f̂^{(s)}_{jk}(u) − f_{jk}(u) )² du

over S replications, where the Ẑ_{ij}'s are the final posteriors and

f̂_{jk}(u) = 1/(n h λ̂_j) ∑_{i=1}^{n} Ẑ_{ij} K( (u − x_{ik}) / h )


slide-39
SLIDE 39


MISE comparisons with Hall et al. (2005) benchmarks

n = 500, S = 300 replications, 3 models, log scale

[Figure: MISE of both components against λ_1 ∈ (0.1, 0.4) for the Normal, Double Exponential and t(10) benchmark models; the inversion method (↑) sits above the npEM (↓).]



slide-41
SLIDE 41


The Water-level data

Previously analysed using mixtures by Hettmansperger and Thomas (2000) and Elmore et al. (2004), with the following assumptions and model: the r = 8 coordinates are assumed conditionally i.i.d., and a cutpoint approach (binning the data into p-dimensional vectors) yields a mixture of multinomials, identifiable whenever r ≥ 2m − 1 (Elmore and Wang 2003). This inappropriate i.i.d. assumption masks interesting features that our model reveals.


slide-42
SLIDE 42


The Water-level data, m = 3 components, 4 blocks

[Figure: fitted component densities per block, angles −90 to 90. Mixing proportions 0.077, 0.431, 0.492, with per-block (mean, std dev):]

Block 1 (1:00 and 7:00 orientations): (−32.1, 19.4), (−3.9, 23.3), (−1.4, 6.0)
Block 2 (2:00 and 8:00 orientations): (−31.4, 55.4), (−11.7, 27.0), (−2.7, 4.6)
Block 3 (4:00 and 10:00 orientations): (43.6, 39.7), (11.4, 27.5), (1.0, 5.3)
Block 4 (5:00 and 11:00 orientations): (27.5, 19.3), (2.0, 22.1), (−0.1, 6.1)




slide-45
SLIDE 45


Bandwidth issues in the kernel density estimates

Crude method: use the R default (Silverman's rule) based on the standard deviation (sd) and interquartile range (IQR) computed by pooling the n × r data points:

h = 0.9 min( sd, IQR/1.34 ) (nr)^{−1/5}

This is inappropriate for mixtures, e.g. for components with supports of different locations and/or scales. Example (see later): f_11 ≡ Student and f_22 ≡ Beta.
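Silverman's rule as stated is straightforward to reproduce. A simplified Python sketch (R's bw.nrd0 is the reference; it adds safeguards for degenerate cases that are omitted here):

```python
import numpy as np

def silverman_bandwidth(x):
    """h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5), pooling all values of x."""
    x = np.ravel(np.asarray(x, dtype=float))   # pool the n * r data points
    sd = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    return 0.9 * min(sd, (q75 - q25) / 1.34) * len(x) ** (-0.2)
```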



slide-47
SLIDE 47


Iterative and per component & block bandwidths

Estimated sample size for the jth component and ℓth block:

∑_{i=1}^{n} ∑_{k=1}^{r} I{b_k = ℓ} Z^t_{ij} = n C_ℓ λ^t_j

Iterative bandwidth h^{t+1}_{jℓ}, applying (e.g.) Silverman's rule:

h^{t+1}_{jℓ} = 0.9 min( σ^{t+1}_{jℓ}, IQR^{t+1}_{jℓ} / 1.34 ) (n C_ℓ λ^{t+1}_j)^{−1/5}

where the σ's and IQR's have to be estimated per iteration/component/block.


slide-48
SLIDE 48


Iterative and per component/block sd’s

Augment each M-step to include

µ^{t+1}_{jℓ} = ∑_{i=1}^{n} ∑_{k=1}^{r} Z^t_{ij} I{b_k = ℓ} x_{ik} / (n C_ℓ λ^{t+1}_j)

σ^{t+1}_{jℓ} = [ ∑_{i=1}^{n} ∑_{k=1}^{r} Z^t_{ij} I{b_k = ℓ} (x_{ik} − µ^{t+1}_{jℓ})² / (n C_ℓ λ^{t+1}_j) ]^{1/2}

NB: these “parameters” are not in the model.
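Since the denominator n C_ℓ λ_j^{t+1} equals the total posterior weight in the block, these moments are exactly weighted averages. A Python sketch (our names; 1-based block labels as on the slides):

```python
import numpy as np

def block_moments(X, Z, b, j, l):
    """Posterior-weighted mean and sd of block l for component j
    (mu_{jl} and sigma_{jl} in the slides' notation).

    X : (n, r) data; Z : (n, m) posteriors; b : length-r block labels.
    """
    cols = [k for k in range(X.shape[1]) if b[k] == l]
    v = X[:, cols].ravel()                 # block data x_ik, row by row
    w = np.repeat(Z[:, j], len(cols))      # weight Z_ij repeated per coordinate
    mu = np.average(v, weights=w)
    sigma = np.sqrt(np.average((v - mu) ** 2, weights=w))
    return mu, sigma
```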



slide-51
SLIDE 51


Iterative and per component/block quantiles

Let x^ℓ denote the n C_ℓ data in block ℓ, and τ(·) be a permutation on {1, . . . , n C_ℓ} such that

x^ℓ_{τ(1)} ≤ x^ℓ_{τ(2)} ≤ · · · ≤ x^ℓ_{τ(n C_ℓ)}

Define the weighted α-quantile estimate

Q^{t+1}_{jℓ,α} = x^ℓ_{τ(i_α)},  where i_α = min{ s : ∑_{u=1}^{s} Z^t_{τ(u) j} ≥ α n C_ℓ λ^{t+1}_j }

Set IQR^{t+1}_{jℓ} = Q^{t+1}_{jℓ,0.75} − Q^{t+1}_{jℓ,0.25}.
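In words: walk the sorted block data until the accumulated posterior weight reaches α times the total weight n C_ℓ λ_j^{t+1}. A Python sketch (our names):

```python
import numpy as np

def weighted_quantile(v, w, alpha):
    """Smallest sorted value of v whose cumulative weight reaches
    alpha * sum(w); with w the posteriors Z_ij this is Q_{jl,alpha}."""
    v, w = np.asarray(v, dtype=float), np.asarray(w, dtype=float)
    order = np.argsort(v)
    cumw = np.cumsum(w[order])
    i = np.searchsorted(cumw, alpha * cumw[-1])  # min s with cum. weight >= target
    return v[order][i]
```

The weighted IQR is then weighted_quantile(v, w, 0.75) − weighted_quantile(v, w, 0.25).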


slide-52
SLIDE 52


Iterative & adaptive bandwidth illustration

Multivariate example with m = 2, r = 5, B = 2 blocks:
Block 1 = (x_1, x_2, x_3), components f_11 = t(2, 0), f_21 = t(10, 4)
Block 2 = (x_4, x_5), components f_12 = U[0,1], f_22 = Beta(1, 5)

[Figure: the two block mixture densities — block 1 heavy-tailed over roughly (−5, 15), block 2 supported on [0, 1]]



slide-54
SLIDE 54


Simulated data, n = 300 individuals

Default bandwidth:

> blockid = c(1,1,1,2,2)
> a = npEM(x, 2, blockid)
> plot(a, breaks = 18)
> a$bandwidth
[1] 0.5238855

[Figure: fitted densities for coordinates 1,2,3 and 4,5; estimated proportions 0.427 / 0.573]

Bandwidth per block & component:

> b = npEM(x, 2, blockid, samebw=FALSE)
> plot(b, breaks = 18)
> b$bandwidth
        component 1 component 2
block 1  0.38573749  0.35232409
block 2  0.08441747  0.04388618

[Figure: fitted densities for coordinates 1,2,3 and 4,5; estimated proportions 0.431 / 0.569]


SLIDE 55


The Water-level data with adaptive bandwidth

[Figure: adaptive-bandwidth npEM fit on the Water-level data, three components with estimated weights 0.119, 0.834 and 0.046; one panel per block. Block 1: 1:00 and 7:00 orientations; Block 2: 2:00 and 8:00 orientations; Block 3: 4:00 and 10:00 orientations; Block 4: 5:00 and 11:00 orientations. Horizontal axes: signed angles from −90 to 90 degrees.]

> b$band
         comp 1 comp 2  comp 3
block 1  12.172 1.4597 0.97535
block 2  13.996 2.7370 2.27581
block 3  19.190 2.5545 2.27582
block 4  12.363 1.2772 1.62558
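With samebw = FALSE, the bandwidths are recomputed at each iteration from the current posterior probabilities. A sketch of the weighted Silverman-type rule of thumb (the exact constants below follow the classical rule and are stated here as an assumption about the implementation):

$$h_{jk}^t = 0.9 \, \min\!\left( \hat\sigma_{jk}^t, \; \frac{\widehat{\mathrm{IQR}}_{jk}^t}{1.349} \right) \left( n \, C_k \, \lambda_j^t \right)^{-1/5},$$

where $\hat\sigma_{jk}^t$ and $\widehat{\mathrm{IQR}}_{jk}^t$ are the standard deviation and interquartile range of the coordinates in block $k$ weighted by the posteriors, $C_k$ is the number of coordinates in block $k$, and $n C_k \lambda_j^t$ plays the role of the effective sample size for component $j$ in block $k$.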


SLIDE 56


Pros and cons of the npEM algorithm

  • Pro: easily generalizes beyond m = 2, r = 3 (not the case for inversion methods)
  • Pro: much lower MISE on similar test problems
  • Pro: computationally simple (implemented in the mixtools package)
  • Pro: no need to assume the coordinates are conditionally i.i.d., and no loss of information from categorizing the data (as for the cutpoint approach)
  • Con: not a true EM algorithm (no monotonicity property) → nonlinear smoothed-likelihood MM algorithms, Levine, Hunter and Chauveau (2010, . . . )


SLIDE 57


Outline: Next up. . .

1. Mixture models and EM algorithms: Motivations, examples and notation; Review of EM algorithm-ology
2. Multivariate non-parametric “npEM” algorithms: Model and algorithm; Examples; Adaptive bandwidths in the npEM algorithm
3. Further extensions


SLIDE 58


Further extensions: Semiparametric models

Component or block densities may differ only in location and/or scale parameters, e.g.

$$f_{j\ell}(x) = \frac{1}{\sigma_{j\ell}} \, f_j\!\left(\frac{x - \mu_{j\ell}}{\sigma_{j\ell}}\right), \quad \ell = 1, \dots, r$$

$$f_{j\ell}(x) = \frac{1}{\sigma_{j\ell}} \, f_\ell\!\left(\frac{x - \mu_{j\ell}}{\sigma_{j\ell}}\right), \quad \ell = 1, \dots, r$$

$$f_{j\ell}(x) = \frac{1}{\sigma_{j\ell}} \, f\!\left(\frac{x - \mu_{j\ell}}{\sigma_{j\ell}}\right),$$

where the $f_j$'s, the $f_\ell$'s, or the single $f$ remain fully unspecified.

For all these situations, special cases of the npEM algorithm can easily be designed (some are already implemented in mixtools).
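For these location and scale cases, the Euclidean parameters admit closed-form weighted updates in the M-step; a sketch of the standard EM-type updates (in the spirit of Bordes, Chauveau and Vandekerkhove, 2007):

$$\mu_{j\ell}^{t+1} = \frac{\sum_{i=1}^{n} p_{ij}^t \, x_{i\ell}}{\sum_{i=1}^{n} p_{ij}^t}, \qquad \left(\sigma_{j\ell}^{t+1}\right)^2 = \frac{\sum_{i=1}^{n} p_{ij}^t \left( x_{i\ell} - \mu_{j\ell}^{t+1} \right)^2}{\sum_{i=1}^{n} p_{ij}^t},$$

with $p_{ij}^t$ the posterior component probabilities from the E-step; the unspecified density is then re-estimated by weighted kernel density estimation on the standardized residuals.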


SLIDE 59


Further extensions: Stochastic npEM versions

In some setups, it may be useful to simulate the latent data from the posterior probabilities:

$$\hat Z_i^t \sim \mathcal{M}\left(1; \; Z_{i1}^t, \dots, Z_{im}^t\right), \quad i = 1, \dots, n,$$

where the $Z_{ij}^t$ are the posterior probabilities from the E-step. The sequence $(\theta^t)_{t \ge 1}$ then becomes a Markov chain.

Historically, parametric Stochastic EM was introduced by Celeux and Diebolt (1985, 1986, . . . ); see also MCMC sampling (Diebolt and Robert 1994).

In the nonparametric framework: Stochastic npEM for reliability mixture models, Bordes and Chauveau (COMPSTAT 2010, . . . ).

SLIDE 60


Selected references

Allman, E. S., Matias, C., and Rhodes, J. A. (2008), Identifiability of latent class models with many observed variables, Annals of Statistics, 37: 3099–3132.
Benaglia, T., Chauveau, D., and Hunter, D. R. (2009), An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures, Journal of Computational and Graphical Statistics, 18(2): 505–526.
Benaglia, T., Chauveau, D., Hunter, D. R., and Young, D. S. (2009), mixtools: An R Package for Analyzing Mixture Models, Journal of Statistical Software, 32: 1–29.
Bordes, L., Mottelet, S., and Vandekerkhove, P. (2006), Semiparametric estimation of a two-component mixture model, Annals of Statistics, 34: 1204–1232.
Bordes, L., Chauveau, D., and Vandekerkhove, P. (2007), An EM algorithm for a semiparametric mixture model, Computational Statistics and Data Analysis, 51: 5429–5443.
Elmore, R. T., Hettmansperger, T. P., and Thomas, H. (2004), Estimating component cumulative distribution functions in finite mixture models, Communications in Statistics: Theory and Methods, 33: 2075–2086.
Elmore, R. T., Hall, P., and Neeman, A. (2005), An application of classical invariant theory to identifiability in nonparametric mixtures, Annales de l'Institut Fourier, 55(1): 1–28.
Hall, P. and Zhou, X. H. (2003), Nonparametric estimation of component distributions in a multivariate mixture, Annals of Statistics, 31: 201–224.
Hall, P., Neeman, A., Pakyari, R., and Elmore, R. (2005), Nonparametric inference in multivariate mixtures, Biometrika, 92: 667–678.
Hunter, D. R., Wang, S., and Hettmansperger, T. P. (2007), Inference for mixtures of symmetric distributions, Annals of Statistics, 35: 224–251.
Thomas, H., Lohaus, A., and Brainerd, C. J. (1993), Modeling Growth and Individual Differences in Spatial Tasks, Monographs of the Society for Research in Child Development, 58(9): 1–190.
