SLIDE 1

Minimum Message Length Inference and Mixture Modelling of Inverse Gaussian Distributions

Daniel F. Schmidt, Enes Makalic

Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology, School of Population Health, University of Melbourne

25th Australasian Joint Conference on Artificial Intelligence 2012


SLIDE 2


Content

1. Mixture Modelling
   - Problem Description
   - MML Mixture Models
2. MML Inverse Gaussian Distributions
   - Inverse Gaussian Distributions
   - MML Inference of Inverse Gaussians
3. Example

SLIDE 3


Problem Description

We have n items, each with q associated attributes, formed into a matrix

$$
Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
  = \begin{pmatrix}
      y_{1,1} & y_{1,2} & \cdots & y_{1,q} \\
      y_{2,1} & y_{2,2} & \cdots & y_{2,q} \\
      \vdots  & \vdots  & \ddots & \vdots  \\
      y_{n,1} & y_{n,2} & \cdots & y_{n,q}
    \end{pmatrix}
$$

Group together, or "cluster", similar items
A form of unsupervised learning
Sometimes called intrinsic classification ⇒ class labels are learned from the data

SLIDE 4


Mixture Modelling (1)

Models data as a mixture of probability distributions:

$$
p(y_{i,j}; \Phi) = \sum_{k=1}^{K} \alpha_k \, p(y_{i,j}; \theta_{k,j})
$$

where
K is the number of classes
α = (α_1, …, α_K) are the mixing (population) weights
θ_{k,j} are the parameters of the distributions
Φ = {K, α, θ_{1,1}, …, θ_{K,q}} denotes the complete mixture model

Has an explicit probabilistic form ⇒ allows for statistical interpretation
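As a quick illustration (a sketch, not code from the paper), this density can be evaluated directly from the definition; `component_pdfs` is a hypothetical argument holding one density callable per class, each already parameterised by its θ_k:

```python
def mixture_density(y, alphas, component_pdfs):
    """p(y; Phi) = sum_k alpha_k * p(y; theta_k) for one attribute value.

    alphas: mixing weights (alpha_1, ..., alpha_K), summing to 1.
    component_pdfs: hypothetical per-class density callables p(y; theta_k).
    """
    return sum(a * pdf(y) for a, pdf in zip(alphas, component_pdfs))
```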

SLIDE 5


Mixture Modelling (2)

How is this related to clustering?
Each class is a cluster
Class-specific probability distributions over each attribute, e.g., normal, inverse Gaussian, Poisson, etc.
The mixing weight is the prevalence of the class in the population
Measure of similarity of an item to a class:

$$
p_k(y_i) = \prod_{j=1}^{q} p(y_{i,j}; \theta_{k,j})
$$

⇒ the probability of the item's attributes under the class distributions
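A minimal sketch of this per-class likelihood, assuming class k supplies one density callable per attribute (`class_pdfs` is a hypothetical name); the product is taken in log-space for numerical stability:

```python
import numpy as np

def class_likelihood(y_i, class_pdfs):
    """p_k(y_i) = prod_j p(y_{i,j}; theta_{k,j}) for one item, one class.

    y_i: length-q attribute vector of one item.
    class_pdfs: hypothetical list of q per-attribute densities of class k.
    """
    # Sum of logs, then exponentiate, to avoid underflow for large q.
    return np.exp(sum(np.log(pdf(y_ij)) for pdf, y_ij in zip(class_pdfs, y_i)))
```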


SLIDE 7


Mixture Modelling (3)

Membership of items to classes is soft:

$$
r_{i,k} = \frac{\alpha_k \, p_k(y_i)}{\sum_{l=1}^{K} \alpha_l \, p_l(y_i)}
$$

This is the posterior probability of belonging to class k, where
α_k is the a priori probability that an item belongs to class k
p_k(y_i) is the probability of data item y_i under class k
⇒ Assign each item to the class with the highest posterior probability
The total number of samples in a class is then

$$
n_k = \sum_{i=1}^{n} r_{i,k}
$$
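These quantities translate into a few lines of code; a sketch, assuming the matrix of per-class likelihoods p_k(y_i) has been precomputed (e.g., with the `class_likelihood` helper above):

```python
import numpy as np

def responsibilities(P, alphas):
    """Soft memberships r_{i,k} = alpha_k p_k(y_i) / sum_l alpha_l p_l(y_i).

    P: (n, K) matrix with P[i, k] = p_k(y_i); alphas: length-K weights.
    """
    R = P * np.asarray(alphas)          # unnormalised alpha_k * p_k(y_i)
    R /= R.sum(axis=1, keepdims=True)   # normalise each row over classes
    n_k = R.sum(axis=0)                 # soft class sizes n_k = sum_i r_{i,k}
    hard = R.argmax(axis=1)             # assign to highest-posterior class
    return R, n_k, hard
```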


SLIDE 9


MML Mixture Models (1)

Minimum Message Length: a goodness-of-fit criterion
Popular criterion for mixture modelling
Based on the idea of compression
The message length of the data is our yardstick; it comprises:
1. The length of the codeword needed to state the model Φ:
   - Number of classes: I(K)
   - Relative abundances: I(α)
   - Parameters for each distribution in each class: I(θ_{k,j})
2. The length of the codeword needed to state the data, given the model: I(Y|Φ)


SLIDE 11


MML Mixture Models (2)

Total message length:

$$
I(Y, \Phi) = I(K) + I(\alpha) + \sum_{k=1}^{K} \sum_{j=1}^{q} I(\theta_{k,j}) + I(Y \mid \Phi)
$$

⇒ balances model complexity against model fit
Estimate Φ by minimising the message length
α̂ and the θ̂_{k,j} are found by expectation-maximisation
Find K̂ by splitting/merging classes
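As a rough sketch of the E/M alternation referred to here (illustrative only: a one-attribute Gaussian mixture with ML-style updates, whereas the paper uses MML parameter estimates inside the M-step and splits/merges classes to search over K):

```python
import numpy as np
from scipy.stats import norm

def em_step(y, alphas, mus, sigmas):
    """One EM iteration for a one-attribute Gaussian mixture (stand-in
    example; the paper's M-step would use MML estimates instead)."""
    # E-step: responsibilities r_{i,k}, as on the earlier slides.
    R = np.array([a * norm.pdf(y, m, s)
                  for a, m, s in zip(alphas, mus, sigmas)]).T
    R /= R.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights and class parameters from soft counts.
    n_k = R.sum(axis=0)
    alphas = n_k / len(y)
    mus = (R * y[:, None]).sum(axis=0) / n_k
    sigmas = np.sqrt((R * (y[:, None] - mus) ** 2).sum(axis=0) / n_k)
    return alphas, mus, sigmas
```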


SLIDE 13


Content

1. Mixture Modelling
   - Problem Description
   - MML Mixture Models
2. MML Inverse Gaussian Distributions
   - Inverse Gaussian Distributions
   - MML Inference of Inverse Gaussians
3. Example

SLIDE 14


Inverse Gaussian Distributions (1)

Distribution for positive, continuous data
We say Y_i ∼ IG(µ, λ) if the p.d.f. for Y_i = y_i is

$$
p(y_i; \mu, \lambda) = \left( \frac{1}{2 \pi \lambda y_i^3} \right)^{1/2}
  \exp\left( -\frac{(y_i - \mu)^2}{2 \mu^2 \lambda y_i} \right),
$$

where
µ > 0 is the mean parameter
λ > 0 is the inverse-shape parameter

Suitable for positively skewed data
Derive the message length formula for use in mixture modelling
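This density is simple to implement directly; a sketch in NumPy, using the slide's (µ, λ) parameterisation with λ as the inverse-shape:

```python
import numpy as np

def invgauss_pdf(y, mu, lam):
    """Inverse Gaussian density IG(mu, lambda): mu > 0 is the mean,
    lam > 0 the inverse-shape, as parameterised on this slide."""
    return (2 * np.pi * lam * y ** 3) ** -0.5 * np.exp(
        -(y - mu) ** 2 / (2 * mu ** 2 * lam * y))
```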

SLIDE 15


Inverse Gaussian Distributions (2)

Example of inverse Gaussian distributions

[Figure: inverse Gaussian densities p(y; µ, λ) for (µ=1, λ=1), (µ=1, λ=3), and (µ=3, λ=1)]

SLIDE 16


MML Inference of Inverse Gaussians (1)

Use the Wallace–Freeman approximation
Bayesian: we chose the uninformative prior

$$
\pi(\mu, \lambda) \propto \frac{1}{\lambda \mu^{3/2}}
$$

Message length component for use in mixture models:

$$
I(\theta_{k,j}) = \log n_k - \frac{1}{2} \log \hat{\lambda}_{k,j}
  + \log\left( \frac{2\sqrt{2}\, a_j}{b_j} \right)
$$

where
λ̂_{k,j} is the MML estimate of λ for class k and variable j
n_k is the number of samples in class k
a_j, b_j are hyper-parameters

Details may be found in the paper
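Transcribed as a function (a sketch only: the constant inside the final log follows the reconstruction above, and the hyper-parameters a_j, b_j must be set as described in the paper):

```python
import numpy as np

def ig_param_codelength(n_k, lam_hat, a_j, b_j):
    """I(theta_{k,j}) for one inverse Gaussian component, per the formula
    above; a_j and b_j are the paper's hyper-parameters."""
    return np.log(n_k) - 0.5 * np.log(lam_hat) + np.log(2 * np.sqrt(2) * a_j / b_j)
```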


SLIDE 18


MML Inference of Inverse Gaussians (2)

Let y = (y_1, …, y_n) be data from an inverse Gaussian
Define the sufficient statistics

$$
S_1 = \sum_{i=1}^{n} y_i, \qquad S_2 = \sum_{i=1}^{n} \frac{1}{y_i}
$$

Compare the maximum likelihood estimates

$$
\hat{\mu}_{\mathrm{ML}} = \frac{S_1}{n}, \qquad
\hat{\lambda}_{\mathrm{ML}} = \frac{S_1 S_2 - n^2}{n S_1}
$$

to the minimum message length estimates

$$
\hat{\mu}_{87} = \frac{S_1}{n}, \qquad
\hat{\lambda}_{87} = \frac{S_1 S_2 - n^2}{(n-1) S_1}
$$

The MML estimates:
1. are unbiased
2. strictly dominate the ML estimates in terms of KL risk
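Both sets of estimates follow directly from the sufficient statistics; a sketch:

```python
import numpy as np

def ig_estimates(y):
    """ML and MML estimates of IG(mu, lambda) from S1 = sum y_i and
    S2 = sum 1/y_i (the '87' subscript on the slide marks the MML ones)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    S1, S2 = y.sum(), (1.0 / y).sum()
    mu_hat = S1 / n                                 # shared by ML and MML
    lam_ml = (S1 * S2 - n ** 2) / (n * S1)          # maximum likelihood
    lam_mml = (S1 * S2 - n ** 2) / ((n - 1) * S1)   # MML: unbiased
    return mu_hat, lam_ml, lam_mml
```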


SLIDE 20


Content

1. Mixture Modelling
   - Problem Description
   - MML Mixture Models
2. MML Inverse Gaussian Distributions
   - Inverse Gaussian Distributions
   - MML Inference of Inverse Gaussians
3. Example

SLIDE 21


Example (1)

Compared inverse Gaussian mixture models against standard Gaussian mixture models
Used several well-known, real datasets:
1. "Enzyme"
2. "Acidity"
3. "Galaxy"
Results shown for "enzyme" (n = 245 samples)
See the paper for the "acidity" and "galaxy" results

SLIDE 22


Example (2)

Histogram of “enzyme” data

[Figure: histogram of the "enzyme" data]

SLIDE 23


Example (3)

Gaussian mixture model (K = 2, I = 86.19)

[Figure: fitted two-class Gaussian mixture density for the "enzyme" data]

SLIDE 24


Example (4)

Inverse Gaussian mixture model (K = 3, I = 69.34)

[Figure: fitted three-class inverse Gaussian mixture density for the "enzyme" data]

The inverse Gaussian mixture attains a shorter message length than the Gaussian mixture (I = 69.34 vs. 86.19), so MML prefers it.

SLIDE 25


References

Wallace, C. S. and Boulton, D. M. (1968). "An information measure for classification". Computer Journal, 11, 185–194.

Wallace, C. S. and Dowe, D. L. (1997). "MML mixture modelling of multi-state, Poisson, von Mises circular and Gaussian distributions". Proceedings of the 6th International Workshop on Artificial Intelligence and Statistics, 529–536.

Wallace, C. S. (1998). "Intrinsic classification of spatially correlated data". The Computer Journal, 41, 602–611.

Wallace, C. S. and Dowe, D. L. (2000). "MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions". Statistics and Computing, 10, 73–83.

Wallace, C. S. (2005). Statistical and Inductive Inference by Minimum Message Length. Springer.