SLIDE 1

Pattern Recognition 2019 Clustering, Mixture Models and EM

Ad Feelders

Universiteit Utrecht

December 13, 2019

SLIDE 2

Objective of Clustering

Put objects (persons, images, web pages, ...) into a number of groups in such a way that objects within the same group are similar, while objects in different groups are dissimilar.

[Figure: scatter plot of objects on Variable 1 and Variable 2]

SLIDE 3

Similarity between objects

Each object is described by a number of variables (also called features or attributes). The similarity between objects is determined on the basis of these variables. The measurement of similarity is central to many clustering methods.

SLIDE 4

Clustering ≠ Classification

In classification the group to which an object belongs is given, and the task is to discriminate between groups on the basis of the variables used to describe the objects. In clustering the groups are not given, but the objective is to discover them. Clustering is sometimes called unsupervised learning, and classification supervised learning.

SLIDE 5

Clustering Techniques

Many techniques have been developed to cluster objects into groups:

- Hierarchical clustering (not discussed).
- Partitioning methods (e.g. K-means, K-medoids).
- Model-based clustering (mixture models).

SLIDE 6

Data Matrix

We have observations on N objects that we want to cluster into a number of groups. For each object we observe D variables, numbered 1, 2, ..., D.

Data matrix:

$$X = \begin{pmatrix} x_{11} & \cdots & x_{1j} & \cdots & x_{1D} \\ \vdots & & \vdots & & \vdots \\ x_{n1} & \cdots & x_{nj} & \cdots & x_{nD} \\ \vdots & & \vdots & & \vdots \\ x_{N1} & \cdots & x_{Nj} & \cdots & x_{ND} \end{pmatrix}$$

where $x_{nj}$ denotes the value of object n for variable j.

SLIDE 7

Distance Measures: numeric variables

[Figure: two objects with coordinates $(x_{11}, x_{12})$ and $(x_{21}, x_{22})$ plotted against Variable 1 and Variable 2. Dashed line: Euclidean distance; solid line: Manhattan distance.]

SLIDE 8

Distance Measures: numeric variables

Manhattan distance between $\mathbf{x}_i$ and $\mathbf{x}_j$:

$$\sum_{d=1}^{D} |x_{id} - x_{jd}|$$

Squared Euclidean distance between $\mathbf{x}_i$ and $\mathbf{x}_j$:

$$\sum_{d=1}^{D} (x_{id} - x_{jd})^2 = \|\mathbf{x}_i - \mathbf{x}_j\|^2$$
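These distances need not be coded by hand; a minimal sketch using R's built-in dist function (the two example points are made up for illustration):

# two example points as rows of a matrix
x <- rbind(c(1, 2), c(4, 6))
dist(x, method = "manhattan")    # |1-4| + |2-6| = 7
dist(x, method = "euclidean")    # sqrt(3^2 + 4^2) = 5
dist(x, method = "euclidean")^2  # squared Euclidean distance: 25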

SLIDE 9

Standardization

Units of measurement should not be important for the cluster structure, so variables are often standardized. For example, with

$$s_j = \sqrt{\frac{1}{N-1} \sum_{n=1}^{N} (x_{nj} - \bar{x}_j)^2}$$

the standardized measurement is

$$x^*_{nj} = \frac{x_{nj} - \bar{x}_j}{s_j}.$$

$x^*_j$ has mean zero and standard deviation 1.
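In R this standardization is exactly what the scale function does (it is used again on slide 19); a minimal sketch on the built-in faithful data:

# subtract each column mean, divide by the column standard deviation s_j
faith.sc <- scale(faithful)
colMeans(faith.sc)         # approximately 0 for every variable
apply(faith.sc, 2, sd)     # exactly 1 for every variable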

SLIDE 10

Partitioning methods

Search directly for a division of the N objects into K groups that maximizes the quality of the clustering. The number of distinct partitions P(N, K) of N objects into K non-empty groups is O(K^N). For example: P(100, 5) ≈ 10^68. Exhaustive search is not feasible.
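P(N, K) is the Stirling number of the second kind; a small sketch (not from the slides) that evaluates it by the standard inclusion-exclusion formula and checks the order of magnitude quoted above:

# S(N, K) = (1/K!) * sum_j (-1)^j * choose(K, j) * (K-j)^N
stirling2 <- function(N, K) {
  j <- 0:K
  sum((-1)^j * choose(K, j) * (K - j)^N) / factorial(K)
}
stirling2(10, 3)   # 9330 partitions of 10 objects into 3 groups
# For P(100, 5) the terms overflow double precision, but the j = 0 term
# dominates, so on the log10 scale:
100 * log10(5) - log10(factorial(5))   # about 67.8, i.e. P(100, 5) ~ 10^68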

SLIDE 11

K-means Clustering

There are many possibilities to measure the quality of a partition. In the case of numeric data one can use, for example,

$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \|\mathbf{x}_n - \mu_k\|^2 \qquad (9.1)$$

the sum of the squared Euclidean distances of each data point to the center of the cluster to which it has been assigned. Here $r_{nk} = 1$ if $\mathbf{x}_n$ has been assigned to cluster k, and $r_{nk} = 0$ otherwise (1-of-K coding).
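Given an assignment and the cluster centers, J is a one-liner in R; a sketch with hypothetical argument names:

# X: N x D data matrix, assign: cluster index per row, mu: K x D matrix of centers
J <- function(X, assign, mu) sum((X - mu[assign, ])^2)
# e.g. (with the objects created on slide 19):
#   J(faith.sc, faithful.k2$cluster, faithful.k2$centers)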

SLIDE 12

Minimize J with respect to rnk (E-step)

Optimize for each point n separately by choosing $r_{nk}$ to be 1 for the value of k that gives the minimum distance $\|\mathbf{x}_n - \mu_k\|^2$. More formally,

$$r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \|\mathbf{x}_n - \mu_j\|^2 \\ 0 & \text{otherwise.} \end{cases} \qquad (9.2)$$

SLIDE 13

Minimize J with respect to µj (M-step)

Take the derivative of J with respect to $\mu_j$ and equate it to zero:

$$-2 \sum_{n=1}^{N} r_{nj} (\mathbf{x}_n - \mu_j) = 0 \qquad (9.3)$$

which gives

$$\mu_j = \frac{\sum_n r_{nj} \mathbf{x}_n}{\sum_n r_{nj}} \qquad (9.4)$$

i.e. the mean of the points that are assigned to cluster j.

SLIDE 14

K-means algorithm

1. Partition the observations into K initial clusters.
2. Calculate the mean of each cluster (M-step).
3. Assign each observation to the cluster whose mean is nearest (E-step).
4. If reassignments have taken place, return to step 2; otherwise stop.
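A from-scratch sketch of this loop in a few lines of R; kmeans.sketch is a hypothetical helper, not the kmeans function used later on the slides:

kmeans.sketch <- function(X, K) {
  X <- as.matrix(X)
  assign <- sample(rep(1:K, length.out = nrow(X)))  # step 1: initial partition
  repeat {
    # M-step: mean of each cluster
    # (a real implementation would guard against clusters becoming empty)
    mu <- t(sapply(1:K, function(k) colMeans(X[assign == k, , drop = FALSE])))
    # E-step: squared Euclidean distance of every point to every mean
    d2 <- sapply(1:K, function(k) rowSums(sweep(X, 2, mu[k, ])^2))
    new.assign <- max.col(-d2)            # index of the nearest mean
    if (all(new.assign == assign)) break  # no reassignments: stop
    assign <- new.assign
  }
  list(cluster = assign, centers = mu)
}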

SLIDE 15

Old Faithful data set

SLIDE 16

Old Faithful data set

[Figure: panels (a)-(f) showing successive E and M steps of K-means on the scaled Old Faithful data, both axes running from −2 to 2]

SLIDE 17

Old Faithful data set

[Figure: panels (g)-(i) showing the final K-means iterations on the scaled Old Faithful data]

SLIDE 18

Convergence of algorithm

[Figure: the cost function J (roughly 1000 down to 500) plotted after each E step and M step of the first four iterations]

SLIDE 19

How to do this in R

# load library/package MASS
> library(MASS)
# scale data
> faith.sc <- scale(faithful)
# K-means with K=2 applied to faithful data
> faithful.k2 <- kmeans(faith.sc, 2)
# plot resulting clusters
> plot(faith.sc[,1], faith.sc[,2], xlim=c(-2,2), type="n")
> points(faith.sc[,1], faith.sc[,2], col=faithful.k2$cluster*2, pch=19)

SLIDE 20

Final clustering obtained

[Figure: the scaled Old Faithful data (eruptions against waiting, both axes from −2 to 2) coloured by the two K-means clusters]

SLIDE 21

How many clusters?

The required number of groups is usually not known in advance, so it must be determined from the data. An informal method: plot the quality criterion against the number of groups and look for large jumps to determine the appropriate number of groups (see the sketch below).
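A sketch of this informal procedure with K-means on the Ruspini data (shown on the next slide):

library(cluster)   # contains the ruspini data
# within-cluster sum of squares for K = 1, ..., 6
wss <- sapply(1:6, function(k) kmeans(ruspini, k, nstart = 10)$tot.withinss)
plot(1:6, wss, type = "b",
     xlab = "number of groups", ylab = "within sum of squares")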

SLIDE 22

Example: Ruspini data

[Figure: scatter plot of the Ruspini data, Variable 1 (20 to 120) against Variable 2 (50 to 150)]

SLIDE 23

Determining the number of groups

[Figure: within sum of squares (20,000 to 80,000) plotted against the number of groups (2 to 6)]

SLIDE 24

K-medoids

- Can be used with dissimilarity measures other than Euclidean distance.
- Uses a number of representative objects (called medoids) instead of means.
- Advantage: less sensitive to outliers than K-means (cf. the mean and median of a sample).

SLIDE 25

K-medoids: cluster quality

Each object is assigned to the cluster corresponding to the nearest medoid. The K representative objects should minimize the sum of the dissimilarities of all objects to their nearest medoid, i.e.

$$\tilde{J} = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \mathcal{V}(\mathbf{x}_n, \mu_k) \qquad (9.6)$$

SLIDE 26

K-medoids algorithm

1. Choose K initial medoids.
2. Assign each object to the nearest medoid.
3. From each cluster, choose the object that minimizes the summed distances to the other objects in the cluster as the new medoid (see the sketch below).
4. Repeat steps 2 and 3 until convergence.
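Step 3 is just an argmin over within-cluster dissimilarity sums; a sketch (update.medoids is a hypothetical helper, assuming a precomputed dissimilarity matrix d and current cluster labels assign):

update.medoids <- function(d, assign, K) {
  sapply(1:K, function(k) {
    members <- which(assign == k)
    # summed dissimilarity of each member to the other cluster members
    within <- rowSums(d[members, members, drop = FALSE])
    members[which.min(within)]   # row index of the new medoid
  })
}
# usage, e.g. with Manhattan dissimilarities on the Ruspini data:
#   d <- as.matrix(dist(ruspini, method = "manhattan"))
#   update.medoids(d, assign = kmeans(ruspini, 4)$cluster, K = 4)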

SLIDE 27

Example: Ruspini data

[Figure: scatter plot of the Ruspini data, Variable 1 (20 to 120) against Variable 2 (50 to 150)]

SLIDE 28

Clusters found by K-medoids (K=4)

[Figure: the Ruspini data with each point labelled 1-4 by its K-medoids cluster]

SLIDE 29

K-medoids in R

# load library/package cluster
> library(cluster)
# K-medoids with K=4 applied to ruspini data
> rusp.pam2 <- pam(ruspini, k=4)
# plot resulting clusters
> plot(ruspini[,1], ruspini[,2], type="n")
> text(ruspini[,1], ruspini[,2], rusp.pam2$clust)

SLIDE 30

Model-Based Clustering

The data is viewed as a sample from a population that consists of a number of subpopulations (clusters or components). Each cluster can be described by a probability distribution. Use statistical inference to answer questions such as:

- What is the probability that an object belongs to a particular cluster?
- What is the most likely number of clusters present in the population?

SLIDE 31

Mixture distributions

A mixture distribution is a "weighted average" of a number of component distributions. In general, a mixture distribution can be written as

$$p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, p_k(\mathbf{x} \mid \theta_k)$$

where K is the number of components, the $\pi_k$ are the mixing proportions, and the $\theta_k$ are the component parameter vectors.
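For a univariate mixture of two normal components this is just two dnorm calls; a minimal sketch using the parameter values of the worked example later in these slides (μ = 3, 7 and σ² = 1):

# p(x) = pi_1 * N(x | mu_1, sd_1) + pi_2 * N(x | mu_2, sd_2)
mix.density <- function(x, pi, mu, sd)
  pi[1] * dnorm(x, mu[1], sd[1]) + pi[2] * dnorm(x, mu[2], sd[2])

curve(mix.density(x, pi = c(0.5, 0.5), mu = c(3, 7), sd = c(1, 1)),
      from = -1, to = 11, ylab = "p(x)")   # a bimodal mixture density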

SLIDE 32

Data Matrix and Likelihood Function

The observed data is:

$$X = \begin{pmatrix} x_{11} & \cdots & x_{1j} & \cdots & x_{1D} \\ \vdots & & \vdots & & \vdots \\ x_{n1} & \cdots & x_{nj} & \cdots & x_{nD} \\ \vdots & & \vdots & & \vdots \\ x_{N1} & \cdots & x_{Nj} & \cdots & x_{ND} \end{pmatrix}$$

where $x_{nj}$ denotes the value of object n for variable j. Let $\mathbf{x}_n$ denote $(x_{n1}, \ldots, x_{nD})$, where the $\mathbf{x}_n$ are assumed to be drawn independently from a mixture.

SLIDE 33

Maximum likelihood for mixtures

Given a training sample $(\mathbf{x}_1, \ldots, \mathbf{x}_N)$, the values of θ and π are chosen so as to maximize the likelihood

$$p(X \mid \theta, \pi) = \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k \, p_k(\mathbf{x}_n \mid \theta_k).$$

In other words: we choose the parameter values for which the probability of the observed data is maximal (larger than for any other value of the parameters).

SLIDE 34

Probability distributions for clusters

Assumption: each cluster is a member of the same parametric family of probability distributions. Frequently used distributions are:

- Numeric data: multivariate normal distribution (section 9.2).
- Binary data: multivariate Bernoulli distribution (section 9.3.3).

SLIDE 35

Numeric data: normal clusters

For numeric data we usually choose normal (Gaussian) clusters. The likelihood function is then given by

$$p(X \mid \pi, \mu, \Sigma) = \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}_n \mid \mu_k, \Sigma_k) \qquad (9.14)$$

where $\mathcal{N}$ is the (multivariate) normal density function, $\mu_k$ the mean vector of the k-th cluster, and $\Sigma_k$ the covariance matrix of the k-th cluster.

SLIDE 36

ML estimation of mixture models

- Complicated likelihood: no closed-form solution for the ML estimates.
- Apply an iterative scheme: the EM algorithm.
- It converges to a local maximum of the likelihood function.

SLIDE 37

ML estimation of univariate normal mixture

Suppose cluster membership were known:

n  x  r1  r2
1  4   1   0
2  8   0   1
3  2   1   0
4  6   0   1

The ML estimates are then simply given by

$$\mu_k = \frac{\sum_{n=1}^{N} r_{nk} x_n}{\sum_{n=1}^{N} r_{nk}}, \qquad \sigma^2_k = \frac{\sum_{n=1}^{N} r_{nk} (x_n - \mu_k)^2}{\sum_{n=1}^{N} r_{nk}}, \qquad \pi_k = \frac{\sum_{n=1}^{N} r_{nk}}{N}$$

For the data in the table: $\mu_1 = 3$, $\mu_2 = 7$, $\sigma^2_1 = 1$, $\sigma^2_2 = 1$, $\pi_1 = 0.5$, $\pi_2 = 0.5$.

SLIDE 38

ML estimation: first guess

But we don't know cluster membership, that's the whole point! Make a first guess at cluster membership:

n  x  r1  r2
1  4   0   1
2  8   0   1
3  2   1   0
4  6   1   0

$$\mu_k = \frac{\sum_{n=1}^{N} r_{nk} x_n}{\sum_{n=1}^{N} r_{nk}}, \qquad \sigma^2_k = \frac{\sum_{n=1}^{N} r_{nk} (x_n - \mu_k)^2}{\sum_{n=1}^{N} r_{nk}}, \qquad \pi_k = \frac{\sum_{n=1}^{N} r_{nk}}{N}$$

ML estimates based on the first guess: $\mu_1^{(1)} = 4$, $\mu_2^{(1)} = 6$, $\sigma_1^{2(1)} = 4$, $\sigma_2^{2(1)} = 4$, $\pi_1^{(1)} = 0.5$, $\pi_2^{(1)} = 0.5$.

SLIDE 39

Computing probabilities of cluster membership

Use the current parameter estimates and Bayes' rule to compute for each observation the probability that it belongs to cluster 1 and to cluster 2:

$$p(r_k = 1 \mid x) = \frac{p(x \mid r_k = 1)\, p(r_k = 1)}{\sum_{j=1}^{K} p(x \mid r_j = 1)\, p(r_j = 1)} = \frac{\mathcal{N}(x \mid \mu_k, \sigma^2_k)\, \pi_k}{\sum_{j=1}^{K} \mathcal{N}(x \mid \mu_j, \sigma^2_j)\, \pi_j} \qquad (9.13)$$

Fill in these probabilities for r1 and r2 in the table. They act like weights that indicate to what extent each observation belongs to cluster 1 and cluster 2, respectively.

SLIDE 40

ML estimation: second guess

Fill in the computed probabilities (the R function dnorm was used to compute the normal density values):

n  x   r1    r2
1  4  0.62  0.38
2  8  0.18  0.82
3  2  0.82  0.18
4  6  0.38  0.62

$$\mu_k = \frac{\sum_{n=1}^{N} r_{nk} x_n}{\sum_{n=1}^{N} r_{nk}}, \qquad \sigma^2_k = \frac{\sum_{n=1}^{N} r_{nk} (x_n - \mu_k)^2}{\sum_{n=1}^{N} r_{nk}}, \qquad \pi_k = \frac{\sum_{n=1}^{N} r_{nk}}{N}$$

ML estimates, second guess: $\mu_1^{(2)} = 3.92$, $\mu_2^{(2)} = 6.08$, $\sigma_1^{2(2)} = 1.82$, $\sigma_2^{2(2)} = 1.82$, $\pi_1^{(2)} = 0.5$, $\pi_2^{(2)} = 0.5$.
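The whole iteration for this four-point example fits in a few lines of R; a minimal sketch (not from the slides) whose first E-step reproduces the posterior probabilities in the table above:

x  <- c(4, 8, 2, 6)
mu <- c(4, 6); s2 <- c(4, 4); pi <- c(0.5, 0.5)   # first guess
for (t in 1:20) {
  # E-step: posterior probabilities of cluster membership (Bayes' rule)
  w <- cbind(pi[1] * dnorm(x, mu[1], sqrt(s2[1])),
             pi[2] * dnorm(x, mu[2], sqrt(s2[2])))
  r <- w / rowSums(w)
  # M-step: weighted ML estimates
  mu <- colSums(r * x) / colSums(r)
  s2 <- colSums(r * outer(x, mu, "-")^2) / colSums(r)
  pi <- colMeans(r)
}
round(rbind(mu, s2, pi), 2)   # mu settles near (3, 7), cf. the iterate plots below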

SLIDE 41

Convergence of EM to local maximum

Continue iteration until likelihood hardly increases, or parameter estimates hardly change. Then we have found a local maximum of the likelihood function.

SLIDE 42

Sequence of iterates μ1(t)

[Figure: μ1(t) plotted against iteration t (1 to 20), moving from the initial guess of 4 toward roughly 3]

SLIDE 43

Sequence of iterates μ2(t)

[Figure: μ2(t) plotted against iteration t (1 to 20), moving from the initial guess of 6 toward roughly 7]

SLIDE 44

K-means algorithm

1. Partition the observations into K initial clusters.
2. (M-step) Calculate the mean of each cluster.
3. (E-step) Assign each observation to the cluster whose mean is nearest.
4. If reassignments have taken place, return to step 2; otherwise stop.

SLIDE 45

EM-algorithm for normal mixtures

1. Partition the observations into K initial clusters.
2. (M-step) Calculate the mean, covariance matrix and mixing proportion of each cluster.
3. (E-step) Calculate the posterior probability of cluster membership for each observation.
4. (Check convergence) If the parameters/posterior probabilities have changed, return to step 2; otherwise stop.

SLIDE 46

Mixture of two normal components (perspective)

[Figure: perspective plot of the density of a mixture of two bivariate normal components]

SLIDE 47

Mixture of two normal components (contour)

[Figure: contour plot of the same two-component normal mixture density]

SLIDE 48

Examples of contours

[Figure: four contour plots of bivariate normal densities with different covariance structures, including correlations 0 and ±0.8]

SLIDE 49

Finding clusters of different shapes

We can find clusters of different shapes/orientations by constraining the covariance matrices of the clusters in different ways. Examples of restrictions:

- None (as in Quadratic Discriminant Analysis).
- Equal for all clusters (as in Linear Discriminant Analysis).
- Diagonal and equal for all clusters.
- Diagonal but not equal for all clusters.

SLIDE 50

Covariance structure

[Figure: contours illustrating the four covariance structures: equal, arbitrary, diagonal equal, and diagonal unequal]

SLIDE 51

Analysis of Ruspini data

[Figure: scatter plot of the Ruspini data]

SLIDE 52

Analysis of Ruspini data

> library(mclust)
> ruspiniBIC.EII <- EMclust(ruspini, G=2:9, emModelNames="EII")
> plot(ruspiniBIC.EII)
> ruspiniBIC.Sum <- summary(ruspiniBIC.EII, ruspini)
> ruspiniBIC.Sum
best BIC values:
      EII,5     EII,4
  -1344.212 -1351.412
best model: spherical, equal volume
> plot(ruspini[,1], ruspini[,2], type="n")
> text(ruspini[,1], ruspini[,2], ruspiniBIC.Sum$class)

SLIDE 53

What does EII mean?

In mclust, the model type is specified by properties of the cluster covariance matrices. EII means the cluster volumes are equal (E), the shape is the identity (I), and the orientation is the identity (I). This corresponds to the restriction:

$$\Sigma_k = \Sigma = \sigma^2 I$$

SLIDE 54

How many clusters?

[Figure: BIC (−1500 to −1350) plotted against the number of clusters (2 to 9) for the EII model]

SLIDE 55

BIC Score

Bayesian Information Criterion:

$$\mathrm{BIC} = 2 \ln p(X \mid \theta_{ML}) - \ln(N) \times \#\mathrm{par}$$

> ruspiniBIC.Sum$loglik
[1] -639.7248
> ruspiniBIC.Sum$loglik*2 - log(75)*15
[1] -1344.212

Why 15 parameters? 5 × 2 means, 1 variance, 4 mixing proportions.

SLIDE 56

Final Clustering

[Figure: the Ruspini data with each point labelled 1-5 according to the five-cluster EII solution]

SLIDE 57

Find the two clusters

[Figure: scatter plot of the simulated data, x[, 1] (−5 to 15) against x[, 2] (−1 to 5)]

SLIDE 58

How we generated the data

> library(mvtnorm)
> sigma <- matrix(c(30,3,3,0.5),2)
> sigma
     [,1] [,2]
[1,]   30  3.0
[2,]    3  0.5
> mu1 <- c(1.5,0)
> mu2 <- c(2,4)
> x1 <- rmvnorm(10, mean=mu1, sigma=sigma)
> x2 <- rmvnorm(10, mean=mu2, sigma=sigma)
> x <- rbind(x1,x2)

SLIDE 59

Fitting different models

> testBIC <- EMclust(x, emModelNames=c("EII","EEI","EEE","EEV"))
> plot(testBIC)
EII EEI EEE EEV
"1" "2" "3" "4"
> testBIC.Sum <- summary(testBIC, x)
> testBIC.Sum$class
 [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2

SLIDE 60

BIC Scores

[Figure: BIC plotted against the number of clusters (1 to 9) for the models EII, EEI, EEE and EEV, curves labelled 1 to 4]

SLIDE 61

Best clustering found

[Figure: the simulated data labelled by the best clustering found; the two groups of ten points match the generating components]

SLIDE 62

EII-model

> testBIC.EII <- EMclust(x, emModelNames=c("EII"))
> testBIC.EII.Sum <- summary(testBIC.EII, x)
> testBIC.EII.Sum
classification table:
 1  2  3
15  4  1
best BIC values:
     EII,3     EII,1
 -227.4807 -231.5405
best model: spherical, equal volume
> testBIC.EII.Sum$class
 [1] 1 1 1 1 1 2 2 1 2 1 1 1 3 1 1 1 2 1 1 1

SLIDE 63

Clusters found by EII model

[Figure: the simulated data labelled by the three clusters found by the EII model]

SLIDE 64

Clusters found by EII model with K=2

[Figure: the simulated data labelled by the two clusters found by the EII model with K = 2]

SLIDE 65

Clusters found by K-means with K=2

[Figure: the simulated data labelled by the two clusters found by K-means with K = 2]

SLIDE 66

K-means and model-based clustering

Discriminant function for class k:

$$d_k(\mathbf{x}) = \ln|\Sigma_k| - 2 \ln p(C_k) + (\mathbf{x} - \mu_k)^\top \Sigma_k^{-1} (\mathbf{x} - \mu_k)$$

Assume (the EII assumption) $\Sigma_k = \Sigma = \sigma^2 I$. Then

$$\Sigma^{-1} = (\sigma^2 I)^{-1} = \frac{1}{\sigma^2} I$$

so

$$(\mathbf{x} - \mu_k)^\top \Sigma_k^{-1} (\mathbf{x} - \mu_k) = \frac{1}{\sigma^2} (\mathbf{x} - \mu_k)^\top (\mathbf{x} - \mu_k)$$

where

$$(\mathbf{x} - \mu_k)^\top (\mathbf{x} - \mu_k) = \sum_{d=1}^{D} (x_d - \mu_{k,d})^2$$

is the squared Euclidean distance between $\mathbf{x}$ and $\mu_k$. With equal mixing proportions, minimizing $d_k(\mathbf{x})$ then amounts to assigning $\mathbf{x}$ to the cluster with the nearest mean, just as in K-means.
