
SLIDE 1

Clustering: Models and Algorithms

Shikui Tu 2019-02-28

SLIDE 2

Outline

  • Clustering

– K-means clustering, hierarchical clustering

  • Adaptive learning (online learning)

– CL, FSCL, RPCL

  • Gaussian Mixture Models (GMM)
  • Expectation-Maximization (EM) for maximum

likelihood

SLIDE 3

What is clustering?

  • Science, 8 April 2016, Vol. 352, Issue 6282
  • Six malignant tumors (melanoma)

SLIDE 4

How to represent a cluster?

SLIDE 5

How to define error?

Square distance from a center µ to a point xt:  ||µ – xt||²

Total error over three points:

||µ – x1||² + ||µ – x2||² + ||µ – x3||²
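As a quick numerical check (a hypothetical 1-D example; the values are illustrative), the total square error is smallest when the center µ is the mean of the points:

```python
import numpy as np

# Three 1-D points (made-up values).
x = np.array([1.0, 2.0, 6.0])

def total_square_distance(mu, x):
    """Sum of squared distances ||mu - x_t||^2 over all points."""
    return float(np.sum((mu - x) ** 2))

# The mean of the points gives a smaller error than other candidate centers.
mu_mean = x.mean()                          # 3.0
print(total_square_distance(mu_mean, x))    # 14.0
print(total_square_distance(0.0, x))        # 41.0
```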

SLIDE 6

Matrix derivatives


http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/3274/pdf/imm3274.pdf

SLIDE 7

Clustering the data

We have the following data, and we want to cluster it into two clusters (red and blue).

How?

SLIDE 8

Minimize the sum of square distances J

minimize  J = Σn Σk rnk ||xn – µk||²

rnk = 1 if and only if data point xn is assigned to cluster k; otherwise rnk = 0.

k = 1, 2; K = 2 clusters. n = 1, …, N; N is the total number of points.

We need to calculate { rnk } and { µk }.
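The objective J can be sketched in a few lines of NumPy (the data, centers, and assignments below are made-up values for illustration):

```python
import numpy as np

# Hypothetical 2-D data and two candidate centers.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
mu = np.array([[0.0, 0.5], [5.0, 5.5]])          # mu_1, mu_2

# One-hot assignments r[n, k]: each row has a single 1.
r = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])

def objective_J(X, mu, r):
    """J = sum_n sum_k r_nk * ||x_n - mu_k||^2."""
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # shape (N, K)
    return float((r * sq).sum())

print(objective_J(X, mu, r))  # 1.0
```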

SLIDE 9

If we know rn1 , rn2 for all n=1,…,N

Since the points have been assigned to cluster 1 or cluster 2, we calculate

µ1 = mean of the points in cluster 1
µ2 = mean of the points in cluster 2

Or formally:  µk = ( Σn rnk xn ) / ( Σn rnk )

We call it the M Step.
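A minimal sketch of the M Step, assuming one-hot assignments stored as a matrix r (the data is illustrative):

```python
import numpy as np

def m_step(X, r):
    """Given one-hot assignments r (N x K), return each cluster mean:
    mu_k = sum_n r_nk x_n / sum_n r_nk."""
    counts = r.sum(axis=0)              # number of points in each cluster
    return (r.T @ X) / counts[:, None]

# Illustrative data: two obvious groups.
X = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [4.0, 6.0]])
r = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
print(m_step(X, r))   # cluster means (0, 1) and (4, 5)
```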

SLIDE 10

If we know µ1, µ2

We should assign point xn to cluster 1, because

||xn – µ1||² < ||xn – µ2||²

Then rn1 = 1, rn2 = 0.

Or formally:  rnk = 1 if k = arg minj ||xn – µj||², and rnk = 0 otherwise.

We call it the E Step.
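A corresponding sketch of the E Step (nearest-center assignment; the example values are made up):

```python
import numpy as np

def e_step(X, mu):
    """Assign each point to its nearest center:
    r_nk = 1 if k = argmin_j ||x_n - mu_j||^2, else 0."""
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # shape (N, K)
    nearest = sq.argmin(axis=1)
    r = np.zeros_like(sq, dtype=int)
    r[np.arange(len(X)), nearest] = 1
    return r

X = np.array([[0.0, 0.0], [0.1, 0.2], [4.0, 4.0]])
mu = np.array([[0.0, 0.0], [4.0, 4.0]])
print(e_step(X, mu))   # first two points go to cluster 0, the last to cluster 1
```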

SLIDE 11

Initialization

(Figure: initial positions of the centers µ1 and µ2 among the data.)

SLIDE 12

Given µ1, µ2 , calculate rn1 , rn2 for all n=1,…,N

E Step

Assign the points to the nearest cluster:

(Figure: points are split by the equal-distance line between the two centers.)

SLIDE 13

Given rn1 , rn2 , calculate µ1, µ2

M Step

Calculate the means of the points in each cluster:

SLIDE 14

Given µ1, µ2 , calculate rn1 , rn2 for all n=1,…,N

E Step

Assign the points to the nearest cluster:

SLIDE 15

Given rn1 , rn2 , calculate µ1, µ2

M Step

Calculate the means of the points in each cluster:

SLIDE 16

Initialization → E-Step → M-Step → E-Step → M-Step → … → Convergence

If J does not change, or { µ1, µ2 } do not change, then the algorithm converges.
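The full alternation can be sketched as a loop (a simplified version that assumes no cluster ever becomes empty; the data and the initialization below are illustrative):

```python
import numpy as np

def kmeans(X, mu, max_iter=100):
    """Alternate E and M steps until the centers stop changing.
    mu is the initial guess (K x D); the result is a local optimum of J.
    Assumes no cluster becomes empty during the iterations."""
    labels = None
    for _ in range(max_iter):
        # E step: assign each point to its nearest center.
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        # M step: recompute each center as the mean of its points.
        new_mu = np.array([X[labels == k].mean(axis=0) for k in range(len(mu))])
        if np.allclose(new_mu, mu):   # convergence: centers unchanged
            break
        mu = new_mu
    return mu, labels

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
mu, labels = kmeans(X, mu=X[[0, 2]].copy())
print(mu)   # converges to centers (0, 0.5) and (5, 5.5)
```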

SLIDE 17

K

  • Centers µ1, …, µK

SLIDE 18

Basic ingredients

  • Model or structure
  • Objective function
  • Algorithm
  • Convergence
SLIDE 19

Questions for the K-means algorithm

  • Does it find the global optimum of J?

– No, the nearest local optimum, depending on initialization

  • If Euclidean distance is not good for some data, do we have other choices?
  • Can we assign each data point to the clusters probabilistically?
  • If K (the total number of clusters) is unknown, can we estimate it from the data?

SLIDE 20

Outline

  • Clustering

– K-means clustering, hierarchical clustering

  • Adaptive learning (online learning)

– CL, FSCL, RPCL

  • Gaussian Mixture Models (GMM)
  • Expectation-Maximization (EM) for maximum

likelihood

SLIDE 21

Hierarchical Clustering

  • k-means clustering requires

– k
– Positions of initial centers
– A distance measure between points (e.g. Euclidean distance)

  • Hierarchical clustering requires a measure of distance between groups of data points

Adapted from Blei, D. Hierarchical Clustering [PowerPoint slides]. www.cs.princeton.edu/courses/archive/spr08/cos424/slides/clustering-2.pdf

SLIDE 22

Hierarchical Clustering

  • Agglomerative clustering
  • A very simple procedure:

– Assign each data point into its own group
– Repeat: look for the two closest groups and merge them into one group
– Stop when all the data points are merged into a single cluster
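The procedure above can be sketched directly (a minimal toy for 1-D points, using single-linkage distance between groups; the function name and data are illustrative):

```python
# A minimal sketch of agglomerative clustering on 1-D points.
def agglomerative(points, target_groups=1):
    # Start with each point in its own group.
    groups = [[p] for p in points]
    while len(groups) > target_groups:
        # Find the two closest groups (single linkage: min pairwise distance).
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = min(abs(a - b) for a in groups[i] for b in groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        groups[i] = groups[i] + groups[j]   # merge the closest pair
        del groups[j]
    return groups

print(agglomerative([0.0, 0.2, 5.0, 5.1], target_groups=2))
# → [[0.0, 0.2], [5.0, 5.1]]
```

Stopping at `target_groups=1` reproduces the slide's "merge until a single cluster" behavior; the full merge history is what a dendrogram records.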

Adapted from Blei, D. Hierarchical Clustering [PowerPoint slides]. www.cs.princeton.edu/courses/archive/spr08/cos424/slides/clustering-2.pdf

SLIDE 23

Distance Measure

  • Distance between data points a and b: d(a, b)
  • Distance between groups A and B:

– Single-linkage:   d(A, B) = min over a∈A, b∈B of d(a, b)
– Complete-linkage: d(A, B) = max over a∈A, b∈B of d(a, b)
– Average-linkage:  d(A, B) = ( Σ over a∈A, b∈B of d(a, b) ) / ( |A| · |B| )
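The three linkage rules translate directly into code (a sketch; the example groups are made-up 1-D points):

```python
import numpy as np

def single_linkage(A, B):
    """d(A,B) = min over a in A, b in B of d(a,b)."""
    return min(np.linalg.norm(a - b) for a in A for b in B)

def complete_linkage(A, B):
    """d(A,B) = max over a in A, b in B of d(a,b)."""
    return max(np.linalg.norm(a - b) for a in A for b in B)

def average_linkage(A, B):
    """d(A,B) = sum of d(a,b) over all pairs, divided by |A| * |B|."""
    return sum(np.linalg.norm(a - b) for a in A for b in B) / (len(A) * len(B))

A = [np.array([0.0]), np.array([1.0])]
B = [np.array([3.0]), np.array([5.0])]
print(single_linkage(A, B), complete_linkage(A, B), average_linkage(A, B))
# 2.0 5.0 3.5
```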

SLIDE 24

Dendrogram

Jain, A. K., Murty, M. N., &amp; Flynn, P. J. (1999). "Data Clustering: A Review". ACM Computing Surveys (CSUR), 31(3), 264-323.

SLIDE 25

Outline

  • Clustering

– K-means clustering, hierarchical clustering

  • Adaptive learning (online learning)

– CL, FSCL, RPCL

  • Gaussian Mixture Models (GMM)
  • Expectation-Maximization (EM) for maximum

likelihood

SLIDE 26

From batch to adaptive

  • Given a batch of data points: x1, x2, …, xN
  • Data points come one by one: x1, x2, …, xN

SLIDE 27

Competitive learning

  • Data points come one by one: x1, x2, …, xN
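A basic competitive-learning (winner-take-all) step might look like this: when a point arrives, only the nearest center moves toward it (a standard sketch; the learning rate and data are illustrative):

```python
import numpy as np

def competitive_learning_step(mu, x, eta=0.1):
    """Winner-take-all: only the nearest center moves toward x.
    eta is the learning rate; mu is updated in place."""
    winner = np.argmin(((mu - x) ** 2).sum(axis=1))
    mu[winner] += eta * (x - mu[winner])
    return winner

mu = np.array([[0.0, 0.0], [5.0, 5.0]])
w = competitive_learning_step(mu, np.array([1.0, 0.0]))
print(w, mu[0])   # unit 0 wins and moves a step toward the input
```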

SLIDE 28

When starting with “bad initializations”

SLIDE 29

A four-cluster case

SLIDE 30

Frequency sensitive competitive learning (FSCL) [Ahalt et al., 1990]

The idea is to penalize the frequent winners:
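A sketch of one common form of the FSCL rule, in which the distance is weighted by each unit's winning frequency so that rarely-winning units eventually get a chance (the exact weighting on the slide may differ):

```python
import numpy as np

def fscl_step(mu, counts, x, eta=0.1):
    """FSCL sketch: weight each unit's squared distance by its winning
    frequency, so frequent winners are penalized in the competition.
    mu and counts are updated in place."""
    freq = counts / counts.sum()
    winner = np.argmin(freq * ((mu - x) ** 2).sum(axis=1))
    mu[winner] += eta * (x - mu[winner])
    counts[winner] += 1
    return winner

mu = np.array([[0.0, 0.0], [0.5, 0.0]])
counts = np.array([100.0, 1.0])   # unit 0 has won far more often
# Even though unit 0 is slightly nearer to x, the penalty lets unit 1 win.
w = fscl_step(mu, counts, np.array([0.2, 0.0]))
print(w)   # 1
```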

SLIDE 31

FSCL is not good when there are extra centers

When k is pre-assigned to 5, the frequency-sensitive mechanism also brings the extra center into the data, disturbing the correct locations of the others.

SLIDE 32

Rival penalized competitive learning (RPCL)

(Xu, Krzyzak, &amp; Oja, 1992, 1993) RPCL differs from FSCL in how it implements pj,t: the winner is updated toward the input, while the rival (the second winner) is pushed away,

where γ takes a value of approximately 0.05 to 0.1 to control the penalizing strength.
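A simplified RPCL step under these assumptions (it omits the slide's exact pj,t definition and just moves the winner toward x while de-learning the rival with the small rate γ):

```python
import numpy as np

def rpcl_step(mu, x, eta=0.1, gamma=0.05):
    """RPCL sketch: the winner moves toward x; the runner-up (the rival)
    is pushed away with the small de-learning rate gamma."""
    sq = ((mu - x) ** 2).sum(axis=1)
    winner, rival = np.argsort(sq)[:2]
    mu[winner] += eta * (x - mu[winner])    # learn
    mu[rival] -= gamma * (x - mu[rival])    # de-learn (penalize the rival)
    return winner, rival

mu = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
w, r = rpcl_step(mu, np.array([0.2, 0.0]))
print(w, r)      # unit 0 wins, unit 1 is the rival
print(mu[1])     # the rival is pushed slightly away from x
```

Repeated over many inputs, this de-learning is what drives extra centers out of the data, as the next slide illustrates.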

SLIDE 33


The rival-penalized mechanism drives extra agents far away.

SLIDE 34

Thank you!
