TOWARDS AN OPTIMAL SUBSPACE FOR K-MEANS. ADVISOR: JIA-LING KOH


SLIDE 1

TOWARDS AN OPTIMAL SUBSPACE FOR K-MEANS

ADVISOR: JIA-LING KOH
SPEAKER: YIN-HSIANG LIAO
2018/01/30, FROM KDD 2017.

SLIDE 2

Introduction

Iris dataset: which two attributes would you pick if you want to show the clusters?

Petal_length and Petal_width.

SLIDE 3

Introduction

Which two attributes would you pick if you want to show the clusters?

There is no obvious pair. PCA?

SLIDE 4

Introduction

PCA: an orthogonal linear transformation. The resulting components are sorted in descending order of variance.
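To make the two properties of PCA concrete, here is a minimal sketch using NumPy. The function name `pca` and the synthetic data are assumptions for illustration only; they do not come from the slides.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its leading principal components."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns ascending eigenvalues
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # re-sort into descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return X_centered @ eigvecs[:, :n_components], eigvals, eigvecs

# Synthetic data (hypothetical): four attributes with very different spreads
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([3.0, 1.0, 0.5, 0.1])
Z, eigvals, eigvecs = pca(X, n_components=2)
```

The eigenvector matrix is orthogonal, and the eigenvalues are the variances along each component, so keeping the first columns keeps the directions of largest variance.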

SLIDE 5

Introduction

  • Motivation:

A problem of k-means: the “curse of dimensionality.” (Results are hard to interpret.)

SLIDE 6

Introduction

  • Goal:

Optimal dimensionality reduction for k-means.

SLIDE 7

Method

SLIDE 8

Method

K-means, in terms of its objective function: minimize the sum of squared distances between each point and the centroid of its cluster.
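The formula on this slide did not survive extraction. As a sketch of the standard k-means objective (the sum of squared errors, SSE), the following computes it for a given assignment. The function name and the toy data are assumptions for illustration.

```python
import math

def kmeans_sse(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[c]) ** 2 for p, c in zip(points, labels))

# Toy example (hypothetical data): two clusters of two points each
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
centroids = [(0.0, 1.0), (10.0, 1.0)]
sse = kmeans_sse(points, labels, centroids)
```

K-means alternates between reassigning points and recomputing centroids so that this value does not increase.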

Source: Tan, pp. 499, 514

SLIDE 9

Method

In short,

SLIDE 10

Method

Objective function: the sum of a clustered-space term and a noise-space term.
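The equation itself is missing from the extracted slide, so the following is only a sketch of a two-term cost of this shape, not a transcription of the slide. It assumes an orthonormal rotation matrix `V` and a clustered-space dimensionality `m`; both names, and the convention that the first `m` rotated coordinates form the clustered space, are assumptions.

```python
import numpy as np

def subkmeans_cost(X, labels, V, m):
    """Clustered-space term plus noise-space term.

    X: (n, d) data, labels: (n,) cluster ids, V: (d, d) orthonormal matrix,
    m: number of rotated dimensions treated as the clustered space.
    """
    Y = X @ V                        # rotate the data
    Yc, Yn = Y[:, :m], Y[:, m:]      # clustered / noise coordinates
    clustered = 0.0
    for c in np.unique(labels):
        members = Yc[labels == c]
        # squared distances to the cluster mean, in the clustered space only
        clustered += ((members - members.mean(axis=0)) ** 2).sum()
    # squared distances to the global mean, in the noise space only
    noise = ((Yn - Yn.mean(axis=0)) ** 2).sum()
    return clustered + noise

X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
cost_full = subkmeans_cost(X, labels, np.eye(2), m=2)   # plain k-means SSE
cost_one = subkmeans_cost(X, labels, np.eye(2), m=1)    # second axis treated as noise
cost_none = subkmeans_cost(X, labels, np.eye(2), m=0)   # everything treated as noise
```

With the identity rotation and `m` equal to the full dimensionality, the cost reduces to the ordinary k-means SSE; with `m = 0` it is the total scatter around the global mean.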

SLIDE 11

Method

Intuition behind the noise term.

SLIDE 12

Method

Minimize the objective function. One option is gradient descent; the approach taken here transforms the problem into an eigendecomposition problem.
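As a generic illustration of why an eigendecomposition can replace iterative optimization, the sketch below uses a standard linear-algebra fact: for a symmetric matrix A, the orthonormal W with m columns that minimizes trace(WᵀAW) consists of the eigenvectors belonging to the m smallest eigenvalues. The matrix `A` here is random and the function name is an assumption; it is not the specific matrix used in the talk.

```python
import numpy as np

def min_trace_subspace(A, m):
    """Orthonormal (d, m) matrix W minimizing trace(W.T @ A @ W) for symmetric A."""
    eigvals, eigvecs = np.linalg.eigh(A)     # eigenvalues in ascending order
    return eigvecs[:, :m], eigvals[:m].sum()

rng = np.random.default_rng(1)
B = rng.normal(size=(5, 5))
A = (B + B.T) / 2                            # arbitrary symmetric matrix
W, best = min_trace_subspace(A, m=2)

# Compare against a random orthonormal basis of the same size
Q, _ = np.linalg.qr(rng.normal(size=(5, 2)))
random_value = np.trace(Q.T @ A @ Q)
```

No step size or iteration count has to be chosen, which is the practical advantage over gradient descent.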

SLIDE 13

Method

Let's do the math, starting with the clustered-space term.

SLIDE 14

Method

Let's do the math

SLIDE 15

Method

Let's do the math

SLIDE 16

Method

Let's do the math

SLIDE 17

Method

Let's do the math: cyclic permutation property of the trace.
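The cyclic permutation property states that tr(ABC) = tr(BCA) = tr(CAB) whenever the products are defined. A quick numerical check, with arbitrary random matrices chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 5))
C = rng.normal(size=(5, 3))

# The three cyclic orderings give the same trace
t_abc = np.trace(A @ B @ C)
t_bca = np.trace(B @ C @ A)
t_cab = np.trace(C @ A @ B)

# A squared norm can be written as a trace: ||x||^2 = tr(x x^T)
x = rng.normal(size=4)
sq_norm = x @ x
trace_outer = np.trace(np.outer(x, x))
```

Writing squared distances as traces and then rotating the factors is what lets a sum over points be collected into a single matrix.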

SLIDE 18

Method

Let's do the math

SLIDE 19

Method

Let's do the math

SLIDE 20

Method

Let's do the math

SLIDE 21

Method

Let's do the math

SLIDE 22

Method

SLIDE 23

Return the nearest centroid and add the point to that centroid's cluster.
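The assignment step described here can be sketched as follows. The function name and the toy points are assumptions, and the sketch uses plain Euclidean distance in whatever space the points are given in.

```python
import math

def assign_to_nearest(points, centroids):
    """Return each point's nearest-centroid index and the resulting clusters."""
    labels = [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]
    clusters = [[] for _ in centroids]
    for p, j in zip(points, labels):
        clusters[j].append(p)        # add the point to that centroid's cluster
    return labels, clusters

# Toy example (hypothetical data)
points = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 1.0)]
centroids = [(0.0, 0.0), (10.0, 0.0)]
labels, clusters = assign_to_nearest(points, centroids)
```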

SLIDE 24

Method

For all clusters

SLIDE 25

Method

Complexity:

Diagonalization

SLIDE 26

Experiment

Compared with k-means combined with PCA and ICA.
Compared with 4 algorithms that perform dimensionality reduction during clustering:
  • LDA-k-means
  • FOSSCLU
  • ORCLUS
  • 4C

Each was run 40 times.

SLIDE 27

SLIDE 28

Experiment

SLIDE 29

Experiment

SLIDE 30

Experiment

Problems of the competitors:
  • LDA-k-means: fixed to (k-1) dimensions; overfits in high dimensions.
  • FOSSCLU: slow.

SLIDE 31

Experiment

Limitations: the same as k-means. It is sensitive to outliers and to clusters that are non-globular or differ in size or density. Clusters need to have “centroids.”

SLIDE 32

Conclusion

SubKmeans:
  • Is an extension of k-means.
  • Determines dim(cluster space) automatically.
  • Has k as its only parameter.
  • Is easy to implement and fast.
