Clusterability in Model Selection, by Johannes Kiesel (PowerPoint presentation)



SLIDE 1

Clusterability in Model Selection

Johannes Kiesel

Bauhaus-Universität Weimar

28th May, 2014


SLIDE 2

Cluster Analysis: Motivation

Data Categorization

Art and Design Computer Science Media Studies

Given data (a set of comparable entities or objects), find a categorization of it.


SLIDE 3

Cluster Analysis: Motivation

Data Categorization: Given data (a set of comparable entities or objects), find a categorization of it (without labels).


SLIDE 4

Cluster Analysis: Motivation

Data Categorization: Given data (a set of comparable entities or objects), find a categorization of it (without labels).


SLIDE 5

Cluster Analysis: In the Beginning was the Data

Data


SLIDE 6

Cluster Analysis: Modeling

Data → Model. Features: Age, Fashion index, XKCD/week, Library (h/day), Sketches/day



SLIDE 8

Cluster Analysis: Clustering

Data Model Clustering

Clustering algorithm


SLIDE 9

Cluster Analysis: Clustering

Data Model Clustering

Clustering algorithm

Categorization




SLIDE 12

Cluster Analysis: Modeling II

Data → Model. Features: Age, Fashion index, XKCD/week, Library (h/day), Sketches/day


SLIDE 13

Cluster Analysis: Modeling II

Data → Model. Features: Age, Nose length (cm), Weight (kg), Height (cm), Student ID


SLIDE 14

Cluster Analysis: Modeling II

Data Model Clustering

Clustering algorithm

Categorization



SLIDE 16

Cluster Analysis: Cluster Evaluation

Clustering Model

Clustering Algorithm


SLIDE 17

Cluster Analysis: Cluster Evaluation

Clustering Model

Clustering algorithm → Evaluation index (separation, cohesiveness)


SLIDE 18

Cluster Analysis: Cluster Evaluation

Clustering Model

Clustering algorithm → Evaluation index (separation, cohesiveness)

Test (2.0) good


SLIDE 19

Cluster Analysis: Cluster Evaluation

Clustering Model

Clustering algorithm → Evaluation index (separation, cohesiveness)

Test (0.0) bad



SLIDE 22

Cluster Analysis: Model Evaluation

Model Test (0.0) bad

Clusterability index

Clustering


SLIDE 23

Cluster Analysis: Overview

Test (1.2) Test (4.2) Test (2.3) Test (1.3) Test (0.9) Test (1.4) Test (0.6) Test (0.8) Test (2.0) Test (1.0)

Clusterability index Clustering algorithm(s) Evaluation index


SLIDE 24

Clusterability

◮ Task: calculate a score for a model
◮ The score has to be comparable at least among similar models (same number of objects)
◮ A clusterable model (high score) has a dominant structure of mutually separated parts that are cohesive groups of objects.

Test (4.2)


SLIDE 25

Clusterability I: Salient Clustering

Idea: Model selection by cluster evaluation ("one-step")

◮ Cluster the model with different algorithms and/or parameter settings
◮ Evaluate all clusterings
◮ Choose the best combination of model & clustering

→ two-step
→ one-step
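The selection loop above can be sketched in a few lines. As stand-ins for whichever clustering algorithms and evaluation index one actually uses, this sketch takes k-means (SciPy's `kmeans2`) over several values of k and a toy separation/cohesion ratio; the data, the function names, and the index definition are illustrative, not from the talk.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def eval_index(points, labels):
    """Toy evaluation index: smallest centroid separation divided by the
    largest mean within-cluster distance (higher is better)."""
    clusters = np.unique(labels)
    centroids = np.array([points[labels == c].mean(axis=0) for c in clusters])
    cohesion = max(
        np.linalg.norm(points[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(clusters)
    )
    separation = min(
        np.linalg.norm(centroids[i] - centroids[j])
        for i in range(len(clusters)) for j in range(i + 1, len(clusters))
    )
    return separation / cohesion

rng = np.random.default_rng(0)
# Model with three well-separated groups.
points = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (0.0, 5.0, 10.0)])

# One-step selection: cluster with several parameter settings,
# evaluate every clustering, keep the best-scoring combination.
scores = {}
for k in (2, 3, 4, 5):
    _, labels = kmeans2(points, k, minit='++', seed=0)
    scores[k] = eval_index(points, labels)
best_k = max(scores, key=scores.get)
print(best_k)
```

On this toy model the three-group clustering scores clearly highest, so the loop recovers k = 3.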


SLIDE 26

Clusterability I: Dunn Index

Evaluation index: the Dunn index family, min(separation) / max(diameter)

Dunn MST: the minimum spanning tree Dunn index
◮ Diameter of a cluster: largest edge length in the minimum spanning tree of the cluster
◮ Separation: smallest dissimilarity of objects from different clusters
◮ The optimum clustering is feasibly computable (no other clustering algorithm is necessary)
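A minimal sketch of the Dunn MST index as defined above, assuming Euclidean dissimilarities and SciPy's `minimum_spanning_tree`; the function name and the toy data are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def dunn_mst(points, labels):
    """Dunn MST index: smallest inter-cluster dissimilarity divided by the
    largest MST edge within any cluster (the cluster 'diameter')."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)

    # Diameter: largest edge in the minimum spanning tree of each cluster.
    max_mst_edge = 0.0
    for c in clusters:
        members = points[labels == c]
        if len(members) < 2:
            continue
        mst = minimum_spanning_tree(squareform(pdist(members)))
        max_mst_edge = max(max_mst_edge, mst.toarray().max())

    # Separation: smallest dissimilarity of objects from different clusters.
    min_between = np.inf
    for i, a in enumerate(clusters):
        for b in clusters[i + 1:]:
            d = cdist(points[labels == a], points[labels == b]).min()
            min_between = min(min_between, d)

    return min_between / max_mst_edge

# Two well-separated groups score far higher than a mixed labeling.
pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
good = dunn_mst(pts, [0, 0, 0, 1, 1, 1])
bad = dunn_mst(pts, [0, 1, 0, 1, 0, 1])
print(good, bad)
```

Because both quantities come from pairwise dissimilarities only, no separate clustering algorithm is needed to evaluate a candidate partition.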


SLIDE 27

Clusterability I: Salient Clustering

+ Needs no additional clusterability index
+ Evaluation indices are better understood
− Most evaluation indices require local optimization
− Not all evaluation indices can compare clusterings of different models


SLIDE 28

Clusterability II: Statistical Tests on Structure

Idea: Use a statistical test for unstructured models

◮ Null hypothesis: the model was generated from a model distribution that generates non-clusterable models (e.g., the uniform distribution)
◮ Calculate a test statistic with a known distribution under the null hypothesis
◮ Use the probability that a similarly large value occurs under the null hypothesis for the clusterability assessment


SLIDE 29

Clusterability II: Hopkins and Skellam Statistic

Compare the distribution of the original objects x with r uniformly sampled points x0 (null hypothesis). (Figure: spaced, uniform, and clustered example models.)

[Hopkins and Skellam. A New Method for Determining the Type of Distribution of Plant Individuals. 1954]


SLIDE 30

Clusterability II: Hopkins and Skellam Statistic

Compare the distribution of the original objects x with r uniformly sampled points x0 (null hypothesis).

ψnn(x): dissimilarity of x to its nearest neighbor

Hr → 0 (spaced), Hr ≈ 0.5 (uniform), Hr → 1 (clustered)



SLIDE 31

Clusterability II: Hopkins and Skellam Statistic

Compare the distribution of the original objects x with r uniformly sampled points x0 (null hypothesis):

H_r = \frac{\sum_{i=1}^{r} (\psi_{nn}(x^0_i))^m}{\sum_{i=1}^{r} (\psi_{nn}(x^0_i))^m + \sum_{i=1}^{r} (\psi_{nn}(x_{\pi(i)}))^m}

ψnn(x): dissimilarity of x to its nearest neighbor
m: number of dimensions
π: a random permutation selecting r of the original objects

Hr → 0 (spaced), Hr ≈ 0.5 (uniform), Hr → 1 (clustered)
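A sketch of the statistic under these definitions. Sampling the null-hypothesis points uniformly from the bounding box of the model is an assumption, as are the function name, the default r, and the toy data.

```python
import numpy as np

def hopkins(points, r=20, rng=None):
    """Hopkins and Skellam statistic H_r: compare nearest-neighbor
    dissimilarities of r uniformly sampled points x0 (null hypothesis)
    with those of r randomly selected original objects x_pi(i)."""
    rng = np.random.default_rng(rng)
    points = np.asarray(points, dtype=float)
    n, m = points.shape  # m = number of dimensions

    # r points sampled uniformly from the bounding box of the model.
    uniform = rng.uniform(points.min(axis=0), points.max(axis=0), size=(r, m))
    # r original objects selected by a random permutation pi.
    chosen = rng.choice(n, size=r, replace=False)

    def nn_dist(x, exclude=None):
        d = np.linalg.norm(points - x, axis=1)
        if exclude is not None:
            d[exclude] = np.inf  # an object is not its own neighbor
        return d.min()

    u = sum(nn_dist(x) ** m for x in uniform)            # sum psi_nn(x0_i)^m
    w = sum(nn_dist(points[i], i) ** m for i in chosen)  # sum psi_nn(x_pi(i))^m
    return u / (u + w)

rng = np.random.default_rng(1)
clustered = np.vstack([rng.normal(c, 0.05, size=(100, 2)) for c in (0.0, 1.0)])
uniform_data = rng.uniform(0, 1, size=(200, 2))
h_c = hopkins(clustered, rng=1)
h_u = hopkins(uniform_data, rng=1)
print(h_c, h_u)
```

As the slide predicts, the two tight blobs push the statistic toward 1, while the uniform sample lands near 0.5.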



SLIDE 32

Clusterability II: Statistical Tests on Structure

+ The distribution under the null hypothesis allows for an interpretation of the score
+ Often requires only a sample
− Depends heavily on the null hypothesis
− Adjustment of statistics is not trivial

(Figure: probability density function of Hr for a uniform model distribution: the βr,r-distribution.)
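Since Hr follows the βr,r-distribution under the uniform null hypothesis, the score converts directly into a p-value; a short sketch with `scipy.stats.beta`, where the observed value 0.82 is hypothetical.

```python
from scipy.stats import beta

r = 20    # number of sampled/selected points in H_r
h = 0.82  # an observed Hopkins statistic (hypothetical value)

# Under the null hypothesis (uniform model), H_r ~ Beta(r, r).
null = beta(r, r)
p_value = null.sf(h)  # probability of a similarly large value under H0
print(p_value < 0.05)
```

This is exactly the interpretability advantage listed above: the score is not just a number, but the tail probability of a known distribution.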


SLIDE 33

Clusterability III: Concentration of Dissimilarities

Idea: In a clusterable model, most object pairs should be either very dissimilar (different clusters) or very similar (same cluster).

(Figure: similarity histogram over ϕ ∈ [0, 1]; low similarities reflect separation, high similarities cohesiveness.)

◮ Test if relatively few dissimilarities are of average size


SLIDE 34

Clusterability III: Dash et al. score

(Figure: similarity histograms over ϕ for spaced, uniform, and clustered models.)

[Dash et al. Dimensionality Reduction for Unsupervised Data. 1997]


SLIDE 35

Clusterability III: Dash et al. score

(Figure: similarity histograms over ϕ for spaced, uniform, and clustered models, with the weighting function.)

Weighting function: 1 − (ϕ · log2(ϕ) + (1 − ϕ) · log2(1 − ϕ))



SLIDE 36

Clusterability III: Dash et al. score

(Figure: weighted similarity histograms over ϕ for spaced, uniform, and clustered models.)

Weighting function: 1 − (ϕ · log2(ϕ) + (1 − ϕ) · log2(1 − ϕ))



SLIDE 37

Clusterability III: Dash et al. score

(Figure: clusterability scores from the weighted similarity histograms for spaced, uniform, and clustered models.)

Weighting function: 1 − (ϕ · log2(ϕ) + (1 − ϕ) · log2(1 − ϕ))
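A sketch in the spirit of this score. The weighting function is the one above, which peaks at ϕ = 0.5, so a model whose similarities concentrate near 0 or 1 accumulates less total weight, i.e. scores as more clusterable. Converting distances to similarities via ϕ = 1 − d/d_max and averaging the weighted values are my assumptions, not necessarily Dash et al.'s exact normalization.

```python
import numpy as np
from scipy.spatial.distance import pdist

def dash_score(points):
    """Weight each pairwise similarity phi by
    1 - (phi*log2(phi) + (1-phi)*log2(1-phi))
    and average; lower values indicate a more clusterable model."""
    d = pdist(np.asarray(points, dtype=float))
    # Similarities in (0, 1); clip to keep the logs finite.
    phi = np.clip(1.0 - d / d.max(), 1e-12, 1.0 - 1e-12)
    weight = 1.0 - (phi * np.log2(phi) + (1.0 - phi) * np.log2(1.0 - phi))
    return weight.mean()

rng = np.random.default_rng(0)
clustered = np.vstack([rng.normal(c, 0.05, size=(60, 2)) for c in (0.0, 1.0)])
uniform_data = rng.uniform(0, 1, size=(120, 2))
s_clustered = dash_score(clustered)
s_uniform = dash_score(uniform_data)
print(s_clustered, s_uniform)
```

The clustered model's bimodal similarities (near 0 across blobs, near 1 within blobs) yield a lower average weight than the uniform model's mid-range similarities.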



SLIDE 38

Clusterability III: Concentration of Dissimilarities

+ Very general idea
+ Related to the concept of intrinsic dimensionality
− Not clear when the used heuristic (the similarity histogram) applies
− Lacks the interpretability of statistical tests

(Figure: similarity histogram over ϕ with separation and cohesiveness regions.)



SLIDE 39

Clusterability: Overview

◮ A clusterable model has a dominant structure of mutually separated parts that are cohesive groups of objects.

Test (4.2)

◮ Clusterability is related to various other topics in data analysis:
◮ Evaluation indices (Dunn)
◮ Tests on model distributions (Hopkins and Skellam)
◮ Methods of unsupervised feature selection (Dash et al.)
◮ Estimators of intrinsic dimensionality
◮ ...?


SLIDE 40

Experiment: Synthetic Models

Can the clusterability indices identify clusterable models?

Experiment setup:
◮ 10 model distributions of varying intuitive clusterability, plus 1 model from the uniform distribution
◮ 1 000 models per distribution (results are means)
◮ 180 two-dimensional objects per model


SLIDE 41

Experiment: Synthetic Models

(Figure: example models for s = 0, 0.1, 0.2, 0.3.)


SLIDE 42

Experiment: Synthetic Models

(Figure: example models for s = 0, 0.1, 0.2, 0.3, with the plot symbol used for each.)


SLIDE 43

Experiment: Synthetic Models

(Figure: mean clusterability over s ∈ [0.1, 0.3] for Dunn MST [1], Hopkins and Skellam [2], and Dash et al.)

[1] Limited to clusterings with 13 or fewer clusters
[2] Mean of 1 000 applications per model


SLIDE 44

Experiment: Synthetic Models

(Figure: mean clusterability over s ∈ [0.1, 0.3] for Dunn MST [1], Hopkins and Skellam [2], Dash et al., Ostrovsky et al., and Levina and Bickel.)

[1] Limited to clusterings with 13 or fewer clusters
[2] Mean of 1 000 applications per model


SLIDE 45

Contributions

A clusterable model has a dominant structure of mutually separated parts that are cohesive groups of objects.

◮ Clusterability indices can be used for model selection
◮ The indices differ, among others, with respect to their preference for fine or coarse structure
◮ If models are (somewhat) meaningful for a dataset, the more clusterable models are assumed to also be the more meaningful ones
◮ Clusterability can incorporate ideas from various related topics (especially clustering evaluation)
◮ Formal properties of clustering evaluation indices can be converted to properties of clusterability indices


SLIDE 46

Future Work

◮ Further formalization of clusterability indices
◮ Application to large datasets
◮ Application to high-dimensional problems
◮ Relation to cluster stability
◮ Incorporation of additional knowledge (constraint clustering)


SLIDE 47

Thank you for your attention.
