Characterization of Linkage-Based Clustering



slide-1
SLIDE 1

Characterization of Linkage-Based Clustering

Margareta Ackerman Joint work with Shai Ben-David and David Loker

University of Waterloo COLT 2010

slide-2
SLIDE 2

There are a wide variety of clustering algorithms, which often produce very different clusterings.

How should a user decide which algorithm to use for a given application?

Motivation

  • M. Ackerman, S. Ben-David, and D. Loker
slide-3
SLIDE 3
  • Identify properties that separate the input-output behaviour of different clustering paradigms
  • The properties should
      1) Be intuitive and meaningful to clustering users
      2) Distinguish between different clustering algorithms

Our approach for clustering algorithm selection

slide-4
SLIDE 4
  • Kleinberg proposes abstract properties (“Axioms”) of clustering functions (NIPS, 2002)
  • Bosagh Zadeh and Ben-David provide a set of properties that characterize single linkage clustering (UAI, 2009)

Previous work

slide-5
SLIDE 5

Characterize linkage-based clustering algorithms, using a set of intuitive properties

Our contributions

slide-6
SLIDE 6
  • Define linkage-based clustering
  • Introduce new clustering properties
  • Main result
  • Sketch of proof
  • Conclusions

Outline

slide-7
SLIDE 7

For a finite domain set X, d is a dissimilarity function over the members of X.

A Clustering Function F maps
Input: (X,d) and k>0 to
Output: a k-partition (clustering) of X

We require clustering functions to be representation independent and scale invariant.

Formal setup

slide-8
SLIDE 8

Proceed in steps:

  • Start with the clustering of singletons
  • At each step, merge the closest pair of clusters
  • Repeat until only k clusters remain.
  • Ex. Single linkage, average linkage, complete linkage

Informally, a linkage function is an extension of the between-point distance that applies to subsets of the domain.

  • The choice of the linkage function distinguishes between different linkage-based algorithms.

Linkage-based algorithm: An informal definition
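The informal procedure above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names (`single`, `complete`, `average`, `linkage_clustering`), the toy points, and the arbitrary tie-breaking (first closest pair found) are all assumptions made for the example.

```python
from itertools import combinations

# Three standard linkage functions. Each extends the point-level
# dissimilarity d to a pair of clusters a, b (hypothetical helper names).
def single(a, b, d):
    return min(d(x, y) for x in a for y in b)

def complete(a, b, d):
    return max(d(x, y) for x in a for y in b)

def average(a, b, d):
    return sum(d(x, y) for x in a for y in b) / (len(a) * len(b))

def linkage_clustering(points, d, k, linkage):
    """Merge the closest pair of clusters until only k clusters remain."""
    clusters = [frozenset([p]) for p in points]  # start from singletons
    while len(clusters) > k:
        # Closest pair under the chosen linkage; ties go to the first pair found.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: linkage(pair[0], pair[1], d))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)

# Toy data: points on a line with absolute difference as dissimilarity.
points = [0, 1, 2, 10, 11, 30]
d = lambda x, y: abs(x - y)
result = linkage_clustering(points, d, 3, single)
print(sorted(sorted(c) for c in result))
```

Swapping `single` for `complete` or `average` changes only the linkage function, which is the sense in which the choice of linkage distinguishes the algorithms.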

slide-9
SLIDE 9
  • Define linkage-based clustering
  • Introduce new clustering properties
  • Main result
  • Sketch of proof
  • Conclusions

Outline

slide-10
SLIDE 10
  • A clustering C is a refinement of clustering C’ if every cluster in C’ is a union of some clusters in C.
  • A clustering function is hierarchical if for every (X, d) and every $1 \le k \le k' \le |X|$, F(X,d,k’) is a refinement of F(X,d,k).

Hierarchical clustering
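As a rough illustration of these two definitions, the sketch below checks refinement between two partitions and then tests the hierarchical condition for one concrete clustering function on one input. The names `is_refinement`, `is_hierarchical_on`, and `single_linkage` are hypothetical, and passing the check on a single instance is evidence, not a proof, that a function is hierarchical.

```python
from itertools import combinations

def is_refinement(fine, coarse):
    """True if every cluster of `coarse` is a union of clusters of `fine`.

    For two partitions of the same set this is equivalent to every
    cluster of `fine` being contained in some cluster of `coarse`.
    """
    return all(any(c <= big for big in coarse) for c in fine)

def single_linkage(points, d, k):
    """Single linkage, used here only as an example clustering function F."""
    clusters = [frozenset([p]) for p in points]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda pr: min(d(x, y) for x in pr[0] for y in pr[1]))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)

def is_hierarchical_on(F, points, d):
    """Check that F(X,d,k') refines F(X,d,k) for all 1 <= k <= k' <= |X|."""
    n = len(points)
    levels = {k: F(points, d, k) for k in range(1, n + 1)}
    return all(is_refinement(levels[k2], levels[k1])
               for k1 in range(1, n + 1) for k2 in range(k1, n + 1))

d = lambda x, y: abs(x - y)
print(is_hierarchical_on(single_linkage, [0, 1, 3, 7, 15], d))
```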

slide-11
SLIDE 11

F is local if for any X, d, k and any $C \subseteq F(X,d,k)$,

$$F\Big(\bigcup_{c \in C} c,\; d,\; |C|\Big) = C.$$

[Figure: F(X, d, 4) and F(X’, d/X’, 2)]

Locality
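A small sketch of how locality could be checked on one instance: cluster the full data, then re-cluster the union of every sub-collection of the resulting clusters and compare. The helper names (`single_linkage`, `is_local_on`) and the toy data are assumptions for illustration; the check covers a single input only.

```python
from itertools import combinations

def single_linkage(points, d, k):
    """Single linkage, used here only as an example clustering function F."""
    clusters = [frozenset([p]) for p in points]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda pr: min(d(x, y) for x in pr[0] for y in pr[1]))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)

def is_local_on(F, points, d, k):
    """For every sub-collection C of F(X,d,k), check F(union of C, d, |C|) == C."""
    clustering = F(points, d, k)
    for r in range(1, len(clustering) + 1):
        for sub in combinations(clustering, r):
            restricted = sorted(set().union(*sub))  # points covered by C
            if F(restricted, d, len(sub)) != set(sub):
                return False
    return True

d = lambda x, y: abs(x - y)
print(is_local_on(single_linkage, [0, 1, 3, 7, 15], d, 3))
```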

slide-12
SLIDE 12

If d’ equals d except that distances between points in different clusters of F(X,d,k) are increased, then F(X,d’,k) = F(X,d,k), for all X, d, and k.

[Figure: d and d’, with the clusterings F(X,d,3) and F(X,d’,3)]

Outer Consistency

Based on Kleinberg, 2002.
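The property can be probed numerically with a sketch like the following, which builds one particular d’ by scaling every between-cluster distance by a factor of at least 1 and leaving within-cluster distances untouched. The helper `stretch_between`, the scaling factor, and the toy data are assumptions for the example; outer consistency allows any increase of between-cluster distances, not just uniform scaling.

```python
from itertools import combinations

def single_linkage(points, d, k):
    """Single linkage, used here only as an example clustering function F."""
    clusters = [frozenset([p]) for p in points]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda pr: min(d(x, y) for x in pr[0] for y in pr[1]))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)

def stretch_between(d, clustering, factor):
    """Return d' equal to d within clusters and scaled by `factor` across clusters."""
    label = {p: i for i, c in enumerate(clustering) for p in c}
    def d_prime(x, y):
        return d(x, y) if label[x] == label[y] else factor * d(x, y)
    return d_prime

points = [0, 1, 3, 7, 15]
d = lambda x, y: abs(x - y)
before = single_linkage(points, d, 3)
d_prime = stretch_between(d, before, factor=2.0)
after = single_linkage(points, d_prime, 3)
print(before == after)
```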

slide-13
SLIDE 13
  • Some common clustering algorithms fail locality and outer-consistency
  • Ex. Spectral clustering objectives Ratio Cut and Normalized Cut
  • Locality and outer-consistency can be used to distinguish between clustering algorithms (they are not axioms).

Not all algorithms are local and outer-consistent!

slide-14
SLIDE 14

[Figure: domains $(X_1, d_1)$, $(X_2, d_2)$, $(X_3, d_3)$ and a combined domain $(X, d)$]

Extended Richness

slide-15
SLIDE 15

[Figure: domains $(X_1, d_1)$, $(X_2, d_2)$, $(X_3, d_3)$ and the clustering $F(X, d, 3)$]

Extended Richness

slide-16
SLIDE 16

F satisfies extended richness if for any set of domains $\{(X_1, d_1), (X_2, d_2), \ldots, (X_k, d_k)\}$ there is a d over $X = \bigcup_i X_i$ that extends each of the $d_i$s so that $F(X, d, k) = \{X_1, X_2, \ldots, X_k\}$.

[Figure: domains $(X_1, d_1)$, $(X_2, d_2)$, $(X_3, d_3)$ and the clustering $F(X, d, 3)$]

Extended Richness
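One way to see why a linkage-based function such as single linkage satisfies extended richness is to build the extending d explicitly: keep each d_i inside its own domain and set every cross-domain distance to a value larger than any within-domain distance. The sketch below does this for three toy domains; the helper `extend`, the constant `big`, and the data are assumptions for illustration, and this particular construction is not claimed to be the one used in the authors' proof.

```python
from itertools import combinations

def single_linkage(points, d, k):
    """Single linkage, used here only as an example clustering function F."""
    clusters = [frozenset([p]) for p in points]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda pr: min(d(x, y) for x in pr[0] for y in pr[1]))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)

def extend(domains, big):
    """Build d over the union that agrees with each d_i inside X_i.

    `domains` is a list of (points, d_i) pairs with pairwise disjoint points;
    every cross-domain distance is set to `big`.
    """
    owner = {p: i for i, (pts, _) in enumerate(domains) for p in pts}
    def d(x, y):
        i, j = owner[x], owner[y]
        return domains[i][1](x, y) if i == j else big
    return d

absdiff = lambda x, y: abs(x - y)
# Interleaved domains, so the plain absolute difference would not separate them.
domains = [([0, 100], absdiff), ([1, 2, 50], absdiff), ([3], absdiff)]
X = [p for pts, _ in domains for p in pts]
d = extend(domains, big=1000)  # 1000 exceeds every within-domain distance
result = single_linkage(X, d, len(domains))
print(sorted(sorted(c) for c in result))
```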

slide-17
SLIDE 17
  • Define linkage-based clustering
  • Our new clustering properties
  • Main result
  • Sketch of proof
  • A taxonomy of common clustering algorithms using our properties

  • Conclusions

Outline

slide-18
SLIDE 18

Theorem: A clustering function is Linkage-Based if and only if it is Hierarchical, Outer-Consistent, Local and satisfies Extended Richness.

Our main result

slide-19
SLIDE 19

Every Linkage-Based clustering function is Hierarchical, Local, Outer-Consistent, and satisfies Extended Richness. The proof is quite straightforward.

Easy direction of proof

slide-20
SLIDE 20

If F is Hierarchical and it satisfies Outer Consistency, Locality and Extended Richness, then F is Linkage-Based.

To prove this direction we first need to formalize linkage-based clustering by formally defining what a linkage function is.

Interesting direction of proof

slide-21
SLIDE 21

A linkage function is a function

$$\ell : \{(X_1, X_2, d) : d \text{ is a dissimilarity function over } X_1 \cup X_2\} \to \mathbb{R}$$

that satisfies the following:

1) Representation independent: $\ell(X_1, X_2, d)$ doesn’t change if we re-label the data.
2) Monotonic: if we increase edges that go between $X_1$ and $X_2$, then $\ell(X_1, X_2, d)$ doesn’t decrease.
3) Any pair of clusters can be made arbitrarily distant: by increasing edges that go between $X_1$ and $X_2$, we can make $\ell(X_1, X_2, d)$ exceed any value in the range of $\ell$.

What do we expect from a linkage function?

slide-22
SLIDE 22

Need to prove: If F is a hierarchical function that satisfies the above clustering properties, then F is linkage-based.

Goal: Given a clustering function F that satisfies the properties, define a linkage function l so that the linkage-based clustering based on l coincides with F (for every X, d and k).

Sketch of proof

slide-23
SLIDE 23
  • Define an operator <F : (A,B,d1) <F (C,D,d2) if there exists d that extends d1 and d2 such that when we run F on $(A \cup B \cup C \cup D, d)$, A and B are merged before C and D.

Sketch of proof (continued…)

[Figure: clusters A, B, C, D under the clustering $F(A \cup B \cup C \cup D, d, 4)$]

slide-24
SLIDE 24

Sketch of proof (continued…)

[Figure: A, B, C, D under the clustering $F(A \cup B \cup C \cup D, d, 3)$]

slide-25
SLIDE 25

Sketch of proof (continued…)

  • Prove that <F can be extended to a partial ordering
  • Use the ordering to define l

[Figure: A, B, C, D under the clustering $F(A \cup B \cup C \cup D, d, 3)$]

slide-26
SLIDE 26

Sketch of proof (continued…)

Show that <F is a partial ordering

We show that <F is cycle-free.

Lemma: Given a function F that is hierarchical, local, outer-consistent and satisfies extended richness, there are no $(A_1, B_1, d_1), (A_2, B_2, d_2), \ldots, (A_n, B_n, d_n)$ so that

$$(A_1, B_1, d_1) <_F (A_2, B_2, d_2) <_F \cdots <_F (A_n, B_n, d_n)$$

and $(A_1, B_1, d_1) = (A_n, B_n, d_n)$.

slide-27
SLIDE 27
  • By the above Lemma, the transitive closure of <F is a partial ordering.
  • This implies that there exists an order preserving function l that maps pairs of data sets to R (since <F is defined over a countable set).
  • It can be shown that l satisfies the properties of a linkage function.

Sketch of proof (continued…)
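For a finite set of items, the step from a cycle-free relation to an order preserving real-valued function can be illustrated with a topological sort. The sketch below is only a finite analogue of the argument on this slide, which concerns a countable set; the function name `order_preserving_map` and the toy items are assumptions for the example. It relies on `graphlib.TopologicalSorter` from the Python standard library (3.9+), which raises `CycleError` when the relation contains a cycle.

```python
from graphlib import CycleError, TopologicalSorter

def order_preserving_map(items, less_than):
    """Assign a real number to each item so that a < b implies l[a] < l[b].

    `less_than` is an iterable of (a, b) pairs meaning a < b. The relation
    must be cycle-free, otherwise graphlib raises CycleError.
    """
    ts = TopologicalSorter({x: set() for x in items})
    for a, b in less_than:
        ts.add(b, a)  # a is a predecessor of b, so a is ordered first
    return {x: float(rank) for rank, x in enumerate(ts.static_order())}

items = ["p", "q", "r", "s"]
relation = [("p", "q"), ("q", "r"), ("p", "s")]
l = order_preserving_map(items, relation)
print(all(l[a] < l[b] for a, b in relation))

# A cyclic relation admits no such map.
try:
    order_preserving_map(["p", "q"], [("p", "q"), ("q", "p")])
    cyclic_ok = True
except CycleError:
    cyclic_ok = False
print(cyclic_ok)
```

Because the ranks follow a topological order, the map also respects the transitive closure of the relation, which is the partial ordering the slide refers to.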

slide-28
SLIDE 28
  • We introduced new meaningful properties of clustering algorithms.
  • We proved that they characterize linkage-based algorithms.
  • Whenever all these properties are desirable, a linkage-based algorithm should be used.

Conclusions
