Lecture 22: Clustering, Distance Measures, K-Means. Aykut Erdem (PowerPoint PPT Presentation)



SLIDE 1

Lecture 22:

−Clustering −Distance measures −K-Means

Aykut Erdem

December 2016 Hacettepe University

SLIDE 2

Last time… Boosting

  • Idea: given a weak learner, run it multiple times on (reweighted) training data, then let the learned classifiers vote
  • On each iteration t:
  • weight each training example by how incorrectly it was classified
  • learn a hypothesis ht
  • and a strength for this hypothesis αt
  • Final classifier: a linear combination of the votes of the different classifiers, weighted by their strengths

  • Practically useful
  • Theoretically interesting

2

slide by Aarti Singh & Barnabas Poczos

SLIDE 3

Last time.. The AdaBoost Algorithm

3

slide by Jiri Matas and Jan Šochman

SLIDE 4

This week

  • Distance measures
  • K-Means
  • Spectral clustering
  • Hierarchical clustering
  • What is a good clustering?

4

SLIDE 5

Distance measures

5

SLIDE 6

Distance measures

  • In studying clustering techniques we will assume that we are given an m × m matrix of distances d(x_i, x_j) between all pairs of data points.

6

slide by Julia Hockenmaier

SLIDE 7

What is Similarity/Dissimilarity?

  • Hard to define, but we know it when we see it! The real meaning of similarity is a philosophical question; we will take a more pragmatic approach.
  • It depends on the representation and the algorithm. For many representations/algorithms, it is easier to think in terms of a distance (rather than a similarity) between vectors.

7

slide by Eric Xing
SLIDE 8

Defining Distance Measures

8

[Figure: two objects described by gene expression features gene1 and gene2]

  • Definition: Let O1 and O2 be two objects from the universe of possible objects. The distance (dissimilarity) between O1 and O2 is a real number, denoted by D(O1, O2).

slide by Andrew Moore

SLIDE 9

A few examples:

  • Euclidean distance: d(x, y) = ( Σ_i (x_i − y_i)² )^(1/2)
  • Correlation coefficient: s(x, y) = Σ_i (x_i − x̄)(y_i − ȳ) / (σ_x σ_y)
  • a similarity rather than a distance
  • can detect similar trends

9

slide by Andrew Moore
SLIDE 10

What properties should a distance measure have?

  • Symmetric
  • D(A,B) = D(B,A)
  • Otherwise, we could say A looks like B but B does not look like A
  • Positivity, and self-similarity
  • D(A,B) ≥ 0, and D(A,B) = 0 iff A = B
  • Otherwise there would be different objects that we cannot tell apart
  • Triangle inequality
  • D(A,B) + D(B,C) ≥ D(A,C)
  • Otherwise one could say "A is like B, B is like C, but A is not like C at all"

10

slide by Alan Fern
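These three requirements can be sanity-checked numerically. Below is a minimal sketch (the random point set and the float tolerance are illustrative assumptions, not from the slides) that checks all three properties for the Euclidean distance:

```python
import itertools
import math
import random

def d(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.dist(a, b)

rng = random.Random(0)
pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(15)]

for a, b, c in itertools.product(pts, repeat=3):
    assert d(a, b) == d(b, a)                           # symmetry
    assert d(a, b) >= 0 and (d(a, b) == 0) == (a == b)  # positivity, self-similarity
    assert d(a, b) + d(b, c) >= d(a, c) - 1e-12         # triangle inequality (float tolerance)
print("all three metric axioms hold on the sample")
```

The small tolerance in the triangle check is only there because floating-point rounding can violate the inequality by a few ulps even for a true metric.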

SLIDE 11

Distance measures

  • Euclidean (L2) distance: d(x, y) = ( Σ_{i=1}^{d} (x_i − y_i)² )^(1/2)
  • Manhattan (L1) distance: d(x, y) = ‖x − y‖₁ = Σ_{i=1}^{d} |x_i − y_i|
  • Infinity (Sup) distance (L∞): d(x, y) = max_{1≤i≤d} |x_i − y_i|
  • Note that L∞ ≤ L2 ≤ L1, but different distances do not induce the same ordering on points.

11

slide by Julia Hockenmaier
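Each of the three distances is a one-liner. A minimal sketch (the example point pairs are illustrative), including a pair of point pairs that the sup and Euclidean distances order differently:

```python
import math

def euclidean(x, y):
    # L2: square root of the summed squared coordinate differences
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def manhattan(x, y):
    # L1: sum of the absolute coordinate differences
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def sup(x, y):
    # L-infinity: the largest absolute coordinate difference
    return max(abs(xi - yi) for xi, yi in zip(x, y))

# A pair whose coordinate differences are (5, 0.1) versus one with (4, 4):
a, b = (0.0, 0.0), (5.0, 0.1)
c, d = (0.0, 0.0), (4.0, 4.0)
print(sup(a, b) > sup(c, d))              # True: 5 > 4
print(euclidean(a, b) < euclidean(c, d))  # True: ~5.001 < ~5.657
```

So (a, b) is the farther pair under L∞ but the closer pair under L2, which is exactly the "different orderings" caveat above.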

SLIDE 12

Distance measures

12

x = (x1, x2), y = (x1 − 2, x2 + 4)

Euclidean: (4² + 2²)^(1/2) = 4.47
Manhattan: 4 + 2 = 6
Sup: max(4, 2) = 4

slide by Julia Hockenmaier

SLIDE 13

Distance measures

  • Different distances do not induce the same ordering on points:

L∞(a, b) = 5,  L2(a, b) = (5² + ε²)^(1/2) ≈ 5
L∞(c, d) = 4,  L2(c, d) = (4² + 4²)^(1/2) = 5.66

So L2(c, d) > L2(a, b), but L∞(c, d) < L∞(a, b).

13

slide by Julia Hockenmaier

SLIDE 14

Distance measures

  • Clustering is sensitive to the distance measure.
  • Sometimes it is beneficial to use a distance measure that is invariant to transformations that are natural to the problem:
  • Mahalanobis distance: ✓ shift and scale invariance

14

slide by Julia Hockenmaier

SLIDE 15

Mahalanobis Distance

15

Σ is a (symmetric) covariance matrix. Using Σ⁻¹ in the distance effectively translates all the axes to mean = 0 and variance = 1 (shift and scale invariance).

d(x, y) = ( (x − y)^T Σ⁻¹ (x − y) )^(1/2)

μ = (1/m) Σ_{i=1}^{m} x_i  (the average of the data)

Σ = (1/m) Σ_{i=1}^{m} (x_i − μ)(x_i − μ)^T,  a matrix of size d × d, where d is the data dimensionality

slide by Julia Hockenmaier
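A minimal 2-D sketch of the formulas above in pure Python (the 2×2 inverse is hand-coded for illustration; real code would use a numerical library and guard against singular Σ):

```python
import math

def mean(points):
    """Coordinate-wise average of a list of 2-D points."""
    m = len(points)
    return [sum(p[0] for p in points) / m, sum(p[1] for p in points) / m]

def covariance(points):
    """2x2 covariance matrix, normalized by m as on the slide."""
    mu = mean(points)
    m = len(points)
    c = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        dx, dy = p[0] - mu[0], p[1] - mu[1]
        c[0][0] += dx * dx / m
        c[0][1] += dx * dy / m
        c[1][0] += dy * dx / m
        c[1][1] += dy * dy / m
    return c

def inv2x2(c):
    """Inverse of a 2x2 matrix (assumes a nonzero determinant)."""
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    return [[ c[1][1] / det, -c[0][1] / det],
            [-c[1][0] / det,  c[0][0] / det]]

def mahalanobis(x, y, cov_inv):
    """sqrt((x - y)^T Sigma^{-1} (x - y)) for 2-D points."""
    dx, dy = x[0] - y[0], x[1] - y[1]
    q = (dx * cov_inv[0][0] + dy * cov_inv[1][0]) * dx \
      + (dx * cov_inv[0][1] + dy * cov_inv[1][1]) * dy
    return math.sqrt(q)

# Scale invariance: if axis 1 has variance 4 and axis 2 has variance 1,
# a step of 2 along axis 1 and a step of 1 along axis 2 are equally far.
ci = inv2x2([[4.0, 0.0], [0.0, 1.0]])
print(mahalanobis((0, 0), (2, 0), ci))  # 1.0
print(mahalanobis((0, 0), (0, 1), ci))  # 1.0
```

With Σ = I the Mahalanobis distance reduces to the ordinary Euclidean distance, which is a useful sanity check.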

SLIDE 16

Distance measures

  • Some algorithms require distances between a point x and a set of points A: d(x, A).
    This might be defined e.g. as the min/max/avg distance between x and any point in A.
  • Others require distances between two sets of points A, B: d(A, B).
    This might be defined e.g. as the min/max/avg distance between any point in A and any point in B.

16

slide by Julia Hockenmeier

SLIDE 17

Clustering algorithms

  • Partitioning algorithms
  • Construct various partitions and then evaluate them by some criterion
  • K-means
  • Mixture of Gaussians
  • Spectral Clustering
  • Hierarchical algorithms
  • Create a hierarchical decomposition of the set of objects using some criterion
  • Bottom-up – agglomerative
  • Top-down – divisive

17
  • slide by Eric Xing
SLIDE 18

Desirable Properties of a Clustering Algorithm

  • Scalability (in terms of both time and space)
  • Ability to deal with different data types
  • Minimal requirements for domain knowledge to determine input parameters

  • Ability to deal with noisy data
  • Interpretability and usability
  • Optional
  • Incorporation of user-specified constraints

18

slide by Andrew Moore

SLIDE 19

K-Means

19

SLIDE 20

K-Means

  • An iterative clustering algorithm
  • Initialize: pick K random points as cluster centers (means)
  • Alternate:
  • assign each data instance to its closest mean
  • move each mean to the average of its assigned points
  • Stop when no points' assignments change

20

slide by David Sontag
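The loop on this slide can be sketched in a few lines of plain Python (this is Lloyd's algorithm; the toy data set, the seed, and the keep-old-mean rule for empty clusters are illustrative assumptions, not from the slides):

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed=0):
    """Alternate assignment and mean-update steps until assignments stop changing."""
    rng = random.Random(seed)
    means = rng.sample(points, k)       # initialize: K random points as centers
    assign = None
    while True:
        # assignment step: each point goes to its closest mean
        new_assign = [min(range(k), key=lambda j: dist2(p, means[j]))
                      for p in points]
        if new_assign == assign:        # stop when no assignment changes
            return means, assign
        assign = new_assign
        # update step: each mean moves to the average of its assigned points
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:                 # keep the old mean if a cluster empties
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))

# Two well-separated blobs; K = 2 recovers them.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, assign = kmeans(pts, 2)
print(sorted(means))
```

The two steps of the loop are exactly the two per-iteration costs quoted later in the deck: assignment is O(KN) and the mean update is O(N).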

SLIDE 21

21

slide by David Sontag

SLIDE 22

K-Means Clustering: Example

  • Pick K random points as cluster centers (means)

Shown here for K = 2

22

slide by David Sontag

SLIDE 23

Iterative Step 1

  • Assign data points to the closest cluster centers

23

K-Means Clustering: Example

slide by David Sontag

SLIDE 24

Iterative Step 2

  • Change each cluster center to the average of its assigned points

24

K-Means Clustering: Example

slide by David Sontag

SLIDE 25
  • Repeat until convergence

25

K-Means Clustering: Example

slide by David Sontag

SLIDE 26

26

K-Means Clustering: Example

slide by David Sontag

SLIDE 27

27

K-Means Clustering: Example

slide by David Sontag

SLIDE 28

Properties of K-Means Algorithms

  • Guaranteed to converge in a finite number of iterations
  • Running time per iteration:
  • 1. Assign data points to the closest cluster center: O(KN) time
  • 2. Change each cluster center to the average of its assigned points: O(N) time

28

slide by David Sontag

SLIDE 29

K-Means Convergence

Objective: minimize over μ and C the sum Σ_{i=1}^{m} ‖x_i − μ_{C(i)}‖²

  • 1. Fix μ, optimize C: assign each point to its closest center
  • 2. Fix C, optimize μ: take the partial derivative with respect to μ_i and set it to zero; the solution is the mean of the points assigned to cluster i

K-Means takes an alternating optimization approach. Each step is guaranteed to decrease the objective, so the algorithm is guaranteed to converge.

29

slide by Alan Fern
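The mean-update step can be written out explicitly. A sketch of the standard derivation (here C(i) denotes the cluster that point x_i is assigned to):

```latex
J(C,\mu) = \sum_{i=1}^{m} \lVert x_i - \mu_{C(i)} \rVert^2,
\qquad
\frac{\partial J}{\partial \mu_j}
  = -2 \sum_{i:\,C(i)=j} \left( x_i - \mu_j \right) = 0
\;\Longrightarrow\;
\mu_j = \frac{1}{\lvert \{ i : C(i)=j \} \rvert} \sum_{i:\,C(i)=j} x_i
```

That is, for a fixed assignment the optimal center of each cluster is exactly the average of its assigned points, which is what step 2 of the algorithm computes.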

SLIDE 30

Demo time…

30

SLIDE 31

K-Means: Example Applications

31

SLIDE 32

Example: K-Means for Segmentation

32

[Images: the original image and segmentation results for K = 2, K = 3, and K = 10]

The goal of segmentation is to partition an image into regions, each of which has reasonably homogeneous visual appearance.

slide by David Sontag

SLIDE 33

Example: K-Means for Segmentation

33

[Images: the original image and segmentation results for K = 2, K = 3, and K = 10]

slide by David Sontag

SLIDE 34

Example: K-Means for Segmentation

34

[Images: the original image and segmentation results for K = 2, K = 3, and K = 10]

slide by David Sontag

SLIDE 35

Example: Vector quantization

35

FIGURE 14.9. Sir Ronald A. Fisher (1890−1962) was one of the founders of modern-day statistics, to whom we owe maximum likelihood, sufficiency, and many other fundamental concepts. The image on the left is a 1024 × 1024 grayscale image at 8 bits per pixel. The center image is the result of 2 × 2 block VQ, using 200 code vectors, with a compression rate of 1.9 bits/pixel. The right image uses only four code vectors, with a compression rate of 0.50 bits/pixel.

[Figure from Hastie et al. book]

slide by David Sontag
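In code, the encoding step of block VQ is just nearest-neighbor search in a codebook. A toy sketch (the 4×4 "image" and the 2-entry codebook are made up for illustration; in real VQ, as in the figure, the codebook is learned by running K-means on the training blocks):

```python
def blocks(img, size=2):
    """Yield the pixels of each size x size block, flattened row-major."""
    for r in range(0, len(img), size):
        for c in range(0, len(img[0]), size):
            yield [img[r + i][c + j] for i in range(size) for j in range(size)]

def quantize(block, codebook):
    """Index of the nearest code vector in the squared-error sense."""
    return min(range(len(codebook)),
               key=lambda k: sum((b - v) ** 2 for b, v in zip(block, codebook[k])))

img = [[0, 0, 250, 255],
       [5, 0, 245, 255],
       [0, 5, 255, 250],
       [0, 0, 250, 250]]
codebook = [[0, 0, 0, 0], [255, 255, 255, 255]]   # "dark" and "bright" 2x2 blocks
codes = [quantize(b, codebook) for b in blocks(img)]
print(codes)  # [0, 1, 0, 1]: one codebook index per 2x2 block
```

Storing one index per block instead of four pixels is where the compression comes from: a 2-entry codebook needs 1 bit per block, i.e. 0.25 bits/pixel plus the codebook itself.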

SLIDE 36

Example: Simple Linear Iterative Clustering (SLIC) superpixels

36

  • R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, "SLIC Superpixels Compared to State-of-the-art Superpixel Methods," IEEE T-PAMI, 2012

λ: spatial regularization parameter

SLIDE 37

Bag of Words model

37

aardvark 0
about 2
all 2
Africa 1
apple
anxious
...
gas 1
...
oil 1
…
Zaire

slide by Carlos Guestrin

SLIDE 38

38

slide by Fei Fei Li

SLIDE 39

39

Object Bag of ‘words’

slide by Fei Fei Li

SLIDE 40

Interest Point Features

40

Normalize patch

Detect patches

[Mikolajczyk and Schmid ’02] [Matas et al. ’02] [Sivic et al. ’03]

Compute SIFT descriptor

[Lowe’99]

slide by Josef Sivic

SLIDE 41

Patch Features

41

slide by Josef Sivic

SLIDE 42

Dictionary Formation

42

slide by Josef Sivic

SLIDE 43

Clustering (usually K-means)

43

Vector quantization

slide by Josef Sivic

SLIDE 44

Clustered Image Patches

44

slide by Fei Fei Li

SLIDE 45

Visual synonyms and polysemy

45

Visual Polysemy: a single visual word occurring on different (but locally similar) parts of different object categories.
Visual Synonyms: two different visual words representing a similar part of an object (the wheel of a motorbike).

slide by Andrew Zisserman

SLIDE 46

Image Representation

46

[Bar chart: frequency of each codeword in the image]

slide by Fei Fei Li
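The representation on this slide is just a histogram over codewords. A minimal sketch (the 2-D "descriptors" and the 3-word codebook are illustrative stand-ins for SIFT descriptors and a K-means vocabulary):

```python
def nearest(desc, codebook):
    """Index of the codeword closest to a descriptor (squared Euclidean)."""
    return min(range(len(codebook)),
               key=lambda k: sum((d - c) ** 2 for d, c in zip(desc, codebook[k])))

def bow_histogram(descriptors, codebook):
    """Count how often each codeword is the nearest one to a descriptor."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1
    return hist

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descs = [(0.1, 0.0), (0.9, 0.1), (0.0, 0.8), (0.1, 0.9), (0.05, 0.05)]
print(bow_histogram(descs, codebook))  # [2, 1, 2]
```

The resulting fixed-length vector can then be fed to any standard classifier, regardless of how many patches the image produced.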

SLIDE 47

K-Means Clustering: Some Issues

  • How to set k?
  • Sensitive to initial centers
  • Sensitive to outliers
  • Detects spherical clusters
  • Assuming means can be computed

47

slide by Kristen Grauman