

Clustering Algorithms

Johannes Blömer

WS 2015/16


Introduction

Clustering: techniques for data management and analysis that classify/group a given set of objects into categories/subgroups, or clusters.

Clusters: homogeneous subgroups of objects such that the similarity between objects in one subgroup is larger than the similarity between objects from different subgroups.

Goals:

1. find structures in a large set of objects/data
2. simplify large data sets


Example

How do we measure the similarity/dissimilarity of objects?
How do we measure the quality of a clustering?


Application areas

1. information retrieval
2. data mining
3. machine learning
4. statistics
5. pattern recognition
6. computer graphics
7. data compression
8. bioinformatics
9. speech recognition


Goals of this course

- different models for clustering
- many important clustering heuristics, including agglomerative clustering, Lloyd's algorithm, and the EM algorithm
- the limitations of these heuristics
- improvements to these heuristics
- various theoretical results about clustering, including NP-hardness results and approximation algorithms
- general techniques to improve the efficiency of heuristics and approximation algorithms, e.g. dimension reduction techniques


Organization

Information about this course:
http://www.cs.uni-paderborn.de/fachgebiete/ag-bloemer/lehre/2015/ws/clusteringalgorithms.html

Here you find:
- announcements
- handouts
- slides
- literature
- lecture notes (will be written and published as the course progresses)

There is only one tutorial, Thursday 13:00-14:00. It starts next week.

Prerequisites

- design and analysis of algorithms
- basic complexity theory
- probability theory and stochastics
- some linear algebra


Objects

- objects described by $d$ different features
- features continuous or binary
- objects described as elements of $\mathbb{R}^d$ or $\{0, 1\}^d$
- objects from $M \subseteq \mathbb{R}^d$ or $M \subseteq \{0, 1\}^d$


Distance functions

Definition 1.1
$D : M \times M \to \mathbb{R}$ is called a distance function, if for all $x, y, z \in M$:
- $D(x, y) = D(y, x)$ (symmetry)
- $D(x, y) \ge 0$ (positivity)

$D$ is called a metric, if in addition,
- $D(x, y) = 0 \Leftrightarrow x = y$ (reflexivity)
- $D(x, z) \le D(x, y) + D(y, z)$ (triangle inequality)
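The four axioms can be checked numerically on a finite sample of points. A minimal Python sketch (the helper names, tolerance, and sample points are mine, not from the slides):

```python
import math

def euclidean(x, y):
    # Euclidean distance ||x - y||_2, used here as an example metric
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def check_metric(D, points, eps=1e-12):
    # Check symmetry, positivity, reflexivity, and the triangle
    # inequality of Definition 1.1 on all pairs/triples of the sample.
    for x in points:
        for y in points:
            assert abs(D(x, y) - D(y, x)) <= eps           # symmetry
            assert D(x, y) >= 0                            # positivity
            assert (D(x, y) <= eps) == (x == y)            # reflexivity
            for z in points:
                assert D(x, z) <= D(x, y) + D(y, z) + eps  # triangle inequality
    return True

sample = [(0.0, 0.0), (3.0, 4.0), (1.0, -2.0)]
check_metric(euclidean, sample)
```

Such a check can only refute the axioms on the sample, not prove them in general.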

Examples

Example 1.2 (Euclidean distance)
$M = \mathbb{R}^d$,
$$D_{\ell_2}(x, y) = \|x - y\|_2 = \left( \sum_{i=1}^{d} |x_i - y_i|^2 \right)^{1/2},$$
where $x = (x_1, \dots, x_d)$ and $y = (y_1, \dots, y_d)$.

Example 1.3 (squared Euclidean distance)
$M = \mathbb{R}^d$,
$$D_{\ell_2^2}(x, y) = \|x - y\|_2^2 = \sum_{i=1}^{d} |x_i - y_i|^2,$$
where $x = (x_1, \dots, x_d)$ and $y = (y_1, \dots, y_d)$.


Example 1.4 (Minkowski distances, $\ell_p$-norms)
$M = \mathbb{R}^d$, $p \ge 1$,
$$D_{\ell_p}(x, y) = \|x - y\|_p = \left( \sum_{i=1}^{d} |x_i - y_i|^p \right)^{1/p}.$$

Example 1.5 (maximum distance)
$M = \mathbb{R}^d$,
$$D_{\ell_\infty}(x, y) = \|x - y\|_\infty = \max_{1 \le i \le d} |x_i - y_i|.$$
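Examples 1.2-1.5 are all straightforward to compute. A small Python sketch (the sample vectors are illustrative):

```python
def minkowski(x, y, p):
    # D_lp(x, y) = (sum_i |x_i - y_i|^p)^(1/p), for p >= 1
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1.0 / p)

def maximum_distance(x, y):
    # D_linf(x, y) = max_i |x_i - y_i|
    return max(abs(xi - yi) for xi, yi in zip(x, y))

x, y = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
d1 = minkowski(x, y, 1)        # Manhattan distance: 3 + 4 + 0 = 7
d2 = minkowski(x, y, 2)        # Euclidean distance: sqrt(9 + 16) = 5
dinf = maximum_distance(x, y)  # largest coordinate gap: 4
```

For growing $p$, $D_{\ell_p}$ approaches $D_{\ell_\infty}$, which is why the maximum distance is also written with the subscript $\infty$.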

Example 1.6 (Pearson correlation)
$M = \mathbb{R}^d$,
$$D_{\text{Pearson}}(x, y) = \frac{1}{2} \left( 1 - \frac{\sum_{i=1}^{d} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{d} (x_i - \bar{x})^2 \sum_{i=1}^{d} (y_i - \bar{y})^2}} \right),$$
where $\bar{x} = \frac{1}{d} \sum_{i=1}^{d} x_i$ and $\bar{y} = \frac{1}{d} \sum_{i=1}^{d} y_i$.
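A direct Python translation of Example 1.6 (function name mine). Perfectly correlated vectors get distance $0$, perfectly anti-correlated vectors distance $1$:

```python
import math

def pearson_distance(x, y):
    # D_Pearson(x, y) = (1 - r(x, y)) / 2, where r is the sample correlation
    d = len(x)
    xbar = sum(x) / d
    ybar = sum(y) / d
    cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - ybar) ** 2 for yi in y))
    return 0.5 * (1.0 - cov / (sx * sy))

d_same = pearson_distance((1.0, 2.0, 3.0), (2.0, 4.0, 6.0))  # close to 0
d_anti = pearson_distance((1.0, 2.0, 3.0), (3.0, 2.0, 1.0))  # close to 1
```

Note the distance is undefined for constant vectors (zero denominator), a caveat the formula shares with the correlation coefficient itself.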


Example 1.7 (Mahalanobis divergence)
$A \in \mathbb{R}^{d \times d}$ positive definite, i.e. $x^T A x > 0$ for $x \ne 0$, $M = \mathbb{R}^d$,
$$D_A(x, y) = (x - y)^T A (x - y).$$

Example 1.8 (Itakura-Saito divergence)
$M = \mathbb{R}^d_{>0}$,
$$D_{IS}(x, y) = \sum_{i=1}^{d} \left( \frac{x_i}{y_i} - \ln\left(\frac{x_i}{y_i}\right) - 1 \right).$$
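Both divergences can be sketched in a few lines of Python (names and the sample matrix are mine; the matrix is diagonal, hence positive definite):

```python
import math

def mahalanobis(x, y, A):
    # D_A(x, y) = (x - y)^T A (x - y), with A positive definite
    diff = [xi - yi for xi, yi in zip(x, y)]
    d = len(diff)
    return sum(diff[i] * A[i][j] * diff[j] for i in range(d) for j in range(d))

def itakura_saito(x, y):
    # D_IS(x, y) = sum_i (x_i/y_i - ln(x_i/y_i) - 1), entries must be positive
    return sum(xi / yi - math.log(xi / yi) - 1.0 for xi, yi in zip(x, y))

A = [[2.0, 0.0],
     [0.0, 1.0]]
d_mah = mahalanobis((1.0, 1.0), (0.0, 0.0), A)  # 2*1^2 + 1*1^2 = 3
d_is = itakura_saito((1.0, 2.0), (1.0, 2.0))    # divergence of a point to itself is 0
```

With $A$ the identity matrix, $D_A$ reduces to the squared Euclidean distance of Example 1.3.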


Example 1.9 (Kullback-Leibler divergence)
$M = S_d := \{x \in \mathbb{R}^d : \forall i: x_i \ge 0, \sum_{i=1}^{d} x_i = 1\}$,
$$D_{KLD}(x, y) = \sum_{i=1}^{d} x_i \ln(x_i / y_i),$$
where by definition $0 \cdot \ln(0) = 0$.

Example 1.10 (generalized KLD)
$M = \mathbb{R}^d_{\ge 0}$,
$$D_{KLD}(x, y) = \sum_{i=1}^{d} \left( x_i \ln(x_i / y_i) - (x_i - y_i) \right).$$
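A Python sketch of both variants (function names mine). The convention $0 \cdot \ln(0) = 0$ is implemented by skipping zero coordinates:

```python
import math

def kld(x, y):
    # D_KLD(x, y) = sum_i x_i * ln(x_i / y_i), with 0 * ln(0) = 0,
    # for x, y on the probability simplex S_d
    return sum(xi * math.log(xi / yi) for xi, yi in zip(x, y) if xi > 0)

def generalized_kld(x, y):
    # D_KLD(x, y) = sum_i (x_i * ln(x_i / y_i) - (x_i - y_i)), for x, y >= 0
    return sum((xi * math.log(xi / yi) if xi > 0 else 0.0) - (xi - yi)
               for xi, yi in zip(x, y))

d0 = kld((0.5, 0.5), (0.5, 0.5))   # identical distributions: 0
d1 = kld((1.0, 0.0), (0.5, 0.5))   # degenerate vs. uniform: ln 2
```

Note that $D_{KLD}$ is not symmetric, so it is a distance function in spirit but not in the sense of Definition 1.1.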

Similarity functions

Definition 1.11
$S : M \times M \to \mathbb{R}$ is called a similarity function, if for all $x, y, z \in M$:
- $S(x, y) = S(y, x)$ (symmetry)
- $0 \le S(x, y) \le 1$ (positivity)

$S$ is called a metric, if in addition,
- $S(x, y) = 1 \Leftrightarrow x = y$ (reflexivity)
- $S(x, y) S(y, z) \le \left( S(x, y) + S(y, z) \right) S(x, z)$ (triangle inequality)

Examples

Example 1.12 (cosine similarity)
$M = \mathbb{R}^d$,
$$S_{CS}(x, y) = \frac{x^T y}{\|x\| \, \|y\|} \quad \text{or} \quad \bar{S}_{CS}(x, y) = \frac{1 + S_{CS}(x, y)}{2}.$$
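A Python sketch of Example 1.12 (function names mine). $S_{CS}$ lies in $[-1, 1]$; the shifted variant $\bar{S}_{CS}$ rescales it into $[0, 1]$ as Definition 1.11 requires:

```python
import math

def cosine_similarity(x, y):
    # S_CS(x, y) = x^T y / (||x|| * ||y||), in [-1, 1]
    dot = sum(xi * yi for xi, yi in zip(x, y))
    nx = math.sqrt(sum(xi * xi for xi in x))
    ny = math.sqrt(sum(yi * yi for yi in y))
    return dot / (nx * ny)

def cosine_similarity_01(x, y):
    # shifted variant S̄_CS(x, y) = (1 + S_CS(x, y)) / 2, in [0, 1]
    return (1.0 + cosine_similarity(x, y)) / 2.0

s_orth = cosine_similarity((1.0, 0.0), (0.0, 1.0))      # orthogonal vectors: 0
s_par = cosine_similarity((2.0, 0.0), (5.0, 0.0))       # parallel vectors: 1
```

Cosine similarity depends only on the directions of $x$ and $y$, not on their lengths, which is why it is popular in information retrieval.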


Similarity for binary features

Let $x, y \in \{0, 1\}^d$, then
$$n_{b\bar{b}}(x, y) := \left| \{1 \le i \le d : x_i = b, y_i = \bar{b}\} \right|$$
and for $w \in \mathbb{R}_{\ge 0}$
$$S_w(x, y) := \frac{n_{00}(x, y) + n_{11}(x, y)}{n_{00}(x, y) + n_{11}(x, y) + w \left( n_{01}(x, y) + n_{10}(x, y) \right)}.$$
Popular: $w = 1, 2, \frac{1}{2}$.

Example 1.13 (matching coefficient)
$w = 1$,
$$S_{mc}(x, y) = \frac{n_{00}(x, y) + n_{11}(x, y)}{d}.$$
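The counts $n_{b\bar{b}}$ and the family $S_w$ translate directly to Python (function names and sample vectors mine):

```python
def match_counts(x, y):
    # n_bb'(x, y): number of positions i with x_i = b and y_i = b'
    n = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for xi, yi in zip(x, y):
        n[(xi, yi)] += 1
    return n

def s_w(x, y, w):
    # S_w(x, y) = (n00 + n11) / (n00 + n11 + w * (n01 + n10))
    n = match_counts(x, y)
    agree = n[(0, 0)] + n[(1, 1)]
    disagree = n[(0, 1)] + n[(1, 0)]
    return agree / (agree + w * disagree)

x = (1, 0, 1, 1)
y = (1, 1, 0, 1)
smc = s_w(x, y, 1)  # matching coefficient: (n00 + n11) / d = 2 / 4
```

The weight $w$ controls how strongly disagreeing positions are penalized: $w = 2$ punishes mismatches twice as hard, $w = \frac{1}{2}$ half as hard.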


One also defines
$$\bar{S}_w(x, y) := \frac{n_{11}(x, y)}{n_{11}(x, y) + w \left( n_{01}(x, y) + n_{10}(x, y) \right)}.$$
Popular: $w = 1, 2, \frac{1}{2}$.

Example 1.14 (Jaccard coefficient)
$w = 1$,
$$S_{\text{Jaccard}}(x, y) = \frac{n_{11}(x, y)}{n_{11}(x, y) + n_{01}(x, y) + n_{10}(x, y)}.$$
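The Jaccard coefficient differs from the matching coefficient in that shared zeros ($n_{00}$) are ignored. A minimal sketch (function name and sample vectors mine):

```python
def jaccard(x, y):
    # S_Jaccard(x, y) = n11 / (n11 + n01 + n10); positions where both
    # vectors are 0 do not count toward the similarity at all
    n11 = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    n_mismatch = sum(1 for xi, yi in zip(x, y) if xi != yi)
    return n11 / (n11 + n_mismatch)

x = (1, 0, 1, 1, 0)
y = (1, 1, 0, 1, 0)
sj = jaccard(x, y)  # n11 = 2, n01 + n10 = 2, so 2 / 4; the shared 0 is ignored
```

Ignoring $0$-$0$ matches matters for sparse data: two documents should not look similar merely because both lack most words of the vocabulary.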