SLIDE 1

Hierarchical clustering

David M. Blei

COS424, Princeton University

February 28, 2008

D. Blei, Clustering 02, 1 / 21

SLIDE 4

Hierarchical clustering

  • Hierarchical clustering is a widely used data analysis tool.
  • The idea is to build a binary tree of the data that successively merges similar groups of points.
  • Visualizing this tree provides a useful summary of the data.

SLIDE 9

Hierarchical clustering vs. k-means

  • Recall that k-means or k-medoids requires
  • A number of clusters k
  • An initial assignment of data to clusters
  • A distance measure between data d(x_n, x_m)
  • Hierarchical clustering only requires a measure of similarity between groups of data points.

SLIDE 14

Agglomerative clustering

  • We will talk about agglomerative clustering.
  • Algorithm:

1 Place each data point into its own singleton group
2 Repeat: iteratively merge the two closest groups
3 Until: all the data are merged into a single cluster
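The three steps above can be sketched as a short, from-scratch implementation. This is a minimal illustration, not code from the lecture; Euclidean distance and single linkage as the group-distance are assumptions, and any group-similarity function can be plugged in.

```python
import math

def single_linkage(G, H):
    """Distance between groups: the closest pair across them."""
    return min(math.dist(x, y) for x in G for y in H)

def agglomerative(points, group_distance):
    """Return the merge sequence as (group, group) pairs of point tuples."""
    # 1. place each data point into its own singleton group
    groups = [[p] for p in points]
    merges = []
    # 3. until all the data are merged into a single cluster
    while len(groups) > 1:
        # 2. find and merge the two closest groups
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = group_distance(groups[i], groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merges.append((tuple(groups[i]), tuple(groups[j])))
        groups[i] += groups[j]   # merge group j into group i
        del groups[j]
    return merges

pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
merges = agglomerative(pts, single_linkage)   # n - 1 = 3 merges for 4 points
```

For n points the loop performs exactly n − 1 merges, so the output records the whole binary tree; the naive nested search makes this O(n^3), which is why library implementations cache inter-group distances.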

SLIDE 39

Example

[Figure: scatterplot of the example data, axes V1 and V2 (roughly −20 to 80 on each); successive slides animate iterations 001–024 of the agglomerative merging.]

SLIDE 42

Agglomerative clustering

  • Each level of the resulting tree is a segmentation of the data.
  • The algorithm results in a sequence of groupings.
  • It is up to the user to choose a "natural" clustering from this sequence.
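Choosing one grouping from the sequence is usually done by cutting the tree at some level. A sketch with SciPy, assuming SciPy is installed; the two-blob data is made up for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# synthetic data: two well-separated blobs
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.5, size=(10, 2)),
               rng.normal(10.0, 0.5, size=(10, 2))])

# Z encodes the whole merge sequence, one row per merge
Z = linkage(x, method="complete")

# cut the tree where it yields two groups: this is the user's choice
# of a "natural" clustering from the sequence
labels = fcluster(Z, t=2, criterion="maxclust")
```

Every choice of cut height gives a different segmentation from the same tree, which is what "each level is a segmentation of the data" means in practice.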

SLIDE 47

Dendrogram

  • Agglomerative clustering is monotonic.
  • The similarity between merged clusters is monotone decreasing with the level of the merge.
  • Dendrogram: Plot each merge at the (negative) similarity between the two merged groups.
  • Provides an interpretable visualization of the algorithm and data.
  • Useful summarization tool, part of why hierarchical clustering is popular.
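The monotonicity claim can be checked numerically. A sketch with SciPy; the random data and the choice of complete linkage are assumptions for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
x = rng.normal(size=(25, 2))

# one row of Z per merge; column 2 holds the distance at which it happened
Z = linkage(pdist(x), method="complete")
heights = Z[:, 2]

# merge distances never decrease as we move up the tree, which is what
# makes the dendrogram heights well defined
assert np.all(np.diff(heights) >= 0)
```

This monotone sequence of heights is exactly what the dendrogram plots on its vertical axis.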

SLIDE 48

Dendrogram of example data

[Figure: cluster dendrogram of the example data, produced by hclust(*, "complete") on dist(x); heights range roughly 20–120, leaves labeled by observation index.]

Groups that merge at high values relative to the merger values of their subgroups are candidates for natural clusters. (Tibshirani et al., 2001)

SLIDE 53

Group similarity

  • Given a distance measure between points, the user has many choices for how to define intergroup similarity.
  • Three most popular choices
  • Single linkage: the similarity of the closest pair

d_SL(G, H) = min_{i ∈ G, j ∈ H} d_{i,j}

  • Complete linkage: the similarity of the furthest pair

d_CL(G, H) = max_{i ∈ G, j ∈ H} d_{i,j}

  • Group average: the average similarity between groups

d_GA(G, H) = (1 / (N_G N_H)) Σ_{i ∈ G} Σ_{j ∈ H} d_{i,j}
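The three definitions translate directly into code. A minimal sketch, not from the lecture; Euclidean point distance is an assumption, and any pointwise d_{i,j} works:

```python
import math

def d(i, j):
    # pointwise distance d_ij; Euclidean is just one concrete choice
    return math.dist(i, j)

def d_SL(G, H):
    """Single linkage: distance of the closest pair."""
    return min(d(i, j) for i in G for j in H)

def d_CL(G, H):
    """Complete linkage: distance of the furthest pair."""
    return max(d(i, j) for i in G for j in H)

def d_GA(G, H):
    """Group average: mean over all N_G * N_H pairwise distances."""
    return sum(d(i, j) for i in G for j in H) / (len(G) * len(H))

G = [(0.0, 0.0), (0.0, 2.0)]
H = [(3.0, 0.0), (3.0, 2.0)]
```

On these two example groups, d_SL is 3.0, d_CL is √13 ≈ 3.61, and d_GA sits between them, which matches the intuition that group average is a compromise.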

SLIDE 56

Properties of intergroup similarity

  • Single linkage can produce "chaining," where a sequence of close observations in different groups causes early merges of those groups.
  • Complete linkage has the opposite problem. It might not merge close groups because of outlier members that are far apart.
  • Group average represents a natural compromise, but depends on the scale of the similarities. Applying a monotone transformation to the similarities can change the results.
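The chaining behavior is easy to reproduce. A sketch with SciPy; the two tight groups and the thin bridge of points between them are made-up data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# two tight groups with a thin bridge of points between them
left = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.5]]
bridge = [[2.5, 0.5], [4.5, 0.5], [7.0, 0.5]]
right = [[9.0, 0.5], [10.0, 0.0], [10.0, 1.0]]
x = np.array(left + bridge + right)

# single linkage chains along the bridge: each bridge point is absorbed
# early, because only its nearest neighbor in a group matters
single = fcluster(linkage(x, method="single"), t=2, criterion="maxclust")

# complete linkage keys on the furthest pair instead, so the bridge
# cannot pull the groups together one step at a time
complete = fcluster(linkage(x, method="complete"), t=2, criterion="maxclust")
```

Cut to two flat clusters, the single-linkage result attaches the bridge points to the core groups via the chain, illustrating why a few in-between observations can force early merges.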

SLIDE 59

Caveats

  • Hierarchical clustering should be treated with caution.
  • Different decisions about group similarities can lead to vastly different dendrograms.
  • The algorithm imposes a hierarchical structure on the data, even data for which such structure is not appropriate.

SLIDE 62

Examples

  • "Repeated Observation of Breast Tumor Subtypes in Independent Gene Expression Data Sets" (Sorlie et al., 2003)
  • Hierarchical clustering of gene expression data led to new theories.
  • Later, the theories were tested in the lab.


SLIDE 67

Examples

  • "The Balance of Roger de Piles" (Studdert-Kennedy and Davenport, 1974)
  • Roger de Piles rated 57 paintings along different dimensions.
  • These authors cluster them using different methods, including hierarchical clustering.
  • They discuss the different clusters. (They are art critics.)

SLIDE 68

Good: They are cautious. "The value of this analysis...will depend on any interesting speculation it may provoke."

SLIDE 75

Examples

  • "Similarity Grouping of Australian Universities" (Stanley and Reynolds, 1994)
  • Use hierarchical clustering on Australian universities.
  • Use features such as
  • # of staff in different departments
  • entry scores
  • funding
  • evaluations


SLIDE 78

  • Split values: They notice that there is no kink and conclude that there is no cluster structure in Australian universities.
  • Good: Cautious interpretation of clustering, and analysis of clusterings based on multiple subsets of the features.
  • Bad: Their conclusion (that we can't cluster Australian universities) ignores all the algorithmic choices that were made.

SLIDE 82

Examples

  • "Comovement of International Equity Markets: A Taxonomic Approach" (Panton et al., 1976)
  • Data: weekly rates of return for stocks in twelve countries
  • Run agglomerative clustering year by year
  • Interpret the structure and examine stability over different time periods

SLIDE 83

Examples

Good: Cautious. "This study is only descriptive...A logical subsequent research area is to explain observed structural properties and the causes of structural change."