Lecture 21: Unsupervised Learning and Clustering Algorithms

Dr. Chengjiang Long
Computer Vision Researcher at Kitware Inc.; Adjunct Professor at RPI. Email: longc3@rpi.edu

C. Long, Lecture 21, April 20, 2018



Recap Previous Lecture


Outline

  • Introduce Unsupervised Learning and Clustering
  • K-means Algorithm
  • Hierarchical Clustering
  • Applications

Unsupervised learning and clustering

In supervised learning, all data is labeled and the algorithms learn to predict the output from the input data. In unsupervised learning, all data is unlabeled and the algorithms learn the inherent structure from the input data.


Unsupervised learning and clustering

Goal: to model the underlying structure or distribution in the input data.


What is clustering?


What is clustering for?

E.g. 1: group people of similar size together to make S, M, L T-shirts. E.g. 2: segment customers for targeted marketing. E.g. 3: organize documents to produce a topic hierarchy.



Clustering evaluation

Clustering is hard to evaluate. In most applications, expert judgement is still key.


Data Clustering - Formal Definition

  • Given a set of N unlabeled examples D = {x_1, x_2, ..., x_N} in a d-dimensional feature space, D is partitioned into a number of disjoint subsets D_j:

    D = ∪_{j=1}^{K} D_j,   D_i ∩ D_j = ∅ for i ≠ j

  • A partition is denoted by π(D) = {D_1, ..., D_K}, and the problem of data clustering is thus formulated as

    π*(D) = argmin_{π(D)} f(π(D))

    where f(·) is formulated according to a given criterion.
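The criterion f(·) is left abstract above. One common concrete choice (an assumption here, not fixed by the slides) is the within-cluster sum of squares, which this minimal Python sketch computes for a given partition:

```python
def wcss(partition):
    """Within-cluster sum of squares: total squared distance from each
    point to the mean of its own cluster (one possible criterion f)."""
    total = 0.0
    for cluster in partition:
        d = len(cluster[0])  # feature dimension
        mean = [sum(p[k] for p in cluster) / len(cluster) for k in range(d)]
        total += sum(sum((p[k] - mean[k]) ** 2 for k in range(d)) for p in cluster)
    return total

# Two 1-D clusters: {1, 3} (mean 2) and {10} (mean 10)
print(wcss([[(1.0,), (3.0,)], [(10.0,)]]))  # -> 2.0
```

Under this criterion, a good partition is one whose clusters are tight around their means.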


Outline

  • Introduce Unsupervised Learning and Clustering
  • K-means Algorithm
  • Hierarchical Clustering
  • Applications

K-means


K-means: an example

(figure slides: points are assigned to the nearest center and the centers are recomputed over the 1st, 2nd, and 3rd iterations; when no assignments change, the algorithm is done)

K-means

Iterate:
  • Assign/cluster each example to the closest center
  • Recalculate centers as the mean of the points in a cluster

How do we do this? Iterate over each point:
  • get the distance to each cluster center
  • assign to the closest center (hard clustering)
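The two-step loop above can be sketched end to end in plain Python (a minimal illustration; the function and variable names are my own, not from the lecture):

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain k-means: assign each point to its nearest center,
    then recompute each center as the mean of its assigned points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # initial seeds: k random points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: hard-cluster each point to the closest center
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        # Update step: each center becomes the mean of its cluster
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[j]
               for j, cl in enumerate(clusters)]
        if new == centers:                   # no change: done
            break
        centers = new
    return centers, clusters
```

On two well-separated groups, the centers converge to the group means after a few iterations.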

K-means

What distance measure should we use? A common choice is Euclidean distance, which is good for spatial data.


Euclidean Distance

  • dist(p, q) = sqrt( Σ_{k=1}^{n} (p_k − q_k)² )

    where n is the number of dimensions (attributes) and p_k and q_k are, respectively, the k-th attributes (components) of data objects p and q.

  • Standardization is necessary if scales differ.
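Both the formula and the standardization note translate directly to Python; the z-score standardization shown is one common choice, assumed here rather than specified by the slides:

```python
import math

def euclidean(p, q):
    # square root of the sum of squared attribute differences
    return math.sqrt(sum((pk - qk) ** 2 for pk, qk in zip(p, q)))

def zscore(column):
    # standardize one attribute so that differing scales become comparable
    mu = sum(column) / len(column)
    sd = math.sqrt(sum((v - mu) ** 2 for v in column) / len(column))
    return [(v - mu) / sd for v in column]

print(euclidean((0, 0), (3, 4)))  # -> 5.0
```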

Minkowski Distance

  • Minkowski Distance is a generalization of Euclidean Distance:

    dist(p, q) = ( Σ_{k=1}^{n} |p_k − q_k|^r )^{1/r}

    where r is a parameter, n is the number of dimensions (attributes) and p_k and q_k are, respectively, the k-th attributes (components) of data objects p and q.
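A short sketch (my own naming) showing that r = 1 and r = 2 recover the Manhattan and Euclidean distances:

```python
def minkowski(p, q, r):
    # (sum of |p_k - q_k|^r) raised to the power 1/r
    return sum(abs(pk - qk) ** r for pk, qk in zip(p, q)) ** (1 / r)

print(minkowski((0, 0), (3, 4), 1))  # -> 7.0 (Manhattan)
print(minkowski((0, 0), (3, 4), 2))  # -> 5.0 (Euclidean)
```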




Manhattan Distance

  • Manhattan distance represents distance measured along directions parallel to the x and y axes.
  • The Manhattan distance between two n-dimensional vectors x = (x_1, x_2, …, x_n) and y = (y_1, y_2, …, y_n) is:

    d_M(x, y) = |x_1 − y_1| + |x_2 − y_2| + … + |x_n − y_n| = Σ_{i=1}^{n} |x_i − y_i|

    where |x_i − y_i| denotes the absolute value of the difference between x_i and y_i.



K-means

Iterate:
  • Assign/cluster each example to the closest center
  • Recalculate centers as the mean of the points in a cluster

Where are the cluster centers, and how do we calculate them? Each center is simply the mean (centroid) of the points currently assigned to its cluster.

Pros and cons of K-means

Weaknesses:
  • The user needs to specify the value of K.
  • Applicable only when a mean is defined.
  • The algorithm is sensitive to the initial seeds.
  • The algorithm is sensitive to outliers.
      • Outliers are data points that are very far away from other data points.
      • Outliers could be errors in the data recording, or special data points with very different values.
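The outlier sensitivity is easy to see numerically, since each recomputed center is a mean (a toy 1-D illustration, not from the slides):

```python
def mean(xs):
    return sum(xs) / len(xs)

cluster = [1.0, 2.0, 3.0]
print(mean(cluster))             # -> 2.0
print(mean(cluster + [100.0]))   # -> 26.5: one outlier drags the center far away
```

A single extreme value moves the center far from where the bulk of the points lie, which in turn corrupts the next round of assignments.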


Failure case


Sensitive to initial seeds


Sensitive to outliers



Application to visual object recognition: Bag of Words

Vector-quantize descriptors from a set of training images using k-means. Image representation: a normalized histogram of visual words.
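A minimal sketch of the quantization step, assuming the codebook centers come from k-means over training descriptors (function and variable names are illustrative, not from the lecture):

```python
import math

def bow_histogram(descriptors, codebook):
    """Quantize each descriptor to its nearest visual word (a codebook
    center, e.g. from k-means) and return the normalized word histogram."""
    hist = [0] * len(codebook)
    for d in descriptors:
        j = min(range(len(codebook)), key=lambda c: math.dist(d, codebook[c]))
        hist[j] += 1
    total = sum(hist)
    return [h / total for h in hist]

codebook = [(0.0, 0.0), (10.0, 10.0)]            # two "visual words"
descs = [(0.1, 0.2), (9.8, 10.1), (10.2, 9.9)]   # toy image descriptors
print(bow_histogram(descs, codebook))            # -> [0.333..., 0.666...]
```

The resulting fixed-length histogram can then be fed to any standard classifier, regardless of how many descriptors the image produced.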


(figure: image patches assigned to the same visual word)


Summary


Outline

  • Introduce Unsupervised Learning and Clustering
  • K-means Algorithm
  • Hierarchical Clustering
  • Applications

Hierarchical Clustering

  • Up to now, we have considered "flat" clustering
  • For some data, hierarchical clustering is more appropriate than "flat" clustering

Hierarchical Clustering: Biological Taxonomy


Hierarchical Clustering: Dendrogram

  • The preferred way to represent a hierarchical clustering is a dendrogram:
      • Binary tree
      • Level k corresponds to the partition with n − k + 1 clusters
      • If k clusters are needed, take the clustering from level n − k + 1
      • If samples are in the same cluster at level k, they stay in the same cluster at all higher levels
      • A dendrogram typically shows the similarity of the grouped clusters


Hierarchical Clustering: Venn Diagram

  • A Venn diagram can also show a hierarchical clustering, but similarity is not represented quantitatively


Hierarchical Clustering

  • Algorithms for hierarchical clustering can be divided into two types:

  1. Agglomerative (bottom-up) procedures
      • Start with n singleton clusters
      • Form the hierarchy by merging the most similar clusters

  2. Divisive (top-down) procedures
      • Start with all samples in one cluster
      • Form the hierarchy by splitting the "worst" clusters


Divisive Hierarchical Clustering

  • Any "flat" algorithm that produces a fixed number of clusters can be used: set c = 2 and split one cluster in two at each step


Agglomerative Hierarchical Clustering

  • Initialize with each example in a singleton cluster
  • While there is more than 1 cluster:
      1. find the 2 nearest clusters
      2. merge them
  • Four common ways to measure cluster distance
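The four measures can be sketched in minimal Python (my own naming; "average" here means the mean of all pairwise distances, while "mean" means the distance between the cluster centroids):

```python
import math

def dmin(A, B):
    # single linkage: distance between the closest pair of points
    return min(math.dist(a, b) for a in A for b in B)

def dmax(A, B):
    # complete linkage: distance between the farthest pair of points
    return max(math.dist(a, b) for a in A for b in B)

def davg(A, B):
    # average of all pairwise distances
    return sum(math.dist(a, b) for a in A for b in B) / (len(A) * len(B))

def dmean(A, B):
    # distance between the two cluster means (cheaper: one distance call)
    ma = tuple(sum(c) / len(A) for c in zip(*A))
    mb = tuple(sum(c) / len(B) for c in zip(*B))
    return math.dist(ma, mb)

A, B = [(0.0, 0.0), (0.0, 2.0)], [(4.0, 0.0), (4.0, 2.0)]
print(dmin(A, B), dmax(A, B), davg(A, B), dmean(A, B))
```

Note that dmean needs only one distance computation per cluster pair, while davg needs |A|·|B| of them, which is why the mean distance is the cheaper of the two.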

Single Linkage or Nearest Neighbor

  • Agglomerative clustering with the minimum distance
  • Generates a minimum spanning tree
  • Encourages growth of elongated clusters
  • Disadvantage: very sensitive to noise
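Nearest-neighbor agglomeration can be sketched directly from the definition (a naive, unoptimized illustration; the code is my own, not from the lecture):

```python
import math

def single_linkage(points, k):
    """Agglomerative clustering with the minimum (nearest-neighbor)
    distance: start from singletons, repeatedly merge the two clusters
    whose closest pair of points is nearest, until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)   # merge the two nearest clusters
    return clusters

pts = [(0.0,), (0.5,), (10.0,), (10.4,)]
print(single_linkage(pts, 2))
```

Stopping at k clusters gives a flat clustering; recording each merge and its distance instead would give the dendrogram.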

Complete Linkage or Farthest Neighbor

  • Agglomerative clustering with maximum distance
  • Encourages compact clusters
  • Does not work well if elongated clusters are present

Average and Mean Agglomerative Clustering

  • Agglomerative clustering is more robust under the average or the mean cluster distance
  • The mean distance is cheaper to compute than the average distance
  • Unfortunately, there is not much to say about agglomerative clustering theoretically, but it works reasonably well in practice


Agglomerative vs. Divisive

  • Agglomerative is faster to compute, in general
  • Divisive may be less "blind" to the global structure of the data

Outline

  • Introduce Unsupervised Learning and Clustering
  • K-means Algorithm
  • Hierarchical Clustering
  • Applications

Application of Clustering

  • John Snow, a London physician, plotted the locations of cholera deaths on a map during an outbreak in the 1850s.
  • The locations indicated that cases were clustered around certain intersections where there were polluted wells, thus exposing both the problem and the solution.

From: Nina Mishra, HP Labs


Application of Clustering

  • Astronomy
  • SkyCat: clustered 2 × 10^9 sky objects into stars, galaxies, quasars, etc., based on radiation emitted in different spectral bands.


Applications of Clustering

  • Image segmentation
  • Find interesting "objects" in images to focus attention on

Applications of Clustering

  • Image Database Organization for efficient search

Applications of Clustering

  • Data Mining
  • Technology watch
      • The Derwent Database contains all patents filed worldwide in the last 10 years
      • Searching by keywords leads to thousands of documents
      • Find clusters in the database to spot emerging technologies and what the competition is up to
  • Marketing
      • Customer database
      • Find clusters of customers and tailor marketing schemes to them


Applications of Clustering

  • Profiling Web Users
  • Use web access logs to generate a feature vector for each user
  • Cluster users based on their feature vectors
  • Identify common goals for users
      • Shopping
      • Job seekers
      • Product seekers
      • Tutorial seekers
  • Clustering results can be used to improve web content and design
