Lecture 22: Clustering, Distance Measures, K-Means. Aykut Erdem, May 2016, Hacettepe University.


  1. Lecture 22: Clustering, Distance Measures, K-Means. Aykut Erdem, May 2016, Hacettepe University

  2. Last time… Boosting • Idea: given a weak learner, run it multiple times on (reweighted) training data, then let the learned classifiers vote. • On each iteration t: - Weight each training example by how incorrectly it was classified. - Learn a hypothesis h_t. - Assign a strength α_t to this hypothesis. • Final classifier: a linear combination of the votes of the different classifiers, weighted by their strengths. • Practically useful. • Theoretically interesting. slide by Aarti Singh & Barnabas Poczos

  3. Last time… The AdaBoost Algorithm. slide by Jiri Matas and Jan Šochman

  4. This week • Clustering • Distance measures • K-Means • Spectral clustering • Hierarchical clustering • What is a good clustering?

  5. Distance measures

  6. Distance measures • In studying clustering techniques we will assume that we are given a matrix of distances between all pairs of data points: an m × m matrix whose (i, j) entry is d(x_i, x_j) for data points x_1, …, x_m. slide by Julia Hockenmaier

  7. What is Similarity/Dissimilarity? Hard to define, but we know it when we see it. • The real meaning of similarity is a philosophical question; we will take a more pragmatic approach. • It depends on the representation and the algorithm. For many representations/algorithms, it is easier to think in terms of a distance (rather than a similarity) between vectors. slide by Eric Xing

  8. Defining Distance Measures • Definition: Let O_1 and O_2 be two objects from the universe of possible objects. The distance (dissimilarity) between O_1 and O_2 is a real number denoted by D(O_1, O_2). (Figure: example distances between gene expression profiles gene1 and gene2.) slide by Andrew Moore

  9. A few examples • Euclidean distance: $d(x, y) = \sqrt{\sum_i (x_i - y_i)^2}$ • Correlation coefficient: $s(x, y) = \frac{\sum_i (x_i - \mu_x)(y_i - \mu_y)}{\sigma_x \sigma_y}$ - A similarity rather than a distance. - Can detect similar trends even when magnitudes differ. slide by Andrew Moore
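To make these concrete, here is a minimal Python sketch (our own code, assuming NumPy; not from the lecture) contrasting the Euclidean distance with the correlation coefficient used as a similarity:

```python
import numpy as np

def euclidean(x, y):
    # d(x, y) = sqrt(sum_i (x_i - y_i)^2)
    return np.sqrt(np.sum((x - y) ** 2))

def correlation_similarity(x, y):
    # Pearson correlation: compares trends, not magnitudes;
    # a similarity in [-1, 1], not a distance.
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / (np.sqrt(np.sum(xc**2)) * np.sqrt(np.sum(yc**2)))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 10 * x + 5                       # same trend, very different magnitude
print(euclidean(x, y))               # large distance
print(correlation_similarity(x, y))  # 1.0: perfectly correlated
```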

  10. What properties should a distance measure have? • Symmetry: D(A, B) = D(B, A). - Otherwise, we could say A looks like B but B does not look like A. • Positivity and self-similarity: D(A, B) ≥ 0, and D(A, B) = 0 iff A = B. - Otherwise there will be different objects that we cannot tell apart. • Triangle inequality: D(A, B) + D(B, C) ≥ D(A, C). - Otherwise one could say "A is like B, B is like C, but A is not like C at all". slide by Alan Fern
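These axioms can also be spot-checked empirically on random points; a small sketch (a hypothetical helper of ours, assuming NumPy):

```python
import numpy as np

def check_metric_axioms(dist, n_trials=1000, dim=5, seed=0):
    """Empirically test symmetry, positivity, self-similarity, and the
    triangle inequality for a distance function on random points."""
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        a, b, c = rng.normal(size=(3, dim))
        assert np.isclose(dist(a, b), dist(b, a))             # symmetry
        assert dist(a, b) >= 0                                # positivity
        assert np.isclose(dist(a, a), 0.0)                    # self-similarity
        assert dist(a, b) + dist(b, c) >= dist(a, c) - 1e-12  # triangle
    return True

euclidean = lambda x, y: np.linalg.norm(x - y)
print(check_metric_axioms(euclidean))  # True: L2 satisfies all axioms
```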

  11. Distance measures • Euclidean (L2): $d(x, y) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$ • Manhattan (L1): $d(x, y) = \lVert x - y \rVert_1 = \sum_{i=1}^{d} |x_i - y_i|$ • Infinity (sup) distance (L∞): $d(x, y) = \max_{1 \le i \le d} |x_i - y_i|$ • Note that $d_{L_\infty}(x, y) \le d_{L_2}(x, y) \le d_{L_1}(x, y)$ for any fixed pair of points, but different distances do not induce the same ordering on points. slide by Julia Hockenmaier
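A quick sketch of the three distances (our own helper, assuming NumPy); the asserted inequality holds for any fixed pair of points:

```python
import numpy as np

def lp_distances(x, y):
    diff = np.abs(x - y)
    return {"L1": diff.sum(),               # Manhattan
            "L2": np.sqrt((diff**2).sum()), # Euclidean
            "Linf": diff.max()}             # sup distance

x = np.array([0.0, 0.0])
y = np.array([2.0, -4.0])
d = lp_distances(x, y)
print(d)  # L1 = 6.0, L2 ~ 4.47, Linf = 4.0
assert d["Linf"] <= d["L2"] <= d["L1"]  # always holds for the same pair
```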

  12. Distance measures • Example: x = (x_1, x_2), y = (x_1 - 2, x_2 + 4). - Euclidean: $(4^2 + 2^2)^{1/2} \approx 4.47$ - Manhattan: $4 + 2 = 6$ - Sup: $\max(4, 2) = 4$ slide by Julia Hockenmaier

  13. Distance measures • Different distances do not induce the same ordering on points: - $L_\infty(a, b) = 5$ and $L_2(a, b) = (5^2 + \varepsilon^2)^{1/2}$, just over 5 - $L_\infty(c, d) = 4$ and $L_2(c, d) = (4^2 + 4^2)^{1/2} = 4\sqrt{2} \approx 5.66$ - Hence $L_\infty(c, d) < L_\infty(a, b)$ but $L_2(c, d) > L_2(a, b)$. slide by Julia Hockenmaier
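The reversal is easy to reproduce numerically (a sketch; the concrete coordinates for a, b, c, d are our assumption, chosen so the pairs differ by (5, ε) and (4, 4)):

```python
import numpy as np

eps = 0.1
a, b = np.array([0.0, 0.0]), np.array([5.0, eps])  # differ by (5, eps)
c, d = np.array([0.0, 0.0]), np.array([4.0, 4.0])  # differ by (4, 4)

linf = lambda u, v: np.abs(u - v).max()
l2   = lambda u, v: np.linalg.norm(u - v)

print(linf(a, b), linf(c, d))  # 5.0 > 4.0 : (a, b) farther under L-infinity
print(l2(a, b), l2(c, d))      # ~5.00 < ~5.66 : (c, d) farther under L2
```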

  14. Distance measures • Clustering is sensitive to the distance measure. • Sometimes it is beneficial to use a distance measure that is invariant to transformations that are natural to the problem: - Mahalanobis distance: shift and scale invariance. slide by Julia Hockenmaier

  15. Mahalanobis Distance $d(x, y) = \sqrt{(x - y)^T \Sigma^{-1} (x - y)}$ where Σ is the (symmetric) covariance matrix of the data: $\mu = \frac{1}{m} \sum_{i=1}^{m} x_i$ (the average of the data), $\Sigma = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu)(x_i - \mu)^T$, a matrix of size d × d. This translates all the axes to mean 0 and variance 1 (shift and scale invariance). slide by Julia Hockenmaier
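A minimal sketch of the formula above (our own code, assuming NumPy and an invertible empirical covariance):

```python
import numpy as np

def mahalanobis(x, y, X):
    """Mahalanobis distance between x and y, with the covariance
    estimated from the data matrix X (rows are points)."""
    mu = X.mean(axis=0)
    sigma = (X - mu).T @ (X - mu) / len(X)  # d x d covariance
    sigma_inv = np.linalg.inv(sigma)        # assumes sigma is invertible
    diff = x - y
    return np.sqrt(diff @ sigma_inv @ diff)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) * np.array([1.0, 10.0])  # axis 2 has 10x the scale
x, y = X[0], X[1]
print(np.linalg.norm(x - y))  # Euclidean: dominated by the large-scale axis
print(mahalanobis(x, y, X))   # Mahalanobis: rescales each axis by its variance
```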

  16. Distance measures • Some algorithms require a distance between a point x and a set of points A, d(x, A). This might be defined, e.g., as the min/max/avg distance between x and any point in A. • Others require a distance between two sets of points A and B, d(A, B). This might be defined, e.g., as the min/max/avg distance between any point in A and any point in B. slide by Julia Hockenmaier
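Both notions reduce to aggregating pairwise distances; a sketch assuming SciPy (the min/max/avg choices correspond to the single/complete/average linkage used later in hierarchical clustering):

```python
import numpy as np
from scipy.spatial.distance import cdist

def set_distance(A, B, mode="min"):
    """Distance between point sets A and B (rows are points),
    reduced over all cross-pair Euclidean distances."""
    D = cdist(A, B)  # |A| x |B| pairwise distances
    return {"min": D.min(), "max": D.max(), "avg": D.mean()}[mode]

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[3.0, 0.0], [5.0, 0.0]])
print(set_distance(A, B, "min"))  # 2.0 (single linkage)
print(set_distance(A, B, "max"))  # 5.0 (complete linkage)
print(set_distance(A, B, "avg"))  # 3.5 (average linkage)
# d(x, A): pass x as a one-row array, e.g. set_distance(x[None, :], A, "min")
```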

  17. Clustering algorithms • Partitioning algorithms: construct various partitions and then evaluate them by some criterion. - K-means - Mixture of Gaussians - Spectral clustering • Hierarchical algorithms: create a hierarchical decomposition of the set of objects using some criterion. - Bottom-up: agglomerative - Top-down: divisive slide by Eric Xing

  18. Desirable Properties of a Clustering Algorithm • Scalability (in terms of both time and space) • Ability to deal with different data types • Minimal requirements for domain knowledge to determine input parameters • Ability to deal with noisy data • Interpretability and usability • Optional: incorporation of user-specified constraints. slide by Andrew Moore

  19. K-Means

  20. K-Means • An iterative clustering algorithm: - Initialize: pick K random points as cluster centers (means). - Alternate: (1) assign each data instance to the closest mean; (2) set each mean to the average of its assigned points. - Stop when no point's assignment changes. slide by David Sontag
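This loop translates almost line for line into NumPy; a sketch of Lloyd's algorithm (our own implementation, not code from the lecture):

```python
import numpy as np

def kmeans(X, k, seed=0, max_iter=100):
    """Lloyd's algorithm. X is an (n, d) array; returns (centers, assignments)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init: K random points
    assign = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest mean (O(KN) per iteration).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break  # no assignments changed: converged
        assign = new_assign
        # Update step: move each mean to the average of its assigned points (O(N)).
        for j in range(k):
            if np.any(assign == j):  # leave empty clusters in place
                centers[j] = X[assign == j].mean(axis=0)
    return centers, assign

# Two well-separated blobs; K = 2 should recover them.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(100, 2)), rng.normal(8, 1, size=(100, 2))])
centers, labels = kmeans(X, k=2)
print(np.round(centers, 1))  # two centers, near (0, 0) and (8, 8)
```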


  22. K-Means Clustering: Example • Pick K random points as cluster centers (means). Shown here for K = 2. slide by David Sontag

  23. K-Means Clustering: Example • Iterative Step 1: assign data points to the closest cluster centers. slide by David Sontag

  24. K-Means Clustering: Example • Iterative Step 2: change each cluster center to the average of its assigned points. slide by David Sontag

  25. K-Means Clustering: Example • Repeat until convergence. slide by David Sontag

  26. K-Means Clustering: Example (figure only) slide by David Sontag

  27. K-Means Clustering: Example (figure only) slide by David Sontag

  28. Properties of K-Means Algorithms • Guaranteed to converge in a finite number of iterations. • Running time per iteration: 1. Assigning data points to the closest cluster center: O(KN) time. 2. Changing each cluster center to the average of its assigned points: O(N) time. slide by David Sontag

  29. K-Means Convergence • Objective: the within-cluster sum of squared distances (see below). 1. Fix μ, optimize C: assign each point to its closest center. 2. Fix C, optimize μ: take the partial derivative with respect to μ_i and set it to zero; the optimum is the mean of the assigned points. • K-Means takes an alternating optimization approach; each step is guaranteed to decrease the objective, so the algorithm is guaranteed to converge. slide by Alan Fern
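The equations missing from the extracted slide are the standard ones; filling them in (our reconstruction of the usual derivation):

```latex
% K-means objective: within-cluster sum of squared distances
\min_{\mu, C}\; J(\mu, C) = \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2

% Step 2: fix C, optimize each mu_i
\frac{\partial J}{\partial \mu_i} = -2 \sum_{x \in C_i} (x - \mu_i) = 0
\;\Longrightarrow\;
\mu_i = \frac{1}{\lvert C_i \rvert} \sum_{x \in C_i} x
```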

  30. Demo time…

  31. Example: K-Means for Segmentation • The goal of segmentation is to partition an image into regions, each of which has a reasonably homogeneous visual appearance. (Figure: original image and segmentations for K = 2, K = 3, and K = 10.) slide by David Sontag

  32. Example: K-Means for Segmentation (Figure: original image and segmentations for K = 2, K = 3, and K = 10.) slide by David Sontag

  33. Example: K-Means for Segmentation (Figure: original image and segmentations for K = 2, K = 3, and K = 10.) slide by David Sontag
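One way to reproduce this kind of result, sketched with scikit-learn and Pillow (the lecture does not specify an implementation; `photo.png` is a placeholder path):

```python
import numpy as np
from sklearn.cluster import KMeans
from PIL import Image

img = np.asarray(Image.open("photo.png").convert("RGB"), dtype=float) / 255.0
h, w, _ = img.shape
pixels = img.reshape(-1, 3)  # each pixel is a point in RGB space

for k in (2, 3, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    # Replace every pixel by its cluster center: a K-color segmentation.
    seg = km.cluster_centers_[km.labels_].reshape(h, w, 3)
    Image.fromarray((seg * 255).astype(np.uint8)).save(f"seg_k{k}.png")
```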

  34. Example: Vector quantization • FIGURE 14.9. Sir Ronald A. Fisher (1890-1962) was one of the founders of modern-day statistics, to whom we owe maximum likelihood, sufficiency, and many other fundamental concepts. The image on the left is a 1024 × 1024 grayscale image at 8 bits per pixel. The center image is the result of 2 × 2 block VQ, using 200 code vectors, with a compression rate of 1.9 bits/pixel. The right image uses only four code vectors, with a compression rate of 0.50 bits/pixel. [Figure from Hastie et al. book] slide by David Sontag
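Block VQ is just K-means on image blocks; a sketch of the 2 × 2 case (our own code, assuming scikit-learn and Pillow; `fisher.png` is a placeholder path for a large grayscale image):

```python
import numpy as np
from sklearn.cluster import KMeans
from PIL import Image

img = np.asarray(Image.open("fisher.png").convert("L"), dtype=float)
h, w = img.shape
B = 2  # 2x2 blocks: each block is a 4-dimensional vector

blocks = (img[:h - h % B, :w - w % B]
          .reshape(h // B, B, w // B, B)
          .transpose(0, 2, 1, 3)
          .reshape(-1, B * B))

km = KMeans(n_clusters=200, n_init=4, random_state=0).fit(blocks)
coded = km.cluster_centers_[km.labels_]  # every block -> nearest code vector

recon = (coded.reshape(h // B, w // B, B, B)
              .transpose(0, 2, 1, 3)
              .reshape(h - h % B, w - w % B))
Image.fromarray(recon.clip(0, 255).astype(np.uint8)).save("vq_200.png")
# Rate: log2(200) bits per 4-pixel block, about 1.9 bits/pixel as in the caption.
```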

  35. Bag of Words model • A document represented as a vector of word counts: aardvark 0, about 2, all 2, Africa 1, apple 0, anxious 0, …, gas 1, …, oil 1, …, Zaire 0. slide by Carlos Guestrin
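A bag-of-words vector is just a word histogram over a fixed vocabulary; a minimal sketch (the toy vocabulary and document are our own):

```python
from collections import Counter

vocabulary = ["aardvark", "about", "all", "africa", "apple", "gas", "oil", "zaire"]
doc = "about all the gas and oil in africa , all about supply"

counts = Counter(doc.lower().split())
bow = [counts.get(word, 0) for word in vocabulary]
print(dict(zip(vocabulary, bow)))
# {'aardvark': 0, 'about': 2, 'all': 2, 'africa': 1, 'apple': 0,
#  'gas': 1, 'oil': 1, 'zaire': 0}
```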

  36. (Figure.) slide by Fei Fei Li

  37. Object Bag of 'words' slide by Fei Fei Li

  38. Interest Point Features • Detect patches [Mikolajczyk and Schmid '02] [Matas et al. '02] [Sivic et al. '03] • Normalize patch • Compute SIFT descriptor [Lowe '99] slide by Josef Sivic

  39. Patch Features … slide by Josef Sivic

  40. Dictionary Formation … slide by Josef Sivic

  41. Clustering (usually K-means) … Vector quantization slide by Josef Sivic
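Putting slides 38-41 together: cluster local descriptors to form a visual dictionary, then vector-quantize each image's descriptors into a histogram of visual words. A sketch with scikit-learn; the random arrays stand in for real SIFT descriptors, which the pipeline assumes:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-ins for 128-d SIFT descriptors pooled from many training images.
train_descriptors = rng.normal(size=(5000, 128))

# Dictionary formation: the K-means centers are the "visual words".
vocab_size = 200
km = KMeans(n_clusters=vocab_size, n_init=4, random_state=0).fit(train_descriptors)

def bag_of_visual_words(descriptors):
    """Vector-quantize one image's descriptors into a normalized histogram."""
    words = km.predict(descriptors)  # nearest visual word per patch
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    return hist / hist.sum()

image_descriptors = rng.normal(size=(300, 128))      # one image's local features
print(bag_of_visual_words(image_descriptors).shape)  # (200,)
```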

  42. Clustered Image Patches slide by Fei Fei Li
