Lecture 22: Clustering, Distance Measures, K-Means. Aykut Erdem (PowerPoint PPT Presentation)



SLIDE 1

Lecture 22:

−Clustering −Distance measures −K-Means

Aykut Erdem

December 2016 Hacettepe University

SLIDE 2

Last time… Boosting

  • Idea: given a weak learner, run it multiple times on (reweighted) training data, then let the learned classifiers vote
  • On each iteration t:
  • weight each training example by how incorrectly it was classified
  • learn a hypothesis ht
  • and a strength for this hypothesis αt
  • Final classifier: a linear combination of the votes of the different classifiers, weighted by their strengths

  • Practically useful
  • Theoretically interesting

2

slide by Aarti Singh & Barnabas Poczos

SLIDE 3

Last time.. The AdaBoost Algorithm

3

slide by Jiri Matas and Jan Šochman

SLIDE 4

This week

  • Distance measures
  • K-Means
  • Spectral clustering
  • Hierarchical clustering
  • What is a good clustering?

4

SLIDE 5

Distance measures

5

SLIDE 6

Distance measures

  • In studying clustering techniques we will assume that we are given an m × m matrix of distances d(x_i, x_j) between all pairs of data points.

6

slide by Julia Hockenmaier

SLIDE 7

What is Similarity/Dissimilarity?

  • Hard to define, but we know it when we see it! The real meaning of similarity is a philosophical question; we will take a more pragmatic approach.
  • It depends on the representation and the algorithm. For many representations/algorithms, it is easier to think in terms of a distance (rather than a similarity) between vectors.

7

slide by Eric Xing
SLIDE 8

Defining Distance Measures

8

[Figure: two objects described by gene expression features gene1 and gene2]

  • Definition: Let O1 and O2 be two objects from the universe of possible objects. The distance (dissimilarity) between O1 and O2 is a real number, denoted by D(O1, O2).

slide by Andrew Moore

SLIDE 9

A few examples:

  • Euclidean distance: d(x, y) = ( Σ_i (x_i − y_i)² )^(1/2)
  • Correlation coefficient: s(x, y) = Σ_i (x_i − x̄)(y_i − ȳ) / (σ_x σ_y)
  • a similarity rather than a distance
  • can detect similar trends

9

slide by Andrew Moore
SLIDE 10

What properties should a distance measure have?

  • Symmetric
  • D(A,B) = D(B,A)
  • Otherwise, we could say A looks like B but B does not look like A
  • Positivity, and self-similarity
  • D(A,B) ≥ 0, and D(A,B) = 0 iff A = B
  • Otherwise there would be different objects that we cannot tell apart
  • Triangle inequality
  • D(A,B) + D(B,C) ≥ D(A,C)
  • Otherwise one could say "A is like B, B is like C, but A is not like C at all"

10

slide by Alan Fern
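These three requirements can be sanity-checked numerically. Below is a minimal sketch (the random point set and the float tolerance are illustrative assumptions, not from the slides) that checks all three properties for the Euclidean distance:

```python
import itertools
import math
import random

def d(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.dist(a, b)

rng = random.Random(0)
pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(15)]

for a, b, c in itertools.product(pts, repeat=3):
    assert d(a, b) == d(b, a)                           # symmetry
    assert d(a, b) >= 0 and (d(a, b) == 0) == (a == b)  # positivity, self-similarity
    assert d(a, b) + d(b, c) >= d(a, c) - 1e-12         # triangle inequality (float tolerance)
print("all three metric axioms hold on the sample")
```

The small tolerance in the triangle check is only there because floating-point rounding can violate the inequality by a few ulps even for a true metric.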

SLIDE 11

Distance measures

  • Euclidean (L2) distance: d(x, y) = ( Σ_{i=1}^{d} (x_i − y_i)² )^(1/2)
  • Manhattan (L1) distance: d(x, y) = ‖x − y‖₁ = Σ_{i=1}^{d} |x_i − y_i|
  • Infinity (Sup) distance (L∞): d(x, y) = max_{1≤i≤d} |x_i − y_i|
  • Note that L∞ ≤ L2 ≤ L1, but different distances do not induce the same ordering on points.

11

slide by Julia Hockenmaier
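Each of the three distances is a one-liner. A minimal sketch (the example point pairs are illustrative), including a pair of point pairs that the sup and Euclidean distances order differently:

```python
import math

def euclidean(x, y):
    # L2: square root of the summed squared coordinate differences
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def manhattan(x, y):
    # L1: sum of the absolute coordinate differences
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def sup(x, y):
    # L-infinity: the largest absolute coordinate difference
    return max(abs(xi - yi) for xi, yi in zip(x, y))

# A pair whose coordinate differences are (5, 0.1) versus one with (4, 4):
a, b = (0.0, 0.0), (5.0, 0.1)
c, d = (0.0, 0.0), (4.0, 4.0)
print(sup(a, b) > sup(c, d))              # True: 5 > 4
print(euclidean(a, b) < euclidean(c, d))  # True: ~5.001 < ~5.657
```

So (a, b) is the farther pair under L∞ but the closer pair under L2, which is exactly the "different orderings" caveat above.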

SLIDE 12

Distance measures

12

x = (x1, x2), y = (x1 − 2, x2 + 4)

Euclidean: (4² + 2²)^(1/2) = 4.47
Manhattan: 4 + 2 = 6
Sup: max(4, 2) = 4

slide by Julia Hockenmaier

SLIDE 13

Distance measures

  • Different distances do not induce the same ordering on points:

L∞(a, b) = 5,  L2(a, b) = (5² + ε²)^(1/2) ≈ 5
L∞(c, d) = 4,  L2(c, d) = (4² + 4²)^(1/2) = 5.66

So L2(c, d) > L2(a, b), but L∞(c, d) < L∞(a, b).

13

slide by Julia Hockenmaier

SLIDE 14

Distance measures

  • Clustering is sensitive to the distance measure.
  • Sometimes it is beneficial to use a distance measure that is invariant to transformations that are natural to the problem:
  • Mahalanobis distance: ✓ shift and scale invariance

14

slide by Julia Hockenmaier

SLIDE 15

Mahalanobis Distance

15

Σ is a (symmetric) covariance matrix. Using Σ⁻¹ in the distance effectively translates all the axes to mean = 0 and variance = 1 (shift and scale invariance).

d(x, y) = ( (x − y)^T Σ⁻¹ (x − y) )^(1/2)

μ = (1/m) Σ_{i=1}^{m} x_i  (the average of the data)

Σ = (1/m) Σ_{i=1}^{m} (x_i − μ)(x_i − μ)^T,  a matrix of size d × d, where d is the data dimensionality

slide by Julia Hockenmaier
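A minimal 2-D sketch of the formulas above in pure Python (the 2×2 inverse is hand-coded for illustration; real code would use a numerical library and guard against singular Σ):

```python
import math

def mean(points):
    """Coordinate-wise average of a list of 2-D points."""
    m = len(points)
    return [sum(p[0] for p in points) / m, sum(p[1] for p in points) / m]

def covariance(points):
    """2x2 covariance matrix, normalized by m as on the slide."""
    mu = mean(points)
    m = len(points)
    c = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        dx, dy = p[0] - mu[0], p[1] - mu[1]
        c[0][0] += dx * dx / m
        c[0][1] += dx * dy / m
        c[1][0] += dy * dx / m
        c[1][1] += dy * dy / m
    return c

def inv2x2(c):
    """Inverse of a 2x2 matrix (assumes a nonzero determinant)."""
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    return [[ c[1][1] / det, -c[0][1] / det],
            [-c[1][0] / det,  c[0][0] / det]]

def mahalanobis(x, y, cov_inv):
    """sqrt((x - y)^T Sigma^{-1} (x - y)) for 2-D points."""
    dx, dy = x[0] - y[0], x[1] - y[1]
    q = (dx * cov_inv[0][0] + dy * cov_inv[1][0]) * dx \
      + (dx * cov_inv[0][1] + dy * cov_inv[1][1]) * dy
    return math.sqrt(q)

# Scale invariance: if axis 1 has variance 4 and axis 2 has variance 1,
# a step of 2 along axis 1 and a step of 1 along axis 2 are equally far.
ci = inv2x2([[4.0, 0.0], [0.0, 1.0]])
print(mahalanobis((0, 0), (2, 0), ci))  # 1.0
print(mahalanobis((0, 0), (0, 1), ci))  # 1.0
```

With Σ = I the Mahalanobis distance reduces to the ordinary Euclidean distance, which is a useful sanity check.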

SLIDE 16

Distance measures

  • Some algorithms require distances between a point x and a set of points A: d(x, A).
    This might be defined e.g. as the min/max/avg distance between x and any point in A.
  • Others require distances between two sets of points A, B: d(A, B).
    This might be defined e.g. as the min/max/avg distance between any point in A and any point in B.

16

slide by Julia Hockenmeier

SLIDE 17

Clustering algorithms

  • Partitioning algorithms
  • Construct various partitions and then evaluate them by some criterion
  • K-means
  • Mixture of Gaussians
  • Spectral Clustering
  • Hierarchical algorithms
  • Create a hierarchical decomposition of the set of objects using some criterion
  • Bottom-up – agglomerative
  • Top-down – divisive

17
  • slide by Eric Xing
SLIDE 18

Desirable Properties of a Clustering Algorithm

  • Scalability (in terms of both time and space)
  • Ability to deal with different data types
  • Minimal requirements for domain knowledge to determine input parameters

  • Ability to deal with noisy data
  • Interpretability and usability
  • Optional
  • Incorporation of user-specified constraints

18

slide by Andrew Moore

SLIDE 19

K-Means

19

SLIDE 20

K-Means

  • An iterative clustering algorithm
  • Initialize: pick K random points as cluster centers (means)
  • Alternate:
  • assign each data instance to its closest mean
  • move each mean to the average of its assigned points
  • Stop when no points' assignments change

20

slide by David Sontag
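The loop on this slide can be sketched in a few lines of plain Python (this is Lloyd's algorithm; the toy data set, the seed, and the keep-old-mean rule for empty clusters are illustrative assumptions, not from the slides):

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed=0):
    """Alternate assignment and mean-update steps until assignments stop changing."""
    rng = random.Random(seed)
    means = rng.sample(points, k)       # initialize: K random points as centers
    assign = None
    while True:
        # assignment step: each point goes to its closest mean
        new_assign = [min(range(k), key=lambda j: dist2(p, means[j]))
                      for p in points]
        if new_assign == assign:        # stop when no assignment changes
            return means, assign
        assign = new_assign
        # update step: each mean moves to the average of its assigned points
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:                 # keep the old mean if a cluster empties
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))

# Two well-separated blobs; K = 2 recovers them.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, assign = kmeans(pts, 2)
print(sorted(means))
```

The two steps of the loop are exactly the two per-iteration costs quoted later in the deck: assignment is O(KN) and the mean update is O(N).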

SLIDE 21

21

slide by David Sontag

SLIDE 22

K-Means Clustering: Example

  • Pick K random points as cluster centers (means)

Shown here for K = 2

22

slide by David Sontag

SLIDE 23

Iterative Step 1

  • Assign data points to the closest cluster centers

23

K-Means Clustering: Example

slide by David Sontag

SLIDE 24

Iterative Step 2

  • Change each cluster center to the average of its assigned points

24

K-Means Clustering: Example

slide by David Sontag

SLIDE 25
  • Repeat until convergence

25

K-Means Clustering: Example

slide by David Sontag

SLIDE 26

26

K-Means Clustering: Example

slide by David Sontag

SLIDE 27

27

K-Means Clustering: Example

slide by David Sontag

SLIDE 28

Properties of K-Means Algorithms

  • Guaranteed to converge in a finite number of iterations
  • Running time per iteration:
  • 1. Assign data points to the closest cluster center: O(KN) time
  • 2. Change each cluster center to the average of its assigned points: O(N) time

28

slide by David Sontag

SLIDE 29

K-Means Convergence

Objective: minimize over μ and C the sum Σ_{i=1}^{m} ‖x_i − μ_{C(i)}‖²

  • 1. Fix μ, optimize C: assign each point to its closest center
  • 2. Fix C, optimize μ: take the partial derivative with respect to μ_i and set it to zero; the solution is the mean of the points assigned to cluster i

K-Means takes an alternating optimization approach. Each step is guaranteed to decrease the objective, so the algorithm is guaranteed to converge.

29

slide by Alan Fern
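The mean-update step can be written out explicitly. A sketch of the standard derivation (here C(i) denotes the cluster that point x_i is assigned to):

```latex
J(C,\mu) = \sum_{i=1}^{m} \lVert x_i - \mu_{C(i)} \rVert^2,
\qquad
\frac{\partial J}{\partial \mu_j}
  = -2 \sum_{i:\,C(i)=j} \left( x_i - \mu_j \right) = 0
\;\Longrightarrow\;
\mu_j = \frac{1}{\lvert \{ i : C(i)=j \} \rvert} \sum_{i:\,C(i)=j} x_i
```

That is, for a fixed assignment the optimal center of each cluster is exactly the average of its assigned points, which is what step 2 of the algorithm computes.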

SLIDE 30

Demo time…

30

SLIDE 31

K-Means: Example Applications

31

SLIDE 32

Example: K-Means for Segmentation

32

[Images: the original image and segmentation results for K = 2, K = 3, and K = 10]

The goal of segmentation is to partition an image into regions, each of which has reasonably homogeneous visual appearance.

slide by David Sontag

SLIDE 33

Example: K-Means for Segmentation

33

[Images: the original image and segmentation results for K = 2, K = 3, and K = 10]

slide by David Sontag

SLIDE 34

Example: K-Means for Segmentation

34

[Images: the original image and segmentation results for K = 2, K = 3, and K = 10]

slide by David Sontag

SLIDE 35

Example: Vector quantization

35

FIGURE 14.9. Sir Ronald A. Fisher (1890−1962) was one of the founders of modern-day statistics, to whom we owe maximum likelihood, sufficiency, and many other fundamental concepts. The image on the left is a 1024 × 1024 grayscale image at 8 bits per pixel. The center image is the result of 2 × 2 block VQ, using 200 code vectors, with a compression rate of 1.9 bits/pixel. The right image uses only four code vectors, with a compression rate of 0.50 bits/pixel.

[Figure from Hastie et al. book]

slide by David Sontag
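In code, the encoding step of block VQ is just nearest-neighbor search in a codebook. A toy sketch (the 4×4 "image" and the 2-entry codebook are made up for illustration; in real VQ, as in the figure, the codebook is learned by running K-means on the training blocks):

```python
def blocks(img, size=2):
    """Yield the pixels of each size x size block, flattened row-major."""
    for r in range(0, len(img), size):
        for c in range(0, len(img[0]), size):
            yield [img[r + i][c + j] for i in range(size) for j in range(size)]

def quantize(block, codebook):
    """Index of the nearest code vector in the squared-error sense."""
    return min(range(len(codebook)),
               key=lambda k: sum((b - v) ** 2 for b, v in zip(block, codebook[k])))

img = [[0, 0, 250, 255],
       [5, 0, 245, 255],
       [0, 5, 255, 250],
       [0, 0, 250, 250]]
codebook = [[0, 0, 0, 0], [255, 255, 255, 255]]   # "dark" and "bright" 2x2 blocks
codes = [quantize(b, codebook) for b in blocks(img)]
print(codes)  # [0, 1, 0, 1]: one codebook index per 2x2 block
```

Storing one index per block instead of four pixels is where the compression comes from: a 2-entry codebook needs 1 bit per block, i.e. 0.25 bits/pixel plus the codebook itself.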

SLIDE 36

Example: Simple Linear Iterative Clustering (SLIC) superpixels

36

  • R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, "SLIC Superpixels Compared to State-of-the-art Superpixel Methods," IEEE T-PAMI, 2012

λ: spatial regularization parameter

SLIDE 37

Bag of Words model

37

aardvark 0
about 2
all 2
Africa 1
apple
anxious
...
gas 1
...
oil 1
…
Zaire

slide by Carlos Guestrin

SLIDE 38

38

slide by Fei Fei Li

SLIDE 39

39

Object Bag of ‘words’

slide by Fei Fei Li

SLIDE 40

Interest Point Features

40

Normalize patch

Detect patches

[Mikolajczyk and Schmid ’02] [Matas et al. ’02] [Sivic et al. ’03]

Compute SIFT descriptor

[Lowe’99]

slide by Josef Sivic

SLIDE 41

Patch Features

41

slide by Josef Sivic

SLIDE 42

Dictionary Formation

42

slide by Josef Sivic

SLIDE 43

Clustering (usually K-means)

43

Vector quantization

slide by Josef Sivic

SLIDE 44

Clustered Image Patches

44

slide by Fei Fei Li

SLIDE 45

Visual synonyms and polysemy

45

Visual Polysemy: a single visual word occurring on different (but locally similar) parts of different object categories.
Visual Synonyms: two different visual words representing a similar part of an object (the wheel of a motorbike).

slide by Andrew Zisserman

SLIDE 46

Image Representation

46

[Bar chart: frequency of each codeword in the image]

slide by Fei Fei Li
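The representation on this slide is just a histogram over codewords. A minimal sketch (the 2-D "descriptors" and the 3-word codebook are illustrative stand-ins for SIFT descriptors and a K-means vocabulary):

```python
def nearest(desc, codebook):
    """Index of the codeword closest to a descriptor (squared Euclidean)."""
    return min(range(len(codebook)),
               key=lambda k: sum((d - c) ** 2 for d, c in zip(desc, codebook[k])))

def bow_histogram(descriptors, codebook):
    """Count how often each codeword is the nearest one to a descriptor."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1
    return hist

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descs = [(0.1, 0.0), (0.9, 0.1), (0.0, 0.8), (0.1, 0.9), (0.05, 0.05)]
print(bow_histogram(descs, codebook))  # [2, 1, 2]
```

The resulting fixed-length vector can then be fed to any standard classifier, regardless of how many patches the image produced.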

SLIDE 47

K-Means Clustering: Some Issues

  • How to set k?
  • Sensitive to initial centers
  • Sensitive to outliers
  • Detects spherical clusters
  • Assuming means can be computed

47

slide by Kristen Grauman