BBM406 Fundamentals of Machine Learning
Lecture 21: Clustering - K-Means
Aykut Erdem // Hacettepe University // Fall 2019
Photo by Unsplash user @foodiesfeed
Last time: Boosting

Idea: given a weak learner, run it multiple times on (reweighted) training data, then let the learned classifiers vote, weighted by their strength.
[slide by Aarti Singh & Barnabas Poczos]

Last time: The AdaBoost Algorithm (a sketch follows below)
[slide by Jiri Matas and Jan Šochman]
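As a refresher, here is a minimal AdaBoost-style sketch of that idea: decision stumps trained on reweighted data, then combined by a strength-weighted vote. It assumes binary labels in {-1, +1}; the stump weak learner and the `n_rounds` parameter are illustrative choices, not part of the original slides.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_fit(X, y, n_rounds=50):
        """Train weak learners on reweighted data; return (learners, strengths)."""
        y = np.asarray(y)                # labels assumed to be in {-1, +1}
        n = len(y)
        w = np.full(n, 1.0 / n)          # start with uniform example weights
        learners, alphas = [], []
        for _ in range(n_rounds):
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=w)            # weak learner on reweighted data
            pred = stump.predict(X)
            err = np.sum(w[pred != y]) / np.sum(w)      # weighted training error
            if err >= 0.5:               # no better than chance: stop
                break
            alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))  # learner "strength"
            learners.append(stump)
            alphas.append(alpha)
            if err == 0.0:               # perfect learner: nothing left to reweight
                break
            w *= np.exp(-alpha * y * pred)               # upweight the mistakes
            w /= w.sum()
        return learners, alphas

    def adaboost_predict(X, learners, alphas):
        """Let the learned classifiers vote, weighted by their strength."""
        votes = sum(a * h.predict(X) for a, h in zip(alphas, learners))
        return np.sign(votes)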
Clustering

Motivating example: an archaeological dig. Each datum is an artifact location, recorded as Distance East and Distance North.
[figures: scatter plots of artifact locations; axes: Distance East, Distance North]
[slides by Tamara Broderick]
Predicting new labels from old labels

If the artifacts carry group labels (Family A, Family B, Family C), the label of a new artifact can be predicted from the labeled points near it, as in classification.
[figure: artifact locations colored by family; axes: Distance East, Distance North]
[slides by Tamara Broderick]
Why use clustering instead of classification?

Labels are often unavailable in practice: they may change too quickly, it may be expensive to label data, etc. And clustering real data takes care, even when the cartoon picture looks so easy.

Example: social networks [Krivitsky & Handcock 2008]
Datum: a person, represented as a binary vector specifying whether that person has each interest
Similarity: the number of interests shared by two people
[slides by Tamara Broderick]
Example: topic analysis [Blei 2003]
Datum: a word, represented as a binary vector indicating the documents in which it appears
Similarity: how many documents exist where two words co-occur
[slides by Tamara Broderick]
Example: document clustering [Carpineto et al. 2009]
Datum: a document, represented as a vector of topic proportions
Dissimilarity: distance between topic distributions
[slides by Tamara Broderick]
Example: image segmentation [Fei-Fei 2011]
Datum: a pixel, represented by its RGB values and its horizontal and vertical location
Dissimilarity: difference in color + difference in location
[slides by Tamara Broderick]
In practice, one generates candidate clusterings and then evaluates them by some criterion; choosing that criterion and the input parameters (such as the number of clusters) typically requires domain knowledge.
[slides by Tamara Broderick & Andrew Moore]
Datum: a vector of D continuous values

In the dig example each datum has D = 2 features, Distance East and Distance North; for instance, $x_3 = (1.5, 6.2)$. Collecting the N data points as rows gives a data matrix:

          Feature 1 (Distance East)   Feature 2 (Distance North)
    x1            1.2                          5.9
    x2            4.3                          2.1
    x3            1.5                          6.2
    ...
    xN            4.1                          2.3

In general, the n-th datum is $x_n = (x_{n,1}, \ldots, x_{n,D})$.
[slides by Tamara Broderick]
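As a concrete sketch, the data matrix can be stored as an N x D NumPy array (the values are just the example numbers from the table above):

    import numpy as np

    # N x D data matrix: one row per datum, one column per feature
    # (Feature 1 = Distance East, Feature 2 = Distance North)
    X = np.array([
        [1.2, 5.9],   # x1
        [4.3, 2.1],   # x2
        [1.5, 6.2],   # x3
        [4.1, 2.3],   # xN (N = 4 in this toy example)
    ])
    N, D = X.shape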
Dissimilarity: distance "as the crow flies"

How dissimilar are two points, say $x_3$ and $x_{17}$? A natural choice is the (squared) Euclidean distance. With D = 2 features,

    $\mathrm{dis}(x_3, x_{17}) = (x_{3,1} - x_{17,1})^2 + (x_{3,2} - x_{17,2})^2$

and in general, summing over each feature,

    $\mathrm{dis}(x_3, x_{17}) = \sum_{d=1}^{D} (x_{3,d} - x_{17,d})^2$

[slides by Tamara Broderick]
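A one-line sketch of this squared Euclidean distance; the function name `dis` follows the slides' notation, and vectorization over features is the only addition:

    import numpy as np

    def dis(x, y):
        """Squared Euclidean distance between two feature vectors."""
        x, y = np.asarray(x), np.asarray(y)
        return float(np.sum((x - y) ** 2))

    print(dis([1.5, 6.2], [4.1, 2.3]))  # compare x3 and xN from the toy matrix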
Cluster summary

$K$ = number of clusters
Cluster centers $\mu_1, \mu_2, \ldots, \mu_K$: each center lives in the same feature space as the data, e.g. $\mu_1 = (\mu_{1,1}, \mu_{1,2})$
$S_k$ = set of points in cluster $k$; the clusters are $S_1, S_2, \ldots, S_K$
[figure: data points grouped into three clusters with centers µ1, µ2, µ3 marked]
[slides by Tamara Broderick]
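In code, a clustering can be summarized by a K x D array of centers plus one cluster label per point (a minimal sketch; the variable names are ours, not from the slides):

    import numpy as np

    K, D, N = 3, 2, 100
    mu = np.zeros((K, D))             # cluster centers mu_1, ..., mu_K (rows)
    labels = np.zeros(N, dtype=int)   # labels[n] = k  means  x_n is in S_k

    # S_k can then be recovered as the points whose label is k:
    # S_k = X[labels == k]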
Dissimilarity (global)

    $\mathrm{dis}_{\mathrm{global}} = \sum_{k=1}^{K} \; \sum_{n:\, x_n \in S_k} \; \sum_{d=1}^{D} (x_{n,d} - \mu_{k,d})^2$

Reading the sums left to right: for each cluster, for each data point in the k-th cluster, for each feature, add the squared difference between the point and its cluster center.
[slides by Tamara Broderick]
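A direct translation of this objective into NumPy (a sketch; `X`, `mu`, and `labels` are as in the earlier snippets):

    import numpy as np

    def dis_global(X, mu, labels):
        """Sum of squared distances from each point to its cluster center."""
        total = 0.0
        for k in range(len(mu)):                 # for each cluster
            S_k = X[labels == k]                 # points in cluster k
            total += np.sum((S_k - mu[k]) ** 2)  # over points and features
        return total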
K-Means Algorithm

The algorithm alternates two assignment steps (a runnable sketch follows the list):

✦ Initialize the centers: for $k = 1, \ldots, K$, randomly draw $n$ from $1, \ldots, N$ without replacement and set $\mu_k \leftarrow x_n$.
✦ Repeat until there is no change in the assignments (equivalently, no change in $\mathrm{dis}_{\mathrm{global}}$):
  ❖ Assign each data point to the cluster with the closest center: for $n = 1, \ldots, N$, find the $k$ with smallest $\mathrm{dis}(x_n, \mu_k)$ and put $x_n \in S_k$.
  ❖ Assign each cluster center to be the mean of its cluster's data points: for $k = 1, \ldots, K$, set $\mu_k \leftarrow |S_k|^{-1} \sum_{n:\, x_n \in S_k} x_n$.
[slides by Tamara Broderick]
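Here is a minimal NumPy sketch of the algorithm exactly as stated above: initialization by drawing points without replacement, then alternating the two steps. The `max_iters` safeguard and the empty-cluster guard are our own assumptions, not part of the slides.

    import numpy as np

    def kmeans(X, K, max_iters=100, seed=0):
        """K-means clustering. Returns (centers mu, labels)."""
        rng = np.random.default_rng(seed)
        X = np.asarray(X, dtype=float)
        N, D = X.shape

        # Initialize: draw K indices from 1,...,N without replacement; mu_k <- x_n
        mu = X[rng.choice(N, size=K, replace=False)].copy()
        labels = np.full(N, -1)

        for _ in range(max_iters):
            # Step 1: assign each point to the cluster with the closest center.
            # dists[n, k] = squared Euclidean distance dis(x_n, mu_k)
            dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
            new_labels = dists.argmin(axis=1)

            # Stop when the assignments no longer change
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels

            # Step 2: move each center to the mean of its cluster's points
            for k in range(K):
                S_k = X[labels == k]
                if len(S_k) > 0:          # guard against empty clusters
                    mu[k] = S_k.mean(axis=0)
        return mu, labels

For example, `mu, labels = kmeans(X, K=3)` clusters the toy matrix from earlier, and `dis_global(X, mu, labels)` evaluates the result.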
K-Means: Evaluation

✦ The global dissimilarity is only useful for comparing clusterings; its raw value says little on its own.
✦ Per iteration: assigning all N points to the closest of K centers takes O(KN) time; recomputing each center as the mean of its assigned points takes O(N) time overall.
[slides by Tamara Broderick]
✦ Objective: minimize $\mathrm{dis}_{\mathrm{global}}$ over centers and assignments. Taking the partial derivative with respect to a center $\mu_k$ and setting it to zero recovers exactly the mean-update step, as derived below.
[slide by David Sontag]
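The derivation (a standard computation, reconstructed here because the slide's formula did not survive extraction): holding the assignments $S_k$ fixed,

    $\frac{\partial}{\partial \mu_k} \sum_{n:\, x_n \in S_k} \|x_n - \mu_k\|^2 \;=\; -2 \sum_{n:\, x_n \in S_k} (x_n - \mu_k) \;=\; 0 \quad\Longrightarrow\quad \mu_k = \frac{1}{|S_k|} \sum_{n:\, x_n \in S_k} x_n .$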
✦ K-Means takes an alternating optimization approach: each step is guaranteed to decrease (or at least not increase) the objective, and thus the algorithm is guaranteed to converge, though only to a local optimum.
[slide by Alan Fern]

Demo time…
K-Means Algorithm: Some Issues
[slide by Kristen Grauman]

Next time: K-Means applications, spectral clustering, hierarchical clustering, and what is a good clustering?