SLIDE 1

CSC411 Tutorial #6 Clustering: K-Means, GMM, EM

March 11, 2016 Boris Ivanovic* csc411ta@cs.toronto.edu

*Based on the tutorial by Shikhar Sharma and Wenjie Luo’s 2014 slides.

SLIDE 2

Outline for Today

  • K-Means
  • GMM
  • Questions
  • I’ll be focusing more on the intuitions behind these models; the math is not as important for your learning here

SLIDE 3

Clustering

In classification, we are given data with associated labels. What if we aren’t given any labels? Our data might still have structure. We basically want to simultaneously label points and build a classifier.

Shikhar Sharma (UofT) Unsupervised Learning October {27,29,30}, 2015 3 / 29

  • P.S. I didn’t change the attribution information at the bottom because that would be disingenuous of me, and also because credit should be given where credit is due. Thanks, Shikhar, for the tutorial slides!

SLIDE 4

Tomato sauce

A major tomato sauce company wants to tailor their brand of sauces to suit their customers. They run a market survey in which the test subjects rate different sauces. After some processing they get the following data. Each point represents the preferred sauce characteristics of a specific person.

SLIDE 5

Tomato sauce data

[Scatter plot of customer preferences, axes “More Sweet →” and “More Garlic →”]

This tells us how much different customers like different flavors.

SLIDE 6

Some natural questions

How many different sauces should the company make? How sweet/garlicky should these sauces be? Idea: we will segment the consumers into groups (in this case 3); we will then find the best sauce for each group.

SLIDE 7

Approaching k-means

Say I give you 3 sauces whose garlickiness and sweetness are marked by X.

[Scatter plot: customer preferences with the 3 sauces marked by X, axes “More Sweet →” and “More Garlic →”]

SLIDE 8

Approaching k-means

We will group each customer by the sauce that most closely matches their taste.

[Scatter plot: customers grouped by their closest sauce]

SLIDE 9

Approaching k-means

Given this grouping, can we choose sauces that would make each group happier on average?

[Scatter plot: grouped customers with the current sauce positions]

SLIDE 10

Approaching k-means

Given this grouping, can we choose sauces that would make each group happier on average?

[Scatter plot: updated sauce positions]

Yes!

SLIDE 11

Approaching k-means

Given these new sauces, we can regroup the customers.

[Scatter plot: customers regrouped around the new sauce positions]

SLIDE 12

Approaching k-means

Given these new sauces, we can regroup the customers.

[Scatter plot: the resulting regrouping]

SLIDE 13

The k-means algorithm

Initialization: Choose k random points to act as cluster centers.

Iterate until convergence:

  • Step 1: Assign each point to the closest center (forming k groups)
  • Step 2: Reset the centers to be the mean of the points in their respective groups
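The two steps above can be sketched in a few lines of NumPy (a minimal illustration of my own, not the course's reference code):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain k-means on an (n_points, n_dims) array X."""
    rng = np.random.default_rng(seed)
    # Initialization: choose k random data points as cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 1: assign each point to the closest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: reset each center to the mean of the points assigned to it
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # no center moved: converged
            break
        centers = new_centers
    return centers, labels
```

Note the empty-cluster guard in Step 2: if no point is assigned to a center, the center is simply kept where it is.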

SLIDE 14

Viewing k-means in action

Demo...

Note: K-Means only finds a local optimum.

Questions:

How do we choose k?

Couldn’t we just let each person have their own sauce? (Probably not feasible...)

Can we change the distance measure?

Right now we’re using Euclidean distance

Why even bother with this when we can “see” the groups? (Can we plot high-dimensional data?)
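On the distance-measure question: the distance only enters the assignment step, so swapping it is easy. A hypothetical helper (the names are mine, assuming NumPy arrays), shown with Euclidean and Manhattan distances; note that with a non-Euclidean distance the mean in Step 2 is no longer necessarily the best center (e.g. Manhattan distance pairs naturally with the median, giving k-medians):

```python
import numpy as np

def assign(X, centers, dist):
    """Assignment step of k-means with a pluggable distance function."""
    d = np.array([[dist(x, c) for c in centers] for x in X])
    return d.argmin(axis=1)  # index of the closest center per point

euclidean = lambda x, c: np.linalg.norm(x - c)
manhattan = lambda x, c: np.abs(x - c).sum()
```

For example, `assign(X, centers, manhattan)` returns the same shape of label array as the Euclidean version, just computed under a different notion of "closest".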

SLIDE 15

A “simple” extension

Let’s look at the data again. Notice how the groups aren’t necessarily circular?

[Scatter plot: elongated, non-circular customer groups]

SLIDE 16

A “simple” extension

Also, does it make sense to say that points in this region belong to one group or the other?

[Scatter plot: ambiguous points in the overlap region between two groups]

SLIDE 17

Flaws of k-means

It can be shown that k-means assumes the data belong to spherical groups; moreover, it doesn’t take into account the variance of the groups (the size of the circles). It also makes hard assignments, which may not be ideal for ambiguous points.

This is especially a problem if groups overlap

We will look at one way to correct these issues

SLIDE 18

Isotropic Gaussian mixture models

K-means implicitly assumes each cluster is an isotropic (spherical) Gaussian; it simply tries to find the optimal mean for each Gaussian. However, it makes an additional assumption: that each point belongs to a single group. We will correct this problem first by allowing each point to “belong to multiple groups”

More accurately, that each point belongs to group i with probability p_i, where \sum_i p_i = 1

SLIDE 19

Gaussian mixture models

Given a data point x with dimension D, a multivariate isotropic Gaussian PDF is given by:

P(x) = (2\pi)^{-D/2} (\sigma^2)^{-D/2} \, e^{-\frac{1}{2\sigma^2} (x-\mu)^T (x-\mu)}    (1)

A multivariate Gaussian in general is given by:

P(x) = (2\pi)^{-D/2} |\Sigma|^{-1/2} \, e^{-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)}    (2)

We can try to model the covariance as well to account for elliptical clusters.
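As a quick numerical check (my own illustration, not from the slides), equation (1) is the special case of equation (2) with Σ = σ²I:

```python
import numpy as np

def isotropic_gaussian_pdf(x, mu, sigma2):
    # Equation (1): (2*pi)^(-D/2) * (sigma^2)^(-D/2) * exp(-||x - mu||^2 / (2*sigma^2))
    D = len(x)
    diff = x - mu
    return (2 * np.pi) ** (-D / 2) * sigma2 ** (-D / 2) * np.exp(-(diff @ diff) / (2 * sigma2))

def gaussian_pdf(x, mu, Sigma):
    # Equation (2): (2*pi)^(-D/2) * |Sigma|^(-1/2) * exp(-(1/2)(x-mu)^T Sigma^{-1} (x-mu))
    D = len(x)
    diff = x - mu
    return ((2 * np.pi) ** (-D / 2) * np.linalg.det(Sigma) ** (-0.5)
            * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)))
```

With `Sigma = sigma2 * np.eye(D)`, the determinant is (σ²)^D and Σ⁻¹ = I/σ², so the two functions return the same density.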

SLIDE 20

Gaussian mixture models

Demo: GMM with full covariance. Notice that now it takes much longer to converge. Convergence can be much faster if we first initialize with k-means.

SLIDE 21

The EM algorithm

What we have just seen is an instance of the EM algorithm. The EM algorithm is actually a meta-algorithm: it tells you the steps needed to derive an algorithm to learn a model. The “E” stands for expectation; the “M” stands for maximization. We will look more closely at what this algorithm does, but won’t go into extreme detail.

SLIDE 22

EM for the Gaussian Mixture Model

Recall that we are trying to put the data into groups, while simultaneously learning the parameters of each group. If we knew the groupings in advance, the problem would be easy:

  • With k groups, we are just fitting k separate Gaussians
  • With soft assignments, the data is simply weighted (i.e. we calculate weighted means and covariances)

SLIDE 23

EM for the Gaussian Mixture Model

Given initial parameters, iterate until convergence:

E-step:

Partition the data into different groups (soft assignments)

M-step:

For each group, fit a Gaussian to the weighted data belonging to that group
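The E and M steps above can be sketched directly (a minimal NumPy illustration of my own, not the course's reference code, with no numerical safeguards); the initial means `mu0` play the role of the "given initial parameters":

```python
import numpy as np

def em_gmm(X, mu0, n_iters=50):
    """EM for a full-covariance Gaussian mixture.
    X is (n, d); mu0 holds the k initial means, one per row."""
    n, d = X.shape
    k = len(mu0)
    mu = np.array(mu0, dtype=float)                    # means
    Sigma = np.stack([np.cov(X.T) for _ in range(k)])  # covariances
    pi = np.full(k, 1.0 / k)                           # mixing weights
    for _ in range(n_iters):
        # E-step: soft-assign each point to each group (responsibilities)
        r = np.empty((n, k))
        for j in range(k):
            diff = X - mu[j]
            quad = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(Sigma[j]), diff)
            r[:, j] = pi[j] * np.exp(-0.5 * quad) / np.sqrt(
                (2 * np.pi) ** d * np.linalg.det(Sigma[j]))
        r /= r.sum(axis=1, keepdims=True)  # each row sums to 1
        # M-step: refit each Gaussian to the weighted data in its group
        Nk = r.sum(axis=0)
        pi = Nk / n
        mu = (r.T @ X) / Nk[:, None]
        for j in range(k):
            diff = X - mu[j]
            Sigma[j] = (r[:, j, None] * diff).T @ diff / Nk[j]
    return pi, mu, Sigma, r
```

The M-step lines are exactly the "weighted means and covariances" mentioned two slides back: each point contributes to group j in proportion to its responsibility r[:, j].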

SLIDE 24

EM in general

We specify a model that has variables (x, z) with parameters θ; denote this by P(x, z|θ). We want to optimize the log-likelihood of our data:

\log P(x \mid \theta) = \log \sum_z P(x, z \mid \theta)

x is our data, z is some variable with extra information

Cluster assignments in the GMM, for example

We don’t know z; it is a “latent variable”. E-step: infer the expected value for z given x. M-step: maximize the “complete data log-likelihood” log(P(x, z|θ)) with respect to θ.
