Clustering and K-means
Root Mean Square Error (RMS)
Data: $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_N \in \mathbb{R}^d$
Approximations: $\vec{z}_1, \vec{z}_2, \ldots, \vec{z}_N \in \mathbb{R}^d$

$$\text{RMS error} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\|\vec{x}_i - \vec{z}_i\right\|_2^2}$$
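To make the formula concrete, here is a minimal NumPy sketch (the function name `rms_error` and the `(N, d)` array layout are choices of this writeup, not from the slides):

```python
import numpy as np

def rms_error(X, Z):
    """RMS error between data points X and approximations Z, both (N, d)."""
    # Squared Euclidean distance per point, averaged over points, then rooted.
    return np.sqrt(np.mean(np.sum((X - Z) ** 2, axis=1)))
```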
PCA-based prediction
Data: $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_N \in \mathbb{R}^d$
Mean vector: $\vec{\mu}$. Top $k$ eigenvectors: $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k$

Approximation of $\vec{x}_j$:

$$\vec{z}_j = \vec{\mu} + \sum_{i=1}^{k}\left(\vec{v}_i \cdot (\vec{x}_j - \vec{\mu})\right)\vec{v}_i$$
$$\text{RMS error} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\|\vec{x}_i - \vec{z}_i\right\|_2^2}$$
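As a sketch of how such an approximation can be computed, assuming the eigenvectors come from the covariance matrix of the data (the helper name `pca_approximation` is illustrative):

```python
import numpy as np

def pca_approximation(X, k):
    """Approximate each row of X (shape (N, d)) by the top-k PCA directions."""
    mu = X.mean(axis=0)
    Xc = X - mu                                    # center the data
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    V = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # top-k eigenvectors, (d, k)
    # z_j = mu + sum_i (v_i . (x_j - mu)) v_i : project onto span(V), add mu back
    return mu + (Xc @ V) @ V.T
```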
Regression-based prediction
Data: $(\vec{x}_1, y_1), (\vec{x}_2, y_2), \ldots, (\vec{x}_N, y_N) \in \mathbb{R}^d \times \mathbb{R}$
Input: $\vec{x} \in \mathbb{R}^d$. Output: $y \in \mathbb{R}$

Approximation of $y$ given $\vec{x}$:

$$\hat{y} = a_0 + \sum_{i=1}^{d} a_i x_i$$
$$\text{RMS error} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$
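A minimal sketch of fitting the coefficients by least squares and evaluating this error (the helper name and the use of `np.linalg.lstsq` are choices here, not prescribed by the slides):

```python
import numpy as np

def regression_rms(X, y):
    """Fit y_hat = a0 + sum_i a_i * x_i by least squares; return the RMS error."""
    A = np.hstack([np.ones((X.shape[0], 1)), X])  # column of 1s for the intercept a0
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ coeffs
    return np.sqrt(np.mean((y - y_hat) ** 2))
```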
K-means clustering
Data: $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_N \in \mathbb{R}^d$
Model: $k$ representatives $\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_k \in \mathbb{R}^d$

Approximation of $\vec{x}_j$:

$$\vec{z}_j = \operatorname*{argmin}_{\vec{r}_i} \left\|\vec{x}_j - \vec{r}_i\right\|_2^2 = \text{the representative closest to } \vec{x}_j$$

$$\text{RMS error} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\|\vec{x}_i - \vec{z}_i\right\|_2^2}$$
K-means Algorithm
Initialize $k$ representatives $\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_k \in \mathbb{R}^d$

Iterate until convergence:
a. Associate each $\vec{x}_i$ with its closest representative: $\vec{x}_i \to \vec{r}_j$
b. Replace each representative $\vec{r}_j$ with the mean of the points assigned to it.

Both the a step and the b step reduce the RMS error.
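A minimal NumPy sketch of these two steps (plain Lloyd iterations; the convergence test and the handling of empty clusters are choices of this writeup):

```python
import numpy as np

def kmeans(X, reps, max_iter=100):
    """X: (N, d) data; reps: (k, d) initial representatives, updated in place."""
    assign = None
    for _ in range(max_iter):
        # Step a: associate each x_i with its closest representative.
        dists = ((X[:, None, :] - reps[None, :, :]) ** 2).sum(axis=2)  # (N, k)
        new_assign = dists.argmin(axis=1)
        if assign is not None and np.array_equal(assign, new_assign):
            break                                  # assignments stopped changing
        assign = new_assign
        # Step b: replace each representative with the mean of its points.
        for j in range(len(reps)):
            pts = X[assign == j]
            if len(pts):                           # keep the old rep if unassigned
                reps[j] = pts.mean(axis=0)
    return reps, assign
```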
Simple Initialization

Simplest initialization: choose the representatives from the data points independently at random.
– Problem: some representatives end up close to each other, and some parts of the data get no representatives.
– K-means is a local search method, so it can get stuck in local minima.
K-means++

A different method for initializing the representatives:
– Spreads out the initial representatives.
– Adds representatives one by one.
– Before adding a representative, defines a distribution over the unselected data points.

Data: $\vec{x}_1, \ldots, \vec{x}_N$. Current representatives: $\vec{r}_1, \ldots, \vec{r}_j$

Distance of an example to the representatives: $d(\vec{x}, \{\vec{r}_1, \ldots, \vec{r}_j\}) = \min_{1 \le i \le j} \left\|\vec{x} - \vec{r}_i\right\|$

Probability of selecting example $\vec{x}$ as the next representative: $P(\vec{x}) = \frac{1}{Z}\, d(\vec{x}, \{\vec{r}_1, \ldots, \vec{r}_j\})$, where $Z$ normalizes the probabilities to sum to 1.
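A sketch of this initialization, weighting each point by its distance to the current representatives as in the formula above (note: the standard k-means++ of Arthur and Vassilvitskii weights by the squared distance; the slide's formula uses the distance itself, and the code follows the slide):

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """Pick k rows of X as representatives, one by one."""
    rng = rng or np.random.default_rng()
    reps = [X[rng.integers(len(X))]]               # first rep: uniform at random
    while len(reps) < k:
        R = np.array(reps)
        # d(x, {r_1..r_j}) = min_i ||x - r_i||; already-chosen points get d = 0
        d = np.linalg.norm(X[:, None, :] - R[None, :, :], axis=2).min(axis=1)
        p = d / d.sum()                            # P(x) = d(x, reps) / Z
        reps.append(X[rng.choice(len(X), p=p)])
    return np.array(reps)
```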
Example for K-means++

This is an unlikely initialization for K-means++.
Parallelized Kmeans
- Suppose the data points are par>>oned randomly across
several machines.
- We want to perform the a,b steps with minimal
communica>on btwn machines.
- 1. Choose ini>al representa>ves and broadcast to all
machines.
- 2. Each machine par>>ons its own data points according to
closest representa>ve. Defines (key,value) pairs where key=index of closest representa>ve. Value=example.
- 3. Compute the mean for each set by performing
- reduceByKey. (most of the summing done locally on each
machine).
- 4. Broadcast new reps to all machines.
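A sketch of one such iteration, assuming PySpark, with `points` an RDD of NumPy vectors and `sc` the SparkContext (the function and variable names are illustrative):

```python
import numpy as np

def parallel_kmeans_step(sc, points, reps):
    """One iteration: broadcast reps, assign points, reduceByKey the sums."""
    reps_b = sc.broadcast(reps)                    # steps 1/4: ship reps to workers
    def to_pair(x):
        # step 2: key = index of the closest representative, value = (x, 1)
        j = int(np.argmin([np.sum((x - r) ** 2) for r in reps_b.value]))
        return (j, (x, 1))
    # step 3: per-cluster (sum, count); reduceByKey combines locally on each
    # machine before shuffling, so communication stays small.
    sums = points.map(to_pair).reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
    means = {j: s / n for j, (s, n) in sums.collect()}
    # keep the old representative for any cluster that received no points
    return np.array([means.get(j, reps[j]) for j in range(len(reps))])
```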
Clustering stability
[Figures: clusterings obtained using starting points 1, 2, and 3.]
Measuring clustering stability
               x1  x2  x3  x4  x5  x6   ⋯   xn
Clustering 1    1   1   3   1   3   2  2 2   3
Clustering 2    2   2   1   2   1   3  3 3   1
Clustering 3    2   2   3   2   3   1  1 1   3
Clustering 4    1   1   1   1   3   3  3 3   1

The entry in row "Clustering j", column "xi" contains the index of the representative closest to xi in clustering j. The first three clusterings are completely consistent with each other: they differ only by a relabeling of the clusters. The fourth clustering has a disagreement at x5.
How to quantify stability?

– We say that a clustering is stable if the examples are always grouped in the same way.
– When we have thousands of examples, we cannot expect all of them to always be grouped the same way.
– We need a way to quantify the stability.
– Basic idea: measure how much the groupings differ between clusterings.
Entropy
A partition $G$ of the data defines a distribution over the parts: $p_1 + p_2 + \cdots + p_k = 1$

The information in this partition is measured by the entropy:

$$H(G) = H(p_1, p_2, \ldots, p_k) = \sum_{i=1}^{k} p_i \log_2 \frac{1}{p_i}$$

$H(G)$ is a number between $0$ (one part with probability 1) and $\log_2 k$ ($p_1 = p_2 = \cdots = p_k = \frac{1}{k}$).
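A direct sketch of this formula, where the input is each example's part label (e.g. one row of the table above):

```python
import numpy as np

def entropy(labels):
    """H(G) for a partition, given each example's part label."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()                      # p_1 + ... + p_k = 1
    return float(np.sum(p * np.log2(1.0 / p)))     # sum_i p_i log2(1/p_i)
```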
Entropy of a combined partition

$H(G_1, \ldots, G_i)$ denotes the entropy of the combined partition, whose parts are the intersections of the parts of $G_1, \ldots, G_i$. If clustering 1 and clustering 2 partition the data in exactly the same way, then $G_1 = G_2$ and $H(G_1, G_2) = H(G_1) = H(G_2)$.

Suppose we produce many clusterings, using many starting points, and plot $H(G_1), H(G_1, G_2), \ldots, H(G_1, G_2, \ldots, G_i), \ldots$ as a function of $i$:
– If the graph increases like $i \log_2 k$, the clustering is completely unstable.
– If the graph stops increasing after some $i$, we have reached stability.
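One way to compute $H(G_1, \ldots, G_i)$ under the definition above: two examples share a part of the combined partition iff every clustering labels them identically, so the tuple of labels identifies the part. This sketch assumes integer label arrays of equal length:

```python
import numpy as np

def combined_entropy(clusterings):
    """H(G1,...,Gi) for a list of label arrays, one array per clustering."""
    rows = np.stack(clusterings, axis=1)           # row i = the label tuple of x_i
    _, counts = np.unique(rows, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * np.log2(1.0 / p)))

# Plot combined_entropy(clusterings[:i]) against i: a curve that flattens
# indicates stability; growth like i * log2(k) indicates complete instability.
```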