Stability of Clustering Methods
Sasha Rakhlin, Ph.D. candidate, MIT

A procedure is stable if
$$P\left(\|\text{solution} - \text{perturbed solution}\| > \varepsilon\right) \to 0.$$

This talk: A tool for theoretical analysis of stability of clustering algorithms.
Idea: phrase clustering as empirical risk minimization (ERM) and use stability of ERM.
Based on work with A. Caponnetto: “Some properties of ERM over Donsker classes,” submitted to JMLR.

Stability for model selection
If the 2-cluster solution is in our hypothesis space (the “realizable” case), we get stability with respect to perturbations of the whole dataset.

Stability for model selection
Instability (w.r.t. a complete change of the dataset) arises in the “non-realizable” case, when there are two or more clusterings at a similar “distance” from the underlying density.
What can we say about the “non-realizable” case? We will show that natural algorithms are stable w.r.t. changes of $o(\sqrt{n})$ points.

Toy example
Choose, according to majority, either the left or the right half as the cluster.
The probability that one point changes the cluster is $\Omega(n^{-1/2})$.
This procedure is stable with respect to changes of $o(\sqrt{n})$ points.
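The $n^{-1/2}$ rate in the toy example can be checked numerically. The sketch below is an illustration, not taken from the talk, and makes two assumptions: $n$ is odd, and each point falls in the left half independently with probability 1/2. Under those assumptions a single replaced point can flip the majority exactly when the vote margin is 1, and that probability can be computed exactly. The function name `flip_probability` is hypothetical.

```python
import math

def flip_probability(n: int) -> float:
    """Exact probability that replacing one point can flip the majority vote.

    Assumes n is odd and each of the n points lands in the left half
    independently with probability 1/2 (an illustrative choice of P).
    The vote can flip under a one-point change exactly when the margin
    is 1, i.e. when the left count is (n - 1) / 2 or (n + 1) / 2.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd in this sketch")
    k = (n - 1) // 2
    # comb(n, k) == comb(n, k + 1) for odd n, so the two cases are equal.
    return 2 * math.comb(n, k) / 2**n

for n in (101, 1001, 10001):
    p = flip_probability(n)
    # p * sqrt(n) should level off near a constant if p decays like n^(-1/2).
    print(n, p, p * math.sqrt(n))
```

If the rescaled column `p * sqrt(n)` levels off at a constant, the flip probability decays like $n^{-1/2}$. The margin itself is typically of order $\sqrt{n}$, which is why changing only $o(\sqrt{n})$ points rarely changes the outcome.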

Much harder
Choose, according to majority, a cluster of fixed size.
Does the probability of jumps by $\varepsilon$ decrease as $n \to \infty$?
Yes, this procedure is stable w.r.t. changes of $o(\sqrt{n})$ points, no matter what $P$ is.
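One way to experiment with this question is sketched below. It rests on an assumed reading of the slide that the talk may not intend: "a cluster of fixed size" is taken to be an interval of fixed width on the real line, chosen to contain the most sample points. The function `best_window`, the Gaussian sample, and the replacement values are all illustrative choices.

```python
import bisect
import random

def best_window(xs, width):
    """Left endpoint and count of a width-`width` interval with the most points.

    Candidates are restricted to intervals that start at a sample point,
    which is enough to attain the maximum count. Ties go to the leftmost.
    """
    xs = sorted(xs)
    best_left, best_count = xs[0], 0
    for i, left in enumerate(xs):
        # Number of sample points in the closed interval [left, left + width].
        count = bisect.bisect_right(xs, left + width) - i
        if count > best_count:
            best_left, best_count = left, count
    return best_left, best_count

rng = random.Random(1)
n, width = 4000, 1.0
S = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Replace m = floor(n^(1/4)) points, far fewer than sqrt(n).
m = int(n ** 0.25)
T = list(S)
for i in rng.sample(range(n), m):
    T[i] = rng.uniform(-3.0, 3.0)  # arbitrary replacement values

left_S, count_S = best_window(S, width)
left_T, count_T = best_window(T, width)
print("window on S:", left_S, count_S)
print("window on T:", left_T, count_T)
```

Replacing `m` points changes the count of any fixed window by at most `m`, so the two maximal counts differ by at most `m`. Whether the chosen windows themselves stay close is the harder question the slide raises, and comparing the printed left endpoints for growing `n` is one way to probe it empirically.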

Similar problem
Choose, according to majority, $k$ clusters of fixed size.
Does the probability of jumps (in $L_1$ distance) by $\varepsilon$ decrease as $n \to \infty$?
Yes, this procedure is stable w.r.t. changes of $o(\sqrt{n})$ points.

Empirical Risk Minimization
A. Caponnetto and A. Rakhlin, “Some properties of ERM over Donsker classes,” submitted to JMLR.
These examples are instances of empirical risk minimization. The following general result holds for any fixed distribution $P$:
$$\forall \varepsilon > 0, \quad P\left(\|f_S - f_T\|_{L_1} \ge \varepsilon\right) \to 0,$$
where $S$ and $T$ differ on $o(\sqrt{n})$ points, and $f_S$, $f_T$ are the respective almost-minimizers over a $P$-Donsker class.
For binary functions, Donsker = VC.

k-means clustering
We can now study stability of other clustering procedures which optimize an objective function. k-means clustering is
$$\min_{C} \sum_{i=1}^{n} \|x_i - m_C(x_i)\|^2,$$
which is empirical risk minimization over the class
$$\mathcal{F} = \left\{ \|x - m_C(x)\|^2 : C \text{ is a } k\text{-partition and } m_C(x) \text{ are centers} \right\}.$$

k-means clustering
$$\mathcal{F} = \left\{ \|x - m_C(x)\|^2 : C \text{ is a } k\text{-partition and } m_C(x) \text{ are centers} \right\}$$
If $\mathcal{F}$ is Donsker (e.g. the domain is compact), then $L_1$ stability implies stability of the centers $m_C(x)$.
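The stability of the centers can be illustrated with a small experiment. The sketch below is not from the talk and makes simplifying assumptions: the data are one-dimensional, $k = 2$, and the sample comes from a well-separated two-component Gaussian mixture. In one dimension an optimal 2-partition splits the sorted sample at a single index, so the empirical risk minimizer can be found exactly by scanning all splits. The function `kmeans_1d_k2` is a hypothetical helper.

```python
import random

def kmeans_1d_k2(xs):
    """Exact 2-means in one dimension.

    Scans every split of the sorted sample and returns (centers, risk),
    where risk is the k-means objective averaged over the sample.
    """
    xs = sorted(xs)
    n = len(xs)
    # Prefix sums of x and x^2 give each split's cost in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def sse(lo, hi):
        # Sum of squared distances to the mean over xs[lo:hi].
        total = s1[hi] - s1[lo]
        return (s2[hi] - s2[lo]) - total * total / (hi - lo)

    best_cost, best_split = None, None
    for split in range(1, n):
        cost = sse(0, split) + sse(split, n)
        if best_cost is None or cost < best_cost:
            best_cost, best_split = cost, split
    centers = (s1[best_split] / best_split,
               (s1[n] - s1[best_split]) / (n - best_split))
    return centers, best_cost / n

rng = random.Random(0)
n = 2000
S = [rng.gauss(-2.0, 1.0) if rng.random() < 0.5 else rng.gauss(2.0, 1.0)
     for _ in range(n)]

# T differs from S on m = floor(n^(1/4)) points, which is o(sqrt(n)).
m = int(n ** 0.25)
T = list(S)
for i in rng.sample(range(n), m):
    T[i] = rng.uniform(-5.0, 5.0)  # arbitrary replacement values

cS, risk_S = kmeans_1d_k2(S)
cT, risk_T = kmeans_1d_k2(T)
shift = max(abs(a - b) for a, b in zip(cS, cT))
print("centers on S:", cS)
print("centers on T:", cT)
print("largest center shift:", shift)
```

The experiment only illustrates the statement for one distribution and one sample size. The general result concerns the limit $n \to \infty$ for any fixed $P$, which a single run cannot verify.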

MLE density estimation
$$\max_{f \in \mathcal{F}} \sum_{i=1}^{n} \log f(x_i)$$
Under some assumptions on the class $\mathcal{F}$ of densities, this should imply stability of modes/clusters.

That’s all
