

SLIDE 1

Informational and Computational Limits of Clustering

and other questions about clustering

Nati Srebro

University of Toronto

based on work in progress with

Gregory Shakhnarovich

Brown University

Sam Roweis

University of Toronto

SLIDE 2

“Clustering”

  • Clustering with respect to a specific model / structure / objective
  • Gaussian mixture model

– Each point comes from one of k “centers”
– Gaussian cloud around each center
– For now: unit-variance Gaussians, uniform prior over choice of center

  • As an optimization problem:

– Likelihood of centers:

$\sum_i \log\Big( \sum_j \exp\big(-\|x_i-\mu_j\|^2/2\big) \Big)$

– k-means objective—Likelihood of assignment:

$\sum_i \min_j \|x_i-\mu_j\|^2$
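Both objectives are easy to state in code. A minimal NumPy sketch (my own illustration, not from the talk; function names are mine):

```python
import numpy as np

def kmeans_cost(X, mu):
    """k-means objective: sum_i min_j ||x_i - mu_j||^2."""
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (n, k)
    return d2.min(axis=1).sum()

def center_loglik(X, mu):
    """Likelihood of the centers under a unit-variance, uniform-prior
    Gaussian mixture, up to an additive constant:
    sum_i log sum_j exp(-||x_i - mu_j||^2 / 2)."""
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    a = -d2 / 2
    m = a.max(axis=1, keepdims=True)          # stable log-sum-exp
    return (m[:, 0] + np.log(np.exp(a - m).sum(axis=1))).sum()
```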

SLIDE 3

Is Clustering Hard or Easy?

  • k-means (and ML estimation?) is NP-hard

– For some point configurations, it is hard to find the optimal solution.

– But do these point configurations actually correspond to clusters of points?

SLIDE 4

Is Clustering Hard or Easy?

  • k-means (and ML estimation?) is NP-hard

– For some point configurations, it is hard to find the optimal solution.

– But do these point configurations actually correspond to clusters of points?

  • Well separated Gaussian clusters, lots of data

– Poly-time algorithms for very large separation and #points
– Empirically, EM* works (modest separation, #points)

*EM with some bells and whistles: spectral projection (PCA), pruning centers, etc.

SLIDE 5

Is Clustering Hard or Easy? (when it’s interesting)

  • k-means (and ML estimation?) is NP-hard

– For some point configurations, it is hard to find the optimal solution.

– But do these point configurations actually correspond to clusters of points?

  • Well separated Gaussian clusters, lots of data

– Poly-time algorithms for very large separation and #points
– Empirically, EM* works (modest separation, #points)

  • Not enough data

– Can’t identify clusters (ML clustering meaningless)

*EM with some bells and whistles: spectral projection (PCA), pruning centers, etc.
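For reference, a bare-bones EM update for this model (spherical unit-variance components, uniform prior), without the bells and whistles from the footnote; a standard textbook sketch, not the authors' implementation:

```python
import numpy as np

def em_spherical(X, mu, n_iter=100):
    """EM for a k-component, unit-variance, uniform-prior spherical
    Gaussian mixture: only the centers mu (shape (k, d)) are updated."""
    for _ in range(n_iter):
        # E-step: responsibilities r_ij ∝ exp(-||x_i - mu_j||^2 / 2)
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logr = -d2 / 2
        logr -= logr.max(axis=1, keepdims=True)   # numerical stability
        r = np.exp(logr)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: each center becomes the responsibility-weighted mean
        mu = (r.T @ X) / r.sum(axis=0)[:, None]
    return mu
```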

SLIDE 6

Effect of “Signal Strength”

[Figure: likelihood landscape, along an axis from “large separation, more samples” to “small separation, fewer samples”. With lots of data, the true solution creates a distinct peak and is easy to find; with not enough data, the “optimal” solution is meaningless.]

SLIDE 7

Effect of “Signal Strength”

[Figure: same landscape, with a middle regime added. Lots of data: the true solution creates a distinct peak, easy to find. Just enough data: the optimal solution is meaningful, but hard to find? Not enough data: the “optimal” solution is meaningless.]

SLIDE 8

Effect of “Signal Strength”

[Figure: same landscape, now annotated with an informational limit and a computational limit along the axis from “small separation, fewer samples” to “large separation, more samples”. Below the informational limit the “optimal” solution is meaningless; between the two limits the optimal solution is meaningful but hard to find(?); above the computational limit it is easy to find.]

SLIDE 9

Effect of “Signal Strength”

Infinite data limit: $E_x[\mathrm{cost}(x;\text{model})] = KL(\text{true}\,\|\,\text{model})$. The mode is always at the true model. Determined by:

  • number of clusters (k)
  • dimensionality (d)
  • separation (s)


SLIDE 10

Effect of “Signal Strength”

Infinite data limit: $E_x[\mathrm{cost}(x;\text{model})] = KL(\text{true}\,\|\,\text{model})$. The mode is always at the true model. Determined by:

  • number of clusters (k)
  • dimensionality (d)
  • separation (s)

The actual (finite-sample) log-likelihood also depends on:

  • sample size (n)

“local ML model” $\sim N\!\big(\text{true model},\ \tfrac{1}{n} J_{\text{Fisher}}^{-1}\big)$   [Redner & Walker 1984]
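The $N(\text{true},\ \tfrac{1}{n}J_{\text{Fisher}}^{-1})$ asymptotics are easy to check in the simplest case: for a single unit-variance Gaussian, $J_{\text{Fisher}} = 1$, so the ML estimate of the mean should fluctuate around the truth with variance ≈ 1/n. A quick simulation (my own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (100, 1000, 10000):
    # The ML estimate of the mean of N(0, 1) is the sample mean;
    # the asymptotic variance should be J_Fisher^{-1} / n = 1/n.
    mles = rng.normal(size=(1000, n)).mean(axis=1)
    print(n, round(float(mles.var()), 6), 1.0 / n)
```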

SLIDE 11

Informational and Computational Limits

[Figure: phase diagram with sample size (n) on the horizontal axis and separation (s) on the vertical axis. A boundary separates the region with enough information to reconstruct from the region without enough information to reconstruct (where the ML solution is random).]

SLIDE 12

Informational and Computational Limits

[Figure: the same phase diagram, with a second boundary added. Above it there is enough information to efficiently reconstruct; between the two boundaries there is enough information to reconstruct (but perhaps not efficiently); below the first boundary there is not enough information to reconstruct (the ML solution is random).]


SLIDE 15

Informational and Computational Limits


  • What are the informational and computational limits?
  • Is there a gap?
  • Is there some minimum required separation for computational tractability?
  • Is learning the centers always easy given the true distribution?

Goal: analytic, quantitative answers, independent of any specific algorithm / estimator.

SLIDE 16

Behavior as a function of Sample Size

[Plot: label error (0.02–0.12) vs. sample size (100–3000, log scale) for k=16, d=1024, separation 6σ. Curves: “fair” EM, EM from the true centers, the maximum-likelihood run (fair or not), and the true centers.]

SLIDE 17

Behavior as a function of Sample Size

[Plot: the same label-error curves as Slide 16, plus a second panel showing, in bits/sample, the difference between the likelihood of “fair” EM runs and of EM from the true centers: each run (random init) and the run attaining the maximum likelihood.]
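A rough sketch of how one might reproduce this kind of experiment with scikit-learn (the slides do not specify the implementation; the dimension is reduced from 1024 so the sketch runs quickly, the center placement is a simplification, and the label error ignores label permutation, so it is only meaningful for the run initialized at the true centers):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
k, d, sep = 16, 64, 6.0
# k centers placed along distinct coordinate axes (a simplification
# of the slide's setup)
mu = np.zeros((k, d))
mu[np.arange(k), np.arange(k)] = sep

for n in (100, 300, 1000, 3000):
    z = rng.integers(k, size=n)             # true component labels
    X = mu[z] + rng.normal(size=(n, d))     # unit-variance clouds
    for name, init in (("fair EM", None), ("EM from true centers", mu)):
        gm = GaussianMixture(n_components=k, covariance_type="spherical",
                             means_init=init, random_state=0).fit(X)
        err = (gm.predict(X) != z).mean()   # crude label error
        print(n, name, round(float(err), 3))
```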

SLIDE 18

Clustering

Model of clustering

– What structure are we trying to capture?
– What properties do we expect the data to have?
– What are we trying to get out of it?
– What is a “good clustering”?

Empirical objective and evaluation

(e.g. minimization objective)
– Can it be used to recover the clustering (as specified above)?
– Post-hoc analysis: is what we found “real”?

Algorithm

– How well does it achieve the objective?
– How efficient is it? Under what circumstances?

SLIDE 19

Clustering

Model of clustering

– What structure are we trying to capture?
– What properties do we expect the data to have?
– What are we trying to get out of it?
– What is a “good clustering”?
[Slide labels: “Questions about the world” / “Mathematics”]

Empirical objective and evaluation

(e.g. minimization objective)
– Can it be used to recover the clustering (as specified above)?
– Post-hoc analysis: is what we found “real”?

Algorithm

– How well does it achieve the objective?
– How efficient is it? Under what circumstances?


SLIDE 23

Clustering

Model of clustering

– What structure are we trying to capture?
– What properties do we expect the data to have?
– What are we trying to get out of it?
– What is a “good clustering”?
[Slide labels: “Questions about the world” / “Mathematics”]

Empirical objective and evaluation

(e.g. minimization objective)
– Can it be used to recover the clustering (as specified above)?
– Post-hoc analysis: is what we found “real”? Can what we found generalize?

Algorithm

– How well does it achieve the objective?
– How efficient is it? Under what circumstances?

SLIDE 25

“Clustering is Easy”, take 1: Approximation Algorithms

(1+ε)-approximation for k-means in time $O\big(2^{(k/\varepsilon)^{\text{const}}}\, n d\big)$   [Kumar Sabharwal Sen 2004]

For any data set of points, find a clustering with k-means cost ≤ (1+ε) × cost of the optimal clustering.

SLIDE 26

“Clustering is Easy”, take 1: Approximation Algorithms

(1+ε)-approximation for k-means in time $O\big(2^{(k/\varepsilon)^{\text{const}}}\, n d\big)$   [Kumar Sabharwal Sen 2004]

Example: data from $\tfrac{1}{2} N(\mu_1,I) + \tfrac{1}{2} N(\mu_2,I)$ with $\mu_1 = (5,0,0,\dots,0)$, $\mu_2 = (-5,0,0,\dots,0)$:

$\mathrm{cost}([\mu_1,\mu_2]) = \sum_i \min_j \|x_i-\mu_j\|^2 \approx d\cdot n$
$\mathrm{cost}([0,0]) = \sum_i \|x_i\|^2 \approx (d+25)\cdot n$
⇒ $[0,0]$ is a $(1+25/d)$-approximation

Need $\varepsilon < \mathrm{sep}^2/d$; the running time becomes $O\big(2^{(kd/\mathrm{sep}^2)^{\text{const}}}\, n\big)$.
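A quick numeric check of this example (my own illustration): for large d, the single center at the origin is within a factor 1 + 25/d of the optimal cost, so a (1+ε)-approximation with fixed ε is free to return it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20000, 100
mu = np.zeros((2, d))
mu[0, 0], mu[1, 0] = 5.0, -5.0
X = mu[rng.integers(2, size=n)] + rng.normal(size=(n, d))

def cost(X, centers):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

c_true = cost(X, mu)                   # ≈ d * n
c_zero = cost(X, np.zeros((1, d)))     # ≈ (d + 25) * n
print(c_zero / c_true, 1 + 25 / d)     # both ≈ 1.25 for d = 100
```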

SLIDE 27

“Clustering is Easy”, take 2: Data drawn from a Gaussian Mixture

$x_1, x_2, \dots, x_n \sim \tfrac{1}{k} N(\mu_1,\sigma^2 I) + \tfrac{1}{k} N(\mu_2,\sigma^2 I) + \cdots + \tfrac{1}{k} N(\mu_k,\sigma^2 I)$, with $\|\mu_i-\mu_j\| > s\cdot\sigma$

  • Find the modes
    (ε-neighborhood with the most points; point with the closest neighbors)
    – Required sample size: $n = 2^{\Omega(d)}$

  • Randomly project to Θ(log k) dimensions (see the sketch below)
    – Now $n = \Omega(k \log^2(1/\delta))$ is enough to find the modes
    – With $s > \tfrac{1}{2} d^{1/2}$, the modes are maintained in the projection

[Dasgupta 99]
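A sketch of the random-projection step (a plain Gaussian projection, Johnson–Lindenstrauss style; the mode-finding step and the choice of the Θ(log k) constant are omitted here and are my own assumptions):

```python
import numpy as np

def random_project(X, target_dim, rng):
    """Project n points in R^d down to target_dim dimensions with a
    random Gaussian matrix; pairwise distances (and hence the
    separation between well-separated modes) are approximately
    preserved with high probability."""
    d = X.shape[1]
    R = rng.normal(size=(d, target_dim)) / np.sqrt(target_dim)
    return X @ R

# usage: project a k=16 mixture down to O(log k) dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 1024))
Y = random_project(X, target_dim=8, rng=rng)   # 8 = 2*log2(16), my choice
```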

SLIDE 28

“Clustering is Easy”, take 2: Data drawn from a Gaussian Mixture

$x_1, x_2, \dots, x_n \sim \tfrac{1}{k} N(\mu_1,\sigma^2 I) + \tfrac{1}{k} N(\mu_2,\sigma^2 I) + \cdots + \tfrac{1}{k} N(\mu_k,\sigma^2 I)$, with $\|\mu_i-\mu_j\| > s\cdot\sigma$

  • Dasgupta 1999: random projection, then mode finding; $s > 0.5\, d^{1/2}$, $n = \Omega(k \log^2(1/\delta))$
  • Arora & Kannan 2001: distance based; $s = \Omega(d^{1/4} \log d)$

SLIDE 29

“Clustering is Easy”, take 2: Data drawn from a Gaussian Mixture

$x_1, x_2, \dots, x_n \sim \tfrac{1}{k} N(\mu_1,\sigma^2 I) + \tfrac{1}{k} N(\mu_2,\sigma^2 I) + \cdots + \tfrac{1}{k} N(\mu_k,\sigma^2 I)$, with $\|\mu_i-\mu_j\| > s\cdot\sigma$

  • Randomly project to Θ(log k) dimensions
    – Now $n = \Omega(k \log^2(1/\delta))$ is enough to find the modes
    – With $s > \tfrac{1}{2} d^{1/2}$, the modes are maintained in the projection

  • Project to the k principal directions (PCA); see the sketch below
    – Spherical Gaussian components: the k principal directions of the true distribution span the centers
    – Required separation only $s = \Omega(k^{1/4} \log(dk))$

[Dasgupta 99] [Vempala Wang 04]
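A sketch of the spectral-projection step (project onto the top-k principal directions via SVD; what happens in the projected space, e.g. the distance-based step, is omitted):

```python
import numpy as np

def spectral_project(X, k):
    """Project centered data onto its top-k principal directions.
    For spherical mixture components these directions approximately
    span the k centers, so the separation between centers survives
    while the noise in the remaining d-k directions is discarded."""
    Xc = X - X.mean(axis=0)
    # rows of Vt are the principal directions (right singular vectors)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T
```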

SLIDE 30

“Clustering is Easy”, take 2: Data drawn from a Gaussian Mixture

$x_1, x_2, \dots, x_n \sim \tfrac{1}{k} N(\mu_1,\sigma^2 I) + \tfrac{1}{k} N(\mu_2,\sigma^2 I) + \cdots + \tfrac{1}{k} N(\mu_k,\sigma^2 I)$, with $\|\mu_i-\mu_j\| > s\cdot\sigma$

  • Dasgupta 1999: random projection, then mode finding; $s > 0.5\, d^{1/2}$, $n = \Omega(k \log^2(1/\delta))$
  • Dasgupta & Schulman 2000: two-round EM with Θ(k·log k) centers; $n = \mathrm{poly}(k)$, $s = \Omega(d^{1/4})$ (large d)
  • Arora & Kannan 2001: distance based; $s = \Omega(d^{1/4} \log d)$
    (distance-based approaches use: all between-class distances > all within-class distances)
  • Vempala & Wang 2004: spectral projection, then distances; $n = \Omega(d^3 k^2 \log(dk/s\delta))$, $s = \Omega(k^{1/4} \log(dk))$

General mixture of Gaussians:
  • Kannan, Salmasian & Vempala 2005: $s = \Omega(k^{5/2} \log(kd))$, $n = \Omega(k^2 d \cdot \log^5(d))$
  • Achlioptas & McSherry 2005: $s > 4k + o(k)$, $n = \Omega(k^2 d)$