Clustering Shrinkage, L0 and Staircases

K. PELCKMANS, J.A.K. SUYKENS, B. DE MOOR
NIPS workshop on theoretical foundations of clustering, December 2005
KULeuven - Department of Electrical Engineering - SCD/SISTA
Kasteelpark Arenberg 10, 3001 Heverlee (Leuven), Belgium
Kristiaan.Pelckmans@esat.kuleuven.ac.be

Optimization View of Clustering

Empirical Convex Clustering Shrinkage (CCS):
• Dataset $\{x_i\}_{i=1}^N \subset \mathbb{R}^D$
• $N$ centroids: $\{M_i\}_{i=1}^N \subset \mathbb{R}^D$
• Given $\gamma \ge 0$
• Distance measure $\|\cdot\|$
• Convex complexity measure $\ell : \mathbb{R}^D \to \mathbb{R}^+$

[Figure: panels labelled γ = 0, γ = 10 and γ = 10000]

Convex programming problem:
$$\min_{M_i} \; J_\gamma(M_i) = \frac{1}{2}\sum_{i=1}^{N} \|x_i - M_i\|^p + \gamma \sum_{i<j} \ell(M_i - M_j)$$
→ Pelckmans et al., Convex Clustering Shrinkage, PASCAL workshop 2005
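A direct evaluation of $J_\gamma$ makes the trade-off between fit and fusion concrete. This is a minimal sketch assuming the Euclidean norm for $\|\cdot\|$ and $\ell = |\cdot|_1$ (the choice used for the shrinkage path); the function names are hypothetical, and the code only scores a candidate set of centroids, it does not solve the convex program.

```python
from typing import List, Sequence

Point = Sequence[float]


def l1_norm(v: Point) -> float:
    """Convex complexity measure ell = |.|_1 (one possible choice)."""
    return sum(abs(t) for t in v)


def euclidean_norm(v: Point) -> float:
    """Distance measure ||.||, assumed Euclidean in this sketch."""
    return sum(t * t for t in v) ** 0.5


def ccs_objective(x: List[Point], M: List[Point], gamma: float, p: float = 2.0) -> float:
    """J_gamma(M) = 1/2 sum_i ||x_i - M_i||^p + gamma * sum_{i<j} ell(M_i - M_j)."""
    n = len(x)
    fit = 0.5 * sum(
        euclidean_norm([a - b for a, b in zip(x[i], M[i])]) ** p for i in range(n)
    )
    # pairwise fusion penalty over all unordered pairs i < j
    penalty = sum(
        l1_norm([a - b for a, b in zip(M[i], M[j])])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return fit + gamma * penalty


if __name__ == "__main__":
    x = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
    # M_i = x_i: zero fit, full pairwise penalty
    print(ccs_objective(x, x, gamma=0.5))  # 10.0
```

With $M_i = x_i$ only the penalty is paid; with all $M_i$ equal to the mean only the fit term is paid, which is the tension that $\gamma$ resolves.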

• $\gamma = 0$: $M_i = x_i$
• $\gamma \to +\infty$: $M_1 = \cdots = M_N = \bar{x}$
• $\ell = |\cdot|_1$
• Ranging $\gamma$: increasing number of sparse differences

[Figure: plot of $m(x)$ against $x$ for the univariate case $x_i \in \mathbb{R}$; the discrete centroids $M_i$ become a continuous map $m(x_i)$, with two points $x, x'$ marked $m(x) = m(x')$]
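In the univariate case with $p = 2$ and $\ell = |\cdot|$, the whole shrinkage path can be computed exactly. The reduction below is our own illustration, not a method from the slides: because the penalty is symmetric in the $M_i$, some optimal solution is ordered like the data (rearrangement inequality), and on that ordered cone $\sum_{i<j}|M_i - M_j| = \sum_{k=1}^N (2k - N - 1) M_{(k)}$. The problem then becomes isotonic regression of $z_k = x_{(k)} - \gamma(2k - N - 1)$, solved here with pool-adjacent-violators. The function names are hypothetical.

```python
from typing import List


def pava_nondecreasing(z: List[float]) -> List[float]:
    """Least-squares projection onto non-decreasing sequences (pool adjacent violators)."""
    blocks: List[list] = []  # each block stores [sum of values, count]
    for v in z:
        blocks.append([v, 1])
        # merge while the last two block means violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted: List[float] = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def ccs_1d(x: List[float], gamma: float) -> List[float]:
    """Exact univariate CCS for 1/2 sum (x_i - M_i)^2 + gamma sum_{i<j} |M_i - M_j|."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    # z_k = x_(k) - gamma * (2k - n - 1) for 1-based rank k = position + 1
    z = [x[i] - gamma * (2 * (k + 1) - n - 1) for k, i in enumerate(order)]
    fitted = pava_nondecreasing(z)
    M = [0.0] * n
    for k, i in enumerate(order):
        M[i] = fitted[k]  # map back to the original data order
    return M


if __name__ == "__main__":
    x = [0.0, 1.0, 10.0, 11.0]
    for g in (0.0, 1.0, 10.0):
        print(g, ccs_1d(x, g))
    # 0.0 [0.0, 1.0, 10.0, 11.0]
    # 1.0 [2.5, 2.5, 8.5, 8.5]
    # 10.0 [5.5, 5.5, 5.5, 5.5]
```

Both extremes appear directly: $\gamma = 0$ returns the data, and a large $\gamma$ pools every point into the sample mean, since $\sum_k (2k - N - 1) = 0$. In between, clusters fuse one merge at a time.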

Clustering Shrinkage (cont'd)

Modifications:
• 0-norm (count pairs with different centroids) → non-convex, but interpretable!
• $\epsilon$-neighborhood: ball $B(\epsilon)$ with measure $|B(\epsilon)|$

$$\hat{m}_\epsilon = \arg\min_{m : \mathbb{R}^D \to \mathbb{R}^D} J_\gamma^{\epsilon,p}(m) = \frac{1}{p}\sum_{i=1}^{N} \|m(x_i) - x_i\|^p + \frac{\gamma}{|B(\epsilon)|} \sum_{i=1}^{N} \sum_{j:\, \|x_i - x_j\| \le \epsilon} I\big(\|m(x_i) - m(x_j)\| > 0\big) \qquad (1)$$

→ The second term measures the density of differently assigned data points in a local neighborhood (cf. the histogram density estimator).
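Objective (1) can be scored directly for a given assignment $m(x_i)$. This is a minimal sketch, assuming Euclidean distances, $|B(\epsilon)|$ taken as the volume of a Euclidean ball of radius $\epsilon$ in $\mathbb{R}^D$, and the inner sum running over ordered pairs (so each differing neighbour pair is counted twice). The function names are hypothetical; because of the 0-norm, minimizing (1) is combinatorial, and this code only evaluates a candidate.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def dist(u: Point, v: Point) -> float:
    """Euclidean distance (assumed choice of ||.||)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def ball_volume(eps: float, dim: int) -> float:
    """Lebesgue measure |B(eps)| of a Euclidean ball of radius eps in R^dim."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * eps ** dim


def l0_shrinkage_objective(
    x: List[Point], m: List[Point], gamma: float, eps: float, p: float = 2.0
) -> float:
    """Evaluate J_gamma^{eps,p}(m) of eq. (1); m[i] is the centroid assigned to x[i]."""
    n, dim = len(x), len(x[0])
    fit = sum(dist(m[i], x[i]) ** p for i in range(n)) / p
    # ordered pairs (i, j) that are eps-neighbours but receive different centroids
    cuts = sum(
        1
        for i in range(n)
        for j in range(n)
        if dist(x[i], x[j]) <= eps and dist(m[i], m[j]) > 0
    )
    return fit + gamma / ball_volume(eps, dim) * cuts
```

For two well-separated pairs of points on the line, each pair mapped to its midpoint, a small $\epsilon$ sees no differently assigned neighbours and only the fit term remains; enlarging $\epsilon$ until neighbourhoods straddle both groups makes the second term count the cut.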

Clustering Shrinkage (cont'd)

Definition 1 [Theoretical Shrinkage Clustering]. Let $m : \mathbb{R} \to \mathbb{R}$ be such that $\lim_{\|\delta\| \to 0} \frac{m(x - \delta) - m(x + \delta)}{|B(\|\delta\|)|}$ exists almost everywhere. Let the cdf $P(x)$ underlying the dataset be known, and assume its pdf $p(x)$ exists everywhere and is nonzero on a connected compact interval $C \subset \mathbb{R}$ with nonzero measure $|C| > 0$. We study the following theoretical counterpart to (1):

$$\hat{m} = \arg\min_{m : \mathbb{R} \to \mathbb{R}} J_\gamma^{p,0}(m) = \int_C \|m(x) - x\|^p \, dP(x) + \gamma \int_C \|m'(x)\|_0 \, dP(x), \qquad (2)$$

where the latter term, further denoted as the zero-norm variation, is formally defined as

$$\|m'(x)\|_0 \triangleq \lim_{\epsilon \to 0} \frac{I\big(m(B(x;\epsilon)) \neq \text{const}\big)}{|B(x;\epsilon)|}, \qquad (3)$$

with the characteristic function $I\big(m(B(x;\epsilon)) \neq \text{const}\big)$ equal to one if $\exists\, y \in B(x;\epsilon)$ such that $\|m(x) - m(y)\| > 0$ (i.e. $B(x;\epsilon)$ contains parts of different clusters), and equal to zero otherwise.
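As a sanity check on definition (3), here is a heuristic calculation (ours, not from the slides) for a staircase $m(x) = \sum_{k=1}^K a_k I(x > x_{(k)})$ with $a_k > 0$ and distinct jump points in the interior of $C$. It assumes $p$ is continuous at the jumps and reads the limit in (3) after integrating against $dP$. For $\epsilon$ smaller than half the gap between jumps, the indicator equals one exactly on $(x_{(k)} - \epsilon, x_{(k)} + \epsilon)$ up to endpoints, and $|B(x;\epsilon)| = 2\epsilon$:

```latex
\int_C \frac{I\big(m(B(x;\epsilon)) \neq \text{const}\big)}{|B(x;\epsilon)|}\, dP(x)
  = \sum_{k=1}^{K} \frac{1}{2\epsilon} \int_{x_{(k)}-\epsilon}^{x_{(k)}+\epsilon} p(x)\, dx
  \;\xrightarrow[\epsilon \to 0]{}\; \sum_{k=1}^{K} p\big(x_{(k)}\big)
```

Under these assumptions the zero-norm variation of a staircase is the density evaluated at its jump points, which has the same form as the second term of (5) in Theorem 1.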

Clustering Shrinkage (cont'd)

Theorem 1 [Univariate Staircase Representation]. When $P(x)$ is a fixed, smooth and differentiable distribution function with pdf $p : \mathbb{R} \to \mathbb{R}^+$ which is nonzero on a compact interval $C \subset \mathbb{R}$, the minimizer of (2) takes the form of a staircase function, uniquely defined on $C$, with a finite number (say $K < +\infty$) of positive steps of size $a = (a_1, \ldots, a_K)^T \in \mathbb{R}^K$ at the points $D^{(K)} = \{x_{(k)}\}_{k=1}^K \subset C$:

$$\hat{m}\big(x; a, D^{(K)}\big) = \sum_{k=1}^{K} a_k \, I\big(x > x_{(k)}\big) \quad \text{s.t. } a_k \ge 0,\; x_{(k)} \in C \;\; \forall k. \qquad (4)$$

Moreover, the optimization problem (2) is equivalent to the problem

$$\min_{a, D^{(K)}} J_K^p\big(a, D^{(K)}\big) = \int_C \Big\| \sum_{k=1}^{K} a_k \, I\big(x > x_{(k)}\big) - x \Big\|^p p(x)\, dx + \sum_{k=1}^{K} p\big(x_{(k)}\big), \qquad (5)$$

where $K \in \mathbb{N}$ relates to $\gamma \in \mathbb{R}^+$ in a way that depends on $D$.
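The staircase (4) and the objective (5) can be evaluated numerically when the density is known. This is a minimal sketch, assuming $p(x)$ is available as a Python callable, $C = [c_{lo}, c_{hi}]$, and a midpoint rule for the integral; the names are hypothetical, and the code only scores a given $(a, D^{(K)})$, it does not search for the minimizer.

```python
from typing import Callable, Sequence


def staircase(x: float, a: Sequence[float], knots: Sequence[float]) -> float:
    """m(x; a, D^(K)) = sum_k a_k * I(x > x_(k)), eq. (4)."""
    return sum(ak for ak, xk in zip(a, knots) if x > xk)


def staircase_objective(
    a: Sequence[float],
    knots: Sequence[float],
    pdf: Callable[[float], float],
    c_lo: float,
    c_hi: float,
    p: float = 2.0,
    n_grid: int = 10_000,
) -> float:
    """Midpoint-rule approximation of J_K^p(a, D^(K)) in eq. (5) on C = [c_lo, c_hi]."""
    if any(ak < 0 for ak in a) or not all(c_lo <= xk <= c_hi for xk in knots):
        raise ValueError("constraints of (4) violated: need a_k >= 0 and x_(k) in C")
    h = (c_hi - c_lo) / n_grid
    fit = 0.0
    for i in range(n_grid):
        x = c_lo + (i + 0.5) * h  # midpoint of the i-th grid cell
        fit += abs(staircase(x, a, knots) - x) ** p * pdf(x) * h
    jump_term = sum(pdf(xk) for xk in knots)  # second term of (5)
    return fit + jump_term


if __name__ == "__main__":
    uniform = lambda t: 1.0  # uniform density on C = [0, 1]
    print(staircase_objective([], [], uniform, 0.0, 1.0))  # approx. 1/3
    print(staircase_objective([0.5], [0.5], uniform, 0.0, 1.0))  # approx. 1/12 + 1
```

For the uniform density on [0, 1], the fit term is 1/3 without steps and 1/12 with a single step of size 0.5 at 0.5, to which the jump term adds p(0.5) = 1.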

Interpretations

Unifying perspective:
• Vector quantization (k-means)
• Bump-hunting and max-cut
• Optimal coding: "finding a short code for X that preserves the maximum information about X itself." ($L_2$ → KL)
• Optimal bin placement

Main message:
• Optimization view of clustering
• Clustering → study of the class of staircases (cf. classification).
