Clustering shrinkage, $L_0$ and Staircases


  1. Clustering shrinkage, $L_0$ and Staircases
     K. PELCKMANS, J.A.K. SUYKENS, B. DE MOOR
     NIPS workshop on theoretical foundations of clustering, December 2005
     KULeuven - Department of Electrical Engineering - SCD/SISTA
     Kasteelpark Arenberg 10, 3001 Heverlee (Leuven), Belgium
     Kristiaan.Pelckmans@esat.kuleuven.ac.be
     K. PELCKMANS, K.U.Leuven - SCD/SISTA

  9. Optimization view to Clustering
     Empirical Convex Clustering Shrinkage:
     • Dataset $\{x_i\}_{i=1}^N \subset \mathbb{R}^D$
     • N centroids: $\{M_i\}_{i=1}^N \subset \mathbb{R}^D$
     • Given $\gamma \ge 0$
     • Distance measure $\|\cdot\|$
     • Convex complexity measure $\ell : \mathbb{R}^D \to \mathbb{R}^+$
     [Figure: panels for γ = 0, γ = 10 and γ = 10000]
     Convex Programming Problem:
     $$\min_{M_i} J_\gamma(M_i) = \frac{1}{2} \sum_{i=1}^N \|x_i - M_i\|_p + \gamma \sum_{i<j} \ell(M_i - M_j)$$
     → Pelckmans et al., Convex Clustering Shrinkage, PASCAL workshop 2005
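The convex programming problem on this slide can be made concrete by simply evaluating its objective. The following is a minimal sketch, not the authors' implementation: it assumes a p-norm with p = 2 as the distance measure and the 1-norm as the complexity measure ℓ, and the names `p_norm`, `l1` and `ccs_objective` are hypothetical helpers introduced only for illustration.

```python
from itertools import combinations


def p_norm(v, p=2.0):
    """p-norm of a vector given as a sequence of floats."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)


def l1(v):
    """1-norm, used here as one possible convex complexity measure ell."""
    return sum(abs(c) for c in v)


def ccs_objective(X, M, gamma, p=2.0, ell=l1):
    """Evaluate 1/2 * sum_i ||x_i - M_i||_p + gamma * sum_{i<j} ell(M_i - M_j)."""
    # Data-fit term: one centroid M_i per data point x_i.
    fit = 0.5 * sum(
        p_norm([a - b for a, b in zip(x, m)], p) for x, m in zip(X, M)
    )
    # Fusion term: penalize every pairwise difference between centroids.
    fusion = sum(
        ell([a - b for a, b in zip(M[i], M[j])])
        for i, j in combinations(range(len(M)), 2)
    )
    return fit + gamma * fusion


X = [(0.0, 0.0), (1.0, 0.0), (4.0, 3.0)]
print(ccs_objective(X, X, gamma=0.0))                 # every point is its own centroid
print(ccs_objective(X, [(0.0, 0.0)] * 3, gamma=10.0)) # all centroids fused at the origin
```

Evaluating the objective for a few hand-picked centroid sets shows the trade-off: fused centroids pay nothing in the second term but more in the first.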

  13. • $\gamma = 0$: $M_i = X_i$
      • $\gamma \to +\infty$: $M_1 = \dots = M_N = \bar{X}$
      • $\ell = |\cdot|_1$
      • Ranging γ: increasing number of sparse differences
      Univariate $x_i \in \mathbb{R}$:
      $M_i$ → Discrete
      $m(x_i)$ → Continuous
      [Figure: $m(X)$ plotted against $X$, with two points $x$ and $x'$ for which $m(x) = m(x')$]
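The two limiting cases and the notion of "sparse differences" can be illustrated by counting how many pairwise centroid differences are exactly zero. This is a sketch under stated assumptions: the data are univariate, the centroid sets are hand-picked to mimic the two limits and one intermediate case (they are not the output of a solver), and `zero_differences` and `distinct_levels` are hypothetical helper names.

```python
from itertools import combinations


def zero_differences(M, tol=1e-12):
    """Count pairs i < j whose univariate centroids coincide (up to tol)."""
    return sum(1 for a, b in combinations(M, 2) if abs(a - b) <= tol)


def distinct_levels(M, tol=1e-12):
    """Return the sorted distinct centroid values, i.e. the cluster levels."""
    levels = []
    for m in sorted(M):
        if not levels or m - levels[-1] > tol:
            levels.append(m)
    return levels


X = [0.0, 1.0, 2.0, 9.0]
mean = sum(X) / len(X)

candidates = {
    "gamma = 0 (M_i = X_i)": X,
    "intermediate (hand-picked)": [1.0, 1.0, 1.0, 9.0],
    "gamma -> +inf (all equal to the mean)": [mean] * len(X),
}
for label, M in candidates.items():
    print(label, zero_differences(M), distinct_levels(M))
```

The number of distinct levels plays the role of the number of clusters, and it drops as more differences become zero.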

  14. Clustering Shrinkage (Cont'd)
      Modifications:
      • 0-norm (count different pairs) → non-convex but interpretability!
      • $\epsilon$-neighborhood: $B(\epsilon)$ ball with measure $|B(\epsilon)|$
      $$\hat{m}_\epsilon = \arg\min_{m : \mathbb{R}^D \to \mathbb{R}^D} J_\gamma^{\epsilon,p}(m) = \frac{1}{p} \sum_{i=1}^N \|m(x_i) - x_i\|_p + \frac{\gamma}{|B(\epsilon)|} \sum_{i=1}^N \sum_{\|x_i - x_j\| \le \epsilon} I\bigl(\|m(x_i) - m(x_j)\| > 0\bigr) \qquad (1)$$
      → the second term measures the density of differently assigned datapoints in a local neighborhood (cf. histogram density estimator).
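To see what the second term of (1) counts, the objective can be evaluated for a candidate mapping on a small univariate sample. This is only a sketch: it assumes the univariate case (so every p-norm reduces to the absolute value and the ball measure is taken as 2ε), it reads the weights in (1) as 1/p and γ/|B(ε)|, it counts ordered pairs (i, j), and `empirical_l0_objective` is a hypothetical name.

```python
def empirical_l0_objective(x, m, gamma, eps, p=2.0):
    """Evaluate an empirical 0-norm shrinkage objective for univariate data.

    x     : list of floats (the data points)
    m     : callable mapping a point to its assigned centroid
    gamma : trade-off parameter, gamma >= 0
    eps   : radius of the neighborhood B(eps); its measure is taken as 2 * eps
    """
    mx = [m(xi) for xi in x]
    # Data-fit term; in one dimension the p-norm is just the absolute value.
    fit = sum(abs(a - b) for a, b in zip(mx, x)) / p
    # Count ordered pairs that are close in x but assigned to different values.
    split_pairs = sum(
        1
        for i in range(len(x))
        for j in range(len(x))
        if abs(x[i] - x[j]) <= eps and abs(mx[i] - mx[j]) > 0
    )
    return fit + gamma * split_pairs / (2.0 * eps)


x = [0.0, 0.1, 1.0, 1.1]
two_clusters = lambda t: 0.05 if t < 0.5 else 1.05
identity = lambda t: t

print(empirical_l0_objective(x, two_clusters, gamma=1.0, eps=0.2))
print(empirical_l0_objective(x, identity, gamma=1.0, eps=0.2))
```

The two-cluster mapping pays a small data-fit cost but splits no neighboring pair, whereas the identity mapping fits perfectly and is penalized for every pair of distinct neighbors.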

  15. Clustering Shrinkage (Cont'd)
      Definition 1. [Theoretical Shrinkage Clustering] Let $m : \mathbb{R} \to \mathbb{R}$ be such that
      $$\lim_{\|\delta\| \to 0} \frac{m(x - \delta) - m(x + \delta)}{|B(\|\delta\|)|}$$
      exists almost everywhere. Let the cdf $P(x)$ underlying the dataset be known and assume its pdf $p(x)$ exists everywhere and is nonzero on a connected compact interval $C \subset \mathbb{R}$ with nonzero measure $|C| > 0$. We will study the following theoretical counterpart to (1):
      $$\hat{m} = \arg\min_{m : \mathbb{R} \to \mathbb{R}} J_\gamma^{p,0}(m) = \int_C \|m(x) - x\|_p \, dP(x) + \gamma \int_C \|m'(x)\|_0 \, dP(x), \qquad (2)$$
      where we define the latter term, denoted further as the zero-norm variation, formally as follows:
      $$\|m'(x)\|_0 \triangleq \lim_{\epsilon \to 0} \left( \frac{I\bigl(m(B(x;\epsilon)) \neq \text{const}\bigr)}{|B(x,\epsilon)|} \right), \qquad (3)$$
      with the characteristic function $I\bigl(m(B(x;\epsilon)) \neq \text{const}\bigr)$ equal to one if $\exists\, y \in B(x;\epsilon)$ such that $\|m(x) - m(y)\| > 0$ ($B(x,\epsilon)$ contains parts of different clusters), and equal to zero otherwise.
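A short worked case may help in reading (3). Assume, purely for illustration, that $m$ is piecewise constant with a single jump at a point $x_0$ in the interior of $C$, that $p$ is continuous at $x_0$, and that the ball is the interval of length $2\epsilon$ around $x$. Exchanging the limit with the integral is taken for granted here rather than proved.

```latex
\[
I\bigl(m(B(x;\epsilon)) \neq \text{const}\bigr) = 1
\quad\Longleftrightarrow\quad |x - x_0| < \epsilon
\qquad \text{(for almost every } x\text{)},
\qquad |B(x,\epsilon)| = 2\epsilon ,
\]
\[
\int_C \frac{I\bigl(m(B(x;\epsilon)) \neq \text{const}\bigr)}{|B(x,\epsilon)|}\, dP(x)
= \frac{1}{2\epsilon} \int_{x_0 - \epsilon}^{x_0 + \epsilon} p(x)\, dx
\;\xrightarrow[\;\epsilon \to 0\;]{}\; p(x_0).
\]
```

Under these assumptions each jump of $m$ contributes the value of the density at its location, which is consistent with the sum of density values at the step locations appearing in (5).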

  16. Clustering Shrinkage (Cont'd)
      Theorem 1. [Univariate Staircase Representation] When $P(x)$ is a fixed, smooth and differentiable distribution function with pdf $p : \mathbb{R} \to \mathbb{R}^+$ which is nonzero on a compact interval $C \subset \mathbb{R}$, the minimizer of (2) takes the form of a staircase function uniquely defined on $C$ with a finite number of positive steps (say $K < +\infty$) of size $a = (a_1, \dots, a_K)^T \in \mathbb{R}^K$ at the points $D_{(K)} = \{x_{(k)}\}_{k=1}^K \subset C$:
      $$\hat{m}\bigl(x; a, D_{(K)}\bigr) = \sum_{k=1}^K a_k \, I\bigl(x > x_{(k)}\bigr) \quad \text{s.t. } a_k \ge 0,\ x_{(k)} \in C \ \ \forall k \qquad (4)$$
      Moreover, the optimization problem (2) is equivalent to the problem
      $$\min_{a, D_{(K)}} J_K^p\bigl(a, D_{(K)}\bigr) = \int_C \left\| \sum_{k=1}^K a_k \, I\bigl(x > x_{(k)}\bigr) - x \right\|_p p(x)\, dx + \sum_{k=1}^K p\bigl(x_{(k)}\bigr), \qquad (5)$$
      where $K \in \mathbb{N}$ relates to $\gamma \in \mathbb{R}^+$ in a way depending on $D$.
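The staircase form (4) and the objective (5) are straightforward to evaluate numerically for a given set of steps. The sketch below makes several assumptions that are not in the slides: the data are univariate, the density is uniform on C = [0, 1], the integral is approximated with a midpoint rule, and `staircase` and `staircase_objective` are hypothetical names. It evaluates the objective for fixed steps and does not attempt the minimization.

```python
def staircase(x, a, knots):
    """Evaluate sum_k a_k * I(x > x_(k)) for step sizes a at locations knots."""
    return sum(ak for ak, xk in zip(a, knots) if x > xk)


def staircase_objective(a, knots, pdf, lo, hi, n=1000):
    """Midpoint-rule approximation of the objective in (5) on C = [lo, hi]."""
    h = (hi - lo) / n
    fit = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h  # midpoint of the i-th cell
        # In one dimension the p-norm reduces to the absolute value.
        fit += abs(staircase(x, a, knots) - x) * pdf(x) * h
    # Each step pays the density value at its location.
    return fit + sum(pdf(xk) for xk in knots)


def uniform(x):
    """Uniform density on [0, 1] (an assumption made for this example)."""
    return 1.0


a, knots = [0.5, 0.5], [0.25, 0.75]
print(staircase(0.5, a, knots))
print(staircase_objective(a, knots, uniform, 0.0, 1.0))
```

Note that a staircase of the form (4) is zero to the left of the first step, so in this example the lowest level is 0 and the levels are 0, 0.5 and 1.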

  22. Interpretations
      Unifying perspective:
      • Vector Quantization (k-means)
      • Bump-hunting and max-cut
      • Optimal coding: "finding a short code for X that preserves the maximum information about X itself." $L_2$ → KL
      • Optimal bin placement
      Main message:
      • Optimization view to clustering
      • Clustering → study of the class of staircases (cf. classification).
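The link between vector quantization and staircases can be checked directly in one dimension, where a nearest-centroid quantizer is a piecewise constant function with steps at the midpoints between neighboring centroids. The sketch below is an illustration, not part of the talk: the function names are hypothetical, and the staircase carries an explicit offset for the lowest level, which the form (4) does not include.

```python
def quantize(x, centroids):
    """Nearest-centroid quantizer for univariate x."""
    return min(centroids, key=lambda c: abs(x - c))


def quantizer_as_staircase(centroids):
    """Return (offset, knots, steps) describing the quantizer as a staircase."""
    c = sorted(set(centroids))
    knots = [(u + v) / 2.0 for u, v in zip(c, c[1:])]  # midpoints between levels
    steps = [v - u for u, v in zip(c, c[1:])]          # positive step sizes
    return c[0], knots, steps


def staircase_with_offset(x, offset, knots, steps):
    """Evaluate offset + sum_k steps_k * I(x > knots_k)."""
    return offset + sum(s for s, k in zip(steps, knots) if x > k)


centroids = [1.0, 4.0, 2.0]
offset, knots, steps = quantizer_as_staircase(centroids)
for x in [0.0, 1.4, 1.6, 2.9, 3.1, 10.0]:
    print(x, quantize(x, centroids), staircase_with_offset(x, offset, knots, steps))
```

Away from the midpoints themselves, where the nearest centroid is not unique, the two functions agree.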
