Clustering shrinkage, L0 and Staircases


Clustering shrinkage, L 0 and Staircases K. PELCKMANS, J.A.K. SUYKENS, B. DE MOOR NIPS workshop on theoretical foundations of clustering December 2005 KULeuven - Department of Electrical Engineering - SCD/SISTA Kasteelpark Arenberg 10, 3001


SLIDE 1

Clustering shrinkage, L0 and Staircases

  • K. PELCKMANS, J.A.K. SUYKENS, B. DE MOOR

NIPS workshop on theoretical foundations of clustering December 2005

KULeuven - Department of Electrical Engineering - SCD/SISTA Kasteelpark Arenberg 10, 3001 Heverlee (Leuven), Belgium Kristiaan.Pelckmans@esat.kuleuven.ac.be

  • K. PELCKMANS

K.U.Leuven - SCD/SISTA

SLIDE 9

Empirical CCS

  • Theoretical CCS

Optimization view to Clustering

Empirical Convex Clustering Shrinkage:

  • Dataset $\{x_i\}_{i=1}^{N} \subset \mathbb{R}^D$
  • N centroids: $\{M_i\}_{i=1}^{N} \subset \mathbb{R}^D$
  • Given $\gamma \ge 0$
  • Distance measure $\|\cdot\|$
  • Convex complexity measure $\ell : \mathbb{R}^D \to \mathbb{R}^+$

[Figure: three panels, γ = 0, γ = 10 and γ = 10000]

Convex Programming Problem:

$$\min_{M_i} J_\gamma(M_i) = \frac{1}{2} \sum_{i=1}^{N} \| x_i - M_i \|_p + \gamma \sum_{i<j} \ell(M_i - M_j)$$

→ Pelckmans et al., Convex Clustering Shrinkage, PASCAL workshop 2005
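The objective of the convex programming problem can be evaluated directly for a candidate set of centroids. The following is a minimal sketch, not the authors' implementation: the function name `ccs_objective`, the default choice of the 1-norm for ℓ, and the toy data are all assumptions made for illustration, and the data term is coded exactly as it is written above (a p-norm, not a squared norm).

```python
import numpy as np

def ccs_objective(X, M, gamma, p=2, ell=None):
    """Evaluate 0.5 * sum_i ||x_i - M_i||_p + gamma * sum_{i<j} ell(M_i - M_j).

    X, M : arrays of shape (N, D), data points and their centroids.
    ell  : convex complexity measure; defaults to the 1-norm (an assumption).
    """
    X = np.asarray(X, dtype=float)
    M = np.asarray(M, dtype=float)
    if ell is None:
        ell = lambda d: np.abs(d).sum()
    n = len(M)
    # Data-fit term: distance of every point to its own centroid.
    fit = 0.5 * sum(np.linalg.norm(x - m, ord=p) for x, m in zip(X, M))
    # Shrinkage term: complexity of all pairwise centroid differences.
    penalty = sum(ell(M[i] - M[j]) for i in range(n) for j in range(i + 1, n))
    return fit + gamma * penalty

# Hypothetical toy data: three points on a line in R^2.
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
M_mean = np.tile(X.mean(axis=0), (len(X), 1))
cost_identity = ccs_objective(X, X, gamma=1.0)     # every point is its own centroid
cost_merged = ccs_objective(X, M_mean, gamma=1.0)  # all centroids at the mean
```

With γ = 0 the choice M = X costs nothing, while for γ = 1 on this toy data the fully merged solution is already cheaper than keeping every point as its own centroid.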

SLIDE 13

  • γ = 0: $M_i = x_i$
  • γ → +∞: $M_1 = \dots = M_N = \bar{X}$
  • $\ell = |\cdot|_1$
  • Ranging over γ: an increasing number of sparse differences

[Figure: the mapping m(X) plotted against X; two points x and x' with the same centroid satisfy m(x) = m(x')]

Univariate case, $x_i \in \mathbb{R}$: $M_i$ → discrete, $m(x_i)$ → continuous
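The two limiting cases can be checked by hand on the smallest possible problem: two univariate points. The sketch below assumes a squared loss, 0.5 (x_i − M_i)², together with ℓ = |·|; the squared loss is an assumption made so that a closed form exists, and the function names are hypothetical. Setting the derivative to zero shows that each centroid moves a distance γ towards the other until they meet at the mean, which happens at γ = |x_1 − x_2| / 2.

```python
import numpy as np

def two_point_ccs(x1, x2, gamma):
    """Minimizer of 0.5*(x1-M1)**2 + 0.5*(x2-M2)**2 + gamma*|M1-M2|."""
    if gamma >= abs(x2 - x1) / 2.0:
        # The difference M1 - M2 has become exactly zero: one cluster.
        mean = (x1 + x2) / 2.0
        return mean, mean
    # Otherwise each centroid is shifted by gamma towards the other point.
    step = gamma if x1 <= x2 else -gamma
    return x1 + step, x2 - step

def two_point_objective(x1, x2, m1, m2, gamma):
    """Objective value for a candidate pair of centroids (m1, m2)."""
    return 0.5 * (x1 - m1) ** 2 + 0.5 * (x2 - m2) ** 2 + gamma * abs(m1 - m2)

# Path of solutions for the hypothetical points x1 = 0, x2 = 2.
path = [two_point_ccs(0.0, 2.0, g) for g in (0.0, 0.5, 1.0, 10.0)]
```

For γ = 0 the centroids coincide with the data, and from γ = 1 onwards both sit at the mean, so the single pairwise difference has become sparse.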

SLIDE 14

Clustering Shrinkage (Cont'd)

Modifications:

  • 0-norm (count different pairs) → non-convex, but interpretable!
  • ε-neighborhood: ball B(ε) with measure |B(ε)|

$$\hat{m}_\varepsilon = \arg\min_{m:\mathbb{R}^D \to \mathbb{R}^D} J_\gamma^{\varepsilon,p}(m) = \frac{1}{p} \sum_{i=1}^{N} \| m(x_i) - x_i \|_p + \frac{\gamma}{|B(\varepsilon)|} \sum_{i=1}^{N} \sum_{j:\, \|x_i - x_j\| \le \varepsilon} I\big( \| m(x_i) - m(x_j) \| > 0 \big), \qquad (1)$$

→ the second term measures the density of differently assigned data points in a local neighborhood (cf. histogram density estimator).
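The second term of (1) is a simple count and can be sketched in a few lines. This illustration is restricted to univariate data and assumes |B(ε)| = 2ε, the length of an interval of radius ε; the function name and the toy assignment are hypothetical. As in (1), the double sum runs over ordered pairs, so each differently assigned neighboring pair is counted twice.

```python
import numpy as np

def local_zero_norm_penalty(x, m, eps):
    """Count ordered pairs (i, j) with |x_i - x_j| <= eps and m_i != m_j,
    divided by the assumed ball measure |B(eps)| = 2 * eps."""
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    close = np.abs(x[:, None] - x[None, :]) <= eps   # pairs inside the eps-ball
    differ = np.abs(m[:, None] - m[None, :]) > 0     # pairs assigned differently
    return np.sum(close & differ) / (2.0 * eps)

x = [0.0, 0.1, 1.0, 1.1]
clean = local_zero_norm_penalty(x, [0.0, 0.0, 1.0, 1.0], eps=0.5)  # cut falls in the gap
split = local_zero_norm_penalty(x, [0.0, 1.0, 1.0, 1.0], eps=0.5)  # cut separates neighbors
```

An assignment whose boundary falls in the empty gap between the two groups is not penalized, whereas one that separates close neighbors is.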

SLIDE 15

Clustering Shrinkage (Cont'd)

Definition 1. [Theoretical Shrinkage Clustering] Let $m : \mathbb{R} \to \mathbb{R}$ be such that

$$\lim_{\delta \to 0} \frac{m(x - \delta) - m(x + \delta)}{|B(\delta)|}$$

exists almost everywhere. Let the cdf P(x) underlying the dataset be known, and assume its pdf p(x) exists everywhere and is nonzero on a connected compact interval $C \subset \mathbb{R}$ with nonzero measure |C| > 0. We will study the following theoretical counterpart to (1):

$$\hat{m} = \arg\min_{m:\mathbb{R} \to \mathbb{R}} J_\gamma^{p,0}(m) = \int_C \| m(x) - x \|_p \, dP(x) + \gamma \int_C \| m'(x) \|_0 \, dP(x), \qquad (2)$$

where the latter term, referred to below as the zero-norm variation, is formally defined as

$$\| m'(x) \|_0 \triangleq \lim_{\varepsilon \to 0} \left( \frac{I\big( m(B(x;\varepsilon)) \neq \text{const} \big)}{|B(x;\varepsilon)|} \right), \qquad (3)$$

with the characteristic function $I\big( m(B(x;\varepsilon)) \neq \text{const} \big)$ equal to one if $\exists y \in B(x;\varepsilon)$ such that $|m(x) - m(y)| > 0$ (that is, $B(x;\varepsilon)$ contains parts of different clusters), and equal to zero otherwise.
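As a sanity check on the zero-norm variation, consider a mapping with a single step at a point x_0 in the interior of C. The derivation below is a sketch under stated assumptions, none of which appear on the slides: the univariate ball has measure |B(x; ε)| = 2ε, p is continuous at x_0, and the limit in (3) is taken after integrating against P.

```latex
% Assumption: m(x) = a\, I(x > x_0) with a > 0, and |B(x;\varepsilon)| = 2\varepsilon.
\begin{align*}
I\big( m(B(x;\varepsilon)) \neq \text{const} \big)
  &= \begin{cases} 1, & |x - x_0| < \varepsilon, \\ 0, & |x - x_0| > \varepsilon, \end{cases} \\
\int_C \frac{I\big( m(B(x;\varepsilon)) \neq \text{const} \big)}{2\varepsilon} \, dP(x)
  &= \frac{P(x_0 + \varepsilon) - P(x_0 - \varepsilon)}{2\varepsilon}
  \;\xrightarrow[\varepsilon \to 0]{}\; p(x_0).
\end{align*}
```

Under this reading each step contributes the density at its location, which agrees with the sum of p(x(k)) terms appearing in (5).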

SLIDE 16

Clustering Shrinkage (Cont'd)

Theorem 1. [Univariate Staircase Representation] When P(x) is a fixed, smooth and differentiable distribution function with pdf $p : \mathbb{R} \to \mathbb{R}^+$ which is nonzero on a compact interval $C \subset \mathbb{R}$, the minimizer of (2) takes the form of a staircase function uniquely defined on C with a finite number of positive steps (say $K < +\infty$) of size $a = (a_1, \dots, a_K)^T \in \mathbb{R}^K$ at the points $D_{(K)} = \{x_{(k)}\}_{k=1}^{K} \subset C$:

$$\hat{m}\big(x; a, D_{(K)}\big) = \sum_{k=1}^{K} a_k \, I\big(x > x_{(k)}\big) \quad \text{s.t. } a_k \ge 0,\; x_{(k)} \in C \;\; \forall k \qquad (4)$$

Moreover, the optimization problem (2) is equivalent to the problem

$$\min_{a, D_{(K)}} J_K^{p}\big(a, D_{(K)}\big) = \int_C \left\| \sum_{k=1}^{K} a_k \, I\big(x > x_{(k)}\big) - x \right\|_p p(x)\,dx + \sum_{k=1}^{K} p\big(x_{(k)}\big), \qquad (5)$$

where $K \in \mathbb{N}$ relates to $\gamma \in \mathbb{R}^+$ in a way depending on D.
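The staircase representation (4) is straightforward to evaluate. The following sketch is only an illustration of the formula, not code from the authors; the function name `staircase` and the example step sizes and locations are hypothetical.

```python
import numpy as np

def staircase(x, a, knots):
    """Evaluate m(x) = sum_k a_k * I(x > x_(k)) with step sizes a_k >= 0."""
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    knots = np.asarray(knots, dtype=float)
    if np.any(a < 0):
        raise ValueError("step sizes a_k must be non-negative")
    # Indicator matrix: one column per step, true where x lies strictly
    # to the right of that step location.
    active = (x[..., None] > knots).astype(float)
    return active @ a

# Two steps of size 1 and 2, located at x = 1 and x = 2.
values = staircase([0.0, 1.0, 1.5, 3.0], a=[1.0, 2.0], knots=[1.0, 2.0])
```

A staircase with K steps takes at most K + 1 distinct values, so every input is mapped to one of at most K + 1 levels, which is how the staircase induces a clustering of the line.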

SLIDE 22

Interpretations

Unifying perspective:

  • Vector Quantization (k-means)
  • Bump-hunting and max-cut
  • Optimal coding: "finding a short code for X that preserves the maximum information about X itself." L2 → KL
  • Optimal bin placement

Main message:

  • Optimization view to clustering
  • Clustering → study of the class of staircases (cf. classification).