SLIDE 1

Clustering by Support Vector Manifold Learning

Marcin Orchel

AGH University of Science and Technology in Poland

SLIDE 2

Problem and My Contributions

Problem

Clusters can be characterized by a boundary, a center (prototype), a cluster core, or a characteristic manifold. The multiple manifold learning problem is to fit multiple manifolds (hypersurfaces) to data points and to generalize to unseen data.

Approach

Support vector manifold learning (SVML) transforms the feature space into a kernel-induced feature space and then fits the data with a hypothesis space containing only hyperplanes, which generalizes well. Fitting the data with SVML requires a regression method that works entirely in the kernel-induced feature space. SVML duplicates points in the kernel-induced feature space, shifts the copies in the direction of a given vector, and solves a classification problem (the derivation below makes the shift explicit).
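One way to make the shift concrete (a sketch; $\varphi$ denotes the kernel-induced feature map and $y_i = \pm 1$ marks the two shifted copies, matching the notation of Eq. (1) on slide 4): the shifted images are

$$\varphi'(\vec{x}_i) = \varphi(\vec{x}_i) + y_i t \, \varphi(\vec{c}),$$

and expanding the inner product $\varphi'(\vec{x}_i) \cdot \varphi'(\vec{x}_j)$ gives

$$K_o(\vec{x}_i, \vec{x}_j) + y_j t K_o(\vec{x}_i, \vec{c}) + y_i t K_o(\vec{c}, \vec{x}_j) + y_i y_j t^2 K_o(\vec{c}, \vec{c}),$$

which is exactly the kernel of Eq. (1), so the shift never needs to be computed explicitly.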

SLIDE 3

Comparison of Manifold Learning Methods

[Figure 1: three panels (a), (b), (c), each plotting y against x.]

Fig. 1: Manifold learning. Points are examples. (a) Points generated from a circle. Solid line: solution of one-class support vector machines (OCSVM) for C = 1.0, σ = 0.9; dashed line: solution of SVML for C = 100.0, σ = 0.9, t = 0.01; thin dotted line: solution of kernel principal component analysis (KPCA) for σ = 0.9. (b) Points generated from a Lissajous curve. Solid line: OCSVM for C = 1000.0, σ = 0.5; dashed line: SVML for C = 100000.0, σ = 0.8, t = 0.01; thin dotted line: KPCA for σ = 0.5. (c) Solid line: SVML for c = 0, C = 100.0, σ = 0.9, t = 0.01; dashed line: SVML for random values of c, C = 100.0, σ = 0.9, t = 0.01.

SLIDE 4

Support Vector Manifold Learning (SVML)

The kernel function for two data points $\vec{x}_i$ and $\vec{x}_j$, for $i, j = 1, \ldots, n$, is

$$K\left(\vec{x}_i, \vec{x}_j\right) = K_o\left(\vec{x}_i, \vec{x}_j\right) + y_j t K_o\left(\vec{x}_i, \vec{c}\right) + y_i t K_o\left(\vec{c}, \vec{x}_j\right) + y_i y_j t^2 K_o\left(\vec{c}, \vec{c}\right), \quad (1)$$

where $\vec{c}$ is the shifting direction defined in the original feature space, $t$ is the translation parameter, $y_i = 1$ for a point shifted up, and $y_i = -1$ for a point shifted down. The cross kernel is

$$K\left(\vec{x}_i, \vec{x}\right) = K_o\left(\vec{x}_i, \vec{x}\right) + y_i t K_o\left(\vec{c}, \vec{x}\right). \quad (3)$$

The number of support vectors is at most $n + 1$. The solution is

$$\sum_{i=1}^{n} \left(\alpha_i - \alpha_{i+n}\right) K\left(\vec{x}_i, \vec{x}\right) + \sum_{i=1}^{n} \left(\alpha_i + \alpha_{i+n}\right) t K\left(\vec{c}, \vec{x}\right) + b = 0. \quad (4)$$
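A minimal sketch of how Eqs. (1)–(4) can be evaluated numerically, assuming an RBF base kernel $K_o$ and using scikit-learn's SVC on a precomputed Gram matrix as the classification solver (the slides do not name a solver; the helper rbf and all parameter defaults are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

def rbf(A, B, sigma):
    # base kernel K_o: Gaussian RBF of width sigma
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def svml_fit(X, c, t=0.01, C=100.0, sigma=0.9):
    # duplicate every point; copies carry labels y = +1 (shifted up) / -1 (shifted down)
    n = X.shape[0]
    Xd = np.vstack([X, X])
    y = np.hstack([np.ones(n), -np.ones(n)])
    Ko = rbf(Xd, Xd, sigma)
    kc = rbf(Xd, c[None, :], sigma).ravel()         # K_o(x_i, c)
    kcc = rbf(c[None, :], c[None, :], sigma)[0, 0]  # K_o(c, c)
    # Eq. (1): the shift is absorbed into the kernel, never computed explicitly
    K = (Ko
         + t * (y[None, :] * kc[:, None] + y[:, None] * kc[None, :])
         + t ** 2 * np.outer(y, y) * kcc)
    clf = SVC(C=C, kernel="precomputed").fit(K, y)
    return clf, Xd, y

def svml_decision(clf, Xd, y, c, t, sigma, X_new):
    # Eq. (3): cross kernel between new points and the duplicated training set
    K_cross = rbf(X_new, Xd, sigma) + t * y[None, :] * rbf(X_new, c[None, :], sigma)
    # the fitted manifold is the zero level set of this value (cf. Eq. (4))
    return clf.decision_function(K_cross)
```

Points where svml_decision returns a value near zero lie on the learned manifold; the solid and dashed curves of Fig. 1 are such zero level sets.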

SLIDE 5

Model with Shifted Hyperplanes

Proposition 1

Shifting a hyperplane by any value of $\vec{c}$ gives a new hyperplane that differs from the original only in the free term $b$.
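A one-line check (a sketch, writing the hyperplane as $\vec{w} \cdot \vec{x} + b = 0$ as in Lemma 2 below): if every point $\vec{x}$ of the hyperplane is translated to $\vec{x} + t\vec{c}$, the translated points satisfy

$$\vec{w} \cdot \left(\vec{x} + t\vec{c}\right) + \left(b - t\, \vec{w} \cdot \vec{c}\right) = \vec{w} \cdot \vec{x} + b = 0,$$

a hyperplane with the same normal $\vec{w}$ whose free term has changed from $b$ to $b - t\,\vec{w} \cdot \vec{c}$.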

Lemma 1

After duplicating and shifting an $(n-1)$-dimensional hyperplane constrained by an $(n-1)$-dimensional hypersphere, the maximal distance from the original center of the hypersphere to any point belonging to a shifted $(n-2)$-dimensional hypersphere is attained for a point such that, after projecting this point onto the $(n-1)$-dimensional hyperplane (before the shift), the vector from $0$ to this point is parallel to the vector from $0$ to the projected center of one of the shifted $(n-2)$-dimensional hyperspheres.

SLIDE 6

Model with Shifted Hyperplanes

Lemma 2

The radius $R_n$ of a minimal hypersphere containing both hyperplanes constrained by an $(n-1)$-dimensional hypersphere after shifting is equal to

$$R_n = \left\| \vec{c} + R \frac{\vec{c}_m}{\|\vec{c}_m\|} \right\|, \quad (5)$$

where $\vec{c}_m$ is defined as

$$\vec{c}_m = \vec{c} - \frac{b + \vec{w} \cdot \vec{c}}{\|\vec{w}\|^2} \vec{w} \quad (6)$$

and $\vec{c}_m \neq 0$. For $\vec{c}_m = 0$, we get $R_n = \sqrt{\|\vec{c}\|^2 + R^2}$.
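A quick numerical sanity check of Eq. (6) (a sketch; the random hyperplane, the dimension, and the tolerance are arbitrary choices): $\vec{c}_m$ is the projection of $\vec{c}$ onto the hyperplane $\vec{w} \cdot \vec{x} + b = 0$, so it must satisfy the hyperplane equation.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)   # arbitrary hyperplane normal
b = rng.normal()         # arbitrary free term
c = rng.normal(size=5)   # arbitrary shifting direction

# Eq. (6): c_m = c - ((b + w . c) / ||w||^2) w
c_m = c - ((b + w @ c) / (w @ w)) * w

# the projection lies on the hyperplane: w . c_m + b = 0
assert abs(w @ c_m + b) < 1e-9
```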

SLIDE 7

Generalization Bounds for Shifted Hyperplanes

We can improve the generalization bounds when

$$\frac{\left\| \vec{c} + R \frac{\vec{c}_m}{\|\vec{c}_m\|} \right\|^2 D^2}{\left( 1 + \frac{D}{\|\vec{c}_p\|} \right)^2} \leq R^2 D^2, \quad (7)$$

$$\frac{\left\| \vec{c} + R \frac{\vec{c}_m}{\|\vec{c}_m\|} \right\|^2}{\left( 1 + \frac{D}{\|\vec{c}_p\|} \right)^2} \leq R^2. \quad (8)$$

For a special case, when $\vec{c}_m = 0$, we get

$$\frac{\|\vec{c}_p\|^2}{\left( 1 + \frac{D}{\|\vec{c}_p\|} \right)^2} \leq R^2. \quad (9)$$

SLIDE 8

Model with Shifted Hyperplanes

Proposition 2

When $\vec{c}_p$ is constant and $2\|\vec{c}_m\| \leq R$, the solution of maximizing the margin between the two $(n-2)$-dimensional hyperspheres is equivalent to the hyperplane that contains the $(n-2)$-dimensional hypersphere before duplicating and shifting.

SLIDE 9

Performance measure

For OCSVM, the distance between a point $\vec{r}$ and the minimal hypersphere in a kernel-induced feature space can be computed as

$$R - \left( \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j K\left(\vec{x}_i, \vec{x}_j\right) - 2 \sum_{j=1}^{n} \alpha_j K\left(\vec{x}_j, \vec{r}\right) + K\left(\vec{r}, \vec{r}\right) \right)^{1/2}. \quad (10)$$

For kernels for which $K(\vec{x}, \vec{x})$ is constant, such as the radial basis function (RBF) kernel, the radius $R$ can be computed as

$$R = \sqrt{ K\left(\vec{x}, \vec{x}\right) + \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j K\left(\vec{x}_i, \vec{x}_j\right) + 2 b^* }. \quad (12)$$
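A sketch of Eqs. (10) and (12) in code, reusing the illustrative rbf helper and numpy import from the SVML sketch above; alpha and b_star are assumed to come from a trained OCSVM, and the RBF kernel gives $K(\vec{x}, \vec{x}) = 1$:

```python
def ocsvm_radius(X, alpha, b_star, sigma):
    # Eq. (12): radius for kernels with constant K(x, x); for RBF, K(x, x) = 1
    Kxx = rbf(X, X, sigma)
    return np.sqrt(1.0 + alpha @ Kxx @ alpha + 2.0 * b_star)

def ocsvm_distance(r, X, alpha, sigma, R):
    # Eq. (10): R minus the kernel-space distance from r to the sphere center
    Kxx = rbf(X, X, sigma)
    kr = rbf(X, r[None, :], sigma).ravel()           # K(x_j, r)
    krr = rbf(r[None, :], r[None, :], sigma)[0, 0]   # K(r, r)
    return R - np.sqrt(alpha @ Kxx @ alpha - 2.0 * alpha @ kr + krr)
```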

SLIDE 10

Performance measure

For SVML, the distance between a point $\vec{r}$ and the hyperplane in a kernel-induced feature space can be computed as

$$\frac{\left| \vec{w}_c \cdot \vec{r} + b_c \right|}{\left\|\vec{w}_c\right\|} = \frac{\left| \sum_{i=1}^{n} y_i^c \alpha_i^* K\left(\vec{x}_i, \vec{r}\right) + b_c \right|}{\sqrt{ \sum_{i=1}^{n} \sum_{j=1}^{n} y_i^c y_j^c \alpha_i^* \alpha_j^* K\left(\vec{x}_i, \vec{x}_j\right) }}. \quad (13)$$
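The same quantity as a sketch in code (again reusing the illustrative rbf helper; Xd and y are the duplicated training set and labels $y_i^c$ from the SVML sketch, alpha the dual coefficients $\alpha_i^*$, and b the free term $b_c$):

```python
def svml_distance(r, Xd, y, alpha, b, c, t, sigma):
    # numerator of Eq. (13): |sum_i y_i alpha_i K(x_i, r) + b|,
    # with K(x_i, r) the cross kernel of Eq. (3)
    k_r = (rbf(Xd, r[None, :], sigma).ravel()
           + t * y * rbf(c[None, :], r[None, :], sigma)[0, 0])
    num = abs(np.dot(y * alpha, k_r) + b)
    # denominator of Eq. (13): ||w_c|| via the training kernel of Eq. (1)
    kc = rbf(Xd, c[None, :], sigma).ravel()
    kcc = rbf(c[None, :], c[None, :], sigma)[0, 0]
    K = (rbf(Xd, Xd, sigma)
         + t * (y[None, :] * kc[:, None] + y[:, None] * kc[None, :])
         + t ** 2 * np.outer(y, y) * kcc)
    v = y * alpha
    return num / np.sqrt(v @ K @ v)
```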

SLIDE 11

Comparison of Clustering Methods

First, we map any two points to the same cluster if there do not exist two points between them with different signs of the functional margin. Second, we map the remaining unassigned points to the clusters of their nearest neighbors among the assigned points. (A code sketch of the first step follows Fig. 2.)

[Figure 2: three panels (a), (b), (c), each plotting y against x.]

Fig. 2: Clustering by manifold learning. Points: examples; filled points: support vectors. (a) Solid line: solution of support vector clustering (SVCL) for C = 10000.0, σ = 0.35. (b) Solid line: solution of support vector manifold learning clustering (SVMLC) for C = 100000.0, σ = 1.1, t = 0.01. (c) Solid line: solution of KPCA.
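A minimal sketch of the first assignment step, assuming a fitted decision function f (e.g. svml_decision from the earlier sketch) and an arbitrary sampling density n_checks; the connected components of the resulting graph are the clusters:

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

def svmlc_labels(X, f, n_checks=20):
    # two points join the same cluster when no point sampled between them
    # has a different sign of the functional margin
    n = len(X)
    lam = np.linspace(0.0, 1.0, n_checks)[:, None]
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            seg = (1.0 - lam) * X[i] + lam * X[j]  # points between x_i and x_j
            s = np.sign(f(seg))
            adj[i, j] = adj[j, i] = bool(np.all(s == s[0]))
    _, labels = connected_components(adj, directed=False)
    return labels
```

In the slides' second step, any points left unassigned by this test would be mapped to the cluster of their nearest assigned neighbor.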

SLIDE 12

Results

For the manifold learning experiment, we check the average distance between the points and a solution in a kernel-induced feature space. We validate clustering on classification data sets, assuming that data samples belonging to the same cluster share the same class in the classification problem.

Table 1: Performance of SVMLC, SVCL, KPCA, SVML, and OCSVM on real-world data, part 2. The numbers in the column descriptions denote the methods: 1 = SVMLC, 2 = SVCL, 3 = KPCA for the first row; 1 = SVML, 2 = OCSVM, 3 = KPCA for the second row. The row with id = 0 covers all tests of the clustering experiment; the row with id = 1 covers all tests of the manifold learning experiment. Column descriptions: rs – average rank of the method for the mean error (the best method is in bold); tsf – the Friedman statistic for the average ranks for the mean error (a significant value is in bold); tsn – the Nemenyi statistic for the average ranks for the mean error, reported when the Friedman statistic is significant (a significant value is in bold); sv – average rank for the number of nonzero coefficients (support vectors for support vector machine (SVM) methods); the smallest value is in bold.

id | rs1  | rs2  | rs3  | tsf   | tsn12 | tsn13 | tsn23 | sv1  | sv2  | sv3
0  | 1.71 | 1.93 | 2.36 | 4.5   | –     | –     | –     | 2.83 | 1.67 | 1.5
1  | 1.49 | 2.98 | 1.53 | 33.09 | -4.82 | 0.3   | 5.13  | 1.51 | 2.38 | 2.11
