SLIDE 1

The Variational Nyström Method for Large-Scale Spectral Problems

Max Vladymyrov (Google Inc.) · Miguel Carreira-Perpiñán (EECS, UC Merced)

June 20, 2016

SLIDE 2

Graph-based dimensionality reduction methods

Given high-dimensional data points Y_{D×N} = (y_1, …, y_N) in R^D:
  1. Convert the data points into an N×N affinity matrix M.
  2. Find low-dimensional coordinates X_{d×N} = (x_1, …, x_N) in R^d, so that their similarity is as close as possible to M.

[Figure: high-dimensional input Y in R^D → affinity matrix M → low-dimensional output X in R^d.]

SLIDE 3

Spectral methods

  • Consider a spectral problem:

        min_X tr(X M X^T)   s.t.   X X^T = I,

    where M_{N×N} is a symmetric psd affinity matrix.
  • Examples:
    • Laplacian eigenmaps: M is a graph Laplacian.
    • ISOMAP: M is given by a matrix of shortest distances.
    • Kernel PCA, MDS, Locally Linear Embedding (LLE), etc.
  • The solution is unique and can be found in closed form from the eigenvectors U of M: X = U^T.
  • With large N, solving the eigenproblem is infeasible even if M is sparse.
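A minimal NumPy sketch of this exact solution (our illustration, not from the slides), assuming that, as in Laplacian eigenmaps, the embedding uses the d eigenvectors with the smallest eigenvalues; the O(N^3) cost is what motivates the landmark methods that follow:

```python
# Exact solution of  min_X tr(X M X^T)  s.t.  X X^T = I  (dense, O(N^3) cost).
import numpy as np

def exact_spectral_embedding(M, d):
    """M: N x N symmetric psd affinity; returns the d x N embedding X = U^T."""
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    U = eigvecs[:, :d]                     # d eigenvectors with smallest eigenvalues
    return U.T                             # X = U^T
```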

SLIDE 4

Learning with landmarks

Goal: find a fast, approximate solution for the embedding X using only a subset of L of the original points from Y.

  • Select L landmarks (e.g. a random subset of the points).
  • Compute the reduced L×L affinity matrix.
  • Learn the landmark representation in R^d.
  • Project the rest of the points from R^D to R^d.

[Figure: pipeline from the input points in R^D through the L×L reduced affinity matrix to the landmark embedding and the projection of the remaining points in R^d.]

SLIDE 5

Nyström method

Essentially, an out-of-sample formula:
  • 1. Solve the eigenproblem for a subset of points (the landmarks).
  • 2. Predict the rest of the points through an interpolation formula.

Writing the affinity matrix by blocks (landmarks first):

    M = ( A    B21^T )        C = ( A   )
        ( B21  B22   )            ( B21 )

The approximation to the eigendecomposition is:

    Ũ_M = ( U_A              )
          ( B21 U_A Λ_A^{-1} )

where A = U_A Λ_A U_A^T is the eigendecomposition of the landmark block.
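A minimal sketch of this block computation (ours, not the authors' code), assuming the L landmarks are the first L points and the d smallest nonzero eigenvalues are wanted:

```python
import numpy as np

def nystrom_eigvecs(M, L, d):
    """Approximate d eigenvectors of the N x N matrix M from its landmark block."""
    A = M[:L, :L]                        # landmark/landmark block A (L x L)
    B21 = M[L:, :L]                      # non-landmark/landmark block B21 ((N-L) x L)
    lam, U_A = np.linalg.eigh(A)         # small eigendecomposition A = U_A Lambda_A U_A^T
    lam, U_A = lam[:d], U_A[:, :d]       # keep the d smallest eigenvalues (assumed nonzero)
    U_rest = (B21 @ U_A) / lam           # interpolation formula B21 U_A Lambda_A^{-1}
    return np.vstack([U_A, U_rest])      # N x d approximation to the eigenvectors of M
```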

SLIDE 6

Column Sampling method

Uses more information from the affinity matrix than Nyström, but still ignores the non-landmark/non-landmark block B22 of M.

Writing the affinity matrix by blocks (landmarks first):

    M = ( A    B21^T )        C = ( A   )
        ( B21  B22   )            ( B21 )

The approximation to the eigendecomposition is given by the left singular vectors of C:

    C = U_C Σ_C V_C^T   ⇒   Ũ_M = U_C
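A corresponding sketch of Column Sampling (again a hypothetical helper, not the authors' code); np.linalg.svd returns singular values in decreasing order, so the trailing columns of U_C are taken here under the assumption that the smallest eigenvalues of M are wanted:

```python
import numpy as np

def column_sampling_eigvecs(M, L, d):
    """Approximate d eigenvectors of M via the left singular vectors of C = M[:, :L]."""
    C = M[:, :L]                                    # N x L column block [A; B21]
    U_C, s, Vt = np.linalg.svd(C, full_matrices=False)
    return U_C[:, -d:]                              # d left singular vectors of interest
```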

SLIDE 7

Locally Linear Landmarks (LLL) (Vladymyrov & Carreira-Perpiñán, 2013)

  • Construct a local linear projection matrix Z_{N×L} from the input Y:

        y_n ≈ Σ_{l=1}^{L} z_{nl} ỹ_l,   n = 1, …, N,   i.e.   Y ≈ Ỹ Z^T.

  • Additional assumption: this projection is satisfied in the embedding space: X = X̃ Z^T.
  • Plugging the projection into the original objective function:

        min_X tr(X M X^T)   s.t.   X X^T = I,   X = X̃ Z^T
        ⇒   min_{X̃} tr(X̃ Z^T M Z X̃^T)   s.t.   X̃ Z^T Z X̃^T = I.

  • The solution is given by the reduced generalized eigenproblem: X̃ = eig(Z^T M Z, Z^T Z).
  • The final embedding is predicted as X = X̃ Z^T.
  • This solution is optimal given the constraint X = X̃ Z^T.
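A minimal sketch of the LLL reduced problem (our illustration), assuming a precomputed N×L weight matrix Z with Z^T Z positive definite; scipy.linalg.eigh solves the symmetric generalized eigenproblem A v = λ B v:

```python
import numpy as np
from scipy.linalg import eigh

def lll_embedding(M, Z, d):
    """Z: N x L local linear weights with Y ~ Ytilde Z^T; returns the d x N embedding."""
    A = Z.T @ M @ Z                   # L x L reduced affinity  Z^T M Z
    B = Z.T @ Z                       # L x L constraint matrix Z^T Z (assumed pos. def.)
    w, V = eigh(A, B)                 # generalized eigenvectors, B-orthonormal, w ascending
    X_tilde = V[:, :d].T              # d x L landmark embedding
    return X_tilde @ Z.T              # d x N final embedding X = Xtilde Z^T
```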

SLIDE 8

Generalizing approximations

Nyström — expand the upper block (using A = U_A Λ_A U_A^T):

    Ũ_M = ( U_A              )   =   ( A U_A Λ_A^{-1}   )   =   C U_A Λ_A^{-1}
          ( B21 U_A Λ_A^{-1} )       ( B21 U_A Λ_A^{-1} )

Column Sampling — rewrite using the eigendecomposition of the matrix C^T C:

    Ũ_M = U_C = C V_C Σ_C^{-1} = C U_{C^T C} Λ_{C^T C}^{-1/2}

LLL — rewrite the solution as X = X̃ Z^T, where X̃ is computed optimally (given Z) as X̃ = eig(Z^T M Z, Z^T Z):

    Ũ_M = Z X̃^T

In every case Ũ_M is an N×L out-of-sample matrix (C or Z) times an L×d matrix obtained from a small eigenproblem.

SLIDE 9

Generalizing approximations

Nyström:
  • 1. Solve the smaller L×L eigendecomposition: A = U_A Λ_A U_A^T.
  • 2. Apply the N×L out-of-sample matrix: Ũ_M = C U_A Λ_A^{-1}.

Column Sampling:
  • 1. Solve the smaller L×L eigendecomposition: C^T C = U_{C^T C} Λ_{C^T C} U_{C^T C}^T.
  • 2. Apply the N×L out-of-sample matrix: Ũ_M = C U_{C^T C} Λ_{C^T C}^{-1/2}.

LLL:
  • 1. Solve the smaller L×L eigenproblem: X̃ = eig(Z^T M Z, Z^T Z).
  • 2. Apply the N×L out-of-sample matrix: Ũ_M = Z X̃^T.

SLIDE 10

Generalizing approximations

Each approximation consists of the following steps:
  • define an N×L out-of-sample matrix Z,
  • compute some reduced eigenproblem A U = B U Λ and an L×d matrix Q that depends on it,
  • the final approximation is equal to Ũ_M = Z Q.

                         Nyström     Column Sampling   LLL              Random Projection
  Z_{N×L}                C           C                 from Y ≈ Ỹ Z^T   qr(M^q S)
  Eigenproblem (A, B)    A, I        Z^T Z, I          Z^T M Z, Z^T Z   Z^T M Z, Z^T Z
  Q_{L×d}                U Λ^{-1}    U Λ^{-1/2}        U                U
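The table can be read as one generic routine. A hedged sketch (function and argument names are ours, not the paper's): pick Z, solve the reduced eigenproblem (A, B), rescale the eigenvectors to get Q, and return Ũ_M = Z Q:

```python
import numpy as np
from scipy.linalg import eigh

def landmark_eigvecs(Z, A, B, d, scale=None):
    """Generic landmark approximation: solve A U = B U Lambda, form Q, return Z Q."""
    w, U = eigh(A, B)                          # reduced L x L generalized eigenproblem
    w, U = w[:d], U[:, :d]                     # keep the d eigenpairs of interest
    Q = U if scale is None else U * scale(w)   # e.g. scale = lambda w: 1.0 / w for Nystrom
    return Z @ Q                               # N x d approximation U_M ~ Z Q

# Instantiations following the table (landmarks first, C = M[:, :L], I = np.eye(L)):
#   Nystrom:            landmark_eigvecs(C, M[:L, :L], I, d, lambda w: 1.0 / w)
#   Column Sampling:    landmark_eigvecs(C, C.T @ C,   I, d, lambda w: 1.0 / np.sqrt(w))
#   LLL:                landmark_eigvecs(Z, Z.T @ M @ Z, Z.T @ Z, d)
```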

SLIDE 11

Generalizing approximations

Each approximation consists of the following steps:
  • define an N×L out-of-sample matrix Z,
  • compute some reduced eigenproblem A U = B U Λ and an L×d matrix Q that depends on it,
  • the final approximation is equal to Ũ_M = Z Q.

                         Nyström     Column Sampling   LLL              Random Projection   Variational Nyström
  Z_{N×L}                C           C                 from Y ≈ Ỹ Z^T   qr(M^q S)           C
  Eigenproblem (A, B)    A, I        Z^T Z, I          Z^T M Z, Z^T Z   Z^T M Z, Z^T Z      Z^T M Z, Z^T Z
  Q_{L×d}                U Λ^{-1}    U Λ^{-1/2}        U                U                   U

SLIDE 12

Variational Nyström

Add the Nyström out-of-sample constraint to the spectral problem:

    min_X tr(X M X^T)   s.t.   X X^T = I,   X = X̃ C^T
    ⇒   min_{X̃} tr(X̃ C^T M C X̃^T)   s.t.   X̃ C^T C X̃^T = I.

From the LLL perspective:
  • replace the custom-built out-of-sample matrix Z with the readily available column matrix C,
  • abandon the local linearity assumption behind the weights Z,
  • save the computation of Z,
  • C is usually sparser than Z (due to locality).
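A minimal sketch of this reduced problem (our illustration, not the authors' code), again assuming the landmarks come first in M and that C^T C is positive definite:

```python
import numpy as np
from scipy.linalg import eigh

def variational_nystrom(M, L, d):
    """Solve min tr(Xt C^T M C Xt^T) s.t. Xt C^T C Xt^T = I, then return X = Xt C^T."""
    C = M[:, :L]                          # N x L column block, readily available from M
    w, V = eigh(C.T @ M @ C, C.T @ C)     # reduced L x L generalized eigenproblem
    X_tilde = V[:, :d].T                  # d x L coefficients
    return X_tilde @ C.T                  # d x N embedding X = Xtilde C^T
```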

SLIDE 13

Variational Nyström

Add the Nyström out-of-sample constraint to the spectral problem:

    min_X tr(X M X^T)   s.t.   X X^T = I,   X = X̃ C^T
    ⇒   min_{X̃} tr(X̃ C^T M C X̃^T)   s.t.   X̃ C^T C X̃^T = I.

From the Nyström perspective:
  • use the same out-of-sample matrix C, but optimize the choice of the reduced eigenproblem,
  • for a fixed C this gives a better approximation than Nyström or Column Sampling (it is optimal given the out-of-sample matrix C),
  • it uses all the elements of M to construct the reduced eigenproblem,
  • it forgoes the interpolating property of Nyström.

SLIDE 14

Subsampling the graph Laplacian

  • One of the most widely used kernels (Laplacian eigenmaps, spectral clustering).
  • Consider M given by the normalized graph Laplacian matrix L ∝ D^{-1/2} W D^{-1/2}, with
    • Gaussian affinity matrix: w_nm = exp(−‖y_n − y_m‖² / 2σ²),
    • degree matrix: D = diag(Σ_{m=1}^{N} w_nm).
  • The graph Laplacian kernel is data dependent: the L×L graph Laplacian computed for a subset of L input points ≠ the L×L subset of the N×N graph Laplacian constructed for all N points.
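A dense sketch of this kernel for small N (our illustration; in practice W would be a sparse k-nearest-neighbour graph, and the sign/shift convention of M is left to the caller):

```python
import numpy as np

def normalized_laplacian_kernel(Y, sigma):
    """Y: D x N data (points as columns). Returns D^{-1/2} W D^{-1/2}."""
    sq = np.sum(Y**2, axis=0)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Y.T @ Y), 0.0)  # squared distances
    W = np.exp(-d2 / (2.0 * sigma**2))                 # Gaussian affinities w_nm
    deg = W.sum(axis=1)                                # degrees d_n = sum_m w_nm
    dinv = 1.0 / np.sqrt(deg)
    return dinv[:, None] * W * dinv[None, :]           # normalized affinity D^{-1/2} W D^{-1/2}
```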

SLIDE 15

Subsampling the graph Laplacian

  • Data dependence can be a problem for the methods that depend on the subsampling:
    • Nyström,
    • Column Sampling,
    • Variational Nyström.
  • Not a problem for methods with no subsampling:
    • LLL,
    • Random Projection.
  • Our solution: normalize the subsampled kernel separately, C → D_1^{-1/2} C D_2^{-1/2}, but in a way that interpolates over the landmarks and gives the exact solution when L = N.

SLIDE 16

Subsampling the graph Laplacian

  • For Nyström and Column Sampling:
    • we propose different forms for D_1 and D_2,
    • we evaluate empirically which one works best.
  • For Variational Nyström:
    • we show that D_2 factors out,
    • any D_1 leads to the exact solution when L = N.

For the graph Laplacian kernel, the Variational Nyström approximation is therefore more general.

SLIDE 17

Experiments: Laplacian eigenmaps

  • Reduce the dimensionality of N = 20 000 digits from MNIST to d = 10.
  • Run 5 times for different randomly chosen landmarks, from L = 11 to L = 19 900.

[Figure: error with respect to the exact objective function vs. number of landmarks L, for Nys, CS, LLL, VNys and Halko (q = 1, 2, 3); log-log axes.]

SLIDE 18

Experiments: Laplacian eigenmaps

  • Reduce the dimensionality of N = 20 000 digits from MNIST to d = 10.
  • Run 5 times for different randomly chosen landmarks, from L = 11 to L = 19 900.

[Figure: runtime vs. number of landmarks L for the same methods; log-log axes.]

SLIDE 19

Experiments: Laplacian eigenmaps

  • Reduce the dimensionality of N = 20 000 digits from MNIST to d = 10.
  • Run 5 times for different randomly chosen landmarks, from L = 11 to L = 19 900.

[Figure: error with respect to the exact objective function vs. runtime; log-log axes.]

SLIDE 20

Experiments: Laplacian eigenmaps

[Same setup and figure as the previous slide: error with respect to the exact objective function vs. runtime.]

Variational Nyström is winning! About 2× as fast as LLL!

SLIDE 21

Experiments: Spectral clustering

[Figure: original image; exact spectral clustering, t = 512 s; Nyström, t = 25 s; Variational Nyström, t = 25 s.]

20× speedup over the exact solution!

SLIDE 22

infiniteMNIST embedding

Embedding of N = 1 020 000 digits from infiniteMNIST, with the runtime fixed to t = 10 min.

[Figure: embeddings for Nyström (L = 16 000), LLL (L = 5 000) and Variational Nyström (L = 4 500).]

SLIDE 23

Conclusions

  • The Variational Nyström method is the optimal way to use the out-of-sample Nyström formula to solve an eigenproblem approximately. It achieves a low-to-medium accuracy solution faster than Nyström and other methods.
  • We present a simple unified model of spectral approximations, combining many existing algorithms such as Nyström, Column Sampling and LLL.
  • We study the role of normalization in subsampling the graph Laplacian kernel and show that Variational Nyström is more general for this kernel.

Partially supported by NSF award IIS-1423515.
Poster #64 tomorrow (10am–1pm).