Efficient Algorithms and Error Analysis for the Modified Nyström Method

SLIDE 1

The Modified Nyström Method Wang & Zhang Motivation The Nyström Method Column Sampling Improve the Nyström Method The Modified Nyström Method

Comparisons between the Two Methods Efficient Algorithms Theories

Efficient Algorithms and Error Analysis for the Modified Nyström Method

Shusen Wang (1), Zhihua Zhang (2)

(1) Zhejiang University, China; (2) Shanghai Jiao Tong University, China

AISTATS 2014

SLIDE 2

Outline

1 Motivation
2 The Nyström Method
3 Column Sampling
4 Improve the Nyström Method
5 The Modified Nyström Method
  • Comparisons between the Two Methods
  • Efficient Algorithms
  • Theories

SLIDE 4

Kernel methods

K: n × n kernel matrix.

  • Matrix inverse: $b = (K + \alpha I_n)^{-1} y$. Time complexity: $O(n^3)$. Performed by Gaussian process regression, least squares SVM, and kernel ridge regression.
  • Partial eigenvalue decomposition of K. Time complexity: $O(n^2 k)$. Performed by kernel PCA and some manifold learning methods.

SLIDE 7

Computational Challenges

  • High time complexity: $O(n^3)$ or $O(n^2 k)$.
  • High space complexity: $O(n^2)$.
  • Iterative algorithms make many passes through the data, so the entire kernel matrix should be kept in RAM.
  • If the data does not fit in RAM, every pass incurs a swap between RAM and disk, which is very slow.

SLIDE 9

How to Speed Up

If we can quickly find a low-rank factorization

$$\underbrace{K}_{n \times n} \approx \underbrace{D}_{n \times d} \, \underbrace{D^T}_{d \times n},$$

then $(K + \alpha I_n)^{-1}$ and the partial eigenvalue decomposition of K can be computed approximately and highly efficiently.

SLIDE 10

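How to Speed Up: Example 1

Suppose we have a low-rank factorization $\underbrace{K}_{n \times n} \approx \underbrace{D}_{n \times d} \, \underbrace{D^T}_{d \times n}$. Approximately compute the matrix inverse $(K + \alpha I_n)^{-1}$ as follows. Expand $(D D^T + \alpha I_n)^{-1}$ using the Sherman-Morrison-Woodbury formula and obtain

$$(D D^T + \alpha I_n)^{-1} = \alpha^{-1} I_n - \alpha^{-1} \underbrace{D}_{n \times d} \, \underbrace{(\alpha I_d + D^T D)^{-1}}_{d \times d} \, \underbrace{D^T}_{d \times n}.$$

It costs only $O(n d^2)$ time and $O(n d)$ space to compute $b = (D D^T + \alpha I_n)^{-1} y$.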
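The Sherman-Morrison-Woodbury step can be sketched in NumPy as follows. This is a minimal illustration, not code from the talk: the function name `woodbury_solve` and the random test data are assumptions made for the example.

```python
import numpy as np

def woodbury_solve(D, alpha, y):
    """Return b = (D D^T + alpha I_n)^{-1} y without forming the n x n matrix."""
    d = D.shape[1]
    small = alpha * np.eye(d) + D.T @ D        # d x d matrix, O(n d^2) time
    t = np.linalg.solve(small, D.T @ y)        # solve the small d x d system
    return (y - D @ t) / alpha                 # alpha^{-1} y - alpha^{-1} D t

# Hypothetical data, only to compare against the direct O(n^3) solve.
rng = np.random.default_rng(0)
n, d, alpha = 200, 5, 0.1
D = rng.standard_normal((n, d))
y = rng.standard_normal(n)
b_fast = woodbury_solve(D, alpha, y)
b_direct = np.linalg.solve(D @ D.T + alpha * np.eye(n), y)
```

Only `D` (n × d) and a d × d system are ever stored, which is where the $O(nd)$ space figure comes from.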

SLIDE 11

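How to Speed Up: Example 2

Suppose we have a low-rank factorization $\underbrace{K}_{n \times n} \approx \underbrace{D}_{n \times d} \, \underbrace{D^T}_{d \times n}$. Approximately compute the eigenvalue decomposition of K as follows. Compute the eigenvalue decomposition of the small d × d matrix $S = D^T D \in \mathbb{R}^{d \times d}$:

$$S = U_S \Lambda_S U_S^T.$$

The partial eigenvalue decomposition of $D D^T$ is

$$K \approx D D^T = \left( D U_S \Lambda_S^{-1/2} \right) \Lambda_S \left( D U_S \Lambda_S^{-1/2} \right)^T.$$

It costs only $O(n d^2)$ time and $O(n d)$ space.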
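A small NumPy sketch of this recipe is given below. The helper name `lowrank_eig` is hypothetical, and the sketch assumes that $S = D^T D$ has full rank so that $\Lambda_S^{-1/2}$ exists.

```python
import numpy as np

def lowrank_eig(D):
    """Eigenpairs of D D^T obtained from the d x d matrix S = D^T D.

    Assumes S is nonsingular, so every eigenvalue in lam is positive.
    """
    S = D.T @ D                          # d x d
    lam, U_S = np.linalg.eigh(S)         # S = U_S diag(lam) U_S^T
    V = (D @ U_S) / np.sqrt(lam)         # D U_S Lambda_S^{-1/2}, scaled column by column
    return lam, V

# Hypothetical data for a quick check.
rng = np.random.default_rng(0)
D = rng.standard_normal((300, 4))
lam, V = lowrank_eig(D)
```

The columns of `V` are orthonormal, and `(V * lam) @ V.T` reproduces `D @ D.T`.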

SLIDE 13

The Nyström Method

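Random Selection: select c (≪ n) columns of K to construct C using some randomized algorithm. After permutation we have

$$K = \begin{bmatrix} W & K_{21}^T \\ K_{21} & K_{22} \end{bmatrix}, \qquad C = \begin{bmatrix} W \\ K_{21} \end{bmatrix}.$$

The Nyström Approximation:

$$K \approx \underbrace{\tilde{K}_c^{\mathrm{nys}}}_{n \times n} = \underbrace{C}_{n \times c} \, \underbrace{W^{\dagger}}_{c \times c} \, \underbrace{C^T}_{c \times n}.$$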
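As a rough illustration, the standard Nyström approximation with uniformly sampled columns might look like this in NumPy. The helper names, the RBF bandwidth `gamma`, the synthetic data, and the `rcond` cutoff passed to `pinv` are all assumptions chosen for the example.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Dense Gaussian RBF kernel matrix exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def nystrom(K, idx):
    """Standard Nystrom approximation C W^+ C^T built from the columns in idx."""
    C = K[:, idx]                       # n x c block of selected columns
    W = K[np.ix_(idx, idx)]             # c x c intersection block
    return C @ np.linalg.pinv(W, rcond=1e-10) @ C.T

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))                 # synthetic data points
K = rbf_kernel(X, gamma=0.5)
idx = rng.choice(200, size=40, replace=False)     # uniform column sampling
K_nys = nystrom(K, idx)
rel_err = np.linalg.norm(K - K_nys, "fro") / np.linalg.norm(K, "fro")
```

Only `C` and `W` are needed to form the approximation; the full `K` is built here only so that the error can be measured.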

SLIDE 15

The Nyström Approximation

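The Nyström Approximation: $K \approx \tilde{K}_c^{\mathrm{nys}} = C W^{\dagger} C^T$ (a low-rank factorization).

[Figure: Nyström approximation. An n × n matrix is approximated by the product of n × c, c × c, and c × n matrices.]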

SLIDE 17

Problem Formulation

Problem: How to select informative columns of $K \in \mathbb{R}^{n \times n}$ to construct $C \in \mathbb{R}^{n \times c}$? The approximation error $\| K - C U C^T \|_F$ or $\| K - C U C^T \|_2$ should be as small as possible.

Hardness: there are $\binom{n}{c}$ choices in total.

SLIDE 19

Criterion: Upper Error Bounds

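  • Use approximation algorithms to find c good columns (not necessarily the best).
  • Hope that the ratio $\frac{\| K - C U C^T \|_F}{\| K - K_k \|_F}$ has an upper bound, where $K_k$ is the best rank-k approximation to K. The smaller the bound, the better.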

SLIDE 20

Uniform Sampling: The Simplest Algorithm

Sample c columns of K uniformly at random to construct C. The simplest, but the most widely used.

SLIDE 21

Adaptive Sampling

The adaptive sampling algorithm [Deshpande et al., 2006]:

1 Sample $c_1$ columns of K to construct $C_1$ using some algorithm;
2 Compute the residual $B = K - C_1 C_1^{\dagger} K$;
3 Compute sampling probabilities $p_i = \| b_i \|_2^2 / \| B \|_F^2$, for i = 1 to n, where $b_i$ is the i-th column of B;
4 Sample a further $c_2$ columns of K in $c_2$ i.i.d. trials; in each trial the i-th column is chosen with probability $p_i$. Denote the selected columns by $C_2$;
5 Return $C = [C_1, C_2]$.
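The five steps above can be sketched in NumPy as follows. The function name `adaptive_sampling`, the choice of uniform sampling for step 1, and the synthetic matrix are assumptions for illustration; the sketch also assumes the residual B is not identically zero.

```python
import numpy as np

def adaptive_sampling(K, idx1, c2, rng):
    """Steps 2-5: extend the column indices idx1 by c2 adaptively sampled indices."""
    C1 = K[:, idx1]
    B = K - C1 @ (np.linalg.pinv(C1) @ K)     # residual B = K - C1 C1^+ K
    p = np.sum(B**2, axis=0)                  # squared column norms ||b_i||_2^2
    p = p / p.sum()                           # divide by ||B||_F^2
    # i.i.d. trials, so the same column may be drawn more than once
    idx2 = rng.choice(K.shape[1], size=c2, replace=True, p=p)
    return np.concatenate([idx1, idx2]), p

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))
K = A @ A.T                                        # synthetic symmetric PSD matrix
idx1 = rng.choice(100, size=10, replace=False)     # step 1, here uniform sampling
idx, p = adaptive_sampling(K, idx1, 15, rng)
```

Columns already contained in $C_1$ have a numerically zero residual, so they are essentially never drawn again.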

SLIDE 22

Adaptive Sampling

  • The error term $\| K - C C^{\dagger} K \|_F$ is bounded theoretically, but $\| K - C W^{\dagger} C^T \|_F$ is not.
  • Empirically, the adaptive sampling algorithm works very well.

SLIDE 24

How to Improve the Nyström Approximation?

  • Devise better sampling algorithms to improve the upper error bounds.
  • Use other types of low-rank approximation instead of the Nyström approximation $K \approx C W^{\dagger} C^T$.

SLIDE 26

Better Sampling Algorithms?

We hope $\frac{\| K - C W^{\dagger} C^T \|_F}{\| K - K_k \|_F}$ will be very small if the column sampling algorithm is good enough. But it cannot be arbitrarily small.

Lower Error Bound Theorem (Wang & Zhang, JMLR 2013): Whatever column sampling is used to select c columns, there exists a bad case K such that

$$\frac{\| K - C W^{\dagger} C^T \|_F^2}{\| K - K_k \|_F^2} \ge \Omega\left( 1 + \frac{nk}{c^2} \right).$$

SLIDE 29

Different Types of Low-Rank Approximation?

The Ensemble Nyström Method [Kumar et al., JMLR 2012]:

$$K \approx \sum_{i=1}^{t} \frac{1}{t} \, C^{(i)} W^{(i)\dagger} C^{(i)T}$$

It does not improve the lower error bound.

Lower Error Bound Theorem (Wang & Zhang, JMLR 2013): Whatever column sampling is used to select c columns, there exists a bad case K such that

$$\frac{\left\| K - \sum_{i=1}^{t} \frac{1}{t} \, C^{(i)} W^{(i)\dagger} C^{(i)T} \right\|_F^2}{\| K - K_k \|_F^2} \ge \Omega\left( 1 + \frac{nk}{c^2} \right).$$

SLIDE 32

Different Types of Low-Rank Approximation?

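The Modified Nyström Method [Wang & Zhang, JMLR 2013]:

$$K \approx C \, \underbrace{\left( C^{\dagger} K (C^{\dagger})^T \right)}_{c \times c} \, C^T.$$

Theorem (Wang & Zhang, JMLR 2013): Using a column sampling algorithm, the error incurred by the modified Nyström method satisfies

$$\mathbb{E} \, \frac{\left\| K - C \left( C^{\dagger} K (C^{\dagger})^T \right) C^T \right\|_F^2}{\| K - K_k \|_F^2} \le 1 + \sqrt{\frac{k}{c}}.$$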
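A minimal NumPy sketch of the modified Nyström approximation, computed naively and compared with the standard one on the same columns, is shown below. The function names, kernel bandwidth, and synthetic data are illustrative assumptions, and uniform sampling stands in for whichever column sampling algorithm is used.

```python
import numpy as np

def standard_nystrom(K, idx):
    """C W^+ C^T with W the c x c intersection block."""
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

def modified_nystrom(K, idx):
    """C (C^+ K (C^+)^T) C^T, computed naively from the full matrix K."""
    C = K[:, idx]
    C_pinv = np.linalg.pinv(C)              # c x n
    U = C_pinv @ K @ C_pinv.T               # c x c intersection matrix
    return C @ U @ C.T

rng = np.random.default_rng(0)
X = rng.standard_normal((150, 5))           # synthetic data points
sq = np.sum(X**2, axis=1)
K = np.exp(-0.2 * np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0))
idx = rng.choice(150, size=30, replace=False)

err_std = np.linalg.norm(K - standard_nystrom(K, idx), "fro")
err_mod = np.linalg.norm(K - modified_nystrom(K, idx), "fro")
```

On the same column set, `err_mod` should never exceed `err_std` beyond rounding error, at the price of touching all of `K` instead of only `C`.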

SLIDE 35

Notation

  • Define $T_{\mathrm{SVD}}(n^3)$: time of the full SVD (or eigenvalue decomposition, matrix inverse, etc.) of an n × n matrix.
  • Define $T_{\mathrm{Multiply}}(n^3)$: time of multiplying two n × n matrices.
  • They are both $O(n^3)$, but very different in practice. Large-scale matrix multiplication is not a challenge in real-world applications.

SLIDE 37

Comparisons between the Two Methods

  • The Standard Nyström Method: fast. It costs only $T_{\mathrm{SVD}}(c^3)$ time to compute the intersection matrix $U^{\mathrm{nys}} = W^{\dagger}$.
  • The Modified Nyström Method: slow. Computed naively, the intersection matrix $U^{\mathrm{mod}} = C^{\dagger} K (C^{\dagger})^T$ costs $T_{\mathrm{SVD}}(n c^2) + T_{\mathrm{Multiply}}(n^2 c)$ time.

SLIDE 38

Comparisons between the Two Methods

  • The Standard Nyström Method: inaccurate. It cannot attain a $1 + \epsilon$ Frobenius relative-error bound unless $c \ge \sqrt{nk/\epsilon}$ columns are selected, whatever column selection algorithm is used. (Due to its lower error bound.)
  • The Modified Nyström Method: accurate. Some adaptive-sampling-based algorithms attain a $1 + \epsilon$ Frobenius relative-error bound when $c = O(k/\epsilon^2)$. (The smaller c is, the better.)
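As a back-of-the-envelope reading, with the constant hidden in $\Omega$ dropped, both column counts follow from setting the error bounds stated earlier in the talk to $1 + \epsilon$:

```latex
1 + \frac{nk}{c^2} \le 1 + \epsilon
  \;\Longleftrightarrow\; c \ge \sqrt{\frac{nk}{\epsilon}},
\qquad
1 + \sqrt{\frac{k}{c}} \le 1 + \epsilon
  \;\Longleftrightarrow\; c \ge \frac{k}{\epsilon^{2}} .
```

The first count grows with n, while the second is independent of n.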

SLIDE 40

Fast Computation of the Intersection Matrix

Naively computing the intersection matrix $U = C^{\dagger} K (C^{\dagger})^T$ costs $T_{\mathrm{SVD}}(n c^2) + T_{\mathrm{Multiply}}(n^2 c)$ time. How can it be sped up?

SLIDE 42

Fast Computation of the Intersection Matrix

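The Moore-Penrose inverse of a partitioned matrix can be expanded! Let P be a permutation matrix, and let

$$P C = \begin{bmatrix} W \\ K_{21} \end{bmatrix}.$$

If W is nonsingular, let $S = K_{21} W^{-1}$. The Moore-Penrose inverse of C can then be written as

$$C^{\dagger} = W^{-1} \left( I_c + S^T S \right)^{-1} \begin{bmatrix} I_c & S^T \end{bmatrix} P.$$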

SLIDE 43

Fast Computation of the Intersection Matrix

Compute the intersection matrix by

$$U = C^{\dagger} K (C^{\dagger})^T = T_1 \left( W + T_2 + T_2^T + T_3 \right) T_1^T,$$

where the intermediate matrices are computed by

$$T_0 = K_{21}^T K_{21}, \qquad T_1 = W^{-1} \left( I_c + W^{-1} T_2 \right)^{-1}, \qquad T_2 = T_0 W^{-1}, \qquad T_3 = W^{-1} \left( K_{21}^T K_{22} K_{21} \right) W^{-1}.$$

The four intermediate matrices are all of size c × c, and the matrix inverse operations are on small c × c matrices.
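The formulas above translate fairly directly into NumPy. The sketch below is illustrative only: the function names are made up, the test matrix is a synthetic well-conditioned symmetric positive definite matrix rather than an RBF kernel, and `idx` is assumed to contain no repeated indices so that W can be nonsingular.

```python
import numpy as np

def intersection_fast(K, idx):
    """U = C^+ K (C^+)^T via the c x c matrices T0..T3 (requires nonsingular W)."""
    n, c = K.shape[0], len(idx)
    rest = np.setdiff1d(np.arange(n), idx)      # indices of the unselected rows
    W = K[np.ix_(idx, idx)]
    K21 = K[np.ix_(rest, idx)]
    K22 = K[np.ix_(rest, rest)]
    W_inv = np.linalg.inv(W)
    T0 = K21.T @ K21
    T2 = T0 @ W_inv
    T1 = W_inv @ np.linalg.inv(np.eye(c) + W_inv @ T2)
    T3 = W_inv @ (K21.T @ (K22 @ K21)) @ W_inv  # K22 @ K21 is the (n-c)^2 c product
    return T1 @ (W + T2 + T2.T + T3) @ T1.T

def intersection_naive(K, idx):
    """Reference implementation using the pseudoinverse of the n x c matrix C."""
    C_pinv = np.linalg.pinv(K[:, idx])
    return C_pinv @ K @ C_pinv.T

rng = np.random.default_rng(0)
A = rng.standard_normal((120, 120))
K = A @ A.T / 120 + 0.1 * np.eye(120)           # synthetic SPD matrix
idx = rng.choice(120, size=15, replace=False)
U_fast = intersection_fast(K, idx)
U_naive = intersection_naive(K, idx)
```

The two results agree up to rounding error when W is well conditioned; for a nearly singular W the explicit inverses would be numerically fragile.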

SLIDE 44

Fast Computation of the Intersection Matrix

In this way, it costs only $T_{\mathrm{SVD}}(c^3) + T_{\mathrm{Multiply}}\left( (n - c)^2 c \right)$ time. The naive approach costs $T_{\mathrm{SVD}}(n c^2) + T_{\mathrm{Multiply}}(n^2 c)$ time.

Our method works only if W is nonsingular.

  • If K is a Gaussian RBF kernel matrix and the c selected data points are distinct, then W is nonsingular.
  • If K is a linear kernel matrix, W is usually singular.

SLIDE 49

Fast Computation of the Intersection Matrix

Results on a 15,000 × 15,000 (dense) RBF kernel matrix.

[Figure: Time (s) versus c for Modified Nyström (naive), Modified Nyström (fast), and Standard Nyström.]

SLIDE 50

Fast Computation of the Intersection Matrix

Results on a 15,000 × 15,000 sparse RBF kernel matrix with 1% of its entries nonzero.

[Figure: Time (s) versus c for Modified Nyström (naive), Modified Nyström (fast), and Standard Nyström.]

SLIDE 51

Efficient Column Sampling Algorithm

The uniform+adaptive² algorithm:

1 Uniform Sampling. Uniformly sample $c_1 = 8.7 \mu k \log\left( \sqrt{5} k \right)$ columns of K without replacement to construct $C_1$;
2 Adaptive Sampling. Sample $c_2 = 10 k \epsilon^{-1}$ columns of K to construct $C_2$ using the adaptive sampling algorithm according to the residual $K - P_{C_1} K$;
3 Adaptive Sampling. Sample $c_3 = 2 \epsilon^{-1} (c_1 + c_2)$ columns of K to construct $C_3$ using the adaptive sampling algorithm according to the residual $K - P_{[C_1, C_2]} K$;
4 Return $C = [C_1, C_2, C_3]$.
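A compact NumPy sketch of the three sampling stages is given below. Several things here are assumptions rather than part of the talk: the function names, treating µ as a user-supplied number, rounding the column counts up with `ceil`, reading $P_C K$ as the projection $C C^{\dagger} K$, and the synthetic RBF test matrix.

```python
import math
import numpy as np

def adaptive_step(K, idx, c_new, rng):
    """Draw c_new further indices with probability proportional to residual column norms."""
    C = K[:, idx]
    B = K - C @ (np.linalg.pinv(C) @ K)        # residual K - P_C K
    p = np.sum(B**2, axis=0)
    p = p / p.sum()
    new = rng.choice(K.shape[1], size=c_new, replace=True, p=p)   # i.i.d. trials
    return np.concatenate([idx, new])

def uniform_adaptive2(K, k, eps, mu, rng):
    """Return the selected column indices and the three stage sizes (c1, c2, c3)."""
    n = K.shape[1]
    c1 = min(n, math.ceil(8.7 * mu * k * math.log(math.sqrt(5) * k)))
    c2 = math.ceil(10 * k / eps)
    c3 = math.ceil(2 * (c1 + c2) / eps)
    idx = rng.choice(n, size=c1, replace=False)    # stage 1: uniform, no replacement
    idx = adaptive_step(K, idx, c2, rng)           # stage 2: residual w.r.t. C1
    idx = adaptive_step(K, idx, c3, rng)           # stage 3: residual w.r.t. [C1, C2]
    return idx, (c1, c2, c3)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))                  # synthetic data points
sq = np.sum(X**2, axis=1)
K = np.exp(-0.5 * np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0))
idx, counts = uniform_adaptive2(K, k=2, eps=0.5, mu=1.0, rng=rng)
```

Because the adaptive stages draw i.i.d. samples, `idx` may contain repeated columns; removing duplicates before forming C is a design choice left open here.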

SLIDE 54

Efficient Column Sampling Algorithm

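Theorem: The uniform+adaptive² algorithm costs $T_{\mathrm{SVD}}(n c^2 \epsilon^2) + T_{\mathrm{Multiply}}(n^2 c \epsilon)$ time.

Theorem: By sampling $c = O\left( k \epsilon^{-2} + \mu_k \epsilon^{-1} k \log k \right)$ columns using the uniform+adaptive² algorithm,

$$\left\| K - C \left( C^{\dagger} K (C^{\dagger})^T \right) C^T \right\|_F \le (1 + \epsilon) \, \| K - K_k \|_F$$

holds with high probability.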

SLIDE 56

Theoretical Justification

Theorem (Exact Recovery). For the symmetric matrix K defined previously, the following three statements are equivalent:

1 $\mathrm{rank}(W) = \mathrm{rank}(K)$,
2 $K = C W^{\dagger} C^T$ (i.e., the standard Nyström method is exact),
3 $K = C \left( C^{\dagger} K (C^{\dagger})^T \right) C^T$ (i.e., the modified Nyström method is exact).
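The equivalence can be checked numerically on a small low-rank example. This is only a sanity check under assumed sizes (n = 60, rank 5, c = 12) with explicit tolerances for the numerical rank and the pseudoinverse cutoff; it is not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, c = 60, 5, 12
Z = rng.standard_normal((n, r))
K = Z @ Z.T                                    # symmetric PSD matrix of rank r
idx = rng.choice(n, size=c, replace=False)
C = K[:, idx]
W = K[np.ix_(idx, idx)]

# rcond discards singular values that are zero up to rounding error
W_pinv = np.linalg.pinv(W, rcond=1e-10)
C_pinv = np.linalg.pinv(C, rcond=1e-10)

K_std = C @ W_pinv @ C.T                       # standard Nystrom
K_mod = C @ (C_pinv @ K @ C_pinv.T) @ C.T      # modified Nystrom

rank_W = np.linalg.matrix_rank(W, tol=1e-8)
rank_K = np.linalg.matrix_rank(K, tol=1e-8)
```

Here `rank_W` equals `rank_K`, and both approximations reproduce `K` up to rounding error.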

SLIDE 57

Theoretical Justification

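Theorem (Exact Recovery). It holds in general that

$$\left\| K - C \left( C^{\dagger} K (C^{\dagger})^T \right) C^T \right\|_F \le \left\| K - C W^{\dagger} C^T \right\|_F.$$

This is because $U = C^{\dagger} K (C^{\dagger})^T$ is the solution to the problem $\min_U \| K - C U C^T \|_F$.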

SLIDE 58

Lower Error Bounds

Lower error bound of the modified Nyström method

Theorem: Whatever column sampling is used to select c columns, there exists a bad case K such that

$$\frac{\left\| K - C \left( C^{\dagger} K (C^{\dagger})^T \right) C^T \right\|_F^2}{\| K - K_k \|_F^2} \ge \frac{n - c}{n - k} \left( 1 + \frac{2k}{c} \right).$$

SLIDE 59

Lower Error Bounds

The modified Nyström method has a strong resemblance to the column selection problem.

Lower error bound of the column selection problem

Theorem (Boutsidis et al., FOCS 2011): Whatever column sampling is used to select c columns, there exists a bad case $A \in \mathbb{R}^{m \times n}$ such that

$$\frac{\| A - C C^{\dagger} A \|_F^2}{\| A - A_k \|_F^2} \ge \frac{n - c}{n - k} \left( 1 + \frac{k}{c} \right).$$

This lower bound is tight, because it is attained by a column selection algorithm of [Guruswami & Sinop, SODA 2012].

slide-61
SLIDE 61

Lower Error Bounds

Their lower error bounds are very similar:

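$$\frac{\bigl\| K - C \bigl( C^\dagger K (C^\dagger)^T \bigr) C^T \bigr\|_F^2}{\| K - K_k \|_F^2} \;\ge\; \frac{n-c}{n-k} \Bigl( 1 + \frac{2k}{c} \Bigr),$$

$$\frac{\| A - C C^\dagger A \|_F^2}{\| A - A_k \|_F^2} \;\ge\; \frac{n-c}{n-k} \Bigl( 1 + \frac{k}{c} \Bigr).$$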

It is a reasonable conjecture that the lower bound of the modified Nyström method is also tight! (an open problem).
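
To get a feel for how close the two bounds are, they can be evaluated side by side. The sketch below only evaluates the two right-hand sides stated in the theorems; the function name `lower_bound_ratio` and the sample values of n, k, and c are illustrative assumptions.

```python
def lower_bound_ratio(n, c, k, factor):
    """Evaluate (n - c)/(n - k) * (1 + factor * k / c).

    factor=2 gives the modified Nystrom lower bound,
    factor=1 gives the column selection lower bound.
    (Hypothetical helper, named here for illustration only.)
    """
    return (n - c) / (n - k) * (1.0 + factor * k / c)

n, k = 10_000, 10   # assumed problem size and target rank
for c in (20, 50, 100, 200):
    modified = lower_bound_ratio(n, c, k, factor=2)
    column_selection = lower_bound_ratio(n, c, k, factor=1)
    print(f"c = {c:4d}: modified Nystrom >= {modified:.4f}, "
          f"column selection >= {column_selection:.4f}")
```

Both ratios approach 1 as c grows, with the modified Nyström bound always the larger of the two because of the factor 2k/c in place of k/c.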

slide-63
SLIDE 63

Open Problem

Lower error bound: at least $c \ge 2k/\epsilon$ columns are needed to attain a $1 + \epsilon$ relative-error bound.

An upper error bound [Wang & Zhang, JMLR 2013]: sampling $c = \frac{k}{\epsilon^2} (1 + o(1))$ columns attains a $1 + \epsilon$ relative-error bound.

The gap suggests that better column sampling algorithms for the modified Nyström method may exist.
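
The requirement on c follows from the lower error bound of the modified Nyström method. The short derivation below is a sketch that assumes n is much larger than c and k, so that the factor (n − c)/(n − k) is approximately 1; that simplification is ours and is not stated on the slide.

```latex
\[
\frac{n-c}{n-k}\Bigl(1 + \frac{2k}{c}\Bigr) \le 1 + \epsilon
\quad\text{and}\quad \frac{n-c}{n-k} \approx 1
\;\Longrightarrow\;
\frac{2k}{c} \le \epsilon
\;\Longleftrightarrow\;
c \ge \frac{2k}{\epsilon}.
\]
```

As an illustration with assumed values k = 10 and ε = 0.1, the lower bound requires at least 200 columns, while the upper bound corresponds to roughly 1000 columns; the two differ by a factor of about 1/(2ε).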

slide-66
SLIDE 66

Reference

  • A. Deshpande, L. Rademacher, S. Vempala, and G. Wang: Matrix approximation and projective clustering via volume sampling. Theory of Computing, 2006.

  • S. Kumar, M. Mohri, and A. Talwalkar: Sampling methods for the Nyström method. JMLR, 2012.

  • S. Wang and Z. Zhang: Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling. JMLR, 2013.

  • C. Boutsidis, P. Drineas, and M. Magdon-Ismail: Near optimal column-based matrix reconstruction. In FOCS, 2011.

  • V. Guruswami and A. K. Sinop: Optimal column based low-rank matrix reconstruction. In SODA, 2012.