Recitations for 10-701: Randomized Algorithms for Matrices (Mu Li)



SLIDE 1

Recitations for 10-701: Randomized Algorithms for Matrices

Mu Li, April 9, 2013

SLIDE 2

Low-rank approximation

Given a matrix A ∈ Rn×m, we form the rank-k approximation

  • Â = PΣQᵀ

such that

  • A ≈ Â,

where Σ ∈ Rk×k is typically a diagonal matrix and k ≪ min{n, m}

[Figure: A (n × m) ≈ P (n × k) × Σ (k × k) × Qᵀ (k × m), the rank-k approximation]
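As a concrete illustration of this factorization (a numpy sketch, not from the slides; the sizes and noise level are made up), the factors P, Σ, Qᵀ can be read directly off an SVD:

```python
import numpy as np

# A minimal sketch (assumed setup): build a noisy rank-k matrix and
# read P, Sigma, Q^T off its SVD.
rng = np.random.default_rng(0)
n, m, k = 100, 80, 5

A = rng.standard_normal((n, k)) @ rng.standard_normal((k, m))  # rank-k signal
A += 0.01 * rng.standard_normal((n, m))                        # small noise

U, s, Vt = np.linalg.svd(A, full_matrices=False)
P = U[:, :k]              # n x k
Sigma = np.diag(s[:k])    # k x k, diagonal
Qt = Vt[:k, :]            # k x m

A_hat = P @ Sigma @ Qt    # rank-k approximation, matching the picture above
rel_err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
```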

SLIDE 3

Why low-rank?

Why does low-rank approximation work?

◮ data quite often lie in a low-dimensional manifold, so a low-rank representation approximates the data well

Advantages of a low-rank representation

◮ denoising
◮ visualization
◮ reduces the storage requirement from O(nm) to O((n + m)k)
◮ reduces computational complexity:

  ◮ Ax ≈ P(Σ(Qᵀx)), time O(nm) → O((n + m)k)
  ◮ A⁺ ≈ QΣ⁺Pᵀ (with a QR decomposition to ensure P, Q are orthogonal), time O(nm min{n, m}) → O((n + m)k² + k³)
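The matrix-vector speedup above can be checked directly (a sketch with assumed shapes; nothing here comes from the slides):

```python
import numpy as np

# With A = P Sigma Q^T, a matrix-vector product can be evaluated
# right-to-left in O((n+m)k) time instead of O(nm).
rng = np.random.default_rng(1)
n, m, k = 500, 400, 10
P, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal n x k
Q, _ = np.linalg.qr(rng.standard_normal((m, k)))   # orthonormal m x k
Sigma = np.diag(rng.uniform(1.0, 2.0, k))
A = P @ Sigma @ Q.T                                # exactly rank k here

x = rng.standard_normal(m)
y_slow = A @ x                    # O(nm)
y_fast = P @ (Sigma @ (Q.T @ x))  # O((n+m)k), same vector up to round-off
```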

SLIDE 4

Best k-rank approximation

Given a matrix A and rank k, we solve the following problem:

  minP,Σ,Q ‖A − PΣQᵀ‖  subject to Σ ∈ Rk×k

The closed-form solution is the truncated SVD, namely keeping the top k singular values and the corresponding singular vectors. However, the time complexity O(nmk) is too large...
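A short sanity check of this claim (a sketch, not the slides' code): the truncated SVD's error equals the tail singular values, and no other rank-k projection we try can beat it.

```python
import numpy as np

# Eckart-Young sanity check: the truncated SVD is the best rank-k approximation.
rng = np.random.default_rng(2)
A = rng.standard_normal((60, 40))
k = 5

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # truncated SVD
best_err = np.linalg.norm(A - A_k)            # = sqrt(sum of tail s_i^2)

# some other rank-k approximation (here via a random projection) can't do better
Omega = rng.standard_normal((40, k))
Qb, _ = np.linalg.qr(A @ Omega)
other_err = np.linalg.norm(A - Qb @ (Qb.T @ A))
```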

SLIDE 5

Johnson-Lindenstrauss Theorem.

For any 0 < ε < 1 and any integer n, let k be a positive integer such that k ≥ 4(ε²/2 − ε³/3)⁻¹ ln n. Then for any set V of n points in Rd, there is a map f : Rd → Rk such that for all u, v ∈ V,

  (1 − ε)‖u − v‖² ≤ ‖f(u) − f(v)‖² ≤ (1 + ε)‖u − v‖²

How to construct f(u)

◮ f(u) = Ru, with p(Rij = 0) = 2/3, p(Rij = +1) = 1/6, p(Rij = −1) = 1/6
◮ f(u) = GSu, G: random Gaussian matrix, S: diagonal scaling matrix
◮ f(u) = DHSu, D: random row-selection matrix, H: Hadamard transform matrix
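The first construction can be sketched in a few lines (an illustration with assumed sizes; the scaling constant √(3/k), which makes squared lengths correct in expectation, is left implicit on the slide):

```python
import numpy as np

# Sparse sign projection: R_ij = 0 w.p. 2/3, +1 w.p. 1/6, -1 w.p. 1/6,
# scaled by sqrt(3/k) so that E ||f(u) - f(v)||^2 = ||u - v||^2.
rng = np.random.default_rng(3)
d, k = 1000, 300

R = rng.choice([0.0, 1.0, -1.0], size=(k, d), p=[2 / 3, 1 / 6, 1 / 6])
f = lambda u: np.sqrt(3.0 / k) * (R @ u)

u = rng.standard_normal(d)
v = rng.standard_normal(d)
orig = np.linalg.norm(u - v) ** 2
proj = np.linalg.norm(f(u) - f(v)) ** 2
distortion = proj / orig   # close to 1 for this choice of k
```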

SLIDE 6

A basic random projection algorithm

Algorithm

  • 1. construct an m × ℓ random matrix Ω, e.g. Gaussian or sampled Hadamard, with ℓ = O(k/ε)
  • 2. B = AΩ
  • 3. perform the truncated SVD B = UkSkVkᵀ, where Uk ∈ Rn×k
  • 4. approximate A by Uk(UkᵀA)

Theoretical guarantee

with high probability,

  ‖A − UkUkᵀA‖F ≤ (1 + ε)‖A − Ak‖F,

where Ak is the best rank-k approximation.

Time complexity: O(n(m log ℓ + k²)), but ℓ may be much bigger than k
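The four steps above, in numpy (a sketch with a Gaussian Ω; on an exactly rank-k input the projection recovers A up to round-off):

```python
import numpy as np

def randomized_lowrank(A, k, ell, seed=0):
    """Steps 1-4 above: sketch B = A @ Omega, then project onto range(Uk)."""
    n, m = A.shape
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((m, ell))       # step 1: random test matrix
    B = A @ Omega                               # step 2: sketch
    Uk = np.linalg.svd(B, full_matrices=False)[0][:, :k]  # step 3: Uk in R^{n x k}
    return Uk @ (Uk.T @ A)                      # step 4: Uk (Uk^T A)

rng = np.random.default_rng(5)
A = rng.standard_normal((200, 8)) @ rng.standard_normal((8, 150))  # rank 8
A_hat = randomized_lowrank(A, k=8, ell=16)
rel_err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
```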

SLIDE 7

A cheaper but less accurate algorithm

Recall the JL theorem: typically ℓ = O(k ln k) is good enough

Algorithm

The same as before, but with a different ℓ = k + p, usually p = 5, 10, . . .

Theoretical guarantee

with high probability,

  ‖A − UkUkᵀA‖₂ ≤ √(10 ℓ min{n, m}) ‖A − Ak‖₂

Compared with the previous algorithm, the error constant here is much worse

SLIDE 8

Still cheap but more accurate algorithm

Iterating several times improves the quality of the subspace; recall how to quickly compute the leading singular vector:

  • 1. start with a random u
  • 2. repeat u = AᵀAu / ‖AᵀAu‖₂ until convergence

Algorithm

the same as above, except B = (AAᵀ)^q AΩ

Theoretical guarantee

with high probability,

  ‖A − UkUkᵀA‖₂ ≤ (10 ℓ min{n, m})^(1/(4q+2)) ‖A − Ak‖₂

Time complexity: O(qnm ln ℓ), following the evaluation order B = (. . . (A(Aᵀ(AΩ))))
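A sketch of this variant (the only change from the previous algorithm is forming B = (AAᵀ)^q AΩ, applied right-to-left; the sizes and seeds here are made up):

```python
import numpy as np

def randomized_lowrank_power(A, k, ell, q, seed=0):
    """Same as the basic algorithm, but with B = (A A^T)^q A Omega."""
    rng = np.random.default_rng(seed)
    B = A @ rng.standard_normal((A.shape[1], ell))  # A Omega
    for _ in range(q):
        B = A @ (A.T @ B)            # apply (A A^T) once, right-to-left
    Uk = np.linalg.svd(B, full_matrices=False)[0][:, :k]
    return Uk @ (Uk.T @ A)

rng = np.random.default_rng(7)
A = rng.standard_normal((300, 200))
err_q0 = np.linalg.norm(A - randomized_lowrank_power(A, 10, 15, q=0))
err_q3 = np.linalg.norm(A - randomized_lowrank_power(A, 10, 15, q=3))
# err_q3 is typically smaller: the sketch aligns better with the
# top singular subspace as q grows
```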

SLIDE 9

Nyström Methods

The previous algorithms need to touch the whole matrix A, which is expensive.

Nyström Methods

Assume A is symmetric

  • 1. randomly pick ℓ columns of A to form P
  • 2. the corresponding rows are then Qᵀ = Pᵀ; denote by B the ℓ-by-ℓ cross matrix
  • 3. then approximate A by

  • Â = PB⁺Pᵀ

[Figure: A (n × n) ≈ P (n × ℓ) × B⁺ (ℓ × ℓ) × Pᵀ (ℓ × n)]
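The three steps above, sketched for a symmetric matrix (the uniform column sampling and the synthetic low-rank PSD A are assumptions, not from the slides). On an exactly rank-r PSD matrix with ℓ ≥ r sampled columns, the reconstruction is exact:

```python
import numpy as np

# Nystrom sketch: sample ell columns, invert the ell x ell cross block.
rng = np.random.default_rng(8)
n, r, ell = 300, 8, 20
G = rng.standard_normal((n, r))
A = G @ G.T                                # symmetric PSD, rank r

cols = rng.choice(n, size=ell, replace=False)
P = A[:, cols]                             # step 1: n x ell column sample
B = A[np.ix_(cols, cols)]                  # step 2: ell x ell cross matrix
A_nys = P @ np.linalg.pinv(B, rcond=1e-8) @ P.T   # step 3: A ~ P B^+ P^T
```

Since ℓ exceeds the true rank here, the reconstruction matches A up to round-off; on a full-rank matrix it is only an approximation.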

SLIDE 10

Nyström Methods, cont.

Theoretical guarantee

  E‖A − Â‖₂ ≤ ‖A − Ak‖₂ + (2n/√ℓ) max_i Aii

The error bound is much worse, but the time complexity reduces to O(nℓk + ℓ²k), with k ≤ ℓ ≪ n

SLIDE 11

Example, spectral embedding

Dataset: mnist8m; in total 3,276,294 samples

(a) SVD on 8k samples (b) Nyström method

Mu Li, et al. Making Large-Scale Nyström Approximation Possible. ICML 2010.
Mu Li, et al. Time and Space Efficient Spectral Clustering via Column Sampling. CVPR 2011.
SLIDE 12

Example, image segmentations

1 Million pixels

SLIDE 13

Example, image segmentations

1 Million pixels, 2 segmentations, CPU time 1.2 sec

SLIDE 14

Example, image segmentations

1.8 Million pixels

SLIDE 15

Example, image segmentations

1.8 Million pixels, 4 segmentations, CPU time 5.9 sec

SLIDE 16

Example, image segmentations

10 Million pixels

SLIDE 17

Example, image segmentations

10 Million pixels, 18 segmentations, CPU time 18.9 sec

SLIDE 18

Example, image segmentations

15 Million pixels

SLIDE 19

Example, image segmentations

15 Million pixels, 8 segmentations, CPU time 22.6 sec

SLIDE 20

Conclusion

It works!