SLIDE 1
Recitation for 10-701: Randomized Algorithms for Matrices
Mu Li, April 9, 2013
SLIDE 2 Low-rank approximation
Given a matrix A ∈ R^{n×m}, we form the rank-k approximation
  A ≈ P Σ Q^T,
where P ∈ R^{n×k}, Q ∈ R^{m×k}, Σ ∈ R^{k×k} is typically a diagonal matrix, and k ≪ min{n, m}
[Figure: the n×m matrix A approximated by the product of the n×k matrix P, the k×k matrix Σ, and the k×m matrix Q^T]
SLIDE 3 Why low-rank?
Why does low-rank approximation work?
◮ data quite often lie in a low-dimensional manifold, so a low-rank representation approximates the data well
Advantages of a low-rank representation
◮ denoising
◮ visualization
◮ reduces the storage requirement from O(nm) to O((n + m)k)
◮ reduces computational complexity:
  ◮ Ax ≈ P(Σ(Q^T x)), time O(nm) → O((n + m)k)
  ◮ A^+ ≈ Q Σ^+ P^T (use a QR decomposition to ensure P, Q are orthonormal), time O(nm min{n, m}) → O((n + m)k² + k³)
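The matrix-vector speedup above is easy to see in code. A minimal NumPy sketch (not from the slides; the test matrix and names are mine), applying the factors right to left so no n×m product is ever formed:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 500, 400, 10

# A matrix that is exactly rank k, so the factors reproduce it.
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, m))

# Factors P (n x k), Sigma (k x k), Q (m x k) from a truncated SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
P, Sigma, Q = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

x = rng.standard_normal(m)
y_fast = P @ (Sigma @ (Q.T @ x))   # O((n+m)k): Q^T x, then Sigma, then P
y_full = A @ x                     # O(nm), for comparison
# y_fast agrees with y_full since A has exact rank k
```

The parenthesization P(Σ(Q^T x)) is what delivers the O((n + m)k) cost; computing PΣQ^T first would bring back the O(nm) term.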
SLIDE 4 Best k-rank approximation
Given matrix A and rank k, we solve the following problem:
  min_{P,Σ,Q} ‖A − PΣQ^T‖_F
  subject to Σ ∈ R^{k×k} diagonal, P and Q orthonormal
The closed-form solution is the truncated SVD, namely keeping the top k singular values and the corresponding singular vectors. However, the time complexity O(nmk) is too large...
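The optimality of the truncated SVD (the Eckart-Young theorem) can be checked numerically. A minimal NumPy sketch (my own example, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 80))
k = 5

# Truncated SVD: keep the top-k singular values and vectors.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Eckart-Young: the optimal Frobenius error equals the l2 norm
# of the discarded singular values.
err = np.linalg.norm(A - A_k, 'fro')
# err == sqrt(sum of s[k:]**2), up to round-off
```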
SLIDE 5
Johnson-Lindenstrauss Theorem.
For any 0 < ε < 1 and any integer n, let k be a positive integer such that
  k ≥ 4(ε²/2 − ε³/3)^{-1} ln n.
Then for any set V of n points in R^d, there is a map f : R^d → R^k such that for all u, v ∈ V,
  (1 − ε)‖u − v‖² ≤ ‖f(u) − f(v)‖² ≤ (1 + ε)‖u − v‖²
How to construct f(u)
◮ f(u) = Ru, with p(R_ij = 0) = 2/3, p(R_ij = +1) = 1/6, p(R_ij = −1) = 1/6
◮ f(u) = GSu, G: random Gaussian matrix, S: diagonal scaling matrix
◮ f(u) = DHSu, D: random row selection matrix, H: Hadamard transform matrix
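The Gaussian construction is a few lines of NumPy. A sketch (my own illustration; the 1/√k scaling plays the role of the diagonal scaling matrix S) that measures the worst pairwise distortion over a point set:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 50, 10_000, 0.3
# k from the theorem's bound: k >= 4 (eps^2/2 - eps^3/3)^{-1} ln n
k = int(np.ceil(4 * np.log(n) / (eps**2 / 2 - eps**3 / 3)))

V = rng.standard_normal((n, d))               # n points in R^d
R = rng.standard_normal((k, d)) / np.sqrt(k)  # Gaussian map f(u) = Ru
W = V @ R.T                                   # projected points in R^k

# Largest relative distortion of pairwise squared distances.
max_dev = 0.0
for i in range(n):
    for j in range(i + 1, n):
        orig = np.sum((V[i] - V[j]) ** 2)
        proj = np.sum((W[i] - W[j]) ** 2)
        max_dev = max(max_dev, abs(proj / orig - 1))
# max_dev is typically below eps, matching the (1 ± eps) guarantee
```

Note that k depends only on n and ε, not on the ambient dimension d; that independence from d is the point of the theorem.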
SLIDE 6 A basic random projection algorithm
Algorithm
1. construct an m × ℓ random matrix Ω, e.g. Gaussian or sampled Hadamard, with ℓ = O(k/ε)
2. B = AΩ
3. perform the truncated SVD B = U_k S_k V_k^T, where U_k ∈ R^{n×k}
4. approximate A by U_k(U_k^T A)
Theoretical guarantee
with high probability
  ‖A − U_k U_k^T A‖_F ≤ (1 + ε)‖A − A_k‖_F,
where A_k is the best rank-k approximation
Time complexity: O(n(m log ℓ + k²)), but ℓ may be much bigger than k
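The four steps above can be sketched directly in NumPy (a Gaussian Ω for simplicity; the function name and test matrix are mine):

```python
import numpy as np

def randomized_lowrank(A, k, ell, seed=0):
    """Steps 1-4 of the slide: sketch, truncated SVD, re-project."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], ell))        # step 1
    B = A @ Omega                                         # step 2: n x ell
    Uk = np.linalg.svd(B, full_matrices=False)[0][:, :k]  # step 3
    return Uk @ (Uk.T @ A)                                # step 4

rng = np.random.default_rng(1)
# Rank-10 signal plus small noise.
A = rng.standard_normal((300, 10)) @ rng.standard_normal((10, 200))
A = A + 0.01 * rng.standard_normal((300, 200))

k = 10
A_approx = randomized_lowrank(A, k, ell=2 * k)
err = np.linalg.norm(A - A_approx, 'fro')

s = np.linalg.svd(A, compute_uv=False)
best = np.sqrt(np.sum(s[k:] ** 2))   # best rank-k Frobenius error
# err is at least best, and with high probability only slightly larger
```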
SLIDE 7 A cheaper but less accurate algorithm
Recall from the JL theorem that typically ℓ = O(k ln k) is good enough
Algorithm
The same as before, but use a different ℓ = k + p, usually p = 5, 10, . . .
Theoretical guarantee
with high probability
  ‖A − U_k U_k^T A‖_2 ≤ (10 ℓ min{n, m})^{1/2} ‖A − A_k‖_2
Compared with the previous algorithm, the error constant here is much worse
SLIDE 8 Still cheap but more accurate algorithm
Iterating several times improves the quality of the captured subspace. Recall how to quickly compute the leading singular vector:
1. start with a random u
2. repeat u = A^T A u / ‖A^T A u‖_2 until convergence
Algorithm
the same as above, except B = (A A^T)^q AΩ
Theoretical guarantee
with high probability
  ‖A − U_k U_k^T A‖_2 ≤ (10 ℓ min{n, m})^{1/(4q+2)} ‖A − A_k‖_2
Time complexity: O(qnm ln ℓ), by following the multiplication order B = (. . . (A(A^T(AΩ))))
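A NumPy sketch of the power-iteration variant (my own illustration): the loop applies A A^T without ever forming it, following the multiplication order from the slide, and the comparison uses a slowly decaying spectrum where the plain q = 0 sketch struggles:

```python
import numpy as np

def randomized_lowrank_power(A, k, ell, q):
    """Rank-k approximation from the sketch B = (A A^T)^q A Omega."""
    rng = np.random.default_rng(0)     # fixed seed: same Omega per call
    B = A @ rng.standard_normal((A.shape[1], ell))   # A Omega
    for _ in range(q):
        B = A @ (A.T @ B)              # multiply by A A^T, never forming it
    Uk = np.linalg.svd(B, full_matrices=False)[0][:, :k]
    return Uk @ (Uk.T @ A)             # U_k (U_k^T A)

# Test matrix with singular values 1, 1/2, 1/3, ...
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((200, 200)))
V, _ = np.linalg.qr(rng.standard_normal((200, 200)))
A = U @ np.diag(1.0 / np.arange(1, 201)) @ V.T

err0 = np.linalg.norm(A - randomized_lowrank_power(A, 10, 15, 0))
err2 = np.linalg.norm(A - randomized_lowrank_power(A, 10, 15, 2))
# err2 <= err0: the power iterations sharpen the subspace
```

In practice one orthonormalizes B between multiplications (e.g. with a QR decomposition) to avoid losing the small singular directions to round-off; it is omitted here for brevity.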
SLIDE 9 Nyström method
The previous algorithms need to touch the whole matrix A, which is expensive
Nyström method
Assume A is symmetric
1. randomly pick ℓ columns of A to form P
2. the corresponding rows are then Q^T = P^T; denote by B the ℓ-by-ℓ cross matrix
3. then approximate A by
  A ≈ P B^+ P^T
[Figure: the n×n matrix A approximated by the n×ℓ column block P, the pseudoinverse B^+ of the ℓ×ℓ cross matrix (computed via SVD), and P^T]
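The three steps above in NumPy (a sketch with my own test matrix; note only the sampled columns of A are ever read):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, ell = 400, 8, 40

# Symmetric positive semidefinite test matrix of exact rank r.
X = rng.standard_normal((n, r))
A = X @ X.T

idx = rng.choice(n, size=ell, replace=False)  # step 1: pick ell columns
P = A[:, idx]                                 # n x ell column block
B = A[np.ix_(idx, idx)]                       # ell x ell cross matrix
A_nys = P @ np.linalg.pinv(B) @ P.T           # step 3: A ~= P B^+ P^T

rel_err = np.linalg.norm(A - A_nys) / np.linalg.norm(A)
# For a PSD matrix of rank r <= ell the approximation is exact
# (up to round-off) whenever the sampled columns span the range,
# which holds almost surely here.
```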
SLIDE 10 Nyström method
Theoretical guarantee
  E ‖A − P B^+ P^T‖_2 ≤ ‖A − A_k‖_2 + (2n/√ℓ) A*_ii,
where A*_ii = max_i A_ii
The error bound is much worse, but the time complexity reduces to O(nℓk + ℓ²k), with k ≤ ℓ ≪ n
SLIDE 11 Example, spectral embedding
Dataset: mnist8m; in total 3,276,294 samples
[Figure: (a) SVD on 8k samples, (b) Nyström method]
Mu Li, et al. Making Large-Scale Nyström Approximation Possible. ICML 2010.
Mu Li, et al. Time and Space Efficient Spectral Clustering via Column
SLIDE 12
Example, image segmentations
1 Million pixels
SLIDE 13
Example, image segmentations
1 Million pixels, 2 segmentations, CPU time 1.2 sec
SLIDE 14
Example, image segmentations
1.8 Million pixels
SLIDE 15
Example, image segmentations
1.8 Million pixels, 4 segmentations, CPU time 5.9 sec
SLIDE 16
Example, image segmentations
10 Million pixels
SLIDE 17
Example, image segmentations
10 Million pixels, 18 segmentations, CPU time 18.9 sec
SLIDE 18
Example, image segmentations
15 Million pixels
SLIDE 19
Example, image segmentations
15 Million pixels, 8 segmentations, CPU time 22.6 sec
SLIDE 20
Conclusion
It works!