Randomized SVD, CUR Decomposition, and SPSD Matrix Approximation
Shusen Wang

Outline
- CX Decomposition & Approximate SVD
- CUR Decomposition
- SPSD Matrix Approximation

CX Decomposition
$X^\star = \arg\min_X \|A - CX\|_F^2 = C^\dagger A$
$C X^\star = U_C \Sigma_C V_C^T X^\star = U_C Z = U_C U_Z \Sigma_Z V_Z^T$
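$\min_{\mathrm{rank}(X) \le k} \|A - CX\|_F^2 \le (1 + \epsilon) \|A - A_k\|_F^2$

Sketching method        | Sketch size $c$
Uniform sampling        | $O(\mu k (\log k + 1/\epsilon))$
Leverage score sampling | $O(k (\log k + 1/\epsilon))$
Gaussian projection     | $O(k/\epsilon)$
SRHT                    | $O((k + \log n)(\log k + 1/\epsilon))$
Count sketch            | $O(k^2 + k/\epsilon)$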
$\mu$ is the column coherence of $A_k$.
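As a rough illustration of three of the sketching methods listed above, the following NumPy sketch forms C = AS for a Gaussian projection, uniform column sampling, and a count sketch. The function names, the scaling constants, and the choice to sample without replacement are assumptions made for this example, not taken from the slides.

```python
import numpy as np

def gaussian_sketch(A, c, rng):
    """C = A S with S an n x c Gaussian matrix scaled by 1/sqrt(c)."""
    S = rng.standard_normal((A.shape[1], c)) / np.sqrt(c)
    return A @ S

def uniform_sampling_sketch(A, c, rng):
    """Pick c columns uniformly (without replacement here) and rescale them."""
    n = A.shape[1]
    idx = rng.choice(n, size=c, replace=False)
    return A[:, idx] * np.sqrt(n / c)

def count_sketch(A, c, rng):
    """Hash every column of A into one of c buckets with a random sign."""
    m, n = A.shape
    bucket = rng.integers(0, c, size=n)       # target bucket of each column
    sign = rng.choice([-1.0, 1.0], size=n)    # random sign of each column
    Ct = np.zeros((c, m))
    np.add.at(Ct, bucket, (A * sign).T)       # accumulate signed columns per bucket
    return Ct.T

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200))
C_gauss = gaussian_sketch(A, 20, rng)
C_unif = uniform_sampling_sketch(A, 20, rng)
C_count = count_sketch(A, 20, rng)
```

The count sketch never forms S explicitly, which is why it can be applied in time proportional to the number of nonzeros of A.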
Computing the SVD of $C X^\star = U_C \Sigma_C V_C^T X^\star = U_C Z = U_C U_Z \Sigma_Z V_Z^T$:
1. SVD: $C = U_C \Sigma_C V_C^T \in \mathbb{R}^{m \times c}$. Time cost: $O(mc^2)$.
2. Let $\Sigma_C V_C^T X^\star = Z \in \mathbb{R}^{c \times n}$. Time cost: $O(nc^2)$.
3. SVD: $Z = U_Z \Sigma_Z V_Z^T \in \mathbb{R}^{c \times n}$. Time cost: $O(nc^2)$.
4. $C X^\star = (U_C U_Z) \Sigma_Z V_Z^T$, where $U_C U_Z$ is an $m \times c$ matrix with orthonormal columns, $\Sigma_Z$ is a diagonal matrix, and $V_Z^T$ is a $c \times n$ matrix with orthonormal rows. Time cost: $O(mc^2)$.
Total time cost: $O(mc^2 + nc^2 + nc^2 + mc^2) = O(mc^2 + nc^2)$.
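The four steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the helper name `svd_of_cx`, the Gaussian sketch used to build C, and the test matrix are all choices made for this example.

```python
import numpy as np

def svd_of_cx(C, X):
    """SVD of the product C @ X without forming the m x n product."""
    # Step 1: thin SVD of C (m x c), O(m c^2).
    U_C, sig_C, Vt_C = np.linalg.svd(C, full_matrices=False)
    # Step 2: Z = Sigma_C V_C^T X (c x n), O(n c^2).
    Z = (sig_C[:, None] * Vt_C) @ X
    # Step 3: thin SVD of Z, O(n c^2).
    U_Z, sig_Z, Vt_Z = np.linalg.svd(Z, full_matrices=False)
    # Step 4: left singular vectors U_C U_Z (m x c), O(m c^2).
    return U_C @ U_Z, sig_Z, Vt_Z

rng = np.random.default_rng(0)
m, n, c = 60, 40, 8
# Approximately low-rank test matrix (rank 5 plus small noise).
A = rng.standard_normal((m, 5)) @ rng.standard_normal((5, n))
A += 0.01 * rng.standard_normal((m, n))
C = A @ rng.standard_normal((n, c))     # Gaussian projection sketch
X_star = np.linalg.pinv(C) @ A          # X* = C^+ A
U, sigma, Vt = svd_of_cx(C, X_star)
```

Only matrices with at most c rows or c columns are ever decomposed, which is where the O(mc^2 + nc^2) total comes from; forming X* itself is a separate cost.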
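$X^\star = \arg\min_X \|A - CX\|_F^2 = C^\dagger A$
A regression problem!
Fast CX: $\tilde{X} = \arg\min_X \|S^T (A - CX)\|_F^2 = (S^T C)^\dagger (S^T A)$
$\|A - C\tilde{X}\|_F^2 \le (1 + \epsilon) \cdot \min_X \|A - CX\|_F^2$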
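To make the sketched regression concrete, here is a small NumPy comparison of the exact solution with a sketched one. The Gaussian sketch S, the problem sizes, and the noisy low-rank test matrix are assumptions for this example; a sampling-based S would avoid the dense products used here.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, c, s = 2000, 30, 10, 200
A = rng.standard_normal((m, 5)) @ rng.standard_normal((5, n))
A += 0.1 * rng.standard_normal((m, n))
C = A[:, rng.choice(n, size=c, replace=False)]   # c columns chosen uniformly

# Exact regression: X* = argmin_X ||A - C X||_F^2.
X_opt = np.linalg.lstsq(C, A, rcond=None)[0]

# Sketched regression: solve the s x c problem min_X ||S^T (A - C X)||_F^2.
S = rng.standard_normal((m, s)) / np.sqrt(s)
X_fast = np.linalg.lstsq(S.T @ C, S.T @ A, rcond=None)[0]

err_opt = np.linalg.norm(A - C @ X_opt, "fro") ** 2
err_fast = np.linalg.norm(A - C @ X_fast, "fro") ** 2
ratio = err_fast / err_opt    # plays the role of 1 + epsilon
```

The ratio can never drop below 1, because X* is the exact minimizer; how close it gets to 1 depends on the sketch size s.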
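CUR Decomposition
$A \approx C U R$, where $C \in \mathbb{R}^{m \times c}$ contains $c$ columns of $A$ and $R \in \mathbb{R}^{r \times n}$ contains $r$ rows of $A$.
Optimal CUR: $U^\star = \arg\min_U \|A - CUR\|_F^2 = C^\dagger A R^\dagger$
- One can select $c = O(k/\epsilon)$ columns and $r = O(c/\epsilon)$ rows such that $\|A - C U^\star R\|_F^2 \le (1 + \epsilon) \|A - A_k\|_F^2$.
- One can select $c = O(k/\epsilon)$ columns and $r = O(k/\epsilon)$ rows such that $\|A - C U^\star R\|_F^2 \le (1 + \epsilon) \|A - A_k\|_F^2$.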
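A minimal NumPy sketch of the optimal U for given C and R. Uniform sampling stands in here for the more careful column and row selection the bounds require, and the sizes and test matrix are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, c, r = 80, 60, 10, 12
A = rng.standard_normal((m, 4)) @ rng.standard_normal((4, n))
A += 0.01 * rng.standard_normal((m, n))

cols = rng.choice(n, size=c, replace=False)   # uniform sampling (an assumption)
rows = rng.choice(m, size=r, replace=False)
C, R = A[:, cols], A[rows, :]

def cur_error(U):
    """Squared Frobenius error of the approximation C U R."""
    return np.linalg.norm(A - C @ U @ R, "fro") ** 2

# U* = C^+ A R^+ minimizes ||A - C U R||_F over all c x r matrices U.
U_opt = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
rel_err = cur_error(U_opt) / np.linalg.norm(A, "fro") ** 2
```

Computing U* touches every entry of A, which is the cost the fast CUR variant is designed to avoid.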
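Fast CUR: $\tilde{U} = \arg\min_U \|S_C^T (A - CUR) S_R\|_F^2 = (S_C^T C)^\dagger (S_C^T A S_R) (R S_R)^\dagger$
With sketch sizes $s_c = O(c/\epsilon)$ and $s_r = O(r/\epsilon)$: $\|A - C\tilde{U}R\|_F^2 \le (1 + \epsilon) \min_U \|A - CUR\|_F^2$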
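The fast CUR formula can be tried out with sampling sketches, so that only an s_c x s_r block of A is read. In this sketch S_C and S_R are uniform row and column selections, which is an assumption of the example (as are the sizes, the test matrix, and the choice s_c = 4c, s_r = 4r).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, c, r = 300, 200, 10, 10
A = rng.standard_normal((m, 5)) @ rng.standard_normal((5, n))
A += 0.05 * rng.standard_normal((m, n))
C = A[:, rng.choice(n, size=c, replace=False)]
R = A[rng.choice(m, size=r, replace=False), :]

def cur_error(U):
    return np.linalg.norm(A - C @ U @ R, "fro") ** 2

# Optimal U needs the whole matrix A.
U_opt = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)

# Fast U: S_C selects s_c rows, S_R selects s_r columns.
s_c, s_r = 4 * c, 4 * r
ri = rng.choice(m, size=s_c, replace=False)
ci = rng.choice(n, size=s_r, replace=False)
StC = C[ri, :]                 # S_C^T C,      s_c x c
StASr = A[np.ix_(ri, ci)]      # S_C^T A S_R,  s_c x s_r
RSr = R[:, ci]                 # R S_R,        r x s_r
U_fast = np.linalg.pinv(StC) @ StASr @ np.linalg.pinv(RSr)

err_opt, err_fast = cur_error(U_opt), cur_error(U_fast)
```

Because U_opt is the exact minimizer, err_fast is never smaller than err_opt; the gap shrinks as s_c and s_r grow.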
[Figure: approximations of a matrix $A$ with $m = 1920$, $n = 1168$. Panels: Original; Type 1: Fast CX; Type 2: Optimal CUR; Type 3: Fast CUR ($s_c = 2c$, $s_r = 2r$); Type 3: Fast CUR ($s_c = 4c$, $s_r = 4r$).]
When $n = 10^5$, the $n \times n$ matrix costs 80GB memory!
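The 80GB figure follows from simple arithmetic, assuming 8-byte double-precision entries and decimal gigabytes (both are assumptions of this check):

```python
n = 10**5
bytes_per_entry = 8                           # assumption: float64 entries
total_gb = n * n * bytes_per_entry / 1e9      # decimal gigabytes
print(total_gb)                               # 80.0
```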
The prototype model with $c = O(k/\epsilon)$ columns by adaptive sampling: $\min_U \|K - C U C^T\|_F^2 \le (1 + \epsilon) \|K - K_k\|_F^2$
The Prototype Model: $U^\star = \arg\min_U \|K - C U C^T\|_F^2 = C^\dagger K (C^\dagger)^T$
The Fast Model: $\tilde{U} = \arg\min_U \|S^T (K - C U C^T) S\|_F^2 = (S^T C)^\dagger (S^T K S) (C^T S)^\dagger$
With sketch size $s = O(c \sqrt{n/\epsilon})$: $\|K - C \tilde{U} C^T\|_F^2 \le (1 + \epsilon) \|K - C U^\star C^T\|_F^2$
The faster model is nearly as good as the prototype model!
The cost is linear in $n$.
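The two models can be compared numerically. In this NumPy sketch the RBF kernel, the data, the sizes, and the choice of S as a uniform column selection that contains the columns of C are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, c, s = 400, 5, 15, 60
X = rng.standard_normal((n, d))
sq = np.sum(X**2, axis=1)
# RBF kernel matrix (SPSD); the bandwidth is an arbitrary choice.
K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / (2 * d))

col_idx = rng.choice(n, size=c, replace=False)
C = K[:, col_idx]

def spsd_error(U):
    return np.linalg.norm(K - C @ U @ C.T, "fro") ** 2

# Prototype model: U* = C^+ K (C^+)^T, reads all n^2 entries of K.
C_pinv = np.linalg.pinv(C)
U_proto = C_pinv @ K @ C_pinv.T

# Fast model: S selects s indices, so only an s x s block of K is used.
others = np.setdiff1d(np.arange(n), col_idx)
sk = np.concatenate([col_idx, rng.choice(others, size=s - c, replace=False)])
StC_pinv = np.linalg.pinv(C[sk, :])          # (S^T C)^+
U_fast = StC_pinv @ K[np.ix_(sk, sk)] @ StC_pinv.T

err_proto, err_fast = spsd_error(U_proto), spsd_error(U_fast)
```

Here K is formed in full only so the errors can be measured; the fast model itself needs just C and the s x s block.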
The Fast Model: $\tilde{U} = \arg\min_U \|S^T (K - C U C^T) S\|_F^2 = (S^T C)^\dagger (S^T K S) (C^T S)^\dagger$
The Nyström Method: take $S = P$, the column selection matrix with $C = KP$ and $W = P^T K P$. Then
$U^{\mathrm{nys}} = \arg\min_U \|P^T (K - C U C^T) P\|_F^2 = (P^T C)^\dagger (P^T K P) (C^T P)^\dagger = W^\dagger W W^\dagger = W^\dagger$
The Nyström method is an instance of the fast model, which in turn approximates the prototype model.
Very efficient!
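The identity can be checked numerically. In this sketch the helper name `fast_U`, the random SPSD test matrix, and the sizes are assumptions; the second check, that S = I recovers the prototype solution, is an added observation that follows directly from the fast model formula.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c = 50, 6
B = rng.standard_normal((n, 2 * n))
K = B @ B.T                               # a generic SPSD matrix
idx = rng.choice(n, size=c, replace=False)
P = np.eye(n)[:, idx]                     # column selection matrix
C = K @ P                                 # C = K P
W = P.T @ K @ P                           # W = P^T K P

def fast_U(S):
    """Fast model: (S^T C)^+ (S^T K S) (C^T S)^+."""
    StC_pinv = np.linalg.pinv(S.T @ C)
    return StC_pinv @ (S.T @ K @ S) @ StC_pinv.T

U_nystrom = fast_U(P)                     # S = P gives W^+ W W^+ = W^+
U_prototype = fast_U(np.eye(n))           # S = I gives C^+ K (C^+)^T
```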
- The Nyström Method: $O(nc^2)$ time
- The Fast Model: $O(nc^2 + s^2 c)$ time, where $s$ is the sketch size
- The Prototype Model: $O(n^2 c)$ time