Randomized SVD, CUR Decomposition, and SPSD Matrix Approximation - PowerPoint PPT Presentation



slide-1
SLIDE 1

Randomized SVD, CUR Decomposition, and SPSD Matrix Approximation

Shusen Wang

slide-2
SLIDE 2

Outline

  • CX Decomposition & Approximate SVD
  • CUR Decomposition
  • SPSD Matrix Approximation
slide-3
SLIDE 3
  • Given any matrix A ∈ ℝ^{m×n}
  • The CX decomposition of A:
  • 1. Sketching: C = AP ∈ ℝ^{m×c}
  • 2. Find X such that A ≈ CX
  • E.g. X⋆ = argmin_X ‖A − CX‖_F² = C†A
  • It costs O(mnc) time
  • CX decomposition ⇔ approximate SVD

A ≈ CX = U_C Σ_C V_Cᵀ X = U_C Z = U_C U_Z Σ_Z V_Zᵀ

CX Decomposition
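
To make the two steps concrete, here is a minimal NumPy sketch of the CX decomposition with uniform column sampling; the function name cx_decomposition and the toy sizes are illustrative, not from the slides.

```python
import numpy as np

def cx_decomposition(A, c, rng=None):
    """CX decomposition: sketch C = A P by uniform column sampling,
    then solve X = argmin_X ||A - C X||_F^2 = pinv(C) @ A."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    idx = rng.choice(n, size=c, replace=False)   # uniform column sampling
    C = A[:, idx]                                # C = A P  (m x c)
    X = np.linalg.pinv(C) @ A                    # X* = C^+ A  (c x n), the O(mnc) step
    return C, X

# toy example: a low-rank matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 80)) @ rng.standard_normal((80, 300))
C, X = cx_decomposition(A, c=120, rng=0)
print(np.linalg.norm(A - C @ X) / np.linalg.norm(A))  # relative Frobenius error
```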

slide-4
SLIDE 4
  • Let the sketching matrix P ∈ ℝ^{n×c} be one of the schemes listed below.
  • Then min_{rank(X)≤k} ‖A − CX‖_F² ≤ (1 + ε) ‖A − A_k‖_F² provided that c is at least:
  • Uniform sampling: O(μ k log k + k/ε)
  • Leverage score sampling: O(k log k + k/ε)
  • Gaussian projection: O(k/ε)
  • SRHT: O((k + log n) log k + k/ε)
  • Count sketch: O(k² + k/ε)

μ is the column coherence of A_k

CX Decomposition
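
The two sampling-based choices of P above can be implemented directly; a small sketch follows. Note that computing exact leverage scores needs a full SVD, so in practice they are approximated; the function names and sizes here are illustrative.

```python
import numpy as np

def uniform_sampling(A, c, rng):
    """Pick c column indices uniformly at random."""
    return rng.choice(A.shape[1], size=c, replace=False)

def leverage_score_sampling(A, k, c, rng):
    """Pick c column indices with probability proportional to the
    rank-k leverage scores l_j = ||V_k[j, :]||_2^2 (exact scores via SVD)."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    lev = np.sum(Vt[:k, :] ** 2, axis=0)
    p = lev / lev.sum()
    return rng.choice(A.shape[1], size=c, replace=False, p=p)

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50)) @ rng.standard_normal((50, 400))
idx = leverage_score_sampling(A, k=50, c=100, rng=rng)
C = A[:, idx]
X = np.linalg.pinv(C) @ A
print(np.linalg.norm(A - C @ X) / np.linalg.norm(A))
```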

slide-5
SLIDE 5

CX Decomposition ⇔ Approximate SVD

  • CX decomposition ⇔ approximate SVD

A ≈ CX = U_C Σ_C V_Cᵀ X = U_C Z = U_C U_Z Σ_Z V_Zᵀ

slide-6
SLIDE 6

CX Decomposition ⇔ Approximate SVD

  • CX decomposition ⇔ approximate SVD

A ≈ CX = U_C Σ_C V_Cᵀ X = U_C Z = U_C U_Z Σ_Z V_Zᵀ

SVD: C = U_C Σ_C V_Cᵀ ∈ ℝ^{m×c}

Time cost: O(mc²)

slide-7
SLIDE 7

CX Decomposition ⇔ Approximate SVD

  • CX decomposition ⇔ approximate SVD

A ≈ CX = U_C Σ_C V_Cᵀ X = U_C Z = U_C U_Z Σ_Z V_Zᵀ

SVD: C = U_C Σ_C V_Cᵀ ∈ ℝ^{m×c}

Let Z = Σ_C V_Cᵀ X ∈ ℝ^{c×n}

Time cost: O(mc² + nc²)

slide-8
SLIDE 8

CX Decomposition ⇔ Approximate SVD

  • CX decomposition ⇔ approximate SVD

A ≈ CX = U_C Σ_C V_Cᵀ X = U_C Z = U_C U_Z Σ_Z V_Zᵀ

SVD: C = U_C Σ_C V_Cᵀ ∈ ℝ^{m×c}

Let Z = Σ_C V_Cᵀ X ∈ ℝ^{c×n}

SVD: Z = U_Z Σ_Z V_Zᵀ ∈ ℝ^{c×n}

Time cost: O(mc² + nc² + nc²)

slide-9
SLIDE 9

CX Decomposition ⇔ Approximate SVD

  • CX decomposition ⇔ approximate SVD

A ≈ CX = U_C Σ_C V_Cᵀ X = U_C Z = U_C U_Z Σ_Z V_Zᵀ

SVD: C = U_C Σ_C V_Cᵀ ∈ ℝ^{m×c}

Let Z = Σ_C V_Cᵀ X ∈ ℝ^{c×n}

SVD: Z = U_Z Σ_Z V_Zᵀ ∈ ℝ^{c×n}

(U_C U_Z): m×c matrix with orthonormal columns; Σ_Z: c×c diagonal matrix; V_Zᵀ: c×n matrix with orthonormal rows

Time cost: O(mc² + nc² + nc² + mc²)

slide-10
SLIDE 10

CX Decomposition ⇔ Approximate SVD

  • CX decomposition ⇔ approximate SVD
  • Done! Approximate rank-c SVD: A ≈ (U_C U_Z) Σ_Z V_Zᵀ

A ≈ CX = U_C Σ_C V_Cᵀ X = U_C Z = U_C U_Z Σ_Z V_Zᵀ

(U_C U_Z): m×c matrix with orthonormal columns; Σ_Z: c×c diagonal matrix; V_Zᵀ: c×n matrix with orthonormal rows

Time cost: O(mc² + nc² + nc² + mc²) = O(mc² + nc²)

slide-11
SLIDE 11

CX Decomposition ⇔ Approximate SVD

  • CX decomposition ⇔ approximate SVD
  • Given A ∈ ℝ^{m×n} and C ∈ ℝ^{m×c}, the approximate SVD costs
  • O(mnc) time
  • O(mc + nc) memory
slide-12
SLIDE 12

CX Decomposition

  • The CX decomposition of A ∈ ℝ^{m×n}
  • Optimal solution: X⋆ = argmin_X ‖A − CX‖_F² = C†A
  • How to make it more efficient?
slide-13
SLIDE 13

CX Decomposition

  • The CX decomposition of A ∈ ℝ^{m×n}
  • Optimal solution: X⋆ = argmin_X ‖A − CX‖_F² = C†A
  • How to make it more efficient?

A regression problem!

slide-14
SLIDE 14

Fast CX Decomposition

  • Fast CX [Drineas, Mahoney, Muthukrishnan, 2008] [Clarkson & Woodruff, 2013]
  • Draw another sketching matrix S ∈ ℝ^{m×s}
  • Compute X̃ = argmin_X ‖Sᵀ(A − CX)‖_F² = (SᵀC)† (SᵀA)
  • Time cost: O(ncs) + TimeOfSketch
  • When s = Õ(c/ε),

‖A − CX̃‖_F² ≤ (1 + ε) · min_X ‖A − CX‖_F²
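
A minimal sketch of the sketched regression, taking S to be a uniform row-sampling matrix for simplicity (the references above also allow leverage-score sampling, SRHT, and other sketches); names and sizes are illustrative.

```python
import numpy as np

def fast_cx(A, C, s, rng=None):
    """X ~= argmin_X ||S^T (A - C X)||_F^2 with S a uniform row-sampling sketch,
    i.e. (S^T C)^+ (S^T A): a regression on s sampled rows instead of all m."""
    rng = np.random.default_rng(rng)
    rows = rng.choice(A.shape[0], size=s, replace=False)    # S^T picks s rows
    return np.linalg.pinv(C[rows, :]) @ A[rows, :]          # c x n

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 40)) @ rng.standard_normal((40, 600))
C = A[:, rng.choice(600, size=80, replace=False)]
X_exact = np.linalg.pinv(C) @ A          # exact solution, O(mnc)
X_fast = fast_cx(A, C, s=400, rng=1)     # regression on the sketched rows only
for X in (X_exact, X_fast):
    print(np.linalg.norm(A - C @ X) / np.linalg.norm(A))
```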

slide-15
SLIDE 15

Outline

  • CX Decomposition & Approximate SVD
  • CUR Decomposition
  • SPSD Matrix Approximation
slide-16
SLIDE 16

CUR Decomposition

  • Sketching
  • C = A P_C ∈ ℝ^{m×c}
  • R = P_Rᵀ A ∈ ℝ^{r×n}
  • Find U such that CUR ≈ A
  • CUR ⇔ Approximate SVD
  • In the same way as "CX ⇔ Approximate SVD"
slide-17
SLIDE 17

CUR Decomposition

  • Sketching
  • C = A P_C ∈ ℝ^{m×c}
  • R = P_Rᵀ A ∈ ℝ^{r×n}
  • Find U such that CUR ≈ A
  • CUR ⇔ Approximate SVD
  • In the same way as "CX ⇔ Approximate SVD"
  • 3 types of U
slide-18
SLIDE 18

CUR Decomposition

  • Type 1 [Drineas, Mahoney, Muthukrishnan, 2008]:

U = (P_Rᵀ A P_C)†

(Diagram: A ≈ C U R)

slide-19
SLIDE 19

CUR Decomposition

  • Type 1 [Drineas, Mahoney, Muthukrishnan, 2008]:

U = (P_Rᵀ A P_C)†

  • Recall the fast CX decomposition

A ≈ C X̃ = C (P_Rᵀ C)† (P_Rᵀ A) = C U R
slide-21
SLIDE 21

CUR Decomposition

  • Type 1 [Drineas, Mahoney, Muthukrishnan, 2008]:

U = (P_Rᵀ A P_C)†

  • Recall the fast CX decomposition

A ≈ C X̃ = C (P_Rᵀ C)† (P_Rᵀ A) = C U R

  • They're equivalent: C X̃ = C U R

slide-22
SLIDE 22

CUR Decomposition

  • Type 1 [Drineas, Mahoney, Muthukrishnan, 2008]:

U = (P_Rᵀ A P_C)†

  • Recall the fast CX decomposition

A ≈ C X̃ = C (P_Rᵀ C)† (P_Rᵀ A) = C U R

  • They're equivalent: C X̃ = C U R
  • Require c = Õ(k/ε) and r = Õ(c/ε) such that

‖A − CUR‖_F² ≤ (1 + ε) ‖A − A_k‖_F²

slide-23
SLIDE 23

CUR Decomposition

  • Type 1 [Drineas, Mahoney, Muthukrishnan, 2008]:

U = (P_Rᵀ A P_C)†

  • Efficient
  • O(rc²) + TimeOfSketch
  • Loose bound
  • Sketch size ∝ ε⁻²
  • Bad empirical performance
slide-24
SLIDE 24

CUR Decomposition

  • Type 2: Optimal CUR

U⋆ = argmin_U ‖A − CUR‖_F² = C†AR†
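
A minimal sketch of the Type 2 computation with uniform column/row sampling for C and R; the guarantees on the next slide rely on adaptive sampling, which this illustration does not implement. Names and sizes are illustrative.

```python
import numpy as np

def optimal_cur(A, c, r, rng=None):
    """C = sampled columns, R = sampled rows, U* = C^+ A R^+."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    C = A[:, rng.choice(n, size=c, replace=False)]
    R = A[rng.choice(m, size=r, replace=False), :]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)   # the O(mnc) step
    return C, U, R

rng = np.random.default_rng(0)
A = rng.standard_normal((600, 30)) @ rng.standard_normal((30, 400))
C, U, R = optimal_cur(A, c=60, r=60, rng=0)
print(np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A))
```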

slide-25
SLIDE 25

CUR Decomposition

  • Type 2: Optimal CUR

U⋆ = argmin_U ‖A − CUR‖_F² = C†AR†

  • Theory [W & Zhang, 2013], [Boutsidis & Woodruff, 2014]:
  • C and R are selected by the adaptive sampling algorithm
  • c = O(k/ε) and r = O(k/ε)
  • ‖A − CUR‖_F² ≤ (1 + ε) ‖A − A_k‖_F²

slide-26
SLIDE 26

CUR Decomposition

  • Type 2: Optimal CUR

U⋆ = argmin_U ‖A − CUR‖_F² = C†AR†

  • Inefficient
  • O(mnc) + TimeOfSketch
slide-27
SLIDE 27

CUR Decomposition

  • Type 3: Fast CUR [W, Zhang, Zhang, 2015]
  • Draw 2 sketching matrices S_C and S_R
  • Solve the problem

Ũ = argmin_U ‖S_Cᵀ (A − CUR) S_R‖_F² = (S_Cᵀ C)† (S_Cᵀ A S_R) (R S_R)†

  • Intuition?
slide-28
SLIDE 28

CUR Decomposition

  • The optimal U matrix is obtained by the optimization problem

U⋆ = argmin_U ‖CUR − A‖_F²

slide-29
SLIDE 29

CUR Decomposition

  • Approximately solve the optimization problem, e.g. by column selection

slide-30
SLIDE 30

CUR Decomposition

  • Solve the small-scale problem
slide-31
SLIDE 31

CUR Decomposition

  • Type 3: Fast CUR [W, Zhang, Zhang, 2015]
  • Draw 2 sketching matrices S_C ∈ ℝ^{m×s_C} and S_R ∈ ℝ^{n×s_R}
  • Solve the problem

Ũ = argmin_U ‖S_Cᵀ (A − CUR) S_R‖_F² = (S_Cᵀ C)† (S_Cᵀ A S_R) (R S_R)†

  • Theory
  • s_C = O(c/ε) and s_R = O(r/ε)
  • ‖A − C Ũ R‖_F² ≤ (1 + ε) · min_U ‖A − CUR‖_F²

slide-32
SLIDE 32

CUR Decomposition

  • Type 3: Fast CUR [W, Zhang, Zhang, 2015]
  • Draw 2 sketching matrices S_C ∈ ℝ^{m×s_C} and S_R ∈ ℝ^{n×s_R}
  • Solve the problem

Ũ = argmin_U ‖S_Cᵀ (A − CUR) S_R‖_F² = (S_Cᵀ C)† (S_Cᵀ A S_R) (R S_R)†

  • Efficient
  • O(s_C s_R (c + r)) + TimeOfSketch
  • Good empirical performance
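
A minimal sketch of the Type 3 (fast) computation of U, using uniform row/column sampling for the two sketching matrices S_C and S_R; names and sizes are illustrative.

```python
import numpy as np

def fast_cur_U(A, C, R, s_c, s_r, rng=None):
    """U ~= (S_C^T C)^+ (S_C^T A S_R) (R S_R)^+ with sampling sketches."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    rows = rng.choice(m, size=s_c, replace=False)   # S_C^T selects rows
    cols = rng.choice(n, size=s_r, replace=False)   # S_R selects columns
    return np.linalg.pinv(C[rows, :]) @ A[np.ix_(rows, cols)] @ np.linalg.pinv(R[:, cols])

rng = np.random.default_rng(0)
A = rng.standard_normal((1500, 40)) @ rng.standard_normal((40, 1000))
C = A[:, rng.choice(1000, size=100, replace=False)]
R = A[rng.choice(1500, size=100, replace=False), :]
U = fast_cur_U(A, C, R, s_c=400, s_r=400, rng=1)
print(np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A))
```
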
slide-33
SLIDE 33

(Figure: comparison of the Original, Type 1: Fast CX, Type 2: Optimal CUR, Type 3: Fast CUR with s_C = 2c, s_R = 2r, and Type 3: Fast CUR with s_C = 4c, s_R = 4r.)

  • A: m = 1920, n = 1168
  • C and R: c = r = 100, uniform sampling
slide-34
SLIDE 34

Conclusions

  • Approximate truncated SVD
  • CX decomposition
  • CUR decomposition (3 types)
  • Fast CUR is the best
slide-35
SLIDE 35

Outline

  • CX Decomposition & Approximate SVD
  • CUR Decomposition
  • SPSD Matrix Approximation
slide-36
SLIDE 36

Motivation 1: Kernel Matrix

  • Given n samples x_1, ⋯, x_n ∈ ℝ^d and a kernel function κ(·,·).
  • E.g. the Gaussian RBF kernel

κ(x_i, x_j) = exp(−‖x_i − x_j‖₂² / σ²).

  • Computing the kernel matrix K ∈ ℝ^{n×n}
  • where k_ij = κ(x_i, x_j)
  • costs O(n²d) time
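
For concreteness, a vectorized sketch of forming the Gaussian RBF kernel matrix; it still takes Θ(n²d) time and Θ(n²) memory, which is exactly the cost this section wants to avoid. The function name is illustrative.

```python
import numpy as np

def rbf_kernel_matrix(X, sigma):
    """K[i, j] = exp(-||x_i - x_j||_2^2 / sigma^2) for rows x_i of X (n x d)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # pairwise squared distances
    return np.exp(-np.maximum(d2, 0.0) / sigma ** 2)    # clip tiny negatives from rounding

X = np.random.default_rng(0).standard_normal((1000, 20))
K = rbf_kernel_matrix(X, sigma=2.0)
print(K.shape, K[0, 0])   # (1000, 1000) 1.0
```
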
slide-38
SLIDE 38

Motivation 2: Matrix Inversion

  • Solve the linear system

(K + αI_n) w = y to find w ∈ ℝ^n.

  • It costs O(n³) time and O(n²) memory.
  • Performed by
  • Gaussian process regression (equivalently, kernel ridge regression)
  • Least squares kernel SVM
  • K ∈ ℝ^{n×n} is the kernel matrix
  • y = [y_1, ⋯, y_n] ∈ ℝ^n contains the labels
slide-39
SLIDE 39

Motivation 2: Matrix Inversion

  • Solve the linear system

(K + αI_n) w = y to find w ∈ ℝ^n.

  • Solution: w⋆ = (K + αI_n)⁻¹ y
slide-40
SLIDE 40

Motivation 2: Matrix Inversion

  • Solve the linear system

(K + αI_n) w = y to find w ∈ ℝ^n.

  • Solution: w⋆ = (K + αI_n)⁻¹ y
  • It costs
  • O(n³) time
  • O(n²) memory.
slide-41
SLIDE 41

Motivation 2: Matrix Inversion

  • Solve the linear system

(K + αI_n) w = y to find w ∈ ℝ^n.

  • Solution: w⋆ = (K + αI_n)⁻¹ y
  • It costs
  • O(n³) time
  • O(n²) memory.
  • Performed by
  • Kernel ridge regression
  • Least squares kernel SVM
slide-42
SLIDE 42

Motivation 3: Eigenvalue Decomposition

  • Find the top k (≪ n) eigenvectors of K.
  • It costs
  • Õ(n²k) time
  • O(n²) memory.
slide-43
SLIDE 43

Motivation 3: Eigenvalue Decomposition

  • Find the top k (≪ n) eigenvectors of K.
  • It costs
  • Õ(n²k) time
  • O(n²) memory.
  • Performed by
  • Kernel PCA (k is the target rank)
  • Manifold learning (k is the target rank)
slide-44
SLIDE 44

Computational Challenges

  • Time costs
  • Computing the kernel matrix: O(n²d)
  • Matrix inversion: O(n³)
  • Rank-k eigenvalue decomposition: O(n²k)
slide-45
SLIDE 45

Computational Challenges

  • Time costs
  • Computing the kernel matrix: O(n²d)
  • Matrix inversion: O(n³)
  • Rank-k eigenvalue decomposition: O(n²k)

At least quadratic time!

slide-46
SLIDE 46

Computational Challenges

  • Time costs
  • Computing the kernel matrix: O(n²d)
  • Matrix inversion: O(n³)
  • Rank-k eigenvalue decomposition: O(n²k)
  • Memory costs
  • Inversion and eigenvalue decomposition: O(n²)
slide-47
SLIDE 47

Computational Challenges

  • Time costs
  • Computing the kernel matrix: O(n²d)
  • Matrix inversion: O(n³)
  • Rank-k eigenvalue decomposition: O(n²k)
  • Memory costs
  • Inversion and eigenvalue decomposition: O(n²)
  • Because
  • the numerical algorithms are pass-inefficient
  • ⇒ form K and keep it in memory
slide-48
SLIDE 48

Computational Challenges

  • Time costs
  • Computing the kernel matrix: O(n²d)
  • Matrix inversion: O(n³)
  • Rank-k eigenvalue decomposition: O(n²k)
  • Memory costs
  • Inversion and eigenvalue decomposition: O(n²)
  • Because
  • the numerical algorithms are pass-inefficient
  • ⇒ form K and keep it in memory

When n = 10⁵, the n×n matrix costs 80 GB of memory!

slide-49
SLIDE 49

How to Speedup?

  • Efficiently form the low-rank approximation

K ≈ C U Cᵀ

slide-50
SLIDE 50

How to Speedup?

  • Efficiently form the low-rank approximation

K ≈ C U Cᵀ

  • Equivalently, K ≈ L Lᵀ
slide-51
SLIDE 51

Efficient Matrix Inversion

  • Solve the linear system (K + αI_n) w = y:
  • Replace K by LLᵀ: w⋆ = (K + αI_n)⁻¹ y ≈ (LLᵀ + αI_n)⁻¹ y

slide-52
SLIDE 52

Efficient Matrix Inversion

  • Approximately solve the linear system (K + αI_n) w = y:
  • Replace K by LLᵀ: w⋆ = (K + αI_n)⁻¹ y ≈ (LLᵀ + αI_n)⁻¹ y

slide-53
SLIDE 53

Efficient Matrix Inversion

  • Approximately solve the linear system (K + αI_n) w = y
  • Replace K by LLᵀ: w⋆ = (K + αI_n)⁻¹ y ≈ (LLᵀ + αI_n)⁻¹ y
  • Expand the inverse by the Woodbury identity
slide-54
SLIDE 54

Efficient Matrix Inversion

  • Approximately solve the linear system (K + αI_n) w = y
  • Replace K by LLᵀ: w⋆ = (K + αI_n)⁻¹ y ≈ (LLᵀ + αI_n)⁻¹ y
  • Expand the inverse by the Woodbury identity

(A + BCD)⁻¹ = A⁻¹ − A⁻¹B(C⁻¹ + DA⁻¹B)⁻¹DA⁻¹

slide-55
SLIDE 55

Efficient Matrix Inversion

  • Approximately solve the linear system (K + αI_n) w = y
  • Replace K by LLᵀ: w⋆ = (K + αI_n)⁻¹ y ≈ (LLᵀ + αI_n)⁻¹ y
  • Expand the inverse by the Woodbury identity

w⋆ ≈ α⁻¹y − α⁻¹L(αI + LᵀL)⁻¹Lᵀy

slide-56
SLIDE 56

Efficient Matrix Inversion

  • Approximately solve the linear system (K + αI_n) w = y
  • Replace K by LLᵀ: w⋆ = (K + αI_n)⁻¹ y ≈ (LLᵀ + αI_n)⁻¹ y
  • Expand the inverse by the Woodbury identity

w⋆ ≈ α⁻¹y − α⁻¹L(αI + LᵀL)⁻¹Lᵀy

  • Time cost: O(nc²)

Linear in n, much better than O(n³)
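
A minimal sketch of the Woodbury-based solve, assuming a factor L with K ≈ LLᵀ is already available; names are illustrative.

```python
import numpy as np

def woodbury_solve(L, alpha, y):
    """Approximate (K + alpha I)^{-1} y using K ~= L L^T:
    w ~= alpha^{-1} y - alpha^{-1} L (alpha I + L^T L)^{-1} L^T y.  Cost O(n c^2)."""
    c = L.shape[1]
    small = alpha * np.eye(c) + L.T @ L              # c x c system
    return (y - L @ np.linalg.solve(small, L.T @ y)) / alpha

# sanity check against the exact solve on a small problem
rng = np.random.default_rng(0)
L = rng.standard_normal((500, 30))
K = L @ L.T                                          # exactly low-rank here
y = rng.standard_normal(500)
w_exact = np.linalg.solve(K + 0.1 * np.eye(500), y)
w_fast = woodbury_solve(L, 0.1, y)
print(np.max(np.abs(w_exact - w_fast)))              # close to machine precision
```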

slide-57
SLIDE 57

Efficient Eigenvalue Decomposition

  • Approximately compute the rank-k eigenvalue decomposition of K
  • SVD: L = U_L Σ_L V_Lᵀ
  • K ≈ L Lᵀ = U_L Σ_L² U_Lᵀ
  • Approximate rank-k eigenvalue decomposition of K
  • eigenvectors: the first k columns of U_L
  • eigenvalues: the first k diagonal entries of Σ_L²
slide-58
SLIDE 58

Efficient Eigenvalue Decomposition

  • Approximately compute the rank-k eigenvalue decomposition of K
  • SVD: L = U_L Σ_L V_Lᵀ
  • K ≈ L Lᵀ = U_L Σ_L² U_Lᵀ
  • Approximate rank-k eigenvalue decomposition of K
  • eigenvectors: the first k columns of U_L
  • Time cost: O(nc²)
slide-59
SLIDE 59

Efficient Eigenvalue Decomposition

  • Approximately compute the rank-k eigenvalue decomposition of K
  • SVD: L = U_L Σ_L V_Lᵀ
  • K ≈ L Lᵀ = U_L Σ_L² U_Lᵀ
  • Approximate rank-k eigenvalue decomposition of K
  • eigenvectors: the first k columns of U_L
  • Time cost: O(nc²)
  • Much lower than Õ(n²k)
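
A minimal sketch of recovering approximate top-k eigenpairs from a factor L with K ≈ LLᵀ; names are illustrative.

```python
import numpy as np

def approx_eig_from_factor(L, k):
    """Top-k approximate eigenpairs of K ~= L L^T via the thin SVD of L.
    Eigenvectors: first k columns of U_L; eigenvalues: first k entries of Sigma_L^2."""
    U_L, S_L, _ = np.linalg.svd(L, full_matrices=False)   # O(n c^2)
    return U_L[:, :k], S_L[:k] ** 2

rng = np.random.default_rng(0)
L = rng.standard_normal((800, 40))
vecs, vals = approx_eig_from_factor(L, k=10)
# compare with the exact eigenvalues of K = L L^T (exact low rank here)
exact = np.linalg.eigvalsh(L @ L.T)[::-1][:10]
print(np.allclose(vals, exact))
```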

slide-60
SLIDE 60

Sketching Based Models

  • How to find such an approximation?

K ≈ C U Cᵀ

slide-61
SLIDE 61

Sketching Based Models

  • How to find such an approximation?
  • Sketching based methods: C = KP ∈ ℝ^{n×c} is a sketch of K.
  • P ∈ ℝ^{n×c} can be a column selection or a random projection matrix

K ≈ C U Cᵀ

slide-62
SLIDE 62

Sketching Based Models

  • How to find such an approximation?
  • Sketching based methods: C = KP ∈ ℝ^{n×c} is a sketch of K.
  • P ∈ ℝ^{n×c} can be a column selection or a random projection matrix
  • Three methods:
  • The prototype model [HMT11, WZ13, WLZ16]
  • The fast model [WZZ15]
  • The Nyström method [WS15, GM13]

K ≈ C U Cᵀ

slide-63
SLIDE 63

The Prototype Model

  • Objective: K ≈ C U Cᵀ
  • Minimize the approximation error by

U⋆ = argmin_U ‖K − C U Cᵀ‖_F² = C†K(C†)ᵀ.

slide-64
SLIDE 64

The Prototype Model

  • Objective: K ≈ C U Cᵀ
  • Minimize the approximation error by

U⋆ = argmin_U ‖K − C U Cᵀ‖_F² = C†K(C†)ᵀ.

Extension of the randomized SVD to SPSD matrices [HMT11]

slide-65
SLIDE 65

The Prototype Model

  • Objective: K ≈ C U Cᵀ
  • Minimize the approximation error by

U⋆ = argmin_U ‖K − C U Cᵀ‖_F² = C†K(C†)ᵀ.

  • Time: O(n²c)
  • The time complexity is nearly the same as that of the rank-k eigenvalue decomposition.
  • In practice it is much faster than the rank-k eigenvalue decomposition.
slide-66
SLIDE 66

The Prototype Model

  • Objective: K ≈ C U Cᵀ
  • Minimize the approximation error by

U⋆ = argmin_U ‖K − C U Cᵀ‖_F² = C†K(C†)ᵀ.

  • Time: O(n²c)
  • #Passes: one
slide-67
SLIDE 67

The Prototype Model

  • Objective: K ≈ C U Cᵀ
  • Minimize the approximation error by

U⋆ = argmin_U ‖K − C U Cᵀ‖_F² = C†K(C†)ᵀ.

  • Time: O(n²c)
  • #Passes: one
  • Memory: O(nc)
  • Load k_ij into memory only when it is visited
  • Keep C† in memory
slide-68
SLIDE 68

The Prototype Model

  • Error bound
  • k ≪ n is an arbitrary integer
  • P samples c = O(k/ε) columns by adaptive sampling
  • 𝔼 ‖K − C U⋆ Cᵀ‖_F² ≤ (1 + ε) ‖K − K_k‖_F²

slide-69
SLIDE 69

The Prototype Model

  • Limitations
  • U⋆ = C†K(C†)ᵀ
  • Time cost is O(n²c)
  • Requires observing the whole of K
slide-70
SLIDE 70

The Prototype Model

  • Prototype model: K ≈ C U⋆ Cᵀ, where

U⋆ = argmin_U ‖K − C U Cᵀ‖_F².

slide-71
SLIDE 71

The Fast Model

  • Column/row selection
  • Form SᵀKS and SᵀC

(Diagram: SᵀKS ≈ (SᵀC) U (CᵀS))


slide-73
SLIDE 73

The Fast Model

  • K ≈ C Ũ Cᵀ, where

Ũ = argmin_U ‖Sᵀ(K − C U Cᵀ)S‖_F².

(Diagram: SᵀKS ≈ (SᵀC) U (CᵀS))

slide-74
SLIDE 74

The Fast Model

  • Prototype model:

U⋆ = argmin_U ‖K − C U Cᵀ‖_F² = C†K(C†)ᵀ

  • Fast model:

Ũ = argmin_U ‖Sᵀ(K − C U Cᵀ)S‖_F² = (SᵀC)†(SᵀKS)(CᵀS)†.
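
A minimal sketch of the fast model with S taken as a uniform column selection for simplicity (the theory on the next slides uses leverage-score sampling); only the sampled columns C and the s×s block SᵀKS of K are needed. Names are illustrative.

```python
import numpy as np

def fast_spsd(K, c, s, rng=None):
    """K ~= C U~ C^T with U~ = (S^T C)^+ (S^T K S) (C^T S)^+."""
    rng = np.random.default_rng(rng)
    n = K.shape[0]
    col = rng.choice(n, size=c, replace=False)     # defines C = K P
    sel = rng.choice(n, size=s, replace=False)     # defines S
    C = K[:, col]
    SC = C[sel, :]                                 # S^T C,  s x c
    SKS = K[np.ix_(sel, sel)]                      # S^T K S, s x s
    U = np.linalg.pinv(SC) @ SKS @ np.linalg.pinv(SC).T
    return C, U

rng = np.random.default_rng(0)
X = rng.standard_normal((800, 20))
K = X @ X.T
C, U = fast_spsd(K, c=50, s=300, rng=0)
print(np.linalg.norm(K - C @ U @ C.T) / np.linalg.norm(K))
```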

slide-75
SLIDE 75

The Fast Model

  • Prototype model:

U⋆ = argmin_U ‖K − C U Cᵀ‖_F² = C†K(C†)ᵀ

  • Fast model:

Ũ = argmin_U ‖Sᵀ(K − C U Cᵀ)S‖_F² = (SᵀC)†(SᵀKS)(CᵀS)†.

  • Theory
  • s = O(√n · c/ε)
  • S is a column selection matrix (sampled according to the row leverage scores of C)
  • Then ‖K − C Ũ Cᵀ‖_F² ≤ (1 + ε) ‖K − C U⋆ Cᵀ‖_F²

The fast model is nearly as good as the prototype model!

slide-76
SLIDE 76

The Fast Model

  • Prototype model:

U⋆ = argmin_U ‖K − C U Cᵀ‖_F² = C†K(C†)ᵀ

  • Fast model:

Ũ = argmin_U ‖Sᵀ(K − C U Cᵀ)S‖_F² = (SᵀC)†(SᵀKS)(CᵀS)†.

  • Theory
  • s = O(√n · c/ε)
  • S is a column selection matrix (sampled according to the row leverage scores of C)
  • Then ‖K − C Ũ Cᵀ‖_F² ≤ (1 + ε) ‖K − C U⋆ Cᵀ‖_F²
  • Overall time cost: O(s²c + nc²) = O(nc³/ε²)

linear in n

slide-77
SLIDE 77

The Nyström Method

  • P (n×c): column selection matrix
  • C = KP (n×c), W = PᵀKP = PᵀC (c×c)
  • The Nyström method: K ≈ C W† Cᵀ


slide-81
SLIDE 81

The Nyström Method

  • P (n×c): column selection matrix
  • C = KP (n×c), W = PᵀKP = PᵀC (c×c)
  • The Nyström method: K ≈ C W† Cᵀ
  • New explanation:
  • Recall the fast model: Ũ = argmin_U ‖Sᵀ(K − C U Cᵀ)S‖_F²
  • Setting S = P, then

Ũ = argmin_U ‖Pᵀ(K − C U Cᵀ)P‖_F² = (PᵀC)†(PᵀKP)(CᵀP)† = W†WW† = W†

slide-86
SLIDE 86

The Nyström Method

  • P (n×c): column selection matrix
  • C = KP (n×c), W = PᵀKP = PᵀC (c×c)
  • The Nyström method: K ≈ C W† Cᵀ
  • New explanation:
  • Recall the fast model: Ũ = argmin_U ‖Sᵀ(K − C U Cᵀ)S‖_F²
  • Setting S = P, then

Ũ = argmin_U ‖Pᵀ(K − C U Cᵀ)P‖_F² = (PᵀC)†(PᵀKP)(CᵀP)† = W†WW† = W†

  • The Nyström method is a special instance of the fast model.
  • It is an approximate solution to the prototype model.


slide-88
SLIDE 88

The Nyström Method

  • Cost
  • Time: O(nc²)
  • Memory: O(nc)
slide-89
SLIDE 89

The Nyström Method

  • Cost
  • Time: O(nc²)
  • Memory: O(nc)

Very efficient!

slide-90
SLIDE 90

The Nyström Method

  • Cost
  • Time: O(nc²)
  • Memory: O(nc)
  • Error bound: weak

Very efficient!

slide-91
SLIDE 91

Comparisons

  • C = KP ∈ ℝ^{n×c}, W = PᵀKP = PᵀC ∈ ℝ^{c×c}
  • SPSD matrix approximation: K ≈ C U Cᵀ
  • The prototype model: U = C†K(C†)ᵀ
  • The fast model: U = (SᵀC)†(SᵀKS)(CᵀS)†
  • The Nyström method: U = W†
slide-92
SLIDE 92

Comparisons

  • C = KP ∈ ℝ^{n×c}, W = PᵀKP = PᵀC ∈ ℝ^{c×c}
  • SPSD matrix approximation: K ≈ C U Cᵀ
  • The prototype model: U = C†K(C†)ᵀ
  • The fast model: U = (SᵀC)†(SᵀKS)(CᵀS)†
  • The Nyström method: U = W†

When S = I_n, the prototype model ⇔ the fast model

slide-93
SLIDE 93

Comparisons

  • C = KP ∈ ℝ^{n×c}, W = PᵀKP = PᵀC ∈ ℝ^{c×c}
  • SPSD matrix approximation: K ≈ C U Cᵀ
  • The prototype model: U = C†K(C†)ᵀ
  • The fast model: U = (SᵀC)†(SᵀKS)(CᵀS)†
  • The Nyström method: U = W†

When S = P, the Nyström method ⇔ the fast model

slide-94
SLIDE 94

Comparisons

  • c = 150, n = 100c, vary s from 2c to 40c

(Figure: approximation error ‖K − C U Cᵀ‖_F² / ‖K‖_F² plotted against s/c for the three methods.)

  • The Nyström method: O(nc²) time
  • The fast model: O(nc² + s²c) time
  • The prototype model: O(n²c) time

slide-95
SLIDE 95

Conclusions

  • Motivations
  • Avoid forming the kernel matrix
  • Avoid inversion/decomposition
  • Prototype model, fast model, the Nyström method
  • They are connected
  • The fast model and the Nyström method are practical