JL Lemma, Dimensionality Reduction, and Subspace Embeddings
SLIDE 1

CS 498ABD: Algorithms for Big Data

JL Lemma, Dimensionality Reduction, and Subspace Embeddings

Lecture 11

September 29, 2020
SLIDE 2

$F_2$ estimation in the turnstile setting

AMS-$\ell_2$-Estimate: Let $Y_1, Y_2, \ldots, Y_n$ be $\{-1, +1\}$ random variables that are 4-wise independent.

    z ← 0
    While (stream is not empty) do
        a_j = (i_j, Δ_j) is the current update
        z ← z + Δ_j · Y_{i_j}
    endWhile
    Output z²

Claim: The output estimates $\|x\|_2^2$, where $x$ is the vector at the end of the stream of updates.
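To make the procedure concrete, here is a minimal Python sketch of this estimator. The stream format, the degree-3 polynomial hash used for the 4-wise independent signs, and all names are illustrative assumptions, not part of the lecture.

```python
import random

# A Mersenne prime; the degree-3 polynomial hash below is 4-wise independent
# over Z_P, and taking the parity of the value gives a {-1,+1} sign with
# negligible bias (P is odd).
P = 2**61 - 1

def make_sign_hash():
    """Return a function i -> Y_i in {-1, +1}, 4-wise independent."""
    a, b, c, d = (random.randrange(P) for _ in range(4))
    def sign(i):
        v = (((a * i + b) * i + c) * i + d) % P
        return 1 if v % 2 == 0 else -1
    return sign

def ams_f2_estimate(stream):
    """One AMS estimator: z = sum_j Delta_j * Y_{i_j}; output z**2."""
    Y = make_sign_hash()
    z = 0
    for i, delta in stream:   # each update is (index, Delta)
        z += delta * Y(i)
    return z * z

# Tiny turnstile stream; the exact F2 = ||x||_2^2 is computed for comparison.
stream = [(0, 3), (1, -2), (0, 1), (3, 5), (1, 1)]
x = {}
for i, delta in stream:
    x[i] = x.get(i, 0) + delta
print("true F2:", sum(v * v for v in x.values()))   # 42
print("one estimate:", ams_f2_estimate(stream))
```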
SLIDE 3

Analysis

$Z = \sum_{i=1}^n x_i Y_i$ and the output is $Z^2$.

$Z^2 = \sum_i x_i^2 Y_i^2 + 2 \sum_{i \ne j} x_i x_j Y_i Y_j$, and hence $E[Z^2] = \sum_i x_i^2 = \|x\|_2^2$.

One can show that $\mathrm{Var}(Z^2) \le 2 \left( E[Z^2] \right)^2$.
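The variance bound is exactly where 4-wise independence enters; a short derivation filling in the elided step (every monomial in the expansion of $Z^4$ containing an odd power of some $Y_i$ vanishes in expectation, by 4-wise independence):

```latex
\begin{align*}
E[Z^4] &= \sum_i x_i^4\, E[Y_i^4] + 3 \sum_{i \ne j} x_i^2 x_j^2\, E[Y_i^2]\, E[Y_j^2]
        = \sum_i x_i^4 + 3 \sum_{i \ne j} x_i^2 x_j^2, \\
\mathrm{Var}(Z^2) &= E[Z^4] - \Big(\sum_i x_i^2\Big)^2
        = 2 \sum_{i \ne j} x_i^2 x_j^2
        \;\le\; 2\,\Big(\sum_i x_i^2\Big)^2 = 2\,(E[Z^2])^2.
\end{align*}
```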
SLIDE 4

Linear Sketching View

Recall that we take the average of independent estimators and then take the median to reduce the error. Can we view all this as a sketch?

AMS-$\ell_2$-Sketch: $k = c \log(1/\delta)/\epsilon^2$. Let $M$ be a $k \times n$ matrix with entries in $\{-1, +1\}$ such that (i) the rows are independent and (ii) within each row the entries are 4-wise independent.

    z is a k × 1 vector initialized to 0
    While (stream is not empty) do
        a_j = (i_j, Δ_j) is the current update
        z ← z + Δ_j · M e_{i_j}
    endWhile
    Output the vector z as the sketch.

$M$ is compactly represented via $k$ hash functions, one per row, independently chosen from a 4-wise independent hash family.
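A self-contained Python sketch of the full estimator built from this linear sketch; the group sizes, the use of fully independent signs in place of a 4-wise independent family, and the helper names are simplifying assumptions:

```python
import random
import statistics

def make_sign_fn():
    """Lazily assign an independent random sign to each index.
    (Fully independent signs for simplicity; 4-wise independence suffices.)"""
    cache = {}
    def sign(i):
        if i not in cache:
            cache[i] = random.choice((-1, 1))
        return cache[i]
    return sign

def ams_f2_sketch(stream, groups=9, per_group=20):
    """Maintain z = Mx for a (groups*per_group) x n sign matrix M, then
    average z_i^2 within each group and take the median across groups."""
    rows = [[make_sign_fn() for _ in range(per_group)] for _ in range(groups)]
    z = [[0] * per_group for _ in range(groups)]
    for i, delta in stream:          # linear update: z <- z + delta * M e_i
        for g in range(groups):
            for r in range(per_group):
                z[g][r] += delta * rows[g][r](i)
    means = [sum(v * v for v in z[g]) / per_group for g in range(groups)]
    return statistics.median(means)

stream = [(0, 3), (1, -2), (0, 1), (3, 5), (1, 1)]   # true F2 = 42
print("median-of-means estimate:", ams_f2_sketch(stream))
```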
SLIDE 5

Geometric Interpretation

Given a vector $x \in \mathbb{R}^n$, let $M$ be the random map above. Then $z = Mx$ has the following features: $E[z_i] = 0$ and $E[z_i^2] = \|x\|_2^2$ for each $1 \le i \le k$, where $k$ is the number of rows of $M$. Thus each $z_i^2$ is an estimate of the length of $x$ in the Euclidean norm. When $k = \Theta(\frac{1}{\epsilon^2} \log(1/\delta))$, one can obtain a $(1 \pm \epsilon)$ estimate of $\|x\|_2$ by the averaging and median ideas.

Thus we are able to compress $x$ into a $k$-dimensional vector $z$ such that $z$ contains enough information to estimate $\|x\|_2$ accurately.
SLIDE 6

Geometric Interpretation

Question: Do we need the median trick? Will averaging do?
SLIDE 7

Distributional JL Lemma

Lemma (Distributional JL Lemma). Fix a vector $x \in \mathbb{R}^d$ and let $\Pi \in \mathbb{R}^{k \times d}$ be a matrix where each entry $\Pi_{ij}$ is chosen independently according to the standard normal distribution $N(0, 1)$. If $k = \Omega(\frac{1}{\epsilon^2} \log(1/\delta))$, then with probability $(1 - \delta)$,
$$\Big\| \tfrac{1}{\sqrt{k}} \Pi x \Big\|_2 = (1 \pm \epsilon) \|x\|_2.$$

Can choose entries from $\{-1, +1\}$ as well. Note: unlike $\ell_2$ estimation, the entries of $\Pi$ are fully independent.

Letting $z = \frac{1}{\sqrt{k}} \Pi x$, we have projected $x$ from $d$ dimensions to $k = O(\frac{1}{\epsilon^2} \log(1/\delta))$ dimensions while preserving length to within a $(1 \pm \epsilon)$ factor.
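A quick numerical illustration of the lemma; the sizes, the constant 8 in the choice of $k$, and the seed are arbitrary choices for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)
d, eps, delta = 5000, 0.25, 0.01
k = int(np.ceil(8 * np.log(1 / delta) / eps**2))   # the constant 8 is a guess

x = rng.standard_normal(d)               # a fixed vector in R^d
Pi = rng.standard_normal((k, d))         # entries iid N(0, 1)
z = (Pi @ x) / np.sqrt(k)
print(f"k = {k}, ||z||/||x|| = {np.linalg.norm(z) / np.linalg.norm(x):.4f}")
```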
SLIDE 9

Dimensionality reduction

Theorem (Metric JL Lemma). Let $v_1, v_2, \ldots, v_n$ be any $n$ points/vectors in $\mathbb{R}^d$. For any $\epsilon \in (0, 1/2)$, there is a linear map $f : \mathbb{R}^d \to \mathbb{R}^k$ where $k \le 8 \ln n / \epsilon^2$ such that for all $1 \le i < j \le n$,
$$(1 - \epsilon) \|v_i - v_j\|_2 \;\le\; \|f(v_i) - f(v_j)\|_2 \;\le\; (1 + \epsilon) \|v_i - v_j\|_2.$$
Moreover, $f$ can be obtained in randomized polynomial time.

The linear map $f$ is simply given by the random matrix $\Pi$: $f(v) = \Pi v$.
SLIDE 11

Dimensionality reduction

Proof. Apply DJL with $\delta = 1/n^2$ and apply the union bound to the $\binom{n}{2}$ vectors $(v_i - v_j)$, $i \ne j$.
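The theorem in action on random points (all parameters are illustrative):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, d, eps = 50, 2000, 0.25
k = int(np.ceil(8 * np.log(n) / eps**2))        # k <= 8 ln n / eps^2

V = rng.standard_normal((n, d))                 # n points in R^d
Pi = rng.standard_normal((k, d)) / np.sqrt(k)   # f(v) = Pi v
W = V @ Pi.T                                    # images in R^k

ratios = [np.linalg.norm(W[i] - W[j]) / np.linalg.norm(V[i] - V[j])
          for i, j in combinations(range(n), 2)]
print(f"k = {k}, pairwise distance ratios in [{min(ratios):.3f}, {max(ratios):.3f}]")
```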
SLIDE 12

DJL and Metric JL

Key advantage: the mapping is oblivious to the data!
SLIDE 13

Normal Distribution

Density function: $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$.

Standard normal: $N(0, 1)$ is the case $\mu = 0$, $\sigma = 1$.
SLIDE 14

Normal Distribution

Cumulative density function for the standard normal: $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt$ (no closed form).
SLIDE 15

Sum of independent Normally distributed variables

Lemma. Let $X$ and $Y$ be independent random variables with $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$. Let $Z = X + Y$. Then $Z \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$.
SLIDE 16

Sum of independent Normally distributed variables

Corollary. Let $X$ and $Y$ be independent $N(0, 1)$ random variables and let $Z = aX + bY$. Then $Z \sim N(0, a^2 + b^2)$.
SLIDE 17

Sum of independent Normally distributed variables

The normal distribution is a stable distribution: adding two independent random variables within the same class gives a distribution inside the class. Other stable distributions exist and are useful in $F_p$ estimation for $p \in (0, 2)$.
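A quick empirical check of the corollary (the constants are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, trials = 3.0, 4.0, 200_000
Z = a * rng.standard_normal(trials) + b * rng.standard_normal(trials)
# Z should look like N(0, a^2 + b^2) = N(0, 25).
print(f"sample mean {Z.mean():.3f}, sample variance {Z.var():.2f}")
```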
SLIDE 18

Concentration of sum of squares of normally distributed variables

$\chi^2(k)$ distribution: the distribution of the sum of squares of $k$ independent standard normal variables, $Y = \sum_{i=1}^{k} Z_i^2$ where each $Z_i \sim N(0, 1)$.
SLIDE 19

Concentration of sum of squares of normally distributed variables

$E[Z_i^2] = 1$, hence $E[Y] = k$.
SLIDE 20

Concentration of sum of squares of normally distributed variables

Lemma. Let $Z_1, Z_2, \ldots, Z_k$ be independent $N(0, 1)$ random variables and let $Y = \sum_i Z_i^2$. Then, for $\epsilon \in (0, 1/2)$, there is a constant $c$ such that
$$\Pr\big[(1 - \epsilon)^2 k \le Y \le (1 + \epsilon)^2 k\big] \;\ge\; 1 - 2 e^{-c \epsilon^2 k}.$$
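An empirical look at this concentration (parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
k, eps, trials = 400, 0.2, 10_000
Y = (rng.standard_normal((trials, k)) ** 2).sum(axis=1)   # chi^2(k) samples
inside = ((1 - eps)**2 * k <= Y) & (Y <= (1 + eps)**2 * k)
print(f"fraction of trials inside [(1-eps)^2 k, (1+eps)^2 k]: {inside.mean():.4f}")
```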
SLIDE 21

$\chi^2$ distribution

[Figure: density function of the $\chi^2$ distribution.]
SLIDE 22

$\chi^2$ distribution

[Figure: cumulative density function of the $\chi^2$ distribution.]
SLIDE 23

Concentration of sum of squares of normally distributed variables

Recall the Chernoff-Hoeffding bound for bounded independent non-negative random variables. $Z_i^2$ is not bounded; however, Chernoff-Hoeffding bounds extend to sums of random variables with exponentially decaying tails.
SLIDE 25

Proof of DJL Lemma

Without loss of generality assume $\|x\|_2 = 1$ (unit vector).

Let $Z_i = \sum_{j=1}^{d} \Pi_{ij} x_j$. By the stability of the normal distribution, $Z_i \sim N(0, \sum_j x_j^2) = N(0, 1)$.
SLIDE 26

Proof of DJL Lemma

Let $Y = \sum_{i=1}^{k} Z_i^2$. $Y$'s distribution is $\chi^2(k)$ since $Z_1, \ldots, Z_k$ are iid $N(0, 1)$.
SLIDE 27

Proof of DJL Lemma

Hence $\Pr\big[(1 - \epsilon)^2 k \le Y \le (1 + \epsilon)^2 k\big] \ge 1 - 2 e^{-c \epsilon^2 k}$.
SLIDE 28

Proof of DJL Lemma

Since $k = \Omega(\frac{1}{\epsilon^2} \log(1/\delta))$, we have $\Pr\big[(1 - \epsilon)^2 k \le Y \le (1 + \epsilon)^2 k\big] \ge 1 - \delta$.
SLIDE 29

Proof of DJL Lemma

Therefore $\|z\|_2 = \sqrt{Y/k}$ has the property that with probability $(1 - \delta)$, $\|z\|_2 = (1 \pm \epsilon) \|x\|_2$.
SLIDE 30

JL lower bounds

Question: Are the bounds achieved by the lemmas tight, or can we do better? How about non-linear maps?

The bounds are essentially optimal modulo constant factors for worst-case point sets.
SLIDE 31

Fast JL and Sparse JL

The projection matrix $\Pi$ is dense, and hence computing $\Pi x$ takes $\Theta(kd)$ time.

Question: Can we find a $\Pi$ that improves the time bound? Two scenarios: $x$ is dense, and $x$ is sparse.
SLIDE 32

Fast JL and Sparse JL

Known results:

- Choose $\Pi_{ij}$ from $\{+1, 0, -1\}$ with probabilities $1/6, 2/3, 1/6$ respectively (suitably rescaled). This also works; roughly $2/3$ of the entries are $0$.
- Fast JL: choose $\Pi$ in a dependent way to ensure that $\Pi x$ can be computed in $O(d \log d + k^2)$ time. For dense $x$.
- Sparse JL: choose $\Pi$ such that each column is $s$-sparse. The best known is $s = O(\frac{1}{\epsilon} \log(1/\delta))$. Helps for sparse $x$ (a simplified version is sketched below).
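A sketch of the column-sparse idea: each column has exactly $s$ nonzero entries, each $\pm 1/\sqrt{s}$, in uniformly random rows. This is an illustrative stand-in, not the exact best-known sparse JL distribution:

```python
import numpy as np

def sparse_jl(k, d, s, rng):
    """Each column gets exactly s nonzero entries, each +-1/sqrt(s), placed
    in uniformly random rows. A simplified sparse-JL-style construction."""
    Pi = np.zeros((k, d))
    for j in range(d):
        rows = rng.choice(k, size=s, replace=False)
        Pi[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return Pi

rng = np.random.default_rng(4)
k, d, s = 300, 5000, 8
Pi = sparse_jl(k, d, s, rng)
x = np.zeros(d)                                  # a 20-sparse vector
idx = rng.choice(d, size=20, replace=False)
x[idx] = rng.standard_normal(20)
print(f"||Pi x|| / ||x|| = {np.linalg.norm(Pi @ x) / np.linalg.norm(x):.3f}")
```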
SLIDE 33

Part I (Oblivious) Subspace Embeddings

SLIDE 34

Subspace Embedding

Question: Suppose we have a linear subspace $E$ of $\mathbb{R}^n$ of dimension $d$. Can we find a projection $\Pi : \mathbb{R}^n \to \mathbb{R}^k$ such that for every $x \in E$, $\|\Pi x\|_2 = (1 \pm \epsilon) \|x\|_2$?
SLIDE 36

Subspace Embedding

Not possible if $k < d$. Why?
SLIDE 37

Subspace Embedding

$\Pi$ maps $E$ to a lower dimension, which implies some non-zero vector $x \in E$ is mapped to $0$.
SLIDE 38

Subspace Embedding

Possible if $k = d$. Why?
SLIDE 39

Subspace Embedding

Pick $\Pi$ to be an orthonormal basis for $E$.
SLIDE 40

Subspace Embedding

Disadvantage: this requires knowing $E$ and computing an orthonormal basis, which is slow.
SLIDE 41

Subspace Embedding

What we really want: an oblivious subspace embedding à la JL, based on random projections.
SLIDE 42

Oblivious Subspace Embedding

Theorem. Suppose $E$ is a linear subspace of $\mathbb{R}^n$ of dimension $d$. Let $\Pi \in \mathbb{R}^{k \times n}$ be a DJL matrix with $k = O(\frac{d}{\epsilon^2} \log(1/\delta))$ rows. Then with probability $(1 - \delta)$, for every $x \in E$,
$$\Big\| \tfrac{1}{\sqrt{k}} \Pi x \Big\|_2 = (1 \pm \epsilon) \|x\|_2.$$

In other words, the JL Lemma extends from one dimension to an arbitrary number of dimensions in a graceful way.
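An empirical check of the theorem (sizes are illustrative): embed a random $d$-dimensional subspace $E$ of $\mathbb{R}^n$ and test norm preservation on many random vectors drawn from $E$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, k = 2000, 10, 800
B, _ = np.linalg.qr(rng.standard_normal((n, d)))   # orthonormal basis of E
Pi = rng.standard_normal((k, n)) / np.sqrt(k)      # scaled DJL matrix

C = rng.standard_normal((d, 1000))   # random coefficients -> vectors in E
X = B @ C                            # columns are vectors in E
Z = Pi @ X
ratios = np.linalg.norm(Z, axis=0) / np.linalg.norm(X, axis=0)
print(f"norm ratios over 1000 vectors of E: [{ratios.min():.3f}, {ratios.max():.3f}]")
```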
SLIDE 43

Proof Idea

How do we prove that $\Pi$ works for all $x \in E$, which is an infinite set? There are several proofs, but one useful argument that is often a starting hammer is the "net argument":

- Choose a large but finite set of vectors $T$ carefully (the net).
- Prove that $\Pi$ preserves the lengths of the vectors in $T$ (via a naive union bound).
- Argue that any vector $x \in E$ is sufficiently close to a vector in $T$, and hence $\Pi$ also preserves the length of $x$.
SLIDE 44

Net argument

Sufficient to focus on unit vectors in $E$. Why? (By linearity: for $x \ne 0$, $\|\Pi x\|_2 / \|x\|_2 = \|\Pi(x/\|x\|_2)\|_2$.)
SLIDE 45

Net argument

Also assume, wlog and for ease of notation, that $E$ is the subspace formed by the first $d$ coordinates in the standard basis.
SLIDE 46

Net argument

Claim: There is a net $T$ of size $e^{O(d)}$ such that preserving the lengths of the vectors in $T$ suffices.
SLIDE 47

Net argument

Assuming the claim: use DJL with $k = O(\frac{d}{\epsilon^2} \log(1/\delta))$ and a union bound to show that all vectors in $T$ are preserved in length up to a $(1 \pm \epsilon)$ factor.
SLIDE 49

Net argument

A weaker net: consider the box $[-1, 1]^d$ and impose a grid of side length $\epsilon/d$.

- The number of grid vertices is $(2d/\epsilon)^d$.
- It is sufficient to take $T$ to be the grid vertices.
- This gives a weaker bound of $O(\frac{1}{\epsilon^2} d \log(d/\epsilon))$ dimensions; the union-bound arithmetic is sketched below.
- A more careful net argument gives the tight bound.
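Filling in the union-bound arithmetic behind the weaker bound (constants treated loosely): applying DJL to each net vector with failure probability $\delta/|T|$ gives

```latex
\[
|T| = \Big(\frac{2d}{\epsilon}\Big)^{d}
\;\Longrightarrow\;
k \;=\; O\Big(\frac{1}{\epsilon^{2}} \log\frac{|T|}{\delta}\Big)
  \;=\; O\Big(\frac{1}{\epsilon^{2}}\Big(d \log\frac{d}{\epsilon} + \log\frac{1}{\delta}\Big)\Big),
\]
```

which for constant $\delta$ is the stated $O(\frac{1}{\epsilon^2} d \log(d/\epsilon))$.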
SLIDE 50

Net argument: analysis

Fix any $x \in E$ such that $\|x\|_2 = 1$ (unit vector). There is a grid point $y$ such that $\|y\|_2 \le 1$ and $x$ is close to $y$. Let $z = x - y$. We have $|z_i| \le \epsilon/d$ for $1 \le i \le d$ and $z_i = 0$ for $i > d$.
SLIDE 51

Net argument: analysis

$$\|\Pi x\| = \|\Pi y + \Pi z\| \;\le\; \|\Pi y\| + \|\Pi z\| \;\le\; (1 + \epsilon) + (1 + \epsilon) \sum_{i=1}^{d} |z_i| \;\le\; (1 + \epsilon) + \epsilon(1 + \epsilon) \;\le\; 1 + 3\epsilon.$$
SLIDE 52

Net argument: analysis

Similarly, $\|\Pi x\| \ge 1 - O(\epsilon)$; the matching chain is spelled out below.
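The lower-bound chain, reconstructed from the board work under the same assumptions ($y \in T$, so $\|\Pi y\| \ge (1 - \epsilon)\|y\|_2$, and $\|y\|_2 \ge \|x\|_2 - \|z\|_2 \ge 1 - \epsilon$):

```latex
\[
\|\Pi x\| \;\ge\; \|\Pi y\| - \|\Pi z\|
        \;\ge\; (1-\epsilon)\,\|y\|_2 - \epsilon(1+\epsilon)
        \;\ge\; (1-\epsilon)^2 - \epsilon(1+\epsilon)
        \;=\; 1 - 3\epsilon.
\]
```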
SLIDE 53

Application of Subspace Embeddings

Faster algorithms for approximate matrix multiplication, regression, and SVD.

Basic idea: we want to perform operations on a matrix $A$ with $n$ data columns (say in a large dimension $\mathbb{R}^h$) with small effective rank $d$. We want to reduce to a matrix of size roughly $d \times d$ by spending time proportional to $\mathrm{nnz}(A)$, the number of non-zero entries of $A$. Later in the course. A small sketch-and-solve preview follows.
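A toy sketch-and-solve regression example (arbitrary sizes and a dense JL-style sketch; the nnz-time constructions come later in the course): compress the rows of a tall least-squares problem and solve the smaller problem.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, k = 20_000, 10, 400
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

S = rng.standard_normal((k, n)) / np.sqrt(k)       # dense JL-style sketch
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)    # min ||Ax - b||
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

res = lambda x: np.linalg.norm(A @ x - b)
print(f"residual blow-up: {res(x_sketch) / res(x_exact):.4f}")
```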