SLIDE 1

CS 498ABD: Algorithms for Big Data

Subspace Embeddings for Regression

Lecture 12

October 1, 2020

Chandra (UIUC), Fall 2020

SLIDE 2

Subspace Embedding

Question: Suppose we have a linear subspace $E$ of $\mathbb{R}^n$ of dimension $d$. Can we find a projection $\Pi : \mathbb{R}^n \to \mathbb{R}^k$ such that for every $x \in E$, $\|\Pi x\|_2 = (1 \pm \epsilon)\|x\|_2$?

  • Not possible if $k < d$.
  • Possible if $k = d$: pick $\Pi$ whose rows form an orthonormal basis for $E$. Disadvantage: this requires knowing $E$ and computing an orthonormal basis, which is slow.

What we really want: an oblivious subspace embedding a la JL, based on random projections.


SLIDE 3

Oblivious Subspace Embedding

Theorem. Suppose $E$ is a linear subspace of $\mathbb{R}^n$ of dimension $d$. Let $\Pi \in \mathbb{R}^{k \times n}$ be a DJL matrix with $k = O(\frac{d}{\epsilon^2}\log(1/\delta))$ rows. Then, with probability $(1-\delta)$, for every $x \in E$, $\|\frac{1}{\sqrt{k}}\Pi x\|_2 = (1 \pm \epsilon)\|x\|_2$.

In other words, the JL Lemma extends from one dimension to an arbitrary number of dimensions in a graceful way.
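A minimal numerical check (not part of the slides), assuming a dense Gaussian matrix plays the role of the DJL matrix: norms of vectors in a random $d$-dimensional subspace are preserved up to $1 \pm \epsilon$ after the scaled projection.

```python
# Sketch: a scaled Gaussian Pi approximately preserves norms on a d-dim subspace.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 5_000, 10, 1_000    # k on the order of d / eps^2 for eps around 0.1

# Random d-dimensional subspace E of R^n, via an orthonormal basis U.
U, _ = np.linalg.qr(rng.standard_normal((n, d)))

# DJL-style matrix: i.i.d. N(0, 1) entries, scaled by 1/sqrt(k).
Pi = rng.standard_normal((k, n)) / np.sqrt(k)

# Sample vectors x = U c in E and compare ||Pi x||_2 to ||x||_2.
ratios = []
for _ in range(100):
    x = U @ rng.standard_normal(d)
    ratios.append(np.linalg.norm(Pi @ x) / np.linalg.norm(x))
print(min(ratios), max(ratios))   # typically within (1 - eps, 1 + eps)
```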


SLIDE 4

Part I: Faster algorithms via subspace embeddings


SLIDE 5

Linear model fitting

An important problem in data analysis:

  • $n$ data points. Each data point $a_i \in \mathbb{R}^d$ with a real value $b_i$. We think of $a_i = (a_{i,1}, a_{i,2}, \ldots, a_{i,d})$.
  • An interesting special case is $d = 1$.
  • What model should one use to explain the data?


SLIDE 6

Linear model fitting

An important problem in data analysis:

  • $n$ data points. Each data point $a_i \in \mathbb{R}^d$ with a real value $b_i$. We think of $a_i = (a_{i,1}, a_{i,2}, \ldots, a_{i,d})$.
  • An interesting special case is $d = 1$.
  • What model should one use to explain the data?

Simplest model? Affine fitting: $b_i = \alpha_0 + \sum_{j=1}^{d} \alpha_j a_{i,j}$ for some real numbers $\alpha_0, \alpha_1, \ldots, \alpha_d$. Can restrict to $\alpha_0 = 0$ by lifting to $d + 1$ dimensions, hence a linear model.


SLIDE 7

Linear model fitting

An important problem in data analysis:

  • $n$ data points. Each data point $a_i \in \mathbb{R}^d$ with a real value $b_i$. We think of $a_i = (a_{i,1}, a_{i,2}, \ldots, a_{i,d})$.
  • An interesting special case is $d = 1$.
  • What model should one use to explain the data?

Simplest model? Affine fitting: $b_i = \alpha_0 + \sum_{j=1}^{d} \alpha_j a_{i,j}$ for some real numbers $\alpha_0, \alpha_1, \ldots, \alpha_d$. Can restrict to $\alpha_0 = 0$ by lifting to $d + 1$ dimensions, hence a linear model.

But data is noisy, so we won't be able to satisfy all data points even if the true model is a linear model. How do we find a good linear model?
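A minimal illustration (not from the slides) of the lifting step: appending a constant-1 coordinate to each data point turns the affine model $b_i \approx \alpha_0 + \sum_j \alpha_j a_{i,j}$ into a purely linear model in $d + 1$ dimensions.

```python
# Lifting trick: an affine fit becomes a linear fit after adding a ones column.
import numpy as np

rng = np.random.default_rng(7)
n, d = 500, 3
A = rng.standard_normal((n, d))                  # rows are the data points a_i
b = 2.0 + A @ np.array([1.0, -1.0, 0.5]) + 0.01 * rng.standard_normal(n)

A_lift = np.hstack([np.ones((n, 1)), A])         # lift to d + 1 dimensions
coef = np.linalg.lstsq(A_lift, b, rcond=None)[0]
print(coef)   # first entry recovers alpha_0 (about 2.0), the rest the alpha_j
```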


SLIDE 8

Regression

  • $n$ data points. Each data point $a_i \in \mathbb{R}^d$ with a real value $b_i$. We think of $a_i = (a_{i,1}, a_{i,2}, \ldots, a_{i,d})$.
  • Linear model fitting: find real numbers $\alpha_1, \ldots, \alpha_d$ such that $b_i \approx \sum_{j=1}^{d} \alpha_j a_{i,j}$ for all points.
  • Let $A$ be the matrix with one row per data point $a_i$. We write $x_1, x_2, \ldots, x_d$ as variables for finding $\alpha_1, \ldots, \alpha_d$.
  • Ideally: find $x \in \mathbb{R}^d$ such that $Ax = b$.
  • Best fit: find $x \in \mathbb{R}^d$ to minimize $Ax - b$ under some norm: $\|Ax - b\|_1$, $\|Ax - b\|_2$, $\|Ax - b\|_\infty$.



SLIDE 9

Linear least squares/Regression

Linear least squares: Given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, find $x$ to minimize $\|Ax - b\|_2$.

  • Optimal estimator for certain noise models.
  • Interesting when $n \gg d$: the over-constrained case, where there is no solution to $Ax = b$ and we want to find the best fit.


[Figure from Wikipedia]

SLIDE 10

Linear least squares/Regression

Linear least squares: Given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, find $x$ to minimize $\|Ax - b\|_2$.

  • Interesting when $n \gg d$: the over-constrained case, where there is no solution to $Ax = b$ and we want to find the best fit.
  • Geometrically, $Ax$ is a linear combination of the columns of $A$. Hence we are asking: what is the vector $z$ in the column space of $A$ that is closest to the vector $b$ in $\ell_2$ norm?
  • The closest vector to $b$ is the projection of $b$ onto the column space of $A$, so it is "obvious" geometrically. How do we find it?



SLIDE 11

[Handwritten board work, largely illegible: whether $b$ lies in the column span of $A$, and minimizing $\|Ax - b\|_2^2$ when it does not.]

SLIDE 12

Linear least squares/Regression

Linear least squares: Given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, find $x$ to minimize $\|Ax - b\|_2$.

  • Geometrically, $Ax$ is a linear combination of the columns of $A$. Hence we are asking: what is the vector $z$ in the column space of $A$ that is closest to the vector $b$ in $\ell_2$ norm?
  • The closest vector to $b$ is the projection of $b$ onto the column space of $A$, so it is "obvious" geometrically. How do we find it?
  • Find an orthonormal basis $z_1, z_2, \ldots, z_r$ for the columns of $A$. Compute the projection $c$ of $b$ onto the column space of $A$ as $c = \sum_{j=1}^{r} \langle b, z_j \rangle z_j$ and output the answer $\|b - c\|_2$.

What is $x$?


[Handwritten note: $x$ is obtained by expressing $c$ as $Ax = c$.]
SLIDE 13

Linear least squares/Regression

Linear least squares: Given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, find $x$ to minimize $\|Ax - b\|_2$.

  • Geometrically, $Ax$ is a linear combination of the columns of $A$. Hence we are asking: what is the vector $z$ in the column space of $A$ that is closest to the vector $b$ in $\ell_2$ norm?
  • The closest vector to $b$ is the projection of $b$ onto the column space of $A$, so it is "obvious" geometrically. How do we find it?
  • Find an orthonormal basis $z_1, z_2, \ldots, z_r$ for the columns of $A$. Compute the projection $c$ of $b$ onto the column space of $A$ as $c = \sum_{j=1}^{r} \langle b, z_j \rangle z_j$ and output the answer $\|b - c\|_2$.

What is $x$? We know that $Ax = c$: solve the linear system. Both steps can be combined via the SVD and other methods.
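A minimal sketch (not from the slides) of the two-step recipe above, using a QR factorization for the orthonormal basis and assuming $A$ has full column rank:

```python
# Project b onto the column space of A, then solve Ax = c.
import numpy as np

rng = np.random.default_rng(1)
n, d = 1_000, 5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

Q, R = np.linalg.qr(A)            # columns of Q: orthonormal basis z_1, ..., z_r
c = Q @ (Q.T @ b)                 # projection of b onto the column space of A
x = np.linalg.solve(R, Q.T @ b)   # solve Ax = c: since A = QR, this is Rx = Q^T b

print(np.linalg.norm(b - c))      # the optimal residual ||b - c||_2
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # matches lstsq
```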


SLIDE 14

Linear least squares: Optimization perspective

Linear least squares: Given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, find $x$ to minimize $\|Ax - b\|_2$.

Optimization: Find $x \in \mathbb{R}^d$ to minimize $\|Ax - b\|_2^2$, where

$\|Ax - b\|_2^2 = x^T A^T A x - 2 b^T A x + b^T b.$

The quadratic function $f(x) = x^T A^T A x - 2 b^T A x + b^T b$ is convex since the matrix $A^T A$ is positive semi-definite. $\nabla f(x) = 2 A^T A x - 2 A^T b$, and hence the optimum solution $x^*$ satisfies $A^T A x^* = A^T b$, i.e., $x^* = (A^T A)^{-1} A^T b$ (when $A^T A$ is invertible).
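A minimal sketch (not from the slides) checking the normal-equations formula against a library least-squares solver, assuming $A$ has full column rank:

```python
# Normal equations x* = (A^T A)^{-1} A^T b versus an SVD-based solver.
import numpy as np

rng = np.random.default_rng(2)
n, d = 2_000, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)   # noisy linear data

x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # normal equations
x_lstsq  = np.linalg.lstsq(A, b, rcond=None)[0]   # SVD-based least squares

print(np.allclose(x_normal, x_lstsq))             # True for this well-posed A
```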



SLIDE 15

Computational perspective

  • $n$ is large (the number of data points), $d$ is smaller, so $A$ is tall and skinny.
  • An exact solution requires the SVD or other methods. Worst-case time $O(nd^2)$.
  • Can we speed up the computation with some potential approximation?



SLIDE 16

Linear least squares via Subspace embeddings

  • Let $A^{(1)}, A^{(2)}, \ldots, A^{(d)}$ be the columns of $A$ and let $E$ be the subspace spanned by $\{A^{(1)}, A^{(2)}, \ldots, A^{(d)}, b\}$.
  • Note: the columns are in $\mathbb{R}^n$, corresponding to the $n$ data points. $E$ has dimension at most $d + 1$.
  • Use a subspace embedding on $E$. Applying a JL matrix $\Pi$ with $k = O(d/\epsilon^2)$ rows, we reduce $\{A^{(1)}, A^{(2)}, \ldots, A^{(d)}, b\}$ to $\{A'^{(1)}, A'^{(2)}, \ldots, A'^{(d)}, b'\}$, which are vectors in $\mathbb{R}^k$.
  • Solve $\min_{x' \in \mathbb{R}^d} \|A'x' - b'\|_2$ (see the sketch-and-solve example below).
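A minimal sketch-and-solve example (not from the slides), assuming a dense Gaussian sketching matrix $\Pi$; sparse or fast JL transforms would be used in practice:

```python
# Sketch-and-solve least squares: solve the small sketched problem in R^k.
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 20_000, 10, 500          # k is on the order of d / eps^2

A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + rng.standard_normal(n)

Pi = rng.standard_normal((k, n)) / np.sqrt(k)
A_sk, b_sk = Pi @ A, Pi @ b        # sketched data in R^k

x_sk  = np.linalg.lstsq(A_sk, b_sk, rcond=None)[0]   # solve the small problem
x_opt = np.linalg.lstsq(A, b, rcond=None)[0]          # exact solution, for comparison

# Quality is measured on the ORIGINAL problem: the ratio should be close to 1.
print(np.linalg.norm(A @ x_sk - b) / np.linalg.norm(A @ x_opt - b))
```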



SLIDE 17

[Handwritten board work, largely illegible: sketching with $\Pi \in \mathbb{R}^{k \times n}$ reduces the $n \times d$ least-squares problem to a $k \times d$ one.]
SLIDE 18

Analysis

Lemma. With probability $(1 - \delta)$,

$(1-\epsilon)\min_{x \in \mathbb{R}^d}\|Ax - b\|_2 \;\le\; \min_{x' \in \mathbb{R}^d}\|A'x' - b'\|_2 \;\le\; (1+\epsilon)\min_{x \in \mathbb{R}^d}\|Ax - b\|_2.$



SLIDE 19

Analysis

Lemma. With probability $(1 - \delta)$,

$(1-\epsilon)\min_{x \in \mathbb{R}^d}\|Ax - b\|_2 \;\le\; \min_{x' \in \mathbb{R}^d}\|A'x' - b'\|_2 \;\le\; (1+\epsilon)\min_{x \in \mathbb{R}^d}\|Ax - b\|_2.$

With probability $(1 - \delta)$, via the subspace embedding guarantee, for all $z \in E$: $(1-\epsilon)\|z\|_2 \le \|\Pi z\|_2 \le (1+\epsilon)\|z\|_2$. Now prove the two inequalities in the lemma separately using this.



SLIDE 20

Analysis

Suppose $x^*$ is an optimum solution to $\min_x \|Ax - b\|_2$. Let $z = Ax^* - b$. We have $\|\Pi z\|_2 \le (1+\epsilon)\|z\|_2$ since $z \in E$.



SLIDE 21

Analysis

Suppose $x^*$ is an optimum solution to $\min_x \|Ax - b\|_2$. Let $z = Ax^* - b$. We have $\|\Pi z\|_2 \le (1+\epsilon)\|z\|_2$ since $z \in E$. Since $x^*$ is a feasible solution to $\min_{x'} \|A'x' - b'\|_2$,

$\min_{x'} \|A'x' - b'\|_2 \;\le\; \|A'x^* - b'\|_2 \;=\; \|\Pi(Ax^* - b)\|_2 \;\le\; (1+\epsilon)\|Ax^* - b\|_2.$



SLIDE 22

Analysis

For any $y \in \mathbb{R}^d$, $\|\Pi A y - \Pi b\|_2 \ge (1-\epsilon)\|Ay - b\|_2$, because $Ay - b$ is a vector in $E$ and $\Pi$ preserves the norms of all vectors in $E$.



SLIDE 23

Analysis

For any $y \in \mathbb{R}^d$, $\|\Pi A y - \Pi b\|_2 \ge (1-\epsilon)\|Ay - b\|_2$, because $Ay - b$ is a vector in $E$ and $\Pi$ preserves the norms of all vectors in $E$. Let $y^*$ be an optimum solution to $\min_{x'} \|A'x' - b'\|_2$. Then, since $\|A'y^* - b'\|_2 = \|\Pi(Ay^* - b)\|_2$,

$\min_{x'} \|A'x' - b'\|_2 \;=\; \|\Pi(Ay^* - b)\|_2 \;\ge\; (1-\epsilon)\|Ay^* - b\|_2 \;\ge\; (1-\epsilon)\|Ax^* - b\|_2,$

which gives the other inequality of the lemma.
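For reference, the two directions combine into the lemma as follows (a compact restatement of the argument on the preceding slides, not additional material from the deck):

```latex
\begin{align*}
\min_{x'} \|A'x' - b'\|_2
  &\le \|A'x^* - b'\|_2 = \|\Pi(Ax^* - b)\|_2
   \le (1+\epsilon)\|Ax^* - b\|_2 = (1+\epsilon)\min_{x} \|Ax - b\|_2,\\
\min_{x'} \|A'x' - b'\|_2
  &= \|\Pi(Ay^* - b)\|_2
   \ge (1-\epsilon)\|Ay^* - b\|_2
   \ge (1-\epsilon)\|Ax^* - b\|_2 = (1-\epsilon)\min_{x} \|Ax - b\|_2.
\end{align*}
```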



SLIDE 24

Running time

  • Reduce the problem for $d$ vectors in $\mathbb{R}^n$ to $d$ vectors in $\mathbb{R}^k$ where $k = O(d/\epsilon^2)$.
  • Computing $\Pi A$, $\Pi b$ can be done in $O(\mathrm{nnz}(A))$ time via sparse/fast JL (input-sparsity time); see the sketch below.
  • Need to solve least squares on $A', b'$, which can be done in $O(d^3/\epsilon^2)$ time.
  • Essentially we reduce $n$ to $d/\epsilon^2$. Useful when $n \gg d/\epsilon^2$ (so $\epsilon$ should not be too small).
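A minimal sketch (not from the slides) of one input-sparsity-time option, a CountSketch-style sparse embedding: each column of the $k \times n$ sketching matrix has a single random $\pm 1$ entry, so applying it costs time proportional to $\mathrm{nnz}(A)$. The row count $k$ here is chosen ad hoc for the demo.

```python
# Apply a CountSketch-style sparse embedding S to a sparse A in O(nnz(A)) time.
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(4)
n, d, k = 100_000, 10, 2_000

A = sp.random(n, d, density=1e-3, format="csr", random_state=5)   # sparse data
b = rng.standard_normal(n)

# S as a sparse k x n matrix: one nonzero (+1 or -1) per column of S.
rows  = rng.integers(0, k, size=n)
signs = rng.choice([-1.0, 1.0], size=n)
S = sp.csr_matrix((signs, (rows, np.arange(n))), shape=(k, n))

A_sk = (S @ A).toarray()   # cost proportional to nnz(A)
b_sk = S @ b

x_sk = np.linalg.lstsq(A_sk, b_sk, rcond=None)[0]   # solve the sketched problem
print(x_sk.shape)                                    # (d,)
```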



SLIDE 25

Further improvement

  • We reduced the dimension of the vectors from $\mathbb{R}^n$ to $\mathbb{R}^k$ where $k = O(d/\epsilon^2)$. For small $\epsilon$, a dependence of $1/\epsilon^2$ is not so good. Can we improve?
  • Can use $\Pi$ with $k = O(d/\epsilon)$ rows. It suffices for $\Pi$ to have the $1/10$-approximate subspace embedding property and the property of (approximately) preserving matrix multiplication.
  • Alternatively: if $\Pi$ has the $1/10$-approximate subspace embedding property, then $(\Pi A)^T(\Pi A)$ has small condition number. Use such a $\Pi$ and then run gradient descent, whose convergence depends on the condition number of the (preconditioned) system; see the sketch after this list.
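A minimal sketch (not from the slides) of one way this preconditioning idea can be realized, assuming a Gaussian $\Pi$ used only as a constant-factor subspace embedding: factor $\Pi A = QR$ and use $R$ as a right preconditioner, so that plain gradient descent on the preconditioned problem converges in a few iterations even when $A$ is ill-conditioned.

```python
# Sketch-based preconditioning for least squares, then gradient descent.
import numpy as np

rng = np.random.default_rng(6)
n, d, k = 20_000, 10, 200          # k = O(d) rows for a constant-factor embedding

A = rng.standard_normal((n, d)) @ np.diag(np.logspace(0, 3, d))   # ill-conditioned A
b = rng.standard_normal(n)

Pi = rng.standard_normal((k, n)) / np.sqrt(k)
_, R = np.linalg.qr(Pi @ A)        # Pi A = Q R
M = A @ np.linalg.inv(R)           # M = A R^{-1} is well-conditioned

# Gradient descent on g(y) = 0.5 * ||M y - b||^2 with step 1 / ||M||_2^2.
step = 1.0 / np.linalg.norm(M, 2) ** 2
y = np.zeros(d)
for _ in range(50):
    y -= step * (M.T @ (M @ y - b))
x = np.linalg.solve(R, y)          # undo the preconditioning: x = R^{-1} y

x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(A @ x - b) / np.linalg.norm(A @ x_exact - b))   # close to 1
```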


SLIDE 26

Other uses of JL/subspace embeddings in numerical linear algebra

  • Approximate matrix multiplication
  • Low-rank approximation and SVD
  • Compressed sensing
