SLIDE 1

Minimax Rates for Memory-Constrained Sparse Linear Regression

Jacob Steinhardt John Duchi

Stanford University

{jsteinha,jduchi}@stanford.edu

July 6, 2015



SLIDE 2

Resource-Constrained Learning

How do we solve statistical problems with limited resources?

  • computation (Natarajan, 1995; Berthet & Rigollet, 2013; Zhang et al., 2014; Foster et al., 2015)
  • privacy (Kasiviswanathan et al., 2011; Duchi et al., 2013)
  • communication / memory (Zhang et al., 2013; Shamir, 2014; Garg et al., 2014; Braverman et al., 2015)


SLIDE 3

Setting

Sparse linear regression in $\mathbb{R}^d$:

$$Y^{(i)} = \langle w^*, X^{(i)} \rangle + \varepsilon^{(i)}, \qquad \|w^*\|_0 = k, \quad k \ll d$$

Memory constraint:

  • $(X^{(i)}, Y^{(i)})$ observed as a read-only stream
  • Only keep $b$ bits of state $Z^{(i)}$ between successive observations

[Diagram: $W^*$ generates the stream $(X^{(1)}, Y^{(1)}), (X^{(2)}, Y^{(2)}), (X^{(3)}, Y^{(3)}), \ldots$; after each observation the algorithm writes a $b$-bit state, $Z^{(1)} \to Z^{(2)} \to Z^{(3)} \to \cdots$]
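To make the streaming interface concrete, here is a minimal Python sketch of the data-generating process and the $b$-bit protocol. The instance parameters and the `update` placeholder are illustrative assumptions, not the authors' algorithm; the point is only that the learner sees each $(X^{(i)}, Y^{(i)})$ once and carries nothing forward except a fixed-size state.

```python
import numpy as np

def make_instance(d=10_000, k=10, delta=0.5, rng=None):
    """Draw a k-sparse w* with entries +-delta (matching the lower-bound setup)."""
    rng = rng or np.random.default_rng(0)
    w_star = np.zeros(d)
    support = rng.choice(d, size=k, replace=False)
    w_star[support] = delta * rng.choice([-1.0, 1.0], size=k)
    return w_star

def stream(w_star, n, rng):
    """Yield (X, Y) one pair at a time; past pairs cannot be revisited."""
    d = w_star.shape[0]
    for _ in range(n):
        x = rng.choice([-1.0, 1.0], size=d)   # X ~ Uniform({+-1}^d)
        y = float(x @ w_star + rng.normal())  # Y = <w*, X> + eps
        yield x, y

rng = np.random.default_rng(1)
w_star = make_instance(rng=rng)
state = bytearray(128)  # b = 1024 bits of persistent state (illustrative)
for x, y in stream(w_star, n=100, rng=rng):
    # state = update(state, x, y)  -- any b-bit update rule goes here
    pass
```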

SLIDE 4

Motivating Question

If we have enough memory to represent the answer, can we also efficiently learn the answer?



SLIDE 5

Problem Statement

How much data $n$ is needed to obtain an estimator $\hat{w}$ with

$$\mathbb{E}\big[\|\hat{w} - w^*\|_2^2\big] \le \varepsilon\,?$$

Classical case (no memory constraint):

Theorem (Wainwright, 2009)

$$\frac{k}{\varepsilon}\,\log(d) \;\lesssim\; n \;\lesssim\; \frac{k}{\varepsilon}\,\log(d)$$

Achievable with $\tilde{O}(d)$ memory (Agarwal et al., 2012; S., Wager, & Liang, 2015).

With memory constraint $b$:

Theorem (S. & Duchi, 2015)

$$\frac{k}{\varepsilon} \cdot \frac{d}{b} \;\lesssim\; n \;\lesssim\; \frac{k}{\varepsilon^2} \cdot \frac{d}{b}$$

Exponential increase if $b \ll d$!

[Note: up to log factors; assumes $k \log(d) \ll b \le d$]
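To make the gap tangible, the snippet below plugs illustrative numbers (chosen arbitrarily; constants and log factors dropped) into the three rates above. With $b \ll d$ the $d/b$ factor dwarfs $\log d$.

```python
import math

d, k, b, eps = 100_000, 10, 1_000, 0.01

n_classical = k / eps * math.log(d)       # ~ (k/eps) log d  (unconstrained)
n_lower     = k / eps * d / b             # ~ kd/(b eps)     (lower bound)
n_upper     = k / eps**2 * d / b          # ~ kd/(b eps^2)   (upper bound)

print(f"classical:         n ~ {n_classical:,.0f}")   # ~11,513
print(f"b-bit lower bound: n ~ {n_lower:,.0f}")       # ~100,000
print(f"b-bit upper bound: n ~ {n_upper:,.0f}")       # ~10,000,000
```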


SLIDE 6

Proof Overview

Lower bound:

  • information-theoretic: strong data-processing inequality
  • [Diagram: Markov chain $W^* \to (X, Y) \to Z$; $X$ is $d$-dimensional, $Z$ holds only $b$ bits]
  • main challenge: dependence between $X$ and $Y$

Upper bound:

  • count-min sketch + $\ell_1$-regularized dual averaging
  • more regularization → easier sketching problem


SLIDE 7

Lower Bound Construction

  • Split coordinates into $k$ blocks of size $d/k$
  • $w^*$ in each block: single non-zero coordinate $J$, $\pm\delta$ with equal probability
  • Direct sum argument: reduce to $k = 1$

[Diagram: a block of $d/k$ coordinates, with the non-zero coordinate at $J = 2$]

Estimation to testing:

$$\mathbb{E}\big[\|w^* - \hat{w}\|_2^2\big] \;\ge\; \frac{\delta^2}{2}\, \mathbb{P}[J \ne \hat{J}]$$

Looking ahead: bound the KL divergence between $P_j$ and a base distribution $P_0$
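A direct transcription of the construction (a toy helper for intuition, not from the paper's code): each of the $k$ blocks of size $d/k$ receives a single $\pm\delta$ entry at a uniformly random within-block index $J$.

```python
import numpy as np

def sample_w_star(d, k, delta, rng):
    """Lower-bound prior: one +-delta coordinate per block of size d/k."""
    assert d % k == 0
    block = d // k
    w = np.zeros(d)
    for t in range(k):
        j = rng.integers(block)             # within-block index J ~ Uniform
        sign = rng.choice([-1.0, 1.0])      # +delta or -delta, equal probability
        w[t * block + j] = sign * delta
    return w

rng = np.random.default_rng(0)
print(sample_w_star(d=20, k=4, delta=0.5, rng=rng).reshape(4, 5))
# Each row (block) of the reshaped vector has exactly one +-0.5 entry.
```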


SLIDE 8

Some Information Theory

  • Let $X \sim \mathrm{Uniform}(\{\pm 1\}^d)$
  • Let $P_j(Z^{(1:n)})$ be the distribution conditioned on $J = j$
  • Let $P_0(Z^{(1:n)})$ be the distribution with $Y$ independent of $X$

[Diagram: the conditional distributions of $Y$ given $X_j = -1$ and $X_j = +1$ are separated by $2\delta$]

Assouad's method:

$$\mathbb{P}[J \ne \hat{J}] \;\ge\; \frac{1}{2} - \sqrt{\frac{1}{d} \sum_{j=1}^{d} D_{\mathrm{kl}}\big(P_0(Z^{(1:n)}) \,\|\, P_j(Z^{(1:n)})\big)}$$

Key fact: $(Y, X_j)$ is independent of $X_{\neg j}$ under $P_j$

Intuition: $D_{\mathrm{kl}}(P_0 \,\|\, P_j)$ is small unless $Z$ stores information about $X_j$; the average $D_{\mathrm{kl}}$ stays small unless $Z$ stores information about a majority of the $X_j$.
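The displayed bound follows the usual testing-reduction pattern: the tester's advantage over chance is controlled by the average total variation between $P_0$ and the $P_j$, which Pinsker's inequality plus Jensen's inequality convert into the averaged KL term. A sketch of that conversion step (standard facts, not a claim about the paper's exact lemma):

```latex
\begin{aligned}
\frac{1}{d}\sum_{j=1}^{d} \|P_0 - P_j\|_{\mathrm{TV}}
  &\le \frac{1}{d}\sum_{j=1}^{d} \sqrt{\tfrac{1}{2}\,D_{\mathrm{kl}}(P_0 \,\|\, P_j)}
  && \text{(Pinsker's inequality)} \\
  &\le \sqrt{\frac{1}{2d}\sum_{j=1}^{d} D_{\mathrm{kl}}(P_0 \,\|\, P_j)}
  && \text{(Jensen: concavity of } \sqrt{\cdot}\,\text{)}
\end{aligned}
```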


SLIDE 9

Strong Data-Processing Inequality

Focus on a single index $Z = Z^{(i)}$, with $\hat{z} = z^{(1:i-1)}$ fixed.

Proposition

For any $\hat{z}$,

$$D_{\mathrm{kl}}\big(P_0(Z \mid \hat{z}) \,\|\, P_j(Z \mid \hat{z})\big) \;\le\; 4\delta^2\, I(X_j; Z \mid Y, \hat{Z} = \hat{z}) \;\le\; 4\delta^2\, I(X_j; Z, Y \mid \hat{Z} = \hat{z})$$

Plug into Assouad:

$$\frac{1}{d}\sum_{j=1}^{d} D_{\mathrm{kl}}(P_0 \,\|\, P_j)
  \;\le\; \frac{4\delta^2}{d}\sum_{j=1}^{d} I(X_j; Z, Y \mid \hat{Z})
  \;\le\; \frac{4\delta^2}{d}\, I(X; Z, Y \mid \hat{Z})
  \;\le\; \frac{4\delta^2}{d}\, \big(b + O(1)\big)$$

Only get $\frac{4\delta^2 b}{d}$ bits per round!
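Chaining this over the $n$ rounds gives the lower-bound rate (a heuristic summary, suppressing constants, log factors, and the direct-sum step from $k = 1$ to general $k$): identifying $J$ requires the Assouad term to reach a constant, which forces

```latex
\frac{1}{d}\sum_{j=1}^{d} D_{\mathrm{kl}}\big(P_0(Z^{(1:n)}) \,\|\, P_j(Z^{(1:n)})\big)
  \;\lesssim\; n \cdot \frac{\delta^2 b}{d},
\qquad\text{so}\quad
  n \cdot \frac{\delta^2 b}{d} \gtrsim 1
  \;\Longleftrightarrow\;
  n \gtrsim \frac{d}{\delta^2 b} \asymp \frac{kd}{b\,\varepsilon}
  \quad (\delta^2 \asymp \varepsilon/k).
```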


SLIDE 10

Upper Bound

Solve the $\ell_1$-regularized dual averaging problem (Xiao, 2010), $\lambda \gg 1$:

$$w^{(i)} = \operatorname*{argmin}_{w} \Big\{ \langle \theta^{(i)}, w \rangle + \lambda \sqrt{n}\, \|w\|_1 + \frac{1}{2\eta}\, \|w\|_2^2 \Big\}, \qquad \theta^{(i)} = \sum_{i'=1}^{i-1} x^{(i')} \big( y^{(i')} - \langle w^{(i')}, x^{(i')} \rangle \big).$$

Hard part: determine the support of $w^{(i)}$.

Need to distinguish $|\theta_j| \ge \lambda \sqrt{n}$ (signal) from $|\theta_j| \approx \sqrt{n}$ (noise)

Can use a count-min sketch, memory usage $\approx \frac{d \log(d)}{\lambda^2}$

⇒ regularization decreases computation; seen before in the $\ell_2$ case (Shalev-Shwartz & Zhang, 2013; Bruer et al., 2014)
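Below is a compact illustration of the two ingredients. Caveats: since $\theta$ is signed, the code uses the closely related count-sketch (random signs, median estimate) rather than a literal count-min sketch; the full-support rescan each round, the sign convention, and all parameter choices are simplifications for illustration, not the paper's algorithm.

```python
import numpy as np

class CountSketch:
    """Sketch of a signed d-vector theta in depth x width counters.
    query(j) estimates theta_j up to ~ ||theta||_2 / sqrt(width) per row;
    the median over rows controls the failure probability."""
    def __init__(self, width, depth, d, seed=0):
        rng = np.random.default_rng(seed)
        self.buckets = rng.integers(width, size=(depth, d))   # hash values
        self.signs = rng.choice([-1.0, 1.0], size=(depth, d))
        self.table = np.zeros((depth, width))

    def add(self, j, value):
        for r in range(self.table.shape[0]):
            self.table[r, self.buckets[r, j]] += self.signs[r, j] * value

    def query(self, j):
        return float(np.median(
            [self.signs[r, j] * self.table[r, self.buckets[r, j]]
             for r in range(self.table.shape[0])]))

def sketched_rda(stream, d, n, lam, eta, width=150, depth=5):
    """One pass of sketched l1-RDA: the dual vector theta lives only in the
    sketch; the iterate w is a sparse dict over coordinates whose sketched
    |theta_j| clears the lambda*sqrt(n) threshold."""
    sketch = CountSketch(width, depth, d)
    w, thresh = {}, lam * np.sqrt(n)
    for x, y in stream:
        resid = y - sum(v * x[j] for j, v in w.items())
        for j in range(d):                    # fold x * resid into theta
            sketch.add(j, x[j] * resid)
        w = {}                                # re-solve the RDA minimization
        for j in range(d):
            t = sketch.query(j)
            if abs(t) > thresh:               # soft-threshold at lambda*sqrt(n)
                w[j] = eta * np.sign(t) * (abs(t) - thresh)
    return w

rng = np.random.default_rng(0)
d, n = 500, 200
w_star = np.zeros(d); w_star[[3, 77]] = 1.0, -1.0
data = [(x, float(x @ w_star + 0.1 * rng.normal()))
        for x in rng.choice([-1.0, 1.0], size=(n, d))]
w_hat = sketched_rda(data, d, n, lam=8.0, eta=0.01)
print(sorted(w_hat))  # with these generous parameters, typically {3, 77}
```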


SLIDE 11

Discussion

Summary:

  • Upper and lower bounds on memory-constrained regression
  • Lower bound: extend the data-processing inequality to handle covariates
  • Upper bound: use the $\ell_1$-regularizer to reduce to sketching

Future work:

  • Close the gap ($kd/(b\varepsilon)$ vs. $kd/(b\varepsilon^2)$)
  • Weaken the upper bound assumptions