Leveraged volume sampling for linear regression, Michał Dereziński (PowerPoint presentation)



slide-1
SLIDE 1

Leveraged volume sampling for linear regression

Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

UC Berkeley, UC Santa Cruz, Columbia University

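Linear regression

Data: a matrix $X$ with $n$ rows $x_i^\top$ and $d$ columns, and a response vector $y$ with $n$ entries.

Loss: $L(w) = \sum_{i=1}^{n} (x_i^\top w - y_i)^2$

Optimum: $w^* = \operatorname{argmin}_w L(w)$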
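As a concrete instance of the loss and optimum above, here is a minimal sketch for $d = 2$ that solves the normal equations exactly with rational arithmetic. The toy data and the helper names `least_squares_2d` and `loss` are made up for illustration and do not come from the slides.

```python
from fractions import Fraction

def least_squares_2d(X, y):
    """Solve argmin_w sum_i (x_i^T w - y_i)^2 for d = 2 via the normal equations."""
    # Entries of the symmetric 2x2 matrix X^T X and of the vector X^T y.
    a = sum(x[0] * x[0] for x in X)
    b = sum(x[0] * x[1] for x in X)
    c = sum(x[1] * x[1] for x in X)
    r0 = sum(x[0] * yi for x, yi in zip(X, y))
    r1 = sum(x[1] * yi for x, yi in zip(X, y))
    det = a * c - b * b  # nonzero when X has full column rank
    # Cramer's rule for [[a, b], [b, c]] w = [r0, r1].
    return ((c * r0 - b * r1) / det, (a * r1 - b * r0) / det)

def loss(X, y, w):
    """Squared loss L(w) summed over all rows."""
    return sum((x[0] * w[0] + x[1] * w[1] - yi) ** 2 for x, yi in zip(X, y))

# Toy data (illustrative only): the second feature is a constant intercept.
X = [(Fraction(v), Fraction(1)) for v in (0, 1, 2, 3)]
y = [Fraction(v) for v in (1, 3, 2, 5)]

w_star = least_squares_2d(X, y)
print(w_star, loss(X, y, w_star))
```

On this toy data the optimum is $w^* = (11/10, 11/10)$ with $L(w^*) = 27/10$; any other $w$, for example $(1, 1)$, gives a larger loss.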

slide-2
SLIDE 2

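Linear regression with hidden responses

Sample $S = \{4, 6, 9\}$. Receive $y_4, y_6, y_9$.

[Figure: the rows $x_4^\top$, $x_6^\top$, $x_9^\top$ of $X$ are highlighted, together with the revealed entries $y_4$, $y_6$, $y_9$ of $y$.]

Loss: $L(w) = \sum_{i=1}^{n} (x_i^\top w - y_i)^2$

Optimum: $w^* = \operatorname{argmin}_w L(w)$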

slide-3
SLIDE 3

Linear regression with hidden responses

Goal: Best unbiased estimator $w(S)$

$\mathbb{E}[w(S)] = w^*$

$L(w(S)) \le (1 + \epsilon)\, L(w^*)$

Existing sampling methods:

  1. leverage score sampling: i.i.d., biased
  2. volume sampling: joint, unbiased
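The slides do not define leverage scores. The sketch below assumes the standard definition $\ell_i = x_i^\top (X^\top X)^{-1} x_i$, under which the scores sum to $d$, and draws indices i.i.d. in proportion to them. It is restricted to $d = 2$; the toy data and function names are illustrative assumptions.

```python
import random
from fractions import Fraction

def leverage_scores_2d(X):
    """Assumed standard definition: l_i = x_i^T (X^T X)^{-1} x_i, for d = 2."""
    a = sum(x[0] * x[0] for x in X)
    b = sum(x[0] * x[1] for x in X)
    c = sum(x[1] * x[1] for x in X)
    det = a * c - b * b  # nonzero when X has full column rank
    # (X^T X)^{-1} = [[c, -b], [-b, a]] / det, so the quadratic form is:
    return [(c * x[0] ** 2 - 2 * b * x[0] * x[1] + a * x[1] ** 2) / det for x in X]

def leverage_score_sample(X, k, rng):
    """Draw k row indices i.i.d. with probability proportional to l_i."""
    weights = [float(l) for l in leverage_scores_2d(X)]
    return rng.choices(range(len(X)), weights=weights, k=k)

# Toy data (illustrative only).
X = [(Fraction(v), Fraction(1)) for v in (0, 1, 2, 3)]
scores = leverage_scores_2d(X)  # the scores sum to d = 2
sample = leverage_score_sample(X, 3, random.Random(0))
print(scores, sample)
```

Because the draws are independent, the same index can appear more than once, in contrast to the joint choice made by volume sampling.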

slide-4
SLIDE 4

Leveraged volume sampling

Volume sampling: jointly choose a set $S$ of $k \ge d$ indices such that

$\Pr(S) \propto \det\Big(\sum_{i \in S} x_i x_i^\top\Big)$
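To make the definition concrete, the following sketch enumerates every size-$k$ index set of a tiny matrix and normalises the determinants into probabilities. Brute-force enumeration is only feasible for very small $n$ and is not how the talk proposes to sample; the toy data and function names are illustrative assumptions, and the code is restricted to $d = 2$.

```python
from fractions import Fraction
from itertools import combinations

def subset_det(X, S):
    """det(sum_{i in S} x_i x_i^T) for d = 2."""
    a = sum(X[i][0] * X[i][0] for i in S)
    b = sum(X[i][0] * X[i][1] for i in S)
    c = sum(X[i][1] * X[i][1] for i in S)
    return a * c - b * b

def volume_sampling_distribution(X, k):
    """Map each size-k index set S to Pr(S), by brute-force enumeration."""
    dets = {S: subset_det(X, S) for S in combinations(range(len(X)), k)}
    total = sum(dets.values())
    return {S: v / total for S, v in dets.items()}

# Toy data (illustrative only).
X = [(Fraction(v), Fraction(1)) for v in (0, 1, 2, 3)]
probs = volume_sampling_distribution(X, 3)
for S, p in probs.items():
    print(S, p)
```

On this data the sets that contain both extreme rows (indices 0 and 3) receive probability 7/20 each, while the other two sets receive 3/20 each.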

slide-5
SLIDE 5

Leveraged volume sampling

Theorem [DW17]: if $S$ is drawn by volume sampling, then

$\mathbb{E}[w(S)] = w^*$

where

$w(S) = \operatorname{argmin}_w \sum_{i \in S} (x_i^\top w - y_i)^2$.
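The unbiasedness statement can be checked exhaustively on a tiny example. The sketch below, restricted to $d = 2$ and using exact rational arithmetic, averages the subset solutions $w(S)$ under $\Pr(S) \propto \det(\sum_{i \in S} x_i x_i^\top)$ and compares the result with $w^*$. It is a sanity check on made-up data, not a proof, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def fit_2d(X, y, S):
    """Return (det, w(S)) for least squares on rows S, d = 2; w is None if det == 0."""
    a = sum(X[i][0] * X[i][0] for i in S)
    b = sum(X[i][0] * X[i][1] for i in S)
    c = sum(X[i][1] * X[i][1] for i in S)
    r0 = sum(X[i][0] * y[i] for i in S)
    r1 = sum(X[i][1] * y[i] for i in S)
    det = a * c - b * b
    if det == 0:
        return det, None
    return det, ((c * r0 - b * r1) / det, (a * r1 - b * r0) / det)

def expected_estimator(X, y, k):
    """E[w(S)] under volume sampling, by enumerating all size-k index sets."""
    total, acc0, acc1 = Fraction(0), Fraction(0), Fraction(0)
    for S in combinations(range(len(X)), k):
        det, w = fit_2d(X, y, S)
        if det == 0:
            continue  # such sets have probability zero
        total += det
        acc0 += det * w[0]
        acc1 += det * w[1]
    return (acc0 / total, acc1 / total)

# Toy data (illustrative only).
X = [(Fraction(v), Fraction(1)) for v in (0, 1, 2, 3)]
y = [Fraction(v) for v in (1, 3, 2, 5)]

w_star = fit_2d(X, y, range(len(X)))[1]
print(expected_estimator(X, y, 3), w_star)
```

Both printed vectors equal $(11/10, 11/10)$ on this data, for $k = 3$ and also for $k = 2$.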

slide-6
SLIDE 6

Leveraged volume sampling

New Lower Bound: Volume sampling may need a sample of size $k = \Omega(n)$ to get a $3/2$-approximation, i.e. a $(1 + \epsilon)$-approximation with $\epsilon = 1/2$.
slide-7
SLIDE 7

Leveraged volume sampling

Solution: Use i.i.d. and joint sampling

$\Pr(S) \propto \underbrace{\prod_{i \in S} \ell_i}_{\text{leverage scores}} \cdot \underbrace{\det\Big(\sum_{i \in S} \frac{1}{\ell_i}\, x_i x_i^\top\Big)}_{\text{volume sampling}}$
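A short worked identity may help show how the two factors interact. Assume, for illustration only, that $S$ consists of exactly $k = d$ distinct indices with positive leverage scores, so that the submatrix $X_S$ (rows $x_i^\top$, $i \in S$) is square. Writing $D_S = \operatorname{diag}(\ell_i)_{i \in S}$, a notation introduced here and not on the slide, the leverage scores cancel:

```latex
\begin{align*}
\prod_{i \in S} \ell_i \cdot \det\Big(\sum_{i \in S} \tfrac{1}{\ell_i}\, x_i x_i^\top\Big)
  &= \det(D_S)\, \det\big(X_S^\top D_S^{-1} X_S\big) \\
  &= \det(D_S)\, \det(X_S)^2 \det(D_S)^{-1} \\
  &= \det\big(X_S^\top X_S\big)
   = \det\Big(\sum_{i \in S} x_i x_i^\top\Big).
\end{align*}
```

Under that assumption the combined distribution coincides with plain volume sampling, so the rescaling by $1/\ell_i$ only changes the distribution when $k > d$.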

slide-8
SLIDE 8

Leveraged volume sampling

Estimator: $w(S) = \operatorname{argmin}_w \sum_{i \in S} \frac{1}{\ell_i} (x_i^\top w - y_i)^2$

New Theorem: For $k = O(d \log d + d/\epsilon)$,

$\mathbb{E}[w(S)] = w^*$

and, w.h.p.,

$L(w(S)) \le (1 + \epsilon)\, L(w^*)$
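The unbiasedness part of the theorem can also be checked by enumeration on toy data. The sketch below makes two assumptions that the slide does not spell out: $S$ is treated as a length-$k$ sequence of indices drawn with replacement (as the i.i.d. component suggests), and $\ell_i$ is the standard leverage score $x_i^\top (X^\top X)^{-1} x_i$. It is restricted to $d = 2$, uses exact rational arithmetic, and all names are illustrative.

```python
from fractions import Fraction
from itertools import product

def weighted_fit(X, y, idx, wt):
    """Return (det, w) for argmin_w sum_{i in idx} wt[i] (x_i^T w - y_i)^2, d = 2."""
    a = sum(wt[i] * X[i][0] * X[i][0] for i in idx)
    b = sum(wt[i] * X[i][0] * X[i][1] for i in idx)
    c = sum(wt[i] * X[i][1] * X[i][1] for i in idx)
    r0 = sum(wt[i] * X[i][0] * y[i] for i in idx)
    r1 = sum(wt[i] * X[i][1] * y[i] for i in idx)
    det = a * c - b * b
    if det == 0:
        return det, None
    return det, ((c * r0 - b * r1) / det, (a * r1 - b * r0) / det)

def leverage_scores(X):
    """Assumed standard definition l_i = x_i^T (X^T X)^{-1} x_i, for d = 2."""
    a = sum(x[0] * x[0] for x in X)
    b = sum(x[0] * x[1] for x in X)
    c = sum(x[1] * x[1] for x in X)
    det = a * c - b * b
    return [(c * x[0] ** 2 - 2 * b * x[0] * x[1] + a * x[1] ** 2) / det for x in X]

def expected_leveraged_estimator(X, y, k):
    """E[w(S)] with Pr(S) ∝ prod l_i * det(sum x_i x_i^T / l_i), over all sequences."""
    ell = leverage_scores(X)          # must all be positive
    inv = [1 / l for l in ell]        # the 1/l_i weights of the estimator
    total, acc0, acc1 = Fraction(0), Fraction(0), Fraction(0)
    for seq in product(range(len(X)), repeat=k):
        det, w = weighted_fit(X, y, seq, inv)
        if det == 0:
            continue  # e.g. all indices equal: probability zero
        p = det
        for i in seq:
            p *= ell[i]
        total += p
        acc0 += p * w[0]
        acc1 += p * w[1]
    return (acc0 / total, acc1 / total)

# Toy data (illustrative only).
X = [(Fraction(v), Fraction(1)) for v in (0, 1, 2, 3)]
y = [Fraction(v) for v in (1, 3, 2, 5)]

w_star = weighted_fit(X, y, range(len(X)), [1] * len(X))[1]
print(expected_leveraged_estimator(X, y, 3), w_star)
```

The check covers only the expectation; the high-probability loss bound depends on $k$ and $\epsilon$ and is not something a four-row example can illustrate.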
slide-9
SLIDE 9

New volume sampling algorithm

Determinantal rejection sampling trick

repeat
    Sample $i_1, \dots, i_s$ i.i.d. $\sim (\ell_1, \dots, \ell_n)$
    Sample Accept $\sim \mathrm{Bernoulli}\Big( \det\big(\sum_{t=1}^{s} \frac{1}{\ell_{i_t}}\, x_{i_t} x_{i_t}^\top\big) \big/ \det(X^\top X) \Big)$
until Accept = true

Cost: preprocessing $O(nd^2)$ (improvable to $O(nd + \mathrm{poly}(d))$) + sampling $O(d^4)$ (no dependence on $n$)
slide-10
SLIDE 10

New volume sampling algorithm

Experiments: 7 datasets from Libsvm

Check out poster #151