

SLIDE 1

Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling

Shanshan Wu¹, Alex Dimakis¹, Sujay Sanghavi¹, Felix Yu², Dan Holtmann-Rice², Dmitry Storcheus², Afshin Rostamizadeh², Sanjiv Kumar²

¹University of Texas at Austin, ²Google Research

Wed Jun 12th 06:30 -- 09:00 PM @ Pacific Ballroom #189


SLIDE 2

Motivation

  • Goal: Create good representations for sparse data
SLIDE 3

Motivation

  • Goal: Create good representations for sparse data
  • Amazon employee dataset: d = 15k, nnz = 9
  • RCV1 text dataset: d = 47k, nnz = 76
  • Wiki multi-label dataset: d = 31k, nnz = 19
  • eXtreme Multi-label Learning (XML).

(Multiple labels per item, from a very large set of labels)

One-hot encoded categorical data + text parts

SLIDE 4

Motivation

  • Goal: Create good representations for sparse data
  • Amazon employee dataset: d = 15k, nnz = 9
  • RCV1 text dataset: d = 47k, nnz = 76
  • Wiki multi-label dataset: d = 31k, nnz = 19
  • eXtreme Multi-label Learning (XML).

(Multiple labels per item, from a very large set of labels)

  • Unlike image/video data, there is no notion of spatial/temporal locality (so no CNNs).

  • Reduce the dimensionality via a linear sketching/embedding (see the sketch below)

Want: beyond sparsity, learn additional structure.

One-hot encoded categorical data + text parts
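As a toy illustration of such a linear sketch (a minimal numpy sketch of our own; the dimensions mirror the Amazon dataset above, and the Gaussian choice of A is just a placeholder), a single matrix multiply maps a high-dimensional sparse vector to a short dense embedding:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 15_000, 100                                # ambient / embedding dimensions
x = np.zeros(d)
x[rng.choice(d, size=9, replace=False)] = 1.0     # one-hot-style sparse vector, nnz = 9
A = rng.standard_normal((m, d)) / np.sqrt(m)      # placeholder linear sketch
y = A @ x                                         # dense m-dimensional embedding
```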

SLIDE 5

Representing vectors in low dimensions

  • ðĩ ∈ ℝ-×/ Measurement matrix
  • If we ask: Linear compression,
  • And Linear recovery
  • Best learned

measurement/reconstruction matrices for l2 norm? ð‘Ķ ∈ ℝ/ 𝑧 = ðĩð‘Ķ 𝑧 ∈ ℝ- (𝑛 < 𝑒) ? ð‘Ķ ≈ ð‘Ķ Encode Recover

𝑚𝑗𝑜𝑓𝑏𝑠 𝑚𝑗𝑜𝑓𝑏𝑠

SLIDE 6

Representing vectors in low dimensions

  • ðĩ ∈ ℝ-×/ Measurement matrix
  • If we ask Linear compression,
  • And Linear recovery
  • Best learned

measurement/reconstruction matrices for l2 norm?

  • PCA

ð‘Ķ ∈ ℝ/ 𝑧 = ðĩð‘Ķ 𝑧 ∈ ℝ- (𝑛 < 𝑒) ? ð‘Ķ ≈ ð‘Ķ Recover

𝑚𝑗𝑜𝑓𝑏𝑠 𝑚𝑗𝑜𝑓𝑏𝑠

Encode

SLIDE 7

Representing vectors in low dimensions

  • ðĩ ∈ ℝ-×/ Measurement matrix
  • If we ask Linear compression,
  • And Linear recovery
  • Best learned

measurement/reconstruction matrices for l2 norm?

  • PCA
  • But if x is sparse we can do

better ð‘Ķ ∈ ℝ/ 𝑧 = ðĩð‘Ķ 𝑧 ∈ ℝ- (𝑛 < 𝑒) ? ð‘Ķ ≈ ð‘Ķ Recover

𝑚𝑗𝑜𝑓𝑏𝑠 𝑚𝑗𝑜𝑓𝑏𝑠

Encode
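For the ℓ2 question above, the answer is PCA. Here is a minimal numpy sketch (toy data and dimensions are made up for illustration) of the optimal linear encode/decode pair: project onto the top m principal directions and reconstruct with their transpose.

```python
import numpy as np

# Minimal sketch: with a linear encoder and a linear decoder, PCA minimizes
# the average l2 reconstruction error.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))          # toy data: n = 1000 points in R^50
Xc = X - X.mean(axis=0)                      # center before PCA
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

A = Vt[:10]                                  # A in R^(m x d): top m = 10 directions
Y = Xc @ A.T                                 # encode: y = A x (per row)
X_hat = Y @ A                                # decode: x_hat = A^T y
print(((Xc - X_hat) ** 2).mean())            # average l2 reconstruction error
```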

SLIDE 8

Compressed Sensing (Donoho; Candès et al.; …)

ð‘Ķ ∈ ℝ/ 𝑧 = ðĩð‘Ķ ∈ ℝ- (𝑛 < 𝑒) ? ð‘Ķ ≈ ð‘Ķ Recover

  • ðĩ ∈ ℝ-×/ Measurement matrix
  • If we ask Linear compression,
  • Recovery by convex opt
  • ℓI-min, Lasso,...
  • Near-perfect recovery for sparse

vectors.

  • Provably for Gaussian random A.

𝑔(ðĩ, 𝑧) ≔ argminOP ð‘ĶQ I

  • s. t. ðĩð‘ĶQ = 𝑧

ℓ𝟐-min 𝑚𝑗𝑜𝑓𝑏𝑠

Encode
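The ℓ1-min decoder is a linear program, so it is easy to sketch with an off-the-shelf solver. Below is a minimal illustration using scipy (the helper name l1_min_decode and the toy problem sizes are our own, not from the slides): split the variable into the pair (x, u) with |x| ≤ u elementwise, and minimize Σu subject to Ax = y.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min_decode(A, y):
    """Solve g(A, y) = argmin ||x||_1  s.t.  A x = y, as a linear program.

    Stack variables as [x; u] with u >= |x| elementwise:
        minimize    sum(u)
        subject to  A x = y,   x - u <= 0,   -x - u <= 0.
    """
    m, d = A.shape
    c = np.concatenate([np.zeros(d), np.ones(d)])   # objective: sum of u
    A_eq = np.hstack([A, np.zeros((m, d))])         # A x = y
    I = np.eye(d)
    A_ub = np.vstack([np.hstack([I, -I]),           #  x - u <= 0
                      np.hstack([-I, -I])])         # -x - u <= 0
    b_ub = np.zeros(2 * d)
    bounds = [(None, None)] * d + [(0, None)] * d   # x free, u >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y, bounds=bounds)
    return res.x[:d]

# Toy check: recover a 5-sparse x in R^200 from m = 50 Gaussian measurements.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 200)) / np.sqrt(50)
x = np.zeros(200)
x[rng.choice(200, size=5, replace=False)] = rng.standard_normal(5)
print(np.linalg.norm(x - l1_min_decode(A, A @ x)))  # near zero, w.h.p.
```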

SLIDE 9

Compressed Sensing (Donoho; Candès et al.; …)

ð‘Ķ ∈ ℝ/ 𝑧 = ðĩð‘Ķ ∈ ℝ- (𝑛 < 𝑒) ? ð‘Ķ ≈ ð‘Ķ Compress Recover

  • ðĩ ∈ ℝ-×/ Measurement matrix
  • If we ask Linear compression,
  • Recovery by convex opt
  • ℓI-min, Lasso,...
  • Near-perfect recovery for sparse

vectors.

  • Provably for Gaussian random A.
  • 1. If our vectors are

sparse +additional unkn known structure (e.g. one-hot encoded features, text+features, XML, etc)

  • 2. Can we LE

LEARN RN a measurement matrix A

  • 3. Make it work well for

convex opt decoder

SLIDE 10

Comparisons of the recovery performance

[Figure: fraction of exactly recovered points vs. number of measurements (m), comparing three schemes:
  • Learned measurements + ℓ1-min decoding [our method]
  • Gaussian measurements + model-based CoSaMP (it has structure knowledge!)
  • Gaussian measurements + ℓ1-min decoding
Exact recovery: ‖x − x̂‖₂ ≤ 10⁻¹⁰.]

SLIDE 11

Learning a measurement matrix

  • Training data: n sparse vectors x₁, x₂, …, x_n ∈ ℝ^d

x_i ∈ ℝ^d → Encode: y = A x_i → Recover: x̂_i = g(A, A x_i) ∈ ℝ^d

ℓ1-min decoder: g(A, y) := argmin_{x′} ‖x′‖₁ s.t. A x′ = y, with A ∈ ℝ^(m×d)

SLIDE 12

Learning a measurement matrix

  • Training data: n sparse vectors x₁, x₂, …, x_n ∈ ℝ^d

x_i ∈ ℝ^d → Encode: y = A x_i → Recover: x̂_i = g(A, A x_i) ∈ ℝ^d

ℓ1-min decoder: g(A, y) := argmin_{x′} ‖x′‖₁ s.t. A x′ = y

Objective function: min_A Σ_{i=1}^n ‖x_i − g(A, A x_i)‖₂², over A ∈ ℝ^(m×d)

SLIDE 13

Learning a measurement matrix

  • Training data: n sparse vectors x₁, x₂, …, x_n ∈ ℝ^d

x_i ∈ ℝ^d → Encode: y = A x_i → Recover: x̂_i = g(A, A x_i) ∈ ℝ^d

ℓ1-min decoder: g(A, y) := argmin_{x′} ‖x′‖₁ s.t. A x′ = y

Objective function: min_A Σ_{i=1}^n ‖x_i − g(A, A x_i)‖₂², over A ∈ ℝ^(m×d)

Problem: how to compute the gradient w.r.t. A?

SLIDE 14

Learning a measurement matrix

  • Training data: n sparse vectors x₁, x₂, …, x_n ∈ ℝ^d

x_i ∈ ℝ^d → Encode: y = A x_i → Recover: x̂_i = g(A, A x_i) ∈ ℝ^d

ℓ1-min decoder: g(A, y) := argmin_{x′} ‖x′‖₁ s.t. A x′ = y

Objective function: min_A Σ_{i=1}^n ‖x_i − g(A, A x_i)‖₂², over A ∈ ℝ^(m×d)

Problem: how to compute the gradient w.r.t. A?
Key idea: replace g(A, y) by a few steps of projected subgradient.
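To make the key idea concrete, here is a minimal numpy sketch of a T-step unrolled decoder (our own illustration; the helper name unrolled_decode and the fixed step sizes are assumptions). Each step is a subgradient move for ‖x‖₁ followed by projection back onto {x : Ax = y}; when A has (near-)orthonormal rows, projecting the step direction reduces to multiplying by (I − AᵀA), matching the update on the next slide.

```python
import numpy as np

def unrolled_decode(A, y, betas):
    """T unrolled projected-subgradient steps for: min ||x||_1  s.t.  A x = y.

    Assumes A has (near-)orthonormal rows, so projecting the subgradient
    sign(x) onto the null space of A is just (I - A^T A) sign(x). Every step
    is differentiable (a.e.) in A, so the whole decoder can be
    backpropagated through to learn A.
    """
    x = A.T @ y                              # x^(1): feasible starting point
    P = np.eye(A.shape[1]) - A.T @ A         # projector onto null(A)
    for beta in betas:                       # betas: the T step sizes
        x = x - beta * (P @ np.sign(x))      # step stays inside {x : A x = y}
    return x

# Usage sketch: decode with T = 10 decaying step sizes.
# x_hat = unrolled_decode(A, A @ x, betas=0.5 / np.arange(1, 11))
```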

SLIDE 15

ℓ1-AE: a novel autoencoder architecture

Input x → Encoder: y = Ax → Decoder: x^(1) = Aᵀy, then T unrolled projected-subgradient steps, each followed by batch normalization (BN):

x^(t+1) = x^(t) − β_t (I − AᵀA) sign(x^(t))   (one step of projected subgradient)

Output: x̂ = ReLU(x^(T+1))

SLIDE 16

Real sparse datasets

Fraction of exactly recovered test points and test RMSE: our method performs the best!

[Figure: fraction of exactly recovered test points and test RMSE vs. number of measurements (m), on three real datasets (d = 15k, nnz = 9; d = 31k, nnz = 19; d = 47k, nnz = 76), comparing our method against baselines including a 2-layer autoencoder.]

SLIDE 17

Summary

  • Key idea: we learn a compressed sensing measurement matrix by unrolling the projected subgradient of the ℓ1-min decoder.
  • Implemented as a novel autoencoder, ℓ1-AE.
  • Compared 12 algorithms over 6 datasets (3 synthetic and 3 real).
  • Our method achieves perfect reconstruction with 1.1-3x fewer measurements than the previous state-of-the-art methods.
  • Applied to extreme multi-label classification, our method outperforms SLEEC (Bhatia et al., 2015).

Wed Jun 12th 06:30 -- 09:00 PM @ Pacific Ballroom #189