

SLIDE 1

Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling

Shanshan Wu¹, Alex Dimakis¹, Sujay Sanghavi¹, Felix Yu², Dan Holtmann-Rice², Dmitry Storcheus², Afshin Rostamizadeh², Sanjiv Kumar²

¹University of Texas at Austin, ²Google Research

Wed Jun 12th 06:30 -- 09:00 PM @ Pacific Ballroom #189


SLIDE 2

Motivation

  • Goal: Create good representations for sparse data
SLIDE 3

Motivation

  • Goal: Create good representations for sparse data
  • Amazon employee dataset: d = 15k, nnz = 9
  • RCV1 text dataset: d = 47k, nnz = 76
  • Wiki multi-label dataset: d = 31k, nnz = 19
  • eXtreme Multi-label Learning (XML).

(Multiple labels per item, from a very large set of labels)

One-hot encoded categorical data + text parts

SLIDE 4

Motivation

  • Goal: Create good representations for sparse data
  • Amazon employee dataset: d = 15k, nnz = 9
  • RCV1 text dataset: d = 47k, nnz = 76
  • Wiki multi-label dataset: d = 31k, nnz = 19
  • eXtreme Multi-label Learning (XML).

(Multiple labels per item, from a very large set of labels)

  • Unlike image/video data, there is no notion of spatial/temporal locality (so no CNNs).

  • Reduce the dimensionality via a linear sketching/embedding (see the sketch below)

Want: beyond sparsity, learn additional structure.

One-hot encoded categorical data + text parts
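As a toy illustration of such a linear sketch (a minimal numpy sketch of our own; the dimensions mirror the Amazon dataset above, and the Gaussian choice of A is just a placeholder), a single matrix multiply maps a high-dimensional sparse vector to a short dense embedding:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 15_000, 100                                # ambient / embedding dimensions
x = np.zeros(d)
x[rng.choice(d, size=9, replace=False)] = 1.0     # one-hot-style sparse vector, nnz = 9
A = rng.standard_normal((m, d)) / np.sqrt(m)      # placeholder linear sketch
y = A @ x                                         # dense m-dimensional embedding
```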

SLIDE 5

Representing vectors in low dimensions

  • ðĩ ∈ ℝ-×/ Measurement matrix
  • If we ask: Linear compression,
  • And Linear recovery
  • Best learned

measurement/reconstruction matrices for l2 norm? ð‘Ķ ∈ ℝ/ 𝑧 = ðĩð‘Ķ 𝑧 ∈ ℝ- (𝑛 < 𝑒) ? ð‘Ķ ≈ ð‘Ķ Encode Recover

𝑚𝑗𝑜𝑓𝑏𝑠 𝑚𝑗𝑜𝑓𝑏𝑠

SLIDE 6

Representing vectors in low dimensions

  • ðĩ ∈ ℝ-×/ Measurement matrix
  • If we ask Linear compression,
  • And Linear recovery
  • Best learned

measurement/reconstruction matrices for l2 norm?

  • PCA

ð‘Ķ ∈ ℝ/ 𝑧 = ðĩð‘Ķ 𝑧 ∈ ℝ- (𝑛 < 𝑒) ? ð‘Ķ ≈ ð‘Ķ Recover

𝑚𝑗𝑜𝑓𝑏𝑠 𝑚𝑗𝑜𝑓𝑏𝑠

Encode

SLIDE 7

Representing vectors in low dimensions

  • ðĩ ∈ ℝ-×/ Measurement matrix
  • If we ask Linear compression,
  • And Linear recovery
  • Best learned

measurement/reconstruction matrices for l2 norm?

  • PCA
  • But if x is sparse we can do

better ð‘Ķ ∈ ℝ/ 𝑧 = ðĩð‘Ķ 𝑧 ∈ ℝ- (𝑛 < 𝑒) ? ð‘Ķ ≈ ð‘Ķ Recover

𝑚𝑗𝑜𝑓𝑏𝑠 𝑚𝑗𝑜𝑓𝑏𝑠

Encode
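For the ℓ2 question above, the answer is PCA. Here is a minimal numpy sketch (toy data and dimensions are made up for illustration) of the optimal linear encode/decode pair: project onto the top m principal directions and reconstruct with their transpose.

```python
import numpy as np

# Minimal sketch: with a linear encoder and a linear decoder, PCA minimizes
# the average l2 reconstruction error.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))          # toy data: n = 1000 points in R^50
Xc = X - X.mean(axis=0)                      # center before PCA
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

A = Vt[:10]                                  # A in R^(m x d): top m = 10 directions
Y = Xc @ A.T                                 # encode: y = A x (per row)
X_hat = Y @ A                                # decode: x_hat = A^T y
print(((Xc - X_hat) ** 2).mean())            # average l2 reconstruction error
```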

SLIDE 8

Compressed Sensing (Donoho; Candès et al.; …)

ð‘Ķ ∈ ℝ/ 𝑧 = ðĩð‘Ķ ∈ ℝ- (𝑛 < 𝑒) ? ð‘Ķ ≈ ð‘Ķ Recover

  • ðĩ ∈ ℝ-×/ Measurement matrix
  • If we ask Linear compression,
  • Recovery by convex opt
  • ℓI-min, Lasso,...
  • Near-perfect recovery for sparse

vectors.

  • Provably for Gaussian random A.

𝑔(ðĩ, 𝑧) ≔ argminOP ð‘ĶQ I

  • s. t. ðĩð‘ĶQ = 𝑧

ℓ𝟐-min 𝑚𝑗𝑜𝑓𝑏𝑠

Encode
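The ℓ1-min decoder is a linear program, so it is easy to sketch with an off-the-shelf solver. Below is a minimal illustration using scipy (the helper name l1_min_decode and the toy problem sizes are our own, not from the slides): split the variable into the pair (x, u) with |x| ≤ u elementwise, and minimize Σu subject to Ax = y.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min_decode(A, y):
    """Solve g(A, y) = argmin ||x||_1  s.t.  A x = y, as a linear program.

    Stack variables as [x; u] with u >= |x| elementwise:
        minimize    sum(u)
        subject to  A x = y,   x - u <= 0,   -x - u <= 0.
    """
    m, d = A.shape
    c = np.concatenate([np.zeros(d), np.ones(d)])   # objective: sum of u
    A_eq = np.hstack([A, np.zeros((m, d))])         # A x = y
    I = np.eye(d)
    A_ub = np.vstack([np.hstack([I, -I]),           #  x - u <= 0
                      np.hstack([-I, -I])])         # -x - u <= 0
    b_ub = np.zeros(2 * d)
    bounds = [(None, None)] * d + [(0, None)] * d   # x free, u >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y, bounds=bounds)
    return res.x[:d]

# Toy check: recover a 5-sparse x in R^200 from m = 50 Gaussian measurements.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 200)) / np.sqrt(50)
x = np.zeros(200)
x[rng.choice(200, size=5, replace=False)] = rng.standard_normal(5)
print(np.linalg.norm(x - l1_min_decode(A, A @ x)))  # near zero, w.h.p.
```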

SLIDE 9

Compressed Sensing (Donoho; Candès et al.; …)

ð‘Ķ ∈ ℝ/ 𝑧 = ðĩð‘Ķ ∈ ℝ- (𝑛 < 𝑒) ? ð‘Ķ ≈ ð‘Ķ Compress Recover

  • ðĩ ∈ ℝ-×/ Measurement matrix
  • If we ask Linear compression,
  • Recovery by convex opt
  • ℓI-min, Lasso,...
  • Near-perfect recovery for sparse

vectors.

  • Provably for Gaussian random A.
  • 1. If our vectors are

sparse +additional unkn known structure (e.g. one-hot encoded features, text+features, XML, etc)

  • 2. Can we LE

LEARN RN a measurement matrix A

  • 3. Make it work well for

convex opt decoder

SLIDE 10

Comparisons of the recovery performance

[Figure: fraction of exactly recovered points vs. number of measurements (m), comparing three schemes:
  • Learned measurements + ℓ1-min decoding [our method]
  • Gaussian measurements + model-based CoSaMP (it has structure knowledge!)
  • Gaussian measurements + ℓ1-min decoding
Exact recovery: ‖x − x̂‖₂ ≤ 10⁻¹⁰.]

SLIDE 11

Learning a measurement matrix

  • Training data: n sparse vectors x₁, x₂, …, x_n ∈ ℝ^d

x_i ∈ ℝ^d → Encode: y = A x_i → Recover: x̂_i = g(A, A x_i) ∈ ℝ^d

ℓ1-min decoder: g(A, y) := argmin_{x′} ‖x′‖₁ s.t. A x′ = y, with A ∈ ℝ^(m×d)

SLIDE 12

Learning a measurement matrix

  • Training data: n sparse vectors x₁, x₂, …, x_n ∈ ℝ^d

x_i ∈ ℝ^d → Encode: y = A x_i → Recover: x̂_i = g(A, A x_i) ∈ ℝ^d

ℓ1-min decoder: g(A, y) := argmin_{x′} ‖x′‖₁ s.t. A x′ = y

Objective function: min_A Σ_{i=1}^n ‖x_i − g(A, A x_i)‖₂², over A ∈ ℝ^(m×d)

SLIDE 13

Learning a measurement matrix

  • Training data: n sparse vectors x₁, x₂, …, x_n ∈ ℝ^d

x_i ∈ ℝ^d → Encode: y = A x_i → Recover: x̂_i = g(A, A x_i) ∈ ℝ^d

ℓ1-min decoder: g(A, y) := argmin_{x′} ‖x′‖₁ s.t. A x′ = y

Objective function: min_A Σ_{i=1}^n ‖x_i − g(A, A x_i)‖₂², over A ∈ ℝ^(m×d)

Problem: how to compute the gradient w.r.t. A?

SLIDE 14

Learning a measurement matrix

  • Training data: n sparse vectors x₁, x₂, …, x_n ∈ ℝ^d

x_i ∈ ℝ^d → Encode: y = A x_i → Recover: x̂_i = g(A, A x_i) ∈ ℝ^d

ℓ1-min decoder: g(A, y) := argmin_{x′} ‖x′‖₁ s.t. A x′ = y

Objective function: min_A Σ_{i=1}^n ‖x_i − g(A, A x_i)‖₂², over A ∈ ℝ^(m×d)

Problem: how to compute the gradient w.r.t. A?
Key idea: replace g(A, y) by a few steps of projected subgradient.
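To make the key idea concrete, here is a minimal numpy sketch of a T-step unrolled decoder (our own illustration; the helper name unrolled_decode and the fixed step sizes are assumptions). Each step is a subgradient move for ‖x‖₁ followed by projection back onto {x : Ax = y}; when A has (near-)orthonormal rows, projecting the step direction reduces to multiplying by (I − AᵀA), matching the update on the next slide.

```python
import numpy as np

def unrolled_decode(A, y, betas):
    """T unrolled projected-subgradient steps for: min ||x||_1  s.t.  A x = y.

    Assumes A has (near-)orthonormal rows, so projecting the subgradient
    sign(x) onto the null space of A is just (I - A^T A) sign(x). Every step
    is differentiable (a.e.) in A, so the whole decoder can be
    backpropagated through to learn A.
    """
    x = A.T @ y                              # x^(1): feasible starting point
    P = np.eye(A.shape[1]) - A.T @ A         # projector onto null(A)
    for beta in betas:                       # betas: the T step sizes
        x = x - beta * (P @ np.sign(x))      # step stays inside {x : A x = y}
    return x

# Usage sketch: decode with T = 10 decaying step sizes.
# x_hat = unrolled_decode(A, A @ x, betas=0.5 / np.arange(1, 11))
```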

SLIDE 15

ℓ1-AE: a novel autoencoder architecture

Input x → Encoder: y = Ax → Decoder: x^(1) = Aᵀy, then T unrolled projected-subgradient steps, each followed by batch normalization (BN):

x^(t+1) = x^(t) − β_t (I − AᵀA) sign(x^(t))   (one step of projected subgradient)

Output: x̂ = ReLU(x^(T+1))

SLIDE 16

Real sparse datasets

Fraction of exactly recovered test points and test RMSE: our method performs the best!

[Figure: fraction of exactly recovered test points and test RMSE vs. number of measurements (m), on three real datasets (d = 15k, nnz = 9; d = 31k, nnz = 19; d = 47k, nnz = 76), comparing our method against baselines including a 2-layer autoencoder.]

SLIDE 17

Summary

  • Key idea: we learn a compressed sensing measurement matrix by unrolling the projected subgradient of the ℓ1-min decoder.
  • Implemented as a novel autoencoder, ℓ1-AE.
  • Compared 12 algorithms over 6 datasets (3 synthetic and 3 real).
  • Our method achieves perfect reconstruction with 1.1-3x fewer measurements than the previous state-of-the-art methods.
  • Applied to extreme multi-label classification, our method outperforms SLEEC (Bhatia et al., 2015).

Wed Jun 12th 06:30 -- 09:00 PM @ Pacific Ballroom #189