 
Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling
Shanshan Wu (1), Alex Dimakis (1), Sujay Sanghavi (1), Felix Yu (2), Dan Holtmann-Rice (2), Dmitry Storcheus (2), Afshin Rostamizadeh (2), Sanjiv Kumar (2)
(1) University of Texas at Austin, (2) Google Research
Wed Jun 12th, 06:30 -- 09:00 PM @ Pacific Ballroom #189
Motivation
• Goal: Create good representations for sparse data
• Amazon employee dataset: d = 15k, nnz = 9
• RCV1 text dataset: d = 47k, nnz = 76
• Wiki multi-label dataset: d = 31k, nnz = 19
(Side note next to the datasets: one-hot encoded categorical data + text parts)
• eXtreme Multi-label Learning (XML): multiple labels per item, from a very large class of labels
• Unlike image/video data, there is no notion of spatial/time locality, so no CNN
• Reduce the dimensionality via a linear sketching/embedding
Want: Beyond sparsity, learn additional structure
Representing vectors in low-dimension
Diagram: $x \in \mathbb{R}^d$ → Encode (linear) → $y = Ax \in \mathbb{R}^m$ ($m < d$) → Recover (linear) → $\hat{x} \approx x$
• $A \in \mathbb{R}^{m \times d}$: measurement matrix
• If we ask for linear compression and linear recovery: what are the best learned measurement/reconstruction matrices for the $\ell_2$ norm?
• PCA
• But if x is sparse we can do better
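The PCA answer can be illustrated with a small sketch. This is a minimal NumPy example under stated assumptions, not something taken from the slides: the data is used uncentered, the rows of A are the top-m right singular vectors of the data matrix, recovery is the linear map x_hat = A^T y, and the function names and toy data are hypothetical.

```python
import numpy as np

def pca_measurement_matrix(X, m):
    """Return A (m x d) whose rows are the top-m right singular vectors of X (n x d)."""
    # Assumption: X is used uncentered; the rows of Vt are orthonormal directions.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:m]

def pca_encode(A, X):
    # Linear compression: y = A x for each row x of X.
    return X @ A.T

def pca_recover(A, Y):
    # Linear recovery: x_hat = A^T y for each row y of Y.
    return Y @ A

rng = np.random.default_rng(0)
# Toy data: 200 points in d = 30 that lie exactly in a 5-dimensional subspace.
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 30))
A = pca_measurement_matrix(X, m=5)
X_hat = pca_recover(A, pca_encode(A, X))
pca_error = np.linalg.norm(X - X_hat)
```

Because the toy data has rank 5, five linear measurements reconstruct it up to floating-point error. PCA exploits a low-dimensional subspace, not sparsity, which is the gap the last bullet of the slide points at.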
Compressed Sensing (Donoho; Candès et al.; …)
Diagram: $x \in \mathbb{R}^d$ → Encode/Compress (linear) → $y = Ax \in \mathbb{R}^m$ ($m < d$) → Recover ($\ell_1$-min) → $\hat{x} \approx x$
• $A \in \mathbb{R}^{m \times d}$: measurement matrix
• If we ask for linear compression
• Recovery by convex opt: $\ell_1$-min, Lasso, ...
• Near-perfect recovery for sparse vectors
• Provably for Gaussian random A
$\ell_1$-min decoder: $f(A, y) := \arg\min_{x'} \|x'\|_1 \ \text{s.t.}\ Ax' = y$
1. If our vectors are sparse + additional unknown structure (e.g. one-hot encoded features, text + features, XML, etc.)
2. Can we LEARN a measurement matrix A
3. Make it work well for a convex opt decoder
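The $\ell_1$-min decoder with a Gaussian random measurement matrix can be sketched as a linear program. The following is a minimal illustration, assuming SciPy's `linprog` is available and using the standard split x = u - v with u, v ≥ 0; the function name `l1_min_decode` and the toy sizes are hypothetical and do not come from the slides.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min_decode(A, y):
    """Solve min ||x||_1 s.t. A x = y as an LP with x = u - v, u >= 0, v >= 0."""
    m, d = A.shape
    c = np.ones(2 * d)                 # objective: sum(u) + sum(v) = ||x||_1 at the optimum
    A_eq = np.hstack([A, -A])          # equality constraint: A (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    if not res.success:
        raise RuntimeError(res.message)
    u, v = res.x[:d], res.x[d:]
    return u - v

rng = np.random.default_rng(0)
d, m, k = 50, 25, 3                    # toy sizes (assumed, not from the slides)
x = np.zeros(d)
x[rng.choice(d, size=k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, d)) / np.sqrt(m)   # Gaussian random measurement matrix
y = A @ x                              # linear compression
x_hat = l1_min_decode(A, y)            # recovery by convex optimization
```

Since the true x is feasible for the constraint, the decoded vector always has an $\ell_1$ norm no larger than that of x; whether it equals x depends on the sparsity level and the number of measurements.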
Comparisons of the recovery performance
[Figure: fraction of exactly recovered points vs. number of measurements (m). Curves: learned measurements + $\ell_1$-min decoding [our method]; Gaussian measurements + model-based CoSaMP (it has structure knowledge!); Gaussian + $\ell_1$-min decoding.]
Exact recovery: $\|x - \hat{x}\|_2 \le 10^{-10}$
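The y-axis quantity on this slide can be computed directly from the exact-recovery criterion ($\ell_2$ error at most 10^-10). A minimal sketch, where the function name and the toy arrays are hypothetical:

```python
import numpy as np

def fraction_exactly_recovered(X, X_hat, tol=1e-10):
    """Fraction of rows x with ||x - x_hat||_2 <= tol."""
    errors = np.linalg.norm(X - X_hat, axis=1)
    return float(np.mean(errors <= tol))

# Toy example: four points, one of which is reconstructed with a visible error.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0],
              [1.0, 1.0, 0.0]])
X_hat = X.copy()
X_hat[3, 2] = 1e-3                     # this row misses the tolerance
frac = fraction_exactly_recovered(X, X_hat)
```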
Learning a measurement matrix
• Training data: n sparse vectors $x_1, x_2, \dots, x_n \in \mathbb{R}^d$
Diagram: $x_i \in \mathbb{R}^d$ → Encode ($A \in \mathbb{R}^{m \times d}$) → $y = Ax_i$ → Recover ($\ell_1$-min) → $\hat{x}_i = f(A, Ax_i) \in \mathbb{R}^d$
$\ell_1$-min decoder: $f(A, y) := \arg\min_{x'} \|x'\|_1 \ \text{s.t.}\ Ax' = y$
Objective function: $\min_{A \in \mathbb{R}^{m \times d}} \sum_{i=1}^{n} \|x_i - f(A, Ax_i)\|_2^2$
Problem: How to compute gradient w.r.t. A?
Key idea: Replace $f(A, y)$ by a few steps of projected subgradient
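For reference, one projected subgradient step for the $\ell_1$-min problem can be written out as follows. This is a standard derivation, not text from the slides; it assumes A has full row rank, writes $A^{+}$ for its pseudo-inverse and $\Pi$ for the Euclidean projection onto $\{x : Ax = y\}$, and uses $\mathrm{sign}(x)$ as a subgradient of $\|x\|_1$.

```latex
\begin{aligned}
\Pi(v) &= v - A^{+}(Av - y), \qquad A^{+} = A^\top (A A^\top)^{-1} \\
x^{(t+1)} &= \Pi\big(x^{(t)} - \alpha_t\,\mathrm{sign}(x^{(t)})\big) \\
          &= x^{(t)} - \alpha_t\,\mathrm{sign}(x^{(t)})
             - A^{+}\big(Ax^{(t)} - y - \alpha_t A\,\mathrm{sign}(x^{(t)})\big) \\
          &= x^{(t)} - \alpha_t\,(I - A^{+}A)\,\mathrm{sign}(x^{(t)})
             \qquad \text{when } Ax^{(t)} = y
\end{aligned}
```

Each step is built from matrix products and an elementwise sign, so a fixed number of steps gives an explicit expression in A through which gradients can be propagated. When the rows of A are orthonormal ($AA^\top = I$), $A^{+} = A^\top$; for a general A, using $A^\top$ in place of $A^{+}$ should be read as an approximation.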
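$\ell_1$-AE: a novel autoencoder architecture
Diagram (Input → Encoder → Decoder → Output; BN blocks appear in the decoder):
• Encoder: $y = Ax$
• Decoder input: $x^{(1)} = A^\top y$
• Decoder step $t = 1, \dots, T$: $z_t = \mathrm{sign}(x^{(t)})$, then $x^{(t+1)} = x^{(t)} - \alpha_t (I - A^\top A) z_t$
• Output: $\hat{x} = \mathrm{ReLU}(x^{(T+1)})$
One step of projected subgradient: $x^{(t+1)} = x^{(t)} - \alpha_t (I - A^\top A)\,\mathrm{sign}(x^{(t)})$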
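The forward pass of this architecture can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the BN blocks are omitted, the step sizes follow an assumed schedule alpha_t = beta / t, A is given orthonormal rows so that every iterate stays consistent with the measurements, and the function name `l1_ae_forward` is hypothetical.

```python
import numpy as np

def l1_ae_forward(A, x, T=10, beta=1.0):
    """Encode y = A x, unroll T decoder steps, then apply ReLU (BN omitted)."""
    y = A @ x                                   # encoder
    x_t = A.T @ y                               # decoder input x^(1) = A^T y
    M = np.eye(A.shape[1]) - A.T @ A            # (I - A^T A)
    for t in range(1, T + 1):
        alpha_t = beta / t                      # assumed step-size schedule
        x_t = x_t - alpha_t * (M @ np.sign(x_t))
    return y, x_t, np.maximum(x_t, 0.0)         # measurements, x^(T+1), x_hat

rng = np.random.default_rng(0)
d, m = 40, 20                                   # toy sizes (assumed)
Q, _ = np.linalg.qr(rng.standard_normal((d, m)))
A = Q.T                                         # m x d with orthonormal rows
x = np.zeros(d)
x[[3, 17, 28]] = [1.0, 2.0, 0.5]                # sparse, nonnegative input
y, x_pre, x_hat = l1_ae_forward(A, x, T=10, beta=0.1)
```

With orthonormal rows, A @ x_pre reproduces y at every depth, because A (I - A^T A) = 0. In a training setting, an automatic differentiation framework could backpropagate the reconstruction error through these T steps to update A; that part is not shown here.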
Real sparse datasets
[Figure: three panels (d = 31k, nnz = 19; d = 15k, nnz = 9; d = 47k, nnz = 76). Y-axes: fraction of exactly recovered test points; test RMSE. X-axis: number of measurements (m). Legend labels recoverable from the slide: "2-layer", "[Our method]".]
Our method performs the best!
Summary
• Key idea: We learn a compressed sensing measurement matrix by unrolling the projected subgradient method of the $\ell_1$-min decoder
• Implemented as an autoencoder, $\ell_1$-AE
• Compared 12 algorithms over 6 datasets (3 synthetic and 3 real)
• Our method achieves perfect reconstruction with 1.1-3x fewer measurements than previous state-of-the-art methods
• Applied to extreme multi-label classification, our method outperforms SLEEC (Bhatia et al., 2015)