

  1. Greedy Sparsity-Constrained Optimization. Sohail Bahmani, with Petros Boufounos and Bhiksha Raj. 45th Asilomar Conference, Nov. 2011

  2. Outline
   Background
   Compressed Sensing
   Problem Formulation
   Generalizing Compressed Sensing
   Example
   Prior Work
   GraSP Algorithm
   Main Result
   Required Conditions
   Example: ℓ2-regularized Logistic Regression

  3. Compressed Sensing (1): Linear Inverse Problem
   Sparse signal x⋆ ∈ ℝ^p, measurement matrix A ∈ ℝ^{n×p}, noise e ∈ ℝ^n
   Measurements: y = Ax⋆ + e
   Given y and A with n ≪ p, estimate x⋆
   Applications: biomedical imaging, image denoising, image segmentation, filter design, system identification, etc.
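A minimal NumPy sketch of this measurement model; the dimensions, sparsity level, noise scale, and the Gaussian choice of A are illustrative assumptions rather than values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 50, 200, 5  # n measurements, ambient dimension p, sparsity s (assumed values)

# s-sparse signal x_star in R^p
x_star = np.zeros(p)
support = rng.choice(p, size=s, replace=False)
x_star[support] = rng.standard_normal(s)

# Gaussian measurement matrix A in R^{n x p} and noisy measurements y = A x_star + e
A = rng.standard_normal((n, p)) / np.sqrt(n)
e = 0.01 * rng.standard_normal(n)
y = A @ x_star + e
```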

  4. Compressed Sensing (2)
   ‖x‖₀ = |supp(x)| with supp(x) = { j : x_j ≠ 0 };  ‖x‖₁ = Σ_{j=1}^{p} |x_j|
   ℓ0-minimization (L0): arg min_x ‖x‖₀ subject to ‖Ax − y‖₂ ≤ ϵ
   ℓ1-minimization (L1): arg min_x ‖x‖₁ subject to ‖Ax − y‖₂ ≤ ϵ  (convexified version of (L0))
   ℓ0-constrained LS (C0): arg min_x ‖Ax − y‖₂² subject to ‖x‖₀ ≤ s
   ℓ1-constrained LS (C1): arg min_x ‖Ax − y‖₂² subject to ‖x‖₁ ≤ R
   The ℓ1-norm serves as a proxy for the ℓ0-pseudonorm; (C0) is handled by (greedy) approximate solvers
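As one concrete example of a greedy approximate solver for (C0), here is a minimal iterative hard thresholding (IHT) sketch; IHT is not named on the slide, and the step-size rule and iteration count are assumptions:

```python
import numpy as np

def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x and zero out the rest."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]
    out[keep] = x[keep]
    return out

def iht(A, y, s, iters=200, step=None):
    """Greedy approximate solver for min ||Ax - y||_2^2 subject to ||x||_0 <= s."""
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2  # conservative step size from the spectral norm
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = hard_threshold(x + step * A.T @ (y - A @ x), s)
    return x
```

On the instance generated in the previous sketch this would be run as `x_hat = iht(A, y, s)`.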

  5. Generalizing Compressed Sensing
   Common assumptions in CS:
     The relation between the input and the response has a linear form y = Ax + e (→ consider nonlinear relations)
     The error is usually measured by the squared error f(x) = ‖Ax − y‖₂² (→ consider other measures of fidelity)
   General formulation: let f: ℝ^p → ℝ be a cost function, and approximate the solution of
     x̂ = arg min_x f(x) subject to ‖x‖₀ ≤ s
   For f(x) = ‖Ax − y‖₂² we recover the ℓ0-constrained least-squares formulation of CS
   We will see the ℓ2-regularized logistic loss as another example of f(x)
   More generally, f(x) can be the empirical loss associated with the observations in a statistical estimation problem
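A small sketch of how the general formulation specializes: the squared-error cost and its gradient packaged as plain functions that a sparsity-constrained solver could consume (the helper name `squared_error_cost` is just illustrative):

```python
import numpy as np

def squared_error_cost(A, y):
    """Cost f(x) = ||Ax - y||_2^2 and its gradient: the compressed-sensing special case."""
    def f(x):
        r = A @ x - y
        return float(r @ r)
    def grad(x):
        return 2.0 * A.T @ (A @ x - y)
    return f, grad
```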

  6. Example: Gene Selection
   Data points a ∈ ℝ^p: gene-expression coefficients obtained from tissue samples
   Labels y ∈ {0, 1}: determine healthy (y = 0) vs. cancer (y = 1) samples
   Observations: n iid copies of (a, y), namely the instances {(a_i, y_i)}_{i=1}^{n}
   Restriction: fewer samples than dimensions, i.e., n < p
   Goal: find s ≪ p entries (i.e., variables) of the data points a from which the label y can be predicted with the least "error"
   Nonlinearity → MLE: y | a has a likelihood function l that depends on an s-sparse parameter vector x
   Empirical loss: f(x) = −(1/n) Σ_{i=1}^{n} log l(x; a_i, y_i)
   Minimizing the loss (equivalent to maximizing the joint likelihood) estimates the true parameter x⋆
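A toy simulation of this setup under a logistic likelihood, anticipating the later example; all sizes and distributions here are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, s = 100, 500, 10  # fewer samples than dimensions; s-sparse true parameter (assumed values)

x_star = np.zeros(p)
x_star[rng.choice(p, size=s, replace=False)] = rng.standard_normal(s)

a = rng.standard_normal((n, p))             # rows play the role of gene-expression vectors a_i
prob = 1.0 / (1.0 + np.exp(-(a @ x_star)))  # P(y_i = 1 | a_i) under a logistic model
y = rng.binomial(1, prob)                   # labels: 0 = healthy, 1 = cancer
```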

  7. Prior Work
   In the statistical estimation framework: convex f + ℓ1-regularization
     Kakade et al. [AISTATS'09]: loss functions from the exponential family
     Negahban et al. [NIPS'09]: M-estimators and "decomposable" norms
     Agarwal et al. [NIPS'10]: projected gradient descent with an ℓ1-constraint
   Issue: the sparsity of the solution cannot be guaranteed to be optimal, because
     nonlinearity causes solution-dependent error bounds that can become very large
     ℓ1-regularization is merely a proxy for inducing sparsity
   We consider a greedy algorithm for the problem
     it enforces sparsity directly
     it generally has lower computational complexity

  8. Algorithm: Gradient Support Pursuit (GraSP), inspired by the CoSaMP algorithm [Needell & Tropp '09]
   Input: f(·) and s.  Output: an s-sparse estimate x̂
   0. Initialize x̂ = 0
   Repeat:
     1. Compute the gradient: z = ∇f(x̂)
     2. Identify coordinates: Ω = supp(z_{2s}), the 2s largest-magnitude entries of z
     3. Merge supports: 𝒯 = supp(x̂) ∪ Ω
     4. Find a crude estimate: b = arg min_x f(x) subject to x|_{𝒯ᶜ} = 0 (tractable because f obeys certain conditions)
     5. Prune: x̂ = b_s, the best s-term approximation of b
   Until the halting condition holds
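A compact sketch of one way to implement these five steps. The "find crude estimate" step is stated abstractly on the slide, so it is approximated here by gradient steps restricted to the merged support; that substitution, the step size, and the iteration counts are assumptions:

```python
import numpy as np

def best_s_term(x, s):
    """Best s-term approximation: keep the s largest-magnitude entries."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]
    out[keep] = x[keep]
    return out

def grasp(grad, p, s, outer_iters=20, inner_iters=100, step=0.01):
    """Gradient Support Pursuit sketch; grad(x) returns the gradient of the cost f at x."""
    x = np.zeros(p)                               # 0. initialize
    for _ in range(outer_iters):
        z = grad(x)                               # 1. compute gradient
        omega = np.argsort(np.abs(z))[-2 * s:]    # 2. 2s largest-magnitude coordinates
        T = np.union1d(omega, np.flatnonzero(x))  # 3. merge supports
        mask = np.zeros(p, dtype=bool)
        mask[T] = True
        b = x.copy()                              # 4. crude estimate: minimize f over coordinates in T
        for _ in range(inner_iters):              #    (approximated by restricted gradient steps)
            b[mask] -= step * grad(b)[mask]
        x = best_s_term(b, s)                     # 5. prune to the s largest entries
    return x
```

With the squared-error cost from the earlier sketch this could be called as `f, grad = squared_error_cost(A, y)` followed by `x_hat = grasp(grad, p, s)`; for the quadratic case the inner loop can instead be an exact restricted least-squares solve, and in general the step size must be small enough that the restricted gradient steps actually decrease f.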

  9. Main Result
  Theorem. If f satisfies certain properties, then the estimate x̂^(i) obtained at the i-th iteration of GraSP obeys
    ‖x̂^(i) − x⋆‖₂ ≤ κ^i ‖x⋆‖₂ + C ‖∇f(x⋆)|_ℐ‖₂,
  where ℐ contains the indices of the 3s largest coordinates of ∇f(x⋆) in magnitude.
   For κ < 1 (i.e., a contraction factor) we get a linear rate of convergence up to an approximation error
   In statistical estimation problems, ‖∇f(x⋆)|_ℐ‖₂ can be related to the statistical precision of the estimator

  10. Required Conditions
  Definition (Stable Hessian Property). For f: ℝ^p → ℝ with Hessian H_f(·), let
    A_k(x) := sup { Δᵀ H_f(x) Δ : |supp(x) ∪ supp(Δ)| ≤ k, ‖Δ‖₂ = 1 },
    B_k(x) := inf { Δᵀ H_f(x) Δ : |supp(x) ∪ supp(Δ)| ≤ k, ‖Δ‖₂ = 1 }.
  We say f satisfies the SHP of order k with constant μ_k if A_k(x) / B_k(x) ≤ μ_k for all k-sparse vectors x.
   SHP basically says that symmetric restrictions of the Hessian are well-conditioned
   For f(x) = ½ ‖Ax − y‖₂², as in CS, SHP implies the Restricted Isometry Property:
    (1 + δ_k) / (1 − δ_k) ≤ μ_k  ⇒  δ_k ≤ (μ_k − 1) / (μ_k + 1)
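For the quadratic cost the Hessian AᵀA is constant, so the SHP constants reduce to extreme eigenvalues of its support-restricted blocks. The brute-force sketch below checks this at toy scale; enumerating supports is only feasible for small p, and the random matrix is an assumption:

```python
import numpy as np
from itertools import combinations

def shp_constant_quadratic(A, k):
    """Upper bound on A_k(x)/B_k(x) for f(x) = 0.5*||Ax - y||_2^2, whose Hessian A^T A is constant."""
    H = A.T @ A
    sup, inf = 0.0, np.inf
    for S in combinations(range(A.shape[1]), k):
        eigs = np.linalg.eigvalsh(H[np.ix_(S, S)])  # spectrum of the Hessian restricted to support S
        sup, inf = max(sup, eigs.max()), min(inf, eigs.min())
    return sup / inf

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 12)) / np.sqrt(30)     # toy sizes so that enumerating supports stays cheap
print(shp_constant_quadratic(A, k=3))
```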

  11. Example: ℓ2-regularized Logistic Regression
   Logistic model: y | a; x ~ Bernoulli( 1 / (1 + e^{−⟨a, x⟩}) )
   For iid observation pairs {(a_i, y_i)}_{i=1}^{n}, write the logistic loss as
    L(x) := (1/n) Σ_{i=1}^{n} [ log(1 + e^{⟨a_i, x⟩}) − y_i ⟨a_i, x⟩ ]
   ℓ2-regularized logistic regression with a sparsity constraint:
    arg min_x f(x) = L(x) + (η/2) ‖x‖₂²  subject to ‖x‖₀ ≤ s
   We can show μ_k ≤ 1 + α_k / (4η), where α_k = max_{|𝒥| ≤ k} λ_max(A_𝒥ᵀ A_𝒥) and A_𝒥 collects the columns of the data matrix indexed by 𝒥
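A sketch of this regularized logistic cost and its gradient in the same style as the earlier cost functions, suitable for plugging into the `grasp` sketch above (the stacked data matrix A with rows a_i and the helper name are illustrative):

```python
import numpy as np

def l2_logistic_cost(A, y, eta):
    """f(x) = (1/n) * sum_i [log(1 + exp(<a_i, x>)) - y_i * <a_i, x>] + (eta/2) * ||x||_2^2."""
    n = A.shape[0]
    def f(x):
        z = A @ x
        return float(np.mean(np.logaddexp(0.0, z) - y * z) + 0.5 * eta * (x @ x))
    def grad(x):
        p_hat = 1.0 / (1.0 + np.exp(-(A @ x)))  # predicted P(y_i = 1 | a_i)
        return A.T @ (p_hat - y) / n + eta * x
    return f, grad
```

For example, `f, grad = l2_logistic_cost(a, y, eta=0.1)` followed by `x_hat = grasp(grad, p, s)` would produce an s-sparse estimate of x⋆ from the simulated gene-selection data.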

  12. Main Result Revisited
  Recap: if f satisfies certain properties, then the estimate x̂^(i) obtained at the i-th iteration of GraSP obeys
    ‖x̂^(i) − x⋆‖₂ ≤ κ^i ‖x⋆‖₂ + C ‖∇f(x⋆)|_ℐ‖₂,
  where ℐ contains the indices of the 3s largest coordinates of ∇f(x⋆) in magnitude.
  Theorem. If f satisfies the SHP of order 4s with constant μ_{4s} such that μ²_{4s} < 2, and B_{4s}(x) > ϵ, then the estimate obtained at the i-th iteration of GraSP obeys
    ‖x̂^(i) − x⋆‖₂ ≤ (μ²_{4s} − 1)^i ‖x⋆‖₂ + (2μ_{4s} + 2) / (ϵ (2 − μ²_{4s})) · ‖∇f(x⋆)|_ℐ‖₂,
  where ℐ contains the indices of the 3s largest coordinates of ∇f(x⋆) in magnitude.

  13. Summary
   Extend CS results to nonlinear models and different error measures
     ℓ1-regularization may not yield sufficiently sparse solutions because of the type of cost functions introduced by nonlinearities in the model
   GraSP algorithm
     a greedy method that always gives a sparse solution
     accuracy is guaranteed for the class of functions that satisfy the SHP
     linear rate of convergence up to the approximation error
   Some interesting problems to study
     deterministic results, e.g., using an equivalent of incoherence
     relaxing the SHP to an entirely local condition
