

SLIDE 1

Greedy Sparsity-Constrained Optimization

Sohail Bahmani, with Petros Boufounos and Bhiksha Raj

45th Asilomar Conference, Nov. 2011

SLIDE 2

Outline

  • Background
  • Compressed Sensing
  • Problem Formulation
  • Generalizing Compressed Sensing
  • Example
  • Prior Work
  • GraSP Algorithm
  • Main Result
  • Required Conditions
  • Example: ℓ2-regularized Logistic Regression

SLIDE 3

Compressed Sensing (1)

  • Applications: biomedical imaging, image denoising, image segmentation, filter design, system identification, etc.

Linear Inverse Problem

  • Sparse signal $\mathbf{y}^\star \in \mathbb{R}^q$
  • Measurement matrix $\mathbf{B} \in \mathbb{R}^{o \times q}$
  • Measurements $\mathbf{z} = \mathbf{B}\mathbf{y}^\star + \mathbf{f}$, with noise $\mathbf{f} \in \mathbb{R}^o$
  • Given $\mathbf{z}$ and $\mathbf{B}$ with $o \ll q$, estimate $\mathbf{y}^\star$
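As a concrete instance of this measurement model, here is a minimal NumPy sketch that generates a synthetic $t$-sparse signal and noisy Gaussian measurements. The sizes, the Gaussian matrix, and the noise level are illustrative assumptions, not from the slides; later sketches reuse these names.

```python
import numpy as np

rng = np.random.default_rng(0)
o, q, t = 50, 200, 5                 # o measurements, ambient dimension q, sparsity t

y_star = np.zeros(q)                 # t-sparse ground truth
support = rng.choice(q, size=t, replace=False)
y_star[support] = rng.standard_normal(t)

B = rng.standard_normal((o, q)) / np.sqrt(o)   # measurement matrix
f = 0.01 * rng.standard_normal(o)              # noise
z = B @ y_star + f                             # measurements z = B y* + f
```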

SLIDE 4

Compressed Sensing (2)

ℓ0-minimization (L0): $\arg\min_{\mathbf{y}} \|\mathbf{y}\|_0$ subject to $\|\mathbf{B}\mathbf{y} - \mathbf{z}\|_2 \le \vartheta$

ℓ1-minimization (L1): $\arg\min_{\mathbf{y}} \|\mathbf{y}\|_1$ subject to $\|\mathbf{B}\mathbf{y} - \mathbf{z}\|_2 \le \vartheta$

ℓ0-constrained LS (C0): $\arg\min_{\mathbf{y}} \|\mathbf{B}\mathbf{y} - \mathbf{z}\|_2^2$ subject to $\|\mathbf{y}\|_0 \le t$

ℓ1-constrained LS (C1): $\arg\min_{\mathbf{y}} \|\mathbf{B}\mathbf{y} - \mathbf{z}\|_2^2$ subject to $\|\mathbf{y}\|_1 \le S$

Here $\|\mathbf{y}\|_1 = \sum_{j=1}^{q} |y_j|$ and $\|\mathbf{y}\|_0 = |\mathrm{supp}(\mathbf{y})| = \sum_{j=1}^{q} \mathbb{1}[y_j \ne 0]$.

Two ways around the intractable ℓ0 problems:
  • (1) Convexify: use the ℓ1-norm as a proxy for the ℓ0-pseudonorm, turning (L0) into (L1) and (C0) into (C1)
  • (Greedy) approximate solvers that attack (L0)/(C0) directly
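To make the convexification route concrete, a minimal sketch of program (L1) using cvxpy (an assumed dependency; any generic convex solver would do), with `B` and `z` from the earlier synthetic sketch and `eps` playing the role of $\vartheta$:

```python
import cvxpy as cp

def l1_min(B, z, eps):
    """Solve (L1): minimize ||y||_1 subject to ||B y - z||_2 <= eps."""
    y = cp.Variable(B.shape[1])
    problem = cp.Problem(cp.Minimize(cp.norm1(y)),
                         [cp.norm2(B @ y - z) <= eps])
    problem.solve()
    return y.value
```

The greedy route is the subject of the GraSP algorithm later in the deck.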

SLIDE 5

Generalizing Compressed Sensing

  • Common assumptions in CS:
  • The relation between the input and the response is linear: $\mathbf{z} = \mathbf{B}\mathbf{y} + \mathbf{f}$
  • The error is measured by the squared error: $g(\mathbf{y}) = \|\mathbf{B}\mathbf{y} - \mathbf{z}\|_2^2$
  • Instead, consider nonlinear relations and other measures of fidelity

General Formulation: Let $g : \mathbb{R}^q \to \mathbb{R}$ be a cost function. Approximate the solution to
$$\hat{\mathbf{y}} = \arg\min_{\mathbf{y}} \; g(\mathbf{y}) \quad \text{subject to} \quad \|\mathbf{y}\|_0 \le t.$$

  • For $g(\mathbf{y}) = \|\mathbf{B}\mathbf{y} - \mathbf{z}\|_2^2$ we recover the ℓ0-constrained least-squares formulation (C0) of CS
  • We will see the ℓ2-regularized logistic loss as another example of $g(\mathbf{y})$
  • More generally, $g(\mathbf{y})$ can be the empirical loss associated with some observations in a statistical estimation problem
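In code, instantiating the general formulation just means supplying a cost and its gradient. A sketch for the least-squares special case (function names are illustrative); the GraSP sketch on Slide 8 uses exactly this gradient:

```python
def g_ls(y, B, z):
    """Squared-error cost g(y) = ||B y - z||_2^2 from the CS setting."""
    r = B @ y - z
    return r @ r

def grad_g_ls(y, B, z):
    """Gradient of g_ls: 2 B^T (B y - z)."""
    return 2.0 * B.T @ (B @ y - z)
```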

SLIDE 6

Example

  • Gene selection problem
  • Data points $\mathbf{b} \in \mathbb{R}^q$: gene expression coefficients obtained from tissue samples
  • Labels $z \in \{0, 1\}$: indicate healthy ($z = 0$) vs. cancer ($z = 1$) samples
  • Observations: $o$ iid instances $\{(\mathbf{b}_j, z_j)\}_{j=1}^{o}$
  • Restriction: fewer samples than dimensions, i.e., $o < q$
  • Goal: find $t \ll q$ entries (i.e., variables) of the data points $\mathbf{b}$ from which the label $z$ can be predicted with the least "error"

  • MLE
  • $z \mid \mathbf{b}$ has a likelihood function that depends on a $t$-sparse parameter vector $\mathbf{y}$
  • Minimize the loss (equivalent to maximizing the joint likelihood) to estimate the true parameter $\mathbf{y}^\star$
  • Nonlinearity enters through the empirical loss: $g(\mathbf{y}) = \frac{1}{o} \sum_{j=1}^{o} -\log m(\mathbf{y};\, \mathbf{b}_j, z_j)$

SLIDE 7

Prior Work

  • In the statistical estimation framework: convex $g$ + ℓ1-regularization
  • Kakade et al. [AISTATS'09]: loss functions from the exponential family
  • Negahban et al. [NIPS'09]: M-estimators and "decomposable" norms
  • Agarwal et al. [NIPS'10]: projected gradient descent with an ℓ1-constraint
  • Issue: optimal sparsity cannot be guaranteed, because
  • nonlinearity causes solution-dependent error bounds that can become very large
  • ℓ1-regularization is merely a proxy for inducing sparsity
  • We instead consider a greedy algorithm for the problem
  • The algorithm enforces sparsity directly
  • It generally has lower computational complexity

SLIDE 8

Algorithm: Gradient Support Pursuit (GraSP)

Input: $g(\cdot)$ and $t$. Output: $\hat{\mathbf{y}}$.

  0. Initialize $\hat{\mathbf{y}} = \mathbf{0}$
  Repeat:
  1. Compute gradient: $\mathbf{A} = \nabla g(\hat{\mathbf{y}})$
  2. Identify coordinates: $\Omega = \mathrm{supp}(\mathbf{A}_{2t})$, the support of the $2t$ largest-magnitude gradient entries
  3. Merge supports: $\mathcal{U} = \mathrm{supp}(\hat{\mathbf{y}}) \cup \Omega$
  4. Find crude estimate: $\mathbf{c} = \arg\min_{\mathbf{y}} g(\mathbf{y})$ s.t. $\mathbf{y}|_{\mathcal{U}^{\mathrm{c}}} = \mathbf{0}$
  5. Prune: $\hat{\mathbf{y}} = \mathbf{c}_t$, keeping the $t$ largest-magnitude entries
  Until the halting condition holds

Inspired by the CoSaMP algorithm [Needell & Tropp '09]. Step 4 is tractable because $g$ obeys certain conditions. A sketch of the loop in code follows.
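A compact NumPy sketch of the loop for the least-squares cost of Slide 5. The function name, the fixed-iteration halting rule, and solving step 4 by restricted least squares are illustrative assumptions; for a general $g$, step 4 would call an inner solver restricted to $\mathcal{U}$.

```python
import numpy as np

def grasp_ls(B, z, t, iters=20):
    """GraSP sketch for g(y) = ||B y - z||_2^2 (step 4 via restricted LS)."""
    q = B.shape[1]
    y_hat = np.zeros(q)                               # 0. initialize
    for _ in range(iters):                            # halting rule: fixed count (assumed)
        A = 2.0 * B.T @ (B @ y_hat - z)               # 1. gradient of the LS cost
        omega = np.argsort(np.abs(A))[-2 * t:]        # 2. top-2t gradient coordinates
        U = np.union1d(omega, np.flatnonzero(y_hat))  # 3. merge supports
        sol, *_ = np.linalg.lstsq(B[:, U], z, rcond=None)  # 4. minimize g over supp in U
        c = np.zeros(q)
        c[U] = sol
        keep = np.argsort(np.abs(c))[-t:]             # 5. prune to the t largest entries
        y_hat = np.zeros(q)
        y_hat[keep] = c[keep]
    return y_hat
```

With the synthetic data from the Slide 3 sketch, `grasp_ls(B, z, t)` should closely recover `y_star`.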

SLIDE 9

Main Result

Theorem: If $g$ satisfies certain properties, then the estimate obtained at the $j$-th iteration of GraSP obeys
$$\|\hat{\mathbf{y}}^{(j)} - \mathbf{y}^\star\|_2 \le \lambda^j \|\mathbf{y}^\star\|_2 + D \,\big\|\nabla g(\mathbf{y}^\star)|_{\mathcal{I}}\big\|_2,$$
where $\mathcal{I}$ contains the indices of the $3t$ largest coordinates of $\nabla g(\mathbf{y}^\star)$ in magnitude.

  • For $\lambda < 1$ (i.e., a contraction factor) we get a linear rate of convergence up to an approximation error
  • In statistical estimation problems, $\nabla g(\mathbf{y}^\star)|_{\mathcal{I}}$ can be related to the statistical precision of the estimator
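The geometric error decay the theorem predicts can be observed numerically. A small check under the assumptions of the earlier sketches (reusing `B`, `z`, `y_star`, `t` from Slide 3 and `grasp_ls` from Slide 8; it simply re-runs from scratch for each iteration count):

```python
# Trace ||y^(j) - y*||_2 as the iteration budget j grows.
for j in range(1, 8):
    err = np.linalg.norm(grasp_ls(B, z, t, iters=j) - y_star)
    print(f"iter {j}: ||y_hat - y_star||_2 = {err:.3e}")
```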

SLIDE 10

Required Conditions

Definition (Stable Hessian Property): For $g : \mathbb{R}^q \to \mathbb{R}$ with Hessian $\nabla^2 g(\cdot)$, let
$$B_l(\mathbf{y}) := \sup \big\{ \mathbf{E}^{\mathsf{T}} \nabla^2 g(\mathbf{y})\, \mathbf{E} \;:\; |\mathrm{supp}(\mathbf{y}) \cup \mathrm{supp}(\mathbf{E})| \le l, \; \|\mathbf{E}\|_2 = 1 \big\},$$
$$C_l(\mathbf{y}) := \inf \big\{ \mathbf{E}^{\mathsf{T}} \nabla^2 g(\mathbf{y})\, \mathbf{E} \;:\; |\mathrm{supp}(\mathbf{y}) \cup \mathrm{supp}(\mathbf{E})| \le l, \; \|\mathbf{E}\|_2 = 1 \big\}.$$
Then we say $g$ satisfies SHP of order $l$ with constant $\nu_l$ if $\frac{B_l(\mathbf{y})}{C_l(\mathbf{y})} \le \nu_l$ for all $l$-sparse vectors $\mathbf{y}$.

  • SHP basically says that restrictions of the Hessian to sparse supports are well-conditioned
  • For $g(\mathbf{y}) = \frac{1}{2} \|\mathbf{B}\mathbf{y} - \mathbf{z}\|_2^2$, as in CS, SHP implies the Restricted Isometry Property:
$$\frac{1 + \varepsilon_l}{1 - \varepsilon_l} \le \nu_l \;\Rightarrow\; \varepsilon_l \le \frac{\nu_l - 1}{\nu_l + 1}$$
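For the quadratic CS cost the Hessian is the constant matrix $\mathbf{B}^{\mathsf{T}}\mathbf{B}$, so $\nu_l$ reduces to a ratio of extreme eigenvalues over $l$-column submatrices and can be brute-forced on toy problems. A sketch (hypothetical helper; exponential in $q$, so only feasible for small instances):

```python
from itertools import combinations

import numpy as np

def shp_constant_ls(B, l):
    """Brute-force nu_l for g(y) = 0.5*||B y - z||^2, whose Hessian is B^T B:
    ratio of the extreme eigenvalues over all l-column principal submatrices."""
    q = B.shape[1]
    hi, lo = 0.0, np.inf
    for J in combinations(range(q), l):
        cols = list(J)
        eig = np.linalg.eigvalsh(B[:, cols].T @ B[:, cols])  # ascending order
        hi, lo = max(hi, eig[-1]), min(lo, eig[0])
    return hi / lo
```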

SLIDE 11

Example: ℓ2-regularized Logistic Regression

  • Logistic model: $z \mid \mathbf{b}; \mathbf{y} \sim \mathrm{Bernoulli}\left(\frac{1}{1 + e^{-\langle \mathbf{b}, \mathbf{y} \rangle}}\right)$
  • For iid observation pairs $\{(\mathbf{b}_j, z_j)\}_{j=1}^{o}$, write the logistic loss as
$$\mathcal{L}(\mathbf{y}) := \frac{1}{o} \sum_{j=1}^{o} \Big[ \log\big(1 + e^{\langle \mathbf{b}_j, \mathbf{y} \rangle}\big) - z_j \langle \mathbf{b}_j, \mathbf{y} \rangle \Big].$$
  • ℓ2-regularized logistic regression with sparsity constraint:
$$\arg\min_{\mathbf{y}} \; g(\mathbf{y}) = \mathcal{L}(\mathbf{y}) + \frac{\theta}{2} \|\mathbf{y}\|_2^2 \quad \text{subject to} \quad \|\mathbf{y}\|_0 \le t.$$
  • We can show $\nu_l \le 1 + \frac{\beta_l}{4\theta}$, where $\beta_l = \max_{\mathcal{J} : |\mathcal{J}| \le l} \mu_{\max}(\mathbf{B}_{\mathcal{J}})$.
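A minimal NumPy sketch of this regularized objective and its gradient, e.g., for a GraSP variant whose step 4 runs an inner solver on the merged support (argument names are illustrative; `Bmat` stacks the $\mathbf{b}_j^{\mathsf{T}}$ as rows):

```python
import numpy as np

def logistic_cost(y, Bmat, zvec, theta):
    """g(y) = L(y) + (theta/2)*||y||_2^2, returned with its gradient."""
    o = Bmat.shape[0]
    u = Bmat @ y                                   # <b_j, y> for all j
    loss = np.mean(np.logaddexp(0.0, u) - zvec * u) + 0.5 * theta * (y @ y)
    p = 1.0 / (1.0 + np.exp(-u))                   # Bernoulli means
    grad = Bmat.T @ (p - zvec) / o + theta * y
    return loss, grad
```

The $\theta > 0$ term makes the Hessian at least $\theta \mathbf{I}$, so $C_l$ stays bounded away from zero, consistent with the $\nu_l$ bound above.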

SLIDE 12

Main Result Revisited

Theorem: If $g$ satisfies SHP of order $4t$ with constant $\nu_{4t} < \sqrt{2}$ and $C_{4t}(\mathbf{y}) > \vartheta$, then the estimate obtained at the $j$-th iteration of GraSP obeys
$$\|\hat{\mathbf{y}}^{(j)} - \mathbf{y}^\star\|_2 \le \big(\nu_{4t}^2 - 1\big)^j \|\mathbf{y}^\star\|_2 + \frac{2\nu_{4t} + 2}{\vartheta \big(2 - \nu_{4t}^2\big)} \big\|\nabla g(\mathbf{y}^\star)|_{\mathcal{I}}\big\|_2,$$
where $\mathcal{I}$ contains the indices of the $3t$ largest coordinates of $\nabla g(\mathbf{y}^\star)$ in magnitude. This instantiates the generic bound of Slide 9 with $\lambda = \nu_{4t}^2 - 1$ and $D = \frac{2\nu_{4t} + 2}{\vartheta(2 - \nu_{4t}^2)}$; the condition $\nu_{4t} < \sqrt{2}$ makes $\lambda < 1$ and keeps the denominator positive.

SLIDE 13

Summary

  • Extend CS results to nonlinear models and different error measures
  • ℓ1-regularization may not yield sufficiently sparse solutions because of the type of cost functions introduced by nonlinearities in the model
  • GraSP algorithm
  • A greedy method that always yields a sparse solution
  • Accuracy is guaranteed for the class of functions that satisfy SHP
  • Linear rate of convergence up to the approximation error
  • Some interesting problems to study
  • Deterministic results, e.g., using an equivalent of incoherence
  • Relaxing SHP to an entirely local condition