Greedy Sparsity-Constrained Optimization

Sohail Bahmani, with Petros Boufounos and Bhiksha Raj

45th Asilomar Conference, Nov. 2011
Outline

- Background
- Compressed Sensing
- Problem Formulation
- Generalizing Compressed Sensing
- Example
- Prior Work
- GraSP Algorithm
- Main Result
- Required Conditions
- Example: ℓ2-regularized Logistic Regression
Compressed Sensing (1)

- Applications: biomedical imaging, image denoising, image segmentation, filter design, system identification, etc.

Linear inverse problem:

- Sparse signal $\mathbf{x}^* \in \mathbb{R}^n$
- Measurement matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$
- Measurements $\mathbf{z} = \mathbf{A}\mathbf{x}^* + \mathbf{e}$, with noise $\mathbf{e} \in \mathbb{R}^m$
- Given $\mathbf{z}$ and $\mathbf{A}$ with $m \ll n$, estimate $\mathbf{x}^*$
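A concrete instance of this measurement model, as a minimal Python sketch; the dimensions, the Gaussian measurement ensemble, and the noise level are illustrative choices, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 1000, 100, 10      # ambient dimension, measurements (m << n), sparsity

# s-sparse signal x* with a random support and Gaussian nonzero entries
x_star = np.zeros(n)
support = rng.choice(n, size=s, replace=False)
x_star[support] = rng.standard_normal(s)

A = rng.standard_normal((m, n)) / np.sqrt(m)   # measurement matrix A
e = 0.01 * rng.standard_normal(m)              # noise e
z = A @ x_star + e                             # measurements z = A x* + e
```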
Compressed Sensing (2)

- ℓ0-minimization (L0): $\arg\min_{\mathbf{x}} \|\mathbf{x}\|_0$ subject to $\|\mathbf{A}\mathbf{x} - \mathbf{z}\|_2 \le \epsilon$
- ℓ1-minimization (L1): $\arg\min_{\mathbf{x}} \|\mathbf{x}\|_1$ subject to $\|\mathbf{A}\mathbf{x} - \mathbf{z}\|_2 \le \epsilon$
- ℓ0-constrained LS (C0): $\arg\min_{\mathbf{x}} \|\mathbf{A}\mathbf{x} - \mathbf{z}\|_2^2$ subject to $\|\mathbf{x}\|_0 \le s$
- ℓ1-constrained LS (C1): $\arg\min_{\mathbf{x}} \|\mathbf{A}\mathbf{x} - \mathbf{z}\|_2^2$ subject to $\|\mathbf{x}\|_1 \le R$

Here $\|\mathbf{x}\|_1 = \sum_{i=1}^{n} |x_i|$ and $\|\mathbf{x}\|_0 = |\operatorname{supp}(\mathbf{x})| = |\{i : x_i \neq 0\}|$.

- (1) Convexify: use the ℓ1-norm as a proxy for the ℓ0-pseudonorm, turning (L0) into (L1) and (C0) into (C1)
- (2) Attack (L0)/(C0) directly with (greedy) approximate solvers
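One reason the greedy route is attractive: projecting onto the ℓ0-constraint set of (C0) is simple hard thresholding, unlike projection onto the ℓ1-ball of (C1). A minimal sketch of this best $s$-term approximation (the function name is ours; the pruning step of GraSP below performs exactly this operation):

```python
import numpy as np

def hard_threshold(x, s):
    """Best s-term approximation x_s: keep the s largest-magnitude entries
    of x and set the rest to zero."""
    out = np.zeros_like(x)
    if s > 0:
        keep = np.argsort(np.abs(x))[-s:]   # indices of the s largest |x_i|
        out[keep] = x[keep]
    return out
```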
Generalizing Compressed Sensing

- Common assumptions in CS:
  - The relation between the input and the response has a linear form: $\mathbf{z} = \mathbf{A}\mathbf{x} + \mathbf{e}$
  - The error is measured by the squared error $f(\mathbf{x}) = \|\mathbf{A}\mathbf{x} - \mathbf{z}\|_2^2$
- Generalization: consider nonlinear relations and other measures of fidelity

General formulation: let $f : \mathbb{R}^n \to \mathbb{R}$ be a cost function. Approximate the solution to
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} f(\mathbf{x}) \quad \text{subject to} \quad \|\mathbf{x}\|_0 \le s.$$

- For $f(\mathbf{x}) = \|\mathbf{A}\mathbf{x} - \mathbf{z}\|_2^2$ we recover the ℓ0-constrained least-squares formulation (C0) of CS
- We will see the ℓ2-regularized logistic loss as another example of $f(\mathbf{x})$
- More generally, $f(\mathbf{x})$ can be the empirical loss associated with the observations in a statistical estimation problem
Example

- Gene selection problem
  - Data points $\mathbf{a} \in \mathbb{R}^n$: gene expression coefficients obtained from tissue samples
  - Labels $z \in \{0, 1\}$: healthy ($z = 0$) vs. cancer ($z = 1$) samples
  - Observations: $m$ iid instances $\{(\mathbf{a}_i, z_i)\}_{i=1}^{m}$
  - Restriction: fewer samples than dimensions, i.e., $m < n$
  - Goal: find $s \ll n$ entries (i.e., variables) of the data points using which the label $z$ can be predicted with the least "error"
- MLE
  - $z \mid \mathbf{a}$ has a likelihood function that depends on an $s$-sparse parameter vector $\mathbf{x}$
  - Minimize the loss (equivalent to maximizing the joint likelihood) to estimate the true parameter $\mathbf{x}^*$
- The nonlinearity of the model enters through the empirical loss
$$f(\mathbf{x}) = \frac{1}{m} \sum_{i=1}^{m} -\log p(\mathbf{x};\, \mathbf{a}_i, z_i)$$
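As a concrete instance of such an empirical loss, here is a sketch of the average negative log-likelihood under the logistic model that appears later in the talk; the function name and signature are our choices:

```python
import numpy as np

def logistic_loss(x, A, z):
    """Empirical loss f(x) = (1/m) * sum_i -log p(x; a_i, z_i) under the
    logistic model p(z_i = 1 | a_i) = 1 / (1 + exp(-<a_i, x>)).
    Rows of A are the data points a_i; z holds the 0/1 labels."""
    t = A @ x                                 # inner products <a_i, x>
    # -log p = log(1 + exp(t)) - z*t, computed stably with logaddexp
    return np.mean(np.logaddexp(0.0, t) - z * t)
```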
Prior Work

- In the statistical estimation framework: convex $f$ + ℓ1-regularization
  - Kakade et al. [AISTATS'09]: loss functions from the exponential family
  - Negahban et al. [NIPS'09]: M-estimators and "decomposable" norms
  - Agarwal et al. [NIPS'10]: projected gradient descent with an ℓ1-constraint
- Issue: sparsity cannot be guaranteed to be optimal, because
  - nonlinearity causes solution-dependent error bounds that can become very large
  - ℓ1-regularization is merely a proxy to induce sparsity
- We consider a greedy algorithm for the problem
  - The algorithm enforces sparsity directly
  - It generally has lower computational complexity
GraSP Algorithm: Gradient Support Pursuit

Input: $f$ and $s$. Output: $\hat{\mathbf{x}}$.

0. Initialize $\hat{\mathbf{x}} = \mathbf{0}$
Repeat:
  1. Compute gradient: $\mathbf{g} = \nabla f(\hat{\mathbf{x}})$
  2. Identify coordinates: $\Omega = \operatorname{supp}(\mathbf{g}_{2s})$, the support of the $2s$ largest-magnitude entries of the gradient
  3. Merge supports: $\mathcal{T} = \operatorname{supp}(\hat{\mathbf{x}}) \cup \Omega$
  4. Find crude estimate: $\mathbf{b} = \arg\min_{\mathbf{x}} f(\mathbf{x})$ s.t. $\mathbf{x}|_{\mathcal{T}^c} = \mathbf{0}$
  5. Prune: $\hat{\mathbf{x}} = \mathbf{b}_s$
Until halting condition holds

- Inspired by the CoSaMP algorithm [Needell & Tropp '09]
- Step 4 is tractable because $f$ obeys certain conditions (made precise under Required Conditions below)
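A minimal Python sketch of these five steps. The inner solver for step 4, the fixed iteration count standing in for the halting condition, and the signature are all illustrative choices, not part of the talk:

```python
import numpy as np
from scipy.optimize import minimize

def grasp(f, grad, n, s, iters=20):
    """Sketch of Gradient Support Pursuit for min f(x) s.t. ||x||_0 <= s."""
    x = np.zeros(n)                                  # 0. initialize x = 0
    for _ in range(iters):                           # "repeat"
        g = grad(x)                                  # 1. compute gradient
        omega = np.argsort(np.abs(g))[-2 * s:]       # 2. 2s largest |g_j|
        T = np.union1d(omega, np.flatnonzero(x))     # 3. merge supports
        def restricted(u):                           # 4. crude estimate:
            v = np.zeros(n)                          #    minimize f over T,
            v[T] = u                                 #    zero off T
            return f(v)
        b = np.zeros(n)
        b[T] = minimize(restricted, x[T], method="L-BFGS-B").x
        keep = np.argsort(np.abs(b))[-s:]            # 5. prune to s largest
        x = np.zeros(n)
        x[keep] = b[keep]
    return x
```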
Main Result

Theorem. If $f$ satisfies certain properties, then the estimate obtained at the $i$-th iteration of GraSP obeys
$$\|\hat{\mathbf{x}}^{(i)} - \mathbf{x}^*\|_2 \le c^i \|\mathbf{x}^*\|_2 + C\, \big\|\nabla f(\mathbf{x}^*)|_{\mathcal{I}}\big\|_2,$$
where $\mathcal{I}$ contains the indices of the $3s$ largest coordinates of $\nabla f(\mathbf{x}^*)$ in magnitude.

- For $c < 1$ (i.e., a contraction factor) we get a linear rate of convergence up to an approximation error
- In statistical estimation problems, $\|\nabla f(\mathbf{x}^*)|_{\mathcal{I}}\|_2$ can be related to the statistical precision of the estimator
Required Conditions

Definition (Stable Hessian Property). For $f : \mathbb{R}^n \to \mathbb{R}$ with Hessian $\nabla^2 f$, let
$$B_k(\mathbf{x}) \triangleq \sup_{\substack{|\operatorname{supp}(\mathbf{x}) \cup \operatorname{supp}(\mathbf{u})| \le k \\ \|\mathbf{u}\|_2 = 1}} \mathbf{u}^{\mathrm{T}} \nabla^2 f(\mathbf{x})\, \mathbf{u}, \qquad C_k(\mathbf{x}) \triangleq \inf_{\substack{|\operatorname{supp}(\mathbf{x}) \cup \operatorname{supp}(\mathbf{u})| \le k \\ \|\mathbf{u}\|_2 = 1}} \mathbf{u}^{\mathrm{T}} \nabla^2 f(\mathbf{x})\, \mathbf{u}.$$
Then we say $f$ satisfies the SHP of order $k$ with constant $\mu_k$ if $B_k(\mathbf{x}) / C_k(\mathbf{x}) \le \mu_k$ for all $k$-sparse vectors $\mathbf{x}$.

- SHP basically says that symmetric restrictions of the Hessian are well-conditioned
- For $f(\mathbf{x}) = \frac{1}{2}\|\mathbf{A}\mathbf{x} - \mathbf{z}\|_2^2$, as in CS, SHP implies the Restricted Isometry Property:
$$\frac{1 + \delta_k}{1 - \delta_k} \le \mu_k \;\Leftrightarrow\; \delta_k \le \frac{\mu_k - 1}{\mu_k + 1}$$
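For the quadratic CS loss the Hessian is the constant matrix $\mathbf{A}^{\mathrm{T}}\mathbf{A}$, so $B_k$ and $C_k$ reduce to extremal eigenvalues of $k \times k$ principal submatrices of the Gram matrix. A brute-force sketch for tiny problems (our illustration; the enumeration is exponential and only meant to make the definition concrete):

```python
import numpy as np
from itertools import combinations

def shp_constants_quadratic(A, k):
    """B_k and C_k for f(x) = 0.5 * ||A x - z||_2^2, whose Hessian A^T A is
    constant in x: extremal eigenvalues over all k x k principal submatrices."""
    H = A.T @ A
    n = H.shape[0]
    B, C = -np.inf, np.inf
    for S in combinations(range(n), k):
        eigs = np.linalg.eigvalsh(H[np.ix_(S, S)])   # ascending eigenvalues
        B = max(B, eigs[-1])
        C = min(C, eigs[0])
    return B, C   # SHP of order k holds with mu_k = B / C (when C > 0)
```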
Example: ℓ2-regularized Logistic Regression

- Logistic model: $z \mid \mathbf{a};\, \mathbf{x} \sim \mathrm{Bernoulli}\!\left(\frac{1}{1 + e^{-\langle \mathbf{a}, \mathbf{x}\rangle}}\right)$
- For iid observation pairs $\{(\mathbf{a}_i, z_i)\}_{i=1}^{m}$, write the logistic loss as
$$\mathcal{L}(\mathbf{x}) = \frac{1}{m} \sum_{i=1}^{m} \left[\log\left(1 + e^{\langle \mathbf{a}_i, \mathbf{x}\rangle}\right) - z_i \langle \mathbf{a}_i, \mathbf{x}\rangle\right].$$
- ℓ2-regularized logistic regression with the sparsity constraint:
$$\arg\min_{\mathbf{x}}\; f(\mathbf{x}) = \mathcal{L}(\mathbf{x}) + \frac{\eta}{2}\|\mathbf{x}\|_2^2 \quad \text{subject to} \quad \|\mathbf{x}\|_0 \le s.$$
- We can show $\mu_{4s} \le 1 + \frac{\rho_{4s}}{4\eta}$, where $\rho_k = \max_{\mathcal{J}} \lambda_{\max}\!\left(\mathbf{A}_{\mathcal{J}}\right)$ subject to $|\mathcal{J}| \le k$, with $\mathbf{A}_{\mathcal{J}}$ the restriction of the data matrix to the coordinates in $\mathcal{J}$
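Putting the pieces together, a sketch of this regularized cost and its gradient in the form the `grasp` sketch above expects; the names and the commented call are our illustrative choices:

```python
import numpy as np

def l2_logistic_loss_grad(x, A, z, eta):
    """f(x) = L(x) + (eta/2) * ||x||_2^2 for the logistic loss L above,
    together with its gradient. Rows of A are the data points a_i."""
    m = A.shape[0]
    t = A @ x
    loss = np.mean(np.logaddexp(0.0, t) - z * t) + 0.5 * eta * (x @ x)
    p = 1.0 / (1.0 + np.exp(-t))          # Bernoulli means 1/(1 + e^{-<a_i,x>})
    grad = A.T @ (p - z) / m + eta * x
    return loss, grad

# Wiring it into the GraSP sketch above (A, z, eta, s as defined earlier):
# x_hat = grasp(lambda v: l2_logistic_loss_grad(v, A, z, eta)[0],
#               lambda v: l2_logistic_loss_grad(v, A, z, eta)[1],
#               n=A.shape[1], s=s)
```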
Main Result Revisited

Theorem. If $f$ satisfies the SHP of order $4s$ with constant $\mu_{4s}^2 < 2$ and $C_{4s}(\mathbf{x}) > \epsilon$, then the estimate obtained at the $i$-th iteration of GraSP obeys
$$\|\hat{\mathbf{x}}^{(i)} - \mathbf{x}^*\|_2 \le \left(\mu_{4s}^2 - 1\right)^i \|\mathbf{x}^*\|_2 + \frac{2\mu_{4s} + 2}{\epsilon\left(2 - \mu_{4s}^2\right)} \big\|\nabla f(\mathbf{x}^*)|_{\mathcal{I}}\big\|_2,$$
where $\mathcal{I}$ contains the indices of the $3s$ largest coordinates of $\nabla f(\mathbf{x}^*)$ in magnitude.

This instantiates the earlier generic bound $\|\hat{\mathbf{x}}^{(i)} - \mathbf{x}^*\|_2 \le c^i \|\mathbf{x}^*\|_2 + C\,\|\nabla f(\mathbf{x}^*)|_{\mathcal{I}}\|_2$ with $c = \mu_{4s}^2 - 1$ and $C = \frac{2\mu_{4s} + 2}{\epsilon(2 - \mu_{4s}^2)}$.
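For a feel of the constants, an illustrative evaluation of the bound as stated above (the values of $\mu_{4s}$ and $\epsilon$ are arbitrary choices of ours):

```python
# Illustrative evaluation of the constants in the bound (values are our choice):
mu, eps = 1.2, 0.5                       # mu_4s and a lower bound eps on C_4s(x)
c = mu**2 - 1                            # contraction factor; needs mu^2 < 2
C = (2 * mu + 2) / (eps * (2 - mu**2))   # coefficient of the gradient term
print(c, C)                              # -> approximately 0.44 and 15.71
```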
Summary

- Extend CS results to nonlinear models and different error measures
  - ℓ1-regularization may not yield sufficiently sparse solutions because of the type of cost functions introduced by nonlinearities in the model
- GraSP algorithm
  - Greedy method that always yields a sparse solution
  - Accuracy is guaranteed for the class of functions that satisfy SHP
  - Linear rate of convergence up to the approximation error
- Some interesting problems to study
  - Deterministic results, e.g., using an equivalent of incoherence
  - Relaxing SHP to an entirely local condition