SLIDE 1 Sparsity-inducing dual frames and applications to compressed sensing with frames
Shidong Li
San Francisco State University Jointly with Tiebin Mi, Remin University of China CIMPA 2013 New Trends in Applied Harmonic Analysis Sparse Representations, Compressed Sensing and Multifractal Analysis
August 5-16, 2013 Mar del Plata, ARGENTINA
SLIDE 2 Outline
1
The notion of sparsity-inducing dual frames
2
Sparse duals applied to compressed sensing with frames A performance analysis of the ℓ1-synthesis method Focus: The sparse-dual-based ℓ1-analysis approach
3
The “decoupling” effect of sparse-dual-based-analysis approach
SLIDE 3 Outline
1
The notion of sparsity-inducing dual frames
2
Sparse duals applied to compressed sensing with frames A performance analysis of the ℓ1-synthesis method Focus: The sparse-dual-based ℓ1-analysis approach
3
The “decoupling” effect of sparse-dual-based-analysis approach
SLIDE 4
The notion of sparsity-inducing dual frames Question: Suppose f has a sparse frame representation f = Dx, where x is sparse, is there a dual frame D such that D∗f = x ? Proposition/Definition (The existence of the sparse dual) Suppose D is a non-exact frame. Suppose f has a sparse representation with respect to D, namely, f = Dx. Then there exits a sparsity-inducing dual frame D of D such that DD∗ = I and D∗f = x.
SLIDE 5 All dual frame formula Theorem (The dual frame formula) [Li’95] For every given finite non-exact frame D, the set of all dual frames ˜ D of D is given by ˜ D = (DD∗)−1D + W
(1) where W can be an arbitrary matrix (of the right dimensions). Conclusion: sparsity-inducing dual frames always exist!
SLIDE 6
Basic Properties of Sparse Duals There are many choices of such sparse duals. Sparse duals are signal dependent, but there are sparse duals for classes signals. Sparse duals are stable! Theorem (Stability of D) Let D ∈ Rd×n. Let Tf ⊂ {1, 2, · · · , n} be the sparsity index set of f such that |Tf| = s and D∗f − (D∗f)Tf 1 ≤ ε. Suppose f − g2 ≤ δ, and let Tg be the sparsity index set for g with |Tg| = s. Then D∗g − (D∗g)Tg1 ≤ 2ε + √ n − s( √ B + s)δ. Here B is the upper frame bound of the sparse dual frame D. Motivation of the notion of sparse duals: in the study of compressed sensing when f has a coherent sparse frame representation.
SLIDE 7 Outline
1
The notion of sparsity-inducing dual frames
2
Sparse duals applied to compressed sensing with frames A performance analysis of the ℓ1-synthesis method Focus: The sparse-dual-based ℓ1-analysis approach
3
The “decoupling” effect of sparse-dual-based-analysis approach
SLIDE 8 Compressed sensing: when f has a sparse representation CS observation model: The linear measurement of f by A ∈ Rm×d: y = Af + z, given f = Dx. The task of such CS problems: The task is to recover f from underdetermined measurement y. Example: Pulse Radar, where f is sparse w.r.t a frame D Received and de-modulated signal, up to some discrete approximation, is given by y(t) =
K
xk · g(t − τk) · exp(jωkt). Here, the observation y is a frame expansion, and is naturally sparse w.r.t. the Gabor frame {g(t − τj) · exp(jωkt)}jk.
SLIDE 9 Two classes of signal recovery methods Major approach 1: The ℓ1-synthesis approach Let x be sparse. Determine x via ˜ x = argminx∈Rdx1 s.t. y − ADx2 ≤ ε, (2) Then find f via synthesis operation, i.e., ˜ f = D˜ x. Major approach 2: The ℓ1-analysis approach The analysis approach is to recover f directly: ˜ f = argmin
f∈Rn D∗f1
s.t. y − Af2 ≤ ε. (3) Here D∗ is the “analysis” operator.
SLIDE 10 Issues with the ℓ1-synthesis approach Largely for the lack of good understanding of its performance. Available analysis are (largely) based on the accurate recovery of the coeff. x by, e.g., the RIP condition ∀x ∈ {x : x0 ≤ s}, (1 − γs)x2
2 ≤ Ax2 2 ≤ (1 + γs)x2 2.
When D is coherent, AD hardly satisfies the RIP [Rauhut et al.,2008], ⇒ recovering x is not guaranteed. However, the synthesis approach often recovers f fine, even though coeff x may not be recovered exactly. Hence RIP-based analysis would not be able to explain it. Still have limitations in recovering signals that are NOT TOO sparse.
SLIDE 11 Example where ℓ1-synthesis fails D is a Gabor frame as seen in Pulse-Dopplar Radar applications.
−1 −0.5 0.5 1 1.5 2 2.5 3 −1 −0.5 0.5 1 1.5 2 2.5 3 Real coefficients Coefficients by L1−synthesis L1 synthesis
(a) ℓ1 synthesis
20 40 60 80 100 120 −3 −2 −1 1 2 3 4 Original signal L1 synthesis
(b) ℓ1 synthesis
We’ll come back to address the performance understanding issue + the recoverability issue.
SLIDE 12 Major approach 2: The ℓ1-analysis approach The ℓ1-analysis approach The analysis approach is to recover f directly: ˜ f = argmin
f∈Rn D∗f1
s.t. y − Af2 ≤ ε. (4) Here D is assumed a Parseval frame, and D∗ is the “analysis” operator. The idea is to focus on recovering f, and not to care for the exact coefficient x (since coefficients are not unique any way).
SLIDE 13 An inspiring error bound for ℓ1-analysis Theorem [Candès etc.’2011] Let f be the original signal, and f # be the solution to the ℓ1-analysis approach (4). Then f # − f2 ≤ C0 · ε + C1 · D∗f − (D∗f)s1 √s , (5) provided that A obeys a D-RIP condition with δ2s < 0.08. The D-RIP condition [Candès etc. 2011] The D-RIP constant δs of A is the smallest δs > 0 satisfying (1 − δs)f2
2 ≤ Af2 2 ≤ (1 + δs)f2 2
(6) for all f ∈ Σs, the union of all subspaces spanned by all subsets of s columns of D.
SLIDE 14
An unforgiven reality of the conventional ℓ1-analysis approach The error bound is heavily depend on the decay property of D∗f − (D∗f)s1. Recall f # − f2 ≤ C0 · ε + C1 · D∗f − (D∗f)s1 √s . (7) In particular, if D is a Parseval frame, = ⇒ D∗f often has the worst sparsity property! = ⇒ The choice of the “analysis” operator D∗ is problematic in general!
SLIDE 15 Example where ℓ1-analysis fails D is a Gabor frame as seen in radar applications, same example as seen in the ℓ1-synthesis.
−1 1 2 3 4 −1 −0.5 0.5 1 1.5 2 2.5 3 Real signal Signal reconstructed by L1−analysis L1 analysis
(c) ℓ1 analysis
20 40 60 80 100 120 −3 −2 −1 1 2 3 4 Original signal L1 analysis
(d) ℓ1 analysis
SLIDE 16 What are good choices of analysis operators? Exploring Optimality: Alternative dual-based analysis method Use non-canonical dual frame ˜ D as the analysis operator min
f
˜ D∗f1 s.t. Af − y2 ≤ ε (8) One good property: For any dual frame ˜ D, the error bound similar to the conventional ℓ1 analysis holds
SLIDE 17 Theorem [Liu, Li, Mi 2011] Let D be a frame of Rn with frame bounds 0 < A ≤ B < ∞. Let ˜ D be an alternative dual frame of D with frame bounds 0 < ˜ A ≤ ˜ B < ∞. Suppose
B 2 · δs+a + ρB ˜ B · δb < 1 − 2
B (9) holds for some positive integers a and b satisfying 0 < b − a ≤ 3a. Then the solution ¯ f to (8) satisfies ¯ f − f2 ≤ C0 · ε + C1 · ˜ D∗f − (˜ D∗f)s1 √s , (10) where ρ ≡ s/b.
SLIDE 18 An optimal-dual-based analysis operators? An optimal dual-based analysis model Consider a “blind” optimization min
D ˜ D∗=I, f∈Rn ˜
D∗f1 s.t. y − Af2 ≤ ε, (11) where the optimization is over both the class of all dual frames ˜ D of D and the feasible signal set f. This has the potential to give rise to better results, because the
Do is chosen to minimize the ˜ D∗f1 among all dual frames ˜ D. It is not hard to see that this optimal-dual-based analysis problem is equivalent to the synthesis procedure: min
x∈Rd x1 s.t. y − ADx2 ≤ ε,
(12) min
D ˜ D∗=I, f∈Rn ˜
D∗f1 s.t. y − Af2 ≤ ε, (13)
SLIDE 19 A deeper understanding of the synthesis approach Consequently, there is a better performance analysis of the synthesis procedure! Theorem [Li, Liu, Mi 2012] Let ˜ Do be the optimal dual of D in the sense of (13). Then under some D-RIP condition of A, the solution ¯ f to the ℓ1-synthesis approach satisfies ¯ f − f2 ≤ C0 · ε + C1 · ˜ D∗
D∗
√s . (14) This provides a deeper understanding of the synthesis method: fine approximations of f. But x is off by a large margin. Here the error bound does not depend on the accuracy of the recovered coefficients x, but the decaying property of |˜ D∗
The old RIP analysis results will NOT be able to explain these fine approximations, b/c RIP places the focus on the exact recovery of
SLIDE 20 The Pulse-Dopplar example: highly coherent
500 1000 1500 2000 2500 3000 3500 −1.5 −1 −0.5 0.5 1 (a) Coefficient Domain True coefficients Recovered by ℓ1-synthesis
(e) ¯
x−x2 x2
= 0.9039
20 40 60 80 100 120 −0.5 0.5 Signal Domain − Real Part of Signal True signal Recovered by ℓ1-analysis Recovered by ℓ1-synthesis 20 40 60 80 100 120 −0.5 0.5 (b) Signal Domain − Image Part of Signal
(f) ¯
f−f2 f2
= 0.0845
10 20 30 40 50 60 70 80 90 100 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 (c) ¯ D∗f ˜ D∗
(g) |˜ D∗
∗f| (blue)
SLIDE 21 Synthesis approach is still insufficient
−1 −0.5 0.5 1 1.5 2 2.5 3 −1 −0.5 0.5 1 1.5 2 2.5 3 Real coefficients Coefficients by L1−synthesis L1 synthesis
(h) ℓ1 synthesis
20 40 60 80 100 120 −3 −2 −1 1 2 3 4 Original signal L1 synthesis
(i) ℓ1 synthesis
Need better choice of the analysis operator: The sparsity-inducing dual frame is even better a choice!
SLIDE 22 The sparse-dual-based analysis approach The sparse-dual-based analysis approach The sparse-dual-based analysis method would be: ¯ f = argmin
f∈Rn D∗f0
s.t. y − Af2 ≤ ε. (15)
¯ f = argmin
f∈Rn D∗f1
s.t. y − Af2 ≤ ε. (16) Sparse dual is clearly an ultimate “analysis operator”.
SLIDE 23
The error bound still holds Theorem: Error bound for the sparse-dual-based ℓ1-analysis Let D be a sparse dual frame of D. Then following signal recovery error bound estimate also holds, under some D-RIP conditions of A. ¯ f − f2 ≤ C0 · ε + C1 · D∗f − (D∗f)s1 √s .
SLIDE 24 How is it done: an alternating iterative algorithm The alternating iteration approach
1
Step 0. k = 0: Initial estimate of f0 by ℓ1-synthesis.
2
Step 1. For k = k + 1, using a (fast) thresholding algorithm to find an approximation of the corresponding sparsest coefficient xk, xk = argminxx0 s.t. Dx = fk Then compute ∆xfk = xk − D∗(DD∗)−1fk.
3
Step 2. Using the thresholding algorithm to approximate fk+1 = argminf D∗(DD∗)−1f + ∆xfk 0 s.t. Af − y2 ≤ ε.
4
Repeat step 2, until fk+1 − fk2 is sufficiently small.
SLIDE 25 Performance comparison: Scatter plot
−1 1 2 3 4 −1 −0.5 0.5 1 1.5 2 2.5 3 Real signal Signal reconstructed by L1−analysis L1 analysis
(j) ℓ1-analysis
−1 −0.5 0.5 1 1.5 2 2.5 3 −1 −0.5 0.5 1 1.5 2 2.5 3 Real coefficients Coefficients by L1−synthesis L1 synthesis
(k) ℓ1-synthesis
−1 −0.5 0.5 1 1.5 2 2.5 3 −1 −0.5 0.5 1 1.5 2 2.5 3 Real coefficients Coefficients by sparse dual analysis Sparse dual analysis
(l) Sparse dual analysis
SLIDE 26 Performance comparison: Function plot
20 40 60 80 100 120 −3 −2 −1 1 2 3 4 Original signal L1 analysis
(m) ℓ1-analysis
20 40 60 80 100 120 −3 −2 −1 1 2 3 4 Original signal L1 synthesis
(n) ℓ1-synthesis
20 40 60 80 100 120 −3 −2 −1 1 2 3 4 Original signal Sparse dual analysis
(o) Sparse dual analysis
SLIDE 27
Sparse Dual in understanding TV regularization Sparse-dual-based ℓ1-analysis approach can also be used to provide an analysis for simple (finite difference) TV regularization. Because a finite difference operator, as an analysis operator, is a Sparsity-inducing Dual Frame of a frame of some piecewise constant nature.
SLIDE 28 Outline
1
The notion of sparsity-inducing dual frames
2
Sparse duals applied to compressed sensing with frames A performance analysis of the ℓ1-synthesis method Focus: The sparse-dual-based ℓ1-analysis approach
3
The “decoupling” effect of sparse-dual-based-analysis approach
SLIDE 29 The “decoupling” effect Suppose f = Dc where c is sparse, and D is a sparse dual. Then for some W c = D∗f = D∗(DD∗)−1f + Pker(D)Wf ≡ x + ∆x, (17) where x ≡ D∗(DD∗)−1f, and ∆x = c − x is the “tail”. Suppose ∆x ∈ ker(D) is fixed. The sparse-dual-based ℓ1-analysis formulation may now become min
f
D∗(DD∗)−1f + ∆x1
Af = b. (18) (18) is equivalent to min
x
x + ∆x1 subj. to ADx = b & x = D∗(DD∗)−1f. (19) Let v = x + ∆x, then ADv = AD(x + ∆x) = ADx + 0 = b. or ADv = b.
SLIDE 30 The “decoupling” effect (cont’d) Meantime, the constraint x = D∗(DD∗)−1f may be written as , with the canonical expression f = Dx,
Applying
- I − D∗(DD∗)−1D
- to v = x + ∆x,
- I − D∗(DD∗)−1D
- v =
- I − D∗(DD∗)−1D
- (x + ∆x) = ∆x.
Consequently, the sparse-dual-based analysis problem has an equivalent “synthesis” form min
v
v1
˜ Av = ˜ b. where ˜ A ≡
I − D∗(DD∗)−1D
b ≡ b ∆x
A opens up some new view about the problem! For instance, the Null Space Property of ˜ A is way different from that of AD !
SLIDE 31
The “decoupling” effect (cont’d) In fact, ker(˜ A) = (kerD)⊥ ∩ ker(AD). If v ∈ (kerD)⊥, then Dv = 0. Hence ker(˜ A) = {v | Dv = 0, Dv ∈ ker(A)} = D∗(DD∗)−1 (kerA) . Somehow, ker(˜ A) decouples ker(A) from ker(AD)!
SLIDE 32 The “decoupling” effect (cont’d) Sparse-dual-added ℓ1 procedure is capable to recover signals having much larger number of non-zero coefficients!
0.95 1 1.05 1.1 1.15 1.2 1.25 1.3 1.35 1.4 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 s/m Frequency of exact recovery CVX NST
There are still many more unknowns in this direction... Examining these unknowns are part of the the on-going joint work with Jameson Cahill and Pete Casazza.
SLIDE 33
Thank You! Thank the Organizers!