Joint Blind Deconvolution and Blind Demixing via Nonconvex Optimization
Shuyang Ling
Department of Mathematics, UC Davis
July 19, 2017
Shuyang Ling (UC Davis) FOCM, Barcelona, 2017 July 19, 2017 1 / 28
Research in collaboration with: Prof. Xiaodong Li (UC Davis), Prof. Thomas Strohmer (UC Davis), Dr. Ke Wei (UC Davis). This work is sponsored by NSF-DMS and DARPA.
(a) Blind deconvolution meets blind demixing: applications in image processing and wireless communication
(b) Mathematical models and convex approach
(c) A nonconvex optimization approach towards joint blind deconvolution and blind demixing
Suppose we observe a function y which consists of the convolution of two unknown functions, the blurring function f and the signal of interest g, plus noise w:

y = f ∗ g + w.

How can we reconstruct f and g from y? This is a highly ill-posed bilinear inverse problem, much more difficult than ordinary deconvolution, yet it has important applications in various fields.

Solvability: What conditions on f and g make this problem solvable?
How: What algorithms shall we use to recover f and g?
Let f be the blurring kernel and g be the original image; then y = f ∗ g is the blurred image. Question: how do we reconstruct f and g from y?

[Figure: blurred image = blurring kernel ∗ image + noise]
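The bilinear structure and its built-in scaling ambiguity can be seen in a few lines of NumPy. This 1-D toy sketch (my own illustration, not from the talk; the image example above is 2-D) uses circular convolution via the FFT:

```python
import numpy as np

rng = np.random.default_rng(3)
L = 16
f = rng.standard_normal(L)  # blurring kernel (1-D toy)
g = rng.standard_normal(L)  # signal of interest

# Circular convolution via the convolution theorem: FFT(f * g) = FFT(f) . FFT(g)
y = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

# Ill-posedness: infinitely many pairs (alpha*f, g/alpha) produce the same y
alpha = 2.7
y_scaled = np.real(np.fft.ifft(np.fft.fft(alpha * f) * np.fft.fft(g / alpha)))
assert np.allclose(y, y_scaled)
```

Since every rescaled pair (αf, g/α) yields the same observation, recovery up to a scalar is the best one can hope for.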
Suppose there are s users, and each of them sends a message x_i, encoded by a matrix C_i, to a common receiver. Each encoded message g_i = C_i x_i is convolved with an unknown impulse response function f_i.
[Figure: s users each transmit an encoded signal through an unknown channel; the receiver observes the sum of the channel-convolved signals plus noise, and the decoder jointly estimates the (channel, signal) pair for every user.]
Consider the model:

y = ∑_{i=1}^s f_i ∗ g_i + w.

This is even more difficult than blind deconvolution (s = 1), since it is a "mixture" of blind deconvolution problems. It also includes phase retrieval as a special case if s = 1 and ḡ_i = f_i.

Each impulse response f_i has maximum delay spread K (compact support): f_i(n) = 0 for n > K, i.e., f_i = [h_i; 0] with h_i ∈ C^K.

Let g_i := C_i x_i be the signal x_i ∈ C^N encoded by C_i ∈ C^{L×N} with L > N. We also require the C_i to be mutually incoherent by imposing randomness.
Denote by F the L × L DFT matrix. Let h_i ∈ C^K hold the first K nonzero entries of f_i and let B be a low-frequency partial DFT matrix, so that f̂_i = F f_i = B h_i. Let ĝ_i := A_i x_i where A_i := F C_i and x_i ∈ C^N. In the frequency domain the model becomes

y = ∑_{i=1}^s diag(B h_i) A_i x_i + w.

Goal: recover (h_i, x_i)_{i=1}^s from (y, B, {A_i}_{i=1}^s).

Remark: degrees of freedom of the unknowns: s(K + N); number of constraints: L.
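The frequency-domain model is easy to simulate. The following sketch (the dimensions and the Gaussian stand-ins for A_i are illustrative choices, not the talk's exact setup) generates noiseless measurements y = ∑ diag(B h_i) A_i x_i:

```python
import numpy as np

L, K, N, s = 64, 8, 8, 3
rng = np.random.default_rng(0)

# Unitary L x L DFT matrix; B keeps its first K columns (low frequencies)
F = np.fft.fft(np.eye(L)) / np.sqrt(L)
B = F[:, :K]

# Unknowns: channels h_i in C^K, signals x_i in C^N
h = [rng.standard_normal(K) + 1j * rng.standard_normal(K) for _ in range(s)]
x = [rng.standard_normal(N) + 1j * rng.standard_normal(N) for _ in range(s)]
# Gaussian stand-ins for the encoding matrices A_i = F C_i
A = [(rng.standard_normal((L, N)) + 1j * rng.standard_normal((L, N))) / np.sqrt(2)
     for _ in range(s)]

# Noiseless measurements: y = sum_i diag(B h_i) A_i x_i
y = sum((B @ h[i]) * (A[i] @ x[i]) for i in range(s))
```

Note that here L = 64 measurements face s(K + N) = 48 real degrees of freedom per the remark above, so the counting at least permits recovery.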
We may want to try a nonlinear least squares approach:

min_{(h_i, x_i)} ‖ ∑_{i=1}^s diag(B h_i) A_i x_i − y ‖².

The objective function is highly nonconvex and more complicated than blind deconvolution (s = 1). Gradient descent might get stuck at local minima. No guarantees for recoverability.
Let a_{i,l} be the l-th column of A_i^* and b_l the l-th column of B^*. Then

y_l = ∑_{i=1}^s (B h_i)_l (A_i x_i)_l = ∑_{i=1}^s b_l^* h_i x_i^* a_{i,l}.

Let X_i := h_i x_i^* and define the linear operator A_i : C^{K×N} → C^L as

A_i(Z) := { b_l^* Z a_{i,l} }_{l=1}^L.

Then there holds y = ∑_{i=1}^s A_i(X_i) + w.

See [Candès-Strohmer-Voroninski 13], [Ahmed-Recht-Romberg 14].
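The lifted formulation can be sanity-checked numerically. This sketch uses real-valued toy data to sidestep conjugation conventions (the talk works over C), with hypothetical dimensions:

```python
import numpy as np

L, K, N, s = 32, 4, 5, 2
rng = np.random.default_rng(1)

B = rng.standard_normal((L, K))
A = [rng.standard_normal((L, N)) for _ in range(s)]
h = [rng.standard_normal(K) for _ in range(s)]
x = [rng.standard_normal(N) for _ in range(s)]

def calA(i, Z):
    # Lifted linear operator A_i(Z) = { b_l^* Z a_{i,l} }_{l=1}^L;
    # in the real case this is row-by-row B[l] @ Z @ A_i[l].
    return np.einsum('lk,kn,ln->l', B, Z, A[i])

# Bilinear measurements ...
y = sum((B @ h[i]) * (A[i] @ x[i]) for i in range(s))
# ... equal the linear operator applied to the rank-one matrices X_i = h_i x_i^*
y_lifted = sum(calA(i, np.outer(h[i], x[i])) for i in range(s))
assert np.allclose(y, y_lifted)
```

The point of the lifting: y is nonlinear in (h_i, x_i) but linear in the matrices X_i, at the price of a rank constraint.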
We rewrite y = ∑_{i=1}^s diag(B h_i) A_i x_i as

y_l = ⟨ blkdiag(h_1 x_1^*, …, h_s x_s^*), blkdiag(b_l a_{1,l}^*, …, b_l a_{s,l}^*) ⟩,

i.e., each measurement is a linear function of the rank-s block-diagonal matrix blkdiag(h_1 x_1^*, …, h_s x_s^*). Finding such a rank-s matrix is generally an NP-hard problem.
The ground truth is a rank-s block-diagonal matrix. It is natural to recover the solution by solving

min ∑_{i=1}^s ‖Z_i‖_*  subject to  ∑_{i=1}^s A_i(Z_i) = y,

where ∑_{i=1}^s ‖Z_i‖_* is the nuclear norm of blkdiag(Z_1, …, Z_s).

Question: Can we recover {h_{i0} x_{i0}^*}_{i=1}^s exactly?
Assume that B ∈ C^{L×K} is a partial DFT matrix with B^*B = I_K and that each A_i is a Gaussian random matrix. The SDP relaxation recovers {(h_{i0}, x_{i0})}_{i=1}^s exactly with probability at least 1 − O(L^{−γ}), provided the number of measurements L satisfies

[Ling-Strohmer 15]  L ≥ C_γ s² (K + μ_h² N) log³ L;
[Jung-Krahmer-Stöger 17]  L ≥ C_γ s (K + μ_h² N) log³ L,

where μ_h² = L max_{1≤i≤s} ‖B h_{i0}‖_∞² / ‖h_{i0}‖².

We can jointly estimate the channels and signals for s users with one simple convex program. The SDP recovers {(h_i, x_i)}_{i=1}^s but is computationally expensive. Can we solve this problem simply with gradient descent, which also comes with theoretical guarantees?
An increasing list of nonconvex approaches to various problems in machine learning and signal processing: phase retrieval (Candès, Li, Soltanolkotabi, Chen, Wright, Sun, etc.), matrix completion (Sun, Luo, Montanari, etc.), various other problems (Recht, Wainwright, Constantine, etc.).

The common two-step recipe:
(a) Use a spectral method to construct a starting point inside "the basin of attraction";
(b) Run a gradient descent method.
The key is to build up "the basin of attraction".
The basin of attraction relies on the following three observations.

If the pair (h_{i0}, x_{i0}) is a solution to y = ∑_{i=1}^s diag(B h_{i0}) A_i x_{i0}, then so is the pair (α_i h_{i0}, α_i^{−1} x_{i0}) for any α_i ≠ 0. Thus the blind deconvolution problem always has infinitely many solutions of this type; we can recover (h_{i0}, x_{i0}) only up to a scalar. It is possible that ‖h_i‖ ≫ ‖x_i‖ (or vice versa) while ‖h_i‖ · ‖x_i‖ is fixed. Hence we define N_{d0} to balance ‖h_i‖ and ‖x_i‖:

N_{d0} := { {(h_i, x_i)}_{i=1}^s : ‖h_i‖ ≤ 2√(d_{i0}), ‖x_i‖ ≤ 2√(d_{i0}) },

where d_{i0} = ‖h_{i0}‖ ‖x_{i0}‖.
Our numerical experiments have shown that the algorithm's performance depends on how much b_l (the rows of B) and h_{i0} are correlated. Define

μ_h² := max_{1≤i≤s} L ‖B h_{i0}‖_∞² / ‖h_{i0}‖²;

the smaller μ_h, the better. Therefore, we introduce N_μ to control the incoherence:

N_μ := { {h_i}_{i=1}^s : √L ‖B h_i‖_∞ ≤ 4μ √(d_{i0}) }.

"Incoherence" is not a new idea. In matrix completion, we likewise require that the left and right singular vectors of the ground truth not be too "aligned" with those of the measurement matrices {b_l a_{i,l}^*}_{1≤l≤L}.
We define N_ε to quantify the closeness of {(h_i, x_i)}_{i=1}^s to the true solution:

N_ε := { {(h_i, x_i)}_{i=1}^s : ‖h_i x_i^* − h_{i0} x_{i0}^*‖_F ≤ ε d_{i0} }.

We want to find an initial guess close to {(h_{i0}, x_{i0})}_{i=1}^s.
Based on the three observations above, the basin of attraction is the intersection N_{d0} ∩ N_μ ∩ N_ε of the three neighborhoods:

N_{d0} := { {(h_i, x_i)}_{i=1}^s : ‖h_i‖ ≤ 2√(d_{i0}), ‖x_i‖ ≤ 2√(d_{i0}) },
N_μ := { {h_i}_{i=1}^s : √L ‖B h_i‖_∞ ≤ 4μ √(d_{i0}) },
N_ε := { {(h_i, x_i)}_{i=1}^s : ‖h_i x_i^* − h_{i0} x_{i0}^*‖_F / d_{i0} ≤ ε, 1 ≤ i ≤ s },

where ε is a predetermined parameter in (0, 1/15].
The objective function consists of two parts, a least squares term F and a regularization term G:

min_{(h,x)} F(h, x) + G(h, x),

where F(h, x) := ‖ ∑_{i=1}^s A_i(h_i x_i^*) − y ‖² and

G(h, x) := ρ ∑_{i=1}^s [ G_0( ‖h_i‖² / (2d_i) ) + G_0( ‖x_i‖² / (2d_i) ) + ∑_{l=1}^L G_0( L |b_l^* h_i|² / (8 d_i μ²) ) ].

Here G_0(z) = max{z − 1, 0}², ρ ≈ d², d ≈ d_0, d_i ≈ d_{i0} and μ ≥ μ_h.
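The regularizer is straightforward to code. A real-valued sketch (a single common d is used for simplicity where the slide has per-user d_i; d, μ, ρ are the tuning parameters described above):

```python
import numpy as np

def G0(z):
    """G0(z) = max(z - 1, 0)^2: zero inside the constraint, quadratic outside."""
    return np.maximum(z - 1.0, 0.0) ** 2

def G(h_list, x_list, B, d, mu, rho):
    """Regularization term G(h, x) = rho * sum_i [ G0(||h_i||^2 / 2d)
    + G0(||x_i||^2 / 2d) + sum_l G0(L |b_l^* h_i|^2 / (8 d mu^2)) ]."""
    L = B.shape[0]
    total = 0.0
    for h, x in zip(h_list, x_list):
        total += G0(np.linalg.norm(h) ** 2 / (2 * d))
        total += G0(np.linalg.norm(x) ** 2 / (2 * d))
        total += np.sum(G0(L * np.abs(B @ h) ** 2 / (8 * d * mu ** 2)))
    return rho * total
```

Since G_0 vanishes on [0, 1], G is identically zero inside N_{d0} ∩ N_μ and only activates when iterates drift out, so it does not bias the minimizer there.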
Note that

A_i^*(y) = A_i^* A_i(h_{i0} x_{i0}^*) + A_i^*( ∑_{j≠i} A_j(h_{j0} x_{j0}^*) + w ),  with E[A_i^* A_i(h_{i0} x_{i0}^*)] = h_{i0} x_{i0}^*.

The leading singular vectors of A_i^*(y) can approximate (h_{i0}, x_{i0}).

Step 1: Initialization via spectral method and projection:

1: for i = 1, 2, …, s do
2:   Compute A_i^*(y) (since E[A_i^*(y)] = h_{i0} x_{i0}^*);
3:   (d_i, ĥ_{i0}, x̂_{i0}) = svds(A_i^*(y));
4:   u_i^(0) := P_{N_μ}(√(d_i) ĥ_{i0}) and v_i^(0) := √(d_i) x̂_{i0};
5: end for
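A real-valued toy run of the spectral initializer (the N_μ projection is omitted and the sizes are hypothetical). When B^T B ≈ I, the matrix A_i^*(y) = B^T diag(y) A_i concentrates around h_{i0} x_{i0}^T, so its top singular vectors line up with the ground truth:

```python
import numpy as np

L, K, N, s = 5000, 5, 5, 2
rng = np.random.default_rng(2)

B = rng.standard_normal((L, K)) / np.sqrt(L)   # B^T B ~ I_K, mimicking a partial DFT
A = [rng.standard_normal((L, N)) for _ in range(s)]
h0 = [rng.standard_normal(K) for _ in range(s)]
x0 = [rng.standard_normal(N) for _ in range(s)]

y = sum((B @ h0[i]) * (A[i] @ x0[i]) for i in range(s))

corr_h, corr_x = [], []
for i in range(s):
    # Adjoint of the lifted operator: A_i^*(y) = sum_l y_l b_l a_{i,l}^T = B^T diag(y) A_i
    M = B.T @ (y[:, None] * A[i])
    U, S, Vt = np.linalg.svd(M)
    h_hat, x_hat = U[:, 0], Vt[0]          # unit-norm spectral estimates
    corr_h.append(abs(h_hat @ h0[i]) / np.linalg.norm(h0[i]))
    corr_x.append(abs(x_hat @ x0[i]) / np.linalg.norm(x0[i]))
```

As L grows, the cosine similarities in corr_h and corr_x approach 1, i.e., the estimates land in a small neighborhood of the truth up to scale.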
Step 2: Gradient descent with constant stepsize η:

1: Initialization: obtain (u_i^(0), v_i^(0)) via Algorithm 1.
2: for t = 1, 2, … do
3:   for i = 1, 2, …, s do
4:     u_i^(t) = u_i^(t−1) − η ∇F_{h_i}(u^(t−1), v^(t−1))
5:     v_i^(t) = v_i^(t−1) − η ∇F_{x_i}(u^(t−1), v^(t−1))
6:   end for
7: end for
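One sweep of the gradient update can be sketched as follows; this is a real-valued illustration of the least-squares part F only (the talk's algorithm also includes ∇G and works over C):

```python
import numpy as np

def grad_step(u, v, B, A, y, eta):
    """One step of gradient descent on
    F(h, x) = || sum_i diag(B h_i) A_i x_i - y ||^2  (real case)."""
    s = len(u)
    r = sum((B @ u[i]) * (A[i] @ v[i]) for i in range(s)) - y   # residual
    new_u, new_v = [], []
    for i in range(s):
        gu = 2.0 * B.T @ (r * (A[i] @ v[i]))    # dF/dh_i
        gv = 2.0 * A[i].T @ (r * (B @ u[i]))    # dF/dx_i
        new_u.append(u[i] - eta * gu)
        new_v.append(v[i] - eta * gv)
    return new_u, new_v
```

The stepsize η stays constant throughout; inside the basin of attraction this plain update (plus the regularizer's gradient) is what the theory analyzes.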
Assume w ∼ CN(0, σ²d_0²/L) and each A_i is a complex Gaussian matrix. Then with probability at least 1 − L^{−γ+1} there hold: the initial guess satisfies

(u^(0), v^(0)) ∈ (1/√3) N_{d0} ∩ (1/√3) N_μ ∩ N_{2ε/(5√s κ)},

and

∑_{i=1}^s ‖u_i^(t) (v_i^(t))^* − h_{i0} x_{i0}^*‖_F² ≤ (1 − α)^t ε d_0 + c_0 √s ‖A^*(w)‖,

with α = O( (s(K + N) log² L)^{−1} ), provided

L ≥ C_γ (μ_h² + σ²) s² κ⁴ (K + N) log² L log s / ε²,

where κ = max_i d_{i0} / min_i d_{i0}.
The iterates (u_i^(t), v_i^(t)) converge linearly to (h_{i0}, x_{i0}):

‖u_i^(∞) (v_i^(∞))^* − h_{i0} x_{i0}^*‖_F ≤ c_0 √s ‖A^*(w)‖.

‖A^*(w)‖ converges to 0 at the rate O(L^{−1/2}): ‖A^*(w)‖ ≤ C_0 σ d_0 / √L (up to logarithmic factors). Therefore, (u_i^(∞), v_i^(∞)) is a consistent estimator of (h_{i0}, x_{i0}).

Challenges: the s² dependence is not optimal. The optimal scaling should be L = O(s(K + N)) instead of L = O(s²(K + N)).
Let each A_i be a complex Gaussian matrix. The number of measurements scales linearly with the number of sources s when K and N are fixed. Approximately, L ≈ 1.5 s(K + N) yields exact recovery.

[Figure: empirical phase transition of regularized gradient descent, Gaussian encoding, K = N = 50; s from 1 to 8, L from 100 to 1250. Black: failure; white: success.]
A more practical and useful choice of encoding matrix C_i: C_i = D_i H (i.e., A_i = F D_i H), where D_i is a random diagonal ±1 matrix and H is an L × N deterministic partial Hadamard matrix. With this setting, our approach can demix many users without performing channel estimation.
[Figure: empirical probability of success vs. number of sources s (1 to 12), K = N = 50, partial Hadamard encoding, for L = 512, 786, 1024, 1280.]

Again, L ≈ 1.5 s(K + N) yields exact recovery.
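The Hadamard-based encoding is cheap to construct; a sketch using the Sylvester construction (sizes hypothetical):

```python
import numpy as np

def sylvester_hadamard(L):
    """L x L Hadamard matrix for L a power of 2 (Sylvester construction)."""
    H = np.array([[1.0]])
    while H.shape[0] < L:
        H = np.block([[H, H], [H, -H]])
    return H

L, N = 512, 50
rng = np.random.default_rng(4)

H = sylvester_hadamard(L)[:, :N]              # deterministic partial Hadamard, L x N
D = np.diag(rng.choice([-1.0, 1.0], size=L))  # random binary +/-1 diagonal D_i
C = D @ H                                     # encoding matrix C_i = D_i H
```

Unlike a Gaussian C_i, this encoding requires only L random signs per user and admits fast (Walsh-Hadamard-transform-style) matrix-vector multiplies.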
We see that the relative error (in dB) decreases linearly with the SNR (in dB): approximately, a 10 dB increase in SNR leads to a 10 dB decrease in relative error.

[Figure: average relative error over 10 samples (dB) vs. SNR (dB), K = N = 64, s = 6, for Gaussian and Hadamard encodings, with L = 2s(K+N) and L = 4s(K+N).]
Conclusion: The proposed algorithm is arguably the first blind deconvolution/blind demixing algorithm that is numerically efficient, robust against noise, and comes with rigorous recovery guarantees under subspace conditions.

Open problem: Does a similar result hold for other types of A_i?
Open problem: What if either h_i or x_i is sparse?
Major open problem in nonconvex optimization: How do we remove the s²-dependence for rank-s matrix recovery?