One sketch for all: Fast algorithms for compressed sensing
Martin J. Strauss University of Michigan Covers joint work with Anna Gilbert (Michigan), Joel Tropp (Michigan), and Roman Vershynin (UC Davis)
Sparse Recovery is the idea that noisy sparse signals can be approximately reconstructed, efficiently, from a small number of nonadaptive linear measurements. Known as “Compress(ed/ive) Sensing,” or the “Heavy Hitters” problem in databases.
Measurements, signal s, and measurement matrix Φ:

(5.3, …, 5.3)ᵀ = [all-ones row; 0/1 bit-test rows] · (0, …, 0, 5.3, 0, …, 0)ᵀ

Recover position and coefficient of a single spike in the signal.
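A minimal numpy sketch of this bit-test decoder (my illustration of the reconstruction above, not the talk's exact matrix): one all-ones row reads off the coefficient, and log₂(d) bit-test rows recover the binary expansion of the spike's position.

```python
import numpy as np

d = 256                              # signal length, a power of 2
bits = int(np.log2(d))
cols = np.arange(d)

# Row 0 is all ones; row j+1 has a 1 in column i exactly when bit j of i is set.
Phi = np.vstack([np.ones(d, dtype=int),
                 np.array([(cols >> j) & 1 for j in range(bits)])])

s = np.zeros(d)
s[77] = 5.3                          # a single spike

y = Phi @ s                          # the sketch: 1 + log2(d) measurements

coeff = y[0]                         # the all-ones row gives the coefficient
pos = sum(1 << j for j in range(bits) if y[j + 1] != 0)
print(pos, coeff)                    # -> 77 5.3
```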
✸ E.g., 2 spinach sold, 1 spinach returned, 1 kaopectate sold, ...
Linearity of Φ: Φ(s + Δ) = Φs + ΦΔ, so the sketch can be updated as each transaction arrives.
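A small sketch of why this matters for transaction streams; the item indices and the random Φ here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 256, 32
Phi = rng.integers(0, 2, (n, d)).astype(float)   # any fixed linear sketch

spinach, kaopectate = 12, 40          # hypothetical item indices
sketch = np.zeros(n)
for item, count in [(spinach, 2), (spinach, -1), (kaopectate, 1)]:
    delta = np.zeros(d)
    delta[item] = count
    sketch += Phi @ delta             # Phi(s + delta) = Phi(s) + Phi(delta)

s_final = np.zeros(d)                 # the signal we never had to store
s_final[spinach], s_final[kaopectate] = 1.0, 1.0
assert np.allclose(sketch, Phi @ s_final)
```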
– Error goal: error proportional to the optimal m-term error
Resources:
– Time close to the output size m, not the signal length d (m ≪ d)
Signal is worst-case, not random. Two possible models for the random measurement matrix.
We present a coin-tossing algorithm. “For each” model: coins are flipped and the matrix Φ is fixed while, independently, the adversary picks the worst signal; then the algorithm runs.
Universal model: coins are flipped → matrix Φ is fixed → adversary picks the worst signal, knowing Φ → algorithm runs (guarantee via the probabilistic method).
Often unnecessary, but needed for iterative schemes. E.g., after recovering and subtracting the spike for kaopectate, the residual signal s₂ depends on the measurement matrix Φ. A “for each” guarantee says nothing about Φ on s₂, and it is too costly to have a separate Φ per sale. Today: universal guarantee.
Previous work achieved two out of three:

Ref.      Univ.   Fast   Few meas.   Technique
KM        ×       ✓      ✓
D, CRT    ✓       ×      ✓
CM∗       ✓       ✓      ×
Today     ✓       ✓      ✓           comb’l

∗ restrictions apply
Two algorithms, Chaining and HHS.

          # meas.   Time   # out   Error
1. Chg                     m       ‖E‖₁ ≤ O(log(m))·‖Eopt‖₁
2. HHS                             ‖E‖₂ ≤ (ε/√m)·‖Eopt‖₁
3.                         m       ‖E‖₂ ≤ ‖Eopt‖₂ + (ε/√m)·‖Eopt‖₁
4.                                 ‖E‖₁ ≤ (1 + ε)·‖Eopt‖₁

(3) and (4) are obtained by truncating the output of HHS.
          # meas.       Time         Error                          Failure
K-M       poly(m)                    ‖E‖₂ ≤ (1 + ε)·‖Eopt‖₂         “for each”
D, C-T    O(m log(d))   d^(1 to 3)   ‖E‖₂ ≤ (ε/√m)·‖Eopt‖₁          univ.
CM        poly(m)                    ‖E‖₂ ≤ (ε/√m)·‖Eopt‖₁          det’c
Chg                                  ‖E‖₁ ≤ O(log(m))·‖Eopt‖₁       univ.
HHS                                  ‖E‖₂ ≤ (ε/√m)·‖Eopt‖₁          univ.
– Process several spikes at once
– Reduce noise
– Succeed on “for each” signal, except with probability e^(−m log(d))
– At most exp(m log(d)) configurations of spikes
– Union bound: convert “for each” to the universal model (sketched below)
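Schematically, the union bound over spike configurations (the constant c > 1 is an assumption standing in for the analysis):

```latex
\Pr[\text{fail on some spike configuration}]
  \;\le\; \underbrace{e^{m\log d}}_{\#\,\text{configurations}}
          \cdot \underbrace{e^{-c\,m\log d}}_{\text{per-signal failure}}
  \;=\; e^{-(c-1)\,m\log d} \;\ll\; 1 .
```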
Each group is defined by a mask:

signal:        0.1   5.3   −0.1   0.2   6.8
random mask:     1     1      0     1     0
product:       0.1   5.3      0   0.2     0
Measure the masked signal with the bit-test matrix:

(5.6, …, 0.2, 5.5)ᵀ = [all-ones row; 0/1 bit-test rows] · (0.1, 5.3, 0, 0.2, 0, …)ᵀ

Recover position and coefficient of a single spike, even with noise. (Mask and bit tests combine into measurements.)
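A hedged numpy sketch of masking plus bit tests; the keep-probability 1/2 and the half-coefficient decoding threshold are simplifications of mine. When exactly one spike survives the mask and dominates the noise, the decoder names its position and coefficient:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 256
bits = int(np.log2(d))
cols = np.arange(d)

# All-ones row plus bit-test rows, as in the single-spike example.
Phi = np.vstack([np.ones(d),
                 np.array([(cols >> j) & 1 for j in range(bits)])]).astype(float)

s = rng.normal(0.0, 0.001, d)        # small noise everywhere
for i in (3, 77, 200):               # several spikes of equal height
    s[i] = 5.3

mask = rng.integers(0, 2, d)         # keep each position w.p. 1/2 (toy choice)
y = Phi @ (mask * s)                 # mask and bit tests combine into one sketch

coeff = y[0]                         # sum of the surviving masked signal
pos = sum(1 << j for j in range(bits) if abs(y[j + 1]) > abs(coeff) / 2)
# If exactly one spike survived the mask, (pos, coeff) recovers it (up to noise).
```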
E.g., m spikes (i, sᵢ) at height 1/m; ‖noise‖₁ = 1/20. (For now.)
Throw the d positions into n = O(m) groups, by Φ. A constant fraction of the spikes land isolated, in low-noise groups, except with probability e^(−m). Repeat O(log(d)) times: recover Ω(m) spikes, except with probability e^(−m log(d)).
E.g., positions {1, …, 6} hashed into three groups {1,6}, {4,5}, {2,3}:

(7, 9, 5)ᵀ = [1 0 0 0 0 1; 0 0 0 1 1 0; 0 1 1 0 0 0] · (1, 2, 3, 4, 5, 6)ᵀ

Each measurement is the sum of the signal entries hashed to its group.
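A minimal numpy sketch of this first hashing, with hypothetical sizes: one group-sum measurement per group, so a spike isolated in its group appears as that group's sum.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 1000, 5
n_groups = 4 * m                     # n = O(m) groups (constant 4 is arbitrary)

s = np.zeros(d)
s[rng.choice(d, m, replace=False)] = 1.0 / m   # m spikes at height 1/m

group = rng.integers(0, n_groups, d) # random hash: position -> group
H = np.zeros((n_groups, d))
H[group, np.arange(d)] = 1.0         # one 1 per column, i.e., a partition

group_sums = H @ s                   # one measurement per group
# A spike isolated in its group appears as that group's sum; bit tests
# within the group (as in the earlier snippet) would then locate it.
print(np.sort(group_sums)[-m:])
```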
We’ve found (1/4)m spikes.
But...
✸ Subtract off large phantom spike
✸ Introduce new (negative) spike (to be found later)
✸ Spike threshold rises from m⁻¹ to ((3/4)m)⁻¹
Number of spikes: m → (c₁ − c₂ − c₃)m ≈ (3/4)m. Spike threshold increases; delicate analysis (a toy version of the loop is sketched below).
✸ Lets noise grow from round to round.
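The subtract-and-iterate loop can be exercised end to end in a toy setting. This is a hedged, simplified sketch (unit spikes, no noise, ad hoc thresholds; `round_matrix` and `decode` are illustrative stand-ins for the talk's measurements): phantom spikes created by collisions are subtracted anyway and become negative spikes that later rounds correct, using only linearity of the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
d, m = 256, 4
bits = int(np.log2(d))
n_groups = 8 * m
cols = np.arange(d)
bit_rows = np.array([(cols >> j) & 1 for j in range(bits)], dtype=float)

def round_matrix():
    """One round: all-ones + bit-test rows, restricted to each random group."""
    group = rng.integers(0, n_groups, d)
    rows = []
    for g in range(n_groups):
        ind = (group == g).astype(float)
        rows.append(ind)
        rows.extend(ind * bit_rows)        # bit tests confined to this group
    return np.asarray(rows)

def decode(y):
    """Read (position, value) pairs out of one round's measurements."""
    found = {}
    for g in range(n_groups):
        block = y[g * (bits + 1):(g + 1) * (bits + 1)]
        if abs(block[0]) > 1e-9:
            pos = sum(1 << j for j in range(bits)
                      if abs(block[j + 1]) > abs(block[0]) / 2)
            found[pos] = block[0]          # may be a phantom if spikes collide
    return found

s = np.zeros(d)
s[rng.choice(d, m, replace=False)] = 1.0
mats = [round_matrix() for _ in range(4)]  # O(log d) rounds in the talk
sketches = [Phi @ s for Phi in mats]       # all measured up front, nonadaptively

recovered = np.zeros(d)
for Phi, y in zip(mats, sketches):
    update = np.zeros(d)
    for pos, val in decode(y - Phi @ recovered).items():
        update[pos] += val
    recovered += update                    # phantoms become new (negative) spikes

print(np.nonzero(recovered)[0], np.nonzero(s)[0])
```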
Two algorithms, Chaining and HHS.

          # meas.   Time   # out   Error
1. Chg                     m       ‖E‖₁ ≤ O(log(m))·‖Eopt‖₁
2. HHS                             ‖E‖₂ ≤ (ε/√m)·‖Eopt‖₁
3.                         m       ‖E‖₂ ≤ ‖Eopt‖₂ + (ε/√m)·‖Eopt‖₁
4.                                 ‖E‖₁ ≤ (1 + ε)·‖Eopt‖₁
✸ E.g., preprocess with (simplified) Chaining algorithm
✸ Identify a fraction of the spikes
✸ Estimate values
✸ Prune, to fix problems caused by false positives
Our focus: recover q spikes at height 1/t amid noise of ℓ₁-norm 1. (Try all q’s and t’s in a geometric progression.) Remark: only the scales where the spike height 1/t is big relative to the noise matter.
Have: q spikes at 1/t; noise 1. Double hashing: first throw the positions into q buckets, then spread each bucket among (t/q)² second-level measurements. (Some log factors suppressed.)
Have: q spikes at 1/t; noise 1. Throw positions into q buckets, by Φ. As in Chaining, most spikes are isolated in their buckets, except with probability e^(−q log(d)) = d^(−q).
✸ Thus only O(q) buckets get noise more than 1/q.
Have 1 spike at 1/t; noise ‖ν‖₁ ≤ 1/q. Use r = O((t/q)²) rows of Bernoulli(q/t): each row is a random 0/1 vector of density q/t. The spike’s column has value 1/t, while a typical column carries noise about 1/(dq). Some row hits the spike’s column while picking up little of the noise.
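A numpy sketch of this second hashing under the stated parameters; the Gaussian noise model and the constant 4 in r are my assumptions (the talk fixes only the ℓ₁ noise size):

```python
import numpy as np

rng = np.random.default_rng(4)
d, q, t = 10_000, 10, 100             # hypothetical scale: spikes of height 1/t
r = 4 * (t // q) ** 2                 # r = O((t/q)^2) rows

s = rng.normal(0.0, 1.0 / (d * q), d) # noise with ||nu||_1 about 1/q
spike = 1234
s[spike] = 1.0 / t                    # the one spike in this bucket

B = (rng.random((r, d)) < q / t).astype(float)  # Bernoulli(q/t) rows
y = B @ s

hit = B[:, spike] == 1.0              # rows that include the spike's column
noise_seen = y - B[:, spike] * s[spike]
# Expect: some hit row sees noise far below 1/t, so the spike stands out.
print(int(hit.sum()), np.abs(noise_seen[hit]).min(), 1.0 / t)
```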
Have 1 spike at 1/t; noise ‖ν‖₁ ≤ 1/q. Except with probability 1/d³, some row hits the spike’s column and little of the noise. Take a union bound over d spikes and d matrix columns. For any noise with ‖ν‖₁ = 1/q, some row gets at most the average noise, (1/q)/r′ = 1/t, where r′ ≈ t/q is the number of rows hitting the spike. Can recover a spike of magnitude 1/t from noise 1/(2t).
Number of measurements: q·(t/q)²·log(d) = poly(log(d)/ε)·(t²/q).
Note: q/t² = ‖s‖₂² > (m^(−1/2)·‖Eopt‖₁)² = 1/m.
So the number of measurements is, up to the poly(log(d)/ε) factor, t²/q ≤ m.
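In one display, the measurement-count calculation (normalizing ‖Eopt‖₁ = 1, as the slide's “= 1/m” suggests):

```latex
q\,(t/q)^2\log d \;=\; \mathrm{poly}\!\Big(\tfrac{\log d}{\varepsilon}\Big)\cdot\frac{t^2}{q},
\qquad
\frac{q}{t^2} \;=\; \|s\|_2^2 \;>\; \big(m^{-1/2}\,\|E_{\mathrm{opt}}\|_1\big)^2 \;=\; \frac{1}{m}
\;\Longrightarrow\;
\frac{t^2}{q} \;\le\; m .
```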
Re-measure O(m)-sparse vector by matrix with at most O(m) rows:
Matrix generation, first hashing:
Matrix generation, second hashing:
Improvement to m^(3/2) possible here; bottleneck of m² in Estimation.
Have: the sketch Φs and a set A of identified positions, |A| = O(m).
Want: an estimate ŝ_A for s_A with
  ‖ŝ_A − s_A‖₂ ≤ ‖s − s_A‖₂ + m^(−1/2)·‖s − s_A‖₁.
Note: can assume ‖s − s_A‖₂ is small, by goodness of identification.
Estimate ŝ_A = Φ_A†(Φs) (least squares), where Φ_A is the submatrix of Φ with columns in A.
Conditioning of Φ_A: via restricted isometry, as in Candès-Tao and Rudelson-Vershynin.
O(m²) immediate.
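A minimal numpy sketch of the least-squares step; the random sign matrix and the assumption that identification returned exactly the true support are illustrative, not the talk's construction:

```python
import numpy as np

rng = np.random.default_rng(5)
d, m, n = 2000, 20, 400               # hypothetical sizes, n = O(m) rows

Phi = rng.choice([-1.0, 1.0], (n, d)) / np.sqrt(n)  # generic sketch matrix
support = rng.choice(d, m, replace=False)
s = rng.normal(0.0, 1e-3, d)          # small tail everywhere
s[support] = rng.normal(0.0, 1.0, m)  # m heavy hitters

A = np.sort(support)                  # assume identification found A exactly
sA_hat, *_ = np.linalg.lstsq(Phi[:, A], Phi @ s, rcond=None)

print(np.linalg.norm(sA_hat - s[A])) # small when Phi_A is well-conditioned
```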
New compressed sensing / heavy hitters algorithms that get all three: universality, fast recovery, and few measurements.
Chaining material based on the paper: Algorithmic Linear Dimension Reduction in the ℓ1 Norm for Sparse Vectors (available from my homepage).
HHS material based on the paper: One Sketch for All: Fast Algorithms for Compressed Sensing (submitted; available soon), by Gilbert, Strauss, Tropp, Vershynin.
Optimal error vector Eopt = s − s_m is s with its m heavy hitters zeroed out.
Our error vector is E = s − ŝ, where ŝ is the output.
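In code, using the small example signal from the masking slide:

```python
import numpy as np

def opt_error(s, m):
    """E_opt = s with its m largest-magnitude entries zeroed out."""
    e = s.copy()
    e[np.argsort(np.abs(s))[-m:]] = 0.0
    return e

s = np.array([0.1, 5.3, -0.1, 0.2, 6.8])   # the masking slide's signal
Eopt = opt_error(s, m=2)                    # zeroes 5.3 and 6.8
print(np.linalg.norm(Eopt, 1), np.linalg.norm(Eopt, 2))
```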
The ℓ₂ ≤ (1 + ε)·ℓ₂ guarantee, ‖E‖₂ ≤ (1 + ε)·‖Eopt‖₂, is:
✸ Achievable with a “for each” guarantee
✸ Impossible with a universal guarantee (Cohen-Dahmen-DeVore, 2006)
(The mixed-norm guarantees above are related.)
Defeat a fixed Φ by finding a signal s with s ∈ null(Φ).
Today: universal guarantee, with ℓ₁ noise.
Goal: (R^d, ℓ₁) → (R^n, ℓ₁), for n ≪ d. Impossibility results, in general (Brinkman and Charikar, 2003).
Chaining algorithm: (X^d_m, ℓ₁) → (R^n, ℓ₁), for n = m·polylog(d), where X^d_m ⊆ R^d is the set of m-sparse signals.
n = Θ((m/ε)²·log(d)).
‖ŝ_A − s_A‖₂ = ‖Φ_A†(Φs) − s_A‖₂
            = ‖Φ_A†Φ(s − s_A)‖₂
            ≤ O(‖s − s_A‖_K)   (Need this!)
            = O(‖s − s_A‖₂ + m^(−1/2)·‖s − s_A‖₁).
We’ll bound ‖Φ_A†‖ and ‖Φ‖_{K→2} separately.
Need to bound ‖Φ_A†‖ and ‖Φ‖_{K→2}.
Candès-Tao, Rudelson-Vershynin: the restricted isometry condition (RIC) controls ‖Φ_A†‖.
We show the RIC implies a bound on ‖Φ‖_{K→2}.
If s is q spikes of (near-)equal size, m ≤ q ≤ 2m, then ‖Φs‖₂ ≤ m^(−1/2)·‖s‖₁.
Suppose ‖Φx‖₂ ≤ m^(−1/2)·‖x‖₁ and ‖Φy‖₂ ≤ m^(−1/2)·‖y‖₁, for x and y disjointly supported. Then
  ‖Φ(x + y)‖₂ ≤ ‖Φx‖₂ + ‖Φy‖₂ ≤ m^(−1/2)·(‖x‖₁ + ‖y‖₁) = m^(−1/2)·‖x + y‖₁ ≤ ‖x + y‖_K.
Combine all groups of size ≥ m this way.
If s is q ≤ m spikes of (near-)equal size t, then ‖Φs‖₂ ≤ ‖s‖₂. Do all q = 1, 2, 4, 8, …, m and the O(log(d)) relevant values of t. Suppose ‖Φx‖₂ ≤ ‖x‖₂ and ‖Φy‖₂ ≤ ‖y‖₂, for x and y disjointly supported. Then
  ‖Φ(x + y)‖₂ ≤ ‖Φx‖₂ + ‖Φy‖₂ ≤ ‖x‖₂ + ‖y‖₂ ≤ √2·‖x + y‖₂,
by Cauchy-Schwarz. Give up a factor polylog(d) in this proof. A slicker proof gives no overhead from the RIC to the K → 2 norm.