
SLIDE 1

Adaptive Signal Recovery by Convex Optimization

Dmitrii Ostrovskii CWI, Amsterdam 19 April 2018

SLIDE 2

Signal denoising problem

Recover a complex signal $x = (x_\tau)$, $\tau = -n, \dots, n$, from noisy observations $y_\tau = x_\tau + \sigma\xi_\tau$, $\tau = -n, \dots, n$, where the $\xi_\tau$ are i.i.d. standard complex Gaussian random variables.

[Figure: the signal (left) and its noisy observations (right).]

  • Assumption: signal has unknown shift-invariant structure.
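For concreteness, here is a minimal numpy sketch of the observation model above; the two-harmonic signal and the noise level are illustrative choices of ours, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
tau = np.arange(-n, n + 1)

# Illustrative signal with shift-invariant structure: a sum of two harmonics.
x = np.exp(0.3j * tau) + 0.5 * np.exp(1.7j * tau)

# Standard complex Gaussian noise: real and imaginary parts are independent
# N(0, 1/2), so that E|xi_tau|^2 = 1.
sigma = 0.5
xi = (rng.standard_normal(2 * n + 1) + 1j * rng.standard_normal(2 * n + 1)) / np.sqrt(2)
y = x + sigma * xi
```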

SLIDE 3

Preliminaries

  • Finite-dimensional spaces and norms:
    $C_n(\mathbb{Z}) = \{x = (x_\tau)_{\tau\in\mathbb{Z}} : x_\tau = 0 \text{ whenever } |\tau| > n\}$;
    $\ell_p$-norms restricted to $C_n(\mathbb{Z})$: $\|x\|_p = \big(\sum_{|\tau|\le n} |x_\tau|^p\big)^{1/p}$;
    scaled $\ell_p$-norms: $\|x\|_{n,p} = (2n+1)^{-1/p}\,\|x\|_p$.

  • Loss:
    $\ell(\widehat{x}, x) = |\widehat{x}_0 - x_0|$ – pointwise loss;
    $\ell(\widehat{x}, x) = \|\widehat{x} - x\|_{n,2}$ – $\ell_2$-loss.

  • Risk:
    $\mathcal{R}(\widehat{x}, x) = \big[\mathbb{E}\,\ell(\widehat{x}, x)^2\big]^{1/2}$;
    $\mathcal{R}_\delta(\widehat{x}, x) = \min\big\{r \ge 0 : \ell(\widehat{x}, x) \le r \text{ with probability} \ge 1-\delta\big\}$.

SLIDE 4

Adaptive estimation: disclaimer

Classical approach
Given a set X containing x, look for a near-minimax, over X, estimator $\widehat{x}^{\,o}$. One can often assume that $\widehat{x}^{\,o}$ is linear in y (e.g. for the pointwise loss)*. If X is unknown, $\widehat{x}^{\,o}$ becomes an unavailable linear oracle. Mimic it!

Oracle approach
Knowing that there exists a linear oracle $\widehat{x}^{\,o}$ with small risk $\mathcal{R}(\widehat{x}^{\,o}, x)$, construct an adaptive estimator $\widehat{x} = \widehat{x}(y)$ satisfying an oracle inequality:
$$\mathcal{R}(\widehat{x}, x) \le P \cdot \mathcal{R}(\widehat{x}^{\,o}, x) + \mathrm{Rem}, \qquad \mathrm{Rem} \ll \mathcal{R}(\widehat{x}^{\,o}, x).$$
Here $\widehat{x}^{\,o}$ and x may change, but P and Rem must be uniformly bounded over $(\widehat{x}^{\,o}, x)$.

  • P is the “price of adaptation”. Inequalities with P = 1 are called sharp*.

*[Ibragimov and Khasminskii, 1984; Donoho et al., 1990], *[Tsybakov, 2008]

SLIDE 5

Classical example: unknown smoothness

Let x be a regularly sampled function: $x_t = f(t/N)$, $t = -N, \dots, N$, where $f : [-1, 1] \to \mathbb{R}$ has a weak derivative $D^s f$ of order $s \ge 1$ on $[-1, 1]$ and belongs to a Sobolev ($q = 2$) or Hölder ($q = \infty$) smoothness class:*
$$\mathcal{F}_{s,L} = \{f(\cdot) : \|D^s f\|_{L_q} \le L\}.$$

  • Linear oracle: kernel estimator with a properly chosen bandwidth h (see the sketch below):
$$\widehat{f}(t/N) = \frac{1}{2hN+1} \sum_{|\tau| \le hN} K\Big(\frac{\tau}{hN}\Big)\, y_{t-\tau}, \qquad |t| \le N - hN.$$

  • Adaptive bandwidth selection*: Lepski’s method, Stein’s method, ...
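Below is a minimal numpy sketch of such a kernel estimate; the triangular kernel and the test signal are illustrative assumptions of ours (the slides do not fix K).

```python
import numpy as np

def kernel_estimate(y, h, K=lambda u: np.maximum(1.0 - np.abs(u), 0.0)):
    """Kernel estimate f_hat(t/N) = (2hN+1)^{-1} sum_{|tau|<=hN} K(tau/hN) y_{t-tau}.

    y is sampled on t = -N..N; the estimate is returned only on the inner
    range |t| <= N - hN, where the averaging window fits entirely.
    """
    N = (len(y) - 1) // 2
    m = max(int(h * N), 1)                  # window half-width hN
    w = K(np.arange(-m, m + 1) / m) / (2 * m + 1)
    # Since K is symmetric here, convolution and correlation coincide.
    return np.convolve(y, w, mode="valid")  # values at t = -(N-m), ..., N-m

# Example: denoise a smooth function sampled on [-1, 1] with N = 200.
N, sigma = 200, 0.3
t = np.arange(-N, N + 1)
y = np.sin(2 * np.pi * t / N) + sigma * np.random.default_rng(1).standard_normal(2 * N + 1)
f_hat = kernel_estimate(y, h=0.05)
```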

*[Adams and Fournier, 2003; Brown et al., 1996; Watson, 1964; Nadaraya, 1964; Tsybakov, 2008; Johnstone, 2011], *[Lepski, 1991; Lepski et al., 1997, 2015; Goldenshluger et al., 2011]

SLIDE 6

Recoverable signals

  • We consider convolution-type (or time-invariant) estimators
$$\widehat{x}_t = [\varphi * y]_t := \sum_{\tau \in \mathbb{Z}} \varphi_\tau\, y_{t-\tau},$$
where $*$ is discrete convolution and $\varphi \in C_n(\mathbb{Z})$ is called a filter.

Definition*
A signal x is (n, ρ)-recoverable if there exists $\phi^o \in C_n(\mathbb{Z})$ which satisfies
$$\big(\mathbb{E}\,|x_t - [\phi^o * y]_t|^2\big)^{1/2} \le \frac{\sigma\rho}{\sqrt{2n+1}}, \qquad |t| \le 3n.$$

  • Consequence: small $\ell_2$-risk: $\big[\mathbb{E}\,\|x - \phi^o * y\|_{3n,2}^2\big]^{1/2} \le \frac{\sigma\rho}{\sqrt{2n+1}}$.

[Diagram: nested observation windows of half-widths n, 2n, 3n, 4n.]

*[Juditsky and Nemirovski, 2009; Nemirovski, 1991; Goldenshluger and Nemirovski, 1997]

SLIDE 7

Adaptive signal recovery: main questions

Goal
Assuming that x is (n, ρ)-recoverable, construct an adaptive filter $\widehat{\varphi} = \widehat{\varphi}(y)$ such that the pointwise or $\ell_2$-risk of $\widehat{x} = \widehat{\varphi} * y$ is close to $\frac{\sigma\rho}{\sqrt{2n+1}}$.

Main questions:

  • Can we adapt to the oracle?
    Yes, but we must pay a price polynomial in ρ.
  • Can $\widehat{\varphi}$ be computed efficiently?
    Yes, by solving a well-structured convex optimization problem.
  • Do recoverable signals with small ρ exist?
    Yes: when the signal belongs to a shift-invariant subspace $S \subset C(\mathbb{Z})$ with dim(S) = s, we have “nice” bounds on ρ = ρ(s).

SLIDE 8

Adaptive estimators and their analysis

SLIDE 9

Main idea

  • “Bias–variance decomposition”:
$$\underbrace{x_t - [\phi^o * y]_t}_{\text{total error}} = \underbrace{x_t - [\phi^o * x]_t}_{\text{bias}} + \underbrace{\sigma[\phi^o * \xi]_t}_{\text{stochastic error}}.$$

  • (n, ρ)-recoverability implies
$$|x_t - [\phi^o * x]_t| \le \frac{\sigma\rho}{\sqrt{2n+1}}, \quad |t| \le 3n, \qquad \text{and} \qquad \|\phi^o\|_2 \le \frac{\rho}{\sqrt{2n+1}}.$$

  • Let $F_n : C_n(\mathbb{Z}) \to C_n(\mathbb{Z})$ denote the unitary Discrete Fourier Transform operator, and look at the Fourier transforms: estimate x via $\widehat{x} = \widehat{\varphi} * y$, where $\widehat{\varphi} = \widehat{\varphi}(y) \in C_{2n}(\mathbb{Z})$ minimizes the Fourier-domain residual $\|F_{2n}[y - \varphi * y]\|_p$ while keeping $\|F_{2n}[\varphi]\|_1$ small.
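As a sketch of the Fourier-domain quantities the estimator balances, under a simplifying assumption of ours that the filter and the observations share one window and ϕ ∗ y is truncated back to it:

```python
import numpy as np

def unitary_dft(z):
    """Unitary DFT, so that Parseval's identity ||Fz||_2 = ||z||_2 holds exactly."""
    return np.fft.fft(z) / np.sqrt(len(z))

def uniform_fit_objective(phi, y):
    """Residual ||F[y - phi * y]||_inf and penalty ||F[phi]||_1 for a filter phi
    of the same length as y (a simplification of the slides' indexing)."""
    conv = np.convolve(y, phi, mode="same")        # phi * y, truncated to y's window
    residual = np.max(np.abs(unitary_dft(y - conv)))
    penalty = np.sum(np.abs(unitary_dft(phi)))
    return residual, penalty
```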

SLIDE 10

Motivation: new oracle

Oracle with small ℓ1-norm of DFT*
If x is (n, ρ)-recoverable, then there exists $\varphi^o \in C_{2n}(\mathbb{Z})$ such that, for $R = 2\rho^2$,
$$|x_t - [\varphi^o * x]_t| \le \frac{C\sigma R}{\sqrt{4n+1}}, \quad |t| \le 2n, \qquad \|F_{2n}[\varphi^o]\|_1 \le \frac{R}{\sqrt{4n+1}}.$$

Proof. 1°. Consider $\varphi^o = \phi^o * \phi^o \in C_{2n}(\mathbb{Z})$. On one hand, for $|t| \le 2n$,
$$|x_t - [\varphi^o * x]_t| \le |x_t - [\phi^o * x]_t| + |[\phi^o * (x - \phi^o * x)]_t| \le (1 + \|\phi^o\|_1) \max_{|\tau| \le 3n} |x_\tau - [\phi^o * x]_\tau| \le \frac{\sigma\rho(1+\rho)}{\sqrt{2n+1}}.$$

2°. On the other hand, we get
$$\|F_{2n}[\varphi^o]\|_1 = \sqrt{4n+1}\,\|F_{2n}[\phi^o]\|_2^2 = \sqrt{4n+1}\,\|F_n[\phi^o]\|_2^2 \le \frac{2\rho^2}{\sqrt{4n+1}}.$$

*[Juditsky and Nemirovski, 2009]

SLIDE 11

Uniform-fit estimators

  • Constrained uniform-fit estimator*:
$$\widehat{\varphi} \in \mathop{\mathrm{Argmin}}_{\varphi \in C_n(\mathbb{Z})} \Big\{ \|F_n[y - \varphi * y]\|_\infty \;:\; \|F_n[\varphi]\|_1 \le \tfrac{R}{\sqrt{2n+1}} \Big\}. \tag{CUF}$$

  • Penalized estimator: for some λ ≥ 0,
$$\widehat{\varphi} \in \mathop{\mathrm{Argmin}}_{\varphi \in C_n(\mathbb{Z})} \Big\{ \|F_n[y - \varphi * y]\|_\infty + \sigma\lambda\sqrt{2n+1}\,\|F_n[\varphi]\|_1 \Big\}. \tag{PUF}$$

Pointwise upper bound for uniform-fit estimators
Let x be $(\lceil n/2 \rceil, \rho)$-recoverable. Take $R = 2\rho^2$ for the constrained estimator and $\lambda = 2\sqrt{\log[(2n+1)/\delta]}$ for the penalized one. Then, w.p. ≥ 1 − δ,
$$|x_0 - [\widehat{\varphi} * y]_0| \le \frac{C\sigma\rho^4\sqrt{\log[(2n+1)/\delta]}}{\sqrt{2n+1}}.$$
High price of adaptation: $O(\rho^3\sqrt{\log n})$.

*[Juditsky and Nemirovski, 2009]
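(CUF) and (PUF) are well-structured convex programs. As an illustration only, here is a small CVXPY sketch of (CUF) under two simplifying assumptions of ours: circular convolution on a single odd-length window, and a CVXPY build with complex-variable support.

```python
import numpy as np
import cvxpy as cp

def cuf_filter(y, R):
    """Constrained uniform-fit filter on the window of y (circular model)."""
    m = len(y)                                     # odd window length, m = 2n + 1
    F = np.fft.fft(np.eye(m)) / np.sqrt(m)         # unitary DFT matrix
    # Circular convolution y (*) phi as a linear map of phi: column k is y shifted by k.
    A = np.stack([np.roll(y, k) for k in range(m)], axis=1)
    phi = cp.Variable(m, complex=True)
    objective = cp.Minimize(cp.norm(F @ (y - A @ phi), "inf"))
    constraints = [cp.norm(F @ phi, 1) <= R / np.sqrt(m)]
    cp.Problem(objective, constraints).solve()
    return phi.value
```

At the large scales quoted later in the talk one would never form the dense DFT and convolution matrices; the sketch only exposes the problem structure.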

SLIDE 12

Analysis of uniform-fit estimators

Let $\widehat{\varphi}$ be an optimal solution to (CUF) with R as in the oracle bound ($R = 2\rho^2$), and let $\Theta_n(\zeta) = \|F_n[\zeta]\|_\infty$; note that $\Theta_n(\zeta) = O(\sqrt{\log n})$ w.h.p.

  • 1°. Already in the first step, we see why the new oracle is useful:
$$|[x - \widehat{\varphi} * y]_0| \le \sigma|[\widehat{\varphi} * \zeta]_0| + |[x - \widehat{\varphi} * x]_0| \le \sigma\|F_n[\widehat{\varphi}]\|_1\,\|F_n[\zeta]\|_\infty + |[x - \widehat{\varphi} * x]_0| \quad \text{[Young’s ineq.]}$$
$$\le \frac{\sigma\,\Theta_n(\zeta)\,R}{\sqrt{2n+1}} + |[x - \widehat{\varphi} * x]_0|. \quad \text{[Feasibility of } \widehat{\varphi}]$$

  • 2°. To control $|[x - \widehat{\varphi} * x]_0|$, we add and subtract the convolution with $\varphi^o$:
$$|x_0 - [\widehat{\varphi} * x]_0| \le |[\varphi^o * (x - \widehat{\varphi} * x)]_0| + |[(1 - \widehat{\varphi}) * (x - \varphi^o * x)]_0|$$
$$\le \|F_n[\varphi^o]\|_1\,\|F_n[x - \widehat{\varphi} * x]\|_\infty + (1 + \|\widehat{\varphi}\|_1)\,\|x - \varphi^o * x\|_\infty \le \frac{R}{\sqrt{2n+1}}\,\|F_n[x - \widehat{\varphi} * x]\|_\infty + \frac{C\sigma R(1+R)}{\sqrt{2n+1}}.$$

SLIDE 13

Analysis of uniform-fit estimators, cont.

  • 3°. It remains to control $\|F_n[x - \widehat{\varphi} * x]\|_\infty$, which can be done as follows:
$$\|F_n[x - \widehat{\varphi} * x]\|_\infty \le \|F_n[y - \widehat{\varphi} * y]\|_\infty + \sigma\|F_n[\zeta - \widehat{\varphi} * \zeta]\|_\infty \le \|F_n[y - \widehat{\varphi} * y]\|_\infty + \sigma(1 + \|\widehat{\varphi}\|_1)\,\Theta_n(\zeta)$$
$$\le \|F_n[y - \varphi^o * y]\|_\infty + \sigma(1 + \|\widehat{\varphi}\|_1)\,\Theta_n(\zeta) \quad \text{[Feas. of } \varphi^o] \;\le\; \|F_n[x - \varphi^o * x]\|_\infty + 2\sigma(1 + R)\,\Theta_n(\zeta).$$

  • 4°. Finally, note that
$$\|F_n[x - \varphi^o * x]\|_\infty \le \|F_n[x - \varphi^o * x]\|_2 = \|x - \varphi^o * x\|_2 \quad \text{[Parseval’s identity]} \;\le\; \sqrt{2n+1}\,\|x - \varphi^o * x\|_\infty \le C\sigma R.$$
Collecting the above, we obtain a bound dominated by $\dfrac{C\sigma R(1+R)\,\Theta_n(\zeta)}{\sqrt{2n+1}}$.

SLIDE 14

Limit of performance

Proposition: pointwise lower bound
For any integer n ≥ 2, α < 1/4, and ρ satisfying $1 \le \rho \le n^\alpha$, one can point out a family of signals $X_{n,\rho} \subset C_{2n}(\mathbb{Z})$ such that

  • any signal in $X_{n,\rho}$ is (n, ρ)-recoverable;
  • for any estimate $\widehat{x}_0$ of $x_0$ from observations $y \in C_{2n}(\mathbb{Z})$, one can find $x \in X_{n,\rho}$ satisfying
$$\mathbb{P}\left\{ |x_0 - \widehat{x}_0| \ge \frac{c\sigma\rho^2\sqrt{(1 - 4\alpha)\log n}}{\sqrt{2n+1}} \right\} \ge 1/8.$$

Conclusion: there is a gap of ρ² between the upper and lower bounds.

  • To bridge it (and to encompass the ℓ2-loss), we introduce new estimators.

SLIDE 15

Least-squares estimators

  • Constrained formulation:
$$\widehat{\varphi} \in \mathop{\mathrm{Argmin}}_{\varphi \in C_n(\mathbb{Z})} \Big\{ \|F_n[y - \varphi * y]\|_2 \;:\; \|F_n[\varphi]\|_1 \le \tfrac{R}{\sqrt{2n+1}} \Big\}; \tag{CLS}$$

  • Penalized formulations: ....

For the analysis, we have to restrict the set of signals, introducing shift-invariant subspaces (s.-i.s.).

  • Definition. A linear subspace $S \subseteq C_\infty(\mathbb{Z})$ is called shift-invariant if it is an invariant subspace of the unit lag operator $[\Delta x]_t = x_{t-1}$.

SLIDE 16

Oracle inequality for ℓ2-loss

Theorem: sharp ℓ2-oracle inequality for least-squares estimators
Suppose that x belongs to some s.-i.s. S, and let $\varphi^o$ be feasible in (CLS): $\|F_n[\varphi^o]\|_1 \le \frac{R}{\sqrt{2n+1}}$. Then for any δ ∈ (0, 1], an optimal solution $\widehat{\varphi}$ to (CLS) satisfies, w.p. ≥ 1 − δ,
$$\|x - \widehat{\varphi} * y\|_{n,2} \le \|x - \varphi^o * y\|_{n,2} + \frac{C\sigma}{\sqrt{2n+1}} \left( \sqrt{R \log\frac{2n+1}{\delta}} + \sqrt{\dim(S)} \right).$$

  • Consequence. Suppose that x is $(\lceil n/2 \rceil, \rho)$-recoverable, and let $R = 2\rho^2$. Then $\varphi^o = \phi^o * \phi^o$ satisfies $\|x - \varphi^o * y\|_{n,2} = O\big(\tfrac{\sigma\rho^2}{\sqrt{2n+1}}\big)$, whence
$$\|x - \widehat{\varphi} * y\|_{n,2} = O\left( \frac{\sigma\big(\rho^2 + \rho\sqrt{\log n} + \sqrt{\dim(S)}\big)}{\sqrt{2n+1}} \right).$$

SLIDE 17

Sketch of the proof of ℓ2-oracle inequality

SLIDE 18

Control of the cross-term

$$\widehat{\varphi} \in \mathop{\mathrm{Argmin}}_{\varphi \in C_n(\mathbb{Z})} \Big\{ \|y - \varphi * y\|_2^2 \;:\; \|F_n[\varphi]\|_1 \le \tfrac{R}{\sqrt{2n+1}} \Big\}.$$

  • $\varphi^o$ is feasible, so that $\|y - \widehat{\varphi} * y\|_2^2 \le \|y - \varphi^o * y\|_2^2$.

  • Expand the squares:
$$\|x - \widehat{\varphi} * y\|_2^2 = \|x - \varphi^o * y\|_2^2 + 2\sigma^2\,\mathrm{Re}\langle \xi, \widehat{\varphi} * \xi \rangle + [\dots]$$

  • Heuristic: replace the convolution in $\langle \xi, \widehat{\varphi} * \xi \rangle_n$ with the cyclic one ⊛:
$$\langle \xi, \widehat{\varphi} \circledast \xi \rangle = \langle F_n[\xi], F_n[\widehat{\varphi} \circledast \xi] \rangle \;\;\text{[Parseval]}\;\; = \sqrt{2n+1}\,\langle F_n[\xi], F_n[\widehat{\varphi}] \odot F_n[\xi] \rangle \;\;\text{[Diagonalization]}$$
$$\le \sqrt{2n+1}\,\|F_n[\xi]\|_\infty^2\,\|F_n[\widehat{\varphi}]\|_1 \;\;\text{[Young]}\;\; \le C R \log\frac{2n+1}{\delta}$$
with probability at least 1 − δ.

  • Rigorous argument: represent $\langle \xi, \widehat{\varphi} * \xi \rangle_n$ as a random process indexed by $\widehat{\varphi}$, and control its maximum on the ℓ1-ball.

SLIDE 19

Error decomposition

$$\|x - \widehat{\varphi} * y\|_2^2 \le \|x - \varphi^o * y\|_2^2 + 2\sigma\,\mathrm{Re}\langle \xi, x - \varphi^o * y \rangle - 2\sigma\,\mathrm{Re}\langle \xi, x - \widehat{\varphi} * y \rangle.$$

  • The $\widehat{\varphi}$-cross-term poses the main difficulty. It can be decomposed as
$$\langle \xi, x - \widehat{\varphi} * y \rangle = \langle \Pi_S \xi, x - \widehat{\varphi} * y \rangle + \sigma\langle \Pi_S^\perp \xi, \widehat{\varphi} * \xi \rangle + \langle \xi, \Pi_S^\perp[x - \widehat{\varphi} * x] \rangle,$$
where $\Pi_S$ is the projector onto S.

  • For the first term, we use Cauchy–Schwarz plus a χ²-deviation bound:
$$\mathrm{Re}\,\langle \Pi_S \xi, x - \widehat{\varphi} * y \rangle \le \|x - \widehat{\varphi} * y\|_2 \Big( \sqrt{2\dim(S)} + \sqrt{2\log\tfrac{1}{\delta}} \Big).$$

  • The second term $\langle \Pi_S^\perp \xi, \widehat{\varphi} * \xi \rangle$ is bounded similarly to $\langle \xi, \widehat{\varphi} * \xi \rangle$.

  • The third term vanishes due to the shift-invariance of S:
$$\Pi_S^\perp[x - \widehat{\varphi} * x] \equiv [\Pi_S^\perp x] - \widehat{\varphi} * [\Pi_S^\perp x] \equiv 0.$$

SLIDE 20

Summary

We summarize the risk multiplier for $\frac{\sigma}{\sqrt{2n+1}}$ (up to a constant factor):

                            Pointwise loss        ℓ2-loss
  Oracle                    ρ                     ρ
  (Adaptive) lower bound    ρ²√(log n)            ρ√(log n)∗
  (Adaptive) upper bound    ρ⁴√(log n)            ρ² + ρ√(log n) + √(dim S)

In fact, one can also control the pointwise loss for least-squares estimators, so that ρ⁴√(log n) can be replaced with ρ³ + ρ²√(log n) + ρ√(dim S).

∗Obtained via a simple argument from the corresponding pointwise bound.

SLIDE 21

Application: Recovery from an unknown shift-invariant subspace

SLIDE 22

Shift-invariant subspaces

Assume that x ∈ S ⊂ C∞(Z), a shift-invariant subspace with dim(S) = s. Equivalent formulations:

  • x satisfies a homogeneous difference equation of order s = dim(S):
$$[P(\Delta) x]_t \equiv 0, \quad t \in \mathbb{Z},$$
where $\Delta : [\Delta x]_t = x_{t-1}$ is the lag operator and P(z) is a polynomial with deg(P) = s.

  • x is an exponential polynomial of order s: for some r ≤ s,
$$x_t = \sum_{k=1}^{r} q_k(t)\, e^{\lambda_k t}, \quad \lambda_k \in \mathbb{C},$$
where $\deg(q_k) + 1$ is the multiplicity of the root $z_k = e^{\lambda_k}$ of P(z).

The unknown shift-invariant structure of x is encoded by S or, equivalently, by P.
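A quick numerical check of this equivalence (roots and coefficients are illustrative): an exponential polynomial is annihilated by the difference operator built from a polynomial P whose root at $e^{\lambda_k}$ has multiplicity deg(q_k) + 1.

```python
import numpy as np

t = np.arange(40)
lam1, lam2 = 0.3j, -0.1 + 0.5j

# Exponential polynomial of order s = 3: deg(q1) = 1 (double root), deg(q2) = 0.
x = (2.0 + 0.5 * t) * np.exp(lam1 * t) + 1.5 * np.exp(lam2 * t)

# P(z) with a double root at e^{lam1} and a simple root at e^{lam2}.
p = np.poly([np.exp(lam1), np.exp(lam1), np.exp(lam2)])  # coefficients, deg(P) = 3

# [P(Delta) x]_t = sum_j p[j] x_{t-j}, evaluated where the window fits.
residual = np.convolve(x, p, mode="valid")
print(np.max(np.abs(residual)))   # numerically zero (~1e-14), as claimed
```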

SLIDE 23

Recoverability for shift-invariant subspaces

Signals from shift-invariant subspaces admit oracle filters with ρ = ρ(s).

Theorem
Let x ∈ S, where S is a shift-invariant subspace with dim(S) = s. Then, for any n ≥ s, there exists a filter $\phi^o \in C_n(\mathbb{Z})$ which satisfies
$$x_t - [\phi^o * x]_t \equiv 0 \qquad \text{and} \qquad \|\phi^o\|_2 \le \sqrt{\frac{s}{2n+1}}.$$

  • A lower bound ρ(s) = Ω(√s) for $\phi^o = \phi^o(S)$ follows from parametric theory.
  • The result extends to signals close to S in the ‖·‖p-norm, encompassing general differential inequalities*: $\|P(D) f\|_{L_p} \le L$, $\deg(P) \le s$.

*[Juditsky and Nemirovski, 2010]

SLIDE 24

Recoverability for shift-invariant subspaces (cont.)

One-sided filters: $\phi^o \in C_n^+(\mathbb{Z}) = \{\varphi \in C_n(\mathbb{Z}) : \varphi_\tau = 0 \text{ for } \tau < 0\}$.

  • In this case, we consider “generalized harmonic oscillations”:
$$x_t = \sum_{k=1}^{r \le s} q_k(t)\, e^{i\omega_k t}, \quad \omega_k \in [0, 2\pi).$$

We improve over the state-of-the-art bound* $\|\phi^o\|_2 \le \sqrt{\frac{C s^3 \log(s+1)}{n+1}}$:

Theorem
Under the premise of the previous theorem, there exists $\phi^o \in C_n^+(\mathbb{Z})$ such that
$$x_t - [\phi^o * x]_t \equiv 0 \qquad \text{and} \qquad \|\phi^o\|_2 \le \sqrt{\frac{C s^2 \log(ns+1)}{n+1}}.$$

*[Juditsky and Nemirovski, 2013]

SLIDE 25

Recovery in ℓ2-loss on the whole domain

Goal: recover an ordinary harmonic oscillation on the whole range [−n, n]:
$$x_\tau = \sum_{k=1}^{s} C_k\, e^{i\omega_k \tau}, \quad \omega_k \in [0, 2\pi).$$

  • Atomic Soft Thresholding (AST)*: requires frequency separation by $\frac{2\pi}{2n+1}$.
  • One-sided recovery: ℓ2-oracle inequality + one-sided oracles.
  • Two-zone recovery: ℓ2-oracle inequality + a two-sided oracle in the center + one-sided oracles in the border zones of size n/(s log n).

                        Arbitrary frequencies           Separated frequencies
  AST                   O(n^{-1/4}) – slow rate         (σ/√n)·(s log n)^{1/2} – optimal
  One-sided recovery    (σ/√n)·s² log n                 (σ/√n)·[s + (s log n)^{1/2}]
  Two-zone recovery     (σ/√n)·s^{3/2} log n            (σ/√n)·[s + (s log n)^{1/2}]

*[Bhaskar et al., 2013; Tang et al., 2013]

SLIDE 26

Algorithmic implementation

SLIDE 27

Optimization problem

$$\min_{\varphi \in \Phi(r)} \{F(\varphi) + \mathrm{Pen}(\varphi)\},$$
where $F(\varphi) = \|F_n[y - y * \varphi]\|_\infty$ for uniform-fit recovery and $F(\varphi) = \|F_n[y - y * \varphi]\|_2^2$ for least-squares recovery, $\mathrm{Pen}(\varphi) := \mu\|F_n[\varphi]\|_1$, and $\Phi(r) := \{\varphi \in C_n(\mathbb{Z}) : \|F_n[\varphi]\|_1 \le r\}$.

  • Simple constraint / penalization after changing variables to $F_n[\varphi]$.
  • Large scale: n up to 10⁴ in signal processing and 10⁶–10⁹ in imaging.
  • A (sub-)gradient of F(ϕ) is computable in O(n log n) via the FFT and elementwise operations (see the sketch below).
  • Low accuracy: approximate solutions with medium accuracy in the objective are sufficient (more precisely later).

⇒ First-order proximal methods
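A sketch of the O(n log n) gradient computation for the least-squares objective, under our circular-convolution simplification (by Parseval, the unitary DFT inside the squared 2-norm can be dropped); the factor-of-2 Wirtinger-gradient convention is the one matching the real-valued case.

```python
import numpy as np

def grad_ls(phi, y):
    """Gradient of F(phi) = ||y - y (*) phi||_2^2 in O(m log m) via FFT.

    With C the circular-convolution-by-y matrix, F(phi) = ||y - C phi||^2
    and the (Wirtinger) gradient is -2 C^H (y - C phi); both C and C^H
    are applied by FFT, never formed explicitly.
    """
    Y = np.fft.fft(y)
    r = y - np.fft.ifft(Y * np.fft.fft(phi))               # residual y - y (*) phi
    return -2.0 * np.fft.ifft(np.conj(Y) * np.fft.fft(r))  # -2 C^H r
```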

SLIDE 28

Strategies

Least-squares recovery
$$\min_{\varphi \in \Phi(r)} \Big\{ \|F_n[y - y * \varphi]\|_2^2 + \mathrm{Pen}(\varphi) \Big\}.$$
  • Composite objective with Lipschitz-continuous gradient ∇F(ϕ);
  • Nesterov’s Fast Gradient Method: O(1/k²) convergence (see the sketch below).

Uniform-fit recovery
$$\min_{\varphi \in \Phi(r)} \{ \|F_n[y - y * \varphi]\|_\infty + \mathrm{Pen}(\varphi) \} = \min_{\varphi \in \Phi(r)} \max_{\psi \in \Phi(1)} \{ \langle \psi, y - y * \varphi \rangle + \mathrm{Pen}(\varphi) \}.$$
  • Convex–concave saddle-point problem; the smooth part is bilinear;
  • Composite Mirror Prox: O(1/k) convergence;
  • Non-Euclidean prox (ℓ1/ℓ2-norm), accuracy certificates, adaptive stepsize.

[Nesterov and Nemirovski, 2013; Juditsky and Nemirovski, 2011a,b; Nemirovski et al., 2010]
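As an illustration of the least-squares branch, a FISTA-style sketch after the change of variables z = F_n[ϕ]; in the circular model, and with the DFT scaling constants absorbed into μ (both simplifications of ours), convolution by y becomes coordinatewise multiplication and the prox of μ‖z‖₁ is complex soft-thresholding.

```python
import numpy as np

def fgm_penalized_ls(y, mu, iters=200):
    """Fast Gradient Method (FISTA form) for min_z ||Y - Y.z||_2^2 + mu*||z||_1,
    a Fourier-domain sketch of penalized least-squares recovery."""
    Y = np.fft.fft(y)
    L = 2.0 * np.max(np.abs(Y)) ** 2          # Lipschitz constant of the smooth part
    step = 1.0 / L
    z = np.zeros_like(Y); w = z.copy(); tk = 1.0
    for _ in range(iters):
        grad = -2.0 * np.conj(Y) * (Y - Y * w)          # gradient of ||Y - Y.w||^2
        u = w - step * grad
        # prox of (step*mu)*||.||_1 for complex vectors: shrink moduli, keep phases.
        z_next = u * np.maximum(1.0 - step * mu / np.maximum(np.abs(u), 1e-300), 0.0)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * tk ** 2)) / 2.0
        w = z_next + ((tk - 1.0) / t_next) * (z_next - z)
        z, tk = z_next, t_next
    return np.fft.ifft(z)                     # back to the filter phi
```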

SLIDE 29

Convergence

[Figure: absolute accuracy vs. iteration for the constrained uniform-fit problem solved by Composite Mirror Prox (CMP-ℓ2, CMP-ℓ2-Gap) and the constrained least-squares problem solved by the Fast Gradient Method (FGM-ℓ2, FGM-ℓ2-Gap).]

Convergence of the residual (95% upper confidence bound) for harmonic oscillations with s = 4 random frequencies, observed with SNR = 4. Dashed: online accuracy bounds via the accuracy-certificate technique.

SLIDE 30

Statistical accuracy: theoretical results

  • Recalling the statistical analysis of the adaptive estimators, we get:

Theorem
Approximate solutions $\widetilde{\varphi}$ with objective accuracy $\varepsilon_* = \sigma\rho^2$ for the uniform fit, or $\varepsilon_* = \sigma^2\rho^4$ for the least-squares fit, admit the same statistical guarantees as the exact solutions (up to a constant factor).

  • Combining this with the usual convergence guarantees for CMP and FGM, we obtain:

Corollary
To reach the threshold accuracy $\varepsilon_*$, in each case it suffices to perform $T_* = O(\|F_n[y]\|_\infty / \sigma)$ iterations of the suitable first-order algorithm (CMP or FGM).

SLIDE 31

Statistical accuracy: early stopping experiment

[Figure: ℓ2-error and CPU time vs. SNR for the Lasso, Coarse, and Fine estimators, in two signal-generation scenarios.]

Figure. Comparison of (CLS) with a σρ²-accurate solution (Coarse), a 0.01σρ²-accurate solution (Fine), and the oversampled Lasso estimator*. Two signal-generation scenarios are compared: 4 random frequencies on [0, 2π] (left) and 2 random pairs of (0.2π/n)-close frequencies (right).

*[Bhaskar et al., 2013]

SLIDE 32

Statistical accuracy: T∗ experiment

[Figure: T∗ vs. SNR for CMP-ℓ2 (left) and FGM-ℓ2 (right).]

Figure. Iteration at which the accuracy ε∗ is attained experimentally, for (CUF) (left) and (CLS) (right); signal with 4 random frequencies.

SLIDE 33

Constrained least-squares: phase transition

Constrained least-squares can be recast as (non-squared) ℓ2-minimization:
$$\min_{\varphi \in \Phi(r)} \mathrm{Res}_2(\varphi) := \|F_n[y - y * \varphi]\|_2.$$

The objective is non-smooth but can be minimized at the rate O(1/k²) by FGM:

  • Indeed, after k iterations of FGM applied to the “squared” problem,
$$\mathrm{Res}_2^2(\widetilde{\varphi}_k) - \mathrm{Res}_2^2(\varphi_*) \le \frac{Q}{k^2},$$
where $\varphi_*$ is any minimizer of $\mathrm{Res}_2^2(\cdot)$ on Φ(r) and Q is a constant.
  • Since $t \mapsto t^2$ is monotone on $t \ge 0$, $\varphi_*$ also minimizes $\mathrm{Res}_2(\cdot)$.
  • By the difference-of-squares formula,
$$\mathrm{Res}_2(\widetilde{\varphi}_k) - \mathrm{Res}_2(\varphi_*) \le \frac{Q}{\big(\mathrm{Res}_2(\widetilde{\varphi}_k) + \mathrm{Res}_2(\varphi_*)\big)\, k^2} \le \frac{Q}{2\,\mathrm{Res}_2(\varphi_*)\, k^2}.$$
(Note that this requires a “non-ideal” fit: $\mathrm{Res}_2(\varphi_*) > 0$.)

SLIDE 34

Constrained least-squares: phase transition (cont.)

  • We also have the usual O(1/k) rate, as in “Nesterov’s smoothing”:
$$\mathrm{Res}_2(\widetilde{\varphi}_k) \le \sqrt{\mathrm{Res}_2^2(\varphi_*) + \frac{Q}{k^2}} \le \mathrm{Res}_2(\varphi_*) + \frac{\sqrt{Q}}{k}.$$
  • To summarize,
$$\mathrm{Res}_2(\widetilde{\varphi}_k) - \mathrm{Res}_2(\varphi_*) \le \min\left\{ \frac{\sqrt{Q}}{k},\; \frac{Q}{2\,\mathrm{Res}_2(\varphi_*)\, k^2} \right\},$$
i.e. there is an “elbow” at $k \approx \frac{\sqrt{Q}}{2\,\mathrm{Res}_2(\varphi_*)}$. This is confirmed empirically:

[Figure: relative accuracy vs. iteration count for CMP-ℓ2 and FGM-ℓ2.]

Figure. Relative accuracy vs. iteration for (CLS) with the non-squared residual, solved with Mirror Prox and FGM (2 pairs of close frequencies, SNR = 4).

SLIDE 35

Conclusions and perspectives

SLIDE 36

Conclusions

  • We construct adaptive estimators for signals with unknown shift-invariant structure.
  • We prove statistical bounds on the pointwise and ℓ2-losses of the new estimators, and compare them with lower bounds.
  • We provide an efficient algorithmic implementation of the estimators.
  • As an application, we address the problem of signal recovery from a shift-invariant subspace without frequency-separation assumptions.

SLIDE 37

Perspectives

  • GPU implementation: gradient computations reduce to FFTs.
  • Generalization to indirect observations:
$$y_\tau = [a * x]_\tau + \sigma\xi_\tau,$$
where $a \in C_m(\mathbb{Z})$ is a known filter. Applications: inverse PDEs¹, fluorescence microscopy², exoplanet detection³, ...
    Challenge: adaptation to the “mutual coherency” of a and x.
  • Signal recovery on graphs⁴: domains other than Z.
    Applications: social-network analysis, sensor networks, ...
    Challenge: no FFT, so it is difficult to work in the Fourier domain.

¹[Cavalier et al., 2002], ²[Waters, 2009; Bissantz et al., 2015], ³[Fischer et al., 2015; Kim et al., 2017], ⁴[Sandryhaila and Moura, 2013]

SLIDE 38

Thank you for your attention!

Publications and preprints:

  • Z. Harchaoui, A. Juditsky, A. Nemirovski, D.O.
    Adaptive Signal Recovery by Convex Optimization. COLT 2015.
  • D.O., Z. Harchaoui, A. Juditsky, A. Nemirovski.
    Structure-Blind Signal Recovery. NIPS 2016. Extended version: arXiv:1607.05712.
  • D.O., Z. Harchaoui.
    Efficient First-Order Algorithms for Adaptive Signal Denoising. Submitted to ICML 2018.
  • D.O., Z. Harchaoui, A. Juditsky, A. Nemirovski.
    Adaptive Signal Recovery: an Overview. In preparation.
  • D.O., Z. Harchaoui, A. Juditsky, A. Nemirovski.
    Adaptive Signal Deconvolution by Convex Optimization. In preparation.

SLIDE 39

References I

Adams, R. A. and Fournier, J. J. (2003). Sobolev Spaces, volume 140. Academic Press.
Bhaskar, B., Tang, G., and Recht, B. (2013). Atomic norm denoising with applications to line spectral estimation. IEEE Trans. Signal Processing, 61(23):5987–5999.
Bickel, P., Ritov, Y., and Tsybakov, A. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist., 37(4):1705–1732.
Bissantz, K., Bissantz, N., and Proksch, K. (2015). Monitoring of significant changes over time in fluorescence microscopy imaging of living cells. Universitätsbibliothek Dortmund.
Brown, L. D., Low, M. G., et al. (1996). Asymptotic equivalence of nonparametric regression and white noise. The Annals of Statistics, 24(6):2384–2398.
Bühlmann, P. and Van De Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Science & Business Media.

SLIDE 40

References II

Cavalier, L., Golubev, G., Picard, D., Tsybakov, A., et al. (2002). Oracle inequalities for inverse problems. The Annals of Statistics, 30(3):843–874.
Donoho, D. L., Liu, R. C., and MacGibbon, B. (1990). Minimax risk over hyperrectangles, and implications. The Annals of Statistics, pages 1416–1437.
Fischer, D. A., Howard, A. W., Laughlin, G. P., Macintosh, B., Mahadevan, S., Sahlmann, J., and Yee, J. C. (2015). Exoplanet detection techniques. arXiv preprint arXiv:1505.06869.
Goldenshluger, A., Lepski, O., et al. (2011). Bandwidth selection in kernel density estimation: oracle inequalities and adaptive minimax optimality. The Annals of Statistics, 39(3):1608–1632.
Goldenshluger, A. and Nemirovski, A. (1997). Adaptive de-noising of signals satisfying differential inequalities. IEEE Transactions on Information Theory, 43(3):872–889.
Ibragimov, I. and Khasminskii, R. (1984). Nonparametric estimation of the value of a linear functional in Gaussian white noise. Theor. Probab. & Appl., 29:1–32.
Johnstone, I. (2011). Gaussian Estimation: Sequence and Multiresolution Models.

SLIDE 41

References III

Juditsky, A. and Nemirovski, A. (2009). Nonparametric denoising of signals with unknown local structure, I: Oracle inequalities. Appl. & Comput. Harmon. Anal., 27(2):157–179.
Juditsky, A. and Nemirovski, A. (2010). Nonparametric denoising signals of unknown local structure, II: Nonparametric function recovery. Appl. & Comput. Harmon. Anal., 29(3):354–367.
Juditsky, A. and Nemirovski, A. (2011a). First-order methods for nonsmooth convex large-scale optimization, I: general purpose methods. Optimization for Machine Learning, pages 121–148.
Juditsky, A. and Nemirovski, A. (2011b). First-order methods for nonsmooth convex large-scale optimization, II: utilizing problem structure. Optimization for Machine Learning, pages 149–183.
Juditsky, A. and Nemirovski, A. (2013). On detecting harmonic oscillations. Bernoulli, 23(2):1134–1165.
Juditsky, A. and Nemirovski, A. (2017). Near-optimality of linear recovery from indirect observations. arXiv preprint arXiv:1704.00835.

SLIDE 42

References IV

Kim, T. H., Lee, K. M., Schölkopf, B., and Hirsch, M. (2017). Online video deblurring via dynamic temporal blending network. In IEEE International Conference on Computer Vision (ICCV 2017).
Laurent, B. and Massart, P. (2000). Adaptive estimation of a quadratic functional by model selection. Ann. Statist., 28(5):1302–1338.
Lepski, O. (1991). On a problem of adaptive estimation in Gaussian white noise. Theory of Probability & Its Applications, 35(3):454–466.
Lepski, O. et al. (2015). Adaptive estimation over anisotropic functional classes via oracle approach. The Annals of Statistics, 43(3):1178–1242.
Lepski, O., Mammen, E., and Spokoiny, V. (1997). Optimal spatial adaptation to inhomogeneous smoothness: an approach based on kernel estimates with variable bandwidth selectors. The Annals of Statistics, pages 929–947.
Nadaraya, E. A. (1964). On estimating regression. Theory of Probability & Its Applications, 9(1):141–142.
Nemirovski, A. (1991). On non-parametric estimation of functions satisfying differential inequalities.

SLIDE 43

References V

Nemirovski, A., Onn, S., and Rothblum, U. (2010). Accuracy certificates for computational problems with convex structure. Mathematics of Operations Research, 35(1):52–78.
Nesterov, Y. and Nemirovski, A. (2013). On first-order algorithms for ℓ1/nuclear norm minimization. Acta Numerica, 22:509–575.
Sandryhaila, A. and Moura, J. M. (2013). Discrete signal processing on graphs. IEEE Transactions on Signal Processing, 61(7):1644–1656.
Tang, G., Bhaskar, B., and Recht, B. (2013). Near minimax line spectral estimation. In Information Sciences and Systems (CISS), 2013 47th Annual Conference on, pages 1–6. IEEE.
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol., 58(1):267–288.
Tsybakov, A. (2008). Introduction to Nonparametric Estimation. Springer.
Waters, J. C. (2009). Accuracy and precision in quantitative fluorescence microscopy. The Journal of Cell Biology, 185(7):1135–1148.
Watson, G. S. (1964). Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A, pages 359–372.