

  1. Compressed sensing off-the-grid: The Fisher metric, support stability and optimal sampling bounds
Clarice Poon, University of Bath
Joint work with: Nicolas Keriven and Gabriel Peyré, École Normale Supérieure
February 6, 2019

  2. Outline
1. Compressed sensing off-the-grid
2. The Fisher metric and the minimum separation condition
3. Support stability for the subsampled problem
4. Ideas behind the proofs – Dual certificates
5. Removal of random signs assumption

  3–4. Compressed sensing [Candès, Romberg & Tao '06; Donoho '06]

Task: recover $a \in \mathbb{C}^N$ from $y = \Phi a$, where $\Phi \in \mathbb{C}^{m \times N}$ with $m \ll N$ and $a$ is $s$-sparse.

Typical compressed sensing statement: for certain random matrices $\Phi \in \mathbb{C}^{m \times N}$, with high probability, $a$ can be uniquely recovered from $m = O(s \log N)$ measurements by solving
$$\min_{z \in \mathbb{C}^N} \|z\|_1 \quad \text{subject to} \quad \Phi z = y,$$
or, in the noisy case $y = \Phi a + w$, the minimizer $\hat a$ of
$$\min_{z \in \mathbb{C}^N} \lambda \|z\|_1 + \frac{1}{2}\|\Phi z - y\|_2^2$$
with $\lambda \sim \delta/\sqrt{s}$ and $\|w\| \lesssim \delta$ satisfies
$$\|a - \hat a\|_1 \lesssim \sigma_s(a)_1 + \sqrt{s}\,\delta.$$

In the case where $U$ is unitary, the above statement holds with $\Phi = P_\Omega U$, where $\Omega$ is a set of $m = O(N \cdot \mu(U)^2 \cdot s \cdot \log N)$ uniformly drawn indices and $\mu(U) = \max_{i,j} |U_{ij}|$ is the so-called coherence. In the case where $U$ is the DFT, we have $\mu(U)^2 = 1/N$.
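As a concrete illustration of the noisy $\ell_1$ problem above, here is a minimal numpy sketch that solves $\min_z \lambda\|z\|_1 + \frac12\|\Phi z - y\|_2^2$ by iterative soft-thresholding (ISTA). The Gaussian ensemble, dimensions, and iteration count are arbitrary choices for the demo, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
m, N, s = 80, 256, 5

# A random Gaussian matrix, one of the standard CS ensembles.
Phi = rng.standard_normal((m, N)) / np.sqrt(m)

# s-sparse ground truth and noisy data y = Phi a + w with ||w|| = delta.
a = np.zeros(N)
a[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)
delta = 1e-3
g = rng.standard_normal(m)
y = Phi @ a + delta * g / np.linalg.norm(g)

# Solve min_z lam*||z||_1 + 0.5*||Phi z - y||_2^2 by ISTA.
lam = delta / np.sqrt(s)
L = np.linalg.norm(Phi, 2) ** 2       # Lipschitz constant of z -> Phi^T(Phi z - y)
z = np.zeros(N)
for _ in range(2000):
    u = z - Phi.T @ (Phi @ z - y) / L
    z = np.sign(u) * np.maximum(np.abs(u) - lam / L, 0.0)  # soft-threshold prox
print("recovery error ||z - a||_1 =", np.linalg.norm(z - a, 1))
```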

  5. Compressed sensing off the grid

Aim: recover $\mu_0 \in \mathcal{M}(X)$, $X \subseteq \mathbb{R}^d$, from $m$ observations
$$y = \Phi \mu_0 + w.$$
Let $(\Omega, \Lambda)$ be a probability space. For $\omega \in \Omega$, we have random features $\varphi_\omega \in \mathcal{C}(X)$. For $k = 1, \ldots, m$, let $\omega_k \overset{\text{iid}}{\sim} \Lambda$. The measurement operator is
$$\Phi : \mathcal{M}(X) \to \mathbb{C}^m, \qquad \Phi\mu \overset{\text{def.}}{=} \frac{1}{\sqrt{m}} \left( \int_X \varphi_{\omega_k}(x)\,\mathrm{d}\mu(x) \right)_{k=1}^m.$$
Typically, the measure of interest is $\mu_0 = \sum_{j=1}^s a_j \delta_{x_j}$, where $a\delta_x$ denotes the Dirac at $x \in X$ with amplitude $a \in \mathbb{C}$ (also called a "spike").
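A minimal sketch of how $\Phi$ acts on a discrete measure $\mu_0 = \sum_j a_j \delta_{x_j}$: for spikes, the integral reduces to a finite sum over the spike locations. The feature law $\Lambda$ and the Fourier-type feature $\varphi_\omega$ below are illustrative choices (the next slide lists concrete instances).

```python
import numpy as np

rng = np.random.default_rng(1)
m, s, d = 50, 3, 2

# mu_0 = sum_j a_j delta_{x_j}: spike amplitudes and locations (illustrative).
a = rng.standard_normal(s) + 1j * rng.standard_normal(s)
x = rng.random((s, d))

# omega_1, ..., omega_m drawn iid from Lambda; Lambda = N(0, I_d) as an example.
omega = rng.standard_normal((m, d))

def phi(w, xj):
    """A random feature phi_omega(x); Fourier features as a concrete instance."""
    return np.exp(-2j * np.pi * (xj @ w))

# For a discrete measure, the integral reduces to a finite sum:
# (Phi mu_0)_k = (1/sqrt(m)) * sum_j a_j phi_{omega_k}(x_j).
y0 = np.array([sum(a[j] * phi(omega[k], x[j]) for j in range(s))
               for k in range(m)]) / np.sqrt(m)
print(y0.shape)   # (m,)
```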

  6. Imaging

Sampling the Fourier transform (e.g. astronomy): recover $\mu \in \mathcal{M}(\mathbb{T}^d)$ from $(\mathcal{F}\mu(\omega_k))_{k=1}^m$, where $\mathcal{F}$ is the Fourier transform and the $\omega_k$ are drawn iid from $([-f_c, f_c]^d, \mathrm{Unif})$. Here $\varphi_\omega(x) = \exp(-\mathrm{i}2\pi x^\top \omega)$ and
$$\Phi\mu_0 = \frac{1}{\sqrt{m}} \left( \sum_{j=1}^s a_j \exp(-\mathrm{i}2\pi x_j^\top \omega_k) \right)_{k=1}^m.$$

Sampling the Laplace transform (e.g. fluorescence microscopy): recover $\mu \in \mathcal{M}(\mathbb{R}^d_+)$ from $(\mathcal{L}\mu(\omega_k))_{k=1}^m$, where $\mathcal{L}$ is the Laplace transform and the $\omega_k$ are drawn iid from $(\mathbb{R}^d_+, \Lambda_\alpha)$, where $\Lambda_\alpha(\omega) \propto \exp(-2\alpha^\top \omega)$. Here $\varphi_\omega(x) = \exp(-x^\top \omega)$ and
$$\Phi\mu_0 = \frac{1}{\sqrt{m}} \left( \sum_{j=1}^s a_j \exp(-x_j^\top \omega_k) \right)_{k=1}^m.$$
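To make the Laplace-transform instance concrete: $\Lambda_\alpha$ factorises coordinatewise into exponential distributions with rate $2\alpha$, so sampling the frequencies and forming the measurements takes a few lines. All numerical values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
m, s, d = 50, 3, 2
alpha = 1.0   # illustrative value of the density parameter

# Spikes on X = R_+^d.
a = rng.random(s)
x = rng.random((s, d))

# Lambda_alpha(omega) ∝ exp(-2 alpha^T omega) on R_+^d is, coordinatewise,
# an exponential distribution with rate 2*alpha (scale 1/(2*alpha)).
omega = rng.exponential(scale=1.0 / (2.0 * alpha), size=(m, d))

# (Phi mu_0)_k = (1/sqrt(m)) sum_j a_j exp(-x_j^T omega_k): a subsampled Laplace transform.
y0 = a @ np.exp(-(x @ omega.T)) / np.sqrt(m)
print(y0.shape)   # (m,)
```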

  7–8. Two-layer neural network [Bach, 2015]

Let $\Omega \subseteq \mathbb{R}^d$, and let $\omega_1, \ldots, \omega_m$ be training samples drawn from $(\Omega, \Lambda)$, with corresponding values $y_1, \ldots, y_m \in \mathbb{R}$. Find a function of the form
$$f(\omega) = \sum_{j=1}^s a_j \max(\langle x_j, \omega \rangle, 0),$$
where $a_j \in \mathbb{R}$ and $x_j \in \mathbb{R}^d$, such that $f(\omega_j) \approx y_j$ for $j = 1, \ldots, m$. We can then use the function $f$ to predict $y$ given $\omega \in \Omega$.

This is precisely our sparse spikes problem: let $\varphi_\omega(x) = \max(\langle x, \omega \rangle, 0)$, so that
$$\Phi\mu_0 = \left( \sum_{j=1}^s a_j \max(\langle x_j, \omega_k \rangle, 0) \right)_{k=1}^m,$$
where $\mu_0 = \sum_{j=1}^s a_j \delta_{x_j}$.
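A minimal sketch of the correspondence between a width-$s$ ReLU network and the spike measure $\mu_0$: evaluating the network at the training inputs is exactly applying $\Phi$ to $\mu_0$. All dimensions and distributions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
m, s, d = 100, 4, 3

# Hidden units of a width-s ReLU network, i.e. mu_0 = sum_j a_j delta_{x_j}.
a = rng.standard_normal(s)
x = rng.standard_normal((s, d))

def f(w):
    """f(omega) = sum_j a_j * max(<x_j, omega>, 0)."""
    return a @ np.maximum(x @ w, 0.0)

# Training inputs omega_1, ..., omega_m drawn from Lambda (standard normal here);
# the vector (f(omega_k))_k is the measurement Phi mu_0 of the slide above.
omegas = rng.standard_normal((m, d))
y = np.array([f(w) for w in omegas])
print(y.shape)   # (m,)
```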

  9–10. Density estimation

Task: given data on $T$, estimate the parameters $(a_i)_{i=1}^s \in \mathbb{R}^s_+$ and $(x_i)_{i=1}^s \in X^s$ of a mixture
$$\xi(t) = \sum_{j=1}^s a_j \xi_{x_j}(t) = \int_X \xi_x(t)\,\mathrm{d}\mu_0(x),$$
where $\mu_0 = \sum_j a_j \delta_{x_j}$ and $(\xi_x)_{x \in X}$ is a family of template distributions. E.g. $x = (m, \sigma) \in X = \mathbb{R} \times \mathbb{R}_+$ and $\xi_x = \mathcal{N}(m, \sigma^2)$.

Sketching [Gribonval, Blanchard, Keriven & Traonmilin, 2017]: there is no direct access to $\xi$, only $n$ iid samples $(t_1, \ldots, t_n) \in T^n$ drawn from $\xi$. You do not record this (possibly huge) set of data, but compute online a small set $y \in \mathbb{C}^m$ of $m$ sketches against sketching functions $\theta_\omega(t)$:
$$y_k \overset{\text{def.}}{=} \frac{1}{n} \sum_{j=1}^n \theta_{\omega_k}(t_j) \approx \int_T \theta_{\omega_k}(t)\,\xi(t)\,\mathrm{d}t = \int_X \left( \int_T \theta_{\omega_k}(t)\,\xi_x(t)\,\mathrm{d}t \right) \mathrm{d}\mu_0(x).$$
So $\varphi_\omega(x) \overset{\text{def.}}{=} \int_T \theta_\omega(t)\,\xi_x(t)\,\mathrm{d}t$. E.g. $\theta_\omega(t) = e^{\mathrm{i}\langle \omega, t \rangle}$, in which case $\varphi_\cdot(x)$ is the characteristic function of $\xi_x$.
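A small numerical sanity check of the sketching identity for a 1-D Gaussian mixture, where $\varphi_\omega(x)$ is the characteristic function $\exp(\mathrm{i}m\omega - \sigma^2\omega^2/2)$ of $\mathcal{N}(m, \sigma^2)$. The mixture parameters and frequency law are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m_sketch, s = 100_000, 20, 2

# Mixture of two 1-D Gaussians: x_j = (mean_j, sigma_j), weights a_j.
means  = np.array([-1.0, 2.0])
sigmas = np.array([0.5, 1.0])
a      = np.array([0.3, 0.7])

# n iid samples from xi (in a real sketching pipeline these are never stored).
comp = rng.choice(s, size=n, p=a)
t = rng.normal(means[comp], sigmas[comp])

# Empirical sketch y_k = (1/n) sum_j exp(i omega_k t_j) at random frequencies.
omega = rng.standard_normal(m_sketch)
y = np.exp(1j * np.outer(omega, t)).mean(axis=1)

# phi_omega(x): characteristic function of N(mean, sigma^2).
phi = np.exp(1j * np.outer(omega, means) - 0.5 * np.outer(omega**2, sigmas**2))
print(np.abs(y - phi @ a).max())   # O(1/sqrt(n)), small for large n
```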

  11–12. The Beurling LASSO

The BLASSO was initially proposed by [De Castro & Gamboa, 2012] and [Bredies & Pikkarainen, 2013]. Solve
$$\min_{\mu \in \mathcal{M}(X)} \frac{1}{2}\|\Phi\mu - y\|^2 + \lambda |\mu|(X), \qquad (\mathcal{P}_\lambda(y))$$
where $|\mu|(X) \overset{\text{def.}}{=} \sup \{ \operatorname{Re}(\langle f, \mu \rangle) \,;\, f \in \mathcal{C}(X),\, \|f\|_\infty \leqslant 1 \}$.

Noiseless problem: for $y_0 = \Phi\mu_0$,
$$\min_{\mu \in \mathcal{M}(X)} |\mu|(X) \quad \text{subject to} \quad \Phi\mu = y_0. \qquad (\mathcal{P}_0(y_0))$$
NB: if $\mu = \sum_j a_j \delta_{x_j}$, then $|\mu|(X) = \|a\|_1$.

Goal: a CS-type theory. Under what conditions can we recover $\mu_0 = \sum_{j=1}^s a_j \delta_{x_j}$ exactly (stably) from $m = O(s \times \text{log factors})$ (noisy) randomised linear measurements?

  13. Remarks

Other approaches include Prony-type methods (1795): MUSIC [Schmidt, 1986], ESPRIT [Roy, 1987], Finite Rate of Innovation [Vetterli, 2002], ...
- Nonvariational approaches which encode the spike positions as the zeros of some polynomial, whose coefficients are derived from the measurements.
- Generally restricted to Fourier-type measurements.
- Extension to the multivariate setting is nontrivial.

There are efficient algorithms for solving this infinite-dimensional problem, e.g. SDP approaches [Candès & Fernandez-Granda, 2012; De Castro, Gamboa, Henrion & Lasserre, 2015] and Frank–Wolfe approaches [Bredies & Pikkarainen, 2013; Boyd, Schiebinger & Recht '15; Denoyelle, Duval & Peyré '18], as sketched below.
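To illustrate the Frank–Wolfe idea on $(\mathcal{P}_\lambda(y))$: each step adds the grid location maximising the correlation $|\Phi^*(y - \Phi\mu)|$ with the residual, then refits the amplitudes. The least-squares refit below is a simplification of the fully corrective step used in the cited works (which would re-solve the LASSO on the support), and the Fourier setup, grid, and parameter values are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Deterministic Fourier measurements on X = [0, 1): an illustrative setup.
fc = 30
omega = np.arange(-fc, fc + 1)        # frequencies, m = 2*fc + 1
m = omega.size

def feat(xs):
    """Columns phi(x) = exp(-2i pi omega x) / sqrt(m), one per spike location."""
    return np.exp(-2j * np.pi * np.outer(omega, np.atleast_1d(xs))) / np.sqrt(m)

# Ground truth mu_0 = sum_j a_j delta_{x_j} and noisy data y = Phi mu_0 + w.
x0 = np.array([0.2, 0.5, 0.8])
a0 = np.array([1.0, -0.7, 0.5])
y = feat(x0) @ a0 + 1e-3 * rng.standard_normal(m)

lam = 1e-2
grid = np.linspace(0.0, 1.0, 2048, endpoint=False)   # candidate spike locations
support, amps = np.empty(0), np.empty(0)

for _ in range(10):                                  # a few greedy spike insertions
    residual = y - feat(support) @ amps
    corr = np.abs(feat(grid).conj().T @ residual)    # |Phi^*(y - Phi mu)| on the grid
    if corr.max() <= lam:                            # BLASSO dual optimality condition
        break
    support = np.append(support, grid[corr.argmax()])
    # Simplified corrective step: least-squares refit of the amplitudes.
    amps, *_ = np.linalg.lstsq(feat(support), y, rcond=None)

print(np.sort(support))   # approximately recovers x0 up to the grid resolution
```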

  14–15. Background on the BLASSO

Recovery of spikes of arbitrary signs requires a minimum separation condition. [Candès & Fernandez-Granda '12]: given $(\mathcal{F}\mu_0(k) \,;\, k \in \mathbb{Z}^d, \|k\|_\infty \leqslant f_c)$, $\mu_0$ can be recovered uniquely if
$$\Delta = \min_{i \neq j} \|x_i - x_j\|_\infty \geqslant \frac{C_d}{f_c}.$$
There are many extensions to other measurement operators; minimum separation is fundamental (for the BLASSO) and often imposed via ad hoc metrics [Bendory et al. '15; Tang '15].

Stability for the recovered measure $\hat\mu$:
- Integral-type stability estimates [Candès & Fernandez-Granda '13]: bounds on $\|K_{\mathrm{hi}} \star (\hat\mu - \mu_0)\|_{L^1}$.
- Support concentration [Fernandez-Granda '13; Azaïs, De Castro & Gamboa '12]: bounds on $|\hat\mu(X_j^{\mathrm{near}}) - a_j|$ and $|\hat\mu|(X^{\mathrm{far}})$.
- Support stability [Duval and Peyré '15]: in the small-noise regime where $\|w\|$ and $\lambda$ are sufficiently small, $\hat\mu$ consists of exactly $s$ spikes, and the recovered amplitudes and positions vary continuously with respect to $\lambda$ and $w$.
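A small helper for checking the separation $\Delta$ of a candidate spike configuration. The constant $C_d$ is not specified on the slide, and this sketch uses the plain $\ell_\infty$ distance, ignoring the wrap-around metric on the torus, as a simplification.

```python
import numpy as np

def min_separation(x):
    """Delta = min_{i != j} ||x_i - x_j||_inf for spike locations x of shape (s, d)."""
    diff = x[:, None, :] - x[None, :, :]
    dist = np.abs(diff).max(axis=-1)     # pairwise l_inf distances
    np.fill_diagonal(dist, np.inf)       # ignore i == j
    return dist.min()

# Three spikes in d = 2 and a frequency cutoff f_c (illustrative values).
x = np.array([[0.10, 0.20], [0.45, 0.70], [0.80, 0.15]])
fc = 20
print("Delta =", min_separation(x), "to be compared with C_d / f_c, f_c =", fc)
```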
