  1. A Stochastic Convergence Analysis for Tikhonov-Regularization with Sparsity Constraints. Daniel Gerth, Ronny Ramlau. Sparse Tomo Days, Lyngby, Denmark, 28.03.14. Doctoral Program Computational Mathematics: Numerical Analysis and Symbolic Computation. Gerth, Ramlau 1 / 34

  3. Overview: Introduction · Bayesian approach · Convergence theorem · Convergence rates · Numerical examples.

  4. Introduction. We study the solution of the linear ill-posed problem Ax = y with A ∈ L(X, Y), where X and Y are Hilbert spaces. We seek solutions x which are sparse w.r.t. a given orthonormal basis (ONB); the observed data is assumed to be noisy. Basic deterministic model:

      ||Ax − y^δ||² + α̂ Φ_{w,p}(x) → min over x,    (1)

  with penalty Φ_{w,p}(x) = Σ_{λ∈Λ} w_λ |⟨x, ψ_λ⟩|^p for an ONB {ψ_λ}.
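For p = 1 and unit weights, minimizers of functionals of type (1) are commonly computed by iterative soft thresholding (ISTA). A minimal sketch, not the authors' implementation; operator, data, and parameter values are invented:

```python
import numpy as np

def soft_threshold(v, t):
    # Componentwise soft thresholding, the proximal map of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y_delta, alpha, n_iter=500):
    # Minimize ||A x - y_delta||^2 + alpha * ||x||_1 by iterative soft thresholding.
    tau = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)  # step 1/L, L = Lipschitz const. of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - y_delta)       # gradient of the data-fit term
        x = soft_threshold(x - tau * grad, tau * alpha)
    return x
```

For an orthonormal discretization (A = I) this reduces to one soft-thresholding step of the data, which is the known closed-form minimizer.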

  6. Introduction: noise modelling, two different approaches.

  Deterministic: worst-case error ||y^δ − y|| ≤ δ; "easy" analysis; "fast" algorithms; δ hard to get.
  Stochastic: stochastic information, e.g. y^σ ∼ N(y, σ²), E(||y^σ − y||) = f(σ), ...; "hard" analysis; "slow" algorithms; parameters (σ) easy to get.

  We want to combine the advantages and find links between both branches. Question: Can we prove convergence (rates) for sparsity regularization if we use an explicit stochastic noise model instead of the worst-case error?

  7. Introduction: stochastic noise model. Based on discretization (computation also requires discretization), done via projections

      P_m : Y → ℝ^m, given e.g. by point evaluation,
      T_n : X → ℝ^n, T_n x = {⟨x, ψ_i⟩}_{i=1,...,n},

  where {ψ_i}_{i=1}^∞ is an ONB in X. Each component of the data carries stochastic noise, y^σ = y + ε, ε ∼ N(0, σ² I_m). Define the discretized operator 𝐀 := P_m A T_n^*; then we want to find x such that

      𝐀x = y^σ.    (2)
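A minimal simulation of this noise model (operator, dimensions, and names are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def observe(A, x, sigma):
    # Simulate discretized noisy data: y_sigma = A x + eps, eps ~ N(0, sigma^2 I_m),
    # i.e. every component of the discretized data carries independent Gaussian noise.
    y = A @ x
    return y + rng.normal(0.0, sigma, size=y.shape)
```

Since E(||ε||²) = m σ², Jensen's inequality gives E(||y^σ − y||) ≤ σ√m, one concrete instance of the stochastic information E(||y^σ − y||) = f(σ) mentioned on the previous slide.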

  10. Bayesian approach. We use Bayes' formula to characterize the solution. In this framework, every quantity is treated as a random variable in a complete probability space (Ω, F, P):

      π_post(x | y^σ) = π_ε(y^σ | x) π_pr(x) / π_{y^σ}(y^σ),

  where π_post(x | y^σ) is the posterior density, π_ε(y^σ | x) the likelihood function, π_pr(x) the prior distribution, and π_{y^σ}(y^σ) the data distribution (irrelevant for the maximization).

  Gaussian error model: π_ε ∝ exp(−(1/(2σ²)) ||𝐀x − y^σ||²). Now we need a prior.

  11. Bayesian approach: Besov spaces. We are looking for sparse reconstructions w.r.t. a basis in X; our choice is a Besov-space B^s_{p,p}(ℝ^d) prior. Reasons:

  "easy" characterization via the coefficients of a wavelet expansion;
  sparsity-promoting properties known, connection to TV regularization;
  discretization invariance (Lassas, Saksman, Siltanen '09), avoiding the following phenomena: solutions diverge as m → ∞; solutions diverge as n → ∞; the representation of a-priori knowledge is incompatible with discretization (this is the case, e.g., for a TV prior).

  12. Bayesian approach. We consider a wavelet basis suitable for multiresolution analysis. Let {ψ_λ : λ ∈ Λ} denote the set of all wavelets ψ, also including the scaling functions, where Λ is an appropriate, possibly infinite index set. Set |λ| = j for a wavelet on scale j; then x ∈ B^s_{p,p}(ℝ^d) ⊂ L²(ℝ^d), s < s̃, if

      ||x||_{B^s_{p,p}(ℝ^d)} := ( Σ_{λ∈Λ} 2^{ςp|λ|} |⟨x, ψ_λ⟩|^p )^{1/p} < ∞,

  with weights w_λ = 2^{ςp|λ|} and ς = s + d(1/2 − 1/p) ≥ 0. We focus on 1 ≤ p ≤ 2.
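The Besov norm above is simply a weighted ℓ^p norm of the wavelet coefficients. A small sketch; the per-scale coefficient container is a hypothetical representation, not from the talk:

```python
import numpy as np

def besov_norm(coeffs_by_level, s, p, d=1):
    # ||x||_{B^s_{p,p}} = ( sum_lambda 2^(varsigma*p*|lambda|) |<x, psi_lambda>|^p )^(1/p),
    # varsigma = s + d*(1/2 - 1/p).  coeffs_by_level maps the scale j = |lambda|
    # to the array of wavelet coefficients <x, psi_lambda> on that scale.
    varsigma = s + d * (0.5 - 1.0 / p)
    total = sum(2.0 ** (varsigma * p * j) * np.sum(np.abs(c) ** p)
                for j, c in coeffs_by_level.items())
    return total ** (1.0 / p)
```

For s = 0 and p = 2 the weight exponent ς vanishes and the norm reduces to the ℓ² norm of the coefficients, consistent with B⁰₂,₂ = L² for an ONB.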

  13. Bayesian approach: Besov-space random variables. Definition (adapted from Lassas/Saksman/Siltanen, 2009). Let 1 ≤ p < ∞ and s ∈ ℝ. Let X be the random function

      X(t) = Σ_{λ∈Λ} 2^{−ς|λ|} X^α_λ ψ_λ(t),    t ∈ ℝ^d,

  where the coefficients (X^α_λ)_{λ∈Λ} are independent, identically distributed real-valued random variables with probability density function

      π_{X^α_λ}(τ) = c_p (α/2)^{1/p} exp(−(α/2)|τ|^p),    c_p = p / (2Γ(1/p)),    τ ∈ ℝ.

  Then we say X is distributed according to a B^s_{p,p}-prior, X ∝ exp(−(α/2) ||X||^p_{B^s_{p,p}(ℝ^d)}).
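Coefficients for such a prior can be sampled directly: assuming the coefficient density is proportional to exp(−(α/2)|τ|^p) (a generalized normal law), |τ|^p is Gamma(1/p)-distributed with scale 2/α. A sketch, not code from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_prior_coeffs(n, p, alpha):
    # Draw n i.i.d. coefficients with density proportional to exp(-(alpha/2)|tau|^p),
    # using that |tau|^p is then Gamma(1/p)-distributed with scale 2/alpha.
    g = rng.gamma(shape=1.0 / p, scale=2.0 / alpha, size=n)  # g = |tau|^p
    signs = rng.choice([-1.0, 1.0], size=n)
    return signs * g ** (1.0 / p)
```

For p = 2 this is a Gaussian N(0, 1/α); for p = 1 it is a Laplace distribution, the classical sparsity-promoting choice.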

  15. Bayesian approach. "Problem": P(X ∈ B^s_{p,p}(ℝ^d)) = 0.

  Theorem (adapted from Lassas/Saksman/Siltanen, 2009). Let X be as before, 0 < α < ∞, and take r ∈ ℝ. Then the following three conditions are equivalent:
  (i) ||X||_{B^r_{p,p}(ℝ^d)} < ∞ almost surely;
  (ii) E( exp( ||X||^p_{B^r_{p,p}(ℝ^d)} ) ) < ∞;
  (iii) r < s − d/p.

  Same result as [LSS 2009], but here ℝ^d is considered instead of T^d.

  18. Bayesian approach. How to avoid this phenomenon?

  "Finite model" (MI): consider the discretization levels m and n fixed, with finite index set Λ_n. Then

      X_n(t) := Σ_{λ∈Λ_n} 2^{−ς|λ|} X^α_λ ψ_λ(t)  ⇒  ||X_n||^p_{B^s_{p,p}(ℝ^d)} = Σ_{λ∈Λ_n} |X^α_λ|^p < ∞,

  and P(||X_n||_{B^s_{p,p}(ℝ^d)} > ϱ) = Γ(n/p, αϱ^p/2) / Γ(n/p) ≤ 1.

  "Infinite model" (MII): define X(t) in B^r_{p,p}(ℝ^d) with s < r − d/p. Then

      E(||X||_{B^s_{p,p}(ℝ^d)}) = ( (c_1/(αp) + c_2) Σ_{j=0}^∞ 2^{−j((r−s)p−d)} )^{1/p} < ∞,

  and P(||X||_{B^s_{p,p}(ℝ^d)} > ϱ) ≤ (1/ϱ) E(||X||_{B^s_{p,p}(ℝ^d)}).
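The tail estimate in model MII is Markov's inequality, P(Z > ϱ) ≤ E(Z)/ϱ for nonnegative Z. A quick empirical sanity check with an arbitrarily chosen nonnegative distribution (not the Besov norm itself):

```python
import numpy as np

rng = np.random.default_rng(2)

# Markov's inequality: for a nonnegative random variable Z and rho > 0,
# P(Z > rho) <= E(Z) / rho.  Check empirically with an arbitrary Z >= 0.
z = rng.gamma(shape=2.0, scale=1.0, size=100_000)  # E(Z) = 2
rho = 5.0
empirical_tail = np.mean(z > rho)   # estimate of P(Z > rho)
markov_bound = z.mean() / rho       # estimate of E(Z) / rho
```

The bound is crude (here roughly 0.4 against a true tail near 0.04) but needs only the first moment, which is exactly what the MII expectation formula provides.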

  20. Bayesian approach. Recall

      π_post(x | y^σ) = π_ε(y^σ | x) π_pr(x) / π_{y^σ}(y^σ),

  with Gaussian noise π_ε(y^σ | x) and the Besov-space prior π_pr(x):

      π_post(x | y^σ) ∝ exp(−(1/(2σ²)) ||𝐀x − y^σ||²) · exp(−(α/2) ||x||^p_{B^s_{p,p}(ℝ^d)}).

  We are interested in the maximum a posteriori (MAP) solution x_map = argmax_{x∈ℝ^n} π_post(x | y^σ), or equivalently

      x_map = argmin_{x∈ℝ^n} ||𝐀x − y^σ||² + ασ² ||x||^p_{B^s_{p,p}(ℝ^d)},    (3)

  i.e., with α̂ = ασ², the same functional as in the deterministic case (1), but we only know E(||y − y^σ||) = f(σ).
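The equivalence of maximizing the posterior and minimizing the Tikhonov functional (3) can be checked numerically on a toy 1-D problem (all numbers invented):

```python
import numpy as np

# 1-D toy problem: A = 2, y_sigma = 1, sigma = 0.5, alpha = 3, p = 1.
A, y_sigma, sigma, alpha, p = 2.0, 1.0, 0.5, 3.0, 1.0

x = np.linspace(-2.0, 2.0, 4001)                      # grid of candidate solutions
posterior = np.exp(-(A * x - y_sigma) ** 2 / (2 * sigma ** 2)
                   - (alpha / 2) * np.abs(x) ** p)    # unnormalized posterior density
functional = (A * x - y_sigma) ** 2 + alpha * sigma ** 2 * np.abs(x) ** p

x_from_posterior = x[np.argmax(posterior)]
x_from_functional = x[np.argmin(functional)]
# Both select the same grid point: -2 sigma^2 * log(posterior) is exactly the
# Tikhonov functional, so maximizing one minimizes the other.
```

Taking −2σ² log of the posterior shows where the regularization parameter ασ² in (3) comes from: the Gaussian likelihood contributes the data-fit term, the prior contributes ασ² times the penalty.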

  22. Bayesian approach. The stochastic setting requires a different measure of convergence; we use the Ky Fan metric.

  Definition. Let x_1 and x_2 be random variables in a probability space (Ω, F, P) with values in a metric space (χ, d_χ). The distance between x_1 and x_2 in the Ky Fan metric is defined as

      ρ_K(x_1, x_2) := inf{ ε > 0 : P( d_χ(x_1(ω), x_2(ω)) > ε ) < ε }.

  The Ky Fan metric allows the combination of deterministic and stochastic quantities; it is a metric for convergence in probability.
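From i.i.d. samples of the distance d_χ(x_1, x_2), the Ky Fan distance can be approximated by applying the definition to the empirical distribution. The estimator below is ours, not from the talk; it scans the intervals between sorted samples for the smallest ε with empirical tail below ε:

```python
import numpy as np

def ky_fan_empirical(distances):
    # Approximate rho_K = inf{eps > 0 : P(d > eps) < eps} from i.i.d. samples
    # of d = d_chi(x1, x2), using the empirical tail probability.
    d = np.sort(np.asarray(distances, dtype=float))
    n = len(d)
    best = min(d[-1], 1.0)            # rho_K never exceeds 1 (probabilities are <= 1)
    prev = 0.0                        # left endpoint of the current interval
    for i in range(n):
        tail = (n - i) / n            # empirical P(distance > eps) for eps in (prev, d[i])
        cand = max(prev, tail)        # smallest eps in this interval beating its tail
        if cand < d[i]:
            best = min(best, cand)
        prev = d[i]
    return best
```

Because the condition uses P(d > ε) < ε, small distances with high probability give a small Ky Fan distance, which is exactly convergence in probability.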
