
Unbiased Risk Estimation as Parameter Choice Rule for Filter-based Regularization Methods - PowerPoint PPT Presentation



  1. Unbiased Risk Estimation as Parameter Choice Rule for Filter-based Regularization Methods. Frank Werner¹, Statistical Inverse Problems in Biophysics Group, Max Planck Institute for Biophysical Chemistry, Göttingen, and Felix Bernstein Institute for Mathematical Statistics in the Biosciences, University of Göttingen. Chemnitz Symposium on Inverse Problems 2017 (on Tour in Rio), October 30, 2017. ¹ Joint work with Housen Li.

  2. Outline: (1) Introduction, (2) A posteriori parameter choice methods, (3) Error analysis, (4) Simulations, (5) Conclusion

  3. Introduction

  4.–7. Introduction: Statistical inverse problems
  Setting: $X, Y$ Hilbert spaces, $T : X \to Y$ bounded, linear
  Task: Recover the unknown $f \in X$ from noisy measurements $Y = Tf + \sigma \xi$
  Noise: $\xi$ is a standard Gaussian white noise process, $\sigma > 0$ the noise level
  The model has to be understood in a weak sense: $Y_g := \langle Tf, g \rangle_Y + \sigma \langle \xi, g \rangle$ for all $g \in Y$, with $\langle \xi, g \rangle \sim \mathcal{N}\left(0, \|g\|_Y^2\right)$ and $\mathbb{E}\left[\langle \xi, g_1 \rangle \langle \xi, g_2 \rangle\right] = \langle g_1, g_2 \rangle_Y$.
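In a discretized setting this model can be simulated directly. The following is a minimal sketch, not part of the talk: the integration-type forward operator, the test signal and the noise level are hypothetical choices made only for illustration.

```python
# Minimal sketch (illustrative only): a discretized analogue of Y = T f + sigma * xi.
import numpy as np

rng = np.random.default_rng(0)

n = 200
t = np.linspace(0, 1, n)

# Hypothetical forward operator: discretized integration, (T f)(s) = int_0^s f(r) dr.
T = np.tril(np.ones((n, n))) / n

f_true = np.sin(2 * np.pi * t)      # unknown signal f (toy choice)
sigma = 1e-3                        # noise level, assumed known exactly
xi = rng.standard_normal(n)         # discretized white noise, xi ~ N_n(0, I_n)

Y = T @ f_true + sigma * xi         # observed data
```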

  8.–11. Introduction: Statistical inverse problems
  Assumptions:
  • $T$ is injective and Hilbert-Schmidt ($\sum \sigma_k^2 < \infty$, $\sigma_k$ the singular values)
  • $\sigma$ is known exactly
  As the problem is ill-posed, regularization is needed. Consider filter-based regularization schemes $\hat f_\alpha := q_\alpha(T^*T)\, T^* Y$, $\alpha > 0$.
  Aim: an a posteriori choice of $\alpha$ such that the rate of convergence (as $\sigma \searrow 0$) is order optimal (no loss of log-factors).
  Note: heuristic parameter choice rules might work here as well, as the Bakushinskiĭ veto does not hold in our setting (Becker '11).
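To make the filter notation concrete, here is a minimal sketch, not taken from the talk, of computing $\hat f_\alpha = q_\alpha(T^*T)\, T^* Y$ through the singular value decomposition of a discretized operator. The Tikhonov filter $q_\alpha(\lambda) = (\lambda + \alpha)^{-1}$ and the spectral cut-off filter $q_\alpha(\lambda) = \lambda^{-1} \mathbf{1}_{\{\lambda \ge \alpha\}}$ serve as two standard examples; the operator, signal and noise level are the same hypothetical toy choices as in the sketch above.

```python
# Minimal sketch: filter-based regularization f_alpha = q_alpha(T*T) T* Y via the SVD,
# with Tikhonov and spectral cut-off filters as examples (illustrative setup only).
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.linspace(0, 1, n)
T = np.tril(np.ones((n, n))) / n              # hypothetical smoothing operator
f_true = np.sin(2 * np.pi * t)
sigma = 1e-3
Y = T @ f_true + sigma * rng.standard_normal(n)

U, s, Vt = np.linalg.svd(T)                   # T = U diag(s) Vt, singular values s_k
lam = s ** 2                                  # eigenvalues of T* T

def f_hat(alpha, filt="tikhonov"):
    """Reconstruction f_alpha = q_alpha(T*T) T* Y, expressed in the SVD basis."""
    if filt == "tikhonov":
        q = 1.0 / (lam + alpha)                       # q_alpha(lambda) = 1 / (lambda + alpha)
    elif filt == "cutoff":
        q = np.where(lam >= alpha, 1.0 / lam, 0.0)    # keep only lambda >= alpha
    else:
        raise ValueError("unknown filter")
    # coefficients q_alpha(s_k^2) * s_k * <Y, u_k> in the basis (v_k)
    return Vt.T @ (q * s * (U.T @ Y))

f_tik = f_hat(1e-4, "tikhonov")
f_cut = f_hat(1e-4, "cutoff")
```

Each a posteriori rule discussed in the following slides then only has to pick $\alpha$ from such a family of reconstructions.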

  12. A posteriori parameter choice methods

  13.–16. A posteriori parameter choice methods: The discrepancy principle
  • For deterministic data: $\alpha_{\mathrm{DP}} = \max\left\{ \alpha > 0 : \|T \hat f_\alpha - Y\|_Y \le \tau \sigma \right\}$
  • But here $Y \notin Y$! Either pre-smoothing ($Y \rightsquigarrow Z := T^* Y \in X$) ...
  • ... or discretization: $Y \in \mathbb{R}^n$, $\xi \sim \mathcal{N}_n(0, I_n)$, and choose $\alpha_{\mathrm{DP}} = \max\left\{ \alpha > 0 : \|T \hat f_\alpha - Y\|_2 \le \tau \sigma \sqrt{n} \right\}$
  Pros: easy to implement; works for all $q_\alpha$; order-optimal convergence rates.
  Cons: how to choose $\tau \ge 1$?; only meaningful after discretization; early saturation.
  Davies & Anderssen '86, Lukas '95, Blanchard, Hoffmann & Reiß '16
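A minimal sketch of the discretized discrepancy principle above: scan a candidate grid from large to small $\alpha$ and keep the largest value whose residual satisfies $\|T \hat f_\alpha - Y\|_2 \le \tau \sigma \sqrt{n}$. The grid, the value $\tau = 1.1$ and the Tikhonov filter are illustrative assumptions, not choices made in the talk.

```python
# Minimal sketch of the discretized discrepancy principle:
# alpha_DP = max{ alpha > 0 : ||T f_alpha - Y||_2 <= tau * sigma * sqrt(n) }.
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.linspace(0, 1, n)
T = np.tril(np.ones((n, n))) / n                   # same hypothetical setup as above
f_true = np.sin(2 * np.pi * t)
sigma = 1e-3
Y = T @ f_true + sigma * rng.standard_normal(n)

U, s, Vt = np.linalg.svd(T)
lam = s ** 2

def f_hat(alpha):                                  # Tikhonov filter as an example
    return Vt.T @ ((s / (lam + alpha)) * (U.T @ Y))

tau = 1.1                                          # tuning constant tau >= 1 (assumed)
alphas = np.logspace(0, -8, 81)                    # candidate grid, large to small

alpha_DP = None
for alpha in alphas:                               # first hit = largest admissible alpha
    if np.linalg.norm(T @ f_hat(alpha) - Y) <= tau * sigma * np.sqrt(n):
        alpha_DP = alpha
        break
```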

  17.–20. A posteriori parameter choice methods: The quasi-optimality criterion
  • Neubauer '08 (with $r_\alpha(\lambda) = 1 - \lambda q_\alpha(\lambda)$): $\alpha_{\mathrm{QO}} = \operatorname{argmin}_{\alpha > 0} \left\| r_\alpha(T^*T)\, \hat f_\alpha \right\|_X$
  • But for spectral cut-off $r_\alpha(T^*T)\, \hat f_\alpha = 0$ for all $\alpha > 0$
  • Alternative formulation for Tikhonov regularization if candidates $\alpha_1 < \dots < \alpha_m$ are given: $n_{\mathrm{QO}} = \operatorname{argmin}_{1 \le n \le m-1} \left\| \hat f_{\alpha_n} - \hat f_{\alpha_{n+1}} \right\|_X$, $\alpha_{\mathrm{QO}} := \alpha_{n_{\mathrm{QO}}}$.
  Pros: easy to implement, very fast; no knowledge of $\sigma$ necessary; order-optimal convergence rates in mildly ill-posed situations.
  Cons: only for special $q_\alpha$; additional assumptions on noise and/or $f$ necessary; performance unclear in severely ill-posed situations.
  Bauer & Kindermann '08, Bauer & Reiß '08, Bauer & Kindermann '09
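A minimal sketch of the quasi-optimality criterion in its alternative (discrete) formulation, $n_{\mathrm{QO}} = \operatorname{argmin}_n \|\hat f_{\alpha_n} - \hat f_{\alpha_{n+1}}\|$; note that it needs no knowledge of $\sigma$. The candidate grid and the Tikhonov filter are illustrative assumptions.

```python
# Minimal sketch of the quasi-optimality criterion over a grid alpha_1 < ... < alpha_m:
# n_QO = argmin_n || f_{alpha_n} - f_{alpha_{n+1}} ||,  alpha_QO = alpha_{n_QO}.
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.linspace(0, 1, n)
T = np.tril(np.ones((n, n))) / n                   # same hypothetical setup as above
f_true = np.sin(2 * np.pi * t)
sigma = 1e-3
Y = T @ f_true + sigma * rng.standard_normal(n)

U, s, Vt = np.linalg.svd(T)
lam = s ** 2

def f_hat(alpha):                                  # Tikhonov reconstruction
    return Vt.T @ ((s / (lam + alpha)) * (U.T @ Y))

alphas = np.logspace(-8, 0, 81)                    # alpha_1 < ... < alpha_m
recs = [f_hat(a) for a in alphas]

# distances between reconstructions for neighbouring parameters
diffs = [np.linalg.norm(recs[k] - recs[k + 1]) for k in range(len(alphas) - 1)]
n_QO = int(np.argmin(diffs))                       # 0-based index of the minimizer
alpha_QO = alphas[n_QO]
```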

  21. A posteriori parameter choice methods: The Lepskiĭ-type balancing principle
  • For given $\alpha$, the standard deviation of $\hat f_\alpha$ can be bounded by $\mathrm{std}(\alpha) := \sigma \sqrt{\operatorname{Tr}\left( q_\alpha(T^*T)^2\, T^*T \right)}$
  • If candidates $\alpha_1 < \dots < \alpha_m$ are given: $n_{\mathrm{LEP}} = \max\left\{ j : \left\| \hat f_{\alpha_j} - \hat f_{\alpha_k} \right\|_X \le 4 \kappa\, \mathrm{std}(\alpha_k) \text{ for all } 1 \le k \le j \right\}$ and $\alpha_{\mathrm{LEP}} = \alpha_{n_{\mathrm{LEP}}}$
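A minimal sketch of this balancing principle, with $\mathrm{std}(\alpha) = \sigma \sqrt{\sum_k q_\alpha(\sigma_k^2)^2\, \sigma_k^2}$ evaluated through the singular values. The grid, the constant $\kappa = 1$ and the Tikhonov filter are illustrative assumptions, not values from the talk.

```python
# Minimal sketch of the Lepskii-type balancing principle:
# std(alpha) = sigma * sqrt( sum_k q_alpha(s_k^2)^2 * s_k^2 ),
# n_LEP = max{ j : ||f_{alpha_j} - f_{alpha_k}|| <= 4*kappa*std(alpha_k) for all k <= j }.
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.linspace(0, 1, n)
T = np.tril(np.ones((n, n))) / n                   # same hypothetical setup as above
f_true = np.sin(2 * np.pi * t)
sigma = 1e-3
Y = T @ f_true + sigma * rng.standard_normal(n)

U, s, Vt = np.linalg.svd(T)
lam = s ** 2

def q(alpha):                                      # Tikhonov filter q_alpha(lambda)
    return 1.0 / (lam + alpha)

def f_hat(alpha):
    return Vt.T @ ((q(alpha) * s) * (U.T @ Y))

alphas = np.logspace(-8, 0, 41)                    # alpha_1 < ... < alpha_m
recs = [f_hat(a) for a in alphas]
stds = [sigma * np.sqrt(np.sum(q(a) ** 2 * lam)) for a in alphas]

kappa = 1.0                                        # balancing constant (assumed)
n_LEP = 0
for j in range(len(alphas)):                       # largest j for which the balance holds
    if all(np.linalg.norm(recs[j] - recs[k]) <= 4 * kappa * stds[k] for k in range(j + 1)):
        n_LEP = j
alpha_LEP = alphas[n_LEP]
```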
