Regularization Parameter Estimation for Least Squares: Using the χ²-curve



  1. Regularization Parameter Estimation for Least Squares: Using the χ²-curve. Rosemary Renaut, Jodi Mead. Supported by NSF, Arizona State and Boise State. Harrachov, August 2007.

  2. Outline: Introduction; Methods; Examples; Chi-squared Method (Background, Algorithm, Single Variable Newton Method, Extension for General D: Generalized Tikhonov); Results; Conclusions; References.

  3. Regularized Least Squares for Ax = b
  - Ill-posed system: A ∈ R^{m×n}, b ∈ R^m, x ∈ R^n.
  - Generalized Tikhonov regularization with operator D on x:
    x̂ = argmin J(x) = argmin { ‖Ax − b‖²_{W_b} + ‖D(x − x_0)‖²_{W_x} }.  (1)
    Assume N(A) ∩ N(D) = ∅.
  - Statistically, W_b is the inverse covariance matrix for the data b.
  - Standard choice: W_x = λ² I, with λ the unknown penalty parameter.
  - Focus: how do we find λ?
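As a concrete illustration of (1), the minimizer can be obtained from the regularized normal equations. The following is a minimal NumPy sketch, not the authors' code; the function name and the assumption that W_b and W_x are supplied as SPD weight matrices are illustrative.

```python
import numpy as np

def generalized_tikhonov(A, b, D, W_b, W_x, x0):
    """Minimize ||A x - b||^2_{W_b} + ||D (x - x0)||^2_{W_x}.

    Solves the regularized normal equations
    (A^T W_b A + D^T W_x D) x = A^T W_b b + D^T W_x D x0.
    """
    lhs = A.T @ W_b @ A + D.T @ W_x @ D
    rhs = A.T @ W_b @ b + D.T @ W_x @ (D @ x0)
    return np.linalg.solve(lhs, rhs)
```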

  4. Standard Methods I: L-curve — find the corner
  1. Let r(λ) = (A(λ) − I_m) b, where the influence matrix is A(λ) = A (A^T W_b A + λ² D^T D)^{-1} A^T W_b. Plot log(‖Dx‖) against log(‖r(λ)‖) and find the corner.
  2. Trades off the two contributions.
  3. Expensive: requires a range of λ values.
  4. The GSVD makes the calculations efficient.
  5. Uses no statistical information. The curve may have no corner.
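A minimal sketch of generating L-curve points for the standard form (D = I, W_b = I); the λ grid, the function name, and the omission of corner detection are simplifications, not part of the slides.

```python
import numpy as np

def l_curve_points(A, b, lambdas):
    """Return (log ||x(lam)||, log ||A x(lam) - b||) for each lambda,
    i.e. the points plotted to locate the L-curve corner."""
    n = A.shape[1]
    pts = []
    for lam in lambdas:
        x = np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T @ b)
        pts.append((np.log(np.linalg.norm(x)),
                    np.log(np.linalg.norm(A @ x - b))))
    return np.array(pts)

# Example: scan a logarithmic grid of lambda values (an arbitrary choice).
# pts = l_curve_points(A, b, np.logspace(-6, 2, 50))
```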

  5. Standard Methods II: Generalized Cross-Validation (GCV)
  1. Minimizes the GCV function
     ‖b − A x(λ)‖²_{W_b} / [trace(I_m − A(W_x))]²,  W_x = λ² I_n,
     which estimates the predictive risk.
  2. Expensive: requires a range of λ values.
  3. The GSVD makes the calculations efficient.
  4. Uses statistical information. The GCV function may have multiple minima or be nearly flat.
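A minimal sketch of evaluating the GCV function for the standard form (W_b = I, D = I); the dense solve is for clarity only, whereas the slides recommend an SVD/GSVD-based implementation for efficiency.

```python
import numpy as np

def gcv(A, b, lam):
    """GCV function ||b - A x(lam)||^2 / [trace(I - A(lam))]^2
    for standard-form Tikhonov (W_b = I, D = I)."""
    m, n = A.shape
    K = np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T)  # (A^T A + lam^2 I)^{-1} A^T
    A_lam = A @ K                                            # influence matrix A(lam)
    r = A_lam @ b - b
    return (r @ r) / (m - np.trace(A_lam))**2
```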

  6. Standard Methods III: Unbiased Predictive Risk Estimation (UPRE)
  1. Minimizes the expected value of the predictive risk, i.e. minimizes the UPRE function
     ‖b − A x(λ)‖²_{W_b} + 2 trace(A(W_x)) − m.
  2. Expensive: requires a range of λ values.
  3. The GSVD makes the calculations efficient.
  4. Uses statistical information.
  5. A minimum is needed.
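The corresponding sketch of the UPRE function, under the same simplifying assumptions as the GCV sketch above (W_b = I, D = I, dense solve, unit noise variance).

```python
import numpy as np

def upre(A, b, lam):
    """UPRE function ||b - A x(lam)||^2 + 2 trace(A(lam)) - m
    for standard-form Tikhonov (W_b = I, D = I)."""
    m, n = A.shape
    K = np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T)
    A_lam = A @ K                       # influence matrix A(lam)
    r = A_lam @ b - b
    return r @ r + 2.0 * np.trace(A_lam) - m
```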

  7. An Illustrative Example: phillips Fredholm integral equation (Hansen), 10% error
  1. Add noise to b.
  2. Standard deviation σ_{b_i} = 0.01 |b_i| + 0.1 b_max.
  3. Covariance matrix C_b = σ_b² I_m = W_b^{-1}.
  4. σ_b² is the average of the σ_{b_i}².
  5. In the figure, − is the original b and ∗ the noisy data.
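A minimal sketch of this noise model; the random seed and function name are arbitrary, and b is assumed to come from the phillips test problem (e.g. Hansen's Regularization Tools).

```python
import numpy as np

def add_noise(b, rng=None):
    """Perturb b with sigma_i = 0.01*|b_i| + 0.1*max|b| Gaussian noise
    and return the noisy data plus the average variance used in C_b."""
    rng = np.random.default_rng(0) if rng is None else rng
    sigma = 0.01 * np.abs(b) + 0.1 * np.abs(b).max()
    b_noisy = b + sigma * rng.standard_normal(b.size)
    sigma2_avg = np.mean(sigma**2)          # C_b = sigma2_avg * I_m
    return b_noisy, sigma2_avg
```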

  8. An Illustrative Example: phillips Fredholm integral equation (Hansen), comparison with the new method
  1. Add noise to b.
  2. Standard deviation σ_{b_i} = 0.01 |b_i| + 0.1 b_max.
  3. Covariance matrix C_b = σ_b² I_m = W_b^{-1}.
  4. σ_b² is the average of the σ_{b_i}².
  5. − is the original b and ∗ the noisy data.
  6. Each method gives a different solution: o is the L-curve solution.
  7. + is the reference solution.

  9. General Result: Tikhonov (D = I). The cost functional at its minimum is a χ² random variable.
  Theorem (Rao 1973, Tarantola, Mead 2007). Let
  J(x) = (b − Ax)^T C_b^{-1} (b − Ax) + (x − x_0)^T C_x^{-1} (x − x_0),
  where
  - x and b are stochastic (they need not be normal),
  - the components of r = b − A x_0 are iid,
  - the matrices C_b = W_b^{-1} and C_x = W_x^{-1} are SPD.
  Then, for large m, the minimum value of J is a random variable that follows a χ² distribution with m degrees of freedom.

  10. Implication: find W_x such that J is a χ² random variable
  - The theorem implies m − √(2m) z_{α/2} < J(x̂) < m + √(2m) z_{α/2} for the (1 − α) confidence interval, with x̂ the solution.
  - Equivalently, when D = I,
    m − √(2m) z_{α/2} < r^T (A C_x A^T + C_b)^{-1} r < m + √(2m) z_{α/2}.
  - Having found W_x, the posterior inverse covariance matrix is W̃_x = A^T W_b A + W_x. Note that W_x is completely general.
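A minimal sketch of testing this interval for a given C_x; the function name and the use of scipy.stats.norm for z_{α/2} are the only ingredients not stated on the slide.

```python
import numpy as np
from scipy.stats import norm

def chi2_interval_holds(r, A, C_x, C_b, alpha=0.05):
    """Check m - sqrt(2m) z_{alpha/2} < r^T (A C_x A^T + C_b)^{-1} r
    < m + sqrt(2m) z_{alpha/2} for the residual r = b - A x0."""
    m = r.size
    J = r @ np.linalg.solve(A @ C_x @ A.T + C_b, r)
    half_width = np.sqrt(2.0 * m) * norm.ppf(1.0 - alpha / 2.0)
    return m - half_width < J < m + half_width
```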

  11. A New Algorithm for Estimating the Model Covariance
  Algorithm (Mead 07). Given the confidence interval parameter α, the initial residual r = b − A x_0, and an estimate of the data covariance C_b, find L_x which solves the nonlinear optimization:
  Minimize ‖L_x L_x^T‖²_F
  Subject to m − √(2m) z_{α/2} < r^T (A L_x L_x^T A^T + C_b)^{-1} r < m + √(2m) z_{α/2},
  with A L_x L_x^T A^T + C_b well-conditioned.
  This is expensive.
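One way to prototype this optimization is with a general-purpose constrained solver. The sketch below parametrizes L_x by its lower triangle and uses SLSQP; the parametrization, the solver choice, and the omission of the conditioning constraint are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def estimate_Lx(A, r, C_b, alpha=0.05):
    """Sketch: minimize ||L L^T||_F^2 over lower-triangular L subject to the
    chi^2 interval constraint on r^T (A L L^T A^T + C_b)^{-1} r."""
    m, n = A.shape
    half_width = np.sqrt(2.0 * m) * norm.ppf(1.0 - alpha / 2.0)
    tril = np.tril_indices(n)

    def unpack(v):
        L = np.zeros((n, n))
        L[tril] = v
        return L

    def objective(v):
        L = unpack(v)
        return np.linalg.norm(L @ L.T, 'fro')**2

    def chi2_value(v):
        L = unpack(v)
        return r @ np.linalg.solve(A @ L @ L.T @ A.T + C_b, r)

    cons = [{'type': 'ineq', 'fun': lambda v: chi2_value(v) - (m - half_width)},
            {'type': 'ineq', 'fun': lambda v: (m + half_width) - chi2_value(v)}]
    v0 = np.eye(n)[tril]                       # start from L = I
    res = minimize(objective, v0, method='SLSQP', constraints=cons)
    return unpack(res.x)
```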

  12. Single Variable Approach: seek an efficient, practical algorithm
  1. Let W_x = σ_x^{-2} I, where the regularization parameter is λ = 1/σ_x.
  2. Use the SVD U_b Σ_b V_b^T = W_b^{1/2} A, with singular values σ_1 ≥ σ_2 ≥ ... ≥ σ_p, and define s = U_b^T W_b^{1/2} r.
  3. Find σ_x such that
     m − √(2m) z_{α/2} < s^T diag( 1/(σ_x² σ_i² + 1) ) s < m + √(2m) z_{α/2}.
  4. Equivalently, find σ_x² such that
     F(σ_x) = s^T diag( 1/(1 + σ_x² σ_i²) ) s − m = 0.
  Scalar root finding: Newton's method.
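A minimal sketch of this single-variable χ² method; a bracketing root finder (brentq) is used here in place of the Newton iteration described on the slides, and the bracket upper bound is an arbitrary assumption.

```python
import numpy as np
from scipy.optimize import brentq

def chi2_sigma_x(A, b, x0, W_b_half, sigma_max=1e8):
    """Find sigma_x with F(sigma_x) = s^T diag(1/(sigma_x^2 sigma_i^2 + 1)) s - m = 0,
    then return (sigma_x, lambda) with lambda = 1/sigma_x.  D = I assumed."""
    r = b - A @ x0
    U, sv, _ = np.linalg.svd(W_b_half @ A, full_matrices=True)
    s = U.T @ (W_b_half @ r)
    sv_full = np.zeros(s.size)
    sv_full[:sv.size] = sv              # singular values, padded with zeros
    m = b.size

    def F(sigma_x):
        return np.sum(s**2 / (sv_full**2 * sigma_x**2 + 1.0)) - m

    # F is monotone decreasing; if F(0) < 0 no positive root exists (cf. slide 15).
    sigma_x = brentq(F, 0.0, sigma_max)
    return sigma_x, 1.0 / sigma_x
```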

  13. Extension to Generalized Tikhonov. Define
  x_GTik = argmin J_D(x) = argmin { ‖Ax − b‖²_{W_b} + ‖D(x − x_0)‖²_{W_x} }.  (2)
  Theorem. For large m, the minimum value of J_D is a random variable which follows a χ² distribution with m − n + p degrees of freedom.
  Proof. Use the generalized singular value decomposition of the pair (W_b^{1/2} A, W_x^{1/2} D).
  Goal: find W_x such that J_D is χ² with m − n + p degrees of freedom.

  14. Newton Root Finding, W_x = σ_x^{-2} I_p
  - GSVD of the pair (W_b^{1/2} A, D):
    W_b^{1/2} A = U [Υ; 0_{(m−n)×n}] X^T,  D = V [M, 0_{p×(n−p)}] X^T.
  - γ_i are the generalized singular values.
  - m̃ = m − n + p − Σ_{i=1}^p δ_{γ_i,0} s_i² − Σ_{i=n+1}^m s_i².
  - s̃_i = s_i / (γ_i² σ_x² + 1), i = 1, ..., p.
  - t_i = s̃_i γ_i.
  Solve F = 0, where F(σ_x) = s^T s̃ − m̃ and F′(σ_x) = −2 σ_x ‖t‖²₂.
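A minimal sketch of the Newton iteration on F, assuming the generalized singular values γ_i and the rotated residual components s_i (i = 1, ..., p) have already been obtained from a GSVD, with the constant contributions folded into m̃; the safeguard on negative steps and the stopping rule are assumptions.

```python
import numpy as np

def newton_sigma_x(gamma, s, m_tilde, sigma0=1.0, tol=1e-10, maxit=50):
    """Newton iteration for F(sigma) = sum_i s_i^2/(gamma_i^2 sigma^2 + 1) - m_tilde,
    using the closed-form derivative F'(sigma) = -2 sigma ||t||^2,
    with t_i = gamma_i s_i / (gamma_i^2 sigma^2 + 1)."""
    sigma = sigma0
    for _ in range(maxit):
        denom = gamma**2 * sigma**2 + 1.0
        F = np.sum(s**2 / denom) - m_tilde
        t = gamma * s / denom
        dF = -2.0 * sigma * np.sum(t**2)
        sigma_new = sigma - F / dF
        if sigma_new <= 0.0:                 # simple safeguard: halve instead
            sigma_new = 0.5 * sigma
        if abs(sigma_new - sigma) <= tol * abs(sigma):
            return sigma_new
        sigma = sigma_new
    return sigma
```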

  15. Observations
  - Initialization: GCV, UPRE, L-curve, and the χ² method all use the GSVD (or SVD).
  - The algorithm is cheap compared to GCV, UPRE, and the L-curve.
  - F is monotone decreasing (and even in σ_x).
  - Either a solution exists and is unique for positive σ, or no solution exists (F(0) < 0).

  16. Relationship to the Discrepancy Principle
  - The discrepancy principle can also be implemented by a Newton method.
  - It finds σ such that the regularized residual satisfies
    σ_b² = (1/m) ‖b − A x(σ)‖²₂.  (3)
  - In our notation this is
    Σ_{i=1}^p ( 1/(γ_i² σ² + 1) )² s_i² + Σ_{i=n+1}^m s_i² = m.  (4)
  - Similar to the χ² condition, but note that the weight in the first sum is squared in this case.
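For comparison, a minimal sketch of the discrepancy principle (3) for the standard form (W_b = I, D = I), again solved with a bracketing root finder rather than Newton; the bracket and function name are assumptions.

```python
import numpy as np
from scipy.optimize import brentq

def discrepancy_lambda(A, b, sigma_b, lam_max=1e8):
    """Choose lambda so that ||b - A x(lambda)||^2 / m = sigma_b^2."""
    m = A.shape[0]
    U, sv, _ = np.linalg.svd(A, full_matrices=True)
    beta = U.T @ b

    def G(lam):
        f = np.ones(m)                            # residual filter factors
        f[:sv.size] = lam**2 / (sv**2 + lam**2)
        return np.sum((f * beta)**2) / m - sigma_b**2

    return brentq(G, 0.0, lam_max)                # G is monotone increasing in lambda
```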

  17. Some Solutions: with no prior information x_0. The figures show solutions and error bars, with statistical information (C_b = diag(σ_{b_i}²)) and with no statistical information. The solution is smoothed.

  18. Some Generalized Tikhonov Solutions: first order derivative operator D. Figures show the cases C_b = diag(σ_{b_i}²) and no statistical information.

  19. Some Generalized Tikhonov Solutions: with prior x_0; the solution is not smoothed. Figures show the cases C_b = diag(σ_{b_i}²) and no statistical information.

  20. Some Generalized Tikhonov Solutions: x_0 = 0, exponential noise. Figures show the cases C_b = diag(σ_{b_i}²) and no statistical information.

  21. Newton's Method converges in 5–10 iterations

    l  cb  mean      std
    0  1   8.23e+00  6.64e-01
    0  2   8.31e+00  9.80e-01
    0  3   8.06e+00  1.06e+00
    1  1   4.92e+00  5.10e-01
    1  2   1.00e+01  1.16e+00
    1  3   1.00e+01  1.19e+00
    2  1   5.01e+00  8.90e-01
    2  2   8.29e+00  1.48e+00
    2  3   8.38e+00  1.50e+00

  Table: Convergence characteristics (mean and standard deviation of the iteration count k) for problem phillips with n = 40 over 500 runs.

  22. Newton's Method converges in 5–10 iterations

    l  cb  mean      std
    0  1   6.84e+00  1.28e+00
    0  2   8.81e+00  1.36e+00
    0  3   8.72e+00  1.46e+00
    1  1   6.05e+00  1.30e+00
    1  2   7.40e+00  7.68e-01
    1  3   7.17e+00  8.12e-01
    2  1   6.01e+00  1.40e+00
    2  2   7.28e+00  8.22e-01
    2  3   7.33e+00  8.66e-01

  Table: Convergence characteristics (mean and standard deviation of the iteration count k) for problem blur with n = 36 over 500 runs.

  23. Estimating the Error and Predictive Risk: Error

    l  cb  χ² mean   L-curve mean  GCV mean  UPRE mean
    0  2   4.37e-03  4.39e-03      4.21e-03  4.22e-03
    0  3   4.32e-03  4.42e-03      4.21e-03  4.22e-03
    1  2   4.35e-03  5.17e-03      4.30e-03  4.30e-03
    1  3   4.39e-03  5.05e-03      4.38e-03  4.37e-03
    2  2   4.50e-03  6.68e-03      4.39e-03  4.56e-03
    2  3   4.37e-03  6.66e-03      4.43e-03  4.54e-03

  Table: Error characteristics for problem phillips with n = 60 over 500 runs with error-contaminated x_0; relative errors larger than 0.009 removed. The results are comparable.

  24. Estimating the Error and Predictive Risk: Risk

    l  cb  χ² mean   L-curve mean  GCV mean  UPRE mean
    0  2   3.78e-02  5.22e-02      3.15e-02  2.92e-02
    0  3   3.88e-02  5.10e-02      2.97e-02  2.90e-02
    1  2   3.94e-02  5.71e-02      3.02e-02  2.74e-02
    1  3   1.10e-01  5.90e-02      3.27e-02  2.79e-02
    2  2   3.41e-02  6.00e-02      3.35e-02  3.79e-02
    2  3   3.61e-02  5.98e-02      3.35e-02  3.82e-02

  Table: Predictive risk characteristics for problem phillips with n = 60 over 500 runs. The χ² method does not give the best estimate of the risk.

  25. Estimating the Error and Predictive Risk: error histogram for normal noise on the right-hand side, first order derivative operator, C_b = σ² I.

  26. Estimating the Error and Predictive Risk: error histogram for exponential noise on the right-hand side, first order derivative operator, C_b = σ² I.

  27. Conclusions
  - The χ² Newton algorithm is cost effective.
  - It performs as well as (or better than) GCV and UPRE when statistical information is available.
  - It should be the method of choice when statistical information is provided.
  - The method can be adapted to find W_b if W_x is provided.
