1. The Chi-squared Distribution of the Regularized Least Squares Functional for Regularization Parameter Estimation
Rosemary Renaut
Department of Mathematics and Statistics
GAMM Workshop 2008

2. Outline
1. Introduction
2. Statistical Results for Least Squares
3. Implications of Statistical Results for Regularized Least Squares
4. Newton algorithm
5. Results
6. Conclusions and Future Work
7. Further Results and More Details

3. Least Squares for Ax = b (Weighted)
Consider discrete systems with $A \in R^{m \times n}$, $b \in R^m$, $x \in R^n$:
$$A x = b + e,$$
where $e$ is the $m$-vector of random measurement errors with mean 0 and positive definite covariance matrix $C_b = E(ee^T)$. Assume that $C_b$ is known (it can be estimated when multiple samples of $b$ are given).
- For uncorrelated measurements, $C_b$ is the diagonal matrix of the error variances (colored noise).
- For correlated measurements, let $W_b = C_b^{-1}$, let $L_b L_b^T = W_b$ be the Cholesky factorization of $W_b$, and weight the equation:
$$L_b^T A x = L_b^T b + \tilde{e},$$
so that the components of $\tilde{e}$ are uncorrelated (white noise): $\tilde{e} \sim N(0, I)$, normally distributed with mean 0 and covariance $I$.
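To make the whitening step concrete, here is a minimal numpy/scipy sketch; the problem sizes, the covariance, and all variable names are invented for illustration. It factors $C_b = G G^T$ and applies $G^{-1}$ to both sides; since $W_b = C_b^{-1} = G^{-T} G^{-1}$, this is the same as weighting by $L_b^T$ with $L_b = G^{-T}$.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(0)
m, n = 50, 10
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)

# Correlated noise with a positive definite covariance C_b (synthetic).
B = rng.standard_normal((m, m))
C_b = B @ B.T / m + np.eye(m)
b = A @ x_true + rng.multivariate_normal(np.zeros(m), C_b)

# Factor C_b = G G^T; then G^{-1} e has identity covariance, so applying
# G^{-1} to the system whitens the noise.
G = cholesky(C_b, lower=True)
A_w = solve_triangular(G, A, lower=True)   # G^{-1} A
b_w = solve_triangular(G, b, lower=True)   # G^{-1} b

x_ls, *_ = np.linalg.lstsq(A_w, b_w, rcond=None)
```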

4. Weighted Regularized Least Squares for numerically ill-posed systems
Formulation:
$$\hat{x} = \arg\min_x J(x) = \arg\min_x \{ \|A x - b\|^2_{W_b} + \|x - x_0\|^2_{W_x} \}. \qquad (1)$$
- $x_0$ is a reference solution, often $x_0 = 0$.
- Standard choice: $W_x = \lambda^2 I$, with $\lambda$ an unknown penalty parameter.
- Statistically, $W_x$ is the inverse covariance matrix for the model $x$, i.e. $\lambda = 1/\sigma_x$ with $\sigma_x^2$ the common variance in $x$. This assumes the resulting estimates for $x$ are uncorrelated.
- $\hat{x}$ is the standard maximum a posteriori (MAP) estimate of the solution when all a priori information is provided.
The Problem: How do we find an appropriate regularization parameter $\lambda$? More generally, what is the correct $W_x$?
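Since (1) is quadratic in $x$, its minimizer has a closed form: setting the gradient to zero gives $(A^T W_b A + W_x)\hat{x} = A^T W_b b + W_x x_0$, i.e. $\hat{x} = x_0 + (A^T W_b A + W_x)^{-1} A^T W_b (b - A x_0)$. A minimal sketch with synthetic data follows; the dense direct solve is chosen for clarity, not as the talk's algorithm.

```python
import numpy as np

def map_estimate(A, b, W_b, W_x, x0):
    """Minimizer of ||A x - b||^2_{W_b} + ||x - x0||^2_{W_x}:
    x = x0 + (A^T W_b A + W_x)^{-1} A^T W_b (b - A x0)."""
    lhs = A.T @ W_b @ A + W_x
    rhs = A.T @ W_b @ (b - A @ x0)
    return x0 + np.linalg.solve(lhs, rhs)

# Illustrative use with W_x = lam^2 I (lam chosen arbitrarily here).
rng = np.random.default_rng(0)
m, n, lam = 30, 8, 0.1
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x_hat = map_estimate(A, b, np.eye(m), lam**2 * np.eye(n), np.zeros(n))
```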

6. The General Case: Generalized Tikhonov Regularization
Formulation: regularization with solution mapping. In generalized Tikhonov regularization an operator $D$ acts on $x$:
$$\hat{x} = \arg\min_x J_D(x) = \arg\min_x \{ \|A x - b\|^2_{W_b} + \|x - x_0\|^2_{W_D} \}. \qquad (2)$$
Assume invertibility: $N(A) \cap N(D) = \{0\}$. Then solutions depend on $W_D = \lambda^2 D^T D$:
$$\hat{x}(\lambda) = \arg\min_x J_D(x) = \arg\min_x \{ \|A x - b\|^2_{W_b} + \lambda^2 \|D(x - x_0)\|^2 \}. \qquad (3)$$
GOAL: Can we estimate $\lambda$ efficiently when $W_b$ is known? Use statistics of the solution to find $\lambda$.
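As a hedged illustration of (3) with $W_b = I$ (pre-whiten first, as on slide 3, otherwise), the penalized problem can be folded into a single stacked least squares solve. The first-difference operator used for $D$ below is a common textbook choice, not necessarily the operator used in the talk.

```python
import numpy as np

def gen_tikhonov(A, b, D, lam, x0):
    """Solve (3) with W_b = I by stacking:
    min ||A x - b||^2 + lam^2 ||D (x - x0)||^2
      = min ||[A; lam D] x - [b; lam D x0]||^2."""
    A_aug = np.vstack([A, lam * D])
    b_aug = np.concatenate([b, lam * (D @ x0)])
    return np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]

# First-difference operator D in R^{(n-1) x n}: penalizes roughness of x.
rng = np.random.default_rng(0)
m, n = 30, 10
D = np.diff(np.eye(n), axis=0)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x_hat = gen_tikhonov(A, b, D, lam=1.0, x0=np.zeros(n))
```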

8. Background: Statistics of the Least Squares Problem
Theorem (Rao73: First Fundamental Theorem). Let $r$ be the rank of $A$ and let $b \sim N(Ax, \sigma_b^2 I)$, i.e. the errors in the measurements are normally distributed with mean 0 and covariance $\sigma_b^2 I$. Then
$$J = \min_x \|A x - b\|^2 \sim \sigma_b^2 \chi^2(m - r),$$
that is, $J$ follows a $\chi^2$ distribution with $m - r$ degrees of freedom.
Corollary (Weighted Least Squares). For $b \sim N(Ax, C_b)$ and $W_b = C_b^{-1}$,
$$J = \min_x \|A x - b\|^2_{W_b} \sim \chi^2(m - r).$$
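The corollary is easy to check by simulation. This sketch (synthetic $A$, $x$, and noise level, all invented here) repeatedly draws $b$, computes the minimum weighted residual with $W_b = \sigma_b^{-2} I$, and compares the sample mean and variance with those of $\chi^2(m - r)$, namely $m - r$ and $2(m - r)$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 40, 8                      # A has full column rank here, so r = n
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
sigma = 0.5
N = 5000

J = np.empty(N)
for k in range(N):
    b = A @ x + sigma * rng.standard_normal(m)
    x_hat, res, *_ = np.linalg.lstsq(A, b, rcond=None)
    J[k] = res[0] / sigma**2      # weighted functional: W_b = sigma^{-2} I

print(J.mean(), m - n)            # should be close to m - r = 32
print(J.var(), 2 * (m - n))       # chi^2 variance 2(m - r) = 64
```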

10. Extension: Statistics of the Regularized Least Squares Problem
Theorem ($\chi^2$ distribution of the regularized functional):
$$\hat{x} = \arg\min_x J_D(x) = \arg\min_x \{ \|A x - b\|^2_{W_b} + \|x - x_0\|^2_{W_D} \}, \quad W_D = D^T W_x D. \qquad (4)$$
Assume:
- $W_b$ and $W_x$ are symmetric positive definite.
- The problem is uniquely solvable: $N(A) \cap N(D) = \{0\}$.
- The Moore-Penrose generalized inverse of $W_D$ is $C_D$.
- Statistics: $(b - A x) = e \sim N(0, C_b)$ and $(x - x_0) = f \sim N(0, C_D)$, where $x_0$ is the mean vector of the model parameters.
Then, with $D \in R^{p \times n}$,
$$J_D \sim \chi^2(m + p - n).$$
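The theorem can likewise be checked by simulation under its stated hypotheses: $x$ is drawn with mean $x_0$ and (degenerate) covariance $C_D = W_D^\dagger$, and $e \sim N(0, C_b)$. Everything below (the operator, sizes, $\lambda$, $\sigma$) is an illustrative choice; the sample mean of $J_D$ should approach $m + p - n$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 40, 8
D = np.diff(np.eye(n), axis=0)        # p x n first differences, p = n - 1
p = D.shape[0]
A = rng.standard_normal((m, n))
sigma, lam = 0.3, 2.0                 # true noise level and true lambda
x0 = rng.standard_normal(n)           # mean of the model parameters

# C_D^{1/2}: pseudo-inverse square root of W_D = lam^2 D^T D (rank p).
w, V = np.linalg.eigh(lam**2 * (D.T @ D))
s = np.zeros_like(w)
s[w > 1e-10] = 1.0 / np.sqrt(w[w > 1e-10])
CD_half = (V * s) @ V.T

A_aug = np.vstack([A / sigma, lam * D])
N = 4000
J = np.empty(N)
for k in range(N):
    f = CD_half @ rng.standard_normal(n)        # f ~ N(0, C_D)
    b = A @ (x0 + f) + sigma * rng.standard_normal(m)
    b_aug = np.concatenate([b / sigma, lam * (D @ x0)])
    x_hat = np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]
    J[k] = np.sum((A @ x_hat - b) ** 2) / sigma**2 \
         + lam**2 * np.sum((D @ (x_hat - x0)) ** 2)

# Central chi^2 (x0 is the true mean, so the non-centrality is zero).
print(J.mean(), m + p - n)            # both close to 39
print(J.var(), 2 * (m + p - n))       # variance close to 78
```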

11. Key Aspects of the Proof I: The Functional J
Algebraic simplifications: rewrite the functional as a quadratic form.
The regularized solution is given in terms of the resolution matrix $R(W_D)$:
$$\hat{x} = x_0 + (A^T W_b A + D^T W_x D)^{-1} A^T W_b r, \quad r = b - A x_0, \qquad (5)$$
$$\phantom{\hat{x}} = x_0 + R(W_D) W_b^{1/2} r = x_0 + y(W_D), \qquad (6)$$
$$R(W_D) = (A^T W_b A + D^T W_x D)^{-1} A^T W_b^{1/2}. \qquad (7)$$
The functional is given in terms of the influence matrix $A(W_D)$:
$$A(W_D) = W_b^{1/2} A R(W_D), \qquad (8)$$
$$J_D(\hat{x}) = r^T W_b^{1/2} (I_m - A(W_D)) W_b^{1/2} r, \quad \text{let } \tilde{r} = W_b^{1/2} r, \qquad (9)$$
$$\phantom{J_D(\hat{x})} = \tilde{r}^T (I_m - A(W_D)) \tilde{r}. \qquad (10)$$
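Identities (5)-(10) are straightforward to confirm numerically. The sketch below builds $R(W_D)$ and the influence matrix for a small random problem (all weights and sizes are illustrative) and checks that the quadratic form (10) reproduces the value of the functional at $\hat{x}$.

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(2)
m, n = 12, 6
A = rng.standard_normal((m, n))
D = np.diff(np.eye(n), axis=0)        # p x n, p = n - 1
p = D.shape[0]
x0 = rng.standard_normal(n)
b = rng.standard_normal(m)
W_b = 4.0 * np.eye(m)                 # any SPD weight; a scaled identity here
W_x = 1.5**2 * np.eye(p)

Wb_half = np.real(sqrtm(W_b))         # symmetric square root W_b^{1/2}
H = A.T @ W_b @ A + D.T @ W_x @ D
r = b - A @ x0
x_hat = x0 + np.linalg.solve(H, A.T @ W_b @ r)        # (5)
R = np.linalg.solve(H, A.T @ Wb_half)                 # resolution matrix (7)
assert np.allclose(x_hat, x0 + R @ (Wb_half @ r))     # (6)
Infl = Wb_half @ A @ R                                # influence matrix (8)

r_tilde = Wb_half @ r
J_direct = (A @ x_hat - b) @ W_b @ (A @ x_hat - b) \
         + (D @ (x_hat - x0)) @ W_x @ (D @ (x_hat - x0))
J_quad = r_tilde @ (np.eye(m) - Infl) @ r_tilde       # (10)
print(np.isclose(J_direct, J_quad))                   # True
```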

12. Key Aspects of the Proof II: Properties of a Quadratic Form
$\chi^2$ distribution of quadratic forms $x^T P x$ for normal variables (Fisher-Cochran theorem):
- Suppose the components $x_i$ are independent normal variables, $x_i \sim N(0, 1)$, $i = 1 : n$.
- A necessary and sufficient condition that $x^T P x$ has a central $\chi^2$ distribution is that $P$ is idempotent, $P^2 = P$. In that case the number of degrees of freedom of the $\chi^2$ distribution is $\mathrm{rank}(P) = \mathrm{trace}(P)$.
- When the means of the $x_i$ are $\mu_i \neq 0$, $x^T P x$ has a non-central $\chi^2$ distribution with non-centrality parameter $c = \mu^T P \mu$.
- A $\chi^2$ random variable with $n$ degrees of freedom and non-centrality parameter $c$ has mean $n + c$ and variance $2(n + 2c)$.
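A quick numerical illustration of both the central and non-central statements: take $P$ to be an orthogonal projector of rank $k$ (idempotent by construction) and sample standard normal vectors. The sizes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 10, 4
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
P = Q @ Q.T                           # idempotent: P @ P = P, trace(P) = k

samples = rng.standard_normal((20000, n))
q = np.einsum('ij,jk,ik->i', samples, P, samples)    # x^T P x per sample
print(q.mean(), k)                    # mean of chi^2(k) is k
print(q.var(), 2 * k)                 # variance is 2k

# Non-central case: shift the mean to mu; non-centrality c = mu^T P mu.
mu = rng.standard_normal(n)
c = mu @ P @ mu
qn = np.einsum('ij,jk,ik->i', samples + mu, P, samples + mu)
print(qn.mean(), k + c)               # mean is k + c
print(qn.var(), 2 * (k + 2 * c))      # variance is 2(k + 2c)
```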

13. Key Aspects of the Proof III: Requires the GSVD
Lemma. Assume invertibility and $m \ge n \ge p$. There exist unitary matrices $U \in R^{m \times m}$, $V \in R^{p \times p}$, and a nonsingular matrix $X \in R^{n \times n}$ such that
$$A = U \begin{bmatrix} \Upsilon \\ 0_{(m-n) \times n} \end{bmatrix} X^T, \qquad D = V [M, 0_{p \times (n-p)}] X^T, \qquad (11)$$
$$\Upsilon = \mathrm{diag}(\upsilon_1, \dots, \upsilon_p, 1, \dots, 1) \in R^{n \times n}, \qquad M = \mathrm{diag}(\mu_1, \dots, \mu_p) \in R^{p \times p},$$
$$0 \le \upsilon_1 \le \dots \le \upsilon_p \le 1, \quad 1 \ge \mu_1 \ge \dots \ge \mu_p > 0, \quad \upsilon_i^2 + \mu_i^2 = 1, \ i = 1, \dots, p. \qquad (12)$$
The Functional with the GSVD: let $\tilde{Q} = \mathrm{diag}(\mu_1, \dots, \mu_p, 0_{n-p}, I_{m-n})$. Then
$$J = \tilde{r}^T (I_m - A(W_D)) \tilde{r} = \| \tilde{Q} U^T \tilde{r} \|_2^2.$$
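The final identity can be verified by synthesizing $A$ and $D$ from random GSVD factors. The sketch below assumes $W_b = I$ and $W_x = I$ (i.e. $\lambda = 1$), in which case $\upsilon_i^2 + \mu_i^2 = 1$ gives $A^T A + D^T D = X X^T$ and the identity holds exactly; these simplifying choices are mine, not the talk's general setting.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, p = 9, 5, 3
U, _ = np.linalg.qr(rng.standard_normal((m, m)))     # orthogonal U
V, _ = np.linalg.qr(rng.standard_normal((p, p)))     # orthogonal V
X = rng.standard_normal((n, n)) + 2 * np.eye(n)      # almost surely nonsingular

mu = np.sort(rng.uniform(0.2, 1.0, p))[::-1]         # 1 >= mu_1 >= ... > 0
ups = np.sqrt(1 - mu**2)                             # upsilon_i^2 + mu_i^2 = 1
Ups = np.vstack([np.diag(np.concatenate([ups, np.ones(n - p)])),
                 np.zeros((m - n, n))])              # [Upsilon; 0] in (11)
M = np.hstack([np.diag(mu), np.zeros((p, n - p))])   # [M, 0] in (11)
A = U @ Ups @ X.T
D = V @ M @ X.T

# Influence matrix with W_b = I, W_x = I: A(W_D) = A (A^T A + D^T D)^{-1} A^T.
Infl = A @ np.linalg.solve(A.T @ A + D.T @ D, A.T)
r_tilde = rng.standard_normal(m)
Qt = np.concatenate([mu, np.zeros(n - p), np.ones(m - n)])   # diagonal of Q~
lhs = r_tilde @ (np.eye(m) - Infl) @ r_tilde
rhs = np.sum((Qt * (U.T @ r_tilde)) ** 2)            # ||Q~ U^T r~||_2^2
print(np.isclose(lhs, rhs))                          # True
```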
