 
Introduction and Motivation   Contribution/Results of the Proposed Work   Summary
Efficient Monte Carlo computation of Fisher information matrix using prior information
Sonjoy Das, UB
James C. Spall, APL/JHU
Roger Ghanem, USC
SIAM Conference on Data Mining, Anaheim, California, USA, April 26–28, 2012
Sonjoy Das, James C. Spall, Roger Ghanem: Improved resampling algorithm
Outline
1 Introduction and Motivation
  - General discussion of Fisher information matrix
  - Current resampling algorithm: no use of prior information
2 Contribution/Results of the Proposed Work
  - Improved resampling algorithm: using prior information
  - Theoretical basis
  - Numerical illustrations
Significance of Fisher Information Matrix
- Fundamental role of data analysis is to extract information from data
- Parameter estimation for models is central to process of extracting information
- The Fisher information matrix plays a central role in parameter estimation for measuring information: the information matrix summarizes the amount of information in the data relative to the parameters being estimated
Problem Setting
- Consider the classical problem of estimating a parameter vector, $\theta$, from $n$ data vectors, $Z_n \equiv \{Z_1, \cdots, Z_n\}$
- Suppose we have a probability density or mass function (PDF or PMF) associated with the data
- The parameter, $\theta$, appears in the PDF or PMF and affects the nature of the distribution
  Example: $Z_i \sim N(\mu(\theta), \Sigma(\theta))$, for all $i$
- Let $\ell(\theta \mid Z_n)$ denote the likelihood function, i.e., $\ell(\cdot)$ is the PDF or PMF viewed as a function of $\theta$ conditioned on the data
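A minimal sketch of the log of such a likelihood for the Gaussian example, assuming scalar i.i.d. data and the illustrative parameterization $\theta = (\mu, \sigma^2)$. The function name `log_likelihood` and this parameterization are assumptions made for illustration, not taken from the slides.

```python
import numpy as np

def log_likelihood(theta, z):
    """Log-likelihood ln l(theta | Z_n) for i.i.d. scalar Z_i ~ N(mu, sigma2).

    theta = (mu, sigma2) is an assumed, illustrative parameterization.
    """
    mu, sigma2 = theta
    z = np.asarray(z, dtype=float)
    n = z.size
    # Sum of the n Gaussian log-densities, viewed as a function of theta.
    return (-0.5 * n * np.log(2.0 * np.pi * sigma2)
            - 0.5 * np.sum((z - mu) ** 2) / sigma2)

rng = np.random.default_rng(0)
z = rng.normal(loc=1.0, scale=2.0, size=50)  # data with mu = 1, sigma2 = 4
print(log_likelihood((1.0, 4.0), z))
```

The same data are held fixed while `theta` varies, which is what "viewed as a function of $\theta$ conditioned on the data" means in practice.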
Selected Applications
Information matrix is a measure of performance for several applications. Five uses are:
1 Confidence regions for parameter estimation
  - Uses asymptotic normality and/or Cramér-Rao inequality
2 Prediction bounds for mathematical models
3 Basis for "D-optimal" criterion for experimental design
  - Information matrix serves as a measure of how well $\theta$ can be estimated for a given set of inputs
4 Basis for "noninformative prior" in Bayesian analysis
  - Sometimes used for "objective" Bayesian inference
5 Model selection
  - Is model A "better" than model B?
Information Matrix
- Recall the likelihood function, $\ell(\theta \mid Z_n)$, and define the log-likelihood function by $L(\theta \mid Z_n) \equiv \ln \ell(\theta \mid Z_n)$
- Information matrix defined as
  \[ F_n(\theta) \equiv E\left[ \left. \frac{\partial L}{\partial \theta} \cdot \frac{\partial L}{\partial \theta^T} \,\right|\, \theta \right] \]
  where the expectation is w.r.t. the measure of $Z_n$
- If the Hessian matrix exists, equivalent form based on the Hessian matrix:
  \[ F_n(\theta) = -E\left[ \left. \frac{\partial^2 L}{\partial \theta \, \partial \theta^T} \,\right|\, \theta \right] \]
- $F_n(\theta)$ is positive semidefinite of dimension $p \times p$, $p = \dim(\theta)$
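The gradient (outer-product) form of the definition can be checked numerically on a case where the answer is known. The sketch below assumes i.i.d. scalar data $Z_i \sim N(\mu, \sigma^2)$ with $\theta = (\mu, \sigma^2)$, for which the exact matrix is $\mathrm{diag}(n/\sigma^2,\; n/(2\sigma^4))$. The function names and the choice of example are illustrative assumptions.

```python
import numpy as np

def score(theta, z):
    """Gradient of the Gaussian log-likelihood w.r.t. theta = (mu, sigma2)."""
    mu, sigma2 = theta
    r = z - mu
    d_mu = np.sum(r) / sigma2
    d_s2 = -0.5 * z.size / sigma2 + 0.5 * np.sum(r ** 2) / sigma2 ** 2
    return np.array([d_mu, d_s2])

def fim_outer_product(theta, n, n_rep, rng):
    """Average the score outer product over n_rep simulated data sets."""
    mu, sigma2 = theta
    acc = np.zeros((2, 2))
    for _ in range(n_rep):
        z = rng.normal(mu, np.sqrt(sigma2), size=n)  # draw Z_n given theta
        g = score(theta, z)
        acc += np.outer(g, g)
    return acc / n_rep

def fim_exact(theta, n):
    """Closed-form information matrix for this Gaussian example."""
    _, sigma2 = theta
    return np.diag([n / sigma2, n / (2.0 * sigma2 ** 2)])

rng = np.random.default_rng(1)
theta = (1.0, 4.0)
print(fim_outer_product(theta, n=30, n_rep=20000, rng=rng))
print(fim_exact(theta, n=30))
```

The Monte Carlo average should approach the closed form as `n_rep` grows; the off-diagonal entries fluctuate around zero.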
Two Famous Results
Connection of $F_n(\theta)$ and uncertainty in the estimate, $\hat{\theta}_n$, is rigorously specified via the following results ($\theta^*$ = true value of $\theta$):
1 Asymptotic normality:
  \[ \sqrt{n}\,\bigl(\hat{\theta}_n - \theta^*\bigr) \xrightarrow{\text{dist}} N_p\bigl(0, \bar{F}^{-1}\bigr), \quad \text{where } \bar{F} \equiv \lim_{n \to \infty} F_n(\theta^*)/n \]
2 Cramér-Rao inequality:
  \[ \operatorname{cov}(\hat{\theta}_n) \ge F_n(\theta^*)^{-1}, \quad \text{for all } n \ (\text{unbiased } \hat{\theta}_n) \]
Above two results indicate: greater variability in $\hat{\theta}_n$ $\Longrightarrow$ "smaller" $F_n(\theta)$ (and vice versa)
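As a worked illustration of both results (a standard textbook case, not taken from the slides), take i.i.d. scalar data $Z_i \sim N(\mu, \sigma^2)$ with $\sigma^2$ known and $\theta = \mu$. Here $\bar{Z}_n$ denotes the sample mean, which is unbiased for $\mu$:

```latex
\begin{align*}
L(\mu \mid Z_n) &= -\frac{n}{2}\ln(2\pi\sigma^2)
                   - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(Z_i-\mu)^2, \\
\frac{\partial^2 L}{\partial \mu^2} &= -\frac{n}{\sigma^2}
  \;\Longrightarrow\; F_n(\mu) = \frac{n}{\sigma^2},
  \qquad \bar{F} = \lim_{n\to\infty}\frac{F_n(\mu)}{n} = \frac{1}{\sigma^2}, \\
\operatorname{var}(\bar{Z}_n) &= \frac{\sigma^2}{n} = F_n(\mu)^{-1}.
\end{align*}
```

In this case the sample mean attains the Cramér-Rao bound for every $n$, and $\sqrt{n}(\bar{Z}_n - \mu)$ is exactly $N(0, \bar{F}^{-1}) = N(0, \sigma^2)$.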
Monte Carlo Computation of Information Matrix
- Analytical formula for $F_n(\theta)$ requires first or second derivative and expectation calculation
  - Often impossible or very difficult to compute in practical applications
  - Involves expected value of highly nonlinear (possibly unknown) functions of data
- Schematic next summarizes "easy" Monte Carlo-based method for determining $F_n(\theta)$
  - Uses averages of very efficient (simultaneous perturbation) Hessian estimates
  - Hessian estimates evaluated at artificial (pseudo) data
  - Computational horsepower instead of analytical analysis
Schematic of Monte Carlo Method for Estimating Information Matrix
[Figure: flow diagram. Input → pseudo data $Z_{\text{pseudo}}(1), \ldots, Z_{\text{pseudo}}(N)$ → Hessian estimates $\hat{H}_1^{(i)}, \ldots, \hat{H}_M^{(i)}$ for each $i = 1, \ldots, N$ → average of the $\hat{H}_k^{(i)}$, denoted $\bar{H}^{(i)}$ → negative average of $\bar{H}^{(1)}, \ldots, \bar{H}^{(N)}$ → FIM estimate, $\bar{F}_{M,N}$]
(Spall, 2005, JCGS)
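The flow in the schematic can be sketched as a double loop over pseudo data sets and Hessian estimates. This is a minimal sketch under stated assumptions: `fim_monte_carlo`, `gen_pseudo`, and `hess_est` are hypothetical names, and the demonstration swaps in the exact analytic Hessian of a scalar Gaussian log-likelihood with $\theta = (\mu, \sigma^2)$ in place of a simultaneous perturbation estimate, so that the result can be compared with the known matrix $\mathrm{diag}(n/\sigma^2,\; n/(2\sigma^4))$.

```python
import numpy as np

def fim_monte_carlo(theta, gen_pseudo, hess_est, N, M, rng):
    """Negative average over N pseudo data sets of the mean of M Hessian estimates."""
    p = len(theta)
    total = np.zeros((p, p))
    for _ in range(N):
        z = gen_pseudo(theta, rng)           # Z_pseudo(i), drawn at theta
        h_bar = np.zeros((p, p))
        for _ in range(M):
            h_bar += hess_est(theta, z, rng)  # Hessian estimate k for data set i
        total += h_bar / M                   # average for data set i
    return -total / N                        # FIM estimate

# Demonstration (assumed example): i.i.d. scalar Gaussian data, n = 30.
def gen_gaussian(theta, rng, n=30):
    mu, sigma2 = theta
    return rng.normal(mu, np.sqrt(sigma2), size=n)

def exact_hessian(theta, z, rng):
    """Analytic Hessian of the Gaussian log-likelihood; rng is unused here."""
    mu, sigma2 = theta
    r = z - mu
    h_mm = -z.size / sigma2
    h_ms = -np.sum(r) / sigma2 ** 2
    h_ss = 0.5 * z.size / sigma2 ** 2 - np.sum(r ** 2) / sigma2 ** 3
    return np.array([[h_mm, h_ms], [h_ms, h_ss]])

rng = np.random.default_rng(3)
print(fim_monte_carlo((1.0, 4.0), gen_gaussian, exact_hessian, N=5000, M=1, rng=rng))
```

With an exact Hessian, `M = 1` suffices because the only randomness is in the pseudo data; with a noisy Hessian estimate, larger `M` averages out the perturbation noise within each data set.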
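Supplement: Simultaneous Perturbation (SP) Hessian and Gradient Estimate
\[ \hat{H}_k^{(i)} = \frac{1}{2} \left\{ \frac{\delta G_k^{(i)}}{2c} \left[ \Delta_{k1}^{-1}, \cdots, \Delta_{kp}^{-1} \right] + \left( \frac{\delta G_k^{(i)}}{2c} \left[ \Delta_{k1}^{-1}, \cdots, \Delta_{kp}^{-1} \right] \right)^T \right\} \]
where
\[ \delta G_k^{(i)} \equiv G(\theta + c\Delta_k \mid Z_{\text{pseudo}}(i)) - G(\theta - c\Delta_k \mid Z_{\text{pseudo}}(i)) \]
and the gradient $G$ is either exact,
\[ G(\theta \pm c\Delta_k \mid Z_{\text{pseudo}}(i)) = \frac{\partial L(\theta \pm c\Delta_k \mid Z_{\text{pseudo}}(i))}{\partial \theta}, \]
OR approximated from log-likelihood values,
\[ G(\theta \pm c\Delta_k \mid Z_{\text{pseudo}}(i)) = \frac{1}{\tilde{c}} \left[ L(\theta + \tilde{c}\tilde{\Delta}_k \pm c\Delta_k \mid Z_{\text{pseudo}}(i)) - L(\theta \pm c\Delta_k \mid Z_{\text{pseudo}}(i)) \right] \begin{bmatrix} \tilde{\Delta}_{k1}^{-1} \\ \vdots \\ \tilde{\Delta}_{kp}^{-1} \end{bmatrix} \]
- $\Delta_k = [\Delta_{k1}, \cdots, \Delta_{kp}]^T$ and $\Delta_{k1}, \cdots, \Delta_{kp}$ are mean-zero and statistically independent r.v.s with finite inverse moments
- $\tilde{\Delta}_k$ has the same statistical properties as $\Delta_k$
- $c > 0$ and $\tilde{c} > 0$ are "small" numbers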
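A minimal sketch of one SP Hessian estimate using the exact-gradient option. The function name `sp_hessian` is hypothetical, and the symmetric Bernoulli $\pm 1$ distribution for the components of $\Delta_k$ is an assumed choice that satisfies the mean-zero, independent, finite-inverse-moment conditions. The check uses a quadratic log-likelihood $L(\theta) = -\tfrac{1}{2}\theta^T A \theta$, whose Hessian is $-A$.

```python
import numpy as np

def sp_hessian(grad, theta, c, rng):
    """One simultaneous perturbation Hessian estimate from two gradient calls."""
    theta = np.asarray(theta, dtype=float)
    # Assumed choice: independent symmetric Bernoulli +/-1 perturbations.
    delta = rng.choice([-1.0, 1.0], size=theta.size)
    d_g = grad(theta + c * delta) - grad(theta - c * delta)  # delta G_k
    # Column (delta G_k / 2c) times row [1/Delta_k1, ..., 1/Delta_kp].
    a = np.outer(d_g / (2.0 * c), 1.0 / delta)
    return 0.5 * (a + a.T)  # symmetrize

# Check on a quadratic log-likelihood with known Hessian -A.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
grad = lambda th: -A @ th

rng = np.random.default_rng(5)
theta0 = np.array([0.3, -0.7])
h_bar = np.mean([sp_hessian(grad, theta0, 1e-3, rng) for _ in range(8000)], axis=0)
print(h_bar)  # should be close to -A
```

Each individual estimate is noisy but costs only two gradient evaluations regardless of $p$; averaging many estimates recovers the Hessian, which is why the method averages over $k$ and over pseudo data sets.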