Cramér-Rao Bounds and Monte Carlo Calculation of the Fisher Information Matrix
Interfaces 2004
James C. Spall
The Johns Hopkins University Applied Physics Laboratory
james.spall@jhuapl.edu

Introduction
• Fundamental role of data analysis is to extract information from data
• Parameter estimation for models is central to the process of extracting information
• The Fisher information matrix plays a central role in parameter estimation for measuring information:
  Information matrix summarizes the amount of information in the data relative to the parameters being estimated

Problem Setting
• Consider the classical statistical problem of estimating parameter vector θ from n data vectors z_1, z_2, …, z_n
• Suppose we have a probability density and/or mass function associated with the data
• The parameters θ appear in the probability function and affect the nature of the distribution
  – Example: z_i ∼ N(mean(θ), covariance(θ)) for all i
• Let L(θ | z_1, z_2, …, z_n) represent the likelihood function, i.e., the p.d.f./p.m.f. viewed as a function of θ conditioned on the data

Selected Applications
• Information matrix is a measure of performance for several applications. Four uses are:
  1. Confidence regions for parameter estimation
     – Uses asymptotic normality and/or Cramér-Rao inequality
  2. Prediction bounds for mathematical models
  3. Basis for “D-optimal” criterion for experimental design
     – Information matrix serves as measure of how well θ can be estimated for a given set of inputs
  4. Basis for “noninformative prior” in Bayesian analysis
     – Sometimes used for “objective” Bayesian inference

Information Matrix
• Recall likelihood function L(θ | z_1, z_2, …, z_n)
• Information matrix defined as
  $$F_n(\theta) = E\left( \frac{\partial \log L}{\partial \theta} \cdot \frac{\partial \log L}{\partial \theta^T} \right)$$
  where expectation is w.r.t. z_1, z_2, …, z_n
• Equivalent form based on Hessian matrix:
  $$F_n(\theta) = -E\left( \frac{\partial^2 \log L}{\partial \theta \, \partial \theta^T} \right)$$
• F_n(θ) is positive semidefinite of dimension p × p (p = dim(θ))
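
As a rough numerical check of the two equivalent forms above, the sketch below uses a scalar toy model that is not part of the slides: i.i.d. data z_i ∼ N(θ, σ²) with σ known, for which F_n(θ) = n/σ². The function names, the seed, and the sample sizes are illustrative assumptions.

```python
import random

def score(theta, z, sigma):
    """First derivative of log L w.r.t. theta for i.i.d. N(theta, sigma^2) data."""
    return sum(zi - theta for zi in z) / sigma**2

def hessian(theta, z, sigma):
    """Second derivative of log L w.r.t. theta; constant for this toy model."""
    return -len(z) / sigma**2

def fisher_two_ways(theta, sigma, n, reps, seed=0):
    """Monte Carlo averages of score^2 and of -Hessian over simulated data sets."""
    rng = random.Random(seed)
    sq_score_sum = 0.0
    hess_sum = 0.0
    for _ in range(reps):
        z = [rng.gauss(theta, sigma) for _ in range(n)]
        sq_score_sum += score(theta, z, sigma) ** 2
        hess_sum += hessian(theta, z, sigma)
    return sq_score_sum / reps, -hess_sum / reps

outer_form, hessian_form = fisher_two_ways(theta=1.0, sigma=2.0, n=30, reps=20000)
exact = 30 / 2.0**2  # n / sigma^2 = 7.5 for this toy model
print(outer_form, hessian_form, exact)
```

In this toy case the Hessian form is exact because the second derivative does not depend on the data, while the squared-score form only approaches n/σ² as the number of replications grows.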

Information Matrix (cont’d)
• Connection of F_n(θ) and uncertainty in estimate θ̂_n is rigorously specified via two famous results (θ* = true value of θ):
  1. Asymptotic normality:
     $$\sqrt{n}\,(\hat{\theta}_n - \theta^*) \xrightarrow{\text{dist}} N(0, \bar{F}^{-1}), \quad \text{where } \bar{F} \equiv \lim_{n \to \infty} \frac{F_n(\theta^*)}{n}$$
  2. Cramér-Rao inequality:
     $$\operatorname{cov}(\hat{\theta}_n) \ge F_n(\theta^*)^{-1} \quad \text{for all } n$$
• Above two results indicate: greater variability of θ̂_n ⇒ “smaller” F_n(θ) (and vice versa)
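
The Cramér-Rao inequality can be illustrated with a small simulation. This is only a sketch under assumptions not taken from the slides: scalar i.i.d. data z_i ∼ N(θ*, σ²) with σ known, and the sample mean as the estimator, which is unbiased and attains the bound 1/F_n(θ*) = σ²/n. All names and settings are hypothetical.

```python
import random
import statistics

def sample_mean_variance(theta, sigma, n, reps, seed=0):
    """Empirical variance of the sample mean over `reps` simulated data sets."""
    rng = random.Random(seed)
    estimates = [
        statistics.fmean(rng.gauss(theta, sigma) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.variance(estimates)

theta_true, sigma, n = 1.0, 2.0, 30
fisher = n / sigma**2        # F_n(theta*) for this toy model
cr_bound = 1.0 / fisher      # Cramér-Rao lower bound on var(theta_hat)
empirical_var = sample_mean_variance(theta_true, sigma, n, reps=5000)
print(empirical_var, cr_bound)
```

Because the sample mean attains the bound here, the empirical variance should land close to the bound rather than strictly above it; a less efficient unbiased estimator would show a larger variance.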

Computation of Information Matrix
• Analytical formula for F_n(θ) requires first or second derivative info. and expectation calculation
  – Often impossible or very difficult to compute in real-world models
  – Involves expected value of highly nonlinear (possibly unknown) functions of data
• Schematic below summarizes “easy” Monte Carlo-based method for determining F_n(θ)
  – Uses averages of very efficient (simultaneous perturbation) Hessian estimates
  – Hessian estimates evaluated at artificial (pseudo) data
  – Computational horsepower instead of analytical analysis

Schematic of Monte Carlo Method for Estimating Information Matrix
[Figure: schematic of the Monte Carlo method]

Optimal Implementation
• Several implementation questions/answers:
  Q. How to compute (cheap) Hessian estimates?
  A. Use simultaneous perturbation (SP) based method (IEEE Trans. Auto. Control, 2000, pp. 1839–1853)
  Q. How to allocate per-realization (M) and across-realization (N) averaging?
  A. M = 1 is the optimal solution for a fixed total number of Hessian estimates. However, M > 1 is useful when accounting for cost of generating pseudo data.
  Q. Can correlation be introduced to improve overall accuracy of the estimate F_{M,N}(θ)?
  A. Yes, antithetic random numbers can reduce variance of the elements in F_{M,N}(θ). Discussed on slides below.
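
The averaging over M Hessian estimates per pseudo data set and N pseudo data sets might be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes a toy model z_i ∼ N(µ, v) with θ = (µ, v), an exact gradient, Bernoulli ±1 perturbations, and a symmetrized outer-product form of the SP Hessian estimate. The function names, the perturbation size c, and the seed are all hypothetical choices.

```python
import random

def grad_loglik(theta, z):
    """Exact gradient of log L for i.i.d. N(mu, v) data, theta = (mu, v)."""
    mu, v = theta
    n = len(z)
    s1 = sum(zi - mu for zi in z)
    s2 = sum((zi - mu) ** 2 for zi in z)
    return [s1 / v, -n / (2 * v) + s2 / (2 * v * v)]

def sp_hessian(theta, z, rng, c=1e-4):
    """One SP Hessian estimate from two gradient evaluations (assumed form)."""
    p = len(theta)
    delta = [rng.choice((-1.0, 1.0)) for _ in range(p)]  # random +/-1 perturbation
    g_plus = grad_loglik([t + c * d for t, d in zip(theta, delta)], z)
    g_minus = grad_loglik([t - c * d for t, d in zip(theta, delta)], z)
    dg = [(gp - gm) / (2 * c) for gp, gm in zip(g_plus, g_minus)]
    # Outer product of dg with the elementwise inverse of delta, then symmetrize.
    outer = [[dg[i] / delta[j] for j in range(p)] for i in range(p)]
    return [[0.5 * (outer[i][j] + outer[j][i]) for j in range(p)] for i in range(p)]

def mc_fisher(theta, n, M, N, seed=0):
    """Average of negative SP Hessian estimates over N pseudo data sets, M each."""
    rng = random.Random(seed)
    p = len(theta)
    mu, v = theta
    total = [[0.0] * p for _ in range(p)]
    for _ in range(N):
        z = [rng.gauss(mu, v ** 0.5) for _ in range(n)]  # pseudo data drawn at theta
        for _ in range(M):
            h = sp_hessian(theta, z, rng)
            for i in range(p):
                for j in range(p):
                    total[i][j] -= h[i][j]
    return [[x / (M * N) for x in row] for row in total]

F_est = mc_fisher(theta=(0.0, 1.0), n=30, M=1, N=4000)
F_exact = [[30.0, 0.0], [0.0, 15.0]]  # diag(n/v, n/(2 v^2)) for this toy model
print(F_est)
```

The toy model is chosen only because its information matrix is known in closed form, which makes the Monte Carlo average easy to sanity-check; each SP estimate costs two gradient evaluations regardless of p.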

Antithetic Random Numbers
• Above solution (M = 1) assumes all Hessian estimates generated with independent perturbation vectors
• Is it possible to introduce correlated perturbations to reduce variability?
• Implemented based on M > 1
  – Contrasts with optimal solution above of M = 1
• Antithetic random numbers (ARNs) are a way to reduce variability of sums of pseudo random numbers
  – Contrast with common random numbers for differences of pseudo random numbers
• Based on introducing negative correlation according to var(X + Y) = var(X) + var(Y) + 2cov(X, Y)
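
The variance identity above can be seen in a generic textbook setting, sketched below. This is not the perturbation scheme used for the Hessian estimates; it assumes X = exp(U) and Y = exp(1 − U) with U uniform on (0, 1), so that X and Y are negatively correlated. The function name and sample sizes are illustrative.

```python
import math
import random
import statistics

def pair_average_variance(antithetic, pairs, seed=0):
    """Variance of (X + Y) / 2 with X = exp(U1), Y = exp(U2).

    If `antithetic` is True, U2 = 1 - U1 (negative correlation);
    otherwise U2 is drawn independently of U1.
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(pairs):
        u1 = rng.random()
        u2 = 1.0 - u1 if antithetic else rng.random()
        averages.append(0.5 * (math.exp(u1) + math.exp(u2)))
    return statistics.variance(averages)

var_independent = pair_average_variance(antithetic=False, pairs=20000)
var_antithetic = pair_average_variance(antithetic=True, pairs=20000)
print(var_independent, var_antithetic)
```

Both versions estimate the same mean, but the negative covariance term makes the antithetic average much less variable for this monotone function.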

Implementing Antithetic Random Numbers
• Implementing ARNs represents both art and science
  – Typically more difficult than common random numbers
• Possible to write down analytical basis for “best” implementation of ARNs
  – Unusable in practice
  – Requires full knowledge of true Hessian values
• Practical implementation requires problem insight and approximations
• Not a panacea, but sometimes useful to increase accuracy and/or reduce computational cost

Numerical Experiments for Monte Carlo Method of Estimating Information Matrix
• Consider a problem of estimating µ and Σ from data z_i ∼ N(µ, Σ + P_i) ∀ i. Let n = 30
  – A problem with known information matrix
  – Useful for comparing approach here with known result
  – P_i’s assumed known (non-identical)
• Have dim(z_i) = 4 and dim(θ) = 14
  ⇒ 14 × (14 + 1) / 2 = 105 unique elements in F_n(θ) need to be calculated
• Real-world implementation of Monte Carlo method is for problems where solution is not known (unlike this example)
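
The counts above can be checked by hand. The breakdown below assumes θ stacks the 4 elements of µ and the unique elements of the symmetric 4 × 4 matrix Σ; the slide states only dim(θ) = 14, so this parameterization is an assumption.

```latex
\[
p = \dim(\theta)
  = \underbrace{4}_{\mu} + \underbrace{\frac{4(4+1)}{2}}_{\text{unique elements of } \Sigma}
  = 4 + 10 = 14,
\qquad
\frac{p(p+1)}{2} = \frac{14 \cdot 15}{2} = 105 .
\]
```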

Evaluation Criteria
• Let F_{M,N}(θ) denote the estimate for the Fisher info. matrix from M Hessian estimates at each pseudodata vector and N pseudodata vectors
• Many ways of comparing F_{M,N}(θ) and the true matrix F_n(θ) = F_30(θ)
• As summary measure we use the standard matrix (spectral) norm (scaled):
  $$\text{norm} = \frac{\| F_{M,N}(\theta) - F_n(\theta) \|}{\| F_n(\theta) \|}$$
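
The scaled norm could be computed along the following lines. This is a sketch that assumes the matrices are given as nested lists and that the spectral norm is obtained by power iteration on AᵀA; the function names, the fixed start vector, and the example matrices are hypothetical and not taken from the study.

```python
import math

def spectral_norm(a, iters=200):
    """Largest singular value of matrix `a` via power iteration on a^T a.

    Uses a fixed start vector; it can fail in the rare case that this vector
    has no component along the dominant singular direction.
    """
    rows, cols = len(a), len(a[0])
    x = [1.0 / (k + 1) for k in range(cols)]
    sigma = 0.0
    for _ in range(iters):
        length = math.sqrt(sum(xj * xj for xj in x))
        if length == 0.0:
            break  # x was mapped to zero (for example, `a` is the zero matrix)
        x = [xj / length for xj in x]
        y = [sum(a[i][j] * x[j] for j in range(cols)) for i in range(rows)]  # a x
        sigma = math.sqrt(sum(yi * yi for yi in y))                          # ||a x||
        x = [sum(a[i][j] * y[i] for i in range(rows)) for j in range(cols)]  # a^T a x
    return sigma

def scaled_norm(f_est, f_true):
    """||f_est - f_true|| / ||f_true|| in the spectral norm."""
    diff = [[e - t for e, t in zip(re, rt)] for re, rt in zip(f_est, f_true)]
    return spectral_norm(diff) / spectral_norm(f_true)

f_true = [[30.0, 0.0], [0.0, 15.0]]   # illustrative values only
f_est = [[30.3, 0.0], [0.0, 15.0]]
rel_error = scaled_norm(f_est, f_true)
print(rel_error)
```

Scaling by the norm of the true matrix makes the error comparable across problems of different size, which is why a value such as 0.01 can be read as roughly a one percent discrepancy.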

Focus of Numerical Experiments
• Two tables below show results of numerical studies of various implementations
  – Optimality of M = 1 under fixed budget B = MN of Hessian estimates
  – Value of gradient information (when available) in improving estimate
  – Value of ARNs
• Assume only likelihood values are available (i.e., no gradient) in study of M = 1
  – Crude Hessian estimates based on difference of SP gradient estimates
  – Harder to obtain good Hessian estimates than when exact gradient is available

Two Studies: Optimality of M = 1 and Value of Gradient Information
• Values in columns (a), (b), and (c) are scaled matrix norms; associated statistical P-values shown to right
• Constant budget B of SP Hessian estimates (B = MN)
• P-values based on two-sided t-test

(a) M = 1, N = 40,000   (b) M = 20, N = 2000   (c) M = 1, N = 40,000   P-value       P-value
Likelihood values       Likelihood values      Gradient values         (a) vs. (b)   (a) vs. (c)
0.0502                  0.0532                 0.0183                  0.0009        < 10^−10

Test of Antithetic Random Numbers for µ Portion of F_n(θ): Matrix Norms and P-Value
• Constant budget of SP Hessian estimates (B = MN)
• P-values based on two-sided t-test
• SP Hessian estimates based on true gradient values

M = 1, N = 40,000   M = 2, N = 20,000   P-value
(no ARNs)           (ARNs)
0.0084              0.0071              0.018

Concluding Remarks
• Fisher information matrix is a central quantity in data analysis and parameter estimation
  – Measures information in data relative to quantities being estimated
  – Applications in confidence bounds, prediction error bounds, experimental design, Bayesian analysis, etc.
• Direct computation of information matrix in general nonlinear problems usually impossible
• Described Monte Carlo approach for computing matrix in arbitrarily complex (nonlinear) estimation problems:
  – Replaces detailed analytical analysis with computational power via resampling
  – Easy to implement, but may be computationally demanding
