

Efficient Monte Carlo computation of Fisher information matrix using prior information

Sonjoy Das, UB James C. Spall, APL/JHU Roger Ghanem, USC SIAM Conference on Data Mining Anaheim, California, USA April 26–28, 2012



Outline

1. Introduction and Motivation
   - General discussion of Fisher information matrix
   - Current resampling algorithm – no use of prior information
2. Contribution/Results of the Proposed Work
   - Improved resampling algorithm – using prior information
   - Theoretical basis
   - Numerical illustrations



Significance of Fisher Information Matrix

The fundamental role of data analysis is to extract information from data; parameter estimation for models is central to this process of extracting information.

The Fisher information matrix plays a central role in parameter estimation for measuring information: the information matrix summarizes the amount of information in the data relative to the parameters being estimated.


Problem Setting

Consider the classical problem of estimating a parameter vector, θ, from n data vectors, Zn ≡ {Z1, · · · , Zn}. Suppose we have a probability density or mass function (PDF or PMF) associated with the data. The parameter, θ, appears in the PDF or PMF and affects the nature of the distribution.

Example: Zi ∼ N(µ(θ), Σ(θ)), for all i

Let ℓ(θ|Zn) represent the likelihood function, i.e., ℓ(·) is the PDF or PMF viewed as a function of θ conditioned on the data.


Selected Applications

The information matrix is a measure of performance for several applications. Five uses are:

1. Confidence regions for parameter estimation
   - Uses asymptotic normality and/or the Cramér-Rao inequality
2. Prediction bounds for mathematical models
3. Basis for the "D-optimal" criterion in experimental design
   - The information matrix serves as a measure of how well θ can be estimated for a given set of inputs
4. Basis for a "noninformative prior" in Bayesian analysis
   - Sometimes used for "objective" Bayesian inference
5. Model selection
   - Is model A "better" than model B?


Information Matrix

Recall the likelihood function, ℓ(θ|Zn), and define the log-likelihood function L(θ|Zn) ≡ ln ℓ(θ|Zn). The information matrix is defined as

$$ F_n(\theta) \equiv E\left[\frac{\partial L}{\partial \theta}\cdot\frac{\partial L}{\partial \theta^T}\,\Big|\,\theta\right], $$

where the expectation is taken with respect to the measure of Zn. If the Hessian matrix exists, an equivalent form based on the Hessian matrix is

$$ F_n(\theta) = -E\left[\frac{\partial^2 L}{\partial \theta\,\partial \theta^T}\,\Big|\,\theta\right]. $$

Fn(θ) is positive semidefinite of dimension p × p, where p = dim(θ).
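The defining expectation can be checked numerically. The sketch below uses a toy model of our own choosing (not from the slides), Z_i ∼ N(θ, 1), for which the analytical answer is F_n(θ) = n, and approximates the outer-product form by averaging squared scores over simulated data sets:

```python
import numpy as np

# Numerical check of F_n(theta) = E[(dL/dtheta)(dL/dtheta)^T | theta]
# for a toy model of our own choosing, Z_i ~ N(theta, 1), i = 1..n,
# where the analytical answer is F_n(theta) = n.
rng = np.random.default_rng(0)
n, n_mc, theta = 10, 200_000, 1.5

# dL/dtheta for this model is sum_i (z_i - theta)
z = rng.normal(theta, 1.0, size=(n_mc, n))
score = (z - theta).sum(axis=1)

F_hat = np.mean(score**2)      # scalar theta, so a 1x1 "matrix"
print(F_hat)                   # ~ n = 10
```

With 2 × 10⁵ simulated data sets the estimate lands close to the analytical value n.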


Two Famous Results

The connection between Fn(θ) and the uncertainty in the estimate θ̂n is rigorously specified via the following results (θ∗ = true value of θ):

1. Asymptotic normality:

$$ \sqrt{n}\,(\hat{\theta}_n - \theta^*) \xrightarrow{\ \text{dist}\ } N_p(0, \bar{F}^{-1}), \qquad \bar{F} \equiv \lim_{n \to \infty} F_n(\theta^*)/n. $$

2. Cramér-Rao inequality:

$$ \operatorname{cov}(\hat{\theta}_n) \geq F_n(\theta^*)^{-1}, \quad \text{for all } n \ \text{(unbiased } \hat{\theta}_n\text{)}. $$

The above two results indicate: greater variability in θ̂n ⟹ "smaller" Fn(θ) (and vice versa).
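A quick numerical sketch of the Cramér-Rao inequality, in a toy setting of our own choosing: for Z_i ∼ N(µ, σ²) with σ² known, F_n = n/σ², and the sample mean is unbiased with variance exactly σ²/n, attaining the bound:

```python
import numpy as np

# Toy check (our own choice of model) of the Cramér-Rao inequality:
# for Z_i ~ N(mu, sigma2) with sigma2 known, F_n = n / sigma2, so any
# unbiased estimator satisfies cov(mu_hat) >= sigma2 / n; the sample
# mean is unbiased and attains the bound.
rng = np.random.default_rng(1)
n, sigma2, mu, reps = 25, 4.0, 0.0, 100_000

est = rng.normal(mu, np.sqrt(sigma2), size=(reps, n)).mean(axis=1)
crlb = sigma2 / n                        # F_n(theta*)^{-1} = 0.16
print(est.var(), crlb)                   # empirical variance ~ bound
```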



Monte Carlo Computation of Information Matrix

The analytical formula for Fn(θ) requires first- or second-derivative and expectation calculations

- Often impossible or very difficult to compute in practical applications
- Involves expected values of highly nonlinear (possibly unknown) functions of the data

The schematic that follows summarizes an "easy" Monte Carlo-based method for determining Fn(θ)

- Uses averages of very efficient (simultaneous perturbation) Hessian estimates
- Hessian estimates evaluated at artificial (pseudo) data
- Computational horsepower instead of analytical analysis
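The "computational horsepower" idea can be shown in miniature with a toy model of our own choosing (plain Hessian averaging, not the SP-based estimator itself): draw pseudo data sets, evaluate the log-likelihood Hessian on each, and take the negative average:

```python
import numpy as np

# Miniature of the Monte Carlo idea: estimate F_n = -E[d2L/dtheta2] by
# averaging log-likelihood Hessians over pseudo data.  Toy model of our
# own choosing: Z_i ~ Bernoulli(p), for which F_n(p) = n / (p (1 - p)).
rng = np.random.default_rng(2)
n, N, p = 20, 100_000, 0.3

z = rng.random((N, n)) < p                     # N pseudo data sets
k = z.sum(axis=1)                              # successes per data set
hess = -(k / p**2 + (n - k) / (1 - p)**2)      # d2L/dp2 per data set
F_hat = -hess.mean()
print(F_hat)            # ~ n / (p (1 - p)) = 20 / 0.21
```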


Schematic of Monte Carlo Method for Estimating Information Matrix

[Figure: flow diagram of the Monte Carlo method. Pseudo data Zpseudo(1), · · · , Zpseudo(N) are the input; for each pseudo data set i, Hessian estimates Ĥ1(i), · · · , ĤM(i) are computed and averaged to H̄(i); the negative average of H̄(1), · · · , H̄(N) is the FIM estimate F̄M,N. (Spall, 2005, JCGS)]


Supplement: Simultaneous Perturbation (SP) Hessian and Gradient Estimate

$$ \hat{H}_k^{(i)} = \frac{1}{2}\left\{ \frac{\delta G_k^{(i)}}{2c}\left[\Delta_{k1}^{-1}, \cdots, \Delta_{kp}^{-1}\right] + \left(\frac{\delta G_k^{(i)}}{2c}\left[\Delta_{k1}^{-1}, \cdots, \Delta_{kp}^{-1}\right]\right)^T \right\}, $$

where

$$ \delta G_k^{(i)} \equiv G(\theta + c\Delta_k \mid Z_{\text{pseudo}}(i)) - G(\theta - c\Delta_k \mid Z_{\text{pseudo}}(i)), $$

and the gradient is either exact,

$$ G(\theta \pm c\Delta_k \mid Z_{\text{pseudo}}(i)) = \frac{\partial L(\theta \pm c\Delta_k \mid Z_{\text{pseudo}}(i))}{\partial \theta}, $$

or itself a simultaneous-perturbation approximation,

$$ G(\theta \pm c\Delta_k \mid Z_{\text{pseudo}}(i)) = \frac{L(\theta + \tilde{c}\tilde{\Delta}_k \pm c\Delta_k \mid Z_{\text{pseudo}}(i)) - L(\theta \pm c\Delta_k \mid Z_{\text{pseudo}}(i))}{\tilde{c}} \begin{bmatrix} \tilde{\Delta}_{k1}^{-1} \\ \vdots \\ \tilde{\Delta}_{kp}^{-1} \end{bmatrix}. $$

Here ∆k = [∆k1, · · · , ∆kp]ᵀ, where ∆k1, · · · , ∆kp are mean-zero and statistically independent random variables with finite inverse moments; ∆̃k has the same statistical properties as ∆k; and c̃ > c > 0 are "small" numbers.
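A minimal sketch of the SP Hessian estimate with Bernoulli ±1 perturbations (so ∆⁻¹ = ∆), using a quadratic log-likelihood of our own choosing so the exact gradient G(θ) = −Aθ is available (the first form above) and the exact Hessian −A is known for comparison:

```python
import numpy as np

# Sketch of the simultaneous-perturbation Hessian estimate, averaged
# over K Bernoulli +/-1 perturbations.  Toy log-likelihood of our own
# choosing: L(theta) = -0.5 theta^T A theta, with exact gradient
# G(theta) = -A theta and exact Hessian -A.
rng = np.random.default_rng(3)
p, c, K = 3, 1e-4, 20_000
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 3.0]])
grad = lambda th: -A @ th
theta = np.array([0.3, -0.1, 0.2])

H_bar = np.zeros((p, p))
for _ in range(K):
    d = rng.choice([-1.0, 1.0], size=p)          # Delta_k
    dG = grad(theta + c * d) - grad(theta - c * d)
    outer = np.outer(dG / (2 * c), 1.0 / d)      # (dG/2c) [Delta^{-1}]
    H_bar += 0.5 * (outer + outer.T)             # symmetrize
H_bar /= K
print(np.round(H_bar, 2))                        # ~ -A
```

Averaging the symmetrized outer products over K perturbations recovers −A to within Monte Carlo error.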


Supplement: Optimal Implementation

Several implementation questions and answers:

Q. How to compute (cheap) Hessian estimates?
A. Use the simultaneous perturbation (SP) based method (Spall, 2000, IEEE Trans. Auto. Control)

Q. How to allocate per-realization (M) and across-realization (N) averaging?
A. M = 1 is the optimal solution for a fixed total number of Hessian estimates; however, M > 1 is useful when accounting for the cost of generating pseudo data

Q. Can correlation be introduced to improve the overall accuracy of F̄M,N(θ)?
A. Yes, antithetic random numbers can reduce the variance of elements in F̄M,N(θ) (Spall, 2005, JCGS)
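The antithetic-variates idea in the last answer can be illustrated generically (toy integrand of our own choosing, not the FIM setting itself): pairing each uniform draw U with 1 − U makes the pair negatively correlated for a monotone integrand, cutting the variance of the averaged pair:

```python
import numpy as np

# Generic illustration of antithetic variates: pair each uniform draw
# U with 1 - U.  For a monotone integrand the pair is negatively
# correlated, so the averaged pair has far lower variance.
# (Toy integrand of our own choosing, not the FIM setting.)
rng = np.random.default_rng(4)
f = lambda u: np.exp(u)                  # E[f(U)] = e - 1
u = rng.random(20_000)

plain = f(u)                             # independent-draw terms
anti = 0.5 * (f(u) + f(1.0 - u))         # antithetic-pair terms
print(plain.var(), anti.var())           # antithetic variance is far smaller
```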


Fisher information matrix with analytically known elements

[Figure: sparsity patterns of two information matrices with analytically known elements. Left: a 22 × 22 matrix with 54 known elements (Das, Ghanem, Spall, 2008, SISC). Right: a 20 × 20 matrix from a navigation application at APL.]

The previous resampling approach (Spall, 2005, JCGS) yields the "full" Fisher information matrix without considering the available prior information.


Schematic of the Proposed Resampling Algorithm

[Figure: flow diagram of the proposed algorithm. Pseudo data Zpseudo(1), · · · , Zpseudo(N) feed the current resampling algorithm, yielding Hessian estimates Ĥ1, · · · , ĤN; using the given elements of Fn(θ), these are modified to H̃1, · · · , H̃N, whose negative average is the new FIM estimate F̃n. Shown for M = 1; readily extended to the case M > 1.]


Supplement: Improvement in terms of Variance Reduction

Case 1 (log-likelihood based):

$$ \operatorname{var}[\hat{H}^{(L)}_{ii} \mid \theta] - \operatorname{var}[\tilde{H}^{(L)}_{ii} \mid \theta] = \sum_{l}\sum_{m \in I_l} a_{lm}(i,i)\,(F_{lm}(\theta))^2 > 0, \quad i \in I_i^c, $$

where $a_{lm}(i,j) = E[\Delta_m^2/\Delta_j^2]\, E[\tilde{\Delta}_l^2/\tilde{\Delta}_i^2] > 0$.

Case 2 (gradient based):

$$ \operatorname{var}[\hat{H}^{(g)}_{ii} \mid \theta] - \operatorname{var}[\tilde{H}^{(g)}_{ii} \mid \theta] \approx \sum_{l \in I_i} b_l(i)\,(F_{il}(\theta))^2 > 0, \quad i \in I_i^c, $$

where $b_l(j) = E[\Delta_l^2/\Delta_j^2] > 0$.

For the (i, j)-th element, additional covariance terms need to be considered, and it can still be shown that var[H̃ij | θ] < var[Ĥij | θ] (see the paper) ⟹ var[F̃ij | θ] < var[F̂ij | θ]. Also, E[H̃ij | θ] = E[Ĥij | θ], for all j ∈ I_i^c, for both cases.


Basic Ideas for Proofs/Implementation

Per the current resampling algorithm, the (i, j)-th element is given by

$$ \hat{H}_{ij} \approx \sum_{\text{unknown}+\text{known}} \omega_{lm} H_{lm} \quad \text{(weighted sum over unknown and known elements of } H_k\text{)} $$

$$ \Rightarrow\ \operatorname{var}(\hat{H}_{ij}) \approx E\Big[\Big(\sum_{\text{unknown}+\text{known}} \omega_{lm} H_{lm}\Big)^2\Big] - (F_{ij}(\theta))^2, \quad \text{since } E[\hat{H}_{ij} \mid \theta] \approx -F_{ij}(\theta). $$

Per the modified resampling algorithm, write $H_{lm} = -F_{lm} + e_{lm}$, where the $e_{lm}$ are mean-zero error terms. The new estimate is then

$$ \tilde{H}_{ij} \equiv \hat{H}_{ij} - \sum_{\text{known}} \omega_{lm}(-F_{lm}) = \sum_{\text{unknown}} \omega_{lm} H_{lm} + \sum_{\text{known}} \omega_{lm} e_{lm} $$

$$ \Rightarrow\ \operatorname{var}(\tilde{H}_{ij}) \approx E\Big[\Big(\sum_{\text{unknown}} \omega_{lm} H_{lm} + \sum_{\text{known}} \omega_{lm} e_{lm}\Big)^2\Big] - (F_{ij}(\theta))^2. $$

Since $E[e_{lm}^2] \equiv \operatorname{var}(H_{lm}) < E[H_{lm}^2]$, it follows that var(H̃ij) < var(Ĥij)

⟹ var[F̃ij | θ] < var[F̂ij | θ]
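The key inequality E[e²lm] < E[H²lm] can be checked with a scalar stand-in (our own toy construction, not the full matrix algorithm): a term ω·H with a random, mean-zero, SP-style weight ω contributes E[ω²]E[H²] to the variance, while the corrected term ω·e contributes only E[ω²]var(H):

```python
import numpy as np

# Scalar stand-in (our own toy construction, not the full matrix
# algorithm) for the variance-reduction mechanism: a term omega*H with
# random mean-zero weight omega contributes E[omega^2] E[H^2] to the
# variance, while the corrected term omega*e contributes E[omega^2] var(H).
rng = np.random.default_rng(5)
reps = 400_000
F_lm = 2.0                                # hypothetical known FIM element
omega = rng.choice([-1.0, 1.0], reps)     # random SP-style weight
e = rng.normal(0.0, 0.3, reps)            # mean-zero estimation error
H = -F_lm + e                             # noisy Hessian element

term_hat = omega * H                      # term in the current algorithm
term_til = omega * H - omega * (-F_lm)    # known part removed -> omega * e
print(term_hat.var(), term_til.var())     # ~ 4.09 vs ~ 0.09
```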


Problem Description

Consider independently distributed scalar-valued random data zi ∼ N(µ, σ² + ci α), for all i, with n = 30

- A problem with a known information matrix: useful for comparing simulation results with known analytical results
- θ = [µ, σ², α]ᵀ; assume µ = 0, σ² = 1, and α = 1 for the purpose of illustration
- 0 < ci < 1 assumed known (non-identical across i)
- p = dim(θ) = 3 ⟹ 3(3 + 1)/2 = 6 unique elements in Fn(θ)
- Assume that only the upper-left 2 × 2 block of Fn(θ) is known a priori

In practical applications (unlike this example), the analytical Fisher information matrix is not known, or is only partially known.
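The analytical F_n(θ) for this example follows from the standard FIM formula for a normal with parameter-dependent mean and variance; the sketch below (with arbitrary c_i values of our own choosing in (0, 1)) exhibits the block-diagonal structure in which µ decouples from (σ², α):

```python
import numpy as np

# Analytical F_n(theta) for z_i ~ N(mu, sigma2 + c_i * alpha),
# theta = (mu, sigma2, alpha).  For a normal with mean mu and variance
# v_i(theta), observation i contributes
#   (dmu/dtheta)(dmu/dtheta)^T / v_i
#   + (dv_i/dtheta)(dv_i/dtheta)^T / (2 v_i^2).
# The c_i below are an arbitrary choice of ours in (0, 1).
n, mu, sigma2, alpha = 30, 0.0, 1.0, 1.0
rng = np.random.default_rng(6)
c = rng.uniform(0.1, 0.9, n)               # known, non-identical across i

F = np.zeros((3, 3))
for ci in c:
    v = sigma2 + ci * alpha
    dmu = np.array([1.0, 0.0, 0.0])        # d(mean)/d(mu, sigma2, alpha)
    dv = np.array([0.0, 1.0, ci])          # d(variance)/d(mu, sigma2, alpha)
    F += np.outer(dmu, dmu) / v + np.outer(dv, dv) / (2 * v**2)

print(np.round(F, 3))   # block diagonal: mu decouples from (sigma2, alpha)
```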


Results

Analytical FIM (block diagonal, since µ decouples from (σ², α)):

$$ F_n(\theta) = \begin{bmatrix} F_{11} & 0 & 0 \\ 0 & F_{22} & F_{23} \\ 0 & F_{23} & F_{33} \end{bmatrix}; \qquad \text{for illustration, } F_n^{\text{given}}(\theta) = \begin{bmatrix} F_{11} & 0 & ? \\ 0 & F_{22} & ? \\ ? & ? & ? \end{bmatrix}, $$

where the upper-left 2 × 2 block is taken as known and "?" marks unknown elements.

Error in FIM estimates:

              relMSE(F̂n) [MSE(F̂n)]   relMSE(F̃n) [MSE(F̃n)]   MSE (variance) reduction
  L-based     0.00135 % [0.0071]       0.00011 % [0.0006]       91.5 % (93.4 %)
  g-based     0.0533 %  [0.0020]       0.0198 %  [0.0004]       81.0 % (93.5 %)

MSE and MSE reduction of estimates of Fn(θ); N = 5 × 10⁵, c = 0.0001, c̃ = 0.00011, ∆ki, ∆̃ki ∼ Bernoulli ±1.


Concluding Remarks

The Fisher information matrix is a central quantity in data analysis and parameter estimation

Direct computation of the information matrix in general nonlinear problems is usually impossible; a Monte Carlo approach is usually preferred

A modification of the previous Monte Carlo-based resampling algorithm is proposed to enhance the statistical characteristics of the estimator of Fn(θ)

- Particularly useful in those cases where some elements of Fn(θ) are analytically known from prior information

Numerical illustrations show considerable improvement of the new estimator (in the sense of mean-squared-error reduction as well as variance reduction) over the previous estimator


Appendix

For Further Reading

- S. Das, "Efficient calculation of Fisher information matrix: Monte Carlo approach using prior information," Master's thesis, Department of Applied Mathematics and Statistics, The Johns Hopkins University, Baltimore, Maryland, USA, May 2007. http://dspace.library.jhu.edu/handle/1774.2/32459
- J. C. Spall, "Monte Carlo computation of the Fisher information matrix in nonstandard settings," J. Comput. Graph. Statist., vol. 14, no. 4, pp. 889–909, 2005.
- S. Das, J. C. Spall, and R. Ghanem, "Efficient Monte Carlo computation of Fisher information matrix using prior information," Computational Statistics and Data Analysis, vol. 54, no. 2, pp. 272–289, 2010.
