

1. Efficient Likelihood Evaluation of State-Space Representations

David N. DeJong (University of Pittsburgh), Roman Liesenfeld (Universität Kiel), Guilherme V. Moura (VU University), Jean-Francois Richard (University of Pittsburgh), Hariharan Dharmarajan (Bates White LLC)

August 2010

2. Objective

Likelihood evaluation and filtering for state-space representations featuring departures from:
- Linearity
- Normality

3. Motivation

In the linear/normal case, exact likelihood evaluations are available analytically via the Kalman filter. However, linear/normal characterizations of economic phenomena are often inadequate or inappropriate, necessitating the implementation of numerical approximation techniques known as sequential Monte Carlo (SMC) methods.

Example: In working with DSGE models, linear approximations are problematic for conducting likelihood analysis (Fernandez-Villaverde and Rubio-Ramirez, 2005 JAE; 2009 REStud).

4. Sketch of Literature on SMC Methods

SMC methods employ importance sampling densities to construct numerical approximations of integrals that arise in pursuit of likelihood evaluation and filtering. Typically, importance samplers are based on discrete approximations of filtering densities. The individual elements of these samplers are known as particles; the approximations they represent collectively are known as particle swarms.

5. SMC Methods, cont.

Baseline methods construct time-$t$ approximations of filtering densities absent information on the time-$t$ observables $y_t$. Such methods are termed unadapted. Leading examples include Handschin and Mayne, 1969 Intl. J. of Control; Handschin, 1970 Automatica; Gordon, Salmond, and Smith, 1993 IEE Proceedings-F.

Baseline methods are relatively easy to implement and yield unbiased estimates; however, they can be numerically inefficient. Refinements seek to achieve improvements in numerical efficiency by taking $y_t$ into account in constructing time-$t$ samplers. The pursuit of such improvements is known as adaption. A prominent example of an adapted algorithm is the auxiliary particle filter of Pitt and Shephard, 1999 JASA.

6. SMC Methods, cont.

To date, adaption has been pursued subject to the constraint that the discrete support of the filtering density constructed in period $t-1$ is taken as given and fixed in period $t$. We refer to the imposition of this constraint as the pursuit of conditional adaption. The approach to filtering we propose here is implemented absent this constraint: our objective is to pursue unconditional adaption.

7. SMC Methods, cont.

Specifically, we use continuous approximations of filtering densities as an input to the construction of time-$t$ importance samplers designed to generate optimal (in terms of numerical efficiency) global approximations to targeted integrands. The approximations fully account for the information conveyed by $y_t$, and are constructed using the methodology of efficient importance sampling (EIS) developed by Richard and Zhang, 2007 J. of Econometrics. Resulting likelihood approximations are continuous functions of model parameters, greatly enhancing the pursuit of parameter estimation.

8. State Space Representations

State-transition equation: $s_t = \gamma(s_{t-1}, Y_{t-1}, \upsilon_t)$, with associated density $f(s_t \mid s_{t-1}, Y_{t-1})$.

Measurement equation: $y_t = \delta(s_t, Y_{t-1}, u_t)$, with associated density $f(y_t \mid s_t, Y_{t-1})$.

Initialization: $f(s_0)$.
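As a concrete illustration (an assumption for this sketch, not the slides' application), a standard stochastic-volatility model fits this template: the state is a Gaussian AR(1) log-volatility, and the measurement is nonlinear in the state. A minimal simulation sketch in Python:

```python
# A minimal sketch (illustrative model, not from the slides): a standard
# stochastic-volatility model as a nonlinear/non-Gaussian state space.
# State transition: s_t = rho * s_{t-1} + nu_t,  nu_t ~ N(0, sigma_nu^2)
# Measurement:      y_t = exp(s_t / 2) * u_t,    u_t ~ N(0, 1)
import numpy as np

def simulate_sv(T, rho=0.95, sigma_nu=0.2, seed=0):
    rng = np.random.default_rng(seed)
    s = np.empty(T)
    y = np.empty(T)
    # initialize s_0 from the stationary distribution of the AR(1) state
    s_prev = rng.normal(0.0, sigma_nu / np.sqrt(1.0 - rho**2))
    for t in range(T):
        s[t] = rho * s_prev + rng.normal(0.0, sigma_nu)   # state transition
        y[t] = np.exp(s[t] / 2.0) * rng.normal(0.0, 1.0)  # measurement
        s_prev = s[t]
    return y, s
```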

9. State Space Representations, cont.

Objective: evaluate the likelihood function
$$f(Y_T) = \prod_{t=1}^{T} f(y_t \mid Y_{t-1}),$$
where $f(y_1 \mid Y_0) \equiv f(y_1)$.

Time-$t$ likelihoods are evaluated via marginalization of measurement densities:
$$f(y_t \mid Y_{t-1}) = \int f(y_t \mid s_t, Y_{t-1}) \, f(s_t \mid Y_{t-1}) \, ds_t.$$

Marginalization requires the evaluation of $f(s_t \mid Y_{t-1})$:
$$f(s_t \mid Y_{t-1}) = \int f(s_t \mid s_{t-1}, Y_{t-1}) \, f(s_{t-1} \mid Y_{t-1}) \, ds_{t-1},$$
where
$$f(s_t \mid Y_t) = \frac{f(y_t, s_t \mid Y_{t-1})}{f(y_t \mid Y_{t-1})} = \frac{f(y_t \mid s_t, Y_{t-1}) \, f(s_t \mid Y_{t-1})}{f(y_t \mid Y_{t-1})}.$$
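The recursion can be read as alternating prediction and update steps, with the likelihood accumulating one-step-ahead terms. A minimal generic sketch (the function names `predict`, `update`, and `init` are hypothetical placeholders, not the authors' API):

```python
# A minimal sketch of the prediction/update recursion: the log-likelihood
# is the sum of per-period terms log f(y_t | Y_{t-1}).
import numpy as np

def log_likelihood(y, predict, update, init):
    filt = init()                      # represents f(s_0)
    loglik = 0.0
    for y_t in y:
        pred = predict(filt)           # f(s_t | Y_{t-1}) from f(s_{t-1} | Y_{t-1})
        f_y, filt = update(pred, y_t)  # f(y_t | Y_{t-1}) and f(s_t | Y_t)
        loglik += np.log(f_y)
    return loglik
```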

10. Particle Filters: General Principle

Period-$t$ computation inherently requires the evaluation of
$$\hat f(y_t \mid Y_{t-1}) = \int \int f(y_t \mid s_t, Y_{t-1}) \, f(s_t \mid s_{t-1}, Y_{t-1}) \, \hat f(s_{t-1} \mid Y_{t-1}) \, ds_{t-1} \, ds_t.$$

Particle filters rely upon approximations in the form of a mixture of Dirac measures associated with the period-$(t-1)$ swarm $\{s_{t-1}^i\}_{i=1}^N$, which is fixed in period $t$:
$$\hat f(s_{t-1} \mid Y_{t-1}) = \sum_{i=1}^{N} \omega_{t-1}^i \, \delta_{s_{t-1}^i}(s_{t-1}),$$
where $\delta_{s_{t-1}^i}(s)$ denotes the Dirac measure at point $s_{t-1}^i$, and $\omega_{t-1}^i$ the weight associated with particle $s_{t-1}^i$. This approximation effectively solves the (inner) integration in $s_{t-1}$, yielding
$$f(y_t \mid Y_{t-1}) = \int f(y_t \mid s_t, Y_{t-1}) \sum_{i=1}^{N} \omega_{t-1}^i \, f(s_t \mid s_{t-1}^i, Y_{t-1}) \, ds_t.$$
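To make the Dirac-mixture step concrete, here is a minimal sketch of the resulting one-period Monte Carlo estimate; `transition_sample` and `measurement_density` are hypothetical callables standing in for $f(s_t \mid s_{t-1}, Y_{t-1})$ and $f(y_t \mid s_t, Y_{t-1})$:

```python
# A minimal sketch of the Dirac-mixture estimate of f(y_t | Y_{t-1}).
import numpy as np

def predictive_estimate(y_t, particles, weights, transition_sample,
                        measurement_density, rng):
    # One draw s_t^i per particle i solves the inner integral in s_{t-1};
    # the weighted average then approximates the outer integral in s_t.
    s_t = transition_sample(particles, rng)        # s_t^i ~ f(. | s_{t-1}^i)
    return np.sum(weights * measurement_density(y_t, s_t))
```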

11. Unadapted Filters

Period-$t$ algorithm: Inherit $\hat f(s_{t-1} \mid Y_{t-1})$, represented using $\{\omega_{t-1}^i, s_{t-1}^i\}_{i=1}^N$, from the period-$(t-1)$ step. Approximate $f(s_t \mid Y_{t-1})$: for each $s_{t-1}^i$, draw $s_t^i$ from $f(s_t \mid s_{t-1}^i, Y_{t-1})$, yielding
$$\hat f(y_t \mid Y_{t-1}) = \sum_{i=1}^{N} \omega_{t-1}^i \, f(y_t \mid s_t^i, Y_{t-1}).$$

Approximate $\hat f(s_t \mid Y_t)$ as
$$\hat f(s_t \mid Y_t) = \sum_{i=1}^{N} \omega_t^i \, \delta_{s_t^i}(s_t),$$
where the (posterior) weights $\omega_t^i$ obtain from the (prior) weights $\omega_{t-1}^i$ by application of Bayes' theorem:
$$\omega_t^i = \omega_{t-1}^i \, \frac{f(y_t \mid s_t^i, Y_{t-1})}{\hat f(y_t \mid Y_{t-1})}.$$
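A minimal sketch of this unadapted recursion, applied to the stochastic-volatility example above. It follows the slide in carrying weights forward via the Bayes update; operational filters in the Gordon-Salmond-Smith tradition typically add a resampling step to combat weight degeneracy, omitted here for clarity:

```python
# A minimal sketch of the unadapted period-t recursion for the SV model
# above (model choice is an illustrative assumption; no resampling step).
import numpy as np
from scipy.stats import norm

def unadapted_filter(y, N=1000, rho=0.95, sigma_nu=0.2, seed=0):
    rng = np.random.default_rng(seed)
    # initialize particles from f(s_0) and uniform weights
    s = rng.normal(0.0, sigma_nu / np.sqrt(1.0 - rho**2), N)
    w = np.full(N, 1.0 / N)
    loglik = 0.0
    for y_t in y:
        s = rho * s + rng.normal(0.0, sigma_nu, N)   # draw s_t^i | s_{t-1}^i
        like = norm.pdf(y_t, 0.0, np.exp(s / 2.0))   # f(y_t | s_t^i)
        f_hat = np.sum(w * like)                     # estimate of f(y_t | Y_{t-1})
        loglik += np.log(f_hat)
        w = w * like / f_hat                         # Bayes update of weights
    return loglik
```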

12. Conditional Adaptation

The measurement density incorporates the assumption that $y_t$ is independent of $s_{t-1}$ given $(s_t, Y_{t-1})$; this implies
$$f(y_t \mid s_t, Y_{t-1}) \, f(s_t \mid s_{t-1}, Y_{t-1}) = f(s_t \mid s_{t-1}, Y_t) \, f(y_t \mid s_{t-1}, Y_{t-1}).$$

When this factorization is analytically tractable, it is possible to achieve conditionally optimal adaption:
$$\hat f(y_t \mid Y_{t-1}) = \int \int f(s_t \mid s_{t-1}, Y_t) \, f(y_t \mid s_{t-1}, Y_{t-1}) \, \hat f(s_{t-1} \mid Y_{t-1}) \, ds_{t-1} \, ds_t = \int f(y_t \mid s_{t-1}, Y_{t-1}) \, \hat f(s_{t-1} \mid Y_{t-1}) \, ds_{t-1} = \sum_{i=1}^{N} \omega_{t-1}^i \, f(y_t \mid s_{t-1}^i, Y_{t-1}).$$

13. Conditional Adaptation: Implementation

To implement: for each particle $s_{t-1}^i$, draw a particle $s_t^i$ from $f(s_t \mid s_{t-1}^i, Y_t)$. The corresponding weights are given by
$$\omega_t^i = \omega_{t-1}^i \, \frac{f(y_t \mid s_{t-1}^i, Y_{t-1})}{\hat f(y_t \mid Y_{t-1})}.$$

Key difference relative to unadapted filters: the draws of $s_t$ are conditional on $y_t$. Since $\omega_t^i$ does not depend on $s_t^i$, but only on $s_{t-1}^i$, its conditional variance is zero given $\{s_{t-1}^i\}_{i=1}^N$. This is referenced as the optimal sampler following Zaritskii et al., 1975 Automation and Remote Control; Akashi and Kumamoto, 1977 Automatica.

Since the factorization $f(y_t \mid s_t, Y_{t-1}) \, f(s_t \mid s_{t-1}, Y_{t-1}) = f(s_t \mid s_{t-1}, Y_t) \, f(y_t \mid s_{t-1}, Y_{t-1})$ is tractable only in special cases, this sampler represents a theoretical rather than an operational benchmark.
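One special case in which the factorization is tractable is a linear-Gaussian measurement equation paired with the Gaussian AR(1) transition. The following sketch (an illustrative assumption, not the slides' model) implements one conditionally optimal step for $y_t = s_t + u_t$, $u_t \sim N(0, \sigma_u^2)$:

```python
# A minimal sketch of the conditionally optimal (adapted) step in a
# tractable special case: linear-Gaussian measurement, Gaussian AR(1) state.
import numpy as np
from scipy.stats import norm

def adapted_step(y_t, s_prev, w, rho, sigma_nu, sigma_u, rng):
    # f(y_t | s_{t-1}) = N(rho * s_{t-1}, sigma_nu^2 + sigma_u^2); the weight
    # update depends only on s_{t-1}, so it has zero conditional variance
    # given the swarm.
    pred_sd = np.sqrt(sigma_nu**2 + sigma_u**2)
    like = norm.pdf(y_t, rho * s_prev, pred_sd)
    f_hat = np.sum(w * like)
    w_new = w * like / f_hat
    # f(s_t | s_{t-1}, y_t) is Gaussian; draw s_t^i conditional on y_t.
    post_var = 1.0 / (1.0 / sigma_nu**2 + 1.0 / sigma_u**2)
    post_mean = post_var * (rho * s_prev / sigma_nu**2 + y_t / sigma_u**2)
    s_new = rng.normal(post_mean, np.sqrt(post_var))
    return s_new, w_new, f_hat
```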

14. Approximate Conditional Optimality

Attempts at approximating conditional optimality follow from the interpretation of
$$f(y_t \mid Y_{t-1}) = \int f(y_t \mid s_t, Y_{t-1}) \sum_{i=1}^{N} \omega_{t-1}^i \, f(s_t \mid s_{t-1}^i, Y_{t-1}) \, ds_t$$
as a mixed integral in $(s_t, k_t)$, where $k_t$ denotes the index of particles and follows the multinomial distribution $MN\left(N, \{\omega_{t-1}^i\}_{i=1}^N\right)$.

The likelihood integral may then be evaluated via importance sampling, relying upon a mixed density kernel of the form
$$\gamma_t(s, k) = \omega_{t-1}^k \, p_t(s, k) \, f(s \mid s_{t-1}^k, Y_{t-1}).$$

Pitt and Shephard (1999 JASA) pursue conditional optimality by specifying $p_t(s, k)$ as
$$p_t(s, k) = f(y_t \mid \mu_t^k, Y_{t-1}), \qquad \mu_t^k = E(s_t \mid s_{t-1}^k, Y_{t-1}).$$
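A minimal sketch of one period of the Pitt-Shephard auxiliary particle filter, written for the stochastic-volatility example above (the slides state the sampler abstractly, so the model choice is an assumption):

```python
# A minimal sketch of one auxiliary-particle-filter step for the SV model.
import numpy as np
from scipy.stats import norm

def apf_step(y_t, s_prev, w, rho, sigma_nu, rng):
    mu = rho * s_prev                           # mu_t^k = E(s_t | s_{t-1}^k)
    first_stage = w * norm.pdf(y_t, 0.0, np.exp(mu / 2.0))
    probs = first_stage / first_stage.sum()
    # draw particle indices k_t with probability proportional to the
    # first-stage weights, then propagate the selected particles
    k = rng.choice(len(s_prev), size=len(s_prev), p=probs)
    s_new = rho * s_prev[k] + rng.normal(0.0, sigma_nu, len(s_prev))
    # second-stage weights correct for evaluating the measurement density
    # at mu_k rather than s_t
    w_new = (norm.pdf(y_t, 0.0, np.exp(s_new / 2.0))
             / norm.pdf(y_t, 0.0, np.exp(mu[k] / 2.0)))
    return s_new, w_new / w_new.sum()
```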

15. Unconditional Optimality

Returning to the period-$t$ likelihood integral
$$\hat f(y_t \mid Y_{t-1}) = \int \int f(y_t \mid s_t, Y_{t-1}) \, f(s_t \mid s_{t-1}, Y_{t-1}) \, \hat f(s_{t-1} \mid Y_{t-1}) \, ds_{t-1} \, ds_t,$$
consider the theoretical factorization
$$f(y_t \mid s_t, Y_{t-1}) \, f(s_t \mid s_{t-1}, Y_{t-1}) \, \hat f(s_{t-1} \mid Y_{t-1}) = f(s_t, s_{t-1} \mid Y_t) \, \hat f(y_t \mid Y_{t-1}).$$

If analytically tractable, $f(s_t, s_{t-1} \mid Y_t)$ would be the unconditionally optimal (fully adapted) sampler for the likelihood integral, as a single draw from it would produce an estimate of $f(y_t \mid Y_{t-1})$ with zero MC variance. The period-$t$ filtering density would then obtain by marginalization with respect to $s_{t-1}$:
$$f(s_t \mid Y_t) = \int f(s_t, s_{t-1} \mid Y_t) \, ds_{t-1}.$$
