Particle Filters: Convergence Results and High Dimensions


  1. Particle Filters: Convergence Results and High Dimensions. Mark Coates (mark.coates@mcgill.ca), McGill University, Department of Electrical and Computer Engineering, Montreal, Quebec, Canada. Bellairs 2012.

  2. Outline: (1) Introduction to Sequential Monte Carlo Methods; (2) Convergence Results; (3) High Dimensionality.

  3. References:
     Crisan, D. and Doucet, A. (2002). A survey of convergence results on particle filtering methods for practitioners. IEEE Trans. Signal Processing, 50(3):736-746, Mar. 2002.
     Beskos, A., Crisan, D., and Jasra, A. (2011). On the stability of sequential Monte Carlo methods in high dimensions. Technical report, Imperial College London.
     Snyder, C., Bengtsson, T., Bickel, P., and Anderson, J. (2008). Obstacles to high-dimensional particle filtering. Monthly Weather Review, 136:4629-4640.
     Bengtsson, T., Bickel, P., and Li, B. (2008). Curse-of-dimensionality revisited: Collapse of the particle filter in very large scale systems. In Essays in Honor of David A. Freedman, D. Nolan and T. Speed, Eds., 316-334, IMS.
     Quang, P. B., Musso, C., and Le Gland, F. (2011). An insight into the issue of dimensionality in particle filtering. Proc. ISIF Int. Conf. Information Fusion, Edinburgh, Scotland.

  4. Discrete-time Filtering. Fixed observations $y_1, \dots, y_n$ with $y_k \in \mathbb{R}^{d_y}$. Hidden Markov chain $X_0, \dots, X_n$ with $X_k \in E^d$. Initial distribution $X_0 \sim \mu(dx_0)$. Probability transition kernel $K(dx_t \mid x_{t-1})$ such that
     $$\Pr(X_t \in A \mid X_{t-1} = x_{t-1}) = \int_A K(dx_t \mid x_{t-1}) \qquad (1)$$
     The observations are conditionally independent given $X$ and have marginal distribution
     $$\Pr(Y_t \in B \mid X_t = x_t) = \int_B g(dy_t \mid x_t) \qquad (2)$$
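     To make the model concrete, here is a minimal simulation sketch in Python, assuming (purely for illustration) a one-dimensional linear Gaussian transition $X_t = 0.9\,X_{t-1} + V_t$ and observation $Y_t = X_t + W_t$; neither this specific model nor the code comes from the slides.

```python
import numpy as np

# Minimal simulation of the hidden Markov model above. The linear Gaussian
# transition and observation densities are illustrative assumptions only.
rng = np.random.default_rng(0)
T = 50
x = rng.standard_normal()                  # X_0 ~ mu (standard normal here)
xs, ys = [], []
for t in range(1, T + 1):
    x = 0.9 * x + rng.standard_normal()    # X_t ~ K(. | x_{t-1})
    y = x + rng.standard_normal()          # Y_t ~ g(. | x_t)
    xs.append(x)
    ys.append(y)
```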

  5. Bayes' Recursion. Paths of the signal and observation processes from time $k$ to $l$: $X_{k:l} = (X_k, X_{k+1}, \dots, X_l)$ and $Y_{k:l} = (Y_k, Y_{k+1}, \dots, Y_l)$. Define the probability distribution $\pi_{k:l|m}(dx_{k:l}) = P(X_{k:l} \in dx_{k:l} \mid Y_{1:m} = y_{1:m})$. Bayes' theorem leads to the following relationship:
     $$\pi_{0:t|t}(dx_{0:t}) \propto \mu(dx_0) \prod_{k=1}^{t} K(dx_k \mid x_{k-1}) \, g(y_k \mid x_k) \qquad (3)$$

  6. Bayes' Recursion. Prediction:
     $$\pi_{0:t|t-1}(dx_{0:t}) = \pi_{0:t-1|t-1}(dx_{0:t-1}) \, K(dx_t \mid x_{t-1})$$
     Update:
     $$\pi_{0:t|t}(dx_{0:t}) = \pi_{0:t|t-1}(dx_{0:t}) \, g(y_t \mid x_t) \left( \int g(y_t \mid x_t) \, \pi_{0:t|t-1}(dx_{0:t}) \right)^{-1}$$

  7. Particle Filtering. Recursive algorithm: produce a particle cloud whose empirical measure is close to $\pi_{t|t}$. $N$ particle paths $\{x^{(i)}_t\}_{i=1}^{N}$, with associated empirical measure
     $$\pi^N_{t|t}(dx_t) = \frac{1}{N} \sum_{i=1}^{N} \delta_{x^{(i)}_t}(dx_t) \qquad (4)$$
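     As a concrete illustration of how the empirical measure is used, the short sketch below approximates $(\pi_{t|t}, \varphi)$ by the particle average $(\pi^N_{t|t}, \varphi) = \frac{1}{N}\sum_i \varphi(x^{(i)}_t)$; the Gaussian particle cloud and the test function are arbitrary choices for illustration.

```python
import numpy as np

# Approximating (pi, phi) by the particle average (pi^N, phi) = (1/N) sum_i phi(x_i).
# The particle cloud (standard normal) and the test function are illustrative choices.
rng = np.random.default_rng(0)
particles = rng.standard_normal(10_000)   # x_t^(i), i = 1, ..., N
phi = lambda x: np.cos(x)                 # a continuous bounded test function
print(phi(particles).mean())              # approx E[cos(X)] = exp(-1/2) ~ 0.6065
```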

  8. Particle Filtering (bootstrap filter).
     Initialization: sample $x^{(i)}_0 \sim \pi_{0|0}(dx_0)$.
     For $t \ge 1$:
     Importance sampling: sample $\tilde{x}^{(i)}_t \sim \pi^N_{t-1|t-1} K(dx_t)$.
     Weight evaluation: $w^{(i)}_t \propto g(y_t \mid \tilde{x}^{(i)}_t)$, with $\sum_{i=1}^{N} w^{(i)}_t = 1$. (5)
     Resample: sample $x^{(i)}_t \sim \tilde{\pi}^N_{t|t}(dx_t)$.
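     As a rough guide to how this recursion translates into code, here is a minimal bootstrap-filter sketch in Python; the callables `sample_prior`, `sample_transition`, and `likelihood` are hypothetical user-supplied functions, not part of the slides.

```python
import numpy as np

def bootstrap_filter(observations, sample_prior, sample_transition, likelihood,
                     N=1000, rng=None):
    """Bootstrap particle filter sketch.

    sample_prior(N) -> (N, d) array of draws from pi_{0|0}
    sample_transition(particles) -> (N, d) array of draws from K(. | x_{t-1})
    likelihood(y_t, particles) -> (N,) array of g(y_t | x_t) values
    """
    rng = np.random.default_rng() if rng is None else rng
    particles = sample_prior(N)                              # initialization
    means = []
    for y_t in observations:
        particles = sample_transition(particles)             # importance sampling step
        w = likelihood(y_t, particles)                       # unnormalized weights
        w = w / w.sum()                                       # enforce sum_i w_t^(i) = 1
        means.append((w[:, None] * particles).sum(axis=0))   # weighted posterior-mean estimate
        idx = rng.choice(N, size=N, p=w)                      # multinomial resampling
        particles = particles[idx]
    return np.array(means)
```

     The estimate is formed from the weighted particles before resampling, since resampling only adds the extra Monte Carlo variance discussed on slide 10.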

  9. Drawbacks: variation of the importance weights. The distribution of the particles $\{\tilde{x}^{(i)}_t\}_{i=1}^{N}$ is approximately $\pi_{t|t-1} = \pi_{t-1|t-1} K$. The algorithm can be inefficient if this is "far" from $\pi_{t|t}$: the ratio
     $$\frac{\pi_{t|t}(dx_t)}{\pi_{t|t-1}(dx_t)} \propto g(y_t \mid x_t)$$
     can then generate weights with high variance.

  10. Drawbacks: variation induced by resampling. The proposed resampling step generates $N^{(i)}_t$ copies of the $i$-th particle. These counts are drawn from a multinomial distribution, so
     $$E(N^{(i)}_t) = N w^{(i)}_t, \qquad \operatorname{var}(N^{(i)}_t) = N w^{(i)}_t (1 - w^{(i)}_t).$$
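     A quick numerical sanity check of these two moments, with an arbitrary weight vector chosen only for illustration:

```python
import numpy as np

# Empirical check of E(N_t^(i)) = N w_t^(i) and var(N_t^(i)) = N w_t^(i) (1 - w_t^(i))
# for multinomial resampling; the weight vector is an arbitrary illustrative choice.
rng = np.random.default_rng(0)
N = 200
w = rng.dirichlet(np.ones(N))                    # normalized weights w_t^(i)
counts = rng.multinomial(N, w, size=20_000)      # resampled counts N_t^(i), many replicates
print(np.abs(counts.mean(axis=0) - N * w).max())            # close to 0
print(np.abs(counts.var(axis=0) - N * w * (1 - w)).max())   # close to 0
```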

  11. Sequential Importance Sampling/Resampling.
     Initialization: sample $x^{(i)}_0 \sim \pi_{0|0}(dx_0)$.
     For $t \ge 1$:
     Importance sampling: sample $\tilde{x}^{(i)}_t \sim \pi^N_{t-1|t-1} \tilde{K}(dx_t)$.
     Weight evaluation: $w^{(i)}_t \propto \dfrac{K(dx_t \mid x^{(i)}_{t-1}) \, g(y_t \mid \tilde{x}^{(i)}_t)}{\tilde{K}(dx_t \mid x^{(i)}_{t-1})}$, with $\sum_{i=1}^{N} w^{(i)}_t = 1$. (6)
     Resample: sample $x^{(i)}_t \sim \tilde{\pi}^N_{t|t}(dx_t)$.
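     Below is a sketch of the corresponding one-step weight computation when a proposal kernel $\tilde{K}$ replaces $K$; the callables `prop_sample`, `trans_pdf`, `prop_pdf`, and `likelihood` are hypothetical placeholders, not part of the slides.

```python
import numpy as np

def sisr_step(particles, y_t, prop_sample, trans_pdf, prop_pdf, likelihood, rng):
    """One SISR step with a proposal kernel K_tilde (all callables user-supplied):

    prop_sample(particles, y_t) -> proposals drawn from K_tilde(. | x_{t-1}, y_t)
    trans_pdf(new, old)         -> K(new | old) density values
    prop_pdf(new, old, y_t)     -> K_tilde(new | old, y_t) density values
    likelihood(y_t, new)        -> g(y_t | new) values
    """
    proposed = prop_sample(particles, y_t)
    w = trans_pdf(proposed, particles) * likelihood(y_t, proposed) \
        / prop_pdf(proposed, particles, y_t)                    # eq. (6), unnormalized
    w = w / w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)  # multinomial resampling
    return proposed[idx], w
```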

  12. Sequential Importance Sampling/Resampling. The algorithm is the same as the bootstrap filter applied to a new dynamic model:
     $$\Pr(X_t \in A \mid X_{t-1} = x_{t-1}, Y_t = y_t) = \int_A \tilde{K}(dx_t \mid x_{t-1}, y_t)$$
     $$\Pr(Y_t \in B \mid X_{t-1} = x_{t-1}, X_t = x_t) = \int_B w(x_{t-1}, x_t, dy_t)$$
     This is only true if we assume the observations are fixed! With this model, $\rho_{0:t|t-1} \ne \pi_{0:t|t-1}$ but $\rho_{0:t|t} = \pi_{0:t|t}$. If $\tilde{K}$ has better mixing properties, or $w(x_{t-1}, x_t, y_t)$ is a flatter likelihood, then the algorithm will perform better.

  13. Almost Sure Convergence. Theorem: assume that the transition kernel $K$ is Feller and that the likelihood function $g$ is bounded, continuous and strictly positive; then $\lim_{N\to\infty} \pi^N_{t|t} = \pi_{t|t}$ almost surely.
     Feller: for $\varphi$ a continuous bounded function, $K\varphi$ is also a continuous bounded function. Intuition: we want two realizations of the signal that start from "close" positions to remain "close" at subsequent times.
     Define $(\mu, \varphi) = \int \varphi \, d\mu$. We write $\lim_{N\to\infty} \mu^N = \mu$ if $\lim_{N\to\infty} (\mu^N, \varphi) = (\mu, \varphi)$ for any continuous bounded function $\varphi$.

  14. Proof discussion. Let $(E, d)$ be a metric space and let $(a_t)_{t=1}^{\infty}$ and $(b_t)_{t=1}^{\infty}$ be two sequences of continuous functions $a_t, b_t : E \to E$. Define $k_t$ and $k_{1:t}$ by
     $$k_t = a_t \circ b_t, \qquad k_{1:t} = k_t \circ k_{t-1} \circ \cdots \circ k_1. \qquad (7)$$
     Perturb $k_t$ and $k_{1:t}$ using a function $c^N$:
     $$k^N_t = c^N \circ a_t \circ c^N \circ b_t, \qquad k^N_{1:t} = k^N_t \circ k^N_{t-1} \circ \cdots \circ k^N_1. \qquad (8)$$
     Assume that as $N$ becomes larger the perturbations become smaller, i.e. $c^N$ converges to the identity function on $E$. Does this mean that $k^N_t$ and $k^N_{1:t}$ converge?

  15. Counterexample. Let $E = [0, 1]$ and $d(\alpha, \beta) = |\alpha - \beta|$, and let $a_t$ and $b_t$ both equal the identity $i$ on $E$, so $k_t$ is also the identity. Define
     $$c^N(\alpha) = \begin{cases} \alpha + \dfrac{\alpha}{N}, & \alpha \in [0, \tfrac{1}{2}] \\ 1 - (N - 1)\left|\tfrac{1}{2} + \tfrac{1}{2N} - \alpha\right|, & \alpha \in (\tfrac{1}{2}, \tfrac{1}{2} + \tfrac{1}{N}) \\ \alpha + \dfrac{\alpha - 1}{N}, & \alpha \in (\tfrac{1}{2} + \tfrac{1}{N}, 1] \end{cases}$$
     Now $\lim_{N\to\infty} c^N(\alpha) = \alpha$ for all $\alpha \in [0, 1]$. But $\lim_{N\to\infty} k^N_1(\tfrac{1}{2}) = \lim_{N\to\infty} c^N\!\left(\tfrac{1}{2} + \tfrac{1}{2N}\right) = 1 \ne \tfrac{1}{2}$.
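     A small numerical illustration of this counterexample, assuming the piecewise form given above: pointwise $c^N(\alpha) \to \alpha$, yet $c^N(c^N(\tfrac{1}{2})) = 1$ for every $N$.

```python
# Numerical illustration of the counterexample: c_N tends to the identity
# pointwise, yet k_N(1/2) = c_N(c_N(1/2)) equals 1 for every N.
def c(alpha, N):
    if alpha <= 0.5:
        return alpha + alpha / N
    if alpha < 0.5 + 1.0 / N:
        return 1.0 - (N - 1) * abs(0.5 + 1.0 / (2 * N) - alpha)
    return alpha + (alpha - 1.0) / N

for N in (10, 100, 1000):
    print(N, c(0.3, N), c(c(0.5, N), N))   # c_N(0.3) -> 0.3, while c_N(c_N(0.5)) stays 1.0
```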

  16. Proof discussion. So successive small perturbations may not always lead to a small perturbation overall. We need a stronger type of convergence for $c^N$: convergence in a uniform manner. For all $\epsilon > 0$ there exists $N(\epsilon)$ such that $d(c^N(e), i(e)) < \epsilon$ for all $e \in E$ and all $N \ge N(\epsilon)$. This implies that $\lim_{N\to\infty} e^N = e \Rightarrow \lim_{N\to\infty} c^N(e^N) = e$, and then $\lim_{N\to\infty} k^N_t = k_t$ and $\lim_{N\to\infty} k^N_{1:t} = k_{1:t}$.

  17. Filtering Application. $E = \mathcal{P}(\mathbb{R}^d)$: the set of probability measures over $\mathbb{R}^d$, endowed with the topology of weak convergence. $\mu^N$ converges weakly to $\mu$ if $\lim_{N\to\infty} (\mu^N, \varphi) = (\mu, \varphi)$ for all continuous bounded functions $\varphi$; here $(\mu, \varphi) = \int \varphi \, d\mu$.
     Define $b_t(\nu)(dx_t) = \int_{\mathbb{R}^d} K(dx_t \mid x_{t-1}) \, \nu(dx_{t-1})$, so that $\pi_{t|t-1} = b_t(\pi_{t-1|t-1})$.
     Let $a_t(\nu)$ be the probability measure defined by $(a_t(\nu), \varphi) = (\nu, g)^{-1} (\nu, \varphi g)$ for any continuous bounded function $\varphi$.
     Then $\pi_{t|t} = a_t(\pi_{t|t-1}) = a_t \circ b_t(\pi_{t-1|t-1})$.

  18. Filtering Application. Assume $a_t$ is continuous: a slight variation in the conditional distribution of $X_t$ should not result in a big variation in the conditional distribution after $y_t$ is taken into account. One way to guarantee this: assume $g(y_t \mid \cdot)$ is a continuous, bounded, strictly positive function. Positivity ensures the normalizing denominator is never 0. In particle filtering the perturbation $c^N$ is random, but with probability 1 it has the properties outlined above.

  19. Convergence of the Mean Square Error. A different mode of convergence: $\lim_{N\to\infty} E[((\mu^N, \varphi) - (\mu, \varphi))^2] = 0$, where the expectation is over all realizations of the random particle method. Assumption: $g(y_t \mid \cdot)$ is a bounded function of the argument $x_t$.
     Theorem: there exists $c_{t|t}$, independent of $N$, such that for any continuous bounded function $\varphi$:
     $$E\left[\left((\pi^N_{t|t}, \varphi) - (\pi_{t|t}, \varphi)\right)^2\right] \le c_{t|t} \, \frac{\|\varphi\|^2}{N} \qquad (9)$$
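     A one-step numerical check of the $1/N$ rate, assuming (for illustration only) a one-dimensional Gaussian prior $N(0,1)$ and likelihood $y \mid x \sim N(x, 1)$, so the exact posterior mean $y/2$ is available in closed form; the test function is $\varphi(x) = x$.

```python
import numpy as np

# One-step check of the O(1/N) mean-square error rate. Gaussian prior N(0,1),
# likelihood y | x ~ N(x, 1), exact posterior mean y/2 (conjugacy); phi(x) = x.
# The model and numbers are illustrative assumptions, not from the slides.
rng = np.random.default_rng(2)
y = 1.3
exact = y / 2.0
for N in (100, 1_000, 10_000):
    sq_errs = []
    for _ in range(500):                       # replicate to estimate the MSE
        x = rng.standard_normal(N)             # particles drawn from the prior
        w = np.exp(-0.5 * (y - x) ** 2)        # unnormalized weights g(y | x)
        w /= w.sum()
        sq_errs.append(((w * x).sum() - exact) ** 2)
    print(f"N={N:6d}  N * MSE = {N * np.mean(sq_errs):.3f}")   # roughly constant in N
```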

  20. Convergence of the Mean Square Error. If one uses a kernel $\tilde{K}$ instead of $K$, we also need $\|w\| < \infty$. "In other words, particle filtering methods beat the curse of dimensionality as the rate of convergence is independent of the state dimension $d$." "However, to ensure a given precision on the mean square error... the number of particles $N$ also depends on $c_{t|t}$, which can depend on $d$." [Crisan and Doucet, 2002]

  21. Uniform Convergence. We have shown that $(\pi^N_{t|t}, \varphi)$ converges to $(\pi_{t|t}, \varphi)$ in the mean-square sense, with a rate of convergence in $1/N$. But how does $c_{t|t}$ behave over time? If the true optimal filter does not forget its initial conditions, the errors accumulate over time. We need mixing assumptions on the dynamic model (and thus on the true optimal filter); under such assumptions, uniform (in time) convergence results can be obtained [Del Moral, 2004].

  22. Curse of Dimensionality. Consider the batch setting: observe $Y_{1:n}$ and try to estimate the hidden state $X_n$. Let $g(y_{1:n} \mid x)$ be the likelihood and $f(x)$ the prior density, and suppose $f(x)$ is chosen as the importance density. The root mean square error can be bounded [Le Gland and Oudjane, 2002] as:
     $$E\left[\left((\pi^N_{t|t}, \varphi) - (\pi_{t|t}, \varphi)\right)^2\right]^{1/2} \le \frac{c_0}{\sqrt{N}} \, I(f, g) \, \|\varphi\| \qquad (10)$$
     where
     $$I(f, g) = \frac{\sup_x g(y_{1:n} \mid x)}{\int_{\mathbb{R}^d} g(y_{1:n} \mid x) \, f(x) \, dx} \qquad (11)$$
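     The sketch below illustrates how this kind of degeneracy bites as the dimension grows: with the prior $N(0, I_d)$ as importance density and a Gaussian likelihood for $y = x + \text{noise}$, the largest normalized weight approaches 1 and the effective sample size collapses. The model and the numbers are illustrative assumptions, not taken from the references.

```python
import numpy as np

# Weight degeneracy with growing dimension d: prior N(0, I_d) as importance
# density, Gaussian likelihood for y = x + unit-variance noise. Illustrative only.
rng = np.random.default_rng(1)
N = 10_000                                     # number of particles
for d in (1, 5, 20, 50, 100):
    x_true = rng.standard_normal(d)
    y = x_true + rng.standard_normal(d)        # observation
    x = rng.standard_normal((N, d))            # samples from the prior f
    log_w = -0.5 * ((y - x) ** 2).sum(axis=1)  # log g(y | x), up to an additive constant
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    ess = 1.0 / (w ** 2).sum()                 # effective sample size
    print(f"d={d:4d}  max weight = {w.max():.3f}  ESS = {ess:.1f}")
```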
