A road map to more complex dynamic models
  1. School of Computer Science
     Probabilistic Graphical Models (10-708), Lecture 13, part II, Nov 5th, 2007
     State Space Models
     Eric Xing
     Reading: J-Chap. 15, K&F chapter 19.1-19.3
     [Figure: a graphical model of a signaling pathway -- Receptor A ($X_1$), Receptor B ($X_2$), Kinase C ($X_3$), Kinase D ($X_4$), Kinase E ($X_5$), TF F ($X_6$), Gene G ($X_7$), Gene H ($X_8$)]

     A road map to more complex dynamic models
     - Static models: the mixture model (discrete hidden X, e.g., a mixture of multinomials or a mixture of Gaussians) and factor analysis (continuous hidden X and Y).
     - Putting a Markov chain over the hidden variables gives their sequential counterparts: the HMM (for discrete sequential data, e.g., text) and the state space model (for continuous sequential data, e.g., a speech signal).
     - Richer variants: the factorial HMM and the switching SSM.
     [Figure: graphical models for each family, with hidden chains $x_1, \ldots, x_N$ and observations $y_1, \ldots, y_N$]

  2. State space models (SSM)
     - A sequential FA or a continuous-state HMM:
       $x_t = A x_{t-1} + G w_t$
       $y_t = C x_t + v_t$
       $w_t \sim N(0; Q), \quad v_t \sim N(0; R), \quad x_0 \sim N(0; \Sigma_0)$
     - This is a linear dynamic system (LDS).
     - In general,
       $x_t = f(x_{t-1}) + G w_t$
       $y_t = g(x_t) + v_t$
       where $f$ is an (arbitrary) dynamic model, and $g$ is an (arbitrary) observation model.

     LDS for 2D tracking
     - Dynamics: new position = old position + $\Delta \times$ velocity + noise (constant velocity model, Gaussian noise):
       $\begin{pmatrix} x^1_t \\ x^2_t \\ \dot{x}^1_t \\ \dot{x}^2_t \end{pmatrix} = \begin{pmatrix} 1 & 0 & \Delta & 0 \\ 0 & 1 & 0 & \Delta \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x^1_{t-1} \\ x^2_{t-1} \\ \dot{x}^1_{t-1} \\ \dot{x}^2_{t-1} \end{pmatrix} + \text{noise}$
     - Observation: project out the first two components (we observe the Cartesian position of the object -- linear!):
       $\begin{pmatrix} y^1_t \\ y^2_t \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} x^1_t \\ x^2_t \\ \dot{x}^1_t \\ \dot{x}^2_t \end{pmatrix} + \text{noise}$
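     To make the 2D-tracking LDS concrete, here is a minimal simulation sketch of the constant-velocity model above; the time step, noise scales, horizon, and initial state are illustrative assumptions, not values from the lecture.

     ```python
     import numpy as np

     # Constant-velocity LDS for 2D tracking (a sketch; dt, q, r, T are assumed values).
     dt, q, r, T = 1.0, 0.01, 0.5, 50
     A = np.array([[1, 0, dt, 0],
                   [0, 1, 0, dt],
                   [0, 0, 1, 0],
                   [0, 0, 0, 1]])          # dynamics: position += dt * velocity
     C = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0]])          # observation: project out the position
     Q = q * np.eye(4)                     # process-noise covariance
     R = r * np.eye(2)                     # observation-noise covariance

     rng = np.random.default_rng(0)
     x = np.array([0.0, 0.0, 1.0, 0.5])    # assumed initial position (0, 0) and velocity (1, 0.5)
     states, obs = [], []
     for _ in range(T):
         x = A @ x + rng.multivariate_normal(np.zeros(4), Q)   # x_t = A x_{t-1} + w_t
         y = C @ x + rng.multivariate_normal(np.zeros(2), R)   # y_t = C x_t + v_t
         states.append(x)
         obs.append(y)
     ```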

  3. The inference problem 1
     - Filtering: given $y_1, \ldots, y_t$, estimate $x_t$: $P(X_t \mid y_{1:t})$.
     - The Kalman filter is a way to perform exact online inference (sequential Bayesian updating) in an LDS. It is the Gaussian analog of the forward algorithm for HMMs:
       $\alpha_t^i = P(X_t = i \mid y_{1:t}) \;\propto\; p(y_t \mid X_t = i) \sum_j p(X_t = i \mid X_{t-1} = j)\,\alpha_{t-1}^j$
     [Figure: chain $X_1 \to X_2 \to \cdots \to X_t$ with emissions $Y_1, \ldots, Y_t$ and forward messages $\alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_t$]

     The inference problem 2
     - Smoothing: given $y_1, \ldots, y_T$, estimate $x_t$ ($t < T$).
     - The Rauch-Tung-Striebel smoother is a way to perform exact off-line inference in an LDS. It is the Gaussian analog of the forwards-backwards (alpha-gamma) algorithm:
       $\gamma_t^i = P(X_t = i \mid y_{1:T}) \;\propto\; \alpha_t^i \sum_j \gamma_{t+1}^j\, P(X_{t+1}^j \mid X_t^i)$
     [Figure: the same chain annotated with forward messages $\alpha_0, \ldots, \alpha_t$ and backward messages $\gamma_0, \ldots, \gamma_t$]
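     Since the slide frames the Kalman filter as the Gaussian analog of the HMM forward recursion, a short sketch of that discrete recursion may be useful; the function name, the transition-matrix convention, and the per-step normalization are my choices, not part of the lecture.

     ```python
     import numpy as np

     def hmm_filter(pi, Trans, lik):
         """Discrete forward (filtering) recursion:
         alpha_t(i) = P(X_t = i | y_{1:t})
                    propto p(y_t | X_t = i) * sum_j P(X_t = i | X_{t-1} = j) * alpha_{t-1}(j)
         pi:    (K,)   initial state distribution
         Trans: (K, K) transition matrix, Trans[j, i] = P(X_t = i | X_{t-1} = j)
         lik:   (T, K) emission likelihoods, lik[t, i] = p(y_t | X_t = i)
         """
         T, K = lik.shape
         alpha = np.zeros((T, K))
         alpha[0] = pi * lik[0]
         alpha[0] /= alpha[0].sum()                      # normalize to a posterior
         for t in range(1, T):
             alpha[t] = lik[t] * (alpha[t - 1] @ Trans)  # propagate with dynamics, reweight by evidence
             alpha[t] /= alpha[t].sum()
         return alpha
     ```

     The Kalman filter performs exactly this predict-then-reweight step, except that the messages are Gaussians summarized by a mean and a covariance rather than vectors of state probabilities.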

  4. 2D tracking
     [Figure: filtered estimates of the 2D track, plotted in the $(X_1, X_2)$ plane]

     Kalman filtering in the brain?
     [Figure omitted]

  5. Kalman filtering derivation
     - Since all CPDs are linear Gaussian, the system defines a large multivariate Gaussian. Hence all marginals are Gaussian.
     - Hence we can represent the belief state $p(X_t \mid y_{1:t})$ as a Gaussian with
       mean $\hat{x}_{t|t} \equiv E(X_t \mid y_1, \ldots, y_t)$ and
       covariance $P_{t|t} \equiv E\big[(X_t - \hat{x}_{t|t})(X_t - \hat{x}_{t|t})^T \mid y_1, \ldots, y_t\big]$.
     - Hence, instead of marginalization for message passing, we will directly estimate the means and covariances of the required marginals.
     - It is common to work with the inverse covariance (precision) matrix $P_{t|t}^{-1}$; this is called information form.

     Kalman filtering derivation (cont.)
     - Kalman filtering is a recursive procedure to update the belief state:
       - Predict step: compute $p(X_{t+1} \mid y_{1:t})$ from the prior belief $p(X_t \mid y_{1:t})$ and the dynamical model $p(X_{t+1} \mid X_t)$ --- the time update.
       - Update step: compute the new belief $p(X_{t+1} \mid y_{1:t+1})$ from the prediction $p(X_{t+1} \mid y_{1:t})$, the observation $y_{t+1}$, and the observation model $p(y_{t+1} \mid X_{t+1})$ --- the measurement update.
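     A minimal sketch of this predict/update recursion in matrix form, under the LDS defined earlier (I take $G = I$ so the process-noise covariance is just $Q$; that simplification and the function signature are mine):

     ```python
     import numpy as np

     def kalman_filter(ys, A, C, Q, R, x0, P0):
         """Run the predict/update recursion over observations ys (length T),
         returning the filtered means and covariances of p(X_t | y_{1:t})."""
         x, P = x0, P0
         means, covs = [], []
         for y in ys:
             # Predict (time update): p(X_{t+1} | y_{1:t})
             x_pred = A @ x
             P_pred = A @ P @ A.T + Q
             # Update (measurement update): condition on y_{t+1}
             S = C @ P_pred @ C.T + R              # innovation covariance
             K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
             x = x_pred + K @ (y - C @ x_pred)
             P = P_pred - K @ C @ P_pred
             means.append(x)
             covs.append(P)
         return means, covs
     ```

     With the `A`, `C`, `Q`, `R` and simulated observations from the 2D-tracking sketch above, `kalman_filter(obs, A, C, Q, R, np.zeros(4), np.eye(4))` would produce filtered position/velocity estimates like those shown on the "2D tracking" slide.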

  6. Predict step
     - Dynamical model: $x_{t+1} = A x_t + G w_t$, $\; w_t \sim N(0; Q)$
       One-step-ahead prediction of the state:
       $\hat{x}_{t+1|t} = A \hat{x}_{t|t}, \quad P_{t+1|t} = A P_{t|t} A^T + G Q G^T$
     - Observation model: $y_t = C x_t + v_t$, $\; v_t \sim N(0; R)$
       One-step-ahead prediction of the observation:
       $\hat{y}_{t+1|t} = C \hat{x}_{t+1|t}$

     Update step
     - Summarizing the results from the previous slide, we have $p(X_{t+1}, Y_{t+1} \mid y_{1:t}) \sim N(m_{t+1}, V_{t+1})$, where
       $m_{t+1} = \begin{pmatrix} \hat{x}_{t+1|t} \\ C \hat{x}_{t+1|t} \end{pmatrix}, \quad V_{t+1} = \begin{pmatrix} P_{t+1|t} & P_{t+1|t} C^T \\ C P_{t+1|t} & C P_{t+1|t} C^T + R \end{pmatrix}$
     - Remember the formulas for conditional Gaussian distributions: if
       $p\!\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = N\!\left(\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \,\middle|\, \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\right)$
       then the marginal is $p(x_2) = N(x_2 \mid m_2^m, V_2^m)$ and the conditional is $p(x_1 \mid x_2) = N(x_1 \mid m_{1|2}, V_{1|2})$, with
       $m_2^m = \mu_2, \quad V_2^m = \Sigma_{22}$
       $m_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2), \quad V_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$
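     The measurement update is just this Gaussian-conditioning rule applied to the joint over $(X_{t+1}, Y_{t+1})$. A minimal sketch of the conditioning formulas (the function and variable names are mine):

     ```python
     import numpy as np

     def condition_gaussian(mu1, mu2, S11, S12, S21, S22, x2):
         """Given a joint Gaussian over (x1, x2) with the block mean/covariance
         shown on the slide, return the mean and covariance of p(x1 | x2)."""
         S22_inv = np.linalg.inv(S22)
         m_1g2 = mu1 + S12 @ S22_inv @ (x2 - mu2)   # mu_1 + Sigma_12 Sigma_22^{-1} (x2 - mu_2)
         V_1g2 = S11 - S12 @ S22_inv @ S21          # Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
         return m_1g2, V_1g2

     # For the KF measurement update: x1 = X_{t+1}, x2 = Y_{t+1}, and the blocks are
     # S11 = P_{t+1|t}, S12 = P_{t+1|t} @ C.T, S21 = C @ P_{t+1|t},
     # S22 = C @ P_{t+1|t} @ C.T + R, with mu1 = x_pred and mu2 = C @ x_pred.
     ```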

  7. Kalman filter
     - Measurement updates:
       $\hat{x}_{t+1|t+1} = \hat{x}_{t+1|t} + K_{t+1}\,(y_{t+1} - C \hat{x}_{t+1|t})$
       $P_{t+1|t+1} = P_{t+1|t} - K_{t+1} C P_{t+1|t}$
       where $K_{t+1}$ is the Kalman gain matrix:
       $K_{t+1} = P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1}$

     Example of KF in 1D
     - Consider noisy observations of a 1D particle doing a random walk:
       $x_t = x_{t-1} + w_t, \; w \sim N(0, \sigma_x); \qquad z_t = x_t + v_t, \; v \sim N(0, \sigma_z)$
     - KF equations:
       $\hat{x}_{t+1|t} = A \hat{x}_{t|t} = \hat{x}_{t|t}, \qquad P_{t+1|t} = A P_{t|t} A^T + G Q G^T = \sigma_{t|t} + \sigma_x$
       $K_{t+1} = P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1} = \dfrac{\sigma_{t|t} + \sigma_x}{\sigma_{t|t} + \sigma_x + \sigma_z}$
       $\hat{x}_{t+1|t+1} = \hat{x}_{t+1|t} + K_{t+1}(z_{t+1} - C \hat{x}_{t+1|t}) = \dfrac{(\sigma_{t|t} + \sigma_x)\, z_{t+1} + \sigma_z\, \hat{x}_{t|t}}{\sigma_{t|t} + \sigma_x + \sigma_z}$
       $P_{t+1|t+1} = P_{t+1|t} - K_{t+1} C P_{t+1|t} = \dfrac{(\sigma_{t|t} + \sigma_x)\, \sigma_z}{\sigma_{t|t} + \sigma_x + \sigma_z}$
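     A minimal sketch of the 1D random-walk filter from this slide; the specific noise variances, horizon, and initial belief are assumed for illustration.

     ```python
     import numpy as np

     # 1D random walk: x_t = x_{t-1} + w_t, w ~ N(0, sx); z_t = x_t + v_t, v ~ N(0, sz).
     sx, sz, T = 0.1, 1.0, 100                 # assumed process / observation variances
     rng = np.random.default_rng(0)
     x_true = np.cumsum(rng.normal(0, np.sqrt(sx), T))
     z = x_true + rng.normal(0, np.sqrt(sz), T)

     x_hat, sigma = 0.0, 1.0                   # assumed initial belief N(0, 1)
     estimates = []
     for t in range(T):
         # Predict: P_{t+1|t} = sigma_{t|t} + sigma_x  (A = C = G = 1 here)
         sigma_pred = sigma + sx
         # Kalman gain: K = (sigma_{t|t} + sigma_x) / (sigma_{t|t} + sigma_x + sigma_z)
         K = sigma_pred / (sigma_pred + sz)
         # Measurement update of the mean and variance
         x_hat = x_hat + K * (z[t] - x_hat)
         sigma = sigma_pred - K * sigma_pred   # = sigma_pred * sz / (sigma_pred + sz)
         estimates.append(x_hat)
     ```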

  8. KF intuition
     - The KF update of the mean is
       $\hat{x}_{t+1|t+1} = \hat{x}_{t+1|t} + K_{t+1}(z_{t+1} - C \hat{x}_{t+1|t}) = \dfrac{(\sigma_{t|t} + \sigma_x)\, z_{t+1} + \sigma_z\, \hat{x}_{t|t}}{\sigma_{t|t} + \sigma_x + \sigma_z}$
     - The term $(z_{t+1} - C \hat{x}_{t+1|t})$ is called the innovation.
     - The new belief is a convex combination of the prediction and the observation, weighted by the Kalman gain matrix:
       $K_{t+1} = P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1} = \dfrac{\sigma_{t|t} + \sigma_x}{\sigma_{t|t} + \sigma_x + \sigma_z}$
     - If the observation is unreliable, $\sigma_z$ (i.e., $R$) is large, so $K_{t+1}$ is small and we pay more attention to the prediction.
     - If the old prior is unreliable (large $\sigma_t$) or the process is very unpredictable (large $\sigma_x$), we pay more attention to the observation.

     KF, RLS and LMS
     - The KF update of the mean is
       $\hat{x}_{t+1|t+1} = A \hat{x}_{t|t} + K_{t+1}(y_{t+1} - C \hat{x}_{t+1|t})$
     - Consider the special case where the hidden state is a constant, $x_t = \theta$, but the "observation matrix" $C$ is a time-varying vector, $C_t = x_t^T$.
     - Hence the observation model at each time slice, $y_t = x_t^T \theta + v_t$, is a linear regression.
     - We can estimate $\theta$ recursively using the Kalman filter:
       $\hat{\theta}_{t+1} = \hat{\theta}_t + P_{t+1} R^{-1} x_{t+1} (y_{t+1} - x_{t+1}^T \hat{\theta}_t)$
       This is called the recursive least squares (RLS) algorithm.
     - We can approximate $P_{t+1} R^{-1} \approx \eta_{t+1}$ by a scalar constant. This is called the least mean squares (LMS) algorithm.
     - We can adapt $\eta_t$ online using stochastic approximation theory.
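     To illustrate the KF-as-regression view, here is a hedged sketch of recursive least squares and its LMS approximation; the function names, the prior scale on $\theta$, and the fixed step size `eta` are assumptions, not values from the lecture.

     ```python
     import numpy as np

     def rls(X, y, R=1.0, P0_scale=10.0):
         """Recursive least squares: treat theta as a constant hidden state and each
         row x_t of X as the time-varying 'observation matrix' C_t = x_t^T."""
         d = X.shape[1]
         theta = np.zeros(d)
         P = P0_scale * np.eye(d)              # prior covariance of theta (assumed)
         for x_t, y_t in zip(X, y):
             S = x_t @ P @ x_t + R             # innovation variance
             K = P @ x_t / S                   # Kalman gain for this time slice
             theta = theta + K * (y_t - x_t @ theta)
             P = P - np.outer(K, x_t @ P)      # posterior covariance update
         return theta

     def lms(X, y, eta=0.01):
         """LMS: replace the gain P_{t+1} R^{-1} by a scalar step size eta."""
         theta = np.zeros(X.shape[1])
         for x_t, y_t in zip(X, y):
             theta = theta + eta * x_t * (y_t - x_t @ theta)
         return theta
     ```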
