

SLIDE 1

State Space and Hidden Markov Models

Aliaksandr Hubin, Oslo 2014

Kunsch, H.R., State Space and Hidden Markov Models. ETH-Zurich, Zurich.

SLIDE 2

Contents

1. Introduction
2. Markov Chains
3. Hidden Markov and State Space Model
4. Filtering and Smoothing of State Densities
5. Estimation of Parameters
6. Spatial-Temporal Model

SLIDE 3

Introduction

HMM

  • Broadly speaking, we will address models where the observations are noisy and incomplete functions of some underlying unobservable process, called the state process, which is assumed to have simple Markovian dynamics.

SLIDE 4

Markov chains

  • Set of states
  • The process moves from one state to another, generating a sequence of states
  • Markov chain property: the probability of each subsequent state depends only on the previous state
  • To define a Markov chain, the following probabilities have to be specified: the transition probability matrix and the initial probabilities
  • The output of the process is the set of states at each instant of time
  • Joint probability of a state sequence (all of these are formalized below)


Set of states: $\{m_1, m_2, \ldots, m_N\}$; generated sequence of states: $x_1, x_2, \ldots, x_k, \ldots$ with $x_i \in \{m_1, \ldots, m_N\}$, $i = 1, 2, \ldots$

Markov chain property: $P(x_k \mid x_1, x_2, \ldots, x_{k-1}) = P(x_k \mid x_{k-1})$

Transition probabilities: $a_{ij} = P(m_i \mid m_j)$; initial probabilities: $\pi_i = P(m_i)$

Joint probability of a state sequence:

$P(x_1, x_2, \ldots, x_k) = P(x_k \mid x_1, \ldots, x_{k-1})\, P(x_1, \ldots, x_{k-1}) = P(x_k \mid x_{k-1})\, P(x_1, \ldots, x_{k-1}) = \cdots = P(x_k \mid x_{k-1})\, P(x_{k-1} \mid x_{k-2}) \cdots P(x_2 \mid x_1)\, P(x_1)$

SLIDE 5

Simple Example of a Markov Model

  • An epigenetic state process is considered
  • Two states: ‘Methylated’ and ‘Non-methylated’
  • Initial probabilities: P(‘Methylated’) = 0.4, P(‘Non-methylated’) = 0.6
  • The transition probabilities are described in the graph
  • Inference example: suppose we want to calculate the probability of a sequence of states, e.g. {‘Methylated’, ‘Methylated’, ‘Non-methylated’, ‘Non-methylated’}. This corresponds to 0.4 · 0.3 · 0.7 · 0.8 = 0.0672 = 6.72%
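A minimal sketch of this computation (the transition values below are read off from the slide's arithmetic, since the graph itself is not reproduced in this transcript; the remaining entries follow by normalization):

```python
# Markov-chain sequence probability, following the slide's example.
# Transition values inferred from 0.4 * 0.3 * 0.7 * 0.8 (assumption).
initial = {"M": 0.4, "N": 0.6}                 # P('Methylated'), P('Non-methylated')
trans = {("M", "M"): 0.3, ("M", "N"): 0.7,     # rows sum to 1
         ("N", "M"): 0.2, ("N", "N"): 0.8}

def sequence_probability(states):
    """P(x_1, ..., x_k) = pi(x_1) * prod_i a(x_{i-1}, x_i)."""
    p = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[(prev, cur)]
    return p

print(sequence_probability(["M", "M", "N", "N"]))  # 0.0672, i.e. 6.72%
```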

SLIDE 6

Hidden Markov Models

  • The observations are represented by a probabilistic function (discrete or continuous) of the state, instead of a one-to-one correspondence with the state
  • The following components describe a Hidden Markov Model in the simplest case:

1. The distribution of transition probabilities;
2. The distribution of initial state probabilities;
3. The distribution of observation probabilities $P(y_j \mid x_j)$.

 Models with a continuous state space $x_j \in \mathbb{R}^N$ are usually addressed as state space models;
 Models with a discrete state space $x_j \in \{m_1, \ldots, m_N\}$ are usually addressed as hidden Markov chains;

 Estimation of the parameters listed above might be very challenging for complicated models!


Transition probabilities: $a_{ij} = P(m_i \mid m_j)$; initial state probabilities: $\pi_i = P(m_i)$.

SLIDE 7

Hidden Markov Models

The following components describe a Hidden Markov Model in the general case:

1. Distribution of the initial state: $X_0 \sim a_0(x)\, d\mu(x)$;
2. Distribution of the transitions: $X_t \mid X_{t-1} = x_{t-1} \sim a_t(x_{t-1}, x)\, d\mu(x)$;
3. Distribution of the observations: $Y_t \mid X_t = x_t \sim b_t(x_t, y)\, d\nu(y)$;
4. The joint density of the process and the observations then looks as follows:

$p(x_0, \ldots, x_T, y_1, \ldots, y_T) = a_0(x_0) \prod_{t=1}^{T} a_t(x_{t-1}, x_t)\, b_t(x_t, y_t)$.
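To make the factorization concrete, here is a minimal sketch for a discrete HMM; the numeric values are illustrative placeholders, not taken from the slides:

```python
import numpy as np

a0 = np.array([0.4, 0.6])           # initial state density a_0(x), illustrative
A = np.array([[0.3, 0.7],           # transition densities a_t(x_{t-1}, x)
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],           # observation densities b_t(x_t, y)
              [0.2, 0.8]])

def joint_density(x, y):
    """p(x_0..x_T, y_1..y_T) = a_0(x_0) * prod_t a_t(x_{t-1}, x_t) b_t(x_t, y_t)."""
    p = a0[x[0]]
    for t in range(1, len(x)):
        p *= A[x[t - 1], x[t]] * B[x[t], y[t - 1]]  # y is indexed 1..T
    return p

print(joint_density([0, 0, 1], [0, 1]))
```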

SLIDE 8

Simple Example of HMM


  • An epigenetic state process is considered
  • Two discrete states: ‘Methylated’ and ‘Non-methylated’
  • Initial probabilities: P(‘Methylated’) = 0.4, P(‘Non-methylated’) = 0.6
  • The transition probabilities are described in the graph
  • The associated locations are observed with respect to their state-dependent probabilities
  • The corresponding observation probabilities are represented in the graph

SLIDE 9

Graphical illustration of the Hidden Markov Model

SLIDE 10

Project application

Let further:

  • $I_t$ be a binary variable indicating whether location $t$ is methylated
  • $E_t$ be a binary variable indicating whether the gene to which location $t$ belongs is expressed
  • $n_t$ be the number of reads at location $t$
  • $y_t$ be a binomial variable indicating the number of methylated reads at location $t$
  • $z_t$ be a quantitative measure of some phenotypic response for the expressed genes at location $t$

SLIDE 11

Project application


[Graphical models (figure): the initial model is a hidden chain $I_1, \ldots, I_t, I_{t+1}, \ldots, I_T$ emitting methylated-read counts $y_1, \ldots, y_T$; the extended model adds expression indicators $E_1, \ldots, E_T$ and phenotypic responses $z_1, \ldots, z_T$.]

SLIDE 12

Project application

  • $I_t = 1$ if location $t$ is methylated and $I_t = 0$ otherwise; the associated per-read methylation probability is $P_1$ for methylated and $P_2$ for non-methylated locations
  • $P_{jk} = P(I_t = k \mid I_{t-1} = j)$ defines the dynamic structure in the given neighborhood
  • $y_t \mid I_t, n_t$ is the number of methylated observations given the reads and the methylation status
  • $P(y_t \mid I_t, n_t) = \mathrm{Binom}(n_t, P_{I_t})(y_t)$, i.e. it has a binomial distribution
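A minimal simulation sketch of this setup (the transition matrix and per-read probabilities below are illustrative placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

P = np.array([[0.9, 0.1],        # P_jk = P(I_t = k | I_{t-1} = j), illustrative
              [0.2, 0.8]])
p_read = [0.05, 0.95]            # per-read methylation probability given I_t = 0, 1

def simulate(reads):
    """Simulate states I_t and methylated read counts y_t ~ Binom(n_t, p_{I_t})."""
    I = [int(rng.choice(2))]
    for _ in range(len(reads) - 1):
        I.append(int(rng.choice(2, p=P[I[-1]])))
    y = [int(rng.binomial(n, p_read[i])) for i, n in zip(I, reads)]
    return I, y

states, counts = simulate(reads=[30, 25, 40, 10, 20])
print(states, counts)
```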

SLIDE 13

Extensions of the model

 Let $p_t$ be continuous: define a stochastic process with $p_t \sim \mathrm{Beta}\bigl(\gamma\, p_{t-1},\, \gamma\,(1 - p_{t-1})\bigr)$ when the state does not change, and $p_t \sim \mathrm{Beta}\bigl(\gamma\, q_{I_t},\, \gamma\,(1 - q_{I_t})\bigr)$ when it does, giving similarity when the state does not change but a renewal in cases where the state changes.
 Link the state transition probabilities to the underlying genomic structure.
 Look at more global structures than the transition probabilities $P_{jk}$:

  • address more complex models describing more global structures (ARIMA, n-order Markov chains, etc.)
  • use the simple Markov chain model above as a first step, but then extract more global features of $\{I_t\}$

 Consider more complex spatial structures than the one-directional approach above.
 Simultaneous modelling of several tissues and/or individuals at the same time.

SLIDE 14

Parameter estimation & inference

1. Viterbi algorithm for finding the most probable sequence of states, $\arg\max_{x_{1:T}} \{P(x_{1:T} \mid d_{1:T})\}$
2. Forward algorithm for the filtered probability of a given state, given the data up to $t$: $P(x_t \mid d_1, \ldots, d_t)$
3. Forward–backward algorithm for the smoothed probability of a given state, given all the data: $P(x_k \mid d_1, \ldots, d_T)$
4. Maximum likelihood or Bayesian methods for the estimation of the parameter vector $\theta$

 Here $d_t$ denotes the data at point $t$. Note that these algorithms are linear in the number of time points.

SLIDE 15
  • Inference. Filtering.
  • 1. Recursion in $k$ for the prediction density of the states:

$f_{t+k|t}(x_{t+k} \mid y_{1:t}) = \int a_{t+k}(x, x_{t+k})\, f_{t+k-1|t}(x \mid y_{1:t})\, d\mu(x)$

  • 2. Prediction densities for the observations, given the corresponding prediction densities of the states:

$p(y_{t+k} \mid y_{1:t}) = \int b_{t+k}(x, y_{t+k})\, f_{t+k|t}(x \mid y_{1:t})\, d\mu(x)$

  • 3. Thus, the filtering densities of the states can be computed by the following forward recursion in $t$ (starting with $f_{0|0}$):

$f_{t+1|t+1}(x_{t+1} \mid y_{1:t+1}) = \dfrac{b_{t+1}(x_{t+1}, y_{t+1})\, f_{t+1|t}(x_{t+1} \mid y_{1:t})}{p(y_{t+1} \mid y_{1:t})}$
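For a discrete state space the integrals become sums, and the recursion fits in a few lines; a sketch with the same illustrative matrices as in the earlier HMM sketch:

```python
import numpy as np

a0 = np.array([0.4, 0.6])                    # f_{0|0}, illustrative
A = np.array([[0.3, 0.7], [0.2, 0.8]])       # a(x, x')
B = np.array([[0.9, 0.1], [0.2, 0.8]])       # b(x, y)

def forward_filter(obs):
    """Filtering densities f_{t|t}(x | y_{1:t}) via the forward recursion."""
    f, filtered = a0.copy(), []
    for y in obs:
        pred = f @ A                         # prediction: sum_x a(x, x') f(x)
        joint = pred * B[:, y]               # update with observation density
        f = joint / joint.sum()              # normalize by p(y_{t+1} | y_{1:t})
        filtered.append(f)
    return np.array(filtered)

print(forward_filter([0, 1, 1]))
```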

SLIDE 16
  • Inference. Smoothing.
  • 1. Conditional on all observations, the sequence of states is a Markov chain whose backward transition densities depend only on the data up to the current time.
  • 2. The backward transition densities then take the form shown below.
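The densities themselves did not survive extraction; in the notation of the filtering slide, the backward transition density has the standard form (a reconstruction from the standard theory, not the slide's own rendering):

```latex
% Conditional on y_{1:T}, the states form a Markov chain whose backward
% transitions depend only on data up to time t:
\[
  p(x_t \mid x_{t+1}, y_{1:T}) \;=\; p(x_t \mid x_{t+1}, y_{1:t})
  \;=\; \frac{a_{t+1}(x_t, x_{t+1})\, f_{t|t}(x_t \mid y_{1:t})}
             {f_{t+1|t}(x_{t+1} \mid y_{1:t})}
\]
```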

SLIDE 17
  • Inference. Smoothing.
  • 3.–4. The smoothing densities are then computed by backward recursions in $t$, combining the filtering densities with the backward transition densities (see the reconstruction below).
  • 5. Note also the joint density of a subsequence of states, which might be required at the ML stage.
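The recursion formulas were likewise lost; the standard backward smoothing recursion in this notation reads (again a reconstruction, not the slide's rendering):

```latex
\[
  f_{t|T}(x_t \mid y_{1:T})
  \;=\; f_{t|t}(x_t \mid y_{1:t})
  \int \frac{a_{t+1}(x_t, x_{t+1})\, f_{t+1|T}(x_{t+1} \mid y_{1:T})}
            {f_{t+1|t}(x_{t+1} \mid y_{1:t})}\, d\mu(x_{t+1})
\]
```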

SLIDE 18

Recursion order

SLIDE 19

Inference on a function of the states

  • 6. It can also be shown that, given the smoothing distribution of the sequence of states, it is easy to obtain the conditional expectation of some function of the sequence, which can be recursively updated when a new observation becomes available ($t > s$).

SLIDE 20

Posterior mode estimation

We want to find the posterior mode of the states, i.e. to maximize the posterior joint distribution of the states; this maximization is invariant to taking logarithms.

SLIDE 21

Viterbi algorithm

Because of the special structure of the expression above, the most likely value of $x_0$ depends only on $x_1$; after maximizing over $x_0$, the most likely value of $x_1$ depends only on $x_2$, and so on, which leads to a dynamic programming algorithm. The remaining values of the sequence are then recovered by backtracking.
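A minimal log-domain sketch of this dynamic program for a discrete HMM (same illustrative matrices as in the earlier sketches):

```python
import numpy as np

a0 = np.array([0.4, 0.6])
A = np.array([[0.3, 0.7], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])

def viterbi(obs):
    """Most probable state sequence via max-product dynamic programming."""
    log_a0, log_A, log_B = np.log(a0), np.log(A), np.log(B)
    delta = log_a0 + log_B[:, obs[0]]        # best log-score per current state
    back = []
    for y in obs[1:]:
        scores = delta[:, None] + log_A      # scores[i, j]: best path i -> j
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + log_B[:, y]
    path = [int(delta.argmax())]             # backtrack to recover the sequence
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return path[::-1]

print(viterbi([0, 1, 1]))
```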

SLIDE 22

Reference Probability Method

Let $Q$ denote a reference distribution under which the states and observations are independent, with density $g$ for the observations; a likelihood ratio $dP/dQ$ can then be derived. For any measurable function of the states $\varphi \colon \mathcal{X} \to \mathbb{R}$, the conditional expectation under the model is then easy to compute as a ratio of expectations under $Q$.
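The slide's own formulas are missing from this transcript; the identity behind the method is usually written as follows (a standard-form reconstruction):

```latex
% Under Q the states and observations are independent, with density g for
% the observations; \Lambda_T denotes the likelihood ratio dP/dQ.
\[
  E_P\bigl[\varphi(X_{0:T}) \mid Y_{1:T}\bigr]
  \;=\; \frac{E_Q\bigl[\varphi(X_{0:T})\,\Lambda_T \mid Y_{1:T}\bigr]}
             {E_Q\bigl[\Lambda_T \mid Y_{1:T}\bigr]}
\]
```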

SLIDE 23

Reference Probability Method

From the last expression on the previous slide one can easily derive filtering recursions for the states of the model. This gives an easy way to compute the filtering recursions even when time becomes continuous (e.g. a differential-equation model for the states).

SLIDE 24

Forgetting initial distributions

Let us define the transition operator $A^*$ and the Bayes operator $A$. The recursion densities then forget the initial distribution if $A^*$ and $A$ are contracting for some norm on densities. It can be shown that the initial distribution of the states is forgotten exponentially fast, so that differences between filtering distributions at fixed times disappear when the updates are made with the same observations.

SLIDE 25

Linear and General State Space Model

Consider the process for the states
$X_t = G_t X_{t-1} + V_t$
and for the observations
$Y_t = H_t X_t + W_t$,
where $G_t$ and $H_t$ are coefficient matrices, and $V_t \sim N(0, \Sigma_t)$ and $W_t \sim N(0, \Omega_t)$ are independent Gaussian processes.

This can be generalized as
$X_t = g_t(X_{t-1}) + V_t, \qquad Y_t = h_t(X_t) + W_t$,
with $g_t$ and $h_t$ being some functions and $V_t$, $W_t$ having the same meaning.

SLIDE 26

Kalman Filtering and Smoothing

For this linear Gaussian model the filtering densities of the hidden process are Gaussian, with means and covariances computed by closed-form recursions known as the Kalman filter; the Kalman smoother is the corresponding backward recursion.
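A minimal Kalman filter sketch for the linear model above (the matrices in the usage example are illustrative placeholders):

```python
import numpy as np

def kalman_filter(ys, G, H, Sigma, Omega, m0, C0):
    """Filtered means for X_t = G X_{t-1} + V_t, Y_t = H X_t + W_t."""
    m, C, means = m0, C0, []
    for y in ys:
        m_pred = G @ m                         # prediction mean
        C_pred = G @ C @ G.T + Sigma           # prediction covariance
        S = H @ C_pred @ H.T + Omega           # innovation covariance
        K = C_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
        m = m_pred + K @ (y - H @ m_pred)      # filtered mean
        C = C_pred - K @ H @ C_pred            # filtered covariance
        means.append(m)
    return np.array(means)

# Illustrative 1-D example: a random walk observed with noise.
G = H = np.eye(1)
Sigma, Omega = np.array([[0.1]]), np.array([[1.0]])
ys = [np.array([0.5]), np.array([1.2]), np.array([0.9])]
print(kalman_filter(ys, G, H, Sigma, Omega, m0=np.zeros(1), C0=np.eye(1)))
```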

SLIDE 27

Extended Kalman Filtering and Smoothing

Use the standard Kalman filters and smoothers on a local linearization of the nonlinear model $X_t = g_t(X_{t-1}) + V_t$, $Y_t = h_t(X_t) + W_t$ above.

SLIDE 28

General Cases

  • Use robust approximations for filtering and smoothing
  • Use MCMC algorithms for filtering and smoothing

SLIDE 29

Parameter estimation

Let the state transitions $a_t$ depend on a parameter vector $\tau$, and the observation densities on $\eta$; let $\theta = \{\tau, \eta\}$. Then the following methods are usually addressed for the estimation of $\theta$ in an HMM:

  • Maximum likelihood method:
  • the EM (expectation–maximization) algorithm: fast convergence to a neighborhood of the optimal value, but extremely slow performance within this region;
  • Newton's algorithm: slow convergence to a neighborhood of the optimal solution, but good performance within it; metaheuristics and/or combinations of them can be applied.
  • Bayesian algorithms (Gibbs, Metropolis, etc.) can be applied after setting priors for $\tau$ and $\eta$; the full conditionals of the posteriors then follow.

SLIDE 30

Parameter estimation ML


  • Posterior joint probability of interest
  • Likelihood function to be maximized
  • Pdf of the signal given the states sequence
  • Probability of the states' sequence

where $\mathcal{M}$ is the model, $s$ is a sequence of states, and $X$ is the set of corresponding observations.

SLIDE 31

Parameter estimation ML

Thus we maximize the likelihood of the model to estimate the parameters of interest. Note that in the general case the cost of these calculations is exponential, $O(N^T)$; however, an efficient implementation exploiting the structure of the state sequence yields a polynomial-time algorithm.
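The forward recursion from the filtering slide already delivers the likelihood as a by-product, which is exactly this polynomial-time route; a sketch with the same illustrative matrices as before:

```python
import numpy as np

a0 = np.array([0.4, 0.6])
A = np.array([[0.3, 0.7], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])

def log_likelihood(obs):
    """log p(y_{1:T}), accumulated from the forward pass in O(N^2 T) time."""
    f, ll = a0.copy(), 0.0
    for y in obs:
        joint = (f @ A) * B[:, y]    # one O(N^2) prediction + update step
        ll += np.log(joint.sum())    # contribution log p(y_t | y_{1:t-1})
        f = joint / joint.sum()
    return ll

print(log_likelihood([0, 1, 1]))
```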

SLIDE 32

Spatial-Temporal Model

  • Consider spatio-temporal positions $t$ and their neighborhoods $L(t)$ on $\{1, \ldots, T_1\} \times \{1, \ldots, T_2\}$, with the following Markov property: $P(x_t \mid x_s, s \neq t) = P(x_t \mid x_s, s \in L(t))$
  • Then the prior for $x$ is of Gibbs form, $P(x) = \frac{1}{Z} \prod_{c \in C} e^{-\Phi_c(x_c)}$, for any class $C$ of non-empty complete subsets (cliques) of $L$
  • The posterior for the state, the joint distribution of the observations within a given neighborhood, and the joint density for the neighborhood then follow from this prior and the observation densities.

SLIDE 33

Spatial-Temporal Model. Issues

SLIDE 34

Spatial-Temporal Model. Issues

SLIDE 35

Spatial-Temporal Model. Treatments

1. Use the pseudo-likelihood (see the sketch below).
2. Use a mean-field approximation for the normalizing constant.
3. Compute the iterated conditional mode.
4. Apply EM to the pseudo-likelihood.
5. Use metaheuristics to compute the posterior mode and/or maximize the pseudo-likelihood.
6. Estimate the real likelihood by means of MCMC.
7. Etc.
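The formulas for these treatments were lost in extraction; item 1, for instance, has the standard form (a reconstruction in the notation of the previous slides):

```latex
% Pseudo-likelihood: replace the intractable joint density by the
% product of full conditionals over the neighborhoods L(t).
\[
  \mathrm{PL}(\theta) \;=\; \prod_{t} P_\theta\bigl(x_t \mid x_s,\ s \in L(t)\bigr)
\]
```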

SLIDE 36

References

1. Kunsch, H.R., State Space and Hidden Markov Models. ETH-Zurich, Zurich.
2. Vaseghi, S.V., Hidden Markov Models. Wiley & Sons Ltd.

SLIDE 37

The End
