State Space and Hidden Markov Models
Aliaksandr Hubin, Oslo 2014
Kunsch, H.R., State Space and Hidden Markov Models. ETH-Zurich, Zurich.
Contents
1. Introduction
2. Markov Chains
3. Hidden Markov and State Space Models
4. Filtering and Smoothing of State Densities
5. Estimation of Parameters
6. Spatial-Temporal Models
In a hidden Markov model (HMM), the observations are noisy and incomplete functions of some underlying unobservable process, called the state process, which is assumed to have simple Markovian dynamics.
A Markov chain is a sequence of random variables in which the next state depends only on the previous state. The chain is characterized by its transition probability matrix and its initial probabilities.
The chain takes values in a finite state space,
$$x_i \in \{m_1, m_2, \dots, m_N\}, \qquad i = 1, 2, \dots,$$
and satisfies the Markov property
$$P(x_k \mid x_1, x_2, \dots, x_{k-1}) = P(x_k \mid x_{k-1}).$$
The transition probabilities and the initial probabilities are
$$a_{ij} = P(m_i \mid m_j), \qquad \pi_i = P(m_i),$$
so the probability of a sequence $x_1, x_2, \dots, x_k$ factorizes as
$$P(x_1, \dots, x_k) = P(x_k \mid x_1, \dots, x_{k-1})\, P(x_1, \dots, x_{k-1}) = P(x_k \mid x_{k-1})\, P(x_1, \dots, x_{k-1}) = \dots = P(x_1) \prod_{i=2}^{k} P(x_i \mid x_{i-1}).$$
[Graph: two-state transition diagram over {'Methylated', 'Non-methylated'} with initial and transition probabilities; one of the labels reads P(· | 'Methylated') = 0.6.]
Suppose we want to calculate the probability of a particular sequence of states from {'Methylated', 'Non-methylated'}. Multiplying the initial probability by the transition probabilities read off the graph, this corresponds to 0.4 · 0.3 · 0.7 · 0.8 = 6.72%.
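As a worked illustration of this factorization (a minimal sketch; the numbers below are made up rather than read off the slide's graph):

```python
import numpy as np

# Illustrative two-state chain; pi and A are made-up placeholder values.
states = {"Methylated": 0, "Non-methylated": 1}
pi = np.array([0.4, 0.6])            # initial probabilities P(m_i)
A = np.array([[0.7, 0.3],            # A[i, j] = P(next state m_j | current m_i)
              [0.2, 0.8]])

def sequence_probability(seq):
    """P(x_1, ..., x_k) = pi[x_1] * prod_i A[x_{i-1}, x_i]."""
    idx = [states[s] for s in seq]
    p = pi[idx[0]]
    for prev, cur in zip(idx, idx[1:]):
        p *= A[prev, cur]
    return p

seq = ["Methylated", "Methylated", "Non-methylated", "Non-methylated"]
print(sequence_probability(seq))     # 0.4 * 0.7 * 0.3 * 0.8 = 0.0672
```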
On top of the chain, a hidden Markov model specifies:
1. the distribution of the transition probabilities;
2. the distribution of the initial state probabilities;
3. the distribution of the observation probabilities $P(y_i \mid x_i)$.
Models with a continuous state space ($x_i \in \mathbb{R}^N$) are usually addressed as state space models; models with a finite state space ($x_i \in \{m_1, \dots, m_N\}$) are usually addressed as hidden Markov chains. Estimating the parameters listed above might be very challenging for complicated models!
The following components describe a hidden Markov model in the general case:
1. the distribution of the initial state: $X_0 \sim a_0(x)\, d\mu(x)$;
2. the distributions of the transitions: $X_t \mid X_{t-1} = x_{t-1} \sim a_t(x_{t-1}, x)\, d\mu(x)$;
3. the distributions of the observations: $Y_t \mid X_t = x_t \sim b_t(x_t, y)\, d\nu(y)$;
4. the joint density of the process and the observations then looks as follows:
$$P(x_0, \dots, x_T, y_1, \dots, y_T) = a_0(x_0) \prod_{t=1}^{T} a_t(x_{t-1}, x_t)\, b_t(x_t, y_t).$$
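As a concrete illustration, a sketch that evaluates this joint density for a discrete-state HMM (the arrays a0, a, b are made-up placeholders, not values from the slides):

```python
import numpy as np

# Illustrative two-state HMM with binary observations.
a0 = np.array([0.5, 0.5])                  # initial density a_0(x)
a  = np.array([[0.9, 0.1], [0.2, 0.8]])    # transition density a(x_prev, x)
b  = np.array([[0.8, 0.2], [0.3, 0.7]])    # observation density b(x, y)

def joint_density(x, y):
    """P(x_0..x_T, y_1..y_T) = a_0(x_0) * prod_t a(x_{t-1}, x_t) * b(x_t, y_t)."""
    p = a0[x[0]]
    for t in range(1, len(x)):
        p *= a[x[t - 1], x[t]] * b[x[t], y[t - 1]]
    return p

print(joint_density(x=[0, 0, 1], y=[1, 0]))  # x has length T+1, y has length T
```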
[Figure: the methylation example as a hidden Markov model — the graph shows the chain together with the state-dependent observation probabilities, e.g. P(· | 'Methylated') = 0.6.]
[Diagram: graphical models of the methylation example — the initial model with hidden states $I_1, \dots, I_T$ and observations $y_1, \dots, y_T$, and the extended model with additional node layers.]
$$I_t = \begin{cases} 1, & \text{location } t \text{ is methylated}, \\ 0, & \text{location } t \text{ is not methylated}, \end{cases} \qquad P(I_t) = \begin{cases} P_1, & \text{location } t \text{ is methylated}, \\ P_2, & \text{location } t \text{ is not methylated}; \end{cases}$$
the observation $y_t$ has a binomial distribution.
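A minimal simulation sketch of this model (all numbers — the stay probabilities, the per-state rates q, and the read depth n — are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Methylation HMM sketch: I_t in {1, 0} (methylated / not methylated),
# y_t | I_t ~ Binomial(n, q_{I_t}).
T, n = 200, 30
stay = {1: 0.95, 0: 0.90}       # assumed P(I_t = i | I_{t-1} = i)
q = {1: 0.8, 0: 0.1}            # assumed per-state success probability

I = np.empty(T, dtype=int)
I[0] = 1
for t in range(1, T):
    I[t] = I[t - 1] if rng.random() < stay[I[t - 1]] else 1 - I[t - 1]

y = rng.binomial(n, [q[i] for i in I])   # observed counts at each location
```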
Let $p_t$ be continuous: define the stochastic process
$$p_t \mid p_{t-1}, I_t \sim \begin{cases} \mathrm{Beta}\big(\beta\, p_{t-1},\ \beta\,(1 - p_{t-1})\big), & \text{when the state does not change}, \\ \mathrm{Beta}\big(\beta\, q_{I_t},\ \beta\,(1 - q_{I_t})\big), & \text{when the state changes}, \end{cases}$$
giving similarity when the state does not change but a renewal in cases where the state changes. Possible extensions:
- link the state transition probabilities to the underlying genomic structure;
- look at more global structures than the transition probabilities $P_{ij}$;
- consider more complex spatial structures than the one-directional approach above;
- simultaneous modelling of several tissues and/or individuals at the same time.
1. The Viterbi algorithm, for fitting the most probable sequence of states, $\arg\max_{x_{1:T}} \{P(x_{1:T} \mid d_{1:T})\}$;
2. the forward algorithm, for fitting the filtered probability of a given state, given the data up to $t$: $P(x_t \mid d_1, \dots, d_t)$;
3. the forward-backward algorithm, for fitting the smoothed probability of a given state, given all the data: $P(x_k \mid d_1, \dots, d_T)$;
4. maximum likelihood or Bayesian methods for estimating the parameter vector $\theta$;
where $d_t$ is the data at point $t$. Note that these algorithms are linear in the number of time points.
Prediction of the state densities:
$$f_{t+k|t}(x_{t+k} \mid y_{1:t}) = \int a_{t+k}(x, x_{t+k})\, f_{t+k-1|t}(x \mid y_{1:t})\, d\mu(x);$$
prediction of the observations given the states:
$$p(y_{t+k} \mid y_{1:t}) = \int b_{t+k}(x, y_{t+k})\, f_{t+k|t}(x \mid y_{1:t})\, d\mu(x);$$
filtering recursion in $t$ (starting with $f_{0|0}$):
$$f_{t+1|t+1}(x_{t+1} \mid y_{1:t+1}) = \frac{b_{t+1}(x_{t+1}, y_{t+1})\, f_{t+1|t}(x_{t+1} \mid y_{1:t})}{p(y_{t+1} \mid y_{1:t})}.$$
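For a finite state space the integrals become sums over the states, and the recursion can be sketched in a few lines (an illustration in the spirit of the slides, reusing the placeholder arrays a0, a, b from the earlier sketch):

```python
import numpy as np

def forward_filter(a0, a, b, y):
    """Discrete-state instance of the recursions above: returns the filtered
    densities f_{t|t} and the log-likelihood log p(y_{1:T})."""
    f = a0.copy()                      # f_{0|0}
    filtered, loglik = [], 0.0
    for yt in y:
        pred = f @ a                   # prediction: sum_x a(x, x') f(x)
        unnorm = b[:, yt] * pred       # multiply by b_t(x', y_t)
        p = unnorm.sum()               # p(y_t | y_{1:t-1})
        loglik += np.log(p)
        f = unnorm / p                 # normalized filter density f_{t|t}
        filtered.append(f)
    return np.array(filtered), loglik
```

With the arrays from the joint-density sketch, `forward_filter(a0, a, b, [1, 0])` returns the two filtered densities and the log-likelihood of the observation sequence.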
Given the data, the states form a Markov chain with transition densities: … , where … .
The smoothing densities are then computed with the following recursions in $t$: … and … , where … .
… and is recursively updated when a new observation becomes available ($t > s$): …
We want to find the posterior mode of the states, namely: … Maximization of the joint posterior distribution of the states is invariant under taking logarithms: …
Because of the special structure of the expression above, the most likely value of $x_0$ depends on … , after which the other values of the sequence are recovered in the following way: …
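A minimal sketch of the resulting procedure for a discrete HMM, computed in log space; the backtracking loop at the end is the recovery step just described (illustrative code, not taken from the slides):

```python
import numpy as np

def viterbi(a0, a, b, y):
    """Viterbi recursion in log space, returning the most probable
    state sequence argmax_x P(x | y) via backtracking."""
    T, N = len(y), len(a0)
    logd = np.log(a0) + np.log(b[:, y[0]])     # delta_1(x)
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = logd[:, None] + np.log(a)     # delta_{t-1}(x) + log a(x, x')
        back[t] = scores.argmax(axis=0)        # best predecessor of each x'
        logd = scores.max(axis=0) + np.log(b[:, y[t]])
    path = [int(logd.argmax())]                # most likely final state
    for t in range(T - 1, 0, -1):              # recover the earlier states
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```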
Let $Q$ denote the distribution under which the states and the observations are independent, with distribution $g$ for the observations; then the following ratio can be derived: … For any measurable transformation of the states we then have (easy to compute): … On the other hand: …
with … . From the last expression on the previous slide one can easily derive the filtering recursions for the states of the model: … This gives us an easy way to compute the filtering recursions for the states, even in the case when time becomes continuous (e.g. a differential equation model for the states).
Let us define the transition operator $B^*$ and the Bayes operator $B$: … and … The recursion densities then forget the initial distribution if $B^*$ and $B$ are contracting for some norm on densities. It can be shown that the initial distribution of the states is forgotten exponentially fast; thus differences between filtering distributions at fixed times disappear when the updates are made with the same observations.
Consider the process for the states
$$X_t = G_t X_{t-1} + V_t$$
and for the observations
$$Y_t = H_t X_t + W_t,$$
where $G_t$ and $H_t$ are coefficient matrices, and $V_t \sim N_k(0, \Sigma_t)$ and $W_t \sim N_k(0, \Omega_t)$ are independent Gaussian processes. This can be generalized as
$$X_t = f_t(X_{t-1}, V_t), \qquad Y_t = g_t(X_t, W_t),$$
with $f_t$ and $g_t$ being some functions and $V_t$, $W_t$ having the same meaning.
The filtering densities for the hidden process then look as follows: … This is known as the Kalman filter. The Kalman smoother correspondingly looks as follows: …
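As an illustration, a minimal sketch of one predict/update step of the Kalman filter for the linear model above (time-invariant matrices G, H, Sigma, Omega assumed; this is the textbook recursion, not code taken from the slides):

```python
import numpy as np

def kalman_step(m, P, y, G, H, Sigma, Omega):
    """One predict/update step for X_t = G X_{t-1} + V_t, Y_t = H X_t + W_t."""
    m_pred = G @ m                             # predicted state mean
    P_pred = G @ P @ G.T + Sigma               # predicted state covariance
    S = H @ P_pred @ H.T + Omega               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)        # Kalman gain
    m_new = m_pred + K @ (y - H @ m_pred)      # filtered mean
    P_new = P_pred - K @ H @ P_pred            # filtered covariance
    return m_new, P_new
```

Iterating this step over the observations yields the full filter; the filtered means and covariances completely describe the Gaussian filtering densities.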
Use the standard Kalman filters and smoothers for the approximation of the process below: …
Let the state transitions $a_t$ depend on a parameter vector $\tau$, and the observation densities on $\eta$; let $\theta = \{\tau, \eta\}$. The following methods are usually addressed for the estimation of $\theta$ in an HMM:
- the EM algorithm: reliable convergence towards the region of the maximal likelihood value, but extremely slow performance within this region (a one-step sketch follows below);
- direct numerical maximization: unreliable far from the optimum region, but fast performance within it;
- metaheuristics and/or combinations of the above can be applied.
Alternatively, a Bayesian approach sets priors on $\tau$ and $\eta$. The full conditionals of the posteriors are then as follows:
… and … .
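Returning to the EM option above: a minimal one-step Baum-Welch sketch for a discrete HMM, reusing the illustrative a0, a, b arrays from the earlier sketches (re-estimation of the observation densities is omitted to keep the sketch short):

```python
import numpy as np

def em_step(a0, a, b, y):
    """One Baum-Welch (EM) step: E-step via a scaled forward-backward
    pass, M-step by renormalizing the expected counts."""
    T, N = len(y), len(a0)
    alpha = np.zeros((T, N)); scale = np.zeros(T)
    alpha[0] = a0 * b[:, y[0]]
    scale[0] = alpha[0].sum(); alpha[0] /= scale[0]
    for t in range(1, T):                      # forward pass
        alpha[t] = (alpha[t - 1] @ a) * b[:, y[t]]
        scale[t] = alpha[t].sum(); alpha[t] /= scale[t]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):             # backward pass, same scaling
        beta[t] = (a @ (b[:, y[t + 1]] * beta[t + 1])) / scale[t + 1]
    gamma = alpha * beta                       # proportional to P(x_t | y_{1:T})
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((N, N))                      # expected transition counts
    for t in range(T - 1):
        m = alpha[t][:, None] * a * (b[:, y[t + 1]] * beta[t + 1])[None, :]
        xi += m / m.sum()
    return gamma[0], xi / xi.sum(axis=1, keepdims=True)   # new a0, new a
```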
Consider the likelihood of a sequence: … , where $\mathcal{M}$ is a model, $s$ is a sequence of states, and $X$ is the set of the corresponding observations. Thus we are to maximize the likelihood of the model in order to estimate the parameters of interest: … Note that in the general case the cost of these calculations is exponential, $O(N^T)$; however, in an efficient implementation the structure of the sequence of states yields an algorithm of polynomial complexity.
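To make the complexity claim concrete, the standard decomposition (rendered here in the notation introduced earlier, not copied from the slides) shows why the forward recursion reduces the cost from $O(N^T)$ to $O(TN^2)$:

```latex
% Naive evaluation sums over all N^T state sequences:
\[
  P(y_{1:T}) \;=\; \sum_{x_{0:T}} a_0(x_0) \prod_{t=1}^{T} a_t(x_{t-1}, x_t)\, b_t(x_t, y_t).
\]
% The forward recursion pushes each sum inside the product, giving O(T N^2):
\[
  \alpha_t(x) \;=\; b_t(x, y_t) \sum_{x'} a_t(x', x)\, \alpha_{t-1}(x'),
  \qquad
  P(y_{1:T}) \;=\; \sum_{x} \alpha_T(x).
\]
```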
Consider a random field on the lattice $L = \{1, \dots, T_1\} \times \{1, \dots, T_2\}$ with the following Markov property:
$$P(x_t \mid x_u,\ u \neq t) = P(x_t \mid x_u,\ u \in N(t)),$$
where $N(t)$ denotes the neighbours of site $t$. The density then has the Gibbs representation
$$P(x) = \frac{1}{a} \prod_{c \in C} e^{-\Phi_c(x_c)}$$
for a class $C$ of non-empty complete subsets (cliques) of $L$.
Possible approaches:
1. use the pseudo-likelihood: …
2. use a mean-field approximation for the normalizing constant: …
3. compute the iterated conditional mode: …
4. apply EM to the pseudo-likelihood: …
5. use metaheuristics to compute the posterior mode and/or maximize the pseudo-likelihood;
6. estimate the real likelihood by means of MCMC;
7. etc.
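A sketch of option 1 for a binary Markov random field on a grid with 4-neighbourhoods (the autologistic potential and the parameter beta are illustrative assumptions, not from the slides):

```python
import numpy as np

def pseudo_loglik(x, beta):
    """Log pseudo-likelihood: sum over all sites of the log full
    conditional P(x_ij | neighbours) under an autologistic potential."""
    ll = 0.0
    R, C = x.shape
    for i in range(R):
        for j in range(C):
            nb = sum(x[r, c] for r, c in
                     ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                     if 0 <= r < R and 0 <= c < C)
            p1 = 1.0 / (1.0 + np.exp(-beta * nb))   # P(x_ij = 1 | neighbours)
            ll += np.log(p1 if x[i, j] == 1 else 1.0 - p1)
    return ll
```

Maximizing this function over beta avoids the intractable normalizing constant of the full likelihood, at the cost of some statistical efficiency.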