
CSI5180. Machine Learning for Bioinformatics Applications
Hidden Markov Models
Marcel Turcotte. Version: November 6, 2019

Preamble: Hidden Markov Models. In this lecture, we focus on learning algorithms suited for ...


Markov chains. Like finite state automata (FSA), finite Markov chains make it possible to model processes that can be represented by a finite number of states. The process is in one of these states at any given time, for discrete units of time t = 0, 1, 2, ...; e.g. the type of nucleotide found at a given position t in a sequence.

Markov chains. Unlike FSAs, the transitions from one state to another are stochastic, not deterministic. If the current state of the process at time t is E_i, then at time t + 1 the process either stays in E_i or moves to some E_j, according to a well-defined probability. E.g. at time t + 1 the amino acid type at a given sequence position either stays the same or is substituted by one of the remaining 19 amino acid types, according to a well-defined probability that must be estimated.

Markov chains. [Figure: a three-state Markov chain with states E1, E2 and E3. E1 stays in E1 with probability 0.8 and moves to E2 or E3 with probability 0.1 each; E2 and E3 each return to E1 with probability 0.6 and stay put with probability 0.4.]

GAATTC TA TATAAATAAGTATTAAATTCTGGTTAAAA TA TAGAAAAAATAGAATTAGATT

Properties. A (first-order) Markovian process must conform to the following two properties:
1. Memoryless. If the process is in state E_i at time t, then the probability that it will be in state E_j at time t + 1 depends only on E_i, and not on the states visited at earlier times t' < t (no history). This is what makes the process first-order Markovian.
2. Homogeneity of time. If the process is in state E_i at time t, then the probability that it will be in state E_j at time t + 1 is independent of t.

For example, in the nucleotide sequence above: P(X_{t+1} = A | X_t = T), the probability of observing an A immediately after a T.
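To make this concrete, here is a minimal Python sketch, not from the slides, that estimates first-order transition probabilities such as P(X_{t+1} = A | X_t = T) by counting consecutive nucleotide pairs in the example sequence above; maximum-likelihood counting from a single sequence is an assumption of the sketch.

```python
from collections import defaultdict

# The slide's example sequence, with the highlighting spaces removed.
seq = "GAATTCTATATAAATAAGTATTAAATTCTGGTTAAAATATAGAAAAAATAGAATTAGATT"

counts = defaultdict(lambda: defaultdict(int))
for prev, curr in zip(seq, seq[1:]):         # consecutive pairs (X_t, X_{t+1})
    counts[prev][curr] += 1

# Normalize each row so that sum_j P(X_{t+1} = j | X_t = i) = 1.
probs = {i: {j: c / sum(row.values()) for j, c in row.items()}
         for i, row in counts.items()}

print(probs["T"].get("A", 0.0))              # estimate of P(X_{t+1} = A | X_t = T)
```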

Markov property, chain, process. A Markov chain is a stochastic (probabilistic) model describing a sequence of events. Herein, we focus on discrete-time, homogeneous, finite Markov chain models.

Markov chain. A (first-order) Markov chain is a sequence of random variables X_0, ..., X_{t-1}, X_t that satisfies the following property:

P(X_t = x_t | X_{t-1} = x_{t-1}, X_{t-2} = x_{t-2}, ..., X_0 = x_0) = P(X_t = x_t | X_{t-1} = x_{t-1})

Markov chain. More generally, an m-th order Markov chain is a sequence of random variables X_0, ..., X_{t-1}, X_t that satisfies the following property:

P(X_t = x_t | X_{t-1} = x_{t-1}, X_{t-2} = x_{t-2}, ..., X_0 = x_0) = P(X_t = x_t | X_{t-1} = x_{t-1}, ..., X_{t-m} = x_{t-m})

Markov chain models are denoted Mm, where m is the order of the model, e.g. M0, M1, M2, M3, etc. A 0-order model is known as a Bernoulli model.

Transition probabilities. The transition probabilities p_ij can be represented graphically, as in the three-state chain shown earlier, or as a transition probability matrix:

        | 0.8  0.1  0.1 |
    P = | 0.6  0.4  0.0 |
        | 0.6  0.0  0.4 |

where p_ij is understood as the probability of a transition from state i (row) to state j (column). The values in a row represent all the transitions out of state i, i.e. all outgoing arcs, and therefore their sum must be 1.
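As a quick sanity check, the matrix above can be written down directly; this short sketch, not part of the lecture, verifies that every row of outgoing probabilities sums to 1.

```python
import numpy as np

P = np.array([
    [0.8, 0.1, 0.1],    # transitions out of E1
    [0.6, 0.4, 0.0],    # transitions out of E2
    [0.6, 0.0, 0.4],    # transitions out of E3
])

assert np.allclose(P.sum(axis=1), 1.0)   # each row sums to 1
print(P[0, 1])                           # p_12 = P(E2 at t+1 | E1 at t) = 0.1
```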

Transition probabilities. [Figure: a five-state Markov chain with states E1 to E5 and its transition probabilities.] The framework elegantly answers questions such as: "A Markovian random variable is in state E_i at time t; what is the probability that it will be in state E_j at time t + 2?" For the Markovian process depicted above, knowing that the random variable is in state E2 at time t, what is the probability that it will be in state E5 at time t + 2, i.e. after two transitions?

Transition probabilities. There are exactly 3 paths of length 2 leading from E2 to E5: (E2, E2, E5), (E2, E3, E5) and (E2, E4, E5).
The probability of following (E2, E2, E5) is 0.2 × 0.2 = 0.04.
The probability of following (E2, E3, E5) is 0.1 × 0.4 = 0.04.
The probability of following (E2, E4, E5) is 0.1 × 0.4 = 0.04.
Therefore, the probability that the random variable is found in E5 at time t + 2, knowing that it was in E2 at time t, is 0.04 + 0.04 + 0.04 = 0.12.

In general, the probability that a random variable is found in state E_j at time t + 2, knowing that it was in E_i at time t, is

p_ij^(2) = Σ_k p_ik p_kj
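The sum Σ_k p_ik p_kj is exactly the (i, j) entry of the matrix product P·P, so two-step probabilities can be computed by squaring the transition matrix. A minimal sketch follows, using the fully specified three-state matrix from the earlier slide (the five-state chain above is not completely given in the deck).

```python
import numpy as np

P = np.array([
    [0.8, 0.1, 0.1],
    [0.6, 0.4, 0.0],
    [0.6, 0.0, 0.4],
])

P2 = P @ P                  # entry (i, j) is p_ij^(2) = sum_k p_ik * p_kj
print(P2[1, 0])             # P(E1 at t+2 | E2 at t) = 0.6*0.8 + 0.4*0.6 + 0.0*0.6 = 0.72
```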

Gene finding. [Figure from [2], Figure 1.]

Gene finding. [Figure from [1], Figure 1.]

Hidden (latent) variables. What is hidden?

Dishonest casino. A simplified example will help us better understand hidden variables and the characteristics of HMMs. I want to play a game: I will be tossing a coin n times. This information can be represented as {H, T, T, H, T, T, ...} or {0, 1, 1, 0, 1, 1, ...}. In fact, I will be using two coins! One coin is fair, i.e. heads and tails are equiprobable outcomes, but the other one is loaded (biased): it returns heads with probability 1/4 and tails with probability 3/4.

Dishonest casino. If I were using the same coin for the duration of the game, then it would be easy for you to guess which coin I am using. For instance, we could look at the odds ratio:

P(S | Loaded) = ∏_{i=1}^{L} P(S(i) | Loaded)

P(S | Fair) = ∏_{i=1}^{L} P(S(i) | Fair)

P(S | Loaded) / P(S | Fair) = ∏_{i=1}^{L} P(S(i) | Loaded) / P(S(i) | Fair)

log( P(S | Loaded) / P(S | Fair) ) = Σ_{i=1}^{L} log( P(S(i) | Loaded) / P(S(i) | Fair) )

Dishonest casino. Let's consider a specific sequence: S = {H, T, T, H, T, T}.

P(S | Loaded) = (1/4)^2 × (3/4)^4 = 0.01977539062
P(S | Fair) = (1/2)^6 = 0.015625
P(S | Loaded) / P(S | Fair) = 1.265625
log( P(S | Loaded) / P(S | Fair) ) = 0.1023050449
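A short Python sketch reproducing the numbers above; the per-toss probabilities are those of the fair and loaded coins defined earlier, and the base-10 logarithm is an assumption that matches the value on the slide.

```python
import math

S = "HTTHTT"

p_loaded = {"H": 1 / 4, "T": 3 / 4}              # the loaded coin from the slides
p_fair = {"H": 1 / 2, "T": 1 / 2}                # the fair coin

P_loaded = math.prod(p_loaded[s] for s in S)     # (1/4)^2 * (3/4)^4 = 0.01977539062...
P_fair = math.prod(p_fair[s] for s in S)         # (1/2)^6 = 0.015625

print(P_loaded / P_fair)                         # 1.265625
print(math.log10(P_loaded / P_fair))             # 0.1023050449...
```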

Occasionally dishonest casino. However, I will not reveal when I am exchanging the coins; this information is hidden from you. Objective: looking at a series of observations S, can you predict when the exchanges of coins occurred?

Hidden Markov Model

Hidden Markov Models (HMM). "A hidden Markov model (HMM) is a statistical model that can be used to describe the evolution of observable events [symbols] that depend on internal factors [states], which are not directly observable." "An HMM consists of two stochastic processes (...)": an invisible process consisting of states, and a visible (observable) process consisting of symbols. Yoon, B.-J. Hidden Markov Models and their Applications in Biological Sequence Analysis. Current Genomics 10, 402–415 (2009).

Definitions. We need to distinguish between the sequence of states (π) and the sequence of symbols (S). The sequence of states, denoted π and called the path, is modelled as a Markov chain; these transitions are not directly observable (they are hidden):

a_kl = P(π_i = l | π_{i-1} = k)

where a_kl is the transition probability from state k to state l. Each state has emission probabilities associated with it:

e_k(b) = P(S(i) = b | π_i = k)

the probability of observing (emitting) the symbol b when in state k.

Definitions. The alphabet of emitted symbols Σ, the set of (hidden) states Q, the matrix of transition probabilities A, as well as the emission probabilities E, are the parameters of an HMM: M = <Σ, Q, A, E>. It is often useful to think of an HMM as a device generating sequences. With some probability, the process stays in the same state or moves to the next state; at each step, the process emits a symbol according to a well-defined probability distribution. When looking at a sequence of observable symbols, the observer wonders whether or not the sequence could have been produced by the model. Recalling our discussion about finite state automata, an HMM is equivalent to a stochastic regular grammar.
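To illustrate the "generating device" view, here is a minimal sampling sketch. It uses the two-coin casino HMM that appears later in the deck; the start state and the exact pairing of the transition values with the two states (fair self-loop 0.9, loaded self-loop 0.8) are one plausible reading of that diagram, not something stated explicitly in the text.

```python
import random

states = ["Fair", "Loaded"]                         # Q, the hidden states
symbols = ["H", "T"]                                # Sigma, the emitted symbols
A = {"Fair": {"Fair": 0.9, "Loaded": 0.1},          # transition probabilities A (assumed pairing)
     "Loaded": {"Fair": 0.2, "Loaded": 0.8}}
E = {"Fair": {"H": 0.5, "T": 0.5},                  # emission probabilities E
     "Loaded": {"H": 0.25, "T": 0.75}}

def generate(length, start="Fair"):                 # starting in "Fair" is an assumption
    """Emit a symbol sequence and the hidden path that produced it."""
    state, path, seq = start, [], []
    for _ in range(length):
        path.append(state)
        seq.append(random.choices(symbols, weights=[E[state][s] for s in symbols])[0])
        state = random.choices(states, weights=[A[state][q] for q in states])[0]
    return "".join(seq), path

print(generate(20))
```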

Problems.
1. P(S, π): the joint probability of a sequence of symbols S and a sequence of states π. The decoding problem consists of finding a path π such that P(S, π) is maximum.
2. P(S | θ): the probability of a sequence of symbols S given the model θ. It represents the likelihood that sequence S has been produced by this HMM; let's call this the likelihood problem.
3. Finally, how are the parameters of the model (HMM), θ, learnt? Let's call this the parameter estimation problem.

Definitions. [Figure: a profile HMM with match (M_j), insert (I_j) and delete (D_j) states between Begin and End.] Joint probability of a sequence of symbols S and a sequence of states π:

P(S, π) = a_{0 π_1} ∏_{i=1}^{L} e_{π_i}(S(i)) a_{π_i π_{i+1}}

E.g. P(S = VGPGGAHA, π = BEG, M1, M2, I3, I3, I3, M3, M4, M5, END). However, in practice the state sequence π is not known in advance.
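A minimal sketch of the joint probability formula above, applied to the casino HMM rather than the profile HMM of the figure; the start distribution a_{0,k} and the omission of an explicit End state are assumptions of the sketch.

```python
A0 = {"Fair": 0.5, "Loaded": 0.5}                # assumed start probabilities a_{0,k}
A = {"Fair": {"Fair": 0.9, "Loaded": 0.1},       # transition probabilities a_{kl}
     "Loaded": {"Fair": 0.2, "Loaded": 0.8}}
E = {"Fair": {"H": 0.5, "T": 0.5},               # emission probabilities e_k(b)
     "Loaded": {"H": 0.25, "T": 0.75}}

def joint_probability(S, path):
    """P(S, pi) = a_{0,pi_1} * prod_i e_{pi_i}(S(i)) * a_{pi_i, pi_{i+1}}, with no End state."""
    p = A0[path[0]]
    for i, (symbol, state) in enumerate(zip(S, path)):
        p *= E[state][symbol]
        if i + 1 < len(path):
            p *= A[state][path[i + 1]]
    return p

# One candidate path for H,T,T,H,T,T: three fair tosses, then three loaded ones.
print(joint_probability("HTTHTT", ["Fair"] * 3 + ["Loaded"] * 3))
```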

Occasionally dishonest casino. [Figure: a two-state HMM. State π1 (fair coin) emits 0 and 1 with probability 1/2 each; state π2 (loaded coin) emits 0 with probability 1/4 and 1 with probability 3/4; the arcs carry the transition probabilities 0.9, 0.1, 0.2 and 0.8.] Modelled using an HMM, where each state represents a coin, with its own emission probability distribution, and the transition probabilities represent exchanging the coins.

Worked example. [Figure: the same two-state casino HMM as above.] Given an input sequence of symbols (heads and tails), such as 0, 1, 1, 0, 1, 1, 1, which sequence of states has the highest probability?

Worked example. Which path leads to the highest joint probability? [Figure: the observed sequence S = 0, 1, 1, 0, 1, 1, 1 aligned against candidate state paths, from (π1, π1, ..., π1) through mixed paths to (π2, π2, ..., π2).]

Brute force. Since the game consists of printing the series of switches from one coin to the other, selecting the path with the highest joint probability, P(S, π), seems appropriate. Here, there are 2^7 = 128 possible paths, so enumerating all of them is feasible. However, the number of states, and consequently the number of possible paths, is generally much larger: O(M^L), where M is the number of states and L is the length of the sequence of symbols.
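A brute-force sketch for the 2^7 = 128 paths of the worked example: enumerate every path, score it with the joint probability, and keep the best. The HMM parameters and the uniform start distribution are the same assumptions as in the earlier sketches.

```python
from itertools import product

A0 = {"Fair": 0.5, "Loaded": 0.5}
A = {"Fair": {"Fair": 0.9, "Loaded": 0.1},
     "Loaded": {"Fair": 0.2, "Loaded": 0.8}}
E = {"Fair": {"H": 0.5, "T": 0.5},
     "Loaded": {"H": 0.25, "T": 0.75}}

def joint_probability(S, path):
    # P(S, pi) = a_{0,pi_1} e_{pi_1}(S(1)) * prod_i a_{pi_{i-1},pi_i} e_{pi_i}(S(i))
    p = A0[path[0]] * E[path[0]][S[0]]
    for i in range(1, len(S)):
        p *= A[path[i - 1]][path[i]] * E[path[i]][S[i]]
    return p

S = "HTTHTTT"                                     # the observed 0,1,1,0,1,1,1 with H = 0, T = 1
best = max(product(["Fair", "Loaded"], repeat=len(S)),
           key=lambda path: joint_probability(S, path))
print(best, joint_probability(S, best))
```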

Decoding problem. Given an observed sequence of symbols S, the decoding problem consists of finding a sequence of states π such that the joint probability of S and π is maximum:

argmax_π P(S, π)

For our game, the sequence of states is of interest because it serves to predict the exchanges of coins.

Decoding problem: Viterbi. The most probable path can be found recursively. The score of the most probable path ending in state l with observation i, noted v_l(i), is given by

v_l(i) = e_l(S(i)) max_k [ v_k(i − 1) a_kl ]

where k runs over the states for which a_kl is defined.

Decoding problem. The algorithm for solving the decoding problem is known as the Viterbi algorithm. It finds the best (most probable) path using the dynamic programming technique. Forward: fill the table v, for all i and for all l (see the definition of v_l(i) on the previous slide). Traceback: starting with v_end(n), the algorithm reverses the computation to find the path with maximum joint probability. Sean R. Eddy, What is dynamic programming?, Nat Biotechnol 22(7), 909–910, 2004.
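A minimal Viterbi sketch for the casino HMM, with the forward fill of v_l(i) followed by the traceback. The uniform start distribution is an assumption, and a production implementation would work in log space to avoid underflow on long sequences.

```python
A0 = {"Fair": 0.5, "Loaded": 0.5}
A = {"Fair": {"Fair": 0.9, "Loaded": 0.1},
     "Loaded": {"Fair": 0.2, "Loaded": 0.8}}
E = {"Fair": {"H": 0.5, "T": 0.5},
     "Loaded": {"H": 0.25, "T": 0.75}}
states = list(A)

def viterbi(S):
    # Forward: v[i][l] is the score of the most probable path that emits
    # S[0..i] and ends in state l; ptr[i-1][l] records the best predecessor.
    v = [{l: A0[l] * E[l][S[0]] for l in states}]
    ptr = []
    for i in range(1, len(S)):
        ptr.append({})
        col = {}
        for l in states:
            best_k = max(states, key=lambda k: v[i - 1][k] * A[k][l])
            ptr[-1][l] = best_k
            col[l] = E[l][S[i]] * v[i - 1][best_k] * A[best_k][l]
        v.append(col)
    # Traceback: start from the best final state and follow the pointers.
    last = max(states, key=lambda l: v[-1][l])
    path = [last]
    for back in reversed(ptr):
        path.append(back[path[-1]])
    return list(reversed(path)), v[-1][last]

print(viterbi("HTTHTTT"))
```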
