

slide-1
SLIDE 1
CSI5180. Machine Learning for Bioinformatics Applications

Hidden Markov Models

by

Marcel Turcotte

Version November 6, 2019

slide-2
SLIDE 2

Preamble 2/82

Preamble

slide-3
SLIDE 3

Preamble

Preamble 3/82

Hidden Markov Models

In this lecture, we focus on learning algorithms suited for sequence (string) input data. In particular, we study Hidden Markov Models. First, we introduce Markov processes and Markov chains. Next, using the example of the occasionally dishonest casino, we introduce the concept of hidden variables. The presentation puts the emphasis on the graphical nature of these models. We use a gene finder algorithm as a running example.

General objective:

Explain the concepts related to Hidden Markov Models.

slide-4
SLIDE 4

Learning objectives

Preamble 4/82

  • Discuss the properties of a Markovian process
  • Explain the concept of hidden (latent) variables
  • Describe Hidden Markov Models
  • Name the important problems (questions) solved by HMMs

Reading:

Sean R Eddy. What is a hidden Markov model? Nat Biotechnol 22(10):1315–6, Oct 2004.
Byung-Jun Yoon. Hidden Markov Models and their applications in biological sequence analysis. Curr Genomics 10(6):402–15, Sep 2009.

slide-5
SLIDE 5

Plan

Preamble 5/82

  • 1. Preamble
  • 2. Problem
  • 3. Background information
  • 4. Hidden Markov Model
  • 5. Applications
  • 6. Prologue
  • 7. Appendix
slide-6
SLIDE 6

Problem 6/82

Problem

slide-7
SLIDE 7

Gene structure in eukaryotes - the input

Problem 7/82

GAATTCTATATAAATAAGTATTAAATTCTGGTTAAAATATAGAAAAAATAGAATTAGATT CAATGATATCTAATAACATACCAAGGAGTAAAGGACTAATTGAGGATGACTAGTCATTCT ATAATTGGAAGCACGAATGAGGCTAAAAGAATGATAGTATGTTGTTCGATTCCAAAGGTG AAAACCAAAGACGGAGAATTCTTATGGAGTCCTGTCTATTTTTATTAACCCTGTGAATTG AAACATCTTAGTAACAGGAGGAAAAGAAATCAACCGAGATTTTAACGAGTAGTGGCGAGC GAAAGTAAATGAAAACATTCATGTTTTGATCCGAAATATCTTATCGATGTTTCGATTTTT TCAAAGACCCCGTACCGGGTCTTGGGGCATGTCTGAAATTGAACATCACACACTTACCCA TGATAAAGGAGATGGTTTGGATCTTCGATTCTACCATTTTCAGGCAGTGTGTTTATGGAA TGGGTGGCCAAAGAAGGTGAAAGTCCTGTAAATTTCAGTAGTAGACCACTTATGGAGTAG AACGAATTTTGTTCAGAAGAAAGGGGTACCATCCTCTAATAAATTAAATATGATAGAATG AACGATAGTGAAGAGTACCGTGAGGGAAAGTTGAAAAGTACCTCTAAGAGAACGAAGCCT TCCGAGGCTTCGAATATCAATGCCGGAGGGGTGAAATAGATCCTGAATCAGTTAAGTCTA AAAGCAGTTTGAGCCAAGATTATGGTGAAGACGTACCTTTTGCATAATGGGTCAGCAAGT TAATTTTTGGAGCAAGAGAACAAAAGAACGTATCTTTGGTACGTTGGTGATCTAAGTGAA AACAAAAGAACAAAGTGAGACTTAGTCTTACCCCTATACATAATTTTGCACTCAGTGTGA CATGGCCAGGTGTAAGACCGA(...52,624,944...)ATCGTAGTAATGCTCTCCGAT AAGAATCATTGATTCTTCGGACCCACATGGGTACCCATACTCCCCCCAAATGA

slide-8
SLIDE 8

Gene structure in eukaryotes

Problem 8/82

[Figure: annotated eukaryotic gene structure. Upstream regulatory sequence with enhancer/silencer elements and a promoter (core and proximal); open reading frame with start and stop codons; 5’UTR and 3’UTR; exons and introns; terminator. Transcription produces a pre-mRNA; post-transcriptional modification yields the mature mRNA with a 5’ cap, protein-coding region and poly-A tail; translation yields the protein.]

Source: Thomas Shafee (https://commons.wikimedia.org/wiki/File:Gene_structure_eukaryote_2_annotated.svg)

slide-9
SLIDE 9

Gene structure

Problem 9/82

The gene structure comprises several elements, including a transcription start site, a 5’ untranslated region (5’ UTR), an open reading frame (ORF) and a 3’ untranslated region (3’ UTR). Upstream of the gene, there is a regulatory sequence comprising enhancers, silencers, and a promoter. In eukaryotes, genes are made of coding segments, called exons, and non-coding segments, called introns, which need to be removed (spliced) prior to translation.

slide-12
SLIDE 12

Gene finding/prediction

Problem 10/82

Problem: identifying the segments of DNA that make up protein-coding genes. This can be seen as segmentation and labelling of the DNA sequence. A Hidden Markov Model allows these elements to be represented and integrated into one model. Furthermore, these models have been shown to be effective.

slide-15
SLIDE 15

Gene finding/prediction

Problem 11/82

GENSCAN

C Burge and S Karlin. Prediction of complete gene structures in human genomic DNA. J Mol Biol 268(1):78–94, Apr 1997.

GENIE

Kulp, D., Haussler, D., Reese, M. G. & Eeckman, F. H. A generalized hidden Markov model for the recognition of human genes in DNA. ISMB International Conference on Intelligent Systems for Molecular Biology 4, 134–142 (1996).

HMMGENE

Krogh, A. Two methods for improving performance of an HMM and their application for gene finding. ISMB International Conference on Intelligent Systems for Molecular Biology 5, 179–186 (1997).

slide-16
SLIDE 16

Other applications

Problem 12/82

Other applications include:

  • 1. Modelling pairwise and multiple sequence alignments
  • 2. Protein secondary structure prediction
  • 3. Modelling transmembrane proteins
slide-17
SLIDE 17

Background information 13/82

Background information

slide-18
SLIDE 18

Remarks

Background information 14/82

Our presentation will be informal. An entire course could be taught on Markov chains and stochastic processes.

MAT 4374 Modern Computational Statistics: Simulation including the rejection method and importance sampling; applications to Monte Carlo Markov chains. Resampling methods such as the bootstrap and jackknife, with applications. Smoothing methods in curve estimation.

MAT 5198 Stochastic Models: Markov systems, stochastic networks, queuing networks, spatial processes, approximation methods in stochastic processes and queuing theory. Applications to the modelling and analysis of computer-communications systems and other distributed networks.

slide-19
SLIDE 19

Markov chains

Background information 15/82

Like finite state automata (FSA):

Finite Markov chains model processes that can be represented by a finite number of states. A process can be in any of these states at a given time, for discrete units of time t = 0, 1, 2, . . . E.g. the type of nucleotide at a given position t in a sequence.

slide-23
SLIDE 23

Markov chains

Background information 16/82

Unlike FSAs:

The transitions from one state to another are stochastic (not deterministic). If the current state of the process at time t is Ei, then at time t + 1 the process either stays in Ei or moves to Ej, for some j, according to a well-defined probability. E.g. at time t + 1 the amino acid type at a given sequence position either stays the same or is substituted by one of the remaining 19 amino acid types, according to a well-defined probability, to be estimated.

slide-27
SLIDE 27

Markov chains

Background information 17/82

[State diagram: three states E1, E2, E3, with transitions E1 → E1 0.8, E1 → E2 0.1, E1 → E3 0.1, E2 → E1 0.6, E2 → E2 0.4, E3 → E1 0.6, E3 → E3 0.4; the same chain is shown as a matrix on a later slide.]

slide-28
SLIDE 28

Properties

Background information 18/82

A (first-order) Markovian process must conform to the following two properties:

  • 1. Memoryless. If a process is in state Ei at time t, then the probability that it will be in state Ej at time t + 1 depends only on Ei (and not on the states visited at times t′ < t; no history). This is called a first-order Markovian process.
  • 2. Homogeneity of time. If a process is in state Ei at time t, then the probability that it will be in state Ej at time t + 1 is independent of t.

GAATTCTATATAAATAAGTATTAAATTCTGGTTAAAATATAGAAAAAATAGAATTAGATT

P(Xt+1 = A | Xt = T)

slide-32
SLIDE 32

Markov property, chain, process

Background information 19/82

A Markov chain is a stochastic (probabilistic) model describing a sequence of events. Herein, we focus on discrete-time homogeneous finite Markov chain models.

slide-34
SLIDE 34

Markov chain

Background information 20/82

A (first order) Markov chain is a sequence of random variables X0, . . . , Xt−1, Xt that satisfies the following property

P(Xt = xt|Xt−1 = xt−1, Xt−2 = xt−2, . . . , X0 = x0) = P(Xt = xt|Xt−1 = xt−1)

slide-35
SLIDE 35

Markov chain

Background information 21/82

More generally, an m-order Markov chain is a sequence of random variables X0, . . . , Xt−1, Xt that satisfies the following property:

P(Xt = xt | Xt−1 = xt−1, Xt−2 = xt−2, . . . , X0 = x0) = P(Xt = xt | Xt−1 = xt−1, . . . , Xt−m = xt−m)

Markov chain models are denoted Mm, where m is the order of the model, e.g. M0, M1, M2, M3, etc. A 0-order model is known as a Bernoulli model.
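
To make this concrete, here is a minimal sketch in Perl, in the style of the appendix code (not part of the original deck), estimating the transition probabilities of a first-order (M1) chain from a DNA string by counting consecutive nucleotide pairs; the fragment is the beginning of the input sequence shown earlier.

use strict; use warnings;

# Count consecutive nucleotide pairs to estimate P(X(t+1) | X(t)).
my $S = "GAATTCTATATAAATAAGTATTAAATTCTGGTTAAAATATAG";
my (%pair, %from);
for my $i (0 .. length($S) - 2) {
    my ($x, $y) = (substr($S, $i, 1), substr($S, $i + 1, 1));
    $pair{$x}{$y}++;   # count of transitions x -> y
    $from{$x}++;       # occurrences of x as a "from" state
}
for my $x (sort keys %pair) {
    for my $y (sort keys %{ $pair{$x} }) {
        printf "P(X(t+1) = %s | X(t) = %s) = %.3f\n",
            $y, $x, $pair{$x}{$y} / $from{$x};
    }
}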

slide-36
SLIDE 36

Transition probabilities

Background information 22/82

The transition probabilities, pij, can be represented graphically,

[State diagram: three states E1, E2, E3, with the transitions given by the matrix below.]

or as a transition probability matrix,

$$P = \begin{pmatrix} 0.8 & 0.1 & 0.1 \\ 0.6 & 0.4 & 0.0 \\ 0.6 & 0.0 & 0.4 \end{pmatrix}$$

slide-37
SLIDE 37

Transition probabilities

Background information 23/82

$$P = \begin{pmatrix} 0.8 & 0.1 & 0.1 \\ 0.6 & 0.4 & 0.0 \\ 0.6 & 0.0 & 0.4 \end{pmatrix}$$

where pij is understood as the probability of a transition from state i (row) to state j (column). The values in a row represent all the transitions out of state i, i.e. all outgoing arcs, and therefore their sum must be 1.
slide-38
SLIDE 38

Transition probabilities

Background information 24/82

[State diagram: five states E1 to E5; the transitions used below include a22 = 0.2, a23 = 0.1, a24 = 0.1, a25 = 0.2, a35 = 0.4 and a45 = 0.4.]

The framework allows answering elegantly questions such as this one: “a Markovian random variable is in state Ei at time t; what is the probability that it will be in state Ej at t + 2?” For the Markovian process graphically depicted above, knowing that a random variable is in state E2 at time t, what is the probability that it will be in state E5 at t + 2, i.e. after two transitions?

slide-40
SLIDE 40

Transition probabilities

Background information 25/82

[State diagram: five states E1 to E5, as above.]

There are exactly 3 paths of length 2 leading from E2 to E5: (E2, E2, E5), (E2, E3, E5) and (E2, E4, E5).

The probability that (E2, E2, E5) is followed is 0.2 × 0.2 = 0.04.
The probability that (E2, E3, E5) is followed is 0.1 × 0.4 = 0.04.
The probability that (E2, E4, E5) is followed is 0.1 × 0.4 = 0.04.

Therefore, the probability that the random variable is found in E5 at t + 2, knowing that it was in E2 at t, is 0.04 + 0.04 + 0.04 = 0.12.

slide-45
SLIDE 45

Background information 26/82

[State diagram: five states E1 to E5, as above.]

In general, the probability that a random variable is found in state Ej at t + 2, knowing that it was in Ei at t, is

$$p^{(2)}_{ij} = \sum_k p_{ik} \, p_{kj}$$
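
As a sanity check, a short Perl sketch (not from the deck) computes these two-step probabilities for the three-state chain of the earlier slides by multiplying the transition matrix by itself.

use strict; use warnings;

# Two-step probabilities p(2)_ij = sum_k p_ik * p_kj, obtained by
# multiplying the 3-state transition matrix by itself.
my @P = ( [0.8, 0.1, 0.1],
          [0.6, 0.4, 0.0],
          [0.6, 0.0, 0.4] );
my @P2;
for my $i (0 .. 2) {
    for my $j (0 .. 2) {
        $P2[$i][$j] = 0;
        $P2[$i][$j] += $P[$i][$_] * $P[$_][$j] for 0 .. 2;
    }
}
printf "p(2) E1 -> E1 = %.2f\n", $P2[0][0];   # 0.8*0.8 + 0.1*0.6 + 0.1*0.6 = 0.76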

slide-46
SLIDE 46

Gene finding

Background information 27/82

Source: [2] Figure 1

slide-47
SLIDE 47

Gene finding

Background information 28/82

Source: [1] Figure 1

slide-48
SLIDE 48

Hidden (latent) variables

Background information 29/82

What is hidden?

slide-49
SLIDE 49

Dishonest casino

Background information 30/82

A simplified example will help better understand hidden variables and the characteristics of HMMs.

I want to play a game. I will be tossing a coin n times. This information can be represented as follows: { H, T, T, H, T, T, . . . } or { 0, 1, 1, 0, 1, 1, . . . }. In fact, I will be using two coins!

One coin is fair, i.e. head and tail are equiprobable outcomes, but the other one is loaded (biased): it returns head with probability 1/4 and tail with probability 3/4.

slide-56
SLIDE 56

Dishonest casino

Background information 31/82

If I were using the same coin for the duration of the game, then it would be easy for you to guess which coin I am using. For instance, we could look at the odds ratio:

$$P(S \mid Loaded) = \prod_{i=1}^{L} P(S(i) \mid Loaded)$$

$$P(S \mid Fair) = \prod_{i=1}^{L} P(S(i) \mid Fair)$$

$$\frac{P(S \mid Loaded)}{P(S \mid Fair)} = \frac{\prod_{i=1}^{L} P(S(i) \mid Loaded)}{\prod_{i=1}^{L} P(S(i) \mid Fair)}$$

$$\log \frac{P(S \mid Loaded)}{P(S \mid Fair)} = \sum_{i=1}^{L} \log \frac{P(S(i) \mid Loaded)}{P(S(i) \mid Fair)}$$

slide-62
SLIDE 62

Dishonest casino

Background information 32/82

Let’s consider a specific sequence: S = { H, T, T, H, T, T }

$$P(S \mid Loaded) = \left(\tfrac{1}{4}\right)^2 \times \left(\tfrac{3}{4}\right)^4 = 0.01977539062$$

$$P(S \mid Fair) = \left(\tfrac{1}{2}\right)^6 = 0.015625$$

$$\frac{P(S \mid Loaded)}{P(S \mid Fair)} = 1.265625$$

$$\log_{10} \frac{P(S \mid Loaded)}{P(S \mid Fair)} = 0.1023050449$$
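
These numbers are easy to reproduce; a minimal Perl sketch (not part of the original deck):

use strict; use warnings;

my @S = qw(H T T H T T);
my %loaded = ( H => 1/4, T => 3/4 );
my %fair   = ( H => 1/2, T => 1/2 );
my ($pl, $pf) = (1, 1);
for my $s (@S) {
    $pl *= $loaded{$s};
    $pf *= $fair{$s};
}
printf "P(S|Loaded)  = %.11f\n", $pl;                       # 0.01977539062
printf "P(S|Fair)    = %.6f\n",  $pf;                       # 0.015625
printf "ratio        = %.6f\n",  $pl / $pf;                 # 1.265625
printf "log10(ratio) = %.10f\n", log($pl / $pf) / log(10);  # 0.1023050449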

slide-67
SLIDE 67

Occasionally dishonest casino

Background information 33/82

However, I will not reveal when I am exchanging the coins. This information is hidden from you.

Objective:

Looking at a series of observations, S, can you predict when the exchanges of coins occurred?
slide-71
SLIDE 71

Hidden Markov Model 34/82

Hidden Markov Model

slide-72
SLIDE 72

Hidden Markov Models (HMM)

Hidden Markov Model 35/82

“A hidden Markov model (HMM) is a statistical model that can be used to describe the evolution of observable events [symbols] that depend on internal factors [states], which are not directly observable.”

“An HMM consists of two stochastic processes (. . . )”:

  • an invisible process consisting of states;
  • a visible (observable) process consisting of symbols.

Yoon, B.-J. Hidden Markov Models and their Applications in Biological Sequence Analysis. Current Genomics 10, 402–415 (2009).
slide-75
SLIDE 75

Definitions

Hidden Markov Model 36/82

We need to distinguish between the sequence of states (π) and the sequence of symbols (S).

The sequence of states, denoted by π and called the path, is modelled as a Markov chain; these transitions are not directly observable (they are hidden): akl = P(πi = l | πi−1 = k), where akl is the transition probability from state k to state l.

Each state has emission probabilities associated with it: ek(b) = P(S(i) = b | πi = k), the probability of observing/emitting the symbol b when in state k.

slide-78
SLIDE 78

Definitions

Hidden Markov Model 37/82

The alphabet of emitted symbols, Σ, the set of (hidden) states, Q, a matrix of transition probabilities, A, as well as the emission probabilities, E, are the parameters of an HMM: M = ⟨Σ, Q, A, E⟩.

It is often useful to think of an HMM as a device generating sequences.

  • With some probability, the process stays in the same state or moves to the next state;
  • At each step, the process emits a symbol according to a well-defined probability distribution;
  • When looking at a sequence of observable symbols, the observer wonders whether or not the sequence could have been produced by the model.

Remembering our discussion about finite state automata, an HMM is equivalent to a stochastic regular grammar.

slide-84
SLIDE 84

Problems

Hidden Markov Model 38/82

  • 1. P(S, π): the joint probability of a sequence of symbols S and a sequence of states π. The decoding problem consists of finding a path π such that P(S, π) is maximum;
  • 2. P(S|θ): the probability of a sequence of symbols S given the model θ. It represents the likelihood that sequence S has been produced by this HMM; let’s call this the likelihood problem;
  • 3. Finally, how are the parameters of the model (HMM), θ, learnt? Let’s call this the parameter estimation problem.

slide-90
SLIDE 90

Definitions

Hidden Markov Model 39/82

[Profile HMM diagram: a Begin state, match states Mj, insert states Ij, delete states Dj, and an End state.]

Joint probability of a sequence of symbols S and a sequence of states π:

$$P(S, \pi) = a_{0\pi_1} \prod_{i=1}^{L} e_{\pi_i}(S(i)) \, a_{\pi_i\pi_{i+1}}$$

P(S = VGPGGAHA, π = BEG, M1, M2, I3, I3, I3, M3, M4, M5, END)

⇒ However, in practice the state sequence π is not known in advance.

slide-91
SLIDE 91

Occasionally dishonest casino

Hidden Markov Model 40/82

[HMM diagram: state π1, the fair coin, emits 0 and 1 each with probability 1/2; state π2, the loaded coin, emits 0 with probability 1/4 and 1 with probability 3/4; transition probabilities: a11 = 0.9, a12 = 0.1, a21 = 0.2, a22 = 0.8.]

Modelled using an HMM, where each state represents a coin, with its own emission probability distribution, and the transition probabilities represent exchanging the coins.

slide-92
SLIDE 92

Worked example

Hidden Markov Model 41/82

[HMM diagram: the two-coin model above.]

Given an input sequence of symbols (heads and tails), such as 0, 1, 1, 0, 1, 1, 1, which sequence of states has the highest probability?

slide-93
SLIDE 93

Worked example

Hidden Markov Model 42/82

Which path leads to the highest joint probability?

[Table: the observed sequence S in the top row; below it, the candidate state paths, from (π1, π1, π1, π1, π1, π1, π1) through (π2, π2, π2, π2, π2, π2, π2).]

[HMM diagram: the two-coin model above.]

slide-94
SLIDE 94

Brute-force

Hidden Markov Model 43/82

Since the game consists of predicting the series of switches from one coin to the other, selecting the path with the highest joint probability, P(S, π), seems appropriate. Here, there are 2^7 = 128 possible paths; enumerating all of them is feasible. However, the number of states, and consequently the number of possible paths, is generally much larger: O(M^L), where M is the number of states and L is the length of the sequence of symbols.

slide-97
SLIDE 97

Decoding problem

Hidden Markov Model 44/82

Given an observed sequence of symbols, S, the decoding problem consists of finding a sequence of states, π, such that the joint probability of S and π is maximum:

argmaxπ P(S, π)

For our game, the sequence of states is of interest because it serves to predict the exchanges of coins.

slide-99
SLIDE 99

Decoding problem — Viterbi

Hidden Markov Model 45/82

The most probable path can be found recursively. The score of the most probable path ending in state l with observation i, noted vl(i), is given by

$$v_l(i) = e_l(S(i)) \max_k \left[ v_k(i-1) \, a_{kl} \right]$$

where k runs over the states for which akl is defined.

slide-100
SLIDE 100

Decoding problem

Hidden Markov Model 46/82

The algorithm for solving the decoding problem is known as the Viterbi algorithm. It finds the best (most probable) path using the dynamic programming technique.

  • Forward. This requires filling the table v, for all i and for all l (see the definition of vl(i) on the previous slide).
  • Traceback. Starting with vend(n), the algorithm reverses the computation to find the path with maximum joint probability.

Sean R Eddy. What is dynamic programming? Nat Biotechnol 22(7):909–10, 2004.
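
To make the two passes concrete, here is a compact Viterbi sketch in Perl, in the style of the appendix program (not part of the original deck); the two-coin parameters come from the earlier diagram, and the uniform initial distribution of 0.5 per state is an assumption.

use strict; use warnings;

my @S = (0, 1, 1, 0, 1, 1, 1);
my @t = ( [0.9, 0.1], [0.2, 0.8] );
my @e = ( [0.5, 0.5], [0.25, 0.75] );
my (@v, @ptr);

# Forward pass: fill the table v for all positions i and states l.
$v[$_][0] = 0.5 * $e[$_][$S[0]] for 0, 1;
for my $i (1 .. $#S) {
    for my $l (0, 1) {
        my ($best, $arg) = (-1, 0);
        for my $k (0, 1) {
            my $s = $v[$k][$i-1] * $t[$k][$l];
            ($best, $arg) = ($s, $k) if $s > $best;
        }
        $v[$l][$i]   = $e[$l][$S[$i]] * $best;
        $ptr[$l][$i] = $arg;             # backward pointer for the traceback
    }
}

# Traceback: start from the best final state and follow the pointers.
my $l = $v[0][$#S] >= $v[1][$#S] ? 0 : 1;
my @path = ($l);
unshift @path, $l = $ptr[$l][$_] for reverse 1 .. $#S;
print "most probable path: @path\n";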

slide-101
SLIDE 101

Decoding problem — table v

Hidden Markov Model 47/82

[Dynamic programming table v: one row per state π1, π2, . . . , πm; one column per position S(1), S(2), S(3), . . . , S(n−1), S(n).]

slide-102
SLIDE 102

Decoding problem — gene finding

Hidden Markov Model 48/82

Source: [2] Figure 1

For a given input sequence, the decoding problem reveals the path with maximum joint probability, effectively telling us which nucleotides are likely to be in exons (states E1, E2, E3) and which are likely to be in introns (state I).

slide-103
SLIDE 103

Decoding problem — Gene finding

Hidden Markov Model 49/82

Source: [1] Figure 1

slide-104
SLIDE 104

The likelihood problem: calculating P(S|θ)

Hidden Markov Model 50/82

In the case of a Markov chain, there is a single path for a given sequence S, and therefore P(S|θ) is given by

$$P(S \mid \theta) = P(S(1)) \prod_{i=2}^{n} a_{S(i-1)S(i)}$$

In the case of an HMM, there are several paths producing the same S (some paths will be more likely than others), and P(S|θ) should be defined as the sum of the probabilities of all possible paths producing S:

$$P(S \mid \theta) = \sum_{\pi} P(S, \pi)$$

The number of paths grows exponentially with respect to the length of the sequence; therefore, the paths cannot simply be enumerated and summed.

slide-107
SLIDE 107

The likelihood problem: forward algorithm

Hidden Markov Model 51/82

Modifying the Viterbi algorithm by replacing the maximization with a sum calculates the probability of the observed sequence up to position i, ending in state l:

$$f_l(i) = e_l(S(i)) \sum_k f_k(i-1) \, a_{kl}$$

slide-108
SLIDE 108

The likelihood problem

Hidden Markov Model 52/82

The score fl(i), the probability of the sequence up to (and including) S(i) ending in state l, is given by

$$f_l(i) = e_l(S(i)) \sum_k \left[ f_k(i-1) \, a_{kl} \right]$$

where k runs over the states for which akl is defined.
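
For comparison with the Viterbi sketch earlier, here is a minimal forward-algorithm sketch in Perl (not part of the original deck; same two-coin model and the same assumed uniform initial distribution). Summing the final column gives P(S|θ).

use strict; use warnings;

my @S = (0, 1, 1, 0, 1, 1, 1);
my @t = ( [0.9, 0.1], [0.2, 0.8] );
my @e = ( [0.5, 0.5], [0.25, 0.75] );
my @f;
$f[$_][0] = 0.5 * $e[$_][$S[0]] for 0, 1;   # assumed initial distribution
for my $i (1 .. $#S) {
    for my $l (0, 1) {
        my $sum = 0;                        # sum over predecessors, not max
        $sum += $f[$_][$i-1] * $t[$_][$l] for 0, 1;
        $f[$l][$i] = $e[$l][$S[$i]] * $sum;
    }
}
my $likelihood = $f[0][$#S] + $f[1][$#S];
printf "P(S|theta) = %.8f\n", $likelihood;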

slide-109
SLIDE 109

Model specification

Hidden Markov Model 53/82

We now turn to our third and final question: how to determine the parameters of the model? Let x1, . . . , xN be N independent examples forming the training set (typically, N sequences of observable symbols); the objective is to find a set of parameters, θ, such that

$$\max_{\theta} \prod_{i=1}^{N} P(x_i \mid \theta)$$

slide-110
SLIDE 110

Model Specification

Hidden Markov Model 54/82

  • Structure (topology): states + interconnect
  • Estimating the transition/emission probabilities

slide-111
SLIDE 111

Modelling the length

Hidden Markov Model 55/82

[Diagrams: HMM topologies constraining the sequence length: one accepting sequences at least 5 symbols long, the other accepting sequences 2 to 8 symbols long.]

slide-112
SLIDE 112

Arbitrary deletions

Hidden Markov Model 56/82

Too expensive, too many parameters to evaluate! Silent (null) states do not emit symbols. ⇒ Silent states prevent modelling specific distant transitions.

slide-113
SLIDE 113

Profile HMMs

Hidden Markov Model 57/82

[Profile HMM diagram: a Begin state, match states Mj, insert states Ij and delete states Dj, followed by an End state.]

⇒ Models insertions/deletions separately.

slide-114
SLIDE 114

Gene prediction

Hidden Markov Model 58/82

[Gene model diagram: 5’ and 3’ flanking regions; CAAT, GC and TATA boxes; transcription initiation; 5’UTR; initiation codon; exons 1 to n separated by introns with GT. . . AG splice sites; stop codon; 3’UTR; poly(A) site.]

Source: [2] Figure 1

slide-115
SLIDE 115

The parameter estimation problem

Hidden Markov Model 59/82

Problem: estimate the akl and ek(b) probabilities. Given:

  • a fixed topology;
  • n independent positive examples: S1, S2, . . . , Sn.

Two scenarios:

  • The paths are known (existing annotated genes)
  • The paths are unknown
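
When the paths are known, the estimates reduce to normalized counts. Below is a small Perl sketch showing the counting (not part of the original deck; the toy sequence and path are invented for illustration).

use strict; use warnings;

my @S  = (0, 1, 1, 1, 0, 1);    # observed symbols (invented)
my @pi = (0, 0, 1, 1, 1, 0);    # annotated state path (invented)
my (%A, %Afrom, %E, %Ein);
for my $i (0 .. $#S) {
    $E{ $pi[$i] }{ $S[$i] }++;          # emission counts
    $Ein{ $pi[$i] }++;
    if ($i > 0) {
        $A{ $pi[$i-1] }{ $pi[$i] }++;   # transition counts
        $Afrom{ $pi[$i-1] }++;
    }
}
for my $k (sort keys %A) {
    printf "a(%d->%d) = %.3f\n", $k, $_, $A{$k}{$_} / $Afrom{$k}
        for sort keys %{ $A{$k} };
}
for my $k (sort keys %E) {
    printf "e_%d(%d) = %.3f\n", $k, $_, $E{$k}{$_} / $Ein{$k}
        for sort keys %{ $E{$k} };
}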

slide-117
SLIDE 117

Expectation-Maximization (EM) algorithm

Hidden Markov Model 60/82

  • 1. Choose an initial model. If no prior information is available, make all the transition probabilities equiprobable, and similarly for the emission probabilities;
  • 2. Use the decoding algorithm to find the maximum likelihood path for each input sequence (E-step);
  • 3. Using these paths, tally statistics for estimating all akl and ek(b) values (M-step);
  • 4. Repeat steps 2 and 3 until the parameter estimates converge.

Sometimes called the Baum-Welch algorithm. Gradient descent can also be used.

Chuong B Do and Serafim Batzoglou. What is the expectation maximization algorithm? Nat Biotechnol 26(8):897–899, 2008.

slide-123
SLIDE 123

Applications 61/82

Applications

slide-124
SLIDE 124

sklearn.hmm

Applications 62/82

sklearn.hmm has been “deprecated due to it no longer matching the scope and the API of the project.” It was removed starting with the release 0.17 [as of writing this, the current version of Scikit-Learn is 0.21.3]. Pomegranate implements probabilistic models, including hidden Markov models.

Documentation

Most of the time, hidden Markov models are implemented in specialized tools, such as GENSCAN, GENIE, HMMGENE, UGENE, SAM, HMMER, etc.

slide-125
SLIDE 125

HMMER

Applications 63/82

Eddy, S. R. Profile hidden Markov models. Bioinformatics 14, 755–763 (1998).

3371 citations according to Scopus

J. Mistry, R. D. Finn, S. R. Eddy, A. Bateman, M. Punta. Challenges in Homology Search: HMMER3 and Convergent Evolution of Coiled-Coil Regions. Nucleic Acids Research 41:e121, 2013.

http://hmmer.org/publications.html
http://hmmer.org

slide-126
SLIDE 126

Pfam

Applications 64/82

“The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).” E.L.L. Sonnhammer, S.R. Eddy and R. Durbin. Pfam: a comprehensive database of protein families based on seed alignments. Proteins 28:405-420, 1997.

806 citations according to Scopus

S. El-Gebali, et al. The Pfam protein families database in 2019. Nucleic Acids Research (2019), doi: 10.1093/nar/gky995

Pfam 32.0, September 2018: 17,929 entries

https://pfam.xfam.org

slide-127
SLIDE 127

Prologue 65/82

Prologue

slide-128
SLIDE 128

Summary

Prologue 66/82

A Markov process is memoryless, which means that the probability that the system is in state Ej at time t depends only on the state, Ei, at time t − 1 (first-order model). Those probabilities do not depend on the value of t; this property is called homogeneity of time. Here, time is discrete.

A hidden Markov model comprises two elements: a sequence of observable symbols and a sequence of hidden states.
slide-131
SLIDE 131

Next module

Prologue 67/82

Support Vector Machine

slide-132
SLIDE 132

Appendix 68/82

Appendix

slide-133
SLIDE 133

Decoding problem

Appendix 69/82

If the observed sequence of symbols was of length one, the sequence of states would also be of length one (in our restricted example). Which state would you predict if the observed symbol was a 0? What if it was a 1?

Now consider an observed sequence of length two, and let’s assume that the last symbol is 1; what is the probability of that symbol being emitted from state π1? There are two ways of ending up in π1 while producing S(2):

  • 1. S(1) could have been produced from π1, and the state remained π1,
  • 2. or S(1) could have been produced from π2, and there was a transition from π2 to π1.

The two joint probabilities would be P(S(1)|π1)P(π1 → π1)P(S(2)|π1) and P(S(1)|π2)P(π2 → π1)P(S(2)|π1).

slide-140
SLIDE 140

Decoding problem

Appendix 70/82

Now consider an observed sequence of length three, and again assume that the last symbol is 1; what is the probability of that symbol being emitted from state π1? There are two ways of ending up in π1 while producing S(3):

  • 1. the last state that led to the production of the sequence of symbols S[1, 2] was π1, and the state remained π1,
  • 2. the last state that led to the production of the sequence of symbols S[1, 2] was π2, and it is followed by a transition from π2 to π1, with probability a21.

Let’s define vk(i) as the probability of the most probable path ending in state k while producing observation i. Using this notation, the probabilities for the above two scenarios combine as:

v1(3) = max [ v1(2) × a11 × e1(1), v2(2) × a21 × e1(1) ]

slide-141
SLIDE 141

Decoding problem

Appendix 71/82

For our 2-state HMM, we can write the following equations:

v1(i) = max [ v1(i − 1) × a11 × e1(S(i)), v2(i − 1) × a21 × e1(S(i)) ]
v2(i) = max [ v1(i − 1) × a12 × e2(S(i)), v2(i − 1) × a22 × e2(S(i)) ]

slide-142
SLIDE 142

Decoding problem

Appendix 72/82

[Dynamic programming table: one row per state (π1, π2); one column per observed symbol.]

slide-143
SLIDE 143

Decoding problem

Appendix 73/82

The most probable path can be found recursively. The score of the most probable path ending in state l with observation i, noted vl(i), is given by

$$v_l(i) = e_l(S(i)) \max_k \left[ v_k(i-1) \, a_{kl} \right]$$

where k runs over the states for which akl is defined.

slide-144
SLIDE 144

Decoding problem

Appendix 74/82

The algorithm for solving the decoding problem is known as the Viterbi algorithm. It finds the best (most probable) path using the dynamic programming technique.

  • Forward. First, this requires filling the table v, for all i and for all l (see the definition of vl(i) on the previous slide).
  • Traceback. Next, starting with vend(n), the algorithm reverses the computation to find the path with maximum joint probability.

Sean R Eddy. What is dynamic programming? Nat Biotechnol 22(7):909–10, 2004.

slide-145
SLIDE 145

Decoding problem: Viterbi algorithm

Appendix 75/82

Initialization: v0(0) = 1; vk(0) = 0 for k > 0.

Recurrence:

$$v_l(i) = e_l(S(i)) \max_k \left( v_k(i-1) \, a_{kl} \right)$$

where vk(i) represents the probability of the most probable path ending in state k at position i in S. A (backward) pointer is kept from l to the value of k that maximizes vk(i − 1)akl.

⇒ Implementation issue: because products of (small) probabilities lead to underflow, the algorithm is implemented using the logarithms of the values, so the products become sums.
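
A minimal Perl sketch of one step of the log-space recurrence (not part of the original deck); log 0 is approximated by a large negative constant, and the two previous scores are borrowed from the worked table of the later slide.

use strict; use warnings;

my $NEG_INF = -1e300;                       # stands in for log(0)
sub safe_log { my $x = shift; return $x > 0 ? log($x) : $NEG_INF; }

# One step of the recurrence in log space:
#   log v_l(i) = log e_l(S(i)) + max_k ( log v_k(i-1) + log a_kl )
my @log_v_prev = ( safe_log(0.225), safe_log(0.0475) );  # log v_k(i-1)
my @a = ( [0.9, 0.1], [0.2, 0.8] );                      # transitions a_kl
my $l = 0;                                               # target state
my $log_e = safe_log(0.5);                               # log e_l(S(i))
my $best = $NEG_INF;
for my $k (0, 1) {
    my $s = $log_v_prev[$k] + safe_log($a[$k][$l]);
    $best = $s if $s > $best;
}
printf "log v_%d(i) = %.6f\n", $l, $log_e + $best;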

slide-146
SLIDE 146

Decoding problem — table v

Appendix 76/82

[Dynamic programming table v: one row per state π1, π2, . . . , πm; one column per position S(1), S(2), S(3), . . . , S(n−1), S(n).]

slide-147
SLIDE 147

Decoding problem

Appendix 77/82

# transition probabilities (t)
$t[0][0] = 0.9; $t[0][1] = 0.1;
$t[1][0] = 0.2; $t[1][1] = 0.8;

# emission probabilities (e)
$e[0][0] = 0.50; $e[0][1] = 0.50;
$e[1][0] = 0.05; $e[1][1] = 0.95;

# observed sequence (S)
@S = (0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1);

# initialization (d is the dynamic programming table)
$d[0][0] = $e[0][$S[0]];
$d[1][0] = $e[1][$S[0]];

slide-148
SLIDE 148

Decoding problem

Appendix 78/82

# fill the table column by column, keeping the best predecessor
for ( $j = 1; $j < @S; $j++ ) {
    for ( $i = 0; $i <= 1; $i++ ) {
        $m = 0;
        for ( $k = 0; $k <= 1; $k++ ) {
            $v = $d[$k][$j-1] * $t[$k][$i] * $e[$i][$S[$j]];
            if ( $v > $m ) {
                $from = $k; $to = $i; $m = $v;
            }
        }
        $d[$i][$j]  = $m;
        $tr[$i][$j] = "($from->$to)";
    }
}

slide-149
SLIDE 149

Decoding problem

Appendix 79/82

# print the table v and the best transitions
for ( $i = 0; $i <= 1; $i++ ) {
    for ( $j = 0; $j < @S; $j++ ) {
        printf "\t%5.5f", $d[$i][$j];
    }
    print "\n";
    for ( $j = 0; $j < @S; $j++ ) {
        printf "\t%s", $tr[$i][$j];
    }
    print "\n";
}

slide-150
SLIDE 150

Decoding problem

Appendix 80/82

t[0][0] = 0.9; t[0][1] = 0.1; t[1][0] = 0.2; t[1][1] = 0.8;
e[0][0] = 0.50; e[0][1] = 0.50; e[1][0] = 0.05; e[1][1] = 0.95;

S:    0       1       0       1       0       1       1       1       1       1       1       1
v[0]: 0.50000 0.22500 0.10125 0.04556 0.02050 0.00923 0.00415 0.00187 0.00084 0.00038 0.00017 0.00008
      (0->0)  (0->0)  (0->0)  (0->0)  (0->0)  (0->0)  (0->0)  (0->0)  (0->0)  (0->0)  (0->0)
v[1]: 0.05000 0.04750 0.00190 0.00962 0.00038 0.00195 0.00148 0.00113 0.00086 0.00065 0.00049 0.00038
      (0->1)  (1->1)  (0->1)  (1->1)  (0->1)  (1->1)  (1->1)  (1->1)  (1->1)  (1->1)  (1->1)

slide-151
SLIDE 151

References

Appendix 81/82

[1] Sean R Eddy. What is a hidden Markov model? Nat Biotechnol, 22(10):1315–6, Oct 2004.
[2] Byung-Jun Yoon. Hidden Markov Models and their applications in biological sequence analysis. Curr Genomics, 10(6):402–15, Sep 2009.
[3] C Burge and S Karlin. Prediction of complete gene structures in human genomic DNA. J Mol Biol, 268(1):78–94, Apr 1997.
[4] Sean R Eddy. What is dynamic programming? Nat Biotechnol, 22(7):909–10, Jul 2004.
[5] Chuong B Do and Serafim Batzoglou. What is the expectation maximization algorithm? Nat Biotechnol, 26(8):897–899, Aug 2008.

slide-152
SLIDE 152

Appendix 82/82

Marcel Turcotte

Marcel.Turcotte@uOttawa.ca School of Electrical Engineering and Computer Science (EECS) University of Ottawa