CSI5126. Algorithms in bioinformatics: Hidden Markov Models - PowerPoint PPT Presentation



SLIDE 1

  • CSI5126. Algorithms in bioinformatics

Hidden Markov Models (continued)

Marcel Turcotte

School of Electrical Engineering and Computer Science (EECS) University of Ottawa

Version October 31, 2018

SLIDE 2

Summary

This module is about Hidden Markov Models.

General objective: describe in your own words Hidden Markov Models; explain the decoding, likelihood, and parameter estimation problems.

SLIDE 3

Reading

  • Pavel A. Pevzner and Phillip Compeau (2018) Bioinformatics Algorithms: An Active Learning Approach. Active Learning Publishers. http://bioinformaticsalgorithms.com (Chapter 10)
  • Yoon, B.-J. (2009) Hidden Markov Models and their Applications in Biological Sequence Analysis. Curr. Genomics 10, 402–415.
  • A. Krogh, R. M. Durbin, and S. Eddy (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.

SLIDE 4

HMM: Applications

  • 1. Gene prediction
  • 2. Pairwise and multiple sequence alignments
  • 3. Protein secondary structure
  • 4. ncRNA identification, structural alignments, folding and annotations
  • 5. Modeling transmembrane proteins

SLIDE 5

Hidden Markov Models (HMM)

“A hidden Markov model (HMM) is a statistical model that can be used to describe the evolution of observable events [symbols] that depend on internal factors [states], which are not directly observable.”

“An HMM consists of two stochastic processes (…)”:

  • an invisible process consisting of states;
  • a visible (observable) process consisting of symbols.

Yoon, B.-J. (2009) Hidden Markov Models and their Applications in Biological Sequence Analysis. Current Genomics 10, 402–415.

SLIDE 6

Definitions

We need to distinguish between the sequence of states (π) and the sequence of symbols (S). The sequence of states, denoted by π and called the path, is modeled as a Markov chain; these transitions are not directly observable (they are hidden):

akl = P(πi = l | πi−1 = k)

where akl is the transition probability from state k to state l. Each state has emission probabilities associated with it:

ek(b) = P(S(i) = b | πi = k)

the probability of observing/emitting the symbol b when in state k.

SLIDE 7

Definitions

The alphabet of emitted symbols, Σ, the set of (hidden) states, Q, the matrix of transition probabilities, A, as well as the emission probabilities, E, are the parameters of an HMM: M = < Σ, Q, A, E >.

SLIDE 8

Remark

A path is modelled as a discrete, time-homogeneous, first-order Markov chain.

  • Memoryless: the probability of being in state j at the next time point depends only on the current state, i;
  • Homogeneity in time: the transition probabilities do not change over time.

SLIDE 9

Interesting questions

  • 1. P(S, π): the joint probability of a sequence of symbols S and a sequence of states π. The decoding problem consists of finding a path π such that P(S, π) is maximum;
  • 2. P(S|θ): the probability of a sequence of symbols S given the model θ. It represents the likelihood that sequence S has been produced by this HMM; let's call this the likelihood problem;
  • 3. Finally, how are the parameters of the model (HMM), θ, determined? Let's call this the parameter estimation problem.

SLIDE 10

Definitions

[Profile HMM topology: Begin state, insert states Ij, match states Mj, delete states Dj, End state]

Joint probability of a sequence of symbols S and a sequence of states π:

P(S, π) = a0π1 ∏_{i=1}^{L} eπi(S(i)) aπiπi+1

For example: P(S = VGPGGAHA, π = BEG, M1, M2, I3, I3, I3, M3, M4, M5, END).

⇒ However, in practice the state sequence π is not known in advance.
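The product above can be sketched in code. A minimal Python illustration (the two-state toy model, its begin-state transition, and the path below are made up for this example, not taken from the slides):

```python
# A sketch of P(S, pi) = a_{0,pi1} * prod_i e_{pi_i}(S(i)) * a_{pi_i, pi_i+1}.

def joint_probability(S, path, a, e, begin=0, end=None):
    """P(S, path) for a symbol sequence S and a state path of the same length."""
    p = a[begin][path[0]]                  # transition out of the begin state
    for i, (sym, state) in enumerate(zip(S, path)):
        p *= e[state][sym]                 # emission e_{pi_i}(S(i))
        nxt = path[i + 1] if i + 1 < len(path) else end
        if nxt is not None:                # transition a_{pi_i, pi_{i+1}}
            p *= a[state][nxt]
    return p

# hypothetical toy model: states 1 and 2, symbols 'H'/'T'
a = {0: {1: 1.0}, 1: {1: 0.9, 2: 0.1}, 2: {1: 0.2, 2: 0.8}}
e = {1: {'H': 0.5, 'T': 0.5}, 2: {'H': 0.25, 'T': 0.75}}
print(joint_probability("HT", (1, 1), a, e))  # 1.0 * 0.5 * 0.9 * 0.5 = 0.225
```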

SLIDE 11

Worked example: the occasionally dishonest player

A simplified example will help in understanding the characteristics of HMMs. I want to play a game: I will be tossing a coin n times. This information can be represented as follows: { H, T, T, H, T, T, … } or { 0, 1, 1, 0, 1, 1, … }.

In fact, I will be using two coins! One is fair, i.e. head and tail are equiprobable outcomes, but the other one is loaded (biased): it returns head with probability 1/4 and tail with probability 3/4.

I will not reveal when I am exchanging the coins. This information is hidden from you. Objective: looking at a series of observations, S, can you predict when the exchanges of coins occurred?

SLIDE 12

Worked example: the occasionally dishonest player

[Two-state HMM diagram: state π1, the fair coin (P(0) = 1/2, P(1) = 1/2), and state π2, the loaded coin (P(0) = 1/4, P(1) = 3/4); transition probabilities a11 = .9, a12 = .1, a21 = .2, a22 = .8]

Such a game can be modeled using an HMM where each state represents a coin, with its own emission probability distribution, and the transition probabilities represent exchanging the coins.
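Such a model can be written down directly as data. A sketch in Python (the transition probabilities 0.9/0.1 and 0.2/0.8 are taken from the diagram; everything else follows the slide):

```python
# The occasionally-dishonest-player HMM as plain Python dictionaries:
# M = <Sigma, Q, A, E>. State 1 is the fair coin, state 2 the loaded coin;
# symbols: 0 = head, 1 = tail.

sigma = [0, 1]                 # alphabet of emitted symbols
states = [1, 2]                # hidden states (the two coins)

# transition probabilities a_kl
A = {1: {1: 0.9, 2: 0.1},
     2: {1: 0.2, 2: 0.8}}

# emission probabilities e_k(b)
E = {1: {0: 0.5, 1: 0.5},      # fair coin
     2: {0: 0.25, 1: 0.75}}    # loaded coin

# sanity check: each distribution sums to 1
for k in states:
    assert abs(sum(A[k].values()) - 1.0) < 1e-9
    assert abs(sum(E[k].values()) - 1.0) < 1e-9
```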

SLIDE 13

Worked example: the occasionally dishonest player

[Two-state HMM diagram: state π1, the fair coin (P(0) = 1/2, P(1) = 1/2), and state π2, the loaded coin (P(0) = 1/4, P(1) = 3/4); transition probabilities a11 = .9, a12 = .1, a21 = .2, a22 = .8]

Given an input sequence of symbols (heads and tails), such as 0, 1, 1, 0, 1, 1, 1, which sequence of states has the highest probability?

SLIDE 14

Worked example: the occasionally dishonest player

S:   0   1   1   0   1   1   1
π:   π1  π1  π1  π1  π1  π1  π1
π:   π1  π1  π1  π1  π1  π1  π2
. . .
π:   π2  π2  π1  π1  π2  π2  π2
. . .
π:   π2  π2  π2  π2  π2  π2  π2

SLIDE 15

Worked example: the occasionally dishonest player (cont.)

Since the game consists of predicting the series of switches from one coin to the other, selecting the path with the highest joint probability, P(S, π), seems appropriate. Here, there are 2^7 = 128 possible paths, so enumerating all of them is feasible. However, the number of states, and consequently the number of possible paths, is generally much larger: O(M^L), where M is the number of states and L is the length of the sequence of symbols.
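For this small instance, the enumeration can be sketched as follows (the uniform start distribution over the two coins is an assumption, not stated on the slide):

```python
# Enumerate all 2^7 = 128 state paths for the 7-symbol observation and pick
# the one maximizing P(S, pi) -- feasible here, but O(M^L) in general.
from itertools import product

a = {1: {1: 0.9, 2: 0.1}, 2: {1: 0.2, 2: 0.8}}    # transitions
e = {1: {0: 0.5, 1: 0.5}, 2: {0: 0.25, 1: 0.75}}  # emissions
start = {1: 0.5, 2: 0.5}                          # assumed uniform start

S = (0, 1, 1, 0, 1, 1, 1)

def joint(path):
    p = start[path[0]] * e[path[0]][S[0]]
    for i in range(1, len(S)):
        p *= a[path[i - 1]][path[i]] * e[path[i]][S[i]]
    return p

paths = list(product((1, 2), repeat=len(S)))
best = max(paths, key=joint)
print(len(paths), best)   # 128 paths; here the all-fair-coin path wins
```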

SLIDE 16

The decoding problem

Given an observed sequence of symbols, S, the decoding problem consists of finding a sequence of states, π, such that the joint probability of S and π is maximum:

argmaxπ P(S, π)

For our game, the sequence of states is of interest because it serves to predict the exchanges of coins.

SLIDE 17

The decoding problem

If the observed sequence of symbols were of length one, the sequence of states would also be of length one (in our restricted example). Which state would you predict if the observed symbol was a 0? What if it was a 1?

Now consider an observed sequence of length two, and let's assume that the last symbol is 1. What is the probability of that symbol being emitted from state π1? There are two ways of ending up in π1 while producing S(2): 1) S(1) could have been produced from π1, and the state remained π1; or 2) S(1) could have been produced from π2, and there was a transition from π2 to π1. The two joint probabilities would be

SLIDE 18

The decoding problem (cont.)

P(S(1)|π1)P(π1 → π1)P(S(2)|π1) and P(S(1)|π2)P(π2 → π1)P(S(2)|π1).

SLIDE 19

The decoding problem

Now consider an observed sequence of length three, and let's assume that the last symbol is 1. What is the probability of that symbol being emitted from state π1? There are two ways of ending up in π1 while producing S(3): 1) the last state that led to the production of the sequence of symbols S[1, 2] was π1, and the state remained π1; or 2) the last state that led to the production of S[1, 2] was π2, and it is followed by a transition from π2 to π1, with probability a21. Let's define vk(i) as the probability of the most probable path ending in state k while producing the observations up to position i. Using this notation, the probabilities for the above two scenarios give:

v1(3) = max [ v1(2) × a11 × e1(S(3)), v2(2) × a21 × e1(S(3)) ]

SLIDE 20

The decoding problem

For our two-state HMM, we can write the following equations:

v1(i) = max [ v1(i−1) × a11 × e1(S(i)), v2(i−1) × a21 × e1(S(i)) ]
v2(i) = max [ v1(i−1) × a12 × e2(S(i)), v2(i−1) × a22 × e2(S(i)) ]

SLIDE 21

The decoding problem

[Trellis diagram: rows for states π1 and π2, columns for the observed symbols 1 1 1 1 1]

SLIDE 22

The decoding problem

The most probable path can be found recursively. The score of the most probable path ending in state l with observation i, noted vl(i), is given by

vl(i) = el(S(i)) max_k [ vk(i−1) akl ]

SLIDE 23

The decoding problem (cont.)

[Diagram: state l receives incoming transitions akl from predecessor states k; vl(i) combines vk(i−1), akl, and el(S(i))]

where k runs over the states for which akl is defined.

SLIDE 24

The decoding problem

The algorithm for solving the decoding problem is known as the Viterbi algorithm. It finds the best (most probable) path using the dynamic programming technique.

Initialization: v0(0) = 1, vk(0) = 0 for k > 0

Recurrence: vl(i) = el(S(i)) max_k ( vk(i−1) akl )

where vk(i) represents the probability of the most probable path ending in state k at position i in S. A (backward) pointer is kept from l to the value of k that maximizes vk(i−1) akl.
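The recurrence and traceback can be sketched in Python (the coin-model parameters and the uniform start distribution are assumptions for illustration; the slides' own implementation is the Perl program shown a few slides later):

```python
# A sketch of the Viterbi algorithm with backpointers.

def viterbi(S, states, start, a, e):
    v = [{k: start[k] * e[k][S[0]] for k in states}]   # initialization
    back = [{}]
    for i in range(1, len(S)):
        v.append({})
        back.append({})
        for l in states:
            # choose the predecessor k maximizing v_k(i-1) * a_kl
            k_best = max(states, key=lambda k: v[i - 1][k] * a[k][l])
            v[i][l] = e[l][S[i]] * v[i - 1][k_best] * a[k_best][l]
            back[i][l] = k_best
    # traceback from the best final state
    last = max(states, key=lambda k: v[-1][k])
    path = [last]
    for i in range(len(S) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return v[-1][last], path[::-1]

a = {1: {1: 0.9, 2: 0.1}, 2: {1: 0.2, 2: 0.8}}
e = {1: {0: 0.5, 1: 0.5}, 2: {0: 0.25, 1: 0.75}}
start = {1: 0.5, 2: 0.5}
prob, path = viterbi((0, 1, 1, 0, 1, 1, 1), (1, 2), start, a, e)
print(path)   # most probable state path
```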

SLIDE 25

The decoding problem (cont.)

⇒ Implementation issue: because products of (small) probabilities lead to underflow, the algorithm is implemented using the logarithms of the values, and the products therefore become sums.
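A sketch of the same recurrence in log space (same assumed toy model as before):

```python
# Log-space Viterbi scores: products of small probabilities become sums of
# logs, avoiding underflow on long sequences.
import math

LOG_ZERO = float("-inf")    # stands in for log(0)

def log_viterbi_scores(S, states, start, a, e):
    log = lambda p: math.log(p) if p > 0 else LOG_ZERO
    v = {k: log(start[k]) + log(e[k][S[0]]) for k in states}
    for sym in S[1:]:
        v = {l: log(e[l][sym]) + max(v[k] + log(a[k][l]) for k in states)
             for l in states}
    return v   # log-probabilities of the best paths ending in each state

a = {1: {1: 0.9, 2: 0.1}, 2: {1: 0.2, 2: 0.8}}
e = {1: {0: 0.5, 1: 0.5}, 2: {0: 0.25, 1: 0.75}}
start = {1: 0.5, 2: 0.5}
v = log_viterbi_scores((0, 1, 1, 0, 1, 1, 1), (1, 2), start, a, e)
print(max(v.values()))     # the log of the Viterbi probability
```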

SLIDE 26

The decoding problem

[Trellis diagram: states π1, π2, …, πm (rows) against observations S(1), S(2), S(3), …, S(n-1), S(n) (columns)]

SLIDE 27

The decoding problem

# transition probabilities (t)
$t[0][0] = 0.9; $t[0][1] = 0.1;
$t[1][0] = 0.2; $t[1][1] = 0.8;

# emission probabilities (e)
$e[0][0] = 0.50; $e[0][1] = 0.50;
$e[1][0] = 0.05; $e[1][1] = 0.95;

# observed sequence (S)
@S = (0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1);

# initialization (d is the dynamic programming table)
$d[ 0 ][ 0 ] = $e[ 0 ][ $S[ 0 ] ];
$d[ 1 ][ 0 ] = $e[ 1 ][ $S[ 0 ] ];

SLIDE 28

The decoding problem

for ( $j=1; $j < @S; $j++ ) {
    for ( $i=0; $i <= 1; $i++ ) {
        $m = 0;
        for ( $k=0; $k <= 1; $k++ ) {
            $v = $d[$k][$j-1] * $t[$k][$i] * $e[$i][$S[$j]];
            if ( $v > $m ) {
                $from = $k; $to = $i; $m = $v;
            }
        }
        $d[ $i ][ $j ] = $m;
        $tr[ $i ][ $j ] = "($from->$to)";
    }
}

SLIDE 29

The decoding problem

for ( $i=0; $i <= 1; $i++ ) {
    for ( $j=0; $j < @S; $j++ ) {
        printf "\t%5.5f", $d[ $i ][ $j ];
    }
    print "\n";
    for ( $j=0; $j < @S; $j++ ) {
        printf "\t %s", $tr[ $i ][ $j ];
    }
    print "\n";
}

SLIDE 30

The decoding problem

Model parameters:

t[0][0] = 0.9; t[0][1] = 0.1; t[1][0] = 0.2; t[1][1] = 0.8;
e[0][0] = 0.50; e[0][1] = 0.50; e[1][0] = 0.05; e[1][1] = 0.95;

Dynamic programming table (first row pair = state 0, second = state 1; each row of values is followed by its backpointers):

0.50000 0.22500 0.10125 0.04556 0.02050 0.00923 0.00415 0.00187 0.00084 0.00038 0.00017 0.00008
        (0->0)  (0->0)  (0->0)  (0->0)  (0->0)  (0->0)  (0->0)  (0->0)  (0->0)  (0->0)  (0->0)
0.05000 0.04750 0.00190 0.00962 0.00038 0.00195 0.00148 0.00113 0.00086 0.00065 0.00049 0.00038
        (0->1)  (1->1)  (0->1)  (1->1)  (0->1)  (1->1)  (1->1)  (1->1)  (1->1)  (1->1)  (1->1)

SLIDE 31

The decoding problem

Given an HMM representing a protein family, as well as an unknown protein sequence, the solution to the decoding problem reveals the internal structure of the unknown sequence, showing the location of the insertions and deletions, core elements, etc.

SLIDE 32

The likelihood problem: calculating P(S|θ)

In the case of a Markov chain, there is a single path for a given sequence S, and therefore P(S|θ) is given by

P(S|θ) = P(S(1)) ∏_{i=2}^{n} aS(i−1)S(i)

In the case of an HMM, there are several paths producing the same S (some paths will be more likely than others), and P(S|θ) should be defined as the sum of the probabilities of all possible paths producing S:

P(S|θ) = Σ_π P(S, π)

The number of paths grows exponentially with respect to the length of the sequence; therefore, the paths cannot simply be enumerated and summed.

SLIDE 33

The likelihood problem: forward algorithm

Modifying the Viterbi algorithm by changing the maximization into a sum calculates the probability of the observed sequence up to position i, ending in state l:

fl(i) = el(S(i)) Σ_k fk(i−1) akl
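The sum-instead-of-max change can be sketched in Python (same assumed coin model and uniform start as before):

```python
# A sketch of the forward algorithm: the Viterbi max is replaced by a sum,
# giving P(S|theta), the total probability over all state paths.

def forward(S, states, start, a, e):
    f = {k: start[k] * e[k][S[0]] for k in states}
    for sym in S[1:]:
        f = {l: e[l][sym] * sum(f[k] * a[k][l] for k in states)
             for l in states}
    return sum(f.values())   # P(S | theta), summed over final states

a = {1: {1: 0.9, 2: 0.1}, 2: {1: 0.2, 2: 0.8}}
e = {1: {0: 0.5, 1: 0.5}, 2: {0: 0.25, 1: 0.75}}
start = {1: 0.5, 2: 0.5}
p = forward((0, 1, 1, 0, 1, 1, 1), (1, 2), start, a, e)
print(p)
```

For this 7-symbol sequence the result equals the brute-force sum over all 128 paths, which is a handy correctness check.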

SLIDE 34

The likelihood problem

The score, noted fl(i), represents the probability of the sequence up to (and including) S(i), and is given by

fl(i) = el(S(i)) Σ_k [ fk(i−1) akl ]

SLIDE 35

The likelihood problem (cont.)

[Diagram: state l receives incoming transitions akl from predecessor states k; fl(i) combines fk(i−1), akl, and el(S(i))]

where k runs over the states for which akl is defined.

SLIDE 36

Forward Algorithm

Can you think of an application for the forward algorithm? Pfam is a large collection of HMMs covering many common protein domains and families, one HMM per domain or family; version 30.0 (June 2016) contains 16306 families. Given a new sequence, the forward algorithm can be used to find the family to which it belongs (if any). ⇒ pfam.xfam.org

SLIDE 37

Model Specifjcation

We now turn to our third and final question: how to determine the parameters of the model? Let x1, …, xm be m independent examples forming the training set (typically, m sequences). The objective is to find a set of parameters, θ, such that

max_θ ∏_{i=1}^{m} P(xi|θ)

SLIDE 38

Model Specifjcation

  • Structure: states + interconnections. (This is an occasion to include domain-specific information!)
  • Estimating the transition/emission probabilities.

SLIDE 39

Modeling the length

[Two example topologies for modeling length: one accepting sequences at least 5 symbols long, the other sequences 2 to 8 symbols long]

SLIDE 40

Arbitrary Deletions

Too expensive, too many parameters to evaluate!

SLIDE 41

Arbitrary Deletions (cont.)

Silent (null) states do not emit symbols. ⇒ Silent states prevent modeling specific distant transitions.

SLIDE 42

Profile HMMs

[Profile HMM topology: Begin state, insert states Ij, match states Mj, delete states Dj, End state]

⇒ Models insertions/deletions separately.

SLIDE 43

Trans-membrane (helical) proteins

SLIDE 44

Trans-membrane (helical) proteins (cont.)

[Figure: TMHMM model diagram, panels A, B, and C]

Figure 1: The structure of the model used in TMHMM. A) The overall layout of the model. Each box corresponds to one or more states. Parts of the model with the same text are tied, i.e. their parameters are the same. "Cyt." means the cytoplasmic side of the membrane and "non-cyt." the other side. B) The state diagram for the parts of the model denoted "helix core" in A. From the last cap state there is a transition to core state number 1. The first three and the last two core states have to be traversed, but all the other core states can be bypassed. This models core regions of lengths from 5 to 25 residues. All core states have tied amino acid probabilities. C) The state structure of globular, loop, and cap regions. In each of the three regions the amino acid probabilities are tied. The three different loop regions are all modelled like this, but they have different parameters in some regions.

www.cbs.dtu.dk/services/TMHMM-2.0

SLIDE 45

Gene prediction

[Figure: eukaryotic gene structure, 5' to 3': flanking region with CAAT, GC, and TATA boxes; transcription initiation; 5'UTR; initiation codon; exons 1, 2, …, n separated by introns with GT…AG splice sites; stop codon; 3'UTR; poly(A) site; flanking region]

SLIDE 46

Gene prediction (cont.)

SLIDE 47

Gene prediction (cont.)

[GENSCAN state diagram: intergenic state (Inter), promoter (Promo), 5'UTR, exon states Einit, E0, E1, E2, Eterm, Esngl, intron states I0, I1, I2, 3'UTR, and PolyA]

⇒ genes.mit.edu/GENSCAN.html

SLIDE 48

The parameter estimation problem

Problem: estimate the akl and ek(b) probabilities. Given:

  • a fixed topology;
  • n independent positive examples: S1, S2, …, Sn.

log P(S1, S2, …, Sn|θ) = Σ_{j=1}^{n} log P(Sj|θ)

Two scenarios:

  • the paths are known (CG islands, secondary structure, gene prediction);
  • the paths are unknown.

SLIDE 49

Parameters estimation/known paths

Maximum likelihood estimators are

akl = Akl / Σ_{l′} Akl′   and   ek(b) = Ek(b) / Σ_{b′} Ek(b′)

where Akl is the observed number of k-to-l transitions and Ek(b) the observed number of emissions of symbol b from state k.

  • Necessitates a large number of positive examples;
  • If a state k is never visited, then both the numerator and the denominator are zero;
  • P(x, π) is a product of probabilities: what happens if a transition/emission probability is zero?

Workaround:

Akl ← Akl + rkl
Ek(b) ← Ek(b) + rk(b)

SLIDE 50

Parameters estimation/known paths (cont.)

where rkl and rk(b) are pseudocounts. The simplest pseudocounts would be rkl = 1 and rk(b) = 1. Better pseudocounts would reflect our prior bias, using the observed frequency of amino acids or values derived from substitution scores.
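The counting-plus-pseudocounts scheme can be sketched as follows (the helper name `estimate` and the tiny annotated example are hypothetical, and uniform pseudocounts r = 1 are used):

```python
# Maximum-likelihood estimation with pseudocounts for the known-path case:
# count transitions and emissions along the annotated paths, with every
# count pre-loaded with a pseudocount r, then normalize.
from collections import defaultdict

def estimate(examples, states, alphabet, r=1.0):
    A = defaultdict(lambda: defaultdict(lambda: r))   # A_kl, starts at r
    E = defaultdict(lambda: defaultdict(lambda: r))   # E_k(b), starts at r
    for S, path in examples:
        for i, (sym, k) in enumerate(zip(S, path)):
            E[k][sym] += 1
            if i + 1 < len(path):
                A[k][path[i + 1]] += 1
    a = {k: {l: A[k][l] / sum(A[k][l2] for l2 in states) for l in states}
         for k in states}
    e = {k: {b: E[k][b] / sum(E[k][b2] for b2 in alphabet) for b in alphabet}
         for k in states}
    return a, e

# hypothetical annotated example: symbols paired with their known coin states
examples = [((0, 1, 1, 1), (1, 1, 2, 2))]
a, e = estimate(examples, states=(1, 2), alphabet=(0, 1))
print(a[1][2])   # (1 observed 1->2 transition + r) / total from state 1
```

Note that transitions never observed still receive probability r over the total, which is exactly the point of the pseudocounts.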

SLIDE 51

Parameters estimation: remarks

Some (emission, transition) probabilities can be zero; if this is the case, then all paths involving those probabilities would have probability zero as well. In particular, this would happen if the number of sequences used to build the model is low: "strong conclusions would be drawn from very little evidence". To circumvent that problem, pseudocounts are added prior to calculating the frequencies. The simplest pseudocounts consist in initializing all the counts to one, rather than zero, before counting the number of occurrences of each event. In the case of the emission probabilities, this amounts to assuming that all amino acids are equiprobable. Since counts don't need to be integers, a solution would be to initialize the counts with a value between zero and one, proportional to the overall distribution of the amino acids.

SLIDE 52

Parameters estimation: remarks

More sophisticated pseudocounts would reflect the distribution of the amino acids at that position. For example, if leucine occurs with a high frequency at a given position, you would expect isoleucine to occur with a high frequency too, but not arginine; in the PAM250 scoring matrix, the score for substituting leucine with isoleucine is 2.80, whilst the score for leucine with arginine is -2.2.


Parameter estimation with unknown paths

It is also possible to estimate the emission and transition probabilities when the paths are unknown. In the case of profile HMMs, the parameters can be estimated starting from a set of unaligned sequences. The details of these methods are complex, but the general idea is as follows. The model is initialized with more or less random values ("more or less" because one can use prior knowledge about the distribution of the amino acids, or a rough sequence alignment, as a starting point). The model is used to align the sequences of the training set, and the alignment is then used to improve the parameters of the model. The "improved" model is used to align the training sequences again; in general, this leads to a slightly improved alignment, which is used again to improve the probabilities of the


Parameter estimation with unknown paths (cont.)

model. The process is repeated until no further improvement of the sequence alignment is observed. This parameter-estimation scheme is called "Expectation-Maximization" (EM); one of the standard algorithms for HMM estimation is the Baum-Welch, or forward-backward, algorithm. One of the main limitations of this technique is that it converges toward a local optimum, i.e. it is not guaranteed to find the most probable model given the observed data.


Expectation-Maximization (EM) algorithm

1. Choose an initial model. If no prior information is available, make all the transition probabilities equiprobable, and similarly for the emission probabilities;

2. Use the decoding algorithm to find the maximum likelihood path for each input sequence;

3. Using these alignments, tally statistics for estimating all a_kl and e_k(b) values;

4. Repeat steps 2 and 3 until the parameter estimates converge.


Summary

Like Markov chains, Hidden Markov Models (HMMs) consist of a finite number of states, π_1, π_2, …, and transition probabilities, P(π_i → π_j). Unlike Markov chains, HMMs also "emit" a symbol (letter) in each state (or most states). Sequence of states: π = π(1), π(2), …; sequence of observed symbols: S = S(1), S(2), … Given a new observation, the sequence of symbols is known (observed) but not the sequence of states: "it is hidden".


Historical note

HMMs were first developed for solving speech recognition problems in the early 1970s.

1. A speech signal is divided into frames of 10 to 20 milliseconds;

2. A process called vector quantization assigns a predefined category to each frame (typically one of 256 predefined categories).

The input is now represented as a long sequence of category labels (symbols). The next task is to recognize words in this long sequence of categories. However, variations are observed (seen as category substitutions, insertions and deletions). HMMs were developed in this context.


Example: CG islands

[From Durbin et al., Biological Sequence Analysis.] Certain regions of the human genome are known as CG (or CpG) islands. These regions, located around the promoters or start regions of many genes, show a higher frequency of CG dinucleotides than elsewhere*. They are a few hundred to a few thousand bases long.

*This is because whenever C is followed by G, the chances that the C will be methylated (adding a CH3 group to its base) are higher; moreover, methylated Cs mutate to T with high frequency, so CG dinucleotides are observed less often than expected by chance, P(C) × P(G); finally, methylation is suppressed in biologically important regions, such as the start of a gene, which explains why CGs occur there more frequently than elsewhere.


Example: CG islands

Problem 1: Given an unlabeled (short) sequence of DNA, can we decide whether it comes from a CG island or not? ⇒ promoter: a site on DNA to which RNA polymerase binds to initiate transcription.


Example: CG islands (cont.)

LOCUS       AL162458   150791 bp   DNA   PRI   29-SEP-2000
DEFINITION  Human DNA sequence from clone RP11-465L10 on chromosome 20.
misc_feature    20670..21997    /note="CpG island"

20641        ...C GCGCGGGTGC CAGGACCCAG GTCCTTGCTA
20701 CGTCCGGAGC CTACGTCACC ACGATGCCTC CCCTGGGCCG GCGGCAGAAC CCGAGACCCC
20761 CGCAGGTTCT AAGACAGCCC CCACGCCCCC CAGTGCGCAC GCTCAGTCCA ACCCCGCCGC
20821 GCACCGCCCA CCGCGAACAT CCGGCTCCTG CGTGTGTGCT CGAGGGGGAA ACTGAGGCGG
20881 GGACGTGCCA GTGAATTCAT TCCTTCCTCA GTCCACCCGC AGGCCTACAA AGCTGTCTCC
20941 CCTTCCTCAG CGCCACAAGG AACAGCAGGG ACGGATGGGA AGAAGGGGAG GGGGCCGAAA
21001 GCAAGCTGGG TGCGAGGAGC CAGCCGACCC TGCCACACTC AAGATGGCGG CGCGGCCGCG
21061 GCGAGGTCCC TCAGAGGCGG TACCAGCGCA TGCGCAGCGC GGAGTCCCGG CCCGGGACAC
21121 AAGATGGCGG CAGCGGCGCT GGGGAGGGCG AGGCGGAGGC GGCAAAACGG GCGGTCGAGC
21181 AGAACGTGTA GCCGCGTCCC CTCCAGTCCG CTCCGGGCAG GTAAGAGTCC CAGGAAGCCA
21241 TGGTCCCGCA GCGAGCCGCG CCAGGGTCTG GGGATCCGAA GCTGGGGGGC GGCGGCCCCT
21301 CCGGCGCTTT CTGCTCGGGA CTGCCGCTTG CCCTGTCTCT GTTGCCGCCG CCATCTTAGA
21361 CCCGCGGGTG GGCGGCCGCG CCGGTGGCCG AAGTGAGGGA GGTGGGCCCG GAGAGCCCCA
21421 GCGGAGCGGG CTCTAGGGCC CCTCCGCTGC TGCCGCCGCC ACCGCCTTTG TGTCGGGCTC
21481 CGACTCTGAG TCGCCTCAGC CCGGGGGCGG GAGCGCGCGG CGGGGCGGGG GGCGGAGCCC
21541 GAGAGATGGG CCGGCGCGCG CGCGCGCGCC AAACAGCCCA CCCTCGCTGG GGTAGGGGGA
21601 GGGGAAGGTG CGCGCGCGCG CGCGCGCTGG AGCTCGCCTC TCGCCTTCGT GCGCCGTCGC
21661 GCCTGCGTAC TTTGTTCGCC CTTTGACTCC TCCCTACTGG GCCGGAGAAT TCTGATTGGT
21721 ACATTGCGGA GATGGTCCCG CCCCACGTGC CTCCAATCCC GGACTCGGAC TCTGGCTTCT
21781 GGTGGGTTTT TCTGGTTGCG CAGATAGAGT TGTTTATCCT TGAGCAGCGG TAATTCTCAA
21841 ACTGCGGTAT GCGTGGGGGT CGGGAAGCCA CAGGATAAAT AAAGACGTTA ACTTAAGAGC
21901 AGTTATGTCT TACTGGGAGC GTACAATGCT GGACTCTACA TATAACGGTC GAGTGATTCC
21961 GGTTTATAAG CCGGAAAGCA GAAGGGCCCG GAATCCG...


Probabilistic model of a sequence

P(x) = P(x_L, x_{L−1}, …, x_1)
     = P(x_L | x_{L−1}, …, x_1) P(x_{L−1} | x_{L−2}, …, x_1) … P(x_1)

(by application of the general multiplication rule). For example, the probability of CGAT:

P(CGAT) = P(T, A, G, C) = P(T | A, G, C) P(A | G, C) P(G | C) P(C)

Why can't this framework be used for modeling CG islands? First and foremost, the model requires estimating a large number of parameters, which in turn requires an exceptionally large number of examples.


Probabilistic model of a sequence

Under the assumption that the positions are independent of one another, P(x_i | x_{i−1}, …, x_1) = P(x_i), so

P(x) = P(x_L | x_{L−1}, …, x_1) P(x_{L−1} | x_{L−2}, …, x_1) … P(x_1)
     = P(x_L) P(x_{L−1}) … P(x_1)

P(CGAT) = P(T | A, G, C) P(A | G, C) P(G | C) P(C) = P(T) P(A) P(G) P(C)

Why can't this framework be used for modeling CG islands? Dinucleotides play a critical role, and the above model ignores the dependencies completely.


Markov Chains

However, under the assumption of an underlying (first-order) Markovian process (memoryless), P(x_i | x_{i−1}, …, x_1) = P(x_i | x_{i−1}), and the previous equation can be rewritten as follows:

P(x) = P(x_L | x_{L−1}, …, x_1) P(x_{L−1} | x_{L−2}, …, x_1) … P(x_1)
     = P(x_L | x_{L−1}) P(x_{L−1} | x_{L−2}) … P(x_2 | x_1) P(x_1)

In the previous example:

P(CGAT) = P(T, A, G, C) = P(T | A) P(A | G) P(G | C) P(C)

This seems to be the right model: the dinucleotide dependencies are represented.
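As a quick check of the formula, the chain probability of CGAT can be computed directly. The sketch below (Python, illustrative) uses an assumed uniform initial distribution, P(x_1) = 0.25, and the CpG-island (+) transition probabilities from the maximum-likelihood tables given later in this lecture.

```python
def markov_chain_prob(seq, p1, a):
    """First-order Markov chain: P(S) = P(S(1)) * product of a[S(i-1)][S(i)]."""
    p = p1[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= a[prev][cur]
    return p

# Transition probabilities a+ needed for CGAT: C->G, G->A, A->T.
a_plus = {
    "C": {"G": 0.274},
    "G": {"A": 0.161},
    "A": {"T": 0.120},
}
p1 = {"C": 0.25}  # assumed uniform initial distribution
p_cgat = markov_chain_prob("CGAT", p1, a_plus)
```

The result is the product of one initial term and three transition terms, exactly as in the equation above.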


Graphical Formalism for Markov Chains


[Figure: a Markov chain for DNA, drawn as a directed graph with four states: A, C, G and T.]


a_st = P(S(i) = t | S(i − 1) = s) ⇒ the transition probabilities, a_st, are associated with the arcs of this graph.


Markov Chains

a_st = P(S(i) = t | S(i − 1) = s) is the probability that symbol t is observed at position i given that s occurs at position i − 1. Therefore, P(S) can now be written as follows:

P(S) = P(S(1)) · ∏_{i=2}^{n} a_{S(i−1)S(i)}

Here the concept of time (involved in the development of the PAM matrices) has been replaced by that of space, with similar observations:

memorylessness: the probability that symbol a occurs at position i depends only on the symbol found at position i − 1, and not on any other i′ < i − 1;

homogeneity of space: the probability that symbol a occurs at position i does not depend on the particular value of i (e.g. i = 123 or i = 162,144).


Markov Chains (contd)

Higher-order Markov models are interesting for modeling DNA sequences, in particular coding regions and their codon structure. A Markov chain of order k is a model where the probability that symbol a occurs at position i depends only on the symbols found at positions i − 1, i − 2, …, i − k, and not on any other i′ < i − k.


Markov Chains (contd)

Markov chains are particularly convenient for two reasons:

P(S(i) | S(i − 1) … S(1)) would be difficult to estimate (do you see why?);
They lead to computationally efficient algorithms.


Markov Chains (contd) (cont.)

[Figure: the four-state Markov chain over A, C, G and T.]


⇒ In the above model a sequence can start and end anywhere.


[Figure: the four-state Markov chain extended with silent Start and Stop states.]


⇒ 1) Allows modeling start/end effects: P(Stop | T) can be different from P(Stop | G); 2) models the distribution of the lengths of the sequences; 3) defines a probability distribution over all possible sequences, of any length (summing to 1); 4) the length distribution decays exponentially.
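Point 4 follows because a sequence of length n must avoid the Stop state n − 1 times before finally entering it: with a constant stop probability, the length distribution is geometric. A small numerical check (Python; the stop probability 0.01 is an assumed value):

```python
def length_prob(n, q_stop):
    """P(length = n) when, after each emitted symbol, the chain enters
    Stop with probability q_stop and continues with probability 1 - q_stop."""
    return (1 - q_stop) ** (n - 1) * q_stop

# The probabilities decay exponentially with n and sum to 1 over all lengths.
probs = [length_prob(n, 0.01) for n in range(1, 10_000)]
```

Summing the probabilities over all lengths (numerically, up to a large cutoff) confirms that this is a proper distribution.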


Methodology

Durbin et al. collected a large number of positive and negative examples of CG islands, almost 60,000 nucleotides in all.
Construct one Markov model for the positive examples and one for the negative examples; this involves estimating the transition probabilities.
To use the models for discrimination, calculate the log-odds ratio.


Maximum Likelihood Estimators

The transition probabilities are estimated from the counts, c⁺_st, of each transition s → t observed in the positive examples:

a⁺_st = c⁺_st / ∑_{t′} c⁺_{st′}

For example, with counts of 816, 902, 1296 and 1776 for the four transitions out of state C:

⇒ a⁺_CA = 816/(816 + 902 + 1296 + 1776) = 0.17
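The same computation in code: a minimal sketch (Python). The counts are the ones shown above; the assignment of the three remaining counts to targets C, G and T is an assumption made for illustration (only the count of 816 for C → A is used in the example).

```python
def ml_transition(counts_from_state):
    """Maximum-likelihood estimate a_st = c_st / sum over t' of c_st'."""
    total = sum(counts_from_state.values())
    return {t: c / total for t, c in counts_from_state.items()}

# Counts of the four transitions out of state C in the positive examples.
a_from_C = ml_transition({"A": 816, "C": 1776, "G": 1296, "T": 902})
```

This reproduces a⁺_CA = 816/4790 ≈ 0.17, and the four estimates sum to one.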


Maximum Likelihood Estimators

+     A      C      G      T
A   0.180  0.274  0.426  0.120
C   0.171  0.368  0.274  0.188
G   0.161  0.339  0.375  0.125
T   0.079  0.355  0.384  0.182

−     A      C      G      T
A   0.300  0.205  0.285  0.210
C   0.322  0.298  0.078  0.302
G   0.248  0.246  0.298  0.208
T   0.177  0.239  0.292  0.292


Maximum Likelihood Estimators (cont.)

[Figure: the four-state chain with the + transition probabilities on its arcs.]

⇒ Markov Model for the positive examples of CG islands.


Maximum Likelihood Estimators (cont.)

[Figure: the four-state chain with the − transition probabilities on its arcs.]

⇒ Markov Model for the negative examples of CG islands.


Discrimination

To test whether a sequence S of length n is likely to be a CG island, the log-odds ratio of the two models is computed:

log [ P(S | Model+) / P(S | Model−) ] = ∑_{i=1}^{n} log ( a⁺_{S(i−1)S(i)} / a⁻_{S(i−1)S(i)} )

which can also be written as

∑_{i=1}^{n} log s(S(i−1), S(i)),  where s(s, t) = a⁺_st / a⁻_st
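The discrimination rule is easy to implement. The sketch below (Python) uses the two transition tables estimated earlier in the lecture; for simplicity it leaves out the initial-state term and sums the log-ratios over the adjacent pairs of the sequence.

```python
from math import log2

# Transition probabilities from the maximum-likelihood tables (+ and - models).
A_PLUS = {
    "A": {"A": 0.180, "C": 0.274, "G": 0.426, "T": 0.120},
    "C": {"A": 0.171, "C": 0.368, "G": 0.274, "T": 0.188},
    "G": {"A": 0.161, "C": 0.339, "G": 0.375, "T": 0.125},
    "T": {"A": 0.079, "C": 0.355, "G": 0.384, "T": 0.182},
}
A_MINUS = {
    "A": {"A": 0.300, "C": 0.205, "G": 0.285, "T": 0.210},
    "C": {"A": 0.322, "C": 0.298, "G": 0.078, "T": 0.302},
    "G": {"A": 0.248, "C": 0.246, "G": 0.298, "T": 0.208},
    "T": {"A": 0.177, "C": 0.239, "G": 0.292, "T": 0.292},
}

def log_odds(seq):
    """Sum of log2(a+_st / a-_st) over all adjacent pairs st of seq.
    Positive scores favour the CG-island (+) model."""
    return sum(log2(A_PLUS[s][t] / A_MINUS[s][t]) for s, t in zip(seq, seq[1:]))
```

A CG-rich string such as CGCG scores positive, while an AT-rich string such as ATAT scores negative.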


Summary

[Figure: the four-state Markov chain over A, C, G and T.]

Each state is associated with a single symbol; the model captures dependencies between adjacent positions; transition probabilities, e.g. a_CT.


Pitfalls

The Markov model for CG islands can only test entire sequences. How can we use it to find CG islands within an entire genome? Sliding a window (say, 100 nt) is unsatisfactory: islands have variable lengths, and a window imposes sharp boundaries. Solution: combine CG+ and CG− into a single model.


Pitfalls (cont.)

[Figure: the combined model, with eight states: A+, C+, G+, T+ and A−, C−, G−, T−.]


Ideas

In this case, there are small probabilities of switching from one model to the other;
This models islands of CG within a sea of non-CG regions, i.e. the probabilities should be chosen such that staying within the same group of states (+ or −) is more probable than switching between the groups;
The model now allows sequences such as … C G G C G G …, with each symbol emitted from either a + or a − state;
There is no one-to-one correspondence between symbols and states (more than one state per symbol);
Looking at a sequence CGGCGG, you do not know which states (+ or −) were used to generate it: the sequence of states is hidden.


CG island (contd)

In the hidden Markov model for CG islands, the emission probabilities are all 1 or 0. The probability that the sequence CGCG is emitted by the path C+, G−, C−, G+ in our model is given by

a_{start,C+} × 1 × a_{C+,G−} × 1 × a_{G−,C−} × 1 × a_{C−,G+} × 1 × a_{G+,end}

In general, the joint probability of an observed sequence S and a state sequence π is

P(S, π) = a_{start,π(1)} ∏_{i=1}^{n} e_{π(i)}(S(i)) a_{π(i),π(i+1)}

In general, the sequence of states, π, is unknown in advance.


CG island (contd) (cont.)

In fact, finding the path π such that P(S, π) is maximum is often the goal. What does it mean? Assume the transition and emission probabilities of the CG-island HMM are known. Given a new, unlabeled sequence S, e.g. S = CGCCG … CGCATG, we do not know which state was used to emit a given symbol: was the first C emitted from C+ or C−? Was the G at the second position emitted from G+ or G−? And so on. Finding the path π such that P(S, π) is maximum tells us the most likely locations of the CG islands: π = C+G+C+C+G+ … C+G+C−A−T−G+


CG island (contd) (cont.)

The most probable path: π⋆ = argmax_π P(S, π)


Viterbi: Dynamic Programming Table

        (start)   C       G       C       G
Start     1       0       0       0       0
A+        0       0       0       0       0
C+        0       0.13    0       0.012   0
G+        0       0       0.034   0       0.0032
T+        0       0       0       0       0
A−        0       0       0       0       0
C−        0       0.13    0       0.0026  0
G−        0       0       0.010   0       0.00021
T−        0       0       0       0       0
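A generic Viterbi decoder that fills such a table can be sketched as follows (Python). The toy two-state model used in the demonstration, with a single + state and a single − state and assumed emission and switching probabilities, is illustrative; it is not the eight-state model above.

```python
def viterbi(seq, states, start, trans, emit):
    """v_k(i) = e_k(S(i)) * max_j [ v_j(i-1) * a_jk ], with traceback.
    Uses raw probabilities; log-probabilities are preferable for long sequences."""
    v = [{k: start[k] * emit[k][seq[0]] for k in states}]
    back = []
    for symbol in seq[1:]:
        row, ptr = {}, {}
        for k in states:
            best = max(states, key=lambda j: v[-1][j] * trans[j][k])
            row[k] = v[-1][best] * trans[best][k] * emit[k][symbol]
            ptr[k] = best
        v.append(row)
        back.append(ptr)
    last = max(states, key=lambda k: v[-1][k])  # best final state
    path = [last]
    for ptr in reversed(back):                  # follow backpointers
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Assumed toy model: "+" prefers C/G, "-" prefers A/T, switching is rare.
states = ["+", "-"]
start = {"+": 0.5, "-": 0.5}
trans = {"+": {"+": 0.9, "-": 0.1}, "-": {"+": 0.1, "-": 0.9}}
emit = {"+": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "-": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}
path = viterbi("CGCGAATT", states, start, trans, emit)
```

For the sequence CGCGAATT, the decoder labels the CG-rich half with + and the AT-rich half with −.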


Viterbi

The "best path", the sequence of states determined by the Viterbi algorithm, identifies the CG islands within a genomic sequence.


Probabilistic Models

Hypothesis: positions are independent of one another.

Pairwise alignment (PAM, BLOSUM, etc.): given two aligned sequences, S′_1 and S′_2, the score of the alignment is given by

∑_{i=1}^{n} s(S′_1(i), S′_2(i))

Position-specific scoring scheme: given a sequence S and a probabilistic model M of a sequence family/motif, represented as a matrix, f, of size 20 × n, where f(a, i) is the probability that amino acid a occurs at position i in this family/motif,

P(S | M) = ∏_{i=1}^{n} f(S(i), i)
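The position-specific model is a one-liner in code. The sketch below (Python) uses a toy 3-column DNA motif instead of a 20 × n amino-acid matrix; the matrix values are invented for illustration.

```python
from math import prod

def profile_prob(seq, f):
    """P(S | M) = product over i of f(S(i), i), where f[i] is the
    probability distribution of symbols at position i of the motif."""
    return prod(f[i][a] for i, a in enumerate(seq))

# Toy motif strongly preferring the consensus ACG (assumed values).
f = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
```

The consensus sequence ACG receives a much higher probability than a non-matching sequence such as TTT.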


Probabilistic Models (cont.)

or a log-odds score: ∑_{i=1}^{n} s(S(i), i), where s(a, i) = log( f(a, i) / q_a ).

Markovian models: Markov chains and hidden Markov models.

Modelling the distribution of the amino acids within gaps, and their lengths.


Pairwise (Multiple) Alignment vs Profiles vs Hidden Markov Models

Pairwise → uniform scoring system along the sequence
Profiles → position-specific scoring scheme
HMMs → insertions/deletions modelled separately + variable topology (including simple grammatical structures)


Applications of MCs and HMMs

bacterial gene finders (MC)
mRNA splicing (MC)
trans-membrane helix prediction
modeling signal peptides
modeling families of aligned proteins
multiple sequence alignment


Pfam

Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains. Version 5.5 (September 2000) comprised 2478 families, with large coverage of known proteins (∼63%). ⇒ www.sanger.ac.uk/Software/Pfam/index.shtml


Web resources

HMMER by Sean Eddy: hmmer.org
SAM from UCSC's Computational Biology Group: www.cse.ucsc.edu/research/compbio/sam.html



Pensez-y! (Think about it!)

Printing these notes is probably not necessary!
