Pair HMMs and Profile HMMs COMPSCI 260 Spring 2016 HMM - PowerPoint PPT Presentation

Pair HMMs and Profile HMMs
COMPSCI 260 – Spring 2016

HMM: An Example

$M_1 = (\{q_0, q_1, q_2\}, \{Y, R\}, P_t, P_e)$

$P_t = \{(q_0, q_1, 1),\ (q_1, q_1, 0.8),\ (q_1, q_2, 0.15),\ (q_1, q_0, 0.05),\ (q_2, q_2, 0.7),\ (q_2, q_1, 0.3)\}$

$P_e = \{(q_1, Y, 1),\ (q_1, R, 0),\ (q_2, Y, 0),\ (q_2, R, 1)\}$

[Figure: state-transition diagram of $M_1$. Edge labels: 100%, 80%, 15%, 5%, 70%, 30%. State $q_0$ emits nothing (Y = 0%, R = 0%), $q_1$ emits Y = 100%, and $q_2$ emits R = 100%.]
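As a small sanity check on the example, the two relations $P_t$ and $P_e$ can be written as lookup tables and each state's outgoing probabilities summed. This is only a sketch: the dictionary layout and the helper name `row_sums` are illustrative choices, not part of the slides.

```python
# Transition and emission triples of the example model M1, copied from the slide.
# Storing them as {(state, target): probability} is an illustrative choice.
P_t = {
    ("q0", "q1"): 1.0,
    ("q1", "q1"): 0.8, ("q1", "q2"): 0.15, ("q1", "q0"): 0.05,
    ("q2", "q2"): 0.7, ("q2", "q1"): 0.3,
}
P_e = {
    ("q1", "Y"): 1.0, ("q1", "R"): 0.0,
    ("q2", "Y"): 0.0, ("q2", "R"): 1.0,
}


def row_sums(table):
    """Sum the probabilities attached to each source state (first key element)."""
    sums = {}
    for (state, _), p in table.items():
        sums[state] = sums.get(state, 0.0) + p
    return sums


# Every state's outgoing transitions, and every emitting state's emissions,
# should sum to 1 (up to floating-point rounding).
transition_sums = row_sums(P_t)
emission_sums = row_sums(P_e)
print(transition_sums)
print(emission_sums)
```

The silent state $q_0$ has no entry in `P_e`, which matches its 0% emissions in the figure.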

Three views of an HMM

Questions we can address with an HMM

INPUT: The HMM model $M$: $Q = \{q_0, q_1, \ldots, q_m\}$; $P_t(q_j \mid q_i)$; $P_e(s_j \mid q_i)$. A sequence $S = x_0 x_1 \ldots x_{L-1}$ and a path $\phi = y_0 y_1 \ldots y_{L+1}$:

S =  CGATATTCGATTCTACGCGCGTATACTAGCTTATCTGATC
φ = 011111112222222111111222211111112222111110

OUTPUT: What is the probability of generating sequence $S$ from path $\phi$ according to the model $M$? $P(S \mid \phi, M)$

$$P(S \mid \phi) = \prod_{i=0}^{L-1} P_e(x_i \mid y_{i+1}) \quad \text{(emission probabilities)}$$

[Figure: state diagram with states $q_0$, $q_1$, $q_2$. Transition labels: 100%, 80%, 15%, 5%, 70%, 30%. Emission tables: A = 25%, C = 25%, G = 25%, T = 25% and A = 10%, C = 40%, G = 10%, T = 40%.]

Questions we can address with an HMM

INPUT: The HMM model $M$: $Q = \{q_0, q_1, \ldots, q_m\}$; $P_t(q_j \mid q_i)$; $P_e(s_j \mid q_i)$. A sequence $S = x_0 x_1 \ldots x_{L-1}$ and a path $\phi = y_0 y_1 \ldots y_{L+1}$:

S =  CGATATTCGATTCTACGCGCGTATACTAGCTTATCTGATC
φ = 011111112222222111111222211111112222111110

OUTPUT: What is the joint probability of sequence $S$ and path $\phi$ according to the model $M$? $P(S, \phi \mid M)$

$$P(S, \phi) = P(S \mid \phi)\, P(\phi)$$

$$P(S \mid \phi) = \prod_{i=0}^{L-1} P_e(x_i \mid y_{i+1}) \quad \text{(emission probabilities)}$$

$$P(\phi) = \prod_{i=0}^{L} P_t(y_{i+1} \mid y_i) \quad \text{(transition probabilities)}$$
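The two products above can be evaluated directly for the sequence and path on the slide. The sketch below works in log space to avoid underflow. The transition values are assumed to be the percentages shown in the figure, and the assignment of the uniform emission table to state 1 (rather than state 2) is an assumption, since the figure itself did not survive extraction.

```python
import math

# P_t[from][to]; values assumed from the figure (state 0 is the silent start/end state).
P_t = {
    0: {1: 1.0},
    1: {1: 0.80, 2: 0.15, 0: 0.05},
    2: {2: 0.70, 1: 0.30},
}
# P_e[state][symbol]; giving the uniform table to state 1 is an assumption.
P_e = {
    1: {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    2: {"A": 0.10, "C": 0.40, "G": 0.10, "T": 0.40},
}

S = "CGATATTCGATTCTACGCGCGTATACTAGCTTATCTGATC"
phi = [int(c) for c in "011111112222222111111222211111112222111110"]


def log_p_seq_given_path(S, phi, P_e):
    """log P(S | phi): symbol x_i is emitted by state y_{i+1}."""
    return sum(math.log(P_e[phi[i + 1]][x]) for i, x in enumerate(S))


def log_p_path(phi, P_t):
    """log P(phi): one transition per consecutive pair of states (L + 1 in total)."""
    return sum(math.log(P_t[a][b]) for a, b in zip(phi, phi[1:]))


def log_joint(S, phi, P_t, P_e):
    """log P(S, phi) = log P(S | phi) + log P(phi)."""
    return log_p_seq_given_path(S, phi, P_e) + log_p_path(phi, P_t)


print(log_joint(S, phi, P_t, P_e))
```

Note that the path has $L + 2 = 42$ entries for the $L = 40$ symbols, because it begins and ends in the silent state $q_0$.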

Questions we can address with an HMM

INPUT: The HMM model $M$: $Q = \{q_0, q_1, \ldots, q_m\}$; $P_t(q_j \mid q_i)$; $P_e(s_j \mid q_i)$. A sequence $S$:

CGATATTCGATTCTACGCGCGTATACTAGCTTATCTGATC

OUTPUT: What is the most probable path for generating sequence $S$ according to the model $M$? ("Decoding")

$$\phi_{\max} = \arg\max_{\phi} P(\phi \mid S, M)$$

[Figure: state diagram with states $q_0$, $q_1$, $q_2$. Transition labels: 100%, 80%, 15%, 5%, 70%, 30%. Emission tables: A = 25%, C = 25%, G = 25%, T = 25% and A = 10%, C = 40%, G = 10%, T = 40%.]

“Decoding” with an HMM – Viterbi decoding

With $S = x_0 x_1 \ldots x_{L-1}$ and $\phi = y_0 y_1 \ldots y_{L+1}$:

$$\phi_{\max} = \arg\max_{\phi} P(\phi \mid S) = \arg\max_{\phi} \frac{P(\phi, S)}{P(S)} = \arg\max_{\phi} P(\phi, S) = \arg\max_{\phi} P(S \mid \phi)\, P(\phi)$$

$$P(S \mid \phi) = \prod_{i=0}^{L-1} P_e(x_i \mid y_{i+1}) \quad \text{(emission probabilities)}$$

$$P(\phi) = \prod_{i=0}^{L} P_t(y_{i+1} \mid y_i) \quad \text{(transition probabilities)}$$

$$\phi_{\max} = \arg\max_{\phi}\; P_t(q_0 \mid y_L) \prod_{i=0}^{L-1} P_e(x_i \mid y_{i+1})\, P_t(y_{i+1} \mid y_i)$$
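The maximization in the last line is usually carried out by dynamic programming rather than by enumerating paths. The following is a minimal log-space sketch of Viterbi decoding for a model with a silent start/end state 0. The transition values are assumed from the figure's percentages, and the assignment of the two emission tables to states 1 and 2 is an assumption; the function and variable names are illustrative.

```python
import math

NEG_INF = float("-inf")

# Assumed model: transition values from the figure, emission-table assignment guessed.
P_t = {0: {1: 1.0}, 1: {1: 0.80, 2: 0.15, 0: 0.05}, 2: {2: 0.70, 1: 0.30}}
P_e = {
    1: {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    2: {"A": 0.10, "C": 0.40, "G": 0.10, "T": 0.40},
}
S = "CGATATTCGATTCTACGCGCGTATACTAGCTTATCTGATC"


def safe_log(p):
    """log(p), with log(0) mapped to -infinity instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(S, P_t, P_e):
    """Return (path, log P(path, S)) for the most probable path; S must be non-empty."""
    states = sorted(P_e)  # emitting states only
    # V[q]: best log joint probability of the prefix read so far, ending in state q.
    V = {q: safe_log(P_t[0].get(q, 0.0)) + safe_log(P_e[q][S[0]]) for q in states}
    backpointers = []
    for x in S[1:]:
        new_V, ptr = {}, {}
        for q in states:
            prev = max(states, key=lambda p: V[p] + safe_log(P_t[p].get(q, 0.0)))
            new_V[q] = V[prev] + safe_log(P_t[prev].get(q, 0.0)) + safe_log(P_e[q][x])
            ptr[q] = prev
        V = new_V
        backpointers.append(ptr)
    # Termination: the final factor P_t(q_0 | y_L) returns to the silent state.
    last = max(states, key=lambda q: V[q] + safe_log(P_t[q].get(0, 0.0)))
    best = V[last] + safe_log(P_t[last].get(0, 0.0))
    # Trace the back-pointers from the last emitting state to the first.
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return [0] + path + [0], best


best_path, best_log_joint = viterbi(S, P_t, P_e)
print("".join(map(str, best_path)), best_log_joint)
```

The running time is $O(L \cdot |Q|^2)$, compared with the $|Q|^L$ paths a brute-force search would have to score.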

“Decoding” with an HMM
• Viterbi gives us two things:
  – The “best” parse: $\phi_{\max} = \arg\max_{\phi} P(\phi \mid S)$
  – The joint probability: $P(\phi_{\max}, S)$
• What if we are interested in the state that generated a particular character?
  $$P(y_k = q_i \mid S) = \frac{P(S, y_k = q_i)}{P(S)}$$
• What if we are interested in the marginal probability of emitting $S$, regardless of the path? $P(S)$
• We can compute these using the Forward and Backward algorithms, and “posterior” decoding

“Decoding” with an HMM - Posterior decoding

“Posterior” decoding:

$$P(y_k = q_i \mid S) = \frac{P(S, y_k = q_i)}{P(S)} = \frac{F(i, k)\, B(i, k)}{P(S)}$$
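A compact sketch of the Forward and Backward recursions, and of the posterior they yield, is given below. It assumes the same three-state DNA model (transition values from the figure, emission-table assignment guessed) and one particular indexing convention: here `F[k][q]` is the probability of emitting the first `k + 1` symbols and being in state `q` at that point, and `B[k][q]` is the probability of emitting the remaining symbols and returning to state 0 given state `q` at that point. The slides may index $F(i, k)$ and $B(i, k)$ differently. No scaling is applied, so very long sequences would underflow.

```python
# Assumed model: transition values from the figure, emission-table assignment guessed.
P_t = {0: {1: 1.0}, 1: {1: 0.80, 2: 0.15, 0: 0.05}, 2: {2: 0.70, 1: 0.30}}
P_e = {
    1: {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    2: {"A": 0.10, "C": 0.40, "G": 0.10, "T": 0.40},
}
S = "CGATATTCGATTCTACGCGCGTATACTAGCTTATCTGATC"


def forward(S, P_t, P_e):
    """Return (F, P(S)), where F[k][q] = P(x_0..x_k, symbol x_k emitted by q)."""
    states = sorted(P_e)
    F = [{q: P_t[0].get(q, 0.0) * P_e[q][S[0]] for q in states}]
    for x in S[1:]:
        prev = F[-1]
        F.append({q: P_e[q][x] * sum(prev[p] * P_t[p].get(q, 0.0) for p in states)
                  for q in states})
    p_S = sum(F[-1][q] * P_t[q].get(0, 0.0) for q in states)
    return F, p_S


def backward(S, P_t, P_e):
    """Return B, where B[k][q] = P(x_{k+1}..x_{L-1}, return to state 0 | x_k emitted by q)."""
    states = sorted(P_e)
    B = [{q: P_t[q].get(0, 0.0) for q in states}]  # entry for the last position
    for x in reversed(S[1:]):
        nxt = B[0]
        B.insert(0, {q: sum(P_t[q].get(r, 0.0) * P_e[r][x] * nxt[r] for r in states)
                     for q in states})
    return B


def posterior(S, P_t, P_e):
    """Return (post, P(S)), where post[k][q] = P(symbol x_k was emitted by q | S)."""
    F, p_S = forward(S, P_t, P_e)
    B = backward(S, P_t, P_e)
    post = [{q: F[k][q] * B[k][q] / p_S for q in F[k]} for k in range(len(S))]
    return post, p_S


post, p_S = posterior(S, P_t, P_e)
print(p_S)
print([max(col, key=col.get) for col in post])  # most probable state at each position
```

Picking the highest-posterior state at each position need not give the same labeling as the Viterbi path, and the resulting sequence of states is not guaranteed to be a valid path through the model.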

Training an HMM with labeled sequences: given $S$, $\phi$

S =  CGATATTCGATTCTACGCGCGTATACTAGCTTATCTGATC
φ = 011111112222222111111222211111112222111110

Transition counts $A_{i,j}$ (from state $i$, to state $j$):

from \ to    0           1           2
0            0 (0%)      1 (100%)    0 (0%)
1            1 (4%)      21 (84%)    3 (12%)
2            0 (0%)      3 (20%)     12 (80%)

$$a_{i,j} = \frac{A_{i,j}}{\sum_{h=0}^{|Q|-1} A_{i,h}}$$

Emission counts $E_{i,k}$ (in state $i$, symbol $k$):

state    A          C          G          T
1        6 (24%)    7 (28%)    5 (20%)    7 (28%)
2        3 (20%)    3 (20%)    2 (13%)    7 (47%)

$$e_{i,k} = \frac{E_{i,k}}{\sum_{h=0}^{|\Sigma|-1} E_{i,h}}$$
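The counts in both tables can be reproduced from the labeled sequence by a single pass over $S$ and $\phi$. The sketch below does this with `collections.Counter`; the function name and the tuple-keyed dictionaries are illustrative choices.

```python
from collections import Counter

S = "CGATATTCGATTCTACGCGCGTATACTAGCTTATCTGATC"
phi = [int(c) for c in "011111112222222111111222211111112222111110"]


def train_labeled(S, phi):
    """Count transitions and emissions along a labeled path, then normalize each row."""
    # A[(i, j)]: number of times state i is followed by state j in the path.
    A = Counter(zip(phi, phi[1:]))
    # E[(i, s)]: number of times state i emits symbol s (x_k is emitted by y_{k+1}).
    E = Counter((phi[k + 1], x) for k, x in enumerate(S))
    row_A, row_E = Counter(), Counter()
    for (i, _), count in A.items():
        row_A[i] += count
    for (i, _), count in E.items():
        row_E[i] += count
    a = {(i, j): count / row_A[i] for (i, j), count in A.items()}
    e = {(i, s): count / row_E[i] for (i, s), count in E.items()}
    return A, E, a, e


A, E, a, e = train_labeled(S, phi)
print(A[(1, 1)], a[(1, 1)])      # transition 1 -> 1: count and estimated probability
print(E[(2, "T")], e[(2, "T")])  # state 2 emitting T: count and estimated probability
```

Pairs that never occur in the training data are simply absent here, which amounts to estimating their probability as 0; in practice pseudocounts are often added to avoid that.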

Training an HMM with unlabeled sequences: only given $S$

INPUT: A set of sequences $S$ generated by the HMM; $Q = \{q_0, q_1, \ldots, q_m\}$

OUTPUT: The parameters of the HMM: $P_t(q_j \mid q_i)$; $P_e(s_j \mid q_i)$

Two Solutions:

1. Viterbi Training: Start with random HMM parameters. Use Viterbi to find the most probable path for each training sequence, and then label the sequence with that path. Use labeled sequence training on the resulting set of sequences and paths. Iterate until the Viterbi paths do not change.
2. Baum-Welch Training: Start with random HMM parameters. Compute the ‘forward’ and ‘backward’ probabilities, as in posterior decoding. Sum over all possible paths (rather than the single most probable one) to estimate expected counts $A_{i,j}$ and $E_{i,k}$; then use the same formulas as for labeled sequence training on these expected counts. Iterate until the change in $P(S \mid M) < \varepsilon$.

HMMs for gene prediction

HMMs and sequence alignments – Pair HMMs

A Pair HMM is an HMM which has two output channels rather than one; each state can emit a symbol into one or the other (or both) channels.

More general Pair HMMs can have many more states, but those states can all be classified as insertion states, deletion states, or match/mismatch states.

HMMs and sequence alignments – Pair HMMs

-------AACGCAGGAGCCTGCAGGTCTGGGCAGCCAGTTAGCGGGCTGCGGGCCCAGGA bosTau2 0 60 + .
CACTCCCAT--------------GGCCCGG--AGCC------CGAGCCGCGCGCCCACAA canFam2 0 60 + .

AGCCCTCGCAGAGCCCTGGGAGAGACAGCCTACAGGACTGGACTTGGGGCAGGGAAACAT bosTau2 60 60 + .
---CCTGGCAGAGCGCCGGGAGCCGCAGCCTCCAGACCCGAGCGCGCAGGCGGCAGAACG canFam2 60 60 + .

TTCAGAGAAAAGATAGGAGATA bosTau2 120 22 + .
CGCGGAG---GGGCGGGCGCCA canFam2 120 22 + .

[Figure: A Simple Pair HMM, with states $q_0$, M, I, D.]

The most probable state path through the Pair HMM determines the optimal alignment.
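To make the link between state paths and alignments concrete, here is a minimal Viterbi sketch for a three-state pair HMM. All parameters are hypothetical (the slides do not give any): `delta` is the probability of opening a gap, `epsilon` of extending one, `p_match` is the probability of emitting each identical pair, and gap states emit a single nucleotide uniformly. State `X` emits a symbol of `x` against a gap and state `Y` a symbol of `y` against a gap; together they play the role of the insertion and deletion states in the figure. The start state is treated like `M`, and no explicit end transition is modeled.

```python
import math


def pair_hmm_align(x, y, delta=0.1, epsilon=0.3, p_match=0.2):
    """Most probable alignment of DNA strings x and y under a hypothetical M/X/Y pair HMM."""
    NEG_INF = float("-inf")
    # 4 identical pairs and 12 mismatched pairs must sum to 1 in the match state.
    p_mismatch = (1.0 - 4 * p_match) / 12
    log_q = math.log(0.25)  # uniform single-symbol emission in the gap states
    log_t = {
        ("M", "M"): math.log(1 - 2 * delta),
        ("M", "X"): math.log(delta), ("M", "Y"): math.log(delta),
        ("X", "X"): math.log(epsilon), ("X", "M"): math.log(1 - epsilon),
        ("Y", "Y"): math.log(epsilon), ("Y", "M"): math.log(1 - epsilon),
    }
    n, m = len(x), len(y)
    # V[s][i][j]: best log probability of aligning x[:i] with y[:j], ending in state s.
    V = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    ptr = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    V["M"][0][0] = 0.0  # the start state behaves like M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                emit = math.log(p_match if x[i - 1] == y[j - 1] else p_mismatch)
                score, prev = max((V[s][i - 1][j - 1] + log_t[s, "M"], s) for s in "MXY")
                V["M"][i][j], ptr["M"][i][j] = score + emit, prev
            if i > 0:
                score, prev = max((V[s][i - 1][j] + log_t[s, "X"], s) for s in "MX")
                V["X"][i][j], ptr["X"][i][j] = score + log_q, prev
            if j > 0:
                score, prev = max((V[s][i][j - 1] + log_t[s, "Y"], s) for s in "MY")
                V["Y"][i][j], ptr["Y"][i][j] = score + log_q, prev
    # Trace back from the best final state to recover the aligned strings.
    state = max("MXY", key=lambda s: V[s][n][m])
    best = V[state][n][m]
    i, j, ax, ay = n, m, [], []
    while i > 0 or j > 0:
        prev = ptr[state][i][j]
        if state == "M":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif state == "X":
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
        state = prev
    return "".join(reversed(ax)), "".join(reversed(ay)), best


aligned_x, aligned_y, log_p = pair_hmm_align("ACGT", "AGT")
print(aligned_x)
print(aligned_y)
```

The recursion has the same shape as affine-gap global alignment, with log transition and emission probabilities taking the place of substitution scores and gap penalties.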

HMMs and sequence alignments – Pair HMMs

Pair HMMs can be used for simultaneous alignment and annotation.

-------AACGCAGGAGCCTGCAGGTCTGGGCAGCCAGTTAGCGGGCTGCGGGCCCAGGA bosTau2 0 60 + .
CACTCCCAT--------------GGCCCGG--AGCC------CGAGCCGCGCGCCCACAA canFam2 0 60 + .

AGCCCTCGCAGAGCCCTGGGAGAGACAGCCTACAGGACTGGACTTGGGGCAGGGAAACAT bosTau2 60 60 + .
---CCTGGCAGAGCGCCGGGAGCCGCAGCCTCCAGACCCGAGCGCGCAGGCGGCAGAACG canFam2 60 60 + .

TTCAGAGAAAAGATAGGAGATA bosTau2 120 22 + .
CGCGGAG---GGGCGGGCGCCA canFam2 120 22 + .

[Figure: A Pair HMM with Functional States, containing two sets of M, I, D states between $q_0$ and $q_0$.]

Generalization: Profile HMMs

Profile HMMs application: Pfam protein domains

Position weight matrices (PWMs) (PSSMs)

PWMs are a special case of an HMM. What are the transition probabilities?

[Figure: state transition diagram, a linear chain of states from $q_0$ to $q_0$, and the corresponding graphical model. Example letters shown: G A T C T C A T T T.]

Profile HMMs for protein families
• Consider the PWM for a conserved segment of a protein family
• The profile consists of the frequencies of amino acids at each position

[Figure: alignment of the conserved segment (surviving letters: R I Y V R) above the chain Begin → M1 → M2 → M3 → M4 → M5 → M6 → End. Frequencies annotated on the match states: P(R) = 2/3, P(L) = 1/3, P(I) = 1, P(Y) = 1, P(L) = 2/3, P(V) = 2/3, P(A) = 1/3, P(R) = 1/3, P(A) = 1/3, P(V) = 1/3, P(R) = 1/3.]

• However, this type of profile does not allow for gaps (insertions/deletions)
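The gap-free profile described above can be sketched as one frequency table per column, with a sequence's probability being the product of its per-position frequencies (each match state moves to the next with probability 1). The three-sequence alignment below is hypothetical: the slide's own alignment did not survive extraction, so these sequences were chosen only so that their column frequencies reproduce the values listed in the figure.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical alignment of a conserved segment (not the slide's original data).
alignment = ["RLIYVA", "RLIYVR", "LAIYRV"]


def build_profile(alignment):
    """Return one {amino acid: frequency} table per alignment column."""
    n = len(alignment)
    return [{aa: Fraction(count, n) for aa, count in Counter(column).items()}
            for column in zip(*alignment)]


def sequence_probability(seq, profile):
    """Probability of seq under the gap-free profile (chain M1 -> ... -> Mn)."""
    if len(seq) != len(profile):
        # Without insert or delete states, only sequences of exactly
        # the profile's length can be generated.
        raise ValueError("sequence length must equal the profile length")
    p = Fraction(1)
    for aa, column in zip(seq, profile):
        p *= column.get(aa, Fraction(0))
    return p


profile = build_profile(alignment)
print(profile[0])                               # frequencies at M1
print(sequence_probability("RLIYVA", profile))
```

The length check makes the final bullet concrete: a sequence with one inserted or deleted residue cannot be scored at all by this model, which is the limitation that profile HMMs address with insert and delete states.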
