HMMs for Pairwise Sequence Alignment
based on Ch. 4 from Biological Sequence Analysis by R. Durbin et al., 1998


0. HMMs for Pairwise Sequence Alignment
   based on Ch. 4 from Biological Sequence Analysis by R. Durbin et al., 1998
   Acknowledgement: M.Sc. student Oana Răţoi
   [Figure: B. subtilis PRPP synthase]

1. PLAN
   1 How to build a pair HMM for pairwise alignment with affine gap penalties,
     starting from the FSA for more complex pairwise alignments introduced in Ch. 2.
   Advantages of using such a full probabilistic model for pairwise alignment:
   2 evaluate the similarity of two sequences independently of any specific alignment,
     by weighting all alternatives, and assess the reliability of the alignment
     obtained by dynamic programming
   3 explore suboptimal alignments by probabilistic sampling
   4 evaluate the posterior probability that x_i is aligned to y_j; define the overall
     accuracy of an alignment; design an algorithm for getting the maximal overall
     accuracy alignment

2. Remember: a simple FSA for global pairwise sequence alignment with affine gap penalties
   [Figure: FSA with a match state M that scores s(x_i, y_j) and advances (+1,+1), and
   two insert states I_x (+1,+0) and I_y (+0,+1); entering an insert state from M costs
   −d, staying in an insert state costs −e]
   Computing the values of the states:
   • Initialisation:
     V^M(0, 0) = 0,  V^X(0, 0) = V^Y(0, 0) = 0
   • Recurrence relations:
     V^M(i, j) = s(x_i, y_j) + max{ V^M(i−1, j−1), V^X(i−1, j−1), V^Y(i−1, j−1) }
     V^X(i, j) = max{ V^M(i−1, j) − d, V^X(i−1, j) − e }
     V^Y(i, j) = max{ V^M(i, j−1) − d, V^Y(i, j−1) − e }
   Example:
   VLSPAD-K
   HL--AESK
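As a concrete reference, here is a minimal Python sketch of this three-state dynamic programme (Gotoh-style affine gaps). The match/mismatch scores, the gap parameters d and e, and the boundary values beyond V(0,0) are illustrative assumptions added so the code runs; the slide itself only gives the (0,0) initialisation. The test sequences are the ungapped versions of the slide's example.

```python
# Minimal sketch of the three-state affine-gap DP from this slide (Gotoh-style).
# Toy match/mismatch scores; boundary values beyond V(0,0) follow the usual
# convention (-inf for unreachable cells) and are added here as an assumption.
import math

def affine_global_align_score(x, y, match=1.0, mismatch=-1.0, d=3.0, e=1.0):
    """Best global alignment score with gap-open penalty d and gap-extend penalty e."""
    n, m = len(x), len(y)
    NEG = -math.inf
    s = lambda a, b: match if a == b else mismatch

    # VM[i][j], VX[i][j], VY[i][j]: best score ending in state M, I_x, I_y.
    VM = [[NEG] * (m + 1) for _ in range(n + 1)]
    VX = [[NEG] * (m + 1) for _ in range(n + 1)]
    VY = [[NEG] * (m + 1) for _ in range(n + 1)]
    VM[0][0] = 0.0
    for i in range(1, n + 1):            # leading run of x against gaps
        VX[i][0] = -d - (i - 1) * e
    for j in range(1, m + 1):            # leading run of y against gaps
        VY[0][j] = -d - (j - 1) * e

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            VM[i][j] = s(x[i-1], y[j-1]) + max(VM[i-1][j-1], VX[i-1][j-1], VY[i-1][j-1])
            VX[i][j] = max(VM[i-1][j] - d, VX[i-1][j] - e)
            VY[i][j] = max(VM[i][j-1] - d, VY[i][j-1] - e)

    return max(VM[n][m], VX[n][m], VY[n][m])

print(affine_global_align_score("VLSPADK", "HLAESK"))
```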

3. 1 How to build a pair HMM? A first, FSA-inspired version
   [Figure: the FSA above recast as a pair HMM. State M emits aligned pairs x_i:y_j with
   probability p_{x_i y_j} (+1,+1); states X and Y emit x_i and y_j against a gap with
   probabilities q_{x_i} and q_{y_j} (+1,+0 and +0,+1). Transitions: M→X = M→Y = δ,
   M→M = 1−2δ, X→X = Y→Y = ε, X→M = Y→M = 1−ε]

4. The parameters
   • Set probabilities for transitions between states:
     – transition from M to an insert state: δ
     – probability of staying in an insert state: ε
   • Set probabilities for emissions of symbols from the states:
     – p_ab for emitting an aligned pair a:b from the M state
     – q_a for emitting symbol a against a gap
   • Set the initial transitions to each state to be the same as those from the M state.
   Remark: The HMM shown on the previous slide generates all pair alignments, except the
   pair of empty sequences. To get a full probabilistic version:
   • Define a new state End, and add a new parameter τ for the probability of the
     transition(s) into the End state.
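One possible way to bundle these parameters in code is sketched below; the numeric values, the DNA alphabet and the uniform background distribution q are illustrative assumptions, not values from the slides.

```python
# A sketch of one way to package the pair-HMM parameters described above;
# numeric values and the uniform background distribution are illustrative assumptions.
from dataclasses import dataclass, field

ALPHABET = "ACGT"

@dataclass
class PairHMMParams:
    delta: float = 0.2          # M -> X or M -> Y (gap open)
    epsilon: float = 0.3        # X -> X or Y -> Y (gap extend)
    tau: float = 0.1            # any emitting state -> End
    # p[a][b]: probability of emitting the aligned pair a:b from M (sums to 1 over all pairs)
    p: dict = field(default_factory=lambda: {
        a: {b: (0.2 if a == b else 0.05 / 3) for b in ALPHABET} for a in ALPHABET
    })
    # q[a]: probability of emitting symbol a against a gap (states X and Y)
    q: dict = field(default_factory=lambda: {a: 0.25 for a in ALPHABET})

params = PairHMMParams()
assert 1 - 2 * params.delta - params.tau > 0   # M self-transition must stay positive
```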

5. Pair HMM: a full probabilistic version
   [Figure: states M, X, Y and End. M emits aligned pairs x_i:y_j with probability
   p_{x_i y_j}, X emits x_i with probability q_{x_i}, Y emits y_j with probability
   q_{y_j}. Transitions: M→X = M→Y = δ, M→M = 1−2δ−τ, X→X = Y→Y = ε, X→M = Y→M = 1−ε−τ,
   and each emitting state goes to End with probability τ]
   In this HMM, the emission of (pairs of) symbols is done when reaching the (non-End)
   states. To simplify the initialisation relations in the pair HMM algorithms (and also
   for symmetry with the End state), we will also add a silent state Begin.

6. Pair HMM: the final version
   [Figure: the same model with a silent Begin state added; Begin has the same outgoing
   transition probabilities as M (1−2δ−τ to M, δ to X, δ to Y, τ to End)]
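As a quick sanity check of the transition structure just described (Begin behaving like M, and the δ, ε, τ labels read off the figure), every row of the transition table should sum to 1. The numeric values below are illustrative assumptions.

```python
# Sanity check: the outgoing transition probabilities of each state in the final model
# (Begin, M, X, Y) must form a proper distribution. Numeric values are illustrative.
delta, epsilon, tau = 0.2, 0.3, 0.1
T = {
    "Begin": {"M": 1 - 2*delta - tau, "X": delta, "Y": delta, "End": tau},
    "M":     {"M": 1 - 2*delta - tau, "X": delta, "Y": delta, "End": tau},
    "X":     {"M": 1 - epsilon - tau, "X": epsilon, "End": tau},
    "Y":     {"M": 1 - epsilon - tau, "Y": epsilon, "End": tau},
}
for state, row in T.items():
    assert abs(sum(row.values()) - 1.0) < 1e-12, state
print("all transition rows sum to 1")
```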

7. How does this pair HMM work?
   • start in the Begin state;
   • pick the next state according to the distribution of transition probabilities
     leaving the current state;
   • pick a symbol pair to be added to the alignment according to the emission
     distribution in the new state;
   • cycle over the previous two steps, while the End state is not reached.
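The generative procedure above can be sketched directly in Python. The parameter values and the DNA alphabet are illustrative assumptions, and Begin is treated as having the same outgoing transitions as M, as stated on slide 4.

```python
# Minimal generative sketch of the procedure above: start in Begin, follow the transition
# distribution, emit according to the state reached, stop at End. Parameters are illustrative.
import random

ALPHABET = "ACGT"
delta, epsilon, tau = 0.2, 0.3, 0.1
p = {(a, b): (0.2 if a == b else 0.05 / 3) for a in ALPHABET for b in ALPHABET}
q = {a: 0.25 for a in ALPHABET}

def sample_alignment(rng=random):
    """Generate one pair alignment as a list of columns (x_char or '-', y_char or '-')."""
    columns, state = [], "M"            # Begin uses the same outgoing transitions as M
    while True:
        if state == "M":
            nxt = rng.choices(["M", "X", "Y", "End"],
                              weights=[1 - 2*delta - tau, delta, delta, tau])[0]
        else:                           # state X or Y
            nxt = rng.choices([state, "M", "End"],
                              weights=[epsilon, 1 - epsilon - tau, tau])[0]
        if nxt == "End":
            return columns
        if nxt == "M":                  # emit an aligned pair x_i:y_j
            columns.append(rng.choices(list(p), weights=list(p.values()))[0])
        elif nxt == "X":                # emit x_i against a gap
            columns.append((rng.choices(ALPHABET, weights=[q[a] for a in ALPHABET])[0], "-"))
        else:                           # emit y_j against a gap
            columns.append(("-", rng.choices(ALPHABET, weights=[q[a] for a in ALPHABET])[0]))
        state = nxt

print(sample_alignment())
```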

8. Remark
   Throughout this chapter and also in the next one (Profile HMMs), the emission
   probabilities, which were denoted b_{ijk} in [Manning & Schütze, 2000], Ch. 9, are
   considered independent of the origin state i. As a consequence, slightly different
   definitions (and notations) will be used for the forward, backward and Viterbi
   probabilities, compared to our HMM slides, which were based on
   [Manning & Schütze, 2000]:
   f_k(i) = P(x_1 … x_i, π_i = k)
   b_k(i) = P(x_{i+1} … x_L | π_i = k)
   v_k(i) = max_{π_1 … π_i} P(x_1 … x_i, π_1 … π_i, π_i = k)
   See also [Durbin et al., 1998], pages 56–59.

9. The Viterbi algorithm for pair HMMs
   • Notation: v^k(i, j) = max_π P(x_1 … x_i, y_1 … y_j, π_ij = k) will correspond to the
     most probable alignment up to the positions i, j, ending in state k.
     Remark: To simplify the equations given below, the Begin state is defined as being
     the M state.
   • Initialisation: v^M(0, 0) = 1, and all other v^•(i, 0) = 0, v^•(0, j) = 0.
     (See the Remark on page 116 in [Borodovsky and Ekisheva, 2006].)
   • Recurrence relations:
     v^M(i, j) = p_{x_i y_j} max{ (1−2δ−τ) v^M(i−1, j−1), (1−ε−τ) v^X(i−1, j−1), (1−ε−τ) v^Y(i−1, j−1) },  for i ≥ 1, j ≥ 1
     v^X(i, j) = q_{x_i} max{ δ v^M(i−1, j), ε v^X(i−1, j) },  for i ≥ 1, j ≥ 0
     v^Y(i, j) = q_{y_j} max{ δ v^M(i, j−1), ε v^Y(i, j−1) },  for i ≥ 0, j ≥ 1
   • Termination: v^E = τ max( v^M(n, m), v^X(n, m), v^Y(n, m) )
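A minimal sketch of this Viterbi recursion, in plain probability space (a real implementation would work in log space to avoid underflow). It reuses the same illustrative parameter values and toy emission probabilities as the parameter sketch above.

```python
# Sketch of the pair-HMM Viterbi recursion on this slide, in probability space.
# Parameter values and the DNA alphabet are illustrative assumptions.
def viterbi_pair_hmm(x, y, delta=0.2, epsilon=0.3, tau=0.1):
    ALPHABET = "ACGT"
    p = {(a, b): (0.2 if a == b else 0.05 / 3) for a in ALPHABET for b in ALPHABET}
    q = {a: 0.25 for a in ALPHABET}
    n, m = len(x), len(y)

    vM = [[0.0] * (m + 1) for _ in range(n + 1)]
    vX = [[0.0] * (m + 1) for _ in range(n + 1)]
    vY = [[0.0] * (m + 1) for _ in range(n + 1)]
    vM[0][0] = 1.0                                    # Begin is identified with M

    for i in range(n + 1):
        for j in range(m + 1):
            if i >= 1 and j >= 1:
                vM[i][j] = p[(x[i-1], y[j-1])] * max(
                    (1 - 2*delta - tau) * vM[i-1][j-1],
                    (1 - epsilon - tau) * vX[i-1][j-1],
                    (1 - epsilon - tau) * vY[i-1][j-1])
            if i >= 1:
                vX[i][j] = q[x[i-1]] * max(delta * vM[i-1][j], epsilon * vX[i-1][j])
            if j >= 1:
                vY[i][j] = q[y[j-1]] * max(delta * vM[i][j-1], epsilon * vY[i][j-1])

    return tau * max(vM[n][m], vX[n][m], vY[n][m])    # v_E

print(viterbi_pair_hmm("GATTACA", "GACTATA"))
```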

10. An independence model for pairwise sequence alignment
   [Figure: Begin → X → silent state → Y → End; X and Y each loop on themselves with
   probability 1−η and move on with probability η, emitting single symbols with
   probabilities q_{x_i} and q_{y_j}]
   • The main states are X and Y, which emit the two sequences in turn, independently of
     each other.
   • There is a silent state between X and Y that does not emit any symbols.
   • The probability of a pair of sequences x and y:
     P(x, y | R) = η (1−η)^n ∏_{i=1}^{n} q_{x_i} · η (1−η)^m ∏_{j=1}^{m} q_{y_j}
                 = η^2 (1−η)^{n+m} ∏_{i=1}^{n} q_{x_i} ∏_{j=1}^{m} q_{y_j}
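For reference, a small sketch of the random-model probability computed in log space; the value of η and the uniform background q are illustrative assumptions.

```python
# Sketch of the random-model probability P(x, y | R) from this slide, in log space.
# eta and the uniform background q are illustrative assumptions.
import math

def log_prob_independence(x, y, eta=0.1, q=None):
    """log P(x,y|R) = 2 log(eta) + (n+m) log(1-eta) + sum_i log q(x_i) + sum_j log q(y_j)."""
    if q is None:
        q = {a: 0.25 for a in "ACGT"}
    n, m = len(x), len(y)
    return (2 * math.log(eta) + (n + m) * math.log(1 - eta)
            + sum(math.log(q[a]) for a in x) + sum(math.log(q[b]) for b in y))

print(log_prob_independence("GATTACA", "GACTATA"))
```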

11. A pair HMM for local alignment
   [Figure: the global pair HMM (M, X, Y) preceded by random states RX_1, RY_1 and
   followed by random states RX_2, RY_2; each random state emits single symbols with
   probability q, loops on itself with probability 1−η and exits with probability η]
   Note: This model is composed of the global model (states M, X and Y) flanked by two
   copies of the independence model (states RX_1, RY_1, RX_2, RY_2), because the
   sequences in the flanking regions are unaligned.

12. 2 The full probability of aligning x and y, summing over all paths
   Problem: When two sequences have a weak similarity, it is hard to identify the
   correct (biologically meaningful) alignment.
   Alternative: Using the pair HMM forward algorithm, we can calculate the probability
   that a given pair of sequences are related by any alignment:
   P(x, y) = Σ_{alignments π} P(x, y, π)
   Note: If there is an unambiguous best/Viterbi alignment π*, almost all of the
   probability P(x, y) will be contributed by P(x, y, π*). But when there are many
   comparable alternative alignments or alternative variations, the probability P(x, y)
   can be significantly different from P(x, y, π*).

13. Algorithm: Forward calculation for pair HMMs
   • Notation: f^k(i, j) is the combined probability of all alignments up to the
     positions i, j, ending in state k:
     f^k(i, j) = P(x_1 … x_i, y_1 … y_j, π_ij = k)
   • Initialisation: f^M(0, 0) = 1 and all other f^•(i, 0) = 0, f^•(0, j) = 0.
   • Recursion relations:
     f^M(i, j) = p_{x_i y_j} [ (1−2δ−τ) f^M(i−1, j−1) + (1−ε−τ)( f^X(i−1, j−1) + f^Y(i−1, j−1) ) ],  for i ≥ 1, j ≥ 1
     f^X(i, j) = q_{x_i} [ δ f^M(i−1, j) + ε f^X(i−1, j) ],  for i ≥ 1, j ≥ 0
     f^Y(i, j) = q_{y_j} [ δ f^M(i, j−1) + ε f^Y(i, j−1) ],  for i ≥ 0, j ≥ 1
   • Termination: f^E(n, m) = τ ( f^M(n, m) + f^X(n, m) + f^Y(n, m) )
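A sketch of this forward recursion, mirroring the Viterbi sketch above with max replaced by a sum; parameters are again illustrative, and a practical implementation would use log space or scaling to avoid underflow on long sequences.

```python
# Sketch of the pair-HMM forward recursion on this slide: same shape as the Viterbi
# sketch, with max replaced by a sum. Parameters are illustrative assumptions.
def forward_pair_hmm(x, y, delta=0.2, epsilon=0.3, tau=0.1):
    ALPHABET = "ACGT"
    p = {(a, b): (0.2 if a == b else 0.05 / 3) for a in ALPHABET for b in ALPHABET}
    q = {a: 0.25 for a in ALPHABET}
    n, m = len(x), len(y)

    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]
    fM[0][0] = 1.0

    for i in range(n + 1):
        for j in range(m + 1):
            if i >= 1 and j >= 1:
                fM[i][j] = p[(x[i-1], y[j-1])] * (
                    (1 - 2*delta - tau) * fM[i-1][j-1]
                    + (1 - epsilon - tau) * (fX[i-1][j-1] + fY[i-1][j-1]))
            if i >= 1:
                fX[i][j] = q[x[i-1]] * (delta * fM[i-1][j] + epsilon * fX[i-1][j])
            if j >= 1:
                fY[i][j] = q[y[j-1]] * (delta * fM[i][j-1] + epsilon * fY[i][j-1])

    return tau * (fM[n][m] + fX[n][m] + fY[n][m])     # f_E(n, m) = P(x, y)

print(forward_pair_hmm("GATTACA", "GACTATA"))
```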

14. Note
   The log-odds ratio of the full probability P(x, y) = f^E(n, m) to the independence
   model probability P(x, y | R) is a measure of the likelihood that the two sequences
   are related to each other by some unspecified alignment, as opposed to being
   unrelated.
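Spelled out with the quantities from slides 10 and 13 (this explicit formula is an addition for reference, following the standard log-odds definition):

```latex
\mathrm{score}(x, y)
  \;=\; \log \frac{P(x, y)}{P(x, y \mid R)}
  \;=\; \log \frac{f^{E}(n, m)}
               {\eta^{2}\,(1-\eta)^{n+m}\,\prod_{i=1}^{n} q_{x_i}\,\prod_{j=1}^{m} q_{y_j}}
```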

15. The posterior probability P(π | x, y) over alignments π, given x and y
   P(π | x, y) = P(x, y, π) / P(x, y)
   In particular, for the Viterbi path:
   P(π* | x, y) = P(x, y, π*) / P(x, y) = v^E(n, m) / f^E(n, m)
   Note: As already mentioned, this probability can be very small. For example, for the
   alignment of two highly diverged proteins shown below (the human alpha globin and the
   leghaemoglobin from yellow lupin), this probability is 4.6 × 10^−6.
   HBA_HUMAN   GSAQVKGHGKKVADALTNAVAHV---D--DMPNALSALSDLHAHKL
               ++ ++++H+ KV + +A ++ +L+ L+++H+ K
   LGB2_LUPLU  NNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKNLGSVHVSKG

16. 3 Suboptimal alignment by probabilistic sampling
   Idea (how to generate a sample alignment): while tracing back through the matrix of
   f^k(i, j) values, instead of taking the highest choice at each step, make a
   probabilistic choice based on the relative strengths of the three components.
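One way to realise this sampling is sketched below, under the same illustrative parameters as before: fill the forward matrices of slide 13, then trace back from End, choosing each predecessor with probability proportional to its contribution to the current cell.

```python
# Sketch of the probabilistic traceback described above: forward fill, then a stochastic
# walk back from End, picking each predecessor in proportion to its contribution.
# Parameter values and the DNA alphabet are illustrative assumptions.
import random

def sample_alignment_from_forward(x, y, delta=0.2, epsilon=0.3, tau=0.1, rng=random):
    ALPHABET = "ACGT"
    p = {(a, b): (0.2 if a == b else 0.05 / 3) for a in ALPHABET for b in ALPHABET}
    q = {a: 0.25 for a in ALPHABET}
    n, m = len(x), len(y)

    # Forward fill (same recursion as on slide 13).
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]
    fM[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i >= 1 and j >= 1:
                fM[i][j] = p[(x[i-1], y[j-1])] * ((1 - 2*delta - tau) * fM[i-1][j-1]
                           + (1 - epsilon - tau) * (fX[i-1][j-1] + fY[i-1][j-1]))
            if i >= 1:
                fX[i][j] = q[x[i-1]] * (delta * fM[i-1][j] + epsilon * fX[i-1][j])
            if j >= 1:
                fY[i][j] = q[y[j-1]] * (delta * fM[i][j-1] + epsilon * fY[i][j-1])

    def pick(options):                  # options: list of (state_label, weight)
        labels, weights = zip(*options)
        return rng.choices(labels, weights=weights)[0]

    # Traceback: step back from End, sampling predecessors by relative contribution.
    i, j = n, m
    state = pick([("M", fM[i][j]), ("X", fX[i][j]), ("Y", fY[i][j])])
    columns = []
    while (i, j) != (0, 0):
        if state == "M":
            columns.append((x[i-1], y[j-1]))
            prev = pick([("M", (1 - 2*delta - tau) * fM[i-1][j-1]),
                         ("X", (1 - epsilon - tau) * fX[i-1][j-1]),
                         ("Y", (1 - epsilon - tau) * fY[i-1][j-1])])
            i, j = i - 1, j - 1
        elif state == "X":
            columns.append((x[i-1], "-"))
            prev = pick([("M", delta * fM[i-1][j]), ("X", epsilon * fX[i-1][j])])
            i -= 1
        else:
            columns.append(("-", y[j-1]))
            prev = pick([("M", delta * fM[i][j-1]), ("Y", epsilon * fY[i][j-1])])
            j -= 1
        state = prev
    return list(reversed(columns))

print(sample_alignment_from_forward("GATTACA", "GACTATA"))
```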
