Pairwise Sequence Alignment: Dynamic Programming Algorithms

COMP 571
Luay Nakhleh, Rice University
SequenceAlignment-PairwiseDP - January 7, 2017

DP Algorithms for Pairwise Alignment
• The number of all possible pairwise alignments (if gaps are allowed) is exponential in the length of the sequences.
• Therefore, the approach of “score every possible alignment and choose the best” is infeasible in practice.
• Efficient algorithms for pairwise alignment have been devised using dynamic programming (DP).
• The key property of DP is that the problem can be divided into many smaller parts, and the solution can be obtained from the solutions to these smaller parts.
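To get a feel for the growth in the number of alignments, here is a minimal Python sketch that counts them. It assumes an alignment is a sequence of columns, each being either a residue pair, a residue of x against a gap, or a residue of y against a gap, and that two alignments differing only in the order of adjacent gap columns are counted separately. The function name `count_alignments` is illustrative and not part of the slides.

```python
def count_alignments(m, n):
    """Count gapped alignments of two sequences of lengths m and n.

    Assumption: every column is a residue pair, a residue of x against a
    gap, or a residue of y against a gap (columns of two gaps are excluded).
    """
    # D[i][j] = number of alignments of the first i residues of x
    # with the first j residues of y.
    D = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # The last column is a pair, a gap in y, or a gap in x.
            D[i][j] = D[i - 1][j - 1] + D[i - 1][j] + D[i][j - 1]
    return D[m][n]


if __name__ == "__main__":
    for length in range(1, 11):
        print(length, count_alignments(length, length))
```

Under this counting convention, two sequences of length 3 already admit 63 alignments, and the count keeps multiplying as the sequences get longer, which is why exhaustive scoring is not an option.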

The Needleman-Wunsch Algorithm for Global Pairwise Alignment
• The problem is to align two sequences x ($x_1 x_2 \ldots x_m$) and y ($y_1 y_2 \ldots y_n$), finding the best-scoring alignment in which all residues of both sequences are included.
• The score is assumed to be a measure of similarity, so the highest score is desired.
• Alternatively, the score could be an evolutionary distance, in which case the smallest score would be sought, and all uses of “max” in the algorithm would be replaced by “min”.
• The key concept in all these algorithms is the matrix S of optimal scores of subsequence alignments.
• The matrix has (m+1) rows labeled 0 → m and (n+1) columns labeled 0 → n.
• The rows correspond to the residues of sequence x, and the columns correspond to the residues of sequence y.
• We’ll use as a working example the two sequences x = THISLINE and y = ISALIGNED, with the BLOSUM-62 substitution matrix as the scoring matrix and a linear gap penalty g = E.
• The optimal alignment of these two sequences is

T H I S - L I - N E -
- - I S A L I G N E D

The Matrix S
• $S_{i,j}$ stores the score for the optimal alignment of all residues up to $x_i$ of sequence x with all residues up to $y_j$ of sequence y.
• The first column and first row hold pure gap scores: $S_{i,0} = S_{0,i} = ig$.
• To complete the matrix, we use the formula

$$S_{i,j} = \max \begin{cases} S_{i-1,j-1} + s(x_i, y_j) \\ S_{i-1,j} + g \\ S_{i,j-1} + g \end{cases}$$

• The global alignment can be obtained by traceback.

[Figure: the matrix S for the working example with E = -8]
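The fill and traceback steps described above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name `needleman_wunsch` is hypothetical, and the scoring function `toy_score` (+1 match, -1 mismatch) is an assumed stand-in for BLOSUM-62, which is too large to reproduce here. When several predecessors tie, this sketch arbitrarily prefers the diagonal move.

```python
def needleman_wunsch(x, y, score, g):
    """Global alignment with a linear gap penalty g (a negative number).

    Returns (optimal score, aligned x, aligned y).
    """
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = i * g          # x_1..x_i aligned against gaps only
    for j in range(1, n + 1):
        S[0][j] = j * g          # y_1..y_j aligned against gaps only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                S[i - 1][j] + g,
                S[i][j - 1] + g,
            )

    # Traceback from S[m][n] to S[0][0], re-deriving which move was used.
    ax, ay = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1])
            i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + g:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return S[m][n], "".join(reversed(ax)), "".join(reversed(ay))


def toy_score(a, b):
    """Assumed toy substitution score, NOT BLOSUM-62."""
    return 1 if a == b else -1


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT", toy_score, -2))
```

Plugging in a real substitution matrix only requires replacing `toy_score` with a lookup into that matrix.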

• The alignment we obtained is not the one we expected, in that it contains no gaps (the gap at the end is added only because the two sequences are of different lengths).
• The “problem” is that the worst score in the BLOSUM-62 matrix is -4, which is significantly less severe than the gap penalty (-8), so gaps are unlikely to be present in the optimal alignment.
• If instead we use a linear gap penalty of -4, inserting a gap becomes less severe, and a gapped alignment is more likely to be obtained.
• Therefore, it is very important to match the gap penalty to the substitution matrix used.

[Figure: the matrix S for the working example with E = -4; the global alignment is again obtained by traceback]
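The effect of the gap penalty can be checked with a little arithmetic. The sketch below scores two candidate alignments of the same pair of sequences under both gap penalties. The sequences ABCD and ABDC and the scoring function `toy_score` (+4 match, -4 mismatch, chosen so that the worst substitution is -4 as in BLOSUM-62) are assumptions made for illustration, as is the helper name `alignment_score`.

```python
def alignment_score(ax, ay, score, g):
    """Score a given alignment column by column.

    Assumes ax and ay have equal length and no column holds two gaps.
    """
    total = 0
    for a, b in zip(ax, ay):
        total += g if "-" in (a, b) else score(a, b)
    return total


def toy_score(a, b):
    """Assumed toy substitution score with worst value -4 (not BLOSUM-62)."""
    return 4 if a == b else -4


if __name__ == "__main__":
    ungapped = ("ABCD", "ABDC")     # two matches, two mismatches
    gapped = ("ABCD-", "AB-DC")     # three matches, two gap columns
    for g in (-8, -4):
        print(
            "g =", g,
            "ungapped:", alignment_score(*ungapped, toy_score, g),
            "gapped:", alignment_score(*gapped, toy_score, g),
        )
```

With g = -8 the ungapped candidate scores 0 against -4 for the gapped one, so the two mismatches are cheaper than the two gaps. With g = -4 the gapped candidate scores 4 and wins.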

Multiple Optimal Alignments
• There may be more than one optimal alignment.
• During traceback, this is indicated by encountering an element that was derived from more than one of the three possible alternatives.
• The algorithm does not distinguish between these possible alignments, although there may be reasons (such as knowledge of molecular structure or function) for preferring one to the others.
• Most programs will arbitrarily report just a single alignment.

General Gap Penalty
• The algorithm we presented works with a linear gap penalty of the form $g(n_{gap}) = -n_{gap} E$ (in this form E is taken as a positive cost per gap position, i.e. the magnitude of the penalty quoted earlier).
• For a general gap penalty model $g(n_{gap})$, one must consider the possibility of arriving at $S_{i,j}$ directly via insertion of a gap of length up to i in sequence x or j in sequence y.
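One way to see how many optimal alignments exist, for the linear gap penalty case, is to carry a count alongside each matrix element: every alternative that attains the maximum contributes the number of optimal paths leading to it. The sketch below does this. The function name `count_optimal_alignments` and the scoring function `toy_score` are illustrative assumptions, not part of the slides.

```python
def count_optimal_alignments(x, y, score, g):
    """Return (optimal global score, number of optimal alignments)
    for a linear gap penalty g (a negative number)."""
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    C = [[1] * (n + 1) for _ in range(m + 1)]   # boundaries have one path
    for i in range(1, m + 1):
        S[i][0] = i * g
    for j in range(1, n + 1):
        S[0][j] = j * g
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            candidates = [
                (S[i - 1][j - 1] + score(x[i - 1], y[j - 1]), C[i - 1][j - 1]),
                (S[i - 1][j] + g, C[i - 1][j]),
                (S[i][j - 1] + g, C[i][j - 1]),
            ]
            best = max(value for value, _ in candidates)
            S[i][j] = best
            # A tie means the traceback could branch here.
            C[i][j] = sum(count for value, count in candidates if value == best)
    return S[m][n], C[m][n]


def toy_score(a, b):
    """Assumed toy substitution score, NOT BLOSUM-62."""
    return 1 if a == b else -1


if __name__ == "__main__":
    print(count_optimal_alignments("AA", "A", toy_score, -2))
```

For x = AA and y = A the single A of y can pair with either residue of x at the same cost, so the sketch reports two optimal alignments with score -1.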

• The algorithm now has to be modified to

$$S_{i,j} = \max \begin{cases} S_{i-1,j-1} + s(x_i, y_j) \\ \max_{1 \le n_{gap1} \le i} \left( S_{i-n_{gap1},j} + g(n_{gap1}) \right) \\ \max_{1 \le n_{gap2} \le j} \left( S_{i,j-n_{gap2}} + g(n_{gap2}) \right) \end{cases}$$

• This algorithm takes time that is proportional to $mn^2$, where m and n are the sequence lengths with n > m.

Affine Gap Penalty
• For an affine gap penalty $g(n_{gap}) = -I - (n_{gap} - 1)E$, we can refine this algorithm to obtain an O(mn) algorithm.
• Define two matrices

$$V_{i,j} = \max_{1 \le n_{gap1} \le i} \left\{ S_{i-n_{gap1},j} + g(n_{gap1}) \right\}$$

$$W_{i,j} = \max_{1 \le n_{gap2} \le j} \left\{ S_{i,j-n_{gap2}} + g(n_{gap2}) \right\}$$

• These matrices can be defined recursively as

$$V_{i,j} = \max \begin{cases} S_{i-1,j} - I \\ V_{i-1,j} - E \end{cases} \qquad W_{i,j} = \max \begin{cases} S_{i,j-1} - I \\ W_{i,j-1} - E \end{cases}$$

• Now, the matrix S can be written as

$$S_{i,j} = \max \begin{cases} S_{i-1,j-1} + s(x_i, y_j) \\ V_{i,j} \\ W_{i,j} \end{cases}$$
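To make the relationship between the two recurrences concrete, the following Python sketch computes the optimal global score both ways: once with the general gap recurrence, and once with the V and W matrices for the affine case. The function names, the parameter names `gap_open` and `gap_extend` (standing for I and E, both taken as positive costs), and the scoring function `toy_score` are assumptions made for illustration. Boundary cells are filled with the same recurrences, using minus infinity wherever a predecessor does not exist.

```python
NEG_INF = float("-inf")


def general_gap_score(x, y, score, gap):
    """Optimal global score for an arbitrary gap function gap(length)."""
    m, n = len(x), len(y)
    S = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    S[0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            best = NEG_INF
            if i > 0 and j > 0:
                best = S[i - 1][j - 1] + score(x[i - 1], y[j - 1])
            for k in range(1, i + 1):      # gap spanning k residues of x
                best = max(best, S[i - k][j] + gap(k))
            for k in range(1, j + 1):      # gap spanning k residues of y
                best = max(best, S[i][j - k] + gap(k))
            S[i][j] = best
    return S[m][n]


def affine_gap_score(x, y, score, gap_open, gap_extend):
    """Optimal global score for g(n) = -gap_open - (n - 1) * gap_extend."""
    m, n = len(x), len(y)
    S = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    V = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    W = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    S[0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            if i > 0:   # open a new gap or extend the one ending at (i-1, j)
                V[i][j] = max(S[i - 1][j] - gap_open, V[i - 1][j] - gap_extend)
            if j > 0:
                W[i][j] = max(S[i][j - 1] - gap_open, W[i][j - 1] - gap_extend)
            diag = NEG_INF
            if i > 0 and j > 0:
                diag = S[i - 1][j - 1] + score(x[i - 1], y[j - 1])
            S[i][j] = max(diag, V[i][j], W[i][j])
    return S[m][n]


def toy_score(a, b):
    """Assumed toy substitution score, NOT BLOSUM-62."""
    return 2 if a == b else -1


if __name__ == "__main__":
    I, E = 3, 1
    print(affine_gap_score("ACGT", "AT", toy_score, I, E))
    print(general_gap_score("ACGT", "AT", toy_score, lambda k: -I - (k - 1) * E))
```

The two functions should agree on every input, because extending a gap by one position changes g by exactly -E; the affine version simply avoids the inner loops over gap lengths.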

Local Pairwise Alignment
• As mentioned before, sometimes local alignment is more appropriate (e.g., aligning two proteins that have just one domain in common).
• The algorithm for local alignment (the Smith-Waterman algorithm) differs from the one for global alignment in two ways:
  • Whenever the score of the optimal sub-alignment is less than zero, it is rejected (the matrix element is set to 0).
  • Traceback starts from the highest-scoring matrix element.

$$S_{i,j} = \max \begin{cases} S_{i-1,j-1} + s(x_i, y_j) \\ \max_{1 \le n_{gap1} \le i} \left( S_{i-n_{gap1},j} + g(n_{gap1}) \right) \\ \max_{1 \le n_{gap2} \le j} \left( S_{i,j-n_{gap2}} + g(n_{gap2}) \right) \\ 0 \end{cases}$$

[Figure: local alignment matrix for the working example with E = -8]
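The two differences listed above can be sketched in Python as follows. To keep the example short it assumes a linear gap penalty g rather than the general gap function in the formula, and it uses an assumed scoring function `toy_score` instead of BLOSUM-62. The traceback stops when it reaches an element equal to 0, which is the usual convention for Smith-Waterman. The function name `smith_waterman` is illustrative.

```python
def smith_waterman(x, y, score, g):
    """Local alignment with a linear gap penalty g (a negative number).

    Returns (best local score, aligned part of x, aligned part of y).
    """
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]   # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                0,                                            # reject negative sub-alignments
                S[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                S[i - 1][j] + g,
                S[i][j - 1] + g,
            )
            if S[i][j] > best:
                best, best_pos = S[i][j], (i, j)

    # Traceback starts at the highest-scoring element and ends at a zero.
    ax, ay = [], []
    i, j = best_pos
    while S[i][j] > 0:
        if S[i][j] == S[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1])
            i -= 1; j -= 1
        elif S[i][j] == S[i - 1][j] + g:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


def toy_score(a, b):
    """Assumed toy substitution score, NOT BLOSUM-62."""
    return 2 if a == b else -1


if __name__ == "__main__":
    print(smith_waterman("XXACGTXX", "YYACGTYY", toy_score, -2))
```

In this toy run only the shared ACGT segment is reported, since extending the alignment into the flanking X and Y residues would lower the score.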

[Figure: local alignment matrix for the working example with E = -4]

Linear Space Alignment
• Is there a linear-space algorithm for the problem?
• If only the maximal score is needed, the problem is simple.
• But even if the alignment itself is needed, there is a linear-space algorithm (originally due to Hirschberg 1975, and introduced into computational biology by Myers and Miller 1988).
• Main observation:

$$S_{m,n} = \max_{0 \le k \le n} \left\{ S_{m/2,k} + S^r_{m/2,n-k} \right\}$$

where $S^r_{i,j}$ is the score of the best alignment of the last i residues of x with the last j residues of y.
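Both points can be illustrated with a short Python sketch, again under the assumption of a linear gap penalty g and an assumed scoring function `toy_score`. The helper `last_row` keeps only two rows of S at a time, which is all that is needed when only the score is wanted. The helper `split_scores` evaluates the right-hand side of the main observation for every k by running `last_row` on the first half of x and on the reversed second half. When m is odd, the reverse pass covers the remaining m - ⌊m/2⌋ residues. Both function names are illustrative.

```python
def last_row(x, y, score, g):
    """Last row of the global score matrix S, using O(len(y)) space."""
    prev = [j * g for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * g] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = max(
                prev[j - 1] + score(x[i - 1], y[j - 1]),
                prev[j] + g,
                cur[j - 1] + g,
            )
        prev = cur
    return prev


def split_scores(x, y, score, g):
    """For each k, the score of the best alignment forced to pair the
    first half of x with y[:k] and the second half of x with y[k:]."""
    mid = len(x) // 2
    n = len(y)
    forward = last_row(x[:mid], y, score, g)                 # S_{m/2, k}
    # Reversing both strings turns suffix alignments into prefix alignments.
    backward = last_row(x[mid:][::-1], y[::-1], score, g)    # S^r_{m - m/2, n - k}
    return [forward[k] + backward[n - k] for k in range(n + 1)]


def toy_score(a, b):
    """Assumed toy substitution score, NOT BLOSUM-62."""
    return 1 if a == b else -1


if __name__ == "__main__":
    x, y, g = "THISLINE", "ISALIGNED", -2
    print(last_row(x, y, toy_score, g)[-1])
    print(max(split_scores(x, y, toy_score, g)))
```

The two printed values should coincide, since the best split point k recovers the full optimal score.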

• Compute and save row m/2 of S.
• Compute and save row m/2 of $S^r$.
• Find k* that satisfies $S_{m/2,k^*} + S^r_{m/2,n-k^*} = S_{m,n}$.
• Recurse.
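The divide-and-conquer steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a reference implementation of Hirschberg's or Myers and Miller's algorithm: it uses a linear gap penalty g, an assumed scoring function `toy_score`, and string slicing for readability. All function names are hypothetical. The helper `alignment_score` is included only to check the result.

```python
def last_row(x, y, score, g):
    """Last row of the global score matrix S, using O(len(y)) space."""
    prev = [j * g for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * g] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = max(prev[j - 1] + score(x[i - 1], y[j - 1]),
                         prev[j] + g, cur[j - 1] + g)
        prev = cur
    return prev


def align_single(a, y, score, g):
    """Base case: align the single residue a against a non-empty y."""
    j_best = max(range(len(y)), key=lambda j: score(a, y[j]))
    if score(a, y[j_best]) + (len(y) - 1) * g >= (len(y) + 1) * g:
        return "-" * j_best + a + "-" * (len(y) - 1 - j_best), y
    return a + "-" * len(y), "-" + y      # a is cheaper left unpaired


def hirschberg(x, y, score, g):
    """Optimal global alignment of x and y, found by recursive splitting."""
    if len(x) == 0:
        return "-" * len(y), y
    if len(y) == 0:
        return x, "-" * len(x)
    if len(x) == 1:
        return align_single(x, y, score, g)
    mid, n = len(x) // 2, len(y)
    forward = last_row(x[:mid], y, score, g)
    backward = last_row(x[mid:][::-1], y[::-1], score, g)
    # k* is the column where an optimal path crosses row mid.
    k = max(range(n + 1), key=lambda c: forward[c] + backward[n - c])
    left_x, left_y = hirschberg(x[:mid], y[:k], score, g)
    right_x, right_y = hirschberg(x[mid:], y[k:], score, g)
    return left_x + right_x, left_y + right_y


def alignment_score(ax, ay, score, g):
    """Score a finished alignment column by column (for checking only)."""
    return sum(g if "-" in (a, b) else score(a, b) for a, b in zip(ax, ay))


def toy_score(a, b):
    """Assumed toy substitution score, NOT BLOSUM-62."""
    return 1 if a == b else -1


if __name__ == "__main__":
    ax, ay = hirschberg("THISLINE", "ISALIGNED", toy_score, -2)
    print(ax)
    print(ay)
    print(alignment_score(ax, ay, toy_score, -2))
```

A quick sanity check is that the score of the returned alignment equals the last entry of `last_row` for the full sequences, which is the optimal score computed without any traceback.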


