longest common subsequence

Longest Common Subsequence - PDF document



  1. CSE 421 Algorithms, Richard Anderson. Lecture 19: Longest Common Subsequence

     Longest Common Subsequence (Instructor Example)
     • $C = c_1 \dots c_g$ is a subsequence of $A = a_1 \dots a_m$ if C can be obtained by removing elements from A (but retaining order)
     • LCS(A, B): A maximum length sequence that is a subsequence of both A and B
     • Examples:
         ocurranec / occurrence
         attacggct / tacgacca

     Determine the LCS of the following strings (Student Submission)
         BARTHOLEMEWSIMPSON
         KRUSTYTHECLOWN

     String Alignment Problem
     • Align sequences with gaps
         CAT TGA AT
         CAGAT AGGA
     • Charge $\delta_x$ if character x is unmatched
     • Charge $\gamma_{xy}$ if character x is matched to character y

     LCS Optimization
     • $A = a_1 a_2 \dots a_m$
     • $B = b_1 b_2 \dots b_n$
     • Opt[j, k] is the length of $LCS(a_1 a_2 \dots a_j,\ b_1 b_2 \dots b_k)$

     Optimization recurrence
     • If $a_j = b_k$, Opt[j,k] = 1 + Opt[j-1, k-1]
     • If $a_j \ne b_k$, Opt[j,k] = max(Opt[j-1,k], Opt[j,k-1])
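The optimization recurrence above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from the lecture: the function name `lcs_length` is hypothetical, and because Python strings are 0-indexed, `a[j - 1]` plays the role of $a_j$.

```python
def lcs_length(a: str, b: str) -> int:
    """Return the length of a longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # opt[j][k] holds the length of LCS(a[:j], b[:k]).
    # Row 0 and column 0 stay 0: the LCS with an empty prefix is empty.
    opt = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, m + 1):
        for k in range(1, n + 1):
            if a[j - 1] == b[k - 1]:
                # Matching characters extend the LCS of the shorter prefixes.
                opt[j][k] = 1 + opt[j - 1][k - 1]
            else:
                # Otherwise drop the last character of one prefix or the other.
                opt[j][k] = max(opt[j - 1][k], opt[j][k - 1])
    return opt[m][n]
```

For the first example pair on the slide, `lcs_length("ocurranec", "occurrence")` evaluates to 7.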

  2. Give the Optimization Recurrence for the String Alignment Problem (Student Submission)
     • Charge $\delta_x$ if character x is unmatched
     • Charge $\gamma_{xy}$ if character x is matched to character y

     Dynamic Programming Computation

     Write the code to compute Opt[j,k] (Student Submission)

     Storing the path information
     [Figure: table with $b_1 \dots b_n$ and $a_1 \dots a_m$ labeled along its axes]
         A[1..m], B[1..n]
         for i := 1 to m  Opt[i, 0] := 0;
         for j := 1 to n  Opt[0, j] := 0;
         Opt[0, 0] := 0;
         for i := 1 to m
             for j := 1 to n
                 if A[i] = B[j] { Opt[i, j] := 1 + Opt[i-1, j-1]; Best[i, j] := Diag; }
                 else if Opt[i-1, j] >= Opt[i, j-1] { Opt[i, j] := Opt[i-1, j]; Best[i, j] := Left; }
                 else { Opt[i, j] := Opt[i, j-1]; Best[i, j] := Down; }

     How good is this algorithm? (Student Submission)
     • Is it feasible to compute the LCS of two strings of length 100,000 on a standard desktop PC? Why or why not?

     Observations about the Algorithm
     • The computation can be done in O(m+n) space if we only need one column of the Opt values or Best values
     • The algorithm can be run from either end of the strings
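The path-storing pseudocode above can be turned into a runnable sketch along these lines. The function name `lcs_with_path` and the trace-back loop at the end are assumptions added for illustration; the slide only shows how the `Opt` and `Best` tables are filled. The move labels `Diag`, `Left`, and `Down` are kept as on the slide, where `Left` means the value came from `Opt[i-1, j]`.

```python
def lcs_with_path(a: str, b: str) -> str:
    """Fill Opt and Best as in the pseudocode, then trace back one LCS."""
    m, n = len(a), len(b)
    opt = [[0] * (n + 1) for _ in range(m + 1)]
    best = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                opt[i][j] = 1 + opt[i - 1][j - 1]
                best[i][j] = "Diag"
            elif opt[i - 1][j] >= opt[i][j - 1]:
                opt[i][j] = opt[i - 1][j]
                best[i][j] = "Left"
            else:
                opt[i][j] = opt[i][j - 1]
                best[i][j] = "Down"

    # Trace-back (an assumption, not shown on the slide): follow the stored
    # moves from (m, n) toward the border, collecting matched characters.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if best[i][j] == "Diag":
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif best[i][j] == "Left":
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

Both tables have (m+1)(n+1) entries, so for two strings of length 100,000 each table would hold about 10^10 entries. That storage cost is what the one-column observation on this slide addresses.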

  3. Divide and Conquer Algorithm
     • Where does the best path cross the middle column?
     • For a fixed i, and for each j, compute the LCS that has $a_i$ matched with $b_j$

     Constrained LCS
     • $LCS_{i,j}(A,B)$: The LCS such that
         - $a_1, \dots, a_i$ paired with elements of $b_1, \dots, b_j$
         - $a_{i+1}, \dots, a_m$ paired with elements of $b_{j+1}, \dots, b_n$
     • $LCS_{4,3}(abbacbb, cbbaa)$

     A = RRSSRTTRTS, B = RTSRRSTST (Student Submission, then Instructor Example)
     Compute $LCS_{5,1}(A,B), LCS_{5,2}(A,B), \dots, LCS_{5,9}(A,B)$

         j   left   right
         0    0      3
         1    1      3
         2    1      3
         3    2      3
         4    3      3
         5    3      2
         6    3      2
         7    3      1
         8    4      1
         9    4      0

     Computing the middle column
     • From the left, compute $LCS(a_1 \dots a_{m/2},\ b_1 \dots b_j)$
     • From the right, compute $LCS(a_{m/2+1} \dots a_m,\ b_{j+1} \dots b_n)$
     • Add values for corresponding j's
     • Note: this is space efficient

     Divide and Conquer
     • $A = a_1, \dots, a_m$, $B = b_1, \dots, b_n$
     • Find j such that
         - $LCS(a_1 \dots a_{m/2},\ b_1 \dots b_j)$ and
         - $LCS(a_{m/2+1} \dots a_m,\ b_{j+1} \dots b_n)$ yield optimal solution
     • Recurse
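The divide-and-conquer steps above might be sketched in Python as follows. The helper names `lcs_row`, `best_split`, and `lcs_divide_and_conquer` are hypothetical, as are the base cases for strings of length 0 and 1. Running "from the right" is implemented here by reversing both strings, which is one possible reading of the slide rather than the lecture's own code.

```python
def lcs_row(a: str, b: str) -> list:
    """Return row where row[k] = length of LCS(a, b[:k]), keeping one row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for k, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[k - 1] + 1)
            else:
                cur.append(max(prev[k], cur[k - 1]))
        prev = cur
    return prev


def best_split(a: str, b: str) -> int:
    """Return j maximizing LCS(a[:mid], b[:j]) + LCS(a[mid:], b[j:]), mid = len(a) // 2."""
    mid = len(a) // 2
    n = len(b)
    left = lcs_row(a[:mid], b)
    # Computing from the right end is the same as computing on reversed strings:
    # right_rev[n - j] is the length of LCS(a[mid:], b[j:]).
    right_rev = lcs_row(a[mid:][::-1], b[::-1])
    return max(range(n + 1), key=lambda j: left[j] + right_rev[n - j])


def lcs_divide_and_conquer(a: str, b: str) -> str:
    """Return one LCS of a and b without storing the full Opt or Best table."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    mid = len(a) // 2
    j = best_split(a, b)
    # Recurse on the two independent subproblems on either side of the split.
    return lcs_divide_and_conquer(a[:mid], b[:j]) + lcs_divide_and_conquer(a[mid:], b[j:])
```

For the slide's strings, `best_split("RRSSRTTRTS", "RTSRRSTST")` returns 4, which agrees with the row of the table where left + right is largest (3 + 3 = 6).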

  4. Algorithm Analysis (Instructor Example)
     • $T(m,n) = T(m/2, j) + T(m/2, n-j) + cnm$
     • Prove by induction that $T(m,n) \le 2cmn$
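One way the inductive step could go is sketched below. It assumes the bound already holds for both subproblems (the induction hypothesis) and that the base cases satisfy it for the chosen constant $c$; neither assumption is spelled out on the slide.

```latex
\begin{align*}
T(m,n) &= T(m/2,\, j) + T(m/2,\, n-j) + cmn \\
       &\le 2c \cdot \tfrac{m}{2} \cdot j \;+\; 2c \cdot \tfrac{m}{2} \cdot (n-j) \;+\; cmn \\
       &= cmj + cm(n-j) + cmn \\
       &= 2cmn
\end{align*}
```

The two recursive calls together cover all $n$ columns but only half of the rows each, so their combined bound is $cmn$ regardless of where the split $j$ falls.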
