
CS 466 Introduction to Bioinformatics - PowerPoint PPT Presentation

CS 466 Introduction to Bioinformatics, Lecture 5. Mohammed El-Kebir, February 4, 2020. Outline: 1. Fitting alignment 2. Local alignment 3. Gapped alignment 4. BLOSUM scoring matrix. Reading: Jones and


  1. CS 466 Introduction to Bioinformatics, Lecture 5. Mohammed El-Kebir, February 4, 2020

  2. Outline
     1. Fitting alignment
     2. Local alignment
     3. Gapped alignment
     4. BLOSUM scoring matrix
     Reading:
     • Jones and Pevzner. Chapters 6.6-6.9
     • Lecture notes

  3. NGS Characterized by Short Reads
     Next-generation DNA sequencing turns a genome (millions to billions of nucleotides) into tens to hundreds of millions of short reads (short read: 100 nucleotides), for example:
     … CATTCAGTAG …, … AGCCATTAG …, … GGTAGTTAG …, … GGTAAACTAG …, … TATAATTAG …, … CGTACCTAG …
     Human reference genome is 3,300,000,000 nucleotides, while a short read is 100 nucleotides. Global sequence alignment will not work!
     Allow for inexact matches due to:
     • Sequencing errors
     • Polymorphisms/mutations in reference genome
     Question: How to account for discrepancy between lengths of reference and short read?

  4. Fitting Alignment
     For short read alignment, we want to align the complete short read $v \in \Sigma^m$ to a substring of the reference genome $w \in \Sigma^n$. Note that $m \ll n$.
     Fitting Alignment problem: Given strings $v \in \Sigma^m$ and $w \in \Sigma^n$ and scoring function $\delta$, find an alignment of $v$ and a substring of $w$ with maximum global alignment score $s^*$ among all global alignments of $v$ and all substrings of $w$.

  5. Fitting Alignment – Naive Approach
     • Consider all contiguous non-empty substrings of $w$, defined by $1 \le i \le j \le n$
     • How many?

  6. Fitting Alignment – Naive Approach
     • Consider all contiguous non-empty substrings of $w$, defined by $1 \le i \le j \le n$
     • How many? Answer: $n + \binom{n}{2}$
     • What are their total lengths?
     • What is the running time?
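The two open questions can be checked numerically. The sketch below is not from the slides: it enumerates the substrings of a small example string (TACGGC, chosen here for illustration) and compares the counts with closed forms. It assumes that a global alignment of $v$ against a substring of length $\ell$ costs $O(m\ell)$ time; under that assumption the total substring length $n(n+1)(n+2)/6$ gives a naive running time of $O(mn^3)$.

```python
from math import comb


def substrings(w):
    """Yield every contiguous non-empty substring w[i..j] with 1 <= i <= j <= n."""
    n = len(w)
    for i in range(n):
        for j in range(i, n):
            yield w[i:j + 1]


w = "TACGGC"  # illustrative reference string, n = 6
n = len(w)
subs = list(substrings(w))

count = len(subs)                         # number of substrings
total_length = sum(len(x) for x in subs)  # sum of their lengths

# Closed forms: n + C(n, 2) substrings, total length n(n+1)(n+2)/6.
print(count, n + comb(n, 2))
print(total_length, n * (n + 1) * (n + 2) // 6)
```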

  7. Fitting Alignment – Dynamic Programming
     Example DP table: rows $v$ = AGG, columns $w$ = TACGGC, each with a leading 0 for the empty prefix.
     $$s[i,j] = \max \begin{cases} 0, & \text{if } i = 0, \\ s[i-1,j] + \delta(v_i, -), & \text{if } i > 0, \\ s[i,j-1] + \delta(-, w_j), & \text{if } i > 0 \text{ and } j > 0, \\ s[i-1,j-1] + \delta(v_i, w_j), & \text{if } i > 0 \text{ and } j > 0. \end{cases}$$
     $$s^* = \max\{s[m,0], \ldots, s[m,n]\}$$
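A minimal Python sketch of this recurrence, assuming a simple scoring function with one match score, one mismatch score, and one indel score (the recurrence itself allows any $\delta$). The function name `fitting_score` and its parameters are illustrative, not from the lecture.

```python
def fitting_score(v, w, match=1, mismatch=-1, indel=-1):
    """Return s*, the best score of aligning all of v to some substring of w."""
    m, n = len(v), len(w)
    # Row 0 is all zeros: the alignment may start anywhere on the first row.
    s = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        # Column 0: only the vertical move (v_i against a gap) is possible.
        s[i][0] = s[i - 1][0] + indel
        for j in range(1, n + 1):
            diag = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(
                s[i - 1][j] + indel,     # v_i aligned to a gap
                s[i][j - 1] + indel,     # w_j aligned to a gap
                s[i - 1][j - 1] + diag,  # v_i aligned to w_j
            )
    # The alignment may end anywhere on the last row.
    return max(s[m])


print(fitting_score("AGG", "TACGGC"))
```

The table has $(m+1)(n+1)$ cells and each is filled in constant time, so this runs in $O(mn)$ time.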

  8. Fitting Alignment – Dynamic Programming
     Same recurrence and example table ($v$ = AGG, $w$ = TACGGC), annotated:
     • Start anywhere on first row: $s[0,j] = 0$ for all $j$
     • End anywhere on last row: $s^* = \max\{s[m,0], \ldots, s[m,n]\}$

  9. Fitting Alignment – Dynamic Programming
     Same recurrence and example table ($v$ = AGG, $w$ = TACGGC). Start anywhere on first row; end anywhere on last row.
     Question: Let match score be 1, mismatch/indel score be -1. What is $s^*$?
     Question: Same scores. What is optimal global alignment and score?
     Alignment shown on the slide:
       v: - A - G G -
       w: T A C G G C
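For the second question, a global alignment score can be computed with the standard recurrence that starts at $(0,0)$ and ends at $(m,n)$. The sketch below is an illustration with a hypothetical function name; it again assumes a single match, mismatch, and indel score.

```python
def global_score(v, w, match=1, mismatch=-1, indel=-1):
    """Return the optimal global alignment score of v and w."""
    m, n = len(v), len(w)
    s = [[0] * (n + 1) for _ in range(m + 1)]
    # Unlike fitting alignment, leading gaps are penalized in row 0 and column 0.
    for j in range(1, n + 1):
        s[0][j] = s[0][j - 1] + indel
    for i in range(1, m + 1):
        s[i][0] = s[i - 1][0] + indel
        for j in range(1, n + 1):
            diag = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(
                s[i - 1][j] + indel,
                s[i][j - 1] + indel,
                s[i - 1][j - 1] + diag,
            )
    # A global alignment must end in the bottom-right cell.
    return s[m][n]


print(global_score("AGG", "TACGGC"))
```

With these scores any global alignment of AGG and TACGGC needs at least three indels and has at most three matches, so its score is at most 0. The alignment shown on the slide reaches that bound.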

  10. Fitting Alignment – Dynamic Programming
      • Online: https://valiec.github.io/AlignmentVisualizer/index.html
      Example: $v$ = AGG, $w$ = TACGGC, with the same recurrence and questions as on slide 9 (match score 1, mismatch/indel score -1).
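The score table also determines an optimal alignment: start from the best cell on the last row and walk back until the first row is reached. The following is a sketch under the same scoring assumptions (one match, mismatch, and indel score). The function name and the tie-breaking order (diagonal, then up, then left) are choices made here, not taken from the lecture.

```python
def fitting_alignment(v, w, match=1, mismatch=-1, indel=-1):
    """Return (s*, aligned v, aligned substring of w) for a fitting alignment."""
    m, n = len(v), len(w)
    s = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        s[i][0] = s[i - 1][0] + indel
        for j in range(1, n + 1):
            diag = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(s[i - 1][j] + indel,
                          s[i][j - 1] + indel,
                          s[i - 1][j - 1] + diag)

    # End anywhere on the last row: pick the column with the best score.
    i = m
    j = max(range(n + 1), key=lambda c: s[m][c])
    best = s[m][j]

    top, bottom = [], []
    while i > 0:  # stop as soon as the first row is reached
        diag = match if j > 0 and v[i - 1] == w[j - 1] else mismatch
        if j > 0 and s[i][j] == s[i - 1][j - 1] + diag:
            top.append(v[i - 1]); bottom.append(w[j - 1])
            i, j = i - 1, j - 1
        elif s[i][j] == s[i - 1][j] + indel:
            top.append(v[i - 1]); bottom.append("-")
            i -= 1
        else:  # the cell was reached by a horizontal move
            top.append("-"); bottom.append(w[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, a, b = fitting_alignment("AGG", "TACGGC")
print(score)
print(a)
print(b)
```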

  11. Outline
      1. Fitting alignment
      2. Local alignment
      3. Gapped alignment
      4. BLOSUM scoring matrix
      Reading:
      • Jones and Pevzner. Chapters 6.6-6.9
      • Lecture notes

  12. Local Alignment – Biological Motivation
      Proteins are composed of functional units called domains. Such domains may occur in different proteins even across species.
      [Figure: proteins SHKA and ABL1. From Pfam database (http://pfam.sanger.ac.uk/)]
      Local Alignment problem: Given strings $v \in \Sigma^m$ and $w \in \Sigma^n$ and scoring function $\delta$, find a substring of $v$ and a substring of $w$ whose alignment has maximum global alignment score $s^*$ among all global alignments of all substrings of $v$ and $w$.

  13. Global, Fitting and Local Alignment
      Global Alignment problem: Given strings $v \in \Sigma^m$ and $w \in \Sigma^n$ and scoring function $\delta$, find an alignment of $v$ and $w$ with maximum score.
      Fitting Alignment problem: Given strings $v \in \Sigma^m$ and $w \in \Sigma^n$ and scoring function $\delta$, find an alignment of $v$ and a substring of $w$ with maximum global alignment score $s^*$ among all global alignments of $v$ and all substrings of $w$.
      Local Alignment problem: Given strings $v \in \Sigma^m$ and $w \in \Sigma^n$ and scoring function $\delta$, find a substring of $v$ and a substring of $w$ whose alignment has maximum global alignment score $s^*$ among all global alignments of all substrings of $v$ and $w$.

  14. Local Alignment – Naive Algorithm
      Brute force:
      1. Generate all pairs $(v', w')$ of substrings of $v$ and $w$
      2. For each pair $(v', w')$, solve global alignment problem.
      Question: There are $\binom{m}{2}\binom{n}{2}$ pairs of substrings. But they have different lengths. What is the running time?
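The brute-force procedure can be written directly, which also makes its cost visible. This is a sketch with hypothetical function names, using a single match, mismatch, and indel score (defaults 2, -2, -4 are an assumption for the example). If each global alignment of $(v', w')$ costs $O(|v'||w'|)$, summing over all pairs gives $O(m^3 n^3)$, since the substring lengths of $v$ sum to $\Theta(m^3)$ and those of $w$ to $\Theta(n^3)$.

```python
def global_score(v, w, match, mismatch, indel):
    """Optimal global alignment score of v and w, in O(|v||w|) time."""
    m, n = len(v), len(w)
    s = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        s[0][j] = s[0][j - 1] + indel
    for i in range(1, m + 1):
        s[i][0] = s[i - 1][0] + indel
        for j in range(1, n + 1):
            diag = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(s[i - 1][j] + indel,
                          s[i][j - 1] + indel,
                          s[i - 1][j - 1] + diag)
    return s[m][n]


def brute_force_local(v, w, match=2, mismatch=-2, indel=-4):
    """Try every pair of non-empty substrings and keep the best global score."""
    best = 0  # two empty substrings align with score 0
    for a in range(len(v)):
        for b in range(a + 1, len(v) + 1):
            for c in range(len(w)):
                for d in range(c + 1, len(w) + 1):
                    score = global_score(v[a:b], w[c:d], match, mismatch, indel)
                    best = max(best, score)
    return best


print(brute_force_local("AGG", "TACGGC"))
```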

  15. Key Idea
      • Global alignment: Start at $(0,0)$ and end at $(m,n)$
      • Local alignment: Start and end anywhere

  16. Local Alignment Recurrence
      $$s[i,j] = \max \begin{cases} 0, \\ s[i-1,j] + \delta(v_i, -), & \text{if } i > 0, \\ s[i,j-1] + \delta(-, w_j), & \text{if } j > 0, \\ s[i-1,j-1] + \delta(v_i, w_j), & \text{if } i > 0 \text{ and } j > 0. \end{cases}$$
      $$s^* = \max_{i,j} s[i,j]$$

  17. Local Alignment Recurrence
      The local alignment recurrence, annotated:
      • Start anywhere: 0 is always one of the options of the max, so $s[i,j] \ge 0$ in every cell
      • End anywhere: $s^* = \max_{i,j} s[i,j]$
      Running time: $O(mn)$

  18. Local Alignment – Dynamic Programming
      • Online: https://valiec.github.io/AlignmentVisualizer/index.html
      Example DP table: rows $v$ = AGG, columns $w$ = TACGGC, filled with the local alignment recurrence and $s^* = \max_{i,j} s[i,j]$.
      Question: Let match score be 2, mismatch score be -2 and indel be -4. What is $s^*$?
      Alignment shown on the slide:
        v: G G
        w: G G
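A minimal Python sketch of the local alignment dynamic program (commonly known as the Smith-Waterman algorithm), in which 0 is an option in every cell and the answer is the maximum over the whole table. The function name and the restriction to one match, mismatch, and indel score are assumptions made for illustration.

```python
def local_score(v, w, match=2, mismatch=-2, indel=-4):
    """Return s*, the best global alignment score over all pairs of substrings."""
    m, n = len(v), len(w)
    # Row 0 and column 0 stay 0: an alignment may start anywhere.
    s = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(
                0,                       # start a new alignment here
                s[i - 1][j] + indel,     # v_i aligned to a gap
                s[i][j - 1] + indel,     # w_j aligned to a gap
                s[i - 1][j - 1] + diag,  # v_i aligned to w_j
            )
            # End anywhere: track the maximum over all cells.
            best = max(best, s[i][j])
    return best


print(local_score("AGG", "TACGGC"))
```

Each of the $(m+1)(n+1)$ cells takes constant time, which matches the $O(mn)$ running time stated for the recurrence.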
