substitution matrices
play

Substitution Matrices Michael Schroeder Biotechnology Center TU - PowerPoint PPT Presentation

Substitution Matrices Michael Schroeder Biotechnology Center TU Dresden Contents Why to compare and align sequences? How to judge an alignment? Z-score, E-value, P-value, structure and function How to compare and align sequences?


  1. Substitution Matrices Michael Schroeder Biotechnology Center TU Dresden

  2. Contents § Why to compare and align sequences? § How to judge an alignment? § Z-score, E-value, P-value, structure and function § How to compare and align sequences? § Levensthein distance, scoring schemes, longest common subsequence, global and local alignment , substitution matrix , § How to compute an alignment? § Dynamic programming § How to compute an alignment fast? § Blast § How to align many sequences § Multiple sequence alignment, phylogenetic trees § Alignments and structure § How to predict protein structure from protein sequence 2

  3. Scoring Scheme § Consider the alignments: A T G C TTATACGTAA A 2 -2 0 -2 TTA-CC-AAA T -2 2 -2 0 vs. G 0 -2 2 -2 TTATACGTAA C -2 0 -2 2 TTAC-CA-AA § with s m = 1, s r = s g = -1. § Are these alignments equally good? 3

  4. Properties of Amino Acids Hydrophobic Polar A S T G Acidic P D N Q E I L Y V C K W H M R Basic F Aromatic 4

  5. BLOSUM: Blocks Substitution Matrix Blosum62, Henikoff, 1992 5

  6. How to choose data? § BLOSUM: Blocks Substitution matrix, Henikoff 1992 § BLOSUM62: Maximal sequence identity is 62% § 3000 Blocks of 800 families § > 2300 occurrences of each substitution § General problem: How to choose representative data? 6

  7. Log Odds Ratio § How to define a score for mutation x-y : § Many substitutions x-y : high score x-y § Many occurrences of x : low score for x-y § Many occurrences of y : low score for x-y § Log odds ratio: P ( x, y ) log 2 P ( x ) P ( y ) § P ( x , y ) is probability for a substitution x ➞ y § P ( x , y ) = P ( y , x ) § P ( x ) is the probability for finding x § Where else could log odds ratios be useful? 7

  8. How to Count Substitutions? ACGCTAFKI ACGCTAFKL ASGCTAFKL ACACTAFKL GCGCTAFKI GCGCTGFKI GCGCTLFKI How many A ➞ G? 8

  9. Substitutions with and without Evolutionary Model § How many A ➞ G? ACGCTAFKI ACGCTAFKL ASGCTAFKL § BLOSUM: ACACTAFKL § 4*3+1*6+5*1 = 23 GCGCTAFKI GCGCTGFKI § PAM: 3 GCGCTLFKI 9

  10. PAM: Evolutionary Model ACGCTAFKI ACGCTAFKL ASGCTAFKL ACACTAFKL GCGCTAFKI ACGCTAFKI GCGCTGFKI GCGCTLFKI A ➞ G I ➞ L GCGCTAFKI ACGCTAFKL A ➞ G A ➞ L C ➞ S G ➞ A GCGCTGFKI GCGCTLFKI ASGCTAFKL ACACTAFKL 10

  11. How to Measure Quality of Substitution Matrices? 11

  12. Styczynski 2008: Error in BLOSUM62! § But: Revised BLOSUM does worse! Red +1 of Blosum, green -1 12

  13. Do Substitution Matrices Reflect Physicochemical Properties? 13

  14. Needleman-Wunsch Algorithm Global alignment of string a to b Let a = a 1 . . . a m and b = b 1 . . . b n be strings. Then needle a,b = needle a,b ( m, n ) is the global alignment score of a and b , where 8 if j = 0 , is g > > > if i = 0 , js g > > > < 8 needle a,b ( i � 1 , j ) + s g needle a,b ( i, j ) = > < > needle a,b ( i, j � 1) + s g max otherwise, > > > > > > needle a,b ( i � 1 , j � 1) + s m ( ai = bj ) : : for 0  i  m , 0  j  n , gap penalty s g < 0 , mismatch penalty s r < 0 , match reward s m > 0 , and ( s m if ( a i = b j ) , s m ( ai = bj ) := if ( a i 6 = b j ) s r 14

  15. Needleman-Wunsch Algorithm with Substitution Matrix Global alignment of string a to b Let a = a 1 . . . a m and b = b 1 . . . b n be strings. Then needle a,b = needle a,b ( m, n ) is the global alignment score of a and b with substitution matrix, where  if j = 0 , is g    if i = 0 , js g      needle a,b ( i, j ) = needle a,b ( i − 1 , j ) + s g    max needle a,b ( i, j − 1) + s g otherwise,       needle a,b ( i − 1 , j − 1) + d s ( a i , b j )   for 0 ≤ i ≤ m and 0 ≤ j ≤ n , substitution matrix d s ( a i , b j ) , and gap penalty s g < 0 . 15

  16. Global Alignment with Dynamic Programming needle(a,b): let d be a matrix of size m+1 × n+1 for 0 ≤ i ≤ m: d[i,0] = i * s g for 1 ≤ j ≤ n: d[0,j] = j * s g for 1 ≤ i ≤ m: for 1 ≤ j ≤ n: if a i == b j : s = s m else: s = s r d[i,j] = max(d[i-1,j ] + s g , d[i ,j-1] + s g , d[i-1,j-1] + s) return d[m,n] 16

  17. Global Alignment with Substitution Matrix and Dynamic Programming needle(a,b,d s ): let d be a matrix of size m+1 × n+1 for 0 ≤ i ≤ m: d[i,0] = i * s g for 1 ≤ j ≤ n: d[0,j] = j * s g for 1 ≤ i ≤ m: for 1 ≤ j ≤ n: d[i,j] = max(d[i-1,j ] + s g , d[i ,j-1] + s g , d[i-1,j-1] + d s [a i ,b j ]) return d[m,n] 17

  18. Global Alignment with SM and DP: Mark the Path needle(a,b,d s ): let d be the distance matrix of size m+1 × n+1 let d b be the backtracking matrix of size m+1 × n+1 d[0,0], d b [0,0] = 0, ∅ for 1 ≤ i ≤ m: d[i,0] = i * s g d b [i,0] = {'N'} for 1 ≤ j ≤ n: d[0,j] = j * s g d b [0,j] = {'W'} for 1 ≤ i ≤ m: for 1 ≤ j ≤ n: d[i,j], d b [i,j] = max_dir(d[i-1,j ] + s g , 
 d[i ,j-1] + s g , d[i-1,j-1] + d s [a i ,b j ]) return d[m,n], d b Legend: red=new code 18

  19. Global Alignment with DP: Determine Directions max_dir(d N ,d W ,d NW ): dir = {} if d N == max(d N ,d W ,d NW ): dir.add('N') if d W == max(d N ,d W ,d NW ): dir.add('W') if d NW == max(d N ,d W ,d NW ): dir.add('NW') return max(d N ,d W ,d NW ), dir 19

  20. Global Alignment with DP: Printing the Alignment print-needle(a,b,d b ): return print-needle(a,b,m,n,d b ) print-needle(a,b,i,j,d b ): if 'NW' in d b [i,j]: print-needle(a,i-1,j-1,d b ) print a i , b j else if 'W' in d b [i,j]: print-needle(a,i,j-1,d b ) print '-', b j else if 'N' in d b [i,j]: print-needle(a,i-1,j,d b ) print a i , '-' 20

  21. Global Alignment with Dynamic Programming i \ j p e t e r p e d r o 21

  22. Build a Substitution Matrix 22

  23. Summary § BLOSUM (and PAM) § Log odds ratio § Role of evolutionary model § How to assess substitution matrices 23

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend