Pairwise Alignment, Mark Voorhies, 3/27/2012 (PowerPoint presentation)

  1. Pairwise Alignment. Mark Voorhies, 3/27/2012

  2. Review: Tips and tricks
     Making a file executable: chmod a+x pydotter.py
     Handling file/directory names with spaces: cd My\ Directory\ with\ Spaces
     or: cd "My Directory with Spaces"

  3. Review: Tips and tricks
     Killing a process on OS X:
       Try ctrl-c
       If that doesn't work: ps -awx | grep name_of_process
       The first column in ps output is the PID (process ID): kill PID
       If that doesn't work: kill -KILL PID
     On Linux: ps -ealf | grep name_of_process (here the PID is the fourth column)

  5. Review: Content
     FASTA files:
       >Name Free-form annotation
       MGCLLIMKEGGPGRKHKLIVMLYLDENQ
       EHELPIMTRAPPEDINADNAMACHINEW
       NQEDLYMNILKHGPPGEDEDRKHEDEDG
     Dotplots: unbiased plot of all possible ungapped alignments of two sequences.
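Here is a minimal Python sketch of both ideas on this slide. It assumes the sequences fit in memory as strings. `read_fasta` and `dotplot` are hypothetical helper names and do not come from pydotter.py. The `window`/`threshold` filter is one common way to reduce noise in a dotplot, and not necessarily the one used in class.

```python
def read_fasta(lines):
    """Parse FASTA lines into a list of (name, annotation, sequence) tuples.

    The name is the first word after '>', the rest of the header line is
    kept as free-form annotation, and sequence lines are concatenated.
    """
    records = []
    name, annotation, chunks = None, "", []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:  # finish the previous record
                records.append((name, annotation, "".join(chunks)))
            header = line[1:].split(None, 1)
            name = header[0] if header else ""
            annotation = header[1] if len(header) > 1 else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:  # finish the last record
        records.append((name, annotation, "".join(chunks)))
    return records


def dotplot(x, y, window=1, threshold=1):
    """Return a len(x)-by-len(y) grid of 0/1 dots.

    grid[i][j] is 1 when at least `threshold` of the `window` residues
    starting at x[i] and y[j] are identical, i.e. when the short ungapped
    alignment of those two windows scores well enough.
    """
    grid = [[0] * len(y) for _ in range(len(x))]
    for i in range(len(x) - window + 1):
        for j in range(len(y) - window + 1):
            matches = sum(x[i + k] == y[j + k] for k in range(window))
            if matches >= threshold:
                grid[i][j] = 1
    return grid


fasta_text = """>Name Free-form annotation
MGCLLIMKEGGPGRKHKLIVMLYLDENQ
EHELPIMTRAPPEDINADNAMACHINEW
NQEDLYMNILKHGPPGEDEDRKHEDEDG
"""
name, annotation, seq = read_fasta(fasta_text.splitlines())[0]
print(name, annotation, len(seq))  # Name Free-form annotation 84
grid = dotplot(seq, seq, window=3, threshold=3)
```

A self-comparison, as in this example, fills in the main diagonal. That makes it a convenient sanity check.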

  10. Pairwise Alignment
      How can we automate our dotplot protocol to find the “best” gapped alignment of our sequences?
      What do we mean by best?
        Residues with equivalent functional roles are paired
        Residues that derive from the same position in the common ancestor are paired (homology)
        The sequence alignment maximizes a similarity function

  15. Deriving scores from alignments
      Frequency of residue i: $p_i$
      Frequency of residue i aligned to residue j: $q_{ij}$
      Expected frequency if i and j are independent: $p_i p_j$
      Ratio of observed to expected frequency: $\frac{q_{ij}}{p_i p_j}$
      Log odds (LOD) score: $s(i,j) = \log \frac{q_{ij}}{p_i p_j}$
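The definitions above translate directly into counting. This sketch makes several assumptions. The input is a list of aligned residue pairs taken from trusted ungapped alignments. Each pair is counted in both orders so that $q_{ij} = q_{ji}$. Scores use log base 2, so they are in bits. The base, the toy input, and the name `lod_scores` are all choices made for illustration.

```python
import math
from collections import Counter


def lod_scores(aligned_pairs):
    """Estimate s(i, j) = log2(q_ij / (p_i * p_j)) from aligned residue pairs.

    aligned_pairs: (a, b) residue pairs read off the columns of trusted,
    ungapped alignments. Each pair is counted in both orders, so the
    resulting scores are symmetric: s(i, j) == s(j, i).
    """
    pair_counts = Counter()     # counts of ordered pairs (i, j)
    residue_counts = Counter()  # counts of individual residues
    for a, b in aligned_pairs:
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
        residue_counts[a] += 1
        residue_counts[b] += 1
    n_pairs = sum(pair_counts.values())
    n_residues = sum(residue_counts.values())
    p = {r: c / n_residues for r, c in residue_counts.items()}  # p_i
    scores = {}
    for (a, b), c in pair_counts.items():
        q = c / n_pairs                                          # q_ij
        scores[(a, b)] = math.log2(q / (p[a] * p[b]))           # s(i, j)
    return scores


# Toy input: four aligned columns from a hypothetical alignment.
pairs = [("A", "A"), ("A", "A"), ("A", "S"), ("S", "S")]
scores = lod_scores(pairs)
for key in sorted(scores):
    print(key, round(scores[key], 3))
```

Published matrices such as BLOSUM62 go further. They down-weight closely related sequences before counting, and they scale and round the scores to integers. Neither step is shown here.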

  19. PAM (Dayhoff) and BLOSUM matrices
      The PAM1 matrix was originally calculated from manual alignments of highly conserved sequences (myoglobin, cytochrome c, etc.).
      We can think of a PAM matrix as evolving a sequence by one unit of time.
      If evolution is uniform over time, then PAM matrices for larger evolutionary steps can be generated by multiplying PAM1 by itself (so higher-numbered PAM matrices represent greater evolutionary distances).
      The BLOSUM matrices were determined from automatically generated ungapped alignments. Higher-numbered BLOSUM matrices correspond to smaller evolutionary distances. BLOSUM62 is the default matrix for protein BLAST (blastp).
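Repeated multiplication drives a PAM-style matrix toward larger evolutionary distances. This sketch shows the effect on a hypothetical two-letter alphabet. `PAM1_TOY` is invented for illustration: each residue stays unchanged with probability 0.99 per time step, i.e. 1% change. It is not the real 20×20 PAM1 matrix. The multiplication applies to mutation probabilities, before they are converted to log-odds scores.

```python
def matmul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(M, k):
    """Return M multiplied by itself k times (k >= 1)."""
    result = M
    for _ in range(k - 1):
        result = matmul(result, M)
    return result


# Hypothetical PAM1-like mutation probability matrix for a two-letter
# alphabet: row = residue now, column = residue one time unit later.
PAM1_TOY = [[0.99, 0.01],
            [0.01, 0.99]]

for k in (1, 10, 100, 250):
    # Probability that a residue is unchanged after k time units.
    print(k, round(matrix_power(PAM1_TOY, k)[0][0], 3))
```

As k grows, the probability of still seeing the original residue decays toward the background frequency (0.5 in this toy case). This is why high-numbered PAM matrices suit distantly related sequences.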

  20. BLOSUM80

  21. BLOSUM62

  22. BLOSUM45

  23. Fun with logarithms
      In log space, multiplication and division become addition and subtraction:
        $\log(xy) = \log(x) + \log(y)$
        $\log(x/y) = \log(x) - \log(y)$
      Therefore, exponentiation becomes multiplication: $\log(x^y) = y \log(x)$
      Also, we can change the base of a logarithm like so: $\log_A(x) = \log(x) / \log(A)$
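These identities are easy to check numerically, and the example values below are arbitrary. The second half of the sketch shows the practical reason alignment scores are kept in log space. Multiplying many small probabilities underflows double-precision floats to 0.0, while summing their logarithms stays well within range.

```python
import math

x, y, A = 6.0, 3.0, 2.0

# Multiplication and division become addition and subtraction.
assert math.isclose(math.log(x * y), math.log(x) + math.log(y))
assert math.isclose(math.log(x / y), math.log(x) - math.log(y))
# Exponentiation becomes multiplication.
assert math.isclose(math.log(x ** y), y * math.log(x))
# Change of base: log_A(x) = log(x) / log(A).
assert math.isclose(math.log(x) / math.log(A), math.log2(x))

# A product of 100 probabilities of 1e-5 is 1e-500, far below the smallest
# positive double, so it underflows; the sum of the logs does not.
probs = [1e-5] * 100
product = 1.0
for p in probs:
    product *= p
log_sum = sum(math.log10(p) for p in probs)
print(product, log_sum)
```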

  30. Scoring an alignment
      Log odds (LOD) score: $s(i,j) = \log \frac{q_{ij}}{p_i p_j}$
      Multiplying independent probabilities is equivalent to adding independent log probabilities.
      Therefore, an ungapped alignment of length N can be scored as:
        $S(x,y) = \log \prod_{i=1}^{N} \frac{q_{x_i y_i}}{p_{x_i} p_{y_i}} = \sum_{i=1}^{N} s(x_i, y_i)$
      What about gaps?
        Probability of an insertion/deletion event (gap opening, G)
        Length distribution of insertions/deletions (gap extension, E)
        $S_{\text{gapped}}(x,y) = S(x,y) + \sum_{i}^{\text{gaps}} (G + E \cdot L_i)$, where $L_i$ is the length of gap i
      We find an optimal alignment by finding the gapped versions of x and y that maximize $S_{\text{gapped}}$.
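The gapped score can be computed one column at a time. This sketch assumes the aligned sequences are equal-length strings with '-' for gaps. It also assumes G and E are negative numbers, so the '+' in the formula subtracts a penalty. `toy_matrix` is a hypothetical match/mismatch scheme that stands in for a real LOD matrix such as BLOSUM62.

```python
def toy_matrix(alphabet, match=2, mismatch=-1):
    """Hypothetical match/mismatch scores standing in for a LOD matrix."""
    return {(a, b): (match if a == b else mismatch)
            for a in alphabet for b in alphabet}


def score_alignment(ax, ay, s, G, E):
    """Score two equal-length aligned strings ('-' marks a gap).

    Adds s(x_i, y_i) for every column without a gap, plus G + E * L for each
    gap of length L. G and E should be negative so they act as penalties.
    """
    if len(ax) != len(ay):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    in_gap_x = in_gap_y = False  # is a gap already open in each row?
    for a, b in zip(ax, ay):
        gap_x, gap_y = a == "-", b == "-"
        if gap_x and gap_y:
            raise ValueError("a column cannot be a gap in both sequences")
        if gap_x:
            total += E if in_gap_x else G + E  # extend, or open a new gap
        elif gap_y:
            total += E if in_gap_y else G + E
        else:
            total += s[(a, b)]                 # aligned residue pair
        in_gap_x, in_gap_y = gap_x, gap_y
    return total


s = toy_matrix("ACDEFGHIKLMNPQRSTVWY")
print(score_alignment("MKE-G", "MKEAG", s, G=-4, E=-1))  # 3
```

In this example the single one-residue gap costs G + E = -5. The four matched pairs add 4 × 2 = 8, so the total is 3.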

  35. How many ways can we align two sequences?
      Binomial formula: $\binom{k}{r} = \frac{k!}{(k-r)!\,r!}$, so $\binom{2n}{n} = \frac{(2n)!}{n!\,n!}$
      Stirling's approximation: $x! \approx \sqrt{2\pi}\, x^{x+\frac{1}{2}} e^{-x}$
      Therefore $\binom{2n}{n} \approx \frac{2^{2n}}{\sqrt{\pi n}}$
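A quick numerical check of the approximation uses Python's exact integer `math.comb` (Python 3.8+) for $\binom{2n}{n}$. The ratio of the exact count to the approximate count should approach 1 as n grows. The values of n are kept small enough that $2^{2n}$ still fits in a float.

```python
import math


def exact_count(n):
    """Number of alignments of two length-n sequences, per the slide: C(2n, n)."""
    return math.comb(2 * n, n)


def stirling_count(n):
    """Stirling-based approximation: 2**(2n) / sqrt(pi * n)."""
    return 2 ** (2 * n) / math.sqrt(math.pi * n)


for n in (10, 100, 300):
    exact = exact_count(n)
    approx = stirling_count(n)
    print(n, f"{float(exact):.3e}", f"{approx:.3e}", round(exact / approx, 4))
```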

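  38. Scoring an alignment quickly
      Scoring all $\approx \frac{2^{2n}}{\sqrt{\pi n}}$ possible alignments is too expensive.
      $S_{\text{gapped}}(x,y) = S(x,y) + \sum_{i}^{\text{gaps}} (G + E \cdot L_i)$
      The best alignment of any pair of subsequences is independent of the global alignment, so the optimal global alignment can be assembled from optimal alignments of smaller pieces.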

  39. Dynamic Programming
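The independence of subsequence alignments is what makes dynamic programming work. The best score for aligning the prefixes $x_1 \ldots x_i$ and $y_1 \ldots y_j$ can be built from the best scores of three smaller prefix pairs. The recurrence below is written for a simplified linear gap model. It assumes G = 0, so each gapped position adds E, a negative number. The best global score is $F(i,j)$.

```latex
\begin{aligned}
F(0,0) &= 0, \qquad F(i,0) = iE, \qquad F(0,j) = jE \\
F(i,j) &= \max \begin{cases}
  F(i-1,\,j-1) + s(x_i, y_j) & \text{align } x_i \text{ with } y_j \\
  F(i-1,\,j) + E & x_i \text{ against a gap} \\
  F(i,\,j-1) + E & y_j \text{ against a gap}
\end{cases}
\end{aligned}
```

Filling the $(m+1) \times (n+1)$ table takes $O(mn)$ time instead of enumerating every alignment. For sequences of lengths m and n, the optimal global score is $F(m,n)$.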

  40. Needleman-Wunsch Global Alignment
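Below is a compact Python sketch of Needleman-Wunsch global alignment with traceback. It uses a linear gap penalty, which is the G = 0 case of the slides' gap model, with `gap` playing the role of E. `toy_matrix` and the example sequences are hypothetical. A real analysis would use a LOD matrix such as BLOSUM62.

```python
def toy_matrix(alphabet, match=2, mismatch=-1):
    """Hypothetical match/mismatch scores standing in for a LOD matrix."""
    return {(a, b): (match if a == b else mismatch)
            for a in alphabet for b in alphabet}


def needleman_wunsch(x, y, s, gap):
    """Global alignment of x and y with a linear gap penalty.

    s:   dict mapping (residue of x, residue of y) -> substitution score
    gap: score added for each gapped position (should be negative)
    Returns (best score, aligned x, aligned y).
    """
    m, n = len(x), len(y)
    # F[i][j] holds the best score for aligning x[:i] with y[:j].
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap
    for j in range(1, n + 1):
        F[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + s[(x[i - 1], y[j - 1])],  # pair
                          F[i - 1][j] + gap,   # x[i-1] against a gap
                          F[i][j - 1] + gap)   # y[j-1] against a gap
    # Trace back from F[m][n] to recover one optimal alignment.
    ax, ay = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s[(x[i - 1], y[j - 1])]:
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[m][n], "".join(reversed(ax)), "".join(reversed(ay))


s = toy_matrix("ACDEFGHIKLMNPQRSTVWY")
score, ax, ay = needleman_wunsch("HEAGAWGHEE", "PAWHEAE", s, gap=-2)
print(score)
print(ax)
print(ay)
```

The full affine model ($G \neq 0$) needs to track whether a gap is already open. This is usually done with Gotoh's three-matrix extension of the algorithm. When several alignments tie, the traceback here picks one of them in a fixed order: a diagonal step first, then a gap in y, then a gap in x.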
