Pairwise Alignment
Mark Voorhies
3/27/2012

Review: Tips and tricks
Making a file executable:
    chmod "a+x" pydotter.py
Handling file/directory names with spaces:
    cd My\ Directory\ with\ Spaces
or
    cd "My Directory with Spaces"

Review: Tips and tricks
Killing a process on OS X:
  Try ctrl-c
  If that doesn’t work:
    ps -awx | grep name_of_process
  The first column in the ps output is the PID (process ID):
    kill PID
  If that doesn’t work:
    kill -KILL PID
On Linux:
    ps -ealf | grep name_of_process

Review: Content
FASTA files:
    >Name Free-form annotation
    MGCLLIMKEGGPGRKHKLIVMLYLDENQ
    EHELPIMTRAPPEDINADNAMACHINEW
    NQEDLYMNILKHGPPGEDEDRKHEDEDG
Dotplots: unbiased plot of all possible ungapped alignments of two sequences.
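As a rough illustration of the two review items, here is a minimal sketch of a FASTA reader and an identity dotplot. The function names `parse_fasta` and `dotplot` are hypothetical and are not taken from `pydotter.py`; the dotplot marks single-residue identities only, which is an assumption about the simplest possible protocol.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def dotplot(x, y):
    """Return a matrix with dots[i][j] True wherever x[i] == y[j]."""
    return [[a == b for b in y] for a in x]


example = """>Name Free-form annotation
MGCLLIMKEGGPGRKHKLIVMLYLDENQ
EHELPIMTRAPPEDINADNAMACHINEW
NQEDLYMNILKHGPPGEDEDRKHEDEDG
"""
records = parse_fasta(example)
name, sequence = records[0]
```

Each diagonal run of True values in the `dotplot` output corresponds to one ungapped alignment of a stretch of the two sequences.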

Pairwise Alignment
How can we automate our dotplot protocol to find the “best” gapped alignment of our sequences?
What do we mean by best?
  Residues with equivalent functional roles are paired
  Residues that derive from the same position in the common ancestor are paired (homology)
  The sequence alignment maximizes a similarity function

Deriving scores from alignments
Frequency of residue i: $p_i$
Frequency of residue i aligned to residue j: $q_{ij}$
Expected frequency if i and j are independent: $p_i p_j$
Ratio of observed to expected frequency: $\frac{q_{ij}}{p_i p_j}$
Log odds (LOD) score: $s(i,j) = \log \frac{q_{ij}}{p_i p_j}$
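The derivation above can be sketched on a toy set of aligned residue pairs. The function name `lod_scores`, the toy data, and the choice of base-2 logarithms are all assumptions made for illustration; the slides do not specify a base. Pairs are counted in both orders so that $q_{ij} = q_{ji}$.

```python
import math
from collections import Counter


def lod_scores(aligned_pairs):
    """Estimate s(i, j) = log2(q_ij / (p_i * p_j)) from aligned residue pairs."""
    pair_counts = Counter()
    residue_counts = Counter()
    for a, b in aligned_pairs:
        # Count each column in both orders so the scores come out symmetric.
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
        residue_counts[a] += 1
        residue_counts[b] += 1
    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    scores = {}
    for (a, b), count in pair_counts.items():
        q_ab = count / total_pairs                   # observed pair frequency
        p_a = residue_counts[a] / total_residues     # background frequencies
        p_b = residue_counts[b] / total_residues
        scores[(a, b)] = math.log2(q_ab / (p_a * p_b))
    return scores


# Toy alignment columns (made-up data): mostly conserved, one substitution.
toy_pairs = [("A", "A"), ("A", "A"), ("G", "G"), ("A", "G")]
toy_scores = lod_scores(toy_pairs)
```

In this toy example, pairs seen more often than chance predicts (A with A) get a positive score, and pairs seen less often (A with G) get a negative one.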

PAM (Dayhoff) and BLOSUM matrices
PAM1 matrix originally calculated from manual alignments of highly conserved sequences (myoglobin, cytochrome C, etc.)
We can think of a PAM matrix as evolving a sequence by one unit of time.
If evolution is uniform over time, then PAM matrices for larger evolutionary steps can be generated by multiplying PAM1 by itself (so, higher numbered PAM matrices represent greater evolutionary distances).
The BLOSUM matrices were determined from automatically generated ungapped alignments. Higher numbered BLOSUM matrices correspond to smaller evolutionary distances. BLOSUM62 is the default matrix for BLAST.
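The "multiply PAM1 by itself" idea can be sketched with a made-up two-letter alphabet. The matrix `toy_pam1` below is invented for illustration and is not the real PAM1; note also that the multiplication applies to the mutation probability matrix, not to the log odds scores derived from it.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, steps):
    """Apply the one-step matrix m repeatedly, `steps` times."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = mat_mul(result, m)
    return result


# Hypothetical one-step matrix: each letter has a 1% chance of changing.
toy_pam1 = [[0.99, 0.01],
            [0.01, 0.99]]
toy_pam250 = mat_pow(toy_pam1, 250)
```

After 250 steps the diagonal of the toy matrix has decayed from 0.99 to just above 0.5, which is the sense in which higher numbered PAM matrices describe more diverged sequences.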

BLOSUM80

Fun with logarithms
In log space, multiplication and division become addition and subtraction:
    $\log(xy) = \log(x) + \log(y)$
    $\log(x/y) = \log(x) - \log(y)$
Therefore, exponentiation becomes multiplication:
    $\log(x^y) = y \log(x)$
Also, we can change the base of a logarithm like so:
    $\log_A(x) = \log(x) / \log(A)$
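These identities are easy to check numerically. The sketch below uses arbitrary example values and also shows one practical reason to work in log space: a long product of small probabilities underflows to zero in floating point, while the sum of their logarithms stays representable.

```python
import math

x, y, base = 8.0, 3.0, 2.0  # arbitrary example values

# Multiplication and division become addition and subtraction.
assert math.isclose(math.log(x * y), math.log(x) + math.log(y))
assert math.isclose(math.log(x / y), math.log(x) - math.log(y))
# Exponentiation becomes multiplication.
assert math.isclose(math.log(x ** y), y * math.log(x))
# Change of base.
assert math.isclose(math.log(x, base), math.log(x) / math.log(base))

# Multiplying 100 probabilities of 1e-5 underflows to exactly 0.0 ...
probs = [1e-5] * 100
product = 1.0
for p in probs:
    product *= p
# ... but the sum of their logs is an ordinary negative number.
log_sum = sum(math.log(p) for p in probs)
```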

Scoring an alignment
Log odds (LOD) score: $s(i,j) = \log \frac{q_{ij}}{p_i p_j}$
Multiplying independent probabilities is equivalent to adding independent log probabilities.
Therefore, an ungapped alignment can be scored as:
    $S(x,y) = \sum_i^N \log \frac{q_{x_i y_i}}{p_{x_i} p_{y_i}} = \sum_i^N s(x_i, y_i)$
What about gaps?
  Probability of an insertion/deletion event (gap opening, G)
  Length distribution of insertions/deletions (gap extension, E)
    $S_{gapped}(x,y) = S(x,y) + \sum_i^{gaps} (G + E \cdot L_i)$
We find an optimal alignment by finding x and y that maximize S.
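The gapped score can be sketched as a function over two already-aligned strings. The function name `score_alignment`, the use of `-` as the gap character, and the toy +1/-1 substitution scores are assumptions for illustration; a real substitution matrix such as BLOSUM62 would replace `toy_s`. G and E are passed as negative numbers, so each gap of length L contributes G + E * L.

```python
def score_alignment(x, y, s, gap_open, gap_extend):
    """Score two equal-length gapped strings; '-' marks a gap position."""
    if len(x) != len(y):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    in_gap_x = in_gap_y = False  # are we inside a run of gaps in x or in y?
    for a, b in zip(x, y):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if a == "-":
            # The first gap position pays G + E, later positions pay E only.
            total += gap_extend if in_gap_x else gap_open + gap_extend
            in_gap_x, in_gap_y = True, False
        elif b == "-":
            total += gap_extend if in_gap_y else gap_open + gap_extend
            in_gap_x, in_gap_y = False, True
        else:
            total += s(a, b)
            in_gap_x = in_gap_y = False
    return total


def toy_s(a, b):
    """Hypothetical substitution score: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


# Three matches (+3) and one gap of length 2 (G + 2E = -2 - 2 = -4).
one_gap = score_alignment("ACGTA", "A--TA", toy_s, gap_open=-2, gap_extend=-1)
# Two matches (+2) and two separate gaps of length 1 (2 * (G + E) = -6).
two_gaps = score_alignment("AC-T", "A-GT", toy_s, gap_open=-2, gap_extend=-1)
```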

How many ways can we align two sequences?
Binomial formula:
    $\binom{k}{r} = \frac{k!}{(k-r)!\,r!}$
    $\binom{2n}{n} = \frac{(2n)!}{n!\,n!}$
Stirling’s approximation:
    $x! \approx \sqrt{2\pi}\, x^{x+\frac{1}{2}} e^{-x}$
    $\binom{2n}{n} \approx \frac{2^{2n}}{\sqrt{\pi n}}$
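To get a feel for how fast this count grows, the exact binomial coefficient can be compared with the Stirling estimate. The function names below are hypothetical; `math.comb` requires Python 3.8 or later.

```python
import math


def exact_count(n):
    """Exact central binomial coefficient, (2n)! / (n! n!)."""
    return math.comb(2 * n, n)


def stirling_estimate(n):
    """Stirling-based estimate, 2^(2n) / sqrt(pi * n)."""
    return 2 ** (2 * n) / math.sqrt(math.pi * n)


for n in (2, 10, 100):
    exact = exact_count(n)
    estimate = stirling_estimate(n)
    print(n, exact, f"{estimate:.4g}", f"ratio={estimate / exact:.4f}")
```

The estimate slightly overshoots the exact value, and the ratio approaches 1 as n grows. Even for two sequences of length 100 the count is far too large to enumerate.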

Scoring an alignment quickly
Enumerating all $\frac{2^{2n}}{\sqrt{\pi n}}$ alignments is too expensive.
    $S_{gapped}(x,y) = S(x,y) + \sum_i^{gaps} (G + E \cdot L_i)$
The best alignment of any pair of subsequences is independent of the global alignment.

Dynamic Programming

Needleman-Wunsch Global Alignment
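A minimal sketch of Needleman-Wunsch global alignment follows. To keep it short it assumes a linear gap penalty (one fixed cost per gap position, equivalent to G = 0 in the affine scheme) and integer scores, so that the traceback can compare cells exactly. The function names and the toy +1/-1 scoring are illustrative assumptions, not the course's implementation.

```python
def needleman_wunsch(x, y, s, gap):
    """Return (score, aligned_x, aligned_y) for a global alignment.

    s(a, b) is an integer substitution score; gap is the (negative)
    integer cost of each gap position.
    """
    n, m = len(x), len(y)
    # F[i][j] holds the best score for aligning x[:i] with y[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # pair x[i-1] with y[j-1]
                F[i - 1][j] + gap,                        # gap in y
                F[i][j - 1] + gap,                        # gap in x
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


def toy_s(a, b):
    """Hypothetical substitution score: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


score, aligned_x, aligned_y = needleman_wunsch("ACGT", "ACT", toy_s, gap=-1)
```

Each cell is filled once from three already-computed neighbors, so the work grows with the product of the two sequence lengths rather than with the number of possible alignments. When several alignments tie for the best score, this traceback returns only one of them.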
