Sequence Alignment, Gerhard Jäger, ESSLLI 2016

Pairwise alignment: Computing the Levenshtein distance (Dynamic Programming)

        −   m   E   n   S
   −    0   1   2   3   4
   m    1   0   1   2   3
   e    2   1   1   2   3
   n    3   2   2   1   2
   E    4   3   2   2   2
   s    5   4   3   3   3

Memorizing in each step which of the three cells to the left and above gave rise to the current entry lets us recover the corresponding optimal alignment. More than one traceback path can be optimal; one of them yields the alignment

   m E n S −
   m e n E s
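The table and its traceback can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: instead of storing back-pointers it re-checks which neighbouring cell produced each entry, and it prefers the diagonal on ties. The function name `levenshtein_align` is hypothetical.

```python
def levenshtein_align(a: str, b: str) -> tuple[int, str, str]:
    """Levenshtein distance plus one optimal alignment ('-' marks a gap)."""
    n, m = len(a), len(b)
    # D[i][j] = distance between the prefixes a[:i] and b[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + (a[i - 1] != b[j - 1]),  # match (0) / mismatch (1)
                D[i - 1][j] + 1,                            # a[i-1] against a gap
                D[i][j - 1] + 1,                            # b[j-1] against a gap
            )
    # Traceback from the bottom-right cell to (0, 0), diagonal first on ties.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return D[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(levenshtein_align("mEnS", "menEs"))  # (3, 'mEn-S', 'menEs')
```

Because it prefers the diagonal on ties, this sketch returns mEn-S / menEs for the slide's example; that alignment also costs 3, so it is another optimal path through the same table.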

Pairwise alignment: Normalization for length

Normalization: divide the Levenshtein distance by the length of the longer string.

German mEnS (Mensch, 'person') and Hindi manuSya are (partially) cognate; German ze3n (sehen, 'see') and Hindi deg are not cognate.

$d_L(\text{mEnS}, \text{manuSya}) = 4$, $d_L(\text{ze3n}, \text{deg}) = 3$
$d_{LD}(\text{mEnS}, \text{manuSya}) = 4/7 \approx 0.57$, $d_{LD}(\text{ze3n}, \text{deg}) = 3/4 = 0.75$

The raw distance ranks the non-cognate pair as closer; the normalized distance ranks the cognate pair as closer.
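A minimal sketch of the normalized distance, reproducing the two examples. The function names are hypothetical, and returning 0.0 for two empty strings is an assumption made here to avoid dividing by zero.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, keeping only two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j - 1] + (ca != cb),  # match / mismatch
                            prev[j] + 1,               # ca against a gap
                            curr[j - 1] + 1))          # cb against a gap
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Levenshtein distance divided by the length of the longer string."""
    longest = max(len(a), len(b))
    return levenshtein_distance(a, b) / longest if longest else 0.0  # assumption: 0.0 for two empty strings


print(levenshtein_distance("mEnS", "manuSya"), round(normalized_levenshtein("mEnS", "manuSya"), 2))  # 4 0.57
print(levenshtein_distance("ze3n", "deg"), normalized_levenshtein("ze3n", "deg"))  # 3 0.75
```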

Pairwise alignment: How well does normalized Levenshtein distance predict cognacy?

[Figure: empirical probability of cognacy plotted against normalized Levenshtein distance (LDN), and the distribution of LDN for non-cognate (no) and cognate (yes) word pairs]

Pairwise alignment: Problems

The distinction between match and non-match is binary, so corresponding sounds count as mismatches even if they are aligned correctly. There is a substantial amount of chance similarities. Genuine sound correspondences in cognates are frequently missed.

[Figure: example word alignments illustrating these problems]

Pairwise alignment: Background: probability theory

Given two sequences, how likely is it that they are aligned? More general question: given some data and two competing hypotheses, how likely is it that the first hypothesis is correct? Bayesian inference!

Given: data $d$; hypotheses $h_1, h_0$; model $P(d \mid h_1)$, $P(d \mid h_0)$.
Wanted: $P(h_1 \mid d) : P(h_0 \mid d)$.

Pairwise alignment: Bayesian inference

Bayes' theorem:
$$P(h \mid d) = \frac{P(d \mid h)\, P(h)}{\sum_{h'} P(d \mid h')\, P(h')}$$
Ergo:
$$P(h_1 \mid d) : P(h_0 \mid d) = P(d \mid h_1)\, P(h_1) : P(d \mid h_0)\, P(h_0)$$
$$\frac{P(h_1 \mid d)}{P(h_0 \mid d)} = \frac{P(d \mid h_1)\, P(h_1)}{P(d \mid h_0)\, P(h_0)}$$
$$\log\left(P(h_1 \mid d) : P(h_0 \mid d)\right) = \log\frac{P(d \mid h_1)}{P(d \mid h_0)} + \log\frac{P(h_1)}{P(h_0)}$$

Pairwise alignment: Bayesian inference

Main argument against using Bayes' rule: the prior probabilities $P(h_1)$, $P(h_0)$ are not known. There are various heuristics, but no generally accepted way to obtain them.

If $n$ is large, though, $\log P(h_1)/P(h_0)$ does not matter very much. Assuming the data points $d_1, \ldots, d_n$ are independent given each hypothesis:
$$\log\left(P(h_1 \mid \vec d) : P(h_0 \mid \vec d)\right) \approx \log\left(P(\vec d \mid h_1) : P(\vec d \mid h_0)\right) = \sum_{i=1}^{n} \log\frac{P(d_i \mid h_1)}{P(d_i \mid h_0)}$$
The quantity $\log(P(\vec d \mid h_1) : P(\vec d \mid h_0))$ is called the log-odds.

Footnote: if we choose an uninformative prior with $P(h_1) = P(h_0)$, we have $\log P(h_1)/P(h_0) = 0$ anyway.
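A small numerical sketch of why the prior term matters little for large $n$. The likelihood values and the function name `posterior_log_odds` are hypothetical; the data points are assumed independent given each hypothesis, and natural logarithms are used.

```python
import math


def posterior_log_odds(lik_h1: list[float], lik_h0: list[float], prior_h1: float = 0.5) -> float:
    """log(P(h1|d) : P(h0|d)) for independent data points d_1..d_n.

    lik_h1[i] = P(d_i | h1), lik_h0[i] = P(d_i | h0).
    """
    log_odds = sum(math.log(p1 / p0) for p1, p0 in zip(lik_h1, lik_h0))  # data term
    prior_term = math.log(prior_h1 / (1.0 - prior_h1))                   # log P(h1)/P(h0)
    return log_odds + prior_term


# Each observation is twice as likely under h1 as under h0.
n = 100
with_flat_prior = posterior_log_odds([0.2] * n, [0.1] * n)                    # prior term is 0
with_skeptical_prior = posterior_log_odds([0.2] * n, [0.1] * n, prior_h1=0.1)  # prior term is -log 9
print(with_flat_prior, with_skeptical_prior)
```

With 100 observations the data term (about 69.3) dwarfs even a fairly skeptical prior term (about −2.2).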

Pairwise alignment: Log-odds

Log-odds can take any real value. A positive value indicates evidence for $h_1$, a negative value evidence for $h_0$; the higher the absolute value, the stronger the evidence.

Pairwise alignment: Weighted alignment

Suppose our data are two aligned sequences $\vec x, \vec y$.
$h_1$: they developed from a common ancestor via substitutions.
$h_0$: they are unrelated.
For the time being, we assume there are no gaps in the alignment. Additional assumption (a rough approximation in biology, pretty much off the mark in linguistics): substitutions in different positions occur independently.

Pairwise alignment: The null model

As a start (quite wrong both in biology and in linguistics), let us assume the strings have no "grammar": each position is independent from all other positions. If $\vec x$ and $\vec y$ are unrelated, their joint probability then equals the product of their individual probabilities:
$$P(\vec x, \vec y \mid h_0) = P(\vec x \mid h_0)\, P(\vec y \mid h_0) = \prod_i P(x_i \mid h_0)\, P(y_i \mid h_0)$$
$$\log P(\vec x, \vec y \mid h_0) = \sum_i \left(\log P(x_i \mid h_0) + \log P(y_i \mid h_0)\right)$$

Pairwise alignment: The null model

Suppose $\vec x$ and $\vec y$ are generated by the same process (reasonable for DNA and protein comparison, false for cross-linguistic word comparison). Then $P(x_i \mid h_0)$ and $P(y_i \mid h_0)$ are simply the probabilities of occurrence.
$q_a$: probability that symbol $a$ occurs in a sequence.
$$\log P(\vec x, \vec y \mid h_0) = \sum_i \log q_{x_i} + \sum_j \log q_{y_j}$$
$q$ can be estimated from relative frequencies.
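A sketch of the null model with $q$ estimated from relative frequencies. The toy DNA sequences and the function names `estimate_q` and `null_log_prob` are hypothetical; natural logarithms are assumed.

```python
import math
from collections import Counter


def estimate_q(sequences: list[str]) -> dict[str, float]:
    """q_a as the relative frequency of symbol a over all sequences."""
    counts = Counter(symbol for seq in sequences for symbol in seq)
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


def null_log_prob(x: str, y: str, q: dict[str, float]) -> float:
    """log P(x, y | h0) = sum_i log q_{x_i} + sum_j log q_{y_j}."""
    return sum(math.log(q[a]) for a in x) + sum(math.log(q[b]) for b in y)


q = estimate_q(["ACGT", "AACG"])  # A: 3/8, C: 2/8, G: 2/8, T: 1/8
print(q)
print(null_log_prob("AC", "GA", q))
```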

Pairwise alignment: The alignment model

Suppose $\vec x$ and $\vec y$ evolved from a common ancestor via independent substitution mutations (independence between positions):
$$P(\vec x, \vec y \mid h_1) = \prod_i P(x_i, y_i \mid h_1)$$
$p_{a,b}$: probability that a position in the latest common ancestor of $\vec x$ and $\vec y$ evolved into an $a$ in sequence $\vec x$ and into a $b$ in sequence $\vec y$.
$$P(\vec x, \vec y \mid h_1) = \prod_i p_{x_i, y_i}$$
$$\log P(\vec x, \vec y \mid h_1) = \sum_i \log p_{x_i, y_i}$$

Pairwise alignment: The log-odds score

Taking things together, we have
$$\log\left(P(\vec x, \vec y \mid h_1) : P(\vec x, \vec y \mid h_0)\right) = \sum_i \log\frac{p_{x_i, y_i}}{q_{x_i}\, q_{y_i}}$$
$\log\frac{p_{ab}}{q_a q_b}$: score of the alignment of $a$ with $b$. It also goes by the name of Pointwise Mutual Information (PMI); the scores are assembled in a PMI substitution matrix.
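A sketch of the log-odds score of a gap-free alignment as a sum of PMI scores. The two-symbol alphabet and its probabilities are made up for illustration; base-2 logarithms are assumed (the ASJP recipe later in the slides uses $\log_2$).

```python
import math


def pmi(p_ab: float, q_a: float, q_b: float) -> float:
    """Score of aligning a with b: log2(p_ab / (q_a * q_b))."""
    return math.log2(p_ab / (q_a * q_b))


def gapless_log_odds(x: str, y: str, p: dict[tuple[str, str], float], q: dict[str, float]) -> float:
    """Sum of PMI scores over the positions of a gap-free alignment."""
    if len(x) != len(y):
        raise ValueError("a gap-free alignment needs sequences of equal length")
    return sum(pmi(p[(a, b)], q[a], q[b]) for a, b in zip(x, y))


# Hypothetical alphabet: identical symbols co-occur more often than chance (0.25).
q = {"a": 0.5, "b": 0.5}
p = {("a", "a"): 0.4, ("b", "b"): 0.4, ("a", "b"): 0.1, ("b", "a"): 0.1}
print(gapless_log_odds("ab", "ab", p, q))  # positive: evidence for h1
print(gapless_log_odds("ab", "ba", p, q))  # negative: evidence for h0
```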

Pairwise alignment: Substitution matrices

In bioinformatics, there are several commonly used substitution matrices for nucleotides and proteins, based on explicit models of evolution and careful empirical testing. For nucleotides:

        A    G    T    C
   A    2   −5   −7   −7
   G   −5    2   −7   −7
   T   −7   −7    2   −5
   C   −7   −7   −5    2
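As an illustration, the nucleotide matrix can score a gap-free alignment by summing the entries for each aligned pair. This is a sketch, not a standard library API; the names `SCORE` and `gapless_score` are hypothetical. The less-penalized off-diagonal pairs, A/G and T/C, are the transitions.

```python
# Nucleotide substitution matrix from the slide, rows/columns in the order A, G, T, C.
ORDER = "AGTC"
MATRIX = [
    [2, -5, -7, -7],
    [-5, 2, -7, -7],
    [-7, -7, 2, -5],
    [-7, -7, -5, 2],
]
SCORE = {(a, b): MATRIX[i][j] for i, a in enumerate(ORDER) for j, b in enumerate(ORDER)}


def gapless_score(x: str, y: str) -> int:
    """Total substitution score of two equal-length DNA sequences aligned without gaps."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(SCORE[(a, b)] for a, b in zip(x, y))


print(gapless_score("GATTACA", "GATTACA"))  # 14
print(gapless_score("GATTACA", "GACTATA"))  # 0: five matches, two T/C transitions
```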

Pairwise alignment: Substitution matrices

For proteins: different matrices for different evolutionary distances, for instance BLOSUM50.

Pairwise alignment: Substitution matrix for the ASJP data

1. Identify a large sample of pairs of closely related languages (using expert information or heuristics based on aggregated Levenshtein distance).

[Table: sample of language pairs, e.g. IE.INDIC.WAD_PAGGA / IE.INDIC.TALAGANG_HINDKO, AuA.MUNDA.HO / AuA.MUNDA.KORKU, MGe.GE-KAINGANG.KAYAPO / MGe.GE-KAINGANG.APINAYE, WF.WESTERN_FLY.IAMEGA / WF.WESTERN_FLY.GAMAEWE, ST.BAI.QILIQIAO_BAI_2 / ST.BAI.YUNLONG_BAI]

Pairwise alignment: Substitution matrix for the ASJP data

2. Pick a concept and a pair of related languages at random.
   Languages: Pen.MAIDUAN.MAIDU_KONKAU, Pen.MAIDUAN.NE_MAIDU; concept: one
3. Find the corresponding words from the two languages: nisam, niSem
4. Do a Levenshtein alignment:
   n i s a m
   n i S e m
5. For each sound pair, count the number of correspondences: nn: 1; ii: 1; sS: 1; ae: 1; mm: 1
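Steps 2-5 can be sketched as below. The word-list layout (`wordlists[language][concept] -> word`), the function names, and the decision to skip gap columns when counting are assumptions for illustration, not the author's code. Run on the slide's example, it reproduces the counts nn, ii, sS, ae, mm.

```python
import random
from collections import Counter


def levenshtein_alignment(a: str, b: str) -> tuple[str, str]:
    """One optimal Levenshtein alignment of a and b ('-' marks a gap)."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + (a[i - 1] != b[j - 1]), D[i - 1][j] + 1, D[i][j - 1] + 1)
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:  # traceback, diagonal first on ties
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def sample_correspondences(wordlists: dict[str, dict[str, str]], related_pairs: list[tuple[str, str]],
                           n_samples: int, rng: random.Random) -> Counter:
    """Hypothetical helper for steps 2-5: sample, align, and count sound pairs."""
    counts = Counter()
    for _ in range(n_samples):
        lang1, lang2 = rng.choice(related_pairs)                             # step 2: language pair
        shared = sorted(wordlists[lang1].keys() & wordlists[lang2].keys())
        if not shared:
            continue
        concept = rng.choice(shared)                                         # step 2: concept
        top, bottom = levenshtein_alignment(wordlists[lang1][concept],       # steps 3-4
                                            wordlists[lang2][concept])
        for s1, s2 in zip(top, bottom):                                      # step 5
            if s1 != "-" and s2 != "-":  # assumption: gap columns are not counted
                counts[(s1, s2)] += 1
    return counts


wordlists = {"Pen.MAIDUAN.MAIDU_KONKAU": {"one": "nisam"}, "Pen.MAIDUAN.NE_MAIDU": {"one": "niSem"}}
pairs = [("Pen.MAIDUAN.MAIDU_KONKAU", "Pen.MAIDUAN.NE_MAIDU")]
print(sample_correspondences(wordlists, pairs, n_samples=1, rng=random.Random(0)))
```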

Pairwise alignment: Substitution matrix for the ASJP data

Steps 2-5 are repeated 100,000 times.

[Table: sampled word alignments and the resulting counts for each sound pair, e.g. a/a: 56,047]

Pairwise alignment: Substitution matrix for the ASJP data

6. Determine the relative frequency of occurrence of each sound within the entire database.

[Table: relative frequency of each sound]

Pairwise alignment: Substitution matrix for the ASJP data

7. Estimate $p_{ab}$ as the relative frequency of co-occurrence of $a$ with $b$, and $q_a$, $q_b$ as the individual relative frequencies, and determine the PMI scores
$$\log_2 \frac{p_{ab}}{q_a\, q_b}$$

[Table: resulting PMI scores for sound pairs]
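Steps 6 and 7 can be sketched as follows, taking pair counts like those produced in steps 2-5 as input. The function names and the toy data are hypothetical. Pairs that were never sampled get no entry here (their $\log_2 0$ is undefined); the slides do not say how such pairs are handled.

```python
import math
from collections import Counter


def sound_frequencies(words: list[str]) -> dict[str, float]:
    """Step 6: relative frequency q_a of each sound in the whole database."""
    counts = Counter(sound for word in words for sound in word)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def pmi_scores(pair_counts: Counter, q: dict[str, float]) -> dict[tuple[str, str], float]:
    """Step 7: p_ab as relative co-occurrence frequency; score = log2(p_ab / (q_a * q_b))."""
    total = sum(pair_counts.values())
    return {(a, b): math.log2((c / total) / (q[a] * q[b])) for (a, b), c in pair_counts.items()}


q = sound_frequencies(["ab", "ba"])                      # q_a = q_b = 0.5
scores = pmi_scores(Counter({("a", "a"): 3, ("a", "b"): 1}), q)
print(scores)                                            # a/a: log2(3), a/b: 0
```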

Pairwise alignment: Evaluation

[Figure: PMI scores for all pairs of ASJP sound classes, color scale from −10 to 10]


Pairwise alignment: Gap penalties

Gaps in an alignment correspond either to an insertion or to a deletion.
Simplified assumption: insertions and deletions are equally likely at all positions; symbols are inserted according to their general frequency of occurrence.
Suppose an item $x_i$ is aligned to a gap. Let $\alpha$ be the probability that an insertion occurred since the latest common ancestor, and $\beta$ the probability of a deletion.
$$P(x_i, - \mid h_1) = \alpha q_{x_i} + \beta q_{x_i}$$
$$P(x_i, - \mid h_0) = q_{x_i}$$
$$\log\left(P(x_i, - \mid h_1) : P(x_i, - \mid h_0)\right) = \log(\alpha + \beta) = -d$$
As $\alpha + \beta < 1$, this term is negative, i.e. there is a constant penalty for each gap.
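A quick numerical check of the derivation, showing that $q_{x_i}$ cancels so every gap gets the same score $-d$. The rates $\alpha = \beta = 0.125$ are made-up values, base-2 logarithms are assumed, and the function names are hypothetical.

```python
import math


def gap_log_odds(alpha: float, beta: float, q_x: float) -> float:
    """log2(P(x_i, - | h1) : P(x_i, - | h0)) = log2((alpha*q_x + beta*q_x) / q_x)."""
    return math.log2((alpha * q_x + beta * q_x) / q_x)


def gap_penalty_d(alpha: float, beta: float) -> float:
    """d = -log2(alpha + beta), positive whenever alpha + beta < 1."""
    if not 0.0 < alpha + beta < 1.0:
        raise ValueError("expected 0 < alpha + beta < 1")
    return -math.log2(alpha + beta)


for q_x in (0.01, 0.1, 0.5):                 # different symbol frequencies...
    print(q_x, gap_log_odds(0.125, 0.125, q_x))  # ...same log-odds (about -2)
print(gap_penalty_d(0.125, 0.125))            # 2.0
```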

Pairwise alignment: Affine gap penalties

Deletions/insertions frequently apply to entire blocks of symbols (both in biology and in linguistics): the probability of a gap of length $n$ is higher than the product of the probabilities of $n$ individual gaps. Hence the penalty $e$ for extending a gap is lower than the penalty $d$ for opening a gap. With $g$ the length of a gap:
$$\gamma(g) = -d - (g - 1)\, e$$
There is no principled way to derive the values of $d$ and $e$; they have to be fixed via trial and error. $d = 2.5$ and $e = 1.6$ work quite well for the ASJP data.
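A sketch of $\gamma(g)$ with the ASJP values $d = 2.5$ and $e = 1.6$, next to a hypothetical linear scheme that charges $d$ for every gap position. The function names are illustrative only.

```python
def affine_gap_score(g: int, d: float = 2.5, e: float = 1.6) -> float:
    """gamma(g) = -d - (g - 1) * e for a gap of length g >= 1."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -d - (g - 1) * e


def linear_gap_score(g: int, d: float = 2.5) -> float:
    """For comparison: every gap position charged the opening penalty d."""
    return -d * g


for g in range(1, 5):
    print(g, round(affine_gap_score(g), 2), round(linear_gap_score(g), 2))
```

The affine values −2.5, −4.1, −5.7, −7.3 are the entries in the gap row of the dynamic-programming table for mEnS/menEs; a block of four gaps costs much less than four separate gaps (−10).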

Pairwise alignment: Weighted alignment

So far, we assumed that the alignment between $\vec x$ and $\vec y$ is known. To assess the strength of evidence for $h_1$ given $\vec x, \vec y$, we need to consider all alignments between $\vec x$ and $\vec y$.
Enumeration is infeasible, because the number of alignments between two sequences of length $n$ is
$$\binom{2n}{n} = \frac{(2n)!}{(n!)^2} \approx \frac{2^{2n}}{\sqrt{\pi n}}$$
Computation is nonetheless possible using Pair Hidden Markov Models.
Simpler task: find the most likely alignment and determine its log-odds!
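A quick check of how fast the count grows and how close the approximation is, using the formula as stated on the slide. The function names are hypothetical.

```python
import math


def number_of_alignments(n: int) -> int:
    """Exact count from the slide: binom(2n, n) = (2n)! / (n!)^2."""
    return math.comb(2 * n, n)


def approx_number_of_alignments(n: int) -> float:
    """Approximation 2^(2n) / sqrt(pi * n)."""
    return 2 ** (2 * n) / math.sqrt(math.pi * n)


for n in (5, 10, 50):
    exact = number_of_alignments(n)
    print(n, exact, f"{approx_number_of_alignments(n) / exact:.4f}")
```

Already for two words of length 50 there are more than $10^{29}$ alignments, while the ratio of approximation to exact value approaches 1.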

Pairwise alignment: The Needleman-Wunsch algorithm

Almost identical to the Levenshtein algorithm, except: matches/mismatches are counted not as 0 and 1, but as the log-odds score of the corresponding symbol pair; insertions/deletions are counted as gap penalties; and, by convention, the similarity rather than the distance is counted, i.e. we try to find the alignment that maximizes the score.
Let $\vec x$ have length $n$, $\vec y$ length $m$, let $s_{ab}$ be the log-odds score of $a$ and $b$, and let $d$/$e$ be the gap penalties.

Pairwise alignment: The Needleman-Wunsch algorithm

$G(i, j) = 1$ records that cell $(i, j)$ was reached via a gap, so that a following gap counts as an extension. Since $d$ and $e$ are positive penalties, they are subtracted, in line with $\gamma(g) = -d - (g-1)e$.
$$F(0,0) = 0, \qquad G(0,0) = 0$$
$$\forall i: 0 < i \le n: \quad F(i,0) = F(i-1,0) - G(i-1,0)\, e - (1 - G(i-1,0))\, d, \qquad G(i,0) = 1$$
$$\forall j: 0 < j \le m: \quad F(0,j) = F(0,j-1) - G(0,j-1)\, e - (1 - G(0,j-1))\, d, \qquad G(0,j) = 1$$
$$\forall i, j: 0 < i \le n,\ 0 < j \le m:$$
$$F(i,j) = \max\begin{cases} F(i-1,j) - G(i-1,j)\, e - (1 - G(i-1,j))\, d \\ F(i,j-1) - G(i,j-1)\, e - (1 - G(i,j-1))\, d \\ F(i-1,j-1) + s_{x_i y_j} \end{cases}$$
$$G(i,j) = \begin{cases} 0 & \text{if the maximum is attained by the third line (the diagonal)} \\ 1 & \text{else} \end{cases}$$
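A minimal Python sketch of the recursion as stated, with traceback. It keeps the single score matrix F plus the gap flag G, so a gap in one sequence directly after a gap in the other is also charged the extension penalty. Unlike Gotoh's three-matrix formulation, this simplified version is not guaranteed to find the best affine-gap alignment in every case. `toy_score` is a hypothetical stand-in for a PMI matrix, and the function names are illustrative.

```python
from typing import Callable


def needleman_wunsch(x: str, y: str, s: Callable[[str, str], float],
                     d: float = 2.5, e: float = 1.6) -> tuple[float, str, str]:
    """Best score and one best alignment following the F/G recursion."""
    n, m = len(x), len(y)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    G = [[0] * (m + 1) for _ in range(n + 1)]      # 1 = cell reached via a gap
    move = [[""] * (m + 1) for _ in range(n + 1)]  # "diag", "up" or "left"

    def gap_cost(g: int) -> float:
        return e if g else d  # extend after a gap, otherwise open

    for i in range(1, n + 1):
        F[i][0], G[i][0], move[i][0] = F[i - 1][0] - gap_cost(G[i - 1][0]), 1, "up"
    for j in range(1, m + 1):
        F[0][j], G[0][j], move[0][j] = F[0][j - 1] - gap_cost(G[0][j - 1]), 1, "left"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [  # diagonal listed first, so max() prefers it on ties
                (F[i - 1][j - 1] + s(x[i - 1], y[j - 1]), "diag"),
                (F[i - 1][j] - gap_cost(G[i - 1][j]), "up"),
                (F[i][j - 1] - gap_cost(G[i][j - 1]), "left"),
            ]
            F[i][j], move[i][j] = max(candidates, key=lambda c: c[0])
            G[i][j] = 0 if move[i][j] == "diag" else 1

    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:  # follow the remembered moves back to (0, 0)
        if move[i][j] == "diag":
            top.append(x[i - 1]); bottom.append(y[j - 1]); i, j = i - 1, j - 1
        elif move[i][j] == "up":
            top.append(x[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def toy_score(a: str, b: str) -> float:
    """Hypothetical stand-in for PMI scores: +2 for identical symbols, -1 otherwise."""
    return 2.0 if a == b else -1.0


print(needleman_wunsch("abcd", "ad", toy_score))  # about (-0.1, 'abcd', 'a--d')
```

With the default $d = 2.5$, $e = 1.6$, the two-symbol gap costs $-4.1$, so keeping it contiguous beats splitting it around a mismatch.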

Pairwise alignment: Finding the best alignment (Dynamic Programming)

        −       m       E       n       S
   −    0      −2.5    −4.1    −5.7    −7.3
   m   −2.5     4.13    1.53    0.03   −1.47
   e   −4.1     1.53    5.65    3.05    1.55
   n   −5.7     0.03    3.05    9.2     6.6
   E   −7.3    −1.47    4.75    6.6     7.62
   s   −8.9    −2.97    2.15    5.1     8.84

Memorizing in each step which of the three cells to the left and above gave rise to the current entry lets us recover the corresponding optimal alignment. Here the traceback runs through 8.84, 6.6, 9.2, 5.65 and 4.13, which yields

   m E n − S
   m e n E s

Pairwise alignment: Evaluation

[Table: alignments of Latin and German words for the same concepts (e.g. manus/hant, nasus/naz3, okulus/aug3, kornu/horn, piskis/fiS, kanis/hunt); left: Levenshtein alignment; right: Needleman-Wunsch alignment]

Pairwise alignment: Evaluation

[Table, continued: e.g. nomen/nam3, akwa/vas3r, lapis/Stain, mons/bErk; left: Levenshtein alignment; right: Needleman-Wunsch alignment]
