Pairwise alignment
Gerhard Jäger, Sequence Alignment, ESSLLI 2016

Computing the Levenshtein distance

Dynamic programming, with mEnS along the columns and menEs along the rows:

        -   m   E   n   S
    -   0   1   2   3   4
    m   1   0   1   2   3
    e   2   1   1   2   3
    n   3   2   2   1   2
    E   4   3   2   2   2
    s   5   4   3   3   3

Memorizing in each step which of the three neighboring cells (left, above, diagonally above-left) gave rise to the current entry lets us recover the corresponding optimal alignment:

    m E n S -
    m e n E s

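The table-filling and traceback described on this slide can be sketched in Python as follows. This is a minimal illustration, not the author's implementation; the function name `levenshtein_alignment`, the gap symbol `-`, and the tie-breaking order (diagonal first) are assumptions made for the example.

```python
def levenshtein_alignment(x, y, gap="-"):
    """Return (distance, aligned_x, aligned_y) for two strings."""
    n, m = len(x), len(y)
    # dist[i][j]: edit distance between x[:i] and y[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    # back[i][j]: which neighboring cell gave rise to dist[i][j]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0], back[i][0] = i, "up"
    for j in range(1, m + 1):
        dist[0][j], back[0][j] = j, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [
                (dist[i - 1][j - 1] + (x[i - 1] != y[j - 1]), "diag"),
                (dist[i - 1][j] + 1, "up"),
                (dist[i][j - 1] + 1, "left"),
            ]
            # On ties the first candidate wins (an arbitrary choice).
            dist[i][j], back[i][j] = min(candidates, key=lambda c: c[0])

    # Follow the stored pointers from the bottom-right cell back to the origin.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif move == "up":
            ax.append(x[i - 1])
            ay.append(gap)
            i -= 1
        else:
            ax.append(gap)
            ay.append(y[j - 1])
            j -= 1
    return dist[n][m], "".join(reversed(ax)), "".join(reversed(ay))
```

Several alignments can be optimal at the same distance, so the alignment returned depends on the tie-breaking order.
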
Normalization for length

German mEnS (Mensch, "person") and Hindi manuSya are (partially) cognate
German ze3n (sehen, "see") and Hindi deg are not cognate
still:
    $d_L(\text{mEnS}, \text{manuSya}) = 4$        $d_L(\text{ze3n}, \text{deg}) = 3$
normalization: dividing the Levenshtein distance by the length of the longer string:
    $d_{LD}(\text{mEnS}, \text{manuSya}) = 4/7 \approx 0.57$        $d_{LD}(\text{ze3n}, \text{deg}) = 3/4 = 0.75$

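The normalization can be checked with a short sketch. The function names `levenshtein` and `normalized_levenshtein` are chosen for this example, and returning 0.0 for two empty strings is an assumption the slide does not address.

```python
def levenshtein(x, y):
    """Plain Levenshtein distance, keeping only two rows of the table."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        curr = [i]
        for j, b in enumerate(y, start=1):
            curr.append(min(prev[j - 1] + (a != b),  # match / substitution
                            prev[j] + 1,             # gap in y
                            curr[j - 1] + 1))        # gap in x
        prev = curr
    return prev[-1]


def normalized_levenshtein(x, y):
    """Levenshtein distance divided by the length of the longer string."""
    longest = max(len(x), len(y))
    return levenshtein(x, y) / longest if longest else 0.0
```

With the two word pairs from the slide, `normalized_levenshtein("mEnS", "manuSya")` gives 4/7 and `normalized_levenshtein("ze3n", "deg")` gives 3/4.
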
How well does normalized Levenshtein distance predict cognacy?

[Figure, two panels: empirical probability of cognacy as a function of LDN; distribution of LDN for cognate (yes) vs. non-cognate (no) word pairs.]

Problems

binary distinction: match vs. non-match
frequently, genuine sound correspondences in cognates are missed
corresponding sounds count as mismatches even if they are aligned correctly
substantial amount of chance similarities
[Example alignments on this slide are not recoverable from the extraction; surviving fragments: "c v a i n a z 3 - - p f i S - - t u n - o s" and "h a n t h a n h s E n d m a n o".]

Background: probability theory

Given two sequences: How likely is it that they are aligned?
More general question: Given some data, and two competing hypotheses, how likely is it that the first hypothesis is correct?
Bayesian inference!
given:
    data: $d$
    hypotheses: $h_1, h_0$
    model: $P(d \mid h_1), P(d \mid h_0)$
wanted:
    $P(h_1 \mid d) : P(h_0 \mid d)$

Bayesian inference

Bayes' theorem:
\[ P(h \mid d) = \frac{P(d \mid h)\,P(h)}{\sum_{h'} P(d \mid h')\,P(h')} \]
ergo:
\[ P(h_1 \mid d) : P(h_0 \mid d) = P(d \mid h_1)\,P(h_1) : P(d \mid h_0)\,P(h_0) \]
\[ P(h_1 \mid d) : P(h_0 \mid d) = \frac{P(d \mid h_1)\,P(h_1)}{P(d \mid h_0)\,P(h_0)} \]
\[ \log\bigl(P(h_1 \mid d) : P(h_0 \mid d)\bigr) = \log\frac{P(d \mid h_1)}{P(d \mid h_0)} + \log\frac{P(h_1)}{P(h_0)} \]

Bayesian inference

suppose we have many independent data: $\vec{d} = d_1, \ldots, d_n$
\[ P(\vec{d} \mid h) = \prod_{i=1}^{n} P(d_i \mid h) \]
\[ \log P(\vec{d} \mid h) = \sum_{i=1}^{n} \log P(d_i \mid h) \]
\[ \log\frac{P(\vec{d} \mid h_1)}{P(\vec{d} \mid h_0)} = \sum_{i=1}^{n} \log\frac{P(d_i \mid h_1)}{P(d_i \mid h_0)} \]
\[ \log\bigl(P(h_1 \mid \vec{d}) : P(h_0 \mid \vec{d})\bigr) = \sum_{i=1}^{n} \log\frac{P(d_i \mid h_1)}{P(d_i \mid h_0)} + \log\frac{P(h_1)}{P(h_0)} \]

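The last formula on this slide adds up per-datum log likelihood ratios and a log prior ratio. A minimal numerical sketch follows; the function name `log_posterior_odds`, the use of natural logarithms, and the assumption that $h_1$ and $h_0$ are the only two hypotheses (so $P(h_0) = 1 - P(h_1)$) are choices made for this illustration.

```python
import math


def log_posterior_odds(lik_h1, lik_h0, prior_h1=0.5):
    """log(P(h1|d) : P(h0|d)) for independent data points.

    lik_h1[i] and lik_h0[i] are P(d_i|h1) and P(d_i|h0).
    Assumes exactly two hypotheses, so P(h0) = 1 - P(h1).
    """
    log_likelihood_ratio = sum(
        math.log(p1 / p0) for p1, p0 in zip(lik_h1, lik_h0)
    )
    log_prior_ratio = math.log(prior_h1 / (1.0 - prior_h1))
    return log_likelihood_ratio + log_prior_ratio
```

With the default `prior_h1=0.5` the prior term is zero, so the result equals the sum of the log likelihood ratios.
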
Bayesian inference

main argument against using Bayes' rule: the prior probabilities $P(h_1), P(h_0)$ are not known
there are various heuristics, but no generally accepted way to obtain them
if $n$ is large, though, $\log P(h_1)/P(h_0)$ doesn't matter very much: [1]
\[ \log\bigl(P(h_1 \mid \vec{d}) : P(h_0 \mid \vec{d})\bigr) \approx \sum_{i=1}^{n} \log\frac{P(d_i \mid h_1)}{P(d_i \mid h_0)} = \log\bigl(P(\vec{d} \mid h_1) : P(\vec{d} \mid h_0)\bigr) \]
the quantity $\log\bigl(P(\vec{d} \mid h_1) : P(\vec{d} \mid h_0)\bigr)$ is called log-odds

[1] Also, if we choose an uninformative prior with $P(h_1) = P(h_0)$, we have $\log P(h_1)/P(h_0) = 0$ anyway.

Log-odds

log-odds can take any real value
a positive value indicates evidence for $h_1$ and a negative value evidence for $h_0$
the higher the absolute value, the stronger the evidence

Weighted alignment

suppose our data are two aligned sequences $\vec{x}, \vec{y}$
    $h_1$: they developed from a common ancestor via substitutions
    $h_0$: they are unrelated
for the time being, we assume there are no gaps in the alignment
additional assumption (rough approximation in biology, pretty much off the mark in linguistics): substitutions in different positions occur independently

The null model

as a start (quite wrong both in biology and in linguistics): let us assume the strings have no "grammar"; each position is independent from all other positions
then, if $\vec{x}$ and $\vec{y}$ are unrelated, their joint probability equals the product of their individual probabilities:
\[ P(\vec{x}, \vec{y} \mid h_0) = P(\vec{x} \mid h_0)\,P(\vec{y} \mid h_0) = \prod_i P(x_i \mid h_0)\,P(y_i \mid h_0) \]
\[ \log P(\vec{x}, \vec{y} \mid h_0) = \sum_i \bigl(\log P(x_i \mid h_0) + \log P(y_i \mid h_0)\bigr) \]

The null model

suppose $\vec{x}$ and $\vec{y}$ are generated by the same process (reasonable for DNA and protein comparison, false for cross-linguistic word comparison)
then $P(x_i \mid h_0), P(y_i \mid h_0)$ are simply the probabilities of occurrence
$q_a$: probability that symbol $a$ occurs in a sequence
\[ \log P(\vec{x}, \vec{y} \mid h_0) = \sum_i \log q_{x_i} + \sum_j \log q_{y_j} \]
$q$ can be estimated from relative frequencies

The alignment model

suppose $\vec{x}$ and $\vec{y}$ evolved from a common ancestor via independent substitution mutations
independence between positions:
\[ P(\vec{x}, \vec{y} \mid h_1) = \prod_i P(x_i, y_i \mid h_1) \]
$p_{a,b}$: probability that a position in the latest common ancestor of $\vec{x}$ and $\vec{y}$ evolved into an $a$ in sequence $\vec{x}$ and into a $b$ in sequence $\vec{y}$
\[ P(\vec{x}, \vec{y} \mid h_1) = \prod_i p_{x_i, y_i} \]
\[ \log P(\vec{x}, \vec{y} \mid h_1) = \sum_i \log p_{x_i, y_i} \]

The log-odds score

taking things together, we have
\[ \log\bigl(P(\vec{x}, \vec{y} \mid h_1) : P(\vec{x}, \vec{y} \mid h_0)\bigr) = \sum_i \log\frac{p_{x_i, y_i}}{q_{x_i}\,q_{y_i}} \]
$\log\frac{p_{ab}}{q_a q_b}$: score of the alignment of $a$ with $b$
also goes by the name of Pointwise Mutual Information (PMI)
assembled in a PMI substitution matrix

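The log-odds score of a gapless alignment is the sum of the per-position PMI scores. The sketch below assumes that `p` is a dictionary from symbol pairs to joint probabilities and `q` a dictionary from symbols to occurrence probabilities; both containers, the function names, and the choice of base-2 logarithms are assumptions for this illustration.

```python
import math


def pmi(a, b, p, q):
    """Log-odds (PMI) score of aligning symbol a with symbol b, in bits."""
    return math.log2(p[(a, b)] / (q[a] * q[b]))


def gapless_log_odds(x, y, p, q):
    """Sum of PMI scores over the positions of two equally long sequences."""
    if len(x) != len(y):
        raise ValueError("a gapless alignment needs sequences of equal length")
    return sum(pmi(a, b, p, q) for a, b in zip(x, y))
```

A pair that co-occurs more often than chance ($p_{ab} > q_a q_b$) gets a positive score, and a pair that co-occurs less often gets a negative one.
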
Substitution matrices

in bioinformatics, several commonly used substitution matrices for nucleotides and proteins
based on explicit models of evolution and careful empirical testing
for nucleotides:

         A    G    T    C
    A    2   -5   -7   -7
    G   -5    2   -7   -7
    T   -7   -7    2   -5
    C   -7   -7   -5    2

Substitution matrices

for proteins: different matrices for different evolutionary distances
for instance: BLOSUM50

Substitution matrix for the ASJP data

1. identify large sample of pairs of closely related languages (using expert information or heuristics based on aggregated Levenshtein distance)

[Figure: example languages from the sample; the pairing is lost in the extraction. Names shown:]
    An.CENTRAL_MALAYO-POLYNESIAN.BALILEDO
    An.NORTHWEST_MALAYO-POLYNESIAN.LAHANAN
    NC.BANTOID.LIFONGA
    NC.BANTOID.BOMBOMA_2
    IE.INDIC.WAD_PAGGA
    IE.INDIC.TALAGANG_HINDKO
    NC.BANTOID.LINGALA
    NC.BANTOID.LIFONGA
    An.CENTRAL_MALAYO-POLYNESIAN.PALUE
    An.NORTHERN_PHILIPPINES.LIMOS_KALINGA
    AuA.MUNDA.HO
    AuA.MUNDA.KORKU
    MGe.GE-KAINGANG.KAYAPO
    MGe.GE-KAINGANG.APINAYE
    An.MESO-PHILIPPINE.CANIPAAN_PALAWAN
    An.SOUTHERN_PHILIPPINES.KAGAYANEN
    Pan.PANOAN.KASHIBO_SAN_ALEJANDRO
    An.NORTHERN_PHILIPPINES.CENTRAL_BONTOC
    An.MESO-PHILIPPINE.NORTHERN_SORSOGON
    WF.WESTERN_FLY.IAMEGA
    WF.WESTERN_FLY.GAMAEWE
    Pan.PANOAN.KASHIBO_BAJO_AGUAYTIA
    AA.EASTERN_CUSHITIC.KAMBAATA_2
    UA.AZTECAN.NAHUATL_CUENTEPEC_TEMIXCO
    AA.EASTERN_CUSHITIC.HADIYYA_2
    ST.BAI.QILIQIAO_BAI_2
    ST.BAI.YUNLONG_BAI
    An.SULAWESI.MANDAR
    An.OCEANIC.RAGA
    An.SULAWESI.TANETE
    An.SAMA-BAJAW.BOEPINANG_BAJAU
    UA.AZTECAN.NAHUATL_HUEYAPAN_TETELA_DEL_VOLCAN

Substitution matrix for the ASJP data

2. pick a concept and a pair of related languages at random
    languages: Pen.MAIDUAN.MAIDU_KONKAU, Pen.MAIDUAN.NE_MAIDU
    concept: one
3. find corresponding words from the two languages: nisam, niSem
4. do Levenshtein alignment

    n i s a m
    n i S e m

5. for each sound pair, count number of correspondences
    nn: 1; ii: 1; sS: 1; ae: 1; mm: 1

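Step 5 can be sketched as a counter over aligned positions. The function name `count_correspondences`, the gap symbol `-`, the decision to skip positions that contain a gap, and the use of ordered pairs are assumptions made for this illustration; the slides do not specify these details.

```python
from collections import Counter


def count_correspondences(aligned_x, aligned_y, counts=None, gap="-"):
    """Add the sound pairs of one aligned word pair to a running counter."""
    counts = Counter() if counts is None else counts
    for a, b in zip(aligned_x, aligned_y):
        if a != gap and b != gap:  # positions aligned to a gap are skipped
            counts[(a, b)] += 1
    return counts
```

Passing the same `Counter` back in as `counts` accumulates the totals over repeated draws, as in the repeated sampling of steps 2-5.
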
Substitution matrix for the ASJP data

steps 2-5 are repeated 100,000 times

    klom   S3--v   ligini   kulox   Naltir---i   …
    klem   S37on   ji---p   Gulox   Naltirtiri   …

[Table: number of times each sound pair was aligned. Legible entries for identical pairs: a/a 56,047; u/u 23,731; o/o 19,619; m/m 18,263; k/k 16,773; h/h 5,331; y/y 5,321. The rarest listed pairs (for example !/t and G/y) occur 2 times. The remaining rows are not recoverable from the extraction.]

Substitution matrix for the ASJP data

6. determine relative frequency of occurrence of each sound within the entire database

[Table: relative frequency of each ASJP sound, ranging from 0.1479 down to 0.0001; the assignment of values to symbols is not recoverable from the extraction.]

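Step 6 amounts to counting symbols over all words. The sketch below assumes that each character of a word string is one sound (as in the one-character-per-sound examples on these slides); the function name `sound_frequencies` is chosen for this illustration.

```python
from collections import Counter


def sound_frequencies(words):
    """Relative frequency q_a of each sound a over an iterable of words."""
    counts = Counter(sound for word in words for sound in word)
    total = sum(counts.values())
    return {sound: c / total for sound, c in counts.items()}
```

For the two words from step 3, `sound_frequencies(["nisam", "niSem"])` assigns 0.2 to each of n, i and m and 0.1 to each of s, S, a and e.
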
Substitution matrix for the ASJP data

7. estimate $p_{ab}$ as relative frequency of co-occurrence of $a$ with $b$, $q_a, q_b$ as individual relative frequencies, and determine PMI scores
\[ \log_2 \frac{p_{ab}}{q_a\,q_b} \]

[Table: PMI scores of sound pairs, ranging from 11.2348 down to -4.9817; the assignment of values to pairs is not recoverable from the extraction.]

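Step 7 can be sketched as follows, given pair counts (as collected in step 5) and sound frequencies `q` (as in step 6). The function name `pmi_matrix` and the dictionary representation are assumptions for this illustration. Pairs that were never observed get no entry here, and how to score them is not covered on the slide.

```python
import math


def pmi_matrix(pair_counts, q):
    """PMI score log2(p_ab / (q_a * q_b)) for every observed sound pair.

    pair_counts maps (a, b) to the number of times a was aligned with b;
    q maps each sound to its relative frequency.
    """
    total = sum(pair_counts.values())
    return {
        (a, b): math.log2((count / total) / (q[a] * q[b]))
        for (a, b), count in pair_counts.items()
        if count > 0
    }
```
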
Evaluation

[Figure: heat map of the PMI substitution matrix. Both axes list the sounds in the order j Z z L 8 y l d r C T c S s t ! 4 5 x X g h 7 q k G f v w p b n N m i e E 3 o u a; the color scale shows PMI from -10 to 10.]

Evaluation

[Figure: content not recoverable from the extraction.]

Gap penalties

gaps in an alignment correspond either to an insertion or a deletion
simplified assumption: insertions and deletions are equally likely at all positions; symbols are inserted according to their general frequency of occurrence
Suppose an item $x_i$ is aligned to a gap. Let $\alpha$ be the probability that an insertion occurred since the latest common ancestor, and $\beta$ the probability of a deletion.
\[ P(x_i, - \mid h_1) = \alpha\,q_{x_i} + \beta\,q_{x_i} \]
\[ P(x_i, - \mid h_0) = q_{x_i} \]
\[ \log\bigl(P(x_i, - \mid h_1) : P(x_i, - \mid h_0)\bigr) = \log(\alpha + \beta) = -d \]
as $\alpha + \beta < 1$, this term is negative, i.e., there is a constant penalty $d$ for each gap

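The relation $-d = \log(\alpha + \beta)$ can be made concrete with a small helper. The function name `gap_penalty` and the base-2 logarithm are assumptions; the probabilities used in the usage note are hypothetical and not taken from the slides.

```python
import math


def gap_penalty(alpha, beta):
    """Constant gap penalty d = -log2(alpha + beta), in bits."""
    if not 0.0 < alpha + beta < 1.0:
        raise ValueError("alpha + beta must lie strictly between 0 and 1")
    return -math.log2(alpha + beta)
```

For instance, hypothetical values `alpha = beta = 0.125` give a penalty of 2 bits per gap.
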
Affine gap penalties

deletions/insertions frequently apply to entire blocks of symbols (both in biology and linguistics)
the probability of a gap of length $n$ is higher than the product of the probabilities of $n$ individual gaps
penalty $e$ for extending a gap is lower than penalty $d$ for opening a gap
$g$: length of a gap
\[ \gamma(g) = -d - (g - 1)\,e \]
no principled way to derive the values of $d$ and $e$; they have to be fixed via trial and error
$d = 2.5$ and $e = 1.6$ work quite well for the ASJP data

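The affine gap score $\gamma(g)$ is a one-liner. The function name `gap_score` is chosen for this sketch; the defaults are the values $d = 2.5$ and $e = 1.6$ quoted on the slide.

```python
def gap_score(g, d=2.5, e=1.6):
    """Score of a gap of length g: -d for opening, -e per extension."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -d - (g - 1) * e
```

With the default values, gaps of length 1, 2 and 3 score -2.5, -4.1 and -5.7 (up to floating-point rounding).
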
Weighted alignment

so far, we assumed that the alignment between $\vec{x}$ and $\vec{y}$ is known
to assess the strength of evidence for $h_1$ given $\vec{x}, \vec{y}$, we need to consider all alignments between $\vec{x}$ and $\vec{y}$
enumeration is infeasible, because the number of alignments between two sequences of length $n$ is
\[ \binom{2n}{n} = \frac{(2n)!}{(n!)^2} \approx \frac{2^{2n}}{\sqrt{\pi n}} \]
computation is nonetheless possible using Pair Hidden Markov Models
simpler task: find the most likely alignment and determine its log-odds!

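The growth of the alignment count and the quality of the approximation can be checked numerically. The function names are chosen for this sketch; `math.comb` requires Python 3.8 or later.

```python
import math


def num_alignments(n):
    """Exact value of the binomial coefficient (2n choose n)."""
    return math.comb(2 * n, n)


def num_alignments_estimate(n):
    """Approximation 2^(2n) / sqrt(pi * n) from the slide."""
    return 2 ** (2 * n) / math.sqrt(math.pi * n)
```

Already for two sequences of length 10 the exact count is 184,756, which illustrates why enumeration does not scale.
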
The Needleman-Wunsch algorithm

almost identical to the Levenshtein algorithm, except:
    matches/mismatches are counted not as 0 and 1, but as log-odds scores of the corresponding symbol pair
    insertions/deletions are counted as gap penalties
    by convention, the similarity rather than the distance is counted, i.e., we try to find the alignment that maximizes the score
let $\vec{x}$ have length $n$, $\vec{y}$ length $m$, $s_{ab}$ be the log-odds score of $a$ and $b$, and $d$/$e$ the gap penalties

The Needleman-Wunsch algorithm

\[ F(0,0) = 0, \qquad G(0,0) = 0 \]
\[ \forall i: 0 < i \le n: \quad F(i,0) = F(i-1,0) - G(i-1,0)\,e - (1 - G(i-1,0))\,d, \qquad G(i,0) = 1 \]
\[ \forall j: 0 < j \le m: \quad F(0,j) = F(0,j-1) - G(0,j-1)\,e - (1 - G(0,j-1))\,d, \qquad G(0,j) = 1 \]
$\forall i, j: 0 < i \le n,\ 0 < j \le m$:
\[ F(i,j) = \max \begin{cases} F(i-1,j) - G(i-1,j)\,e - (1 - G(i-1,j))\,d \\ F(i,j-1) - G(i,j-1)\,e - (1 - G(i,j-1))\,d \\ F(i-1,j-1) + s_{x_i y_j} \end{cases} \]
\[ G(i,j) = \begin{cases} 0 & \text{if } \arg\max \begin{cases} F(i-1,j) - G(i-1,j)\,e - (1 - G(i-1,j))\,d \\ F(i,j-1) - G(i,j-1)\,e - (1 - G(i,j-1))\,d \\ F(i-1,j-1) + s_{x_i y_j} \end{cases} = 3 \\ 1 & \text{else} \end{cases} \]

Finding the best alignment

Dynamic programming, with mEnS along the columns and menEs along the rows:

            -       m       E       n       S
    -       0    -2.5    -4.1    -5.7    -7.3
    m    -2.5    4.13    1.53    0.03   -1.47
    e    -4.1    1.53    5.65    3.05    1.55
    n    -5.7    0.03    3.05     9.2     6.6
    E    -7.3   -1.47    4.75     6.6    7.62
    s    -8.9   -2.97    2.15     5.1    8.84

Memorizing in each step which of the three neighboring cells (left, above, diagonally above-left) gave rise to the current entry lets us recover the corresponding optimal alignment.
Highlighted traceback path: 8.84 ← 6.6 ← 9.2 ← 5.65 ← 4.13, which corresponds to the alignment

    m E n - S
    m e n E s

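The table-filling and traceback can be sketched in Python as below. The sketch follows the single-table formulation with a gap flag G, with d and e taken as positive penalties that are subtracted. The scoring function `toy_score` (+2 for a match, -1 for a mismatch) is a hypothetical stand-in for the PMI matrix, whose individual entries are not given here, so the numbers it produces do not reproduce the table above. Ties are broken in favor of the first option, which is an arbitrary choice.

```python
def toy_score(a, b):
    """Hypothetical stand-in for a PMI matrix: +2 for a match, -1 otherwise."""
    return 2.0 if a == b else -1.0


def needleman_wunsch(x, y, score=toy_score, d=2.5, e=1.6, gap="-"):
    """Return (best score, aligned x, aligned y); x indexes rows, y columns."""
    n, m = len(x), len(y)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]  # best score of x[:i] vs y[:j]
    G = [[0] * (m + 1) for _ in range(n + 1)]    # 1 if cell was reached via a gap
    back = [[None] * (m + 1) for _ in range(n + 1)]

    def gap_cost(i, j):
        # Extending a gap that ends in cell (i, j) costs e, opening one costs d.
        return e if G[i][j] else d

    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] - gap_cost(i - 1, 0)
        G[i][0], back[i][0] = 1, "up"
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] - gap_cost(0, j - 1)
        G[0][j], back[0][j] = 1, "left"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (F[i - 1][j] - gap_cost(i - 1, j), "up"),
                (F[i][j - 1] - gap_cost(i, j - 1), "left"),
                (F[i - 1][j - 1] + score(x[i - 1], y[j - 1]), "diag"),
            ]
            F[i][j], back[i][j] = max(options, key=lambda o: o[0])
            G[i][j] = 0 if back[i][j] == "diag" else 1

    # Follow the stored pointers from the bottom-right cell back to the origin.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif move == "up":
            ax.append(x[i - 1])
            ay.append(gap)
            i -= 1
        else:
            ax.append(gap)
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))
```

To use real scores, one would pass a function that looks up PMI values, for example `lambda a, b: pmi_table[(a, b)]` with a hypothetical dictionary `pmi_table`.
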
Evaluation

left: Levenshtein alignment; right: Needleman-Wunsch alignment
[The two-column layout is lost in the extraction. Each row below gives the two alignments found for one word pair (upper / lower string); which of the two stood in the left and which in the right column is not recoverable.]

    ego / iX-                    ego / -iX
    tu / du                      tu / du
    nos / vir                    nos / vir
    -unus / ain-s                unus / ains
    duo- / cvai                  -duo / cvai
    persona / mEnS---            persona / ---mEnS
    piskis / fiS---              piskis / ---fiS
    kanis / hun-t                kanis / hun-t
    pedikul-us / ------laus      pedikulus / -----laus
    folyu / b-lat                folyu / -blat
    saNgwis / ---blut            saNgwis / ---blut
    k-utis / haut--              -kutis / haut--
    --os-- / knoX3n              --o--s / knoX3n
    kornu / horn-                kornu / horn-
    yekur / leb3r                yekur / leb3r
    -liNgwE / chuN--3            liNgwE / -chuN3
    d-ens / chan-                dens / chan
    nasus / naz3-                nasus / na-z3
    okulus / a-ug3-              okulus / -au-g3
    pektus- / b--rust            pektus- / --brust
    manus / han-t                manus / han-t
    sol- / zon3                  so-l / zon3
    w---enire / khom3n---        wenire / khom3n
    -mor-i- / Sterb3n            -mor--i / Sterb3n
    -bi-bere / triNk3n-          -bibere / triNk3n
    audire- / --her3n            audire / -her3n
    widere / --ze3n              widere / --ze3n

Evaluation

[Continuation of the comparison; as before, the assignment of the two alignments to the left and right column is not recoverable from the extraction.]

    noks / n-at                  noks / na-t
    plenus / ---fol              p-lenus / fol----
    nowus / no-i-                nowus / no--i
    nomen / nam-3                nomen / nam3-
    mons / bErk                  mons / bErk
    -akwa / vas3r                akwa--- / --vas3r
    lapis / Stain                -lapis / Sta-in
    iNnis / -foia                iNnis / fo-ia
    viya / pfat                  viya- / p-fat