Sequence Alignment
Gerhard Jäger ESSLLI 2016
Gerhard Jäger Sequence Alignment ESSLLI 2016 1 / 62
Sequence alignment: Motivation
Meaning   Italian       English   cognate
few       ˈpɔko         fjuː      1
rub       freˈgare      rʌb
dull                    dʌl
hunt      katˈtʃare     hʊnt
year      ˈanno         jɪə
this      ˈkwesto       ðɪs
fish      ˈpeʃʃe        fɪʃ       1
rotten    ˈmartʃo       ˈrɒtən
right     ˈdʒusto       raɪt
when      ˈkwando       wɛn       1
drink     ˈbere         drɪŋk
heavy     peˈsante      ˈhɛvɪ
heavy     ˈgrɛve        ˈhɛvɪ
egg       ˈwɔvo         ɛg        1
earth     ˈtɛrra        ɜːθ
dust      ˈpolvere      dʌst
laugh     ˈridere       lɑːf
grass     ˈɛrba         grɑːs
sharp     taʎˈʎɛnte     ʃɑːp
wash      laˈvare       wɒʃ
Meaning   Italian     English
few       poko        fyu
rub       fregare     rob
dull                  dol
hunt      kattSare    hunt
year      anno        yi3
this      kwesto      8is
fish      peSSe       fiS
rotten    martSo      rot3n
right     dZusto      rait
when      kwando      wEn
drink     bere        driNk
heavy     pesante     hEvi
heavy     grEve       hEvi
egg       wovo        Eg
earth     tErra       38
dust      polvere     dost
laugh     ridere      lof
grass     Erba        gros
sharp     tallEnte    Sop
wash      lavare      woS
Pairwise alignment
[Figure, two panels: empirical probability of cognacy plotted against LDN; LDN by cognate status (no / yes).]
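The abbreviation LDN is not expanded in the surviving slide text. A common reading, assumed here, is the Levenshtein distance divided by the length of the longer word. The following is a minimal sketch under that assumption; the function names `levenshtein` and `ldn` are hypothetical, and the example words are taken from the Italian and English word list above.

```python
def levenshtein(a, b):
    """Edit distance with unit costs for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


def ldn(a, b):
    """Assumed definition: Levenshtein distance over the longer word's length."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


# English "fish" and Italian "pesce" in the transcription used above
print(ldn("fiS", "peSSe"))
```

Under this definition the value lies between 0 (identical strings) and 1 (no symbol can be kept), which makes words of different lengths comparable.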
1 Also, if we choose an uninformative prior with P(h1) = P(h0), we have
An.NORTHERN_PHILIPPINES.CENTRAL_BONTOC          An.MESO-PHILIPPINE.NORTHERN_SORSOGON
WF.WESTERN_FLY.IAMEGA                           WF.WESTERN_FLY.GAMAEWE
Pan.PANOAN.KASHIBO_BAJO_AGUAYTIA                Pan.PANOAN.KASHIBO_SAN_ALEJANDRO
AA.EASTERN_CUSHITIC.KAMBAATA_2                  AA.EASTERN_CUSHITIC.HADIYYA_2
ST.BAI.QILIQIAO_BAI_2                           ST.BAI.YUNLONG_BAI
An.SULAWESI.MANDAR                              An.OCEANIC.RAGA
An.SULAWESI.TANETE                              An.SAMA-BAJAW.BOEPINANG_BAJAU
UA.AZTECAN.NAHUATL_HUEYAPAN_TETELA_DEL_VOLCAN   UA.AZTECAN.NAHUATL_CUENTEPEC_TEMIXCO
An.SOUTHERN_PHILIPPINES.KAGAYANEN               An.NORTHERN_PHILIPPINES.LIMOS_KALINGA
An.MESO-PHILIPPINE.CANIPAAN_PALAWAN             An.NORTHWEST_MALAYO-POLYNESIAN.LAHANAN
NC.BANTOID.LIFONGA                              NC.BANTOID.BOMBOMA_2
IE.INDIC.WAD_PAGGA                              IE.INDIC.TALAGANG_HINDKO
NC.BANTOID.LINGALA                              NC.BANTOID.LIFONGA
An.CENTRAL_MALAYO-POLYNESIAN.BALILEDO           An.CENTRAL_MALAYO-POLYNESIAN.PALUE
AuA.MUNDA.HO                                    AuA.MUNDA.KORKU
MGe.GE-KAINGANG.KAYAPO                          MGe.GE-KAINGANG.APINAYE
a  a  56,047
i  i  33,955
u  u  23,731
n  n  21,363
m  m  18,263
t  t  16,975
k  k  16,773
e  e  12,745
r  r  11,601
l  l  11,377
b  b   8,965
s  s   8,245
d  d   6,829
p  p   6,681
w  w   6,613
N  N   6,275
h  h   5,331
y  y   5,321
3  3   5,255
...
4  8   2
4  a   2
G  t   2
i  !   2
G  y   2
d  !   2
s  G   2
Z  5   2
G  s   2
X  z   2
!  k   2
q  8   2
a  !   2
a  !   2
!  y   2
!  E   2
j  G   2
G  i   2
E  !   2
v  S   2
a  0.1479
i  0.0969
u  0.0696
n  0.0614
e  0.0478
k  0.0478
m  0.0465
t  0.0449
r  0.0346
l  0.0331
b  0.0248
s  0.0243
w  0.0232
3  0.0228
y  0.0222
d  0.0214
h  0.0213
p  0.0202
N  0.0201
g  0.0178
E  0.0134
7  0.0124
C  0.0073
S  0.0064
x  0.0062
c  0.0056
f  0.0052
5  0.0049
v  0.0045
q  0.0041
z  0.0035
j  0.0035
T  0.0029
L  0.0027
X  0.0022
8  0.0014
Z  0.0011
!  0.0009
4  0.0002
G  0.0001
G  G  11.2348
!  !  10.0202
4  4   9.1480
8  8   8.0650
Z  Z   7.9575
X  X   7.9375
L  L   7.6276
z  z   7.2624
q  q   7.2542
f  f   6.9117
v  v   6.8418
5  5   6.7731
j  j   6.7587
T  T   6.6580
S  S   6.6054
c  c   6.5989
C  C   6.2439
4  G   6.1943
x  x   6.1210
G  X   5.3342
G  q   5.3017
7  7   5.2111
p  p   5.0693
N  N   4.9821
Z  j   4.9386
d  d   4.9263
g  g   4.8958
b  b   4.8906
s  s   4.8277
4  5   4.7508
E  E   4.7143
w  w   4.6512
h  h   4.5819
G  x   4.5573
Z  z   4.4943
y  y   4.4637
l  l   4.4037
!  G   4.3760
3  3   4.3692
r  r   4.3061
X  q   4.1200
m  m   4.1087
t  t   4.1021
G  Z   4.0429
k  k   3.9046
X  x   3.8116
T  Z   3.7380
8  G   3.6993
...
C a
j
a m
E v
! w
! u
5 q
T
! k
e z
! s
f q
N S
! b
L b
T u
4 i
5 a
C N
! t
! e
! i
! a
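The scores in this list are presumably pointwise mutual information (PMI) values, since PMI is the label on the scale of the figure that follows. A minimal sketch of how such a score could be computed from an aligned-pair count and two symbol frequencies is given below. It assumes the standard definition PMI(a, b) = log(p(a, b) / (q(a) q(b))) with the natural logarithm. The function name `pmi` and the numbers in the example are illustrative and are not taken from the slides.

```python
import math


def pmi(pair_count, total_pairs, freq_a, freq_b):
    """Pointwise mutual information of two aligned symbols.

    pair_count  -- how often a and b were aligned with each other
    total_pairs -- total number of aligned symbol pairs
    freq_a/b    -- relative frequencies of the two symbols
    """
    p_ab = pair_count / total_pairs           # probability of the aligned pair
    return math.log(p_ab / (freq_a * freq_b))  # > 0: aligned more often than chance


# Illustrative numbers only: a pair seen 10 times in 100 alignments,
# where each symbol has relative frequency 0.1.
print(pmi(10, 100, 0.1, 0.1))
```

A positive score means the two symbols are aligned more often than their individual frequencies predict, a negative score that they are aligned less often.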
[Figure: PMI for all pairs of symbols, shown on a scale from −10 to 10.]
\begin{align*}
F(0,0) &= G(0,0) = 0 \\[4pt]
\forall i,\ 0 < i \le n:\quad
F(i,0) &= F(i-1,0) + G(i-1,0)\,e + \bigl(1 - G(i-1,0)\bigr)\,d \\
G(i,0) &= 1 \\[4pt]
\forall j,\ 0 < j \le m:\quad
F(0,j) &= F(0,j-1) + G(0,j-1)\,e + \bigl(1 - G(0,j-1)\bigr)\,d \\
G(0,j) &= 1 \\[4pt]
\forall i,j,\ 0 < i \le n,\ 0 < j \le m:\quad
F(i,j) &= \max \left\{ \begin{array}{l}
  F(i-1,j) + G(i-1,j)\,e + \bigl(1 - G(i-1,j)\bigr)\,d \\
  F(i,j-1) + G(i,j-1)\,e + \bigl(1 - G(i,j-1)\bigr)\,d \\
  F(i-1,j-1) + s_{x_i y_j}
\end{array} \right. \\[4pt]
G(i,j) &= \begin{cases}
  0 & \text{if } \arg\max \left\{ \begin{array}{l}
    F(i-1,j) + G(i-1,j)\,e + \bigl(1 - G(i-1,j)\bigr)\,d \\
    F(i,j-1) + G(i,j-1)\,e + \bigl(1 - G(i,j-1)\bigr)\,d \\
    F(i-1,j-1) + s_{x_i y_j}
  \end{array} \right. = 3 \\
  1 & \text{else}
\end{cases}
\end{align*}
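The recursion for F and G can be turned into a short dynamic program. The sketch below rests on several assumptions: F(0,0) = G(0,0) = 0; d is the penalty for opening a gap and e the penalty for extending one, both given as negative numbers; ties are broken in favor of the third (diagonal) case; and a toy match/mismatch function stands in for the real scores s. The names `nw_affine_score` and `toy_score` are hypothetical.

```python
def toy_score(a, b):
    """Placeholder for s: +1 for identical symbols, -1 otherwise (an assumption)."""
    return 1 if a == b else -1


def nw_affine_score(x, y, score=toy_score, d=-2.0, e=-1.0):
    """Global alignment score following the F/G recursion.

    F[i][j] is the best score for x[:i] and y[:j];
    G[i][j] is 1 if that best alignment ends in a gap, else 0.
    """
    n, m = len(x), len(y)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    G = [[0] * (m + 1) for _ in range(n + 1)]

    def gap(i, j):
        # Extending an open gap costs e, opening a new one costs d.
        return e if G[i][j] else d

    for i in range(1, n + 1):               # first column: only gaps
        F[i][0] = F[i - 1][0] + gap(i - 1, 0)
        G[i][0] = 1
    for j in range(1, m + 1):               # first row: only gaps
        F[0][j] = F[0][j - 1] + gap(0, j - 1)
        G[0][j] = 1

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            up = F[i - 1][j] + gap(i - 1, j)
            left = F[i][j - 1] + gap(i, j - 1)
            diag = F[i - 1][j - 1] + score(x[i - 1], y[j - 1])
            F[i][j] = max(up, left, diag)
            # G is 0 only when the diagonal (third) case attains the maximum.
            G[i][j] = 0 if F[i][j] == diag else 1
    return F[n][m]


print(nw_affine_score("horn", "kornu"))
```

Only the score is returned here; recovering the alignment itself would additionally require storing which case won in each cell and tracing back from F(n, m).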
iX- ego ego du du tu tu vir vir nos nos ains ain-s unus
cvai cvai
duo-
mEnS--- persona persona
fiS--- piskis piskis hun-t hun-t kanis kanis
pedikulus pedikul-us
b-lat folyu folyu haut-- haut--
k-utis
saNgwis saNgwis knoX3n knoX3n
horn- horn- kornu kornu
a-ug3-
na-z3 naz3- nasus nasus chan chan- dens d-ens
chuN--3 liNgwE
han-t han-t manus manus
b--rust pektus- pektus- leb3r leb3r yekur yekur triNk3n triNk3n-
widere widere
audire audire- Sterb3n Sterb3n
khom3n khom3n--- wenire w---enire zon3 zon3 so-l sol-
vas3r
akwa--- Stain Sta-in lapis
fo-ia iNnis iNnis pfat p-fat viya viya- bErk bErk mons mons n-at na-t noks noks
fol---- plenus p-lenus no--i no-i- nowus nowus nam-3 nam3- nomen nomen
'I':        0.3     iX        i
'you':      8.26    du        du
'we':               vir       mia
'one':      4.63    ains
'two':      16.0    cvai      cvoi
'person':   12.61   mEnS      mEnZE
'fish':     16.35   fiS       fiS
'louse':    15.01   laus      laus
'tree':     6.57    baum      bom
'leaf':     11.92   blat      blad
'skin':     14.42   haut      haut
'blood':    12.88   blut      blud
'bone':     16.88   knoX3n    knoXE
'horn':     8.75    horn      hoan
'tongue':   9.8     chuN3     cuN
'knee':     7.77    kni       knui
'hand':     8.6     hant      hEnd
'breast':   14.81   brust     bXuSt
'liver':    10.01   leb3r     leba
'drink':    4.99    triNk3n   dXiNg
'see':      0.63    ze3n      se
'die':      10.16   Sterb3n   StEab
'come':     11.84   khom3n    khom
'sun':      8.79    zon3      sonE
'star':     16.16   StErn     StEan
'water':    7.8     vas3r     vaza
'stone':    10.36   Stain     Stoi
'fire':     12.43   foia      fuia

'I':                iX        Ei
'you':      2.34    du        yu
'we':       2.21    vir       wi
'one':              ains      w3n
'two':              cvai      tu
'fish':     16.35   fiS       fiS
'dog':              hunt      dag
'tree':             baum      tri
'leaf':             blat      lif
'blood':    9.46    blut      bl3d
'bone':             knoX3n    bon
'horn':     15.73   horn      horn
'eye':              aug3      Ei
'nose':     1.63    naz3      nos
'tongue':   -0.63   chuN3     t3N
'knee':     3.86    kni       ni
'hand':     8.6     hant      hEnd
'breast':   16.93   brust     brest
'liver':    14.65   leb3r     liv3r
'drink':    7.48    triNk3n   drink
'see':              ze3n      si
'die':              Sterb3n   dEi
'come':     1.22    khom3n    k3m
'sun':      1.95    zon3      s3n
'star':     8.2     StErn     star
'water':    12.06   vas3r     wat3r
'stone':    6.75    Stain     ston
'fire':     6.79    foia      fEir

'I':                iX        ego
'you':      3.62    du        tu
'we':               vir       nos
'one':      2.39    ains      unus
'two':              cvai      duo
'person':   -4.66   mEnS      persona
'fish':     0.29    fiS       piskis
'louse':    -0.08   laus      pedikulus
'tree':             baum      arbor
'leaf':             blat      folyu
'skin':             haut      kutis
'blood':    -9.18   blut      saNgwis
'bone':             knoX3n
'horn':     7.55    horn      kornu
'nose':     4.49    naz3      nasus
'tooth':    -2.78   chan      dens
'tongue':   -3.4    chuN3     liNgwE
'knee':     0.8     kni       genu
'hand':     0.73    hant      manus
'breast':   1.39    brust     pektus
'liver':    5.37    leb3r     yekur
'see':              ze3n      widere
'hear':             her3n     audire
'die':              Sterb3n   mori
'come':             khom3n    wenire
'sun':      0.97    zon3      sol
'star':     5.72    StErn     stela
'water':    -5.4    vas3r     akwa
[Figure, four panels: empirical probability of cognacy plotted against LDN and against PMI (PMI axis from −20 to 20); LDN and PMI by cognate status (no / yes).]
[Figure: precision-recall curve (axes: recall, precision) for LDN and PMI.]
Estimating distances from pairwise alignments
concept   Italian       English   predicted prob.   expert judgment
sharp     tallEnte      Sop       0.004
float     galleddZare   fl3ut     0.004
kill      ammattsare    kil       0.007
bark      skordza       bok       0.009
husband   marito        hozb3nd   0.010
walk      kamminare     wok       0.011
eat       mandZare      it        0.011
bark      kortettSa     bok       0.013
know      sapere        n3u       0.015
come      venire        kom       0.016             1
swim      nwotare       swim      0.016
back      dosso         bEk       0.018
burn      ardere        b3n       0.018
think     pensare       8iNk      0.019
dust      polvere       dost      0.019
wife      molle         waif      0.020
swell     gonfyare      swEl      0.021
sing      kantare       siN       0.022
knee      rotElla       ni        0.022
dry       aSSutto       drai      0.022
five      tSinkwe       faiv      0.023             1
skin      pElle         skin      0.024
hand      mano          hEnd      0.025
blood     sangwe        blod      0.025
flow      skorrere      fl3u      0.026
wipe      aSSugare      waip      0.026
turn      dZirare       t3n       0.026

concept   Italian       English   predicted prob.   expert judgment
father    padre         fo83      0.480             1
when      kwando        wEn       0.483             1
night     notte         nait      0.508             1
and       eed           End       0.518
name      nome          neim      0.519             1
worm      vErme         w3m       0.521             1
round     tondo         raund     0.526             1
many      molti         mEni      0.569
wind      vEnto         wind      0.573             1
two       due           tu        0.600             1
mother    madre         mo83      0.624             1
thou      tu            8au       0.629             1
child     fantSullo     tSaild    0.638
long      lungo         loN       0.651             1
fish      peSSe         fiS       0.659             1
count     kontare       kaunt     0.660             1
star      stella        sto       0.664             1
belly     vEntre        bEli      0.679
sun       sole          son       0.692             1
fly       volare        flai      0.742
three     tre           8ri       0.744             1
flow      fluire        fl3u      0.759
heavy     grEve         hEvi      0.769
person    persona       p3s3n     0.799             1
animal    animale       Enim3l    0.947             1
vomit     vomitare      vomit     0.960             1
fruit     frutto        frut      0.966             1
[Figure: scatter plot with axes "expert" and "prediction", both from 0.0 to 2.5, with one labeled point for each pair among Greek, Bulgarian, Russian, Polish, Ukrainian, Czech, Icelandic, Swedish, Danish, English, Dutch, German, Catalan, Portuguese, Spanish, French, Italian, Breton, Romanian, Lithuanian, Irish, Hindi, Bengali, Welsh and Nepali.]
Greek Bulgarian Russian Polish Ukrainian Czech Icelandic Swedish Danish English Dutch German Catalan Portuguese Spanish French Italian Breton Romanian Lithuanian Irish Hindi Bengali Welsh Nepali
[Figure: language families Uralic, Hmong-Mien, Chukotko-Kamchatkan, Japonic, Dravidian, Austronesian, Nakh-Daghestanian, Tai-Kadai, Tungusic, Sino-Tibetan, Mongolic, Yeniseian, Ainu, Austroasiatic, Nivkh, Indo-European, Turkic and Yukaghir, annotated with the values 99.4%, 100%, 96.8%, 99.9%, 100%, 96.9%, 100%.]
[Figure: labels for language families (Austronesian, Niger-Congo, Tai-Kadai, Austro-Asiatic, Sino-Tibetan, Uto-Aztecan, Mayan, Quechuan, Altaic, Khoisan, Nilo-Saharan, Kadugli, Timor-Alor-Pantar, Indo-European, Uralic, Afro-Asiatic, Australian), macro-areas (Africa, Eurasia, Papunesia, Australia, America) and regions (Subsaharan Africa, NW Eurasia, Australia/Papua, SE Asia, America, Papua).]
Multiple sequence alignment
dendron 8en-ro- dendron 8---ru- dendron
dendron t---ri- dendron
8enro d--ru 8enro t--ri dru tri
t---ri- dendron
t--ri 8enro d--ru t---ri- dendron d---ru-
dendron
dendron t---ru- ... ... ...
dendron 8enro dendron 8en-ro- tri dru dendron 8en-ro- d---ru- dendron 8en-ro- d---ru- t---ri-
cognate class language word
German
Dutch
English
Danish
Swedish
Icelandic
Irish
Breton
French
Catalan
Spanish
Portuguese
Italian
Romanian
Bengali
Nepali
Czech yEdE--n-
Polish yEdE--n-
Ukrainian
Russian
Bulgarian
cognate class   language     word
heart:J         German       h-Er-t--s-
heart:J         Dutch        h-or-t----
heart:J         English      h-o--t----
heart:J         Danish       y-Ea-d--3-
heart:J         Swedish      y-E--t--a-
heart:J         Icelandic    S-ar-t--a-
heart:J         French       k-Er------
heart:J         Catalan      k-or------
heart:J         Spanish      k-ora8--on
heart:J         Portuguese   k-uras--aw
heart:J         Italian      kwor----e-
heart:J         Hindi        h--r-d--ai
heart:J         Lithuanian   S-ir-dis--
heart:J         Czech        s--r-t-sE-
heart:J         Polish       s-Er-t-sE-
heart:J         Ukrainian    s-Er-t-sE-
heart:J         Russian      s-Erdt-sE-
heart:J         Bulgarian    s-3r-t-sE-
heart:J         Greek        k-ar-8-Sa-

cognate class   language     word
two:A           German       tsvai-
two:A           Dutch        t-we--
two:A           English      t--u--
two:A           Danish       d--o--
two:A           Swedish      t-vo--
two:A           Icelandic    t-veir
two:A           French       d--e--
two:A           Catalan      d--o-s
two:A           Spanish      d--o-s
two:A           Portuguese   d--oiS
two:A           Italian      d--ue-
two:A           Romanian     d--o-y
two:A           Nepali       d--ui-
two:A           Czech        d-va--
two:A           Polish       d-va--
two:A           Ukrainian    d-wa--
two:A           Russian      d-va--
two:A           Bulgarian    d-va--
two:A           Greek        8-io--

cognate class   language     word
mother:A        German       mu-t--a-
mother:A        Dutch        mu-d--3r
mother:A        English      mo-8--3-
mother:A        Danish       mo----a-
mother:A        Swedish      mu-d--3r
mother:A        Icelandic    mou8--ir
mother:A        French       mE---r--
mother:A        Catalan      ma---r3-
mother:A        Spanish      ma-8-re-
mother:A        Portuguese   ma----i-
mother:A        Italian      ma-d-re-
mother:A        Czech        ma-t-ka-
mother:A        Polish       ma-t-ka-
mother:A        Ukrainian    ma-t--i-
mother:A        Russian      ma-t----
mother:A        Bulgarian    ma-y-k3-
mother:A        Greek        mi-tera-

cognate class   language     word
tongue:W        German
tongue:W        Dutch
tongue:W        English
tongue:W        Danish
tongue:W        Swedish
tongue:W        Icelandic
tongue:W        French
tongue:W        Catalan
tongue:W        Spanish
tongue:W        Portuguese
tongue:W        Italian
tongue:W        Romanian
tongue:W        Hindi
tongue:W        Czech        ya-z-ik---
tongue:W        Polish       yEwz-3k---
tongue:W        Ukrainian    ya-z-ik---
tongue:W        Russian      yi-z-3k---
tongue:W        Bulgarian

cognate class   language     word
tooth:B         Greek        8-ondi
tooth:B         German       tsan--
tooth:B         Dutch        t-ont-
tooth:B         English      t-u-8-
tooth:B         Danish       d-an--
tooth:B         Swedish      t-and-
tooth:B         Icelandic    t-En--
tooth:B         French       d-o---
tooth:B         Catalan      d-en--
tooth:B         Spanish      dyente
tooth:B         Portuguese   d-e-t3
tooth:B         Italian      d-Ente
tooth:B         Romanian     d-inte
tooth:B         Bengali      d-o-t-
tooth:B         Hindi        d-a-t-

cognate class   language     word
dog:A           Lithuanian   S-u---o-
dog:A           Ukrainian    s-obaka-
dog:A           Russian      s-abaka-
dog:A           Danish       h-u-n---
dog:A           Swedish      h-3-nd--
dog:A           Icelandic    h-i-ndir
dog:A           German       h-u-nt--
dog:A           Dutch        h-o-nt--
dog:A           Welsh        k-----i-
dog:A           Breton       k-----i-
dog:A           Irish        k-----u-
dog:A           French       S-i---E-
dog:A           Italian      k-a-n-e-
dog:A           Portuguese   k-a---u-
dog:A           Romanian     k-3yn-e-
dog:A           Greek        Tio-n---

cognate class   language     word
tree:C          Danish       d---G-E--
tree:C          Swedish      t---r-Ed-
tree:C          Icelandic    t---ryE--
tree:C          English      t---r-i--
tree:C          Ukrainian    dE--r-Ewo
tree:C          Russian      dE--r-Evo
tree:C          Polish       d---Z-Evo
tree:C          Bulgarian    d3--r--vo
tree:C          Greek        8endr---o
Wrapping up