Creating Large-Scale Multilingual Cognate Tables



slide-1
SLIDE 1

Creating Large-Scale Multilingual Cognate Tables

Winston Wu and David Yarowsky Center for Language and Speech Processing Johns Hopkins University

slide-2
SLIDE 2

http://educationviews.org/wp-content/uploads/2013/06/world-bread-cognates-panis.jpg

slide-3
SLIDE 3

Cognates and Cognate Chains

slide-4
SLIDE 4

Data

  • PanLex and Wiktionary
slide-5
SLIDE 5

Cognate Table Construction

  • Initial cluster with unweighted edit distance
  • Alignment to get lexical translation probabilities
  • Cluster with weighted distance function

slide-6
SLIDE 6

Clustering

azj: stol
tat: ostal
tat: tablis
tuk: stol
tuk: tablisa
tur: tablo
uig: ustel
uzn: stol
uzn: tablista
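
The initial clustering step can be illustrated with a minimal sketch. It assumes a length-normalized Levenshtein distance and single-linkage grouping; the slides name neither, and the threshold of 0.45 and the function names are hypothetical choices made only for this example.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Plain (unweighted) Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    # Assumption: normalize by the longer word so the threshold is length-independent.
    return edit_distance(a, b) / max(len(a), len(b))


def cluster(entries, threshold=0.45):
    """Single-linkage clustering with union-find (threshold is hypothetical)."""
    parent = list(range(len(entries)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(entries)), 2):
        if normalized_distance(entries[i][1], entries[j][1]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, entry in enumerate(entries):
        groups.setdefault(find(i), []).append(entry)
    return list(groups.values())


# The (language, word) pairs listed on this slide.
ENTRIES = [
    ("azj", "stol"), ("tat", "ostal"), ("tat", "tablis"),
    ("tuk", "stol"), ("tuk", "tablisa"), ("tur", "tablo"),
    ("uig", "ustel"), ("uzn", "stol"), ("uzn", "tablista"),
]

for group in cluster(ENTRIES):
    print(group)
```

With this threshold the sample separates into a stol-like group and a tabl-like group. Single linkage is used here only because it is simple; it lets tablo join through tablis even though it is farther from tablista.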

slide-7
SLIDE 7

Bitext from Clusters

eng    azj   tat     tuk      tur    uig    uzn
table  stol  ostal   stol     -      ustel  stol
table  -     tablis  tablisa  tablo  -      tablista
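
One plausible way to turn a cluster into training data for an aligner is sketched below. It assumes that every cross-language word pair inside a cluster becomes one "sentence pair" whose tokens are single characters; the slides do not spell out the pairing scheme, and `cluster_to_bitext` is a hypothetical helper name.

```python
from itertools import permutations


def cluster_to_bitext(cluster):
    """Emit (src_lang, tgt_lang, src_chars, tgt_chars) for each ordered
    cross-language pair in one cluster. Characters are space-separated so
    that a word aligner can treat them as tokens."""
    pairs = []
    for (lang1, word1), (lang2, word2) in permutations(cluster, 2):
        if lang1 != lang2:  # skip pairs within the same language
            pairs.append((lang1, lang2, " ".join(word1), " ".join(word2)))
    return pairs


example_cluster = [("tat", "ostal"), ("uig", "ustel"), ("azj", "stol")]
for row in cluster_to_bitext(example_cluster):
    print(row)
```

Emitting both directions of each pair keeps the data usable for any source/target choice, at the cost of doubling its size.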

slide-8
SLIDE 8

Alignment

t -> t 0.600    l -> l 0.747    h -> h 0.529
t -> d 0.098    l -> r 0.048    h -> u 0.150
t -> c 0.061    l -> n 0.024    h -> NULL 0.140
t -> r 0.057    l -> t 0.019    h -> l 0.048
t -> p 0.019    l -> o 0.018    h -> a 0.032
t -> s 0.017    l -> d 0.016    h -> j 0.019
t -> l 0.017    l -> c 0.015    h -> o 0.017
t -> n 0.015    l -> a 0.015    h -> k 0.015

TAT  o s t o l
UIG  ü s t e l
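
Character translation probabilities like those above can be read off aligned pairs by relative frequency. The sketch below assumes the alignments are already given as lists of (source character, target character) links, with `None` marking a deletion; the slides do not say which aligner produced them, and a real aligner would typically estimate these values iteratively rather than by a single count.

```python
from collections import Counter, defaultdict


def translation_probs(aligned_pairs):
    """Relative-frequency estimate of P(target char | source char).

    aligned_pairs: iterable of alignments, each a list of (src, tgt) links,
    where tgt is None when the source character aligns to nothing.
    """
    counts = defaultdict(Counter)
    for links in aligned_pairs:
        for src, tgt in links:
            counts[src]["NULL" if tgt is None else tgt] += 1
    return {
        src: {tgt: n / sum(ctr.values()) for tgt, n in ctr.items()}
        for src, ctr in counts.items()
    }


# Toy alignments, invented for illustration only.
toy = [
    [("o", "ü"), ("s", "s"), ("t", "t"), ("o", "e"), ("l", "l")],
    [("t", "t"), ("h", None)],
    [("t", "d")],
]
print(translation_probs(toy)["t"])
```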

slide-9
SLIDE 9

Clustering Distance Function

  • Language-pair-specific edit distance
  • Intra-family edit distance
  • Same backtranslation
  • Same POS
  • Same MeaningID
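
The five features listed above could be combined as in the following sketch. Several things here are assumptions rather than the authors' method: edit costs are taken as negative log translation probabilities, unseen operations get a floor probability, distances are normalized by word length, and the features are mixed with a plain weighted sum. All function names, dictionary keys and weights are hypothetical.

```python
import math


def weighted_edit_distance(a, b, probs, floor=1e-3):
    """Edit distance whose costs are -log P(tgt | src).

    probs maps src char -> {tgt char or "NULL": probability}. Operations
    missing from the table, and all insertions, use the floor (assumption).
    """
    def cost(src, tgt):
        return -math.log(max(probs.get(src, {}).get(tgt, 0.0), floor))

    insert_cost = -math.log(floor)
    prev = [j * insert_cost for j in range(len(b) + 1)]
    for ca in a:
        curr = [prev[0] + cost(ca, "NULL")]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + cost(ca, "NULL"),   # delete ca
                            curr[j - 1] + insert_cost,    # insert cb
                            prev[j - 1] + cost(ca, cb)))  # substitute or match
        prev = curr
    return prev[-1]


def combined_distance(e1, e2, pair_probs, family_probs, weights):
    """Weighted sum of the five slide features (linear mix is an assumption)."""
    norm = max(len(e1["word"]), len(e2["word"]))
    feats = {
        "pair_edit": weighted_edit_distance(e1["word"], e2["word"], pair_probs) / norm,
        "family_edit": weighted_edit_distance(e1["word"], e2["word"], family_probs) / norm,
        "backtranslation": float(e1["backtranslation"] != e2["backtranslation"]),
        "pos": float(e1["pos"] != e2["pos"]),
        "meaning_id": float(e1["meaning_id"] != e2["meaning_id"]),
    }
    return sum(weights[name] * value for name, value in feats.items())


PROBS = {"s": {"s": 1.0}, "t": {"t": 1.0}, "o": {"o": 0.5, "e": 0.5}, "l": {"l": 1.0}}
WEIGHTS = {"pair_edit": 1.0, "family_edit": 1.0, "backtranslation": 1.0,
           "pos": 1.0, "meaning_id": 1.0}
a = {"word": "stol", "backtranslation": "table", "pos": "noun", "meaning_id": "m1"}
b = {"word": "stel", "backtranslation": "table", "pos": "noun", "meaning_id": "m1"}
print(combined_distance(a, b, PROBS, PROBS, WEIGHTS))
```

In practice the pair-specific and family-level tables would differ, and the weights would need tuning; equal weights are used here only to keep the example readable.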
slide-10
SLIDE 10

Cognate Tables

slide-11
SLIDE 11

Experiments

  • Hold out words
  • Use MT to predict
  • Single language pair and system combination
  • Evaluate on 1-best, 10-best, MRR
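
The three evaluation measures named above can be computed as in this sketch. It assumes each held-out word has a ranked list of predicted forms and a single gold form; the function name and the toy data are invented for illustration and are not results from the talk.

```python
def evaluate(predictions, gold, n=10):
    """Return 1-best accuracy, n-best accuracy and mean reciprocal rank.

    predictions: list of ranked hypothesis lists, best first.
    gold: list of reference strings, one per hypothesis list.
    """
    top1 = topn = reciprocal = 0.0
    for hyps, ref in zip(predictions, gold):
        if hyps[:1] == [ref]:
            top1 += 1
        if ref in hyps[:n]:
            topn += 1
        if ref in hyps:
            reciprocal += 1.0 / (hyps.index(ref) + 1)
    total = len(gold)
    return {"1-best": top1 / total,
            f"{n}-best": topn / total,
            "MRR": reciprocal / total}


# Toy ranked outputs (hypothetical).
preds = [["stol", "stal"], ["tablo", "tabla", "table"], ["x"]]
refs = ["stol", "table", "y"]
print(evaluate(preds, refs))
```

A reference that never appears in the hypothesis list contributes zero to the MRR, which is the usual convention.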
slide-12
SLIDE 12

Results: Romance

slide-13
SLIDE 13

Results: Romance

slide-14
SLIDE 14

Results: Turkic

slide-15
SLIDE 15

Results: Turkic

slide-16
SLIDE 16

Results: Romance

slide-17
SLIDE 17

Results: Turkic

slide-18
SLIDE 18

Conclusion

  • Cluster-alignment-cluster process for multilingual cognate table construction

  • Experiments
  • 1-best exact match accuracy is hard!
  • Close languages tend to do better
  • Data size matters
  • Code and data at github.com/wswu/coglust