Creating Large-Scale Multilingual Cognate Tables


  1. Creating Large-Scale Multilingual Cognate Tables. Winston Wu and David Yarowsky, Center for Language and Speech Processing, Johns Hopkins University

  2. http://educationviews.org/wp-content/uploads/2013/06/world-bread-cognates-panis.jpg

  3. Cognates and Cognate Chains

  4. Data: PanLex and Wiktionary

  5. Cognate Table Construction: (1) initial cluster with unweighted edit distance; (2) alignment to get lexical translation probabilities; (3) cluster with weighted distance function

  6. Clustering: tuk: stol; uig: ustel; azj: stol; tur: tablo; uzn: stol; tat: ostal; tuk: tablisa; tat: tablis; uzn: tablista
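
The initial clustering step can be sketched as follows. This is a minimal illustration, assuming single-linkage grouping on length-normalized Levenshtein distance; the slides do not state the linkage rule or the cutoff, so the function `cluster` and the value `threshold=0.5` are hypothetical choices, not the authors' settings.

```python
from itertools import combinations

def levenshtein(a: str, b: str) -> int:
    """Unweighted edit distance: every insertion, deletion, substitution costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]

def cluster(entries, threshold=0.5):
    """Single-linkage clustering with union-find (threshold is an assumed value)."""
    parent = list(range(len(entries)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(entries)), 2):
        a, b = entries[i][1], entries[j][1]
        # Link two words when their length-normalized distance is small enough.
        if levenshtein(a, b) / max(len(a), len(b)) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, entry in enumerate(entries):
        groups.setdefault(find(i), []).append(entry)
    return list(groups.values())

# (language code, word) pairs taken from the slide.
entries = [
    ("tuk", "stol"), ("uig", "ustel"), ("azj", "stol"),
    ("tur", "tablo"), ("uzn", "stol"), ("tat", "ostal"),
    ("tuk", "tablisa"), ("tat", "tablis"), ("uzn", "tablista"),
]
clusters = cluster(entries)
for group in clusters:
    print(group)
```

How finely the words are split depends entirely on the threshold: a looser cutoff merges more forms into one cluster, a stricter one separates them.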

  7. Bitext from Clusters

     eng    azj   tat     tuk      tur    uig    uzn
     table  stol  -       stol     -      -      stol
     table  -     ostal   -        -      ustel  -
     table  -     -       -        tablo  -      -
     table  -     tablis  tablisa  -      -      tablista
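
One way to turn a cluster into training data for a character aligner is sketched below. It assumes that every pair of words from different languages within a cluster becomes one character-level "sentence pair", with characters separated by spaces as most word aligners expect; the slide does not spell out the exact format, and `cluster_to_bitext` is a hypothetical helper name.

```python
from itertools import combinations

def cluster_to_bitext(cluster):
    """Return (src_lang, tgt_lang, src_chars, tgt_chars) for each cross-language pair."""
    pairs = []
    for (lang1, word1), (lang2, word2) in combinations(cluster, 2):
        if lang1 != lang2:  # skip pairs within the same language
            pairs.append((lang1, lang2, " ".join(word1), " ".join(word2)))
    return pairs

# One cluster from the slide's example.
example_cluster = [("tat", "tablis"), ("tuk", "tablisa"), ("uzn", "tablista")]
bitext = cluster_to_bitext(example_cluster)
for row in bitext:
    print(row)
```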

  8. Alignment

     ü s t e l   UIG
     o s t o l   TAT

     t -> t  0.600    l -> l  0.747    h -> h     0.529
     t -> d  0.098    l -> r  0.048    h -> u     0.150
     t -> c  0.061    l -> n  0.024    h -> NULL  0.140
     t -> r  0.057    l -> t  0.019    h -> l     0.048
     t -> p  0.019    l -> o  0.018    h -> a     0.032
     t -> s  0.017    l -> d  0.016    h -> j     0.019
     t -> l  0.017    l -> c  0.015    h -> o     0.017
     t -> n  0.015    l -> a  0.015    h -> k     0.015
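
A sketch of how such translation probabilities might feed a weighted edit distance. The cost scheme here is an assumption, not taken from the slides: a substitution a -> b costs 1 - p(b | a), a deletion is treated as alignment to NULL, insertions cost a flat `ins_cost`, and pairs missing from the table fall back to the usual 0/1 identity cost. Only a few entries of the table are included.

```python
# Subset of the slide's probabilities; None stands for NULL.
PROB = {
    ("t", "t"): 0.600, ("t", "d"): 0.098,
    ("l", "l"): 0.747, ("l", "r"): 0.048,
    ("h", "h"): 0.529, ("h", None): 0.140,
}

def sub_cost(a, b, prob):
    # Assumed scheme: cost = 1 - p(b | a); unseen pairs use 0 for identity, 1 otherwise.
    p = prob.get((a, b))
    if p is None:
        return 0.0 if a == b else 1.0
    return 1.0 - p

def del_cost(a, prob):
    # Deleting a character is modelled as aligning it to NULL.
    return 1.0 - prob.get((a, None), 0.0)

def weighted_edit_distance(src, tgt, prob, ins_cost=1.0):
    m, n = len(src), len(tgt)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + del_cost(src[i - 1], prob)
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + del_cost(src[i - 1], prob),
                d[i][j - 1] + ins_cost,
                d[i - 1][j - 1] + sub_cost(src[i - 1], tgt[j - 1], prob),
            )
    return d[m][n]

dist = weighted_edit_distance("tal", "dar", PROB)
print(round(dist, 3))
```

With a complete table every character pair would have its own probability, so the identity fallback would rarely be used; it is here only because the example table is partial.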

  9. Clustering Distance Function
     • Language-pair-specific edit distance
     • Intra-family edit distance
     • Same backtranslation
     • Same POS
     • Same MeaningID

  10. Cognate Tables

  11. Experiments
     • Hold out words
     • Use MT to predict
     • Single language pair and system combination
     • Evaluate on 1-best, 10-best, MRR
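
The three evaluation measures can be computed as in the sketch below. The function name `evaluate` and the toy predictions are illustrative only; they are not results from the presentation. MRR is the mean of 1/rank of the correct answer, counting 0 when the answer is absent from the list.

```python
def evaluate(predictions, gold, n=10):
    """predictions: source word -> ranked hypotheses; gold: source word -> held-out answer."""
    top1 = topn = rr = 0.0
    for src, answer in gold.items():
        ranked = predictions.get(src, [])
        if ranked[:1] == [answer]:
            top1 += 1                                 # exact match at rank 1
        if answer in ranked[:n]:
            topn += 1                                 # answer within the n best
        if answer in ranked:
            rr += 1.0 / (ranked.index(answer) + 1)    # reciprocal rank
    total = len(gold)
    return {"1-best": top1 / total, f"{n}-best": topn / total, "MRR": rr / total}

# Toy data for illustration (hypothetical system output).
gold = {"stol": "ostal", "tablo": "tablis"}
predictions = {
    "stol": ["ustal", "ostal", "ostol"],
    "tablo": ["tablis", "tabli"],
}
scores = evaluate(predictions, gold)
print(scores)
```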

  12. Results: Romance

  13. Results: Romance

  14. Results: Turkic

  15. Results: Turkic

  16. Results: Romance

  17. Results: Turkic

  18. Conclusion
     • Cluster-alignment-cluster process for multilingual cognate table construction
     • Experiments
     • 1-best exact match accuracy is hard!
     • Close languages tend to do better
     • Data size matters
     • Code and data at github.com/wswu/coglust
