SLIDE 1

Cross-lingual POS Tagging

Daniel Zeman, Rudolf Rosa

March 27, 2020

NPFL120 Multilingual Natural Language Processing

Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics

SLIDE 5

POS Tags Projection across Parallel Corpora

  • David Yarowsky, Grace Ngai (2001). Inducing Multilingual POS Taggers and NP

Bracketers via Robust Projection across Aligned Corpora

  • In Proceedings of the Second Meeting of the North American Association for

Computational Linguistics (NAACL-2001), pp. 200–207, Pittsburgh, PA, USA

  • Source language:

English

  • Target languages:

French, Chinese

  • Align words using EGYPT/IBM Model 3 (Al-Onaizan et al., 1999)
  • 1:N English-target word alignment
  • or 0:1 or 1:0 for unaligned words
  • Tag the English side with an existing tagger (e.g., Brill, 1995)
  • Direct projection across alignment
  • Laws → Les lois
  • NNS → NNS NNS (the English tag is copied to both aligned target words)
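The direct projection step can be sketched as follows; the function name and the (source index, target index) alignment format are illustrative, not from the paper:

```python
def project_tags(src_tags, alignment, tgt_len):
    """Directly project source POS tags across a word alignment.

    src_tags:  POS tag per source token (from a supervised source tagger)
    alignment: list of (src_i, tgt_j) links; 1:N source-to-target allowed
    tgt_len:   number of target tokens
    Unaligned target words (0:1 case) are left as None.
    """
    tgt_tags = [None] * tgt_len
    for src_i, tgt_j in alignment:
        tgt_tags[tgt_j] = src_tags[src_i]
    return tgt_tags

# "Laws" (NNS) aligned 1:2 to "Les lois" -> both target words receive NNS
print(project_tags(["NNS"], [(0, 0), (0, 1)], 2))  # ['NNS', 'NNS']
```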


SLIDE 7

Training on Noisy Data

  • Train a tagger on the target side
  • Problem: a lot of noise!
  • Core tags only: first letter, e.g.:
  • N … noun
  • J … adjective
  • V … verb
  • R … adverb
  • I … preposition or subordinating conjunction (?)
  • Aggressive smoothing towards the two most frequent core tags of each word
  • P̂(t(2)|w) = λ1 P(t(2)|w), where λ1 < 1.0
  • P̂(t(1)|w) = 1 − P̂(t(2)|w)
  • P̂(t(c)|w) = 0 for all c > 2
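A minimal sketch of this smoothing, assuming per-word tag counts gathered from the projected corpus; the data structure and the λ1 value are illustrative:

```python
from collections import Counter

def smooth_tag_distribution(tag_counts, lam1=0.5):
    """Keep only the two most frequent tags of a word.

    P_hat(t2|w) = lam1 * P(t2|w)   with lam1 < 1.0
    P_hat(t1|w) = 1 - P_hat(t2|w)
    P_hat(t|w)  = 0 for all other tags
    """
    total = sum(tag_counts.values())
    ranked = Counter(tag_counts).most_common()
    if len(ranked) == 1:
        return {ranked[0][0]: 1.0}
    (t1, _), (t2, c2) = ranked[0], ranked[1]
    p2 = lam1 * (c2 / total)
    return {t1: 1.0 - p2, t2: p2}

# Noisy projected counts for one word: N dominates, the rest is discarded
print(smooth_tag_distribution({"N": 80, "V": 15, "J": 5}, lam1=0.5))
```

Note how the tail tag J loses all its probability mass, which is absorbed by the top tag.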


SLIDE 10

Training on Noisy Data

  • Recursively apply the smoothing to subtags
  • E.g. distribute the prob. mass of N to the two most probable subtags, NN and NNS
  • Linear interpolation of the model obtained from 1:1 alignments and of the model obtained from 1:N alignments: P(t|w) = λ2 P1:1(t|w) + (1 − λ2) P1:N(t|w)
  • λ2 is some weight from (0; 1)
  • Estimate the tag sequence model on filtered, high-confidence alignment data. There are fewer parameters, therefore we can afford it.
  • Alignment confidence score provided by Model 3
  • Prefer sentences where the directly projected tags are compatible with the estimated lexical prior probability of each word – penalize less compatible sentences by pseudo-divergence weighting:
  • sentence length k ⇒ weight = (1/k) ∑i=1..k log P̂(projected_tagi|wi)
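The weighting (average log-probability of the projected tags under the lexical priors) can be sketched like this; the `p_hat` lookup interface and the flooring of zero probabilities are assumptions of the sketch:

```python
import math

def sentence_weight(projected_tags, words, p_hat, floor=1e-6):
    """Pseudo-divergence weighting for a sentence of length k:
    weight = (1/k) * sum_i log p_hat(projected_tag_i | w_i).
    Sentences whose projected tags disagree with the lexical priors
    receive a lower (more negative) weight."""
    k = len(words)
    total = 0.0
    for tag, word in zip(projected_tags, words):
        # floor avoids log(0) for unseen (tag, word) pairs
        total += math.log(max(p_hat.get((tag, word), 0.0), floor))
    return total / k

priors = {("N", "lois"): 0.9, ("V", "lois"): 0.1}
# a compatible projection scores higher than an incompatible one
print(sentence_weight(["N"], ["lois"], priors) >
      sentence_weight(["V"], ["lois"], priors))  # True
```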

SLIDE 11

POS Tags Projection across Parallel Corpora

  • Dipanjan Das, Slav Petrov (2011). Unsupervised Part-of-Speech Tagging with Bilingual

Graph-Based Projections. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pp. 600–609, Portland, Oregon, USA.

  • Differences from Yarowsky and Ngai (2001):
  • Graph-based projection
  • Projected labels are features in an unsupervised model
  • Željko Agić, Dirk Hovy, Anders Søgaard (2015). If all you have is a bit of the Bible:

Learning POS taggers for truly low-resource languages. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Short Papers), pp. 268–272, Beijing, China.

SLIDE 12

Projection Graph

  • English vertices = word types
  • Foreign vertices = word trigram types
  • English vertices are connected to foreign vertices
  • Foreign vertices are connected to other foreign vertices
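A toy construction of a graph with this shape. The paper connects each foreign trigram type to its nearest neighbours under a feature-vector similarity; here the similarity function is pluggable and everything else (names, edge sets) is illustrative:

```python
from collections import defaultdict

def build_projection_graph(alignments, foreign_trigrams, similarity, top_k=2):
    """Vertices: English word types and foreign word-trigram types.
    Edges:    English word -> foreign trigram (from alignment),
              foreign trigram -> its top_k most similar foreign trigrams.
    alignments: iterable of (english_word, foreign_trigram) pairs
    similarity: function scoring two foreign trigrams
    """
    graph = defaultdict(set)
    for en_word, trigram in alignments:
        graph[("en", en_word)].add(("fr", trigram))
    for t in foreign_trigrams:
        neighbours = sorted(
            (u for u in foreign_trigrams if u != t),
            key=lambda u: similarity(t, u), reverse=True)[:top_k]
        for u in neighbours:
            graph[("fr", t)].add(("fr", u))
    return graph

# toy similarity: number of shared words between two trigrams
sim = lambda a, b: len(set(a.split()) & set(b.split()))
g = build_projection_graph([("laws", "les lois et")],
                           ["a b c", "b c d", "x y z"], sim, top_k=1)
print(sorted(g[("fr", "a b c")]))  # [('fr', 'b c d')]
```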


SLIDE 15

Training

  • Parallel English-foreign corpus, word-aligned
  • English side labeled by a supervised English tagger
  • Monolingual foreign corpus, unlabeled
  • Used to compute target edge weights (similarity)
  • ⇒ We will propagate tags across edges

SLIDE 16

Monolingual Similarity of Foreign Trigrams

  • Trigram type x2x3x4 in a sequence x1x2x3x4x5
  • Features:
  • Trigram + Context: x1x2x3x4x5
  • Trigram: x2x3x4
  • Left Context: x1x2
  • Right Context: x4x5
  • Center Word: x3
  • Trigram – Center Word: x2x4
  • Left Word + Right Context: x2x4x5
  • Left Context + Right Word: x1x2x4
  • Suffix: HasSuffix(x3)
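The feature set above can be extracted as follows; the suffix inventory passed to HasSuffix is a made-up stand-in, and the feature names are illustrative:

```python
def trigram_features(x1, x2, x3, x4, x5, suffixes=("s", "ed", "ing")):
    """Features for the trigram type x2 x3 x4 in context x1 ... x5."""
    feats = {
        "trigram+context": (x1, x2, x3, x4, x5),
        "trigram": (x2, x3, x4),
        "left_context": (x1, x2),
        "right_context": (x4, x5),
        "center_word": x3,
        "trigram-center": (x2, x4),
        "left_word+right_context": (x2, x4, x5),
        "left_context+right_word": (x1, x2, x4),
    }
    # HasSuffix(x3): which suffixes from the inventory match the center word
    feats["suffix"] = tuple(s for s in suffixes if x3.endswith(s))
    return feats

f = trigram_features("the", "new", "laws", "were", "passed")
print(f["trigram"])  # ('new', 'laws', 'were')
```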

SLIDE 17

POS Tags Projection across Parallel Corpora (continued)

  • Pruthwik Mishra, Vandan Mujadia, Dipti Misra Sharma (2017). POS Tagging for

Resource Poor Indian Languages through Feature Projection

  • In Proceedings of ICON 2017, Jadavpur, India
  • Source language: Hindi
  • Target languages:
  • Urdu, Punjabi, Gujarati, Marathi, Konkani, Bengali

(Indo-Aryan, i.e., related to Hindi)

  • Telugu, Tamil, Malayalam

(Dravidian, i.e., unrelated)

  • Parallel corpora: “Health” and “Tourism” (250 to 500K tokens each; not publicly available)
  • Align words using GIZA++


SLIDE 19

Source Feature Extraction

  • Hindi Treebank (450K tokens)
  • Prefix features
  • 1 to 7 prefix characters
  • Suffix features
  • 1 to 4 suffix characters
  • Length of the word
  • Previous word
  • Current word
  • Next word
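The feature list above can be sketched as a per-token extractor; the boundary symbols for sentence edges are an assumption of the sketch:

```python
def word_features(sentence, i):
    """Affix and context features for token i: prefixes of length 1-7,
    suffixes of length 1-4, word length, previous / current / next word."""
    w = sentence[i]
    feats = {}
    for n in range(1, 8):
        if n <= len(w):
            feats[f"prefix{n}"] = w[:n]
    for n in range(1, 5):
        if n <= len(w):
            feats[f"suffix{n}"] = w[-n:]
    feats["length"] = len(w)
    feats["prev"] = sentence[i - 1] if i > 0 else "<S>"
    feats["cur"] = w
    feats["next"] = sentence[i + 1] if i < len(sentence) - 1 else "</S>"
    return feats

print(word_features(["many", "journalists", "reported"], 1))
```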

SLIDE 20

Features in Hindi – Example

  • पकार patrakāroṁ “journalists”

  • Prefix(1): प pa
  • Prefix(2): पत pata
  • Prefix(3): पत् pat
  • Prefix(4): प patra
  • Prefix(5): पक patraka
  • Prefix(6): पका patrakā
  • Prefix(7): पकार patrakāra
  • Suffix(1): ◌ं ṁ
  • Suffix(2): ◌ oṁ
  • Suffix(3): र roṁ
  • Suffix(4): ◌ार āroṁ
  • Length: 9
  • Current: पकार patrakāroṁ
  • Previous, Next: context dependent

SLIDE 21

Parallel Features in Hindi and Punjabi

  • िववािहत vivāhita “married”
  • ਿਵਆਹੁਤਾ viāhutā “married”

  • Prefix(1): व va → ਵ va
  • Prefix(2): िव vi → ਿਵ vi
  • Prefix(3): िवव viva → ਿਵਆ viā
  • Prefix(4): िववा vivā → ਿਵਆਹ viāha
  • Prefix(5): िववाह vivāha → ਿਵਆਹੁ viāhu
  • Prefix(6): िववािह vivāhi → ਿਵਆਹੁਤ viāhuta
  • Prefix(7): िववािहत vivāhita → ਿਵਆਹੁਤਾ viāhutā
  • Suffix(1): त ta → ◌ਾ ā
  • Suffix(2): ि◌त ita → ਤਾ tā
  • Suffix(3): िहत hita → ◌ੁਤਾ utā
  • Suffix(4): ◌ािहत āhita → ਹੁਤਾ hutā
  • Length: 7 → 7
  • Current: िववािहत vivāhita → ਿਵਆਹੁਤਾ viāhutā

SLIDE 22

Feature Mapping

  • Source features obtained from the Hindi Treebank.
  • Projected through word alignment.
  • Only the eleven affix features are projected.
  • Unclear: what is the rest good for?
  • “If the same source feature maps to multiple target features, the most probable target

feature is selected.”

  • 11 mapping files, 1 for each feature type
  • Previous slide: just one aligned pair of words
  • Hindi word occurred multiple times, different targets?
  • Unclear:
  • Probabilities of the alignment?
  • Or just the count of this correspondence?
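One plausible reading of the mapping step, sketched below. The paper keeps 11 mapping files (one per feature type); this sketch uses a single dict keyed by (feature type, source value). Whether "most probable" means alignment probabilities or raw counts is unclear from the slide, so the sketch simply uses raw counts:

```python
from collections import Counter, defaultdict

def build_feature_mapping(aligned_feature_pairs):
    """When the same source feature maps to multiple target features,
    keep the most frequent ("most probable") target feature.
    aligned_feature_pairs: iterable of (source_feature, target_feature)."""
    counts = defaultdict(Counter)
    for src_feat, tgt_feat in aligned_feature_pairs:
        counts[src_feat][tgt_feat] += 1
    return {src: c.most_common(1)[0][0] for src, c in counts.items()}

# the same Hindi prefix aligns twice to one Punjabi prefix, once to another
pairs = [(("prefix2", "vi"), "vi"), (("prefix2", "vi"), "vi"),
         (("prefix2", "vi"), "va")]
print(build_feature_mapping(pairs))  # {('prefix2', 'vi'): 'vi'}
```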


SLIDE 26

Feature Mapping

  • Known source feature, but no projection available?
  • Back-off model ⇒ shorter feature.
  • Unclear:
  • Map the long source feature to the short target feature?
  • Or simply omit the long feature from the tagging model?
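A sketch of the first reading listed above: shorten the affix until some projection exists (the alternative, dropping the feature, is noted as unclear). The mapping interface and names are assumptions:

```python
def map_with_backoff(feature_type, value, mapping):
    """Back off a known but unmapped affix feature to a shorter one.
    mapping: dict from (feature_type, affix_value) to a target feature."""
    while value:
        key = (feature_type, value)
        if key in mapping:
            return mapping[key]
        # shorten by one character: from the end for prefixes,
        # from the front for suffixes
        value = value[:-1] if feature_type == "prefix" else value[1:]
    return None  # no projection at any length

mapping = {("prefix", "viv"): "viA"}
# "vivah" has no projection, so back off: vivah -> viva -> viv
print(map_with_backoff("prefix", "vivah", mapping))  # viA
```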

SLIDE 27

Tagging Model

  • POS tags from the Hindi Treebank
  • Each Hindi word gets target features
  • ⇒ its Hindi features projected to target language
  • Similar to word-by-word translation of the training corpus
  • Train a model that looks at the target features and predicts a POS tag
  • Such a model can be applied to the target language
  • Features can be obtained directly there
  • Method in the paper: CRF++ (Conditional Random Fields)
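The training data for such a model can be laid out in the CRF++ convention: one token per line, whitespace-separated feature columns, with the gold tag in the last column. The feature function and tags below are illustrative, not the paper's actual feature set:

```python
def crf_training_rows(sentence, tags, feature_fn):
    """Emit CRF++-style rows: tab-separated feature columns followed by
    the POS tag from the (projected) annotation.
    feature_fn(sentence, i) must return features in a fixed order."""
    rows = []
    for i, tag in enumerate(tags):
        feats = feature_fn(sentence, i)
        rows.append("\t".join(list(feats) + [tag]))
    return rows

# toy feature set: the word itself, its first letter, and its length
simple = lambda s, i: [s[i], s[i][:1], str(len(s[i]))]
for row in crf_training_rows(["les", "lois"], ["DET", "NOUN"], simple):
    print(row)
```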
