Cross-lingual POS Tagging
Daniel Zeman, Rudolf Rosa
March 27, 2020
NPFL120 Multilingual Natural Language Processing
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics
Bracketers via Robust Projection across Aligned Corpora
Computational Linguistics (NAACL-2001), pp. 200–207, Pittsburgh, PA, USA
English
French, Chinese
Les lois
NNS NNS
$\hat{P}(t^{(2)}|w) = \lambda_1 P(t^{(2)}|w)$, where $\lambda_1 < 1.0$
$\hat{P}(t^{(1)}|w) = 1 - \hat{P}(t^{(2)}|w)$
$\hat{P}(t^{(c)}|w) = 0$ for all $c > 2$
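The re-estimation above can be sketched in a few lines. This is only an illustration under assumptions not stated on the slide: t(1) and t(2) are taken to be the most frequent and second most frequent tags of a word, P(t|w) is a relative frequency over projected tag counts, and the function name `reestimate` and the value λ1 = 0.5 are hypothetical.

```python
def reestimate(tag_counts, lambda1=0.5):
    """Re-estimate P(t|w) for one word from its projected tag counts.

    Assumption: tags are ranked by raw count, so t(1) is the most
    frequent tag and t(2) the second most frequent one.
    """
    if not tag_counts:
        raise ValueError("tag_counts must not be empty")
    total = sum(tag_counts.values())
    # Rank by descending count; break ties alphabetically for determinism.
    ranked = sorted(tag_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    probs = {tag: 0.0 for tag in tag_counts}   # all tags with c > 2 stay at 0
    second_mass = 0.0
    if len(ranked) > 1:
        second_tag, second_count = ranked[1]
        second_mass = lambda1 * second_count / total   # lambda1 * P(t(2)|w)
        probs[second_tag] = second_mass
    probs[ranked[0][0]] = 1.0 - second_mass            # 1 - P^(t(2)|w)
    return probs


example = reestimate({"NN": 70, "VB": 20, "JJ": 10}, lambda1=0.5)
```

With these hypothetical counts, the third tag is dropped, the second tag keeps half of its original share, and the most frequent tag receives the remaining probability mass.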
1:N alignments: $P(t|w) = \lambda_2 P_{1:1}(t|w) + (1 - \lambda_2) P_{1:N}(t|w)$
fewer parameters, therefore we can afford it.
probability for each word – penalize less compatible sentences by pseudo-divergence weighting:
$\frac{1}{k} \sum_{i=1}^{k} \log \hat{P}(\mathrm{projected\_tag}_i | w_i)$
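A minimal sketch of the two formulas on this slide: the interpolation of the 1:1 and 1:N estimates, and the per-sentence score as an average log-probability of the projected tags. The function names, the dictionary layout, the value of λ2 and the probability floor used to avoid log(0) are all assumptions made for illustration.

```python
import math


def interpolate(p_1to1, p_1toN, lambda2=0.7):
    """P(t|w) = lambda2 * P_1:1(t|w) + (1 - lambda2) * P_1:N(t|w)."""
    tags = set(p_1to1) | set(p_1toN)
    return {
        t: lambda2 * p_1to1.get(t, 0.0) + (1.0 - lambda2) * p_1toN.get(t, 0.0)
        for t in tags
    }


def sentence_score(words, projected_tags, lexical_prob, floor=1e-6):
    """Average log-probability (1/k) * sum_i log P^(projected_tag_i | w_i).

    lexical_prob maps word -> {tag: probability}. The floor is an
    assumption of this sketch; it keeps log() defined for unseen pairs.
    """
    k = len(words)
    if k == 0 or k != len(projected_tags):
        raise ValueError("need equally long, non-empty word and tag lists")
    total = 0.0
    for word, tag in zip(words, projected_tags):
        p = lexical_prob.get(word, {}).get(tag, 0.0)
        total += math.log(max(p, floor))
    return total / k


mixed = interpolate({"NN": 1.0}, {"NN": 0.5, "VB": 0.5}, lambda2=0.8)
score = sentence_score(["a", "b"], ["X", "Y"],
                       {"a": {"X": 0.5}, "b": {"Y": 0.5}})
```

A score close to 0 means the projected tags agree with the lexical model; strongly negative scores mark sentences that could be down-weighted in training.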
Graph-Based Projections. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pp. 600–609, Portland, Oregon, USA.
Learning POS taggers for truly low-resource languages. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Short Papers), pp. 268–272, Beijing, China.
Resource Poor Indian Languages through Feature Projection
(Indo-Aryan, i.e., related to Hindi)
(Dravidian, i.e., unrelated)
Prefix(1)  प  pa
Prefix(2)  पत  pata
Prefix(3)  पत्  pat
Prefix(4)  पत्र  patra
Prefix(5)  पत्रक  patraka
Prefix(6)  पत्रका  patrakā
Prefix(7)  पत्रकार  patrakāra
Suffix(1)  ◌ं  ṁ
Suffix(2)  ◌ों  oṁ
Suffix(3)  रों  roṁ
Suffix(4)  ◌ारों  āroṁ
Length  9
Current  पत्रकारों  patrakāroṁ
Previous, Next  context dependent
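The feature list above appears to count Unicode code points rather than visible syllables: the word patrakāroṁ has length 9 only if the virama and the vowel signs are counted separately. A minimal sketch of such a feature extractor follows; the function name `affix_features` and the limits of 7 prefixes and 4 suffixes are read off the slide, everything else is an assumption. The context features (Previous, Next) are omitted because they depend on the sentence.

```python
def affix_features(word, max_prefix=7, max_suffix=4):
    """Prefix, suffix, length and identity features of a single word.

    Python strings are sequences of Unicode code points, so slicing
    splits Devanagari conjuncts and vowel signs exactly as in the table.
    """
    feats = {}
    for n in range(1, max_prefix + 1):
        if n <= len(word):
            feats[f"Prefix({n})"] = word[:n]
    for n in range(1, max_suffix + 1):
        if n <= len(word):
            feats[f"Suffix({n})"] = word[-n:]
    feats["Length"] = len(word)
    feats["Current"] = word
    return feats


# Hindi "patrakāroṁ", written with escapes so the code points are explicit:
# pa, ta, virama, ra, ka, ā-sign, ra, o-sign, anusvara
WORD = "\u092a\u0924\u094d\u0930\u0915\u093e\u0930\u094b\u0902"
features = affix_features(WORD)
```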
Prefix(1)  व va → ਵ va
Prefix(2)  वि vi → ਵਿ vi
Prefix(3)  विव viva → ਵਿਆ viā
Prefix(4)  विवा vivā → ਵਿਆਹ viāha
Prefix(5)  विवाह vivāha → ਵਿਆਹੁ viāhu
Prefix(6)  विवाहि vivāhi → ਵਿਆਹੁਤ viāhuta
Prefix(7)  विवाहित vivāhita → ਵਿਆਹੁਤਾ viāhutā
Suffix(1)  त ta → ◌ਾ ā
Suffix(2)  ◌ित ita → ਤਾ tā
Suffix(3)  हित hita → ◌ੁਤਾ utā
Suffix(4)  ◌ाहित āhita → ਹੁਤਾ hutā
Length  7 → 7
Current  विवाहित vivāhita → ਵਿਆਹੁਤਾ viāhutā
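The source → target pairs above can be produced by computing the same features on both sides of an aligned word pair and matching them by feature name. The sketch below shows only that pairing step. How the word pair is obtained (alignment or dictionary) and how the pairs are used afterwards are not shown on the slide, and the function names are hypothetical.

```python
def affix_features(word, max_prefix=7, max_suffix=4):
    """Code-point based prefix, suffix, length and identity features."""
    feats = {f"Prefix({n})": word[:n]
             for n in range(1, max_prefix + 1) if n <= len(word)}
    feats.update({f"Suffix({n})": word[-n:]
                  for n in range(1, max_suffix + 1) if n <= len(word)})
    feats["Length"] = len(word)
    feats["Current"] = word
    return feats


def pair_features(source_word, target_word):
    """Map each feature name to (source value, target value).

    Assumption: the two words are already known to be translations
    of each other; features missing on either side are skipped.
    """
    src = affix_features(source_word)
    tgt = affix_features(target_word)
    return {name: (src[name], tgt[name]) for name in src if name in tgt}


# Hindi "vivāhita" and Punjabi "viāhutā", written as explicit code points.
HINDI = "\u0935\u093f\u0935\u093e\u0939\u093f\u0924"
PUNJABI = "\u0a35\u0a3f\u0a06\u0a39\u0a41\u0a24\u0a3e"
pairs = pair_features(HINDI, PUNJABI)
```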
feature is selected.”
shorter feature.