
SLIDE 1

Exploring the use of target-language information to train the part-of-speech tagger of machine translation systems∗

Felipe Sánchez-Martínez, Juan Antonio Pérez-Ortiz, Mikel L. Forcada
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant, E-03071 Alacant, Spain
{fsanchez,japerez,mlf}@dlsi.ua.es

∗Funded by the Spanish Government through grants TIC2003-08681-C02-01 and BES-2004-4711

SLIDE 2

Contents

  • Introduction
  • Part-of-speech ambiguities in machine translation
  • Part-of-speech tagging with HMM
  • Target-language based training of HMM-based taggers
  • Target-language model
  • Experiments
  • Results
  • Discussion
  • Future work

EsTAL, 20–22 October 2004

SLIDE 3

Introduction

Part-of-speech (PoS) tagging: determining the lexical category, or PoS, of each word that appears in a text.
Lexically ambiguous word: a word with more than one possible lexical category.

  Lemma  PoS
  book   noun
  book   verb

Ambiguities are usually solved by looking at the context.

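The "book" example above can be sketched in code. This is a minimal illustration with a toy lexicon; the entries and function names are assumptions for the sketch, not part of the system described in the deck:

```python
# Toy lexicon mapping each surface form to its set of possible PoS
# tags. "book" is lexically ambiguous; "the" is not. Entries are
# illustrative only.
LEXICON = {
    "book": {"noun", "verb"},
    "the": {"article"},
}

def ambiguity_class(word):
    """Return the set of possible PoS tags for a word."""
    return LEXICON.get(word.lower(), {"unknown"})

def is_ambiguous(word):
    """A word is lexically ambiguous if it has more than one tag."""
    return len(ambiguity_class(word)) > 1
```

A tagger's job is then to pick one tag from `ambiguity_class(word)` using the surrounding context.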

SLIDE 4

PoS ambiguities in machine translation (I)

Indirect MT system: source-language (SL) text is analysed and transformed into an intermediate representation (IR); transformations are applied and, finally, target-language (TL) text is generated:

  SL text → Analysis → SL IR → Transformation → TL IR → Generation → TL text

  • The analysis module usually includes a PoS tagger


SLIDE 5–6

PoS ambiguities in machine translation (II)

Mistranslation due to wrong PoS tagging

  • The translation differs from one PoS to another:

      Spanish  PoS          Translation into Catalan
      para     preposition  per a (for/to)
      para     verb         para (stop)

  • Some transformation is applied (or not) depending on the PoS:

      Spanish   PoS           Translation into Catalan
      la calle  la (article)  el carrer (the street)      ← gender-agreement rule applied
      la calle  la (pronoun)  * la carrer (it/her street)

SLIDE 7

PoS tagging with HMM (I)

Classical use of a hidden Markov model (HMM):

  • Adopt a reduced tag set (grouping the finer tags delivered by the morphological analyser)
  • Each HMM state corresponds to a different PoS tag
  • Each input word is replaced by its corresponding ambiguity class (the set of all possible PoS tags for a given word)

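The last bullet, mapping each word to its ambiguity class, can be sketched as follows. The lexicon below reuses the deck's own example words and tags (CNJ, ART, PRN, PR, VB) but is otherwise a hypothetical stand-in:

```python
# Sketch: the observation sequence an HMM-based tagger sees is not
# the words themselves but their ambiguity classes. States are PoS
# tags; the lexicon below is a toy stand-in.
LEXICON = {
    "y":    frozenset({"CNJ"}),
    "la":   frozenset({"ART", "PRN"}),
    "para": frozenset({"PR", "VB"}),
    "si":   frozenset({"CNJ"}),
}

def to_ambiguity_classes(words):
    """Replace each input word by its ambiguity class."""
    return [LEXICON[w] for w in words]

# The HMM then disambiguates this sequence of classes:
obs = to_ambiguity_classes(["y", "la", "para", "si"])
```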

SLIDE 8–10

PoS tagging with HMM (II)

Estimating proper HMM parameters:

  • Supervised training, from a tagged corpus
  • Unsupervised training, from an untagged corpus (Baum-Welch)

New idea: use of TL information

SLIDE 11

Target-language based training of HMM-based taggers (I)

Training as if we had a tagged corpus:

  • Transition probabilities:

      a(γi, γj) = ñ(γi γj) / Σ_{γk ∈ Γ} ñ(γi γk),   where γi is a tag

  • Emission probabilities:

      b(γi, σ) = ñ(σ, γi) / Σ_{σ′ : γi ∈ σ′} ñ(σ′, γi),   where σ is an ambiguity class

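These two relative-frequency estimates can be sketched directly from (possibly fractional) counts ñ. The count data below is invented for illustration; the code assumes the emission-count table only contains pairs (σ, γi) with γi ∈ σ, matching the sum in the formula:

```python
from collections import defaultdict

def transition_probs(pair_counts):
    """a(gi, gj) = n(gi gj) / sum over gk of n(gi gk)."""
    a = defaultdict(dict)
    for gi in {pair[0] for pair in pair_counts}:
        total = sum(c for (x, _), c in pair_counts.items() if x == gi)
        for (x, gj), c in pair_counts.items():
            if x == gi:
                a[gi][gj] = c / total
    return a

def emission_probs(emit_counts):
    """b(gi, sigma) = n(sigma, gi) / sum over classes containing gi."""
    b = defaultdict(dict)
    for (sigma, gi), c in emit_counts.items():
        total = sum(c2 for (_, g2), c2 in emit_counts.items() if g2 == gi)
        b[gi][sigma] = c / total
    return b

# Invented fractional counts, as produced by the TL-based training:
pair_counts = {("ART", "NOUN"): 3.0, ("ART", "VB"): 1.0}
a = transition_probs(pair_counts)

emit_counts = {(frozenset({"ART", "PRN"}), "ART"): 2.0,
               (frozenset({"ART"}), "ART"): 2.0}
b = emission_probs(emit_counts)
```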

SLIDE 12–16

Target-language based training of HMM-based taggers (II)

  SL text
      ↓ segmentation
  seg. s1, seg. s2, seg. s3, …, seg. sn

  For each segment si:

  disambiguations:  path g1, path g2, …, path gm
      ↓ MT
  translations:     τ(g1, s), τ(g2, s), …, τ(gm, s)
      ↓ TL model
  likelihoods:      pTL(τ(g1, s)), pTL(τ(g2, s)), …, pTL(τ(gm, s))
      ↓
  probabilities:    p(g1|s), p(g2|s), …, p(gm|s)
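The per-segment pipeline above can be sketched end-to-end, with the MT system and the TL model replaced by stand-ins (`translate` and `p_tl` are placeholders, not the real interNOSTRUM system or trigram model; for this sketch the factor p(g|τ(g, s)) is taken as uniform):

```python
import itertools

def disambiguation_paths(ambiguity_classes):
    """All tag paths g for a segment: the cartesian product of the
    ambiguity classes of its words."""
    return list(itertools.product(*ambiguity_classes))

def path_probabilities(segment_classes, translate, p_tl):
    """p(g|s) proportional to the TL likelihood of the translation
    produced under path g, normalised over all paths."""
    paths = disambiguation_paths(segment_classes)
    scores = [p_tl(translate(g)) for g in paths]
    z = sum(scores)
    return {g: sc / z for g, sc in zip(paths, scores)}

# Toy stand-ins for the MT system and the TL model:
classes = [("CNJ",), ("ART", "PRN"), ("PR", "VB"), ("CNJ",)]
translate = lambda g: " ".join(g)
p_tl = lambda t: 0.9 if "VB" in t else 0.1
probs = path_probabilities(classes, translate, p_tl)
```

These p(g|s) values are then used as fractional counts ñ when estimating the HMM parameters.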

SLIDE 17

Target-language based training of HMM-based taggers (III)

  s ≡  y    la   para  si
       CNJ  ART  VB    CNJ
            PRN  PR

  Path                   Translation τ(gi, s)                        p(gi|s)
  g1 ≡ CNJ ART PR CNJ    i (and) la (the) per a (for/to) si (if)     0.0001
  g2 ≡ CNJ ART VB CNJ    i (and) la (the) para (stop) si (if)        0.4999
  g3 ≡ CNJ PRN PR CNJ    i (and) la (it/her) per a (for/to) si (if)  0.0001
  g4 ≡ CNJ PRN VB CNJ    i (and) la (it/her) para (stop) si (if)     0.4999

Free ride: a word translated the same way independently of the tag selected

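The free-ride effect can be reproduced numerically: "la" translates to Catalan "la" whether tagged ART or PRN, so paths differing only in that tag yield the same TL string and get identical probability. The TL likelihood numbers below are invented to round to the slide's values:

```python
# Each disambiguation path mapped to its Catalan translation; note
# that g1/g3 and g2/g4 collide on the same string because "la" is a
# free ride (same translation for ART and PRN).
paths = {
    ("CNJ", "ART", "PR", "CNJ"): "i la per a si",
    ("CNJ", "ART", "VB", "CNJ"): "i la para si",
    ("CNJ", "PRN", "PR", "CNJ"): "i la per a si",
    ("CNJ", "PRN", "VB", "CNJ"): "i la para si",
}

# Stand-in TL likelihoods (illustrative numbers, not the real model):
p_tl = {"i la per a si": 0.0002, "i la para si": 0.9998}

z = sum(p_tl[t] for t in paths.values())
p = {g: p_tl[t] / z for g, t in paths.items()}
# ART and PRN paths end up indistinguishable: 0.4999 each for the
# VB translations, 0.0001 each for the PR ones.
```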

SLIDE 18

Target-language based training of HMM-based taggers (IV)

  p(gi|s) ∝ p(gi|τ(gi, s)) pTL(τ(gi, s))

  • p(gi|s): probability of gi being the correct disambiguation of segment s
  • pTL(τ(gi, s)): likelihood of the translation into TL of segment s according to the disambiguation given by path gi
      – language model based on trigrams of words
      – ...
  • p(gi|τ(gi, s)): contribution of the disambiguation path gi to the translation given by τ(gi, s)


SLIDE 19

Target-language model

  • Trigram model of TL surface forms (words as they appear in raw text)
  • Probabilities smoothed via deleted interpolation and Good-Turing
  • Likelihood evaluation of a segment:
      – takes into account the two words preceding the segment, and
      – takes into account the first two words of the next segment
  • Problem: shorter translations receive higher scores than longer ones

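One way to read the evaluation rule above: score every trigram ending at or after the segment's first word, up to a two-word lookahead into the next segment. The sketch below assumes that reading; `trigram_prob` is a stand-in for the real smoothed model:

```python
def segment_likelihood(prev2, segment, next2, trigram_prob):
    """Trigram likelihood of a segment, conditioning on the two words
    that precede it and extending into the first two words of the
    next segment. trigram_prob(w1, w2, w3) = p(w3 | w1 w2)."""
    words = list(prev2) + list(segment) + list(next2)
    p = 1.0
    # trigrams start once two words of left context are available
    for i in range(2, len(words)):
        p *= trigram_prob(words[i - 2], words[i - 1], words[i])
    return p

# With a uniform stand-in model, a 2-word segment in 2+2 words of
# context contributes 4 trigram factors:
uniform = lambda w1, w2, w3: 0.5
lik = segment_likelihood(("el", "gat"), ("menja", "peix"),
                         ("cada", "dia"), uniform)  # 0.5**4 = 0.0625
```

The length bias named in the last bullet is visible here: every extra word multiplies in another factor below 1, so shorter translations score higher.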

SLIDE 20

Experiments (I)

  • We used the Spanish↔Catalan MT system interNOSTRUM (www.internostrum.com)
  • Translating from Spanish to Catalan
  • Catalan trigram language model built from a 1 822 067-word corpus
  • Use of three different corpora with 200 000 words each
  • We recompute the HMM parameters after every 1 000 words


SLIDE 21

Experiments (II)

  • Performance measured on an independent Spanish corpus:
      – PoS error rate on an 8 031-word hand-tagged corpus
      – translation error rate against human-corrected translations
  • For comparison purposes:
      – an HMM-based PoS tagger trained on a 1 000 000-word untagged Spanish corpus with the Baum-Welch algorithm (unsupervised)
      – an HMM-based PoS tagger trained on a 20 000-word hand-tagged Spanish corpus (supervised)


SLIDE 22–23

Results: PoS error

[Plot: PoS error rate (%) vs. number of training words (up to 200 000), compared with the Baum-Welch (unsupervised) and supervised baselines]

Free rides: la, las, los

SLIDE 24–25

Results: Translation error

[Plot: translation error rate (%) vs. number of training words (up to 200 000), compared with the Baum-Welch (unsupervised) and supervised baselines]

  la calle → el carrer / * la carrer

SLIDE 26–27

Results: Reducing the impact of free rides

Common free rides: la, las, los

  • 6.14% of all words, and 22.98% of ambiguous words
  • Ambiguity class: {ART, PRN}

Solution: use linguistic information; some impossible tag bigrams are forbidden.

  • We forbid, for example:
      – an article or a preposition before a verb in personal form
      – an article before proclitic pronouns

Use: disambiguation paths containing one or more forbidden bigrams are not taken into account.
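The forbidden-bigram filter can be sketched in a few lines. The tag names and the contents of the forbidden set are illustrative (the deck names the linguistic rules but not their exact encoding):

```python
# Sketch: discard any disambiguation path that contains a forbidden
# tag bigram. The set below encodes "article or preposition before a
# verb in personal form" with illustrative tag names.
FORBIDDEN = {("ART", "VB"), ("PR", "VB")}

def allowed(path):
    """True if no adjacent tag pair in the path is forbidden."""
    return all((a, b) not in FORBIDDEN
               for a, b in zip(path, path[1:]))

paths = [("CNJ", "ART", "VB", "CNJ"),   # ART before VB: rejected
         ("CNJ", "ART", "PR", "CNJ")]   # kept
kept = [g for g in paths if allowed(g)]
```

Filtering paths this way removes translations the TL model would otherwise score, which is what breaks the tie on free-ride words.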

SLIDE 28

Results: Reducing the impact of free rides (PoS error)

[Plot: PoS error rate (%) vs. number of training words (up to 200 000), compared with the Baum-Welch (unsupervised) and supervised baselines]


SLIDE 29–32

Discussion

  • PoS error and translation error rates lie between those produced by supervised and unsupervised methods
  • The presence of free rides makes the algorithm behave unstably, due to the kind of TL model used
      – The problem is partially solved using a small amount of linguistic information
  • The translation error rate is reduced by around 2% with a small amount of text, even when no linguistic information is used
  • The training method produces PoS taggers that are tuned not only to SL texts, but also to TL texts and the underlying MT system

SLIDE 33–34

Future work

  • Research on better estimates for p(gi|τ(gi, s))
      – Estimate the HMM parameters iteratively: use the parameters of the previous iteration to estimate p(gi|τ(gi, s))
  • Time complexity reduction
      – Use a k-best Viterbi algorithm with the current parameters to calculate approximate likelihoods, and translate only the k most promising paths
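The proposed speed-up boils down to pruning before translation: score every path cheaply with the current model and send only the top k to the MT system. A minimal sketch (the scoring function is a stand-in for the approximate Viterbi-style likelihood, not the deck's algorithm):

```python
import heapq

def k_best_paths(paths, path_score, k):
    """Keep only the k highest-scoring disambiguation paths; only
    these would then be translated and scored by the TL model."""
    return heapq.nlargest(k, paths, key=path_score)

# Toy scores standing in for approximate HMM likelihoods:
scores = {"g1": 0.5, "g2": 0.3, "g3": 0.9}
best = k_best_paths(list(scores), scores.get, 2)  # ["g3", "g1"]
```

Since the number of paths grows as the product of the ambiguity-class sizes in a segment, translating only k of them bounds the calls to the MT system per segment.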