Speeding up target-language driven part-of-speech tagger training


SLIDE 1

Speeding up TL driven part-of-speech tagger training for MT

Speeding up target-language driven part-of-speech tagger training for machine translation

Felipe Sánchez-Martínez Juan Antonio Pérez-Ortiz Mikel L. Forcada

Transducens Group – Departament de Llenguatges i Sistemes Informàtics Universitat d’Alacant, E-03071 Alacant, Spain {fsanchez,japerez,mlf}@dlsi.ua.es

MICAI, 5th Mexican International Conference on Artificial Intelligence Apizaco, México November 16, 2006

SLIDE 2

Outline

1. Introduction
  • Part-of-speech tagging for machine translation
  • Part-of-speech tagging with HMM

2. Target-language driven HMM training
  • Method overview
  • Disadvantage

3. Pruning of disambiguation paths
  • Pruning method
  • HMM updating

4. Experiments
  • Overview
  • Results

5. Discussion
  • Concluding remarks
  • Future work

SLIDE 4

Part-of-speech tagging

  • Part-of-speech tagging: determining the lexical category or part-of-speech (PoS) of each word that appears in a text
  • Lexically ambiguous word: a word with more than one possible lexical category or PoS

            Lemma   PoS
    book    book    noun
            book    verb

  • Ambiguities are usually resolved according to the surrounding context

SLIDE 5

PoS tagging for machine translation /1

Indirect rule-based machine translation (MT) systems usually perform PoS tagging as a subtask of the analysis procedure

source text → Analysis → Transfer → Generation → target text

SLIDE 6

PoS tagging for machine translation /2

PoS tagging becomes crucial:

  • Translation may differ from one PoS to another

    English   PoS    Spanish
    book      noun   libro
              verb   reservar

  • Some transformation is applied (or not) for some PoS

    English           PoS          Spanish
    the green house   green-adj    la casa verde      ← reordering rule applied
                      green-noun   * el césped casa

SLIDE 7

PoS tagging with HMM

Hidden Markov models are one of the standard statistical solutions for PoS tagging

[Figure: HMM diagram. States are PoS tags (verb, noun, ...), arcs are annotated with probabilities, and the observations are ambiguity classes such as verb|noun, verb|noun|adj, noun|verb and noun|prp.]

  • Each HMM state corresponds to a different PoS tag
  • Each input word is replaced by its corresponding ambiguity class
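
A minimal sketch of the word-to-ambiguity-class mapping, assuming a toy lexicon. The `LEXICON` dictionary and the function names are hypothetical and only illustrate the idea; they are not taken from the Apertium code.

```python
# Hypothetical toy lexicon: each word form maps to the PoS tags it may take.
LEXICON = {
    "he": {"prn"},
    "books": {"noun", "verb"},
    "the": {"art"},
    "room": {"noun", "verb"},
}


def ambiguity_class(word):
    """Return the ambiguity class of a word as a sorted tuple of tags."""
    return tuple(sorted(LEXICON[word.lower()]))


def to_ambiguity_classes(words):
    """Replace every word by its ambiguity class (the HMM observation)."""
    return [ambiguity_class(w) for w in words]


observations = to_ambiguity_classes(["He", "books", "the", "room"])
print(observations)
```

The HMM then sees the sequence of ambiguity classes, not the words themselves, which keeps the observation alphabet small.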

SLIDE 8

HMM parameter estimation

Supervised (unambiguous corpora available):

  • Maximum-likelihood estimate (MLE)

Unsupervised (only ambiguous corpora available):

  • Baum-Welch (expectation-maximization, EM)
  • Our recently proposed (Sánchez-Martínez et al. 2004) target-language (TL) driven method
  • ...
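
For the supervised case, a maximum-likelihood estimate reduces to relative frequencies over a hand-tagged corpus. The sketch below estimates transition probabilities only; the tiny corpus and the function name `mle_transitions` are assumptions made for illustration, and no smoothing is applied.

```python
from collections import Counter


def mle_transitions(tagged_sentences):
    """Estimate P(next tag | previous tag) by relative frequency."""
    bigrams = Counter()
    previous = Counter()
    for sentence in tagged_sentences:
        tags = [tag for _, tag in sentence]
        for prev, nxt in zip(tags, tags[1:]):
            bigrams[(prev, nxt)] += 1
            previous[prev] += 1
    return {pair: n / previous[pair[0]] for pair, n in bigrams.items()}


# Hypothetical hand-tagged corpus (word, tag).
corpus = [
    [("he", "prn"), ("books", "verb"), ("the", "art"), ("room", "noun")],
    [("the", "art"), ("book", "noun")],
    [("the", "art"), ("green", "adj"), ("house", "noun")],
]
a = mle_transitions(corpus)
print(a[("art", "noun")], a[("art", "adj")])
```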

SLIDE 10

Target-language driven method overview

  • The method uses the MT system in which the resulting tagger will be embedded; however, it will also work for other natural language processing tasks
  • A target-language (TL) model is used to choose the best disambiguations
  • HMM parameters are calculated according to the likelihood of the corresponding translations into the TL
  • The resulting tagger is tuned to the translation quality

SLIDE 11

Example

Source-language (SL) sentence (English):

He-prn books-noun|verb the-art room-noun|verb

Possible translations (Spanish) according to each disambiguation and their normalized likelihoods according to a target-language (TL) model:

    Translation                                    Likelihood
    Él-prn reserva-verb la-art habitación-noun     0.75
    Él-prn reserva-verb la-art aloja-verb          0.15
    Él-prn libros-noun la-art habitación-noun      0.06
    Él-prn libros-noun la-art aloja-verb           0.04
    Total                                          1.00

The HMM parameters involved in these 4 disambiguations are updated according to their likelihoods in the TL
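
One way to picture this update is as fractional counting: every tag transition that occurs in a disambiguation path receives that path's normalized TL likelihood as its count. This is a sketch under that assumption; the slides do not give the exact update formula, and the function name is hypothetical.

```python
from collections import defaultdict

# Disambiguation paths of "He books the room" with the normalized
# TL likelihoods from the example.
paths = [
    (("prn", "verb", "art", "noun"), 0.75),
    (("prn", "verb", "art", "verb"), 0.15),
    (("prn", "noun", "art", "noun"), 0.06),
    (("prn", "noun", "art", "verb"), 0.04),
]


def fractional_transition_counts(weighted_paths):
    """Accumulate, for every tag bigram, the likelihood of the paths using it."""
    counts = defaultdict(float)
    for tags, weight in weighted_paths:
        for prev, nxt in zip(tags, tags[1:]):
            counts[(prev, nxt)] += weight
    return dict(counts)


counts = fractional_transition_counts(paths)
for bigram, value in sorted(counts.items()):
    print(bigram, round(value, 2))
```

Under this assumption the transition prn → verb collects most of the probability mass, so the trained tagger would prefer reading "books" as a verb after a pronoun.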

SLIDE 14

Disadvantage

  • The number of possible disambiguations to translate grows exponentially with the segment length
  • Translation is the most time-consuming task
  • Consequence: segment length must be constrained to keep complexity under control
  • The potential benefit of likelihoods estimated from longer segments is lost

Goal: to overcome this problem

How? By pruning unlikely disambiguation paths using a priori knowledge
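
The growth is easy to check: the number of disambiguation paths of a segment is the product of the sizes of its ambiguity classes. A small sketch, with a hypothetical helper name:

```python
from math import prod


def num_disambiguation_paths(ambiguity_classes):
    """Number of tag sequences compatible with a sequence of ambiguity classes."""
    return prod(len(cls) for cls in ambiguity_classes)


# "He books the room": two ambiguous words with two tags each.
segment = [("prn",), ("noun", "verb"), ("art",), ("noun", "verb")]
print(num_disambiguation_paths(segment))

# Worst case for n words that each have two possible tags: 2**n paths.
for n in (5, 10, 20):
    print(n, num_disambiguation_paths([("noun", "verb")] * n))
```

Each of these paths has to be translated, which is why unpruned training only scales to short segments.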

SLIDE 16

Pruning method /1

  • Based on an initial model of SL tags (Mtag)
  • Assumption: any reasonable model of SL tags may be useful to choose a set of possible disambiguation paths that contains the correct one
  • It is not necessary to translate all possible disambiguation paths, only the “promising” ones
  • The model used for pruning can be updated dynamically

SLIDE 17

Pruning method /2

1. The a priori likelihood $p(g_i \mid s, M_{\mathrm{tag}})$ of each possible disambiguation path $g_i$ of segment $s$ is calculated using the model Mtag

2. Then, the set of disambiguation paths to take into account is determined:

  • Only the most likely disambiguation paths are kept
  • A mass probability threshold ρ is introduced
  • The set $T(s)$ of disambiguation paths taken into account satisfies

$$\rho \le \sum_{g_i \in T(s)} p(g_i \mid s, M_{\mathrm{tag}})$$
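
The selection step can be sketched as follows, assuming that the paths are taken in decreasing order of a priori likelihood and that selection stops as soon as the accumulated mass reaches ρ. The a priori probabilities and the name `prune_paths` are hypothetical; in the method they would come from the model Mtag.

```python
def prune_paths(path_probs, rho):
    """Keep the most likely paths until their cumulative probability reaches rho."""
    ranked = sorted(path_probs.items(), key=lambda item: item[1], reverse=True)
    kept, mass = [], 0.0
    for path, prob in ranked:
        if mass >= rho:
            break
        kept.append(path)
        mass += prob
    return kept


# Hypothetical a priori likelihoods for the four paths of a segment.
priors = {
    ("prn", "verb", "art", "noun"): 0.50,
    ("prn", "noun", "art", "noun"): 0.30,
    ("prn", "verb", "art", "verb"): 0.15,
    ("prn", "noun", "art", "verb"): 0.05,
}
for rho in (0.4, 0.6, 1.0):
    print(rho, len(prune_paths(priors, rho)))
```

With ρ = 1.0 every path is kept, which corresponds to training without pruning.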

SLIDE 18

HMM updating

The model Mtag used for pruning can be updated with the new evidence collected from the TL. The update consists of:

1. Calculating the HMM parameters with the counts collected from the TL

2. Mixing the parameters of the new HMM with those of the initial one

SLIDE 19

HMM parameters mixing

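Let

$$\theta = (a_{\gamma_1\gamma_1}, \ldots, a_{\gamma_{|\Gamma|}\gamma_{|\Gamma|}}, b_{\gamma_1\sigma_1}, \ldots, b_{\gamma_{|\Gamma|}\sigma_{|\Sigma|}})$$

be a vector containing all the parameters of a given HMM.

Mixing equation:

$$\theta_{\mathrm{mixed}}(x) = \lambda(x)\,\theta_{\mathrm{TL}}(x) + (1 - \lambda(x))\,\theta_{\mathrm{init}}$$

  • $\lambda(x)$ assigns a weight to the model estimated using the counts collected from the TL ($\theta_{\mathrm{TL}}$)
  • This weight function is made to depend on the number $x$ of SL words processed so far:

$$\lambda(x) = x / C$$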
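
The mixing step is a linear interpolation of two parameter vectors. The sketch below follows the mixing equation directly; clamping the weight to 1.0 and the value chosen for C are assumptions, since the slides do not say how C is set.

```python
def mixing_weight(x, C):
    """lambda(x) = x / C; clamping to 1.0 is an assumption, not from the slides."""
    return min(x / C, 1.0)


def mix_parameters(theta_tl, theta_init, x, C):
    """Interpolate the TL-estimated parameters with the initial ones."""
    lam = mixing_weight(x, C)
    return [lam * tl + (1.0 - lam) * init for tl, init in zip(theta_tl, theta_init)]


# Hypothetical two-parameter example; C is an assumed constant.
theta_init = [0.5, 0.5]
theta_tl = [0.8, 0.2]
C = 1000
for x in (0, 500, 1000):
    print(x, mix_parameters(theta_tl, theta_init, x, C))
```

Early in training the initial model dominates; the TL-estimated model takes over as more SL words are processed.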

SLIDE 21

Overview

  • Task: training a Spanish PoS tagger, with Catalan as the TL
  • TL model: trigram language model trained on a Catalan corpus of around 2 000 000 words
  • SL corpora: 5 disjoint Spanish corpora of 500 000 words each
  • Initial model: estimated through Kupiec’s method
  • HMM updating: after every 1 000 words
  • Mass probability threshold: 0.1 ≤ ρ ≤ 1.0, in increments of 0.1
  • Evaluation: hand-tagged corpus of around 8 000 words

SLIDE 22

Framework

  • Open-source shallow-transfer MT engine Apertium, http://apertium.org
  • Packages: lttoolbox-1.0.1, apertium-1.0.1, apertium-es-ca-1.0.1
  • The method presented (including the language model) is implemented inside the package apertium-tagger-training-tools
  • All packages, including source code, can be freely downloaded from http://sourceforge.net/projects/apertium

SLIDE 23

Apertium working scheme

Shallow-transfer machine translation architecture

SL text → morph. analyser → PoS tagger → struct. transfer (with lexical transfer) → morph. generator → TL text

  • The PoS tagger is trained by using the modules of the MT engine that come after it
  • The morphological analyser is used to preprocess SL texts

SLIDE 24

Results /1

Mean and std. dev. of the PoS tagging error rate achieved after training for each value of ρ

[Figure: PoS error (% of ambiguous words), axis from 23 to 30, plotted against the mass probability threshold ρ, from 0.1 to 1.0]

SLIDE 25

Results /2

Evolution of the mean and std. dev. of the PoS tagging error rate of the mixed model used for pruning for ρ = 0.6

[Figure: PoS error (% of ambiguous words), axis from 22 to 32, plotted against the number of SL words processed, up to 500 000]

SLIDE 26

Results /3

Percentage of translated words for each value of ρ

[Figure: ratio of translated words (%), axis from 10 to 100, plotted against the mass probability threshold ρ, from 0.1 to 1.0]

SLIDE 28

Concluding remarks

  • The pruning method avoids more than 80% of the translations that would otherwise have to be performed
  • The results achieved are even better than those obtained when no pruning is performed (ρ = 1.0)
      • HMM parameters involved in the discarded disambiguations have a null count
      • When no pruning is done, their counts are small but never null

SLIDE 29

Future work

  • Try other weighting functions that give a higher weight earlier to the model being learned from the TL
  • Test how fast the TL-driven method learns
  • Test two additional strategies to select the disambiguation paths to take into account:
      • Dynamically change the value of the mass probability threshold ρ while training
      • Instead of using ρ, always select a fixed number k of disambiguation paths to translate
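
The fixed-k strategy could look like the sketch below, which keeps the k paths with the highest a priori likelihood regardless of their total mass. This only illustrates the proposed future work; the probabilities and the name `top_k_paths` are hypothetical.

```python
import heapq


def top_k_paths(path_probs, k):
    """Return the k paths with the highest a priori likelihood."""
    return heapq.nlargest(k, path_probs, key=path_probs.get)


# Hypothetical a priori likelihoods for the four paths of a segment.
priors = {
    ("prn", "verb", "art", "noun"): 0.50,
    ("prn", "noun", "art", "noun"): 0.30,
    ("prn", "verb", "art", "verb"): 0.15,
    ("prn", "noun", "art", "verb"): 0.05,
}
print(top_k_paths(priors, 2))
```

Unlike the threshold ρ, a fixed k bounds the number of translations per segment, whatever the segment length.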

SLIDE 30

Further reading

Sánchez-Martínez, F., J. A. Pérez-Ortiz and M. L. Forcada. Exploring the use of target-language information to train the part-of-speech tagger of machine translation systems. Lecture Notes in Computer Science 3230 (Advances in Natural Language Processing, Proceedings of EsTAL - España for Natural Language Processing), p. 137–148, 2004.

Corbí-Bellot, A. M., M. L. Forcada, S. Ortiz-Rojas, J. A. Pérez-Ortiz, G. Ramírez-Sánchez, F. Sánchez-Martínez, I. Alegria, A. Mayor and K. Sarasola. An open-source shallow-transfer machine translation engine for the Romance languages of Spain. Proceedings of the Tenth Conference of the European Association for Machine Translation, p. 79–80, 2005.
