Discriminative word alignment by learning the alignment structure and syntactic divergence between a language pair


SLIDE 1

Discriminative word alignment by learning the alignment structure and syntactic divergence between a language pair

Sriram Venkatapathy, IIIT Hyderabad
Aravind Joshi, University of Pennsylvania

SLIDE 2

Outline

► Word Alignment - English-Hindi Language Pair
► Related approaches
► Discriminative Re-ranking approach
  • Features
  • Parameter optimization using MIRA
  • Results
► Future Work and Conclusion

SLIDE 3

Word-Alignment

► People of these islands have adopted Hindi as a means of communication .
► इन द्वीपों के लोग ने हिंदी भाषा को एक संपर्क भाषा के रूप में अपना लिया है .
► (Gloss: These islands of people hindi language a communication language in form of adopted-take-be)
► Primary Observation:
  • The alignment between English-Hindi is largely non-monotonic, unlike the alignment between English-French.

SLIDE 4

Comparison

[Figure: Alignments of an example sentence, English-French vs. English-Hindi; axes show English word indices against French/Hindi word indices.]

SLIDE 5

Outline

► Word Alignment - English-Hindi Language Pair
► Related approaches
► Discriminative Re-ranking approach
  • Features
  • Parameter optimization using MIRA
  • Results
► Future Work and Conclusion

SLIDE 6

Related approaches

► Generative models
  • IBM Models, HMM models (implemented in GIZA++)
► Discriminative models
  • (Taskar et al., 2005)
  • (Moore et al., 2005)

SLIDE 7

Generative models - Limitations

  • Difficult to add new parameters.
► The generative story needs to be modified appropriately to incorporate the new parameters.
  • Parameters are not optimized.
► All the parameters used have equal weights. For example, translation probabilities have the same importance as distortion probabilities.
► As more complex features are added to the model, the parameters need to be optimized appropriately.

SLIDE 8

(Taskar et al., 2005) - Limitations

  • The alignment search and optimization requires that the features are local to the alignment link.
  • There is 0th-order correlation with other alignment links in an alignment.
  • (Lacoste-Julien et al., 2006) include first-order features (similar to HMM parameters) and fertility, but there still isn't much room for more complex global features required for aligning diverse language pairs such as English-Hindi.

SLIDE 9

(Moore et al., 2006) - Limitations

► Structural features are applied on partial structures (i.e., every time a new alignment link is considered).
  • May lead to ruling out good alignments at an early stage.
  • Restricts us from using more complex syntactic features (as it is a left-to-right search).

SLIDE 10

Outline

► Word Alignment - English-Hindi Language Pair
► Related approaches
► Discriminative Re-ranking approach
  • Features
  • Parameter optimization using MIRA
  • Results
► Future Work and Conclusion

SLIDE 11

Discriminative Re-ranking Approach

► The best alignment â = argmax_a score(a | e, h)
► Here, e is the English and h is the Hindi sentence.
► score(a | e, h) = score_La(a | e, h) + score_S(a | e, h)

SLIDE 12

Alignment search (Discriminative Re-ranking)

► Three main steps
  • Populate the Beam
► Use local features to determine K-best alignments of source words with words in the target sentence.
  • Re-order the Beam
► Re-order the above alignments using structural features.
  • Post-processing
► Extend alignments to include other links that can be inferred using simple rules.
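The three steps above can be sketched in Python as follows. This is an illustrative sketch, not the paper's implementation: it enumerates candidate alignments exhaustively for clarity, whereas the actual search populates the beam with a priority queue, and `local_score` / `structural_score` are assumed callables.

```python
from itertools import product

def align(src, tgt, K, local_score, structural_score):
    """Three-step re-ranking search (sketch; exhaustive for clarity)."""
    # Step 1 (Populate the Beam): K-best alignments under local scores alone.
    # An alignment maps each source index j to a target index or None (NULL).
    candidates = list(product(list(range(len(tgt))) + [None], repeat=len(src)))
    local = lambda a: sum(local_score(j, k) for j, k in enumerate(a))
    beam = sorted(candidates, key=local, reverse=True)[:K]
    # Step 2 (Re-order the Beam): add structural scores and re-sort.
    beam.sort(key=lambda a: local(a) + structural_score(a), reverse=True)
    # Step 3 (Post-processing) would extend beam[0] with links inferred by
    # simple rules; omitted here.
    return beam[0]
```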

SLIDE 13

Alignment search (Discriminative Re-ranking)

[Figure: illustration of the alignment search with K = 4]

SLIDE 14

Populate the Beam

► Obtain K-best candidate alignments using local scores.
► Local score is computed by looking at the features of the individual alignment links independently.
► score_L(e_j, h_k) = W . f_L(e_j, h_k)
► score_La(a | e, h) = ∑ score_L(e_j, h_k)
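In code, the link score and its aggregate over an alignment might look like this (a sketch; representing the weight vector W and the feature function f_L as dicts is an illustrative choice, not the paper's):

```python
def score_L(W, f_L, e_j, h_k):
    """Local score of one alignment link: dot product W . f_L(e_j, h_k)."""
    return sum(W.get(name, 0.0) * value
               for name, value in f_L(e_j, h_k).items())

def score_La(W, f_L, alignment):
    """Local score of a whole alignment: sum of its links' local scores."""
    return sum(score_L(W, f_L, e_j, h_k) for e_j, h_k in alignment)
```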

SLIDE 15

Populate the Beam - 2

► Task: Populate the beam in the decreasing order of score_La(a | e, h).
► Compute the local score of each source word with every target word (including NULL).
► Top-k alignment links of each source word are chosen.

SLIDE 16

Populate the Beam - 2

► Populating K-best alignments
  • Implemented using Priority Queues.
► Initial State of Priority Queue
  • One entry representing the best alignment (set of best alignment links).
► At every iteration
  • Pop the best entry from the PQ.
  • Add its k successor entries back into the PQ.
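The priority-queue loop above can be sketched as follows. This is illustrative: `link_scores` is an assumed best-first candidate list per source word, and successor entries are generated by demoting one word to its next-best link.

```python
import heapq

def k_best_alignments(link_scores, K):
    """Enumerate the K best alignments from per-word candidate links.

    link_scores[j] is a best-first list of (score, target) pairs for source
    word j; an alignment is identified by a tuple of ranks, one per word.
    """
    n = len(link_scores)
    total = lambda ranks: sum(link_scores[j][r][0] for j, r in enumerate(ranks))
    best = (0,) * n                           # every word takes its best link
    heap = [(-total(best), best)]             # max-heap via negated scores
    seen = {best}
    results = []
    while heap and len(results) < K:
        neg, ranks = heapq.heappop(heap)      # pop the best entry from the PQ
        results.append(([link_scores[j][r][1] for j, r in enumerate(ranks)],
                        -neg))
        for j in range(n):                    # add its successor entries
            if ranks[j] + 1 < len(link_scores[j]):
                succ = ranks[:j] + (ranks[j] + 1,) + ranks[j + 1:]
                if succ not in seen:
                    seen.add(succ)
                    heapq.heappush(heap, (-total(succ), succ))
    return results
```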

SLIDE 17

Re-order the Beam

► Structural scores are now added to the local scores of the alignments in the beam in order to re-order the beam.
  • score_S(a) = W . f_S(a)
► Overall score = score_La(a) + score_S(a)
► Structural features look at properties of the entire alignment structure instead of individual alignment links.

SLIDE 18

Post-processing

► Previous two steps produce alignments which contain one-to-one and many-to-one mappings.
► Goal is to extend the best alignment structure from the previous step to include other alignment links of one-to-many/many-to-many types.
► New alignment links are added while processing source words in the breadth-first order of the dependency structure.

SLIDE 19

Post-processing

► Algorithm:
► Let w be the next word considered; pw = parent(w).
  • If w, pw are linked to one or more common words: align w to all words already aligned with pw.
  • Else, use simple target-specific rules to extend alignments of w.
► Recursively consider all the children of w.
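A sketch of this algorithm over the source dependency tree (illustrative: `children` and `extend_rules` are assumed inputs standing in for the parser output and the paper's target-specific rules):

```python
from collections import deque

def postprocess(alignment, children, root, extend_rules):
    """Extend an alignment over the source dependency tree, breadth-first.

    alignment maps each source word to a set of aligned target positions;
    children maps each source word to its dependents.
    """
    parent = {c: w for w, cs in children.items() for c in cs}
    queue = deque(children.get(root, []))
    while queue:
        w = queue.popleft()                  # next word w; pw = parent(w)
        pw = parent[w]
        if alignment[w] & alignment[pw]:     # linked to common word(s)?
            alignment[w] |= alignment[pw]    # align w to all of pw's words
        else:
            alignment[w] |= extend_rules(w, alignment[w])
        queue.extend(children.get(w, []))    # consider all children of w
    return alignment
```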

SLIDE 20

Post-processing

[Figure: post-processing example]

SLIDE 21

Outline

► Word Alignment - English-Hindi Language Pair
► Related approaches
► Discriminative Re-ranking approach
  • Features
  • Parameter optimization using MIRA
  • Results
► Future Work and Conclusion

SLIDE 22

Features - Local

► DiceWords (Taskar et al., 2005)
► DiceRoots: Lemmatized forms of e_j and h_k.
► Dict: Whether there exists an entry from source word e_j to target word h_k.
► Null(POS): Binary feature which is active when a source word with a particular part-of-speech tag is aligned with NULL.
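For reference, the Dice features reduce to the standard Dice coefficient over sentence-pair co-occurrence counts; DiceRoots is the same computation applied to lemmatized forms. A sketch (the count containers are assumed):

```python
def dice(cooc, count_e, count_h, e, h):
    """Dice coefficient for word pair (e, h): 2 * C(e, h) / (C(e) + C(h)),
    where C counts (co-)occurrences over the sentence pairs of the corpus."""
    return 2.0 * cooc.get((e, h), 0) / (count_e[e] + count_h[h])
```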

SLIDE 23

Structural Features

► Overlap
  • This feature considers the instances in a sentence pair where a source word links to a target word which is a participant in more than one alignment link.
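A minimal sketch of how such a feature could be counted (representing an alignment as a dict of source words to sets of target positions is an assumption, not the paper's data structure):

```python
from collections import Counter

def overlap(alignment):
    """Count instances where a source word links to a target word that
    participates in more than one alignment link."""
    fertility = Counter(k for links in alignment.values() for k in links)
    return sum(1 for links in alignment.values()
               for k in links if fertility[k] > 1)
```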

SLIDE 24

Structural Features

► Null Percent
  • This feature measures the percentage of words in the target sentence with zero fertility.

SLIDE 25

Structural Features

► Direction of Dependency Pair
  • Captures first-order interdependence between the alignment links connected to two source words connected by a dependency relation.
  • One way to measure such interdependence is by noting the order of the target-sentence words aligned to the child and the parent of a source-sentence dependency relation.
  • Three possible orders (next slide).
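One possible encoding of the three orders (illustrative; the label names are invented here, and the inputs are the target positions aligned to the child and parent):

```python
def dependency_pair_direction(child_pos, parent_pos):
    """Relative order, in the target sentence, of the words aligned to the
    child and the parent of a source dependency pair."""
    if child_pos < parent_pos:
        return 'child-before-parent'
    if child_pos > parent_pos:
        return 'child-after-parent'
    return 'same-word'
```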

SLIDE 26

Structural Features

► Direction of Dependency Pair
► The feature thus captures a simple divergence between the source and target dependency structures.

SLIDE 27

Outline

► Word Alignment - English-Hindi Language Pair
► Related approaches
► Discriminative Re-ranking approach
  • Features
  • Parameter optimization using MIRA
  • Results
► Future Work and Conclusion

SLIDE 28

Online large margin Training using MIRA

► For parameter optimization, we used an online large-margin algorithm called MIRA (Crammer and Singer, 2005; McDonald et al., 2005).
► If T = { (x_i, y_i) }, i = 1..m, is the gold data, where x_i is the i-th sentence pair and y_i is the corresponding gold alignment, the task is to learn the weight vector W such that:

SLIDE 29

Online large margin Training using MIRA

► For a sentence pair, the weights should be optimized in the following fashion.
► Online training algorithm:

  Minimize ||w_{i+1} - w_i||
  such that
    w . f(x_i, y_i) - w . f(x_i, y'_i) >= loss(y_i, y'_i)
  for all (x_i, y_i) ∈ T, y'_i ∈ K-best Predictions(x_i)
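A single-constraint MIRA step consistent with the constraint above can be sketched as follows (illustrative: sparse vectors as dicts, and a cap C on the step size are assumptions not specified on the slide):

```python
def mira_update(w, f_gold, f_pred, loss, C=1.0):
    """One MIRA step for a single K-best prediction: change w minimally so
    the gold alignment outscores the prediction by at least its loss."""
    keys = set(f_gold) | set(f_pred)
    diff = {k: f_gold.get(k, 0.0) - f_pred.get(k, 0.0) for k in keys}
    margin = sum(w.get(k, 0.0) * v for k, v in diff.items())
    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0.0:
        return dict(w)                       # identical features: no update
    tau = min(C, max(0.0, (loss - margin) / norm_sq))
    new_w = dict(w)
    for k, v in diff.items():
        new_w[k] = new_w.get(k, 0.0) + tau * v
    return new_w
```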

SLIDE 30

Outline

► Word Alignment - English-Hindi Language Pair
► Related approaches
► Discriminative Re-ranking approach
  • Features
  • Parameter optimization using MIRA
  • Results
► Future Work and Conclusion

SLIDE 31

Data

► Unsupervised data: 50,000 sentence pairs
► Supervised data
  • Training: 4252 sentence pairs
  • Testing: 100 sentence pairs

SLIDE 32

GIZA++ results

[Table: GIZA++ baseline results]

SLIDE 33

Results using local features

Features                 Precision  Recall  F-measure  AER
Dicewords + Diceroots    41.49      38.71   40.05      59.95
+ Null_POS               42.82      38.29   40.43      59.57
+ Dict                   43.94      39.30   41.49      58.51
+ Word pairs             46.27      41.07   43.52      56.48

SLIDE 34

Results after adding Global features

Features                          Precision  Recall  F-measure  AER
Local feats.                      46.27      41.07   43.52      56.48
Local feats. + Overlap            48.17      42.76   45.30      54.70
Local feats. + Direct_Deppair     47.93      42.55   45.08      54.92
Local feats. + All struct. feats  48.81      43.31   45.90      54.10

SLIDE 35

Adding structural features to Giza transition probabilities

Features                          Precision  Recall  F-measure  AER
IBM Model-4 Pars. + Local feats.  48.85      43.98   46.29      52.71
Local feats. + All struct. feats  48.95      50.06   49.50      50.50

SLIDE 36

Outline

► Word Alignment - English-Hindi Language Pair
► Related approaches
► Discriminative Re-ranking approach
  • Features
  • Parameter optimization using MIRA
  • Results
► Future Work and Conclusion

SLIDE 37

Future work

► Experiment with more sophisticated structural features.
► Design a transducer (dependency based) which uses parameter weights learnt by our approach and the LM.

SLIDE 38

Future work

► Merge the two alignment search steps to make better use of structural features.

SLIDE 39

THANK YOU

Questions and Suggestions?