Maximum Entropy Models for Realization Ranking

Erik Velldal† <erik.velldal@iln.uio.no>
Stephan Oepen†‡ <oe@csli.stanford.edu>

† Department of Linguistics and Scandinavian Studies, University of Oslo (Norway)
‡ Center for the Study of Language and Information, Stanford (USA)

Realization Ranking

The problem: ambiguity in generation; there are many ways to formulate a given meaning.

A solution: use statistics to model preferences and soft constraints (grammaticality is already guaranteed).

We trained and tested three types of models:

1) n-gram language models (surface oriented)
2) a maximum entropy model (structural features)
3) a combination of 1) and 2)


Overview

• Generation in the LOGON MT system and the problem of realization ranking.
• Reference experiments: random choice and n-gram language models.
• The relation to parse selection.
• Treebank data and maximum entropy models (MaxEnt).
• A combined model: MaxEnt + language model.
• Results, future work and discussion.

Generation in the LOGON MT-system

LOGON:
– Aims at high-precision Norwegian–English MT of texts in the tourism domain.
– A symbolic, rule-based system, centered on semantic transfer using Minimal Recursion Semantics (MRS; Copestake, Flickinger, Malouf, Riehemann, & Sag, 1995).
– Includes stochastic methods for ambiguity management.

The LKB Chart Generator (Carroll, Copestake, Flickinger, & Poznanski, 1999; Carroll & Oepen, 2005):
– Lexically-driven, bottom-up chart generation from MRSs.
– Generation based on the LinGO English Resource Grammar (ERG; Flickinger, 2002), a general-purpose, wide-coverage grammar designed using HPSG and MRS.


Generator Ambiguity

Caused by, e.g., the optionality of complementizers and relative pronouns, permutation of (intersective) modifiers, different possible topicalizations, and lexical and orthographic alternations.

The average number of realizations in the current data set is 73 (max = 5712).

All realizations of a given MRS are guaranteed to be semantically (truth-conditionally) equivalent. Grammaticality is ensured with respect to the underlying grammar (the LinGO ERG).

Remember that dogs must be on a leash.
Remember dogs must be on a leash.
On a leash remember that dogs must be.
On a leash remember dogs must be.
A leash remember that dogs must be on.
A leash remember dogs must be on.
Dogs remember must be on a leash.


A Language Model Ranker

The most common approach to the problem of generator ambiguity is to use n-gram statistics (Langkilde & Knight, 1998; White, 2004; Callison-Burch & Flournoy, 2001).

Score and rank strings using a language model:

p_n(w_1, …, w_k) = Π_{i=1..k} p(w_i | w_{i−n}, …, w_{i−1})

We trained a 4-gram model on the BNC (100 million words).
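As an illustration only (this is not the LOGON code), here is a minimal Python sketch of how such a ranker scores and orders candidate strings. The probability table is a stand-in for a real smoothed, backed-off 4-gram model; all names are hypothetical:

```python
import math

def ngram_logprob(tokens, prob, n=4):
    """log p(w_1 .. w_k) = sum_i log p(w_i | preceding n-1 words).

    `prob` is assumed to map (context, word) pairs to smoothed
    probabilities; a real model would back off to shorter contexts
    instead of falling through to a tiny floor value.
    """
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    total = 0.0
    for i in range(n - 1, len(padded)):
        context = tuple(padded[i - n + 1:i])
        total += math.log(prob.get((context, padded[i]), 1e-10))
    return total

def lm_rank(candidates, prob, n=4):
    """Order candidate realizations best-first by LM score."""
    return sorted(candidates,
                  key=lambda s: ngram_logprob(s.split(), prob, n),
                  reverse=True)
```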


A Language Model Ranker (Cont’d)

Results on the LOGON data set 'Rondane' (864 test items, detailed later in the talk):

– Exact match accuracy: 48.46% (random choice baseline: 18.03%)
– BLEU score: 0.8776 (random choice baseline: 0.727)

Limitations:
• Cannot model dependencies between non-contiguous words.
• No linguistic information.
• Does not condition the output (string) on the input (MRS).


The Relation to Parse Selection

The problem of selecting the best realization can be seen as “inversely similar” to the problem of selecting the best parse:

p(analysis | utterance) vs. p(utterance | analysis)

Toutanova, Manning, Shieber, Flickinger, & Oepen (2002) implement a MaxEnt model for parse disambiguation using the Redwoods HPSG treebank. Features are defined over derivation trees, with non-terminals representing the construction types and lexical types of the grammar.

We train a realization ranker in much the same way, but this requires a different type of treebank for training.


Treebanks for Parse Selection

Training data for parse selection models is typically given by (1) a treebank of utterances paired with their optimal analyses, together with (2) all their competing (suboptimal) analyses.

[Figure: mapping from utterances to analyses.]


Symmetric Treebanks

To produce a symmetric treebank, exhaustively generate all paraphrases of the treebanked analyses, and assume the optimality relation to be bidirectional (Velldal, Oepen, & Flickinger, 2004).

[Figure: bidirectional mapping between analyses and utterances.]


Treebanks for Realization Ranking

We now have the training data for a realization ranking model, given by (1) a treebank of analyses paired with their optimal utterances, together with (2) all competing (suboptimal) candidates.

[Figure: mapping from analyses to utterances.]


The Rondane Treebank

Aggregate            | items | words | ambiguity | baseline (%)
---------------------|-------|-------|-----------|-------------
100 ≤ readings       |    87 |  20.5 |     580.8 |         0.42
50 ≤ readings < 100  |    61 |  17.3 |      73.0 |         1.44
10 ≤ readings < 50   |   269 |  15.1 |      22.5 |         5.61
5 < readings < 10    |   172 |  11.1 |       6.9 |        15.66
1 < readings < 5     |   275 |   8.8 |       2.8 |         40.9
---------------------|-------|-------|-----------|-------------
Total                |   864 |  13.0 |      72.9 |        18.03

The treebank data binned with respect to generator ambiguity, for each group showing the total number of items, average string length, average number of paraphrases, and a random choice baseline for accuracy.


Maximum Entropy Models

Given by a set of features {f_1, …, f_m} and a set of associated weights {λ_1, …, λ_m}.

The real-valued feature functions describe relevant properties of the data items.

The lambda weights determine the contribution or importance of each feature.

Probability of a realization r given a semantics s:

p(r | s) = (1 / Z(s)) exp(Σ_i λ_i f_i(r))

Learning amounts to finding the optimal weights that maximize the likelihood of the training corpus.
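A minimal sketch of the conditional probability above, with feature functions as plain Python callables; the weights would come from maximum-likelihood training, which is not shown here:

```python
import math

def maxent_prob(r, candidates, weights, features):
    """p(r | s) = exp(sum_i lambda_i * f_i(r)) / Z(s), where Z(s)
    sums the same unnormalized score over all realizations of s."""
    def score(x):
        return math.exp(sum(lam * f(x) for lam, f in zip(weights, features)))
    return score(r) / sum(score(c) for c in candidates)

# Ranking picks argmax_r p(r | s); since Z(s) is shared by all
# candidates, comparing the unnormalized scores suffices.
```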


MaxEnt Features

subjh
├─ hspec
│  ├─ det_the_le → the
│  └─ sing_noun
│     └─ n_intr_le → dog
└─ third_sg_fin_verb
   └─ v_unerg_le → barks

Sample HPSG derivation tree for "the dog barks". Features record local derivation sub-trees with different degrees of lexicalization, levels of grandparenting, etc. Additional features record n-grams over lexical
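A sketch of what such feature extraction could look like, using the derivation tree above encoded as nested tuples. The actual feature templates (degrees of lexicalization, grandparenting depth, lexical-type n-grams) differ, so treat this only as the general shape:

```python
# The derivation tree above, as (label, children); leaves are words.
TREE = ("subjh",
        [("hspec",
          [("det_the_le", [("the", [])]),
           ("sing_noun", [("n_intr_le", [("dog", [])])])]),
         ("third_sg_fin_verb",
          [("v_unerg_le", [("barks", [])])])])

def subtree_features(node, ancestors=(), max_gp=1):
    """One feature per local sub-tree (a node plus its daughters),
    repeated with 0..max_gp levels of grandparenting context."""
    label, children = node
    feats = []
    if children:
        daughters = tuple(child[0] for child in children)
        for g in range(min(max_gp, len(ancestors)) + 1):
            context = ancestors[len(ancestors) - g:]
            feats.append(context + (label,) + daughters)
        for child in children:
            feats.extend(subtree_features(child, ancestors + (label,), max_gp))
    return feats

# e.g. ('subjh', 'hspec', 'det_the_le', 'sing_noun') records the
# hspec sub-tree with one level of grandparenting.
```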


The MaxEnt Ranker

Exact match accuracy: 61.58%
BLEU: 0.903

When training and testing by 10-fold cross-validation on the small 'Rondane' data set, we get results competitive with a language model trained on the entire BNC.

– Structural features are a good thing.
– Having training data attuned to the domain is a good thing.


A Combined Ranker

Many non-overlapping errors are made by the different models, leaving more to be gained by combining the two.

We can add the n-gram probability as a separate feature in the MaxEnt model to get a combined model.

Exact match accuracy: 65.63%
BLEU: 0.920
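A minimal sketch of the combination, reusing the scoring function from the earlier n-gram sketch: the structural feature vector is simply extended with the LM log-probability, whose weight is then estimated like any other lambda:

```python
def combined_feature_vector(realization, structural_features, lm_score):
    """Structural MaxEnt features plus the n-gram log-probability
    as one additional real-valued feature of the same model."""
    return [f(realization) for f in structural_features] + [lm_score(realization)]
```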


Exact Match Accuracy

[Figure: exact match accuracy (%) for the Language Model, MaxEnt, and Combined rankers, by ambiguity bin: 1-5, 5-10, 10-50, 50-100, 100-5712 realizations.]

Exact match accuracy scores for the different models. Data items are binned with respect to number of distinct realizations.


BLEU

[Figure: averaged sentence-level BLEU for the Language Model, MaxEnt, and Combined rankers, by ambiguity bin: 1-5, 5-10, 10-50, 50-100, 100-5712 realizations.]

Averaged sentence-level BLEU scores for the different models. Data items are binned with respect to number of distinct realizations.
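The slides do not spell out the BLEU configuration; one common way to compute averaged sentence-level BLEU is via NLTK, with smoothing so that short sentences lacking higher-order n-gram matches do not score zero:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def avg_sentence_bleu(references, hypotheses):
    """Mean sentence-level BLEU over (reference, hypothesis) pairs."""
    smooth = SmoothingFunction().method1
    scores = [sentence_bleu([ref.split()], hyp.split(), smoothing_function=smooth)
              for ref, hyp in zip(references, hypotheses)]
    return sum(scores) / len(scores)
```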


Summary

• Successful combination of a linguistic grammar and stochastic disambiguation for target-language generation in a hybrid MT system (LOGON).

• The ranking module benefits from combining statistics from different sources: surface-oriented n-grams in addition to structural features of derivation trees.

• Ongoing work: generation from packed MRSs and selective unpacking (Carroll & Oepen, 2005).


References

Callison-Burch, C., & Flournoy, R. S. (2001). A program for automatically selecting the best output from multiple machine translation engines. In Proceedings of the MT Summit. Santiago, Spain.

Carroll, J., Copestake, A., Flickinger, D., & Poznanski, V. (1999). An efficient chart generator for (semi-)lexicalist grammars. In Proceedings of the 7th European Workshop on Natural Language Generation (pp. 86-95). Toulouse, France.

Carroll, J., & Oepen, S. (2005). High efficiency realization for a wide-coverage unification grammar. In Proceedings of the 2nd International Joint Conference on Natural Language Processing. Jeju, Republic of Korea.

Copestake, A., Flickinger, D., Malouf, R., Riehemann, S., & Sag, I. (1995). Translation using minimal recursion semantics. In Proceedings of the 6th International Conference on Theoretical and Methodological Issues in Machine Translation. Leuven, Belgium.

Flickinger, D. (2002). On building a more efficient grammar by exploiting types. In S. Oepen, D. Flickinger, J. Tsujii, & H. Uszkoreit (Eds.), Collaborative language engineering: A case study in efficient grammar-based processing (pp. 1-17). CSLI Press.

Langkilde, I., & Knight, K. (1998). The practical value of n-grams in generation. In Proceedings of the International Natural Language Generation Workshop.

Toutanova, K., Manning, C. D., Shieber, S. M., Flickinger, D., & Oepen, S. (2002). Parse disambiguation for a rich HPSG grammar. In Proceedings of the First Workshop on Treebanks and Linguistic Theories. Sozopol, Bulgaria.

Velldal, E., Oepen, S., & Flickinger, D. (2004). Paraphrasing treebanks for stochastic realization ranking. In Proceedings of the 3rd Workshop on Treebanks and Linguistic Theories. Tübingen, Germany.

White, M. (2004). Reining in CCG chart realization. In Proceedings of the 3rd International Conference on Natural Language Generation. Hampshire, UK.