
Statistical NLP

Spring 2011

Lecture 10: Phrase Alignment

Dan Klein – UC Berkeley

Phrase Weights


Phrase Scoring

Example: French "les chats aiment le poisson frais ." ↔ English "cats like fresh fish ."

  • Learning weights has been tried, several times:
      • [Marcu and Wong, 02]
      • [DeNero et al., 06]
      • … and others
  • Seems not to work well, for a variety of partially understood reasons
  • Main issue: big chunks get all the weight; obvious priors don’t help
  • Though, see [DeNero et al., 08]

Phrase Size

Phrases do help, but they don’t need to be long. Why should this be?

Lexical Weighting

Phrase Alignment


Identifying Phrasal Translations

Example: "In the past two years , a number of US citizens …" ↔ "过去 两 年 中 , 一 批 美国 公民 …" (gloss: past two year in , lots US citizen)

Phrase alignment models: choose a segmentation and a one-to-one phrase alignment.

Underlying assumption: there is a correct phrasal segmentation.

Unique Segmentations?

Example: "In the past two years , a number of US citizens …" ↔ "过去 两 年 中 , 一 批 美国 公民 …" (gloss: past two year in , lots US citizen)

Problem 1: Overlapping phrases can be useful (and complementary).
Problem 2: Phrases and their sub-phrases can both be useful.
Hypothesis: This is why models of phrase alignment don’t work well.

Identifying Phrasal Translations

This talk: modeling sets of overlapping, multi-scale phrase pairs.

Example: "In the past two years , a number of US citizens …" ↔ "过去 两 年 中 , 一 批 美国 公民 …" (gloss: past two year in , lots US citizen)

Input: sentence pairs. Output: extracted phrases.

… But the Standard Pipeline has Overlap!

MOTIVATION

Example: "In the past two years" ↔ "过去 两 年 中" (gloss: past two year in)

Pipeline: Sentence Pair → Word Alignment → Extracted Phrases

Our Task: Predict Extraction Sets

MOTIVATION

Conditional model of extraction sets given sentence pairs: Sentence Pair → Extracted Phrases.

Example: "In the past two years" ↔ "过去 两 年 中"

Output: extracted phrases + "word alignments"


Alignments Imply Extraction Sets

MODEL

Example: "In the past two years" ↔ "过去 两 年 中" (gloss: past two year in)

Word-level alignment links → word-to-span alignments → extraction set of bispans

Incorporating Possible Alignments

MODEL

Example: "In the past two years" ↔ "过去 两 年 中" (gloss: past two year in)

Sure and possible word links → word-to-span alignments → extraction set of bispans

Linear Model for Extraction Sets

MODEL

Example: "In the past two years" ↔ "过去 两 年 中"

The linear model scores an extraction set with features on sure links and features on all bispans.

Features on Bispans and Sure Links

FEATURES

Example: 过 ↔ "go over", 地球 ↔ "the Earth" (gloss: go over / Earth)

Some features on sure links:
  • HMM posteriors
  • Presence in dictionary
  • Numbers & punctuation

Features on bispans:
  • HMM phrase table features: e.g., phrase relative frequencies
  • Lexical indicator features for phrases with common words
  • Monolingual phrase features: e.g., “the _____”
  • Shape features: e.g., Chinese character counts
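A minimal sketch of how a few of the bispan features above might be computed and combined by a linear model. The function names, the exact feature inventory, and the common-word list are illustrative assumptions, not the talk's actual code.

```python
# Hedged sketch: a few bispan features in the spirit of the list above,
# scored by a linear model (dot product of a feature map with weights).

def bispan_features(en_phrase, zh_phrase, common_words=("the", "of", "a")):
    """Return a sparse feature map for one candidate phrase pair."""
    feats = {}
    en_tokens = en_phrase.split()
    # Lexical indicator features for phrases containing common words.
    for w in en_tokens:
        if w.lower() in common_words:
            feats["contains=" + w.lower()] = 1.0
    # Monolingual pattern feature, e.g. English phrases starting with "the".
    if en_tokens and en_tokens[0].lower() == "the":
        feats["starts_with_the"] = 1.0
    # Shape features: Chinese character count vs. English token count.
    feats["zh_char_count"] = float(len(zh_phrase.replace(" ", "")))
    feats["en_token_count"] = float(len(en_tokens))
    return feats

def score(feats, weights):
    # Linear model: score is the dot product of features and weights.
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())
```

In the full model, such feature maps would be computed for every candidate bispan and every sure link, and their weighted sum gives the score of an entire extraction set.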

Getting Gold Extraction Sets

TRAINING

Hand aligned: sure and possible word links → word-to-span alignments → extraction set of bispans

Deterministic: find the min and max alignment index for each word (giving the word-to-span alignments).
Deterministic: a bispan is included iff every word within the bispan aligns within the bispan.
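The inclusion rule above can be sketched directly. This is a hedged illustration with illustrative names, assuming 0-based word-alignment links and ignoring the sure/possible distinction; it is not the talk's actual implementation.

```python
# Hedged sketch of the deterministic gold extraction-set rule: a bispan
# (e1..e2, f1..f2) is extracted iff it contains at least one link and
# every alignment link touching it lies fully inside it.

def extraction_set(links, e_len, f_len, max_size=4):
    """links: set of (e, f) word-alignment index pairs (0-based)."""
    bispans = []
    for e1 in range(e_len):
        for e2 in range(e1, min(e_len, e1 + max_size)):
            for f1 in range(f_len):
                for f2 in range(f1, min(f_len, f1 + max_size)):
                    # Require at least one link inside the bispan.
                    consistent = any(e1 <= e <= e2 and f1 <= f <= f2
                                     for e, f in links)
                    # Every link must be inside on both sides or neither.
                    for e, f in links:
                        if (e1 <= e <= e2) != (f1 <= f <= f2):
                            consistent = False
                            break
                    if consistent:
                        bispans.append((e1, e2, f1, f2))
    return bispans
```

For a monotone alignment like past-过去, two-两, years-年, this extracts the single-word pairs and their consistent combinations, but rejects bispans that cut a link in half.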

Discriminative Training with MIRA

TRAINING

Loss function: F-score of bispan errors (precision & recall).
Training criterion: the minimal change to w such that the gold is preferred to the guess by a loss-scaled margin.
  Gold: the annotated extraction set. Guess: arg max w·ɸ.
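The MIRA criterion above has a closed-form single-example update: move w just far enough, in the direction of the gold-minus-guess feature difference, that the gold outscores the guess by the loss. A minimal sketch with illustrative names, using dense feature vectors for simplicity:

```python
# Hedged sketch of a single-example MIRA update: the minimal change to w
# such that score(gold) - score(guess) >= loss, with step size capped at C.

def mira_update(w, phi_gold, phi_guess, loss, C=1.0):
    """w, phi_gold, phi_guess: equal-length feature vectors (lists)."""
    delta = [g - b for g, b in zip(phi_gold, phi_guess)]
    margin = sum(wi * di for wi, di in zip(w, delta))  # w . (gold - guess)
    norm_sq = sum(d * d for d in delta)
    if norm_sq == 0:
        return w  # identical features: no direction to move in
    # Step size: just enough to satisfy the loss-scaled margin, capped at C.
    tau = min(C, max(0.0, (loss - margin) / norm_sq))
    return [wi + tau * di for wi, di in zip(w, delta)]
```

In the extraction-set model, `loss` would be the F-score-based bispan error of the guess, and the feature vectors would sum over sure links and bispans.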


Inference: An ITG Parser

INFERENCE

ITG captures some bispans.
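Why only some? An ITG derivation recursively splits a sentence pair into two contiguous blocks that are either kept in order (straight) or swapped (inverted), so alignments requiring other reorderings fall outside its reach. A hedged illustration of the constraint (not the talk's parser; names are illustrative), checking whether an alignment permutation is ITG-derivable:

```python
# Hedged sketch of the ITG reordering constraint: a permutation is
# ITG-derivable iff it can be recursively split into two contiguous
# blocks whose target images are in order (straight) or swapped (inverted).

def itg_derivable(perm):
    """perm: a permutation of range(len(perm)) giving target positions."""
    n = len(perm)
    if n <= 1:
        return True
    for split in range(1, n):
        left, right = perm[:split], perm[split:]
        # Straight: all left targets precede all right targets.
        # Inverted: all left targets follow all right targets.
        if max(left) < min(right) or min(left) > max(right):
            def norm(block):
                # Renormalize a block to its own 0..k-1 permutation.
                order = sorted(block)
                return [order.index(x) for x in block]
            return itg_derivable(norm(left)) and itg_derivable(norm(right))
    return False
```

The classic "inside-out" reorderings (patterns like 2-4-1-3) have no such split, which is why an ITG parser can cover many, but not all, bispans of a hand-aligned sentence pair.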

Experimental Setup

RESULTS

  • Chinese-to-English newswire
  • Parallel corpus: 11.3 million words; sentence length ≤ 40
  • MT systems: tuned and tested on NIST ’04 and ’05
  • Supervised data: 150 training & 191 test sentences (NIST ’02)
  • Unsupervised model: jointly trained HMM (Berkeley Aligner)

Baselines and Limited Systems

RESULTS

HMM:
  • State-of-the-art unsupervised baseline
  • Joint training & competitive posterior decoding
  • Source of many features for supervised models
ITG:
  • Supervised ITG aligner with block terminals
  • State-of-the-art supervised baseline
  • Re-implementation of Haghighi et al., 2009
Coarse:
  • Supervised block ITG + possible alignments
  • Coarse pass of the full extraction set model

Word Alignment Performance

RESULTS

[Bar chart: precision, recall, and 1−AER for the HMM, ITG, Coarse, and Full aligners.]

Extracted Bispan Performance

RESULTS

[Bar chart: precision, recall, F1, and F5 of extracted bispans for the HMM, ITG, Coarse, and Full systems.]

Translation Performance (BLEU)

RESULTS

[Bar chart: BLEU for the Moses and Joshua decoders, each using phrases from the HMM, ITG, Coarse, and Full systems; scores fall roughly between 33 and 36.]

Supervised conditions also included HMM alignments.