SLIDE 1

Generating Disambiguating Paraphrases for Structurally Ambiguous Sentences

Manjuan Duan, Ethan Hill, Michael White
August 11-12, 2016, LAW-X

The Ohio State University Department of Linguistics

SLIDE 2

Joint work with Manjuan Duan & Ethan Hill

SLIDE 3

Introduction

SLIDE 5

How can we crowd-source data for adapting parsers to new domains?

  • To some extent, MTurk workers can perform meaning- and form-oriented tasks such as annotating PP-attachment points, with some training (Snow et al., 2008; Jha et al., 2010)
  • Gerdes (2013) and Zeldes (2016) also found that it was possible to obtain fairly high-quality class-sourced annotations, where students received only a modest amount of training
  • In the current study, rather than annotating syntax, we use natural language clarification questions, simply asking MTurk workers to select the right paraphrase of a structurally ambiguous sentence

SLIDE 6

Big picture: Just ask people what ambiguous sentences mean

[Pipeline diagram: Sent → Parser → Interp1 / Interp2 → Realizer → Para1 / Para2 → AMT: "Closer in meaning?" → chosen Interp → Silver Data]

SLIDE 7

Differences from previous studies

  • Aiming (ultimately) for all structural ambiguities identifiable by an automatic parser, not confined to specific constructions (Jha et al., 2010)
  • AMT workers are making choices among paraphrases, not annotations, and no specific tutorial is needed

SLIDE 8

Methods

SLIDE 9

Generating disambiguating paraphrases: An illustration

[Figure: dependency graphs for "He stopped Godzilla with the laser". The input sentence is parsed into a top parse (PP attached to the verb) and a meaningfully distinct next parse (PP attached to "Godzilla"). Realizations identical to the input (✗ "He stopped Godzilla with the laser") are rejected; the reversal ✓ "With the laser, he stopped Godzilla" and the passive rewrites ✓ "Godzilla was stopped by him with the laser" and ✓ "Godzilla with the laser was stopped by him" qualify as disambiguating paraphrases.]

SLIDE 18

Obtaining meaningfully distinct parses

  • 1. Parse the input sentence with the OpenCCG parser to obtain its top 25 parses
  • 2. Find a parse from the n-best list which is meaningfully distinct from the top parse (a sketch of this check follows):
  • Only the unlabeled, unordered dependencies of the two parses are compared
  • The symmetric difference must be non-empty, with neither set of dependencies a superset of the other
  • Ambiguities involving only POS, named-entity, or word-sense differences are disregarded
  • 3. If successful, this phase yields a top and next parse, the ones reflecting the greatest uncertainty
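As a rough illustration of step 2, here is a minimal sketch of the distinctness check in Python, assuming each parse has been reduced to a set of unlabeled, unordered dependency pairs (the function name and data representation are illustrative, not from the OpenCCG codebase):

```python
def meaningfully_distinct(top_deps, next_deps):
    """Check whether two parses differ meaningfully.

    Each argument is a set of unordered, unlabeled dependencies,
    e.g. frozenset({"saw", "with"}) for an edge between "saw" and "with".
    """
    # The symmetric difference of the two dependency sets must be non-empty...
    if not (top_deps ^ next_deps):
        return False
    # ...and neither set may be a superset of the other
    if top_deps >= next_deps or next_deps >= top_deps:
        return False
    return True

# PP-attachment ambiguity: (saw, with) vs. (girl, with)
top_parse = {frozenset(d) for d in [("saw", "girl"), ("saw", "with"), ("with", "telescope")]}
next_parse = {frozenset(d) for d in [("saw", "girl"), ("girl", "with"), ("with", "telescope")]}
print(meaningfully_distinct(top_parse, next_parse))  # True
```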

SLIDE 19

Two ways to obtain paraphrases

  • Paraphrases obtained from reverse realization (reversals)
  • Able to generate paraphrases for ambiguities involving various constructions identifiable by an automatic parser
  • Paraphrases obtained from logical form rewriting (rewrites)
  • Triggered by specific syntactic constructions such as PP-attachment ambiguity and modifier scope ambiguity in coordination

SLIDE 22

Validating reverse realizations

Need to ensure paraphrases actually disambiguate intended meanings

  • 1. Realize the top and next parse into an n-best realization list (n = 25), using OpenCCG
  • 2. Traverse the list to find a qualifying paraphrase, which has to
  • be different from the original sentence
  • have a different relative distance among the words involved in the ambiguity than the original sentence
  • 3. Parse each candidate paraphrase to make sure the most likely interpretation includes the dependencies from which it was generated (a sketch of this traversal follows)
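A minimal sketch of this traversal in Python, again with parses reduced to sets of dependency pairs; `realizations` stands in for the OpenCCG n-best realizer output and `parse_top` for a parse-back call, both illustrative:

```python
def word_distance(sentence, w1, w2):
    """Token distance between the first occurrences of w1 and w2."""
    toks = sentence.split()
    return abs(toks.index(w1) - toks.index(w2))

def find_qualifying_paraphrase(original, realizations, amb_pair, parse_deps, parse_top):
    """Traverse an n-best realization list and return the first qualifying paraphrase.

    realizations: n-best strings realized from one parse (n = 25 here)
    amb_pair:     the two words involved in the ambiguity, e.g. ("saw", "with")
    parse_deps:   dependency set of the parse the realizations came from
    parse_top:    function mapping a sentence to its most likely dependency set
    """
    for cand in realizations:
        if cand == original:
            continue  # must be different from the original sentence
        if word_distance(cand, *amb_pair) == word_distance(original, *amb_pair):
            continue  # must change the relative distance of the ambiguous words
        if parse_deps <= parse_top(cand):
            return cand  # parse-back check: top interpretation keeps the dependencies
    return None
```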

SLIDE 23

Two-sided paraphrases and one-sided paraphrases

  • Two-sided paraphrases: two paraphrases are obtained for the original sentence, one generated from the top parse and one from the next
  • One-sided paraphrases: only one paraphrase is obtained for the original sentence

SLIDE 24

Logical form rewriting

Rewritten logical forms are realized to obtain paraphrases that highlight the ambiguous part:

  • Passive and cleft rewrites for PP-attachment ambiguities
  • Coordination rewrites for ambiguities in the scope of modifiers with coordinated phrases

SLIDE 25

Passive rewrites: An example

I saw the girl with the telescope.

Rewrite

⇒ The girl with the telescope was seen by me.

SLIDE 26

Cleft rewrites: An example

I saw the girl with the telescope.

Rewrite

⇒ The girl with the telescope was what I saw.

SLIDE 27

Coordination rewrites: An example (1)

The old men and women are becoming senile.

Rewrite

⇒ The old women and the old men are becoming senile.

SLIDE 28

Coordination rewrites: An example (2)

The old men and women are becoming senile.

Rewrite

⇒ The women and the old men are becoming senile.
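To make the two readings concrete, here is a toy string-level sketch (not the paper's logical-form machinery) that produces a wide-scope and a narrow-scope rewrite from a modifier and its conjuncts:

```python
def coordination_rewrites(modifier, conjuncts):
    """Toy illustration of the two readings of 'the old men and women'."""
    # Wide scope: the modifier distributes over every conjunct
    wide = " and ".join(f"the {modifier} {c}" for c in reversed(conjuncts))
    # Narrow scope: the modifier applies only to the first conjunct
    narrow = " and ".join([f"the {c}" for c in conjuncts[1:]] + [f"the {modifier} {conjuncts[0]}"])
    return wide, narrow

wide, narrow = coordination_rewrites("old", ["men", "women"])
print(wide)    # the old women and the old men
print(narrow)  # the women and the old men
```

Reordering the conjuncts, as in the slides' rewrites, moves "women" away from "old", so the rewrite itself cannot be misread the way the original sentence can.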

SLIDE 29

Experiment

SLIDE 30

Validation experiment

Aim: examine the quality of the crowd-sourced annotations obtained through disambiguating paraphrases

  • Used AMT workers as our naive annotators
  • For comparison, hand-annotated 1,030 sentences as the optimal (‘gold’) annotations, to measure the accuracy of the crowd-sourced annotations

SLIDE 31

Data preparation

[Pipeline: 14,114 sentences from Big 10 football and prehistoric reptiles → Parsing and Filtering → 5,063 with top and next parses → Paraphrasing → 3,605 valid paraphrases → Selection → 1,030 items → AMT Surveys]

Working assumption: unannotated data is available in large quantities, so we can focus on the most informative ambiguities

SLIDE 32

Gold annotations

We selected the correct parse by examining the dependency graphs of the input sentence:

  • Annotated ‘top’ if the top parse was correct
  • Annotated ‘next’ if the next parse was correct
  • Annotated ‘neither’ if neither parse was more correct than the other

SLIDE 33

Distribution of test data

SLIDE 34

Collecting human judgments

  • 5 judgments for each sentence were collected from AMT workers, and judgments for identical sentences were collapsed
  • “Neither” cases were excluded from the analysis
  • Comprehension questions were asked to prevent random choosing
  • Agreement levels among the AMT workers (a sketch of the computation follows):
  • Majority: > 50% agreement
  • Strong majority: > 75%
  • Unanimity: > 90%
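A minimal sketch of how these agreement levels might be computed from the five collected judgments per item (thresholds from this slide; the function itself is illustrative, not from the paper):

```python
from collections import Counter

def agreement_level(judgments):
    """Classify agreement among per-item judgments such as ['top', 'top', 'next', ...]."""
    share = Counter(judgments).most_common(1)[0][1] / len(judgments)
    if share > 0.90:
        return "unanimity"
    if share > 0.75:
        return "strong majority"
    if share > 0.50:
        return "majority"
    return "no majority"

print(agreement_level(["top", "top", "next", "top", "top"]))  # strong majority (4/5)
```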

SLIDE 35

Coverage vs. Accuracy: Higher accuracy (but lower coverage) with greater agreement

SLIDE 36

One-sided vs. Two-sided: Two-sided much more reliable

SLIDE 37

Reversals vs. Rewrites: Reversals at least as accurate

SLIDE 38

Potential correction to current parser

SLIDE 39

Manual analysis

Examined 43 sentences where unanimous AMT worker judgments did not agree with the gold annotations, and identified the following reasons for error:

  • Incompetent or broken realizations (29/43)
  • Bad parses (11/43)
  • Lack of context (3/43)

SLIDE 40

Preliminary parser retraining experiment

  • Trained the OpenCCG parser with majority AMT worker annotations (along with the original CCGbank data)
  • Trained the parser separately in the two domains
  • Evaluated the parser with 10-fold cross-validation

SLIDE 41

Evaluation of retrained parser: an example

Parses were considered correct if the top and next dependencies occur in the same order as in the gold annotation: e.g., for the sentence "I saw the girl with the telescope", with (saw, with) annotated as the correct dependency, the left-hand n-best list below is correct because (saw, with) outranks (girl, with), while the right-hand list is incorrect:

n-best parses   Correct        Incorrect
1               ...            ...
2               (saw, with)    ...
3               ...            ...
4               ...            (girl, with)
5               (girl, with)   ...
6               ...            ...
...             ...            (saw, with)
25              ...            ...
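A minimal Python sketch of this criterion, with each parse reduced to the set of its dependency pairs (the representation is illustrative):

```python
def ranking_correct(nbest_deps, gold_dep, competitor_dep):
    """True if gold_dep appears in the n-best parses before competitor_dep.

    nbest_deps: one dependency set per parse, best parse first.
    """
    for deps in nbest_deps:
        if gold_dep in deps:
            return True   # gold dependency ranked first: correct
        if competitor_dep in deps:
            return False  # competing dependency ranked first: incorrect
    return False          # neither dependency found in the top n parses

# Left-hand list from the table: gold (saw, with) at rank 2, (girl, with) at rank 5
nbest = [set(), {("saw", "with")}, set(), set(), {("girl", "with")}]
print(ranking_correct(nbest, ("saw", "with"), ("girl", "with")))  # True
```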

SLIDE 42

Parser retraining results

                 Dinosaur   Football
Train size         471        356
Eval size          291        226
Original acc.      0.701      0.668
Retrained acc.     0.749      0.717
Correction rate    0.243      0.32

  • McNemar's chi-square test shows a significant improvement in the dinosaur domain (p = 0.02)
  • No significant improvement on the football data, due to the smaller data size
  • The retrained parsers do not differ significantly from the original parser (p > 0.05 for both) on the CCGbank development set (a sketch of the test follows)
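For reference, a minimal sketch of McNemar's chi-square test with continuity correction; b and c are the discordant counts (items only one of the two parsers gets right), and the example numbers are made up for illustration, not taken from the paper:

```python
from math import erfc, sqrt

def mcnemar_test(b, c):
    """McNemar's chi-square test (1 degree of freedom, continuity-corrected).

    b: items correct only under the original parser
    c: items correct only under the retrained parser
    """
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    p = erfc(sqrt(chi2 / 2))  # survival function of a 1-df chi-square
    return chi2, p

chi2, p = mcnemar_test(25, 45)  # hypothetical discordant counts
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```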

SLIDE 43

Conclusions

SLIDE 46

Conclusions and future work

  • It is possible to obtain accurate crowd-sourced judgments from naive annotators with no instruction, pointing the way towards collecting parser training data on a massive scale
  • The preliminary parsing experiment already suggests that automatic parsers can be retrained to achieve better parsing accuracy
  • In the future, we plan to experiment with parser adaptation with multiple parsers and larger data sets
  • We also plan to experiment with generating paraphrases with sentence splitting and simplification (Siddharthan, 2006; Siddharthan, 2011)

SLIDE 47

Acknowledgments

We thank James Curran, Eric Fosler-Lussier, the OSU Clippers Group, and the anonymous reviewers for helpful comments and discussion. This work was supported in part by NSF grant 1319318.

SLIDE 48

Thank you!

SLIDE 49

Incompetent realizations

The realization is OK but fails to reliably capture the meaning difference between the parses; this usually involved just adding or deleting punctuation

SLIDE 50

Incompetent realizations: An example

The teeth were adapted to crush bivalves, gastropods and other animals with a shell or exoskeleton.

(animals, with): Same as the original sentence
(crush, with): The teeth were adapted to crush bivalves, gastropods and other animals , with a shell or exoskeleton.

SLIDE 51

Broken realizations

  • Inappropriate heavy NP shift
  • Long adverbials moved between verbs and their (other) complements
  • Wrong modifier-modificand word order
  • Wrong position of the particle for phrasal verbs
  • Wrong preposition-complement position

SLIDE 52

Broken realizations: An example

They are thought to have gone extinct during the Triassic-Jurassic extinction event.

(gone, during): They are thought to have gone during the Triassic-Jurassic extinction event extinct.
(thought, during): They are thought during the Triassic-Jurassic extinction event to have gone extinct.

SLIDE 53

Bad parses

Although one parse is better than the other for the disputed dependency, the rest of both parses is so broken that the realization cannot reliably capture the meaning difference

  • Parsing “in” as a conjunction
  • Bad parse in general

SLIDE 54

Bad parses: An example

Coming off a disappointing 2-10 season in 2009 Maryland returns to a bowl game to face East Carolina.

(returns, to): Coming off a disappointing 2-10 season in 2009 returns to a bowl game to face East Carolina Maryland.
(Coming, to): Coming off a disappointing 2-10 season to a bowl game to face East Carolina in 2009 Maryland returns.

SLIDE 55

Bad parses: top parse

Coming off a disappointing 2-10 season in 2009 Maryland returns to a bowl game to face East Carolina.

[Dependency graph of the top parse]

SLIDE 56

Bad parses: the next (meaningfully distinct) parse

Coming off a disappointing 2-10 season in 2009 Maryland returns to a bowl game to face East Carolina.

[Dependency graph of the next (meaningfully distinct) parse]

SLIDE 57

Lack of context

Turkers fail to choose the correct parse because of a lack of context

SLIDE 58

Lack of context: An example

Michigan’s backup center, Gerald Ford, expressed a desire to attend the fair while in Chicago.

(attend, while): Michigan’s backup center, Gerald Ford, expressed a desire to attend while in Chicago the fair.
(expressed, while): Michigan’s backup center, Gerald Ford, expressed while in Chicago a desire to attend the fair.

SLIDE 59

Regression analysis

A regression analysis to determine the factors affecting AMT workers' choices:

            One-sided            Two-sided
            Maj      S. Maj      Maj       S. Maj
parse       -0.03    -0.05       0.01      0.01
bleu         3.05*    4.38**     1.68*     3.07**
rlz.glb      0.01     0.01       0.07**    0.103***

AMT workers tend to choose:

  • the paraphrases similar to the original sentence
  • the paraphrases with higher fluency scores

SLIDE 60

Regression analysis for coverage and accuracy trade-off

[Plot: accuracy (0.6-1.0) vs. data size (100-400), with curves for Majority.Baseline, Majority.Pred, Strong.Majority.Baseline, and Strong.Majority.Pred]

SLIDE 61

Distribution of test data

SLIDE 62

Data preparation

  • 1. We collected 6,335 sentences from Prehistoric Reptiles and 7,779 from Big 10 Conference Football
  • 2. After parsing the sentences and filtering out those that were too short or too long, 5,063 sentences were found to be ambiguous
  • 3. Valid paraphrases were generated for 3,605 sentences
  • 4. 515 sentences from each domain were selected for the validation experiment
