

SLIDE 1

Prodromos Malakasiotis, Ph.D. student Department of Informatics Athens University of Economics and Business rulller@aueb.gr

Automatic extraction of paraphrasing rules: A survey and plans for future work

SLIDE 2

What is Paraphrasing?

  • “X is the writer of Y” ≈ “X wrote Y” ≈ “X is the author of Y”.
  • “Oswald killed Kennedy” / “Kennedy was killed by Oswald”.
  • “Who invented the light bulb?” / “Who was the inventor of the light bulb?”.
  • “Edison invented the light bulb” / “Edison’s invention of the light bulb”.
  • “Athens is located in Greece” / “Athens is the capital of Greece”.

– Textual entailment, not really paraphrasing.

  • Can be used in:

– Question Answering, Information Retrieval, Web Search Engines.
– Natural Language Generation, Automatic Summarization.

SLIDE 3

Contents of this talk

  • What is Paraphrasing?
  • Lin & Pantel.
  • Barzilay & McKeown.
  • Barzilay & Lee.
  • Pang et al.
  • Ibrahim et al.
  • Directions for future work.

Paraphrasing methods

SLIDE 4

Lin & Pantel’s method (1 of 3)

[Figure: dependency parse of “John found a solution to the problem”, with relations subj, obj, to and det. The path N:subj:V ← find → V:obj:N → solution → N:to:N corresponds to the pattern “X finds solution to Y”.]

SLIDE 5

Lin & Pantel’s method (2 of 3)

Slot fillers of two dependency paths:

“X finds a solution to Y”
  Slot X: commission, committee, committee, government, government, he, I, legislator, sheriff
  Slot Y: strike, civil war, crisis, crisis, problem, problem, situation, budget deficit, dispute

“X solves Y”
  Slot X: committee, clout, government, he, she, petition, researcher, resistance, sheriff
  Slot Y: problem, crisis, problem, mystery, problem, woe, mystery, crime, murder
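The idea that two paths are similar when their slots share fillers can be sketched as follows. This is a minimal illustration only: the plain multiset Jaccard overlap, the function names, and the shortened filler lists are assumptions of this sketch, not the scoring used by Lin & Pantel.

```python
from collections import Counter

def slot_overlap(fillers_a, fillers_b):
    """Multiset Jaccard overlap between two lists of slot fillers."""
    a, b = Counter(fillers_a), Counter(fillers_b)
    shared = sum((a & b).values())   # fillers seen in both slots
    total = sum((a | b).values())    # fillers seen in either slot
    return shared / total if total else 0.0

def path_similarity(path_a, path_b):
    """Geometric mean of the Slot X and Slot Y overlaps (sketch assumption)."""
    sim_x = slot_overlap(path_a["X"], path_b["X"])
    sim_y = slot_overlap(path_a["Y"], path_b["Y"])
    return (sim_x * sim_y) ** 0.5

# Shortened, illustrative filler lists for the two paths on the slide.
finds_solution = {"X": ["committee", "government", "he", "sheriff"],
                  "Y": ["crisis", "problem", "dispute"]}
solves = {"X": ["committee", "government", "he", "she"],
          "Y": ["problem", "crisis", "mystery"]}

print(round(path_similarity(finds_solution, solves), 3))  # 0.548
```

A score near 1 means the two paths are filled by almost the same words; a score of 0 means they share no fillers in at least one slot.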

SLIDE 6

Lin & Pantel’s method (3 of 3)

  • Good performance, even though some extracted paraphrases are only approximately correct or occasionally incorrect.

– “X caused Y” ≈ “Y is blamed on X”.
– “X asks Y” ≈ “Y asks X”.
– “X worsens Y” ≈ “X solves Y”.

  • Requires a reliable dependency parser.

– Computationally expensive.
– Not always available (e.g. in Greek).

SLIDE 7

Contents of this talk

  • What is Paraphrasing?
  • Lin & Pantel.

– Dependency paths with similar slot fillers have similar meanings.

  • Barzilay & McKeown.
  • Barzilay & Lee.
  • Pang et al.
  • Ibrahim et al.
  • Directions for future work.

Paraphrasing methods

SLIDE 8

Barzilay & McKeown’s method (1 of 3)

[Figure: processing pipeline. Parallel texts → sentence alignment → aligned parallel texts → initial single-word (+ / -) paraphrasing examples → EXTRACT CONTEXTS → context rules (e.g. w1 ? w3 ≈ w4 ? w5) → EXTRACT PARAPHRASES → more, possibly multi-word (+ / -) paraphrasing examples, which feed back into EXTRACT CONTEXTS.]

SLIDE 9

Barzilay & McKeown’s method (2 of 3)

[Figure: aligned sentence pairs with candidate contexts marked “?” and positive examples marked “+”: “The clerk liked Monsieur Bovary” / “He liked Monsieur Bovary”; “The clerk liked Monsieur Bovary” / “He was fond of Monsieur Bovary”; “His apprentice liked the girl” / “He was fond of the doctor’s daughter”.]

  • Actually, 1) use POS tags, and 2) mark tags of identical words.
  • Actually, 1) use both words and POS tags as features, and 2) mark tags of identical words and words with the same root.
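One round of the loop (identical words → contexts → new paraphrases) can be sketched as below. This is a simplified illustration under stated assumptions: contexts are the neighbouring words rather than POS tags, no context or paraphrase scoring is done, only single-word pairs are extracted, and the function names and toy sentences are hypothetical.

```python
def contexts(tokens):
    """Return the (left word, right word) context of every token position."""
    padded = ["<s>"] + tokens + ["</s>"]
    return [(padded[i], padded[i + 2]) for i in range(len(tokens))]

def learn_context_rules(aligned_pairs):
    """Collect context pairs that surround identical words in aligned sentences."""
    rules = set()
    for s1, s2 in aligned_pairs:
        c1, c2 = contexts(s1), contexts(s2)
        for i, w1 in enumerate(s1):
            for j, w2 in enumerate(s2):
                if w1 == w2:                      # identical word = positive example
                    rules.add((c1[i], c2[j]))
    return rules

def extract_paraphrases(aligned_pairs, rules):
    """Propose different words that occur in a learned pair of contexts."""
    found = set()
    for s1, s2 in aligned_pairs:
        c1, c2 = contexts(s1), contexts(s2)
        for i, w1 in enumerate(s1):
            for j, w2 in enumerate(s2):
                if w1 != w2 and (c1[i], c2[j]) in rules:
                    found.add((w1, w2))
    return found

# Toy aligned sentences (hypothetical, loosely modelled on the slide's example).
aligned_pairs = [
    ("the clerk liked the doctor".split(), "the clerk liked the physician".split()),
    ("the clerk admired the doctor".split(), "the clerk liked the doctor".split()),
]
rules = learn_context_rules(aligned_pairs)
print(sorted(extract_paraphrases(aligned_pairs, rules)))
# [('admired', 'liked'), ('doctor', 'physician')]
```

In a fuller version the new pairs would be added to the positive examples and the two steps repeated until no new contexts or paraphrases appear.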

SLIDE 10

Barzilay & McKeown’s method (3 of 3)

  • High precision.

– 86.5% when context not given to human judges.
– 91.6% when context given to human judges.

  • But 70.8% of the paraphrases are single-word.

– In effect, low recall.

  • Requires parallel corpus.

– Difficult to obtain.

  • Requires POS tagger, aligner.

– Easier to obtain.

SLIDE 11

Contents of this talk

  • What is Paraphrasing?
  • Lin & Pantel.

– Dependency paths with similar slot fillers have similar meanings.

  • Barzilay & McKeown.

– Identical words → contexts → more paraphrases → more contexts → …

  • Barzilay & Lee.
  • Pang et al.
  • Ibrahim et al.
  • Directions for future work.

Paraphrasing methods

SLIDE 12

Barzilay & Lee’s method (1 of 3)

[Figure: articles for the same events from two corpora (Corpus 1, Corpus 2) go through MSA and clustering to produce lattices with slots (lattices 1, 2, 3 and a, b, c). Example: a lattice “Slot 1 bombed Slot 2” and a lattice “Slot 3 was bombed by Slot 4”, with fillers such as “US planes”, “Baghdad” and “Iraqi capital”. Lattices with many common fillers are matched.]

SLIDE 13

Barzilay & Lee’s method (2 of 3)

[Figure: matched lattices 1 (“Slot 1 bombed Slot 2”) and b (“Slot 3 was bombed by Slot 4”) are used to paraphrase “Enemy forces bombed the Afghani capital” as “The Afghani capital was bombed by enemy forces”.]
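Applying a pair of matched lattices to a new sentence can be sketched as below. This is a heavily reduced illustration: each lattice is flattened to a single template, the templates are written by hand rather than learned, and the names SOURCE_TEMPLATE, TARGET_TEMPLATE and paraphrase are hypothetical.

```python
import re

# Hypothetical matched lattices, each reduced to one flat template.
SOURCE_TEMPLATE = re.compile(r"^(?P<slot1>.+) bombed (?P<slot2>.+)$")
TARGET_TEMPLATE = "{slot2} was bombed by {slot1}"

def paraphrase(sentence):
    """Fill the target template with the slot values found in the sentence.

    Returns None when the sentence does not match the source template.
    """
    match = SOURCE_TEMPLATE.match(sentence)
    if match is None:
        return None
    return TARGET_TEMPLATE.format(**match.groupdict())

print(paraphrase("Enemy forces bombed the Afghani capital"))
# the Afghani capital was bombed by Enemy forces
print(paraphrase("Talks resumed on Monday"))
# None
```

Capitalisation is left untouched in this sketch, and a sentence that matches no template simply yields no paraphrase.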

SLIDE 14

Barzilay & Lee’s method (3 of 3)

  • Relatively high precision (78.5%).

– Many sentence-level paraphrases.
– Unknown recall.
– Seems to outperform Lin & Pantel’s method (42.5% precision).

  • But able to paraphrase only 12.2% of a set of new sentences.

– Input does not match any lattice.
– Precision at the same level (79.7%).

  • Does not require dependency parser, POS tagger, aligner, etc.

– Uses simplistic named-entity (NE) recogniser.
– NE recognition could help other methods too.

SLIDE 15

Contents of this talk

  • What is Paraphrasing?
  • Lin & Pantel.

– Dependency paths with similar slot fillers have similar meanings.

  • Barzilay & McKeown.

– Identical words → contexts → more paraphrases → more contexts → …

  • Barzilay & Lee.

– Lattices with common slot fillers tend to correspond to paraphrases.

  • Pang et al.
  • Ibrahim et al.
  • Directions for future work.

Paraphrasing methods

SLIDE 16

Pang et al.’s method (1 of 4)

[Figure: LDC Multiple Translation Chinese Corpus. 105 news stories in Mandarin Chinese, each with 11 translations (Translation 1, Translation 2, …, Translation 11), stored as sentences <S id=1>…</S>, <S id=2>…</S>, …, <S id=n>…</S>. The sentences are already aligned.]

SLIDE 17

Pang et al.’s method (2 of 4)

[Figure: parse trees of aligned sentences, (S (NP (CD 12) (NN persons)) (VP (AUX were) (VB killed))) and (S (NP (CD twelve) (NN people)) (VP (VB died))), are merged into one tree: NP → CD {12, twelve} NN {persons, people}; VP → AUX were VB killed, or VB died.]

SLIDE 18

Pang et al.’s method (3 of 4)

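[Figure: the merged tree (NP → CD {12, twelve} NN {persons, people}; VP → AUX were VB killed, or VB died) is turned into a word lattice from start S to end E: {12, twelve} {persons, people} {were killed, died}. Different paths correspond to paraphrases.]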
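Reading paraphrases off such a lattice amounts to enumerating its paths from S to E. The sketch below encodes the slide's example by hand; the intermediate state names q1 to q3 and the EDGES layout are assumptions of this illustration, not the representation used by Pang et al.

```python
# Hand-built automaton for the slide's example: state -> [(word, next state)].
EDGES = {
    "S": [("12", "q1"), ("twelve", "q1")],
    "q1": [("persons", "q2"), ("people", "q2")],
    "q2": [("were", "q3"), ("died", "E")],
    "q3": [("killed", "E")],
    "E": [],
}

def paths(state="S", prefix=()):
    """Yield every word sequence on a path from `state` to the end state E."""
    if state == "E":
        yield " ".join(prefix)
        return
    for word, next_state in EDGES[state]:
        yield from paths(next_state, prefix + (word,))

sentences = list(paths())
print(len(sentences))   # 8
print(sentences[0])     # 12 persons were killed
```

Every pair of distinct paths, such as “12 persons were killed” and “twelve people died”, is a candidate paraphrase pair.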

SLIDE 19

Pang et al.’s method (4 of 4)

  • Better results than Barzilay & McKeown’s method.

– 81% vs. 66% precision, context not given to human judges.
– 93% vs. 77% precision, context given to human judges.

  • Produces complete sentences, not patterns.
  • Requires reliable parser, parallel corpus.

– Difficult to obtain (e.g. in Greek).

SLIDE 20

Contents of this talk

  • What is Paraphrasing?
  • Lin & Pantel.

– Dependency paths with similar slot fillers have similar meanings.

  • Barzilay & McKeown.

– Identical words → contexts → more paraphrases → more contexts → …

  • Barzilay & Lee.

– Lattices with common slot fillers tend to correspond to paraphrases.

  • Pang et al.

– Merge parse trees of aligned sentences to extract FSAs.
– Different paths in an FSA correspond to paraphrases.

  • Ibrahim et al.
  • Directions for future work.

Paraphrasing methods

SLIDE 21

Ibrahim et al.’s method (1 of 2)

[Figure: dependency parses of the aligned sentences “The clerk liked Monsieur Bovary” / “The clerk was fond of Monsieur Bovary”, with anchors A1 = “The clerk” and A2 = “Monsieur Bovary”. The paths between the anchors give “X liked Y” ≈ “X was fond of Y”.]

  • Same as Lin & Pantel’s method

– Dependency parse trees.

  • Compares only paths from aligned sentences.
  • Finds anchors among nouns and pronouns of the aligned sentences and scores them using heuristics.
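The role of anchors can be illustrated on surface strings. This sketch is a deliberate simplification: it works on plain text instead of dependency paths, takes the two anchors as given instead of finding and scoring them, and the function name extract_pattern_pair is hypothetical.

```python
def extract_pattern_pair(sent_a, sent_b, anchors):
    """Replace two shared anchors with X and Y in both aligned sentences.

    Returns None unless both anchors occur in both sentences.
    """
    first, second = anchors
    for sentence in (sent_a, sent_b):
        if first not in sentence or second not in sentence:
            return None

    def to_pattern(sentence):
        return sentence.replace(first, "X").replace(second, "Y")

    return to_pattern(sent_a), to_pattern(sent_b)

pair = extract_pattern_pair(
    "The clerk liked Monsieur Bovary",
    "The clerk was fond of Monsieur Bovary",
    ("The clerk", "Monsieur Bovary"),
)
print(pair)  # ('X liked Y', 'X was fond of Y')
```

Because only aligned sentences are compared, each sentence pair yields at most a handful of candidate patterns.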

SLIDE 22

Ibrahim et al.’s method (2 of 2)

  • Low precision.

– 40.2% average precision.
– Up to 47.8% by increasing the threshold.

  • Requires dependency parser, parallel corpus.

– Difficult to obtain (e.g. in Greek).

  • Requires aligner.

– Easier to obtain.

  • Reduces search space compared to Lin & Pantel’s method.

– Compares only paths from aligned sentences.
– Unclear if it overcomes the other problems of Lin & Pantel’s method (e.g. “fail” ≈ “succeed”).

SLIDE 23

Contents of this talk

  • What is Paraphrasing?
  • Lin & Pantel.

– Dependency paths with similar slot fillers have similar meanings.

  • Barzilay & McKeown.

– Identical words → contexts → more paraphrases → more contexts → …

  • Barzilay & Lee.

– Lattices with common slot fillers tend to correspond to paraphrases.

  • Pang et al.

– Merge parse trees of aligned sentences to extract FSAs.
– Different paths in an FSA correspond to paraphrases.

  • Ibrahim et al.

– Dependency paths with similar anchors tend to correspond to paraphrases.

  • Directions for future work.

Paraphrasing methods

SLIDE 24

References

  • R. Barzilay and K. McKeown. Extracting paraphrases from a parallel corpus. In Proceedings of the ACL/EACL, 2001.
  • R. Barzilay and L. Lee. Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment. In HLT-NAACL 2003, Main Proceedings, 2003.
  • A. Ibrahim, B. Katz, and J. Lin. Extracting structural paraphrases from aligned monolingual corpora. In Proceedings of the Second International Workshop on Paraphrasing (IWP-2003), 2003.
  • D. Lin and P. Pantel. Discovery of Inference rules for Question Answering. Natural Language Engineering, 2001.
  • B. Pang, K. Knight, and D. Marcu. Syntax-based Alignment of Multiple Translations: Extracting Paraphrases and Generating New Sentences. In HLT-NAACL 2003, Main Proceedings, 2003.