Diversifiable Bootstrapping Hideki Shima Thesis Committee: Language - - PDF document


8/20/2014 Carnegie Mellon Paraphrase Pattern Acquisition by Diversifiable Bootstrapping Hideki Shima Thesis Committee: Language Technologies Institute Teruko Mitamura, CMU (chair) Eric Nyberg, CMU School of Computer Science Eduard Hovy, CMU



Thesis Defense, Aug 20th, 2014

Language Technologies Institute School of Computer Science Carnegie Mellon University, USA

Paraphrase Pattern Acquisition by Diversifiable Bootstrapping

Carnegie Mellon

Hideki Shima

Thesis Committee: Teruko Mitamura, CMU (chair) Eric Nyberg, CMU Eduard Hovy, CMU Patrick Pantel, Microsoft Research

1

Carnegie Mellon

Thesis Defense, Aug 20th, 2014 2

Need for capturing meaning equivalence in QA

  • Q. What did John Lennon die of?

John Lennon died of what
John Lennon was murdered with gunshots in 1980 …

Templates of natural language expressions can bridge different surface forms with close meanings:

  • X died of Y
  • X was murdered with Y
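
As a rough illustration of how such templates could be applied, the sketch below compiles each template into a regular expression and reads off the X and Y slots. The function names and the regex treatment of the slots are assumptions made for this example, not the system described in the talk.

```python
import re

# The two templates from the slide; X and Y are variable slots.
TEMPLATES = ["X died of Y", "X was murdered with Y"]


def template_to_regex(template):
    """Compile a template into a regex with named groups X and Y (hypothetical helper)."""
    pattern = re.escape(template)
    pattern = pattern.replace("X", r"(?P<X>.+?)", 1)
    pattern = pattern.replace("Y", r"(?P<Y>\w+)", 1)
    return re.compile(pattern)


def match_templates(sentence):
    """Return (template, X, Y) for the first template that matches, else None."""
    for template in TEMPLATES:
        m = template_to_regex(template).search(sentence)
        if m:
            return template, m.group("X"), m.group("Y")
    return None
```

With these definitions, the declarative sentence from the slide matches the second template, so the Y slot yields a candidate answer to the question.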

Carnegie Mellon

Thesis Defense, Aug 20th, 2014 3

Need for capturing meaning equivalence in QA

  • Q. What did John Lennon die of?

John Lennon died of what
John Lennon was murdered with gunshots
John Lennon's death by gunshots
John Lennon suffered a fatal gunshot wound
John Lennon fell victim to assassin's bullets
Chapman killed him with four gunshot wounds
… pumping four bullets into him, ending his life
…

  • X died of Y
  • X had died from Y
  • X was murdered with Y
  • X's death by Y
  • killed X with Y
  • X suffered a fatal Y
  • X fell victim to Y
  • pumping Y into X, ending his life

Carnegie Mellon

Thesis Defense, Aug 20th, 2014 4

  • Automatic Evaluation

    – In Machine Translation [Kauchak & Barzilay, 2006][Padó et al., 2009]
    – In Text Summarization [Zhou et al., 2006]
    – In Question Answering [Ibrahim et al., 2003][Dalmas, 2007]

  • Text Summarization [Lloret et al., 2008][Tatar et al., 2009]
  • Information Retrieval [Parapar et al., 2005][Riezler et al., 2007]
  • Information Extraction [Romano et al., 2006]
  • Question Answering [Harabagiu & Hickl, 2006][Dogdan et al., 2008]
  • Collocation Error Correction [Dahlmeier and Ng, 2011]

Paraphrasing is a common need in various applications


Carnegie Mellon

Thesis Defense, Aug 20th, 2014 5

Classification of Paraphrase Research

(1) Paraphrase Recognition (2) Paraphrase Generation (3) Paraphrase Extraction

Recognition:

  • <kill, murder> → {Y, N} (word/phrase-level)
  • <S1, S2> → {Y, N} (sentence-level)
  • <D1, D2> → {Y, N} (document-level)

Generation:

  • die → <decease, pass away, kick the bucket>
  • He had a lot of admiration for his job → He had plenty of admiration for his job

Extraction:

  • → {word, phrase, sentence}-level paraphrases
  • with/without variables
  • with/without structure
  • <X wrote Y, X is the writer of Y>
  • <writer, author>
  • <S1, S2>
  • <a lot of X, plenty of X>
  • <X buy Y, Y sell X> (argument labels: SUBJ, FROM, TO, SUBJ)

Usage / Application

  • Question Answering
  • Text Summarization
  • Automatic Grading
  • Plagiarism Detection
  • Query Expansion
  • Reference Expansion in Automatic Evaluation
  • Resource for (1) and (2)
  • Paraphrase dictionary
  • Sentence-aligned paraphrase corpus

Carnegie Mellon

Thesis Defense, Aug 20th, 2014 6

Why not use existing lexical resources (e.g. WordNet)?

Limitations:

  • Lack of coverage (e.g. phrasal expressions)
  • Lack of context (prepositions etc.)

Can we rewrite patterns with knowledge for more lexical varieties?

e.g., WordNet [Miller, 1995], FrameNet [Baker et al., 1998], Nomlex [Macleod et al., 1998], VerbNet [Kipper et al., 2006]


Carnegie Mellon

Thesis Defense, Aug 20th, 2014 7

Why extract paraphrases?

It’s because language expressions are diverse

Example paraphrases of “die”, by type:

  • Idioms: bite the dust, go west, give up the ghost, go to a better place, pay the ultimate price, buy the farm
  • Non-idiom phrases: suffer a fatal something; fall victim to something; pumping a bullet into the heart, ending one’s life
  • Religious euphemisms: be carried away by angels, answer God’s calling, go to heaven, reach nirvana
  • Euphemisms by profession: (author) write one’s final chapter, (dancer) dance one’s last dance, (gambler) cash in one’s chips
  • Military slang: go Tango Uniform, go T.U., turn one’s toes up, be KIA (killed in action), be KIFA (killed in flight accident), be DOW (died of wounds)
  • Physician slang: be at room temperature, be bloodless, feel no pain, lose vital signs, wear a toe tag
  • Gangster slang: merc, merk, murk, snuff, smoke, bang, get a backdoor parole

Carnegie Mellon

Thesis Defense, Aug 20th, 2014 8

(Quasi-)Paraphrase

  • absolute synonym
  • near-synonym
  • expression with high semantic similarity
  • expression with high semantic relatedness
  • entailment / inference
  • metaphor
  • syntactic variation
  • euphemism
  • neologism
  • slang / jargon


Carnegie Mellon

Thesis Defense, Aug 20th, 2014 9

Outline

  • Introduction
  • Paraphrase Extraction
    – Vanilla Espresso (Baseline)
    – Espresso Extension (Baseline2)
    – Diversifiable Bootstrapping
    – Distributional Type Filtering
  • Paraphrase Evaluation Metric: DIMPLE
  • Experiment
    – Design
    – Evaluation Results
  • Conclusion

Carnegie Mellon

Thesis Defense, Aug 20th, 2014 10

Paraphrase extraction source corpora

    – Bilingual parallel corpus [Callison-Burch, 2008; Kok and Brockett, 2010]
    – Multiple translations [Barzilay & McKeown, 2001][Pang et al., 2003]
    – Aligned news contents [Dolan et al., 2004][Dolan and Brockett, 2005][Quirk et al., 2004]
    – Aligned definitions [Hashimoto et al., 2002]
    – Huge monolingual corpora
      • 150GB [Bhagat & Ravichandran, 2008]
      • 4.5TB parsed corpus [Metzler & Hovy, 2011]


Carnegie Mellon

Thesis Contribution (1 of 4)

  • Problem
    – Corpus Restriction: previous work has special corpus requirements, e.g. a parallel corpus or a terabyte-scale corpus.
      • Not suitable for domain-specific paraphrase acquisition
      • Costly to build
  • Hypothesis & Proposed Solution
    – It is possible to extract paraphrase templates from an unstructured monolingual corpus given seed instances → Bootstrap Paraphrase Learning

Carnegie Mellon

Bootstrap Instance/Pattern Learning

Thesis Defense, Aug 20th, 2014 12

INPUT: monolingual plain corpus, seed instances
BOOTSTRAP LEARNING ALGORITHM: ESPRESSO [Pantel & Pennacchiotti, 2006]
OUTPUT: more instances, patterns


Carnegie Mellon

Bootstrap Instance/Pattern Learning

INPUT: monolingual plain corpus, seed instances → Bootstrapping → OUTPUT: more instances, patterns

Seed instances:

X (killer)             Y (victim)
John Wilkes Booth      Abraham Lincoln
Mark David Chapman     John Lennon
Nathuram Godse         Mahatma Gandhi
Yigal Amir             Yitzhak Rabin
John Bellingham        Spencer Perceval
Mohammed Bouyeri       Theo van Gogh
Dan White              Mayor George Moscone
Sirhan Sirhan          Robert F. Kennedy
El Sayyid Nosair       Meir Kahane
Mijailo Mijailovic     Anna Lindh

Carnegie Mellon

Bootstrap Instance/Pattern Learning

INPUT: monolingual plain corpus, seed instances → Bootstrapping → OUTPUT: more instances, patterns

Patterns:

  • X, the assassin of Y
  • assassination of Y by X
  • X assassinated Y
  • the assassination of Y by X
  • of X, the assassin of Y
  • X assassinated Y in
  • …

Unlike in many other bootstrapping works, the goal is to acquire patterns, not instances.



Carnegie Mellon

Bootstrap Instance/Pattern Learning

Thesis Defense, Aug 20th, 2014 16

1st iteration: Seed Instances → Sentences → Extracted Patterns → Ranked Patterns → Sentences → Extracted Instances → Ranked Instances
2nd iteration: …


Carnegie Mellon

Search sentences by instances

Bootstrap Instance/Pattern Learning

Thesis Defense, Aug 20th, 2014 17

Step: Seed Instances → Sentences

  • Edwin Booth was brother of John Wilkes Booth, the assassin of Abraham Lincoln.
  • John Wilkes Booth, the assassin of Abraham Lincoln, was inspired by Brutus.
  • In 1969 Berman was part of the defense team of Sirhan Sirhan, the assassin of Robert F. Kennedy.
  • …

Carnegie Mellon

Search sentences by instances

Bootstrap Instance/Pattern Learning

Thesis Defense, Aug 20th, 2014 18

Step: Seed Instances → Sentences (instances replaced by X and Y)

  • Edwin Booth was brother of X, the assassin of Y.
  • X, the assassin of Y, was inspired by Brutus.
  • In 1969 Berman was part of the defense team of X, the assassin of Y.
  • …


Carnegie Mellon

Extract patterns from sentences

Bootstrap Instance/Pattern Learning

Thesis Defense, Aug 20th, 2014 19

Step: Sentences → Extracted Patterns

  • … brother of X, the assassin of Y.
  • X, the assassin of Y, was
  • … team of X, the assassin of Y.

Extracted Pattern: Longest Common Substring among retrieved sentences
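
The extraction step can be sketched as follows, using the three sentences from the preceding slides. This is a minimal brute-force illustration: the tokenizer, the helper names, and the choice to compare token sequences rather than characters are assumptions, not details confirmed by the slides.

```python
import re


def tokenize(text):
    """Split into word and punctuation tokens, so ',' and '.' are kept."""
    return re.findall(r"\w+|[^\w\s]", text)


def generalize(sentence, x, y):
    """Replace the instance pair with the slot names X and Y."""
    return sentence.replace(x, "X").replace(y, "Y")


def contains(tokens, sub):
    """True if `sub` occurs as a contiguous run inside `tokens`."""
    n = len(sub)
    return any(tokens[i:i + n] == sub for i in range(len(tokens) - n + 1))


def longest_common_substring(sentences):
    """Longest contiguous token sequence shared by all sentences (brute force)."""
    token_lists = [tokenize(s) for s in sentences]
    base, rest = token_lists[0], token_lists[1:]
    best = []
    for start in range(len(base)):
        # Only try candidates that are longer than the best one found so far.
        for end in range(start + len(best) + 1, len(base) + 1):
            cand = base[start:end]
            if all(contains(t, cand) for t in rest):
                best = cand
            else:
                break  # a longer candidate from this start cannot match either
    return " ".join(best)


SENTENCES = [
    generalize("Edwin Booth was brother of John Wilkes Booth, the assassin of Abraham Lincoln.",
               "John Wilkes Booth", "Abraham Lincoln"),
    generalize("John Wilkes Booth, the assassin of Abraham Lincoln, was inspired by Brutus.",
               "John Wilkes Booth", "Abraham Lincoln"),
    generalize("In 1969 Berman was part of the defense team of Sirhan Sirhan, the assassin of Robert F. Kennedy.",
               "Sirhan Sirhan", "Robert F. Kennedy"),
]
```

Because punctuation is tokenized separately, the shared pattern comes out with a space before the comma; a real system would need a detokenization convention.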

Carnegie Mellon

Score and rank patterns

Sentences

Bootstrap Instance/Pattern Learning

Thesis Defense, Aug 20th, 2014 20

Step: Extracted Patterns → Ranked Patterns

Rank by reliability of pattern: r(p). r(p) is based on an association measure with each instance in the corpus.


Carnegie Mellon

Score and rank patterns

Sentences

Bootstrap Instance/Pattern Learning

Thesis Defense, Aug 20th, 2014 21

Step: Extracted Patterns → Ranked Patterns

  • 1. 0.422 X, the assassin of Y
  • 2. 0.324 assassination of Y by X
  • 3. 0.312 X assassinated Y
  • 4. 0.231 the assassination of Y by X
  • 5. 0.208 of X, the assassin of Y
  • …

Carnegie Mellon

Search sentences by pattern(s)

Step: Ranked Patterns → Sentences

  • Still shot from the CCTV video footage showing Oguen Samast, the assassin of Hrant Dink.
  • Henry Bellingham is a descendant of John Bellingham, the assassin of Spencer Perceval.


Carnegie Mellon

Extract instances from sentences

Step: Sentences → Extracted Instances

  • Still shot from the CCTV video footage showing Oguen Samast, the assassin of Hrant Dink.
  • Henry Bellingham is a descendant of John Bellingham, the assassin of Spencer Perceval.

Carnegie Mellon

Score and rank instances

Step: Extracted Instances → Ranked Instances

Rank instances by reliability: r(i) (similar to pattern reliability scoring)


Carnegie Mellon

Thesis Defense, Aug 20th, 2014

Reliability: Scoring Patterns and Instances

25

Pattern reliability Instance reliability

ESPRESSO [Pantel & Pennacchiotti, 2006]
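
The formulas on this slide did not survive extraction. As a hedged sketch, the code below follows the reliability definitions published for Espresso: a pattern's reliability is the average over instances of pmi(i, p) / max_pmi weighted by instance reliability, and instance reliability is defined symmetrically. The data layout, the function names, and the default of 0.0 for unseen pairs are assumptions made for this example.

```python
def pattern_reliability(pattern, instance_rel, pmi):
    """Average of pmi(i, p) / max_pmi * r(i) over all instances i.

    instance_rel: dict instance -> reliability (seed instances start at 1.0).
    pmi: dict (instance, pattern) -> association score; missing pairs count as 0.0
         (a simplification for this sketch).
    """
    max_pmi = max(pmi.values())
    total = sum(pmi.get((i, pattern), 0.0) / max_pmi * r_i
                for i, r_i in instance_rel.items())
    return total / len(instance_rel)


def instance_reliability(instance, pattern_rel, pmi):
    """Symmetric definition: average of pmi(i, p) / max_pmi * r(p) over all patterns p."""
    max_pmi = max(pmi.values())
    total = sum(pmi.get((instance, p), 0.0) / max_pmi * r_p
                for p, r_p in pattern_rel.items())
    return total / len(pattern_rel)


# Toy association scores, invented purely for illustration.
TOY_PMI = {("i1", "p1"): 2.0, ("i2", "p1"): 1.0, ("i1", "p2"): 0.5}
SEEDS = {"i1": 1.0, "i2": 1.0}
```

The two functions feed each other across iterations: seed instances score patterns, the top patterns then score newly extracted instances, and so on.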

Convergence: When to stop the iteration?

Thesis Defense, Aug 20th, 2014 26

  • Until extracting τ1 patterns
  • The average pattern score decreases by more than τ2 from the previous iteration
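
A small sketch of the two stopping conditions follows. Whether the τ2 comparison is absolute or relative is not stated on the slide; the absolute difference used here is an assumption, as are the function and parameter names.

```python
def should_stop(num_patterns, avg_scores, tau1, tau2):
    """Decide whether bootstrapping should stop.

    num_patterns: number of patterns extracted so far.
    avg_scores: average pattern score of each finished iteration, oldest first.
    tau1: target number of patterns.
    tau2: largest tolerated drop in average score between iterations
          (treated as an absolute difference, which is an assumption).
    """
    if num_patterns >= tau1:
        return True
    if len(avg_scores) >= 2 and avg_scores[-2] - avg_scores[-1] > tau2:
        return True
    return False
```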


Carnegie Mellon

Thesis Defense, Aug 20th, 2014 27


Carnegie Mellon

Extending Vanilla Espresso

Thesis Defense, Aug 20th, 2014 28

  • Instance Extraction:
  • POS-based [Justeson & Katz, 1995]: ( (Adj|Noun)+ | ((Adj|Noun)* (Noun Prep)?) (Adj|Noun)* ) Noun

  • Sliding window + dictionary (YAGO2[Hoffart et al., 2012] )
  • Instance Filtering
  • Pronouns
  • Distributional type constraint
  • Specific pattern filtering
  • General pattern filtering
  • Sentence-based corpus
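
The POS-based instance extraction above can be sketched by mapping each tag to one character and running the slide's pattern as a regular expression over the resulting string. The Penn Treebank tag mapping and the function names are assumptions; the talk does not say which tagger or tag set was used.

```python
import re

# One character per token: A = adjective, N = noun, P = preposition, O = other.
# The Penn Treebank tags below are an assumed tag set.
TAG_TO_CHAR = {"JJ": "A", "NN": "N", "NNS": "N", "NNP": "N", "NNPS": "N", "IN": "P"}

# ( (Adj|Noun)+ | ((Adj|Noun)* (Noun Prep)?) (Adj|Noun)* ) Noun
TERM = re.compile(r"(?:[AN]+|[AN]*(?:NP)?[AN]*)N")


def candidate_instances(tagged):
    """tagged: list of (token, tag) pairs. Returns phrases whose tags match TERM."""
    chars = "".join(TAG_TO_CHAR.get(tag, "O") for _, tag in tagged)
    return [" ".join(tok for tok, _ in tagged[m.start():m.end()])
            for m in TERM.finditer(chars)]


TAGGED = [("John", "NNP"), ("Wilkes", "NNP"), ("Booth", "NNP"), ("shot", "VBD"),
          ("Abraham", "NNP"), ("Lincoln", "NNP")]
```

Encoding tags as single characters keeps the pattern readable and lets the standard regex engine do the span search.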

Carnegie Mellon

Thesis Defense, Aug 20th, 2014

Corpus preprocessing

29

  • Punctuation and symbols play an important role in a pattern → let's index them too

Carnegie Mellon

Thesis Defense, Aug 20th, 2014

Reliability from sentence-based corpus

30

Relation: died-in

p = “X (d. Y”
i = <“Liu Bei”, “223”>

|x, p, y| = | Liu Bei (d. 223 | = xcount( #1( Liu Bei lLPARENl d lPERIODl 223 ) ) = 20
|x, *, y| = xcount( #uw20( #1( Liu Bei ) #1( 223 ) ) ) = 36
|*, p, *| = | X (d. Y | = xcount( #1( lLPARENl d lPERIODl ) ) = 50347

The |x, p, y| calculation is the core part in terms of both accuracy and speed.
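
To show how the three counts combine, the sketch below computes the pointwise mutual information in the form used by Espresso, log( |x, p, y| / (|x, *, y| · |*, p, *|) ). It works on raw counts and leaves out the discounting factor that Espresso applies; the function and argument names are assumptions.

```python
import math


def pmi(count_xpy, count_xy, count_p):
    """PMI between instance i = <x, y> and pattern p, from raw corpus counts.

    count_xpy: |x, p, y|, occurrences of the pattern filled with the instance.
    count_xy:  |x, *, y|, co-occurrences of x and y under any pattern.
    count_p:   |*, p, *|, occurrences of the pattern with any instance.
    """
    return math.log(count_xpy / (count_xy * count_p))


# Counts from the slide for p = "X (d. Y" and i = <"Liu Bei", "223">.
LIU_BEI_PMI = pmi(20, 36, 50347)
```

With raw counts the value is negative; what matters for ranking is the ratio to the maximum PMI over all pattern-instance pairs.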


Carnegie Mellon

Thesis Defense, Aug 20th, 2014 31


Carnegie Mellon

Issue: Lack of Lexical Diversity

Thesis Defense, Aug 20th, 2014 32

  • X, the assassin of Y
  • assassination of Y by X
  • X assassinated Y
  • the assassination of Y by X
  • of X, the assassin of Y
  • X assassinated Y in

Words participating in patterns are skewed!

[Plot: precision and recall by iteration (1 to 10)]


Carnegie Mellon

Thesis Defense, Aug 20th, 2014 33

Paraphrases extracted for “killed” in various approaches

[Bannard & Callison- Burch, 2005] [Bhagat & Ravichandran, 2008] [Pasca & Dienes, 2005] [Metzler & Hovy, 2011]

murdered killed in used wounded died killed , made injured beaten that killed involved arrested been killed killed NN people found left are killed NN born that killed lost killed by done were killed were killed were wounded in injured Involved kill and wounding seen killing have died dead , including taken claimed , hundreds released shot dead

Paraphrases acquired by Metzler et al., [2011]

unique keywords from correct phrases

murder die kill kill dead N/A kill dead Carnegie Mellon

Thesis Contribution (2 of 4)

  • Problem

    – Lack of Lexical Diversity: suppressing semantic drift too aggressively results in extracted patterns with poor lexical diversity
  • Hypothesis & Proposed Solution
    – The lexical diversity of acquired paraphrases can be controlled with a model of relevance-dissimilarity interpolation → Diversifiable Bootstrapping


Carnegie Mellon

Diversifiable Bootstrapping [Shima & Mitamura, 2012]

Thesis Defense, Aug 20th, 2014 35

$$ r'(p) = \lambda \cdot r(p) + (1 - \lambda) \cdot \mathrm{diversity}(p) $$

  • r(p): original reliability score of a pattern
  • diversity(p): how lexically different a pattern is from the other patterns originally ranked higher than it

Carnegie Mellon

Diversifiable Bootstrapping [Shima & Mitamura, 2012]

Thesis Defense, Aug 20th, 2014 36

$$ r'(p) = \lambda \cdot r(p) + (1 - \lambda) \cdot \mathrm{diversity}(p) $$

  • r(p): original reliability score of a pattern
  • Interpolation parameter: $0 \le \lambda \le 1$
  • diversity(p): how lexically different a pattern is from the other patterns originally ranked higher than it


Carnegie Mellon

Diversifiable Bootstrapping [Shima & Mitamura, 2012]

$$ r'(p) = \lambda \cdot r(p) + (1 - \lambda) \cdot \mathrm{diversity}(p) $$

  • r(p): original reliability score of a pattern
  • diversity(p): how lexically different this pattern is from the other patterns originally ranked higher than it
  • Interpolation parameter: $0 \le \lambda \le 1$

By tweaking the parameter λ, the acquired patterns can be diversified to a degree that one can control.
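
The interpolation can be sketched as a re-ranking pass over scored patterns. The slides do not define diversity(p) beyond lexical difference from higher-ranked patterns, so the measure below (one minus the maximum Jaccard overlap of crudely stemmed content words) is a hypothetical stand-in, as are the stop-word list, the six-character stem, and the 0.200 score of the last example pattern.

```python
import re

# Hypothetical stop list; slot names X and Y are ignored as well.
STOP = {"x", "y", "s", "the", "of", "by", "and", "in", "a", "to", "who", "was"}


def content_stems(pattern):
    """Lower-cased content words cut to six characters (a crude, assumed stemmer)."""
    return {w[:6] for w in re.findall(r"[a-z]+", pattern.lower()) if w not in STOP}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def diversity(pattern, higher_ranked):
    """1 minus the largest overlap with any pattern originally ranked higher."""
    if not higher_ranked:
        return 1.0
    stems = content_stems(pattern)
    return 1.0 - max(jaccard(stems, content_stems(q)) for q in higher_ranked)


def rerank(scored, lam):
    """scored: list of (pattern, r(p)). Returns the list re-sorted by r'(p)."""
    ordered = sorted(scored, key=lambda pr: pr[1], reverse=True)
    rescored = []
    for idx, (p, r) in enumerate(ordered):
        higher = [q for q, _ in ordered[:idx]]
        rescored.append((p, lam * r + (1 - lam) * diversity(p, higher)))
    return sorted(rescored, key=lambda pr: pr[1], reverse=True)


SCORED = [
    ("X, the assassin of Y", 0.422),
    ("assassination of Y by X", 0.324),
    ("X assassinated Y", 0.312),
    ("X shot and killed Y", 0.200),  # score invented for illustration
]
```

With λ = 1 the original order is kept; with a smaller λ the lexically distinct pattern moves up past the "assassin" variants, which is the effect the slide describes.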

Carnegie Mellon

Acquired Paraphrases: killed

Thesis Defense, Aug 20th, 2014 38

λ = 1 (no diversification) [Shima & Mitamura, 2012]:

X, the assassin of Y; assassination of Y by X; X assassinated Y; the assassination of Y by X; of X, the assassin of Y; X assassinated Y in; X, the man who assassinated Y; Y's assassin, X; of Y's assassin X; of the assassination of Y by X; X shot and killed Y; Y was assassinated by X; named X assassinated Y; Y was shot by X; X to assassinate Y


Carnegie Mellon

Acquired Paraphrases: killed

Thesis Defense, Aug 20th, 2014 39

λ = 1: X, the assassin of Y; assassination of Y by X; X assassinated Y; the assassination of Y by X; of X, the assassin of Y; X assassinated Y in; X, the man who assassinated Y; Y's assassin, X; of Y's assassin X; of the assassination of Y by X; X shot and killed Y; Y was assassinated by X; named X assassinated Y; Y was shot by X; X to assassinate Y

λ = 0.7: X, the assassin of Y; X assassinated Y; assassination of Y by X; Y was shot by X; X, who killed Y; the assassination of Y by X; X assassinated Y in; X tells his version of Y; X shoot Y; X murdered Y; Y's killer, X; Y, at the theatre after X; Y, push X to his breaking point; X to assassinate Y; of X, the assassin of Y

λ = 0.3: X, the assassin of Y; X, who killed Y; Y was shot by X; X tells his version of Y; X shoot Y; X murdered Y; Y's killer, X; Y, at the theatre after X; Y, push X to his breaking point; X assassinated Y; assassination of Y by X; X to assassinate Y; X kills Y; of X shooting Y; X assassinated Y in

[Shima & Mitamura, 2012]

Carnegie Mellon

Acquired Paraphrases: died-of

Thesis Defense, Aug 20th, 2014 40

λ = 1: X died of Y; X died of Y in; X died of Y on; X died of lung Y; X died of lung Y in; X died of lung Y on; X died of Y in the; X died of Y at; X died of stomach Y; X died of natural Y; X died of breast Y in; X died of a Y; X died of Y in his; X passed away from Y; X died of a Y in

λ = 0.7: X died of Y in; X died of Y; X's death from Y; X passed away from Y; Y of X, news; Y of X, a former; that X was suffering from Y; the suspected Y of X; X to breast Y in; X was diagnosed with ovarian Y; X dies of Y; X was dying of Y; X died of lung Y; X died of Y on; X died of lung Y in

λ = 0.3: X died of Y in; X's death from Y; X passed away from Y; Y of X, news; Y of X, a former; that X was suffering from Y; the suspected Y of X; X succumbed to lung Y; X to breast Y in; X was diagnosed with ovarian Y; X dies of Y; X was dying of Y; X died of Y; X's death from Y in; X died of lung Y

[Shima & Mitamura, 2012]


Carnegie Mellon

Acquired Paraphrases: was-led-by

Thesis Defense, Aug 20th, 2014 41

λ = 1: Y came to power in X in; Y came to power in X; Y to power in X; Y came to power in X in the; when Y came to power in X in; when Y came to power in X; Y took power in X; Y rose to power in X; after Y came to power in X; Y became chancellor of X; Y came to power in X and; Y seized power in X; Y gained power in X; to power of Y in X; Y's rise to power in X

λ = 0.7: Y came to power in X; Y to power in X; regime of Y in X; Y came to power in X in; Y to power in X in; Y became chancellor of X; the rise of Y in X; X's dictator Y; X's president Y; Y took control of X; Y, who ruled X; Y's success and X's saviour; Y declared that X had; X's leader Y; government of Y in X

λ = 0.3: Y came to power in X in; regime of Y in X; X's dictator Y; Y became chancellor of X; X's president Y; the rise of Y in X; X's leader Y; Y, who ruled X; Y took control of X; government of Y in X; X, led by Y; quisling had visited Y in X; to flee X after Y; Y in X the year before; X, under the leadership of Y

[Shima & Mitamura, 2012]

Carnegie Mellon

Thesis Defense, Aug 20th, 2014 42



Carnegie Mellon

Semantic Drift Problem

Thesis Defense, Aug 20th, 2014 43

Example: “gave birth to X and Y” (has-sister relation) → <rock, roll> → “annual X and Y hall of fame”

  • “a lexicon's intended meaning shifts into another category during bootstrapping” [Curran et al., 2007]
  • “semantic drift often occurs when ambiguous or erroneous terms and/or patterns are introduced into and then dominate the iterative process” [McIntosh, 2009]

Carnegie Mellon

Thesis Contribution (3 of 4)

  • Problem

    – Semantic Drift: bootstrap pattern-instance learning can easily go wrong once an ambiguous or erroneous item is introduced
  • Hypothesis & Proposed Solution
    – The risk of semantic drift from diversification can be mitigated by a distributional type constraint.


Carnegie Mellon

Overview: Distributional Type Constraint

Thesis Defense, Aug 20th, 2014 45

Initial seed instances:

X                  Y
Elvis Presley      heart attack
Bob Marley         cancer
John Lennon        shot dead
Marilyn Monroe     drug overdose

Extracted instance candidates:

X                  Y
Linda McCartney    breast cancer
Los Alamos         radiation exposure
Peter Turkel       car accident
Jim Morrison       1971

Distributional Type Extractor (weight: type frequency × inverse corpus type frequency)

Type weights for the seed instances: 44.7 physical condition; 34.9 condition; 34.9 illness; 30.1 ill health; 29.9 pathological state; 29.1 state; 20.9 crisis; 20.8 emergency; 20.4 juncture

Type weights for each candidate (example): 0.0 entity; 0.0 abstraction; 0.0 attribute; 2.2 pathological state; 2.1 illness; 2.0 malignant tumor; 1.9 cancer

Vector Space Similarity Calculation

[0.0, 1.0]
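
The flow above can be sketched as building a weighted type vector for the seed values and one for each candidate, then comparing them by cosine similarity. The exact weighting formula on the later slide was not preserved, so the tf-idf-style weight below is an assumption; the type inventory and frequencies are invented for illustration and are not YAGO2 data.

```python
import math
from collections import Counter


def type_vector(values, types_of, corpus_type_freq, num_entities):
    """Weight each type by frequency among `values` times an inverse corpus frequency."""
    tf = Counter(t for v in values for t in types_of.get(v, ()))
    return {t: f * math.log(num_entities / (1 + corpus_type_freq[t]))
            for t, f in tf.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; lies in [0.0, 1.0] for non-negative weights."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


# Toy type inventory (hypothetical, not taken from YAGO2).
TYPES_OF = {
    "heart attack": ["illness", "physical condition", "entity"],
    "cancer": ["illness", "physical condition", "entity"],
    "breast cancer": ["illness", "physical condition", "entity"],
    "1971": ["year", "entity"],
}
CORPUS_TYPE_FREQ = {"entity": 999, "illness": 9, "physical condition": 19, "year": 49}
NUM_ENTITIES = 1000

SEED_VECTOR = type_vector(["heart attack", "cancer"], TYPES_OF, CORPUS_TYPE_FREQ, NUM_ENTITIES)


def type_similarity(candidate):
    """Similarity between a candidate's type vector and the seed type vector."""
    return cosine(SEED_VECTOR, type_vector([candidate], TYPES_OF, CORPUS_TYPE_FREQ, NUM_ENTITIES))
```

In this toy setup a very general type such as "entity" gets weight zero, so a candidate like "1971" that shares only that type with the seeds scores 0.0, while "breast cancer" scores close to 1.0.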

Carnegie Mellon

Distributional Type Constraint: Pros and Cons

  • Pros: Can define a soft constraint by seed instances (instead of an ontological hard constraint).
    – Associating with one ontology node is sometimes difficult
    – Example: cause-of-death
      • disease or health problem (Motor Neurone Disease; alcohol overdose; starvation)
      • accident (traffic accident; lawn mower; fight; fire)
      • indirect cause of death (overwork; curse; shame)
  • Cons: robustness
    – Errors of type extraction
    – Coverage of words/phrases, e.g. week-long series of air raid; well-aimed rifle shots


Carnegie Mellon

Source of types: YAGO2 DB

  • Type resource: YAGO2[Hoffart et al., 2012]

    – 9.8 million entities from Wikipedia, GeoNames, and WordNet.
    – Each entity is linked with WordNet synsets
  • Cf. WordNet 3.0 contains 155K words (nouns: 118K) and 118K synsets (nouns: 82K), and lacks coverage of proper nouns.

Carnegie Mellon

Example Types

48 Thesis Defense, Aug 20th, 2014

Exhaustive set of types associated with “heart attack” in YAGO2.


Carnegie Mellon

Type Vector Weights

49 Thesis Defense, Aug 20th, 2014

cf: tfidf = tf * log( D / (1 + df) )

Carnegie Mellon

Type similarity calculation

50 Thesis Defense, Aug 20th, 2014


Carnegie Mellon

Thesis Defense, Aug 20th, 2014 51


Carnegie Mellon

Thesis Defense, Aug 20th, 2014 52

Paraphrase Evaluation: Which set of paraphrases is better?

Relation: “killed”

Set A (100% precision): X died from Y; X died of Y; X who died of Y; X who died from Y; X was dying of Y; X had died of Y; X dying of Y

Set B (100% precision): X died of Y; X died from Y; X was murdered with Y; X's death by Y; X suffered a fatal Y; Y killed Y; X fell victim to Y

Are A & B really equally valuable?


Carnegie Mellon

Thesis Contribution (4 of 4)

  • Problem

    – Lack of Evaluation Metric: precision and recall do not reward lexical diversity
  • Hypothesis & Proposed Solution
    – An evaluation metric that rewards lexically diverse paraphrases is effective for paraphrase evaluation → DIMPLE Metric (DIversity-aware Metric for Pattern Learning Experiments)

Carnegie Mellon

Traditional metrics

54

Relation: “killed”

Expected Precision

[Bannard and Callison-Burch, 2005; Callison-Burch, 2008; Kok and Brockett, 2010; Metzler et al., 2011]

Output        Judge1   Judge2   Judge3   Avg
“kill”        1        1        1        1
“killed, ”    1        1        1        1
“of”          -        -        -        -
“death”       1        -        1        2/3
“murdered”    1        1        1        1

$$ \mathrm{EP}_k = \frac{1}{k} \sum_{i=1}^{k} \mathrm{avg}_i $$

Thesis Defense, Aug 20th, 2014
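
As a worked check of Expected Precision, the sketch below averages the judges' votes per output and then averages over the k outputs. Which cells were empty in the original table is inferred from the averages (1, 1, 2/3, 1 for four of the five outputs), so the vote matrix is an assumption.

```python
from fractions import Fraction


def expected_precision(votes):
    """votes: one list of 0/1 judgments per output. Returns EP_k as an exact fraction."""
    avgs = [Fraction(sum(v), len(v)) for v in votes]
    return sum(avgs) / len(avgs)


# Rows follow the output order: "kill", "killed, ", "of", "death", "murdered".
# Empty cells are read as 0 (an inferred reading of the slide).
VOTES = [
    [1, 1, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 0, 1],
    [1, 1, 1],
]
```

Under this reading EP at k = 5 is (1 + 1 + 0 + 2/3 + 1) / 5 = 11/15.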


Carnegie Mellon

Traditional metrics

55

Relation: “killed”

Expected Precision + Redundancy

[Metzler et al., 2011]

Output        Judge1   Judge2   Judge3   Avg
“kill”        -        -        -        -
“killed, ”    -        -        -        -
“of”          -        -        -        -
“death”       1        -        1        2/3
“murdered”    1        1        1        1

$$ \mathrm{EPR}_k = \frac{1}{k} \sum_{i=1}^{k} \mathrm{avg}_i $$

Thesis Defense, Aug 20th, 2014

Carnegie Mellon

Cumulative Gain (for Information Retrieval evaluation)

56

Input: “killed”

Output        gain
“kill”        2
“killed, ”    1
“of”          -
“death”       2
“murdered”    3

$$ \mathrm{CG}_k = \sum_{i=1}^{k} \left( 2^{\mathrm{gain}_i} - 1 \right) $$

Cumulative Gain [Järvelin & Kekäläinen, 2002; Kekäläinen, 2005]

Analogy in Information Retrieval:

Query result   relevance
doc1           fairly relevant
doc2           marginally relevant
doc3           irrelevant
doc4           fairly relevant
doc5           highly relevant

Thesis Defense, Aug 20th, 2014


Carnegie Mellon

DIMPLE metric [Shima & Mitamura, 2011]

57

Relation: “killed”

Output        Q (Quality)   D (Diversity)   gain
“kill”        1             2               2
“killed, ”    1             1               1
“of”          -             1               -
“death”       2/3           3               2
“murdered”    1             3               3

$$ \mathrm{gain}_i = Q_i \times D_i $$

Thesis Defense, Aug 20th, 2014

Carnegie Mellon

DIMPLE metric [Shima & Mitamura, 2011]

58

Relation: “killed”

Output        Q     D   gain
“kill”        1     2   2
“killed, ”    1     1   1
“of”          -     1   -
“death”       2/3   3   2
“murdered”    1     3   3

$$ \mathrm{CG}_k = \sum_{i=1}^{k} \left( 2^{\mathrm{gain}_i} - 1 \right) $$

Cumulative Gain [Järvelin & Kekäläinen, 2002; Kekäläinen, 2005]

Thesis Defense, Aug 20th, 2014


Carnegie Mellon

DIMPLE metric [Shima & Mitamura, 2011]

59

Relation: “killed”

Output        Q     D   gain   2^gain-1
“kill”        1     2   2      3
“killed, ”    1     1   1      1
“of”          -     1   -      -
“death”       2/3   3   2      3
“murdered”    1     3   3      7

$$ \mathrm{CG}_5 = \sum_{i=1}^{5} \left( 2^{\mathrm{gain}_i} - 1 \right) = 3 + 1 + 3 + 7 = 14 $$

Thesis Defense, Aug 20th, 2014

Carnegie Mellon

DIMPLE metric [Shima & Mitamura, 2011]

60

Relation: “killed”

Output        Q     D   gain   2^gain-1
“kill”        1     2   2      3
“killed, ”    1     1   1      1
“of”          -     1   -      -
“death”       2/3   3   2      3
“murdered”    1     3   3      7

$$ \mathrm{DIMPLE}_5 = \frac{3 + 1 + 3 + 7}{7 + 7 + 7 + 7 + 7} = 0.4 $$

Thesis Defense, Aug 20th, 2014
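
The worked example can be reproduced with a few lines of code. The sketch assumes that the empty Q cell for “of” means 0 and that the normalizer is k times the largest possible discounted gain (2^3 - 1 = 7, from Q = 1 and D = 3), which is what the slide's arithmetic suggests; the function and parameter names are illustrative.

```python
def cumulative_gain(quality, diversity):
    """CG_k = sum over outputs of (2 ** gain_i - 1), with gain_i = Q_i * D_i."""
    return sum(2 ** (q * d) - 1 for q, d in zip(quality, diversity))


def dimple(quality, diversity, max_gain=3):
    """DIMPLE_k: cumulative gain divided by k * (2 ** max_gain - 1).

    max_gain=3 is inferred from the slide's denominator of 7 per output.
    """
    k = len(quality)
    return cumulative_gain(quality, diversity) / (k * (2 ** max_gain - 1))


# Values from the slide; Q for "of" is read as 0 (assumed).
Q = [1, 1, 0, 2 / 3, 1]
D = [2, 1, 1, 3, 3]
```

With these inputs the cumulative gain is 3 + 1 + 0 + 3 + 7 = 14 and the normalizer is 5 × 7 = 35, matching the slide's value of 0.4.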


Carnegie Mellon

Evaluating DIMPLE

61

Relation: “killed”; Output: “kill”, “killed, ”, “of”, “death”, “murdered”

Correlation between intrinsic metrics (EP, EPR, DIMPLE) and extrinsic tasks (MSRPC, RTE, CQAE)

40 sets of paraphrases (= 10 verbs × 4 paraphrase generation algorithms)

Thesis Defense, Aug 20th, 2014

  • MSRPC: The Microsoft Research Paraphrase Corpus [Dolan et al., 2005]
  • RTE: Recognizing Textual Entailment dataset from PASCAL/TAC RTE1-4.
  • CQAE: Complex Question Answering Evaluation from 6 past TREC QA tracks.

Carnegie Mellon

Evaluating DIMPLE: Result

62

Dataset   EP      EPR     DIMPLE
MSRPC     0.19    0.37    *0.52
RTE       0.29    *0.38   *0.58
CQAE      *0.47   *0.55   *0.70

*: statistically significant, where the null hypothesis tested is “there is no correlation” (p-value < 0.01)

Thesis Defense, Aug 20th, 2014


Carnegie Mellon

Thesis Defense, Aug 20th, 2014 63


Carnegie Mellon

List of Relations

Thesis Defense, Aug 20th, 2014 64

Seed source: N: NELL[Carlson et al., 2010] E: Ephyra [Schlaefer et al., 2006]


Carnegie Mellon

Paraphrase Extractors

Thesis Defense, Aug 20th, 2014 65

Extraction algorithms (description; iteration; corpus):

  • CPL: Baseline 1: Coupled Pattern Learner from NELL [Carlson et al., 2010]; 860th iteration; ClueWeb09, 25TB, 500m pages (includes Wikipedia)
  • VANILLA: Baseline 2: Espresso [Pantel & Pennacchiotti, 2006] w/o web; 10th iteration and/or convergence at τ2=0.01; Wikipedia, 7GB, 2.1m pages, 50m sentences
  • BPL: Baseline 3: Extended Espresso; Bootstrap Paraphrase Learner (λ=1.0)
  • D-BPL: Proposed: BPL with Diversification (λ=0.75)

Carnegie Mellon

Gold standard labels for patterns

Thesis Defense, Aug 20th, 2014 66

Labels (grouped into correct and incorrect):

  • M: Matched (high certainty)
  • O: Matched & Out-of-dictionary
  • I: Inconclusive; depends on the context (medium certainty)
  • R: Related (no or very small certainty)
  • A: Antonym
  • W: Wrong


Carnegie Mellon

Judging: M, O, R

Thesis Defense, Aug 20th, 2014 67

  • (M) X died of Y
  • (M) X passes to Y
  • (M) X perished in Y
  • (M) X succumbed to Y
  • (O) X fell victim to Y
  • (O) X was terminally ill with Y
  • (O) X suffered a fatal Y
  • (R) X was diagnosed with Y
  • (M) X was diagnosed with Y, and died

Relation: died-of

Carnegie Mellon

  • DIMPLE
  • Precision
  • Recall

Evaluation metrics on paraphrase patterns


Carnegie Mellon

Label Distribution

Thesis Defense, Aug 20th, 2014 69

Carnegie Mellon

Thesis Defense, Aug 20th, 2014 70



Carnegie Mellon

  • Effect of diversification
  • Effect of type-based filtering

Outline of Experiments

Carnegie Mellon

Thesis Defense, Aug 20th, 2014 72

Main Results (Metric: DIMPLE)

[Plot: DIMPLE by iteration (1 to 10) for D-BPL, BPL, and VANILLA]

p-values:

  • CPL & D-BPL: 0.042
  • VANILLA & D-BPL: 0.023
  • BPL & D-BPL: 0.048


Carnegie Mellon

Thesis Defense, Aug 20th, 2014 73

Main Results (Metric: Precision)

[Plot: Precision by iteration (1 to 10) for D-BPL, BPL, and VANILLA]

Carnegie Mellon

Thesis Defense, Aug 20th, 2014 74

Main Results (Metric: Recall)

[Plot: Recall by iteration (1 to 10) for D-BPL, BPL, and VANILLA]


Carnegie Mellon

Thesis Defense, Aug 20th, 2014 75

Overall results (11 relations; macro-avg)

[Four plots by iteration (1 to 10), each for D-BPL, BPL, and VANILLA: Precision; DIMPLE; Recall; Number of Distinct Keywords]

Carnegie Mellon

Thesis Defense, Aug 20th, 2014 76

Effect of type-constraint

[Plot: Precision by iteration (1 to 10) for D-BPL(+), D-BPL with type scoring, and D-BPL(-), D-BPL without type scoring]


Carnegie Mellon

Example:

LEADER(X:person, Y:organization)

Thesis Defense, Aug 20th, 2014 77

Carnegie Mellon

Thesis Defense, Aug 20th, 2014 78

Example: LEADER(X:person, Y:organization)

Top 15 paraphrase patterns

VANILLA | BPL | D-BPL
Y, President of X | Y, president of X | Y, president of X
Y, president of X | Y, former president of X | Y's regime in X
X - Y, President | president of X, Y | Y's government in X
Y, former president of X | X's president Y | X an leader Y
president of X, Y | Y, the president of X | X an dictator Y
President of X, Y | Y (president of X | Y to X to face trial
X n president Y | president Y of X | Y (captain general, X
X's President Y | X's president, Y | Y from power in X
X's president Y | Y, the current president of X | X, led by Y
Y, the president of X | Y is elected president of X | banned in X during Y
Y (president of X | X ian president Y | invaded and annexed by X (under Y
president Y of X | Y - president of X | war against Y's X
President Y of X | Y, the former president of X | Y to the presidency of X
Y - former president of X | Y becomes president of X | Y is made premier of X
X's president, Y | Y, current president of X | unification with Y's X


Example: LEADER(X:person, Y:organization) (selected)

Example: person_graduated_school (X: person, Y:org)


Example: person_graduated_school (X: person, Y:org)

Patterns by CPL/NELL (top 100)

Example: person_graduated_school (X: person, Y:org)


Top 15 paraphrase patterns

VANILLA | BPL | D-BPL
High School, X attended Y | X graduated from Y | X graduated from Y
high school, X attended Y | X is a graduate of Y | attended Y, where X
School, X attended Y | X has taught at Y | X has taught at Y
school, X attended Y | attended Y, where X | Y, where X majored
X attended Y | Y, where X majored | X received his undergraduate degree from Y
X graduated from Y | X attended Y | X joined the faculty at Y
graduating from high school, X attended Y | X taught at Y | X studied at Y
high school, X attended Y where he played | X received his undergraduate degree from Y | X was a visiting professor at Y
X attended Y where he played | X studied at Y | Y, where X earned
Y, where X majored | X graduated at Y | X accepted a position at Y
attended Y, where X | Y, where X graduated | X then went to Y
X taught at Y | high school, X attended Y | Y, where X was a member
X has taught at Y | X joined the faculty at Y | X played college football for Y
X is a graduate of Y | X graduated with honors from Y | X is a graduate of Y
X received his undergraduate degree from Y | X was graduated from Y | science at Y, where X


Outline

  • Introduction
  • Paraphrase Extraction
    – Vanilla Espresso (Baseline)
    – Espresso Extension (Baseline2)
    – Diversifiable Bootstrapping
    – Distributional Type Filtering
  • Paraphrase Evaluation Metric: DIMPLE
  • Experiment
    – Design
    – Evaluation Results
  • Conclusion


Summary of Contributions


Each limitation in the state of the art is listed with the hypothesis confirmed and the supporting evidence:

  • Corpus restriction
    – Confirmed hypothesis: It is possible to extract paraphrase templates from an unstructured monolingual corpus given seed instances.
    – Supporting evidence: BPL and D-BPL outperform the baselines in precision, recall, and number of distinct keywords.
  • Lack of lexical diversity
    – Confirmed hypothesis: Lexical diversity of acquired paraphrases can be controlled with a model of relevance-dissimilarity interpolation.
    – Supporting evidence: A statistically significant difference (p < 0.05) in DIMPLE was observed between diversifiable bootstrapping and the baselines.
  • Semantic drift
    – Confirmed hypothesis: The risk of semantic drift from diversification can be mitigated by distributional type restriction.
    – Supporting evidence: When type-based instance filtering is enabled, precision stays consistently above the baseline and does not drop steeply.
  • Lack of evaluation metric
    – Confirmed hypothesis: A cumulative-gain style metric that rewards lexically diverse paraphrases is effective.
    – Supporting evidence: DIMPLE correlates with paraphrase recognition task performance, with a Pearson's r of +0.5 to +0.7 that is statistically significant (p < 0.01).
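The relevance-dissimilarity interpolation mentioned under lexical diversity can be illustrated with a greedy selection loop in the style of maximal marginal relevance. This is only a sketch under stated assumptions, not the D-BPL scoring function: the weight `lam`, the token-level Jaccard dissimilarity, the function names, and the relevance numbers are all hypothetical.

```python
def jaccard_dissimilarity(a, b):
    """Return 1 minus the Jaccard overlap of the two patterns' token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    if not union:
        return 0.0
    return 1.0 - len(ta & tb) / len(union)


def select_diverse(candidates, k, lam=0.5):
    """Greedily pick up to k patterns.

    candidates maps a pattern string to a relevance score (assumed in [0, 1]).
    Each step maximizes
        lam * relevance + (1 - lam) * min dissimilarity to the selected set,
    so lam = 1.0 ranks purely by relevance (an assumption for illustration).
    """
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        def score(p):
            if selected:
                novelty = min(jaccard_dissimilarity(p, s) for s in selected)
            else:
                novelty = 1.0  # nothing selected yet, so everything is novel
            return lam * remaining[p] + (1.0 - lam) * novelty

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Hypothetical relevance scores for three LEADER-style patterns.
candidates = {
    "Y, president of X": 0.9,
    "Y, former president of X": 0.85,
    "X, led by Y": 0.6,
}
by_relevance = select_diverse(candidates, k=2, lam=1.0)
diversified = select_diverse(candidates, k=2, lam=0.5)
```

With `lam=1.0` the two near-duplicate "president of" patterns are chosen. With `lam=0.5` the second pick shifts to the lexically different "X, led by Y", which mirrors the contrast between the BPL and D-BPL columns in the example tables.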


Future Work

  • Co-reference resolution
    – Ratio of full name : last name only : pronoun = 1 : 5.8 : 6.7 (Wikipedia)
    – Data-sparseness issue when calculating reliability
    – Generate and add sentences that replace a reference with its referent (avoiding double counting)
  • Corpus-specific paraphrase extraction
    – Medical, legal, sports, etc.
  • Robust vector representation for type scoring
    – YAGO covers 9.8M entities, but there is still a coverage issue
  • DIMPLE's "Q" (Quality) by O/M/I labels
  • Extrinsic evaluation (QA)
  • Feature-based trainable scorer
    – Use multiple features (POS sequence, context vector, type vector, dictionary feature)
    – Optimize w.r.t. different application needs / labels


Wrap up: immediate tasks to do

  • Complete annotating all 15 relations
  • Calculate inter-annotator agreement
    – Compare fine- vs. coarse-grained labels ({M, O, I} vs. {R, A, W})
    – By Cohen's kappa
  • Related work
    – Especially Coupled Pattern Learner (NELL)
  • Analysis of precision vs. average reliability
  • Release D-BPL code + evaluation tool + annotated data
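Cohen's kappa, named above for inter-annotator agreement, corrects the observed agreement p_o for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch for two annotators follows. The label sequences are made up for illustration, and M, O, and I are treated only as opaque label names.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical fine-grained labels from two annotators on six patterns.
annotator_1 = ["M", "M", "O", "I", "M", "O"]
annotator_2 = ["M", "O", "O", "I", "M", "M"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Running the same function on the fine-grained and on the coarse-grained label sets would give the two kappa values to compare, once a mapping between the label sets is fixed.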


Web-based Experiment Management Tool



Conclusion

  • Developed a paraphrase extraction algorithm that can acquire lexically diverse binary-relation paraphrase templates, given a relatively small number of seed instances for a certain relation and an unstructured monolingual corpus.
    – Diversification is effective: a statistically significant difference in DIMPLE was observed between Diversifiable Bootstrapping (D-BPL) and the two baseline algorithms (D-BPL without diversification and vanilla Espresso).
    – Distributional type scoring is effective: when enabled, the precision drop became less steep in early iterations, suggesting that semantic drift is mitigated.


Questions?
