SLIDE 1

Extraction of Entailed Semantic Relations Through Syntax-based Comma Resolution

Vivek Srikumar, Roi Reichart, Mark Sammons, Ari Rappoport, Dan Roth

University of Illinois, Urbana-Champaign
Hebrew University of Jerusalem

SLIDE 2

The City of Chicago’s OEMC and IBM launch Advanced Video Surveillance System, part of Operation Virtual Shield.

  • The City of Chicago possesses OEMC.
  • The City of Chicago’s OEMC, IBM form a conjunction.
  • Advanced Video Surveillance System is part of Operation Virtual Shield.

SLIDE 3

Motivation

  • Sentences can be decomposed into smaller ones
  • Smaller sentences are easier to process
  • Syntax gives us cues for decomposition

Along the lines of (Chandrasekar and Srinivas, ’96)

SLIDE 4

Outline

1. Task: Comma Resolution
2. Learning to Transform Sentences
3. Evaluation

SLIDE 5

Outline

1. Task: Comma Resolution
2. Learning to Transform Sentences
     • What are we learning from?
     • The Learning Procedure
3. Evaluation
     • The Comma Data Set
     • Experiments

SLIDE 6

Commas tell us something

  • Authorities have arrested John Smith, a police officer.

⇒ John Smith is a police officer.

SLIDE 7

Commas tell us something

  • Authorities have arrested John Smith, a police officer.

⇒ John Smith is a police officer.

  • Authorities have arrested John Smith, a police officer and his brother today.

⇒ John Smith, a police officer, his brother are elements of a list.

SLIDE 8

Commas tell us something

  • Authorities have arrested John Smith, a police officer.

⇒ John Smith is a police officer.

  • Authorities have arrested John Smith, a police officer and his brother today.

⇒ John Smith, a police officer, his brother are elements of a list.

  • They live in Chicago, IL.

⇒ Chicago is located in IL.

SLIDE 9

Commas tell us something

Commas indicate several syntactic phenomena

  • Appositives
  • Lists
  • Clausal modifiers
  • Locations
  • Many others...

Each interpretation implies different relationships.

(van Delden and Gomez, 2002) (Bayraktar et al., 1998)

SLIDE 10

Commas come in different flavors

  • SUBSTITUTE
  • ATTRIBUTE
  • LOCATION
  • LIST
  • OTHER

SLIDE 11

Commas come in different flavors

  • SUBSTITUTE: An IS-A relation between the arguments

    John Smith, a police officer, was arrested.
    ⇒ John Smith is a police officer. John Smith was arrested. A police officer was arrested.

  • ATTRIBUTE
  • LOCATION
  • LIST
  • OTHER

SLIDE 12

Commas come in different flavors

  • SUBSTITUTE
  • ATTRIBUTE: One argument is an attribute of the other

    John Smith, 61, was arrested.
    ⇒ John Smith is 61. John Smith was arrested.

  • LOCATION
  • LIST
  • OTHER

SLIDE 13

Commas come in different flavors

  • SUBSTITUTE
  • ATTRIBUTE
  • LOCATION: A located-in relation

    Chicago, Illinois saw some snow today.
    ⇒ Chicago is located in Illinois.

  • LIST
  • OTHER

SLIDE 14

Commas come in different flavors

  • SUBSTITUTE
  • ATTRIBUTE
  • LOCATION
  • LIST: A list of entities, adjectives, actions, etc.

    John, James and Kelly left last week.
    ⇒ { John, James, Kelly } form a group.

  • OTHER

SLIDE 15

Commas come in different flavors

  • SUBSTITUTE
  • ATTRIBUTE
  • LOCATION
  • LIST
  • OTHER: Everything else

    However, he cheered up quickly. “So what if I can’t spell pesticide,” he said.
    ⇒ Discourse information, pauses, etc.

SLIDE 17

Comma Resolution

Given a sentence, comma resolution consists of:

  • Interpreting the type of each comma
  • Decomposing the sentence based on the interpretation
  • Preserving the meaning of the original sentence

SLIDE 18

Why Comma Resolution?

  • Shorter sentences can be analyzed better
  • Decomposition helps other tasks involving text understanding

SLIDE 19

Why Comma Resolution?

  • Shorter sentences can be analyzed better
  • Decomposition helps other tasks involving text understanding

For example, think about textual entailment: given a sentence T, is a hypothesis H true?

SLIDE 20

Outline

1. Task: Comma Resolution
2. Learning to Transform Sentences
     • What are we learning from?
     • The Learning Procedure
3. Evaluation
     • The Comma Data Set
     • Experiments

SLIDE 22

Representation of Sentences

Example: Both are produced by the same company, Macmillan-McGraw-Hill, a joint venture of McGraw-Hill Inc. and Macmillan’s parent, Maxwell Communication Corp.

SLIDE 23

Representation of Sentences

Example: Both are produced by the same company, Macmillan-McGraw-Hill, a joint venture of McGraw-Hill Inc. and Macmillan’s parent, Maxwell Communication Corp.

  • Macmillan-McGraw-Hill is a joint venture of ...

SLIDE 24

Representation of Sentences

Example: Both are produced by the same company, Macmillan-McGraw-Hill, a joint venture of McGraw-Hill Inc. and Macmillan’s parent, Maxwell Communication Corp.

  • Macmillan-McGraw-Hill is a joint venture of ...
  • Macmillan’s parent is Maxwell Communication Corp.

SLIDE 25

Representation of Sentences

Example: Both are produced by the same company, Macmillan-McGraw-Hill, a joint venture of McGraw-Hill Inc. and Macmillan’s parent, Maxwell Communication Corp.

  • Macmillan-McGraw-Hill is a joint venture of ...
  • Macmillan’s parent is Maxwell Communication Corp.

  • Relations might be nested
  • We need hierarchical information
  • Parse trees encode this
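
To make this concrete, a minimal sketch (my own illustration using NLTK's Tree class, not code from the paper) of holding a sentence as a parse tree, so the nesting stays recoverable:

    # A parse tree keeps the hierarchical information that nested
    # comma relations need (illustrative sketch, not the paper's code).
    from nltk import Tree

    # Simplified parse of "But Fujitsu, Japan 's No. 1 computer maker, is n't alone."
    tree = Tree.fromstring(
        "(S (CC But)"
        " (NP-SBJ (NP (NNP Fujitsu)) (, ,)"
        " (NP Japan 's No. 1 computer maker) (, ,))"
        " (VP is n't alone))"
    )

    # The appositive is explicit in the hierarchy: the two NPs flanking
    # the commas are siblings under NP-SBJ.
    for np_sbj in tree.subtrees(lambda t: t.label() == "NP-SBJ"):
        print(np_sbj)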

SLIDE 26

Sentence Transformation Rules

We want to do two things:

  • Look for a pattern in the parse tree of a sentence
  • If we find the pattern, generate new sentences using the matched parts

A Sentence Transformation Rule (STR) does both. More on STRs later...

SLIDE 27

Outline

1. Task: Comma Resolution
2. Learning to Transform Sentences
     • What are we learning from?
     • The Learning Procedure
3. Evaluation
     • The Comma Data Set
     • Experiments

SLIDE 28

An Algorithm Outline

For every example –

  • Learn a Sentence Transformation Rule from the example
  • Refine it with statistics taken over the entire dataset
  • Remove all covered examples

SLIDE 29

An Algorithm Outline

For every example –

  • Learn the most general STR from the example
  • Refine it with statistics taken over the entire dataset
  • Remove all covered examples

SLIDE 30

An Algorithm Outline

For every example –

  • Learn the most general STR from the example
  • Specialize it with statistics taken over the entire dataset
  • Remove all covered examples

SLIDE 31

An Algorithm Outline

For every example –

  • Learn the most general STR from the example
  • Specialize it with statistics taken over the entire dataset
  • Remove all covered examples

This is A Sentence Transformation Rule Learner (ASTRL)

SLIDE 32

Learning from a Single Example

But Fujitsu, Japan ’s No. 1 computer maker, is n’t alone.

  • But Fujitsu is n’t alone.
  • But Japan ’s No. 1 computer maker is n’t alone.
  • Fujitsu is Japan ’s No. 1 computer maker.

SLIDE 34

Learning from a Single Example

But Fujitsu, Japan ’s No. 1 computer maker, is n’t alone.

(S (CC But) (NP-SBJ (NP (NNP Fujitsu)) (, ,) (NP Japan ’s No. 1 computer maker) (, ,)) (VP is n’t alone))

SLIDE 35

Learning from a Single Example

But Fujitsu, Japan ’s No. 1 computer maker, is n’t alone.

(S CC (NP-SBJ (NP (NNP Fujitsu)) (, ,) (NP Japan ’s No. 1 computer maker) (, ,)) (VP is n’t alone))

SLIDE 36

Learning from a Single Example

But Fujitsu, Japan ’s No. 1 computer maker, is n’t alone.

(S CC (NP-SBJ NP (, ,) (NP Japan ’s No. 1 computer maker) (, ,)) (VP is n’t alone))

SLIDE 37

Learning from a Single Example

But Fujitsu, Japan ’s No. 1 computer maker, is n’t alone.

(S CC (NP-SBJ NP , NP ,) VP)

SLIDE 38

Learning from a Single Example

But Fujitsu, Japan ’s No. 1 computer maker, is n’t alone.

(S CC (NP-SBJ NP , NP ,) VP)

  • But Fujitsu is n’t alone.
  • But Japan ’s No. 1 computer maker is n’t alone.
  • Fujitsu is Japan ’s No. 1 computer maker.

SLIDE 39

Learning from a Single Example

But Fujitsu, Japan ’s No. 1 computer maker, is n’t alone.

(S CC (NP-SBJ NP , NP ,) VP)

  • CC NP VP.
  • But Japan ’s No. 1 computer maker is n’t alone.
  • Fujitsu is Japan ’s No. 1 computer maker.

SLIDE 40

Learning from a Single Example

But Fujitsu, Japan ’s No. 1 computer maker, is n’t alone.

(S CC (NP-SBJ NP , NP ,) VP)

  • CC NP VP.
  • CC NP VP.
  • NP is NP.

SLIDE 42

Learning from a Single Example

But Fujitsu, Japan ’s No. 1 computer maker, is n’t alone.

(S CC (NP-SBJ NP , NP ,) VP)

  • CC NP VP.
  • CC NP VP.
  • NP is NP.

We have abstracted away some details of the parse tree.

Can we get a smaller pattern?

SLIDE 43

Learning more from a Single Example

But Fujitsu, Japan ’s No. 1 computer maker, is n’t alone.

(S CC (NP-SBJ NP , NP ,) VP)

SLIDE 44

Learning more from a Single Example

But Fujitsu, Japan ’s No. 1 computer maker, is n’t alone.

(S CC (NP-SBJ NP , NP ,) VP)

Leaves of this pattern tree: CC NP , NP , VP

  • CC NP VP
  • CC NP VP
  • NP is NP

SLIDE 45

Learning more from a Single Example

But Fujitsu, Japan ’s No. 1 computer maker, is n’t alone.

(S CC (NP-SBJ NP , NP ,) VP)

Leaves of this pattern tree: CC NP , NP , VP
CC NP VP

SLIDE 46

Learning more from a Single Example

But Fujitsu, Japan ’s No. 1 computer maker, is n’t alone.

(S CC (NP-SBJ NP , NP ,) VP)

Leaves of this pattern tree: CC NP , NP , VP
CC NP VP
(NP has substituted NP , NP ,)

SLIDE 47

Learning more from a Single Example

But Fujitsu, Japan ’s No. 1 computer maker, is n’t alone.

(S CC (NP-SBJ NP , NP ,) VP)

Leaves of this pattern tree: CC NP , NP , VP
CC NP VP
(NP has substituted NP-SBJ)

SLIDE 48

Learning more from a Single Example

But Fujitsu, Japan ’s No. 1 computer maker, is n’t alone.

(S CC (NP-SBJ NP , NP ,) VP)

Leaves of this pattern tree: CC NP , NP , VP
CC NP VP ⇒ But Fujitsu is n’t alone
(NP has substituted NP-SBJ; Fujitsu has substituted Fujitsu, Japan ’s No. 1 computer maker,)

SLIDE 49

Learning from a Single Example

But Fujitsu, Japan ’s No. 1 computer maker, is n’t alone.

(S CC (NP-SBJ NP , NP ,) VP)

  • CC NP VP.
  • CC NP VP.
  • NP is NP.

SLIDE 50

Learning from a Single Example

But Fujitsu, Japan ’s No. 1 computer maker, is n’t alone.

(S CC (NP-SBJ NP , NP ,) VP)

  • CC NP VP.
  • CC NP VP.
  • NP is NP.

(NP-SBJ NP , NP ,)

  • NP (Substitute)
  • NP (Substitute)
  • NP is NP. (Introduce)

Substitute: NP substitutes the root of the pattern (NP-SBJ).
Introduce: The relation is independent of the context in which the pattern is found.

SLIDE 51

Definition: Sentence Transformation Rule

L → R

SLIDE 52

Definition: Sentence Transformation Rule

L → R

  • L: A tree fragment (think part of a parse tree)

(NP-SBJ NP , NP ,)

SLIDE 53

Definition: Sentence Transformation Rule

L → R

  • L: A tree fragment (think part of a parse tree)

(NP-SBJ NP , NP ,)

  • R: Set of combinations of nodes of L, possibly with new tokens

      • NP
      • NP
      • NP is NP.

SLIDE 54

Definition: Sentence Transformation Rule

L → R

  • L: A tree fragment (think part of a parse tree)

(NP-SBJ NP , NP ,)

  • R: Set of combinations of nodes of L, possibly with new tokens

      • NP (Substitute)
      • NP (Substitute)
      • NP is NP. (Introduce)

  • Each relation is marked as Introduce or Substitute
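
As a rough illustration of this definition, an STR can be held as a pattern tree L plus a list of templates R, each marked Substitute or Introduce. This is a sketch under the obvious reading of the slide; the class and field names are hypothetical, not the authors' code:

    # Sketch of the STR structure L -> R (hypothetical names).
    from dataclasses import dataclass
    from enum import Enum
    from nltk import Tree

    class RelationType(Enum):
        SUBSTITUTE = "substitute"  # output replaces the matched pattern's root
        INTRODUCE = "introduce"    # output stands alone, independent of context

    @dataclass
    class Template:
        # Child indices into L mixed with new tokens,
        # e.g. [0, "is", 2, "."] encodes "NP is NP."
        slots: list
        relation: RelationType

    @dataclass
    class STR:
        left: Tree    # L: a tree fragment, e.g. (NP-SBJ NP , NP ,)
        right: list   # R: templates built from L's nodes

    # The rule learned from the Fujitsu example could then look like:
    appositive_rule = STR(
        left=Tree.fromstring("(NP-SBJ NP , NP ,)"),
        right=[Template([0], RelationType.SUBSTITUTE),
               Template([2], RelationType.SUBSTITUTE),
               Template([0, "is", 2, "."], RelationType.INTRODUCE)],
    )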

SLIDE 55

Sentence Transformation Rules

Applying L → R to a sentence whose parse tree is p:

  1. Look for matches of L in p
  2. For every match, generate new sentences as specified by R
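
In code, the two steps might look like this (an illustrative sketch assuming the STR and Template classes above; the simplified notion of tree matching is mine, not the paper's):

    # Sketch of applying an STR to a parse tree.
    from nltk import Tree

    def fits(pattern, node):
        """Does `node` match `pattern`? A bare label in the pattern (e.g. the
        NP in (NP-SBJ NP , NP ,)) matches any node carrying that label."""
        if isinstance(pattern, str):
            return (node.label() if isinstance(node, Tree) else node) == pattern
        return (isinstance(node, Tree) and node.label() == pattern.label()
                and len(node) == len(pattern)
                and all(fits(p, n) for p, n in zip(pattern, node)))

    def matches(pattern, tree):
        # Step 1: look for matches of L in p.
        return (sub for sub in tree.subtrees() if fits(pattern, sub))

    def apply_str(rule, parse):
        # Step 2: for every match, generate new sentences as specified by R.
        outputs = []
        for match in matches(rule.left, parse):
            for template in rule.right:
                words = []
                for slot in template.slots:
                    if isinstance(slot, int):   # a node of L: copy its yield
                        words.extend(match[slot].leaves())
                    else:                       # a newly introduced token
                        words.append(slot)
                outputs.append(" ".join(words))
        return outputs

On the Fujitsu parse, the Introduce template [0, "is", 2, "."] would emit the tokenized sentence "Fujitsu is Japan ’s No. 1 computer maker ."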

SLIDE 56

Learning from a Single Example

(NP-SBJ NP , NP ,)

  • NP (Substitute)
  • NP (Substitute)
  • NP is NP. (Introduce)

  • This is the smallest pattern tree possible (easy to check)
  • This is the most general STR for the example

SLIDE 57

An Algorithm Outline

For every example –

  • Learn the most general STR from the example
  • Specialize it with statistics taken over the entire dataset
  • Remove all covered examples

This is A Sentence Transformation Rule Learner (ASTRL)

SLIDE 59

Why Refinement?

John Smith, 61, was arrested.

(S (NP-SBJ (NP John Smith) (, ,) (NP (CD 61)) (, ,)) (VP was arrested))

SLIDE 60

Why Refinement?

John Smith, 61, was arrested.

(S (NP-SBJ (NP John Smith) (, ,) (NP (CD 61)) (, ,)) (VP was arrested))

generates

(NP-SBJ NP , NP ,)

SLIDE 61

Why Refinement?

John Smith, 61, was arrested.

(S (NP-SBJ (NP John Smith) (, ,) (NP (CD 61)) (, ,)) (VP was arrested))

generates

(NP-SBJ NP , NP ,)

But this is not an apposition. CD will help disambiguate.

SLIDE 62

Why Refinement?

John Smith, 61, was arrested.

(S (NP-SBJ (NP John Smith) (, ,) (NP (CD 61)) (, ,)) (VP was arrested))

generates

(NP-SBJ NP , NP ,)

But this is not an apposition. CD will help disambiguate.

(NP-SBJ NP , (NP CD) ,)
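
With the toy `fits`/`matches` sketch from earlier, the effect of the specialization can be checked directly (simplified trees, illustrative only):

    # Specialization separates the two cases (uses fits/matches above).
    from nltk import Tree

    fujitsu = Tree.fromstring(
        "(S (CC But) (NP-SBJ (NP Fujitsu) (, ,)"
        " (NP Japan 's No. 1 computer maker) (, ,)) (VP is n't alone))")
    smith = Tree.fromstring(
        "(S (NP-SBJ (NP John Smith) (, ,) (NP (CD 61)) (, ,)) (VP was arrested))")

    general = Tree.fromstring("(NP-SBJ NP , NP ,)")
    specialized = Tree.fromstring("(NP-SBJ NP , (NP CD) ,)")

    assert any(matches(general, fujitsu)) and any(matches(general, smith))
    # After adding CD, only the ATTRIBUTE-style sentence still matches:
    assert not any(matches(specialized, fujitsu))
    assert any(matches(specialized, smith))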

SLIDE 63

Why Refinement?

Problem:

  • The most general STR is generated from a single example
  • It might cover other phenomena too, i.e., it might be too general

Solution: Specialize the STR to maximize performance over the entire dataset.

SLIDE 64

A Sentence Transformation Rule Learner (ASTRL)

For a given comma type t:

  1. p = all examples of type t

SLIDE 65

A Sentence Transformation Rule Learner (ASTRL)

For a given comma type t:

  1. p = all examples of type t
  2. For each example in p:
       1. r = most general STR that covers this example

SLIDE 66

A Sentence Transformation Rule Learner (ASTRL)

For a given comma type t:

  1. p = all examples of type t
  2. For each example in p:
       1. r = most general STR that covers this example
       2. Compute the score of r

Score = fraction of positive examples covered − fraction of negative examples covered
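
In symbols (our reading of the slide; P and N are the sets of positive and negative examples, and cov(r) is the set of examples whose parse trees the pattern of r matches):

    \mathrm{score}(r) = \frac{\lvert \mathrm{cov}(r) \cap P \rvert}{\lvert P \rvert} - \frac{\lvert \mathrm{cov}(r) \cap N \rvert}{\lvert N \rvert}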

SLIDE 67

A Sentence Transformation Rule Learner (ASTRL)

For a given comma type t:

  1. p = all examples of type t
  2. For each example in p:
       1. r = most general STR that covers this example
       2. Compute the score of r
       3. Get all neighbors of the pattern from the parse tree

SLIDE 68

A Sentence Transformation Rule Learner (ASTRL)

For a given comma type t:

  1. p = all examples of type t
  2. For each example in p:
       1. r = most general STR that covers this example
       2. Compute the score of r
       3. Get all neighbors of the pattern from the parse tree
       4. For every neighbor:
            1. Add it to the pattern and recompute the score
            2. If the score is better, keep it and recalculate the neighbors

SLIDE 69

A Sentence Transformation Rule Learner (ASTRL)

For a given comma type t:

  1. p = all examples of type t
  2. For each example in p:
       1. r = most general STR that covers this example
       2. Compute the score of r
       3. Get all neighbors of the pattern from the parse tree
       4. For every neighbor:
            1. Add it to the pattern and recompute the score
            2. If the score is better, keep it and recalculate the neighbors
       5. Add r to the list of STRs
       6. Remove all covered examples from p
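
Assembled into code, the loop might look like the following sketch. The helper callables (most_general_str, neighbors, extend, covers) are hypothetical stand-ins for the tree operations described above; this is a reading of the outline, not the authors' implementation:

    # Sketch of the ASTRL greedy specialize-and-cover loop
    # (hypothetical helpers passed in as callables; not the authors' code).
    from typing import Callable, List

    def astrl(positives: List, negatives: List,
              most_general_str: Callable, neighbors: Callable,
              extend: Callable, covers: Callable) -> List:
        def score(rule) -> float:
            # fraction of positives covered minus fraction of negatives covered
            pos = sum(covers(rule, e) for e in positives) / len(positives)
            neg = sum(covers(rule, e) for e in negatives) / max(len(negatives), 1)
            return pos - neg

        pending, rules = list(positives), []
        while pending:
            rule = most_general_str(pending[0])      # step 2.1
            best = score(rule)                       # step 2.2
            improved = True
            while improved:                          # steps 2.3 and 2.4
                improved = False
                for nb in neighbors(rule):           # step 2.3
                    candidate = extend(rule, nb)     # step 2.4.1
                    if score(candidate) > best:      # step 2.4.2
                        rule, best, improved = candidate, score(candidate), True
                        break                        # recalculate the neighbors
            rules.append(rule)                       # step 2.5
            pending = [e for e in pending if not covers(rule, e)]  # step 2.6
        return rules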

SLIDE 70

Outline

1. Task: Comma Resolution
2. Learning to Transform Sentences
     • What are we learning from?
     • The Learning Procedure
3. Evaluation
     • The Comma Data Set
     • Experiments

SLIDE 72

Data Set

  • 1000 sentences from the WSJ corpus, all with commas
  • Manually annotated with comma type and transformed sentences
  • Four annotators, high agreement

Data available for download at http://L2R.cs.uiuc.edu/~cogcomp/data.php

SLIDE 73

Data: Example Annotation

But Fujitsu [1], Japan’s No. 1 computer maker [1], isn’t alone.

[1] SUBSTITUTE
  • But Fujitsu is n’t alone.
  • But Japan ’s No. 1 computer maker is n’t alone.
  • Fujitsu is Japan ’s No. 1 computer maker.

SLIDE 74

Data: Comma types

  • SUBSTITUTE: An IS-A relation between the arguments
    John Smith, a police officer, was arrested.

  • ATTRIBUTE: One argument is an attribute of the other
    John Smith, 61, was arrested.

  • LOCATION: A located-in relation
    Chicago, Illinois saw some snow today.

  • LIST: A list of entities, adjectives, actions, etc.
    John, James and Kelly left last week.

  • OTHER: Everything else
    However, he cheered up soon.
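
For reference, the label set is small enough to capture in a few lines (a trivial sketch of my own, mirroring the annotation scheme):

    # The five annotation labels as an enum (illustrative sketch).
    from enum import Enum

    class CommaType(Enum):
        SUBSTITUTE = "substitute"  # IS-A relation between the arguments
        ATTRIBUTE = "attribute"    # one argument is an attribute of the other
        LOCATION = "location"      # a located-in relation
        LIST = "list"              # elements of a list
        OTHER = "other"            # discourse, pauses, everything else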

SLIDE 75

Outline

1. Task: Comma Resolution
2. Learning to Transform Sentences
     • What are we learning from?
     • The Learning Procedure
3. Evaluation
     • The Comma Data Set
     • Experiments

SLIDE 76

Experimental Setup

  1. GOLD-GOLD: Train and test using gold-standard trees
  2. GOLD-CHARNIAK: Train with gold-standard trees and test with trees generated by a statistical parser (Charniak and Johnson, 2005)
  3. CHARNIAK-CHARNIAK: Train and test using generated parses

SLIDE 77

Results: Extracted Sentences

  • Most relevant for entailment applications
  • Different comma types used for learning STRs
  • Only relations scored during evaluation, irrespective of type

Setting             P     R     F
Gold-Gold           86.1  75.4  80.2
Gold-Charniak       77.3  60.1  68.1
Charniak-Charniak   77.2  64.8  70.4

SLIDE 78

Results

  • Performance of ASTRL is close to human agreement
  • Even the most general STR is often quite good
  • Specialization disambiguates SUBSTITUTE and ATTRIBUTE
    (ATTRIBUTE F-score gain of ≈ 14%)
  • F-score for identifying OTHER ≈ 80%

SLIDE 79

Conclusion

  • Defined comma resolution: extracting relations based on commas
  • Developed an annotation scheme and released an annotated dataset
  • Developed an efficient learning algorithm that uses syntax to learn and generate relations
  • The same technique could be used to draw inferences from other phenomena (like possessives)
