SLIDE 1

A Consolidated Open Knowledge Representation for Multiple Texts

Rachel Wities, Vered Shwartz, Gabriel Stanovsky, Meni Adler, Ori Shapira, Shyam Upadhyay, Dan Roth, Eugenio Martinez Camara, Iryna Gurevych and Ido Dagan


SLIDE 2

Outline:

  • Consolidated semantic representation for multiple texts
  • Annotated dataset of news-related tweets
  • Automatic baseline and results

SLIDE 3

Consolidated Representation


SLIDE 4

Single Sentence Semantic Representations

Semantic representations are focused on single sentences.

SLIDE 5

Single Sentence Semantic Representations

Semantic representations are focused on single sentences. Example - Open IE predicate-argument tuples for "3 people dead in shooting in Wisconsin.":

1. (shooting in, Wisconsin)
2. (three, dead in, shooting)
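To make the tuple format concrete, here is a minimal Python sketch of such extractions as plain data (the class name and fields are illustrative, not tied to any particular Open IE system):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class OpenIETuple:
    """One Open IE predicate-argument extraction from a single sentence."""
    predicate: str
    arguments: Tuple[str, ...]

# The two extractions for "3 people dead in shooting in Wisconsin."
extractions = [
    OpenIETuple(predicate="shooting in", arguments=("Wisconsin",)),
    OpenIETuple(predicate="dead in", arguments=("three", "shooting")),
]

for t in extractions:
    print(f"({t.predicate}; {'; '.join(t.arguments)})")
```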

SLIDE 6

Goal: Consolidated Representation

Applications often need to consolidate information from multiple texts:

SLIDE 7

Goal: Consolidated Representation

Applications often need to consolidate information from multiple texts:

3 people dead in shooting in Wisconsin.
Man kills three in Spa shooting.
Shooter was identified as Radcliffe Haughton, 45.

  • Question answering
    ○ How many people did Radcliffe Haughton shoot?
  • Abstractive summarization
    ○ Radcliffe Haughton, 45, kills three in Spa shooting in Wisconsin.

SLIDE 8

Goal: Consolidated Representation

Applications often need to consolidate information from multiple texts:

3 people dead in shooting in Wisconsin.
Man kills three in Spa shooting.
Shooter was identified as Radcliffe Haughton, 45.

  • Question answering
    ○ How many people did Radcliffe Haughton shoot?
  • Abstractive summarization
    ○ Radcliffe Haughton, 45, kills three in Spa shooting in Wisconsin.

Consolidation is usually done at the application level, and only to a partial extent.

SLIDE 9

Our Proposal: Consolidated Propositions

  • Generic semantic structures that represent multiple texts
  • Can be used for various semantic applications
  • "Out of the box" - another step in the semantic NLP pipeline

[Diagram: multiple texts → black box → generic consolidated representation]

SLIDE 10

Our Solution

1. Predicate-argument structure for single sentences
    ○ Current scope: Open IE
2. Consolidating propositions based on coreference
3. Representing information overlap/containment via lexical entailments

SLIDE 11

Our Solution

1. Extract propositions from single sentences
    ○ Current scope: use Open IE propositions
2. Consolidating propositions based on coreference
3. Representing information overlap/containment via lexical entailments

⇒ Open Knowledge Representation structure (OKR)

SLIDE 12

OKR Pipeline

  • Leverage known NLP tasks!

[Pipeline diagram: entity and proposition mention extraction → entity and event coreference → argument alignment / consolidation → entailment within consolidated elements]

SLIDE 13

Entity & Proposition Extraction

  • Extract entity and proposition mentions at the single-sentence level:

3 people dead in shooting in Wisconsin.
Man kills three in spa shooting.
Shooter was identified as Radcliffe Haughton, 45.

Entity mentions:
1. 3 people
2. Wisconsin
3. man
4. three
5. ...

Proposition mentions:
1. (3 people, dead in, shooting)
2. (shooting in, Wisconsin)
3. (Man, kills, three, shooting)
4. (spa, shooting)
5. ...
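A minimal sketch of the data this step produces, with the example mentions hard-coded (class and field names are illustrative, not the paper's code):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class EntityMention:
    sentence_id: int   # which source sentence/tweet the mention came from
    text: str          # surface span, e.g. "3 people"

@dataclass
class PropositionMention:
    sentence_id: int
    predicate: str                # e.g. "dead in"
    arguments: Tuple[str, ...]    # e.g. ("3 people", "shooting")

entity_mentions = [
    EntityMention(0, "3 people"), EntityMention(0, "Wisconsin"),
    EntityMention(1, "man"), EntityMention(1, "three"),
]
proposition_mentions = [
    PropositionMention(0, "dead in", ("3 people", "shooting")),
    PropositionMention(0, "shooting in", ("Wisconsin",)),
    PropositionMention(1, "kills", ("man", "three", "shooting")),
]
```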

SLIDE 14

Entity Coreference

  • Create coreference chains of entity mentions:

3 people dead in shooting in Wisconsin.
Man kills three in spa shooting.
Shooter was identified as Radcliffe Haughton, 45.

Entities:
E1: {3 people, three}
E2: {man, shooter, Radcliffe Haughton}
E3: ...
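A toy sketch of chain construction; the hand-written alias table merely stands in for a real coreference model:

```python
from collections import defaultdict

# Hand-written alias table standing in for a real coreference component.
ALIASES = {
    "three": "3 people",
    "shooter": "man",
    "radcliffe haughton": "man",
}

def chain_key(mention: str) -> str:
    m = mention.lower()
    return ALIASES.get(m, m)

chains = defaultdict(list)
for mention in ["3 people", "three", "man", "shooter", "Radcliffe Haughton"]:
    chains[chain_key(mention)].append(mention)

for i, members in enumerate(chains.values(), start=1):
    print(f"E{i}:", members)
# E1: ['3 people', 'three']
# E2: ['man', 'shooter', 'Radcliffe Haughton']
```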

SLIDE 15

Event Coreference

  • Create coreference chains of proposition mentions:

3 people dead in shooting in Wisconsin.
Man kills three in spa shooting.
Shooter was identified as Radcliffe Haughton, 45.

Propositions:
P1: {(3 people, dead in, shooting), (Man, kills, three, shooting)}
P2: {(shooting in, Wisconsin), (spa, shooting)}
P3: ...

SLIDE 16

Argument Alignment

  • Align arguments of coreferring propositions based on semantic role:

P1: {(3 people, dead in, shooting), (Man, kills, three, shooting)}
P2: {(shooting in, Wisconsin), (spa, shooting)}

[Diagram: matching a1/a2/a3 slot labels link the aligned arguments across each pair of coreferring proposition mentions]
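A sketch of the alignment idea: arguments of coreferring proposition mentions get the same slot when they belong to the same consolidated element. The helper names are illustrative, and slots are numbered by first appearance, so they need not match the a1/a2/a3 labels above:

```python
# Coreference chains produced by the previous steps (illustrative).
chains = {
    "E2": {"man", "shooter", "radcliffe haughton"},
    "E1": {"3 people", "three"},
    "P2": {"shooting"},
}

def chain_of(argument: str) -> str:
    """Look up which consolidated element an argument span belongs to."""
    for cid, mentions in chains.items():
        if argument.lower() in mentions:
            return cid
    raise KeyError(argument)

slots: dict = {}  # element id -> slot name, assigned on first sight

def slot_of(argument: str) -> str:
    cid = chain_of(argument)
    return slots.setdefault(cid, f"a{len(slots) + 1}")

for prop in [("3 people", "shooting"), ("man", "three", "shooting")]:
    print([(arg, slot_of(arg)) for arg in prop])
# [('3 people', 'a1'), ('shooting', 'a2')]
# [('man', 'a3'), ('three', 'a1'), ('shooting', 'a2')]
```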

SLIDE 17

Consolidation of propositions:

P1: {(3 people, dead in, shooting), (Man, kills, three, shooting)}
⇒ { [a2] dead in [a3], [a1] kills [a2] in [a3] }

SLIDE 18

Consolidation of propositions:

P1: {(3 people, dead in, shooting), (Man, kills, three, shooting)}
⇒ { [a2] dead in [a3], [a1] kills [a2] in [a3] }

a1 → E2 {man, shooter, Radcliffe Haughton}
a2 → E1 {3 people, three}
a3 → P2 {shooting}

SLIDE 19

Consolidation of propositions:

P1: {(3 people, dead in, shooting), (Man, kills, three, shooting)}
E1: {3 people, three}
E2: {man, shooter, Radcliffe Haughton}

⇒ { [a2] dead in [a3], [a1] kills [a2] in [a3] }

a1 → E2 {man, shooter, Radcliffe Haughton}
a2 → E1 {3 people, three}
a3 → P2 {shooting}
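A sketch of how such templates can be derived: each proposition mention is turned into a template by substituting slot names for its aligned arguments (helper names and tokenization are illustrative):

```python
def to_template(tokens, slot_by_argument):
    """Replace each argument span in a proposition with its slot name."""
    return " ".join(
        f"[{slot_by_argument[tok]}]" if tok in slot_by_argument else tok
        for tok in tokens
    )

t1 = to_template(["3 people", "dead in", "shooting"],
                 {"3 people": "a2", "shooting": "a3"})
t2 = to_template(["Man", "kills", "three", "in", "shooting"],
                 {"Man": "a1", "three": "a2", "shooting": "a3"})

consolidated_p1 = {
    "templates": {t1, t2},  # {'[a2] dead in [a3]', '[a1] kills [a2] in [a3]'}
    "slots": {
        "a1": ("E2", ["man", "shooter", "Radcliffe Haughton"]),
        "a2": ("E1", ["3 people", "three"]),
        "a3": ("P2", ["shooting"]),
    },
}
print(sorted(consolidated_p1["templates"]))
```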

SLIDE 20

Consolidation Properties:

  • All proposition information is concentrated in one structure
  • No redundancy
  • Tracking all original mentions
  • Allows generation of new sentences (see the sketch below)
    ○ "Radcliffe Haughton kills 3 people in shooting"

{ [a2] dead in [a3], [a1] kills [a2] in [a3] }
a1 → E2 {man, shooter, Radcliffe Haughton}
a2 → E1 {3 people, three}
a3 → P2 {shooting}
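The sentence-generation property can be sketched by filling one template with a hand-picked mention per slot:

```python
template = "[a1] kills [a2] in [a3]"
chosen_mentions = {"a1": "Radcliffe Haughton", "a2": "3 people", "a3": "shooting"}

sentence = template
for slot, mention in chosen_mentions.items():
    sentence = sentence.replace(f"[{slot}]", mention)

print(sentence)  # Radcliffe Haughton kills 3 people in shooting
```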

SLIDE 21

Still missing: modeling information overlap

  • "killed" is more specific than "dead"
  • "man" is more general than "Radcliffe Haughton"
  • Need to model the level of specificity of mentions
  • Our proposal: entailment graphs within structure components

{ [a2] dead in [a3], [a1] kills [a2] in [a3] }
a1 → E2 {man, shooter, Radcliffe Haughton}
a2 → E1 {3 people, three}
a3 → P2 {shooting}

SLIDE 22

Entailment between Elements

{ [a2] dead in [a3], [a1] kills [a2] in [a3] }
a1 → E2 {man, shooter, Radcliffe Haughton}
a2 → E1 {3 people, three}
a3 → P2 {shooting}

Entailment edges run within each consolidated element, from the more specific mention to the more general one, e.g. "Radcliffe Haughton" → "shooter" → "man" and "[a1] kills [a2] in [a3]" → "[a2] dead in [a3]".
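A small sketch of such an entailment graph, with the edges justified by the previous slide and an illustrative transitive-entailment check:

```python
# Specific -> general entailment edges within consolidated elements.
EDGES = {
    "E2": [("Radcliffe Haughton", "shooter"), ("shooter", "man")],
    "P1": [("[a1] kills [a2] in [a3]", "[a2] dead in [a3]")],
}

def entails(element: str, specific: str, general: str) -> bool:
    """True if `specific` (transitively) entails `general` within an element."""
    frontier, seen = [specific], set()
    while frontier:
        node = frontier.pop()
        if node == general:
            return True
        seen.add(node)
        frontier += [dst for src, dst in EDGES[element]
                     if src == node and dst not in seen]
    return False

print(entails("E2", "Radcliffe Haughton", "man"))  # True
print(entails("E2", "man", "shooter"))             # False (wrong direction)
```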

SLIDE 23

Dataset and Baselines


SLIDE 24

News-Related Tweets Dataset

  • OKR annotation of 1257 news-related tweets from 27 event clusters, collected from the Twitter Event Detection Dataset (McMinn et al., 2013)
  • Annotated dataset characteristics:
    ○ High proportion of nominal predicates - 39%
      ■ Examples: accident, demonstration
    ○ High entailment connectivity within coreference chains (see the sketch below)
      ■ 96% of our entailment graphs (entity and proposition) form a connected component
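The connectivity statistic can be made concrete with a small check that treats an entailment graph as undirected and tests whether all mentions fall into a single connected component (a minimal sketch, not the annotation tooling):

```python
def is_connected(nodes, edges):
    """Undirected connectivity check via depth-first search."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adjacency[node] - seen)
    return seen == set(nodes)

print(is_connected(
    {"man", "shooter", "Radcliffe Haughton"},
    [("Radcliffe Haughton", "shooter"), ("shooter", "man")],
))  # True
```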

SLIDE 25

Inter-Annotator Agreement

Task (metric)                                        Agreement
Entity extraction (avg. accuracy)                    .85
Entity coreference (CoNLL F1)                        .90
Proposition extraction, predicates (avg. accuracy)   .74
Proposition extraction, arguments (avg. accuracy)    verbal .93 / non-verbal .72
Predicate coreference (CoNLL F1)                     .85
Argument alignment                                   .83
Entailment, entities (F1)                            .70
Entailment, predicates (F1)                          .82

SLIDE 26

Inter-Annotator Agreement

[Table repeated from SLIDE 25.]

  • Entity or predicate?
      ■ Examples: terror, hurricane

SLIDE 27

Baselines

  • Perform pipeline tasks independently
  • A simple baseline for each task:
    ○ Entity extraction - spaCy NER model and all nouns.
    ○ Proposition extraction - Open IE propositions extracted from PropS (Stanovsky et al., 2016).
    ○ Proposition and entity coreference - clustering based on simple lexical similarity metrics (see the sketch below)
      ■ lemma matching, Levenshtein distance, WordNet synsets.
    ○ Argument alignment - align all mentions of the same entity
    ○ Entity entailment - knowledge resources (Shwartz et al., 2015) and a pre-trained HypeNET model (Shwartz et al., 2016)
    ○ Predicate entailment - rules extracted by Berant et al. (2012)
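A rough stand-in for the edit-distance component of the coreference baseline (the threshold and helper names are illustrative; the actual baseline also combines lemma matching and WordNet synsets):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (ca != cb))) # substitution
        previous = current
    return previous[-1]

def likely_coreferent(m1: str, m2: str, ratio: float = 0.3) -> bool:
    """Cluster two mentions when their edit distance is small relative to length."""
    m1, m2 = m1.lower(), m2.lower()
    return levenshtein(m1, m2) <= ratio * max(len(m1), len(m2))

print(likely_coreferent("shooting", "shootings"))  # True
print(likely_coreferent("shooting", "Wisconsin"))  # False
```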

SLIDE 28

Baselines - results

Task (metric)                                        Agreement                     Predicted
Entity extraction (avg. accuracy)                    .85                           .58
Entity coreference (CoNLL F1)                        .90                           .85
Proposition extraction, predicates (avg. accuracy)   .74                           .41
Proposition extraction, arguments (avg. accuracy)    verbal .93 / non-verbal .72   verbal .73 / non-verbal .25
Predicate coreference (CoNLL F1)                     .85                           .37
Argument alignment                                   .83                           .56
Entailment, entities (F1)                            .70                           .44
Entailment, predicates (F1)                          .82                           .56

SLIDE 29

Baselines - results

  • Main challenges:
    ○ Recognize arguments for nominal predicates - current systems are verb-centric (well known)

[Table repeated from SLIDE 28.]

SLIDE 30

Baselines - results

  • Main challenges:
    ○ Recognize arguments for nominal predicates - current systems are verb-centric (well known)
    ○ Distinguish entity nouns from predicate nouns (organization vs. elections)

[Table repeated from SLIDE 28.]

SLIDE 31

Baselines - results

  • Main challenges:
    ○ Recognize arguments for nominal predicates - current systems are verb-centric (well known)
    ○ Distinguish entity nouns from predicate nouns (organization vs. elections)
    ○ Entity entailment is hard for multi-word expressions

[Table repeated from SLIDE 28.]

SLIDE 32

Baselines - results

  • Main challenges:
    ○ Recognize arguments for nominal predicates - current systems are verb-centric (well known)
    ○ Distinguish entity nouns from predicate nouns (organization vs. elections)
    ○ Entity entailment is hard for multi-word expressions
    ○ Predicate coreference is harder

[Table repeated from SLIDE 28.]

SLIDE 33

Future work:

  • Using OKR for summarization and for interactive text exploration
  • OKR Version 2
    ○ Avoid distinguishing entities from predicates
    ○ Knowledge-graph perspective
  • Consolidation of other types of predicate-argument structures:
    ○ SRL
    ○ AMR

SLIDE 34

Summary

  • We present a generic semantic representation for multiple texts
  • Consolidating propositions using coreference and entailment
  • 1257 annotated tweets
  • Our dataset is available at: http://u.cs.biu.ac.il/~nlp/resources/downloads/twitter-events/


SLIDE 36

Outline:

  • Intro: motivation & positioning
  • Our solution:
    ○ Focus in this work: Open IE predicate-argument structure for single sentences
    ○ Consolidation of propositions using coreference
    ○ Representing information overlap/containment via lexical entailments
  • Pipeline:
    ○ Open IE extraction (show for a sentence, with the same visual output - for single extractions)
    ○ Entity and event coref (same visual)
    ○ Consolidation - final visual (as in the intro teaser)
  • Notes bullet slides - phenomena addressed - see paper (2-3 points):
    ○ Nested propositions, implicit predicates, predicate representation as templates
  • Dataset and baseline slides - as in the Saarland presentation
  • Conclusions
    ○ KG perspective
    ○ We focused on creating multi-text representations from Open IE single-sentence propositions; future work may explore analogous representations based on other single-sentence representations (e.g., AMR)

SLIDE 37

Other phenomena addressed (see paper for more details)

  • Implicit and relational predicates
    ○ Example: Radcliffe Haughton, 45 ⇒ IMPLICIT(Radcliffe Haughton; 45)
  • Support
    ○ The number of mentions of each proposition is indicative of factuality and salience.
  • Predicate representation as templates
    ○ DIRT-like propositions

SLIDE 38

Proposition Consolidation

Proposition mentions:
1. Dead in (At least 2; shooting)
2. Shooting in (Wisconsin)
3. Kills in (Man; three; shooting)
4. Shooting (Spa)
5. ...

Propositions:
P1: { shooting in [a1], [a1] shooting }
a1 → E1 {Wisconsin}, E3 {spa}

SLIDE 39

OKR Pipeline

Entity mentions:
1. At least 2
2. Wisconsin
3. Man
4. Three
5. Spa
6. Radcliffe Haughton
7. ...

Entities:
E1: {Man, Radcliffe Haughton}
E2: {At least 2}
...