ReNoun Fact Extraction for Nominal Attributes Mohamed Yahya, Steven - - PowerPoint PPT Presentation


slide-1
SLIDE 1

ReNoun

Fact Extraction for Nominal Attributes

Mohamed Yahya, Steven Whang, Rahul Gupta, and Alon Halevy

EMNLP // 2014.10.26

slide-2
SLIDE 2

Nouns, Queries & Relations

slide-3
SLIDE 3
slide-4
SLIDE 4

Nouns as Attributes

KB         % Noun Attributes   % Verb Attributes
DBpedia    96                  4
Freebase   97                  3

slide-5
SLIDE 5

slide-7
SLIDE 7

[Figure: three entities linked by relation/attribute edges]

slide-8
SLIDE 8


slide-9
SLIDE 9
slide-10
SLIDE 10

This talk is about extracting facts centered around noun phrases (=attributes)

Examples: president, coach, province, banking arm, state assemblyman, caucus chairman, foreign policy chief

slide-11
SLIDE 11

(Open) Information Extraction [Etzioni et al., CACM’08]

Text: …Gazprom’s banking arm, Gazprombank, is owned by the company’s pension fund…

Fact: (Gazprom, banking arm, Gazprombank)

slide-12
SLIDE 12

(Open) Information Extraction [Etzioni et al., CACM’08]

Text: …Gazprom’s banking arm, Gazprombank, is owned by the company’s pension fund…

Fact/triple: Subject = Gazprom, relation = banking arm, Object = Gazprombank

slide-13
SLIDE 13

Before the details: What’s missing from the state of the art?

slide-14
SLIDE 14

Timeline: 2010 WOE, 2011 ReVerb, 2012 Ollie, 2013 ClausIE, 2014 ReNoun

slide-15
SLIDE 15

Timeline: 2010 WOE, 2011 ReVerb, 2012 Ollie, 2013 ClausIE, 2014 ReNoun

Notable exception, check it out! [WWW’13]

slide-17
SLIDE 17

Let’s see Ollie…

slide-18
SLIDE 18

Sentence: Obama is the president of the US.

Extraction: arg1 = Obama, relation = is president of, arg2 = US

Relation pattern: V | V P | V W* P
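
As a rough illustration of how a part-of-speech pattern such as V | V P | V W* P can pick out a relation phrase, the sketch below maps each tag to one symbol and runs a regular expression over the symbol string. The tag-to-symbol mapping and the hand-tagged sentence are assumptions made for this example, not the rules of any particular system.

```python
import re

def symbol(tag):
    """Map a Penn Treebank tag to V, W, or P (assumed, simplified classes)."""
    if tag.startswith("VB"):
        return "V"
    if tag in {"IN", "TO", "RP"}:
        return "P"
    if tag.startswith(("NN", "JJ", "RB", "PRP", "DT")):
        return "W"
    return "x"  # any other tag interrupts a relation phrase

# Longest alternative first, so "V W* P" is preferred over a bare "V".
RELATION = re.compile(r"VW*P|VP|V")

def relation_phrases(tagged):
    """Return the word spans whose tag symbols match the pattern."""
    # One symbol per token, so string offsets are token offsets.
    symbols = "".join(symbol(tag) for _, tag in tagged)
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in RELATION.finditer(symbols)
    ]

# Hand-tagged example sentence (the tags are an assumption).
tagged = [
    ("Obama", "NNP"), ("is", "VBZ"), ("the", "DT"), ("president", "NN"),
    ("of", "IN"), ("the", "DT"), ("US", "NNP"), (".", "."),
]
print(relation_phrases(tagged))
```

On this sentence the only match is the span "is the president of"; the slide's relation "is president of" additionally drops the determiner, which this sketch does not attempt.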

slide-19
SLIDE 19

Extraction: arg1 = Obama, relation = is president of, arg2 = US

Dependency pattern: {US/N}, {president/N}, {Obama/N} joined by two nn edges

slide-20
SLIDE 20

ReNoun [this talk] vs. Ollie* [EMNLP’12]

Attributes listed under both systems: president, coach, province, banking arm, state assemblyman, caucus chairman, foreign policy chief

* Ollie handles verbs, ReNoun doesn’t!

slide-22
SLIDE 22

ReNoun [this talk] vs. Ollie* [EMNLP’12]

Attributes listed under both systems: president, coach, province, banking arm, state assemblyman, caucus chairman, foreign policy chief

Attribute groups: Fat head (218), Long tail (6K)

* Ollie handles verbs, ReNoun doesn’t!

slide-25
SLIDE 25

… now, ReNoun is upon us!

slide-26
SLIDE 26

ReNoun [this talk]: Entity, noun phrase, Entity

Relations are expressed using noun phrases.

slide-27
SLIDE 27

ReNoun

Stages: Seed fact extraction → Dependency pattern learning → Fact extraction → Fact scoring
Inputs: Annotated Corpus, Attributes (Biperpedia)

slide-28
SLIDE 28

Seed fact extraction

8 simple lexical patterns to capture facts whose subject (S), attribute (A), and object (O) occur in close proximity

Example: Google CEO Larry Page started his term in 2011.

slide-29
SLIDE 29

Dependency pattern learning

Align seed facts with text to find variations in how facts for an attribute are expressed

Example: A CEO, like Larry Page of Google, is usually a busy person.

slide-30
SLIDE 30

Fact extraction

Deploy the dependency patterns to collect more facts

Example: A CEO, like Manouchehr Moshayedi of STEC Inc., is not allowed to trade his company’s stocks based on information only he has.

slide-31
SLIDE 31

Fact scoring

Assign numerical scores to facts: score ∝ likelihood that the extracted fact is correct

slide-33
SLIDE 33

Annotated Corpus

  1. Dependency parses,
  2. Noun phrase chunks,
  3. NER,
  4. Coreference resolution,
  5. Entity resolution to Freebase

Corpus size: 400×10^6 news documents

[Google]1 [CEO]2 [Larry Page]2 started his term in 2011, when [he]2 succeeded [Eric Schmidt]3.

Cluster   Phrase                Freebase ID
1         Google                /m/045c7b
2         Larry Page, CEO, he   /m/0gjpq
3         Eric Schmidt          /m/0gjpq

Dependency parse of "A CEO, like Larry Page of Google" (head -label-> dependent):
  CEO/NN -det-> A/DET
  CEO/NN -prep-> like/IN
  like/IN -pobj-> Page/NNP
  Page/NNP -nn-> Larry/NNP
  Page/NNP -prep-> of/IN
  of/IN -pobj-> Google/NNP

slide-35
SLIDE 35

Attributes: Biperpedia [Gupta et al., VLDB’14]

Query: Gazprom’s banking arm → Biperpedia → Attribute: banking arm

Biperpedia yields attributes, not triples.

For more details, see the VLDB’14 paper, also [Lee et al., ICDE’13] and [Pasca & Van Durme, IJCAI’07].

slide-37
SLIDE 37

Seed fact extraction

#   Pattern           Example
1   the A of S, O     the CEO of Google, Larry Page
2   the A of S is O   the CEO of Google is Larry Page
3   O, S A            Larry Page, Google CEO
4   O, S’s A          Larry Page, Google’s CEO
5   O, [the] A of S   Larry Page, [the] CEO of Google
6   S A O             Google CEO Larry Page
7   S A, O            Google CEO, Larry Page
8   S’s A, O          Google’s CEO, Larry Page

Condition #1: A is in Biperpedia and one of the 8 patterns applies

[Google]1 [CEO]2 [Larry Page]2 started his term in 2011, when [he]2 succeeded [Eric Schmidt]3.
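
The eight lexical patterns can be approximated with regular expressions, as in the minimal sketch below. It works on raw strings rather than on the annotated corpus, so the `ENTITY` expression (runs of capitalized tokens) is a crude stand-in for NER, and `ATTRIBUTES` is a hypothetical two-item stand-in for Biperpedia. It covers only the pattern condition, not the coreference condition.

```python
import re

ATTRIBUTES = ["CEO", "head coach"]  # hypothetical stand-in for Biperpedia
ENTITY = r"[A-Z][\w.]*(?: [A-Z][\w.]*)*"  # crude stand-in for NER mentions

def seed_patterns(attribute):
    """Compile the 8 lexical patterns for one attribute string."""
    a = re.escape(attribute)
    s = rf"(?P<S>{ENTITY})"
    o = rf"(?P<O>{ENTITY})"
    templates = (
        rf"the {a} of {s}, {o}",       # 1: the A of S, O
        rf"the {a} of {s} is {o}",     # 2: the A of S is O
        rf"{o}, {s} {a}",              # 3: O, S A
        rf"{o}, {s}['’]s {a}",         # 4: O, S's A
        rf"{o}, (?:the )?{a} of {s}",  # 5: O, [the] A of S
        rf"{s} {a} {o}",               # 6: S A O
        rf"{s} {a}, {o}",              # 7: S A, O
        rf"{s}['’]s {a}, {o}",         # 8: S's A, O
    )
    return [re.compile(t) for t in templates]

def seed_facts(sentence):
    """Return the set of (S, A, O) triples found by any pattern."""
    facts = set()
    for attribute in ATTRIBUTES:
        for pattern in seed_patterns(attribute):
            m = pattern.search(sentence)
            if m:
                facts.add((m.group("S"), attribute, m.group("O")))
    return facts

print(seed_facts("Google CEO Larry Page started his term in 2011."))
```

Restricting A to a known attribute list is what keeps such loose patterns usable: the regex engine has to split "Google CEO Larry Page" around the attribute string instead of treating the whole run as one entity.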

slide-39
SLIDE 39

Seed fact extraction

Condition #2: A and O corefer

[Google]1 [CEO]2 [Larry Page]2 started his term in 2011, when [he]2 succeeded [Eric Schmidt]3.

Cluster   Phrase                Freebase ID
1         Google                /m/045c7b
2         Larry Page, CEO, he   /m/0gjpq
3         Eric Schmidt          /m/0gjpq
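
The coreference condition amounts to a cluster lookup. A minimal sketch, with the clusters of the example sentence typed in by hand (a real pipeline would read them from the annotated corpus, and mentions would be token spans rather than strings):

```python
# Coreference clusters of the example sentence, keyed by cluster number.
CLUSTERS = {
    1: ["Google"],
    2: ["CEO", "Larry Page", "he"],
    3: ["Eric Schmidt"],
}

def cluster_of(mention):
    """Return the cluster number containing the mention, or None."""
    for number, mentions in CLUSTERS.items():
        if mention in mentions:
            return number
    return None

def corefer(attribute, obj):
    """True if the attribute mention and the object mention share a cluster."""
    cluster = cluster_of(attribute)
    return cluster is not None and cluster == cluster_of(obj)

print(corefer("CEO", "Larry Page"))    # CEO and Larry Page are both in cluster 2
print(corefer("CEO", "Eric Schmidt"))  # Eric Schmidt is in cluster 3
```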

slide-40
SLIDE 40

Seed fact extraction: result

[Google]1 [CEO]2 [Larry Page]2 started his term in 2011, when [he]2 succeeded [Eric Schmidt]3.

Seed fact: S = Google, A = CEO, O = Larry Page

slide-41
SLIDE 41

Seed fact extraction

139M seed facts, 680K unique

Accuracy*: Fat head 80/100, Long tail 65/100

*random sample of 100 seed facts

slide-43
SLIDE 43

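Dependency pattern learning (inputs: seed facts from Seed fact extraction, Annotated Corpus)

Seed fact: S = Google, A = CEO, O = Larry Page

A [CEO]1, like [Larry Page]2 of [Google]3, is usually a busy person.

Dependency parse (head -label-> dependent):
  CEO/NN -det-> A/DET
  CEO/NN -prep-> like/IN
  like/IN -pobj-> Page/NNP
  Page/NNP -nn-> Larry/NNP
  Page/NNP -prep-> of/IN
  of/IN -pobj-> Google/NNP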

slide-44
SLIDE 44

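Dependency pattern learning

Seed fact: S = Google, A = CEO, O = Larry Page

Full dependency parse (head -label-> dependent):
  CEO/NN -det-> A/DET
  CEO/NN -prep-> like/IN
  like/IN -pobj-> Page/NNP
  Page/NNP -nn-> Larry/NNP
  Page/NNP -prep-> of/IN
  of/IN -pobj-> Google/NNP

Parse restricted to the nodes linking CEO, Page, and Google:
  CEO/NN -prep-> like/IN -pobj-> Page/NNP -prep-> of/IN -pobj-> Google/NNP

Resulting pattern:
  {A/N} -prep-> like/IN -pobj-> {O/N} -prep-> of/IN -pobj-> {S/N}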
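
The step from parse to pattern can be sketched as: find the smallest subtree that connects the head tokens of S, A, and O, then replace those heads with slots. The code below is a minimal sketch under simplifying assumptions: the parse is typed in by hand, tokens are unique strings (a real system would use token indices), and the function names are hypothetical.

```python
# token -> (head token, dependency label); the root has head None.
PARSE = {
    "CEO": (None, "root"),
    "A": ("CEO", "det"),
    "like": ("CEO", "prep"),
    "Page": ("like", "pobj"),
    "Larry": ("Page", "nn"),
    "of": ("Page", "prep"),
    "Google": ("of", "pobj"),
}
POS = {"CEO": "NN", "A": "DET", "like": "IN", "Page": "NNP",
       "Larry": "NNP", "of": "IN", "Google": "NNP"}

def path_to_root(token):
    """Tokens from the given token up to the root, inclusive."""
    path = [token]
    while PARSE[path[-1]][0] is not None:
        path.append(PARSE[path[-1]][0])
    return path

def minimal_subtree(heads):
    """Nodes on the paths from each head up to their lowest common ancestor."""
    paths = [path_to_root(h) for h in heads]
    common = set(paths[0]).intersection(*paths[1:])
    lca = next(t for t in paths[0] if t in common)
    keep = set()
    for path in paths:
        keep.update(path[: path.index(lca) + 1])
    return keep

def learn_pattern(s_head, a_head, o_head):
    """Edges of the minimal subtree, with the fact's heads replaced by slots."""
    slots = {s_head: "{S/N}", a_head: "{A/N}", o_head: "{O/N}"}
    nodes = minimal_subtree(list(slots))

    def name(token):
        return slots.get(token, f"{token}/{POS[token]}")

    return [
        (name(head), label, name(token))
        for token, (head, label) in PARSE.items()
        if token in nodes and head in nodes
    ]

for edge in learn_pattern("Google", "CEO", "Page"):
    print(edge)
```

The det and nn edges fall away because A/DET and Larry/NNP are not on any path between the three heads, which matches the reduced parse on the slide.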

slide-47
SLIDE 47

Pattern (head -label-> dependent):
  {A/N} -prep-> like/IN -pobj-> {O/N} -prep-> of/IN -pobj-> {S/N}

Attribute            S             O
CEO                  Google        Larry Page
executive chairman   Google        Eric Schmidt
head coach           Real Madrid   Carlo Ancelotti

Same pattern could apply to multiple attributes… …useful for scoring

slide-49
SLIDE 49

Fact extraction

Pattern (head -label-> dependent):
  {A/N} -prep-> like/IN -pobj-> {O/N} -prep-> of/IN -pobj-> {S/N}

Attributes associated with the pattern: CEO, executive chairman, head coach

Sentence: A CEO, like Manouchehr Moshayedi of STEC Inc., is not allowed to trade his company’s stocks based on information only he has.

Extracted fact: S = STEC Inc., A = CEO, O = Manouchehr Moshayedi
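
Applying a learned pattern to a new sentence can be sketched as binding the pattern's nodes to parse tokens edge by edge. In the sketch below the dependency parse of the new sentence is written by hand and is an assumption (the slide shows none), tokens are treated as unique strings, and all function names are hypothetical.

```python
PATTERN = [  # (head, label, dependent), as learned from the seed fact
    ("{A/N}", "prep", "like/IN"),
    ("like/IN", "pobj", "{O/N}"),
    ("{O/N}", "prep", "of/IN"),
    ("of/IN", "pobj", "{S/N}"),
]
PATTERN_ATTRIBUTES = {"CEO", "executive chairman", "head coach"}

# Assumed parse of "A CEO, like Manouchehr Moshayedi of STEC Inc."
PARSE = {  # token -> (head token, dependency label)
    "CEO": (None, "root"),
    "A": ("CEO", "det"),
    "like": ("CEO", "prep"),
    "Moshayedi": ("like", "pobj"),
    "Manouchehr": ("Moshayedi", "nn"),
    "of": ("Moshayedi", "prep"),
    "Inc.": ("of", "pobj"),
    "STEC": ("Inc.", "nn"),
}
POS = {"CEO": "NN", "A": "DET", "like": "IN", "Moshayedi": "NNP",
       "Manouchehr": "NNP", "of": "IN", "Inc.": "NNP", "STEC": "NNP"}

def node_ok(name, token):
    """A slot accepts any noun; a lexical node needs the same word and tag."""
    if name.startswith("{"):
        return POS[token].startswith("N")
    word, tag = name.rsplit("/", 1)
    return token == word and POS[token] == tag

def bind(pattern, binding=None):
    """Backtracking search for tokens that satisfy every pattern edge."""
    binding = binding or {}
    if not pattern:
        return binding
    (head_name, label, dep_name), rest = pattern[0], pattern[1:]
    for token, (head, token_label) in PARSE.items():
        if head is None or token_label != label:
            continue
        if binding.get(head_name, head) != head or binding.get(dep_name, token) != token:
            continue
        if node_ok(head_name, head) and node_ok(dep_name, token):
            result = bind(rest, {**binding, head_name: head, dep_name: token})
            if result is not None:
                return result
    return None

def phrase(head):
    """Head token preceded by its nn modifiers, e.g. 'STEC Inc.'."""
    modifiers = [t for t, (h, label) in PARSE.items() if h == head and label == "nn"]
    return " ".join(modifiers + [head])

def extract():
    binding = bind(PATTERN)
    if binding is None:
        return None
    s, a, o = (phrase(binding[slot]) for slot in ("{S/N}", "{A/N}", "{O/N}"))
    # Keep the fact only if A is an attribute the pattern was learned for.
    return (s, a, o) if a in PATTERN_ATTRIBUTES else None

print(extract())
```

The final attribute check reflects the slide's list of attributes attached to the pattern; whether the original system filters in exactly this way is not stated there.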

slide-50
SLIDE 50

Fact extraction: argument order

Pattern (head -label-> dependent):
  {A/N} -prep-> like/IN -pobj-> {O/N} -prep-> of/IN -pobj-> {S/N}

Attributes associated with the pattern: CEO, executive chairman, head coach

Sentence: A CEO, like Manouchehr Moshayedi of STEC Inc., is not allowed to trade his company’s stocks based on information only he has.

Extracted fact: S = STEC Inc., A = CEO, O = Manouchehr Moshayedi

Shown for reference: the 8 seed fact extraction patterns (#1: A in Biperpedia & one of the 8 patterns applies), in which S, A, and O appear in varying orders.

slide-52
SLIDE 52

Some Numbers …

Annotated Corpus: 400×10^6 news documents
Attributes (Biperpedia): 60K: 218 Fat head
Seed fact extraction: 139M extractions, 680K unique facts
Dependency pattern learning: 30K patterns
Fact extraction: 460M extractions, 40M unique facts

slide-53
SLIDE 53

ReNoun [this talk] vs. Ollie* [EMNLP’12]

Fat head (218): 99 : 100
Long tail (6K): 31 : 100

* Ollie handles verbs, ReNoun doesn’t!

slide-54
SLIDE 54

ReNoun [this talk] vs. Ollie [EMNLP’12]

48 : 100

  • 25/52 not in Biperpedia (mostly meaningless)
  • Action: “Obama’s citation of the Bible” (Coref kills these)
slide-55
SLIDE 55

Results - Precision & #Attributes

        Fat head             Long tail
k       precision   #attr    precision   #attr
10^2    1.00        8        1.00        50
10^3    0.98        36       1.00        294
10^4    0.96        78       0.98        1548
10^5    0.82        106      0.96        5093
10^6    0.74        124      0.70        7821
all     0.18        141      0.26        11178

[Figure: facts f1, f2, …, f100, f1000, …, f1000000 ordered by score, from + to −]
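
For readers who want the table's two columns spelled out, the sketch below computes precision and the number of distinct attributes among the k highest-scoring facts. The four facts, their scores, and their correctness labels are invented toy data; how the talk's own numbers were measured is not stated on the slide.

```python
def precision_and_attrs_at_k(facts, k):
    """Precision and distinct-attribute count over the k highest-scoring facts."""
    top = sorted(facts, key=lambda f: f["score"], reverse=True)[:k]
    correct = sum(1 for f in top if f["correct"])
    attributes = {f["attr"] for f in top}
    return correct / len(top), len(attributes)

# Toy data: scores and correctness labels are invented for illustration.
FACTS = [
    {"attr": "CEO", "score": 9.0, "correct": True},
    {"attr": "CEO", "score": 7.5, "correct": True},
    {"attr": "head coach", "score": 3.0, "correct": False},
    {"attr": "wife", "score": 1.0, "correct": True},
]

print(precision_and_attrs_at_k(FACTS, 2))
print(precision_and_attrs_at_k(FACTS, 4))
```

Lowering the cutoff trades coverage (#attr) for precision, which is the pattern visible in the table as k shrinks from "all" to 10^2.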

slide-56
SLIDE 56

Results - Attributes w/o Facts

Cause                    FH   LT    Example
Vague                    23   37    culture
Numeric                  4    26    rainfall
Object not in Freebase   11   6     email
Plural                   30   15    member firms
Bad attribute            3    4     newsies
Value expected           6    12    nationality
Σ                        77   100

slide-58
SLIDE 58

Fact Scoring

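$$\mathrm{Score}(f) = \sum_{p \,\in\, \mathrm{patterns}(f)} \mathrm{score}(p)$$

$$\mathrm{score}(p) = \mathrm{freq}(p) \times \mathrm{coh}(p)$$

Here f is an extracted fact and patterns(f) is the set of patterns that produced it.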
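
A small numeric sketch of the scoring rule, summing freq(p) × coh(p) over the patterns that produced a fact. The pattern names and freq values are made up for illustration; the two coh values are borrowed from the coherence example later in the deck. The exact definition of freq(p) is not given on the slide, so it is treated here as a given count.

```python
import math

# Hypothetical pattern statistics: names and freq values are invented.
PATTERNS = {
    "nn_compound": {"freq": 1000, "coh": 0.093},
    "has_children_with": {"freq": 10, "coh": 0.429},
}

def pattern_score(name):
    """score(p) = freq(p) * coh(p)"""
    stats = PATTERNS[name]
    return stats["freq"] * stats["coh"]

def fact_score(pattern_names):
    """Score(f) = sum of score(p) over the distinct patterns that produced f."""
    return sum(pattern_score(name) for name in set(pattern_names))

total = fact_score(["nn_compound", "has_children_with"])
print(round(total, 2))
```

With these numbers a frequent but incoherent pattern still dominates the sum, which shows why the coherence factor matters most for patterns that fire very often.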

slide-59
SLIDE 59

Fact Scoring

Pattern coherence coh(p)

coh(p) low → p admits more noise
coh(p) high → p admits less noise

slide-60
SLIDE 60

Pattern (head -label-> dependent):
  {A/N} -prep-> like/IN -pobj-> {O/N} -prep-> of/IN -pobj-> {S/N}

Attributes associated with the pattern: CEO, executive chairman, head coach

slide-61
SLIDE 61

coh(p)

Pattern: {A/N}, {S/N}, {O/N} joined by nn edges
  coh = 0.093
  Matches: Chelsea F.C. general manager Jose Mourinho; General Motors subsidiary Opel

Pattern: has/VBZ with {S/N} (nsubj) and children/NNS (dobj); with/IN (prep) -pobj-> {A/N} -appos-> {O/N}
  coh = 0.429
  Matches: Barack has two children with his wife Michelle; Putin has two children with his ex-wife Lyudmilla

slide-62
SLIDE 62

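Pattern p: has/VBZ with {S/N} (nsubj) and children/NNS (dobj); with/IN (prep) -pobj-> {A/N} -appos-> {O/N}

Attributes seen with p: ex-wife, husband, wife

$$\mathrm{coh}(p) = \mathrm{Avg}\big(\mathrm{coh}(\text{ex-wife}, \text{husband}),\ \mathrm{coh}(\text{ex-wife}, \text{wife}),\ \mathrm{coh}(\text{husband}, \text{wife})\big)$$

$$\mathrm{coh}(\text{husband}, \text{wife}) = \cos\big(\mathrm{word2vec}(\text{husband}),\ \mathrm{word2vec}(\text{wife})\big)$$

[Mikolov et al., ICLR’13]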
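
The coherence computation can be sketched as the average cosine similarity over all attribute pairs. The 2-dimensional vectors below are toy stand-ins for word2vec embeddings, chosen only so the arithmetic is easy to follow; they carry no linguistic meaning.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def coherence(attributes, vectors):
    """Average pairwise cosine similarity; needs at least two attributes."""
    pairs = list(combinations(attributes, 2))
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

# Toy vectors standing in for word2vec embeddings (assumption).
VECTORS = {
    "wife": (1.0, 0.0),
    "husband": (0.0, 1.0),
    "ex-wife": (1.0, 1.0),
}

print(coherence(["ex-wife", "husband", "wife"], VECTORS))
```

A pattern that fires for semantically related attributes gets a high average, while a generic pattern that fires for unrelated attributes is pulled toward zero.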

slide-63
SLIDE 63

Fact: (I, thank, You)

slide-64
SLIDE 64

Summary

ReNoun [this talk] vs. Ollie [EMNLP’12]

Attributes listed under both systems: president, coach, province, banking arm, state assemblyman, caucus chairman, foreign policy chief

Attribute groups: Fat head, Long tail

ReNoun

Stages: Seed fact extraction → Extraction pattern learning → Fact extraction → Fact scoring
Inputs: Annotated Corpus, Attributes (Biperpedia)

Seed fact extraction

#   Pattern           Example
1   the A of S, O     the CEO of Google, Larry Page
2   the A of S is O   the CEO of Google is Larry Page
3   O, S A            Larry Page, Google CEO
4   O, S’s A          Larry Page, Google’s CEO
5   O, [the] A of S   Larry Page, [the] CEO of Google
6   S A O             Google CEO Larry Page
7   S A, O            Google CEO, Larry Page
8   S’s A, O          Google’s CEO, Larry Page

Condition #1: A is in Biperpedia and one of the 8 patterns applies

[Google]1 [CEO]2 [Larry Page]2 started his term in 2011, when [he]2 succeeded [Eric Schmidt]3.

Fact Scoring

$$\mathrm{Score}(f) = \sum_{p \,\in\, \mathrm{patterns}(f)} \mathrm{score}(p)$$

$$\mathrm{score}(p) = \mathrm{freq}(p) \times \mathrm{coh}(p)$$

Dependency pattern learning

Full dependency parse of "An executive chairman, like Eric Schmidt of Google" (head -label-> dependent):
  chairman/NN -det-> An/DET
  chairman/NN -nn-> executive/NN
  chairman/NN -prep-> like/IN
  like/IN -pobj-> Schmidt/NNP
  Schmidt/NNP -nn-> Eric/NNP
  Schmidt/NNP -prep-> of/IN
  of/IN -pobj-> Google/NNP

Parse restricted to the nodes linking chairman, Schmidt, and Google:
  chairman/NN -prep-> like/IN -pobj-> Schmidt/NNP -prep-> of/IN -pobj-> Google/NNP

Resulting pattern:
  {A/N} -prep-> like/IN -pobj-> {O/N} -prep-> of/IN -pobj-> {S/N}

Seed fact shown: S = Google, A = CEO, O = Larry Page