ReNoun: Fact Extraction for Nominal Attributes
Mohamed Yahya, Steven Whang, Rahul Gupta, and Alon Halevy. EMNLP // 2014.10.26


  1. ReNoun: Fact Extraction for Nominal Attributes. Mohamed Yahya, Steven Whang, Rahul Gupta, and Alon Halevy. EMNLP // 2014.10.26

  2. Nouns, Queries & Relations

  3. Nouns as Attributes. Share of KB attributes that are nouns vs. verbs: DBpedia 96% noun, 4% verb; Freebase 97% noun, 3% verb.

  5. entity relation/attribute entity entity

  7. This talk is about extracting facts centered around noun phrases (= attributes): banking arm, coach, caucus chairman, president, foreign policy chief, province, state assemblyman.

  8. (Open) Information Extraction [Etzioni et al., CACM’08]. “…Gazprom’s banking arm, Gazprombank, is owned by the company’s pension fund…” gives (Gazprom, banking arm, Gazprombank).

  9. (Open) Information Extraction [Etzioni et al., CACM’08]. Text: “…Gazprom’s banking arm, Gazprombank, is owned by the company’s pension fund…” Fact/triple (subject, relation, object): (Gazprom, banking arm, Gazprombank).

  10. Before the details: What’s missing from the state of the art?

  11. 2010: WOE; 2011: ReVerb; 2012: Ollie; 2013: ClausIE; 2014: ReNoun

  12. 2010: WOE; 2011: ReVerb; 2012: Ollie; 2013: ClausIE; 2014: ReNoun. Notable exception, check it out! [WWW’13]

  14. Let’s see Ollie …

  15. Obama is the president of the US. Pattern: arg1 [V | V P | V W* P] arg2. Extraction: arg1 = Obama, relation = is president of, arg2 = US.

  16. Obama | is president of | US. Pattern: {US/N} nn {Obama/N} nn {president/N}

  17. Ollie* [EMNLP’12] vs. ReNoun [this talk], compared on the attributes coach, president, province, banking arm, caucus chairman, foreign policy chief, state assemblyman. * Ollie handles verbs, ReNoun doesn’t!

  19. Ollie* [EMNLP’12] vs. ReNoun [this talk]. Fat head (218): coach, president, province. Long tail (60K): banking arm, caucus chairman, foreign policy chief, state assemblyman. * Ollie handles verbs, ReNoun doesn’t!

  22. … now, ReNoun is upon us!

  23. ReNoun [this talk]: relations are expressed using noun phrases (Entity, noun phrase, Entity).

  24. ReNoun. Inputs: Annotated Corpus; Attributes (Biperpedia). Stages: Seed fact extraction, Dependency pattern learning, Fact extraction, Fact scoring.

  25. Seed fact extraction: 8 simple lexical patterns to capture facts with S, A, O in close proximity. Example: “Google CEO Larry Page started his term in 2011.”

  26. Dependency pattern learning: align seed facts with text to find variations in how facts for an attribute are expressed. Example: “A CEO, like Larry Page of Google, is usually a busy person.”

  27. Fact extraction: deploy the dependency patterns to collect more facts. Example: “A CEO, like Manouchehr Moshayedi of STEC Inc., is not allowed to trade his company’s stocks based on information only he has.”

  28. Fact scoring: assign numerical scores to facts; score ∝ likelihood that the extracted fact is correct.

  30. Annotated Corpus: 400×10^6 news documents, annotated with: 1. Dependency parses, 2. Noun phrase chunks, 3. NER, 4. Coreference resolution, 5. Entity resolution (Freebase IDs).
      Dependency parse example: det(CEO/NN, A/DET); prep(CEO/NN, like/IN); pobj(like/IN, Page/NNP); nn(Page/NNP, Larry/NNP); prep(Page/NNP, of/IN); pobj(of/IN, Google/NNP).
      Coreference example: [Google]1 [CEO]2 [Larry Page]2 started his term in 2011, when [he]2 succeeded [Eric Schmidt]3.

      | Cluster | Phrase              | Freebase ID |
      |---------|---------------------|-------------|
      | 1       | Google              | /m/045c7b   |
      | 2       | Larry Page, CEO, he | /m/0gjpq    |
      | 3       | Eric Schmidt        | /m/0gjpq    |

  32. Attributes: Biperpedia [Gupta et al., VLDB’14]. Query “Gazprom’s banking arm” → Biperpedia → Attribute “banking arm”. (Not triples.) For more details, see the VLDB’14 paper, also [Lee et al., ICDE’13] and [Pasca & Van Durme, IJCAI’07].

  34. Seed fact extraction. Example: [Google]1 [CEO]2 [Larry Page]2 started his term in 2011, when [he]2 succeeded [Eric Schmidt]3.
      #1: A in Biperpedia & one of 8 patterns applies

      | # | Pattern           | Example                         |
      |---|-------------------|---------------------------------|
      | 1 | the A of S, O     | the CEO of Google, Larry Page   |
      | 2 | the A of S is O   | the CEO of Google is Larry Page |
      | 3 | O, S A            | Larry Page, Google CEO          |
      | 4 | O, S’s A          | Larry Page, Google’s CEO        |
      | 5 | O, [the] A of S   | Larry Page, [the] CEO of Google |
      | 6 | S A O             | Google CEO Larry Page           |
      | 7 | S A, O            | Google CEO, Larry Page          |
      | 8 | S’s A, O          | Google’s CEO, Larry Page        |

  36. Seed fact extraction. #2: A and O corefer. Example: [Google]1 [CEO]2 [Larry Page]2 started his term in 2011, when [he]2 succeeded [Eric Schmidt]3.

      | Cluster | Phrase              | Freebase ID |
      |---------|---------------------|-------------|
      | 1       | Google              | /m/045c7b   |
      | 2       | Larry Page, CEO, he | /m/0gjpq    |
      | 3       | Eric Schmidt        | /m/0gjpq    |

  37. Seed fact extraction: Result. [Google]1 [CEO]2 [Larry Page]2 started his term in 2011, when [he]2 succeeded [Eric Schmidt]3. Seed fact: (Google, CEO, Larry Page).
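
The two seed-fact conditions on the slides above (A is a known attribute and one of the 8 patterns applies; A and O corefer) can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: it works on raw strings with regular expressions instead of an annotated corpus, and the names `SEED_PATTERNS`, `matching_pattern`, `extract_seed_fact`, and `cluster_of` are hypothetical.

```python
import re

# The eight lexical seed patterns from the slide, written as regex templates.
# {S}, {A}, {O} are filled with the escaped subject, attribute, and object.
SEED_PATTERNS = [
    (1, r"the {A} of {S}, {O}"),
    (2, r"the {A} of {S} is {O}"),
    (3, r"{O}, {S} {A}"),
    (4, r"{O}, {S}['’]s {A}"),
    (5, r"{O}, (?:the )?{A} of {S}"),  # "the" is optional in pattern 5
    (6, r"{S} {A} {O}"),
    (7, r"{S} {A}, {O}"),
    (8, r"{S}['’]s {A}, {O}"),
]


def matching_pattern(sentence, s, a, o):
    """Return the number of the first seed pattern found in the sentence, else None."""
    for number, template in SEED_PATTERNS:
        regex = template.format(S=re.escape(s), A=re.escape(a), O=re.escape(o))
        if re.search(regex, sentence):
            return number
    return None


def extract_seed_fact(sentence, s, a, o, attributes, cluster_of):
    """Return (S, A, O) if both seed conditions hold, else None.

    attributes: set of known attribute names (stand-in for Biperpedia).
    cluster_of: mention string -> coreference cluster id (assumed given).
    """
    if a not in attributes:                          # condition #1, first half
        return None
    if matching_pattern(sentence, s, a, o) is None:  # condition #1, second half
        return None
    if a not in cluster_of or cluster_of[a] != cluster_of.get(o):  # condition #2
        return None
    return (s, a, o)


sentence = ("Google CEO Larry Page started his term in 2011, "
            "when he succeeded Eric Schmidt.")
attributes = {"CEO", "banking arm"}
cluster_of = {"Google": 1, "CEO": 2, "Larry Page": 2, "he": 2, "Eric Schmidt": 3}

print(extract_seed_fact(sentence, "Google", "CEO", "Larry Page",
                        attributes, cluster_of))
```

With the slide's example sentence, pattern 6 (S A O) applies and "CEO" shares a cluster with "Larry Page", so the candidate is kept; the same call with "Eric Schmidt" as O is rejected because no pattern applies.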

  38. Seed fact extraction: 139M seed facts, 680K unique. Accuracy*: Fat head 65/100, Long tail 80/100. *random sample of 100 seed facts

  40. Dependency pattern learning (inputs: annotated corpus, seed facts). Sentence: A [CEO]1, like [Larry Page]2 of [Google]3, is usually a busy person. Parse: det(CEO/NN, A/DET); prep(CEO/NN, like/IN); pobj(like/IN, Page/NNP); nn(Page/NNP, Larry/NNP); prep(Page/NNP, of/IN); pobj(of/IN, Google/NNP). Seed fact: (Google, CEO, Larry Page).

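  41. Dependency pattern learning. Seed fact: (Google, CEO, Larry Page). Parse: det(CEO/NN, A/DET); prep(CEO/NN, like/IN); pobj(like/IN, Page/NNP); nn(Page/NNP, Larry/NNP); prep(Page/NNP, of/IN); pobj(of/IN, Google/NNP). Path connecting the seed fact: CEO/NN -prep-> like/IN -pobj-> Page/NNP -prep-> of/IN -pobj-> Google/NNP. Pattern: {A/N} -prep-> like/IN -pobj-> {O/N} -prep-> of/IN -pobj-> {S/N}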

  42. Pattern: {A/N} -prep-> like/IN -pobj-> {O/N} -prep-> of/IN -pobj-> {S/N}. Seed fact: (Google, CEO, Larry Page). Attribute: CEO.

  43. Pattern: {A/N} -prep-> like/IN -pobj-> {O/N} -prep-> of/IN -pobj-> {S/N}. Seed facts: (Google, CEO, Larry Page); (Google, executive chairman, Eric Schmidt). Attributes: CEO, executive chairman. Same pattern could apply to multiple attributes…

  44. Pattern: {A/N} -prep-> like/IN -pobj-> {O/N} -prep-> of/IN -pobj-> {S/N}. Seed facts: (Google, CEO, Larry Page); (Google, executive chairman, Eric Schmidt); (Real Madrid, head coach, Carlo Ancelotti). Attributes: CEO, executive chairman, head coach. Same pattern could apply to multiple attributes… …useful for scoring
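
The pattern-learning step on the slides above (keep the part of the parse that connects the seed fact's S, A, and O, then replace those three tokens with slots) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the parse is hand-encoded from the slide's fragment, and `chain_to_root`, `learn_pattern`, and the dictionary layout are hypothetical.

```python
def chain_to_root(node, heads):
    """List of nodes from `node` up to the root of the parse."""
    chain = [node]
    while chain[-1] in heads:
        chain.append(heads[chain[-1]][0])
    return chain


def learn_pattern(tokens, heads, roles):
    """Generalize the subtree connecting the seed-fact tokens into a pattern.

    tokens: index -> (word, POS)
    heads:  dependent index -> (head index, dependency label)
    roles:  index of a seed-fact head token -> "S", "A" or "O"
    Returns a list of (head, label, dependent) edges with slots like "{A/N}".
    """
    chains = [chain_to_root(i, heads) for i in roles]
    common = set(chains[0]).intersection(*chains[1:])
    # Lowest common ancestor: first shared node when walking up from a token.
    lca = next(n for n in chains[0] if n in common)
    keep = set()
    for chain in chains:
        keep.update(chain[: chain.index(lca) + 1])

    def label(i):
        if i in roles:
            return "{%s/N}" % roles[i]   # seed-fact token becomes a noun slot
        return "%s/%s" % tokens[i]       # other tokens stay lexicalized

    return [(label(heads[i][0]), heads[i][1], label(i))
            for i in sorted(keep) if i != lca]


# Parse fragment from the slide: "A CEO, like Larry Page of Google"
tokens = {0: ("A", "DET"), 1: ("CEO", "NN"), 2: ("like", "IN"),
          3: ("Larry", "NNP"), 4: ("Page", "NNP"),
          5: ("of", "IN"), 6: ("Google", "NNP")}
heads = {0: (1, "det"), 2: (1, "prep"), 4: (2, "pobj"),
         3: (4, "nn"), 5: (4, "prep"), 6: (5, "pobj")}
roles = {1: "A", 4: "O", 6: "S"}   # seed fact (Google, CEO, Larry Page)

pattern = learn_pattern(tokens, heads, roles)
for edge in pattern:
    print(edge)
```

For the slide's example this yields the four edges of {A/N} -prep-> like/IN -pobj-> {O/N} -prep-> of/IN -pobj-> {S/N}; the det and nn edges fall outside the connecting subtree and are dropped.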

  46. Fact extraction. Sentence: A CEO, like Manouchehr Moshayedi of STEC Inc., is not allowed to trade his company’s stocks based on information only he has. Pattern: {A/N} -prep-> like/IN -pobj-> {O/N} -prep-> of/IN -pobj-> {S/N} (attributes: CEO, executive chairman, head coach). Extracted fact: (STEC Inc., CEO, Manouchehr Moshayedi).

  47. Argument order. The slide repeats the 8 seed patterns (slide 34) next to the fact extraction example. Sentence: A CEO, like Manouchehr Moshayedi of STEC Inc., is not allowed to trade his company’s stocks based on information only he has. Pattern: {A/N} -prep-> like/IN -pobj-> {O/N} -prep-> of/IN -pobj-> {S/N} (attributes: CEO, executive chairman, head coach). Extracted fact: (STEC Inc., CEO, Manouchehr Moshayedi).
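
Applying a learned pattern to a new sentence, as in the fact extraction slides above, amounts to matching the pattern's edges against the sentence's dependency parse and reading S, A, and O off the slots. The sketch below is an assumption-laden illustration rather than the ReNoun implementation: the parse is hand-encoded, slots accept any token whose POS tag starts with "N", noun phrases are rebuilt from `nn` dependents only, and all function names are hypothetical.

```python
SLOTS = {"{S/N}", "{A/N}", "{O/N}"}


def match_pattern(pattern, tokens, edges):
    """Yield bindings (pattern node -> token index) for every match."""
    def bind(spec, idx, binding):
        if spec in binding:                       # node already bound: must agree
            return binding if binding[spec] == idx else None
        word, pos = tokens[idx]
        if spec in SLOTS:
            ok = pos.startswith("N")              # slots take noun tokens
        else:
            ok = spec == "%s/%s" % (word, pos)    # lexical nodes match exactly
        return {**binding, spec: idx} if ok else None

    def extend(i, binding):
        if i == len(pattern):
            yield binding
            return
        head_spec, label, child_spec = pattern[i]
        for head, edge_label, child in edges:
            if edge_label != label:
                continue
            b = bind(head_spec, head, binding)
            if b is not None:
                b = bind(child_spec, child, b)
            if b is not None:
                yield from extend(i + 1, b)

    yield from extend(0, {})


def noun_phrase(idx, tokens, edges):
    """Head token plus its nn modifiers, in sentence order."""
    parts = [idx] + [c for h, lab, c in edges if h == idx and lab == "nn"]
    return " ".join(tokens[i][0] for i in sorted(parts))


def extract_facts(pattern, pattern_attributes, tokens, edges):
    """Return (S, A, O) facts whose attribute was seen with this pattern."""
    facts = []
    for binding in match_pattern(pattern, tokens, edges):
        s, a, o = (noun_phrase(binding[slot], tokens, edges)
                   for slot in ("{S/N}", "{A/N}", "{O/N}"))
        if a in pattern_attributes:
            facts.append((s, a, o))
    return facts


pattern = [("{A/N}", "prep", "like/IN"), ("like/IN", "pobj", "{O/N}"),
           ("{O/N}", "prep", "of/IN"), ("of/IN", "pobj", "{S/N}")]
pattern_attributes = {"CEO", "executive chairman", "head coach"}

# Hand-encoded parse of "A CEO, like Manouchehr Moshayedi of STEC Inc."
tokens = {0: ("A", "DET"), 1: ("CEO", "NN"), 2: ("like", "IN"),
          3: ("Manouchehr", "NNP"), 4: ("Moshayedi", "NNP"),
          5: ("of", "IN"), 6: ("STEC", "NNP"), 7: ("Inc.", "NNP")}
edges = [(1, "det", 0), (1, "prep", 2), (2, "pobj", 4), (4, "nn", 3),
         (4, "prep", 5), (5, "pobj", 7), (7, "nn", 6)]  # (head, label, dependent)

facts = extract_facts(pattern, pattern_attributes, tokens, edges)
print(facts)
```

Because each slot in the pattern is named, the argument order of the extracted fact comes directly from the binding: the token under {S/N} is the subject and the token under {O/N} is the object, regardless of their order in the sentence.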
