
Querying unNorMaLiZED and Inc mpl te Knowledge Bases (Percy Liang)



  1. Querying unNorMaLiZED and Inc mpl te Knowledge Bases. Percy Liang, Stanford University. Automated Knowledge Base Construction (AKBC) 2016, June 17, 2016

  4. Computing the answer
     Question: What is the second most populous city in California?
     Semantic parsing: argmax(Type.City ⊓ ContainedBy.CA, Population, 2)
     Execute: San Diego
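As a rough illustration of how such an argmax logical form could be executed, here is a minimal sketch over a toy triple store. The helper names (`join`, `value`, `argmax`) and the rounded population figures are assumptions made for this example; they are not taken from the talk or from Freebase.

```python
# Toy KB as (subject, predicate, object) triples.
# Populations are rounded, illustrative figures (an assumption of this sketch).
TRIPLES = [
    ("LosAngeles", "Type", "City"),
    ("LosAngeles", "ContainedBy", "CA"),
    ("LosAngeles", "Population", 3_900_000),
    ("SanDiego", "Type", "City"),
    ("SanDiego", "ContainedBy", "CA"),
    ("SanDiego", "Population", 1_400_000),
    ("SanJose", "Type", "City"),
    ("SanJose", "ContainedBy", "CA"),
    ("SanJose", "Population", 1_000_000),
    ("Seattle", "Type", "City"),
    ("Seattle", "ContainedBy", "WA"),
    ("Seattle", "Population", 700_000),
]


def join(predicate, obj):
    """Denotation of `predicate.obj`: all subjects s with (s, predicate, obj)."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def value(subject, predicate):
    """Look up the single object stored for (subject, predicate)."""
    return next(o for s, p, o in TRIPLES if s == subject and p == predicate)


def argmax(entities, predicate, rank=1):
    """Return the entity with the rank-th largest value of `predicate`."""
    ranked = sorted(entities, key=lambda e: value(e, predicate), reverse=True)
    return ranked[rank - 1]


# argmax(Type.City ⊓ ContainedBy.CA, Population, 2); set intersection plays the role of ⊓.
answer = argmax(join("Type", "City") & join("ContainedBy", "CA"), "Population", 2)
print(answer)
```

Intersection of the two unary denotations filters to Californian cities before ranking, which is why the out-of-state city never competes.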

  8. Computing the answer
     Question: Which states’ capitals are also their largest cities by area?
     Semantic parsing: µx. Type.USState ⊓ Capital.argmax(Type.City ⊓ ContainedBy.x, Area)
     Execute: Arizona, Hawaii, Idaho, Indiana, Iowa, Oklahoma, Utah
     Strongly leverages KB structure!
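The variable binding in the µx form can be read as a set comprehension: keep each state x whose capital is the argmax city computed for that same x. The sketch below assumes a made-up KB (StateA, StateB and their cities and areas are hypothetical placeholders, not real data) and hypothetical helper names.

```python
# Made-up KB: two hypothetical states, each with two cities.
TRIPLES = [
    ("StateA", "Type", "USState"), ("StateA", "Capital", "CityA1"),
    ("StateB", "Type", "USState"), ("StateB", "Capital", "CityB1"),
    ("CityA1", "Type", "City"), ("CityA1", "ContainedBy", "StateA"), ("CityA1", "Area", 10),
    ("CityA2", "Type", "City"), ("CityA2", "ContainedBy", "StateA"), ("CityA2", "Area", 5),
    ("CityB1", "Type", "City"), ("CityB1", "ContainedBy", "StateB"), ("CityB1", "Area", 3),
    ("CityB2", "Type", "City"), ("CityB2", "ContainedBy", "StateB"), ("CityB2", "Area", 8),
]


def join(predicate, obj):
    """Denotation of `predicate.obj`: all subjects s with (s, predicate, obj)."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def value(subject, predicate):
    """Look up the single object stored for (subject, predicate)."""
    return next(o for s, p, o in TRIPLES if s == subject and p == predicate)


def largest_city(state):
    """argmax(Type.City ⊓ ContainedBy.x, Area) for one binding of x."""
    cities = join("Type", "City") & join("ContainedBy", state)
    return max(cities, key=lambda c: value(c, "Area"))


def capitals_that_are_largest():
    """µx. Type.USState ⊓ Capital.argmax(Type.City ⊓ ContainedBy.x, Area)."""
    return {x for x in join("Type", "USState")
            if x in join("Capital", largest_city(x))}


result = capitals_that_are_largest()
print(result)
```

In this toy KB only StateA qualifies, because StateB's capital (CityB1) is smaller than CityB2.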

  9. Freebase [Bollacker, 2008; Google, 2013]: 100M entities (nodes), 1B assertions (edges)
     [Figure: knowledge graph fragment. Nodes: BarackObama, MichelleObama, Honolulu, Chicago, Hawaii, UnitedStates, Event3, Event8, Event21, Person, Politician, City, USState, Marriage, Female, 1961.08.04, 1992.10.03. Edge labels: Type, Gender, Spouse, PlacesLived, PlaceOfBirth, DateOfBirth, Profession, Location, ContainedBy, StartDate.]
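To make the node/edge picture concrete, here is a small sketch that stores a few assertions as labeled edges and answers questions by following edge paths. The specific edges, including which event node represents the marriage, are one plausible reading of the flattened figure and should be treated as assumptions; `follow` is a hypothetical helper.

```python
# A handful of assertions as (subject, predicate, object) edges.
# Which edges connect which nodes is an assumed reading of the figure.
EDGES = [
    ("BarackObama", "Type", "Person"),
    ("BarackObama", "Profession", "Politician"),
    ("BarackObama", "DateOfBirth", "1961.08.04"),
    ("BarackObama", "PlaceOfBirth", "Honolulu"),
    ("Honolulu", "Type", "City"),
    ("Honolulu", "ContainedBy", "Hawaii"),
    ("Hawaii", "Type", "USState"),
    ("Hawaii", "ContainedBy", "UnitedStates"),
    # Event node tying several facts together (assumed to be the marriage).
    ("BarackObama", "Spouse", "Event8"),
    ("MichelleObama", "Spouse", "Event8"),
    ("Event8", "Type", "Marriage"),
    ("Event8", "StartDate", "1992.10.03"),
]

# Adjacency index: node -> predicate -> set of neighbours.
graph = {}
for s, p, o in EDGES:
    graph.setdefault(s, {}).setdefault(p, set()).add(o)


def follow(start, *predicates):
    """Follow a chain of edge labels from `start`; return the reached nodes."""
    frontier = {start}
    for predicate in predicates:
        frontier = {o for node in frontier
                    for o in graph.get(node, {}).get(predicate, set())}
    return frontier


print(follow("BarackObama", "PlaceOfBirth", "ContainedBy"))
print(follow("BarackObama", "Spouse", "StartDate"))
```

Event nodes let one assertion carry several arguments (spouses, start date), at the cost of an extra hop per query.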

  12. Example queries: hiking trails near Palo Alto; dishes at Oren’s Hummus; ACL 2014 papers
      [Figure: the same Freebase graph fragment as on the Freebase slide]
      Fewer than 10% of WebQuestions answerable via Freebase

  16. Obtaining better KBs [diagram labels: KBC, QA]

  18. Philosophy: Focus on the end-to-end task of question answering. Let that end goal drive learning and construction of intermediate knowledge representations. (KBC/QA)

  19. Outline: On web pages; On tables; In vector space

  20. Simple semantic parsing on web pages (Panupong (Ice) Pasupat, ACL 2014)

  24. Semantic parsing on the web
      Input:
      • query x: hiking trails near Baltimore
      • web page w
      Output:
      • list of entities y: [Avalon Super Loop, Patapsco Valley State Park, ...]

  26. Logical forms: XPath expressions [Sahuguet and Azavant, 1999; Liu et al., 2000; Crescenzi et al., 2001]
      [Figure: DOM tree of the page (html; head, body; table, h1, table; tr ...; th, td)]
      z = /html[1]/body[1]/table[2]/tr/td[1]
      A low-level KB
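Executing an XPath logical form like the one above can be sketched with Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset including positional predicates. The page below is a made-up, well-formed stand-in for a real web page, and `execute` is a hypothetical helper; real HTML would generally need a forgiving HTML parser rather than an XML one.

```python
import xml.etree.ElementTree as ET

# Made-up page: a navigation table, a heading, then the table of interest.
PAGE = """<html>
  <head><title>Trails</title></head>
  <body>
    <table><tr><td>Home</td><td>About</td></tr></table>
    <h1>Hiking trails near Baltimore</h1>
    <table>
      <tr><th>Trail</th><th>Notes</th></tr>
      <tr><td>Avalon Super Loop</td><td>(details)</td></tr>
      <tr><td>Patapsco Valley State Park</td><td>(details)</td></tr>
    </table>
  </body>
</html>"""


def execute(z, page_source):
    """Run an absolute path z rooted at /html[1] and return the matched texts."""
    root = ET.fromstring(page_source)
    prefix = "/html[1]"
    if not z.startswith(prefix):
        raise ValueError("this sketch only handles paths rooted at /html[1]")
    # ElementTree paths are relative to the root element, so swap the prefix for '.'.
    relative = "." + z[len(prefix):]
    return [element.text for element in root.findall(relative)]


y = execute("/html[1]/body[1]/table[2]/tr/td[1]", PAGE)
print(y)
```

The header row contributes nothing because it contains only `th` cells, so `td[1]` selects exactly one entity per data row.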

  30. Framework
      Inputs: query x (hiking trails near Baltimore) and web page w (html; head, body; ...)
      Generation: candidate set Z (|Z| ≈ 8500)
      Model: selects z = /html[1]/body[1]/table[2]/tr/td[1]
      Execution: y = [Avalon Super Loop, Patapsco Valley State Park, ...]
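The generation, model, and execution stages can be sketched end to end as follows. This is only a toy under stated assumptions: candidates are plain tag paths without indices, and the hand-written `score` function is a stand-in for the learned model (it ignores the query x, which a real model would not). The page and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed page standing in for w.
PAGE = """<html><body>
<h1>Hiking trails near Baltimore</h1>
<ul><li>Home</li></ul>
<table>
<tr><td>Avalon Super Loop</td></tr>
<tr><td>Patapsco Valley State Park</td></tr>
</table>
</body></html>"""


def generate(root):
    """Generation: enumerate every tag path below the root as a candidate z."""
    paths = set()

    def walk(node, path):
        for child in node:
            child_path = f"{path}/{child.tag}"
            paths.add(child_path)
            walk(child, child_path)

    walk(root, ".")
    return sorted(paths)


def execute(z, root):
    """Execution: the stripped text of every element matched by z."""
    return [(element.text or "").strip() for element in root.findall(z)]


def score(x, z, y):
    """Toy stand-in for the model: count distinct non-empty extracted strings.

    The query x and path z are ignored here; a learned model would use both.
    """
    return len({text for text in y if text})


def parse(x, page_source):
    root = ET.fromstring(page_source)
    candidates = generate(root)
    best = max(candidates, key=lambda z: score(x, z, execute(z, root)))
    return best, execute(best, root)


z, y = parse("hiking trails near Baltimore", PAGE)
print(z, y)
```

Separating generation from scoring mirrors the slide: the candidate set can be large and cheap to build, while all task-specific preference lives in the scoring step.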

  31. Dataset (example queries): airlines of italy; natural causes of global warming; lsu football coaches; bf3 submachine guns; badminton tournaments; foods high in dha; technical colleges in south carolina; songs on glee season 5; singers who use auto tune; san francisco radio stations

  33. Dataset statistics
      • 2773 examples
      • 2269 unique queries
      • 894 unique headwords ← long tail!
      • 1483 unique web domains ← long tail! (≠ wrapper induction)

  34. Results (bar chart, %): Baseline (most frequent extraction predicates) 10.3; Accuracy 40.5; Accuracy @ 5 55.8

  35. Correct prediction
      Query: disney channel movies
      /html[1]/body/div[2]/div/div/div[3]/div[1]/div/div/div/div/b

  36. Ranking error
      Query: doctors at emory
      /html/body/div[3]/div[4]/table/tbody/tr/td[2]
      Need better understanding of entities/categories

  37. Coverage error
      Query: hedge funds in new york
      /html/body/div[3]/div[3]/div[4]/.../table/tbody/tr/td[2]/a
      Need compositionality

  38. Outline: On web pages; On tables; In vector space

  39. Semantic parsing on tables (Panupong (Ice) Pasupat, ACL 2015, ACL 2016)

  40. In what city did Piotr’s last 1st place finish occur?

  41. How long did it take this competitor to finish the 4x400 meter relay at Universiade in 2005?

  42. Where was the competition held immediately before the one in Turkey?

  43. How many times has this competitor placed 5th or better in competition?

  45. Dataset
      Statistics:
      • 22000 question/answer pairs
      • 2100 tables
      • 6.3 columns and 27.5 rows per table
      Challenges:
      • High logical complexity (conjunction, disjunction, superlatives, comparatives, aggregation, arithmetic)
      • Tables are unnormalized
      • Train and test tables are distinct; need to generalize!

  46. Knowledge representation: Add normalization / auxiliary edges (custom functions), push resolution to semantic parsing
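One way to picture normalization / auxiliary edges is as extra typed views attached to each raw cell string, leaving the choice among them to the parser. The sketch below is an assumption-laden illustration: the edge names `Number` and `Part` and the function `auxiliary_edges` are hypothetical, not the talk's actual custom functions.

```python
import re


def auxiliary_edges(cell):
    """Return hypothetical normalization edges for one raw table cell string."""
    edges = {}
    # Number edge: the first numeric token in the cell, if any.
    match = re.search(r"\d+(?:\.\d+)?", cell)
    if match:
        edges["Number"] = float(match.group())
    # Part edge: comma-separated pieces of a multi-part cell.
    parts = [piece.strip() for piece in cell.split(",") if piece.strip()]
    if len(parts) > 1:
        edges["Part"] = parts
    return edges


print(auxiliary_edges("1st"))
print(auxiliary_edges("Bangkok, Thailand"))
print(auxiliary_edges("Athens"))
```

The raw string stays in the graph untouched; the auxiliary edges only add alternatives, so deciding which interpretation a question needs is deferred to semantic parsing.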

  48. Model
      Question: Greece held its last Summer Olympics in which year?
      Logical form: R[Date].R[Year].argmax(Country.Greece, Index)
      Answer: 2004
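Read inside out, the logical form selects the rows whose Country is Greece, keeps the one with the largest row Index, reads its Year cell, and normalizes that cell to a date. A minimal sketch over a four-row excerpt of a Summer Olympics table follows; the function names and the dictionary row encoding are assumptions for illustration.

```python
# Four-row excerpt of a Summer Olympics table; Index is the row position.
ROWS = [
    {"Index": 0, "Year": "1896", "City": "Athens", "Country": "Greece"},
    {"Index": 1, "Year": "1900", "City": "Paris", "Country": "France"},
    {"Index": 2, "Year": "2004", "City": "Athens", "Country": "Greece"},
    {"Index": 3, "Year": "2008", "City": "Beijing", "Country": "China"},
]


def rows_where(column, cell_value):
    """Country.Greece: rows whose `column` cell equals `cell_value`."""
    return [row for row in ROWS if row[column] == cell_value]


def argmax(rows, column):
    """Rows with the largest value in `column` (here: the last matching row)."""
    best = max(row[column] for row in rows)
    return [row for row in rows if row[column] == best]


def reverse(column, rows):
    """R[column]: map rows to the contents of their `column` cell."""
    return [row[column] for row in rows]


def to_date(cells):
    """R[Date]: normalize year strings to integers (a simplified date view)."""
    return [int(cell) for cell in cells]


# R[Date].R[Year].argmax(Country.Greece, Index)
answer = to_date(reverse("Year", argmax(rows_where("Country", "Greece"), "Index")))
print(answer)
```

Using the row Index as the ranking key is what encodes "last": the superlative is over table order, not over the year values themselves.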

  49. Results
      IR: Train classifier to pick answer directly from table.
      WQ: Use logical complexity of our previous Freebase work.
      Answer accuracy (bar chart, %): IR 12.7; WQ 24.3; Full 37.1

  51. Oracle accuracy
      Can the system even generate a set of candidates containing the answer?
      Example: How many times did Greece hold the summer olympics? 2
      Method | LF accuracy
      ACL 2015 | 53.5%
      ACL 2016 (dynamic prog. on denotations) | 76.0%
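The oracle question can be phrased as a simple check per example: does any generated candidate evaluate to the gold answer? The sketch below assumes each example is a pair of (candidate denotations, gold answer); the two examples are made up, and `oracle_accuracy` is a hypothetical helper rather than the papers' evaluation code.

```python
def oracle_accuracy(examples):
    """Fraction of examples whose candidate denotations include the gold answer."""
    hits = sum(1 for denotations, gold in examples if gold in denotations)
    return hits / len(examples)


# Made-up examples: (denotations of all generated candidates, gold answer).
examples = [
    ([2004, 2, 1896], 2),              # some candidate denotes the gold answer
    (["Athens", "Paris"], "Beijing"),  # no candidate reaches the gold answer
]

print(oracle_accuracy(examples))
```

This measures an upper bound for the ranking model: an example missed here cannot be answered correctly no matter how candidates are scored.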

  54. Error analysis
      Unhandled operations (19%):
      • Was there more gold medals won than silver?
      • Which movies were number 1 for at least two consecutive weeks?
      • How many titles had the same author listed as the illustrator?
      Table normalization:
      • In what city did Piotr’s last 1st place finish occur? ...[Bangkok, Thailand]...
      • How long does the show defcon 3 last? ...[2pm-3pm]...
      Lexical mismatch:
      • Mexican ⇒ Mexico, airplane ⇒ Model
