SLIDE 1

Utilizing Knowledge Bases for Text Retrieval: A Wishlist

Laura Dietz
dietz@cs.unh.edu

SLIDE 2

KG4IR https://kg4ir.github.io

SLIDE 3

Retrieval for Open-ended Information Needs

Requiring long, complex answers. Intended queries:

  • how ice skates work
  • UK leaving Europe
  • cashflow important for investment
  • effects of water pollution
  • Diesel scandal affect Daimler AG

If yes, why? If not, why not? Causes? Involvements? Controversy? Backstory? What do I need to know to understand the answer?

xkcd.com/1867/

SLIDE 4

What is the problem? ...and the solution?

  • Wikipedia: Not enough / recent information
  • Web Search: Manually sift through many web pages
  • Solution: Train computers to recycle Web content to write comprehensive articles in response to a search query

SLIDE 5

Query-specific Article + Knowledge Graph

[Figure: article generated for a Query, with predominant facts and introduction, then more details about Heading 1 and more details about Heading 2]

SLIDE 6

Step 1: Find Relevant/Central Entities

SLIDE 7

Step 2: Find Relevant Relations

SLIDE 8

Step 3: Find Relevant Text + Consolidate

SLIDE 9

How to Find Relevant Entities?

Q: diesel scandal affect Daimler

SLIDE 10

How to Find Relevant Entities?

Q: diesel scandal affect Daimler

(1) Entity linking the query
(2) Search in KB index
(3) Relevance Feedback (retrieve for Q, pretend these are relevant)

[Figure: entities and terms found for Q: Diesel scandal, Volkswagen, Emissions Test, Daimler AG, Category: Automobile, car, innovation, industry]
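The relevance feedback route (3) can be sketched in a few lines. This is a minimal illustration, not the method from the talk: it assumes a first-pass ranking already exists and that every document carries entity links. The function name `feedback_entities`, the document layout, and the toy data are all hypothetical.

```python
from collections import Counter

def feedback_entities(ranked_docs, k=3):
    """Rank entities by how many of the top-k documents link to them."""
    counts = Counter()
    for doc in ranked_docs[:k]:
        # Count each entity once per document so long documents do not dominate.
        counts.update(sorted(set(doc["entities"])))
    return counts.most_common()

# Hypothetical first-pass ranking for "diesel scandal affect Daimler".
ranked_docs = [
    {"id": "d1", "entities": ["Volkswagen", "Diesel scandal", "Emissions Test"]},
    {"id": "d2", "entities": ["Daimler AG", "Diesel scandal"]},
    {"id": "d3", "entities": ["Volkswagen", "Daimler AG", "Diesel scandal"]},
    {"id": "d4", "entities": ["Exxon Mobil"]},
]

# The top-k documents are treated as if they were relevant.
entity_ranking = feedback_entities(ranked_docs, k=3)
```

Entities that occur only below the cutoff (here "Exxon Mobil") never enter the ranking, which is the sense in which the top documents are "pretended" to be relevant.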

SLIDE 11

How to Use Entities for Text Ranking?

Q: diesel scandal affect Daimler

(1) Entity linking the query
(2) Search in KB index
(3) Relevance Feedback (retrieve for Q, pretend these are relevant)

[Figure: text passages matched against names, query terms, and article terms of the entities found for Q (Diesel scandal: diesel, car, ..., industry, Daimler; Category: Automobile, car, innovation, industry; Daimler AG; Volkswagen)]

[Dalton, Dietz, Allan 14]
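One way to read the slide is as query expansion: terms from entity names and entity articles are added to the query before text is scored. The sketch below illustrates that idea only; it is not the model of [Dalton, Dietz, Allan 14]. The weights `w_query`, `w_name`, `w_term`, the entity records, and the overlap-based `score` function are assumptions made for the example.

```python
from collections import Counter

def expand_query(query, entities, w_query=1.0, w_name=0.5, w_term=0.25):
    """Combine query terms with terms from entity names and entity articles."""
    weights = Counter()
    for term in query.lower().split():
        weights[term] += w_query
    for entity in entities:
        for term in entity["name"].lower().split():
            weights[term] += w_name
        for term in entity["article_terms"]:
            weights[term.lower()] += w_term
    return dict(weights)

def score(text, weights):
    """Sum the weights of the expanded-query terms that occur in the text."""
    tokens = set(text.lower().split())
    return sum(w for term, w in weights.items() if term in tokens)

# Hypothetical entities found for the query, with a few article terms each.
entities = [
    {"name": "Diesel scandal", "article_terms": ["car", "industry"]},
    {"name": "Volkswagen", "article_terms": ["emissions"]},
]
weights = expand_query("diesel scandal affect Daimler", entities)

# This passage shares no term with the original query but matches the expansion.
passage_score = score("Volkswagen emissions test", weights)
```

The passage would score zero against the unexpanded query; with the entity vocabulary added it receives a non-zero score.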

SLIDE 12

Finding Relevant Entities: What Works?

Q: diesel scandal affect Daimler

(1) Entity linking the query
(2) Search in KB index
(3) Relevance Feedback (retrieve for Q, pretend these are relevant)

Annotations on the methods:

  • Strongest feature! Room for improvement.
  • Wiki pages of relevant entities may not mention query.
  • Sparse.

SLIDE 13

Identifying Relevant Relations in a KG

Q: diesel scandal affect Daimler

Naive approach: Select sub-KG of relevant entities.

So many connections in a knowledge graph:

  • Some are relevant!
  • But many are only relevant in a certain (other?) context.

[Figure: knowledge graph over Emission Scandal, Daimler, Stock price, VW, Exxon Mobil, Diesel Engines]
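The naive approach amounts to taking the induced subgraph over the relevant entities. A minimal sketch follows; the edges and relation labels are invented placeholders for illustration and do not come from a real knowledge graph.

```python
def induced_subgraph(edges, relevant):
    """Keep every KG edge whose two endpoints are both relevant entities."""
    return [(s, r, t) for (s, r, t) in edges if s in relevant and t in relevant]

# Hypothetical edges; the relation labels are placeholders.
edges = [
    ("VW", "involved_in", "Emission Scandal"),
    ("VW", "produces", "Diesel Engines"),
    ("Daimler", "produces", "Diesel Engines"),
    ("Daimler", "has", "Stock price"),
    ("Exxon Mobil", "has", "Stock price"),
]
relevant = {"VW", "Daimler", "Emission Scandal", "Diesel Engines", "Stock price"}

sub_kg = induced_subgraph(edges, relevant)
```

The filter looks only at the endpoints, so an edge between two relevant entities is kept even when the relation itself matters only in some other context. That is the weakness the slide points out.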

SLIDE 14

Link Structure in KGs Became Unhelpful

KGs started with the "most popular" facts, then grew in number of nodes and number of connections, aiming for better coverage.

Hub nodes: New York City, California, United States

[Figure: KGs in 2013 vs. KGs in 2019]

SLIDE 17

ENT Rank for Entity Ranking

(1) Retrieve text + entity links and entities
(2) Build candidate graph
(3) Learn edge weights & Predict entity ranking

[Dietz 19, SIGIR]
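The three steps can be illustrated with a toy version. This is a sketch under strong assumptions and not the ENT Rank implementation: retrieval is replaced by a hand-written list of scored paragraphs, the only edge feature is the summed paragraph score, and the learned edge weights are replaced by a single fixed `weight`. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_candidate_graph(paragraphs):
    """Connect entities that are linked in the same retrieved paragraph.

    Each edge keeps the retrieval scores of the paragraphs that support it.
    """
    edges = defaultdict(list)
    for para in paragraphs:
        for a, b in combinations(sorted(set(para["entities"])), 2):
            edges[(a, b)].append(para["score"])
    return edges

def rank_entities(edges, weight=1.0):
    """Score each entity by the weighted sum of its incident edge features."""
    scores = defaultdict(float)
    for (a, b), para_scores in edges.items():
        feature = sum(para_scores)  # single edge feature: total paragraph score
        scores[a] += weight * feature
        scores[b] += weight * feature
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Step (1) stand-in: retrieved paragraphs with scores and entity links.
paragraphs = [
    {"score": 2.0, "entities": ["Emissions Scandal", "Lawsuit"]},
    {"score": 1.5, "entities": ["Diesel Engines", "Emissions Scandal"]},
    {"score": 0.5, "entities": ["Diesel Engines", "Lawsuit"]},
]

# Steps (2) and (3): build the graph, then rank with a fixed weight.
ranking = rank_entities(build_candidate_graph(paragraphs))
```

In a learned setting, `weight` would become one parameter per edge feature, fitted with a learning-to-rank objective.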

SLIDE 19

ENT Rank for Entity Ranking

Features: Entity, Neighbor, Text

.. investor lawsuit seeking class action status ... seeking compensation for the drop in stock value due to the emissions scandal. Volkswagen had intentionally programmed turbocharged direct injection (TDI) diesel engines to activate some emissions controls only during laboratory emissions testing.

Entities: Diesel Engines, Emissions Scandal, Lawsuit

[Dietz 19, SIGIR]

SLIDE 20

ENT Rank for Entity Ranking [Dietz, SIGIR 19]

Edges annotated with paragraphs! Why not relation types?

SLIDE 21

Extracting Relevant Relations

Research question: relevant documents + extraction = relevant relations?

Relation Extraction: [Roth et al 14] (best at TAC KBP 13)

[Figure: documents retrieved for Q, with extracted relations such as works_for]

[Schuhmacher, Roth, Ponzetto, Dietz 16]

SLIDE 22

Relevant Relations through Relevant Documents

Goal: Relations need to be relevant and correct

Query: Raspberry Pi

[Figure: graph around Raspberry_Pi_Foundation with the entities Eben_Upton, Premier_Farnell, United_Kingdom, Broadcom, University_of_Cambridge, England, Harriet_Green, and Reuters; edges labeled rf:founded_by, rf:member_of, rf:headquarters, dbp:membership, dbp:almaMater; each edge marked relevant or not relevant]

Legend: dbp = knowledge base, rf = relation extraction

Results: no signal; 50% / 50%; N/A for 60% of queries

[Schuhmacher, Roth, Ponzetto, Dietz 16]

SLIDE 23

Issue 1: Correct Vs. Relevant Extractions

Goal: Relations need to be relevant and correct

Only considering correct extractions....

  • Schema-based: 50% relevant
  • OpenIE-based: 50% relevant
  • Human-based: 50% relevant (sentence-level)

[Figure: not relevant vs. relevant, 50% / 50%]

[Schuhmacher 16] [Kadry & Dietz 17]

SLIDE 24

Issue 2: Coverage of Relation Extractions

[Kadry & Dietz 17]

Open IE: 5% of sentences with correct annotations (no coref)

Schema-based: N/A for 60% of queries (TAC KBP 13)

Leads to only marginal improvements for IR, e.g. ranking entity-query support sentences for relevance.

[Figure: bar chart of MAP (axis 0.1 to 0.5) for the feature sets POS/NER, Parsing, OpenIE, TF-IDF, and All together]
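For readers unfamiliar with the metric in the chart, MAP (mean average precision) can be computed as below. This is the standard definition; the sentence identifiers and relevance judgments are made-up toy data, not results from the talk.

```python
def average_precision(ranking, relevant):
    """Average of precision@k over the ranks k where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """Mean of average precision over all (ranking, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Toy data: ranked support sentences and the relevant ones for two queries.
runs = [
    (["s1", "s2", "s3"], {"s1", "s3"}),  # AP = (1/1 + 2/3) / 2
    (["s4", "s5"], {"s5"}),              # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
```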

SLIDE 25

Issue 3: Complex Relation Expressions

Interesting relations are a bit more complicated.

.. investor lawsuit seeking class action status ... seeking compensation for the drop in stock value due to the emissions scandal. Volkswagen had intentionally programmed turbocharged direct injection (TDI) diesel engines to activate some emissions controls only during laboratory emissions testing.

  • Span more than one sentence.
  • Include multiple intermediate entities.
  • ...also not just triples + coref...

SLIDE 26

Data: Effects of Water Pollution/Eutrophication

Ask me for the data ...

SLIDE 27

Shared Task: TREC Complex Answer Retrieval

Given: query Q and outline of Headings

[Figure: article for Query with predominant facts and introduction, then ranked paragraph lists (1., 2., 3., ...) with more details about Heading 1 (H1) and Heading 2 (H2)]

  • CAR Y1, Y2: Paragraph ranking per heading. Optimize relevance.
  • CAR Y3: Paragraph ordering. Maximize coverage, topical coherence.
  • Up next: Multi-paragraph summarization + query-KG

http://trec-car.cs.unh.edu/
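The per-heading ranking task can be sketched as follows. This is a toy baseline under stated assumptions, not an official TREC CAR system: each heading is turned into a text query by concatenating it with Q, and paragraphs are ranked by plain term overlap. The function names and the paragraphs are hypothetical.

```python
def heading_queries(query, headings):
    """Form one text query per heading by appending the heading to Q."""
    return {h: f"{query} {h}" for h in headings}

def rank_paragraphs(text_query, paragraphs):
    """Rank paragraphs by the number of query terms they contain."""
    q_terms = set(text_query.lower().split())
    scored = [(len(q_terms & set(p.lower().split())), p) for p in paragraphs]
    # sort() is stable, so ties keep their original order.
    scored.sort(key=lambda pair: -pair[0])
    return [p for overlap, p in scored if overlap > 0]

# Hypothetical paragraph collection.
paragraphs = [
    "Agricultural runoff is one of the main causes of water pollution",
    "Effects of water pollution include eutrophication and algae blooms",
    "Ice skates glide on a thin film of meltwater",
]

queries = heading_queries("water pollution", ["Causes", "Effects"])
rankings = {h: rank_paragraphs(q, paragraphs) for h, q in queries.items()}
```

Each heading gets its own ranking, so the same paragraph can rank first under one heading and lower under another.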

SLIDE 28

Bridging existing KGs with text

  • General purpose schema with many types
  • Extraction of complex relations (not just triples + coref)
  • High coverage/recall (40%?)
  • Relevant information extraction
  • Query-specific knowledge graphs

TREC CAR Dataset: Ask me for a data set to play around with...

http://trec-car.cs.unh.edu/
