SLIDE 1

Entity Search: Building Bridges between Two Worlds

Krisztian Balog, Edgar Meij, and Maarten de Rijke
ISLA, University of Amsterdam

http://ilps.science.uva.nl

SLIDE 2

Entity search

  • Information organized around entities
  • Instead of finding documents about the entity, find the entity itself
  • Problem looked at by both the Information Retrieval (IR) and the Semantic Web (SW) communities

SLIDE 3

Entity search tasks

  • Entity ranking
  • List completion
  • Related entity finding

SLIDE 4

Motivation

  • To what extent are IR and SW methods capable of answering information needs related to entity finding?

SLIDE 5

Where are we now?

  • Information Retrieval
      • Identifying and ranking entities in large volumes of data
      • Mostly based on co-occurrences between terms and entities
      • Generated models are not always meaningful for human consumption

SLIDE 6

Where are we now?

  • Semantic Web
      • Structured data, naturally organized around entities
      • Entity retrieval is as simple as running SPARQL queries?
      • Free-text querying is more appealing to (naive) end users

SLIDE 7

Related entity finding

  • Given
      • Input entity E (name plus homepage)
      • Type T of the target entity (person, organization, or product)
      • Narrative R (describes nature of relation)
  • Return homepages of related entities

SLIDE 8

Example topics

(E) Source entity name: Medimmune, Inc.
(E) Source entity URL:  clueweb09-en0008-26-39300
(T) Target type:        Product
(R) Narrative:          Products of Medimmune, Inc.

(E) Source entity name: Boeing 747
(E) Source entity URL:  clueweb09-en0005-75-02292
(T) Target type:        Organisation
(R) Narrative:          Airlines that currently use Boeing 747 planes.
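
As a minimal sketch, the two example topics can be held in a small record type. The class name `Topic` and its field names are illustrative assumptions; only the values come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    """One related-entity-finding topic (hypothetical field names)."""
    source_name: str   # (E) source entity name
    source_url: str    # (E) source entity URL, given here as a ClueWeb09 document id
    target_type: str   # (T) target type
    narrative: str     # (R) narrative describing the relation


# Values copied from the example topics on the slide.
TOPICS = [
    Topic("Medimmune, Inc.", "clueweb09-en0008-26-39300",
          "Product", "Products of Medimmune, Inc."),
    Topic("Boeing 747", "clueweb09-en0005-75-02292",
          "Organisation", "Airlines that currently use Boeing 747 planes."),
]
```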

SLIDE 9

Aim

  • Compare IR and SW approaches on the related entity finding task
  • Focusing on finding all relevant entities, but not on actually ranking them

SLIDE 10

Related entity finding

Our variation

  • TREC Entity 2009 topics (20)
  • Map source entity to a Wikipedia page (17)
  • Map target category to the most specific class within the DBPedia ontology
  • Ground truth: Wikipedia pages from relevance assessments
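
Because the comparison targets finding relevant entities rather than ranking them, a set-based comparison against the ground-truth pages is one natural way to score a method. The slides do not state which measure was used, so the recall and precision below are an assumption, and the entity identifiers in the usage example are placeholders.

```python
def set_based_scores(returned, relevant):
    """Return (recall, precision) of an unranked result set.

    `returned` and `relevant` are iterables of entity identifiers
    (for example, Wikipedia page titles). Illustrative only: the
    slides do not say which measure the authors used.
    """
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(returned) if returned else 0.0
    return recall, precision


# Placeholder identifiers, not taken from the relevance assessments.
recall, precision = set_based_scores({"A", "B", "C", "D"}, {"B", "C", "E"})
```

Here two of the three relevant entities are found, and two of the four returned entities are relevant.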

SLIDE 11

Example topic

(E) Source entity name: Boeing 747
(E) Source entity URL:  clueweb09-en0005-75-02292
(T) Target type:        Organisation
(R) Narrative:          Airlines that currently use Boeing 747 planes.

Source entity: Boeing_747
DBPedia-owl:   Organisation/Company/Airline
Relation:      Airlines that currently use Boeing 747 planes.

SLIDE 12

IR approaches

  • Aggregation of approaches employed at the TREC Entity track
  • Various ways of recognizing and ranking entities
  • Common to all is a mechanism for capturing the co-occurrence between source and target entities

SLIDE 13

A typical IR approach

  1. Query (input entity, relation)
  2. Document/snippet retrieval
  3. Answer candidate extraction
  4. Answer candidate (type) filtering
  5. Answer candidate ranking
  6. Output (related entities)
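
The pipeline above can be sketched on toy data. This is a deliberately simplified illustration, not any TREC participant's system: retrieval is substring matching, candidate extraction uses a hypothetical name-to-type dictionary (`gazetteer`), ranking is by raw co-occurrence count, and the relation narrative is ignored. All names and documents are made up.

```python
from collections import Counter


def related_entities(source, target_type, docs, gazetteer):
    """Toy co-occurrence pipeline: retrieve, extract, filter by type, rank."""
    # Retrieval: keep documents that mention the source entity.
    retrieved = [d for d in docs if source.lower() in d.lower()]

    # Candidate extraction: count known entity names in the retrieved documents.
    counts = Counter()
    for doc in retrieved:
        text = doc.lower()
        for name in gazetteer:
            if name != source and name.lower() in text:
                counts[name] += 1

    # Type filtering: keep candidates of the requested target type.
    typed = {n: c for n, c in counts.items() if gazetteer[n] == target_type}

    # Ranking: most co-occurrences first, ties broken alphabetically.
    return sorted(typed, key=lambda n: (-typed[n], n))


# Hypothetical entities and documents.
gazetteer = {"Acme Corp": "organization", "Widget X": "product",
             "Gadget Y": "product", "Jane Doe": "person"}
docs = [
    "Acme Corp launched Widget X last year.",
    "Jane Doe joined Acme Corp; Widget X and Gadget Y sell well.",
    "Gadget Y was reviewed by Jane Doe.",
]
ranked = related_entities("Acme Corp", "product", docs, gazetteer)
```

The result carries no label for why a candidate is related to the source; it only reflects that the two were mentioned together.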

SLIDE 14

Two SW approaches

  • SPARQL query
  • Exhaustive graph search
      • Find all paths between E and T in a knowledge base
      • The depth of search is limited

SELECT DISTINCT ?m ?r WHERE {
  ?m rdf:type dbpedia-owl:Drug .
  { ?m ?r dbpedia:MedImmune }
  UNION
  { dbpedia:MedImmune ?r ?m }
}
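
The depth-limited graph search can be illustrated with a small in-memory triple list. This is a sketch under assumptions: the slides do not describe the actual search procedure, so the function below simply enumerates simple paths from the source entity to any entity of the target type, following edges in both directions (an inverse edge is marked with a leading `^`). The triples and the name `find_paths` are hypothetical.

```python
def find_paths(triples, source, targets, max_depth):
    """Enumerate simple paths of at most max_depth edges from source to any target."""
    # Adjacency list: node -> [(relation, neighbour)], edges usable in both directions.
    adj = {}
    for s, p, o in triples:
        adj.setdefault(s, []).append((p, o))
        adj.setdefault(o, []).append(("^" + p, s))

    paths = []

    def dfs(node, path, visited):
        if path and node in targets:
            paths.append(list(path))
        if len(path) == max_depth:  # depth limit reached
            return
        for rel, nxt in adj.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                path.append((rel, nxt))
                dfs(nxt, path, visited)
                path.pop()
                visited.remove(nxt)

    dfs(source, [], {source})
    return paths


# Hypothetical knowledge base; m1, m2, m3 stand for entities of the target type.
triples = [("E", "r1", "m1"), ("m2", "r2", "E"), ("E", "r3", "x"), ("x", "r4", "m3")]
targets = {"m1", "m2", "m3"}
depth1 = find_paths(triples, "E", targets, 1)
depth2 = find_paths(triples, "E", targets, 2)
```

With `max_depth=1` the search covers the same direct links in either direction that the SPARQL query's `UNION` expresses; raising the limit also reaches `m3` through the intermediate node `x`.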

SLIDE 15

SPARQL on DBPedia

Query: Products of Medimmune, Inc.

?m                      ?r
dbpedia:Amifostine      dbp-prop:wikilink
dbpedia:Blinatumomab    dbp-prop:wikilink
dbpedia:Motavizumab     dbp-prop:wikilink
dbpedia:Palivizumab     dbp-prop:wikilink

SLIDE 16

SPARQL on DBPedia

Query: Airlines that Air Canada has code share flights with.

?m                          ?r
dbpedia:Air_Canada          dbp-prop:wikilink
dbpedia:Austrian_Airlines   dbp-prop:wikilink
dbpedia:Japan_Airlines      dbp-prop:wikilink
dbpedia:Lufthansa           dbp-prop:wikilink
dbpedia:Turkish_Airlines    dbp-prop:wikilink
...
dbpedia:Air_Ontario         dbp-ontology:Company/parentCompany
dbpedia:Air_Canada_Tango    dbp-ontology:Company/parentCompany
dbpedia:Canadian_Airlines   dbp-ontology:foundationPerson

SLIDE 17

SPARQL on DBPedia

Query: Members of the band Jefferson Airplane.

?m                        ?r
dbpedia:Jim_Morrison      dbp-prop:wikilink
dbpedia:Jimi_Hendrix      dbp-prop:wikilink
...
dbpedia:Jack_Casady       dbp-ontology:associatedMusicalArtist
dbpedia:Paul_Kantner      dbp-ontology:associatedMusicalArtist
dbpedia:Joey_Covington    dbp-ontology:associatedMusicalArtist
dbpedia:Marty_Balin       dbp-ontology:associatedMusicalArtist
...
dbpedia:Grace_Slick       dbp-prop:pastMembers
dbpedia:Jorma_Kaukonen    dbp-prop:pastMembers
...

SLIDE 18

Findings

  • IR and SW methods find basically the same set of entities
  • Most relations returned by SW methods are of type wikilink

SLIDE 19

Next

  • Extend search to Linked Open Data (LOD)
      • We use the Linked Data Semantic Repository (LDSR)

SLIDE 20

SPARQL on LOD

Query: Products of Medimmune, Inc.

?m                      ?r
dbpedia:Amifostine      dbp-prop:wikilink
dbpedia:Blinatumomab    dbp-prop:wikilink
dbpedia:Motavizumab     dbp-prop:wikilink
dbpedia:Palivizumab     dbp-prop:wikilink
dbpedia:Motavizumab     fb:base.bioventurist.product.developed_by
dbpedia:Palivizumab     fb:base.bioventurist.science_or_technology_company.products
dbpedia:Motavizumab     fb:base.bioventurist.product.developed_by
dbpedia:Palivizumab     fb:base.bioventurist.science_or_technology_company.products
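
To see which entities gain a relation more specific than a wikilink once LOD is added, the result rows for this query can be grouped by entity. The sketch below copies the rows from the slide (including the repeated ones); the function name and the choice to deduplicate are illustrative assumptions.

```python
def relations_by_entity(rows):
    """Group (entity, relation) rows into entity -> sorted list of distinct relations."""
    grouped = {}
    for entity, relation in rows:
        grouped.setdefault(entity, set()).add(relation)
    return {entity: sorted(rels) for entity, rels in grouped.items()}


# Rows as listed on the slide for "Products of Medimmune, Inc."
ROWS = [
    ("dbpedia:Amifostine", "dbp-prop:wikilink"),
    ("dbpedia:Blinatumomab", "dbp-prop:wikilink"),
    ("dbpedia:Motavizumab", "dbp-prop:wikilink"),
    ("dbpedia:Palivizumab", "dbp-prop:wikilink"),
    ("dbpedia:Motavizumab", "fb:base.bioventurist.product.developed_by"),
    ("dbpedia:Palivizumab", "fb:base.bioventurist.science_or_technology_company.products"),
    ("dbpedia:Motavizumab", "fb:base.bioventurist.product.developed_by"),
    ("dbpedia:Palivizumab", "fb:base.bioventurist.science_or_technology_company.products"),
]

grouped = relations_by_entity(ROWS)

# Entities that have at least one relation other than a plain wikilink.
labelled = sorted(e for e, rels in grouped.items()
                  if any(r != "dbp-prop:wikilink" for r in rels))
```

On these rows, only Motavizumab and Palivizumab carry a relation whose name says something about the link to the company; the other two entities are connected by wikilinks alone.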

SLIDE 21

Graph search on LOD

SLIDE 22

Findings

  • More entities as well as more diverse relations
  • Having more data does not automatically improve results
  • Some of the identified entities are now too general

SLIDE 23

Summarizing findings

  • Information Retrieval
      • Excellent ways of finding associations between topics and entities
      • Tend to perform better for less popular entities (not represented in LOD)
      • Missing: semantics of the found associations

SLIDE 24

Summarizing findings

  • Semantic Web
      • Has the potential of generating a large number of candidate entities and relations
      • Could be as simple as instantiating a SPARQL query
      • For many queries LOD is very sparse w.r.t. semantically meaningful links between entities
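
Instantiating a SPARQL query for a topic can be sketched as filling a template with the mapped source entity and target class. The template follows the query shown earlier in the deck; the `PREFIX` lines use the standard RDF and DBpedia namespaces, while the function name `build_query` and the input check are assumptions added for illustration.

```python
import re

QUERY_TEMPLATE = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpedia: <http://dbpedia.org/resource/>

SELECT DISTINCT ?m ?r WHERE {{
  ?m rdf:type dbpedia-owl:{target_class} .
  {{ ?m ?r dbpedia:{source} }}
  UNION
  {{ dbpedia:{source} ?r ?m }}
}}"""

# Conservative check so that arbitrary text cannot be spliced into the query.
_LOCAL_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def build_query(source, target_class):
    """Fill the template with a source entity and a target ontology class."""
    for value in (source, target_class):
        if not _LOCAL_NAME.match(value):
            raise ValueError(f"unsupported local name: {value!r}")
    return QUERY_TEMPLATE.format(source=source, target_class=target_class)


query = build_query("MedImmune", "Drug")
```

The generated string could then be sent to a SPARQL endpoint. The query asks for any direct link in either direction, which is why the returned relations are only as informative as the links stored in the data.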

SLIDE 25

Zooming out

  • Enhance text-based models with semantic information from LOD
  • Use IR models to discover and label links between entities in LOD

SLIDE 26

TREC Entity 2010

  • Main task: Related entity finding
  • Pilot task: List completion
      • Given URIs of related entities, complete the list with additional entities from LOD

SLIDE 27

Questions?

Krisztian Balog

http://staff.science.uva.nl/~kbalog