SLIDE 1

Advanced Topics in Information Retrieval

Semantic Search

Vinay Setty Jannik Strötgen

vsetty@mpi-inf.mpg.de jannik.stroetgen@mpi-inf.mpg.de

ATIR – June 9, 2016

SLIDE 2

Motivation Semantic Web Knowledge Bases Entity Linking Semantic Search

What is Semantic Search?

c Jannik Strötgen – ATIR-06 2 / 64

SLIDE 3

Semantics: the study of meaning
Search: today is the sixth lecture ;-)

SLIDE 5

Semantic Search

Semantic Search is about
  • going beyond documents and queries as bags of words
  • having a deeper understanding of document contents by leveraging world knowledge as structured data
  • going beyond 10 blue links and providing users with direct answers to their (natural language) questions

SLIDE 6

List Queries

SLIDE 7

Factoid Questions

SLIDE 8

Further Examples

SLIDE 9

2014 Lecture

SLIDE 10

Still not solved ;-)

SLIDE 11

Solved...

SLIDE 12

Solved, with explanation

SLIDE 13

But still things beyond hope

SLIDE 14

...

SLIDE 15

But not just for question-style queries

SLIDE 16

And of course: it’s not just Google

SLIDE 17

But it is not perfect

SLIDE 19

But it is not perfect

fall of the Berlin Wall

November 9, 1989

presidency of Ronald Reagan

January 20, 1981 – January 20, 1989

SLIDE 22

Outline

1. Semantic Web
2. Knowledge Bases
3. Entity Linking
4. Semantic Search

SLIDE 24

Semantic Web

Semantic Web is an extension of the World Wide Web, envisioned by Berners-Lee et al. [2], which aims at
  • giving well-defined meaning to information (in Web pages)
  • making the Web interpretable for machines
  • facilitating exchange and reuse of data

SLIDE 25

Semantic Web Standards

World Wide Web Consortium (W3C) Semantic Web standards
  • Uniform Resource Identifier (URI) to uniquely identify an abstract or physical resource
  • Resource Description Framework (RDF) to describe properties of abstract or physical resources
  • Resource Description Framework Schema (RDFS) to describe schemata of properties of abstract or physical resources
  • Web Ontology Language (OWL) to describe ontologies
  • SPARQL Protocol and RDF Query Language (SPARQL) to formulate queries over properties of abstract or physical resources
  • schema.org [ASSIGNMENT]

SLIDE 26

URI

Uniform Resource Identifier (URI) is a string of characters that uniquely identifies an abstract or physical resource

  • http://en.wikipedia.org/wiki/Foo_Fighters
  • http://www.bbc.co.uk/music/artists/67f66c07-6e61-4026-ade5-7e782fad3a5d
  • http://www.musicbrainz.org/artist/67f66c07-6e61-4026-ade5-7e782fad3a5d

Example: http://www.host.org/pub/bands?query=FF (authority: www.host.org, path: /pub/bands, query: query=FF)

  • scheme (e.g., http, ftp, urn) determines interpretation of URI
  • authority indicates who is responsible for resource (e.g., a host)
  • path provides hierarchical information for identifying the resource
  • query provides non-hierarchical information for identifying the resource
  • fragment refers to a specific part of resource
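The component breakdown can be checked with Python's standard library. This is a small sketch only: the URI is the slide's example, and the `#members` fragment is added here as an assumption so that all five components are non-empty.

```python
from urllib.parse import urlsplit

# Example URI from the slide; the "#members" fragment is an illustrative addition.
uri = "http://www.host.org/pub/bands?query=FF#members"

parts = urlsplit(uri)
components = {
    "scheme": parts.scheme,        # determines interpretation of the URI
    "authority": parts.netloc,     # who is responsible for the resource
    "path": parts.path,            # hierarchical part
    "query": parts.query,          # non-hierarchical part
    "fragment": parts.fragment,    # specific part of the resource
}

for name, value in components.items():
    print(f"{name:<10} {value}")
```

`urlsplit` calls the authority `netloc`; the other names match the slide's terminology.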

SLIDE 27

RDF

Resource Description Framework (RDF) provides a data model to describe properties of resources (identified by their URI)
RDF statements are (S, P, O) triples consisting of a subject (URI), a predicate (URI), and an object (URI or literal)

Example:
  (S) http://dbtune.org/musicbrainz/page/artist/67f66c07-6e61-4026-ade5-7e782fad3a5d
  (P) http://xmlns.com/foaf/spec/20100809.html#member
  (O) http://dbtune.org/musicbrainz/page/artist/4d5f891d-9bce-45ae-ad86-912dd27252fa

  (S) --(P)--> (O): Foo Fighters have member Dave Grohl

SLIDE 28

RDF

RDF triples form an RDF graph (a labeled directed multigraph)

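what’s a
  • graph: a set of nodes connected by edges
  • directed graph: every edge has a direction (here: from subject to object)
  • labeled directed graph: every edge carries a label (here: the predicate)
  • labeled directed multigraph: several edges, with different labels, may connect the same pair of nodes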
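To make "labeled directed multigraph" concrete, here is a minimal sketch that stores triples as edges keyed by their (subject, object) pair. The `a:` names follow the lecture's Foo Fighters example; the `a:founder` predicate and the helper `build_multigraph` are hypothetical, introduced only to show two parallel edges.

```python
from collections import defaultdict

# Toy triples; a:founder is a hypothetical predicate used to create a parallel edge.
triples = [
    ("a:Foo_Fighters", "a:member", "a:Dave_Grohl"),
    ("a:Foo_Fighters", "a:founder", "a:Dave_Grohl"),
    ("a:Foo_Fighters", "a:member", "a:Pat_Smear"),
]

def build_multigraph(triples):
    """Map each directed node pair (subject, object) to the list of its edge labels."""
    edges = defaultdict(list)
    for s, p, o in triples:
        edges[(s, o)].append(p)
    return dict(edges)

graph = build_multigraph(triples)

# Two differently labeled edges connect the same ordered pair of nodes (multigraph).
parallel = graph[("a:Foo_Fighters", "a:Dave_Grohl")]
print(parallel)

# Edges are directed: the reverse pair is not present.
print(("a:Dave_Grohl", "a:Foo_Fighters") in graph)
```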

SLIDE 30

RDF

RDF triples form an RDF graph (a labeled directed multigraph)
Namespaces represent common URI prefixes and allow for a more compact representation of RDF data
RDF/N3 as one common text representation of RDF data

    @prefix a: <http://allaboutmusic.org/> .
    a:Foo_Fighters a:member a:Dave_Grohl .
    a:Foo_Fighters a:member a:Pat_Smear .
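What a namespace prefix buys can be sketched in a few lines: each prefixed name is shorthand for the namespace URI plus a local name. The helper `expand` is hypothetical, and the trailing slash on the namespace URI is an assumption made here so that expanded URIs are well formed.

```python
def expand(name, prefixes):
    """Expand a prefixed name such as 'a:Dave_Grohl' into a full URI."""
    prefix, _, local = name.partition(":")
    return prefixes[prefix] + local

# Namespace from the slide; the trailing slash is an assumption.
prefixes = {"a": "http://allaboutmusic.org/"}

compact = [
    ("a:Foo_Fighters", "a:member", "a:Dave_Grohl"),
    ("a:Foo_Fighters", "a:member", "a:Pat_Smear"),
]

# Expand every term of every triple.
full = [tuple(expand(term, prefixes) for term in triple) for triple in compact]

for triple in full:
    print(" ".join(f"<{term}>" for term in triple), ".")
```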

SLIDE 32

SPARQL

SPARQL Protocol and RDF Query Language (SPARQL) is a query language for the Semantic Web standardized by W3C

SPARQL from the linguistic point of view: a recursive acronym

SPARQL has a SQL-inspired syntax to define graph patterns and retrieves all matching subgraphs as query answers

SLIDE 33

SPARQL

Query:

    PREFIX a: <http://allmusic.org/>
    SELECT DISTINCT ?b ?r ?p
    WHERE {
      ?b a:hasMember ?p .
      ?p ?r a:Seattle .
    }
    ORDER BY ?p

Graph pattern and answer: (figures not preserved)
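How a graph pattern is matched against triples can be sketched without a SPARQL engine. The code below is a naive illustration, not how a real triple store evaluates queries: it extends variable bindings one triple pattern at a time. The toy triples (including `a:livesIn` and `a:bornIn`) and the function names are hypothetical.

```python
def match_pattern(pattern, triple, binding):
    """Try to extend a variable binding so that the pattern matches the triple."""
    binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                      # variable
            if binding.setdefault(term, value) != value:
                return None                           # conflicts with earlier binding
        elif term != value:                           # constant must match exactly
            return None
    return binding

def evaluate(patterns, triples):
    """Return all bindings that satisfy every triple pattern (nested-loop join)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings

# Hypothetical toy data.
triples = [
    ("a:Foo_Fighters", "a:hasMember", "a:Dave_Grohl"),
    ("a:Foo_Fighters", "a:hasMember", "a:Pat_Smear"),
    ("a:Nirvana", "a:hasMember", "a:Dave_Grohl"),
    ("a:Dave_Grohl", "a:livesIn", "a:Seattle"),
    ("a:Pat_Smear", "a:bornIn", "a:Los_Angeles"),
]

# WHERE { ?b a:hasMember ?p . ?p ?r a:Seattle . }
patterns = [("?b", "a:hasMember", "?p"), ("?p", "?r", "a:Seattle")]

# SELECT DISTINCT ?b ?r ?p ... ORDER BY ?p (ties broken by ?b and ?r for determinism).
rows = sorted(
    {(b["?b"], b["?r"], b["?p"]) for b in evaluate(patterns, triples)},
    key=lambda row: (row[2], row[0], row[1]),
)
for row in rows:
    print(row)
```

The shared variable `?p` is what joins the two patterns: a binding survives only if the same person appears as a band member and as the subject of some relation to `a:Seattle`.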

SLIDE 34

Linked Open Data Project

Problem: resources can be referred to by different URIs in different RDF datasets

Linked Open Data Project creates a Web of Linked Open Data by establishing owl:sameAs links between RDF data sources

As of 2011 (!)
  • 295 RDF data sources
  • 32 billion RDF triples
  • 504 million links

http://linkeddata.org/
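One way to use owl:sameAs links is to group URIs into equivalence classes, so that all URIs of a class are treated as the same resource. The sketch below uses a simple union-find structure. The three URIs are the Foo Fighters examples from the URI slide; that these particular sameAs links exist is an assumption for illustration.

```python
def find(parent, x):
    """Return the representative of x's equivalence class (with path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, x, y):
    """Merge the equivalence classes of x and y."""
    root_x, root_y = find(parent, x), find(parent, y)
    if root_x != root_y:
        parent[root_y] = root_x

WIKIPEDIA = "http://en.wikipedia.org/wiki/Foo_Fighters"
BBC = "http://www.bbc.co.uk/music/artists/67f66c07-6e61-4026-ade5-7e782fad3a5d"
MUSICBRAINZ = "http://www.musicbrainz.org/artist/67f66c07-6e61-4026-ade5-7e782fad3a5d"

# Assumed owl:sameAs links (illustrative only).
same_as = [(WIKIPEDIA, BBC), (BBC, MUSICBRAINZ)]

parent = {}
for left, right in same_as:
    union(parent, left, right)

# sameAs is transitive: Wikipedia and MusicBrainz end up in one class.
print(find(parent, WIKIPEDIA) == find(parent, MUSICBRAINZ))
```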

SLIDE 35

Linked Open Data Project – 2014

SLIDE 37

Knowledge Bases

Knowledge bases (e.g., DBpedia, Wikidata, YAGO)
  • provide data about real-world named entities, their properties, and relations between them
  • have typically been extracted from Wikipedia and various other data sources (WordNet, GeoNames, MusicBrainz, etc.)
  • form the core of the linked open data cloud

SLIDE 38

Wikipedia

Collaboratively edited encyclopedia that forms the common basis of other knowledge bases
  • 5.1 M articles / 3.4 M edits per month / 7,398 M views per month (for English, April 2016)
  • in 293 languages (with vastly different coverage)
  • templates, infoboxes, and categories (quasi-structured)
  • article contents (unstructured) with rich interlinkage
  • revision history (who edited which page at what time)
  • usage data (e.g., access statistics)
  • started in 2001

SLIDE 44

Wikidata

Collaboratively edited knowledge base
  • intended to provide common source of structured data for other related projects (e.g., Wikipedia)
  • 17 M data items, 345 M edits
  • started in 2012

SLIDE 51

DBpedia

Knowledge base extracted from Wikipedia
  • 4.6 M things (for English, as of 2014)
  • 80 M links to Wikipedia categories
  • 41 M links to YAGO categories
  • 50 M owl:sameAs links to other data sources
  • started in 2007

SLIDE 55

Freebase

Collaboratively edited knowledge base which includes information extracted from other data sources (e.g., Wikipedia, MusicBrainz)
  • 44 M topics (organized into domains, as of 2014)
  • 2.4 B facts
  • more than 60 M deleted facts (as of 2013)
  • focused on entertainment
  • many topics without textual data
  • started in 2007, acquired by Google in 2010
  • forms the core of the Google Knowledge Graph
  • now: Freebase data “moved” to Wikidata [4]

Attention: resources evolve, change, disappear

SLIDE 56

YAGO

Knowledge base extracted from Wikipedia, WordNet, and GeoNames
  • 10 M named entities
  • 120 M facts
  • 72 relation types with manually assessed accuracy (95%)
  • started in 2007

SLIDE 59

Entity Linking

Entity linking (a.k.a. “Wikification”): process of spotting mentions of named entities in documents, resolving their ambiguity, and linking them to known identifiers

Example: Page played his Gibson on Kashmir.
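The three steps (spotting, disambiguation, linking) can be illustrated with a deliberately tiny sketch. Everything below is hypothetical: the candidate dictionary, the entity identifiers, and the context keywords are made up for this one sentence, and real systems (for example AIDA, mentioned later in the lecture) use far richer evidence than keyword overlap.

```python
import re

# Hypothetical candidate dictionary: mention -> {entity identifier: context keywords}.
CANDIDATES = {
    "Page": {
        "Jimmy_Page": {"guitarist", "gibson", "kashmir", "played"},
        "Larry_Page": {"google", "founder", "search"},
    },
    "Gibson": {
        "Gibson_Les_Paul": {"guitar", "played", "page"},
        "Mel_Gibson": {"actor", "film"},
    },
    "Kashmir": {
        "Kashmir_(song)": {"song", "played", "page", "gibson"},
        "Kashmir_(region)": {"region", "india", "pakistan"},
    },
}

def link_entities(text):
    """Spot known mentions and pick the candidate whose keywords best overlap the context."""
    tokens = re.findall(r"\w+", text)
    context = {token.lower() for token in tokens}
    links = {}
    for mention in tokens:
        if mention in CANDIDATES:                       # spotting via dictionary lookup
            others = context - {mention.lower()}        # context without the mention itself
            links[mention] = max(                       # disambiguation by keyword overlap
                CANDIDATES[mention],
                key=lambda entity: len(CANDIDATES[mention][entity] & others),
            )
    return links

links = link_entities("Page played his Gibson on Kashmir.")
for mention, entity in links.items():
    print(f"{mention} -> {entity}")
```

Note that the sentence is only resolvable because the mentions disambiguate each other: "Gibson" and "Kashmir" are the main evidence for the guitarist reading of "Page".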

SLIDE 64

Remember: Named Entities

entity
  • anything you can refer to with a name
  • location, person, organization
  • facilities, vehicles, songs, movies, products (and domain-dependent ones: genes & proteins, ...)
  • sometimes: numbers, dates

relevant in IR
  • entities are popular and extremely frequent in queries
  • names are highly ambiguous
      Washington → place(s), person(s), (government)
      Springfield

SLIDE 65

Remember: Named Entity Recognition

tasks
  • extraction → determine the boundaries
  • classification → assign class (PER, LOC, ORG, ...)

systems
  • rule-based → with gazetteers, context-based rules (Mr.), ...
  • machine learning → features: mixed case (eBay), ends in digit (A9), all caps (BMW), ...
  • several tools available (e.g., Stanford NER)

extraction is good, but normalization is better
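The surface features named for the machine-learning systems are easy to compute per token. The sketch below is one possible reading of them; the function name and the exact definitions (for instance, "mixed case" as an uppercase letter after the first character together with at least one lowercase letter) are assumptions, not the definitions used by any particular tool.

```python
def shape_features(token):
    """Compute three surface features for a single token."""
    return {
        # an uppercase letter after the first character, plus some lowercase letter
        "mixed_case": any(c.isupper() for c in token[1:]) and any(c.islower() for c in token),
        # last character is a digit
        "ends_in_digit": token[-1:].isdigit(),
        # letters only, all uppercase
        "all_caps": token.isalpha() and token.isupper(),
    }

for token in ["eBay", "A9", "BMW", "Washington"]:
    print(token, shape_features(token))
```

A classifier would combine such features with context features (neighboring words, gazetteer matches) rather than rely on them alone.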

SLIDE 66

Remember: Named Entity Normalization

same task, many names
  • normalization
  • linking
  • resolution
  • grounding

example: Washington
  • /wiki/Washington,_D.C.
  • /wiki/Washington_%28state%29
  • /wiki/Washington_Irving
  • /wiki/Washington_Redskins
  • /wiki/George_Washington

tools
  • several tools available (AIDA, ...)

SLIDE 67

Assignment

just recognition, no normalization

SLIDE 69

Semantic Search

Semantic search includes a wide variety of approaches (see Tran and Mika [5] for a comprehensive overview)

SLIDE 75

Semantic Search Examples

  • (More) Semantic Keyword Search improves ad-hoc information retrieval by leveraging knowledge bases (input: keywords, output: list of documents)
  • Entity Search retrieves entities based on keyword queries which may include explicit/implicit cues about the target type (input: keywords + types, output: list of entities)
  • Knowledge Search retrieves subgraphs as answers to structured queries which may be augmented with keywords (input: structured + keywords, output: list of subgraphs)

SLIDE 76

Semantic Keyword Search

Synonymy (e.g., car/automobile) and polysemy (e.g., jaguar as cat, car, guitar): negative effects on retrieval effectiveness

Concept-based IR represents queries and documents in terms of (semantic) concepts instead of terms
  – WordNet synsets after word sense disambiguation
  – latent concepts identified based on word co-occurrences

Explicit Semantic Analysis (ESA) [3]: represents words (e.g., jaguar) in terms of the Wikipedia concepts that they describe (e.g., JAGUAR, JAGUAR_CARS, JAGUAR_XK, FENDER_JAGUAR)

do you remember last week?
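The idea behind concept-based matching can be sketched with made-up numbers: each word is mapped to a weighted vector of concepts, a text is the sum of its word vectors, and texts are compared by cosine similarity in concept space. The weights and the extra concepts (AUTOMOBILE, FELIDAE) below are hypothetical; in ESA they would be derived from how strongly a word is associated with each Wikipedia article.

```python
import math

# Hypothetical word-to-concept weights (illustrative only).
CONCEPT_WEIGHTS = {
    "jaguar": {"JAGUAR": 0.9, "JAGUAR_CARS": 0.8, "FENDER_JAGUAR": 0.5},
    "car": {"JAGUAR_CARS": 0.7, "AUTOMOBILE": 0.9},
    "automobile": {"AUTOMOBILE": 1.0, "JAGUAR_CARS": 0.4},
    "cat": {"JAGUAR": 0.6, "FELIDAE": 0.9},
}

def concept_vector(text):
    """Represent a text as the sum of the concept vectors of its words."""
    vector = {}
    for word in text.lower().split():
        for concept, weight in CONCEPT_WEIGHTS.get(word, {}).items():
            vector[concept] = vector.get(concept, 0.0) + weight
    return vector

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[c] * v[c] for c in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Synonymy: no shared term, but a high similarity in concept space.
synonym_similarity = cosine(concept_vector("car"), concept_vector("automobile"))

# Polysemy: context words pull "jaguar" toward different concepts.
car_sense = cosine(concept_vector("jaguar car"), concept_vector("automobile"))
cat_sense = cosine(concept_vector("jaguar cat"), concept_vector("automobile"))
print(synonym_similarity, car_sense, cat_sense)
```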

SLIDE 80

Entity Oriented Search

Entity Search retrieves entities based on keyword queries
  • queries may include explicit/implicit cues about target type (input: keywords + types, output: list of entities)

Queries for specific entities are common in web search (e.g., museums with van goghs, bayern munich players, etc.)

Product search with type cues is common in e-commerce

SLIDE 81

LM for Entity Oriented Search

Balog et al. [1] develop a language modeling approach, assuming that entities come with ready-to-use representation:
  • a textual description (Wikipedia article, IMDB profile, etc.)
  • a set of categories with meaningful names (e.g., American film)
      American film, 2002 film, films about drugs, ...
      tools, pocket knifes, power and hand tools, ...

SLIDE 82

LM for Entity Oriented Search

Query q = (T, C) consists of textual terms T and categories C

Estimate language models $\theta_q^T$ and $\theta_q^C$ for query components

Estimate language models $\theta_e^T$ and $\theta_e^C$ for entity components

Rank entities for query q according to query likelihood

$$P[q \mid e] = \lambda \cdot P[\theta_q^T \mid \theta_e^T] + (1 - \lambda) \cdot P[\theta_q^C \mid \theta_e^C]$$

with probabilities estimated from Kullback-Leibler divergences

SLIDE 83

LM for Entity Oriented Search

Heuristic estimation of probabilities $P[\theta_q \mid \theta_e]$ from Kullback-Leibler divergences to make them comparable and combinable

$$P[\theta_q \mid \theta_e] = \frac{\max_{e^*} KL(\theta_q \,\|\, \theta_{e^*}) - KL(\theta_q \,\|\, \theta_e)}{\sum_{e'} \left( \max_{e^*} KL(\theta_q \,\|\, \theta_{e^*}) - KL(\theta_q \,\|\, \theta_{e'}) \right)}$$

Language modeling framework with various instantiations evaluated and compared, e.g.,
  • estimate $\theta_q^T$ from T and C but ignore $\theta_q^C$ (M2)
  • estimate both $\theta_q^T$ and $\theta_q^C$ from T (M3)
  • estimate $\theta_q^T$ only from T and $\theta_q^C$ only from C (M4)
  • estimate $\theta_q^T$ from T and C and $\theta_q^C$ only from C (M5)
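The KL-based normalization and the mixture of the term and category components can be traced numerically. The sketch below assumes already-smoothed language models given as dictionaries; all probabilities, the third entity ("Some Film"), and the function names are hypothetical, and the estimation of the models themselves (the M2 to M5 variants) is not covered.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for distributions as dicts; q must be non-zero wherever p is."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)

def kl_to_probabilities(theta_q, entity_models):
    """Normalize (max KL - KL) over all entities so that the values sum to 1."""
    kl = {e: kl_divergence(theta_q, theta_e) for e, theta_e in entity_models.items()}
    worst = max(kl.values())
    total = sum(worst - value for value in kl.values())
    if total == 0:                       # all entities equally far from the query
        return {e: 1.0 / len(kl) for e in kl}
    return {e: (worst - value) / total for e, value in kl.items()}

def rank(query, entities, lam=0.5):
    """Score entities by lam * P[text] + (1 - lam) * P[categories], best first."""
    p_text = kl_to_probabilities(query["T"], {e: m["T"] for e, m in entities.items()})
    p_cat = kl_to_probabilities(query["C"], {e: m["C"] for e, m in entities.items()})
    scores = {e: lam * p_text[e] + (1 - lam) * p_cat[e] for e in entities}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical smoothed models over tiny vocabularies.
query = {"T": {"auster": 0.5, "novels": 0.5}, "C": {"novels": 1.0}}
entities = {
    "The New York Trilogy": {
        "T": {"auster": 0.6, "novels": 0.3, "woman": 0.1},
        "C": {"novels": 0.9, "films": 0.1},
    },
    "The Winthrop Woman": {
        "T": {"auster": 0.1, "novels": 0.4, "woman": 0.5},
        "C": {"novels": 0.9, "films": 0.1},
    },
    "Some Film": {
        "T": {"auster": 0.5, "novels": 0.1, "woman": 0.4},
        "C": {"novels": 0.1, "films": 0.9},
    },
}

ranking = rank(query, entities)
for entity, score in ranking:
    print(f"{score:.3f}  {entity}")
```

One consequence of the heuristic is visible in the numbers: within each component, the entity with the largest divergence from the query always receives probability 0.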

SLIDE 84

LM for Entity Oriented Search

dataset: INEX entity ranking: given a query, return entities

Example: Query T = {Paul Auster novels}, C = {NOVELS}

what performs better, M3 or M4?

SLIDE 85

LM for Entity Oriented Search

  • (M4) uses a general target category (“novel”)
  • “The Winthrop Woman (novel)” Wikipedia page has many categories with “novel”
  • one of the categories found for (M3) is “Novels by Paul Auster”
  • “The New York Trilogy” Wikipedia page has category “Novels by Paul Auster”

SLIDE 86

Summary

  • Semantic Web aims at making the Web interpretable for machines; underlying standards include RDF and SPARQL
  • Knowledge Bases provide world knowledge in structured form; have been harvested from Wikipedia and other data sources
  • Entity Linking is crucial (if we want to return text documents)
  • Entity Search retrieves entities based on keyword queries which may include type cues; applications in web search and business

Thank you for your attention!

SLIDE 87

Summary

References

[1] K. Balog, M. Bron, M. de Rijke: Query Modeling for Entity Search Based on Terms, Categories, and Examples, ACM TOIS 29(4), 2011
[2] T. Berners-Lee, J. Hendler, O. Lassila: The Semantic Web, Scientific American, 2001
[3] O. Egozi, S. Markovitch, E. Gabrilovich: Concept-Based Information Retrieval Using Explicit Semantic Analysis, ACM TOIS 29(2), 2011
[4] T. Pellissier Tanon, D. Vrandecic, S. Schaffert, T. Steiner, L. Pintscher: From Freebase to Wikidata: The Great Migration, WWW 2016
[5] T. Tran and P. Mika: Semantic Search – Systems, Concepts, Methods, and the Communities behind it, 2013

SLIDE 88

Summary

Furthermore:
  • U. Sawant and S. Chakrabarti: Learning Joint Query Interpretation and Response Ranking, WWW 2013
  • M. Yahya, K. Berberich, S. Elbassuoni, M. Ramanath, V. Tresp, G. Weikum: Natural Language Questions for the Web of Data, EMNLP 2012

SLIDE 89

Thanks

some slides / examples are taken from / similar to those of: Klaus Berberich, Saarland University, previous ATIR lecture
