Entity Representation and Retrieval
Laura Dietz
University of New Hampshire
Alexander Kotov
Wayne State University
Edgar Meij
Bloomberg L.P.
WSDM 2017 Tutorial on Utilizing KGs in Text-centric IR
◮ Users often search for concrete entities (e.g. products or locations), rather than documents
◮ Search results are names of entities or entity representations (i.e. entity cards)
◮ Users are willing to express their information need more elaborately than with a few keywords [Balog et al. 2008]
◮ Knowledge graphs are perfectly suited for entity retrieval
◮ Entity Search: simple queries aimed at finding a particular entity or an entity which is an attribute of another entity
  ◮ “Ben Franklin”
  ◮ “Einstein Relativity theory”
  ◮ “England football player highest paid”
◮ List Search: descriptive queries with several relevant entities
  ◮ “US presidents since 1960”
  ◮ “animals lay eggs mammals”
  ◮ “Formula 1 drivers that won the Monaco Grand Prix”
◮ Question Answering: queries are questions in natural language
  ◮ “Who founded Intel?”
  ◮ “For which label did Elvis record his first album?”
◮ Assumes keyword queries (structured queries are studied in the database community)
◮ Different from ad-hoc entity retrieval, which is focused on retrieving entities embedded in documents, e.g.:
  ◮ Entity track at TREC 2009–2011
  ◮ Entity Ranking track at INEX 2007–2009
  ◮ Expert Finding in Enterprise Search
◮ Different from entity linking, which aims at identifying entities mentioned in queries (part 1 of this tutorial)
◮ Can be combined with methods using KGs for ad-hoc or Web search (part 3 of this tutorial)
◮ Unique IR problem: there are no documents
◮ Challenging IR problem: knowledge graphs are designed for graph pattern-based SPARQL queries
Entity retrieval from knowledge graphs (ERKG) requires accurate interpretation of unstructured textual queries and matching them with entity semantics: how can keywords be matched against entity properties and relations to other entities?
[Tonon, Demartini et al., SIGIR’12]
Outline:
◮ Entity representation
◮ Entity retrieval
◮ Entity set expansion
◮ Entity ranking
Build a textual representation (i.e. a “document”) for each entity by considering all triples in which it appears as the subject (or object)
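As a concrete illustration of building such an entity document, here is a minimal sketch (not from the tutorial) that collects the objects of an entity's triples into fields via a folding map; the predicate names and the folding scheme are illustrative assumptions, not a specific system's configuration:

```python
# Sketch: build a fielded "entity document" from RDF-style triples.
# The FOLD map is a hypothetical predicate-folding scheme; unknown
# predicates fall back into the "attributes" field.
from collections import defaultdict

FOLD = {
    "rdfs:label": "names",
    "foaf:name": "names",
    "dbo:abstract": "attributes",
    "dcterms:subject": "categories",
}

def build_entity_document(entity, triples):
    """Collect objects of triples where `entity` is the subject into fields."""
    doc = defaultdict(list)
    for s, p, o in triples:
        if s == entity:
            doc[FOLD.get(p, "attributes")].append(o)
    # Each field becomes a flat text, ready for fielded retrieval models.
    return {field: " ".join(vals) for field, vals in doc.items()}

triples = [
    ("dbr:Einstein", "rdfs:label", "Albert Einstein"),
    ("dbr:Einstein", "dbo:abstract", "German-born theoretical physicist"),
    ("dbr:Einstein", "dcterms:subject", "Nobel laureates in Physics"),
]
doc = build_entity_document("dbr:Einstein", triples)
```

The same loop, with a different `FOLD` map, yields the two-, three-, and five-field representations discussed on the following slides.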
◮ Simple approach: each predicate corresponds to one document field
◮ Problem: there are infinitely many predicates → optimization of field importance weights is computationally intractable
◮ Predicate folding: group predicates into a small set of predefined categories → entity documents with a smaller number of fields
  ◮ By predicate type (attributes, incoming/outgoing links) [Pérez-Agüera et al. 2010]
  ◮ By predicate importance (determined based on predicate popularity) [Blanco et al. 2010]
[Neumayer, Balog et al., ECIR’12]
Each entity is represented as a two-field document:
  title: literals of “label” or “title” predicates
  content: literals of all other predicates, concatenated together into a flat text representation
[Zhiltsov and Agichtein, CIKM’13]
Each entity is represented as a three-field document:
  names: literals of foaf:name, rdfs:label predicates along with tokens extracted from entity URIs
  attributes: literals of all other predicates
  outgoing links: names of entities in the object position
[Zhiltsov, Kotov et al., SIGIR’15]
Each entity is represented as a five-field document:
  names: conventional names of entities, such as the name of a person or the name of an organization
  attributes: all entity properties, other than names
  categories: classes or groups, to which the entity has been assigned
  similar entity names: names of the entities that are very similar or identical to a given entity
  related entity names: names of entities in the object position
[Graus, Tsagkias et al., WSDM’16]
◮ Problem: vocabulary mismatch between an entity's description in a knowledge base and the way people refer to the entity when searching for it
◮ Entity representations should account for:
  ◮ Context: entities can appear in different contexts (e.g. Germany should be returned for queries related to World War II and the 2014 Soccer World Cup)
  ◮ Time: entities are not static in how they are perceived (e.g. Ferguson, Missouri before and after August 2014)
Leverage collective intelligence provided by different entity description sources (KBs, web anchors, tweets, social tags, query log) to fill in the “vocabulary gap”:
◮ Create and update entity representations based on different sources
◮ Combine different entity descriptions for retrieval at specific time intervals by dynamically assigning weights to different sources
Represent entities as fielded documents, in which each field corresponds to the content that comes from one description source:
◮ Knowledge base: anchor text of inter-knowledge-base hyperlinks, redirects, category titles, names of entities that are linked from and to each entity in Wikipedia
◮ Web anchors: anchor text of links to Wikipedia pages from the Google Wikilinks corpus
◮ Twitter: all English tweets that contain links to Wikipedia pages representing entities in the used snapshot
◮ Delicious: tags associated with Wikipedia pages in the SocialBM0311 dataset
◮ Queries: queries that result in clicks on Wikipedia pages in the used snapshot
The fields of the entity document

e = \{\bar{f}^e_{title}, \bar{f}^e_{text}, \bar{f}^e_{anchors}, \ldots, \bar{f}^e_{query}\}

are updated at each discretized time point T = \{t_1, t_2, t_3, \ldots, t_n\}:

\bar{f}^e_{query}(t_i) = \bar{f}^e_{query}(t_{i-1}) + \begin{cases} q, & \text{if } e \text{ clicked} \\ 0, & \text{otherwise} \end{cases}

\bar{f}^e_{tweets}(t_i) = \bar{f}^e_{tweets}(t_{i-1}) + tweet^e

\bar{f}^e_{tags}(t_i) = \bar{f}^e_{tags}(t_{i-1}) + tag^e
Each field’s contribution towards the final entity score is determined based on features
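The update rule above can be sketched in a few lines, treating each field as a bag-of-words vector carried over from the previous time point and incremented with newly observed terms (a minimal sketch; the field names and terms are illustrative):

```python
# Sketch of the dynamic field update: f(t_i) = f(t_{i-1}) + new content at t_i.
from collections import Counter

def update_field(field_prev, new_terms):
    """Return the field vector at t_i given the vector at t_{i-1}."""
    field = Counter(field_prev)
    field.update(new_terms)          # add terms observed at t_i
    return field

query_field = Counter()                                          # at t_0
query_field = update_field(query_field, ["ferguson", "riots"])   # e clicked
query_field = update_field(query_field, [])                      # no click: + 0
```

The tweet and tag fields follow the same additive pattern with tweet terms and tag terms, respectively.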
◮ Field similarity: TF-IDF cosine similarity of the query and field f at time t_i
◮ Field importance (favor fields with more novel content): field's length in terms; field's length in characters; field's novelty at time t_i (favor fields with unseen, newly associated terms); number of updates to the field from t_0 through t_i
◮ Entity importance (favor recently updated entities): time since the last entity update
A classification-based ranker supervised by clicks learns the optimal feature weights
[Figure: retrieval performance of (a) adaptive runs and (b) non-adaptive runs]
◮ Social tags are the best performing single entity description source
◮ KB+queries yields substantial relative improvement → added queries provide a strong signal for ranking the clicked entities
◮ Rankers that incorporate dynamic description sources (i.e. KB+tags, KB+tweets and KB+queries) show the highest learning rate → entity content from these sources accounts for changes in entity representations over time
Outline:
◮ Entity representation
◮ Entity retrieval
◮ Entity set expansion
◮ Entity ranking
◮ Structured entity documents can be retrieved using structured document retrieval models (BM25F, MLM)
◮ Problem: how to set the weights of document fields?
  ◮ Heuristically: proportionate to the length of content in the field
  ◮ Empirically: by optimizing the target retrieval metric using training queries
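To make the role of the field weights concrete, here is a minimal MLM-style sketch: the score of an entity document is the sum, over query terms, of the log of a field-weighted mixture of Dirichlet-smoothed per-field term probabilities. The weights, smoothing parameter, and collection probability are toy values, not tuned settings:

```python
# Minimal Mixture of Language Models (MLM) scoring sketch.
import math

def mlm_score(query_terms, fields, weights, mu=100.0, collection_prob=1e-4):
    score = 0.0
    for q in query_terms:
        mix = 0.0
        for name, text in fields.items():
            tokens = text.split()
            tf = tokens.count(q)
            # Dirichlet-smoothed field language model probability
            p = (tf + mu * collection_prob) / (len(tokens) + mu)
            mix += weights[name] * p   # field weight controls the mixture
        score += math.log(mix)
    return score

fields = {"names": "albert einstein", "attributes": "theoretical physicist"}
weights = {"names": 0.6, "attributes": 0.4}
s = mlm_score(["einstein", "physicist"], fields, weights)
```

Changing `weights` shifts how much each field contributes, which is exactly the quantity set heuristically or tuned empirically above.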
[Zhiltsov, Kotov et al., SIGIR’15]
Previous research in ad-hoc IR has focused on two major directions:
◮ unigram bag-of-words retrieval models for multi-fielded documents
  ◮ Ogilvie and Callan. Combining Document Representations for Known-item Search, SIGIR'03 (MLM)
  ◮ Robertson et al. Simple BM25 Extension to Multiple Weighted Fields, CIKM'04 (BM25F)
◮ retrieval models incorporating term dependencies
  ◮ Metzler and Croft. A Markov Random Field Model for Term Dependencies, SIGIR'05 (SDM)
Goal: to develop a retrieval model that captures both document structure and term dependencies
[Metzler and Croft, SIGIR’05]
Ranks documents w.r.t.

P_\Lambda(D \mid Q) \overset{rank}{=} \sum_{i \in \{T, O, U\}} \lambda_i f_i(Q, D)

The potential function for unigrams is query likelihood (QL) with Dirichlet smoothing:

f_T(q_i, D) = \log P(q_i \mid \theta_D) = \log \frac{tf_{q_i,D} + \mu \frac{cf_{q_i}}{|C|}}{|D| + \mu}

SDM only considers two-word sequences in queries; FDM considers all two-word combinations.
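The Dirichlet-smoothed unigram potential above translates directly into code; the collection statistics below are toy values for illustration:

```python
# The SDM unigram potential f_T with Dirichlet smoothing:
# log( (tf + mu * cf/|C|) / (|D| + mu) )
import math

def f_T(q, doc_tokens, cf_q, C_size, mu=2500.0):
    """log P(q | theta_D) under Dirichlet smoothing."""
    tf = doc_tokens.count(q)
    return math.log((tf + mu * cf_q / C_size) / (len(doc_tokens) + mu))

doc = "neil armstrong walked on the moon".split()
score_present = f_T("moon", doc, cf_q=10, C_size=100000)  # term occurs in D
score_absent = f_T("mars", doc, cf_q=10, C_size=100000)   # smoothing only
```

Smoothing keeps the probability nonzero for absent terms while still ranking documents containing the term higher.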
FSDM incorporates document structure and term dependencies with the following ranking function:

P_\Lambda(D \mid Q) \overset{rank}{=} \lambda_T \sum_i \tilde{f}_T(q_i, D) + \lambda_O \sum_i \tilde{f}_O(q_i, q_{i+1}, D) + \lambda_U \sum_i \tilde{f}_U(q_i, q_{i+1}, D)

Separate MLMs for bigrams and unigrams give FSDM the flexibility to adjust the document scoring depending on the query type. MLM is a special case of FSDM, when \lambda_T = 1, \lambda_O = 0, \lambda_U = 0.
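The FSDM combination itself is just a weighted sum of unigram, ordered-bigram, and unordered-bigram potentials; a minimal sketch (the potential functions are passed in as parameters, and the lambda values are illustrative):

```python
# Sketch of the FSDM ranking function: a weighted sum of unigram (f_T),
# ordered bigram (f_O) and unordered bigram (f_U) potentials.
def fsdm_score(query_terms, doc, f_T, f_O, f_U, lambdas=(0.8, 0.1, 0.1)):
    lam_T, lam_O, lam_U = lambdas
    score = sum(lam_T * f_T(q, doc) for q in query_terms)
    for q1, q2 in zip(query_terms, query_terms[1:]):  # adjacent query bigrams
        score += lam_O * f_O(q1, q2, doc)
        score += lam_U * f_U(q1, q2, doc)
    return score

# With lambdas = (1, 0, 0), FSDM reduces to the unigram (MLM) score,
# matching the special case stated above. Stub potentials for illustration:
unigram_only = fsdm_score(
    ["apollo", "astronauts"], None,
    f_T=lambda q, d: -1.0,
    f_O=lambda a, b, d: -2.0,
    f_U=lambda a, b, d: -3.0,
    lambdas=(1.0, 0.0, 0.0),
)
```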
Potential function for unigrams in the case of FSDM:

\tilde{f}_T(q_i, D) = \log \sum_j w^T_j P(q_i \mid \theta^D_j) = \log \sum_j w^T_j \frac{tf_{q_i,D_j} + \mu_j \frac{cf^j_{q_i}}{|C_j|}}{|D_j| + \mu_j}

Example: in the query “apollo astronauts who walked on the moon”, the concept “apollo astronauts” is best matched against the category field, while “who walked on the moon” is best matched against the attribute field.
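The fielded unigram potential can be sketched as a log of a field-weighted mixture of per-field Dirichlet-smoothed probabilities; field weights and collection statistics below are toy values:

```python
# Sketch of the FSDM fielded unigram potential:
# log sum_j w_j * (tf_{q,D_j} + mu_j * cf^j_q/|C_j|) / (|D_j| + mu_j)
import math

def f_T_fielded(q, field_docs, field_weights, field_stats, mu=100.0):
    mix = 0.0
    for j, tokens in field_docs.items():
        cf, C = field_stats[j]  # collection frequency of q in field j, |C_j|
        p_j = (tokens.count(q) + mu * cf / C) / (len(tokens) + mu)
        mix += field_weights[j] * p_j
    return math.log(mix)

field_docs = {"category": "apollo astronauts".split(),
              "attributes": "walked on the moon".split()}
stats = {"category": (5, 10000), "attributes": (5, 10000)}
w = {"category": 0.5, "attributes": 0.5}
hi = f_T_fielded("moon", field_docs, w, stats)
lo = f_T_fielded("mars", field_docs, w, stats)
```

Raising the weight of the field that actually contains the concept (category for “apollo astronauts”, attributes for “walked on the moon”) is what the field weights buy over a flat representation.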
◮ DBpedia 3.7 as a knowledge graph
◮ Queries from Balog and Neumayer. A Test Collection for Entity Search in DBpedia, SIGIR'13.

Query set    | Amount | Query types [Pound et al., 2010]
SemSearch ES | 130    | Entity
ListSearch   | 115    | Type
INEX-LD      | 100    | Entity, Type, Attribute, Relation
QALD-2       | 140    | Entity, Type, Attribute, Relation
Query set    | Method | MAP     | P@10    | P@20    | b-pref
SemSearch ES | MLM-CA | 0.320   | 0.250   | 0.179   | 0.674
             | SDM-CA | 0.254*  | 0.202*  | 0.149*  | 0.671
             | FSDM   | 0.386*† | 0.286*† | 0.204*† | 0.750*†
ListSearch   | MLM-CA | 0.190   | 0.252   | 0.192   | 0.428
             | SDM-CA | 0.197   | 0.252   | 0.202   | 0.471*
             | FSDM   | 0.203   | 0.256   | 0.203   | 0.466*
INEX-LD      | MLM-CA | 0.102   | 0.238   | 0.190   | 0.318
             | SDM-CA | 0.117*  | 0.258   | 0.199   | 0.335
             | FSDM   | 0.111*  | 0.263*  | 0.215*† | 0.341*
QALD-2       | MLM-CA | 0.152   | 0.103   | 0.084   | 0.373
             | SDM-CA | 0.184   | 0.106   | 0.090   | 0.465*
             | FSDM   | 0.195*  | 0.136*† | 0.111*  | 0.466*
All queries  | MLM-CA | 0.196   | 0.206   | 0.157   | 0.455
             | SDM-CA | 0.192   | 0.198   | 0.155   | 0.495*
             | FSDM   | 0.231*† | 0.231*† | 0.179*† | 0.517*†
In FSDM, field weights are the same for all query concepts of the same type.
Example
capitals in Europe which were host cities of summer Olympic games
w^T_{q_i,j} = \sum_k \alpha^U_{j,k} \varphi_k(q_i, j)

◮ \varphi_k(q_i, j) is the k-th feature value for unigram q_i in field j
◮ \alpha^U_{j,k} are feature weights that we learn
◮ Constraints: \sum_j w^T_{q_i,j} = 1, \quad w^T_{q_i,j} \geq 0, \quad \alpha^U_{j,k} \geq 0, \quad 0 \leq \varphi_k(q_i, j) \leq 1
Source                | Feature  | Description                                              | CT
Collection statistics | FP(κ, j) | Posterior probability P(Ej|w)                            | UG, BG
                      | TS(κ, j) | Top SDM score on j-th field when κ is used as a query    | BG
Stanford POS Tagger   | NNP(κ)   | Is concept κ a proper noun?                              | UG
                      | NNS(κ)   | Is κ a plural non-proper noun?                           | UG, BG
                      | JJS(κ)   | Is κ a superlative adjective?                            | UG
Stanford Parser       | NPP(κ)   | Is κ part of a noun phrase?                              | BG
                      | NNO(κ)   | Is κ the only singular non-proper noun in a noun phrase? | UG
                      | INT      | Intercept feature (= 1)                                  | UG, BG
Outline:
◮ Entity representation
◮ Entity retrieval
◮ Entity set expansion
◮ Entity ranking
[Tonon, Demartini et al., SIGIR’12]
◮ Maintain an inverted index for entity representations and a triple store for entity relations
◮ Hybrid approach: IR models for initial entity retrieval and SPARQL queries for expansion
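The expansion step can be sketched as a SPARQL query built around a seed entity returned by the IR stage. The predicate names follow those reported in this line of work (owl:sameAs, dbpedia:redirect, dbpedia:disambiguates); the query shape and the omitted PREFIX declarations are illustrative assumptions:

```python
# Sketch: build a SPARQL expansion query for a seed entity URI.
# PREFIX declarations for owl: and dbpedia: are omitted for brevity.
def expansion_query(seed_uri):
    return f"""
    SELECT DISTINCT ?e WHERE {{
      {{ <{seed_uri}> owl:sameAs ?e }}
      UNION {{ ?e owl:sameAs <{seed_uri}> }}
      UNION {{ ?e dbpedia:redirect <{seed_uri}> }}
      UNION {{ ?e dbpedia:disambiguates <{seed_uri}> }}
    }}"""

q = expansion_query("http://dbpedia.org/resource/Barack_Obama")
```

The entities bound to `?e` are then added to the result list produced by the inverted-index retrieval.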
◮ Follow predicates leading to entity attributes
◮ Explore entity neighbors and the neighbors of neighbors
Method      | MAP (2010)     | P@10 (2010)    | MAP (2011)    | P@10 (2011)
BM25        | 0.2070         | 0.3348         | 0.1484        | 0.2020
SAS         | 0.2293* (+11%) | 0.363* (+8%)   | 0.1612 (+9%)  | 0.2200 (+9%)
SAS+DIS+RED | 0.2586* (+25%) | 0.3848* (+15%) | 0.1657 (+12%) | 0.2140 (+6%)

◮ Best performing method exploits entity neighbors by following <owl:sameAs> (SAS) as well as <dbpedia:redirect> (RED) and <dbpedia:disambiguates> (DIS) predicates
◮ Looking further into the KG for related entities and following general predicates (<dbpedia:wikilink>, <skos:subject>, <foaf:homepage>, etc.) does not improve results
Outline:
◮ Entity representation
◮ Entity retrieval
◮ Entity set expansion
◮ Entity ranking
[Dali and Fortuna, WWW’11]
◮ Variety of features:
  ◮ Popularity and importance of the Wikipedia page: # of accesses from logs, # of edits, page length
  ◮ RDF features: # of triples where E is subject/object/subject with a literal object, # of categories the Wikipedia page for E belongs to, size of the biggest/smallest/median category
  ◮ HITS scores and PageRank of the Wikipedia page and of E in the RDF graph
  ◮ # of hits from a search engine API for the top 5 keywords from the abstract of the Wikipedia page for E
  ◮ Count of entity name in Google N-grams
◮ RankSVM learning-to-rank method
◮ Initial set of entities obtained using SPARQL queries
◮ 14 example queries for DBpedia and 27 example queries for Yago
◮ Example queries: “Which athlete was born in Philadelphia?”, “List … language?”, “Which objects are heavier than the Iosif Stalin tank?”
◮ Features approximating the importance, hub and authority scores, PageRank of the Wikipedia page are effective
◮ PageRank and HITS scores on the RDF graph are not effective (outperformed by simpler RDF features)
◮ Google N-grams is an effective proxy for entity popularity, cheaper than a search engine API
◮ Feature combinations improve both robustness and accuracy of ranking
◮ Ranking model was trained on DBpedia questions and applied to Yago questions
◮ Only feature set A (all features) results in robust ranking model transfer
◮ In general, the ranking models for different knowledge graphs are non-transferable, unless they have been learned on a large number of features
◮ The biggest inconsistencies occur in the models trained on graph-based features → knowledge graphs preserve particularities reflecting their designers' decisions
[Zhiltsov and Agichtein, CIKM’13]
◮ Compact representation of entities in a low-dimensional space by using a modified algorithm for tensor factorization
◮ Entities and entity–query pairs are represented with term-based and structural features
◮ For a knowledge graph with n distinct entities and m distinct predicates, we construct a tensor X of size n × n × m, where X_ijk = 1 if the k-th predicate holds between the i-th entity and the j-th entity, and X_ijk = 0 otherwise
◮ Each k-th frontal tensor slice X_k is an adjacency matrix for the k-th predicate, which is sparse
[Nickel, Tresp et al., WWW’12]
◮ Given r, the number of latent factors, we factorize each X_k into the matrix product X_k = A R_k A^T, k = 1, …, m, where A is a dense n × r matrix of latent embeddings for entities, and R_k is an r × r matrix of latent factors
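The factorization's shape can be checked on tiny hand-built factors; this sketch only verifies the A R_k Aᵀ reconstruction (with assumed toy values), not the actual fitting algorithm:

```python
# Sketch: reconstruct one predicate slice X_k ≈ A R_k A^T from tiny factors.
def matmul(X, Y):
    """Plain-Python matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

A = [[1.0, 0.0],           # n = 3 entities, r = 2 latent factors
     [0.0, 1.0],
     [1.0, 1.0]]
R_k = [[0.0, 1.0],         # predicate k connects factor 1 to factor 2
       [0.0, 0.0]]
A_T = [list(col) for col in zip(*A)]

X_k_hat = matmul(matmul(A, R_k), A_T)   # n x n reconstructed adjacency slice
```

Entities sharing latent factors end up linked in the reconstruction, which is what lets the low-dimensional embeddings in A serve as compact entity representations.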
#  | Feature
Term-based features
1  | Query length
2  | Query clarity
3  | Uniformly weighted MLM score
4  | Bigram relevance score for the “name” field
5  | Bigram relevance score for the “attributes” field
6  | Bigram relevance score for the “outgoing links” field
Structural features
7  | Top-3 entity cosine similarity, cos(e, e_top)
8  | Top-3 entity Euclidean distance, ‖e − e_top‖
9  | Top-3 entity heat kernel, exp(−‖e − e_top‖² / σ)
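The three structural features reduce to standard vector operations on the latent embeddings; a self-contained sketch with toy 3-dimensional vectors and an illustrative σ:

```python
# The structural features: cosine similarity, Euclidean distance and
# heat kernel between an entity embedding e and a top-ranked entity e_top.
import math

def cosine(e, e_top):
    dot = sum(a * b for a, b in zip(e, e_top))
    return dot / (math.sqrt(sum(a * a for a in e)) *
                  math.sqrt(sum(b * b for b in e_top)))

def euclidean(e, e_top):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e, e_top)))

def heat_kernel(e, e_top, sigma=1.0):
    # exp(-||e - e_top||^2 / sigma)
    return math.exp(-euclidean(e, e_top) ** 2 / sigma)

e, e_top = [1.0, 0.0, 1.0], [1.0, 0.0, 1.0]   # identical toy embeddings
```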
Features            | NDCG            | MAP            | P@10
Term-based baseline | 0.382           | 0.265          | 0.539
All features        | 0.401 (+5.0%)*  | 0.276 (+4.2%)  | 0.561 (+4.1%)*
[Schuhmacher, Dietz et al., CIKM’15]
Aim: complex entity-focused informational queries (e.g. “Argentine British relations”)
Group                  | Feature     | Description
Mention features       | MenFrq      | # of entity occurrences in top documents
                       | MenFrqIdf   | entity IDF
Query–mention features | SED         | normalized Levenshtein distance
                       | Glo         | similarity based on GloVe embeddings
                       | Jo          | similarity based on JoBimText embeddings
Query–entity features  | QEnt        | is document entity linked in query
                       | QEntEntSim  | is there a path in KG between document and query entities
                       | WikiBoolean | is entity retrieved by query using Boolean model over Wikipedia articles
                       | WikiSDM     | SDM retrieval score of entity by query over Wikipedia articles
                       | Wikipedia   | is there a path between two entities in DBpedia KG

Rankers:
◮ RankSVM with linear kernel and linear+semantic smoothing kernels (pairwise)
◮ coordinate ascent
◮ Authoritativeness marginally correlates with relevance (entities ranked high by PageRank are very general)
◮ Best results are obtained when ranking using SDM (supported by INEX results) and normalized mention frequencies
◮ RankLib performs better than SVM-rank with or without the semantic kernel
◮ Context query-mention features (prefix C) perform worse than their no-context counterparts (prefix M)
◮ Context features based on edit distance and distributional similarity are not effective
◮ DBpedia-based features have a positive but insignificant influence on the overall performance, while Wikipedia-based features show a strong and significant influence
◮ Use dynamic entity representations built from different sources (not only the knowledge base)
◮ Use retrieval models that account for different query concept types (FSDM and PFSDM) rather than standard fielded document retrieval models (BM25F and MLM) to obtain candidate entities
◮ Expand candidate entities by following KG links and using top-retrieved documents
◮ Re-rank candidate entities by using a variety of features, including latent dimensional entity representations
Entity representation methods:
◮ Neumayer, Balog and Nørvåg. When Simple is (more than) Good Enough: Effective Semantic Search with (almost) no Semantics, ECIR'12
◮ Zhiltsov, Kotov and Nikolaev. Fielded Sequential Dependence Model for Ad-hoc Entity Retrieval in the Web of Data, SIGIR'15
◮ Graus, Tsagkias, Weerkamp, Meij and de Rijke. Dynamic Collective Entity Representations for Entity Ranking, WSDM'16

Entity retrieval and ranking:
◮ Zhiltsov, Kotov and Nikolaev. Fielded Sequential Dependence Model for Ad-hoc Entity Retrieval in the Web of Data, SIGIR'15
◮ Nikolaev, Kotov and Zhiltsov. Parameterized Fielded Term Dependence Models for Ad-hoc Entity Retrieval from Knowledge Graph, SIGIR'16
◮ Tonon, Demartini and Cudré-Mauroux. Combining Inverted Indices and Structured Search for Ad-hoc Object Retrieval, SIGIR'12
◮ Zhiltsov and Agichtein. Improving Entity Search over Linked Data by Modeling Latent Semantics, CIKM'13
◮ Schuhmacher, Dietz and Ponzetto. Ranking Entities for Web Queries through Text and Knowledge, CIKM'15