Entity Representation and Retrieval
Laura Dietz
University of New Hampshire
Alexander Kotov
Wayne State University
Edgar Meij
Bloomberg
SIGIR 2018 Tutorial on Utilizing KGs for Text-centric IR
Besides documents, users often search for concrete or abstract entities/objects (e.g., people, products, organizations, books)
Users are willing to express these information needs more elaborately than with a few keywords [Balog et al., SIGIR’08]
Entities (or entity cards) provide immediate answers to such queries → natural units for organizing search results
Knowledge graphs are built around entities → Entity Retrieval from Knowledge Graph(s) (ERKG)
Entity Search: simple queries aimed at finding a particular entity or an entity that is an attribute of another entity
◮ “Ben Franklin”
◮ “Einstein Relativity theory”
◮ “England football player highest paid”
List Search: descriptive queries with several relevant entities
◮ “US presidents since 1960”
◮ “animals lay eggs mammals”
◮ “Formula 1 drivers that won the Monaco Grand Prix”
Question Answering: queries are natural-language questions
◮ “Who founded Intel?”
◮ “For which label did Elvis record his first album?”
Evolution of entity retrieval tasks:
◮ Expert search at the TREC 2005–2008 enterprise track: find experts knowledgeable about a given topic
◮ Entity ranking track at INEX 2007–2009: find the Wikipedia pages of entities with a given target type
◮ Related entity search at the TREC 2009–2011 entity track: find Web pages of entities related to a given entity in a certain way
Can also be used for entity linking: a fragment of text as the query, a list of linked entities as the result
Can be combined with methods that use KGs for ad-hoc or Web search (part 3 of this tutorial)
Unique IR problem: there are no documents; entities in a KG have no textual representation apart from their names
Challenging IR problem: knowledge graphs are best suited for structured, graph-pattern-based SPARQL queries, not for traditional IR models
ERKG requires accurate interpretation of unstructured textual queries and matching them with entity semantics:
◮ How to leverage entity properties and relations to other entities?
◮ How to build entity representations?
[Tonon, Demartini et al., SIGIR’12]
Outline:
◮ Entity representation
◮ Entity retrieval
◮ Entity set expansion
◮ Entity ranking
Build a textual representation (i.e., a “document”) for each entity from all triples in which it appears as the subject (or object)
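As a minimal Python sketch (not from the tutorial), building such an entity “document” can look like this; the triples, URIs, and label map below are hypothetical examples:

```python
# Sketch: build a textual "document" for an entity from all triples in
# which it appears as subject or object. Entity URIs and labels are
# made-up examples in DBpedia-like style.

def build_entity_document(entity, triples, labels):
    """Concatenate (labels of) the other end of every triple touching `entity`."""
    parts = []
    for s, p, o in triples:
        if s == entity:                 # entity as subject: keep the object
            parts.append(labels.get(o, o))
        elif o == entity:               # entity as object: keep the subject
            parts.append(labels.get(s, s))
    return " ".join(parts)

triples = [
    ("dbr:Ben_Franklin", "rdf:type", "dbo:Scientist"),
    ("dbr:Ben_Franklin", "dbo:birthPlace", "dbr:Boston"),
    ("dbr:Poor_Richard", "dbo:author", "dbr:Ben_Franklin"),
]
labels = {"dbo:Scientist": "Scientist", "dbr:Boston": "Boston",
          "dbr:Poor_Richard": "Poor Richard's Almanack"}

doc = build_entity_document("dbr:Ben_Franklin", triples, labels)
```

The resulting flat text can then be indexed with any standard IR engine.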
Simple approach: each predicate corresponds to one field of the entity document
Problem: KGs contain a very large number of distinct predicates → optimizing per-field importance weights is computationally intractable
Predicate folding: group predicates into a small set of predefined categories → entity documents with a smaller number of fields
◮ by predicate type (attributes, incoming/outgoing links) [Pérez-Agüera et al., SemSearch 2010]
◮ by predicate importance (determined based on predicate popularity) [Blanco et al., ISWC 2011]
The number and type of fields depend on the retrieval task
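A small Python sketch of predicate folding (the predicate→category map and field names are hypothetical, not from any cited paper):

```python
# Sketch of predicate folding: map each predicate to one of a few
# predefined categories so the entity document has a small, fixed set
# of fields. The FOLD map below is a made-up example.

FOLD = {
    "rdfs:label": "name", "foaf:name": "name",
    "dbo:abstract": "attributes", "dbo:birthPlace": "attributes",
}

def fold_triples(entity, triples):
    fields = {"name": [], "attributes": [], "links": []}
    for s, p, o in triples:
        if s != entity:
            continue
        category = FOLD.get(p, "links")  # unknown predicates fold into "links"
        fields[category].append(o)
    return fields

fields = fold_triples("dbr:Boston", [
    ("dbr:Boston", "rdfs:label", "Boston"),
    ("dbr:Boston", "dbo:abstract", "Boston is the capital of Massachusetts."),
    ("dbr:Boston", "dbo:country", "dbr:United_States"),
])
```

With only three fields, per-field weights become easy to tune.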
[Neumayer, Balog et al., ECIR’12]
Each entity is represented as a two-field document:
◮ title: literal of the “label” or “title” predicate
◮ content: all other predicate values, concatenated together into a flat text representation
This simple scheme is effective for entity retrieval
[Zhiltsov and Agichtein, CIKM’13]
Each entity is represented as a three-field document:
◮ names: literals of the foaf:name and rdfs:label predicates, along with tokens extracted from entity URIs
◮ attributes: literals of all other predicates
◮ outgoing entity names: names of entities in the object position
This scheme is effective for entity retrieval
[Zhiltsov, Kotov et al., SIGIR’15]
Each entity is represented as a five-field document:
◮ names: labels or names of the entity
◮ attributes: all entity properties other than names
◮ categories: classes or groups to which the entity has been assigned
◮ similar entity names: names of entities that are very similar or identical to the given entity
◮ related entity names: names of entities in the object position
This flexible scheme is effective for a variety of tasks: entity search, list search, question answering
Vocabulary mismatch between the descriptions of relevant entities and the query terms that can be used to search for them
Associations between words and entities depend on the context:
◮ Germany should be returned for queries related to both World War II and the 2006 Soccer World Cup
Real-life events change the descriptions of entities:
◮ Ferguson, Missouri before and after August 2014
[Graus, Tsagkias et al., WSDM’16]
Idea: create static entity representations using knowledge bases and leverage different social media sources to dynamically update them
◮ Represent entities as fielded documents, in which each field corresponds to a different source
◮ Tweak the weights of the different fields over time
Outline:
◮ Entity representation
◮ Entity retrieval
◮ Entity set expansion
◮ Entity ranking
ERKG has been addressed in a probabilistic generative framework: P(e|q) ∝ P(q|e)P(e)
Besides keywords qw, the query q implicitly or explicitly contains target entity type(s) qt, which can be incorporated into entity retrieval models
Two ways to combine term-based similarity P(qw|e) and type-based similarity P(qt|e):
◮ Filtering [Bron et al., CIKM’10]: P(q|e) = P(qw|e)P(qt|e)
◮ Interpolation [Balog et al., TOIS’11; Kaptein et al., AI’13; Pehcevski et al., IR’10; Raviv et al., JIWES’12]: P(q|e) = (1 − λt)P(qw|e) + λtP(qt|e)
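The two combination strategies can be sketched in a few lines of Python; the probability values below are made up for illustration:

```python
# Sketch of the two ways to combine term-based similarity P(qw|e) and
# type-based similarity P(qt|e). Inputs are toy probabilities.

def filtering(p_qw, p_qt):
    """Type match acts as a (soft) filter on the term-based score."""
    return p_qw * p_qt

def interpolation(p_qw, p_qt, lam_t=0.3):
    """Linear mixture of term-based and type-based similarity."""
    return (1.0 - lam_t) * p_qw + lam_t * p_qt

p_qw, p_qt = 0.4, 0.9
filtered = filtering(p_qw, p_qt)
interpolated = interpolation(p_qw, p_qt, lam_t=0.3)
```

Note that filtering zeroes out entities with no type match, while interpolation only demotes them.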
Possible options for P(qw|e):
unigram bag-of-words models for structured document retrieval:
◮ Mixture of Language Models (MLM) [Ogilvie and Callan, SIGIR’03]
◮ BM25 for multi-field documents (BM25F) [Robertson et al., CIKM’04]
◮ Probabilistic Retrieval Model for Semi-structured Data (PRMS) [Kim and Croft, ECIR’09]
term dependence (bigram) models:
◮ Sequential Dependence Model (SDM) [Metzler and Croft, SIGIR’05]
term dependence models for structured document retrieval:
◮ Fielded Sequential Dependence Model (FSDM) [Zhiltsov et al., SIGIR’15]
◮ Parameterized Fielded Sequential Dependence Model (PFSDM) [Nikolaev et al., SIGIR’16]
[Zhiltsov, Kotov et al., SIGIR’15]
Idea: account both for phrases (bigrams) and document structure
Document score is a linear combination of matching functions for unigrams and bigrams in each document field:

PΛ(D|Q) rank= λT Σ_{qi∈Q} f̃T(qi, D) + λO Σ_{qi,qi+1∈Q} f̃O(qi, qi+1, D) + λU Σ_{qi,qi+1∈Q} f̃U(qi, qi+1, D)

MLM is a special case of FSDM, when λT = 1, λO = 0, λU = 0
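The linear combination above can be sketched in Python; the matching functions here are stand-ins that return precomputed log-probabilities rather than real field language models:

```python
# Sketch of the FSDM document score: a weighted sum of unigram,
# ordered-bigram, and unordered-bigram matching functions. The lambdas
# and matching-function values below are toy numbers.

def fsdm_score(query_terms, f_T, f_O, f_U, lam=(0.8, 0.1, 0.1)):
    lam_T, lam_O, lam_U = lam
    score = sum(lam_T * f_T(q) for q in query_terms)
    for q1, q2 in zip(query_terms, query_terms[1:]):  # consecutive pairs
        score += lam_O * f_O(q1, q2) + lam_U * f_U(q1, q2)
    return score

# With lam = (1, 0, 0) FSDM degenerates to MLM (unigrams only).
terms = ["apollo", "astronauts"]
score = fsdm_score(terms, f_T=lambda q: -2.0,
                   f_O=lambda a, b: -3.0, f_U=lambda a, b: -3.0)
```

Here the MLM special case falls out by simply zeroing the bigram weights.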
FSDM matching function for unigrams (a weighted mixture of Dirichlet-smoothed per-field language models):

f̃T(qi, D) = log Σ_j w^T_j P(qi|θ^j_D), where P(qi|θ^j_D) = (tf_{qi,Dj} + μj · cf^j_{qi}/|Cj|) / (|Dj| + μj)

Example query: “apollo astronauts who walked on the moon” (e.g., both “apollo astronauts” and “who walked on the moon” are best matched against the category field)
Parameters: field weights w^T_j and per-field smoothing parameters μj
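A minimal Python sketch of this unigram matching function; the field statistics and smoothing values are hypothetical toy numbers:

```python
import math

# Sketch of the FSDM unigram matching function: a weighted mixture of
# Dirichlet-smoothed per-field language models. Document fields and
# collection statistics below are made-up toy values.

def f_T(term, doc_fields, coll, weights, mu):
    """log sum_j w_j * (tf + mu_j * cf/|C_j|) / (|D_j| + mu_j)"""
    mixture = 0.0
    for j, field in doc_fields.items():
        tf = field.count(term)
        p_coll = coll[j].get(term, 0.0)     # cf^j_term / |C_j|
        p = (tf + mu[j] * p_coll) / (len(field) + mu[j])
        mixture += weights[j] * p
    return math.log(mixture)

doc = {"names": ["neil", "armstrong"],
       "categories": ["apollo", "astronauts"]}
coll = {"names": {"neil": 0.001}, "categories": {"apollo": 0.01}}
score = f_T("apollo", doc, coll,
            weights={"names": 0.4, "categories": 0.6},
            mu={"names": 100.0, "categories": 100.0})
```

The field weights w_j decide how much a match in, say, the category field contributes relative to the names field.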
Limitation of FSDM: the same field weights are used for all query unigrams and all query bigrams
Example: in “capitals in Europe which were host cities of summer Olympic games”, different query concepts should be matched against different fields (“capitals” → category, “in Europe” → attribute, “summer Olympic games” → category)
[Nikolaev, Kotov et al., SIGIR’16]
Idea: calculate the field weight for each unigram and bigram based on features:

w^T_{qi,j} = Σ_k α^U_{j,k} φ_k(qi, j)

φ_k(qi, j) is the k-th feature value for unigram qi in field j
α^U_{j,k} are feature weights that are learned by coordinate ascent to maximize the target retrieval metric
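The parameterized field weight is just a linear model over features, as in this sketch (the feature vector and learned alphas are hypothetical toy values, not from the paper):

```python
# Sketch of PFSDM's parameterized field weights: each query unigram's
# weight for a field is a linear combination of feature values. The
# feature vector and alpha weights below are made-up toy numbers.

def field_weight(features, alphas):
    """w_{q,j} = sum_k alpha_{j,k} * phi_k(q, j)"""
    return sum(a * f for a, f in zip(alphas, features))

# phi = (is_proper_noun, collection-statistics posterior, intercept)
phi_capitals_cat = (0.0, 0.7, 1.0)   # "capitals" against the category field
alphas_cat = (0.5, 1.2, 0.1)
w = field_weight(phi_capitals_cat, alphas_cat)
```

In PFSDM these alphas are tuned by coordinate ascent on a retrieval metric rather than set by hand.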
Features for estimating field weights of a query concept κ (CT = concept type: UG = unigram, BG = bigram):
Source: Collection statistics
◮ Posterior probability P(Ej|κ) (UG, BG)
◮ Top SDM score of the j-th field when κ is used as a query (BG)
Source: Stanford POS Tagger
◮ Is concept κ a proper noun? (UG)
◮ Is κ a plural non-proper noun? (UG, BG)
◮ Is κ a superlative adjective? (UG)
Source: Stanford Parser
◮ Is κ part of a noun phrase? (BG)
◮ Is κ the only singular non-proper noun in a noun phrase? (UG)
Intercept (UG, BG)
[Hasibi et al., ICTIR’16]
Idea: use linked entities as an additional feature function in FSDM:

PΛ(D|Q) rank= λT Σ_{qi∈Q} f̃T(qi, D) + λO Σ_{qi,qi+1∈Q} f̃O(qi, qi+1, D) + λU Σ_{qi,qi+1∈Q} f̃U(qi, qi+1, D) + λE Σ_e f̃E(e, D)
[Garigliotti and Balog, ICTIR’17]
If target type(s) qt are provided with the query, the distribution of types for entity e is estimated with Dirichlet smoothing:

P(t|θe) = (n(t, e) + μP(t)) / (Σ_t′ n(t′, e) + μ)

With both θq and θe in place, type-based similarity between q and e is estimated from the KL divergence between the two type distributions:

P(qt|e) = z (max_e′ KL(θq‖θe′) − KL(θq‖θe))
[Garigliotti and Balog, ICTIR’17]
Options for the set of types used to represent an entity: (a) all assigned types, (b) most general types, (c) most specific types
If no target type(s) are provided with the query, they can be inferred using:
Type-centric approach [Balog and Neumayer, CIKM’12]: build a document for each type by concatenating the descriptions of all entities that belong to it:

P(q|t) = Π_{i=1..|q|} P(wi|θt) = Π_{i=1..|q|} ((1 − λ) Σ_e P(wi|θe)P(e|t) + λP(wi))

Entity-centric approach [Balog and Neumayer, CIKM’12]: aggregate retrieval scores and type distributions of top retrieved entities:

P(q|t) = Σ_e P(q|e)P(e|t)
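The entity-centric aggregation can be sketched in a few lines; entity scores and type memberships below are made-up toy values, and P(e|t) is assumed uniform over a type's members for simplicity:

```python
# Sketch of the entity-centric approach to target type identification:
# aggregate the retrieval scores P(q|e) of entities, weighted by their
# type membership P(e|t). All numbers are toy values.

def entity_centric_type_score(query_scores, type_members):
    """P(q|t) = sum_e P(q|e) P(e|t), with uniform P(e|t) over members."""
    return sum(query_scores.get(e, 0.0) for e in type_members) / len(type_members)

query_scores = {"Paris": 0.5, "London": 0.3, "Everest": 0.2}
score_city = entity_centric_type_score(query_scores, ["Paris", "London"])
score_mountain = entity_centric_type_score(query_scores, ["Everest"])
```

For a query about European capitals, the City type would thus outscore Mountain.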
Type ranking [Garigliotti et al., SIGIR’17]: combines scores of the entity- and type-centric approaches with taxonomy and type label features
Head-modifier approach [Ma et al., WWW’18]: query and type names are phrases that consist of a head word (hq and ht) and a set of modifiers (Mq and Mt), e.g., “Italian Nobel prize winners”, “Musicians who appeared in the Blues Brothers movies”:

P(q|t) = P(ht|hq)^α1 P(Mt|hq)^α2 P(ht|Mq)^α3 P(Mt|Mq)^α4
[Raviv et al., JIWES’17]
Entity name EN, description ED, and types ET can be combined into a Markov Random Field-based retrieval model:

P(E|Q) = λEN P(EN|Q) + λED P(ED|Q) + λET P(ET|Q)
Outline:
◮ Entity representation
◮ Entity retrieval
◮ Entity set expansion
◮ Entity ranking
[Tonon, Demartini et al., SIGIR’12]
Maintain an inverted index for entity representations and a triple store for entity relations
Hybrid approach: IR models for initial entity retrieval, SPARQL queries for expansion
Follow predicates leading to entity attributes
Explore entity neighbors and the neighbors of neighbors
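A Python sketch of the hybrid expansion step; the SPARQL template and the tiny in-memory adjacency list are hypothetical stand-ins for a real triple store:

```python
# Sketch of graph-based entity set expansion: start from seed entities
# returned by an IR model and follow KG links for a few hops. The SPARQL
# template and the GRAPH adjacency list are illustrative stand-ins.

EXPAND_QUERY = """
SELECT ?o WHERE { <%s> ?p ?o }
"""  # one-hop expansion template a triple store would execute

GRAPH = {  # toy adjacency list standing in for the triple store
    "dbr:Intel": ["dbr:Gordon_Moore", "dbr:Robert_Noyce"],
    "dbr:Gordon_Moore": ["dbr:Moores_law"],
}

def expand(seeds, hops=1):
    frontier, result = set(seeds), set(seeds)
    for _ in range(hops):
        frontier = {n for e in frontier for n in GRAPH.get(e, [])}
        result |= frontier
    return result

expanded = expand(["dbr:Intel"], hops=2)
```

Two hops reach both the direct neighbors and the neighbors of neighbors.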
Outline:
◮ Entity representation
◮ Entity retrieval
◮ Entity set expansion
◮ Entity ranking
[Dali and Fortuna, WWW’11]
Potential features:
◮ Popularity and importance of the Wikipedia page: # of accesses from logs, # of edits, page length
◮ RDF features: # of triples in which E is the subject, the object, or the subject of a triple whose object is a literal; # of categories the Wikipedia page for E belongs to; size of the biggest/smallest/median category
◮ HITS scores and PageRank of the Wikipedia page and of E in the RDF graph
◮ # of hits from a search engine API for the top 5 keywords from the abstract of the Wikipedia page for E
◮ Count of the entity name in Google N-grams
Features approximating the importance of the Wikipedia page (hub and authority scores, PageRank) are effective
PageRank and HITS scores on the RDF graph are not effective (outperformed by simpler RDF features)
Google N-grams are an effective proxy for entity popularity, cheaper than a search engine API
Feature combinations improve both the robustness and the accuracy of ranking
For a knowledge graph with n distinct entities and m distinct predicates, construct a tensor X of size n × n × m, where X_ijk = 1 if the k-th predicate holds between the i-th entity and the j-th entity, and X_ijk = 0 otherwise
Each k-th frontal tensor slice X_k is an adjacency matrix for the k-th predicate
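Constructing this binary tensor is straightforward; a dependency-free Python sketch with a toy two-entity, one-predicate graph:

```python
# Sketch: build the n x n x m binary tensor X from a list of triples,
# where X[i][j][k] = 1 iff the k-th predicate holds between entity i
# (subject) and entity j (object). Plain nested lists keep it simple.

def build_tensor(entities, predicates, triples):
    n, m = len(entities), len(predicates)
    ei = {e: i for i, e in enumerate(entities)}   # entity -> row/column index
    pi = {p: k for k, p in enumerate(predicates)} # predicate -> slice index
    X = [[[0] * m for _ in range(n)] for _ in range(n)]
    for s, p, o in triples:
        X[ei[s]][ei[o]][pi[p]] = 1
    return X

X = build_tensor(
    ["Intel", "Gordon_Moore"], ["foundedBy"],
    [("Intel", "foundedBy", "Gordon_Moore")],
)
```

The frontal slice X[:][:][k] is exactly the adjacency matrix of predicate k.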
[Nickel et al., ICML’11, WWW’12]
Given r, the number of latent factors, factorize each slice X_k:

X_k ≈ A R_k Aᵀ, k = 1, …, m

where A is a dense n × r matrix of latent embeddings for the entities, and R_k is an r × r matrix of latent factors for the k-th predicate
Idea: represent KG entities and relations as dense real-valued vectors (i.e., embeddings) and predict a relation between entities es and eo in a KG with a scoring function f(es, eo, Θ)
Interaction-based methods:
◮ RESCAL [Nickel et al., ICML’11]: wkᵀ(es ⊗ eo)
◮ LFM [Jenatton et al., NIPS’12]: esᵀ Wp eo
◮ HolE [Nickel et al., AAAI’16]: σ(pᵀ(es ⋆ eo))
Neural network-based methods:
◮ ER-MLP [Dong et al., KDD’14]: wᵀ g(Cᵀ[es; eo; p])
Distance-based methods:
◮ Unstructured [Bordes et al., AAAI’11]: −‖es − eo‖²₂
◮ SE [Bordes et al., AAAI’11]: −‖Wes es − Weo eo‖₁
◮ TransE [Bordes et al., NIPS’13]: −‖es + p − eo‖₁ (or ‖·‖₂)
⊗, ⋆, and [·; ·] denote tensor product, circular correlation, and vector concatenation
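The distance-based scoring is easy to see in code; this sketch implements the TransE score on hand-picked toy embeddings (the vectors are chosen so the true triple scores best, not learned):

```python
# Sketch of TransE scoring: a triple (s, p, o) is plausible when
# e_s + p is close to e_o under the L1 norm. Toy 2-d embeddings below
# are hand-picked, not learned.

def transe_score(e_s, p, e_o):
    """Negative L1 distance: -||e_s + p - e_o||_1 (higher = more plausible)."""
    return -sum(abs(s + r - o) for s, r, o in zip(e_s, p, e_o))

intel   = [0.0, 1.0]
founded = [1.0, 0.0]   # relation vector for "foundedBy" (illustrative)
moore   = [1.0, 1.0]
paris   = [5.0, 5.0]

good = transe_score(intel, founded, moore)  # exact translation, distance 0
bad  = transe_score(intel, founded, paris)
```

Training would adjust the embeddings so that observed triples outscore corrupted ones by a margin.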
[Jameel et al., SIGIR’17]
Salient properties of entities are modeled as hyperplanes that separate entities that have a property in their descriptions from entities that do not
Normals of the separating hyperplanes point to the regions where entities with a salient property occur
[Zhiltsov and Agichtein, CIKM’13]
Idea: use similarities between entities in the low-dimensional embedding space as features:
◮ cosine similarity: cos(e, etop)
◮ Euclidean distance: ‖e − etop‖₂
◮ heat kernel: exp(−‖e − etop‖²₂ / σ)
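These three embedding-based features can be computed with plain Python; the two toy vectors below are made up:

```python
import math

# Sketch of the three embedding-based features between a candidate
# entity e and a top-ranked entity e_top, on toy 2-d vectors.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def heat_kernel(a, b, sigma=1.0):
    # decays with squared distance; sigma controls the bandwidth
    return math.exp(-euclidean(a, b) ** 2 / sigma)

e, e_top = [1.0, 0.0], [1.0, 0.0]
```

Identical embeddings give the maximal value of all three similarity features.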
[Schuhmacher, Dietz et al., CIKM’15]
Aim: complex entity-focused informational queries (e.g. “Argentine British relations”)
Summary:
◮ Use dynamic entity representations built from different sources (not only static KB descriptions)
◮ Use retrieval models that account for query unigrams and bigrams (FSDM and PFSDM), rather than bag-of-words structured document retrieval models (BM25F and MLM), to obtain candidate entities
◮ Leverage entity links and types in entity retrieval models
◮ Expand candidate entities by following KG links
◮ Re-rank candidate entities using a variety of features, including ones based on KG entity embeddings