Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data
Nikita Zhiltsov (1, 2), Alexander Kotov (3), Fedor Nikolaev (3)
1 Kazan Federal University
2 Textocat
3 Textual Data Analytics Lab, Department of Computer Science, Wayne State University

Overview: Entities; Entity Representation; Fielded Sequential Dependence Model; Parameter Estimation; Results; Conclusion

Knowledge Graphs

Linked Open Data (LOD) Cloud

Entities
◮ Material objects or concepts in the real world or in fiction (e.g. people, movies, conferences)
◮ Connected with other entities by relations (e.g. hasGenre, actedIn, isPCmemberOf)
◮ Subject-Predicate-Object (SPO) triple: subject = entity; object = entity (or primitive data value); predicate = relationship between subject and object
◮ Many SPO triples → knowledge graph
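As a minimal sketch of how many SPO triples form a graph, the snippet below groups triples by subject. The identifiers (`Barack_Obama`, `spouse`, and so on) and the `build_graph` helper are illustrative assumptions, not part of the slides.

```python
from collections import defaultdict

# Hypothetical SPO triples; entity and predicate names are illustrative only.
triples = [
    ("Barack_Obama", "spouse", "Michelle_Obama"),
    ("Barack_Obama", "birthPlace", "Honolulu"),
    ("Michelle_Obama", "spouse", "Barack_Obama"),
]

def build_graph(spo_triples):
    """Group triples by subject: subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in spo_triples:
        graph[subject].append((predicate, obj))
    return dict(graph)

graph = build_graph(triples)
```

Each key is a node, and each `(predicate, object)` pair is a labeled outgoing edge.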

DBpedia entity page example

Entity Retrieval from Knowledge Graph(s)
◮ Graph knowledge bases (KBs) are well suited to information needs that target specific objects (entities) rather than documents
◮ Task: given the user's information need expressed as a keyword query, retrieve a relevant set of objects from the knowledge graph(s)

Typical ERWD (entity retrieval in the Web of Data) tasks
◮ Entity Search: queries refer to a particular entity.
  • “Ben Franklin”
  • “England football player highest paid”
  • “Einstein Relativity theory”
◮ List Search: complex queries with several relevant entities.
  • “US presidents since 1960”
  • “animals lay eggs mammals”
◮ Question Answering: queries are questions in natural language.
  • “Who is the mayor of Santiago?”
  • “For which label did Elvis record his first album?”

Fundamental problems in ERWD
◮ Designing effective and concise entity representations
  • Pound, Mika et al. Ad-hoc Object Retrieval in the Web of Data, WWW’10
  • Blanco, Mika et al. Effective and Efficient Entity Search in RDF Data, ISWC’11
  • Neumayer, Balog et al. On the Modeling of Entities for Ad-hoc Entity Search in the Web of Data, ECIR’12
◮ Developing accurate retrieval models
  • Mostly adaptations of standard unigram bag-of-words retrieval models, such as BM25F and MLM

Entity document
An entity is represented as a structured (multi-fielded) document:
◮ names: conventional names of the entity, such as the name of a person or the name of an organization
◮ attributes: all entity properties other than names
◮ categories: classes or groups to which the entity has been assigned
◮ similar entity names: names of the entities that are very similar or identical to a given entity
◮ related entity names: names of the entities that are part of the same RDF triple

Entity document example
Multi-fielded entity document for the entity Barack Obama.

Field                   Content
names                   barack obama barack hussein obama ii
attributes              44th current president united states birth place honolulu hawaii
categories              democratic party united states senator nobel peace prize laureate christian
similar entity names    barack obama jr barak hussein obama barack h obama ii
related entity names    spouse michelle obama illinois state predecessor george walker bush
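One way to hold such a multi-fielded document in memory is a mapping from field name to text, from which per-field term frequencies and field lengths can be derived. The sketch below assumes whitespace tokenization, and the `field_statistics` helper is a hypothetical name rather than anything defined in the slides.

```python
from collections import Counter

# Field contents copied from the Barack Obama example above.
entity_doc = {
    "names": "barack obama barack hussein obama ii",
    "attributes": "44th current president united states birth place honolulu hawaii",
    "categories": "democratic party united states senator nobel peace prize laureate christian",
    "similar entity names": "barack obama jr barak hussein obama barack h obama ii",
    "related entity names": "spouse michelle obama illinois state predecessor george walker bush",
}

def field_statistics(doc):
    """Return per-field term frequencies and per-field lengths (in tokens)."""
    tf = {field: Counter(text.split()) for field, text in doc.items()}
    length = {field: sum(counts.values()) for field, counts in tf.items()}
    return tf, length

tf, length = field_statistics(entity_doc)
```

These per-field counts are the quantities that fielded retrieval models consume when scoring a query term against each field separately.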

Motivation
Previous research in ad-hoc IR has focused on two major directions:
◮ unigram bag-of-words retrieval models for multi-fielded documents
  • Ogilvie and Callan. Combining Document Representations for Known-item Search, SIGIR’03
  • Robertson et al. Simple BM25 Extension to Multiple Weighted Fields, CIKM’04
◮ retrieval models incorporating term dependencies
  • Metzler and Croft. A Markov Random Field Model for Term Dependencies, SIGIR’05
  • Huston and Croft. A Comparison of Retrieval Models using Term Dependencies, CIKM’14
Goal: to develop a retrieval model that captures both document structure and term dependencies

MLM
$$P(Q \mid D) \stackrel{rank}{=} \prod_{q_i \in Q} P(q_i \mid \theta_D)^{tf(q_i)},$$
where
$$P(q_i \mid \theta_D) = \sum_j w_j \, P(q_i \mid \theta_D^j)$$
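The MLM formula can be sketched in log space as follows. The per-field probabilities P(q_i | θ_D^j) are taken as precomputed, already smoothed inputs; the field names, weights, and probability values are made-up illustrations, and `mlm_log_score` is a hypothetical helper name.

```python
import math
from collections import Counter

def mlm_log_score(query_terms, field_probs, field_weights):
    """log P(Q|D) = sum_i tf(q_i) * log( sum_j w_j * P(q_i | theta_D^j) ).

    Assumes every query term has a non-zero mixture probability
    (i.e. the field language models are smoothed).
    """
    score = 0.0
    for term, tf in Counter(query_terms).items():
        mixture = sum(
            weight * field_probs[field].get(term, 0.0)
            for field, weight in field_weights.items()
        )
        score += tf * math.log(mixture)
    return score

# Hypothetical smoothed field language models and field weights.
field_probs = {
    "names": {"obama": 0.2, "president": 0.01},
    "attributes": {"obama": 0.05, "president": 0.1},
}
field_weights = {"names": 0.6, "attributes": 0.4}

score = mlm_log_score(["obama", "president"], field_probs, field_weights)
```

Working with the logarithm turns the product over query terms into a sum, which leaves the ranking unchanged and avoids numerical underflow for long queries.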

SDM
Documents are ranked with respect to
$$P_\Lambda(D \mid Q) \stackrel{rank}{=} \sum_{i \in \{T, U, O\}} \lambda_i f_i(Q, D)$$
The potential function for unigrams is the query likelihood (QL):
$$f_T(q_i, D) = \log P(q_i \mid \theta_D) = \log \frac{tf_{q_i, D} + \mu \frac{cf_{q_i}}{|C|}}{|D| + \mu}$$
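A minimal sketch of the unigram potential f_T, which is a Dirichlet-smoothed query likelihood estimate. The function name and the numeric values are assumptions chosen only to illustrate the arithmetic.

```python
import math

def ql_unigram_potential(tf, doc_len, cf, collection_len, mu):
    """f_T(q_i, D) = log( (tf + mu * cf / |C|) / (|D| + mu) )."""
    return math.log((tf + mu * cf / collection_len) / (doc_len + mu))

# Hypothetical statistics: the term occurs twice in a 100-token document
# and 50 times in a 10,000-token collection; mu = 100.
seen = ql_unigram_potential(tf=2, doc_len=100, cf=50, collection_len=10_000, mu=100)

# The same term in a document that does not contain it still gets a finite score,
# because the collection statistics act as a prior.
unseen = ql_unigram_potential(tf=0, doc_len=100, cf=50, collection_len=10_000, mu=100)
```

Here the smoothed probability for the seen term is (2 + 100 · 0.005) / 200 = 0.0125, and for the unseen term it is 0.5 / 200 = 0.0025.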

FSDM ranking function
FSDM incorporates document structure and term dependencies with the following ranking function:
$$P_\Lambda(D \mid Q) \stackrel{rank}{=} \lambda_T \sum_{q_i \in Q} \tilde{f}_T(q_i, D) + \lambda_O \sum_{q_i, q_{i+1} \in Q} \tilde{f}_O(q_i, q_{i+1}, D) + \lambda_U \sum_{q_i, q_{i+1} \in Q} \tilde{f}_U(q_i, q_{i+1}, D)$$
◮ Separate MLMs for bigrams and unigrams give FSDM the flexibility to adjust the document scoring depending on the query type
◮ MLM is a special case of FSDM, when λ_T = 1, λ_O = 0, λ_U = 0
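The combination step of the ranking function can be sketched as below. The potential values are treated as precomputed inputs with made-up numbers, since the bigram potentials are not defined in this part of the slides; `fsdm_score` is a hypothetical helper name.

```python
def fsdm_score(f_T, f_O, f_U, lam_T, lam_O, lam_U):
    """Weighted sum of the summed unigram, ordered-bigram and unordered-bigram potentials."""
    return lam_T * sum(f_T) + lam_O * sum(f_O) + lam_U * sum(f_U)

# Hypothetical potential values for a three-term query (two adjacent bigrams).
f_T = [-4.1, -5.3, -3.7]   # one value per query term
f_O = [-7.2, -8.0]         # one value per ordered bigram
f_U = [-6.5, -6.9]         # one value per unordered bigram

full = fsdm_score(f_T, f_O, f_U, lam_T=0.8, lam_O=0.1, lam_U=0.1)

# With lam_T = 1 and lam_O = lam_U = 0 only the fielded unigram part remains,
# which is the MLM special case.
mlm_only = fsdm_score(f_T, f_O, f_U, lam_T=1.0, lam_O=0.0, lam_U=0.0)
```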

FSDM ranking function
Potential function for unigrams in case of FSDM:
$$\tilde{f}_T(q_i, D) = \log \sum_j w_j^T \, P(q_i \mid \theta_D^j) = \log \sum_j w_j^T \, \frac{tf_{q_i, D^j} + \mu_j \frac{cf^j_{q_i}}{|C_j|}}{|D^j| + \mu_j}$$
Example query: apollo astronauts who walked on the moon (field label shown on the slide: category)
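A minimal sketch of the fielded unigram potential: each field contributes a Dirichlet-smoothed estimate, and the estimates are mixed with the weights w_j^T before taking the logarithm. All field names, statistics, and weights below are made-up values, and `fsdm_unigram_potential` is a hypothetical helper name.

```python
import math

def fsdm_unigram_potential(term, doc_tf, doc_len, coll_cf, coll_len, mu, w_T):
    """log sum_j w_j^T * (tf_j + mu_j * cf_j / |C_j|) / (|D^j| + mu_j)."""
    mixture = 0.0
    for field, weight in w_T.items():
        tf = doc_tf[field].get(term, 0)
        prior = coll_cf[field].get(term, 0) / coll_len[field]
        mixture += weight * (tf + mu[field] * prior) / (doc_len[field] + mu[field])
    return math.log(mixture)

# Hypothetical per-field statistics for the query term "apollo".
doc_tf = {"names": {}, "categories": {"apollo": 1}}
doc_len = {"names": 2, "categories": 8}
coll_cf = {"names": {"apollo": 10}, "categories": {"apollo": 40}}
coll_len = {"names": 1000, "categories": 4000}
mu = {"names": 8, "categories": 12}
w_T = {"names": 0.3, "categories": 0.7}

potential = fsdm_unigram_potential(
    "apollo", doc_tf, doc_len, coll_cf, coll_len, mu, w_T
)
```

With these numbers the names field contributes 0.3 · 0.008 and the categories field 0.7 · 0.056, so the mixture is 0.0416 and most of the score comes from the field in which the term actually occurs.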
