
A Ranking Framework for Entity Oriented Search using Markov Random Fields
Hadas Raviv, David Carmel: IBM Research – Haifa Lab, Israel
Oren Kurland: Faculty of IE and Management, Technion, Israel
MRF for EoS, JIWES 2012, Portland OR

Outline
• Entity Oriented Search
• Popular Approaches
• MRF for information retrieval
• MRF for Entity Oriented search
  • Entity Document Scoring
  • Entity Type Scoring
  • Entity Name Scoring
• Evaluation
  • Benchmarks: INEX entity tracks 2007-2009
  • Experimental Results
• Summary and future work

Entity Oriented Search (EoS)
• When people use retrieval systems they are often not searching for documents or text passages
• Often named entities play a central role in answering such information needs
  • persons, organizations, locations, products…
• At least 20-30% of the queries submitted to Web search engines are simply named entities
• ~71% of Web search queries contain named entities (Named entity recognition in query, Guo et al., SIGIR09)

EoS: Profile based Approach (Craswell et al 2001)
• Represent each entity by a virtual document (a profile), e.g.
  • Entity home-page
  • Concatenating passages mentioning the entity
• Rank those profiles according to their relevance to the query
  • Using standard IR ranking techniques
• Difficulties:
  • Co-resolution and name disambiguation
  • Profiling is not an easy task
[Figure: entities e1 … e4 are mapped to profile documents d_e1 … d_e4, which are then ranked against the query q]
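The profile-based approach above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_profiles` and `rank_profiles` are hypothetical, and a length-normalized term-frequency score stands in for the "standard IR ranking techniques" the slide mentions.

```python
from collections import Counter


def build_profiles(passages):
    """Concatenate all passages mentioning an entity into one virtual document."""
    profiles = {}
    for entity, text in passages:
        profiles.setdefault(entity, []).extend(text.lower().split())
    return profiles


def rank_profiles(query, profiles):
    """Rank entities by query-term frequency in their profile.

    Assumption: normalized term frequency is a stand-in for a real
    retrieval model; it is not the scoring used in the slides.
    """
    terms = query.lower().split()
    scores = {}
    for entity, tokens in profiles.items():
        counts = Counter(tokens)
        scores[entity] = sum(counts[t] for t in terms) / max(len(tokens), 1)
    return sorted(scores, key=scores.get, reverse=True)


passages = [
    ("Asian Elephant", "elephants perform in circus shows"),
    ("Brown Bear", "bears live in forests"),
    ("Asian Elephant", "a large mammal"),
]
ranking = rank_profiles("circus mammal", build_profiles(passages))
```

The sketch also hints at the difficulty named on the slide: everything depends on assigning each passage to the right entity, which is where name disambiguation comes in.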

EoS: Voting approach (Balog06, MacDonald09)
• Any relevant document is a “voter” for the entity mentioned within its content
• What is the rationale behind it?
  • An entity mentioned many times in relevant (top retrieved) docs is more likely to be relevant to the given topic
[Figure: the query q is linked to documents d1 … d3 by Score(d, q), and documents are linked to entities p1 … p3 by Score(p, d); together they yield Score(p, q)]

Markov Random Fields for IR (Metzler & Croft 2005)
• Full Independence: C_T = (q_i, D)
• Sequential dependence: C_O = (#1(q_i..q_{i+k}), D)
• Full dependence: C_U = (#uwN{q_i..q_j}, D)

$$P(D \mid Q) \stackrel{rank}{=} \sum_{c \in C(G)} \lambda_c f(c)$$

MRF for EoS
• Q = {q_1, ..., q_n}; the entity E is described by its document E_D, its type E_T and its name E_N

$$P(E \mid Q) \stackrel{rank}{=} \sum_{P \in \{D, T, N\}} P(E_P \mid Q)$$

MRF based Entity Document Scoring P(E_D|Q)
• We consider cliques of the three types
  • Full Independent (C_T)
  • Sequential dependent (C_O)
  • Full dependent (C_U)
• The feature function f^I_D(c) over a clique of type I (I in {T,O,U})
  • measures how well the clique's terms represent the entity document
  • Based on Dirichlet smoothed language model

$$f^{T}_{D}(q_i, E_D) = \log \frac{tf(q_i, E_D) + \mu \cdot cf(q_i)/|C|}{|E_D| + \mu}$$

• For C_O and C_U we replace q_i with #1(q_i..q_{i+k}) and #uwN({q_i,..q_j}) respectively
• The entity document scoring function aggregates the feature functions over all clique types

$$P(E_D \mid Q) \stackrel{rank}{=} \sum_{I \in \{T, O, U\}} \lambda^{I}_{D} \sum_{c \in I_{ED}} f^{I}_{D}(c)$$

Entity Type Scoring P(E_T|Q)
• f_T(c) is defined over a single clique composed of E_T and Q_T

$$P(E_T \mid Q) \stackrel{rank}{=} f_T(c) = \log \frac{e^{-\alpha \, d(Q_T, E_T)}}{\sum_{E' \in R} e^{-\alpha \, d(Q_T, E'_T)}}$$

• d(Q_T, E_T), the type distance, is domain dependent
  • In our experiments we measured the distance in the Wikipedia category graph
  • The minimal path length between all pairs of the query and the entity’s page categories

Entity Name Scoring P(E_N|Q)
• Clique types:
  • Query terms independent: S_EN, a single node clique containing the entity name alone
    • Equivalent to the voting approach
  • Query terms dependent: consider proximity with the query terms
    • Is the entity usually mentioned in proximity to the query terms?
    • In analogy to document scoring:
      • T^Name_EN: Full independent
      • O^Name_EN: Sequential dependent
      • U^Name_EN: Full dependent
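The Dirichlet-smoothed document feature and the type score defined above can be written out as a short Python sketch. The function names are hypothetical, the smoothing value `mu=1000.0` is an assumption (the slides do not give one), and `alpha=3.0` simply mirrors the category score decay listed in the deck's parameter table.

```python
import math


def dirichlet_feature(tf, doc_len, cf, coll_len, mu=1000.0):
    """Unigram (C_T) feature: log((tf + mu * cf/|C|) / (|E_D| + mu)).

    tf: term count in the entity document, cf: term count in the collection.
    mu is an assumed smoothing value, not taken from the slides.
    """
    return math.log((tf + mu * cf / coll_len) / (doc_len + mu))


def type_scores(distances, alpha=3.0):
    """Log of the normalized exp(-alpha * d) over all retrieved entities.

    distances maps each entity to d(Q_T, E_T), e.g. a category-graph
    path length; smaller distance gives a higher (less negative) score.
    """
    weights = {e: math.exp(-alpha * d) for e, d in distances.items()}
    norm = sum(weights.values())
    return {e: math.log(w / norm) for e, w in weights.items()}


feature = dirichlet_feature(tf=2, doc_len=100, cf=50, coll_len=10000, mu=100.0)
scores = type_scores({"Asian Elephant": 0, "Brown Bear": 1})
```

Because the type score is a log of a normalized exponential, two entities whose distances differ by one edge differ in score by exactly alpha, which is what makes alpha act as a decay rate.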

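Since the single-node name clique is described as equivalent to the voting approach, a small Python sketch of voting may help. It assumes that Score(p, q) is accumulated as the sum of Score(d, q) times Score(p, d) over the top retrieved documents; the slides name these three scores but do not spell out how they are combined, so the product-sum here is an assumption, as are the function and argument names.

```python
from collections import defaultdict


def vote(doc_scores, doc_entities, top_r=500):
    """Aggregate votes for entities from the top_r retrieved documents.

    doc_scores:   {doc_id: Score(d, q)}
    doc_entities: {doc_id: {entity: Score(p, d)}}
    Assumption: votes are combined as sum_d Score(d, q) * Score(p, d).
    """
    top_docs = sorted(doc_scores, key=doc_scores.get, reverse=True)[:top_r]
    votes = defaultdict(float)
    for doc_id in top_docs:
        for entity, weight in doc_entities.get(doc_id, {}).items():
            votes[entity] += doc_scores[doc_id] * weight
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)


doc_scores = {"d1": 0.9, "d2": 0.5, "d3": 0.1}
doc_entities = {
    "d1": {"p1": 1.0},
    "d2": {"p1": 0.5, "p2": 1.0},
    "d3": {"p3": 1.0},
}
ranked = vote(doc_scores, doc_entities, top_r=2)
```

With `top_r=2` the lowest-scored document d3 casts no vote, so p3 never enters the ranking; this is the role the "#top-docs for voting" cutoff plays.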
Entity Name Scoring P(E_N|Q) (cont.)
• Local approach
  • Measure the relationship (e.g. proximity) between the query terms and the entity name in the top retrieved documents
• Global approach
  • Measure the PMI between the query term(s) and the entity name in the whole collection
  • PMI (pointwise mutual information): the likelihood of finding one term in proximity to another term

$$P(E_N \mid Q) \stackrel{rank}{=} \sum_{X \in A} \lambda^{X}_{E_N} \sum_{c \in X_{EN}} f^{X}_{E_N}(c), \qquad A = \{\underbrace{S, T, O, U}_{local}, \underbrace{PMI_T, PMI_O, PMI_U}_{global}\}$$

Entity Scoring Process
[Figure: entity scoring pipeline for a query and target type (Q, T): candidates E_1 … E_r are scored by P(E_D | Q), the list is then scored by P(E_D, E_T | Q) and finally by P(E_D, E_T, E_N | Q)]

Evaluation
• The INEX Entity Ranking track 2007-2009
• The Collection: Wikipedia articles
  • A retrievable entity must have a Wikipedia page
  • No need for third-party named-entity extraction tools
  • Each entity has a unique name, document and type (WP categories)
• INEX topics perfectly fit our model: looking for relevant entities to a given topic
• Metrics: MAP for 2007, infAP for 2008-2009

Data set   Wikipedia year   #docs       Train topics   Test topics
2007       2006             659,388     28             46
2008       2006             659,388     74             35
2009       2009             2,666,190   -              55

INEX – Entity Ranking track
• Entity Ranking (XER)
  • Return entities that satisfy a topic described in natural language text
• List Completion (LC)
  • Complete a partial list of given answers
• Entities:
  • Must have a Wikipedia page
  • Entity type is determined by corresponding Wikipedia categories
    • “movies”, “trees”, “Italian politicians”

Example topic:

    <inex_topic>
      <title>circus mammals</title>
      <description>
        I want a list of mammals which have ever been tamed to perform in circuses.
      </description>
      <narrative>
        Each answer should contain an article about mammal which can be a part of any circus show.
      </narrative>
      <categories>
        <category id="138">mammals</category>
      </categories>
      <entities>
        <entity id="379035">Asian Elephant</entity>
        <entity id="4402">Brown Bear</entity>
      </entities>
    </inex_topic>

Parameter tuning – Coordinate Ascent
• The parameters of the scoring function were tuned using the Coordinate Ascent algorithm for each benchmark
• Optimization process was done separately for each dataset, using the training topics
• Performance was estimated using the test topics
• For 2009 we used Cross-Validation

Parameter tuning (cont)
• The following parameters are common for all the entity ranking scores
• Values were selected after extensive search over a wide range of values

Entity property   Symbol   Parameter name                            Optimal value
ED                N        Query term proximity window size          10
ET                d_max    Max. distance in category graph           5
ET                alpha    Category score decay                      3
EN                R        #top-docs for voting score computation    500
EN                K        Entity name terms proximity window        3
EN                R_init   #top-docs for entity expansion            10
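The coordinate ascent tuning mentioned above can be illustrated with a minimal Python sketch. It is a generic version of the technique, not the authors' implementation: the function name, step schedule and round count are assumptions, and a toy quadratic replaces the real objective, which would be MAP or infAP measured on the training topics.

```python
def coordinate_ascent(objective, start, step=0.1, rounds=50):
    """Maximize objective by changing one parameter at a time.

    Each round tries +step and -step on every coordinate and keeps any
    change that improves the objective; the step is halved after a
    round with no improvement.
    """
    params = list(start)
    best = objective(params)
    for _ in range(rounds):
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                candidate = params[:]
                candidate[i] += delta
                score = objective(candidate)
                if score > best:
                    params, best, improved = candidate, score, True
        if not improved:
            step /= 2
    return params, best


def toy_objective(p):
    # Toy stand-in for a retrieval metric, peaking at (0.3, 0.7).
    return -((p[0] - 0.3) ** 2) - (p[1] - 0.7) ** 2


weights, value = coordinate_ascent(toy_objective, [0.0, 0.0])
```

Each evaluation of a real objective requires re-ranking all training topics, which is why a simple per-coordinate search of this kind is attractive for tuning a handful of clique weights.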

[Figure: bar chart of retrieval performance (y-axis 0 to 0.4) for S(ED), S(ED,ET) and S(ED,ET,EN) on INEX 2007, 2008 and 2009, comparing FI, SD, FD and the top INEX result]
• Results improved significantly when type and name scoring were added
• Final results are superior to top INEX 2007, 2008, and comparable to 2009
• Dependence models (SD, FD) have not improved over Independence model (FI) ???
• Global based name scoring (PMI) outperforms local based name scoring

Summary
• In this work we presented an entity ranking model using the MRF framework which integrates
  • Profile approach: query E-document relationship
  • Voting approach: query E-name relationship
  • Type filtering approach: query E-type relationship
• Experiments over INEX benchmarks showed that
  • Performance is relatively high and comparable to leading INEX systems
  • Using dependence models did not result in significant improvement over the Full Independence model
  • Global based name scoring outperforms local based name scoring
• Future work
  • Explore this model with additional data collections, specifically large web collections
  • Using additional entity properties, e.g. exploring the entity graph
  • Further investigation of the dependence models

Thank You! Questions?


