

A Ranking Framework for Entity Oriented Search using Markov Random Fields
Hadas Raviv, David Carmel, Oren Kurland
IBM Research - Haifa Lab, Israel; Faculty of IE and Management, Technion, Israel


  1. MRF for EoS, JIWES 2012, Portland OR

     Outline
     • Entity Oriented Search
     • Popular Approaches
     • MRF for information retrieval
     • MRF for Entity Oriented Search
       - Entity Document Scoring
       - Entity Type Scoring
       - Entity Name Scoring
     • Evaluation
       - Benchmarks: INEX entity tracks 2007-2009
       - Experimental Results
     • Summary and future work

     Entity Oriented Search (EoS)
     • When people use retrieval systems, they are often not searching for documents or text passages
     • Named entities often play a central role in answering such information needs
       - persons, organizations, locations, products…
     • At least 20-30% of the queries submitted to Web search engines are simply named entities
     • ~71% of Web search queries contain named entities (Named entity recognition in query, Guo et al., SIGIR09)

     EoS: Profile based Approach (Craswell et al. 2001)
     • Represent each entity by a virtual document (a profile), e.g.
       - Entity home-page
       - Concatenating passages mentioning the entity
     • Rank those profiles according to their relevance to the query
       - Using standard IR ranking techniques
     • Difficulties:
       - Co-reference resolution and name disambiguation
       - Profiling is not an easy task

     [Diagram: entities e1..e4 are each mapped to a profile document d_e1..d_e4, and the profiles are ranked against the query q]

  2. EoS: Voting approach (Balog06, MacDonald09)
     • Any relevant document is a "voter" for the entity mentioned within its content

     [Diagram: the query q is linked to documents d1, d2, d3, which are linked to entities p1, p2, p3]

     $$Score(p, q) = \sum_{d} Score(d, q) \cdot Score(p, d)$$

     • What is the rationale behind it?
       - An entity mentioned many times in relevant (top retrieved) docs is more likely to be relevant to the given topic

     Markov Random Fields for IR (Metzler & Croft 2005)
     • Full Independence: C_T = (q_i, D)
     • Sequential dependence: C_O = (#1(q_i..q_{i+k}), D)
     • Full dependence: C_U = (#uwN{q_i..q_j}, D)

     $$P(D|Q) \stackrel{rank}{=} \sum_{c \in C(G)} \lambda_c f(c)$$

     MRF for EoS

     MRF based Entity Document Scoring P(E_D|Q)
     • We consider cliques of three types
       - Full independent (C_T)
       - Sequential dependent (C_O)
       - Full dependent (C_U)
     • The feature function f^I_D(c) over a clique of type I (I in {T,O,U})
       - measures how well the clique's terms represent the entity document
       - is based on a Dirichlet smoothed language model

     $$f^T_D(q_i, E_D) = \log \frac{tf(q_i, E_D) + \mu \, cf(q_i)/|C|}{|E_D| + \mu}$$

     • For C_O and C_U we replace q_i with #1(q_i..q_{i+k}) and #uwN({q_i,..q_j}) respectively
     • The entity document scoring function aggregates the feature functions over all clique types

     $$Q = \{q_1 ... q_n\}, \qquad P(E|Q) \stackrel{rank}{=} \sum_{P \in \{D,T,N\}} P(E_P|Q)$$
     $$P(E_D|Q) = \sum_{I \in \{T,O,U\}} \lambda^I_{E_D} \sum_{c \in I_{E_D}} f^I_D(c)$$

     Entity Type Scoring P(E_T|Q)
     • f_T(c) is defined over a single clique composed of E_T and Q_T

     $$P(E_T|Q) = \lambda_{E_T} f_T(c), \qquad f_T(c) = \log \frac{e^{-\alpha \, d(Q_T, E_T)}}{\sum_{E' \in R} e^{-\alpha \, d(Q_T, E'_T)}}$$

     • d(Q_T, E_T), the type distance, is domain dependent
     • In our experiments we measured the distance in the Wikipedia category graph
       - The minimal path length between all pairs of the query and the entity's page categories

     Entity Name Scoring P(E_N|Q)
     • Clique types:
       - Query terms independent: S_EN, a single node clique containing the entity name alone
         (equivalent to the voting approach)
       - Query terms dependent: consider proximity with the query terms
         (is the entity usually mentioned in proximity to the query terms?)
     • In analogy to document scoring:
       - T_EN: Full independent
       - O_EN: Sequential dependent
       - U_EN: Full dependent
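The Dirichlet smoothed feature function used for entity document scoring can be illustrated with a minimal Python sketch for the full independence (unigram) case. The function names, the token-list representation of the entity document, the dictionary of collection frequencies, and the default value of mu are all assumptions made for illustration; the slides do not give an implementation or a value for the smoothing parameter.

```python
import math
from collections import Counter


def dirichlet_feature(term, entity_doc_tokens, collection_tf, collection_len, mu=1000.0):
    """log((tf(term, E_D) + mu * cf(term) / |C|) / (|E_D| + mu)).

    entity_doc_tokens: tokens of the entity document E_D (assumed representation).
    collection_tf:     dict mapping term -> collection frequency cf(term).
    collection_len:    total number of tokens in the collection, |C|.
    mu:                Dirichlet smoothing parameter (value is an assumption).
    """
    tf = Counter(entity_doc_tokens)[term]
    cf = collection_tf.get(term, 0)
    numerator = tf + mu * cf / collection_len
    if numerator == 0:
        # The term is unseen in both the document and the collection.
        return float("-inf")
    return math.log(numerator / (len(entity_doc_tokens) + mu))


def full_independence_score(query_terms, entity_doc_tokens, collection_tf,
                            collection_len, mu=1000.0):
    """Sum the unigram feature over all query terms (cliques of type C_T)."""
    return sum(
        dirichlet_feature(q, entity_doc_tokens, collection_tf, collection_len, mu)
        for q in query_terms
    )
```

For the sequential and full dependence cliques, the same formula would be applied to counts of ordered or unordered windows instead of single terms; that part is not shown here.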

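The entity type score (a log of an exponentially decayed category distance, normalized over candidate entities) can be sketched as follows. This is a rough illustration under several assumptions: the category graph is given as an undirected adjacency dictionary, unreachable categories are assigned the maximum distance d_max, and every candidate has at least one category. The helper names are hypothetical, and the weight on the feature is left out.

```python
import math
from collections import deque


def shortest_path_len(graph, src, dst, d_max):
    """Breadth-first search; returns d_max if dst is not reached within d_max hops."""
    if src == dst:
        return 0
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist >= d_max:
            continue
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return d_max


def type_distance(graph, query_cats, entity_cats, d_max=5):
    """Minimal path length over all (query category, entity category) pairs."""
    return min(
        shortest_path_len(graph, q, e, d_max)
        for q in query_cats
        for e in entity_cats
    )


def type_scores(graph, query_cats, candidates, alpha=3.0, d_max=5):
    """f_T = log(exp(-alpha * d) / sum over candidates of exp(-alpha * d'))."""
    dists = {
        name: type_distance(graph, query_cats, cats, d_max)
        for name, cats in candidates.items()
    }
    log_z = math.log(sum(math.exp(-alpha * d) for d in dists.values()))
    return {name: -alpha * d - log_z for name, d in dists.items()}
```

The defaults alpha=3 and d_max=5 mirror the values reported in the parameter table of the slides; everything else about the graph handling is a guess at a reasonable setup.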
  3. Entity Name Scoring P(E_N|Q) (cont.)
     • Local approach
       - Measure the relationship (e.g. proximity) between the query terms and the entity name in the top retrieved documents
     • Global approach
       - Measure the PMI between the query term(s) and the entity name in the whole collection
       - PMI (pointwise mutual information): the likelihood of finding one term in proximity to another term

     $$P(E_N|Q) = \sum_{X \in A} \lambda^X_{E_N} \sum_{c \in X_{E_N}} f^X_N(c), \qquad A = \{S, T, O, U, PMI_T, PMI_O, PMI_U\}$$

     where S, T, O, U are the local clique types and PMI_T, PMI_O, PMI_U are the global ones.

     Entity Scoring Process
     [Diagram: candidate entities are scored in successive stages, P(E_D|Q), then P(E_D, E_T|Q), then P(E_D, E_T, E_N|Q)]

     Evaluation
     • The INEX Entity Ranking track 2007-2009
     • The Collection: Wikipedia articles
       - A retrievable entity must have a Wikipedia page
       - No need for third-party named-entity extraction tools
       - Each entity has a unique name, document and type (WP categories)
     • INEX topics perfectly fit our model: looking for relevant entities to a given topic
     • Metrics: MAP for 2007, infAP for 2008-2009

     Data set   Wikipedia year   #docs       Train topics   Test topics
     2007       2006             659,388     28             46
     2008       2006             659,388     74             35
     2009       2009             2,666,190   -              55

     INEX - Entity Ranking track
     • Entity Ranking (XER)
       - Return entities that satisfy a topic described in natural language text
     • List Completion (LC)
       - Complete a partial list of given answers
     • Entities:
       - Must have a Wikipedia page
       - Entity type is determined by corresponding Wikipedia categories, e.g. "movies", "trees", "Italian politicians"

     Example topic:

       <inex_topic>
         <title>circus mammals</title>
         <description>
           I want a list of mammals which have ever been tamed to perform in circuses.
         </description>
         <narrative>
           Each answer should contain an article about mammal which can be a part of any circus show.
         </narrative>
         <categories>
           <category id="138">mammals</category>
         </categories>
         <entities>
           <entity id="379035">Asian Elephant</entity>
           <entity id="4402">Brown Bear</entity>
         </entities>
       </inex_topic>

     Parameter tuning
     • The parameters of the scoring function were tuned using the Coordinate Ascent algorithm for each benchmark
     • The optimization process was done separately for each dataset, using the training topics
     • Performance was estimated using the test topics
     • For 2009 we used cross-validation

     Parameter tuning (cont.)
     • The following parameters are common to all the entity ranking scores
     • Values were selected after an extensive search over a wide range of values

     Entity property   Symbol   Parameter name                             Optimal value
     ED                N        Query term proximity window size           10
     ET                d_max    Max. distance in category graph            5
     ET                alpha    Category score decay                       3
     EN                R        #top-docs for voting score computation     500
     EN                K        Entity name terms proximity window         3
     EN                R_init   #top-docs for entity expansion             10
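The global name scoring approach relies on PMI between a query term and the entity name. A minimal sketch of one way to estimate it is given below, assuming documents are token lists, the entity name is a single token, and co-occurrence means both tokens appear within a fixed window in the same document. The slides do not specify the estimator, so the counting scheme and the function names are assumptions.

```python
import math


def cooccur_within(tokens, a, b, window):
    """True if tokens a and b occur at most `window` positions apart."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


def pmi(docs, term, name, window=3):
    """PMI(term, name) = log(P(term, name) / (P(term) * P(name))),
    with probabilities estimated as document fractions (an assumed estimator)."""
    n = len(docs)
    n_term = sum(term in d for d in docs)
    n_name = sum(name in d for d in docs)
    n_both = sum(cooccur_within(d, term, name, window) for d in docs)
    if n_both == 0:
        # The pair never co-occurs within the window.
        return float("-inf")
    return math.log((n_both * n) / (n_term * n_name))
```

A positive value means the term and the entity name appear near each other more often than independence would predict; the default window of 3 echoes the proximity window K listed in the parameter table.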

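The parameter tuning procedure can be illustrated with a simplified, grid-based form of coordinate ascent: one parameter is varied at a time while the others are held fixed, and a change is kept only if the objective improves. This is a sketch, not the authors' implementation; the `objective` callable (for example, MAP or infAP over the training topics), the candidate grids, and the function name are hypothetical.

```python
def coordinate_ascent(objective, init, grids, max_rounds=10):
    """Maximize `objective` by optimizing one parameter at a time.

    objective: callable taking a dict of parameters and returning a score
               (e.g. MAP on training topics; supplied by the caller).
    init:      dict with the starting value of every parameter.
    grids:     dict mapping parameter name -> iterable of candidate values.
    """
    params = dict(init)
    best = objective(params)
    for _ in range(max_rounds):
        improved = False
        for name, candidates in grids.items():
            for value in candidates:
                trial = dict(params, **{name: value})
                score = objective(trial)
                if score > best:
                    params, best, improved = trial, score, True
        if not improved:
            break  # no single-parameter change helps any more
    return params, best
```

A toy usage with a synthetic objective in place of a retrieval metric:

```python
toy = lambda p: -(p["alpha"] - 3) ** 2 - (p["N"] - 10) ** 2
print(coordinate_ascent(toy, {"alpha": 1, "N": 5},
                        {"alpha": [1, 3, 5], "N": [5, 10, 500]}))
```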
  4. [Figure: bar chart of retrieval performance (y-axis from 0 to 0.4) on INEX 2007, 2008, and 2009, comparing S(ED), S(ED,ET), and S(ED,ET,EN) under the FI, SD, and FD models against the INEX top system]

     • Results improved significantly when type and name scoring were added
     • Final results are superior to top INEX 2007 and 2008, and comparable to 2009
     • Dependence models (SD, FD) have not improved over the Independence model (FI) ???
     • Global based name scoring (PMI) outperforms local based name scoring

     Summary
     • In this work we presented an entity ranking model using the MRF framework, which integrates
       - Profile approach: query E-document relationship
       - Voting approach: query E-name relationship
       - Type filtering approach: query E-type relationship
     • Experiments over INEX benchmarks showed that
       - Performance is relatively high and comparable to leading INEX systems
       - Using dependence models did not result in significant improvement over the Full Independence model
       - Global based name scoring outperforms local based name scoring
     • Future work
       - Explore this model with additional data collections, specifically large web collections
       - Use additional entity properties, e.g. exploring the entity graph
       - Further investigation of the dependence models

     Thank You! Questions?
