Axiomatic Analysis and Optimization of Information Retrieval Models
ChengXiang (“Cheng”) Zhai
Department of Computer Science University of Illinois at Urbana-Champaign http://www.cs.uiuc.edu/homes/czhai
1
Search is everywhere, and part of everyone’s life
2
Web Search, Desk Search, Site Search, Enterprise Search, Social Media Search
Search accuracy matters!
3
Sources: Google, Twitter: http://www.statisticbrain.com/ PubMed: http://www.ncbi.nlm.nih.gov/About/tools/restable_stat_pubmed.html
Queries per day, and total time saved per day if each query is answered 1 sec or 10 sec faster:
– Google: 4,700,000,000 queries/day → ~1,300,000 hrs (× 1 sec) / ~13,000,000 hrs (× 10 sec)
– Twitter: 1,600,000,000 queries/day → ~440,000 hrs (× 1 sec) / ~4,400,000 hrs (× 10 sec)
– PubMed: 2,000,000 queries/day → ~550 hrs (× 1 sec) / ~5,500 hrs (× 10 sec)
How can we improve all search engines in a general way?
Behind all the search boxes…
4
[Diagram] A search engine takes a query q and a document collection; a retrieval model computes Score(q, d) for each document d and returns a ranked list, with support from natural language processing and machine learning.
How can we optimize a retrieval model?
Retrieval model = computational definition of “relevance”
5
S(“computer science CMU”, d) is assembled from s(“computer”, d), s(“science”, d), and s(“CMU”, d), where each term score asks:
– How many times does “computer” occur in d? Term Frequency (TF): c(“computer”, d)
– How long is d? Document length: |d|
– How often do we see “computer” in the entire collection? Document Frequency: df(“computer”), or P(“computer”|Collection)
Scoring based on bag of words in general
6
In general, the score is a sum over matched query terms:

$$s(q,d) = f\Big(\sum_{w \in q \cap d} \mathrm{weight}(w,q,d),\; a(q,d)\Big)$$

$$\mathrm{weight}(w,q,d) = g\big[c(w,q),\, c(w,d),\, |d|,\, df(w) \text{ or } p(w|C)\big]$$

– Sum over matched query terms (w ∈ q ∩ d)
– c(w,d): Term Frequency (TF)
– df(w) or p(w|C): Inverse Document Frequency (IDF)
– |d|: document length
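To make this general form concrete, here is a minimal Python sketch of a bag-of-words scorer; the names (`CollectionStats`, `term_weight`, `tf_idf_weight`) are illustrative placeholders, not part of any system described in the talk, and the TF-IDF instantiation of g is just one of many possible choices.

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class CollectionStats:
    N: int          # number of documents in the collection
    df: dict        # document frequency of each term
    avdl: float     # average document length

def score(query_terms, doc_terms, stats, term_weight):
    """Generic bag-of-words score: a sum over matched query terms, where
    term_weight plays the role of g[c(w,q), c(w,d), |d|, df(w)]."""
    cq, cd = Counter(query_terms), Counter(doc_terms)
    dl = len(doc_terms)
    return sum(term_weight(cq[w], cd[w], dl, stats.df.get(w, 1), stats)
               for w in cq.keys() & cd.keys())   # matched query terms only

def tf_idf_weight(c_wq, c_wd, dl, df_w, stats):
    """One simple instantiation of g: raw query TF x log TF x IDF."""
    return c_wq * (1 + math.log(c_wd)) * math.log((stats.N + 1) / df_w)
```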
Improving retrieval models is a long-standing challenge
1975], [Robertson & Sparck Jones 1976], [van Rijsbergen 1977], [Robertson 1977], [Robertson et al. 1981], [Robertson & Walker 1994], …
[Zhai & Lafferty 2001], [Lavrenko & Croft 2001], [Kurland & Lee 2004], …
1991], …
& Ounis 2005], …
7
Many different models were proposed and tested
Some are working very well (equally well)
Croft 98], [Zhai & Lafferty]
8
but many others failed to work well…
State of the art retrieval models

Pivoted length normalization (PIV):
$$\sum_{w \in q \cap d} \frac{1 + \ln\big(1 + \ln(c(w,d))\big)}{(1-s) + s\,\frac{|d|}{avdl}} \cdot c(w,q) \cdot \ln\frac{N+1}{df(w)}$$

Query likelihood with Dirichlet prior smoothing (DIR):
$$\sum_{w \in q \cap d} c(w,q) \cdot \ln\Big(1 + \frac{c(w,d)}{\mu\, p(w|C)}\Big) + |q| \cdot \ln\frac{\mu}{|d| + \mu}$$

Okapi BM25:
$$\sum_{w \in q \cap d} \ln\frac{N - df(w) + 0.5}{df(w) + 0.5} \cdot \frac{(k_3+1)\,c(w,q)}{k_3 + c(w,q)} \cdot \frac{(k_1+1)\,c(w,d)}{k_1\big((1-b) + b\,\frac{|d|}{avdl}\big) + c(w,d)}$$

PL2 is a bit more complicated, but implements similar heuristics
9
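For reference, the three formulas transcribe directly into code. This is a sketch only: the function names and default parameter values (k1, b, k3, s, μ) are mine, not prescribed by the slides.

```python
import math

def piv_term_weight(c_wq, c_wd, dl, df_w, N, avdl, s=0.2):
    """Pivoted length normalization (PIV), per matched term."""
    tf = (1 + math.log(1 + math.log(c_wd))) / ((1 - s) + s * dl / avdl)
    return tf * c_wq * math.log((N + 1) / df_w)

def dir_term_weight(c_wq, c_wd, p_w_C, mu=2000):
    """Query likelihood with Dirichlet prior (DIR), per matched term.
    The document-level part |q| * ln(mu / (|d| + mu)) is added once per document."""
    return c_wq * math.log(1 + c_wd / (mu * p_w_C))

def dir_doc_part(query_len, dl, mu=2000):
    return query_len * math.log(mu / (dl + mu))

def bm25_term_weight(c_wq, c_wd, dl, df_w, N, avdl, k1=1.2, b=0.75, k3=1000):
    """Okapi BM25, per matched term."""
    idf = math.log((N - df_w + 0.5) / (df_w + 0.5))
    qtf = (k3 + 1) * c_wq / (k3 + c_wq)
    tf = (k1 + 1) * c_wd / (k1 * ((1 - b) + b * dl / avdl) + c_wd)
    return idf * qtf * tf
```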
Questions
– Why do these models tend to perform similarly even though they were derived in very different ways?
– Why has it been so hard to beat these strong baseline methods?
– Are they hitting the ceiling of the bag-of-words assumption?
– If yes, how can we prove it? – If not, how can we find a more effective one?
10
Suggested Answers
– Why do these models tend to perform similarly even though they were derived in very different ways? They share some nice common properties; these properties are more important than how each model is derived.
– Why has it been so hard to beat these strong baseline methods? We don't have good knowledge about their deficiencies; other variants don't have all the “nice properties”.
– Are they hitting the ceiling of the bag-of-words assumption? If yes, how can we prove it? If not, how can we find a more effective one? We need to formally define “the ceiling” (= the complete set of “nice properties”).
11
Main Point of the Talk: Axiomatic Relevance Hypothesis (ARH)
– Relevance can be modeled by a set of formally defined constraints on a retrieval function
– If a function satisfies all the constraints, it will perform well empirically
– If function Fa satisfies more constraints than function Fb, Fa would perform better than Fb empirically
– Given a set of relevance constraints C = {c1, …, ck}, function Fa is analytically more effective than function Fb iff the set of constraints satisfied by Fb is a proper subset of those satisfied by Fa
– A function F is optimal iff it satisfies all the constraints in C
12
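The "analytically more effective" relation is simply a proper-subset comparison between the sets of satisfied constraints; the tiny sketch below (with hypothetical constraint labels) makes the definition operational.

```python
def analytically_more_effective(satisfied_a: set, satisfied_b: set) -> bool:
    """Fa is analytically more effective than Fb iff the constraints satisfied
    by Fb are a proper subset of those satisfied by Fa."""
    return satisfied_b < satisfied_a   # proper-subset test on sets

# Hypothetical example: Fa satisfies four constraints, Fb only two of them.
print(analytically_more_effective({"TFC1", "TFC2", "LNC1", "LNC2"}, {"TFC1", "LNC1"}))  # True
```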
Rest of the Talk
models
retrieval model
13
Outline
models
retrieval model
14
Motivation: different models, but similar heuristics

Pivoted length normalization (PIV):
$$\sum_{w \in q \cap d} \frac{1 + \ln\big(1 + \ln(c(w,d))\big)}{(1-s) + s\,\frac{|d|}{avdl}} \cdot c(w,q) \cdot \ln\frac{N+1}{df(w)}$$

Query likelihood with Dirichlet prior smoothing (DIR):
$$\sum_{w \in q \cap d} c(w,q) \cdot \ln\Big(1 + \frac{c(w,d)}{\mu\, p(w|C)}\Big) + |q| \cdot \ln\frac{\mu}{|d| + \mu}$$

Okapi BM25:
$$\sum_{w \in q \cap d} \ln\frac{N - df(w) + 0.5}{df(w) + 0.5} \cdot \frac{(k_3+1)\,c(w,q)}{k_3 + c(w,q)} \cdot \frac{(k_1+1)\,c(w,d)}{k_1\big((1-b) + b\,\frac{|d|}{avdl}\big) + c(w,d)}$$

All three combine Term Frequency (TF), Inverse Document Frequency (IDF), and document length normalization.
Parameter sensitivity
PL2 is a bit more complicated, but implements similar heuristics
15
Are they performing well because they implement similar retrieval heuristics? Can we formally capture these necessary retrieval heuristics?
16
Term Frequency Constraints (TFC1)

TF weighting heuristic I: Give a higher score to a document with more occurrences of a query term.

Let q be a query with only one term w (q: w). If |d1| = |d2| and c(w, d1) > c(w, d2), then f(d1, q) > f(d2, q).
17
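TFC1 can be checked numerically against any candidate scoring function. The sketch below uses the pivoted TF component as a stand-in scorer for a one-term query; all numbers are made up for illustration.

```python
import math

def piv_tf(c_wd, dl, avdl=100.0, s=0.2):
    """Pivoted TF component; stands in for f(d, q) when q has a single term."""
    return (1 + math.log(1 + math.log(c_wd))) / ((1 - s) + s * dl / avdl)

# TFC1: if |d1| = |d2| and c(w,d1) > c(w,d2), then f(d1,q) > f(d2,q).
dl = 100
assert piv_tf(c_wd=5, dl=dl) > piv_tf(c_wd=2, dl=dl)   # holds for this scorer
```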
Term Frequency Constraints (TFC2)

TF weighting heuristic II: Favor a document with more distinct query terms.

Let q be a query and w1, w2 be two query terms (q: w1 w2). Assume |d1| = |d2| and idf(w1) = idf(w2). If c(w1, d2) = c(w1, d1) + c(w2, d1), c(w2, d2) = 0, c(w1, d1) ≠ 0, and c(w2, d1) ≠ 0, then f(d1, q) > f(d2, q).
18
Length Normalization Constraints (LNCs)

Document length normalization heuristic: Penalize long documents (LNC1); avoid over-penalizing long documents (LNC2).

LNC1: Let q be a query. If, for some word w ∉ q, c(w, d2) = c(w, d1) + 1, but for all other words w, c(w, d2) = c(w, d1), then f(d1, q) ≥ f(d2, q).

LNC2: Let q be a query. ∀k > 1, if |d1| = k · |d2| and, for all words w, c(w, d1) = k · c(w, d2), then f(d1, q) ≥ f(d2, q).
19
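A small numeric probe (made-up numbers, one-term query) shows how LNC2 constrains the pivoted normalization parameter s: concatenating a document with itself (k = 2) must not lower its score, which holds for small s but fails for large s, foreshadowing the parameter bound discussed later.

```python
import math

def piv_score(c_wd, dl, c_wq=1, N=1_000_000, df_w=1000, avdl=100.0, s=0.2):
    tf = (1 + math.log(1 + math.log(c_wd))) / ((1 - s) + s * dl / avdl)
    return tf * c_wq * math.log((N + 1) / df_w)

# LNC2 with k = 2: d1 is d2 concatenated with itself,
# so |d1| = 2|d2| and c(w,d1) = 2*c(w,d2); require f(d1,q) >= f(d2,q).
for s in (0.05, 0.2, 0.8):
    f_d2 = piv_score(c_wd=1, dl=50, s=s)
    f_d1 = piv_score(c_wd=2, dl=100, s=s)
    print(f"s={s}: LNC2 satisfied? {f_d1 >= f_d2}")   # True, True, False
```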
20
TF-LENGTH Constraint (TF-LNC)

TF-LN heuristic: Regularize the interaction of TF and document length.

Let q be a query with only one term w (q: w). If |d1| = |d2| + c(w, d1) − c(w, d2) and c(w, d1) > c(w, d2), then f(d1, q) > f(d2, q).
Seven Basic Relevance Constraints
[Fang et al. 2011]
Hui Fang, Tao Tao, ChengXiang Zhai: Diagnostic Evaluation of Information Retrieval Models. ACM Transactions on Information Systems (TOIS), 2011.
21
Outline
models
retrieval model
22
Axiomatic Relevance Hypothesis (ARH)
– Relevance can be modeled by a set of formally defined constraints on a retrieval function
– If a function satisfies all the constraints, it will perform well empirically
– If function Fa satisfies more constraints than function Fb, Fa would perform better than Fb empirically
– Given a set of relevance constraints C = {c1, …, ck}, function Fa is analytically more effective than function Fb iff the set of constraints satisfied by Fb is a proper subset of those satisfied by Fa
– A function F is optimal iff it satisfies all the constraints in C
23
Testing the Axiomatic Relevance Hypothesis
– Is satisfying these constraints correlated with good empirical performance of a retrieval function?
– Can we use constraint analysis to compare retrieval functions without experimentation?
– Constraint analysis reveals optimal ranges of parameter values
– When a formula does not satisfy a constraint, it often indicates non-optimality of the formula
– Violation of constraints may pinpoint where a formula needs to be improved.
24
Bounding Parameters
Constraint analysis (LNC2) bounds the pivoted-normalization parameter: s < 0.4; empirically, the optimal s (for average precision) indeed falls below 0.4.
[Figure: parameter sensitivity of s]
25
Analytical Comparison

The IDF part of Okapi BM25, $\ln\frac{N - df(w) + 0.5}{df(w) + 0.5}$, becomes negative when df(w) is large, causing BM25 to violate many constraints.

[Figure: retrieval performance of Pivoted vs. Okapi as a function of s or b, for keyword and verbose queries]
26
Fixing a deficiency in BM25 improves the effectiveness
Make Okapi satisfy more constraints; expected to help verbose queries
Replacing the original IDF in Okapi BM25 with the pivoted-style IDF $\ln\frac{N+1}{df(w)}$ yields a modified Okapi:

$$\sum_{w \in q \cap d} \ln\frac{N+1}{df(w)} \cdot \frac{(k_3+1)\,c(w,q)}{k_3 + c(w,q)} \cdot \frac{(k_1+1)\,c(w,d)}{k_1\big((1-b) + b\,\frac{|d|}{avdl}\big) + c(w,d)}$$

[Figure: performance of Pivoted, Okapi, and Modified Okapi as a function of s or b, for keyword and verbose queries]
27
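The deficiency is easy to see numerically. This sketch compares the original Okapi IDF with the pivoted-style replacement; once a term appears in more than half the documents, the original IDF goes negative, so additional matches of a very common query term can lower a document's score.

```python
import math

def okapi_idf(N, df):
    return math.log((N - df + 0.5) / (df + 0.5))

def modified_idf(N, df):        # pivoted-style IDF used in the modified Okapi
    return math.log((N + 1) / df)

N = 1_000_000
for df in (100, 400_000, 700_000):      # rare, common, very common term
    print(df, round(okapi_idf(N, df), 3), round(modified_idf(N, df), 3))
# df=700,000 > N/2: okapi_idf is negative, modified_idf stays positive.
```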
Systematic Analysis of 4 State of the Art Models [Fang et al. 11]
28
– Pivoted (PIV): parameter s must be small
– Dirichlet (DIR): problematic when a query term occurs less frequently in a document than expected
– Okapi (BM25): negative IDF; problematic with common terms
– PL2: parameter c must be large
Question: why are Dirichlet and PL2 still competitive despite their inherent problems that can’t be fixed through parameter tuning?
Outline
models
retrieval model
29
How can we leverage constraints to find an optimal retrieval model?
30
Basic Idea of the Axiomatic Framework (Optimization Problem Setup)
[Diagram: formally defined retrieval constraints C1, C2, C3 carve out a target region within the space of candidate retrieval functions S1, S2, S3; the goal is to find functions inside that region]
31
Three Questions
32
– How do we define the constraints? We've talked about that; more later.
– How do we define the function space? One possibility: leverage existing state-of-the-art functions.
– How do we search in the function space? One possibility: search in the neighborhood of existing state-of-the-art functions.
Inductive Definition of Function Space
Q = {q1, q2, ..., qm}; D = {d1, d2, ..., dn}; S : Q × D → ℝ
Define the function space inductively (the slide illustrates Q and D with the terms “cat”, “dog”, and “big”):
– Primitive weighting function (f): for a one-term query and a one-term document, S(Q, D) = f(q, d)
– Query growth function (h): S(Q ⊕ {q}, D) = S(Q, D) + h(q, Q, D)
– Document growth function (g): S(Q, D ⊕ {d}) = S(Q, D) + g(d, Q, D)
33
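One way to read the inductive definition is as a recursion that peels off one term at a time. This is a sketch only: the functions f, g, h are parameters to be instantiated (the axioms constrain them), and the peeling order chosen here (document terms first, then query terms) is an arbitrary illustrative choice.

```python
def inductive_score(Q, D, f, g, h):
    """S(Q, D) built inductively from:
       f(q, d): primitive weight for a one-term query and one-term document,
       g(d, Q, D): score change when the document grows by term d,
       h(q, Q, D): score change when the query grows by term q.
    Q and D are non-empty lists (bags) of terms."""
    if len(Q) == 1 and len(D) == 1:
        return f(Q[0], D[0])
    if len(D) > 1:                       # peel off the last document term
        return inductive_score(Q, D[:-1], f, g, h) + g(D[-1], Q, D[:-1])
    return inductive_score(Q[:-1], D, f, g, h) + h(Q[-1], Q[:-1], D)

# Example shape of a call (toy functions):
# inductive_score(["cat", "dog"], ["cat", "big", "dog"],
#                 f=lambda q, d: 1.0 if q == d else 0.0,
#                 g=lambda d, Q, D: 0.0,
#                 h=lambda q, Q, D: 0.0)
```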
Derivation of New Retrieval Functions
[Diagram] Start from an existing function S(Q, D); decompose it into component functions f, g, h; generalize them to families F, G, H; constrain the families with the constraints (C1, C2, C3); pick constrained instances f', g', h'; and assemble them into a new function S'(Q, D).
34
A Sample Derived Function based on BM25 [Fang & Zhai 05]
$$S(Q,D) = \sum_{t \in Q \cap D} \underbrace{c(t,Q)}_{\text{QTF}} \cdot \underbrace{\left(\frac{N}{df(t)}\right)^{0.35}}_{\text{IDF}} \cdot \underbrace{\frac{c(t,D)}{c(t,D) + s + s\,\frac{|D|}{avdl}}}_{\text{TF with length normalization}}$$
35
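The derived function is straightforward to implement. This sketch follows the formula on the slide (the 0.35 exponent and a single length-normalization parameter s); the function name and the default value of s are illustrative choices.

```python
import math
from collections import Counter

def axiomatic_score(query_terms, doc_terms, N, df, avdl, s=0.5):
    """Sample derived function based on BM25 [Fang & Zhai 05], as shown above."""
    cq, cd = Counter(query_terms), Counter(doc_terms)
    dl = len(doc_terms)
    total = 0.0
    for t in cq.keys() & cd.keys():
        qtf = cq[t]                                   # c(t, Q)
        idf = (N / df[t]) ** 0.35                     # (N / df(t)) ^ 0.35
        tf = cd[t] / (cd[t] + s + s * dl / avdl)      # length-normalized TF
        total += qtf * idf * tf
    return total
```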
The derived function is less sensitive to the parameter setting
[Figure: performance as a function of the parameter value; the axiomatic model is better across a wider range of settings]
36
Inevitability of heuristic thinking and necessity of axiomatic analysis
– Theoretically motivated models don’t automatically perform well empirically – Heuristic adjustment seems always necessary – Cause: inaccurate modeling of relevance
– The answer lies in axiomatic analysis – Use constraints to help identify the error in modeling relevance, thus obtaining insights about how to improve a model
37
Systematic Analysis of 4 State of the Art Models [Fang et al. 11]
38
– Pivoted (PIV): parameter s must be small
– Dirichlet (DIR): problematic when a query term occurs less frequently in a document than expected
– Okapi (BM25): negative IDF; problematic with common terms
– PL2: parameter c must be large
Modified BM25 satisfies all the constraints!
Without knowing its deficiency, we can’t easily propose a new model working better than BM25
A Recent Success of Axiomatic Analysis: Lower Bounding TF Normalization [Lv & Zhai 11]
– Existing retrieval functions fail to properly lower-bound the document-length-normalized TF
– Long documents are overly penalized: a very long document matching two query terms can score lower than a short document matching only one of them
– Adding a lower bound worked for BM25, PL2, Dirichlet, and Piv, leading to improved versions of them (BM25+, PL2+, Dir+, Piv+)
39
New Constraints: LB1 & LB2
40
LB1: Let Q be a query. Assume D1 and D2 are two documents such that S(Q, D1) = S(Q, D2). If we reformulate the query by adding another term q ∉ Q into Q, where c(q, D1) = 0 and c(q, D2) > 0, then S(Q ∪ {q}, D1) < S(Q ∪ {q}, D2).

LB2: Let Q = {q1, q2} be a query with two terms q1 and q2. Assume td(q1) = td(q2), where td(t) can be any reasonable measure of term discrimination value. If D1 and D2 are two documents such that c(q2, D1) = c(q2, D2) = 0, c(q1, D1) > 0, c(q1, D2) > 0, and S(Q, D1) = S(Q, D2), then S(Q, D1 ∪ {q1} − {t1}) < S(Q, D2 ∪ {q2} − {t2}) for all t1 and t2 such that t1 ∈ D1, t2 ∈ D2, t1 ∉ Q, and t2 ∉ Q.
– LB1: Repeated occurrence of an already matched query term isn't as important as the first occurrence of an otherwise absent query term
– LB2: The presence-absence gap (0-1 gap) shouldn't be closed due to length normalization
It turns out none of BM25, PL2, Dirichlet, PIV satisfies both constraints
A general heuristic solution: add a small constant lower bound to the normalized TF component. This worked well for improving all four models.
41
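The lower-bounding fix amounts to a one-line change. Below is a sketch of a BM25+-style per-term weight in the spirit of [Lv & Zhai 11]: a constant δ is added to the normalized TF component, and since only matched terms are scored, δ preserves the presence-absence gap; the parameter defaults here are illustrative.

```python
import math

def bm25_plus_term_weight(c_wq, c_wd, dl, df_w, N, avdl,
                          k1=1.2, b=0.75, k3=1000, delta=1.0):
    """BM25+ per matched term: BM25 with a lower bound delta on the TF component."""
    idf = math.log((N + 1) / df_w)                       # non-negative IDF variant
    qtf = (k3 + 1) * c_wq / (k3 + c_wq)
    tf = (k1 + 1) * c_wd / (k1 * ((1 - b) + b * dl / avdl) + c_wd)
    return idf * qtf * (tf + delta)                      # delta enforces the 0-1 gap
```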
BM25+ Improves over BM25
42
For details, see
Yuanhua Lv, ChengXiang Zhai. Lower Bounding Term Frequency Normalization. Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM'11), pages 7-16, 2011.
More examples of theory-effectiveness gap and the need for axiomatic analysis
independence of query terms; (3) collection LM for smoothing; however, it can't explain why some apparently reasonable smoothing methods perform poorly
99], we must ensure sufficient self translation probability to avoid unreasonable retrieval results, but such a constraint can’t be explained by estimation of translation model
function doesn’t work well as the asymmetric KL-divergence function D(Q||D)
43
Outline
models
retrieval model
44
Open Challenges
– Is there a complete set of constraints? If yes, how can we define them? If no, how can we prove it?
– How do we evaluate a constraint? (e.g., should the score contribution of a term be bounded? In BM25, it is.) – How do we evaluate a set of constraints?
– How should we search the function space? Search in the neighborhood of an existing function? Search in a new function space?
45
Open Challenges
– How can we quantify the degree of satisfaction? – How can we put constraints in a machine learning framework? Something like maximum entropy?
pseudo feedback? Cross-lingual IR?
queries? Specific type of documents?
46
Possible Future Scenario 1: Impossibility Theorems for IR
– Similar to Kleinberg's impossibility theorem for clustering
47
Jon Kleinberg. An Impossibility Theorem for Clustering. Advances in Neural Information Processing Systems (NIPS) 15, 2002.
Future Scenario 2: Sufficiently Restrictive Constraints
– A sufficiently restrictive set of constraints may determine a unique (optimal) retrieval function
– Similar to the derivation of the entropy function
48
Future Scenario 3 (most likely): Open Set of Insufficient Constraints
– The constraints don't conflict with one another, but are insufficient for ensuring good retrieval performance
sure what they are
constructive retrieval functional space and supervised machine learning
49
Summary: Axiomatic Relevance Hypothesis
relevance
retrieval models
models for bridging the theory-effectiveness gap
the state of the art models
function space based on existing or new models and theories
50
Updated Answers
– Why do these models tend to perform similarly even though they were derived in very different ways? They share some nice common properties; these properties are more important than how each model is derived.
– Why has it been so hard to beat these strong baseline methods? We don't have good knowledge about their deficiencies; other variants don't have all the “nice properties”; so far we didn't find a constraint that they fail to satisfy.
– Are they hitting the ceiling of the bag-of-words assumption? If yes, how can we prove it? If not, how can we find a more effective one? We need to formally define “the ceiling” (= the complete set of “nice properties”); with relevance modeled more accurately through constraints, the answer so far: No, they have NOT hit the ceiling yet!
51
Future Work: Putting It All Together!
52
[Figure: a road map from the Vector Space Model (the Cranfield era) toward the Optimal Model; axiomatic analysis marks “we are here,” with the caption “You are 2013.0322 miles from destination.” This strategy seems to have worked well in the past.]
Acknowledgments
Maryam Karimzadehgan, and others
53
Questions/Comments?
54
Overview of all Research Projects
[Diagram: text information management research spanning search, filtering, categorization, summarization, clustering, natural language content analysis, extraction, mining, and visualization, organized around information access, knowledge acquisition, and information organization, with search and mining applications in Web, email, and biomedical informatics, including entity/relation extraction]
More information can be found at http://timan.cs.uiuc.edu/