
SLIDE 1

On the Mathematical Relationship between Expected n-call@k and the Relevance vs. Diversity Trade-off

Kar Wai Lim, Scott Sanner, Shengbo Guo, Thore Graepel, Sarvnaz Karimi, Sadegh Kharazmi
Feb 21, 2013

SLIDE 2

Outline

  • Need for diversity
  • The answer: MMR
  • Jeopardy: what was the question?
      – Expected n-call@k

SLIDE 3

Search Result Ranking

  • We query the daily news for "technology" → we get this
  • Is this desirable?
  • Note that de-duplication would not solve this problem

SLIDE 4

Another example

Query for "Apple":

  • Is this better?
SLIDE 5

The Answer: Diversity

  • When query is ambiguous, diversity is useful
  • How can we achieve this?

– Maximum marginal relevance (MMR)

  • Carbonell & Goldstein, SIGIR 1998
  • Sk is a subset of k selected documents from D
  • Greedily build Sk from Sk-1, starting from S0 = ∅, using the selection rule below:

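The MMR selection rule itself appears as an equation image on the original slide. A standard LaTeX rendering of Carbonell & Goldstein's criterion (with Sim1 the query-document similarity and Sim2 the document-document similarity) is:

$$ s_k^* = \operatorname*{argmax}_{s_k \in D \setminus S_{k-1}} \Big[ \lambda\, \mathrm{Sim}_1(s_k, q) \;-\; (1 - \lambda) \max_{s_j \in S_{k-1}} \mathrm{Sim}_2(s_k, s_j) \Big] $$

Here λ ∈ [0, 1] trades off relevance (λ = 1) against diversity.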
SLIDE 6

What was the Question?

  • MMR is an algorithm; we don't know what underlying objective it is optimizing.
  • There have been previous formalization attempts, but the full question went unanswered for 14 years
      – Chen and Karger, SIGIR 2006 came closest
  • This talk: one complete derivation of MMR

SLIDE 7

What Set-based Objectives Encourage Diversity?

  • Chen and Karger, SIGIR 2006: 1-call@k

      – At least one document in Sk should be relevant
      – Diverse: encourages you to "cover your bases" with Sk
      – Sanner et al, CIKM 2011: 1-call@k derives MMR with λ = ½

  • van Rijsbergen, 1979: Probability Ranking Principle (PRP)

      – Rank items by probability of relevance (e.g., modeled via term frequency)
      – Not diverse: encourages the kth item to be very similar to the first k-1 items
      – k-call@k relates to MMR with λ = 1, which is PRP

  • So either λ= ½ (1-call@k) or λ= 1 (k-call@k)?

– Should really tune λ for MMR based on query ambiguity

  • Santos, MacDonald, Ounis, CIKM 2011: Learn best λ given query features

      – So what derives λ ∈ [½, 1]?

  • Any guesses?

SLIDE 8

Empirical Study of n-call@k

  • How does diversity of n-call@k change with n?

  • J. Wang and J. Zhu. Portfolio theory of information retrieval, SIGIR 2009

[Figure: estimated diversity of results under n-call@k; the measured diversity clearly decreases as n increases]

SLIDE 9

Hypothesis

  • Let's try optimizing 2-call@k
      – Derivation builds on Sanner et al, CIKM 2011
      – Optimizing this leads to MMR with λ = 2/3
  • There seems to be a trend relating λ and n:
      – n = 1: λ = 1/2
      – n = 2: λ = 2/3
      – n = k: λ = 1
  • Hypothesis
      – Optimizing n-call@k leads to MMR with lim_{k→∞} λ(k, n) = n/(n+1)

SLIDE 10

One Detail is Missing…

  • We want to optimize n-call@k

– i.e., at least n of k documents should be relevant

  • But what is “relevance”?

      – Need a model for this
      – In particular, one that models query and document ambiguity (via latent topics)
  • Since we hypothesize that topic ambiguity underlies the need for diversity

SLIDE 11

Graphical Model of Relevance

Latent subtopic binary relevance model

s = selected docs
t = subtopics ∈ T
r = relevance ∈ {0, 1}
q = observed query
T = discrete subtopic set {apple-fruit, apple-inc}

[Graphical model figure; legend: observed vs. latent (unobserved) nodes]

SLIDE 12

Graphical Model of Relevance

Latent subtopic binary relevance model

P(ti = C | si) = probability that document si belongs to subtopic C
P(t = C | q) = probability that query q refers to subtopic C

[Graphical model figure; legend: observed vs. latent (unobserved) nodes]

SLIDE 13

Graphical Model of Relevance

Latent subtopic binary relevance model

P(ri = 1 | ti = t) = 1
P(ri = 1 | ti ≠ t) = 0
(a document is relevant exactly when its subtopic matches the query's subtopic t)

[Graphical model figure; legend: observed vs. latent (unobserved) nodes]

SLIDE 14

Optimising Objective

  • Now we can compute expected relevance
      – So we need to use the Expected n-call@k objective (see the formula below)
  • For a given query q, we want the maximizing Sk
      – Intractable to jointly optimize
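The Expected n-call@k objective is an equation image on the original slide; reconstructed from the surrounding definitions (with Rk the number of relevant documents among the k selected), it reads:

$$ S_k^* = \operatorname*{argmax}_{S_k \subseteq D,\ |S_k| = k} P\!\left(R_k \ge n \mid S_k, q\right), \qquad \text{where } R_k = \sum_{i=1}^{k} r_i $$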

SLIDE 15

Greedy approach

  • Like MMR, we'll take a greedy approach
      – Select the next document sk* given all previously chosen documents Sk-1 (see the rule below)
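The greedy selection rule is likewise an equation image on the slide; under the same notation, each step picks the document that maximizes the objective given the fixed earlier choices:

$$ s_k^* = \operatorname*{argmax}_{s_k \in D \setminus S_{k-1}} P\!\left(R_k \ge n \mid s_k, S_{k-1}, q\right) $$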

SLIDE 16

Derivation

  • Nontrivial
      – Only an overview of "key tricks" here
  • For full details, see
      – Sanner et al, CIKM 2011: 1-call@k (gentler introduction)
          • http://users.cecs.anu.edu.au/~ssanner/Papers/cikm11.pdf
      – Lim et al, SIGIR 2012: n-call@k
          • http://users.cecs.anu.edu.au/~ssanner/Papers/sigir12.pdf
      – Online SIGIR 2012 appendix
          • http://users.cecs.anu.edu.au/~ssanner/Papers/sigir12_app.pdf

SLIDE 17

Derivation

SLIDE 18

Derivation


Marginalise out all subtopics (using conditional probability)

SLIDE 19

Derivation


We write rk conditioned on Rk-1, where the event decomposes into two mutually exclusive cases, hence the + (spelled out below).
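The decomposition referred to here is the standard disjoint-case identity:

$$ P(R_k \ge n) \;=\; P(r_k = 1,\ R_{k-1} \ge n - 1) \;+\; P(r_k = 0,\ R_{k-1} \ge n) $$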

SLIDE 20

Derivation


Start to push latent topic marginalizations as far in as possible.

SLIDE 21

Derivation

The first term in the + is independent of sk, so it can be removed from the max.
SLIDE 22

Derivation

  • We arrive at the simplified objective
  • This is still a complicated expression, but it can be expressed recursively…

SLIDE 23

Recursion

A conditional decomposition very similar to the one used in the first part of the derivation.

SLIDE 24

Unrolling the Recursion

  • We can unroll the previous recursion, express it in closed form, and substitute:

Where's the max? MMR has a max.

SLIDE 25

Deterministic Topic Probabilities

  • We assume that the topics of each document are known (deterministic); hence P(ti | si) ∈ {0, 1}
      – Likewise for P(t | q)
      – This means that a document refers to exactly one topic, and likewise for queries, e.g.,
          • If you search for "Apple" you meant the fruit OR the company, but not both
          • If a document refers to "Apple" the fruit, it does not discuss the company Apple Computer

SLIDE 26

Deterministic Topic Probabilities

  • Generally: P(ti = C | si) ∈ [0, 1]
  • Deterministic: P(ti = C | si) ∈ {0, 1}

SLIDE 27

Convert a ∏ to a max

  • Assuming deterministic topic probabilities, we can convert a ∏ to a max and vice versa
  • For xi ∈ {0 (false), 1 (true)}:

max_i xi = ∨_i xi = 1 − ∧_i (1 − xi) = 1 − ∏_i (1 − xi)
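A quick sanity check of the identity on x = (0, 1, 0): max_i xi = 1, and 1 − (1 − 0)(1 − 1)(1 − 0) = 1 − 0 = 1.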

SLIDE 28

Convert a ∏ to a max

  • Applying this conversion to the optimizing objective under deterministic topic probabilities, we can rewrite its ∏ terms as a max

SLIDE 29

Objective After ∏ → max

SLIDE 30

Combinatorial Simplification

  • Deterministic topics also permit combinatorial simplification of some of the ∏s
  • Assuming that m documents out of the chosen k-1 are relevant, then the top term is non-zero … times and the bottom term is non-zero … times

SLIDE 31

Final form

  • After…
      – assuming a deterministic topic distribution,
      – converting ∏ to a max, and
      – combinatorial simplification
  • Topic marginalization leads to a probability product kernel Sim1(·, ·): this is any kernel that L1-normalizes its inputs, so it can be used with TF or TF-IDF!
  • MMR drops the q dependence in Sim2(·, ·).
  • argmax is invariant to a constant multiplier; use Pascal's rule to normalize the coefficients to [0, 1].
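Assembling these pieces, the final form has the shape of the MMR rule from SLIDE 5 (the exact Pascal's-rule-normalized coefficients appear in the linked paper); schematically:

$$ s_k^* = \operatorname*{argmax}_{s_k \in D \setminus S_{k-1}} \Big[ \lambda\, \mathrm{Sim}_1(s_k, q) \;-\; (1 - \lambda) \max_{s_i \in S_{k-1}} \mathrm{Sim}_2(s_k, s_i) \Big] $$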

SLIDE 32

Comparison to MMR

  • The optimising objective used in MMR is the selection rule shown on SLIDE 5
  • We note that the optimising objective for expected n-call@k has the same form as MMR, with λ = m/(m+1)
      – but m is unknown (see the sketch below, which uses the m = n choice)
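As an illustration only, not code from the paper: a minimal sketch of this MMR-style greedy selection with λ = n/(n+1). The names (mmr_select, l1_normalize) and the toy TF vectors are hypothetical, and using dot products over L1-normalized TF vectors for both Sim1 and Sim2 is an assumed stand-in; the derivation only requires Sim1 to be a probability product kernel over L1-normalized inputs.

```python
import numpy as np

def l1_normalize(v):
    # L1-normalize a term-frequency vector (the probability-product-kernel
    # view of Sim1 requires L1-normalized inputs).
    s = v.sum()
    return v / s if s > 0 else v

def mmr_select(query_tf, doc_tfs, k, n):
    """Greedy MMR-style selection with lambda = n / (n + 1).

    query_tf: 1-D term-frequency vector for the query
    doc_tfs:  sequence of term-frequency vectors, one per candidate document
    Returns the indices of the k selected documents, in selection order.
    """
    lam = n / (n + 1.0)                       # relevance vs. diversity trade-off
    q = l1_normalize(np.asarray(query_tf, dtype=float))
    docs = np.array([l1_normalize(np.asarray(d, dtype=float)) for d in doc_tfs])
    sim1 = docs @ q                           # Sim1(s, q): query-document similarity
    selected, candidates = [], set(range(len(docs)))
    for _ in range(min(k, len(docs))):
        best, best_score = None, -np.inf
        for i in candidates:
            # Diversity penalty: max similarity to any already-selected document.
            penalty = max((float(docs[i] @ docs[j]) for j in selected), default=0.0)
            score = lam * sim1[i] - (1.0 - lam) * penalty
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected

# Example: two similar "apple-inc" documents and one "apple-fruit" document.
docs = [[3, 1, 0], [2, 1, 0], [0, 1, 3]]      # rows: toy TF vectors
print(mmr_select([1, 1, 1], docs, k=2, n=1))  # n=1 gives lambda = 1/2: a diverse pair
```

With n = 1 this gives λ = ½ (the 1-call@k setting of Sanner et al, CIKM 2011); as n grows toward k, λ approaches 1, recovering PRP-style relevance-only ranking.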

SLIDE 33

Expectation of m

  • Under expected n-call@k's greedy algorithm, after choosing k-1 documents (note that k ≥ n and m ≤ n), we would expect m ≈ n
  • With the assumption m = n, we obtain λ = n/(n+1)
      – Our hypothesis!

λ = n/(n+1) also roughly follows the empirical behavior observed earlier; the variation is likely due to m differing for each corpus. m is corpus dependent, but can be left in if wanted; since m ≤ n, it follows that λ = n/(n+1) is an upper bound on λ = m/(m+1).

SLIDE 34

Summary of Contributions

  • We showed the first derivation of MMR from first principles:
      – MMR optimizes expected n-call@k under the given graphical model of relevance and assumptions
      – After 14 years, this gives insight into what MMR is optimizing!
  • This framework can be used to derive new diversification (or retrieval) algorithms by changing
      – the graphical model of relevance
      – the set- or rank-based objective criterion
      – the assumptions