SLIDE 1

CS6200: Information Retrieval

Modeling Relevance Gain

Evaluation, session 4

SLIDE 2

All of the measures we’ve seen so far can be expressed in a different way, based on a user model. The user model gives the probability of the user reading each document in the ranking. With these probabilities, we can calculate the expected amount of relevance the user would gain from the ranking.

Expected Relevance Gain

Let:

$$P(i) := \text{prob. the user reads doc } i$$

$$R(\vec{r}) := \text{fraction of the docs the user reads from } \vec{r} \text{ which are relevant}$$

Then:

$$\mathrm{gain}(\vec{r}) := E_P[R(\vec{r})] = \sum_{i=1}^{|\vec{r}|} P(i) \cdot r_i$$
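To make the definition concrete, here is a minimal Python sketch of the expected-gain computation. The function name `expected_gain` and the `read_prob` callback are illustrative assumptions, not from the slides; `rels` stands for the 0/1 relevance vector r, with ranks starting at 1.

```python
def expected_gain(rels, read_prob):
    """Expected relevance gain: sum_i P(i) * r_i over the ranking.

    rels:      0/1 relevance judgments; rels[0] is the doc at rank 1.
    read_prob: callable giving P(i), the probability the user reads
               the document at rank i (1-indexed).
    """
    return sum(read_prob(i) * r for i, r in enumerate(rels, start=1))
```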

SLIDE 3

For precision@k, we model the user as having equal probability of reading each of the top k documents and zero probability of reading anything else. Is this a reasonable user model?

Precision@k

$$P_{\mathrm{prec@}k}(i) := \begin{cases} 1/k & \text{if } i \le k \\ 0 & \text{otherwise} \end{cases}$$

$$E_{P_{\mathrm{prec@}k}}[R(\vec{r})] = \sum_{i=1}^{|\vec{r}|} P_{\mathrm{prec@}k}(i) \cdot r_i = \sum_{i=1}^{k} \frac{1}{k}\, r_i = \frac{1}{k} \sum_{i=1}^{k} r_i$$
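Written as code, the same reduction looks like this; a sketch only, with illustrative names:

```python
def p_prec_at_k(i, k):
    """P_prec@k(i): uniform (1/k) over the top k ranks, zero below."""
    return 1.0 / k if i <= k else 0.0

def precision_at_k(rels, k):
    """Expected gain under the precision@k user model."""
    return sum(p_prec_at_k(i, k) * r for i, r in enumerate(rels, start=1))

# precision_at_k([1, 0, 1, 1, 0], k=3) -> 0.666...
# (two of the top three documents are relevant)
```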

SLIDE 4

DCG and nDCG don’t normalize easily for this framework, so instead we introduce a related measure: Scaled DCG, or sdcg. This user model is top-weighted: the probability of observing a document is higher for top-ranked documents.

Scaled DCG

$$P_{\mathrm{sdcg@}k}(i) := \begin{cases} \dfrac{1}{Z} \cdot \dfrac{1}{\lg(i+1)} & \text{if } i \le k \\ 0 & \text{otherwise} \end{cases} \qquad Z := \sum_{i=1}^{k} \frac{1}{\lg(i+1)}$$

$$\mathrm{sdcg@}k(\vec{r}) := \sum_{i=1}^{|\vec{r}|} r_i\, P_{\mathrm{sdcg@}k}(i) = \frac{1}{Z} \sum_{i=1}^{k} \frac{r_i}{\lg(i+1)}$$
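A corresponding sketch for sdcg@k, again with assumed names:

```python
from math import log2

def sdcg_at_k(rels, k):
    """Scaled DCG: log-discounted gain over the top k ranks, divided
    by Z so the rank weights form a probability distribution."""
    z = sum(1.0 / log2(i + 1) for i in range(1, k + 1))
    dcg = sum(r / log2(i + 1) for i, r in enumerate(rels[:k], start=1))
    return dcg / z
```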

SLIDE 5

So far, we have reconsidered the measures based on the probability of the user observing a document. It’s sometimes useful to instead consider the probability of the user continuing past a given document. If they read doc i, will they read i+1?

Probability of Continuing

$$C_M(i) := \frac{P_M(i+1)}{P_M(i)}$$

$$C_{\mathrm{prec@}k}(i) := \begin{cases} 1 & \text{if } i < k \\ 0 & \text{otherwise} \end{cases} \qquad C_{\mathrm{sdcg@}k}(i) := \begin{cases} \dfrac{\lg(i+1)}{\lg(i+2)} & \text{if } i < k \\ 0 & \text{otherwise} \end{cases}$$
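The relationship C_M(i) = P_M(i+1) / P_M(i) is easy to check numerically. A sketch with illustrative names; note that for sdcg the normalizer Z cancels in the ratio, so the unnormalized weights suffice.

```python
from math import log2

def continue_prob(p, i):
    """C_M(i) = P_M(i + 1) / P_M(i) for any user model P_M."""
    return p(i + 1) / p(i)

k = 10
p_prec = lambda i: 1.0 / k if i <= k else 0.0
p_sdcg = lambda i: 1.0 / log2(i + 1) if i <= k else 0.0  # Z cancels

assert continue_prob(p_prec, 3) == 1.0                   # i < k
assert abs(continue_prob(p_sdcg, 3) - log2(4) / log2(5)) < 1e-12
```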
SLIDE 6

Rank-biased precision is the measure we get if we imagine that the user has some fixed probability, p, of continuing. This hypothetical user flips a p-biased coin at each document to decide when to give up. On average, this user will read 1 / (1 - p) documents before giving up.

Rank-biased Precision

$$P_{\mathrm{rbp}}(i) := (1-p)\, p^{\,i-1} \qquad C_{\mathrm{rbp}}(i) := p$$
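A sketch of RBP as expected gain under this geometric user model (names are illustrative):

```python
def rbp(rels, p):
    """Rank-biased precision: P_rbp(i) = (1 - p) * p**(i - 1)."""
    return sum((1 - p) * p ** (i - 1) * r
               for i, r in enumerate(rels, start=1))

# With p = 0.8 the modeled user reads 1 / (1 - 0.8) = 5 docs on
# average, so early relevant documents dominate the score.
```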

SLIDE 7

This form of Inverse Squares (Moffat et al., 2012) is built on the intuition that the probability of continuing depends on the number of documents the user expects to need to satisfy her information need. Its parameter T is the anticipated number of documents.

  • For navigational queries, T ≅ 1
  • For informational queries, T ≫ 1

Inverse Squares

Let

$$S_m := \frac{\pi^2}{6} - \sum_{i=1}^{m} \frac{1}{i^2}$$

Then:

$$P_{\mathrm{insq}}(T, i) := \frac{1}{S_{2T-1}} \cdot \frac{1}{(i + 2T - 1)^2} \qquad C_{\mathrm{insq}}(T, i) = \frac{(i + 2T - 1)^2}{(i + 2T)^2}$$
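These quantities translate directly to code. A sketch with assumed names; S_m is the tail of the series Σ 1/i² = π²/6, which is what makes P_insq sum to one over all ranks.

```python
from math import pi

def s(m):
    """S_m = pi^2/6 - sum_{i=1..m} 1/i^2 (tail of the Basel series)."""
    return pi ** 2 / 6 - sum(1.0 / i ** 2 for i in range(1, m + 1))

def p_insq(t, i):
    """P_insq(T, i) = 1 / (S_{2T-1} * (i + 2T - 1)^2)."""
    return 1.0 / (s(2 * t - 1) * (i + 2 * t - 1) ** 2)

def c_insq(t, i):
    """C_insq(T, i) = (i + 2T - 1)^2 / (i + 2T)^2."""
    return (i + 2 * t - 1) ** 2 / (i + 2 * t) ** 2
```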

SLIDE 8

A final way to model user behavior is based on the probability that document i is the last document read. This gives an interpretation for Average Precision: the expected relevance gained from the user choosing a relevant document i uniformly at random, and reading all documents from 1 to i. Imagine that exactly one of the relevant documents will satisfy the user, but we don’t know which one.

Average Precision

$$L_M(i) := \frac{P_M(i) - P_M(i+1)}{P_M(1)}$$

$$L_{\mathrm{ap}}(i) := \begin{cases} r_i / R & \text{if } R > 0 \\ 0 & \text{otherwise} \end{cases}$$

where R is the total number of relevant documents for the query.
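Under this model, average precision is the expected fraction of read documents that are relevant, where the stopping rank is a relevant document chosen uniformly at random. A sketch with illustrative names:

```python
def average_precision(rels):
    """AP under the 'last document' model: the user stops at a
    relevant rank i chosen with probability L_ap(i) = r_i / R and
    reads ranks 1..i; the gain at i is precision@i."""
    R = sum(rels)
    if R == 0:
        return 0.0
    gain, seen_rel = 0.0, 0
    for i, r in enumerate(rels, start=1):
        seen_rel += r
        if r:
            gain += seen_rel / i  # precision at stopping rank i
    return gain / R

# average_precision([1, 0, 1]) -> (1/1 + 2/3) / 2 = 0.833...
```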
SLIDE 9

Evaluation metrics should be carefully chosen to be well-suited to the users and task you’re trying to measure. Understanding the user model underlying a given metric can help shed light on what you’re really measuring. Next, we’ll look at the construction and use of test collections.

Wrapping Up