SLIDE 8 8
29
Probabilistic models (e.g.: [Croft and Harper, 1979])
Estimate and rank by P(R | Q, D), or by the log-odds
$$\log \frac{P(R \mid Q, D)}{P(\bar{R} \mid Q, D)}$$
I.e., rank by
$$\log \prod_{t_i \in Q, D} \frac{p_i (1 - q_i)}{q_i (1 - p_i)}$$
where
$$p_i = P(t_i \mid R), \qquad q_i = P(t_i \mid \bar{R})$$
Assume p_i is the same for all query terms, and q_i = n_i/N, where n_i is the number of docs containing t_i and N is DB size (i.e., “all” docs are non-relevant):
$$\log \prod_{t_i \in Q, D} \frac{p_i (1 - q_i)}{q_i (1 - p_i)} \propto \log \prod_{t_i \in Q, D} \frac{1 - q_i}{q_i} = \log \prod_{t_i \in Q, D} \frac{N - n_i}{n_i} = \sum_{t_i \in Q, D} \log \frac{N - n_i}{n_i}$$
Intuition: e.g., “apple computer” in a computer DB
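As a rough illustration of the simplified ranking function, the sketch below scores a document by summing log((N - n_i)/n_i) over the query terms it contains. The function name `croft_harper_score`, the document frequencies, and the two toy documents are assumptions invented for this example, not values from the slides. The sketch also assumes 0 < n_i < N so that the logarithm is defined.

```python
import math

def croft_harper_score(query_terms, doc_terms, doc_freq, n_docs):
    """Sum log((N - n_i) / n_i) over query terms that occur in the document.

    Assumes 0 < n_i < N for every matched term (hypothetical helper).
    """
    score = 0.0
    for term in set(query_terms) & set(doc_terms):
        n_i = doc_freq[term]
        score += math.log((n_docs - n_i) / n_i)
    return score

# Hypothetical statistics for a 1000-document computer database.
N = 1000
doc_freq = {"apple": 20, "computer": 900}

w_apple = math.log((N - 20) / 20)        # rare term: large positive weight
w_computer = math.log((N - 900) / 900)   # near-universal term: negative weight

query = ["apple", "computer"]
doc_a = ["apple", "computer", "laptop"]  # matches both query terms
doc_b = ["computer", "keyboard"]         # matches only the common term

score_a = croft_harper_score(query, doc_a, doc_freq, N)
score_b = croft_harper_score(query, doc_b, doc_freq, N)
```

With these made-up counts, "computer" occurs in 90% of the database and contributes a negative weight, while the rare "apple" contributes a large positive one, so `doc_a` outranks `doc_b`. This matches the intuition behind the “apple computer” example.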
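30
To rank by
$$\log \frac{P(R \mid Q, D)}{P(\bar{R} \mid Q, D)}$$
this is how we derive the ranking function:
$$\frac{P(R \mid Q, D)}{P(\bar{R} \mid Q, D)} = \frac{P(Q, D \mid R)\, P(R)}{P(Q, D \mid \bar{R})\, P(\bar{R})} \propto \frac{P(Q, D \mid R)}{P(Q, D \mid \bar{R})}$$
$$P(Q, D \mid R) = \prod_{t_i \in Q, D} P(t_i \mid R) \prod_{t_j \in Q, \bar{D}} \bigl(1 - P(t_j \mid R)\bigr) = \prod_{t_i \in Q, D} p_i \prod_{t_j \in Q, \bar{D}} (1 - p_j)$$
$$P(Q, D \mid \bar{R}) = \prod_{t_i \in Q, D} P(t_i \mid \bar{R}) \prod_{t_j \in Q, \bar{D}} \bigl(1 - P(t_j \mid \bar{R})\bigr) = \prod_{t_i \in Q, D} q_i \prod_{t_j \in Q, \bar{D}} (1 - q_j)$$
$$\frac{P(R \mid Q, D)}{P(\bar{R} \mid Q, D)} \propto \frac{\prod_{t_i \in Q, D} p_i \prod_{t_j \in Q, \bar{D}} (1 - p_j)}{\prod_{t_i \in Q, D} q_i \prod_{t_j \in Q, \bar{D}} (1 - q_j)} = \prod_{t_i \in Q, D} \frac{p_i (1 - q_i)}{q_i (1 - p_i)} \cdot \prod_{t_i \in Q} \frac{1 - p_i}{1 - q_i} \propto \prod_{t_i \in Q, D} \frac{p_i (1 - q_i)}{q_i (1 - p_i)}$$
The last step drops the product over all query terms t_i ∈ Q, which does not depend on D and so does not affect the ranking.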
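The key step in the derivation is that the full odds ratio and the reduced product over matched terms differ only by a factor that is the same for every document. The sketch below checks this numerically. The function names `full_odds` and `reduced_odds`, the term probabilities, and the toy documents are all hypothetical, chosen only to exercise the algebra.

```python
import math

def full_odds(query, doc, p, q):
    """P(Q,D|R) / P(Q,D|not R) under the term-independence assumption."""
    ratio = 1.0
    for t in query:
        if t in doc:   # query term present in the document
            ratio *= p[t] / q[t]
        else:          # query term absent from the document
            ratio *= (1 - p[t]) / (1 - q[t])
    return ratio

def reduced_odds(query, doc, p, q):
    """Product over matched terms of p_t (1 - q_t) / (q_t (1 - p_t))."""
    ratio = 1.0
    for t in query:
        if t in doc:
            ratio *= p[t] * (1 - q[t]) / (q[t] * (1 - p[t]))
    return ratio

# Hypothetical term probabilities, for illustration only.
p = {"apple": 0.6, "computer": 0.7, "macintosh": 0.4}    # P(t | R)
q = {"apple": 0.05, "computer": 0.8, "macintosh": 0.02}  # P(t | not R)
query = ["apple", "computer", "macintosh"]
docs = [{"apple", "computer"}, {"computer"}, {"apple", "macintosh"}, set()]

# Document-independent factor: product over ALL query terms of (1 - p_t) / (1 - q_t).
constant = math.prod((1 - p[t]) / (1 - q[t]) for t in query)

full = [full_odds(query, d, p, q) for d in docs]
reduced = [reduced_odds(query, d, p, q) for d in docs]

# Both scores order the documents identically.
rank_full = sorted(range(len(docs)), key=lambda i: full[i], reverse=True)
rank_reduced = sorted(range(len(docs)), key=lambda i: reduced[i], reverse=True)
```

Because `full[i]` equals `reduced[i] * constant` for every document, sorting by either score gives the same ranking, which is why the derivation can discard the constant.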
31
Inference-based relevance
Motivation
Is there any “objective” way of defining relevance?
Hint from a logic view of database querying: retrieve all objects O s.t. O → Q
  E.g., O = (john, cs, 3.5), Q = gpa>3.0 AND dept=cs
What about “Retrieve D iff we can prove D→Q”?
Challenges:
  Uncertainty in inference? [van Rijsbergen, 1986]
  Representation of documents and queries
  Quantify the uncertainty of inference: P(D→Q) = P(Q|D)
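The logic view of database querying can be sketched in a few lines: an object is retrieved exactly when it implies the query predicate. The `Student` record type and the two extra rows are assumptions added for illustration; only (john, cs, 3.5) and the predicate come from the example above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Student:
    name: str
    dept: str
    gpa: float

def satisfies_query(o: Student) -> bool:
    """Q: gpa > 3.0 AND dept = cs. True exactly when O -> Q holds."""
    return o.gpa > 3.0 and o.dept == "cs"

students = [
    Student("john", "cs", 3.5),  # the object from the example
    Student("mary", "ee", 3.9),  # hypothetical row: fails dept = cs
    Student("bob", "cs", 2.8),   # hypothetical row: fails gpa > 3.0
]

# Retrieve all objects O such that O -> Q.
answer = [o.name for o in students if satisfies_query(o)]
```

For structured records the implication is either true or false. For documents it is uncertain, which is what motivates replacing the Boolean test with a probability such as P(Q|D).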
32
Inference network [Turtle and Croft, 1990]
Given doc as evidence, prove that info need is satisfied
Inference based on Bayesian belief networks
[Figure: inference network. Doc network: doc nodes d1, d2, ..., dn (evidence: “doc dn observed”), doc rep. nodes t1, t2, ..., tn, doc concept nodes r1, r2, r3, ..., rk. Query network: query concept nodes c1, c2, ..., cm, query rep. nodes q1, q2, and Q, the query or “information need”.]
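To make the idea of "observe a doc, then infer belief in the information need" concrete, here is a heavily simplified sketch. It collapses the network to three layers (doc, term representation, query concept) feeding Q, and it combines parent beliefs with a noisy-OR rule that treats parents as independent. The layer collapse, the noisy-OR choice, all weights, and the names `noisy_or` and `belief_in_query` are assumptions for illustration, not the link matrices of [Turtle and Croft, 1990].

```python
def noisy_or(parent_beliefs, weights):
    """Belief that a node is true: 1 - prod(1 - w * bel(parent)).

    Parents are treated as independent (an assumption of this sketch).
    """
    prob_false = 1.0
    for parent, w in weights.items():
        prob_false *= 1.0 - w * parent_beliefs.get(parent, 0.0)
    return 1.0 - prob_false

def belief_in_query(doc_id, doc_to_term, term_to_concept, concept_to_query):
    """Propagate belief from one observed document up to the query node Q."""
    # Evidence: the given document is observed, all other docs are not.
    doc_belief = {doc_id: 1.0}
    term_belief = {t: noisy_or(doc_belief, ws) for t, ws in doc_to_term.items()}
    concept_belief = {c: noisy_or(term_belief, ws)
                      for c, ws in term_to_concept.items()}
    return noisy_or(concept_belief, concept_to_query)

# Hypothetical link weights: child node -> {parent node: weight}.
doc_to_term = {"t1": {"d1": 0.8, "d2": 0.1}, "t2": {"d1": 0.3, "d2": 0.9}}
term_to_concept = {"c1": {"t1": 0.9}, "c2": {"t2": 0.7}}
concept_to_query = {"c1": 0.6, "c2": 0.4}

scores = {d: belief_in_query(d, doc_to_term, term_to_concept, concept_to_query)
          for d in ("d1", "d2")}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Each document is scored by instantiating it as the only evidence and reading off the belief at Q, so ranking amounts to repeating the same propagation once per document.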