Chapter: Information Retrieval and Learning
Slides borrowed from Tie-Yan Liu's presentation
Microsoft Research Asia
Conventional Ranking Models
Query-dependent: Boolean model, extended Boolean model, etc.
Vector space
The Pointwise Approach
Variants: Regression, Classification, Ordinal Regression
Input space: single documents $x_j$
Output space: labels $y_j$, which are real values (regression), non-ordered categories (classification), or ordinal categories (ordinal regression)
Hypothesis space: scoring function $f(x)$
Loss function: regression loss, classification loss, or ordinal regression loss, written $L(f; x_j, y_j)$
Training data: $(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)$
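The pointwise losses listed above each depend on a single pair $(x_j, y_j)$. A minimal Python sketch follows, assuming a linear scoring function, a squared-error regression loss, and a hinge classification loss; these concrete choices and all function names are illustrative assumptions, not taken from the slides.

```python
# Illustrative pointwise losses: each one looks at a single (x_j, y_j) pair.
# The linear form of f and the specific losses are assumptions for this sketch.

def score(w, x):
    """Scoring function f(x) = <w, x> (assumed linear)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def regression_loss(w, x, y):
    """Squared error between the score and a real-valued relevance label y."""
    return (score(w, x) - y) ** 2

def classification_loss(w, x, y):
    """Hinge loss for a binary label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score(w, x))

w = [0.5, -1.0]
x = [2.0, 1.0]                           # f(x) = 0.5*2.0 - 1.0*1.0 = 0.0
print(regression_loss(w, x, 1.0))        # 1.0
print(classification_loss(w, x, +1))     # 1.0
```

The total training loss would be the sum of one such term per training document.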
Introduction to Information Retrieval
[Equation not recoverable from the extraction: a sum over $i = 1, \ldots, m$ containing a squared term.]
– One query, one document, one class (relevant, non-relevant) (or several categories)
– $f(x) = \mathrm{sign}(\langle w, x \rangle + b)$
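The decision rule f(x) = sign(<w, x> + b) can be sketched in a few lines of Python. The weight vector, bias, and feature vectors below are made-up values, and mapping a decision value of exactly 0 to +1 is an assumption of this sketch.

```python
# Linear classifier f(x) = sign(<w, x> + b) for one (query, document) feature vector.

def decision_value(w, b, x):
    """Signed distance-like quantity <w, x> + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Return +1 (relevant) or -1 (non-relevant); ties at 0 go to +1 by assumption."""
    return 1 if decision_value(w, b, x) >= 0 else -1

w, b = [1.0, -1.0], -0.5                 # hypothetical learned parameters
print(classify(w, b, [2.0, 0.5]))        # decision value 1.0  -> 1
print(classify(w, b, [0.0, 1.0]))        # decision value -1.5 -> -1
```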
[Figures: candidate separating hyperplanes B1 and B2 with their margin boundaries (b11, b12) and (b21, b22); the margin and the support vectors are marked.]
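Separating hyperplane: $\langle w, x \rangle + b = 0$
Margin boundaries: $\langle w, x \rangle + b = -1$ and $\langle w, x \rangle + b = +1$
Decision rule: $f(x) = +1$ if $\langle w, x \rangle + b \geq 1$, and $f(x) = -1$ if $\langle w, x \rangle + b \leq -1$
M = Margin Width: $M = \frac{(x^{+} - x^{-}) \cdot w}{\|w\|} = \frac{2}{\|w\|}$
[Figure: the points $x^{+}$ and $x^{-}$ plotted against axes $x_1$ and $x_2$.]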
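The margin width between the planes <w, x> + b = +1 and <w, x> + b = -1 is 2 / ||w||. A small numeric check in Python, using a made-up weight vector and two hypothetical points chosen to lie on the two planes (with b = 0):

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def margin_width(w):
    """M = 2 / ||w||, the distance between the planes <w,x>+b = +1 and -1."""
    return 2.0 / math.sqrt(dot(w, w))

w = [3.0, 4.0]                           # ||w|| = 5
print(margin_width(w))                   # 0.4

# Same quantity as the projection of (x+ - x-) onto the unit normal w / ||w||.
x_plus = [0.12, 0.16]                    # satisfies <w, x> = +1 (up to rounding)
x_minus = [-0.12, -0.16]                 # satisfies <w, x> = -1 (up to rounding)
diff = [p - m for p, m in zip(x_plus, x_minus)]
print(dot(diff, w) / math.sqrt(dot(w, w)))
```

A smaller ||w|| gives a wider margin, which is why maximizing the margin is posed as minimizing a function of w.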
– Goal: 1) Correctly classify all training data
– We can formulate a Quadratic Optimization Problem and solve for $w$ and $b$
– Minimize $\frac{1}{2} w^T w$
– The old formulation: find $w$ and $b$ such that $\frac{1}{2} w^T w$ is minimized and, for all $\{(x_i, y_i)\}$, $y_i (w^T x_i + b) \geq 1$
– The new formulation incorporating slack variables: find $w$ and $b$ such that $\frac{1}{2} w^T w + C \sum_i \xi_i$ is minimized and, for all $\{(x_i, y_i)\}$, $y_i (w^T x_i + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$ for all $i$
– Parameter $C$ can be viewed as a way to control overfitting.
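To make the role of the slack variables and of C concrete, here is a sketch in Python that evaluates the soft-margin objective for a fixed (w, b). For a given (w, b), the smallest feasible slack for each example is max(0, 1 - y (w^T x + b)). The data and parameter values are invented for illustration, and this code only evaluates the objective; it does not solve the quadratic program.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def slack(w, b, x, y):
    """Smallest xi >= 0 satisfying y (w^T x + b) >= 1 - xi."""
    return max(0.0, 1.0 - y * (dot(w, x) + b))

def soft_margin_objective(w, b, data, C):
    """1/2 w^T w + C * sum of slacks, for fixed w and b."""
    return 0.5 * dot(w, w) + C * sum(slack(w, b, x, y) for x, y in data)

w, b = [1.0, 0.0], 0.0
data = [([2.0, 0.0], +1),    # outside the margin: slack 0
        ([-2.0, 0.0], -1),   # outside the margin: slack 0
        ([0.5, 0.0], +1)]    # inside the margin:  slack 0.5

print(soft_margin_objective(w, b, data, C=1.0))    # 0.5 + 1.0 * 0.5  = 1.0
print(soft_margin_objective(w, b, data, C=10.0))   # 0.5 + 10.0 * 0.5 = 5.5
```

A larger C penalizes margin violations more heavily, pushing the solution toward fitting every training point; a smaller C tolerates violations in exchange for a wider margin.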
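$$\min_{w,\,\xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \sum_{u,v:\, y_{u,v}^{(i)} = 1} \xi_{u,v}^{(i)}$$
subject to
$$w^T \left( x_u^{(i)} - x_v^{(i)} \right) \geq 1 - \xi_{u,v}^{(i)} \ \text{ if } y_{u,v}^{(i)} = 1, \qquad \xi_{u,v}^{(i)} \geq 0, \qquad i = 1, \ldots, n.$$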
Learning: use SVM to perform binary classification on these instances, to learn the model parameter $w$
$\mathrm{RSV}(q, d_u) = \langle w^{*}, x_u \rangle$
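A sketch of the two steps around the SVM in the pairwise setting: building the binary-classification instances from document pairs of one query, then ranking documents by their score <w, x_u>. The function names, the graded labels, and the weight vector are hypothetical; the SVM training step itself is not shown, and the weights are simply assumed to be already learned.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def pairwise_instances(docs):
    """docs: list of (feature_vector, relevance_label) for ONE query.
    Returns (x_u - x_v, +1) for every ordered pair where u is more relevant than v."""
    pairs = []
    for xu, yu in docs:
        for xv, yv in docs:
            if yu > yv:
                pairs.append(([a - b for a, b in zip(xu, xv)], 1))
    return pairs

def rank(w, docs):
    """Document indices sorted by decreasing score <w, x_u>."""
    return sorted(range(len(docs)), key=lambda u: dot(w, docs[u][0]), reverse=True)

docs = [([1.0, 0.0], 2), ([0.0, 1.0], 1), ([0.5, 0.5], 0)]
print(pairwise_instances(docs))   # three difference vectors, all labeled +1
w_star = [2.0, 1.0]               # assumed output of SVM training
print(rank(w_star, docs))         # scores 2.0, 1.0, 1.5 -> [0, 2, 1]
```

Pairs are formed only within a single query, since relevance labels of documents retrieved for different queries are not comparable.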
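The Listwise Approach
Variants: Listwise Loss Minimization, Direct Optimization of IR Measure
Input space: document set $x = \{x_j\}_{j=1}^{m}$
Output space: permutation $\pi_y$ (listwise loss minimization), or ordered categories $y = \{y_j\}_{j=1}^{m}$ (direct optimization of IR measure)
Hypothesis space: $h(x) = \mathrm{sort} \circ f(x)$ (listwise loss minimization), or $h(x) = f(x)$ (direct optimization of IR measure)
Loss function: listwise loss $L(h; x, \pi_y)$ (listwise loss minimization), or 1 - surrogate measure $L(h; x, y)$ (direct optimization of IR measure)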
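In the listwise setting the hypothesis acts on the whole document set of a query. The Python sketch below shows a hypothesis of the form sort applied to f(x), returning a permutation, together with a loss of the form 1 minus an IR measure. NDCG is used here as an assumed example of such a measure, and the exact (non-smoothed) measure stands in for the surrogate; both are illustration choices, as are the linear f and the sample data.

```python
import math

def f(w, x):
    """Scoring function, assumed linear for this sketch."""
    return sum(wi * xi for wi, xi in zip(w, x))

def h(w, docs):
    """sort applied to f: document indices ordered by decreasing score."""
    return sorted(range(len(docs)), key=lambda j: f(w, docs[j]), reverse=True)

def dcg(labels):
    """Discounted cumulative gain of labels listed in ranked order."""
    return sum((2 ** y - 1) / math.log2(r + 2) for r, y in enumerate(labels))

def one_minus_ndcg(permutation, labels):
    """Loss defined over the whole list: 1 - NDCG of the predicted permutation."""
    ranked = [labels[j] for j in permutation]
    best = dcg(sorted(labels, reverse=True))
    return 1.0 - dcg(ranked) / best if best > 0 else 0.0

docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [1, 0, 2]
perm = h([1.0, 0.5], docs)            # scores 1.0, 0.5, 1.5
print(perm)                           # [2, 0, 1]
print(one_minus_ndcg(perm, labels))   # 0.0 (matches the ideal order)
```

Unlike the pointwise and pairwise losses, this loss cannot be written as a sum over individual documents or pairs; it depends on the position of every document in the list.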