

  1. Chapter: Information Retrieval and Learning. Slides borrowed from Tie-Yan Liu's presentation, Microsoft Research Asia.

  2. Conventional Ranking Models
   • Query-dependent
     – Boolean model, extended Boolean model, etc.
     – Vector space model, latent semantic indexing (LSI), etc.
     – BM25 model, statistical language model, etc. (a BM25 sketch follows below)
   • Query-independent
     – PageRank, TrustRank, BrowseRank, Toolbar Clicks, etc.
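Of the query-dependent models listed, BM25 is the most common baseline. Here is a minimal Python sketch of its scoring formula (the standard formulation with the conventional k1 and b defaults; none of this code comes from the slides):

import math

def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_dl,
               k1=1.2, b=0.75):
    """Score one document for a query with the classic BM25 formula.

    doc_freq maps each term to its document frequency in the collection;
    n_docs and avg_dl are the collection size and average document length.
    """
    dl = len(doc_terms)
    score = 0.0
    for t in set(query_terms):
        tf = doc_terms.count(t)
        if tf == 0 or t not in doc_freq:
            continue
        # Smoothed IDF, then the saturating term-frequency component.
        idf = math.log((n_docs - doc_freq[t] + 0.5) / (doc_freq[t] + 0.5) + 1)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avg_dl))
    return score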

  3. Generative vs. Discriminative
   • All of the probabilistic retrieval models presented so far (PRP, LM, inference model) fall into the category of generative models
     – A generative model assumes that documents were generated from some underlying model (in this case, usually a multinomial distribution) and uses training data to estimate the parameters of the model
     – The probability of belonging to a class (i.e. the relevant documents for a query) is then estimated using Bayes' Rule and the document model (written out below)
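The Bayes'-Rule step mentioned above, written out in LaTeX (this is the standard derivation; the slide itself does not give the formula):

% Posterior probability of relevance given a document and a query.
% The generative model supplies P(d | R = 1, q), e.g., a multinomial
% over terms whose parameters are estimated from training data.
\[
  P(R = 1 \mid d, q)
    = \frac{P(d \mid R = 1, q)\, P(R = 1 \mid q)}{P(d \mid q)}
    \propto P(d \mid R = 1, q)\, P(R = 1 \mid q)
\]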

  4. Discriminative model for IR
   • Discriminative models can be trained using
     – explicit relevance judgments
     – or click data from query logs
   • Click data is much cheaper, but noisier

  5. Relevance judgment
   • Degree of relevance l_k
     – Binary: relevant vs. irrelevant
     – Multiple ordered categories: Perfect > Excellent > Good > Fair > Bad
   • Pairwise preference l_{u,v}
     – Document A is more relevant than document B
   • Total order π_l
     – Documents are ranked as {A, B, C, ...} according to their relevance
   (A sketch of these three encodings follows.)
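A small Python sketch of how the three judgment types might be encoded; the document names, label values, and helper function are hypothetical, chosen only for illustration:

# 1) Degree of relevance l_k: one label per document,
#    binary (0/1) or graded ("Perfect" > "Excellent" > ... > "Bad").
graded = {"docA": 4, "docB": 2, "docC": 0}   # 4 = Perfect ... 0 = Bad

# 2) Pairwise preference l_{u,v}: (u, v) means u is more relevant than v.
pairs = [("docA", "docB"), ("docA", "docC"), ("docB", "docC")]

# 3) Total order pi_l: documents ranked by decreasing relevance.
total_order = ["docA", "docB", "docC"]

def labels_to_pairs(labels):
    """A graded labeling induces the pairwise form."""
    return [(u, v) for u in labels for v in labels
            if labels[u] > labels[v]]

assert set(labels_to_pairs(graded)) == set(pairs)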

  6. Learning to rank (apprentissage d'ordonnancement)

  7. Machine learning can help
   • Machine learning is an effective tool
     – to automatically tune parameters,
     – to combine multiple sources of evidence,
     – to avoid over-fitting (by means of regularization, etc.)
   • "Learning to Rank"
     – In general, methods that use machine learning to solve the ranking problem are called "learning to rank" methods.

  8. Machine learning
   • Given a training set of examples, each of which is a tuple of: a query q, a document d, and a relevance judgment for d on q
   • Learn weights from this training set, so that the learned scores approximate the relevance judgments in the training set

  9. Discriminative Training
   • An automatic learning process based on the training data
   • With the four pillars of discriminative learning:
     – Input space (feature vectors)
     – Output space (+1/−1, real value, ranking)
     – Hypothesis space (functions mapping the input to the output)
     – Function quality (loss function: the risk, or error, between the hypothesis and the ground truth)

  10. Learning to rank: general approach
   1) Collect training data (queries and their labeled documents)
   2) Feature extraction for query-document pairs
   3) Learn the ranking model by minimizing a loss function on the training data
   4) Use the learned model to infer the ranking of documents for new queries
   (A compact sketch of these four steps follows.)
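A compact Python sketch of the four steps above; extract_features and the linear least-squares model are illustrative stand-ins, not the system the slides describe:

import numpy as np

def extract_features(query, doc):
    """Step 2: features for one query-document pair (toy stand-in)."""
    q_terms = set(query.split())
    overlap = len(q_terms & set(doc.split())) / max(len(q_terms), 1)
    return np.array([overlap, float(len(doc.split()))])

def train(pairs, labels):
    """Step 3: fit a linear scoring model by minimizing squared loss."""
    X = np.array([extract_features(q, d) for q, d in pairs])
    A = np.c_[X, np.ones(len(X))]          # append a bias column
    w, *_ = np.linalg.lstsq(A, np.asarray(labels, dtype=float), rcond=None)
    return w

def rank(w, query, docs):
    """Step 4: score and sort candidate documents for a new query."""
    scores = [float(np.r_[extract_features(query, d), 1.0] @ w) for d in docs]
    return [d for _, d in sorted(zip(scores, docs), reverse=True)]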

  11. Example of features [figure not recoverable from the transcript]

  12. Categorization: Basic Unit of Learning
   • Pointwise
     – Input: single documents
     – Output: scores or class labels (relevant / non-relevant)
   • Pairwise
     – Input: document pairs
     – Output: partial-order preferences
   • Listwise
     – Input: document collections
     – Output: ranked document list

  13. Categorization of the algorithms [figure not recoverable from the transcript]

  14. The Pointwise Approach
   • Input space: single documents x_j
   • Output space: real values (regression), non-ordered categories (classification), ordinal categories (ordinal regression)
   • Hypothesis space: scoring function f(x)
   • Loss function L(f; x_j, y_j): regression loss, classification loss, or ordinal regression loss

  15. The Pointwise Approach
   • Reduce ranking over a training set {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)} to:
     – Regression: Subset Ranking
     – Classification: Discriminative model for IR, McRank
     – Ordinal regression: PRanking, Ranking with large margin principles

  16. Pointwise example (Introduction to Information Retrieval, Sec. 15.4.1)
   • Collect training examples as (q, d, y) triples
     – The relevance y is binary (it can also be graded)
     – Each document is represented by two features: the vector x = (α, ω), where α is the similarity between q and d, and ω is the proximity of the query terms within the document
       • ω is the size of the smallest span of document text that contains all the query words
   • Two example approaches: linear regression and classification
   (A sketch of computing the two features follows.)
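A Python sketch of the two features, under the assumptions stated on the slide: α is the cosine similarity between query and document term vectors, and ω is the smallest window of document text covering all query terms:

import math
from collections import Counter

def cosine(q_terms, d_terms):
    """alpha: cosine similarity between query and document term vectors."""
    q, d = Counter(q_terms), Counter(d_terms)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * \
           math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def smallest_window(q_terms, d_terms):
    """omega: length of the shortest span of d_terms containing all q_terms."""
    need = set(q_terms)
    best = float("inf")
    for i in range(len(d_terms)):           # brute force is fine for a sketch
        seen = set()
        for j in range(i, len(d_terms)):
            if d_terms[j] in need:
                seen.add(d_terms[j])
                if seen == need:
                    best = min(best, j - i + 1)
                    break
    return best

q = "ranking model".split()
d = "a ranking approach learns a model for ranking".split()
x = (cosine(q, d), smallest_window(q, d))   # the feature vector (alpha, omega)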

  17. Pointwise approach: linear regression
   • Relevance is treated as a score value
   • Goal: learn the scoring function that combines the different features:
     f(x) = Σ_{i=1..m} w_i x_i + w_0
     – w: the weights tuned by learning
     – (x_1, ..., x_m): the features of the document-query pair
   • Find the w_i that minimize the squared error over the n training examples:
     L(f; x_i, y_i) = (f(x_i) − y_i)^2, i.e., minimize (1/2) Σ_{i=1..n} (y_i − f(x_i))^2
   • Labels: relevant (y = 1), non-relevant (y = 0)

  18. Regression example
   • Learn a scoring function that combines the two features (x_1, x_2) = (α, ω):
     f(d, q) = w_1 · α(d, q) + w_2 · ω(d, q) + w_0
   (A toy fit of this function follows.)
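A toy least-squares fit of this scoring function; the (α, ω, y) training triples below are invented purely for illustration:

import numpy as np

X = np.array([[0.90, 3.0],    # high similarity, tight window  -> relevant
              [0.75, 5.0],
              [0.20, 40.0],   # low similarity, scattered terms -> non-relevant
              [0.10, 60.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])

A = np.c_[X, np.ones(len(X))]               # column of ones for the bias w0
(w1, w2, w0), *_ = np.linalg.lstsq(A, y, rcond=None)

def f(alpha, omega):
    """Learned scoring function for a (document, query) pair."""
    return w1 * alpha + w2 * omega + w0

print(f(0.8, 4.0))   # should score near the relevant examples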

  19. Pointwise approach: classification (SVM)
   • Casts IR as a classification problem: a query, a document, a class (relevant / non-relevant; possibly several categories)
   • We look for a decision function of the form f(x) = sign(<w, x> + b)
     – We want <w, x> + b ≤ −1 for non-relevant documents and <w, x> + b ≥ +1 for relevant ones
   (A sketch of the decision rule follows.)
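The decision rule in Python, with stand-in weights in place of a trained SVM (the values and their signs are hypothetical):

import numpy as np

w = np.array([2.0, -0.05])   # weights on (alpha, omega): similarity helps,
b = -0.5                     # large windows hurt (assumed signs)

def classify(x):
    """f(x) = sign(<w, x> + b): +1 = relevant, -1 = non-relevant."""
    return 1 if w @ x + b >= 0 else -1

print(classify(np.array([0.9, 3.0])))    # +1: relevant-looking pair
print(classify(np.array([0.1, 60.0])))   # -1: non-relevant-looking pair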

  20. Support Vector Machines
   • Find a linear hyperplane (decision boundary) that separates the data
   • One possible solution: B_1 [figure]

  21. Support Vector Machines
   • Another possible solution: B_2 [figure]

  22. Support Vector Machines
   • Other possible solutions [figure]

  23. Support Vector Machines
   • Which one is better, B_1 or B_2? How do you define "better"?

  24. Support Vector Machines
   • Find the hyperplane that maximizes the margin ⇒ B_1 is better than B_2 [figure: margins b_11/b_12 and b_21/b_22]

  25. Support Vector Machines
   • The support vectors are the training points lying on the margin hyperplanes [figure: B_1 with its support vectors marked]

  26. Support Vector Machines
   • Decision boundary: <w, x> + b = 0; margin hyperplanes: <w, x> + b = +1 and <w, x> + b = −1 [figure]
   • f(x) = +1 if <w, x> + b ≥ 1, and −1 if <w, x> + b ≤ −1
   • Margin width: M = (x_+ − x_−) · w/||w|| = 2/||w||

  27. Linear SVM
   • Goal:
     1) Correctly classify all training data:
        <w, x_i> + b ≥ +1 if y_i = +1
        <w, x_i> + b ≤ −1 if y_i = −1
        equivalently, y_i(<w, x_i> + b) ≥ 1 for all i
     2) Maximize the margin M = 2/||w||, which is the same as minimizing (1/2) w^T w
   • We can formulate this as a quadratic optimization problem and solve for w and b:
     minimize (1/2) w^T w subject to y_i(<w, x_i> + b) ≥ 1 for all i
   (A solver sketch follows.)
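As a didactic check, this quadratic program can be handed to a generic constrained solver; the data here are a toy separable set, and production SVM trainers instead solve the dual with specialized methods such as SMO:

import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1, 1, -1, -1])

def objective(v):                 # v = (w1, w2, b); minimize (1/2) w^T w
    return 0.5 * (v[0]**2 + v[1]**2)

# One inequality constraint per point: y_i(<w, x_i> + b) - 1 >= 0.
constraints = [{"type": "ineq",
                "fun": lambda v, x=x_i, t=t_i: t * (v[:2] @ x + v[2]) - 1}
               for x_i, t_i in zip(X, y)]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
w, b = res.x[:2], res.x[2]
print(w, b, res.success)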

  28. Linear SVM (non-separable case)
   • Noisy data, outliers, etc. are handled with slack variables ξ_i [figure: points ξ_1, ξ_2 on the wrong side of their margin]
   • f(x) = +1 if <w, x> + b ≥ 1 − ξ_i, and −1 if <w, x> + b ≤ −1 + ξ_i

  29. SVM: Hard Margin vs. Soft Margin
   • The old (hard-margin) formulation:
     Find w and b that minimize (1/2) w^T w such that y_i(w^T x_i + b) ≥ 1 for all (x_i, y_i)
   • The new formulation, incorporating slack variables:
     Find w and b that minimize (1/2) w^T w + C Σ_i ξ_i such that y_i(w^T x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0 for all i
   • The parameter C can be viewed as a way to control overfitting
   (A training sketch follows.)
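A minimal subgradient-descent sketch of this soft-margin objective in its unconstrained (hinge-loss) form; the learning rate and epoch count are arbitrary illustration choices:

import numpy as np

def train_soft_margin(X, y, C=1.0, lr=0.01, epochs=1000):
    """Minimize (1/2)||w||^2 + C * sum_i max(0, 1 - y_i(<w, x_i> + b))."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                 # points with nonzero slack xi_i
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train_soft_margin(np.array([[2., 2.], [3., 3.], [0., 0.], [1., 0.]]),
                         np.array([1, 1, -1, -1]))

Larger C penalizes slack more heavily (closer to hard margin); smaller C widens the margin at the cost of more violations, which is why C acts as an overfitting control.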

  30. Learning to rank (Sec. 15.4.2)
   • Classification (regression) probably isn't the right way to think about approaching ad hoc IR:
     – Classification problems: map to an unordered set of classes
     – Regression problems: map to a real value
     – Ordinal regression problems: map to an ordered set of classes
       • A fairly obscure sub-branch of statistics, but what we want here
   • This formulation gives extra power:
     – Relations between relevance levels are modeled
     – Documents are good relative to other documents for a query on a given collection; there is no absolute scale of goodness
