Learning to Rank with Partially-Labeled Data
Kevin Duh, University of Washington (joint work with Katrin Kirchhoff)


  1. Learning to Rank with Partially-Labeled Data. Kevin Duh, University of Washington (joint work with Katrin Kirchhoff).

  2. Motivation
     • Machine learning can be an effective solution for ranking problems in IR.
     • But its success depends on the quality and size of the training data, which comes in two forms: labeled data and unlabeled data.

  3. Problem Statement
     • Supervised: labeled data → learning algorithm → ranking function f(x).
     • Semi-supervised: labeled data + unlabeled data → learning algorithm → ranking function f(x).
     • Can we build a better ranker by adding cheap, unlabeled data?

  4. Outline
     1. Problem Definition: (a) ranking as a supervised learning problem; (b) two kinds of partially-labeled data.
     2. Proposed Method.
     3. Results and Analysis.

  5. Ranking as a Supervised Learning Problem
     Each query has documents represented as feature vectors, with graded relevance labels:
     • Query "SIGIR": $x_1^{(1)} = [\mathrm{tfidf}, \mathrm{pagerank}, \ldots]$ (label 3), $x_2^{(1)} = [\mathrm{tfidf}, \mathrm{pagerank}, \ldots]$ (label 1), $x_3^{(1)} = [\mathrm{tfidf}, \mathrm{pagerank}, \ldots]$ (label 2).
     • Query "Hotels in Singapore": $x_1^{(2)} = [\mathrm{tfidf}, \mathrm{pagerank}, \ldots]$ (label 2), $x_2^{(2)} = [\mathrm{tfidf}, \mathrm{pagerank}, \ldots]$ (label 1).

  6. Ranking as a Supervised Learning Problem
     Train $f(x)$ so that its scores respect the labels:
     • Query "SIGIR": $f(x_1^{(1)}) > f(x_3^{(1)}) > f(x_2^{(1)})$.
     • Query "Hotels in Singapore": $f(x_1^{(2)}) > f(x_2^{(2)})$.
     • Test query "Singapore Airport": labels are unknown (?); the learned $f(x)$ is used to rank its documents.
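To make the pairwise view concrete, here is a minimal sketch of how graded labels are turned into preference pairs for training. The feature values are illustrative placeholders, mirroring the slide's "SIGIR" example; this is not code from the paper.

```python
# Minimal sketch: turning graded relevance labels into preference pairs.
from itertools import combinations

# Query "SIGIR": x1, x2, x3 with labels 3, 1, 2 (higher = more relevant).
# Feature values are made up for illustration ([tfidf, pagerank, ...]).
docs = [
    {"id": 1, "features": [0.7, 0.9], "label": 3},
    {"id": 2, "features": [0.4, 0.2], "label": 1},
    {"id": 3, "features": [0.5, 0.6], "label": 2},
]

# A pair (a, b) means "a should be ranked above b"; ties generate no pair.
pairs = [(a, b) if a["label"] > b["label"] else (b, a)
         for a, b in combinations(docs, 2) if a["label"] != b["label"]]

for above, below in pairs:
    print(f"doc{above['id']} > doc{below['id']}")
# doc1 > doc2, doc1 > doc3, doc3 > doc2, i.e. f(x1) > f(x3) > f(x2)
```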

  7. Two Kinds of Partially-Labeled Data
     1. Lack of labels for some documents (depth): every query has labeled documents (Doc1, Doc2) plus unlabeled ones (Doc3: ?). Some references: Amini+, SIGIR'08; Agarwal, ICML'06; Wang+, MSRA TechRep'05; Zhou+, NIPS'04; He+, ACM Multimedia '04.
     2. Lack of labels for some queries (breadth): some queries are fully labeled (Query1, Query2), others are entirely unlabeled (Query3: all documents ?). This paper; Truong+, ICMIST'06.

  8. Focus of This Work: Transductive Learning
     • Unlabeled data = test data → transductive learning: the training queries (Query1, Query2) are fully labeled, and the test query's documents are all unlabeled (?).
     • Main question: how can knowledge of the test list help our learning algorithm?

  9. Why Transductive Learning?
     • Inductive (semi-supervised) learning must generalize to new data: f(x) is trained on the labeled queries and then applied to an unseen test query. Inductive learning = closed-book exam.
     • Transductive learning: the test data is fixed and observed during learning. Transductive learning = open-note exam.
     • Arguably, transduction is easier than induction.

  10. Outline
     1. Problem Definition.
     2. Proposed Method: (a) intuition; (b) details of the proposed algorithm.
     3. Results and Analysis.

  11. Thought Experiment: What Information Does Unlabeled Data Provide?
     (Scatter plots of the documents for Query 1 and Query 2 in BM25-HITS feature space.)
     • Observation: the direction of variance differs according to the query.
     • Implication: different feature representations are optimal for different queries.

  12. Good results can be achieved by ranking Query 1 by BM25 only and ranking Query 2 by HITS only.
     (Scatter plots in BM25-HITS space: relevant webpages receive high rank and irrelevant webpages low rank along each query's dominant feature.)

  13. Proposed Method: Main Ideas
     Main assumptions:
     1. Different queries are best modeled by different features.
     2. Unlabeled data can help us discover this representation.
     Two-step algorithm (sketched in code below). Requires:
     • DISCOVER(): an unsupervised method for finding useful features.
     • LEARN(): a supervised method for learning to rank.
     For each test list: run DISCOVER(), augment the feature representation, then run LEARN() and predict.
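A sketch of the two-step procedure as runnable pseudocode. The function names `transductive_rank`, `discover`, and `learn` are hypothetical placeholders for the components the slides describe, not the authors' implementation; the feature mapping follows the slide's linear notation z = A'x (with Kernel PCA the projection is applied in kernel feature space instead).

```python
# Sketch of the two-step transductive procedure from this slide.
import numpy as np

def transductive_rank(train_lists, test_list, discover, learn):
    """Rank one test list using its own unlabeled documents.

    train_lists : [(X_q, y_q)] labeled document lists, one per training query
    test_list   : X_test, the unlabeled documents of the test query
    discover    : unsupervised feature learner, fit on the test list only
    learn       : supervised learning-to-rank algorithm
    """
    # Step 1: DISCOVER() finds a projection from the test list's structure.
    A = discover(test_list)                      # e.g. a PCA projection matrix

    # Step 2: augment every list with the discovered features z = A'x ...
    augmented_train = [(np.hstack([X, X @ A]), y) for X, y in train_lists]
    augmented_test = np.hstack([test_list, test_list @ A])

    # ... then run LEARN() on the augmented labeled data and predict.
    f = learn(augmented_train)
    scores = f(augmented_test)
    return np.argsort(-scores)                   # test documents, best first
```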

  14. Proposed Method: Illustration
     • Unsupervised learning on the test query's unlabeled documents outputs a projection matrix A, where x is the initial feature representation.
     • Supervised learning of the ranking function then runs on the labeled queries under the new feature representation z = A'x, and predicts on the test list.

  15. DISCOVER() Component
     • Goal of DISCOVER(): find useful patterns on the test list.
     • Principal Components Analysis (PCA): discovers the directions of maximum variance; low-variance directions are viewed as noise.
     • Kernel PCA [Scholkopf+, Neural Computation '98]: a non-linear extension of PCA via the kernel trick. (1) Map the inputs non-linearly to a high-dimensional space; (2) perform PCA in that space.
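The slides do not prescribe an implementation of DISCOVER(); as an assumed illustration, scikit-learn's KernelPCA can play that role. The kernel choice and `gamma` value here are placeholders.

```python
# Illustrative DISCOVER() using scikit-learn's KernelPCA (an assumption;
# the authors' own implementation is not specified in the slides).
import numpy as np
from sklearn.decomposition import KernelPCA

def discover_kpca(X_test_list, n_components=5, kernel="rbf", gamma=0.1):
    """Fit Kernel PCA on one test list; return a feature-mapping function."""
    kpca = KernelPCA(n_components=n_components, kernel=kernel, gamma=gamma)
    kpca.fit(X_test_list)                 # unsupervised: uses no labels
    return kpca.transform                 # maps any x to its KPCA coordinates

# Usage: fit on the test list, then augment train and test documents alike.
X_test_list = np.random.rand(100, 44)     # 100 docs, 44 LETOR-style features
to_kpca = discover_kpca(X_test_list)
z = to_kpca(X_test_list)                  # (100, 5) discovered features
```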

  16. Kernels for Kernel PCA
     • Linear: $K(x, x') = \langle x, x' \rangle$
     • Polynomial: $K(x, x') = (1 + \langle x, x' \rangle)^d$
     • Gaussian: $K(x, x') = \exp(-\beta \, \lVert x - x' \rVert)$
     • Diffusion: random-walk kernel between $x$ and $x'$ on a graph
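The first three kernels have direct closed forms; a minimal sketch follows. The diffusion kernel is defined via random walks on a document graph and is only named here, not implemented.

```python
# Direct implementations of the first three kernels from the slide.
import numpy as np

def linear_kernel(x, xp):
    return np.dot(x, xp)                              # K(x,x') = <x, x'>

def polynomial_kernel(x, xp, d=2):
    return (1.0 + np.dot(x, xp)) ** d                 # (1 + <x, x'>)^d

def gaussian_kernel(x, xp, beta=1.0):
    return np.exp(-beta * np.linalg.norm(x - xp))     # exp(-beta ||x - x'||)
```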

  17. LEARN() Component
     • Goal of LEARN(): optimize some ranking metric on the labeled data.
     • RankBoost [Freund+, JMLR 2003]: inherent feature selection; few parameters to tune.
     • Other supervised ranking methods are possible: RankNet, RankSVM, ListNet, FRank, SoftRank, etc.
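For concreteness, here is a minimal RankBoost loop with binary threshold weak learners, following Freund et al. (2003). It is a didactic sketch under simplifying assumptions (exhaustive threshold search, no efficiency tricks), not the authors' implementation.

```python
# Minimal RankBoost sketch (after Freund et al., JMLR 2003) with weak
# learners h(x) = 1[x[f] > theta]. Didactic; O(features * thresholds * pairs).
import numpy as np

def rankboost(X, pairs, n_rounds=50):
    """X: (n_docs, n_features); pairs: (i, j) = doc i should rank above doc j."""
    D = np.full(len(pairs), 1.0 / len(pairs))   # weights over preference pairs
    ensemble = []                                # (alpha, feature, threshold)

    for _ in range(n_rounds):
        best = (0.0, 0, 0.0)
        for f in range(X.shape[1]):              # inherent feature selection:
            for theta in np.unique(X[:, f]):     # each round picks one feature
                h = (X[:, f] > theta).astype(float)
                # r in [-1, 1]: weighted agreement of h with the preferences
                r = sum(D[k] * (h[i] - h[j]) for k, (i, j) in enumerate(pairs))
                if abs(r) > abs(best[0]):
                    best = (r, f, theta)
        r, f, theta = best
        r = float(np.clip(r, -0.999, 0.999))     # avoid infinite alpha
        alpha = 0.5 * np.log((1.0 + r) / (1.0 - r))
        ensemble.append((alpha, f, theta))

        # Pairs the current learner gets wrong gain weight for the next round.
        h = (X[:, f] > theta).astype(float)
        for k, (i, j) in enumerate(pairs):
            D[k] *= np.exp(alpha * (h[j] - h[i]))
        D /= D.sum()

    def score(x):                                # final ranking function
        return sum(a * float(x[f] > t) for a, f, t in ensemble)
    return score
```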

  18. Summary of Proposed Method
     • Relies on unlabeled test data to learn a good feature representation.
     • "Adapts" the supervised learning process to each test list.
     • Caveats: DISCOVER() may not always find features that are helpful for LEARN(); and LEARN() runs at query time, so a computational speedup is needed in practical applications.

  19. Outline
     1. Problem Definition.
     2. Proposed Method.
     3. Results and Analysis: (a) experimental setup; (b) main results; (c) deeper analysis into where things worked and failed.

  20. Experiment Setup (1/2)
     • LETOR dataset [Liu+, LR4IR 2007]:

                                        TREC03   TREC04   OHSUMED
         # of queries                       50       75       106
         Average # of documents/query     1000     1000       150
         # of original features             44       44        25

     • Additional features generated by Kernel PCA: 5 kernels (Linear, Polynomial, Gaussian, Diffusion 1, Diffusion 2); extract 5 principal components for each.
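A hedged sketch of how the extra features (5 principal components per kernel) could be assembled. The kernel parameters are placeholders and scikit-learn is assumed rather than the authors' code; the two graph-based diffusion kernels are omitted for brevity, so this sketch yields 15 extra features rather than the paper's 25.

```python
# Assembling extra features: n principal components from each kernel.
import numpy as np
from sklearn.decomposition import KernelPCA

kernels = [
    dict(kernel="linear"),
    dict(kernel="poly", degree=2),
    dict(kernel="rbf", gamma=0.1),
]

def kpca_features(X_list, n_components=5):
    """Concatenate n_components KPCA features per kernel for one list."""
    blocks = [KernelPCA(n_components=n_components, **k).fit_transform(X_list)
              for k in kernels]
    return np.hstack([X_list] + blocks)   # original features + extras

X = np.random.rand(1000, 44)              # one TREC-style list: 1000 docs
X_aug = kpca_features(X)
print(X_aug.shape)                        # (1000, 44 + 3*5) in this sketch
```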

  21. Experiment Setup (2/2)
     • Comparison of 3 systems:
       - Baseline: supervised RankBoost.
       - Transductive: the proposed method (Kernel PCA + supervised RankBoost).
       - Combined: average of the Baseline and Transductive outputs:
         $f(x^{(i)}) = \mathrm{sort}\{ f_{\text{baseline}}(x_n^{(i)}) + f_{\text{transductive}}(x_n^{(i)}) \}$
     • Evaluation: Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG; see the paper).
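The combination is just score addition before sorting; a one-line sketch, where `f_baseline` and `f_transductive` stand in for the two trained scoring functions:

```python
# Sketch of the Combined system: rank by the sum of the two rankers' scores.
import numpy as np

def combined_ranking(X_test_list, f_baseline, f_transductive):
    scores = f_baseline(X_test_list) + f_transductive(X_test_list)
    return np.argsort(-scores)        # document indices, best first
```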

  22. Overall Results (MAP)
     (Bar chart comparing the baseline, transductive, and combined systems' MAP on the three LETOR datasets.)
     1. Transductive outperforms Baseline.
     2. Combined gives extra improvements on two of the three datasets: the rankers make complementary mistakes.
