Learning to Rank with Partially-Labeled Data - PowerPoint PPT Presentation

Kevin Duh, University of Washington


  1. Learning to Rank with Partially-Labeled Data. Kevin Duh, University of Washington.

  2. The Ranking Problem. Definition: Given a set of objects, sort them by preference. A ranking function (obtained via machine learning) maps an unordered set {objectA, objectB, objectC} to an ordered list.

  3. Application: Web Search. You enter "uw" into the search box. All webpages containing the term "uw" are retrieved, then ranked, and the results are presented to the user in order (1st, 2nd, 3rd, 4th, 5th, ...).

  4. Application: Machine Translation. A basic 1st-pass decoder (using translation/language models) produces an N-best list: 1st: "The vodka is good, but the meat is rotten"; 2nd: "The spirit is willing but the flesh is weak"; 3rd: "The vodka is good." An advanced ranker (re-ranker) then reorders it: 1st: "The spirit is willing but the flesh is weak"; 2nd: "The vodka is good, but the meat is rotten"; 3rd: "The vodka is good."

  5. Application: Protein Structure Prediction. Starting from an amino acid sequence (MMKLKSNQTRTYDGDGYKKRAACLCFSE), various protein folding simulations generate candidate 3-D structures; a ranker then orders the candidates (1st, 2nd, 3rd).

  6. Goal of this thesis. Supervised learning: labeled data -> learning algorithm -> ranking function f(x). Semi-supervised learning: labeled data + unlabeled data -> learning algorithm -> ranking function f(x). Can we build a better ranker by adding cheap, unlabeled data?

  7. Emerging field. Semi-supervised ranking is an emerging field, sitting at the intersection of semi-supervised classification and supervised ranking.

  8. Outline. 1. Problem Setup: background in ranking; two types of partially-labeled data; methodology. 2. Manifold Assumption. 3. Local/Transductive Meta-Algorithm. 4. Summary. (Section navigation: Problem Setup | Manifold | Local/Transductive | Summary)

  9. Ranking as Supervised Learning Problem. Query: UW: x_1^(i) = [tfidf, pagerank, ...] with label 3; x_2^(i) = [tfidf, pagerank, ...] with label 1; x_3^(i) = [tfidf, pagerank, ...] with label 2. Query: Seattle Traffic: x_1^(j) = [tfidf, pagerank, ...] with label 2; x_2^(j) = [tfidf, pagerank, ...] with label 1.

  10. Ranking as Supervised Learning Problem. Train F(x) such that the labels' ordering holds: F(x_1^(1)) > F(x_3^(1)) > F(x_2^(1)) for query UW, and F(x_1^(2)) > F(x_2^(2)) for query Seattle Traffic. At test time (e.g., query MSR), the documents' labels are unknown and F must rank them.
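The "train F(x) so the pairwise preferences hold" criterion above can be sketched as a tiny pairwise-preference perceptron (an illustrative toy, not the learner used in the thesis; the feature values are made up):

```python
# Minimal pairwise-preference perceptron for ranking (illustrative sketch).
# Each document is a feature vector like [tfidf, pagerank]; higher label
# means more relevant, and training enforces F(x_a) > F(x_b) when a is
# preferred over b.

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise(queries, epochs=50, lr=0.1):
    """queries: list of per-query lists of (features, label) pairs."""
    dim = len(queries[0][0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for docs in queries:
            for xa, ya in docs:
                for xb, yb in docs:
                    # If a preference pair is violated, nudge w toward
                    # the preferred document's features.
                    if ya > yb and score(w, xa) <= score(w, xb):
                        w = [wi + lr * (ai - bi)
                             for wi, ai, bi in zip(w, xa, xb)]
    return w

# Toy query: the document with higher feature values should rank first.
docs = [([0.9, 0.2], 3), ([0.4, 0.1], 1), ([0.6, 0.3], 2)]
w = train_pairwise([docs])
ranked = sorted(docs, key=lambda d: score(w, d[0]), reverse=True)
```

On this toy query the learned weights reproduce the label ordering 3 > 2 > 1.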

  11. Semi-supervised Data: Some labels are missing. Query: UW: x_1^(i) with label 3; x_2^(i) with label 1; x_3^(i) with label missing (X). Query: Seattle Traffic: x_1^(j) with label missing (X); x_2^(j) with label missing (X).

  12. Two kinds of Semi-supervised Data. 1. Lack of labels for some documents within a query ("depth"): for Query1-Query3, Doc1 and Doc2 have labels but Doc3 does not. Some references: Amini+, SIGIR'08; Agarwal, ICML'06; Wang+, MSRA TechRep'05; Zhou+, NIPS'04; He+, ACM Multimedia'04. 2. Lack of labels for entire queries ("breadth"): Query1 and Query2 are fully labeled, Query3 is entirely unlabeled. References: this thesis; Duh & Kirchhoff, SIGIR'08; Truong+, ICMIST'06.

  13. Why the "Breadth" Scenario. Information retrieval: there is a long tail of search queries: "20-25% of the queries we will see today, we have never seen before" (Udi Manber, Google VP, May 2007). Machine translation and protein prediction: obtaining references is costly, but given a reference, computing labels is trivial (e.g., candidate 1 has similarity 0.3 to the reference, candidate 2 has similarity 0.9).
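The "given a reference, computing labels is trivial" point can be illustrated with a toy similarity labeler (a simple unigram-overlap F-score standing in for BLEU or GDT-TS, purely illustrative; the sentences reuse the N-best example from slide 4):

```python
# Label each candidate by its similarity to a costly reference. Here a
# simple unigram-overlap F-score stands in for BLEU / GDT-TS.

def unigram_overlap(candidate, reference):
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    # Clipped unigram matches, as in BLEU's modified precision.
    matches = sum(min(cand.count(t), ref.count(t)) for t in set(cand))
    return 2.0 * matches / (len(cand) + len(ref))

reference = "the spirit is willing but the flesh is weak"
candidates = [
    "the vodka is good but the meat is rotten",
    "the spirit is willing but the flesh is weak",
    "the vodka is good",
]
labels = [unigram_overlap(c, reference) for c in candidates]
```

The exact match gets label 1.0, and the other candidates are graded by how much of the reference they recover.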

  14. Methodology of this thesis. 1. Make an assumption about how unlabeled lists can be useful (borrowing ideas from semi-supervised classification). 2. Design a method to implement it (4 unlabeled-data assumptions and 4 methods). 3. Test on various datasets, analyzing when each method works and doesn't work.

  15. Datasets: Information Retrieval. From the LETOR distribution [Liu'07]. TREC: web search; OHSUMED: medical search. Evaluation: MAP (measures how high relevant documents appear on the list).

                              OHSUMED  TREC 2003  TREC 2004  Arabic trans.  Italian trans.  Protein pred.
      # lists                 50       75         100        500            500             100
      label type              2-level  2-level    3-level    continuous     continuous      continuous
      avg # objects per list  1000     1000       150        260            360             120
      # features              44       44         25         9              10              25
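MAP, the evaluation measure named above, can be sketched as follows (a standard definition with binary relevance; the toy relevance lists are made up):

```python
# Mean Average Precision over ranked lists: for each query, average the
# precision at every rank that holds a relevant document, then average
# over queries. Rewards rankers that place relevant documents high.

def average_precision(relevance):
    """relevance: list of 0/1 flags in ranked order (top of list first)."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(queries):
    return sum(average_precision(q) for q in queries) / len(queries)

# Two toy queries: relevant docs nearer the top yield a higher score.
map_score = mean_average_precision([[1, 0, 1, 0], [0, 1, 1, 0]])
```

A perfect ranking (all relevant documents first) gives AP 1.0 for that query.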

  16. Datasets: Machine Translation. From the IWSLT 2007 competition, UW system [Kirchhoff'07]; translation in the travel domain. Evaluation: BLEU (measures word match to the reference). (Dataset statistics as in the table on slide 15.)

  17. Datasets: Protein Prediction. From the CASP competition [Qiu/Noble'07]. Evaluation: GDT-TS (measures closeness to the true 3-D structure). (Dataset statistics as in the table on slide 15.)

  18. Outline. 1. Problem Setup. 2. Manifold Assumption: definition; Ranker Propagation method; List Kernel similarity. 3. Local/Transductive Meta-Algorithm. 4. Summary.

  19. Manifold Assumption in Classification. Unlabeled data can help discover the underlying data manifold; labels vary smoothly over this manifold. Prior work: 1. How to give labels to test samples? Mincut [Blum01]; label propagation [Zhu03]; regularizer + optimization [Belkin03]. 2. How to construct the graph? k-nearest neighbors, eps-ball; data-driven methods [Argyriou05, Alexandrescu07]. (Figure: scatter of labeled +/- points lying on a manifold.)
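The label-propagation idea cited above can be sketched on a toy graph (an illustrative iterative version in the spirit of [Zhu03]; the graph and labels are made up):

```python
# Label propagation sketch: labels of clamped nodes diffuse to unlabeled
# nodes along weighted edges, so labels end up varying smoothly over the
# graph (the "manifold" proxy).

def propagate(weights, labels, iters=200):
    """weights: symmetric edge dict {(i, j): w}; labels: {node: +1/-1}."""
    nodes = sorted({n for edge in weights for n in edge})
    f = {n: labels.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        new_f = {}
        for n in nodes:
            if n in labels:              # labeled nodes stay clamped
                new_f[n] = labels[n]
                continue
            num = den = 0.0
            for (i, j), w in weights.items():
                if i == n:
                    num += w * f[j]; den += w
                elif j == n:
                    num += w * f[i]; den += w
            # Each unlabeled node moves to the weighted mean of its
            # neighbors' current values.
            new_f[n] = num / den if den else 0.0
        f = new_f
    return f

# Chain 0-1-2-3: node 0 labeled +1, node 3 labeled -1.
edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
f = propagate(edges, {0: +1.0, 3: -1.0})
```

On the chain this converges to the harmonic solution f(1) = 1/3, f(2) = -1/3: values interpolate smoothly between the two clamped endpoints.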

  20. Manifold Assumption in Ranking. Ranking functions vary smoothly over the manifold. Each node is a list; edges represent "similarity" between two lists.

  21. Ranker Propagation. Algorithm: 1. For each training list, fit a ranker F(x) = w^T x, with w in R^d, x in R^d. 2. Minimize the objective sum over edges (i,j) of K_ij ||w^(i) - w^(j)||^2, where w^(i) is the ranker for list i and K_ij is the similarity between lists i and j. Closed-form solution for the unlabeled lists' rankers: W^(u) = inv(L^(uu)) (-L^(ul)) W^(l).
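The closed-form solution above can be sketched directly (a minimal implementation assuming the per-list rankers W^(l) and the list-similarity matrix K are already given; the numbers are made up):

```python
import numpy as np

# Ranker Propagation sketch: minimize sum_{ij} K_ij ||w^(i) - w^(j)||^2
# with labeled lists' rankers fixed. The minimizer for the unlabeled
# lists is W_u = inv(L_uu) @ (-L_ul) @ W_l, with L the graph Laplacian
# of the similarity matrix K.

def propagate_rankers(K, W_l, labeled, unlabeled):
    """K: (n, n) list-similarity matrix; W_l: (n_labeled, d) rankers."""
    L = np.diag(K.sum(axis=1)) - K          # graph Laplacian L = D - K
    L_uu = L[np.ix_(unlabeled, unlabeled)]
    L_ul = L[np.ix_(unlabeled, labeled)]
    return np.linalg.solve(L_uu, -L_ul @ W_l)

# 3 lists: lists 0 and 1 labeled; list 2 unlabeled, much more similar
# to list 1 (K[1,2] = 0.9) than to list 0 (K[0,2] = 0.1).
K = np.array([[0.0, 0.2, 0.1],
              [0.2, 0.0, 0.9],
              [0.1, 0.9, 0.0]])
W_l = np.array([[1.0, 0.0],    # ranker weights fit on list 0
                [0.0, 1.0]])   # ranker weights fit on list 1
W_u = propagate_rankers(K, W_l, labeled=[0, 1], unlabeled=[2])
```

As expected, the propagated ranker for list 2 is a similarity-weighted blend that sits much closer to list 1's ranker.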

  22. Similarity between lists: desirable properties. Maps two lists of feature vectors to a scalar, e.g. K(list_i, list_j) = 0.7. Works on variable-length lists (different N in N-best). Satisfies symmetry and positive semi-definiteness. Measures rotation/shape differences.

  23. List Kernel. Step 1: run PCA on each list, obtaining principal axes u_1^(i), u_2^(i) for list i and u_1^(j), u_2^(j) for list j, with eigenvalues lambda^(i), lambda^(j). Step 2: compute similarity between pairs of axes, e.g. lambda_2^(i) lambda_2^(j) |<u_2^(i), u_2^(j)>|. Step 3: maximum bipartite matching over axes: K(i, j) = max over matchings a of sum_m lambda_m^(i) lambda_{a(m)}^(j) |<u_m^(i), u_{a(m)}^(j)>| / (||lambda^(i)|| ||lambda^(j)||).
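The three steps above can be sketched as follows (an illustrative reading of the List Kernel: PCA per list, axis similarities weighted by eigenvalues, and brute-force enumeration of axis matchings in place of a maximum-bipartite-matching solver; the data is synthetic):

```python
import numpy as np
from itertools import permutations

# List Kernel sketch. Step 1: PCA on each list. Step 2: score axis pairs
# by |cosine| weighted by their eigenvalues. Step 3: take the best
# one-to-one matching of axes (brute force over permutations here, which
# is fine for small m).

def pca_axes(X, m=2):
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return (s ** 2)[:m], Vt[:m]       # top-m eigenvalues and axes

def list_kernel(Xi, Xj, m=2):
    lam_i, U_i = pca_axes(Xi, m)
    lam_j, U_j = pca_axes(Xj, m)
    norm = np.linalg.norm(lam_i) * np.linalg.norm(lam_j)
    best = 0.0
    for a in permutations(range(m)):  # enumerate all axis matchings
        score = sum(lam_i[k] * lam_j[a[k]] * abs(U_i[k] @ U_j[a[k]])
                    for k in range(m)) / norm
        best = max(best, score)
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2)) * [3.0, 0.5]              # elongated cloud
k_same = list_kernel(X, X.copy())                      # identical lists
k_rot = list_kernel(X, X @ [[0.0, -1.0], [1.0, 0.0]])  # 90-degree rotation
```

A list compared with itself scores 1.0, while a rotated copy scores much lower, matching the "measures rotation/shape differences" property on slide 22.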

  24. Evaluation in Machine Translation & Protein Prediction. Ranker Propagation (with List Kernel) outperforms the supervised baseline (MERT linear ranker): Arabic translation: baseline 24.3, Ranker Propagation 25.6*; Italian translation: baseline 21.2, Ranker Propagation 22.3*; protein prediction: baseline 58.1, Ranker Propagation 59.1*. (* indicates statistically significant improvement over the baseline, p < 0.05.)
