

SLIDE 1

Learning to Rank with Partially-Labeled Data

Kevin Duh University of Washington (Joint work with Katrin Kirchhoff)

SLIDE 2

Motivation

  • Machine learning can be an effective solution for ranking problems in IR
  • But success depends on the quality and size of the training data

[Figure: labeled data vs. unlabeled data]

SLIDE 3

Problem Statement

Supervised:       Labeled Data -> Supervised Learning Algorithm -> Ranking function f(x)
Semi-supervised:  Labeled Data + Unlabeled Data -> Semi-supervised Learning Algorithm -> Ranking function f(x)

Can we build a better ranker by adding cheap, unlabeled data?

SLIDE 4

Outline

  • 1. Problem Definition
    • 1.1 Ranking as a Supervised Learning Problem
    • 1.2 Two Kinds of Partially-Labeled Data
  • 2. Proposed Method
  • 3. Results and Analysis


SLIDE 5

Ranking as a Supervised Learning Problem

Query 1: SIGIR

  $x_1^{(1)} = [\text{tfidf}, \text{pagerank}, \ldots]$   label: 2
  $x_2^{(1)} = [\text{tfidf}, \text{pagerank}, \ldots]$   label: 3
  $x_3^{(1)} = [\text{tfidf}, \text{pagerank}, \ldots]$   label: 1

Query 2: Hotels in Singapore

  $x_1^{(2)} = [\text{tfidf}, \text{pagerank}, \ldots]$   label: 1
  $x_2^{(2)} = [\text{tfidf}, \text{pagerank}, \ldots]$   label: 2


SLIDE 6

Ranking as a Supervised Learning Problem

Query 1: SIGIR (labels: 2, 3, 1)
Query 2: Hotels in Singapore (labels: 1, 2)

(feature vectors $x_i^{(q)} = [\text{tfidf}, \text{pagerank}, \ldots]$ as on the previous slide)

Train $f(\cdot)$ such that documents with higher labels score higher within each query:

  $f(x_2^{(1)}) > f(x_1^{(1)}) > f(x_3^{(1)})$  and  $f(x_2^{(2)}) > f(x_1^{(2)})$

Test query: Singapore Airport (document scores unknown: ?, ?, ?)
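To make the pairwise training criterion concrete, here is a minimal sketch (ours, not from the talk; the feature values are placeholders standing in for tfidf, pagerank, etc.) that turns per-query relevance labels into the preference pairs a pairwise ranker trains on:

```python
import itertools

# Per-query training data: feature vectors X and relevance labels y.
queries = {
    "SIGIR":               {"X": [[0.8, 0.1], [0.9, 0.3], [0.2, 0.5]], "y": [2, 3, 1]},
    "Hotels in Singapore": {"X": [[0.4, 0.7], [0.6, 0.9]],             "y": [1, 2]},
}

def preference_pairs(X, y):
    """Yield (better, worse) feature-vector pairs within one query."""
    for i, j in itertools.combinations(range(len(y)), 2):
        if y[i] > y[j]:
            yield X[i], X[j]
        elif y[j] > y[i]:
            yield X[j], X[i]

for name, q in queries.items():
    for better, worse in preference_pairs(q["X"], q["y"]):
        print(f"{name}: require f({better}) > f({worse})")
```

Note that pairs are only formed within a query; documents from different queries are never compared.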


SLIDE 7

Two Kinds of Partially-Labeled Data

  • 1. Lack of labels for some documents (depth):

    Query1: Doc1 Label, Doc2 Label, Doc3 ?
    Query2: Doc1 Label, Doc2 Label, Doc3 ?
    Query3: Doc1 Label, Doc2 Label, Doc3 ?

    Some references: Amini+, SIGIR'08; Agarwal, ICML'06; Wang+, MSRA TechRep'05; Zhou+, NIPS'04; He+, ACM Multimedia'04

  • 2. Lack of labels for some queries (breadth):

    Query1: Doc1 Label, Doc2 Label, Doc3 Label
    Query2: Doc1 Label, Doc2 Label, Doc3 Label
    Query3: Doc1 ?, Doc2 ?, Doc3 ?

    This paper; also Truong+, ICMIST'06


SLIDE 8

Focus of this work: Transductive Learning

  • Unlabeled data = Test data
  • Main question: How can knowledge of the test list help our learning algorithm?

  Query1:     Doc1 Label, Doc2 Label, Doc3 Label
  Query2:     Doc1 Label, Doc2 Label, Doc3 Label
  Test Query: Doc1 ?,     Doc2 ?,     Doc3 ?


SLIDE 9

Why transductive learning?

Transductive learning: the test data is fixed and observed during learning. Arguably, transduction is easier than induction.

  Query1:     Doc1 Label, Doc2 Label, Doc3 Label
  Query2:     Doc1 Label, Doc2 Label, Doc3 Label
  Test Query: Doc1 ?,     Doc2 ?,     Doc3 ?

Inductive (semi-supervised) learning: the learned f(x) must generalize to new data.

  Query1:     Doc1 Label, Doc2 Label, Doc3 Label
  Query2:     Doc1 Label, Doc2 Label, Doc3 Label
  Query3:     Doc1 ?,     Doc2 ?,     Doc3 ?
  Test Query: Doc1 ?,     Doc2 ?,     Doc3 ?

Inductive learning = closed-book exam; transductive learning = open-note exam.


SLIDE 10

Outline

  • 1. Problem Definition
  • 2. Proposed Method
    • 2.1 Intuition
    • 2.2 Details of the proposed algorithm
  • 3. Results and Analysis


SLIDE 11

Thought Experiment: What information does unlabeled data provide?

[Figure: documents for Query 1 and Query 2 plotted in BM25/HITS feature space; each query's documents spread along a different axis]

Observation: the direction of variance differs according to the query.
Implication: different feature representations are optimal for different queries.
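A quick way to see this effect (our sketch with synthetic data, not from the talk): build two document sets whose variance lies along different feature axes and compare their leading principal components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-query document features: [BM25, HITS].
# Query 1's documents vary mostly along BM25; Query 2's mostly along HITS.
query1_docs = rng.normal(size=(100, 2)) * [1.0, 0.1]
query2_docs = rng.normal(size=(100, 2)) * [0.1, 1.0]

def top_pc(X):
    """Leading principal component: the direction of maximum variance."""
    X = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]

print("Query 1 top PC:", top_pc(query1_docs))  # close to +/-[1, 0] (BM25 axis)
print("Query 2 top PC:", top_pc(query2_docs))  # close to +/-[0, 1] (HITS axis)
```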


SLIDE 12

Good results can be achieved by ranking Query 1 by BM25 only and Query 2 by HITS only.

[Figure: the same two scatter plots in BM25/HITS space, with relevant webpages (high rank) and irrelevant webpages (low rank) marked]

SLIDE 13

Proposed Method: Main Ideas

Main assumptions:

  • 1. Different queries are best modeled by different features
  • 2. Unlabeled data can help us discover this representation

Requires:

  • DISCOVER(): an unsupervised method for finding useful features
  • LEARN(): a supervised method for learning to rank

Two-step algorithm (sketched in code below), for each test list:

  • Run DISCOVER()
  • Augment the feature representation
  • Run LEARN() and predict
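A minimal sketch of that loop in Python, with DISCOVER and LEARN left as pluggable callables; the function names and signatures are our illustration, not the authors' implementation:

```python
import numpy as np

def transductive_rank(train_lists, test_list, discover, learn):
    """Two-step transductive ranking (illustrative sketch).

    train_lists: list of (X, y) pairs, one per labeled training query
    test_list:   feature matrix X for the documents of one test query
    discover:    unsupervised learner; returns a feature mapping fit on X
    learn:       supervised ranker trainer; returns a scoring function
    """
    # Step 1: DISCOVER useful patterns on the unlabeled test list.
    mapping = discover(test_list)

    # Augment every list, train and test, with the discovered features.
    def augment(X):
        return np.hstack([X, mapping(X)])

    # Step 2: LEARN a ranker on the augmented labeled data, then predict.
    f = learn([(augment(X), y) for X, y in train_lists])
    return f(augment(test_list))
```

The key design point is that DISCOVER sees only the test list, so the feature representation is adapted anew for every test query.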


SLIDE 14

Proposed Method: Illustration

  • Unsupervised learning on the test list (Doc1 ?, Doc2 ?, Doc3 ?) outputs a projection matrix A
  • x: initial feature representation; z = A'x: new feature representation
  • Supervised learning on the labeled lists (Query1, Query2, each with labeled Doc1, Doc2, Doc3) in the new representation yields the ranking function f
  • f is then used to predict the ranking of the test list

SLIDE 15

DISCOVER( ) Component

  • Goal of DISCOVER( ): find useful patterns on the test list
  • Principal Components Analysis (PCA)
    • Discovers the direction of maximum variance
    • Views low-variance directions as noise
  • Kernel PCA [Scholkopf+, Neural Computation 98]
    • Non-linear extension of PCA via the kernel trick (see the sketch below):
    • 1. Maps inputs non-linearly to a high-dimensional space
    • 2. Performs PCA in that space
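As a concrete sketch, here is the DISCOVER step using scikit-learn's KernelPCA; the library, kernel choice, and gamma value are our illustration, not the paper's code:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X_test_list = rng.random((1000, 44))  # one test query: 1000 docs, 44 features

# Fit Kernel PCA on the unlabeled test list alone (the DISCOVER step),
# extracting a handful of principal components.
kpca = KernelPCA(n_components=5, kernel="rbf", gamma=0.1)
kpca.fit(X_test_list)

# z = discovered features; append them to the original representation.
def augment(X):
    return np.hstack([X, kpca.transform(X)])

print(augment(X_test_list).shape)  # (1000, 49)
```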


SLIDE 16

Kernels for Kernel PCA

Linear:     $K(x, x') = \langle x, x' \rangle$
Polynomial: $K(x, x') = (1 + \langle x, x' \rangle)^d$
Gaussian:   $K(x, x') = \exp(-\beta \|x - x'\|^2)$
Diffusion:  $K(x, x')$ = random walk between x and x' on a graph
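For reference, these kernels as plain functions; a sketch, with the diffusion kernel written in its matrix form K = expm(-beta * L) over a document-graph Laplacian L (our reading of the "random walk" description):

```python
import numpy as np
from scipy.linalg import expm

def linear_kernel(x, xp):
    return np.dot(x, xp)

def polynomial_kernel(x, xp, d=2):
    return (1.0 + np.dot(x, xp)) ** d

def gaussian_kernel(x, xp, beta=1.0):
    return np.exp(-beta * np.sum((x - xp) ** 2))

def diffusion_kernel(L, beta=1.0):
    """Matrix exponential of the negated, scaled graph Laplacian L.

    L encodes a graph over the documents; entry (i, j) of the result
    measures how easily a random walk diffuses from node i to node j.
    """
    return expm(-beta * L)
```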


SLIDE 17

LEARN( ) Component

  • Goal of LEARN( ): optimize some ranking metric on the labeled data
  • RankBoost [Freund+, JMLR 2003] (a minimal sketch follows)
    • Inherent feature selection
    • Few parameters to tune
  • Other supervised ranking methods are possible: RankNet, RankSVM, ListNet, FRank, SoftRank, etc.
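A compact, illustrative RankBoost sketch with threshold weak rankers, reusing the (better, worse) pair convention from the earlier slide; this is our simplification, not the authors' implementation:

```python
import numpy as np

def rankboost(X, pairs, n_rounds=50):
    """Minimal RankBoost sketch.

    X:     (n_docs, n_features) feature matrix
    pairs: list of (i, j) meaning document i should outrank document j
    Returns a scoring function H: (m, n_features) array -> m scores.
    """
    D = np.full(len(pairs), 1.0 / len(pairs))  # distribution over pairs
    ensemble = []                              # chosen (feature, threshold, alpha)

    for _ in range(n_rounds):
        best = None
        # Weak rankers h(x) = 1 if x[f] > theta else 0, thresholds at quartiles.
        for f in range(X.shape[1]):
            for theta in np.quantile(X[:, f], [0.25, 0.5, 0.75]):
                h = (X[:, f] > theta).astype(float)
                # r in [-1, 1]: weighted pairwise agreement of this weak ranker.
                r = sum(D[k] * (h[i] - h[j]) for k, (i, j) in enumerate(pairs))
                if best is None or abs(r) > abs(best[0]):
                    best = (r, f, theta, h)

        r, f, theta, h = best
        r = np.clip(r, -0.999, 0.999)
        alpha = 0.5 * np.log((1 + r) / (1 - r))
        ensemble.append((f, theta, alpha))

        # Reweight: pairs this weak ranker gets wrong gain weight.
        D *= np.exp(alpha * np.array([h[j] - h[i] for i, j in pairs]))
        D /= D.sum()

    def H(Xnew):
        scores = np.zeros(len(Xnew))
        for f, theta, alpha in ensemble:
            scores += alpha * (Xnew[:, f] > theta)
        return scores
    return H
```

Each round selects a single feature and threshold, which is where the inherent feature selection comes from.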


SLIDE 18

Summary of Proposed Method

  • Relies on unlabeled test data to learn a good feature representation
  • “Adapts” the supervised learning process to each test list
  • Caveats:
    • DISCOVER() may not always find features that are helpful for LEARN()
    • LEARN() runs at query time, so a computational speedup is needed for practical applications


SLIDE 19

Outline

  • 1. Problem Definition
  • 2. Proposed Method
  • 3. Results and Analysis
    • 3.1 Experimental Setup
    • 3.2 Main Results
    • 3.3 Deeper analysis of where things worked and failed


SLIDE 20

Experiment Setup (1/2)

  • LETOR Dataset [Liu+, LR4IR 2007]:

                                    TREC03   TREC04   OHSUMED
    # of queries                        50       75       106
    Average # of documents/query      1000     1000       150
    # of original features              44       44        25

  • Additional features generated by Kernel PCA (sketched below):
    • 5 kernels: Linear, Polynomial, Gaussian, Diffusion 1, Diffusion 2
    • Extract 5 principal components for each
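A sketch of this feature-generation step; the kernel parameters are illustrative, and the two diffusion kernels are replaced here by two extra RBF widths, since they would require building a document graph first:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.random((150, 25))  # one OHSUMED-sized list: 150 docs, 25 features

# Five kernels, 5 components each; the last two stand in for Diffusion 1/2.
kernels = [
    KernelPCA(n_components=5, kernel="linear"),
    KernelPCA(n_components=5, kernel="poly", degree=2),
    KernelPCA(n_components=5, kernel="rbf", gamma=0.1),
    KernelPCA(n_components=5, kernel="rbf", gamma=1.0),   # diffusion stand-in
    KernelPCA(n_components=5, kernel="rbf", gamma=10.0),  # diffusion stand-in
]

extra = np.hstack([k.fit_transform(X) for k in kernels])
X_aug = np.hstack([X, extra])
print(X_aug.shape)  # (150, 50): 25 original + 5 kernels x 5 components
```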


SLIDE 21

Experiment Setup (2/2)

  • Comparison of 3 systems:
    • Baseline: supervised RankBoost
    • Transductive: proposed method (Kernel PCA + supervised RankBoost)
    • Combined: average of the Baseline and Transductive outputs:

      $f(x_i) = \mathrm{sort}_n\{f_{\mathrm{baseline}}(x_i)\} + \mathrm{sort}_n\{f_{\mathrm{transductive}}(x_i)\}$

      where $\mathrm{sort}_n\{\cdot\}$ converts a ranker's scores into rank positions, so the two outputs are combined on a comparable scale

  • Evaluation:
    • Mean Average Precision (MAP)
    • Normalized Discounted Cumulative Gain (NDCG): see the paper
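One plausible reading of this combination as code (a sketch; the rank-position normalization is our interpretation of the sort notation):

```python
import numpy as np

def rank_positions(scores):
    """Convert scores to rank positions: the best document gets the largest value."""
    return np.argsort(np.argsort(scores)).astype(float)

def combine(scores_baseline, scores_transductive):
    return rank_positions(scores_baseline) + rank_positions(scores_transductive)

baseline = np.array([0.9, 0.2, 0.5])
transductive = np.array([0.8, 0.1, 0.7])
print(combine(baseline, transductive))  # [4. 0. 2.]: combined score per document
```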


SLIDE 22

Overall Results (MAP)

  • 1. Transductive outperforms Baseline
  • 2. Combined gives extra improvements on 2 datasets: the two rankers make complementary mistakes

[Chart: MAP for the baseline, transductive, and combined systems]

SLIDE 23

Did improvements come from Kernel PCA per se, or from its transductive use?

  • Answer: its transductive use
  • Running KPCA on the training set (traditional feature extraction) gives little gain
  • The gains come from test-specific rankers

[Chart: MAP for the baseline, transductive, and KPCA-on-train systems]

SLIDE 24

Do results vary by query?

  • Answer: Yes. For some queries it is better not to use the transductive method

[Chart: TREC 2003, MAP by query, Transductive vs. Baseline]

SLIDE 25

What kernels are most useful?

Method:

  • 1. Pick the top 25 rankers where MAP improved by over 20% (TREC04)
  • 2. Plot a histogram of their five most important features

[Histogram: kernel combinations among those 25 rankers]

  Original only                   7
  Original+Diffusion+Linear       4
  Original+Gaussian+Diffusion     4
  Original+Linear                 4
  Original+Polynomial             3
  Original+Polynomial+Diffusion   1
  Original+Polynomial+Linear      1
  Original+Diffusion              1

Answer: a diversity of kernels leads to good performance; different test lists have different structure.

SLIDE 26

Conclusion

  • Unlabeled data can be useful for ranking problems
  • Two-step transductive algorithm:
    • Adapts the supervised component using a feature representation that better models the test list
  • Overall results are positive, but they vary at the query level
  • Future work:
    • Computational speed-up
    • Different LEARN() and DISCOVER() components
    • Other ways to exploit unlabeled data


SLIDE 27

Thanks for your attention!

Acknowledgments:

  • U.S. National Science Foundation Graduate Fellowship
  • Travel grant supported by:
    • SIGIR
    • Dr. Amit Singhal (made in honor of Donald B. Crouch)
    • Microsoft Research (in honor of Karen Spärck Jones)
SLIDE 28

The time is ripe for Semi-supervised Ranking!

  • Both semi-supervised classification and learning to rank have become well-established sub-fields with many techniques

[Chart: paper counts in SIGIR, CIKM, ICML, and NIPS]

                     2005   2006   2007
  Semi-supervised       9     13     15
  Ranking               7      9     22

SLIDE 29

Computation Time (OHSUMED)

  • On an Intel x86-32 (3 GHz CPU):
    • Kernel PCA (Matlab/C-Mex): 4.3 sec/query
    • RankBoost (C++): 0.7 sec/iteration
    • Total time (assuming 150 iterations): 109 sec/query (233 sec/query for TREC)
  • Kernel PCA is O(n^3) for n documents; sparse KPCA is O(n)