SLIDE 1

On the Consistency of Ranking Algorithms

John Duchi Lester Mackey Michael I. Jordan

University of California, Berkeley

International Conference on Machine Learning, 2010

SLIDE 2

Ranking

Goal: Order set of inputs/results to best match the preferences of an individual or a population

◮ Web search: Return most relevant results for user queries
◮ Recommendation systems:
  ◮ Suggest movies to watch based on user’s past ratings
  ◮ Suggest news articles to read based on past browsing history
◮ Advertising placement: Maximize profit and click-through

SLIDE 3

Supervised ranking setup

Observe: Sequence of training examples

◮ Query q: e.g., search term
◮ Set of results x to rank
  ◮ Items {1, 2, 3, 4}
◮ Weighted DAG G representing preferences over results
  ◮ Item 1 preferred to {2, 3} and item 3 to 4

Observe multiple preference graphs for the same query q and results x

[Figure: example G with x = {1, 2, 3, 4} and edges a12, a13, a34]

SLIDE 4

Supervised ranking setup

Learn: Scoring function f(x) to rank results x

◮ Real-valued score for result i: si := fi(x)
◮ Result i ranked above j iff fi(x) > fj(x)
◮ Loss suffered when scores s disagree with preference graph G:

  L(s, G) = Σ_{i,j} aij 1(si < sj)

[Figure: example G with x = {1, 2, 3, 4} and edges a12, a13, a34]

Example: L(s, G) = a12 1(s1 < s2) + a13 1(s1 < s3) + a34 1(s3 < s4)
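As a concrete reading of the loss above, here is a minimal Python sketch (not from the slides); the edge-list representation of G and the function name are my own choices.

# Minimal sketch of the pairwise ranking loss L(s, G) = sum_{i,j} a_ij * 1(s_i < s_j).
# The graph G is represented as a list of weighted edges (i, j, a_ij), meaning
# "item i is preferred to item j with weight a_ij".

def ranking_loss(s, edges):
    """s: list of scores indexed by item; edges: iterable of (i, j, a_ij)."""
    return sum(a for (i, j, a) in edges if s[i] < s[j])

# Example graph from the slide: edges a12, a13, a34, with items indexed 0..3 here.
edges = [(0, 1, 1.0), (0, 2, 1.0), (2, 3, 1.0)]
s = [4.0, 3.0, 2.0, 1.0]          # scores consistent with the preferences
print(ranking_loss(s, edges))     # 0: no preference is violated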

SLIDE 5

Supervised ranking setup

Example: Scoring function f optimally ranks results in G

[Figure: chain graph G over items 1, 2, 3 with f1(x) > f2(x) and f2(x) > f3(x)]

SLIDE 6

Detour to classification

Consider the simpler problem of classification

◮ Given: Input x, label y ∈ {−1, 1}
◮ Learn: Classification function f(x), with margin s = yf(x)

[Figure: 0-1 loss L(s) = 1(s ≤ 0), hard to optimize, vs. a convex surrogate loss φ(s), tractable]
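For concreteness, a small sketch (mine, not the slides') of the 0-1 loss and two standard convex surrogates of the margin s = yf(x); both surrogates satisfy φ′(0) < 0, the condition appearing in the theorem on the next slide.

import math

def zero_one(s):
    # L(s) = 1(s <= 0): the true classification loss, hard to optimize directly
    return 1.0 if s <= 0 else 0.0

def hinge(s):
    # phi(s) = max(0, 1 - s): convex, phi'(0) = -1 < 0
    return max(0.0, 1.0 - s)

def logistic(s):
    # phi(s) = log(1 + e^{-s}): convex, phi'(0) = -1/2 < 0
    return math.log(1.0 + math.exp(-s))

for s in (-1.0, 0.0, 0.5, 2.0):
    print(s, zero_one(s), hinge(s), logistic(s))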

SLIDE 7

Classification and surrogate consistency

Question: Does minimizing expected φ-loss minimize expected L?

  Minimize Σ_{i=1}^n φ(yi f(xi))   →(n → ∞)   Minimize E φ(Y f(X))   ⇐?⇒   Minimize E L(Y f(X))

Theorem: If φ is convex, a procedure based on minimizing φ is consistent if and only if φ′(0) < 0.¹

¹ Bartlett, Jordan, McAuliffe 2006

SLIDE 8

What about ranking consistency?

Minimization of true ranking loss is hard

◮ Replace ranking loss L(s, G) with tractable surrogate ϕ(s, G)

Question: When is surrogate minimization consistent for ranking?

SLIDE 9

Conditional losses

[Figure: G1 with edges a12, a13, a34 and G2 with edges a21, a′13, a′34, each over items {1, 2, 3, 4}; p(G1) = p(G2) = .5; the aggregate graph has edge weights .5 a12, .5 a21, .5 (a13 + a′13), .5 (a34 + a′34)]

◮ ℓ(p, s) = Σ_G p(G | x, q) L(s, G)
◮ ℓ(p, s) = .5 a12 1(s1 < s2) + .5 a21 1(s2 < s1) + .5 (a13 + a′13) 1(s1 < s3) + .5 (a34 + a′34) 1(s3 < s4)
◮ Optimal score vectors: A(p) = argmin_s ℓ(p, s)
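To make the aggregation concrete, a small brute-force sketch (my own construction): ℓ(p, s) averages the pairwise loss over graphs, and since only the ordering induced by s matters for the 0-1 pairwise loss, the optimal set A(p) can be probed by enumerating orderings for tiny examples.

from itertools import permutations

def ranking_loss(s, edges):
    # L(s, G): total weight of preference edges (i, j, a_ij) violated by scores s
    return sum(a for (i, j, a) in edges if s[i] < s[j])

def conditional_loss(s, graphs):
    # ell(p, s) = sum_G p(G) * L(s, G); graphs is a list of (p_G, edges)
    return sum(p * ranking_loss(s, edges) for (p, edges) in graphs)

def optimal_orderings(graphs, n_items):
    # Brute-force stand-in for A(p): the orderings whose induced scores minimize ell(p, s)
    best, argmins = float("inf"), []
    for perm in permutations(range(n_items)):
        s = [0.0] * n_items
        for rank, item in enumerate(perm):      # higher score = earlier in the ordering
            s[item] = float(n_items - rank)
        loss = conditional_loss(s, graphs)
        if loss < best - 1e-12:
            best, argmins = loss, [perm]
        elif abs(loss - best) <= 1e-12:
            argmins.append(perm)
    return best, argmins

# Two toy preference graphs over items 0..3, each with probability 1/2
graphs = [(0.5, [(0, 1, 1.0), (0, 2, 1.0), (2, 3, 1.0)]),
          (0.5, [(1, 0, 1.0), (0, 2, 2.0), (2, 3, 2.0)])]
print(optimal_orderings(graphs, 4))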

SLIDE 10

Consistency theorem

Theorem: A procedure minimizing ϕ is asymptotically consistent if and only if

  inf_s { Σ_G p(G) ϕ(s, G) : s ∉ A(p) }  >  inf_s Σ_G p(G) ϕ(s, G)

In other words, ϕ is consistent if and only if surrogate minimization gives the correct order to the results.

Goal: Find a tractable ϕ so that the s minimizing Σ_G p(G) ϕ(s, G) also minimizes ℓ(p, s).

SLIDE 11

Consistent and Tractable?

Hard to get consistent and tractable ϕ

◮ In general, it is NP-hard even to find s minimizing Σ_G p(G) L(s, G)
  (reduction from the feedback arc-set problem)

Some restrictions on the problem space are necessary...

SLIDE 12

Low noise setting

Definition: Low noise if aij − aji > 0 and ajk − akj > 0 together imply aik − aki ≥ (aij − aji) + (ajk − akj)

[Figure: items 1, 2, 3 with edges a12, a23, a13, a31; low noise requires a13 − a31 ≥ a12 + a23]

◮ Intuition: weight on a path reinforces local weights, and local weights reinforce paths
◮ Reverse triangle inequality
◮ True when DAG derived from user ratings
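A direct translation of the low-noise definition above into a checker (a sketch; the weight-matrix representation A[i][j] = aij is my own choice):

import itertools

def is_low_noise(A):
    """Check the low-noise (reverse triangle inequality) condition on a weight matrix A,
    where A[i][j] = a_ij is the aggregated preference weight for item i over item j."""
    n = len(A)
    for i, j, k in itertools.permutations(range(n), 3):
        dij = A[i][j] - A[j][i]
        djk = A[j][k] - A[k][j]
        if dij > 0 and djk > 0 and A[i][k] - A[k][i] < dij + djk:
            return False
    return True

# Example mirroring the slide's graph on items {1, 2, 3} (0-indexed here)
A = [[0.0, 1.0, 3.0],
     [0.0, 0.0, 1.0],
     [0.5, 0.0, 0.0]]
print(is_low_noise(A))   # True: a13 - a31 = 2.5 >= (a12 - a21) + (a23 - a32) = 2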

SLIDE 13

Trying to achieve consistency

Try ideas from classification: φ is convex, bounded below, φ′(0) < 0. Common in the ranking literature:²

  ϕ(s, G) = Σ_{ij} aij φ(si − sj)

[Figure: example G over items {1, 2, 3, 4} with edges a12 and a34]

Example: ϕ(s, G) = a12 φ(s1 − s2) + a34 φ(s3 − s4)

Theorem: ϕ is not consistent, even in low noise settings.

² Herbrich et al., 2000; Freund et al., 2003; Dekel et al., 2004, etc.
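A small sketch of this edge-based surrogate (mine, not the talk's code), instantiated with the logistic loss, which is convex, bounded below, and has φ′(0) < 0:

import math

def pairwise_surrogate(s, edges, phi):
    # phi(s, G) = sum_{ij} a_ij * phi(s_i - s_j) over weighted preference edges (i, j, a_ij)
    return sum(a * phi(s[i] - s[j]) for (i, j, a) in edges)

logistic = lambda t: math.log(1.0 + math.exp(-t))   # convex, bounded below, phi'(0) = -1/2 < 0

# Graph from the slide: only edges a12 and a34 (items 0-indexed here)
edges = [(0, 1, 1.0), (2, 3, 1.0)]
print(pairwise_surrogate([2.0, 1.0, 2.0, 1.0], edges, logistic))   # small: preferences respected
print(pairwise_surrogate([1.0, 2.0, 1.0, 2.0], edges, logistic))   # larger: preferences violated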

SLIDE 14

What is the problem?

Surrogate loss ϕ(s, G) = Σ_{ij} aij φ(si − sj)

[Figure: G1 with edges a12, a13, a23; G2 with the single edge a31; p(G1) = p(G2) = .5; their aggregate has edges a12, a13, a31, a23]

Σ_G p(G) ϕ(s, G) = ½ ϕ(s, G1) + ½ ϕ(s, G2) ∝ a12 φ(s1 − s2) + a13 φ(s1 − s3) + a23 φ(s2 − s3) + a31 φ(s3 − s1)

SLIDE 15

What is the problem?

a12 φ(s1 − s2) + a13 φ(s1 − s3) + a23 φ(s2 − s3) + a31 φ(s3 − s1)

[Figure: aggregate graph over items 1, 2, 3 with edges a12, a13, a31, a23]

More bang for your $$ by increasing to 0 from the left: s1 ↓.

Result: s∗ = argmin_s Σ_{ij} aij φ(si − sj) can have s∗2 > s∗1, even if a13 − a31 > a12 + a23.
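The claim above can be probed numerically; the sketch below (my own, using scipy) minimizes the aggregated convex surrogate for given weights and reports the induced order. The weights in the example are placeholders satisfying the low-noise condition, not the talk's counterexample; the theorem's point is that such weights exist for which the minimizer still places item 2 above item 1.

import numpy as np
from scipy.optimize import minimize

def aggregated_surrogate(s, edges, phi):
    # sum over aggregated edges (i, j, a_ij) of a_ij * phi(s_i - s_j)
    return sum(a * phi(s[i] - s[j]) for (i, j, a) in edges)

def minimizer_order(edges, n_items, phi):
    # Numerically minimize the aggregated surrogate; return items sorted by descending score
    res = minimize(lambda s: aggregated_surrogate(s, edges, phi),
                   np.zeros(n_items), method="BFGS")
    return np.argsort(-res.x), res.x

exp_loss = lambda t: np.exp(-t)   # convex, bounded below, phi'(0) = -1 < 0
# a12 = 1, a13 = 3, a23 = 1, a31 = 0.5 (0-indexed): a13 - a31 = 2.5 >= a12 + a23 = 2
edges = [(0, 1, 1.0), (0, 2, 3.0), (1, 2, 1.0), (2, 0, 0.5)]
print(minimizer_order(edges, 3, exp_loss))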

SLIDE 16

Trying to achieve consistency, II

Idea: Use a margin-based penalty³

  ϕ(s, G) = Σ_{ij} φ(si − sj − aij)

Inconsistent: Take aij ≡ c; can reduce to the previous case.

[Figure: aggregate graph over items 1, 2, 3 with edges a12, a13, a31, a23]

³ Shashua and Levin 2002

SLIDE 17

Ranking is challenging

◮ Inconsistent in general
◮ Low noise settings:
  ◮ Inconsistent for edge-based convex losses ϕ(s, G) = Σ_{ij} aij φ(si − sj)
  ◮ Inconsistent for margin-based convex losses ϕ(s, G) = Σ_{ij} φ(si − sj − aij)
◮ Question: Do tractable consistent losses exist?

Yes.

SLIDE 18

A solution in the low noise setting

Recall reverse triangle inequality

[Figure: aggregate graph over items 1, 2, 3 with edges a12, a13, a31, a23]

◮ Idea 1: Make loss reduction proportional to the weight difference aij − aji
◮ Idea 2: Regularize to keep the loss well-behaved

Theorem: For r strongly convex, the following loss is consistent:

  ϕ(s, G) = Σ_{ij} aij (sj − si) + Σ_j r(sj)
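A minimal sketch of this loss, assuming the quadratic regularizer r(t) = t²/2 (my choice of strongly convex r); with that choice the minimizer of the expected surrogate has the closed form si = Σ_j (aij − aji), matching the proof sketch on the next slide.

def consistent_surrogate(s, A, r):
    # phi(s, G) = sum_{ij} a_ij (s_j - s_i) + sum_j r(s_j), with A[i][j] = a_ij
    n = len(A)
    pair_term = sum(A[i][j] * (s[j] - s[i]) for i in range(n) for j in range(n))
    return pair_term + sum(r(si) for si in s)

def closed_form_scores(A):
    # With r(t) = t**2 / 2, setting the gradient to zero gives s_i = sum_j (a_ij - a_ji):
    # scores are just net preference weights, so items are sorted by net weight.
    n = len(A)
    return [sum(A[i][j] - A[j][i] for j in range(n)) for i in range(n)]

r = lambda t: 0.5 * t * t
A = [[0.0, 1.0, 3.0],
     [0.0, 0.0, 1.0],
     [0.5, 0.0, 0.0]]
s_star = closed_form_scores(A)                        # [3.5, 0.0, -3.5]
print(s_star, consistent_surrogate(s_star, A, r))     # minimal surrogate value
print(consistent_surrogate([0.0, 1.0, 2.0], A, r))    # any other score vector is larger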

SLIDE 19

Consistency proof sketch

Write the surrogate and take derivatives:

  Σ_G p(G) ϕ(s, G) = Σ_{ij} aij (sj − si) + Σ_j r(sj)

  ∂/∂si = Σ_j (aji − aij) + r′(si) = 0,  i.e.  r′(si) = Σ_j (aij − aji)

Simply note that r′ is strictly increasing, so

  si > sk  ⇔  Σ_j (aij − aji) > Σ_j (akj − ajk)

The last step holds by the low-noise assumption.

SLIDE 20

Experimental results

◮ MovieLens dataset⁴: 100,000 ratings for 1682 movies by 943 users
◮ Query is user u; results X = {1, . . . , 1682} are movies
◮ Scoring function: fi(x, u) = w^T ψ(xi, u)
◮ ψ maps movie xi and user u to features
◮ Per-user pair weight a^u_ij is the difference of the user's ratings for movies xi, xj

⁴ GroupLens Lab, 2008

SLIDE 21

Surrogate risks

Losses based on pairwise comparisons:

Ours:     Σ_{i,j,u} a^u_ij w^T(ψ(xj, u) − ψ(xi, u)) + θ Σ_{i,u} (w^T ψ(xi, u))²
Hinge:    Σ_{i,j,u} a^u_ij [1 − w^T(ψ(xi, u) − ψ(xj, u))]₊
Logistic: Σ_{i,j,u} a^u_ij log(1 + exp(w^T(ψ(xj, u) − ψ(xi, u))))
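A sketch of how these three risks might be computed for a linear scorer (my own code; the toy feature matrix, the pair list, and the regularization weight theta are hypothetical, not the paper's experimental values):

import numpy as np

def surrogate_risks(w, psi, pairs, theta=0.1):
    """Pairwise surrogate risks for a linear scorer s_i = w^T psi_i.

    psi:   (n_items, d) feature matrix for one user, a stand-in for psi(x_i, u)
    pairs: list of (i, j, a_ij) with a_ij > 0 meaning item i is rated above item j
    theta: weight on the quadratic regularizer in "ours" (hypothetical value)
    """
    s = psi @ w                                           # scores s_i = w^T psi(x_i, u)
    margins = np.array([s[i] - s[j] for i, j, _ in pairs])
    a = np.array([aij for _, _, aij in pairs])
    ours = np.sum(a * (-margins)) + theta * np.sum(s ** 2)
    hinge = np.sum(a * np.maximum(0.0, 1.0 - margins))
    logistic = np.sum(a * np.log1p(np.exp(-margins)))
    return ours, hinge, logistic

# Toy example: 3 movies with 2 features; item 0 is rated above items 1 and 2
psi = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
pairs = [(0, 1, 2.0), (0, 2, 1.0)]
print(surrogate_risks(np.array([1.0, -1.0]), psi, pairs))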

SLIDE 22

Experimental results

Test losses for each surrogate (standard error in parentheses)

Num training pairs   Hinge        Logistic     Ours
20000                .478 (.008)  .479 (.010)  .465 (.006)
40000                .477 (.008)  .478 (.010)  .464 (.006)
80000                .480 (.007)  .478 (.009)  .462 (.005)
120000               .477 (.008)  .477 (.009)  .463 (.006)
160000               .474 (.007)  .474 (.007)  .461 (.004)

SLIDE 23

Conclusions

◮ General theorem for consistency of ranking algorithms
◮ General inconsistency results, as well as inconsistency results for several natural and commonly used losses, even in low noise settings

◮ Consistent loss for low noise settings

SLIDE 24

Open questions

◮ What are appropriate ranking losses? Click-based losses, ratings-based losses?
◮ Other consistent losses?
◮ Convergence rates?
