
On the Consistency of Ranking Algorithms
John Duchi, Lester Mackey, Michael I. Jordan
University of California, Berkeley
International Conference on Machine Learning, 2010


  1. On the Consistency of Ranking Algorithms. John Duchi, Lester Mackey, Michael I. Jordan, University of California, Berkeley. International Conference on Machine Learning, 2010. [Slide footer: Duchi, Mackey, Jordan (UC Berkeley) — Consistency of Ranking Algorithms — ICML 2010 — 1 / 24]

  2. Ranking
Goal: Order a set of inputs/results to best match the preferences of an individual or a population
◮ Web search: return the most relevant results for user queries
◮ Recommendation systems:
  ◮ Suggest movies to watch based on a user's past ratings
  ◮ Suggest news articles to read based on past browsing history
◮ Advertising placement: maximize profit and click-through

  3. Supervised ranking setup
Observe: Sequence of training examples
◮ Query q: e.g., a search term
◮ Set of results x to rank, e.g., items {1, 2, 3, 4}
◮ Weighted DAG G representing preferences over results
  ◮ Example: item 1 preferred to {2, 3} and item 3 to 4, with edge weights a12, a13, a34
Multiple preference graphs may be observed for the same query
[Figure: example G with query q and results x = {1, 2, 3, 4}]
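As a concrete sketch of the setup above, a preference DAG G can be stored as a mapping from directed edges to weights; the representation and helper name here are our own illustration, not something the slides prescribe.

```python
# Sketch (our representation): a preference DAG G as a dict mapping a
# directed edge (i, j) -> weight a_ij, where the edge means
# "item i preferred to item j".

def make_example_graph(a12=1.0, a13=1.0, a34=1.0):
    """The slide's example G over items {1, 2, 3, 4}:
    1 preferred to 2 and 3, and 3 preferred to 4."""
    return {(1, 2): a12, (1, 3): a13, (3, 4): a34}

G = make_example_graph(a12=2.0)
```

The dict form makes later loss computations a single sum over `G.items()`.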

  4. Supervised ranking setup
Learn: Scoring function f(x) to rank results x
◮ Real-valued score for result i: s_i := f_i(x)
◮ Result i ranked above j iff f_i(x) > f_j(x)
◮ Loss suffered when scores s disagree with preference graph G:
    L(s, G) = Σ_{i,j} a_ij · 1(s_i < s_j)
Example (G with x = {1, 2, 3, 4}): L(s, G) = a12 · 1(s1 < s2) + a13 · 1(s1 < s3) + a34 · 1(s3 < s4)
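The ranking loss above is a direct sum over edges; a minimal transcription (with the graph again stored as an edge-to-weight dict) looks like this:

```python
# L(s, G) = sum over edges (i, j) of a_ij * 1(s_i < s_j).
# s maps item -> score; G maps directed edge (i, j) -> weight a_ij.

def ranking_loss(s, G):
    return sum(a for (i, j), a in G.items() if s[i] < s[j])

G = {(1, 2): 1.0, (1, 3): 2.0, (3, 4): 0.5}
perfect = {1: 4.0, 2: 3.0, 3: 2.0, 4: 1.0}  # agrees with every edge
flipped = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}  # violates every edge
```

A score vector consistent with every preference incurs zero loss; a fully reversed one pays the total edge weight.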

  5. Supervised ranking setup
Example: A scoring function f optimally ranks the results in G when f1(x) > f2(x) and f2(x) > f3(x)
[Figure: chain graph G over results 1 → 2 → 3]

  6. Detour to classification
Consider the simpler problem of classification
◮ Given: input x, label y ∈ {−1, 1}
◮ Learn: classification function f(x), with margin s = yf(x)
◮ Loss L(s) = 1(s ≤ 0): hard to minimize directly
◮ Surrogate loss φ(s): a tractable replacement
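To make the hard/tractable contrast concrete, here are the 0-1 loss and two standard convex surrogates φ; hinge and logistic are illustrative choices on our part, since the slide leaves φ generic.

```python
# The 0-1 classification loss and two common convex surrogates of the
# margin s = y * f(x). Hinge and logistic are illustrative choices.
import math

def zero_one(s):
    return 1.0 if s <= 0 else 0.0

def hinge(s):
    return max(0.0, 1.0 - s)

def logistic(s):
    return math.log(1.0 + math.exp(-s))
```

Both surrogates are convex and decreasing in the margin, which is what makes their minimization tractable where the 0-1 loss is not.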

  7. Classification and surrogate consistency
Question: Does minimizing the expected φ-loss minimize the expected loss L?
    Minimize Σ_{i=1}^n φ(y_i f(x_i))  —(n → ∞)→  Minimize E[φ(Y f(X))]  ⇐?⇒  Minimize E[L(Y f(X))]
Theorem (Bartlett, Jordan, McAuliffe 2006): If φ is convex, the procedure based on minimizing φ is consistent if and only if φ′(0) < 0.
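The theorem's condition is easy to check numerically for particular surrogates; the finite-difference check below is our own sketch, using the logistic and exponential losses as examples.

```python
# Checking the calibration condition phi'(0) < 0 numerically for two
# convex surrogates (a sketch; a central finite difference stands in
# for the derivative).
import math

def logistic(s):
    return math.log(1.0 + math.exp(-s))

def exp_loss(s):
    return math.exp(-s)

def deriv_at_zero(phi, h=1e-6):
    return (phi(h) - phi(-h)) / (2 * h)

# Analytically: logistic'(0) = -1/2 and exp_loss'(0) = -1, so both
# satisfy phi'(0) < 0 and are consistent by the theorem on this slide.
```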

  8. What about ranking consistency?
Minimization of the true ranking loss is hard
◮ Replace the ranking loss L(s, G) with a tractable surrogate ϕ(s, G)
Question: When is surrogate minimization consistent for ranking?

  9. Conditional losses
[Figure: two preference graphs with p(G1) = .5 and p(G2) = .5, and their aggregate, whose edge weights are the probability-weighted sums, e.g. .5 a23 + .5 a′23.]
◮ ℓ(p, s) = Σ_G p(G | x, q) L(s, G)
◮ Example: ℓ(p, s) = .5 a21 · 1(s2 < s1) + .5(a12 + a′12) · 1(s1 < s2) + .5(a23 + a′23) · 1(s2 < s3) + .5(a34 + a′34) · 1(s3 < s4)
◮ Optimal score vectors: A(p) = argmin_s ℓ(p, s)
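The aggregation is just an expectation of the per-graph loss; a minimal sketch, reusing the indicator ranking loss from the earlier slides:

```python
# Conditional (aggregated) loss l(p, s) = sum_G p(G | x, q) * L(s, G),
# where p is given as a list of (probability, graph) pairs.

def ranking_loss(s, G):
    return sum(a for (i, j), a in G.items() if s[i] < s[j])

def conditional_loss(p, s):
    """p: list of (prob, graph) pairs conditioned on the same (x, q)."""
    return sum(prob * ranking_loss(s, G) for prob, G in p)

# Two conflicting graphs over items {1, 2}, each seen half the time.
p = [(0.5, {(1, 2): 1.0}), (0.5, {(2, 1): 3.0})]
s = {1: 2.0, 2: 1.0}  # ranks item 1 above item 2
```

Ranking 1 above 2 satisfies the first graph but pays .5 · 3.0 on the second, so ℓ(p, s) = 1.5.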

  10. Consistency theorem
Theorem: The procedure minimizing ϕ is asymptotically consistent if and only if
    inf_{s ∉ A(p)} Σ_G p(G) ϕ(s, G)  >  inf_s Σ_G p(G) ϕ(s, G)
In other words, ϕ is consistent if and only if its minimization gives the correct order to the results.
Goal: Find a tractable ϕ such that any s minimizing Σ_G p(G) ϕ(s, G) also minimizes ℓ(p, s).

  11. Consistent and tractable?
It is hard to get a ϕ that is both consistent and tractable
◮ In general, it is NP-hard even to find the s minimizing Σ_G p(G) L(s, G) (reduction from the minimum feedback arc set problem)
Some restrictions on the problem space are necessary...

  12. Low noise setting
Definition: The setting is low noise if a_ij − a_ji > 0 and a_jk − a_kj > 0 imply
    a_ik − a_ki ≥ (a_ij − a_ji) + (a_jk − a_kj)
◮ Intuition: weight on a path reinforces local weights, and local weights reinforce paths
◮ A reverse triangle inequality
◮ Holds when the DAG is derived from user ratings
[Figure: triangle over results {1, 2, 3} with edges a12, a23, a13, a31.] Example (with a21 = a32 = 0): a13 − a31 ≥ a12 + a23
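The low-noise condition can be verified directly on an aggregated weight matrix; the checker below is a sketch of ours, brute-forcing all triples.

```python
# Sketch: check the slide's low-noise (reverse triangle inequality)
# condition on aggregated edge weights a[(i, j)] = a_ij.

def is_low_noise(a, items):
    d = lambda i, j: a.get((i, j), 0.0) - a.get((j, i), 0.0)
    for i in items:
        for j in items:
            for k in items:
                if d(i, j) > 0 and d(j, k) > 0 and d(i, k) < d(i, j) + d(j, k):
                    return False
    return True

# The slide's example shape: needs a13 - a31 >= a12 + a23 (a21 = a32 = 0).
ok = {(1, 2): 1.0, (2, 3): 1.0, (1, 3): 2.5, (3, 1): 0.2}   # 2.3 >= 2.0
bad = {(1, 2): 1.0, (2, 3): 1.0, (1, 3): 1.5}               # 1.5 <  2.0
```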

  13. Trying to achieve consistency
Try ideas from classification: φ convex, bounded below, φ′(0) < 0. Common in the ranking literature (Herbrich et al., 2000; Freund et al., 2003; Dekel et al., 2004, etc.):
    ϕ(s, G) = Σ_{ij} a_ij φ(s_i − s_j)
Example (G with edges a12, a34): ϕ(s, G) = a12 φ(s1 − s2) + a34 φ(s3 − s4)
Theorem: This ϕ is not consistent, even in low noise settings.
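As a concrete instance of this edge-based surrogate, here it is with the logistic φ (one of the standard choices cited on the slide):

```python
# Edge-based convex surrogate phi(s, G) = sum_ij a_ij * phi(s_i - s_j),
# with logistic phi as one illustrative choice.
import math

def phi(t):
    return math.log(1.0 + math.exp(-t))

def pairwise_surrogate(s, G):
    return sum(a * phi(s[i] - s[j]) for (i, j), a in G.items())

G = {(1, 2): 1.0, (3, 4): 2.0}
agree = {1: 1.0, 2: 0.0, 3: 5.0, 4: 0.0}     # respects both edges
reverse = {1: 0.0, 2: 1.0, 3: 0.0, 4: 5.0}   # violates both edges
```

Scores that agree with the preference edges yield a strictly smaller surrogate value than reversed scores, as expected for a decreasing φ.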

  14. What is the problem?
Surrogate loss ϕ(s, G) = Σ_{ij} a_ij φ(s_i − s_j)
[Figure: graphs G1 and G2 with p(G1) = p(G2) = .5 and their aggregate, over results {1, 2, 3} with edge weights a12, a13, a23, a31.]
Aggregate: Σ_G p(G) ϕ(s, G) = ½ ϕ(s, G1) + ½ ϕ(s, G2)
    ∝ a12 φ(s1 − s2) + a13 φ(s1 − s3) + a23 φ(s2 − s3) + a31 φ(s3 − s1)

  15. What is the problem?
    a12 φ(s1 − s2) + a13 φ(s1 − s3) + a23 φ(s2 − s3) + a31 φ(s3 − s1)
A convex, decreasing φ gives more bang for your buck as its argument increases toward 0 from the left, so the backward edge a31 can pull the minimizer toward decreasing s1.
Result: s* = argmin_s Σ_{ij} a_ij φ(s_i − s_j) can have s*_2 > s*_1, even if a13 − a31 > a12 + a23.
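This flip can be observed numerically. The example below is our own construction, not taken from the slides: with φ(t) = e^(−t) and aggregated weights a12 = 1, a13 = 12, a23 = 10, a31 = 1, the low-noise condition a13 − a31 ≥ a12 + a23 holds and the unique loss-optimal order is 1 > 2 > 3, yet gradient descent on the surrogate ends with s2 above s1.

```python
# Our numeric illustration of the inconsistency: phi(t) = exp(-t), so the
# surrogate is f(s) = sum_ij a_ij * exp(s_j - s_i). The weights satisfy
# the low-noise condition (a13 - a31 = 11 >= a12 + a23 = 11), and the
# unique optimal ranking is 1 > 2 > 3, but the surrogate minimizer flips
# items 1 and 2.
import math

a = {(1, 2): 1.0, (1, 3): 12.0, (2, 3): 10.0, (3, 1): 1.0}

def grad(s):
    """Gradient of sum_ij a_ij * exp(s_j - s_i) at score vector s."""
    g = {k: 0.0 for k in s}
    for (i, j), w in a.items():
        e = w * math.exp(s[j] - s[i])
        g[i] -= e
        g[j] += e
    return g

s = {1: 0.0, 2: 0.0, 3: 0.0}
for _ in range(20000):          # plain gradient descent
    g = grad(s)
    for k in s:
        s[k] -= 0.005 * g[k]
```

At convergence s2 − s1 ≈ 0.42 > 0, even though every minimizer of the true ranking loss here must put item 1 above item 2.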

  16. Trying to achieve consistency, II
Idea: Use a margin-based penalty (Shashua and Levin 2002):
    ϕ(s, G) = Σ_{ij} φ(s_i − s_j − a_ij)
Inconsistent: take a_ij ≡ c; this reduces to the previous case.
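For concreteness, the margin-based surrogate above looks like this with the hinge as one illustrative choice of convex φ:

```python
# Margin-based surrogate phi(s, G) = sum_ij phi(s_i - s_j - a_ij):
# the score gap s_i - s_j is asked to exceed the edge weight a_ij.
# Hinge phi is an illustrative choice.

def phi(t):
    return max(0.0, 1.0 - t)

def margin_surrogate(s, G):
    return sum(phi(s[i] - s[j] - a) for (i, j), a in G.items())

G = {(1, 2): 2.0}
wide = {1: 4.0, 2: 0.0}    # gap 4 exceeds a12 = 2 by the hinge margin 1
tight = {1: 2.0, 2: 0.0}   # gap meets a12 exactly, so the hinge still fires
```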

  17. Ranking is challenging
◮ Inconsistent in general
◮ Even in low noise settings:
  ◮ Inconsistent for edge-based convex losses ϕ(s, G) = Σ_{ij} a_ij φ(s_i − s_j)
  ◮ Inconsistent for margin-based convex losses ϕ(s, G) = Σ_{ij} φ(s_i − s_j − a_ij)
◮ Question: Do tractable consistent losses exist? Yes.

  18. A solution in the low noise setting
Recall the reverse triangle inequality
◮ Idea 1: make the loss reduction proportional to the weight difference a_ij − a_ji
◮ Idea 2: regularize to keep the loss well-behaved
Theorem: For r strongly convex, the following loss is consistent:
    ϕ(s, G) = Σ_{ij} a_ij (s_j − s_i) + Σ_j r(s_j)
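With the quadratic r(s) = s² (our illustrative choice of strongly convex regularizer; the slide leaves r general), setting the gradient to zero gives 2·s_i = Σ_j (a_ij − a_ji), so the minimizer has a closed form:

```python
# The slide's consistent surrogate with r(s) = s^2 (one valid strongly
# convex choice). The first-order condition 2*s_i = sum_j (a_ij - a_ji)
# yields a closed-form minimizer.

def surrogate(s, a):
    return (sum(w * (s[j] - s[i]) for (i, j), w in a.items())
            + sum(v ** 2 for v in s.values()))

def minimizer(a, items):
    d = lambda i, j: a.get((i, j), 0.0) - a.get((j, i), 0.0)
    return {i: 0.5 * sum(d(i, j) for j in items) for i in items}

# Low-noise weights over {1, 2, 3}: a13 - a31 = 2.3 >= a12 + a23 = 2.0.
a = {(1, 2): 1.0, (2, 3): 1.0, (1, 3): 2.5, (3, 1): 0.2}
s_star = minimizer(a, [1, 2, 3])
```

Here the closed-form scores order the items 1 > 2 > 3, matching the preference direction of the aggregated weights.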

  19. Consistency proof sketch
Write the surrogate and take derivatives:
    Σ_G p(G) ϕ(s, G) = Σ_{ij} a_ij (s_j − s_i) + Σ_j r(s_j)
    ∂/∂s_i:  −Σ_j (a_ij − a_ji) + r′(s_i) = 0
Simply note that r′ is strictly increasing, so
    s_i > s_k  ⇔  Σ_j (a_ij − a_ji) > Σ_j (a_kj − a_jk)
The last equivalence gives the correct order by the low-noise assumption.

  20. Experimental results
◮ MovieLens dataset (GroupLens Lab, 2008): 100,000 ratings of 1,682 movies by 943 users
◮ The query is a user u; the results X = {1, ..., 1682} are movies
◮ Scoring function: f_i(x, u) = w^T ψ(x_i, u)
  ◮ ψ maps a movie x_i and user u to features
◮ Per-user pair weight a^u_ij is the difference of the user's ratings for movies x_i, x_j
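The per-user pair weights described above can be built directly from one user's ratings; the sketch below uses our own function and movie names for illustration.

```python
# Sketch: per-user pair weights a^u_ij as the (positive) difference of
# user u's ratings for movies x_i and x_j; only preferred-over edges
# get a weight, matching the directed-DAG convention.

def pair_weights(ratings):
    """ratings: dict movie -> rating for one user; returns edges a^u_ij."""
    a = {}
    for i, ri in ratings.items():
        for j, rj in ratings.items():
            if ri > rj:
                a[(i, j)] = float(ri - rj)
    return a

a_u = pair_weights({"m1": 5, "m2": 3, "m3": 1})
```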

  21. Surrogate risks
Losses based on pairwise comparisons:
    Ours:     Σ_{i,j,u} a^u_ij w^T(ψ(x_j, u) − ψ(x_i, u)) + θ Σ_{i,u} (w^T ψ(x_i, u))²
    Hinge:    Σ_{i,j,u} a^u_ij [1 − w^T(ψ(x_i, u) − ψ(x_j, u))]_+
    Logistic: Σ_{i,j,u} a^u_ij log(1 + e^{w^T(ψ(x_j, u) − ψ(x_i, u))})
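The three risks can be transcribed almost term for term; the function and variable names below are our own, with features given as plain lists and θ as the regularization weight in the "Ours" objective.

```python
# Transcription of the three pairwise surrogate risks. pairs maps
# (i, j) -> weight a^u_ij for one user; feats maps item -> feature list.
import math

def score(w, psi):
    return sum(wi * pi for wi, pi in zip(w, psi))

def ours_risk(w, pairs, feats, theta):
    lin = sum(a * (score(w, feats[j]) - score(w, feats[i]))
              for (i, j), a in pairs.items())
    reg = theta * sum(score(w, f) ** 2 for f in feats.values())
    return lin + reg

def hinge_risk(w, pairs, feats):
    return sum(a * max(0.0, 1.0 - (score(w, feats[i]) - score(w, feats[j])))
               for (i, j), a in pairs.items())

def logistic_risk(w, pairs, feats):
    return sum(a * math.log(1.0 + math.exp(score(w, feats[j]) - score(w, feats[i])))
               for (i, j), a in pairs.items())

pairs = {("i", "j"): 1.0}
feats = {"i": [2.0], "j": [0.0]}
w = [1.0]
```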

  22. Experimental results
Test losses for each surrogate (standard error in parentheses):

    Num training pairs   Hinge         Logistic      Ours
    20000                .478 (.008)   .479 (.010)   .465 (.006)
    40000                .477 (.008)   .478 (.010)   .464 (.006)
    80000                .480 (.007)   .478 (.009)   .462 (.005)
    120000               .477 (.008)   .477 (.009)   .463 (.006)
    160000               .474 (.007)   .474 (.007)   .461 (.004)
