Rank Aggregation from Pairwise Comparisons in the Presence of Adversarial Corruptions

Arpit Agarwal, Shivani Agarwal, Sanjeev Khanna, Prathamesh Patil
ICML 2020

Rank Aggregation from Pairwise Comparisons
In many practical applications, the available data comes in the form of comparisons and choices. Aggregating these partial preferences into a complete ordering is important for understanding user behavior and predicting future behavior. Applications include e-commerce, recommendation systems, and information retrieval.

Need for Robustness
Rank aggregation algorithms play a critical role in modern web applications:
• Determining product placement
• Ordering search results
• Providing recommendations
Their significant economic and societal impact gives malicious players strong incentives to manipulate the comparison data in order to skew the outcome in their favor:
• Voter fraud in elections
• Inflated purchases in e-commerce
• Click fraud in online advertising
Designing rank aggregation algorithms that are robust to adversarial corruptions in the input comparison data is a crucial challenge.

Our Contribution
We initiate the study of robustness in rank aggregation from pairwise comparisons under the Bradley-Terry-Luce (BTL) model. We propose a powerful adversarial contamination model, under which:
★ Given arbitrary comparison data, we exactly characterize the extent of contamination that can be tolerated while the true BTL model parameters remain uniquely identifiable.
★ We show that robustness to adversarial contamination is a structural property of the comparison data itself. Not all data are created equal!
★ For a natural family of comparison data (Erdős-Rényi comparison graphs), we present a near-quadratic time algorithm (based on Linear Programming) for parameter recovery from comparison data containing a non-trivial fraction of contamination.

Outline
Preliminaries
‣ Bradley-Terry-Luce Model
‣ Comparison Graphs
Adversarial Contamination Model
Condition for Unique Identifiability
‣ Robustness as a Structural Property
Results for Erdős-Rényi Comparison Graphs
‣ A Sharp Threshold Condition for Identifiability
‣ Algorithm for Parameter Recovery

The Bradley-Terry-Luce Model [Zermelo, 1928; Bradley & Terry, 1952; Luce, 1959]
It is a comparison model used to explain outcomes of pairwise comparisons. Given a universe of $n$ items/alternatives, it associates a positive weight $w_i > 0$ with each item $i \in [n]$, and posits that for any pair $(i, j) \in [n] \times [n]$,
$$P(i \succ j) = \frac{w_i}{w_i + w_j}.$$
Given data consisting of pairwise comparisons whose outcomes are assumed to be drawn according to the BTL model, the objective is typically to recover the underlying item weights $w$ (up to multiplicative scaling).
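As a small illustration of the formula above, the following sketch computes BTL win probabilities and checks that rescaling all weights leaves them unchanged, which is why weights can only be recovered up to multiplicative scaling. The function name `btl_win_probability` and the example weights are assumptions made for illustration, not part of the original slides.

```python
def btl_win_probability(w_i: float, w_j: float) -> float:
    """Probability that item i beats item j under the BTL model."""
    if w_i <= 0 or w_j <= 0:
        raise ValueError("BTL weights must be strictly positive")
    return w_i / (w_i + w_j)


# Hypothetical weights for five items (illustrative only).
weights = [1.0, 1.0, 2.0, 2.0, 4.0]

# Probability that the first item beats the last one.
p = btl_win_probability(weights[0], weights[4])

# Multiplying every weight by the same constant gives the same probability.
scaled = [7.0 * w for w in weights]
p_scaled = btl_win_probability(scaled[0], scaled[4])

print(p, p_scaled)
```

The two printed values agree, so any procedure that only sees comparison outcomes cannot distinguish `weights` from `scaled`.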

Comparison Data ≡ Weighted Comparison Graph
Comparison data, which consists of pairs of items $\{i, j\}$ and the observed probability $\hat{p}_{ij}$ with which $i$ beats $j$, induces a weighted graph $G = (V, E)$, where
• The vertex set $V$ corresponds to the set of items $[n]$.
• An edge $\{i, j\} \in E$ iff items $\{i, j\}$ were compared.
• If an edge $\{i, j\} \in E$, then its weight is $\hat{p}_{ij}$.
[Figure: example weighted comparison graph on items $1, 2, i, j, k, n$ with edge labels such as $\hat{p}_{12}$, $\hat{p}_{ij}$, $\hat{p}_{jk}$.]

Adversarial Contamination Model
Nature generates a comparison graph $G^* = ([n], E^*)$. Each edge $\{i, j\} \in E^*$ is labeled with a truthful estimate $\hat{p}_{ij}$ consistent with a BTL model with (unknown) weights $w^*$.
“Truthful estimate” $\hat{p}_{ij}$ consistent with $w^*$: $\hat{p}_{ij}$ is a good approximation for the true probability,
$$\hat{p}_{ij} \approx p^*_{ij} = \frac{w^*_i}{w^*_i + w^*_j}.$$
Practical example: $\hat{p}_{ij}$ is the empirical fraction of times $i$ beats $j$ out of $L$ independent comparisons between them.
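The practical example above can be sketched as a short simulation: for each compared pair, draw $L$ independent BTL outcomes and record the empirical win fraction as the edge label. The names `truthful_estimate` and `nature_graph`, the dictionary representation of the graph, and the chosen weights and edges are all assumptions for illustration.

```python
import random


def truthful_estimate(w_i, w_j, num_comparisons, rng):
    """Empirical fraction of wins of i over j in independent BTL comparisons."""
    p_true = w_i / (w_i + w_j)
    wins = sum(rng.random() < p_true for _ in range(num_comparisons))
    return wins / num_comparisons


def nature_graph(weights, edges, num_comparisons, seed=0):
    """Label every edge (i, j) with an empirical estimate of P(i beats j)."""
    rng = random.Random(seed)
    return {
        (i, j): truthful_estimate(weights[i], weights[j], num_comparisons, rng)
        for (i, j) in edges
    }


# Hypothetical instance: item -> weight, and the pairs that were compared.
weights = {1: 1.0, 2: 1.0, 3: 2.0, 4: 2.0, 5: 4.0}
edges = [(1, 2), (1, 4), (3, 4), (3, 5), (4, 5)]
graph = nature_graph(weights, edges, num_comparisons=20000)
for edge, label in graph.items():
    print(edge, round(label, 3))
```

With a large number of comparisons per pair, each label concentrates around the true BTL probability, which is the sense in which the estimate is "truthful".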

Adversarial Contamination Model
Starting from Nature's graph $G^* = ([n], E^*)$, the adversary may:
• Add edges with spurious labels
• Delete existing edges and their labels
• Corrupt labels on existing edges
The result is the contaminated comparison graph $G = ([n], E)$, which is what is received as input.
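The three adversarial operations can be sketched on a dictionary that maps each edge to its label. This is only an illustration of the contamination model; the function `contaminate`, its parameter names, and the example labels are hypothetical and not taken from the original work.

```python
def contaminate(graph, add=None, delete=(), relabel=None):
    """Return a contaminated copy of an {edge: label} comparison graph.

    add:     {edge: spurious label} for edges absent from the truthful graph
    delete:  edges to remove together with their labels
    relabel: {edge: corrupted label} for edges kept in the graph
    """
    contaminated = dict(graph)  # leave the truthful graph untouched
    for edge in delete:
        contaminated.pop(edge, None)
    for edge, label in (relabel or {}).items():
        if edge not in contaminated:
            raise KeyError(f"cannot corrupt missing edge {edge}")
        contaminated[edge] = label
    for edge, label in (add or {}).items():
        if edge in graph:
            raise ValueError(f"edge {edge} already exists; use relabel")
        contaminated[edge] = label
    return contaminated


# Hypothetical truthful graph and one adversarial action of each kind.
truthful = {(1, 2): 0.5, (1, 4): 1 / 3, (3, 4): 0.5}
received = contaminate(
    truthful,
    add={(2, 3): 0.9},
    delete=[(3, 4)],
    relabel={(1, 4): 0.75},
)
print(received)
```

Only `received` would be visible to a learning algorithm; it carries no marker of which edges were added, deleted, or relabeled.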

Existing Methods… Don’t Work
Parameter estimation under the (uncontaminated) BTL model has received a lot of attention in the ML community, and is a very well understood problem. There are efficient, consistent algorithms for parameter estimation in the uncontaminated setting (Negahban et al., 2012; Hajek et al., 2014; Chen and Suh, 2015; Maystre and Grossglauser, 2015; Shah et al., 2016; Agarwal et al., 2018; Hendrickx et al., 2019; Chen et al., 2019; among others).
However… these are not robust. They crucially rely on the assumption that the input data is truthfully generated. Their recovery guarantees do not hold in the presence of adversarial corruptions!

A Challenging Example
Truthful comparison graph on items $1, \dots, 5$, entirely consistent with $w^* = (1, 1, 2, 2, 4)/10$:
$p^*_{12} = 1/2$, $p^*_{14} = 1/3$, $p^*_{34} = 1/2$, $p^*_{35} = 1/3$, $p^*_{45} = 1/3$.
The adversary corrupts a single label, changing $p^*_{14} = 1/3$ to $p_{14} = 3/4$ and leaving the other four labels unchanged. The contaminated graph is entirely consistent with $w = (3, 3, 1, 1, 2)/10$.
No evidence of corruption in the contaminated graph! Items with the lowest scores have the highest scores post corruption!
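The numbers in this example can be checked directly with exact rational arithmetic. The sketch below verifies that the truthful labels match $w^*$, that the contaminated labels match the alternative weights, and that $w^*$ no longer explains the contaminated graph. The helper name `is_consistent` is an assumption; the weights are written without the common factor $1/10$, which cancels in the BTL ratio.

```python
from fractions import Fraction as F


def is_consistent(weights, labels):
    """True if every label p_ij equals w_i / (w_i + w_j) exactly (items are 1-indexed)."""
    return all(
        F(weights[i - 1], weights[i - 1] + weights[j - 1]) == p
        for (i, j), p in labels.items()
    )


w_true = (1, 1, 2, 2, 4)   # w* up to the common factor 1/10
w_fake = (3, 3, 1, 1, 2)   # alternative weights, same scaling

truthful = {
    (1, 2): F(1, 2), (1, 4): F(1, 3), (3, 4): F(1, 2),
    (3, 5): F(1, 3), (4, 5): F(1, 3),
}
# The adversary changes only the label on edge {1, 4}.
contaminated = dict(truthful)
contaminated[(1, 4)] = F(3, 4)

print(is_consistent(w_true, truthful))
print(is_consistent(w_fake, contaminated))
print(is_consistent(w_true, contaminated))
```

The first two checks succeed and the third fails: the contaminated graph is perfectly explained by a BTL model, just not the true one.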

Exact Condition for Identifiability of $w^*$
Theorem 1 (Cut Majority Condition). Given an arbitrary, contaminated comparison graph $G$, the true weights $w^*$ are uniquely identifiable if and only if every cut in $G$ has strictly more uncorrupted edges than corrupted edges crossing the cut.
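For very small graphs, the cut majority condition can be checked by enumerating every cut. The sketch below is a brute-force illustration that takes exponential time in the number of vertices; it assumes the set of corrupted edges is known, which holds only in analysis, not for a learner. The function name `cut_majority_holds` and the test graphs are assumptions.

```python
from itertools import combinations


def cut_majority_holds(vertices, edges, corrupted):
    """Check that every cut has strictly more uncorrupted than corrupted crossing edges."""
    vertices = list(vertices)
    edge_sets = [frozenset(e) for e in edges]
    corrupted = {frozenset(e) for e in corrupted}
    first, rest = vertices[0], vertices[1:]
    # Keep `first` on one side so each cut is visited once;
    # sizes stop short of len(rest) so the other side is never empty.
    for size in range(len(rest)):
        for others in combinations(rest, size):
            side = {first, *others}
            crossing = [e for e in edge_sets if len(e & side) == 1]
            bad = sum(e in corrupted for e in crossing)
            good = len(crossing) - bad
            if good <= bad:
                return False
    return True


# Five-item graph with only the label on edge {1, 4} corrupted.
edges = [(1, 2), (1, 4), (3, 4), (3, 5), (4, 5)]
print(cut_majority_holds(range(1, 6), edges, corrupted=[(1, 4)]))

# Complete graph on four items with one corrupted edge.
k4 = list(combinations(range(1, 5), 2))
print(cut_majority_holds(range(1, 5), k4, corrupted=[(1, 2)]))
```

In the five-item graph, edge {1, 4} is the only edge crossing the cut between {1, 2} and {3, 4, 5}, so corrupting it violates the condition; in the complete graph every cut retains an uncorrupted majority.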

Takeaway: Robustness is a Structural Property
The structure of the comparison graph plays a crucial role in determining resilience to adversarial corruption.
Example: the fraction of corrupted edges incident on any vertex is $\le O(1/n)$, yet the cut majority condition fails.
Bad news! Certain topologies are fundamentally vulnerable to adversarial contamination. For such topologies, even a marginal amount of corruption can make parameter recovery impossible. Sparse cuts across dense subgraphs can easily be exploited, even by a limited-budget adversary!
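One way to see the sparse-cut vulnerability is a hypothetical construction that is not taken from the slides: two cliques of $m$ vertices each, joined by a single bridge edge that the adversary corrupts. The sketch below measures the largest fraction of corrupted edges at any vertex and lists the edges crossing the cut between the two cliques. All names here are assumptions for illustration.

```python
from itertools import combinations


def two_cliques_with_bridge(m):
    """Two m-vertex cliques joined by one bridge edge (requires m >= 2)."""
    left = list(range(m))
    right = list(range(m, 2 * m))
    bridge = (left[-1], right[0])
    edges = list(combinations(left, 2)) + list(combinations(right, 2)) + [bridge]
    return left, edges, bridge


def max_corrupted_fraction(edges, corrupted):
    """Largest fraction of corrupted edges among the edges incident on a vertex."""
    degree, bad = {}, {}
    for edge in edges:
        for v in edge:
            degree[v] = degree.get(v, 0) + 1
            bad[v] = bad.get(v, 0) + (edge in corrupted)
    return max(bad[v] / degree[v] for v in degree)


m = 50
left, edges, bridge = two_cliques_with_bridge(m)
corrupted = {bridge}

# Edges with exactly one endpoint in the left clique.
crossing = [e for e in edges if (e[0] in left) != (e[1] in left)]

print(max_corrupted_fraction(edges, corrupted))
print(crossing == [bridge])
```

With $n = 2m$ vertices, each bridge endpoint has $m$ incident edges of which one is corrupted, so the per-vertex corrupted fraction is $1/m = 2/n$, yet the only edge crossing the cut between the cliques is corrupted and the cut majority condition fails.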
