

SLIDE 1

Just Sort It!

A Simple and Effective Approach to Active Preference Learning

Lucas Maystre, Matthias Grossglauser School of Computer and Communication Sciences, EPFL

ICML — August 8th, 2017

SLIDE 2

Goal

Learning a ranking from noisy pairwise comparisons.

Recover the ranking accurately, but sample sparingly:

  • some outcomes are inconsistent with the ranking (noise)
  • choose pairs of items adaptively based on previous observations

SLIDE 3

...

SLIDE 4

Main Idea

Use a sorting algorithm!

[Diagram: starting from a random ranking, repeated calls to Quicksort produce outputs corrupted by noise; an ML estimator combines them into a ranking, compared against the ground-truth ranking.]

SLIDE 5

Why Sorting-Based AL?

Prior work: greedy active learning strategies
[Houlsby et al. NIPS 2012, Chen et al. WSDM 2013, Wang et al. KDD 2014, ...]

Time T (in seconds) to select the (n+1)-st pair among n items (ε < 10⁻⁵):

    Strategy        n = 10²   n = 10³   n = 10⁴
    uncertainty     0.05      0.5       11
    entropy         0.3       40        —
    KL-divergence   0.9       71        —
    Mergesort       ε         ε         ε
    Quicksort       ε         ε         ε
    random          ε         ε         ε

  • Orders of magnitude faster
  • ... and simpler to implement

SLIDE 6

This Work

[Diagram: repeated calls to Quicksort feed an ML estimator that outputs a ranking.]

Theory: accuracy of the output of a single call to Quicksort.

Practice: empirical evaluation of sorting-based active learning on real data, vs. alternative strategies.

SLIDE 7

Noise Model

Bradley-Terry model [Zermelo 1928, Bradley & Terry 1952]

    p(i ≻ j) = 1 / (1 + e^−(θi − θj))

[Figure: parameters θ1, θ2, θ3 on the real line; when two parameters are close, an error is likely; when they are far apart, an error is unlikely.]
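The model above can be sketched in a few lines; this is a minimal illustration of the Bradley-Terry comparison probability and a simulated noisy comparison, not the authors' code (the function names are mine):

```python
import math
import random

def comparison_prob(theta_i, theta_j):
    """Bradley-Terry probability that item i beats item j."""
    return 1.0 / (1.0 + math.exp(-(theta_i - theta_j)))

def noisy_compare(theta_i, theta_j, rng=random):
    """Simulated comparison outcome: True if i wins."""
    return rng.random() < comparison_prob(theta_i, theta_j)
```

Items whose parameters are far apart are almost never mis-ordered; items with nearly equal parameters err with probability close to 1/2.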

SLIDE 8

Sorting Algorithm

We analyze Quicksort.

[Figure: example Quicksort recursion on the items 2 5 4 7 1 3 6.]
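As a rough sketch (not the paper's implementation), Quicksort driven by a possibly noisy comparison oracle looks like this; each pair is compared at most once, so noisy answers can leave residual inversions:

```python
import random

def quicksort(items, compare, rng=random):
    """Sort `items` using a comparison oracle.

    compare(a, b) returns True if a should come before b. The oracle may
    be noisy; no comparison is ever repeated.
    """
    if len(items) <= 1:
        return list(items)
    k = rng.randrange(len(items))            # random pivot, as in the analysis
    pivot, rest = items[k], items[:k] + items[k + 1:]
    left = [x for x in rest if compare(x, pivot)]
    right = [x for x in rest if not compare(x, pivot)]
    return quicksort(left, compare, rng) + [pivot] + quicksort(right, compare, rng)
```

With a noiseless oracle this is ordinary Quicksort; plugging in a Bradley-Terry oracle gives the noisy setting analyzed here.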

SLIDE 9

Model Parameters

Difficulty of ranking the items depends on |θ2 − θ1|, |θ3 − θ2|, ...

Our approach: postulate a distribution over the parameters such that

    θi+1 − θi ∼ Exp(λ), i.i.d., so that E[|θi+1 − θi|] = λ⁻¹

i.e., we assume the parameters are drawn from a Poisson point process; λ controls the average amount of noise.

[Figure: sampled parameters for n = 20 items, with p(i ≻ j) = 1 / (1 + e^−(θi − θj)).]
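Drawing parameters with i.i.d. exponential gaps (equivalently, the points of a Poisson process with rate λ) can be sketched as follows; the function name and the choice θ1 = 0 are illustrative:

```python
import random

def sample_params(n, lam, rng=random):
    """Draw n Bradley-Terry parameters with i.i.d. Exp(lam) gaps.

    The gaps have mean 1/lam, so larger lam means closer parameters
    and hence noisier comparisons.
    """
    theta = [0.0]
    for _ in range(n - 1):
        theta.append(theta[-1] + rng.expovariate(lam))
    return theta
```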

SLIDE 10

Main Result

We measure the rank displacement of Quicksort's output σ:

    Δ(σ) = Σ_{i=1}^{n} |σ(i) − i|

where σ(i) is the rank of item i in Quicksort's output and i is its true rank.

Theorem: if the noise is Bradley-Terry and θi+1 − θi ∼ Exp(λ) i.i.d., then with high probability

    Δ(σ) = O(λ²n)   and   max_i |σ(i) − i| = O(λ log n).
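The two displacement measures are straightforward to compute; a small sketch (0-indexed ranks for simplicity, whereas the slide uses 1-indexed):

```python
def displacement(sigma):
    """Total rank displacement: sigma[i] is the output rank of the item
    whose true rank is i (0-indexed)."""
    return sum(abs(s - i) for i, s in enumerate(sigma))

def max_displacement(sigma):
    """Largest displacement of any single item."""
    return max(abs(s - i) for i, s in enumerate(sigma))
```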

SLIDE 11

Sketch of Proof

Lemma: the displacement of Quicksort's output can be bounded by a sum over "individual errors". The lemma does not assume anything about the noise-generating process:

    Δ(σ) ≤ 2 Σ_{(i,j)∈E} |i − j|

where E is the set of comparison pairs sampled by Quicksort whose outcome was an error. Writing

    z_ij = 1 if the outcome of comparison (i, j) is incorrect, 0 otherwise,

we get

    E[Δ(σ)] ≤ 2 Σ_{i<j} |i − j| E[z_ij] ≤ 2n Σ_{k≥1} k E[z_{i,i+k}] = O(λ²n)

with high probability, because E[z_{i,i+k}] decreases exponentially fast in k.
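The decay of E[z_{i,i+k}] can be checked numerically: under the model, θ_{i+k} − θ_i is a sum of k i.i.d. Exp(λ) gaps, and the error probability given that gap is 1 / (1 + e^gap). A Monte Carlo sketch (function name and trial count are mine):

```python
import math
import random

def mean_error_prob(k, lam, trials=20000, rng=None):
    """Monte Carlo estimate of E[z_{i,i+k}]: the probability that a
    Bradley-Terry comparison between items k apart in the true ranking
    comes out inverted, when gaps are i.i.d. Exp(lam)."""
    if rng is None:
        rng = random.Random(0)
    total = 0.0
    for _ in range(trials):
        gap = sum(rng.expovariate(lam) for _ in range(k))  # theta_{i+k} - theta_i
        total += 1.0 / (1.0 + math.exp(gap))               # error prob. given the gap
    return total / trials
```

Increasing k makes the gap stochastically larger, so the estimated error probability shrinks rapidly, consistent with the exponential decay used in the proof.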

SLIDE 12

Experimental Results

[Plot: Sushi dataset — displacement vs. number of comparisons c (up to 10,000), comparing uncertainty, entropy, KL-div, Mergesort, Quicksort, and random.]

[Plot: GIFGIF dataset — displacement vs. number of comparisons c (up to 10⁶), comparing Mergesort, Quicksort, and random.]

Sorting matches the performance of the alternatives at a fraction of the computational cost.

SLIDE 13

Conclusions

  • Use sorting algorithms to learn a ranking from noisy comparisons!
      - works well in practice
      - computationally inexpensive
  • Some theoretical results under assumptions on the noise

GIFGIF dataset: 6170 animated GIF images, 2.7+ million pairwise comparisons.

SLIDE 14

SLIDE 15

How To Select Comparison Pairs?

  • Batch setting: role of the spectral gap of the comparison graph.
    [Negahban et al. NIPS 2012] [Hajek et al. NIPS 2014] [Vojnovic et al. ICML 2016]
  • Sequential setting: greedy active learning strategies (EIG, uncertainty sampling, ...).
    [Houlsby et al. NIPS 2012] [Chen et al. WSDM 2013] [Ailon et al. NIPS 2011]
  • Bandit approaches: dueling bandits.
    [Yue et al. COLT 2009] [Szörényi et al. NIPS 2015] [Heckel et al. arXiv 2016]

SLIDE 16

Empirical Validation

[Plot: total displacement Δ(σ) vs. number of items n (up to 10,000), for λ = 2, 4, 6, 8.]

[Plot: maximum displacement max_i |σ(i) − i| vs. number of items n (log scale, 10¹ to 10⁵), for λ = 2, 4, 6, 8.]

SLIDE 17

Practical AL Strategy

In practice, the comparison budget is larger than that of a single call to Quicksort.

[Plot: displacement Δ(σ̂) vs. number of runs m (up to 100), aggregating the runs via Copeland score vs. ML estimate; n = 200, λ = 4.0.]
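One of the aggregation schemes in the plot, the Copeland score, can be sketched as follows; this is my own minimal reading of Copeland aggregation (majority vote on each pair across the m runs, then rank items by the number of opponents they beat), not the paper's implementation:

```python
from itertools import combinations

def copeland_aggregate(rankings):
    """Aggregate m rankings of the same item set, Copeland-style.

    For each pair of items, take the majority order across the runs;
    an item's Copeland score is the number of opponents it beats.
    Items are returned sorted by decreasing score.
    """
    items = rankings[0]
    positions = [{x: r.index(x) for x in r} for r in rankings]
    wins = {x: 0 for x in items}
    for a, b in combinations(items, 2):
        a_first = sum(1 for pos in positions if pos[a] < pos[b])
        if 2 * a_first > len(rankings):      # a beats b in a majority of runs
            wins[a] += 1
        elif 2 * a_first < len(rankings):    # b beats a in a majority of runs
            wins[b] += 1
    return sorted(items, key=lambda x: -wins[x])
```

Each run of noisy Quicksort contributes one ranking; the aggregate σ̂ tends to get closer to the true ranking as m grows.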