slide-1
SLIDE 1

Comparison-based Choices

Johan Ugander
Management Science & Engineering, Stanford University
Joint work with: Jon Kleinberg (Cornell), Sendhil Mullainathan (Harvard)
EC’17, Boston, June 28, 2017

slide-2
SLIDE 2

Predicting discrete choices

  • Classic problem: consumer preferences [Thurstone ’27, Luce ’59], commuting [McFadden ’78], school choice [Kohn-Manski-Mundel ’76]

slide-4
SLIDE 4

Predicting online discrete choices

How well can we learn/predict “choice set effects”
(a.k.a. violations of the “independence of irrelevant alternatives”, IIA)?

  • [Sheffet-Mishra-Ieong ICML 2012, Yin et al. WSDM 2014]
slide-9
SLIDE 9

Choice set effects

  • Bias towards moderation, compromise effect
  • Similarity aversion

[Figure: two panels, each with axes “weight” and “megapixels”]

  • [Simonson 1989, Simonson-Tversky 1992, Kamenica 2008, Trueblood 2013]

slide-10
SLIDE 10

Choice set effects

  • Bias towards moderation, compromise effect
  • Similarity aversion

[Figure: two panels, each with axes “weight” and “megapixels”; annotations: “Similarity requires ‘distance’” and “Ordinal comparisons”]

  • [Simonson 1989, Simonson-Tversky 1992, Kamenica 2008, Trueblood 2013]
slide-12
SLIDE 12

The present work

  • Focused on comparison-based functions.
  • Investigate asymptotic query complexity: if an agent makes comparison-based choices, how hard to learn their choice function?
  • Assume population is not learning, meaning choice set effects are not “transient irrationality”.
  • Several query frameworks:
      • Active queries vs. passive stream of queries
      • Fixed choice function vs. mixture of choice functions
  • Basic takeaway: comparison-based functions in one dimension (still rich!) are no harder to learn than binary comparisons (sorting).

slide-15
SLIDE 15

Comparison-based choice functions

  • Definition: Given a set of alternatives U, a choice function f maps every non-empty S⊆U to an element u∈S.
  • Example: [Figure: a set of alternatives U, a choice set S, and the choice f(S) = u]
  • Embedding items:
      • Consider U as embedded in attribute space, h: U → X
      • For X = ℝ¹, h(uᵢ) are utilities: [Figure: items a, b, c, d, e placed on a line]
  • Comparison-based functions:
      • Definition: Choice functions that can be written as comparisons (<, >, =) over {h(uᵢ): uᵢ∈S}.

slide-18
SLIDE 18

Comparison-based choice functions

  • In one dimension, comparison-based functions are all position-selection functions: select ℓ-of-k.
  • Example: k=4, ℓ=2
    [Figure: two 4-sets drawn from items a, b, c, d, e on a line, one with f(S) = b and one with f(S) = c]
  • Selecting 1-of-2 is sorting.
  • Focus on k-sets S with fixed k.
  • Position-selection functions exhibit choice set effects.
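
A minimal Python sketch of an ℓ-of-k position-selection function may make the choice set effect concrete. The helper name `select_position` and the utilities in `h` (with the ordering a < b < c < d < e) are illustrative assumptions, not values taken from the talk.

```python
def select_position(S, h, ell):
    """Return the item in position ell of S (1-indexed, counted from the lowest utility)."""
    ranked = sorted(S, key=h.get)  # order the choice set by embedded utility
    return ranked[ell - 1]

# Assumed utilities on a line: a < b < c < d < e (hypothetical values).
h = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0, "e": 5.0}

first = select_position({"a", "b", "c", "d"}, h, ell=2)   # "b"
second = select_position({"b", "c", "d", "e"}, h, ell=2)  # "c"

# b is chosen over c in the first set and c over b in the second,
# although both sets contain b and c: a violation of IIA.
print(first, second)
```

Under these assumed utilities, swapping a for e changes which of b and c is chosen, which is exactly the kind of choice set effect a single fixed ranking with "pick the best" cannot produce.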
slide-20
SLIDE 20

Query complexity

  • Observe sequence of (choice set, choice) pairs (S, f(S)).
  • How many do we need to observe to report f(S) for (almost) all S?
  • Active vs. passive queries
      • Active: can choose what k-set S to query next, sequentially.
      • Passive: Stream of random k-sets S.
  • Fixed vs. mixed choice functions
      • Fixed: all queries of same ℓ-of-k function.
      • Mixed: mixture of different positions selected, with probabilities (π₁, ..., πₖ).
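
The mixed model can be simulated in a few lines. This is a sketch under assumptions: the function name `mixed_choice`, the utilities in `h`, and the mixture `pi` are all hypothetical, chosen only to illustrate that each query independently draws a position ℓ with probability π_ℓ.

```python
import random

def mixed_choice(S, h, pi, rng):
    """Draw a position ell with probability pi[ell - 1], then select ell-of-k from S."""
    ranked = sorted(S, key=h.get)
    ell = rng.choices(range(1, len(ranked) + 1), weights=pi)[0]
    return ranked[ell - 1]

h = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}  # assumed utilities
pi = (0.1, 0.6, 0.2, 0.1)                     # assumed mixture over positions 1..4
rng = random.Random(0)

trials = 20000
counts = {u: 0 for u in h}
for _ in range(trials):
    counts[mixed_choice(list(h), h, pi, rng)] += 1
freqs = {u: counts[u] / trials for u in h}
print(freqs)  # empirical choice frequencies, close to pi for this single 4-set
```

A fixed ℓ-of-k function is the special case where `pi` puts all its mass on one position.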

slide-21
SLIDE 21

Query complexity, binary choices

  • How does sorting (1-of-2) fit in this query complexity framework?
  • Mixed binary choice functions map to (p,1-p) noisy sorting.

            Fixed                          Mixed
  Active    Sorting from comparisons:      Sorting with noisy comparisons
            O(n log n)                     (Feige et al. 1994): O(n log n)
  Passive   Sorting in one round           ?
            (Alon-Azar 1988):
            O(n log n loglog n)

slide-22
SLIDE 22

Query complexity, k-set choices

  • Sorting results translated to position-selection functions:

            Fixed                          Mixed
  Active    Two-phase algorithm:           Adaptation of two-phase algorithm:
            O(n log n)                     O(n log n)
  Passive   Streaming model:               ?
            O(n^(k-1) log n loglog n)

slide-25
SLIDE 25

Query complexity: active, fixed

  • Phase 1: find “ineligible alternatives” via a discard algorithm
  • Phase 2: Pad a choice set with ineligible alternatives, do binary sort.
  • O(n) queries in discard algorithm, O(n log n) queries to sort.
  • Only recovers order, not orientation: don’t know if “padded sort” is a “max” or a “min”, but this is not needed to recover f(S) for every S.
  • Algorithm doesn’t depend on what position is being selected for.

[Figure: items on a line; the ℓ − 1 item(s) at one end and the k − ℓ item(s) at the other end are the ineligible alternatives; example set with f(S) = b]
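
The two phases can be sketched in Python as follows. This is an illustrative reading of the slide, not the authors' implementation: the oracle `make_oracle`, the function names, and the utilities are assumptions, and the sketch only recovers the order of the eligible items up to reversal.

```python
from functools import cmp_to_key

def make_oracle(utility, ell):
    """Hypothetical choice oracle: returns the ell-th lowest-utility item of a k-set."""
    def f(S):
        return sorted(S, key=utility.get)[ell - 1]
    return f

def discard_phase(items, k, f):
    """Phase 1: query a k-set, discard the chosen item, refill. Uses n - k + 1 queries."""
    pool, eligible = list(items[:k]), []
    remaining = list(items[k:])
    while True:
        chosen = f(pool)
        pool.remove(chosen)
        eligible.append(chosen)
        if not remaining:
            break
        pool.append(remaining.pop(0))
    return pool, eligible  # pool holds the k - 1 items that are never chosen

def padded_sort(eligible, ineligible, f):
    """Phase 2: compare x and y by querying {x, y} padded with k - 2 ineligible items."""
    padding = ineligible[1:]  # always drop the same (arbitrary) ineligible item
    def cmp(x, y):
        # The query returns either always the min or always the max of (x, y).
        return -1 if f([x, y] + padding) == x else 1
    return sorted(eligible, key=cmp_to_key(cmp))

# Assumed instance: 12 items with distinct utilities, unknown to the learner.
utility = {i: float(v) for i, v in enumerate([7, 2, 9, 4, 11, 1, 8, 3, 10, 6, 5, 12])}
k, ell = 4, 2
f = make_oracle(utility, ell)

ineligible, eligible = discard_phase(list(utility), k, f)
order = padded_sort(eligible, ineligible, f)
print(ineligible, order)
```

Neither `discard_phase` nor `padded_sort` reads `ell` or `utility`; they only call `f`, which mirrors the point that the algorithm does not depend on which position is being selected.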

slide-27
SLIDE 27

Query complexity: active, mixed

  • Instead of ℓ-of-k, mixture of positions with probabilities (π₁, ..., πₖ), constant separation.
  • 0: Estimate probabilities of each position by studying a k+1-set closely.
  • 1: Run discard phase O(log n) times, find “max-ineligible alternatives”
  • 2: Can then pad choice set and run a “noisy max” with (max, min, fail) outcomes instead of (max, min) outcomes as in (Feige et al. 1994).
  • O(1) queries estimate probabilities, O(n log n) queries in discard algorithm, O(n log n) queries to sort.
  • Need to book-keep many failure probabilities, but straightforward.

[Figure: example choice set with f(S) = b]

slide-29
SLIDE 29

Query complexity: passive, fixed

  • Passive query model: Poisson process where each k-set enters the stream with equal rate α.
  • See a given k-set in interval [0,T] with probability p_T.
  • How long an interval [0,T] do we need to observe stream?
  • Phase 1: use queries in [0,T₁], with T₁ large enough so that all items except ineligible alternatives are chosen.
  • Phase 2: Simulate pairwise comparisons using queries where k-2 of the elements are ineligible.
  • For Phase 2 to work, need p_T to be O(log n loglog n / n). End up seeing ~log(n)/n fraction of all (n choose k) choice sets.
  • For k≥3, proof only works for positions 1<ℓ<k, not ℓ=1 or ℓ=k, which breaks our analysis (p_T ↛ 0).
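
For reference, the per-set probability and the resulting number of observed choice sets can be written out. This is a sketch that assumes the standard Poisson-process fact that a rate-α stream has at least one arrival in [0,T] with probability 1 − e^{−αT}, and treats k as a constant; the talk itself states only the asymptotic bounds.

```latex
p_T = 1 - e^{-\alpha T},
\qquad
\mathbb{E}\bigl[\#\,\text{distinct } k\text{-sets seen in } [0,T]\bigr]
  = \binom{n}{k}\, p_T
  = O\!\left(n^{k}\right) \cdot O\!\left(\frac{\log n \,\log\log n}{n}\right)
  = O\!\left(n^{k-1} \log n \,\log\log n\right).
```

The last expression agrees with the passive, fixed entry of the query complexity table for k-set choices.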

slide-30
SLIDE 30

Query complexity, k-set choices

  • Sorting results translated to position-selection functions:

            Fixed                          Mixed
  Active    Two-phase algorithm:           No new difficulties:
            O(n log n)                     O(n log n)
  Passive   Streaming model:               ?
            O(n^(k-1) log n loglog n)

  • Immediate questions:
      • Better algo for passive stream; “sorting in one noisy round”; higher-dim comparison functions; distance-comparison.
 


slide-33
SLIDE 33

Distance-comparison-based choice

  • Distance-comparison-based functions are comparison functions on the set of pairwise distances.
  • Distance-comparison vs. comparison functions are quite different.
  • Comparison functions:
      • Cannot express similarity (only order)
  • Distance-comparison functions:
      • Cannot maximize or minimize (distances are all internal to set)

[Figure: points a, b, c; panels “comparison” and “distance comparison”; label “1D median”]
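
One way to picture a distance-comparison rule is the following Python sketch. The rule itself (`minimax_center`, which picks the item whose farthest other item is nearest) is an illustrative assumption, not a definition from the talk; it uses only order comparisons among pairwise distances, and for three distinct points on a line it selects the middle one. Ties between distances are assumed away.

```python
def minimax_center(S, h):
    """Choose the item of S whose farthest other item in S is nearest."""
    def farthest(u):
        # Largest pairwise distance from u to any other item in the choice set.
        return max(abs(h[u] - h[v]) for v in S if v != u)
    return min(S, key=farthest)

h = {"a": 0.0, "b": 1.0, "c": 5.0}                # assumed 1D embedding
shifted = {u: x + 100.0 for u, x in h.items()}    # same distances, different positions

center = minimax_center(["a", "b", "c"], h)
center_shifted = minimax_center(["a", "b", "c"], shifted)
print(center, center_shifted)
```

Because the rule sees only distances internal to the set, translating every point leaves the choice unchanged, which is why a rule of this kind has no way to express "pick the largest".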

slide-34
SLIDE 34

Distance-comparison-based choice

  • Distance-comparison-based functions are comparison functions on the set of pairwise distances.
  • Paper poses many questions about distance-comparison, few answers.
  • Related to open learning questions for:
      • Crowd median algorithm [Heikinheimo-Ukkonen 2013]
      • Stochastic triplet embedding [Van Der Maaten-Weinberger 2012]
      • Crowdsourced clustering [Vinayak-Hassibi 2016]
      • Metric embedding [Schultz-Joachims 2004].

[Figure: points a, b, c]

slide-35
SLIDE 35

Summary

  • Inference for comparison-based functions generally not more difficult than sorting.
  • Active vs. passive, fixed vs. mixed query complexity frameworks.
  • Open questions:
      • Results for high-dim (EBA?), distance-comparison, RUMs.
      • Learning/non-static agents?
  • Other recent work:
      • [Benson et al. WWW’16] “On the relevance of irrelevant alternatives”
      • [Ugander-Ragain, NIPS’16] Markov chain model generalizing BTL/MNL, can violate IIA.
      • [Maystre-Grossglauser ICML’17] For BTL with ~uniform quality, log^5(n) independent Quicksorts recover exact rank for almost all items.
      • [Peysakhovich-Ugander NetEcon’17] Machine learning adaptation of the Simonson-Tversky model for contextual utility.