Theory Behind Discrete Choice (PowerPoint PPT Presentation)
Ravi Kumar, Google, Mountain View, CA
SLIDE 1

TBD

Ravi Kumar, Google, Mountain View, CA

SLIDE 2

Theory Behind Discrete choice

Ravi Kumar, Google, Mountain View, CA (Joint work with Flavio Chierichetti & Andrew Tomkins)

SLIDE 3

Discrete choice

[Figure: a random user facing a three-option slate, with choice distribution 25% / 10% / 65%]

SLIDE 4

Discrete choice

How can we learn the probability distributions governing the choice in a generic slate?

[Figure: a random user facing a two-option slate, with choice distribution 30% / 70%]

SLIDE 5

Discrete choice

[Figure: a random user facing a four-option slate, with choice distribution 45% / 25% / 20% / 10%]

SLIDE 6

Discrete choice

[Figure: the same random user facing a three-option slate, with choice distribution 45% / 25% / 30%]

Quickly learning the winning distributions of the slates is important for applications … but there are exponentially many slates!

SLIDE 7

Theory of discrete choice

Universe = [n] = {1, 2, …, n}
Slates = non-empty subsets of [n]

  • Model. A function f: slate → distribution over slate

Discrete choice models can codify rational behavior: if S and T highly overlap, then f(S) and f(T) may be related

SLIDE 8

Random utility model (RUM) (Marschak 1960)

  • There exists a distribution 𝕰 on user utilities { [n] → ℝ }
  • Each user is D ~ 𝕰 i.i.d. and will choose the highest-utility option in a slate T (i.e., argmax_{t∈T} D(t))
  • Highly overlapping subsets will be related
  • E.g., Pr[j | T] ≥ Pr[j | T ∪ {i}] for j ∈ T and i ∉ T
  • Rational behavior ⟹ the order of utilities determines choice
  • 𝕰 is a distribution on permutations of [n]
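The overlap property above (Pr[j | T] ≥ Pr[j | T ∪ {i}]) can be checked directly on a toy RUM. A minimal sketch, assuming the RUM is written as an explicit distribution over permutations (all names here are hypothetical):

```python
def win_prob(rum, slate, j):
    """Pr[j | slate]: total probability of the permutations (user types)
    that rank j above every other element of the slate."""
    return sum(p for perm, p in rum.items()
               if next(x for x in perm if x in slate) == j)

# A toy RUM on [3] with two user types.
rum = {(1, 2, 3): 0.6, (3, 1, 2): 0.4}
# Adding item 3 to the slate {1, 2} can only lower item 1's win probability:
# both types prefer 1 to 2, but the second type picks 3 when it is offered.
print(win_prob(rum, {1, 2}, 1))     # 1.0
print(win_prob(rum, {1, 2, 3}, 1))  # 0.6
```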
SLIDE 9

Example

[Figure: a random user facing a two-item slate; the distributions 60% / 40% and 40% / 60% are shown]

SLIDE 10

Example

[Figure: the same random user facing a three-item slate; the distributions 60% / 40% and 40% / 60% / 0% are shown]

SLIDE 11

Formulation

Assume a universe [n] and an unknown distribution on the permutations of [n]. Given a slate S ⊆ [n], let D_S(i) for i ∈ S be the probability that a random permutation (i.e., user) prefers i to every other element of S.

SLIDE 12

Learning RUMs

  • Goal. Learn D_S, for all S ⊆ [n]
SLIDE 13

Observations

The type of queries that we allow can significantly change the hardness of the problem. By obtaining O((n/ε)²) random independent permutations (drawn from the unknown distribution), one can approximate each slate's winning distribution to within an ℓ₁-error of ε: given a generic slate, return the winning probabilities induced by a random permutation chosen from the set of samples.
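The sampling scheme above can be sketched as follows (names hypothetical; each sampled permutation plays the role of one observed user):

```python
from collections import Counter

def estimate_winning_dist(sample_perms, slate):
    """Estimate D_S from previously drawn permutations by counting,
    for each element of the slate, how often it beats the rest."""
    wins = Counter(next(x for x in p if x in slate) for p in sample_perms)
    m = len(sample_perms)
    return {i: wins[i] / m for i in slate}
```

With O((n/ε)²) samples, these empirical frequencies are within ℓ₁-error ε of the true D_S for every slate simultaneously.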

SLIDE 14

Is this reasonable?

[Figure: a ranked list (1st, 2nd, 3rd, 4th) with a click on one item]

It is easier to ask/infer the preferred option among those in a slate. The random permutation query is infeasible in many applications.

SLIDE 15

RUM learning

  • We study RUM learning from the oracle perspective
  • The system can propose slates to random users and observe which options they select
  • An algorithm can query (adaptively or non-adaptively) some sequence S₁, S₂, … of slates to obtain their (approximate) winning distributions D_{S₁}(·), D_{S₂}(·), …

SLIDE 16

Oracles for RUMs

Given a slate S:

  • max-sample(S): picks an unknown random permutation π, and returns the highest-ranked element of S in π
  • max-dist(S): returns D_S(i) for all i ∈ S, i.e., the probability that i wins in S under a random permutation
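Assuming the RUM is given as an explicit distribution over permutations, the two oracles can be sketched as (names hypothetical):

```python
import random

def max_sample(perms, probs, slate):
    """max-sample(S): draw a random permutation from the RUM and
    return the element of S it ranks highest."""
    perm = random.choices(perms, weights=probs, k=1)[0]
    # Each permutation lists items from most to least preferred.
    return next(x for x in perm if x in slate)

def max_dist(perms, probs, slate):
    """max-dist(S): return D_S(i) for every i in S, i.e., the total
    probability of the permutations in which i beats the rest of S."""
    d = {i: 0.0 for i in slate}
    for perm, p in zip(perms, probs):
        winner = next(x for x in perm if x in slate)
        d[winner] += p
    return d
```

max-dist is the stronger oracle: one call returns exactly what many max-sample calls only estimate.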

SLIDE 17
A general lower bound

  • Even with the more powerful max-dist oracle, Ω(2^n) queries are needed to learn D_S exactly
  • With o(2^n) queries, there will be some slate where the expected total variation distance is Ω(2^(−3n/2))
  • Fewer queries ⟹ more error

SLIDE 18

What is the hope?

[Figure: user types with frequencies, e.g., 30% of users rank A > B > C > D and 10% rank B > C > A > D]

There are only a few types of users

SLIDE 19

Few user types: Main results

If there are only k types of users, then:

  • Can reconstruct exactly all the D_S's with O(nk) calls to the max-dist oracle
  • Can reconstruct all the D_S's to within ℓ₁-error of ε with poly(n, k, ε) calls to the max-sample oracle

SLIDE 20

Efficient versions of RUMs

  • Few user types
  • Multinomial logits (MNLs)
SLIDE 21

Multinomial logit (MNL) (Bradley & Terry 1952; Luce 1959)

  • Classical special case of RUMs
  • Model. Given a universe U of items and a positive weight a_u for each item u in U. For a subset (slate) S of U, the probability of choosing u in slate S is proportional to a_u:

    Pr[choosing u in S] = a_u / ∑_{v∈S} a_v

SLIDE 22

MNL example

[Figure: six items with weights 2, 3, 4, 1, 2, 5 arranged into a random permutation]

Pick the next item in the permutation at random among the remaining ones, with probability proportional to its weight (e.g., 3/17 for the first pick, then 5/14, then 2/9).

SLIDE 23

1-MNL learning

  • Goal. Learn the weight a_i for each i ∈ [n]
  • Assume for a slate S we get the choice distribution D_S(·) exactly (max-dist oracle)
  • For i = 1, …, n−1, query the MNL using slate {i, n} to get the choice distribution D_{i,n}(·) = (a_i / (a_i + a_n), a_n / (a_i + a_n))

SLIDE 24

A linear system

a_n / (a_1 + a_n) = D_{1,n}(n)
a_n / (a_2 + a_n) = D_{2,n}(n)
…
∑ a_i = 1

Solve the resulting system of linear equations to obtain the weights.
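The system has a closed-form solution: each pairwise equation gives a_i in terms of a_n, and the normalization fixes the scale. A sketch (names hypothetical; pair_dists[i] stands for the exact oracle answer D_{i,n}(n)):

```python
def learn_1mnl(pair_dists, n):
    """Recover MNL weights from the n-1 pairwise queries {i, n}.
    pair_dists[i] = D_{i,n}(n) = a_n / (a_i + a_n)."""
    a = {n: 1.0}               # fix the scale by provisionally setting a_n = 1
    for i in range(1, n):
        d = pair_dists[i]
        a[i] = (1 - d) / d     # rearranging a_n / (a_i + a_n) = d
    total = sum(a.values())    # then renormalize so the weights sum to 1
    return {i: w / total for i, w in a.items()}
```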
SLIDE 25

1-MNL learning

1-MNL can be learnt with O(n) queries and slates of size 2

SLIDE 26

How good are 1-MNLs?

[Figure: items with weights 1, 1, and 4 plus items of small weight ε; choice percentages 50% / 50% and ~50% / ~10% / ~40% / ~50% shown]

SLIDE 27

Weakness of 1-MNLs

1-MNLs are insufficient to capture common settings

SLIDE 28

Mixture of MNLs

  • Modeling distinct populations with a 1-MNL causes the problem
  • Allowing a mixture of populations, each with a population-specific MNL, can solve the problem
  • New items need not cannibalize equally from all other items
  • E.g., a new vegan restaurant affects only vegans
SLIDE 29

2-MNL mixture

2-MNL mixture: Given a universe U of items and positive weights a_u and b_u for each item u in U. For a slate S, the probability of choosing u in S equals

    γ · a_u / ∑_{v∈S} a_v + (1 − γ) · b_u / ∑_{v∈S} b_v

The mixture is uniform when γ = 1/2.
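A one-line sketch of the mixture (names hypothetical): with probability γ the user follows the first MNL, otherwise the second.

```python
def mixture_choice_probs(a, b, slate, gamma=0.5):
    """2-MNL mixture: blend the choice distributions of two MNLs,
    one with weights a and one with weights b."""
    ta = sum(a[u] for u in slate)
    tb = sum(b[u] for u in slate)
    return {u: gamma * a[u] / ta + (1 - gamma) * b[u] / tb for u in slate}
```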

SLIDE 30

Power of MNL mixtures

MNL mixtures can approximate arbitrarily well any RUM (McFadden & Train 2000)

SLIDE 31

The big picture

[Figure: nested classes of choice models: 1-MNLs ⊆ k-MNLs ⊆ RUMs ⊆ all choice models]

SLIDE 32

2-MNL learning

  • Goal. Learn weights a_i, b_i for each i ∈ [n]
  • Assume for a slate S we get the choice distribution D_S(·) exactly
  • Can show 2-slates are not enough to learn
SLIDE 33

2-MNL learning with 3-slates

  • Query the MNL using slates {i, j} and {i, j, k} to get the choice distributions D_{i,j}(·) and D_{i,j,k}(·):

    2 D_{i,j}(i) = a_i/(a_i + a_j) + b_i/(b_i + b_j)
    2 D_{i,j,k}(i) = a_i/(a_i + a_j + a_k) + b_i/(b_i + b_j + b_k)

SLIDE 34

A polynomial system

2 D_{i,j}(i) = a_i/(a_i + a_j) + b_i/(b_i + b_j)
2 D_{i,k}(i) = a_i/(a_i + a_k) + b_i/(b_i + b_k)
2 D_{j,k}(j) = a_j/(a_j + a_k) + b_j/(b_j + b_k)
2 D_{i,j,k}(i) = a_i/(a_i + a_j + a_k) + b_i/(b_i + b_j + b_k)
2 D_{i,j,k}(j) = a_j/(a_i + a_j + a_k) + b_j/(b_i + b_j + b_k)
a_i + a_j + a_k = 1,  b_i + b_j + b_k = 1
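A sketch that builds the queried distributions for a uniform 2-MNL and evaluates the residuals of this system at candidate weights (names hypothetical; solving the system is the hard part the identifiability result addresses):

```python
def uniform_2mnl_dist(a, b, slate):
    """Exact choice distribution of a uniform (gamma = 1/2) 2-MNL."""
    ta = sum(a[u] for u in slate)
    tb = sum(b[u] for u in slate)
    return {u: 0.5 * a[u] / ta + 0.5 * b[u] / tb for u in slate}

def system_residuals(D2, D3, a, b, i, j, k):
    """Residuals of the polynomial system at candidate weights a, b,
    given queried distributions D2[(x, y)] = D_{x,y}(x) and D3 = D_{i,j,k}.
    All residuals vanish exactly when a, b are consistent with the queries."""
    return [
        2 * D2[(i, j)] - (a[i] / (a[i] + a[j]) + b[i] / (b[i] + b[j])),
        2 * D2[(i, k)] - (a[i] / (a[i] + a[k]) + b[i] / (b[i] + b[k])),
        2 * D2[(j, k)] - (a[j] / (a[j] + a[k]) + b[j] / (b[j] + b[k])),
        2 * D3[i] - (a[i] / (a[i] + a[j] + a[k]) + b[i] / (b[i] + b[j] + b[k])),
        2 * D3[j] - (a[j] / (a[i] + a[j] + a[k]) + b[j] / (b[i] + b[j] + b[k])),
        a[i] + a[j] + a[k] - 1,
        b[i] + b[j] + b[k] - 1,
    ]
```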

SLIDE 35

Identifiability

  • Theorem. For any uniform 2-MNL and for any set of 3 elements S = {i, j, k}, the choice distributions of all the subsets of S determine uniquely the weights of i, j, k in each of the two MNLs
  • Proof steps.
  • Partition the solution space into a discrete number of regions
  • Show that at most one region can contain feasible solutions, and give a combinatorial algorithm to determine it
  • Use the structure of the generic region to prove uniqueness
SLIDE 36

Patching the unique solutions

  • Query slates {1, 2, 3}, {1, 4, 5}, {1, 6, 7}, …
  • Find s, t ∈ [n] such that a_s/a_t ≠ b_s/b_t
  • If a_i/a_j = b_i/b_j for all i, j, it is a 1-MNL
  • Query slates {1, s, t}, {2, s, t}, {3, s, t}, …
  • a_i = a_{i,s,t}(i)·a_s / a_{i,s,t}(s); b_i = b_{i,s,t}(i)·b_s / b_{i,s,t}(s), where a_{i,s,t}(·), b_{i,s,t}(·) are the weights recovered from slate {i, s, t}
SLIDE 37

2-MNLs: Main results

  • Theorem. There is an adaptive algorithm performing max-dist queries on O(n) slates of sizes 2 and 3 that reconstructs the weights of any uniform 2-MNL system on n elements
  • Theorem. There is a non-adaptive algorithm performing max-dist queries on O(n²) slates of sizes 2 and 3 that reconstructs the weights of any uniform 2-MNL system on n elements

SLIDE 38

Conclusions

  • We studied a number of algorithmic problems related to discrete choice
  • We believe this class of problems is theoretically important and relevant in practice

SLIDE 39

Some open questions

  • What is the relative power of the max-sample / max-dist oracles?
  • How well can one approximate general mixtures of MNLs with the two oracles?
  • Identifiability of non-uniform 2-MNLs, k-MNLs
  • Distribution testing questions
SLIDE 40

Thank you!

Questions/Comments ravi.k53 @ gmail