  1. TBD. Ravi Kumar, Google, Mountain View, CA

  2. Theory Behind Discrete choice. Ravi Kumar, Google, Mountain View, CA (Joint work with Flavio Chierichetti & Andrew Tomkins)

  3. Discrete choice. [Figure: a random user, presented with a slate of three items, induces a choice distribution, e.g. 25% / 10% / 65%]

  4. Discrete choice. [Figure: a random user choosing from a two-item slate, 30% / 70%] How to learn the probability distributions governing the choice in a generic slate?

  5. Discrete choice. [Figure: a random user's choice distribution over a four-item slate: 45% / 25% / 20% / 10%]

  6. Discrete choice. [Figure: choice distribution 45% / 25% / 30%] Quickly learning the winning distributions of the slates is important for applications … but there are exponentially many slates!

  7. Theory of discrete choice. Universe = [n] = {1, 2, …, n}; slates = non-empty subsets of [n]. Model: a function f: slate → distribution over that slate. Discrete choice models can codify rational behavior: S and T highly overlap ⟹ f(S) and f(T) may be related

  8. Random utility model (RUM) (Marschak 1960) • There exists a distribution 𝒟 on user utilities {[n] → ℝ} • Each user draws U ~ 𝒟 i.i.d. and chooses the highest-utility option in a slate T (i.e., argmax_{t ∈ T} U(t)) • Highly overlapping subsets will be related • E.g., Pr[j | T] ≥ Pr[j | T ∪ {i}] for j ∈ T and i ∉ T • Rational behavior ⟹ the order of utilities determines the choice • So 𝒟 is effectively a distribution on permutations of [n]
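As a sketch of this definition (the permutation distribution below is illustrative, not from the talk), a RUM over n = 3 items can be written down explicitly and its winning probabilities computed by enumeration; the regularity inequality Pr[j | T] ≥ Pr[j | T ∪ {i}] can then be checked numerically:

```python
import itertools
import random

# Minimal sketch: a RUM is a distribution over permutations of [n];
# a user ranks items by a random permutation and picks the
# most-preferred item available in the slate.
n = 3
perms = list(itertools.permutations(range(n)))
random.seed(0)
raw = [random.random() for _ in perms]
total = sum(raw)
dist = {p: w / total for p, w in zip(perms, raw)}  # Pr[user has ranking p]

def win_prob(j, slate):
    """Pr[j wins in slate]: total mass of permutations that rank j
    above every other element of the slate."""
    return sum(pr for perm, pr in dist.items()
               if min(slate, key=perm.index) == j)
```

Adding an option to a slate can only take probability mass away from j, never add to it, which is exactly the regularity bullet on the slide.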

  9. Example. [Figure: a random user choosing from a two-item slate; winning probabilities 60% / 40%]

  10. Example. [Figure: a third item is added to the slate; the winning probabilities become 60% / 40% / 0%]

  11. Formulation. Assume a universe [n] and an unknown distribution on the permutations of [n]. Given a slate S ⊆ [n], let D_S(i), for i ∈ S, be the probability that a random permutation (i.e., user) prefers i to every other element of S

  12. Learning RUMs. Goal: learn D_S for all S ⊆ [n]

  13. Observations. The type of queries we allow can significantly change the hardness of the problem. By obtaining O((n/ε)^2) random independent permutations (drawn from the unknown distribution), one can approximate every slate's winning distribution to within ℓ1-error ε: given a generic slate, return the winning probabilities induced by a permutation chosen at random from the set of samples
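A minimal sketch of this observation (the hidden distribution and sample size are illustrative): store m sampled permutations once, then answer any slate query from the stored sample.

```python
import itertools
import random
from collections import Counter

random.seed(1)
n, m = 4, 20000  # m stands in for the O((n/eps)^2) bound on the slide

perms = list(itertools.permutations(range(n)))
hidden = [random.random() for _ in perms]   # unknown distribution
Z = sum(hidden)

samples = random.choices(perms, weights=hidden, k=m)

def empirical_winning_dist(slate):
    """Approximate D_S: the fraction of sampled users whose
    top-ranked element within the slate is i."""
    wins = Counter(min(slate, key=p.index) for p in samples)
    return {i: wins[i] / m for i in slate}

def true_winning_dist(slate):
    d = {i: 0.0 for i in slate}
    for p, w in zip(perms, hidden):
        d[min(slate, key=p.index)] += w / Z
    return d
```

With m large, the empirical and true winning distributions agree to within a small ℓ1 error on any queried slate.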

  14. Is this reasonable? [Figure: a user clicking the 1st of four ranked results] The random permutation query is infeasible in many applications. It is easier to ask/infer the preferred option among those in a slate

  15. RUM learning • We study RUM learning from the oracle perspective • The system can propose slates to random users and observe which options they select • An algorithm can query (adaptively or non-adaptively) some sequence S_1, S_2, … of slates to obtain their (approximate) winning distributions D_{S_1}(·), D_{S_2}(·), …

  16. Oracles for RUMs. Given a slate S: • max-sample(S): picks an unknown random permutation π and returns the element of S that π ranks highest • max-dist(S): returns D_S(i) for all i ∈ S, i.e., the probability that i wins in S under a random permutation
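The two oracles can be sketched over an explicit distribution on permutations (the distribution itself is illustrative):

```python
import itertools
import random

random.seed(2)
n = 3
perms = list(itertools.permutations(range(n)))
w = [random.random() for _ in perms]  # hidden permutation distribution
Z = sum(w)

def max_sample(slate):
    """max-sample(S): draw a random permutation pi, return the
    element of S that pi ranks highest."""
    pi = random.choices(perms, weights=w)[0]
    return min(slate, key=pi.index)

def max_dist(slate):
    """max-dist(S): the exact winning distribution D_S."""
    d = {i: 0.0 for i in slate}
    for pi, wt in zip(perms, w):
        d[min(slate, key=pi.index)] += wt / Z
    return d
```

max-dist is the stronger primitive: one call reveals the whole distribution that max-sample only exposes one draw at a time.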

  17. A general lower bound • Even with the more powerful max-dist oracle, Ω(2^n) queries are needed to learn the D_S's exactly • With o(2^n) queries, there will be some slate on which the expected total variation distance is Ω(2^(-3n/2)) • Fewer queries ⟹ more error

  18. What is the hope? A > B > C > D (30%), B > C > A > D (10%), … — there are only a few types of users

  19. Few user types: main results. If there are only k types of users, then: • all the D_S's can be reconstructed exactly with O(nk) calls to the max-dist oracle • all the D_S's can be reconstructed to within ℓ1-error ε with poly(n, k, ε) calls to the max-sample oracle

  20. Efficient versions of RUMs • Few user types • Multinomial logits (MNLs)

  21. Multinomial logit (MNL) (Bradley & Terry 1952; Luce 1959) • A classical special case of RUMs. Model: given a universe U of items and a positive weight a_u for each item u ∈ U, the probability of choosing u in a slate S ⊆ U is proportional to a_u: Pr[choosing u in S] = a_u / Σ_{v ∈ S} a_v

  22. MNL example. [Figure: building a random permutation from item weights summing to 17; e.g. first pick an item of weight 3 with probability 3/17, then one of weight 5 with probability 5/14, then one of weight 2 with probability 2/9, …] Pick the next item in the permutation at random among the remaining ones, with probability proportional to its weight
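The sequential process on the slide can be sketched directly (item names and weights are illustrative): repeatedly pick the next item among the remaining ones with probability proportional to its weight.

```python
import random

# Illustrative weights; they sum to 17, so the very first pick is
# item "b" with probability 5/17, "a" with probability 3/17, etc.
WEIGHTS = {"a": 3.0, "b": 5.0, "c": 2.0, "d": 4.0, "e": 3.0}

def mnl_permutation(weights, rng=random):
    """Build a random permutation item by item: the next item is drawn
    among the remaining ones with probability proportional to weight."""
    remaining = dict(weights)
    order = []
    while remaining:
        items = list(remaining)
        pick = rng.choices(items, weights=[remaining[i] for i in items])[0]
        order.append(pick)   # next-most-preferred item
        del remaining[pick]
    return order
```

This is one way of seeing an MNL as a RUM: the process induces a distribution over permutations whose slate-wise winners follow the MNL choice probabilities.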

  23. 1-MNL learning. Goal: learn the weight a_i for each i ∈ [n]. Assume that for a slate S we get the choice distribution D_S(·) exactly (max-dist oracle). For i = 1, …, n−1, query the MNL on the slate {i, n} to get the choice distribution D_{i,n}(·) = (a_i / (a_i + a_n), a_n / (a_i + a_n))

  24. A linear system. a_n / (a_1 + a_n) = D_{1,n}(n); a_n / (a_2 + a_n) = D_{2,n}(n); …; Σ_i a_i = 1. Solve the resulting system of equations to obtain the weights (after cross-multiplying, each oracle equation is linear in the a_i's)
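A minimal sketch of this learner (the ground-truth weights are illustrative and hidden from the learner): each 2-slate query yields the ratio a_i / a_n, so fixing a_n = 1 provisionally and normalizing is one way of solving the system.

```python
# 1-MNL learning with an exact max-dist oracle on 2-slates.
true_a = [0.1, 0.25, 0.05, 0.4, 0.2]  # illustrative hidden weights
n = len(true_a)

def max_dist_pair(i, j):
    """Oracle answer for the slate {i, j} under the hidden 1-MNL."""
    s = true_a[i] + true_a[j]
    return {i: true_a[i] / s, j: true_a[j] / s}

# Each query on {i, n} gives a_i / a_n = D(i) / D(n); fix a_n = 1
# and normalize so the recovered weights sum to 1.
ratios = []
for i in range(n - 1):
    d = max_dist_pair(i, n - 1)
    ratios.append(d[i] / d[n - 1])
ratios.append(1.0)  # a_n relative to itself
Z = sum(ratios)
learned_a = [r / Z for r in ratios]
```

This uses exactly n − 1 queries on slates of size 2, matching the O(n) bound on the next slide.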

  25. 1-MNL learning. A 1-MNL can be learnt with O(n) queries on slates of size 2

  26. How good are 1-MNLs? [Figure: example slates with items of weights 1, ε, 4; the observed splits (~50% / ~50% and ~40% / ~10%) cannot all be matched by a single MNL]

  27. Weakness of 1-MNLs. 1-MNLs are insufficient to capture common settings

  28. Mixture of MNLs • Modeling distinct populations with a single 1-MNL causes the problem • Allowing a mixture of populations, each with a population-specific MNL, can solve it • New items need not cannibalize equally from all other items • A new vegan restaurant affects only vegans

  29. 2-MNL mixture. Given a universe U of items and positive weights a_u and b_u for each item u ∈ U, for a slate S the probability of choosing u in S equals γ · a_u / Σ_{v ∈ S} a_v + (1 − γ) · b_u / Σ_{v ∈ S} b_v. Uniform mixture when γ = 1/2
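The mixture formula is direct to write down (the two weight maps below are illustrative):

```python
def mixture_choice_prob(u, slate, a, b, gamma=0.5):
    """Pr[choose u in slate] under a gamma-mixture of two MNLs with
    weight maps a and b (uniform mixture when gamma = 1/2)."""
    pa = a[u] / sum(a[v] for v in slate)
    pb = b[u] / sum(b[v] for v in slate)
    return gamma * pa + (1 - gamma) * pb

# Two illustrative populations with different tastes.
a = {"x": 1.0, "y": 1.0, "z": 4.0}
b = {"x": 3.0, "y": 1.0, "z": 1.0}
slate = ("x", "y", "z")
probs = {u: mixture_choice_prob(u, slate, a, b) for u in slate}
```

Note that the mixture is generally not itself an MNL: no single weight vector reproduces these probabilities on every slate simultaneously.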

  30. Power of MNL mixtures. MNL mixtures can approximate any RUM arbitrarily well (McFadden & Train 2000)

  31. The big picture. [Figure: nested model classes: 1-MNLs ⊂ k-MNLs ⊂ RUMs ⊂ choice models]

  32. 2-MNL learning • Goal: learn the weights a_i, b_i for each i ∈ [n] • Assume that for a slate S we get the choice distribution D_S(·) exactly • Can show that 2-slates are not enough to learn

  33. 2-MNL learning with 3-slates • Query the MNL using slates {i, j} and {i, j, k} to get the choice distributions D_{i,j}(·) and D_{i,j,k}(·): 2·D_{i,j}(i) = a_i/(a_i + a_j) + b_i/(b_i + b_j); 2·D_{i,j,k}(i) = a_i/(a_i + a_j + a_k) + b_i/(b_i + b_j + b_k)

  34. A polynomial system
  2·D_{i,j}(i) = a_i/(a_i + a_j) + b_i/(b_i + b_j)
  2·D_{i,k}(i) = a_i/(a_i + a_k) + b_i/(b_i + b_k)
  2·D_{j,k}(j) = a_j/(a_j + a_k) + b_j/(b_j + b_k)
  2·D_{i,j,k}(i) = a_i/(a_i + a_j + a_k) + b_i/(b_i + b_j + b_k)
  2·D_{i,j,k}(j) = a_j/(a_i + a_j + a_k) + b_j/(b_i + b_j + b_k)
  a_i + a_j + a_k = 1, b_i + b_j + b_k = 1
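To make the system concrete, the sketch below evaluates its residuals at a known (illustrative) weight assignment: the oracle values D are generated from those weights, so at the true weights every residual vanishes.

```python
# Residuals of the polynomial system for a uniform 2-MNL on {i, j, k},
# with illustrative true weights.
a = {"i": 0.5, "j": 0.3, "k": 0.2}
b = {"i": 0.2, "j": 0.2, "k": 0.6}

def D(u, slate):
    """Choice probability of u in slate under the uniform mixture."""
    return 0.5 * a[u] / sum(a[v] for v in slate) \
         + 0.5 * b[u] / sum(b[v] for v in slate)

S = ("i", "j", "k")
residuals = [
    2 * D("i", ("i", "j")) - a["i"] / (a["i"] + a["j"]) - b["i"] / (b["i"] + b["j"]),
    2 * D("i", ("i", "k")) - a["i"] / (a["i"] + a["k"]) - b["i"] / (b["i"] + b["k"]),
    2 * D("j", ("j", "k")) - a["j"] / (a["j"] + a["k"]) - b["j"] / (b["j"] + b["k"]),
    2 * D("i", S) - a["i"] / sum(a.values()) - b["i"] / sum(b.values()),
    2 * D("j", S) - a["j"] / sum(a.values()) - b["j"] / sum(b.values()),
    sum(a.values()) - 1.0,
    sum(b.values()) - 1.0,
]
```

The learning question is the converse: given only the left-hand sides (the D values), recover a and b; the identifiability theorem on the next slide says the solution is unique.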

  35. Identifiability. Theorem: for any uniform 2-MNL and any set of 3 elements S = {i, j, k}, the choice distributions of all the subsets of S uniquely determine the weights of i, j, k in each of the two MNLs. Proof steps: • Partition the solution space into a discrete number of regions • Show that at most one region can contain feasible solutions, and give a combinatorial algorithm to determine it • Use the structure of the generic region to prove uniqueness

  36. Patching the unique solutions • Query slates {1, 2, 3}, {1, 4, 5}, {1, 6, 7}, … • Find s, t ∈ [n] such that a_s/a_t ≠ b_s/b_t • If a_i/a_j = b_i/b_j for all i, j, it is a 1-MNL • Query slates {1, s, t}, {2, s, t}, {3, s, t}, … • a_i = a_{i,s,t}(i) · a_s / a_{i,s,t}(s); b_i = b_{i,s,t}(i) · b_s / b_{i,s,t}(s), where a_{i,s,t}(·) denotes the weights recovered from the slate {i, s, t}

  37. 2-MNLs: Main results. Theorem: there is an adaptive algorithm performing max-dist queries on O(n) slates of sizes 2 and 3 that reconstructs the weights of any uniform 2-MNL system on n elements. Theorem: there is a non-adaptive algorithm performing max-dist queries on O(n^2) slates of sizes 2 and 3 that reconstructs the weights of any uniform 2-MNL system on n elements

  38. Conclusions • We studied a number of algorithmic problems related to discrete choice • We believe this class of problems is theoretically important and relevant in practice

  39. Some open questions • What is the relative power of the max-sample and max-dist oracles? • How well can one approximate general mixtures of MNLs with the two oracles? • Identifiability of non-uniform 2-MNLs and k-MNLs • Distribution testing questions

  40. Thank you! Questions/Comments ravi.k53 @ gmail
