  1. Bandits Under the Influence
     Silviu Maniu, Stratis Ioannidis, Bogdan Cautis
     Université Paris-Saclay & Northeastern University

  2. Motivation
     Recommender systems: recommending items to users
     • preferences may be unknown or highly dynamic
     • online recommendation systems – re-learn preferences on the go
     • users can be influenced by other users – social influence
     Objective: online recommendation systems that take social influence into account
     • solution framework: sequential learning, multi-armed bandits

  3. Setting – Recommendation
     Set of users [n], receiving suggestions at time steps t ∈ ℕ, each user i having a profile u_i(t) ∈ ℝ^d.
     Recommended item: a d-dimensional vector v ∈ ℝ^d; B is the catalog of recommendable items.
     At each time step t, user i is presented an item v_i(t) and provides a rating r_i(t):
         r_i(t) = ⟨u_i(t), v_i(t)⟩ + ε
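
     As a toy illustration of this rating model (variable names and the noise level are ours, not the paper's), the observed rating is a noisy inner product between the current user profile and the recommended item:

     import numpy as np

     def observed_rating(u_it, v_it, noise_std=0.1, rng=None):
         """Observed rating r_i(t) = <u_i(t), v_i(t)> + eps, eps ~ N(0, noise_std^2)."""
         rng = rng or np.random.default_rng()
         return float(u_it @ v_it) + rng.normal(0.0, noise_std)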

  4. Setting – User Preference Evolution
     Users are embedded in a social network, and their interests evolve over time steps:
         u_i(t) = α u_i^0 + (1 − α) Σ_{j ∈ [n]} P_{i,j} u_j(t − 1),   i ∈ [n]
     • social parameter α ∈ [0, 1]
     • influence network weight P_{i,j} between users i and j
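
     A minimal sketch of one step of this recurrence (names are ours; P is assumed to be an n×n influence matrix and the profiles are stacked row-wise):

     import numpy as np

     def evolve_profiles(U_prev, U0, P, alpha):
         """One evolution step: u_i(t) = alpha * u_i^0 + (1 - alpha) * sum_j P[i, j] * u_j(t-1).
         U_prev, U0: (n, d) matrices of current and initial profiles; P: (n, n) influence matrix."""
         return alpha * U0 + (1.0 - alpha) * P @ U_prev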

  5. Our Contributions
     1. Establish the link between online recommendation and linear bandits
     2. Apply the classic LinREL and Thompson Sampling algorithms from the bandit literature to this non-stationary setting
     3. Study tractable cases for solving the optimizations in each step of the algorithms

  6. Link with Bandits
     We want to minimize the aggregate regret:
         R(T) = Σ_{t=1}^{T} Σ_{i=1}^{n} [ ⟨u_i(t), v_i*(t)⟩ − ⟨u_i(t), v_i(t)⟩ ]
     Bandit setting: we notice that the aggregate reward is a linear function of the matrix of initial user profiles U_0:
     • expected reward r̄(t) = u_0^⊤ L(t) v – a function of the vectorized forms u, v of the user and item matrices and of a matrix L(t) capturing the social evolution
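
     A sketch of how the aggregate regret would be computed offline, assuming oracle access to the best items v_i*(t) (all names are illustrative):

     import numpy as np

     def aggregate_regret(U, V_star, V_chosen):
         """R(T) = sum_t sum_i <u_i(t), v_i*(t)> - <u_i(t), v_i(t)>.
         U, V_star, V_chosen: arrays of shape (T, n, d)."""
         return float(np.sum(U * V_star) - np.sum(U * V_chosen))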

  7. LinREL – Adapting to Recommendations
     LinREL:
     • arms are selected from a vector space, and the expected reward is a linear function of the arm
     • an arm is selected using the Upper Confidence Bound (UCB) principle – a confidence bound on an estimator (sketched below)
     • the unknown model is estimated via a least-squares fit, using either L1 or L2 ellipsoids
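
     As a generic illustration of the UCB principle for a linear reward model (textbook LinUCB-style scoring with assumed names, not the paper's exact selection rule):

     import numpy as np

     def ucb_score(theta_hat, A_inv, x, beta):
         """Optimistic arm score: <theta_hat, x> + sqrt(beta * x^T A_inv x),
         where A_inv is the inverse of the regularized design matrix."""
         return float(theta_hat @ x) + np.sqrt(beta * float(x @ A_inv @ x))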

  8. LinREL – Adapting to Recommendations
     In our case:
     • arms are the items v, modified by L(t) – a non-stationary setting
     • the estimator is least-squares (see the sketch after this slide):
         û_0(t) = argmin_{u ∈ ℝ^{nd}} Σ_{τ=1}^{t−1} ‖ X(V(τ), A(τ)) u − r(τ) ‖_2^2
     • recommendations are selected as the solution of the non-convex optimization
         v(t) = argmax_{v ∈ B^(n)} max_{u ∈ C_t} u^⊤ L(t) v
     • we study the cases C^1, C^2 – ellipsoids in the L1 and L2 norms
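
     A minimal sketch of the least-squares step, assuming the per-step design vectors X(V(τ), A(τ)) have been stacked into one matrix (the small ridge term is our addition for numerical stability):

     import numpy as np

     def estimate_u0(X_rows, r, reg=1e-6):
         """u0_hat = argmin_u sum_tau ||X(V(tau), A(tau)) u - r(tau)||_2^2.
         X_rows: (m, n*d) stacked design matrix; r: (m,) vector of observed ratings."""
         A = X_rows.T @ X_rows + reg * np.eye(X_rows.shape[1])
         return np.linalg.solve(A, X_rows.T @ r)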

  9. LinREL – Regret Theorem
     Assume that, for any 0 < δ < 1:
         β_t = max{ 128 nd ln t · ln(t²/δ), ( (8/3) ln(t²/δ) )² }                      (1)
     Then, for C_t = C_t^2:
         Pr[ ∀T, R(T) ≤ n √( 8 nd β_T T ln(1 + nT/d) ) ] ≥ 1 − δ,                     (2)
     and, for C_t = C_t^1:
         Pr[ ∀T, R(T) ≤ n √(2d) · √( 8 β_T T ln(1 + nT/d) ) ] ≥ 1 − δ.                (3)

  10. LinREL – Computational Issues
     For C^1, the optimization can be solved efficiently for two classes of catalogs:
     • if B is a convex set – a convex optimization problem; we need to solve 2n²d convex problems
     • if B is a finite subset – we can check all |B| items, for a total of 2n²d evaluations
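
     For a finite catalog, the outer maximization can be done by scoring every candidate item; a sketch, where the inner maximization over the confidence set C_t is abstracted into a callable (our simplification):

     import numpy as np

     def best_item_finite_catalog(catalog, L_t, inner_max):
         """argmax over a finite catalog B of  max_{u in C_t} u^T L(t) v.
         catalog: iterable of item vectors v; inner_max(w) returns max_{u in C_t} <u, w>."""
         best_v, best_score = None, -np.inf
         for v in catalog:
             score = inner_max(L_t @ v)
             if score > best_score:
                 best_v, best_score = v, score
         return best_v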

  11. Other Algorithms
     Thompson Sampling
     • Bayesian interpretation: assumes a prior on u_0
     • in each step, samples this vector from the posterior obtained after the feedback has been observed (a sketch follows this slide)
     • computationally efficient
     • Bayesian regret of the same order as for LinREL
     LinUCB
     • similar to LinREL, but does not optimize over an ellipsoid
     • non-convex optimization, inefficient
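
     A sketch of the Thompson Sampling step under a Gaussian prior and Gaussian noise (a standard Bayesian linear-regression posterior; this particular prior choice is our assumption, not necessarily the paper's exact implementation):

     import numpy as np

     def sample_u0_posterior(X_rows, r, prior_var=1.0, noise_var=1.0, rng=None):
         """Draw u_0 from the Gaussian posterior given past design rows X_rows and ratings r."""
         rng = rng or np.random.default_rng()
         dim = X_rows.shape[1]
         precision = np.eye(dim) / prior_var + (X_rows.T @ X_rows) / noise_var
         cov = np.linalg.inv(precision)
         mean = cov @ (X_rows.T @ r) / noise_var
         return rng.multivariate_normal(mean, cov)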

  12. Results on Synthetic Datasets
     [Figure: regret over 100 steps for RandomBandit, LinREL1, Regression, and ThompsonSampling – (a) regret on a finite item set, (b) regret on an L2 ball; n = 100, d = 20, |B| = 1000]
     Synthetic dataset: randomly generated social network, user profiles, and catalog

  13. Results on Real Dataset
     [Figure 1: Flixster regret over 100 steps for RandomBandit, LinREL1, Regression, and ThompsonSampling on a finite item set; n = 206, d = 28, |B| = 100]
     Flixster: filtered dataset
     • 1 049 492 users in a social network of 7 058 819 links
     • 74 240 movies and 8 196 077 reviews
