  1. Bandits Under the Influence
     Silviu Maniu, Stratis Ioannidis, Bogdan Cautis
     Université Paris-Saclay & Northeastern University

  2. Motivation
     Recommender systems: recommending items to users
     • preferences may be unknown or highly dynamic
     • online recommendation systems – re-learn preferences on the go
     • users can be influenced by other users – social influence
     Objective: online recommendation systems that take social influence into account
     • solution framework: sequential learning, multi-armed bandits

  3. Setting – Recommendation
     Set of users [n], receiving suggestions at time steps t ∈ ℕ, each user i having a profile u_i(t) ∈ ℝ^d.
     Recommended item: a d-dimensional vector v ∈ ℝ^d; B is the catalog of recommendable items.
     At each time step t, user i is presented an item v_i(t) and provides a rating r_i(t):
         r_i(t) = ⟨u_i(t), v_i(t)⟩ + ε
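
     As a toy illustration of this rating model (variable names and the noise level are ours, not the paper's), the observed rating is a noisy inner product between the current user profile and the recommended item:

     import numpy as np

     def observed_rating(u_it, v_it, noise_std=0.1, rng=None):
         """Observed rating r_i(t) = <u_i(t), v_i(t)> + eps, eps ~ N(0, noise_std^2)."""
         rng = rng or np.random.default_rng()
         return float(u_it @ v_it) + rng.normal(0.0, noise_std)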

  4. Setting – User Preference Evolution
     Users are embedded in a social network, and their interests evolve over time steps:
         u_i(t) = α u_i^0 + (1 − α) Σ_{j ∈ [n]} P_{i,j} u_j(t − 1),   i ∈ [n]
     • social parameter α ∈ [0, 1]
     • influence network weight P_{i,j} between users i and j
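
     A minimal sketch of one step of this recurrence (names are ours; P is assumed to be an n×n influence matrix and the profiles are stacked row-wise):

     import numpy as np

     def evolve_profiles(U_prev, U0, P, alpha):
         """One evolution step: u_i(t) = alpha * u_i^0 + (1 - alpha) * sum_j P[i, j] * u_j(t-1).
         U_prev, U0: (n, d) matrices of current and initial profiles; P: (n, n) influence matrix."""
         return alpha * U0 + (1.0 - alpha) * P @ U_prev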

  5. Our Contributions
     1. Establish the link between online recommendation and linear bandits
     2. Apply the classic LinREL and Thompson Sampling algorithms from the bandit literature to this non-stationary setting
     3. Study tractable cases for solving the optimizations in each step of the algorithms

  6. Link with Bandits
     We want to minimize the aggregate regret:
         R(T) = Σ_{t=1}^{T} Σ_{i=1}^{n} [ ⟨u_i(t), v_i*(t)⟩ − ⟨u_i(t), v_i(t)⟩ ]
     Bandit setting: we notice that the aggregate reward is a linear function of the matrix of initial user profiles U_0:
     • expected reward r̄(t) = u_0^⊤ L(t) v – a function of the vectorized forms u, v of the user and item matrices and of a matrix L(t) capturing the social evolution
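
     A sketch of how the aggregate regret would be computed offline, assuming oracle access to the best items v_i*(t) (all names are illustrative):

     import numpy as np

     def aggregate_regret(U, V_star, V_chosen):
         """R(T) = sum_t sum_i <u_i(t), v_i*(t)> - <u_i(t), v_i(t)>.
         U, V_star, V_chosen: arrays of shape (T, n, d)."""
         return float(np.sum(U * V_star) - np.sum(U * V_chosen))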

  7. LinREL – Adapting to Recommendations
     LinREL:
     • arms are selected from a vector space, and the expected reward is a linear function of the arm
     • an arm is selected using the Upper Confidence Bound (UCB) principle – a confidence bound on an estimator (sketched below)
     • the unknown model is estimated via a least-squares fit, using either L1 or L2 ellipsoids
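
     As a generic illustration of the UCB principle for a linear reward model (textbook LinUCB-style scoring with assumed names, not the paper's exact selection rule):

     import numpy as np

     def ucb_score(theta_hat, A_inv, x, beta):
         """Optimistic arm score: <theta_hat, x> + sqrt(beta * x^T A_inv x),
         where A_inv is the inverse of the regularized design matrix."""
         return float(theta_hat @ x) + np.sqrt(beta * float(x @ A_inv @ x))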

  8. LinREL – Adapting to Recommendations
     In our case:
     • arms are the items v, modified by L(t) – a non-stationary setting
     • the estimator is least-squares (see the sketch after this slide):
         û_0(t) = argmin_{u ∈ ℝ^{nd}} Σ_{τ=1}^{t−1} ‖ X(V(τ), A(τ)) u − r(τ) ‖_2^2
     • recommendations are selected as the solution of the non-convex optimization
         v(t) = argmax_{v ∈ B^(n)} max_{u ∈ C_t} u^⊤ L(t) v
     • we study the cases C^1, C^2 – ellipsoids in the L1 and L2 norms
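
     A minimal sketch of the least-squares step, assuming the per-step design vectors X(V(τ), A(τ)) have been stacked into one matrix (the small ridge term is our addition for numerical stability):

     import numpy as np

     def estimate_u0(X_rows, r, reg=1e-6):
         """u0_hat = argmin_u sum_tau ||X(V(tau), A(tau)) u - r(tau)||_2^2.
         X_rows: (m, n*d) stacked design matrix; r: (m,) vector of observed ratings."""
         A = X_rows.T @ X_rows + reg * np.eye(X_rows.shape[1])
         return np.linalg.solve(A, X_rows.T @ r)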

  9. LinREL – Regret Theorem
     Assume that, for any 0 < δ < 1:
         β_t = max{ 128 nd ln t · ln(t²/δ), ( (8/3) ln(t²/δ) )² }                      (1)
     Then, for C_t = C_t^2:
         Pr[ ∀T, R(T) ≤ n √( 8 nd β_T T ln(1 + nT/d) ) ] ≥ 1 − δ,                     (2)
     and, for C_t = C_t^1:
         Pr[ ∀T, R(T) ≤ n √(2d) · √( 8 β_T T ln(1 + nT/d) ) ] ≥ 1 − δ.                (3)

  10. LinREL – Computational Issues
     For C^1, the optimization can be solved efficiently for two classes of catalogs:
     • if B is a convex set – a convex optimization problem; we need to solve 2n²d convex problems
     • if B is a finite subset – we can check all |B| items, for a total of 2n²d evaluations
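
     For a finite catalog, the outer maximization can be done by scoring every candidate item; a sketch, where the inner maximization over the confidence set C_t is abstracted into a callable (our simplification):

     import numpy as np

     def best_item_finite_catalog(catalog, L_t, inner_max):
         """argmax over a finite catalog B of  max_{u in C_t} u^T L(t) v.
         catalog: iterable of item vectors v; inner_max(w) returns max_{u in C_t} <u, w>."""
         best_v, best_score = None, -np.inf
         for v in catalog:
             score = inner_max(L_t @ v)
             if score > best_score:
                 best_v, best_score = v, score
         return best_v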

  11. Other Algorithms
     Thompson Sampling
     • Bayesian interpretation: assumes a prior on u_0
     • in each step, samples this vector from the posterior obtained after the feedback has been observed (a sketch follows this slide)
     • computationally efficient
     • Bayesian regret of the same order as for LinREL
     LinUCB
     • similar to LinREL, but does not optimize over an ellipsoid
     • non-convex optimization, inefficient
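
     A sketch of the Thompson Sampling step under a Gaussian prior and Gaussian noise (a standard Bayesian linear-regression posterior; this particular prior choice is our assumption, not necessarily the paper's exact implementation):

     import numpy as np

     def sample_u0_posterior(X_rows, r, prior_var=1.0, noise_var=1.0, rng=None):
         """Draw u_0 from the Gaussian posterior given past design rows X_rows and ratings r."""
         rng = rng or np.random.default_rng()
         dim = X_rows.shape[1]
         precision = np.eye(dim) / prior_var + (X_rows.T @ X_rows) / noise_var
         cov = np.linalg.inv(precision)
         mean = cov @ (X_rows.T @ r) / noise_var
         return rng.multivariate_normal(mean, cov)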

  12. Results on Synthetic Datasets
     [Figure: regret over 100 steps for RandomBandit, LinREL1, Regression, and ThompsonSampling – (a) regret on a finite item set, (b) regret on an L2 ball; n = 100, d = 20, |B| = 1000]
     Synthetic dataset: randomly generated social network, user profiles, and catalog

  13. Results on Real Dataset
     [Figure 1: Flixster regret over 100 steps for RandomBandit, LinREL1, Regression, and ThompsonSampling on a finite item set; n = 206, d = 28, |B| = 100]
     Flixster: filtered dataset
     • 1 049 492 users in a social network of 7 058 819 links
     • 74 240 movies and 8 196 077 reviews
