

SLIDE 1

Bandits Under the Influence

Silviu Maniu, Stratis Ioannidis, Bogdan Cautis

Université Paris-Saclay & Northeastern University

SLIDE 2

Motivation

Recommender systems: recommending items to users

  • preferences may be unknown or highly dynamic
  • online recommendation systems – re-learn preferences on the go
  • users can be influenced by other users – social influence

Objective: online recommendation systems taking into account social influence

  • solution framework: sequential learning, multi-armed bandits

SLIDE 3

Setting – Recommendation

Set of users [n], receiving suggestions at time steps t ∈ ℕ, each user i having a profile u_i(t) ∈ ℝ^d. A recommended item is a d-dimensional vector v ∈ ℝ^d, and B is the catalog of recommendable items. At each time step t, user i is presented an item v_i(t) and provides a rating r_i(t):

$$ r_i(t) = \langle u_i(t), v_i(t) \rangle + \epsilon $$
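To make the rating model concrete, here is a minimal NumPy sketch; the dimension d and noise level sigma are illustrative choices, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20                     # profile/item dimension (illustrative)
sigma = 0.1                # rating noise level (assumption)

u_i = rng.normal(size=d)   # profile of user i at time t
v_i = rng.normal(size=d)   # item recommended to user i at time t

# Rating: inner product of profile and item, plus zero-mean noise
r_i = u_i @ v_i + sigma * rng.normal()
print(r_i)
```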

SLIDE 4

Setting – User Preference Evolution

Users are in a social network, and interests evolve in time steps:

$$ u_i(t) = \alpha u_i^0 + (1 - \alpha) \sum_{j \in [n]} P_{i,j}\, u_j(t-1), \quad i \in [n] $$

  • social parameter α ∈ [0, 1]
  • influence network between users i and j: P_{i,j}
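A minimal sketch of this evolution rule, assuming a row-stochastic influence matrix P and randomly generated profiles (all parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5, 3                  # number of users, profile dimension (illustrative)
alpha = 0.3                  # social parameter in [0, 1]

# Assumption: random row-stochastic influence matrix P
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)

U0 = rng.normal(size=(n, d))  # initial profiles u_i^0, stacked as rows
U = U0.copy()
for t in range(1, 11):
    # u_i(t) = alpha * u_i^0 + (1 - alpha) * sum_j P_ij u_j(t-1)
    U = alpha * U0 + (1 - alpha) * P @ U
```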

SLIDE 5

Our Contributions

  • 1. Establish the link between online recommendation and linear bandits
  • 2. Apply the non-stationary setting to the classic LinREL and Thompson Sampling algorithms from the bandit literature
  • 3. Study tractable cases for solving the optimizations in each step of the algorithms

SLIDE 6

Link with Bandits

Want to minimize the aggregate regret:

$$ R(T) = \sum_{t=1}^{T} \sum_{i=1}^{n} \langle u_i(t), v_i^*(t) \rangle - \langle u_i(t), v_i(t) \rangle $$

Bandit setting: we notice that the aggregate reward is a linear function of the matrix of user profiles U_0:

  • expected reward r̄(t) = u_0^⊤ L(t) v – a function of the vectorized forms u, v of the user and item matrices and of a matrix L(t) capturing the social evolution
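To see the linear structure concretely, here is a sketch that checks one plausible construction of L(t) numerically. Under the evolution rule of Slide 4, the stacked profiles satisfy U(t) = A(t) U_0 with A(t) = α Σ_{s<t} ((1−α)P)^s + ((1−α)P)^t, and with column-stacking vec(·) the aggregate reward ⟨U(t), V⟩_F equals vec(U_0)^⊤ (I_d ⊗ A(t)^⊤) vec(V). The concrete form L(t) = I_d ⊗ A(t)^⊤ is my assumption for illustration, not necessarily the paper's exact convention:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, t = 4, 3, 5
alpha = 0.3

P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)
U0 = rng.normal(size=(n, d))          # initial user profiles (rows)
V = rng.normal(size=(n, d))           # items shown to each user (rows)

# Evolution matrix A(t) such that U(t) = A(t) @ U0
Q = (1 - alpha) * P
A = alpha * sum(np.linalg.matrix_power(Q, s) for s in range(t)) \
    + np.linalg.matrix_power(Q, t)

# Aggregate reward, computed directly ...
Ut = A @ U0
direct = np.sum(Ut * V)               # sum_i <u_i(t), v_i(t)>

# ... and as a linear function of vec(U0): u0^T L(t) v
L = np.kron(np.eye(d), A.T)           # assumption: L(t) = I_d (x) A(t)^T
u0 = U0.flatten(order="F")            # column-stacking vec
v = V.flatten(order="F")
linear = u0 @ L @ v

assert np.isclose(direct, linear)
```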

SLIDE 7

LinREL – Adapting to Recommendations

LinREL:

  • arms are selected from a vector space, and the expected reward is a linear function of the arm
  • to select an arm, we use the Upper Confidence Bound (UCB) principle – a confidence bound on an estimator
  • the unknown model is estimated via a least-squares fit, with either L1 or L2 confidence ellipsoids

SLIDE 8

LinREL – Adapting to Recommendations

In our case:

  • arms are the items v, modified by L(t) – a non-stationary setting
  • the estimator is least-squares:

$$ \hat{u}_0(t) = \arg\min_{u \in \mathbb{R}^{nd}} \sum_{\tau=1}^{t-1} \left\| X(V(\tau), A(\tau))\, u - r(\tau) \right\|_2^2 $$

  • recommendations are selected as the solution to the non-convex optimization

$$ v(t) = \arg\max_{v \in B^{(n)}} \max_{u \in C_t} u^\top L(t)\, v $$

  • we study the cases C¹, C² – ellipsoids in L1 and L2
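A minimal sketch of a LinREL-style loop over a finite catalog, using a ridge-regularized least-squares estimate and an L2 confidence ellipsoid; the single-user simplification, the regularization lam, the confidence radius beta, and the noise level are all illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(3)
d, K, T = 5, 50, 200        # dimension, catalog size, horizon (illustrative)
beta = 2.0                  # confidence radius (assumption; cf. the theorem)
lam = 1.0                   # ridge regularization (assumption)

u_star = rng.normal(size=d)            # unknown model
catalog = rng.normal(size=(K, d))      # finite catalog B

V = lam * np.eye(d)                    # design matrix: sum_t x x^T + lam I
b = np.zeros(d)                        # sum_t r_t x_t

for t in range(T):
    V_inv = np.linalg.inv(V)
    u_hat = V_inv @ b                  # least-squares estimate of u_0
    # UCB score per item: <u_hat, x> + beta * ||x||_{V^{-1}}
    ucb = catalog @ u_hat + beta * np.sqrt(
        np.einsum("kd,de,ke->k", catalog, V_inv, catalog))
    x = catalog[np.argmax(ucb)]        # optimistic arm
    r = x @ u_star + 0.1 * rng.normal()  # observed rating
    V += np.outer(x, x)
    b += r * x
```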

SLIDE 9

LinREL – Regret

Theorem. Assume that, for any 0 < δ < 1:

$$ \beta_t = \max\left\{ 128\, nd \ln t \ln\frac{t^2}{\delta},\ \left( \frac{8}{3} \ln\frac{t^2}{\delta} \right)^2 \right\} \tag{1} $$

Then, for $C_t = C_t^2$:

$$ \Pr\left[ \forall T,\ R(T) \le n \sqrt{ 8 n d\, \beta_T T \ln\left( 1 + \tfrac{n}{d} T \right) } \right] \ge 1 - \delta, \tag{2} $$

and, for $C_t = C_t^1$:

$$ \Pr\left[ \forall T,\ R(T) \le n^2 d \sqrt{ 8 \beta_T T \ln\left( 1 + \tfrac{n}{d} T \right) } \right] \ge 1 - \delta. \tag{3} $$
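For a sense of scale, a tiny sketch computing β_t from (1); the values of t, n, d, δ are illustrative:

```python
import math

def beta(t, n, d, delta):
    # beta_t = max{ 128 nd ln(t) ln(t^2/delta), (8/3 ln(t^2/delta))^2 }
    log_term = math.log(t**2 / delta)
    return max(128 * n * d * math.log(t) * log_term,
               (8 / 3 * log_term) ** 2)

print(beta(t=100, n=100, d=20, delta=0.05))
```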

SLIDE 10

LinREL – Computational Issues

For C¹ the optimization can be solved efficiently for two classes of catalogs:

  • if B is a convex set – a convex optimization problem; we need to solve 2n²d convex problems
  • if B is a finite subset – we can check all |B| items, for a total of 2n²d evaluations
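The reason the L1 case is cheap: maximizing a linear function u ↦ u^⊤c over an L1 ball reduces to checking its 2·(nd) signed vertices, since the maximum is û^⊤c + r‖c‖_∞. A minimal sketch of that inner step, with an illustrative radius r and random vectors:

```python
import numpy as np

rng = np.random.default_rng(4)
nd = 12                       # dimension of the vectorized profile (illustrative)
r = 0.5                       # radius of the L1 confidence ball (assumption)
u_hat = rng.normal(size=nd)   # center of the confidence set
c = rng.normal(size=nd)       # c = L(t) v for one candidate recommendation v

# Vertices of the L1 ball are u_hat +/- r e_k: 2 * nd evaluations
brute = max(u_hat @ c + s * r * c[k] for k in range(nd) for s in (+1, -1))

# Closed form: the best vertex picks the largest |c_k|
closed = u_hat @ c + r * np.abs(c).max()

assert np.isclose(brute, closed)
```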

SLIDE 11

Other Algorithms

Thompson Sampling

  • Bayesian interpretation, assumes a prior on u0
  • in each step, samples this vector from the posterior obtained after the feedback has been observed

  • computationally efficient
  • Bayesian regret of the same order as for LinREL
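A minimal sketch of linear Thompson Sampling with a Gaussian prior and posterior, again in a simplified single-user form; the prior, noise level, and catalog are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
d, K, T = 5, 50, 200
sigma = 0.1                            # rating noise level (assumption)

u_star = rng.normal(size=d)            # unknown u_0
catalog = rng.normal(size=(K, d))      # finite catalog B

# Gaussian prior N(0, I); posterior stays Gaussian under linear observations
prec = np.eye(d)                       # posterior precision
b = np.zeros(d)

for t in range(T):
    cov = np.linalg.inv(prec)
    mean = cov @ b
    u_sample = rng.multivariate_normal(mean, cov)   # sample from posterior
    x = catalog[np.argmax(catalog @ u_sample)]      # act greedily on the sample
    r = x @ u_star + sigma * rng.normal()
    prec += np.outer(x, x) / sigma**2               # Bayesian linear-regression update
    b += r * x / sigma**2
```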

LinUCB

  • similar to LinREL, but does not optimize over an ellipsoid
  • non-convex optimization, inefficient

SLIDE 12

Results on Synthetic Datasets

[Figure: regret vs. step on synthetic data, comparing RandomBandit, LinREL1, Regression, and ThompsonSampling]

(a) Regret, finite set: n = 100, d = 20, |B| = 1000
(b) Regret, L2 ball: n = 100, d = 20

Synthetic dataset: randomly generated social network, user profiles, and catalog

SLIDE 13

Results on Real Dataset

[Figure 1: Flixster regret vs. step, comparing RandomBandit, LinREL1, Regression, and ThompsonSampling; n = 206, d = 28, |B| = 100]

Flixster: filtered dataset

  • 1 049 492 users in a social network of 7 058 819 links
  • 74 240 movies and 8 196 077 reviews
