SLIDE 1

A Unified Contextual Bandit Framework for Long- and Short-Term Recommendations

Maryam Tavakol and Ulf Brefeld {tavakol,brefeld}@leuphana.de

Skopje - Sep 21, 2017

SLIDE 2

Tavakol & Brefeld, Leuphana University Lüneburg

Recommendation

SLIDE 3

Recommendation

SLIDE 4

Personalization

SLIDE 5

Short-Term Zeitgeist

SLIDE 6

Proposed Approach

  • Goal: Combine the long- and short-term interests of users in one unified model

➡ Long-term part + short-term component
✤ s.t.: generality in terms of optimization

  • Framework: Contextual Multi-Armed Bandit (MAB), e.g., LinUCB
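For readers unfamiliar with LinUCB, the following is a minimal sketch of the standard disjoint variant (one ridge-regression model per arm), not the unified model of this talk. The class name `LinUCBArm`, the helper `select_arm`, the `ridge` parameter, and the toy data are illustrative assumptions.

```python
import numpy as np


class LinUCBArm:
    """Per-arm ridge-regression state for disjoint LinUCB (illustrative)."""

    def __init__(self, dim: int, ridge: float = 1.0):
        self.A = ridge * np.eye(dim)  # regularized Gram matrix of seen contexts
        self.b = np.zeros(dim)        # sum of reward-weighted contexts

    def ucb(self, x: np.ndarray, c: float) -> float:
        """Estimated reward plus c times the confidence width."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return float(theta @ x + c * np.sqrt(x @ A_inv @ x))

    def update(self, x: np.ndarray, reward: float) -> None:
        self.A += np.outer(x, x)
        self.b += reward * x


def select_arm(arms, x, c=1.0):
    """Pick the arm with the highest upper confidence bound."""
    return int(np.argmax([arm.ucb(x, c) for arm in arms]))


# Toy round: four arms, one 3-dimensional context, observed reward 1.
rng = np.random.default_rng(0)
arms = [LinUCBArm(dim=3) for _ in range(4)]
x = rng.normal(size=3)
chosen = select_arm(arms, x)
arms[chosen].update(x, reward=1.0)
```

Before any feedback all arms share the same bound, so the first arm is chosen; the exploration weight `c` trades off exploitation against exploration.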

SLIDE 7

Unified Model

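$$
\mathbb{E}[r_{t,a_i} \mid u_j] = \underbrace{\theta_i^\top x_t}_{\text{short-term}} + \underbrace{\beta_j^\top z_{a_i}}_{\text{long-term}} + b_i
$$

  • $u_j$: current user
  • $r_{t,a_i}$: outcome
  • $x_t$: context
  • $z_{a_i}$: item features
  • $\theta_i$: parameters of short-term model
  • $\beta_j$: parameters of long-term model
  • $b_i$: bias term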
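As a small worked example, the expected outcome can be read as the sum of an item-specific score on the context, a user-specific score on the item features, and an item bias. The function name `expected_reward` and all numbers below are made up for illustration.

```python
import numpy as np


def expected_reward(theta_i, x_t, beta_j, z_ai, b_i):
    """Short-term part + long-term part + bias (illustrative helper)."""
    short_term = theta_i @ x_t   # item parameters applied to the context
    long_term = beta_j @ z_ai    # user parameters applied to the item features
    return float(short_term + long_term + b_i)


# Hypothetical parameters and features.
theta_i = np.array([0.5, -1.0])
x_t = np.array([2.0, 1.0])
beta_j = np.array([1.0, 0.0, 2.0])
z_ai = np.array([0.5, 3.0, 0.25])
score = expected_reward(theta_i, x_t, beta_j, z_ai, b_i=0.1)
```

Here the short-term part is 0.0 and the long-term part is 1.0, so `score` is 1.1.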

SLIDE 8

General Optimization

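  • Objective function with arbitrary loss $V(\cdot, r_t)$:

$$
\inf_{\substack{\theta_1,\dots,\theta_n \\ \beta_1,\dots,\beta_m \\ b}} \; \frac{1}{T} \sum_{t=1}^{T} V\big(\theta_t^\top x_t + \beta_t^\top z_t + b_t,\, r_t\big) + \underbrace{\frac{\lambda}{2} \sum_i \|\theta_i\|^2 + \frac{\hat{\mu}}{2} \sum_j \|\beta_j\|^2}_{\text{regularization}}
$$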
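To make the primal objective concrete, here is a hedged sketch that evaluates it for a given set of parameters. It assumes that the subscript t on the parameters selects the item and user active at step t, and it plugs in the squared loss as one possible choice of V. The function names, array layout, and toy data are assumptions for illustration.

```python
import numpy as np


def squared_loss(pred, r):
    """One admissible choice of V; any other loss could be passed instead."""
    return 0.5 * (pred - r) ** 2


def primal_objective(Theta, Beta, b, X, Z, items, users, r,
                     lam, mu_hat, loss=squared_loss):
    """Average loss plus L2 regularization of all theta_i and beta_j.

    Theta: (n_items, d_x), Beta: (n_users, d_z), b: (n_items,)
    X: (T, d_x) contexts, Z: (T, d_z) item features
    items, users: (T,) integer ids of the item and user at each step
    """
    preds = (np.sum(Theta[items] * X, axis=1)
             + np.sum(Beta[users] * Z, axis=1)
             + b[items])
    data_term = np.mean(loss(preds, r))
    reg = 0.5 * lam * np.sum(Theta ** 2) + 0.5 * mu_hat * np.sum(Beta ** 2)
    return float(data_term + reg)


# Toy data: two items, one user, two time steps, perfectly fitted rewards.
Theta = np.array([[1.0, 0.0], [0.0, 1.0]])
Beta = np.array([[1.0]])
b = np.zeros(2)
X = np.array([[1.0, 2.0], [3.0, 4.0]])
Z = np.array([[1.0], [1.0]])
items = np.array([0, 1])
users = np.array([0, 0])
r = np.array([2.0, 5.0])
value = primal_objective(Theta, Beta, b, X, Z, items, users, r,
                         lam=0.2, mu_hat=0.4)
```

Because the toy predictions match the rewards exactly, only the regularizer contributes and `value` is 0.4.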

SLIDE 9

General Optimization

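  • Using the Fenchel-Legendre conjugate of the loss function in the dual space:

$$
\sup_{\alpha,\; \mathbf{1}^\top \alpha = 0} \; -C \sum_{t=1}^{T} V^*\!\Big(-\frac{\alpha_t}{C},\, r_t\Big) - \frac{1}{2}\, \alpha^\top \Big[ \Big(\sum_i \delta_i \otimes \delta_i^\top\Big) \circ XX^\top + \frac{1}{\mu} \Big(\sum_i \phi_i \otimes \phi_i^\top\Big) \circ ZZ^\top \Big] \alpha
$$

  • Kernel trick: the inner products $XX^\top$ and $ZZ^\top$ can be replaced by kernel matrices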
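The matrix inside the quadratic term of the dual can be sketched as follows. This rests on two assumptions that the slide text does not spell out: the δ and φ vectors are read as indicator vectors marking the time steps that belong to one item or one user, and the masks are combined with the Gram matrices by an elementwise (Hadamard) product. The function name `combined_kernel` and the toy data are illustrative.

```python
import numpy as np


def combined_kernel(X, Z, items, users, mu):
    """Masked Gram matrices of contexts and item features (assumed form).

    X: (T, d_x) contexts, Z: (T, d_z) item features
    items, users: (T,) integer ids of the item and user at each step
    """
    # Sum of outer products of indicator vectors = "same item" / "same user" masks.
    same_item = (items[:, None] == items[None, :]).astype(float)
    same_user = (users[:, None] == users[None, :]).astype(float)
    # A kernel matrix could be substituted for X @ X.T or Z @ Z.T here.
    return same_item * (X @ X.T) + (same_user * (Z @ Z.T)) / mu


# Toy data: three time steps, two items, two users.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Z = np.array([[1.0], [2.0], [3.0]])
items = np.array([0, 1, 0])
users = np.array([0, 0, 1])
K = combined_kernel(X, Z, items, users, mu=2.0)
```

Under these assumptions the result stays symmetric and positive semidefinite, since an elementwise product of positive semidefinite matrices is again positive semidefinite.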

SLIDE 10

Optimization

  • Gradient-based approaches (in dual or primal)
  • Calculating the gradient depends on the loss function
  • Model parameters, $(\theta_i, \beta_j)$, are obtained from $\alpha$
  • Kernel functions applicable

SLIDE 11

Algorithm

SLIDE 19

Instantiation: Squared Loss

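  • Conjugate of the loss function:

$$
V^*\!\Big(-\frac{\alpha_t}{C},\, r_t\Big) = \frac{1}{2C^2}\,\alpha_t^2 - \frac{1}{C}\,\alpha_t r_t
$$

  • Becomes a standard quadratic optimization with a constraint
  • Confidence bound:

$$
c \sqrt{x_t^\top (X^\top X)^{-1} x_t + z_t^\top (Z^\top Z)^{-1} z_t}
$$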
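The squared-loss confidence bound can be computed directly from the design matrices. The sketch below is illustrative: the function name `confidence_bound_squared` is hypothetical, and the small `ridge` term is an added assumption to keep the Gram matrices invertible; it is not part of the formula on the slide.

```python
import numpy as np


def confidence_bound_squared(x_t, z_t, X, Z, c=1.0, ridge=1e-8):
    """c * sqrt(x^T (X^T X)^-1 x + z^T (Z^T Z)^-1 z), with a tiny ridge."""
    gram_x = X.T @ X + ridge * np.eye(X.shape[1])
    gram_z = Z.T @ Z + ridge * np.eye(Z.shape[1])
    # solve() avoids forming the explicit inverse.
    quad_x = x_t @ np.linalg.solve(gram_x, x_t)
    quad_z = z_t @ np.linalg.solve(gram_z, z_t)
    return float(c * np.sqrt(quad_x + quad_z))


# Toy data: X^T X = 4 I and Z^T Z = I.
X = 2.0 * np.eye(2)
Z = np.eye(3)
bound = confidence_bound_squared(np.array([2.0, 0.0]),
                                 np.array([0.0, 3.0, 0.0]), X, Z)
```

In this toy case the two quadratic forms are approximately 1 and 9, so the bound is close to the square root of 10.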

SLIDE 20

Instantiation: Logistic Loss

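  • Conjugate of the loss function:

$$
V^*\!\Big(-\frac{\alpha_t}{C},\, r_t\Big) = \Big(1 - \frac{\alpha_t}{C r_t}\Big) \log\Big(1 - \frac{\alpha_t}{C r_t}\Big) + \frac{\alpha_t}{C r_t} \log\Big(\frac{\alpha_t}{C r_t}\Big)
$$

  • Confidence bound:

$$
c \sqrt{x_t^\top (X^\top V_a X)^{-1} x_t + z_t^\top (Z^\top V_u Z)^{-1} z_t}
$$

where $V_a$ and $V_u$ are diagonal matrices of the sigmoid model.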
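A possible reading of the logistic confidence bound is sketched below. The slide only says that the weighting matrices are diagonal matrices of the sigmoid model; taking their entries as σ(s)(1 − σ(s)), the usual logistic-regression weights, is an assumption here. The function names, the `ridge` term, and the toy data are also illustrative.

```python
import numpy as np


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


def confidence_bound_logistic(x_t, z_t, X, Z, scores_item, scores_user,
                              c=1.0, ridge=1e-8):
    """Weighted variant of the squared-loss bound (assumed weight form).

    scores_item, scores_user: model scores for the rows of X and Z.
    """
    p_a, p_u = sigmoid(scores_item), sigmoid(scores_user)
    v_a = p_a * (1.0 - p_a)  # assumed diagonal of V_a
    v_u = p_u * (1.0 - p_u)  # assumed diagonal of V_u
    gram_x = X.T @ (v_a[:, None] * X) + ridge * np.eye(X.shape[1])
    gram_z = Z.T @ (v_u[:, None] * Z) + ridge * np.eye(Z.shape[1])
    quad_x = x_t @ np.linalg.solve(gram_x, x_t)
    quad_z = z_t @ np.linalg.solve(gram_z, z_t)
    return float(c * np.sqrt(quad_x + quad_z))


# Toy data: zero scores give weights of 0.25 on every row.
X = 2.0 * np.eye(2)
Z = np.eye(3)
bound = confidence_bound_logistic(np.array([2.0, 0.0]),
                                  np.array([0.0, 1.0, 0.0]),
                                  X, Z, np.zeros(2), np.zeros(3))
```

With all weights equal to 0.25, both quadratic forms are approximately 4, so the bound is close to the square root of 8.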

SLIDE 21

Model Simplification

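  • Focus on the item model
  • Short-Term: $\mathbb{E}[r_{t,a_i}] = \theta_i^\top x_t$
  • Short-Term+Average: $\mathbb{E}[r_{t,a_i}] = \theta_i^\top x_t + \beta^\top z_{a_i}$
  • Focus on the user model
  • Long-Term: $\mathbb{E}[r_{t,a_i} \mid u_j] = \beta_j^\top z_{a_i}$
  • Long-Term+Average: $\mathbb{E}[r_{t,a_i} \mid u_j] = \beta_j^\top z_{a_i} + \theta^\top z_{a_i}$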

SLIDE 22

Empirical Study

  • Using squared loss function
  • Dataset: User transactions from Zalando*
  • Baseline: Matrix Factorization (MF)
  • Performance measure: normalized average rank

*www.zalando.com

SLIDE 23

No New User/Item

  • The combined approach outperforms both the short-term and the long-term models, but not the baseline!

SLIDE 24

Cold Start Scenarios

  • Robustness of the combined model in case of a new user/item: it generalizes well in both cases

SLIDE 25

Time Complexity

  • The optimization time of the combined model is exponential

SLIDE 26

Short-Term Models

  • The average term compensates for the new items

SLIDE 27

Long-Term Models

  • The average term compensates for the new users

SLIDE 28

Conclusion

  • The short- and long-term interests of users are combined in one model
  • Free choice of loss function and model complexity
  • There is not one best model: the choice depends on the application

SLIDE 29

Questions?

Thanks for your attention

A Unified Contextual Bandit Framework for Long- and Short-Term Recommendations

Maryam Tavakol & Ulf Brefeld {tavakol,brefeld}@leuphana.de

Source code available at https://github.com/marytavakol/Bandits
