A Unified Contextual Bandit Framework for Long- and Short-Term Recommendations
Maryam Tavakol and Ulf Brefeld {tavakol,brefeld}@leuphana.de
Skopje - Sep 21, 2017
Tavakol & Brefeld, Leuphana University Lüneburg
➡ Long-term part + Short-term component ✤ s.t.: Generality in terms of optimization
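\[
\mathbb{E}[r_{t,a_i} \mid u_j] = \underbrace{\theta_i^\top x_t}_{\text{short-term}} + \underbrace{\beta_j^\top z_{a_i}}_{\text{long-term}} + b_i
\]
where $u_j$ is the current user, $x_t$ the context, $z_{a_i}$ the item features, $\theta_i$ the parameters of the short-term model, $\beta_j$ the parameters of the long-term model, and $b_i$ the bias term.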
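As a rough illustration of the reward model on this slide, the expected reward can be computed as the sum of a short-term score, a long-term score, and a bias. The function name, argument names, and toy numbers below are assumptions made for this sketch and do not come from the authors' code.

```python
import numpy as np

def expected_reward(theta_i, x_t, beta_j, z_ai, b_i):
    """Sketch of E[r | u_j] = theta_i^T x_t + beta_j^T z_ai + b_i (names are hypothetical)."""
    short_term = float(theta_i @ x_t)  # item-specific weights applied to the current context
    long_term = float(beta_j @ z_ai)   # user-specific weights applied to the item features
    return short_term + long_term + b_i

# Toy values, chosen only to make the arithmetic easy to follow.
theta_i = np.array([0.5, -0.2])
x_t = np.array([1.0, 2.0])
beta_j = np.array([0.1, 0.3, 0.0])
z_ai = np.array([1.0, 1.0, 4.0])
score = expected_reward(theta_i, x_t, beta_j, z_ai, b_i=0.05)
```

Setting `beta_j` to zero leaves only the short-term part, and setting `theta_i` to zero leaves only the long-term part.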
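\[
\inf_{\theta_1,\dots,\theta_n,\; \beta_1,\dots,\beta_m,\; b} \;\; \frac{1}{T}\sum_{t=1}^{T} V\!\left(\theta_t^\top x_t + \beta_t^\top z_t + b_t,\; r_t\right) + \underbrace{\frac{\lambda}{2}\sum_{i}\|\theta_i\|^2 + \frac{\hat{\mu}}{2}\sum_{j}\|\beta_j\|^2}_{\text{Regularization}}
\]
$V(\cdot, r_t)$ is the loss term.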
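To make the objective concrete, here is a minimal sketch that evaluates it for given parameters. It assumes a squared loss for $V$ and assumes that $\theta_t$, $\beta_t$, $b_t$ denote the parameters of the item shown and the user active at round $t$. All names (`primal_objective`, `arms`, `users`, `lam`, `mu_hat`) are hypothetical.

```python
import numpy as np

def primal_objective(theta, beta, b, X, Z, arms, users, r, lam, mu_hat):
    """Average loss plus L2 regularisation (sketch; squared loss is an assumption).

    theta: (n_items, d_x), beta: (n_users, d_z), b: (n_items,)
    X: (T, d_x) contexts, Z: (T, d_z) item features, r: (T,) rewards
    arms[t], users[t]: index of the item shown and of the user at round t
    """
    scores = (
        np.einsum("td,td->t", theta[arms], X)    # theta_t^T x_t
        + np.einsum("td,td->t", beta[users], Z)  # beta_t^T z_t
        + b[arms]                                # b_t
    )
    loss = 0.5 * (scores - r) ** 2               # assumed V(score, r)
    reg = 0.5 * lam * np.sum(theta ** 2) + 0.5 * mu_hat * np.sum(beta ** 2)
    return loss.mean() + reg

# Toy problem: one item, one user, two rounds, perfectly fitted rewards.
theta = np.array([[1.0, 0.0]])
beta = np.array([[1.0]])
b = np.array([0.0])
X = np.array([[2.0, 0.0], [0.0, 1.0]])
Z = np.array([[1.0], [1.0]])
arms = np.array([0, 0])
users = np.array([0, 0])
r = np.array([3.0, 1.0])
value = primal_objective(theta, beta, b, X, Z, arms, users, r, lam=0.2, mu_hat=0.4)
```

In this toy case the loss vanishes, so the value consists of the two regularisation terms only.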
function in the dual space:
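\[
\sup_{\alpha,\; \mathbf{1}^\top \alpha = 0} \;\; -C \sum_{t=1}^{T} V^*\!\left(-\frac{\alpha_t}{C},\; r_t\right) - \frac{1}{2}\,\alpha^\top \Big[ \Big(\sum_{i} \delta_i \otimes \delta_i^\top\Big) \circ XX^\top + \frac{1}{\mu}\Big(\sum_{i} \phi_i \otimes \phi_i^\top\Big) \circ ZZ^\top \Big]\,\alpha
\]
Kernel trick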
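The matrix inside the quadratic term of the dual can be sketched as follows. This reading rests on two assumptions that the slide text does not confirm: $\delta_i$ and $\phi_i$ are taken to be indicator vectors over rounds (marking which rounds involve a given item or user), and the sums of outer products are combined element-wise with the Gram matrices $XX^\top$ and $ZZ^\top$. The names `dual_kernel`, `arms`, and `users` are hypothetical.

```python
import numpy as np

def dual_kernel(X, Z, arms, users, mu):
    """Sketch of the T x T matrix in the dual's quadratic term (assumptions in the lead-in)."""
    # Sum of outer products of indicator vectors = 1 where two rounds share the same item/user.
    same_arm = (arms[:, None] == arms[None, :]).astype(float)
    same_user = (users[:, None] == users[None, :]).astype(float)
    # Data enter only through inner products, so X @ X.T and Z @ Z.T
    # could be swapped for any other kernel matrix.
    return same_arm * (X @ X.T) + (same_user * (Z @ Z.T)) / mu

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Z = np.array([[1.0], [2.0], [3.0]])
arms = np.array([0, 1, 0])
users = np.array([0, 0, 1])
K = dual_kernel(X, Z, arms, users, mu=2.0)
```

Because the features appear only inside the two Gram matrices, replacing them with kernel evaluations is what makes the kernel trick applicable here.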
$(\theta_i, \beta_j)$
$\alpha$
constraint
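\[
\sqrt{x_t^\top (X^\top X)^{-1} x_t + z_t^\top (Z^\top Z)^{-1} z_t}
\]
\[
V^*\!\left(-\frac{\alpha_t}{C},\; r_t\right) = \frac{1}{2C^2}\,\alpha_t^2 - \frac{1}{C}\,\alpha_t r_t
\]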
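The conjugate above is what one obtains for a squared loss. As a short worked derivation, assuming $V(y, r) = \tfrac{1}{2}(y - r)^2$ (the slide text does not state the loss explicitly, so this choice is an assumption):

```latex
\begin{align*}
V^*(s, r) &= \sup_{y}\; \Big[\, s\,y - \tfrac{1}{2}(y - r)^2 \,\Big]
  && \text{maximised at } y = r + s \\
          &= s\,(r + s) - \tfrac{1}{2}s^2
           = \tfrac{1}{2}s^2 + s\,r, \\
V^*\!\Big(-\frac{\alpha_t}{C},\, r_t\Big) &= \frac{1}{2C^2}\,\alpha_t^2 - \frac{1}{C}\,\alpha_t r_t
  && \text{with } s = -\frac{\alpha_t}{C}.
\end{align*}
```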
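\[
V^*\!\left(-\frac{\alpha_t}{C},\; r_t\right) = \left(1 - \frac{\alpha_t}{C r_t}\right)\log\!\left(1 - \frac{\alpha_t}{C r_t}\right) + \frac{\alpha_t}{C r_t}\,\log\!\left(\frac{\alpha_t}{C r_t}\right)
\]
\[
c\,\sqrt{x_t^\top (X^\top V_a X)^{-1} x_t + z_t^\top (Z^\top V_u Z)^{-1} z_t}
\]
$V_a$, $V_u$: diagonal matrices of the sigmoid model.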
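The confidence term can be sketched in a few lines. The function below is a hypothetical helper, not the authors' implementation: `w_a` and `w_u` hold the diagonal entries of the two weight matrices (all ones gives the unweighted form), and the small `ridge` term is an assumption added here only to keep the matrices invertible.

```python
import numpy as np

def confidence_width(x_t, z_t, X, Z, w_a=None, w_u=None, c=1.0, ridge=1e-8):
    """c * sqrt(x^T (X^T V_a X)^{-1} x + z^T (Z^T V_u Z)^{-1} z), as a sketch."""
    w_a = np.ones(len(X)) if w_a is None else np.asarray(w_a, dtype=float)
    w_u = np.ones(len(Z)) if w_u is None else np.asarray(w_u, dtype=float)
    # X^T diag(w) X without forming the diagonal matrix explicitly.
    A = X.T @ (w_a[:, None] * X) + ridge * np.eye(X.shape[1])
    B = Z.T @ (w_u[:, None] * Z) + ridge * np.eye(Z.shape[1])
    # Solve linear systems instead of computing explicit inverses.
    quad = x_t @ np.linalg.solve(A, x_t) + z_t @ np.linalg.solve(B, z_t)
    return c * np.sqrt(quad)

X = np.eye(2)
Z = np.array([[2.0]])
x_t = np.array([3.0, 4.0])
z_t = np.array([2.0])
plain = confidence_width(x_t, z_t, X, Z, ridge=0.0)
weighted = confidence_width(x_t, z_t, X, Z, w_a=[0.25, 0.25], ridge=0.0)
```

Smaller weights shrink the weighted Gram matrix and therefore widen the bound, which is visible when comparing `plain` and `weighted`.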
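\[
\mathbb{E}[r_{t,a_i}] = \theta_i^\top x_t
\]
\[
\mathbb{E}[r_{t,a_i}] = \theta_i^\top x_t + \beta^\top z_{a_i}
\]
\[
\mathbb{E}[r_{t,a_i} \mid u_j] = \beta_j^\top z_{a_i}
\]
\[
\mathbb{E}[r_{t,a_i} \mid u_j] = \beta_j^\top z_{a_i} + \theta^\top z_{a_i}
\]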
*www.zalando.com
long-term models, but not the baseline!
generalizes well for both cases
combined in one model
Thanks for your attention
Source code available at https://github.com/marytavakol/Bandits