Showing Relevant Ads via Context Multi-Armed Bandits
Dávid Pál, December 17, 2008, A&C Seminar. Joint work with Tyler Lu and Martin Pál.
The Problem: we're running a popular website, users visit our website, we want to …
Context-Free Multi-Armed Bandits
2002
$|\mu(x, y) - \mu(x', y')| \le L_X(x, x') + L_Y(y, y')$, where $L_X$ and $L_Y$ are metrics
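The Lipschitz condition above can be checked numerically on a finite grid. The sketch below is only an illustration: the payoff function `mu`, the choice X = Y = [0, 1], and the absolute-value metrics `L_X` and `L_Y` are all assumptions made for this example, not taken from the talk.

```python
import itertools

def mu(x, y):
    # Hypothetical click-through rate on X = Y = [0, 1] (assumed for illustration).
    return 0.5 - 0.25 * abs(x - y)

def L_X(x, x2):
    # Assumed metric on X: absolute difference.
    return abs(x - x2)

def L_Y(y, y2):
    # Assumed metric on Y: absolute difference.
    return abs(y - y2)

def is_lipschitz(mu, L_X, L_Y, xs, ys, tol=1e-12):
    """Check |mu(x,y) - mu(x',y')| <= L_X(x,x') + L_Y(y,y') for all grid points."""
    for x, x2 in itertools.product(xs, repeat=2):
        for y, y2 in itertools.product(ys, repeat=2):
            if abs(mu(x, y) - mu(x2, y2)) > L_X(x, x2) + L_Y(y, y2) + tol:
                return False
    return True

grid = [i / 10 for i in range(11)]
print(is_lipschitz(mu, L_X, L_Y, grid, grid))
```

A grid check can only refute the condition, never prove it for the whole space; it is meant as a sanity test for a candidate payoff model.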
$x_1, x_2, \dots, x_T$
$\hat\mu_1, \hat\mu_2, \dots, \hat\mu_{t-1} \in \{0, 1\}$
$y^*_t = \operatorname{argmax}_{y \in Y} \mu(x_t, y)$
$\mathrm{Regret}(T) = \sum_{t=1}^{T} \mu(x_t, y^*_t) - \mathbf{E}\left[\sum_{t=1}^{T} \mu(x_t, y_t)\right]$
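To make the regret definition concrete, here is a minimal sketch that evaluates it for one fixed sequence of displayed ads, so the expectation is trivial. The contexts, ads, and click-through rates in `RATES` are made-up values for illustration only.

```python
# Hypothetical click-through rates mu(x, y); all values are assumptions.
RATES = {
    ("sports", "shoes"): 0.30,
    ("sports", "books"): 0.10,
    ("news", "shoes"): 0.05,
    ("news", "books"): 0.20,
}
ADS = ["shoes", "books"]

def mu(x, y):
    return RATES[(x, y)]

def regret(mu, contexts, chosen, ads):
    """Sum over rounds of mu(x_t, y*_t) - mu(x_t, y_t) for a fixed choice sequence."""
    total = 0.0
    for x, y in zip(contexts, chosen):
        best = max(mu(x, a) for a in ads)  # payoff of the optimal ad y*_t
        total += best - mu(x, y)
    return total

contexts = ["sports", "news", "sports"]
chosen = ["books", "books", "shoes"]  # only the first round is suboptimal
print(regret(mu, contexts, chosen, ADS))
```

For a randomized algorithm, one would average this quantity over many runs to approximate the expectation.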
$\lim_{T \to \infty} \frac{\mathrm{Regret}(T)}{T} = 0$
$\mathrm{Regret}(T) = O(T^{\gamma})$ where $0 < \gamma < 1$.
(Oversimplifying and lying somewhat.)
If X has “dimension” a and Y has “dimension” b, then
$\mathrm{Regret}(T) = O\left(T^{\frac{a+b+1}{a+b+2}}\right)$
$\mathrm{Regret}(T) = \Omega\left(T^{\frac{a+b+1}{a+b+2}}\right)$
$\epsilon = T^{-\frac{1}{a+b+2}}$
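The exponent (a+b+1)/(a+b+2) and the scale ε = T^(-1/(a+b+2)) are easy to tabulate. The sketch below only evaluates the two formulas; the function names and the sample values of a, b, and T are chosen here for illustration.

```python
def regret_exponent(a: float, b: float) -> float:
    """Exponent gamma in Regret(T) = O(T^gamma), gamma = (a+b+1)/(a+b+2)."""
    return (a + b + 1) / (a + b + 2)

def discretization_scale(T: int, a: float, b: float) -> float:
    """Scale epsilon = T^(-1/(a+b+2))."""
    return T ** (-1.0 / (a + b + 2))

for a, b in [(0, 0), (1, 1), (2, 3)]:
    gamma = regret_exponent(a, b)
    eps = discretization_scale(10_000, a, b)
    print(f"a={a} b={b} gamma={gamma:.3f} eps={eps:.4f}")
```

With a = b = 0 the formula gives exponent 1/2, and the exponent approaches 1 as a + b grows, so higher-dimensional spaces give weaker guarantees.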
$n(x_0, y_0)$
$m(x_0, y_0)$
$\bar\mu(x_0, y_0) = \frac{m(x_0, y_0)}{n(x_0, y_0)}$
$\bar\mu(x_0, y_0) + \sqrt{\frac{\cdots}{1 + n(x_0, y_0)}}$ (exploration vs. exploitation trade-off)
[Figure: points $x_0$ and $x_t$, scale $\epsilon$]
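$R_t(x_0, y_0) = \sqrt{\frac{\cdots}{1 + n(x_0, y_0)}}$
$I_t(x_0, y_0) = \bar\mu(x_0, y_0) + R_t(x_0, y_0)$, where $\bar\mu(x_0, y_0) = m(x_0, y_0) / n(x_0, y_0)$ is the empirical average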
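The index I_t = (empirical average) + R_t can be sketched as a small bookkeeping class. Several things here are assumptions rather than content of the slides: n is read as the number of displays and m as the number of clicks for the pair (x0, y0); the numerator of the radius, which did not survive in the slides, is taken to be 4 ln T; an undisplayed pair gets empirical average 0; and the rule "display the ad with the largest index" is the usual upper-confidence-bound choice. The class and method names are hypothetical.

```python
import math
from collections import defaultdict

class IndexTable:
    def __init__(self, T, numerator=None):
        # ASSUMPTION: radius numerator 4*ln(T); the slides do not show it.
        self.c = 4.0 * math.log(T) if numerator is None else numerator
        self.n = defaultdict(int)  # assumed: times y0 was displayed for x0
        self.m = defaultdict(int)  # assumed: clicks received for (x0, y0)

    def update(self, x0, y0, clicked):
        self.n[(x0, y0)] += 1
        self.m[(x0, y0)] += int(clicked)

    def mean(self, x0, y0):
        n = self.n[(x0, y0)]
        # ASSUMPTION: empirical average of an undisplayed pair is 0.
        return self.m[(x0, y0)] / n if n else 0.0

    def radius(self, x0, y0):
        # R_t = sqrt(c / (1 + n)): shrinks as the pair is displayed more often.
        return math.sqrt(self.c / (1 + self.n[(x0, y0)]))

    def index(self, x0, y0):
        return self.mean(x0, y0) + self.radius(x0, y0)

    def choose(self, x0, ads):
        # Exploration vs. exploitation: rarely shown ads have a large radius.
        return max(ads, key=lambda y0: self.index(x0, y0))

table = IndexTable(T=100)
table.update("x0", "ad_a", clicked=True)
print(table.choose("x0", ["ad_a", "ad_b"]))
```

In this usage, the never-displayed ad wins the comparison because its radius is larger, which is the exploration half of the trade-off.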
With high probability, $I_t(x_0, y_0) \in [\mu(x_0, y_0) - \epsilon,\ \mu(x_0, y_0) + 2R_t(x_0, y_0) + \epsilon]$ for all $x_0 \in X_0$, $y_0 \in Y_0$ and all $t = 1, 2, \dots, T$ simultaneously.
Fix $x_0 \in X_0$
[Figure: the confidence intervals $[\mu(x_0, \cdot) - \epsilon,\ \mu(x_0, \cdot) + 2R_t(x_0, \cdot) + \epsilon]$ for ads $y_1, y_2, y_3, y_4$]
$\mathrm{Regret}(T) = \sum_{t=1}^{T} \mu(x_t, y^*_t) - \mathbf{E}\left[\sum_{t=1}^{T} \mu(x_t, y_t)\right]$
Each display of a suboptimal ad $y$ contributes $\mu(x_0, y^*) - \mu(x_0, y)$ to the regret.
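If $\mu(x_0, y) + 2R_t(x_0, y) + \epsilon < \mu(x_0, y^*) - \epsilon$, the algorithm stops displaying the suboptimal ad $y$: the upper end of the interval for $y$ lies below the lower end $\mu(x_0, y^*) - \epsilon$ of the interval for $y^*$.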
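The stopping condition can be turned into a rough count of how often a suboptimal ad is shown. The sketch below compares the interval's upper end μ(x0, y) + 2R_t(x0, y) + ε with the lower end μ(x0, y*) − ε. It assumes a radius of the form R = sqrt(c / (1 + n)) with an unspecified constant `c`, which is a hypothetical parameter because the numerator is not visible in the slides. Solving 2·sqrt(c/(1+n)) + 2ε < gap for n gives 1 + n > 4c / (gap − 2ε)².

```python
import math

def is_eliminated(mu_y, mu_star, radius_y, eps):
    """True when the interval for y lies entirely below the one for y*."""
    return mu_y + 2 * radius_y + eps < mu_star - eps

def displays_until_eliminated(gap, eps, c):
    """Smallest n with 2*sqrt(c/(1+n)) + 2*eps < gap, or None if gap <= 2*eps.

    ASSUMPTION: R = sqrt(c / (1 + n)) for some constant c (hypothetical).
    """
    slack = gap - 2 * eps
    if slack <= 0:
        return None  # the condition can never be met at this scale
    # 1 + n > 4c / slack^2, so the smallest integer n is floor(4c / slack^2).
    return math.floor(4 * c / slack ** 2)

n = displays_until_eliminated(gap=0.4, eps=0.05, c=1.0)
print(n, is_eliminated(0.1, 0.5, math.sqrt(1.0 / (1 + n)), 0.05))
```

The count grows like 1/gap² for small ε, so ads that are only slightly suboptimal are displayed many more times before being dropped, while each such display costs little regret.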
$R_t(x_0, y) = \sqrt{\frac{\cdots}{1 + n(x_0, y)}}$ shrinks as $n(x_0, y)$ increases.
difference $\mu(x_0, y^*) - \mu(x_0, y)$
$T^{\frac{a+b+1}{a+b+2}}$