CAB: Continuous Adaptive Blending for Policy Evaluation and Learning
Yi Su*, Lequn Wang*, Michele Santacatterina and Thorsten Joachims
Example: Netflix
Context $x$: User/History
Action $y$: Movie to be placed here
Reward $r$: Whether user will click it
[Figure: Draw $S_1$ from $\pi_1$ to estimate $\hat{V}(\pi_1)$, draw $S_2$ from $\pi_2$ to estimate $\hat{V}(\pi_2)$, and so on up to $\pi_6$, one fresh sample per policy. Alternatively, draw a single sample $S$ from the logging policy $\pi_0$ and use it to estimate $\hat{V}(\pi_1), \dots, \hat{V}(\pi_6)$ and the values of further candidate policies.]
Logged data: $S = \{(x_i, y_i, r_i, \pi_0(y_i \mid x_i))\}_{i=1}^{n}$
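To make the shape of the logged data concrete, here is a minimal sketch of how such tuples could be collected in the Netflix-style example. Everything in it is an assumption for illustration: the function name `log_bandit_feedback`, the two toy contexts and actions, the uniform logging policy, and the click probabilities are all made up and do not come from the slides.

```python
import random

def log_bandit_feedback(n, contexts, actions, pi0, click_prob, seed=0):
    """Return n logged tuples (x_i, y_i, r_i, pi0(y_i | x_i))."""
    rng = random.Random(seed)
    log = []
    for _ in range(n):
        x = rng.choice(contexts)                      # a context arrives
        probs = [pi0(y, x) for y in actions]          # logging policy over actions
        y = rng.choices(actions, weights=probs)[0]    # action sampled from pi0
        r = 1.0 if rng.random() < click_prob(x, y) else 0.0  # click / no click
        log.append((x, y, r, pi0(y, x)))              # propensity stored with the sample
    return log

# Hypothetical toy setup: two user types, two candidate movies.
contexts = ["likes_drama", "likes_comedy"]
actions = ["drama_title", "comedy_title"]

def uniform_pi0(y, x):
    # Logging policy that shows each movie with equal probability.
    return 1.0 / len(actions)

def toy_click_prob(x, y):
    # Users click more often when the movie matches their taste.
    match = (x == "likes_drama") == (y == "drama_title")
    return 0.6 if match else 0.2

S = log_bandit_feedback(1000, contexts, actions, uniform_pi0, toy_click_prob)
```

Only the reward of the action that was actually shown is observed, which is why the propensity $\pi_0(y_i \mid x_i)$ is kept alongside each sample.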
Contribution I: Present a family of counterfactual estimators, the Interpolated Counterfactual Estimator (ICE) family.
Contribution II: Design a new estimator that inherits desirable properties.
Notation: Let $\hat{\delta}(x, y)$ be the estimated reward for action $y$ given context $x$. Let $\hat{\pi}_0$ be the estimated (or known) logging policy.
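Interpolated Counterfactual Estimator (ICE) Family
Given a triplet $\mathbf{w} = (w^{\alpha}, w^{\beta}, w^{\gamma})$ of weighting functions:
$$\hat{V}_{\mathbf{w}}(\pi) = \frac{1}{n}\sum_{i=1}^{n}\sum_{y \in \mathcal{Y}} \pi(y \mid x_i)\, w^{\alpha}_{iy}\, \alpha_{iy} \;+\; \frac{1}{n}\sum_{i=1}^{n} \pi(y_i \mid x_i)\, w^{\beta}_{i}\, \beta_{i} \;+\; \frac{1}{n}\sum_{i=1}^{n} \pi(y_i \mid x_i)\, w^{\gamma}_{i}\, \gamma_{i}$$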
Model the world: $\alpha_{iy} = \hat{\delta}(x_i, y)$. High bias, small variance.
Model the bias: $\beta_i = \frac{r(x_i, y_i)}{\hat{\pi}_0(y_i \mid x_i)}$. High variance, can be unbiased with known propensity.
Control variate: $\gamma_i = \frac{\hat{\delta}(x_i, y_i)}{\hat{\pi}_0(y_i \mid x_i)}$. Variance reduction, but prohibits use in LTR.
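The three-term estimator can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the name `ice_estimate`, the toy log, the target policy `pi`, and the reward model `delta_hat` are hypothetical, the logged propensity stands in for $\hat{\pi}_0(y_i \mid x_i)$, and each weight is simplified to a function of (context, action). The constant triplets used at the end are the standard textbook forms of the direct method (DM), inverse propensity scoring (IPS), and doubly robust (DR) estimators.

```python
def ice_estimate(log, actions, pi, delta_hat, w_alpha, w_beta, w_gamma):
    """ICE value estimate from logged tuples (x_i, y_i, r_i, p0_i).

    p0_i plays the role of pi0_hat(y_i | x_i); each weight is a function of (x, y).
    """
    total = 0.0
    for x, y, r, p0 in log:
        # Model the world: sum over all actions of pi(a|x) * w_alpha * delta_hat(x, a).
        total += sum(pi(a, x) * w_alpha(x, a) * delta_hat(x, a) for a in actions)
        # Model the bias: beta_i = r_i / pi0_hat(y_i | x_i).
        total += pi(y, x) * w_beta(x, y) * r / p0
        # Control variate: gamma_i = delta_hat(x_i, y_i) / pi0_hat(y_i | x_i).
        total += pi(y, x) * w_gamma(x, y) * delta_hat(x, y) / p0
    return total / len(log)

# Toy data (made up): one context "u", two actions, uniform logging policy.
actions = ["a", "b"]
log = [("u", "a", 1.0, 0.5), ("u", "b", 0.0, 0.5),
       ("u", "a", 1.0, 0.5), ("u", "b", 1.0, 0.5)]

def pi(y, x):
    # Target policy to evaluate (hypothetical).
    return 0.8 if y == "a" else 0.2

def delta_hat(x, y):
    # Hypothetical reward model.
    return 0.9 if y == "a" else 0.4

def const(c):
    # Weighting function that ignores its arguments.
    return lambda x, y: c

dm = ice_estimate(log, actions, pi, delta_hat, const(1.0), const(0.0), const(0.0))
ips = ice_estimate(log, actions, pi, delta_hat, const(0.0), const(1.0), const(0.0))
dr = ice_estimate(log, actions, pi, delta_hat, const(1.0), const(1.0), const(-1.0))
print(dm, ips, dr)
```

On this toy log the three estimates come out at roughly 0.80, 0.90, and 0.90. Writing the estimators as weight triplets makes it visible that DR is the only one of the three that needs the control-variate term.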
✓ Can be substantially less biased than clipped IPS and DM.
✓ While having low variance compared to IPS and DR.
✓ Subdifferentiable and capable of gradient-based learning: POEM (Swaminathan & Joachims, 2015a), BanditNet (Joachims et al., 2018).
✓ Unlike DR, can be used in off-policy Learning to Rank (LTR) algorithms (Joachims et al., 2017).
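One way to obtain an estimator with these properties from the ICE family is to clip the importance weight at a constant and hand the clipped-away mass to the reward model, leaving the control-variate weight at zero. The sketch below assumes exactly that blending rule; the slides do not spell out the weights, so the rule, the constant `M`, the name `blended_estimate`, and the toy data are all assumptions for illustration rather than the authors' definition.

```python
def blended_estimate(log, actions, pi, pi0_hat, delta_hat, M):
    """Blend a reward model with clipped importance weighting (assumed rule).

    w_beta = min(1, M * pi0_hat / pi), w_alpha = 1 - w_beta, w_gamma = 0.
    """
    def w_beta(x, y):
        p = pi(y, x)
        # If pi puts no mass on y the term vanishes anyway; avoid dividing by zero.
        return 1.0 if p == 0 else min(1.0, M * pi0_hat(y, x) / p)

    total = 0.0
    for x, y, r, p0 in log:
        # Model term gets the weight that clipping removed from the IPS term.
        total += sum(pi(a, x) * (1.0 - w_beta(x, a)) * delta_hat(x, a) for a in actions)
        # IPS term: pi * w_beta / p0 equals the clipped ratio min(pi / p0, M).
        total += pi(y, x) * w_beta(x, y) * r / p0
    return total / len(log)

# Toy data (made up): one context "u", two actions, uniform logging policy.
actions = ["a", "b"]
log = [("u", "a", 1.0, 0.5), ("u", "b", 0.0, 0.5),
       ("u", "a", 1.0, 0.5), ("u", "b", 1.0, 0.5)]

def pi(y, x):
    # Target policy to evaluate (hypothetical).
    return 0.8 if y == "a" else 0.2

def pi0_hat(y, x):
    # Known uniform logging policy.
    return 0.5

def delta_hat(x, y):
    # Hypothetical reward model.
    return 0.9 if y == "a" else 0.4

model_only = blended_estimate(log, actions, pi, pi0_hat, delta_hat, M=0.0)
blended = blended_estimate(log, actions, pi, pi0_hat, delta_hat, M=1.0)
ips_only = blended_estimate(log, actions, pi, pi0_hat, delta_hat, M=10.0)
print(model_only, blended, ips_only)
```

With `M = 0` the sketch reduces to the pure model estimate, and with a large `M` nothing is clipped and it reduces to IPS; intermediate values interpolate between the two. Because the weights involve only `min`, the estimate is subdifferentiable in the policy, and because no control-variate term appears, nothing in it depends on the part of DR that the slides flag as a problem for LTR.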
See our poster at Pacific Ballroom #221 Thursday (Today) 6:30-9:00pm