SLIDE 51 Policy Evaluation Policy Evaluation with CME
Experimental Results
[Figure: Mean Square Error (y-axis ticks 10^1 to 10^7) vs. Number of observations (x-axis ticks 1000, 4000, 16000, 63000, 250000, 1000000), four panels]
Panels: Stochastic target policy, M=100, K=10; Stochastic target policy, M=10, K=5; Deterministic target policy, M=100, K=10; Deterministic target policy, M=10, K=5
Estimators: CME, Direct, DR, Slate, wIPS, OnPolicy
Dataset: Microsoft Learning to Rank Challenge dataset (MSLR-WEB30K)
Krikamol Muandet Counterfactual Learning in RKHS Jeju, Korea — February 22, 2019 22 / 27