Low-Variance and Zero-Variance Baselines in Extensive-Form Games
Trevor Davis 2,*, Martin Schmid 1, Michael Bowling 1,2
1 DeepMind, 2 University of Alberta
*Work done during internship at DeepMind
Monte Carlo game solving
Extensive-form games (EFGs)
Unbiased updates at h, where unsampled actions are treated as a separate case:
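The formulas for this slide are not present in the text. As a hedged reconstruction in common MCCFR notation (all symbols here are assumptions, not taken from the slide): z is the sampled terminal history, σ(h, a) the current strategy, ξ(h, a) the probability with which action a was sampled at h, and ha the history reached by taking a at h. One standard way to write an unbiased sampled value at h without any baseline is:

```latex
\hat{u}(h \mid z) = \sum_{a} \sigma(h,a)\, \hat{u}(h,a \mid z),
\qquad
\hat{u}(h,a \mid z) =
\begin{cases}
  \dfrac{\hat{u}(ha \mid z)}{\xi(h,a)} & \text{if } ha \sqsubseteq z \quad \text{(sampled action)} \\[2ex]
  0 & \text{otherwise} \quad \text{(unsampled actions)}
\end{cases}
```

Under these assumptions the importance weight 1/ξ(h, a) cancels the sampling probability in expectation, so the expected value of the estimate at h equals the expected utility of h under σ, provided the estimate at each child ha is itself unbiased.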
Without baseline:
Baseline correction (control variate):
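To make the control-variate idea concrete, here is a minimal Python sketch for a single decision point. It assumes the usual baseline-corrected form, in which the sampled action gets b + (u - b) / ξ and every unsampled action gets b; this form and the names `values`, `strategy`, `sampling`, and `baseline` are illustrative assumptions, not taken from the slide. Passing an all-zero baseline recovers the estimate without a baseline.

```python
from typing import List, Sequence, Tuple


def corrected_estimates(values: Sequence[float],
                        sampling: Sequence[float],
                        baseline: Sequence[float],
                        sampled: int) -> List[float]:
    """Per-action estimates given that action `sampled` was drawn.

    Assumed form: the sampled action gets b + (u - b) / xi,
    every unsampled action falls back to its baseline b.
    """
    estimates = []
    for a, (u, xi, b) in enumerate(zip(values, sampling, baseline)):
        if a == sampled:
            estimates.append(b + (u - b) / xi)
        else:
            estimates.append(b)
    return estimates


def mean_and_variance(values: Sequence[float],
                      strategy: Sequence[float],
                      sampling: Sequence[float],
                      baseline: Sequence[float]) -> Tuple[float, float]:
    """Exact mean and variance of the node estimate over the sampled action."""
    node_estimates = []
    for sampled in range(len(values)):
        est = corrected_estimates(values, sampling, baseline, sampled)
        # Node estimate: strategy-weighted sum of the per-action estimates.
        node_estimates.append(sum(s * e for s, e in zip(strategy, est)))
    mean = sum(xi * v for xi, v in zip(sampling, node_estimates))
    variance = sum(xi * (v - mean) ** 2 for xi, v in zip(sampling, node_estimates))
    return mean, variance
```

Calling `mean_and_variance` with the same `values`, `strategy`, and `sampling` but different baselines should return the same mean each time, since the correction term has zero expectation. The variance shrinks as the baseline approaches the true action values and vanishes when they coincide.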
Leduc poker, Monte Carlo Counterfactual Regret Minimization (MCCFR+)
[Plot legend: No baseline; VR-MCCFR (Schmid et al.); Learned history baseline]
Updating with learned history baseline:
Optimal baseline depends on strategy update:
Use strategy to update baseline: recursively set
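The recursive update can be sketched in Python as a backward pass over one sampled trajectory. This is one plausible reading, not the authors' implementation: the `Step` record, the `update_baselines` function, and the step size `alpha` are hypothetical names. The sketch assumes that a learned history baseline moves b(h, a) toward the child's estimated value with step size `alpha`, and that the predictive variant evaluates the trajectory under the already-updated strategy and sets b(h, a) directly to the child's estimate (`alpha = 1.0`).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One decision point on the sampled trajectory (hypothetical record)."""
    strategy: List[float]   # strategy used to evaluate the node
    sampling: List[float]   # xi(h, .), probabilities used to sample the action
    baseline: List[float]   # b(h, .), updated in place
    sampled: int            # index of the action taken on the trajectory


def update_baselines(trajectory: List[Step],
                     terminal_utility: float,
                     alpha: float) -> float:
    """Walk the trajectory from leaf to root, updating baselines on the way.

    `trajectory` is ordered root first. Returns the estimated root value.
    """
    child_value = terminal_utility
    for step in reversed(trajectory):
        a = step.sampled
        # Unsampled actions fall back to their current baseline.
        estimates = list(step.baseline)
        # Sampled action: baseline plus importance-weighted correction.
        estimates[a] = step.baseline[a] + (child_value - step.baseline[a]) / step.sampling[a]
        # Move the baseline of the sampled action toward the child's estimate.
        step.baseline[a] = (1.0 - alpha) * step.baseline[a] + alpha * child_value
        # Value of this node, passed up as the parent's child estimate.
        child_value = sum(p * e for p, e in zip(step.strategy, estimates))
    return child_value
```

Under these assumptions, filling `Step.strategy` with the pre-update strategy and using `alpha < 1.0` corresponds to the learned history baseline, while filling it with the post-update strategy and using `alpha = 1.0` corresponds to the predictive variant.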
Leduc poker, Monte Carlo Counterfactual Regret Minimization (MCCFR+)
[Plot legend: No baseline; VR-MCCFR (Schmid et al.); Learned history baseline; Predictive baseline]