Deep Counterfactual Regret Minimization


  1. Deep Counterfactual Regret Minimization
Noam Brown*1,2, Adam Lerer*1, Sam Gross1, Tuomas Sandholm2,3 (*Equal Contribution)
1 Facebook AI Research, 2 Carnegie Mellon University, 3 Strategic Machine Inc., Strategy Robot Inc., and Optimized Markets Inc.

  2. Counterfactual Regret Minimization (CFR) [Zinkevich et al. NeurIPS-07]
• CFR is the leading algorithm for solving partially observable games
• Iteratively converges to an equilibrium
• Used by every top poker AI in the past 7 years, including Libratus
• Every single one used a tabular form of CFR
• This paper introduces a function-approximation form of CFR using deep neural networks
• Less domain knowledge
• Easier to apply to other games
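
Note (standard background, not from the slide): in a two-player zero-sum game, if both players’ average regret after T iterations is at most ε, then the profile of their average strategies is a 2ε-Nash equilibrium, and CFR drives average regret to zero at a rate of O(1/√T), which is what “iteratively converges to an equilibrium” refers to.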

  3. Example of Monte Carlo CFR [Lanctot et al. NeurIPS-09]
• Simulate a game with one player designated as the traverser
[Game-tree figure: chance and player nodes along a sampled path, ending in an example payoff of 25]

  4. Example of Monte Carlo CFR [Lanctot et al. NeurIPS-09]
• Simulate a game with one player designated as the traverser
• After the game ends, the traverser sees how much better she could have done by choosing other actions
[Game-tree figure: the sampled action’s payoff of 25 is compared with a payoff of 100 for an alternative action]

  5. Example of Monte Carlo CFR [Lanctot et al. NeurIPS-09]
• Simulate a game with one player designated as the traverser
• After the game ends, the traverser sees how much better she could have done by choosing other actions
• This difference is added to the action’s regret. In future iterations, actions with higher regret are chosen with higher probability (see the regret-matching sketch below)
[Game-tree figure: example payoffs of 25 and 100 for the compared actions]
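
Below is a minimal sketch of the regret-matching rule described in the last bullet above. It is illustrative only (the function and variable names are mine, not from the slides): each action is played with probability proportional to its accumulated positive regret, with a uniform fallback when no regret is positive.

```python
import numpy as np

def regret_matching(regret_sum: np.ndarray) -> np.ndarray:
    """Turn accumulated per-action regrets into a strategy (probability vector).

    Actions with higher positive regret get higher probability; if no action
    has positive regret, fall back to the uniform strategy.
    """
    positive = np.maximum(regret_sum, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full(len(regret_sum), 1.0 / len(regret_sum))

# Example: accumulated regrets of 25, 100, and -20 for three actions.
print(regret_matching(np.array([25.0, 100.0, -20.0])))  # -> [0.2, 0.8, 0.0]
```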

  6. Example of Monte Carlo CFR [Lanctot et al. NeurIPS-09]
• Simulate a game with one player designated as the traverser
• After the game ends, the traverser sees how much better she could have done by choosing other actions
• This difference is added to the action’s regret. In future iterations, actions with higher regret are chosen with higher probability
• The process repeats even for hypothetical decision points (see the traversal sketch below)
[Game-tree figure: example payoffs of 25, 50, 100, and 120 at the explored decision points]
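
The sketch below shows the shape of one such traversal, in the style of external-sampling Monte Carlo CFR. It is a simplification under assumed interfaces, none of which come from the slides: a `game` object with `is_terminal`, `is_chance`, `sample_chance`, `infoset`, `legal_actions`, `player_to_act`, `next_state`, and `utility` methods, `regret_sum`/`strategy_sum` dictionaries of per-action arrays (e.g. `defaultdict`s), and the `regret_matching` helper sketched earlier.

```python
import numpy as np

def traverse(game, state, traverser, regret_sum, strategy_sum, rng):
    """One Monte Carlo CFR traversal; returns the expected value for `traverser`."""
    if game.is_terminal(state):
        return game.utility(state, traverser)
    if game.is_chance(state):
        a = game.sample_chance(state, rng)
        return traverse(game, game.next_state(state, a), traverser,
                        regret_sum, strategy_sum, rng)

    infoset = game.infoset(state)
    actions = game.legal_actions(state)
    strategy = regret_matching(regret_sum[infoset])

    if game.player_to_act(state) == traverser:
        # Explore every action, including hypothetical ones the current
        # strategy would rarely take, to see how much better each would do.
        values = np.array([
            traverse(game, game.next_state(state, a), traverser,
                     regret_sum, strategy_sum, rng)
            for a in actions
        ])
        node_value = float(strategy @ values)
        # The difference between each action's value and the strategy's value
        # is added to that action's accumulated regret.
        regret_sum[infoset] += values - node_value
        return node_value

    # Opponent node: record the current strategy and sample a single action.
    strategy_sum[infoset] += strategy
    idx = rng.choice(len(actions), p=strategy)
    return traverse(game, game.next_state(state, actions[idx]), traverser,
                    regret_sum, strategy_sum, rng)
```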

  7. Prior Approach: Abstraction in Games
[Figure: decision points of the original game are bucketed together to form a smaller abstracted game]
• Requires extensive domain knowledge (see the toy bucketing sketch below)
• Several papers have been written on how to do abstraction just in poker
• Difficult to extend to other games
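
For intuition only, a toy sketch of the bucketing step: `hand_strength` here is a hypothetical hand-crafted feature (exactly the kind of domain knowledge abstraction depends on), and all decision points that land in the same bucket are treated identically in the abstracted game.

```python
def bucket(hand_strength: float, num_buckets: int = 10) -> int:
    """Map a hand-strength estimate in [0, 1] to one of `num_buckets` buckets."""
    return min(int(hand_strength * num_buckets), num_buckets - 1)

# Hands with strengths 0.72 and 0.78 fall into the same bucket (7), so the
# abstracted game merges their decision points into one.
```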

  8. Deep CFR
• Input: low-level features (visible cards, observed actions)
• Output: estimate of action regrets
• On each iteration:
  1. Collect samples of action regrets and add them to a buffer
  2. Train a network to predict regrets
  3. Use the network’s regret estimates to play on the next iteration

  9. Deep CFR
• Input: low-level features (visible cards, observed actions)
• Output: estimate of action regrets
• On each iteration (see the loop sketch below):
  1. Collect samples of action regrets and add them to a buffer
  2. Train a network to predict regrets
  3. Use the network’s regret estimates to play on the next iteration
• Theorem: With arbitrarily high probability, Deep CFR converges to an ε-Nash equilibrium in two-player zero-sum games, where ε is determined by the prediction error
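
The sketch below shows the shape of that three-step loop in PyTorch. It is a simplification under stated assumptions, not the paper’s implementation: `RegretNet` is a made-up architecture, `collect_regret_samples` is a hypothetical traversal routine returning (infoset-feature, regret-target) tensor pairs, a plain list stands in for the reservoir-sampled buffer, and the separate average-strategy network and iteration weighting used in the paper are omitted.

```python
import torch
import torch.nn as nn

class RegretNet(nn.Module):
    """Maps low-level infoset features (cards, betting history) to per-action regret estimates."""
    def __init__(self, n_features: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def deep_cfr(game, n_iterations: int, n_traversals: int, n_features: int, n_actions: int):
    buffer = []                                # stand-in for the reservoir-sampled buffer
    net = RegretNet(n_features, n_actions)     # initial regret estimates
    for t in range(n_iterations):
        traverser = t % 2
        # 1. Collect regret samples by traversing the game, acting with regret
        #    matching over the current network's regret estimates.
        for _ in range(n_traversals):
            buffer.extend(collect_regret_samples(game, net, traverser))  # hypothetical helper
        # 2. Train a fresh network from scratch to predict the sampled regrets.
        net = RegretNet(n_features, n_actions)
        optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
        for features, target_regrets in buffer:
            loss = nn.functional.mse_loss(net(features), target_regrets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # 3. The retrained network's regret estimates define how to play
        #    (via regret matching over its outputs) on the next iteration.
    return net
```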

  10. Experimental results in limit Texas hold’em
• Deep CFR produces superhuman performance in heads-up limit Texas hold’em poker
• ~10 trillion decision points
• Once played competitively by humans
• Deep CFR outperforms Neural Fictitious Self-Play (NFSP), the prior best deep RL algorithm for partially observable games [Heinrich & Silver arXiv-15]
• Deep CFR is also much more sample efficient
• Deep CFR is competitive with domain-specific abstraction algorithms

  11. Conclusions
• Among algorithms for non-tabular solving of partially observable games, Deep CFR is the fastest, most sample-efficient, and produces the best results
• It uses less domain knowledge than abstraction-based approaches, making it easier to apply to other games
