

SLIDE 1

Deep Counterfactual Regret Minimization

Noam Brown*1,2, Adam Lerer*1, Sam Gross1, Tuomas Sandholm2,3

*Equal Contribution

1 Facebook AI Research; 2 Carnegie Mellon University; 3 Strategic Machine Inc., Strategy Robot Inc., and Optimized Markets Inc.

SLIDE 2

Counterfactual Regret Minimization (CFR) [Zinkevich et al., NeurIPS-07]

  • CFR is the leading algorithm for solving partially observable games
  • Iteratively converges to an equilibrium
  • Used by every top poker AI in the past 7 years, including Libratus
  • Every single one used a tabular form of CFR
  • This paper introduces a function approximation form of CFR using deep neural networks
  • Less domain knowledge
  • Easier to apply to other games
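CFR's core action-selection rule, regret matching, can be sketched in a few lines. This is an illustrative sketch, not the authors' code: at each decision point, play each action with probability proportional to its positive cumulative regret.

```python
# Minimal regret-matching sketch: map cumulative regrets at one decision
# point to a strategy (one probability per action).

def regret_matching(cum_regrets):
    """Put probability on each action in proportion to its positive regret."""
    positives = [max(r, 0.0) for r in cum_regrets]
    total = sum(positives)
    if total == 0.0:
        # No positive regret yet: fall back to uniform play.
        return [1.0 / len(cum_regrets)] * len(cum_regrets)
    return [p / total for p in positives]

print(regret_matching([10.0, -5.0, 30.0]))  # -> [0.25, 0.0, 0.75]
```

Actions with negative cumulative regret are never played, which is what drives the iterative convergence to equilibrium described above.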
SLIDE 3

Example of Monte Carlo CFR [Lanctot et al., NeurIPS-09]

[Figure: game tree with chance nodes C and decision points 𝑄1, 𝑄2; sampled payoff 25]

  • Simulate a game with one player designated as the traverser

SLIDE 4

Example of Monte Carlo CFR [Lanctot et al., NeurIPS-09]

[Figure: game tree with chance nodes C and decision points 𝑄1, 𝑄2; payoffs 100 and 25]

  • Simulate a game with one player designated as the traverser
  • After game ends, traverser sees how much better she could have done by choosing other actions

SLIDE 5

Example of Monte Carlo CFR [Lanctot et al., NeurIPS-09]

[Figure: game tree with chance nodes C and decision points 𝑄1, 𝑄2; payoffs 100 and 25]

  • Simulate a game with one player designated as the traverser
  • After game ends, traverser sees how much better she could have done by choosing other actions
  • This difference is added to the action’s regret. In future iterations, actions with higher regret are chosen with higher probability

SLIDE 6

Example of Monte Carlo CFR [Lanctot et al., NeurIPS-09]

[Figure: game tree with chance nodes C and decision points 𝑄1, 𝑄2; payoffs 100, 25, 50, and 120]

  • Simulate a game with one player designated as the traverser
  • After game ends, traverser sees how much better she could have done by choosing other actions
  • This difference is added to the action’s regret. In future iterations, actions with higher regret are chosen with higher probability
  • Process repeats even for hypothetical decision points
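The regret update in this traversal can be sketched for a single decision point, using the slides' numbers (action values 100 and 25). The function name and setup are illustrative, not the paper's implementation:

```python
# One MCCFR-style regret update at a single decision point: each action's
# regret grows by how much better that action was than the expected value
# of the current strategy.

def update_regrets(cum_regrets, strategy, action_values):
    """Add each action's instantaneous regret (its value minus the
    expected value under the current strategy) to the running totals."""
    expected = sum(p * v for p, v in zip(strategy, action_values))
    return [r + (v - expected) for r, v in zip(cum_regrets, action_values)]

# The traverser played uniformly over two actions worth 100 and 25.
print(update_regrets([0.0, 0.0], [0.5, 0.5], [100.0, 25.0]))  # -> [37.5, -37.5]
```

The action worth 100 accumulates positive regret, so regret matching will favor it on future iterations, exactly as the bullets above describe.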

SLIDE 7

Prior Approach: Abstraction in Games

[Figure: states of the original game bucketed together into a smaller abstracted game]

  • Requires extensive domain knowledge
  • Several papers written on how to do abstraction just in poker
  • Difficult to extend to other games
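As a toy illustration of the bucketing idea, strategically similar situations can be mapped to a shared bucket by a coarse hand-crafted feature. The `bucket` function here is hypothetical; real poker abstractions use far richer, domain-specific features:

```python
# Toy card-abstraction sketch: bucket situations by win probability rounded
# into one of n_buckets bins, so similar states share one strategy entry.

def bucket(win_prob, n_buckets=5):
    """Map a win probability in [0, 1] to a bucket index in [0, n_buckets)."""
    return min(int(win_prob * n_buckets), n_buckets - 1)

# Two similar hands land in the same bucket: they are "bucketed together".
print(bucket(0.62), bucket(0.64))  # -> 3 3
```

Choosing these features well is exactly the domain knowledge the bullets above say is hard to port to other games.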
SLIDE 8

Deep CFR

  • Input: low-level features (visible cards, observed actions)
  • Output: estimate of action regrets
  • On each iteration:
    1. Collect samples of action regrets, add to a buffer
    2. Train a network to predict regrets
    3. Use network’s regret estimates to play on next iteration
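The three steps can be sketched as a loop. This is a minimal sketch, not the paper's implementation: `RegretNet`, `sample_regrets`, and the per-state averaging "network" are stand-ins, since the real system trains a deep neural network on low-level features.

```python
from collections import defaultdict

class RegretNet:
    """Stand-in for the regret network: 'predicts' per-action regrets
    for a state by averaging the buffered samples (two actions here)."""
    def __init__(self):
        self.pred = defaultdict(lambda: [0.0, 0.0])  # state -> regrets

    def fit(self, buffer):
        sums = defaultdict(lambda: [0.0, 0.0, 0])
        for state, regrets in buffer:
            s = sums[state]
            s[0] += regrets[0]
            s[1] += regrets[1]
            s[2] += 1
        for state, (r0, r1, n) in sums.items():
            self.pred[state] = [r0 / n, r1 / n]

def regret_matching(regrets):
    """Play actions in proportion to positive predicted regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [0.5, 0.5]

def deep_cfr(sample_regrets, iterations=3):
    """sample_regrets(strategy_fn) returns (state, regrets) samples
    from game traversals played with strategy_fn."""
    buffer, net = [], RegretNet()
    for _ in range(iterations):
        strategy_fn = lambda state: regret_matching(net.pred[state])
        buffer.extend(sample_regrets(strategy_fn))  # 1. collect samples
        net.fit(buffer)                             # 2. train the predictor
        # 3. the next iteration plays from the refreshed predictor
    return net
```

The key design point the slide makes is that the tabular regret store is replaced by a trained function of the state, so no hand-built abstraction is needed.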
SLIDE 9

Deep CFR

  • Input: low-level features (visible cards, observed actions)
  • Output: estimate of action regrets
  • On each iteration:
    1. Collect samples of action regrets, add to a buffer
    2. Train a network to predict regrets
    3. Use network’s regret estimates to play on next iteration
  • Theorem: With arbitrarily high probability, Deep CFR converges to an 𝜗-Nash equilibrium in two-player zero-sum games, where 𝜗 is determined by prediction error
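For reference, an 𝜗-Nash equilibrium is a strategy profile 𝜎 from which no single player can gain more than 𝜗 by deviating unilaterally. This is the standard definition (notation assumed, not taken from the slides):

```latex
% \vartheta-Nash equilibrium: for every player i, the best unilateral
% deviation \sigma_i' improves i's utility u_i by at most \vartheta.
\forall i:\quad \max_{\sigma_i'} u_i(\sigma_i', \sigma_{-i}) - u_i(\sigma) \le \vartheta
```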

SLIDE 10

Experimental results in limit Texas hold’em

  • Deep CFR produces superhuman performance in heads-up limit Texas hold’em poker
    • ~10 trillion decision points
    • Once played competitively by humans
  • Deep CFR outperforms Neural Fictitious Self-Play (NFSP), the prior best deep RL algorithm for partially observable games [Heinrich & Silver, arXiv-15]
  • Deep CFR is also much more sample efficient
  • Deep CFR is competitive with domain-specific abstraction algorithms
SLIDE 11

Conclusions

  • Among algorithms for non-tabular solving of partially observable games, Deep CFR is the fastest and most sample-efficient, and produces the best results
  • Uses less domain knowledge than abstraction-based approaches, making it easier to apply to other games