
Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization
AAMAS 2012 - June 6, 2012
Michael Johanson, Nolan Bard, Marc Lanctot, Richard Gibson, Michael Bowling
Computer Poker Research Group, University of Alberta
Wednesday, November 14, 2012

Motivation
Tackling the practical challenge of Nash equilibrium computation in large games.
A Nash equilibrium strategy is guaranteed not to lose in expectation (2-player, zero-sum).
This is a very useful property in practice: it is the dominant approach in the Annual Computer Poker Competition, and in 2008 it beat human professionals at 2-player limit Texas hold’em poker.

Motivation
The poker community is now solving games with 10^9 decisions (information sets).
LPs don’t scale to this size of game.
We’ve made great progress on efficient approximation algorithms (CFR, EGT).
[Figure: Size of Game Solved. y-axis: # Information Sets (10^5 to 10^10); x-axis: Computer Poker Competition Year (2006 to 2011).]

Counterfactual Regret Minimization (CFR), NIPS 2007
CFR is the competition’s most popular algorithm.
Iterative, resembles self-play; reinforcement learning flavour.
Memory efficient (2 doubles per infoset-action).
Converges quickly (on the order of 1/ε^2 iterations to reach an ε-equilibrium).
Programmer friendly: easy to implement and optimize, with linear speedup on many cores.
This paper: a new CFR variant that converges more quickly in imperfect information games.
[Figure: CFR Convergence. y-axis: Best Response (mbb/game), 0 to 100; x-axis: Computation Time (seconds), 0 to 20,000.]

Counterfactual Regret Minimization (CFR)
Basic idea:
Start with two uniform random strategies. Play them against each other.
Put a regret minimizing agent at every decision, and let it independently learn its part of the strategy.
Run many iterations: walk the game tree, and let the agents update their parts of the strategy.
The average strategy profile converges to a Nash equilibrium.
[Figure: at information set I, Regret(I) = (-2, 1, 4) gives the strategy σ(I) = (0, 0.2, 0.8).]
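The example numbers on this slide are consistent with regret matching, where each action is played in proportion to its positive regret. The slide only says "regret minimizing agent", so treating it as regret matching is an assumption, and the function name below is hypothetical. A minimal sketch:

```python
def regret_matching(regrets):
    """Return a strategy proportional to the positive part of each regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret: fall back to the uniform strategy.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


# Regret(I) = (-2, 1, 4) from the slide; only actions 2 and 3 have positive regret.
strategy = regret_matching([-2.0, 1.0, 4.0])
```

With the slide's regrets, the positive parts are (0, 1, 4), which normalize to the slide's σ(I) = (0, 0.2, 0.8).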

Counterfactual Regret Minimization (CFR)
To update a decision, we need:
The probability of the other players taking their series of actions, e.g. π_{-i}(I) = 0.2.
The expected value (or an unbiased estimate) of the actions’ utilities given the opponent’s strategy, e.g. V(I) = (-2, 2, 6).
Recursively walk the tree:
Push forwards the opponent’s action probabilities.
Return the EV at this terminal node or in this subtree.
[Figure: game tree from the root through an opponent action with p = 0.2 to information set I, whose three actions lead to terminal nodes.]
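The two ingredients listed on this slide can be combined into a regret update at a single information set. The sketch below assumes the standard counterfactual regret form, in which each action's regret grows by the opponent reach probability times the gap between that action's value and the value of the current strategy. The function name is hypothetical, and pairing this slide's numbers with the strategy (0, 0.2, 0.8) from the previous slide is an assumption made only for illustration.

```python
def update_regrets(regrets, strategy, action_values, opp_reach):
    """Return (new_regrets, node_value) for one information set.

    opp_reach plays the role of pi_{-i}(I); action_values plays the role of V(I).
    """
    # Expected value of following the current strategy at this decision.
    node_value = sum(p * v for p, v in zip(strategy, action_values))
    # Each action's regret is weighted by how likely the opponent is to get here.
    new_regrets = [
        r + opp_reach * (v - node_value)
        for r, v in zip(regrets, action_values)
    ]
    return new_regrets, node_value


new_regrets, node_value = update_regrets(
    regrets=[0.0, 0.0, 0.0],
    strategy=[0.0, 0.2, 0.8],
    action_values=[-2.0, 2.0, 6.0],
    opp_reach=0.2,
)
```

Under these assumptions the node value is 5.2 and the regrets become roughly (-1.44, -0.64, 0.16), so only the third action gains positive regret.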

Chance-Sampled CFR
In practice, a sampling variant of CFR is used.
Chance Sampling: on each iteration, randomly sample one set of chance events (public chance, my private chance, opponent private chance) and only update that part of the tree.
Recursion: PASS one scalar (opponent reach probability); RETURN one scalar (value of subgame).
Terminal nodes: get an unbiased estimate of my state’s value. Takes O(1) time.
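As an illustration of sampling "one set of chance events", the sketch below draws a single deal. It assumes Texas hold’em sizes (a 52-card deck, two private cards per player, five public cards) and represents cards as integers 0 to 51; the function name and dictionary keys are hypothetical.

```python
import random


def sample_chance_events(rng):
    """Sample one joint outcome of all three chance events without replacement."""
    cards = rng.sample(range(52), 9)
    return {
        "my_private": cards[0:2],
        "opp_private": cards[2:4],
        "public": cards[4:9],
    }


deal = sample_chance_events(random.Random(0))
```

Chance Sampling would walk only the part of the tree consistent with this one deal, which is why each terminal node costs O(1) to evaluate.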

New CFR Sampling Variants
Chance Sampling (CS). Sample: public chance, my private chance, opponent private chance.
Opponent-Public Chance Sampling (OPCS). Sample: public chance, opponent private chance. Expand: my private chance.
Self-Public Chance Sampling (SPCS). Sample: public chance, my private chance. Expand: opponent private chance.
Public Chance Sampling (PCS). Sample: public chance. Expand: my private chance, opponent private chance.

Opponent-Public Chance Sampling (OPCS)
Sample one public chance event.
Sample one opponent private chance event.
Enumerate all of my possible private chance events (45 choose 2).
Key observation: the opponent can’t observe my chance event, so their strategy is the same for all of them. I can efficiently update all of these decisions in the same recursive pass!
Recursion: PASS one scalar (opponent reach probability); RETURN a vector (values of subgames).
Terminal nodes: n states to evaluate. Takes O(n) time.
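The "pass one scalar, return a vector" pattern at a terminal node can be sketched as follows. This is a toy showdown under stated assumptions: each hand is reduced to a hypothetical numeric strength, the payoff is plus or minus a fixed pot, and hands that share cards with the opponent's hand are not excluded, which a real poker evaluator would need to handle.

```python
def opcs_terminal_values(my_strengths, opp_strength, opp_reach, pot):
    """Value of each of my n hands against the one sampled opponent hand: O(n)."""
    values = []
    for strength in my_strengths:
        if strength > opp_strength:
            payoff = pot
        elif strength < opp_strength:
            payoff = -pot
        else:
            payoff = 0.0
        # Weight by the single opponent reach probability passed down the tree.
        values.append(opp_reach * payoff)
    return values


values = opcs_terminal_values(
    my_strengths=[1, 3, 5], opp_strength=3, opp_reach=0.5, pot=2.0
)
```

One pass over my n hands produces n values, so every one of my information sets along this path can be updated in the same traversal.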

New CFR Sampling Variants
OPCS compared with CS: slower, many updates per iteration.


Self-Public Chance Sampling (SPCS)
Sample one public chance event.
Sample one of my private chance events.
Enumerate all of the opponent’s possible private chance events (45 choose 2).
Terminal nodes: n states to evaluate. Much more precise estimate of my value, since I compare my state to all of theirs!
Recursion: PASS one vector (opponent reach probabilities); RETURN one scalar (value of subgame).
Result: slow but very precise updates.
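The "pass one vector, return one scalar" pattern can be sketched the same way. As before, this is a toy showdown under stated assumptions: hands are reduced to hypothetical numeric strengths, the payoff is plus or minus a fixed pot, and conflicts between hands that share cards are ignored.

```python
def spcs_terminal_value(my_strength, opp_strengths, opp_reach, pot):
    """Value of my one sampled hand against all n opponent hands: O(n)."""
    value = 0.0
    for strength, reach in zip(opp_strengths, opp_reach):
        if my_strength > strength:
            value += reach * pot
        elif my_strength < strength:
            value -= reach * pot
        # Ties contribute nothing.
    return value


value = spcs_terminal_value(
    my_strength=3, opp_strengths=[1, 3, 5], opp_reach=[0.5, 0.25, 0.25], pot=2.0
)
```

Because the value is summed over every opponent hand weighted by its reach probability, rather than estimated from a single sampled opponent hand, the estimate for my state is much less noisy.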

New CFR Sampling Variants
OPCS compared with CS: slower, many updates per iteration.
SPCS compared with CS: slower, very precise updates.

Public Chance Sampling (PCS)
Sample one public chance event.
Enumerate all of my private chance events (47 choose 2).
Enumerate all of the opponent’s possible private chance events (47 choose 2).
Terminal nodes: n states to evaluate against n states. Looks like O(n^2) work. But depending on game structure, O(n) is often possible, making it as fast as OPCS or SPCS!
Recursion: PASS one vector (opponent reach probabilities); RETURN one vector (values of subgames).
Result: slower, but does many precise updates on each iteration.
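One way the n-against-n evaluation can drop from O(n^2) to O(n) is to exploit the ordering of hands at a showdown. The sketch below is an illustration under stated assumptions, not necessarily the authors' exact method: both players hold the same list of hands, already sorted by a hypothetical numeric strength, the payoff is plus or minus a fixed pot, and hands that share cards are not excluded. A running sum of opponent reach probability replaces the inner loop.

```python
def showdown_values_naive(strengths, opp_reach, pot):
    """O(n^2): compare each of my hands with every opponent hand."""
    values = [0.0] * len(strengths)
    for i, mine in enumerate(strengths):
        for theirs, reach in zip(strengths, opp_reach):
            if mine > theirs:
                values[i] += reach * pot
            elif mine < theirs:
                values[i] -= reach * pot
    return values


def showdown_values_linear(strengths, opp_reach, pot):
    """O(n): strengths must be sorted ascending; ties are grouped."""
    n = len(strengths)
    total = sum(opp_reach)
    values = [0.0] * n
    weaker = 0.0  # reach mass of opponent hands strictly weaker than this group
    i = 0
    while i < n:
        j, tied = i, 0.0
        while j < n and strengths[j] == strengths[i]:
            tied += opp_reach[j]
            j += 1
        stronger = total - weaker - tied
        for k in range(i, j):
            values[k] = pot * (weaker - stronger)
        weaker += tied
        i = j
    return values


strengths = [1, 2, 2, 5]
opp_reach = [0.1, 0.2, 0.3, 0.4]
naive = showdown_values_naive(strengths, opp_reach, pot=2.0)
linear = showdown_values_linear(strengths, opp_reach, pot=2.0)
```

Both functions return the same vector (about (-1.8, -0.6, -0.6, 1.2) here), but the second visits each hand a constant number of times.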

New CFR Sampling Variants
Chance Sampling (CS). Sample: public chance, my private chance, opponent private chance.
Opponent-Public Chance Sampling (OPCS). Sample: public chance, opponent private chance. Expand: my private chance. Compared with CS: slower, more updates per iteration.
Self-Public Chance Sampling (SPCS). Sample: public chance, my private chance. Expand: opponent private chance. Compared with CS: slower, very precise updates.
Public Chance Sampling (PCS). Sample: public chance. Expand: my private chance, opponent private chance. Compared with OPCS: same speed, very precise updates. Compared with SPCS: same speed, many updates per iteration.

Results: 2-round, 4-bet Poker
94 million decision points (information sets).
[Figure: Best response (mbb/g, 10^-1 to 10^4) versus Time (seconds, 10^2 to 10^5), log-log axes, with curves for CS, OPCS, SPCS and PCS.]

Abstracted Limit Texas Hold’em Poker
Real Poker Game: 3*10^14 decisions (infosets). Abstraction reduces it to an Abstract Poker Game: 10^9 decisions (infosets).
Larger abstractions are better in practice, but take longer to solve.
Can evaluate by measuring exploitability in the abstract game.

Results: Abstracted Limit Texas Hold’em Poker
[Figure: four panels comparing CS and PCS, plotting Abstract Best Response (mbb/g) versus Time (seconds) on log-log axes. Panels: 5 buckets (3.6m decisions), 8 buckets (23.6m decisions), 10 buckets (57.3m decisions), 12 buckets (118.6m decisions).]

Alternate domain: Bluff, an imperfect information dice game
[Figure: Best Response (10^-4 to 10^-1) versus Time (seconds, 10^3 to 10^5), log-log axes, with curves for CS and PCS.]
