Regret Minimization in Games and the Development of Champion Multiplayer Computer Poker Agents
Richard Gibson
Ph.D. Thesis Presentation December 6, 2013
Computer Poker Research Group
Heads-Up Limit Texas Hold'em
[Cartoon slides: poker players deciding on actions (“Bet!”, “Fold? Call? Raise?”, “All-in!”, “Call.”). Image sources: ebaumsworld.com, toonpool.com]
2010 - 2013
Hyperborean3p
– 3-player – Imperfect recall
2009
– Many players – Imperfect recall
with limited memory
2013
– Counterfactual Regret Minimization (CFR)
– Many player games – Imperfect recall games
[Figure, built up over several slides: game tree of a small two-player poker game. Chance (c) deals the cards (QJ, QK, ...), each deal with probability 1/6. Player 1 checks (c) or bets (b); a player facing a bet folds (f) or calls (c). Tree nodes the acting player cannot tell apart form an information set.]
Player 1: Check / Bet ?
– Bet! Player 2: Fold / Call ? Fold: +1. Call: +2 / -2.
– Check. Player 2: Check / Bet ? Check: +1 / -1. Bet!: Player 1: Fold / Call ?
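The tree on these slides can be written down directly in code. The sketch below assumes the game is the three-card Kuhn poker that the deck names later (cards J, Q, K, a one-chip ante and a one-chip bet); the function names are illustrative and do not come from the presentation.

```python
from itertools import permutations

CARDS = "JQK"  # assumed three-card deck, ranked J < Q < K

def is_terminal(history):
    # 'c' = check or call, 'b' = bet, 'f' = fold (edge labels as on the slides)
    return history in ("cc", "bc", "bf", "cbc", "cbf")

def payoff_p1(cards, history):
    """Player 1's winnings at a terminal history (player 2 gets the negative)."""
    if history == "bf":        # player 2 folds to player 1's bet
        return 1
    if history == "cbf":       # player 1 folds to player 2's bet
        return -1
    stake = 1 if history == "cc" else 2   # showdown for the ante, or ante plus bet
    p1_wins = CARDS.index(cards[0]) > CARDS.index(cards[1])
    return stake if p1_wins else -stake

def information_set(cards, history):
    """The acting player sees only their own card and the public betting."""
    player = len(history) % 2
    return cards[player] + ":" + history

deals = list(permutations(CARDS, 2))          # 6 deals, each with probability 1/6
decision_histories = ["", "c", "b", "cb"]     # histories where someone must act
info_sets = {information_set(d, h) for d in deals for h in decision_histories}
```

Under these assumptions, the deals QJ and QK look identical to player 1 at the start of the hand, which is why `information_set(("Q", "J"), "")` and `information_set(("Q", "K"), "")` return the same key.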
In general:
Extensive-Form Game
Strategy Profile
[Figure: the game tree with an action probability on every edge (for example .4 / .6, .1 / .9, .8 / .2, .7 / .3).]
In general:
Extensive-Form Game
Nash Equilibrium Strategy Profile
Nash equilibrium:
“No one can change their strategy and do any better.”
1/3 1/3 1/3
Every game has a Nash equilibrium.
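The quoted definition can be checked numerically on a small game. The sketch below assumes that the "1/3 1/3 1/3" on this slide refers to a game like rock-paper-scissors, which is a guess about the missing picture. It measures how much one player could gain by switching to their best single action; at a Nash equilibrium that gain is zero for every player.

```python
from fractions import Fraction

# Row player's payoff in rock-paper-scissors (assumed example game; zero-sum).
PAYOFF = [
    [0, -1, 1],    # rock     vs rock, paper, scissors
    [1, 0, -1],    # paper
    [-1, 1, 0],    # scissors
]

def expected_payoff(row_strategy, col_strategy):
    """Row player's expected payoff when both players mix."""
    return sum(row_strategy[i] * col_strategy[j] * PAYOFF[i][j]
               for i in range(3) for j in range(3))

def best_deviation_gain(row_strategy, col_strategy):
    """How much the row player gains by switching to the best pure action."""
    current = expected_payoff(row_strategy, col_strategy)
    best = max(sum(col_strategy[j] * PAYOFF[i][j] for j in range(3))
               for i in range(3))
    return best - current

uniform = [Fraction(1, 3)] * 3
always_rock = [Fraction(1), Fraction(0), Fraction(0)]
```

Here `best_deviation_gain(uniform, uniform)` is 0, so the row player cannot do better against a uniform opponent, and by symmetry neither can the column player. Against `always_rock` the gain is 1, since switching to paper wins every time.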
– Counterfactual Regret Minimization (CFR)
– Many player games – Imperfect recall games
search of imperfect information games.”
[Figure: the game tree, first with every action probability at .5, then with changed probabilities (for example .3 / .7 and .8 / .2).]
update action probabilities at every information set.
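The per-information-set update in CFR is usually done with regret matching: each action is played in proportion to its positive cumulative regret, falling back to uniform play when no action has positive regret. The sketch below shows that rule in isolation; the function name and the sample regret values are illustrative only.

```python
def regret_matching(cumulative_regrets):
    """Turn cumulative regrets into action probabilities for one information set."""
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No action has positive regret yet: play uniformly (the .5 / .5 start).
    return [1.0 / len(cumulative_regrets)] * len(cumulative_regrets)
```

With no regret accumulated, `regret_matching([0.0, 0.0])` gives the uniform .5 / .5 starting point, while regrets of 3 and 7 give probabilities .3 and .7.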
$$\text{Average Strategy Profile} = \frac{\text{Strategy } 1 + \text{Strategy } 2 + \dots + \text{Strategy } T}{T} \;\longrightarrow\; \text{Nash Equilibrium Strategy Profile} \quad (T \to \infty)$$
T = number of iterations
Extensive-Form Game
Nash Equilibrium Strategy Profile
CFR
Kuhn Poker
Nash Equilibrium Strategy Profile
CFR
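For a game as small as Kuhn poker, the whole procedure fits in a short script. The following is a minimal sketch of vanilla CFR, not the presenter's implementation. It assumes the standard Kuhn rules (cards J, Q, K, one-chip ante, one-chip bet), updates both players in the same tree walk, and drops the constant 1/6 chance factor, which only rescales the regrets. All names are illustrative.

```python
from itertools import permutations

CARDS = "JQK"                                   # ranked J < Q < K
TERMINAL = ("cc", "bc", "bf", "cbc", "cbf")     # c = check/call, b = bet, f = fold
regret_sum, strategy_sum, iteration_strategy = {}, {}, {}

def legal_actions(history):
    return "fc" if history.endswith("b") else "cb"

def payoff_p1(cards, history):
    """Player 1's winnings at a terminal history (the game is zero-sum)."""
    if history.endswith("f"):
        return 1 if len(history) % 2 == 0 else -1   # "bf": P2 folded; "cbf": P1 folded
    stake = 1 if history == "cc" else 2
    return stake if CARDS.index(cards[0]) > CARDS.index(cards[1]) else -stake

def current_strategy(key, n):
    """Regret matching, computed once per iteration and then held fixed."""
    if key not in iteration_strategy:
        positive = [max(r, 0.0) for r in regret_sum.setdefault(key, [0.0] * n)]
        total = sum(positive)
        iteration_strategy[key] = ([p / total for p in positive] if total > 0
                                   else [1.0 / n] * n)
    return iteration_strategy[key]

def cfr(cards, history, reach):
    """Return player 1's expected value below `history`, updating regrets on the way."""
    if history in TERMINAL:
        return float(payoff_p1(cards, history))
    player = len(history) % 2
    acts = legal_actions(history)
    key = cards[player] + ":" + history          # own card plus public betting
    strategy = current_strategy(key, len(acts))
    totals = strategy_sum.setdefault(key, [0.0] * len(acts))
    values = []
    for i, action in enumerate(acts):
        child_reach = list(reach)
        child_reach[player] *= strategy[i]
        values.append(cfr(cards, history + action, child_reach))
    node_value = sum(p * v for p, v in zip(strategy, values))
    sign = 1.0 if player == 0 else -1.0          # player 2 receives the negative
    for i in range(len(acts)):
        # Counterfactual regret is weighted by the opponent's reach probability.
        regret_sum[key][i] += reach[1 - player] * sign * (values[i] - node_value)
        totals[i] += reach[player] * strategy[i]
    return node_value

def train(iterations):
    """Run CFR and return the average strategy profile."""
    deals = list(permutations(CARDS, 2))
    for _ in range(iterations):
        iteration_strategy.clear()
        for cards in deals:
            cfr(cards, "", [1.0, 1.0])
    return {k: [x / sum(v) for x in v] for k, v in strategy_sum.items()}

def value_p1(profile, cards, history=""):
    """Player 1's expected value for one deal when both players follow `profile`."""
    if history in TERMINAL:
        return float(payoff_p1(cards, history))
    key = cards[len(history) % 2] + ":" + history
    return sum(p * value_p1(profile, cards, history + a)
               for p, a in zip(profile[key], legal_actions(history)))
```

Calling `train(5000)` and averaging `value_p1` over the six deals should give a value near -1/18, the known game value of Kuhn poker for the first player. In the returned profile, player 2 calls a bet with the K and folds the J almost always.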
Texas Hold'em
Nash Equilibrium Strategy Profile
CFR
>10^14 information sets, > 5 million GB
Large Extensive-Form Game
Nash Equilibrium Strategy Profile
Large Extensive-Form Game
Abstract Game
Abstract Game Equilibrium Strategy
Approximate Full Game Equilibrium Strategy
CFR
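One common way to build such an abstract game for poker is to group similar hands into buckets, solve the game over buckets, and then play the full game by looking up the bucket of the actual hand. The sketch below assumes equal-size buckets ordered by a hand-strength score. That is a simple illustrative choice and not necessarily the abstraction used in the thesis, and the strength numbers and strategy table are made up.

```python
def assign_buckets(hand_strength, num_buckets):
    """Map each hand to a bucket index by strength rank (equal-size buckets)."""
    ranked = sorted(hand_strength, key=hand_strength.get)
    return {hand: i * num_buckets // len(ranked) for i, hand in enumerate(ranked)}

def full_game_action_probs(hand, abstract_strategy, buckets):
    """Play the full game by using the abstract strategy of the hand's bucket."""
    return abstract_strategy[buckets[hand]]

# Hypothetical strength scores in [0, 1]; illustrative values, not real equities.
strength = {"72o": 0.35, "T9s": 0.54, "QJs": 0.60, "AKs": 0.67, "KK": 0.82, "AA": 0.85}
buckets = assign_buckets(strength, 3)

# Hypothetical abstract-game strategy: bucket -> [P(check), P(bet)].
abstract_strategy = {0: [0.9, 0.1], 1: [0.5, 0.5], 2: [0.1, 0.9]}
```

In this toy example KK and AA share the strongest bucket, so `full_game_action_probs("KK", abstract_strategy, buckets)` and the same call for "AA" return identical probabilities. That loss of distinction is why the resulting full-game strategy is only approximate.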
– Counterfactual Regret Minimization (CFR)
– Many player games – Imperfect recall games
Extensive-Form Game
Nash Equilibrium Strategy Profile
CFR
3-or-more Player Game
(Not equilibrium)
CFR
2-player Zero-Sum Game
Nash Equilibrium Strategy Profile
CFR
3-player Limit Texas Hold'em
Good strategy? (Not equilibrium)
CFR
Annual Computer Poker Competition, 3-Player Limit Texas Hold'em, 2009
Agent            Total Bankroll (mbb/g)
Hyperborean3p    319 ± 2
dpp              171 ± 2
akuma            151 ± 2
CMURingLimit
dcu3pl
Bluechip
[Figures: game trees (chance deals QJ, QK, ... with probability 1/6 each) illustrating the two terms below.]
Dominated Strategies
Iteratively Dominated Strategy
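The two terms are easiest to see in a small matrix game. The sketch below repeatedly removes pure strategies that are strictly worse than another pure strategy against everything the opponent still has left. It is a simplification made for illustration, since it works on a two-player payoff matrix rather than a game tree and considers only domination by pure strategies. The payoff numbers are made up.

```python
def iterated_elimination(row_payoff, col_payoff):
    """Return the row and column strategies that survive iterated strict dominance."""
    rows = list(range(len(row_payoff)))
    cols = list(range(len(row_payoff[0])))
    changed = True
    while changed:
        changed = False
        for r in list(rows):
            # r is dominated if some other row is strictly better in every live column.
            if any(all(row_payoff[o][c] > row_payoff[r][c] for c in cols)
                   for o in rows if o != r):
                rows.remove(r)
                changed = True
        for c in list(cols):
            # c is dominated if some other column is strictly better in every live row.
            if any(all(col_payoff[r][o] > col_payoff[r][c] for r in rows)
                   for o in cols if o != c):
                cols.remove(c)
                changed = True
    return rows, cols

# Hypothetical 3x2 game: entry [i][j] is the payoff when row plays i and column plays j.
row_payoff = [[3, 1], [2, 2], [1, 0]]
col_payoff = [[2, 1], [2, 1], [0, 5]]
```

In this example row 2 is dominated outright. Column 1 is not dominated in the original game, but it becomes dominated once row 2 is gone, which makes it an iteratively dominated strategy. After that, row 1 falls as well and only row 0 and column 0 survive.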
New! [G., arXiv ePrints 2013]
3-or-more Player Game, CFR, T → ∞: Average Strategy Profile has No Iteratively Dominated Strategies
3-or-more Player Game, CFR, Finite T: “Current” Strategy Profile has No Iteratively Dominated Strategies
3-Player Limit Texas Hold'em - 2012
New!
– Counterfactual Regret Minimization (CFR)
– Many player games – Imperfect recall games
Extensive-Form Game Abstract Game
Abstract Game Equilibrium Strategy
CFR
“Imperfect Recall” Abstract Game
(Not equilibrium)
CFR
“Perfect Recall” Abstract Game
Abstract Game Equilibrium Strategy
CFR
Pre-flop
Pre-flop Flop
Imperfect Recall Abstract Game
Perfect Recall Abstract Game
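The difference between the two kinds of abstract game shows up in what identifies an information set. The sketch below assumes the usual meaning in poker abstraction: with perfect recall a player remembers the bucket from every earlier round, while with imperfect recall only the current round's bucket is kept. The function names and bucket numbers are illustrative, since the slide's diagram did not survive extraction.

```python
def perfect_recall_key(bucket_sequence, betting):
    """Remember the bucket seen on every round so far."""
    return (tuple(bucket_sequence), betting)

def imperfect_recall_key(bucket_sequence, betting):
    """Forget earlier buckets; keep only the round number and the current bucket."""
    return (len(bucket_sequence), bucket_sequence[-1], betting)

# Two hypothetical hands: different pre-flop buckets, same flop bucket, same betting.
hand_a = [3, 5]
hand_b = [7, 5]
```

Under perfect recall the two hands stay in separate information sets. Under imperfect recall they merge on the flop, which frees memory that can be spent on more buckets in the current round.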
Extensive-Form Game
“Well-formed” Imperfect Recall Game
Abstract Game Equilibrium Strategy
CFR
convergence for our best poker abstractions.
treatment research
[Chen and Bowling, NIPS 2012]. [Lanctot, G., Burch and Bowling, ICML 2012]
New!
– Counterfactual Regret Minimization (CFR)
– Many player games – Imperfect recall games
[Figure: the game tree (chance deals QJ, QK, ... with probability 1/6 each) with action probabilities on its edges.]
update action probabilities at a sampled subset of states.
Chance Sampling
[Zinkevich et al., NIPS 2007]
Chance Sampling External Sampling
– Use new strategies sooner
– Good trade-off
[Zinkevich et al., NIPS 2007] [Lanctot et al., NIPS 2009]
Chance Sampling External Sampling Outcome Sampling
iterations
iterations required
[Zinkevich et al., NIPS 2007] [Lanctot et al., NIPS 2009]
2-round Heads-Up No-Limit Hold'em, 36 chips per player
– T = Iterations required to be “close enough” to equilibrium
– C, K = Constants
New!
[G., Lanctot, Burch, Szafron and Bowling, AAAI 2012]
Chance Sampling External Sampling Outcome Sampling Average Strategy Sampling New!
Average Strategy
[G. et al., NIPS 2012] [Zinkevich et al., NIPS 2007] [Lanctot et al., NIPS 2009]
2-round Heads-Up No-Limit Hold'em, 36 chips per player [G. et al., NIPS 2012]
[G. et al., NIPS 2012] 2-round Heads-Up No-Limit Hold'em, k chips per player
– Counterfactual Regret Minimization (CFR)
– Many player games – Imperfect recall games
2-player Limit Texas Hold'em Abstract Game: ≈ 59,000,000 “Turn” Deals, 540,000 “Turn” Buckets
3-player Limit Texas Hold'em Abstract Game: ≈ 59,000,000 “Turn” Deals, 540 “Turn” Buckets
3-Player Limit Texas Hold'em Abstract Game (540 “Turn” Buckets): Abstract Game Strategy
2-player Sub-games (540,000 “Turn” Buckets): 2-player Experts
3-player Limit Texas Hold'em Stitched Strategy
[Gibson and Szafron, NIPS 2011]
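The stitching idea can be sketched as a lookup that hands control to a specialist once play enters its sub-game. The code below is an illustration under stated assumptions and not the method's actual implementation. It assumes sub-games are identified by a betting-history prefix (for example, one player folding immediately, which leaves two players), and that each strategy is a function from an information set to action probabilities. All names and numbers are hypothetical.

```python
def make_stitched_strategy(base_strategy, experts):
    """Return a strategy that defers to an expert inside that expert's sub-game.

    base_strategy: callable(info_set) -> action probabilities
    experts: dict mapping a betting-history prefix that starts a sub-game
             to a callable(info_set) -> action probabilities
    """
    def stitched(betting_history, info_set):
        for prefix, expert in experts.items():
            if betting_history.startswith(prefix):
                return expert(info_set)       # finer-grained 2-player expert
        return base_strategy(info_set)        # coarse 3-player base strategy
    return stitched

# Hypothetical strategies over [fold, call, raise]; the numbers are made up.
base = lambda info_set: [0.2, 0.5, 0.3]
heads_up_expert = lambda info_set: [0.0, 0.4, 0.6]
stitched = make_stitched_strategy(base, {"f": heads_up_expert})
```

Here any hand whose betting starts with a fold is played by the two-player expert, and every other hand falls back to the base strategy. This mirrors the slide's split between a coarse three-player abstraction and much finer two-player sub-game abstractions.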
Extensive-Form Game
Abstraction 1, Abstraction 2, ..., Abstraction K
“Frankenstein” Abstract Game New!
Frankenstein-Game Strategy
Full Game Strategy
[Gibson and Szafron, NIPS 2011]
3-player Limit Texas Hold'em
18K “Turn” Buckets
1.53 Million “Turn” Buckets
“Frankenstein” Abstract Game
Frankenstein-Game Strategy
3-player Texas Hold'em Strategy
New!
Hyperborean3p Tournament
CFR 2-player Experts
– Counterfactual Regret Minimization (CFR)
– Many player games – Imperfect recall games
– First set of theoretical properties for CFR in:
– Theoretical and practical improvements for making CFR go faster
– Techniques for dealing with limited memory
the strongest 3-player limit Texas hold'em strategies in the world.
– “On-line CFR”
– Ultimately challenge humans for the World Series of Poker
Clip art images used in this presentation can be found at clker.com