
SLIDE 1

Robust Strategies and Counter-Strategies

Building a Champion Level Computer Poker Player
Mike Johanson
November 20, 2012

University of Alberta Computer Poker Research Group

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 1 / 65


SLIDE 5

One Sentence Summary

How can we create a poker program for competing against expert players?

- Three new techniques for finding game-theoretic strategies
- Useful for poker, applicable to other domains
- Show the value of these approaches through competitions against expert humans and computers

SLIDE 6

1. Introduction
2. Playing to Not Lose: Counterfactual Regret Minimization
3. Playing to Win: Frequentist Best Response
4. Playing to Win, Carefully: Restricted Nash Response
5. Competition Results
6. Conclusion



SLIDE 9

The Computer Poker Research Group

- The CPRG’s goal: create poker programs to beat the world’s best poker players
- Martin Zinkevich and I collaborated on this work
  - This is a huge understatement


SLIDE 13

Texas Hold’em Poker

- Poker is a collection of wagering card games
- Texas Hold’em is considered to be the most strategic variant
- Players play a series of short games against each other
- Goal: win as much money as possible from opponents over this series


SLIDE 16

Heads-Up Texas Hold’em Poker

- As the game progresses, more cards are revealed:
  - Private cards that only one player can see and use
  - Public cards that all players can see and use
- Players alternate taking actions:
  - Bet: make a wager that their cards will be the best
  - Call: match the opponent’s wager
  - Fold: surrender this game, and begin a new one



SLIDE 23

Heads-Up Texas Hold’em Poker

So, why do we care about poker?

- Poker is stochastic and has imperfect information, like the real world
- Exploitation is important
- Approaches for other games (such as alpha-beta) don’t apply here; we need to find new techniques
- Our techniques are applicable beyond poker


SLIDE 27

Strategies and Information Sets

- Because of hidden information, some game states are indistinguishable
- An information set is a set of game states that we cannot tell apart
- We have to play the same way for every game state in an information set
- A behavioral strategy is a probability distribution over actions for each information set


SLIDE 30

Computer Poker

- Poker is big: 10^18 game states
- We abstract the cards into buckets to make the size more reasonable: 10^12 states
- Poker strategies for the abstract game are still powerful in the “real” game, but there is a loss
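The bucketing step can be sketched in a few lines. This is an illustrative sketch, not the abstraction actually used by the CPRG: the equal-width scheme and the bucket count are our assumptions, chosen only to show how many game states collapse into one abstract state.

```python
def bucket(hand_strength, n_buckets=10):
    """Map a hand-strength estimate in [0, 1] to one of n_buckets
    equal-width buckets; all hands in a bucket are treated identically.
    (Hypothetical scheme for illustration, not the thesis abstraction.)"""
    # Clamp so hand_strength == 1.0 lands in the top bucket.
    return min(int(hand_strength * n_buckets), n_buckets - 1)
```

With 10 buckets, every hand whose strength falls in [0.9, 1.0] plays the same abstract strategy, which is what shrinks 10^18 states toward something tractable, at the cost of some loss in the real game.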



SLIDE 34

Counterfactual Regret Minimization

- First approach: a strategy that works against anyone
- Nash equilibrium: a strategy for each player, where no player can do better by unilaterally changing their strategy
- Approximation to a Nash equilibrium: no player can do better than ε by switching


SLIDE 41

ε-Nash equilibria

- Unbeatable (within its abstraction)
- The strategy can win if the opponent makes mistakes
- ...thus “playing to not lose” (we still use these strategies to win)
- Can be found through linear programming, which requires memory proportional to the number of game states
- Counterfactual Regret Minimization requires memory proportional to the number of information sets, which is much smaller
- Poker has 3.16 × 10^17 game states and 3.19 × 10^14 information sets


SLIDE 46

Counterfactual Regret Minimization: Theory

- Play T games of poker, updating your strategy on each round
- Find the best strategy you could have used for all of those games
- Define Average Overall Regret as:

      (1/T) · Σ_{t=1}^{T} [ (value of best strategy in game t) − (value of your strategy in game t) ]

- If we minimize Average Overall Regret, the average strategy used over the T games approaches a Nash equilibrium
- How do we minimize Average Overall Regret?
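As a minimal numeric sketch of the definition above (the function name and the per-game values are ours, made up for illustration): Average Overall Regret is just the per-game average gap between the single best fixed strategy and the strategy actually used.

```python
def average_overall_regret(best_values, my_values):
    """Average Overall Regret over T games: the average, per game, of
    (value of the best fixed strategy) - (value of the strategy used).
    Both arguments are per-game value lists of equal length T."""
    assert len(best_values) == len(my_values)
    T = len(best_values)
    return sum(b - v for b, v in zip(best_values, my_values)) / T
```

For example, if the best fixed strategy would have earned 5 in each of three games while our changing strategy earned 4, 3, and 5, the average overall regret is (1 + 2 + 0) / 3 = 1.0.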


SLIDE 51

Immediate Counterfactual Regret

- Break down overall regret into the regret for each action at each information set
- Regret: how much more utility we could have had if we always took some action instead of using our strategy
- Immediate Counterfactual Regret: weight this regret by the probability of the opponent reaching the information set
- Average Overall Regret is less than the sum of the Immediate Counterfactual Regrets
- So, if we can minimize our immediate counterfactual regret at each information set, then we approach a Nash equilibrium


SLIDE 56

Counterfactual Regret Minimization: Basic Idea

(Diagram: Player 1 and Player 2 each learn to beat the other)

- Initialize the strategies’ action probabilities to a uniform distribution
- Repeat:
  - (General) Iterate over all chance outcomes
  - (Poker-specific) Deal cards to each player, as if playing the game
  - Recurse over all choice nodes, updating the action probabilities at each choice node to minimize regret at that node

How do we update the action probabilities after each game?


SLIDE 61

Counterfactual Regret

- Compute the expected value of each action
- Calculate the regret for not taking each action (regret: the difference between the EV of taking that action and the strategy’s EV)
- Counterfactual Regret: regret weighted by the opponent’s probability of reaching this state
- Add up Counterfactual Regret over all games
- Assign new probabilities proportional to the accumulated positive CFR

Example: Strategy’s EV: 4. Regret: (-7, 2, 5). Total CFR: (-3.5, 1, 2.5). New probabilities: (0, 0.3, 0.7).
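The final step above, turning accumulated counterfactual regret into new action probabilities, is regret matching. A short sketch (the function name is ours; this is illustrative code, not the thesis implementation):

```python
def regret_matching(total_cfr):
    """Regret matching: play each action with probability proportional
    to its accumulated positive counterfactual regret."""
    positive = [max(r, 0.0) for r in total_cfr]
    total = sum(positive)
    if total > 0:
        return [r / total for r in positive]
    # No positive regret anywhere: fall back to a uniform strategy.
    n = len(total_cfr)
    return [1.0 / n] * n
```

On the slide’s numbers, `regret_matching([-3.5, 1, 2.5])` gives (0, 1/3.5, 2.5/3.5) ≈ (0, 0.29, 0.71), which the slide rounds to (0, 0.3, 0.7).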


SLIDE 66

Counterfactual Regret Example 2

Strategy’s EV: -8.1
Regret: (5.1, 2.1, -0.9)
Total CFR: (1.6, 3.1, 1.6)
New probabilities: (0.25, 0.5, 0.25)
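To see the whole self-play loop end to end, here is a toy sketch of ours (not the thesis code): two regret-matching learners repeatedly play rock-paper-scissors. The counterfactual weighting is trivial here because each player has a single information set, but the key property still shows up: the average strategy over all iterations approaches the Nash equilibrium (1/3, 1/3, 1/3).

```python
# PAYOFF[a][b]: utility of playing action a against action b
# (rock=0, paper=1, scissors=2); the game is symmetric.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def regret_match(regrets):
    pos = [max(r, 0.0) for r in regrets]
    s = sum(pos)
    return [p / s for p in pos] if s > 0 else [1.0 / 3] * 3

def train(iterations=50000):
    # Start from slightly perturbed regrets so the deterministic
    # dynamics are not already stuck at the uniform fixed point.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strats = [regret_match(r) for r in regrets]
        for p in (0, 1):
            opp = strats[1 - p]
            # Expected value of each pure action vs the opponent's mix
            action_ev = [sum(PAYOFF[a][b] * opp[b] for b in range(3))
                         for a in range(3)]
            strat_ev = sum(s * ev for s, ev in zip(strats[p], action_ev))
            for a in range(3):
                regrets[p][a] += action_ev[a] - strat_ev
                strategy_sums[p][a] += strats[p][a]
    totals = [sum(s) for s in strategy_sums]
    # The AVERAGE strategy, not the final one, approaches equilibrium.
    return [[x / totals[p] for x in strategy_sums[p]] for p in (0, 1)]
```

After 50,000 iterations, both players’ average strategies are close to (1/3, 1/3, 1/3), even though the current strategy keeps oscillating.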


SLIDE 71

Performance Bounds

Counterfactual Regret Minimization approaches a Nash equilibrium. How fast does it get there?

- General: the number of iterations grows quadratically with the number of information sets
- Poker: the number of iterations grows linearly with the number of information sets (because seeing a few samples of the states in an information set is enough to choose a good strategy for that information set)
- In practical terms: we can solve very large games (10^12 states) in under two weeks. That’s two orders of magnitude larger than was previously possible.

SLIDE 72

Convergence to a Nash Equilibrium

(Figure: exploitability in mb/h versus iterations, in thousands divided by the number of information sets, for CFR5, CFR8, and CFR10)

  Abstraction   Size (×10^9 game states)   Iterations (×10^6)   Time (h)   Exp (mb/h)
  5             6.45                       100                  33         3.4
  6             27.7                       200                  75         3.1
  8             276                        750                  261        2.7
  10            1646                       2000                 326        2.2

SLIDE 73

Comparison to the 2006 AAAI Competition

                 Hyperborean   Bluffbot   Monash   Teddy   Average
  Smallbot2298   61            113        695      474     336
  CFR8           106           170        746      517     385


SLIDE 78

Counterfactual Regret Minimization: Conclusions

- Approaches Nash equilibria faster and with less memory than older techniques
- The resulting strategies are robust: they work well against any opponent
- But... how exploitable are the opponents?
- How much better could an exploitive strategy do?
- “Playing to Not Lose”



SLIDE 81

Frequentist Best Response

- Best response: the best possible counter-strategy to some strategy
- Useful for a few reasons:
  - Tells you how exploitable that strategy is
  - Could use it during a match to win
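The idea behind a best response can be sketched on a toy one-shot game (this is our illustration, not the extensive-form computation used for poker): against a fixed mixed strategy, the best response simply maximizes expected value, and that maximum value is exactly how exploitable the fixed strategy is.

```python
# Rock-paper-scissors payoffs: PAYOFF[a][b] is the utility of playing
# action a against action b.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]
ACTIONS = ["rock", "paper", "scissors"]

def best_response(opponent_mix):
    """Return (best action, its EV) against a fixed mixed strategy.
    The EV of the best response is the opponent's exploitability."""
    evs = [sum(PAYOFF[a][b] * opponent_mix[b] for b in range(3))
           for a in range(3)]
    best = max(range(3), key=lambda a: evs[a])
    return ACTIONS[best], evs[best]
```

A rock-heavy player (0.5, 0.25, 0.25) is best answered with paper and loses 0.25 per game to it, while a Nash equilibrium player (1/3, 1/3, 1/3) has exploitability 0: no counter-strategy gains anything.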


SLIDE 84

Best Response Challenges

- A “real” best response is intractable
- An abstract game best response is easy, but has some challenges:
  - You need to actually have the opponent’s strategy
  - The resulting counter-strategy plays in the same abstraction as the strategy (bigger abstraction = better counter-strategy)

SLIDE 85

Motivating Frequentist Best Response

We’d like to make best response counter-strategies with fewer restrictions:

- What if we don’t have the actual strategy, only observations?
- What if we want to choose the abstraction that the counter-strategy uses?


SLIDE 90

Frequentist Best Response: Basic Idea

1. Observe lots of real-game data (say, 1 million hands)
2. Abstract the data, and do frequency counts of how often actions are taken at each choice node
3. Construct an opponent model, where the action probabilities are just the observed action frequencies
4. Find the abstract game best response to the opponent model
5. Use the counter-strategy to play against the strategy in the real game

SLIDE 91

Abstracting the data

(figure)


SLIDE 93

Frequentist Best Response

There are a few variables you need to get right:

- Who is the strategy playing against for the million hands? (Self-play is bad, because it doesn’t explore the whole strategy space)
- What do you do in states you never observe? (We assume they call)
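The frequency-count opponent model of steps 2 and 3, including the “assume they call” default for unobserved states, can be sketched as follows. This is a minimal sketch under assumed details (the class name, the three-action set, and the dictionary layout are ours), not the thesis implementation:

```python
from collections import defaultdict

ACTIONS = ("fold", "call", "bet")

class FrequentistModel:
    """Opponent model built from observed play: action probabilities at
    each abstract information set are just the observed frequencies."""

    def __init__(self):
        # infoset key -> per-action observation counts
        self.counts = defaultdict(lambda: [0, 0, 0])

    def observe(self, infoset, action):
        self.counts[infoset][ACTIONS.index(action)] += 1

    def probabilities(self, infoset):
        c = self.counts[infoset]
        total = sum(c)
        if total == 0:
            # Never observed this information set: assume a call.
            return [0.0, 1.0, 0.0]
        return [x / total for x in c]
```

After observing two bets and one call at some abstract information set, the model plays (fold, call, bet) with probabilities (0, 1/3, 2/3) there; any information set it never saw defaults to always calling.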

SLIDE 94

Frequentist Best Response

(Figure: Performance of FBR counter-strategies to several opponents as training hands varies. Millibets/game won by the FBR counter-strategy, with 95% confidence intervals, versus training time in games, for FBR(PsOpti4), FBR(Smallbot2298), and FBR(Attack80))

slide-96
SLIDE 96

Frequentist Best Response

                  PsOpti4  PsOpti6  Attack60  Attack80  Smallbot1239  Smallbot1399  Smallbot2298  CFR5  Average
FBR-PsOpti4           137     -163      -227      -231          -106           -85          -144  -210     -129
FBR-PsOpti6           -79      330       -68       -89           -36           -23           -48   -97      -14
FBR-Attack60         -442     -499      2170      -701          -359          -305          -377  -620     -142
FBR-Attack80         -312     -281      -557      1048          -251          -231          -266  -331     -148
FBR-Smallbot1239      -20      105       -89       -42           106            91           -32   -87        3
FBR-Smallbot1399      -43       38       -48       -77            75           118           -46  -109      -11
FBR-Smallbot2298      -39       51       -50       -26            42            50            33   -41        2
CFR5                   36      123        93        41            70            68            17             56
Max                   137      330      2170      1048           106           118            33

Columns are poker strategies we’ve produced in the past. Rows are counter-strategies to each strategy. CFR5 is a Counterfactual Regret Minimization strategy. Two observations:

  • The diagonal has the matches where the counter-strategy plays against its intended opponent. These scores are all good: significantly higher than what the CFR strategy earns.
  • Everything off the diagonal is horrible.

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 31 / 65

slide-97
SLIDE 97

Frequentist Best Response: Conclusions

“Playing to Win”

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 32 / 65

slide-98
SLIDE 98

Frequentist Best Response: Conclusions

“Playing to Win”: Frequentist Best Response counter-strategies are useful for defeating specific opponents

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 32 / 65

slide-99
SLIDE 99

Frequentist Best Response: Conclusions

“Playing to Win”: Frequentist Best Response counter-strategies are useful for defeating specific opponents. We also use them to evaluate our strategies, to see how weak they are

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 32 / 65

slide-100
SLIDE 100

Frequentist Best Response: Conclusions

“Playing to Win”: Frequentist Best Response counter-strategies are useful for defeating specific opponents. We also use them to evaluate our strategies, to see how weak they are. However, they are brittle: when used against other opponents, even weak ones, they can lose badly.

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 32 / 65

slide-101
SLIDE 101

Frequentist Best Response: Conclusions

“Playing to Win”: Frequentist Best Response counter-strategies are useful for defeating specific opponents. We also use them to evaluate our strategies, to see how weak they are. However, they are brittle: when used against other opponents, even weak ones, they can lose badly. Is there a way to keep the exploitiveness of FBR counter-strategies, while also gaining the robustness of CFR strategies?

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 32 / 65

slide-102
SLIDE 102

1

Introduction

2

Playing to Not Lose: Counterfactual Regret Minimization

3

Playing to Win: Frequentist Best Response

4

Playing to Win, Carefully: Restricted Nash Response

5

Competition Results

6

Conclusion

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 33 / 65

slide-103
SLIDE 103

Restricted Nash Response

Exploiting opponents is important — we’d like to win more money than the Counterfactual Regret Minimization strategies do

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 34 / 65

slide-104
SLIDE 104

Restricted Nash Response

Exploiting opponents is important — we’d like to win more money than the Counterfactual Regret Minimization strategies do. Frequentist Best Response strategies win lots of money, but are terrible against the wrong opponent

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 34 / 65

slide-105
SLIDE 105

Restricted Nash Response

Exploiting opponents is important — we’d like to win more money than the Counterfactual Regret Minimization strategies do. Frequentist Best Response strategies win lots of money, but are terrible against the wrong opponent. We’d like a compromise: a strategy that exploits an opponent (or class of opponents), but is also robust against arbitrary opponents

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 34 / 65

slide-106
SLIDE 106

Restricted Nash Response: Motivation

We suspect our opponent will use some strategy

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 35 / 65

slide-107
SLIDE 107

Restricted Nash Response: Motivation

We suspect our opponent will use some strategy. What if they only used it, say, 75% of the time?

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 35 / 65

slide-108
SLIDE 108

Restricted Nash Response: Motivation

We suspect our opponent will use some strategy. What if they only used it, say, 75% of the time? The other 25% of the time, they can do anything...

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 35 / 65

slide-109
SLIDE 109

Restricted Nash Response: Motivation

We suspect our opponent will use some strategy. What if they only used it, say, 75% of the time? The other 25% of the time, they can do anything... ...but let’s assume they play a best response to whatever we do

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 35 / 65

slide-110
SLIDE 110

Restricted Nash Response: Motivation

We suspect our opponent will use some strategy. What if they only used it, say, 75% of the time? The other 25% of the time, they can do anything... ...but let’s assume they play a best response to whatever we do. We now have two goals: attack the 75% “weak” strategy, and defend against the 25% “adaptive” strategy

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 35 / 65

slide-111
SLIDE 111

Restricted Nash Response: Basic Idea

In CFR, we had two strategies that adapt to beat each other

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 36 / 65

slide-112
SLIDE 112

Restricted Nash Response: Basic Idea

In CFR, we had two strategies that adapt to beat each other. In RNR, we have one strategy for our player, and two for our opponent

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 36 / 65

slide-113
SLIDE 113

Restricted Nash Response: Basic Idea

In CFR, we had two strategies that adapt to beat each other. In RNR, we have one strategy for our player, and two for our opponent. The opponent’s static strategy is the model we get from Frequentist Best Response

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 36 / 65

slide-114
SLIDE 114

Restricted Nash Response: Basic Idea

In CFR, we had two strategies that adapt to beat each other. In RNR, we have one strategy for our player, and two for our opponent. The opponent’s static strategy is the model we get from Frequentist Best Response. We play millions of games, where our player minimizes regret when playing against both the static and adaptive opponent

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 36 / 65

slide-115
SLIDE 115

Restricted Nash Response: Basic Idea

In CFR, we had two strategies that adapt to beat each other. In RNR, we have one strategy for our player, and two for our opponent. The opponent’s static strategy is the model we get from Frequentist Best Response. We play millions of games, where our player minimizes regret when playing against both the static and adaptive opponent. The adaptive opponent minimizes regret when playing against us

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 36 / 65
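The loop described on the last few slides can be sketched in miniature. This is an illustrative toy of my own, not the thesis's implementation: rock-paper-scissors solved with regret matching, where the opponent is a p-weighted mixture of a fixed model and an adaptive regret-minimizer. The names (`rnr`, `rocky`) and the payoff matrix are invented here; the thesis runs the same idea at poker scale with CFR.

```python
# Toy sketch of Restricted Nash Response via regret matching in
# rock-paper-scissors. Everything here (rnr, rocky, the payoff matrix)
# is invented for illustration; the thesis does this at poker scale with CFR.
ACTIONS = 3  # rock, paper, scissors
PAYOFF = [[0, -1, 1],   # PAYOFF[a][b]: our utility when we play a, they play b
          [1, 0, -1],
          [-1, 1, 0]]

def normalize(regrets):
    """Regret matching: play in proportion to positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    s = sum(pos)
    return [v / s for v in pos] if s > 0 else [1.0 / ACTIONS] * ACTIONS

def rnr(static_model, p, iters=20000):
    """Our player minimizes regret against a restricted opponent that plays
    static_model with probability p and an adaptive regret-minimizer with
    probability 1 - p. Returns our average strategy."""
    regret_us, regret_opp = [0.0] * ACTIONS, [0.0] * ACTIONS
    avg_us = [0.0] * ACTIONS
    for _ in range(iters):
        s_us, s_adapt = normalize(regret_us), normalize(regret_opp)
        # The opponent's effective strategy is the restricted mixture.
        s_opp = [p * a + (1 - p) * b for a, b in zip(static_model, s_adapt)]
        u_us = [sum(PAYOFF[a][b] * s_opp[b] for b in range(ACTIONS))
                for a in range(ACTIONS)]
        u_opp = [sum(-PAYOFF[a][b] * s_us[a] for a in range(ACTIONS))
                 for b in range(ACTIONS)]
        ev_us = sum(s * u for s, u in zip(s_us, u_us))
        ev_opp = sum(s * u for s, u in zip(s_adapt, u_opp))
        for a in range(ACTIONS):
            regret_us[a] += u_us[a] - ev_us
            regret_opp[a] += u_opp[a] - ev_opp
            avg_us[a] += s_us[a]
    total = sum(avg_us)
    return [v / total for v in avg_us]

rocky = [0.6, 0.2, 0.2]   # a static model that over-plays rock
print(rnr(rocky, p=0.0))  # close to uniform: the Nash equilibrium
print(rnr(rocky, p=1.0))  # close to pure paper: a best response to the model
```

With p = 0 this is ordinary self-play and the average strategy stays at the uniform equilibrium; with p = 1 it collapses to a best response to the model; intermediate p interpolates between the two.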

slide-116
SLIDE 116

Restricted Nash Response: Basic Idea

“Restricted Nash Response”: our opponent is restricted to playing the static strategy some of the time. We approach a Nash equilibrium in this restricted game.

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 37 / 65
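One way to write the restricted game down (my notation, a sketch consistent with the slide rather than the thesis's exact definition):

```latex
% The opponent is restricted to strategies that mix the fixed model
% \sigma_{\mathrm{fix}} in with weight p:
\Sigma_2^{p,\sigma_{\mathrm{fix}}} =
  \{\, p\,\sigma_{\mathrm{fix}} + (1-p)\,\sigma' \;\mid\; \sigma' \in \Sigma_2 \,\}
% A restricted Nash response is our half of an equilibrium of this
% restricted game; for a two-player zero-sum game, equivalently:
\sigma_1^* \in \operatorname*{arg\,max}_{\sigma_1 \in \Sigma_1}\;
  \min_{\sigma_2 \in \Sigma_2^{p,\sigma_{\mathrm{fix}}}} u_1(\sigma_1,\sigma_2)
```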

slide-117
SLIDE 117

Restricted Nash Response: Picking the Percentage

In the last example, we said the opponent uses the static strategy 75% of the time

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 38 / 65

slide-118
SLIDE 118

Restricted Nash Response: Picking the Percentage

In the last example, we said the opponent uses the static strategy 75% of the time. This is actually just a variable, p.

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 38 / 65

slide-119
SLIDE 119

Restricted Nash Response: Picking the Percentage

In the last example, we said the opponent uses the static strategy 75% of the time. This is actually just a variable, p. Interpretations of p:

  • How much you care about exploiting the static strategy
  • How confident you are that the opponent will actually use the static strategy

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 38 / 65

slide-120
SLIDE 120

Restricted Nash Response: Picking the Percentage

If p is low, then the resulting counter-strategy is more like a Nash equilibrium. If p is high, then the resulting counter-strategy is more like a best response.

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 39 / 65

slide-121
SLIDE 121

Restricted Nash Response: Picking the Percentage

[Plot: for counter-strategies to PsOpti4, exploitation (mb/h) against exploitability (mb/h), with one point for each value of p: (0.00), (0.50), (0.75), (0.82), (0.85), (0.90), (0.95), (0.99), (1.00).]

  • X-Axis: How exploitable the counter-strategy is
  • Y-Axis: How much we beat the opponent
  • Labels: The value of p used to generate the strategy

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 40 / 65

slide-122
SLIDE 122

Restricted Nash Response: Picking the Percentage

[Plot: the same trade-off for counter-strategies to Attack80, exploitation (mb/h) against exploitability (mb/h), with one point for each value of p: (0.00), (0.25), (0.40), (0.45), (0.50), (0.55), (0.60), (0.80), (0.90), (0.95), (1.00).]

Don’t use a Nash equilibrium - you can win a lot by giving up a tiny amount!
Don’t use a Best Response - you can save a lot by giving up a tiny amount!

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 41 / 65
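The two axes can be made concrete in a toy zero-sum matrix game. This is a sketch of my own (the matrix, names, and numbers are illustrative, not from the thesis, where both quantities are measured in mb/h in a poker abstraction):

```python
# Toy sketch: the two axes of the trade-off plots, in rock-paper-scissors.
# PAYOFF[a][b] is our utility when we play row a and the opponent column b.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def exploitation(ours, model):
    """Expected winnings of our mixed strategy against a fixed opponent model."""
    return sum(ours[a] * model[b] * PAYOFF[a][b]
               for a in range(3) for b in range(3))

def exploitability(ours):
    """How much a best-responding opponent wins against us: it picks the
    column that minimizes our expected payoff."""
    col_value = [sum(ours[a] * PAYOFF[a][b] for a in range(3))
                 for b in range(3)]
    return -min(col_value)

model = [0.6, 0.2, 0.2]      # a static opponent that over-plays rock
nash = [1/3, 1/3, 1/3]       # equilibrium: wins nothing, but risks nothing
best_resp = [0.0, 1.0, 0.0]  # always paper: wins the most, maximally exploitable
print(exploitation(nash, model), exploitability(nash))            # ~0.0  ~0.0
print(exploitation(best_resp, model), exploitability(best_resp))  # ~0.4   1.0
```

An RNR counter-strategy with an intermediate p lands between these two corners, which is exactly what the labelled points on the curves show.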

slide-123
SLIDE 123

Restricted Nash Response: Results

Frequentist Best Response:

                  PsOpti4  PsOpti6  Attack60  Attack80  Smallbot1239  Smallbot1399  Smallbot2298  CFR5  Average
FBR-PsOpti4           137     -163      -227      -231          -106           -85          -144  -210     -129
FBR-PsOpti6           -79      330       -68       -89           -36           -23           -48   -97      -14
FBR-Attack60         -442     -499      2170      -701          -359          -305          -377  -620     -142
FBR-Attack80         -312     -281      -557      1048          -251          -231          -266  -331     -148
FBR-Smallbot1239      -20      105       -89       -42           106            91           -32   -87        3
FBR-Smallbot1399      -43       38       -48       -77            75           118           -46  -109      -11
FBR-Smallbot2298      -39       51       -50       -26            42            50            33   -41        2
CFR5                   36      123        93        41            70            68            17             56
Max                   137      330      2170      1048           106           118            33

Restricted Nash Response:

Opponents         PsOpti4  PsOpti6  Attack60  Attack80  Smallbot1239  Smallbot1399  Smallbot2298  CFR5  Average
RNR-PsOpti4            85      112        39         9            63            61            -1   -23       43
RNR-PsOpti6            26      234        72        34            59            59             1   -28       57
RNR-Attack60          -17       63       582       -22            37            39            -9   -45       78
RNR-Attack80           -7       66        22       293            11            12                 -29       46
RNR-Smallbot1239       38      130        68        31           111           106             9   -20       59
RNR-Smallbot1399       31      136        66        29           105           112             6   -24       58
RNR-Smallbot2298       21      137        72        30            77            76            31   -11       54
CFR5                   36      123        93        41            70            68            17             56
Max                    85      234       582       293           111           112            31

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 42 / 65

slide-124
SLIDE 124

Restricted Nash Response: Conclusions

“Playing to Win, Carefully”

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 43 / 65

slide-125
SLIDE 125

Restricted Nash Response: Conclusions

“Playing to Win, Carefully”: Restricted Nash Response makes robust counter-strategies

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 43 / 65

slide-126
SLIDE 126

Restricted Nash Response: Conclusions

“Playing to Win, Carefully”: Restricted Nash Response makes robust counter-strategies. Exploits one opponent, minimizes weakness against all others

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 43 / 65

slide-127
SLIDE 127

Restricted Nash Response: Conclusions

“Playing to Win, Carefully”: Restricted Nash Response makes robust counter-strategies. Exploits one opponent, minimizes weakness against all others. If you ever have to compute a best response offline, you can do this instead. It’s not so bad if you’re right, and a life saver if you’re wrong.

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 43 / 65

slide-128
SLIDE 128

1

Introduction

2

Playing to Not Lose: Counterfactual Regret Minimization

3

Playing to Win: Frequentist Best Response

4

Playing to Win, Carefully: Restricted Nash Response

5

Competition Results

6

Conclusion

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 44 / 65

slide-129
SLIDE 129

Competition Results

We competed in two competitions at AAAI this year:

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 45 / 65

slide-130
SLIDE 130

Competition Results

We competed in two competitions at AAAI this year:

Second AAAI Computer Poker Competition
  • 3 events, 15 competitors, 43 bots
  • Used CFR strategies to get a 1st, a 2nd, and a 3rd

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 45 / 65

slide-131
SLIDE 131

Competition Results

We competed in two competitions at AAAI this year:

Second AAAI Computer Poker Competition
  • 3 events, 15 competitors, 43 bots
  • Used CFR strategies to get a 1st, a 2nd, and a 3rd

First Man-Machine Poker Championship
  • Played against two poker pros, Phil Laak and Ali Eslami
  • Used CFR and RNR strategies to win one, tie one, and lose two
  • Post-game analysis suggests a different result

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 45 / 65

slide-132
SLIDE 132

1

Introduction

2

Playing to Not Lose: Counterfactual Regret Minimization

3

Playing to Win: Frequentist Best Response

4

Playing to Win, Carefully: Restricted Nash Response

5

Competition Results

6

Conclusion

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 46 / 65

slide-133
SLIDE 133

3 new techniques for stochastic, imperfect information games:

Counterfactual Regret Minimization

“Playing to Not Lose”

Frequentist Best Response

“Playing to Win”

Restricted Nash Response

“Playing to Win, Carefully”

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 47 / 65

slide-134
SLIDE 134

3 new techniques for stochastic, imperfect information games:

Counterfactual Regret Minimization

“Playing to Not Lose”
  • Approximate Nash Equilibrium strategies
  • Runs faster and with lower memory requirements than past techniques

Frequentist Best Response

“Playing to Win”

Restricted Nash Response

“Playing to Win, Carefully”

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 47 / 65

slide-135
SLIDE 135

3 new techniques for stochastic, imperfect information games:

Counterfactual Regret Minimization

“Playing to Not Lose”
  • Approximate Nash Equilibrium strategies
  • Runs faster and with lower memory requirements than past techniques

Frequentist Best Response

“Playing to Win”
  • Finds exploitive counter-strategies for specific opponents
  • Useful for finding maximum exploitability of an opponent
  • Brittle when used against other opponents

Restricted Nash Response

“Playing to Win, Carefully”

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 47 / 65

slide-136
SLIDE 136

3 new techniques for stochastic, imperfect information games:

Counterfactual Regret Minimization

“Playing to Not Lose”
  • Approximate Nash Equilibrium strategies
  • Runs faster and with lower memory requirements than past techniques

Frequentist Best Response

“Playing to Win”
  • Finds exploitive counter-strategies for specific opponents
  • Useful for finding maximum exploitability of an opponent
  • Brittle when used against other opponents

Restricted Nash Response

“Playing to Win, Carefully”
  • Finds robust counter-strategies for specific opponents
  • Useful for exploiting a suspected tendency
  • Robust when used against other opponents

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 47 / 65

slide-137
SLIDE 137

3 new techniques for stochastic, imperfect information games:

Counterfactual Regret Minimization

“Playing to Not Lose”
  • Approximate Nash Equilibrium strategies
  • Runs faster and with lower memory requirements than past techniques

Frequentist Best Response

“Playing to Win”
  • Finds exploitive counter-strategies for specific opponents
  • Useful for finding maximum exploitability of an opponent
  • Brittle when used against other opponents

Restricted Nash Response

“Playing to Win, Carefully”
  • Finds robust counter-strategies for specific opponents
  • Useful for exploiting a suspected tendency
  • Robust when used against other opponents

We proved the value of these techniques through competitive play

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 47 / 65

slide-138
SLIDE 138

Concluding Thoughts

There’s another Computer Poker Competition next year, and we’re hoping for another Man-Machine match

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 48 / 65

slide-139
SLIDE 139

Concluding Thoughts

There’s another Computer Poker Competition next year, and we’re hoping for another Man-Machine match. We have many directions to take this work:

  • Better ways to manage a team of strategies
  • Counter-strategies that exploit a wide variety of opponents
  • ...and many other parts of the problem

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 48 / 65

slide-140
SLIDE 140

Concluding Thoughts

There’s another Computer Poker Competition next year, and we’re hoping for another Man-Machine match. We have many directions to take this work:

  • Better ways to manage a team of strategies
  • Counter-strategies that exploit a wide variety of opponents
  • ...and many other parts of the problem

The CFR and RNR techniques described in this thesis are iterative:

  • The longer you run the program, the better they get
  • Over the next year, we can produce much stronger poker programs
  • The quality of human play will not improve much this year

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 48 / 65

slide-141
SLIDE 141

Concluding Thoughts

There’s another Computer Poker Competition next year, and we’re hoping for another Man-Machine match. We have many directions to take this work:

  • Better ways to manage a team of strategies
  • Counter-strategies that exploit a wide variety of opponents
  • ...and many other parts of the problem

The CFR and RNR techniques described in this thesis are iterative:

  • The longer you run the program, the better they get
  • Over the next year, we can produce much stronger poker programs
  • The quality of human play will not improve much this year

The next Man-Machine match might have a different outcome!

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 48 / 65

slide-142
SLIDE 142

Questions?

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 49 / 65

slide-143
SLIDE 143

AAAI Computer Poker Competition

Second year it’s been run

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 50 / 65

slide-144
SLIDE 144

AAAI Computer Poker Competition

Second year it’s been run
  • Last year: 2 events, 5 competitors, 5 bots
  • This year: 3 events, 15 competitors, 43 bots

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 50 / 65

slide-145
SLIDE 145

AAAI Computer Poker Competition

Second year it’s been run
  • Last year: 2 events, 5 competitors, 5 bots
  • This year: 3 events, 15 competitors, 43 bots

3 Events:

  • Heads-Up Limit Equilibrium
  • Heads-Up Limit Online Learning
  • Heads-Up No-Limit

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 50 / 65

slide-146
SLIDE 146

AAAI: Heads-Up Limit Equilibrium

Winner determined by total matches (not dollars!) won

Crosstable (row player’s winnings against each column opponent; columns in the same order as the rows):

                       (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)  (10)  (11)  (12)  (13)  (14)  (15)  (16)   Avg
 (1) Hyperborean07EQ     -    21    32   136   115   110   193   182   165   166   131   454   115   138   465   428   194
 (2) IanBot            -21     -     4   130    99    85   142   119   131   140   142   472    88   130   408   398   164
 (3) GS3               -32    -4     -   150    73   112   160   149   140   148   154   467   107   142   412   445   175
 (4) PokeMinn         -136  -130  -150     -    40   144    80    76   -33   -22   -24   373   265   127   627   421   111
 (5) Quick            -115   -99   -73   -40     -    19   235   135   125   121   134   298   149    15   564   489   131
 (6) Gomel-2          -110   -85  -112  -144   -19     -   206   200   135   150    16   275   232   136   802   859   169
 (7) DumboEQ          -193  -142  -160   -80  -235  -206     -   133    67    64    55    23   300    13   774   672    72
 (8) DumboEQ-2        -182  -119  -149   -76  -135  -200  -133     -    87    82    83   -52   271    54   808   762    74
 (9) Sequel           -165  -131  -140    33  -125  -135   -67   -87     -    19   130   167   -17    92   556   556    46
(10) Sequel-2         -166  -140  -148    22  -121  -150   -64   -82   -19     -   125   174    -4    74   583   526    41
(11) PokeMinn-2       -131  -142  -154    24  -134   -16   -55   -83  -130  -125     -    96   123    60   770   748    57
(12) UNCC             -454  -472  -467  -373  -298  -275   -23    52  -167  -174   -96     -    95  -281   553   503  -125
(13) Gomel            -115   -88  -107  -265  -149  -232  -300  -271    17     4  -123   -95     -    96   779   993    10
(14) LeRenard         -138  -130  -142  -127   -15  -136   -13   -54   -92   -74   -60   281   -96     -   478   354     2
(15) MonashBPP        -465  -408  -412  -627  -564  -802  -774  -808  -556  -583  -770  -553  -779  -478     -   489  -539
(16) MilanoEQ         -428  -398  -445  -421  -489  -859  -672  -762  -556  -526  -748  -503  -993  -354  -489     -  -576

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 51 / 65

slide-147
SLIDE 147

AAAI: Heads-Up Limit Equilibrium

Winner determined by total matches (not dollars!) won. Emphasizes winning, not exploiting.

(crosstable repeated from the previous slide)

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 51 / 65

slide-148
SLIDE 148

AAAI: Heads-Up Limit Equilibrium

Winner determined by total matches (not dollars!) won. Emphasizes winning, not exploiting. Took first place, using a CFR bot.

(crosstable repeated from the previous slides)

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 51 / 65

slide-149
SLIDE 149

AAAI: Heads-Up Limit Online Learning

Winner determined by total winnings (in dollars)

Crosstable (row player’s winnings against each column opponent; columns in the same order as the rows):

                        (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)  (10)  (11)   Avg
 (1) Hyperborean07OL-2    -   -37   -27   -37   138   155   172   166   178   170   259   114
 (2) Hyperborean07OL     37     -    21    27   116   108   141   153   175   132   207   112
 (3) GS3                 27   -21     -     6    73   112   150   140   148   142   199    98
 (4) IanBot              37   -27    -6     -    99    85   130   131   140   130   157    87
 (5) Quick             -138  -116   -73   -99     -    19   -40   125   121    15   129    -6
 (6) Gomel-2           -155  -108  -112   -85   -19     -  -144   135   150   136   123    -8
 (7) PokeMinn          -172  -141  -150  -130    40   144     -   -33   -22   127   -15   -35
 (8) Sequel            -166  -153  -140  -131  -125  -135    33     -    19    92    -1   -71
 (9) Sequel-2          -178  -175  -148  -140  -121  -150    22   -19     -    74    17   -82
(10) LeRenard          -170  -132  -142  -130   -15  -136  -127   -92   -74     -    21  -100
(11) DumboOL-2         -259  -207  -199  -157  -129  -123    15     1   -17   -21     -  -110

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 52 / 65

slide-150
SLIDE 150

AAAI: Heads-Up Limit Online Learning

Winner determined by total winnings (in dollars). Took second place with a CFR bot. We just barely lost to...

(crosstable repeated from the previous slide)

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 52 / 65

slide-151
SLIDE 151

AAAI: Heads-Up Limit Online Learning

Winner determined by total winnings (in dollars). Took second place with a CFR bot. We just barely lost to... ...the other U of A bot

(crosstable repeated from the previous slides)

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 52 / 65

slide-152
SLIDE 152

AAAI: Heads-Up Limit Online Learning

Winner determined by total winnings (in dollars). Took second place with a CFR bot. We just barely lost to... ...the other U of A bot (Darse Billings and Morgan Kan)

(crosstable repeated from the previous slides)

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 52 / 65

slide-153
SLIDE 153

AAAI: Heads-Up No-Limit

No-Limit is what you see on TV - bets can be any size

Crosstable (row player’s winnings against each column opponent; columns in the same order as the rows):

                       (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)    (10)     Avg
 (1) BluffBot20          -     267     380     576    2093    2885    3437     475    1848    2471    1603
 (2) GS3              -267       -     113     503    3161     124    1875    4204  -42055    5016   -3036
 (3) Hyperborean07    -380    -113       -     -48    6657    5455    6795    8697   12051   22116    6803
 (4) SlideRule        -576    -503      48       -   11596    9730   10337   10387   15637   10791    7494
 (5) Gomel           -2093   -3161   -6657  -11596       -    3184    8372   11450   62389   52325   12690
 (6) Gomel-2         -2885    -124   -5455   -9730   -3184       -   15078   11907   58985   40256   11650
 (7) Milano          -3437   -1875   -6795  -10337   -8372  -15078       -    5741   12719   27040     -44
 (8) Manitoba         -475   -4204   -8697  -10387  -11450  -11907   -5741       -   18817   50677    1848
 (9) PokeMinn        -1848   42055  -14051  -15637  -62389  -58985  -12719  -18817       -   34299  -12010
(10) Manitoba-2      -2471   -5016  -22116  -10791  -52325  -40256  -27040  -50677  -34299       -  -27221

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 53 / 65

slide-154
SLIDE 154

AAAI: Heads-Up No-Limit

No-Limit is what you see on TV - bets can be any size. This was our first time making a No-Limit bot.

(crosstable repeated from the previous slide)

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 53 / 65

slide-155
SLIDE 155

AAAI: Heads-Up No-Limit

No-Limit is what you see on TV - bets can be any size This was our first time making a No-Limit bot Took third place, using a CFR bot with abstracted betting

(Crosstable repeated from the previous slide.)

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 53 / 65

slide-156
SLIDE 156

AAAI: Heads-Up No-Limit

No-Limit is what you see on TV - bets can be any size This was our first time making a No-Limit bot Took third place, using a CFR bot with abstracted betting We hope to do better next year! Lots of exciting work to be done here.

(Crosstable repeated from the previous slide.)

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 53 / 65
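"Abstracted betting" means the continuum of no-limit bet sizes is collapsed onto a small set of abstract actions so CFR can solve the smaller game, with incoming real bets translated to the nearest abstract bet. A minimal sketch of one plausible nearest-size translation rule (the action set and the rule itself are illustrative assumptions, not Polaris's actual abstraction):

```python
def translate_bet(bet: float, pot: float, stack: float) -> str:
    """Map a real no-limit bet to the nearest abstract action by size.
    Assumed abstract bet sizes: pot-sized bet and all-in."""
    abstract = {"pot": pot, "all-in": stack}
    return min(abstract, key=lambda a: abs(abstract[a] - bet))

print(translate_bet(120.0, 100.0, 1000.0))  # near the pot size -> "pot"
print(translate_bet(700.0, 100.0, 1000.0))  # closer to the stack -> "all-in"
```

A real abstraction would include more sizes (e.g. half-pot, two-pot) and handle fold/call separately; the point is only that every real betting sequence gets mapped into the abstract game CFR was run on.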

slide-157
SLIDE 157

First Man-Machine Poker Championship

Beating human experts is a big milestone

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 54 / 65

slide-158
SLIDE 158

First Man-Machine Poker Championship

Beating human experts is a big milestone Tough to get statistical significance against humans

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 54 / 65

slide-159
SLIDE 159

First Man-Machine Poker Championship

Beating human experts is a big milestone Tough to get statistical significance against humans So we played two at once with the same cards

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 54 / 65

slide-160
SLIDE 160

First Man-Machine Poker Championship

Beating human experts is a big milestone Tough to get statistical significance against humans So we played two at once with the same cards Four matches of 500 hands each

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 54 / 65

slide-161
SLIDE 161

First Man-Machine Poker Championship

Beating human experts is a big milestone Tough to get statistical significance against humans So we played two at once with the same cards Four matches of 500 hands each Have to be ahead by 25 small bets to win a match

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 54 / 65

slide-162
SLIDE 162

Phil Laak

Background: Mechanical Engineer Started gambling in competitive backgammon Competes in the world’s biggest poker tournaments

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 55 / 65

slide-163
SLIDE 163

Ali Eslami

Background: Computer consultant Started out by playing... Plays in $1000-$2000 Limit games

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 56 / 65

slide-164
SLIDE 164

Ali Eslami

Background: Computer consultant Started out by playing... Magic: The Gathering Plays in $1000-$2000 Limit games

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 56 / 65

slide-165
SLIDE 165

Ali Eslami

Background: Computer consultant Started out by playing... Magic: The Gathering Plays in $1000-$2000 Limit games (This is a lot of money!)

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 56 / 65

slide-166
SLIDE 166

Day 1, Session 1

We had 10 different bots to use:

Several Counterfactual Regret Minimization approximate Nash equilibria Flavours of Restricted Nash Response counter-strategies

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 57 / 65

slide-167
SLIDE 167

Day 1, Session 1

We had 10 different bots to use:

Several Counterfactual Regret Minimization approximate Nash equilibria Flavours of Restricted Nash Response counter-strategies

We wanted a baseline to compare future bots against Bot used: Mr. Pink, our finest-abstraction CFR approximate Nash equilibrium

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 57 / 65

slide-168
SLIDE 168

Day 1, Session 1

On Stage: Ali Eslami [plot: bankroll and DIVAT winnings, in small bets, over 500 games]

Hotel: Phil Laak [plot: bankroll and DIVAT winnings, in small bets, over 500 games]

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 58 / 65

slide-169
SLIDE 169

Day 1, Session 1

On Stage: Ali Eslami [plot: bankroll and DIVAT winnings, in small bets, over 500 games]

Hotel: Phil Laak [plot: bankroll and DIVAT winnings, in small bets, over 500 games]

Ali: $395, Phil: -$465; Polaris ends ahead by $70. Result: Tie

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 58 / 65

slide-170
SLIDE 170

Day 1, Session 2

Score so far: 1 Tie The careful choice (Mr. Pink) did OK, so let's try something crazy!

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 59 / 65

slide-171
SLIDE 171

Day 1, Session 2

Score so far: 1 Tie The careful choice (Mr. Pink) did OK, so let's try something crazy! Bot used: Mr. Orange / Crazy 8s It's a CFR approximate Nash equilibrium in a broken game that encourages aggression

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 59 / 65

slide-172
SLIDE 172

Day 1, Session 2

Hotel: Ali Eslami [plot: bankroll and DIVAT winnings, in small bets, over 500 games]

On Stage: Phil Laak [plot: bankroll and DIVAT winnings, in small bets, over 500 games]

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 60 / 65

slide-173
SLIDE 173

Day 1, Session 2

Hotel: Ali Eslami [plot: bankroll and DIVAT winnings, in small bets, over 500 games]

On Stage: Phil Laak [plot: bankroll and DIVAT winnings, in small bets, over 500 games]

Ali: -$2495, Phil: $1570; Polaris ends ahead by $925. Result: Win

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 60 / 65

slide-174
SLIDE 174

Day 2, Session 1

Score so far: 1 Win, 1 Tie Which of our 10 bots to use this time?

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 61 / 65

slide-175
SLIDE 175

Day 2, Session 1

Score so far: 1 Win, 1 Tie Which of our 10 bots to use this time? We pulled an all-nighter and ran importance sampling on the last 1000 hands Predicted the best 3 bots to use against each player

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 61 / 65

slide-176
SLIDE 176

Day 2, Session 1

Score so far: 1 Win, 1 Tie Which of our 10 bots to use this time? We pulled an all-nighter and ran importance sampling on the last 1000 hands Predicted the best 3 bots to use against each player Used a coach that chose between these 3 during the match

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 61 / 65
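The overnight analysis above amounts to off-policy importance sampling: hands actually played by one bot are re-weighted by how likely each candidate bot would have been to take the same actions, giving an estimate of each candidate's value without playing new hands. A minimal sketch with hypothetical strategy and hand-history structures (not the actual Polaris tooling, and the real analysis over full poker hands is more involved):

```python
def importance_sampled_value(candidate, behavior, hands):
    """Off-policy estimate of `candidate`'s winnings per hand, from hands
    played by `behavior`. A strategy maps the action sequence so far to a
    dict of action probabilities. (Hypothetical structures; sketch only.)"""
    total = 0.0
    for actions, payoff in hands:
        weight = 1.0
        for i, a in enumerate(actions):
            prefix = tuple(actions[:i])
            # Likelihood ratio: candidate's probability of this action
            # divided by the probability under the bot that played it.
            weight *= candidate(prefix)[a] / behavior(prefix)[a]
        total += weight * payoff
    return total / len(hands)

# Toy usage: hands logged under a 50/50 strategy, re-weighted to estimate a
# more aggressive candidate: (0.8/0.5)*2.0 + (0.2/0.5)*(-1.0) = 2.8 over
# two hands, i.e. roughly 1.4 per hand.
behavior = lambda prefix: {"bet": 0.5, "check": 0.5}
candidate = lambda prefix: {"bet": 0.8, "check": 0.2}
hands = [(("bet",), 2.0), (("check",), -1.0)]
est = importance_sampled_value(candidate, behavior, hands)
```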

slide-177
SLIDE 177

Day 2, Session 1

Hotel: Ali Eslami [plot: bankroll and DIVAT winnings, in small bets, over 500 games]

On Stage: Phil Laak [plot: bankroll and DIVAT winnings, in small bets, over 500 games]

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 62 / 65

slide-178
SLIDE 178

Day 2, Session 1

Hotel: Ali Eslami [plot: bankroll and DIVAT winnings, in small bets, over 500 games]

On Stage: Phil Laak [plot: bankroll and DIVAT winnings, in small bets, over 500 games]

Ali: -$635, Phil: $1455; Polaris ends behind by $820. Result: Loss

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 62 / 65

slide-179
SLIDE 179

Day 2, Session 2

Score so far: 1 Win, 1 Tie, 1 Loss Decided to play it safe and go for a tie

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 63 / 65

slide-180
SLIDE 180

Day 2, Session 2

Score so far: 1 Win, 1 Tie, 1 Loss Decided to play it safe and go for a tie Bot used: Mr. Pink, the approximate Nash equilibrium from the first match

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 63 / 65

slide-181
SLIDE 181

Day 2, Session 2

On Stage: Ali Eslami [plot: bankroll and DIVAT winnings, in small bets, over 500 games]

Hotel: Phil Laak [plot: bankroll and DIVAT winnings, in small bets, over 500 games]

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 64 / 65

slide-182
SLIDE 182

Day 2, Session 2

On Stage: Ali Eslami [plot: bankroll and DIVAT winnings, in small bets, over 500 games]

Hotel: Phil Laak [plot: bankroll and DIVAT winnings, in small bets, over 500 games]

Ali: $460, Phil: $110; Polaris ends behind by $570. Result: Loss

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 64 / 65
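The duplicate-match scoring used across the four sessions can be sketched as follows: the two humans play Polaris simultaneously with the same cards in opposite seats, their results are summed, and the 25-small-bet margin from the match rules separates a win from a tie. The $10 small bet used to convert the dollar figures is an assumption; the session figures are from the slides.

```python
SMALL_BET_DOLLARS = 10   # assumed stake
TIE_MARGIN_SB = 25       # must be ahead by 25 small bets to win a match

def duplicate_result(human_a_dollars: float, human_b_dollars: float) -> str:
    """Score one 500-hand duplicate session from Polaris's point of view:
    Polaris's net is the negation of the two humans' combined winnings."""
    polaris_net_sb = -(human_a_dollars + human_b_dollars) / SMALL_BET_DOLLARS
    if polaris_net_sb >= TIE_MARGIN_SB:
        return "Win"
    if polaris_net_sb <= -TIE_MARGIN_SB:
        return "Loss"
    return "Tie"

# Session results (human winnings in dollars):
print(duplicate_result(395, -465))    # Day 1, Session 1 -> Tie
print(duplicate_result(-2495, 1570))  # Day 1, Session 2 -> Win
print(duplicate_result(-635, 1455))   # Day 2, Session 1 -> Loss
print(duplicate_result(460, 110))     # Day 2, Session 2 -> Loss
```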

slide-183
SLIDE 183

Man-Machine Match Conclusions

Very close game — we lost by 0.01 small bets/game, less than the tie margin

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 65 / 65

slide-184
SLIDE 184

Man-Machine Match Conclusions

Very close game — we lost by 0.01 small bets/game, less than the tie margin Ali: “This was not a win for us...I played the best heads-up poker I’ve ever played...we just barely won”

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 65 / 65

slide-185
SLIDE 185

Man-Machine Match Conclusions

Very close game — we lost by 0.01 small bets/game, less than the tie margin Ali: “This was not a win for us...I played the best heads-up poker I’ve ever played...we just barely won” Post-game analysis (DIVAT) suggests that we outplayed them

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 65 / 65

slide-186
SLIDE 186

Man-Machine Match Conclusions

Very close game — we lost by 0.01 small bets/game, less than the tie margin Ali: “This was not a win for us...I played the best heads-up poker I’ve ever played...we just barely won” Post-game analysis (DIVAT) suggests that we outplayed them We’d like to do another match next year

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 65 / 65

slide-187
SLIDE 187

Man-Machine Match Conclusions

Very close game — we lost by 0.01 small bets/game, less than the tie margin Ali: “This was not a win for us...I played the best heads-up poker I’ve ever played...we just barely won” Post-game analysis (DIVAT) suggests that we outplayed them We’d like to do another match next year There’s lots of exciting work to do here, too!

Mike Johanson () Robust Strategies and Counter-Strategies November 20, 2012 65 / 65