Richard Gibson Ph.D. Thesis Presentation December 6, 2013


slide-1
SLIDE 1

Regret Minimization in Games and the Development of Champion Multiplayer Computer Poker Agents Richard Gibson

Ph.D. Thesis Presentation December 6, 2013

slide-2
SLIDE 2

Computer Poker Research Group

slide-3
SLIDE 3

Heads Up Limit Texas Hold'em

Source: ebaumsworld.com

Bet! Fold? Call? Raise?

slide-4
SLIDE 4

Heads Up No-limit Texas Hold'em

Source: ebaumsworld.com

All-in! Bet!

slide-5
SLIDE 5

3-Player Limit Texas Hold'em

Source: ebaumsworld.com

Fold? Call? Raise? Bet! Call.

Source: toonpool.com

slide-6
SLIDE 6

3-Player Limit Texas Hold'em

Source: ebaumsworld.com Source: toonpool.com

2010 - 2013

Hyperborean3p

slide-7
SLIDE 7

Hyperborean3p

  • No theory

– 3-player
– Imperfect recall

  • Slow
  • Memory expensive

2009

slide-8
SLIDE 8

Hyperborean3p

  • No theory

– 3-player
– Imperfect recall

  • Slow
  • Memory expensive

2009

  • New theory

– Many players
– Imperfect recall

  • Fast
  • Improved performance with limited memory

2013

slide-9
SLIDE 9

Outline of Presentation

  • Background

– Counterfactual Regret Minimization (CFR)

  • Theoretical Advancements for CFR in:

– Many player games
– Imperfect recall games

  • CFR Speed-Ups
  • Tricks with Memory Limitations
  • Conclusion + Future Work
slide-10
SLIDE 10

Outline of Presentation

  • Background

– Counterfactual Regret Minimization (CFR)

  • Theoretical Advancements for CFR in:

– Many player games
– Imperfect recall games

  • CFR Speed-Ups
  • Tricks with Memory Limitations
  • Conclusion + Future Work
slide-11
SLIDE 11

Background - Kuhn Poker

slide-12
SLIDE 12

Background - Kuhn Poker

[Figure: game tree showing only the chance node c at the root.]

slide-13
SLIDE 13

Background - Kuhn Poker

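[Figure: game tree. The chance node c deals the cards; the deals QJ and QK are shown, each with probability 1/6, and the remaining deals are elided (...).]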

slide-14
SLIDE 14

Background - Kuhn Poker

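Check / Bet?

[Figure: game tree. After the chance node c deals QJ or QK (probability 1/6 each, other deals elided), player 1 chooses bet (b) or check (c). Player 1's decision nodes are marked as an information set.]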

slide-15
SLIDE 15

Background - Kuhn Poker

Bet! Fold / Call?

[Figure: game tree. After player 1 bets (b), player 2 chooses fold (f) or call (c).]

slide-16
SLIDE 16

Background - Kuhn Poker

Bet! Fold. Payoffs: +1 / -1

[Figure: game tree. When player 2 folds after a bet, the leaf is worth +1 to player 1.]

slide-17
SLIDE 17

Background - Kuhn Poker

Bet! Call. Payoffs: +2 / -2 or -2 / +2

[Figure: game tree. When player 2 calls a bet, the leaf is worth +2 or -2 to player 1, depending on the deal.]

slide-18
SLIDE 18

Background - Kuhn Poker

Check. Check / Bet?

[Figure: game tree. After player 1 checks (c), player 2 chooses check (c) or bet (b).]

slide-19
SLIDE 19

Background - Kuhn Poker

Check. Check. Payoffs: +1 / -1 or -1 / +1

[Figure: game tree. When both players check, the leaf is worth +1 or -1 to player 1, depending on the deal.]

slide-20
SLIDE 20

Background - Kuhn Poker

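Bet! Fold / Call?

[Figure: complete game tree. After player 1 checks and player 2 bets, player 1 chooses fold (f), worth -1 to player 1, or call (c), worth +2 or -2 depending on the deal. Player 1's decision nodes are marked as an information set.]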
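The rules walked through on the Kuhn poker slides can be collected into a small payoff function. This is a minimal sketch, assuming the standard three-card deck (J, Q, K); the slides show only the QJ and QK deals, so the third card is an assumption. The names `deals` and `payoff` are hypothetical, and histories are written with the slide labels b (bet), c (check or call) and f (fold).

```python
from fractions import Fraction
from itertools import permutations

CARDS = "JQK"  # assumed deck, ranked from lowest to highest


def deals():
    """Map each ordered deal (player 1 card, player 2 card) to its probability."""
    all_deals = list(permutations(CARDS, 2))
    return {deal: Fraction(1, len(all_deals)) for deal in all_deals}


def payoff(deal, history):
    """Player 1's payoff at a terminal history; player 2 receives the negative."""
    p1_has_higher = CARDS.index(deal[0]) > CARDS.index(deal[1])
    showdown = 1 if p1_has_higher else -1
    if history == "cc":            # check, check: showdown for 1
        return showdown
    if history == "bf":            # player 1 bets, player 2 folds
        return 1
    if history == "cbf":           # player 1 checks, player 2 bets, player 1 folds
        return -1
    if history in ("bc", "cbc"):   # a bet is called: showdown for 2
        return 2 * showdown
    raise ValueError(f"not a terminal history: {history!r}")
```

For example, `payoff(("Q", "J"), "bc")` is the "Bet! Call." case with player 1 holding the higher card.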

slide-21
SLIDE 21

Background

[Figure: the complete Kuhn poker game tree.]

In general:

Extensive-Form Game

slide-22
SLIDE 22

Background

[Figure: the Kuhn poker game tree with a probability on each action, for example .4 / .6, .1 / .9, .8 / .2 and .7 / .3.]

In general:

Extensive-Form Game

Strategy Profile

slide-23
SLIDE 23

Background

In general:

Extensive-Form Game

Nash Equilibrium Strategy Profile

Nash equilibrium:

“No one can change their strategy and do any better.”

slide-24
SLIDE 24

Background

In general:

Extensive-Form Game

Nash Equilibrium Strategy Profile

Nash equilibrium:

“No one can change their strategy and do any better.”

1/3 1/3 1/3

Every game has a Nash equilibrium.
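The quoted definition can be checked mechanically in a small matrix game. This is a minimal sketch, assuming the "1/3 1/3 1/3" on the slide refers to rock-paper-scissors (the slide text does not name the game); `PAYOFF`, `expected_payoff` and `is_nash` are hypothetical names. Checking pure-strategy deviations is enough, because a mixed deviation can never do better than the best pure one.

```python
from fractions import Fraction

# Row player's payoff in rock-paper-scissors (assumed example game).
# The game is zero-sum, so the column player receives the negative.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def expected_payoff(row_strategy, col_strategy):
    """Expected payoff to the row player under two mixed strategies."""
    return sum(
        row_strategy[i] * col_strategy[j] * PAYOFF[i][j]
        for i in range(3)
        for j in range(3)
    )


def is_nash(row_strategy, col_strategy):
    """True if neither player can gain by switching to any pure strategy."""
    value = expected_payoff(row_strategy, col_strategy)
    best_row = max(
        sum(col_strategy[j] * PAYOFF[i][j] for j in range(3)) for i in range(3)
    )
    best_col = max(
        sum(-row_strategy[i] * PAYOFF[i][j] for i in range(3)) for j in range(3)
    )
    return best_row <= value and best_col <= -value


uniform = [Fraction(1, 3)] * 3
always_rock = [Fraction(1), Fraction(0), Fraction(0)]
```

Here `is_nash(uniform, uniform)` holds, while `is_nash(always_rock, uniform)` does not, because the column player would rather switch to paper.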

slide-25
SLIDE 25

Background

In general:

Extensive-Form Game

Nash Equilibrium Strategy Profile

Nash equilibrium:

“No one can change their strategy and do any better.”

1/3 1/3 1/3

Every game has a Nash equilibrium.

?

slide-26
SLIDE 26

Outline of Presentation

  • Background

– Counterfactual Regret Minimization (CFR)

  • Theoretical Advancements for CFR in:

– Many player games
– Imperfect recall games

  • CFR Speed-Ups
  • Tricks for CFR with Memory Limitations
  • Conclusion + Future Work
slide-27
SLIDE 27

CFR

  • “The alpha-beta search of imperfect information games.”

[Figure: the Kuhn poker game tree.]

slide-28
SLIDE 28

CFR

  • “The alpha-beta search of imperfect information games.”
  • Offline algorithm

[Figure: the Kuhn poker game tree.]

slide-29
SLIDE 29

CFR

  • “The alpha-beta search of imperfect information games.”
  • Offline algorithm
  • Iterative, “self-play”

[Figure: the Kuhn poker game tree with probability .5 on every action.]

slide-30
SLIDE 30

CFR

  • “The alpha-beta search of imperfect information games.”
  • Offline algorithm
  • Iterative, “self-play”
  • For each iteration, update action probabilities at every information set.

[Figure: the Kuhn poker game tree with updated action probabilities, for example .3 / .7 and .8 / .2, alongside actions still at .5.]
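The slide does not say how the action probabilities are updated. In CFR the usual rule is regret matching, which plays each action in proportion to its positive cumulative regret. The following is a minimal sketch of that rule for a single information set; the function name `regret_matching` is hypothetical, and the regret values in the usage note are made up for illustration.

```python
def regret_matching(cumulative_regrets):
    """Turn cumulative regrets into action probabilities.

    Actions with positive regret are played in proportion to that regret.
    If no action has positive regret, fall back to the uniform strategy.
    """
    positive = [max(regret, 0.0) for regret in cumulative_regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(positive)] * len(positive)
```

With cumulative regrets of 3, -1 and 1, the rule puts probability 0.75 on the first action, 0 on the second and 0.25 on the third.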

slide-31
SLIDE 31

CFR

Average Strategy Profile T = (Strategy 1 + Strategy 2 + ... + Strategy T) / T

Average Strategy Profile T → Nash Equilibrium Strategy Profile

T = number of iterations
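To make the averaging concrete, here is a compact sketch of CFR on Kuhn poker. It is an illustration under stated assumptions, not the implementation behind the talk: the deck (J, Q, K), the action encoding (`p` for check or fold, `b` for bet or call) and every name below are choices made for this example. The constant chance probability 1/6 is left out of the regret updates because it scales all regrets equally.

```python
from itertools import permutations

CARDS = "JQK"    # assumed three-card deck, ranked low to high
ACTIONS = "pb"   # assumed encoding: p = check or fold, b = bet or call


class InfoSet:
    def __init__(self):
        self.regret_sum = [0.0, 0.0]
        self.strategy_sum = [0.0, 0.0]
        self.current = [0.5, 0.5]

    def update_current(self):
        """Regret matching: weight actions by positive cumulative regret."""
        positive = [max(r, 0.0) for r in self.regret_sum]
        total = sum(positive)
        self.current = [p / total for p in positive] if total > 0 else [0.5, 0.5]

    def average_strategy(self):
        """Average of the strategies played so far, weighted by own reach."""
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]


def terminal_payoff(cards, history):
    """Payoff to the player who would act next, or None if play continues."""
    if len(history) < 2:
        return None
    player = len(history) % 2
    higher = CARDS.index(cards[player]) > CARDS.index(cards[1 - player])
    if history[-2:] == "pp":    # both checked: showdown for 1
        return 1 if higher else -1
    if history[-2:] == "bb":    # bet and call: showdown for 2
        return 2 if higher else -2
    if history[-1] == "p":      # a bet was folded to: the bettor wins 1
        return 1
    return None


def cfr(nodes, cards, history, reach0, reach1):
    """Return the expected payoff to the player to act and accumulate regrets."""
    payoff = terminal_payoff(cards, history)
    if payoff is not None:
        return payoff
    player = len(history) % 2
    node = nodes.setdefault(cards[player] + history, InfoSet())
    strategy = node.current
    my_reach, opp_reach = (reach0, reach1) if player == 0 else (reach1, reach0)
    values = []
    for a, action in enumerate(ACTIONS):
        if player == 0:
            child = cfr(nodes, cards, history + action, reach0 * strategy[a], reach1)
        else:
            child = cfr(nodes, cards, history + action, reach0, reach1 * strategy[a])
        values.append(-child)   # the child value belongs to the opponent
    node_value = sum(s * v for s, v in zip(strategy, values))
    for a in range(2):
        node.regret_sum[a] += opp_reach * (values[a] - node_value)
        node.strategy_sum[a] += my_reach * strategy[a]
    return node_value


def train(iterations):
    """Run CFR; return the information sets and player 1's average payoff."""
    nodes, total = {}, 0.0
    all_deals = list(permutations(CARDS, 2))
    for _ in range(iterations):
        for cards in all_deals:
            total += cfr(nodes, cards, "", 1.0, 1.0) / len(all_deals)
        for node in nodes.values():   # strategies change only between iterations
            node.update_current()
    return nodes, total / iterations
```

Calling `train(5000)` and reading `average_strategy()` from each information set gives the averaged profile. In this two-player zero-sum game the average payoff to player 1 should approach the known Kuhn poker value of -1/18.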

slide-32
SLIDE 32

Background

Extensive-Form Game

Nash Equilibrium Strategy Profile

CFR

slide-33
SLIDE 33

Background

Kuhn Poker

Nash Equilibrium Strategy Profile

CFR

slide-34
SLIDE 34

Background

Texas Hold'em

Nash Equilibrium Strategy Profile

CFR

> 10^14 information sets, > 5 million GB

slide-35
SLIDE 35

Background

Large Extensive-Form Game

Nash Equilibrium Strategy Profile

?

slide-36
SLIDE 36

Background

Large Extensive-Form Game Abstract Game

slide-37
SLIDE 37

Background

Abstract Game

  • Merge card deals into buckets.

Extensive-Form Game

slide-38
SLIDE 38

Background

Abstract Game

  • Merge card deals into buckets.

Extensive-Form Game
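One simple way to picture bucketing is to sort deals by some strength score and cut the sorted list into equal-sized groups. This is only a sketch under that assumption: the slide does not say which abstraction method was used, and the function name `assign_buckets`, the hand labels and the strength scores below are all hypothetical.

```python
def assign_buckets(strengths, num_buckets):
    """Map each deal to a bucket index in 0..num_buckets-1 by strength rank.

    `strengths` maps a deal label to a numeric score; deals with similar
    scores land in the same bucket and are treated as one abstract deal.
    """
    ranked = sorted(strengths, key=strengths.get)
    return {
        deal: rank * num_buckets // len(ranked)
        for rank, deal in enumerate(ranked)
    }


# Hypothetical scores for four starting hands (illustrative values only).
example_strengths = {"AA": 0.85, "KK": 0.82, "72": 0.35, "83": 0.37}
example_buckets = assign_buckets(example_strengths, 2)
```

In this example the two weak hands share bucket 0 and the two strong hands share bucket 1, so the abstract game has two deals where the original had four.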

slide-39
SLIDE 39

Background

Extensive-Form Game: > 10^14

Abstract Game: ≈ 10^9

slide-40
SLIDE 40

Background

Extensive-Form Game: > 10^14

Abstract Game: ≈ 10^9

Abstract Game Equilibrium Strategy

CFR

slide-41
SLIDE 41

Background

Extensive-Form Game: > 10^14

Abstract Game: ≈ 10^9

Abstract Game Equilibrium Strategy

Approximate Full Game Equilibrium Strategy

≈100 GB

CFR

slide-42
SLIDE 42

Outline of Presentation

  • Background

– Counterfactual Regret Minimization (CFR)

  • Theoretical Advancements for CFR in:

– Many player games
– Imperfect recall games

  • CFR Speed-Ups
  • Tricks with Memory Limitations
  • Conclusion + Future Work
slide-43
SLIDE 43

Theory – Many Player Games

Extensive-Form Game

Nash Equilibrium Strategy Profile

CFR

LIAR!

slide-44
SLIDE 44

Theory – Many Player Games

3-or-more Player Game

?

(Not equilibrium)

CFR

2-player Zero-Sum Game

Nash Equilibrium Strategy Profile

CFR

slide-45
SLIDE 45

Theory – Many Player Games

3-player Limit Texas Hold'em

Good strategy? (Not equilibrium)

CFR

Annual Computer Poker Competition 3-Player Limit Texas Hold'em - 2009

Agent            Total Bankroll (mbb/g)
Hyperborean3p     319 ± 2
dpp               171 ± 2
akuma             151 ± 2
CMURingLimit      -37 ± 2
dcu3pl            -63 ± 2
Bluechip         -548 ± 2

slide-46
SLIDE 46

Theory – Many Player Games

[Figure: the Kuhn poker game tree.]

slide-47
SLIDE 47

Theory – Many Player Games

[Figure: the Kuhn poker game tree.]

slide-48
SLIDE 48

Theory – Many Player Games

Dominated Strategies

[Figure: the Kuhn poker game tree, with dominated strategies marked.]

slide-49
SLIDE 49

Theory – Many Player Games

[Figure: Kuhn poker game tree after pruning; fewer actions and leaves remain than in the full tree.]

slide-50
SLIDE 50

Theory – Many Player Games

Iteratively Dominated Strategy

[Figure: pruned Kuhn poker game tree, with an iteratively dominated strategy marked.]
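Iterated dominance is easiest to see in a small matrix game. The sketch below repeatedly deletes pure strategies that are strictly dominated by another pure strategy until nothing more can be removed. It is a simplification for illustration: it works on a two-player normal-form game rather than an extensive-form game, it ignores domination by mixed strategies, and the function name and example payoffs are hypothetical.

```python
def iterated_elimination(row_payoffs, col_payoffs):
    """Return the row and column strategies that survive iterated elimination.

    row_payoffs[r][c] and col_payoffs[r][c] are the payoffs to the row and
    column player when row r meets column c.
    """
    rows = list(range(len(row_payoffs)))
    cols = list(range(len(row_payoffs[0])))
    changed = True
    while changed:
        changed = False
        for r in list(rows):
            # r is dominated if some other row is strictly better in every column
            if any(
                all(row_payoffs[o][c] > row_payoffs[r][c] for c in cols)
                for o in rows if o != r
            ):
                rows.remove(r)
                changed = True
        for c in list(cols):
            # c is dominated if some other column is strictly better in every row
            if any(
                all(col_payoffs[r][o] > col_payoffs[r][c] for r in rows)
                for o in cols if o != c
            ):
                cols.remove(c)
                changed = True
    return rows, cols


# Hypothetical 2x3 game: rows U, D and columns L, M, R.
ROW = [[1, 1, 0],
       [0, 0, 2]]
COL = [[0, 2, 1],
       [3, 1, 0]]
survivors = iterated_elimination(ROW, COL)
```

In this example column R is dominated by M from the start. Once R is gone, row D becomes dominated by U, and then column L by M. D and L are therefore iteratively dominated even though they are not dominated in the original game.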

slide-51
SLIDE 51

Theory – Many Player Games

Average Strategy Profile T

No Iteratively Dominated Strategies 3-or-more Player Game

CFR New!

[G., arXiv ePrints 2013]

slide-52
SLIDE 52

Theory – Many Player Games

Average Strategy Profile T

No Iteratively Dominated Strategies 3-or-more Player Game

CFR New!

“Current” Strategy Profile T Finite T

No Iteratively Dominated Strategies 3-or-more Player Game

CFR New!

[G., arXiv ePrints 2013]

slide-53
SLIDE 53

Theory – Many Player Games

3-Player Limit Texas Hold'em - 2012

New!

slide-54
SLIDE 54

Outline of Presentation

  • Background

– Counterfactual Regret Minimization (CFR)

  • Theoretical Advancements for CFR in:

– Many player games
– Imperfect recall games

  • CFR Speed-Ups
  • Tricks with Memory Limitations
  • Conclusion + Future Work
slide-55
SLIDE 55

Imperfect Recall

Extensive-Form Game Abstract Game

Abstract Game Equilibrium Strategy

CFR

LIAR! LIAR!

slide-56
SLIDE 56

Imperfect Recall

“Imperfect Recall” Abstract Game

?

(Not equilibrium)

CFR

“Perfect Recall” Abstract Game

Abstract Game Equilibrium Strategy

CFR

slide-57
SLIDE 57

Imperfect Recall

Pre-flop

slide-58
SLIDE 58

Imperfect Recall

Pre-flop Flop

slide-59
SLIDE 59

Imperfect Recall

Imperfect Recall Abstract Game

slide-60
SLIDE 60

Imperfect Recall

Perfect Recall Abstract Game

slide-61
SLIDE 61

Imperfect Recall

Extensive-Form Game

“Well-formed” Imperfect Recall Game

Abstract Game Equilibrium Strategy

CFR

  • Unfortunately, this does not prove convergence for our best poker abstractions.
  • However, this was used in diabetes treatment research [Chen and Bowling, NIPS 2012].

[Lanctot, G., Burch and Bowling, ICML 2012]

New!

slide-62
SLIDE 62

Outline of Presentation

  • Background

– Counterfactual Regret Minimization (CFR)

  • Theoretical Advancements for CFR in:

– Many player games
– Imperfect recall games

  • CFR Speed-Ups
  • Tricks with Memory Limitations
  • Conclusion + Future Work
slide-63
SLIDE 63

CFR Speed-ups

[Figure: the Kuhn poker game tree with action probabilities, for example .3 / .7, .8 / .2 and .5 / .5.]

LIAR! LIAR! PANTS ON FIRE!

slide-64
SLIDE 64

CFR Speed-ups

  • Each iteration, only update action probabilities at a sampled subset of states.

[Figure: the Kuhn poker game tree with action probabilities, for example .3 / .7, .8 / .2 and .5 / .5.]

slide-65
SLIDE 65

CFR Speed-Ups

Chance Sampling

[Zinkevich et al., NIPS 2007]

slide-66
SLIDE 66

CFR Speed-Ups

Chance Sampling External Sampling

  • Faster iterations

– Use new strategies sooner

  • Need more iterations

– Good trade-off

[Zinkevich et al., NIPS 2007] [Lanctot et al., NIPS 2009]

slide-67
SLIDE 67

CFR Speed-Ups

Chance Sampling External Sampling Outcome Sampling

  • Even faster iterations
  • Even more iterations required
  • Good trade-off?

[Zinkevich et al., NIPS 2007] [Lanctot et al., NIPS 2009]

slide-68
SLIDE 68

CFR Speed-Ups

2-round Heads-Up No-Limit Hold'em, 36 chips per player

?

slide-69
SLIDE 69

CFR Speed-Ups

  • Outcome sampling introduces a lot of variance
  • T ≤ C + K∙Variance [G. et al., AAAI 2012]

– T = Iterations required to be “close enough” to equilibrium

– C, K = Constants

New!

[G., Lanctot, Burch, Szafron and Bowling, AAAI 2012]
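The variance point can be illustrated outside of CFR. Suppose a quantity is an expectation over a few outcomes, and instead of summing over all of them we sample one outcome and reweight it by its sampling probability. The estimate stays unbiased for any sampling distribution, but its variance depends on that distribution. The sketch below computes both exactly by enumeration. It is a simplified analogy, not the sampling scheme from the cited papers, and the function name and numbers are hypothetical.

```python
from fractions import Fraction


def estimator_stats(values, probs, sample_probs):
    """Exact mean and variance of a one-sample importance-weighted estimate.

    An outcome z is drawn with probability sample_probs[z], and the estimate
    is probs[z] * values[z] / sample_probs[z].
    """
    estimates = [p * v / q for v, p, q in zip(values, probs, sample_probs)]
    mean = sum(q * e for q, e in zip(sample_probs, estimates))
    variance = sum(q * (e - mean) ** 2 for q, e in zip(sample_probs, estimates))
    return mean, variance


values = [2, -1, 1]                                        # hypothetical payoffs
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]   # true outcome probabilities
uniform = [Fraction(1, 3)] * 3                             # an alternative sampling scheme

mean_true, var_true = estimator_stats(values, probs, probs)
mean_unif, var_unif = estimator_stats(values, probs, uniform)
```

Both schemes have mean 1, the true expectation. Sampling from the true probabilities gives variance 3/2, while uniform sampling gives 19/8. By the bound on the slide, higher variance raises the bound on the number of iterations T.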

slide-70
SLIDE 70

CFR Speed-Ups

Chance Sampling External Sampling Outcome Sampling Average Strategy Sampling New!

Average Strategy

[G. et al., NIPS 2012] [Zinkevich et al., NIPS 2007] [Lanctot et al., NIPS 2009]

slide-71
SLIDE 71

CFR Speed-Ups

2-round Heads-Up No-Limit Hold'em, 36 chips per player [G. et al., NIPS 2012]

slide-72
SLIDE 72

CFR Speed-Ups

[G. et al., NIPS 2012] 2-round Heads-Up No-Limit Hold'em, k chips per player

slide-73
SLIDE 73

Outline of Presentation

  • Background

– Counterfactual Regret Minimization (CFR)

  • Theoretical Advancements for CFR in:

– Many player games
– Imperfect recall games

  • CFR Speed-Ups
  • Tricks with Memory Limitations
  • Conclusion + Future Work
slide-74
SLIDE 74

Tricks with Memory Limitations

2-player Limit Texas Hold'em: ≈ 10^14

Abstract Game: ≈ 10^9

≈ 59,000,000 “Turn” Deals

540,000 “Turn” Buckets

slide-75
SLIDE 75

Tricks with Memory Limitations

2-player Limit Texas Hold'em: ≈ 10^14

Abstract Game: ≈ 10^9

≈ 59,000,000 “Turn” Deals

540,000 “Turn” Buckets

3-player Limit Texas Hold'em: ≈ 10^17

Abstract Game: ≈ 10^9

≈ 59,000,000 “Turn” Deals

540 “Turn” Buckets

slide-76
SLIDE 76

Tricks with Memory Limitations

3-Player Limit Texas Hold'em Abstract Game Abstract Game Strategy

3-player Limit Texas Hold'em Stitched Strategy

540 “Turn” Buckets

≈ 59,000,000 “Turn” Deals

slide-77
SLIDE 77

Tricks with Memory Limitations

3-Player Limit Texas Hold'em Abstract Game Abstract Game Strategy

3-player Limit Texas Hold'em Stitched Strategy

2-player Experts 2-player Sub-games

540 “Turn” Buckets

540,000 “Turn” Buckets

  • Generalizes 3 previous approaches

[Gibson and Szafron, NIPS 2011]

≈ 59,000,000 “Turn” Deals

slide-78
SLIDE 78

Tricks with Memory Limitations

Extensive-Form Game

Abstraction 1

Abstraction 2 Abstraction K

...

“Frankenstein” Abstract Game New!

Frankenstein-Game Strategy

Full Game Strategy

[Gibson and Szafron, NIPS 2011]

slide-79
SLIDE 79

Tricks with Memory Limitations

3-player Limit Texas Hold'em

18K “Turn” Buckets

1.53 Million “Turn” Buckets

“Frankenstein” Abstract Game

Frankenstein-Game Strategy

3-player Texas Hold'em Strategy

New!

slide-80
SLIDE 80

Tricks with Memory Limitations

Hyperborean3p Tournament

CFR 2-player Experts

slide-81
SLIDE 81

Outline of Presentation

  • Background

– Counterfactual Regret Minimization (CFR)

  • Theoretical Advancements for CFR in:

– Many player games
– Imperfect recall games

  • CFR Speed-Ups
  • Tricks with Memory Limitations
  • Conclusion + Future Work
slide-82
SLIDE 82

Conclusion

  • We have made the following contributions:

– First set of theoretical properties for CFR in:

  • games with more than 2 players
  • imperfect recall games

– Theoretical and practical improvements for making CFR go faster
– Techniques for dealing with limited memory

  • This research has led to the development of the strongest 3-player limit Texas hold'em strategies in the world.

slide-83
SLIDE 83

Future Work

  • Opponent modelling

– “On-line CFR”

  • 10-player Texas Hold'em

– Ultimately challenge humans for the World Series of Poker

slide-84
SLIDE 84

Thanks for Listening!

  • Email: richard.g.gibson@gmail.com
  • Website: http://cs.ualberta.ca/~rggibson/
  • Twitter: @RichardGGibson

Clip art images used in this presentation can be found at clker.com