Finding Optimal Abstract Strategies in Extensive-Form Games. Mike Johanson, Nolan Bard, Neil Burch, Michael Bowling. University of Alberta, Canada. July 25


SLIDE 1

Mike Johanson, Nolan Bard, Neil Burch, Michael Bowling University of Alberta, Canada July 25th, 2012 :: AAAI 2012, Toronto


University of Alberta Computer Poker Research Group

Finding Optimal Abstract Strategies in Extensive-Form Games

Tuesday, November 13, 2012

SLIDE 2

2-Player Limit Texas Hold’em Poker: Distance from Perfect Play

[Figure: Exploitability (mbb/g), 100 to 400, by Year, 2006 to 2011]

AAAI 2007, Vancouver: Narrow loss to Human Pros (275)

Las Vegas: Narrow win over Human Pros (235)

Labeled point: 104

SLIDE 3

Abstraction-Solving-Translation

Goal: We want to learn a strategy σ (or, in RL, a policy π) that chooses actions.

Exploitability: Expected loss against a perfect adversary.

Nash Equilibrium: Unexploitable, with an expected loss of $0 per game. An optimal strategy. We want to approximate this.

Diagram labels: Game (10^14 decisions), Solver, Optimal Strategy
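To make the exploitability definition concrete, here is a minimal sketch on rock-paper-scissors. The toy game, the `PAYOFF` table, and the `exploitability` helper are illustrative assumptions and do not come from the talk, which measures exploitability in poker (in mbb/g).

```python
# Row player's payoffs in rock-paper-scissors (a toy stand-in, not the talk's game).
# Rows and columns are ordered rock, paper, scissors.
PAYOFF = [
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
]
GAME_VALUE = 0.0  # the game is symmetric, so optimal play earns 0 per game


def exploitability(strategy, payoff=PAYOFF, game_value=GAME_VALUE):
    """Expected loss per game against a best-responding (perfect) adversary."""
    num_columns = len(payoff[0])
    # Expected row payoff for each pure action the adversary could pick.
    payoff_per_column = [
        sum(p * payoff[row][col] for row, p in enumerate(strategy))
        for col in range(num_columns)
    ]
    # The adversary picks whichever column is worst for the row player.
    return game_value - min(payoff_per_column)


uniform = [1 / 3, 1 / 3, 1 / 3]
rock_heavy = [0.5, 0.25, 0.25]
print(exploitability(uniform))
print(exploitability(rock_heavy))
```

In this toy game the uniform strategy is a Nash equilibrium and loses nothing to a perfect adversary, while the rock-heavy strategy loses 0.25 per game because the adversary can always answer with paper.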

SLIDE 4

Abstraction-Solving-Translation

Problem: The game has 10^14 information sets. Far too large to solve!

With current techniques, this would take 4 petabytes of RAM and thousands of CPU-years!

Diagram labels: Game (10^14 decisions), Solver, Optimal Strategy

SLIDE 5

Abstraction-Solving-Translation

Problem: The game has 10^14 information sets. Far too large to solve!

With current techniques, this would take 4 petabytes of RAM and thousands of CPU-years!

If you have four petabytes of RAM, we should talk!

Diagram labels: Game (10^14 decisions), Solver, Optimal Strategy

SLIDE 6

Abstraction-Solving-Translation

Workaround: Use state-space abstraction to make a smaller game that we can solve.

Diagram labels: Game (10^14 decisions), Solver, Optimal Strategy, Abstraction, Abstract Game (10^7 decisions)
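As a rough illustration of state-space abstraction, the sketch below groups hands into buckets so that every hand in a bucket must be played the same way. The equal-width bucketing rule, the `bucket_of` helper, and the hand-strength numbers are assumptions made up for this example; the slides do not say how the talk's abstractions assign buckets.

```python
def bucket_of(hand_strength, num_buckets):
    """Map a hand strength in [0, 1] to one of num_buckets equal-width buckets."""
    index = int(hand_strength * num_buckets)
    return min(index, num_buckets - 1)  # strength 1.0 belongs to the top bucket


# Hypothetical hand strengths (for example, an estimated win probability).
# These numbers are invented for the example and are not real poker data.
hands = {"AA": 0.85, "KK": 0.82, "QJs": 0.62, "T9s": 0.54, "72o": 0.35, "32o": 0.31}

abstract_game = {}
for hand, strength in hands.items():
    abstract_game.setdefault(bucket_of(strength, 5), []).append(hand)

# Hands sharing a bucket become one abstract information set and share a strategy.
for bucket in sorted(abstract_game):
    print(bucket, abstract_game[bucket])
```

Six distinct hands collapse into four abstract decisions here. That is the same trade the slide describes: fewer decisions to solve, at the cost of no longer being able to tell some hands apart.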

SLIDE 7

Abstraction-Solving-Translation

Solving: Use a game-solving algorithm to find an optimal strategy for the abstract game.

Diagram labels: Game (10^14 decisions), Solver, Optimal Strategy, Abstraction, Abstract Game (10^7 decisions), Solver, Optimal Abstract Strategy

SLIDE 8

Abstraction-Solving-Translation

Solving: Use a game-solving algorithm to find an optimal strategy for the abstract game.

Diagram labels: Game (10^14 decisions), Strategy, Solver, Abstraction, Abstract Game (10^7 decisions), Solver, Optimal Abstract Strategy, Translation

SLIDE 9

Abstraction-Solving-Translation

Diagram labels: Game (10^14 decisions), Solver, Optimal Strategy, Abstraction, Abstract Game (10^7 decisions), Solver, Optimal Abstract Strategy

Two Types of Loss:

  • Lossy abstraction. May not be possible to represent an optimal strategy.
  • Other abstract strategies might be better in the real game!

SLIDE 10

Abstract Equilibrium might not be optimal in the real game.

[Figure labels: Set of Strategies; Set of Abstract Strategies; Abstract Optimal Strategy; Real Optimal Strategy; Exploitability]

SLIDE 11

Abstract Equilibrium might not be optimal in the real game.

[Figure labels: Set of Strategies; Set of Abstract Strategies; Least Exploitable Abstract Strategy; Abstract Optimal Strategy; Real Optimal Strategy; Exploitability]

SLIDE 12

Abstraction-Solving-Translation

This Talk: Efficiently finding an abstract strategy with the lowest exploitability in the real game.

Diagram labels: Game (10^14 decisions), Solver, Optimal Strategy, Abstraction, Abstract Game (10^7 decisions), Solver, Optimal Abstract Strategy, Solver, Least Exploitable Abstract Strategy

SLIDE 13

Counterfactual Regret Minimization (CFR), NIPS 2007

Two players, one vs the other. At t=0: σ0 = uniform random.

SLIDE 14

Counterfactual Regret Minimization (CFR), NIPS 2007

Timeline: σ0 (t=0), σ1 (t=1)

“Play a game”, then update using CFR.

Updating with CFR makes them regret-minimizing agents.

SLIDE 15

Counterfactual Regret Minimization (CFR), NIPS 2007

Timeline: σ0 (t=0), σ1 (t=1), σ2 (t=2)

The “Current” strategy: the latest σt on the timeline (σ2 here).

The “Average” strategy: σ̂ = (σ0 + σ1 + ... + σt) / t

SLIDE 16

Counterfactual Regret Minimization (CFR), NIPS 2007

Timeline: σ0 (t=0), σ1 (t=1), σ2 (t=2), σ3 (t=3), σ4 (t=4), ..., σT (t=T)

Key Theorem: If both players are regret-minimizing, then their average strategy converges towards an optimal strategy.
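The self-play loop behind the theorem can be sketched on a single-decision game, where CFR reduces to plain regret matching. This is a toy under stated assumptions: the biased rock-paper-scissors `PAYOFF` table and all function names are invented for illustration, and full CFR applies a counterfactual version of this update at every information set of an extensive-form game.

```python
# Row player's payoffs in a biased rock-paper-scissors where rock beating
# scissors pays double (a toy zero-sum game, not the poker game from the talk).
PAYOFF = [
    [0.0, -1.0, 2.0],
    [1.0, 0.0, -1.0],
    [-2.0, 1.0, 0.0],
]
NUM_ACTIONS = 3


def regret_matching(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)  # uniform random, like σ0
    return [p / total for p in positive]


def self_play(iterations):
    """Return each player's average strategy after regret-matching self-play."""
    regrets = [[0.0] * NUM_ACTIONS, [0.0] * NUM_ACTIONS]
    strategy_sums = [[0.0] * NUM_ACTIONS, [0.0] * NUM_ACTIONS]
    for _ in range(iterations):
        row = regret_matching(regrets[0])
        col = regret_matching(regrets[1])
        # Expected utility of every pure action against the opponent's current strategy.
        row_utils = [sum(PAYOFF[a][b] * col[b] for b in range(NUM_ACTIONS))
                     for a in range(NUM_ACTIONS)]
        col_utils = [-sum(PAYOFF[a][b] * row[a] for a in range(NUM_ACTIONS))
                     for b in range(NUM_ACTIONS)]
        for player, (current, utils) in enumerate(((row, row_utils), (col, col_utils))):
            expected = sum(p * u for p, u in zip(current, utils))
            for a in range(NUM_ACTIONS):
                regrets[player][a] += utils[a] - expected  # regret for not playing a
                strategy_sums[player][a] += current[a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


def exploitability(row_strategy):
    """Loss of the row strategy against a best-responding column player (game value is 0)."""
    return -min(sum(row_strategy[a] * PAYOFF[a][b] for a in range(NUM_ACTIONS))
                for b in range(NUM_ACTIONS))


average_row, average_col = self_play(20000)
print(average_row, exploitability(average_row))
```

The unique equilibrium of this toy game is (0.25, 0.5, 0.25), so the printed average strategy should drift toward it while its exploitability shrinks. The current strategy, by contrast, can keep cycling between actions.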

SLIDE 17

Counterfactual Regret Minimization (CFR), NIPS 2007

CFR in an abstract 10-Bucket Perfect Recall Game

[Figure: Abstract Game Exploitability vs CFR Iterations, log scale; series: Abstract Game]

SLIDE 18

Counterfactual Regret Minimization (CFR), NIPS 2007

CFR in an abstract 10-Bucket Perfect Recall Game

[Figure: Abstract Game Exploitability (log scale) and Real Game Exploitability (260 to 340) vs CFR Iterations; series: Abstract Game, Real Game]

SLIDE 19

Moving from CFR to CFR-BR in six easy steps.

SLIDE 20

Step 1: Both players abstracted.

Abstracted, CFR (X GB RAM) vs Abstracted, CFR (X GB RAM)

Both players are abstracted. Computation is efficient, but the solution is suboptimal. X is typically 1 to 100, depending on the size of the abstraction.

SLIDE 21

Step 2: Opponent is unabstracted.

Abstracted, CFR (100 GB RAM) vs Unabstracted, CFR (140 TB RAM)

[Waugh et al., 2009]: Opponent is unabstracted. Abstracted player minimizes exploitability! Requires far too much RAM and computation.

SLIDE 22

Step 3: Play against a Best Response.

Abstracted, CFR (100 GB RAM) vs Unabstracted, Best Response (8.75 TB RAM)

A Best Response is also regret-minimizing, so the average CFR strategy converges. The current CFR strategy converges, too! Takes 76 CPU-days to compute a BR.
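The idea of training against a best response can be sketched on a tiny matrix game: one player runs regret matching while the opponent simply best-responds to that player's current strategy every iteration. The biased rock-paper-scissors game and every name below are assumptions for illustration only. The talk applies the idea to poker, where computing the best response is the expensive part.

```python
# Row player's payoffs in a biased rock-paper-scissors (toy game, not from the talk).
PAYOFF = [
    [0.0, -1.0, 2.0],
    [1.0, 0.0, -1.0],
    [-2.0, 1.0, 0.0],
]
ACTIONS = range(3)


def regret_matching(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def column_values(row_strategy):
    """Row player's expected payoff for each pure column action."""
    return [sum(row_strategy[a] * PAYOFF[a][b] for a in ACTIONS) for b in ACTIONS]


def best_response(row_strategy):
    """Column action that hurts the row player the most."""
    values = column_values(row_strategy)
    return min(ACTIONS, key=values.__getitem__)


def exploitability(row_strategy):
    """Loss against a best-responding opponent (the game value is 0)."""
    return -min(column_values(row_strategy))


def train_against_best_response(iterations):
    """Regret matching for the row player; the opponent always best-responds."""
    regrets = [0.0, 0.0, 0.0]
    strategy_sum = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        current = regret_matching(regrets)
        reply = best_response(current)  # recomputed from scratch every iteration
        utils = [PAYOFF[a][reply] for a in ACTIONS]
        expected = sum(p * u for p, u in zip(current, utils))
        for a in ACTIONS:
            regrets[a] += utils[a] - expected
            strategy_sum[a] += current[a]
    return [s / iterations for s in strategy_sum]


average = train_against_best_response(20000)
print(average, exploitability(average))
```

Only the regret-matching player stores anything between iterations. The opponent needs no regret table, which is why swapping CFR for a best response on the unabstracted side opens the door to the memory savings in the following steps.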

SLIDE 23

Step 4: Split Best Response into pieces.

Unabstracted, Best Response: BR Trunk (Rounds 1 and 2) and BR Subgame (Rounds 3 and 4). Memory labels: 3 MB, 59 MB.

Split strategy into a Trunk and many Subgames. Big advantage of Best Response: Can compute subgames independently as needed! Never need to store all of it at once!

SLIDE 24

Step 4: Split Best Response into pieces.

Abstracted, CFR (100 GB RAM) vs BR Trunk (Rounds 1 and 2) + BR Subgame (Rounds 3 and 4): 59+3 = 62 MB RAM

Compute subgames as needed, then discard. Memory problem solved! Takes 2x76 CPU-days, though: first pass to compute Trunk, second pass to play the game.
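The two-pass pattern can be sketched with a toy two-level decision problem: a trunk choice followed by a subgame that is expanded only while it is being solved. The `expand_subgame` generator and its payoff formula are hypothetical placeholders with no connection to poker. The point is only that the first pass keeps one number per subgame, and the second pass rebuilds just the subgame that is reached.

```python
def expand_subgame(trunk_action, num_leaf_actions=4):
    """Hypothetical subgame generator: payoff to the best-responding player for
    each leaf action after the given trunk action (deterministic toy numbers)."""
    return [(trunk_action * 7 + leaf * 3) % 11 for leaf in range(num_leaf_actions)]


def solve_subgame(trunk_action):
    """Best-response leaf action and its value inside one subgame."""
    payoffs = expand_subgame(trunk_action)  # built on demand
    best_leaf = max(range(len(payoffs)), key=payoffs.__getitem__)
    return best_leaf, payoffs[best_leaf]    # the payoff list is dropped on return


def compute_trunk(num_trunk_actions):
    """First pass: visit every subgame once, keeping only one value per subgame."""
    values = [solve_subgame(action)[1] for action in range(num_trunk_actions)]
    return max(range(num_trunk_actions), key=values.__getitem__)


def play(num_trunk_actions):
    """Second pass: follow the trunk, then re-solve only the subgame reached."""
    trunk_action = compute_trunk(num_trunk_actions)
    leaf_action, value = solve_subgame(trunk_action)
    return trunk_action, leaf_action, value


print(play(5))
```

With five trunk actions this prints `(1, 1, 10)`. At no point are two expanded subgames held in memory together, and every subgame ends up being solved at least once in the first pass and possibly again in the second, which mirrors the doubled compute cost on the slide.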

SLIDE 25

Step 5: Play against a CFR-BR Hybrid.

Abstracted, CFR (100 GB RAM) vs CFR Trunk (Rounds 1 and 2) + BR Subgame (Rounds 3 and 4): 936+3 ≈ 940 MB RAM

Use CFR to update Trunk strategy. This is also regret-minimizing, so CFR converges. Can query Trunk strategy any time, and compute Subgame strategy as needed.

SLIDE 26

Step 6: Use Sampling to converge faster.

Abstracted, CFR (100 GB RAM) vs CFR Trunk (Rounds 1 and 2) + BR Subgame (Rounds 3 and 4): 940 MB RAM

Sample one subgame, compute BR, update players. Takes 50 CPU-seconds per iteration and 940 MB RAM, and still converges!
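One way to see why sampling a single subgame per iteration can still work is that the sampled values average out to the same quantity a full pass would compute. The sketch below shows only that averaging effect. The `subgame_value` function is a hypothetical stand-in for "solve one subgame exactly", and none of this reproduces the talk's actual sampling scheme.

```python
import random


def subgame_value(index):
    """Hypothetical result of solving one subgame exactly (deterministic toy value)."""
    return float((index * 7) % 11)


def full_pass(num_subgames):
    """Exact average over every subgame: accurate, but touches all of them."""
    return sum(subgame_value(i) for i in range(num_subgames)) / num_subgames


def sampled_estimate(num_subgames, iterations, seed=0):
    """Average of single-subgame samples: one cheap subgame per iteration."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(iterations):
        total += subgame_value(rng.randrange(num_subgames))
    return total / iterations


print(full_pass(1000))
print(sampled_estimate(1000, 20000))
```

Each sampled iteration costs one subgame instead of all of them, so many more updates fit into the same CPU budget. The price is noise in each individual update.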

SLIDE 27

Abstracted, CFR (100 GB RAM) vs CFR Trunk + BR Subgame (940 MB RAM)

CFR-BR: Finds the least exploitable abstract strategy, while using less RAM than CFR did!

Average Strategy: Guaranteed to converge.

Current Strategy: Not guaranteed, but converges faster in practice.

SLIDE 28

Testing in a small poker game

Unabstracted [2-4] Hold’em Poker: 94 million information sets

[Figure: Exploitability (mbb/g) vs Time (CPU-seconds), log-scale axes; series: CFR, CFR-BR Average, CFR-BR Current]

SLIDE 29

Testing in a small poker game

Abstracted [2-4] Hold’em: 1790 information sets

[Figure: Exploitability (mbb/g) vs Time (CPU-seconds), log-scale axes; series: CFR A-vs-A, CFR A-vs-U, CFR-BR Average, CFR-BR Current; labeled values: 81.332, 143.932]

SLIDE 30

Texas Hold’em Poker: Small Abstractions

2007 Computer Poker Competition Abstraction: 57 million information sets

(Previous best strategy: 100x larger abstraction, exploitable for 104)

[Figure: Exploitability (mbb/g) vs Time (CPU-seconds), log-scale axes; series: CFR, CFR-BR Avg, CFR-BR Cur; labeled values: 305.045, 92.638]

SLIDE 31

Texas Hold’em Poker: Tiny Abstractions

2-Bucket and 3-Bucket Abstractions: These fit on a 1.44 MB Floppy Disk!

(2008 Man-vs-Machine Winner: 1.25 GB, exploitable for 235)

[Figure: Exploitability (mbb/g) vs Time (CPU-seconds), log-scale axes; series: 2-Bucket CFR-BR Average, 3-Bucket CFR-BR Average; labeled values: 218.487, 175.824]

SLIDE 32

Texas Hold’em Poker: Small Abstractions

Least Exploitable Strategy Ever Made: 5.8 Billion information sets

Previous Best Strategy, Same Abstraction: 104

[Figure: Exploitability (mbb/g) vs Time (CPU-seconds), log-scale axes; series: Hyperborean 2011.IRO, CFR-BR Average, CFR-BR Current; labeled values: 37.170, 53.7929]

SLIDE 33

2-Player Limit Texas Hold’em Poker: Distance from Perfect Play

[Figure: Exploitability (mbb/g), 100 to 400, by Year, 2006 to 2012; annotations: Narrow loss to Human Pros; Narrow win over Human Pros; This Talk: CFR-BR]

SLIDE 34

[Figure: Exploitability (mbb/g), 100 to 400, by Year, 2006 to 2012; annotations: Narrow loss to Human Pros; Narrow win over Human Pros; This Talk: CFR-BR]

Thanks! More results at the poster!

University of Alberta Computer Poker Research Group

SLIDE 35

Bonus Slides

SLIDE 36

One-on-One: PR 10s

[Figure: One-on-One (mbb/g) vs Time (CPU-Seconds), 10^5 to 10^8; series: CFR-BR Current, CFR-BR Average; labeled values: 23.9, 26.2]

SLIDE 37

One-on-One: IR 9000

[Figure: One-on-One (mbb/g) vs Time (CPU-Seconds), 10^5 to 10^8; series: CFR-BR Current, CFR-BR Average; labeled values: 17.1, 19.1]

SLIDE 38

One-on-One: vs 2011

[Figure: One-on-One (mbb/g) vs Time (CPU-Seconds), 10^5 to 10^8; series: PCS 10-bucket PR, CFR-BR 10-bucket PR, PCS 9000-bucket IR, CFR-BR 9000-bucket IR; labeled values: 27.9, 7.5, 39.9, 18.2]
