Accelerating Best Response Calculation in Large Extensive Games (PowerPoint presentation transcript)


SLIDE 1

July 21, 2011 Michael Johanson, Kevin Waugh, Michael Bowling, Martin Zinkevich

Accelerating Best Response Calculation in Large Extensive Games


University of Alberta Computer Poker Research Group

Wednesday, November 14, 2012

SLIDE 2

Game → Strategy AI (diagram)

Suppose you have a 2-player game. You can use an algorithm to learn a strategy for it.

SLIDE 3

Game → Strategy AI (diagram)

Compete against other agents?

There are several ways to evaluate a strategy. You could run a competition against other agents: the Computer Poker Competition, RoboCup, the Computer Olympiad, the Trading Agent Competition.

SLIDE 4

Game → Strategy AI (diagram)

Compete against other agents? Worst-case performance?

Worst-case performance is another useful metric. In 2-player games, use expectimax to find a best-response counterstrategy.

SLIDE 5

Game → Strategy AI (diagram)

Compete against other agents? Worst-case performance?

Optimal strategy (2-player, zero-sum game): a Nash equilibrium maximizes worst-case performance, or equivalently minimizes worst-case loss.
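A tiny illustrative sketch (not from the talk; the game and numbers are made up): in a two-player zero-sum matrix game, a strategy's worst-case value is its value against a best-responding opponent, and the equilibrium strategy maximizes that value.

```python
# Illustrative only: worst-case evaluation in a 2x2 zero-sum matrix game.
# Payoffs are for the row player; the column player picks the column that
# minimizes the row player's expected value (i.e., plays a best response).

def worst_case_value(row_strategy, payoff):
    """Value of row_strategy against a best-responding column player."""
    n_cols = len(payoff[0])
    col_values = [sum(p * payoff[r][c] for r, p in enumerate(row_strategy))
                  for c in range(n_cols)]
    return min(col_values)  # the opponent minimizes our value

# Matching Pennies: the equilibrium strategy (0.5, 0.5) guarantees 0;
# any tilt away from it can be exploited.
payoff = [[1, -1],
          [-1, 1]]
print(worst_case_value([0.5, 0.5], payoff))    # 0.0
print(worst_case_value([0.75, 0.25], payoff))  # -0.5
```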

SLIDE 6

Game → Strategy AI (diagram)

Compete against other agents? Worst-case performance?

2-player Limit Texas Hold'em Poker: ~10^18 game states, ~10^14 information sets (decision points). Computing an optimal strategy: 4 PB of RAM, 1400 cpu-years.

SLIDE 7

Game → Strategy AI (diagram)

Compete against other agents? Worst-case performance?

2-player Limit Texas Hold'em Poker: ~10^18 game states. Computing a best response was thought to be intractable, as it may require a full game-tree traversal. At 3 billion states/sec, that would take 10 years.
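The 10-year figure follows directly from the numbers above:

```python
# Back-of-the-envelope check: traversing ~10^18 states at 3 billion
# states per second.
states = 1e18
rate = 3e9  # states per second
seconds = states / rate
years = seconds / (365 * 24 * 3600)
print(round(years, 1))  # roughly 10.6 years
```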

SLIDE 8

Game → smaller, similar "abstract" game → Abstract Strategy AI (diagram)

Compete against other agents? Worst-case performance?

10^14 decision points in the real game, 10^7 in the abstraction. State-space abstraction allows us to produce strategies for the game.

SLIDE 9

Game → Strategy AI (diagram)

Compete against other agents? Worst-case performance?

Evaluation has relied on tournaments: the First Man-Machine Poker Championship (AAAI 2007, vs. Phil Laak) and the Annual Computer Poker Competition, 2006-2011 (at AAAI next month!).

SLIDE 10

Game → smaller, similar "abstract" game → Abstract Strategy AI (diagram)

Compete against other agents? Worst-case performance?

10^14 decision points in the real game, 10^7 in the abstraction.

Key questions: How much did abstraction hurt? How good are the agents, really? Can we make the best-response computation tractable to find out?

SLIDE 11

Accelerating Best Response Computation

Four ways to speed up best-response calculation in imperfect information games. Formerly intractable computations now run in one day, solving an 8-year-old evaluation problem: how good are state-of-the-art computer poker programs?

SLIDE 12

Expectimax Search

The Best Response Task: Given an opponent’s entire strategy, choose actions to maximize our expected value.

SLIDE 13

Our View / Opponent's View

[Animated figure, slides 13-18: two views of the same betting tree, with opponent action probabilities such as .9/.1 and .3/.7. Captions in sequence: cards are private; our choice nodes; opponent choice nodes (probabilities are known, e.g. .4/.6 and .2/.8); to determine our payoff at a leaf, we need to compute the distribution over the opponent's private states.]
SLIDE 19

Expectimax Search

Simple recursive tree walk. Pass forward: probability of the opponent being in each of their private states. Return: expected value for our private state.

SLIDE 20

Expectimax Search

Simple recursive tree walk. Pass forward: probability of the opponent being in each of their private states. Return: expected value for our private state.

Visits each state just once! But 10^18 states is still intractable.
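A minimal sketch of the recursive walk described above, on a made-up toy game (the hands, action probabilities, and payoffs are all illustrative, not from the talk). The opponent's reach probability per private hand is passed forward; the best-response value is returned.

```python
# Toy best-response expectimax: the opponent holds '2' or 'K', each with
# prior probability 0.5, and plays a known strategy SIGMA at their one
# choice node. We observe their action, then choose our own best response.

OPP_HANDS = ['2', 'K']
SIGMA = {'2': {'bet': 0.75, 'check': 0.25},   # P(action | their hand)
         'K': {'bet': 0.10, 'check': 0.90}}
# Our payoff at showdown, per (their hand, betting line) -- illustrative:
PAYOFF = {('2', 'bet'): 2, ('K', 'bet'): -2,
          ('2', 'check'): 1, ('K', 'check'): -1}

def expectimax(reach):
    """Best-response value at the root, given opponent reach probabilities."""
    value = 0.0
    for action in ['bet', 'check']:
        # Pass forward: scale each hand's reach by its action probability.
        child_reach = {h: reach[h] * SIGMA[h][action] for h in OPP_HANDS}
        showdown = sum(child_reach[h] * PAYOFF[(h, action)] for h in OPP_HANDS)
        if action == 'bet':
            # Our choice node: call (showdown) or fold (-1 per unit reach).
            fold = sum(child_reach[h] * -1 for h in OPP_HANDS)
            value += max(showdown, fold)
        else:
            value += showdown
    return value

print(expectimax({'2': 0.5, 'K': 0.5}))
```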

SLIDE 21

Accelerated Best Response

Four ways to accelerate this computation:

1) Take advantage of what the opponent doesn't know
2) Do O(n^2) work in O(n) time
3) Avoid isomorphic game states
4) Parallel computation

SLIDE 22

My Tree / Your Tree: What the opponent doesn't know

[Animated figure, slides 22-24: the two trees share public information; branches the opponent cannot distinguish (our private cards, probabilities .9/.1 and .3/.7) are grouped together.]
SLIDE 25

The Public Tree

We can instead walk this much smaller tree of public information. At each node, we choose actions for all of the states our opponent cannot tell apart. More work per node, but we reuse queries to the opponent's strategy! ~110x speedup in Texas hold'em.
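A minimal sketch of this idea (toy numbers, not from the paper): carry a value per private hand we could hold through one walk, and let each hand pick its own action at our nodes. A single query to the opponent's strategy then serves every hand at once.

```python
# One public decision node, evaluated for all of our private hands at once.
# OPP_REACH and PAYOFF are made-up illustrative numbers: the opponent has
# bet, and we hold either a strong hand 'Q' or a weak hand 'J'.

MY_HANDS = ['Q', 'J']
OPP_REACH = {'2': 0.375, 'K': 0.05}   # opponent reach after their bet
PAYOFF = {('Q', '2'): 2, ('Q', 'K'): -2,   # showdown payoff if we call
          ('J', '2'): -2, ('J', 'K'): -2}

def values_at_our_node():
    """Per-hand best-response values at a single public decision node."""
    call = {m: sum(OPP_REACH[o] * PAYOFF[(m, o)] for o in OPP_REACH)
            for m in MY_HANDS}
    fold = {m: -1 * sum(OPP_REACH.values()) for m in MY_HANDS}
    # Each of our hands picks its own best action (elementwise max),
    # reusing the same opponent reach probabilities for every hand:
    return {m: max(call[m], fold[m]) for m in MY_HANDS}

print(values_at_our_node())  # 'Q' calls, 'J' folds
```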

SLIDE 26

Accelerated Best Response

The new technique has four orthogonal improvements:

1) Take advantage of what the opponent doesn't know
2) Do O(n^2) work in O(n) time
3) Avoid isomorphic game states
4) Parallel computation

SLIDE 27

Fast Terminal Node Evaluation

[Figure, slides 27-30: my n states against the opponent's n states. Each of my n values is a weighted sum over the opponent's states, v = p1*u1 + p2*u2 + ... + p6*u6, so it is O(n^2) work to evaluate n hands.]

SLIDE 31

Fast Terminal Node Evaluation

Most games have structure that can be exploited. In poker, states are ranked, and the highest rank wins.

SLIDE 32

Fast Terminal Node Evaluation

To calculate one state's EV, we only need:

  • Probability of the opponent reaching weaker states
  • Probability of the opponent reaching stronger states

EV[i] = p(lose) * util(lose) + p(win) * util(win)

SLIDE 33

Fast Terminal Node Evaluation

By exploiting the game's structure, we can use two consecutive for() loops instead of two nested for() loops: O(n^2) becomes O(n). 7.7x speedup in Texas hold'em. (Some tricky details are resolved in the paper.)

SLIDE 34

Accelerated Best Response

The new technique has four orthogonal improvements:

1) Take advantage of what the opponent doesn't know
2) Do O(n^2) work in O(n) time
3) Avoid isomorphic game states
4) Parallel computation

SLIDE 35

Avoid Isomorphic States

[Figure: two suit-isomorphic hands shown as equal.]

21.5x reduction in game size (only correct if the opponent's strategy also does this).
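An illustrative sketch of suit isomorphism, restricted to hold'em hole cards (my example, not the paper's computation): relabeling suits cannot change a hand's value, so the 1,326 two-card combinations collapse to 169 strategically distinct classes (13 pairs, 78 suited, 78 offsuit). The 21.5x figure above is for the full game.

```python
from itertools import combinations

RANKS = '23456789TJQKA'
SUITS = 'cdhs'
DECK = [r + s for r in RANKS for s in SUITS]

def canonical(card1, card2):
    """Canonical form of a hole-card pair: ordered ranks + suited flag."""
    (r1, s1), (r2, s2) = card1, card2
    hi = max(r1, r2, key=RANKS.index)
    lo = min(r1, r2, key=RANKS.index)
    return (hi, lo, s1 == s2)

hands = list(combinations(DECK, 2))
classes = {canonical(a, b) for a, b in hands}
print(len(hands), len(classes))  # 1326 169
```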

SLIDE 36

Accelerated Best Response

The new technique has four orthogonal improvements:

1) Take advantage of what the opponent doesn't know, to walk the much smaller public tree
2) Use a fast terminal node evaluation to do O(n^2) work in O(n) time
3) Avoid isomorphic game states
4) Parallel computation

SLIDE 37

Parallel Computation

24,570 equal-sized independent subtrees, each taking 4m30s to solve. 24,570 * 4.5 minutes = 76 cpu-days.

SLIDE 38

Parallel Computation

24,570 equal-sized independent subtrees, each taking 4m30s to solve. 24,570 * 4.5 minutes = 76 cpu-days. With 72 processors on a cluster: a 1-day computation!
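A sketch of this embarrassingly parallel split (the subtree "solver" below is a placeholder, not the paper's code; the arithmetic reproduces the slide's cpu-day total). Because the subtrees are independent, their values can be computed in any order and summed.

```python
from multiprocessing import Pool

# 24,570 subtrees * 4.5 minutes each, expressed in cpu-days:
cpu_days = int(24570 * 4.5 / (60 * 24))

def solve_subtree(subtree_id):
    # Stand-in for the 4m30s best-response computation on one subtree.
    return 0.001 * subtree_id

if __name__ == '__main__':
    print(cpu_days)  # 76
    with Pool(processes=4) as pool:
        # Independent subtrees -> a trivially parallel map; values sum.
        print(sum(pool.map(solve_subtree, range(100))))
```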

SLIDE 39

Evaluating the Progress of Computer Poker Research

SLIDE 40

Evaluating Computer Poker Agents

Annual Computer Poker Competition (ACPC):

  • Started in 2006
  • Hosted at AAAI this year
  • 2-player Limit: strongest agents are competitive with the world's best human pros

The most successful approach (U of A, CMU, many others):

  • Approximate a Nash equilibrium: worst-case loss of $0 per game

For the first time, we can now tell how close we are to this goal!

SLIDE 41

Trivial Opponents

Value for a best response (units are milli-big-blinds per game):

  Always-Fold       750
  Always-Call      1163.48
  Always-Raise     3697.69
  Uniform Random   3466.32

A human professional's goal is to win 50. An optimal strategy would lose 0.

SLIDE 42

University of Alberta Agents

[Figure, slides 42-44: best response (mbb/g, axis 80-400) by year, 2006-2011, for Computer Poker Competition and Man-vs-Machine agents. Annotations: 2007 Man-Machine, narrow human win; 2008 Man-Machine, narrow computer win.]
SLIDE 45

Comparing Abstraction Techniques: evaluating the University of Alberta agents

[Figure: best response (mbb/g, 100-400) vs. abstraction size (# information sets, 10^6 to 10^10) for Percentile HS, Public PHS, and k-Means Earthmover abstractions.]

SLIDE 46

Evaluating Computer Poker Agents: 2010 Competition

                 Rockhopper  GGValuta  HyperB  PULPO  GS6   Littlerock | Best Response
Rockhopper            -          6        3      7     37      77      |     300
GGValuta             -6          -        3      1     31      77      |     237
HyperB (UofA)        -3         -3        -      2     31      70      |     135
PULPO                -7         -1       -2      -     32     125      |     399
GS6 (CMU)           -37        -31      -31    -32      -      47      |     318
Littlerock          -77        -77      -70   -125    -47       -      |     421

SLIDE 47

Conclusion

Fast best-response calculation in imperfect information games: the previously intractable computation can now be run in a day! The computer poker community is making steady progress towards robust strategies. Many additional exciting results in the paper and at the poster!

SLIDE 48

More details at our poster! Today, 4:00 - 5:20, Room 120-121

SLIDE 49

Additional Slides:

Abstraction, CFR, Tilting, Additional Graphs, Polaris, Hyperborean 2009, Expectimax, Public Tree, n^2 to n, Pathologies

SLIDE 50

Leduc Hold'em Pathologies

Abstraction               Best Response
Real Game vs Real Game
J.Q.K vs Real Game         55.2
[JQ].K vs Real Game        69.0
J.[QK] vs Real Game       126.3
[JQK] vs Real Game        219.3
[JQ].K vs [JQ].K          272.2
[JQ].K vs J.Q.K           274.1
Real Game vs J.[QK]       345.7
Real Game vs [JQ].K       348.9
J.Q.K vs J.Q.K            359.9
J.Q.K vs [JQ].K           401.3
J.[QK] vs J.[QK]          440.6
Real Game vs [JQK]        459.5
Real Game vs J.Q.K        491.0
[JQK] vs [JQK]            755.8

SLIDES 51-64: Expectimax / Conventional Best Response in one tree walk

[Animated figure: one walk per private hand we hold. We pass forward the opponent's reach probabilities for each of their private cards (starting at 2: 0.5, K: 0.5), multiply by their known action probabilities at their choice nodes (e.g. 2: 0.5*0.75, K: 0.5*0.1; deeper, 2: 0.5*0.75*0.25, K: 0.5*0.1*0.9), and return expected values up the tree. Per-frame node values omitted.]

SLIDES 65-82: Walking the Public Tree

[Animated figure: one walk of the public tree computes values for all of our private hands at once. At each public node we pass forward the opponent's reach probability per hand (e.g. 2: 0.5*0.25, K: 0.5*0.9) and return a value per hand (e.g. 2: -0.45, K: 0.13 at one leaf; 2: -0.29, K: 0.13 after combining actions). Per-frame node values omitted.]
SLIDE 83

Polaris 2008

Agent        Size   Tilt           Best Response
Pink         266m   0, 0, 0, 0     235.294
Orange       266m   7, 0, 0, 7     227.457
Peach        266m   0, 0, 0, 7     228.325
Red          115m   0, -7, 0, 0    257.231
Green        115m   0, -7, 0, -7   263.702
(Reference)  115m   0, 0, 0, 0     266.797

SLIDES 84-87: Polaris / Hyperborean

[Figure: best response (mbb/g, axis 100-500) by year, 2006-2011, for Polaris and Hyperborean. Annotations: Man-vs-Machine 2007, narrow loss; Man-vs-Machine 2008, narrow win.]

SLIDES 88-97: Additional figures

[Figures, per-point values omitted:
  • Tilting: exploitability (mb/g, 260-340) vs. percent bonus for the winner (-20 to +20).
  • Tilting at 7%: exploitability (mb/g) vs. abstraction size (millions of information sets) for untilted and tilted Perc. E[HS2] and k-Means abstractions.
  • Counterfactual Regret Minimization, abstract-game best response: exploitability (mbb/g) vs. iterations (millions), 10-bucket perfect recall, Percentile 10 E[HS^2].
  • Counterfactual Regret Minimization, real-game best response: exploitability (mbb/g) vs. iterations (millions).
  • Hyperborean 2009: best response (mbb/g) by year for Polaris and Hyperborean.
  • Abstraction: Perc HS2; k-Means; HS distributions (probability vs. expected hand strength, E[HS], for hands AsAd and 2s7c); k-Means Earthmover abstraction.]
SLIDE 98

3: Fast Terminal Node Evaluation

My values are weighted sums of his reach probabilities (0.1, 0.05, 0.02, ...), where u is the utility for the winner:

my_value_1 =  0*0.1  + u*0.05 + u*0.02 + ...
my_value_2 = -u*0.1  + 0*0.05 + u*0.02 + ...
my_value_3 = -u*0.1  - u*0.05 + 0*0.02 + ...

SLIDE 100

3: Fast Terminal Node Evaluation

The obvious O(n^2) algorithm (r[i] = his reach probs, v[i] = my values, u = utility for the winner):

for( a = each of my hands )
  for( b = each of his hands )
    if( a > b )       v[a] += u*r[b]
    else if( a < b )  v[a] -= u*r[b]

SLIDE 101

3: Fast Terminal Node Evaluation

But games are fun because they have structure in determining the payoffs, and we can take advantage of that. This vector-vs-vector evaluation can often be done in O(n) time, and not just in poker.

SLIDE 102

3: Fast Terminal Node Evaluation

Reach probs for his six hands, weakest to strongest: 0.05, 0.1 | 0.1, 0.05 | 0.1, 0.1 (three equal-strength sets). Initialize sum_win_prob = 0 and sum_lose_prob = 0.5 (the total), then:

for( s = each set of equal-strength hands )
  for( i = each tied hand in s )  sum_lose_prob -= r[i]
  for( i = each tied hand in s )  v[i] = -u*sum_lose_prob + u*sum_win_prob
  for( i = each tied hand in s )  sum_win_prob += r[i]

[Animated trace, slides 102-120: after the first set, sum_lose_prob = 0.35 and both values are -0.35u; after the second, sum_win_prob = 0.15, sum_lose_prob = 0.20, and the values are -0.05u; after the third, sum_win_prob = 0.3, sum_lose_prob = 0.0, and the values are 0.3u.]