
SLIDE 1

Robust Strategies and Counter-Strategies: From Superhuman to Optimal Play

Mike Johanson
January 14, 2016 Grad Seminar

University of Alberta Computer Poker Research Group
michael.johanson@gmail.com | @mikebjohanson

SLIDES 2-7

Games as a testbed for Artificial Intelligence

Chinook (Checkers):

  • Surpassed humans in 1994
  • Solved (perfect play) in 2007

Deep Blue (Chess):

  • Surpassed humans in 1997

Watson (Jeopardy!):

  • Surpassed humans in 2011

Current challenges (not yet superhuman): Go, Atari 2600 games, General Game Playing, StarCraft, RoboCup, poker, curling (?!) and so on…

SLIDES 8-11

Games as a testbed for Artificial Intelligence

Babbage and Lovelace: Wanted “Games of Purely Intellectual Skill” to demonstrate their Analytical Engine. Chess, Tic-Tac-Toe. Horse racing?

Alan Turing: Wrote a chess program before the first computers, and ran it by hand. Chess as part of the Turing Test.

John von Neumann: Founded Game Theory to study rational decision making. Needing computational power to drive it, he became a pioneer in Computing Science.

Games as a testbed for Artificial Intelligence

SLIDES 12-15

Core idea in this line of research: We aspire to create agents that can achieve their goals in complex real-world domains. Games provide a series of well-defined and tractable domains that humans find challenging. New games introduce new challenges that current approaches can’t handle. This is a gradient we can follow. We can also play against humans, to compare Artificial Intelligence to Human Intelligence.

SLIDES 16-17

John von Neumann pioneered Game Theory. When asked about real life and chess, he said…

“Real life is not like that. Real life consists of bluffing, of little tactics of deception, of asking yourself what is the other man going to think I mean to do. And that is what games are about in my theory.”

SLIDES 18-22

Chess is a 2-player, deterministic, perfect information game, with win / lose / tie outcomes.

Poker:

  • 2-10 players at one table; thousands in tournaments.
  • Stochastic: cards are randomly dealt to the players and the table.
  • Imperfect information: the opponent’s cards are hidden.
  • Maximize winnings by exploiting opponent errors.

SLIDES 23-28

My Research and This Grad Seminar (2008: PhD Start to 2015: PhD End)

Topic: Computing strong strategies in Imperfect Information Games.

Two key milestones in 2-player limit hold’em poker:

  • First computer victory over human poker pros.
  • Game solved (Solving Attempts #1, #2, #3…): the computer is now optimal. >= everyone, forever.

Note: I’ll be very high-level in this talk. This is a summary of 7 papers in my thesis, and 7 more not in my thesis. Ask questions!

SLIDE 29

Superhuman Play: The Abstraction-Solving-Translation Procedure. This is how we beat the pros in 2008. First used in poker by Shi and Littman in 2002. Still the dominant approach in large games.

SLIDES 30-32

Terminology:

Strategy: A policy for playing a game. At every decision, a probability distribution over actions.

Best Response: A strategy that maximizes utility against a specific target strategy.

Nash Equilibrium: A strategy for every player, all mutually best responses to the others. In a 2-player zero-sum game, it’s guaranteed to do no worse than tie.
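These definitions can be made concrete in a tiny zero-sum game. A minimal sketch (my illustration, not part of the talk) using rock-paper-scissors:

```python
# Hypothetical illustration: best response and Nash equilibrium in
# rock-paper-scissors, a tiny 2-player zero-sum game.

# Payoff to the row player; rows/cols are (Rock, Paper, Scissors).
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def action_values(opponent_strategy):
    """Expected payoff of each of our actions vs. a fixed opponent strategy."""
    return [sum(PAYOFF[a][b] * p for b, p in enumerate(opponent_strategy))
            for a in range(3)]

def best_response_value(opponent_strategy):
    """A best response maximizes utility against a specific target strategy."""
    return max(action_values(opponent_strategy))

# Against a biased opponent, a best response wins on average (always Paper):
assert best_response_value([0.5, 0.25, 0.25]) > 0

# Against the uniform equilibrium strategy, no response does better than tie:
assert abs(best_response_value([1/3, 1/3, 1/3])) < 1e-12
```

The second assertion is exactly the equilibrium guarantee above: against the equilibrium strategy, even a perfect counter-strategy can do no better than tie.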

SLIDES 33-35

Game (10^14 decisions) -> [AI] -> Strategy -> Evaluation

Solve the game by computing a Nash Equilibrium. (Opponent Modelling comes later.)

Evaluation:

  • EV against humans and other programs.
  • Exploitability by Best Response. Exploitability: expected loss against a best response. Intractable to compute until 2011.

SLIDE 36

The AI Step: Counterfactual Regret Minimization (CFR)

  1. Start with a uniform random strategy.
  2. Repeatedly play it against itself.
     2a. Update: at each decision, use the historically best actions more often (minimizing regret).
  3. The average strategy converges towards a Nash equilibrium.
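The steps above can be sketched with regret matching in a matrix game. This is an illustration only, not the tournament CFR code; one player's regrets are seeded slightly so the dynamics are non-trivial:

```python
# Minimal regret-matching self-play sketch (illustrative only): in
# rock-paper-scissors, the average strategy approaches the uniform
# Nash equilibrium.

# Payoff to a player choosing the row action against the column action.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def current_strategy(regrets):
    """Step 2a: play actions in proportion to positive regret (uniform if none)."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1/3, 1/3, 1/3]

regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # small seed breaks symmetry
strategy_sum = [[0.0] * 3, [0.0] * 3]

for _ in range(50000):                          # step 2: repeated self-play
    strats = [current_strategy(regrets[p]) for p in (0, 1)]
    for p in (0, 1):
        opp = strats[1 - p]
        values = [sum(PAYOFF[a][b] * q for b, q in enumerate(opp))
                  for a in range(3)]
        ev = sum(v * s for v, s in zip(values, strats[p]))
        for a in range(3):
            regrets[p][a] += values[a] - ev     # accumulate regret
            strategy_sum[p][a] += strats[p][a]

# Step 3: the normalized average strategy is close to uniform (1/3, 1/3, 1/3).
avg = [s / 50000 for s in strategy_sum[0]]
assert all(abs(p - 1/3) < 0.05 for p in avg)
```

The current strategy cycles forever; it is the *average* strategy that converges, which is why step 3 averages rather than taking the final iterate.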

SLIDE 37

The AI Step: Counterfactual Regret Minimization (CFR)

  • Memory cost: 2 doubles per action at each decision point (16 bytes).
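A quick back-of-envelope check, assuming the per-action cost above, shows where the storage figure quoted on later slides comes from:

```python
# Back-of-envelope memory cost: 2 doubles (8 bytes each) per action.
BYTES_PER_ACTION = 2 * 8        # one regret entry + one average-strategy entry
ACTIONS = 3.6e13                # actions in 2-player limit hold'em (from slides)

total_bytes = ACTIONS * BYTES_PER_ACTION
terabytes = total_bytes / 2**40 # binary terabytes

assert int(terabytes) == 523    # matches the "523 TB" figure on later slides
```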

SLIDES 38-40

Real Game (10^14 decisions) -> [AI] -> Real Strategy -> Evaluation

Problem: the game has 3.6 × 10^13 actions. At 16 bytes each… 523 TB of storage and ~10,000 CPU-years of runtime. :( :( :(

SLIDE 41

Real Game (10^14 decisions) -> Abstract Game (10^10 decisions)

Workaround: cluster similar decisions together. Lossy.

SLIDE 42

Real Game (10^14 decisions) -> Abstract Game (10^10 decisions)

Using k-means to cluster billions of poker hands into 100k - 1M centroids.
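As a toy stand-in for that computation, here is a minimal 1-D k-means sketch. The hand-strength values are hypothetical; the real abstraction clusters billions of hands over richer features:

```python
import random

# Toy abstraction step: k-means over a 1-D hand-strength feature
# (illustrative only; invented values, not real poker data).

def kmeans(points, k, iters=50, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment: each point joins its nearest centroid's bucket.
        buckets = [[] for _ in range(k)]
        for x in points:
            i = min(range(k), key=lambda c: abs(x - centroids[c]))
            buckets[i].append(x)
        # Update: move each centroid to its bucket's mean.
        centroids = [sum(b) / len(b) if b else centroids[i]
                     for i, b in enumerate(buckets)]
    return centroids

# Hand strengths fall into 2 buckets: weak hands and strong hands.
strengths = [0.10, 0.12, 0.15, 0.80, 0.85, 0.90]
c = sorted(kmeans(strengths, 2))
assert c[0] < 0.2 and c[1] > 0.7   # similar decisions merged together
```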

SLIDE 43

Abstract Game (10^10 decisions) -> [AI] -> Abstract Strategy

Solve the small game.

SLIDE 44

Abstract Strategy -> Real Game: use the small strategy to act in the real game.

NOTE: Not optimal! Lossy abstraction!
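The translation step amounts to a bucket lookup. A hypothetical sketch (the bucket names and probabilities below are invented for illustration):

```python
# Translation sketch: a real-game decision is mapped to its abstract
# bucket, and the abstract strategy's probabilities are used directly.
# Bucket names and numbers are hypothetical.

# Abstract strategy: bucket -> probability over (fold, call, raise).
abstract_strategy = {
    "weak":   {"fold": 0.6, "call": 0.3, "raise": 0.1},
    "strong": {"fold": 0.0, "call": 0.4, "raise": 0.6},
}

def bucket_of(hand_strength):
    """Abstraction: merge all hands into two coarse buckets (lossy!)."""
    return "strong" if hand_strength > 0.5 else "weak"

def act(hand_strength):
    return abstract_strategy[bucket_of(hand_strength)]

# Two different real hands translate to the same abstract decision:
assert act(0.55) == act(0.99)   # information lost by merging decisions
```

The assertion is exactly the "Not optimal!" caveat: distinct real-game situations collapse to one abstract decision, so the translated strategy cannot distinguish them.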

SLIDE 45

Optional: an Opponent Model can use info about the adversary, or adapt online, to increase winnings. In the thesis, not in this talk.

SLIDES 46-48

Intuition:

Using abstraction limits the strategy’s strength: merging decisions together loses information.

Bigger (finer-grained) and better (feature-preserving) abstractions -> better strategies: win more, less exploitable.

Better computers and better algorithms -> can solve bigger abstractions -> better strategies.

SLIDE 49

Abstraction-Solving-Translation was enough to beat top human pros. In retrospect, it was easy: ~8 GB of RAM, a few CPU-days, and fairly small abstractions, too! 2007: narrow loss, 4 GB strategy. 2008: narrow win, 8 GB strategy. In 2011, we discovered that these strategies were VERY exploitable.

SLIDES 50-51

Solving Attempt #1 (2008-2011): The Man-vs-Machine strategies were beatable, but small. At the time, we thought: to be optimal, maybe we just have to solve a big enough abstraction! If we can reduce exploitability to “1 milli-big-blind”, then it’s essentially solved. Close enough - justification later in this talk.

SLIDE 52

In 2011, we wrote a fast algorithm for finding perfect real-game counter-strategies (IJCAI 2011). For the first time, we could measure exploitability! We turned a 10 CPU-year computation into a 76 CPU-day computation: 1 day on the cluster.

SLIDE 53

[Plot: Exploitability (mb/g) vs abstraction size (millions of decisions) for 2007-ACPC, 2007-MVM, 2008-ACPC, 2008-MVM, and 2010-ACPC.]

Looking back at 5 years of progress!

SLIDE 54

[Plot: Exploitability (mb/g) vs abstraction size for several abstraction techniques: IR Public k-Means, IR k-Means, IR Public Perc. E[HS2], PR Perc. E[HS2].]

This was worrying… Flattening out already? We’ll just solve a big enough abstraction!

SLIDES 55-56

…But here’s the overfitting effect:

[Plot: exploitability vs CFR iterations, in the abstract game and the real game. Abstract exploitability keeps falling, while real exploitability levels off.]

SLIDE 57

So: we’re far from solved, and we have a serious problem! But we’re stuck with abstraction. Can a different algorithm avoid overfitting?

SLIDE 58

Solving Attempt #2 (2012): We’ll solve a really big abstraction, but properly, so we don’t overfit.

SLIDES 59-60

We’re solving a 2-player game. If both players use abstraction, we overfit. What if one player uses abstraction, and their opponent doesn’t? By definition, the abstracted player minimizes exploitability!

SLIDES 61-62

CFR-BR (AAAI 2012): Normally, even one unabstracted player would cost 262 TB of memory. But we can do it without that much… The 76-day best response computation does that! Maybe if we run that in a loop… and use sampling tricks to avoid the time cost… it’s feasible!
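The spirit of the idea (a regret-minimizing learner facing an exact best responder) can be shown in a matrix game. A toy analogue, not the actual CFR-BR algorithm:

```python
# Toy analogue of CFR-BR: one player updates with regret matching while
# the opponent always plays an exact best response. The learner's average
# strategy still approaches the equilibrium. (Illustration only.)

# Row player's payoffs in matching pennies; equilibrium is (0.5, 0.5).
PAYOFF = [[1, -1],
          [-1, 1]]

def rm_strategy(regrets):
    pos = [max(r, 0.0) for r in regrets]
    t = sum(pos)
    return [p / t for p in pos] if t > 0 else [0.5, 0.5]

regrets = [0.0, 0.0]
strategy_sum = [0.0, 0.0]
for _ in range(20000):
    s = rm_strategy(regrets)
    # Opponent: exact best response (column minimizing our expected payoff).
    col_values = [sum(PAYOFF[a][b] * s[a] for a in range(2)) for b in range(2)]
    b = min(range(2), key=lambda j: col_values[j])
    ev = col_values[b]
    for a in range(2):
        regrets[a] += PAYOFF[a][b] - ev
        strategy_sum[a] += s[a]

avg = [x / sum(strategy_sum) for x in strategy_sum]
assert all(abs(p - 0.5) < 0.05 for p in avg)   # near the 50/50 equilibrium
```

Because the opponent is always a true best response, the learner's exploitability is measured directly during solving rather than only in an abstract game, which is the property that removes the overfitting.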

SLIDE 63

[Plot: CFR-BR, 2 GB abstraction. Exploitability vs CPU-seconds: CFR plateaus at 289.253 mb/g, while CFR-BR reaches 60.687 mb/g.]

Promising results! CFR-BR has no overfitting, and is far less exploitable! A small abstraction, but it beat all previous strategies!

SLIDE 64

In a big strategy (225 GB to solve), we got closer to optimal than ever before.

[Plot: CFR-BR, 225 GB abstraction. Exploitability vs CPU-seconds for Hyperborean 2011.IRO, CFR-BR Average, and CFR-BR Current, with 53.7929 and 37.170 mb/g marked.]

SLIDE 65

However, CFR-BR lost in actual games. Assuming the opponent is stronger -> too pessimistic!

[Plot: CFR-BR, 2 GB abstraction. One-on-one performance (mbb/g) vs time (CPU-seconds) for CFR and CFR-BR, with CFR-BR losing to CFR by about 17.1 mbb/g.]

SLIDE 66

And still wasn’t getting low enough:

[Plot: CFR-BR, 2 TB abstraction (Canon-Canon-Canon-OCHS-7200 on a UV1000, 1024 ccNUMA cores). Exploitability (mbb/g) over time: 69.7, 49.0, 41.4, 36.0, 32.4, 30.0, 28.7, 27.9, 27.2, 26.6.]

SLIDES 67-69

That last strategy was computed on “Hungabee”, an SGI UV 1000 in GSB: 16 TB of memory, 2048 cores. Water cooling, with the heat dumped to the river.

[Photos: the North Saskatchewan River on a -10C day; program output.]

SLIDE 70

Solving Attempt #3 (2013): CFR-D: We’ll avoid the memory cost by solving game fragments as needed. Watch for this in Neil Burch’s upcoming thesis! ~16 GB instead of 523 TB of storage… Flaw: a massive increase in CPU time required.

SLIDE 71

Finally: Heads-Up Limit Texas Hold’em is Solved. Science, 2015.

SLIDE 72

Real Game (10^14 decisions) -> Abstract Game (10^10 decisions) -> Abstract Strategy -> Evaluation (EV against humans and other programs; exploitability by perfect counter-strategy).

Hm. Abstraction is a dead end for perfection. Was solving the real game directly really infeasible? Old predictions: Memory: 523 TB. CPU: ~10k years.

SLIDES 73-74

In October 2013, our coauthor Oskari Tammelin contacted us with two ideas:

  1. Poker-specific data compression. 523 TB -> 17 TB.
  2. CFR+. A new (at that time theoretically unproven) variant that converges amazingly quickly. Key change: floor regret values at zero.
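That key change can be shown in isolation. A sketch of the cumulative regret update, with plain CFR alongside the CFR+ floor (not the full distributed solver):

```python
# The CFR+ key change in isolation: plain CFR accumulates regret freely,
# while CFR+ floors the cumulative regret at zero after every update.

def update_cfr(cum_regret, instant_regret):
    return cum_regret + instant_regret               # plain CFR

def update_cfr_plus(cum_regret, instant_regret):
    return max(cum_regret + instant_regret, 0.0)     # CFR+: floor at zero

# An action that looked bad for a long time no longer needs many good
# iterations to "dig out" of a large negative regret total:
r, r_plus = 0.0, 0.0
for instant in [-5.0, -5.0, -5.0, 2.0]:
    r = update_cfr(r, instant)
    r_plus = update_cfr_plus(r_plus, instant)

assert r == -13.0     # plain CFR: still deeply negative
assert r_plus == 2.0  # CFR+: immediately positive, so the action is replayed
```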

SLIDES 75-76

The third piece: massive resources from Compute Canada. From our earlier attempts, we had experience with large distributed programs. We used 200 nodes of the “Mammouth” cluster in Quebec, with 24 cores per node: 4800 cores. Each node had 32 GB of RAM and 1 TB of local disk, and handled a set of subgames. Solve with massive parallelism.

SLIDE 77

One last wrinkle: Essentially Solving a Game. Our algorithms converge towards optimal play in the limit. “Solved” means unbeatable; we can only approximate it. So how close is “close enough”?

SLIDES 78-80

One last wrinkle: Essentially Solving a Game

What if a human lifetime of play wasn’t enough for someone to claim to beat our program?

(200 games/hour) * (12 hours/day) * (365 days/year) * (70 years) = 60 million games.

That isn’t enough to discern “1 milli-big-blind” of exploitability with 95% confidence. So that’s our goal.
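The arithmetic behind that claim can be checked roughly. The per-game standard deviation here is my assumption (~5 big blinds/game is a commonly used ballpark for heads-up limit hold'em), not a figure from the talk:

```python
import math

# Rough check: can 60 million games discern 1 mbb/g at 95% confidence?
# Assumed: per-game standard deviation of ~5 bb/game (5000 mbb/game).
games = 200 * 12 * 365 * 70           # a 70-year human lifetime of play
sigma_mbb = 5_000                     # assumed std dev, in milli-big-blinds

std_error = sigma_mbb / math.sqrt(games)
ci_95_half_width = 1.96 * std_error   # 95% confidence half-width

assert games > 60_000_000             # roughly 61 million games
assert ci_95_half_width > 1.0         # wider than 1 mbb/g: not discernible
```

Under this assumption, a lifetime of play yields a confidence interval wider than 1 mbb/g, so a strategy within that bound is indistinguishable from perfect over a human lifetime.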

SLIDES 81-82

[Plot: Hold’em: CFR+ Exploitability over Days. Exploitability (mbb/g) vs time (calendar days), on a log scale, dropping below 1 mbb/g by day 70.]

After 70 days (900 CPU-years), we reached 0.986 mbb/g. Essentially solved.

SLIDE 83

[Strategy visualizations: Game Start; After a Raise.]

Play against it, inspect the strategy, and download the code: http://poker.srv.ualberta.ca

SLIDE 84

Conclusion:

2008: PhD Start -> Solving Attempts #1, #2, #3… -> 2015: PhD End
First computer victory over human poker pros. Game solved. The computer is now optimal.

  • My research spanned the end-to-end task of Abstraction-Solving-Translation.
  • Much easier to surpass humans than to be perfect!
  • A general set of tools: applicable to other games, and outside the games domain entirely.