
SLIDE 1

Multi-agent learning: Simplified Poker

Yannick Bitane, April 14th, 2011. Slides last processed on Thursday 14th April, 2011 at 12:37.

SLIDE 2

Contents

  • Poker in Multi-Agent Learning
  • Gilpin & Sandholm's approach*
  • Formal mechanics: ordered games, information filters, equilibrium-preserving abstractions
  • GameShrink: algorithm sketch and results

* Gilpin & Sandholm (2005): Finding equilibria in large sequential games of imperfect information. Technical Report CMU-CS-05-158, Carnegie Mellon University.

SLIDE 3

Poker in MAL

  • AI testbed: an incomplete-information game
  • Texas Hold'em: the game tree is tremendously large (2-player limit: ~10^18 nodes)
  • How to solve this game?

SLIDE 4

Gilpin & Sandholm's approach

  • Rhode Island Hold'em: strategically similar, but with much less branching (3.1 · 10^9 nodes)
  • GameShrink: reduce branching by merging equivalent branches
  • Proven: Nash equilibria in the reduced game tree correspond to Nash equilibria in the original tree.

SLIDE 5

Contents (again)

  • Poker in Multi-Agent Learning
  • Gilpin & Sandholm's approach
  • Formal mechanics: ordered games, information filters, equilibrium-preserving abstractions
  • GameShrink: algorithm sketch and results

⟹ No introduction to poker, no demo, no proofs.

SLIDE 6

Ordered games

SLIDE 7

DEFINITION 1. An ordered game is a tuple Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩, where:

  1. I = {1, . . . , n} is a finite set of players.
  2. G = ⟨G_1, . . . , G_r⟩ is a finite collection of finite directed trees G_j = (V_j, E_j), with nodes V_j and edges E_j. Let Z_j ⊂ V_j be the leaf nodes of G_j, and let N_j(v) be the outgoing neighbors of v ∈ V_j.
  3. L = ⟨L_1, . . . , L_r⟩, where L_j : V_j \ Z_j → I indicates which player is to act in round j.

SLIDE 8

DEFINITION 1 (continued). An ordered game is a tuple Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩, where:

  4. Θ is a finite set of signals.
  5. κ = ⟨κ_1, . . . , κ_r⟩ is the number of public signals revealed in round j, and γ = ⟨γ_1, . . . , γ_r⟩ is the number of private signals revealed per player in round j. The public information revealed in round j is α^j ∈ Θ^{κ_j}, and in all rounds up through j it is α̃^j = (α^1, . . . , α^j). The private information revealed to player i ∈ I in round j is β_i^j ∈ Θ^{γ_j}, and in all rounds up through j it is β̃_i^j = (β_i^1, . . . , β_i^j). Each signal θ ∈ Θ may only be revealed once.

SLIDE 9

DEFINITION 1 (continued). An ordered game is a tuple Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩, where:

  6. p is a probability distribution over Θ, with p(θ) > 0 for all θ ∈ Θ. Signals are drawn from Θ according to p without replacement, so if A is the set of signals already revealed, then

        p(x | A) = p(x) / Σ_{y ∉ A} p(y)   if x ∉ A,
        p(x | A) = 0                        if x ∈ A.

  7. ≽ is a partial ordering of subsets of Θ, and is defined for at least those pairs required by u (coming up in 2 slides).
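As an aside (not from the slides), the draw-without-replacement rule of item 6 can be sketched in code. The signal names and the uniform weights below are hypothetical illustrations, not part of the paper.

```python
# Sketch of the conditional signal distribution p(x | A) from Definition 1.6:
# already-revealed signals (the set A) get probability 0, and the remaining
# probability mass is renormalized.

def conditional_prob(p, x, revealed):
    """p(x | A): probability of drawing signal x given the revealed set A."""
    if x in revealed:
        return 0.0
    remaining_mass = sum(q for y, q in p.items() if y not in revealed)
    return p[x] / remaining_mass

# Uniform distribution over four example signals (hypothetical).
p = {"As": 0.25, "Ks": 0.25, "Qs": 0.25, "Js": 0.25}

print(conditional_prob(p, "Ks", set()))    # 0.25
print(conditional_prob(p, "Ks", {"As"}))   # 0.25 / 0.75 = 1/3
print(conditional_prob(p, "As", {"As"}))   # 0.0
```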

SLIDE 10

DEFINITION 1 (continued). An ordered game is a tuple Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩, where (recall: Z_j are the leaf nodes of G_j):

  8. ω : ∪_{j=1}^{r} Z_j → {over, continue} maps terminal nodes of each round G_j to one of two values: over, in which case the game ends, or continue, in which case the game continues to the next round. Clearly, for all z ∈ Z_r we require ω(z) = over. Let ω^j_over = {z ∈ Z_j | ω(z) = over} and ω^j_cont = {z ∈ Z_j | ω(z) = continue}.

SLIDE 11

DEFINITION 1 (continued). An ordered game is a tuple Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩, where:

  9. u = (u^1, . . . , u^r), with

        u^j : (×_{k=1}^{j−1} ω^k_cont) × ω^j_over × (×_{k=1}^{j} Θ^{κ_k}) × (×_{i=1}^{n} ×_{k=1}^{j} Θ^{γ_k}) → ℝ^n,

  a utility function such that for every j with 1 ≤ j ≤ r, for every i ∈ I, and for every z̃ ∈ (×_{k=1}^{j−1} ω^k_cont) × ω^j_over, at least one of the following two conditions holds:

  (a) Utility is signal independent, that is: u_i^j(z̃, ϑ) = u_i^j(z̃, ϑ′) for all legal ϑ, ϑ′ ∈ (×_{k=1}^{j} Θ^{κ_k}) × (×_{i=1}^{n} ×_{k=1}^{j} Θ^{γ_k}).

  (b) See next slide.

SLIDE 12

DEFINITION 1 (continued). The second condition on the utility function u:

  9. (b) For every z̃ ∈ (×_{k=1}^{j−1} ω^k_cont) × ω^j_over: ≽ is defined for all legal signals (α̃^j, β̃_i^j) and (α̃^j, β̃′_i^j) through round j, and a player's utility is increasing in her private signals, all else equal:

        (α̃^j, β̃_i^j) ≽ (α̃^j, β̃′_i^j)  ⟹  u_i(z̃, α̃^j, (β̃_i^j, β̃_{−i}^j)) ≥ u_i(z̃, α̃^j, (β̃′_i^j, β̃_{−i}^j)).

SLIDE 13

DEFINITION 1 (summary). An ordered game is a tuple Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩.

  1. I: finite set of players.
  2. G_j = (V_j, E_j), where V_j are the nodes and E_j the edges in round j. Z_j ⊂ V_j: the leaf nodes of G_j. N_j(v): the outgoing neighbors of v ∈ V_j.
  3. L_j: mapping from non-terminal nodes to players (to act in round j).
  4. Θ: finite set of signals.
  5. κ_j: number of public signals revealed in round j; γ_j: number of private signals revealed per player in round j.
  6. p: probability distribution over Θ.
  7. ≽: partial ordering of subsets of Θ.
  8. ω: mapping from terminal nodes in each round to {over, continue}.
  9. u: utility function.

SLIDE 14

Information filters

SLIDE 15

Let Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩ be an ordered game. Let S^j be the set of legal* signals for one player up through round j.

DEFINITION 2. An information filter for Γ is a collection F = ⟨F^1, . . . , F^r⟩ where each F^j is a function F^j : S^j → 2^{S^j} such that the following conditions hold:

  1. Truthfulness. (α̃^j, β̃_i^j) ∈ F^j(α̃^j, β̃_i^j) for all legal (α̃^j, β̃_i^j).
  2. Independence. The range of F^j is a partition of S^j.
  3. Information preservation. If two values of a signal are distinguishable in round k, then they are distinguishable in each round j > k. That is, let m_j = Σ_{l=1}^{j} (κ_l + γ_l). We require that for all legal* (θ_1, . . . , θ_{m_k}, . . . , θ_{m_j}) ⊆ Θ and (θ′_1, . . . , θ′_{m_k}, . . . , θ′_{m_j}) ⊆ Θ: if (θ′_1, . . . , θ′_{m_k}) ∉ F^k(θ_1, . . . , θ_{m_k}), then (θ′_1, . . . , θ′_{m_k}, . . . , θ′_{m_j}) ∉ F^j(θ_1, . . . , θ_{m_k}, . . . , θ_{m_j}).

SLIDE 16

Example

Intuition: by passing signals through a filter before revealing them, informative precision can be reduced while keeping the underlying action space intact, thus shrinking the game tree.
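A minimal sketch of this intuition (not from the slides): an information filter can be represented as a map from each signal to its equivalence class. The rank-only filter that ignores card suits is a hypothetical example of reduced informative precision.

```python
# A toy information filter: map each signal to its equivalence class.
# Truthfulness: every signal belongs to its own class.
# Independence: the classes partition the signal set.
from collections import defaultdict

def make_filter(signals, key):
    """Build F: signal -> frozenset of signals sharing key(signal)."""
    classes = defaultdict(set)
    for s in signals:
        classes[key(s)].add(s)
    return {s: frozenset(classes[key(s)]) for s in signals}

cards = ["As", "Ah", "Ks", "Kh"]            # hypothetical rank+suit signals
F = make_filter(cards, key=lambda c: c[0])  # filter away the suit

print(F["As"] == frozenset({"As", "Ah"}))   # True: suits indistinguishable

# Truthfulness and independence checks:
assert all(s in F[s] for s in cards)
assert sum(len(c) for c in set(F.values())) == len(cards)
```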

SLIDES 17–24

(Figure-only example slides; no text content.)

SLIDE 25

Equilibria

SLIDE 26

Let Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩ be an ordered game. Recall: V_j are nodes, Z_j are leaves, L_j(v) maps nodes to players.

DEFINITION 3. A behavior strategy for player i in round j of Γ with information filter F is a probability distribution over possible actions, defined for each player i, round j, and v ∈ V_j \ Z_j such that L_j(v) = i:

        σ_{i,v}^j : (×_{k=1}^{j−1} ω^k_cont) × Range(F^j) → ∆{w ∈ V_j | (v, w) ∈ E_j}.

A behavior strategy for player i in Γ is σ_i = (σ_i^1, . . . , σ_i^r). σ_i is a best response to σ_{−i} if, for all other strategies σ′_i: u_i(σ_i, σ_{−i}) ≥ u_i(σ′_i, σ_{−i}). A strategy profile is σ = (σ_1, . . . , σ_n). σ is a Nash equilibrium if, for every player i, σ_i is a best response to σ_{−i}.
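As an illustration (not from the slides), the best-response condition can be checked mechanically in a toy two-player matrix game, i.e. a degenerate one-round game with pure strategies. The payoff table below is a hypothetical coordination game.

```python
# Sketch of the best-response / Nash-equilibrium conditions of Definition 3,
# restricted to pure strategies in a tiny 2-player matrix game.
import itertools

payoff = {  # (action1, action2) -> (u1, u2): a small coordination game
    ("a", "a"): (2, 2), ("a", "b"): (0, 0),
    ("b", "a"): (0, 0), ("b", "b"): (1, 1),
}
actions = ["a", "b"]

def is_best_response(i, profile):
    """True if player i cannot improve by deviating unilaterally."""
    u = payoff[profile][i]
    for a in actions:
        dev = tuple(a if k == i else profile[k] for k in range(2))
        if payoff[dev][i] > u:
            return False
    return True

equilibria = [pr for pr in itertools.product(actions, actions)
              if all(is_best_response(i, pr) for i in range(2))]
print(equilibria)   # [('a', 'a'), ('b', 'b')]
```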

SLIDE 27

An ordered game Γ and an information filter F for Γ define a new game Γ_F. We refer to such games as filtered ordered games.

PROPOSITION 1. A filtered ordered game is an extensive form game satisfying perfect recall.

"A Nash equilibrium always exists in finite extensive form games, and one exists in behavior strategies for games with perfect recall."

COROLLARY 1. For any filtered ordered game, a Nash equilibrium exists in behavior strategies.

SLIDE 28

Equilibrium-preserving abstractions

SLIDE 29

DEFINITION 4. Associated with every ordered game Γ and information filter F is a filtered signal tree, a directed tree with components analogous to those of Γ.

SLIDE 30

Recall: N_j(v) are the outgoing neighbors (i.e. children) of v ∈ V_j.

DEFINITION 5a. Two subtrees beginning at internal nodes x and y of a filtered signal tree are ordered game isomorphic if x and y have the same parent and there is a bijection f : N(x) → N(y) such that for all w ∈ N(x) and v ∈ N(y): if v = f(w), then the weights on the edges (x, w) and (y, v) are the same, and the subtrees beginning at w and v are ordered game isomorphic.

DEFINITION 5b. Two leaves corresponding to filtered signals ϑ and ϑ′ up through round r are ordered game isomorphic if, for all z̃ ∈ (×_{k=1}^{r−1} ω^k_cont) × ω^r_over: u^r(z̃, ϑ) = u^r(z̃, ϑ′).
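The recursion of Definitions 5a/5b can be sketched as a recursive tree comparison (not from the paper). The node representation is a hypothetical one: leaves carry a utility tuple, internal nodes carry a list of (edge weight, child) pairs. Instead of searching over bijections f : N(x) → N(y), the sketch compares canonical forms, which agrees with the definition because isomorphic subtrees get equal signatures.

```python
# Sketch of the ordered-game-isomorphism test of Definitions 5a/5b.
# Leaf: a utility tuple. Internal node: a list of (weight, child) pairs.

def signature(node):
    """Canonical form of a subtree: leaves by utility vector, internal
    nodes by the sorted multiset of (weight, child-signature) pairs."""
    if isinstance(node, tuple):              # leaf: utility vector
        return ("leaf", node)
    return ("node", tuple(sorted((w, signature(c)) for w, c in node)))

def ordered_game_isomorphic(x, y):
    return signature(x) == signature(y)

# Two subtrees whose children differ only in order are isomorphic:
x = [(0.5, (1, -1)), (0.5, (0, 0))]
y = [(0.5, (0, 0)), (0.5, (1, -1))]
print(ordered_game_isomorphic(x, y))                 # True
print(ordered_game_isomorphic(x, [(1.0, (1, -1))]))  # False
```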

SLIDE 31

Let Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩ be an ordered game, let F be an information filter for Γ, and let ϑ and ϑ′ be two nodes whose subtrees in the induced filtered signal tree are ordered game isomorphic.

DEFINITION 6. The ordered game isomorphic abstraction transformation creates a new information filter F′:

        F′^j(α̃^j, β̃_i^j) = F^j(α̃^j, β̃_i^j)   if (α̃^j, β̃_i^j) ∉ ϑ ∪ ϑ′,
        F′^j(α̃^j, β̃_i^j) = ϑ ∪ ϑ′             if (α̃^j, β̃_i^j) ∈ ϑ ∪ ϑ′.

As it turns out, any Nash equilibrium of the induced game Γ_{F′} corresponds to a Nash equilibrium in Γ_F.

SLIDE 32

GameShrink: Algorithm sketch

  • Let F be the identity filter: F^j(α̃^j, β̃_i^j) = {(α̃^j, β̃_i^j)}.
  • Going breadth-first from top to bottom, for each pair (ϑ, ϑ′):
      – Check whether the subtrees are ordered game isomorphic.
      – If so, apply the ordered game isomorphic abstraction transformation to F.
  • Return F.
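The loop above can be sketched as a pairwise merge over sibling nodes (an illustration, not the paper's implementation). The layered signal-tree representation and the trivial equality test used below are hypothetical stand-ins for the paper's data structures and isomorphism check, and none of the machinery needed to reach O(n²) is included.

```python
# Sketch of the GameShrink merge loop: walk the signal tree breadth-first
# and merge each node into the class of an earlier isomorphic sibling.

def game_shrink(layers, isomorphic):
    """layers: lists of sibling nodes, top to bottom (breadth-first).
    Returns a merge map: (layer, j) -> representative (layer, i)."""
    merge = {}
    for L, layer in enumerate(layers):
        for i in range(len(layer)):
            if (L, i) in merge:
                continue                      # already merged away
            for j in range(i + 1, len(layer)):
                if (L, j) not in merge and isomorphic(layer[i], layer[j]):
                    merge[(L, j)] = (L, i)    # merge j into i's class
    return merge

# Toy usage with utility-tuple leaves and equality as the isomorphism test:
leaves = [(1, -1), (1, -1), (0, 0)]
m = game_shrink([leaves], lambda a, b: a == b)
print(m)   # {(0, 1): (0, 0)}: the second (1, -1) leaf merged into the first
```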

SLIDE 33

GameShrink: Results

  • Algorithm complexity is O(n²), n being the number of signal tree nodes,
  • i.e. sublinear in the size of the game tree.
  • The Rhode Island Hold'em game tree shrank by a factor of ~74 in under a second.
  • The Rhode Island induced filtered signal tree was solved in 7 days and 13 hours.
  • Computations were performed on high-end, non-supercomputer hardware (1.65 GHz IBM eServer p5 570 with 64 GB RAM, 25 GB used).

SLIDE 34

Thanks for your attention!

Questions?