Multi-agent learning: Simplified Poker

Yannick Bitane, April 14th, 2011
Outline:
- Ordered games
- Information filters
- Equilibrium-preserving abstractions
- Algorithm sketch
- Results
* Gilpin & Sandholm (2005): Finding equilibria in large sequential games of imperfect information.
The game tree is tremendously large: two-player limit Texas Hold'em has about $10^{18}$ nodes.
A strategically similar but much smaller game, Rhode Island Hold'em, has far less branching: about $3.1 \cdot 10^9$ nodes. The aim is to shrink the tree with abstractions whose equilibria correspond to Nash equilibria in the original tree.
Ordered games
DEFINITION 1. An ordered game is a tuple Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩, where:

I = {1, . . . , n} is a finite set of players.

G = ⟨G1, . . . , Gr⟩, a finite collection of finite directed trees, where each Gj = (Vj, Ej) has nodes Vj and edges Ej and is the stage game for round j. Let Zj ⊂ Vj be the leaf nodes of Gj, and let Nj(v) be the outgoing neighbors of v ∈ Vj.

L = ⟨L1, . . . , Lr⟩, where Lj : Vj \ Zj → I indicates which player is to act in round j.
DEFINITION 1 (continued). Θ is a finite set of signals. κ = ⟨κ1, . . . , κr⟩ and γ = ⟨γ1, . . . , γr⟩ give the number of public signals and of private signals per player revealed in round j. The public information revealed in round j is $\alpha^j \in \Theta^{\kappa_j}$, and in all rounds up through j it is $\tilde\alpha^j = (\alpha^1, \ldots, \alpha^j)$. The private information revealed to player i ∈ I in round j is $\beta^j_i \in \Theta^{\gamma_j}$, and in all rounds up through j it is $\tilde\beta^j_i = (\beta^1_i, \ldots, \beta^j_i)$.

Each signal θ ∈ Θ may only be revealed once.
DEFINITION 1 (continued). p is a probability distribution over Θ, with p(θ) > 0 for all θ ∈ Θ. Signals are drawn from Θ according to p without replacement, so if A is the set of signals already revealed, then

$$p(x \mid A) = \begin{cases} \dfrac{p(x)}{\sum_{y \notin A} p(y)} & \text{if } x \notin A, \\ 0 & \text{if } x \in A. \end{cases}$$

≽ is a partial ordering of subsets of Θ and is defined for at least those pairs required by u. (coming up in 2 slides)
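Where the definition only states this conditional distribution, a minimal Python sketch of dealing signals that way may help; the function name and the toy deck are illustrative assumptions, not from the paper:

```python
import random

def draw_signal(p, revealed):
    """Draw one signal x with probability p(x | A) = p(x) / sum_{y not in A} p(y),
    where A = revealed; signals already revealed have probability 0."""
    remaining = {x: px for x, px in p.items() if x not in revealed}
    total = sum(remaining.values())  # = sum of p(y) over y not in A
    r = random.random() * total
    for x, px in remaining.items():
        r -= px
        if r <= 0:
            revealed.add(x)
            return x
    x = next(iter(remaining))  # guard against floating-point leftovers
    revealed.add(x)
    return x

# Example: dealing two cards from a tiny uniform "deck" Θ
p = {"As": 0.25, "Ks": 0.25, "Qs": 0.25, "Js": 0.25}
seen = set()
print(draw_signal(p, seen), draw_signal(p, seen))  # two distinct signals
```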
DEFINITION 1 (continued). Recall: Zj are the leaf nodes of Gj.

$\omega : \bigcup_{j=1}^{r} Z^j \to \{\text{over}, \text{continue}\}$ maps the terminal nodes of each stage game Gj to one of two values: over, in which case the game ends, or continue, in which case the game continues to the next round. Clearly, for all z ∈ Zr we require ω(z) = over. Let $\omega^j_{\text{over}} = \{z \in Z^j \mid \omega(z) = \text{over}\}$ and $\omega^j_{\text{cont}} = \{z \in Z^j \mid \omega(z) = \text{continue}\}$.
DEFINITION 1 (continued). u = (u1, . . . , ur), where

$$u^j : \Bigl(\times_{k=1}^{j-1} \omega^k_{\text{cont}}\Bigr) \times \omega^j_{\text{over}} \times \Bigl(\times_{k=1}^{j} \Theta^{\kappa_k}\Bigr) \times \Bigl(\times_{i=1}^{n} \times_{k=1}^{j} \Theta^{\gamma_k}\Bigr) \to \mathbb{R}^n$$

is a utility function such that for every j with 1 ≤ j ≤ r, every i ∈ I, and every $\tilde z \in \bigl(\times_{k=1}^{j-1} \omega^k_{\text{cont}}\bigr) \times \omega^j_{\text{over}}$, at least one of the following two conditions holds:

(a) Utility is signal independent, that is: $u^j_i(\tilde z, \vartheta) = u^j_i(\tilde z, \vartheta')$ for all legal $\vartheta, \vartheta' \in \bigl(\times_{k=1}^{j} \Theta^{\kappa_k}\bigr) \times \bigl(\times_{i=1}^{n} \times_{k=1}^{j} \Theta^{\gamma_k}\bigr)$.

(b) See next slide.
DEFINITION 1 (continued). For the utility function uj on the previous slide, the second condition reads:

(b) ≽ is defined for all legal signals $(\tilde\alpha^j, \tilde\beta^j_i)$ and $(\tilde\alpha^j, \tilde\beta'^j_i)$ through round j, and a player's utility is increasing in her private signals, all else equal:

$$(\tilde\alpha^j, \tilde\beta^j_i) \succcurlyeq (\tilde\alpha^j, \tilde\beta'^j_i) \;\Longrightarrow\; u_i\bigl(\tilde z, \tilde\alpha^j, (\tilde\beta^j_i, \tilde\beta^j_{-i})\bigr) \ge u_i\bigl(\tilde z, \tilde\alpha^j, (\tilde\beta'^j_i, \tilde\beta^j_{-i})\bigr).$$

(In poker, ≽ compares hands given the public cards: a stronger private hand never yields less utility, everything else equal.)
Recap of DEFINITION 1. An ordered game is a tuple Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩.
Zj ⊂ Vj: the leaf nodes of Gj. Nj(v): the outgoing neighbors of v ∈ Vj.
κj: number of public signals revealed in round j. γj: number of private signals revealed per player in round j.
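To keep the ten components straight, here is a hypothetical Python container for the tuple; every field name is a mnemonic of mine, not the paper's notation:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class OrderedGame:
    """Illustrative container mirroring Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩."""
    players: list          # I = {1, ..., n}
    stage_trees: list      # G = ⟨G1, ..., Gr⟩: one finite directed tree per round
    to_act: list           # L: per round, internal node -> player to act
    signals: set           # Θ: the finite set of signals
    n_public: list         # κ = ⟨κ1, ..., κr⟩: public signals per round
    n_private: list        # γ = ⟨γ1, ..., γr⟩: private signals per player per round
    p: dict                # probability distribution over Θ
    stronger: Callable     # ≽: partial order on (subsets of) signals
    omega: Callable        # ω: stage-game leaf -> "over" | "continue"
    utility: Callable      # u: past leaves × signals -> vector in R^n
```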
Information filters
Let Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩ be an ordered game, and let Sj be the set of legal* signals for one player up through round j (*legal: no signal repeated).

DEFINITION 2. An information filter for Γ is a collection F = ⟨F1, . . . , Fr⟩, where each Fj is a function $F^j : S^j \to 2^{S^j}$ such that the following conditions hold:

1. $(\tilde\alpha^j, \tilde\beta^j_i) \in F^j(\tilde\alpha^j, \tilde\beta^j_i)$ for all legal $(\tilde\alpha^j, \tilde\beta^j_i)$.

2. If two signals are distinguishable in round k, then they are distinguishable in each round j > k. That is, let $m^j = \sum_{l=1}^{j} (\kappa_l + \gamma_l)$; we require that for all legal $(\theta_1, \ldots, \theta_{m^k}, \ldots, \theta_{m^j}) \subseteq \Theta$ and $(\theta'_1, \ldots, \theta'_{m^k}, \ldots, \theta'_{m^j}) \subseteq \Theta$: if $(\theta'_1, \ldots, \theta'_{m^k}) \notin F^k(\theta_1, \ldots, \theta_{m^k})$, then $(\theta'_1, \ldots, \theta'_{m^j}) \notin F^j(\theta_1, \ldots, \theta_{m^j})$.
Intuition: by passing signals through a filter before revealing them, their informative precision can be reduced while the underlying action space stays intact, thus shrinking the game tree.
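As a concrete and purely illustrative instance, here is a sketch of a poker-style filter that pools a player's signals differing only in suit; the card encoding and function name are assumptions, not the paper's:

```python
def rank_only_filter(signal_seq):
    """F applied to one player's signal sequence: return the set of all
    legal (repeat-free) sequences that agree with it on card ranks,
    ignoring suits. Cards are strings like 'Ks' (king of spades)."""
    suits = "shdc"
    ranks = [card[:-1] for card in signal_seq]
    pooled = set()

    def build(prefix, used):
        if len(prefix) == len(ranks):
            pooled.add(tuple(prefix))
            return
        for s in suits:
            card = ranks[len(prefix)] + s
            if card not in used:
                build(prefix + [card], used | {card})

    build([], set())
    return pooled

f = rank_only_filter(("Ks", "Kd"))
print(("Ks", "Kd") in f, len(f))  # True 12: the sequence is in its own
                                  # filtered set; 12 suit assignments pooled
```

This filter satisfies both conditions of Definition 2: each sequence belongs to its own filtered set, and since it pools by rank in every round, signals distinguishable in round k stay distinguishable in later rounds.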
Equilibrium-preserving abstractions
Let Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩ be an ordered game. Recall: Vj are nodes, Zj are leaves, and Lj maps internal nodes to players.

DEFINITION 3. A behavior strategy for player i in round j of Γ with information filter F is a probability distribution over possible actions, defined for each player i, round j, and each v ∈ Vj \ Zj with Lj(v) = i:

$$\sigma^j_{i,v} : \Bigl(\times_{k=1}^{j-1} \omega^k_{\text{cont}}\Bigr) \times \mathrm{Range}(F^j) \to \Delta\bigl(\{w \in V^j \mid (v, w) \in E^j\}\bigr).$$

A behavior strategy for player i in Γ is σi = (σ1_i, . . . , σr_i).

σi is a best response to σ−i if, for all other strategies σ′i: ui(σi, σ−i) ≥ ui(σ′i, σ−i). A strategy profile is σ = (σ1, . . . , σn); σ is a Nash equilibrium if, for every player i, σi is a best response to σ−i.
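As a tiny numeric sanity check of the best-response condition, here is the simplest one-shot matrix setting rather than behavior strategies; payoffs and names are illustrative:

```python
# Row player's payoffs in matching pennies; actions 0 and 1.
U = [[1, -1],
     [-1, 1]]

def u_row(p_row, p_col):
    """Expected utility u_i(sigma_i, sigma_-i) for the row player."""
    return sum(p_row[a] * p_col[b] * U[a][b] for a in (0, 1) for b in (0, 1))

sigma_i = [0.5, 0.5]        # candidate strategy for player i
sigma_minus_i = [0.5, 0.5]  # the opponent's strategy

# Best response: no alternative sigma'_i does strictly better. For a
# matrix game it suffices to compare against the pure strategies.
best_alternative = max(u_row(alt, sigma_minus_i) for alt in ([1, 0], [0, 1]))
print(u_row(sigma_i, sigma_minus_i) >= best_alternative)  # True
```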
An ordered game Γ and an information filter F for Γ define a new game ΓF. We refer to such games as filtered ordered games.

PROPOSITION 1. A filtered ordered game is an extensive form game satisfying perfect recall. "A Nash equilibrium always exists in finite extensive form games, and one exists in behavior strategies for games with perfect recall."

COROLLARY 1. For any filtered ordered game, a Nash equilibrium exists in behavior strategies.
DEFINITION 4. Associated with every ordered game Γ and information filter F is a filtered signal tree: a directed tree whose components are analogous to those of Γ.
Recall: Nj(v) are the outgoing neighbors (i.e., children) of v ∈ Vj.

DEFINITION 5a. Two subtrees beginning at internal nodes x and y of a filtered signal tree are ordered game isomorphic if: x and y have the same parent and there is a bijection f : N(x) → N(y) such that for all w ∈ N(x) and v ∈ N(y): if v = f(w), then the weights on the edges (x, w) and (y, v) are the same, and the subtrees beginning at w and v are ordered game isomorphic.

DEFINITION 5b. Two leaves corresponding to filtered signals ϑ and ϑ′ up through round r are ordered game isomorphic if: for all $\tilde z \in \bigl(\times_{k=1}^{r-1} \omega^k_{\text{cont}}\bigr) \times \omega^r_{\text{over}}$: $u^r(\tilde z, \vartheta) = u^r(\tilde z, \vartheta')$.
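A hedged Python sketch of the recursive check in Definitions 5a/5b. The tree encoding is an assumption of mine: a node is a pair (children, payoff) with children keyed by edge weight, which forces the bijection f only because this example's sibling weights are distinct; the general check must search over matchings:

```python
def isomorphic(x, y):
    """Definition 5a/5b on a toy filtered signal tree. A node is
    (children, payoff): children maps edge weight -> child (empty at a
    leaf); payoff stands in for the leaf utilities u^r(z~, .) over all
    relevant z~. Assumes sibling edge weights are distinct."""
    kids_x, pay_x = x
    kids_y, pay_y = y
    if not kids_x and not kids_y:            # Definition 5b: leaves
        return pay_x == pay_y
    if set(kids_x) != set(kids_y):           # edge weights must correspond
        return False
    # Definition 5a: f pairs equal-weight children; recurse on each pair.
    return all(isomorphic(kids_x[w], kids_y[w]) for w in kids_x)

# Two sibling subtrees with matching weights and leaf utilities:
x = ({0.25: ({}, (1, -1)), 0.75: ({}, (0, 0))}, None)
y = ({0.25: ({}, (1, -1)), 0.75: ({}, (0, 0))}, None)
print(isomorphic(x, y))  # True
```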
Let Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩ be an ordered game and F an information filter for Γ. Let ϑ and ϑ′ be two nodes whose subtrees in the induced filtered signal tree are ordered game isomorphic.

DEFINITION 6. The ordered game isomorphic abstraction transformation creates a new information filter F′:

$$F'^j(\tilde\alpha^j, \tilde\beta^j_i) = \begin{cases} F^j(\tilde\alpha^j, \tilde\beta^j_i) & \text{if } (\tilde\alpha^j, \tilde\beta^j_i) \notin \vartheta \cup \vartheta', \\ \vartheta \cup \vartheta' & \text{if } (\tilde\alpha^j, \tilde\beta^j_i) \in \vartheta \cup \vartheta'. \end{cases}$$

As it turns out, any Nash equilibrium of the induced game ΓF′ corresponds to a Nash equilibrium in ΓF.
Algorithm sketch
Start with the identity filter: $F^j(\tilde\alpha^j, \tilde\beta^j_i) = \{(\tilde\alpha^j, \tilde\beta^j_i)\}$ for every legal $(\tilde\alpha^j, \tilde\beta^j_i)$.
For pairs of nodes in the induced filtered signal tree: check whether their subtrees are ordered game isomorphic; if so, apply the ordered game isomorphic abstraction transformation to F. Repeat until no further merges are possible (this is the paper's GameShrink algorithm).
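A naive Python sketch of that loop, reusing the hypothetical `isomorphic` check from above; the paper's GameShrink algorithm does this far more efficiently, so treat this only as a restatement of the slide's steps under my toy encoding:

```python
def shrink(siblings, merge):
    """Repeatedly merge ordered-game-isomorphic sibling subtrees.

    siblings: mutable list of subtree roots sharing a parent in the
    filtered signal tree. merge(a, b) is assumed to apply the ordered
    game isomorphic abstraction transformation, pooling b's signal
    set into a's in the filter F.
    """
    changed = True
    while changed:
        changed = False
        for i, a in enumerate(siblings):
            for b in siblings[i + 1:]:
                if isomorphic(a, b):   # Definition 5 check
                    merge(a, b)        # Definition 6 transformation
                    siblings.remove(b)
                    changed = True
                    break
            if changed:
                break
```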
Results
Results were computed on a 1.65 GHz IBM eServer p5 570 with 64 GB RAM (25 GB used).
Thanks for your attention!