
SLIDE 1

Multi-agent learning: Simplified Poker

Yannick Bitane, April 14th, 2011. Slides last processed on Thursday 14th April, 2011 at 12:37.

SLIDE 2

Contents

  • Poker in Multi-Agent Learning
  • Gilpin & Sandholm's approach*
  • Formal mechanics: ordered games, information filters, equilibrium-preserving abstractions
  • GameShrink: algorithm sketch and results

* Gilpin & Sandholm (2005): Finding equilibria in large sequential games of imperfect information. Technical Report CMU-CS-05-158, Carnegie Mellon University.

SLIDE 3

Poker in MAL

  • AI testbed: an incomplete-information game
  • Texas Hold'em: the game tree is tremendously large (2-player limit: ~10^18 nodes)
  • How to solve this game?

SLIDE 4

Gilpin & Sandholm's approach

  • Rhode Island Hold'em: strategically similar, but with much less branching (3.1 · 10^9 nodes)
  • GameShrink: reduce branching by merging equivalent branches
  • Proven: Nash equilibria in the reduced game tree correspond to Nash equilibria in the original tree.

SLIDE 5

Contents (again)

  • Poker in Multi-Agent Learning
  • Gilpin & Sandholm's approach
  • Formal mechanics: ordered games, information filters, equilibrium-preserving abstractions
  • GameShrink: algorithm sketch and results

⟹ No introduction to poker, no demo, no proofs.

SLIDE 6

Ordered games

SLIDE 7

DEFINITION 1. An ordered game is a tuple Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩, where:

  1. I = {1, . . . , n} is a finite set of players.
  2. G = ⟨G_1, . . . , G_r⟩ is a finite collection of finite directed trees G_j = (V_j, E_j), with nodes V_j and edges E_j. Let Z_j ⊂ V_j be the leaf nodes of G_j, and let N_j(v) be the outgoing neighbors of v ∈ V_j.
  3. L = ⟨L_1, . . . , L_r⟩, where L_j : V_j \ Z_j → I indicates which player is to act in round j.

SLIDE 8

DEFINITION 1 (continued). An ordered game is a tuple Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩, where:

  4. Θ is a finite set of signals.
  5. κ = ⟨κ_1, . . . , κ_r⟩ is the number of public signals revealed in round j, and γ = ⟨γ_1, . . . , γ_r⟩ is the number of private signals revealed per player in round j. The public information revealed in round j is α^j ∈ Θ^{κ_j}, and in all rounds up through j it is α̃^j = (α^1, . . . , α^j). The private information revealed to player i ∈ I in round j is β_i^j ∈ Θ^{γ_j}, and in all rounds up through j it is β̃_i^j = (β_i^1, . . . , β_i^j). Each signal θ ∈ Θ may only be revealed once.

SLIDE 9

DEFINITION 1 (continued). An ordered game is a tuple Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩, where:

  6. p is a probability distribution over Θ, with p(θ) > 0 for all θ ∈ Θ. Signals are drawn from Θ according to p without replacement, so if A is the set of signals already revealed, then

        p(x | A) = p(x) / Σ_{y ∉ A} p(y)   if x ∉ A,
        p(x | A) = 0                        if x ∈ A.

  7. ≽ is a partial ordering of subsets of Θ, and is defined for at least those pairs required by u (coming up in 2 slides).
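As an aside (not from the slides), the draw-without-replacement rule of item 6 can be sketched in code. The signal names and the uniform weights below are hypothetical illustrations, not part of the paper.

```python
# Sketch of the conditional signal distribution p(x | A) from Definition 1.6:
# already-revealed signals (the set A) get probability 0, and the remaining
# probability mass is renormalized.

def conditional_prob(p, x, revealed):
    """p(x | A): probability of drawing signal x given the revealed set A."""
    if x in revealed:
        return 0.0
    remaining_mass = sum(q for y, q in p.items() if y not in revealed)
    return p[x] / remaining_mass

# Uniform distribution over four example signals (hypothetical).
p = {"As": 0.25, "Ks": 0.25, "Qs": 0.25, "Js": 0.25}

print(conditional_prob(p, "Ks", set()))    # 0.25
print(conditional_prob(p, "Ks", {"As"}))   # 0.25 / 0.75 = 1/3
print(conditional_prob(p, "As", {"As"}))   # 0.0
```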

SLIDE 10

DEFINITION 1 (continued). An ordered game is a tuple Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩, where (recall: Z_j are the leaf nodes of G_j):

  8. ω : ∪_{j=1}^{r} Z_j → {over, continue} maps terminal nodes of each round G_j to one of two values: over, in which case the game ends, or continue, in which case the game continues to the next round. Clearly, for all z ∈ Z_r we require ω(z) = over. Let ω^j_over = {z ∈ Z_j | ω(z) = over} and ω^j_cont = {z ∈ Z_j | ω(z) = continue}.

SLIDE 11

DEFINITION 1 (continued). An ordered game is a tuple Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩, where:

  9. u = (u^1, . . . , u^r), with

        u^j : (×_{k=1}^{j−1} ω^k_cont) × ω^j_over × (×_{k=1}^{j} Θ^{κ_k}) × (×_{i=1}^{n} ×_{k=1}^{j} Θ^{γ_k}) → ℝ^n,

  a utility function such that for every j with 1 ≤ j ≤ r, for every i ∈ I, and for every z̃ ∈ (×_{k=1}^{j−1} ω^k_cont) × ω^j_over, at least one of the following two conditions holds:

  (a) Utility is signal independent, that is: u_i^j(z̃, ϑ) = u_i^j(z̃, ϑ′) for all legal ϑ, ϑ′ ∈ (×_{k=1}^{j} Θ^{κ_k}) × (×_{i=1}^{n} ×_{k=1}^{j} Θ^{γ_k}).

  (b) See next slide.

SLIDE 12

DEFINITION 1 (continued). The second condition on the utility function u:

  9. (b) For every z̃ ∈ (×_{k=1}^{j−1} ω^k_cont) × ω^j_over: ≽ is defined for all legal signals (α̃^j, β̃_i^j) and (α̃^j, β̃′_i^j) through round j, and a player's utility is increasing in her private signals, all else equal:

        (α̃^j, β̃_i^j) ≽ (α̃^j, β̃′_i^j)  ⟹  u_i(z̃, α̃^j, (β̃_i^j, β̃_{−i}^j)) ≥ u_i(z̃, α̃^j, (β̃′_i^j, β̃_{−i}^j)).

SLIDE 13

DEFINITION 1 (summary). An ordered game is a tuple Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩.

  1. I: finite set of players.
  2. G_j = (V_j, E_j), where V_j are the nodes and E_j the edges in round j. Z_j ⊂ V_j: the leaf nodes of G_j. N_j(v): the outgoing neighbors of v ∈ V_j.
  3. L_j: mapping from non-terminal nodes to players (to act in round j).
  4. Θ: finite set of signals.
  5. κ_j: number of public signals revealed in round j; γ_j: number of private signals revealed per player in round j.
  6. p: probability distribution over Θ.
  7. ≽: partial ordering of subsets of Θ.
  8. ω: mapping from terminal nodes in each round to {over, continue}.
  9. u: utility function.

SLIDE 14

Information filters

SLIDE 15

Let Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩ be an ordered game. Let S^j be the set of legal* signals for one player up through round j.

DEFINITION 2. An information filter for Γ is a collection F = ⟨F^1, . . . , F^r⟩ where each F^j is a function F^j : S^j → 2^{S^j} such that the following conditions hold:

  1. Truthfulness. (α̃^j, β̃_i^j) ∈ F^j(α̃^j, β̃_i^j) for all legal (α̃^j, β̃_i^j).
  2. Independence. The range of F^j is a partition of S^j.
  3. Information preservation. If two values of a signal are distinguishable in round k, then they are distinguishable in each round j > k. That is, let m_j = Σ_{l=1}^{j} (κ_l + γ_l). We require that for all legal* (θ_1, . . . , θ_{m_k}, . . . , θ_{m_j}) ⊆ Θ and (θ′_1, . . . , θ′_{m_k}, . . . , θ′_{m_j}) ⊆ Θ: if (θ′_1, . . . , θ′_{m_k}) ∉ F^k(θ_1, . . . , θ_{m_k}), then (θ′_1, . . . , θ′_{m_k}, . . . , θ′_{m_j}) ∉ F^j(θ_1, . . . , θ_{m_k}, . . . , θ_{m_j}).

SLIDE 16

Example

Intuition: by passing signals through a filter before revealing them, informative precision can be reduced while keeping the underlying action space intact, thus shrinking the game tree.
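A minimal sketch of this intuition (not from the slides): an information filter can be represented as a map from each signal to its equivalence class. The rank-only filter that ignores card suits is a hypothetical example of reduced informative precision.

```python
# A toy information filter: map each signal to its equivalence class.
# Truthfulness: every signal belongs to its own class.
# Independence: the classes partition the signal set.
from collections import defaultdict

def make_filter(signals, key):
    """Build F: signal -> frozenset of signals sharing key(signal)."""
    classes = defaultdict(set)
    for s in signals:
        classes[key(s)].add(s)
    return {s: frozenset(classes[key(s)]) for s in signals}

cards = ["As", "Ah", "Ks", "Kh"]            # hypothetical rank+suit signals
F = make_filter(cards, key=lambda c: c[0])  # filter away the suit

print(F["As"] == frozenset({"As", "Ah"}))   # True: suits indistinguishable

# Truthfulness and independence checks:
assert all(s in F[s] for s in cards)
assert sum(len(c) for c in set(F.values())) == len(cards)
```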

SLIDES 17–24

(Figure-only example slides; no text content.)

SLIDE 25

Equilibria

SLIDE 26

Let Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩ be an ordered game. Recall: V_j are nodes, Z_j are leaves, L_j(v) maps nodes to players.

DEFINITION 3. A behavior strategy for player i in round j of Γ with information filter F is a probability distribution over possible actions, defined for each player i, round j, and v ∈ V_j \ Z_j such that L_j(v) = i:

        σ_{i,v}^j : (×_{k=1}^{j−1} ω^k_cont) × Range(F^j) → ∆{w ∈ V_j | (v, w) ∈ E_j}.

A behavior strategy for player i in Γ is σ_i = (σ_i^1, . . . , σ_i^r). σ_i is a best response to σ_{−i} if, for all other strategies σ′_i: u_i(σ_i, σ_{−i}) ≥ u_i(σ′_i, σ_{−i}). A strategy profile is σ = (σ_1, . . . , σ_n). σ is a Nash equilibrium if, for every player i, σ_i is a best response to σ_{−i}.
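As an illustration (not from the slides), the best-response condition can be checked mechanically in a toy two-player matrix game, i.e. a degenerate one-round game with pure strategies. The payoff table below is a hypothetical coordination game.

```python
# Sketch of the best-response / Nash-equilibrium conditions of Definition 3,
# restricted to pure strategies in a tiny 2-player matrix game.
import itertools

payoff = {  # (action1, action2) -> (u1, u2): a small coordination game
    ("a", "a"): (2, 2), ("a", "b"): (0, 0),
    ("b", "a"): (0, 0), ("b", "b"): (1, 1),
}
actions = ["a", "b"]

def is_best_response(i, profile):
    """True if player i cannot improve by deviating unilaterally."""
    u = payoff[profile][i]
    for a in actions:
        dev = tuple(a if k == i else profile[k] for k in range(2))
        if payoff[dev][i] > u:
            return False
    return True

equilibria = [pr for pr in itertools.product(actions, actions)
              if all(is_best_response(i, pr) for i in range(2))]
print(equilibria)   # [('a', 'a'), ('b', 'b')]
```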

SLIDE 27

An ordered game Γ and an information filter F for Γ define a new game Γ_F. We refer to such games as filtered ordered games.

PROPOSITION 1. A filtered ordered game is an extensive form game satisfying perfect recall.

"A Nash equilibrium always exists in finite extensive form games, and one exists in behavior strategies for games with perfect recall."

COROLLARY 1. For any filtered ordered game, a Nash equilibrium exists in behavior strategies.

SLIDE 28

Equilibrium-preserving abstractions

SLIDE 29

DEFINITION 4. Associated with every ordered game Γ and information filter F is a filtered signal tree, a directed tree with components analogous to those of Γ.

SLIDE 30

Recall: N_j(v) are the outgoing neighbors (i.e. children) of v ∈ V_j.

DEFINITION 5a. Two subtrees beginning at internal nodes x and y of a filtered signal tree are ordered game isomorphic if x and y have the same parent and there is a bijection f : N(x) → N(y) such that for all w ∈ N(x) and v ∈ N(y): if v = f(w), then the weights on the edges (x, w) and (y, v) are the same, and the subtrees beginning at w and v are ordered game isomorphic.

DEFINITION 5b. Two leaves corresponding to filtered signals ϑ and ϑ′ up through round r are ordered game isomorphic if, for all z̃ ∈ (×_{k=1}^{r−1} ω^k_cont) × ω^r_over: u^r(z̃, ϑ) = u^r(z̃, ϑ′).
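The recursion of Definitions 5a/5b can be sketched as a recursive tree comparison (not from the paper). The node representation is a hypothetical one: leaves carry a utility tuple, internal nodes carry a list of (edge weight, child) pairs. Instead of searching over bijections f : N(x) → N(y), the sketch compares canonical forms, which agrees with the definition because isomorphic subtrees get equal signatures.

```python
# Sketch of the ordered-game-isomorphism test of Definitions 5a/5b.
# Leaf: a utility tuple. Internal node: a list of (weight, child) pairs.

def signature(node):
    """Canonical form of a subtree: leaves by utility vector, internal
    nodes by the sorted multiset of (weight, child-signature) pairs."""
    if isinstance(node, tuple):              # leaf: utility vector
        return ("leaf", node)
    return ("node", tuple(sorted((w, signature(c)) for w, c in node)))

def ordered_game_isomorphic(x, y):
    return signature(x) == signature(y)

# Two subtrees whose children differ only in order are isomorphic:
x = [(0.5, (1, -1)), (0.5, (0, 0))]
y = [(0.5, (0, 0)), (0.5, (1, -1))]
print(ordered_game_isomorphic(x, y))                 # True
print(ordered_game_isomorphic(x, [(1.0, (1, -1))]))  # False
```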

SLIDE 31

Let Γ = ⟨I, G, L, Θ, κ, γ, p, ≽, ω, u⟩ be an ordered game, let F be an information filter for Γ, and let ϑ and ϑ′ be two nodes whose subtrees in the induced filtered signal tree are ordered game isomorphic.

DEFINITION 6. The ordered game isomorphic abstraction transformation creates a new information filter F′:

        F′^j(α̃^j, β̃_i^j) = F^j(α̃^j, β̃_i^j)   if (α̃^j, β̃_i^j) ∉ ϑ ∪ ϑ′,
        F′^j(α̃^j, β̃_i^j) = ϑ ∪ ϑ′             if (α̃^j, β̃_i^j) ∈ ϑ ∪ ϑ′.

As it turns out, any Nash equilibrium of the induced game Γ_{F′} corresponds to a Nash equilibrium in Γ_F.

SLIDE 32

GameShrink: Algorithm sketch

  • Let F be the identity filter: F^j(α̃^j, β̃_i^j) = {(α̃^j, β̃_i^j)}.
  • Going breadth-first from top to bottom, for each pair (ϑ, ϑ′):
      – Check whether the subtrees are ordered game isomorphic.
      – If so, apply the ordered game isomorphic abstraction transformation to F.
  • Return F.
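The loop above can be sketched as a pairwise merge over sibling nodes (an illustration, not the paper's implementation). The layered signal-tree representation and the trivial equality test used below are hypothetical stand-ins for the paper's data structures and isomorphism check, and none of the machinery needed to reach O(n²) is included.

```python
# Sketch of the GameShrink merge loop: walk the signal tree breadth-first
# and merge each node into the class of an earlier isomorphic sibling.

def game_shrink(layers, isomorphic):
    """layers: lists of sibling nodes, top to bottom (breadth-first).
    Returns a merge map: (layer, j) -> representative (layer, i)."""
    merge = {}
    for L, layer in enumerate(layers):
        for i in range(len(layer)):
            if (L, i) in merge:
                continue                      # already merged away
            for j in range(i + 1, len(layer)):
                if (L, j) not in merge and isomorphic(layer[i], layer[j]):
                    merge[(L, j)] = (L, i)    # merge j into i's class
    return merge

# Toy usage with utility-tuple leaves and equality as the isomorphism test:
leaves = [(1, -1), (1, -1), (0, 0)]
m = game_shrink([leaves], lambda a, b: a == b)
print(m)   # {(0, 1): (0, 0)}: the second (1, -1) leaf merged into the first
```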

SLIDE 33

GameShrink: Results

  • Algorithm complexity is O(n²), n being the number of signal tree nodes,
  • i.e. sublinear in the size of the game tree.
  • The Rhode Island Hold'em game tree shrank by a factor of ~74 in under a second.
  • The Rhode Island induced filtered signal tree was solved in 7 days and 13 hours.
  • Computations were performed on high-end, non-supercomputer hardware (1.65 GHz IBM eServer p5 570 with 64 GB RAM, 25 GB used).

SLIDE 34

Thanks for your attention!

Questions?