

SLIDE 1

Multi-agent learning

Satisficing strategies

Ronald Chu, Geertjan van Vliet, Technical Artificial Intelligence, Universiteit Utrecht. Last update: Thursday 24th March, 2011 at 12:53h.

SLIDE 2

Outline

  • What is satisficing
  • Satisficing in the repeated prisoner’s dilemma (RPD)
  • Satisficing in the multi-agent social dilemma (MASD)

Stimpson et al. (2001): Satisficing and Learning Cooperation in the Prisoner’s Dilemma
Stimpson et al. (2003): Learning to Cooperate in a Social Dilemma: A Satisficing Approach to Bargaining

SLIDE 3

Satisficing (1)

Optimize: choose the best available option.
Satisfice: choose an option that meets a certain aspiration level. Such an option does not have to be unique or in any way the best.

Why satisficing?

  • No information needed except:

– The available actions
– The payoff of the last action

  • Aspiration level is adaptive

SLIDE 4

Satisficing (2)

At time t player A has a strategy pair (At, αt)

  • Action At ∈ {C, D}
  • Aspiration level αt
  • Payoff Rt

Strategy is updated each round

  • At+1 = At iff Rt ≥ αt, otherwise the action is flipped (At+1 ≠ At)
  • αt+1 = λαt + (1 − λ)Rt where 0 ≤ λ ≤ 1
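
A minimal sketch of this update rule in Python (the 'C'/'D' encoding and the function name are illustrative, not from the paper):

```python
def satisficing_step(action, aspiration, payoff, lam):
    """One satisficing update for a two-action game.

    Keep the action if the payoff met the aspiration level, otherwise
    flip to the other action; in both cases relax the aspiration toward
    the observed payoff with learning rate lam (0 <= lam <= 1).
    """
    if payoff >= aspiration:
        next_action = action                          # satisfied: repeat
    else:
        next_action = 'D' if action == 'C' else 'C'   # unsatisfied: flip
    next_aspiration = lam * aspiration + (1 - lam) * payoff
    return next_action, next_aspiration
```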

SLIDE 5

Satisficing (3)

 t  TFT  At  Rt   αt        t  TFT  At  Rt   αt
 0   C   C    3  4.00      10   C   D    4  1.72
 1   C   D    4  3.50      11   D   D    2  2.86
 2   D   D    2  3.75      12   D   C    1  2.43
 3   D   C    1  2.88      13   C   D    4  1.71
 4   C   D    4  1.94      14   D   D    2  2.86
 5   D   D    2  2.97      15   D   C    1  2.43
 6   D   C    1  2.48      16   C   D    4  1.71
 7   C   D    4  1.74      17   D   D    2  2.86
 8   D   D    2  2.87      18   D   C    1  2.43
 9   D   C    1  2.44      19   C   D    4  1.71

Satisficing strategy (with λ = 0.5) against a tit-for-tat strategy.
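The trace above can be reproduced with a short simulation. The payoffs (DC = 4, CC = 3, DD = 2, CD = 1, from the satisficing player's perspective) are read off the table:

```python
# Payoff to the satisficing player A: (A's action, opponent's action) -> reward.
PAYOFF = {('C', 'C'): 3, ('D', 'C'): 4, ('D', 'D'): 2, ('C', 'D'): 1}

def run_vs_tit_for_tat(a='C', alpha=4.0, lam=0.5, rounds=20):
    tft = 'C'                            # tit-for-tat opens cooperatively
    for t in range(rounds):
        r = PAYOFF[(a, tft)]
        print(f"t={t:2d}  tft={tft}  A={a}  R={r}  alpha={alpha:.2f}")
        tft_next = a                     # tit-for-tat copies A's last move
        if r < alpha:                    # unsatisfied: flip the action
            a = 'D' if a == 'C' else 'C'
        alpha = lam * alpha + (1 - lam) * r
        tft = tft_next

run_vs_tit_for_tat()
```

The run settles into the three-round cycle visible in the right half of the table: with these payoffs and λ = 0.5, satisficing never locks into cooperation against tit-for-tat.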

SLIDE 6

Repeated Prisoner’s Dilemma (1)

A two-player two-action social dilemma

  • Initial research focuses on Nash equilibrium
  • Mutual cooperation is rational in the repeated prisoner’s dilemma (Axelrod, 1984)

  • Research usually assumes that the agent knows:

– the structure of the game
– the decisions of the opponent(s)
– the payoffs of the opponent(s)
– that opponents’ actions affect the outcomes

SLIDE 7

Repeated Prisoner’s Dilemma (2)

Extend notation for a two-player game:

  • second player has strategy pair (Bt, βt)
  • both players have the same learning rate λ

Payoff matrix for the PD is generalized:

  • σ payoff for mutual cooperation
  • δ payoff for mutual defection
  • 0 < δ < σ < 1
  • 0.5 < σ (so mutual cooperation beats alternating exploitation, which averages (1 + 0)/2)

          C          D
C      (σ, σ)     (0, 1)
D      (1, 0)     (δ, δ)

(Row player’s payoff listed first.)

SLIDE 8

RPD Experiment

Several possible outcomes:

  • Convergence to a fixed strategy
  • Convergence to some action cycle
  • No convergence.

Stimpson et al. ran 5,000 runs of the repeated PD, with payoffs, initial actions, initial aspirations, and learning rates drawn uniformly at random from bounded ranges.
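
A sketch of such an experiment (the exact sampling ranges and run length are assumptions, not taken from the paper):

```python
import random

def rpd_run(sigma, delta, a, b, alpha, beta, lam, rounds=1_000):
    """Two satisficing players in the generalized PD; returns the final action pair."""
    payoff = {('C', 'C'): sigma, ('C', 'D'): 0.0,   # own action listed first
              ('D', 'C'): 1.0,   ('D', 'D'): delta}
    for _ in range(rounds):
        ra, rb = payoff[(a, b)], payoff[(b, a)]
        na = a if ra >= alpha else ('D' if a == 'C' else 'C')
        nb = b if rb >= beta else ('D' if b == 'C' else 'C')
        alpha = lam * alpha + (1 - lam) * ra
        beta = lam * beta + (1 - lam) * rb
        a, b = na, nb
    return a, b

runs, cc = 5_000, 0
for _ in range(runs):
    delta = random.uniform(0.0, 0.5)                 # 0 < delta < sigma
    sigma = random.uniform(0.5, 1.0)                 # 0.5 < sigma < 1
    cc += rpd_run(sigma, delta,
                  random.choice('CD'), random.choice('CD'),
                  alpha=random.random(), beta=random.random(),
                  lam=random.uniform(0.5, 0.99)) == ('C', 'C')
print(f"ended in CC in {100 * cc / runs:.1f}% of runs")
```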

SLIDE 9

RPD Experiment: results

Observed outcomes:

  • 74% CC
  • 25% DD-DC-DD-CD
  • 1% DD
  • 0% DD-CC-DC

There are several parameters influencing convergence:

  • Payoffs
  • Initial aspirations
  • Initial actions
  • Learning rate

SLIDE 10

RPD Experiment: payoffs

SLIDE 11

RPD Experiment: initial aspirations

SLIDE 12

RPD Experiment: initial actions

Convergence to mutual cooperation by initial action pair:

  • 81.6% CC
  • 81.6% DD
  • 73.7% Random
  • 66.7% DC or CD

SLIDE 13

RPD Experiment: learning rate

SLIDE 14

RPD Experiment: conclusion (1)

The satisficing strategy converges to mutual cooperation in the RPD given:

  • 1. Big difference between mutual cooperation and defection payoffs
  • 2. High initial aspirations
  • 3. Similar initial behavior
  • 4. Slow learning rate

Stimpson et al. ran 5,000 runs with these parameters and observed 100% convergence to mutual cooperation.

SLIDE 15

RPD Experiment: conclusion (2)

In our 5,000 runs there was 94.1% convergence to mutual cooperation (94.8% with a maximum of 100,000 rounds).

SLIDE 16

Multi-agent social dilemma

  • Introduction
  • Satisficing algorithm

SLIDE 17

Multi-agent social dilemma (1)

Basic characteristics

  • Choice between selfish goal or group goal
  • Benefits from both group goal and selfish goal
  • Multi-action, multi-agent (more than 2x2)
  • Repeated game
  • Individual defection is the best option as long as other agents contribute

SLIDE 18

Multi-agent social dilemma (2)

Game structure.

  • We have M + 1 actions and N agents (N = |A|)
  • Each agent i ∈ A contributes ci units to the group, ci ∈ ℕ, 0 ≤ ci ≤ M
  • Reward received:

Ri(c) = kg (∑j∈A cj) + ks (M − ci)

  • Dynamics depend on the weight kg of the group goal versus the weight ks of the selfish goal; both are assumed constant and the same for all agents
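
A direct transcription of this reward in Python (the example values, and the normalization kg = 1/(NM), ks = k/M introduced on later slides, are illustrative):

```python
def masd_reward(i, c, M, kg, ks):
    """Reward for agent i: group term over everyone's contributions
    plus a selfish term for the units agent i kept for itself."""
    return kg * sum(c) + ks * (M - c[i])

# Hypothetical example: N = 3 agents with M = 10 units each; agent 2 free-rides.
N, M, k = 3, 10, 0.6
c = [10, 10, 0]
for i in range(N):
    print(i, masd_reward(i, c, M, kg=1 / (N * M), ks=k / M))
# Agents 0 and 1 get 0.67; the free-riding agent 2 gets 1.27.
```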

SLIDE 19

MASD Satisficing algorithm (1)

  • Ideally converges to (M, . . . , M)
  • All agents need to be satisficed to converge to an action profile
  • One agent will give up playing M if another changes strategy
  • Works best with:

– Initial aspirations higher than the best possible reward
– Slow learning rate

SLIDE 20

MASD Satisficing algorithm (2)

At time t, player i ∈ A has a strategy pair (Aᵢᵗ, αᵢᵗ)

  • Action Aᵢᵗ ∈ {0, . . . , M}
  • Aspiration level αᵢᵗ
  • Payoff Rᵢᵗ resulting from its strategy at t − 1

Strategy is updated each round

  • Aᵢᵗ⁺¹ = Aᵢᵗ iff Rᵢᵗ ≥ αᵢᵗ, otherwise a new action is chosen uniformly at random
  • αᵢᵗ⁺¹ = λαᵢᵗ + (1 − λ)Rᵢᵗ
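
A sketch of one synchronous round of this update for all agents (function and variable names are illustrative):

```python
import random

def masd_round(actions, aspirations, rewards, M, lam):
    """Satisficing update for every agent, given last round's rewards."""
    next_actions = [a if r >= asp else random.randint(0, M)   # keep or re-draw
                    for a, asp, r in zip(actions, aspirations, rewards)]
    next_aspirations = [lam * asp + (1 - lam) * r             # relax aspiration
                        for asp, r in zip(aspirations, rewards)]
    return next_actions, next_aspirations
```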

SLIDE 21

MASD Satisficing algorithm: example

Example run with M = 10, k = 0.6, and λ = 0.99.

SLIDE 22

Break

SLIDE 23

MASD Reward function

Choosing kg and ks.

  • Ri(c) = kg (∑j∈A cj) + ks (M − ci)
  • Make the reward range independent of N and M:

    kg = 1/(NM),  ks = 1/M

  • Introduce a weight factor k for the selfish goal, so that ks = k/M.

SLIDE 24

MASD Reward function

Interesting values for k, given kg = 1/(NM) and ks = k/M:

  • Goals are equally important when kg = ks ⇔ k = 1/N.
  • When k > 1, the selfish goal is always preferred by any agent (Exercise).
  • We study 1/N < k < 1, which means that ks > kg.

SLIDE 25

MASD Reward function

Final reward function.

  • Insert the new constants and write c−i = ∑j∈A\{i} cj for the contribution of the other agents:

    Ri(c) = (ci + c−i)/(NM) + (k/M)(M − ci)    (1)

  • Dividing by (1 − k) and dropping the constant k:

    Ri(c) = ((1 − kN) ci + c−i) / (NM (1 − k))    (2)

  • With this normalization, full cooperation gives each agent reward 1 and full defection gives 0.
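
A quick numeric check (sample values arbitrary) that (2) is the increasing affine transformation ((1) − k)/(1 − k) of (1), so both reward functions induce the same preferences:

```python
def r_raw(ci, c_rest, N, M, k):        # equation (1)
    return (ci + c_rest) / (N * M) + (k / M) * (M - ci)

def r_norm(ci, c_rest, N, M, k):       # equation (2)
    return ((1 - k * N) * ci + c_rest) / (N * M * (1 - k))

N, M, k = 4, 10, 0.6
for ci in range(M + 1):
    for c_rest in range((N - 1) * M + 1):
        assert abs(r_norm(ci, c_rest, N, M, k)
                   - (r_raw(ci, c_rest, N, M, k) - k) / (1 - k)) < 1e-12
print("equation (2) == (equation (1) - k) / (1 - k) on all profiles")
```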

SLIDE 26

MASD Properties

  • When do we have the best individual reward?
  • When do we have the best/worst average reward (over agents) ?
  • Is there a Nash Equilibrium?
  • What is the Pareto optimal solution and why? (Exercise)

SLIDE 27

MASD Properties

  • Using the reward function

    Ri(c) = (ci + c−i)/(NM) + (k/M)(M − ci) = kg (ci + c−i) + ks (M − ci)

  • Best individual reward for i: everybody contributes except agent i
  • Best average: everybody contributes, giving Ri = NM/(NM) = 1 for each agent i ∈ A
  • Worst average: nobody contributes, giving Ri = Mk/M = k per agent (1/N < k < 1)
  • Nash equilibrium: nobody contributes, since an increase in contribution returns kg versus a loss of ks > kg
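
These properties can be checked numerically (the small example values are assumed):

```python
def reward(ci, c_rest, N, M, k):
    """MASD reward with kg = 1/(N*M) and ks = k/M."""
    return (ci + c_rest) / (N * M) + (k / M) * (M - ci)

N, M, k = 3, 5, 0.6        # 1/N < k < 1, so ks > kg
# Whatever the others contribute, ci = 0 maximizes the reward, because a unit
# of contribution gains kg but loses ks > kg: full defection is a Nash equilibrium.
for c_rest in (0, M, (N - 1) * M):
    best = max(range(M + 1), key=lambda ci: reward(ci, c_rest, N, M, k))
    print(f"others contribute {c_rest:2d}: best response ci = {best}")
# Full cooperation gives each agent 1; full defection gives k.
print(reward(M, (N - 1) * M, N, M, k), reward(0, 0, N, M, k))
```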

SLIDE 28

Performance criteria

  • 1. Pareto efficiency in self-play
  • 2. Avoiding exploitability by other (selfish) agents.

SLIDE 29

Satisficing: Experiment 1

  • Initial values: Aspirations from the range [1.5,2.0].
  • Variables: M ∈ {0, 1, . . . , 20}, λ ∈ {0.95, 0.99}, k = 0.6
  • N = 2 (?)
  • Measurement: Average reward over all agents over 500 games for each M.
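
A sketch of this experiment (the run length, the use of the raw reward from equation (1), and the subset of M values are assumptions):

```python
import random

def masd_selfplay(N, M, k, lam, rounds=2_000):
    """Self-play MASD; returns the average reward per agent in the final round."""
    acts = [random.randint(0, M) for _ in range(N)]
    asps = [random.uniform(1.5, 2.0) for _ in range(N)]  # initial aspirations
    rews = [0.0] * N
    for _ in range(rounds):
        total = sum(acts)
        rews = [total / (N * M) + (k / M) * (M - a) for a in acts]
        acts = [a if r >= asp else random.randint(0, M)
                for a, asp, r in zip(acts, asps, rews)]
        asps = [lam * asp + (1 - lam) * r for asp, r in zip(asps, rews)]
    return sum(rews) / N

N, k = 2, 0.6
for lam in (0.95, 0.99):
    for M in (1, 5, 10, 20):                     # subset of M = 0..20
        avg = sum(masd_selfplay(N, M, k, lam) for _ in range(500)) / 500
        print(f"lambda={lam}  M={M:2d}  average reward {avg:.3f}")
```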

SLIDE 30

Satisficing: Result 1

SLIDE 31

Observations

  • Faster learning rate gives lower average reward.
  • Increasing M reduces the average reward initially, but it remains fairly high.

SLIDE 32

Explanations

  • Faster learning rate gives lower average reward

    – Larger adaptations, lower aspirations, lower end states

  • Having more actions available reduces the average reward initially, but it remains fairly high

    – States other than full mutual cooperation may occur that still have high average rewards

SLIDE 33

Satisficing: Experiment 2

  • Initial values: Aspirations from the range [Rmax, 2Rmax].
  • Variables: N ∈ {2, 3, . . . , 9}, M ∈ {1, 3, 5}, λ = 0.99
  • k selected randomly from (1/N, 1)
  • Measurement: Average reward over all agents over 500 games for each N, M

SLIDE 34

Satisficing: Result 2

SLIDE 35

Observations

  • Deterioration towards average reward 0.5 in a society of 10 agents with 6 options (agents still contribute)

  • Larger society has lower average reward and thus less cooperation
  • Increase of options reduces average reward (as seen in experiment 1)

SLIDE 36

Explanations

  • Deterioration towards average reward 0.5 in a society of 10 agents with 6 options (agents still contribute)
  • A larger society reduces the average reward and there is less cooperation (on average)

    – The lower average could be associated with lower aspirations
    – Why do lower aspiration levels occur?
    – More agents need to be satisficed ⇒ longer time for adaptation increases the satisficing region ⇒ lower aspiration levels

SLIDE 37

Performance of satisficing in MASD

  • 1. Pareto efficiency in self-play?
  • Convergence to mutual cooperation is likely (at least in some form)
  • 2. Avoiding exploitability by other (selfish) agents?
  • In a society of selfish agents, the algorithm is likely to converge to the single-play Nash equilibrium.

SLIDE 38

Satisficing: Against selfish agents

SLIDE 39

Satisficing: Against selfish agents

  • The part of the reward function regarding the choices of opponents reduces to 0.
  • The selfish part becomes negative for positive contributions.
  • Aspiration levels drop, but stay above the lowest possible reward level.
  • At some point Ri might be higher than the aspiration level,
  • or the algorithm converges towards the lowest level possible.
  • This only shows that convergence to a solution is possible.

SLIDE 40

Satisficing: Against selfish agents

SLIDE 41

Satisficing: Against selfish agents

We know: against selfish agents the algorithm converges to some solution.

  • Under certain conditions the convergence target is the Nash equilibrium.
  • In some time window t0 → t1, the Nash equilibrium is the only good solution.
  • Later, as the aspiration drops further and further, exploitable solutions become satisficing too.

SLIDE 42

Satisficing: Against selfish agents

SLIDE 43

Conclusions

  • The multi-agent social dilemma offers a platform similar to the prisoner’s dilemma for multiple agents and actions
  • The adapted satisficing algorithm often learns mutual cooperation, reverts to the Nash equilibrium against selfish agents, and is therefore not exploitable
  • Other algorithms studied by Stimpson in the paper do not match this performance

SLIDE 44

Discussion: Variations

Stimpson made several choices that may influence the algorithm’s performance or his results:

  • Equal weight vectors for all agents regarding the selfish and group goals
  • Equal learning rate for all agents
  • Uniformly choose new action when not satisficed
  • Very high initial aspiration levels
  • What is the performance against other learning agents?
