Multi-Agent Systems

Jörg Denzinger

4.3. Reinforcement learning for forming coalitions: the DFG algorithm

Weiß (1995)
DFG: Dissolution and Formation of Groups

Basic problems tackled:
• How can several agents learn what actions they can perform in parallel?
• How can several agents learn what sets of actions have to be executed sequentially?

Reinforcement Learning (I)

Watkins (1989)
Let's use our single-agent definition: an agent Ag then has in Dat, for each pair (s,a) ∈ Sit × Act, an evaluation e(s,a). Its decision function always selects, in a situation s, the action a for which e(s,a) is optimal.
Learning is performed by receiving feedback after an action or action sequence; a learn function Q distributes the feedback among the evaluations.
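
A minimal sketch of this single-agent setting in Python (the class, the parameter values and the concrete update rule are illustrative assumptions, not taken from the slides): an evaluation table e(s,a), a greedy decision function, and one possible learn function that folds feedback into the evaluations.

```python
# Minimal sketch; names, constants and the update rule are assumptions.
from collections import defaultdict

class EvalAgent:
    def __init__(self, actions, init_value=0.0):
        self.actions = list(actions)               # Act
        self.e = defaultdict(lambda: init_value)   # e(s, a) over Sit x Act

    def decide(self, s):
        # decision function: select the action with optimal (here: maximal) e(s, a)
        return max(self.actions, key=lambda a: self.e[(s, a)])

    def learn(self, s, a, feedback, rate=0.1):
        # one possible learn function Q: move e(s, a) toward the received feedback
        self.e[(s, a)] += rate * (feedback - self.e[(s, a)])
```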

Reinforcement Learning (II)

The interesting part of reinforcement learning (often also called Q-learning) is how the learn function Q is defined. There are many possibilities, and a particularly important point is how feedback is distributed after action sequences.
There are obvious similarities to learning in neural networks. The basic agent architecture resembles Markov processes, and their theory is used for proving properties of Q-functions.
From time to time, random decisions have to be made to try out new situation-action combinations → exploration.
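
The two points above can be illustrated with a small sketch (the names, the exploration probability and the discounted back-up are assumptions; the slides do not fix a particular Q-function): occasional random decisions, and one of many possible ways to distribute feedback over an action sequence.

```python
# Sketch only: explore_prob, rate and discount are assumed example values.
import random

def decide(e, s, actions, explore_prob=0.1):
    # from time to time make a random decision to try out new
    # situation-action combinations (exploration)
    if random.random() < explore_prob:
        return random.choice(actions)
    return max(actions, key=lambda a: e.get((s, a), 0.0))

def distribute_feedback(e, sequence, feedback, rate=0.1, discount=0.9):
    # sequence: [(s, a), ...] in execution order; later steps receive more credit
    credit = feedback
    for (s, a) in reversed(sequence):
        e[(s, a)] = e.get((s, a), 0.0) + rate * credit
        credit *= discount
```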

The DFG Algorithm - Scenario (I)

A set of organizations competes for furthering a given task. The general procedure is that, for each occurring situation, each organization is allowed to bid its next solution step, and only the solution step of the best organization is executed, thus generating the next situation.
An organization itself consists of compatible agents and smaller organizations. In the following, we call these organizations and agents units.
The units of a winning organization perform the actions that their decision functions suggest for the current situation.
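
A rough structural sketch of this scenario (class, attribute and method names are assumptions for illustration): each organization holds its units, bids for the current situation, and only the best bidder's solution step is executed.

```python
# Structural sketch; units are assumed to expose a decide(situation) method.
class Organization:
    def __init__(self, name, units):
        self.name = name
        self.units = units          # compatible agents and/or smaller organizations
        self.evaluation = {}        # E_i^j per situation S_j

    def bid(self, situation, default=1.0):
        # placeholder; the actual bid formula is given on a later slide
        return self.evaluation.get(situation, default)

    def solution_step(self, situation):
        # the units perform the actions their decision functions suggest
        return [unit.decide(situation) for unit in self.units]

def competition(organizations, situation):
    # only the solution step of the best organization is executed
    winner = max(organizations, key=lambda org: org.bid(situation))
    return winner, winner.solution_step(situation)
```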

The DFG Algorithm - Scenario (II)

This is the reason why the units have to be compatible, i.e. no action of one unit can prevent the action of another unit.
In each organization there is one agent that acts as leader and computes the bids of the organization. It also receives the rewards (feedback) for the organization. It represents the whole organization.
We want organizations to be dependent on situations!
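
One possible way to make the compatibility requirement concrete (this resource model is an assumption, not part of the slides): each action declares which resources it occupies, and two units are compatible if the resources needed by their actions are disjoint.

```python
# Assumed model of "no action of one unit can prevent the action of another":
# actions block resources, and compatibility means disjoint resource sets.
def compatible(actions_a, actions_b, blocked_by):
    """blocked_by maps an action to the set of resources it occupies."""
    used_a = set().union(*[blocked_by[a] for a in actions_a]) if actions_a else set()
    used_b = set().union(*[blocked_by[b] for b in actions_b]) if actions_b else set()
    return used_a.isdisjoint(used_b)
```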

The DFG Algorithm - Examples for organizations

[Diagrams: an example of a flat organization and of a hierarchical organization]

The DFG Algorithm - Rationale

Obviously, for each situation we want to find the organization whose units perform all possible actions that can be performed in parallel and that are also sensible, i.e. that further the problem solution process. The DFG algorithm tries to learn these organizations.

The DFG Algorithm - The basic cycle

The DFG algorithm learns by extending, dissolving and forming organizations.
Basic cycle (a skeleton is sketched below):
1. Competition:
   evaluation and selection of actions
2. Modification of evaluations:
   former and active organizations get rewarded
3. Development of organizations:
   dissolving and forming of organizations
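
A skeleton of one pass through this cycle (the environment interface and the two helper functions are hypothetical stand-ins for the steps detailed on the following slides):

```python
# Skeleton only: environment, modify_evaluations and develop_organizations
# are hypothetical placeholders for the pieces explained on the next slides.
def dfg_cycle(organizations, situation, environment,
              modify_evaluations, develop_organizations):
    # 1. Competition: evaluation and selection of actions
    winner = max(organizations, key=lambda org: org.bid(situation))
    next_situation, reward = environment.execute(winner.solution_step(situation))

    # 2. Modification of evaluations: former and active organizations get rewarded
    modify_evaluations(winner, situation, reward)

    # 3. Development of organizations: dissolving and forming of organizations
    organizations = develop_organizations(organizations)

    return organizations, next_situation
```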

Competition

S_j: actual situation
U_i: organization that could act in the actual situation

B_i^j = (a + b) × E_i^j : bid of U_i for S_j, where
  a: learn factor
  b: random factor
  E_i^j: evaluation of the combined actions of U_i for S_j so far
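
A direct transcription of the bid formula (the numeric value of the learn factor and the range of the random factor are assumed example values):

```python
# B_i^j = (a + b) * E_i^j ; the values of a and the range of b are assumptions.
import random

def bid(evaluation_E_ij, learn_factor_a=0.2, random_range_b=0.05):
    random_factor_b = random.uniform(0.0, random_range_b)
    return (learn_factor_a + random_factor_b) * evaluation_E_ij
```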

Modification of evaluations

Let U_i be the organization winning in situation S_j and U_k the winning organization that led from situation S_l to S_j.
Modify the evaluations as follows:

  E_i^j = E_i^j - a × E_i^j + R_extern
  E_k^l = E_k^l + a × E_i^j

where R_extern is the external feedback provided by the environment.
→ This stabilizes successful action sequences and destabilizes unsuccessful sequences.
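
The same update written out in code, with the evaluations E stored in a dictionary keyed by (organization, situation). The slide leaves implicit whether the second update uses the old or the already-updated value of E_i^j; the sketch below uses the old value, which is an assumption.

```python
# Dictionary-based sketch of the two updates; using the *old* E_i^j in the
# second line is an assumption about the update order.
def modify_evaluations(E, i, j, k, l, r_extern, a=0.2):
    old_eij = E[(i, j)]
    E[(i, j)] = old_eij - a * old_eij + r_extern   # winner U_i in current situation S_j
    E[(k, l)] = E[(k, l)] + a * old_eij            # winner U_k that led from S_l to S_j
```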

Development of organizations (I)

• After starting the system, and as long as the evaluation of a unit is increasing, there is no need to look for alternative organizations, i.e. no extensions, no defects.
• An interest in alternative organizations starts when the evaluation of a unit decreases or stagnates. In order to find this out, the leader (or the agent itself) computes a moving mean value of the last n modifications of the evaluation of the unit (sketched below).
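
A sketch of this bookkeeping (the window size n and the reading of "modification" as the evaluation value after each change are assumptions):

```python
# Track the last n evaluation values of a unit and report their moving mean.
from collections import deque

class EvaluationHistory:
    def __init__(self, n=5, initial=1.0):
        # the oldest retained entry roughly plays the role of the evaluation
        # "before n+1 modifications" used on the next slide
        self.values = deque([initial], maxlen=n + 1)

    def record(self, new_evaluation):
        self.values.append(new_evaluation)

    def moving_mean(self):
        recent = list(self.values)[1:] or [self.values[0]]
        return sum(recent) / len(recent)

    def stagnating_or_decreasing(self):
        # interest in alternatives starts once the recent mean no longer
        # exceeds the oldest tracked value
        return self.moving_mean() <= self.values[0]
```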

Development of organizations (II)

• Organizations interested in alternatives form a new (combined) organization if the modification mean value gets smaller than the evaluation before n+1 modifications (multiplied by a so-called formation factor). First the unit with the highest evaluation selects one cooperation partner, namely the compatible unit with the highest evaluation; then, among the remaining ones, this is repeated until all units have found a new partner or there are no compatible units left anymore (see the sketch below).
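
A sketch of this greedy partner selection (`evaluation` and `compatible` are assumed callables, and a new organization is represented simply as a pair of units):

```python
# Greedy pairing: the highest-evaluated interested unit picks the compatible
# unit with the highest evaluation, then this repeats for the rest.
def form_new_organizations(interested_units, evaluation, compatible):
    remaining = sorted(interested_units, key=evaluation, reverse=True)
    new_organizations = []
    while remaining:
        unit = remaining.pop(0)                     # currently highest evaluation
        partners = [u for u in remaining if compatible(unit, u)]
        if not partners:
            continue                                # no compatible unit left for this one
        partner = max(partners, key=evaluation)
        remaining.remove(partner)
        new_organizations.append((unit, partner))   # new (combined) organization
    return new_organizations
```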

Development of organizations (III)

• An organization is dissolved by its leader if the mean value of its evaluation falls below its initial evaluation (from when it was formed) multiplied by a so-called dissolution factor (see the sketch below).
• Whenever a unit has to bid for the first time for its situation, it uses a predefined value E_init.
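
A one-line sketch of the dissolution test (the concrete dissolution factor is an assumed example value):

```python
# The leader dissolves the organization once the mean evaluation drops below
# the initial evaluation scaled by the dissolution factor (0.5 is assumed).
def should_dissolve(mean_evaluation, initial_evaluation, dissolution_factor=0.5):
    return mean_evaluation < dissolution_factor * initial_evaluation
```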

Characterization of the DFG algorithm

Each unit permanently performs
• online learning
• with a teacher who specifies the quality of its behavior.
Learning is achieved by gathering experience.

Discussion

+ Good solution to the problem scenario
+ Rather fine tuning of organizations to situations possible

- Only sensible for a small Sit and a small Mact
- In order to allow for learning, the same situations have to occur very often
- Big administrative overhead in agents