Evolving Societies of Learning Autonomous Systems (ESLAS)



SLIDE 1

Evolving Societies of Learning Autonomous Systems (ESLAS)

Franz Rammig, Bernd Kleinjohann, Alexander Jungmann University of Paderborn

Organic Computing Status Colloquium / Sept 2010

SLIDE 2

The ESLAS project phase III

  • Goal: Organic coordination and cooperation

September 21-22, 2009 DFG 1183 ORGANIC COMPUTING


  • How to coordinate multiple (possibly contradicting) goals of one robot?

→ dynamic goal prioritization based on Multi-SMDPs, dependent on the motivation system

  • How to enable non-obtrusive cooperation?

→ use behavior recognition (imitation) to model teammates
→ use those models to "meta-learn" team strategies (Cartesian product of the state spaces), dependent on vicinity

[Figure: several identical controller instances; each controller comprises observers, input, DEC (decision), BC (behavior construction), LTM (long term memory), EXPL (exploration), ACT (action capabilities), EV (evaluation), EM (episode memory), and output.]
SLIDE 3

Part I: Organic goal coordination

  • Up to now:

September 2010 DFG 1183 ORGANIC COMPUTING

[Figure: the ESLAS controller architecture — observer, input, DEC (decision), BC (behavior construction), LTM (long term memory), EXPL (exploration), ACT (action capabilities), EV (evaluation), EM (episode memory), and output.]
SLIDE 4
  • Recap:

– Intrinsic high-level state of the robot

  • Robot's goals defined by means of drives
  • Dependent on perception and time-dependent functions
  • Threshold defines state of “well-being” or satisfaction

– Drive examples

  • Battery level
  • Collect items
  • Transport items to base

– Generation of motivation

  • Vector to “well-being region”
  • Dynamic drive state → dynamic motivation

  • ESLAS Phase II: Pursue the goal most in need

– i.e., greedy goal selection
– Problem: does not pay attention to the dynamics of the drive state
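The drive mechanism recapped above can be sketched in a few lines of Python. This is an illustrative reconstruction, not ESLAS code: the class names, thresholds, and the scalar "distance to the well-being region" are assumptions.

```python
class Drive:
    """One intrinsic drive, e.g. battery level or item collection."""

    def __init__(self, name, well_being_threshold):
        self.name = name
        self.threshold = well_being_threshold  # boundary of the well-being region
        self.value = 1.0                       # current drive state in [0, 1]

    def update(self, perception_delta, dt, decay_rate=0.01):
        # Drive state depends on perception and on a time-dependent decay.
        self.value = max(0.0, min(1.0, self.value + perception_delta - decay_rate * dt))

    def motivation(self):
        # Motivation is the (here scalar) distance to the well-being region:
        # zero while satisfied, growing as the state drops below the threshold.
        return max(0.0, self.threshold - self.value)

def greedy_goal_selection(drives):
    # ESLAS Phase II behavior: pursue the goal most in need.
    return max(drives, key=lambda d: d.motivation())

battery = Drive("battery", well_being_threshold=0.5)
collect = Drive("collect_items", well_being_threshold=0.3)
battery.update(perception_delta=-0.6, dt=10)  # battery drained while driving
collect.update(perception_delta=0.0, dt=10)
print(greedy_goal_selection([battery, collect]).name)  # battery
```

Because the drive state changes over time, the motivations (and hence the greedy choice) are dynamic, which is exactly the problem the Phase III slides address.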


Organic goal coordination

SLIDE 5

Organic goal coordination

  • ESLAS Phase III

– While pursuing one goal, try to fulfill other goals as well that are "on your way"
– Example: if the battery level is low, but the robot can transport an object to its base "with a minor detour" while driving to the charging station, then it shall do so

  • Goal coordination (COORD)

– Keeps track of all the state spaces for the different drives' strategies
– Maintains a prioritization of those, dependent on their motivation
– Switches between the different strategies at runtime
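The COORD bookkeeping described above can be sketched as follows. Class and policy names are illustrative assumptions, and the "most urgent wins" rule is a stand-in for the actual prioritization schemes (MV, EG, EG-PS) discussed on the later slides.

```python
class Coordinator:
    """Sketch of COORD: one learned strategy (SMDP policy) per drive."""

    def __init__(self):
        self.strategies = {}  # drive name -> strategy for that drive
        self.active = None

    def register(self, drive_name, strategy):
        self.strategies[drive_name] = strategy

    def step(self, motivations):
        # motivations: drive name -> current motivation value.
        # Maintain a prioritization and switch strategies at runtime.
        most_urgent = max(motivations, key=motivations.get)
        if most_urgent != self.active:
            self.active = most_urgent  # runtime strategy switch
        return self.strategies[self.active]

coord = Coordinator()
coord.register("battery", "goto_charging_station_policy")
coord.register("transport", "deliver_to_base_policy")
print(coord.step({"battery": 0.7, "transport": 0.2}))  # goto_charging_station_policy
```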


[Figure: the controller architecture extended with one SMDP per drive and a new COORD (goal coordination) component, alongside observer, input, DEC, BC, LTM, EXPL, ACT, EV, EM, and output.]

SLIDE 6

Challenge 1: When is a detour worthwhile?

  • Heuristic: A detour is the more worthwhile,

– the shorter it is → measured by means of state values
– the more beneficial it is → measured by means of the RL reward

  • BUT: the state value as a measure of "detour length" is problematic

– ESLAS splits and merges "raw states" into abstract states in order to keep RL tractable
– Different strategies will have different state space abstractions, which leads us to Challenge 2
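The heuristic can be written down as a tiny scoring function. This is a hedged reading of the slide, assuming a detour's score combines the secondary strategy's state value (a proxy for the remaining distance) with its RL reward; the deck does not give the exact formula.

```python
def detour_score(state_value_on_detour, detour_reward):
    """Score a detour: a higher state value ~ shorter remaining path to the
    secondary goal, a higher reward ~ more beneficial detour."""
    return detour_reward * state_value_on_detour

# A detour that is short (high state value) and well rewarded scores highest:
assert detour_score(0.9, 10.0) > detour_score(0.2, 10.0)  # shorter wins
assert detour_score(0.9, 10.0) > detour_score(0.9, 2.0)   # more beneficial wins
```

Note that this already exposes the problem the slide names: the state values come from per-strategy abstractions, so they are not directly comparable across strategies (Challenge 2).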


SLIDE 7

Challenge 2: Different states in different strategies

  • How to intelligently handle state changes in different strategies?

[Figure: drives a, b, c with motivations m_a, m_b, m_c feed strategies SMDPa, SMDPb, SMDPc; the same locations A and B map to different abstract states with different state values (e.g. 65, 67, 78, 81) in each strategy's abstraction.]
SLIDE 8

Solving the challenges by three approaches

  • Mv: Priority-weighted state value
  • EG: Expected Gain
  • EG-PS: Expected Gain – Primary Secondary


SLIDE 9

Mv: Priority-weighted state value

  • Chooses drives that are urgent and easy to satisfy


MV(i) = m_i · v_i

with priority m_i (higher = lower timely effort) and refined state value v_i.

  • Example (m_a = 3.4, m_b = 2.6, m_c = 2.6):

MV(a) = m_a · v_a = 3.4 · 59 ≈ 200
MV(b) = m_b · v_b = 2.6 · 90 ≈ 234
MV(a) < MV(b), so drive b is chosen.

[Figure: state values from 53 to 100 along the paths between locations A and B in SMDPa and SMDPb.]
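Plugging the slide's numbers into MV(i) = m_i · v_i gives a minimal sketch; the value v_c for drive c is a made-up number added only to complete the example.

```python
def mv(priority, state_value):
    # MV(i) = m_i * v_i: the drive's priority (higher = lower timely effort)
    # times the refined value of the active state in that drive's SMDP.
    return priority * state_value

# Numbers from the slide: m_a = 3.4, v_a = 59; m_b = 2.6, v_b = 90.
# v_c = 40 is an assumed value for illustration.
drives = {"a": (3.4, 59), "b": (2.6, 90), "c": (2.6, 40)}
chosen = max(drives, key=lambda name: mv(*drives[name]))
print(chosen)  # "b": urgent AND easy to satisfy
```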
SLIDE 10

EG: Expected Gain


SLIDE 11

EG-PS: Expected Gain – Primary Secondary


  • Similar to EG, but considers only sequences of two drives: primary and secondary
  • State assignment by EG
  • Considers mainly the primary drive (similar to greedy), but also allows for a detour serving another one (secondary)
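The bullets above can be sketched as follows. Only the pairing structure is from the slide: the scoring function, the 0.5 detour discount, and all names are illustrative placeholders, not the project's actual formula.

```python
from itertools import permutations

def expected_gain(primary, secondary, motivations, state_values):
    # Weight mainly the primary drive (greedy-like), plus a discounted
    # contribution for a detour serving the secondary drive.
    gain = motivations[primary] * state_values[primary]
    gain += 0.5 * motivations[secondary] * state_values[secondary]
    return gain

def eg_ps(drives, motivations, state_values):
    # Unlike full EG, consider only ordered pairs (primary, secondary),
    # not all possible goal sequences.
    pairs = permutations(drives, 2)
    return max(pairs, key=lambda p: expected_gain(p[0], p[1], motivations, state_values))

m = {"battery": 0.8, "transport": 0.3, "collect": 0.1}  # motivations
v = {"battery": 70, "transport": 60, "collect": 40}     # active state values
print(eg_ps(list(m), m, v))  # ('battery', 'transport')
```

Restricting the search to pairs keeps the runtime linear-quadratic in the number of drives, which is why EG-PS can approach EG's accuracy at acceptable cost (SLIDE 14).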

SLIDE 12

Evaluation example scenario 1

[Chart: average total degree of "unsatisfaction" (ØΣ) per trial (1–10) for MV, EG, EG-PS, FIXED, and GREEDY; improvements of roughly 27–39% over the baseline.]

SLIDE 13

Evaluation example scenario 2

[Chart: average total degree of "unsatisfaction" (ØΣ) per trial (1–10) for MV, EG, EG-PS, FIXED, and GREEDY; improvements of roughly 11–22% over the baseline.]

SLIDE 14

Organic coordination: Results

  • Organic coordination solves the problem of efficiently selecting a robot's actions based on SMDPs in the presence of dynamically prioritized goals.

  • Mv weights the current priority (drive) of a goal with the value of the active state in the corresponding SMDP

– Doing only this is suboptimal: it does not handle the consequences that emerge when reaching a goal state of a selected strategy

  • Refined “EG” considers all possible sequences of the goals.

–higher precision, but slower runtime

  • The approximation EG-PS achieves EG's accuracy, but with acceptable runtime


SLIDE 15

Part II: Organic cooperation (ongoing work)

  • Cooperation methods

– Communication

  • Robots exchange action plans prior to execution
  • Locally choose the joint action with the highest expected utility

– Social laws

  • Conventions specified at design time
  • Restrict robots' decisions in coordinated actions (stigmergy)

– Learning methods

  • Deriving knowledge from the experience of repeated interactions
  • Typically need to "look into the teammates' brains"


MRS Taxonomy (Farinelli et al., 2004)

Here, we can do better!

SLIDE 16

Organic cooperation

  • What we can already do:

"Understanding" other robots' behavior in terms of our own capabilities

  • What we need:

1. Apply this continuously to all robots, all the time, to build models
2. Combine these models on demand with our own strategy


SLIDE 17

Organic cooperation

  • How to model the behavior of teammates?

– Assumption "my teammate is similar to me" → model deviations from the expected behavior
– Assumption "my teammate is different from me (knows nothing)" → model the complete behavior
– Important not because of the model size, but because of the "approximation speed"

  • Mixing individual and teammate strategies
  • If a teammate is exploring, it should not be modeled

– Determine whether the teammate is in exploration or exploitation mode, based on its "strategy variance"
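The "strategy variance" idea can be sketched like this: estimate whether a teammate is exploring (its observed actions per state vary widely) or exploiting (its actions are consistent), and only update the behavior model in the latter case. The entropy measure, thresholds, and all names here are illustrative assumptions, not the project's algorithm.

```python
import math
from collections import defaultdict, Counter

class TeammateModel:
    """Sketch: only model a teammate while it exploits, i.e. while the
    variance (here: normalized entropy) of its observed actions is low."""

    def __init__(self, explore_entropy_threshold=0.9):
        self.observed = defaultdict(Counter)  # state -> Counter of actions
        self.threshold = explore_entropy_threshold

    def observe(self, state, action):
        self.observed[state][action] += 1

    def is_exploring(self, state):
        counts = self.observed[state]
        n = sum(counts.values())
        if n < 3:
            return True  # too few observations: treat as exploring
        # Normalized entropy of the observed action distribution.
        h = -sum(c / n * math.log(c / n) for c in counts.values())
        h_max = math.log(len(counts)) if len(counts) > 1 else 1.0
        return h / h_max > self.threshold

model = TeammateModel()
for _ in range(10):
    model.observe("at_base", "pick_up")  # consistent behavior -> exploiting
print(model.is_exploring("at_base"))     # False: safe to model this state
```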


SLIDE 18

Organic cooperation – the algorithm 1/2


SLIDE 19

Organic cooperation – the algorithm 2/2


SLIDE 20

Organic cooperation

  • Non-obtrusive coordination
  • Benefits

–Robustness

No predefined communication needed

–Adaptability

Change of goal/behavior will be reflected in a change of the teammate model

  • Approach

– Use behavior recognition from imitation to model teammates (possible with minor algorithm changes)

–Same assumption: shared goals


SLIDE 21

Real world evaluation


[Screenshot: the robot's local video (live); drag to move the robot; live video stream stitched from 8 GigE video cameras.]

SLIDE 22

Conclusion

  • ESLAS Phase III moves on to support robustness in addition to the learning speed of robot groups

– Organic coordination intelligently handles multiple (possibly contradicting) goals
– Organic cooperation enables robots to cooperate with each other (even if a teammate was not designed to do so)
