[PPT] - Learning to Trick Robots into Cooperative Behavior Jen Jen Chung PowerPoint Presentation

SLIDE 1

Learning to Trick Robots into Cooperative Behavior

Jen Jen Chung
Autonomous Agents and Distributed Intelligence Lab
Oregon State University

SLIDE 2

UAV Package Delivery

Increasing interest in delivery drones: UPS, Amazon, etc.

Dense UAV traffic in cluttered urban environment

No current framework for large-scale coordination

1 DEMUR 2015 Jen Jen Chung | Oregon State University

SLIDE 3

[Figure: scale bar 100 m]

A Cross-Section of the Airspace

Automated UAV traffic management

Challenges:
– Narrow thoroughfares of dense traffic
– Heterogeneous UAVs
– Dynamic obstacle landscape

Goals
– Minimize conflict occurrences
– Avoid cascading effects
– Maintain throughput


SLIDE 4

[Figure: scale bar 100 m]

Multiagent UAV Traffic Management (UTM)

Divide airspace into sectors
– Assign single UTM agent to manage each sector

Multiagent team:
– UTM agents individually learn policy for assigning sector traversal costs
– Reward is total number of conflicts in global system


SLIDE 5

A Hierarchical Approach


Sector Agents
– Define cost of travel in each sector according to current UAV density

UAVs
– Sector-level planner: Plans across sector cost graph
– Low-level planner: Plans across obstacle map according to sector traversal plan
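The sector-level step of this hierarchy can be sketched as a shortest-path search over a graph of sectors. The sketch below is illustrative only: the adjacency map `neighbors`, the cost table `sector_cost`, and the toy four-sector layout are hypothetical, and it uses Dijkstra's algorithm (equivalent to A* with a zero heuristic) as a simple stand-in. It also assumes that entering a sector incurs that sector's published cost.

```python
import heapq

def plan_sector_path(neighbors, sector_cost, start, goal):
    """Return (total cost, sector sequence) of the cheapest route.

    neighbors:   dict sector -> list of adjacent sectors (hypothetical layout)
    sector_cost: dict sector -> traversal cost published by that sector's agent
    """
    frontier = [(0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        cost, sector, path = heapq.heappop(frontier)
        if sector == goal:
            return cost, path
        if cost > best.get(sector, float("inf")):
            continue  # stale queue entry, a cheaper route was already expanded
        for nxt in neighbors[sector]:
            new_cost = cost + sector_cost[nxt]  # pay the cost of the sector entered
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, path + [nxt]))
    return float("inf"), []

# Toy example: sector "b" is congested, so its agent publishes a high cost.
neighbors = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
sector_cost = {"a": 1.0, "b": 5.0, "c": 2.0, "d": 1.0}
cost, path = plan_sector_path(neighbors, sector_cost, "a", "d")
```

In this toy case the planner routes around the expensive sector, which is the lever the sector agents use to steer traffic without commanding individual UAVs.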

SLIDE 6

UTM Learning Agents

Learn the cost of travel to apply to UAVs in the sector

Neural network control
– Inputs: UAV counts in sector
  § Separate into traffic types, e.g. heading, priority, platform, etc.
– Outputs: Cost of through-sector travel for each traffic type

Cooperative coevolution to learn NN weights
– Fitness value: number of conflicts
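A minimal sketch of such a controller, mapping per-type UAV counts to per-type travel costs, might look as follows. The slides do not give the network architecture, so the single hidden layer, its size, the tanh and sigmoid activations, and the class name `SectorPolicy` are all assumptions made for illustration.

```python
import math
import random

class SectorPolicy:
    """Small feed-forward network: UAV counts per traffic type -> travel costs."""

    def __init__(self, n_types, n_hidden=8, rng=None):
        rng = rng or random.Random(0)
        # Weight matrices stored as lists of rows; no bias terms in this sketch.
        self.w1 = [[rng.gauss(0, 1) for _ in range(n_types)] for _ in range(n_hidden)]
        self.w2 = [[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_types)]

    def costs(self, counts):
        hidden = [math.tanh(sum(w * c for w, c in zip(row, counts)))
                  for row in self.w1]
        # Sigmoid output bounds each cost to the unit interval;
        # any scaling to real cost units is left to the caller.
        return [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
                for row in self.w2]

policy = SectorPolicy(n_types=4)
example_costs = policy.costs([3, 0, 1, 2])  # e.g. counts of UAVs heading N, S, E, W
```

Because the weights are plain lists, a learning loop can mutate them directly without any gradient computation.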


SLIDE 7

Evolutionary Algorithms for Learning Control Policies

6 IROS 2015 Jen Jen Chung | Oregon State University

Initialize population of k NNs

Mutate each to create total population of 2k NNs

Test each NN and assess fitness

Retain k best performing NNs
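The loop in this flowchart can be sketched in a few lines. This is a hedged illustration, not the author's implementation: weight vectors stand in for full networks, mutation is assumed to be additive Gaussian noise, and the toy fitness function replaces the traffic simulation. Lower fitness is treated as better, matching a conflict count.

```python
import random

def evolve(fitness, n_weights, k=10, epochs=100, noise=0.1, seed=0):
    """Minimise `fitness` with a mutate-then-select loop over k weight vectors."""
    rng = random.Random(seed)
    population = [[rng.gauss(0, 1) for _ in range(n_weights)] for _ in range(k)]
    for _ in range(epochs):
        # Mutate each member once: k parents + k children = 2k candidates.
        children = [[w + rng.gauss(0, noise) for w in parent]
                    for parent in population]
        candidates = population + children
        # Test every candidate and keep the k with the lowest fitness.
        candidates.sort(key=fitness)
        population = candidates[:k]
    return population[0]

# Toy stand-in for the simulation: squared distance of the weights from zero.
best = evolve(lambda w: sum(x * x for x in w), n_weights=3)
```

Keeping the parents in the candidate pool means the best fitness found never gets worse from one epoch to the next.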

SLIDE 8

Cooperative Coevolutionary Algorithms (CCEAs)


Initialize M populations of k NNs

Mutate each to create M populations of 2k NNs

Randomly select one NN from each population to create team T_i

Assess team performance and assign fitness to team members

Retain k best performing NNs of each population
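The CCEA steps above can be sketched as follows. Several details are assumptions of this sketch rather than statements from the slides: teams are formed by shuffling each population and pairing members by index (one way to draw one random member per population), each member is evaluated on exactly one team per epoch, and the toy objective `toy_fitness` replaces the traffic simulation.

```python
import random

def ccea(team_fitness, n_agents, n_weights, k=10, epochs=50, noise=0.1, seed=0):
    """Cooperative coevolution: one population per agent, shared team fitness."""
    rng = random.Random(seed)
    pops = [[[rng.gauss(0, 1) for _ in range(n_weights)] for _ in range(k)]
            for _ in range(n_agents)]
    for _ in range(epochs):
        # Mutate: every population grows from k to 2k members.
        pops = [pop + [[w + rng.gauss(0, noise) for w in ind] for ind in pop]
                for pop in pops]
        # Shuffle so that team i takes member i of every population.
        for pop in pops:
            rng.shuffle(pop)
        scored = [[] for _ in range(n_agents)]
        for i in range(2 * k):
            team = [pop[i] for pop in pops]
            score = team_fitness(team)  # lower is better, e.g. a conflict count
            # Every team member receives the same team-level fitness.
            for agent in range(n_agents):
                scored[agent].append((score, team[agent]))
        # Retain the k members of each population that sat on the best teams.
        pops = [[ind for _, ind in sorted(s, key=lambda t: t[0])[:k]]
                for s in scored]
    return [pop[0] for pop in pops]

# Toy team objective: the weights of the whole team should sum to zero.
def toy_fitness(team):
    return abs(sum(sum(ind) for ind in team))

best_team = ccea(toy_fitness, n_agents=3, n_weights=2)
```

Since fitness is assigned per team, a member's score depends on its randomly drawn teammates, which is why the evaluation is noisier than in the single-population loop.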

SLIDE 9

Simulation Experiments

Urban airspace
– 256×256 cell map of San Francisco
– 15 Voronoi partitions

Fitness calculation
– Linear: no. conflicts at each cell summed
– Quadratic: no. conflicts at each cell squared and summed
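The difference between the two fitness calculations shows up when conflicts cluster in a few cells. A small sketch, with made-up per-cell conflict counts:

```python
def linear_fitness(conflicts_per_cell):
    """Sum of the conflict counts over all cells."""
    return sum(conflicts_per_cell)

def quadratic_fitness(conflicts_per_cell):
    """Sum of the squared conflict counts over all cells."""
    return sum(c * c for c in conflicts_per_cell)

# Illustrative data: the same four conflicts, spread out or piled into one cell.
spread = [1, 1, 1, 1]
clustered = [4, 0, 0, 0]
```

Here both layouts have a linear fitness of 4, while the quadratic fitness is 4 for the spread layout and 16 for the clustered one, so the quadratic form penalizes concentrated congestion more heavily.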


[Figure: scale bar 100 m]

SLIDE 10

Simulation Experiments

Sector agents
– Initialized with population of 10 NN control policies, 10% mutation noise
– Inputs: {n_N, n_S, n_E, n_W}
– Outputs: {c_N, c_S, c_E, c_W}
– Fitness: number of conflicts

UAVs
– Stochastically generated from predefined set of start and goal locations
– Approximately 100 UAVs in airspace during single learning epoch
– A* planning at both sector- and low-level
– Conflict radius: 2 cells (approx. 4 m)
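A conflict count of the kind used as fitness could be computed as below. The slides give only the radius, so this sketch assumes that a conflict is any pair of UAVs within 2 cells of each other in Euclidean distance at the same time step; the function name and the sample positions are hypothetical.

```python
import math
from itertools import combinations

def count_conflicts(positions, radius=2.0):
    """Number of UAV pairs within `radius` cells of each other (Euclidean)."""
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(positions, 2)
        if math.hypot(x1 - x2, y1 - y2) <= radius
    )

# Three UAVs bunched together and one far away, in grid-cell coordinates.
uavs = [(10, 10), (11, 11), (10, 12), (40, 40)]
n_conflicts = count_conflicts(uavs)
```

The pairwise check is quadratic in the number of UAVs, which is acceptable for roughly 100 UAVs per epoch.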


SLIDE 11

Learning Results: Total Conflicts

Team performance over 100 learning epochs

Averaged over 20 trials

16% reduction in total system conflicts


SLIDE 12

Congestion Reduction: Linear Cost Fitness Function


[Figure panels: Randomly initialized sector costs | Learned sector costs]

SLIDE 13

Congestion Reduction: Quadratic Cost Fitness Function


[Figure panels: Randomly initialized sector costs | Learned sector costs]

SLIDE 14

Extensions to Sector Agent Control Policies

Not all UAVs in the airspace are equal

Account for UAV type in NN inputs and outputs


[Figure labels: Package delivery UAVs, Emergency medical UAVs; Weighted, Cross-weighted, Multi-mind]

SLIDE 15

A Hierarchical Approach


Sector Agents
– Define cost of travel in each sector according to current UAV density

UAVs
– Sector-level planner: Plans across sector cost graph
– Low-level planner: Plans across obstacle map according to sector traversal plan

SLIDE 16

[Figure: scale bar 100 m]

Risk-Aware Graph Search (RAGS)

Graph search with uncertain edge costs
– Normal distributions

Bound path set
– Domination according to mean and variance


$A < B \leftrightarrow (A.c < B.c) \land (A.\sigma^2 < B.\sigma^2)$
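The domination rule can be applied as a filter that bounds the path set. The sketch below is illustrative: the dictionary layout and the three sample paths are made up, and `c` and `var` stand for the mean cost and variance of a path.

```python
def dominates(a, b):
    """True if path a has both a lower mean cost and a lower variance than b."""
    return a["c"] < b["c"] and a["var"] < b["var"]

def non_dominated(paths):
    """Keep only the paths that no other path dominates."""
    return [p for p in paths if not any(dominates(q, p) for q in paths)]

paths = [
    {"name": "P1", "c": 10.0, "var": 4.0},
    {"name": "P2", "c": 12.0, "var": 1.0},  # costlier but less uncertain: kept
    {"name": "P3", "c": 13.0, "var": 5.0},  # worse than P1 on both counts: pruned
]
kept = [p["name"] for p in non_dominated(paths)]
```

Only P1 and P2 survive here, since neither is better than the other on both mean and variance.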

SLIDE 17

RAGS Path Execution


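[Figure: graph from Start to Goal via node A (paths A1, A2, A3) or node B (paths B1, B2, B3, B4); labels $c_{A_0}$, $c_{B_0}$; path costs $\sim \mathcal{N}(\mu_{A_m}, \sigma^2_{A_m})$ and $\sim \mathcal{N}(\mu_{B_n}, \sigma^2_{B_n})$]

$\int_{-\infty}^{\infty} \sum_{i=1}^{m} P\left(c_{A_i} = x;\ c_{A_j} > x,\ \forall j \neq i\right) \cdot \left[1 - P\left(c_{B_i} > x,\ \forall i \in \{1, \ldots, n\}\right)\right] dx$

The probability that traveling via B will yield a cheaper path than traveling via A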
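This integral can be evaluated numerically. The sketch below assumes that the path costs are independent normal variables, so that the joint probabilities factor into products of densities and tail probabilities; that independence assumption, the trapezoidal integration, and all function names are choices of this sketch, not details taken from the slides.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def normal_tail(x, mu, var):
    """P(cost > x) for a normal cost with the given mean and variance."""
    return 0.5 * math.erfc((x - mu) / math.sqrt(2.0 * var))

def prob_b_cheaper(paths_a, paths_b, steps=2000):
    """P(best path via B is cheaper than best path via A).

    paths_a, paths_b: lists of (mean, variance) pairs, assumed independent.
    """
    every = paths_a + paths_b
    lo = min(mu - 8.0 * math.sqrt(v) for mu, v in every)
    hi = max(mu + 8.0 * math.sqrt(v) for mu, v in every)
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * dx
        # Density of "A_i costs x and every other A path costs more", summed over i.
        min_a_density = 0.0
        for i, (mu_i, var_i) in enumerate(paths_a):
            term = normal_pdf(x, mu_i, var_i)
            for j, (mu_j, var_j) in enumerate(paths_a):
                if j != i:
                    term *= normal_tail(x, mu_j, var_j)
            min_a_density += term
        # Probability that every B path costs more than x.
        all_b_above = 1.0
        for mu, var in paths_b:
            all_b_above *= normal_tail(x, mu, var)
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end points
        total += weight * min_a_density * (1.0 - all_b_above) * dx
    return total

p_equal = prob_b_cheaper([(10.0, 4.0)], [(10.0, 4.0)])
p_b_better = prob_b_cheaper([(10.0, 1.0)], [(5.0, 1.0)])
```

With identical single paths on each side the result is close to one half, and it approaches one when the B path has a clearly lower mean.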

SLIDE 18

RAGS vs. Existing Planning Algorithms

Testing on graph with 100 vertices
– 3 sets of edge cost distributions

Compared against
– Naïve A* on the mean
– Greedy on bounded path set
– D*


$\text{Edge cost} = \text{Euclidean distance} + \varepsilon, \quad \varepsilon \sim \mathcal{N}(\mu, \sigma^2)$

$\mu \in [0, 100]$

$\sigma^2 \in [0, \sigma^2_{\max}], \quad \sigma^2_{\max} = \{5, 10, 20\}$
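One way to generate edge costs under this model is sketched below. The slides give only the ranges, so drawing $\mu$ and $\sigma^2$ uniformly from those ranges is an assumption, as are the function names and the example vertices.

```python
import math
import random

def sample_edge_model(p, q, var_max, rng):
    """Draw the (mean, variance) of the cost of the edge between 2-D points p and q."""
    mu = rng.uniform(0.0, 100.0)     # mu in [0, 100]; uniform draw is an assumption
    var = rng.uniform(0.0, var_max)  # sigma^2 in [0, sigma_max^2]; also assumed uniform
    return math.dist(p, q) + mu, var

def sample_edge_cost(mean, var, rng):
    """Draw one realised cost from the edge's normal distribution."""
    return rng.gauss(mean, math.sqrt(var))

rng = random.Random(0)
mean, var = sample_edge_model((0.0, 0.0), (3.0, 4.0), var_max=5.0, rng=rng)
cost = sample_edge_cost(mean, var, rng)
```

A planner that uses only `mean` corresponds to the naïve A* baseline, while a risk-aware planner would also use `var`.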

SLIDE 19

RAGS vs. Existing Planning Algorithms


[Figure panels: $\sigma^2 \in (0, 5)$ | $\sigma^2 \in (0, 10)$ | $\sigma^2 \in (0, 20)$]

SLIDE 20

RAGS Integration with UTM Agents


[Figure: scale bar 100 m]

SLIDE 21

Comparison of A* and RAGS


[Figure panels: UAVs planning with A* | UAVs planning with RAGS]

SLIDE 22

Conclusions and Future Work

Implicit cooperation by learning individual control policies trained on global reward structures

Risk-aware graph search accounts for modeled uncertainties in the environment

Initial integration of high- and low-level decision making shows faster learning rates

Future work
– Reward shaping to improve UTM agent policies
– Theoretical guarantees of RAGS
– Validation and verification


SLIDE 23

Acknowledgements
