SLIDE 1

Ignorance is bliss: the role of noise and heterogeneity in training and deployment of:

Single Agent Policies for the Multi-Agent Persistent Surveillance Problem

Tom Kent, Collective Dynamics Seminar, 30-10-19

SLIDE 2

Bio

  • Undergraduate: University of Edinburgh (2007-2011), Mathematics MSc
  • PhD: University of Bristol (2011-2015), Aerospace Engineering: Optimal Routing and Assignment for Commercial Formation Flight
  • Post-Doc: University of Bristol (2015-Present), Venturer Project: Path Planning & Decision Making for Driverless Cars

SLIDE 3

PhDs: Elliot Hogg, Will Bonnell, Chris Bennett, Charles Clarke
Post-Docs: Tom Kent, Michael Crosscombe, Debora Zanatto
Academic PIs: Seth Bullock, Eddie Wilson, Jonathan Lawry, Arthur Richards

  • Five-year project (2017-22) tackling fundamental autonomous system design problems
  • Hybrid Autonomous Systems Engineering ‘R3 Challenge’:
  • Robustness, Resilience, and Regulation.
  • Innovate new design principles and processes
  • Build new tools for analysis and design
  • Engaging with real Thales use cases:
  • Hybrid Low-Level Flight
  • Hybrid Rail Systems
  • Hybrid Search & Rescue.
  • Engaging stakeholders within Thales
  • Finding a balance between academic and industrial outputs

[Diagram: research themes feeding into the Thales use cases: Hybrid Challenges – People & Autonomous Systems; Autonomous Systems Architecting; Cascading Failure & Network Topology; Self-Monitoring in Context; Consensus Formation for Collaborative Autonomy; Dynamical Hierarchical Task Decomposition]

SLIDE 4

Motivating Question

  • Tricky to train/model end-to-end for large multi-agent problems: lots of samples required
  • Evaluation loss:
    Single-agent environment ~ (noise, under-modelling, uncertainty)
    Multi-agent environment ~ (noise, under-modelling, uncertainty)^(no. of agents) + interactions
  • Enormous design-space and parameter-space
  • Do we need to solve the entire problem at once?


Can we train single-agent policies in isolation that can be successfully deployed in multi-agent scenarios?

SLIDE 5

Persistent Surveillance

Objective: maximise the surveillance score (the sum over all hexes)
Method: continuously visit hexes to increase their score
Hex score: increases quickly when a hex is visited, then decays (sketched below)

[Figure: hex grid coloured by hex score: high / medium / low]
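
A minimal sketch of these hex-score dynamics; the growth increment, decay factor, and score cap are illustrative assumptions, not values from the talk.

```python
# Hypothetical hex-score update: the score rises while an agent occupies
# a hex and decays geometrically otherwise. GROWTH, DECAY and MAX_SCORE
# are illustrative values, not taken from the talk.
GROWTH = 5.0      # increase per timestep while a hex is occupied
DECAY = 0.95      # geometric decay factor per timestep when unoccupied
MAX_SCORE = 20.0  # cap on any single hex's score

def update_hex_scores(scores, occupied):
    """scores: dict hex_id -> float; occupied: set of hex_ids with an agent."""
    return {
        h: min(s + GROWTH, MAX_SCORE) if h in occupied else s * DECAY
        for h, s in scores.items()
    }

def surveillance_score(scores):
    """The objective: the sum of all hex scores."""
    return sum(scores.values())
```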

SLIDE 6

Local Policies

The agent observes its local state S_t: the score of its own hex and of the six neighbouring hexes, e.g. S_t = [20.0, 4.2, 6.8, 15.7, 2.1, 1.4, 1.1]. Some fancy policy maps S_t to an action (a move to an adjacent hex); the hex scores then update to S_{t+1}, and the agent gets a reward of S_{t+1} - S_t.

[Diagram: hex neighbourhood scores before the move (S_t) and after (S_{t+1})]
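
A minimal sketch of this decision step, assuming (as the example vector suggests) that the observation is the agent's own hex score followed by its six neighbours, and that the reward's S_t refers to the total surveillance score; `env` and its methods are hypothetical.

```python
import numpy as np

def local_observation(scores, agent_hex, neighbours_of):
    """Build S_t: the agent's own hex score followed by its six
    neighbours' scores. `neighbours_of` is a hypothetical adjacency helper."""
    return np.array([scores[agent_hex]] +
                    [scores[h] for h in neighbours_of(agent_hex)])

def decision_step(env, agent_hex, policy):
    """One observe-act-reward cycle against a hypothetical `env` object."""
    score_before = env.total_score()           # sum over all hexes
    s_t = local_observation(env.scores, agent_hex, env.neighbours_of)
    new_hex = policy(s_t)                      # policy picks the move
    env.move(agent_hex, new_hex)
    env.update_scores()                        # grow visited hexes, decay others
    return env.total_score() - score_before   # reward: S_{t+1} - S_t
```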

SLIDE 7

Local Policies

Heuristics:
  • Random: move in a random direction
  • Gradient Descent: move towards the lowest-value hex

'AI':
  • DDPG (Deep Deterministic Policy Gradient): trained neural net, deterministic policy
  • NEAT (Neuro-Evolution of Augmenting Topologies): evolved network that approximates the hand-crafted gradient descent

Benchmarks:
  • User Input: mouse input, move towards the clicked location (local and global versions)
  • Trail: pre-defined trail to follow, visiting each hex in turn and continuing in a loop

Performance legend: Best / Good / Poor. (The two heuristics are sketched in code below.)
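
A sketch of the two heuristics, operating on the 7-element local observation (index 0 being the agent's own hex, 1-6 its neighbours); this action encoding is an assumption.

```python
import numpy as np

def gradient_descent_policy(s_t):
    """s_t: [own hex score, neighbour 1..6 scores].
    Move towards the lowest-value hex; staying put (index 0) is
    allowed if the current hex already has the lowest score."""
    return int(np.argmin(s_t))  # index into [stay, neighbour 1..6]

def random_policy(s_t, rng=np.random.default_rng()):
    """Baseline: move in a random direction (or stay)."""
    return int(rng.integers(len(s_t)))
```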

SLIDE 8

Comparison of Local Policies

[Chart: each policy (User Input, Trail, Random, Gradient Descent, DDPG, NEAT, Human) rated Best/Good/Poor on performance, "How hard is it to develop?" and "How hard is it to deploy?"]

SLIDE 9

Comparison of Local Policies

[Chart: performance ratings (Best/Good/Poor) for Trail, Random, Gradient Descent and DDPG]

SLIDE 10

Policy Performance – 1 Agent

[Plot: performance of Trail, NEAT, GD, DDPG and Random policies]

SLIDE 11

Human input (aka graduate descent)

Local view

  • User clicks hex
  • Agent moves in direction of cursor
  • Attempt to build global picture & localise
  • Users tend to do gradient descent

Global view

  • User clicks hex
  • Agent moves in direction of cursor
  • Can more easily plan ahead
  • Users tend to attempt a trail
SLIDE 12

Policy Performance – 1 Agent

[Plot: performance of Trail, NEAT, GD, DDPG, Random, UI global and UI local]

SLIDE 13

Multiple agents

  • All Agents have identical policies
  • Agents all have perfect global state knowledge
  • Agents observe their local state and decide action
  • Agents then all move simultaneously
  • No communications
  • No cooperation or planning for other agents
  • Other agents appear as 'obstacles'


Can we train single-agent policies in isolation that can be successfully deployed in multi-agent scenarios?

SLIDE 14

Policy Performance – 3 Agents

[Plot: performance of Trail, NEAT, GD, DDPG and Random policies with 3 agents]

SLIDE 15

Policy Performance – 5 Agents

[Plot: performance of Trail, NEAT, GD, DDPG and Random policies with 5 agents]

SLIDE 16

Homogeneous-policy convergence problem

SLIDE 17

Homogeneous-policy convergence problem

The convergence cycle:
  1) Agents move into the same hex
  2) They get an identical state observation
  3) Identical policies return identical action choices
  4) Identical actions lead to a high chance of repeating 1)

[Diagram: Agents A and B share one policy; identical observations produce identical actions, likely to repeat]

We can break this cycle at any of these points!
  ❖ Cooperate to stop agents occupying the same hex
  ❖ Have differing state beliefs
  ❖ Make policies non-deterministic: add stochasticity via action noise (see the sketch below)
  ❖ Have agents take turns
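
One simple way to make a deterministic policy non-deterministic, as the action-noise option suggests, is to occasionally replace its action with a random one; the wrapper and the epsilon value here are illustrative assumptions.

```python
import random

def with_action_noise(policy, epsilon=0.1, n_actions=7):
    """Wrap a deterministic policy: with probability epsilon, take a
    random action instead. epsilon=0.1 is illustrative, not from the talk."""
    def noisy_policy(s_t):
        if random.random() < epsilon:
            return random.randrange(n_actions)
        return policy(s_t)
    return noisy_policy

# Two agents sharing the same underlying policy now diverge after a few
# steps, breaking the homogeneous-policy convergence cycle.
```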

SLIDE 18

Policy Performance & Action Noise – 5 Agents

[Plot: performance of Trail, NEAT, GD and Random, with and without action noise (Trail + noise, NEAT + noise, GD + noise)]

SLIDE 19

Decentralised State

Decentralised state targets step 2) of the convergence cycle: instead of all agents sharing one global state, each agent keeps its own state belief. Observations update the belief, the policy acts on the belief, and agents communicate to reach state consensus. This adds stochasticity through individual state beliefs, with comms for state consensus.

[Diagram: Agents A and B each hold their own state belief feeding their policy; observations update the beliefs and the agents communicate]

SLIDE 20

SLIDE 21

Belief Updating

[Diagram: Agents A and B act via their policies on their own state beliefs, and communicate those beliefs to each other]

  • Agents communicate their state-belief
  • Agents update their belief to form a global 'true' state
  • How should agents incorporate these other agents' beliefs?

Update functions (sketched in code below):
  1) Max: take the max value of own and others' beliefs
  2) Average: average of own belief and other agents' beliefs
  3) Weighted Average: proportionally weight own belief and others':
     W_0.9 -> 0.9*(own belief) + 0.1*(others)
     W_1.0 -> 1.0*(own belief)
     W_0.0 -> 1.0*(others' belief)
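
A sketch of the three update functions, assuming each belief is an array of per-hex scores and `others` is the collection of beliefs received from other agents.

```python
import numpy as np

def update_max(own, others):
    """Max: element-wise max of own belief and all received beliefs."""
    return np.maximum.reduce([own] + list(others))

def update_average(own, others):
    """Average: plain mean of own belief and other agents' beliefs."""
    return np.mean([own] + list(others), axis=0)

def update_weighted(own, others, w=0.9):
    """Weighted Average: W_w keeps w of own belief, 1-w of the others' mean.
    w=1.0 ignores others entirely; w=0.0 uses only the others' beliefs."""
    return w * own + (1.0 - w) * np.mean(list(others), axis=0)
```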

SLIDE 22

State Belief Consensus Results

  • Ignoring other agents' states leads to differing state beliefs
  • How much you use other agents' beliefs determines how close to a single global 'truth' you are
  • Identical states lead to policy convergence

[Plot: performance under W_1, Centralised, Centralised + noise and Consensus belief schemes]

SLIDE 23

Decentralised State & Heterogeneous Policies

Heterogeneous teams target step 3) of the convergence cycle: on top of individual state beliefs with comms for state consensus, the agents now run different policies (Agent A runs Policy 1, Agent B runs Policy 2), so identical observations no longer guarantee identical actions.

[Diagram: Agents A and B with different policies (Policy 1, Policy 2), each with its own state belief, observing and communicating]

SLIDE 24

Decentralised State & Heterogeneous Policies

Setup (team size 3): policies drawn from {Gradient Descent, DDPG, NEAT}; belief updates from {Max, W = 1.0, W = 0.9}; benchmarks: Centralised and Centralised + action noise.

A heterogeneous team can outperform the benchmark: Team [DDPG, NEAT, GD] with Update: Max. But a team of identical, ignorant agents can do even better: Team [NEAT, NEAT, NEAT] with Update: W = 1.0 (only use own belief).
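
For illustration, the two team configurations on this slide could be written down as follows, reusing the policy and belief-update sketches from earlier slides; `ddpg_policy` and `neat_policy` are hypothetical stand-ins for the trained networks.

```python
# Stubs for the trained policies: the real ones are a trained DDPG network
# and a NEAT-evolved network; here the heuristic sketched earlier stands in
# so the configurations below are runnable.
ddpg_policy = gradient_descent_policy
neat_policy = gradient_descent_policy

# Heterogeneous team: three different policies, beliefs fused with Max.
heterogeneous_team = {
    "policies": [ddpg_policy, neat_policy, gradient_descent_policy],
    "belief_update": update_max,
}

# Identical 'ignorant' agents: same policy, and W = 1.0 means each agent
# keeps only its own belief, ignoring what the others communicate.
ignorant_team = {
    "policies": [neat_policy] * 3,
    "belief_update": lambda own, others: update_weighted(own, others, w=1.0),
}
```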

SLIDE 25

Local Policies: Takeaways

  • The multi-agent persistent surveillance problem is somewhat simplistic
  • Short-term planning is often sufficient
  • Agents trained in isolation can still perform in a multi-agent scenario
  • Global 'trail' policies perform better
  • Simplistic gradient-descent approaches perform pretty well
  • The homogeneous-policy convergence cycle is a problem, and can be avoided by essentially becoming more heterogeneous:
  • Action stochasticity: adding noise
  • State/observation stochasticity: agent-specific state beliefs
  • Heterogeneous policies: teams of different agents
  • The decentralised case, with agents having partial knowledge, can be beneficial
  • Different methods of state consensus indicate that communication, i.e. being closer to the global truth, can be detrimental to performance

SLIDE 26

Higher Level Decisions

  • What if we moved up the decision-making hierarchy?
  • Previous work [1]: a Decentralised Co-Evolutionary Algorithm (DEA) to solve the decentralised Multi-Agent Travelling Salesman Problem
  • Make persistent surveillance a higher-level goal that the individual agents do not consider
  • What if we instead place tasks in order to maximise the surveillance score?
  • MATSP and shortest-path problems lead to essentially decentralised trails

[1] Thomas E. Kent and Arthur G. Richards. "Decentralised multi-demic evolutionary approach to the dynamic multi-agent travelling salesman problem". In: Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO '19). doi: 10.1145/3319619.3321993

SLIDE 27

Combining Persistent Surveillance and MATSP

[Diagram: a Persistent Surveillance Tasker issues high-level tasks ("Hex 9", "Hex 21", "Hex 34") which the agents service in turn]

SLIDE 28

[Plots: results for 1 agent and for 5 agents]

SLIDE 29

Combining Persistent Surveillance and MATSP

SLIDE 30

Combining Persistent Surveillance and MATSP

SLIDE 31

Combining Persistent Surveillance and MATSP

SLIDE 32

Task Assignment for MATSP: Takeaways

  • Hopefully without making the entire presentation irrelevant
  • Higher level tasking can be more effective than local policies
  • Requires communication and coordination
  • Implicit coordination from the MATSP problem definition
  • There can often be complementary higher level objectives:
  • MATSP + Persistent Surveillance
SLIDE 33

Questions

Thomas.kent@bristol.ac.uk

SLIDE 34

Appendix

SLIDE 35

Theoretical Max

  • Number of hexes n = 56
  • Hex height (width) = 15 m
  • Agent speed 5 m/s => 3 dt to cross a hex
  • Linear increase per timestep: l_d = 5 -> crossing adds 15 to the hex, so a_0 = 15
  • T_h = 120, dt = 3
  • If we make a trail around all n = 56 hexes we can hit 542.
  • If we continue and re-join the 'tail' of the trail we can max out each hex, so a_0 = 20, and we can then hit 723.

After each visit a hex's score decays geometrically: a_0, a_0*λ, a_0*λ^2, ... so the total score of a repeating trail is a geometric series, and the multi-agent case is a sum of such series.
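
A sketch of that series in LaTeX, assuming λ is the per-step decay factor and a hex's age k counts steps since its last top-up to a_0 (the exact bookkeeping with dt = 3 may differ):

```latex
% Total surveillance score for one agent on a trail of period n:
% hexes sit at ages 0, 1, ..., n-1 since their last top-up to a_0.
S_{\text{total}} = \sum_{k=0}^{n-1} a_0 \lambda^{k}
                 = a_0 \, \frac{1 - \lambda^{n}}{1 - \lambda}

% With m agents evenly spaced on the same trail, ages only reach n/m - 1:
S_{\text{total}}^{(m)} = m \sum_{k=0}^{n/m - 1} a_0 \lambda^{k}
                       = m \, a_0 \, \frac{1 - \lambda^{n/m}}{1 - \lambda}
```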