Cooperative Multiagent Patrolling for Detecting Multiple Illegal Actions Under Uncertainty - PowerPoint PPT Presentation



1. Cooperative Multiagent Patrolling for Detecting Multiple Illegal Actions Under Uncertainty
   Best Paper, ICTAI 2016
   Aurélie Beynier
   19 June 2017
   Outline: Cooperative multiagent patrolling · DEC-POMDPs with context · Experiments · Conclusion

2. Context
   • Multiagent patrolling in adversarial domains.
   • Patrollers have to cooperate to patrol a set of sites and prevent illegal actions (poaching, illegal fishing, intrusion across a border) by multiple adversaries.
   • Patrollers have partial observability of the system.
   • Uncertainty on action outcomes.
   • A priori unknown, dynamic strategies of the adversaries.

3. Problem Setting
   • m heterogeneous defenders (agents i, with i ∈ [1, m]).
   • n target sites t_j to visit (with j ∈ [1, n]).
   • An unknown number of adversaries trying to perform illegal actions on the target sites.
   • The environment topology is represented as a graph G = (N, E), with N = {t_1, ..., t_n} and E the set of possible routes between the targets.
   • Uncertainty on move durations: each edge e = (t_k, t_j) ∈ E is assigned a probability distribution C_{k,j} over possible travel durations (a small sketch follows).
   • Each patrolling agent has limited observability of the system: she observes her own location and the adversaries on her current target site.
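A minimal sketch of this environment representation, assuming a dictionary-based encoding; all target names and numbers below are illustrative, not taken from the paper.

```python
import random

targets = {"t1", "t2", "t3"}                      # N: the target sites

# C_{k,j}: for each edge (t_k, t_j), a mapping duration -> probability
travel_durations = {
    ("t1", "t2"): {2: 0.7, 3: 0.3},
    ("t2", "t3"): {1: 0.5, 2: 0.5},
    ("t1", "t3"): {4: 1.0},
}

edges = set(travel_durations)                     # E: the possible routes

def sample_travel_time(edge):
    """Draw a travel duration for `edge` according to its distribution C_{k,j}."""
    durations, probs = zip(*travel_durations[edge].items())
    return random.choices(durations, weights=probs, k=1)[0]
```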

4. Related works
   Security games [PPT+07, BGA09, KJT+09, ABVT13]
   • The adversary is able to perform extensive surveillance and obtains full knowledge of the patrolling strategy.
   • The adversary conducts a one-shot attack.
   • No effective cooperation between the patrollers, except in [SJY+14] (but a single and fully rational adversary).
   Resource conservation games [FST15, NSG+16, QHJT14]
   • Multiple intruders performing illegal actions.
   • Objective: maximizing the number of detected illegal actions.
   • Limited observability of the intruders.
   • No effective cooperation between the patrollers.

5. Issues
   Each patrolling agent:
   • must decide, in an autonomous but cooperative way, which target to visit at each decision step,
   • must deal with the uncertainty on action outcomes,
   • only partially observes the state of the system (illegal actions performed and states of the other agents),
   • has no a priori knowledge of the adversaries' strategy,
   • must face evolving strategies of the adversaries.

6. Overview of the approach
   Main components:
   • Context definition from observations of the adversaries
   • DEC-POMDP formalization of the decision problem based on the current context
   • Online policy computation
   • Online detection of context changes
   Pipeline: Context definition (PI) → DEC-POMDP formalization → Patrolling strategy computation → Patrolling strategy execution.

7. Adversary model
   • Detected illegal actions are the only information available about the adversaries.
   • We define the probability PI_i(t) that the adversaries perform an illegal action on site t_i at time t → the current context.
   • Let NI_i(t − H, t) be the number of adversaries detected on target t_i between t − H and t (defined for all t_i in N).
   • Each agent estimates PI_i(t) using the following equation (a small sketch follows):

     PI_i(t) = NI_i(t − H, t) / Σ_{t_k ∈ N} NI_k(t − H, t)     (1)
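A minimal sketch of Equation (1), assuming detection counts are kept per target over the sliding window [t − H, t]; the uniform fallback for an empty window is our own assumption, not specified on the slide.

```python
def estimate_context(detection_counts):
    """detection_counts: dict target id -> NI_i(t - H, t). Returns PI as a dict."""
    total = sum(detection_counts.values())
    if total == 0:
        # No adversary detected in the window: fall back to a uniform context.
        return {t: 1.0 / len(detection_counts) for t in detection_counts}
    return {t: n / total for t, n in detection_counts.items()}

# Example: estimate_context({"t1": 5, "t2": 3, "t3": 2}) -> {'t1': 0.5, 't2': 0.3, 't3': 0.2}
```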

8. DEC-POMDP background
   DECentralized Partially Observable Markov Decision Process [BZI02] (a minimal container sketch follows):
   • Ag = {1, ..., m} is a set of m agents,
   • S is the set of world states,
   • A = A_1 × ... × A_m is the set of possible joint actions a = (a_1, ..., a_m),
   • T is the transition function giving the probability T(s′ | s, a) that the system moves to s′ while executing a from s,
   • O = O_1 × ... × O_m is the set of joint observations o = (o_1, ..., o_m),
   • Ω is the observation function giving the probability Ω(o | s, a) of observing o when executing a from s,
   • R(s′ | s, a) is the reward obtained when executing action a from s and moving to s′.
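An illustrative container only (a sketch, not the authors' implementation): the DEC-POMDP tuple as plain Python fields, with T, Ω and R represented as callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Tuple

State = Any
JointAction = Tuple[Any, ...]        # one individual action per agent
JointObservation = Tuple[Any, ...]   # one individual observation per agent

@dataclass
class DecPOMDP:
    agents: Sequence[int]                                                 # Ag = {1, ..., m}
    states: Sequence[State]                                               # S
    joint_actions: Sequence[JointAction]                                  # A = A_1 x ... x A_m
    transition: Callable[[State, JointAction, State], float]              # T(s' | s, a)
    joint_observations: Sequence[JointObservation]                        # O = O_1 x ... x O_m
    observation: Callable[[JointObservation, State, JointAction], float]  # Omega(o | s, a)
    reward: Callable[[State, JointAction, State], float]                  # R(s' | s, a)
```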

9. DEC-POMDP for multiagent patrolling
   • A state s^t at time t is defined by: the position of each agent, the list of targets where an illegal action is currently observed, the idleness of each target, and the elapsed time of each ongoing move (a state sketch follows).
   • An individual action a_i consists in moving to a target t_j (t_j ∈ N).
   • Transition probabilities are defined from the probabilities on move durations and from the probabilities of detecting illegal actions.
   • Each agent observes her current position and the illegal actions on the currently patrolled target.
   • The observation function is deterministic.
   • The reward function is defined so as to reward detected illegal actions.
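A sketch of the state record described above; the field names are ours, not the paper's.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class PatrollingState:
    positions: Tuple[int, ...]          # current target (or destination) of each agent
    observed_illegal: Tuple[int, ...]   # targets where an illegal action is currently observed
    idleness: Tuple[int, ...]           # time elapsed since each target was last visited
    elapsed_moves: Tuple[int, ...]      # time already spent by each agent on her current move
```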

10. DEC-POMDP for multiagent patrolling: transition function
   • The probability that an agent reaches her target is defined from the probability distributions C_{k,j}.
   • Probabilities of detecting illegal actions are estimated using the current context PI. Let w_j(t) be the probability of observing an illegal action on t_j at time t (a small sketch follows):

     w_j(t) = Σ_{x=0}^{min(Δ_int, idle_j)} P(I_j(t − x))

     where I_j(t − x) denotes the event "an illegal action is initiated at t − x on t_j" and is derived from PI.
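A sketch of w_j(t), assuming P(I_j(t − x)) is supplied by a function derived from the current context PI (its exact form is not given on this slide, so it is injected as a parameter here).

```python
def detection_probability(j, t, idleness, delta_int, p_initiation):
    """w_j(t): probability of observing an illegal action on target t_j at time t.
    Sums P(I_j(t - x)) for x = 0 .. min(delta_int, idle_j)."""
    horizon = min(delta_int, idleness[j])
    return sum(p_initiation(j, t - x) for x in range(horizon + 1))
```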

11. DEC-POMDP solving: background
   • Solving a DEC-POMDP consists in computing a joint policy π = ⟨π_1, ..., π_m⟩, where π_i is the individual policy of agent i.
   • Individual policies are coordinated and take uncertainty and partial observability into account.
   • The joint policy is usually computed off-line.
   • The joint policy is then executed in a distributed way.

12. DEC-POMDP solving: background (figure)

13. DEC-POMDP solving: background
   Observation-action history: the history of an agent i is the sequence of observations and actions made by the agent along the execution:
     θ̄_i^t = (o_i^1, a_i^1, o_i^2, a_i^2, ..., o_i^t, a_i^t)
   Observation history: if the policy is deterministic, the history can be summarized by the sequence of observations (a small lookup sketch follows):
     θ̄_i^t = (o_i^1, o_i^2, ..., o_i^t)
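A sketch of a deterministic individual policy represented as a mapping from observation histories to actions; the keys and values below are illustrative.

```python
policy_i = {
    ("o_empty",): "go_t2",
    ("o_empty", "o_adversary"): "stay",
}

def act(policy, observation_history):
    """Return the action prescribed for the observation history seen so far."""
    return policy[tuple(observation_history)]

# act(policy_i, ["o_empty", "o_adversary"]) -> "stay"
```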

14. DEC-POMDP solving
   For a given context, the DEC-POMDP can be optimally solved by existing algorithms in a centralized way. But:
   • Poor scalability because of the high complexity: optimally solving a DEC-POMDP is NEXP-complete [BZI02].
   • Centralized computation should be avoided.
   • The policy has to be frequently updated.

15. DEC-POMDP solving
   We propose an evolutionary algorithm (adapted from the (1+1) evolutionary algorithm) to optimize the patrolling strategy over a finite horizon T:
   • it can be executed in a decentralized way,
   • it allows the agents to exploit the strategies of the previous context to compute new strategies,
   • it is anytime and scales well, but gives no guarantee on the quality of the solutions.

16. DEC-POMDP solving
   champion = RandomIndividual()
   championValue = Evaluate(champion)
   while deadline not reached do
       challenger = Mutation(champion)
       challengerValue = Evaluate(challenger)
       if challengerValue > championValue then
           champion = challenger
           championValue = challengerValue
       end if
   end while
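A runnable sketch of the (1+1)-style loop above; RandomIndividual, Mutation and Evaluate are domain-specific (they operate on patrolling strategies), so they are injected as callables here, and the deadline is assumed to be a wall-clock budget in seconds.

```python
import time

def one_plus_one(random_individual, mutate, evaluate, budget_seconds):
    champion = random_individual()
    champion_value = evaluate(champion)
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:            # anytime: the best-so-far is always available
        challenger = mutate(champion)
        challenger_value = evaluate(challenger)
        if challenger_value > champion_value:
            champion, champion_value = challenger, challenger_value
    return champion, champion_value
```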

17. Limiting Communication
   • Patrolling agents need to communicate their observations about detected illegal actions.
   • Communication allows the agents to deduce the new context, but it is risky and resource-consuming.
   • We propose to measure the relevance of a piece of information: only relevant information is communicated.
   • The relevance measures the distance (a symmetrized Kullback-Leibler divergence) between the current probability distribution PI and the new one PI′ obtained by taking the new information into account (a small sketch follows):

     D(P, Q) = Σ_i P(i) log (P(i) / Q(i)) + Σ_i Q(i) log (Q(i) / P(i))
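A sketch of the relevance test, assuming both distributions share the same support and using a small epsilon to avoid log(0); the threshold value is illustrative, not taken from the paper.

```python
import math

def symmetric_kl(p, q, eps=1e-12):
    """D(P, Q) = sum_i P(i) log(P(i)/Q(i)) + sum_i Q(i) log(Q(i)/P(i))."""
    return sum(p[i] * math.log((p[i] + eps) / (q[i] + eps)) +
               q[i] * math.log((q[i] + eps) / (p[i] + eps)) for i in p)

def worth_communicating(pi_old, pi_new, threshold=0.1):
    """Communicate the new observation only if it shifts the context enough."""
    return symmetric_kl(pi_old, pi_new) > threshold
```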
