

SLIDE 1

Cooperative multiagent patrolling DEC-POMDPs with context Experiments Conclusion

Cooperative Multiagent Patrolling for Detecting Multiple Illegal Actions Under Uncertainty

Best Paper ICTAI 2016

Aurélie Beynier

19 June 2017

SLIDE 2

Context

  • Multiagent patrolling in adversarial domains
  • Patrollers have to cooperate to patrol a set of sites and prevent illegal actions (poaching, illegal fishing, intrusion on a frontier) from multiple adversaries
  • Patrollers have partial observability of the system
  • Uncertainty on action outcomes
  • A priori unknown, dynamic strategies of the adversaries

SLIDE 3

Problem Setting

  • m heterogeneous defenders (agents i with i ∈ [1, m]).
  • n target sites tj to visit (with j ∈ [1, n]).
  • An unknown number of adversaries trying to perform illegal actions on the target sites.
  • The environment topology is represented as a graph G = (N, E) with N = {t1, · · · , tn} and E the set of possible routes between the targets.
  • Uncertainty on move duration: each edge e = (tk, tj) ∈ E is assigned a probability distribution Ck,j over possible travel durations.
  • Each patrolling agent has limited observability of the system: she observes her own location and the adversaries on her current target site.
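The graph topology with uncertain travel durations can be modeled with a small sketch; the class and method names below are ours, for illustration only:

```python
import random

# Illustrative sketch: a patrolling graph G = (N, E) whose edges carry a
# discrete probability distribution C_{k,j} over travel durations.
class PatrolGraph:
    def __init__(self):
        # edges[(k, j)] maps a possible travel duration to its probability
        self.edges = {}

    def add_route(self, k, j, duration_dist):
        # sanity check: C_{k,j} must be a probability distribution
        assert abs(sum(duration_dist.values()) - 1.0) < 1e-9
        self.edges[(k, j)] = duration_dist

    def sample_travel_time(self, k, j, rng=random):
        # draw one travel duration according to C_{k,j}
        dist = self.edges[(k, j)]
        durations = list(dist.keys())
        weights = list(dist.values())
        return rng.choices(durations, weights=weights, k=1)[0]

g = PatrolGraph()
g.add_route(0, 1, {2: 0.7, 3: 0.3})  # moving t0 -> t1 takes 2 or 3 steps
```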

SLIDE 4

Related works

Security Games [PPT+07, BGA09, KJT+09, ABVT13]

  • The adversary is able to perform extensive surveillance and obtains full knowledge of the patrolling strategy.
  • The adversary conducts a one-shot attack.
  • No effective cooperation between the patrollers, except in [SJY+14] (but with a single, fully rational adversary).

Resource conservation games [FST15, NSG+16, QHJT14]

  • Multiple intruders performing illegal actions.
  • Objective: maximizing the number of detected illegal actions.
  • Limited observability of the intruders.
  • No effective cooperation between the patrollers.

SLIDE 5

Issues

Each patrolling agent

  • must decide, in an autonomous but cooperative way, which target to visit at each decision step,
  • must deal with the uncertainty on action outcomes,
  • partially observes the state of the system (illegal actions performed and states of the other agents),
  • has no a priori knowledge of the adversaries' strategies,
  • must face evolving strategies of the adversaries.

SLIDE 6

Overview of the approach

Main components

  • Context definition from observations on the adversaries
  • DEC-POMDP formalization of the decision problem based on the current context
  • Online policy computation
  • Online detection of context changes

[Figure: Context Definition (PI) → DEC-POMDP formalization → Patrolling strategy computation → Patrolling strategy execution]

SLIDE 7

Adversary model

  • Detected illegal actions are the only information available about the adversaries.
  • We define the probability PIi(t) that the adversaries perform an illegal action on site ti at time t → the current context.
  • Let NIi(t − H, t) be the number of detected adversaries on target ti (defined for all ti in N) between t − H and t.
  • Each agent estimates PIi(t) using the following equation:

PIi(t) = NIi(t − H, t) / Σ_{tk ∈ N} NIk(t − H, t)    (1)
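Equation (1) in code form; the uniform fallback when no adversary has been detected in the window is our assumption, not the paper's:

```python
# Sketch of equation (1): estimating the context PI_i(t) from the counts
# NI_i(t-H, t) of adversaries detected on each target in the window [t-H, t].
def estimate_context(detections):
    """detections: dict mapping target name -> NI_i(t-H, t)."""
    total = sum(detections.values())
    if total == 0:
        # our assumption: with no detection in the window, fall back to uniform
        n = len(detections)
        return {t: 1.0 / n for t in detections}
    return {t: c / total for t, c in detections.items()}

pi = estimate_context({"t1": 3, "t2": 1, "t3": 0})
```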

SLIDE 8

DEC-POMDP background

DECentralized Partially Observable Markov Decision Process [BZI02]

  • Ag = {1, ..., m} is a set of m agents,
  • S is the set of world states,
  • A = A1 × · · · × Am is the set of possible joint actions a = (a1, ..., am),
  • T is the transition function giving the probability T(s′|s, a) that the system moves to s′ while executing a from s,
  • O = O1 × · · · × Om is the set of joint observations o = (o1, ..., om),
  • Ω is the observation function giving the probability Ω(o|s, a) of observing o when executing a from s,
  • R(s′|s, a) is the reward obtained when executing action a from s and moving to s′.
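The tuple above can be captured as a lightweight container for experimentation; this is an illustrative sketch (the field names are ours), with the functions T, Ω and R left abstract as callables:

```python
from dataclasses import dataclass
from typing import Callable, List

# Illustrative container for the DEC-POMDP tuple defined on this slide.
@dataclass
class DecPOMDP:
    agents: List[int]                # Ag = {1, ..., m}
    states: List[object]             # S, the set of world states
    joint_actions: List[tuple]       # A = A1 x ... x Am
    T: Callable                      # T(s_next, s, a) -> transition probability
    joint_observations: List[tuple]  # O = O1 x ... x Om
    Omega: Callable                  # Omega(o, s, a) -> observation probability
    R: Callable                      # R(s_next, s, a) -> reward

# A degenerate single-agent, single-state instance just to show the shape:
model = DecPOMDP(
    agents=[1],
    states=["s0"],
    joint_actions=[("stay",)],
    T=lambda s_next, s, a: 1.0,
    joint_observations=[("here",)],
    Omega=lambda o, s, a: 1.0,
    R=lambda s_next, s, a: 0.0,
)
```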

SLIDE 9

DEC-POMDP for multiagent patrolling

  • A state st at time t is defined as: the position of each agent, the list of targets where an illegal action has currently been observed, the idleness of each target, and the elapsed time of each current move.
  • An individual action ai consists in moving to a target tj (tj ∈ N).
  • Transition probabilities are defined from the probabilities on move durations and from the probabilities of detecting illegal actions.
  • Each agent observes her current position and the illegal actions on the currently patrolled target.
  • The observation function is deterministic.
  • The reward function is defined so as to reward detected illegal actions.

SLIDE 10

DEC-POMDP for multiagent patrolling

Transition function

  • The probability that an agent reaches her target is defined from the probability distributions Ck,j.
  • The probabilities of detecting illegal actions are estimated using the current context PI. Let wj(t) be the probability of observing an illegal action on tj at time t:

wj(t) = P( ⋃_{x=0}^{min(∆int, idlej)} Ij(t − x) )

where Ij(t − x) denotes the event "an illegal action is initiated at t − x on tj" and is derived from PI.
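Under the additional assumption (ours, not stated on the slide) that initiation events at different steps are independent with a common per-step probability p_j derived from PI, w_j(t) can be computed as:

```python
# Sketch of w_j(t): probability that at least one illegal action was
# initiated on t_j within the last min(Delta_int, idle_j) + 1 steps,
# assuming independent initiations with per-step probability p_j.
def detection_probability(p_j, delta_int, idle_j):
    window = min(delta_int, idle_j)
    # P(union of I_j(t-x) for x = 0..window) = 1 - P(no initiation at all)
    no_initiation = (1.0 - p_j) ** (window + 1)
    return 1.0 - no_initiation

w = detection_probability(p_j=0.1, delta_int=5, idle_j=3)
```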

SLIDE 11

DEC-POMDP solving: background

  • Solving a DEC-POMDP consists in computing a joint policy π = ⟨π1, · · · , πm⟩ where πi is the individual policy of agent i.
  • Individual policies are coordinated and take into account uncertainty and partial observability.
  • The joint policy is usually computed off-line.
  • The joint policy is then executed in a distributed way.

SLIDE 12

DEC-POMDP solving: background

SLIDE 13

DEC-POMDP solving: background

Observation-action history

A history of observations and actions for an agent is the sequence of observations made and actions taken by the agent along the execution:

θ̄_i^t = (o_i^1, a_i^1, o_i^2, a_i^2, · · · , o_i^t, a_i^t)

Observation history

If the policy is deterministic, the history can be summarized by the sequence of observations:

θ̄_i^t = (o_i^1, o_i^2, · · · , o_i^t)

SLIDE 14

DEC-POMDP solving

For a given context, the DEC-POMDP can be optimally solved by existing algorithms in a centralized way. But:

  • Poor scalability because of the high complexity: optimally solving a DEC-POMDP is NEXP-complete [BZI02].
  • Centralized computation should be avoided.
  • The policy has to be frequently updated.

SLIDE 15

DEC-POMDP solving

We propose an evolutionary algorithm (adapted from the (1+1) algorithm) to optimize the patrolling strategy over a finite horizon T:

  • it can be executed in a decentralized way,
  • it allows the agents to exploit the strategies of the previous context to compute new strategies,
  • it is anytime and scales well, but offers no guarantee on the quality of the solutions.

SLIDE 16

DEC-POMDP solving

champion = RandomIndividual()
championValue = Evaluate(champion)
while deadline not reached do
    challenger = Mutation(champion)
    challengerValue = Evaluate(challenger)
    if challengerValue > championValue then
        champion = challenger
        championValue = challengerValue
    end if
end while
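The loop above in runnable form, on a toy bit-string objective; the real individuals are patrolling strategies, so the evaluate and mutate functions here are stand-ins of our own:

```python
import random

def evaluate(individual):
    # toy fitness: number of ones (stands in for the strategy evaluation)
    return sum(individual)

def mutate(individual, rng):
    # flip one random bit (stands in for the strategy mutation operator)
    child = individual[:]
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child

def one_plus_one(length=20, iterations=500, seed=0):
    rng = random.Random(seed)
    champion = [rng.randint(0, 1) for _ in range(length)]
    champion_value = evaluate(champion)
    for _ in range(iterations):  # stands in for "while deadline not reached"
        challenger = mutate(champion, rng)
        challenger_value = evaluate(challenger)
        if challenger_value > champion_value:
            champion, champion_value = challenger, challenger_value
    return champion, champion_value

best, value = one_plus_one()
```

Because only strictly better challengers replace the champion, the loop is anytime: it can be stopped at any deadline and still return the best strategy found so far.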

SLIDE 17

Limiting Communication

  • Patrolling agents need to communicate their observations about detected illegal actions.
  • Communication allows the agents to deduce the new context, but it is risky and resource-consuming.
  • We propose to measure the relevance of a piece of information: only relevant information is communicated.
  • The relevance measures the distance (Kullback-Leibler divergence) between the current probability distribution PI and the new one PI′ obtained by considering the new information:

D(P, Q) = Σ_i P(i) log(P(i)/Q(i)) + Σ_i Q(i) log(Q(i)/P(i))
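A sketch of the relevance test, assuming the context distributions are stored as dictionaries over targets; the smoothing constant and the threshold value are our assumptions, not the paper's:

```python
import math

# Symmetrized Kullback-Leibler divergence between the current context PI
# and the updated context PI', as in the relevance measure above.
def symmetric_kl(p, q, eps=1e-12):
    d = 0.0
    for i in p:
        pi, qi = max(p[i], eps), max(q[i], eps)  # eps avoids log(0)
        d += pi * math.log(pi / qi) + qi * math.log(qi / pi)
    return d

def is_relevant(pi_old, pi_new, threshold=0.10):
    # communicate only if the new information shifts the context enough
    return symmetric_kl(pi_old, pi_new) > threshold

p = {"t1": 0.5, "t2": 0.5}
```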

SLIDE 18

Detecting Context Changes

  • A DEC-POMDP is defined for a current context.
  • The current context is revised every T time steps.
  • The adversaries may change their strategy over these T time steps (non-stationarity of the adversaries' policy).
  • Empirical studies show that the number of detected adversaries significantly decreases after the adversaries' policy changes.
  • We develop a mathematical method to attempt to detect such changes: study the variations of the number of detected adversaries over the last time steps.
  • Once a policy change is detected, the current context (and the DEC-POMDP) is updated immediately.

SLIDE 19

Detecting Context Changes

  1. Compute a moving average dett(H) of the detection counts over the last H time steps, for each time step t.
  2. Decompose the dett values using a finite adaptation of the Stieltjes decomposition. This decomposition allows us to identify the decreasing components det− of det.
  3. Apply a backward finite difference operator to quantify the variations:

∇det−[t] = det−_t(H) − det−_{t−1}(H)

  4. Threshold the values obtained in the previous step to detect adversarial policy changes. When an adversarial policy change is detected, the current context PI is updated even if the deadline T has not been reached.
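Steps 1, 3 and 4 can be sketched as follows; step 2 (the Stieltjes-like decomposition into decreasing components) is omitted for brevity, and the window and threshold values are illustrative:

```python
# Simplified sketch of the context-change detector: moving average of
# detection counts, backward finite difference, then thresholding.
def moving_average(counts, H):
    # average over the last (up to) H values, including the current step
    return [sum(counts[max(0, t - H + 1): t + 1]) /
            len(counts[max(0, t - H + 1): t + 1])
            for t in range(len(counts))]

def backward_difference(series):
    # grad det[t] = det_t - det_{t-1}
    return [series[t] - series[t - 1] for t in range(1, len(series))]

def detect_change(counts, H=3, threshold=-0.5):
    diffs = backward_difference(moving_average(counts, H))
    # a sharp drop in detections suggests the adversaries changed policy
    return any(d < threshold for d in diffs)

changed = detect_change([5, 5, 6, 5, 1, 0, 0], H=3)
```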

SLIDE 20

Overview of the approach

[Figure: Context Definition (PI) → DEC-POMDP formalization → Patrolling strategy computation → Patrolling strategy execution]

SLIDE 21

Experiments

Evolutionary Algorithm vs. Optimal Algorithm

Figure: Detection ratios of the executed strategies

SLIDE 22

Experiments / Scalability

Figure: Influence of the number of targets

Figure: Influence of the number of agents

SLIDE 23

Experiments / Communication

Figure: Number of messages

Average detection ratio:

          3 agents   4 agents   5 agents
  Com     0.7375     0.8084     0.8645
  KL05    0.7241     0.7966     0.8585
  KL10    0.6850     0.7476     0.8036
  KL15    0.6661     0.7383     0.7660

SLIDE 24

Experiments / Adversaries policy changes

Figure: Detection ratio over time

Figure: Influence of strategy changes

SLIDE 25

Conclusion and Future works

More realistic context

  • A new framework for effective cooperation between patrolling agents in uncertain and partially observable environments
  • Multiple patrollers and multiple adversaries performing multiple illegal actions
  • Adversaries may not be fully rational and may change their strategies over time.

Overview of the contribution

  • Context-dependent DEC-POMDP formalization
  • On-line distributed policy computations
  • On-line detection of changes of context
  • Measure of information relevance to limit communication

SLIDE 26

Conclusion and Future works

Future research directions

  • Develop new algorithms for on-line policy computation
  • Explore new models of the adversaries: include a temporal dimension, dependencies between illegal actions, ...

  • Relax the limited observability assumption

SLIDE 27

References I

  • B. An, M. Brown, Y. Vorobeychik, and M. Tambe, Security games with surveillance cost and optimal timing of attack execution, Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems (AAMAS '13), 2013, pp. 223–230.
  • N. Basilico, N. Gatti, and F. Amigoni, Leader-follower strategies for robotic patrolling in environments with arbitrary topologies, Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS '09), vol. 1, 2009, pp. 57–64.
  • D. Bernstein, S. Zilberstein, and N. Immerman, The complexity of decentralized control of MDPs, Mathematics of Operations Research, 27(4):819–840, 2002.
  • F. Fang, P. Stone, and M. Tambe, When security games go green: Designing defender strategies to prevent poaching and illegal fishing, Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI '15), 2015.

SLIDE 28

References II

  • C. Kiekintveld, M. Jain, J. Tsai, J. Pita, F. Ordóñez, and M. Tambe, Computing optimal randomized resource allocations for massive security games, Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS '09), 2009, pp. 689–696.
  • T. H. Nguyen, A. Sinha, S. Gholami, A. Plumptre, L. Joppa, M. Tambe, M. Driciru, F. Wanyama, A. Rwetsiba, R. Critchlow, and C. Beale, CAPTURE: A new predictive anti-poaching tool for wildlife protection, 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS '16), 2016.
  • P. Paruchuri, J. P. Pearce, M. Tambe, F. Ordóñez, and S. Kraus, An efficient heuristic approach for security against multiple adversaries, Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS '07), 2007, pp. 181:1–181:8.

SLIDE 29

References III

  • Y. Qian, W. B. Haskell, A. X. Jiang, and M. Tambe, Online planning for optimal protector strategies in resource conservation games, Proceedings of the 13th International Conference on Autonomous Agents and Multi-agent Systems (AAMAS '14), 2014, pp. 733–740.
  • E. Shieh, A. X. Jiang, A. Yadav, P. Varakantham, and M. Tambe, Unleashing DEC-MDPs in security games: Enabling effective defender teamwork, European Conference on Artificial Intelligence (ECAI), 2014, pp. 819–824.
