

SLIDE 1

Belief Change Maximisation for Hydrothermal Vent Hunting using Occupancy Grids

Zeyn Saigol*, Richard Dearden*, Jeremy Wyatt* and Bramley Murton†

*School of Computer Science, University of Birmingham
†National Oceanography Centre, Southampton

TAROS 2010, Plymouth

SLIDE 2

Outline

TAROS'10 - Saigol Belief Change Max for OGs

• Motivation – vent prospecting
• Problem details
• Original algorithms
  • Single-step lookahead: Entropy and ΣΔH
  • Non-myopic planning: ΣΔH-MDP
  • Fix for re-rewarding: OP correction
• Summary

SLIDE 3

Motivation – Hydrothermal Vents

• Sea floor, ~3000 m, ~350°C
• Emit a plume containing 'tracers': dissolved chemicals and minerals
• Turbulent current means no gradient
• Often found in clusters, so plumes combine

SLIDE 4

The Challenge

• Ship-based search followed by AUV deployment
• Use chemical tracers – vision impossible, sonar difficult
• AUVs currently perform exhaustive search
• Use AI: the goal is to find as many vents as possible during the mission
• Partially observable, multiple sources, indirect observations
• Options:
  • Reactive, moth-like strategies
  • Information-theoretic – build a probabilistic map, then plan

SLIDE 5

Outline

• Motivation – vent prospecting
• Problem details
• Original algorithms
  • Single-step lookahead: Entropy and ΣΔH
  • Non-myopic planning: ΣΔH-MDP
  • Fix for re-rewarding: OP correction
• Summary

SLIDE 6

Problem Model

• Mapping: adopt the occupancy grid (OG) algorithm of Michael Jakuba
• Uses plume detections and the current to infer the map
• Observations z ∈ {locate vent, detect plume, nothing}
• Cells are occupied (m_c = vent) or empty; the OG consists of the P(m_c) values
• Belief state b = (OG, x_AUV)
• Actions a ∈ {N, E, S, W}
• OG update: b' = srog(b, a, z)
• Observation model P(z | b, a)
• This is a partially observable Markov decision process (POMDP) – but intractable: a 20×20 grid gives ~10^244 states
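As a minimal single-cell sketch of the occupancy-grid update (not Jakuba's actual algorithm, which reasons about correlated plume evidence across cells; the likelihood values below are invented for illustration), each P(m_c) can be revised by Bayes' rule:

```python
def update_cell(p_occ, lik_occ, lik_emp):
    """Bayes update of one cell's occupancy probability P(m_c), given an
    observation with likelihood lik_occ if the cell contains a vent and
    lik_emp if it is empty."""
    num = lik_occ * p_occ
    return num / (num + lik_emp * (1.0 - p_occ))

# Low prior (few vents expected in the search area); a plume detection is
# more likely when the cell is occupied, so the belief rises but stays small
p_after = update_cell(0.01, lik_occ=0.6, lik_emp=0.1)
```

Starting from the 0.01 prior, a single detection only lifts the cell to roughly 0.06 – which is why, as later slides note, informative observations can still raise the map's entropy.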

SLIDE 7

Outline

• Motivation – vent prospecting
• Problem details
• Original algorithms
  • Single-step lookahead: Entropy and ΣΔH
  • Non-myopic planning: ΣΔH-MDP
  • Fix for re-rewarding: OP correction
• Summary

SLIDE 8

Infotaxis Algorithm

• Vergassola et al. developed infotaxis for finding a single chemical source, using a continuous distribution map
• It chooses the action that reduces uncertainty in the map the most
• Uncertainty is defined by entropy; the entropy of the OG is the sum of the per-cell entropies H_c

[Diagram: AUV with candidate actions N, E and S]
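The per-cell entropy H_c and its sum over the grid can be written down directly; a minimal sketch (entropies in bits):

```python
import math

def cell_entropy(p):
    """Binary entropy H_c (bits) of a cell with occupancy probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def og_entropy(og):
    """Map uncertainty = sum of the per-cell entropies over the grid."""
    return sum(cell_entropy(p) for p in og)

# A maximally uncertain cell contributes a full bit; a near-certain
# low-prior cell contributes almost nothing
h = og_entropy([0.5, 0.5, 0.01])
```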

SLIDE 9

Infotaxis Algorithm

• Entropy for observation z = locate vent: weight the posterior cell entropies H_c by the observation probability, giving P(z=l) · H(srog(b, a=N, z=l))
• Value of the N action = expected new entropy

SLIDE 10

Infotaxis Algorithm

• Repeat for the remaining observations: P(z=p) · H(srog(b, a=N, z=p)) and P(z=n) · H(srog(b, a=N, z=n))

SLIDE 11

Infotaxis Algorithm

• Expected entropy of action N: E_z[H] = Σ_z P(z) · H(srog(b, a=N, z)) – here evaluating to 6

SLIDE 12

Infotaxis Algorithm

• Expected-entropy values for the candidate actions: N = 6, S = 3, E = 4; infotaxis picks the action with the lowest expected entropy
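The action selection built up on these slides can be sketched as follows. This is a toy illustration: `toy_update` and `toy_obs` are invented stand-ins for the real srog update and observation model P(z|b,a):

```python
import math

def cell_entropy(p):
    # Binary entropy H_c (bits) of one cell's occupancy probability
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def og_entropy(og):
    # Map entropy = sum of per-cell entropies
    return sum(cell_entropy(p) for p in og)

def expected_entropy(og, action, obs_model, update):
    """E_z[H(srog(b, a, z))]: posterior map entropy averaged over the
    observations z, weighted by P(z | b, a)."""
    return sum(p_z * og_entropy(update(og, action, z))
               for z, p_z in obs_model(og, action).items())

# Invented stand-ins for the real srog update and observation model
def toy_update(og, action, z):
    if z == "plume":  # a plume detection shifts beliefs towards occupied
        return [min(p * 1.5, 0.99) for p in og]
    return og

def toy_obs(og, action):
    return {"locate": 0.05, "plume": 0.25, "nothing": 0.70}

og = [0.01, 0.3, 0.5]
scores = {a: expected_entropy(og, a, toy_obs, toy_update) for a in "NESW"}
best = min(scores, key=scores.get)  # infotaxis: lowest expected entropy
```

The toy model ignores the action, so all four scores coincide here; with the real srog and P(z|b,a) each compass move would score differently, as in the N = 6, S = 3, E = 4 example.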

SLIDE 13

ΣΔH Algorithm

• Jakuba's OG algorithm requires a low prior occupancy probability (0.01), as a small number of vents is expected in a given search area
• This means plume and vent detections, which provide useful information, can actually increase entropy

SLIDE 14

ΣΔH Algorithm

• Heuristic alternative: ΣΔH. Use the change in entropy, regardless of whether it is an increase or a decrease
• Cell-by-cell subtraction: weight each posterior map by P(z) and subtract the original cell entropies H_c
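A minimal sketch of the ΣΔH heuristic (the example probabilities are invented): sum the absolute per-cell entropy changes, so a detection that raises entropy still scores as useful belief change:

```python
import math

def cell_entropy(p):
    # Binary entropy H_c (bits) of one cell's occupancy probability
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def sum_delta_h(og_before, og_after):
    """ΣΔH: total absolute cell-by-cell change in entropy between the
    prior and posterior maps; belief change counts whether entropy
    rises or falls."""
    return sum(abs(cell_entropy(q) - cell_entropy(p))
               for p, q in zip(og_before, og_after))

# A detection lifting a low-prior cell towards 0.5 RAISES entropy,
# yet still registers as informative belief change under ΣΔH
change = sum_delta_h([0.01, 0.01], [0.30, 0.01])
```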

SLIDE 15

Outline

• Motivation – vent prospecting
• Problem details
• Original algorithms
  • Single-step lookahead: Entropy and ΣΔH
  • Non-myopic planning: ΣΔH-MDP
  • Fix for re-rewarding: OP correction
• Summary

SLIDE 16

Non-myopic Planning: ΣΔH-MDP

• Issue: we only plan one step into the future
• Intuition: instead of evaluating possible action/observation pairs N steps into the future, evaluate the effects of observations N steps away – this avoids the exponential blowup

SLIDE 17

Non-myopic Planning: ΣΔH-MDP

• Mechanics:
  • Calculate E_z[ΣΔH] for making an observation from a cell, for every cell in the OG (as if the AUV could teleport to any cell)
  • Assume the OG no longer changes, and define a reward of E_z[ΣΔH] for visiting a cell
  • Then solve a deterministic Markov decision process (MDP) to get the optimal policy given these assumptions
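The mechanics above can be sketched with plain value iteration on a deterministic grid MDP. The reward array stands in for the per-cell E_z[ΣΔH] values; the numbers in the usage example are invented:

```python
def solve_grid_mdp(reward, gamma=0.95, iters=300):
    """Value iteration for the deterministic grid MDP: moving into a cell
    earns that cell's (fixed) reward, standing in for E_z[SigmaDeltaH]."""
    rows, cols = len(reward), len(reward[0])
    V = [[0.0] * cols for _ in range(rows)]
    for _ in range(iters):
        # Synchronous backup: each cell takes its best compass neighbour
        V = [[max(reward[nr][nc] + gamma * V[nr][nc]
                  for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                  if 0 <= nr < rows and 0 <= nc < cols)
              for c in range(cols)]
             for r in range(rows)]
    return V

def greedy_action(V, reward, r, c, gamma=0.95):
    """Best compass move from (r, c) under the computed values."""
    moves = {"N": (r - 1, c), "S": (r + 1, c), "W": (r, c - 1), "E": (r, c + 1)}
    valid = {a: reward[nr][nc] + gamma * V[nr][nc]
             for a, (nr, nc) in moves.items()
             if 0 <= nr < len(V) and 0 <= nc < len(V[0])}
    return max(valid, key=valid.get)

# Invented 2x2 reward grid: one high-information cell at (0, 1)
V = solve_grid_mdp([[0.0, 1.0], [0.0, 0.0]])
```

The discount makes the planner trade off travel time against information: a cell's value is its expected belief change plus the discounted value of wherever the AUV can go next.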

SLIDE 18

ΣΔH-MDP Movie

[Video slide: animation of the ΣΔH-MDP planner]

SLIDE 19

Results

• Setup: percent of vents found, 133 timesteps, mean of 600 trials
• ΣΔH significantly better than mowing-the-lawn (MTL)

[Bar chart: mean percent found for MTL, Infotaxis, ΣΔH, Infotaxis-MDP and ΣΔH-MDP; results shown with 95% confidence intervals]

SLIDE 20

Results

• ΣΔH-MDP improves on ΣΔH

SLIDE 21

Results

• ΣΔH improves on infotaxis

SLIDE 22

Outline

• Motivation – vent prospecting
• Problem details
• Original algorithms
  • Single-step lookahead: Entropy and ΣΔH
  • Non-myopic planning: ΣΔH-MDP
  • Fix for re-rewarding: OP correction
• Summary

SLIDE 23

OP Correction

• A slight issue with ΣΔH-MDP is that the MDP assumes re-visiting a cell earns the same reward
• In fact, repeated observations from the same cell are worth less
• ΣΔH-OP: replace the MDP with an Orienteering Problem (OP) solver
• Flag-gathering task – zero reward for re-visiting a cell
• The OP is a variant of the TSP with rewards for cities and a limited path length
• Use a Monte-Carlo method: generate random non-crossing paths and select the best
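A simplified sketch of the Monte-Carlo OP idea: sample bounded-length paths, score each cell's reward at most once (zero for re-visits), and keep the best. It samples plain random walks rather than the non-crossing paths the slide describes, and the reward grid in the usage line is invented:

```python
import random

def mc_orienteering(reward, start, path_len, n_samples=2000, seed=1):
    """Monte-Carlo Orienteering Problem sketch: sample random walks of
    bounded length; a cell's reward is collected at most once per path
    (the flag-gathering rule), and the best-scoring path is kept."""
    rng = random.Random(seed)
    rows, cols = len(reward), len(reward[0])
    best_path, best_score = [start], 0.0
    for _ in range(n_samples):
        r, c = start
        visited = {start}
        path, score = [start], 0.0
        for _ in range(path_len):
            moves = [(r + dr, c + dc)
                     for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
            r, c = rng.choice(moves)
            if (r, c) not in visited:  # flag-gathering: no reward twice
                score += reward[r][c]
                visited.add((r, c))
            path.append((r, c))
        if score > best_score:
            best_path, best_score = path, score
    return best_path, best_score

# Invented reward grid with one high-E_z[ΣΔH] cell in the middle
path, score = mc_orienteering([[0, 0, 0], [0, 5, 0], [0, 0, 0]], (0, 0), 6)
```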

SLIDE 24

Results – OP Correction

• Results compared to IL4 – an online POMDP planner, our previous state-of-the-art solution for this domain (Saigol et al. 2009)
• We also applied the OP correction to IL, with less conclusive results (see paper)

[Bar chart: mean percent found against mean runtime per step (s) for ΣΔH, ΣΔH-MDP, IL4 and ΣΔH-OP30]

SLIDE 25

Summary

• We have formalised an interesting real-world problem that poses a significant challenge for AI
• We have created a novel ΣΔH-MDP algorithm to guide exploration in occupancy grids
• This adapts existing entropy-based techniques to deal with:
  • Low prior occupancy probabilities
  • Uncertain, long-range sensors
  • Planning further into the future
• With the OP correction applied, ΣΔH-OP significantly outperforms traditional methods such as MTL, and performs at least as well as online POMDP methods while requiring less computation time

SLIDE 26

References

Jakuba, M. (2007). Stochastic Mapping for Chemical Plume Source Localization with Application to Autonomous Hydrothermal Vent Discovery. PhD thesis, MIT and WHOI Joint Program.

Saigol, Z., Dearden, R., Wyatt, J., and Murton, B. (2009). Information-lookahead planning for AUV mapping. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09).

Vergassola, M., Villermaux, E., and Shraiman, B. I. (2007). 'Infotaxis' as a strategy for searching without gradients. Nature, 445(7126):406–409.

SLIDE 27

Questions

• Any questions?