SLIDE 1

Distributed Path Planning for Mobile Robots using a Swarm of Interacting Reinforcement Learners

Chris Vigorito

Department of Computer Science, University of Massachusetts Amherst
vigorito@cs.umass.edu

May 17th, 2007, AAMAS ’07, Honolulu, HI

SLIDE 2

Local Robot Navigation - Obstacle Avoidance

• Goal is observable, or only a heading is given
• Head in the desired direction while avoiding obstacles
• Reasonably good approaches exist for solving this problem

SLIDE 3

Global Robot Navigation - Path Planning

• Goal is unobservable and the heading unknown (a model is needed)
• Want the least-cost path to the goal
• Lots of uncertainty and many decision points
• Some egocentric approaches exist, but with restrictive assumptions and high complexity

SLIDE 4

Physical Path Planning

• Places the computational burden on a distributed sensor network rather than on the robot
• Network of unsophisticated sensor nodes with local communication capabilities
• Nodes communicate path information locally to produce a globally optimal solution
• Low-complexity computation at each node
• Robots query nodes for the least-cost path to a desired goal
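To make the query model concrete, here is a minimal sketch of a robot following locally stored next-hop pointers to a goal; the function and table names are illustrative assumptions, not the paper's API.

```python
# Sketch: a robot repeatedly queries the nearest node for the next hop
# toward its goal, so no single node ever needs a global map.

def navigate(start, goal, next_hop_table):
    """Follow per-node next-hop pointers from start to goal."""
    path = [start]
    node = start
    while node != goal:
        node = next_hop_table[node][goal]  # local query at the current node
        path.append(node)
    return path

# Example on a line network A - B - C - D, where each node already
# points toward D:
table = {"A": {"D": "B"}, "B": {"D": "C"}, "C": {"D": "D"}}
print(navigate("A", "D", table))  # ['A', 'B', 'C', 'D']
```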

SLIDE 5

Previous Work

• Perform distance vector routing over a topological map formed by the sensor network
• Cost metrics used are limited to hop count [Batalin et al. (2004); Li et al. (2003); O’Hara et al. (2006)]
• Nodes must be able to sense the relevant information
• No information from robot experience is used (no learning)
• Only tested on uniform terrain with highly structured (e.g., grid-like) network deployments
• Contribution: incorporate reinforcement learning to improve solution quality and versatility

SLIDE 6

Distance Vector Routing

• Route an incoming packet (or robot) to the next-hop router so as to minimize a cost function, given a destination
• Each node stores a distance vector estimate, i.e., the estimated cost from itself to every destination:

$$D(x, z) = \min_{y \in N(x)} \big[\, d(x, y) + D(y, z) \,\big]$$

• A distributed form of the Bellman-Ford algorithm (dynamic programming)
• Widely used in networking applications
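The update above lends itself to a few lines of code. Below is a minimal sketch of one relaxation step at a single node, using plain dictionaries; the function and variable names are illustrative assumptions, and a real deployment would exchange vectors via messages rather than read neighbor tables directly.

```python
def dv_update(dist, link_cost, x, destinations):
    """One Bellman-Ford relaxation at node x.

    dist[n][z]      -- node n's current estimated cost to destination z
    link_cost[x][y] -- cost of the link from x to its neighbor y
    """
    for z in destinations:
        if z == x:
            dist[x][z] = 0.0
            continue
        # D(x, z) = min over neighbors y of [ d(x, y) + D(y, z) ]
        dist[x][z] = min(link_cost[x][y] + dist[y][z] for y in link_cost[x])
    return dist[x]

# Example: line network A - B - C with unit link costs.
INF = float("inf")
dist = {"A": {"A": 0, "B": 1, "C": INF},
        "B": {"A": 1, "B": 0, "C": 1},
        "C": {"A": INF, "B": 1, "C": 0}}
link_cost = {"A": {"B": 1}}
print(dv_update(dist, link_cost, "A", ["A", "B", "C"]))
# {'A': 0.0, 'B': 1, 'C': 2}
```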

SLIDE 7

Reinforcement Learning

• A well-developed framework for learning, from experience, how to interact with an environment
• The goal is to maximize reward (here, to minimize a cost function)
• Common formalism: the Markov Decision Process (MDP), ⟨S, A, T, R⟩
• Agents learn a policy π mapping states to actions so as to minimize the cost function
• Can be solved by learning an action-value function Q : S × A → ℝ
• The network routing problem was formulated as an MDP by Boyan and Littman (1993)
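For reference, the standard Bellman optimality condition on the action-value function in the cost-minimizing setting (a general RL identity, not something specific to this talk) is:

$$Q^*(s, a) = R(s, a) + \sum_{s'} T(s' \mid s, a)\, \min_{a'} Q^*(s', a')$$

where R(s, a) is read as a cost, so the greedy policy picks the action with the smallest Q-value.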

SLIDE 8

Model and Assumptions

[Figure: example four-node network A-B-C-D]

• All nodes and robots have some means of local communication
• All robots are equipped with local navigation abilities
• All robots can obtain the distance and heading to a nearby node

SLIDE 9

Swarm of Interacting Reinforcement Learners (SWIRL)

[Figure: the four-node network A-B-C-D, annotated with example values Q_A(D, B) = 10 and Q_A(D, C) = 15]

• States are represented as node/destination pairs
• Actions are next-hop choices
• The transition function is defined by the network topology
• “Reward” = time, energy, danger, etc.
• The value function is distributed across the network
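To make the state/action encoding concrete, here is a minimal sketch of the slice of the value function one node might hold; the class and method names are illustrative assumptions, not the paper's API, and the learning-rate form of the update is a standard RL choice rather than something taken from the slides.

```python
class SwirlNode:
    """One sensor node's share of the distributed action-value function."""

    def __init__(self, node_id, neighbors):
        self.node_id = node_id
        self.neighbors = list(neighbors)
        # Q[(dest, next_hop)]: estimated cost to reach dest via next_hop.
        # Zero-initialized entries are optimistic for cost minimization,
        # which encourages exploration (as in the Slide 12 example).
        self.Q = {}

    def best_next_hop(self, dest):
        """A robot queries the node for the least-cost next hop toward dest."""
        return min(self.neighbors,
                   key=lambda y: self.Q.get((dest, y), 0.0))

    def update(self, dest, next_hop, hop_cost, downstream_estimate, alpha=1.0):
        """Bellman-style backup: measured hop cost (time, energy, danger, ...)
        plus the neighbor's estimated remaining cost to the destination."""
        target = hop_cost + downstream_estimate
        old = self.Q.get((dest, next_hop), 0.0)
        self.Q[(dest, next_hop)] = old + alpha * (target - old)
```

With alpha = 1.0 this reduces to the distance-vector update on Slide 6; smaller values of alpha average over noisy traversal costs.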

SLIDE 10

Algorithm

[Figure sequence, Slides 10-18: an animated walkthrough on the four-node network A-B-C-D. Node A's estimates for destination D start at Q_A(D, B) = 0 and Q_A(D, C) = 0; by the end of the walkthrough they read Q_A(D, B) = 0 and Q_A(D, C) = 10.]
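The numeric progression in those frames can be replayed in a few lines. This is a minimal sketch assuming a Q-routing-style backup in the spirit of the Boyan and Littman formulation cited on Slide 7; the individual hop costs are invented for illustration and are not given on the slides.

```python
# Node A's table for destination D, zero-initialized as on Slide 12.
Q_A = {("D", "B"): 0.0, ("D", "C"): 0.0}

# A robot travels A -> C -> D. Suppose (illustrative numbers) the A -> C hop
# cost 4 s and node C's estimated remaining cost to D is 6 s; A backs up
# the sum, exactly as in the distance-vector update on Slide 6.
hop_cost, downstream_estimate = 4.0, 6.0
Q_A[("D", "C")] = hop_cost + downstream_estimate

print(Q_A)  # {('D', 'B'): 0.0, ('D', 'C'): 10.0} -- the values on Slide 17
```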

SLIDE 19

Simulation Environment

SLIDE 20

Grid Network Deployment

SLIDE 21

Random Network Deployment

SLIDE 22

Grid Deployment - Single Start/Goal Pair

[Plot: Time to Goal (s) vs. Trajectories; curves: Hop Count, SWIRL, Optimal]

SLIDE 23

Grid Deployment - Random Start/Goal Pairs

[Plot: Average Time per Hop (s) vs. Trajectories; curves: Hop Count, SWIRL]

SLIDE 24

Random Deployment - Single Start/Goal Pair

[Plot: Time to Goal (s) vs. Trajectories; curves: Hop Count, SWIRL]

SLIDE 25

Grid Deployment - Single Start/Goal Pair

[Plot: Time to Goal (s) vs. Seconds; curves: 1, 2, 5, 10, and 15 Robots]

SLIDE 26

Summary

• An extension of existing methods for physical path planning
• Incorporates reinforcement learning to improve solution quality in the face of unobservability and uncertainty
• Performs well in a wider class of environments
• Allows for less structured types of network deployment

SLIDE 27

Limitations and Future Work

• The approach doesn’t currently address situations in which links are not traversable
• Add the ability for robots to sense impasses and send infinite edge weights to nodes
• Mobility of sensor nodes: reconfiguration for better coverage
• Have robots use “shortcuts” by interpolating between nodes