Statistical Filtering and Control for AI and Robotics
Alessandro Farinelli
Outline
– POMDPs
– The POMDP model – Finite world POMDP algorithm – Point based value iteration
– Exploration
– Information gain – Exploration in occupancy grid maps – Extension to MRS
Main reference:
– Thrun, Burgard, Fox; Probabilistic Robotics
POMDPs: Partially Observable Markov Decision Processes. The robot makes decisions based on the belief state, which is a posterior distribution over states; uncertainty in sensing and acting is explicitly taken into consideration.
The control policy maps belief states to actions:
π_T(b) = u
The value function of a POMDP is a function of an entire probability distribution, and belief spaces are continuous: in general the value function is defined over continuous belief spaces. For finite state, action, and measurement spaces and finite horizons, however, we can effectively represent the value functions by piecewise linear functions.
– Possible because Expectation is a linear operator
[Figure: two-state example with states x1, x2, sensing action u3, measurements, and payoffs. Action u3 changes the state: p(x1' | x1, u3) = 0.2, p(x1' | x2, u3) = 0.8. Measurement model: p(z1 | x1) = 0.7, p(z2 | x1) = 0.3, p(z1 | x2) = 0.3, p(z2 | x2) = 0.7.]
[Figure: payoffs for terminal actions u1, u2 in states x1, x2: r(x1, u1) = -100, r(x2, u1) = +100, r(x1, u2) = +100, r(x2, u2) = -50.]
With full knowledge of the state, the best actions are:
– u2 in x1 and u1 in x2
The sensing action u3 acquires knowledge with a small cost:
– z1 increases confidence of being in x1 – z2 increases confidence of being in x2 – cost is -1 (see later)
The belief is summarized by a single number:
– p(x1) = p1 – p(x2) = 1 - p1
The payoff in a POMDP depends on the hidden state of the system. The expected payoff is obtained by integrating over all states:
r(b, u) = E_x[r(x, u)] = p1 r(x1, u) + (1 - p1) r(x2, u)
i.e., a linear combination weighted with the probabilities:
r(b, u1) = -100 p1 + 100 (1 - p1)
r(b, u2) = 100 p1 - 50 (1 - p1)
r(b, u3) = -1
– Choose the best next action among u1, u2, u3
The horizon-1 value is the maximum of the three functions at each point:
V1(b) = max_u r(b, u) = max { -100 p1 + 100 (1 - p1), 100 p1 - 50 (1 - p1), -1 }
V1(b) is piecewise linear in p1 (the upper envelope of the three lines).
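The horizon-1 computation is easy to check numerically. A minimal Python sketch using the payoffs from the example (function names are illustrative):

```python
# Horizon-1 value for the two-state example; belief summarized by p1 = p(x1).

def r(u, p1):
    """Expected payoff of action u under belief p1 (payoffs from the slides)."""
    payoff = {
        "u1": -100 * p1 + 100 * (1 - p1),
        "u2": 100 * p1 - 50 * (1 - p1),
        "u3": -1,  # sensing action: small fixed cost
    }
    return payoff[u]

def V1(p1):
    """Upper envelope: pointwise max of the three linear payoff functions."""
    return max(r(u, p1) for u in ("u1", "u2", "u3"))
```

As expected, u1 dominates near p1 = 0, u2 dominates near p1 = 1, and in between both terminal actions give only a modest value.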
– p(z1 | x1)=0.7 and p(z1| x2)=0.3.
After sensing z1 the belief is updated with Bayes rule:
b' = p(x1 | z1) = p(z1 | x1) p1 / p(z1) = 0.7 p1 / (0.7 p1 + 0.3 (1 - p1))
with p(z1) = 0.7 p1 + 0.3 (1 - p1) = 0.4 p1 + 0.3.
The value function V1(b) is projected through the measurement to obtain V1(b | z1). Since the measurement outcome is not known in advance, we take the expectation over measurements:
V̄1(b) = Σ_z p(z) V1(b | z) = p(z1) V1(b | z1) + p(z2) V1(b | z2)
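The Bayes update and the expectation over measurements can be sketched directly with the example numbers (a small illustrative reconstruction, not library code):

```python
# Measurement model of the example: p(z1|x1) = 0.7, p(z1|x2) = 0.3.

def V1(p1):
    """Horizon-1 value: max of the three expected payoffs."""
    return max(-100 * p1 + 100 * (1 - p1), 100 * p1 - 50 * (1 - p1), -1)

def p_z1(p1):
    """Measurement likelihood marginalized over the state: p(z1) = 0.4 p1 + 0.3."""
    return 0.7 * p1 + 0.3 * (1 - p1)

def bayes_update(p1, z):
    """Posterior belief b' = p(x1 | z) via Bayes rule."""
    lx1, lx2 = (0.7, 0.3) if z == "z1" else (0.3, 0.7)
    return lx1 * p1 / (lx1 * p1 + lx2 * (1 - p1))

def expected_value_after_sensing(p1):
    """V̄1(b) = p(z1) V1(b|z1) + p(z2) V1(b|z2)."""
    pz = p_z1(p1)
    return pz * V1(bayes_update(p1, "z1")) + (1 - pz) * V1(bayes_update(p1, "z2"))
```

At p1 = 0.5 the expected value after sensing (47.5, before the sensing cost) is much larger than V1(0.5) = 25: the measurement is valuable exactly where the belief is uncertain.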
[Figure: value function over p1, with a region where u1 is optimal, a region where u2 is optimal, and an unclear region in between.]
The state transition under u3 must also be taken into account (prediction step):
p(x1') = p(x1' | x1, u3) p1 + p(x1' | x2, u3) (1 - p1) = 0.2 p1 + 0.8 (1 - p1) = 0.8 - 0.6 p1
The value function is projected through the transition by substituting p1 with 0.8 - 0.6 p1, yielding V̄1(b | u3).
[Figure: projecting the value function through the u3 transition mirrors it — the regions where u1 and u2 are optimal are swapped, with an unclear region in between.]
Combining all three actions u1, u2, u3 gives the horizon-2 value function:
V2(b) = max_u [ r(b, u) + V̄1(b | u) ]
The measurement is important here: sensing (u3) is optimal exactly where the belief is uncertain.
[Figure: V2(b) with regions where u1 is optimal, u2 is optimal, and an unclear region.]
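A sketch of the horizon-2 backup, assuming (as in the example) that u1 and u2 are terminal, while u3 pays -1, flips the state with probability 0.8, and is followed by a measurement:

```python
# Horizon-2 backup for the two-state example (illustrative reconstruction).

def V1(p1):
    return max(-100 * p1 + 100 * (1 - p1), 100 * p1 - 50 * (1 - p1), -1)

def bayes_update(p1, z):
    lx1, lx2 = (0.7, 0.3) if z == "z1" else (0.3, 0.7)
    return lx1 * p1 / (lx1 * p1 + lx2 * (1 - p1))

def V2(p1):
    # Prediction step through u3: p(x1') = 0.2 p1 + 0.8 (1 - p1)
    p1_pred = 0.2 * p1 + 0.8 * (1 - p1)
    # Fold in the measurement: V̄1 = sum_z p(z) V1(b' | z)
    pz1 = 0.7 * p1_pred + 0.3 * (1 - p1_pred)
    v_sense = (pz1 * V1(bayes_update(p1_pred, "z1"))
               + (1 - pz1) * V1(bayes_update(p1_pred, "z2")))
    return max(-100 * p1 + 100 * (1 - p1),   # u1 (terminal)
               100 * p1 - 50 * (1 - p1),     # u2 (terminal)
               -1 + v_sense)                 # u3: cost, transition, sense
```

At p1 = 0.5 the sensing branch wins (V2 = 46.5), while at p1 = 1 the terminal action u2 is optimal.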
Exact value iteration quickly blows up: even for this tiny example, a T=30 horizon can produce on the order of 10^547,864 linear functions. The exponential growth of the number of linear constraints in the value function is the major reason why exact POMDPs are impractical for most applications.
Approximations make POMDPs tractable:
– Exploiting the structure of the domain
– Belief points
Point Based Value Iteration (PBVI): maintain a set of example belief points, and keep a linear function only if it is maximal for at least one of the example belief points
– V contains only constraints that are supported by belief points
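The pruning idea behind PBVI can be illustrated on the three horizon-1 lines of the example; `prune` below is an illustrative helper, not the full PBVI algorithm:

```python
# Keep only linear functions (alpha-vectors) that are maximal at
# at least one of the sample belief points.

def prune(alpha_vectors, belief_points):
    """alpha_vectors: list of (a, b) pairs representing the line a*p1 + b.
    A vector is kept only if it attains the max at some belief point."""
    kept = set()
    for p1 in belief_points:
        values = [a * p1 + b for (a, b) in alpha_vectors]
        kept.add(values.index(max(values)))
    return [alpha_vectors[i] for i in sorted(kept)]

# The three horizon-1 payoff lines of the example (u1, u2, u3):
alphas = [(-200, 100), (150, -50), (0, -1)]
beliefs = [0.0, 0.25, 0.5, 0.75, 1.0]
```

Here the constant line of u3 (value -1) is dominated at every belief point, so it is pruned; only the u1 and u2 lines survive.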
[Figure: comparison of value iteration with pruning (T=30) and PBVI (T=30).]
Example: clearing a room with an intruder whose position is uncertain (tracked with a particle filter). The time to clear the room is (with high likelihood) not sufficient for the intruder to pass by the corridor. The POMDP policy finds the best strategy to detect the intruder, considering the uncertainty of the state space.
POMDPs can represent many exploration tasks:
– Find an intruder – Active localization – Acquire a map of a static environment
– Just need to build an appropriate reward function (e.g. reduction in entropy) – Not practical for most realistic applications
Dedicated exploration techniques are used instead:
– Most of them are greedy
Entropy measures the uncertainty of a distribution:
H_b(x) = - ∫ b(x) log b(x) dx
Information gain is the difference in entropy before and after acquiring measurements. The information gain of acquiring measurement z under the belief b:
I_b(z) = H_b(x) - H_b'(x' | z)
The measurement is not known before executing a control action: need to integrate z out to obtain the expected information gain:
I_b(u) = E_z[ H_b(x) - H_b'(x' | z, u) ]
The cost of an action (e.g., time, energy) depends on the application domain.
– Choose the action that maximizes the expected utility
– expected utility = expected information gain minus expected cost – must find a tradeoff between cost and gain
u* = argmax_u [ α I_b(u) - E_b[r(x, u)] ]
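A small sketch of these quantities for a binary state and a binary measurement; `expected_info_gain` and `utility` are illustrative names, and α and the cost are assumed parameters:

```python
import math

def entropy(p):
    """Binary entropy H(p) = -p log p - (1-p) log(1-p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def expected_info_gain(p1, likelihoods):
    """E_z[ H(b) - H(b|z) ] for a binary state; likelihoods = (p(z1|x1), p(z1|x2))."""
    l1, l2 = likelihoods
    pz1 = l1 * p1 + l2 * (1 - p1)
    post_z1 = l1 * p1 / pz1 if pz1 > 0 else p1
    post_z2 = (1 - l1) * p1 / (1 - pz1) if pz1 < 1 else p1
    expected_post_entropy = pz1 * entropy(post_z1) + (1 - pz1) * entropy(post_z2)
    return entropy(p1) - expected_post_entropy

def utility(p1, likelihoods, cost, alpha=1.0):
    """Expected utility = alpha * expected information gain - expected cost."""
    return alpha * expected_info_gain(p1, likelihoods) - cost
```

With the sensor of the example (0.7/0.3) the gain is positive at p1 = 0.5, while an uninformative sensor (0.5/0.5) yields zero gain, so any costly sensing action has negative utility.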
Greedy exploration considers only the next action:
– computing the information gain of long action sequences is expensive
– long action sequences might not be executable, e.g., when the robot discovers an obstacle
Exploration in occupancy grid maps, under the usual assumptions:
– Occupancy values of individual cells are independent – Positions are known, map is static
Under the independence assumption, the map entropy is the sum of the per-cell entropies:
H(m | x_{1:t}, z_{1:t}) = Σ_i H(m_i | x_{1:t}, z_{1:t})
where each cell m_i is binary with occupancy probability p_i = p(m_i | x_{1:t}, z_{1:t}):
H(m_i | x_{1:t}, z_{1:t}) = - p_i log p_i - (1 - p_i) log(1 - p_i)
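Under the independence assumption the map entropy is just a sum of binary entropies, e.g.:

```python
import math

def cell_entropy(p):
    """H(m_i) = -p log p - (1-p) log(1-p), in bits; zero for known cells."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def map_entropy(occupancy):
    """Sum of per-cell binary entropies over a 2D occupancy grid."""
    return sum(cell_entropy(p) for row in occupancy for p in row)

grid = [[0.5, 0.5],   # unknown cells: 1 bit each
        [0.0, 1.0]]   # known free / known occupied: 0 bits
```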
A simple strategy: move to the location where the information gain is maximal (moving is itself a robot action!).
[Figure: CAD map, occupancy map, and entropy map of the same environment. The brighter a location, the higher the entropy.]
The expected information gain the robot would acquire when close to a cell m_i is the difference between the entropy before measuring and the expected entropy after acquiring the possible measurements:
E[I(m_i)] = H(p_i) - Σ_{z ∈ {free, occ}} p(z) H(p_i')
where p_i' = p(m_i | z) is the occupancy probability after incorporating measurement z (measuring free decreases p_i', measuring occupied increases it), and p(z) is computed from the current belief and the sensor model.
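A sketch of the per-cell expected gain, assuming a simple symmetric sensor model with a hypothetical hit probability `p_hit` (an assumption for illustration, not a value from the slides):

```python
import math

def H(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def expected_cell_gain(p_occ, p_hit=0.9):
    """Expected entropy reduction for a cell with occupancy belief p_occ.
    Assumed sensor: reports 'occupied' with prob p_hit when the cell is
    occupied, and with prob 1 - p_hit when it is free."""
    p_z_occ = p_hit * p_occ + (1 - p_hit) * (1 - p_occ)   # p(z = occupied)
    # Bayes updates for both possible measurement outcomes
    post_occ = p_hit * p_occ / p_z_occ
    post_free = (1 - p_hit) * p_occ / (1 - p_z_occ)
    return H(p_occ) - (p_z_occ * H(post_occ) + (1 - p_z_occ) * H(post_free))
```

As expected, the gain is largest for completely unknown cells (p_occ = 0.5) and zero for cells whose occupancy is already certain.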
Cells are classified as:
– Explored: updated at least once – Unexplored: never updated
Frontier-based exploration chooses target locations for the robot based on information maps. Available actions:
– Move to loc. (x,y) – Acquire info in a small radius
The travel cost to each cell m_i can be computed with value iteration on the grid (a distance transform):
T(m_i) = 1 + min_{j ∈ adj(i)} T(m_j)
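The distance transform above can be computed with a breadth-first wavefront on the grid; a minimal sketch assuming unit step costs and 4-connectivity:

```python
from collections import deque

def travel_costs(grid, start):
    """Wavefront distance transform: T(i) = 1 + min over free neighbours,
    computed with a BFS from the robot cell.
    grid: 2D list, True = obstacle; start: (row, col)."""
    rows, cols = len(grid), len(grid[0])
    T = [[None] * cols for _ in range(rows)]   # None = unreached / obstacle
    T[start[0]][start[1]] = 0
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and not grid[nr][nc] and T[nr][nc] is None:
                T[nr][nc] = T[r][c] + 1
                queue.append((nr, nc))
    return T

room = [[False, False, False],
        [False, True,  False],   # one obstacle in the middle
        [False, False, False]]
T = travel_costs(room, (0, 0))
```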
Multi-robot exploration:
– assign frontiers to different robots – greedily maximize exploration effect
Problem: two robots may select the same frontier if they are at the same distance and do not coordinate, wasting the effort of the second robot.
– Joint exploration will be much more effective
– Consider swapping the order of execution for robots
Coordination requires sharing information:
– i.e., robots share the same map
Possible approaches:
– Use optimal task assignment (e.g., Hungarian method) – Negotiation over tasks during execution (e.g., auctions) – Do not share maps continuously (e.g., plan for meetings) – …
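A toy sketch of coordinated assignment, deliberately simpler than the Hungarian method (a greedy rule, matching the "greedily maximize exploration effect" idea): once a frontier is taken, other robots must pick a different one:

```python
def assign_frontiers(costs):
    """Greedy coordinated assignment: repeatedly take the (robot, frontier)
    pair with the lowest travel cost among the still-unassigned ones.
    costs[r][f] = travel cost of robot r to frontier f.
    Illustrative sketch, not the optimal Hungarian assignment."""
    n_robots, n_frontiers = len(costs), len(costs[0])
    assignment, used_frontiers = {}, set()
    pairs = sorted((costs[r][f], r, f)
                   for r in range(n_robots) for f in range(n_frontiers))
    for cost, r, f in pairs:
        if r not in assignment and f not in used_frontiers:
            assignment[r] = f
            used_frontiers.add(f)
    return assignment

# Two robots equidistant from frontier 0: without coordination both would
# pick it; here the second robot is sent to frontier 1 instead.
costs = [[2, 5],
         [2, 6]]
```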
Summary. POMDPs:
– provide optimal policies considering belief states – are extremely hard to solve – effective for finite worlds and low dimensions
Exploration:
– POMDPs can represent the exploration problem – In most practical applications one must exploit domain knowledge to obtain tractable algorithms – Entropy can guide the search – Very often simple approaches (e.g., binary gain) are very effective and extremely efficient – Interesting extensions for MRS
Material for the slides:
– Thrun, Burgard, Fox; Probabilistic Robotics (chapters 15.3, 15.5, 17.1, 17.2, 17.4)
Further readings