Statistical Filtering and Control for AI and Robotics
Alessandro Farinelli
Planning and Control: Markov Decision Processes
Outline
– Uncertainty: localization for mobile robots
– State estimation based on Bayesian filters [recall]
– Markov Decision Processes
– Solution approaches
– Markov Decision Processes for path planning
– Russell and Norvig, Artificial Intelligence: A Modern Approach
– Thrun, Burgard, Fox, Probabilistic Robotics
Will the action "open" actually open the door? Problems:
– partial observability
– noisy sensors
– uncertainty in action outcomes
– complexity of modelling the environment
Probabilistic assertions summarize effects of uncertainty.
Subjective or Bayesian probability: probabilities relate propositions to one's own state of knowledge, e.g.:
– P(open | I am in front of the door) = 0.6
– P(open | I am in front of the door, door is not locked) = 0.8
Suppose a robot obtains measurement z. What is P(open|z)?
– P(open|z) is diagnostic
– P(z|open) is causal
– Often causal knowledge is easier to obtain (count frequencies!)
– Bayes rule allows us to use causal knowledge:

P(open|z) = P(z|open) P(open) / P(z)
P(z|open) = 0.6, P(z|¬open) = 0.3, P(open) = P(¬open) = 0.5
P(open|z) = P(z|open) P(open) / [P(z|open) P(open) + P(z|¬open) P(¬open)]
          = (0.6 · 0.5) / (0.6 · 0.5 + 0.3 · 0.5) = 0.3 / 0.45 = 2/3 ≈ 0.67
z raises the probability that the door is open.
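The arithmetic above can be checked with a short snippet (numbers taken from the slide; the variable names are mine):

```python
# Sensor model and prior from the door example.
p_z_open = 0.6      # P(z | open)
p_z_closed = 0.3    # P(z | not open)
p_open = 0.5        # prior P(open); P(not open) = 0.5

# Bayes rule, with the total probability P(z) in the denominator.
p_z = p_z_open * p_open + p_z_closed * (1 - p_open)
p_open_given_z = p_z_open * p_open / p_z
print(round(p_open_given_z, 2))  # 0.67
```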
Suppose our robot obtains another observation z2. How can we integrate this new information? More generally, how can we estimate P(x| z1...zn )?
P(x | z1, ..., zn) = P(zn | x, z1, ..., zn−1) P(x | z1, ..., zn−1) / P(zn | z1, ..., zn−1)
Markov assumption: zn independent of z1,...,zn-1 if we know x
P(x | z1, ..., zn) = P(zn | x) P(x | z1, ..., zn−1) / P(zn | z1, ..., zn−1)
                   = η P(zn | x) P(x | z1, ..., zn−1)
                   = η' [ ∏_{i=1..n} P(zi | x) ] P(x)
P(z2|open) = 0.5, P(z2|¬open) = 0.6, P(open|z1) = 2/3
P(open|z2, z1) = P(z2|open) P(open|z1) / [P(z2|open) P(open|z1) + P(z2|¬open) P(¬open|z1)]
              = (1/2 · 2/3) / (1/2 · 2/3 + 3/5 · 1/3) = (1/3) / (1/3 + 1/5) = 5/8 = 0.625
z2 lowers the probability that the door is open.
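The recursive update can be verified numerically: the posterior after z1 becomes the prior for z2 (slide values; variable names are mine):

```python
# Posterior after z1 is used as the new prior (Markov assumption).
p_open = 2 / 3              # P(open | z1)
p_z2_open = 0.5             # P(z2 | open)
p_z2_closed = 0.6           # P(z2 | not open)

num = p_z2_open * p_open
p_open_z1z2 = num / (num + p_z2_closed * (1 - p_open))
print(round(p_open_z1z2, 3))  # 0.625
```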
Often the world is dynamic
– actions carried out by the robot
– actions carried out by other agents
– time passing by
How can we incorporate such actions?
– The robot moves
– The robot moves objects
– People move around the robot
Actions are never carried out with absolute certainty. In contrast to measurements, actions generally increase the uncertainty.
To incorporate the outcome of an action u into the current "belief", we use the conditional pdf P(x'|u,x). This term specifies the probability that executing u changes the state from x to x'.
Example: action u = "close door". If the door is open, the action succeeds in 90% of all cases:
P(closed | u, open) = 0.9, P(open | u, open) = 0.1, P(closed | u, closed) = 1
Continuous case: P(x'|u) = ∫ P(x'|u,x) P(x) dx
Discrete case: P(x'|u) = Σ_x P(x'|u,x) P(x)

P(closed|u) = Σ_x P(closed|u,x) P(x)
            = P(closed|u,open) P(open) + P(closed|u,closed) P(closed)
            = 9/10 · 5/8 + 1 · 3/8 = 15/16
P(open|u) = Σ_x P(open|u,x) P(x)
          = P(open|u,open) P(open) + P(open|u,closed) P(closed)
          = 1/10 · 5/8 + 0 · 3/8 = 1/16 = 1 − P(closed|u)
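The discrete prediction step above can be sketched in a few lines (slide values; the dictionary layout is my own choice):

```python
# Belief before the action (posterior after the two measurements).
bel = {"open": 5 / 8, "closed": 3 / 8}
# Action model P(x' | u, x) for u = "close door", keyed as (x', x).
p_act = {("closed", "open"): 0.9, ("open", "open"): 0.1,
         ("closed", "closed"): 1.0, ("open", "closed"): 0.0}

# Discrete prediction: bel'(x') = sum_x P(x'|u,x) bel(x)
bel_pred = {x2: sum(p_act[(x2, x)] * bel[x] for x in bel)
            for x2 in ("open", "closed")}
print(bel_pred["closed"])  # 0.9375  (= 15/16)
```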
Given:
– Stream of observations z and action data u
– Sensor model P(z|x)
– Action model P(x'|u,x)
– Prior probability of the system state P(x)
Wanted:
– Estimate of the state X of a dynamical system
– The posterior of the state, also called Belief:
Bel(x_t) = P(x_t | u_1, z_1, ..., u_t, z_t),  with data d_t = {u_1, z_1, ..., u_t, z_t}
Underlying Assumptions
p(x_t | x_1:t−1, z_1:t−1, u_1:t) = p(x_t | x_t−1, u_t)
p(z_t | x_1:t, z_1:t−1, u_1:t) = p(z_t | x_t)
Bel(x_t) = P(x_t | u_1, z_1, ..., u_t, z_t)
(Bayes)       = η P(z_t | x_t, u_1, z_1, ..., u_t) P(x_t | u_1, z_1, ..., u_t)
(Markov)      = η P(z_t | x_t) P(x_t | u_1, z_1, ..., u_t)
(Total prob.) = η P(z_t | x_t) ∫ P(x_t | u_1, z_1, ..., u_t, x_t−1) P(x_t−1 | u_1, z_1, ..., u_t) dx_t−1
(Markov)      = η P(z_t | x_t) ∫ P(x_t | u_t, x_t−1) P(x_t−1 | u_1, z_1, ..., z_t−1) dx_t−1
              = η P(z_t | x_t) ∫ P(x_t | u_t, x_t−1) Bel(x_t−1) dx_t−1

(z = observation, u = action, x = state)
1.  Algorithm Bayes_filter(Bel(x), d):
2.  η = 0
3.  If d is a perceptual data item z then
4.    For all x do
5.      Bel'(x) = P(z|x) Bel(x)
6.      η = η + Bel'(x)
7.    For all x do
8.      Bel'(x) = η⁻¹ Bel'(x)
9.  Else if d is an action data item u then
10.   For all x' do
11.     Bel'(x') = ∫ P(x'|u,x) Bel(x) dx
12. Return Bel'(x)
Bel(x_t) = η P(z_t | x_t) ∫ P(x_t | u_t, x_t−1) Bel(x_t−1) dx_t−1
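A minimal discrete implementation of the Bayes filter algorithm, assuming finite state spaces and dictionary-based models (function and variable names are mine):

```python
def bayes_filter(bel, d, kind, sensor=None, action=None):
    """One step of the discrete Bayes filter.
    bel: dict state -> probability; d: observation z or action u;
    kind: "z" for the correction step, "u" for the prediction step.
    sensor[(z, x)] = P(z|x); action[(x2, u, x)] = P(x2|u,x)."""
    if kind == "z":
        # Correction: Bel'(x) = eta * P(z|x) Bel(x)
        unnorm = {x: sensor[(d, x)] * bel[x] for x in bel}
        eta = sum(unnorm.values())
        return {x: v / eta for x, v in unnorm.items()}
    # Prediction: Bel'(x') = sum_x P(x'|u,x) Bel(x)
    return {x2: sum(action[(x2, d, x)] * bel[x] for x in bel) for x2 in bel}

# Door example: uniform prior, then the first measurement of the slides.
sensor = {("z", "open"): 0.6, ("z", "closed"): 0.3}
bel = bayes_filter({"open": 0.5, "closed": 0.5}, "z", "z", sensor=sensor)
print(round(bel["open"], 2))  # 0.67
```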
– Kalman filters
– Particle filters
– Hidden Markov models
– Dynamic Bayesian networks
– Partially Observable Markov Decision Processes (POMDPs)
Bel(x_t) = η P(z_t | x_t) ∫ P(x_t | u_t, x_t−1) Bel(x_t−1) dx_t−1
How do I know whether I am in front of the door ? Localization as a state estimation process (filtering)
(figure: cycle of state updates and sensor readings)
Kalman filter: Gaussian pdf for the belief
Works only for linear action and sensor models (the EKF can be used to overcome this). Works well only for unimodal beliefs.
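A one-dimensional sketch of the Kalman predict/correct cycle (illustrative only; the general filter is matrix-valued, and all names here are mine):

```python
def kalman_1d(mu, var, u, z, var_motion, var_sensor):
    """One predict/correct cycle of a 1-D Kalman filter with a linear
    motion model x' = x + u and a direct sensor model z = x + noise."""
    # Prediction: the action shifts the mean and adds motion noise.
    mu, var = mu + u, var + var_motion
    # Correction: blend prediction and measurement via the Kalman gain.
    k = var / (var + var_sensor)
    return mu + k * (z - mu), (1 - k) * var

mu, var = kalman_1d(0.0, 1.0, 1.0, 1.2, 0.5, 0.5)
print(round(mu, 2), var)  # 1.15 0.375
```

Note how the corrected variance is always smaller than the predicted one: measurements reduce uncertainty, actions increase it.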
Particle filter: particles to represent the belief.
– Pros: no assumptions on belief, action and sensor models
– Cons: update can be computationally demanding
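One particle-filter update in 1-D (prediction, weighting, resampling); the motion and likelihood models below are my own illustrative choices:

```python
import random

def particle_step(particles, u, z, motion_noise, likelihood):
    """One particle filter update: sample the motion model, weight each
    particle by the sensor likelihood, then resample proportionally."""
    # Prediction: propagate every particle through a noisy motion model.
    moved = [x + u + random.gauss(0.0, motion_noise) for x in particles]
    # Correction: weight particles by the sensor likelihood P(z|x).
    weights = [likelihood(z, x) for x in moved]
    total = sum(weights)
    # Resampling: draw particles in proportion to their weights.
    return random.choices(moved, weights=[w / total for w in weights],
                          k=len(particles))

# Example: particles at 0, move by u = 1, then observe z = 1.0.
random.seed(0)
lik = lambda z, x: 1.0 / (1.0 + (z - x) ** 2)  # peaked at x == z
pts = particle_step([0.0] * 200, 1.0, 1.0, 0.3, lik)
```

The resampled cloud concentrates where motion prediction and measurement agree, which is why no parametric form of the belief is needed.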
– Localization: given map and observations, update pose estimate
– Mapping: given pose and observations, update map
– SLAM: given observations, update map and pose
New observations increase uncertainty; loop closures reduce uncertainty.
Courtesy of Sebastian Thrun and Dirk Haehnel.
Planning: choosing actions in the face of uncertainty.
Optimal path (shortest) if actions are deterministic Optimal path (safer) if actions are NOT deterministic
Input:
– state space x
– action space u
– transition model p(x'|u,x)
– reward function r(x,u)
(Other books use a different notation, but the core concepts do not change.)
Expected cumulative reward over a horizon T:
– T=1: greedy policy – T>1: finite horizon case, typically no discount – T=infty: infinite-horizon case, finite reward if discount < 1
Policy (fully observable case): π : x_t → u_t
Expected cumulative reward: R_T = E[ Σ_{τ=1..T} γ^τ r_{t+τ} ]
Policy (general case): π : z_1:t−1, u_1:t−1 → u_t
T = 1 (greedy case):
π_1(x) = argmax_u r(x,u)
V_1(x) = γ max_u r(x,u)
T t t t t t T
u z u r E x R
1 1 : 1 1 : 1
) , ( |
p
p
Optimal policy: π* = argmax_π R_T^π(x_t)
T = 2:
π_2(x) = argmax_u [ r(x,u) + ∫ V_1(x') p(x'|u,x) dx' ]
V_2(x) = γ max_u [ r(x,u) + ∫ V_1(x') p(x'|u,x) dx' ]
Finite horizon T:
π_T(x) = argmax_u [ r(x,u) + ∫ V_{T−1}(x') p(x'|u,x) dx' ]
V_T(x) = γ max_u [ r(x,u) + ∫ V_{T−1}(x') p(x'|u,x) dx' ]
Infinite horizon (Bellman equation):
V(x) = γ max_u [ r(x,u) + ∫ V(x') p(x'|u,x) dx' ]
π*(x) = argmax_u [ r(x,u) + ∫ V(x') p(x'|u,x) dx' ]
Value iteration:
– For all x: V̂(x) ← r_min
– Repeat until convergence:
  For all x: V̂(x) ← γ max_u [ r(x,u) + ∫ V̂(x') p(x'|u,x) dx' ]
– π*(x) = argmax_u [ r(x,u) + ∫ V̂(x') p(x'|u,x) dx' ]
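Value iteration on a deliberately tiny two-state MDP (the MDP itself is my own toy example; the update rule is the one above, with sums in place of integrals):

```python
gamma = 0.9
states, actions = ("s0", "s1"), ("stay", "go")
r = {("s0", "stay"): 0.0, ("s0", "go"): -1.0,   # reward r(x, u)
     ("s1", "stay"): 1.0, ("s1", "go"): 0.0}
# p[(x2, u, x)] = P(x2 | u, x)
p = {("s0", "stay", "s0"): 1.0, ("s1", "stay", "s0"): 0.0,
     ("s0", "go", "s0"): 0.2, ("s1", "go", "s0"): 0.8,
     ("s1", "stay", "s1"): 1.0, ("s0", "stay", "s1"): 0.0,
     ("s0", "go", "s1"): 1.0, ("s1", "go", "s1"): 0.0}

def backup(V, x, u):  # r(x,u) + sum_x' p(x'|u,x) V(x')
    return r[(x, u)] + sum(p[(x2, u, x)] * V[x2] for x2 in states)

V = {x: 0.0 for x in states}          # initializing with r_min would also work
for _ in range(300):                  # repeat until (practically) converged
    V = {x: gamma * max(backup(V, x, u) for u in actions) for x in states}
policy = {x: max(actions, key=lambda u: backup(V, x, u)) for x in states}
print(policy)  # {'s0': 'go', 's1': 'stay'}
```

Even though "go" costs reward in s0, the converged value function makes the jump to the rewarding state s1 worthwhile.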
Often the optimal policy is reached long before the value function has converged.
Policy iteration computes a new policy based on the current value function, and then calculates a new value function based on this policy. It often converges faster to the optimal policy.
V̂^π(x) = γ [ r(x, π(x)) + ∫ V̂^π(x') p(x' | π(x), x) dx' ]
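A compact sketch of policy iteration on a toy two-state MDP (model and names are mine; the evaluation step iterates the formula above instead of solving the linear system exactly):

```python
gamma = 0.9
states, actions = ("s0", "s1"), ("stay", "go")
r = {("s0", "stay"): 0.0, ("s0", "go"): -1.0,
     ("s1", "stay"): 1.0, ("s1", "go"): 0.0}
p = {("s0", "stay", "s0"): 1.0, ("s1", "stay", "s0"): 0.0,
     ("s0", "go", "s0"): 0.2, ("s1", "go", "s0"): 0.8,
     ("s1", "stay", "s1"): 1.0, ("s0", "stay", "s1"): 0.0,
     ("s0", "go", "s1"): 1.0, ("s1", "go", "s1"): 0.0}

def policy_iteration():
    pi = {x: "stay" for x in states}
    while True:
        # Policy evaluation: iterate V(x) <- gamma [r(x,pi(x)) + sum p V].
        V = {x: 0.0 for x in states}
        for _ in range(300):
            V = {x: gamma * (r[(x, pi[x])] +
                             sum(p[(x2, pi[x], x)] * V[x2] for x2 in states))
                 for x in states}
        # Policy improvement: act greedily with respect to the evaluated V.
        new_pi = {x: max(actions,
                         key=lambda u: r[(x, u)] +
                         sum(p[(x2, u, x)] * V[x2] for x2 in states))
                  for x in states}
        if new_pi == pi:        # policy stable: optimal
            return pi, V
        pi = new_pi

pi, V = policy_iteration()
print(pi)  # {'s0': 'go', 's1': 'stay'}
```

Here the policy stabilizes after a single improvement step, illustrating why policy iteration can terminate well before the values themselves stop changing.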
Plan motion in the free configuration space (not in the workspace).
Convert the free configuration space into a finite state space:
– Cell decomposition
– Skeletonization (e.g., Probabilistic Roadmaps, PRM)
Given a finite state space representing the free configuration space, find a sequence of states from start to goal. Several approaches:
– Rapidly-exploring Random Trees (RRT)
– Potential Fields
– Markov Decision Processes (i.e., building a navigation function)
NOTE: if the pose (i.e., the state) is unknown, the problem is not an MDP!
– Powerful model to plan a sequence of actions under uncertainty
– Key point: define the value of states considering the expected cumulative reward
– Value (policy) iteration to solve the model
– Planning problem in finite state space (C-free)
– MDPs: powerful techniques to build navigation functions (for low-dimensional spaces)
Material for the slides:
– Russell and Norvig, Artificial Intelligence: A Modern Approach (Chapter 25)
– Thrun, Burgard, Fox, Probabilistic Robotics (Chapter 14)
Further readings
mapping and localization for mobile robots