

SLIDE 1

Statistical Filtering and Control for AI and Robotics

Alessandro Farinelli

Planning and Control: Markov Decision Processes

SLIDE 2

Outline

  • Uncertainty: localization for mobile robots

– State estimation based on Bayesian filters [recall]

  • Acting Under Uncertainty

– Markov Decision Problem
– Solution approaches

  • Motion planning

– Markov Decision Processes for path planning

  • Acknowledgment: material based on

– Russell and Norvig; Artificial Intelligence: a Modern Approach
– Thrun, Burgard, Fox; Probabilistic Robotics

SLIDE 3

Mobile robots

SLIDE 4

Sensors

SLIDE 5

Uncertainty

  • Action open = open the door

Will open actually open the door? Problems:

  • 1) partial observability and noisy sensors
  • 2) uncertainty in action outcomes
  • 3) immense complexity of modelling and predicting the environment

SLIDE 6

Probability

Probabilistic assertions summarize effects of

  • laziness (too much work to enumerate all relevant facts),
  • ignorance (lack of relevant facts)

Subjective or Bayesian probability:

  • Probabilities relate propositions to one's own state of knowledge

– P(open | I am in front of the door) = 0.6
– P(open | I am in front of the door, door is not locked) = 0.8

SLIDE 7

Simple Example of State Estimation

Suppose a robot obtains a measurement z. What is P(open | z)?

SLIDE 8

Causal vs. Diagnostic Reasoning

P(open|z) is diagnostic. P(z|open) is causal. Causal knowledge is often easier to obtain. Bayes rule allows us to use causal knowledge:

P(open | z) = P(z | open) P(open) / P(z)

(the causal term P(z | open) can be obtained by counting frequencies!)

SLIDE 9

Example

P(z | open) = 0.6    P(z | ¬open) = 0.3    P(open) = P(¬open) = 0.5

P(open | z) = P(z | open) P(open) / (P(z | open) P(open) + P(z | ¬open) P(¬open))
            = (0.6 · 0.5) / (0.6 · 0.5 + 0.3 · 0.5) = 2/3 ≈ 0.67

z raises the probability that the door is open.

SLIDE 10

Combining Evidence

Suppose our robot obtains another observation z_2. How can we integrate this new information? More generally, how can we estimate P(x | z_1, …, z_n)?

SLIDE 11

Recursive Bayesian Updating

P(x | z_1, …, z_n) = P(z_n | x, z_1, …, z_{n-1}) P(x | z_1, …, z_{n-1}) / P(z_n | z_1, …, z_{n-1})

Markov assumption: z_n is independent of z_1, …, z_{n-1} if we know x:

P(x | z_1, …, z_n) = P(z_n | x) P(x | z_1, …, z_{n-1}) / P(z_n | z_1, …, z_{n-1})
                   = η P(z_n | x) P(x | z_1, …, z_{n-1})
                   = η_{1…n} [ Π_{i=1…n} P(z_i | x) ] P(x)

SLIDE 12

Example: Second Measurement

P(z_2 | open) = 0.5    P(z_2 | ¬open) = 0.6    P(open | z_1) = 2/3

P(open | z_2, z_1) = P(z_2 | open) P(open | z_1) / (P(z_2 | open) P(open | z_1) + P(z_2 | ¬open) P(¬open | z_1))
                   = (1/2 · 2/3) / (1/2 · 2/3 + 3/5 · 1/3) = 5/8 = 0.625

z2 lowers the probability that the door is open.
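
Both updates are easy to check numerically. Below is a minimal Python sketch of the normalized Bayes update, using the likelihoods from these slides (the function name and structure are illustrative, not from the original deck):

```python
def bayes_update(prior_open, p_z_open, p_z_not_open):
    """Measurement update: P(open | z) with explicit normalization."""
    num = p_z_open * prior_open                    # P(z|open) P(open)
    den = num + p_z_not_open * (1.0 - prior_open)  # + P(z|not open) P(not open)
    return num / den

belief = 0.5                              # prior P(open)
belief = bayes_update(belief, 0.6, 0.3)   # after z1: 2/3
belief = bayes_update(belief, 0.5, 0.6)   # after z2: 5/8 = 0.625
print(belief)
```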

SLIDE 13

Actions

Often the world is dynamic

– actions carried out by the robot,
– actions carried out by other agents,
– time passing by

How can we incorporate such actions?

SLIDE 14

Typical Actions

The robot moves
The robot moves objects
People move around the robot

Actions are never carried out with absolute certainty. In contrast to measurements, actions generally increase the uncertainty.

SLIDE 15

Modeling Actions

To incorporate the outcome of an action u into the current “belief”, we use the conditional pdf P(x’|u,x). This term specifies the probability that executing u changes the state from x to x’.


SLIDE 16

Example: Closing the door

SLIDE 17

State Transitions

  • P(x’|u,x) for u = “close door”:
  • If the door is open, the action “close door” succeeds in 90% of all cases.

(State transition diagram: open → closed with probability 0.9, open → open with 0.1, closed → closed with 1.0)

SLIDE 18

Integrating the Outcome of Actions

Continuous case:  P(x’ | u) = ∫ P(x’ | u, x) P(x) dx

Discrete case:    P(x’ | u) = Σ_x P(x’ | u, x) P(x)

SLIDE 19

Example: The Resulting Belief

P(closed | u) = Σ_x P(closed | u, x) P(x)
              = P(closed | u, open) P(open) + P(closed | u, closed) P(closed)
              = 9/10 · 5/8 + 1 · 3/8 = 15/16

P(open | u)   = Σ_x P(open | u, x) P(x)
              = P(open | u, open) P(open) + P(open | u, closed) P(closed)
              = 1/10 · 5/8 + 0 · 3/8 = 1/16 = 1 − P(closed | u)
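
As a quick numerical check of this prediction step, here is a sketch continuing the door example (variable names and dictionary layout are illustrative):

```python
# Prediction step P(x'|u) = sum_x P(x'|u,x) P(x) for u = "close door".
prior = {"open": 5/8, "closed": 3/8}      # belief after z1 and z2
p_close = {("closed", "open"): 0.9, ("open", "open"): 0.1,
           ("closed", "closed"): 1.0, ("open", "closed"): 0.0}

posterior = {x2: sum(p_close[(x2, x)] * prior[x] for x in prior)
             for x2 in ("open", "closed")}
print(posterior)                          # open: 1/16, closed: 15/16
```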

SLIDE 20

Bayes Filters: Framework

  • Given:

– Stream of observation data z and action data u: d_t = {u_1, z_1, …, u_t, z_t}
– Sensor model P(z|x)
– Action model P(x’|u,x)
– Prior probability of the system state P(x)

  • Compute:

– Estimate of the state x of the dynamical system
– The posterior of the state, also called the Belief:

Bel(x_t) = P(x_t | u_1, z_1, …, u_t, z_t)

SLIDE 21

Markov Assumption

Underlying Assumptions

  • Static world (no one else changes the world)
  • Independent noise (over time)
  • Perfect model, no approximation errors

p(x_t | x_{1:t-1}, z_{1:t-1}, u_{1:t}) = p(x_t | x_{t-1}, u_t)

p(z_t | x_{1:t}, z_{1:t-1}, u_{1:t}) = p(z_t | x_t)

SLIDE 22

Bayes Filters

Notation: z = observation, u = action, x = state

Bel(x_t) = P(x_t | u_1, z_1, …, u_t, z_t)

(Bayes)       = η P(z_t | x_t, u_1, z_1, …, u_t) P(x_t | u_1, z_1, …, u_t)

(Markov)      = η P(z_t | x_t) P(x_t | u_1, z_1, …, u_t)

(Total prob.) = η P(z_t | x_t) ∫ P(x_t | u_1, z_1, …, u_t, x_{t-1}) P(x_{t-1} | u_1, z_1, …, u_t) dx_{t-1}

(Markov)      = η P(z_t | x_t) ∫ P(x_t | u_t, x_{t-1}) P(x_{t-1} | u_1, z_1, …, u_t) dx_{t-1}

(Markov)      = η P(z_t | x_t) ∫ P(x_t | u_t, x_{t-1}) P(x_{t-1} | u_1, z_1, …, z_{t-1}) dx_{t-1}

              = η P(z_t | x_t) ∫ P(x_t | u_t, x_{t-1}) Bel(x_{t-1}) dx_{t-1}

SLIDE 23

Bayes Filter Algorithm

1. Algorithm Bayes_filter(Bel(x), d):
2.   η = 0
3.   If d is a perceptual data item z then
4.     For all x do
5.       Bel’(x) = P(z | x) Bel(x)
6.       η = η + Bel’(x)
7.     For all x do
8.       Bel’(x) = η⁻¹ Bel’(x)
9.   Else if d is an action data item u then
10.    For all x’ do
11.      Bel’(x’) = ∫ P(x’ | u, x) Bel(x) dx
12.  Return Bel’(x)

Compact form:

Bel(x_t) = η P(z_t | x_t) ∫ P(x_t | u_t, x_{t-1}) Bel(x_{t-1}) dx_{t-1}
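
The algorithm above maps directly onto a few lines of Python for the discrete case. The sketch below replays the whole door example (z1, z2, then u = close door); the function and dictionary layout are illustrative choices, not part of the original slides:

```python
# Discrete Bayes filter following the pseudocode above.
def bayes_filter(bel, d, sensor_model, action_model):
    """bel: dict state -> prob; d: ('z', name) for a percept, ('u', name) for an action."""
    kind, name = d
    if kind == 'z':                                    # lines 3-8: measurement update
        new_bel = {x: sensor_model[name][x] * bel[x] for x in bel}
        eta = sum(new_bel.values())                    # normalizer
        return {x: p / eta for x, p in new_bel.items()}
    else:                                              # lines 9-11: action update
        return {x2: sum(action_model[name][(x2, x)] * bel[x] for x in bel)
                for x2 in bel}

sensor_model = {'z1': {'open': 0.6, 'closed': 0.3},    # P(z|x)
                'z2': {'open': 0.5, 'closed': 0.6}}
action_model = {'close': {('closed', 'open'): 0.9, ('open', 'open'): 0.1,
                          ('closed', 'closed'): 1.0, ('open', 'closed'): 0.0}}

bel = {'open': 0.5, 'closed': 0.5}                     # prior
for d in [('z', 'z1'), ('z', 'z2'), ('u', 'close')]:
    bel = bayes_filter(bel, d, sensor_model, action_model)
print(bel)                                             # open: 1/16, closed: 15/16
```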

SLIDE 24

Bayes Filters are Familiar!

Kalman filters
Particle filters
Hidden Markov models
Dynamic Bayesian networks
Partially Observable Markov Decision Processes (POMDPs)

Bel(x_t) = η P(z_t | x_t) ∫ P(x_t | u_t, x_{t-1}) Bel(x_{t-1}) dx_{t-1}

SLIDE 25

Bayesian filters for localization

How do I know whether I am in front of the door? Localization as a state estimation process (filtering).

(Figure: filtering loop alternating state update and sensor reading)

SLIDE 26

Kalman Filter for Localization

Gaussian pdf for belief

  • Pros: closed-form representation, very fast update
  • Cons:

– Works only for linear action and sensor models (an EKF can overcome this)
– Works well only for unimodal beliefs
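
For intuition, a one-dimensional Kalman filter is only a few lines. The sketch below assumes a linear motion model x’ = x + u and a direct position measurement, with all numeric values invented for illustration:

```python
# 1-D Kalman filter sketch: the belief is a Gaussian (mean mu, variance var).
def kf_predict(mu, var, u, motion_var):
    """Prediction with linear motion model x' = x + u plus Gaussian noise."""
    return mu + u, var + motion_var

def kf_update(mu, var, z, sensor_var):
    """Update with a direct, noisy position measurement z."""
    k = var / (var + sensor_var)          # Kalman gain
    return mu + k * (z - mu), (1.0 - k) * var

mu, var = 0.0, 10.0                       # vague Gaussian prior
mu, var = kf_predict(mu, var, u=1.0, motion_var=0.5)
mu, var = kf_update(mu, var, z=1.3, sensor_var=0.4)
print(mu, var)                            # posterior mean and variance
```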

SLIDE 27

Particle filters

Particles represent the belief
Pros: no parametric assumptions on the belief, action, or sensor models
Cons: the update can be computationally demanding
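
A minimal bootstrap particle filter for a 1-D localization toy problem might look as follows (a sketch assuming Gaussian motion and sensor noise; nothing here is prescribed by the slides):

```python
import math
import random

def gaussian(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def pf_step(particles, u, z, motion_sigma=0.1, sensor_sigma=0.5):
    """One predict / weight / resample cycle over 1-D position particles."""
    particles = [x + u + random.gauss(0, motion_sigma) for x in particles]  # predict
    weights = [gaussian(z, x, sensor_sigma) for x in particles]             # weight
    return random.choices(particles, weights=weights, k=len(particles))     # resample

particles = [random.uniform(0, 10) for _ in range(1000)]   # uniform prior belief
particles = pf_step(particles, u=1.0, z=3.2)               # move 1 unit, observe z
print(sum(particles) / len(particles))                     # posterior mean estimate
```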

SLIDE 28

Particle Filters: prior

SLIDE 29

Particle Filters: bimodal belief

SLIDE 30

Particle Filters: unimodal beliefs

SLIDE 31

Mapping and SLAM

Localization: given the map and observations, update the pose estimate
Mapping: given the pose and observations, update the map
SLAM: given observations, update both the map and the pose

New observations increase uncertainty; loop closures reduce uncertainty

SLIDE 32

SLAM in action

Courtesy of Sebastian Thrun and Dirk Haehnel

SLIDE 33

Markov Decision Process

  • Mathematical model to plan sequences of actions in the face of uncertainty

SLIDE 34

Example MDP

SLIDE 35

Solving MDPs

SLIDE 36

Risk and Reward

SLIDE 37

Utility of State Sequences

SLIDE 38

Utility of States

SLIDE 39

MDPs for mobile robots

Optimal path (shortest) if actions are deterministic
Optimal path (safer) if actions are NOT deterministic

SLIDE 40

MDPs for mobile robots: formalization

Input:

  • States x (assume the state is known)
  • Actions u
  • Transition probabilities p(x‘|u,x)
  • Reward / payoff function r(x,u)
  • Note: now the reward depends on state and action. This is a different notation, but the core concepts do not change.

Output:

  • Policy π(x) that maximizes the future expected reward

SLIDE 41

Rewards and Policies

  • Policy (general case):           π : z_{1:t-1}, u_{1:t-1} → u_t
  • Policy (fully observable case):  π : x_t → u_t
  • Expected cumulative payoff:      R_T = E[ Σ_{τ=1}^{T} γ^τ r_{t+τ} ]

– T = 1: greedy policy
– T > 1: finite-horizon case, typically no discount
– T = ∞: infinite-horizon case, finite reward if discount γ < 1

SLIDE 42

Main concepts for Policies

  • Expected cumulative payoff of a policy:

R_T^π(x_t) = E[ Σ_{τ=1}^{T} γ^τ r_{t+τ} | u_{t+τ} = π(z_{1:t+τ-1}, u_{1:t+τ-1}) ]

  • Optimal policy:

π* = argmax_π R_T^π(x_t)

  • 1-step optimal policy:

π_1(x) = argmax_u r(x, u)

  • Value function of the 1-step optimal policy:

V_1(x) = γ max_u r(x, u)

SLIDE 43

2-step policies

  • Optimal policy:

π_2(x) = argmax_u [ r(x, u) + ∫ V_1(x’) p(x’ | u, x) dx’ ]

  • Value function:

V_2(x) = γ max_u [ r(x, u) + ∫ V_1(x’) p(x’ | u, x) dx’ ]

SLIDE 44

T-step policies

  • Optimal policy:

π_T(x) = argmax_u [ r(x, u) + ∫ V_{T-1}(x’) p(x’ | u, x) dx’ ]

  • Value function:

V_T(x) = γ max_u [ r(x, u) + ∫ V_{T-1}(x’) p(x’ | u, x) dx’ ]

SLIDE 45

Infinite Horizon

  • Optimal value function (Bellman equation):

V(x) = γ max_u [ r(x, u) + ∫ V(x’) p(x’ | u, x) dx’ ]

  • It can be used to compute the optimal policy:

π*(x) = argmax_u [ r(x, u) + ∫ V(x’) p(x’ | u, x) dx’ ]

SLIDE 46

Value Iteration: idea

  • Initialize V with random values
  • Until no change:

– For all states x:

  • Update V(x) to make it locally consistent
SLIDE 47

Value Iteration

  • For all x:

V̂(x) = r_min

  • Repeat until convergence:

– For all x, compute:

V̂(x) ← γ max_u [ r(x, u) + ∫ V̂(x’) p(x’ | u, x) dx’ ]

  • Finally:

π*(x) = argmax_u [ r(x, u) + ∫ V̂(x’) p(x’ | u, x) dx’ ]
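
The loop above is easy to realize for a discrete MDP. The sketch below runs value iteration on a tiny invented corridor world, keeping the slides' form of the update with γ outside the max (all names and numbers are illustrative, not from the deck):

```python
# Value iteration on a tiny corridor MDP: states 0..3 on a line, state 3 is
# an absorbing goal, actions move left/right and succeed with probability
# 0.8 (otherwise the robot stays put).

GAMMA = 0.9
STATES = [0, 1, 2, 3]
ACTIONS = [-1, +1]                         # left, right

def transition(x, u):
    """Return [(x', p(x'|u,x)), ...] for the corridor motion model."""
    if x == 3:
        return [(3, 1.0)]                  # goal is absorbing
    target = min(max(x + u, 0), 3)         # walls at both ends
    return [(target, 0.8), (x, 0.2)]

def reward(x, u):
    return 0.0 if x == 3 else -1.0         # unit step cost until the goal

def backup(V, x, u):
    """r(x,u) + sum_x' p(x'|u,x) V(x')."""
    return reward(x, u) + sum(p * V[x2] for x2, p in transition(x, u))

V = {x: -1.0 for x in STATES}              # initialize to r_min, as above
while True:
    V_new = {x: GAMMA * max(backup(V, x, u) for u in ACTIONS) for x in STATES}
    if max(abs(V_new[x] - V[x]) for x in STATES) < 1e-9:
        break                              # converged
    V = V_new

policy = {x: max(ACTIONS, key=lambda u: backup(V, x, u)) for x in STATES}
print(V)                                   # values grow toward the goal
print(policy)                              # move right in every non-goal state
```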

SLIDE 48

Value function and policy iteration

  • Often the optimal policy is reached long before the value function has converged.
  • Policy iteration calculates a new policy based on the current value function, and then calculates a new value function based on this policy.
  • This process often converges faster to the optimal policy.

V̂_π(x) = γ [ r(x, π(x)) + ∫ V̂_π(x’) p(x’ | π(x), x) dx’ ]
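
A matching sketch of policy iteration, reusing GAMMA, STATES, ACTIONS, transition(), reward(), and backup() from the value iteration example above (again illustrative, not from the slides):

```python
def evaluate(policy, sweeps=200):
    """Iterative policy evaluation with the slides' form of the update."""
    V = {x: 0.0 for x in STATES}
    for _ in range(sweeps):                # V_pi(x) = gamma [ r(x,pi(x)) + ... ]
        V = {x: GAMMA * backup(V, x, policy[x]) for x in STATES}
    return V

policy = {x: -1 for x in STATES}           # arbitrary start: always go left
while True:
    V = evaluate(policy)                   # policy evaluation
    improved = {x: max(ACTIONS, key=lambda u: backup(V, x, u)) for x in STATES}
    if improved == policy:                 # stable policy: optimal, stop
        break
    policy = improved                      # policy improvement
print(policy)
```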

SLIDE 49

Motion Planning for Mobile Robots

Plan for motion in the free configuration space (not the workspace)

(Figure: workspace vs. configuration space)

SLIDE 50

Configuration Space Planning

Convert the free configuration space into a finite state space:
– Cell decomposition
– Skeletonization (PRM)

SLIDE 51

Planning the motion

Given a finite state space representing the free configuration space, find a sequence of states from start to goal. Several approaches:
– Rapidly-exploring Random Trees (RRT)
– Potential Fields
– Markov Decision Processes (i.e. building a navigation function)

SLIDE 52

MDP for robot navigation

NOTE: the pose (i.e., the state) is unknown → strictly speaking, not an MDP!

  • Assume localization works (decently)
  • State is the most probable pose (mode of the posterior)
SLIDE 53

Summary

  • Robots must consider uncertainty when planning
  • Markov Decision Processes

– Powerful model to plan a sequence of actions under uncertainty
– Key point: define the value of states considering the expected cumulative reward
– Value (policy) iteration to solve the model

  • Motion Planning:

– Planning problem in a finite state space (C-free)
– MDPs are powerful techniques to build navigation functions (for low-dimensional spaces)

SLIDE 54

References and Further Readings

Material for the slides

  • Russell and Norvig; Artificial Intelligence: a Modern Approach (Chapter 25)
  • Thrun, Burgard, Fox; Probabilistic Robotics (Chapters 2 and 14)

Further readings

  • Latombe; Robot Motion Planning
  • LaValle, Kuffner; Randomized Kinodynamic Planning
  • Thrun, Fox, Burgard; A Probabilistic Approach to Concurrent Mapping and Localization for Mobile Robots