  1. Statistical Filtering and Control for AI and Robotics – Exploration and information gathering – Alessandro Farinelli

  2. Outline • POMDPs – The POMDP model – Finite world POMDP algorithm – Point based value iteration • Exploration – Information gain – Exploration in occupancy grid maps – Extension to MRS • Acknowledgment: material based on – Thrun, Burgard, Fox; Probabilistic Robotics

  3. POMDPs • In POMDPs we apply the same idea as in MDPs. • Since the state is not observable, the agent has to make its decisions based on the belief state, which is a posterior distribution over states. • Let b be the belief of the agent about the state under consideration. • POMDPs compute a value function over belief space:
$$V_T(b) = \gamma \max_u \left[ r(b,u) + \int V_{T-1}(b')\, p(b' \mid u, b)\, db' \right]$$

  4. Problems • Each belief is a probability distribution; thus, each value in a POMDP is a function of an entire probability distribution. • This is problematic, since probability distributions are continuous. • Additionally, we have to deal with the huge complexity of belief spaces. • For finite worlds with finite state, action, and measurement spaces and finite horizons, however, we can effectively represent the value functions by piecewise linear functions. – Possible because expectation is a linear operator
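A piecewise-linear, convex value function over a two-state belief can be stored as a finite set of linear pieces and evaluated by taking their maximum. The following is a minimal sketch, not code from the slides, using the two-state example introduced on the next slide; the helper name `evaluate` and the coefficient convention are assumptions for illustration.

```python
# Sketch: a piecewise-linear, convex value function over a two-state belief.
# A linear piece (a, b) represents a*p1 + b*(1 - p1), where p1 = p(x1).

def evaluate(value_fn, p1):
    """V(b) is the maximum over all linear pieces at belief p1."""
    return max(a * p1 + b * (1.0 - p1) for a, b in value_fn)

# Horizon-1 value function of the two-state example (see the next slides):
# the payoff lines of u1 and u2, plus the constant cost -1 of u3.
V1 = [(-100.0, 100.0), (100.0, -50.0), (-1.0, -1.0)]

for p1 in (0.0, 3.0 / 7.0, 1.0):
    print(f"V1({p1:.2f}) = {evaluate(V1, p1):.1f}")
```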

  5. Example • Two states x1, x2; actions u1, u2 and a control action u3. • Measurements z1, z2 with p(z1|x1) = 0.7, p(z2|x1) = 0.3, p(z1|x2) = 0.3, p(z2|x2) = 0.7. • Action u3 flips the state: p(x1'|x1,u3) = 0.2, p(x2'|x1,u3) = 0.8, p(x1'|x2,u3) = 0.8, p(x2'|x2,u3) = 0.2. • Payoffs: r(x1,u1) = −100, r(x2,u1) = +100, r(x1,u2) = +100, r(x2,u2) = −50.

  6. Discussion on the example • The two states have different optimal actions: u2 in x1 and u1 in x2. • Action u3 is non-deterministic: it flips the state and acquires knowledge at a small cost. – z1 increases confidence of being in x1 – z2 increases confidence of being in x2 – the cost is −1 (see later) • Two states, so the belief is summarized by p1 = p(x1), with p(x2) = 1 − p1 and p1 ∈ [0, 1].

  7. Payoff in POMDPs • In MDPs, the payoff (or reward) depends on the state of the system. • In POMDPs the true state is not exactly known. • Therefore, we compute the expected payoff by integrating over all states:
$$r(b,u) = E_x\!\left[r(x,u)\right] = \int_{x'} r(x',u)\, p(x')\, dx' = p_1\, r(x_1,u) + p_2\, r(x_2,u)$$

  8. Payoffs in the example I • If we are in x1 and execute u1 we receive −100. • If we are in x2 and execute u1 we receive +100. • When we are not certain of the state we have a linear combination weighted by the probabilities:
$$r(b,u_1) = -100\, p_1 + 100\,(1-p_1)$$
$$r(b,u_2) = 100\, p_1 - 50\,(1-p_1)$$
$$r(b,u_3) = -1$$
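As a quick sanity check of these lines, a small sketch (names assumed, not from the slides) computes r(b, u) from the example's payoff table:

```python
# Expected payoff in the two-state example:
# r(b, u) = p1 * r(x1, u) + (1 - p1) * r(x2, u); r(b, u3) = -1 for any belief.

R = {            # (r(x1, u), r(x2, u))
    "u1": (-100.0, 100.0),
    "u2": (100.0, -50.0),
    "u3": (-1.0, -1.0),
}

def expected_payoff(p1, u):
    r1, r2 = R[u]
    return p1 * r1 + (1.0 - p1) * r2

for u in ("u1", "u2", "u3"):
    print(u, expected_payoff(0.5, u))   # -> u1 0.0, u2 25.0, u3 -1.0
```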

  9. Payoffs in the example II

  10. The resulting policy for T=1 • Finite POMDP with T=1: use V1(b) to determine the optimal policy. – Choose the best next action among u1, u2, u3. • In our example, the optimal policy for T=1 is
$$\pi_1(b) = \begin{cases} u_1 & \text{if } p_1 \le \tfrac{3}{7} \\[2pt] u_2 & \text{if } p_1 > \tfrac{3}{7} \end{cases}$$
• This is the upper thick graph in the diagram.
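The 3/7 threshold is simply where the two payoff lines cross:
$$-100\, p_1 + 100\,(1-p_1) = 100\, p_1 - 50\,(1-p_1) \;\Longrightarrow\; 100 - 200\, p_1 = 150\, p_1 - 50 \;\Longrightarrow\; p_1 = \tfrac{150}{350} = \tfrac{3}{7}$$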

  11. Piecewise linearity and convexity • The resulting value function V1(b) is the maximum of the three functions at each point:
$$V_1(b) = \max \begin{cases} -100\, p_1 + 100\,(1-p_1) \\ 100\, p_1 - 50\,(1-p_1) \\ -1 \end{cases}$$
• It is piecewise linear and convex.

  12. Pruning • Only the first two components contribute. • The third component can be pruned away from V1(b). • Pruning is crucial for an efficient solution approach:
$$V_1(b) = \max \begin{cases} -100\, p_1 + 100\,(1-p_1) \\ 100\, p_1 - 50\,(1-p_1) \end{cases}$$
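A simple way to prune in this one-dimensional belief space is to keep only the pieces that attain the maximum at some sampled belief. This is a rough sketch (sampling-based, not the exact linear-programming test usually used), with names chosen for illustration:

```python
# Sketch of sampling-based pruning: keep a linear piece (a, b), i.e. a*p1 + b*(1-p1),
# only if it attains the maximum of the value function at some sampled p1 in [0, 1].

def prune(pieces, samples=1001):
    kept = set()
    for i in range(samples):
        p1 = i / (samples - 1)
        values = [a * p1 + b * (1.0 - p1) for a, b in pieces]
        kept.add(max(range(len(pieces)), key=values.__getitem__))
    return [pieces[j] for j in sorted(kept)]

V1 = [(-100.0, 100.0), (100.0, -50.0), (-1.0, -1.0)]
print(prune(V1))   # the constant piece (-1, -1) never attains the max and is dropped
```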

  13. Increasing the time horizon • Assume the robot can make an observation before acting. • Sensing will provide a better belief, but how much better? (figure: V1(b))

  14. Sensing • Suppose the robot perceives z1. • Recall: p(z1|x1) = 0.7 and p(z1|x2) = 0.3. • Given the observation z1 we update the belief using Bayes' rule:
$$p'_1 = p(x_1 \mid z_1) = \frac{p(z_1 \mid x_1)\, p(x_1)}{p(z_1)} = \frac{0.7\, p_1}{p(z_1)}$$
$$p'_2 = p(x_2 \mid z_1) = \frac{p(z_1 \mid x_2)\, p(x_2)}{p(z_1)} = \frac{0.3\,(1-p_1)}{p(z_1)}$$
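A minimal sketch of this measurement update for the example's sensor model (function name assumed):

```python
# Bayes measurement update for the two-state example.
# Returns the posterior p'(x1) and the normalizer p(z).

P_Z_GIVEN_X = {          # (p(z | x1), p(z | x2))
    "z1": (0.7, 0.3),
    "z2": (0.3, 0.7),
}

def measurement_update(p1, z):
    l1, l2 = P_Z_GIVEN_X[z]
    pz = l1 * p1 + l2 * (1.0 - p1)     # p(z) = sum_x p(z | x) p(x)
    return l1 * p1 / pz, pz

print(measurement_update(0.5, "z1"))   # -> (0.7, 0.5): belief in x1 rises to 0.7
```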

  15. Value Function considering z1 (figure: V1(b) projected through b' = p(x1|z1) to obtain V1(b|z1))

  16. Computing the new value function • Suppose the robot perceives z1. • We update the belief using Bayes' rule. • We can compute V1(b|z1) by replacing p1 with p'1:
$$V_1(b \mid z_1) = \max \begin{cases} -100\, \dfrac{0.7\, p_1}{p(z_1)} + 100\, \dfrac{0.3\,(1-p_1)}{p(z_1)} \\[8pt] 100\, \dfrac{0.7\, p_1}{p(z_1)} - 50\, \dfrac{0.3\,(1-p_1)}{p(z_1)} \end{cases} = \frac{1}{p(z_1)} \max \begin{cases} -70\, p_1 + 30\,(1-p_1) \\ 70\, p_1 - 15\,(1-p_1) \end{cases}$$

  17. Expected value after measuring • We do not know in advance what the next measurement will be. • We need to compute the expectation:
$$\bar V_1(b) = E_z\!\left[V_1(b \mid z)\right] = \sum_{i=1}^{2} p(z_i)\, V_1(b \mid z_i) = \sum_{i=1}^{2} p(z_i)\, V_1\!\left(\frac{p(z_i \mid x_1)\, p_1}{p(z_i)}\right)$$

  18. Expected value after measuring • We do not know in advance what the next measurement will be. • We need to compute the expectation:
$$\bar V_1(b) = E_z\!\left[V_1(b \mid z)\right] = \sum_{i=1}^{2} p(z_i)\, V_1(b \mid z_i)$$
$$= \underbrace{\max \begin{cases} -70\, p_1 + 30\,(1-p_1) \\ 70\, p_1 - 15\,(1-p_1) \end{cases}}_{p(z_1)\, V_1(b \mid z_1)} + \underbrace{\max \begin{cases} -30\, p_1 + 70\,(1-p_1) \\ 30\, p_1 - 35\,(1-p_1) \end{cases}}_{p(z_2)\, V_1(b \mid z_2)}$$

  19. Resulting value function • We need to consider the four possible combinations and find the max. • As before, we can perform pruning:
$$\bar V_1(b) = \max \begin{cases} -70\, p_1 + 30\,(1-p_1) - 30\, p_1 + 70\,(1-p_1) \\ -70\, p_1 + 30\,(1-p_1) + 30\, p_1 - 35\,(1-p_1) \\ 70\, p_1 - 15\,(1-p_1) - 30\, p_1 + 70\,(1-p_1) \\ 70\, p_1 - 15\,(1-p_1) + 30\, p_1 - 35\,(1-p_1) \end{cases} = \max \begin{cases} -100\, p_1 + 100\,(1-p_1) \\ 40\, p_1 + 55\,(1-p_1) \\ 100\, p_1 - 50\,(1-p_1) \end{cases}$$
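The four combinations and the pruning step can be reproduced in a few lines; the sketch below (a sampling-based approximation, names assumed) pairs one piece of p(z1)·V1(b|z1) with one piece of p(z2)·V1(b|z2) and keeps only the pieces that survive:

```python
# Sketch: expected value after measuring, as the max over all pairings of one
# linear piece per observation; a piece (a, b) stands for a*p1 + b*(1-p1).

V1z1 = [(-70.0, 30.0), (70.0, -15.0)]   # pieces of p(z1) * V1(b | z1)
V1z2 = [(-30.0, 70.0), (30.0, -35.0)]   # pieces of p(z2) * V1(b | z2)

combos = [(a1 + a2, b1 + b2) for (a1, b1) in V1z1 for (a2, b2) in V1z2]

def prune(pieces, samples=1001):
    kept = set()
    for i in range(samples):
        p1 = i / (samples - 1)
        values = [a * p1 + b * (1.0 - p1) for a, b in pieces]
        kept.add(max(range(len(pieces)), key=values.__getitem__))
    return [pieces[j] for j in sorted(kept)]

# -> the three pieces -100 p1 + 100 (1-p1), 40 p1 + 55 (1-p1), 100 p1 - 50 (1-p1)
print(prune(combos))
```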

  20. Value Function considering sensing (figure: p(z1)·V1(b|z1) and p(z2)·V1(b|z2), with regions where u1 or u2 is optimal and an unclear region in between)

  21. State transition • We need to consider how actions affect the state. • In our case u1 and u2 lead to final states and are deterministic. • u3 has a non-deterministic effect on the state:
$$p'_1 = E\!\left[p(x'_1 \mid x, u_3)\right] = \sum_{i=1}^{2} p(x'_1 \mid x_i, u_3)\, p_i = p(x'_1 \mid x_1, u_3)\, p_1 + p(x'_1 \mid x_2, u_3)\,(1-p_1) = 0.2\, p_1 + 0.8\,(1-p_1) = 0.8 - 0.6\, p_1$$
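A one-line sketch of this prediction step for the u3 transition model (function and parameter names assumed):

```python
# Belief prediction for action u3 in the two-state example:
# p'(x1) = p(x1' | x1, u3) * p1 + p(x1' | x2, u3) * (1 - p1) = 0.8 - 0.6 * p1

def predict_u3(p1, p_stay=0.2, p_flip=0.8):
    return p_stay * p1 + p_flip * (1.0 - p1)

print(predict_u3(0.0), predict_u3(0.5), predict_u3(1.0))   # -> 0.8 0.5 0.2
```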

  22. State transition (figure: p'1 = E[p(x'1 | x, u3)] = 0.8 − 0.6 p1 plotted as a function of p1)

  23. Resulting value function after u3 • Considering the state transition we can compute the expected value after executing u3, $\bar V_1(b \mid u_3)$. • Substitute p'1 for p1:
$$\bar V_1(b \mid u_3) = \max \begin{cases} -100\, p'_1 + 100\,(1-p'_1) \\ 40\, p'_1 + 55\,(1-p'_1) \\ 100\, p'_1 - 50\,(1-p'_1) \end{cases} = \max \begin{cases} 60\, p_1 - 60\,(1-p_1) \\ 52\, p_1 + 43\,(1-p_1) \\ -20\, p_1 + 70\,(1-p_1) \end{cases}$$

  24. Value Function considering u3 (figure: the regions where u1 or u2 is optimal and the unclear region, projected through the u3 state transition)

  25. Resulting value function for T=2 • The robot can execute any of the three actions u1, u2, u3. • For u3 we need to discount its cost of −1:
$$V_2(b) = \max \begin{cases} -100\, p_1 + 100\,(1-p_1) \\ 100\, p_1 - 50\,(1-p_1) \\ 59\, p_1 - 61\,(1-p_1) \\ 52\, p_1 + 42\,(1-p_1) \\ -21\, p_1 + 69\,(1-p_1) \end{cases} = \max \begin{cases} -100\, p_1 + 100\,(1-p_1) \\ 100\, p_1 - 50\,(1-p_1) \\ 52\, p_1 + 42\,(1-p_1) \end{cases}$$
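Putting the pieces together, a rough sketch (names and helper functions assumed) that builds V2(b) from the payoff lines of u1 and u2 plus the propagated, cost-discounted value of u3, and then prunes; the surviving pieces essentially match the pruned set above:

```python
# Sketch: assembling V2(b) for the two-state example.
# A linear piece (a, b) stands for a*p1 + b*(1 - p1).

V1_bar = [(-100.0, 100.0), (40.0, 55.0), (100.0, -50.0)]   # pruned E_z[V1(b|z)]

def through_u3(a, b):
    # Substitute p'1 = 0.8 - 0.6*p1 and re-express the result as a'*p1 + b'*(1-p1):
    # the new coefficients are the values of the old piece at p1 = 1 and p1 = 0.
    return 0.2 * a + 0.8 * b, 0.8 * a + 0.2 * b

# Action u3: propagate through the transition, then subtract its cost of 1.
u3_pieces = [(a - 1.0, b - 1.0) for a, b in (through_u3(a, b) for a, b in V1_bar)]

# Actions u1 and u2 lead to final states, so they contribute their payoff lines.
V2 = [(-100.0, 100.0), (100.0, -50.0)] + u3_pieces

def prune(pieces, samples=1001):
    kept = set()
    for i in range(samples):
        p1 = i / (samples - 1)
        values = [a * p1 + b * (1.0 - p1) for a, b in pieces]
        kept.add(max(range(len(pieces)), key=values.__getitem__))
    return [pieces[j] for j in sorted(kept)]

print(prune(V2))   # three pieces survive, as in the pruned V2(b) above
```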

  26. Graphical representation for V2(b) (figure: regions where u2 is optimal, where u1 is optimal, and an unclear middle region where the outcome of the measurement is important)

  27. Deep horizons and pruning • We have now completed a full backup in belief space. • This process can be applied recursively. • The value functions for T=10 and T=20 are shown in the figure.
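Repeating the backup for deeper horizons quickly multiplies the number of linear pieces, which is why pruning (and, more aggressively, the point-based value iteration mentioned in the outline) matters. The rough sketch below approximates deeper backups by evaluating the value function only at a grid of sampled beliefs; the structure (terminating u1/u2, transition-then-measurement for u3) follows the example, while all names are illustrative assumptions:

```python
# Rough point-based sketch of deeper backups for the two-state example.
# Values are stored only at sampled beliefs, so this approximates the exact
# piecewise-linear value functions of the slides.

R = {"u1": (-100.0, 100.0), "u2": (100.0, -50.0), "u3": (-1.0, -1.0)}
P_Z = {"z1": (0.7, 0.3), "z2": (0.3, 0.7)}   # (p(z | x1), p(z | x2))
P_X1_U3 = (0.2, 0.8)                         # (p(x1' | x1, u3), p(x1' | x2, u3))
GRID = [i / 100 for i in range(101)]         # sampled beliefs p1

def payoff(p1, u):
    r1, r2 = R[u]
    return p1 * r1 + (1.0 - p1) * r2

def lookup(values, p1):
    return values[round(p1 * 100)]           # nearest grid point (crude)

def backup(prev):
    new = []
    for p1 in GRID:
        best = max(payoff(p1, "u1"), payoff(p1, "u2"))   # u1, u2 terminate
        q1 = P_X1_U3[0] * p1 + P_X1_U3[1] * (1.0 - p1)   # belief after u3
        exp_v = 0.0
        for l1, l2 in P_Z.values():                      # expectation over z
            pz = l1 * q1 + l2 * (1.0 - q1)
            exp_v += pz * lookup(prev, l1 * q1 / pz)
        new.append(max(best, payoff(p1, "u3") + exp_v))
    return new

V = [max(payoff(p1, u) for u in R) for p1 in GRID]       # horizon T = 1
for _ in range(19):                                      # roughly T = 20
    V = backup(V)
print(V[50])   # approximate value of the uniform belief
```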

  28. Importance of pruning (figure: $V_1(b)$, $\bar V_1(b)$, and $V_2(b)$)
