

SLIDE 1

Statistical Filtering and Control for AI and Robotics

Alessandro Farinelli

Exploration and information gathering

SLIDE 2

Outline

  • POMDPs

– The POMDP model – Finite world POMDP algorithm – Point based value iteration

  • Exploration

– Information gain – Exploration in occupancy grid maps – Extension to MRS

  • Acknowledgment: material based on

– Thrun, Burgard, Fox; Probabilistic Robotics

SLIDE 3

POMDPs

  • In POMDPs we apply the same idea as in MDPs.
  • Since the state is not observable, the agent has to make its

decisions based on the belief state which is a posterior distribution over states.

  • Let b be the belief of the agent about the state under

consideration.

  • POMDPs compute a value function over belief space:

V_T(b) = γ max_u [ r(b, u) + ∫ V_{T−1}(b′) p(b′ | u, b) db′ ]

SLIDE 4

Problems

  • Each belief is a probability distribution, thus, each value in a

POMDP is a function of an entire probability distribution.

  • This is problematic, since probability distributions are

continuous.

  • Additionally, we have to deal with the huge complexity of

belief spaces.

  • For finite worlds with finite state, action, and measurement

spaces and finite horizons, however, we can effectively represent the value functions by piecewise linear functions.

– Possible because expectation is a linear operator

SLIDE 5

Example

[Figure: the two-state example]
States: x1, x2
Actions u1, u2 (terminal), payoffs: r(x1, u1) = −100, r(x2, u1) = +100, r(x1, u2) = +100, r(x2, u2) = −50
Action u3: flips the state with probability 0.8, keeps it with probability 0.2
Measurements: p(z1 | x1) = 0.7, p(z2 | x1) = 0.3, p(z1 | x2) = 0.3, p(z2 | x2) = 0.7

SLIDE 6

Discussion on the example

  • The two states have different optimal actions

– u2 in x1 and u1 in x2

  • Action u3 is non-deterministic: it flips the state and

acquires knowledge with a small cost

– z1 increases confidence of being in x1 – z2 increases confidence of being in x2 – cost is -1 (see later)

  • Two states: belief is p1 = p(x1)

– p(x2) = 1 − p1 – the policy maps the belief p1 ∈ [0, 1] to an action u

SLIDE 7

Payoff in POMDPs

  • In MDPs, the payoff (or reward) depends on the state of the system.
  • In POMDPs the true state is not exactly known.
  • Therefore, we compute the expected payoff by integrating over all states:

r(b, u) = E_x[ r(x, u) ] = ∫ r(x, u) p(x) dx = p1 r(x1, u) + p2 r(x2, u)

SLIDE 8

Payoffs in the example I

  • If we are in x1 and execute u1 we receive −100
  • If we are in x2 and execute u1 we receive +100
  • When we are not certain of the state we have a linear combination weighted with the probabilities:

r(b, u1) = −100 p1 + 100 (1 − p1)
r(b, u2) = 100 p1 − 50 (1 − p1)
r(b, u3) = −1
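A minimal sketch in Python of this linear combination, using the payoff values of the example (the dictionary layout and function name are illustrative):

```python
# Expected payoff r(b, u) for the two-state example.
# The belief is represented by p1 = p(x1); p(x2) = 1 - p1.
R = {  # r(state, action), values from the example
    ("x1", "u1"): -100, ("x2", "u1"): 100,
    ("x1", "u2"):  100, ("x2", "u2"): -50,
    ("x1", "u3"):   -1, ("x2", "u3"):  -1,
}

def expected_payoff(p1: float, u: str) -> float:
    """r(b, u) = p1 * r(x1, u) + (1 - p1) * r(x2, u)."""
    return p1 * R[("x1", u)] + (1.0 - p1) * R[("x2", u)]

print(expected_payoff(0.4, "u1"))  # -100*0.4 + 100*0.6 = 20
```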

SLIDE 9

Payoffs in the example II

SLIDE 10
The resulting policy for T=1

  • Finite POMDP with T=1: use V1(b) to determine the optimal policy

– Choose the best next action among u1, u2, u3

  • In our example, the optimal policy for T=1 is

π1(b) = u1  if p1 ≤ 3/7
        u2  if p1 > 3/7

  • This is the upper thick graph in the diagram.

SLIDE 11

Piecewise linearity and convexity

  • The resulting value function V1(b) is the maximum of

the three functions at each point

  • It is piecewise linear and convex.

V1(b) = max { −100 p1 + 100 (1 − p1),
              100 p1 − 50 (1 − p1),
              −1 }

SLIDE 12

Pruning

  • Only the first two components contribute.
  • The third component can be pruned away from V1(b).
  • Pruning is crucial to have an efficient solution approach

V1(b) = max { −100 p1 + 100 (1 − p1),
              100 p1 − 50 (1 − p1) }

SLIDE 13

Increasing the time horizon

  • Assume the robot can make an observation before acting
  • Sensing will provide a better belief; how much better?

[Figure: V1(b)]

SLIDE 14

Sensing

  • Suppose the robot perceives z1.
  • Recall:

– p(z1 | x1)=0.7 and p(z1| x2)=0.3.

  • Given the observation z1 we update the belief using Bayes rule.

p′1 = p(x1 | z1) = p(z1 | x1) p(x1) / p(z1) = 0.7 p1 / p(z1)
p′2 = p(x2 | z1) = p(z1 | x2) p(x2) / p(z1) = 0.3 (1 − p1) / p(z1)

with p(z1) = 0.7 p1 + 0.3 (1 − p1)
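A minimal sketch (Python) of this Bayes update, using the measurement model of the example; the table P_Z and the function name are illustrative:

```python
# Bayesian belief update after perceiving a measurement (two-state example).
P_Z = {("z1", "x1"): 0.7, ("z1", "x2"): 0.3,   # p(z | x)
       ("z2", "x1"): 0.3, ("z2", "x2"): 0.7}

def update_belief(p1: float, z: str) -> tuple[float, float]:
    """Return (p'1, p(z)): the posterior p(x1 | z) and the measurement likelihood."""
    pz = P_Z[(z, "x1")] * p1 + P_Z[(z, "x2")] * (1.0 - p1)
    return P_Z[(z, "x1")] * p1 / pz, pz

print(update_belief(0.5, "z1"))  # (0.7, 0.5)
```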

SLIDE 15

Value Function considering z1

[Figure: V1(b) projected onto the updated belief b′ = p(x1 | z1), giving V1(b | z1)]

SLIDE 16

Computing the new value function

  • Suppose the robot perceives z1.
  • We update the belief using Bayes rule
  • We can compute V1(b | z1) by replacing p1 with p’1:

V1(b | z1) = max { −100 (0.7 p1) / p(z1) + 100 (0.3 (1 − p1)) / p(z1),
                   100 (0.7 p1) / p(z1) − 50 (0.3 (1 − p1)) / p(z1) }
           = (1 / p(z1)) max { −70 p1 + 30 (1 − p1),
                               70 p1 − 15 (1 − p1) }

SLIDE 17

Expected value after measuring

  • We do not know in advance what the next measurement will be
  • Need to compute the expectation

V̄1(b) = E_z[ V1(b | z) ] = Σ_{i=1,2} p(z_i) V1(b | z_i)
      = p(z1) V1( p(z1 | x1) p1 / p(z1) ) + p(z2) V1( p(z2 | x1) p1 / p(z2) )

SLIDE 18

Expected value after measuring

  • We do not know in advance what the next measurement will be
  • Need to compute the expectation

V̄1(b) = Σ_{i=1,2} p(z_i) V1(b | z_i)
      = p(z1) (1 / p(z1)) max { −70 p1 + 30 (1 − p1), 70 p1 − 15 (1 − p1) }
      + p(z2) (1 / p(z2)) max { −30 p1 + 70 (1 − p1), 30 p1 − 35 (1 − p1) }
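Continuing the earlier sketch (and reusing the illustrative P_Z table defined there), the expectation over measurements can be written as:

```python
def V1(p1: float) -> float:
    """Pruned horizon-1 value function of the example (two linear pieces)."""
    return max(-100 * p1 + 100 * (1 - p1), 100 * p1 - 50 * (1 - p1))

def expected_value_after_sensing(p1: float) -> float:
    """V-bar_1(b) = sum_z p(z) V1(b | z) for the two-state example."""
    total = 0.0
    for z in ("z1", "z2"):
        pz = P_Z[(z, "x1")] * p1 + P_Z[(z, "x2")] * (1 - p1)  # measurement likelihood
        p1_post = P_Z[(z, "x1")] * p1 / pz                    # Bayes-updated belief
        total += pz * V1(p1_post)
    return total

print(expected_value_after_sensing(0.5))  # 47.5
```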

SLIDE 19

Resulting value function

  • Need to consider the four possible combinations and find the max
  • As before we can perform pruning

V̄1(b) = max { (−70 p1 + 30 (1 − p1)) + (−30 p1 + 70 (1 − p1)),
              (−70 p1 + 30 (1 − p1)) + (30 p1 − 35 (1 − p1)),
              (70 p1 − 15 (1 − p1)) + (−30 p1 + 70 (1 − p1)),
              (70 p1 − 15 (1 − p1)) + (30 p1 − 35 (1 − p1)) }
      = max { −100 p1 + 100 (1 − p1),
              40 p1 + 55 (1 − p1),
              100 p1 − 50 (1 − p1) }

SLIDE 20

Value Function considering sensing

[Figure: p(z1) V1(b | z1) and p(z2) V1(b | z2); regions where u1 or u2 is optimal, with an unclear region in between]

SLIDE 21

State transition

  • Need to consider how actions affect the state
  • In our case u1 and u2 lead to final states and are deterministic
  • u3 has a non-deterministic effect on the state

p′1 = E_x[ p(x′1 | x, u3) ]
    = p(x′1 | x1, u3) p1 + p(x′1 | x2, u3) (1 − p1)
    = 0.2 p1 + 0.8 (1 − p1) = 0.8 − 0.6 p1
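A minimal sketch (Python) of pushing the belief through the non-deterministic action u3 (transition values from the example; the table name is illustrative):

```python
# p(x' | x, u3) for the two-state example, keyed by (x, x').
P_TRANS_U3 = {("x1", "x1"): 0.2, ("x1", "x2"): 0.8,
              ("x2", "x1"): 0.8, ("x2", "x2"): 0.2}

def propagate_u3(p1: float) -> float:
    """p'1 = p(x'1 | x1, u3) p1 + p(x'1 | x2, u3) (1 - p1) = 0.8 - 0.6 p1."""
    return P_TRANS_U3[("x1", "x1")] * p1 + P_TRANS_U3[("x2", "x1")] * (1.0 - p1)

print(propagate_u3(0.5))  # 0.5: the uniform belief is a fixed point of the flip
```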

SLIDE 22

State transition

p′1 = E_x[ p(x′1 | x, u3) ] = 0.8 − 0.6 p1

[Figure: mapping from the belief p1 to the propagated belief p′1]

SLIDE 23

Resulting value function after u3

  • Considering the state transition we can compute V̄1(b | u3)
  • Substitute p′1 = 0.8 − 0.6 p1 (hence 1 − p′1 = 0.2 + 0.6 p1) into V̄1:

V̄1(b | u3) = max { −100 p′1 + 100 (1 − p′1),
                   40 p′1 + 55 (1 − p′1),
                   100 p′1 − 50 (1 − p′1) }
           = max { 60 p1 − 60 (1 − p1),
                   52 p1 + 43 (1 − p1),
                   −20 p1 + 70 (1 − p1) }

SLIDE 24

Value Function considering u3

[Figure: V̄1(b | u3), obtained by projecting V̄1(b) through the u3 state transition; the u1 / unclear / u2 regions appear flipped after the projection]

SLIDE 25

Resulting value function for T=2

  • The robot can execute any of the three actions u1, u2, u3
  • Need to account for the −1 cost when u3 is executed

V2(b) = max { −100 p1 + 100 (1 − p1),
              100 p1 − 50 (1 − p1),
              59 p1 − 61 (1 − p1),
              51 p1 + 42 (1 − p1),
              −21 p1 + 69 (1 − p1) }
      = max { −100 p1 + 100 (1 − p1),
              100 p1 − 50 (1 − p1),
              51 p1 + 42 (1 − p1) }

SLIDE 26

Graphical representation for V2(b)

  • Outcome of the measurement is important here

[Figure: V2(b), with regions where u1 is optimal, u2 is optimal, and an unclear region]

SLIDE 27

Deep horizons and pruning

  • We have now completed a full backup in belief space.
  • This process can be applied recursively.
  • The value functions for T=10 and T=20 are shown in the following figures.
SLIDE 28

Importance of pruning

[Figure: V1(b), V̄1(b) and V2(b)]

SLIDE 29
SLIDE 30

Why pruning is essential

  • Each update introduces additional linear components to V.
  • Each measurement squares the number of linear components.
  • Thus, an un-pruned value function for T=20 includes more than 10^547,864 linear functions.
  • At T=30 we have 10^561,012,337 linear functions.
  • The pruned value function at T=20, in comparison, contains only 13 linear components.
  • The combinatorial explosion of linear components in the value function is the major reason why POMDPs are impractical for most applications.

  • Can use approximations

– Exploiting the structure of the domain
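A minimal sketch (Python) of approximate pruning for a one-dimensional belief such as this example: a linear component (a1, a0), meaning a1·p1 + a0·(1 − p1), is kept only if it attains the maximum at some sampled belief point. This grid-based test is an illustrative simplification, not the exact LP-based pruning.

```python
# Approximate pruning of a piecewise-linear value function over a 1-D belief.
def prune(components, n_samples=1001):
    """components: list of (a1, a0) pairs representing a1*p1 + a0*(1 - p1)."""
    kept = set()
    for k in range(n_samples):
        p1 = k / (n_samples - 1)
        values = [a1 * p1 + a0 * (1 - p1) for (a1, a0) in components]
        kept.add(max(range(len(components)), key=lambda i: values[i]))
    return [components[i] for i in sorted(kept)]

# V1 of the example: the constant -1 component is dominated and gets pruned.
print(prune([(-100, 100), (100, -50), (-1, -1)]))  # [(-100, 100), (100, -50)]
```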

SLIDE 31

Point based value iteration

  • One of many approaches to approximate POMDPs
  • PBVI: maintains a set of example beliefs

– Belief points

  • Only considers constraints that maximize value

function for at least one of the examples

– V contains only constraints that are supported by belief points
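A compact sketch (Python/NumPy, with illustrative data structures) of the point-based backup PBVI performs: for every belief point only the best α-vector is kept, so the value function never grows beyond one linear component per belief point and backup.

```python
import numpy as np

def pbvi_backup(V, B, T, Z, R, gamma):
    """One point-based backup.
    V: list of alpha-vectors (arrays over states) from the previous horizon
    B: list of belief points (arrays over states)
    T[a][s, s']: transition model, Z[a][s', o]: observation model
    R[a][s]: immediate reward, gamma: discount factor."""
    n_actions, n_obs = len(T), Z[0].shape[1]
    new_V = []
    for b in B:
        best_alpha, best_val = None, -np.inf
        for a in range(n_actions):
            alpha_ab = R[a].astype(float)
            for o in range(n_obs):
                # alpha^{a,o}(s) = sum_{s'} alpha(s') Z[a][s', o] T[a][s, s']
                candidates = [T[a] @ (alpha * Z[a][:, o]) for alpha in V]
                alpha_ab = alpha_ab + gamma * max(candidates, key=lambda g: float(b @ g))
            val = float(b @ alpha_ab)
            if val > best_val:
                best_alpha, best_val = alpha_ab, val
        new_V.append(best_alpha)   # keep only the maximizing vector for this belief point
    return new_V
```

Starting from V = [np.zeros(n_states)] and repeating the backup T times gives the point-based approximation of the horizon-T value function.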

SLIDE 32

Point based value iteration: Example

  • Same domain with two states but deterministic transition probabilities
  • PBVI: simple point set B = {p1 = 0.0, p1 = 0.1, …, p1 = 1.0}

[Figures: value iteration with pruning, T=30 vs. PBVI, T=30]

SLIDE 33

Example Application for PBVI

  • Intrusion detection: robot is well localized, intruder

position uncertain (particle filter)

  • Fairly easy to define a reasonable set of belief points
SLIDE 34

PBVI: policy I

SLIDE 35

PBVI: policy II

SLIDE 36

PBVI: policy III

Time to clear the room is (with high likelihood) not sufficient for the intruder to pass through the corridor. The POMDP solution finds the best policy to detect the intruder, considering the uncertainty over the state space.

SLIDE 37

Exploration

  • Exploration is a crucial task for robotics
  • Exploration: information gathering

– Find an intruder – Active localization – Acquire a map of a static environment

  • POMDPs naturally consider information gathering

– Just need to build an appropriate reward function (e.g. reduction in entropy) – Not practical for most realistic applications

  • We will consider practical algorithms for exploration

– Most of them are greedy

SLIDE 38

Information gain

  • Entropy: expected information of a probability

distribution

  • Maximum for uniform distributions
  • Minimum for point-mass

H_p(x) = E_p[ −log p(x) ]
       = −∫ p(x) log p(x) dx        (continuous case)
       = −Σ_x p(x) log p(x)         (discrete case)
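A minimal sketch (Python) of the discrete entropy, illustrating the two bullet points above:

```python
import math

def entropy(p):
    """H(p) = -sum_x p(x) log p(x), with 0 log 0 := 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

print(entropy([0.5, 0.5]))  # maximal for a uniform distribution (log 2)
print(entropy([1.0, 0.0]))  # minimal (0) for a point-mass distribution
```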

SLIDE 39

Conditional Entropy

  • Need to consider information after executing actions

and acquiring measurements

  • Denote the belief resulting from executing u and

acquiring measurement z under the belief b

  • Conditional entropy

B(b, z, u)(x′) = p(x′ | z, u, b)

H_b(x′ | z, u) = −∫ B(b, z, u)(x′) log B(b, z, u)(x′) dx′

SLIDE 40

Conditional Entropy over the control

  • We can not choose the measurement, only the

control action: need to integrate z out to obtain

  • This is done by exploiting the structure of the

application domain

  • Information gain: reduction in entropy

H_b(x′ | u) = ∫ H_b(x′ | z, u) p(z | u, b) dz

I_b(u) = H_b(x) − H_b(x′ | u)

SLIDE 41

Greedy Techniques

  • Exploration as a decision-theoretic problem:

– Choose action that maximizes the expected utility

  • Expected utility for action u:

– information gain minus cost – must find a tradeoff between cost and gain

π(b) = argmax_u [ α ( H_b(x) − H_b(x′ | u) ) − ∫ r(x, u) b(x) dx ]

(first term: expected information gain, weighted by α; second term: expected cost, with r(x, u) the cost of executing u in x)
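A minimal sketch (Python) of this greedy action selection; the gain and cost models are assumed to be supplied as callbacks (illustrative names):

```python
def greedy_exploration_action(belief, actions, info_gain, expected_cost, alpha=1.0):
    """Return the action maximizing alpha * I_b(u) - E_b[cost(x, u)].
    info_gain(belief, u) and expected_cost(belief, u) are user-supplied models."""
    return max(actions, key=lambda u: alpha * info_gain(belief, u) - expected_cost(belief, u))
```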

SLIDE 42

Why Greedy ?

Long action sequences might not be executable

SLIDE 43

Monte Carlo Exploration

SLIDE 44

Issues with Monte Carlo Exploration

  • Sampling the measurement z is not practical
  • Most domains exhibit a huge number of possible observations
  • Need to exploit domain structure to overcome this
SLIDE 45

Exploration for learning occupancy grid maps

  • Exploration applied to mapping
  • Considering occupancy grid maps
SLIDE 46

Occupancy grid maps

  • Introduced by Moravec and Elfes in 1985
  • Represent environment by a grid

  • Estimate the probability that a location is occupied by

an obstacle .

  • Key assumptions

– Occupancy values of individual cells are independent – Positions are known, map is static

m = { m_i } ,   p(m_i) ∈ [0, 1]

p(m | z_1:t, x_1:t) = Π_i p(m_i | z_1:t, x_1:t)

SLIDE 47

Updating occupancy grid maps: example

SLIDE 48

Occupancy grid maps: example

  • CAD map
  • Occupancy grid map
SLIDE 49

Exploring occupancy grid maps

  • Grey areas are not explored
  • Greedy technique: go to the closest unexplored location, where the information gain is maximal
  • Compute the gain per grid cell (not per robot action!) using:

– Entropy – Expected information gain – Binary gain
SLIDE 50

Entropy to compute gain

[Figures: occupancy map and corresponding entropy map; the brighter a location, the higher the entropy]

H_p(m_i) = −p_i log p_i − (1 − p_i) log(1 − p_i)

SLIDE 51

Information gain

  • Entropy does not consider the information a robot

would acquire when close to a cell

  • Recall that the information gain is:
  • In our case this reduces to the difference between the entropy before measuring and the expected entropy after acquiring the possible measurement

I_b(u) = H_b(x) − H_b(x′ | u)

I(m_i) = H_p(m_i) − E_z[ H_p′(m_i) ]

SLIDE 52

Computing the entropy of the posterior

  • Probability of correct sensing: p_t
  • Probability of measuring occupied: p(z = occ) = p_t p_i + (1 − p_t)(1 − p_i)
  • Posterior for the occupancy update:

p′_i = p_t p_i / ( p_t p_i + (1 − p_t)(1 − p_i) )

H_p′(m_i) = −p′_i log p′_i − (1 − p′_i) log(1 − p′_i)

SLIDE 53
Computing the information gain

  • We can compute the entropy of the posterior for measuring free: H_p′(m_i)
  • The expected entropy is then

E_z[ H_p′(m_i) ] = p(z = occ) H_p′|occ(m_i) + p(z = free) H_p′|free(m_i)

  • We can then compute the gain

I(m_i) = H_p(m_i) − E_z[ H_p′(m_i) ]
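A minimal sketch (Python) of the per-cell expected information gain under this simple sensor model; the correct-sensing probability p_t = 0.9 is an illustrative assumption:

```python
import math

def cell_entropy(p: float) -> float:
    """H(p) = -p log p - (1 - p) log(1 - p), with 0 log 0 := 0."""
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)

def expected_info_gain(p_i: float, p_t: float = 0.9) -> float:
    """Expected entropy reduction of one cell after a single measurement."""
    p_occ = p_t * p_i + (1 - p_t) * (1 - p_i)        # p(z = occupied)
    post_occ = p_t * p_i / p_occ                      # posterior if z = occupied
    post_free = (1 - p_t) * p_i / (1 - p_occ)         # posterior if z = free
    expected_H = p_occ * cell_entropy(post_occ) + (1 - p_occ) * cell_entropy(post_free)
    return cell_entropy(p_i) - expected_H

print(expected_info_gain(0.5))  # largest for a completely unknown cell
```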

SLIDE 54

Difference between gain and entropy

  • Entropy is very similar to information gain
  • Usually entropy is good enough for exploration
SLIDE 55

Binary gain

  • Extremely simple, extremely popular
  • Divide cells in two classes:

– Explored: updated at least once – Unexplored: never updated

  • Frontier based exploration
SLIDE 56

Using the information maps

  • Need to build a navigation function to drive

the robot based on information maps

  • Exploration action:

– Move to loc. (x,y) – Acquire info in a small radius

  • Binary gain and value iteration
  • r encodes the cost

V_T(m_i) = { I(m_i)                                            if I(m_i) > 0
           { max_{j ∈ adj(i)} [ r(m_i, m_j) + V_{T−1}(m_j) ]   otherwise
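A minimal sketch (Python) of this value-iteration navigation function on an information map; the 4-connected grid, the unit step cost and the iteration count are illustrative assumptions:

```python
STEP_COST = -1.0  # r(m_i, m_j): cost of moving to an adjacent cell

def build_value_map(I, n_iters=200):
    """I: dict {(x, y): gain}. Returns a value map dict {(x, y): value}."""
    V = {c: (g if g > 0 else float("-inf")) for c, g in I.items()}
    for _ in range(n_iters):
        new_V = {}
        for (x, y), g in I.items():
            if g > 0:                     # unexplored cell: its value is the gain itself
                new_V[(x, y)] = g
                continue
            neighbours = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
            vals = [STEP_COST + V[n] for n in neighbours if n in V]
            new_V[(x, y)] = max(vals) if vals else float("-inf")
        V = new_V
    return V
```

The robot then simply moves to the neighbouring cell with the highest value, which drives it towards the closest high-gain region.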

SLIDE 57

Value function: example

  • Value function at convergence for binary gain
  • Very crude approximation but works well in practice
SLIDE 58

Exploration path: example

SLIDE 59

Extension to MRS

  • K robots can explore more than K times faster
  • Need to coordinate: avoid conflicts and maximize gain
  • Simple approach: greedy task allocation

– assign frontiers to different robots – greedily maximize exploration effect
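A minimal sketch (Python) of the greedy task allocation described above; the gain and cost models are assumed callbacks, and removing an assigned frontier is the simplest way of ruling it out for the other robots:

```python
def greedy_allocate(robots, frontiers, gain, cost):
    """robots: list of robot ids, frontiers: list of frontier cells.
    gain(f): expected information gain of frontier f,
    cost(r, f): travel cost from robot r to frontier f."""
    remaining = list(frontiers)
    assignment = {}
    for r in robots:
        if not remaining:
            break
        best = max(remaining, key=lambda f: gain(f) - cost(r, f))
        assignment[r] = best
        remaining.remove(best)   # rule this target out for the other robots
    return assignment
```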

SLIDE 60

Greedy coordinated exploration

SLIDE 61

MRS exploration without coordination

  • Both robots choose the same target location

– They are at the same distance and do not coordinate

SLIDE 62

MRS coordinated exploration

  • The first robot chooses its goal and rules out that location for the second robot

– Joint exploration will be much more effective

SLIDE 63

Extension for MRS coordination

  • Greedy task allocation can easily fail

– Consider swapping the order of execution for robots

  • Very restrictive assumptions

– i.e., robots share the same map

  • Several extensions:

– Use optimal task assignment (e.g., Hungarian method) – Negotiation over tasks during execution (e.g., auctions) – Do not share maps continuously (e.g., plan for meetings) – …

SLIDE 64

Summary

  • POMDPs

– provide optimal policy considering belief states – are extremely hard to solve – effective for finite worlds and low dimensions

  • Exploration

– POMDPs can represent the exploration problem – In most practical applications we need to exploit domain knowledge to obtain tractable algorithms – Entropy to guide the search – Very often simple approaches (e.g., binary gain) are very effective and extremely efficient – Interesting extensions for MRS

SLIDE 65

References and Further Readings

Material for the slides

  • Thrun, Burgard, Fox; Probabilistic Robotics (Chapters 15.1–15.3, 15.5, 17.1, 17.2, 17.4)

Further readings

  • Chapter 16, Approximate POMDP techniques
  • Fielded POMDPs (Pineau et al 2003; Roy et al. 2000)
  • Policy search (Ng et al. 2003)
  • Learning for POMDPs (Littman et al. 2001)
  • Frontier based exploration (Yamauchi et al. 1999)
  • Next best view point (Whaite and Ferrie 1997)
  • Cooperative exploration (Burgard et al. 2004)