 
              Optimal Control and Dynamic Programming 4SC000 Q2 2017-2018 Duarte Antunes
Introduction
• In several control problems only an output (a subset of the states) is available for feedback.
• For example, when controlling the position of a mass on a surface using only position sensors, the state (which also includes the velocity) is not fully known.
• More generally, the output might provide only partial information about the state.
• In this lecture we discuss Partially Observable Markov Decision Problems (POMDPs).
Outline • Formulation of POMDP • Bayes filter • Solving POMDP
Problem formulation
Dynamic model: $x_{k+1} = f_k(x_k, u_k, w_k)$, with stochastic disturbances $w_k$ and a stochastic initial state with known distribution $\mathrm{Prob}[x_0 = i] = p_{0,i}$.
Output: $y_k = h_k(x_k, n_k)$, $y_k \in Y_k$, $n_k \in N_k$, where $n_k$ is the measurement noise.
Cost: $\sum_{k=0}^{h-1} g_k(x_k, u_k, w_k) + g_h(x_h)$.
Information set: $I_0 = (y_0)$, $I_k = (y_0, y_1, \dots, y_k, u_0, u_1, \dots, u_{k-1})$, $k \geq 1$.
The state, input and disturbances live in the sets defined in Lec 2 (slides 3, 16); the output and noise live in the finite spaces $Y_k := \{1, \dots, q_{k,i}\}$ and $N_k := \{1, \dots, \nu_{k,i}\}$, which in general depend on the state $x_k = i$.
Problem: find a policy $\pi = \{\mu_0, \dots, \mu_{h-1}\}$, $u_k = \mu_k(I_k)$, that minimizes
$$J_\pi = E\Big[\sum_{k=0}^{h-1} g_k(x_k, \mu_k(I_k), w_k) + g_h(x_h)\Big].$$
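To make the formulation concrete, the sketch below estimates $J_\pi$ by Monte Carlo simulation: at each stage the information set $I_k$ is extended with $y_k$, the policy is evaluated on it, and then $u_k$ is appended. It is only an illustration under stated assumptions: the function name `estimate_cost` and its arguments are hypothetical, `out` plays the role of $h_k$ and `horizon` the role of $h$, and, as a simplification, $w_k$ and $n_k$ are drawn i.i.d. from fixed distributions (the general model lets them depend on the state and input).

```python
import random


def estimate_cost(f, out, g, g_h, policy, p0, w_probs, n_probs, horizon,
                  runs=10_000, seed=0):
    """Monte Carlo estimate of J_pi for a policy u_k = policy(k, ys, us).

    Labels start at 1; p0[i-1] = Prob[x_0 = i].
    Simplifying assumption: w_k and n_k are i.i.d. with the fixed
    distributions w_probs and n_probs.
    """
    rng = random.Random(seed)

    def draw(probs):
        # Sample a label in {1, ..., len(probs)} with the given probabilities.
        return rng.choices(range(1, len(probs) + 1), weights=probs)[0]

    total = 0.0
    for _ in range(runs):
        x = draw(p0)                              # x_0 ~ p0
        ys, us, cost = [], [], 0.0                # I_k = (y_0..y_k, u_0..u_{k-1})
        for k in range(horizon):
            ys.append(out(k, x, draw(n_probs)))   # y_k = h_k(x_k, n_k)
            u = policy(k, tuple(ys), tuple(us))   # u_k = mu_k(I_k)
            w = draw(w_probs)
            cost += g(k, x, u, w)                 # stage cost g_k(x_k, u_k, w_k)
            x = f(k, x, u, w)                     # x_{k+1} = f_k(x_k, u_k, w_k)
            us.append(u)
        total += cost + g_h(x)                    # terminal cost g_h(x_h)
    return total / runs
```

Passing the information set as two tuples keeps the policy signature close to $u_k = \mu_k(I_k)$; any policy that ignores part of $I_k$ is still admissible.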
First approach
We can reformulate a partial-information optimal control problem as a standard full-information optimal control problem by taking the state to be the information set $I_k$. The problem is that the size of this state space grows exponentially with the stage $k$: every new stage multiplies the number of possible information sets by the number of possible (input, output) pairs.
[Figure: tree of information sets over stages 0, 1, 2, branching on the measurements $y_0, y_1, y_2$: $I_0 = (y_0)$, $I_1 = (y_0, y_1, u_0)$, $I_2 = (y_0, y_1, y_2, u_0, u_1)$.]
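The growth can be quantified directly. Assuming, for illustration, that there are $q$ possible outputs and $m$ possible inputs at every stage, the number of distinct information sets $I_k = (y_0, \dots, y_k, u_0, \dots, u_{k-1})$ is $q^{k+1} m^k$. The hypothetical helper below evaluates this count.

```python
def num_information_sets(q, m, k):
    """Number of distinct I_k = (y_0, ..., y_k, u_0, ..., u_{k-1}).

    Assumes q possible outputs and m possible inputs at every stage.
    """
    return q ** (k + 1) * m ** k


for k in (0, 1, 5, 10, 20):
    print(k, num_information_sets(q=2, m=2, k=k))
```

Even with binary outputs and inputs, stage 20 already has $2^{41}$ possible information sets.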
Second approach
It is possible to show that the probability distribution of the state given the information obtained so far, denoted by $P_{x_k \mid I_k}$, is sufficient to determine optimal decisions (Bertsekas' book, Section 5.4):
$$u_k = \mu_k(P_{x_k \mid I_k}).$$
[Figure: evolution in the probability space over stages 0, 1, 2: starting from the initial state distribution, each decision and observation moves the distribution from $P_{x_0 \mid I_0}$ to $P_{x_1 \mid I_1}$ and then to $P_{x_2 \mid I_2}$.]
Decomposition of the optimal policy
Then the structure of the optimal decision-maker is (see Bertsekas' book, Section 5.4):
[Figure: block diagram. The plant, with dynamics $x_{k+1} = f_k(x_k, u_k, w_k)$ and output $y_k = h_k(x_k, n_k)$, feeds $y_k$ to a state estimator, which also receives $u_{k-1}$ through a delay and outputs $P_{x_k \mid I_k}$; the optimal policy $\mu_k$ maps $P_{x_k \mid I_k}$ to $u_k$, which is applied through the actuator.]
That is, the optimal policy can be decomposed into an estimator of $P_{x_k \mid I_k}$ and a map from $P_{x_k \mid I_k}$ to actions.
Discussion
• The first approach is typically impractical for applications.
• The second approach is often used in robotics.
• To compute $P_{x_k \mid I_k}$, the crucial step is Bayes' rule, and the resulting state estimator is the Bayes filter.
• The Bayes filter is important in its own right, and we will start by studying it.
Outline • Formulation of POMDP • Bayes filter • Solving POMDP
Bayes' rule
The Bayes filter relies on the basic Bayes' rule. Suppose $x \in \{1, \dots, n\}$ and $y \in \{1, \dots, m\}$; then
$$\mathrm{Prob}[x = i \mid y = j] = \frac{\mathrm{Prob}[y = j \mid x = i]\,\mathrm{Prob}[x = i]}{\mathrm{Prob}[y = j]} = \frac{\mathrm{Prob}[y = j \mid x = i]\,\mathrm{Prob}[x = i]}{\sum_{\ell=1}^{n} \mathrm{Prob}[y = j \mid x = \ell]\,\mathrm{Prob}[x = \ell]}.$$
Example:
• Think of $y$ as a sensor measurement and of $x$ as the state (see figure). If $y = 1$, what can you tell about the state?
[Figure: probability space partitioned into the events $x = 1, 2, 3$ and $y = 1, 2$.]
• Bayes' rule allows us to infer the a posteriori probability $\mathrm{Prob}[x = i \mid y = j]$, after the sensor measurement, from the a priori information $\mathrm{Prob}[x = i]$ and $\mathrm{Prob}[y = j \mid x = i]$.
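The second form of Bayes' rule translates directly into code: multiply prior and likelihood, then normalize by their sum, which equals $\mathrm{Prob}[y = j]$ by the law of total probability. The function name `posterior` and the numbers in the usage example are hypothetical.

```python
def posterior(prior, likelihood, j):
    """Return [Prob[x = i | y = j] for i = 1..n].

    prior[i-1]           = Prob[x = i]
    likelihood[j-1][i-1] = Prob[y = j | x = i]
    """
    joint = [likelihood[j - 1][i] * prior[i] for i in range(len(prior))]
    evidence = sum(joint)  # Prob[y = j] = sum_l Prob[y = j | x = l] Prob[x = l]
    if evidence == 0:
        raise ValueError("Prob[y = j] = 0: measurement impossible under the prior")
    return [p / evidence for p in joint]


prior = [1 / 3, 1 / 3, 1 / 3]        # hypothetical uniform prior on x in {1, 2, 3}
likelihood = [[0.9, 0.5, 0.1],       # Prob[y = 1 | x = i]
              [0.1, 0.5, 0.9]]       # Prob[y = 2 | x = i]
print(posterior(prior, likelihood, j=1))  # approximately [0.6, 0.333, 0.067]
```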
Example
An international student is deciding whether it is worthwhile to design a hobby alarm (using an ultrasound sensor close to the door), like her dorm mates, to detect burglars while she is in her home country for Christmas. The alarm is faulty and is characterised by the following conditional probabilities, where $B$ denotes "a burglar breaks in" and $A$ denotes "the alarm goes off":
$$\begin{array}{l|cc} & B & \neg B \\ \hline A & \mathrm{Prob}[A \mid B] = 0.99 & \mathrm{Prob}[A \mid \neg B] = 0.1 \\ \neg A & \mathrm{Prob}[\neg A \mid B] = 0.01 & \mathrm{Prob}[\neg A \mid \neg B] = 0.9 \end{array}$$
Historical data reveal that 2 out of 100 rooms are robbed each Christmas, i.e., $\mathrm{Prob}[B] = 0.02$, $\mathrm{Prob}[\neg B] = 0.98$.
Her total belongings in the room amount to 2000 euros, and asking a security agency to check her room whenever she calls (after the alarm indicates a break-in) costs 300 euros. Therefore she will only call if $\mathrm{Prob}[B \mid A] \cdot 2000 > 300$, or equivalently $\mathrm{Prob}[B \mid A] > 0.15$. Shall she buy the alarm?
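Computing $\mathrm{Prob}[B \mid A]$
We can simply use the a priori probabilities:
$$\mathrm{Prob}[B \mid A] = \frac{\mathrm{Prob}[A \mid B]\,\mathrm{Prob}[B]}{\mathrm{Prob}[A]} = \frac{0.99 \times 0.02}{\mathrm{Prob}[A]},$$
and compute $\mathrm{Prob}[A]$ from the fact that $\mathrm{Prob}[B \mid A] + \mathrm{Prob}[\neg B \mid A] = 1$, where
$$\mathrm{Prob}[\neg B \mid A] = \frac{\mathrm{Prob}[A \mid \neg B]\,\mathrm{Prob}[\neg B]}{\mathrm{Prob}[A]} = \frac{0.1 \times 0.98}{\mathrm{Prob}[A]}.$$
Thus
$$\mathrm{Prob}[A] = 0.1 \times 0.98 + 0.99 \times 0.02 = 0.1178, \qquad \mathrm{Prob}[B \mid A] = \frac{0.0198}{0.1178} \approx 0.1681 > 0.15 \quad \text{(yes, buy!)}$$
Equivalently, we can directly apply the formula
$$\mathrm{Prob}[B \mid A] = \frac{\mathrm{Prob}[A \mid B]\,\mathrm{Prob}[B]}{\mathrm{Prob}[A \mid B]\,\mathrm{Prob}[B] + \mathrm{Prob}[A \mid \neg B]\,\mathrm{Prob}[\neg B]}.$$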
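The direct formula and the decision rule can be checked numerically. The sketch below uses the numbers of the example; the function name `prob_burglar_given_alarm` is hypothetical.

```python
def prob_burglar_given_alarm(p_a_given_b, p_a_given_not_b, p_b):
    """Prob[B | A] via Bayes' rule with the two-event total probability Prob[A]."""
    num = p_a_given_b * p_b
    return num / (num + p_a_given_not_b * (1 - p_b))


p = prob_burglar_given_alarm(0.99, 0.1, 0.02)
print(round(p, 4))    # 0.1681
print(p * 2000 > 300)  # True: the expected loss exceeds the 300-euro call cost
```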
Historical note
[Figure: Thomas Bayes]
Thomas Bayes (1701–1761) was an English statistician, philosopher and Presbyterian minister who is known for having formulated a specific case of the theorem that bears his name: Bayes' theorem. Bayes never published what would eventually become his most famous accomplishment; his notes were edited and published after his death (source: Wikipedia).
Problem formulation
How do we find a state estimator that computes $P_{x_k \mid I_k}$?
[Figure: the state estimator takes the output $y_k$ of the system $x_{k+1} = f_k(x_k, u_k, w_k)$, $y_k = h_k(x_k, n_k)$, together with the previous input $u_{k-1}$, and returns $P_{x_k \mid I_k}$.]
Note that $P_{x_k \mid I_k}$ can be used to compute any quantity of interest about the state (e.g. mean, median, variance). In particular, a typical state estimate is the mean, obtained by taking $a$ equal to the identity in
$$E[a(x_k) \mid I_k] = \sum_{i=1}^{n} a(i)\,\mathrm{Prob}[x_k = i \mid I_k].$$
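A minimal sketch of how such quantities follow from the vector of probabilities $\mathrm{Prob}[x_k = i \mid I_k]$: the conditional mean ($a$ = identity), the conditional variance ($a(i) = (i - \text{mean})^2$) and a median (the smallest $i$ whose cumulative probability reaches 0.5). The belief `p` and the helper names are hypothetical.

```python
def conditional_expectation(p, a):
    """E[a(x_k) | I_k] = sum_i a(i) p[i-1], with p[i-1] = Prob[x_k = i | I_k]."""
    return sum(a(i) * p_i for i, p_i in enumerate(p, start=1))


def conditional_median(p):
    """Smallest state i with Prob[x_k <= i | I_k] >= 0.5."""
    cumulative = 0.0
    for i, p_i in enumerate(p, start=1):
        cumulative += p_i
        if cumulative >= 0.5:
            return i
    return len(p)  # guard against round-off when sum(p) is slightly below 1


p = [0.2, 0.5, 0.3]  # hypothetical belief over states {1, 2, 3}
mean = conditional_expectation(p, lambda i: i)                # a = identity
var = conditional_expectation(p, lambda i: (i - mean) ** 2)   # a(i) = (i - mean)^2
print(mean, var, conditional_median(p))
```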
Preliminaries
For simplicity we will assume that the input, disturbance, output and noise spaces do not depend on the state (but can still depend on time):
$$x_k \in \{1, \dots, \bar n_k\}, \quad u_k \in \{1, \dots, m_k\}, \quad w_k \in \{1, \dots, \omega_k\}, \quad y_k \in \{1, \dots, q_k\}, \quad n_k \in \{1, \dots, \nu_k\}.$$
Let us start by defining an alternative representation of the dynamic and output maps in terms of the following matrices, for each $j \in \{1, \dots, m_k\}$ and $k \in \{0, \dots, h-1\}$:
• State transition matrices $P_k(j)$, of dimensions $\bar n_{k+1} \times \bar n_k$, with components $[P_k(j)]_{ri} = \mathrm{Prob}[x_{k+1} = r \mid x_k = i, u_k = j]$.
• Output matrices $R_k$, of dimensions $q_k \times \bar n_k$, with components $[R_k]_{\ell i} = \mathrm{Prob}[y_k = \ell \mid x_k = i]$.
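Column $i$ of $P_k(j)$ is the distribution of $x_{k+1}$ given $x_k = i$, $u_k = j$, and column $i$ of $R_k$ is the distribution of $y_k$ given $x_k = i$, so both are column-stochastic: nonnegative entries and columns summing to one. A hypothetical sanity check, storing matrices as lists of rows, applied here to the matrices $P_k(1)$, $P_k(2)$ and $R_k$ obtained in the examples that follow:

```python
def is_column_stochastic(M, tol=1e-9):
    """True if M (a list of rows) has nonnegative entries and every column sums to 1."""
    if any(v < 0 for row in M for v in row):
        return False
    n_cols = len(M[0])
    return all(abs(sum(row[i] for row in M) - 1.0) <= tol for i in range(n_cols))


P1 = [[0.8, 0.2], [0.2, 0.8]]   # P_k(1)
P2 = [[1.0, 1.0], [0.0, 0.0]]   # P_k(2)
R = [[0.6, 1.0], [0.4, 0.0]]    # R_k
print(all(is_column_stochastic(M) for M in (P1, P2, R)))
```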
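Finding state transition matrices
Use:
$$[P_k(j)]_{ri} = \mathrm{Prob}[x_{k+1} = r \mid x_k = i \wedge u_k = j] = \sum_{\{\iota \,\mid\, f_k(i, j, \iota) = r\}} \mathrm{Prob}[w_k = \iota \mid x_k = i \wedge u_k = j].$$
Example ($\bar n_k = m_k = \omega_k = 2$, fixed $k$), with $\mathrm{Prob}[w_k = 1 \mid x_k = i \wedge u_k = j] = 0.8$ and $\mathrm{Prob}[w_k = 2 \mid x_k = i \wedge u_k = j] = 0.2$ for all $i, j$, and $f_k(x_k, u_k, w_k)$ given by
$$\begin{array}{cc|cc} x_k & u_k & w_k = 1 & w_k = 2 \\ \hline 1 & 1 & 1 & 2 \\ 1 & 2 & 1 & 1 \\ 2 & 1 & 2 & 1 \\ 2 & 2 & 1 & 1 \end{array}$$
Then
$$[P_k(1)]_{11} = \mathrm{Prob}[w_k = 1 \mid u_k = 1 \wedge x_k = 1] = 0.8, \;\dots$$
$$[P_k(2)]_{11} = \mathrm{Prob}[w_k = 1 \mid x_k = 1 \wedge u_k = 2] + \mathrm{Prob}[w_k = 2 \mid x_k = 1 \wedge u_k = 2] = 1, \;\dots$$
so that
$$P_k(1) = \begin{bmatrix} 0.8 & 0.2 \\ 0.2 & 0.8 \end{bmatrix}, \qquad P_k(2) = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}.$$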
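The summation over disturbances with $f_k(i, j, \iota) = r$ can be implemented by looping over $(i, \iota)$ and accumulating $\mathrm{Prob}[w_k = \iota \mid x_k = i \wedge u_k = j]$ into entry $(r, i)$. The sketch below reproduces the example; `transition_matrix` is a hypothetical name, $f_k$ is encoded as a lookup table, and it assumes the same number of states at stages $k$ and $k+1$.

```python
def transition_matrix(f, w_probs, n_states, j):
    """Build P(j) with [P(j)]_{ri} = sum of Prob[w = iota | x = i, u = j] over f(i, j, iota) = r.

    Labels start at 1; w_probs(i, j) returns
    [Prob[w = 1 | x = i, u = j], Prob[w = 2 | x = i, u = j], ...].
    The result is a list of rows, P[r - 1][i - 1].
    """
    P = [[0.0] * n_states for _ in range(n_states)]
    for i in range(1, n_states + 1):
        for iota, prob in enumerate(w_probs(i, j), start=1):
            r = f(i, j, iota)          # next state reached with disturbance iota
            P[r - 1][i - 1] += prob    # accumulate over all iota leading to r
    return P


# f_k of the example as a lookup table: (x_k, u_k, w_k) -> x_{k+1}
F = {(1, 1, 1): 1, (1, 1, 2): 2, (1, 2, 1): 1, (1, 2, 2): 1,
     (2, 1, 1): 2, (2, 1, 2): 1, (2, 2, 1): 1, (2, 2, 2): 1}

P1 = transition_matrix(lambda i, j, w: F[(i, j, w)], lambda i, j: [0.8, 0.2], 2, 1)
P2 = transition_matrix(lambda i, j, w: F[(i, j, w)], lambda i, j: [0.8, 0.2], 2, 2)
print(P1)  # [[0.8, 0.2], [0.2, 0.8]]
print(P2)  # [[1.0, 1.0], [0.0, 0.0]]
```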
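Finding output matrices
Use:
$$[R_k]_{\ell i} = \mathrm{Prob}[y_k = \ell \mid x_k = i] = \sum_{\{\tau \,\mid\, h_k(i, \tau) = \ell\}} \mathrm{Prob}[n_k = \tau \mid x_k = i].$$
Example ($\bar n_k = q_k = \nu_k = 2$, fixed $k$), with $\mathrm{Prob}[n_k = 1 \mid x_k = i] = 0.6$ and $\mathrm{Prob}[n_k = 2 \mid x_k = i] = 0.4$ for all $i$, and $h_k(x_k, n_k)$ given by
$$\begin{array}{c|cc} x_k & n_k = 1 & n_k = 2 \\ \hline 1 & 1 & 2 \\ 2 & 1 & 1 \end{array}$$
Then
$$[R_k]_{11} = \mathrm{Prob}[n_k = 1 \mid x_k = 1] = 0.6, \;\dots$$
$$[R_k]_{12} = \mathrm{Prob}[n_k = 1 \mid x_k = 2] + \mathrm{Prob}[n_k = 2 \mid x_k = 2] = 1, \;\dots$$
so that
$$R_k = \begin{bmatrix} 0.6 & 1 \\ 0.4 & 0 \end{bmatrix}.$$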
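The same accumulation pattern gives the output matrix: for each state $i$ and noise value $\tau$, add $\mathrm{Prob}[n_k = \tau \mid x_k = i]$ to entry $(h_k(i, \tau), i)$. A sketch reproducing the example, with the hypothetical helper `output_matrix` and $h_k$ encoded as a lookup table:

```python
def output_matrix(h, n_probs, n_states, n_outputs):
    """Build R with [R]_{l i} = sum of Prob[n = tau | x = i] over tau with h(i, tau) = l.

    Labels start at 1; n_probs(i) returns [Prob[n = 1 | x = i], Prob[n = 2 | x = i], ...].
    The result is a list of rows, R[l - 1][i - 1].
    """
    R = [[0.0] * n_states for _ in range(n_outputs)]
    for i in range(1, n_states + 1):
        for tau, prob in enumerate(n_probs(i), start=1):
            R[h(i, tau) - 1][i - 1] += prob   # accumulate over all tau giving output l
    return R


# h_k of the example as a lookup table: (x_k, n_k) -> y_k
H = {(1, 1): 1, (1, 2): 2, (2, 1): 1, (2, 2): 1}
R = output_matrix(lambda i, tau: H[(i, tau)], lambda i: [0.6, 0.4], n_states=2, n_outputs=2)
print(R)  # [[0.6, 1.0], [0.4, 0.0]]
```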
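Bayes' filter
[Figure: the state estimator consists of a prediction step followed by an update (correction) step; it takes the output $y_k$ of the system $x_{k+1} = f_k(x_k, u_k, w_k)$, $y_k = h_k(x_k, n_k)$ and the previous input $u_{k-1}$, and returns $P_{x_k \mid I_k}$.]
$P_{x_k \mid I_k}$ is represented by the vector
$$p_k = \begin{bmatrix} p_{k,1} & \cdots & p_{k,\bar n_k} \end{bmatrix}^\top, \qquad p_{k,i} = \mathrm{Prob}[x_k = i \mid I_k].$$
Prediction:
$$\bar p_{k+1} = P_k(u_k)\, p_k$$
Correction:
$$q_{k+1} = D(y_{k+1})\, \bar p_{k+1}, \qquad p_{k+1} = \frac{q_{k+1}}{\mathbf{1}^\top q_{k+1}},$$
where
$$D(y_k) = \begin{bmatrix} [R_k]_{y_k 1} & 0 & \cdots & 0 \\ 0 & [R_k]_{y_k 2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & [R_k]_{y_k \bar n_k} \end{bmatrix}.$$
Initial condition:
$$q_0 = D(y_0)\, \tilde p_0, \qquad \tilde p_{0,i} = \mathrm{Prob}[x_0 = i], \qquad p_0 = \frac{q_0}{\mathbf{1}^\top q_0}.$$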
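A minimal sketch of the two steps, reusing the example matrices $P_k(1)$, $P_k(2)$ and $R_k$ and assuming, for illustration only, that they are the same at every stage and that the prior is uniform, $\tilde p_0 = [0.5, 0.5]^\top$. Multiplying by the diagonal matrix $D(y)$ is implemented as an elementwise product with row $y$ of $R_k$; the helper names `predict` and `correct` are hypothetical.

```python
def predict(p, P):
    """Prediction: p_bar_{k+1} = P_k(u_k) p_k, with P given as a list of rows."""
    return [sum(P[r][i] * p[i] for i in range(len(p))) for r in range(len(P))]


def correct(p_bar, R, y):
    """Correction: q = D(y) p_bar, p = q / (1^T q), D(y) = diag([R]_{y1}, ..., [R]_{yn})."""
    q = [R[y - 1][i] * p_bar[i] for i in range(len(p_bar))]
    s = sum(q)
    if s == 0:
        raise ValueError("y has zero probability under the predicted belief")
    return [q_i / s for q_i in q]


P = {1: [[0.8, 0.2], [0.2, 0.8]],   # P_k(1) from the example
     2: [[1.0, 1.0], [0.0, 0.0]]}   # P_k(2) from the example
R = [[0.6, 1.0], [0.4, 0.0]]        # R_k from the example

p0 = correct([0.5, 0.5], R, y=1)         # initial condition, y_0 = 1
p1 = correct(predict(p0, P[1]), R, y=2)  # u_0 = 1, then y_1 = 2
print(p0)  # approximately [0.375, 0.625]
print(p1)  # [1.0, 0.0]
```

In this example $y_1 = 2$ can only be produced from $x_1 = 1$ (the second column of $R_k$ is $[1, 0]^\top$), so the filter concludes $x_1 = 1$ with certainty.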