

SLIDE 1

4SC000 Q2 2017-2018

Optimal Control and Dynamic Programming

Duarte Antunes

SLIDE 2

Introduction

  • In several control problems only an output (a subset of the states) is available for feedback.
  • For example, when controlling the position of a mass on a surface using only position sensors, the state (which includes the velocity) is not fully known.
  • More generally, the output might not provide full information about the state, only partial information.
  • In this lecture we discuss Partially Observable Markov Decision Problems (POMDPs).

SLIDE 3

Outline

  • Formulation of POMDP
  • Bayes filter
  • Solving POMDP
SLIDE 4

Problem formulation

Dynamic model: x_{k+1} = f_k(x_k, u_k, w_k), with stochastic disturbances w_k and a stochastic initial state with known distribution Prob[x_0 = i] = p_{0,i}.

Output: y_k = h_k(x_k, n_k), with measurement noise n_k.

Information set: I_0 = (y_0), I_k = (y_0, y_1, ..., y_k, u_0, u_1, ..., u_{k-1}) for k ≥ 1, and policies of the form u_k = μ_k(I_k), π = {μ_0, ..., μ_{h−1}}.

Cost: J_π = E[ Σ_{k=0}^{h−1} g_k(x_k, μ_k(I_k), w_k) + g_h(x_h) ].

Problem: find a policy π that minimizes J_π.

The state, input and disturbances live in the sets defined in Lec 2 (slides 3, 16); the output and noise live in finite spaces, in general dependent on the state: if x_k = i, then y_k ∈ Y_k := {1, ..., q_{k,i}} and n_k ∈ N_k := {1, ..., ν_{k,i}}.
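As a minimal illustration of this formulation, the sketch below simulates one trajectory of a hypothetical 2-state POMDP. The maps f, h, g and the policy here are placeholders (not from the slides); the point is only that the policy acts on the information set I_k = (y_0, ..., y_k, u_0, ..., u_{k−1}), never on the state x_k directly.

```python
import random

def f(x, u, w):          # dynamics x_{k+1} = f_k(x_k, u_k, w_k) (hypothetical)
    return (x + u + w) % 2

def h(x, n):             # output y_k = h_k(x_k, n_k) (hypothetical)
    return x if n == 0 else 1 - x

def g(x, u):             # stage cost g_k(x_k, u_k) (hypothetical)
    return 1 if x == 1 else 0

def rollout(policy, horizon, seed=0):
    """Simulate one trajectory; the policy only ever sees the information set."""
    rng = random.Random(seed)
    x = rng.choice([0, 1])                 # x_0 drawn from a known distribution
    ys, us, cost = [], [], 0
    for k in range(horizon):
        y = h(x, rng.choice([0, 0, 1]))    # noisy measurement, n_k biased to 0
        ys.append(y)
        u = policy(ys, us)                 # u_k = mu_k(I_k), I_k = (ys, us)
        cost += g(x, u)
        x = f(x, u, rng.choice([0, 1]))    # stochastic disturbance w_k
        us.append(u)
    return cost

total = rollout(lambda ys, us: ys[-1], horizon=5)
```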

SLIDE 5

First approach

We can reformulate a partial-information optimal control problem as a standard full-information optimal control problem by taking the state to be the information set: I_0 = (y_0), I_1 = (y_0, y_1, u_0), I_2 = (y_0, y_1, y_2, u_0, u_1), and so on. The problem is that the dimension of this state space grows exponentially with the number of stages. [Figure: tree of information sets over stages 0–2 and the corresponding measurement spaces omitted.]

SLIDE 6

Second approach

It is possible to show that knowledge of the probability distribution of the state given the information obtained so far, denoted by P_{x_k|I_k}, is sufficient to determine optimal decisions (Bertsekas' book, Section 5.4): u_k = μ_k(P_{x_k|I_k}). [Figure: tree over stages 0–2 in probability space, starting from the initial state distribution P_{x_0|I_0} and branching on each decision and observation into the possible P_{x_1|I_1} and P_{x_2|I_2}, omitted.]

SLIDE 7

Decomposition of the optimal policy

The structure of the optimal decision-maker is then (see Bertsekas' book, Section 5.4): the dynamics x_{k+1} = f_k(x_k, u_k, w_k) produce the output y_k = h_k(x_k, n_k); a state estimator computes P_{x_k|I_k} from y_k and the delayed input u_{k−1}; and the optimal policy μ_k maps P_{x_k|I_k} to the action u_k applied by the actuator. That is, the optimal policy can be decomposed into an estimator of P_{x_k|I_k} and a map from P_{x_k|I_k} to actions.

SLIDE 8

Discussion

  • The first approach is typically impractical for applications.
  • The second approach is often used in robotics.
  • A crucial step in computing P_{x_k|I_k} is Bayes' rule, and the resulting state estimator is the Bayes filter.
  • The Bayes filter is important per se, and we will start by studying it.

SLIDE 9

Outline

  • Formulation of POMDP
  • Bayes filter
  • Solving POMDP
SLIDE 10

Bayes’ rule

The Bayes filter relies on the basic Bayes' rule. Suppose x ∈ {1, ..., n} and y ∈ {1, ..., m}. Then

Prob[x = i|y = j] = Prob[y = j|x = i] Prob[x = i] / Prob[y = j]
                  = Prob[y = j|x = i] Prob[x = i] / Σ_{ℓ=1}^{n} Prob[y = j|x = ℓ] Prob[x = ℓ]

  • Bayes' rule allows us to infer something about the state a posteriori of the sensor measurement, Prob[x = i|y = j], from the a priori information Prob[y = j|x = i] and Prob[x = i].
  • Think of y as a sensor measurement and of x as the state (see figure). Example: if y = 1, what can you tell about the state? [Figure: probability space over states x = 1, 2, 3 and measurements y = 1, 2 omitted.]

SLIDE 11

Example

An international student is deciding whether it is worthwhile to install a hobby alarm (using an ultrasound sensor close to the door), like her dorm mates, to detect burglars when she goes to her home country to spend Christmas. The alarm is faulty and characterised by

Prob[A|B] = 0.99    Prob[¬A|B] = 0.01    Prob[A|¬B] = 0.1    Prob[¬A|¬B] = 0.9

where B denotes "burglar breaks in" and A denotes "alarm goes off". Historical data reveals that 2 out of 100 rooms are robbed each Christmas, i.e., Prob[B] = 0.02, Prob[¬B] = 0.98. Her total belongings in the room amount to 2000 euros, and asking a security agency to check her room whenever she calls (after the alarm indicates a break-in) costs 300 euros; therefore she will only call if Prob[B|A] × 2000 > 300, or equivalently Prob[B|A] > 0.15. Shall she buy the alarm?

SLIDE 12

Computing Prob[B|A]

We can simply use the a priori probabilities:

Prob[B|A] = Prob[A|B] Prob[B] / Prob[A] = 0.99 × 0.02 / Prob[A]
Prob[¬B|A] = Prob[A|¬B] Prob[¬B] / Prob[A] = 0.1 × 0.98 / Prob[A]

and compute Prob[A] from the fact that Prob[B|A] + Prob[¬B|A] = 1:

Prob[A] = 0.99 × 0.02 + 0.1 × 0.98 = 0.1178

Thus Prob[B|A] = 0.0198/0.1178 ≈ 0.1681 > 0.15 (yes, buy!). Equivalently, we can directly apply the formula

Prob[B|A] = Prob[A|B] Prob[B] / (Prob[A|B] Prob[B] + Prob[A|¬B] Prob[¬B])
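The calculation above can be checked in a few lines; this is a direct transcription of the numbers in the example, with no assumptions beyond them.

```python
# Bayes' rule for the burglar-alarm example.
p_B, p_notB = 0.02, 0.98                 # prior: Prob[B], Prob[not B]
p_A_given_B, p_A_given_notB = 0.99, 0.1  # alarm characteristics

# Total probability: Prob[A] = Prob[A|B]Prob[B] + Prob[A|not B]Prob[not B]
p_A = p_A_given_B * p_B + p_A_given_notB * p_notB

# Bayes' rule: Prob[B|A] = Prob[A|B]Prob[B] / Prob[A]
p_B_given_A = p_A_given_B * p_B / p_A

# She calls only if Prob[B|A] * 2000 > 300, i.e. Prob[B|A] > 0.15 -> buy
```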

SLIDE 13

Historical note

Thomas Bayes (1701–1761) was an English statistician, philosopher and Presbyterian minister, known for having formulated a specific case of the theorem that bears his name: Bayes' theorem. Bayes never published what would eventually become his most famous accomplishment; his notes were edited and published after his death (source: Wikipedia).

SLIDE 14

Problem formulation

How do we find a state estimator that computes P_{x_k|I_k} from the measurements y_k and the (delayed) inputs u_{k−1}, for the model x_{k+1} = f_k(x_k, u_k, w_k), y_k = h_k(x_k, n_k)?

Note that P_{x_k|I_k} can be used to compute any quantity of interest about the state (e.g. mean, median, variance, etc.). In particular, a typical state estimate is the mean, obtained by making a equal to the identity in

E[a(x_k)|I_k] = Σ_{i=1}^{n} a(i) P_{x_k = i|I_k}

SLIDE 15

Preliminaries

Let us start by defining a different representation of the dynamic and output maps in terms of the following matrices, for each j ∈ {1, ..., m_k} and k ∈ {0, ..., h−1}:

  • state transition matrices P_k(j), of dimension n̄_k × n̄_k, with components [P_k(j)]_{ri} = Prob[x_{k+1} = r|x_k = i, u_k = j];
  • output matrices R_k, of dimension q_k × n̄_k, with components [R_k]_{ℓi} = Prob[y_k = ℓ|x_k = i].

For simplicity we will assume that the input, disturbance, output, and noise spaces do not change with the state (but can still depend on time): x_k ∈ {1, ..., n̄_k}, u_k ∈ {1, ..., m_k}, y_k ∈ {1, ..., q_k}, w_k ∈ {1, ..., ω_k}, n_k ∈ {1, ..., ν_k}.

SLIDE 16

Finding state transition matrices

Use:

[P_k(j)]_{ri} = Prob[x_{k+1} = r|x_k = i, u_k = j] = Σ_{ι : f_k(i,j,ι) = r} Prob[w_k = ι|x_k = i, u_k = j]

Example (f_k(x_k, u_k, w_k) with n̄_k = m_k = ω_k = 2, fixed k), with Prob[w_k = 1|x_k = i, u_k = j] = 0.8 for all i, j (and Prob[w_k = 2|x_k = i, u_k = j] = 0.2):

[P_k(1)]_{11} = Prob[w_k = 1|x_k = 1, u_k = 1] = 0.8
[P_k(2)]_{11} = Prob[w_k = 1|x_k = 1, u_k = 2] + Prob[w_k = 2|x_k = 1, u_k = 2] = 1

P_k(1) = [0.8 0.2; 0.2 0.8]    P_k(2) = [1 1; 0 0]

(the second row of P_k(2) follows since each column must sum to one). [Figure: transition diagram between states x_k = 1 and x_k = 2 for each input and disturbance value omitted.]
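The summation formula above translates directly into code. The sketch below is a hypothetical reconstruction of this example: the dynamics f are chosen to be consistent with the matrices shown (under u = 1, disturbance w = 1 keeps the state and w = 2 flips it; u = 2 always drives the system to state 1).

```python
def f(i, j, iota):
    """Hypothetical dynamics f_k(x, u, w) consistent with the example matrices."""
    if j == 1:
        return i if iota == 1 else 3 - i   # w = 2 flips between states 1 and 2
    return 1                               # u = 2 always goes to state 1

w_prob = {1: 0.8, 2: 0.2}                  # Prob[w_k = iota | x_k, u_k], state/input independent
n_states = 2

def transition_matrix(j):
    # [P(j)]_{ri} = sum of Prob[w = iota] over all iota with f(i, j, iota) = r
    P = [[0.0] * n_states for _ in range(n_states)]
    for i in range(1, n_states + 1):
        for iota, prob in w_prob.items():
            r = f(i, j, iota)
            P[r - 1][i - 1] += prob
    return P

P1 = transition_matrix(1)   # [[0.8, 0.2], [0.2, 0.8]]
P2 = transition_matrix(2)   # [[1.0, 1.0], [0.0, 0.0]]
```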

SLIDE 17

Finding output matrices

Use:

[R_k]_{ℓi} = Prob[y_k = ℓ|x_k = i] = Σ_{τ : h_k(i,τ) = ℓ} Prob[n_k = τ|x_k = i]

Example (h_k(x_k, n_k) with n̄_k = q_k = ν_k = 2, fixed k), with Prob[n_k = 1|x_k = i] = 0.6 and Prob[n_k = 2|x_k = i] = 0.4 for all i:

[R_k]_{11} = Prob[n_k = 1|x_k = 1] = 0.6
[R_k]_{12} = Prob[n_k = 1|x_k = 2] + Prob[n_k = 2|x_k = 2] = 1

R_k = [0.6 1; 0.4 0]

[Figure: output diagram for states x_k = 1 and x_k = 2 omitted.]

SLIDE 18

Bayes’ filter

P_{x_k|I_k} is represented by the vector p_k = [p_{k,1} ... p_{k,n̄_k}]^T with p_{k,i} = Prob[x_k = i|I_k]. The state estimator processes u_{k−1} and y_k in two steps:

Prediction:    p̄_{k+1} = P_k(u_k) p_k
Correction (update):    q_{k+1} = D(y_{k+1}) p̄_{k+1},    p_{k+1} = q_{k+1} / (1^T q_{k+1})

where D(y_k) = diag([R_k]_{y_k 1}, ..., [R_k]_{y_k n}) is the diagonal matrix with the y_k-th row of R_k on the diagonal.

Initial condition: q_0 = D(y_0) p̃_0, p_0 = q_0 / (1^T q_0), where p̃_{0,i} = Prob[x_0 = i].
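A minimal sketch of this recursion in pure Python, with 0-indexed states and outputs; the transition matrix reuses P_k(1) from the transition-matrix example and R reuses the output-matrix example.

```python
def predict(p, P_u):
    # pbar_{k+1} = P_k(u_k) p_k
    n = len(p)
    return [sum(P_u[r][i] * p[i] for i in range(n)) for r in range(n)]

def correct(pbar, R, y):
    # q = D(y) pbar (elementwise product with row y of R), then normalise
    q = [R[y][i] * pbar[i] for i in range(len(pbar))]
    s = sum(q)
    return [qi / s for qi in q]

P1 = [[0.8, 0.2], [0.2, 0.8]]   # transition matrix for the applied input
R = [[0.6, 1.0], [0.4, 0.0]]    # [R]_{yi} = Prob[y | x = i], rows are outputs

p = [0.5, 0.5]                  # current belief p_k
pbar = predict(p, P1)           # prediction step
p_new = correct(pbar, R, 0)     # correction step after observing y = 0
```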

SLIDE 19

Derivation of the Bayes’ filter (I)

Define p̄_{k,i} = Prob[x_k = i|I_{k−1}, u_{k−1}] and p̄_k = [p̄_{k,1} ... p̄_{k,n̄_k}]^T.

Prediction step. Then

p̄_{k+1,i} = Prob[x_{k+1} = i|I_k, u_k]
= Σ_{τ=1}^{n} Prob[x_{k+1} = i|x_k = τ, I_k, u_k] Prob[x_k = τ|I_k, u_k]    (condition on x_k)
= Σ_{τ=1}^{n} Prob[x_{k+1} = i|x_k = τ, I_k, u_k] Prob[x_k = τ|I_k]    (x_k independent of u_k)
= Σ_{τ=1}^{n} Prob[x_{k+1} = i|x_k = τ, u_k] Prob[x_k = τ|I_k]    (Markov property)
= Σ_{τ=1}^{n} [P_k(u_k)]_{iτ} p_{k,τ}

In matrix form: p̄_{k+1} = P_k(u_k) p_k.

SLIDE 20

Derivation of the Bayes' filter (II)

Correction step. Suppose there is a measurement y_{k+1} = θ. Then

p_{k+1,i} = Prob[x_{k+1} = i|I_k, y_{k+1} = θ, u_k]
= α Prob[y_{k+1} = θ|I_k, x_{k+1} = i, u_k] Prob[x_{k+1} = i|I_k, u_k]    (Bayes' rule*)
= α Prob[y_{k+1} = θ|x_{k+1} = i] Prob[x_{k+1} = i|I_k, u_k]    (Markov property)

where α = 1/Prob[y_{k+1} = θ|I_k, u_k]. Define q_{k+1,i} = Prob[y_{k+1} = θ|x_{k+1} = i] Prob[x_{k+1} = i|I_k, u_k], so that in matrix form p_{k+1} = α q_{k+1} with q_{k+1} = D(y_{k+1}) p̄_{k+1}. Since the probability vector must add up to one, α = 1/(1^T q_{k+1}) and

p_{k+1} = q_{k+1} / (1^T q_{k+1})

*Bayes' rule holds when conditioning on a third variable:
Prob[x = i|y = j, z = r] = Prob[y = j|x = i, z = r] Prob[x = i|z = r] / Prob[y = j|z = r]

SLIDE 21

Example

We wish to estimate the position of a robot in a given environment. We assign a label i ∈ {1, ..., n} to each cell in the environment, and x_k = i indicates that the robot's position coincides with the centre of cell i. The robot moves according to a given control policy (autonomous system† x_{k+1} = f(x_k, w_k), with disturbances w_k). The derivations of the Bayes filter can be easily specialised to this case (no control input). [Figure: grid environment with labelled cells omitted.]

SLIDE 22

Vector field


Vector field with no disturbances

SLIDE 23

Disturbance model

At each step k, for a given state x_k, the new state x_{k+1} = f(x_k, w_k) is uncertain and can take values in a neighbourhood of the state obtained with the deterministic model x_{k+1} = f(x_k, 0) (no disturbances), according to the probability values given in a grid centred at the deterministic next cell: probability p_1 = 0.65 for the deterministic next cell itself, and probabilities p_2 = 0.2, p_3 = 0.1 and p_4 = 0.05 split equally among the 8, 16 and 24 cells of the first, second and third surrounding rings, respectively. [Grid figure omitted.]

SLIDE 24

Sensor model I

At each step k, for a given state x_k, the measurement y_k = h(x_k, n_k) is uncertain and can take values in a neighbourhood of the state x_k, according to the probability values given in a grid centred at the true cell: probability q_1 = 0.5 for the true cell itself, and probabilities q_2 = 0.2, q_3 = 0.2 and q_4 = 0.1 split equally among the 8, 16 and 24 cells of the first, second and third surrounding rings, respectively. [Grid figure omitted.]

SLIDE 25

Sensor model II

At each step k, for a given state x_k, the measurement y_k = h(x_k) is deterministic and indicates whether there are objects/walls (represented in yellow) within a Euclidean distance, a multiple M = 5 of the length of each cell, of the robot. If so, it indicates whether the closest object/wall is to the left (L), right (R), up (U) or down (D); otherwise it indicates no object close (NO). [Figure: example positions/states and the corresponding measurements omitted.]

SLIDE 26

Results

Video: LEC4nodisturbances.mp4

SLIDE 27

Results

Video: LEC4sensor1.mp4

SLIDE 28

Results

Video: LEC4sensor2.mp4

SLIDE 29

Results

Video: LEC4sensor2_2.mp4

SLIDE 30

Outline

  • Formulation of POMDP
  • Bayes filter
  • Solving POMDP
SLIDE 31

Solution of a POMDP

  • The POMDP problem we introduced can always be solved exactly.
  • We will first study two examples (show host, and machine repair) and then establish this general fact.
  • In fact, as we shall see, the costs-to-go take the general form of piecewise affine functions of the probability distribution of the state.
  • However, the complexity of these functions typically grows exponentially with the number of stages, and therefore this result is mostly of theoretical interest. In practice, heuristics are used to solve POMDPs.

SLIDE 32

Monty Hall problem

  • The first approach is typically impractical for applications.
  • The second approach is often used in robotics (google POMDP) and will also be used to derive LQG control.
  • A nice example of these two approaches is given in Bertsekas' book, Chapter 5: machine repair, Sec. 5.4, and Example 5.4.2.
  • For illustration purposes we solve the Monty Hall problem using the second approach (the first approach would actually be similar).

SLIDE 33

Example: Monty Hall problem

"Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, 'Do you want to pick door No. 2?' Is it to your advantage to switch your choice?" — Whitaker, Craig F. (9 September 1990), Parade Magazine: 16 (source: Wikipedia).

[Figure: the three doors over stages 0–2, with decision 1 and decision 2, omitted.]

SLIDE 34

Formulation

Dynamic model: there are two decision stages, with decisions u_0 ∈ {1, 2, 3} and u_1 ∈ {1, 2, 3} (the doors selected, acting as control input), and k ∈ {0, 1, 2}. The state has two "components", x_k = (x̄_k, x̃_k):

  • x̄_k ∈ {1, 2, 3} is the unknown door where the car is: no disturbances, x̄_{k+1} = x̄_k, with Prob[x̄_0 = 1] = Prob[x̄_0 = 2] = Prob[x̄_0 = 3] = 1/3;
  • x̃_k ∈ {1, 2, 3} keeps track of the previous decision: x̃_{k+1} = u_k, with x̃_0 = 1 (not relevant).

Formally the overall state x_k ∈ {1, 2, ..., 9} corresponds to the nine possible combinations of (x̄_k, x̃_k); note that from x_k we can extract x̄_k, x̃_k and vice-versa.

SLIDE 35

Formulation

Cost:

g_0(x_0, u_0) = 0,    g_1(x_1, u_1) = −1 if x̄_1 = u_1 and 0 otherwise,    g_2(x_2) = 0.

Output and information sets: y_0 = ∅, I_0 = (∅), I_1 = (y_1, u_0), and

y_1 = h_1(x_1, n_1) = {1, 2, 3} \ {x̄_1, x̃_1} if x̄_1 ≠ x̃_1;    {1, 2, 3} \ {x̄_1, n_1} if x̄_1 = x̃_1,

where n_1 is a random variable taking one of the two values in the set {1, 2, 3} \ {x̄_1} with equal probability (1/2).

SLIDE 36

Conditional probability

  • To apply DP, start at stage 1 and compute P_{x_1|I_1} for every possible value of I_1 = (y_1, u_0).
  • Given I_1 it is trivial to compute x̃_1 = u_0, and therefore it suffices to compute P_{x̄_1|I_1}.
  • Let us do this for (y_1, u_0) = (3, 1). It is obvious that Prob[x̄_1 = 3|(y_1, u_0) = (3, 1)] = 0. How to compute Prob[x̄_1 = 1|(y_1, u_0) = (3, 1)] and Prob[x̄_1 = 2|(y_1, u_0) = (3, 1)]? Bayes' rule!

SLIDE 37

Computing P_{x̄_1|I_1}

Prob[x̄_1 = 1|y_1 = 3, u_0 = 1] = α Prob[y_1 = 3|x̄_1 = 1, u_0 = 1] Prob[x̄_1 = 1|u_0 = 1] = α × 1/2 × 1/3 = 1/3
Prob[x̄_1 = 2|y_1 = 3, u_0 = 1] = α Prob[y_1 = 3|x̄_1 = 2, u_0 = 1] Prob[x̄_1 = 2|u_0 = 1] = α × 1 × 1/3 = 2/3

where

α = 1/Prob[y_1 = 3|u_0 = 1] = 1/(1/2 × 1/3 + 1 × 1/3) = 2.

Hence P_{x̄_1|I_1 = (3,1)} = (1/3, 2/3, 0).

SLIDE 38

Optimal decision

For (y_1, u_0) = (3, 1) we have P_{x̄_1|I_1} = (1/3, 2/3, 0). The optimal decision is the one that minimizes E[g_1(x_1, u_1)], where g_1(x_1, u_1) = −1 if x̄_1 = u_1 and 0 otherwise:

E[g_1(x̄_1, u_1)] = −1 × Prob[x̄_1 = u_1] + 0 × Prob[x̄_1 ≠ u_1] = −Prob[x̄_1 = u_1].

The minimum is achieved for u_1 = 2 (switch) and is given by −Prob[x̄_1 = 2] = −2/3.

SLIDE 39

Optimal policy

For every information set I_1 = (y_1, u_0) the same computation gives a posterior P_{x̄_1|I_1} that assigns probability 2/3 to the remaining unopened, unpicked door, 1/3 to the picked door, and 0 to the opened door, so the cost-to-go equals −2/3 in every case. The optimal policy is therefore to pick any door at stage 0 and always switch at stage 1, and the probability of winning is 2/3. [Figure: decision tree over stages 0–1 showing the six information sets I_1 = (y_1, u_0) and their posteriors omitted.]
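The stage-1 posterior can be checked by enumeration. The sketch below encodes the host's behaviour directly (he never opens the picked door or the car door, and chooses between the two goat doors with probability 1/2 when the pick is correct, matching h_1 above) and applies Bayes' rule:

```python
def host_prob(y, car, pick):
    """Prob[y_1 = y | xbar_1 = car, u_0 = pick]: the host opens a goat door,
    never the picked door; a fair coin decides between two goats if pick == car."""
    if y == pick or y == car:
        return 0.0
    return 0.5 if pick == car else 1.0

def posterior(y, pick):
    # Bayes' rule with the uniform prior Prob[xbar_1 = car] = 1/3
    prior = {1: 1/3, 2: 1/3, 3: 1/3}
    joint = {car: host_prob(y, car, pick) * prior[car] for car in (1, 2, 3)}
    total = sum(joint.values())
    return {car: p / total for car, p in joint.items()}

post = posterior(y=3, pick=1)   # P_{xbar_1 | I_1 = (3, 1)}
```

Switching to door 2 wins with probability post[2] = 2/3, matching the slide.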

SLIDE 40

Example: Machine repair*

  • A machine can be in one of two states: P (proper condition) or P̄ (improper condition).
  • In each time period, if it starts at P̄ it stays in P̄; if it starts at P it stays at P with probability 2/3 and moves to P̄ with probability 1/3.
  • It operates over 3 periods and initially it is in state P.
  • At the end of the first and second periods there is an inspection, with possible outcomes positive (G) and negative (B), with Prob[G|P] = Prob[B|P̄] = 3/4 and Prob[B|P] = Prob[G|P̄] = 1/4.
  • There are two possible actions after each inspection: continue operation (C) or stop and repair (S); continue does not change the state and repair changes the state to P if in P̄.
  • Action C has no direct cost and action S has a cost 1; each period that the state is P̄ and operation continues incurs a cost 2.

[Figure: decision tree over the two inspections and the repair/do-not-repair decisions omitted.]

*[Bertsekas' book, Section 5.1]

SLIDE 41

Dynamic programming formulation

x_k ∈ {P, P̄}, u_k ∈ {C, S}, k ∈ {0, 1}.

Dynamics: x_{k+1} = w_k, with

Prob[w_k = P|x_k = P, u_k = C] = 2/3    Prob[w_k = P|x_k = P̄, u_k = C] = 0
Prob[w_k = P|x_k = P, u_k = S] = 2/3    Prob[w_k = P|x_k = P̄, u_k = S] = 2/3
Prob[w_k = P̄| ... ] = 1 − Prob[w_k = P| ... ]

Initial distribution: Prob[x_0 = P] = 2/3, Prob[x_0 = P̄] = 1/3.

Measurements: y_k = v_k, with

Prob[v_k = G|x_k = P] = 3/4    Prob[v_k = G|x_k = P̄] = 1/4    Prob[v_k = B| ... ] = 1 − Prob[v_k = G| ... ]

Cost: g(x_0, u_0) + g(x_1, u_1), with g(P, C) = 0, g(P, S) = 1, g(P̄, S) = 1, g(P̄, C) = 2.

Information sets: I_0 = (y_0), I_1 = (y_0, y_1, u_0).

Goal: find μ_0(I_0), μ_1(I_1) to minimize E[g(x_0, μ_0(I_0)) + g(x_1, μ_1(I_1))].

SLIDE 42

Dynamic programming algorithm

At the last decision stage k = 1, assume that we know p_1 = Prob[x_1 = P̄|I_1] (then Prob[x_1 = P|I_1] = 1 − p_1). For a given "state" p_1 the cost-to-go is

J_1 = min_{u_1} E[g(x_1, u_1)|I_1]

  • For u_1 = S the cost-to-go is 1, since g(P, S) = g(P̄, S) = 1.
  • For u_1 = C the cost-to-go is 2p_1, since E[g(x_1, C)|I_1] = g(P̄, C) Prob[x_1 = P̄|I_1] + g(P, C) Prob[x_1 = P|I_1] = 2p_1 + 0(1 − p_1) = 2p_1.

Therefore

J_1(p_1) = min(2p_1, 1)    μ_1(p_1) = S if p_1 > 1/2, C if p_1 ≤ 1/2.

SLIDE 43

Dynamic programming algorithm

At the first decision stage k = 0, assume that we know p_0 = Prob[x_0 = P̄|I_0]. To determine how the decision u_0 (S or C, taken after the 1st inspection) influences the cost-to-go J_1(p_1) = min(2p_1, 1), we need to understand how the state probability distribution at k = 1, parameterised by p_1, depends on u_0, taking into account that there will be a measurement y_1.

SLIDE 44

Dynamic programming algorithm

37

There are then four options, each leading to a probability distribution characterised by

p1 = Φ0(p0, u0, y1) =                        1 7 if u0 = S, y1 = G 3 5 if u0 = S, y1 = B 1 + 2p0 7 − 4p0 if u0 = C, y1 = G 3 + 6p0 5 + 4p0 if u0 = C, y1 = B

For example for , is given by u0 = S

1/4 3/4 1/3 2/3 p1 = Prob[x1 = ¯ P|u0 = S, y1 = G] Prob[y1 = G|u0 = S, x1 = ¯ P]Prob[x1 = ¯ P|u0 = S] Prob[y1 = G|u0 = S, x1 = ¯ P]Prob[x1 = ¯ P|u0 = S] + Prob[y1 = G|u0 = S, x1 = P]Prob[x1 = P|u0 = S]

y1 = G =

= 1/7 p1

|{z} |{z} |{z} |{z}

Other example

1/4 = Prob[y1 = B|u0 = C, x1 = ¯ P]Prob[x1 = ¯ P|u0 = C] Prob[y1 = B|u0 = C, x1 = ¯ P]Prob[x1 = ¯ P|u0 = C] + Prob[y1 = B|u0 = C, x1 = P]Prob[x1 = P|u0 = C] 3/4 p0 + (1 − p0)/3 (1 − p0)2/3 = 3 + 6p0 5 + 4p0 p1

|{z} |{z} |{z} |{z}

u0 = C y1 = B

SLIDE 45

Dynamic programming algorithm

Therefore

Prob[y_1 = G|p_0, u_0 = S] = 7/12    Prob[y_1 = B|p_0, u_0 = S] = 5/12
Prob[y_1 = G|p_0, u_0 = C] = (7 − 4p_0)/12    Prob[y_1 = B|p_0, u_0 = C] = (5 + 4p_0)/12

and

J_0(p_0) = min[ 2p_0 + Prob[y_1 = G|p_0, u_0 = C] J_1(Φ_0(p_0, C, G)) + Prob[y_1 = B|p_0, u_0 = C] J_1(Φ_0(p_0, C, B)),
                1 + Prob[y_1 = G|p_0, u_0 = S] J_1(Φ_0(p_0, S, G)) + Prob[y_1 = B|p_0, u_0 = S] J_1(Φ_0(p_0, S, B)) ]
         = min[ 2p_0 + ((7 − 4p_0)/12) J_1((1 + 2p_0)/(7 − 4p_0)) + ((5 + 4p_0)/12) J_1((3 + 6p_0)/(5 + 4p_0)),
                1 + (7/12) J_1(1/7) + (5/12) J_1(3/5) ]

which can be simplified to

J_0(p_0) = (7 + 32p_0)/12 if 0 ≤ p_0 ≤ 3/8,    19/12 if 3/8 ≤ p_0 ≤ 1

Optimal policy: μ_0(p_0) = C if p_0 ≤ 3/8, S if p_0 > 3/8.
SLIDE 46

General case

Dynamic model x_{k+1} = f_k(x_k, u_k, w_k), output y_k = h_k(x_k, n_k), information sets I_0 = (y_0), I_k = (y_0, y_1, ..., y_k, u_0, u_1, ..., u_{k−1}) for k ≥ 1, and cost

J_π = E[ Σ_{k=0}^{h−1} g_k(x_k, μ_k(I_k), w_k) + g_h(x_h) ].

Then there exist n-dimensional (row) vectors α_k^1, α_k^2, ..., α_k^{a_k} for each k such that

J_h(p_h) = α_h^1 p_h
J_k(p_k) = min{α_k^1 p_k, α_k^2 p_k, ..., α_k^{a_k} p_k}
μ_k(p_k) = arg min_{u_k} E[g_k(x_k, u_k) + J_{k+1}(p_{k+1})]

SLIDE 47

Justification

For simplicity assume that the running cost is g(x_k, u_k), that it does not depend on time and disturbances, and that the state, input and output live in fixed sets over time: x_k ∈ {1, ..., n}, u_k ∈ {1, ..., m}, y_k ∈ {1, ..., q}.

The statement is true for the last stage since, defining α_h^1 = [g_h(1) g_h(2) ... g_h(n)],

J_h(p_h) = E[g_h(x_h)|I_h] = Σ_{i=1}^{n} g_h(i) Prob[x_h = i|I_h] = α_h^1 p_h

Assuming it is true for k + 1, note that

J_k(p_k) = min_{u_k} E[g(x_k, u_k) + J_{k+1}(p_{k+1})|I_k]
         = min_{u_k} ᾱ(u_k) p_k + E[J_{k+1}(p_{k+1})|I_k]    (where ᾱ(u_k) = [g(1, u_k) ... g(n, u_k)])
         = min_{u_k} ᾱ(u_k) p_k + Σ_{ℓ=1}^{q} E[J_{k+1}(p_{k+1})|I_k, y_{k+1} = ℓ] Prob[y_{k+1} = ℓ|I_k]    (condition on y_{k+1})

From the Bayes filter expressions (Slide 18), p_{k+1} = q_{k+1}/Prob[y_{k+1} = ℓ|I_k], where q_{k+1} = D(y_{k+1}) P_k(u_k) p_k.

SLIDE 48

Justification

Replacing this expression, and using the induction hypothesis

J_{k+1}(p_{k+1}) = min{α_{k+1}^1 p_{k+1}, α_{k+1}^2 p_{k+1}, ..., α_{k+1}^{a_{k+1}} p_{k+1}},

we obtain (the normalisation factor Prob[y_{k+1} = ℓ|I_k] cancels)

J_k(p_k) = min_{u_k ∈ {1,...,m}} ᾱ(u_k) p_k + Σ_{ℓ=1}^{q} min{α_{k+1}^1 D(ℓ) P_k(u_k) p_k, α_{k+1}^2 D(ℓ) P_k(u_k) p_k, ..., α_{k+1}^{a_{k+1}} D(ℓ) P_k(u_k) p_k}

which can be written as J_k(p_k) = min{α_k^1 p_k, α_k^2 p_k, ..., α_k^{a_k} p_k} for some α_k^1, ..., α_k^{a_k}.

SLIDE 49

Concluding remarks

Summary

  • The optimal control structure for a POMDP can be divided into a state estimator, computing the probability distribution of the state, and a decision maker.
  • The state estimator relies on the Bayes filter, which is interesting per se in several contexts.
  • POMDPs can always be solved exactly, although the complexity of the value functions typically increases exponentially with the time horizon.

After this lecture, you should be able to:

  • Apply the Bayes filter.
  • Explicitly solve POMDPs with a small horizon.