4SC000 Q2 2017-2018
Optimal Control and Dynamic Programming
Duarte Antunes
Introduction

In several control problems only an output (a subset of the states) is available for feedback — for example, when controlling the position of a mass on a …
Dynamic model

  xk+1 = fk(xk, uk, wk)   (stochastic disturbances wk; initial state stochastic with known distribution Prob[x0 = i] = p0,i)

Output

  yk = hk(xk, nk)   (measurement noise nk)

Cost

  Jπ = E[ Σ_{k=0}^{h−1} gk(xk, µk(Ik), wk) + gh(xh) ]

Information set

  I0 = (y0),   Ik = (y0, y1, . . . , yk, u0, u1, . . . , uk−1) for k ≥ 1,   uk = µk(Ik)

Problem: find the policy π = {µ0, . . . , µh−1} that minimizes Jπ.

The state, input and disturbances live in the sets defined in Lec 2 (slides 3, 16); the output and noise live in finite spaces, in general dependent on the state: when xk = i, yk ∈ Yk := {1, . . . , qk,i} and nk ∈ Nk := {1, . . . , νk,i}.
We can reformulate a partial-information optimal control problem as a standard full-information optimal control problem by considering the state to be the information set:

  I0 = (y0),   I1 = (y0, y1, u0),   I2 = (y0, y1, y2, u0, u1),  . . .

The problem is that the dimension of this state space increases exponentially: at each stage the information set grows with the new measurement (and input).
It is possible to show that knowledge of the probability distribution of the state given the information obtained so far, denoted by Pxk|Ik, is sufficient to determine optimal decisions (Bertsekas' book, Section 5.4). Starting from the initial state distribution Px0|I0, each decision and measurement updates the distribution (Px1|I1, Px2|I2, . . .), and the decisions can be taken as

  uk = µk(Pxk|Ik)
The structure of the optimal decision-maker is then (see Bertsekas' book, Section 5.4) a feedback loop:

  Dynamics: xk+1 = fk(xk, uk, wk)
  Output: yk = hk(xk, nk)
  State estimator: from yk and the delayed input uk−1, computes Pxk|Ik
  Actuator: the policy µk maps Pxk|Ik to the action uk

That is, the optimal policy can be decomposed into an estimator of Pxk|Ik and a map from Pxk|Ik to actions. The estimator is the Bayes filter.
The Bayes filter relies on the basic Bayes' rule: for x ∈ {1, . . . , n} and y ∈ {1, . . . , m},

  Prob[x = i|y = j] = Prob[y = j|x = i] Prob[x = i] / Prob[y = j]
                    = Prob[y = j|x = i] Prob[x = i] / Σ_{ℓ=1}^n Prob[y = j|x = ℓ] Prob[x = ℓ]

i.e., the measurement y = j updates the a priori information Prob[x = i] through the likelihoods Prob[y = j|x = i]. Example: given a measurement y = 1, what can you tell about the state?
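As a quick sketch of the rule over a finite state space, here is a minimal Python version; the prior and likelihood values below are invented for illustration (they are not taken from the slide's figure):

```python
# Bayes' rule over a finite state space: posterior over x after observing y = j.
def bayes_posterior(prior, likelihood, j):
    """prior[i] = Prob[x = i+1]; likelihood[i][j] = Prob[y = j+1 | x = i+1] (0-indexed)."""
    joint = [likelihood[i][j] * prior[i] for i in range(len(prior))]
    evidence = sum(joint)                     # Prob[y = j+1], the normalizing constant
    return [p / evidence for p in joint]

prior = [1/3, 1/3, 1/3]                       # assumed uniform prior over 3 states
likelihood = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]   # assumed Prob[y | x] table
print(bayes_posterior(prior, likelihood, 0))  # posterior after observing y = 1
```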
An international student is deciding whether it is worthwhile to install a hobby alarm (using an ultrasound sensor close to the door), like her dorm mates, to detect burglars when she goes to her home country to spend Christmas. The alarm is faulty and characterised by

  Prob[A|B] = 0.99,  Prob[¬A|B] = 0.01   (alarm goes off / does not go off, given a break-in)
  Prob[A|¬B] = 0.1,  Prob[¬A|¬B] = 0.9   (given no break-in)

and historical data reveals that 2 out of 100 rooms are robbed each Christmas, i.e., Prob[B] = 0.02, Prob[¬B] = 0.98. Her total belongings in the room amount to 2000 euros, and asking a security agency to check her room whenever she calls (after the alarm indicates a break-in) costs 300 euros; therefore she will only call if Prob[B|A] × 2000 > 300, or equivalently Prob[B|A] > 0.15. Shall she buy the alarm?

By Bayes' rule,

  Prob[B|A] = Prob[A|B] Prob[B] / Prob[A] = 0.99 × 0.02 / Prob[A]
  Prob[¬B|A] = Prob[A|¬B] Prob[¬B] / Prob[A] = 0.1 × 0.98 / Prob[A]

We can compute Prob[A] from the fact that Prob[B|A] + Prob[¬B|A] = 1, which gives Prob[A] = 0.1 × 0.98 + 0.99 × 0.02 = 0.1178 and thus Prob[B|A] = 0.1681. Equivalently, we can directly apply the formula

  Prob[B|A] = Prob[A|B] Prob[B] / (Prob[A|B] Prob[B] + Prob[A|¬B] Prob[¬B])

Since 0.1681 > 0.15: yes, buy!
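The alarm example can be checked numerically; this snippet just replays the computation above:

```python
# Shall she buy the alarm? Bayes' rule with the numbers from the example.
p_B = 0.02                    # Prob[B]: a burglar breaks in
p_A_given_B = 0.99            # Prob[A|B]: alarm goes off given a break-in
p_A_given_notB = 0.10         # Prob[A|~B]: false-alarm probability
p_A = p_A_given_B * p_B + p_A_given_notB * (1 - p_B)
p_B_given_A = p_A_given_B * p_B / p_A
print(round(p_A, 4), round(p_B_given_A, 4))   # 0.1178 0.1681
print(p_B_given_A > 0.15)                     # True: calling pays off, so buy
```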
Thomas Bayes (1701 – 1761) was an English statistician, philosopher and Presbyterian minister who is known for having formulated a specific case of the theorem that bears his name: Bayes' theorem. Bayes never published what would become his most famous accomplishment; his notes were edited and published after his death (source: Wikipedia).
State estimator

How can we find a state estimator to compute Pxk|Ik from the measurements yk and the (delayed) inputs uk−1, for the model xk+1 = fk(xk, uk, wk), yk = hk(xk, nk)?

Note that Pxk|Ik can be used to compute any quantity of interest about the state (e.g. mean, median, variance, etc.) via

  E[a(xk)|Ik] = Σ_{i=1}^n a(i) Pxk=i|Ik

In particular, a typical state estimate is the mean, obtained by making a equal to the identity.
Let us start by defining a different representation of the dynamics and output equations. For simplicity we will assume that the input, disturbance, output, and noise spaces do not change with the state (but can still depend on time):

  xk ∈ {1, . . . , n̄k},  uk ∈ {1, . . . , mk},  wk ∈ {1, . . . , ωk},  yk ∈ {1, . . . , qk},  nk ∈ {1, . . . , νk}

State transition and output matrices, for j ∈ {1, . . . , mk}, k ∈ {0, . . . , h − 1}:

  matrix   dimensions   components
  Pk(j)    n̄k × n̄k     [Pk(j)]ri = Prob[xk+1 = r|xk = i, uk = j]
  Rk       qk × n̄k      [Rk]ℓi = Prob[yk = ℓ|xk = i]
To build the transition matrices, use

  [Pk(j)]ri = Prob[xk+1 = r|xk = i, uk = j] = Σ_{ι : fk(i,j,ι) = r} Prob[wk = ι|xk = i, uk = j]

Example (fk(xk, uk, wk) with n̄k = mk = ωk = 2, fixed k): suppose Prob[wk = 1|xk = i, uk = j] = 0.8 for all i, j (so Prob[wk = 2|xk = i, uk = j] = 0.2). Then, for the dynamics in the transition diagram,

  Pk(1) = [0.8 0.2; 0.2 0.8]

and, since both disturbance values lead to state 1 under uk = 2,

  [Pk(2)]11 = Prob[wk = 1|xk = 1, uk = 2] + Prob[wk = 2|xk = 1, uk = 2] = 1, . . .,   Pk(2) = [1 1; 0 0]
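The construction of Pk(j) can be sketched in Python; the dynamics table f below is one choice consistent with the two-state example (the diagram itself is not fully legible in these notes, so treat it as an assumption):

```python
# Build [P(j)]_{ri} = Prob[x+ = r | x = i, u = j] by summing disturbance
# probabilities over {iota : f(i, j, iota) = r}.
def transition_matrix(f, w_prob, n, j):
    P = [[0.0] * n for _ in range(n)]
    for i in range(1, n + 1):
        for iota, prob in w_prob.items():
            r = f(i, j, iota)
            P[r - 1][i - 1] += prob
    return P

def f(i, j, iota):                    # assumed dynamics for the 2-state example
    if j == 2:
        return 1                      # u = 2 drives the state to 1 regardless of w
    return i if iota == 1 else 3 - i  # u = 1: stay if w = 1, swap if w = 2

w_prob = {1: 0.8, 2: 0.2}
print(transition_matrix(f, w_prob, 2, 1))   # [[0.8, 0.2], [0.2, 0.8]]
print(transition_matrix(f, w_prob, 2, 2))   # [[1.0, 1.0], [0.0, 0.0]]
```

Note that the columns (indexed by the current state i) sum to one.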
Similarly, for the output matrices, use

  [Rk]ℓi = Prob[yk = ℓ|xk = i] = Σ_{τ : hk(i,τ) = ℓ} Prob[nk = τ|xk = i]

Example (hk(xk, nk) with n̄k = qk = νk = 2, fixed k): suppose Prob[nk = 1|xk = i] = 0.6 and Prob[nk = 2|xk = i] = 0.4 for all i. Then, for the output map in the diagram,

  [Rk]11 = Prob[nk = 1|xk = 1] = 0.6
  [Rk]12 = Prob[nk = 1|xk = 2] + Prob[nk = 2|xk = 2] = 1

and Rk = [0.6 1; 0.4 0].
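The same pattern gives Rk; the sensor map h below reproduces the example's Rk = [0.6 1; 0.4 0] and is again an assumed reading of the diagram:

```python
# Build [R]_{li} = Prob[y = l | x = i] by summing noise probabilities
# over {tau : h(i, tau) = l}.
def output_matrix(h, n_prob, n_states, q):
    R = [[0.0] * n_states for _ in range(q)]
    for i in range(1, n_states + 1):
        for tau, prob in n_prob.items():
            ell = h(i, tau)
            R[ell - 1][i - 1] += prob
    return R

def h(i, tau):                 # assumed sensor: state 2 always reads y = 1
    return tau if i == 1 else 1

n_prob = {1: 0.6, 2: 0.4}
print(output_matrix(h, n_prob, 2, 2))   # [[0.6, 1.0], [0.4, 0.0]]
```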
State estimator: the Bayes filter

Pxk|Ik is represented by the vector pk = [pk,1 . . . pk,nk]′ with pk,i = Prob[xk = i|Ik]. The filter alternates between a prediction step and a correction (update) step:

  Prediction:  p̄k+1 = Pk(uk) pk
  Correction:  qk+1 = D(yk+1) p̄k+1,   pk+1 = qk+1 / (1′ qk+1)

where D(yk) is the diagonal matrix of output likelihoods,

  D(yk) = diag([Rk]yk1, [Rk]yk2, . . . , [Rk]ykn)

Initial condition: p0 = q0 / (1′ q0) with q0 = D(y0) p̃0 and p̃0,i = Prob[x0 = i].
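Put together, one prediction-correction cycle of the filter fits in a few lines of Python (NumPy); the matrices reuse the numbers of the earlier two-state examples, and the prior p0 is an arbitrary assumption:

```python
import numpy as np

# One Bayes-filter cycle: prediction through P(u), then correction by D(y)
# followed by normalization.
def bayes_filter_step(p, P_u, R, y):
    p_pred = P_u @ p              # prediction through the transition matrix
    q = R[y - 1, :] * p_pred      # D(y) applied: weight each state by its likelihood
    return q / q.sum()            # normalize so the belief sums to one

P1 = np.array([[0.8, 0.2], [0.2, 0.8]])   # transition matrix for u = 1
R = np.array([[0.6, 1.0], [0.4, 0.0]])    # output matrix
p0 = np.array([0.5, 0.5])                 # assumed current belief
p1 = bayes_filter_step(p0, P1, R, y=2)
print(p1)   # y = 2 is impossible in state 2, so the belief collapses to state 1
```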
Prediction step

Define p̄k,i = Prob[xk = i|Ik−1, uk−1]. Then

  p̄k+1,i = Prob[xk+1 = i|Ik, uk]
         = Σ_{τ=1}^n Prob[xk+1 = i|xk = τ, Ik, uk] Prob[xk = τ|Ik, uk]   (condition on xk)
         = Σ_{τ=1}^n Prob[xk+1 = i|xk = τ, Ik, uk] Prob[xk = τ|Ik]       (xk independent of uk)
         = Σ_{τ=1}^n Prob[xk+1 = i|xk = τ, uk] Prob[xk = τ|Ik]           (Markov property)
         = Σ_{τ=1}^n [Pk(uk)]iτ pk,τ

In matrix form, with p̄k = [p̄k,1 . . . p̄k,nk]′:

  p̄k+1 = Pk(uk) pk
Correction step

Suppose there is a measurement yk+1 = θ. Then

  pk+1,i = Prob[xk+1 = i|Ik, uk, yk+1 = θ]
         = α Prob[yk+1 = θ|Ik, xk+1 = i, uk] Prob[xk+1 = i|Ik, uk]   (Bayes' rule*)
         = α Prob[yk+1 = θ|xk+1 = i] Prob[xk+1 = i|Ik, uk]           (Markov property)

where α = 1/Prob[yk+1 = θ|Ik, uk]. Defining qk+1,i = Prob[yk+1 = θ|xk+1 = i] Prob[xk+1 = i|Ik, uk], in matrix form pk+1 = α qk+1. Since the probability vector must add up to one, α = 1/(1′ qk+1) and

  pk+1 = qk+1 / (1′ qk+1),   qk+1 = D(yk+1) p̄k+1

* Bayes' rule holds when conditioning on a third variable:

  Prob[x = i|y = j, z = r] = Prob[y = j|x = i, z = r] Prob[x = i|z = r] / Prob[y = j|z = r]
Example: robot localization

We wish to estimate the position of a robot in a given environment. We assign a label i ∈ {1, . . . , n} to each cell in the environment, and xk = i indicates that the robot's position coincides with the centre of cell i. The robot moves according to a given control policy, so it behaves as an autonomous system†

  xk+1 = f(xk, wk)

and the derivations of the Bayes filter can be easily specialised to this case (no control input).

† wk are the disturbances.

Vector field with no disturbances (figure).
At each step k, for a given state xk, the new state xk+1 = f(xk, wk) is uncertain and can take values in a neighbourhood of the state xk+1 = f(xk, 0) obtained with the deterministic model (no disturbances), according to the probability values given in the grid: the deterministic next state has probability p1 = 0.65, each of the 8 cells around it has probability p2 = 0.2/8, each of the next 16 cells has probability p3 = 0.1/16, and each of the outer 24 cells has probability p4 = 0.05/24.
At each step k, for a given state xk, the measurement yk = h(xk, nk) is uncertain and can take values in a neighbourhood of the state xk, according to the probability values given in the grid: the true cell has probability q1 = 0.5, each of the 8 cells around it has probability q2 = 0.2/8, each of the next 16 cells has probability q3 = 0.2/16, and each of the outer 24 cells has probability q4 = 0.1/24.
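Under the reading above — where each ring of cells splits its total probability evenly — both grids are valid distributions, which a two-line check confirms:

```python
# Sanity check: 1 centre cell plus rings of 8, 16 and 24 cells.
motion = 0.65 + 8 * (0.2 / 8) + 16 * (0.1 / 16) + 24 * (0.05 / 24)
sensing = 0.5 + 8 * (0.2 / 8) + 16 * (0.2 / 16) + 24 * (0.1 / 24)
print(motion, sensing)   # both equal 1 up to floating-point rounding
```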
Alternatively, consider a sensor whose measurement is deterministic: at each step k, for a given state xk, the measurement yk = h(xk) indicates whether there are objects/walls (represented in yellow) within a Euclidean distance of the robot equal to a multiple M = 5 of the length of each cell. If so, it indicates whether the closest object/wall is to the left (L), right (R), up (U) or down (D); otherwise it indicates that no object is close (NO).
Video: LEC4nodisturbances.mp4
Video: LEC4sensor1.mp4
Video: LEC4sensor2.mp4
Video: LEC4sensor2_2.mp4
Outline

We will first consider two examples and then establish this general fact: the cost-to-go functions are piecewise affine functions of the probability distribution of the state. The number of affine terms grows quickly with the number of stages, and therefore this result has mostly theoretical interest; it will also be used to derive LQG control. References: Bertsekas' book, Chapter 5; machine repair, Sec 5.4 and Example 5.4.2; we follow the second approach (the first approach would actually be similar).
Monty Hall problem

"Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, 'Do you want to pick door No. 2?' Is it to your advantage to switch your choice?" — Whitaker, Craig F. (9 September 1990), Parade Magazine: 16 (source: Wikipedia).

There are three stages (stage 0, stage 1, stage 2) and two decisions (decision 1 at stage 0, decision 2 at stage 1).
Dynamic model

There are two decision stages, k ∈ {0, 1, 2}, and the decisions (control inputs) are the doors selected: u0 ∈ {1, 2, 3}, u1 ∈ {1, 2, 3}. The state has two "components", xk = (x̄k, x̃k):

  x̄k ∈ {1, 2, 3} — the unknown door where the car is; no disturbances, x̄k+1 = x̄k, with Prob[x̄0 = 1] = Prob[x̄0 = 2] = Prob[x̄0 = 3] = 1/3
  x̃k ∈ {1, 2, 3} — keeps track of the previous decision: x̃k+1 = uk, x̃0 = 1 (not relevant)

Formally xk ∈ {1, 2, . . . , 9} corresponds to the nine possible combinations of (x̄k, x̃k); note that from xk we can extract x̄k, x̃k and vice-versa.
Cost

  g0(x0, u0) = 0,   g1(x1, u1) = −1 if x̄1 = u1 and 0 otherwise,   g2(x2) = 0

Output and information sets

  y0 = ∅,   I0 = (∅)
  y1 = h1(x1, n1) = {1, 2, 3} \ {x̄1, x̃1}  if x̄1 ≠ x̃1
                    {1, 2, 3} \ {x̄1, n1}   if x̄1 = x̃1
  I1 = (y1, u0)

where n1 is a random variable taking one of the two values in the set {1, 2, 3} \ {x̄1} with equal probability (1/2).
Following the dynamic programming approach, start at stage 1 and compute Px̄1|I1 for every possible value of I1 = (y1, u0) — Bayes' rule! (Note that x̃1 = u0.) Consider for example (y1, u0) = (3, 1). It is obvious that

  Prob[x̄1 = 3|(y1, u0) = (3, 1)] = 0

since the host opened door 3. How to compute Prob[x̄1 = 1|(y1, u0) = (3, 1)] and Prob[x̄1 = 2|(y1, u0) = (3, 1)]?
By Bayes' rule,

  Prob[x̄1 = 1|y1 = 3, u0 = 1] = α Prob[y1 = 3|x̄1 = 1, u0 = 1] Prob[x̄1 = 1|u0 = 1] = α × 1/2 × 1/3 = 1/3
  Prob[x̄1 = 2|y1 = 3, u0 = 1] = α Prob[y1 = 3|x̄1 = 2, u0 = 1] Prob[x̄1 = 2|u0 = 1] = α × 1 × 1/3 = 2/3

where

  α = 1/Prob[y1 = 3|u0 = 1] = 1/(1/2 × 1/3 + 1 × 1/3) = 2

so Px̄1|I1=(3,1) assigns probability 1/3 to door 1 and 2/3 to door 2.
Given (y1, u0) = (3, 1), the optimal decision u1 is the one that minimizes E[g1(x1, u1)|I1]. Since g1(x1, u1) = −1 if x̄1 = u1 and 0 otherwise,

  E[g1(x̄1, u1)|I1] = −1 × Prob[x̄1 = u1] + 0 × Prob[x̄1 ≠ u1] = −Prob[x̄1 = u1]

This is minimized by u1 = 2 (switch!), with value −Prob[x̄1 = 2] = −2/3.

The same computation for the other information sets Px̄1|I1=(2,1), Px̄1|I1=(1,2), Px̄1|I1=(3,2), Px̄1|I1=(1,3), Px̄1|I1=(2,3) gives, by symmetry, probability 1/3 for the door initially picked and 2/3 for the remaining closed door. Hence: always switch at stage 1, pick any door at stage 0, and the cost-to-go is −2/3 in every case. The optimal policy is to pick any door at stage 0 and switch at stage 1, and the probability of winning is 2/3.
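The dynamic-programming conclusion can be sanity-checked by simulation; a small Monte Carlo sketch (doors indexed 0–2 for convenience):

```python
import random

# Monte Carlo check: switching should win the car about 2/3 of the time.
def play(switch, rng):
    car = rng.randrange(3)       # door hiding the car
    pick = rng.randrange(3)      # contestant's initial pick
    opened = next(d for d in range(3) if d != pick and d != car)  # host opens a goat door
    if switch:
        pick = next(d for d in range(3) if d != pick and d != opened)
    return pick == car

rng = random.Random(0)
n = 100_000
wins = sum(play(True, rng) for _ in range(n))
print(wins / n)   # close to 2/3
```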
Machine repair (Bertsekas' book, Section 5.1)

A machine can be in a proper state P or an improper state P̄. When the machine operates for one period in state P, it stays in P with probability 2/3 and it moves to P̄ with probability 1/3; once in P̄ it stays in P̄. At the end of each of two periods the machine is inspected, with two possible inspection outcomes: G (probably good) and B (probably bad). The inspection is imperfect:

  Prob[G|P] = Prob[B|P̄] = 3/4,   Prob[B|P] = Prob[G|P̄] = 1/4

After each inspection (decision 1 after the 1st inspection, decision 2 after the 2nd inspection) we choose between two actions: C (continue), which does not change the state, and S (stop and repair), which changes the state to P if it is in P̄. Repairing has a cost 1, and operating the machine in state P̄ has a cost 2.
Formally: xk ∈ {P, P̄}, uk ∈ {C, S}, k ∈ {0, 1}.

Dynamics: xk+1 = wk, with

  Prob[wk = P|xk = P, uk = C] = 2/3,   Prob[wk = P|xk = P̄, uk = C] = 0
  Prob[wk = P|xk = P, uk = S] = 2/3,   Prob[wk = P|xk = P̄, uk = S] = 2/3
  Prob[wk = P̄| · ] = 1 − Prob[wk = P| · ]

and initial distribution Prob[x0 = P] = 2/3, Prob[x0 = P̄] = 1/3.

Measurements: yk = vk, with

  Prob[vk = G|xk = P] = 3/4,   Prob[vk = G|xk = P̄] = 1/4,   Prob[vk = B| · ] = 1 − Prob[vk = G| · ]

Cost: g(x0, u0) + g(x1, u1), with

  g(P, C) = 0,   g(P, S) = 1,   g(P̄, S) = 1,   g(P̄, C) = 2

Information sets: I0 = (y0), I1 = (y0, y1, u0).

Goal: find µ0(I0), µ1(I1) to minimize E[g(x0, µ0(I0)) + g(x1, µ1(I1))].
At the last decision stage k = 1, assume that we know p1 = Prob[x1 = P̄|I1] (so that Prob[x1 = P|I1] = 1 − p1). For a given "state" p1, the cost-to-go is

  J1 = min_{u1} E[g(x1, u1)|I1]

For u1 = S the cost-to-go is 1, since g(P, S) = g(P̄, S) = 1. For u1 = C it is 2p1, since

  E[g(x1, C)|I1] = g(P̄, C) Prob[x1 = P̄|I1] + g(P, C) Prob[x1 = P|I1] = 2 × p1 + 0 × (1 − p1) = 2p1

Therefore

  J1(p1) = min(2p1, 1),   µ1(p1) = S if p1 > 1/2,  C if p1 ≤ 1/2
At the first decision stage k = 0, assume that we know p0 = Prob[x0 = P̄|I0] (from the 1st inspection) and recall that J1(p1) = min(2p1, 1). To determine how the decision u0 can influence the cost-to-go we need to understand how the state probability distribution at k = 1, parameterised by p1, depends on u0, taking into account that there will be a measurement y1. There are then four options, each leading to a probability distribution characterised by

  p1 = Φ0(p0, u0, y1) = 1/7                   if u0 = S, y1 = G
                        3/5                   if u0 = S, y1 = B
                        (1 + 2p0)/(7 − 4p0)   if u0 = C, y1 = G
                        (3 + 6p0)/(5 + 4p0)   if u0 = C, y1 = B
For example, for u0 = S, y1 = G, p1 is given by

  p1 = Prob[x1 = P̄|u0 = S, y1 = G]
     = Prob[y1 = G|u0 = S, x1 = P̄] Prob[x1 = P̄|u0 = S] / (Prob[y1 = G|u0 = S, x1 = P̄] Prob[x1 = P̄|u0 = S] + Prob[y1 = G|u0 = S, x1 = P] Prob[x1 = P|u0 = S])
     = (1/4 × 1/3) / (1/4 × 1/3 + 3/4 × 2/3) = 1/7

Another example, for u0 = C, y1 = B:

  p1 = Prob[y1 = B|u0 = C, x1 = P̄] Prob[x1 = P̄|u0 = C] / (Prob[y1 = B|u0 = C, x1 = P̄] Prob[x1 = P̄|u0 = C] + Prob[y1 = B|u0 = C, x1 = P] Prob[x1 = P|u0 = C])
     = (3/4)(p0 + (1 − p0)/3) / ((3/4)(p0 + (1 − p0)/3) + (1/4)(1 − p0)(2/3)) = (3 + 6p0)/(5 + 4p0)
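The four branches of Φ0 follow from Bayes' rule, and the fraction arithmetic can be replayed exactly in Python:

```python
from fractions import Fraction as F

# Belief update p1 = Phi0(p0, u0, y1). After C, Prob[x1 = Pbar] = p0 + (1 - p0)/3;
# after S the machine is repaired first, so Prob[x1 = Pbar] = 1/3.
def phi0(p0, u0, y1):
    p_bad = p0 + (1 - p0) * F(1, 3) if u0 == "C" else F(1, 3)
    like_bad = F(3, 4) if y1 == "B" else F(1, 4)    # Prob[y1 | x1 = Pbar]
    like_good = F(1, 4) if y1 == "B" else F(3, 4)   # Prob[y1 | x1 = P]
    num = like_bad * p_bad
    return num / (num + like_good * (1 - p_bad))

p0 = F(1, 4)   # an arbitrary prior, for illustration
print(phi0(p0, "S", "G"), phi0(p0, "S", "B"))   # 1/7 3/5
print(phi0(p0, "C", "G"), phi0(p0, "C", "B"))   # 1/4 3/4, i.e. (1+2p0)/(7-4p0), (3+6p0)/(5+4p0)
```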
Therefore, with

  Prob[y1 = G|p0, u0 = C] = (7 − 4p0)/12,   Prob[y1 = B|p0, u0 = C] = (5 + 4p0)/12
  Prob[y1 = G|p0, u0 = S] = 7/12,           Prob[y1 = B|p0, u0 = S] = 5/12

we obtain

  J0(p0) = min[ 2p0 + Prob[y1 = G|p0, u0 = C] J1(Φ0(p0, C, G)) + Prob[y1 = B|p0, u0 = C] J1(Φ0(p0, C, B)),
                1 + Prob[y1 = G|p0, u0 = S] J1(Φ0(p0, S, G)) + Prob[y1 = B|p0, u0 = S] J1(Φ0(p0, S, B)) ]
         = min[ 2p0 + (7 − 4p0)/12 J1((1 + 2p0)/(7 − 4p0)) + (5 + 4p0)/12 J1((3 + 6p0)/(5 + 4p0)),
                1 + 7/12 J1(1/7) + 5/12 J1(3/5) ]

which can be simplified to

  J0(p0) = (7 + 32p0)/12 if 0 ≤ p0 ≤ 3/8,   19/12 if 3/8 ≤ p0 ≤ 1
  µ0(p0) = C if p0 ≤ 3/8,   S if p0 > 3/8
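The whole two-stage recursion can be replayed with exact rational arithmetic (phi0 is the belief-update map derived above):

```python
from fractions import Fraction as F

def J1(p1):                      # stage-1 cost-to-go: min over u1 in {C, S}
    return min(2 * p1, F(1))

def phi0(p0, u0, y1):            # belief after (u0, y1), via Bayes' rule
    p_bad = p0 + (1 - p0) * F(1, 3) if u0 == "C" else F(1, 3)
    lb = F(3, 4) if y1 == "B" else F(1, 4)
    lg = F(1, 4) if y1 == "B" else F(3, 4)
    return lb * p_bad / (lb * p_bad + lg * (1 - p_bad))

def J0(p0):                      # stage-0 cost-to-go: continue vs. stop-and-repair
    cont = (2 * p0 + (7 - 4 * p0) / F(12) * J1(phi0(p0, "C", "G"))
                   + (5 + 4 * p0) / F(12) * J1(phi0(p0, "C", "B")))
    stop = 1 + F(7, 12) * J1(F(1, 7)) + F(5, 12) * J1(F(3, 5))
    return min(cont, stop)

print(J0(F(0)), J0(F(3, 8)), J0(F(1)))   # 7/12 19/12 19/12
```

This matches the closed form: (7 + 32p0)/12 below the threshold p0 = 3/8 and 19/12 above it.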
Piecewise affine cost-to-go

Consider again the dynamic model xk+1 = fk(xk, uk, wk) with output yk = hk(xk, nk), information sets I0 = (y0), Ik = (y0, y1, . . . , yk, u0, u1, . . . , uk−1) for k ≥ 1, and cost

  Jπ = E[ Σ_{k=0}^{h−1} gk(xk, µk(Ik), wk) + gh(xh) ]

Then there exist n-dimensional vectors αk^1, αk^2, . . . , αk^{ak} for each k such that

  Jh(ph) = αh^1 ph
  Jk(pk) = min{αk^1 pk, αk^2 pk, . . . , αk^{ak} pk}
  µk(pk) = arg min_{uk} E[gk(xk, uk) + Jk+1(pk+1)]
For simplicity assume the running cost is g(xk, uk) (it does not depend on time and disturbances) and the state, input and output live in fixed sets over time: xk ∈ {1, . . . , n}, uk ∈ {1, . . . , m}, yk ∈ {1, . . . , q}.

The statement is true for the last stage, since defining

  αh^1 = [gh(1) gh(2) . . . gh(n)]

we have

  Jh(ph) = E[gh(xh)|Ih] = Σ_{i=1}^n gh(i) Prob[xh = i|Ih] = αh^1 ph

where Prob[xh = i|Ih] = ph,i. Assuming the statement is true for k + 1, note that (conditioning on yk+1)

  Jk(pk) = min_{uk} E[g(xk, uk) + Jk+1(pk+1)|Ik]
         = min_{uk} ᾱ(uk) pk + E[Jk+1(pk+1)|Ik]
         = min_{uk} ᾱ(uk) pk + Σ_{ℓ=1}^q E[Jk+1(pk+1)|Ik, yk+1 = ℓ] Prob[yk+1 = ℓ|Ik]

where ᾱ(uk) = [g(1, uk) . . . g(n, uk)]. From the Bayes filter expressions derived earlier, when yk+1 = ℓ,

  pk+1 = qk+1 / Prob[yk+1 = ℓ|Ik],   qk+1 = D(yk+1) Pk(uk) pk
Replacing Jk+1(pk+1) = min{αk+1^1 pk+1, αk+1^2 pk+1, . . . , αk+1^{ak+1} pk+1} in this expression, and noticing that the normalization factor Prob[yk+1 = ℓ|Ik] cancels, we obtain

  Jk(pk) = min_{uk ∈ {1,...,m}} [ ᾱ(uk) pk + Σ_{ℓ=1}^q min{αk+1^1 D(ℓ)Pk(uk) pk, αk+1^2 D(ℓ)Pk(uk) pk, . . . , αk+1^{ak+1} D(ℓ)Pk(uk) pk} ]

which can be written as

  Jk(pk) = min{αk^1 pk, αk^2 pk, . . . , αk^{ak} pk}

for some αk^1, . . . , αk^{ak}.
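The proof is constructive: each backup maps the α-vectors at stage k+1 into (up to) m·a^q vectors at stage k — one per input and per choice of minimizer for each output value. A sketch with a small made-up model (the numbers are illustrative, loosely patterned on the earlier examples, not taken from the slides):

```python
import itertools
import numpy as np

# Exact alpha-vector backup: for each input u and each choice j(l) of minimizer
# per output l, emit the vector  abar(u) + sum_l alpha_{k+1}^{j(l)} D(l) P(u).
def backup(alphas, P, R, g):
    q = R.shape[0]
    new = []
    for u, P_u in enumerate(P):
        M = [np.diag(R[ell]) @ P_u for ell in range(q)]   # D(l) P(u) for each output l
        for choice in itertools.product(range(len(alphas)), repeat=q):
            new.append(g[:, u] + sum(alphas[j] @ M[ell] for ell, j in enumerate(choice)))
    return new

# Made-up model: 2 states, 2 inputs, 2 outputs.
P = [np.array([[0.8, 0.2], [0.2, 0.8]]), np.array([[1.0, 1.0], [0.0, 0.0]])]
R = np.array([[0.6, 1.0], [0.4, 0.0]])
g = np.array([[0.0, 1.0], [2.0, 1.0]])   # g[i, u]: running cost
alphas = [np.array([0.0, 2.0])]          # terminal vector [g_h(1), g_h(2)]
for _ in range(3):
    alphas = backup(alphas, P, R, g)
print(len(alphas))   # 2*1^2 = 2, then 2*2^2 = 8, then 2*8^2 = 128 vectors
```

In practice dominated vectors are pruned; without pruning the count explodes exactly as stated above.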
In summary, the optimal policy of a partial-information optimal control problem can be decomposed into a state estimator, computing the probability distribution of the state, and a decision maker. The cost-to-go functions are piecewise affine functions of this distribution, which is useful in several contexts, although the number of affine terms will typically increase exponentially with the time horizon.