Learning convex bounds for linear quadratic control policy synthesis
Jack Umenberger Thomas B. Schön
NeurIPS 2018 Thu Dec 6th 05:00 -- 07:00 PM @ Room 210 & 230 #166
Learning to control: from data (observations of the dynamical system), via learning, to control (stabilize the upright equilibrium position).
We consider a linear dynamical system with state xt, input ut, and Gaussian process noise:

xt+1 = Axt + But + wt,  wt ∼ N(0, Π).
Goal: find a static state-feedback controller, u = Kx, to minimize the average infinite-horizon cost

lim_{T→∞} (1/T) Σ_{t=0}^{T} E[xt′Qxt + ut′Rut].
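For known parameters, this average cost has a closed form via the stationary state covariance of the closed loop. A minimal sketch (the system matrices below are illustrative, not from the talk):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_cost(K, A, B, Pi, Q, R):
    """Average infinite-horizon cost lim (1/T) sum E[x'Qx + u'Ru] under u = Kx."""
    A_cl = A + B @ K                       # closed-loop dynamics: x_{t+1} = A_cl x_t + w_t
    # Stationary covariance S solves the discrete Lyapunov equation
    # A_cl S A_cl' - S + Pi = 0 (requires A_cl stable).
    S = solve_discrete_lyapunov(A_cl, Pi)
    return np.trace((Q + K.T @ R @ K) @ S)

# Illustrative 2-state, 1-input system (hypothetical values)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Pi = 0.01 * np.eye(2)
Q, R = np.eye(2), np.eye(1)
K = np.array([[-5.0, -6.0]])               # a stabilizing gain for this system

print(lqr_cost(K, A, B, Pi, Q, R))
```

The Lyapunov-equation route is one standard way to evaluate cost(K|θ); the trace identity follows from E[x′Mx] = tr(M S) at stationarity.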
Challenge: we don't know the system parameters θ = {A, B, Π}.
To learn them, we excite the system with inputs u0:T and record the resulting states x0:T, giving the dataset D := {u0:T, x0:T}.

From this data we can form the posterior belief over the model parameters, posterior(θ|D). Instead of optimizing the cost for fixed parameters, cost(K|θ), we can optimize the expected cost over the posterior:

cost_avg(K) = ∫ cost(K|θ) posterior(θ|D) dθ.
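Drawing models from the posterior can be sketched concretely. Assuming a flat prior and known noise covariance Π (a simplification; the names `simulate` and `sample_models` are mine, not from the talk), the posterior over Θ = [A B] is matrix-normal around the least-squares estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(A, B, Pi, T):
    """Excite the system with random inputs and record the trajectory."""
    n, m = B.shape
    L = np.linalg.cholesky(Pi)
    x = np.zeros(n)
    xs, us = [x], []
    for _ in range(T):
        u = rng.normal(size=m)
        x = A @ x + B @ u + L @ rng.normal(size=n)
        xs.append(x)
        us.append(u)
    return np.array(xs), np.array(us)

def sample_models(xs, us, Pi, M):
    """Sample Theta_i = [A_i B_i] from the Gaussian posterior
    (flat prior, known Pi): Theta ~ MN(Theta_ls, Pi, (Z'Z)^{-1})."""
    Z = np.hstack([xs[:-1], us])           # regressors z_t = [x_t; u_t]
    Y = xs[1:]                             # targets x_{t+1}
    Ginv = np.linalg.inv(Z.T @ Z)
    Theta_ls = (Ginv @ Z.T @ Y).T          # least-squares estimate of [A B]
    Lu = np.linalg.cholesky(Pi)            # row covariance factor
    Lv = np.linalg.cholesky(Ginv)          # column covariance factor
    n, d = Theta_ls.shape
    return [Theta_ls + Lu @ rng.normal(size=(n, d)) @ Lv.T for _ in range(M)]

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Pi = 0.01 * np.eye(2)
xs, us = simulate(A, B, Pi, T=200)
models = sample_models(xs, us, Pi, M=50)
print(models[0])                           # one posterior sample of [A B]
```

Each sample splits back into (A_i, B_i) = (Θ_i[:, :n], Θ_i[:, n:]).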
This integral is intractable, so cost_avg(K) is approximated by a Monte Carlo average over M models sampled from the posterior:

cost_avg(K) ≈ cost_mc(K) := (1/M) Σ_{i=1}^{M} cost(K|θi),  θi ∼ posterior(θ|D).

cost_mc(K) is still a nonconvex function of the policy K. We minimize it by iteratively constructing convex upper bounds: at the current iterate K(k) we build a convex bound cost_bound(K|K(k)) ≥ cost_mc(K) that is tight at K(k), minimize the bound to obtain K(k+1), and repeat. Each such iteration is guaranteed not to increase cost_mc(K).
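The Monte Carlo approximation itself is just a sample average of per-model costs. A self-contained sketch, using small perturbations of a nominal model as stand-ins for posterior samples (hypothetical; the paper samples θi from posterior(θ|D)):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(1)

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Pi = 0.01 * np.eye(2)
Q, R = np.eye(2), np.eye(1)

def cost(K, Ai, Bi):
    """cost(K|theta_i): average LQR cost of K on model (A_i, B_i)."""
    A_cl = Ai + Bi @ K
    if np.max(np.abs(np.linalg.eigvals(A_cl))) >= 1:
        return np.inf                      # unstable closed loop: infinite cost
    S = solve_discrete_lyapunov(A_cl, Pi)
    return np.trace((Q + K.T @ R @ K) @ S)

# Stand-ins for M = 20 posterior samples theta_i = (A_i, B_i)
models = [(A + 0.01 * rng.normal(size=A.shape), B) for _ in range(20)]

def cost_mc(K):
    """Monte Carlo estimate: (1/M) sum_i cost(K | theta_i)."""
    return np.mean([cost(K, Ai, Bi) for Ai, Bi in models])

K = np.array([[-5.0, -6.0]])
print(cost_mc(K))
```

Note that a K must stabilize every sampled model for cost_mc(K) to be finite, which is why the synthesis works with a common feasible set across all M models.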
The crux of the problem is the matrix inequality

[ Xi − Q      (Ai + BiK)′   K′   ]
[ Ai + BiK    Xi⁻¹          0    ]  ⪰ 0,
[ K           0             R⁻¹  ]

where Ai, Bi, Q, and R are known quantities, and K and Xi are decision variables. The Xi⁻¹ block makes the feasible set nonconvex. Replacing Xi⁻¹ with its linear approximation about the current iterate yields a convex inner approximation of the feasible set, and hence a convex upper bound on the cost.
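The linear approximation of Xi⁻¹ can be written explicitly. Assuming X̄i denotes the value of Xi at the previous iterate (notation mine), matrix convexity of the inverse on positive definite matrices gives a global lower bound that is linear in Xi:

```latex
% For any fixed \bar{X}_i \succ 0, matrix convexity of X \mapsto X^{-1} gives
X_i^{-1} \succeq 2\bar{X}_i^{-1} - \bar{X}_i^{-1} X_i \bar{X}_i^{-1},
% with equality at X_i = \bar{X}_i. Substituting the right-hand side for the
% X_i^{-1} block shrinks that block in the semidefinite order, so any (K, X_i)
% feasible for the modified (now linear) matrix inequality is also feasible
% for the original one: a convex inner approximation, tight at the iterate.
```

This is the standard tangent-plane bound underlying the majorize-minimize scheme sketched above: tightness at X̄i is what guarantees the convex bound touches cost_mc at the current iterate.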
Takeaway: as more data becomes available for learning, the synthesized policies achieve better performance.