Learning convex bounds for linear quadratic control policy synthesis
Jack Umenberger Thomas B. Schön
NeurIPS 2018 Thu Dec 6th 05:00 -- 07:00 PM @ Room 210 & 230 #166
Learning to control: from data (observations of the dynamical system), via learning, to control (stabilize the upright equilibrium position).
We consider a linear dynamical system with state xt, input ut, and Gaussian process noise:

xt+1 = Axt + But + wt,  wt ∼ N(0, Π).
Goal: find a static state-feedback controller, u = Kx, to minimize the average infinite-horizon cost

lim_{T→∞} (1/T) Σ_{t=0}^{T} E[xt′Qxt + ut′Rut].
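For known parameters, this average cost has a closed form via the stationary state covariance of the closed loop. A minimal sketch (the system matrices below are illustrative, not from the talk):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_cost(K, A, B, Pi, Q, R):
    """Average infinite-horizon cost lim (1/T) sum E[x'Qx + u'Ru] under u = Kx."""
    A_cl = A + B @ K                       # closed-loop dynamics: x_{t+1} = A_cl x_t + w_t
    # Stationary covariance S solves the discrete Lyapunov equation
    # A_cl S A_cl' - S + Pi = 0 (requires A_cl stable).
    S = solve_discrete_lyapunov(A_cl, Pi)
    return np.trace((Q + K.T @ R @ K) @ S)

# Illustrative 2-state, 1-input system (hypothetical values)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Pi = 0.01 * np.eye(2)
Q, R = np.eye(2), np.eye(1)
K = np.array([[-5.0, -6.0]])               # a stabilizing gain for this system

print(lqr_cost(K, A, B, Pi, Q, R))
```

The Lyapunov-equation route is one standard way to evaluate cost(K|θ); the trace identity follows from E[x′Mx] = tr(M S) at stationarity.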
Challenge: we don't know the system parameters θ = {A, B, Π}.
To learn them, we excite the system with inputs u0:T and record the resulting states x0:T, giving the dataset D := {u0:T, x0:T}.

From this data we can form the posterior belief over the model parameters, posterior(θ|D). Instead of optimizing the cost for fixed parameters, cost(K|θ), we can optimize the expected cost over the posterior:

cost_avg(K) = ∫ cost(K|θ) posterior(θ|D) dθ.
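Drawing models from the posterior can be sketched concretely. Assuming a flat prior and known noise covariance Π (a simplification; the names `simulate` and `sample_models` are mine, not from the talk), the posterior over Θ = [A B] is matrix-normal around the least-squares estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(A, B, Pi, T):
    """Excite the system with random inputs and record the trajectory."""
    n, m = B.shape
    L = np.linalg.cholesky(Pi)
    x = np.zeros(n)
    xs, us = [x], []
    for _ in range(T):
        u = rng.normal(size=m)
        x = A @ x + B @ u + L @ rng.normal(size=n)
        xs.append(x)
        us.append(u)
    return np.array(xs), np.array(us)

def sample_models(xs, us, Pi, M):
    """Sample Theta_i = [A_i B_i] from the Gaussian posterior
    (flat prior, known Pi): Theta ~ MN(Theta_ls, Pi, (Z'Z)^{-1})."""
    Z = np.hstack([xs[:-1], us])           # regressors z_t = [x_t; u_t]
    Y = xs[1:]                             # targets x_{t+1}
    Ginv = np.linalg.inv(Z.T @ Z)
    Theta_ls = (Ginv @ Z.T @ Y).T          # least-squares estimate of [A B]
    Lu = np.linalg.cholesky(Pi)            # row covariance factor
    Lv = np.linalg.cholesky(Ginv)          # column covariance factor
    n, d = Theta_ls.shape
    return [Theta_ls + Lu @ rng.normal(size=(n, d)) @ Lv.T for _ in range(M)]

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Pi = 0.01 * np.eye(2)
xs, us = simulate(A, B, Pi, T=200)
models = sample_models(xs, us, Pi, M=50)
print(models[0])                           # one posterior sample of [A B]
```

Each sample splits back into (A_i, B_i) = (Θ_i[:, :n], Θ_i[:, n:]).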
This integral is intractable, so cost_avg(K) is approximated by a Monte Carlo average over M models sampled from the posterior:

cost_avg(K) ≈ cost_mc(K) := (1/M) Σ_{i=1}^{M} cost(K|θi),  θi ∼ posterior(θ|D).

cost_mc(K) is still a nonconvex function of the policy K. We minimize it by iteratively constructing convex upper bounds: at the current iterate K(k) we build a convex bound cost_bound(K|K(k)) ≥ cost_mc(K) that is tight at K(k), minimize the bound to obtain K(k+1), and repeat. Each such iteration is guaranteed not to increase cost_mc(K).
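The Monte Carlo approximation itself is just a sample average of per-model costs. A self-contained sketch, using small perturbations of a nominal model as stand-ins for posterior samples (hypothetical; the paper samples θi from posterior(θ|D)):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(1)

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Pi = 0.01 * np.eye(2)
Q, R = np.eye(2), np.eye(1)

def cost(K, Ai, Bi):
    """cost(K|theta_i): average LQR cost of K on model (A_i, B_i)."""
    A_cl = Ai + Bi @ K
    if np.max(np.abs(np.linalg.eigvals(A_cl))) >= 1:
        return np.inf                      # unstable closed loop: infinite cost
    S = solve_discrete_lyapunov(A_cl, Pi)
    return np.trace((Q + K.T @ R @ K) @ S)

# Stand-ins for M = 20 posterior samples theta_i = (A_i, B_i)
models = [(A + 0.01 * rng.normal(size=A.shape), B) for _ in range(20)]

def cost_mc(K):
    """Monte Carlo estimate: (1/M) sum_i cost(K | theta_i)."""
    return np.mean([cost(K, Ai, Bi) for Ai, Bi in models])

K = np.array([[-5.0, -6.0]])
print(cost_mc(K))
```

Note that a K must stabilize every sampled model for cost_mc(K) to be finite, which is why the synthesis works with a common feasible set across all M models.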
The crux of the problem is the matrix inequality

[ Xi − Q      (Ai + BiK)′   K′   ]
[ Ai + BiK    Xi⁻¹          0    ]  ⪰ 0,
[ K           0             R⁻¹  ]

where Ai, Bi, Q, and R are known quantities, and K and Xi are decision variables. The Xi⁻¹ block makes the feasible set nonconvex. Replacing Xi⁻¹ with its linear approximation about the current iterate yields a convex inner approximation of the feasible set, and hence a convex upper bound on the cost.
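The linear approximation of Xi⁻¹ can be written explicitly. Assuming X̄i denotes the value of Xi at the previous iterate (notation mine), matrix convexity of the inverse on positive definite matrices gives a global lower bound that is linear in Xi:

```latex
% For any fixed \bar{X}_i \succ 0, matrix convexity of X \mapsto X^{-1} gives
X_i^{-1} \succeq 2\bar{X}_i^{-1} - \bar{X}_i^{-1} X_i \bar{X}_i^{-1},
% with equality at X_i = \bar{X}_i. Substituting the right-hand side for the
% X_i^{-1} block shrinks that block in the semidefinite order, so any (K, X_i)
% feasible for the modified (now linear) matrix inequality is also feasible
% for the original one: a convex inner approximation, tight at the iterate.
```

This is the standard tangent-plane bound underlying the majorize-minimize scheme sketched above: tightness at X̄i is what guarantees the convex bound touches cost_mc at the current iterate.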
Takeaway: as more data becomes available for learning, the synthesized policies achieve better performance.