

SLIDE 1

Learning convex bounds for linear quadratic control policy synthesis

Jack Umenberger Thomas B. Schön

SLIDE 2

NeurIPS 2018 Thu Dec 6th 05:00 -- 07:00 PM @ Room 210 & 230 #166

Learning to control

[Figure: data (observations of the dynamical system) → learning → control (stabilize the upright equilibrium position).]


SLIDE 14

Problem set-up

x_{t+1} = A x_t + B u_t + w_t,   w_t ∼ N(0, Π)
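The dynamics above can be rolled out to generate the kind of data used later in the talk. A minimal NumPy sketch; the system matrices, noise covariance, and input sequence are hypothetical placeholders, not values from the paper:

```python
import numpy as np

def simulate(A, B, Pi, u, x0, rng):
    """Roll out x_{t+1} = A x_t + B u_t + w_t with w_t ~ N(0, Pi)."""
    n = x0.shape[0]
    L = np.linalg.cholesky(Pi)          # w_t = L z_t with z_t ~ N(0, I)
    x = [x0]
    for u_t in u:
        w_t = L @ rng.standard_normal(n)
        x.append(A @ x[-1] + B @ u_t + w_t)
    return np.array(x)

# hypothetical 2-state, 1-input example
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Pi = 0.01 * np.eye(2)
rng = np.random.default_rng(0)
u = rng.standard_normal((50, 1))        # excitation inputs u_{0:T}
x = simulate(A, B, Pi, u, np.zeros(2), rng)
```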


SLIDE 16

Problem set-up

Goal: find a static state-feedback controller, u = Kx, to minimize

lim_{T→∞} (1/T) Σ_{t=0}^{T} E[x_t′ Q x_t + u_t′ R u_t],

subject to the dynamics x_{t+1} = A x_t + B u_t + w_t, w_t ∼ N(0, Π).

Challenge: we don't know the system parameters θ = {A, B, Π}.
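For known parameters θ this is the classical LQR problem. As background (not the paper's method), a sketch of solving it by Riccati value iteration, with a hypothetical double-integrator example; the average cost under noise w_t ∼ N(0, Π) is then trace(P Π):

```python
import numpy as np

def lqr(A, B, Q, R, iters=500):
    """Infinite-horizon discrete-time LQR via Riccati iteration.
    Returns the gain K (for u = K x) and the cost matrix P."""
    P = Q.copy()
    for _ in range(iters):
        S = R + B.T @ P @ B
        K = -np.linalg.solve(S, B.T @ P @ A)    # optimal gain for current P
        P = Q + A.T @ P @ A + A.T @ P @ B @ K   # Riccati update
    return K, P

# hypothetical double-integrator example
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K, P = lqr(A, B, np.eye(2), np.eye(1))
```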


SLIDE 22

Learning from data

x_{t+1} = A x_t + B u_t + w_t,   w_t ∼ N(0, Π)

Apply an input sequence u_{0:T}, record the resulting states x_{0:T}, and collect the data D := {u_{0:T}, x_{0:T}}.

From this data we can form the posterior belief over model parameters, posterior(θ|D). Instead of optimizing the cost for fixed parameters, cost(K|θ), we can optimize the expected cost over the posterior:

cost_avg(K) = ∫ cost(K|θ) posterior(θ|D) dθ
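The slide does not spell out how the posterior is formed. As an illustrative stand-in, a least-squares fit of θ = {A, B, Π} from D, which is the posterior mean under a Gaussian likelihood with a flat prior; this is an assumption for illustration, not the paper's exact construction:

```python
import numpy as np

def fit_theta(x, u):
    """Estimate theta = {A, B, Pi} from D = {u_{0:T}, x_{0:T}} by least
    squares on x_{t+1} = A x_t + B u_t + w_t."""
    Z = np.hstack([x[:-1], u])                  # regressors [x_t, u_t]
    Y = x[1:]                                   # targets x_{t+1}
    Theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    n = x.shape[1]
    A_hat = Theta.T[:, :n]
    B_hat = Theta.T[:, n:]
    resid = Y - Z @ Theta
    # unbiased residual covariance as an estimate of Pi
    Pi_hat = resid.T @ resid / max(len(Y) - Z.shape[1], 1)
    return A_hat, B_hat, Pi_hat
```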


SLIDE 27

Convex upper bounds

[Figure: cost vs. policy K, showing cost_avg(K), its Monte Carlo approximation cost_mc(K), and the convex upper bounds cost_bound(K|K^(k)) and cost_bound(K|K^(k+1)) built at successive iterates K^(k), K^(k+1).]

cost_avg(K) ≈ cost_mc(K) := (1/M) Σ_{i=1}^{M} cost(K|θ_i),   θ_i ∼ posterior(θ|D)
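The Monte Carlo estimate above can be sketched numerically: cost(K|θ_i) is the stationary cost trace(P Π_i), with P solving the closed-loop Lyapunov equation (here by fixed-point iteration). The sampled models θ_i are assumed given, and A_i + B_i K is assumed stable; both are assumptions of this sketch:

```python
import numpy as np

def lqr_cost(K, A, B, Q, R, Pi, iters=2000):
    """cost(K | theta): stationary average LQR cost trace(P @ Pi), where P
    solves the closed-loop Lyapunov equation P = Qcl + Acl' P Acl.
    Assumes A + B K has spectral radius < 1."""
    Acl = A + B @ K
    Qcl = Q + K.T @ R @ K
    P = np.zeros_like(Q)
    for _ in range(iters):                  # fixed-point Lyapunov iteration
        P = Qcl + Acl.T @ P @ Acl
    return np.trace(P @ Pi)

def cost_mc(K, thetas, Q, R):
    """cost_mc(K): average of cost(K | theta_i) over sampled models."""
    return np.mean([lqr_cost(K, A, B, Q, R, Pi) for (A, B, Pi) in thetas])
```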


SLIDE 32

Convexification

The crux of the problem is the matrix inequality

[ X_i − Q        (A_i + B_i K)′   K′     ]
[ A_i + B_i K    X_i^{-1}         0      ]  ⪰ 0,
[ K              0                R^{-1} ]

where A_i, B_i are known quantities (sampled model parameters) and X_i, K are the decision variables.

  • Replace the 'problematic' term X_i^{-1} with its first-order Taylor series (linear) approximation.
  • This leads to a new linear matrix inequality with a smaller feasible set.
  • Hence: a convex upper bound.
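The linearization step can be checked numerically: X ↦ X^{-1} is matrix convex on positive definite matrices, so its first-order Taylor expansion about any Y ≻ 0 is a global lower bound in the semidefinite order. Substituting this lower bound for X_i^{-1} therefore only tightens the inequality (a smaller feasible set), which is what yields the convex upper bound. A sketch with random test matrices:

```python
import numpy as np

def linearize_inverse(X, Y):
    """First-order Taylor expansion of X -> X^{-1} about Y:
    Y^{-1} - Y^{-1}(X - Y)Y^{-1} = 2 Y^{-1} - Y^{-1} X Y^{-1}.
    By matrix convexity of the inverse, this is a global lower bound."""
    Yi = np.linalg.inv(Y)
    return 2 * Yi - Yi @ X @ Yi

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
X = M @ M.T + np.eye(4)                 # random symmetric positive definite
N = rng.standard_normal((4, 4))
Y = N @ N.T + np.eye(4)
gap = np.linalg.inv(X) - linearize_inverse(X, Y)
min_eig = np.linalg.eigvalsh(gap).min() # should be >= 0 up to round-off
```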

SLIDE 33

Performance

[Figure: controller performance improves as more data is available for learning.]

SLIDE 34

Poster presentation

Poster #166

Today 05:00 -- 07:00 PM @ Room 210 & 230