Online Control with Adversarial Disturbances - Naman Agarwal, Google - PowerPoint PPT Presentation

SLIDE 1

Online Control with Adversarial Disturbances

Naman Agarwal, Google AI Princeton. Joint work with Brian Bullins, Elad Hazan, Sham Kakade, and Karan Singh.

SLIDE 2

Dynamical Systems with Control

x_{t+1} = f(x_t, u_t)

  • Robotics
  • Autonomous Vehicles
  • Data Center Cooling

[Cohen et al ‘18]

SLIDE 3

Our Setting

Robustly Control a Noisy Linear Dynamical System

x_{t+1} = A x_t + B u_t + w_t

  • Known Dynamics
  • Fully Observable State

x_t : State, u_t : Control

SLIDE 4

Our Setting

Robustly Control a Noisy Linear Dynamical System

x_{t+1} = A x_t + B u_t + w_t

  • Known Dynamics
  • Fully Observable State

x_t : State, u_t : Control

Disturbance w_t adversarially chosen (||w_t|| ≤ W)
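A minimal simulation sketch of this setting: a known linear system x_{t+1} = A x_t + B u_t + w_t driven by a norm-bounded disturbance under state feedback. The matrices A and B, the bound W, and the stabilizing gain K below are illustrative choices, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])  # known dynamics
B = np.array([[0.0], [0.1]])
K = np.array([[1.0, 2.0]])              # a stabilizing linear policy u_t = -K x_t
W = 0.5                                 # disturbance norm bound

x = np.zeros(2)
for t in range(100):
    u = -K @ x                          # fully observable state feedback
    w = rng.uniform(-1.0, 1.0, size=2)
    w = W * w / max(np.linalg.norm(w), 1e-12)  # rescale so that ||w_t|| = W
    x = A @ x + B @ u + w               # x_{t+1} = A x_t + B u_t + w_t

print(np.linalg.norm(x))                # bounded, since A - BK is stable
```

With bounded disturbances and a stabilizing K, the state norm stays bounded for all t, which is what makes regret against such policies a meaningful benchmark.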

SLIDE 5

Our Setting

Robustly Control a Noisy Linear Dynamical System

x_{t+1} = A x_t + B u_t + w_t

  • Known Dynamics
  • Fully Observable State

Minimize costs: ∑_t c_t(x_t, u_t)

  • Online and Adversarial
  • General Convex Function

x_t : State, u_t : Control

Disturbance w_t adversarially chosen (||w_t|| ≤ W)

SLIDE 6

Our Setting

Robustly Control a Noisy Linear Dynamical System

x_{t+1} = A x_t + B u_t + w_t

  • Known Dynamics
  • Fully Observable State

Minimize costs: ∑_t c_t(x_t, u_t)

  • Online and Adversarial
  • General Convex Function

x_t : State, u_t : Control

  • vs. Linear Quadratic Regulator (LQR):

Adversarial vs. random disturbances; online convex costs vs. a known quadratic loss

Disturbance w_t adversarially chosen (||w_t|| ≤ W)

SLIDE 7

Goal – Minimize Regret

  • Fixed time horizon T
  • Produce actions u_1, u_2, …, u_T to minimize regret w.r.t. the best policy in hindsight:

Regret = ∑_{t=1}^{T} c_t(x_t, u_t) − min_K ∑_{t=1}^{T} c_t(x_t(K), K x_t(K))

SLIDE 8

Goal – Minimize Regret

  • Fixed time horizon T
  • Produce actions u_1, u_2, …, u_T to minimize regret w.r.t. the best policy in hindsight:

Regret = ∑_{t=1}^{T} c_t(x_t, u_t) − min_K ∑_{t=1}^{T} c_t(x_t(K), K x_t(K))

The best linear policy is chosen knowing w_1 … w_T (linear policies are optimal for LQR); the learner's u_t only knows w_1 … w_t.

SLIDE 9

Goal – Minimize Regret

  • Fixed time horizon T
  • Produce actions u_1, u_2, …, u_T to minimize regret w.r.t. the best policy in hindsight:

Regret = ∑_{t=1}^{T} c_t(x_t, u_t) − min_K ∑_{t=1}^{T} c_t(x_t(K), K x_t(K))

The best linear policy is chosen knowing w_1 … w_T (linear policies are optimal for LQR); the learner's u_t only knows w_1 … w_t. Counterfactual regret: the comparator state x_t(K) depends on K.
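The counterfactual nature of this comparator can be made concrete numerically: every candidate K is rolled out on the same realized disturbance sequence, so its state sequence x_t(K) differs from the learner's. A hedged scalar sketch, with an illustrative quadratic cost and a grid search standing in for the hindsight minimization (this is not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, T = 0.9, 1.0, 200                  # scalar system x_{t+1} = a x_t + b u_t + w_t
ws = rng.uniform(-0.1, 0.1, size=T)      # the realized disturbances w_1 .. w_T

def rollout_cost(k):
    """Total cost sum_t c_t(x_t(k), u_t) under the fixed policy u_t = -k x_t."""
    x, total = 0.0, 0.0
    for t in range(T):
        u = -k * x
        total += x * x + u * u           # c_t(x, u) = x^2 + u^2
        x = a * x + b * u + ws[t]        # same w_t for every k: counterfactual state
    return total

alg_cost = rollout_cost(0.1)             # some policy the learner happened to play
# The comparator sees all of ws, so it can grid-search k in hindsight:
best_cost = min(rollout_cost(k) for k in np.linspace(-1.0, 1.0, 201))
regret = alg_cost - best_cost            # hindsight can only help here
print(regret)
```

Note that `rollout_cost` re-simulates the whole trajectory for each k; that re-rollout is exactly why c_t(x_t(K), K x_t(K)) depends on K.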

SLIDE 10
  • min-max problem, worst-case perturbation:

min_{u_{1:T}} max_{w_{1:T}} ∑_t c_t(x_t, u_t)

Previous work: H∞ control

  • Disturbance w_{1:T} adversarially chosen
SLIDE 11
  • min-max problem, worst-case perturbation:

min_{u_{1:T}} max_{w_{1:T}} ∑_t c_t(x_t, u_t)

Previous work: H∞ control

  • Disturbance w_{1:T} adversarially chosen

Compute:
  • Closed form for quadratics
  • Difficult for general costs

Adaptivity:
  • H∞ is pessimistic
  • Regret adapts to favorable disturbance sequences

SLIDE 12

Main Result

Efficient online algorithm producing u_1 … u_T s.t.

∑_{t=1}^{T} c_t(x_t, u_t) − min_{K ∈ Linear} ∑_{t=1}^{T} c_t(x_t^K, K x_t^K) ≤ O(√T)

  • Convexity through improper relaxation
  • Efficient → polynomial in system parameters, logarithmic in T
SLIDE 13

Outline of the approach

  • 1. Improper Learning:

Can we even figure out the best-in-hindsight policy? Use a "relaxed" policy class: the next control is a linear function of the previous disturbances w_t.

  • 2. Strong Stability ⇒

Error-feedback policy: learn the change to the action via a "small horizon" of previous disturbances.

  • 3. Small Horizon ⇒

Efficient reduction to Online Convex Optimization (OCO) with memory [Anava et al.]
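A toy sketch combining the three steps above: a disturbance-action policy u_t = -k x_t + Σ_i m_i w_{t-i} over a small horizon H, with the weights m updated by an online gradient step (a stand-in for the full OCO-with-memory machinery of Anava et al.). The scalar system, base gain, step size, and the simplified one-step gradient are all illustrative assumptions, not the paper's exact method.

```python
import numpy as np

a, b, k = 0.9, 1.0, 0.5        # known scalar dynamics and a stabilizing base gain
H, eta, T = 5, 0.01, 300       # small disturbance horizon, step size, rounds
m = np.zeros(H)                # learned weights on the last H disturbances
past_w = np.zeros(H)           # past_w[i] = w_{t-1-i}; recoverable because the
x = 0.0                        #   dynamics are known: w_t = x_{t+1} - a x_t - b u_t
costs = []

for t in range(T):
    u = -k * x + m @ past_w            # disturbance-action ("error feedback") policy
    cost = x * x + u * u               # instantaneous convex cost c_t(x_t, u_t)
    costs.append(cost)
    # One-step gradient of c_t w.r.t. m (through u only; a simplification of
    # the memory-aware gradient used in the actual reduction):
    m -= eta * (2.0 * u) * past_w
    w = 0.1 * np.sin(0.3 * t)          # some bounded disturbance sequence
    x = a * x + b * u + w
    past_w = np.roll(past_w, 1)
    past_w[0] = w

print(float(np.mean(costs)))           # stays small: the closed loop is stable
```

Because u_t is linear in the weights m and the past disturbances are fixed inputs, each c_t is convex in m, which is what makes the OCO reduction go through.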

SLIDE 14

Thank you! For more details, please visit the poster at Pacific Ballroom #155. Contact: namanagarwal@google.com