Online Control with Adversarial Disturbances - Naman Agarwal, Google - PowerPoint PPT Presentation

SLIDE 1

Online Control with Adversarial Disturbances

Naman Agarwal, Google AI Princeton. Joint work with Brian Bullins, Elad Hazan, Sham Kakade, and Karan Singh.

SLIDE 2

Dynamical Systems with Control

x_{t+1} = f(x_t, u_t)

  • Robotics
  • Autonomous Vehicles
  • Data Center Cooling

[Cohen et al ‘18]

SLIDE 3

Our Setting

Robustly Control a Noisy Linear Dynamical System

x_{t+1} = A x_t + B u_t + w_t

  • Known Dynamics
  • Fully Observable State

x_t : State, u_t : Control

SLIDE 4

Our Setting

Robustly Control a Noisy Linear Dynamical System

x_{t+1} = A x_t + B u_t + w_t

  • Known Dynamics
  • Fully Observable State

x_t : State, u_t : Control

Disturbance w_t adversarially chosen (||w_t|| ≤ W)
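A minimal simulation sketch of this setting: a known linear system x_{t+1} = A x_t + B u_t + w_t driven by a norm-bounded disturbance under state feedback. The matrices A and B, the bound W, and the stabilizing gain K below are illustrative choices, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])  # known dynamics
B = np.array([[0.0], [0.1]])
K = np.array([[1.0, 2.0]])              # a stabilizing linear policy u_t = -K x_t
W = 0.5                                 # disturbance norm bound

x = np.zeros(2)
for t in range(100):
    u = -K @ x                          # fully observable state feedback
    w = rng.uniform(-1.0, 1.0, size=2)
    w = W * w / max(np.linalg.norm(w), 1e-12)  # rescale so that ||w_t|| = W
    x = A @ x + B @ u + w               # x_{t+1} = A x_t + B u_t + w_t

print(np.linalg.norm(x))                # bounded, since A - BK is stable
```

With bounded disturbances and a stabilizing K, the state norm stays bounded for all t, which is what makes regret against such policies a meaningful benchmark.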

SLIDE 5

Our Setting

Robustly Control a Noisy Linear Dynamical System

x_{t+1} = A x_t + B u_t + w_t

  • Known Dynamics
  • Fully Observable State

Minimize costs: ∑_t c_t(x_t, u_t)

  • Online and Adversarial
  • General Convex Function

x_t : State, u_t : Control

Disturbance w_t adversarially chosen (||w_t|| ≤ W)

SLIDE 6

Our Setting

Robustly Control a Noisy Linear Dynamical System

x_{t+1} = A x_t + B u_t + w_t

  • Known Dynamics
  • Fully Observable State

Minimize costs: ∑_t c_t(x_t, u_t)

  • Online and Adversarial
  • General Convex Function

x_t : State, u_t : Control

  • vs. Linear Quadratic Regulator (LQR):

Adversarial vs. random disturbances; online convex costs vs. a known quadratic loss

Disturbance w_t adversarially chosen (||w_t|| ≤ W)

SLIDE 7

Goal – Minimize Regret

  • Fixed time horizon T
  • Produce actions u_1, u_2, …, u_T to minimize regret w.r.t. the best policy in hindsight:

Regret = ∑_{t=1}^{T} c_t(x_t, u_t) − min_K ∑_{t=1}^{T} c_t(x_t(K), K x_t(K))

SLIDE 8

Goal – Minimize Regret

  • Fixed time horizon T
  • Produce actions u_1, u_2, …, u_T to minimize regret w.r.t. the best policy in hindsight:

Regret = ∑_{t=1}^{T} c_t(x_t, u_t) − min_K ∑_{t=1}^{T} c_t(x_t(K), K x_t(K))

The best linear policy is chosen knowing w_1 … w_T (linear policies are optimal for LQR); the learner's u_t only knows w_1 … w_t.

SLIDE 9

Goal – Minimize Regret

  • Fixed time horizon T
  • Produce actions u_1, u_2, …, u_T to minimize regret w.r.t. the best policy in hindsight:

Regret = ∑_{t=1}^{T} c_t(x_t, u_t) − min_K ∑_{t=1}^{T} c_t(x_t(K), K x_t(K))

The best linear policy is chosen knowing w_1 … w_T (linear policies are optimal for LQR); the learner's u_t only knows w_1 … w_t. Counterfactual regret: the comparator state x_t(K) depends on K.
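The counterfactual nature of this comparator can be made concrete numerically: every candidate K is rolled out on the same realized disturbance sequence, so its state sequence x_t(K) differs from the learner's. A hedged scalar sketch, with an illustrative quadratic cost and a grid search standing in for the hindsight minimization (this is not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, T = 0.9, 1.0, 200                  # scalar system x_{t+1} = a x_t + b u_t + w_t
ws = rng.uniform(-0.1, 0.1, size=T)      # the realized disturbances w_1 .. w_T

def rollout_cost(k):
    """Total cost sum_t c_t(x_t(k), u_t) under the fixed policy u_t = -k x_t."""
    x, total = 0.0, 0.0
    for t in range(T):
        u = -k * x
        total += x * x + u * u           # c_t(x, u) = x^2 + u^2
        x = a * x + b * u + ws[t]        # same w_t for every k: counterfactual state
    return total

alg_cost = rollout_cost(0.1)             # some policy the learner happened to play
# The comparator sees all of ws, so it can grid-search k in hindsight:
best_cost = min(rollout_cost(k) for k in np.linspace(-1.0, 1.0, 201))
regret = alg_cost - best_cost            # hindsight can only help here
print(regret)
```

Note that `rollout_cost` re-simulates the whole trajectory for each k; that re-rollout is exactly why c_t(x_t(K), K x_t(K)) depends on K.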

SLIDE 10
  • min-max problem, worst-case perturbation:

min_{u_{1:T}} max_{w_{1:T}} ∑_t c_t(x_t, u_t)

Previous work: H∞ control

  • Disturbance w_{1:T} adversarially chosen
SLIDE 11
  • min-max problem, worst-case perturbation:

min_{u_{1:T}} max_{w_{1:T}} ∑_t c_t(x_t, u_t)

Previous work: H∞ control

  • Disturbance w_{1:T} adversarially chosen

Compute:
  • Closed form for quadratics
  • Difficult for general costs

Adaptivity:
  • H∞ is pessimistic
  • Regret adapts to favorable disturbance sequences

SLIDE 12

Main Result

Efficient online algorithm producing u_1 … u_T s.t.

∑_{t=1}^{T} c_t(x_t, u_t) − min_{K ∈ Linear} ∑_{t=1}^{T} c_t(x_t^K, K x_t^K) ≤ O(√T)

  • Convexity through improper relaxation
  • Efficient → polynomial in system parameters, logarithmic in T
SLIDE 13

Outline of the approach

  • 1. Improper Learning:

Can we even figure out the best-in-hindsight policy? Use a "relaxed" policy class: the next control is a linear function of the previous disturbances w_t.

  • 2. Strong Stability ⇒

Error-feedback policy: learn the change to the action via a "small horizon" of previous disturbances.

  • 3. Small Horizon ⇒

Efficient reduction to Online Convex Optimization (OCO) with memory [Anava et al.]
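A toy sketch combining the three steps above: a disturbance-action policy u_t = -k x_t + Σ_i m_i w_{t-i} over a small horizon H, with the weights m updated by an online gradient step (a stand-in for the full OCO-with-memory machinery of Anava et al.). The scalar system, base gain, step size, and the simplified one-step gradient are all illustrative assumptions, not the paper's exact method.

```python
import numpy as np

a, b, k = 0.9, 1.0, 0.5        # known scalar dynamics and a stabilizing base gain
H, eta, T = 5, 0.01, 300       # small disturbance horizon, step size, rounds
m = np.zeros(H)                # learned weights on the last H disturbances
past_w = np.zeros(H)           # past_w[i] = w_{t-1-i}; recoverable because the
x = 0.0                        #   dynamics are known: w_t = x_{t+1} - a x_t - b u_t
costs = []

for t in range(T):
    u = -k * x + m @ past_w            # disturbance-action ("error feedback") policy
    cost = x * x + u * u               # instantaneous convex cost c_t(x_t, u_t)
    costs.append(cost)
    # One-step gradient of c_t w.r.t. m (through u only; a simplification of
    # the memory-aware gradient used in the actual reduction):
    m -= eta * (2.0 * u) * past_w
    w = 0.1 * np.sin(0.3 * t)          # some bounded disturbance sequence
    x = a * x + b * u + w
    past_w = np.roll(past_w, 1)
    past_w[0] = w

print(float(np.mean(costs)))           # stays small: the closed loop is stable
```

Because u_t is linear in the weights m and the past disturbances are fixed inputs, each c_t is convex in m, which is what makes the OCO reduction go through.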

SLIDE 14

Thank you! For more details, please visit the poster at Pacific Ballroom #155. Contact: namanagarwal@google.com