Online Control with Adversarial Disturbances
Naman Agarwal Google AI Princeton Joint Work with Brian Bullins, Elad Hazan, Sham Kakade, Karan Singh
Dynamical Systems with Control
Robotics: x_{t+1} = f(x_t, u_t)
Linear dynamics: x_{t+1} = A x_t + B u_t + w_t   [Cohen et al '18]
Robustly Control a Noisy Linear Dynamical System
x_t: state; u_t: control
Disturbance w_t adversarially chosen (||w_t|| ≤ W)
Minimize costs: ∑_t c_t(x_t, u_t)
Adversarial vs. random disturbance; online convex costs vs. known quadratic loss
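The setting above can be made concrete with a short simulation: a linear system x_{t+1} = A x_t + B u_t + w_t with disturbances of norm at most W, accumulating quadratic costs. This is a minimal sketch; A, B, the zero controller, and the particular (norm-bounded) disturbance sequence are illustrative choices, not from the talk.

```python
import numpy as np

# Sketch: simulate x_{t+1} = A x_t + B u_t + w_t with ||w_t|| <= W
# and accumulate quadratic costs c_t(x_t, u_t) = ||x_t||^2 + ||u_t||^2.
n, m, T, W = 2, 1, 100, 0.5
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])

x = np.zeros(n)
total_cost = 0.0
for t in range(T):
    u = np.zeros(m)                       # placeholder control (no policy yet)
    # the adversary may pick ANY w_t with ||w_t|| <= W; here, a fixed pattern
    w = W * np.sign(np.sin(t)) * np.ones(n) / np.sqrt(n)
    total_cost += x @ x + u @ u           # quadratic cost at time t
    x = A @ x + B @ u + w                 # linear dynamics with disturbance
```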
Regret = ∑_{t=1}^{T} c_t(x_t, u_t) − min_K ∑_{t=1}^{T} c_t(x_t(K), K x_t(K))

Best linear policy knowing w_1 … w_T; optimal for LQR. u_t only knows w_1 … w_t.
Counterfactual regret: x_t(K) depends on K.
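The comparator's trajectory x_t(K) is counterfactual: it must be re-simulated under the same disturbance sequence w_1 … w_T with controls u_t = K x_t(K). A minimal sketch of computing this regret; A, B, the fixed K, and the disturbances are illustrative, not from the talk.

```python
import numpy as np

def rollout_cost(A, B, ws, policy):
    """Total quadratic cost of a policy on a FIXED disturbance sequence."""
    x, cost = np.zeros(A.shape[0]), 0.0
    for w in ws:
        u = policy(x)
        cost += x @ x + float(u @ u)
        x = A @ x + B @ u + w
    return cost

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [1.0]])
ws = [0.1 * rng.standard_normal(2) for _ in range(200)]   # same w_1..w_T for both

alg_cost = rollout_cost(A, B, ws, lambda x: np.zeros(1))  # the learner's controls
K = np.array([[0.0, -0.5]])
comp_cost = rollout_cost(A, B, ws, lambda x: K @ x)       # counterfactual x_t(K), K x_t(K)
regret = alg_cost - comp_cost
```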
Robust control: min over policies, max over disturbances
min_π max_{w_{1:T}} ∑_t c_t(x_t, u_t)

Compute vs. adaptivity
Efficient online algorithm: u_1 … u_T s.t.
∑_{t=1}^{T} c_t(x_t, u_t) − min_{K ∈ Linear} ∑_{t=1}^{T} c_t(x_t(K), K x_t(K)) is sublinear in T
Can we even figure out the best-in-hindsight policy? "Relaxed" policy class: next control is a linear function of previous disturbances.
Error-feedback policy: learn the change to the action via a "small horizon" of previous disturbances.
Efficient Reduction to Online Convex Optimization (OCO) with memory [Anava et al.]
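A disturbance-action controller of this kind can be sketched as u_t = ∑_{i=1}^{H} M_i w_{t-i}, with the matrices M_i updated by an online gradient step on the instantaneous cost. This simplified sketch keeps only the direct dependence of the cost on M through u_t (the full reduction accounts for the dependence through past states via OCO with memory); A, B, the horizon H, and the step size are illustrative assumptions.

```python
import numpy as np

# Sketch: learn u_t = sum_{i=1}^{H} M_i w_{t-i} by online gradient descent
# on c_t = ||x_{t+1}||^2 + ||u_t||^2, using only the direct dependence on M.
rng = np.random.default_rng(0)
n, m, H, T, lr = 2, 1, 5, 300, 0.01
A = np.array([[0.9, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [1.0]])

M = np.zeros((H, m, n))                     # policy matrices M_1..M_H
w_hist = [np.zeros(n) for _ in range(H)]    # "small horizon" of past disturbances
x = np.zeros(n)
for t in range(T):
    u = sum(M[i] @ w_hist[-1 - i] for i in range(H))   # control from past w's
    w = 0.1 * rng.standard_normal(n)        # observed disturbance (here random)
    x_next = A @ x + B @ u + w
    # d c_t / d u_t = 2 B^T x_{t+1} + 2 u_t; chain rule gives the M_i gradient
    g_u = 2 * (B.T @ x_next) + 2 * u
    for i in range(H):
        M[i] -= lr * np.outer(g_u, w_hist[-1 - i])
    w_hist.append(w)
    x = x_next
```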