

  1. Robust control for analysis and design of large-scale optimization algorithms. Laurent Lessard, University of Wisconsin–Madison. Joint work with Ben Recht and Andy Packard. LCCC Workshop on Large-Scale and Distributed Optimization, Lund University, June 15, 2017.

  2. 1. Many algorithms can be viewed as dynamical systems with feedback (control systems!): algorithm convergence ⇐⇒ system stability. 2. By solving a small convex program, we can recover state-of-the-art convergence results for these algorithms, automatically and efficiently. 3. The ultimate goal: to move from analysis to design.

  3. Unconstrained optimization: minimize f(x) over x ∈ ℝ^N. • We need algorithms that are fast and simple. • The currently favored family: first-order methods.

  4. Gradient method: x_{k+1} = x_k − α∇f(x_k)
Heavy ball method: x_{k+1} = x_k − α∇f(x_k) + β(x_k − x_{k−1})
Nesterov's accelerated method: y_k = x_k + β(x_k − x_{k−1}), x_{k+1} = y_k − α∇f(y_k)
[Figure: error vs. iteration on a log scale for the three methods, and their iterate paths over the contours of a quadratic f(x).]
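To make these updates concrete, here is a minimal simulation sketch (my own illustration, not from the talk); the step-size and momentum values are the standard textbook tunings for an m-strongly convex, L-smooth quadratic:

```python
import numpy as np

m, L = 1.0, 100.0
Q = np.diag(np.linspace(m, L, 10))      # eigenvalues span [m, L]
grad = lambda x: Q @ x                  # gradient of f(x) = 0.5 x'Qx; x* = 0

def run(method, steps=200):
    x_prev = x = np.ones(10)
    for _ in range(steps):
        if method == "gradient":
            x_new = x - (2 / (L + m)) * grad(x)
        elif method == "heavy_ball":
            a = 4 / (np.sqrt(L) + np.sqrt(m)) ** 2
            b = ((np.sqrt(L) - np.sqrt(m)) / (np.sqrt(L) + np.sqrt(m))) ** 2
            x_new = x - a * grad(x) + b * (x - x_prev)
        else:                            # Nesterov's accelerated method
            b = (np.sqrt(L) - np.sqrt(m)) / (np.sqrt(L) + np.sqrt(m))
            y = x + b * (x - x_prev)
            x_new = y - (1 / L) * grad(y)
        x_prev, x = x, x_new
    return np.linalg.norm(x)             # distance to the minimizer

for name in ("gradient", "heavy_ball", "nesterov"):
    print(f"{name:10s} error after 200 steps: {run(name):.2e}")
```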

  5. Robust algorithm selection. G ∈ 𝒢: the algorithm we're going to use; f ∈ S: the function we'd like to minimize.
G_opt = argmin_{G ∈ 𝒢} max_{f ∈ S} cost(f, G)
A similar problem for a finite number of iterations: • Drori, Teboulle (2012) • Taylor, Hendrickx, Glineur (2016)

  6. The algorithm class 𝒢:
Gradient method: x_{k+1} = x_k − α∇f(x_k)
Heavy ball method: x_{k+1} = x_k − α∇f(x_k) + β(x_k − x_{k−1})
Nesterov's accelerated method: x_{k+1} = x_k + β(x_k − x_{k−1}) − α∇f(x_k + β(x_k − x_{k−1}))
The analytically solvable case, f ∈ S: quadratic functions f(x) = ½xᵀQx − pᵀx with the constraint mI ⪯ Q ⪯ LI.
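When f is quadratic each method is a linear system, so the exact worst-case rate is the spectral radius of the iteration matrix, maximized over the eigenvalues of Q. A sketch of that computation for heavy ball (my own; the grid search over [m, L] is a simple, not especially efficient, choice):

```python
import numpy as np

def heavy_ball_rate(alpha, beta, m, L, grid=2000):
    """Worst-case rate of heavy ball over quadratics with spectrum in [m, L]."""
    rho = 0.0
    for lam in np.linspace(m, L, grid):
        T = np.array([[1 + beta - alpha * lam, -beta],   # iteration matrix
                      [1.0, 0.0]])
        rho = max(rho, max(abs(np.linalg.eigvals(T))))
    return rho

m, L = 1.0, 100.0
alpha = 4 / (np.sqrt(L) + np.sqrt(m)) ** 2               # standard tuning
beta = ((np.sqrt(L) - np.sqrt(m)) / (np.sqrt(L) + np.sqrt(m))) ** 2
print(heavy_ball_rate(alpha, beta, m, L))   # ~ (sqrt(L/m)-1)/(sqrt(L/m)+1)
```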

  7. Convergence rate on quadratic functions. Convergence rate ρ: ‖x_k − x⋆‖ ≤ Cρ^k‖x_0 − x⋆‖. Iterations to convergence ∝ 1/(−log ρ).
[Figure: convergence rate ρ and iterations to convergence vs. the condition ratio L/m for Gradient (quadratic), Nesterov (quadratic), and Heavy ball (quadratic).]

  8. Robust algorithm selection. G ∈ 𝒢: the algorithm we're going to use; f ∈ S: the function we'd like to minimize.
G_opt = argmin_{G ∈ 𝒢} max_{f ∈ S} cost(f, G)
The plan: 1. a mathematical representation for 𝒢; 2. a mathematical representation for S; 3. the main robustness result.

  9. Dynamical system interpretation. Heavy ball: x_{k+1} = x_k − α∇f(x_k) + β(x_k − x_{k−1}). Define u_k := ∇f(x_k) and p_k := x_{k−1}.
Algorithm (linear, known, decoupled):
[x_{k+1}; p_{k+1}] = [(1+β)I, −βI; I, 0][x_k; p_k] + [−αI; 0]u_k,   y_k = [I, 0][x_k; p_k]
Function (nonlinear, uncertain, coupled): u_k = ∇f(y_k).

  10. Dynamical system interpretation, coordinate by coordinate. The linear part decouples across coordinates i = 1, …, N:
[(x_{k+1})_i; (p_{k+1})_i] = [1+β, −β; 1, 0][(x_k)_i; (p_k)_i] + [−α; 0](u_k)_i,   (y_k)_i = [1, 0][(x_k)_i; (p_k)_i]
Only the function part couples the coordinates (nonlinear, uncertain, coupled): u_k = ∇f(y_k).

  11. The general form G: ξ_{k+1} = Aξ_k + Bu_k, y_k = Cξ_k, in feedback with u_k = ∇f(y_k). Per coordinate:
Gradient: [A B; C 0] = [1, −α; 1, 0]
Heavy ball: [A B; C 0] = [1+β, −β, −α; 1, 0, 0; 1, 0, 0]
Nesterov: [A B; C 0] = [1+β, −β, −α; 1, 0, 0; 1+β, −β, 0]
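A sketch (my own, following the per-coordinate form above) that builds these (A, B, C) realizations and runs the feedback loop u_k = ∇f(y_k); the scalar test function with gradient 2y is an arbitrary illustration:

```python
import numpy as np

def algo_matrices(method, alpha, beta=0.0):
    """Return the per-coordinate (A, B, C) realization from the table above."""
    if method == "gradient":
        return np.array([[1.0]]), np.array([[-alpha]]), np.array([[1.0]])
    A = np.array([[1 + beta, -beta], [1.0, 0.0]])
    B = np.array([[-alpha], [0.0]])
    C = np.array([[1.0, 0.0]]) if method == "heavy_ball" else np.array([[1 + beta, -beta]])
    return A, B, C

def simulate(method, grad, x0, alpha, beta=0.0, steps=100):
    A, B, C = algo_matrices(method, alpha, beta)
    xi = np.full((A.shape[0], 1), float(x0))   # state: current and past iterate
    for _ in range(steps):
        y = C @ xi                             # linear part produces y_k ...
        u = grad(y)                            # ... the function closes the loop
        xi = A @ xi + B @ u
    return xi[0, 0]

# Heavy ball on the scalar quadratic f(y) = y^2, so grad f(y) = 2y:
print(simulate("heavy_ball", lambda y: 2 * y, x0=5.0, alpha=0.3, beta=0.2))
```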

  12. The feedback interconnection of G with ∇f suggests a hierarchy of classes.
For the gradient ∇f(x): { linear } ⊂ { sector bounded + slope restricted } ⊂ { sector bounded }
For the function f(x): { quadratic } ⊂ { strongly convex + Lipschitz gradients } ⊂ { radially quasiconvex }

  13. Representing function classes: express them as quadratic constraints on (y, u). Simplest example: ∇f is a passive function, i.e. y_kᵀu_k ≥ 0 (the graph of ∇f lies in a sector).

  14. Representing function classes: express them as quadratic constraints on (y, u). ∇f is sector-bounded with slopes m and L:
[y_k; u_k]ᵀ[−2mL, m+L; m+L, −2][y_k; u_k] ≥ 0
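A quick numeric sanity check of this inequality (my addition): take a scalar quadratic f(y) = 2y², whose gradient has slope 4 ∈ [m, L], and evaluate the quadratic form on a grid.

```python
import numpy as np

m, L = 1.0, 10.0
M = np.array([[-2 * m * L, m + L],
              [m + L, -2.0]])
grad = lambda y: 4.0 * y                 # slope 4 lies in the sector [m, L]

for y in np.linspace(-5.0, 5.0, 11):
    z = np.array([y, grad(y)])           # z = (y_k, u_k)
    assert z @ M @ z >= -1e-9            # sector inequality z' M z >= 0
print("sector constraint satisfied on the grid")
```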

  15. Representing function classes: express them as quadratic constraints on (y, u). When ∇f is sector-bounded + slope-restricted (slopes m and L), the constraint on (y_k, u_k) depends on the history (y_0, …, y_{k−1}, u_0, …, u_{k−1}).

  16. Introduce extra dynamics: filter (y, u) through an auxiliary system Ψ with output z.
• Design the dynamics Ψ and a multiplier matrix M.
• Instead of using a pointwise quadratic q(u_k, y_k), use z_kᵀMz_k.
• There is a systematic way of doing this for strong convexity via Zames–Falb multipliers (1968).
• General theory: Integral Quadratic Constraints (Megretski & Rantzer, 1997).

  17. Putting it together: choose the algorithm G (Gradient, Heavy ball, or Nesterov, via its (A, B, C) realization from slide 11) and choose what we know about f (f quadratic, f strongly convex, or f quasiconvex), encoded as a filter-multiplier pair (Ψ, M).

  18. Main result. Problem data: G (the algorithm) and Ψ (what we know about f). Auxiliary quantities: compute the matrices (Â, B̂, Ĉ, D̂) from (G, Ψ) and choose a candidate rate 0 < ρ < 1. If there exists P ≻ 0 such that
[ÂᵀPÂ − ρ²P, ÂᵀPB̂; B̂ᵀPÂ, B̂ᵀPB̂] + [Ĉ D̂]ᵀM[Ĉ D̂] ⪯ 0
then ‖x_k − x⋆‖ ≤ √cond(P) · ρ^k · ‖x_0 − x⋆‖ for all k. The size of the LMI does not grow with the problem dimension (e.g. P ∈ S^{3×3} and LMI ∈ S^{4×4}).
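A minimal sketch of this feasibility test (assuming numpy and cvxpy are available; the variable names and the bisection wrapper are mine). For simplicity it uses the gradient method with the static sector IQC of slide 14, so Ψ is trivial and P is 1×1 rather than the 3×3 example above; λ is the usual nonnegative multiplier scaling the IQC:

```python
import numpy as np
import cvxpy as cp

m, L = 1.0, 10.0
alpha = 2 / (m + L)                          # gradient step size

# Gradient method as a one-state linear system (slide 11).
A = np.array([[1.0]])
B = np.array([[-alpha]])
C = np.array([[1.0]])

# Static sector IQC from slide 14; (C_hat, D_hat) just stacks (y, u) here.
M = np.array([[-2 * m * L, m + L],
              [m + L, -2.0]])
CD = np.block([[C, np.zeros((1, 1))],
               [np.zeros((1, 1)), np.eye(1)]])

def certified(rho):
    """Check feasibility of the slide-18 LMI for a candidate rate rho."""
    P = cp.Variable((1, 1), symmetric=True)
    lam = cp.Variable(nonneg=True)           # multiplier scaling the IQC
    lmi = cp.bmat([[A.T @ P @ A - rho**2 * P, A.T @ P @ B],
                   [B.T @ P @ A, B.T @ P @ B]]) + lam * (CD.T @ M @ CD)
    lmi = 0.5 * (lmi + lmi.T)                # symmetrize for the PSD constraint
    prob = cp.Problem(cp.Minimize(0),
                      [P >> 1e-6 * np.eye(1), lmi << -1e-9 * np.eye(2)])
    prob.solve()
    return prob.status in ("optimal", "optimal_inaccurate")

lo_r, hi_r = 0.0, 1.0                        # bisect on the candidate rate
for _ in range(30):
    mid = 0.5 * (lo_r + hi_r)
    lo_r, hi_r = (lo_r, mid) if certified(mid) else (mid, hi_r)
print(f"certified rate ~ {hi_r:.4f}; analytic (L-m)/(L+m) = {(L - m) / (L + m):.4f}")
```

Since the LMI is linear in (P, λ) for fixed ρ, bisecting on ρ is a natural way to recover the best certified rate.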

Main results: analytic and numerical

  20. Gradient method: x_{k+1} = x_k − α∇f(x_k). Analytic solution! The same rate holds for quadratics, strongly convex functions, and quasiconvex functions.
[Figure: convergence rate and iterations to convergence vs. condition ratio L/m; the Gradient curve now covers all functions in the class, plotted alongside Nesterov (quadratic) and Heavy ball (quadratic).]
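As a reminder of where the analytic rate comes from in the quadratic case (a standard derivation, my addition; the slide's point is that the LMI certifies the same ρ for the larger function classes):

```latex
x_{k+1} - x_\star = (I - \alpha Q)(x_k - x_\star)
\;\Longrightarrow\;
\rho = \max\{\,|1-\alpha m|,\ |1-\alpha L|\,\},
\qquad
\min_{\alpha}\rho = \frac{L-m}{L+m}
\ \text{ at }\ \alpha = \frac{2}{L+m}.
```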

  21. Nesterov's method: x_{k+1} = x_k − α∇f(x_k + β(x_k − x_{k−1})) + β(x_k − x_{k−1}).
• We cannot certify stability for quasiconvex functions.
• The IQC bound improves upon the best known bound for strongly convex functions!
[Figure: rate bounds and iteration counts vs. L/m for IQC (quasiconvex), IQC (strongly convex), Nesterov (strongly convex), Nesterov (quadratic), and Heavy ball (quadratic).]

  22. Heavy ball method: x_{k+1} = x_k − α∇f(x_k) + β(x_k − x_{k−1}).
• We cannot certify stability for quasiconvex functions.
• We cannot certify stability for strongly convex functions.
[Figure: rate bounds and iteration counts vs. L/m for IQC (quasiconvex), IQC (strongly convex), Nesterov (quadratic), and Heavy ball (quadratic).]

  23. The heavy ball method is not stable! Counterexample (condition ratio L/m = 25):
f(x) = (25/2)x²              for x < 1
f(x) = (1/2)x² + 24x − 12    for 1 ≤ x < 2
f(x) = (25/2)x² − 24x + 36   for x ≥ 2
Start the heavy ball iteration at x_0 = x_1 ∈ [3.07, 3.46].
• The heavy ball iterates converge to a limit cycle rather than to the minimizer.
• This is a simple counterexample to the Aizerman (1949) and Kalman (1957) conjectures.
[Figure: the function f(x) with the limit-cycle iterates marked.]
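A sketch reproducing the limit cycle (my own code; the talk does not list α and β on this slide, so I assume the standard heavy-ball tuning for m = 1, L = 25, i.e. α = 1/9 and β = 4/9):

```python
import numpy as np

def grad(x):                       # derivative of the piecewise quadratic above
    if x < 1.0:
        return 25.0 * x
    if x < 2.0:
        return x + 24.0
    return 25.0 * x - 24.0

m, L = 1.0, 25.0
alpha = 4 / (np.sqrt(L) + np.sqrt(m)) ** 2                          # = 1/9
beta = ((np.sqrt(L) - np.sqrt(m)) / (np.sqrt(L) + np.sqrt(m))) ** 2 # = 4/9

x_prev = x = 3.3                   # any start with x_0 = x_1 in [3.07, 3.46]
for k in range(500):
    x_prev, x = x, x - alpha * grad(x) + beta * (x - x_prev)
    if k >= 494:
        print(x)                   # the iterates cycle instead of reaching x* = 0
```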

  24. Uncharted territory: noise robustness and algorithm design.

  25. Noise robustness. Insert an uncertain block Δ_δ between the gradient and the algorithm G: with w_k = ∇f(y_k), the algorithm receives an input u_k corrupted by multiplicative noise, ‖u_k − w_k‖ ≤ δ‖w_k‖. How does an algorithm perform in the presence of noise?
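A sketch of one admissible noise realization (my own; the slide only fixes the constraint ‖u_k − w_k‖ ≤ δ‖w_k‖, so the random direction below is an arbitrary choice, whereas the talk's question concerns the worst case):

```python
import numpy as np

rng = np.random.default_rng(0)
m, L, delta = 1.0, 10.0, 0.3
Q = np.diag(np.linspace(m, L, 5))
grad = lambda x: Q @ x                     # f quadratic, minimizer x* = 0

x = np.ones(5)
alpha = 2 / (L + m)
for _ in range(200):
    w = grad(x)                            # exact gradient w_k
    d = rng.standard_normal(5)
    u = w + delta * np.linalg.norm(w) * d / np.linalg.norm(d)  # ||u-w|| = delta||w||
    x = x - alpha * u                      # gradient step with the noisy input
print(np.linalg.norm(x))                   # this random draw still converges;
                                           # worst-case noise can be far worse
```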
