
SLIDE 1

Motivation Contemporary Analyses Partitioning Regularization Methods Summary

How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization

Frank E. Curtis, Lehigh University; joint work with Daniel P. Robinson, Johns Hopkins University

U.S.-Mexico Workshop on Optimization and its Applications

8 January 2018

How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization 1 of 32

SLIDE 2

Thanks, Don!

SLIDE 3

Outline

Motivation
Contemporary Analyses
Partitioning the Search Space
Behavior of Regularization Methods
Summary & Perspectives

SLIDE 4

Outline

Motivation
Contemporary Analyses
Partitioning the Search Space
Behavior of Regularization Methods
Summary & Perspectives

SLIDE 5

History

Nonlinear optimization has had parallel developments:

◮ convexity (Rockafellar, Fenchel, Nemirovski, Nesterov): the subgradient inequality; convergence and complexity guarantees
◮ smoothness (Powell, Fletcher, Goldfarb, Nocedal): sufficient decrease; convergence and fast local convergence

Worlds are (finally) colliding!

SLIDE 6

Worst-case complexity for nonconvex optimization

Here is how we do it now, assuming Lipschitz continuity of derivatives: what is an upper bound on the number of iterations until ‖∇f(x_k)‖_2 ≤ ε?

Gradient descent: O(ε^{-2})
Newton / trust region: O(ε^{-2})
Cubic regularization: O(ε^{-3/2})

SLIDE 7

Self-examination

But...

◮ Is this the best way to characterize our algorithms?
◮ Is this the best way to represent our algorithms?

SLIDE 8

Self-examination

But...

◮ Is this the best way to characterize our algorithms?
◮ Is this the best way to represent our algorithms?

People listen! Cubic regularization. . .

◮ Griewank (1981)
◮ Nesterov & Polyak (2006)
◮ Weiser, Deuflhard, Erdmann (2007)
◮ Cartis, Gould, Toint (2011), the ARC method

. . . is a framework to which researchers have been attracted. . .

◮ Agarwal, Allen-Zhu, Bullins, Hazan, Ma (2017)
◮ Carmon, Duchi (2017)
◮ Kohler, Lucchi (2017)
◮ Peng, Roosta-Khorasani, Mahoney (2017)

However, there remains a large gap between theory and practice!

SLIDE 9

Purpose of this talk

Our goal: A complementary approach to characterize algorithms.

◮ global convergence
◮ worst-case complexity, contemporary type + our approach
◮ local convergence rate

SLIDE 10

Purpose of this talk

Our goal: A complementary approach to characterize algorithms.

◮ global convergence
◮ worst-case complexity, contemporary type + our approach
◮ local convergence rate

We’re admitting: Our approach does not give the complete picture. But we believe it is useful!

SLIDE 11

Purpose of this talk

Our goal: A complementary approach to characterize algorithms.

◮ global convergence
◮ worst-case complexity, contemporary type + our approach
◮ local convergence rate

We’re admitting: Our approach does not give the complete picture. But we believe it is useful! Nonconvexity is difficult in every sense!

◮ Can we accept a characterization strategy with some (literal) holes?
◮ Or should we be purists, even if we throw out the baby with the bathwater?

SLIDE 12

Outline

Motivation
Contemporary Analyses
Partitioning the Search Space
Behavior of Regularization Methods
Summary & Perspectives

SLIDE 13

Simple setting

Consider the iteration

x_{k+1} ← x_k − (1/L) g_k for all k ∈ ℕ,

where g_k := ∇f(x_k) and g := ∇f.

A contemporary complexity analysis considers the set

G(ε_g) := {x ∈ ℝ^n : ‖g(x)‖_2 ≤ ε_g}

and aims to find an upper bound on the cardinality of

K_g(ε_g) := {k ∈ ℕ : x_k ∉ G(ε_g)}.

SLIDE 14

Upper bound on |K_g(ε_g)|

Using s_k = −(1/L) g_k and the upper bound

f_{k+1} ≤ f_k + g_k^T s_k + (L/2)‖s_k‖_2^2,

one finds, with f_inf := inf_{x ∈ ℝ^n} f(x), that

f_k − f_{k+1} ≥ (1/(2L)) ‖g_k‖_2^2
⟹ f_0 − f_inf ≥ (1/(2L)) |K_g(ε_g)| ε_g^2
⟹ |K_g(ε_g)| ≤ 2L(f_0 − f_inf) ε_g^{-2}.
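The chain of inequalities above is easy to check numerically. Below is a minimal sketch (not from the slides; the test function f(x) = cos(x), the starting point, and the tolerance are our own choices) that runs the iteration x_{k+1} ← x_k − (1/L)g_k and verifies that the number of iterations with ‖g_k‖_2 > ε_g stays below 2L(f_0 − f_inf)ε_g^{-2}.

```python
import math

def f(x): return math.cos(x)     # smooth, nonconvex, f_inf = -1
def g(x): return -math.sin(x)    # gradient; Lipschitz constant L = 1

L, eps_g, x = 1.0, 1e-2, 1.0
f0, f_inf = f(x), -1.0

large_grad_iters = 0
for k in range(100000):
    if abs(g(x)) <= eps_g:       # stop once the gradient is small
        break
    large_grad_iters += 1
    x = x - (1.0 / L) * g(x)     # x_{k+1} <- x_k - (1/L) g_k

# |K_g(eps_g)| <= 2 L (f0 - f_inf) eps_g^{-2}
bound = 2.0 * L * (f0 - f_inf) * eps_g ** -2
print(large_grad_iters, "<=", bound)
```

On this example the iteration needs only a handful of steps, far below the worst-case bound of roughly 3×10^4, already hinting at how pessimistic the bound can be.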

SLIDE 15

“Nice” f

But what if f is “nice”? E.g., satisfying the Polyak–Łojasiewicz (PL) condition for some c ∈ (0, ∞), i.e.,

f(x) − f_inf ≤ (1/(2c)) ‖g(x)‖_2^2 for all x ∈ ℝ^n.

Now consider the set

F(ε_f) := {x ∈ ℝ^n : f(x) − f_inf ≤ ε_f}

and consider an upper bound on the cardinality of

K_f(ε_f) := {k ∈ ℕ : x_k ∉ F(ε_f)}.
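As a concrete illustration of the PL condition (our own example, not from the slides): f(x) = x² + 3 sin²(x) is a standard nonconvex function that nonetheless satisfies PL. The sketch below estimates an admissible constant c empirically by sampling the ratio ‖g(x)‖_2^2 / (2(f(x) − f_inf)) on a grid.

```python
import math

def f(x): return x * x + 3.0 * math.sin(x) ** 2     # nonconvex but PL; f_inf = 0 at x = 0
def g(x): return 2.0 * x + 3.0 * math.sin(2.0 * x)  # f'(x)

f_inf = 0.0
ratios = []
for i in range(4001):
    x = -10.0 + 0.005 * i                           # grid over [-10, 10]
    gap = f(x) - f_inf
    if gap > 1e-12:                                 # skip the minimizer itself
        # PL requires this ratio to be bounded below by some c > 0
        ratios.append(g(x) ** 2 / (2.0 * gap))

c_empirical = min(ratios)
print("empirical PL constant over the grid:", c_empirical)
```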

SLIDE 16

Upper bound on |K_f(ε_f)|

Using s_k = −(1/L) g_k and the upper bound

f_{k+1} ≤ f_k + g_k^T s_k + (L/2)‖s_k‖_2^2,

one finds that

f_k − f_{k+1} ≥ (1/(2L)) ‖g_k‖_2^2 ≥ (c/L)(f_k − f_inf)
⟹ (1 − c/L)(f_k − f_inf) ≥ f_{k+1} − f_inf
⟹ (1 − c/L)^k (f_0 − f_inf) ≥ f_k − f_inf
⟹ |K_f(ε_f)| ≤ log((f_0 − f_inf)/ε_f) · [log(L/(L − c))]^{-1}.
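The geometric decrease derived above can be observed directly. A small sketch (our own parameter choices, anticipating the example two slides ahead: f(x) = x²/2, x_0 = 10, c = 1, L = 2) counts the iterations with f(x_k) − f_inf > ε_f and compares against the log bound:

```python
import math

# f(x) = x^2/2 satisfies PL with c = 1 (f(x) - f_inf = g(x)^2/2); take L = 2 >= f''.
def f(x): return 0.5 * x * x
def g(x): return x

L, c, x, f_inf = 2.0, 1.0, 10.0, 0.0
eps_f = 1e-4
f0 = f(x)

count = 0                                  # iterations with f(x_k) - f_inf > eps_f
while f(x) - f_inf > eps_f:
    count += 1
    x = x - (1.0 / L) * g(x)

# |K_f(eps_f)| <= log((f0 - f_inf)/eps_f) / log(L/(L - c))
bound = math.log((f0 - f_inf) / eps_f) / math.log(L / (L - c))
print(count, "<=", bound)                  # → 10 <= ~18.9
```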

SLIDE 17

For the first step. . .

In the “general nonconvex” analysis, the expected decrease for the first step is much more pessimistic:

general nonconvex: f_0 − f_1 ≥ (1/(2L)) ε_g^2
PL condition: (1 − c/L)(f_0 − f_inf) ≥ f_1 − f_inf

...and it remains more pessimistic throughout!

SLIDE 18

Upper bounds on |K_f(ε_f)| versus |K_g(ε_g)|

Let f(x) = (1/2)x^2, meaning that g(x) = x.

◮ Let ε_f = (1/2)ε_g^2, meaning that F(ε_f) = G(ε_g).
◮ Let x_0 = 10, c = 1, and L = 2. (Similar pictures for any L > 1.)
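The gap between the two bounds for this example can be tabulated directly (a sketch using only the constants stated above):

```python
import math

# Slide example: f(x) = x^2/2, g(x) = x, x0 = 10, c = 1, L = 2,
# with eps_f = eps_g^2 / 2 so that F(eps_f) = G(eps_g).
x0, c, L = 10.0, 1.0, 2.0
f0, f_inf = 0.5 * x0 * x0, 0.0

for eps_g in (1e-1, 1e-2, 1e-3):
    eps_f = 0.5 * eps_g ** 2
    general = 2.0 * L * (f0 - f_inf) * eps_g ** -2                  # O(eps^-2) bound
    pl = math.log((f0 - f_inf) / eps_f) / math.log(L / (L - c))     # log bound under PL
    # e.g. for eps_g = 0.01: general bound ~2e6 vs PL bound ~19.9
    print(f"eps_g={eps_g:g}  general bound={general:.0f}  PL bound={pl:.1f}")
```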

SLIDE 19

Upper bounds on |K_f(ε_f)| versus |{k ∈ ℕ : (1/2)‖g_k‖_2^2 > ε_g}|

Let f(x) = (1/2)x^2, meaning that (1/2)g(x)^2 = (1/2)x^2.

◮ Let ε_f = ε_g, meaning that F(ε_f) = G(ε_g).
◮ Let x_0 = 10, c = 1, and L = 2. (Similar pictures for any L > 1.)

SLIDE 20

Bad worst-case!

Worst-case complexity bounds in the general nonconvex case are very pessimistic.

◮ The analysis immediately admits a large gap when the function is nice.
◮ The “essentially tight” examples for the worst-case bounds are... weird.¹

¹ Cartis, Gould, Toint (2010)

SLIDE 21

Plea

Let’s not have these be the problems that dictate how we

◮ characterize our algorithms and
◮ represent our algorithms to the world!

SLIDE 22

Outline

Motivation
Contemporary Analyses
Partitioning the Search Space
Behavior of Regularization Methods
Summary & Perspectives

SLIDE 23

Motivation

We want a characterization strategy that

◮ attempts to capture behavior in actual practice
  ◮ i.e., is not “bogged down” by pedagogical examples
◮ can be applied consistently across different classes of functions
◮ shows more than just the worst of the worst case

SLIDE 24

Motivation

We want a characterization strategy that

◮ attempts to capture behavior in actual practice
  ◮ i.e., is not “bogged down” by pedagogical examples
◮ can be applied consistently across different classes of functions
◮ shows more than just the worst of the worst case

Our idea is to

◮ partition the search space (dependent on f and x_0)
◮ analyze how an algorithm behaves over different regions
◮ characterize an algorithm’s behavior by region

For some functions, there will be holes, but for some of interest there are none!

SLIDE 25

Intuition

Think about an arbitrary point in the search space, i.e., in L := {x ∈ ℝ^n : f(x) ≤ f(x_0)}.

◮ If ‖g(x)‖_2 ≫ 0, then “a lot” of progress can be made.
◮ If min(eig(∇²f(x))) ≪ 0, then “a lot” of progress can also be made.

SLIDE 26

Assumption

Assumption 1

◮ f is p̄-times continuously differentiable
◮ f is bounded below by f_inf := inf_{x ∈ ℝ^n} f(x)
◮ for all p ∈ {1, ..., p̄}, there exists L_p ∈ (0, ∞) such that

f(x + s) ≤ t_p(x, s) + (L_p/(p + 1)) ‖s‖_2^{p+1},

where t_p(x, s) := f(x) + Σ_{j=1}^{p} (1/j!) ∇^j f(x)[s]^j is the pth-order Taylor expansion of f at x.

SLIDE 27

pth-order term reduction

Definition 2

For each p ∈ {1, ..., p̄}, define the function

m_p(x, s) := (1/p!) ∇^p f(x)[s]^p + (r_p/(p + 1)) ‖s‖_2^{p+1}.

Letting s_{m_p}(x) := arg min_{s ∈ ℝ^n} m_p(x, s), the reduction in the pth-order term from x is

Δm_p(x) := m_p(x, 0) − m_p(x, s_{m_p}(x)) ≥ 0.

*The exact definition of r_p is not complicated, but we skip it here.
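For p = 1 the reduction has a simple closed form: m_1(x, s) = ∇f(x)^T s + (r_1/2)‖s‖_2^2 is minimized by s = −∇f(x)/r_1, giving Δm_1(x) = ‖∇f(x)‖_2^2 / (2 r_1). The sketch below checks this (the gradient vector and the value of r_1 are arbitrary stand-ins):

```python
import random

random.seed(0)
r1 = 2.0
gx = [3.0, -4.0]                       # stand-in gradient ∇f(x); ||g||_2 = 5

def m1(s):                             # m_1(x, s) = g^T s + (r1/2) ||s||^2
    return sum(gi * si for gi, si in zip(gx, s)) + 0.5 * r1 * sum(si * si for si in s)

# Closed form: minimizer s* = -g/r1, reduction Δm_1 = ||g||^2 / (2 r1)
s_star = [-gi / r1 for gi in gx]
delta_m1 = sum(gi * gi for gi in gx) / (2.0 * r1)

assert abs((m1([0.0, 0.0]) - m1(s_star)) - delta_m1) < 1e-12
for _ in range(1000):                  # no sampled s does better than s*
    s = [random.uniform(-5, 5), random.uniform(-5, 5)]
    assert m1(s) >= m1(s_star) - 1e-12
print("Δm1 =", delta_m1)               # → 25 / 4 = 6.25
```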

SLIDE 28

Regions

We propose to partition the search space, given (κ, f_ref) ∈ (0, 1) × [f_inf, f(x_0)), into

R_1 := {x ∈ L : Δm_1(x) ≥ κ(f(x) − f_ref)},

R_p := {x ∈ L : Δm_p(x) ≥ κ(f(x) − f_ref)} \ ⋃_{j=1}^{p−1} R_j for all p ∈ {2, ..., p̄},

and R := L \ ⋃_{j=1}^{p̄} R_j.

*We don’t need f_ref = f_inf, but, for simplicity, think of it that way here.
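In one dimension the partition can be computed explicitly, since Δm_1(x) = g(x)²/(2 r_1) and, for p = 2, Δm_2(x) = max(0, −f″(x))³/(6 r_2²). The sketch below classifies a grid of points in the sublevel set; the function f(x) = x⁴ − x², the constants κ, r_1, r_2, and the choice f_ref = f_inf are all our own, and for this strict-saddle-like function every grid point lands in R_1 or R_2.

```python
# Classify points of L = {x : f(x) <= f(x0)} for f(x) = x^4 - x^2 using
#   Δm_1(x) = g(x)^2 / (2 r1),  Δm_2(x) = max(0, -h(x))^3 / (6 r2^2),
# where g = f', h = f''. (Δm_2 = 0 whenever h(x) >= 0.)
def f(x): return x**4 - x**2
def gr(x): return 4.0 * x**3 - 2.0 * x
def h(x): return 12.0 * x**2 - 2.0

f_inf, x0 = -0.25, 1.2                     # minimizers at ±1/sqrt(2); saddle at 0
kappa, r1, r2 = 0.1, 1.0, 1.0

n_R1 = n_R2 = n_rest = 0
for i in range(2401):
    x = -1.2 + 0.001 * i
    if f(x) > f(x0):                       # outside the sublevel set L
        continue
    gap = kappa * (f(x) - f_inf)
    dm1 = gr(x)**2 / (2.0 * r1)
    dm2 = max(0.0, -h(x))**3 / (6.0 * r2**2)
    if dm1 >= gap:
        n_R1 += 1                          # large-gradient region
    elif dm2 >= gap:
        n_R2 += 1                          # negative-curvature region
    else:
        n_rest += 1                        # a "hole"

print("R1:", n_R1, " R2:", n_R2, " unclassified:", n_rest)
```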

SLIDE 29

Functions satisfying Polyak–Łojasiewicz

Theorem 3

A continuously differentiable f with a Lipschitz continuous gradient satisfies the Polyak–Łojasiewicz condition if and only if R_1 = L for any x_0 ∈ ℝ^n.

Hence, if we prove something about the behavior of an algorithm over R_1, then

◮ we know how it behaves if f satisfies PL and
◮ we know how it behaves at any point satisfying the PL inequality.

SLIDE 30

Functions satisfying a strict-saddle-type property

Theorem 4

If f is twice-continuously differentiable with Lipschitz continuous gradient and Hessian functions such that, at all x ∈ L and for some ζ ∈ (0, ∞), one has

max{‖∇f(x)‖_2^2, (−λ_min(∇²f(x)))^3} ≥ ζ(f(x) − f_inf),

then R_1 ∪ R_2 = L.

SLIDE 31

Outline

Motivation
Contemporary Analyses
Partitioning the Search Space
Behavior of Regularization Methods
Summary & Perspectives

SLIDE 32

Linearly convergent behavior over R_p

Let s_{w_p}(x) be a minimum-norm global minimizer of the regularized Taylor model

w_p(x, s) := t_p(x, s) + (l_p/(p + 1)) ‖s‖_2^{p+1}.

Theorem 5

If {x_k} is generated by the iteration x_{k+1} ← x_k + s_{w_p}(x_k), then, with ε_f ∈ (0, f(x_0) − f_ref), the number of iterations in R_p ∩ {x ∈ ℝ^n : f(x) − f_ref ≥ ε_f} is bounded above by

log((f(x_0) − f_ref)/ε_f) · [log(1/(1 − κ))]^{-1} = O(log((f(x_0) − f_ref)/ε_f)).

SLIDE 33

Characterization: Contemporary

Let RG and RN represent regularized gradient and Newton, respectively.

Theorem 6

With p̄ ≥ 2, let K_1(ε_g) := {k ∈ ℕ : ‖∇f(x_k)‖_2 > ε_g} and K_2(ε_H) := {k ∈ ℕ : λ_min(∇²f(x_k)) < −ε_H}. Then, the cardinalities of K_1(ε_g) and K_2(ε_H) are of the order...

Algorithm   |K_1(ε_g)|                                  |K_2(ε_H)|
RG          O(l_1 (f(x_0) − f_inf) ε_g^{-2})            –
RN          O(l_2^{1/2} (f(x_0) − f_inf) ε_g^{-3/2})    O(l_2^2 (f(x_0) − f_inf) ε_H^{-3})

SLIDE 34

Characterization: Our approach

Theorem 7

The numbers of iterations in R_1 and R_2 with f_ref = f_inf are of the order...

Algorithm   R_1                                                                R_2
RG          O(log((f(x_0) − f_inf)/ε_f))                                       –
RN          O(l_2^2 (f(x_0) − f_inf)/r_1^3) + O(log((f(x_0) − f_inf)/ε_f))     O(log((f(x_0) − f_inf)/ε_f))

There is an initial phase, as seen in Nesterov & Polyak (2006).

SLIDE 35

Characterization: Our approach

Theorem 7

The numbers of iterations in R_1 and R_2 with f_ref = f_inf are of the order...

Algorithm   R_1                                                                R_2
RG          O(log((f(x_0) − f_inf)/ε_f))                                       –
RN          O(l_2^2 (f(x_0) − f_inf)/r_1^3) + O(log((f(x_0) − f_inf)/ε_f))     O(log((f(x_0) − f_inf)/ε_f))

There is an initial phase, as seen in Nesterov & Polyak (2006).

An ∞ can appear, but one could consider probabilistic bounds, too.

SLIDE 36

Outline

Motivation
Contemporary Analyses
Partitioning the Search Space
Behavior of Regularization Methods
Summary & Perspectives

SLIDE 37

Summary & Perspectives

Our goal: A complementary approach to characterize algorithms.

◮ global convergence
◮ worst-case complexity, contemporary type + our approach
◮ local convergence rate

Our idea is to

◮ partition the search space (dependent on f and x_0)
◮ analyze how an algorithm behaves over different regions
◮ characterize an algorithm’s behavior by region

For some functions, there are holes, but for others the characterization is complete.
