
Complexity of Composite Optimization

Guanghui (George) Lan

University of Florida; Georgia Institute of Technology (from 1/2016)

NIPS Optimization for Machine Learning Workshop December 11, 2015


General CP methods

Problem: $\Psi^* = \min_{x \in X} \Psi(x)$, where $X$ is closed and convex and $\Psi$ is convex.
Goal: to find an $\epsilon$-solution, i.e., $\bar{x} \in X$ s.t. $\Psi(\bar{x}) - \Psi^* \le \epsilon$.
Complexity: the number of (sub)gradient evaluations of $\Psi$:
$\Psi$ smooth: $O(1/\sqrt{\epsilon})$. $\Psi$ nonsmooth: $O(1/\epsilon^2)$. $\Psi$ strongly convex: $O(\log(1/\epsilon))$.


Composite optimization problems

We consider composite problems which can be modeled as
$$\Psi^* = \min_{x \in X} \{\Psi(x) := f(x) + h(x)\}.$$

Here, $f : X \to \mathbb{R}$ is a smooth and expensive term (data fitting), $h : X \to \mathbb{R}$ is a nonsmooth regularization term (solution structure), and $X$ is a closed convex feasible set.
Three challenging cases:
$h$ or $X$ is not necessarily simple.
$f$ is given by the summation of many terms.
$f$ (or $h$) is nonconvex and possibly stochastic.


Existing complexity results

Problem: $\Psi^* := \min_{x \in X} \{\Psi(x) := f(x) + h(x)\}$.
First-order methods: iterative methods which operate with the gradients (subgradients) of $f$ and $h$.
Complexity: number of iterations needed to find an $\epsilon$-solution, i.e., a point $\bar{x} \in X$ s.t. $\Psi(\bar{x}) - \Psi^* \le \epsilon$.
Easy case: $h$ simple, $X$ simple: $\mathrm{Pr}_{X,h}(y) := \arg\min_{x \in X} \|y - x\|^2 + h(x)$ is easy to compute (e.g., compressed sensing).
Complexity: $O(1/\sqrt{\epsilon})$ (Nesterov 07, Tseng 08, Beck and Teboulle 09).
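To make "easy to compute" concrete: for the compressed-sensing choice $h(x) = \lambda\|x\|_1$ with $X = \mathbb{R}^n$, the composite prox-mapping has a closed form (soft-thresholding). The sketch below is my own illustration, not from the talk, and assumes the standard $\frac{1}{2}\|y - x\|^2$ scaling.

```python
import numpy as np

def prox_l1(y, lam):
    """Soft-thresholding: argmin_x 0.5*||x - y||^2 + lam*||x||_1."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

y = np.array([3.0, -0.2, 0.7])
print(prox_l1(y, 0.5))  # -> [2.5, 0.0, 0.2]
```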


More difficult cases

$h$ general, $X$ simple: $h$ is a general nonsmooth function; $P_X(y) := \arg\min_{x \in X} \|y - x\|^2$ is easy to compute (e.g., total variation). Complexity: $O(1/\epsilon^2)$.
$h$ structured, $X$ simple: $h$ is structured, e.g., $h(x) = \max_{y \in Y} \langle Ax, y \rangle$; $P_X$ is easy to compute (e.g., total variation). Complexity: $O(1/\epsilon)$.
$h$ simple, $X$ complicated: $L_{X,h}(y) := \arg\min_{x \in X} \langle y, x \rangle + h(x)$ is easy to compute (e.g., matrix completion). Complexity: $O(1/\epsilon)$.


Motivation

Complexity, with a representative iteration count (e.g., for $\epsilon = 10^{-4}$):
$h$ simple, $X$ simple: $O(1/\sqrt{\epsilon})$, about $10^2$.
$h$ general, $X$ simple: $O(1/\epsilon^2)$, about $10^8$.
$h$ structured, $X$ simple: $O(1/\epsilon)$, about $10^4$.
$h$ simple, $X$ complicated: $O(1/\epsilon)$, about $10^4$.

More general $h$ or more complicated $X$
⇓
Slow convergence of first-order algorithms
⇓ (can this link be broken?)
A large number of gradient evaluations of $\nabla f$

Question: Can we skip the computation of $\nabla f$?


Composite problems

$\Psi^* = \min_{x \in X} \{\Psi(x) := f(x) + h(x)\}$.
$f$ is smooth, i.e., $\exists L > 0$ s.t. $\forall x, y \in X$, $\|\nabla f(y) - \nabla f(x)\| \le L\|y - x\|$.
$h$ is nonsmooth, i.e., $\exists M > 0$ s.t. $\forall x, y \in X$, $|h(x) - h(y)| \le M\|y - x\|$.
$P_X$ is simple to compute.
Question: How many gradient evaluations of $\nabla f$ and subgradient evaluations of $h'$ are needed to find an $\epsilon$-solution?


Existing results

Existing algorithms evaluate $\nabla f$ and $h'$ together at each iteration:
Mirror-prox method (Juditsky, Nemirovski and Tauvel, 11):
$$O\left(\frac{L}{\epsilon} + \frac{M^2}{\epsilon^2}\right).$$
Accelerated stochastic approximation (Lan, 12):
$$O\left(\sqrt{\frac{L}{\epsilon}} + \frac{M^2}{\epsilon^2}\right).$$
Issue: whenever the second term dominates, the number of gradient evaluations of $\nabla f$ is given by $O(1/\epsilon^2)$.


Bottleneck for composite problems

The computation of $\nabla f$, however, is often the bottleneck in comparison with that of $h'$:
The computation of $\nabla f$ involves a large data set, while that of $h'$ only involves a very sparse matrix.
In total variation minimization, the computation of the gradient costs $O(m \times n)$, while the computation of the subgradient costs $O(n)$.
Question: Can we reduce the number of gradient evaluations of $\nabla f$ from $O(1/\epsilon^2)$ to $O(1/\sqrt{\epsilon})$, while still maintaining the optimal $O(1/\epsilon^2)$ bound on subgradient evaluations of $h'$?
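To make the asymmetry concrete, here is a small illustration of my own (not from the talk), with a least-squares fitting term standing in for $f$ and an $\ell_1$ regularizer standing in for $h$: one gradient of $f$ touches the whole $m \times n$ data matrix, while a subgradient of $h$ is $O(n)$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20_000, 500                 # many data points, moderate dimension
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
lam = 0.1

def grad_f(x):
    # f(x) = 0.5*||Ax - b||^2: two passes over the m-by-n data, O(m*n) work
    return A.T @ (A @ x - b)

def subgrad_h(x):
    # h(x) = lam*||x||_1: a valid subgradient costs only O(n) work
    return lam * np.sign(x)
```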


The gradient sliding algorithm

Algorithm 1: The gradient sliding (GS) algorithm
Input: initial point $x_0 \in X$ and iteration limit $N$. Let $\beta_k \ge 0$, $\gamma_k \ge 0$, and $T_k \ge 0$ be given and set $\bar{x}_0 = x_0$.
for $k = 1, 2, \ldots, N$ do
1. Set $\underline{x}_k = (1 - \gamma_k)\bar{x}_{k-1} + \gamma_k x_{k-1}$ and $g_k = \nabla f(\underline{x}_k)$.
2. Set $(x_k, \tilde{x}_k) = \mathrm{PS}(g_k, x_{k-1}, \beta_k, T_k)$.
3. Set $\bar{x}_k = (1 - \gamma_k)\bar{x}_{k-1} + \gamma_k \tilde{x}_k$.
end for
Output: $\bar{x}_N$.
PS: the prox-sliding procedure.


The PS procedure

Procedure $(x^+, \tilde{x}^+) = \mathrm{PS}(g, x, \beta, T)$
Let the parameters $p_t > 0$ and $\theta_t \in [0, 1]$, $t = 1, \ldots$, be given. Set $u_0 = \tilde{u}_0 = x$.
for $t = 1, 2, \ldots, T$ do
$$u_t = \arg\min_{u \in X} \langle g + h'(u_{t-1}), u \rangle + \frac{\beta}{2}\|u - x\|^2 + \frac{\beta p_t}{2}\|u - u_{t-1}\|^2,$$
$$\tilde{u}_t = (1 - \theta_t)\tilde{u}_{t-1} + \theta_t u_t.$$
end for
Set $x^+ = u_T$ and $\tilde{x}^+ = \tilde{u}_T$.
Note: $\|\cdot - \cdot\|^2/2$ can be replaced by the more general Bregman distance $V(x, u) = \omega(u) - \omega(x) - \langle \nabla\omega(x), u - x \rangle$.


Remarks

When supplied with $g(\cdot)$, $x \in X$, $\beta$, and $T$, the PS procedure computes a pair of approximate solutions $(x^+, \tilde{x}^+) \in X \times X$ for the problem
$$\arg\min_{u \in X} \left\{ \Phi(u) := \langle g, u \rangle + h(u) + \frac{\beta}{2}\|u - x\|^2 \right\}.$$
In each iteration of GS, the subproblem is given by
$$\arg\min_{u \in X} \left\{ \Phi_k(u) := \langle \nabla f(\underline{x}_k), u \rangle + h(u) + \frac{\beta_k}{2}\|u - x_{k-1}\|^2 \right\}.$$

Convergence of the PS procedure

Proposition. If $\{p_t\}$ and $\{\theta_t\}$ in the PS procedure satisfy
$$p_t = \frac{t}{2} \quad \text{and} \quad \theta_t = \frac{2(t+1)}{t(t+3)},$$
then for any $t \ge 1$ and $u \in X$,
$$\Phi(\tilde{u}_t) - \Phi(u) + \frac{\beta(t+1)(t+2)}{2t(t+3)}\|u_t - u\|^2 \le \frac{M^2}{\beta(t+3)} + \frac{\beta\|u_0 - u\|^2}{t(t+3)}.$$


Convergence of the GS algorithm

Theorem. Suppose that the previous conditions on $\{p_t\}$ and $\{\theta_t\}$ hold, and that $N$ is given a priori. If
$$\beta_k = \frac{2L}{k}, \quad \gamma_k = \frac{2}{k+1}, \quad \text{and} \quad T_k = \frac{M^2 N k^2}{\tilde{D} L^2}$$
for some $\tilde{D} > 0$, then
$$\Psi(\bar{x}_N) - \Psi(x^*) \le \frac{L}{N(N+1)}\left(\frac{3\|x_0 - x^*\|^2}{2} + 2\tilde{D}\right).$$
Remark: we do NOT need $N$ given a priori if $X$ is bounded.
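Combining Algorithm 1, the PS procedure, and the parameter settings of the theorem above, here is a minimal runnable sketch (an illustration under assumptions, not the author's code): $X = \mathbb{R}^n$, $f(x) = \frac{1}{2}\|Ax - b\|^2$ as the smooth term, and $h(x) = \lambda\|x\|_1$ standing in for the general nonsmooth term, so each PS subproblem is an unconstrained quadratic with a closed-form solution.

```python
import numpy as np

rng = np.random.default_rng(0)
m_, n = 200, 50
A = rng.standard_normal((m_, n)); b = rng.standard_normal(m_)
lam = 0.1
L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad f
M = lam * np.sqrt(n)                   # Lipschitz constant of h = lam*||.||_1
grad_f = lambda x: A.T @ (A @ x - b)
subgrad_h = lambda u: lam * np.sign(u)

def PS(g, x, beta, T):
    """Prox-sliding on Phi(u) = <g,u> + h(u) + beta/2*||u-x||^2, X = R^n."""
    u = u_tilde = x.copy()
    for t in range(1, T + 1):
        p, theta = t / 2.0, 2.0 * (t + 1) / (t * (t + 3))
        # closed-form minimizer of the quadratic PS subproblem on R^n
        u = (beta * x + beta * p * u - (g + subgrad_h(u))) / (beta * (1 + p))
        u_tilde = (1 - theta) * u_tilde + theta * u
    return u, u_tilde

def gradient_sliding(x0, N, D_tilde):
    x = x_bar = x0.copy()
    for k in range(1, N + 1):
        beta, gamma = 2 * L / k, 2.0 / (k + 1)
        T = max(1, int(np.ceil(M ** 2 * N * k ** 2 / (D_tilde * L ** 2))))
        x_under = (1 - gamma) * x_bar + gamma * x
        g = grad_f(x_under)            # the expensive gradient, once per outer step
        x, x_tilde = PS(g, x, beta, T) # T cheap subgradient steps on h
        x_bar = (1 - gamma) * x_bar + gamma * x_tilde
    return x_bar

x_out = gradient_sliding(np.zeros(n), N=30, D_tilde=1.0)
```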


Complexity of the GS algorithm

The number of gradient evaluations of $\nabla f$ is bounded by
$$\sqrt{\frac{L}{\epsilon}\left(\frac{3\|x_0 - x^*\|^2}{2} + 2\tilde{D}\right)}.$$
The number of subgradient evaluations of $h'$ is given by $\sum_{k=1}^{N} T_k$, which is bounded by
$$\frac{M^2}{3\epsilon^2}\left(\frac{3\|x_0 - x^*\|^2}{2\sqrt{\tilde{D}}} + 2\sqrt{\tilde{D}}\right)^2 + \sqrt{\frac{L}{\epsilon}\left(\frac{3\|x_0 - x^*\|^2}{2} + 2\tilde{D}\right)}.$$


Complexity of the GS algorithm

Under the optimal selection $\tilde{D} = \tilde{D}^* = 3\|x_0 - x^*\|^2/4$, the above two bounds, respectively, become
$$\sqrt{\frac{3L\|x_0 - x^*\|^2}{\epsilon}} \quad \text{and} \quad \frac{4M^2\|x_0 - x^*\|^2}{\epsilon^2} + \sqrt{\frac{3L\|x_0 - x^*\|^2}{\epsilon}}.$$
GS thus significantly reduces the number of gradient evaluations of $\nabla f$, from $O(1/\epsilon^2)$ to $O(1/\sqrt{\epsilon})$, even though the whole objective function $\Psi$ is nonsmooth in general.


Extensions

Gradient sliding for $\min_{x \in X} f(x) + h(x)$, total iterations / $\nabla f$ evaluations:
$h$ general nonsmooth: $O(1/\epsilon^2)$ / $O(1/\sqrt{\epsilon})$.
$h$ structured nonsmooth: $O(1/\epsilon)$ / $O(1/\sqrt{\epsilon})$.
$f$ strongly convex: $O(1/\epsilon)$ / $O(\log(1/\epsilon))$.
Conditional gradient sliding methods for problems with a more complicated feasible set, total iterations (LO oracle) / $\nabla f$ evaluations:
$f$ convex: $O(1/\epsilon)$ / $O(1/\sqrt{\epsilon})$.
$f$ strongly convex: $O(1/\epsilon)$ / $O(\log(1/\epsilon))$.


The problem of interest

Problem:
$$\Psi^* := \min_{x \in X} \left\{ \Psi(x) := \sum_{i=1}^{m} f_i(x) + h(x) + \mu\,\omega(x) \right\}.$$
$X$ is closed and convex.
$f_i$ is smooth and convex: $\|\nabla f_i(x_1) - \nabla f_i(x_2)\|_* \le L_i\|x_1 - x_2\|$.
$h$ is simple, e.g., the $\ell_1$ norm.
$\omega$ is strongly convex with modulus 1 w.r.t. an arbitrary norm; $\mu \ge 0$.
The subproblem $\arg\min_{x \in X} \langle g, x \rangle + h(x) + \mu\,\omega(x)$ is easy.
Denote $f(x) \equiv \sum_{i=1}^{m} f_i(x)$ and $L \equiv \sum_{i=1}^{m} L_i$; $f$ is smooth with Lipschitz constant $L_f \le L$.


Stochastic subgradient descent for nonsmooth problems

General stochastic programming (SP): $\min_{x \in X} \mathbb{E}_\xi[F(x, \xi)]$.
Reformulation of the finite-sum problem as SP:
$$\xi \in \{1, \ldots, m\}, \quad \mathrm{Prob}\{\xi = i\} = \nu_i, \quad F(x, i) = \nu_i^{-1} f_i(x) + h(x) + \mu\,\omega(x), \quad i = 1, \ldots, m.$$
Iteration complexity: $O(1/\epsilon^2)$, or $O(1/\epsilon)$ when $\mu > 0$.
Iteration cost: $m$ times cheaper than for deterministic first-order methods.
Saves up to a factor of $O(m)$ in subgradient computations. For details, see Nemirovski et al. (09). A minimal sampling sketch follows below.
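A small sketch of my own (not from the talk) of this reformulation: sample $\xi$ with probabilities $\nu_i$ and use $\nu_i^{-1}\nabla f_i(x) + h'(x) + \mu\,\omega'(x)$ as an unbiased stochastic subgradient. Quadratic $f_i$, an $\ell_1$ regularizer, $\omega = \frac{1}{2}\|\cdot\|^2$, and $X = \mathbb{R}^n$ are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, lam, mu = 100, 20, 0.05, 0.1
A = rng.standard_normal((m, n)); b = rng.standard_normal(m)  # f_i(x) = 0.5*(a_i.x - b_i)^2
nu = np.full(m, 1.0 / m)                  # sampling distribution

def stoch_subgrad(x):
    i = rng.choice(m, p=nu)
    g_fi = A[i] * (A[i] @ x - b[i])       # gradient of the sampled f_i
    # E over i of nu_i^{-1} grad f_i equals grad f, so this is unbiased
    return g_fi / nu[i] + lam * np.sign(x) + mu * x

x = np.zeros(n)
for t in range(1, 5001):
    x -= stoch_subgrad(x) / (mu * t)      # classical 1/(mu*t) stepsize for mu > 0
```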


Required ∇fi’s in the smooth case

For simplicity, focus on the strongly convex case ($\mu > 0$).
Goal: find a solution $\bar{x} \in X$ s.t. $\|\bar{x} - x^*\| \le \epsilon\|x_0 - x^*\|$.
Nesterov's optimal method (Nesterov 83):
$$O\left(m\sqrt{\frac{L_f}{\mu}}\log\frac{1}{\epsilon}\right).$$
Accelerated stochastic approximation (Lan 12, Ghadimi and Lan 13):
$$O\left(\sqrt{\frac{L_f}{\mu}}\log\frac{1}{\epsilon} + \frac{\sigma^2}{\mu\epsilon}\right).$$
Note: the optimality of the latter bound for general SP does not preclude more efficient algorithms for the finite-sum problem.


Randomized incremental gradient methods

Each iteration requires only one randomly selected $\nabla f_i(x)$.
Stochastic average gradient (SAG) by Schmidt, Roux and Bach 13:
$$O\left((m + L/\mu)\log\frac{1}{\epsilon}\right).$$
Similar results were obtained in Johnson and Zhang 13, Defazio et al. 14, ...
Worse dependence on $L/\mu$ than Nesterov's method; recent improvement (Lin et al., 15).
Intimidating proofs ...


Coordinate ascent in the dual

$$\min_x \left\{ \sum_{i=1}^{m} \phi_i(a_i^T x) + h(x) \right\}, \quad h \text{ strongly convex w.r.t. the } \ell_2 \text{ norm}.$$
All these coordinate algorithms achieve
$$O\left(\left(m + \sqrt{\frac{mL}{\mu}}\right)\log\frac{1}{\epsilon}\right).$$
Shalev-Shwartz and Zhang 13, 15 (restarting stochastic dual ascent); Lin, Lu and Xiao 14 (Nesterov's and Fercoq and Richtárik's accelerated coordinate descent); see also Zhang and Xiao 14 (Chambolle and Pock); Dang and Lan 14 (non-strongly convex, $O(1/\epsilon)$ or $O(1/\sqrt{\epsilon})$).
Some issues:
They deal with a more special class of problems.
They require $\arg\min_y\{\langle g, y \rangle + \phi_i^*(y) + \|y\|^2\}$, i.e., they are not incremental gradient methods.


Open problems and our research

Problems:
Can we accelerate the convergence of randomized incremental gradient methods?
What is the best possible performance we can expect?
Our approach:
Develop the primal-dual gradient (PDG) method and show its inherent relation to Nesterov's method.
Develop a randomized PDG (RPDG) method.
Present a new lower complexity bound.
Provide a game-theoretic interpretation for acceleration.


Reformulation and game/economic interpretation

Let $J_f$ be the conjugate function of $f$. Consider
$$\Psi^* := \min_{x \in X} \left\{ h(x) + \mu\,\omega(x) + \max_{g \in G} \langle x, g \rangle - J_f(g) \right\}.$$
The buyer purchases products from the supplier.
The unit price is given by $g \in \mathbb{R}^n$. $X$, $h$ and $\omega$ are constraints and other local costs for the buyer.
The profit of the supplier: revenue $\langle x, g \rangle$ minus local cost $J_f(g)$.


How to achieve equilibrium?

Current order quantity $x_0$, and product price $g_0$.
Proximity control functions:
$$P(x_0, x) := \omega(x) - [\omega(x_0) + \langle \omega'(x_0), x - x_0 \rangle],$$
$$D_f(g_0, g) := J_f(g) - [J_f(g_0) + \langle J_f'(g_0), g - g_0 \rangle].$$
Dual prox-mapping:
$$M_G(-\tilde{x}, g_0, \tau) := \arg\min_{g \in G} \left\{ \langle -\tilde{x}, g \rangle + J_f(g) + \tau D_f(g_0, g) \right\}.$$
$\tilde{x}$ is the given or predicted demand. Maximize the profit, but do not move too far away from $g_0$.
Primal prox-mapping:
$$M_X(g, x_0, \eta) := \arg\min_{x \in X} \left\{ \langle g, x \rangle + h(x) + \mu\,\omega(x) + \eta P(x_0, x) \right\}.$$
$g$ is the given or predicted price. Minimize the cost, but do not move too far away from $x_0$.


The deterministic PDG

Algorithm 2: The primal-dual gradient method
Let $x_0 = x_{-1} \in X$ and the nonnegative parameters $\{\tau_t\}$, $\{\eta_t\}$, and $\{\alpha_t\}$ be given. Set $g_0 = \nabla f(x_0)$.
for $t = 1, \ldots, k$ do
Update $z_t = (g_t, x_t)$ according to
$\tilde{x}_t = \alpha_t(x_{t-1} - x_{t-2}) + x_{t-1}$,
$g_t = M_G(-\tilde{x}_t, g_{t-1}, \tau_t)$,
$x_t = M_X(g_t, x_{t-1}, \eta_t)$.
end for


A game/economic interpretation

The supplier predicts the buyer's demand based on historical information: $\tilde{x}_t = \alpha_t(x_{t-1} - x_{t-2}) + x_{t-1}$.
The supplier seeks to maximize the predicted profit, but without moving too far away from $g_{t-1}$: $g_t = M_G(-\tilde{x}_t, g_{t-1}, \tau_t)$.
The buyer tries to minimize the cost, but without moving too far away from $x_{t-1}$: $x_t = M_X(g_t, x_{t-1}, \eta_t)$.


PDG in gradient form

Algorithm 3: PDG method in gradient form
Input: let $x_0 = x_{-1} \in X$ and the nonnegative parameters $\{\tau_t\}$, $\{\eta_t\}$, and $\{\alpha_t\}$ be given. Set $\underline{x}_0 = x_0$.
for $t = 1, 2, \ldots, k$ do
$\tilde{x}_t = \alpha_t(x_{t-1} - x_{t-2}) + x_{t-1}$.
$\underline{x}_t = (\tilde{x}_t + \tau_t \underline{x}_{t-1})/(1 + \tau_t)$.
$g_t = \nabla f(\underline{x}_t)$.
$x_t = M_X(g_t, x_{t-1}, \eta_t)$.
end for
Idea: set $J_f'(g_{t-1}) = \underline{x}_{t-1}$.
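A minimal runnable sketch of Algorithm 3 (my own illustration, not the author's code), assuming everything is Euclidean: $X = \mathbb{R}^n$, $h = 0$, $\omega = \frac{1}{2}\|\cdot\|^2$, and a quadratic $f$, so $M_X$ has a closed form. The stepsizes are the strongly convex choices from the convergence theorem below.

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu = 30, 0.1
B = rng.standard_normal((n, n))
Q = B.T @ B                         # f(x) = 0.5*x'Qx - c'x, so L_f = lambda_max(Q)
c = rng.standard_normal(n)
Lf = np.linalg.eigvalsh(Q).max()
grad_f = lambda x: Q @ x - c

def M_X(g, x0, eta):
    # Euclidean primal prox-mapping with h = 0, omega = 0.5*||x||^2
    return (eta * x0 - g) / (mu + eta)

r = np.sqrt(2 * Lf / mu)            # strongly convex stepsize choices
tau, eta, alpha = r, np.sqrt(2 * Lf * mu), r / (1 + r)

x_prev = x = u = np.zeros(n)        # u stands for underline-x
for t in range(1, 2001):
    x_tilde = alpha * (x - x_prev) + x
    u = (x_tilde + tau * u) / (1 + tau)
    x_prev, x = x, M_X(grad_f(u), x, eta)

# optimality condition of min f(x) + (mu/2)||x||^2 is (Q + mu*I) x = c
print(np.linalg.norm((Q + mu * np.eye(n)) @ x - c))  # small residual
```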


Relation to Nesterov’s method

A variant of Nesterov's method:
$\underline{x}_t = (1 - \theta_t)\bar{x}_{t-1} + \theta_t x_{t-1}$.
$x_t = M_X\left(\sum_{i=1}^{m}\nabla f_i(\underline{x}_t), x_{t-1}, \eta_t\right)$.
$\bar{x}_t = (1 - \theta_t)\bar{x}_{t-1} + \theta_t x_t$.
Note that $\underline{x}_t = (1 - \theta_t)\underline{x}_{t-1} + (1 - \theta_t)\theta_{t-1}(x_{t-1} - x_{t-2}) + \theta_t x_{t-1}$.
This is equivalent to PDG with $\tau_t = (1 - \theta_t)/\theta_t$ and $\alpha_t = \theta_{t-1}(1 - \theta_t)/\theta_t$.
Nesterov's acceleration: looking-ahead dual players.
Gradient descent: myopic dual players ($\alpha_t = \tau_t = 0$ in PDG).


Convergence of PDG (or Nesterov’s variant)

Theorem. Define $\bar{x}_k := \left(\sum_{t=1}^{k}\theta_t\right)^{-1}\sum_{t=1}^{k}\theta_t x_t$. Suppose that
$$\tau_t = \sqrt{\frac{2L_f}{\mu}}, \quad \eta_t = \sqrt{2L_f\mu}, \quad \alpha_t = \alpha \equiv \frac{\sqrt{2L_f/\mu}}{1 + \sqrt{2L_f/\mu}}, \quad \theta_t = \alpha^{-t}.$$
Then
$$P(x_k, x^*) \le \frac{\mu + L_f}{\mu}\,\alpha^k P(x_0, x^*),$$
$$\Psi(\bar{x}_k) - \Psi(x^*) \le \mu(1 - \alpha)^{-1}\left[1 + \frac{L_f}{\mu}\left(2 + \frac{L_f}{\mu}\right)\right]\alpha^k P(x_0, x^*).$$

Theorem. If $\tau_t = \frac{t-1}{2}$, $\eta_t = \frac{4L_f}{t}$, $\alpha_t = \frac{t-1}{t}$, and $\theta_t = t$, then
$$\Psi(\bar{x}_k) - \Psi(x^*) \le \frac{8L_f}{k(k+1)}P(x_0, x^*).$$


A multi-dual-player reformulation

Let $J_i : Y_i \to \mathbb{R}$ be the conjugate function of $f_i$, and let $Y_i$, $i = 1, \ldots, m$, denote the dual spaces:
$$\min_{x \in X} \left\{ h(x) + \mu\,\omega(x) + \max_{y_i \in Y_i} \sum_i \langle x, y_i \rangle - \sum_i J_i(y_i) \right\}.$$
Define the new dual prox-functions and dual prox-mappings as
$$D_i(y_i^0, y_i) := J_i(y_i) - [J_i(y_i^0) + \langle J_i'(y_i^0), y_i - y_i^0 \rangle],$$
$$M_{Y_i}(-\tilde{x}, y_i^0, \tau) := \arg\min_{y_i \in Y_i} \left\{ \langle -\tilde{x}, y_i \rangle + J_i(y_i) + \tau D_i(y_i^0, y_i) \right\}.$$


The RPDG method

Algorithm 4: The RPDG method
Let $x_0 = x_{-1} \in X$ and $\{\tau_t\}$, $\{\eta_t\}$, and $\{\alpha_t\}$ be given. Set $y_i^0 = \nabla f_i(x_0)$, $i = 1, \ldots, m$.
for $t = 1, \ldots, k$ do
Choose $i_t$ according to $\mathrm{Prob}\{i_t = i\} = p_i$, $i = 1, \ldots, m$.
$\tilde{x}_t = \alpha_t(x_{t-1} - x_{t-2}) + x_{t-1}$.
$y_i^t = M_{Y_i}(-\tilde{x}_t, y_i^{t-1}, \tau_t)$ if $i = i_t$; $y_i^t = y_i^{t-1}$ otherwise.
$\tilde{y}_i^t = p_i^{-1}(y_i^t - y_i^{t-1}) + y_i^{t-1}$ if $i = i_t$; $\tilde{y}_i^t = y_i^{t-1}$ otherwise.
$x_t = M_X\left(\sum_{i=1}^{m}\tilde{y}_i^t, x_{t-1}, \eta_t\right)$.
end for


RPDG in gradient form

Algorithm 5: RPDG in gradient form
for $t = 1, \ldots, k$ do
Choose $i_t$ according to $\mathrm{Prob}\{i_t = i\} = p_i$, $i = 1, \ldots, m$.
$\tilde{x}_t = \alpha_t(x_{t-1} - x_{t-2}) + x_{t-1}$.
$\underline{x}_i^t = (1 + \tau_t)^{-1}(\tilde{x}_t + \tau_t \underline{x}_i^{t-1})$ if $i = i_t$; $\underline{x}_i^t = \underline{x}_i^{t-1}$ otherwise.
$y_i^t = \nabla f_i(\underline{x}_i^t)$ if $i = i_t$; $y_i^t = y_i^{t-1}$ otherwise.
$g_t = g_{t-1} + (y_{i_t}^t - y_{i_t}^{t-1})$.
$x_t = M_X\left(g_t + (p_{i_t}^{-1} - 1)(y_{i_t}^t - y_{i_t}^{t-1}), x_{t-1}, \eta_t\right)$.
end for
Note: the argument supplied to $M_X$ equals $\sum_i \tilde{y}_i^t$ from Algorithm 4.
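A minimal runnable sketch of RPDG in gradient form (my own illustration, under assumptions: $f_i(x) = \frac{1}{2}(a_i^T x - b_i)^2$, $h = 0$, $\omega = \frac{1}{2}\|\cdot\|^2$, $X = \mathbb{R}^n$). The parameters follow the Proposition below; the formula used for $\eta$ is my reading of a garbled slide and should be checked against the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, mu = 50, 10, 0.1
A = rng.standard_normal((m, n)); b = rng.standard_normal(m)
Li = (A ** 2).sum(axis=1)          # Lipschitz constant of each grad f_i
L = Li.sum()

p = 0.5 / m + Li / (2 * L)         # sampling probabilities, sum to 1
C = 8 * L / mu
root = np.sqrt((m - 1) ** 2 + 4 * m * C)
tau = (root - (m - 1)) / (2 * m)
eta = mu * (root + (m - 1)) / 2    # assumed reading of the eta formula
alpha = 1 - 1 / ((m + 1) + root)

x_prev = x = np.zeros(n)
u = np.tile(x, (m, 1))             # underline-x_i, one per component f_i
y = A * (A @ x - b)[:, None]       # y_i^0 = grad f_i(x^0)
g = y.sum(axis=0)

for t in range(1, 20 * m):
    i = rng.choice(m, p=p)
    x_tilde = alpha * (x - x_prev) + x
    u[i] = (x_tilde + tau * u[i]) / (1 + tau)
    y_new = A[i] * (A[i] @ u[i] - b[i])   # only one new gradient per step
    delta = y_new - y[i]
    y[i] = y_new
    g = g + delta
    g_hat = g + (1 / p[i] - 1) * delta    # equals sum_i tilde-y_i^t
    x_prev, x = x, (eta * x - g_hat) / (mu + eta)   # M_X with h = 0
# x approaches argmin 0.5*||Ax - b||^2 + (mu/2)||x||^2 as t grows
```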


Game-theoretic interpretation for RPDG

The suppliers predict the buyer's demand as before.
Only one randomly selected supplier changes his/her price, arriving at $y^t$.
The buyer could use $y^t$ directly as the price, but the resulting algorithm converges slowly, with a worse dependence on $m$ (Dang and Lan 14).
Hence we add a dual prediction (estimation) step, i.e., $\tilde{y}^t$ s.t. $\mathbb{E}_t[\tilde{y}_i^t] = \hat{y}_i^t$, where $\hat{y}_i^t := M_{Y_i}(-\tilde{x}_t, y_i^{t-1}, \tau_t)$.
The buyer uses $\tilde{y}^t$ to determine the order quantity.


Rate of Convergence

Proposition. Let $C = 8L/\mu$ and
$$p_i = \mathrm{Prob}\{i_t = i\} = \frac{1}{2m} + \frac{L_i}{2L}, \quad i = 1, \ldots, m,$$
$$\tau_t = \frac{\sqrt{(m-1)^2 + 4mC} - (m-1)}{2m}, \quad \eta_t = \frac{\mu\sqrt{(m-1)^2 + 4mC} + \mu(m-1)}{2},$$
$$\alpha_t = \alpha := 1 - \frac{1}{(m+1) + \sqrt{(m-1)^2 + 4mC}}.$$
Then
$$\mathbb{E}[P(x_k, x^*)] \le \left(1 + \frac{3L_f}{\mu}\right)\alpha^k P(x_0, x^*),$$
$$\mathbb{E}[\Psi(\bar{x}_k)] - \Psi^* \le \alpha^{k/2}(1 - \alpha)^{-1}\left(\mu + 2L_f + \frac{L_f^2}{\mu}\right)P(x_0, x^*).$$


The iteration complexity of RPDG

To find a point $\bar{x} \in X$ s.t. $\mathbb{E}[P(\bar{x}, x^*)] \le \epsilon$:
$$O\left(\left(m + \sqrt{\frac{mL}{\mu}}\right)\log\frac{P(x_0, x^*)}{\epsilon}\right).$$
To find a point $\bar{x} \in X$ s.t. $\mathrm{Prob}\{P(\bar{x}, x^*) \le \epsilon\} \ge 1 - \lambda$ for some $\lambda \in (0, 1)$:
$$O\left(\left(m + \sqrt{\frac{mL}{\mu}}\right)\log\frac{P(x_0, x^*)}{\lambda\epsilon}\right).$$
This is a factor of $O\left(\min\left\{\sqrt{L/\mu}, \sqrt{m}\right\}\right)$ savings in gradient computation (or price changes), if $L \approx L_f$, at the price of more order transactions.


Lower complexity bound

Worst-case instance:
$$\min_{x_i \in \mathbb{R}^{\tilde{n}},\, i=1,\ldots,m} \left\{ \Psi(x) := \sum_{i=1}^{m}\left[ f_i(x_i) + \frac{\mu}{2}\|x_i\|_2^2 \right] \right\},$$
$$f_i(x_i) = \frac{\mu(Q-1)}{4}\left[ \frac{1}{2}\langle Ax_i, x_i \rangle - \langle e_1, x_i \rangle \right], \quad \tilde{n} \equiv n/m,$$
where $A$ is the $\tilde{n} \times \tilde{n}$ tridiagonal matrix with $2$ on the diagonal and $-1$ on the off-diagonals, except that its last diagonal entry is $\kappa = \frac{\sqrt{Q}+3}{\sqrt{Q}+1}$.

Theorem. Denote $q := (\sqrt{Q} - 1)/(\sqrt{Q} + 1)$. Then the iterates $\{x_k\}$ generated by any randomized incremental gradient method must satisfy
$$\frac{\mathbb{E}[\|x_k - x^*\|_2^2]}{\|x_0 - x^*\|_2^2} \ge \frac{1}{2}\exp\left( -\frac{4k\sqrt{Q}}{m(\sqrt{Q}+1)^2 - 4\sqrt{Q}} \right)$$
for any $n \ge n(m, k) \equiv \left[ m \log\left( \left[1 - (1 - q^2)/m\right]^{k}/2 \right) \right] / (2\log q)$.
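To make the construction concrete, here is a small sketch (my own) that builds one block of this worst-case instance; the dimension and condition number below are arbitrary choices.

```python
import numpy as np

def worst_case_instance(n_tilde, Q, mu):
    """Build A and grad f_i for the lower-bound instance on one block x_i."""
    kappa = (np.sqrt(Q) + 3) / (np.sqrt(Q) + 1)
    A = (2 * np.eye(n_tilde)
         - np.eye(n_tilde, k=1)
         - np.eye(n_tilde, k=-1))      # tridiagonal: 2 on diag, -1 off-diag
    A[-1, -1] = kappa                  # last diagonal entry replaced by kappa
    e1 = np.zeros(n_tilde); e1[0] = 1.0
    scale = mu * (Q - 1) / 4
    grad_fi = lambda xi: scale * (A @ xi - e1)   # A symmetric, so this is grad f_i
    return A, grad_fi

A, grad_fi = worst_case_instance(n_tilde=8, Q=100.0, mu=0.1)
```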


Complexity

Corollary. The number of gradient evaluations performed by any randomized incremental gradient method to find a solution $\bar{x} \in X$ s.t. $\mathbb{E}[\|\bar{x} - x^*\|_2^2] \le \epsilon$ cannot be smaller than
$$\Omega\left( \left(\sqrt{mC} + m\right)\log\frac{\|x_0 - x^*\|_2^2}{\epsilon} \right)$$
if $n$ is sufficiently large.
Other results in the paper:
Generalization to problems without strong convexity.
Lower complexity bound for randomized coordinate descent methods.


Summary

Presented gradient sliding algorithms for complex composite optimization:
they save gradient computations significantly without increasing the total number of iterations.
Presented an optimal randomized incremental gradient method for finite-sum optimization:
it saves gradient computations at the expense of more iterations.
New lower complexity bound and game-theoretic interpretation for first-order methods.


Related Papers

Gradient sliding:
1. G. Lan, "Gradient Sliding for Composite Optimization", Mathematical Programming, to appear.
2. G. Lan and Y. Zhou, "Conditional Gradient Sliding for Convex Optimization", SIAM Journal on Optimization, under minor revision.
Randomized algorithms:
3. G. Lan and Y. Zhou, "An Optimal Randomized Incremental Gradient Method", submitted for publication.
4. C. D. Dang and G. Lan, "Randomized First-order Methods for Saddle Point Optimization", submitted for publication.
Nonconvex stochastic optimization:
5. S. Ghadimi and G. Lan, "Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming", SIAM Journal on Optimization, 2013.
6. S. Ghadimi, G. Lan, and H. Zhang, "Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization", Mathematical Programming, to appear.
7. S. Ghadimi and G. Lan, "Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming", Mathematical Programming, to appear.
