  1. Complexity of Composite Optimization. Guanghui (George) Lan, University of Florida / Georgia Institute of Technology (from 1/2016). NIPS Optimization for Machine Learning Workshop, December 11, 2015.

  2. General CP methods. Outline: Background; Complex composite problems; Finite-sum problems; Summary.
Problem: Ψ* = min_{x ∈ X} Ψ(x), where X is closed and convex and Ψ is convex.
Goal: find an ε-solution, i.e., x̄ ∈ X s.t. Ψ(x̄) − Ψ* ≤ ε.
Complexity: the number of (sub)gradient evaluations of Ψ:
- Ψ smooth: O(1/√ε).
- Ψ nonsmooth: O(1/ε²).
- Ψ smooth and strongly convex: O(log(1/ε)).
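As a concrete instance of the nonsmooth O(1/ε²) regime on this slide, here is a minimal projected subgradient sketch. This is my own illustration, not code from the talk; the problem, the 1/√t stepsize, and the box feasible set are all assumptions.

```python
import numpy as np

# Projected subgradient descent on the nonsmooth convex problem
#   min_{x in [-r, r]^n} ||x - c||_1,
# the classical method behind the O(1/eps^2) iteration bound for nonsmooth Psi.
def projected_subgradient(c, n_iters, r=1.0):
    x = np.zeros_like(c)
    best = np.inf
    for t in range(1, n_iters + 1):
        g = np.sign(x - c)               # a subgradient of ||x - c||_1 at x
        x = x - g / np.sqrt(t)           # classical 1/sqrt(t) stepsize
        x = np.clip(x, -r, r)            # projection onto the box X
        best = min(best, np.abs(x - c).sum())
    return best

c = np.array([0.3, -0.7, 0.5])
print(projected_subgradient(c, 2000))    # approaches the optimal value 0
```

Quadrupling the iteration count roughly halves the error here, which is exactly the 1/√t (i.e., O(1/ε²)) behavior quoted above.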

  3. Composite optimization problems. We consider composite problems which can be modeled as
Ψ* = min_{x ∈ X} { Ψ(x) := f(x) + h(x) }.
Here f : X → ℝ is a smooth and expensive term (data fitting), h : X → ℝ is a nonsmooth regularization term (solution structure), and X is a closed convex feasible set.
Three challenging cases:
- h or X are not necessarily simple.
- f is given by the summation of many terms.
- f (or h) is nonconvex and possibly stochastic.

  4. Existing complexity results. Problem: Ψ* := min_{x ∈ X} { Ψ(x) := f(x) + h(x) }.
First-order methods: iterative methods which operate with the gradients (subgradients) of f and h.
Complexity: the number of iterations needed to find an ε-solution, i.e., a point x̄ ∈ X s.t. Ψ(x̄) − Ψ* ≤ ε.
Easy case (h simple, X simple): the prox mapping
Pr_{X,h}(y) := argmin_{x ∈ X} ‖y − x‖² + h(x)
is easy to compute (e.g., compressed sensing). Complexity: O(1/√ε) (Nesterov 07, Tseng 08, Beck and Teboulle 09).
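For the common choice X = ℝⁿ and h(x) = λ‖x‖₁, the prox mapping above has a well-known closed form, soft-thresholding. A small sketch (my illustration; note the slide's ‖y − x‖² carries no ½ factor, which is why the threshold below is λ/2):

```python
import numpy as np

# Pr_{X,h}(y) = argmin_x ||y - x||^2 + lam * ||x||_1  with X = R^n
# reduces to componentwise soft-thresholding at level lam / 2.
def prox_l1(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam / 2.0, 0.0)

y = np.array([1.5, -0.2, 0.6])
print(prox_l1(y, 1.0))   # soft-thresholds each entry by 0.5
```

This O(n) formula is what makes the compressed-sensing case "easy": each accelerated-gradient iteration needs only one such cheap prox evaluation.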


  6. More difficult cases.
- h general, X simple: h is a general nonsmooth function and the projection P_X(y) := argmin_{x ∈ X} ‖y − x‖² is easy to compute (e.g., total variation). Complexity: O(1/ε²).
- h structured, X simple: h is structured, e.g., h(x) = max_{y ∈ Y} ⟨Ax, y⟩, and P_X is easy to compute (e.g., total variation). Complexity: O(1/ε).
- h simple, X complicated: the linear oracle L_{X,h}(y) := argmin_{x ∈ X} ⟨y, x⟩ + h(x) is easy to compute (e.g., matrix completion). Complexity: O(1/ε).
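The third oracle, L_{X,h}, is what conditional-gradient-type methods call instead of a projection. A hedged sketch with h = 0 and X an ℓ1-ball (my own choice of X, not from the talk): the linear minimizer is always a signed vertex.

```python
import numpy as np

# L_{X,h}(y) = argmin_{x in X} <y, x> + h(x)  with h = 0 and
# X = { x : ||x||_1 <= r }: the minimizer is -r * sign(y_i*) e_{i*},
# where i* is the coordinate with the largest |y_i|.
def lmo_l1_ball(y, r=1.0):
    i = np.argmax(np.abs(y))       # best vertex direction
    x = np.zeros_like(y)
    x[i] = -r * np.sign(y[i])      # move opposite to y along e_i
    return x

y = np.array([0.2, -3.0, 1.0])
print(lmo_l1_ball(y))              # a vertex of the l1-ball
```

The matrix-completion analogue replaces this with the nuclear-norm ball, whose linear oracle needs only the top singular-vector pair — far cheaper than a full projection.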



  9. Motivation. Complexity bounds and typical iteration counts:
- h simple, X simple: O(1/√ε), ~100 iterations.
- h general, X simple: O(1/ε²), ~10⁸ iterations.
- h structured, X simple: O(1/ε), ~10⁴ iterations.
- h simple, X complicated: O(1/ε), ~10⁴ iterations.
More general h or more complicated X ⇒ slow convergence of first-order algorithms ⇒ a large number of gradient evaluations of ∇f.

  10. Motivation (continued). More general h or more complicated X ⇒ slow convergence of first-order algorithms ⇒ a large number of gradient evaluations of ∇f.
Question: Can we skip the computation of ∇f?

  11. Composite problems. Ψ* = min_{x ∈ X} { Ψ(x) := f(x) + h(x) }.
- f is smooth, i.e., ∃ L > 0 s.t. ‖∇f(y) − ∇f(x)‖ ≤ L‖y − x‖ for all x, y ∈ X.
- h is nonsmooth, i.e., ∃ M > 0 s.t. |h(x) − h(y)| ≤ M‖y − x‖ for all x, y ∈ X.
- P_X is simple to compute.
Question: How many gradient evaluations of ∇f and subgradient evaluations of h′ are needed to find an ε-solution?
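The two constants above are concrete for standard instances. A quick numerical illustration (my example, not from the talk): for f(x) = ½‖Ax − b‖² the gradient Aᵀ(Ax − b) is Lipschitz with L = ‖AᵀA‖₂, and h(x) = λ‖x‖₁ has |h(x) − h(y)| ≤ λ‖x − y‖₁ ≤ λ√n ‖x − y‖₂, so M = λ√n works.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam, n = 0.1, 5

L = np.linalg.norm(A.T @ A, 2)   # spectral norm = Lipschitz constant of grad f
M = lam * np.sqrt(n)             # Lipschitz constant of the l1 regularizer

# spot-check the gradient bound on a random pair: grad(x)-grad(y) = A^T A (x-y)
grad = lambda x: A.T @ (A @ x - b)
x, y = rng.standard_normal(5), rng.standard_normal(5)
assert np.linalg.norm(grad(x) - grad(y)) <= L * np.linalg.norm(x - y) + 1e-9
print(round(L, 2), round(M, 3))
```

Note the asymmetry the next slides exploit: evaluating grad costs a pass over the full data matrix A, while a subgradient of h is a componentwise sign.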

  12. Existing results. Existing algorithms evaluate ∇f and h′ together at each iteration:
- Mirror-prox method (Juditsky, Nemirovski and Tauvel, 11): O( L/ε + M²/ε² ).
- Accelerated stochastic approximation (Lan, 12): O( √(L/ε) + M²/ε² ).
Issue: whenever the second term dominates, the number of gradient evaluations of ∇f is given by O(1/ε²).

  13. Bottleneck for composite problems. The computation of ∇f, however, is often the bottleneck in comparison with that of h′: the computation of ∇f involves a large data set, while that of h′ only involves a very sparse matrix. In total variation minimization, the computation of the gradient costs O(m × n), while the computation of a subgradient costs O(n).
Question: Can we reduce the number of gradient evaluations of ∇f from O(1/ε²) to O(1/√ε), while still maintaining the optimal O(1/ε²) bound on subgradient evaluations of h′?

  14. The gradient sliding algorithm.
Algorithm 1: The gradient sliding (GS) algorithm.
Input: initial point x_0 ∈ X and iteration limit N. Let β_k ≥ 0, γ_k ≥ 0, and T_k ≥ 0 be given and set x̄_0 = x_0.
for k = 1, 2, ..., N do
  1. Set x_k = (1 − γ_k) x̄_{k−1} + γ_k x_{k−1} and g_k = ∇f(x_k).
  2. Set (x_k, x̃_k) = PS(g_k, x_{k−1}, β_k, T_k).
  3. Set x̄_k = (1 − γ_k) x̄_{k−1} + γ_k x̃_k.
end for
Output: x̄_N.
PS: the prox-sliding procedure.

  15. The PS procedure.
Procedure (x⁺, x̃⁺) = PS(g, x, β, T).
Let the parameters p_t > 0 and θ_t ∈ [0, 1], t = 1, ..., T, be given. Set u_0 = ũ_0 = x.
for t = 1, 2, ..., T do
  u_t = argmin_{u ∈ X} ⟨g + h′(u_{t−1}), u⟩ + (β/2)‖u − x‖² + (β p_t / 2)‖u − u_{t−1}‖²,
  ũ_t = (1 − θ_t) ũ_{t−1} + θ_t u_t.
end for
Set x⁺ = u_T and x̃⁺ = ũ_T.
Note: ‖· − ·‖²/2 can be replaced by the more general Bregman distance V(x, u) = ω(u) − ω(x) − ⟨∇ω(x), u − x⟩.
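The GS algorithm and the PS procedure can be sketched end to end. This is my own instantiation for illustration, not the author's code: I assume X is the box [−1, 1]ⁿ (so the argmin in PS is a clipped closed-form step), f(x) = ½‖Ax − b‖², h(x) = λ‖x‖₁, and the β_k, γ_k, T_k choices from the convergence theorem later in the talk.

```python
import numpy as np

def ps(g, x, beta, T, h_sub, r=1.0):
    """Prox-sliding: T subgradient steps on Phi(u) = <g,u> + h(u) + (beta/2)||u-x||^2."""
    u = ut = x.copy()
    for t in range(1, T + 1):
        p, theta = t / 2.0, 2.0 * (t + 1) / (t * (t + 3))
        q = g + h_sub(u)                      # g + h'(u_{t-1})
        # argmin_u <q,u> + beta/2 ||u-x||^2 + beta*p/2 ||u-u_{t-1}||^2 over the
        # box: identity Hessian, so minimize freely and project by clipping.
        u = np.clip((beta * x + beta * p * u - q) / (beta * (1.0 + p)), -r, r)
        ut = (1 - theta) * ut + theta * u     # weighted running average
    return u, ut

def gradient_sliding(grad_f, h_sub, x0, L, M, N, D=1.0):
    x, xbar = x0.copy(), x0.copy()
    for k in range(1, N + 1):
        beta, gamma = 2.0 * L / k, 2.0 / (k + 1)
        T = int(np.ceil(M**2 * N * k**2 / (D * L**2)))
        z = (1 - gamma) * xbar + gamma * x    # step 1: search point
        g = grad_f(z)                         # the only grad-f call this iteration
        x, xt = ps(g, x, beta, T, h_sub)      # step 2: T cheap subgradient steps
        xbar = (1 - gamma) * xbar + gamma * xt  # step 3
    return xbar

# demo instance: least squares + l1 over the box
rng = np.random.default_rng(1)
A, b, lam = rng.standard_normal((30, 10)), rng.standard_normal(30), 0.05
L = np.linalg.norm(A.T @ A, 2)
M = lam * np.sqrt(10)
grad_f = lambda x: A.T @ (A @ x - b)
h_sub = lambda x: lam * np.sign(x)
psi = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

x0 = np.zeros(10)
xN = gradient_sliding(grad_f, h_sub, x0, L, M, N=50)
print(psi(x0), psi(xN))                       # objective decreases
```

The key structural point is visible in the code: `grad_f` is evaluated once per outer iteration, while the inner loop touches only the cheap subgradient of h.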

  16. Remarks. When supplied with g(·), x ∈ X, β, and T, the PS procedure computes a pair of approximate solutions (x⁺, x̃⁺) ∈ X × X for the problem
argmin_{u ∈ X} { Φ(u) := ⟨g, u⟩ + h(u) + (β/2)‖u − x‖² }.
In the k-th outer iteration, the subproblem is
argmin_{u ∈ X} { Φ_k(u) := ⟨∇f(x_k), u⟩ + h(u) + (β_k/2)‖u − x_{k−1}‖² }.

  17. Convergence of the PS procedure.
Proposition. If {p_t} and {θ_t} in the PS procedure satisfy
p_t = t/2 and θ_t = 2(t+1)/(t(t+3)),
then for any t ≥ 1 and u ∈ X,
Φ(ũ_t) − Φ(u) + (β(t+1)(t+2))/(2t(t+3)) ‖u_t − u‖² ≤ (β‖u_0 − u‖²)/(t(t+3)) + (2M²)/(β(t+3)).

  18. Convergence of the GS algorithm.
Theorem. Suppose that the previous conditions on {p_t} and {θ_t} hold, and that N is given a priori. If
β_k = 2L/k, γ_k = 2/(k+1), and T_k = ⌈ M²Nk² / (D̃L²) ⌉
for some D̃ > 0, then
Ψ(x̄_N) − Ψ(x*) ≤ (L/(N(N+1))) ( 3‖x_0 − x*‖²/2 + 2D̃ ).
Remark: N does NOT need to be given a priori if X is bounded.
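Summing the inner iteration counts shows how these parameter choices answer the question raised earlier, with both oracle bounds met simultaneously (a quick sanity check, using Σ_{k≤N} k² ≤ N³):

```latex
\sum_{k=1}^{N} T_k
  \;\le\; \sum_{k=1}^{N}\left(1 + \frac{M^2 N k^2}{\tilde D L^2}\right)
  \;\le\; N + \frac{M^2 N^4}{\tilde D L^2},
\qquad
N = O\!\left(\sqrt{L/\epsilon}\,\right)
\;\Longrightarrow\;
\sum_{k=1}^{N} T_k = O\!\left(\frac{M^2}{\epsilon^2} + \sqrt{\frac{L}{\epsilon}}\right).
```

So the number of ∇f evaluations is only N = O(√(L/ε)), while the total number of subgradient evaluations of h′ stays at the optimal O(M²/ε²).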
