
SLIDE 1

Recent Progress on Error Bounds for Structured Convex Programming

Zirui Zhou

Joint work with Anthony Man-Cho So

Department of Systems Engineering & Engineering Management
The Chinese University of Hong Kong
September 3, 2014, Beijing

SLIDE 2

Outline

  • overview of error bound
  • associated solution mapping
  • upper Lipschitzian continuity of multifunctions
  • a sufficient condition for error bound
  • strongly convex functions
  • convex functions with polyhedral epigraph
  • group-lasso regularizer
  • conclusion

Error Bounds for Structured Convex Programming 1

SLIDE 3

Structured Convex Programming

Consider the structured problem

min_{x ∈ R^n} F(x) := f(x) + τP(x),

with τ > 0 given, optimal value v∗, and optimal solution set X.

  • f: convex and continuously differentiable;
  • P: lower semicontinuous and convex, e.g.,
    – the indicator function of a non-empty closed convex set,
    – various regularizers arising in applications, such as the ℓ1 or group-lasso norm.

SLIDE 4

Residual Function

Define the residual function R : R^n → R^n by

R(x) := arg min_{d ∈ R^n} { ℓ_F(x + d; x) + (1/2)‖d‖² },

where ‖·‖ is the usual vector 2-norm and ℓ_F is the linearization of F,

ℓ_F(y; x) := f(x) + ⟨∇f(x), y − x⟩ + τP(y).

  • x ∈ X ⇔ R(x) = 0;
  • R(x) is easy to compute.

SLIDE 5

Residual Function: Examples

  • P(x) ≡ 0:  R(x) = −∇f(x);
  • P(x) = I_D(x):  R(x) = x − [x − ∇f(x)]⁺_D;
  • P(x) = ‖x‖₁:  R(x) = x − s_τ(x − ∇f(x));

where [·]⁺_D is the projection operator onto D and s_τ(·) is the vector shrinkage (soft-thresholding) operator: v = s_τ(x) has components

v_i = x_i − τ, if x_i ≥ τ;  0, if −τ < x_i < τ;  x_i + τ, if x_i ≤ −τ.
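The ℓ1 case can be sketched in a few lines of numpy; `shrink`, `residual_l1`, and the toy quadratic f below are our own illustrative names and choices, not from the talk.

```python
import numpy as np

def shrink(x, tau):
    """Vector shrinkage (soft-thresholding) operator s_tau, componentwise."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def residual_l1(x, grad_f, tau):
    """Residual R(x) = x - s_tau(x - grad f(x)) for P(x) = ||x||_1."""
    return x - shrink(x - grad_f(x), tau)

# Toy smooth part f(x) = 0.5*||x - b||^2, so grad f(x) = x - b.
b = np.array([2.0, -0.3, 0.0])
grad_f = lambda x: x - b
tau = 0.5

# For this particular f, the minimizer of f + tau*||.||_1 is
# x* = s_tau(b), and the residual vanishes exactly there (x in X <=> R(x)=0).
x_star = shrink(b, tau)
print(np.allclose(residual_l1(x_star, grad_f, tau), 0.0))  # -> True
```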

SLIDE 6

Error Bound: Definition

  • Forward error: dist(x, X).
  • Backward error: ‖R(x)‖.

Error Bound Condition: there exist κ > 0 and a closed set U ⊆ R^n such that

dist(x, X) ≤ κ‖R(x)‖ whenever x ∈ U.

  • Global error bound: U = R^n.
  • Local error bound: U is the closure of a neighbourhood of X.

SLIDE 7

What If Error Bound Holds

  • Stopping criterion: estimate dist(x^k, X) via

dist(x^k, X) ≤ κ‖R(x^k)‖.

  • Linear convergence: for example, under mild assumptions,

‖R(x^k)‖ ≤ κ₁‖x^{k+1} − x^k‖, k = 1, 2, . . . .

This gives a key step towards linear convergence,

dist(x^k, X) ≤ κ‖R(x^k)‖ ≤ κκ₁‖x^{k+1} − x^k‖.

    – global error bound ⇒ global linear rate;
    – local error bound ⇒ asymptotic linear rate.
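The stopping-criterion use can be sketched with a proximal gradient loop on an ℓ1-regularized least-squares toy problem; the function name, data, and tolerances below are our own illustrative choices, not from the talk.

```python
import numpy as np

def prox_grad_l1(grad_f, x0, tau, step, tol=1e-10, max_iter=20000):
    """Proximal gradient on F = f + tau*||.||_1, stopping when the
    fixed-point residual ||x - x_next|| (a step-scaled version of ||R(x)||)
    falls below tol: under an error bound this certifies dist(x_k, X)."""
    shrink = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        x_next = shrink(x - step * grad_f(x), step * tau)
        if np.linalg.norm(x - x_next) <= tol:  # residual-based stopping rule
            return x_next, k
        x = x_next
    return x, max_iter

# f(x) = 0.5*||Ax - b||^2 with A square and invertible, so f is strongly
# convex and a global error bound holds (case (a) of the existing results).
A = np.array([[1.0, 2.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
grad_f = lambda x: A.T @ (A @ x - b)
x, iters = prox_grad_l1(grad_f, np.zeros(2), tau=0.1, step=0.15)
print(iters < 20000)  # -> True: the residual criterion fired
```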

SLIDE 8

Conditions for Error Bounds: Existing Results

(a) f is strongly convex [Pang'87];
(b) f(x) = h(Ax), P has polyhedral epigraph [Luo-Tseng'92];
(c) f(x) = h(Ax), P is the group-lasso or sparse group-lasso regularizer [Tseng'09, Zhang-Jiang-Luo'13].

Notation in cases (b) and (c):

  • A is any matrix;
  • h is a strongly (strictly) convex differentiable function with ∇h Lipschitz continuous;
  • group-lasso: for x ∈ R^n, P(x) = Σ_{J ∈ 𝒥} ω_J‖x_J‖₂, where 𝒥 is a non-overlapping partition of {1, . . . , n}.

SLIDE 9

Assumptions

Throughout, for the structured problem

min_{x ∈ R^n} F(x) := f(x) + τP(x),  (1)

we make the following assumptions:

  • f takes the form f(x) = h(Ax), where A ∈ R^{m×n} is a matrix, h : R^m → R is σ-strongly convex, and ∇h is L-Lipschitz continuous;
  • X is non-empty.

SLIDE 10

Optimal Solution Set

By the first-order optimality condition,

X = {x ∈ R^n | 0 ∈ ∇f(x) + τ∂P(x)}.

Since h is strictly convex, we have:

  • there exists ȳ ∈ R^m such that Ax = ȳ for all x ∈ X;
  • ∇f(x) = A^T∇h(Ax), so letting ḡ = A^T∇h(ȳ), we get ∇f(x) = ḡ for all x ∈ X.

Thus, assuming ȳ and ḡ are known, X has the characterization

X = {x ∈ R^n | Ax = ȳ, −ḡ ∈ τ∂P(x)}.
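The invariance of Ax over the optimal set can be checked numerically on a small ℓ1-regularized least-squares problem: run a proximal gradient sketch from two starting points and compare. All names and data below are our own illustration, not from the talk.

```python
import numpy as np

# Numerical check of the claim: A x (and hence grad f(x)) is the same for
# every optimal x of min 0.5*||Ax - b||^2 + tau*||x||_1.
shrink = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])  # wide: X need not be a singleton
b = np.array([1.0, 2.0])
tau, step = 0.05, 0.15
grad_f = lambda x: A.T @ (A @ x - b)

def solve(x0, iters=30000):
    """Plain proximal gradient, run long enough to get close to X."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = shrink(x - step * grad_f(x), step * tau)
    return x

x1, x2 = solve(np.zeros(3)), solve(np.array([5.0, -3.0, 2.0]))
print(np.allclose(A @ x1, A @ x2, atol=1e-4))  # -> True: Ax = y_bar on X
```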

SLIDE 11

Solution Mapping

  • Let Σ : R^m × R^n ⇒ R^n be the multifunction (set-valued map) defined by

Σ(t, e) := {x ∈ R^n | Ax = t, e ∈ ∂P(x)}, ∀t ∈ R^m, e ∈ R^n.

We call Σ the solution mapping associated with (1).

  • Relationship with the optimal solution set:

X = Σ(ȳ, −ḡ/τ).

SLIDE 12

Upper Lipschitzian Continuity

For a solution mapping Σ and a point (t̄, ē) ∈ R^m × R^n, we say:

  • Σ is globally upper Lipschitzian continuous (global-ULC) at (t̄, ē) with modulus θ if

Σ(t, e) ⊆ Σ(t̄, ē) + θ‖(t, e) − (t̄, ē)‖B, ∀(t, e) ∈ R^m × R^n;

  • Σ is locally upper Lipschitzian continuous (local-ULC) at (t̄, ē) with modulus θ if there exists a constant δ > 0 such that

Σ(t, e) ⊆ Σ(t̄, ē) + θ‖(t, e) − (t̄, ē)‖B whenever ‖(t, e) − (t̄, ē)‖ ≤ δ.

Here B is the unit ball of R^n.

SLIDE 13

A Sufficient Condition for Error Bound

Proposition. Let Σ be the solution mapping associated with (1). Then:
(a) Σ is global-ULC at (ȳ, −ḡ/τ) ⇒ the global error bound holds;
(b) Σ is local-ULC at (ȳ, −ḡ/τ) ⇒ the local error bound holds.

Remark. In case (b), the strong convexity assumption on h can be relaxed to h being strictly convex and strongly convex on every compact subset of dom h.

SLIDE 14

Proof of Global Error Bound

For any x ∈ R^n, by the optimality condition defining R(x),

0 ∈ ∇f(x) + R(x) + τ∂P(x + R(x)).

This gives

x + R(x) ∈ Σ( A(x + R(x)), −(∇f(x) + R(x))/τ ).

Since Σ is global-ULC at (ȳ, −ḡ/τ) and Σ(ȳ, −ḡ/τ) = X,

dist(x + R(x), X) ≤ θ ‖( A(x + R(x)), −(∇f(x) + R(x))/τ ) − (ȳ, −ḡ/τ)‖
                  ≤ θ̃ (‖Ax − ȳ‖ + ‖R(x)‖).

The second inequality uses the Lipschitz continuity of ∇h, since ∇f(x) − ḡ = A^T(∇h(Ax) − ∇h(ȳ)).

SLIDE 15

Let x̄ be the projection of x onto X and x̄_R the projection of x + R(x) onto X. Then

dist(x, X) ≤ ‖x − x̄_R‖ = ‖x + R(x) − x̄_R − R(x)‖ ≤ dist(x + R(x), X) + ‖R(x)‖.

Thus, for a suitable constant κ₀,

dist(x, X) ≤ κ₀ (‖Ax − ȳ‖ + ‖R(x)‖).

Using the inequality (a + b)² ≤ 2(a² + b²) for any a, b ∈ R,

dist²(x, X) ≤ 2κ₀² (‖Ax − ȳ‖² + ‖R(x)‖²).  (2)

Since h is strongly convex with modulus σ,

σ‖Ax − ȳ‖² ≤ ⟨∇h(Ax) − ∇h(ȳ), Ax − ȳ⟩ = ⟨∇f(x) − ḡ, x − x̄⟩.  (3)

Using Fermat's rule for R(x) and standard arguments, there exists a constant κ₁ > 0 such that

⟨∇f(x) − ḡ, x − x̄⟩ ≤ κ₁‖x − x̄‖ · ‖R(x)‖.

SLIDE 16

Combining the last inequality with (3) and (2), there exists κ₂ > 0 satisfying

dist²(x, X) ≤ κ₂ (‖x − x̄‖ · ‖R(x)‖ + ‖R(x)‖²).

Since ‖x − x̄‖ = dist(x, X), solving this quadratic inequality yields a constant κ such that dist(x, X) ≤ κ‖R(x)‖. This establishes the global error bound.

SLIDE 17

ULC Property of Solution Mapping

Solution mapping:

Σ(t, e) = {x ∈ R^n | Ax = t, e ∈ ∂P(x)}, ∀t ∈ R^m, e ∈ R^n.

Next, we study the ULC property of Σ in the following three cases:

  • f is strongly convex and P is any lower semicontinuous convex function;
  • f is not strongly convex and P has polyhedral epigraph;
  • f is not strongly convex and P is the group-lasso regularizer.

SLIDE 18

f Strongly Convex

  • Since f(x) = h(Ax) is strongly convex, A is injective, so A has an inverse A⁻¹ on its range.
  • For any (t, e) ∈ R^m × R^n, either Σ(t, e) = {A⁻¹(t)} or Σ(t, e) = ∅.
  • If Σ is non-empty at (t̄, ē), then

Σ(t, e) ⊆ Σ(t̄, ē) + ‖A⁻¹‖ · ‖t − t̄‖B, ∀(t, e) ∈ R^m × R^n.

So in this case Σ is global-ULC at (t̄, ē), and the global error bound holds.

SLIDE 19

f Non-Strongly Convex and P Polyhedral

  • P has polyhedral epigraph:

epi P = {(z, w) ∈ R^n × R | C_z z + C_w w ≤ d},

where C_z ∈ R^{l×n} and C_w, d ∈ R^l.

  • Proposition: for any e ∈ R^n, e ∈ ∂P(x) if and only if there exists s ∈ R such that (x, s) is an optimal solution of the following LP:

min −e^T z + w  s.t.  C_z z + C_w w ≤ d.  (4)

Proof: Indeed, if e ∈ ∂P(x), then by the definition of the subgradient,

P(z) ≥ P(x) + e^T(z − x), ∀z ∈ dom P.

Upon rearranging,

P(x) − e^T x ≤ P(z) − e^T z ≤ w − e^T z, ∀(z, w) ∈ epi P.

SLIDE 20

This implies (x, P(x)) is an optimal solution of (4). On the other hand, if (x, s) is an optimal solution, then s = P(x): otherwise, since (x, s) ∈ epi P, we would have P(x) < s, and (x, P(x)) ∈ epi P would attain the strictly smaller objective value −e^T x + P(x) < −e^T x + s, contradicting optimality. Hence

P(x) − e^T x ≤ P(z) − e^T z, ∀z ∈ dom P,

and by the definition of the subgradient, e ∈ ∂P(x).

  • Optimality conditions for the LP: e ∈ ∂P(x) if and only if there exist s ∈ R, γ ∈ R^l such that (x, s, γ) solves the system

S(e) := { (z, w, λ) | C_z^T λ = e, 1 + C_w^T λ = 0, λ ≥ 0, C_z z + C_w w ≤ d, ⟨λ, C_z z + C_w w − d⟩ = 0 }.

  • The solution mapping Σ can then be expressed as

Σ(t, e) = { x ∈ R^n | Ax = t, (x, s, γ) ∈ S(e) for some s ∈ R, γ ∈ R^l }.
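For the concrete case P = ‖·‖₁ (a polyhedral epigraph), the LP characterization can be checked directly: the LP value inf_z { ‖z‖₁ − e^T z } is finite (namely 0) iff ‖e‖_∞ ≤ 1, and (x, ‖x‖₁) attains it iff ‖x‖₁ − e^T x = 0. The function name below is our own illustration, not from the talk.

```python
import numpy as np

def in_subdiff_l1(e, x, tol=1e-9):
    """Subgradient test e in d||x||_1 via the LP characterization:
    the LP min -e^T z + w over epi ||.||_1 is unbounded iff ||e||_inf > 1,
    and (x, ||x||_1) is optimal iff ||x||_1 - e^T x equals the value 0."""
    if np.max(np.abs(e)) > 1 + tol:   # LP (4) is unbounded below
        return False
    return abs(np.sum(np.abs(x)) - e @ x) <= tol

x = np.array([2.0, 0.0, -1.0])
print(in_subdiff_l1(np.array([1.0, 0.3, -1.0]), x))  # -> True
print(in_subdiff_l1(np.array([0.5, 0.3, -1.0]), x))  # -> False: e_1 != sign(x_1)
```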

SLIDE 21

Polyhedral Multifunction

  • A multifunction Γ : X ⇒ Y is said to be a polyhedral multifunction if its graph

Graph(Γ) := {(x, y) ∈ X × Y | y ∈ Γ(x)}

is a finite union of polyhedral sets.

  • Polyhedral multifunctions are local-ULC [Robinson'81].
  • Σ is a polyhedral multifunction, and thus Σ is local-ULC.

So in this case, we have a local error bound.

SLIDE 22

f Non-Strongly Convex and P Group-Lasso Regularizer

  • Group-lasso regularizer:

P(x) = Σ_{J ∈ 𝒥} ω_J‖x_J‖₂.

  • Solution mapping:

Σ(t, e) = { x ∈ R^n | Ax = t, e ∈ Σ_{J ∈ 𝒥} ω_J ∂‖x_J‖₂ }.

  • Theorem. For any (t̄, ē) ∈ R^m × R^n, if Σ(t̄, ē) is non-empty and bounded, then Σ is locally upper Lipschitzian continuous at (t̄, ē).

So in this case, we have a local error bound.

SLIDE 23

Proof of Theorem

For simplicity, we consider the single-group case

Σ(t, e) = {x ∈ R^n | Ax = t, e ∈ ∂‖x‖₂}.

By the definition of the subgradient,

∂‖z‖₂ = B(0, 1) if z = 0;  {z/‖z‖₂} otherwise.

  • If ‖e‖₂ > 1, Σ(t, e) is empty;
  • if ‖e‖₂ < 1, Σ(t, e), if non-empty, equals {0};
  • if ‖e‖₂ = 1, Σ(t, e), if non-empty, has the expression

Σ(t, e) = {x ∈ R^n | Ax = t, x is a non-negative multiple of e}.
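This case analysis translates directly into a small membership test for Σ(t, e); the function name and data below are our own illustration, not from the talk.

```python
import numpy as np

def in_sigma(x, A, t, e, tol=1e-9):
    """Membership test x in Sigma(t, e) = {x | Ax = t, e in d||x||_2},
    following the case analysis of d||z||_2 for the single-group case."""
    if np.linalg.norm(A @ x - t) > tol:
        return False
    nx = np.linalg.norm(x)
    if nx <= tol:                                # d||0||_2 = closed unit ball
        return np.linalg.norm(e) <= 1 + tol
    return np.linalg.norm(e - x / nx) <= tol     # d||x||_2 = {x / ||x||_2}

A = np.array([[1.0, 1.0]])
x = np.array([1.0, 1.0])
print(in_sigma(x, A, np.array([2.0]), x / np.sqrt(2)))        # -> True
print(in_sigma(x, A, np.array([2.0]), np.array([1.0, 0.0])))  # -> False
```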

SLIDE 24

Suppose (t̄, ē) is such that Σ(t̄, ē) is non-empty and bounded; then ‖ē‖₂ ≤ 1. Consider the following two cases: (a) ‖ē‖₂ < 1; (b) ‖ē‖₂ = 1.

(a) In this case Σ(t̄, ē) = {0}. Since ‖ē‖₂ < 1, there exists δ_a > 0 such that ‖e‖₂ < 1 whenever ‖e − ē‖₂ ≤ δ_a. So Σ(t, e) = ∅ or {0} whenever ‖(t, e) − (t̄, ē)‖₂ ≤ δ_a, and hence, for any θ > 0,

Σ(t, e) ⊆ Σ(t̄, ē) + θ‖(t, e) − (t̄, ē)‖₂B whenever ‖(t, e) − (t̄, ē)‖₂ ≤ δ_a.

By definition, Σ is local-ULC at (t̄, ē) in case (a).

SLIDE 25
(b) In this case,

Σ(t̄, ē) = {x ∈ R^n | Ax = t̄, x is a non-negative multiple of ē}.

Let [ē, Ē] be an orthonormal basis of R^n. Then

x is a non-negative multiple of ē ⇔ ē^T x ≥ 0, Ē^T x = 0.

Thus we have the representation

Σ(t̄, ē) = {x ∈ R^n | Ax = t̄, ē^T x ≥ 0, Ē^T x = 0},

so Σ(t̄, ē) is a polyhedral set. Applying the well-known Hoffman bound, there exists κ > 0 such that

dist(x, Σ(t̄, ē)) ≤ κ (‖Ax − t̄‖₂ + [ē^T x]₋ + ‖Ē^T x‖₂), ∀x ∈ R^n,

where for a scalar z we denote [z]₋ = max{0, −z}.

SLIDE 26

Now consider x ∈ Σ(t, e) with (t, e) ≠ (t̄, ē).

– If ‖e‖₂ < 1, then x = 0 and Ax = t, so

dist(x, Σ(t̄, ē)) ≤ κ‖t − t̄‖₂ ≤ κ (‖t − t̄‖₂ + ‖e − ē‖₂), ∀x ∈ Σ(t, e).  (5)

– If ‖e‖₂ = 1, then Ax = t and x is a non-negative multiple of e.

Fact. There exists a matrix E such that [e, E] is an orthonormal basis of R^n and ‖E_i − Ē_i‖₂ ≤ ‖e − ē‖₂ for i = 1, . . . , n − 1, where E_i is the i-th column of E.

Since x is a non-negative multiple of e ⇔ e^T x ≥ 0, E^T x = 0, for any x ∈ Σ(t, e),

dist(x, Σ(t̄, ē)) ≤ κ (‖t − t̄‖₂ + [ē^T x]₋ + ‖Ē^T x‖₂)
 ≤ κ (‖t − t̄‖₂ + [e^T x]₋ + [(ē − e)^T x]₋ + ‖E^T x‖₂ + ‖(Ē − E)^T x‖₂)
 ≤ κ (‖t − t̄‖₂ + ‖ē − e‖₂‖x‖₂ + Σ_{i=1}^{n−1} ‖Ē_i − E_i‖₂‖x‖₂)
 ≤ κ (‖t − t̄‖₂ + n‖x‖₂‖ē − e‖₂).

SLIDE 27
Fact. If Σ(t̄, ē) is bounded, there exists δ_b > 0 such that Σ(t, e) is bounded whenever ‖(t, e) − (t̄, ē)‖₂ ≤ δ_b.

So there exists R > 0 such that ‖x‖₂ ≤ R for any x ∈ Σ(t, e) with ‖(t, e) − (t̄, ē)‖₂ ≤ δ_b. Using the above relationship, for any (t, e) satisfying ‖(t, e) − (t̄, ē)‖₂ ≤ δ_b and ‖e‖₂ = 1,

dist(x, Σ(t̄, ē)) ≤ κ(1 + nR)(‖t − t̄‖₂ + ‖e − ē‖₂), ∀x ∈ Σ(t, e).  (6)

Combining (5) and (6), and letting θ = κ(1 + nR),

Σ(t, e) ⊆ Σ(t̄, ē) + θ‖(t, e) − (t̄, ē)‖₂B whenever ‖(t, e) − (t̄, ē)‖₂ ≤ δ_b.

So Σ is local-ULC at (t̄, ē) in case (b). Together with case (a), Σ is local-ULC at (t̄, ē) whenever Σ(t̄, ē) is non-empty and bounded.

SLIDE 28

Conclusions and Future Work

Contributions:

  • based on the ULC property of the associated solution mapping, we give a sufficient condition for error bounds that unifies all the existing results;
  • we give an alternative approach to error bounds for group-lasso regularized optimization.

Some future directions:

  • study the solution mapping in more cases, e.g., mixed norms, nuclear norm;
  • error bounds beyond the current assumptions.
