

  1. Introductory Course on Non-smooth Optimisation, Lecture 09: Non-convex optimisation. Jingwei Liang, Department of Applied Mathematics and Theoretical Physics.

  2. Table of contents: 1. Examples; 2. Non-convex optimisation; 3. Convex relaxation; 4. Łojasiewicz inequality; 5. Kurdyka-Łojasiewicz inequality.

  3. Compressed sensing. Forward observation: b = A ˚x, where ˚x ∈ R^n is sparse and A : R^n → R^m with m << n. Compressed sensing problem: min_{x ∈ R^n} ||x||_0 s.t. Ax = b. NB: this is an NP-hard problem. Jingwei Liang, DAMTP, Introduction to Non-smooth Optimisation, March 13, 2019
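A minimal sketch of why the ℓ0 problem is NP-hard (the helper name and the toy instance are mine, not from the slides): solving it exactly amounts to enumerating candidate supports, whose number grows combinatorially in n.

```python
import itertools
import numpy as np

def l0_min_bruteforce(A, b, tol=1e-8):
    """Solve min ||x||_0 s.t. Ax = b by enumerating candidate supports.

    Exact but exponential-time: the number of supports grows
    combinatorially in n, which is why the l0 problem is NP-hard.
    """
    m, n = A.shape
    if np.linalg.norm(b) < tol:          # the zero vector is feasible
        return np.zeros(n)
    for k in range(1, n + 1):            # try increasing sparsity levels
        for support in itertools.combinations(range(n), k):
            cols = list(support)
            xs, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
            if np.linalg.norm(A[:, cols] @ xs - b) < tol:   # Ax = b holds
                x = np.zeros(n)
                x[cols] = xs
                return x
    return None                          # no exact solution found

# Tiny instance: 3 random measurements of a 1-sparse signal in R^5.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
x_true = np.zeros(5); x_true[2] = 1.0
b = A @ x_true
x_hat = l0_min_bruteforce(A, b)
```

Already at n = 50 this search is hopeless, which motivates the convex relaxations discussed later in the lecture.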

  4. Image processing. Two-phase segmentation: given an image I consisting of foreground and background, segment the foreground. Ideally, I = f on C and I = b on Ω \ C. Mumford-Shah model: E(u, C) = ∫_Ω (u − I)^2 dx + λ ∫_{Ω\C} ||∇u||^2 dx + α |C|, where |C| = peri(C).

  5. Principal component pursuit. Forward mixture model: w = ˚x + ˚y + ε, where ˚x ∈ R^{m×n} is κ-sparse, ˚y ∈ R^{m×n} is σ-low-rank and ε is noise. Non-convex PCP: min_{x,y ∈ R^{m×n}} (1/2)||x + y − w||^2 s.t. ||x||_0 ≤ κ and rank(y) ≤ σ.
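One natural way to handle the two non-convex constraint sets is projected gradient, since both sets admit explicit (though non-unique) Euclidean projections: hard thresholding for the ℓ0 ball and SVD truncation (Eckart-Young) for the rank constraint. A minimal sketch on a noiseless toy instance; the function names, step size and iteration count are my choices, not from the slides.

```python
import numpy as np

def proj_sparse(x, k):
    """Projection onto {x : ||x||_0 <= k}: keep the k largest-magnitude
    entries (not unique in case of ties)."""
    out = np.zeros_like(x)
    idx = np.unravel_index(np.argsort(np.abs(x), axis=None)[-k:], x.shape)
    out[idx] = x[idx]
    return out

def proj_lowrank(y, r):
    """Projection onto {y : rank(y) <= r}: truncated SVD (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(y, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def pcp_projected_gradient(w, kappa, sigma, step=0.49, n_iter=500):
    """Projected gradient for min 0.5*||x + y - w||^2 subject to
    ||x||_0 <= kappa and rank(y) <= sigma. The joint gradient is
    2-Lipschitz, so we take step < 1/2."""
    x = np.zeros_like(w)
    y = np.zeros_like(w)
    for _ in range(n_iter):
        g = x + y - w                        # gradient w.r.t. both blocks
        x = proj_sparse(x - step * g, kappa)
        y = proj_lowrank(y - step * g, sigma)
    return x, y

# Toy instance: w is exactly (2-sparse) + (rank-1), no noise.
rng = np.random.default_rng(1)
x_true = np.zeros((8, 8)); x_true[0, 0] = 5.0; x_true[3, 4] = -4.0
u = rng.standard_normal((8, 1))
y_true = u @ u.T
w = x_true + y_true
x_hat, y_hat = pcp_projected_gradient(w, kappa=2, sigma=1)
```

The iterates satisfy the constraints by construction; whether they recover the true decomposition depends on the instance, as the problem is non-convex.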

  6. Neural networks. Each layer of a NN is convex: a linear operation (e.g. convolution) followed by a non-linear activation function, e.g. the rectifier max{x, 0}. However, the composition of convex functions is not necessarily convex. Neural networks are universal function approximators, hence they need to approximate non-convex functions, and non-convex functions cannot be approximated by convex functions.
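A concrete two-layer instance of the claim (my toy example): relu composed with an affine map is convex, and relu(1 - u) is convex in u, yet their composition h(x) = relu(1 - relu(x)) is not convex, because preserving convexity under composition would additionally require the outer function to be non-decreasing.

```python
def relu(x):
    # Convex activation max{x, 0}.
    return max(x, 0.0)

def h(x):
    # Two "convex layers" stacked: relu(1 - relu(x)).
    # Each layer is convex in its input, but h itself is NOT convex.
    return relu(1.0 - relu(x))

# Convexity would require h((a+b)/2) <= (h(a) + h(b))/2 for all a, b.
a, b = -1.0, 1.0
violated = h((a + b) / 2) > (h(a) + h(b)) / 2   # midpoint test fails
```

Here h(-1) = 1, h(1) = 0, but h(0) = 1 > 1/2, so the midpoint inequality is violated.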

  7. Outline: 1. Examples; 2. Non-convex optimisation; 3. Convex relaxation; 4. Łojasiewicz inequality; 5. Kurdyka-Łojasiewicz inequality.

  8. Non-convex optimisation. Non-convex problem: any problem that is not convex (or concave, for maximisation) is non-convex...

  9. Challenges: potentially many local minima; saddle points; very flat regions; widely varying curvature; NP-hardness.

  10. Outline: 1. Examples; 2. Non-convex optimisation; 3. Convex relaxation; 4. Łojasiewicz inequality; 5. Kurdyka-Łojasiewicz inequality.

  11. Convex relaxation. Non-convex optimisation problem: min_x E(x). Convex optimisation problem: min_x F(x). What if Argmin(F) ⊆ Argmin(E)? This is subtle and case-dependent: in general, finding such an F is almost as hard as solving E itself.
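A toy illustration of the inclusion Argmin(F) ⊆ Argmin(E) (my example, not from the slides): E(x) = 1 - cos(x) is non-convex with global minima at every multiple of 2π, while the convex surrogate F(x) = x² has the unique minimiser 0, which happens to be one of E's global minimisers.

```python
import math

def E(x):
    """Non-convex objective: global minima at every x = 2*pi*k, value 0."""
    return 1.0 - math.cos(x)

def F(x):
    """Convex surrogate: unique minimiser x = 0, which is also a global
    minimiser of E, so Argmin(F) = {0} is contained in Argmin(E)."""
    return x * x

x_star = 0.0   # exact minimiser of the convex surrogate F
```

Minimising F exactly yields a global minimiser of E, even though most of Argmin(E) (the other multiples of 2π) is invisible to F.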

  12. Convex relaxation: loose vs ideal. (Figures: a loose relaxation and an ideal relaxation.) In practice, it is easier to obtain Argmin(E) ⊆ Argmin(F). A loose relaxation will work if the two global minima are close enough; an ideal relaxation will fail if Argmin(F) is too large.

  13. Convolution. For certain problems, non-convexity can be treated as noise... (Figures: original function; its convolution.) With a symmetric boundary condition for the convolution, the problem becomes almost convex after convolution.
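A minimal sketch of this smoothing idea (the test function, kernel width and sampling grid are my choices): convolving a convex function plus oscillatory "noise" with a Gaussian kernel, using symmetric boundary conditions, removes most of the spurious local minima.

```python
import numpy as np

def smooth(f_vals, width):
    """Convolve sampled function values with a Gaussian kernel of the
    given width (in samples), using symmetric (reflecting) boundary
    conditions as on the slide."""
    t = np.arange(-3 * width, 3 * width + 1)
    kernel = np.exp(-0.5 * (t / width) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(f_vals, 3 * width, mode="symmetric")
    return np.convolve(padded, kernel, mode="same")[3 * width:-3 * width]

# Non-convex test function: convex quadratic plus oscillatory "noise".
x = np.linspace(-3, 3, 601)
f = x ** 2 + 0.3 * np.sin(20 * x)
f_smooth = smooth(f, width=15)
```

The original samples have many strict local minima from the sine term; the smoothed version is nearly convex, so simple descent methods behave much better on it.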

  14. Outline: 1. Examples; 2. Non-convex optimisation; 3. Convex relaxation; 4. Łojasiewicz inequality; 5. Kurdyka-Łojasiewicz inequality.

  15. Smooth problem. Let F ∈ C¹_L, i.e. ∇F is L-Lipschitz. Gradient descent: x_{k+1} = x_k − γ ∇F(x_k). Descent property: F(x_k) − F(x_{k+1}) ≥ γ(1 − γL/2) ||∇F(x_k)||². Let γ ∈ ]0, 2/L[; summing,
γ(1 − γL/2) Σ_{i=0}^{k} ||∇F(x_i)||² ≤ F(x_0) − F(x_{k+1}) ≤ F(x_0) − F(x⋆).
Since F(x⋆) > −∞, the right-hand side is a finite constant; letting k → +∞ on the left-hand side gives lim_{k→+∞} ||∇F(x_k)||² = 0. NB: in the smooth case, convergence to a critical point is guaranteed. For non-smooth problems...
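The slide's argument can be checked numerically on a smooth non-convex function (my toy example, with a step size chosen small against the local Lipschitz constant): the objective values are monotone decreasing and the gradient norm vanishes along the iterates.

```python
def F(x):
    # Smooth non-convex function with critical points x = -1, 0, 1;
    # global minima at x = -1 and x = 1.
    return (x * x - 1.0) ** 2

def gradF(x):
    return 4.0 * x * (x * x - 1.0)

# Gradient descent x_{k+1} = x_k - gamma * gradF(x_k).
gamma = 0.02           # small enough for the local Lipschitz constant
x = 2.0
values = [F(x)]
for _ in range(2000):
    x = x - gamma * gradF(x)
    values.append(F(x))
```

As the slide shows, this only guarantees a critical point: from this starting point the iterates reach the minimiser x = 1, but from other initialisations they could just as well stall at the critical point x = 0.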

  16. Semi-algebraic sets and functions. Semi-algebraic set: a semi-algebraic subset of R^n is a finite union of sets of the form { x ∈ R^n : f_i(x) = 0, g_j(x) ≤ 0, i ∈ I, j ∈ J }, where I, J are finite and f_i, g_j : R^n → R are real polynomial functions. Semi-algebraic sets are stable under finite intersection, finite union and complementation. Semi-algebraic function: a function or mapping is semi-algebraic if its graph is a semi-algebraic set; the same definition applies to extended real-valued functions and multivalued mappings.

  17. Properties. Tarski-Seidenberg: the image of a semi-algebraic set under a linear projection is semi-algebraic. The closure of a semi-algebraic set A is semi-algebraic. Examples: the graph of the derivative of a semi-algebraic function is semi-algebraic; if A is a semi-algebraic subset of R^n and f : R^n → R^p is semi-algebraic, then f(A) is semi-algebraic; g(x) = max{ F(x, y) : y ∈ S } is semi-algebraic if F and S are semi-algebraic. Other examples: min_x (1/2)||Ax − b||² + µ||x||_p with p rational, and min_X (1/2)||AX − B||² + µ rank(X).

  18. Subdifferentials. Convex subdifferential: for R ∈ Γ_0(R^n),
∂R(x) = { g : R(x′) ≥ R(x) + ⟨g, x′ − x⟩, ∀ x′ ∈ R^n }.
Fréchet subdifferential: given x ∈ dom(R), the Fréchet subdifferential ∂̂R(x) of R at x is the set of vectors v such that
lim inf_{x′ → x, x′ ≠ x} ( R(x′) − R(x) − ⟨v, x′ − x⟩ ) / ||x′ − x|| ≥ 0.
If x ∉ dom(R), then ∂̂R(x) = ∅.
Limiting subdifferential: the limiting subdifferential (or simply subdifferential) of R at x, written ∂R(x), reads
∂R(x) = { v ∈ R^n : ∃ x_k → x, R(x_k) → R(x), v_k ∈ ∂̂R(x_k), v_k → v }.
∂̂R(x) is convex and ∂R(x) is closed.

  19. Critical points. Minimal norm subgradient: ||∂R(x)||_− = min{ ||v|| : v ∈ ∂R(x) }. Critical points. Fermat's rule: if x is a minimiser of R, then 0 ∈ ∂R(x). Conversely, when 0 ∈ ∂R(x), the point x is called a critical point. When R is convex, any critical point is a global minimiser. When R is non-convex, a critical point can be a local minimum, a local maximum, or a saddle point.
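A one-dimensional illustration of the non-convex case (my example): the cubic f(x) = x³ - 3x has two critical points, one a local minimum and one a local maximum, and neither is a global minimiser since f is unbounded below. (In two dimensions, f(x, y) = x² - y² would give a saddle point at the origin.)

```python
def f(x):
    """Non-convex cubic f(x) = x^3 - 3x with critical points x = -1, 1."""
    return x ** 3 - 3.0 * x

def fprime(x):
    return 3.0 * x ** 2 - 3.0

critical_points = [-1.0, 1.0]   # roots of f'(x) = 3x^2 - 3
```

So 0 ∈ ∂f(x) at both points, but only a second-order (or global) argument can tell them apart, which is exactly the difficulty Fermat's rule leaves open in the non-convex setting.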

  20. Sharpness. A function R : R^n → R ∪ {+∞} is called sharp on the slice [a < R < b] := { x ∈ R^n : a < R(x) < b } if there exists α > 0 such that ||∂R(x)||_− ≥ α for all x ∈ [a < R < b]. Example: norms, e.g. R(x) = ||x||.

  21. Łojasiewicz inequality. Let R : R^n → R ∪ {+∞} be proper, lower semi-continuous and continuous on its domain. R is said to have the Łojasiewicz property if for any critical point x̄ there exist C, ε > 0 and θ ∈ [0, 1[ such that
|R(x) − R(x̄)|^θ ≤ C ||v||, ∀ x ∈ B(x̄, ε), ∀ v ∈ ∂R(x).
By convention, 0⁰ = 0.
Property: suppose R has the Łojasiewicz property. If S is a connected subset of the set of critical points of R, that is 0 ∈ ∂R(x) for all x ∈ S, then R is constant on S. If in addition S is compact, then there exist C, ε > 0 and θ ∈ [0, 1[ such that, for any x̄ ∈ S,
|R(x) − R(x̄)|^θ ≤ C ||v||, ∀ x ∈ R^n with dist(x, S) ≤ ε, ∀ v ∈ ∂R(x).
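For a quadratic the inequality can be verified by hand (my example, with assumed values θ = 1/2 and C = 1/2): for R(x) = x² and the critical point x̄ = 0, |R(x) − R(0)|^{1/2} = |x| = (1/2)|R'(x)|, so the bound holds with equality everywhere.

```python
def R(x):
    """R(x) = x^2: the only critical point is x = 0, with R(0) = 0."""
    return x * x

def dR(x):
    return 2.0 * x

# Lojasiewicz exponent and constant for this example:
# |R(x) - R(0)|^theta = |x| = C * |R'(x)| with theta = 1/2, C = 1/2.
theta, C = 0.5, 0.5
```

By contrast, a function that is extremely flat near its minimiser, such as exp(-1/x²) extended by 0 at the origin, satisfies the inequality for no θ ∈ [0, 1[ (and it is not semi-algebraic).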

  22. Non-convex PPA. Proximal point algorithm: let R : R^n → R ∪ {+∞} be proper and lower semi-continuous. From an arbitrary x_0 ∈ R^n,
x_{k+1} ∈ argmin_x γ R(x) + (1/2)||x − x_k||².
Assumptions: (1) R is bounded from below, that is inf_{x ∈ R^n} R(x) > −∞; this implies that argmin_x γR(x) + (1/2)||x − x_k||² is non-empty and compact. (2) The restriction of R to its domain is a continuous function. (3) R has the Łojasiewicz property.
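A one-dimensional sketch of the non-convex PPA (my toy objective and parameters): the inner problem is itself non-convex, so this illustration takes a global minimiser of each subproblem by grid search, which is only feasible in 1-D.

```python
def R(x):
    # Non-convex, coercive, semi-algebraic toy objective:
    # minima at x = -1 and x = 1, local max at x = 0.
    return (x * x - 1.0) ** 2

def ppa_step(xk, gamma=0.5, lo=-3.0, hi=3.0, steps=24001):
    """One proximal-point step:
    x_{k+1} in argmin_x gamma*R(x) + 0.5*(x - xk)^2,
    with the non-convex inner problem solved by dense grid search."""
    best_x, best_v = xk, gamma * R(xk)       # candidate x = xk has value gamma*R(xk)
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        v = gamma * R(x) + 0.5 * (x - xk) ** 2
        if v < best_v:
            best_x, best_v = x, v
    return best_x

x = 2.5
history = [R(x)]
for _ in range(30):
    x = ppa_step(x)
    history.append(R(x))
```

Because each step minimises γR(x) + (1/2)(x − x_k)² and x = x_k is always a candidate, R(x_{k+1}) ≤ R(x_k) automatically, which is the first claim of the property on the next slide.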

  23. Property. Let {x_k}_{k∈N} be the sequence generated by the non-convex PPA and ω(x_k) the set of its limit points. Then: (i) the sequence {R(x_k)}_{k∈N} is decreasing; (ii) Σ_k ||x_k − x_{k+1}||² < +∞; (iii) if R satisfies assumption (2), then ω(x_k) ⊂ crit(R); if moreover {x_k}_{k∈N} is bounded, then ω(x_k) is a non-empty compact set and dist(x_k, ω(x_k)) → 0; (iv) if R satisfies assumption (2), then R is finite and constant on ω(x_k). NB: boundedness of the sequence is guaranteed if R is coercive.
