Introductory Course on Non-smooth Optimisation
Lecture 09: Non-convex optimisation
Jingwei Liang
Department of Applied Mathematics and Theoretical Physics

Table of contents
1 Examples
2 Non-convex optimisation
3 Convex relaxation
4 Łojasiewicz inequality
5 Kurdyka-Łojasiewicz inequality
Compressed sensing

Forward observation: b = Ax̊, where x̊ ∈ R^n is sparse and A : R^n → R^m with m ≪ n.

Compressed sensing:
  min_{x ∈ R^n} ||x||_0  s.t.  Ax = b.

NB: this is an NP-hard problem.
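To see why the problem is combinatorial, the ℓ0 minimisation can be solved by exhaustive search over supports on a toy instance. The dimensions, random seed and 2-sparse signal below are hypothetical choices:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
m, n = 5, 8
A = rng.standard_normal((m, n))
x0 = np.zeros(n)
x0[[1, 6]] = [1.0, -2.0]          # hypothetical 2-sparse ground truth
b = A @ x0

def l0_min(A, b, tol=1e-9):
    """Brute-force min ||x||_0 s.t. Ax = b: try supports of increasing size."""
    n = A.shape[1]
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            S = list(S)
            xS, *_ = np.linalg.lstsq(A[:, S], b, rcond=None)
            if np.linalg.norm(A[:, S] @ xS - b) <= tol:
                x = np.zeros(n)
                x[S] = xS
                return x
    return None

x = l0_min(A, b)
print(np.nonzero(x)[0])   # expected: the support {1, 6}
```

Each support size k requires checking C(n, k) subsets, which is the combinatorial explosion behind the NP-hardness remark.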
Jingwei Liang, DAMTP Introduction to Non-smooth Optimisation March 13, 2019
Image processing

Two-phase segmentation: given an image I, which consists of foreground and background, segment the foreground. Ideally, I = f·1_C + b·1_{Ω\C}.

Mumford–Shah model:
  E(u, C) = ∫_Ω (u − I)^2 dx + λ ∫_{Ω\C} ||∇u||^2 dx + α|C|,
where |C| = peri(C), the perimeter of C.
Principal component pursuit

Forward mixture model: w = x̊ + ẙ + ε, where x̊ ∈ R^{m×n} is κ-sparse, ẙ ∈ R^{m×n} is σ-low-rank and ε is noise.

Non-convex PCP:
  min_{x,y ∈ R^{m×n}} (1/2)||x + y − w||^2  s.t.  ||x||_0 ≤ κ and rank(y) ≤ σ.
Neural networks

Each layer of a neural network is convex:
- Linear operation, e.g. convolution.
- Non-linear activation function, e.g. the rectifier max{x, 0}.
However, the composition of convex functions is not necessarily convex...
- Neural networks are universal function approximators, hence they need to approximate non-convex functions.
- One cannot approximate non-convex functions with convex functions.
Non-convex optimisation

Non-convex problem: any problem that is not convex (or concave) is non-convex...
Challenges
- Potentially many local minima.
- Saddle points.
- Very flat regions.
- Widely varying curvature.
- NP-hard in general.
Convex relaxation

Non-convex optimisation problem:
  min_x E(x).
Convex optimisation problem:
  min_x F(x).

What if Argmin(F) ⊆ Argmin(E)?
- Subtle and case-dependent.
- Somehow, finding such an F is almost equivalent to solving E.
Convex relaxation

[Figures: a loose relaxation vs. an ideal relaxation]
- In practice, it is easier to obtain Argmin(E) ⊆ Argmin(F).
- A loose relaxation will work if the two global minima are close enough.
- An ideal relaxation will fail if Argmin(F) is too large.
Convolution

For certain problems, non-convexity can be treated as noise...

[Figures: original function vs. its convolution with a smoothing kernel]
- Symmetric boundary condition for the convolution.
- Almost convex problem after convolution.
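A small numerical illustration of this smoothing idea (the test function and kernel width are hypothetical choices): convolving an oscillatory perturbation of a convex function with a Gaussian kernel, using a symmetric boundary condition, removes most of the spurious local minima.

```python
import numpy as np

def gaussian_kernel(width, sigma):
    t = np.arange(-width, width + 1)
    k = np.exp(-t**2 / (2.0 * sigma**2))
    return k / k.sum()

def smooth(f_vals, width=60, sigma=20.0):
    # symmetric (reflecting) boundary condition, as on the slide
    padded = np.pad(f_vals, width, mode="reflect")
    return np.convolve(padded, gaussian_kernel(width, sigma), mode="valid")

def local_minima(v):
    # count strict interior local minima of a sampled function
    return int(np.sum((v[1:-1] < v[:-2]) & (v[1:-1] < v[2:])))

x = np.linspace(-2.0, 2.0, 801)
f = x**2 + 0.3 * np.sin(25 * x)   # convex term plus non-convex "noise"
fs = smooth(f)
print(local_minima(f), local_minima(fs))
```

The smoothed function has far fewer local minima than the original, i.e. it is almost convex.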
Smooth problem

Let F ∈ C^1_L, i.e. F is differentiable with L-Lipschitz gradient.

Gradient descent:
  x_{k+1} = x_k − γ∇F(x_k).
Descent property:
  F(x_k) − F(x_{k+1}) ≥ γ(1 − γL/2) ||∇F(x_k)||^2.
Let γ ∈ ]0, 2/L[; summing over the iterations,
  γ(1 − γL/2) Σ_{i=0}^{k} ||∇F(x_i)||^2 ≤ F(x_0) − F(x_{k+1}) ≤ F(x_0) − F(x⋆).
Since F(x⋆) > −∞, the right-hand side is a finite constant; letting k → +∞ on the left-hand side,
  lim_{k→+∞} ||∇F(x_k)||^2 = 0.

NB: in the smooth case, a critical point is guaranteed. For non-smooth problems...
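These estimates can be checked numerically. Below, gradient descent is run on a hypothetical smooth non-convex double well F(x) = x^4/4 − x^2/2, so ∇F(x) = x^3 − x; ∇F is 11-Lipschitz on [−2, 2], and the (arbitrary) choice γ = 0.05 satisfies γ < 2/L:

```python
def grad_F(x):
    # F(x) = x**4 / 4 - x**2 / 2, a smooth double well with critical points -1, 0, 1
    return x**3 - x

x, gamma = 2.0, 0.05          # starting point and step size (illustrative choices)
for k in range(200):
    x -= gamma * grad_F(x)
print(x, abs(grad_F(x)))      # the gradient vanishes along the iterates
```

As predicted, ||∇F(x_k)|| → 0 and the iterates settle at the minimiser x = 1 (one of the two wells).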
Semi-algebraic sets and functions

Semi-algebraic set: a semi-algebraic subset of R^n is a finite union of sets of the form
  {x ∈ R^n : f_i(x) = 0, g_j(x) ≤ 0, i ∈ I, j ∈ J},
where I, J are finite and f_i, g_j : R^n → R are real polynomial functions.
- Stability under finite intersection, union and complementation.

Semi-algebraic function: a function or a mapping is semi-algebraic if its graph is a semi-algebraic set. The same definition applies to extended real-valued functions and multivalued mappings.
Properties

Tarski–Seidenberg:
- The image of a semi-algebraic set by a linear projection is semi-algebraic.
- The closure of a semi-algebraic set A is semi-algebraic.

Examples:
- The graph of the derivative of a semi-algebraic function is semi-algebraic.
- Let A be a semi-algebraic subset of R^n and f : R^n → R^p semi-algebraic; then f(A) is semi-algebraic.
- g(x) = max{F(x, y) : y ∈ S} is semi-algebraic if F and S are semi-algebraic.

Other examples:
  min_x (1/2)||Ax − b||^2 + μ||x||_p, where p is rational;
  min_X (1/2)||AX − B||^2 + μ·rank(X).
Subdifferential

Convex subdifferential: for R ∈ Γ_0(R^n),
  ∂R(x) = {g : R(x′) ≥ R(x) + ⟨g, x′ − x⟩, ∀x′ ∈ R^n}.

Fréchet subdifferential: given x ∈ dom(R), the Fréchet subdifferential ∂̂R(x) of R at x is the set of vectors v such that
  liminf_{x′→x, x′≠x} (1/||x − x′||) (R(x′) − R(x) − ⟨v, x′ − x⟩) ≥ 0.
If x ∉ dom(R), then ∂̂R(x) = ∅.

Limiting subdifferential: the limiting subdifferential (or simply subdifferential) of R at x, written ∂R(x), reads
  ∂R(x) def= {v ∈ R^n : ∃x_k → x, R(x_k) → R(x), v_k ∈ ∂̂R(x_k) → v}.
- ∂̂R(x) is convex and ∂R(x) is closed.
Critical points

Minimal norm subgradient:
  ||∂R(x)||_− = min{||v|| : v ∈ ∂R(x)}.

Critical points:
- Fermat's rule: if x is a minimiser of R, then 0 ∈ ∂R(x). Conversely, when 0 ∈ ∂R(x), the point x is called a critical point.
- When R is convex, any minimiser is a global minimiser.
- When R is non-convex, a critical point can be a local minimum, a local maximum, or a saddle point.
Sharpness

A function R : R^n → R ∪ {+∞} is called sharp on the slice
  [a < R < b] def= {x ∈ R^n : a < R(x) < b}
if there exists α > 0 such that ||∂R(x)||_− ≥ α for all x ∈ [a < R < b].
- Example: norms, e.g. R(x) = ||x||.
Łojasiewicz inequality

Łojasiewicz inequality: let R : R^n → R ∪ {+∞} be proper, lower semi-continuous and continuous along its domain. Then R is said to have the Łojasiewicz property if for any critical point x̄ there exist C, ε > 0 and θ ∈ [0, 1[ such that
  |R(x) − R(x̄)|^θ ≤ C||v||,  ∀x ∈ B_{x̄}(ε), ∀v ∈ ∂R(x).
By convention, 0^0 = 0.

Property: suppose that R has the Łojasiewicz property. If S is a connected subset of the set of critical points of R, that is 0 ∈ ∂R(x) for all x ∈ S, then R is constant on S. If in addition S is a compact set, then there exist C, ε > 0 and θ ∈ [0, 1[ such that for all x ∈ R^n with dist(x, S) ≤ ε and all v ∈ ∂R(x):
  |R(x) − R(x̄)|^θ ≤ C||v||,
where R(x̄) denotes the common value of R on S.
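A worked example of the exponent θ: for the scalar function R(x) = |x|^q with q > 1, the critical point is x̄ = 0 and a short computation recovers θ = 1 − 1/q.

```latex
R(x) = |x|^q,\quad q > 1,\qquad R'(x) = q\,\mathrm{sign}(x)\,|x|^{q-1}.
% The Łojasiewicz inequality at \bar{x} = 0 requires, near x = 0,
|R(x) - R(0)|^{\theta} = |x|^{q\theta} \le C\,q\,|x|^{q-1},
% which holds if and only if q\theta \ge q - 1, i.e.
\theta \ge 1 - 1/q.
% The smallest admissible exponent is \theta = 1 - 1/q; for q = 2, \theta = 1/2.
```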
Non-convex PPA

Proximal point algorithm: let R : R^n → R ∪ {+∞} be proper and lower semi-continuous. From an arbitrary x_0 ∈ R^n,
  x_{k+1} ∈ argmin_x γR(x) + (1/2)||x − x_k||^2.

Assumptions:
1. R is bounded from below, that is inf_{x∈R^n} R(x) > −∞. This implies that argmin_x γR(x) + (1/2)||x − x_k||^2 is non-empty and compact.
2. The restriction of R to its domain is a continuous function.
3. R has the Łojasiewicz property.
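A sketch of the non-convex PPA on a hypothetical one-dimensional double well R(x) = (x^2 − 1)^2, which is semi-algebraic and hence has the Łojasiewicz property. The proximal sub-problem is solved by brute force on a fine grid; all parameters are illustrative choices:

```python
import numpy as np

def R(x):
    # non-convex, semi-algebraic double well with minimisers at -1 and 1
    return (x**2 - 1.0)**2

grid = np.linspace(-3.0, 3.0, 60001)

def ppa_step(z, gamma):
    # x_{k+1} in argmin_x gamma * R(x) + 0.5 * (x - z)**2, evaluated on the grid
    vals = gamma * R(grid) + 0.5 * (grid - z)**2
    return grid[np.argmin(vals)]

x, gamma = 2.0, 0.1
for _ in range(100):
    x = ppa_step(x, gamma)
print(x)   # settles at a critical point of R
```

Starting from x_0 = 2, the iterates decrease the objective monotonically and settle at the nearby minimiser x = 1, up to the grid resolution.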
Property

Let {x_k}_{k∈N} be the sequence generated by the non-convex PPA and ω(x_k) the set of its limit points. Then:
- The sequence {R(x_k)}_{k∈N} is decreasing.
- Σ_k ||x_k − x_{k+1}||^2 < +∞.
- If R satisfies assumption 2, then ω(x_k) ⊂ crit(R). If moreover {x_k}_{k∈N} is bounded, then ω(x_k) is a non-empty compact set and dist(x_k, ω(x_k)) → 0.
- If R satisfies assumption 2, then R is finite and constant on ω(x_k).

NB: boundedness can be guaranteed if R is coercive.
Convergence of PPA

Suppose the sequence {x_k}_{k∈N} generated by the non-convex PPA is bounded. Then
  Σ_k ||x_k − x_{k+1}|| < +∞,
and the whole sequence converges to some critical point x̄ ∈ crit(R).

Proof sketch. From the definition of x_{k+1}:
  R(x_{k+1}) + (1/(2γ))||x_k − x_{k+1}||^2 ≤ R(x_k).
Consider g(s) = s^{1−θ} for s > 0, with g′(s) = (1 − θ)s^{−θ}. By concavity of g,
  g(R(x_k)) − g(R(x_{k+1})) ≥ (1 − θ) R(x_k)^{−θ} (R(x_k) − R(x_{k+1})) ≥ (1 − θ) R(x_k)^{−θ} (1/(2γ)) ||x_k − x_{k+1}||^2.
WLOG, assume R(x̄) = 0 for x̄ ∈ ω(x_k). The optimality condition of the PPA step gives v_k = (x_{k−1} − x_k)/γ ∈ ∂R(x_k); then for all k large enough
  0 < R(x_k)^θ ≤ C||v_k|| = (C/γ)||x_k − x_{k−1}||.
Combining the above, there exists M > 0 such that
  ||x_k − x_{k+1}||^2 / ||x_k − x_{k−1}|| ≤ M (R(x_k)^{1−θ} − R(x_{k+1})^{1−θ}).
Convergence of PPA (continued)

Take r ∈ ]0, 1[. If ||x_k − x_{k+1}|| ≥ r||x_k − x_{k−1}||, then
  ||x_k − x_{k+1}|| ≤ (M/r) (R(x_k)^{1−θ} − R(x_{k+1})^{1−θ}).
In either case, for all k large enough,
  ||x_k − x_{k+1}|| ≤ r||x_k − x_{k−1}|| + (M/r) (R(x_k)^{1−θ} − R(x_{k+1})^{1−θ}).
There exists some K > 0 such that for k ≥ K,
  Σ_{i=K}^{k} ||x_i − x_{i+1}|| ≤ (r/(1 − r)) ||x_K − x_{K−1}|| + (M/(r(1 − r))) (R(x_K)^{1−θ} − R(x_{k+1})^{1−θ}).
R(x) is bounded from below; take k → +∞...
Rate of convergence

Convergence rate: suppose the non-convex PPA converges, and denote by θ the Łojasiewicz exponent at x_∞. The following statements hold:
- If θ = 0, then {x_k}_{k∈N} converges in a finite number of steps.
- If θ ∈ ]0, 1/2], then there exists η ∈ ]0, 1[ such that ||x_k − x_∞|| = O(η^k).
- If θ ∈ ]1/2, 1[, then ||x_k − x_∞|| = O(k^{−(1−θ)/(2θ−1)}).
Kurdyka-Łojasiewicz inequality (KL)

Let R : R^n → R ∪ {+∞} be proper l.s.c. For a, b such that −∞ < a < b < +∞, define
  [a < R < b] def= {x ∈ R^n : a < R(x) < b}.

Kurdyka-Łojasiewicz inequality: R is said to have the KL property at x̄ ∈ dom(R) if there exist η ∈ ]0, +∞], a neighbourhood U of x̄ and a continuous concave function φ : [0, η[ → R_+ such that
- φ(0) = 0;
- φ is C^1 on ]0, η[;
- φ′(s) > 0 for all s ∈ ]0, η[;
- for all x ∈ U ∩ [R(x̄) < R < R(x̄) + η], the KL inequality holds:
  φ′(R(x) − R(x̄)) · dist(0, ∂R(x)) ≥ 1.

Remarks:
- Proper l.s.c. functions are KL at non-critical points.
- Proper l.s.c. functions which satisfy KL at each point of dom(∂R) are called KL functions.
- Typical KL functions are the class of semi-algebraic functions.
Kurdyka-Łojasiewicz functions
||∇F(x)|| ≥ 0 ||∇(ϕ ◦ F)(x)|| ≥ 1 When R(¯ x) = 0, then the condition becomes ||∂(ϕ ◦ F)(x)||− ≥ 1. ϕ is called a desingularising function for R, i.e. sharp up to reparameterization via ϕ.
Jingwei Liang, DAMTP Introduction to Non-smooth Optimisation March 13, 2019
Abstract descent methods

Let Φ be proper and lower semi-continuous. Suppose a sequence {x_k}_{k∈N} is generated such that the following conditions are satisfied, for some constants c, d > 0.

A.1 Sufficient decrease condition: for each k ∈ N,
  Φ(x_{k+1}) + c||x_{k+1} − x_k||^2 ≤ Φ(x_k).
A.2 Relative error condition: for each k ∈ N, there exists g_{k+1} ∈ ∂Φ(x_{k+1}) such that
  ||g_{k+1}|| ≤ d||x_{k+1} − x_k||.
A.3 Continuity condition: there exist a subsequence {x_{k_j}}_{j∈N} and x̄ such that
  x_{k_j} → x̄ and Φ(x_{k_j}) → Φ(x̄).
Convergence

Let Φ : R^n → R ∪ {+∞} be proper, l.s.c. and KL at some x̄ ∈ R^n. Let U, η and φ be as in the KL property, and let δ, ρ > 0 be such that B_{x̄}(δ) ⊂ U with ρ ∈ ]0, δ[. Consider a sequence {x_k}_{k∈N} which satisfies (A.1)-(A.2). Suppose moreover that
  Φ(x̄) < Φ(x_0) < Φ(x̄) + η,
  ||x_0 − x̄|| + 2√((Φ(x_0) − Φ(x̄))/c) + (d/c)·φ(Φ(x_0) − Φ(x̄)) < ρ,
and that for all k ∈ N,
  x_k ∈ B_{x̄}(ρ) ⇒ x_{k+1} ∈ B_{x̄}(δ) with Φ(x_{k+1}) ≥ Φ(x̄).
Then the sequence {x_k}_{k∈N} satisfies:
- x_k ∈ B_{x̄}(δ) for all k ∈ N;
- Σ_k ||x_k − x_{k+1}|| < +∞;
- Φ(x_k) → Φ(x̄);
and it converges to a point x⋆ ∈ B_{x̄}(δ) such that Φ(x⋆) ≤ Φ(x̄). If moreover (A.3) is true, then x⋆ is a critical point and Φ(x⋆) = Φ(x̄).
Convergence (proof)

Condition (A.1) implies that {Φ(x_k)}_{k∈N} is non-increasing, and for all k ∈ N,
  ||x_{k+1} − x_k|| ≤ √((Φ(x_k) − Φ(x_{k+1}))/c).
Condition (A.2) and the KL inequality give
  φ′(Φ(x_k) − Φ(x̄)) ≥ 1/||g_k|| ≥ 1/(d||x_k − x_{k−1}||).
Since φ is concave,
  φ(Φ(x_k) − Φ(x̄)) − φ(Φ(x_{k+1}) − Φ(x̄)) ≥ φ′(Φ(x_k) − Φ(x̄)) (Φ(x_k) − Φ(x_{k+1})) ≥ φ′(Φ(x_k) − Φ(x̄)) · c||x_k − x_{k+1}||^2.
Combining the above two yields
  ||x_k − x_{k+1}||^2 / ||x_k − x_{k−1}|| ≤ (d/c) (φ(Φ(x_k) − Φ(x̄)) − φ(Φ(x_{k+1}) − Φ(x̄))).
Applying the inequality 2√(xy) ≤ x + y,
  2||x_k − x_{k+1}|| ≤ ||x_k − x_{k−1}|| + (d/c) (φ(Φ(x_k) − Φ(x̄)) − φ(Φ(x_{k+1}) − Φ(x̄))).
Convergence (proof continued)

Continuing with (A.1),
  ||x_1 − x_0|| ≤ √((Φ(x_0) − Φ(x_1))/c) ≤ √((Φ(x_0) − Φ(x̄))/c).
Then
  ||x_1 − x̄|| ≤ ||x_1 − x_0|| + ||x_0 − x̄|| ≤ √((Φ(x_0) − Φ(x̄))/c) + ||x_0 − x̄|| ≤ ρ.
By induction, we can show that for all k ∈ N, x_k ∈ B_{x̄}(ρ) and
  Σ_{i=1}^{k} ||x_{i+1} − x_i|| + ||x_{k+1} − x_k|| ≤ ||x_1 − x_0|| + (d/c) (φ(Φ(x_1) − Φ(x̄)) − φ(Φ(x_{k+1}) − Φ(x̄))).
The above directly implies
  Σ_k ||x_k − x_{k+1}|| ≤ ||x_1 − x_0|| + (d/c) φ(Φ(x_1) − Φ(x̄)) < +∞.
Hence, there exists x⋆ ∈ ω(x_k) with
  x_k → x⋆,  g_k → 0,  Φ(x_k) → v ≥ Φ(x̄).
The KL inequality φ′(v − Φ(x̄)) ||g_k|| ≥ 1 indicates v = Φ(x̄), and lower semi-continuity yields Φ(x⋆) ≤ Φ(x̄).
Forward–Backward splitting

Consider minimising
  min_{x∈R^n} {Φ(x) def= R(x) + F(x)},
where R : R^n → R ∪ {+∞} is proper l.s.c. and bounded from below, and F : R^n → R is finite-valued, differentiable with ∇F L-Lipschitz.

Forward–Backward splitting: let γ ∈ ]0, 1/L[,
  x_{k+1} ∈ prox_{γR}(x_k − γ∇F(x_k)).

- Sufficient decrease:
  Φ(x_{k+1}) + ((1 − γL)/(2γ)) ||x_k − x_{k+1}||^2 ≤ Φ(x_k).
- Relative error: g_{k+1} def= (1/γ)(x_k − x_{k+1}) − ∇F(x_k) + ∇F(x_{k+1}) ∈ ∂Φ(x_{k+1}), with
  ||g_{k+1}|| ≤ (1/γ + L) ||x_k − x_{k+1}||.
- Continuity: the sequence {x_k}_{k∈N} is bounded.
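As an illustration, Forward–Backward applied to a hypothetical ℓ0-regularised least-squares problem, Φ(x) = (1/2)||Ax − b||^2 + μ||x||_0. The proximity operator of μ||·||_0 is hard thresholding, so the iteration is iterative hard thresholding; all problem data below are synthetic, and the sufficient-decrease property can be observed on the recorded objective values:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, mu = 40, 100, 0.01
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[[3, 17, 42]] = [2.0, -1.5, 1.0]    # synthetic 3-sparse signal
b = A @ x_true

L = np.linalg.norm(A, 2)**2               # Lipschitz constant of grad F
gamma = 0.9 / L                           # gamma in ]0, 1/L[

def prox_l0(z, t):
    # prox of t*||.||_0: hard thresholding at sqrt(2t)
    out = z.copy()
    out[np.abs(z) < np.sqrt(2.0 * t)] = 0.0
    return out

def Phi(x):
    return 0.5 * np.sum((A @ x - b)**2) + mu * np.count_nonzero(x)

x = np.zeros(n)
vals = [Phi(x)]
for _ in range(500):
    x = prox_l0(x - gamma * A.T @ (A @ x - b), gamma * mu)
    vals.append(Phi(x))
print(vals[0], vals[-1])
```

The objective values are non-increasing along the iterations, exactly as the sufficient-decrease property predicts for γ < 1/L.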
A coupled minimisation problem

Consider minimising
  min_{x∈R^n, y∈R^m} {E(x, y) def= R(x) + F(x, y) + J(y)},
where R : R^n → R ∪ {+∞} and J : R^m → R ∪ {+∞} are proper l.s.c. and bounded from below, and F : R^n × R^m → R is finite-valued, differentiable with ∇F L-Lipschitz.

Subdifferential:
  ∂E(x, y) = (∂R(x) + ∇_x F(x, y)) × (∂J(y) + ∇_y F(x, y)) = ∂_x E(x, y) × ∂_y E(x, y).

Separate Lipschitz continuity for F: ∇_x F is L_x-Lipschitz and ∇_y F is L_y-Lipschitz.
Proximal alternating minimisation (PAM)

PAM is an alternating minimisation algorithm. Let γ_x, γ_y ∈ ]0, 1/L[:
  x_{k+1} ∈ argmin_{x∈R^n} E(x, y_k) + (1/(2γ_x))||x − x_k||^2,
  y_{k+1} ∈ argmin_{y∈R^m} E(x_{k+1}, y) + (1/(2γ_y))||y − y_k||^2.

- PAM is an instance of the PPA; for convergence, let Φ(x, y) = E(x, y).
- In general, the sub-problems admit no closed-form solution:
  x_{k+1} ∈ argmin_{x∈R^n} E(x, y_k) + (1/(2γ_x))||x − x_k||^2 = argmin_{x∈R^n} R(x) + F(x, y_k) + (1/(2γ_x))||x − x_k||^2.
Proximal alternating linearised minimisation (PALM)

PALM is linearised PAM: the smooth coupling term is majorised by
  F(x, y_k) ≤ F(x_k, y_k) + ⟨∇_x F(x_k, y_k), x − x_k⟩ + (1/(2γ_x))||x − x_k||^2.

PALM: let γ_x, γ_y ∈ ]0, 1/L[:
  x_{k+1} ∈ prox_{γ_x R}(x_k − γ_x ∇_x F(x_k, y_k)),
  y_{k+1} ∈ prox_{γ_y J}(y_k − γ_y ∇_y F(x_{k+1}, y_k)).

- PALM is an instance of Forward–Backward splitting; for convergence, let Φ(x, y) = E(x, y).
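A sketch of PALM on a hypothetical sparse non-negative rank-one factorisation, E(x, y) = μ||x||_1 + (1/2)||xy^T − W||_F^2 + ι_{y≥0}(y). The block step sizes use the per-block Lipschitz constants L_x = ||y||^2 and L_y = ||x||^2; all data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, mu = 20, 30, 0.01
W = np.outer(np.abs(rng.standard_normal(m)), np.abs(rng.standard_normal(n)))

def soft(z, t):
    # prox of t*||.||_1: soft thresholding
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def E(x, y):
    # the indicator of y >= 0 is enforced by the projection step below
    return 0.5 * np.sum((np.outer(x, y) - W)**2) + mu * np.sum(np.abs(x))

x, y = np.ones(m), np.ones(n)
vals = [E(x, y)]
for _ in range(200):
    gx = 0.9 / max(y @ y, 1e-12)                      # gamma_x in ]0, 1/L_x[
    x = soft(x - gx * (np.outer(x, y) - W) @ y, gx * mu)
    gy = 0.9 / max(x @ x, 1e-12)                      # gamma_y in ]0, 1/L_y[
    y = np.maximum(y - gy * (np.outer(x, y) - W).T @ x, 0.0)  # prox of indicator
    vals.append(E(x, y))
print(vals[0], vals[-1])
```

Each block update is a Forward–Backward step on one variable with the other frozen, so the objective decreases monotonically, as the PALM theory predicts.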
Remarks

- Converges to a global minimiser if started close enough.
- Inertial acceleration can be applied to all of these methods.
- Step-size vs. inertial parameter.
- Step-size and critical points.
- Stochastic optimisation methods can escape saddle points or find a global minimiser...
Reference

- H. Attouch and J. Bolte. "On the convergence of the proximal algorithm for nonsmooth functions involving analytic features". Mathematical Programming 116.1-2 (2009): 5-16.
- H. Attouch, J. Bolte, and B. Svaiter. "Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods". Mathematical Programming 137.1-2 (2013): 91-129.
- H. Attouch, et al. "Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality". Mathematics of Operations Research 35.2 (2010): 438-457.
- J. Bolte, S. Sabach, and M. Teboulle. "Proximal alternating linearized minimization for nonconvex and nonsmooth problems". Mathematical Programming 146.1-2 (2014): 459-494.
- J. Liang, J. Fadili, and G. Peyré. "A multi-step inertial forward-backward splitting method for