Faster Convex Optimization: Simulated Annealing & Interior Point (PowerPoint Presentation)


SLIDE 1

Faster convex optimization: Simulated annealing & Interior point

Elad Hazan. Joint work with Jacob Abernethy (University of Michigan).

SLIDE 2

Convex optimization

The fundamental problem of optimization: minimize a convex (in particular, linear) function over a convex set:

$$\min_{x \in K} f(x) \quad \Longleftrightarrow \quad \min_{x \in K \cap \{x :\, f(x) \le t\}} t$$

(the epigraph reduction: minimizing a convex f is equivalent to minimizing a linear objective t over a convex set).

SLIDE 3

Convex optimization

A few examples:

1. ERM / stochastic minimization for machine learning
2. Semi-definite programming for the block model, 3D reconstruction
3. Bayesian inference relaxations
4. Matrix completion, sparse reconstruction, nuclear-norm minimization, metric learning, ...

SLIDE 4

Convex optimization

The fundamental problem of optimization: minimize a convex (linear) function over a convex set,

$$\min_{x \in K} c^\top x,$$

where the convex set K may be given by:

1. linear constraints (LP)
2. semi-definite constraints
3. a separation oracle
4. a membership oracle

SLIDE 5

Polynomial-time convex optimization

  • Ellipsoid [Shor; Khachiyan; Nemirovski-Yudin]: $O(n^{12})$ queries / time
  • Interior point [Karmarkar; Nesterov-Nemirovski]: requires a barrier
  • Random walk [Lovász-Vempala; Bertsimas-Vempala; Kalai-Vempala]: $O(n^{1/2} \cdot n^4)$
  • This result: a faster algorithm, $O(\nu^{1/2} \cdot n^4)$ and $O(\nu^{5/2} \cdot n^3)$

SLIDE 6

Agenda

  • 1. Mini tutorial on IPM
  • 2. Mini tutorial on SA
  • 3. The equivalence of SA and IPM
  • 4. How to get faster convex opt
SLIDE 7

Interior point methods: mini-tutorial

SLIDE 8

Gradient descent

Move in the direction of steepest decrease (the negative gradient):

$$y_{t+1} = x_t - \eta \nabla f(x_t), \qquad x_{t+1} = \Pi_K[y_{t+1}] = \arg\min_{x \in K} \|x - y_{t+1}\|^2$$

The projection can be as hard as the original problem!
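To make the update concrete, here is a minimal sketch of projected gradient descent in Python. It assumes K is a Euclidean ball so the projection is closed-form; for a general K the projection is itself an optimization problem, which is exactly the difficulty noted above. All names are illustrative, not from the slides.

```python
import numpy as np

def project_ball(y, radius=1.0):
    """Euclidean projection onto K = {x : ||x||_2 <= radius}."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)

def projected_gradient_descent(grad_f, x0, eta=0.1, steps=200):
    """Iterate y_{t+1} = x_t - eta * grad f(x_t);  x_{t+1} = project_K(y_{t+1})."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = project_ball(x - eta * grad_f(x))
    return x

# Linear objective c^T x over the unit ball: optimum is -c / ||c||.
c = np.array([1.0, 2.0])
x_opt = projected_gradient_descent(lambda x: c, np.zeros(2))
```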

SLIDE 9

The steepest-decrease direction carries no information about curvature! Newton's method (a "smart gradient") fixes this; for quadratic functions it reaches the solution in one step:

$$y_{t+1} = x_t - \eta\, [\nabla^2 f(x_t)]^{-1} \nabla f(x_t), \qquad x_{t+1} = \Pi_K[y_{t+1}]$$
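A quick numeric check of the one-step claim on a quadratic $f(x) = \tfrac{1}{2}x^\top Q x - b^\top x$ (unconstrained, η = 1; Q and b are made-up data):

```python
import numpy as np

Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])            # positive-definite Hessian of f
b = np.array([1.0, 2.0])
grad_f = lambda x: Q @ x - b          # gradient of 0.5 x^T Q x - b^T x

x0 = np.zeros(2)
x1 = x0 - np.linalg.solve(Q, grad_f(x0))   # one full Newton step
assert np.allclose(grad_f(x1), 0.0)        # already at the minimizer
```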

SLIDE 10

Interior point methods

Avoid projections → remain in the interior at all times. Add curvature → add a "super-smooth" barrier function:

$$\min_{x \in \mathbb{R}^n} c^\top x \ \text{ s.t. } A_i x - b_i \le 0,\ i = 1, \dots, m \quad \Longrightarrow \quad \min_{x \in \mathbb{R}^n}\ c^\top x - \sum_i \log(b_i - A_i x)$$

Here $R(x) = -\sum_i \log(b_i - A_i x)$ is the barrier function.
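A sketch of the LP log-barrier and its first two derivatives, which is all Newton's method needs (A has rows $A_i$, constraints $A_i x \le b_i$; the function names are ours, not from the slides):

```python
import numpy as np

def log_barrier(x, A, b):
    """R(x) = -sum_i log(b_i - A_i x); infinite outside the interior."""
    s = b - A @ x                       # slack of each constraint
    return np.inf if np.any(s <= 0) else -np.sum(np.log(s))

def log_barrier_derivs(x, A, b):
    """Gradient sum_i a_i / s_i and Hessian sum_i a_i a_i^T / s_i^2."""
    s = b - A @ x
    grad = A.T @ (1.0 / s)
    hess = A.T @ ((1.0 / s**2)[:, None] * A)
    return grad, hess
```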

SLIDE 11

Self-concordant barrier

Self-concordant barriers allow polynomial-time convex optimization [Nesterov, Nemirovski 1994]. Properties:

1. As $x \to \partial K$, $R(x) \to \infty$.
2. The self-concordance inequalities

$$\nabla^3 R(x)[h,h,h] \le 2\,\big(\nabla^2 R(x)[h,h]\big)^{3/2}, \qquad \nabla R(x)[h] \le \sqrt{\nu\, \nabla^2 R(x)[h,h]},$$

where ν is the self-concordance parameter. Property 1 keeps the iterates in the interior; property 2 ensures that Newton's method can exploit curvature. For linear programming with constraints $Ax \le b$:

$$R(x) = -\sum_i \log(b_i - A_i x)$$

SLIDE 12

Interior point methods

But now the objective is skewed: the barrier distorts the problem.

$$\min_{x \in K} c^\top x \quad \leadsto \quad \min_{x \in \mathbb{R}^d}\ c^\top x + R(x)$$
SLIDE 13

Interior point methods

à Add & change barrier scale

min

x2K c>x

min

x2Rd

  • t · c>x + R(x)

t :∼ 0 ⇒ ∞ tk+1 = tk(1 + 1 √ν )

SLIDES 14-21

(Animation frames tracing the central path: each frame repeats the objective $\min_{x \in \mathbb{R}^d}\ t \cdot c^\top x + R(x)$ as t grows.)
SLIDE 22

Path following method

Increase the parameter t from ≈ 0 toward ∞. Iteratively:

1. Update t.
2. Optimize the new objective (staying inside the yellow Dikin ellipse):

$$\beta(t) = \arg\min_{x \in \mathbb{R}^n}\ t \cdot c^\top x + R(x)$$

A sketch of the full loop follows below.
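Here is the loop in code, reusing the LP log-barrier derivatives from the earlier sketch. For this barrier ν = m, the number of constraints; the inner-loop count and the absence of damping or feasibility safeguards are simplifications on our part, and x0 must be strictly feasible.

```python
import numpy as np

def newton_step(x, t, c, A, b):
    """One Newton step on F_t(x) = t * c^T x + R(x) for the LP log-barrier."""
    s = b - A @ x
    grad = t * c + A.T @ (1.0 / s)
    hess = A.T @ ((1.0 / s**2)[:, None] * A)
    return x - np.linalg.solve(hess, grad)

def path_following(c, A, b, x0, t0=1e-3, eps=1e-6, inner=5):
    m = A.shape[0]                      # nu = m for this barrier
    t, x = t0, np.asarray(x0, dtype=float)
    while m / t > eps:                  # standard duality-gap bound ~ nu / t
        t *= 1.0 + 1.0 / np.sqrt(m)     # geometric schedule from the slide
        for _ in range(inner):          # a few Newton steps to re-center
            x = newton_step(x, t, c, A, b)
    return x
```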
SLIDE 23

Inside the yellow ellipse: self-concordant functions

For R self-concordant on the convex set K, the Hessian of R at each x defines a local norm, $\|h\|_x = \sqrt{h^\top \nabla^2 R(x)\, h}$. The Dikin ellipsoid is $\{y : \|y - x\|_x \le 1\}$. Inside the Dikin ellipsoid the function is strongly convex and smooth with respect to the local norm, so one Newton step suffices!

SLIDE 24

Path following method – complexity

1. Geometric updates of t → the number of iterations is at most $O(\nu^{1/2})$.
2. Each iteration: a Newton step on $\min_{x \in \mathbb{R}^d}\ t \cdot c^\top x + R(x)$, i.e., a matrix inversion.

This REQUIRES AN EFFICIENT BARRIER! Long-standing question: is there an efficient universal barrier? (The self-concordance parameter behaves like an isoperimetric constant of K.)
SLIDE 25

Interior point: summary

Problems with gradient descent: projections, and no way to exploit curvature. We moved to Newton's method + a barrier + a changing scale, giving an interior-point algorithm on $\min_{x \in \mathbb{R}^d}\ t \cdot c^\top x + R(x)$ that provably converges in polynomial time. BUT it requires an efficient barrier! Long-standing open question: an efficient universal barrier?
SLIDE 26

Agenda

  • 1. Mini tutorial on IPM
  • 2. Mini tutorial on SA
  • 3. The equivalence of SA and IPM
  • 4. How to get faster convex opt
SLIDE 27

Simulated annealing: mini-tutorial

SLIDE 28

Simulated annealing

A common heuristic for non-convex optimization. The Boltzmann distribution over a set K (with respect to a function f, or a direction c):

$$P_{t,f}(x) \equiv \frac{e^{-f(x)/t}}{\int_{y \in K} e^{-f(y)/t}\, dy}$$

t = ∞: uniform over K.  t → 0: approaches $\min_{x \in K} f(x)$.

SLIDE 29

Simulated annealing

The same heuristic, now with a linear objective:

$$P_{t,c}(x) \equiv \frac{e^{-c^\top x/t}}{\int_{y \in K} e^{-c^\top y/t}\, dy}$$

t = ∞: uniform over K.  t → 0: approaches $\min_{x \in K} c^\top x$.
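A 1-D numeric check of the two temperature limits, with K = [0, 1] and c = 1 so the objective is cᵀx = x (purely illustrative):

```python
import numpy as np

def boltzmann_mean(t, n_grid=100_000):
    """E[x] under P_t(x) proportional to exp(-x/t) on K = [0, 1]."""
    x = np.linspace(0.0, 1.0, n_grid)
    logw = -x / t
    w = np.exp(logw - logw.max())       # numerically stable weights
    return float(np.sum(x * w) / np.sum(w))

print(boltzmann_mean(1e3))   # ~0.5   : essentially uniform over K
print(boltzmann_mean(1e-3))  # ~0.001 : concentrated near argmin = 0
```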

SLIDE 30

Simulated annealing - intuition

Initially: sample uniformly at random. When the temperature is very low → we sample from near the minimum, which is the goal. If successive distributions are "close", a warm start lets us sample efficiently from $P_{t_{k+1}}$ given an efficient sampler for $P_{t_k}$. Recall

$$P_{t,c}(x) \equiv \frac{e^{-c^\top x/t}}{\int_{y \in K} e^{-c^\top y/t}\, dy}.$$

Two questions:

1. What is a warm start?
2. How do we sample from $P_t$? (There are many methods...)

SLIDE 31

Hit-and-Run

Iteratively:

1. Sample a direction $u \sim N(X_t, C_t)$, defining a line through the current point $X_t$.
2. Consider the interval obtained by restricting this line to K.
3. Sample from the distribution induced by $P_t$ on the interval; the result is $X_{t+1}$.

Theorem: Hit-and-Run has stationary distribution $P_t$, where

$$P_{t,c}(x) \equiv \frac{e^{-c^\top x/t}}{\int_{y \in K} e^{-c^\top y/t}\, dy}.$$

How does K enter the random walk? Notice: only a membership oracle is needed for K! (A sketch of one step follows below.)
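As promised, a sketch of one Hit-and-Run step against a membership oracle. The chord endpoints are found by doubling plus bisection on the oracle, and the 1-D restriction of $P_{t,c}$ is sampled on a grid; the default covariance, grid size, and tolerances are all assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def chord_end(x, d, member, iters=60):
    """Largest s >= 0 with x + s*d still in K (doubling, then bisection)."""
    hi = 1.0
    while member(x + hi * d):
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if member(x + mid * d) else (lo, mid)
    return lo

def hit_and_run_step(x, c, t, member, cov=None):
    """One Hit-and-Run step targeting P_{t,c}(x) ~ exp(-c.x / t) on K."""
    n = len(x)
    d = rng.multivariate_normal(np.zeros(n), np.eye(n) if cov is None else cov)
    d /= np.linalg.norm(d)
    lo, hi = -chord_end(x, -d, member), chord_end(x, d, member)
    s_grid = np.linspace(lo, hi, 1000)          # discretize the chord
    logw = -(c @ d) * s_grid / t                # 1-D Boltzmann restriction
    w = np.exp(logw - logw.max())
    return x + rng.choice(s_grid, p=w / w.sum()) * d

# Usage with the simplest membership oracle, the unit ball:
member = lambda z: float(np.linalg.norm(z)) <= 1.0
x_next = hit_and_run_step(np.zeros(2), c=np.array([1.0, 0.0]), t=0.5, member=member)
```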

SLIDE 32

Hit & Run (animation)

SLIDE 33

Simulated annealing with Hit-and-Run

First polynomial-time algorithm [Kalai, Vempala '06]:

1. Sample from $P_{t,c}(x) \propto e^{-c^\top x/t}$ on K using Hit-and-Run.
2. Successive distributions are close enough if $\mathrm{KL}(P_{t_k}, P_{t_{k+1}}) \le \tfrac{1}{2}$ and $\|\mathrm{cov}(P_{t_k}) - \mathrm{cov}(P_{t_{k+1}})\| \le \tfrac{1}{2}$.
3. Run simulated annealing with Hit-and-Run under the temperature schedule $t_{k+1} = t_k\left(1 - \tfrac{1}{\sqrt{n}}\right)$.

Their main theorem: the algorithm returns an approximate solution in $O(\sqrt{n} \log \tfrac{1}{\epsilon})$ iterations, for an overall running time of $O(\sqrt{n} \log \tfrac{1}{\epsilon} \times n \times n^3) = \tilde{O}(n^{4.5})$. (A driver in code follows below.)
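A driver for the Kalai-Vempala scheme on top of the Hit-and-Run step sketched above, using the slide's schedule $t_{k+1} = t_k(1 - 1/\sqrt{n})$; the epoch count mirrors the $O(\sqrt{n} \log \tfrac{1}{\epsilon})$ bound, while walk_len is an illustrative stand-in for the walk's real mixing time.

```python
import numpy as np

def simulated_annealing(c, member, x0, n, t0=1.0, eps=1e-3, walk_len=50):
    """Anneal P_{t,c} from t0 down to ~eps and return the last iterate."""
    t, x = t0, np.asarray(x0, dtype=float)
    epochs = int(np.ceil(np.sqrt(n) * np.log(t0 / eps)))  # O(sqrt(n) log 1/eps)
    for _ in range(epochs):
        t *= 1.0 - 1.0 / np.sqrt(n)       # cooling schedule from the slide
        for _ in range(walk_len):          # warm start: reuse the current x
            x = hit_and_run_step(x, c, t, member)
    return x

# Minimize c^T x over the unit ball (true optimum: -c / ||c||).
c = np.array([1.0, 2.0])
x_hat = simulated_annealing(c, member=lambda z: np.linalg.norm(z) <= 1.0,
                            x0=np.zeros(2), n=2)
```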

SLIDES 34-40

(Animation frames of the annealing walk; no further text content on these slides.)
SLIDE 41

New:

The curve of means of the Boltzmann distribution, parameterized by temperature:

$$\mu(t) = \mathbb{E}_{x \sim P_{t,c}}[x], \qquad P_{t,c}(x) = \frac{e^{-c^\top x/t}}{\int_{y \in K} e^{-c^\top y/t}\, dy}$$

SLIDE 42

Two different convex optimization methods

Simulated Annealing via Hit-and-Run, and Interior Point Methods via Path Following.

SLIDE 43

Our key result: for any convex set there exists a barrier R(x) such that the central path is identical to the heat path:

$$\mu(t) = \mathbb{E}_{K \ni x \sim e^{-c^\top x/t}}[x] \qquad \longleftrightarrow \qquad \beta(t) = \arg\min_{x \in \mathbb{R}^n}\ t \cdot c^\top x + R(x)$$
SLIDE 44

What is this special function?

The entropic barrier. Define the log-partition function of the exponential family,

$$A(c) = \log \int_{x \in K} e^{-c^\top x}\, dx,$$

whose derivatives are the mean and covariance:

$$\nabla A(c) = \mathbb{E}_{x \sim P_c}[x], \qquad \nabla^2 A(c) = \mathbb{E}_{x \sim P_c}\big[(x - \mathbb{E}[x])(x - \mathbb{E}[x])^\top\big].$$

The entropic barrier for K is the Fenchel conjugate

$$A^*(x) = \sup_c \{ c^\top x - A(c) \}.$$

Self-concordance parameter:

1. Güler '96 + Nesterov/Nemirovski '94: ν = O(n); for the PSD cone, $\nu = O(n^{1/2})$.
2. Bubeck-Eldan '15: ν = n + o(n).
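These identities are what make the entropic barrier algorithmically useful: its gradient and Hessian at c are the mean and covariance of $P_c$, so they can be estimated by sampling. A sketch using the Hit-and-Run step from before (sample counts are arbitrary; signs follow the slides' convention):

```python
import numpy as np

def estimate_barrier_derivs(c, member, x0, n_samples=5_000, burn=500):
    """Monte Carlo estimates of grad A(c) = E[x] and hess A(c) = Cov(x)."""
    x, samples = np.asarray(x0, dtype=float), []
    for i in range(burn + n_samples):
        x = hit_and_run_step(x, c, 1.0, member)   # P_{1,c} = P_c
        if i >= burn:
            samples.append(x.copy())
    S = np.array(samples)
    return S.mean(axis=0), np.cov(S.T)
```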

SLIDE 45

Convergence/running time analysis

Interior point methods vs. simulated annealing:

  • Inside each temperature. IPM: fast convergence of Newton's method. SA: fast convergence of Hit-and-Run to the stationary distribution.
  • Changing the temperature. IPM: after Newton has converged. SA: after reaching the stationary distribution; estimate the covariance.
  • Condition. IPM: Newton decrement ≪ 1. SA: bounded distance between consecutive distributions.

slide-46
SLIDE 46

Why is this interesting?

  • Unifies two distinct literatures.
  • One less algorithm to teach/learn in your class!
  • Using IPM ideas we get a faster algorithm for convex optimization: $\tilde{O}(\sqrt{n}) \Rightarrow \tilde{O}(\sqrt{\nu})$ iterations; for semi-definite programming, $\nu = O(\sqrt{n})$.
  • A randomized, efficient interior-point path-following algorithm for any convex set! (A long-standing open problem in optimization.)

SLIDE 47
  • Time for a Demo?
  • Time for a proof sketch?
  • Fin…
SLIDE 48

When can we increase the temperature?

Theorem [Kalai-Vempala '06]: for Hit-and-Run-based simulated annealing to work, it suffices that the temperature schedule (writing $c_k = t_k \cdot c$) satisfies

$$\max\left\{ \left\| \frac{P_{c_k}}{P_{c_{k+1}}} \right\|_2,\ \left\| \frac{P_{c_{k+1}}}{P_{c_k}} \right\|_2 \right\} \le O(1).$$

Our main lemma: the above holds for

$$\frac{t_{k+1}}{t_k} = 1 + \frac{O(1)}{\sqrt{\nu}}.$$
SLIDE 49

Proof:

Part 1: duality of the Bregman divergence, and its equivalence to the Kullback-Leibler divergence for exponential families. (Reminder: the Bregman divergence with respect to A behaves like a local norm,

$$D_A(x, y) \equiv A(x) - A(y) - \nabla A(y)^\top (x - y) \approx \|x - y\|^2_{\nabla^2 A(y)}.)$$

With

$$A(\theta) = \log \int_{x \in K} e^{-\theta^\top x}\, dx, \qquad x(c) = \mathbb{E}_{x \sim P_c}[x] = \nabla A(c),$$

we have

$$\mathrm{KL}(P_{c_k}, P_{c_{k+1}}) = D_A(c_{k+1}, c_k) = D_{A^*}(x(c_k), x(c_{k+1})).$$
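For completeness, the one-line derivation behind this identity (using $\mathbb{E}_{P_{\theta_1}}[x] = -\nabla A(\theta_1)$ for $A(\theta) = \log \int_K e^{-\theta^\top x} dx$; the slides suppress this sign):

$$\mathrm{KL}(P_{\theta_1} \,\|\, P_{\theta_2}) = \mathbb{E}_{\theta_1}\!\big[(\theta_2 - \theta_1)^\top x\big] + A(\theta_2) - A(\theta_1) = A(\theta_2) - A(\theta_1) - \nabla A(\theta_1)^\top (\theta_2 - \theta_1) = D_A(\theta_2, \theta_1).$$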

SLIDE 50

Proof:

Part 2: by definition and a direct calculation,

$$\log \left\| \frac{P_{c_{k+1}}}{P_{c_k}} \right\|_2 = D_A(c_{k+1}, c_k) + D_A(c_k, c_{k+1}).$$
SLIDE 51

Proof:

Part 3 (using IPM): the Bregman divergence between consecutive iterates is bounded by O(1) inside the Dikin ellipsoid:

$$D_A(c_{k+1}, c_k) \sim \|c_k - c_{k+1}\|^2_{\nabla^2 A(c_k)} \sim \big(\|x(c_k) - x(c_{k+1})\|^*_{\nabla^2 A(c_k)}\big)^2 = \|x_k - x_{k+1}\|^2_{\nabla^2 A^*(x_k)} = O(1).$$

SLIDE 52

Putting it together

1. Nemirovski: the number of Dikin ellipsoids along the central path is at most $O(\nu^{1/2})$.
2. This bounds the total number of temperature updates.

Complexity: each iteration requires running Hit-and-Run N times (to estimate the mean and covariance).

SLIDE 53

Conclusion

1. Faster convex optimization: $\nu^{1/2}$ iterations vs. $n^{1/2}$; a faster SDP solver, with per-iteration cost $n^3\nu^2$ vs. $n^4$.
2. An efficient randomized IPM for any convex body (an open question in optimization).
3. Defined the heat path and showed its equivalence to the central path.

SLIDE 54

Where do we go from here?

1. The heat path for non-convex optimization
2. Regret minimization: a geometric connection
3. A gradient-descent analogue?