2. Elements of convex optimization


Introduction to Machine Learning, CentraleSupélec Paris, Fall 2017. 2. Elements of convex optimization. Chloé-Agathe Azencott, Centre for Computational Biology, Mines ParisTech, chloe-agathe.azencott@mines-paristech.fr. Why talk about convex optimization?


  1. Gradient descent ● Start from a random point u. ● How do I get closer to the solution? ● Follow the opposite of the gradient: the gradient indicates the direction of steepest increase. [Figure: for a small enough step, moving from u to u⁺ = u − α∇f(u) decreases the objective, from f(u) to f(u⁺).]

  2. Gradient descent algorithm ● Choose an initial point u^(0) ● Repeat for k = 1, 2, 3, …: u^(k) = u^(k-1) − α ∇f(u^(k-1)), where α is the step size ● Stop at some point (stopping criterion). Usually: stop when the gradient ∇f(u^(k)) is (close to) zero.
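A minimal NumPy sketch of this loop (not from the slides; the test function, step size and tolerance are illustrative choices):

```python
import numpy as np

def gradient_descent(grad_f, u0, step_size=0.1, tol=1e-6, max_iter=10000):
    """Minimize f by repeatedly stepping opposite to its gradient."""
    u = np.asarray(u0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(u)
        if np.linalg.norm(g) < tol:      # stopping criterion: gradient (close to) zero
            break
        u = u - step_size * g            # u^(k) = u^(k-1) - alpha * grad f(u^(k-1))
    return u

# Example: minimize f(u) = ||u - 1||^2, whose gradient is 2 (u - 1).
u_star = gradient_descent(lambda u: 2 * (u - np.ones(3)), u0=np.zeros(3))
```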

  5. Gradient descent algorithm ● Choose an initial point ● Repeat for k = 1, 2, 3, … ● How to choose the step size α? – If the step size is too big, the search might diverge – If the step size is too small, the search might take a very long time – Backtracking line search makes it possible to choose the step size adaptively.

  11. BLS: shrinking needed ● The step size is too big and we are overshooting our goal: f(u − α∇f(u)) > f(u) − (α/2) ∇f(u)ᵀ∇f(u). [Figure: f plotted at u, u − (α/2)∇f(u) and u − α∇f(u); the value f(u − α∇f(u)) lies above the comparison value f(u) − (α/2) ∇f(u)ᵀ∇f(u).]

  16. BLS: no shrinking needed ● The step size is small enough: f(u − α∇f(u)) ≤ f(u) − (α/2) ∇f(u)ᵀ∇f(u).

  17. Backtracking line search ● Shrinking parameter β ∈ (0, 1), initial step size α ● Choose an initial point ● Repeat for k = 1, 2, 3, … – If f(u^(k) − α∇f(u^(k))) > f(u^(k)) − (α/2) ∇f(u^(k))ᵀ∇f(u^(k)), shrink the step size: α ← βα – Else, update: u^(k+1) = u^(k) − α ∇f(u^(k)) ● Stop when the gradient is (close to) zero.
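A sketch of one gradient step with backtracking, using the same shrinking condition as above (parameter names and the test function are illustrative):

```python
import numpy as np

def bls_gradient_step(f, grad_f, u, alpha_init=1.0, beta=0.5):
    """One gradient step whose step size is chosen by backtracking line search."""
    g = grad_f(u)
    alpha = alpha_init
    # Shrink alpha while f(u - alpha * g) > f(u) - (alpha / 2) * ||g||^2.
    while f(u - alpha * g) > f(u) - 0.5 * alpha * (g @ g):
        alpha *= beta                    # alpha <- beta * alpha
    return u - alpha * g

# Example: one step on f(u) = ||u||^2 starting from u = (3, 4).
u_next = bls_gradient_step(lambda u: u @ u, lambda u: 2 * u, np.array([3.0, 4.0]))
```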

  18. Newton's method ● Suppose f is twice differentiable ● Second-order Taylor expansion: f(v) ≈ f(u) + ∇f(u)ᵀ(v − u) + ½ (v − u)ᵀ ∇²f(u) (v − u) ● Minimize in v instead of in u: the quadratic model is minimized at v = u − [∇²f(u)]⁻¹ ∇f(u), which gives the Newton update.
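A minimal sketch of the resulting Newton iteration (the test function is illustrative; here the Newton step is obtained by solving the Hessian system directly):

```python
import numpy as np

def newton(grad_f, hess_f, u0, tol=1e-8, max_iter=100):
    """Newton's method: repeatedly minimize the local quadratic model of f."""
    u = np.asarray(u0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(u)
        if np.linalg.norm(g) < tol:
            break
        # Newton step: solve hess_f(u) d = grad_f(u) rather than forming the inverse Hessian.
        d = np.linalg.solve(hess_f(u), g)
        u = u - d
    return u

# Example: f(u) = u1^4 + u2^2, with gradient (4 u1^3, 2 u2) and a diagonal Hessian.
u_star = newton(lambda u: np.array([4 * u[0]**3, 2 * u[1]]),
                lambda u: np.diag([12 * u[0]**2, 2.0]),
                u0=np.array([2.0, 1.0]))
```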

  21. Newton CG (conjugate gradient) ● Computing the inverse of the Hessian is computationally intensive. ● Instead, compute ∇f(u^(k)) and ∇²f(u^(k)), and solve ∇²f(u^(k)) d = ∇f(u^(k)) for d. ● New update rule: u^(k+1) = u^(k) − d.

  24. Newton CG (conjugate gradient) ● This is a problem of the form Ax = b, where A = ∇²f(u^(k)) is symmetric positive semidefinite (second-order characterization of convex functions). ● Solve it using the conjugate gradient method.

  27. Conjugate gradient method ● Solve Ax = b ● Idea: build a set of A-conjugate vectors p_0, p_1, … (a basis of ℝⁿ) – Initialisation: pick x_0, set r_0 = b − A x_0 and p_0 = r_0 – At step t: ● Update rule: x_{t+1} = x_t + α_t p_t ● residual: r_t = b − A x_t ● α_t = (p_tᵀ r_t) / (p_tᵀ A p_t) ensures r_{t+1} ⊥ p_t – Convergence: the directions p_t span ℝⁿ, hence the residual vanishes after at most n steps and x_n solves Ax = b.
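A compact sketch of the conjugate gradient iteration for Ax = b with A symmetric positive definite (variable names follow the standard presentation, not necessarily the slides'):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10):
    """Solve Ax = b for symmetric positive definite A using A-conjugate directions."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x            # residual
    p = r.copy()             # first search direction
    for _ in range(n):       # at most n steps in exact arithmetic
        rr = r @ r
        if np.sqrt(rr) < tol:
            break
        alpha = rr / (p @ A @ p)
        x = x + alpha * p
        r = r - alpha * (A @ p)
        beta = (r @ r) / rr  # makes the new direction A-conjugate to the previous ones
        p = r + beta * p
    return x

# Example: a small symmetric positive definite system.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)   # x satisfies A @ x ≈ b
```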

  28. Conjugate gradient method ● Exercise: prove …, given the initialisation, the update rule x_{t+1} = x_t + α_t p_t, the residual r_t = b − A x_t, and assuming …

  30. Conjugate gradient method ● Exercise: prove … and conclude the proof, given the initialisation, the update rule, and the residual.

  32. Quasi-Newton methods ● What if the Hessian is unavailable / expensive to compute at each iteration? ● Approximate the inverse Hessian by a matrix W_k, updated iteratively ● Conditions: a 1st-order Taylor expansion applied to ∇f gives the secant equation: W_{k+1} (∇f(u^(k+1)) − ∇f(u^(k))) = u^(k+1) − u^(k) ● Initialization: W_0 = Identity.

  33. Quasi-Newton methods ● BFGS: Broyden–Fletcher–Goldfarb–Shanno update of W_k – Secant equation: the mean value G of the Hessian between u^(k) and u^(k+1) verifies G (u^(k+1) − u^(k)) = ∇f(u^(k+1)) − ∇f(u^(k)) ⇒ W_{k+1} should map this gradient difference back onto the step. ● L-BFGS: limited-memory variant – do not store the full matrix W_k.
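A sketch of the standard BFGS update of the inverse-Hessian approximation W_k (not taken verbatim from the slides; only the update rule is shown, not the full optimizer loop):

```python
import numpy as np

def bfgs_update(W, s, y):
    """BFGS update of the inverse-Hessian approximation W.

    s = u_{k+1} - u_k and y = grad f(u_{k+1}) - grad f(u_k);
    the returned matrix satisfies the secant equation W_new @ y = s.
    """
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    return (I - rho * np.outer(s, y)) @ W @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)

# Inside a quasi-Newton loop one would take d = W @ grad, move to u_new = u - step * d,
# then call W = bfgs_update(W, u_new - u, grad_new - grad), starting from W = identity.
```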

  34. Stochastic gradient descent ● For f of the form f(u) = Σ_{i=1}^m f_i(u) ● Gradient descent: u^(k) = u^(k-1) − α_k Σ_{i=1}^m ∇f_i(u^(k-1)) ● Stochastic gradient descent: u^(k) = u^(k-1) − α_k ∇f_{i_k}(u^(k-1)) – Cyclic: cycle over 1, 2, …, m, 1, 2, …, m, … – Randomized: choose i_k uniformly at random in {1, 2, …, m}.
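A minimal randomized-SGD sketch for a sum of per-example losses (the least-squares example and all parameter values are illustrative):

```python
import numpy as np

def sgd(grad_fi, m, u0, step_size=0.01, n_epochs=50, seed=0):
    """Stochastic gradient descent: each step follows the gradient of a single term f_i."""
    rng = np.random.default_rng(seed)
    u = np.asarray(u0, dtype=float)
    for _ in range(n_epochs * m):
        i = rng.integers(m)              # randomized variant: i_k uniform in {0, ..., m-1}
        u = u - step_size * grad_fi(u, i)
    return u

# Example: least squares, f_i(u) = (x_i^T u - y_i)^2 with gradient 2 (x_i^T u - y_i) x_i.
rng = np.random.default_rng(0)
X, u_true = rng.normal(size=(100, 3)), np.array([1.0, -2.0, 0.5])
y = X @ u_true
u_hat = sgd(lambda u, i: 2 * (X[i] @ u - y[i]) * X[i], m=100, u0=np.zeros(3))
```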

  35. Coordinate Descent ● For f of the form f(u) = g(u) + Σ_i h_i(u_i) – g: convex and differentiable – h_i: convex ⇒ the non-smooth part of f is separable. ● Minimize coordinate by coordinate: – Initialisation: u^(0) – For k = 1, 2, …: for each coordinate i, minimize f over u_i with all other coordinates fixed. ● Variants: – re-order the coordinates randomly – proceed by blocks of coordinates (2 or more at a time).
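As an illustration (not from the slides), coordinate descent for the lasso, where g is the squared error and the separable l1 penalty is the non-smooth part; each coordinate update has a closed form via soft-thresholding:

```python
import numpy as np

def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Coordinate descent for f(u) = 0.5 * ||X u - y||^2 + lam * ||u||_1."""
    n, p = X.shape
    u = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)            # ||X_i||^2 for each column (assumed nonzero)
    for _ in range(n_iters):
        for i in range(p):
            # Partial residual with coordinate i's current contribution removed.
            r_i = y - X @ u + X[:, i] * u[i]
            rho = X[:, i] @ r_i
            # Closed-form minimizer over u_i: soft-thresholding of rho.
            u[i] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[i]
    return u

# Example with a sparse ground truth.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.0]) + 0.1 * rng.normal(size=50)
u_hat = lasso_coordinate_descent(X, y, lam=1.0)
```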

  38. Summary: Unconstrained convex optimization ● If f is differentiable: – set its gradient to zero – if that is hard to solve: gradient descent. Setting the learning rate: Backtracking Line Search (adapt the step size heuristically to avoid “overshooting”). ● Newton's method: suppose f is twice differentiable – if the Hessian is hard to invert, compute the Newton step by solving the linear system ∇²f(u) d = ∇f(u) with the conjugate gradient method – if the Hessian is hard to compute, approximate the inverse Hessian with a quasi-Newton method such as BFGS (L-BFGS: less memory). ● If f is separable: stochastic gradient descent. ● If the non-smooth part of f is separable: coordinate descent.
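In practice these solvers are available off the shelf; a short sketch using scipy.optimize.minimize (the smooth test function and its gradient are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

# A smooth test function, its gradient, and a starting point.
f = lambda u: (u[0] - 1) ** 2 + 10 * (u[1] - u[0] ** 2) ** 2
grad = lambda u: np.array([2 * (u[0] - 1) - 40 * u[0] * (u[1] - u[0] ** 2),
                           20 * (u[1] - u[0] ** 2)])
u0 = np.zeros(2)

# Gradient-based, Newton-CG and (L-)BFGS solvers from this section.
for method in ("CG", "Newton-CG", "BFGS", "L-BFGS-B"):
    res = minimize(f, u0, jac=grad, method=method)
    print(method, res.x)
```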

  39. Constrained convex optimization

  40. Constrained convex optimization ● Convex optimization program/problem: minimize f(u) subject to g_i(u) ≤ 0 (i = 1, …, m) and h_j(u) = 0 (j = 1, …, p) – f is convex – the g_i are convex – the h_j are affine – The feasible set is convex.

  41. Lagrangian ● Lagrangian: L(u, α, β) = f(u) + Σ_i α_i g_i(u) + Σ_j β_j h_j(u) ● α, β = Lagrange multipliers = dual variables.

  42. Lagrange dual function ● Lagrangian: L(u, α, β) ● Lagrange dual function: Q(α, β) = inf_u L(u, α, β). Infimum = the greatest value q such that q ≤ L(u, α, β) for all u. ● Q is concave (independently of the convexity of f).

  44. Lagrange dual function ● The dual function gives a lower bound on our solution. Let p* = min { f(u) : u in the feasible set }. Then Q(α, β) ≤ p* for any α ≥ 0 and any β.

  45. Weak duality ● Q(α, β) ≤ p* for any α ≥ 0 and any β ● What is the best lower bound on p* we can get?
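A small worked example (not from the slides) illustrating the lower bound: minimize f(u) = u² subject to 1 − u ≤ 0, so the optimal value is p* = 1.

```latex
L(u, \alpha) = u^2 + \alpha (1 - u), \quad \alpha \ge 0
\qquad\Rightarrow\qquad
Q(\alpha) = \inf_u L(u, \alpha) = \alpha - \frac{\alpha^2}{4}
\quad (\text{infimum attained at } u = \alpha/2).
```

Indeed Q(α) ≤ 1 = p* for every α ≥ 0, and the best such bound, max over α ≥ 0 of Q(α) = Q(2) = 1, is obtained by maximizing Q: this is the dual problem.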
