  1. Computational Optimization Last of unconstrained 2/26

  2. Half-way there
Minimize $f(x)$ (objective function) subject to $x \in S$ (constraints). Problems can be characterized by the type of objective function and constraints.
NEOS Optimization Guide: http://www-fp.mcs.anl.gov/otc/Guide/OptWeb/

  3. Optimization Recipes
Optimization algorithms are like recipes using common ingredients: step-size, trust regions, Newton's method, quasi-Newton, conjugate directions, ...
Just stir up the right combination.

  4. Some other ingredients
Trust Region Methods, Limited Memory Quasi-Newton, Linear Least Squares, Nonlinear Least Squares, Finite Difference Methods, Automatic Differentiation

  5. Trust Region Methods
Alternative to line search methods. Optimize a quadratic model of the objective within the "trust region":
$p_k \in \arg\min_p \; f(x_k) + \nabla f(x_k)'p + \tfrac{1}{2} p' B_k p$ subject to $\|p\| \le \Delta_k$,
then set $x_{k+1} = x_k + p_k$.

  6. Options
How to pick $B_k$ -- Newton or quasi-Newton.
How to pick the trust region radius $\Delta_k$: shrink it if you fail to get a decrease, increase it if you get a good decrease, otherwise keep it the same.
The trust region problem need not be solved exactly (one inexact solve is sketched below). Many variations.
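As one illustration of an inexact solve, here is a minimal MATLAB sketch of the Cauchy point: the minimizer of the quadratic model along the steepest-descent direction, clipped to the trust region. This is a standard textbook choice, not necessarily the variant the slides have in mind, and the function name is mine.

% Cauchy point: an illustrative inexact solution of the trust-region subproblem
%   min_p  g'*p + 0.5*p'*B*p   s.t.  norm(p) <= delta
% g is the gradient at the current iterate, B the model Hessian B_k.
function p = cauchy_point(g, B, delta)
gBg = g' * B * g;
if gBg <= 0
    tau = 1;                                 % model decreases all the way to the boundary
else
    tau = min(norm(g)^3 / (delta * gBg), 1); % unconstrained minimizer along -g, clipped
end
p = -tau * (delta / norm(g)) * g;
end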

  7. Use ratio to determine trust region radius
$p_k \in \arg\min_{\|p\| \le \Delta_k} m_k(p) := f(x_k) + \nabla f(x_k)'p + \tfrac{1}{2} p' B_k p$
Look at the ratio of actual versus predicted decrease:
$\rho_k = \dfrac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}$
If the ratio is near one and $\|p_k\| = \Delta_k$, then increase the radius. If the ratio is near zero, then decrease the radius.
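A minimal MATLAB sketch of this radius-update rule; the specific thresholds (0.25, 0.75), scale factors, and the cap delta_max are common illustrative choices rather than values taken from the slides.

% Update the trust-region radius from the ratio rho of actual to predicted decrease.
% p is the step just tried, delta the current radius, delta_max an upper bound.
% Thresholds and scale factors below are illustrative.
function delta = update_radius(rho, p, delta, delta_max)
if rho < 0.25
    delta = 0.25 * delta;                    % poor agreement: shrink the region
elseif rho > 0.75 && abs(norm(p) - delta) < 1e-12
    delta = min(2 * delta, delta_max);       % good step that hit the boundary: grow
end                                          % otherwise keep the radius the same
end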

  8. Trust Region Methods
Pros: picks direction and stepsize simultaneously; global convergence; superlinear convergence in many cases; some types very effective in practice.
Cons: must solve one or more constrained trust region problems at each iteration.

  9. BFGS in NW
$x_{k+1} = x_k - \alpha_k H_k \nabla f_k$
where $H_{k+1} = V_k' H_k V_k + \rho_k s_k s_k'$
with $\rho_k = \dfrac{1}{y_k' s_k}$, $V_k = I - \rho_k y_k s_k'$,
$s_k = x_{k+1} - x_k$, $y_k = \nabla f_{k+1} - \nabla f_k$.
$H_k$ has nice low rank structure, and all we really need to do is multiply it times the gradient, so we can do this without explicitly storing it.
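Transcribed directly into MATLAB, one dense update of this form looks roughly like the following (variable names are mine; the point of the next two slides is to avoid forming H explicitly):

% One BFGS update of the inverse Hessian approximation H (dense form, for illustration).
% s = x_{k+1} - x_k,  y = grad f_{k+1} - grad f_k.
rho = 1 / (y' * s);
V = eye(length(s)) - rho * (y * s');
H = V' * H * V + rho * (s * s');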

  10. $H_k$ grows at each iteration
$H_k = (V_{k-1}' \cdots V_{k-m}')\, H_0\, (V_{k-m} \cdots V_{k-1})$
$\quad + \rho_{k-m} (V_{k-1}' \cdots V_{k-m+1}')\, s_{k-m} s_{k-m}'\, (V_{k-m+1} \cdots V_{k-1})$
$\quad + \rho_{k-m+1} (V_{k-1}' \cdots V_{k-m+2}')\, s_{k-m+1} s_{k-m+1}'\, (V_{k-m+2} \cdots V_{k-1})$
$\quad + \cdots + \rho_{k-1} s_{k-1} s_{k-1}'$
So we can define a recursive procedure (see Algorithm 9.1, page 225) that only requires inner products (assuming $H_0$ is diagonal). It uses only $4mn$ multiplications and requires only storage of the $s_k$, $y_k$.
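A compact MATLAB sketch of the two-loop recursion this expansion suggests, computing d = H_k * g from the m most recent pairs. The storage layout (pairs as columns of S and Y, oldest first) and the scalar scaling gamma for H0 = gamma*I are my assumptions.

% L-BFGS two-loop recursion: d = H_k * g using only the m stored pairs.
% S(:,i) = s_i and Y(:,i) = y_i, ordered oldest (i = 1) to newest (i = m).
function d = lbfgs_direction(g, S, Y, gamma)
m = size(S, 2);
alpha = zeros(m, 1);
rho = 1 ./ sum(Y .* S, 1)';                  % rho_i = 1 / (y_i' * s_i)
q = g;
for i = m:-1:1                               % first loop: newest pair to oldest
    alpha(i) = rho(i) * (S(:, i)' * q);
    q = q - alpha(i) * Y(:, i);
end
d = gamma * q;                               % apply H0 = gamma * I
for i = 1:m                                  % second loop: oldest pair to newest
    beta = rho(i) * (Y(:, i)' * d);
    d = d + (alpha(i) - beta) * S(:, i);
end
end

The search direction is then -d, combined with a Wolfe line search as noted on slide 12.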

  11. More improvements
$H_0$ can be changed at each iteration. A good choice in practice is
$H_k^0 = \gamma_k I$ with $\gamma_k = \dfrac{s_{k-1}' y_{k-1}}{y_{k-1}' y_{k-1}}$.
Limit memory: only store $s_k, y_k$ for the last $m$ iterates and base the approximation on that.

  12. Limited Memory BFGS - pros and cons
Usually the best algorithm for large problems with non-sparse Hessians. May not be best if the problem has special structure, e.g. sparsity, separable structure, nonlinear least squares. Needs a Wolfe stepsize. Relatively cheap iterates. Robust. May converge slowly on highly ill-conditioned problems.

  13. Partially Separable Structure
$f(x) = \sum_{i=1}^m f_i(x)$
Example: $f(x) = f_1(x_1, x_3) + f_2(x_1, x_4) + f_3(x_4, x_5)$

  14. Predict Drug Bioavailability
$y \in R$: aqueous solubility (Aquasol)
$x_i \in R^{525}$: 525 generated descriptors (Electronic TAE and traditional)
197 molecules with tested solubility, so $\ell = 197$.

  15. Regression with bias (1-d illustration)
$y = \langle w, x \rangle + b$, shown with $b = 2$.
[Figure: a 1-d regression line with nonzero intercept b.]

  16. Linear Regression
Given training data: $S = ((x_1, y_1), (x_2, y_2), \ldots, (x_i, y_i), \ldots, (x_\ell, y_\ell))$, points $x_i \in R^n$ and labels $y_i \in R$.
Construct a linear function: $g(x) = \langle w, x \rangle + b = w'x + b = \sum_{i=1}^n w_i x_i + b$
Goal: for future data $(x, y)$ with $y$ unknown, $g(x) \approx y$.

  17. Least Squares Approximation
Want $g(x) \approx y$. Define the error $\xi_i = (g(x_i) - y_i)^2$.
Minimize the loss $L(g, S) = \sum_{i=1}^{\ell} (g(x_i) - y_i)^2$.

  18. Linear Least Squares Loss
$L(w, b) = \sum_{i=1}^{\ell} (w'x_i + b - y_i)^2 = \|Xw + be - y\|_2^2 = (Xw + be - y)'(Xw + be - y)$  (2-norm)

  19. Optimal Solution
Want: $y \approx Xw + be$, where $e$ is a vector of ones.
Mathematical model: $\min_{w,b} L(w, b, S) = \|y - (Xw + be)\|^2 + \lambda \|w\|^2$
Optimality conditions:
$\dfrac{\partial L(w, b, S)}{\partial w} = -2X'(y - Xw - be) + 2\lambda w = 0$
$\dfrac{\partial L(w, b, S)}{\partial b} = -2e'(y - Xw - be) = 0$

  20. Optimal Solution
Thus: $e'e\,b = e'y - e'Xw \;\Rightarrow\; b = \dfrac{e'y - e'Xw}{\ell} = \mathrm{mean}(y) - \mathrm{mean}(X)'w$
Assume the data are scaled so that $\mathrm{mean}(x) = X'e/\ell = 0$. Then
$(X'X + \lambda I)w = X'y - X'e\,b \;\Rightarrow\; w = (X'X + \lambda I)^{-1} X'y$, $b = \mathrm{mean}(y)$
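A short MATLAB sketch of this closed-form solution; the function name and the explicit centering step are my framing of the derivation above (X is the l-by-n data matrix, y the l-by-1 targets).

% Ridge-regularized linear least squares via the normal equations above (illustrative).
function [w, b] = ridge_ls(X, y, lambda)
n = size(X, 2);
mx = mean(X, 1);                             % row vector of column means
Xc = X - repmat(mx, size(X, 1), 1);          % center so that mean(x) = 0
w = (Xc' * Xc + lambda * eye(n)) \ (Xc' * y);
b = mean(y) - mx * w;                        % b = mean(y) - mean(X)' * w
end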

  21. Nonlinear Least Squares
Partially separable problem:
$f(a, b, c) = \frac{1}{2} \sum_{i=1}^{\ell} f_i(a, b, c)^2$, where $f_i(a, b, c) = y_i - (a x_i^2 + b x_i + c)$
[Figure: data points with a fitted quadratic curve.]

  22. Nonlinear Least Squares
Partially separable problem:
$f(a, b, c) = \frac{1}{2} \sum_{i=1}^{\ell} f_i(a, b, c)^2$, $f_i(a, b, c) = y_i - (a x_i^2 + b x_i + c)$
$\nabla f(a, b, c) = \sum_{i=1}^{\ell} f_i(a, b, c)\, \nabla f_i(a, b, c)$
with $\nabla f_i(a, b, c) = -\begin{bmatrix} x_i^2 \\ x_i \\ 1 \end{bmatrix}$

  23. Nonlinear Least Squares
Problems of type: $\min_x f(x) = \frac{1}{2} \sum_i f_i(x)^2$
Gradient: $\sum_i f_i(x) \nabla f_i(x)$
Hessian: $\sum_i \nabla f_i(x) \nabla f_i(x)' + \sum_i f_i(x) \nabla^2 f_i(x)$
Approximate the Hessian by $\sum_i \nabla f_i(x) \nabla f_i(x)'$:
Newton → Gauss-Newton; Newton + trust region → Levenberg-Marquardt
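A minimal MATLAB Gauss-Newton sketch for the quadratic-fit example of slides 21-22; the function name and iteration count are illustrative. Because the residual here is linear in (a, b, c), a single step already solves the problem, so the loop only shows the general pattern; Levenberg-Marquardt would add a damping term lambda*eye(3) inside the solve.

% Gauss-Newton for min 0.5 * sum_i (y_i - (a*x_i^2 + b*x_i + c))^2  (illustrative sketch).
% x, y are column vectors of data; theta = [a; b; c] is the initial guess.
function theta = gauss_newton_quadfit(x, y, theta, iters)
J = -[x.^2, x, ones(size(x))];               % Jacobian of the residuals f_i
for k = 1:iters
    r = y - (theta(1)*x.^2 + theta(2)*x + theta(3));   % residuals f_i(theta)
    theta = theta - (J' * J) \ (J' * r);     % Gauss-Newton step
end
end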

  24. Matlab
Check out Matlab optimization: type bandem, help fminunc. It has all the basics.

  25. What if the gradient is not available?
Can use finite difference methods. Recall
$f'(x) = \lim_{h \to 0} \dfrac{f(x + h) - f(x)}{h}$
Approximate using a small $h$:
$f'(x) \approx \dfrac{f(x + h) - f(x)}{h}$, $\quad \dfrac{\partial f(x)}{\partial x_1} \approx \dfrac{f(x + h e_1) - f(x)}{h}$ where $e_1 = [1, 0, \ldots, 0]'$
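A small MATLAB sketch of this coordinate-wise forward-difference gradient; the function name and the default choice of h are mine.

% Forward-difference approximation of the gradient of f at the column vector x.
% f is a function handle; h is a small step size (default is an assumed common choice).
function g = fd_gradient(f, x, h)
if nargin < 3, h = sqrt(eps); end            % commonly used default step
n = numel(x);
g = zeros(n, 1);
fx = f(x);
for i = 1:n
    e = zeros(n, 1);  e(i) = 1;              % i-th unit vector
    g(i) = (f(x + h*e) - fx) / h;            % one extra evaluation per dimension
end
end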

  26. Problems
Introduces error: forward difference $\dfrac{f(x+h) - f(x)}{h}$ versus central difference $\dfrac{f(x+h) - f(x-h)}{2h}$.
The best value of $h$ is very small, on the order of the square root of machine precision for the forward difference.
Have to do this for each dimension.

  27. Automatic Differentiation
function makegradient(fcn, name)
% Creates a new matlab function (defined by gradient(fcn))
% and saves it with the specified name
% Example: makegradient('x^2+y^2','gf') creates a file gf.m:
%   function functout = gf(v)
%   x = v(1);
%   y = v(2);
%   functout = [2*x, 2*y];

  28. Automatic Differentiation
function makehessian(fcn, name)
% Creates a new matlab function (defined by hessian(fcn))
% and saves it with the specified name
% Example: makehessian('x^7+x*y^3','hf') creates a file hf.m:
%   function functout = hf(v)
%   x = v(1);
%   y = v(2);
%   functout = [[42*x^5, 3*y^2]; [3*y^2, 6*x*y]];
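The makegradient/makehessian helpers appear to be course-provided utilities. A rough equivalent of what they compute, using MATLAB's Symbolic Math Toolbox (my assumption, not the tool from the slides), would be:

% Symbolic derivatives of the example functions from slides 27-28
% (requires the Symbolic Math Toolbox; not the course-provided tool).
syms x y
gradf = jacobian(x^2 + y^2, [x, y])          % returns [2*x, 2*y]
hessf = hessian(x^7 + x*y^3, [x, y])         % returns [42*x^5, 3*y^2; 3*y^2, 6*x*y]
% matlabFunction(gradf) would turn the symbolic result into a numeric function handle.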

  29. Fminunc
First try using the finite difference approximation for the gradient. For example, on L:
X0 = [1,1]';
Options = optimset('Display','iter');
X = fminunc(@L, X0, Options)

  30. To use real gradient - Option 1
Combine f, g, and H into one Matlab file:
function [f,g,H] = matL(x)
f = L(x);
if nargout > 1
    g = gradL(x);
end
if nargout > 2
    H = hessL(x);
end
Then:
Options = optimset('GradObj','on','Display','iter');
X = fminunc(@matL, X0, Options)

  31. To use real gradient - Option 2
Options = optimset('GradObj','on','Display','iter');
X = fminunc({@L,@gradL}, X0, Options)

  32. To use Hessian
Same as the gradient, but add:
Options = optimset(Options,'Hessian','on');
X = fminunc(@matL, X0, Options)
Or:
X = fminunc({@L,@gradL,@hessL}, X0, Options)

  33. Check your gradients!
Try:
Options = optimset(Options,'DerivativeCheck','on');
X = fminunc(@L, X0, Options)
Try it on our family of functions. What happens?
