  1. Computational Optimization Last of unconstrained 2/26

  2. Half-way there
Minimize $f(x)$ (objective function) subject to $x \in S$ (constraints). Problems can be characterized by the type of objective function and constraints.
NEOS Optimization Guide: http://www-fp.mcs.anl.gov/otc/Guide/OptWeb/

  3. Optimization Recipes
Optimization algorithms are like recipes using common ingredients: step-size, trust regions, Newton's method, quasi-Newton, conjugate directions, ...
Just stir up the right combination.

  4. Some other ingredients
Trust Region Methods, Limited Memory Quasi-Newton, Linear Least Squares, Nonlinear Least Squares, Finite Difference Methods, Automatic Differentiation

  5. Trust Region Methods
Alternative to line search methods. Optimize a quadratic model of the objective within the "trust region":
$p_k \in \arg\min_p \; f(x_k) + \nabla f(x_k)'p + \tfrac{1}{2} p' B_k p$ subject to $\|p\| \le \Delta_k$,
then set $x_{k+1} = x_k + p_k$.

  6. Options
How to pick $B_k$ -- Newton or quasi-Newton.
How to pick the trust region radius $\Delta_k$: shrink it if you fail to get a decrease, increase it if you get a good decrease, otherwise keep it the same.
The trust region problem need not be solved exactly (one inexact solve is sketched below). Many variations.
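As one illustration of an inexact solve, here is a minimal MATLAB sketch of the Cauchy point: the minimizer of the quadratic model along the steepest-descent direction, clipped to the trust region. This is a standard textbook choice, not necessarily the variant the slides have in mind, and the function name is mine.

% Cauchy point: an illustrative inexact solution of the trust-region subproblem
%   min_p  g'*p + 0.5*p'*B*p   s.t.  norm(p) <= delta
% g is the gradient at the current iterate, B the model Hessian B_k.
function p = cauchy_point(g, B, delta)
gBg = g' * B * g;
if gBg <= 0
    tau = 1;                                 % model decreases all the way to the boundary
else
    tau = min(norm(g)^3 / (delta * gBg), 1); % unconstrained minimizer along -g, clipped
end
p = -tau * (delta / norm(g)) * g;
end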

  7. Use ratio to determine trust region radius
$p_k \in \arg\min_{\|p\| \le \Delta_k} m_k(p) := f(x_k) + \nabla f(x_k)'p + \tfrac{1}{2} p' B_k p$
Look at the ratio of actual versus predicted decrease:
$\rho_k = \dfrac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}$
If the ratio is near one and $\|p_k\| = \Delta_k$, then increase the radius. If the ratio is near zero, then decrease the radius.
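A minimal MATLAB sketch of this radius-update rule; the specific thresholds (0.25, 0.75), scale factors, and the cap delta_max are common illustrative choices rather than values taken from the slides.

% Update the trust-region radius from the ratio rho of actual to predicted decrease.
% p is the step just tried, delta the current radius, delta_max an upper bound.
% Thresholds and scale factors below are illustrative.
function delta = update_radius(rho, p, delta, delta_max)
if rho < 0.25
    delta = 0.25 * delta;                    % poor agreement: shrink the region
elseif rho > 0.75 && abs(norm(p) - delta) < 1e-12
    delta = min(2 * delta, delta_max);       % good step that hit the boundary: grow
end                                          % otherwise keep the radius the same
end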

  8. Trust Region Methods
Pros: picks direction and stepsize simultaneously; global convergence; superlinear convergence in many cases; some types very effective in practice.
Cons: must solve one or more constrained trust region problems at each iteration.

  9. BFGS in NW
$x_{k+1} = x_k - \alpha_k H_k \nabla f_k$
where $H_{k+1} = V_k' H_k V_k + \rho_k s_k s_k'$
with $\rho_k = \dfrac{1}{y_k' s_k}$, $V_k = I - \rho_k y_k s_k'$,
$s_k = x_{k+1} - x_k$, $y_k = \nabla f_{k+1} - \nabla f_k$.
$H_k$ has nice low rank structure, and all we really need to do is multiply it times the gradient, so we can do this without explicitly storing it.
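Transcribed directly into MATLAB, one dense update of this form looks roughly like the following (variable names are mine; the point of the next two slides is to avoid forming H explicitly):

% One BFGS update of the inverse Hessian approximation H (dense form, for illustration).
% s = x_{k+1} - x_k,  y = grad f_{k+1} - grad f_k.
rho = 1 / (y' * s);
V = eye(length(s)) - rho * (y * s');
H = V' * H * V + rho * (s * s');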

  10. $H_k$ grows at each iteration
$H_k = (V_{k-1}' \cdots V_{k-m}')\, H_0\, (V_{k-m} \cdots V_{k-1})$
$\quad + \rho_{k-m} (V_{k-1}' \cdots V_{k-m+1}')\, s_{k-m} s_{k-m}'\, (V_{k-m+1} \cdots V_{k-1})$
$\quad + \rho_{k-m+1} (V_{k-1}' \cdots V_{k-m+2}')\, s_{k-m+1} s_{k-m+1}'\, (V_{k-m+2} \cdots V_{k-1})$
$\quad + \cdots + \rho_{k-1} s_{k-1} s_{k-1}'$
So we can define a recursive procedure (see Algorithm 9.1, page 225) that only requires inner products (assuming $H_0$ is diagonal). It uses only $4mn$ multiplications and requires only storage of the $s_k$, $y_k$.
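A compact MATLAB sketch of the two-loop recursion this expansion suggests, computing d = H_k * g from the m most recent pairs. The storage layout (pairs as columns of S and Y, oldest first) and the scalar scaling gamma for H0 = gamma*I are my assumptions.

% L-BFGS two-loop recursion: d = H_k * g using only the m stored pairs.
% S(:,i) = s_i and Y(:,i) = y_i, ordered oldest (i = 1) to newest (i = m).
function d = lbfgs_direction(g, S, Y, gamma)
m = size(S, 2);
alpha = zeros(m, 1);
rho = 1 ./ sum(Y .* S, 1)';                  % rho_i = 1 / (y_i' * s_i)
q = g;
for i = m:-1:1                               % first loop: newest pair to oldest
    alpha(i) = rho(i) * (S(:, i)' * q);
    q = q - alpha(i) * Y(:, i);
end
d = gamma * q;                               % apply H0 = gamma * I
for i = 1:m                                  % second loop: oldest pair to newest
    beta = rho(i) * (Y(:, i)' * d);
    d = d + (alpha(i) - beta) * S(:, i);
end
end

The search direction is then -d, combined with a Wolfe line search as noted on slide 12.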

  11. More improvements
$H_0$ can be changed at each iteration. A good choice in practice is
$H_k^0 = \gamma_k I$ with $\gamma_k = \dfrac{s_{k-1}' y_{k-1}}{y_{k-1}' y_{k-1}}$.
Limit memory: only store $s_k, y_k$ for the last $m$ iterates and base the approximation on that.

  12. Limited Memory BFGS - pros and cons
Usually the best algorithm for large problems with non-sparse Hessians. May not be best if the problem has special structure, e.g. sparsity, separable structure, nonlinear least squares. Needs a Wolfe stepsize. Relatively cheap iterates. Robust. May converge slowly on highly ill-conditioned problems.

  13. Partially Separable Structure
$f(x) = \sum_{i=1}^m f_i(x)$
Example: $f(x) = f_1(x_1, x_3) + f_2(x_1, x_4) + f_3(x_4, x_5)$

  14. Predict Drug Bioavailability
$y \in R$: aqueous solubility (Aquasol)
$x_i \in R^{525}$: 525 generated descriptors (Electronic TAE and traditional)
197 molecules with tested solubility, so $\ell = 197$.

  15. Regression with bias (1-d illustration)
$y = \langle w, x \rangle + b$, shown with $b = 2$.
[Figure: a 1-d regression line with nonzero intercept b.]

  16. Linear Regression
Given training data: $S = ((x_1, y_1), (x_2, y_2), \ldots, (x_i, y_i), \ldots, (x_\ell, y_\ell))$, points $x_i \in R^n$ and labels $y_i \in R$.
Construct a linear function: $g(x) = \langle w, x \rangle + b = w'x + b = \sum_{i=1}^n w_i x_i + b$
Goal: for future data $(x, y)$ with $y$ unknown, $g(x) \approx y$.

  17. Least Squares Approximation
Want $g(x) \approx y$. Define the error $\xi_i = (g(x_i) - y_i)^2$.
Minimize the loss $L(g, S) = \sum_{i=1}^{\ell} (g(x_i) - y_i)^2$.

  18. Linear Least Squares Loss
$L(w, b) = \sum_{i=1}^{\ell} (w'x_i + b - y_i)^2 = \|Xw + be - y\|_2^2 = (Xw + be - y)'(Xw + be - y)$  (2-norm)

  19. Optimal Solution
Want: $y \approx Xw + be$, where $e$ is a vector of ones.
Mathematical model: $\min_{w,b} L(w, b, S) = \|y - (Xw + be)\|^2 + \lambda \|w\|^2$
Optimality conditions:
$\dfrac{\partial L(w, b, S)}{\partial w} = -2X'(y - Xw - be) + 2\lambda w = 0$
$\dfrac{\partial L(w, b, S)}{\partial b} = -2e'(y - Xw - be) = 0$

  20. Optimal Solution
Thus: $e'e\,b = e'y - e'Xw \;\Rightarrow\; b = \dfrac{e'y - e'Xw}{\ell} = \mathrm{mean}(y) - \mathrm{mean}(X)'w$
Assume the data are scaled so that $\mathrm{mean}(x) = X'e/\ell = 0$. Then
$(X'X + \lambda I)w = X'y - X'e\,b \;\Rightarrow\; w = (X'X + \lambda I)^{-1} X'y$, $b = \mathrm{mean}(y)$
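A short MATLAB sketch of this closed-form solution; the function name and the explicit centering step are my framing of the derivation above (X is the l-by-n data matrix, y the l-by-1 targets).

% Ridge-regularized linear least squares via the normal equations above (illustrative).
function [w, b] = ridge_ls(X, y, lambda)
n = size(X, 2);
mx = mean(X, 1);                             % row vector of column means
Xc = X - repmat(mx, size(X, 1), 1);          % center so that mean(x) = 0
w = (Xc' * Xc + lambda * eye(n)) \ (Xc' * y);
b = mean(y) - mx * w;                        % b = mean(y) - mean(X)' * w
end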

  21. Nonlinear Least Squares
Partially separable problem:
$f(a, b, c) = \frac{1}{2} \sum_{i=1}^{\ell} f_i(a, b, c)^2$, where $f_i(a, b, c) = y_i - (a x_i^2 + b x_i + c)$
[Figure: data points with a fitted quadratic curve.]

  22. Nonlinear Least Squares
Partially separable problem:
$f(a, b, c) = \frac{1}{2} \sum_{i=1}^{\ell} f_i(a, b, c)^2$, $f_i(a, b, c) = y_i - (a x_i^2 + b x_i + c)$
$\nabla f(a, b, c) = \sum_{i=1}^{\ell} f_i(a, b, c)\, \nabla f_i(a, b, c)$
with $\nabla f_i(a, b, c) = -\begin{bmatrix} x_i^2 \\ x_i \\ 1 \end{bmatrix}$

  23. Nonlinear Least Squares
Problems of type: $\min_x f(x) = \frac{1}{2} \sum_i f_i(x)^2$
Gradient: $\sum_i f_i(x) \nabla f_i(x)$
Hessian: $\sum_i \nabla f_i(x) \nabla f_i(x)' + \sum_i f_i(x) \nabla^2 f_i(x)$
Approximate the Hessian by $\sum_i \nabla f_i(x) \nabla f_i(x)'$:
Newton → Gauss-Newton; Newton + trust region → Levenberg-Marquardt
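A minimal MATLAB Gauss-Newton sketch for the quadratic-fit example of slides 21-22; the function name and iteration count are illustrative. Because the residual here is linear in (a, b, c), a single step already solves the problem, so the loop only shows the general pattern; Levenberg-Marquardt would add a damping term lambda*eye(3) inside the solve.

% Gauss-Newton for min 0.5 * sum_i (y_i - (a*x_i^2 + b*x_i + c))^2  (illustrative sketch).
% x, y are column vectors of data; theta = [a; b; c] is the initial guess.
function theta = gauss_newton_quadfit(x, y, theta, iters)
J = -[x.^2, x, ones(size(x))];               % Jacobian of the residuals f_i
for k = 1:iters
    r = y - (theta(1)*x.^2 + theta(2)*x + theta(3));   % residuals f_i(theta)
    theta = theta - (J' * J) \ (J' * r);     % Gauss-Newton step
end
end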

  24. Matlab
Check out Matlab optimization: type bandem, help fminunc. It has all the basics.

  25. What if the gradient is not available?
Can use finite difference methods. Recall
$f'(x) = \lim_{h \to 0} \dfrac{f(x + h) - f(x)}{h}$
Approximate using a small $h$:
$f'(x) \approx \dfrac{f(x + h) - f(x)}{h}$, $\quad \dfrac{\partial f(x)}{\partial x_1} \approx \dfrac{f(x + h e_1) - f(x)}{h}$ where $e_1 = [1, 0, \ldots, 0]'$
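A small MATLAB sketch of this coordinate-wise forward-difference gradient; the function name and the default choice of h are mine.

% Forward-difference approximation of the gradient of f at the column vector x.
% f is a function handle; h is a small step size (default is an assumed common choice).
function g = fd_gradient(f, x, h)
if nargin < 3, h = sqrt(eps); end            % commonly used default step
n = numel(x);
g = zeros(n, 1);
fx = f(x);
for i = 1:n
    e = zeros(n, 1);  e(i) = 1;              % i-th unit vector
    g(i) = (f(x + h*e) - fx) / h;            % one extra evaluation per dimension
end
end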

  26. Problems
Introduces error: forward difference $\dfrac{f(x+h) - f(x)}{h}$ versus central difference $\dfrac{f(x+h) - f(x-h)}{2h}$.
The best value of $h$ is very small, on the order of the square root of machine precision for the forward difference.
Have to do this for each dimension.

  27. Automatic Differentiation
function makegradient(fcn, name)
% Creates a new matlab function (defined by gradient(fcn))
% and saves it with the specified name
% Example: makegradient('x^2+y^2','gf') creates a file gf.m:
%   function functout = gf(v)
%   x = v(1);
%   y = v(2);
%   functout = [2*x, 2*y];

  28. Automatic Differentiation
function makehessian(fcn, name)
% Creates a new matlab function (defined by hessian(fcn))
% and saves it with the specified name
% Example: makehessian('x^7+x*y^3','hf') creates a file hf.m:
%   function functout = hf(v)
%   x = v(1);
%   y = v(2);
%   functout = [[42*x^5, 3*y^2]; [3*y^2, 6*x*y]];
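The makegradient/makehessian helpers appear to be course-provided utilities. A rough equivalent of what they compute, using MATLAB's Symbolic Math Toolbox (my assumption, not the tool from the slides), would be:

% Symbolic derivatives of the example functions from slides 27-28
% (requires the Symbolic Math Toolbox; not the course-provided tool).
syms x y
gradf = jacobian(x^2 + y^2, [x, y])          % returns [2*x, 2*y]
hessf = hessian(x^7 + x*y^3, [x, y])         % returns [42*x^5, 3*y^2; 3*y^2, 6*x*y]
% matlabFunction(gradf) would turn the symbolic result into a numeric function handle.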

  29. Fminunc
First try using the finite difference approximation for the gradient. For example, on L:
X0 = [1,1]';
Options = optimset('Display','iter');
X = fminunc(@L, X0, Options)

  30. To use real gradient - Option 1
Combine f, g, and H into one Matlab file:
function [f,g,H] = matL(x)
f = L(x);
if nargout > 1
    g = gradL(x);
end
if nargout > 2
    H = hessL(x);
end
Then:
Options = optimset('GradObj','on','Display','iter');
X = fminunc(@matL, X0, Options)

  31. To use real gradient - Option 2
Options = optimset('GradObj','on','Display','iter');
X = fminunc({@L,@gradL}, X0, Options)

  32. To use Hessian
Same as the gradient, but add:
Options = optimset(Options,'Hessian','on');
X = fminunc(@matL, X0, Options)
Or:
X = fminunc({@L,@gradL,@hessL}, X0, Options)

  33. Check your gradients!
Try:
Options = optimset(Options,'DerivativeCheck','on');
X = fminunc(@L, X0, Options)
Try it on our family of functions. What happens?
