Optimization: Unconstrained and Constrained Optimization - PowerPoint PPT Presentation

  1. Optimization

  2. Overview. Unconstrained optimization: Newton's method (basic Newton, Gauss-Newton, quasi-Newton) and descent methods (gradient descent, conjugate gradient). Constrained optimization: Newton with equality constraints (one-dimensional and multi-dimensional), active-set method, simplex method, interior-point method.

  3. Unconstrained optimization • Define an objective function over a domain: f : R^n → R • Optimization variables: x^T = (x_1, x_2, ..., x_n) • minimize f(x_1, x_2, ..., x_n), i.e. minimize f(x) for x ∈ R^n

  4. Constraints • Equality constraints: a_i(x) = 0 for x ∈ R^n, where i = 1, ..., p • Inequality constraints: c_j(x) ≥ 0 for x ∈ R^n, where j = 1, ..., q

  5. Constrained optimization minimize f(x), for x ∈ R^n, subject to a_i(x) = 0, where i = 1, ..., p, and c_j(x) ≥ 0, where j = 1, ..., q • Solution: x* satisfies the constraints a_i and c_j while minimizing the objective function f(x)
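
As a concrete illustration of this standard form, here is a minimal sketch using SciPy's `minimize` (assuming SciPy is acceptable here; the specific objective and constraints are made-up examples, not taken from the slides). Note that SciPy's `'ineq'` constraints follow the same convention c_j(x) ≥ 0.

```python
# A minimal sketch with SciPy; the objective and constraints below are
# made-up examples, not from the slides.
from scipy.optimize import minimize

def f(x):                                                  # f(x) = x1^2 + x2^2
    return x[0] ** 2 + x[1] ** 2

eq = {"type": "eq", "fun": lambda x: x[0] + x[1] - 1.0}    # a(x) = 0
ineq = {"type": "ineq", "fun": lambda x: x[0]}             # c(x) >= 0

res = minimize(f, x0=[2.0, 2.0], method="SLSQP", constraints=[eq, ineq])
print(res.x)   # approximately [0.5, 0.5]
```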

  6. Formulating an optimization problem • The general optimization problem is very difficult to solve • Certain problem classes can be solved efficiently and reliably • Convex problems can be solved to global optimality efficiently and reliably • Nonconvex problems do not guarantee global solutions

  7. Example: pattern matching • A pattern can be described by a set of points, P = {p_1, p_2, ..., p_n} • The same object viewed from a different distance or a different angle corresponds to a different pattern P′ • Two patterns P and P′ are similar if p′_i = η R(θ) p_i + r, where R(θ) = [[cos θ, −sin θ], [sin θ, cos θ]] is a rotation, η a scale, and r = (r_1, r_2)^T a translation

  8. Example: pattern matching • Let Q = {q_1, q_2, ..., q_n} be the target pattern; find the most similar pattern among P_1, P_2, ..., P_n
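
A hedged sketch of how this matching could be scored in code: for each candidate P, fit the scale η, rotation θ, and translation (r_1, r_2) by minimizing the sum of squared distances to Q, and pick the candidate with the smallest residual. The solver choice and the helper name `dissimilarity` are assumptions, not something prescribed by the slides.

```python
# Hedged sketch: score how well a scaled rotation plus translation maps a
# candidate pattern P onto the target Q (both n x 2 NumPy arrays).
import numpy as np
from scipy.optimize import minimize

def dissimilarity(P, Q):
    def cost(params):
        eta, theta, r1, r2 = params
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        mapped = eta * P @ R.T + np.array([r1, r2])   # p'_i = eta * R p_i + r
        return np.sum((mapped - Q) ** 2)
    return minimize(cost, x0=[1.0, 0.0, 0.0, 0.0]).fun   # smaller = more similar
```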

  9. Inverse kinematics: mapping a set of 3D marker positions to a pose described by joint angles

  10. Optimal motion trajectories

  11. Quiz Start at 0 and arrive at d with velocity = 0; the maximal force allowed is F. What trajectory minimizes time? What trajectory minimizes energy?

  12. • Unconstrained optimization • Newton method • Gauss-Newton method • Gradient descent method • Conjugate gradient method

  13. Newton method Find the roots of a nonlinear function: C(x) = 0. We can linearize the function as C(x̄) = C(x) + C′(x)(x̄ − x) = 0, where C′(x) = ∂C/∂x. Then we can estimate the root as x̄ = x − C(x)/C′(x)
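
A minimal sketch of this iteration x̄ = x − C(x)/C′(x) in code; the example function C(x) = x² − 2 (root at √2) is made up for illustration.

```python
# Sketch of the Newton root-finding iteration x <- x - C(x)/C'(x).
def newton_root(C, C_prime, x0, tol=1e-10, max_iter=50):
    x = x0
    for _ in range(max_iter):
        step = C(x) / C_prime(x)    # solve the linearized equation for the root
        x -= step
        if abs(step) < tol:
            break
    return x

print(newton_root(lambda x: x ** 2 - 2, lambda x: 2 * x, x0=1.0))   # ~1.414214
```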

  14. Root estimation C(x_(1)) = C(x_(0)) + C′(x_(0))(x_(1) − x_(0)) [figure: successive Newton iterates x_(0), x_(1), x_(2) on the curve C(x)]

  15. Minimization Find x* such that the nonlinear function F(x*) is a minimum. What is the simplest function that has minima? A quadratic: F(x_(k) + δ) = F(x_(k)) + F′(x_(k)) δ + (1/2) F′′(x_(k)) δ². Finding the minima of F(x) means finding the roots of F′(x): setting ∂F(x_(k) + δ)/∂δ = 0 gives δ = −F′(x)/F′′(x)
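
Applying this update repeatedly gives a simple 1-D Newton minimizer, sketched below; the derivative callables are assumed to be supplied by the caller, and the result should be checked against the second-order condition F′′(x*) > 0 from the next slide.

```python
# Sketch of 1-D Newton minimization: repeat the step delta = -F'(x)/F''(x).
# F_prime and F_double_prime are placeholder callables supplied by the caller.
def newton_minimize(F_prime, F_double_prime, x0, tol=1e-10, max_iter=50):
    x = x0
    for _ in range(max_iter):
        delta = -F_prime(x) / F_double_prime(x)   # minimizer of the local quadratic
        x += delta
        if abs(delta) < tol:
            break
    return x                                      # check F''(x) > 0 at the result
```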

  16. Conditions • What are the conditions for minima to exist? • Necessary conditions for a local minimum at x*: F′(x*) = 0 and F′′(x*) ≥ 0 • Sufficient conditions for an isolated minimum at x*: F′(x*) = 0 and F′′(x*) > 0

  17. Minimization [figure: F(x) and F′(x) versus x, with F′′(x*) > 0 at the minimum x*]

  18. Multidimensional optimization • Search methods only need function evaluations • First-order gradient-based methods use the gradient g • Second-order gradient-based methods use both the gradient g and the Hessian H

  19. Multiple variables F(x_(k) + p) = F(x_(k)) + g^T(x_(k)) p + (1/2) p^T H(x_(k)) p • Gradient vector: g(x) = ∇_x F = [∂F/∂x_1, ..., ∂F/∂x_n]^T • Hessian matrix: H(x) = ∇²_xx F, with entries [H]_ij = ∂²F/(∂x_i ∂x_j)

  20. Multiple variables 0 = g(x_(k)) + H(x_(k)) p, so p = −H(x_(k))⁻¹ g(x_(k)) and x_(k+1) = x_(k) + p
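
A sketch of one such Newton update in code; `grad` and `hess` are placeholder callables for g(x) and H(x), and the step solves H p = −g rather than forming the inverse explicitly.

```python
# Sketch of one multivariate Newton update p = -H(x)^{-1} g(x).
import numpy as np

def newton_step(grad, hess, x):
    g = grad(x)
    H = hess(x)
    p = np.linalg.solve(H, -g)   # solve H p = -g instead of inverting H
    return x + p                 # x_(k+1) = x_(k) + p
```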

  21. Multiple variables • Necessary conditions: g(x*) = 0 and p^T H* p ≥ 0 (H is positive semi-definite) • Sufficient conditions: g(x*) = 0 and p^T H* p > 0 (H is positive definite)

  22. Gauss-Newton method • What if the objective function is in the form of a vector of functions? f = [f_1(x) f_2(x) ··· f_m(x)]^T • The real-valued objective can be formed as F = Σ_{p=1}^{m} f_p(x)² = f^T f

  23. Jacobian • Each f_p(x) depends on x_i for i = 1, 2, ..., n, so a gradient matrix (the Jacobian J, with entries ∂f_p/∂x_i) can be formed • The Jacobian need not be a square matrix

  24. Gradient and Hessian • Gradient of the objective function: ∂F/∂x_i = 2 Σ_{p=1}^{m} f_p(x) ∂f_p/∂x_i, i.e. g_F = 2 J^T f • Hessian of the objective function: ∂²F/(∂x_i ∂x_j) = 2 Σ_{p=1}^{m} ∂f_p/∂x_i ∂f_p/∂x_j + 2 Σ_{p=1}^{m} f_p(x) ∂²f_p/(∂x_i ∂x_j), so H_F ≈ 2 J^T J (dropping the second-derivative term)

  25. Gauss-Newton algorithm • In the k-th iteration, compute f_p(x_k) and J_k to obtain the new g_k and H_k • Compute p_k = −(2 J^T J)⁻¹ (2 J^T f) = −(J^T J)⁻¹ (J^T f) • Find α_k that minimizes F(x_k + α_k p_k) • Set x_(k+1) = x_k + α_k p_k
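
A sketch of the iteration above, with the line search for α_k replaced by a fixed unit step for brevity (so this is a simplification of the algorithm on the slide); `residuals` and `jacobian` are placeholder callables returning f(x) and J(x).

```python
# Sketch of Gauss-Newton with a fixed unit step in place of the line search.
import numpy as np

def gauss_newton(residuals, jacobian, x0, tol=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        f = residuals(x)
        J = jacobian(x)
        p = np.linalg.solve(J.T @ J, -J.T @ f)   # p_k = -(J^T J)^{-1} J^T f
        x = x + p                                # alpha_k = 1 here (simplification)
        if np.linalg.norm(p) < tol:
            break
    return x
```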

  26. • First-order gradient methods • Greatest gradient descent • Conjugate gradient

  27. Solving a large linear system Ax = b, where A is a known, square, symmetric, positive semi-definite matrix, b is a known vector, and x is an unknown vector • If A is dense, solve with factorization and back substitution • If A is sparse, solve with iterative methods (descent methods)
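
In code, the two options might look like the sketch below (assuming NumPy and SciPy are available; the small symmetric positive definite A is a made-up example).

```python
# Dense: factorization + back substitution. Sparse: an iterative solver.
import numpy as np
from scipy.sparse.linalg import cg

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x_direct = np.linalg.solve(A, b)   # dense route
x_iter, info = cg(A, b)            # iterative route; also accepts sparse A
```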

  28. Quadratic form F(x) = (1/2) x^T A x − b^T x + c. The gradient of F(x) is F′(x) = (1/2) A^T x + (1/2) A x − b. If A is symmetric, F′(x) = Ax − b. Setting F′(x) = 0 gives Ax = b, so the critical point of F is also the solution to Ax = b. If A is not symmetric, what is the linear system solved by finding the critical points of F?

  29. Greatest gradient descent Start at an arbitrary point x_(0) and slide down to the bottom of the paraboloid. Take a series of steps x_(1), x_(2), ... until we are satisfied that we are close enough to the solution x*. Take each step along the direction in which F decreases most quickly: −F′(x_(k)) = b − A x_(k)

  30. Greatest gradient descent Important definitions: error e_(k) = x_(k) − x*, residual r_(k) = b − A x_(k) = −F′(x_(k)) = −A e_(k). Think of the residual as the direction of greatest descent

  31. Line search x_(1) = x_(0) + α r_(0). But how big a step should we take? A line search is a procedure that chooses α to minimize F along a line

  32. Line search [figure: contour plot of F and the value of F along the search line]

  33. Optimal step size d/dα F(x_(1)) = F′(x_(1))^T (d/dα) x_(1) = F′(x_(1))^T r_(0) = 0, i.e. r_(0)^T r_(1) = 0 (the new gradient is orthogonal to the search direction)

  34. Optimal step size Exercise: derive α from r_(k)^T r_(k+1) = 0. Hint: replace the terms involving (k+1) with those involving (k) using x_(k+1) = x_(k) + α r_(k). Answer: α = r_(k)^T r_(k) / (r_(k)^T A r_(k))
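
Working the exercise out with the hint gives the stated answer:

```latex
r_{(k+1)} = b - A x_{(k+1)} = b - A\bigl(x_{(k)} + \alpha r_{(k)}\bigr) = r_{(k)} - \alpha A r_{(k)}
\quad\Longrightarrow\quad
0 = r_{(k)}^{T} r_{(k+1)} = r_{(k)}^{T} r_{(k)} - \alpha\, r_{(k)}^{T} A r_{(k)}
\quad\Longrightarrow\quad
\alpha = \frac{r_{(k)}^{T} r_{(k)}}{r_{(k)}^{T} A r_{(k)}}
```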

  35. Recurrence of the residual 1. r_(k) = b − A x_(k) 2. α = r_(k)^T r_(k) / (r_(k)^T A r_(k)) 3. x_(k+1) = x_(k) + α r_(k) • The algorithm requires two matrix-vector multiplications per iteration • One multiplication can be eliminated by replacing step 1 with r_(k+1) = r_(k) − α A r_(k)
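
A sketch of the resulting method, including the single matrix-vector product per iteration:

```python
# Sketch of greatest (steepest) gradient descent for Ax = b with one
# matrix-vector product per iteration via the residual recurrence.
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    r = b - A @ x
    for _ in range(max_iter):
        Ar = A @ r                      # the single matrix-vector product
        alpha = (r @ r) / (r @ Ar)
        x = x + alpha * r
        r = r - alpha * Ar              # recurrence instead of r = b - A x
        if np.linalg.norm(r) < tol:
            break
    return x
```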

  36. Quiz • In our IK problem, we use the greatest gradient descent method to find an optimal pose, but we cannot compute α using the formula described in the previous slides. Why?

  37. Line search • Exact line search: choose t to minimize f along the ray {x + t Δx | t ≥ 0}: t = argmin_{s ≥ 0} f(x + s Δx) • Backtracking line search depends on two constants α and β: given a descent direction Δx for f at x ∈ dom f, with α ∈ (0, 0.5) and β ∈ (0, 1), set t := 1, then while f(x + t Δx) > f(x) + α t ∇f(x)^T Δx, update t := β t
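
A direct transcription of the backtracking loop into Python; x and Δx are assumed to be NumPy arrays, `grad_f` is a placeholder for ∇f, and the constants 0.3 and 0.8 are arbitrary choices within the stated ranges.

```python
# Sketch of backtracking line search with alpha in (0, 0.5), beta in (0, 1).
def backtracking_line_search(f, grad_f, x, dx, alpha=0.3, beta=0.8):
    t = 1.0
    while f(x + t * dx) > f(x) + alpha * t * (grad_f(x) @ dx):
        t *= beta
    return t
```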

  38. Poor convergence What is the problem with greatest gradient descent? Wouldn't it be nice if we could avoid traversing the same direction repeatedly?

  39. Conjugate directions Pick a set of directions d_(0), d_(1), ..., d_(n−1) and take exactly one step along each direction; the solution is found within n steps. Two problems: 1. How do we determine these directions? 2. How do we determine the step size along each direction?

  40. A-orthogonality If we take the optimal step size along each direction: d/dα F(x_(k+1)) = 0, so F′(x_(k+1))^T (d/dα) x_(k+1) = 0, i.e. −r_(k+1)^T d_(k) = 0, which gives d_(k)^T A e_(k+1) = 0. Two different vectors v and u are A-orthogonal, or conjugate, if v^T A u = 0

  41. A-orthogonality [figure: one pair of vectors that are A-orthogonal and one pair that are orthogonal]

  42. Optimal step size d_(k) must be A-orthogonal to e_(k+1). Using this condition, can you derive α_(k)?

  43. Algorithm Suppose we can come up with a set of A-orthogonal directions {d_(k)}; this algorithm will converge in n steps: 1. Take direction d_(k) 2. α_(k) = −d_(k)^T A e_(k) / (d_(k)^T A d_(k)) = d_(k)^T r_(k) / (d_(k)^T A d_(k)) 3. x_(k+1) = x_(k) + α_(k) d_(k)
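
A sketch of this conjugate-directions iteration, assuming the A-orthogonal directions are already available; building them on the fly, as the full conjugate gradient method does, goes beyond what the slide shows.

```python
# Sketch of the conjugate-directions iteration with given A-orthogonal directions.
import numpy as np

def conjugate_directions(A, b, x0, directions):
    x = np.asarray(x0, dtype=float)
    for d in directions:                     # exactly one step per direction
        r = b - A @ x                        # current residual
        alpha = (d @ r) / (d @ (A @ d))      # optimal step size along d
        x = x + alpha * d
    return x
```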
