Optimization
• Unconstrained optimization: one-dimensional and multi-dimensional; Newton's method (basic Newton, Gauss-Newton), descent methods (gradient descent, conjugate gradient, quasi-Newton)
• Constrained optimization: Newton with equality constraints, active-set method, simplex method, interior-point method
Unconstrained optimization • Define an objective function over a domain: f : R^n → R • Optimization variables: x^T = (x_1, x_2, ..., x_n) • minimize f(x_1, x_2, ..., x_n), i.e. minimize f(x) for x ∈ R^n
Constraints • Equality constraints: a_i(x) = 0 for x ∈ R^n, where i = 1, ..., p • Inequality constraints: c_j(x) ≥ 0 for x ∈ R^n, where j = 1, ..., q
Constrained optimization minimize f(x), for x ∈ R^n, subject to a_i(x) = 0, where i = 1, ..., p, and c_j(x) ≥ 0, where j = 1, ..., q • Solution: x* satisfies the constraints a_i and c_j while minimizing the objective function f(x)
Formulating an optimization • The general optimization problem is very difficult to solve • Certain problem classes can be solved efficiently and reliably • Convex problems can be solved to global optimality efficiently and reliably • Nonconvex problems do not guarantee global solutions
Example: pattern matching • A pattern can be described by a set of points, P = {p_1, p_2, ..., p_n} • The same object viewed from a different distance or a different angle corresponds to a different pattern P' • Two patterns P and P' are similar if p'_i = η [cos θ  −sin θ; sin θ  cos θ] p_i + [r_1; r_2]
Example: pattern matching • Let Q = {q_1, q_2, ..., q_n} be the target pattern; find the most similar pattern among P_1, P_2, ..., P_n
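As a minimal sketch of how a candidate pattern can be scored against the target under this similarity transform (the function names and the 2D NumPy layout are illustrative assumptions, not part of the original slides):

```python
import numpy as np

def similarity_transform(P, eta, theta, r):
    """Apply p'_i = eta * R(theta) * p_i + r to every point of a 2D pattern P (n x 2)."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return eta * (P @ R.T) + r

def match_error(P, Q, eta, theta, r):
    """Sum of squared distances between the transformed pattern and the target Q."""
    return np.sum((similarity_transform(P, eta, theta, r) - Q) ** 2)
```

Finding the best η, θ, r_1, r_2 for each candidate P_k is itself a small unconstrained optimization over match_error.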
Inverse kinematics: from a set of 3D marker positions, find a pose described by joint angles
Optimal motion trajectories
Quiz An object starts at 0 and must arrive at d with velocity = 0; the maximal force allowed is F. What trajectory minimizes time? What trajectory minimizes energy?
• Unconstrained optimization • Newton method • Gauss-Newton method • Gradient descent method • Conjugate gradient method
Newton's method Find the roots of a nonlinear function: C(x) = 0. We can linearize the function as C(x̄) = C(x) + C′(x)(x̄ − x) = 0, where C′(x) = ∂C/∂x. Then we can estimate the root as x̄ = x − C(x) / C′(x)
Root estimation C(x^(1)) = C(x^(0)) + C′(x^(0))(x^(1) − x^(0)) [Figure: successive Newton estimates x^(0), x^(1), x^(2) approaching a root of C(x)]
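A minimal sketch of the update x̄ = x − C(x)/C′(x); the function names, tolerance, and the quadratic example are placeholder assumptions:

```python
def newton_root(C, dC, x0, tol=1e-8, max_iter=50):
    """Find a root of C(x) = 0 by repeated linearization (Newton's method)."""
    x = x0
    for _ in range(max_iter):
        step = C(x) / dC(x)      # x_bar = x - C(x) / C'(x)
        x = x - step
        if abs(step) < tol:      # stop when the update becomes negligible
            break
    return x

# Example: root of C(x) = x^2 - 2, starting from x0 = 1.5
root = newton_root(lambda x: x * x - 2, lambda x: 2 * x, 1.5)
```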
Minimization Find x* such that the nonlinear function F(x*) is a minimum. What is the simplest function that has minima? A quadratic, so approximate F locally by its second-order expansion: F(x^(k) + δ) = F(x^(k)) + F′(x^(k)) δ + (1/2) F′′(x^(k)) δ². Finding the minima of F(x) amounts to finding the roots of F′(x): setting ∂F(x^(k) + δ)/∂δ = 0 gives δ = −F′(x^(k)) / F′′(x^(k))
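Since minimizing F means finding a root of F′, the same Newton update applies with F′ in place of C; a minimal sketch, with F′ and F′′ as placeholder callables:

```python
def newton_minimize_1d(dF, d2F, x0, tol=1e-8, max_iter=50):
    """Minimize F(x) by applying Newton's method to F'(x) = 0."""
    x = x0
    for _ in range(max_iter):
        delta = -dF(x) / d2F(x)   # delta = -F'(x) / F''(x)
        x = x + delta
        if abs(delta) < tol:
            break
    return x
```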
Conditions • What are the conditions for minima to exist? • Necessary conditions for a local minimum at x*: F′(x*) = 0 and F′′(x*) ≥ 0 • Sufficient conditions for an isolated minimum at x*: F′(x*) = 0 and F′′(x*) > 0
Minimization [Figure: F(x) and F′(x) near an isolated minimum x*, where F′(x*) = 0 and F′′(x*) > 0]
Multidimensional optimization • Search methods only need function evaluations • First-order gradient-based methods use the gradient g • Second-order gradient-based methods use both the gradient g and the Hessian H
Multiple variables F(x^(k) + p) = F(x^(k)) + g^T(x^(k)) p + (1/2) p^T H(x^(k)) p, where the gradient vector is g(x) = ∇_x F = [∂F/∂x_1, ..., ∂F/∂x_n]^T and the Hessian matrix is H(x) = ∇²_xx F, with entries [H(x)]_ij = ∂²F / (∂x_i ∂x_j)
Multiple variables 0 = g(x^(k)) + H(x^(k)) p, so p = −H(x^(k))^(-1) g(x^(k)) and x^(k+1) = x^(k) + p
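A minimal sketch of one multi-dimensional Newton iteration; grad and hess are assumed user-supplied callables, and solving H p = −g avoids forming the inverse explicitly:

```python
import numpy as np

def newton_step(grad, hess, x):
    """One Newton iteration: solve H p = -g instead of computing H^-1."""
    g = grad(x)
    H = hess(x)
    p = np.linalg.solve(H, -g)   # p = -H^{-1} g
    return x + p
```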
Multiple variables Necessary conditions: g(x*) = 0 and p^T H* p ≥ 0 (H is positive semi-definite). Sufficient conditions: g(x*) = 0 and p^T H* p > 0 (H is positive definite)
Gauss-Newton method • What if the objective function is in the form of a vector of functions? f = [f_1(x) f_2(x) ··· f_m(x)]^T • The real-valued objective can be formed as F = Σ_{p=1}^{m} f_p(x)² = f^T f
Jacobian • Each f_p(x) depends on x_i for i = 1, 2, ..., n, so a matrix of gradients, the Jacobian J with J_pi = ∂f_p/∂x_i, can be formed • The Jacobian need not be a square matrix
Gradient and Hessian • Gradient of the objective function: ∂F/∂x_i = 2 Σ_{p=1}^{m} f_p(x) ∂f_p/∂x_i, i.e. g_F = 2 J^T f • Hessian of the objective function: ∂²F/(∂x_i ∂x_j) = 2 Σ_{p=1}^{m} (∂f_p/∂x_i)(∂f_p/∂x_j) + 2 Σ_{p=1}^{m} f_p(x) ∂²f_p/(∂x_i ∂x_j); dropping the second-derivative term gives H_F ≈ 2 J^T J
Gauss-Newton algorithm • In the k-th iteration, compute f_p(x_k) and J_k to obtain new g_k and H_k • Compute p_k = −(2 J^T J)^(-1) (2 J^T f) = −(J^T J)^(-1) (J^T f) • Find α_k that minimizes F(x_k + α_k p_k) • Set x_{k+1} = x_k + α_k p_k
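A minimal Gauss-Newton sketch; residuals and jacobian are placeholder callables, and the crude halving loop stands in for the line search that picks α_k:

```python
import numpy as np

def gauss_newton(residuals, jacobian, x0, max_iter=20, tol=1e-8):
    """Minimize F(x) = f(x)^T f(x) using the approximation H ~ 2 J^T J."""
    x = x0
    for _ in range(max_iter):
        f = residuals(x)                        # f(x), shape (m,)
        J = jacobian(x)                         # J, shape (m, n)
        p = np.linalg.solve(J.T @ J, -J.T @ f)  # p = -(J^T J)^{-1} J^T f
        alpha, F0 = 1.0, float(f @ f)
        while alpha > 1e-6:                     # halve alpha until F decreases
            f_new = residuals(x + alpha * p)
            if f_new @ f_new < F0:
                break
            alpha *= 0.5
        x = x + alpha * p
        if np.linalg.norm(alpha * p) < tol:
            break
    return x
```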
• First-order gradient methods • Greatest gradient descent • Conjugate gradient
Solving a large linear system Ax = b • A: a known, square, symmetric, positive semi-definite matrix • b: a known vector • x: an unknown vector. If A is dense, solve with factorization and back substitution. If A is sparse, solve with iterative methods (descent methods)
Quadratic form F(x) = (1/2) x^T A x − b^T x + c. The gradient of F(x) is F′(x) = (1/2) A^T x + (1/2) A x − b. If A is symmetric, F′(x) = A x − b, and F′(x) = 0 gives A x = b: the critical point of F is also the solution to Ax = b. If A is not symmetric, what is the linear system solved by finding the critical points of F?
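A quick numerical check of the symmetric case, F′(x) = Ax − b, on a small example (the particular A, b, and test point are my own illustration):

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])   # symmetric positive-definite example
b = np.array([2.0, -8.0])

def F(x):
    return 0.5 * x @ A @ x - b @ x

x = np.array([1.0, 1.0])
eps = 1e-6
num_grad = np.array([(F(x + eps * e) - F(x - eps * e)) / (2 * eps) for e in np.eye(2)])
print(np.allclose(num_grad, A @ x - b))   # expect True
```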
Greatest gradient descent Start at an arbitrary point x^(0) and slide down to the bottom of the paraboloid. Take a series of steps x^(1), x^(2), ... until we are satisfied that we are close enough to the solution x*. Take each step along the direction in which F decreases most quickly: −F′(x^(k)) = b − A x^(k)
Greatest gradient descent Important definitions: error e^(k) = x^(k) − x*, residual r^(k) = b − A x^(k) = −F′(x^(k)) = −A e^(k). Think of the residual as the direction of greatest descent
Line search x^(1) = x^(0) + α r^(0). But how big a step should we take? A line search is a procedure that chooses α to minimize F along a line
Line search [Figure: F(x) restricted to the search line, and the value of F along that line as a function of α]
Optimal step size (d/dα) F(x^(1)) = F′(x^(1))^T (d/dα) x^(1) = F′(x^(1))^T r^(0) = 0. Since F′(x^(1)) = −r^(1), this requires r^T_(0) r^(1) = 0
Optimal step size Exercise: derive α from r^T_(k) r^(k+1) = 0. Hint: replace the terms involving (k+1) with those involving (k) using x^(k+1) = x^(k) + α r^(k). Answer: α = (r^T_(k) r_(k)) / (r^T_(k) A r_(k))
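One way to carry out the exercise, using only the definitions above:
r^(k+1) = b − A x^(k+1) = b − A(x^(k) + α r^(k)) = r^(k) − α A r^(k)
0 = r^T_(k) r^(k+1) = r^T_(k) r^(k) − α r^T_(k) A r^(k)
⇒ α = (r^T_(k) r_(k)) / (r^T_(k) A r_(k))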
Recurrence of the residual 1. r^(k) = b − A x^(k) 2. α = (r^T_(k) r_(k)) / (r^T_(k) A r_(k)) 3. x^(k+1) = x^(k) + α r^(k). The algorithm requires two matrix-vector multiplications per iteration. One multiplication can be eliminated by replacing step 1 with r^(k+1) = r^(k) − α A r^(k)
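Putting the three steps and the residual recurrence together; a minimal sketch assuming A is symmetric positive-definite so the step size is well defined:

```python
import numpy as np

def greatest_gradient_descent(A, b, x0, tol=1e-8, max_iter=1000):
    """Solve Ax = b by greatest gradient descent with the residual recurrence,
    so only one matrix-vector product is needed per iteration."""
    x = x0.copy()
    r = b - A @ x                    # initial residual
    for _ in range(max_iter):
        Ar = A @ r                   # the single matrix-vector product
        alpha = (r @ r) / (r @ Ar)   # optimal step size along r
        x = x + alpha * r
        r = r - alpha * Ar           # r_{k+1} = r_k - alpha * A r_k
        if np.linalg.norm(r) < tol:
            break
    return x
```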
Quiz • In our IK problem, we use the greatest gradient descent method to find an optimal pose, but we can't compute α using the formula described in the previous slides. Why not?
Line search • Exact line search: choose t to minimize f along the ray {x + t∆x | t ≥ 0}: t = argmin_{s ≥ 0} f(x + s∆x) • Backtracking line search depends on two constants α and β: given a descent direction ∆x for f at x ∈ dom f, α ∈ (0, 0.5), β ∈ (0, 1), set t := 1 and while f(x + t∆x) > f(x) + α t ∇f(x)^T ∆x, set t := β t
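A minimal sketch of the backtracking loop; the particular α = 0.3 and β = 0.8 are example values within the stated ranges:

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, dx, alpha=0.3, beta=0.8):
    """Shrink t by beta until the sufficient-decrease condition holds."""
    t = 1.0
    fx = f(x)
    slope = grad_f(x) @ dx   # directional derivative along the descent direction dx
    while f(x + t * dx) > fx + alpha * t * slope:
        t *= beta
    return t
```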
Poor convergence What is the problem with greatest descent? Wouldn't it be nice if we could avoid traversing the same direction repeatedly?
Conjugate directions Pick a set of directions d^(0), d^(1), ..., d^(n−1), take exactly one step along each direction, and the solution is found within n steps. Two problems: 1. How do we determine these directions? 2. How do we determine the step size along each direction?
A-orthogonality If we take the optimal step size along each direction: (d/dα) F(x^(k+1)) = 0, so F′(x^(k+1))^T (d/dα) x^(k+1) = 0, i.e. −r^T_(k+1) d_(k) = 0, i.e. d^T_(k) A e^(k+1) = 0. Two different vectors v and u are A-orthogonal, or conjugate, if v^T A u = 0
A-orthogonality [Figure: a pair of vectors that are A-orthogonal versus a pair that are orthogonal]
Optimal step size d_(k) must be A-orthogonal to e^(k+1). Using this condition, can you derive α_(k)?
Algorithm Suppose we can come up with a set of A-orthogonal directions {d^(k)}; this algorithm will converge in n steps: 1. Take d^(k) 2. α_(k) = (d^T_(k) r_(k)) / (d^T_(k) A d_(k)) 3. x^(k+1) = x^(k) + α_(k) d^(k)
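The slides have not yet said how the A-orthogonal directions are produced; one standard way is the conjugate-gradient update, which builds each new direction from the current residual. A minimal sketch under that assumption:

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-8):
    """Solve Ax = b using A-orthogonal search directions generated by the
    standard conjugate-gradient update (one common realization of the algorithm)."""
    x = x0.copy()
    r = b - A @ x
    d = r.copy()                    # first direction: the residual
    for _ in range(len(b)):         # at most n steps in exact arithmetic
        Ad = A @ d
        alpha = (d @ r) / (d @ Ad)  # optimal step along d
        x = x + alpha * d
        r_new = r - alpha * Ad
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d        # new direction, A-orthogonal to the previous ones
        r = r_new
    return x
```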