Optimization
• Unconstrained optimization: one-dimensional and multi-dimensional; Newton's method (basic Newton, Gauss-Newton), descent methods (gradient descent, conjugate gradient, quasi-Newton)
• Constrained optimization: Newton with equality constraints, active-set method, simplex method, interior-point method
Unconstrained optimization • Define an objective function over a domain: f : R^n → R • Optimization variables: x^T = (x_1, x_2, ..., x_n) • minimize f(x_1, x_2, ..., x_n), i.e. minimize f(x) for x ∈ R^n
Constraints • Equality constraints: a_i(x) = 0 for x ∈ R^n, where i = 1, ..., p • Inequality constraints: c_j(x) ≥ 0 for x ∈ R^n, where j = 1, ..., q
Constrained optimization minimize f(x), for x ∈ R^n, subject to a_i(x) = 0, where i = 1, ..., p, and c_j(x) ≥ 0, where j = 1, ..., q • Solution: x* satisfies the constraints a_i and c_j while minimizing the objective function f(x)
Formulating an optimization • The general optimization problem is very difficult to solve • Certain problem classes can be solved efficiently and reliably • Convex problems can be solved to global optimality efficiently and reliably • Nonconvex problems do not guarantee global solutions
Example: pattern matching • A pattern can be described by a set of points, P = {p_1, p_2, ..., p_n} • The same object viewed from a different distance or a different angle corresponds to a different pattern P' • Two patterns P and P' are similar if p'_i = η [cos θ  −sin θ; sin θ  cos θ] p_i + [r_1; r_2]
Example: pattern matching • Let Q = {q_1, q_2, ..., q_n} be the target pattern; find the most similar pattern among P_1, P_2, ..., P_n
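As a minimal sketch of how a candidate pattern can be scored against the target under this similarity transform (the function names and the 2D NumPy layout are illustrative assumptions, not part of the original slides):

```python
import numpy as np

def similarity_transform(P, eta, theta, r):
    """Apply p'_i = eta * R(theta) * p_i + r to every point of a 2D pattern P (n x 2)."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return eta * (P @ R.T) + r

def match_error(P, Q, eta, theta, r):
    """Sum of squared distances between the transformed pattern and the target Q."""
    return np.sum((similarity_transform(P, eta, theta, r) - Q) ** 2)
```

Finding the best η, θ, r_1, r_2 for each candidate P_k is itself a small unconstrained optimization over match_error.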
Inverse kinematics: from a set of 3D marker positions, find a pose described by joint angles
Optimal motion trajectories
Quiz An object starts at 0 and must arrive at d with velocity = 0; the maximal force allowed is F. What trajectory minimizes time? What trajectory minimizes energy?
• Unconstrained optimization • Newton method • Gauss-Newton method • Gradient descent method • Conjugate gradient method
Newton's method Find the roots of a nonlinear function: C(x) = 0. We can linearize the function as C(x̄) = C(x) + C′(x)(x̄ − x) = 0, where C′(x) = ∂C/∂x. Then we can estimate the root as x̄ = x − C(x) / C′(x)
Root estimation C(x^(1)) = C(x^(0)) + C′(x^(0))(x^(1) − x^(0)) [Figure: successive Newton estimates x^(0), x^(1), x^(2) approaching a root of C(x)]
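A minimal sketch of the update x̄ = x − C(x)/C′(x); the function names, tolerance, and the quadratic example are placeholder assumptions:

```python
def newton_root(C, dC, x0, tol=1e-8, max_iter=50):
    """Find a root of C(x) = 0 by repeated linearization (Newton's method)."""
    x = x0
    for _ in range(max_iter):
        step = C(x) / dC(x)      # x_bar = x - C(x) / C'(x)
        x = x - step
        if abs(step) < tol:      # stop when the update becomes negligible
            break
    return x

# Example: root of C(x) = x^2 - 2, starting from x0 = 1.5
root = newton_root(lambda x: x * x - 2, lambda x: 2 * x, 1.5)
```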
Minimization Find x* such that the nonlinear function F(x*) is a minimum. What is the simplest function that has minima? A quadratic, so approximate F locally by its second-order expansion: F(x^(k) + δ) = F(x^(k)) + F′(x^(k)) δ + (1/2) F′′(x^(k)) δ². Finding the minima of F(x) amounts to finding the roots of F′(x): setting ∂F(x^(k) + δ)/∂δ = 0 gives δ = −F′(x^(k)) / F′′(x^(k))
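Since minimizing F means finding a root of F′, the same Newton update applies with F′ in place of C; a minimal sketch, with F′ and F′′ as placeholder callables:

```python
def newton_minimize_1d(dF, d2F, x0, tol=1e-8, max_iter=50):
    """Minimize F(x) by applying Newton's method to F'(x) = 0."""
    x = x0
    for _ in range(max_iter):
        delta = -dF(x) / d2F(x)   # delta = -F'(x) / F''(x)
        x = x + delta
        if abs(delta) < tol:
            break
    return x
```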
Conditions • What are the conditions for minima to exist? • Necessary conditions for a local minimum at x*: F′(x*) = 0 and F′′(x*) ≥ 0 • Sufficient conditions for an isolated minimum at x*: F′(x*) = 0 and F′′(x*) > 0
Minimization [Figure: F(x) and F′(x) near an isolated minimum x*, where F′(x*) = 0 and F′′(x*) > 0]
Multidimensional optimization • Search methods only need function evaluations • First-order gradient-based methods use the gradient g • Second-order gradient-based methods use both the gradient g and the Hessian H
Multiple variables F(x^(k) + p) = F(x^(k)) + g^T(x^(k)) p + (1/2) p^T H(x^(k)) p, where the gradient vector is g(x) = ∇_x F = [∂F/∂x_1, ..., ∂F/∂x_n]^T and the Hessian matrix is H(x) = ∇²_xx F, with entries [H(x)]_ij = ∂²F / (∂x_i ∂x_j)
Multiple variables 0 = g(x^(k)) + H(x^(k)) p, so p = −H(x^(k))^(-1) g(x^(k)) and x^(k+1) = x^(k) + p
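A minimal sketch of one multi-dimensional Newton iteration; grad and hess are assumed user-supplied callables, and solving H p = −g avoids forming the inverse explicitly:

```python
import numpy as np

def newton_step(grad, hess, x):
    """One Newton iteration: solve H p = -g instead of computing H^-1."""
    g = grad(x)
    H = hess(x)
    p = np.linalg.solve(H, -g)   # p = -H^{-1} g
    return x + p
```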
Multiple variables Necessary conditions: g(x*) = 0 and p^T H* p ≥ 0 (H is positive semi-definite). Sufficient conditions: g(x*) = 0 and p^T H* p > 0 (H is positive definite)
Gauss-Newton method • What if the objective function is in the form of a vector of functions? f = [f_1(x) f_2(x) ··· f_m(x)]^T • The real-valued objective can be formed as F = Σ_{p=1}^{m} f_p(x)² = f^T f
Jacobian • Each f_p(x) depends on x_i for i = 1, 2, ..., n, so a matrix of gradients, the Jacobian J with J_pi = ∂f_p/∂x_i, can be formed • The Jacobian need not be a square matrix
Gradient and Hessian • Gradient of the objective function: ∂F/∂x_i = 2 Σ_{p=1}^{m} f_p(x) ∂f_p/∂x_i, i.e. g_F = 2 J^T f • Hessian of the objective function: ∂²F/(∂x_i ∂x_j) = 2 Σ_{p=1}^{m} (∂f_p/∂x_i)(∂f_p/∂x_j) + 2 Σ_{p=1}^{m} f_p(x) ∂²f_p/(∂x_i ∂x_j); dropping the second-derivative term gives H_F ≈ 2 J^T J
Gauss-Newton algorithm • In the k-th iteration, compute f_p(x_k) and J_k to obtain new g_k and H_k • Compute p_k = −(2 J^T J)^(-1) (2 J^T f) = −(J^T J)^(-1) (J^T f) • Find α_k that minimizes F(x_k + α_k p_k) • Set x_{k+1} = x_k + α_k p_k
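A minimal Gauss-Newton sketch; residuals and jacobian are placeholder callables, and the crude halving loop stands in for the line search that picks α_k:

```python
import numpy as np

def gauss_newton(residuals, jacobian, x0, max_iter=20, tol=1e-8):
    """Minimize F(x) = f(x)^T f(x) using the approximation H ~ 2 J^T J."""
    x = x0
    for _ in range(max_iter):
        f = residuals(x)                        # f(x), shape (m,)
        J = jacobian(x)                         # J, shape (m, n)
        p = np.linalg.solve(J.T @ J, -J.T @ f)  # p = -(J^T J)^{-1} J^T f
        alpha, F0 = 1.0, float(f @ f)
        while alpha > 1e-6:                     # halve alpha until F decreases
            f_new = residuals(x + alpha * p)
            if f_new @ f_new < F0:
                break
            alpha *= 0.5
        x = x + alpha * p
        if np.linalg.norm(alpha * p) < tol:
            break
    return x
```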
• First-order gradient methods • Greatest gradient descent • Conjugate gradient
Solving a large linear system Ax = b • A: a known, square, symmetric, positive semi-definite matrix • b: a known vector • x: an unknown vector. If A is dense, solve with factorization and back substitution. If A is sparse, solve with iterative methods (descent methods)
Quadratic form F(x) = (1/2) x^T A x − b^T x + c. The gradient of F(x) is F′(x) = (1/2) A^T x + (1/2) A x − b. If A is symmetric, F′(x) = A x − b, and F′(x) = 0 gives A x = b: the critical point of F is also the solution to Ax = b. If A is not symmetric, what is the linear system solved by finding the critical points of F?
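A quick numerical check of the symmetric case, F′(x) = Ax − b, on a small example (the particular A, b, and test point are my own illustration):

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])   # symmetric positive-definite example
b = np.array([2.0, -8.0])

def F(x):
    return 0.5 * x @ A @ x - b @ x

x = np.array([1.0, 1.0])
eps = 1e-6
num_grad = np.array([(F(x + eps * e) - F(x - eps * e)) / (2 * eps) for e in np.eye(2)])
print(np.allclose(num_grad, A @ x - b))   # expect True
```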
Greatest gradient descent Start at an arbitrary point x^(0) and slide down to the bottom of the paraboloid. Take a series of steps x^(1), x^(2), ... until we are satisfied that we are close enough to the solution x*. Take each step along the direction in which F decreases most quickly: −F′(x^(k)) = b − A x^(k)
Greatest gradient descent Important definitions: error e^(k) = x^(k) − x*, residual r^(k) = b − A x^(k) = −F′(x^(k)) = −A e^(k). Think of the residual as the direction of greatest descent
Line search x^(1) = x^(0) + α r^(0). But how big a step should we take? A line search is a procedure that chooses α to minimize F along a line
Line search [Figure: F(x) restricted to the search line, and the value of F along that line as a function of α]
Optimal step size (d/dα) F(x^(1)) = F′(x^(1))^T (d/dα) x^(1) = F′(x^(1))^T r^(0) = 0. Since F′(x^(1)) = −r^(1), this requires r^T_(0) r^(1) = 0
Optimal step size Exercise: derive α from r^T_(k) r^(k+1) = 0. Hint: replace the terms involving (k+1) with those involving (k) using x^(k+1) = x^(k) + α r^(k). Answer: α = (r^T_(k) r_(k)) / (r^T_(k) A r_(k))
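One way to carry out the exercise, using only the definitions above:
r^(k+1) = b − A x^(k+1) = b − A(x^(k) + α r^(k)) = r^(k) − α A r^(k)
0 = r^T_(k) r^(k+1) = r^T_(k) r^(k) − α r^T_(k) A r^(k)
⇒ α = (r^T_(k) r_(k)) / (r^T_(k) A r_(k))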
Recurrence of the residual 1. r^(k) = b − A x^(k) 2. α = (r^T_(k) r_(k)) / (r^T_(k) A r_(k)) 3. x^(k+1) = x^(k) + α r^(k). The algorithm requires two matrix-vector multiplications per iteration. One multiplication can be eliminated by replacing step 1 with r^(k+1) = r^(k) − α A r^(k)
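Putting the three steps and the residual recurrence together; a minimal sketch assuming A is symmetric positive-definite so the step size is well defined:

```python
import numpy as np

def greatest_gradient_descent(A, b, x0, tol=1e-8, max_iter=1000):
    """Solve Ax = b by greatest gradient descent with the residual recurrence,
    so only one matrix-vector product is needed per iteration."""
    x = x0.copy()
    r = b - A @ x                    # initial residual
    for _ in range(max_iter):
        Ar = A @ r                   # the single matrix-vector product
        alpha = (r @ r) / (r @ Ar)   # optimal step size along r
        x = x + alpha * r
        r = r - alpha * Ar           # r_{k+1} = r_k - alpha * A r_k
        if np.linalg.norm(r) < tol:
            break
    return x
```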
Quiz • In our IK problem, we use the greatest gradient descent method to find an optimal pose, but we can't compute α using the formula described in the previous slides. Why not?
Line search • Exact line search: choose t to minimize f along the ray {x + t∆x | t ≥ 0}: t = argmin_{s ≥ 0} f(x + s∆x) • Backtracking line search depends on two constants α and β: given a descent direction ∆x for f at x ∈ dom f, α ∈ (0, 0.5), β ∈ (0, 1), set t := 1 and while f(x + t∆x) > f(x) + α t ∇f(x)^T ∆x, set t := β t
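A minimal sketch of the backtracking loop; the particular α = 0.3 and β = 0.8 are example values within the stated ranges:

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, dx, alpha=0.3, beta=0.8):
    """Shrink t by beta until the sufficient-decrease condition holds."""
    t = 1.0
    fx = f(x)
    slope = grad_f(x) @ dx   # directional derivative along the descent direction dx
    while f(x + t * dx) > fx + alpha * t * slope:
        t *= beta
    return t
```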
Poor convergence What is the problem with greatest descent? Wouldn't it be nice if we could avoid traversing the same direction repeatedly?
Conjugate directions Pick a set of directions d^(0), d^(1), ..., d^(n−1), take exactly one step along each direction, and the solution is found within n steps. Two problems: 1. How do we determine these directions? 2. How do we determine the step size along each direction?
A-orthogonality If we take the optimal step size along each direction: (d/dα) F(x^(k+1)) = 0, so F′(x^(k+1))^T (d/dα) x^(k+1) = 0, i.e. −r^T_(k+1) d_(k) = 0, i.e. d^T_(k) A e^(k+1) = 0. Two different vectors v and u are A-orthogonal, or conjugate, if v^T A u = 0
A-orthogonality [Figure: a pair of vectors that are A-orthogonal versus a pair that are orthogonal]
Optimal step size d_(k) must be A-orthogonal to e^(k+1). Using this condition, can you derive α_(k)?
Algorithm Suppose we can come up with a set of A-orthogonal directions {d^(k)}; this algorithm will converge in n steps: 1. Take d^(k) 2. α_(k) = (d^T_(k) r_(k)) / (d^T_(k) A d_(k)) 3. x^(k+1) = x^(k) + α_(k) d^(k)
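The slides have not yet said how the A-orthogonal directions are produced; one standard way is the conjugate-gradient update, which builds each new direction from the current residual. A minimal sketch under that assumption:

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-8):
    """Solve Ax = b using A-orthogonal search directions generated by the
    standard conjugate-gradient update (one common realization of the algorithm)."""
    x = x0.copy()
    r = b - A @ x
    d = r.copy()                    # first direction: the residual
    for _ in range(len(b)):         # at most n steps in exact arithmetic
        Ad = A @ d
        alpha = (d @ r) / (d @ Ad)  # optimal step along d
        x = x + alpha * d
        r_new = r - alpha * Ad
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d        # new direction, A-orthogonal to the previous ones
        r = r_new
    return x
```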