CS 287 Advanced Robotics (Fall 2019) Lecture 6: Unconstrained Optimization


SLIDE 1

CS 287 Advanced Robotics (Fall 2019) Lecture 6: Unconstrained Optimization

Pieter Abbeel UC Berkeley EECS

Many slides and figures adapted from Stephen Boyd

[optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9–11
[optional] Betts, Practical Methods for Optimal Control Using Nonlinear Programming

SLIDE 2

Bellman’s Curse of Dimensionality

• n-dimensional state space
• Number of states grows exponentially in n, for a fixed number of discretization levels per coordinate (see the worked example below)
• In practice, discretization is considered computationally feasible only up to 5- or 6-dimensional state spaces, even when using:
  • Variable resolution discretization
  • Highly optimized implementations
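As a concrete illustration (my numbers, not from the slide): with k discretization levels per coordinate, an n-dimensional grid has

$$k^n \text{ states}, \qquad \text{e.g. } k = 100,\; n = 6 \;\Rightarrow\; 100^6 = 10^{12} \text{ states.}$$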

SLIDE 3

Optimization for Optimal Control

• Goal: find a sequence of control inputs (and corresponding sequence of states) that solves the problem on the slide (a standard reconstruction is given below).
• Generally hard to do. Exception: convex problems, which means g is convex, the sets Ut and Xt are convex, and f is linear.
• Note: iteratively applying LQR is one way to solve this problem, but it can get a bit tricky when there are constraints on the control inputs and state.
• In principle (though not in our examples), u could be parameters of a control policy rather than the raw control inputs.
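The optimization problem itself is an image on the slide; a standard form consistent with the g, f, Ut, Xt named in the bullets (my reconstruction, not a transcription) is:

$$\min_{u_0,\dots,u_{T-1},\;x_0,\dots,x_T}\; \sum_{t=0}^{T-1} g(x_t, u_t) \quad \text{s.t.}\quad x_{t+1} = f(x_t, u_t),\;\; u_t \in U_t,\;\; x_t \in X_t \;\;\forall t$$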

SLIDE 4

Outline

• Convex optimization problems
• Unconstrained minimization
  • Gradient Descent
  • Newton's Method
  • Natural Gradient / Gauss-Newton
  • Momentum, RMSprop, Adam

SLIDE 5

Convex Functions

• A function f is convex if and only if

∀x1, x2 ∈ Domain(f), ∀t ∈ [0, 1] : f(tx1 + (1 − t)x2) ≤ tf(x1) + (1 − t)f(x2)

Image source: Wikipedia
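A quick worked instance of the definition (my example, not on the slide): for f(x) = x² the inequality holds because

$$t x_1^2 + (1-t)\,x_2^2 - \big(t x_1 + (1-t)\,x_2\big)^2 \;=\; t(1-t)\,(x_1 - x_2)^2 \;\ge\; 0.$$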

SLIDE 6

Convex Functions

• Unique minimum (for strictly convex f; for convex f, any local minimum is a global minimum)
• The set of points for which f(x) ≤ a is convex

Source: Thomas Jungblut's Blog
SLIDE 7

Convex Optimization Problems

• Convex optimization problems are a special class of optimization problems, of the following form:

$$\min_{x \in \mathbb{R}^n} f_0(x) \qquad \text{s.t.}\;\; f_i(x) \le 0,\; i = 1, \dots, n, \qquad Ax = b$$

with fi(x) convex for i = 0, 1, …, n

• A function f is convex if and only if

∀x1, x2 ∈ Domain(f), ∀λ ∈ [0, 1] : f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2)

SLIDE 8

Outline

• Convex optimization problems
• Unconstrained minimization
  • Gradient Descent
  • Newton's Method
  • Natural Gradient / Gauss-Newton
  • Momentum, RMSprop, Adam

SLIDE 9

Unconstrained Minimization

• If x* is a local minimum of (differentiable) f, then it has to satisfy conditions (2) and (3) (reconstructed below).
• In simple cases we can directly solve the system of n equations given by (2) to find candidate local minima, and then verify (3) for these candidates.
• In general, however, solving (2) is a difficult problem. Going forward we will consider this more general setting and cover numerical solution methods for (1).
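The numbered formulas (1)–(3) are images on the slide; from the way the bullets reference them, they are presumably the unconstrained problem and the standard first- and second-order conditions:

$$\text{(1)}\;\; \min_{x \in \mathbb{R}^n} f(x), \qquad \text{(2)}\;\; \nabla f(x^*) = 0, \qquad \text{(3)}\;\; \nabla^2 f(x^*) \succeq 0.$$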

SLIDE 10

Steepest Descent

• Idea:
  • Start somewhere
  • Repeat: take a step in the steepest descent direction

Figure source: Mathworks

SLIDE 11

SLIDE 12
Steepest Descent Algorithm

1. Initialize x
2. Repeat:
   1. Determine the steepest descent direction Δx
   2. Line search: choose a step size t > 0
   3. Update: x := x + t Δx
3. Until stopping criterion is satisfied (a code sketch follows below)
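A minimal sketch of this loop in Python, assuming a gradient oracle; the function name, fixed step size, and tolerance are illustrative choices, not from the lecture:

```python
import numpy as np

def gradient_descent(f_grad, x0, step_size=0.05, tol=1e-6, max_iters=1000):
    """Repeat: step along the negative gradient until the gradient is small."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        dx = -f_grad(x)               # steepest descent direction
        if np.linalg.norm(dx) < tol:  # stopping criterion
            break
        x = x + step_size * dx        # update (fixed step; line search comes next)
    return x

# Example: minimize f(x) = x1^2 + 10*x2^2 (condition number 10)
x_min = gradient_descent(lambda x: np.array([2 * x[0], 20 * x[1]]), [1.0, 1.0])
```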

SLIDE 13

What is the Steepest Descent Direction?

→ Steepest Descent = Gradient Descent
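The derivation on this slide is an image; for reference, under the Euclidean norm the normalized steepest descent direction is the negative gradient direction, which is why the two names coincide:

$$\Delta x_{\text{nsd}} \;=\; \arg\min_{\|v\|_2 \le 1} \nabla f(x)^\top v \;=\; -\,\frac{\nabla f(x)}{\|\nabla f(x)\|_2}.$$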

SLIDE 14

Stepsize Selection: Exact Line Search

• Used when the cost of solving the one-variable minimization problem is low compared to the cost of computing the search direction itself (the standard rule is given below).
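The exact line search rule itself is an image on the slide; the standard definition (as in Boyd and Vandenberghe) is to minimize f along the ray:

$$t \;=\; \arg\min_{s \ge 0} f(x + s\,\Delta x).$$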

SLIDE 15

Stepsize Selection: Backtracking Line Search

• Inexact: the step length is chosen to approximately minimize f along the ray {x + t Δx | t > 0} (a code sketch follows below)
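A minimal sketch of backtracking line search (Armijo condition) as described in Boyd and Vandenberghe; the parameter values alpha and beta are typical defaults, not taken from the slides:

```python
import numpy as np

def backtracking_line_search(f, grad_fx, x, dx, alpha=0.3, beta=0.8):
    """Shrink t until f(x + t*dx) lies below the sufficient-decrease line."""
    t = 1.0
    while f(x + t * dx) > f(x) + alpha * t * (grad_fx @ dx):
        t *= beta  # step too long: shrink geometrically
    return t

# Usage with the gradient descent direction dx = -grad f(x):
f = lambda x: x[0]**2 + 10 * x[1]**2
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
x = np.array([1.0, 1.0])
g = grad(x)
t = backtracking_line_search(f, g, x, -g)
x_next = x + t * (-g)
```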

SLIDE 16

Stepsize Selection: Backtracking Line Search

Figure source: Boyd and Vandenberghe

SLIDE 17

SLIDE 18

Steepest Descent (= Gradient Descent)

Source: Boyd and Vandenberghe

SLIDE 19

Gradient Descent: Example 1

Figure source: Boyd and Vandenberghe

SLIDE 20

Gradient Descent: Example 2

Figure source: Boyd and Vandenberghe

SLIDE 21

Gradient Descent: Example 3

Figure source: Boyd and Vandenberghe

SLIDE 22

Gradient Descent Convergence

• For a quadratic function, convergence speed depends on the ratio of the highest second derivative to the lowest second derivative (the "condition number"; see the formula below)
• In high dimensions, almost guaranteed to have a high (= bad) condition number
• Rescaling coordinates (as could happen by simply expressing quantities in different measurement units) results in a different condition number

[Figure: contour plots with condition number = 10 and condition number = 1]
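As an illustration (my example, not from the slide), for a quadratic f(x) = ½ xᵀQx this ratio is the eigenvalue spread of Q:

$$\kappa(Q) = \frac{\lambda_{\max}(Q)}{\lambda_{\min}(Q)}, \qquad \text{e.g. } f(x) = x_1^2 + 10\,x_2^2 \;\Rightarrow\; Q = \mathrm{diag}(2, 20),\; \kappa = 10.$$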

SLIDE 23

Outline

• Convex optimization problems
• Unconstrained minimization
  • Gradient Descent
  • Newton's Method
  • Natural Gradient / Gauss-Newton
  • Momentum, RMSprop, Adam

SLIDE 24

SLIDE 25

Newton's Method

• 2nd order Taylor approximation rather than 1st order:
• Assuming ∇²f(x) ⪰ 0 (which is true for convex f), the minimum of the 2nd order approximation is achieved at the Newton step (reconstructed below).

Figure source: Boyd and Vandenberghe
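The formulas on this slide are images; the standard expressions they refer to (my reconstruction, in Boyd and Vandenberghe's notation) are the quadratic model and its minimizer, the Newton step:

$$f(x + v) \;\approx\; f(x) + \nabla f(x)^\top v + \tfrac{1}{2}\, v^\top \nabla^2 f(x)\, v, \qquad \Delta x_{\text{nt}} \;=\; -\,\nabla^2 f(x)^{-1} \nabla f(x).$$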

SLIDE 26

Newton’s Method

Figure source: Boyd and Vandenberghe
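The algorithm on this slide is a figure; a minimal sketch of the (undamped) Newton iteration, assuming callable gradient and Hessian oracles (names and tolerances are my own, not from the lecture):

```python
import numpy as np

def newtons_method(grad, hess, x0, tol=1e-8, max_iters=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad(x)
        dx = np.linalg.solve(hess(x), -g)  # Newton step: solve H dx = -g
        if -(g @ dx) / 2 < tol:            # Newton decrement stopping criterion
            break
        x = x + dx                         # full step; a damped version would
                                           # add backtracking line search here
    return x

# For a convex quadratic, a single Newton step lands on the minimizer:
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
hess = lambda x: np.diag([2.0, 20.0])
x_star = newtons_method(grad, hess, [1.0, 1.0])  # -> array([0., 0.])
```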

SLIDE 27

Affine Invariance

• Consider the coordinate transformation y = A⁻¹x (x = Ay)
• If running Newton's method starting from x(0) on f(x) results in x(0), x(1), x(2), …
• Then running Newton's method starting from y(0) = A⁻¹x(0) on g(y) = f(Ay) will result in the sequence y(0) = A⁻¹x(0), y(1) = A⁻¹x(1), y(2) = A⁻¹x(2), …
• Exercise: try to prove this!

SLIDE 28

SLIDE 29

Affine Invariance --- Proof
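The proof on this slide is an image; a sketch of the standard argument (my reconstruction, not a transcription of the slide): with g(y) = f(Ay),

$$\nabla g(y) = A^\top \nabla f(Ay), \qquad \nabla^2 g(y) = A^\top \nabla^2 f(Ay)\, A,$$

so the Newton step in y is

$$\Delta y = -\big(A^\top \nabla^2 f(Ay)\, A\big)^{-1} A^\top \nabla f(Ay) = -A^{-1} \nabla^2 f(Ay)^{-1} \nabla f(Ay) = A^{-1} \Delta x,$$

and by induction y(k) = A⁻¹x(k) for all k.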

SLIDE 30

Example 1

Figure source: Boyd and Vandenberghe
[Figure panels: gradient descent with backtracking line search; Newton's method with backtracking line search]

SLIDE 31

Example 2

Figure source: Boyd and Vandenberghe

[Figure panels: gradient descent; Newton's method]

SLIDE 32

Larger Version of Example 2

Figure source: Boyd and Vandenberghe

SLIDE 33

Gradient Descent: Example 3

Figure source: Boyd and Vandenberghe

SLIDE 34

Example 3

• Gradient descent
• Newton's method (converges in one step if f is a convex quadratic)

SLIDE 35

Quasi-Newton Methods

• Quasi-Newton methods use an approximation of the Hessian
• Example 1: only compute the diagonal entries of the Hessian, set the others equal to zero. Note this also simplifies computations done with the Hessian (a small sketch follows below).
• Example 2: natural gradient --- see next slide
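A minimal sketch of the diagonal-Hessian idea in Example 1 (my illustration, not from the slides): with only the diagonal of the Hessian, "inverting" it reduces to an elementwise division.

```python
import numpy as np

def diag_newton_step(grad_x, hess_diag, eps=1e-8):
    """Return dx solving diag(hess_diag) @ dx = -grad_x."""
    return -grad_x / (hess_diag + eps)  # eps guards against zero curvature

g = np.array([2.0, 20.0])         # gradient of x1^2 + 10*x2^2 at (1, 1)
h_diag = np.array([2.0, 20.0])    # exact Hessian diagonal for this f
dx = diag_newton_step(g, h_diag)  # -> [-1., -1.]; matches full Newton here,
                                  # since the true Hessian is already diagonal
```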

SLIDE 36

Outline

• Convex optimization problems
• Unconstrained minimization
  • Gradient Descent
  • Newton's Method
  • Natural Gradient / Gauss-Newton
  • Momentum, RMSprop, Adam

SLIDE 37

SLIDE 38

Natural Gradient

• Consider a standard maximum likelihood problem:
• Gradient:
• Hessian:

$$\nabla^2 f(\theta) \;=\; \sum_i \left[ \frac{\nabla^2 p(x^{(i)}; \theta)}{p(x^{(i)}; \theta)} \;-\; \Big(\nabla \log p(x^{(i)}; \theta)\Big)\Big(\nabla \log p(x^{(i)}; \theta)\Big)^{\top} \right]$$

• Natural gradient: only keeps the 2nd term in the Hessian (see the reconstruction below). Benefits: (1) faster to compute (only gradients needed); (2) guaranteed to be negative definite; (3) found to be superior in some experiments; (4) invariant to re-parameterization
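The objective, gradient, and natural gradient formulas on this slide are images; the standard expressions they presumably correspond to (my reconstruction) are

$$f(\theta) = \sum_i \log p(x^{(i)}; \theta), \qquad \nabla f(\theta) = \sum_i \nabla \log p(x^{(i)}; \theta),$$

with the natural gradient step obtained by replacing the Hessian with (minus) its second, outer-product term:

$$\Delta\theta_{\text{nat}} \;=\; \left[\sum_i \nabla \log p(x^{(i)}; \theta)\, \nabla \log p(x^{(i)}; \theta)^{\top}\right]^{-1} \nabla f(\theta).$$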

SLIDE 39

SLIDE 40

Natural Gradient

• Property: the natural gradient is invariant to the parameterization of the family of probability distributions p(x; θ)
• Hence the name.
• Note this property is stronger than the property of Newton's method, which is invariant to affine re-parameterizations only.
• Exercise: try to prove this property!

SLIDE 41

Natural Gradient Invariant to Reparametrization --- Proof

• Natural gradient for parametrization with θ:
• Let Φ = f(θ), and let …, i.e., … (the intermediate formulas are images on the slide)
• → the natural gradient direction is the same independent of the (invertible, but otherwise not constrained) reparametrization f

SLIDE 42

Outline

• Convex optimization problems
• Unconstrained minimization
  • Gradient Descent
  • Newton's Method
  • Natural Gradient / Gauss-Newton
  • Momentum, RMSprop, Adam

SLIDE 43

Gradient Descent with Momentum

[Slide compares the update rules for gradient descent vs. gradient descent with momentum]

• Typically beta = 0.9
• v = exponentially weighted average of the gradient (a code sketch follows below)
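A minimal sketch of the momentum update described on the slide (v is the exponentially weighted average of gradients, typical beta = 0.9); the function name and learning rate are illustrative choices, not from the lecture:

```python
import numpy as np

def momentum_step(x, v, grad_x, lr=0.1, beta=0.9):
    v = beta * v + (1 - beta) * grad_x  # exponentially weighted avg of gradients
    x = x - lr * v                      # step along the averaged direction
    return x, v

x, v = np.array([1.0, 1.0]), np.zeros(2)
for _ in range(100):
    g = np.array([2 * x[0], 20 * x[1]])  # gradient of x1^2 + 10*x2^2
    x, v = momentum_step(x, v, g)
```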

SLIDE 44

RMSprop (Root Mean Square Propagation)

[Slide compares the update rules for gradient descent vs. RMSprop]

• Typically beta = 0.999
• s = exponentially weighted average of the squared gradients (a code sketch follows below)
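A minimal sketch of the RMSprop update described on the slide (s is the exponentially weighted average of squared gradients, typical beta = 0.999); the epsilon term is the usual guard against division by zero, and the names and learning rate are illustrative:

```python
import numpy as np

def rmsprop_step(x, s, grad_x, lr=0.01, beta=0.999, eps=1e-8):
    s = beta * s + (1 - beta) * grad_x**2     # avg of squared gradients
    x = x - lr * grad_x / (np.sqrt(s) + eps)  # per-coordinate rescaled step
    return x, s

x, s = np.array([1.0, 1.0]), np.zeros(2)
for _ in range(100):
    g = np.array([2 * x[0], 20 * x[1]])  # gradient of x1^2 + 10*x2^2
    x, s = rmsprop_step(x, s, g)
```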

SLIDE 45

Adam (Adaptive Moment Estimation)

[Slide compares the update rules for gradient descent vs. Adam]

• Typically beta1 = 0.9; beta2 = 0.999; eps = 1e-8
• s = exponentially weighted average of the squared gradients
• v = momentum (a code sketch follows below)
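A minimal sketch of the Adam update, combining the momentum term v and the squared-gradient term s with the usual bias correction (the bias correction is standard in Adam even though the slide text does not mention it); names and learning rate are illustrative:

```python
import numpy as np

def adam_step(x, v, s, grad_x, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    v = beta1 * v + (1 - beta1) * grad_x     # momentum (1st moment)
    s = beta2 * s + (1 - beta2) * grad_x**2  # squared gradients (2nd moment)
    v_hat = v / (1 - beta1**t)               # bias correction for early steps
    s_hat = s / (1 - beta2**t)
    x = x - lr * v_hat / (np.sqrt(s_hat) + eps)
    return x, v, s

x, v, s = np.array([1.0, 1.0]), np.zeros(2), np.zeros(2)
for t in range(1, 201):                      # t starts at 1 for bias correction
    g = np.array([2 * x[0], 20 * x[1]])      # gradient of x1^2 + 10*x2^2
    x, v, s = adam_step(x, v, s, g, t)
```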