Introduction to Optimization, Amy Langville, SAMSI Undergraduate Workshop (PowerPoint PPT Presentation)

SLIDE 1

Introduction

to

Optimization

Amy Langville SAMSI Undergraduate Workshop N.C. State University

SAMSI 6/1/05

SLIDE 2

GOAL: minimize f(x1, x2, x3, x4, x5) = x1² − .5·x2·x3 + x4/x5

PRIZE: $1 million

  • # of independent variables =
  • z = f(x1, x2, x3, x4, x5) lives in ℜ?
SLIDE 3

GOAL: minimize f(x1, x2, x3, x4, x5) = x1² − .5·x2·x3 + x4/x5

PRIZE: $1 million

  • # of independent variables =
  • z = f(x1, x2, x3, x4, x5) lives in ℜ?

Suppose you know little to nothing about Calculus or Optimization. Could you win the prize? How?

SLIDE 4

GOAL: minimize f(x1, x2, x3, x4, x5) = x1² − .5·x2·x3 + x4/x5

PRIZE: $1 million

  • # of independent variables =
  • z = f(x1, x2, x3, x4, x5) lives in ℜ?

Suppose you know little to nothing about Calculus or Optimization. Could you win the prize? How? Trial and error: repeated function evaluations.
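The brute-force idea can be made concrete in a few lines of Matlab. This is only an illustrative sketch, not code from the workshop: the search box [-10, 10]^5 and the number of trials are arbitrary assumptions.

    % Trial and error: evaluate f at many random points and keep the best one.
    % The search box [-10,10]^5 and the 1e5 trials are arbitrary assumptions.
    f = @(x) x(1)^2 - .5*x(2)*x(3) + x(4)/x(5);
    best = Inf;
    for k = 1:1e5
        x = -10 + 20*rand(1,5);          % random point in the box
        if abs(x(5)) > 1e-6              % avoid dividing by (nearly) zero
            if f(x) < best
                best = f(x);  xbest = x;
            end
        end
    end
    fprintf('best value found so far: %g\n', best)

Note that because of the x4/x5 term the objective is unbounded below on an unrestricted domain, so repeated function evaluations can only ever report the best value found so far, which is exactly the kind of qualified claim slide 9 asks for.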

SLIDE 5

Calculus III Review

  • local min vs. global min vs. saddle point
  • CPs (critical points) and horizontal tangent planes
  • Local Mins and the 2nd Derivative Test
  • Global Mins, CPs, and BPs (boundary points)
  • Gradient = Direction of ?
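As a reminder for the 2nd Derivative Test bullet (standard Calculus III material, not text from the slide): at a critical point (a, b) of z = f(x, y), let

    D = fxx(a,b)·fyy(a,b) − [fxy(a,b)]².

If D > 0 and fxx(a,b) > 0 the point is a local min; if D > 0 and fxx(a,b) < 0 it is a local max; if D < 0 it is a saddle point; if D = 0 the test is inconclusive.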
SLIDE 6

Constrained vs. Unconstrained Opt.

  • Unconstrained

min f(x, y) = x² + y²

  • Constrained

— min f(x, y) = x² + y² s.t. x ≥ 0, y ≥ 0
— min f(x, y) = x² + y² s.t. x > 0, y > 0
— min f(x, y) = x² + y² s.t. 1 ≤ x ≤ 2, 0 ≤ y ≤ 3
— min f(x, y) = x² + y² s.t. y = x + 2

  • EVT (Extreme Value Theorem)
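As an illustrative sketch only (fmincon is not mentioned in the slides and requires the Optimization Toolbox), the third constrained problem above, which has simple bound constraints, could be set up like this:

    % Bound-constrained version: min x^2 + y^2  s.t.  1 <= x <= 2, 0 <= y <= 3
    fun = @(v) v(1)^2 + v(2)^2;
    x0  = [1.5, 1];                        % any feasible starting point
    lb  = [1, 0];   ub = [2, 3];           % the bounds
    [xopt, fval] = fmincon(fun, x0, [], [], [], [], lb, ub);
    % expect xopt = (1, 0) and fval = 1: the min sits on the boundary of the
    % feasible region, which is what the EVT bullet above is getting at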
SLIDE 7

Gradient Descent Methods

  • Hillclimbers on a Cloudy Day: max f(x, y) = − min (−f(x, y))
  • Initializations
  • 1st-order and 2nd-order info. from partials: gradient + Hessian
  • Matlab function: gd(α, x0)
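The course's gd.m is not reproduced here, but a minimal fixed-step gradient descent along those lines might look like the sketch below; the objective, step size α, and starting point are illustrative assumptions.

    % Fixed-step gradient descent on f(x,y) = x^2 + y^2 with a simple
    % convergence test on the gradient norm (all choices here are illustrative).
    f     = @(x) x(1)^2 + x(2)^2;
    gradf = @(x) [2*x(1); 2*x(2)];        % analytic gradient
    alpha = 0.1;                           % step size
    x     = [4; -3];                       % starting point x0
    for k = 1:1000
        g = gradf(x);
        if norm(g) < 1e-8                  % convergence test
            break
        end
        x = x - alpha*g;                   % move downhill along -gradient
    end
    fprintf('x = (%.4g, %.4g), f = %.3g after %d iterations\n', x(1), x(2), f(x), k)

The questions on the next slide about convergence tests and about the effect of α and x0 apply directly to a loop like this.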
SLIDE 8

Iterative Methods

Issues
  — Convergence Test: what is it for gd.m?
  — Convergence Proof: is gd.m guaranteed to converge to a local min? For α > 0? For α < 0?
  — Rate of Convergence: how many iterations? How do starting points x0 affect the number of iterations? Worst starting point for α = 4? Best?

SLIDE 9

Convergence of Optimization Methods

global vs. local vs. stationary point vs. none

  • Most optimization algorithms cannot guarantee convergence to a global min, much less a local min.
  • However, some classes of optimization problems are particularly nice.
    — Convex objective. EX: z = .5(α·x² + y²), α > 0. Every local min is a global min!
  • Even for particularly tough optimization problems, the most popular, successful algorithms often perform well on many problems, despite the lack of convergence theory.
  • Must qualify statements: "I found the best 'global min' to date."
SLIDE 10

Your Least Squares Problem

  • how many variables/unknowns n =?
  • z = f(x1, x2, ..., xn) lives in ℜ?
  • can we graph z?
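The slides do not spell out the least-squares model behind these questions, so the following is only a generic sketch of how a residual-norm objective (compare the "Residual norm as a function of λ1 and λ2" plots on the later slides) can be handed to an optimizer. The two-parameter exponential model and the synthetic data are assumptions for illustration.

    % Generic least-squares setup: z = f(lambda1, lambda2) = norm of the residuals.
    % The model form and the synthetic data below are assumptions, not the
    % workshop's actual problem.
    t = (0:5:100)';                                  % "days"
    y = 0.2*exp(-0.02*t) + 0.005*randn(size(t));     % synthetic "luminosity" data
    model   = @(lam, t) lam(1)*exp(-lam(2)*t);       % two-parameter model
    resnorm = @(lam) norm(model(lam, t) - y);        % objective in lambda
    lam0    = [0.1, 0.01];                           % starting guess lambda0
    lamStar = fminsearch(resnorm, lam0);             % optimizer lambda*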
SLIDE 11

Nonsmooth, Nondifferentiable Surfaces

  • Can’t compute gradient ∇f ⇒ can’t use GD Methods
  • Line Search Methods
  • Method of Alternating Variables (Coordinate Descent): solve a series of 1-D problems
    — what would these steps look like on a contour map?
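One way to picture the alternating-variables idea is the sketch below, which cycles through the coordinates and solves each 1-D problem with fminbnd; the nonsmooth objective and the search interval are illustrative assumptions, not from the slides.

    % Method of Alternating Variables (coordinate descent): minimize over one
    % coordinate at a time with a 1-D solver, holding the others fixed.
    f = @(x) abs(x(1) - 1) + abs(x(2) + 2) + .5*abs(x(1)*x(2));   % nonsmooth example
    x = [0, 0];                                % starting point
    for sweep = 1:20
        for i = 1:numel(x)
            g = @(t) f([x(1:i-1), t, x(i+1:end)]);   % 1-D slice in coordinate i
            x(i) = fminbnd(g, -10, 10);              % solve the 1-D subproblem
        end
    end
    % each accepted step moves parallel to a coordinate axis, so on a contour
    % map the path is a sequence of axis-parallel segments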

SLIDE 12

fminsearch and Nelder-Mead

  • maintain basis of n + 1 points, where n = # of variables
  • form a simplex using these points (their convex hull)
  • idea: move in a direction away from the worst of these points
  • EX: n = 2, so maintain a basis of 3 points living in the xy-plane ⇒ simplex is a triangle
  • create a new simplex by moving away from the worst point: reflect, expand, contract, shrink steps (sketched below)
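The four moves in the last bullet can be written out for n = 2 as follows. The triangle is an arbitrary illustration, and the coefficients (reflection 1, expansion 2, contraction 1/2, shrink 1/2) are the standard choices used by fminsearch, assumed here rather than read off the slide.

    % Candidate points for one Nelder-Mead move in n = 2 (vertices ordered so
    % that f(x1) <= f(x2) <= f(x3); the triangle itself is arbitrary).
    x1 = [0; 0];  x2 = [1; 0];  x3 = [0; 1];
    xbar = (x1 + x2)/2;                     % centroid of the n best points
    xr   = xbar + 1.0*(xbar - x3);          % reflect away from the worst point
    xe   = xbar + 2.0*(xbar - x3);          % expand farther in that direction
    xc   = xbar + 0.5*(xbar - x3);          % outside contraction
    xcc  = xbar - 0.5*(xbar - x3);          % inside contraction
    x2s  = x1 + 0.5*(x2 - x1);              % shrink: pull every other vertex
    x3s  = x1 + 0.5*(x3 - x1);              %   halfway toward the best point x1

These are the points labeled x̄, xr, xe, xc, and xcc in the figures quoted on the next slide.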

SLIDE 13

PROPERTIES OF NELDER–MEAD

  • Fig. 1. Nelder–Mead simplices after a reflection and an expansion step. The original simplex is shown with a dashed line. [Figure: vertices x̄, xr, xe, x3]
  • Fig. 2. Nelder–Mead simplices after an outside contraction, an inside contraction, and a shrink. The original simplex is shown with a dashed line. [Figure: vertices x̄, xr, xc, xcc, x1, x3]

then x1^(k+1) = x1^(k). Beyond this, whatever rule is used to define the original ordering may be applied after a shrink. We define the change index k* of iteration k as the smallest index of a vertex that differs between iterations k and k + 1:

    k* = min{ i | xi^(k) ≠ xi^(k+1) }.   (2.8)

(Tie-breaking rules are needed to define a unique value of k*.) When Algorithm NM terminates in step 2, 1 < k* ≤ n; with termination in step 3, k* = 1; with termination in step 4, 1 ≤ k* ≤ n + 1; and with termination in step 5, k* = 1 or 2. A statement that "xj changes" means that j is the change index at the relevant iteration. The rules and definitions given so far imply that, for a nonshrink iteration,

SLIDE 14

N-M Algorithm

SLIDE 15

N-M Algorithm

SLIDE 16
[Figure slide: fminsearch / Nelder-Mead progress on the SN1939A least-squares fit after the first few steps. Left panel: SN1939A luminosity vs. days (log scale). Right panel: "SN1939A, Residual norm as a function of λ1 and λ2", with axes λ1, λ2, and residual norm.]
SLIDE 17
[Figure slide: the same two panels (SN1939A luminosity vs. days; residual norm as a function of λ1 and λ2) after further steps of the search.]
SLIDE 18
[Figure slide: the same two panels (SN1939A luminosity vs. days; residual norm as a function of λ1 and λ2) after still more steps of the search.]
SLIDE 19
[Figure slide: the two panels after the final steps (converged). The surface "SN1939A, Residual norm as a function of λ1 and λ2" is annotated with the starting guess λ0 and the optimizer λ*; the left panel again shows SN1939A luminosity vs. days.]
SLIDE 20

N-M Algorithm

  • not proven to converge in general
  • but widely used:
    — easy to implement
    — inexpensive: usually only 1-2 function evaluations per iteration
    — no derivatives needed
    — makes good progress at the beginning of the iteration history

Assignments:
  — Display the N-M steps using

      options.Display = 'iter';
      fminsearch(fun, x0, options);

  — Write nested 'for' loops in Matlab to generate a grid of starting points (and later random starting points) for fminsearch to find the best "global min" (see the sketch below)
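A hedged sketch of what the nested-loop assignment could look like; the placeholder objective, the grid limits, and the grid spacing are assumptions, not values given in the slides.

    % Grid of starting points for fminsearch; keep the best "global min" found.
    fun = @(v) (v(1)^2 + v(2)^2)/20 - cos(v(1))*cos(v(2)/sqrt(2));  % placeholder, multimodal
    options = optimset('Display', 'off');
    bestf = Inf;
    for a = -5:5                          % grid over the 1st starting coordinate
        for b = -5:5                      % grid over the 2nd starting coordinate
            [x, fx] = fminsearch(fun, [a, b], options);
            if fx < bestf                 % qualify the claim: best found to date
                bestf = fx;  bestx = x;
            end
        end
    end
    fprintf('best "global min" to date: f = %g at (%g, %g)\n', bestf, bestx(1), bestx(2))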

SLIDE 21

Genetic/Evolutionary Algorithms

  • at each iteration either mate or mutate possible solution vectors
  • based on fitness of possible solution vectors as measured by objective function
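A minimal sketch of that idea (the population size, mating and mutation rules, and the toy objective are all assumptions for illustration, not the workshop's code):

    % Toy genetic/evolutionary loop: rank candidate vectors by the objective,
    % keep the fitter half, and create new candidates by mating or mutating.
    f = @(X) sum(X.^2, 2);                 % toy objective, one row per candidate
    P = 4*rand(20, 5) - 2;                 % population of 20 candidate 5-vectors
    for gen = 1:100
        [~, idx] = sort(f(P));             % lower objective value = fitter
        P = P(idx(1:10), :);               % survivors: the fitter half
        kids = zeros(10, 5);
        for j = 1:10
            p = P(randi(10), :);  q = P(randi(10), :);
            if rand < 0.7
                kids(j, :) = .5*(p + q);               % mate: blend two parents
            else
                kids(j, :) = p + 0.1*randn(1, 5);      % mutate: random perturbation
            end
        end
        P = [P; kids];                     % next generation
    end
    [fbest, ibest] = min(f(P));            % best solution vector found so far
    fprintf('best objective value found: %g\n', fbest)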