CS 61A/CS 98-52 - Mehrdad Niknami, University of California, Berkeley (PowerPoint PPT presentation)



SLIDE 1

CS 61A/CS 98-52

Mehrdad Niknami

University of California, Berkeley

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 1 / 25

SLIDE 9

Warning

FYI: This lecture might get a little... intense... and math-y.
If it’s hard, don’t panic! It’s okpy! They won’t all be like this!
Just try to enjoy it, ask questions, & learn as much as you can. :)
Ready?!

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 2 / 25

SLIDE 13

Preliminaries

Last lecture was on equation-solving: “Given f and initial guess x0, solve f(x) = 0”
This lecture is on optimization: argmin_x F(x)
“Given F and initial guess x0, find the x that minimizes F(x)”

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 3 / 25

SLIDE 20

Brachistochrone Problem

Let’s solve a realistic problem. It’s the brachistochrone (“shortest time”) problem:

1 Drop a ball on a ramp
2 Let it roll down
3 What shape minimizes the travel time?

[Figure: candidate ramp curves from an upper starting point down to a lower endpoint]

=⇒ How would you solve this?

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 4 / 25

SLIDE 30

Brachistochrone Problem

Ideally: Learn fancy math, derive the answer, plug in the formula.
Oh, sorry... did you say you’re a programmer?

1 Math is hard
2 Physics is hard
3 We’re lazy
4 Why learn something new when you can burn electricity instead?

OK but honestly the math is a little complicated... Calculus of variations... Euler-Lagrange differential eqn... maybe?
Take Physics 105... have fun! Don’t get wrecked

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 5 / 25

SLIDE 40

Brachistochrone Problem

Joking aside... Most problems don’t have a nice formula, so you’ll need algorithms. Let’s get our hands dirty!
Remember Riemann sums? This is similar:

1 Chop up the ramp into line segments (but hold the ends fixed)
2 Move around the anchors to minimize travel time

Q: How do you do this?

[Figure: a ramp curve on the unit square, discretized into line segments]

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 6 / 25

SLIDE 47

Algorithm

Use Newton-Raphson! ...but wasn’t that for finding roots? Not optimizing?
Actually, it’s used for both: if F is differentiable, minimizing F reduces to root-finding: F′(x) = f(x) = 0
Caveat: must avoid maxima and inflection points

Easy in 1-D: only ± directions to check for increase/decrease
Good luck in N-D... infinitely many directions

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 7 / 25

SLIDE 55

Algorithm

Newton-Raphson method for optimization:

1 Assume F is approximately quadratic[1] (so f = F′ is approximately linear)
2 Guess some x0 intelligently
3 Repeatedly solve a linear approximation[2] of f = F′:

    f(x_k) − f(x_{k+1}) = f′(x_k) (x_k − x_{k+1}),  f(x_{k+1}) = 0
    =⇒ x_{k+1} = x_k − f′(x_k)⁻¹ f(x_k)

    We ignored F! Avoid maxima and inflection points! (How?)

4 ...Profit?

[1] Why are quadratics common? Energy/cost are quadratic (K = ½mv², P = I²R, ...)
[2] You’ll see linearization ALL the time in engineering

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 8 / 25
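To make the update rule concrete, here is a minimal 1-D sketch (an illustrative standalone example, not the lecture's code; the names newton_minimize_1d, dF, ddF are hypothetical):

```python
def newton_minimize_1d(dF, ddF, x, steps=20):
    """Minimize F by running Newton-Raphson on f = F'."""
    for _ in range(steps):
        # x_{k+1} = x_k - f'(x_k)^{-1} f(x_k), with f = F'
        x = x - dF(x) / ddF(x)
    return x

# F(x) = (x - 3)**2 + 1 is exactly quadratic: F'(x) = 2*(x - 3), F''(x) = 2
x_min = newton_minimize_1d(lambda x: 2 * (x - 3), lambda x: 2.0, x=0.0)
```

Since this F is a true quadratic, the "approximately quadratic" assumption holds exactly and the method lands on the minimizer in a single step.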

SLIDE 59

Algorithm

Wait, but we have a function of many variables. What do? A couple options:

1 Fully multivariate Newton-Raphson: x_{k+1} = x_k − ∇f(x_k)⁻¹ f(x_k)
  Taught in EE 219A, 227C, 144/244, etc... (need Math 53 and 54)
2 Newton coordinate-descent

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 9 / 25
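The fully multivariate update can be sketched for a 2-D system, inverting the Jacobian by Cramer's rule (an illustrative example of the formula above; newton_2d and the test system are hypothetical, not from the lecture):

```python
def newton_2d(f, jac, x, y, steps=10):
    """Multivariate Newton: x_{k+1} = x_k - J(x_k)^{-1} f(x_k), for 2-D."""
    for _ in range(steps):
        f1, f2 = f(x, y)
        (a, b), (c, d) = jac(x, y)
        det = a * d - b * c
        # Solve J * delta = f by Cramer's rule, then step by -delta
        dx = (f1 * d - b * f2) / det
        dy = (a * f2 - c * f1) / det
        x, y = x - dx, y - dy
    return x, y

# Solve x**2 + y**2 = 4 together with x = y; the positive root is (sqrt(2), sqrt(2))
x, y = newton_2d(lambda x, y: (x * x + y * y - 4.0, x - y),
                 lambda x, y: ((2 * x, 2 * y), (1.0, -1.0)),
                 1.0, 1.0)
```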

SLIDE 67

Algorithm

Coordinate descent:

1 Take x1, use it to minimize F, holding others fixed
2 Take y1, use it to minimize F, holding others fixed
3 Take x2, use it to minimize F, holding others fixed
4 Take y2, use it to minimize F, holding others fixed
5 ...
6 Cycle through again

Doesn’t work as often, but it works very well here.

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 10 / 25
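The cycling scheme above can be sketched on a small quadratic where each per-coordinate minimization has a closed form (a hypothetical example, not the ramp problem):

```python
def coordinate_descent(steps=50):
    """Minimize F(x, y) = (x - 1)**2 + (y + 2)**2 + x*y coordinate-wise."""
    x, y = 0.0, 0.0
    for _ in range(steps):
        x = (2.0 - y) / 2.0   # argmin over x: dF/dx = 2(x - 1) + y = 0
        y = -(4.0 + x) / 2.0  # argmin over y: dF/dy = 2(y + 2) + x = 0
    return x, y

x, y = coordinate_descent()  # contracts toward the joint minimum (8/3, -10/3)
```

Each sweep sets one coordinate to its exact minimizer given the others; because the coupling term is mild, the iterates contract toward the joint minimum.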

SLIDE 73

Algorithm

Newton step for minimization:

    def newton_minimizer_step(F, coords, h):
        delta = 0.0
        for i in range(1, len(coords) - 1):
            for j in range(len(coords[i])):
                def f(c): return derivative(F, c, i, j, h)
                def df(c): return derivative(f, c, i, j, h)
                step = -f(coords) / df(coords)
                delta += abs(step)
                coords[i][j] += step
        return delta

Side note: Notice a potential bug? What’s the fix? Notice a 33% inefficiency? What’s the fix?

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 11 / 25

SLIDE 78

Algorithm

Computing derivatives numerically:

    def derivative(f, coords, i, j, h):
        x = coords[i][j]
        coords[i][j] = x + h; f2 = f(coords)
        coords[i][j] = x - h; f1 = f(coords)
        coords[i][j] = x
        return (f2 - f1) / (2 * h)

Why not (f(x + h) - f(x)) / h? Breaking the intrinsic symmetry reduces accuracy.

∼ Words of Wisdom ∼
If your problem has {fundamental feature} that your solution doesn’t, you’ve created more problems.

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 12 / 25
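The accuracy gap is easy to observe numerically (an illustrative comparison; forward_diff and central_diff are hypothetical helper names): the forward difference carries O(h) error, the symmetric central difference O(h²).

```python
def forward_diff(f, x, h):
    return (f(x + h) - f(x)) / h            # error ~ O(h)

def central_diff(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)  # error ~ O(h**2)

# d/dx x**3 = 3*x**2, which is 12 at x = 2
f = lambda x: x ** 3
err_fwd = abs(forward_diff(f, 2.0, 1e-4) - 12.0)
err_cen = abs(central_diff(f, 2.0, 1e-4) - 12.0)
```

With h = 1e-4 the forward error is around 6e-4 while the central error is around 1e-8: the odd-order Taylor terms cancel in the symmetric formula.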

SLIDE 80

Algorithm

What is our objective function F to minimize?

    def falling_time(coords):  # coords = [[x1, y1], [x2, y2], ...]
        t, speed = 0.0, 0.0
        prev = None
        for coord in coords:
            if prev != None:
                dy = coord[1] - prev[1]
                d = ((coord[0] - prev[0]) ** 2 + dy ** 2) ** 0.5
                accel = -9.80665 * dy / d
                for dt in quadratic_roots(accel, speed, -d):
                    if dt > 0:
                        speed += accel * dt
                        t += dt
            prev = coord
        return t

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 13 / 25

SLIDE 82

Algorithm

Let’s define quadratic roots...

    def quadratic_roots(two_a, b, c):
        D = b * b - 2 * two_a * c
        if D >= 0:
            if D > 0:
                r = D ** 0.5
                roots = [(-b + r) / two_a, (-b - r) / two_a]
            else:
                roots = [-b / two_a]
        else:
            roots = []
        return roots

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 14 / 25
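Note the calling convention: the first argument is 2a, so quadratic_roots(two_a, b, c) solves (two_a/2)·x² + b·x + c = 0, and D = b² − 2·two_a·c equals the usual b² − 4ac. A quick self-contained check (the function is restated so the snippet runs on its own):

```python
def quadratic_roots(two_a, b, c):
    D = b * b - 2 * two_a * c  # b**2 - 4*a*c with two_a = 2*a
    if D >= 0:
        if D > 0:
            r = D ** 0.5
            roots = [(-b + r) / two_a, (-b - r) / two_a]
        else:
            roots = [-b / two_a]
    else:
        roots = []
    return roots

# x**2 - 3*x + 2 = 0 (a = 1, so two_a = 2) has roots 1 and 2
roots = quadratic_roots(2, -3, 2)
no_roots = quadratic_roots(2, 0, 1)  # x**2 + 1 = 0 has no real roots
```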

SLIDE 84

Algorithm

Aaaaaand put it all together

    def main(n=6):
        (y1, y2) = (1.0, 0.0)
        (x1, x2) = (0.0, 1.0)
        coords = [  # initial guess: straight line
            [x1 + (x2 - x1) * i / n, y1 + (y2 - y1) * i / n]
            for i in range(n + 1)
        ]
        f = falling_time
        h = 0.00001
        while newton_minimizer_step(f, coords, h) > 0.01:
            print(coords)

    if __name__ == '__main__':
        main()

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 15 / 25

SLIDE 85

Algorithm

(Demo)

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 16 / 25

SLIDE 91

Analysis

Error analysis: If x∞ is the root and ε_k = x_k − x∞ is the error, then:

    x_{k+1} − x∞ = (x_k − x∞) − f(x_k)/f′(x_k)    (Newton step)
    ε_{k+1} = ε_k − f(x_k)/f′(x_k)    (error step)
    ε_{k+1} = ε_k − [f(x∞) + ε_k f′(x∞) + ½ε_k² f′′(x∞) + · · ·] / [f′(x∞) + ε_k f′′(x∞) + · · ·]    (Taylor series; f(x∞) = 0)
    ε_{k+1} = [½ε_k² f′′(x∞) + · · ·] / [f′(x∞) + ε_k f′′(x∞) + · · ·]    (simplify)

As ε_k → 0, the “· · ·” terms are quickly dominated. Therefore:

If f′(x∞) ≈ 0, then ε_{k+1} ∝ ε_k (slow: # of correct digits adds)
Otherwise, we have ε_{k+1} ∝ ε_k² (fast: # of correct digits doubles)

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 17 / 25
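The digit-doubling behavior is visible in a few iterations (an illustrative run on f(x) = x² − 2, not from the slides):

```python
f = lambda x: x * x - 2.0   # root: sqrt(2)
df = lambda x: 2.0 * x
x = 1.0
errors = []
for _ in range(5):
    x = x - f(x) / df(x)    # Newton step
    errors.append(abs(x - 2.0 ** 0.5))
```

Here f′(√2) = 2√2 is far from 0, so the error is roughly squared each step: about 8.6e-2, 2.5e-3, 2.1e-6, 1.6e-12, then machine precision.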

SLIDE 98

Analysis

Some failure modes:

f is flat near root: too slow
f′(x) ≈ 0 at an iterate that is not a root =⇒ shoots off into infinity
Stable oscillation trap

[Figure: one plot per failure mode]

Intuition: Think adversarially: create a “tricky” f that looks root-less.
Obviously this is possible... just put the root far away.
Therefore Newton-Raphson can’t be foolproof.

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 18 / 25
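The oscillation trap can be reproduced exactly (a standard textbook example, not from the slides): on f(x) = x³ − 2x + 2, starting at x0 = 0, Newton bounces between 0 and 1 forever.

```python
f = lambda x: x ** 3 - 2 * x + 2
df = lambda x: 3 * x ** 2 - 2
x = 0.0
iterates = [x]
for _ in range(6):
    x = x - f(x) / df(x)  # Newton step
    iterates.append(x)
# A stable 2-cycle: 0 -> 1 -> 0 -> ..., never approaching the real root near -1.77
```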

SLIDE 103

Final thoughts

Notes: There are subtleties I brushed under the rug:

The physics is much more complicated (why?)
The numerical code can break easily (why?)
Can’t tell why? What happens if y1 = 0.5 instead of y1 = 1.0?

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 19 / 25

slide-111
SLIDE 111

Addendum 1

There’s never a one-size-fits-all solution Must always know something about problem structure Typical assumptions (stronger assumptions = better results): Vaguely predictable: Continuity Somewhat predictable: Differentiability Pretty predictable: Smoothness (infinite-differentiability) Extremely predictable: Analyticity (approximable by polynomial)

Function “equals” its infinite Taylor series

3Equivalent to complex-differentiability: f ′(x) = lim h→0

  • f (x + h) − f (x)
  • /h, h ∈ C.

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 20 / 25

slide-112
SLIDE 112

Addendum 1

There’s never a one-size-fits-all solution Must always know something about problem structure Typical assumptions (stronger assumptions = better results): Vaguely predictable: Continuity Somewhat predictable: Differentiability Pretty predictable: Smoothness (infinite-differentiability) Extremely predictable: Analyticity (approximable by polynomial)

Function “equals” its infinite Taylor series Also said to be holomorphic3

3Equivalent to complex-differentiability: f ′(x) = lim h→0

  • f (x + h) − f (x)
  • /h, h ∈ C.

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 20 / 25
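To make "function equals its infinite Taylor series" concrete, here is a minimal sketch (the function name `taylor_exp` is mine, not from the slides): exp is analytic, so summing its Taylor series at 0 actually reproduces the function.

```python
import math

# Analyticity illustrated: exp(x) "equals" its Taylor series sum_k x^k / k!,
# so summing enough terms reproduces the function to machine precision.
def taylor_exp(x, terms=30):
    total, term = 0.0, 1.0  # term holds x^k / k!
    for k in range(terms):
        total += term
        term *= x / (k + 1)
    return total

print(taylor_exp(1.0))  # close to math.exp(1.0) = 2.71828...
```

For a non-analytic function, no amount of Taylor data would suffice, which is exactly the point of the next slide.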

SLIDE 117

Addendum 1

Q: Does knowing f(x1), f′(x1), f′′(x1), … let you predict f(x2)?
A: Obviously! …not :)
Counterexample: f(x) = e^(−1/x) if x > 0, 0 otherwise

[Plot: graph of f on (0, 4); indistinguishable from 0 for x ≤ 0]

However, knowing the derivatives would be enough for analytic functions!

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 21 / 25
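The counterexample above is easy to poke at numerically. A short sketch (the helper name `f` is mine) showing how flat this function is near 0:

```python
import math

def f(x):
    # Smooth everywhere, but NOT analytic at 0: every derivative at 0 is 0,
    # so its Taylor series at 0 is identically zero -- yet f(x) > 0 for x > 0.
    return math.exp(-1.0 / x) if x > 0 else 0.0

# Near 0 from the right, f is astronomically small:
for x in (0.5, 0.1, 0.01):
    print(x, f(x))  # f(0.01) = e^(-100), on the order of 1e-44
```

All derivative information at any x ≤ 0 is identical to that of the zero function, so no Taylor data there can predict f(1) ≈ 0.37.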

SLIDE 124

Addendum 2

Fun facts:
Why are polynomials fundamental? Why not, say, exponentials?
  Pretty much everything is built on addition & multiplication!
  Study of polynomials = study of addition & multiplication
Polynomials are awesome
  Polynomials can approximate real-world functions very well
  Pretty much everything about polynomials has been solved:
    Global root bound (Fujiwara⁴) =⇒ you know where to start
    Minimal root separation (Mahler) =⇒ you know when to stop
    Guaranteed root-finding (Sturm) =⇒ you can binary-search

⁴If a_n x^n + ⋯ + a_1 x + a_0 = 0 with a_n ≠ 0, then |x| ≤ 2 max_k |a_{n−k}/a_n|^(1/k).

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 22 / 25
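The root bound in the footnote is a one-liner to implement. A hedged sketch (the function name is mine), using the simplified Fujiwara bound |x| ≤ 2·max_k |a_{n−k}/a_n|^(1/k):

```python
def fujiwara_bound(coeffs):
    # coeffs[k] is the coefficient of x^k; the leading coefficient
    # coeffs[-1] (a_n) must be nonzero. Returns R such that every root
    # satisfies |x| <= R, via the bound 2 * max_k |a_{n-k}/a_n|^(1/k).
    n = len(coeffs) - 1
    an = coeffs[-1]
    return 2.0 * max(abs(coeffs[n - k] / an) ** (1.0 / k)
                     for k in range(1, n + 1))

# x^2 - 3x + 2 = (x - 1)(x - 2): roots 1 and 2, both inside the bound.
print(fujiwara_bound([2.0, -3.0, 1.0]))  # 2 * max(3, sqrt(2)) = 6.0
```

This is "where to start": a bisection-style root search (e.g. with Sturm sequences) can begin on the interval [−R, R].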

SLIDE 129

Addendum 2

By contrast: Unlike + and ×, exponentiation is not well-understood!
Table-maker's dilemma (Prof. William Kahan):
  Nobody knows the cost of computing x^y with correct rounding (!)
  We don't even know if it's possible with finite memory (!!!)
So, polynomials are really nice!

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 23 / 25
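Python's own decimal module wears this on its sleeve: its documentation promises that power() with a non-integer exponent is correctly rounded only "almost always", precisely because of the table-maker's dilemma. A small illustrative sketch (the specific values are mine):

```python
from decimal import Decimal, getcontext

# The table-maker's dilemma in the standard library: decimal documents
# its non-integer-exponent power() results as correctly rounded only
# "almost always" -- no one can bound the precision needed in advance.
getcontext().prec = 28
x, y = Decimal("2.1"), Decimal("3.7")
print(x ** y)      # 28-digit result for 2.1^3.7
print(2.1 ** 3.7)  # double-precision result for comparison
```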

SLIDE 137

Addendum 3

Fun fact: If f is analytic, you can compute f′ by evaluating f only once!
Any guesses how? Complex-step differentiation!

f(x + ih) ≈ f(x) + ih f′(x)
Im[f(x + ih)] ≈ h f′(x)   (imaginary parts match)
f′(x) ≈ Im[f(x + ih)] / h

Features:
  More accurate: avoids "catastrophic cancellation" in subtraction
  Faster (sometimes): f evaluated only once
  Difficult for ≥ 2nd derivatives (need multicomplex numbers)

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 24 / 25
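Complex-step differentiation fits in a few lines of Python. A sketch (function names and the test function are mine) contrasting it with the forward difference, which loses digits to catastrophic cancellation:

```python
import cmath
import math

def complex_step(f, x, h=1e-20):
    # Evaluate f ONCE at x + ih; the imaginary part is h*f'(x) + O(h^3),
    # with no subtraction and hence no catastrophic cancellation.
    return f(x + 1j * h).imag / h

def forward_diff(f, x, h=1e-8):
    # Classic finite difference: the subtraction cancels leading digits.
    return (f(x + h) - f(x)) / h

f = lambda z: cmath.exp(z) * cmath.sin(z)  # analytic test function
exact = math.exp(1.3) * (math.sin(1.3) + math.cos(1.3))
print(abs(complex_step(f, 1.3) - exact))   # near machine precision
print(abs(forward_diff(f, 1.3) - exact))   # only about half the digits survive
```

Note that f must accept complex arguments (hence cmath), which is exactly the "analytic" requirement from the slide.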

SLIDE 139

Done!

Hope you learned something new!
P.S.: Did you prefer the coding part? Or the math part?

Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 25 / 25