

SLIDE 1

Optimization for Machine Learning
Lecture 1: Introduction to Convexity

S.V.N. (Vishy) Vishwanathan
Purdue University
vishy@purdue.edu

July 12, 2012

S.V.N. Vishwanathan (Purdue University) Optimization for Machine Learning 1 / 43

SLIDE 2

Regularized Risk Minimization

Machine Learning
- We want to build a model which predicts well on data
- A model's performance is quantified by a loss function (a sophisticated discrepancy score)
- Our model must generalize to unseen data
- Avoid over-fitting by penalizing complex models (regularization)

More Formally
- Training data: {x_1, ..., x_m}
- Labels: {y_1, ..., y_m}
- Learn a vector w by solving

  minimize_w J(w) := λΩ(w) + (1/m) Σ_{i=1}^{m} l(x_i, y_i, w)

  where λΩ(w) is the regularizer and (1/m) Σ_{i=1}^{m} l(x_i, y_i, w) is the empirical risk R_emp


SLIDE 7

Convex Functions and Sets

Outline
1. Convex Functions and Sets
2. Operations Which Preserve Convexity
3. First Order Properties
4. Subgradients
5. Constraints
6. Warmup: Minimizing a 1-d Convex Function
7. Warmup: Coordinate Descent

SLIDE 8

Convex Functions and Sets

Focus of my Lectures



SLIDE 11

Convex Functions and Sets

Disclaimer
- My focus is on showing connections between various methods
- I will sacrifice mathematical rigor and focus on intuition

SLIDE 12

Convex Functions and Sets

Convex Function
A function f is convex if, and only if, for all x, x′ and λ ∈ (0, 1):

  f(λx + (1 − λ)x′) ≤ λf(x) + (1 − λ)f(x′)
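The defining chord inequality is easy to test numerically. Below is a small Python sketch (illustrative only; the function and variable names are my own, not from the lecture) that checks it on a grid of sample points:

```python
import numpy as np

def is_convex_on_samples(f, xs, n_lambdas=25, tol=1e-9):
    """Numerically test f(λx + (1-λ)x') <= λf(x) + (1-λ)f(x') on sample pairs."""
    lambdas = np.linspace(0.01, 0.99, n_lambdas)
    for x in xs:
        for xp in xs:
            for lam in lambdas:
                lhs = f(lam * x + (1 - lam) * xp)
                rhs = lam * f(x) + (1 - lam) * f(xp)
                if lhs > rhs + tol:
                    return False  # found a chord that lies below the function
    return True

xs = np.linspace(-3, 3, 21)
print(is_convex_on_samples(lambda x: 0.5 * x**2, xs))       # square norm: True
print(is_convex_on_samples(lambda x: max(0.0, 1 - x), xs))  # hinge loss: True
print(is_convex_on_samples(np.sin, xs))                     # sine: False
```

Of course a sampled check can only refute convexity, never prove it, but it is a handy sanity test for the examples later in the lecture.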

SLIDE 13

Convex Functions and Sets

Convex Function
A function f is strictly convex if, and only if, for all x, x′ and λ ∈ (0, 1):

  f(λx + (1 − λ)x′) < λf(x) + (1 − λ)f(x′)

SLIDE 14

Convex Functions and Sets

Convex Function
A function f is σ-strongly convex if, and only if, f(·) − (σ/2)‖·‖² is convex. That is, for all x, x′ and λ ∈ (0, 1):

  f(λx + (1 − λ)x′) ≤ λf(x) + (1 − λ)f(x′) − (σ/2) λ(1 − λ) ‖x − x′‖²

SLIDE 15

Convex Functions and Sets

Exercise: Jensen's Inequality
Extend the definition of convexity to show that if f is convex, then for all λ_i ≥ 0 such that Σ_i λ_i = 1 we have

  f(Σ_i λ_i x_i) ≤ Σ_i λ_i f(x_i)

SLIDE 16

Convex Functions and Sets

Some Familiar Examples
f(x) = (1/2) x² (square norm)

SLIDE 17

Convex Functions and Sets

Some Familiar Examples
f(x, y) = (1/2) [x y] A [x y]ᵀ with A = [[10, 1], [2, 1]]

SLIDE 18

Convex Functions and Sets

Some Familiar Examples
f(x) = x log x + (1 − x) log(1 − x) (negative entropy)

SLIDE 19

Convex Functions and Sets

Some Familiar Examples
f(x, y) = x log x + y log y − x − y (un-normalized negative entropy)

SLIDE 20

Convex Functions and Sets

Some Familiar Examples
f(x) = max(0, 1 − x) (hinge loss)

SLIDE 21

Convex Functions and Sets

Some Other Important Examples
- Linear functions: f(x) = ax + b
- Softmax: f(x) = log Σ_i exp(x_i)
- Norms: for example the 2-norm f(x) = √(Σ_i x_i²)
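These examples can be spot-checked along random chords. A short sketch (my own code, not part of the slides) for the softmax and the 2-norm:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax_f(x):
    """The 'softmax' function of the slide: log-sum-exp."""
    return np.log(np.sum(np.exp(x)))

def two_norm(x):
    return np.sqrt(np.sum(x**2))

def convex_along_random_chords(f, dim=4, trials=200, tol=1e-9):
    """Test the chord inequality on random pairs of points in R^dim."""
    for _ in range(trials):
        x, xp = rng.normal(size=dim), rng.normal(size=dim)
        lam = rng.uniform(0.01, 0.99)
        if f(lam * x + (1 - lam) * xp) > lam * f(x) + (1 - lam) * f(xp) + tol:
            return False
    return True

print(convex_along_random_chords(softmax_f))  # True
print(convex_along_random_chords(two_norm))   # True
```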

SLIDE 22

Convex Functions and Sets

Convex Sets
A set C is convex if, and only if, for all x, x′ ∈ C and λ ∈ (0, 1) we have λx + (1 − λ)x′ ∈ C

SLIDE 23

Convex Functions and Sets

Convex Sets and Convex Functions
A function f is convex if, and only if, its epigraph is a convex set

SLIDE 24

Convex Functions and Sets

Convex Sets and Convex Functions
Indicator functions of convex sets are convex:

  I_C(x) = 0 if x ∈ C, and ∞ otherwise.

SLIDE 25

Convex Functions and Sets

Below Sets of Convex Functions
f(x, y) = x² + y²

SLIDE 26

Convex Functions and Sets

Below Sets of Convex Functions
f(x, y) = x log x + y log y − x − y

SLIDE 27

Convex Functions and Sets

Below Sets of Convex Functions
- If f is convex, then all its level sets are convex
- Is the converse true? (Exercise: construct a counter-example)

SLIDE 28

Convex Functions and Sets

Minima on Convex Sets
- The set of minima of a convex function is a convex set
- Proof: consider the set {x : f(x) ≤ f*}

SLIDE 29

Convex Functions and Sets

Minima on Convex Sets
- The set of minima of a strictly convex function is a singleton
- Proof: try this at home!


SLIDE 31

Operations Which Preserve Convexity

Set Operations
- Intersection of convex sets is convex
- Image of a convex set under a linear transformation is convex
- Inverse image of a convex set under a linear transformation is convex

SLIDE 32

Operations Which Preserve Convexity

Function Operations
- Linear combination with non-negative weights: f(x) = Σ_i w_i f_i(x) with w_i ≥ 0
- Pointwise maximum: f(x) = max_i f_i(x)
- Composition with an affine function: f(x) = g(Ax + b)
- Projection along a direction: f(η) = g(x_0 + ηd)
- Restricting the domain to a convex set: f(x) s.t. x ∈ C
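The pointwise-maximum rule can be illustrated numerically. The sketch below (my own illustration; the affine pieces are arbitrary) checks the chord inequality for a maximum of affine functions, each of which is convex:

```python
import numpy as np

# A few affine pieces f_i(x) = a_i * x + b_i; each is convex (indeed linear).
pieces = [(-2.0, 0.0), (0.5, -1.0), (3.0, -4.0)]

def piecewise_max(x):
    """Pointwise maximum of the affine pieces."""
    return max(a * x + b for a, b in pieces)

# Check the convexity inequality for the pointwise maximum on a grid.
xs = np.linspace(-3, 3, 31)
ok = all(
    piecewise_max(lam * x + (1 - lam) * xp)
    <= lam * piecewise_max(x) + (1 - lam) * piecewise_max(xp) + 1e-9
    for x in xs for xp in xs for lam in (0.25, 0.5, 0.75)
)
print(ok)  # True
```

Note that the pointwise minimum of these pieces would fail the same check: minima of convex functions are not convex in general.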

SLIDE 33

Operations Which Preserve Convexity

One Quick Example
The piecewise linear function f(x) := max_i ⟨u_i, x⟩ is convex


SLIDE 35

First Order Properties

First Order Taylor Expansion
The first order Taylor approximation globally lower bounds the function: for any x and x′ we have

  f(x) ≥ f(x′) + ⟨x − x′, ∇f(x′)⟩

SLIDE 36

First Order Properties

Bregman Divergence
For any x and x′ the Bregman divergence defined by f is given by

  Δ_f(x, x′) = f(x) − f(x′) − ⟨x − x′, ∇f(x′)⟩

SLIDE 37

First Order Properties

Euclidean Distance Squared
Recall that the Bregman divergence defined by f is Δ_f(x, x′) = f(x) − f(x′) − ⟨x − x′, ∇f(x′)⟩.

Use f(x) = (1/2)‖x‖² and verify that

  Δ_f(x, x′) = (1/2)‖x − x′‖²

SLIDE 38

First Order Properties

Unnormalized Relative Entropy
Recall that the Bregman divergence defined by f is Δ_f(x, x′) = f(x) − f(x′) − ⟨x − x′, ∇f(x′)⟩.

Use f(x) = Σ_i x_i log x_i − x_i and verify that

  Δ_f(x, x′) = Σ_i (x_i log x_i − x_i − x_i log x′_i + x′_i)
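The general Bregman divergence is a one-liner given f and its gradient. The sketch below (helper names are mine) verifies both closed forms above at a sample pair of points:

```python
import numpy as np

def bregman(f, grad_f, x, xp):
    """Δ_f(x, x') = f(x) - f(x') - <x - x', ∇f(x')>."""
    return f(x) - f(xp) - np.dot(x - xp, grad_f(xp))

# f(x) = 1/2 ||x||^2  gives  Δ_f(x, x') = 1/2 ||x - x'||^2
sq = lambda x: 0.5 * np.dot(x, x)
sq_grad = lambda x: x

# f(x) = Σ_i x_i log x_i - x_i  gives the unnormalized relative entropy
ent = lambda x: np.sum(x * np.log(x) - x)
ent_grad = lambda x: np.log(x)

x, xp = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(np.isclose(bregman(sq, sq_grad, x, xp),
                 0.5 * np.dot(x - xp, x - xp)))        # True
print(np.isclose(bregman(ent, ent_grad, x, xp),
                 np.sum(x * np.log(x / xp) - x + xp)))  # True
```

The second check uses the algebraically equivalent form Σ_i x_i log(x_i/x′_i) − x_i + x′_i.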

SLIDE 39

First Order Properties

Identifying the Minimum
Let f : X → R be a differentiable convex function. Then x is a minimizer of f if, and only if,

  ⟨x′ − x, ∇f(x)⟩ ≥ 0 for all x′.

- One way to ensure this is to set ∇f(x) = 0
- Minimizing a smooth convex function is the same as finding an x such that ∇f(x) = 0


SLIDE 41

Subgradients

What if the Function is Non-Smooth?
The piecewise linear function f(x) := max_i ⟨u_i, x⟩ is convex but not differentiable at the kinks!

SLIDE 42

Subgradients

Subgradients to the Rescue
A subgradient at x′ is any vector s which satisfies

  f(x) ≥ f(x′) + ⟨x − x′, s⟩ for all x

The set of all subgradients at x′ is denoted ∂f(x′)


SLIDE 45

Subgradients

Example
f(x) = |x| and ∂f(0) = [−1, 1]
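The subgradient inequality at x′ = 0 for f(x) = |x| can be checked directly. A small sketch (function names are mine):

```python
import numpy as np

def is_subgradient_at_zero(s, xs, tol=1e-12):
    """Check f(x) >= f(0) + <x - 0, s> for f(x) = |x| at x' = 0."""
    return all(abs(x) >= 0.0 + x * s - tol for x in xs)

xs = np.linspace(-3, 3, 61)
print(is_subgradient_at_zero(0.5, xs))   # True: 0.5 ∈ [-1, 1]
print(is_subgradient_at_zero(-1.0, xs))  # True: boundary of ∂f(0)
print(is_subgradient_at_zero(1.5, xs))   # False: 1.5 lies outside [-1, 1]
```

Any slope in [−1, 1] gives a line through the origin lying below |x|; any slope outside that interval crosses the graph.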

SLIDE 46

Subgradients

Identifying the Minimum
Let f : X → R be a convex function. Then x is a minimizer of f if, and only if, there exists a μ ∈ ∂f(x) such that

  ⟨x′ − x, μ⟩ ≥ 0 for all x′.

One way to ensure this is to ensure that 0 ∈ ∂f(x)


SLIDE 48

Constraints

A Simple Example
Minimize (1/2) w² s.t. 1 ≤ w ≤ 2

SLIDE 49

Constraints

Projection

  P_C(x′) := argmin_{x ∈ C} ‖x − x′‖²

Assignment: compute P_C(x′) when C = {x s.t. l ≤ x_i ≤ u}

SLIDE 50

Constraints

First Order Conditions For Constrained Problems

  x = P_C(x − ∇f(x))

- If x − ∇f(x) ∈ C, then P_C(x − ∇f(x)) = x implies that ∇f(x) = 0
- Otherwise, it shows that the constraints are preventing further progress in the direction of descent
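This fixed-point condition is easy to see in action on the simple example from slide 48. The sketch below is my own (the step size 0.5 is an arbitrary choice): it uses the fact that projection onto a box is a coordinate-wise clip, iterates the projected gradient map, and confirms the fixed-point condition at the limit:

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto the box {x : lo <= x_i <= hi} is a clip."""
    return np.clip(x, lo, hi)

def grad_f(x):
    """Gradient of f(w) = 1/2 w^2."""
    return x

# Projected gradient iteration for: minimize 1/2 w^2  s.t.  1 <= w <= 2
w = np.array([1.7])
for _ in range(100):
    w = project_box(w - 0.5 * grad_f(w), 1.0, 2.0)

print(w)  # converges to the constrained minimum w = 1
# At the limit the fixed-point condition x = P_C(x - ∇f(x)) holds:
print(np.allclose(w, project_box(w - grad_f(w), 1.0, 2.0)))  # True
```

Here ∇f(1) = 1 ≠ 0, so the unconstrained optimality condition fails; it is the constraint w ≥ 1 that blocks further descent, exactly as the slide describes.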


SLIDE 52

Warmup: Minimizing a 1-d Convex Function

Problem Statement
Given a black-box which can compute J : R → R and J′ : R → R, find the minimum value of J

SLIDE 53

Warmup: Minimizing a 1-d Convex Function

Increasing Gradients
From the first order conditions:

  J(w) ≥ J(w′) + (w − w′) · J′(w′)   and   J(w′) ≥ J(w) + (w′ − w) · J′(w)

Add the two:

  (w − w′) · (J′(w) − J′(w′)) ≥ 0

So w ≥ w′ implies that J′(w) ≥ J′(w′)


SLIDE 58

Warmup: Minimizing a 1-d Convex Function

Problem Restatement
Identify the point where the increasing function J′ crosses zero

SLIDE 59

Warmup: Minimizing a 1-d Convex Function

Bisection Algorithm
[Figure: J′(w) with interval endpoints L and U, repeatedly bisected at the midpoint M]

SLIDE 64

Warmup: Minimizing a 1-d Convex Function

Interval Bisection
Require: L, U, ε
 1: maxgrad ← J′(U)
 2: while (U − L) · maxgrad > ε do
 3:   M ← (U + L)/2
 4:   if J′(M) > 0 then
 5:     U ← M
 6:   else
 7:     L ← M
 8:   end if
 9: end while
10: return (U + L)/2
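The pseudocode above translates directly to Python. A sketch (the test objective J(w) = (w − 3)² is my own choice, with derivative J′(w) = 2(w − 3)):

```python
def interval_bisection(grad, L, U, eps=1e-10):
    """Find the zero crossing of the increasing function grad on [L, U]."""
    maxgrad = grad(U)
    while (U - L) * maxgrad > eps:
        M = (U + L) / 2
        if grad(M) > 0:      # minimizer lies to the left of M
            U = M
        else:                # minimizer lies to the right of M
            L = M
    return (U + L) / 2

# Minimize J(w) = (w - 3)^2 via its derivative J'(w) = 2(w - 3)
w_star = interval_bisection(lambda w: 2 * (w - 3), L=-10.0, U=10.0)
print(w_star)  # approximately 3.0
```

Each iteration halves the interval, so the loop terminates after roughly log2((U − L) · maxgrad / ε) steps.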


SLIDE 66

Warmup: Coordinate Descent

Problem Statement
Given a black-box which can compute J : Rⁿ → R and J′ : Rⁿ → Rⁿ, find the minimum value of J

SLIDE 67

Warmup: Coordinate Descent

Concrete Example
f(x, y) = (1/2) [x y] A [x y]ᵀ with A = [[10, 1], [2, 1]]

SLIDE 68

Warmup: Coordinate Descent

Concrete Example
Fix y = 3:

  f(x, 3) = (1/2) [x 3] A [x 3]ᵀ with A = [[10, 1], [2, 1]]

SLIDE 69

Warmup: Coordinate Descent

Concrete Example

  f(x, 3) = 5x² + (9/2)x + 9/2

Minimum: x = −9/20


SLIDE 72

Warmup: Coordinate Descent

Concrete Example

  f(−9/20, y) = (1/2)y² − (27/40)y + 81/80

Minimum: y = 27/40

SLIDE 74

Warmup: Coordinate Descent

Concrete Example
[Figure: f(x, 27/40) as a function of x, with the previous iterate x = −9/20 marked]

Are we done?
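No: minimizing over x again from x = −9/20 moves the iterate, and the process repeats. The full loop can be sketched in a few lines of Python (my own code, not the lecture's; the closed-form coordinate minimizers are derived in the comments):

```python
import numpy as np

A = np.array([[10.0, 1.0],
              [2.0, 1.0]])      # the matrix from the concrete example

def f(v):
    """f(x, y) = 1/2 [x y] A [x y]^T = 5x^2 + 1.5xy + 0.5y^2."""
    return 0.5 * v @ A @ v

# Exact 1-d minimization along each coordinate (the quadratic has closed forms):
#   fixing y: d/dx [5x^2 + 1.5xy] = 10x + 1.5y = 0  =>  x = -0.15 y
#   fixing x: d/dy [0.5y^2 + 1.5xy] = y + 1.5x = 0  =>  y = -1.5 x
v = np.array([0.0, 3.0])        # start at y = 3 as in the slides
for _ in range(50):
    v[0] = -0.15 * v[1]         # minimize over x with y fixed
    v[1] = -1.5 * v[0]          # minimize over y with x fixed

print(v)  # converges to the unconstrained minimum (0, 0)
```

The first pass reproduces the slides exactly: x = −0.15 · 3 = −9/20, then y = −1.5 · (−9/20) = 27/40. Each subsequent pass shrinks x by a factor of 0.225, so the iterates converge geometrically to the minimizer (0, 0).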