
Optimizing Costly Functions with Simple Constraints:
A Limited-Memory Projected Quasi-Newton Algorithm

Mark Schmidt, Ewout van den Berg, Michael P. Friedlander, and Kevin Murphy
Department of Computer Science, University of British Columbia

April 18, 2009

Outline

1. Introduction
   - Motivating Problem
   - Our Contribution
2. PQN Algorithm
   - Projected Newton Algorithm
   - Limited-Memory BFGS Updates
   - Spectral Projected Gradient
   - Projection onto Norm-Balls
3. Experiments
   - Gaussian Graphical Model Structure Learning
   - Markov Random Field Structure Learning
4. Discussion

Motivating Problem: Structure Learning in Discrete MRFs

We want to fit a Markov random field to discrete data y, but we don't know the graph structure.

[Figure: nodes Y1-Y4 with '?' marks on the candidate edges between them.]

We can learn a sparse structure by using ℓ1-regularization of the edge parameters [Wainwright et al. 2006, Lee et al. 2006]. Since each edge has multiple parameters, we use group ℓ1-regularization [Bach et al. 2004, Turlach et al. 2005, Yuan & Lin 2006]:

    minimize_w  −log p(y|w)   subject to   Σe ‖we‖2 ≤ τ

Optimization Problem Challenges

Solving this optimization problem has three complicating factors:

1. the number of parameters is large
2. evaluating the objective is expensive
3. the parameters have constraints

So how should we solve it?
- Interior-point methods: the number of parameters is too large.
- Projected gradient: evaluating the objective is too expensive.
- Quasi-Newton methods (L-BFGS): we have constraints.

Extending the L-BFGS Algorithm

Quasi-Newton methods that use L-BFGS updates achieve state-of-the-art performance for unconstrained differentiable optimization [Nocedal 1980, Liu & Nocedal 1989].

L-BFGS updates have also been used for more general problems:
- L-BFGS-B: state-of-the-art performance for bound-constrained optimization [Byrd et al. 1995]
- OWL-QN: state-of-the-art performance for ℓ1-regularized optimization [Andrew & Gao 2007]

The above don't apply since our constraints are not separable. However, the constraints are still simple: we can compute the projection in O(n).

Our Contribution

This talk presents an extension of L-BFGS that is suitable when:

1. the number of parameters is large
2. evaluating the objective is expensive
3. the parameters have constraints
4. projecting onto the constraints is substantially cheaper than evaluating the objective function

The method uses a two-level strategy:
- At the outer level, L-BFGS updates build a constrained local quadratic approximation to the function.
- At the inner level, SPG uses projections to minimize this constrained quadratic approximation.

Problem Statement and Assumptions

We address the problem of minimizing a differentiable function f(x) over a convex set C:

    minimize_x  f(x)   subject to   x ∈ C

We assume you can compute the objective f(x), the gradient ∇f(x), and the projection PC(x):

    PC(x) = argmin_c ‖c − x‖2   subject to   c ∈ C
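For concreteness, here is a minimal Python sketch (ours, not from the talk) of two projections that satisfy this assumption; the function names are our own:

    import numpy as np

    def project_l2_ball(x, tau):
        """P_C for C = {c : ||c||_2 <= tau}: rescale x if it lies outside, O(n)."""
        nrm = np.linalg.norm(x)
        return x if nrm <= tau else (tau / nrm) * x

    def project_box(x, lo, hi):
        """P_C for the box C = {c : lo <= c <= hi}: elementwise clipping, O(n)."""
        return np.clip(x, lo, hi)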

PG: Projected Gradient Algorithm

PG: move towards the projection of the negative gradient.

[Figure: contours of f(x) over the feasible set; from the iterate xk, the point xk − gk leaves the set, its projection P(xk − gk) returns to it, and the difference gives the search direction dk.]
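A minimal sketch of the basic PG iteration, assuming a fixed step size alpha and a `project` callable implementing P_C (both names are ours):

    import numpy as np

    def projected_gradient(grad, project, x0, alpha=1e-2, max_iter=1000, tol=1e-6):
        """Basic projected gradient: x_{k+1} = P_C(x_k - alpha * g_k)."""
        x = project(x0)
        for _ in range(max_iter):
            x_new = project(x - alpha * grad(x))
            if np.linalg.norm(x_new - x) < tol:   # stop when the iterates stall
                break
            x = x_new
        return x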

Naive Projected Newton Algorithm

The problem with projected gradient: slow convergence.
Can we speed this up by projecting the Newton direction?

[Figure: from xk, the unconstrained Newton step xk − Bk⁻¹gk minimizes the quadratic model q(x) but leaves the feasible set; its projection P(xk − Bk⁻¹gk) is shown alongside the projected gradient point P(xk − gk).]

NO! This can point in the wrong direction.

Naive Projected Newton Algorithm: Problem

[Figure: a configuration where the projected gradient point P(xk − gk) gives a descent direction, but the projection of the Newton step, P(xk − Bk⁻¹gk), points away from the constrained minimizer of the model q(x).]

Correct Projected Newton Algorithm

In projected Newton methods, we form a quadratic approximation to the function around xk:

    qk(x) ≡ fk + (x − xk)ᵀ∇f(xk) + ½(x − xk)ᵀBk(x − xk)

At each iteration, we minimize this function over the set:

    minimize_x  qk(x)   subject to   x ∈ C

This is NOT the same as projecting the unconstrained Newton step. It generates a feasible descent direction dk ≡ x − xk. The method has a quadratic rate of convergence around a local minimizer [Bertsekas, 1999].
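A small sketch of the model qk and its gradient; B_mv is our name for a callable computing Bk·v, so the same code works for a dense Bk or for the compact L-BFGS form introduced later:

    import numpy as np

    def make_quadratic_model(f_k, g_k, B_mv, x_k):
        """q_k(x) = f_k + (x - x_k)^T g_k + 0.5 (x - x_k)^T B_k (x - x_k)."""
        def q(x):
            d = x - x_k
            return f_k + d @ g_k + 0.5 * (d @ B_mv(d))
        def q_grad(x):
            return g_k + B_mv(x - x_k)          # grad q_k(x) = g_k + B_k (x - x_k)
        return q, q_grad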

Projected Newton Algorithm

[Figure: the quadratic model q(x) is minimized directly over the feasible set; its constrained minimizer argmin_{x∈C} q(x) defines the direction dk, which differs from the simple projection P(xk − gk).]

Problems with the Projected Newton Algorithm

Unfortunately, the projected Newton method can be inefficient:
- Computing dk may be very expensive.
- Using a general n-by-n matrix Bk is impractical.

Our algorithm is a projected quasi-Newton (PQN) algorithm where:
- L-BFGS updates construct a diagonal-plus-low-rank Bk.
- SPG efficiently computes dk with this Bk and projections.

Broyden-Fletcher-Goldfarb-Shanno (BFGS) Updates

Quasi-Newton methods work with parameter and gradient differences between iterations:

    sk ≡ xk+1 − xk   and   yk ≡ gk+1 − gk

They start with an initial approximation B0 ≡ σI, and choose Bk+1 to interpolate the gradient difference:

    Bk+1 sk = yk

Since Bk+1 is not unique, the BFGS method chooses the matrix whose difference with Bk minimizes a weighted Frobenius norm:

    Bk+1 = Bk − (Bk sk skᵀ Bk)/(skᵀ Bk sk) + (yk ykᵀ)/(ykᵀ sk)
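As a dense-matrix illustration of this update (the method in this talk uses the limited-memory form on the next slide, never an explicit Bk):

    import numpy as np

    def bfgs_update(B, s, y):
        """One BFGS update of B with s = x_{k+1} - x_k, y = g_{k+1} - g_k.
        Assumes the curvature condition s^T y > 0 holds."""
        Bs = B @ s
        return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)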

L-BFGS: Limited-Memory BFGS

Instead of storing Bk, the limited-memory BFGS (L-BFGS) method just stores the previous m differences sk and yk [Nocedal 1980, Liu & Nocedal 1989].

These updates applied to B0 = σk I can be written compactly in a diagonal-plus-low-rank form [Byrd et al. 1994]:

    Bk = σk I − N M⁻¹ Nᵀ

This representation makes multiplication with Bk cost O(mn).
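A sketch of the compact representation of Byrd et al. [1994], exposing Bk only through matrix-vector products; the helper name and the explicit O(m³) inverse of the small matrix M are our simplifications:

    import numpy as np

    def make_lbfgs_matvec(S, Y, sigma):
        """v -> B v for B = sigma*I - N M^{-1} N^T, with the m stored pairs
        as columns of S and Y [Byrd et al. 1994]. Each product costs O(mn)."""
        SY = S.T @ Y
        L = np.tril(SY, -1)                    # (L)_{ij} = s_i^T y_j for i > j
        D = np.diag(np.diag(SY))               # D = diag(s_i^T y_i)
        N = np.hstack([sigma * S, Y])          # n x 2m
        M = np.block([[sigma * (S.T @ S), L],
                      [L.T, -D]])              # 2m x 2m, cheap to invert
        M_inv = np.linalg.inv(M)
        return lambda v: sigma * v - N @ (M_inv @ (N.T @ v))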

SPG: Spectral Projected Gradient

Recall the projected quasi-Newton sub-problem:

    minimize_x  fk + (x − xk)ᵀ∇f(xk) + ½(x − xk)ᵀBk(x − xk)   subject to   x ∈ C

With the L-BFGS representation of Bk, we can compute the sub-problem's objective function and gradient in O(mn). This still doesn't let us efficiently solve the problem. To solve it, we use the spectral projected gradient (SPG) algorithm.

SPG: Spectral Projected Gradient

The classic projected gradient method takes steps of the form

    xk+1 = PC(xk − α gk)

SPG has two enhancements [Birgin et al. 2000]:
- It uses the Barzilai and Borwein [1988] 'spectral' step length:

      αbb = ⟨yk−1, yk−1⟩ / ⟨sk−1, yk−1⟩

- It uses a non-monotone line search [Grippo et al. 1986].
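A minimal SPG sketch combining the two enhancements; as illustrative choices of ours, it uses the ⟨s,s⟩/⟨s,y⟩ variant of the BB step as the raw step length and a simple Grippo-style non-monotone backtracking rule:

    import numpy as np

    def spg(f, grad, project, x0, max_iter=500, memory=10, tol=1e-8):
        """Spectral projected gradient [Birgin et al. 2000], sketched."""
        x = project(x0)
        g = grad(x)
        alpha, f_hist = 1.0, [f(x)]
        for _ in range(max_iter):
            d = project(x - alpha * g) - x      # projected step as a direction
            if np.linalg.norm(d) < tol:
                break
            f_ref = max(f_hist[-memory:])       # non-monotone reference value
            t = 1.0
            while f(x + t * d) > f_ref + 1e-4 * t * (g @ d):
                t *= 0.5                        # backtrack on the step
            x_new = x + t * d
            g_new = grad(x_new)
            s, y = x_new - x, g_new - g
            sy = s @ y
            alpha = (s @ s) / sy if sy > 1e-12 else 1.0   # BB 'spectral' step
            x, g = x_new, g_new
            f_hist.append(f(x))
        return x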

Barzilai & Borwein Step Size

[Figure: 'Barzilai-Borwein Steepest Descent', comparing the iterate paths of gradient descent and the Barzilai-Borwein step on a quadratic.]

SPG: Spectral Projected Gradient

There is growing interest in SPG for constrained optimization [Dai & Fletcher 2005, van den Berg & Friedlander 2008].

We apply SPG to minimize the strictly convex constrained quadratic approximations. Friedlander et al. [1999] show that SPG has a superlinear convergence rate for minimizing strictly convex quadratics.

Instead of 'solving' the sub-problem, we could just perform k iterations of SPG to improve the steepest-descent direction. In this case, solving the sub-problems is in O(mnk), plus the cost of computing the projection k times.

Outline of the Method

The projected quasi-Newton (PQN) method:

1. Evaluate the current objective function and gradient.
2. Add/remove difference vectors for L-BFGS.
3. Run SPG to compute the projected quasi-Newton direction dk.
4. Generate the next iterate with a backtracking line search.

The overall algorithm will be most effective when computing projections is cheaper than evaluating the objective; a sketch of the loop follows.
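Putting the pieces together, a sketch of the four steps that reuses make_lbfgs_matvec, make_quadratic_model, and spg from the earlier sketches; this is our illustration, not the authors' reference implementation:

    import numpy as np

    def pqn(f, grad, project, x0, m=10, max_iter=100, tol=1e-6):
        """Projected quasi-Newton sketch: outer L-BFGS model, inner SPG solve."""
        x = project(x0)
        g = grad(x)
        S, Y = [], []
        for _ in range(max_iter):
            # Steps 1-2: evaluate and build B_k from the last m difference pairs.
            if S:
                sigma = (Y[-1] @ Y[-1]) / (S[-1] @ Y[-1])
                B_mv = make_lbfgs_matvec(np.column_stack(S), np.column_stack(Y), sigma)
            else:
                B_mv = lambda v: v              # B_0 = I before any updates
            # Step 3: SPG on the constrained quadratic model gives d_k.
            q, q_grad = make_quadratic_model(f(x), g, B_mv, x)
            d = spg(q, q_grad, project, x, max_iter=25) - x
            if np.linalg.norm(d) < tol:
                break
            # Step 4: backtracking line search on the true (costly) objective.
            t, fx = 1.0, f(x)
            while f(x + t * d) > fx + 1e-4 * t * (g @ d):
                t *= 0.5
            x_new = x + t * d
            g_new = grad(x_new)
            s, y = x_new - x, g_new - g
            if s @ y > 1e-12:                   # keep only curvature pairs
                S, Y = (S + [s])[-m:], (Y + [y])[-m:]
            x, g = x_new, g_new
        return x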

Projection onto Norm-Balls

We are interested in projecting onto balls induced by norms:

    C ≡ {x : ‖x‖ ≤ τ}

This projection can be computed in linear time for many ℓp-norms, such as the ℓ2-, ℓ∞-, and ℓ1-norms [Duchi et al. 2008].

We are also interested in the mixed (p,q)-norm balls that arise in group variable selection:

    ‖x‖p,q = (Σi ‖xσi‖q^p)^{1/p}

The group-lasso is the special case where p = 1, q = 2:

    ‖x‖1,2 = Σi ‖xσi‖2

Projection onto Mixed Norm-Balls

The following proposition leads to an expected linear-time randomized algorithm for group-lasso projection:

Proposition. Consider c ∈ Rⁿ and a set of g disjoint groups {σi}, i = 1, …, g, such that ∪i σi = {1, …, n}. Then the Euclidean projection x = PC(c) onto the ℓ1,2-norm ball of radius τ is given by xσi = sgn(cσi) · wi, i = 1, …, g, where w = P(v) is the projection of the vector v onto the ℓ1-norm ball of radius τ, with vi = ‖cσi‖2.
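A sketch of the proposition in code: reduce each group to its norm, project the norm vector onto the ℓ1-ball, and rescale each group. For brevity our sketch uses an O(n log n) sort-based ℓ1 projection rather than the expected-linear-time randomized one:

    import numpy as np

    def project_l1_ball(v, tau):
        """Project a nonnegative vector v onto {w : ||w||_1 <= tau} by sorting."""
        if v.sum() <= tau:
            return v.copy()
        u = np.sort(v)[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u > (css - tau) / np.arange(1, v.size + 1))[0][-1]
        theta = (css[rho] - tau) / (rho + 1.0)
        return np.maximum(v - theta, 0.0)

    def project_group_l12_ball(c, groups, tau):
        """Project c onto the l_{1,2}-ball of radius tau, with `groups` a list
        of disjoint index arrays covering {0, ..., n-1}."""
        norms = np.array([np.linalg.norm(c[idx]) for idx in groups])
        w = project_l1_ball(norms, tau)
        x = np.zeros_like(c)
        for idx, n_i, w_i in zip(groups, norms, w):
            if n_i > 0:
                x[idx] = (w_i / n_i) * c[idx]   # rescale the group direction
        return x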

Experiments

We performed several experiments to test the new method:
- We first compared to other extensions of L-BFGS [see paper].
- We then compared to state-of-the-art methods for graph structure learning.

Gaussian Graphical Model Structure Learning

We looked at training a Gaussian graphical model with an ℓ1 penalty on the precision-matrix elements to induce a sparse structure [Banerjee et al. 2006, Friedman et al. 2007]:

    minimize_{K≻0}  −log det(K) + tr(Σ̂K) + λ‖K‖1

We used the Gasch et al. [2000] data with the pre-processing of Duchi et al. [2008], and as in previous work we solve the dual problem:

    maximize_W  log det(Σ̂ + W)   subject to   Σ̂ + W ≻ 0,  ‖W‖∞ ≤ λ

We compared to a projected gradient method [Duchi et al. 2008].
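The ‖W‖∞ ≤ λ constraint here is an elementwise box, so its projection is just clipping (keeping Σ̂ + W ≻ 0 must be handled separately, e.g. inside the line search):

    import numpy as np

    def project_linf_ball(W, lam):
        """Projection onto {W : ||W||_inf <= lam} is elementwise clipping, O(n)."""
        return np.clip(W, -lam, lam)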

[Figures: objective value versus number of function evaluations for PQN, PG, BCD, and SPG, shown over the first 100 evaluations and over 100-900 evaluations.]
Gaussian Graphical Model Structure Learning with Groups

We also compared the methods when we induce a group-sparse precision matrix using the ℓ1,∞-norm [Duchi et al. 2008]:

    minimize_{K≻0}  −log det(K) + tr(Σ̂K) + λ‖K‖1,∞

[Figures: objective value versus number of function evaluations for PQN, PG, and SPG, shown over the first 200 and the first 1000 evaluations.]


We also used PQN to look at the performance if we replace the ℓ1,∞-norm [Duchi et al. 2008] with the ℓ1,2-norm:

    minimize_{K≻0}  −log det(K) + tr(Σ̂K) + λ‖K‖1,2

[Figure: average log-likelihood versus regularization strength λ for the ℓ1,2-, ℓ1,∞-, and ℓ1-regularized models and an unregularized baseline.]

Markov Random Field Structure Learning

Finally, we looked at learning a sparse Markov random field:

    minimize_w  −log p(y|w)   subject to   Σe ‖we‖2 ≤ τ

We used the trinary data from [Sachs et al. 2005], and compared to Grafting [Lee et al. 2006] and to applying SPG to a second-order cone reformulation [Schmidt et al. 2008].

[Figures: objective value (×10⁴) versus number of function evaluations for PQN, SPG, Grafting, and PQN applied to the second-order cone reformulation (PQN-SOC), shown over the first 100 and the first 900 evaluations.]

Extensions to Other Problems

There are many other cases where we can efficiently compute projections:
- Projection onto hyper-planes or half-spaces is trivial.
- Projecting onto the probability simplex can be done in O(n log n).
- Projecting onto the positive semi-definite cone involves truncating the spectral decomposition.
- Projecting onto second-order cones of the form ‖x‖2 ≤ y can be done in O(n).
- Dykstra's algorithm can be used for combinations of simple constraints [Dykstra, 1983]; a sketch follows.
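A sketch of Dykstra's algorithm for the last point, given a list of projection callables for the individual sets (the function name and stopping rule are ours):

    import numpy as np

    def dykstra(projections, x0, max_iter=1000, tol=1e-9):
        """Project x0 onto an intersection of convex sets [Dykstra, 1983],
        using only the projections onto the individual sets."""
        x = x0.copy()
        p = [np.zeros_like(x0) for _ in projections]   # one correction per set
        for _ in range(max_iter):
            x_prev = x.copy()
            for i, P in enumerate(projections):
                y = P(x + p[i])                        # project the corrected point
                p[i] = x + p[i] - y                    # update the correction
                x = y
            if np.linalg.norm(x - x_prev) < tol:
                break
        return x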

Summary

PQN is an extension of L-BFGS that is suitable when:

1. the number of parameters is large
2. evaluating the objective is expensive
3. the parameters have constraints
4. projecting onto the constraints is substantially cheaper than evaluating the objective function

We have found the algorithm useful for a variety of problems, and it is likely useful for others (code online soon).
