SLIDE 1

Descent dynamic(s) for multi-objective optimization

Guillaume Garrigos

Istituto Italiano di Tecnologia & Massachusetts Institute of Technology, Genova, Italy

Journées SMAI-MODE, March 24, 2016

Journées SMAI-MODE 2016 - Toulouse - Guillaume Garrigos 1/20

SLIDE 3

Introduction/Motivation

Multi-objective problem

In engineering and decision sciences, it often happens that several objective functions must be minimized simultaneously: f1, ..., fm : H → R. → This calls for appropriate tools: multi-objective optimization.

SLIDE 6

The multi-objective optimization problem

Let F = (f1, ..., fm) : H → Rm be locally Lipschitz, H a Hilbert space. Solve MIN { (f1(x), ..., fm(x)) : x ∈ C }, C ⊂ H convex. We consider the usual order(s) on Rm:

a ≼ b ⇔ ai ≤ bi for all i = 1, ..., m,
a ≺ b ⇔ ai < bi for all i = 1, ..., m.

x is a Pareto point if ∄ y ∈ C such that F(y) ≼ F(x) with F(y) ≠ F(x).
x is a weak Pareto point if ∄ y ∈ C such that F(y) ≺ F(x).
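These componentwise orders are easy to operationalize. The sketch below (illustrative code, not from the talk) implements the two dominance relations and a brute-force Pareto filter over a finite sample; the objective F(x) = (x², (x−1)²) is a made-up example whose Pareto set is [0, 1].

```python
# Sketch (not from the talk): the two componentwise orders on R^m and a
# brute-force Pareto filter over a finite sample of candidate points.

def dominates(fy, fx):
    """F(y) <= F(x) componentwise with F(y) != F(x) (Pareto dominance)."""
    return all(a <= b for a, b in zip(fy, fx)) and any(a < b for a, b in zip(fy, fx))

def strictly_dominates(fy, fx):
    """F(y) < F(x) componentwise (the order behind *weak* Pareto points)."""
    return all(a < b for a, b in zip(fy, fx))

def pareto_points(points, F):
    """Sampled x such that no sampled y satisfies F(y) <= F(x), F(y) != F(x)."""
    values = {x: F(x) for x in points}
    return [x for x in points
            if not any(dominates(values[y], values[x]) for y in points if y != x)]

# Example with F(x) = (x^2, (x-1)^2) on a grid: the Pareto set is [0, 1].
F = lambda x: (x**2, (x - 1)**2)
grid = [i / 4 for i in range(-4, 9)]   # -1.0, -0.75, ..., 2.0
print(pareto_points(grid, F))          # the grid points lying inside [0, 1]
```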

SLIDE 9

The multi-objective optimization problem

Let F = (f1, ..., fm) : H → Rm be locally Lipschitz. Solve MIN { (f1(x), ..., fm(x)) : x ∈ C }, C ⊂ H convex. How to solve it?

genetic algorithms → no theoretical guarantees.

scalarization method:

⋃_{θ ∈ ∆m} argmin_{x ∈ H} fθ(x) ⊂ {weak Paretos} ⊃ {Paretos},

where ∆m is the unit simplex and fθ(x) := Σ_{i=1}^{m} θi fi(x).
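The scalarization recipe can be sketched in a few lines: sample θ in the simplex and minimize fθ by plain gradient descent. The quadratic objectives and step size below are illustrative choices, not from the talk; for this pair the minimizer of fθ is θ·a + (1−θ)·b, so the sampled minimizers trace the segment [a, b], which is exactly the Pareto set.

```python
import numpy as np

# Sketch of the scalarization method: sample weights θ in the unit simplex,
# minimize f_θ = θ f1 + (1-θ) f2 by plain gradient descent, collect minimizers.
# Objectives (illustrative): f1(x) = ||x - a||^2, f2(x) = ||x - b||^2.
a, b = np.array([0.0, 0.0]), np.array([1.0, 2.0])
grad_f1 = lambda x: 2 * (x - a)
grad_f2 = lambda x: 2 * (x - b)

def minimize_f_theta(theta, steps=200, lam=0.1):
    x = np.zeros(2)
    for _ in range(steps):
        x = x - lam * (theta * grad_f1(x) + (1 - theta) * grad_f2(x))
    return x

front = [minimize_f_theta(t) for t in np.linspace(0, 1, 11)]
# Each minimizer is (close to) θ·a + (1-θ)·b, i.e. a point of the segment
# [a, b], which is the Pareto set of (f1, f2) for this convex pair.
```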

SLIDE 13

The multi-objective optimization problem

Let F = (f1, ..., fm) : H → Rm be locally Lipschitz. Solve MIN { (f1(x), ..., fm(x)) : x ∈ C }, C ⊂ H convex. We are going to present a method which:

• generalizes the gradient descent dynamic ẋ(t) + ∇f(x(t)) = 0,
• is cooperative, i.e. all objective functions decrease simultaneously,
• is independent of any choice of parameters.

SLIDE 14

Towards a descent dynamic for multi-objective optimization

Single-objective optimization: xn+1 = xn + λn dn, where dn satisfies df(xn; dn) < 0 (e.g. dn = −∇f(xn)).

Multi-objective optimization: can we find dn such that dfi(xn; dn) < 0 for all i ∈ {1, ..., m}?

SLIDE 15

Towards a descent dynamic for multi-objective optimization

Historical review

Cornet (1981): s(x) := −[∇f1(x), ∇f2(x)]⁰, the opposite of the minimal-norm element of the segment [∇f1(x), ∇f2(x)]; it satisfies ⟨s(x), ∇fi(x)⟩ < 0 for i = 1, 2.

SLIDE 19

Multi-objective steepest descent

Let F = (f1, ..., fm) : H → Rm be locally Lipschitz, C = H a Hilbert space.

Definition. For all x ∈ H, s(x) := − (co {∂C fi(x)}i=1,...,m)⁰ is the (common) steepest descent direction at x, where K⁰ denotes the element of minimal norm of K.

Remarks in the smooth case:

• If m = 1 then s(x) = −∇f1(x).
• At each x, s(x) selects a convex combination: s(x) = − Σ_{i=1}^{m} θi(x) ∇fi(x) = −∇fθ(x)(x), where fθ(x) = Σ_{i=1}^{m} θi(x) fi.
• s(x) is the steepest descent direction: s(x)/‖s(x)‖ = argmin_{d ∈ BH} max_{i=1,...,m} ⟨∇fi(x), d⟩.
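For m = 2 smooth objectives, co{∇f1(x), ∇f2(x)} is a segment, so the minimal-norm element, and hence s(x), has a closed form: project 0 onto [∇f1(x), ∇f2(x)]. A sketch under that assumption (illustrative code, not from the talk):

```python
import numpy as np

# For m = 2 smooth objectives, the steepest descent direction of the slide's
# definition, s(x) = -(co{∇f1(x), ∇f2(x)})^0, reduces to projecting 0 onto
# the segment [g1, g2] and flipping the sign.
def steepest_descent_direction(g1, g2):
    d = g1 - g2
    denom = float(d @ d)
    theta = 0.5 if denom == 0 else float(np.clip(-(g2 @ d) / denom, 0.0, 1.0))
    p = theta * g1 + (1 - theta) * g2      # minimal-norm point of the segment
    return -p, theta

# Sanity check: s is a common descent direction, <s, gi> <= -||p||^2 < 0.
g1, g2 = np.array([2.0, 0.0]), np.array([0.0, 1.0])
s, theta = steepest_descent_direction(g1, g2)
# Here theta = 1/5, so s = -(2/5, 4/5) and <s, g1> = <s, g2> = -4/5.
```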

SLIDE 20

The (multi-objective) Steepest Descent dynamic

Algorithm: xn+1 = xn + λn s(xn). Studied in the 2000s by Svaiter, Fliege, Iusem, ... Continuous dynamic: (SD) ẋ(t) = s(x(t)), i.e. (SD) ẋ(t) + (co {∂C fi(x(t))}i)⁰ = 0.
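The discrete scheme xn+1 = xn + λn s(xn) can be sketched on the smooth example f1(x) = x1², f2(x) = x2² that appears on a later slide; the fixed step λ = 0.1 is an illustrative choice, not the step-size rule of the cited works. The run below checks the cooperative property: both objective values are non-increasing along the iterates.

```python
import numpy as np

# Sketch of the iteration x_{n+1} = x_n + λ s(x_n) for f1(x) = x1^2,
# f2(x) = x2^2; λ = 0.1 is an illustrative fixed step.
def s_of(x):
    g1, g2 = np.array([2 * x[0], 0.0]), np.array([0.0, 2 * x[1]])
    d = g1 - g2
    denom = float(d @ d)
    theta = 0.5 if denom == 0 else float(np.clip(-(g2 @ d) / denom, 0.0, 1.0))
    return -(theta * g1 + (1 - theta) * g2)   # minimal-norm element, negated

f = lambda x: (x[0] ** 2, x[1] ** 2)
x = np.array([1.0, 0.5])
history = [f(x)]
for _ in range(200):
    x = x + 0.1 * s_of(x)
    history.append(f(x))

# Cooperative: both objective values are non-increasing along the iterates.
drops = all(a1 >= b1 - 1e-12 and a2 >= b2 - 1e-12
            for (a1, a2), (b1, b2) in zip(history, history[1:]))
```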

SLIDE 26

The (multi-objective) Steepest Descent dynamic

Example

(SD) ẋ(t) = s(x(t)) with f1(x) = x2 and f2(x) = x1.

SLIDE 29

The (multi-objective) Steepest Descent dynamic

Example

(SD) ẋ(t) = s(x(t)) with f1(x) = x1² and f2(x) = x2².

SLIDE 32

The (multi-objective) Steepest Descent dynamic

Main results (Attouch, G., Goudou, 2014)

A cooperative dynamic. Let x : R+ → H be a solution of (SD) ẋ(t) = s(x(t)). Then, for all i = 1, ..., m, the function t ↦ fi(x(t)) is decreasing.

Convergence in the convex case. Assume that the objective functions are convex. Then any bounded trajectory weakly converges to a weak Pareto point.

Existence in the convex case. Suppose that H is finite dimensional. Then, for any initial data, there exists a global solution to (SD).

SLIDE 35

The (multi-objective) Steepest Descent dynamic

Going further

In case of a convex constraint C ⊂ H: (SD) ẋ(t) + (NC(x(t)) + co {∂C fi(x(t))}i)⁰ = 0. How to discretize it properly?

Uniqueness? Yes, if the {∇fi(x(·))}i=1,...,m are affinely independent.

Convergence to Pareto points? Guaranteed by endowing Rm with a different order (but some of the Paretos might be lost in the operation).

SLIDE 37

Numerical results

Recovering the Pareto front

f1(x, y) = x + y
f2(x, y) = x² + y² + 1/x + 3e^(−100(x−0.3)²) + 3e^(−100(x−0.6)²)
(x, y) ∈ C = [0.1, 1]²

Plot of F(C), F = (f1, f2) : C → R², and its Pareto front.

SLIDE 38

Numerical results

Recovering the Pareto front

f1(x, y) = x + y
f2(x, y) = x² + y² + 1/x + 3e^(−100(x−0.3)²) + 3e^(−100(x−0.6)²)
(x, y) ∈ C = [0.1, 1]²

Gradient method (right) vs. scalarization method (left), 100 samples.

SLIDE 40

Numerical results

Pareto selection with Tikhonov penalization

Can we select, among the weak Paretos (= the zeros of x ↦ s(x)), the one closest to a desired state? → Tikhonov regularization: ẋ(t) − s(x(t)) + ε(x(t) − xd) = 0, ε > 0.

SLIDE 42

Numerical results

Pareto selection with Tikhonov penalization

Can we select, among the weak Paretos (= the zeros of x ↦ s(x)), the one closest to a desired state? → Diagonal Tikhonov regularization: ẋ(t) − s(x(t)) + ε(t)(x(t) − xd) = 0, with ε(t) ↓ 0 and ∫ ε(t) dt = +∞. See the works of Attouch, Cabot, Czarnecki, Peypouquet (...) in the monotone case.
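A minimal discretization of this diagonal dynamic can be sketched as follows (illustrative code, not the talk's experiments; the objectives, the desired state, and ε_n = 1/(n+1), which satisfies ε ↓ 0 and Σ ε_n = +∞, are made-up choices). For f1(x) = ‖x − a‖² and f2(x) = ‖x − b‖², the weak Pareto set is the segment [a, b], and the iterates drift toward the point of [a, b] closest to xd.

```python
import numpy as np

# Sketch of ẋ(t) - s(x(t)) + ε(t)(x(t) - x_d) = 0, discretized as
# x_{n+1} = x_n + λ (s(x_n) - ε_n (x_n - x_d)), with ε_n = 1/(n+1).
# Objectives: f1(x) = ||x - a||^2, f2(x) = ||x - b||^2, weak Paretos = [a, b].
a, b = np.array([0.0, 0.0]), np.array([2.0, 0.0])
x_d = np.array([1.0, 1.0])                 # desired state (illustrative)

def s_of(x):
    g1, g2 = 2 * (x - a), 2 * (x - b)
    d = g1 - g2
    theta = float(np.clip(-(g2 @ d) / float(d @ d), 0.0, 1.0))
    return -(theta * g1 + (1 - theta) * g2)

lam, x = 0.4, np.array([1.8, 1.0])
for n in range(2000):
    eps = 1.0 / (n + 1)
    x = x + lam * (s_of(x) - eps * (x - x_d))
# x drifts toward the point of [a, b] closest to x_d, i.e. around (1, 0).
```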

SLIDE 46

What about inertial dynamics?

ẋ(t) + ∇f(x(t)) = 0  ⟶  xn+1 = xn − λ∇f(xn)
ẍ(t) + γẋ(t) + ∇f(x(t)) = 0  ⟶  xn+1 = yn − λ∇f(yn),  yn+1 = xn+1 + (1 − γ)(xn+1 − xn)
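The two discretizations above, sketched side by side on the toy objective f(x) = x² (γ = 0.5 and λ = 0.1 are illustrative choices, not from the talk):

```python
# Plain gradient descent vs. the inertial scheme of the slide, on f(x) = x^2
# with ∇f(x) = 2x; γ = 0.5 and λ = 0.1 are illustrative choices.
grad = lambda x: 2 * x
lam, gamma = 0.1, 0.5

# Gradient descent: x_{n+1} = x_n - λ ∇f(x_n)
x_gd = 1.0
for _ in range(50):
    x_gd -= lam * grad(x_gd)

# Inertial scheme: x_{n+1} = y_n - λ ∇f(y_n),
#                  y_{n+1} = x_{n+1} + (1 - γ)(x_{n+1} - x_n)
x_prev, y = 1.0, 1.0
for _ in range(50):
    x_new = y - lam * grad(y)
    y = x_new + (1 - gamma) * (x_new - x_prev)
    x_prev = x_new

# Both converge to the minimizer 0; the inertial iterates get there faster here.
```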

SLIDE 48

What about inertial dynamics?

ẋ(t) + ∇f(x(t)) = 0  vs.  ẍ(t) + γẋ(t) + ∇f(x(t)) = 0. Inertia promotes:

• faster trajectories (varying γ),
• exploratory properties.

SLIDE 49

Convergence rates : empirical observation

f1(x) = ( Σ_{i=1}^{10} [ xi² − 10 cos(2π xi) + 10 ] )^(1/4),
f2(x) = ( Σ_{i=1}^{10} [ (xi − 1.5)² − 10 cos(2π(xi − 1.5)) + 10 ] )^(1/4)

Convergence rate of ‖F(xn) − F(x∞)‖∞: Steepest Descent vs. Inertial Steepest Descent. [Plot not recovered.]

SLIDE 51

Inertial (multi-objective) Steepest Descent

Let f1, ..., fm be smooth, with L-Lipschitz gradient. (ISD) ẍ(t) = −γẋ(t) + s(x(t)). Example: f1(x) = x2 and f2(x) = x1.

SLIDE 54

Inertial (multi-objective) Steepest Descent

Main results (Attouch, G., 2015)

Let f1, ..., fm be smooth, with L-Lipschitz gradient. (ISD) ẍ(t) = −γẋ(t) + s(x(t)). Assume that γ ≥ L.

Existence. Suppose that H is finite dimensional. Then, for any initial data, there exists a global solution to (ISD).

Convergence in the convex case. Let f1, ..., fm be convex. Then, any bounded trajectory weakly converges to a weak Pareto point.
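A sketch of (ISD) by semi-implicit Euler (illustrative code, not the talk's experiments), on f1(x) = x1², f2(x) = x2², whose gradients are 2-Lipschitz, so γ = 2 satisfies γ ≥ L. Unlike (SD), the inertial trajectory need not decrease the objectives monotonically, but here it still ends below its initial values and approaches the weak Pareto set (the coordinate axes), where s vanishes.

```python
import numpy as np

# Sketch of (ISD) ẍ(t) = -γ ẋ(t) + s(x(t)) by semi-implicit Euler, for
# f1(x) = x1^2, f2(x) = x2^2 (L = 2), with γ = 2 ≥ L and step h = 0.01.
def s_of(x):
    g1, g2 = np.array([2 * x[0], 0.0]), np.array([0.0, 2 * x[1]])
    d = g1 - g2
    denom = float(d @ d)
    theta = 0.5 if denom == 0 else float(np.clip(-(g2 @ d) / denom, 0.0, 1.0))
    return -(theta * g1 + (1 - theta) * g2)

gamma, h = 2.0, 0.01
x, v = np.array([1.0, 0.5]), np.zeros(2)
f_start = (x[0] ** 2, x[1] ** 2)
for _ in range(5000):                      # integrate up to t = 50
    v = v + h * (-gamma * v + s_of(x))     # velocity update
    x = x + h * v                          # position update
f_end = (x[0] ** 2, x[1] ** 2)
# Both objectives end below their initial values, and x approaches the weak
# Pareto set {x1 = 0} ∪ {x2 = 0}, where s vanishes.
```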

SLIDE 57

Conclusion

The steepest descent provides a flexible tool once adapted to multi-objective optimization problems. Open questions:

• Understand the asymptotic behaviour of ẋ(t) − s(x(t)) + ε(t)x(t) = 0 (the set of weak Paretos is non-convex).
• Obtain convergence rates for first- and second-order dynamics (the critical values are not unique).

SLIDE 58

Thank you for your attention!