SLIDE 1

ALADIN—An Algorithm for Distributed Non-Convex Optimization and Control

Boris Houska, Yuning Jiang, Janick Frasch, Rien Quirynen, Dimitris Kouzoupis, Moritz Diehl

ShanghaiTech University, University of Magdeburg, University of Freiburg

SLIDE 2

Motivation: sensor network localization

Decoupled case: each sensor takes a measurement $\eta_i$ of its position $\chi_i$ and solves, for all $i \in \{1, \dots, 7\}$,

$$\min_{\chi_i} \; \|\chi_i - \eta_i\|_2^2 \,.$$

SLIDE 3

Motivation: sensor network localization

Coupled case: sensors additionally measure the distance to their neighbors:

$$\min_{\chi} \; \sum_{i=1}^{7} \|\chi_i - \eta_i\|_2^2 + \left( \|\chi_i - \chi_{i+1}\|_2 - \bar\eta_i \right)^2 \quad \text{with} \quad \chi_8 = \chi_1 \,.$$

SLIDE 4

Motivation: sensor network localization

Equivalent formulation: set x1 = (χ1, ζ1) with ζ1 = χ2, set x2 = (χ2, ζ2) with ζ2 = χ3, and so on

SLIDE 7

Motivation: sensor network localization

Equivalent formulation (cont.): new variables $x_i = (\chi_i, \zeta_i)$; separable non-convex objectives

$$f_i(x_i) = \frac{1}{2}\|\chi_i - \eta_i\|_2^2 + \frac{1}{2}\|\zeta_i - \eta_{i+1}\|_2^2 + \frac{1}{2}\left( \|\chi_i - \zeta_i\|_2 - \bar\eta_i \right)^2 \,;$$

the affine coupling, $\zeta_i = \chi_{i+1}$, can be written as

$$\sum_{i=1}^{7} A_i x_i = 0 \,.$$

SLIDE 10

Motivation: sensor network localization

Optimization problem:

$$\min_{x} \; \sum_{i=1}^{7} f_i(x_i) \quad \text{s.t.} \quad \sum_{i=1}^{7} A_i x_i = 0 \,.$$
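To make this structure concrete, the following NumPy sketch (illustrative, not from the talk) builds the coupling matrices $A_i$ for the ring constraint $\zeta_i = \chi_{i+1}$ and the separable objectives $f_i$; the measurements eta and eta_bar are synthetic placeholders.

```python
import numpy as np

N_SENSORS = 7           # sensors on a ring, positions in R^2
n = 4                   # x_i = (chi_i, zeta_i) in R^4
m = 2 * N_SENSORS       # one 2D constraint block zeta_i = chi_{i+1} per sensor

rng = np.random.default_rng(0)
eta = rng.normal(size=(N_SENSORS, 2))        # position measurements (synthetic)
eta_bar = rng.uniform(0.5, 1.5, N_SENSORS)   # distance measurements (synthetic)

def make_A(i):
    """A_i such that sum_i A_i x_i = 0 encodes zeta_j - chi_{j+1} = 0 for all j."""
    A = np.zeros((m, n))
    A[2 * i:2 * i + 2, 2:4] = np.eye(2)        # +zeta_i in constraint block i
    j = (i - 1) % N_SENSORS
    A[2 * j:2 * j + 2, 0:2] = -np.eye(2)       # -chi_i in constraint block i-1
    return A

def make_f(i):
    """Separable non-convex objective f_i from the slide."""
    def f(x):
        chi, zeta = x[:2], x[2:]
        return (0.5 * np.sum((chi - eta[i]) ** 2)
                + 0.5 * np.sum((zeta - eta[(i + 1) % N_SENSORS]) ** 2)
                + 0.5 * (np.linalg.norm(chi - zeta) - eta_bar[i]) ** 2)
    return f

A = [make_A(i) for i in range(N_SENSORS)]
f = [make_f(i) for i in range(N_SENSORS)]
```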

SLIDE 11

Aim of distributed optimization algorithms

Find local minimizers of

$$\min_{x} \; \sum_{i=1}^{N} f_i(x_i) \quad \text{s.t.} \quad \sum_{i=1}^{N} A_i x_i = b \,.$$

The functions $f_i : \mathbb{R}^n \to \mathbb{R}$ are potentially non-convex. The matrices $A_i \in \mathbb{R}^{m \times n}$ and the vector $b \in \mathbb{R}^m$ are given. Problem: $N$ is large.

SLIDE 15

Overview

  • Theory
    • Distributed optimization algorithms
    • ALADIN
  • Applications
    • Sensor network localization
    • MPC with long horizons

SLIDE 16

Distributed optimization problem

Find local minimizers of

$$\min_{x} \; \sum_{i=1}^{N} f_i(x_i) \quad \text{s.t.} \quad \sum_{i=1}^{N} A_i x_i = b \,.$$

The functions $f_i : \mathbb{R}^n \to \mathbb{R}$ are potentially non-convex. The matrices $A_i \in \mathbb{R}^{m \times n}$ and the vector $b \in \mathbb{R}^m$ are given. Problem: $N$ is large.

SLIDE 17

Dual decomposition

Main idea: solve the dual problem

$$\max_{\lambda} \; d(\lambda) \quad \text{with} \quad d(\lambda) = \sum_{i=1}^{N} \min_{x_i} \left\{ f_i(x_i) + \lambda^T A_i x_i \right\} - \lambda^T b \,.$$

The evaluation of $d$ can be parallelized. Applicable if the $f_i$ are (strictly) convex. For non-convex $f_i$: a duality gap is possible.

  • H. Everett. Generalized Lagrange multiplier method for solving problems of optimum allocation of resources, 1963.
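As a minimal sketch of this scheme (not the authors' implementation), the dual function can be maximized by plain gradient ascent; the (sub)gradient of $d$ at $\lambda$ is $\sum_i A_i x_i(\lambda) - b$, with $x_i(\lambda)$ the inner minimizers. scipy.optimize.minimize stands in for the parallel local solvers:

```python
import numpy as np
from scipy.optimize import minimize

def dual_decomposition(f, A, b, n, alpha=0.1, iters=200, tol=1e-8):
    """Maximize d(lambda) by (sub)gradient ascent; inner problems decouple."""
    N, m = len(f), b.shape[0]
    lam = np.zeros(m)
    for _ in range(iters):
        # Evaluate d(lambda): N independent minimizations (parallelizable).
        x = [minimize(lambda xi, i=i: f[i](xi) + lam @ (A[i] @ xi),
                      np.zeros(n)).x for i in range(N)]
        grad = sum(A[i] @ x[i] for i in range(N)) - b  # ascent direction
        if np.linalg.norm(grad) < tol:
            break
        lam = lam + alpha * grad
    return x, lam
```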

SLIDE 21

ADMM (consensus variant)

Alternating Direction Method of Multipliers

Input: initial guesses $x_i \in \mathbb{R}^n$ and $\lambda_i \in \mathbb{R}^m$; $\rho > 0$, $\epsilon > 0$. Repeat:

  • 1. Solve the decoupled NLPs
$$\min_{y_i} \; f_i(y_i) + \lambda_i^T A_i y_i + \frac{\rho}{2} \|A_i (y_i - x_i)\|_2^2 \,.$$
  • 2. Implement the dual gradient steps $\lambda_i^+ = \lambda_i + \rho A_i (y_i - x_i)$.
  • 3. Solve the coupled QP
$$\min_{x^+} \; \sum_{i=1}^{N} \frac{\rho}{2} \|A_i (y_i - x_i^+)\|_2^2 - (\lambda_i^+)^T A_i x_i^+ \quad \text{s.t.} \quad \sum_{i=1}^{N} A_i x_i^+ = b \,.$$
  • 4. Update the iterates $x \leftarrow x^+$ and $\lambda \leftarrow \lambda^+$.
  • D. Gabay, B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximations, 1976.
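A minimal sketch of one such iteration (not the authors' implementation), assuming each $\rho A_i^T A_i$ is positive definite so the coupled QP is well posed; the QP is solved through its KKT system:

```python
import numpy as np
from scipy.linalg import block_diag
from scipy.optimize import minimize

def admm_step(f, A, b, x, lam, rho):
    """One iteration of the consensus-variant ADMM sketched above."""
    N, m = len(f), b.shape[0]
    # Step 1: decoupled NLPs (independent across i, hence parallelizable).
    y = [minimize(lambda yi, i=i: f[i](yi) + lam[i] @ (A[i] @ yi)
                  + 0.5 * rho * np.sum((A[i] @ (yi - x[i])) ** 2),
                  x[i]).x for i in range(N)]
    # Step 2: dual gradient steps.
    lam = [lam[i] + rho * A[i] @ (y[i] - x[i]) for i in range(N)]
    # Step 3: coupled QP via its KKT system; Hessian blocks are rho * A_i^T A_i.
    H = block_diag(*[rho * A[i].T @ A[i] for i in range(N)])
    c = np.concatenate([-rho * A[i].T @ (A[i] @ y[i]) - A[i].T @ lam[i]
                        for i in range(N)])
    Acat = np.hstack(A)
    KKT = np.block([[H, Acat.T], [Acat, np.zeros((m, m))]])
    sol = np.linalg.solve(KKT, np.concatenate([-c, b]))
    x = np.split(sol[:H.shape[0]], N)
    return x, lam, y
```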

SLIDE 26

Limitations of ADMM

1) The convergence rate of ADMM is very scaling dependent.
2) ADMM may diverge if the $f_i$ are non-convex. Example:

$$\min_{x} \; x_1 \cdot x_2 \quad \text{s.t.} \quad x_1 - x_2 = 0 \,,$$

with a unique and regular minimizer at $x_1^* = x_2^* = \lambda^* = 0$. For $\rho = \tfrac{3}{4}$ all sub-problems are strictly convex, yet ADMM diverges: $\lambda^+ = -2\lambda$.

This talk addresses problem 2) and mitigates problem 1).
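The divergence is easy to reproduce numerically. A sketch, under the assumptions that the problem is treated as a single block $x = (x_1, x_2)$ with $A = (1, -1)$, $b = 0$, and that the coupled QP (whose objective is constant on its feasible set here) returns the projection of $y$ onto $\{Ax = 0\}$:

```python
import numpy as np
from scipy.optimize import minimize

rho = 0.75                    # rho = 3/4: the augmented subproblem is strictly convex
A = np.array([[1.0, -1.0]])   # coupling A x = x1 - x2 = 0
x, lam = np.array([1.0, 0.0]), np.array([0.1])

for k in range(8):
    # Step 1: decoupled NLP (here a strictly convex problem in y).
    obj = lambda y: (y[0] * y[1] + lam @ (A @ y)
                     + 0.5 * rho * np.sum((A @ (y - x)) ** 2))
    y = minimize(obj, x).x
    # Step 2: dual gradient step.
    lam = lam + rho * A @ (y - x)
    # Step 3: the coupled QP forces A x+ = 0 and its objective is constant on
    # that set, so take the projection of y onto {A x = 0}.
    x = y - A.T @ np.linalg.solve(A @ A.T, A @ y)
    print(k, lam)   # once A x = 0 holds, the update is exactly lambda+ = -2*lambda
```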

SLIDE 32

Overview

  • Theory
    • Distributed optimization algorithms
    • ALADIN
  • Applications
    • Sensor network localization
    • MPC with long horizons

SLIDE 33

ALADIN (full step variant)

Augmented Lagrangian based Alternating Direction Inexact Newton Method

Input: initial guesses $x_i \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}^m$; $\rho > 0$, $\epsilon > 0$. Repeat:

  • 1. Solve the decoupled NLPs
$$\min_{y_i} \; f_i(y_i) + \lambda^T A_i y_i + \frac{\rho}{2} \|y_i - x_i\|_{\Sigma_i}^2 \,.$$
  • 2. Compute $g_i = \nabla f_i(y_i)$, choose $H_i \approx \nabla^2 f_i(y_i)$, and solve the coupled QP
$$\min_{\Delta y} \; \sum_{i=1}^{N} \frac{1}{2} \Delta y_i^T H_i \Delta y_i + g_i^T \Delta y_i \quad \text{s.t.} \quad \sum_{i=1}^{N} A_i (y_i + \Delta y_i) = b \;\big|\; \lambda^+ \,.$$
  • 3. Set $x \leftarrow y + \Delta y$ and $\lambda \leftarrow \lambda^+$.
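A minimal dense-algebra sketch of one full-step iteration (assumptions: $\Sigma_i = I$, user-supplied gradients and Hessian approximations; not the authors' implementation). The coupled QP is solved through its KKT system, whose multiplier block yields $\lambda^+$:

```python
import numpy as np
from scipy.linalg import block_diag
from scipy.optimize import minimize

def aladin_step(f, grad, hess, A, b, x, lam, rho):
    """One full-step ALADIN iteration; Sigma_i = I in the proximal term."""
    N, m = len(f), b.shape[0]
    # Step 1: decoupled NLPs (independent across i, hence parallelizable).
    y = [minimize(lambda yi, i=i: f[i](yi) + lam @ (A[i] @ yi)
                  + 0.5 * rho * np.sum((yi - x[i]) ** 2),
                  x[i]).x for i in range(N)]
    # Step 2: assemble and solve the coupled QP through its KKT system.
    g = np.concatenate([grad[i](y[i]) for i in range(N)])
    H = block_diag(*[hess[i](y[i]) for i in range(N)])
    Acat = np.hstack(A)
    KKT = np.block([[H, Acat.T], [Acat, np.zeros((m, m))]])
    rhs = np.concatenate([-g, b - Acat @ np.concatenate(y)])
    sol = np.linalg.solve(KKT, rhs)
    dy, lam_plus = np.split(sol[:-m], N), sol[-m:]
    # Step 3: full step.
    return [y[i] + dy[i] for i in range(N)], lam_plus
```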

SLIDE 37

Special cases

For $\rho \to \infty$: ALADIN $\equiv$ SQP.
For $0 < \rho < \infty$: if $H_i = \rho A_i^T A_i$ and $\Sigma_i = A_i^T A_i$, then ALADIN $\equiv$ ADMM.
For $\rho = 0$: ALADIN $\equiv$ dual decomposition (+ inexact Newton).

SLIDE 40

Local convergence

Assumptions:
  • the $f_i$ are twice continuously differentiable,
  • the minimizer $(x^*, \lambda^*)$ is a regular KKT point (LICQ and SOSC are satisfied),
  • $\rho$ satisfies $\nabla^2 f_i(y_i) + \rho \Sigma_i \succ 0$.

Theorem: the full-step variant of ALADIN converges locally

  • 1. with a quadratic convergence rate, if $H_i = \nabla^2 f_i(y_i) + O(\|y_i - x^*\|)$;
  • 2. with a linear convergence rate, if $\|H_i - \nabla^2 f_i(y_i)\|$ is sufficiently small.

SLIDE 42

Globalization

Definition (L1 penalty function). We say that $x^+$ is a descent step if $\Phi(x^+) < \Phi(x)$ for

$$\Phi(x) = \sum_{i=1}^{N} f_i(x_i) + \lambda \left\| \sum_{i=1}^{N} A_i x_i - b \right\|_1 \,,$$

with the penalty parameter $\lambda$ sufficiently large.
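For concreteness, the penalty function and the descent test can be sketched as follows; lam_bar stands for the penalty parameter $\lambda$ of the definition (renamed here only to avoid clashing with the dual multiplier):

```python
import numpy as np

def phi(f, A, b, x, lam_bar):
    """L1 penalty function Phi(x) from the definition above."""
    r = sum(A[i] @ x[i] for i in range(len(f))) - b   # coupling residual
    return sum(f[i](x[i]) for i in range(len(f))) + lam_bar * np.sum(np.abs(r))

def is_descent_step(f, A, b, x, x_plus, lam_bar):
    return phi(f, A, b, x_plus, lam_bar) < phi(f, A, b, x, lam_bar)
```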

SLIDE 43

Globalization

Rough sketch: as long as $\nabla^2 f_i(x_i) + \rho \Sigma_i \succ 0$, the proximal objectives

$$\tilde f_i(y_i) = f_i(y_i) + \frac{\rho}{2} \|y_i - x_i\|_{\Sigma_i}^2$$

are strictly convex in a neighborhood of the current primal iterates $x_i$. If we don't update $x$, $y$ can be enforced to converge to the solution $z$ of the convex auxiliary problem

$$\min_{z} \; \sum_{i=1}^{N} \tilde f_i(z_i) \quad \text{s.t.} \quad \sum_{i=1}^{N} A_i z_i = b \,.$$

Strategy: if $x^+$ is not a descent step, skip the primal update until $y$ is a descent step, then set $x^+ = y$ (similar to proximal methods). This strategy leads to a globalization routine for ALADIN if $H_i \succ 0$.

SLIDE 47

Inequalities

Distributed NLP with inequalities:

$$\min_{x} \; \sum_{i=1}^{N} f_i(x_i) \quad \text{s.t.} \quad \left\{ \begin{array}{l} \sum_{i=1}^{N} A_i x_i = b \,, \\ h_i(x_i) \le 0 \,. \end{array} \right.$$

The functions $f_i : \mathbb{R}^n \to \mathbb{R}$ and $h_i : \mathbb{R}^n \to \mathbb{R}^{n_h}$ are potentially non-convex. The matrices $A_i \in \mathbb{R}^{m \times n}$ and the vector $b \in \mathbb{R}^m$ are given. Problem: $N$ is large.

SLIDE 48

ALADIN (with inequalities)

Augmented Lagrangian based Alternating Direction Inexact Newton Method

Input: initial guesses $x_i \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}^m$; $\rho > 0$, $\epsilon > 0$. Repeat:

  • 1. Solve the decoupled NLPs
$$\min_{y_i} \; f_i(y_i) + \lambda^T A_i y_i + \frac{\rho}{2} \|y_i - x_i\|_{\Sigma_i}^2 \quad \text{s.t.} \quad h_i(y_i) \le 0 \;\big|\; \kappa_i \,.$$
  • 2. Set $g_i = \nabla f_i(y_i) + \nabla h_i(y_i) \kappa_i$ and $H_i \approx \nabla^2 \left( f_i(y_i) + \kappa_i^T h_i(y_i) \right)$, and solve the coupled QP
$$\min_{\Delta y} \; \sum_{i=1}^{N} \frac{1}{2} \Delta y_i^T H_i \Delta y_i + g_i^T \Delta y_i \quad \text{s.t.} \quad \sum_{i=1}^{N} A_i (y_i + \Delta y_i) = b \;\big|\; \lambda^+ \,.$$
  • 3. Set $x \leftarrow y + \Delta y$ and $\lambda \leftarrow \lambda^+$.

SLIDE 52

ALADIN (with inequalities)

Augmented Lagrangian based Alternating Direction Inexact Newton Method

Remarks: if an approximation $C_i \approx C_i^* = \nabla h_i(y_i)$ is available, solve the QP

$$\min_{\Delta y, s} \; \sum_{i=1}^{N} \left( \frac{1}{2} \Delta y_i^T H_i \Delta y_i + g_i^T \Delta y_i \right) + \lambda^T s + \frac{\mu}{2} \|s\|_2^2$$
$$\text{s.t.} \quad \left\{ \begin{array}{ll} \sum_{i=1}^{N} A_i (y_i + \Delta y_i) = b + s & \big|\; \lambda^{\mathrm{QP}} \,, \\ C_i \Delta y_i = 0 \,, & i \in \{1, \dots, N\} \,, \end{array} \right.$$

with $g_i = \nabla f_i(y_i) + (C_i^* - C_i)^T \kappa_i$ and $\mu > 0$ instead.

If $H_i$ and $C_i$ are constant, pre-compute matrix decompositions.
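A dense sketch of this slack-relaxed QP (assumptions: the $C_i$ are stacked block-diagonally, all blocks are dense, and the KKT matrix is nonsingular; $\lambda^{\mathrm{QP}}$ is read off the multiplier block of the relaxed coupling constraint):

```python
import numpy as np
from scipy.linalg import block_diag

def coupled_qp_with_slack(H, g, A, C, b, y, lam, mu):
    """Solve the slack-relaxed coupled QP above via its KKT system.

    H, g : lists of Hessian blocks / gradients; A, C : lists of coupling and
    (approximate) active-constraint Jacobians; lam : current multiplier.
    """
    N = len(H)
    Hc, Cc = block_diag(*H), block_diag(*C)
    Acat = np.hstack(A)
    gc, yc = np.concatenate(g), np.concatenate(y)
    n_dy, m, n_c = Hc.shape[0], b.shape[0], Cc.shape[0]
    Z = np.zeros
    # Unknowns ordered as (dy, s, nu, w) with nu the coupling multiplier.
    KKT = np.block([
        [Hc,           Z((n_dy, m)),   Acat.T,      Cc.T],
        [Z((m, n_dy)), mu * np.eye(m), -np.eye(m),  Z((m, n_c))],
        [Acat,         -np.eye(m),     Z((m, m)),   Z((m, n_c))],
        [Cc,           Z((n_c, m)),    Z((n_c, m)), Z((n_c, n_c))],
    ])
    rhs = np.concatenate([-gc, -lam, b - Acat @ yc, Z(n_c)])
    sol = np.linalg.solve(KKT, rhs)
    dy = np.split(sol[:n_dy], N)
    lam_qp = sol[n_dy + m : n_dy + 2 * m]   # multiplier of the coupling row
    return dy, lam_qp
```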

SLIDE 54

Overview

  • Theory
    • Distributed Optimization Algorithms
    • ALADIN
  • Applications
    • Sensor network localization
    • MPC with long horizons

SLIDE 55

Sensor network localization

Case study: 25,000 sensors measure positions and distances in a graph; additional inequality constraints, $\|\chi_i - \eta_i\|_2 \le \bar\sigma$, remove outliers.

SLIDE 56

ALADIN versus SQP

$10^5$ primal and $7.5 \times 10^4$ dual optimization variables; implementation in Julia.

SLIDE 57

Overview

  • Theory
    • Distributed Optimization Algorithms
    • ALADIN
  • Applications
    • Sensor network localization
    • MPC with long horizons

SLIDE 58

Nonlinear MPC

Repeat:

  • Wait for the state measurement $\hat x$.
  • Solve
$$\min_{x, u} \; \sum_{i=0}^{m-1} l(x_i, u_i) + M(x_m) \quad \text{s.t.} \quad \left\{ \begin{array}{l} x_{i+1} = f(x_i, u_i) \,, \\ x_0 = \hat x \,, \\ (x_i, u_i) \in \mathbb{X} \times \mathbb{U} \,. \end{array} \right.$$
  • Send $u_0$ to the process.
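A single-shooting sketch of this loop (illustrative; f, l, and M are user-supplied, and the state constraints $\mathbb{X} \times \mathbb{U}$ are omitted for brevity):

```python
import numpy as np
from scipy.optimize import minimize

def nmpc_step(f, l, M, x_hat, m_hor, n_u, u_warm=None):
    """Solve the horizon-m OCP by single shooting; return u_0 and a warm start."""
    u0 = np.zeros(m_hor * n_u) if u_warm is None else u_warm

    def cost(u_flat):
        u = u_flat.reshape(m_hor, n_u)
        x, J = x_hat, 0.0
        for i in range(m_hor):
            J += l(x, u[i])   # stage cost l(x_i, u_i)
            x = f(x, u[i])    # forward simulation x_{i+1} = f(x_i, u_i)
        return J + M(x)       # terminal cost M(x_m)

    u = minimize(cost, u0).x.reshape(m_hor, n_u)
    return u[0], u.flatten()  # send u_0 to the process, keep the warm start
```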

SLIDE 59

ACADO Toolkit

SLIDE 60

MPC Benchmark

Nonlinear model, 4 states, 2 controls. Run-time of ACADO code generation (not distributed):

  • Integration & sensitivities: 117 µs (65 %)
  • QP (condensing + qpOASES): 59 µs (33 %)
  • Complete real-time iteration: 181 µs (100 %)

SLIDE 62

MPC with ALADIN

ALADIN step 1: solve decoupled NLPs. Choose a short horizon $n = m/N$ and solve

$$\min_{y, v} \; \Psi_j(y_0^j) + \sum_{i=0}^{n-1} l(y_i^j, v_i^j) + \Phi_{j+1}(y_n^j)$$
$$\text{s.t.} \quad \left\{ \begin{array}{ll} y_{i+1}^j = f(y_i^j, v_i^j) \,, & i = 0, \dots, n-1 \,, \\ y_i^j \in \mathbb{X} \,, \; v_i^j \in \mathbb{U} \,. \end{array} \right.$$

Arrival and end costs depend on the ALADIN iterates:

$$\Psi_0(y) = I(\hat x, y) \,, \qquad \Phi_N(y) = M(y) \,,$$
$$\Psi_j(y) = -\lambda_j^T y + \frac{\rho}{2} \|y - z_j\|_P^2 \,, \qquad \Phi_j(y) = \lambda_j^T y + \frac{\rho}{2} \|y - z_j\|_P^2 \,.$$
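The arrival and end costs can be set up as closures over the current ALADIN iterates; a sketch (lam, z, P, rho come from the previous coupled QP; I_arrival and M_terminal are the $I$ and $M$ of the slide):

```python
import numpy as np

def make_costs(j, lam, z, P, rho, x_hat, I_arrival, M_terminal, N_blocks):
    """Arrival cost Psi_j and end cost Phi_j for subproblem j, per the slide."""
    def Psi(y):
        if j == 0:
            return I_arrival(x_hat, y)
        return -lam[j] @ y + 0.5 * rho * (y - z[j]) @ P @ (y - z[j])

    def Phi(y):
        if j == N_blocks:
            return M_terminal(y)
        return lam[j] @ y + 0.5 * rho * (y - z[j]) @ P @ (y - z[j])

    return Psi, Phi
```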

SLIDE 64

MPC with ALADIN

ALADIN step 2: solve the coupled QP. As there are no inequality constraints in the coupled QP, it is an LQR problem:

  • solve all matrix-valued Riccati equations offline,
  • solve the coupled QP online by a backward-forward sweep (see the sketch below).

Code export: export all online operations as optimized C code. NLP solver: explicit MPC (rough heuristic). One ALADIN iteration per sampling time; skip globalization.
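The backward-forward sweep for the unconstrained coupled QP is the standard LQR recursion. A time-invariant sketch (A, B, Q, R, QN assumed given; the affine terms coming from the gradients $g_i$ are omitted for brevity):

```python
import numpy as np

def lqr_sweep(A, B, Q, R, QN, x0, m_horizon):
    """Backward Riccati sweep (offline) + forward sweep (online) for LQR."""
    # Backward sweep: Riccati recursion for the cost-to-go matrices P_i.
    P, K = QN, []
    for _ in range(m_horizon):
        Kk = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A + B @ Kk)
        K.append(Kk)
    K = K[::-1]                      # K[i] belongs to stage i
    # Forward sweep: roll out the closed-loop trajectory.
    x, xs, us = x0, [x0], []
    for i in range(m_horizon):
        u = K[i] @ x
        x = A @ x + B @ u
        us.append(u)
        xs.append(x)
    return xs, us
```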

SLIDE 66

ALADIN + code generation (results by Yuning Jiang)

Same nonlinear model as before, $n = 2$. Run-time of real-time ALADIN on 10 processors:

  • Parallel explicit MPC: 6 µs (54 %)
  • QP sweeps: 3 µs (27 %)
  • Communication overhead: 2 µs (18 %)

SLIDE 68

Conclusions

ALADIN theory:

  • can solve non-convex distributed optimization problems,
$$\min_{x} \; \sum_{i=1}^{N} f_i(x_i) \quad \text{s.t.} \quad \sum_{i=1}^{N} A_i x_i = b \,,$$
to local optimality;
  • contains SQP, ADMM, and dual decomposition as special cases;
  • local convergence analysis similar to SQP; globalization is possible.

SLIDE 71

Conclusions

ALADIN applications:

  • large-scale distributed sensor networks: ALADIN outperforms SQP and ADMM;
  • small-scale embedded MPC: ALADIN can be used to alternate between explicit and online MPC.

SLIDE 73

References

  • B. Houska, J. Frasch, M. Diehl. An Augmented Lagrangian Based Algorithm for Distributed Non-Convex Optimization. SIAM Journal on Optimization (SIOPT), 2016.
  • D. Kouzoupis, R. Quirynen, B. Houska, M. Diehl. A Block Based ALADIN Scheme for Highly Parallelizable Direct Optimal Control. American Control Conference (ACC), 2016.
