SLIDE 1

Accuracy vs. Implementability in Algorithmic Design — An Example of Operator Splitting Methods for Convex Optimization

Xiaoming Yuan

Hong Kong Baptist University

September 02, 2014

SLIDE 2

Outline

1 Backgrounds

2 Accuracy vs. Implementability – An Easier Case

3 Accuracy vs. Implementability – A More Complicated Case

4 Conclusions

SLIDE 3

Backgrounds — 2014 Workshop on Optimization for Modern Computation, Peking University

Outline

1 Backgrounds

2 Accuracy vs. Implementability – An Easier Case

3 Accuracy vs. Implementability – A More Complicated Case

4 Conclusions

Xiaoming Yuan (HKBU), Accuracy vs. Implementability in Optimization, September 02, 2014, 3 / 37

SLIDE 11

What Do I Want to Say?

Accuracy:

Fidelity to the original model. The ability to solve a subproblem EXACTLY. Maintaining the convergence (or faster convergence) of an algorithm.

Implementability:

Ease of solving a subproblem. Readiness for coding.

They are both important (I hope you also agree). Yet, they usually conflict (to be shown later).

SLIDE 21

A Canonical Convex Optimization Model

A canonical convex minimization model with linear constraints:

    min { θ(x) | Ax = b, x ∈ X },

with A ∈ ℜ^(m×n), b ∈ ℜ^m, X ⊆ ℜ^n a closed convex set, and θ : ℜ^n → ℜ a convex but not necessarily smooth function.

Solving the original model directly — thus with 100% accuracy. But how? In general, not possible — not implementable.

The penalty method:

    x^{k+1} = arg min { θ(x) + (β/2) ‖Ax − b‖² | x ∈ X },

which solves an easier problem without the linear constraints — with much more implementability. Of course, with much less accuracy — indeed, not necessarily convergent unless β → +∞. Sufficient implementability, but too little accuracy.
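The trade-off above can be made concrete with a toy computation (an editorial sketch, not from the slides): for the illustrative choice θ(x) = ½‖x‖² and X = ℜ^n, the penalty subproblem is a single linear solve, yet the constraint residual ‖Ax − b‖ shrinks only as β grows — fixed β never reaches the exact solution.

```python
import numpy as np

# Toy instance: min { 0.5*||x||^2 | Ax = b }  (theta(x) = 0.5*||x||^2, X = R^n).
# For this theta, the exact solution is the minimum-norm one: x* = A^T (A A^T)^{-1} b.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))
b = rng.standard_normal(3)
x_star = A.T @ np.linalg.solve(A @ A.T, b)

def penalty_solution(beta):
    """Minimizer of theta(x) + (beta/2)*||Ax - b||^2 (closed form for this theta)."""
    n = A.shape[1]
    return np.linalg.solve(np.eye(n) + beta * A.T @ A, beta * A.T @ b)

for beta in [1.0, 1e2, 1e4]:
    x = penalty_solution(beta)
    # Residual and error both decay like O(1/beta): implementable, not accurate.
    print(beta, np.linalg.norm(A @ x - b), np.linalg.norm(x - x_star))
```

The point of the sketch: each subproblem is trivially implementable, but for any fixed β the constraint Ax = b is violated.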

SLIDE 26

The Augmented Lagrangian Method

How can we keep both the implementability (of the penalty method) and the accuracy (i.e., convergence)?

Answer: the augmented Lagrangian method (ALM), proposed independently by M. Hestenes and M. Powell in 1969:

    x^{k+1} = arg min { θ(x) − (λ^k)^T (Ax − b) + (β/2) ‖Ax − b‖² | x ∈ X },
    λ^{k+1} = λ^k − β (A x^{k+1} − b),

where λ ∈ ℜ^m is the Lagrange multiplier and β > 0 is a penalty parameter.

The subproblem is as difficult as that of the penalty method (the same level of implementability).

It is convergent for any fixed β > 0 (higher accuracy).
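On the same toy instance (θ(x) = ½‖x‖², X = ℜ^n — an assumption made only so the x-subproblem stays a linear solve), the ALM iteration can be sketched as follows; unlike the penalty method, it reaches the exact constrained solution with a fixed β:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))
b = rng.standard_normal(3)
x_star = A.T @ np.linalg.solve(A @ A.T, b)  # exact minimum-norm solution

beta = 1.0                       # a FIXED penalty parameter
lam = np.zeros(3)                # Lagrange multiplier
n = A.shape[1]
M = np.eye(n) + beta * A.T @ A   # x-subproblem matrix for this theta
for k in range(100):
    # x^{k+1} = argmin theta(x) - lam^T(Ax - b) + (beta/2)*||Ax - b||^2
    x = np.linalg.solve(M, A.T @ (lam + beta * b))
    # lam^{k+1} = lam^k - beta*(A x^{k+1} - b)
    lam = lam - beta * (A @ x - b)

print(np.linalg.norm(x - x_star))  # error vs. the exact solution
```

Same per-iteration cost as the penalty method, but the dual update restores the lost accuracy.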

SLIDE 30

Some Comments on ALM

The ALM:

    x^{k+1} = arg min { θ(x) − (λ^k)^T (Ax − b) + (β/2) ‖Ax − b‖² | x ∈ X },
    λ^{k+1} = λ^k − β (A x^{k+1} − b).

ALM has an augmented (quadratic penalty) term, and it updates the dual variable iteratively.

In 1976, R. T. Rockafellar showed that ALM is an application of the proximal point algorithm (B. Martinet, 1970; or even earlier, J. Moreau, 1965) to the dual of the model above.

It can be regarded as a dual ascent method in the dual variable λ.

A significant difference from the penalty method: the penalty parameter of ALM can theoretically be fixed at any positive scalar.
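Rockafellar's observation can be written compactly. With the dual function defined from the model above, one ALM step is exactly a proximal point step on the dual (a standard identity, added here for completeness):

```latex
\lambda^{k+1} \;=\; \arg\max_{\lambda \in \Re^m}
\Big\{\, d(\lambda) \;-\; \frac{1}{2\beta}\,\big\|\lambda - \lambda^k\big\|^2 \,\Big\},
\qquad
d(\lambda) \;=\; \min_{x \in X} \big\{\, \theta(x) - \lambda^{T}(Ax - b) \,\big\}.
```

This also explains why β can stay fixed: it only scales the proximal regularization of the dual ascent step, not a penalty that must blow up.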

SLIDE 31

Accuracy vs. Implementability – An Easier Case — 2014 Workshop on Optimization for Modern Computation, Peking University

Outline

1 Backgrounds

2 Accuracy vs. Implementability – An Easier Case

3 Accuracy vs. Implementability – A More Complicated Case

4 Conclusions

SLIDE 34

A Separable Model

For many applications, the last model can be specified in a separable form:

    min { θ1(x1) + θ2(x2) | A1 x1 + A2 x2 = b, x1 ∈ X1, x2 ∈ X2 },

where A1 ∈ ℜ^(m×n1), A2 ∈ ℜ^(m×n2), b ∈ ℜ^m, Xi ⊆ ℜ^(ni) (i = 1, 2), and θi : ℜ^(ni) → ℜ (i = 1, 2).

This model corresponds to the last one with θ(x) = θ1(x1) + θ2(x2), x = (x1, x2), A = (A1, A2), X = X1 × X2 and n = n1 + n2.

A typical application is the widely used l1-l2 model

    min { µ ‖x‖1 + (1/2) ‖Ax − b‖² },

where the least-squares term (1/2) ‖Ax − b‖² is a data-fidelity term, the l1-norm term ‖x‖1 is a regularization term for inducing sparse solutions, and µ > 0 is a trade-off parameter.
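For later reference, one standard way to cast the l1-l2 model into this two-block form (an illustrative choice; the slides do not fix a particular reformulation):

```latex
\min_{x}\;\mu\|x\|_1+\tfrac12\|Ax-b\|^2
\;\;\Longleftrightarrow\;\;
\min\Big\{\,\theta_1(x_1)+\theta_2(x_2)\;\Big|\;x_1-x_2=0\,\Big\},
\quad
\theta_1(x_1)=\mu\|x_1\|_1,\;\;
\theta_2(x_2)=\tfrac12\|Ax_2-b\|^2,
```

i.e., A1 = I, A2 = −I, b = 0, and X1 = X2 = ℜ^n in the notation above.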

SLIDE 37

Using ALM Directly with 100% Accuracy

Applying ALM directly:

    (x1^{k+1}, x2^{k+1}) = arg min { θ1(x1) + θ2(x2) − (λ^k)^T (A1 x1 + A2 x2 − b) + (β/2) ‖A1 x1 + A2 x2 − b‖² | (x1, x2) ∈ X1 × X2 },
    λ^{k+1} = λ^k − β (A1 x1^{k+1} + A2 x2^{k+1} − b).

How about its implementability? Is it easy to solve the ALM subproblem exactly?

SLIDE 44

Splitting the ALM with Less Accuracy?

Parallel (Jacobian) splitting:

    x1^{k+1} = arg min { θ1(x1) − (λ^k)^T (A1 x1) + (β/2) ‖A1 x1 + A2 x2^k − b‖² | x1 ∈ X1 },
    x2^{k+1} = arg min { θ2(x2) − (λ^k)^T (A2 x2) + (β/2) ‖A1 x1^k + A2 x2 − b‖² | x2 ∈ X2 },
    λ^{k+1} = λ^k − β (A1 x1^{k+1} + A2 x2^{k+1} − b).

Sequential (Gauss–Seidel) splitting:

    x1^{k+1} = arg min { θ1(x1) − (λ^k)^T (A1 x1) + (β/2) ‖A1 x1 + A2 x2^k − b‖² | x1 ∈ X1 },
    x2^{k+1} = arg min { θ2(x2) − (λ^k)^T (A2 x2) + (β/2) ‖A1 x1^{k+1} + A2 x2 − b‖² | x2 ∈ X2 },
    λ^{k+1} = λ^k − β (A1 x1^{k+1} + A2 x2^{k+1} − b).

Both lose accuracy but gain implementability — they are less accurate but more implementable than the original ALM.

They are equally implementable, and sequential splitting is more accurate.

Parallel splitting is not necessarily convergent (He/Hou/Y., 2013).

Sequential splitting is convergent — it is the Alternating Direction Method of Multipliers (ADMM), originally proposed by R. Glowinski and A. Marrocco in 1975.
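To make the two schemes concrete, here is an editorial toy sketch, assuming θ1(x1) = ½‖x1‖² and θ2(x2) = ½‖x2‖² so that both subproblems reduce to linear solves (random data, not from the slides). Note that on a benign random instance the Jacobian scheme may happen to behave; the slides' point is that it has no convergence guarantee in general:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n1, n2 = 4, 5, 5
A1 = rng.standard_normal((m, n1))
A2 = rng.standard_normal((m, n2))
b = rng.standard_normal(m)
beta = 1.0

def x_update(Ai, other_term, lam):
    """argmin 0.5*||xi||^2 - lam^T(Ai xi) + (beta/2)*||Ai xi + other_term - b||^2."""
    M = np.eye(Ai.shape[1]) + beta * Ai.T @ Ai
    return np.linalg.solve(M, Ai.T @ (lam + beta * (b - other_term)))

def split_alm(sequential, iters=1000):
    """Run either Gauss-Seidel (sequential=True) or Jacobian splitting of the ALM."""
    x1 = np.zeros(n1); x2 = np.zeros(n2); lam = np.zeros(m)
    for _ in range(iters):
        x1_new = x_update(A1, A2 @ x2, lam)
        # Gauss-Seidel uses the FRESH x1; Jacobian uses the old one.
        x2_new = x_update(A2, A1 @ (x1_new if sequential else x1), lam)
        x1, x2 = x1_new, x2_new
        lam = lam - beta * (A1 @ x1 + A2 @ x2 - b)
    return np.linalg.norm(A1 @ x1 + A2 @ x2 - b)

print("sequential (ADMM) residual:", split_alm(True))
print("parallel (Jacobian) residual:", split_alm(False))
```

The two variants differ in a single line — which copy of x1 the x2-subproblem sees — which is exactly the accuracy gap between them.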

SLIDE 47

Comments on ADMM

The ADMM scheme:

    x1^{k+1} = arg min { θ1(x1) − (λ^k)^T (A1 x1) + (β/2) ‖A1 x1 + A2 x2^k − b‖² | x1 ∈ X1 },
    x2^{k+1} = arg min { θ2(x2) − (λ^k)^T (A2 x2) + (β/2) ‖A1 x1^{k+1} + A2 x2 − b‖² | x2 ∈ X2 },
    λ^{k+1} = λ^k − β (A1 x1^{k+1} + A2 x2^{k+1} − b).

ADMM represents an inexact version of ALM, because the (x1, x2)-subproblem of ALM is decomposed into two smaller ones.

It is possible to take advantage of the properties of θ1 and θ2 individually — the decomposed subproblems are potentially much easier than the aggregated subproblem of the original ALM.

For the l1-l2 model mentioned above, all the subproblems are even easy enough to have closed-form solutions (to be detailed).
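A minimal sketch of those closed-form subproblems, using one standard splitting of the l1-l2 model (x1 − x2 = 0 with θ1 = µ‖·‖1 and θ2(x2) = ½‖Ax2 − b‖²; the slides defer the details, so this concrete choice is an editorial assumption): the x1-subproblem is componentwise soft-thresholding and the x2-subproblem is a linear solve.

```python
import numpy as np

def soft_threshold(v, tau):
    """Closed-form minimizer of tau*||u||_1 + 0.5*||u - v||^2 (shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def admm_l1_l2(A, b, mu, beta=1.0, iters=300):
    """ADMM for min mu*||x||_1 + 0.5*||Ax - b||^2 via the splitting x1 - x2 = 0,
    theta1 = mu*||.||_1, theta2(x2) = 0.5*||A x2 - b||^2.
    Both subproblems have closed-form solutions."""
    n = A.shape[1]
    x1 = np.zeros(n); x2 = np.zeros(n); lam = np.zeros(n)
    M = A.T @ A + beta * np.eye(n)   # x2-subproblem matrix (factor once in practice)
    for _ in range(iters):
        x1 = soft_threshold(x2 + lam / beta, mu / beta)       # x1-subproblem
        x2 = np.linalg.solve(M, A.T @ b + beta * x1 - lam)    # x2-subproblem
        lam = lam - beta * (x1 - x2)                          # dual update
    return x1
```

Each iteration costs one shrinkage and one n×n solve — the implementability that made ADMM popular for this model.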

SLIDE 50

Cont'd

A "renaissance" of ADMM in many application domains, such as image processing, statistical learning, computer vision, and so on.

In 2011, we established ADMM's convergence rate.

Review papers: Boyd et al. 2010; Glowinski 2012; Eckstein and Yao 2012.

SLIDE 54

Accuracy of ADMM

Certainly, pursuing implementability does not mean neglecting accuracy. The accuracy of ADMM's subproblems should be considered seriously:

    x1^{k+1} ≈ arg min { θ1(x1) − (λ^k)^T (A1 x1) + (β/2) ‖A1 x1 + A2 x2^k − b‖² | x1 ∈ X1 },
    x2^{k+1} ≈ arg min { θ2(x2) − (λ^k)^T (A2 x2) + (β/2) ‖A1 x1^{k+1} + A2 x2 − b‖² | x2 ∈ X2 },
    λ^{k+1} = λ^k − β (A1 x1^{k+1} + A2 x2^{k+1} − b).

How can "≈" be defined rigorously above? For the general case, the inexactness criterion for solving these subproblems must be analyzed rigorously [1].

[1] Ng/Wang/Y., Inexact alternating direction methods for image recovery, SIAM Journal on Scientific Computing, 33(4), 1643–1668, 2011.


slide-59
SLIDE 59

Two ADMM Applications

(1) Compressive Sensing (Donoho, Candès, Tao, …)

Allows us to go beyond the Shannon limit by exploiting the sparsity of a signal.

Acquires the important information of a signal efficiently (e.g., saving storage, improving speed).

[Figure: original signal → compressive equipment → observation]

Ideal model: Ax = b, where x is the original signal, A is the sensing matrix (a fat matrix), and b is the observation (with noise).

slide-60
SLIDE 60

The Sparsity of a Signal

Some signals are large-scale but sparse (perhaps in some transform domain).

[Figure: four example signals, each with only a few significant entries]


slide-63
SLIDE 63

Mathematical Model

Find a sparse solution of a system of linear equations:

min { ‖x‖0 | Ax = b, x ∈ R^n },

where ‖x‖0 = the number of nonzeros of x and A ∈ R^{m×n} with m ≪ n.

The solution is in general not unique. It is NP-hard!


slide-65
SLIDE 65

Basic Models for Compressive Sensing

Basis pursuit (BP): min { ‖x‖1 | Ax = b }

ℓ1-regularized least-squares model: min τ‖x‖1 + (1/2)‖Ax − b‖²
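As a quick sanity check on the BP model, one can solve min ‖x‖1 s.t. Ax = b as a linear program via the standard split x = u − v with u, v ≥ 0. A minimal sketch (the problem size, seed, and sparse support below are illustrative choices, not from the talk):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """Solve min ||x||_1 s.t. Ax = b as an LP with x = u - v, u, v >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)                   # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])            # equality constraint: A u - A v = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
    u, v = res.x[:n], res.x[n:]
    return u - v

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 40))        # fat sensing matrix, m << n
x_true = np.zeros(40)
x_true[[3, 17, 29]] = [1.5, -2.0, 0.7]   # a sparse signal
b = A @ x_true
x_hat = basis_pursuit(A, b)
```

By LP optimality, A x_hat = b up to solver tolerance and ‖x_hat‖1 ≤ ‖x_true‖1, since x_true is feasible; with enough measurements the minimizer typically coincides with the sparse signal itself.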

slide-66
SLIDE 66

A Reformulation of the ℓ1-ℓ2 Model

min_x τ‖x‖1 + (1/2)‖Ax − b‖²

By introducing an auxiliary variable y:

min τ‖x‖1 + (1/2)‖Ay − b‖²  s.t. x = y.


slide-69
SLIDE 69

Solutions of ADMM's Subproblems

min τ‖x‖1 + (1/2)‖Ay − b‖²  s.t. x = y.

1. x^{k+1} = argmin_{x ∈ R^n} τ‖x‖1 + (β/2)‖x − y^k − λ^k/β‖²;
2. y^{k+1}: solve (βI + AᵀA) y = Aᵀb + βx^{k+1} − λ^k;
3. λ^{k+1} = λ^k − β(x^{k+1} − y^{k+1}).

Step 1 is a soft-shrinkage operator; Step 2 is a system of linear equations, for which efficient solvers (e.g., PCG or BB) are available.
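A minimal sketch of this three-step iteration (the shrinkage formula and the small dense solve are standard; β, τ, the iteration count, and the test data below are illustrative, not tuned values):

```python
import numpy as np

def shrink(z, t):
    """Soft-shrinkage: closed-form minimizer of t*||x||_1 + (1/2)||x - z||^2."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def admm_l1_ls(A, b, tau, beta=1.0, iters=1000):
    """ADMM for min tau*||x||_1 + (1/2)||A y - b||^2  s.t.  x = y."""
    n = A.shape[1]
    x, y, lam = np.zeros(n), np.zeros(n), np.zeros(n)
    M = beta * np.eye(n) + A.T @ A       # fixed coefficient matrix of step 2
    Atb = A.T @ b
    for _ in range(iters):
        x = shrink(y + lam / beta, tau / beta)          # step 1: soft-shrinkage
        y = np.linalg.solve(M, Atb + beta * x - lam)    # step 2: linear system
        lam = lam - beta * (x - y)                      # step 3: multiplier update
    return x, y

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 60))
x_true = np.zeros(60)
x_true[[5, 20]] = [2.0, -1.0]
b = A @ x_true
x, y = admm_l1_ls(A, b, tau=0.1)
```

After enough iterations x ≈ y, and the common value approximately solves the ℓ1-ℓ2 model; in practice the Cholesky factor of βI + AᵀA is cached once, and for large A the solve is replaced by PCG as noted above.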


slide-72
SLIDE 72

Another ADMM Application

(2) Image deblurring. A clean image can be degraded by blur: defocus of the camera's lens, a moving object, turbulence in the air, …

min ‖∇x‖1 + (µ/2)‖Kx − x0‖²,

where x is the clean image, x0 is the corrupted image (with Gaussian noise), K is the point spread function (blur), ∇ is a gradient operator (by Rudin/Osher/Fatemi, '92) that preserves sharp edges of an image, and µ is a trade-off parameter.

[Figure: original image, blurred image, restored image]


slide-76
SLIDE 76

Applying ADMM

Reformulate it as

min ‖y‖1 + (µ/2)‖Kx − x0‖²  s.t. ∇x = y,

to which ADMM is applicable. The resulting subproblems are easy.

The x-subproblem (via a DFT):

x^{k+1} = argmin_x { (µ/2)‖Kx − x0‖² − (λ^k)ᵀ(∇x − y^k) + (β/2)‖∇x − y^k‖² }.

The y-subproblem (via a shrinkage):

y^{k+1} = argmin_y { ‖y‖1 − (λ^k)ᵀ(∇x^{k+1} − y) + (β/2)‖∇x^{k+1} − y‖² }.
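The x-subproblem's normal equations are (µKᵀK + β∇ᵀ∇)x = µKᵀx0 + ∇ᵀ(λ^k + βy^k); with periodic boundary conditions K and ∇ are circular convolutions, so the system diagonalizes under the DFT, which is what "via a DFT" refers to. A minimal 1D sketch (the kernel, sizes, and parameters are illustrative; real deblurring applies the same idea with 2D FFTs):

```python
import numpy as np

def x_subproblem_fft(k, d, x0, y, lam, mu, beta):
    """Solve (mu*K^T K + beta*D^T D) x = mu*K^T x0 + D^T (lam + beta*y),
    where K (blur) and D (gradient) are periodic convolutions with kernels
    k and d, hence diagonalized by the DFT."""
    Kh, Dh = np.fft.fft(k), np.fft.fft(d)                 # transfer functions
    rhs = mu * np.conj(Kh) * np.fft.fft(x0) + np.conj(Dh) * np.fft.fft(lam + beta * y)
    denom = mu * np.abs(Kh) ** 2 + beta * np.abs(Dh) ** 2
    return np.real(np.fft.ifft(rhs / denom))

n = 64
k = np.zeros(n); k[:3] = 1.0 / 3.0          # a toy 3-tap blur kernel
d = np.zeros(n); d[0], d[-1] = 1.0, -1.0    # periodic forward difference
rng = np.random.default_rng(2)
x0, y, lam = rng.standard_normal((3, n))
x = x_subproblem_fft(k, d, x0, y, lam, mu=2.0, beta=1.0)
```

Each iteration thus costs a few FFTs for the x-step plus an elementwise shrinkage for the y-step, which is why this ADMM splitting is popular for total-variation deblurring.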

slide-77
SLIDE 77

Image Inpainting

Problem: some pixels of the image are missing; only partial information of the image is available: g = S f, where S is a mask.

Model: min { ‖∇f‖1 | S f = g }

[Figure: original image, image with missing pixels, restored image]

slide-78
SLIDE 78

Image Decomposition

Problem: separate the sketch (cartoon) and the oscillating component (texture) of an image: f = u + v, where u is the cartoon part and v is the texture part.

Model: min { τ‖∇u‖1 + ‖v‖−1,∞ | u + v = f }

[Figure: original image, cartoon part, texture part]

slide-79
SLIDE 79

Magnetic Resonance Imaging (MRI)

Problem: reconstruct a medical image by sampling its Fourier coefficients partially: Fg = PFf, where P is a sampling mask and F is the Fourier transform.

Model: min { ‖∇f‖1 | Fg = PFf }

[Figure: medical image, sampling mask, reconstruction]

slide-80
SLIDE 80

Outline

1 Backgrounds
2 Accuracy v.s. Implementability – An Easier Case
3 Accuracy v.s. Implementability – A More Complicated Case
4 Conclusions


slide-82
SLIDE 82

A More Complicated Model with Higher Degree of Separability

A more complicated multi-block separable convex optimization model:

min { ∑_{i=1}^m θi(xi) | ∑_{i=1}^m Ai xi = b, xi ∈ Xi, i = 1, 2, …, m },

with m ≥ 3. Applications include:

- the image alignment problem;
- the robust principal component analysis model with noisy and incomplete data;
- the latent variable Gaussian graphical model selection;
- the quadratic discriminant analysis model.

slide-83
SLIDE 83

Splitting Versions with Less Accuracy but More Implementability

Obviously, the parallel (Jacobian) splitting

x1^{k+1} = argmin{ θ1(x1) − (λ^k)ᵀ(A1x1) + (β/2)‖A1x1 + ∑_{j=2}^m Aj xj^k − b‖² | x1 ∈ X1 },
……
xi^{k+1} = argmin{ θi(xi) − (λ^k)ᵀ(Ai xi) + (β/2)‖∑_{j=1}^{i−1} Aj xj^k + Ai xi + ∑_{j=i+1}^m Aj xj^k − b‖² | xi ∈ Xi },
……
xm^{k+1} = argmin{ θm(xm) − (λ^k)ᵀ(Am xm) + (β/2)‖∑_{j=1}^{m−1} Aj xj^k + Am xm − b‖² | xm ∈ Xm },
λ^{k+1} = λ^k − β(∑_{i=1}^m Ai xi^{k+1} − b),

does not work (more details are coming).


slide-86
SLIDE 86

Cont'd

Can we extend ADMM straightforwardly (by splitting the ALM into m subproblems sequentially)?

x1^{k+1} = argmin{ θ1(x1) − (λ^k)ᵀ(A1x1) + (β/2)‖A1x1 + ∑_{j=2}^m Aj xj^k − b‖² | x1 ∈ X1 },
……
xi^{k+1} = argmin{ θi(xi) − (λ^k)ᵀ(Ai xi) + (β/2)‖∑_{j=1}^{i−1} Aj xj^{k+1} + Ai xi + ∑_{j=i+1}^m Aj xj^k − b‖² | xi ∈ Xi },
……
xm^{k+1} = argmin{ θm(xm) − (λ^k)ᵀ(Am xm) + (β/2)‖∑_{j=1}^{m−1} Aj xj^{k+1} + Am xm − b‖² | xm ∈ Xm },
λ^{k+1} = λ^k − β(∑_{i=1}^m Ai xi^{k+1} − b).

This direct extension of the ADMM has been widely used in the literature, and it does work very well for many applications! But for a very long time, neither an affirmative convergence proof nor a counterexample showing its divergence was available.


slide-90
SLIDE 90

Recently we² found some examples showing the divergence of the direct extension of ADMM even when m = 3. So the direct extension of ADMM for the multi-block separable convex optimization model is not necessarily convergent! That is, even to solve

min { θ1(x1) + θ2(x2) + θ3(x3) | A1x1 + A2x2 + A3x3 = b, xi ∈ Xi, i = 1, 2, 3 },

the following scheme is not necessarily convergent:

x1^{k+1} = argmin{ θ1(x1) − (λ^k)ᵀ(A1x1) + (β/2)‖A1x1 + A2x2^k + A3x3^k − b‖² | x1 ∈ X1 },
x2^{k+1} = argmin{ θ2(x2) − (λ^k)ᵀ(A2x2) + (β/2)‖A1x1^{k+1} + A2x2 + A3x3^k − b‖² | x2 ∈ X2 },
x3^{k+1} = argmin{ θ3(x3) − (λ^k)ᵀ(A3x3) + (β/2)‖A1x1^{k+1} + A2x2^{k+1} + A3x3 − b‖² | x3 ∈ X3 },
λ^{k+1} = λ^k − β(A1x1^{k+1} + A2x2^{k+1} + A3x3^{k+1} − b).

Both Jacobian and Gauss-Seidel decompositions fail: too much loss of accuracy for m ≥ 3!

² Chen/He/Ye/Y., The direct extension of ADMM for multi-block separable convex minimization models is not necessarily convergent, September 2013.
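For θi ≡ 0 and scalar blocks, every subproblem in the three-block scheme above is a one-dimensional quadratic, so the direct extension becomes an explicit linear iteration that is easy to experiment with. A minimal sketch (the 3×3 matrix is of the kind used in the Chen/He/Ye/Y. counterexample, reproduced here from memory; the starting point, β, and iteration count are illustrative, so consult the paper for the exact data):

```python
import numpy as np

def direct_extension_admm_3block(A, b, beta, x0, lam0, iters):
    """Direct (Gauss-Seidel) extension of ADMM for min 0 s.t. sum_i A[:, i]*x_i = b,
    i.e. theta_i = 0 and scalar blocks, so each subproblem is a 1-D quadratic."""
    x = x0.astype(float).copy()
    lam = lam0.astype(float).copy()
    res = []
    for _ in range(iters):
        for i in range(3):                   # sequential block minimization
            a_i = A[:, i]
            c_i = A @ x - a_i * x[i] - b     # fixed contribution of the other blocks
            # argmin_{x_i} -lam^T (a_i x_i) + (beta/2)*||a_i x_i + c_i||^2
            x[i] = (a_i @ (lam / beta - c_i)) / (a_i @ a_i)
        lam = lam - beta * (A @ x - b)       # multiplier update
        res.append(np.linalg.norm(A @ x - b))
    return x, lam, np.array(res)

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, 2.0]])
x, lam, res = direct_extension_admm_3block(A, np.zeros(3), beta=1.0,
                                           x0=np.ones(3), lam0=np.ones(3), iters=100)
```

Tracking `res` (the constraint residual per iteration) lets one observe that the residual need not decay on such data, in line with the divergence result above; with two blocks the same pattern reduces to the convergent ADMM.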


slide-94
SLIDE 94

One Way of Applying the ADMM

Conceptually, we can treat the multi-block model as a two-block model:

min { θ1(x1) + θ2(x2) + θ3(x3) | A1x1 + A2x2 + A3x3 = b, xi ∈ Xi, i = 1, 2, 3 }.

Then apply the original ADMM (for the two-block case):

x1^{k+1} = argmin{ θ1(x1) − (λ^k)ᵀ(A1x1) + (β/2)‖A1x1 + A2x2^k + A3x3^k − b‖² | x1 ∈ X1 },
(x2^{k+1}, x3^{k+1}) = argmin{ θ2(x2) + θ3(x3) − (λ^k)ᵀ(A2x2 + A3x3 − b) + (β/2)‖A1x1^{k+1} + A2x2 + A3x3 − b‖² | x2 ∈ X2, x3 ∈ X3 },
λ^{k+1} = λ^k − αβ(A1x1^{k+1} + A2x2^{k+1} + A3x3^{k+1} − b).

It is accurate (recall ADMM's convergence). But it is not implementable (the joint (x2, x3)-subproblem is hard to solve).


slide-100
SLIDE 100

ADMM with Further Splitting

Split the (x2, x3)-subproblem in parallel:

x1^{k+1} = argmin{ θ1(x1) − (λ^k)ᵀ(A1x1) + (β/2)‖A1x1 + A2x2^k + A3x3^k − b‖² | x1 ∈ X1 },
x2^{k+1} = argmin{ θ2(x2) − (λ^k)ᵀ(A2x2 + A3x3^k − b) + (β/2)‖A1x1^{k+1} + A2x2 + A3x3^k − b‖² | x2 ∈ X2 },
x3^{k+1} = argmin{ θ3(x3) − (λ^k)ᵀ(A2x2^k + A3x3 − b) + (β/2)‖A1x1^{k+1} + A2x2^k + A3x3 − b‖² | x3 ∈ X3 },
λ^{k+1} = λ^k − αβ(A1x1^{k+1} + A2x2^{k+1} + A3x3^{k+1} − b).

Split the (x2, x3)-subproblem sequentially:

x1^{k+1} = argmin{ θ1(x1) − (λ^k)ᵀ(A1x1) + (β/2)‖A1x1 + A2x2^k + A3x3^k − b‖² | x1 ∈ X1 },
x2^{k+1} = argmin{ θ2(x2) − (λ^k)ᵀ(A2x2 + A3x3^k − b) + (β/2)‖A1x1^{k+1} + A2x2 + A3x3^k − b‖² | x2 ∈ X2 },
x3^{k+1} = argmin{ θ3(x3) − (λ^k)ᵀ(A2x2^{k+1} + A3x3 − b) + (β/2)‖A1x1^{k+1} + A2x2^{k+1} + A3x3 − b‖² | x3 ∈ X3 },
λ^{k+1} = λ^k − αβ(A1x1^{k+1} + A2x2^{k+1} + A3x3^{k+1} − b).

Both are implementable, but how about the accuracy? Both are not necessarily convergent (Liu/Lu/Y., in preparation). Implementable but not accurate!


slide-103
SLIDE 103

Convergence Guarantee

How to guarantee the convergence while retaining the implementability?

- Correct the output of the decomposed subproblems; see our work in 2011-2013.
- Proximally regularize the decomposed subproblems (this works even when the ALM subproblem is decomposed in parallel); see He/Xu/Y., Deng/Lai/Pang/Yin, Wang/Hong/Ma/Luo, etc.
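A minimal sketch of the proximal-regularization route for the parallel (Jacobian) decomposition: each block's subproblem gains a term (τi/2)(xi − xi^k)², taken here with θi ≡ 0 and scalar blocks so that every update is in closed form. The data reuse the 3×3 example from the divergence discussion; the choice τi = 10β‖Ai‖² is an illustrative value intended to satisfy the "sufficiently large proximal parameter" requirement quantified in the cited works:

```python
import numpy as np

def proximal_jacobian_admm(A, b, beta, x0, lam0, iters):
    """Parallel (Jacobian) decomposition of the ALM subproblem, each block
    proximally regularized by (tau_i/2)*(x_i - x_i^k)^2; theta_i = 0, scalar blocks."""
    x = x0.astype(float).copy()
    lam = lam0.astype(float).copy()
    m = A.shape[1]
    tau = np.array([10.0 * beta * (A[:, i] @ A[:, i]) for i in range(m)])
    res = []
    for _ in range(iters):
        Ax = A @ x
        x_new = x.copy()
        for i in range(m):                    # every block uses the SAME x^k
            a_i = A[:, i]
            c_i = Ax - a_i * x[i] - b         # fixed contribution of the others
            # argmin -lam^T(a_i x_i) + (beta/2)||a_i x_i + c_i||^2 + (tau_i/2)(x_i - x_i^k)^2
            x_new[i] = (a_i @ lam - beta * (a_i @ c_i) + tau[i] * x[i]) \
                       / (beta * (a_i @ a_i) + tau[i])
        x = x_new
        lam = lam - beta * (A @ x - b)        # multiplier update
        res.append(np.linalg.norm(A @ x - b))
    return x, lam, np.array(res)

A = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 2.0], [1.0, 2.0, 2.0]])
x, lam, res = proximal_jacobian_admm(A, b=np.zeros(3), beta=1.0,
                                     x0=np.ones(3), lam0=np.ones(3), iters=5000)
```

In contrast with the unregularized Jacobian splitting, the residual history `res` decays on this data; how large the proximal parameters must be in general is exactly what the cited analyses quantify.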


slide-108
SLIDE 108

Accuracy v.s. Implementability – A More Complicated Case 2014 Workshop on Optimization for Modern Computation, Peking Univesity

Accuracy Improvement

For the convergence-guaranteed and implementability-preserved algorithms, How to design inexact criteria for the subproblems for the general setting? (Y., ongoing) Do we really need to decompose m times? — How about decomposing less blocks thus preserve more accuracy of the subproblems?

Xiaoming Yuan (HKBU) Accuracy v.s. Implementability in Optimization September 02, 2014 34 / 37

slide-109
SLIDE 109

Accuracy v.s. Implementability – A More Complicated Case 2014 Workshop on Optimization for Modern Computation, Peking Univesity

Accuracy Improvement

For the convergence-guaranteed and implementability-preserved algorithms, How to design inexact criteria for the subproblems for the general setting? (Y., ongoing) Do we really need to decompose m times? — How about decomposing less blocks thus preserve more accuracy of the subproblems? We can regroup m block as t blocks with t ≪ m, apply existing methods for the t-block reformulated model to gain the accuracy (i.e., the proved convergence) and further decompose each subproblem to gain the implementability

Xiaoming Yuan (HKBU) Accuracy v.s. Implementability in Optimization September 02, 2014 34 / 37

slide-110
SLIDE 110

Accuracy v.s. Implementability – A More Complicated Case 2014 Workshop on Optimization for Modern Computation, Peking Univesity

Accuracy Improvement

For the convergence-guaranteed and implementability-preserved algorithms, How to design inexact criteria for the subproblems for the general setting? (Y., ongoing) Do we really need to decompose m times? — How about decomposing less blocks thus preserve more accuracy of the subproblems? We can regroup m block as t blocks with t ≪ m, apply existing methods for the t-block reformulated model to gain the accuracy (i.e., the proved convergence) and further decompose each subproblem to gain the implementability —-(He/Y. and Fu/He/Wang/Y.’s work in August 2014)
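The regrouping step can be sketched as follows: partition the m coefficient matrices into t ≪ m super-blocks, each of which a t-block method then treats as a single variable. The helper name and the even splitting are our illustrative assumptions.

```python
import numpy as np

def regroup_blocks(A_list, t):
    """Regroup m block matrices into t super-blocks (t << m) by
    horizontally stacking consecutive blocks; the stacked matrix of a
    group plays the role of one coefficient matrix in a t-block scheme."""
    m = len(A_list)
    groups = np.array_split(np.arange(m), t)  # near-even partition of block indices
    return [np.hstack([A_list[i] for i in g]) for g in groups]
```

Stacking the super-blocks back together reproduces the original m-block coefficient matrix, so the t-block model is an exact reformulation, not an approximation.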


slide-111
SLIDE 111

Conclusions 2014 Workshop on Optimization for Modern Computation, Peking University

Outline

1

Backgrounds

2

Accuracy v.s. Implementability – An Easier Case

3

Accuracy v.s. Implementability – A More Complicated Case

4

Conclusions


slide-112
SLIDE 112

Conclusions

Accuracy and implementability are two common yet often conflicting objectives in algorithmic design.

We showed, through convex optimization models with strong application backgrounds (imaging, learning, cloud computing, big data, etc.), how to account for both objectives.

Interesting theoretical questions arise, such as convergence-rate analysis (introducing new analytic tools such as variational analysis).

The approach extends to further areas (e.g., PDE or PDE-constrained optimization (control) problems).

Application-driven optimization makes sense!


slide-117
SLIDE 117


Thank you! xmyuan@hkbu.edu.hk
