SLIDE 1

Operator splitting techniques and their application to embedded optimization problems

Puya Latafat (Joint work with Panagiotis Patrinos)

IMT School for Advanced Studies Lucca, puya.latafat@imtlucca.it
Department of Electrical Engineering (ESAT-STADIUS), KU Leuven, panos.patrinos@esat.kuleuven.be

September 1, 2016

Puya Latafat IMTLucca, KU-Leuven AFBA September 1, 2016 1 / 19

SLIDE 2

Outline

◮ structured optimization problem
◮ monotone operators and the splitting principle
◮ a primal-dual algorithm
◮ application: distributed optimization

Based on

1. Latafat and Patrinos. "Asymmetric Forward-Backward-Adjoint Splitting for Solving Monotone Inclusions Involving Three Operators." arXiv preprint arXiv:1602.08729 (2016).
2. Latafat, Stella, and Patrinos. "New Primal-Dual Proximal Algorithms for Distributed Optimization." Accepted for the 55th IEEE Conference on Decision and Control (2016).

SLIDE 3

Structured Optimization Problem

minimize_{x ∈ ℝ^n}   f(x)   +   g(Lx)   +   h(x)
                  (nonsmooth)  (nonsmooth)   (smooth)

◮ f: ℝ^n → ℝ ∪ {+∞} and g: ℝ^m → ℝ ∪ {+∞} are proper closed convex functions with easy-to-compute proximal maps
◮ L: ℝ^n → ℝ^m is a linear operator
◮ h is a differentiable function and ∇h(·) is β-Lipschitz

example 1: MPC formulations, with h the quadratic cost, L encoding the dynamics, and g and f indicator functions for constraints on states and inputs
example 2: distributed optimization over graphs (this talk)
more examples: machine learning and signal processing

◮ goal: find the solution as a fixed point of an operator

SLIDE 4

Example: Generalized Lasso with Box Constraint

minimize_{x ∈ ℝ^n}   (1/2)‖Ax − b‖₂² + λ‖Lx‖₁
subject to   l ≤ x ≤ u

equivalently

minimize_{x ∈ ℝ^n}   (1/2)‖Ax − b‖₂²   +   λ‖Lx‖₁   +   δ_{l≤x≤u}(x)
                          h(x)               g(Lx)           f(x)

◮ we want algorithms that involve only L, L⊤, prox_f, prox_g, ∇h
◮ prox_{g∘L} is not trivial (unless L⊤L = α Id)
◮ no inner loops or linear systems to solve
◮ no need to introduce dummy variables

SLIDE 5

Subgradients and Monotone Operators

◮ the subdifferential of f is the set-valued operator

∂f: x ↦ {u ∈ ℝ^n | ⟨y − x, u⟩ + f(x) ≤ f(y) for all y ∈ dom f}

example: 0 ∈ ∂f(x⋆) ⇒ f(x⋆) ≤ f(y) for all y ∈ dom f, so x⋆ is a global minimizer
example: for differentiable f, ∂f = {∇f}

[figure: a convex f with two affine minorants f(x₁) + ⟨u₁, x − x₁⟩ and f(x₁) + ⟨u₂, x − x₁⟩ supporting it at a kink x₁]

SLIDE 6

◮ a set-valued mapping A is monotone if

⟨x − y, u − v⟩ ≥ 0 for all x, y and all u ∈ Ax, v ∈ Ay

example: ∂f for a proper convex function f

◮ A is maximally monotone if its graph is not properly contained in the graph of another monotone mapping

example: ∂f for a proper closed convex function f


SLIDE 8

◮ proximal mapping of a proper closed convex function f:

prox_{γf}(x) = argmin_z { f(z) + (1/2γ)‖x − z‖² }

◮ the minimizer is unique
◮ closed-form solution for many functions such as ℓ₁ and ℓ₂ norms, quadratics, the log barrier, ...

example: f = δ_C ⇒ prox_{γf} = P_C
example: f(x) = (1/2)x⊤Qx + q⊤x ⇒ prox_{γf}(x) = (I + γQ)⁻¹(x − γq)

◮ equivalently, z = prox_{γf}(x) solves 0 ∈ z − x + γ∂f(z); this is the resolvent of ∂f:

J_{γ∂f} = (Id + γ∂f)⁻¹ = prox_{γf}

◮ not every monotone operator can be written as the subgradient of a function

example: a skew-symmetric linear operator is monotone but not the subgradient of any function
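As an illustration (a sketch not taken from the slides; NumPy is assumed), the three proximal maps above can each be computed in one line:

```python
import numpy as np

def prox_l1(x, gamma):
    """Prox of gamma*||.||_1: componentwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def prox_box(x, l, u):
    """Prox of the indicator delta_C for C = [l, u]: the projection P_C."""
    return np.clip(x, l, u)

def prox_quadratic(x, gamma, Q, q):
    """Prox of f(x) = 0.5*x^T Q x + q^T x, i.e. (I + gamma*Q)^{-1}(x - gamma*q)."""
    n = Q.shape[0]
    return np.linalg.solve(np.eye(n) + gamma * Q, x - gamma * q)
```

Note that the projection requires no stepsize γ: the prox of an indicator function is the projection regardless of γ.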

SLIDE 9

Operator Splitting Framework

minimize_{x ∈ ℝ^n}   f(x)   +   g(Lx)   +   h(x)
                  (nonsmooth)  (nonsmooth)   (smooth)

◮ unconstrained minimization
◮ optimality condition

0 ∈ ∂f(x) + L∗∂g(Lx) + ∇h(x)

◮ monotone inclusion form:

0 ∈ Ax + L∗BLx + Cx

◮ A and B (here ∂f and ∂g) are set-valued; C is single-valued

SLIDE 10

Initial Value Problem and Euler's Methods

the path-following problem

dx(t)/dt = −∇h(x(t)),   x(0) = x₀

x(t) → x⋆ such that x⋆ minimizes h(·)

◮ Euler's forward method (explicit)

(x(t + Δt) − x(t))/Δt ≈ −∇h(x(t))   ⇒   x^{k+1} = x^k − γ∇h(x^k)

◮ Euler's backward method (implicit)

(x(t + Δt) − x(t))/Δt ≈ −∇h(x(t + Δt))   ⇒   x^{k+1} = x^k − γ∇h(x^{k+1})

◮ the implicit method is known to have better stability properties
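A minimal numerical sketch (NumPy assumed; h(x) = ½‖x‖² is an invented example chosen so that both schemes have a closed form):

```python
import numpy as np

def forward_euler(grad_h, x0, gamma, iters):
    """Explicit scheme: x_{k+1} = x_k - gamma * grad_h(x_k)."""
    x = x0
    for _ in range(iters):
        x = x - gamma * grad_h(x)
    return x

def backward_euler_quadratic(x0, gamma, iters):
    """Implicit scheme for h(x) = 0.5*||x||^2: solving
    x_{k+1} = x_k - gamma * x_{k+1} gives x_{k+1} = x_k / (1 + gamma),
    stable for every gamma > 0 (the explicit scheme needs gamma < 2 here)."""
    x = x0
    for _ in range(iters):
        x = x / (1.0 + gamma)
    return x

x0 = np.array([4.0, -2.0])
x_fwd = forward_euler(lambda x: x, x0, 0.5, 50)   # small step: converges to the minimizer 0
x_bwd = backward_euler_quadratic(x0, 10.0, 10)    # very large step: still converges to 0
```

With the same large step γ = 10 the explicit scheme blows up, which is exactly the stability gap the slide refers to.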

SLIDE 11

◮ the big idea is to generalize this to the inclusion problem

0 ∈ dx(t)/dt + Tx(t)

◮ the forward step (explicit): x^{k+1} ∈ x^k − γTx^k
◮ the backward step (implicit, less sensitive to ill conditioning):

x^{k+1} ∈ (Id + γT)⁻¹ x^k     (the resolvent)

◮ the splitting principle: find x such that 0 ∈ Tx
◮ the idea is to combine these basic operations, also borrowed from finite differences
◮ the backward step J_{γT} = (Id + γT)⁻¹ might not be easy to compute
◮ split T = A + B + ⋯ with one or more terms having an easy-to-compute resolvent

SLIDE 12

Operator Splittings

◮ two-term splittings:
  ◮ forward-backward splitting (Mercier, 1979)
  ◮ Douglas-Rachford splitting (Lions and Mercier, 1979)
  ◮ Tseng's forward-backward-forward splitting (Tseng, 2000)
◮ three-term splittings:
  ◮ three-operator splitting (Davis and Yin, 2015)
  ◮ Vũ-Condat primal-dual algorithm, equivalent to forward-backward splitting in a certain space (Vũ and Condat, 2013)
  ◮ forward-Douglas-Rachford splitting, only when the third operator is a normal cone operator (Briceño-Arias, 2013)
◮ our proposed method: Asymmetric Forward-Backward-Adjoint splitting (AFBA)

SLIDE 13

Forward-Backward Splitting

◮ monotone inclusion 0 ∈ Ax + Bx
◮ A maximally monotone, B single-valued and cocoercive
◮ minimization problem

minimize_{x ∈ ℝ^n}   f(x)   +   h(x)
                  (nonsmooth)   (smooth)

◮ forward-backward iteration

x^{n+1} = (Id + γA)⁻¹(Id − γB)x^n

◮ proximal gradient method:

x^{n+1} = prox_{γf}(x^n − γ∇h(x^n))
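A sketch of the proximal gradient iteration for f = λ‖·‖₁ and h = ½‖Ax − b‖² (the data below is invented; with A = I the solution reduces to soft-thresholding of b, which makes the result easy to check):

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding: prox of t*||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_grad(A, b, lam, gamma, iters=500):
    """x_{n+1} = prox_{gamma*f}(x_n - gamma * grad_h(x_n))
    with h(x) = 0.5*||Ax - b||^2 and f = lam*||.||_1."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                 # forward (explicit) step on h
        x = soft(x - gamma * grad, gamma * lam)  # backward step on f
    return x

A = np.eye(2)
b = np.array([2.0, 0.1])
x = prox_grad(A, b, lam=0.5, gamma=0.5)  # -> approximately soft(b, 0.5) = [1.5, 0.0]
```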

SLIDE 14

A Primal-Dual Algorithm

Algorithm 1
Inputs: x^0 ∈ ℝ^n, y^0 ∈ ℝ^m
for n = 0, 1, ... do
  x̄^n = prox_{γ1 f}(x^n − γ1 L⊤y^n − γ1 ∇h(x^n))
  y^{n+1} = prox_{γ2 g∗}(y^n + γ2 Lx̄^n)
  x^{n+1} = x̄^n − γ1 L⊤(y^{n+1} − y^n)

◮ Arrow-Hurwicz-type updates
◮ it converges if βγ1 < 2 − γ1γ2‖L‖² − √(γ1γ2‖L‖²)
◮ 2 matrix-vector products per iteration
◮ a new algorithm
◮ it generalizes Drori, Sabach, and Teboulle (2015) to include a nonsmooth term f

SLIDE 15

Example: Generalized Lasso with Box Constraint

minimize_{x ∈ ℝ^n}   (1/2)‖Ax − b‖₂² + λ‖Lx‖₁
subject to   l ≤ x ≤ u

equivalently

minimize_{x ∈ ℝ^n}   (1/2)‖Ax − b‖₂²   +   λ‖Lx‖₁   +   δ_{l≤x≤u}(x)
                          h(x)               g(Lx)           f(x)

The steps become

x̄^n = P_{[l,u]}(x^n − γ1 L⊤y^n − γ1 A⊤(Ax^n − b))
y^{n+1} = P_{[−λ,λ]}(y^n + γ2 Lx̄^n)     (prox_{γ2 g∗} is the projection onto the ℓ∞-ball of radius λ)
x^{n+1} = x̄^n − γ1 L⊤(y^{n+1} − y^n)

SLIDE 16

Distributed Optimization

◮ large networks, each node with its own data-processing unit
◮ each agent can communicate with its neighbors and is not aware of the other agents in the network
◮ plug and play / distributed reconfiguration: with the addition or removal of agents, only the neighbors are affected
◮ possibility of asynchronous algorithms, including transmission delays

SLIDE 17

Application: AFBA and Distributed Optimization

minimize_{x ∈ ℝ^n}   Σ_{i=1}^N   f_i(x) + g_i(L_i x) + h_i(x)     (f_i, g_i nonsmooth, h_i smooth)

◮ N agents, each with private f_i, g_i, L_i, h_i
◮ undirected connected graph G = (V, E)
◮ each agent i can communicate with its neighbors j ∈ N_i = {j ∈ V | (i, j) ∈ E}

[figure: a 6-node example graph in which agent 1 has neighbors N₁ = {2, 4, 6} and holds f₁, g₁, L₁, h₁]

◮ goal: minimize the aggregate of private cost functions over a connected graph, i.e.

minimize_{x_i ∈ ℝ^n}   Σ_{i=1}^N   f_i(x_i) + g_i(L_i x_i) + h_i(x_i)
subject to   x_i = x_j for (i, j) ∈ E


SLIDE 19

Application: AFBA and Distributed Optimization

◮ consider any orientation of the edges of G
◮ B is a node-arc incidence matrix of the resulting directed graph
◮ A = B⊤ ⊗ I_n ∈ ℝ^{Mn×Nn} (M edges, N nodes)

minimize_{x_i ∈ ℝ^n}   Σ_{i=1}^N   f_i(x_i) + h_i(x_i) + g_i(C_i x_i) + δ_{{0}}(Ax)

◮ has the structure suitable for AFBA
◮ inherits properties of AFBA such as O(1/(n+1)) and o(1/(n+1)) convergence rates

Algorithm 2 (distributed version of Algorithm 4)
Inputs: σ_i > 0, τ_i > 0, κ_{i,j} > 0 for j ∈ N_i, i = 1, ..., N; initial values x_i^0 ∈ ℝ^n, y_i^0 ∈ ℝ^{r_i}, ρ_i^0 ∈ ℝ^n, v_i^0 = C_i⊤ y_i^0 + ρ_i^0
for k = 0, 1, ... do
  for each agent i = 1, ..., N do
    Local steps:
      x̄_i^k = prox_{σ_i f_i}(x_i^k − σ_i v_i^k − σ_i ∇h_i(x_i^k))
      y_i^{k+1} = prox_{τ_i g_i∗}(y_i^k + τ_i C_i x̄_i^k)
    Exchange of information with neighbors:
      ρ_i^{k+1} = ρ_i^k + Σ_{j∈N_i} κ_{i,j} (x̄_i^k − x̄_j^k)
    Local steps:
      v_i^{k+1} = C_i⊤ y_i^{k+1} + ρ_i^{k+1}
      x_i^{k+1} = x̄_i^k − σ_i (v_i^{k+1} − v_i^k)
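The message pattern above can be sketched on a toy consensus problem (not from the slides: f_i = g_i = 0 and h_i(x) = ½(x − a_i)², so the aggregate minimizer is the average of the a_i; the graph, data, and stepsizes are invented, and with C_i = 0 the y-updates vanish so v_i = ρ_i):

```python
import numpy as np

# toy consensus problem: each agent i privately holds a_i and the
# aggregate minimizer of sum_i 0.5*(x - a_i)^2 is mean(a)
a = np.array([1.0, 2.0, 6.0])
neighbors = {0: [1], 1: [0, 2], 2: [1]}   # path graph 0 - 1 - 2
N = len(a)
sigma, kappa = 0.2, 0.2                   # stepsizes assumed small enough

x = np.zeros(N)
rho = np.zeros(N)                         # dual variable of the consensus constraint
for _ in range(5000):
    # local forward-backward step (prox of f_i = 0 is the identity)
    xbar = x - sigma * rho - sigma * (x - a)
    # exchange with neighbors: dual update on the edge disagreements
    rho_new = rho.copy()
    for i in range(N):
        for j in neighbors[i]:
            rho_new[i] += kappa * (xbar[i] - xbar[j])
    # local correction step
    x = xbar - sigma * (rho_new - rho)
    rho = rho_new
# every agent's x_i approaches the consensus value mean(a) = 3.0
```

Since κ is symmetric and ρ starts at zero, Σ_i ρ_i stays zero throughout, which forces the fixed point to be the exact average rather than some other consensus value.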

SLIDE 20

Numerical Simulation

minimize   λ‖x‖₁ + Σ_{i=1}^N (1/2)‖D_i x − d_i‖₂²
subject to   C_i x ≤ c_i,  i = 1, ..., N

◮ f_i(x) = (λ/N)‖x‖₁,  g_i(z) = δ_{z ≤ c_i}(z),  h_i(x) = (1/2)‖D_i x − d_i‖₂²
◮ N = 64, n = 500
◮ D_i ∈ ℝ^{m_i×n}, C_i ∈ ℝ^{r_i×n}, d_i ∈ ℝ^{m_i}, c_i ∈ ℝ^{r_i}
◮ state of the art: Vũ-Condat
◮ new primal-dual Algorithms 3 and 4
◮ the distributed version of Algorithm 4 usually performs better and requires less information exchange

[figure: convergence plot over 5,000 iterations comparing Dist. Vũ-Condat, Dist. Algorithm 3, and Dist. Algorithm 4; the vertical axis spans 10⁻⁷ to 10¹]

SLIDE 21

Thank You
