SLIDE 1

Stochastic Forward-Backward Splitting

Silvia Villa

joint work with Lorenzo Rosasco and Bang Cong Vũ

Laboratory for Computational and Statistical Learning, IIT and MIT http://lcsl.mit.edu/data/silviavilla

2015 – Dagstuhl Seminar “Mathematical and Computational Foundations of Learning Theory”

SLIDE 2

Introduction

Problem setting

Given a separable Hilbert space H, we consider the problem

min_{w∈H} T(w),   T(w) = F(w) + R(w),

with
F : H → R convex and continuously differentiable, with Lipschitz continuous gradient, i.e., ‖∇F(w) − ∇F(w′)‖ ≤ β‖w − w′‖
R : H → R ∪ {+∞} proper, convex, and lower semicontinuous

SLIDE 3

Introduction

Statistical learning with regularization

Given: Hilbert space H, measure space (Ω, A, P), loss function ℓ : H × Ω → R+

Goal: approximate the infimum of

F(w) = ∫_Ω ℓ(w, ξ) dP(ξ),

given a training set {ξ1, . . . , ξm} of points sampled from P.

If, for every ξ ∈ Ω:
ℓ(·, ξ) is convex
∇ℓ(·, ξ) is Lipschitz continuous (uniformly w.r.t. ξ)
then F is convex and ∇F is Lipschitz continuous.

SLIDE 4

Introduction

Statistical learning with regularization

Given: Hilbert space H, measure space (Ω, A, P), loss function ℓ : H × Ω → R+

Goal: approximate the infimum of

∫_Ω ℓ(w, ξ) dP(ξ) + R(w),

given a training set {ξ1, . . . , ξm} of points sampled from P.

If, for every ξ ∈ Ω:
ℓ(·, ξ) is convex
∇ℓ(·, ξ) is Lipschitz continuous (uniformly w.r.t. ξ)
then F is convex and ∇F is Lipschitz continuous.

SLIDE 6

Introduction

Statistical learning with regularization cont’d

A common strategy is to minimize

T(w) = (1/m) Σ_{i=1}^m ℓ(w, ξi) + R(w)

Example: given a Hilbert space X (input space), Y ⊂ R (output space), and L : R × Y → R+, set Ω = X × Y, ξ = (x, y), ℓ(w, ξ) = L(⟨w, x⟩, y).

SLIDE 7

Introduction

Forward-backward splitting algorithm (proximal gradient algorithm)

Given w0 ∈ H and γn ∈ [ε, 2/β − ε], define

(FB)   wn+1 = prox_{γn R}(wn − γn ∇F(wn))

with

prox_{γR}(w) = argmin_{v∈H} { R(v) + (1/2γ)‖v − w‖² }

See [Bauschke-Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2011].
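To make the update concrete, here is a minimal NumPy sketch of FB for a lasso-type instance, assuming F(w) = ½‖Aw − b‖² and R = λ‖·‖₁, so that prox_{γR} is componentwise soft-thresholding; the data A, b and the constant step are illustrative choices, not part of the talk.

```python
import numpy as np

def soft_threshold(w, t):
    # prox of t * ||.||_1: componentwise soft-thresholding
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def forward_backward(A, b, lam, n_iter=500):
    """FB iterations for min_w 0.5 * ||A w - b||^2 + lam * ||w||_1."""
    beta = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of grad F
    gamma = 1.0 / beta                 # constant step inside [eps, 2/beta - eps]
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ w - b)                           # forward (gradient) step
        w = soft_threshold(w - gamma * grad, gamma * lam)  # backward (prox) step
    return w
```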

SLIDE 8

Algorithm and convergence results

Stochastic forward-backward splitting algorithm

Given w0 such that E[‖w0‖²] < +∞ and γn > 0, define

(SFB)   wn+1 = prox_{γn R}(wn − γn ∇F(wn))

SLIDE 9

Algorithm and convergence results

Stochastic forward-backward splitting algorithm

Given w0 such that E[‖w0‖²] < +∞ and γn > 0, define

(SFB)   wn+1 = prox_{γn R}(wn − γn Gn)

where Gn is a stochastic estimate of the gradient.

SLIDE 10

Algorithm and convergence results

Stochastic forward-backward splitting algorithm

Given w0 such that E[‖w0‖²] < +∞, γn > 0, and λn ∈ [0, 1], define

(SFB)
yn = prox_{γn R}(wn − γn Gn)
wn+1 = (1 − λn) wn + λn yn

where Gn is a stochastic estimate of the gradient.
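A generic sketch of the relaxed SFB iteration; prox_R, grad_oracle, and the sequences gammas, lambdas are hypothetical placeholders to be supplied by the caller (e.g. the soft-thresholding prox above).

```python
import numpy as np

def sfb(prox_R, grad_oracle, w0, gammas, lambdas, n_iter):
    """y_n = prox_{gamma_n R}(w_n - gamma_n G_n);
    w_{n+1} = (1 - lambda_n) w_n + lambda_n y_n."""
    w = np.asarray(w0, dtype=float).copy()
    for n in range(n_iter):
        G = grad_oracle(w, n)                      # stochastic gradient estimate G_n
        y = prox_R(w - gammas[n] * G, gammas[n])   # backward (prox) step
        w = (1 - lambdas[n]) * w + lambdas[n] * y  # relaxation step
    return w
```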

SLIDE 11

Algorithm and convergence results

Online learning point of view

Duchi-Singer 2009
Langford-Li-Zhang 2009
Shalev-Shwartz-Shamir-Srebro-Sridharan 2009
Kakade-Tewari 2008
Bottou-Bousquet 2008
Hazan-Kalai-Kale-Agarwal 2006
...

Convergence analysis is often based on:

regret estimation + online-to-batch conversions [Cesa-Bianchi-Conconi-Gentile 2004]

which imply averaging of the iterates.

SLIDE 12

Algorithm and convergence results

Outline

Convergence results for stochastic forward-backward for minimization problems
Extension to monotone inclusions
Primal-dual stochastic proximal methods

SLIDE 13

Algorithm and convergence results

Assumptions

Let (Ω, F, P) be a probability space. Define the filtration Fn = σ({w0, . . . , wn}) and assume
E[‖Gn‖²] < +∞
E[Gn | Fn] = ∇F(wn)
E[‖Gn − ∇F(wn)‖² | Fn] ≤ σ²(1 + ‖∇F(wn)‖²)

SLIDE 14

Algorithm and convergence results

The case of statistical learning

Given a Hilbert space H, the objective is to minimize

T(w) = ∫_Ω ℓ(w, ξ) dP(ξ) + R(w)

given a sequence of i.i.d. samples (ξi)_{i∈N}. Then

Fn = σ(ξ1, . . . , ξn) and Gn = ∇ℓ(·, ξn)(wn)   ⟹   E[Gn | Fn] = ∇F(wn)

and E[‖Gn − ∇F(wn)‖² | Fn] ≤ σ²(1 + ‖∇F(wn)‖²) is a condition on the variance of the random variable ξ ∈ Ω ↦ ∇ℓ(·, ξ)(w).

SLIDE 15

Algorithm and convergence results

Statistical learning - Incremental FB algorithm

The goal is to minimize

T(w) = (1/m) Σ_{i=1}^m ℓ(w, ξi) + R(w)

Let in : Ω → {1, . . . , m} be a sequence of independent random variables such that, for every n and i, P[in = i] = 1/m. Then Gn = ∇ℓ(·, ξ_{in})(wn) satisfies

E[Gn | Fn] = E[Gn] = (1/m) Σ_{i=1}^m ∇ℓ(·, ξi)(wn).

SLIDE 16

Algorithm and convergence results

Comparison between FB and SFB algorithm

The stochastic incremental FB algorithm becomes

(SFB)   wn+1 = prox_{γn R}(wn − γn ∇w ℓ(wn, ξ_{in}))

The FB algorithm is

(FB)   wn+1 = prox_{γn R}( wn − γn (1/m) Σ_{i=1}^m ∇w ℓ(wn, ξi) )
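The per-iteration cost difference is easy to see in code: the incremental step touches one sample, the batch step touches all m. A sketch for the finite-sum lasso objective (1/m) Σ_i ½(aᵢᵀw − bᵢ)² + λ‖w‖₁, with illustrative data and the decaying step γn = 1/n.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_threshold(w, t):
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def sfb_incremental(A, b, lam, n_iter=5000):
    """SFB: one uniformly sampled term per step; cost O(d) per iteration."""
    m, d = A.shape
    w = np.zeros(d)
    for n in range(1, n_iter + 1):
        i = rng.integers(m)                    # P[i_n = i] = 1/m
        G = A[i] * (A[i] @ w - b[i])           # gradient of the i-th term only
        gamma = 1.0 / n                        # decaying step gamma_n
        w = soft_threshold(w - gamma * G, gamma * lam)
    return w

def fb_step(A, b, lam, w, gamma):
    """FB: full averaged gradient; cost O(m d) per iteration."""
    G = A.T @ (A @ w - b) / A.shape[0]
    return soft_threshold(w - gamma * G, gamma * lam)
```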

SLIDE 17

Algorithm and convergence results

Convergence Results

Assume that a solution w̄ exists. Given w0 such that E[‖w0‖²] < +∞, γn, and Gn, define

(SFB)   wn+1 = prox_{γn R}(wn − γn Gn)

Our contributions:
1. Convergence rates for E[‖wn − w̄‖²]
2. Almost sure convergence of wn

SLIDE 19

Algorithm and convergence results

Convergence rates for E[‖wn − w̄‖²]

Main assumption: F is µ-strongly convex and R is ν-strongly convex, with µ + ν > 0.

Theorem. Let α > 0 and θ ∈ ]0, 1]. Assume that γn = α/n^θ and suppose that there exists ε > 0 such that γn ≤ (2 − ε)/((1 + 2σ²)β) (β is the Lipschitz constant of ∇F). Then, setting c = 2α(ν + µε)/(1 + ν)²,

E[‖wn − w̄‖²] ≤ O(1/n^θ) if θ ∈ ]0, 1[,   and   E[‖wn − w̄‖²] ≤ O(1/n^c) + O(1/n) if θ = 1.

SLIDE 23

Algorithm and convergence results

Remarks and related work

c can be made greater than 1 by properly choosing α (knowledge of µ and ν is required)
the obtained rate of convergence is the same as can be obtained using “accelerated” methods (see e.g. [Kwok-Hu-Pan NIPS 2009, Ghadimi-Lan 2012, Li-Chen-Peña 2014])
the result is not asymptotic: an explicit estimate of the constants in the O terms is available (Chung's lemma)
extends to the nonsmooth case results that were known only in the smooth case [Bach-Moulines 2011]

SLIDE 24

Algorithm and convergence results

Comparison with FOBOS (Duchi-Singer, 2009)

A closely related algorithm is

(FOBOS)   wn+1 = prox_{γn R}(wn − γn Gn),   w̄n+1 = (Σ_{k=0}^{n+1} γk wk) / (Σ_{k=0}^{n+1} γk)

no averages are computed in the SFB algorithm
the convergence rate of SFB (1/n) is faster than that of FOBOS (log n/n) [Duchi-Singer 2009]
the convergence analysis of FOBOS does not require F differentiable, and relies on boundedness of ∂F (square loss excluded) and ∂R
this answers a question posed in [Rakhlin-Shamir-Sridharan, Making gradient descent optimal for strongly convex stochastic optimization, 2012]
related results in [Atchadé-Fort-Moulines, arXiv 2014]
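For contrast, a sketch of the FOBOS output rule under the same hypothetical oracles as in the SFB sketch: the iterates are identical, but what is returned is the γ-weighted running average instead of the last iterate (which is what tends to destroy sparsity, as the next slide illustrates).

```python
import numpy as np

def fobos(prox_R, grad_oracle, w0, gammas, n_iter):
    """SFB iterates plus a gamma-weighted running average of them."""
    w = np.asarray(w0, dtype=float).copy()
    avg, weight_sum = np.zeros_like(w), 0.0
    for n in range(n_iter):
        w = prox_R(w - gammas[n] * grad_oracle(w, n), gammas[n])
        weight_sum += gammas[n]
        avg += (gammas[n] / weight_sum) * (w - avg)  # update of the weighted mean
    return avg   # SFB would return w itself
```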

SLIDE 25

Algorithm and convergence results

Averaging could be a bad idea

[Figure: sparsity (number of zero components) vs. iterations for SPG and SAGE; sparsity axis 500-1000, iterations axis 1000-5000.]

Figure: Number of zero components of the vector wn − w̄, with the same initial point, for SPG and SAGE. The number of zero components of FOBOS is decreasing with the iterations, and close to 400.

SLIDE 26

Algorithm and convergence results

Main Results

Assume that a solution w̄ exists. Given w0 ∈ H, γn, and Gn, define

(SFB)   wn+1 = prox_{γn R}(wn − γn Gn)

Our contributions:
1. Convergence rates for E[‖wn − w̄‖²]
2. Almost sure convergence of wn

SLIDE 27

Algorithm and convergence results

Almost sure convergence with uniform convexity

Theorem. Suppose that F is uniformly convex at w̄ and

Σn γn = +∞,   Σn γn² < +∞.

Then ‖wn − w̄‖ → 0 almost surely.

Recall that a function F : H → R is uniformly convex at a point w̄ if there exists φ : R+ → R+ vanishing only at 0 such that

(∀w ∈ H)   ⟨∇F(w) − ∇F(w̄), w − w̄⟩ ≥ φ(‖w − w̄‖)

SLIDE 28

Algorithm and convergence results

Almost sure convergence

Theorem. Suppose that F is strictly convex at w̄ and

Σn γn = +∞,   Σn γn² < +∞.

If ∇F is weakly continuous, then there exists a subsequence (w_{tn}) such that ‖w_{tn} − w̄‖ → 0 almost surely.
If R = ι_C for some closed convex set C, then ‖wn − w̄‖ → 0 almost surely.

Recall that a function F : H → R is strictly convex at a point w̄ if

(∀w ≠ w̄)   ⟨∇F(w) − ∇F(w̄), w − w̄⟩ > 0

SLIDE 29

Algorithm and convergence results

Stochastic quasi-Fejér sequences (Ermoliev, 1960s)

Let S be a non-empty subset of H.

Definition. A sequence of random vectors (wn)_{n∈N} in H is stochastic quasi-Fejér monotone with respect to the set S if E[‖w0‖²] < +∞ and, for every w ∈ S, there exist sequences ζn(w), tn(w) ∈ ℓ¹₊(N) and ξn(w) ≥ 0 such that

E[‖wn+1 − w‖² | Fn] ≤ (1 + tn(w))‖wn − w‖² + ζn(w) − ξn(w).

SLIDE 30

Algorithm and convergence results

Properties of stochastic quasi-Fejér sequences

Theorem. Let (wn)_{n∈N} in H be stochastic quasi-Fejér monotone with respect to the set S, and let w ∈ S. Then:
(wn)_{n∈N} is bounded a.s.
if the set of weak cluster points of (wn)_{n∈N} is contained in S a.s., then (wn) weakly converges to a random vector in S a.s.

See [Robbins-Siegmund 1971]. Other properties in [Combettes-Pesquet 2014].

SLIDE 31

Stochastic forward-backward splitting for monotone inclusions

Problem setting

Given a Hilbert space H, we consider the problem

Find w̄ ∈ H such that T(w̄) = min_{w∈H} (F(w) + R(w)),

with
F : H → R convex and continuously differentiable, with Lipschitz continuous gradient, i.e., ‖∇F(w) − ∇F(w′)‖ ≤ β‖w − w′‖
R : H → R ∪ {+∞} proper, convex, and lower semicontinuous

SLIDE 32

Stochastic forward-backward splitting for monotone inclusions

Problem setting

Given a Hilbert space H, we consider the problem

Find w̄ ∈ H such that 0 ∈ ∇F(w̄) + ∂R(w̄)

with
F : H → R convex and continuously differentiable, with Lipschitz continuous gradient, i.e., ‖∇F(w) − ∇F(w′)‖ ≤ β‖w − w′‖
R : H → R ∪ {+∞} proper, convex, and lower semicontinuous

SLIDE 33

Stochastic forward-backward splitting for monotone inclusions

Extensions: monotone inclusions framework

Given a Hilbert space H, we consider the problem

Find w̄ ∈ H such that 0 ∈ Aw̄ + Bw̄,

where A plays the role of ∂R and B the role of ∇F, and
A : H → 2^H is maximally monotone, i.e., for every w and w′ in H and for every u ∈ Aw and u′ ∈ Aw′, ⟨w − w′, u − u′⟩ ≥ 0, and there exists no monotone operator whose graph properly contains the graph of A;
B : H → H is single-valued and, for every v and w in H, ⟨v − w, Bv − Bw⟩ ≥ (1/β)⟨Bv − Bw, U(Bv − Bw)⟩, with U : H → H bounded linear, self-adjoint, and strongly positive.

SLIDE 34

Stochastic forward-backward splitting for monotone inclusions

Stochastic forward-backward splitting

Given a Hilbert space H, we consider the problem: find w̄ ∈ H such that 0 ∈ Aw̄ + Bw̄. Assume U = Id. Given w0 such that E[‖w0‖²] < +∞, γn > 0, and Gn, define

(SFB)   wn+1 = J_{γn A}(wn − γn Gn)

with J_{γn A} = (I + γn A)^{−1} and E[Gn | Fn] = B wn.

SLIDE 35

Stochastic forward-backward splitting for monotone inclusions

Preconditioned inertial stochastic FB splitting

Given a Hilbert space H, we consider the problem: find w̄ ∈ H such that 0 ∈ Aw̄ + Bw̄. Given w0 such that E[‖w0‖²] < +∞, γn ∈ [ε, (2 − ε)β], αn ∈ [0, 1 − ε], and Gn ∈ H, define

(SIFB)
zn = wn + αn(wn − wn−1)
wn+1 = J_{γn U A}(zn − γn U Gn)

with J_{γn U A} = (I + γn U A)^{−1} and E[Gn | Fn] = B zn.

In the deterministic setting: [Lorenz-Pock 2015]
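A finite-dimensional sketch of SIFB, assuming U is given as a (symmetric positive definite) matrix, resolvent_UA evaluates J_{γUA}, and B_oracle returns the stochastic estimate Gn with E[Gn | Fn] = B zn; all names are hypothetical placeholders.

```python
import numpy as np

def sifb(resolvent_UA, B_oracle, U, w0, gammas, alphas, n_iter):
    """z_n = w_n + alpha_n (w_n - w_{n-1});
    w_{n+1} = J_{gamma_n U A}(z_n - gamma_n U G_n), with E[G_n | F_n] = B z_n."""
    w_prev = np.asarray(w0, dtype=float).copy()
    w = w_prev.copy()
    for n in range(n_iter):
        z = w + alphas[n] * (w - w_prev)           # inertial extrapolation
        G = B_oracle(z, n)                         # stochastic estimate of B z_n
        w_prev, w = w, resolvent_UA(z - gammas[n] * (U @ G), gammas[n])
    return w
```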

SLIDE 36

Stochastic forward-backward splitting for monotone inclusions

Results

Theorem. Assume that:

Σn E[‖Gn − B zn‖² | Fn] < +∞
sup_n ‖wn − wn−1‖ < +∞ a.s. and Σn αn < +∞.

Then there exists w̄ with 0 ∈ Aw̄ + Bw̄ such that, almost surely,
wn ⇀ w̄
B wn → B w̄
if B is demiregular at w̄, then wn → w̄.

SLIDE 37

Stochastic forward-backward splitting for monotone inclusions

Remarks

Deterministic inertial algorithms for monotone inclusions assume Σn αn‖wn − wn−1‖² < +∞ [Lorenz and Pock 2015]
Nonvanishing stepsize γn
Strong assumptions on the stochastic approximation (weaker than [Combettes and Pesquet 2014]); see [Bianchi and Hachem, August 2015] for new results!
Strong assumptions on αn (not including standard choices)
Computational cost: high if J_{γn U A} is difficult to compute; unchanged if U is simple
The monotone inclusion formulation with fixed stepsize and preconditioning U allows deriving inertial stochastic primal-dual algorithms (in the deterministic case: [Attouch-Briceño-Arias-Combettes, Combettes-Pesquet, Vũ, Chambolle-Pock, Loris-Verhoeven (2010-2014)])
Related results for block coordinate methods in [Pesquet-Repetti 2015]

SLIDE 38

Primal-dual stochastic algorithms

Structured minimization problems

Given Hilbert spaces H and K,

minimize_{w∈H} F(w) + R(w) + Φ(Lw)

where
F : H → R is convex and differentiable, with a Lipschitz continuous gradient
R : H → R ∪ {+∞} is convex and lsc
L : H → K is bounded and linear, and Φ : K → R ∪ {+∞} is convex and lsc

SLIDE 39

Primal-dual stochastic algorithms

Structured minimization problems

Given Hilbert spaces H1, . . . , Hm and K1, . . . , Ks,

min_{w1∈H1, ..., wm∈Hm}   F(w1, . . . , wm) + Σ_{i=1}^m Ri(wi) + Σ_{k=1}^s (ℓk □ Φk)(Σ_{i=1}^m Lk,i wi)

where □ denotes infimal convolution and
F : H1 × · · · × Hm → R is convex and differentiable, with a Lipschitz continuous gradient
Ri : Hi → R ∪ {+∞} is convex and lsc
Lk,i : Hi → Kk is bounded and linear, Φk : Kk → R ∪ {+∞} is convex and lsc, and ℓk : Kk → R ∪ {+∞} is strongly convex

SLIDE 42

Primal-dual stochastic algorithms

A stochastic Inertial Primal-Dual Algorithm

Given w0 = w−1 ∈ H and v0 = v−1 ∈ K, τ > 0 and σ ∈ ]0, 2/β[ with τσ < 1/‖L‖², define

(SIPD)
Inertial step:   zn = wn + αn(wn − wn−1),   dn = vn + αn(vn − vn−1)
Primal variable update:   wn+1 = prox_{σR}(zn − σ(L*dn + Gn)),   yn+1 = 2wn+1 − zn
Dual variable update:   vn+1 = prox_{τΦ*}(dn + τ(L yn+1 − Dn))

with E[Gn | Fn] = ∇F(zn) and E[Dn | Fn] = 0.
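A finite-dimensional sketch of one SIPD run, with L given as a matrix (so L* is its transpose) and the dual noise Dn taken to be zero; the prox and gradient oracles are hypothetical placeholders supplied by the caller.

```python
import numpy as np

def sipd(prox_R, prox_Phi_conj, L, grad_F_oracle, w0, v0, sigma, tau, alphas, n_iter):
    """Inertial step, primal prox step, reflection, then dual prox step, as above."""
    w_prev = np.asarray(w0, dtype=float).copy(); w = w_prev.copy()
    v_prev = np.asarray(v0, dtype=float).copy(); v = v_prev.copy()
    for n in range(n_iter):
        z = w + alphas[n] * (w - w_prev)            # primal inertial extrapolation
        d = v + alphas[n] * (v - v_prev)            # dual inertial extrapolation
        G = grad_F_oracle(z, n)                     # E[G_n | F_n] = grad F(z_n)
        w_new = prox_R(z - sigma * (L.T @ d + G), sigma)   # primal update
        y = 2 * w_new - z                           # reflection
        v_new = prox_Phi_conj(d + tau * (L @ y), tau)      # dual update (D_n = 0)
        w_prev, w, v_prev, v = w, w_new, v, v_new
    return w, v
```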

SLIDE 43

Primal-dual stochastic algorithms

Convergence analysis

Primal-dual monotone inclusion: find (w̄, v̄) ∈ H × K such that

(0, 0) ∈ (∇F(w̄) + ∂R(w̄) + L*v̄, −Lw̄ + ∂Φ*(v̄)),

i.e., A = (∂R + L*·, −L· + ∂Φ*) and B = (∇F, 0). Convergence follows by noting that

(zn, dn) = (wn + αn(wn − wn−1), vn + αn(vn − vn−1))
(wn+1, vn+1) = J_{UA}((zn, dn) − U(Gn, Dn))

with

U^{−1} = ( I/σ   −L* )
         ( −L    I/τ )

SLIDE 44

Primal-dual stochastic algorithms

The case R = 0.

Given a Hilbert space H,

minimize_{w∈H} F(w) + Φ(Lw)

where
F : H → R is convex and differentiable, with a Lipschitz continuous gradient
K is a Hilbert space, L : H → K is bounded and linear, and Φ : K → R ∪ {+∞} is convex and lsc

SLIDE 47

Primal-dual stochastic algorithms

Another inertial stochastic primal-dual algorithm

Given w0 = w−1 ∈ H and v0 = v−1 ∈ K, τ > 0 and σ ∈ ]0, 2/β[ with τσ < 1/‖L‖², define

(SIPD)
Inertial step:   zn = wn + αn(wn − wn−1),   dn = vn + αn(vn − vn−1)
Dual variable update:   yn = zn − σ(Gn + L*dn),   vn+1 = prox_{τΦ*}(dn + τ(L yn − Dn))
Primal variable update:   wn+1 = zn − σ(Gn + L*vn+1)

with E[Gn | Fn] = ∇F(zn) and E[Dn | Fn] = 0.

SLIDE 48

Primal-dual stochastic algorithms

Convergence analysis

Primal-dual monotone inclusion: find (w̄, v̄) ∈ H × K such that

(0, 0) ∈ (∇F(w̄) + L*v̄, −Lw̄ + ∂Φ*(v̄)),

i.e., A = (L*·, −L· + ∂Φ*) and B = (∇F, 0). Convergence follows by noting that

(zn, dn) = (wn + αn(wn − wn−1), vn + αn(vn − vn−1))
(wn+1, vn+1) = J_{UA}((zn, dn) − U(Gn, Dn))

with

U^{−1} = ( I/σ        0       )
         (  0    I/τ − σLL*  )

SLIDE 49

Primal-dual stochastic algorithms

Concluding remarks

convergence rates and almost sure convergence for the stochastic forward-backward algorithm under general error conditions
extension to monotone inclusions (stochastic forward-backward splitting): almost sure convergence
derivation of primal-dual stochastic algorithms

SLIDE 50

Primal-dual stochastic algorithms

References

L. Rosasco, S. Villa, and B. C. Vũ, Convergence of stochastic proximal gradient algorithm, arXiv:1403.5074

L. Rosasco, S. Villa, and B. C. Vũ, Stochastic forward-backward splitting for monotone inclusions, arXiv:1403.7999

L. Rosasco, S. Villa, and B. C. Vũ, A stochastic inertial forward-backward splitting algorithm for multivariate monotone inclusions, arXiv:1507.00848
