SLIDE 1

Moreau Envelopes Bregman Distance Bregman Envelopes Conclusion

Regularizing with Bregman–Moreau envelopes

Heinz H. Bauschke, COCANA, University of British Columbia, heinz.bauschke@ubc.ca

Minh N. Dao, CARMA, University of Newcastle, daonminh@gmail.com

Scott B. Lindstrom, CARMA, University of Newcastle, scott.lindstrom@uon.edu.au

AMSI Optimise 2017

Revised July 4, 2017

SLIDE 2

Outline I

1. Moreau Envelopes: Introduction, Epigraph Intuition, Classical Results, Prox Operators
2. Bregman Distance: Definition, Our assumptions on f, Three demo functions, What Changes?
3. Bregman Envelopes: Introduction, Results, Envelopes, Prox Operators
4. Conclusion: Summary, References

SLIDE 3

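Definition 1 (Moreau envelope). The Moreau envelope of θ with parameter $\gamma \in \mathbb{R}_{++}$ is

$$\operatorname{env}^{\gamma}_{\theta} : x \mapsto \inf_{y \in X} \Big( \theta(y) + \frac{1}{2\gamma}\|x - y\|^2 \Big). \tag{1}$$

It is a special case of infimal convolution:

$$\theta \,\square\, f : \mathbb{R}^n \to [-\infty, \infty] : x \mapsto \inf_{y \in \mathbb{R}^n} \big( \theta(y) + f(x - y) \big),$$

which is called "exact" if $(\theta \,\square\, f)(x) = \min_{y \in \mathbb{R}^n} \big( \theta(y) + f(x - y) \big)$ for all $x \in \operatorname{dom}(\theta \,\square\, f)$.

Moreau only considered γ = 1. Systematic study involving γ originated with Attouch [2][3].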
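
As a numerical illustration of Definition 1, the sketch below approximates the Moreau envelope of θ = |·| by brute-force minimization over a finite grid and compares it with the Huber function, the standard closed form for this particular θ. The grid `GRID`, the helper names, and the choice of θ are assumptions made for the example; they do not come from the slides.

```python
def moreau_envelope(theta, x, gamma, grid):
    """Approximate inf_y theta(y) + |x - y|^2 / (2*gamma) over a finite grid."""
    return min(theta(y) + (x - y) ** 2 / (2.0 * gamma) for y in grid)


def huber(x, gamma):
    """Closed form of the Moreau envelope of the absolute value."""
    if abs(x) <= gamma:
        return x * x / (2.0 * gamma)
    return abs(x) - gamma / 2.0


# Hypothetical grid on [-5, 5] with step 0.001 (an assumption for this sketch).
GRID = [i / 1000.0 for i in range(-5000, 5001)]

for gamma in (0.1, 1.0, 10.0):
    approx = moreau_envelope(abs, 2.0, gamma, GRID)
    print(gamma, approx, huber(2.0, gamma))
```

At x = 2 the printed values move from near θ(2) = 2 for small γ toward min θ = 0 for large γ.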

SLIDE 4

Epigraph Intuition

Think: smoothing through epigraph addition. The identity $\operatorname{epi}(\theta \,\square\, f) = \operatorname{epi}\theta + \operatorname{epi} f$ (Minkowski sum) always holds when $\theta \,\square\, f$ is exact.

SLIDE 5

Limiting case

As γ → 0 we recover θ

SLIDE 6

Limiting Case

As γ → ∞ we recover min θ

SLIDE 7

Varying the Parameter

SLIDE 8

Prox Operators

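Throughout: θ is a lower semicontinuous convex function of Legendre type.

Definition 2 (Prox Operator). $\operatorname{Prox}_{\gamma\theta}(x)$ is the unique point satisfying

$$\operatorname{env}^{\gamma}_{\theta}(x) = \min_{y \in \mathbb{R}^n} \Big( \theta(y) + \frac{1}{2\gamma}\|x - y\|^2 \Big) = \theta(\operatorname{Prox}_{\gamma\theta}(x)) + \frac{1}{2\gamma}\|x - \operatorname{Prox}_{\gamma\theta}(x)\|^2.$$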
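
To make the defining identity concrete, here is a minimal sketch for θ = |·| on the real line, where the prox is the well-known soft-thresholding map. The choice of θ and the function names are assumptions for illustration only.

```python
def prox_abs(x, gamma):
    """Prox of gamma * |.| at x: soft-thresholding (standard closed form)."""
    if x > gamma:
        return x - gamma
    if x < -gamma:
        return x + gamma
    return 0.0


def envelope_via_prox(x, gamma):
    """Evaluate theta(p) + |x - p|^2 / (2*gamma) at p = prox_abs(x, gamma)."""
    p = prox_abs(x, gamma)
    return abs(p) + (x - p) ** 2 / (2.0 * gamma)


print(prox_abs(2.0, 0.5), envelope_via_prox(2.0, 1.0))
```

Evaluating the objective at the prox point gives the envelope value without any minimization, which is exactly what the second equality in Definition 2 states.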

SLIDE 9

Prox Operators: Geometric Intuition

Figure: Where θ = |y + x − 1| and γ = 1/2.

Figure: The net $\operatorname{Prox}_{\gamma\theta}(1, 2)$ where $\gamma \in \,]0, \infty[$.

SLIDE 10

Prox Operators: Limiting Cases

Figure: $\lim_{\gamma \to \infty} \operatorname{Prox}_{\gamma\theta}(x) = P_{\operatorname{argmin}\theta}(x)$

SLIDE 11

Prox Operators: Limiting Cases

Figure: $\lim_{\gamma \to 0} \operatorname{Prox}_{\gamma\theta}(x) = x$

SLIDE 12

Special case: projection

Remark 1 (Special case: projection). When $\theta = \iota_C$ is the indicator function of C, $\operatorname{Prox}_{\gamma\theta}(x) = P_C(x)$ is the projection operator onto C, whatever the value of γ.
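
A small sketch of the projection special case, assuming C = [−1, 1] on the real line and a brute-force grid search for the prox. The interval, the grid, and the helper names are all hypothetical choices for this example.

```python
import math


def indicator(lo, hi):
    """Indicator function of the interval [lo, hi]: 0 inside, +inf outside."""
    def iota(y):
        return 0.0 if lo <= y <= hi else math.inf
    return iota


def prox_on_grid(theta, x, gamma, grid):
    """Grid point minimizing theta(y) + |x - y|^2 / (2*gamma)."""
    return min(grid, key=lambda y: theta(y) + (x - y) ** 2 / (2.0 * gamma))


def project_interval(x, lo, hi):
    """Euclidean projection onto [lo, hi] (clipping)."""
    return min(max(x, lo), hi)


GRID = [i / 100.0 for i in range(-500, 501)]  # assumed grid on [-5, 5]
iota_C = indicator(-1.0, 1.0)

for gamma in (0.1, 1.0, 10.0):
    print(gamma, prox_on_grid(iota_C, 2.5, gamma, GRID), project_interval(2.5, -1.0, 1.0))
```

The grid minimizer equals the clipped point for every γ tried, since scaling an indicator function does not change it.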

SLIDE 13

Bregman Distance

Definition 3 (Bregman Distance). The Bregman distance of a function f between two points x, y is

$$D_f(x, y) = f(x) - f(y) - \langle \nabla f(y), x - y \rangle.$$

Figure: Bregman distance where the function f is the Boltzmann–Shannon entropy $x \mapsto x \log(x) - x$ (labels: x, y, f(x), f(y), ⟨∇f(y), x − y⟩, D_f(x, y)).
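
Definition 3 translates directly into code once f and its gradient are supplied. The sketch below is one possible rendering; the function names and the list-based vector representation are assumptions made for the example.

```python
import math


def bregman(f, grad_f, x, y):
    """D_f(x, y) = f(x) - f(y) - <grad f(y), x - y> for vectors given as lists."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_f(y), x, y))
    return f(x) - f(y) - inner


def entropy(x):
    """Boltzmann-Shannon entropy, for strictly positive coordinates."""
    return sum(xi * math.log(xi) - xi for xi in x)


def grad_entropy(x):
    return [math.log(xi) for xi in x]


def energy(x):
    return 0.5 * sum(xi * xi for xi in x)


def grad_energy(x):
    return list(x)


d_entropy = bregman(entropy, grad_entropy, [2.0], [1.0])          # 2*ln(2) - 1
d_energy = bregman(energy, grad_energy, [1.0, 2.0], [3.0, 5.0])   # 0.5 * (4 + 9)
print(d_entropy, d_energy)
```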

SLIDE 14

Our assumptions on f

We assume:

1. f is a lower semicontinuous convex function of Legendre type, and $U := \operatorname{int}\operatorname{dom} f$;
2. $\nabla^2 f$ exists and is continuous on U;
3. $D_f$ is jointly convex, i.e., convex on X × X;
4. $(\forall x \in U)$ $D_f(x, \cdot)$ is strictly convex on U;
5. $(\forall x \in U)$ $D_f(x, \cdot)$ is coercive, i.e., $D_f(x, y) \to +\infty$ as $\|y\| \to +\infty$.

SLIDE 15

Our demo functions

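Where $x, y \in \mathbb{R}^J$:

1. Energy: If $f : x \mapsto \frac{1}{2}\|x\|^2$, then

$$D_f(x, y) = \frac{1}{2}\|x - y\|^2.$$

2. Boltzmann–Shannon entropy: If $f : x \mapsto \sum_{j=1}^{J} x_j \ln(x_j) - x_j$, then one obtains the Kullback–Leibler divergence

$$D_f(x, y) = \begin{cases} \sum_{j=1}^{J} x_j \ln(x_j / y_j) - x_j + y_j, & \text{if } x \ge 0 \text{ and } y > 0; \\ +\infty, & \text{otherwise.} \end{cases}$$

3. Fermi–Dirac entropy: If $f : x \mapsto \sum_{j=1}^{J} x_j \ln(x_j) + (1 - x_j) \ln(1 - x_j)$, then

$$D_f(x, y) = \begin{cases} \sum_{j=1}^{J} x_j \ln(x_j / y_j) + (1 - x_j) \ln\big( (1 - x_j)/(1 - y_j) \big), & \text{if } 0 \le x \le 1 \text{ and } 0 < y < 1; \\ +\infty, & \text{otherwise.} \end{cases}$$

Note: With the Boltzmann–Shannon entropy and the Fermi–Dirac entropy, we use the convention $0 \cdot \ln(0) := 0$.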
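
The three closed forms above can be coded as follows. This is a sketch with hypothetical function names; it returns +∞ outside the stated domains and applies the convention 0 · ln(0) := 0 through a small helper.

```python
import math


def _xlog(x, ratio):
    """x * ln(ratio), with the convention 0 * ln(0) := 0."""
    return 0.0 if x == 0.0 else x * math.log(ratio)


def energy_distance(x, y):
    return 0.5 * sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def kl_distance(x, y):
    """Bregman distance of the Boltzmann-Shannon entropy (Kullback-Leibler)."""
    if any(xi < 0.0 for xi in x) or any(yi <= 0.0 for yi in y):
        return math.inf
    return sum(_xlog(xi, xi / yi) - xi + yi for xi, yi in zip(x, y))


def fermi_dirac_distance(x, y):
    """Bregman distance of the Fermi-Dirac entropy."""
    if any(not 0.0 <= xi <= 1.0 for xi in x) or any(not 0.0 < yi < 1.0 for yi in y):
        return math.inf
    return sum(
        _xlog(xi, xi / yi) + _xlog(1.0 - xi, (1.0 - xi) / (1.0 - yi))
        for xi, yi in zip(x, y)
    )


print(kl_distance([0.0, 1.0], [1.0, 1.0]))   # boundary point x_1 = 0 is allowed
print(fermi_dirac_distance([1.0], [0.5]))    # ln(2)
print(kl_distance([1.0], [0.0]))             # outside the domain: inf
```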

SLIDE 16

What Changes?

We lose the triangle inequality.

Figure: Where f is the Boltzmann–Shannon entropy, $D_f(x, y) > D_f(z, y) + D_f(x, z)$ (labels: x, y, z, f(x), f(y), f(z), D_f(x, y), D_f(z, y), D_f(x, z)).

SLIDE 17

What Changes?

We also lose symmetry... and translation invariance.

SLIDE 18

What Changes?

Except when using the energy, of course.
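
These losses are easy to check numerically in one dimension. The sketch below uses the points x = 4, z = 2, y = 1, which are arbitrary values chosen for this illustration, with the Kullback–Leibler distance, and then confirms that the energy keeps symmetry and translation invariance.

```python
import math


def kl(x, y):
    """One-dimensional Bregman distance of the Boltzmann-Shannon entropy."""
    return x * math.log(x / y) - x + y


def energy(x, y):
    """One-dimensional Bregman distance of the energy."""
    return 0.5 * (x - y) ** 2


x, z, y = 4.0, 2.0, 1.0  # illustrative points, not taken from the slides

triangle_fails = kl(x, y) > kl(z, y) + kl(x, z)
kl_symmetric = math.isclose(kl(x, y), kl(y, x))
kl_shift_invariant = math.isclose(kl(x + 1.0, y + 1.0), kl(x, y))

energy_symmetric = math.isclose(energy(x, y), energy(y, x))
energy_shift_invariant = math.isclose(energy(x + 1.0, y + 1.0), energy(x, y))

print(triangle_fails, kl_symmetric, kl_shift_invariant)
print(energy_symmetric, energy_shift_invariant)
```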

SLIDE 19

Bregman Envelopes

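Definition 4. For given θ and f with $\operatorname{int}\operatorname{dom} f \cap \operatorname{dom}\theta \neq \varnothing$:

The left Bregman envelope is

$$\overleftarrow{\operatorname{env}}^{\gamma}_{\theta} : X \to [-\infty, +\infty] : y \mapsto \inf_{x \in X} \Big( \theta(x) + \frac{1}{\gamma} D_f(x, y) \Big). \tag{2}$$

The right Bregman envelope is

$$\overrightarrow{\operatorname{env}}^{\gamma}_{\theta} : X \to [-\infty, +\infty] : x \mapsto \inf_{y \in X} \Big( \theta(y) + \frac{1}{\gamma} D_f(x, y) \Big). \tag{3}$$

If $f = \frac{1}{2}\|\cdot\|^2$, then $D_f : (x, y) \mapsto \frac{1}{2}\|x - y\|^2$, and

$$\overleftarrow{\operatorname{env}}^{\gamma}_{\theta} = \overrightarrow{\operatorname{env}}^{\gamma}_{\theta} = \theta \,\square\, \Big( \frac{1}{2\gamma}\|\cdot\|^2 \Big)$$

is the classical Moreau envelope of θ of parameter γ.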
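
The following sketch evaluates both envelopes by grid search in one dimension. It assumes θ(t) = |t − 1|, γ = 1, and a positive grid; these choices and the helper names are made up for the example. With the Kullback–Leibler distance the left and right envelopes differ, while with the energy they coincide.

```python
import math


def kl(x, y):
    return x * math.log(x / y) - x + y


def energy(x, y):
    return 0.5 * (x - y) ** 2


def theta(t):
    return abs(t - 1.0)


def left_env(dist, y, gamma, grid):
    """inf over x of theta(x) + dist(x, y) / gamma (minimize in the first slot)."""
    return min(theta(x) + dist(x, y) / gamma for x in grid)


def right_env(dist, x, gamma, grid):
    """inf over y of theta(y) + dist(x, y) / gamma (minimize in the second slot)."""
    return min(theta(y) + dist(x, y) / gamma for y in grid)


GRID = [i / 1000.0 for i in range(1, 5001)]  # assumed grid on (0, 5]

print(left_env(kl, 2.0, 1.0, GRID), right_env(kl, 2.0, 1.0, GRID))
print(left_env(energy, 2.0, 1.0, GRID), right_env(energy, 2.0, 1.0, GRID))
```

For this θ both Kullback–Leibler minimizations are attained at the kink t = 1, giving 1 − ln 2 on the left and 2 ln 2 − 1 on the right.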

SLIDE 20

Bregman Envelopes

Consider the following properties:

(a) $U \cap \operatorname{dom}\theta$ is bounded.
(b) $\inf \theta(U) > -\infty$.
(c) f is supercoercive, i.e., $f(x)/\|x\| \to +\infty$ as $\|x\| \to +\infty$.
(d) $(\forall x \in U)$ $D_f(x, \cdot)$ is supercoercive.

Then the following hold (and we suppose them moving forward):

If any of (a), (b), or (c) holds, then $(\forall y \in U)$ $\theta(\cdot) + \frac{1}{\gamma} D_f(\cdot, y)$ is coercive.

If any of (a), (b), or (d) holds, then $(\forall x \in U)$ $\theta(\cdot) + \frac{1}{\gamma} D_f(x, \cdot)$ is coercive.

SLIDE 21

Bregman Proximity Operators

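Definition 5 (Bregman Proximity Operators). For given θ and f with $\operatorname{int}\operatorname{dom} f \cap \operatorname{dom}\theta \neq \varnothing$:

1. The left prox operator is

$$\overleftarrow{P}_{\gamma\theta} : \operatorname{int}\operatorname{dom} f \to \operatorname{int}\operatorname{dom} f : y \mapsto \operatorname*{argmin}_{x \in X} \Big( \theta(x) + \frac{1}{\gamma} D_f(x, y) \Big).$$

2. The right prox operator is

$$\overrightarrow{P}_{\gamma\theta} : \operatorname{int}\operatorname{dom} f \to \operatorname{int}\operatorname{dom} f : x \mapsto \operatorname*{argmin}_{y \in X} \Big( \theta(y) + \frac{1}{\gamma} D_f(x, y) \Big).$$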
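
For a concrete feel, take the one-dimensional case with θ(t) = |t − 1| and f the Boltzmann–Shannon entropy. Setting the subdifferential of each objective to zero gives the closed forms coded below. These formulas were derived for this illustration and are not taken from the slides; the grid search is only a sanity check on them.

```python
import math


def kl(x, y):
    return x * math.log(x / y) - x + y


def left_prox(y, gamma):
    """argmin_x |x - 1| + kl(x, y) / gamma: clamp 1 into [y*e^-gamma, y*e^gamma]."""
    return min(max(1.0, y * math.exp(-gamma)), y * math.exp(gamma))


def right_prox(x, gamma):
    """argmin_y |y - 1| + kl(x, y) / gamma, from 0 in sign(y - 1) + (1 - x/y)/gamma."""
    if x > 1.0 + gamma:
        return x / (1.0 + gamma)
    if x < 1.0 - gamma:
        return x / (1.0 - gamma)
    return 1.0


GRID = [i / 1000.0 for i in range(1, 10001)]  # assumed grid on (0, 10]


def argmin_on_grid(objective):
    return min(GRID, key=objective)


left_grid = argmin_on_grid(lambda x: abs(x - 1.0) + kl(x, 5.0) / 1.0)
right_grid = argmin_on_grid(lambda y: abs(y - 1.0) + kl(5.0, y) / 1.0)

print(left_prox(5.0, 1.0), left_grid)    # both near 5/e
print(right_prox(5.0, 1.0), right_grid)  # both near 2.5
```

The two operators return different points from the same input, which reflects the asymmetry of D_f.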

SLIDE 22

Epigraphs

Shown: left envelope with Boltzmann–Shannon entropy.

Still regularizes.

Addition changes.

SLIDE 23

Epigraphs

Shown: right envelope with Boltzmann–Shannon entropy.

Think: what about limiting cases?

SLIDE 24

Results

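With $x, y \in \operatorname{int}\operatorname{dom} f$ the following hold.

As $\gamma \downarrow 0$:

1. Left case: $\overleftarrow{\operatorname{env}}^{\gamma}_{\theta}(y) \uparrow \theta(y)$, $\theta(\overleftarrow{P}_{\gamma\theta}(y)) \uparrow \theta(y)$, $\frac{1}{\gamma} D_f(\overleftarrow{P}_{\gamma\theta}(y), y) \to 0$, and $\overleftarrow{P}_{\gamma\theta}(y) \to y$.
2. Right case: $\overrightarrow{\operatorname{env}}^{\gamma}_{\theta}(x) \uparrow \theta(x)$, $\theta(\overrightarrow{P}_{\gamma\theta}(x)) \uparrow \theta(x)$, $\frac{1}{\gamma} D_f(x, \overrightarrow{P}_{\gamma\theta}(x)) \to 0$, and $\overrightarrow{P}_{\gamma\theta}(x) \to x$.

As $\gamma \uparrow \infty$:

1. Left case: $\overleftarrow{\operatorname{env}}^{\gamma}_{\theta}(y) \downarrow \inf \theta(X)$, and if $\operatorname{argmin}\theta \subseteq \operatorname{int}\operatorname{dom} f$, then $\overleftarrow{P}_{\gamma\theta}(y) \to \overleftarrow{P}_{\operatorname{argmin}\theta}\, y$ as $\gamma \uparrow +\infty$.
2. Right case: $\overrightarrow{\operatorname{env}}^{\gamma}_{\theta}(x) \downarrow \inf \theta(X)$, and if $\operatorname{argmin}\theta \subseteq \operatorname{int}\operatorname{dom} f$, then $\overrightarrow{P}_{\gamma\theta}(x) \to \overrightarrow{P}_{\operatorname{argmin}\theta}\, x$ as $\gamma \uparrow +\infty$.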
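
The left-case limits can be observed numerically. The sketch below again assumes the one-dimensional setting θ(t) = |t − 1| with the Boltzmann–Shannon entropy, for which the left prox has a closed form derived for this illustration (it is not stated in the slides). Here argmin θ = {1} and inf θ = 0.

```python
import math


def kl(x, y):
    return x * math.log(x / y) - x + y


def left_prox(y, gamma):
    """Closed-form left prox of theta(t) = |t - 1| for the KL distance."""
    return min(max(1.0, y * math.exp(-gamma)), y * math.exp(gamma))


def left_env(y, gamma):
    """Left envelope evaluated at the prox point."""
    p = left_prox(y, gamma)
    return abs(p - 1.0) + kl(p, y) / gamma


y = 5.0  # illustrative base point, so theta(y) = 4
gammas = [1e-3, 1e-1, 1.0, 10.0, 100.0]
proxes = [left_prox(y, g) for g in gammas]
envs = [left_env(y, g) for g in gammas]

for g, p, e in zip(gammas, proxes, envs):
    print(g, p, e)
```

Reading the output from top to bottom, the prox moves from near y = 5 to the minimizer 1, and the envelope decreases from near θ(y) = 4 toward inf θ = 0.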

SLIDE 25

Example: Boltzmann-Shannon Entropy

SLIDE 26

Example: Fermi-Dirac Entropy

SLIDE 27

Prox Operators: Geometric Intuition

Figure: θ = |y + x − 1| and $D_f(\cdot, (1, 2))$ where γ = 1 and f is the Boltzmann–Shannon entropy.

Figure: The net $\overleftarrow{P}_{\gamma\theta}(1, 2)$ where $\gamma \in \,]0, \infty[$.

SLIDE 28

Bregman Projections

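Definition 6 (Bregman projectors). Let C be a closed convex subset of X with $\operatorname{int}\operatorname{dom} f \cap C \neq \varnothing$. Then

$\overleftarrow{P}_C := \overleftarrow{P}_{\iota_C}$ is the left Bregman projector onto C, and

$\overrightarrow{P}_C := \overrightarrow{P}_{\iota_C}$ is the right Bregman projector onto C.

Bregman and Euclidean projections may differ in $\mathbb{R}^n$ for n > 1.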
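
A two-dimensional example of the difference, under the assumption that C is the hyperplane {x : x_1 + ... + x_n = 1} and f is the Boltzmann–Shannon entropy. A Lagrange multiplier argument gives the left Bregman projection as a rescaling of y, whereas the Euclidean projection shifts every coordinate by the same amount. The set C and the function names are choices made for this sketch.

```python
def kl_project_onto_sum_one(y):
    """Left Bregman (Kullback-Leibler) projection of y > 0 onto {x : sum(x) = 1}."""
    total = sum(y)
    return [yi / total for yi in y]


def euclidean_project_onto_sum_one(y):
    """Euclidean projection onto the same hyperplane."""
    shift = (sum(y) - 1.0) / len(y)
    return [yi - shift for yi in y]


point = [1.0, 2.0]
print(kl_project_onto_sum_one(point))         # rescaled: proportions of y preserved
print(euclidean_project_onto_sum_one(point))  # shifted: differences of y preserved
```

The Kullback–Leibler projection stays in the open positive orthant, while the Euclidean one lands on its boundary for this input.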

SLIDE 29

Bregman Projections

Figure: Bregman projection with Boltzmann-Shannon entropy and with the energy (Euclidean case)

SLIDE 30

Prox Operators: Limiting Cases

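Figure: $\lim_{\gamma \to \infty} \overleftarrow{P}_{\gamma\theta}(x) = \overleftarrow{P}_{\operatorname{argmin}\theta}(x)$ (analogously: $\lim_{\gamma \to \infty} \overrightarrow{P}_{\gamma\theta}(x) = \overrightarrow{P}_{\operatorname{argmin}\theta}(x)$)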

SLIDE 31

Prox Operators: Limiting Cases

Figure: $\lim_{\gamma \to 0} \overleftarrow{P}_{\gamma\theta}(x) = x$ (analogously: $\lim_{\gamma \to 0} \overrightarrow{P}_{\gamma\theta}(x) = x$)

SLIDE 32

Our Contributions

Contributions of this work include:

1. Limit as the parameter goes to infinity
2. Exploration of right prox
3. Computed examples prototypical of polyhedral adaptation

SLIDE 33

Thanks for listening!

SLIDE 34

References I

[1] F. Alvarez, R. Correa, and M. Marechal, Regular self-proximal distances are Bregman, Journal of Convex Analysis 24 (2017), 135–148.
[2] H. Attouch, Convergence de fonctions convexes, des sous-différentiels et semi-groupes associés, Comptes Rendus de l’Académie des Sciences de Paris 284 (1977), 539–542.
[3] H. Attouch, Variational Convergence for Functions and Operators, Pitman, 1984.
[4] H.H. Bauschke, J. Bolte, and M. Teboulle, A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications, Mathematics of Operations Research, 2016. DOI 10.1287/moor.2016.0817
[5] H.H. Bauschke and J.M. Borwein, Legendre functions and the method of random Bregman projections, Journal of Convex Analysis 4 (1997), 27–67.
[6] H.H. Bauschke, J.M. Borwein, and P.L. Combettes, Bregman monotone optimization algorithms, SIAM Journal on Control and Optimization 42 (2003), 596–636.

SLIDE 35

References II

[7] H.H. Bauschke and P.L. Combettes, Iterating Bregman retractions, SIAM Journal on Optimization 13 (2003), 1159–1173.
[8] H.H. Bauschke and P.L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, second edition, Springer, 2017.
[9] H.H. Bauschke, P.L. Combettes, and D. Noll, Joint minimization with alternating Bregman proximity operators, Pacific Journal of Optimization 2 (2006), 401–424.
[10] H.H. Bauschke and A.S. Lewis, Dykstra’s algorithm with Bregman projectors: a convergence proof, Optimization 48 (2000), 409–427.
[11] H.H. Bauschke and D. Noll, The method of forward projections, Journal of Nonlinear and Convex Analysis 3 (2002), 191–205.
[12] L.M. Bregman, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, USSR Computational Mathematics and Mathematical Physics 7 (1967), 200–217.

SLIDE 36

References III

[13] R. Burachik and G. Kassay, On a generalized proximal point method for solving equilibrium problems in Banach spaces, Nonlinear Analysis 75 (2012), 6456–6464.
[14] D. Butnariu and A.N. Iusem, Totally Convex Functions for Fixed Points Computation and Infinite Dimensional Optimization, Kluwer, 2000.
[15] C. Byrne and Y. Censor, Proximity function minimization using multiple Bregman projections, with applications to split feasibility and Kullback–Leibler distance minimization, Annals of Operations Research 105 (2001), 77–98.
[16] Y. Censor and G.T. Herman, Block-iterative algorithms with underrelaxed Bregman projections, SIAM Journal on Optimization 13 (2002), 283–297.
[17] Y. Censor and S. Reich, The Dykstra algorithm with Bregman projections, Communications in Applied Analysis 2 (1998), 407–419.
[18] Y. Censor and S.A. Zenios, Proximal minimization algorithm with D-functions, Journal of Optimization Theory and Applications 73 (1992), 451–464.

SLIDE 37

References IV

[19] Y. Censor and S.A. Zenios, Parallel Optimization: Theory, Algorithms, and Applications, Oxford University Press, 1997.
[20] G. Chen and M. Teboulle, Convergence analysis of a proximal-like minimization algorithm using Bregman functions, SIAM Journal on Optimization 3 (1993), 538–543.
[21] Y.Y. Chen, C. Kan, and W. Song, The Moreau envelop function and proximal mapping with respect to the Bregman distance in Banach spaces, Vietnam Journal of Mathematics 40 (2012), 181–199.
[22] P.L. Combettes and Q.V. Nguyen, Solving composite monotone inclusions in reflexive Banach spaces by constructing best Bregman approximations from their Kuhn–Tucker set, Journal of Convex Analysis 23 (2016), 481–510.
[23] J. Eckstein, Nonlinear proximal point algorithms using Bregman functions, with applications to convex programming, Mathematics of Operations Research 18 (1993), 202–226.
[24] J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms II: Advanced Theory and Bundle Methods, Springer, 1993.

SLIDE 38

References V

[25] C. Kan and W. Song, The Moreau envelope function and proximal mapping in the sense of the Bregman distance, Nonlinear Analysis 75 (2012), 1385–1399.
[26] G. Kassay, S. Reich, and S. Sabach, Iterative methods for solving systems of variational inequalities in reflexive Banach spaces, SIAM Journal on Optimization 21 (2011), 1319–1344.
[27] K.C. Kiwiel, Proximal minimization methods with generalized Bregman functions, SIAM Journal on Control and Optimization 35 (1997), 1142–1168.
[28] K. Lange, MM Optimization Algorithms, SIAM, 2016.
[29] B.S. Mordukhovich and N.M. Nam, An Easy Path to Convex Analysis and Applications, Morgan & Claypool Publishers, 2013.
[30] J.-J. Moreau, Proximité et dualité dans un espace hilbertien, Bulletin de la Société Mathématique de France 93 (1965), 273–299.
[31] Q.V. Nguyen, Forward-backward splitting with Bregman distances, Vietnam Journal of Mathematics, 2017. DOI 10.1007/s10013-016-0238-3

SLIDE 39

References VI

[32] Q.V. Nguyen, Variable quasi-Bregman monotone sequences, Numerical Algorithms 73 (2016), 1107–1130.
[33] R.T. Rockafellar, Convex Analysis, Princeton University Press, 1970.
[34] R.T. Rockafellar and R.J-B Wets, Variational Analysis, Springer-Verlag, 1998.
[35] S. Sabach, Products of finitely many resolvents of maximal monotone mappings in reflexive Banach spaces, SIAM Journal on Optimization 21 (2011), 1289–1308.
