SLIDE 1

Characterization of Convex Objective Functions and Optimal Expected Convergence Rates of SGD

Marten van Dijk (1), Lam M. Nguyen (2), Dzung T. Phan (2) and Phuong Ha Nguyen (1)

1. Secure Computation Laboratory, ECE, University of Connecticut
2. IBM Research, Thomas J. Watson Research Center

International Conference on Machine Learning (ICML), Long Beach, California, 2019

SLIDE 2

Problem Setting

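§ Solve

$\min_{w \in \mathbb{R}^d} \{ F(w) = \mathbb{E}_{\xi}[f(w; \xi)] \}$

§ Assumptions

Convex: $f(w; \xi) - f(w'; \xi) \ge \langle \nabla f(w'; \xi), w - w' \rangle$

Smooth: $\|\nabla f(w; \xi) - \nabla f(w'; \xi)\| \le L \|w - w'\|$

§ Find a $w_t$ close to

$W^* = \{ w^* \in \mathbb{R}^d : \forall w \in \mathbb{R}^d,\ F(w) \ge F(w^*) \}$

§ Problem: Characterize Expected Convergence Rates

$\mathbb{E}[\inf_{w^* \in W^*} \|w_t - w^*\|^2]$ and $\mathbb{E}[F(w_t) - F(w^*)]$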

SLIDE 5

Problem Setting

Stochastic Gradient Descent (SGD):

Initialize: $w_0$
Iterate: for $t = 0, 1, 2, \dots$ do
    Choose $\eta_t > 0$
    Generate random $\xi_t$
    Compute $\nabla f(w_t; \xi_t)$
    Update $w_{t+1} = w_t - \eta_t \nabla f(w_t; \xi_t)$
end for
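The SGD loop above can be sketched in a few lines of Python. This is only an illustration: the least-squares component function, the three-point data set, and the stepsize rule 3/(t + 6) are hypothetical choices made for the example and do not come from the slides.

```python
import random

def sgd(grad, w0, stepsize, sample, num_iters, seed=0):
    """Run w_{t+1} = w_t - eta_t * grad(w_t, xi_t) for num_iters steps."""
    rng = random.Random(seed)
    w = list(w0)
    for t in range(num_iters):
        eta = stepsize(t)      # choose eta_t > 0
        xi = sample(rng)       # generate random xi_t
        g = grad(w, xi)        # compute the stochastic gradient
        w = [wi - eta * gi for wi, gi in zip(w, g)]  # update step
    return w

# Hypothetical component function: f(w; xi) = 0.5 * (x . w - y)^2, xi = (x, y).
# The data are consistent with the minimizer w* = (2, -1).
DATA = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]

def sample(rng):
    """Draw one data point uniformly at random."""
    return rng.choice(DATA)

def grad(w, xi):
    """Gradient of 0.5 * (x . w - y)^2 with respect to w."""
    x, y = xi
    residual = sum(xj * wj for xj, wj in zip(x, w)) - y
    return [residual * xj for xj in x]

# Illustrative diminishing stepsize (an assumption, not the slides' rule).
w_final = sgd(grad, [0.0, 0.0], lambda t: 3.0 / (t + 6), sample, 5000)
print(w_final)
```

Passing the stepsize as a function of t keeps the loop independent of any particular schedule, so a different diminishing rule can be swapped in without touching the update.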

SLIDE 7

Beyond convex and strongly convex functions

Strongly Convex: $F(w) - F(w^*) \ge \frac{\mu}{2} \|w - w^*\|^2$

Plain Convex: $F(w) - F(w^*) \ge 0$

SLIDE 8

$\omega$-Convexity

$\omega$-Convex: $\omega(F(w) - F(w^*)) \ge \inf_{w^* \in W^*} \|w - w^*\|^2$, with $\omega' > 0$, $\omega'' < 0$

SLIDE 9

$\omega$-Convexity with curvature $h \in [0,1]$

$[F(w) - F(w^*)]^h \ge \alpha \inf_{w^* \in W^*} \|w - w^*\|^2$

$h = 0$: plain convex; $h = 1$: strongly convex; $h \in (0,1)$: in between

SLIDE 10

$\omega$-Convexity with curvature $h \in [0,1]$

HEB (Hölderian Error Bound): $[F(w) - F(w^*)]^h \ge \alpha \inf_{w^* \in W^*} \|w - w^*\|^2$, where $h$ ranges between 0 and 2.

HEB and $\omega$-convexity are not subclasses of one another, but they intersect for $h$ between 0 and 1. [Bolte, J., Nguyen, T. P., Peypouquet, J., and Suter, B. W. From error bounds to the complexity of first-order descent methods for convex functions. Mathematical Programming, 165(2):471–507, Oct 2017]

SLIDE 11

Close to optimal stepsize

SGD with close to optimal stepsize:

$\eta_t = \dfrac{c}{(t + \Delta)^{1/(2-h)}}$ for constants $c$ and $\Delta$, with curvature $h = 0$, $h = 1$ or $h \in (0,1)$
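The stepsize on this slide decays polynomially in t, with an exponent set by the curvature h. A minimal sketch, assuming the form eta_t = c / (t + Delta)^(1/(2-h)); the constants `c` and `delta` are placeholders chosen for illustration, not values from the slides.

```python
def stepsize(t, h, c=1.0, delta=1.0):
    """Return eta_t = c / (t + delta) ** (1 / (2 - h)).

    h is the curvature in [0, 1]; c and delta are hypothetical
    placeholder constants, not values taken from the slides.
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("curvature h must lie in [0, 1]")
    return c / (t + delta) ** (1.0 / (2.0 - h))

# h = 1 gives a 1/t-type decay, h = 0 gives a 1/sqrt(t)-type decay.
for h in (0.0, 0.5, 1.0):
    print(h, [round(stepsize(t, h), 4) for t in (0, 9, 99)])
```

Under this assumed form the schedule interpolates between the familiar 1/sqrt(t) stepsize for plain convex objectives and the 1/t stepsize for strongly convex ones.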

SLIDE 12

Convergence Rate of SGD

For SGD with the close to optimal stepsize:

$\mathbb{E}[\inf_{w^* \in W^*} \|w_t - w^*\|^2] = O(t^{-h/(2-h)})$

$\frac{1}{t} \sum_{i=t+1}^{2t} \mathbb{E}[F(w_i) - F(w^*)] = O(t^{-1/(2-h)})$
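To see how the two rates depend on the curvature, the exponents h/(2-h) and 1/(2-h) can be tabulated exactly. The helper name `rate_exponents` is made up for this sketch.

```python
from fractions import Fraction

def rate_exponents(h):
    """Return (a, b) so that the distance bound is O(t^-a) and the
    averaged function-value bound is O(t^-b), for curvature h in [0, 1]."""
    h = Fraction(h)
    if not 0 <= h <= 1:
        raise ValueError("curvature h must lie in [0, 1]")
    return h / (2 - h), 1 / (2 - h)

for h in (0, Fraction(1, 2), 1):
    a, b = rate_exponents(h)
    print(f"h = {h}: distance exponent {a}, function-value exponent {b}")
```

For h = 1/2 the exponents are 1/3 and 2/3. At h = 0 the distance exponent is 0, so that bound no longer shrinks with t, while the function-value exponent is still 1/2.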

SLIDE 13

Convergence Rate of SGD

As $h$ ranges from 0 to 1:

$\mathbb{E}[\inf_{w^* \in W^*} \|w_t - w^*\|^2] = O(t^{-h/(2-h)})$: useless at $h = 0$, useful at $h = 1$

$\frac{1}{t} \sum_{i=t+1}^{2t} \mathbb{E}[F(w_i) - F(w^*)] = O(t^{-1/(2-h)})$: useful at $h = 0$, useful at $h = 1$

SLIDE 14

Convergence Rate of SGD

Example with $h = 1/2$:

$F(w) = H(w) + \lambda G(w)$, where $H(w)$ is convex and

$G(w) = \sum_{i=1}^{d} [e^{w_i} + e^{-w_i} - 2 - w_i^2]$
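The regularizer G is simple to evaluate coordinate by coordinate. The sketch below computes G, its gradient, and the diagonal of its Hessian; the function names are chosen for this example only.

```python
import math

def G(w):
    """G(w) = sum_i [exp(w_i) + exp(-w_i) - 2 - w_i^2]."""
    return sum(math.exp(wi) + math.exp(-wi) - 2.0 - wi * wi for wi in w)

def grad_G(w):
    """Gradient: dG/dw_i = exp(w_i) - exp(-w_i) - 2 * w_i."""
    return [math.exp(wi) - math.exp(-wi) - 2.0 * wi for wi in w]

def hess_diag_G(w):
    """Hessian diagonal: exp(w_i) + exp(-w_i) - 2, which is nonnegative
    in exact arithmetic, so G is convex."""
    return [math.exp(wi) + math.exp(-wi) - 2.0 for wi in w]

print(G([0.0, 0.0]), G([1.0]), grad_G([0.5, -0.5]))
```

Expanding the exponentials shows that each coordinate contributes w_i^4/12 plus higher-order terms, so G is flatter near its minimizer than the quadratic regularizer.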

SLIDE 15

Experiment

Curvature 0 (convex): $f_i(w) = \log(1 + \exp(-y_i x_i^T w))$

Curvature unknown: $f_i(w) + \lambda \|w\|$

Curvature ½: $f_i(w) + \lambda G(w)$, with $G(w) = \sum_{i=1}^{d} [e^{w_i} + e^{-w_i} - 2 - w_i^2]$

Curvature 1 (strongly convex): $f_i(w) + \frac{\lambda}{2} \|w\|^2$
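The four experimental objectives differ only in the regularizer added to the logistic loss. A minimal sketch follows; the `kind` labels are invented for this example, and the norm in the "curvature unknown" case is assumed to be the l1 norm, which the slide does not state.

```python
import math

def logistic_loss(w, x, y):
    """f_i(w) = log(1 + exp(-y * x^T w)), evaluated stably."""
    z = -y * sum(xj * wj for xj, wj in zip(x, w))
    # log(1 + exp(z)) = max(z, 0) + log1p(exp(-|z|)) avoids overflow.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def G(w):
    """Regularizer sum_i [exp(w_i) + exp(-w_i) - 2 - w_i^2]."""
    return sum(math.exp(wi) + math.exp(-wi) - 2.0 - wi * wi for wi in w)

def objective(w, x, y, lam, kind):
    """Component objective for one sample (x, y) and regularization lam."""
    base = logistic_loss(w, x, y)
    if kind == "convex":      # curvature 0
        return base
    if kind == "unknown":     # curvature unknown; l1 norm is an assumption
        return base + lam * sum(abs(wj) for wj in w)
    if kind == "half":        # curvature 1/2
        return base + lam * G(w)
    if kind == "strong":      # curvature 1
        return base + 0.5 * lam * sum(wj * wj for wj in w)
    raise ValueError(f"unknown kind: {kind}")

w, x, y = [1.0, -1.0], [1.0, 0.0], 1.0
for kind in ("convex", "unknown", "half", "strong"):
    print(kind, objective(w, x, y, 0.1, kind))
```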

SLIDE 16

Conclusion

§ $\omega$-convexity notion: plain convex, strongly convex and something in between

§ SGD with $\omega$-convex objective functions

Thank you for your attention!

https://arxiv.org/abs/1810.04100

Poster Number: #193, Pacific Ballroom, 06:30-09:00 PM, 06/11