Characterization of Convex Objective Functions and Optimal Expected Convergence Rates of SGD



SLIDE 1

Characterization of Convex Objective Functions and Optimal Expected Convergence Rates of SGD

Phuong Ha Nguyen (1), Marten van Dijk (1), Lam M. Nguyen (2) and Dzung T. Phan (2)

1. Secure Computation Laboratory, ECE, University of Connecticut
2. IBM Research, Thomas J. Watson Research Center

International Conference on Machine Learning (ICML), Long Beach, California, 2019


SLIDE 2

Problem Setting

• Solve

$$\min_{w \in \mathbb{R}^d} \{ F(w) = \mathbb{E}_{\xi}[f(w; \xi)] \}$$

• Assumptions

  - Convex: $f(w; \xi) - f(w'; \xi) \geq \langle \nabla f(w'; \xi), w - w' \rangle$

  - Smooth: $\|\nabla f(w; \xi) - \nabla f(w'; \xi)\| \leq L \|w - w'\|$

• Find a $w_t$ close to

$$W^* = \{ w^* \in \mathbb{R}^d : \forall w \in \mathbb{R}^d, \; F(w) \geq F(w^*) \}$$

• Problem: Characterize Expected Convergence Rates

$$\mathbb{E}\left[\inf_{w^* \in W^*} \|w_t - w^*\|^2\right] \quad \text{and} \quad \mathbb{E}[F(w_t) - F(w^*)]$$


SLIDE 5

Problem Setting

Stochastic Gradient Descent (SGD):

  Initialize: $w_0$
  Iterate: for $t = 0, 1, 2, \dots$ do
    Choose a step size $\eta_t > 0$
    Generate a random $\xi_t$
    Compute $\nabla f(w_t; \xi_t)$
    Update $w_{t+1} = w_t - \eta_t \nabla f(w_t; \xi_t)$
  end for
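The SGD loop above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the logistic loss log(1 + exp(-y * <x, w>)) as the per-sample function, a step size of 1/sqrt(t + 1), and a hypothetical four-point toy data set; the names `sgd`, `logistic_grad`, `average_loss` and `toy` are invented for this example.

```python
import math
import random


def logistic_grad(w, x, y):
    """Gradient of log(1 + exp(-y * <x, w>)) with respect to w."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m)); clamp m to avoid overflow.
    coeff = -y / (1.0 + math.exp(min(margin, 700.0)))
    return [coeff * xi for xi in x]


def sgd(samples, dim, num_steps, step_size, seed=0):
    """Run SGD: w_{t+1} = w_t - eta_t * grad f(w_t; xi_t)."""
    rng = random.Random(seed)
    w = [0.0] * dim                     # Initialize w_0
    for t in range(num_steps):
        eta = step_size(t)              # Choose eta_t > 0
        x, y = rng.choice(samples)      # Generate random xi_t
        g = logistic_grad(w, x, y)      # Compute grad f(w_t; xi_t)
        w = [wi - eta * gi for wi, gi in zip(w, g)]  # Update step
    return w


def average_loss(w, samples):
    """Empirical objective F(w): mean of the per-sample logistic losses."""
    total = 0.0
    for x, y in samples:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        # Numerically stable evaluation of log(1 + exp(-margin)).
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(samples)


# Hypothetical toy data: the label is the sign of the first coordinate.
toy = [([1.0, 0.5], 1), ([2.0, -1.0], 1), ([-1.5, 0.3], -1), ([-0.7, -0.2], -1)]
w_final = sgd(toy, dim=2, num_steps=2000,
              step_size=lambda t: 1.0 / math.sqrt(t + 1))
```

Comparing `average_loss(w_final, toy)` with `average_loss([0.0, 0.0], toy)` shows how far the objective has moved from its value at the starting point.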


SLIDE 7

Beyond convex and strongly convex functions

Strongly convex: $F(w) - F(w^*) \geq \frac{\mu}{2} \|w - w^*\|^2$

Plain convex: $F(w) - F(w^*) \geq 0$

SLIDE 8

ω-Convexity

Strongly convex: $F(w) - F(w^*) \geq \frac{\mu}{2} \|w - w^*\|^2$

Plain convex: $F(w) - F(w^*) \geq 0$

ω-convex: $\omega(F(w) - F(w^*)) \geq \inf_{w^* \in W^*} \|w - w^*\|^2$, with $\omega' > 0$ and $\omega'' < 0$

SLIDE 9

ω-Convexity with curvature $h \in [0,1]$

$$[F(w) - F(w^*)]^h \geq \alpha \inf_{w^* \in W^*} \|w - w^*\|^2$$

$h = 0$: plain convex. $h = 1$: strongly convex. $h \in (0,1)$: ω-convex, in between.
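As a quick check of the curvature inequality $[F(w) - F(w^*)]^h \geq \alpha \inf_{w^* \in W^*} \|w - w^*\|^2$, the two instances below may help. The first only substitutes $h = 1$ and $\alpha = \mu/2$. The second is an illustrative one-dimensional function chosen for this note, not taken from the slides, and it looks only at the inequality itself, ignoring the smoothness assumption.

```latex
% h = 1, alpha = mu/2: the curvature inequality is the strong-convexity bound.
[F(w) - F(w^*)]^{1} \geq \frac{\mu}{2} \inf_{w^* \in W^*} \|w - w^*\|^2

% Illustrative instance (an assumption): F(w) = w^4 on the real line,
% so W^* = \{0\} and F(w^*) = 0. With h = 1/2 and alpha = 1:
[F(w) - F(w^*)]^{1/2} = (w^4)^{1/2} = w^2 \geq 1 \cdot |w - 0|^2
```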

SLIDE 10

HEB (Hölderian Error Bound): $[F(w) - F(w^*)]^h \geq \alpha \inf_{w^* \in W^*} \|w - w^*\|^2$, where $h \in (0,2]$.

HEB and ω-convexity are not subclasses of one another, but they do intersect for $h \in (0,1]$. [Bolte, J., Nguyen, T. P., Peypouquet, J., and Suter, B. W. From error bounds to the complexity of first-order descent methods for convex functions. Mathematical Programming, 165(2):471–507, Oct 2017]

SLIDE 11

Close to optimal stepsize

For SGD, the close-to-optimal step size is

$$\eta_t = \frac{a}{(t + b)^{1/(2-h)}}$$

for constants $a$ and $b$, with $h = 0$ (plain convex), $h = 1$ (strongly convex) and $h \in (0,1)$ (ω-convex).
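A step-size schedule that decays like $(t + \text{const})^{-1/(2-h)}$ can be written as a small helper. This is a sketch under stated assumptions: the constants `a` and `b` and their default values of 1.0 are placeholders chosen for illustration, and the function name `step_size` is hypothetical.

```python
def step_size(t, h, a=1.0, b=1.0):
    """Diminishing step size a / (t + b)^(1 / (2 - h)) for curvature h in [0, 1].

    a and b are illustrative placeholder constants, not values from the talk.
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("curvature h must lie in [0, 1]")
    return a / (t + b) ** (1.0 / (2.0 - h))


# Compare how fast the schedule decays for three curvature values.
for h in (0.0, 0.5, 1.0):
    print(h, [round(step_size(t, h), 4) for t in (0, 9, 99)])
```

With these defaults, $h = 0$ gives the familiar $1/\sqrt{t}$-type decay and $h = 1$ gives the $1/t$-type decay.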

SLIDE 12

Convergence Rate of SGD

With the close-to-optimal step size $\eta_t$, SGD attains

$$\mathbb{E}\left[\inf_{w^* \in W^*} \|w_t - w^*\|^2\right] = O\left(t^{-h/(2-h)}\right)$$

$$\frac{1}{t} \sum_{i=t+1}^{2t} \mathbb{E}[F(w_i) - F(w^*)] = O\left(t^{-1/(2-h)}\right)$$

for $h = 0$ (plain convex), $h = 1$ (strongly convex) and $h \in (0,1)$ (ω-convex).

SLIDE 13

Convergence Rate of SGD

[Figure: usefulness of the two bounds as the curvature $h$ ranges from 0 to 1. The distance bound $O(t^{-h/(2-h)})$ is labeled [Useless, 0] and [Useful, 1]; the objective bound $O(t^{-1/(2-h)})$ is labeled [Useful, 0] and [Useful, 1].]

SLIDE 14

Convergence Rate of SGD

Example with curvature $h = \frac{1}{2}$: $F(w) = H(w) + \lambda G(w)$, where $H(w)$ is convex and

$$G(w) = \sum_{i=1}^{d} \left[ e^{w_i} + e^{-w_i} - 2 - w_i^2 \right]$$
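The regularizer $G(w) = \sum_{i=1}^{d} [e^{w_i} + e^{-w_i} - 2 - w_i^2]$ and its gradient are straightforward to code. The sketch below is illustrative only; the function names `G` and `grad_G` are chosen for this example. Each coordinate term has second derivative $e^{w_i} + e^{-w_i} - 2 \geq 0$, so $G$ is convex with its minimum value 0 at $w = 0$.

```python
import math


def G(w):
    """Regularizer: sum over coordinates of exp(w_i) + exp(-w_i) - 2 - w_i^2."""
    return sum(math.exp(wi) + math.exp(-wi) - 2.0 - wi * wi for wi in w)


def grad_G(w):
    """Gradient of G: coordinate i is exp(w_i) - exp(-w_i) - 2 * w_i."""
    return [math.exp(wi) - math.exp(-wi) - 2.0 * wi for wi in w]


# Central finite-difference check of the derivative in one dimension.
eps = 1e-5
numeric = (G([1.0 + eps]) - G([1.0 - eps])) / (2.0 * eps)
analytic = grad_G([1.0])[0]
```

A gradient step on $H(w) + \lambda G(w)$ would add `lam * g` for each `g` in `grad_G(w)` to the gradient of $H$, where `lam` stands for the weight $\lambda$.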

SLIDE 15

Experiment

• Curvature 0 (convex): $f_i(w) = \log(1 + \exp(-y_i x_i^\top w))$
• Curvature unknown: $f_i(w) + \lambda \|w\|$
• Curvature ½: $f_i(w) + \lambda G(w)$, with $G(w) = \sum_{i=1}^{d} \left[ e^{w_i} + e^{-w_i} - 2 - w_i^2 \right]$
• Curvature 1 (strongly convex): $f_i(w) + \frac{\lambda}{2} \|w\|^2$

SLIDE 16

Conclusion

• ω-convexity notion: plain convex, strongly convex and something in between
• SGD with ω-convex objective functions

Thank you for your attention!

https://arxiv.org/abs/1810.04100

Poster Number: #193, Pacific Ballroom, 06:30 to 09:00 PM, 06/11