
A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization, by Zhize Li and Jian Li (IIIS, Tsinghua University, https://zhizeli.github.io/), presented Dec 6th at NeurIPS 2018.


SLIDE 1

Zhize Li, Jian Li

IIIS, Tsinghua University https://zhizeli.github.io/ Dec 6th, NeurIPS 2018

A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization

SLIDE 2

Problem Definition

Machine learning problems, such as image classification or voice recognition, are usually modeled as a (nonconvex) optimization problem:

$$\min_{\theta} L(\theta).$$

Goal: find a good enough solution (parameters) $\hat{\theta}$, e.g., one satisfying

$$\|\nabla L(\hat{\theta})\|^2 \le \epsilon.$$

Zhize Li (Tsinghua) A Simple ProxSVRG+ Algorithm
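To make the stopping criterion above concrete, here is a minimal sketch that runs plain gradient descent on a toy one-dimensional nonconvex loss until the squared gradient norm drops below a tolerance. The loss, step size, tolerance, and function names are all assumptions chosen for illustration; they do not come from the slides.

```python
def grad_L(theta):
    # Gradient of the toy nonconvex loss L(theta) = theta**4 - 3*theta**2 + theta
    # (a hypothetical example, not a loss from the talk).
    return 4 * theta**3 - 6 * theta + 1


def find_stationary_point(theta0, step=0.01, eps=1e-8, max_iters=100_000):
    """Run gradient descent until ||grad L(theta)||^2 <= eps."""
    theta = theta0
    for it in range(max_iters):
        g = grad_L(theta)
        if g * g <= eps:  # the "good enough solution" test
            return theta, it
        theta -= step * g
    return theta, max_iters


theta_hat, iters = find_stationary_point(2.0)
print(theta_hat, iters, grad_L(theta_hat) ** 2)
```

The returned point is only an approximate stationary point (here a local minimizer); for a nonconvex loss the criterion does not certify a global minimum.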

SLIDE 3

Problem Definition

We consider the more general nonsmooth nonconvex case:

$$\min_{x} \Phi(x) := f(x) + h(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x) + h(x),$$

where $f(x)$ and all $f_i(x)$ are possibly nonconvex (loss on data samples), and $h(x)$ is nonsmooth but convex (e.g., the $\ell_1$ regularizer $\|x\|_1$ or the indicator function $I_C(x)$ for some convex set $C$).

SLIDE 4

Problem Definition


Benefit of $h(x)$: it lets the formulation handle nonsmooth and constrained problems.

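Proximal methods typically access the nonsmooth term through its proximal operator, $\mathrm{prox}_{\eta h}(z) = \arg\min_x \{ h(x) + \frac{1}{2\eta}\|x - z\|^2 \}$. The sketch below shows this operator for the two examples of $h$ named above: the $\ell_1$ regularizer (soft-thresholding) and the indicator of a convex set (Euclidean projection). The function names, the weight `lam`, and the choice of a box as the convex set $C$ are assumptions made for illustration.

```python
def prox_l1(x, eta, lam):
    """Prox of eta * lam * ||x||_1: shrink each coordinate toward zero by eta * lam."""
    t = eta * lam
    out = []
    for v in x:
        if v > t:
            out.append(v - t)
        elif v < -t:
            out.append(v + t)
        else:
            out.append(0.0)
    return out


def prox_box_indicator(x, lo, hi):
    """Prox of the indicator of C = [lo, hi]^d: projection onto the box.

    The step size plays no role because the indicator is 0 inside C and
    +infinity outside, so the prox is the closest point of C.
    """
    return [min(max(v, lo), hi) for v in x]


shrunk = prox_l1([3.0, -0.5, 1.0], eta=1.0, lam=1.0)
projected = prox_box_indicator([2.0, -3.0, 0.25], lo=-1.0, hi=1.0)
print(shrunk)     # [2.0, 0.0, 0.0]
print(projected)  # [1.0, -1.0, 0.25]
```

The first operator produces sparse iterates (the nonsmooth case); the second keeps iterates feasible (the constrained case).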

SLIDE 5

Our Results

We propose a simple ProxSVRG+ algorithm, which recovers/improves several previous results (e.g., ProxGD, ProxSVRG/SAGA, SCSG).
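The slides do not spell out the update rule, so the following is only a hedged sketch of a proximal, variance-reduced minibatch loop in the SVRG style that the name ProxSVRG+ suggests. Everything here is an assumption for illustration: the batch sizes `B` and `b`, the epoch length `m`, the step size `eta`, the toy least-squares losses $f_i$ (convex, chosen only to keep the example short), and all function names. It is not the authors' implementation.

```python
import random


def soft_threshold(x, t):
    # Prox of t * ||x||_1, applied coordinate-wise.
    return [v - t if v > t else v + t if v < -t else 0.0 for v in x]


def dot(a, x):
    return sum(aj * xj for aj, xj in zip(a, x))


def minibatch_grad(x, data, idx):
    # Average gradient of f_i(x) = 0.5 * (a_i . x - y_i)^2 over the indices idx.
    g = [0.0] * len(x)
    for i in idx:
        a, y = data[i]
        r = dot(a, x) - y
        for j, aj in enumerate(a):
            g[j] += r * aj / len(idx)
    return g


def objective(x, data, lam):
    # Phi(x) = (1/n) sum_i f_i(x) + lam * ||x||_1
    f = sum(0.5 * (dot(a, x) - y) ** 2 for a, y in data) / len(data)
    return f + lam * sum(abs(v) for v in x)


def prox_svrg_style(data, dim, lam, eta, epochs, m, B, b, rng):
    n = len(data)
    x = [0.0] * dim
    for _ in range(epochs):
        snapshot = list(x)
        # Batch gradient at the snapshot, estimated from B samples.
        g = minibatch_grad(snapshot, data, rng.sample(range(n), B))
        for _ in range(m):
            idx = rng.sample(range(n), b)
            gx = minibatch_grad(x, data, idx)
            gs = minibatch_grad(snapshot, data, idx)
            # Variance-reduced gradient estimate.
            v = [gx[j] - gs[j] + g[j] for j in range(dim)]
            # Proximal step on the nonsmooth term h = lam * ||.||_1.
            x = soft_threshold([x[j] - eta * v[j] for j in range(dim)], eta * lam)
    return x


rng = random.Random(0)
x_true = [1.0, -2.0, 0.0, 0.0, 0.5]  # hypothetical ground truth
data = []
for _ in range(200):
    a = [rng.gauss(0.0, 1.0) for _ in x_true]
    data.append((a, dot(a, x_true) + rng.gauss(0.0, 0.1)))

lam = 0.1
x_hat = prox_svrg_style(data, dim=5, lam=lam, eta=0.05,
                        epochs=20, m=10, B=100, b=10, rng=rng)
initial = objective([0.0] * 5, data, lam)
final = objective(x_hat, data, lam)
print(initial, final)
```

Each inner step costs `2 * b` component-gradient evaluations and one proximal call, which is the kind of accounting behind the SFO and PO call counts discussed on the later slides.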

SLIDE 6

Our Results


Benefits: simpler algorithm, simpler analysis, better theoretical results,

SLIDE 7

Our Results


more attractive in practice (prefers a moderate minibatch size and auto-adapts to local curvature, i.e., it automatically switches to faster linear convergence $O(\cdot \log(1/\epsilon))$ in such regions, even though the objective function is generally nonconvex).

SLIDE 8

Theoretical Results

Our ProxSVRG+ prefers a moderate minibatch size (red box), which is not too small for parallelism or vectorization and not too large for better generalization,

SLIDE 9

Theoretical Results


and uses fewer proximal oracle (PO) calls than ProxSVRG.

SLIDE 10

Theoretical Results


Recently, [Zhou et al., 2018] and [Fang et al., 2018] improved the SFO (stochastic first-order oracle) complexity to $O(n^{1/2}/\epsilon)$ in the smooth setting.

SLIDE 11

Experimental Results

Our ProxSVRG+ prefers a much smaller minibatch size than ProxSVRG [Reddi et al., 2016], and performs much better than ProxGD and ProxSGD [Ghadimi et al., 2016].

SLIDE 12

Our Poster: 5:00-7:00 PM Room 210 #5


Thanks!
