A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization
Zhize Li, Jian Li
IIIS, Tsinghua University
https://zhizeli.github.io/
Dec 6th, NeurIPS 2018
Problem Definition

Machine learning problems, such as image classification or voice recognition, are usually modeled as a (nonconvex) optimization problem:

$$\min_x f(x) := \frac{1}{n}\sum_{i=1}^n f_i(x).$$

Goal: find a good enough solution (parameters) $\hat{x}$, e.g., $\mathbb{E}\|\nabla f(\hat{x})\|^2 \leq \epsilon$.
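To make the setup concrete, here is a small toy instance in Python (our illustration, not from the talk): a hypothetical nonconvex per-sample loss $f_i$, the full gradient $\nabla f$, and a minibatch stochastic gradient estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10
A = rng.standard_normal((n, d))  # one data sample per row
b = rng.standard_normal(n)       # targets

def grad_f_i(x, i):
    # gradient of a hypothetical nonconvex per-sample loss
    # f_i(x) = sigmoid(A[i] @ x - b[i])
    s = 1.0 / (1.0 + np.exp(-(A[i] @ x - b[i])))
    return s * (1.0 - s) * A[i]

def full_grad(x):
    # grad f(x) = (1/n) * sum_i grad f_i(x)
    return np.mean([grad_f_i(x, i) for i in range(n)], axis=0)

def minibatch_grad(x, batch_size):
    # unbiased stochastic estimate of grad f(x)
    idx = rng.choice(n, size=batch_size, replace=False)
    return np.mean([grad_f_i(x, i) for i in idx], axis=0)

x = rng.standard_normal(d)
print(np.sum(full_grad(x) ** 2))  # stationarity measure ||grad f(x)||^2
```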
We consider the more general nonsmooth nonconvex case:

$$\min_y \Phi(y) := f(y) + h(y) = \frac{1}{n}\sum_{i=1}^n f_i(y) + h(y),$$

where $f$ and all $f_i(y)$ are possibly nonconvex (losses on data samples), and $h(y)$ is nonsmooth but convex (e.g., the $\ell_1$ regularizer $\|y\|_1$, or the indicator function $I_C(y)$ for some convex set $C$).
Benefit of $h(y)$: it lets us handle nonsmooth and constrained problems.
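Why the $h(y)$ term matters in practice: proximal methods only touch $h$ through its proximal operator $\mathrm{prox}_{\eta h}(x) := \operatorname{arg\,min}_y \{\frac{1}{2}\|y-x\|^2 + \eta h(y)\}$, which has a cheap closed form for both examples above. A minimal sketch (our illustration; the Euclidean ball is just one assumed choice of $C$):

```python
import numpy as np

def prox_l1(x, eta, lam):
    # prox of h(y) = lam * ||y||_1: soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - eta * lam, 0.0)

def prox_ball(x, radius):
    # prox of h(y) = I_C(y), C = {y : ||y||_2 <= radius}: projection onto C
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else (radius / nrm) * x

x = np.array([1.5, -0.2, 0.7])
print(prox_l1(x, eta=0.5, lam=1.0))  # -> [ 1. -0.  0.2]
print(prox_ball(x, radius=1.0))      # x scaled onto the unit ball
```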
We propose a simple ProxSVRG+ algorithm, which recovers/improves several previous results (e.g., ProxGD, ProxSVRG/SAGA, SCSG).

Benefits: simpler algorithm, simpler analysis, better theoretical results, and more attractive behavior in practice: it prefers a moderate minibatch size, and it auto-adapts to local curvature, i.e., it automatically switches to the faster linear convergence rate $O(\log(1/\epsilon))$ in regions with benign curvature (e.g., where a Polyak-Lojasiewicz condition holds locally), even though the objective function is nonconvex in general.
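For intuition, here is a minimal sketch of one epoch of a ProxSVRG+-style update, based on our reading of the method (the variable names and this simplified structure are ours; the paper prescribes the actual choices of B, b, m, and eta). It combines the SVRG-type variance-reduced gradient estimator, built from a size-B snapshot batch and size-b minibatches, with a proximal step such as prox_l1 above:

```python
import numpy as np

def proxsvrg_plus_epoch(x, grad_f_i, prox_h, n, B, b, m, eta, rng):
    # Snapshot: large-batch gradient estimate at the anchor point x_tilde
    # (B = n would recover a full-gradient, SVRG-style snapshot).
    x_tilde = x.copy()
    idx_B = rng.choice(n, size=B, replace=False)
    g_tilde = np.mean([grad_f_i(x_tilde, i) for i in idx_B], axis=0)
    for _ in range(m):  # m inner minibatch steps per epoch
        idx_b = rng.choice(n, size=b, replace=False)
        # variance-reduced estimator:
        # v = (1/b) * sum_i (grad f_i(x) - grad f_i(x_tilde)) + g_tilde
        v = np.mean([grad_f_i(x, i) - grad_f_i(x_tilde, i) for i in idx_b],
                    axis=0) + g_tilde
        x = prox_h(x - eta * v)  # proximal step: x <- prox_{eta*h}(x - eta*v)
    return x

# e.g., with the toy oracles and prox_l1 from the earlier sketches:
# x = proxsvrg_plus_epoch(x, grad_f_i, lambda z: prox_l1(z, 0.01, 0.1),
#                         n, B=256, b=32, m=32, eta=0.01, rng=rng)
```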
Our ProxSVRG+ prefers a moderate minibatch size (red box): not too small, so parallelism and vectorization still pay off, and not too large, so generalization does not suffer. It also uses fewer proximal oracle (PO) calls than ProxSVRG.

Recently, [Zhou et al., 2018] and [Fang et al., 2018] improved the stochastic first-order oracle (SFO) complexity to $\widetilde{O}(\sqrt{n}/\epsilon)$ in the smooth setting.
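For reference, the two oracle models behind these complexity counts (standard definitions in this literature): one SFO call returns a sampled component gradient, and one PO call returns the proximal mapping of $h$:

```latex
\[
  \text{SFO: } x \;\mapsto\; \nabla f_i(x) \ \text{for a sampled index } i,
  \qquad
  \text{PO: } x \;\mapsto\; \operatorname{prox}_{\eta h}(x)
  := \operatorname*{arg\,min}_{y}\Big\{\tfrac{1}{2}\lVert y-x\rVert^2 + \eta\,h(y)\Big\}.
\]
```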
Our ProxSVRG+ prefers a much smaller minibatch size than ProxSVRG [Reddi et al., 2016], and it performs much better than ProxGD and ProxSGD [Ghadimi et al., 2016].
Our Poster: 5:00-7:00 PM Room 210 #5