A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization
Zhize Li, Jian Li
IIIS, Tsinghua University
https://zhizeli.github.io/
Dec 6th, NeurIPS 2018
Problem Definition

Machine learning problems, such as image classification or voice recognition, are usually modeled as a (nonconvex) optimization problem:

$$\min_x f(x) := \frac{1}{n}\sum_{i=1}^n f_i(x).$$

Goal: find a good enough solution (parameters) $\hat{x}$, e.g., $\mathbb{E}\|\nabla f(\hat{x})\|^2 \leq \epsilon$.
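To make the setup concrete, here is a small toy instance in Python (our illustration, not from the talk): a hypothetical nonconvex per-sample loss $f_i$, the full gradient $\nabla f$, and a minibatch stochastic gradient estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10
A = rng.standard_normal((n, d))  # one data sample per row
b = rng.standard_normal(n)       # targets

def grad_f_i(x, i):
    # gradient of a hypothetical nonconvex per-sample loss
    # f_i(x) = sigmoid(A[i] @ x - b[i])
    s = 1.0 / (1.0 + np.exp(-(A[i] @ x - b[i])))
    return s * (1.0 - s) * A[i]

def full_grad(x):
    # grad f(x) = (1/n) * sum_i grad f_i(x)
    return np.mean([grad_f_i(x, i) for i in range(n)], axis=0)

def minibatch_grad(x, batch_size):
    # unbiased stochastic estimate of grad f(x)
    idx = rng.choice(n, size=batch_size, replace=False)
    return np.mean([grad_f_i(x, i) for i in idx], axis=0)

x = rng.standard_normal(d)
print(np.sum(full_grad(x) ** 2))  # stationarity measure ||grad f(x)||^2
```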
We consider the more general nonsmooth nonconvex case:

$$\min_y \Phi(y) := f(y) + h(y) = \frac{1}{n}\sum_{i=1}^n f_i(y) + h(y),$$

where $f$ and all $f_i(y)$ are possibly nonconvex (losses on data samples), and $h(y)$ is nonsmooth but convex (e.g., the $\ell_1$ regularizer $\|y\|_1$, or the indicator function $I_C(y)$ for some convex set $C$).
Benefit of $h(y)$: it lets us handle nonsmooth and constrained problems.
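Why the $h(y)$ term matters in practice: proximal methods only touch $h$ through its proximal operator $\mathrm{prox}_{\eta h}(x) := \operatorname{arg\,min}_y \{\frac{1}{2}\|y-x\|^2 + \eta h(y)\}$, which has a cheap closed form for both examples above. A minimal sketch (our illustration; the Euclidean ball is just one assumed choice of $C$):

```python
import numpy as np

def prox_l1(x, eta, lam):
    # prox of h(y) = lam * ||y||_1: soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - eta * lam, 0.0)

def prox_ball(x, radius):
    # prox of h(y) = I_C(y), C = {y : ||y||_2 <= radius}: projection onto C
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else (radius / nrm) * x

x = np.array([1.5, -0.2, 0.7])
print(prox_l1(x, eta=0.5, lam=1.0))  # -> [ 1. -0.  0.2]
print(prox_ball(x, radius=1.0))      # x scaled onto the unit ball
```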
We propose a simple ProxSVRG+ algorithm, which recovers/improves several previous results (e.g., ProxGD, ProxSVRG/SAGA, SCSG).

Benefits: simpler algorithm, simpler analysis, better theoretical results, and more attractive behavior in practice: it prefers a moderate minibatch size, and it auto-adapts to local curvature, i.e., it automatically switches to the faster linear convergence rate $O(\log(1/\epsilon))$ in regions with benign curvature (e.g., where a Polyak-Lojasiewicz condition holds locally), even though the objective function is nonconvex in general.
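For intuition, here is a minimal sketch of one epoch of a ProxSVRG+-style update, based on our reading of the method (the variable names and this simplified structure are ours; the paper prescribes the actual choices of B, b, m, and eta). It combines the SVRG-type variance-reduced gradient estimator, built from a size-B snapshot batch and size-b minibatches, with a proximal step such as prox_l1 above:

```python
import numpy as np

def proxsvrg_plus_epoch(x, grad_f_i, prox_h, n, B, b, m, eta, rng):
    # Snapshot: large-batch gradient estimate at the anchor point x_tilde
    # (B = n would recover a full-gradient, SVRG-style snapshot).
    x_tilde = x.copy()
    idx_B = rng.choice(n, size=B, replace=False)
    g_tilde = np.mean([grad_f_i(x_tilde, i) for i in idx_B], axis=0)
    for _ in range(m):  # m inner minibatch steps per epoch
        idx_b = rng.choice(n, size=b, replace=False)
        # variance-reduced estimator:
        # v = (1/b) * sum_i (grad f_i(x) - grad f_i(x_tilde)) + g_tilde
        v = np.mean([grad_f_i(x, i) - grad_f_i(x_tilde, i) for i in idx_b],
                    axis=0) + g_tilde
        x = prox_h(x - eta * v)  # proximal step: x <- prox_{eta*h}(x - eta*v)
    return x

# e.g., with the toy oracles and prox_l1 from the earlier sketches:
# x = proxsvrg_plus_epoch(x, grad_f_i, lambda z: prox_l1(z, 0.01, 0.1),
#                         n, B=256, b=32, m=32, eta=0.01, rng=rng)
```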
Our ProxSVRG+ prefers a moderate minibatch size (red box): not too small, so parallelism and vectorization still pay off, and not too large, so generalization does not suffer. It also uses fewer proximal oracle (PO) calls than ProxSVRG.

Recently, [Zhou et al., 2018] and [Fang et al., 2018] improved the stochastic first-order oracle (SFO) complexity to $\widetilde{O}(\sqrt{n}/\epsilon)$ in the smooth setting.
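For reference, the two oracle models behind these complexity counts (standard definitions in this literature): one SFO call returns a sampled component gradient, and one PO call returns the proximal mapping of $h$:

```latex
\[
  \text{SFO: } x \;\mapsto\; \nabla f_i(x) \ \text{for a sampled index } i,
  \qquad
  \text{PO: } x \;\mapsto\; \operatorname{prox}_{\eta h}(x)
  := \operatorname*{arg\,min}_{y}\Big\{\tfrac{1}{2}\lVert y-x\rVert^2 + \eta\,h(y)\Big\}.
\]
```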
Our ProxSVRG+ prefers a much smaller minibatch size than ProxSVRG [Reddi et al., 2016], and it performs much better than ProxGD and ProxSGD [Ghadimi et al., 2016].
Our Poster: 5:00-7:00 PM Room 210 #5