

SLIDE 1

Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization

Kaiyi Ji¹, Zhe Wang¹, Yi Zhou², Yingbin Liang¹

¹Ohio State University, ²Duke University

ICML 2019


SLIDE 2

Zeroth-order (Gradient-free) Nonconvex Optimization

  • Problem formulation:

$$\min_{x \in \mathbb{R}^d} f(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x)$$

◮ $f_i(\cdot)$: individual nonconvex loss function
◮ Gradient of $f_i(\cdot)$ is unknown
◮ Only the function value of $f_i(\cdot)$ is accessible (a minimal oracle sketch follows this list)
◮ Examples:
    Generating black-box adversarial samples
    Parameter optimization for black-box systems
    Action exploration in reinforcement learning

[Figure: generating black-box adversarial samples]
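To make the access model concrete, here is a minimal sketch of a zeroth-order oracle; the class and method names are illustrative assumptions, not part of the paper. The optimizer may query function values of any component $f_i$, but never gradients, and the query counter is exactly the "function query complexity" tracked in the tables below.

```python
from typing import Callable, List
import numpy as np

class ZerothOrderOracle:
    """Minimal sketch of the zeroth-order access model (illustrative):
    values f_i(x) can be queried, gradients cannot."""

    def __init__(self, components: List[Callable[[np.ndarray], float]]):
        self.components = components
        self.queries = 0                 # total function queries made so far

    def value(self, i: int, x: np.ndarray) -> float:
        self.queries += 1                # query complexity = number of such calls
        return self.components[i](x)
```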


SLIDE 3

Zeroth-order (Gradient-free) Nonconvex Optimization

$$\min_{x \in \mathbb{R}^d} f(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x)$$

  • Standard assumptions on $f(\cdot)$:

    ◮ $f(\cdot)$ is bounded below, i.e., $f^* = \inf_{x \in \mathbb{R}^d} f(x) > -\infty$
    ◮ Each $f_i(\cdot)$ is $L$-smooth, i.e., $\|\nabla f_i(x) - \nabla f_i(y)\| \le L \|x - y\|$
    ◮ (Online case) $\nabla f_i(\cdot)$ has bounded variance, i.e., there exists $\sigma > 0$ such that
      $$\frac{1}{n} \sum_{i=1}^{n} \|\nabla f_i(x) - \nabla f(x)\|^2 \le \sigma^2$$

  • Optimization goal: find an $\epsilon$-accurate stationary solution, i.e.,
    $$\mathbb{E}\,\|\nabla f(x)\|^2 \le \epsilon$$


SLIDE 4

Existing Zeroth-Order SVRG

ZO-SVRG (Liu et al., 2018)

  • Each outer-loop iteration estimates the gradient by $\hat{g}^s = \hat{\nabla}_{\mathrm{rand}} f(x_0^s, u_0^s)$
  • Each inner-loop iteration computes
    $$v_t^s = \frac{1}{|B|} \sum_{i \in B} \Big( \hat{\nabla}_{\mathrm{rand}} f_i(x_t^s;\, u_t^s) - \hat{\nabla}_{\mathrm{rand}} f_i(x_0^s;\, u_0^s) \Big) + \hat{g}^s$$
  • Two-point gradient estimator (a NumPy sketch follows this list):
    $$\hat{\nabla}_{\mathrm{rand}} f_i(x_t^s, u_t^s) = \frac{d}{\beta} \big( f_i(x_t^s + \beta u_t^s) - f_i(x_t^s) \big)\, u_t^s$$
  • $u_t^s$: smoothing vector; $\beta$: smoothing parameter
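Below is a minimal NumPy sketch of the two-point estimator; the toy quadratic component and the unit-sphere choice of smoothing vector are illustrative assumptions, not the paper's exact setup. Note that only function values are ever queried (two per estimate).

```python
import numpy as np

def two_point_estimator(f_i, x, u, beta):
    """Two-point random gradient estimate: (d / beta) * (f_i(x + beta*u) - f_i(x)) * u."""
    d = x.shape[0]
    return (d / beta) * (f_i(x + beta * u) - f_i(x)) * u

# Illustrative usage on a toy smooth component.
rng = np.random.default_rng(0)
f = lambda x: 0.5 * float(np.sum(x ** 2))   # stand-in for a black-box f_i
x = rng.standard_normal(5)
u = rng.standard_normal(5)
u /= np.linalg.norm(u)                      # unit-sphere smoothing vector (illustrative choice)
print(two_point_estimator(f, x, u, beta=1e-4))
```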


SLIDE 5

Existing Zeroth-Order SVRG

ZO-SVRG (Liu et al., 2018), as on the previous slide.
Algorithms     Convergence rate       # of function queries
ZO-SGD         $O(\sqrt{d/T})$        $O(d\epsilon^{-2})$
ZO-SVRG        $O(d/T + 1/|B|)$       $O(d\epsilon^{-2} + n\epsilon^{-1})$

◮ Issue: ZO-SVRG has worse query complexity than ZO-SGD


SLIDE 6

ZO-SVRG-Coord-Rand vs ZO-SVRG

ZO-SVRG-Coord-Rand (This paper)

  • Each outer-loop iteration estimates the gradient by $\hat{g}^s = \hat{\nabla}_{\mathrm{coord}} f_{\mathcal{S}}(x_0^s)$
    ◮ As a comparison, ZO-SVRG uses $\hat{g}^s = \hat{\nabla}_{\mathrm{rand}} f(x_0^s, u_0^s)$
  • Each inner-loop iteration computes
    $$v_t^s = \frac{1}{|B|} \sum_{i \in B} \Big( \hat{\nabla}_{\mathrm{rand}} f_i(x_t^s;\, u_{i,t}^s) - \hat{\nabla}_{\mathrm{rand}} f_i(x_0^s;\, u_{i,t}^s) \Big) + \hat{g}^s$$
    ◮ ZO-SVRG instead uses $u_t^s$ and $u_0^s$ in place of the shared $u_{i,t}^s$
  • $\hat{\nabla}_{\mathrm{coord}} f(\cdot)$: coordinate-wise gradient estimator (a sketch of both estimators follows this list)
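A minimal NumPy sketch of the two estimators used above; the two-sided coordinate differences and unit-sphere smoothing directions are illustrative assumptions rather than the paper's exact specification.

```python
import numpy as np

def coord_estimator(f, x, beta):
    """Coordinate-wise gradient estimate: a finite difference along each basis
    vector e_j.  Costs 2d function queries; two-sided form is an illustrative choice."""
    d = x.shape[0]
    g = np.zeros(d)
    for j in range(d):
        e = np.zeros(d)
        e[j] = beta
        g[j] = (f(x + e) - f(x - e)) / (2.0 * beta)
    return g

def coord_rand_direction(f_batch, x_t, x_0, g_hat, beta, rng):
    """Inner-loop direction v_t of ZO-SVRG-Coord-Rand: each component draws one
    fresh u_{i,t} that is REUSED at both x_t and x_0, so the difference of the
    two estimates has small variance (ZO-SVRG draws separate vectors instead)."""
    d = x_t.shape[0]
    v = np.zeros(d)
    for f_i in f_batch:
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)          # unit-sphere direction (illustrative choice)
        est_t = (d / beta) * (f_i(x_t + beta * u) - f_i(x_t)) * u
        est_0 = (d / beta) * (f_i(x_0 + beta * u) - f_i(x_0)) * u
        v += est_t - est_0
    return v / len(f_batch) + g_hat
```

The design point the slide is making: the accurate (but expensive) coordinate estimator is used only at outer-loop snapshots, while the cheap two-point estimator handles inner-loop corrections.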

Algorithms              Convergence rate       Function query complexity
ZO-SGD                  $O(\sqrt{d/T})$        $O(d\epsilon^{-2})$
ZO-SVRG                 $O(d/T + 1/|B|)$       $O(d\epsilon^{-2} + n\epsilon^{-1})$
ZO-SVRG-Coord-Rand      $O(1/T)$               $O\big(\min\{d\epsilon^{-5/3},\, dn^{2/3}\epsilon^{-1}\}\big)$

SLIDE 7

Sharp Analysis for ZO-SVRG-Coord (Liu et al., 2018)

ZO-SVRG-Coord (Liu et al., 2018)

  • Each outer-loop iteration estimates the gradient by $\hat{g}^s = \hat{\nabla}_{\mathrm{coord}} f_{\mathcal{S}}(x_0^s)$
  • Each inner-loop iteration computes
    $$v_t^s = \frac{1}{|B|} \sum_{i \in B} \Big( \hat{\nabla}_{\mathrm{coord}} f_i(x_t^s;\, u_{i,t}^s) - \hat{\nabla}_{\mathrm{coord}} f_i(x_0^s;\, u_{i,t}^s) \Big) + \hat{g}^s$$

Algorithms                          Stepsize    Convergence rate    Function query complexity
ZO-SVRG-Coord (Liu et al., 2018)    $O(1/d)$    $O(d/T)$            $O\big(dn + \frac{d^2}{\epsilon} + \frac{dn}{\epsilon}\big)$
ZO-SVRG-Coord (our analysis)        $O(1)$      $O(1/T)$            $O\big(\min\{\frac{d}{\epsilon^{5/3}},\, \frac{dn^{2/3}}{\epsilon}\}\big)$

  • Key idea: coordinate-wise gradient estimator → high estimation accuracy → faster rate (a rough sanity check follows)
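A rough sanity check of the table, ignoring constants and the paper's epoch/mini-batch tuning (a sketch, not the actual derivation): setting each convergence rate equal to $\epsilon$ gives the required iteration count,

$$O\Big(\frac{d}{T}\Big) \le \epsilon \;\Rightarrow\; T = O\Big(\frac{d}{\epsilon}\Big), \qquad O\Big(\frac{1}{T}\Big) \le \epsilon \;\Rightarrow\; T = O\Big(\frac{1}{\epsilon}\Big).$$

The sharper analysis thus removes a factor of $d$ from the iteration count, and the $O(1)$ stepsize (vs. $O(1/d)$) is what makes the faster rate attainable.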

SLIDE 8

More Results

  • Develop a faster zeroth-order SPIDER-type algorithm
  • Develop improved zeroth-order algorithms for
    ◮ nonconvex nonsmooth optimization
    ◮ convex smooth optimization
    ◮ optimization under the Polyak-Łojasiewicz (PL) condition

  • Experiments:

[Figure: Loss vs. number of iterations and number of function queries for ZO-SGD, ZO-SVRG-Ave, SPIDER-SZO, ZO-SVRG-Coord, ZO-SVRG-Coord-Rand, and ZO-SPIDER-Coord, on generating black-box adversarial examples for DNNs]


SLIDE 9

Thanks!
