Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization (ICML 2019 presentation)


  1. Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization. Kaiyi Ji (1), Zhe Wang (1), Yi Zhou (2), Yingbin Liang (1). (1) Ohio State University, (2) Duke University. ICML 2019.

  2. Zeroth-order (Gradient-free) Nonconvex Optimization

  • Problem formulation:
    $$\min_{x \in \mathbb{R}^d} f(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x)$$
  ◮ f_i(·): individual nonconvex loss function
  ◮ The gradient of f_i(·) is unknown
  ◮ Only the function value of f_i(·) is accessible
  ◮ Examples: generating black-box adversarial samples, parameter optimization for black-box systems, action exploration in reinforcement learning
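To make the setting concrete, here is a minimal Python sketch of such a finite-sum black-box objective. The tanh-based loss, the synthetic data (A, b), and all names are illustrative assumptions, not from the paper; the point is only that the algorithms below may evaluate f_i but never differentiate it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 20
A = rng.normal(size=(n, d))   # synthetic data, illustrative only
b = rng.normal(size=n)

def f_i(i, x):
    """Individual nonconvex loss; treated as a black box (values only)."""
    return (np.tanh(A[i] @ x) - b[i]) ** 2

def f(x):
    """Full objective f(x) = (1/n) * sum_i f_i(x)."""
    return np.mean([f_i(i, x) for i in range(n)])
```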

  3. Zeroth-order (Gradient-free) Nonconvex Optimization

    $$\min_{x \in \mathbb{R}^d} f(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x)$$

  • Standard assumptions on f(·):
  ◮ f(·) is bounded below, i.e., $f^* = \inf_{x \in \mathbb{R}^d} f(x) > -\infty$
  ◮ Each f_i(·) is L-smooth, i.e., $\|\nabla f_i(x) - \nabla f_i(y)\| \le L \|x - y\|$
  ◮ (Online case) ∇f_i(·) has bounded variance, i.e., there exists σ > 0 such that
    $$\frac{1}{n} \sum_{i=1}^{n} \|\nabla f_i(x) - \nabla f(x)\|^2 \le \sigma^2$$
  • Optimization goal: find an ε-accurate stationary point, i.e., $\mathbb{E}\|\nabla f(x)\|^2 \le \epsilon$

  4. Existing Zeroth-Order SVRG

  ZO-SVRG (Liu et al., 2018)
  • Each outer-loop iteration estimates the full gradient by $\hat{g}^s = \hat{\nabla}_{\text{rand}} f(x_0^s, u_0^s)$
  • Each inner-loop iteration computes
    $$\hat{v}_t^s = \frac{1}{|B|} \sum_{i \in B} \left[ \hat{\nabla}_{\text{rand}} f_i(x_t^s; u_t^s) - \hat{\nabla}_{\text{rand}} f_i(x_0^s; u_0^s) \right] + \hat{g}^s$$
  • Two-point gradient estimator:
    $$\hat{\nabla}_{\text{rand}} f_i(x_t^s, u_t^s) = \frac{d}{\beta} \big( f_i(x_t^s + \beta u_t^s) - f_i(x_t^s) \big) u_t^s$$
  ◮ u_t^s: smoothing vector; β: smoothing parameter
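A minimal sketch of this two-point estimator follows. Drawing u uniformly from the unit sphere is one common choice; that distribution and the function names are assumptions here, not fixed by the slide.

```python
import numpy as np

def zo_grad_rand(f_i, x, beta, rng):
    """Two-point random-direction estimator from the slide:
    (d / beta) * (f_i(x + beta * u) - f_i(x)) * u."""
    d = x.shape[0]
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)   # assumed: u uniform on the unit sphere
    return (d / beta) * (f_i(x + beta * u) - f_i(x)) * u

# Illustrative usage with the black-box loss sketched earlier:
# g = zo_grad_rand(lambda y: f_i(3, y), x, beta=1e-3, rng=rng)
```

Note that one call costs only two function queries, independent of the dimension d; the price is that the estimate has variance growing with d.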

  5. Existing Zeroth-Order SVRG (continued)

  Algorithm | Convergence rate | # of function queries
  ZO-SGD    | O(√(d/T))        | O(d ε^{-2})
  ZO-SVRG   | O(d/T + 1/|B|)   | O(d ε^{-2} + n ε^{-1})

  ◮ Issue: ZO-SVRG has a worse query complexity than ZO-SGD
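For intuition, here is a hedged sketch of one ZO-SVRG epoch assembled from the pieces above. The step size eta, inner-loop length m, and batch size are illustrative placeholders, and the sphere-uniform directions are an assumed choice; this is a reading of the slide's recursion, not the paper's exact pseudocode.

```python
import numpy as np

def zo_svrg_epoch(f_list, x0, eta, beta, m, batch, rng):
    """One outer iteration (epoch) of ZO-SVRG as on the slide: estimate
    g_hat at the snapshot x0 with direction u0, then take m
    variance-reduced inner steps."""
    n, d = len(f_list), x0.shape[0]

    def f_bar(y):
        return float(np.mean([fi(y) for fi in f_list]))

    # Outer loop: g_hat = grad_rand f(x0, u0), shared by all inner steps.
    u0 = rng.normal(size=d)
    u0 /= np.linalg.norm(u0)
    g_hat = (d / beta) * (f_bar(x0 + beta * u0) - f_bar(x0)) * u0

    x = x0.copy()
    for _ in range(m):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                      # fresh u_t^s per step
        B = rng.choice(n, size=batch, replace=False)
        v = g_hat.copy()                            # builds v_t^s
        for i in B:
            gi_t = (d / beta) * (f_list[i](x + beta * u) - f_list[i](x)) * u
            gi_0 = (d / beta) * (f_list[i](x0 + beta * u0) - f_list[i](x0)) * u0
            v += (gi_t - gi_0) / batch
        x = x - eta * v                             # gradient-style update
    return x
```

Here f_list would be the per-sample losses, e.g. [lambda y, i=i: f_i(i, y) for i in range(n)] with the earlier sketch.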

  6. ZO-SVRG-Coord-Rand vs. ZO-SVRG

  ZO-SVRG-Coord-Rand (this paper)
  • Each outer-loop iteration estimates the gradient by $\hat{g}^s = \hat{\nabla}_{\text{coord}} f_S(x_0^s)$
  ◮ As a comparison, ZO-SVRG uses $\hat{g}^s = \hat{\nabla}_{\text{rand}} f(x_0^s, u_0^s)$
  • Each inner-loop iteration computes
    $$\hat{v}_t^s = \frac{1}{|B|} \sum_{i \in B} \left[ \hat{\nabla}_{\text{rand}} f_i(x_t^s; u_{i,t}^s) - \hat{\nabla}_{\text{rand}} f_i(x_0^s; u_{i,t}^s) \right] + \hat{g}^s$$
    (whereas ZO-SVRG uses $u_t^s$ in the first term and $u_0^s$ in the second)
  • $\hat{\nabla}_{\text{coord}} f(\cdot)$: coordinate-wise gradient estimator (sketched after the next slide)

  Algorithm          | Convergence rate | Function query complexity
  ZO-SGD             | O(√(d/T))        | O(d ε^{-2})
  ZO-SVRG            | O(d/T + 1/|B|)   | O(d ε^{-2} + n ε^{-1})
  ZO-SVRG-Coord-Rand | O(1/T)           | O(min{d ε^{-5/3}, d n^{2/3} ε^{-1}})
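The sketch below isolates the inner-loop change relative to ZO-SVRG: each sampled component i draws its own direction u_{i,t} and reuses the same direction at both x_t and x_0, so the two estimates are correlated and their difference has lower variance. Names are illustrative; g_hat would come from the coordinate-wise estimator at the snapshot.

```python
import numpy as np

def zo_svrg_coord_rand_inner(f_list, x, x0, g_hat, beta, batch, rng):
    """Inner-loop estimate v_t^s for ZO-SVRG-Coord-Rand (a sketch)."""
    n, d = len(f_list), x.shape[0]
    B = rng.choice(n, size=batch, replace=False)
    v = g_hat.copy()              # coordinate-wise snapshot estimate
    for i in B:
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)    # u_{i,t}^s: one direction per sampled i
        gi_t = (d / beta) * (f_list[i](x + beta * u) - f_list[i](x)) * u
        gi_0 = (d / beta) * (f_list[i](x0 + beta * u) - f_list[i](x0)) * u
        v += (gi_t - gi_0) / batch
    return v
```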

  7. Sharp Analysis for ZO-SVRG-Coord (Liu et al., 2018)

  ZO-SVRG-Coord (Liu et al., 2018)
  • Each outer-loop iteration estimates the gradient by $\hat{g}^s = \hat{\nabla}_{\text{coord}} f_S(x_0^s)$
  • Each inner-loop iteration computes
    $$\hat{v}_t^s = \frac{1}{|B|} \sum_{i \in B} \left[ \hat{\nabla}_{\text{coord}} f_i(x_t^s) - \hat{\nabla}_{\text{coord}} f_i(x_0^s) \right] + \hat{g}^s$$

  Algorithm                    | Stepsize | Convergence rate | Function query complexity
  ZO-SVRG-Coord                | O(1/d)   | O(d/T)           | O((dn + d^2) ε^{-1})
  ZO-SVRG-Coord (our analysis) | O(1)     | O(1/T)           | O(min{d ε^{-5/3}, d n^{2/3} ε^{-1}})

  Key idea:
  • Coordinate-wise gradient estimator → high accuracy → faster rate
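A minimal sketch of the coordinate-wise estimator referenced above. The central-difference form with smoothing parameter mu is one standard choice (an assumption here); each call costs 2d function queries, which is why it is much more accurate, and more expensive per call, than the two-point random estimator.

```python
import numpy as np

def zo_grad_coord(f_i, x, mu):
    """Coordinate-wise gradient estimator: a finite difference along
    each coordinate direction e_j (2d function queries per call)."""
    d = x.shape[0]
    g = np.zeros(d)
    for j in range(d):
        e = np.zeros(d)
        e[j] = 1.0
        g[j] = (f_i(x + mu * e) - f_i(x - mu * e)) / (2.0 * mu)
    return g
```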

  8. More Results

  • Develop a faster zeroth-order SPIDER-type algorithm (a sketch of its key step follows below)
  • Develop improved zeroth-order algorithms for
  ◮ nonconvex nonsmooth optimization
  ◮ convex smooth optimization
  ◮ optimization under the Polyak-Łojasiewicz (PL) condition
  • Experiments: [Figure: loss vs. number of iterations and vs. number of function queries (×10^5) for ZO-SGD, ZO-SVRG-Ave, SPIDER-SZO, ZO-SVRG-Coord, ZO-SVRG-Coord-Rand, and ZO-SPIDER-Coord, on generating black-box adversarial examples for DNNs]
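For the SPIDER-type algorithm mentioned above, the defining step is a recursive variance-reduced estimate, v_t = (1/|B|) Σ_{i∈B} [ĝ_i(x_t) − ĝ_i(x_{t−1})] + v_{t−1}, with the same minibatch evaluated at both iterates. Below is a hedged sketch of one such step using the coordinate-wise estimator; names, batch handling, and the central-difference form are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def zo_spider_step(f_list, v_prev, x, x_prev, mu, batch, rng):
    """One SPIDER-style recursive update (a sketch): correct the previous
    estimate v_prev with a minibatch difference, reusing the SAME batch B
    at both x and x_prev so the correction tracks the true gradient drift."""
    n, d = len(f_list), x.shape[0]

    def ghat(fi, y):  # coordinate-wise estimator, as on slide 7
        g = np.zeros(d)
        for j in range(d):
            e = np.zeros(d)
            e[j] = 1.0
            g[j] = (fi(y + mu * e) - fi(y - mu * e)) / (2.0 * mu)
        return g

    B = rng.choice(n, size=batch, replace=False)
    v = v_prev.copy()
    for i in B:
        v += (ghat(f_list[i], x) - ghat(f_list[i], x_prev)) / batch
    return v
```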

  9. Thanks!
