
SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator


  1. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator
     Cong Fang, Chris Junchi Li, Zhouchen Lin, Tong Zhang

  2. Problem

     We consider the following non-convex problem:

     $$\min_{x \in \mathbb{R}^d} \; f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x) \qquad (**)$$

     Study both the finite-sum case ($n$ is finite) and the online case ($n$ is $\infty$).

     • $\epsilon$-approximate first-order stationary point, or simply an FSP, if
       $$\|\nabla f(x)\| \le \epsilon \qquad (0.1)$$
     • $(\epsilon, \delta)$-approximate second-order stationary point, or simply an SSP, if
       $$\|\nabla f(x)\| \le \epsilon, \quad \lambda_{\min}\bigl(\nabla^2 f(x)\bigr) \ge -O(\sqrt{\epsilon}) \qquad (0.2)$$

     [Figure: three surface plots over axes x, y, z, titled "Local Minimizer", "Conspicuous Saddle", and "SSP".]
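
The two conditions (0.1) and (0.2) can be checked numerically once a gradient and a Hessian are available. The sketch below is illustrative only: the helper names are hypothetical, it handles 2×2 symmetric Hessians via the closed-form smallest eigenvalue, and it assumes the constant hidden in $O(\sqrt{\epsilon})$ is 1, which the slides do not specify.

```python
import math

def grad_norm(g):
    # Euclidean norm of a gradient given as a sequence of floats.
    return math.sqrt(sum(gi * gi for gi in g))

def min_eig_sym2(h):
    # Smallest eigenvalue of a symmetric 2x2 matrix ((a, b), (b, d)).
    (a, b), (_, d) = h
    return 0.5 * (a + d) - math.sqrt((0.5 * (a - d)) ** 2 + b * b)

def is_fsp(g, eps):
    # Condition (0.1): ||grad f(x)|| <= eps.
    return grad_norm(g) <= eps

def is_ssp(g, h, eps, delta=None):
    # Condition (0.2) with delta = sqrt(eps) by default (assumed constant 1).
    if delta is None:
        delta = math.sqrt(eps)
    return is_fsp(g, eps) and min_eig_sym2(h) >= -delta

# f(x, y) = x^2 - y^2 at the origin: zero gradient, Hessian diag(2, -2).
saddle_grad, saddle_hess = (0.0, 0.0), ((2.0, 0.0), (0.0, -2.0))
# f(x, y) = x^2 + y^2 at the origin: zero gradient, Hessian diag(2, 2).
bowl_grad, bowl_hess = (0.0, 0.0), ((2.0, 0.0), (0.0, 2.0))

saddle_is_fsp = is_fsp(saddle_grad, 0.01)
saddle_is_ssp = is_ssp(saddle_grad, saddle_hess, 0.01)
bowl_is_ssp = is_ssp(bowl_grad, bowl_hess, 0.01)
```

The saddle at the origin satisfies the first-order condition but fails the second-order one, which is the distinction the "Conspicuous Saddle" panel illustrates.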

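  3. Comparison of Existing Methods

     | Category | Algorithm | Online | Finite-Sum |
     |---|---|---|---|
     | First-order Stationary Point | GD / SGD (Nesterov, 2004) | $\epsilon^{-4}$ | $n\epsilon^{-2}$ |
     | First-order Stationary Point | SVRG / SCSG (Allen-Zhu, Hazan, 2016; Reddi et al., 2016; Lei et al., 2017) | $\epsilon^{-3.333}$ | $n + n^{2/3}\epsilon^{-2}$ |
     | First-order Stationary Point | SNVRG (Zhou et al., 2018) | $\epsilon^{-3}$ | $n + n^{1/2}\epsilon^{-2}$ |
     | First-order Stationary Point | Spider-SFO (this work) | $\epsilon^{-3}$ | $n + n^{1/2}\epsilon^{-2}$ Δ |
     | Second-order Stationary Point (Hessian-Lipschitz Required) | Perturbed GD / SGD (Ge et al., 2015; Jin et al., 2017b) | $\mathrm{poly}(d)\,\epsilon^{-4}$ | $n\epsilon^{-2}$ |
     | Second-order Stationary Point (Hessian-Lipschitz Required) | Neon+GD / Neon+SGD (Xu et al., 2017; Allen-Zhu, Li, 2017b) | $\epsilon^{-4}$ | $n\epsilon^{-2}$ |
     | Second-order Stationary Point (Hessian-Lipschitz Required) | AGD (Jin et al., 2017b) | N/A | $n\epsilon^{-1.75}$ |
     | Second-order Stationary Point (Hessian-Lipschitz Required) | Neon+SVRG / Neon+SCSG (Allen-Zhu, Hazan, 2016; Reddi et al., 2016; Lei et al., 2017) | $\epsilon^{-3.5}$ ($\epsilon^{-3.333}$) | $n\epsilon^{-1.5} + n^{2/3}\epsilon^{-2}$ |
     | Second-order Stationary Point (Hessian-Lipschitz Required) | Neon+FastCubic/CDHS (Agarwal et al., 2017; Carmon et al., 2016; Tripuraneni et al., 2017) | $\epsilon^{-3.5}$ | $n\epsilon^{-1.5} + n^{3/4}\epsilon^{-1.75}$ |
     | Second-order Stationary Point (Hessian-Lipschitz Required) | Neon+Natasha2 (Allen-Zhu, 2017; Xu et al., 2017; Allen-Zhu, Li, 2015) | $\epsilon^{-3.5}$ ($\epsilon^{-3.25}$) | $n\epsilon^{-1.5} + n^{2/3}\epsilon^{-2}$ |
     | Second-order Stationary Point (Hessian-Lipschitz Required) | Spider-SFO+ (this work) | $\epsilon^{-3}$ | $n^{1/2}\epsilon^{-2}$ ($n \ge \epsilon^{-1}$) |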
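
To get a feel for the finite-sum column, the rates for the first-order methods can be evaluated at a concrete problem size. This is only a back-of-the-envelope sketch: the function names are hypothetical, and all constants, logarithmic factors, and the Δ marker from the table are ignored, so the numbers indicate orders of magnitude rather than actual gradient counts.

```python
def cost_gd(n, eps):
    # GD / SGD finite-sum rate: n * eps^-2.
    return n * eps ** -2

def cost_svrg(n, eps):
    # SVRG / SCSG finite-sum rate: n + n^(2/3) * eps^-2.
    return n + n ** (2.0 / 3.0) * eps ** -2

def cost_spider(n, eps):
    # Spider-SFO finite-sum rate: n + n^(1/2) * eps^-2.
    return n + n ** 0.5 * eps ** -2

n, eps = 10 ** 6, 1e-2  # illustrative values, not taken from the slides
costs = {
    "GD": cost_gd(n, eps),
    "SVRG/SCSG": cost_svrg(n, eps),
    "Spider-SFO": cost_spider(n, eps),
}
for name, value in costs.items():
    print(f"{name:>10}: {value:.3g}")
```

With these values the gap between consecutive rows is roughly a factor of $n^{1/3}$ and then $n^{1/6}$ in the dominant term, matching the exponents in the table.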

  4. Example: Algorithm for Searching FSP in Expectation

     Algorithm 1 Spider-SFO in Expectation
     Input: $x_0$, $q$, $S_1$, $S_2$, $n_0$, $\epsilon$ (for finding an FSP)
     for $k = 0$ to $K$ do
         if $\mathrm{mod}(k, q) = 0$ then
             Draw $S_1$ samples (or compute the full gradient for the finite-sum case), and let $v_k = \nabla f_{S_1}(x_k)$
         else
             Draw $S_2$ samples, and let $v_k = \nabla f_{S_2}(x_k) - \nabla f_{S_2}(x_{k-1}) + v_{k-1}$
         end if
         $x_{k+1} = x_k - \eta_k v_k$, where $\eta_k = \min\left( \frac{\epsilon}{L n_0 \|v_k\|}, \frac{1}{2 L n_0} \right)$
     end for
     Return $\tilde{x}$ chosen uniformly at random from $\{x_k\}_{k=0}^{K-1}$

     • We prove that the stochastic gradient cost to find an approximate FSP is both upper and lower bounded by $O(n^{1/2} \epsilon^{-2})$ under certain conditions.
     • A similar complexity has also been obtained by Zhou et al. (2018).
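
Algorithm 1 can be sketched in a few lines of standard-library Python for the finite-sum case. Everything specific here is an assumption made for illustration and is not taken from the slides: the toy non-convex components $f_i(x) = a_i \sum_j x_j^2 / (1 + x_j^2)$, sampling with replacement, the smoothness bound `L=3.0`, and the parameter values in the usage lines. The loop runs over $k = 0, \ldots, K-1$ so that the returned point is drawn from $\{x_k\}_{k=0}^{K-1}$.

```python
import math
import random

def component_grad(a_i, x):
    # Gradient of the toy component f_i(x) = a_i * sum_j x_j^2 / (1 + x_j^2).
    return [a_i * 2.0 * xj / (1.0 + xj * xj) ** 2 for xj in x]

def batch_grad(a, idx, x):
    # Average of component gradients over the indices in idx.
    g = [0.0] * len(x)
    for i in idx:
        g = [gj + cj for gj, cj in zip(g, component_grad(a[i], x))]
    return [gj / len(idx) for gj in g]

def norm(v):
    return math.sqrt(sum(vj * vj for vj in v))

def spider_sfo(a, x0, K, q, S2, eps, L, n0=1.0, seed=0):
    rng = random.Random(seed)
    n = len(a)
    x, x_prev, v = list(x0), None, None
    iterates = []
    for k in range(K):
        iterates.append(list(x))
        if k % q == 0:
            # Finite-sum case: reset the estimator with the full gradient.
            v = batch_grad(a, range(n), x)
        else:
            # Same minibatch at x_k and x_{k-1}; add the difference to v_{k-1}.
            idx = rng.choices(range(n), k=S2)
            g_new = batch_grad(a, idx, x)
            g_old = batch_grad(a, idx, x_prev)
            v = [gn - go + vj for gn, go, vj in zip(g_new, g_old, v)]
        # eta_k = min(eps / (L n0 ||v_k||), 1 / (2 L n0)).
        eta = 1.0 / (2.0 * L * n0)
        v_norm = norm(v)
        if v_norm > 0.0:
            eta = min(eps / (L * n0 * v_norm), eta)
        x_prev = x
        x = [xj - eta * vj for xj, vj in zip(x, v)]
    # Return a uniformly random iterate, plus the path for inspection.
    return rng.choice(iterates), iterates

data_rng = random.Random(1)
a = [data_rng.uniform(0.8, 1.2) for _ in range(200)]
x_tilde, iterates = spider_sfo(a, [1.0, -0.5], K=300, q=10, S2=10, eps=0.05, L=3.0)
grad_start = norm(batch_grad(a, range(len(a)), iterates[0]))
grad_end = norm(batch_grad(a, range(len(a)), iterates[-1]))
```

The normalized step keeps $\|x_{k+1} - x_k\| \le \epsilon / (L n_0)$, so consecutive iterates stay close and the minibatch gradient differences remain small between full-gradient resets.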

  5. Stochastic Path-Integrated Differential Estimator: Core Idea

     Observe a sequence $\hat{x}_{0:K} = \{\hat{x}_0, \ldots, \hat{x}_K\}$; the goal is to dynamically track a quantity $Q(x)$, that is, $Q(\hat{x}_k)$ for $k = 0, 1, \ldots, K$.

     • Initial estimate $\tilde{Q}(\hat{x}_0) \approx Q(\hat{x}_0)$
     • Unbiased estimate $\xi_k(\hat{x}_{0:k})$ of $Q(\hat{x}_k) - Q(\hat{x}_{k-1})$ such that for each $k = 1, \ldots, K$,
       $$\mathbb{E}\bigl[\xi_k(\hat{x}_{0:k}) \mid \hat{x}_{0:k}\bigr] = Q(\hat{x}_k) - Q(\hat{x}_{k-1})$$
     • Integrate the stochastic differential estimate as
       $$\tilde{Q}(\hat{x}_{0:K}) := \tilde{Q}(\hat{x}_0) + \sum_{k=1}^{K} \xi_k(\hat{x}_{0:k}) \qquad (0.3)$$
     • Call the estimator $\tilde{Q}(\hat{x}_{0:K})$ the Stochastic Path-Integrated Differential EstimatoR, or Spider for brevity
     • Example: $Q(x)$ is picked as $\nabla f(x)$ (or $f(x)$)

     A similar idea, named SARAH, has been proposed by Nguyen et al. (2017).
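
A short telescoping argument shows why integrating the increments tracks $Q$ along the path. This is a sketch that uses only the unbiasedness condition stated on the slide; the error symbol $e_K$ is notation introduced here for illustration.

```latex
\begin{align*}
e_K &:= \tilde{Q}(\hat{x}_{0:K}) - Q(\hat{x}_K) \\
    &= \bigl(\tilde{Q}(\hat{x}_0) - Q(\hat{x}_0)\bigr)
       + \sum_{k=1}^{K} \Bigl( \xi_k(\hat{x}_{0:k})
       - \bigl( Q(\hat{x}_k) - Q(\hat{x}_{k-1}) \bigr) \Bigr), \\
\mathbb{E}[e_K] &= \mathbb{E}\bigl[\tilde{Q}(\hat{x}_0) - Q(\hat{x}_0)\bigr].
\end{align*}
```

The second line writes $Q(\hat{x}_K)$ as $Q(\hat{x}_0)$ plus its telescoping increments and subtracts it from (0.3). Each summand has zero conditional mean given $\hat{x}_{0:k}$, so by the tower property the increments add no bias beyond that of the initial estimate. This says nothing about variance, which has to be controlled separately.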

  6. Summary and Extension

     Summary:
     (i) Proposed the Spider technique for tracking:
         • Avoidance of excessive access of oracles and reduction of time complexity
         • Potential application in many stochastic estimation problems
     (ii) Proposed Spider-SFO algorithms for first-order non-convex optimization:
         • Achieves an $\tilde{O}(\epsilon^{-3})$ rate for finding an $\epsilon$-FSP in expectation
         • Proved that Spider-SFO matches the lower bound in the finite-sum case (Carmon et al., 2017)

     Extension in the long version: https://arxiv.org/pdf/1807.01695.pdf
     (i) Obtained high-probability results for Spider-SFO
     (ii) Proposed Spider-SFO+ algorithms for first-order non-convex optimization:
         • Achieves an $\tilde{O}(\epsilon^{-3})$ rate for finding an $(\epsilon, O(\sqrt{\epsilon}))$-SSP
     (iii) Proposed the Spider-SZO algorithm for zeroth-order non-convex optimization:
         • Achieves an improved rate of $O(d\,\epsilon^{-3})$

  7. Thank you! Welcome to Poster #49 in Room 210 & 230 AB today!
