Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization (PowerPoint PPT Presentation)

Thanh Huy Nguyen, Umut Şimşekli, Gaël Richard
LTCI, Télécom Paris, Institut Polytechnique de Paris, France


  1. Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization. Thanh Huy Nguyen, Umut Şimşekli, Gaël Richard. LTCI, Télécom Paris, Institut Polytechnique de Paris, France.

  2. Introduction. Non-convex optimization problem: min f(x)

  3. Introduction. Non-convex optimization problem: min f(x). Fractional Langevin Algorithm (FLA) (Şimşekli, 2017):
W_{k+1} = W_k − η c_α ∇f(W_k) + (η/β)^{1/α} ΔL^α_{k+1}
− {ΔL^α_k}_{k ∈ N+}: α-stable random variables
− α ∈ (1, 2]: the characteristic index; c_α: a known constant

  4. Introduction. Non-convex optimization problem: min f(x). Fractional Langevin Algorithm (FLA) (Şimşekli, 2017):
W_{k+1} = W_k − η c_α ∇f(W_k) + (η/β)^{1/α} ΔL^α_{k+1}
− {ΔL^α_k}_{k ∈ N+}: α-stable random variables
− α ∈ (1, 2]: the characteristic index; c_α: a known constant
[Figure: α-stable densities (left) and sample paths of α-stable Lévy motion (right) for α = 1.2, 1.6, 2.0]
Generalizes Stochastic Gradient Langevin Dynamics (α = 2) (Welling and Teh, 2011)
Strong links with SGD for deep neural networks (Şimşekli et al., 2019)
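The α-stable increments ΔL^α_k can be simulated without any special library via the Chambers-Mallows-Stuck method. The NumPy sketch below (an illustration, not code from the presentation; symmetric case, unit scale) shows how the tails get heavier as α decreases below 2:

```python
import numpy as np

def sym_alpha_stable(alpha, size, rng):
    """Chambers-Mallows-Stuck sampler for symmetric alpha-stable variables.

    For alpha = 2 this reduces to N(0, 2); smaller alpha gives heavier tails.
    """
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angular variable
    w = rng.exponential(1.0, size)                 # unit exponential variable
    return (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
            * (np.cos(u - alpha * u) / w) ** ((1 - alpha) / alpha))

rng = np.random.default_rng(0)
light = sym_alpha_stable(2.0, 100_000, rng)   # Gaussian case
heavy = sym_alpha_stable(1.2, 100_000, rng)   # heavy-tailed case
# Fraction of samples beyond a fixed threshold: far larger when alpha < 2.
print(np.mean(np.abs(light) > 10), np.mean(np.abs(heavy) > 10))
```

This matches the qualitative picture in the slide's figure: the α = 1.2 law has polynomial tails, so its sample paths make occasional large jumps, while α = 2 recovers Brownian motion.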

  5. Introduction. Non-convex optimization problem: min f(x). Fractional Langevin Algorithm (FLA) (Şimşekli, 2017):
W_{k+1} = W_k − η c_α ∇f(W_k) + (η/β)^{1/α} ΔL^α_{k+1}
− {ΔL^α_k}_{k ∈ N+}: α-stable random variables
− α ∈ (1, 2]: the characteristic index; c_α: a known constant
[Figure: α-stable densities (left) and sample paths of α-stable Lévy motion (right) for α = 1.2, 1.6, 2.0]
Generalizes Stochastic Gradient Langevin Dynamics (α = 2) (Welling and Teh, 2011)
Strong links with SGD for deep neural networks (Şimşekli et al., 2019)
Our goal: analyze E[f(W_k)] − f⋆, where f⋆ ≜ min f(x)
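The FLA recursion above can be sketched in a few lines of NumPy. The toy double-well objective f(w) = (w² − 1)², the step size, the temperature, and the choice c_α = 1 are all illustrative assumptions rather than values from the slides, and the rare very large stable increments are truncated purely for numerical robustness of the sketch:

```python
import numpy as np

def grad_f(w):
    # Gradient of the toy double-well objective f(w) = (w**2 - 1)**2.
    return 4.0 * w * (w ** 2 - 1.0)

def fla(w0, eta, beta, alpha, n_iter, rng):
    """Fractional Langevin Algorithm sketch:
    W_{k+1} = W_k - eta * c_alpha * grad f(W_k) + (eta/beta)**(1/alpha) * dL,
    with dL a symmetric alpha-stable increment and c_alpha set to 1 here."""
    w = w0
    best = (w0 ** 2 - 1.0) ** 2
    for _ in range(n_iter):
        # Chambers-Mallows-Stuck draw of one symmetric alpha-stable variable.
        u = rng.uniform(-np.pi / 2, np.pi / 2)
        e = rng.exponential(1.0)
        dl = (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
              * (np.cos(u - alpha * u) / e) ** ((1 - alpha) / alpha))
        dl = np.clip(dl, -10.0, 10.0)  # truncate rare huge jumps (sketch only)
        w = w - eta * grad_f(w) + (eta / beta) ** (1 / alpha) * dl
        best = min(best, (w ** 2 - 1.0) ** 2)
    return w, best

rng = np.random.default_rng(1)
w_final, f_best = fla(w0=2.0, eta=0.01, beta=10.0, alpha=1.8, n_iter=5000, rng=rng)
print(f_best)  # best objective value seen along the trajectory
```

Starting from w = 2, the drift pulls the iterate toward a minimizer at w = ±1 while the heavy-tailed noise lets it jump between basins, which is exactly the exploration behavior that distinguishes α < 2 from the Gaussian (SGLD) case.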

  6. Method of Analysis. Define three stochastic processes:
dX_1(t) = −c_α ∇f(X_1(t−)) dt + β^{−1/α} dL^α(t)
dX_2(t) = −c_α Σ_{j=0}^∞ ∇f(X_2(jη)) 1_{[jη, (j+1)η)}(t) dt + β^{−1/α} dL^α(t)
dX_3(t) = −D^{α−2}[φ(X_3(t−)) ∂f(X_3(t−))/∂x_i] / φ(X_3(t−)) dt + β^{−1/α} dL^α(t)  (componentwise in x_i)

  7. Method of Analysis. Define three stochastic processes:
dX_1(t) = −c_α ∇f(X_1(t−)) dt + β^{−1/α} dL^α(t)
dX_2(t) = −c_α Σ_{j=0}^∞ ∇f(X_2(jη)) 1_{[jη, (j+1)η)}(t) dt + β^{−1/α} dL^α(t)
dX_3(t) = −D^{α−2}[φ(X_3(t−)) ∂f(X_3(t−))/∂x_i] / φ(X_3(t−)) dt + β^{−1/α} dL^α(t)  (componentwise in x_i)
− D: Riesz fractional (directional) derivative
− X_1 is the continuous-time limit of the FLA algorithm
− X_2 is a linearly interpolated version of W_k: X_2(kη) = W_k for all k ∈ N+
− X_3 admits π ∝ exp(−β f(x)) dx as its unique invariant distribution

  8. Method of Analysis. Define three stochastic processes:
dX_1(t) = −c_α ∇f(X_1(t−)) dt + β^{−1/α} dL^α(t)
dX_2(t) = −c_α Σ_{j=0}^∞ ∇f(X_2(jη)) 1_{[jη, (j+1)η)}(t) dt + β^{−1/α} dL^α(t)
dX_3(t) = −D^{α−2}[φ(X_3(t−)) ∂f(X_3(t−))/∂x_i] / φ(X_3(t−)) dt + β^{−1/α} dL^α(t)  (componentwise in x_i)
− D: Riesz fractional (directional) derivative
− X_1 is the continuous-time limit of the FLA algorithm
− X_2 is a linearly interpolated version of W_k: X_2(kη) = W_k for all k ∈ N+
− X_3 admits π ∝ exp(−β f(x)) dx as its unique invariant distribution
Decompose the error E f(W_k) − f⋆ as:
[E f(X_2(kη)) − E f(X_1(kη))] + [E f(X_1(kη)) − E f(X_3(kη))] + [E f(X_3(kη)) − E f(Ŵ)] + [E f(Ŵ) − f⋆]
− Ŵ ∼ π ∝ exp(−β f(x)) dx
− Relate these terms to the Wasserstein distance between the processes
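The last term of the decomposition, E f(Ŵ) − f⋆ with Ŵ ∼ π ∝ exp(−β f(x)) dx, shrinks as the inverse temperature β grows, because the Gibbs measure concentrates on the minimizers of f. A small quadrature sketch on a 1-D double-well (the objective and grid are illustrative assumptions, not from the slides) makes this concrete:

```python
import numpy as np

f = lambda x: (x ** 2 - 1.0) ** 2   # toy objective with global minimum f* = 0

def gibbs_expectation(beta, grid):
    """E f(W_hat) for W_hat ~ pi, with pi proportional to exp(-beta * f),
    approximated by quadrature on a fixed grid."""
    w = np.exp(-beta * f(grid))      # unnormalized Gibbs weights
    return np.sum(w * f(grid)) / np.sum(w)

grid = np.linspace(-3.0, 3.0, 20001)
e1, e10, e100 = (gibbs_expectation(b, grid) for b in (1.0, 10.0, 100.0))
print(e1, e10, e100)  # decreasing toward f* = 0 as beta grows
```

The monotone decrease holds in general: the derivative of E_π[f] in β equals minus the variance of f under π, so raising β always tightens this term of the bound.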

  9. Main Result. Main assumptions:
1) Hölder continuous gradients: c_α ‖∇f(x) − ∇f(y)‖ ≤ M ‖x − y‖^γ
2) Dissipativity: c_α ⟨x, ∇f(x)⟩ ≥ m ‖x‖^{1+γ} − b
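For intuition, the dissipativity assumption can be checked numerically on a concrete instance. Below, the double-well f(x) = (x² − 1)² with γ = 1, c_α = 1, m = 1, b = 2 (all illustrative choices, not constants from the paper) satisfies ⟨x, ∇f(x)⟩ ≥ m‖x‖² − b on a grid:

```python
import numpy as np

# Toy instance: f(x) = (x**2 - 1)**2, grad f(x) = 4x(x**2 - 1).
# Dissipativity with gamma = 1:  <x, grad f(x)> >= m * |x|**2 - b.
x = np.linspace(-10.0, 10.0, 100001)
inner = x * 4.0 * x * (x ** 2 - 1.0)   # <x, grad f(x)> = 4 x^2 (x^2 - 1)
m, b = 1.0, 2.0                        # illustrative constants
slack = inner - (m * x ** 2 - b)       # should be nonnegative everywhere
print(slack.min())
```

Here the slack is the polynomial 4x⁴ − 5x² + 2, whose discriminant in x² is negative, so the inequality in fact holds for every real x, not just on the grid; the condition says the drift pushes the iterate back toward the origin at large ‖x‖.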

  10. Main Result. Main assumptions:
1) Hölder continuous gradients: c_α ‖∇f(x) − ∇f(y)‖ ≤ M ‖x − y‖^γ
2) Dissipativity: c_α ⟨x, ∇f(x)⟩ ≥ m ‖x‖^{1+γ} − b
Theorem. For 0 < η < m/M², there exists C > 0 such that:
E[f(W_k)] − f⋆ ≤ C ( k^{1+max{1/q, γ+γ/q}} η^{(γq+γ)/(αq)} + k^{1+max{1/q, γ+γ/q}} η^{((q−1)γ)/(αq)} )
  + M c_α^{−1} (b + d/β)^{γ+1} / ((1+γ) m) + (b + d/β) exp(−λ⋆ kη)
  + (d/(2β)) log( 2e (b + d/β) Γ(d/2 + 1)^{2/d} β / (dm) )

  11. Main Result. Main assumptions:
1) Hölder continuous gradients: c_α ‖∇f(x) − ∇f(y)‖ ≤ M ‖x − y‖^γ
2) Dissipativity: c_α ⟨x, ∇f(x)⟩ ≥ m ‖x‖^{1+γ} − b
Theorem. For 0 < η < m/M², there exists C > 0 such that:
E[f(W_k)] − f⋆ ≤ C ( k^{1+max{1/q, γ+γ/q}} η^{(γq+γ)/(αq)} + k^{1+max{1/q, γ+γ/q}} η^{((q−1)γ)/(αq)} )
  + M c_α^{−1} (b + d/β)^{γ+1} / ((1+γ) m) + (b + d/β) exp(−λ⋆ kη)
  + (d/(2β)) log( 2e (b + d/β) Γ(d/2 + 1)^{2/d} β / (dm) )
− Worse dependence on η and k than in the case α = 2
− Requires a smaller step size η

  12. Additional Results. Posterior sampling: sampling from π ∝ exp(−β f(x)) dx
Stochastic gradients: f(x) ≜ (1/n) Σ_{i=1}^n f^{(i)}(x), ∇f ≈ ∇f_k(x) ≜ Σ_{i∈Ω_k} ∇f^{(i)}(x) / n_s
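The stochastic-gradient variant replaces the full gradient with a minibatch average over a random index set Ω_k of size n_s. A NumPy sketch (the least-squares components f^{(i)} and all sizes are illustrative assumptions) shows that the estimator is unbiased for ∇f:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, n_s = 1000, 5, 50                 # data size, dimension, batch size
A = rng.normal(size=(n, d))
y = rng.normal(size=n)

def grad_full(x):
    # Gradient of f(x) = (1/n) * sum_i 0.5 * (a_i . x - y_i)**2.
    return A.T @ (A @ x - y) / n

def grad_minibatch(x, rng):
    # Average of grad f^{(i)} over a random index set Omega_k of size n_s.
    idx = rng.choice(n, size=n_s, replace=False)
    return A[idx].T @ (A[idx] @ x - y[idx]) / n_s

x = np.ones(d)
g_hat = np.mean([grad_minibatch(x, rng) for _ in range(2000)], axis=0)
print(np.linalg.norm(g_hat - grad_full(x)))  # small: the estimator is unbiased
```

Averaging many independent minibatch gradients recovers the full gradient, which is why the analysis extends to this setting with an extra variance term.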
