Acceleration through Optimistic No-Regret Dynamics
Jun-Kun Wang and Jacob Abernethy, Georgia Tech


  1. Acceleration through Optimistic No-Regret Dynamics. Jun-Kun Wang and Jacob Abernethy, Georgia Tech.

  2. Convex Optimization
     min_{x ∈ X} f(x)   (1)
     Methods: Gradient Descent, the Frank-Wolfe method, Nesterov's accelerated method, Heavy Ball, etc.

  3. Convex Optimization
     min_{x ∈ X} f(x)   (1)
     Methods: Gradient Descent, the Frank-Wolfe method, Nesterov's accelerated method, Heavy Ball, etc.
     L-smooth convex problems: Nesterov's accelerated method attains O(1/T^2).
     L-smooth and µ-strongly convex problems (denote κ := L/µ): Nesterov's accelerated method attains O(exp(−T/√κ)).
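To make the two rates concrete, here is a small numerical sketch (not from the talk) comparing plain gradient descent with Nesterov's 1983 accelerated method on a randomly generated L-smooth quadratic; the matrix, seed, and iteration count are illustrative choices.

```python
import numpy as np

# Compare gradient descent, O(1/T), with Nesterov's accelerated method,
# O(1/T^2), on an L-smooth convex quadratic f(x) = 0.5 x^T A x whose
# minimum value is 0. The problem instance is an illustrative choice.
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 20))
A = M.T @ M / 20.0                       # positive semidefinite
L = np.linalg.eigvalsh(A)[-1]            # smoothness constant = largest eigenvalue

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x0 = rng.standard_normal(20)
T = 1000

# Gradient descent with step 1/L.
x = x0.copy()
for _ in range(T):
    x -= grad(x) / L
f_gd = f(x)

# Nesterov's 1983 method with momentum (t-1)/(t+2).
x, x_prev = x0.copy(), x0.copy()
for t in range(1, T + 1):
    z = x + (t - 1) / (t + 2) * (x - x_prev)  # extrapolation step
    x_prev = x
    x = z - grad(z) / L                       # gradient step at the extrapolated point
f_acc = f(x)

print(f_gd, f_acc)  # the accelerated value is typically orders of magnitude smaller
```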

  4. Online learning (minimizing regret)
     Online learning protocol:
     1: for t = 1 to T do
     2:   Play w_t according to OnlineAlgorithm(ℓ_1(w_1), ..., ℓ_{t−1}(w_{t−1})).
     3:   Receive loss function ℓ_t(·) and suffer loss ℓ_t(w_t).
     4: end for
     Regret_T := Σ_{t=1}^T ℓ_t(w_t) − Σ_{t=1}^T ℓ_t(w*).
     Convex loss functions {ℓ_t(·)}_{t=1}^T: Regret_T / T = O(1/√T).
     Strongly convex loss functions {ℓ_t(·)}_{t=1}^T: Regret_T / T = O(log T / T).
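The protocol above can be sketched in a few lines. The example below (not from the talk) uses the 1-strongly convex losses ℓ_t(w) = 0.5 (w − z_t)^2 with an illustrative random sequence z_t; Online Gradient Descent with step η_t = 1/t, which here coincides with Follow-The-Leader, attains the O(log T / T) average regret quoted for strongly convex losses.

```python
import numpy as np

# Online learning protocol with strongly convex losses l_t(w) = 0.5*(w - z_t)^2.
# OGD with step 1/t keeps w equal to the running mean of past z's (i.e. FTL),
# and its regret against the best fixed w in hindsight grows only as O(log T).
rng = np.random.default_rng(1)
T = 10_000
z = rng.uniform(-1.0, 1.0, size=T)

w, total_loss = 0.0, 0.0
for t in range(1, T + 1):
    total_loss += 0.5 * (w - z[t - 1]) ** 2   # suffer loss l_t(w_t)
    w -= (w - z[t - 1]) / t                   # OGD step; w becomes mean(z_1..z_t)

# Best fixed action in hindsight is the mean of all z_t.
w_star = z.mean()
best_loss = 0.5 * np.sum((w_star - z) ** 2)
avg_regret = (total_loss - best_loss) / T
print(0.0 <= avg_regret < 0.01)  # True: average regret vanishes like log(T)/T
```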

  5. New perspective: a two-player zero-sum game
     A zero-sum game (Fenchel game): g(x, y) := ⟨x, y⟩ − f*(y).
     V* := min_{x ∈ X} max_{y ∈ Y} g(x, y) = min_{x ∈ X} max_{y ∈ Y} ⟨x, y⟩ − f*(y) = min_{x ∈ X} f(x).

  6. New perspective: a two-player zero-sum game
     A zero-sum game (Fenchel game): g(x, y) := ⟨x, y⟩ − f*(y).
     V* := min_{x ∈ X} max_{y ∈ Y} g(x, y) = min_{x ∈ X} max_{y ∈ Y} ⟨x, y⟩ − f*(y) = min_{x ∈ X} f(x).
     Solving the game is equivalent to solving the underlying optimization problem:
     if (x̂, ŷ) is an ε-equilibrium of the game, then f(x̂) ≤ min_x f(x) + ε.
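A quick numerical sanity check of the game construction, for the illustrative choice f(x) = 0.5‖x‖², whose Fenchel conjugate is f*(y) = 0.5‖y‖² (the construction works for any closed convex f):

```python
import numpy as np

# Fenchel game for f(x) = 0.5*||x||^2, with conjugate f*(y) = 0.5*||y||^2.
# Then g(x, y) = <x, y> - f*(y), and max_y g(x, y) = f(x), attained at
# the best response y = grad f(x) = x.
rng = np.random.default_rng(2)
x = rng.standard_normal(5)

def g(x, y):
    return x @ y - 0.5 * y @ y

f_x = 0.5 * x @ x
inner_max = g(x, x)        # y-player's best response y = x
worse = g(x, 0.5 * x)      # any other y achieves no more

print(np.isclose(inner_max, f_x) and worse <= inner_max)  # True
```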

  7. Meta algorithm for the Fenchel game
     Algorithm 0: Meta Algorithm
     1: Given a sequence of weights {α_t}.
     2: for t = 1, 2, ..., T do
     3:   y_t := OnlineAlgorithm_Y(g(x_1, ·), ..., g(x_{t−1}, ·)).
     4:   x_t := OnlineAlgorithm_X(g(·, y_1), ..., g(·, y_{t−1}), g(·, y_t)).
     5:   y-player's loss function: α_t ℓ_t(y) := α_t (f*(y) − ⟨x_t, y⟩).
     6:   x-player's loss function: α_t h_t(x) := α_t (⟨x, y_t⟩ − f*(y_t)).
     7: end for
     8: Output (x̄_T, ȳ_T) := (Σ_{s=1}^T α_s x_s / A_T, Σ_{s=1}^T α_s y_s / A_T).
     Let x* = arg min_x f(x).
     α-Reg^y := Σ_{t=1}^T α_t ℓ_t(y_t) − min_{y ∈ Y} Σ_{t=1}^T α_t ℓ_t(y)   (2)
     α-Reg^x := Σ_{t=1}^T α_t h_t(x_t) − Σ_{t=1}^T α_t h_t(x*)   (3)

  8. Meta algorithm for the Fenchel game (guarantee)
     With Algorithm 0 as above, define A_T := Σ_{t=1}^T α_t and the weighted average regret ᾱ-Reg := α-Reg / A_T for each player.
     Theorem: f(x̄_T) ≤ min_x f(x) + α-Reg^x / A_T + α-Reg^y / A_T.

  9. Nesterov's 1983 accelerated method (unconstrained optimization: min_{x ∈ R^n} f(x))
     Algorithm 1: Nesterov's method from the Meta Algorithm
     1: Given the sequence of weights {α_t = t}.
     2: for t = 1, 2, ..., T do
     3:   y-player plays Optimistic-FTL:
          y_t ← ∇f(x̃_t) = arg min_{y ∈ Y} Σ_{s=1}^{t−1} α_s ℓ_s(y) + m_t(y),
          where m_t(y) = α_t ℓ_{t−1}(y) and x̃_t := (α_t x_{t−1} + Σ_{s=1}^{t−1} α_s x_s) / A_t.
     4:   x-player plays Gradient Descent:
          x_t = x_{t−1} − γ_t α_t ∇h_t(x) = x_{t−1} − γ_t α_t y_t = x_{t−1} − γ_t α_t ∇f(x̃_t).
     5: end for
     6: Output (x̄_T, ȳ_T) := (Σ_{s=1}^T α_s x_s / A_T, Σ_{s=1}^T α_s y_s / A_T).
     The averaged iterates satisfy Nesterov's 1983 recursion:
     x̄_{t+1} = x̄_t − (1/(4L)) ∇f(x̃_{t+1}) + ((t−1)/(t+2)) (x̄_t − x̄_{t−1}).
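A runnable sketch of Algorithm 1 on a hypothetical L-smooth quadratic, with the weights α_t = t from the slide and the constant step γ_t = 1/(4L) permitted by the analysis (the problem instance, dimension, and horizon are illustrative choices):

```python
import numpy as np

# Algorithm 1: y-player plays Optimistic-FTL (best response at the
# extrapolated point x_tilde), x-player plays gradient descent.
# Quadratic f(x) = 0.5 x^T A x with minimum value 0.
rng = np.random.default_rng(3)
M = rng.standard_normal((10, 10))
A = M.T @ M / 10.0
L = np.linalg.eigvalsh(A)[-1]

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

T = 400
x = rng.standard_normal(10)              # x_0
weighted_sum, A_t = np.zeros(10), 0.0    # running sum of alpha_s * x_s, and A_t
for t in range(1, T + 1):
    alpha = t
    A_t += alpha
    # y-player (Optimistic-FTL): x_tilde_t = (alpha_t x_{t-1} + sum_{s<t} alpha_s x_s) / A_t
    x_tilde = (alpha * x + weighted_sum) / A_t
    y = grad(x_tilde)
    # x-player (Gradient Descent) on h_t: x_t = x_{t-1} - gamma_t * alpha_t * y_t
    x = x - (1.0 / (4 * L)) * alpha * y
    weighted_sum += alpha * x

x_bar = weighted_sum / A_t
print(f(x_bar))  # approaches the minimum value 0
```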

  10. Other accelerated variants (constrained optimization: min_{x ∈ K} f(x))
     Algorithm 2: Nesterov's method from the Meta Algorithm
     1: Given the sequence of weights {α_t = t}.
     2: for t = 1, 2, ..., T do
     3:   y-player plays Optimistic-FTL:
          y_t ← ∇f(x̃_t) = arg min_{y ∈ Y} Σ_{s=1}^{t−1} α_s ℓ_s(y) + m_t(y),
          where m_t(y) = α_t ℓ_{t−1}(y) and x̃_t := (α_t x_{t−1} + Σ_{s=1}^{t−1} α_s x_s) / A_t.
     4:   (A) x-player plays Mirror Descent:
          x_t = arg min_{x ∈ K} γ_t ⟨x, α_t y_t⟩ + V_{x_{t−1}}(x).
     5:   Or, (B) x-player plays Be-The-Regularized-Leader:
          x_t = arg min_{x ∈ K} Σ_{s=1}^t θ_s ⟨x, α_s y_s⟩ + (1/η) R(x).
     6: end for
     7: Output (x̄_T, ȳ_T) := (Σ_{s=1}^T α_s x_s / A_T, Σ_{s=1}^T α_s y_s / A_T).
     (A) recovers Nesterov's 1988 (1-memory) and (B) Nesterov's 2005 (∞-memory) accelerated method.

  11. Heavy Ball method (unconstrained optimization: min_{x ∈ R^n} f(x))
     Algorithm 3: Heavy Ball from the Meta Algorithm
     1: Given the sequence of weights {α_t = t}.
     2: for t = 1, 2, ..., T do
     3:   y-player plays FTL (no optimism):
          y_t ← ∇f(x̄_{t−1}) = arg min_{y ∈ Y} Σ_{s=1}^{t−1} α_s ℓ_s(y), where x̄_{t−1} := Σ_{s=1}^{t−1} α_s x_s / A_{t−1}.
     4:   x-player plays Gradient Descent:
          x_t = x_{t−1} − γ_t α_t ∇h_t(x) = x_{t−1} − γ_t α_t y_t = x_{t−1} − γ_t α_t ∇f(x̄_{t−1}).
     5: end for
     6: Output (x̄_T, ȳ_T) := (Σ_{s=1}^T α_s x_s / A_T, Σ_{s=1}^T α_s y_s / A_T).
     The averaged iterates satisfy:
     x̄_t = x̄_{t−1} − (γ_t α_t²/A_t) ∇f(x̄_{t−1}) + (α_t A_{t−2}/(A_t α_{t−1})) (x̄_{t−1} − x̄_{t−2}).   (Heavy Ball)
     x̄_t = x̄_{t−1} − (γ_t α_t²/A_t) ∇f(x̃_t) + (α_t A_{t−2}/(A_t α_{t−1})) (x̄_{t−1} − x̄_{t−2}).   (Nesterov's alg.)
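The same skeleton with the y-player switched from Optimistic-FTL to plain FTL gives the Heavy Ball dynamics. A sketch under illustrative assumptions (quadratic instance, α_t = t, constant step γ_t = 1/(4L), and the convention x̄_0 := x_0, which the slide leaves implicit):

```python
import numpy as np

# Algorithm 3: y-player plays FTL, i.e. responds to the running weighted
# average x_bar_{t-1} rather than the extrapolated point; x-player plays
# gradient descent. Quadratic f(x) = 0.5 x^T A x with minimum value 0.
rng = np.random.default_rng(4)
M = rng.standard_normal((10, 10))
A = M.T @ M / 10.0
L = np.linalg.eigvalsh(A)[-1]

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

T = 400
x = rng.standard_normal(10)            # x_0
x_bar = x.copy()                       # convention: x_bar_0 := x_0 (assumption)
weighted_sum, A_t = np.zeros(10), 0.0
for t in range(1, T + 1):
    alpha = t
    y = grad(x_bar)                          # FTL: y_t = grad f(x_bar_{t-1})
    x = x - (1.0 / (4 * L)) * alpha * y      # GD: x_t = x_{t-1} - gamma_t*alpha_t*y_t
    weighted_sum += alpha * x
    A_t += alpha
    x_bar = weighted_sum / A_t

print(f(x_bar))  # approaches the minimum value 0
```

The only difference from the Nesterov instance is the point at which the y-player evaluates the gradient, which is exactly the gap between the two recursions displayed on this slide.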

  12. Analysis: L-smooth convex optimization problems
     y-player plays Optimistic-FTL:
     y_t ← ∇f(x̃_t) = arg min_{y ∈ Y} Σ_{s=1}^{t−1} α_s ℓ_s(y) + α_t ℓ_{t−1}(y).
     α-Reg^y := Σ_{t=1}^T α_t ℓ_t(y_t) − min_{y ∈ Y} Σ_{t=1}^T α_t ℓ_t(y) ≤ Σ_{t=1}^T (L α_t² / A_t) ‖x_{t−1} − x_t‖².
     x-player plays Mirror Descent:
     x_t = arg min_{x ∈ K} γ′_t ⟨∇f(x̃_t), x⟩ + V_{x_{t−1}}(x).
     α-Reg^x := Σ_{t=1}^T α_t h_t(x_t) − Σ_{t=1}^T α_t h_t(x*) ≤ D/γ_T − Σ_{t=1}^T (1/(2γ_t)) ‖x_{t−1} − x_t‖²,
     where D is a constant such that V_{x_t}(x*) ≤ D for all t.
     Summing the two bounds, the (L α_t²/A_t − 1/(2γ_t)) ‖x_{t−1} − x_t‖² terms are nonpositive and drop out:
     f(x̄_T) − min_{x ∈ X} f(x) ≤ D/(γ_T A_T) = O(LD/T²),
     as long as γ_t satisfies 1/(CL) ≤ γ_t ≤ 1/(4L) for some constant C > 4.
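To spell out the cancellation step, a short derivation sketch with the concrete weights α_t = t (so A_t = t(t+1)/2) used in the algorithms above:

```latex
% Summing the two regret bounds from this slide:
\alpha\text{-}\mathrm{Reg}^y + \alpha\text{-}\mathrm{Reg}^x
  \;\le\; \frac{D}{\gamma_T}
  \;+\; \sum_{t=1}^{T}\left(\frac{L\alpha_t^2}{A_t} - \frac{1}{2\gamma_t}\right)
        \|x_{t-1} - x_t\|^2 .
% With \alpha_t = t and A_t = t(t+1)/2:
%   L\alpha_t^2 / A_t = 2Lt/(t+1) \le 2L,
% so any \gamma_t \le 1/(4L) gives 1/(2\gamma_t) \ge 2L and every summand is \le 0.
% The Meta Algorithm theorem then yields, using \gamma_T \ge 1/(CL):
f(\bar{x}_T) - \min_{x \in \mathcal{X}} f(x)
  \;\le\; \frac{D}{\gamma_T A_T}
  \;\le\; \frac{CLD}{T(T+1)/2}
  \;=\; O\!\left(\frac{LD}{T^2}\right).
```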

  13. Thank you!
     Other instances of the meta-algorithm:
     - Accelerated linear rate of Nesterov's method for strongly convex and smooth problems
     - Accelerated Proximal Method
     - Accelerated Frank-Wolfe
     Come to our poster #156!
