On Adaptive Strategies and Convex Optimization Algorithms (PowerPoint presentation by Joon Kwon)
  1. On Adaptive Strategies and Convex Optimization Algorithms
Joon Kwon, joint work with Panayotis Mertikopoulos
Institut de Mathématiques de Jussieu, Université Pierre-et-Marie-Curie, Paris, France
Workshop on Algorithms and Dynamics for Games and Optimization, Playa Blanca, Tongoy, Chile, October 2013

  2. Framework
Let $(V, \|\cdot\|)$ be a finite-dimensional normed space, $(V^*, \|\cdot\|_*)$ its dual, and $C \subset V$ a convex compact set. Nature chooses a sequence $u_1, \dots, u_n, \dots \in V^*$.
▶ Choose $x_1 \in C$; $u_1$ is revealed; get payoff $\langle u_1 | x_1 \rangle$; and so on.
▶ At stage $n+1$, knowing $u_1, \dots, u_n$, choose $x_{n+1} \in C$; $u_{n+1}$ is revealed; get payoff $\langle u_{n+1} | x_{n+1} \rangle$.
A strategy is a family $\sigma = (\sigma_n)_{n \geqslant 1}$ of maps $\sigma_{n+1} : (V^*)^n \to C$, $(u_1, \dots, u_n) \mapsto x_{n+1}$. The goal is to maximize $\sum_{k=1}^n \langle u_k | x_k \rangle$.
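The protocol above can be sketched in a few lines of code. Everything concrete below is an illustrative assumption, not from the slides: the set $C = [0,1]^2$, the random payoff vectors, and the particular `strategy` function (a best response to the average past payoff).

```python
import numpy as np

# A minimal sketch of the online protocol, assuming C = [0, 1]^2 and a
# hypothetical strategy; Nature's payoffs are drawn at random here.
rng = np.random.default_rng(0)
d = 2

def strategy(past_payoffs):
    # sigma_{n+1}: (u_1, ..., u_n) -> x_{n+1} in C. Here: play the vertex
    # of [0, 1]^2 that best responds to the average past payoff vector.
    if not past_payoffs:
        return np.full(d, 0.5)
    avg = np.mean(past_payoffs, axis=0)
    return (avg > 0).astype(float)

payoffs = []
total = 0.0
for n in range(100):
    x = strategy(payoffs)           # choose x_{n+1} in C
    u = rng.uniform(-1, 1, size=d)  # u_{n+1} is revealed
    total += float(u @ x)           # payoff <u_{n+1} | x_{n+1}>
    payoffs.append(u)

# Regret: best fixed point of C in hindsight vs. realized payoff.
# Over [0, 1]^d, max_x <U | x> is attained coordinate-wise.
U = np.sum(payoffs, axis=0)
best_fixed = float(np.maximum(U, 0).sum())
regret = best_fixed - total
print(regret)
```

The only problem-specific ingredient is the hindsight maximization `max_x <U|x>`, which is closed-form for a box; for other sets $C$ it becomes a linear program over $C$.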

  3. The case of the simplex
▶ $V = V^* = \mathbb{R}^d$
▶ $C = \Delta_d = \left\{ x \in \mathbb{R}_+^d : \sum_{i=1}^d x_i = 1 \right\}$, the probability distributions on $\{1, \dots, d\}$
▶ Choose $x_{n+1} \in \Delta_d$,
▶ Draw $i_{n+1} \in \{1, \dots, d\}$ according to $x_{n+1}$,
▶ Get payoff $u_{n+1, i_{n+1}}$.
Then $\mathbb{E}\left[ \sum_{k=1}^n u_{k, i_k} \right] = \sum_{k=1}^n \langle u_k | x_k \rangle$.
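The expectation identity above can be sanity-checked by Monte Carlo: drawing $i \sim x$ and collecting the coordinate payoff $u_i$ has expectation $\langle u | x \rangle$. The specific $x$, $u$ and sample size below are illustrative.

```python
import numpy as np

# Monte Carlo check that E[u_i] with i drawn from x equals <u | x>.
rng = np.random.default_rng(1)
d = 4
x = rng.dirichlet(np.ones(d))   # a point of the simplex Delta_d
u = rng.uniform(-1, 1, size=d)  # a payoff vector

draws = rng.choice(d, size=200_000, p=x)
empirical = float(u[draws].mean())  # estimate of E[u_i], i ~ x
exact = float(u @ x)                # <u | x>
print(abs(empirical - exact))       # small (Monte Carlo error)
```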

  4. The regret
Wish: a strategy $\sigma$ such that, for every sequence $(u_n)_{n \geqslant 1}$,
$$\limsup_{n \to +\infty} \frac{1}{n} \left( \max_{x \in C} \sum_{k=1}^n \langle u_k | x \rangle - \sum_{k=1}^n \langle u_k | x_k \rangle \right) \leqslant 0,$$
where the quantity in parentheses is the regret. At what speed does this convergence take place?

  5. Extension to convex losses
▶ $\ell_n : C \to \mathbb{R}$ convex loss functions
▶ Incurred loss: $\ell_n(x_n)$
\begin{align*}
\sum_{k=1}^n \ell_k(x_k) - \min_{x \in C} \sum_{k=1}^n \ell_k(x)
&= \max_{x \in C} \sum_{k=1}^n \left( \ell_k(x_k) - \ell_k(x) \right) \\
&\leqslant \max_{x \in C} \sum_{k=1}^n \langle \nabla \ell_k(x_k) | x_k - x \rangle \\
&= \max_{x \in C} \sum_{k=1}^n \langle -\nabla \ell_k(x_k) | x \rangle - \sum_{k=1}^n \langle -\nabla \ell_k(x_k) | x_k \rangle \\
&= \max_{x \in C} \sum_{k=1}^n \langle u_k | x \rangle - \sum_{k=1}^n \langle u_k | x_k \rangle,
\end{align*}
with $u_n = -\nabla \ell_n(x_n)$: convex losses reduce to the linear payoff model.

  6. Convex optimization
▶ $f : C \to \mathbb{R}$ a convex function; take $\ell_n = f$ for every $n$. Then
$$\frac{1}{n} \sum_{k=1}^n \ell_k(x_k) - \min_{x \in C} \frac{1}{n} \sum_{k=1}^n \ell_k(x) = \frac{1}{n} \sum_{k=1}^n f(x_k) - \min_{x \in C} f(x),$$
so a sublinear regret bound yields convergence of the average value of the iterates to the minimum of $f$.

  7. A family of strategies
Aggregate the observed payoff vectors $u_1, u_2, \dots, u_n \in V^*$ into $\sum_{k=1}^n u_k \in V^*$ and play
$$x_{n+1} = Q\left( \sum_{k=1}^n u_k \right), \qquad Q : V^* \to C.$$

  8. The choice map
$$Q_h : V^* \to C, \qquad y \mapsto \operatorname*{arg\,max}_{x \in C} \left\{ \langle y | x \rangle - h(x) \right\},$$
where $h : C \to \mathbb{R}$ is convex:
▶ $h$ continuous ⇝ $Q_h(y)$ exists
▶ $h$ strictly convex ⇝ $Q_h(y)$ is unique
The strategies considered take the form
$$x_{n+1} = Q_h\left( \eta_n \sum_{k=1}^n u_k \right) = Q_h(y_n), \qquad \eta_n > 0 \text{ and nonincreasing}.$$

  9. Some known strategies and algorithms
▶ Exponential Weight Algorithm (EWA)
▶ $1/\sqrt{n}$-Exponential Weight Algorithm ($1/\sqrt{n}$-EWA)
▶ Vanishingly Smooth Fictitious Play (VSFP)
▶ Smooth Fictitious Play (SFP)
▶ Projected Subgradient Method (PSM)
▶ Mirror Descent (MD)
▶ Online Gradient Descent (OGD)
▶ Online Mirror Descent (OMD)
▶ Follow the Regularized Leader (FRL)

  10. Exponential Weight Algorithm
▶ $C = \Delta_d$
$$x_{n+1, i} = \frac{\exp\left( \eta \sum_{k=1}^n u_{k,i} \right)}{\sum_{j=1}^d \exp\left( \eta \sum_{k=1}^n u_{k,j} \right)}.$$
With $h(x) = \sum_{i=1}^d x_i \log x_i$, the choice map is $Q_h(y)_i = \dfrac{e^{y_i}}{\sum_{j=1}^d e^{y_j}}$, so that
$$x_{n+1} = Q_h\left( \eta \sum_{k=1}^n u_k \right).$$
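The update above is directly implementable: apply the softmax (which is $Q_h$ for the entropy $h$) to $\eta$ times the cumulative payoff vector. The payoff sequence and the value of $\eta$ below are illustrative assumptions.

```python
import numpy as np

# Exponential weights: x_{n+1} = Q_h(eta * sum u_k), Q_h = softmax.
def ew_choice(cum_payoffs, eta):
    y = eta * cum_payoffs
    w = np.exp(y - y.max())   # subtracting the max stabilizes exp
    return w / w.sum()        # softmax = Q_h(y) for h(x) = sum_i x_i log x_i

rng = np.random.default_rng(0)
d, n = 3, 200
cum = np.zeros(d)             # sum of past payoff vectors
for _ in range(n):
    x = ew_choice(cum, eta=0.1)
    cum += rng.uniform(-1, 1, size=d)

print(x)  # a point of the simplex, favoring the historically best coordinates
```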

  11. Projected Subgradient Method
With $y_n = -\sum_{k=1}^n \gamma_k \nabla f(x_k)$, the projection step fits the framework:
$$x_{n+1} = \operatorname*{arg\,min}_{x \in C} \|x - y_n\|_2^2 = \operatorname*{arg\,min}_{x \in C} \left\{ \|x\|_2^2 - 2 \langle y_n | x \rangle + \|y_n\|_2^2 \right\} = \operatorname*{arg\,max}_{x \in C} \left\{ \langle y_n | x \rangle - \tfrac{1}{2} \|x\|_2^2 \right\},$$
i.e. $h(x) = \tfrac{1}{2} \|x\|_2^2$ and $u_n = -\gamma_n \nabla f(x_n)$.
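A sketch of this (lazy) form of the projected subgradient method: accumulate $y_n = -\sum_{k \leqslant n} \gamma_k \nabla f(x_k)$ and project onto $C$. The choices of $C$ (the Euclidean unit ball, whose projection is closed-form), of $f(x) = \frac12\|x - a\|_2^2$, and of the step sizes are illustrative assumptions, not from the slides.

```python
import numpy as np

def project_ball(y):
    # Q_h for h = 0.5*||x||_2^2 and C the unit ball: Euclidean projection.
    norm = np.linalg.norm(y)
    return y / norm if norm > 1 else y

a = np.array([2.0, 0.0])          # f(x) = 0.5*||x - a||^2, grad f(x) = x - a;
x = np.zeros(2)                   # its minimizer over the unit ball is (1, 0)
y = np.zeros(2)
avg = np.zeros(2)
n = 2000
for k in range(1, n + 1):
    gamma = 1.0 / np.sqrt(k)      # a standard nonsummable step size
    y -= gamma * (x - a)          # y_n = -sum gamma_k * grad f(x_k)
    x = project_ball(y)           # x_{n+1} = Q_h(y_n)
    avg += x
avg /= n

print(avg)                        # close to (1, 0), the constrained minimizer
```

The running average `avg` is the quantity that the convex-optimization reduction of slide 6 controls.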

  12. Summary of strategies

| Name | $C$ | $h$ | $\eta_n$ | $u_n$ | $\|\cdot\|$ | References |
|---|---|---|---|---|---|---|
| EW | $\Delta_d$ | $\sum_{i=1}^d x_i \log x_i$ | $\eta$ | – | $\|\cdot\|_1$ | Littlestone, Warmuth 1994; Sorin 2009 |
| $1/\sqrt{n}$-EW | $\Delta_d$ | $\sum_{i=1}^d x_i \log x_i$ | $\eta/\sqrt{n}$ | – | $\|\cdot\|_1$ | Auer, Cesa-Bianchi, Gentile 2002 |
| VSFP | $\Delta_d$ | any | $\eta n^{\alpha}$, $\alpha \in (-1, 0)$ | – | $\|\cdot\|_1$ | Benaïm, Faure 2013 |
| SFP | $\Delta_d$ | any | $\eta/n$ | – | $\|\cdot\|_1$ | Fudenberg, Levine 1995; Benaïm, Hofbauer, Sorin 2006 |
| PSM | any | $\frac12 \|\cdot\|_2^2$ | $1$ | $-\gamma_n \nabla f(x_n)$ | $\|\cdot\|_2$ | Polyak 1969? |
| MD | any | any | $1$ | $-\gamma_n \nabla f(x_n)$ | any | Nemirovski, Yudin 1983; Beck, Teboulle 2003 |
| OGD | any | $\frac12 \|\cdot\|_2^2$ | $1$ | $-\gamma_n \nabla f_n(x_n)$ | $\|\cdot\|_2$ | Zinkevich 2003 |
| OMD | any | any | $\eta$ | $-\nabla f_n(x_n)$ | any | Shalev-Shwartz 2007 |
| FRL | any | any | $\eta$ | – | any | Shalev-Shwartz 2007 |

  13. Interrelations
[Diagram: EW, $1/\sqrt{n}$-EW, VSFP, SFP, PSM, MD, OGD and OMD shown as special cases of FRL.]

  14. The continuous-time counterpart
Let $u : \mathbb{R}_+ \to V^*$, $t \mapsto u_t$, be measurable, and let $\eta : \mathbb{R}_+ \to \mathbb{R}_+^*$, $t \mapsto \eta_t$, be continuous and nonincreasing. The discrete strategy $x_{n+1} = Q_h\left( \eta_n \sum_{k=1}^n u_k \right)$ becomes
$$\tilde{x}_t = Q_h\left( \eta_t \int_0^t u_s \, ds \right) = Q_h(y_t).$$
Theorem. For every $(u_t)_{t \in \mathbb{R}_+}$ and every $t \geqslant 0$,
$$\max_{x \in C} \int_0^t \langle u_s | x \rangle \, ds - \int_0^t \langle u_s | \tilde{x}_s \rangle \, ds \leqslant \frac{h_{\max} - h_{\min}}{\eta_t}.$$

  15. The analysis
We prove $\max_{x \in C} \int_0^t \langle u_s | x \rangle \, ds - \int_0^t \langle u_s | \tilde{x}_s \rangle \, ds \leqslant \dfrac{h_{\max} - h_{\min}}{\eta_t}$. For every $x \in C$, the Fenchel inequality $\langle y_t | x \rangle \leqslant h^*(y_t) + h(x)$ gives
$$\int_0^t \langle u_s | x \rangle \, ds = \frac{1}{\eta_t} \langle y_t | x \rangle \leqslant \frac{h^*(y_t)}{\eta_t} + \frac{h(x)}{\eta_t}.$$
Writing $\dfrac{h^*(y_t)}{\eta_t} = \dfrac{h^*(0)}{\eta_0} + \displaystyle\int_0^t \frac{d}{ds}\left( \frac{h^*(y_s)}{\eta_s} \right) ds$ and using
$$\frac{d}{ds}\left( \frac{h^*(y_s)}{\eta_s} \right) \leqslant \langle u_s | \tilde{x}_s \rangle + h_{\min} \frac{\dot{\eta}_s}{\eta_s^2}, \qquad h^*(0) = -h_{\min},$$
we obtain
$$\int_0^t \langle u_s | x \rangle \, ds \leqslant -\frac{h_{\min}}{\eta_0} + \int_0^t \langle u_s | \tilde{x}_s \rangle \, ds + h_{\min} \left( \frac{1}{\eta_0} - \frac{1}{\eta_t} \right) + \frac{h(x)}{\eta_t} \leqslant \int_0^t \langle u_s | \tilde{x}_s \rangle \, ds + \frac{h_{\max} - h_{\min}}{\eta_t}.$$

  16. Back to discrete time
Given $(u_n)_{n \geqslant 1}$, $h$ and $(\eta_n)_{n \geqslant 1}$, set $u_t = u_{\lceil t \rceil}$ and let $\eta_t$ be a continuous interpolation of $(\eta_n)$. Then
$$x_{n+1} = Q_h(y_n), \quad y_n = \eta_n \sum_{k=1}^n u_k \qquad \text{vs.} \qquad \tilde{x}_t = Q_h(y_t), \quad y_t = \eta_t \int_0^t u_s \, ds.$$
Can the discrete regret be controlled by the continuous one?
$$\max_{x \in C} \sum_{k=1}^n \langle u_k | x \rangle - \sum_{k=1}^n \langle u_k | x_k \rangle \ \stackrel{?}{\leqslant} \ \max_{x \in C} \int_0^n \langle u_t | x \rangle \, dt - \int_0^n \langle u_t | \tilde{x}_t \rangle \, dt.$$
This amounts to comparing $\displaystyle\int_0^n \langle u_t | \tilde{x}_t \rangle \, dt$ with $\displaystyle\int_0^n \langle u_t | \tilde{x}_{\lfloor t \rfloor} \rangle \, dt$.

  17. Comparing the two integrals
Assume $\|u_s\|_* \leqslant 1$ and that $Q_h$ is $K$-Lipschitz. Then
$$\left| \langle u_s | \tilde{x}_{\lfloor s \rfloor} \rangle - \langle u_s | \tilde{x}_s \rangle \right| = \left| \langle u_s | \tilde{x}_{\lfloor s \rfloor} - \tilde{x}_s \rangle \right| \leqslant \|u_s\|_* \left\| \tilde{x}_{\lfloor s \rfloor} - \tilde{x}_s \right\| \leqslant \left\| Q_h(y_{\lfloor s \rfloor}) - Q_h(y_s) \right\| \leqslant K \left\| y_s - y_{\lfloor s \rfloor} \right\|_*$$
and
$$K \left\| y_s - y_{\lfloor s \rfloor} \right\|_* = K \left\| \int_{\lfloor s \rfloor}^s \left( \eta_v u_v + \dot{\eta}_v \int_0^v u_w \, dw \right) dv \right\|_* \leqslant K \left( \eta_s - s \dot{\eta}_s \right).$$

  18. When is $Q_h$ Lipschitz?
Since $Q_h = \nabla h^*$:
$$\nabla h^* \ K\text{-Lipschitz} \iff h \ \tfrac{1}{K}\text{-strongly convex}.$$
Definition. $f$ is $C$-strongly convex w.r.t. $\|\cdot\|$ if for all $x, y$ and all $\lambda \in [0, 1]$,
$$f(\lambda x + (1 - \lambda) y) \leqslant \lambda f(x) + (1 - \lambda) f(y) - \frac{C}{2} \lambda (1 - \lambda) \|y - x\|^2.$$
Examples: $\sum_{i=1}^d x_i \log x_i$ is 1-strongly convex w.r.t. $\|\cdot\|_1$, and $\frac{1}{2} \|\cdot\|_2^2$ is 1-strongly convex w.r.t. $\|\cdot\|_2$.
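The first example above can be spot-checked numerically (a sampling check at random points, not a proof): the entropy satisfies the 1-strong-convexity inequality w.r.t. $\|\cdot\|_1$ on the simplex. The dimension and sample count are arbitrary choices.

```python
import numpy as np

# Spot-check: h(x) = sum x_i log x_i is 1-strongly convex w.r.t. ||.||_1
# on the simplex, i.e. the defining inequality holds at sampled points.
rng = np.random.default_rng(0)

def h(x):
    return float(np.sum(x * np.log(x)))

worst = -np.inf  # largest observed violation lhs - rhs (should stay <= 0)
for _ in range(2000):
    x = rng.dirichlet(np.ones(4))   # interior points of Delta_4
    y = rng.dirichlet(np.ones(4))
    lam = float(rng.uniform())
    lhs = h(lam * x + (1 - lam) * y)
    rhs = lam * h(x) + (1 - lam) * h(y) \
        - 0.5 * lam * (1 - lam) * float(np.linalg.norm(y - x, 1)) ** 2
    worst = max(worst, lhs - rhs)

print(worst)  # nonpositive up to rounding: no violation found
```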

  19. Theorem
Assume:
1. $h$ is $K$-strongly convex on $C$ w.r.t. $\|\cdot\|$;
2. $(\eta_n)_{n \geqslant 1}$ is positive and nonincreasing;
3. $\eta_t$ is a continuous and nonincreasing interpolation of $(\eta_n)$;
4. $x_{n+1} = Q_h\left( \eta_n \sum_{k=1}^n u_k \right)$.
Then, for every sequence with $\|u_n\|_* \leqslant M$,
$$\max_{x \in C} \sum_{k=1}^n \langle u_k | x \rangle - \sum_{k=1}^n \langle u_k | x_k \rangle \leqslant \frac{h_{\max} - h_{\min}}{\eta_n} + \frac{M^2}{K} \int_0^n \left( \eta_t - t \dot{\eta}_t \right) dt.$$
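The guarantee above can be checked numerically in the exponential-weight case ($h$ the entropy, $K = 1$, constant $\eta$, $\|u_n\|_\infty \leqslant 1$), where it reads $\text{regret} \leqslant \log d / \eta + \eta n$. The random payoff sequence below is an illustrative assumption; the bound must hold for any such sequence.

```python
import numpy as np

# Check the EW regret bound log(d)/eta + eta*n on a random payoff sequence.
rng = np.random.default_rng(0)
d, n = 5, 1000
eta = float(np.sqrt(np.log(d) / n))

cum = np.zeros(d)    # sum of past payoff vectors
earned = 0.0
for _ in range(n):
    y = eta * cum
    w = np.exp(y - y.max())
    x = w / w.sum()                   # x_k = Q_h(eta * sum_{j<k} u_j)
    u = rng.uniform(-1, 1, size=d)    # ||u_k||_inf <= 1, i.e. M = 1
    earned += float(u @ x)
    cum += u

regret = float(cum.max()) - earned    # max over Delta_d is at a vertex
bound = np.log(d) / eta + eta * n
print(regret <= bound)                # True
```

Note that the hindsight maximum over the simplex, $\max_{x \in \Delta_d} \langle \sum_k u_k | x \rangle$, is simply the largest coordinate of the cumulative payoff vector.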

  20. Regret bounds

| Name | Assumption | Bound on the regret |
|---|---|---|
| EW | $\|u_n\|_\infty \leqslant 1$ | $\dfrac{\log d}{\eta} + \eta n$ |
| $1/\sqrt{n}$-EW | $\|u_n\|_\infty \leqslant 1$ | $\left( \dfrac{\log d}{\eta} + 3 \eta \right) \sqrt{n}$ |
| VSFP | $\|u_n\|_\infty \leqslant 1$ | $\dfrac{h_{\max} - h_{\min}}{\eta} \, n^{-\alpha} + \dfrac{\eta (1 - \alpha)}{K (1 + \alpha)} \, n^{\alpha + 1}$ |
| SFP | $\|u_n\|_\infty \leqslant 1$ | $\dfrac{h_{\max} - h_{\min}}{\eta} \, n + \eta (1 + \log n)$ |
| PSM | $\|\nabla f\|_2 \leqslant M$ | $\dfrac{\|C\|^2 / 2 + M^2 \sum_{k=1}^n \gamma_k^2}{\sum_{k=1}^n \gamma_k}$ |
| MD | $\|\nabla f\|_* \leqslant M$ | $\dfrac{h_{\max} - h_{\min} + M^2 / (2K) \sum_{k=1}^n \gamma_k^2}{\sum_{k=1}^n \gamma_k}$ |
| OGD | $\|\nabla f_n\|_2 \leqslant M$ | $\|C\|^2 / 2 + M^2 \sum_{k=1}^n \gamma_k^2$ |
| OMD | $\|\nabla f_n\|_* \leqslant M$ | $\dfrac{h_{\max} - h_{\min}}{\eta} + \dfrac{\eta M^2 n}{K}$ |
| FRL | $\|u_n\|_* \leqslant M$ | $\dfrac{h_{\max} - h_{\min}}{\eta} + \dfrac{\eta M^2 n}{K}$ |

($\|C\|$ denotes the diameter of $C$.)
