Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback


  1. Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback. Alekh Agarwal (UC Berkeley), Ofer Dekel, Lin Xiao (Microsoft Research).

  2. Online Convex Optimization (Full-Info). [Protocol diagram: a Player interacts with an Adversary over a convex set K.]

  3. Online Convex Optimization (Full-Info, continued). [Diagram: the Player picks a point x_1 ∈ K.]

  4. Online Convex Optimization (Full-Info, continued). [Diagram: the Adversary reveals the entire loss function ℓ_1.]

  5. Online Convex Optimization (Full-Info, continued). The Player updates x_{t+1} = Π_K(x_t − η ∇ℓ_t(x_t)). [Diagram: a gradient step from x_1 to x_2 inside K.]

  6. Online Convex Optimization (Full-Info, continued). After T rounds, minimize regret: R_T = Σ_{t=1}^T ℓ_t(x_t) − min_{x∈K} Σ_{t=1}^T ℓ_t(x). [Diagram: the full sequence x_1, x_2, …, x_T with losses ℓ_1, …, ℓ_T.]
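For concreteness, here is a minimal sketch of the full-information update above. It assumes K is a Euclidean ball so that the projection Π_K has a closed form; the names ogd_full_info, project_ball, and grad_fns are illustrative, not from the talk.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto K = {x : ||x|| <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def ogd_full_info(grad_fns, x0, eta=0.1, radius=1.0):
    """Projected online gradient descent: x_{t+1} = Pi_K(x_t - eta * grad_l_t(x_t)).

    grad_fns: one gradient oracle per round (full-information feedback).
    """
    x = np.asarray(x0, dtype=float)
    iterates = [x]
    for grad in grad_fns:
        x = project_ball(x - eta * grad(x), radius)
        iterates.append(x)
    return iterates
```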

  7. Bandit Convex Optimization. [Protocol diagram: Player vs. Adversary, as before.]

  8. Bandit Convex Optimization (continued). [Diagram: the Player picks x_1 ∈ K.]

  9. Bandit Convex Optimization (continued). [Diagram: the Adversary reveals only the scalar value ℓ_1(x_1), not the function ℓ_1.]

  10. Bandit Gradient Descent [FKM'05]. [Diagram: the Player maintains an internal full-information iterate x_1 ∈ K.]

  11. Bandit Gradient Descent [FKM'05] (continued). [Diagram: the Player plays a perturbed point y_1 near x_1.]

  12. Bandit Gradient Descent [FKM'05] (continued). [Diagram: the Adversary reveals the value ℓ_1(y_1); the Player simulates full information.]

  13. Bandit Gradient Descent [FKM'05] (continued). Updates x_{t+1} = Π_{(1−ξ)K}(x_t − η_t g_t). Minimize regret on the played points: R_T = Σ_{t=1}^T ℓ_t(y_t) − min_{x∈K} Σ_{t=1}^T ℓ_t(x). [Diagram: the gradient estimate g_1 formed from ℓ_1(y_1).]
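A minimal sketch of this one-point scheme, using the estimator g_t = (d/δ) ℓ_t(y_t) u_t that slide 18 defines formally. It again assumes K is a Euclidean ball of the given radius so that projection onto the shrunken set (1 − ξ)K is a simple rescaling; all function names are illustrative.

```python
import numpy as np

def bgd_one_point(loss_fns, x0, eta, delta, xi, radius=1.0, seed=0):
    """One-point bandit gradient descent in the style of [FKM'05].

    Each round plays y_t = x_t + delta * u_t for a uniform unit direction u_t,
    observes only the scalar l_t(y_t), and forms g_t = (d / delta) * l_t(y_t) * u_t.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    plays = []
    for loss in loss_fns:
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)              # uniform direction on the sphere
        y = x + delta * u
        plays.append(y)
        g = (d / delta) * loss(y) * u       # one-point gradient estimate
        x = x - eta * g
        inner = (1 - xi) * radius           # project onto (1 - xi)K so y_t stays in K
        n = np.linalg.norm(x)
        if n > inner:
            x *= inner / n
    return plays
```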

  14. A survey of known regret bounds.

                   Linear             Convex               Strongly Convex
                   Upper    Lower     Upper       Lower    Upper       Lower
      Full-Info    O(√T)    O(√T)     O(√T)       O(√T)    O(log T)    O(log T)

     Deterministic results against completely adaptive adversaries in Full-Info.

  15. A survey of known regret bounds.

                   Linear             Convex               Strongly Convex
                   Upper    Lower     Upper       Lower    Upper       Lower
      Full-Info    O(√T)    O(√T)     O(√T)       O(√T)    O(log T)    O(log T)
      Bandit       O(√T)    O(√T)     O(T^{3/4})  O(√T)    O(T^{2/3})  O(√T)?

     Deterministic results against completely adaptive adversaries in Full-Info. High-probability results against adaptive adversaries for Bandit.

  16. The Multi-Point (MP) feedback setup. Want to interpolate between bandit and full information. The Player is allowed k queries per round; the Adversary reveals the value of ℓ_t at all points picked. Average regret on the points played: R_T = (1/k) Σ_{t=1}^T Σ_{i=1}^k ℓ_t(y_{t,i}) − min_{x∈K} Σ_{t=1}^T ℓ_t(x).
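A small sketch of how this averaged regret would be computed, assuming each round's plays are given as a list of k points and the comparator x_star (the fixed point minimizing the cumulative loss over K) has been found separately; the names are illustrative.

```python
def multi_point_regret(loss_fns, plays, x_star):
    """R_T = (1/k) * sum_t sum_i l_t(y_{t,i}) - sum_t l_t(x_star)."""
    total = 0.0
    for loss, ys in zip(loss_fns, plays):
        total += sum(loss(y) for y in ys) / len(ys) - loss(x_star)
    return total
```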

  17. A survey of known regret bounds.

                   Linear             Convex               Strongly Convex
                   Upper    Lower     Upper       Lower    Upper       Lower
      Full-Info    O(√T)    O(√T)     O(√T)       O(√T)    O(log T)    O(log T)
      Bandit       O(√T)    O(√T)     O(T^{3/4})  O(√T)    O(T^{2/3})  O(√T)?
      MP Bandit    O(√T)    O(√T)     O(√T)       O(√T)    O(log T)    O(log T)

     Deterministic results against completely adaptive adversaries in Full-Info. High-probability results against adaptive adversaries for Bandit.

  18. Properties of the gradient estimator g_t [FKM'05]: g_t = (d/δ) ℓ_t(x_t + δu_t) u_t. Unbiased for linear functions; nearly unbiased for general convex functions. [Figure: ℓ_t sampled near x_t, at x_t − δ and x_t + δ.]

  19. Properties of the gradient estimator g_t [FKM'05] (continued): g_t = (d/δ) ℓ_t(x_t + δu_t) u_t. Regret bounds scale with ‖g_t‖, and ‖g_t‖ grows as 1/δ. [Figure: the values ℓ_t(x_t − δ) and ℓ_t(x_t + δ) and the secant over the interval of width 2δ.]
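The 1/δ blow-up is easy to see numerically. The demo below is not from the slides: it uses an arbitrary 1-Lipschitz convex loss and shows that the one-point estimator's norm scales like d · |ℓ_t| / δ, since the loss value does not shrink with δ.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
x = np.zeros(d)
loss = lambda z: np.linalg.norm(z - np.ones(d))  # illustrative 1-Lipschitz convex loss

for delta in (0.1, 0.01, 0.001):
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    g = (d / delta) * loss(x + delta * u) * u
    print(delta, np.linalg.norm(g))  # norm grows as 1/delta
```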

  20. Gradient Descent with two queries per round (GD2P). Estimates the gradient as g̃_t = (d/2δ)(ℓ_t(x_t + δu_t) − ℓ_t(x_t − δu_t)) u_t. Updates x_{t+1} = Π_{(1−ξ)K}(x_t − η g̃_t). [Diagram: the Player maintains a full-information iterate x_1.]

  21. GD2P (continued). [Diagram: the Player plays the pair {y_{1,1}, y_{1,2}} around x_1.]

  22. GD2P (continued). [Diagram: the Adversary reveals the two values ℓ_1(y_{1,1}) and ℓ_1(y_{1,2}).]

  23. GD2P (continued). [Diagram: the estimate g̃_1 is formed from the two revealed values.]
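A minimal sketch of GD2P under the same illustrative assumption as before: K is the Euclidean ball of the given radius, so projecting onto (1 − ξ)K is a rescaling. Only loss values are used, matching the bandit feedback model.

```python
import numpy as np

def gd2p(loss_fns, x0, eta, delta, xi, radius=1.0, seed=0):
    """Gradient descent with two queries per round (GD2P), sketched for K a ball.

    Each round queries l_t at x_t + delta*u_t and x_t - delta*u_t, forms
    g_t = (d / (2*delta)) * (l_t(y1) - l_t(y2)) * u_t, and takes a projected step.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    plays = []
    for loss in loss_fns:
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                      # uniform direction on the sphere
        y1, y2 = x + delta * u, x - delta * u       # the two queries
        plays.append((y1, y2))
        g = (d / (2 * delta)) * (loss(y1) - loss(y2)) * u   # two-point estimator
        x = x - eta * g
        inner = (1 - xi) * radius                   # project onto (1 - xi)K
        n = np.linalg.norm(x)
        if n > inner:
            x *= inner / n
    return plays
```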

  24. Properties of the gradient estimator g̃_t. Recall g_t = (d/δ) ℓ_t(x_t + δu_t) u_t and g̃_t = (d/2δ)(ℓ_t(x_t + δu_t) − ℓ_t(x_t − δu_t)) u_t. Identical to g_t in expectation: E g̃_t = E g_t. Bounded norm: ‖g̃_t‖ ≤ dG. Indeed, ‖g̃_t‖ = (d/2δ) ‖(ℓ_t(x_t + δu_t) − ℓ_t(x_t − δu_t)) u_t‖.

  25. Properties of the gradient estimator g̃_t (continued). Since ‖u_t‖ = 1, ‖g̃_t‖ = (d/2δ) |ℓ_t(x_t + δu_t) − ℓ_t(x_t − δu_t)|.

  26. Properties of the gradient estimator g̃_t (continued). By G-Lipschitzness, (d/2δ) |ℓ_t(x_t + δu_t) − ℓ_t(x_t − δu_t)| ≤ (dG/2δ) ‖2δu_t‖ = dG.
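The contrast with the one-point estimator can be checked numerically. This demo, again using an illustrative 1-Lipschitz loss (so G = 1), shows the two-point estimator's norm stays below dG no matter how small δ gets.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20
loss = lambda z: np.linalg.norm(z - np.ones(d))  # 1-Lipschitz, so G = 1

worst = 0.0
for delta in (1.0, 0.1, 0.001):
    for _ in range(1000):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        x = rng.standard_normal(d)
        g = (d / (2 * delta)) * (loss(x + delta * u) - loss(x - delta * u)) * u
        worst = max(worst, np.linalg.norm(g))
print(worst, "<=", d * 1.0)  # bounded by d*G, independent of delta
```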

  27. Regret analysis for gradient descent with two queries.
     Assumptions: bounded non-empty set, rB ⊆ K ⊆ DB; Lipschitz losses, |ℓ_t(x) − ℓ_t(y)| ≤ G‖x − y‖ for all x, y ∈ K and all t; σ_t-strong convexity, ℓ_t(y) ≥ ℓ_t(x) + ⟨∇ℓ_t(x), y − x⟩ + (σ_t/2)‖x − y‖².
     Theorem. Under the above assumptions, let σ_1 > 0. If the GD2P algorithm is run with η_t = 1/σ_{1:t}, δ = log T / T and ξ = δ/r, then for any x ∈ K,
     E Σ_{t=1}^T (1/2)(ℓ_t(y_{t,1}) + ℓ_t(y_{t,2})) − E Σ_{t=1}^T ℓ_t(x) ≤ (d²G²/2) Σ_{t=1}^T 1/σ_{1:t} + G (3 + D/r) log T.

  28. Regret bound for convex, Lipschitz functions.
     Corollary. Suppose the set K is bounded and non-empty, and ℓ_t is convex and G-Lipschitz for all t. If the GD2P algorithm is run with η_t = 1/√T, δ = log T / T and ξ = δ/r, then
     E Σ_{t=1}^T (1/2)(ℓ_t(y_{t,1}) + ℓ_t(y_{t,2})) − min_{x∈K} E Σ_{t=1}^T ℓ_t(x) ≤ (d²G² + D²) √T + G (3 + D/r) log T.
     Optimal due to the matching lower bound in the full-information setup. The bound also holds with high probability against adaptive adversaries.

  29. Regret bound for strongly convex, Lipschitz functions.
     Corollary. Suppose the set K is bounded and non-empty, and ℓ_t is σ-strongly convex and G-Lipschitz for all t. If the GD2P algorithm is run with η_t = 1/(σt), δ = log T / T and ξ = δ/r, then
     E Σ_{t=1}^T (1/2)(ℓ_t(y_{t,1}) + ℓ_t(y_{t,2})) − min_{x∈K} E Σ_{t=1}^T ℓ_t(x) ≤ (d²G²/σ + G (3 + D/r)) log T.
     Optimal due to the matching lower bound in the full-information setup.

  30. Extension to other gradient estimators.
     Conditions: Bounded exploration (BE): ‖x_t − y_{t,i}‖ ≤ δ. Bounded gradient estimator (BG): ‖g̃_t‖ ≤ G_1. Approximately unbiased (AU): ‖E_t g̃_t − ∇ℓ_t(x_t)‖ ≤ cδ.
     Theorem. Let K be bounded and non-empty, and let ℓ_t be σ_t-strongly convex with σ_1 > 0. For any gradient estimator satisfying the above conditions, the regret of the GD2P algorithm is bounded as:
     E Σ_{t=1}^T (1/2)(ℓ_t(y_{t,1}) + ℓ_t(y_{t,2})) − E Σ_{t=1}^T ℓ_t(x) ≤ (G_1²/2) Σ_{t=1}^T 1/σ_{1:t} + G (1 + 2c + D/r) log T.

  31. Analysis of other estimators for smooth functions. Need to establish conditions (BE), (BG) and (AU). Smoothness assumption: ℓ_t(y) ≤ ℓ_t(x) + ⟨∇ℓ_t(x), y − x⟩ + (L/2)‖x − y‖². Examples: squared ℓ_p norm ‖x − θ‖_p² for p ≥ 2; quadratic loss (y − wᵀx)² for bounded x; logistic loss log(1 + exp(−wᵀx)). [Figure: a smooth loss ℓ(x).]

  32. A Randomized Coordinate Descent algorithm. Pick a coordinate i_t ∈ {1, …, d} uniformly at random. Play y_{t,1} = x_t + δ e_{i_t} and y_{t,2} = x_t − δ e_{i_t}. Set g̃_t = (d/2δ)(ℓ_t(y_{t,1}) − ℓ_t(y_{t,2})) e_{i_t}.

  33. A Randomized Coordinate Descent algorithm (continued). (AU) holds: ‖E_t g̃_t − ∇ℓ_t(x_t)‖ ≤ √d L δ / 4. Same regret bound as before, with 1-dimensional gradient updates.
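A sketch of the coordinate estimator from slide 32, which can be plugged into the same projected-descent loop shown earlier; the function name is illustrative.

```python
import numpy as np

def coordinate_two_point_estimator(loss, x, delta, rng):
    """Pick i uniformly from {1, ..., d}, query l at x +/- delta * e_i, and return
    g = (d / (2*delta)) * (l(x + delta*e_i) - l(x - delta*e_i)) * e_i."""
    d = len(x)
    i = rng.integers(d)                 # uniformly random coordinate
    e = np.zeros(d)
    e[i] = 1.0
    return (d / (2 * delta)) * (loss(x + delta * e) - loss(x - delta * e)) * e
```

Each round still makes two queries, but the resulting update touches only one coordinate of x_t.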
