SLIDE 1
Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback
Alekh Agarwal (UC Berkeley), Ofer Dekel (Microsoft Research), Lin Xiao (Microsoft Research)
SLIDE 2
Online Convex Optimization (Full-Info)
SLIDE 3
Online Convex Optimization (Full-Info)
[Figure: the player picks a point x1 in the convex set K]
SLIDE 4
Online Convex Optimization (Full-Info)
[Figure: the adversary picks a convex loss ℓ1 on K and reveals it to the player]
SLIDE 5
Online Convex Optimization (Full-Info)
Player updates xt+1 = ΠK(xt − η∇ℓt(xt)).
[Figure: the player observes ∇ℓ1(x1) and takes a projected gradient step from x1 to x2]
SLIDE 6
Online Convex Optimization (Full-Info)
[Figure: after T rounds the player has played x1, x2, . . . , xT against the losses ℓ1, . . . , ℓT]
Minimize regret: RT = Σ_{t=1}^{T} ℓt(xt) − min_{x∈K} Σ_{t=1}^{T} ℓt(x).
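To make the protocol concrete, here is a minimal sketch of projected online gradient descent and the regret computation, assuming K is a Euclidean ball of radius R (so ΠK has a closed form) and an oblivious adversary playing quadratic losses; all names and the toy instance are illustrative, not from the talk.

```python
import numpy as np

def project_ball(x, R):
    """Euclidean projection onto the radius-R ball (our assumed K)."""
    n = np.linalg.norm(x)
    return x if n <= R else (R / n) * x

def ogd(grads, d, T, R, eta):
    """Projected online gradient descent: x_{t+1} = Pi_K(x_t - eta * grad_t(x_t))."""
    x = np.zeros(d)
    played = []
    for t in range(T):
        played.append(x)
        x = project_ball(x - eta * grads[t](x), R)
    return played

# Toy adversary: quadratic losses l_t(x) = 0.5 * ||x - theta_t||^2.
rng = np.random.default_rng(0)
d, T, R = 5, 200, 1.0
thetas = [project_ball(rng.normal(size=d), R) for _ in range(T)]
losses = [lambda x, th=th: 0.5 * np.sum((x - th) ** 2) for th in thetas]
grads = [lambda x, th=th: x - th for th in thetas]

played = ogd(grads, d, T, R, eta=1.0 / np.sqrt(T))
x_star = project_ball(np.mean(thetas, axis=0), R)  # minimizer of the summed losses over K
regret = sum(l(x) for l, x in zip(losses, played)) - sum(l(x_star) for l in losses)
print(f"R_T = {regret:.3f}")
```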
SLIDE 7
Bandit Convex Optimization
SLIDE 8
Bandit Convex Optimization
[Figure: the player picks a point x1 in the convex set K]
SLIDE 9
Bandit Convex Optimization
[Figure: the adversary picks ℓ1, but now reveals only the single value ℓ1(x1)]
SLIDE 10
Bandit Gradient Descent [FKM’05]
[Figure: the algorithm maintains an internal point x1 ∈ K, shown alongside the full-info trajectory]
SLIDE 11
Bandit Gradient Descent [FKM’05]
[Figure: the player actually plays a randomly perturbed point y1 near x1]
SLIDE 12
Bandit Gradient Descent [FKM’05]
[Figure: the player observes only the value ℓ1(y1)]
SLIDE 13
Bandit Gradient Descent [FKM’05]
Updates xt+1 = Π(1−ξ)K(xt − ηtgt).
[Figure: the observed value yields the one-point gradient estimate g1, which drives the projected update]
Minimize regret: RT = Σ_{t=1}^{T} ℓt(yt) − min_{x∈K} Σ_{t=1}^{T} ℓt(x).
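Below is a sketch of one round of this bandit loop, under the same assumptions as the earlier snippet (K a Euclidean ball of radius r, so projecting onto the shrunken set (1 − ξ)K is just a rescaling); the shrinkage keeps the perturbed query yt = xt + δut inside K. Names are illustrative.

```python
import numpy as np

def fkm_round(x, loss_t, d, r, eta_t, delta, xi, rng):
    """One round of bandit gradient descent [FKM'05]:
    play y_t = x_t + delta*u_t, observe only l_t(y_t),
    form g_t = (d/delta) * l_t(y_t) * u_t, then project onto (1 - xi)K."""
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                 # u_t uniform on the unit sphere
    y = x + delta * u                      # the point actually played
    g = (d / delta) * loss_t(y) * u        # one-point gradient estimate
    x_new = x - eta_t * g
    shrink = (1 - xi) * r                  # radius of the shrunken set (1 - xi)K
    n = np.linalg.norm(x_new)
    if n > shrink:
        x_new = (shrink / n) * x_new       # projection Pi_{(1-xi)K}
    return x_new, y
```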
SLIDE 14
A survey of known regret bounds
             Linear             Convex             Strongly Convex
             Upper    Lower     Upper    Lower     Upper      Lower
Full-Info    O(√T)    O(√T)     O(√T)    O(√T)     O(log T)   O(log T)
Deterministic results against completely adaptive adversaries in Full-Info.
SLIDE 15
A survey of known regret bounds
             Linear             Convex               Strongly Convex
             Upper    Lower     Upper      Lower     Upper      Lower
Full-Info    O(√T)    O(√T)     O(√T)      O(√T)     O(log T)   O(log T)
Bandit       O(√T)    O(√T)     O(T^(3/4)) O(√T)     O(T^(2/3)) O(√T)?
Deterministic results against completely adaptive adversaries in Full-Info. High probability results against adaptive adversaries for Bandit.
SLIDE 16
The Multi-Point (MP) feedback setup
Want to interpolate between bandit and full information.
The player is allowed k queries per round.
The adversary reveals the value of ℓt at all points picked.
Average regret on the points played:
RT = Σ_{t=1}^{T} (1/k) Σ_{i=1}^{k} ℓt(yt,i) − min_{x∈K} Σ_{t=1}^{T} ℓt(x).
SLIDE 17
A survey of known regret bounds
             Linear             Convex               Strongly Convex
             Upper    Lower     Upper      Lower     Upper      Lower
Full-Info    O(√T)    O(√T)     O(√T)      O(√T)     O(log T)   O(log T)
Bandit       O(√T)    O(√T)     O(T^(3/4)) O(√T)     O(T^(2/3)) O(√T)?
MP Bandit    O(√T)    O(√T)     O(√T)      O(√T)     O(log T)   O(log T)
Deterministic results against completely adaptive adversaries in Full-Info. High probability results against adaptive adversaries for Bandit.
SLIDE 18
Properties of gradient estimator gt [FKM’05]
gt = (d/δ) ℓt(xt + δut) ut.
Unbiased for linear functions. Nearly unbiased for general convex functions.
[Figure: ℓt in one dimension, with query points xt − δ and xt + δ around xt]
SLIDE 19
Properties of gradient estimator gt [FKM’05]
gt = (d/δ) ℓt(xt + δut) ut.
Unbiased for linear functions. Nearly unbiased for general convex functions.
[Figure: ℓt over an interval of width 2δ around xt, with values ℓt(xt − δ) and ℓt(xt + δ)]
Regret bounds scale with ‖gt‖, and ‖gt‖ grows as 1/δ.
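A quick numerical check of both claims on a toy quadratic in d = 3 (an illustrative instance, not from the talk): the sample mean of gt approximately recovers the true gradient, while each individual gt has norm on the order of d/δ.

```python
import numpy as np

rng = np.random.default_rng(1)
d, delta, n = 3, 0.01, 200_000
x = np.array([0.3, -0.2, 0.5])
loss = lambda z: 0.5 * np.sum(z ** 2) + z[0]   # toy convex loss
true_grad = x + np.array([1.0, 0.0, 0.0])

us = rng.normal(size=(n, d))
us /= np.linalg.norm(us, axis=1, keepdims=True)          # u_t uniform on the sphere
vals = np.array([loss(x + delta * u) for u in us])
gs = (d / delta) * vals[:, None] * us                    # one-point estimates g_t

print("mean of g_t:  ", np.round(gs.mean(axis=0), 2))    # close to the true gradient
print("true gradient:", true_grad)
print("mean ||g_t||: ", np.linalg.norm(gs, axis=1).mean())  # blows up like 1/delta
```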
SLIDE 20
Gradient Descent Algorithm with two queries per round (GD2P)
Estimates gradient g̃t = (d/(2δ)) (ℓt(xt + δut) − ℓt(xt − δut)) ut.
Updates xt+1 = Π(1−ξ)K(xt − η g̃t).
[Figure: the algorithm maintains an internal point x1 ∈ K]
SLIDE 21
Gradient Descent Algorithm with two queries per round (GD2P)
Estimates gradient g̃t = (d/(2δ)) (ℓt(xt + δut) − ℓt(xt − δut)) ut.
Updates xt+1 = Π(1−ξ)K(xt − η g̃t).
[Figure: the player plays the pair of points y1,1 and y1,2 on either side of x1]
SLIDE 22
Gradient Descent Algorithm with two queries per round (GD2P)
Estimates gradient g̃t = (d/(2δ)) (ℓt(xt + δut) − ℓt(xt − δut)) ut.
Updates xt+1 = Π(1−ξ)K(xt − η g̃t).
[Figure: the adversary reveals the two values ℓ1(y1,1) and ℓ1(y1,2)]
SLIDE 23
Gradient Descent Algorithm with two queries per round (GD2P)
Estimates gradient g̃t = (d/(2δ)) (ℓt(xt + δut) − ℓt(xt − δut)) ut.
Updates xt+1 = Π(1−ξ)K(xt − η g̃t).
[Figure: the two revealed values determine the gradient estimate g̃1, which drives the projected update]
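A minimal sketch of one GD2P round, under the same ball-shaped-K assumption as the earlier snippets; the only change from the one-point loop is that both perturbed points are played and the symmetric difference replaces the single function value.

```python
import numpy as np

def gd2p_round(x, loss_t, d, r, eta, delta, xi, rng):
    """One round of GD2P: play y1 = x + delta*u and y2 = x - delta*u,
    set g = (d/(2*delta)) * (l(y1) - l(y2)) * u, project onto (1 - xi)K."""
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)
    y1, y2 = x + delta * u, x - delta * u            # the two queried points
    g = (d / (2 * delta)) * (loss_t(y1) - loss_t(y2)) * u
    x_new = x - eta * g
    shrink = (1 - xi) * r
    n = np.linalg.norm(x_new)
    if n > shrink:
        x_new = (shrink / n) * x_new                 # projection Pi_{(1-xi)K}
    return x_new, 0.5 * (loss_t(y1) + loss_t(y2))    # average loss this round
```

For a G-Lipschitz ℓt the difference |ℓt(y1) − ℓt(y2)| is at most 2δG, so the 1/(2δ) factor cancels and ‖g̃t‖ stays bounded by dG no matter how small δ is; the next slides make this precise.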
SLIDE 24
Properties of the gradient estimator g̃t
gt = (d/δ) ℓt(xt + δut) ut,  g̃t = (d/(2δ)) (ℓt(xt + δut) − ℓt(xt − δut)) ut.
Identical to gt in expectation: E g̃t = E gt.
Bounded norm: ‖g̃t‖ ≤ dG.
‖g̃t‖ = ‖(d/(2δ)) (ℓt(xt + δut) − ℓt(xt − δut)) ut‖
SLIDE 25
Properties of the gradient estimator g̃t
gt = (d/δ) ℓt(xt + δut) ut,  g̃t = (d/(2δ)) (ℓt(xt + δut) − ℓt(xt − δut)) ut.
Identical to gt in expectation: E g̃t = E gt.
Bounded norm: ‖g̃t‖ ≤ dG.
‖g̃t‖ = ‖(d/(2δ)) (ℓt(xt + δut) − ℓt(xt − δut)) ut‖ = (d/(2δ)) |ℓt(xt + δut) − ℓt(xt − δut)|
SLIDE 26
Properties of the gradient estimator g̃t
gt = (d/δ) ℓt(xt + δut) ut,  g̃t = (d/(2δ)) (ℓt(xt + δut) − ℓt(xt − δut)) ut.
Identical to gt in expectation: E g̃t = E gt.
Bounded norm: ‖g̃t‖ ≤ dG.
‖g̃t‖ = ‖(d/(2δ)) (ℓt(xt + δut) − ℓt(xt − δut)) ut‖ = (d/(2δ)) |ℓt(xt + δut) − ℓt(xt − δut)| ≤ (dG/(2δ)) ‖2δut‖ = dG.
SLIDE 27
Regret analysis for gradient descent with two queries
Bounded non-empty set: rB ⊆ K ⊆ DB.
Lipschitz loss functions: |ℓt(x) − ℓt(y)| ≤ G‖x − y‖ for all x, y ∈ K and all t.
σt-strong convexity: ℓt(y) ≥ ℓt(x) + ⟨∇ℓt(x), y − x⟩ + (σt/2)‖x − y‖².
Theorem. Under the above assumptions, let σ1 > 0. If the GD2P algorithm is run with ηt = 1/σ1:t (where σ1:t = σ1 + · · · + σt), δ = log T / T and ξ = δ/r, then for any x ∈ K,
E Σ_{t=1}^{T} (1/2)(ℓt(yt,1) + ℓt(yt,2)) − E Σ_{t=1}^{T} ℓt(x) ≤ (d²G²/2) Σ_{t=1}^{T} 1/σ1:t + G log(T) (3 + D/r).
SLIDE 28
Regret bound for convex, Lipschitz functions
Corollary. Suppose the set K is bounded and non-empty, and ℓt is convex and G-Lipschitz for all t. If the GD2P algorithm is run with ηt = 1/√T, δ = log T / T and ξ = δ/r, then
E Σ_{t=1}^{T} (1/2)(ℓt(yt,1) + ℓt(yt,2)) − min_{x∈K} E Σ_{t=1}^{T} ℓt(x) ≤ (d²G² + D²)√T + G log(T) (3 + D/r).
Optimal due to matching lower bound in full-information setup. Bound also holds with high probability for adaptive adversaries.
SLIDE 29
Regret bound for strongly convex, Lipschitz functions
Corollary. Suppose the set K is bounded and non-empty, and ℓt is σ-strongly convex and G-Lipschitz for all t. If the GD2P algorithm is run with ηt = 1/(σt), δ = log T / T and ξ = δ/r, then
E Σ_{t=1}^{T} (1/2)(ℓt(yt,1) + ℓt(yt,2)) − min_{x∈K} E Σ_{t=1}^{T} ℓt(x) ≤ G log(T) (d²G/σ + 3 + D/r).
Optimal due to matching lower bound in full-information setup.
SLIDE 30
Extension to other gradient estimators
Bounded exploration (BE): ‖xt − yt,i‖ ≤ δ.
Bounded gradient estimator (BG): ‖g̃t‖ ≤ G1.
Approximately unbiased (AU): ‖Et g̃t − ∇ℓt(xt)‖ ≤ cδ.
Theorem. Let K be bounded and non-empty, and let ℓt be σt-strongly convex with σ1 > 0. For any gradient estimator satisfying the above conditions, the regret of the GD2P algorithm is bounded as:
E Σ_{t=1}^{T} (1/2)(ℓt(yt,1) + ℓt(yt,2)) − E Σ_{t=1}^{T} ℓt(x) ≤ (G1²/2) Σ_{t=1}^{T} 1/σ1:t + G log(T) (1 + 2c + D/r).
SLIDE 31
Analysis of other estimators for smooth functions
Need to establish conditions (BE), (BG) and (AU).
Smoothness assumption: ℓt(y) ≤ ℓt(x) + ⟨∇ℓt(x), y − x⟩ + (L/2)‖x − y‖².
Examples (written out in code below):
Squared ℓp norm ‖x − θ‖p² for p ≥ 2.
Quadratic loss (y − wᵀx)² for bounded x.
Logistic loss log(1 + exp(−wᵀx)).
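For concreteness, the three example losses as plain functions (a sketch; the parameters θ, w, y are illustrative placeholders, not from the talk):

```python
import numpy as np

theta = np.array([1.0, -1.0])          # illustrative parameters
w, y = np.array([0.5, 2.0]), 1.0

sq_pnorm = lambda x, p=4.0: np.sum(np.abs(x - theta) ** p) ** (2.0 / p)  # ||x - theta||_p^2, p >= 2
quadratic = lambda x: (y - w @ x) ** 2                                   # (y - w^T x)^2
logistic = lambda x: np.log1p(np.exp(-(w @ x)))                          # log(1 + exp(-w^T x))
```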
SLIDE 32
A Randomized Co-ordinate Descent algorithm
Pick a coordinate it ∈ {1, . . . , d} uniformly at random.
Play yt,1 = xt + δ e_{it}, yt,2 = xt − δ e_{it}.
Set g̃t = (d/(2δ)) (ℓt(yt,1) − ℓt(yt,2)) e_{it}.
SLIDE 33
A Randomized Co-ordinate Descent algorithm
Pick a coordinate it ∈ {1, . . . , d} uniformly at random.
Play yt,1 = xt + δ e_{it}, yt,2 = xt − δ e_{it}.
Set g̃t = (d/(2δ)) (ℓt(yt,1) − ℓt(yt,2)) e_{it}.
(AU) holds: ‖Et g̃t − ∇ℓt(xt)‖ ≤ (√d L δ)/4.
Same regret bound as before, with one-dimensional gradient updates.
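A sketch of the resulting estimator (illustrative names as before); it perturbs along a single standard basis vector rather than a random sphere direction, so each round touches only one coordinate of the gradient.

```python
import numpy as np

def coord_estimate(x, loss_t, d, delta, rng):
    """Pick i_t uniformly from {1, ..., d}, query x +/- delta * e_{i_t},
    return g_t = (d/(2*delta)) * (l(y1) - l(y2)) * e_{i_t}."""
    i = rng.integers(d)                      # random coordinate i_t
    e = np.zeros(d)
    e[i] = 1.0
    y1, y2 = x + delta * e, x - delta * e    # the two queried points
    return (d / (2 * delta)) * (loss_t(y1) - loss_t(y2)) * e
```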
SLIDE 34
Extension to completely adaptive adversaries
Previously we needed ℓt to be independent of xt; randomization is futile if ℓt depends on xt.
Can satisfy (AU) deterministically with d + 1 queries.
Deterministic first- and second-order algorithms for smooth loss functions.
Play the points xt and xt + δei for i = 1, . . . , d.
Set g̃t = (1/δ) Σ_{i=1}^{d} (ℓt(xt + δei) − ℓt(xt)) ei.
Satisfies (BE), (BG) and (AU): ‖g̃t‖ ≤ dG and ‖g̃t − ∇ℓt(xt)‖ ≤ (√d L δ)/2.
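A sketch of the deterministic estimator (illustrative names): forward differences along every coordinate, using d + 1 queries and no randomness, which is what makes it safe against an adversary whose ℓt depends on xt.

```python
import numpy as np

def forward_diff_estimate(x, loss_t, d, delta):
    """Query x and x + delta*e_i for i = 1..d; return
    g_t = (1/delta) * sum_i (l(x + delta*e_i) - l(x)) * e_i."""
    base = loss_t(x)                              # the (d+1)-th query, at x itself
    g = np.zeros(d)
    for i in range(d):
        step = np.zeros(d)
        step[i] = delta
        g[i] = (loss_t(x + step) - base) / delta  # forward difference along e_i
    return g
```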
SLIDE 35
Regret bounds for d + 1 queries
O(√T) regret for smooth, convex functions.
O(log T) regret for smooth, strongly convex functions.
O(log T) regret for smooth, exp-concave functions using a quasi-Newton variant.
Matches the lower bounds from the full-information setup.
Regret bounds hold for completely adaptive adversaries.
SLIDE 36
Conclusion
Introduced the multi-point feedback model for partial information.
Optimal regret with high probability against adaptive adversaries, using just 2 queries per round.
Handled completely adaptive adversaries using d + 1 queries.
Open questions:
One-point bandit feedback.
A √T lower bound for bandit strongly convex losses.
Distributions over the number of queries.
High-probability log(T) regret for strongly convex losses.
SLIDE 37