Optimal Algorithms for Online Convex Optimization with Multi-Point - - PowerPoint PPT Presentation



SLIDE 1

Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback

Alekh Agarwal (UC Berkeley), Ofer Dekel and Lin Xiao (Microsoft Research)

SLIDE 2

Online Convex Optimization (Full-Info)

[Figure: a repeated game between an Adversary and a Player over a convex set K]


SLIDE 5

Online Convex Optimization (Full-Info)

Player updates xt+1 = Π_K(xt − η∇ℓt(xt)).

SLIDE 6

Online Convex Optimization (Full-Info)

Minimize regret: RT = ∑_{t=1}^T ℓt(xt) − min_{x∈K} ∑_{t=1}^T ℓt(x).
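The full-information protocol above can be sketched in code. This is a minimal illustration (not from the slides), assuming K is the unit Euclidean ball; the names `project_ball` and `ogd` are hypothetical:

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto K = {x : ||x||_2 <= radius} (assumed shape of K)."""
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

def ogd(grads, x0, eta):
    """Online gradient descent: play x_t, then update x_{t+1} = Pi_K(x_t - eta * grad_t(x_t)).
    `grads` is the sequence of gradient oracles revealed by the adversary each round."""
    x = np.asarray(x0, dtype=float)
    played = []
    for grad in grads:
        played.append(x)
        x = project_ball(x - eta * grad(x))
    return played
```

With quadratic losses ℓt(x) = ‖x − θt‖², the iterates drift toward the best fixed comparator, which is the average of the θt.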

SLIDE 7

Bandit Convex Optimization

Same repeated game, but after playing xt the player observes only the scalar value ℓt(xt), not the full loss function ℓt.


SLIDE 10

Bandit Gradient Descent [FKM’05]

The player maintains a point xt but plays a randomly perturbed nearby point yt, observing only ℓt(yt).


SLIDE 13

Bandit Gradient Descent [FKM’05]

Updates xt+1 = Π_{(1−ξ)K}(xt − ηt gt), where gt is a gradient estimate built from the single observed value ℓt(yt).

Minimize regret: RT = ∑_{t=1}^T ℓt(yt) − min_{x∈K} ∑_{t=1}^T ℓt(x).
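The one-point gradient estimate behind this update (defined on a later slide as gt = (d/δ) ℓt(xt + δut) ut) can be sketched as follows. Names are illustrative and K is again assumed to be the unit ball:

```python
import numpy as np

def one_point_grad(loss, x, delta, rng):
    """FKM one-point estimate: g = (d/delta) * loss(x + delta*u) * u,
    where u is drawn uniformly from the unit sphere."""
    d = len(x)
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)          # uniform direction on the sphere
    return (d / delta) * loss(x + delta * u) * u

def bgd_step(loss, x, eta, delta, xi, rng, radius=1.0):
    """One BGD round: estimate the gradient from a single query and take a
    projected step onto the shrunken set (1 - xi) * K."""
    g = one_point_grad(loss, x, delta, rng)
    z = x - eta * g
    r = (1 - xi) * radius
    n = np.linalg.norm(z)
    return z if n <= r else z * (r / n)
```

For a linear loss the estimate is unbiased, which can be checked by averaging many samples.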

SLIDE 14

A survey of known regret bounds

|           | Linear Upper | Linear Lower | Convex Upper | Convex Lower | Strongly Convex Upper | Strongly Convex Lower |
| Full-Info | O(√T)        | O(√T)        | O(√T)        | O(√T)        | O(log T)              | O(log T)              |

Deterministic results against completely adaptive adversaries in Full-Info.

SLIDE 15

A survey of known regret bounds

|           | Linear Upper | Linear Lower | Convex Upper | Convex Lower | Strongly Convex Upper | Strongly Convex Lower |
| Full-Info | O(√T)        | O(√T)        | O(√T)        | O(√T)        | O(log T)              | O(log T)              |
| Bandit    | O(√T)        | O(√T)        | O(T^{3/4})   | O(√T)        | O(T^{2/3})            | O(√T)?                |

Deterministic results against completely adaptive adversaries in Full-Info. High-probability results against adaptive adversaries for Bandit.

SLIDE 16

The Multi-Point (MP) feedback setup

Want to interpolate between bandit and full information. The player is allowed several (k) queries per round, and the adversary reveals the value of ℓt at all points picked. Average regret on the points played:

RT = ∑_{t=1}^T (1/k) ∑_{i=1}^k ℓt(yt,i) − min_{x∈K} ∑_{t=1}^T ℓt(x).

SLIDE 17

A survey of known regret bounds

|           | Linear Upper | Linear Lower | Convex Upper | Convex Lower | Strongly Convex Upper | Strongly Convex Lower |
| Full-Info | O(√T)        | O(√T)        | O(√T)        | O(√T)        | O(log T)              | O(log T)              |
| Bandit    | O(√T)        | O(√T)        | O(T^{3/4})   | O(√T)        | O(T^{2/3})            | O(√T)?                |
| MP Bandit | O(√T)        | O(√T)        | O(√T)        | O(√T)        | O(log T)              | O(log T)              |

Deterministic results against completely adaptive adversaries in Full-Info. High-probability results against adaptive adversaries for Bandit.

SLIDE 18

Properties of gradient estimator gt [FKM’05]

gt = (d/δ) ℓt(xt + δut) ut. Unbiased for linear functions. Nearly unbiased for general convex functions.

SLIDE 19

Properties of gradient estimator gt [FKM’05]

Regret bounds scale with ‖gt‖, and ‖gt‖ grows as 1/δ.
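The 1/δ blow-up of the one-point estimate is easy to see numerically. A small sketch, with an arbitrarily chosen Lipschitz loss and illustrative names:

```python
import numpy as np

def avg_estimator_norm(delta, n=2000, d=3, seed=1):
    """Average norm of the one-point estimate g = (d/delta) * loss(x + delta*u) * u
    for a loss whose value at the query point is bounded away from zero."""
    rng = np.random.default_rng(seed)
    loss = lambda y: 1.0 + y[0]          # arbitrary Lipschitz loss with loss(0) = 1
    x = np.zeros(d)
    total = 0.0
    for _ in range(n):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        g = (d / delta) * loss(x + delta * u) * u
        total += np.linalg.norm(g)
    return total / n
```

Shrinking δ by a factor of 100 inflates the average estimator norm by roughly the same factor, which is exactly the bias–variance tension the two-point scheme removes.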

SLIDE 20

Gradient Descent Algorithm with two queries per round (GD2P)

Estimates the gradient: g̃t = (d/2δ)(ℓt(xt + δut) − ℓt(xt − δut)) ut.

Updates xt+1 = Π_{(1−ξ)K}(xt − η g̃t).


SLIDE 26

Properties of the gradient estimator g̃t

gt = (d/δ) ℓt(xt + δut) ut,  g̃t = (d/2δ)(ℓt(xt + δut) − ℓt(xt − δut)) ut.

Identical to gt in expectation: E g̃t = E gt. Bounded norm: ‖g̃t‖ ≤ dG, since

‖g̃t‖ = (d/2δ)|ℓt(xt + δut) − ℓt(xt − δut)| ≤ (d/2δ) G‖2δut‖ = Gd.
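The chain above can be checked numerically for a G-Lipschitz loss; a small sketch with arbitrarily chosen constants:

```python
import numpy as np

rng = np.random.default_rng(3)
G, d, delta = 2.0, 5, 0.1                 # arbitrary illustrative values
loss = lambda y: G * np.linalg.norm(y)    # G-Lipschitz on all of R^d

# Norms of the two-point estimate at random base points and directions.
norms = []
for _ in range(2000):
    x = rng.normal(size=d)
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)
    g = (d / (2 * delta)) * (loss(x + delta * u) - loss(x - delta * u)) * u
    norms.append(np.linalg.norm(g))
```

Every sampled norm respects the bound ‖g̃t‖ ≤ Gd, with no dependence on δ.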

SLIDE 27

Regret analysis for gradient descent with two queries

Bounded non-empty set: rB ⊆ K ⊆ DB.
Lipschitz loss functions: |ℓt(x) − ℓt(y)| ≤ G‖x − y‖, ∀x, y ∈ K, ∀t.
σt-strong convexity: ℓt(y) ≥ ℓt(x) + ⟨∇ℓt(x), y − x⟩ + (σt/2)‖x − y‖².

Theorem. Under the above assumptions, let σ1 > 0. If the GD2P algorithm is run with ηt = 1/σ1:t, δ = log T / T and ξ = δ/r, then for any x ∈ K,

E ∑_{t=1}^T ½(ℓt(yt,1) + ℓt(yt,2)) − E ∑_{t=1}^T ℓt(x) ≤ (d²G²/2) ∑_{t=1}^T 1/σ1:t + G log T (3 + D/r).
SLIDE 28

Regret bound for convex, Lipschitz functions

Corollary. Suppose the set K is bounded and non-empty, and ℓt is convex and G-Lipschitz for all t. If the GD2P algorithm is run with ηt = 1/√T, δ = log T / T and ξ = δ/r, then

E ∑_{t=1}^T ½(ℓt(yt,1) + ℓt(yt,2)) − min_{x∈K} E ∑_{t=1}^T ℓt(x) ≤ (d²G² + D²)√T + G log T (3 + D/r).

Optimal due to matching lower bound in full-information setup. Bound also holds with high probability for adaptive adversaries.

SLIDE 29

Regret bound for strongly convex, Lipschitz functions

Corollary. Suppose the set K is bounded and non-empty, and ℓt is σ-strongly convex and G-Lipschitz for all t. If the GD2P algorithm is run with ηt = 1/(σt), δ = log T / T and ξ = δ/r, then

E ∑_{t=1}^T ½(ℓt(yt,1) + ℓt(yt,2)) − min_{x∈K} E ∑_{t=1}^T ℓt(x) ≤ G log T (d²G/σ + 3 + D/r).

Optimal due to matching lower bound in full-information setup.

SLIDE 30

Extension to other gradient estimators

Bounded exploration (BE): ‖xt − yt,i‖ ≤ δ.
Bounded gradient estimator (BG): ‖g̃t‖ ≤ G1.
Approximately unbiased (AU): ‖Et g̃t − ∇ℓt(xt)‖ ≤ cδ.

Theorem. Let K be bounded and non-empty, and let ℓt be σt-strongly convex with σ1 > 0. For any gradient estimator satisfying the above conditions, the regret of the GD2P algorithm is bounded as:

E ∑_{t=1}^T ½(ℓt(yt,1) + ℓt(yt,2)) − E ∑_{t=1}^T ℓt(x) ≤ (G1²/2) ∑_{t=1}^T 1/σ1:t + G log T (1 + 2c + D/r).
SLIDE 31

Analysis of other estimators for smooth functions

Need to establish conditions (BE), (BG) and (AU). Smoothness assumption: ℓt(y) ≤ ℓt(x) + ⟨∇ℓt(x), y − x⟩ + (L/2)‖x − y‖².

Examples: squared ℓp norm ‖x − θ‖p² for p ≥ 2; quadratic loss (y − wᵀx)² for bounded x; logistic loss log(1 + exp(−wᵀx)).

SLIDE 33

A Randomized Co-ordinate Descent algorithm

Pick a co-ordinate it ∈ {1, . . . , d} uniformly at random. Play yt,1 = xt + δe_it and yt,2 = xt − δe_it. Set g̃t = (d/2δ)(ℓt(yt,1) − ℓt(yt,2)) e_it.

(AU) holds: ‖Et g̃t − ∇ℓt(xt)‖ ≤ √d Lδ/4. Same regret bound as before, with one-dimensional gradient updates.
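The coordinate estimator can be sketched as follows (illustrative names). For a quadratic loss the even-order terms cancel and the estimate is exactly unbiased, which makes a convenient sanity check:

```python
import numpy as np

def coord_grad(loss, x, delta, rng):
    """Pick i in {1, ..., d} uniformly at random, query loss at x +/- delta*e_i,
    and return (d / (2*delta)) * (loss(y1) - loss(y2)) * e_i."""
    d = len(x)
    i = rng.integers(d)
    e = np.zeros(d)
    e[i] = 1.0
    return (d / (2 * delta)) * (loss(x + delta * e) - loss(x - delta * e)) * e
```

Each round touches a single coordinate, yet the average over rounds recovers the full gradient.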

SLIDE 34

Extension to completely adaptive adversaries

Previously needed ℓt independent of xt; randomization is futile if ℓt depends on xt. Can satisfy (AU) deterministically with d + 1 queries, giving deterministic first- and second-order algorithms for smooth loss functions. Play the points xt and xt + δei for i = 1, . . . , d, and set

g̃t = (1/δ) ∑_{i=1}^d (ℓt(xt + δei) − ℓt(xt)) ei.

This satisfies (BE), (BG) and (AU): ‖g̃t‖ ≤ dG and ‖g̃t − ∇ℓt(xt)‖ ≤ √d Lδ/2.
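The d + 1-query estimator is a standard forward-difference scheme; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def forward_diff_grad(loss, x, delta):
    """Deterministic estimate from d + 1 queries:
    g = (1/delta) * sum_i (loss(x + delta*e_i) - loss(x)) * e_i."""
    d = len(x)
    base = loss(x)
    g = np.zeros(d)
    for i in range(d):
        step = np.zeros(d)
        step[i] = delta
        g[i] = (loss(x + step) - base) / delta
    return g
```

Because no randomness is used, the (AU) condition holds deterministically, which is what allows the completely adaptive adversary.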

SLIDE 35

Regret bounds for d + 1 queries

O(√T) regret for smooth, convex functions.
O(log T) regret for smooth, strongly convex functions.
O(log T) regret for smooth, exp-concave functions using a quasi-Newton variant.
Matches lower bounds from the full-information setup. Regret bounds hold for completely adaptive adversaries.

SLIDE 36

Conclusion

Introduced the multi-point feedback model for partial information. Optimal regret with high probability against adaptive adversaries using just 2 queries per round; completely adaptive adversaries handled with d + 1 queries. Open questions:

One-point bandit feedback. A √T lower bound for bandit strongly convex optimization. Distributions over the number of queries. High-probability log T regret for strongly convex losses.

SLIDE 37

Thank You