

SLIDE 1

Bayesian Optimization under Heavy-tailed Payoffs

Sayak Ray Chowdhury

Joint work with Aditya Gopalan

Department of ECE, Indian Institute of Science

NeurIPS, Dec. 2019

SLIDE 2

Black-box optimization

[Figure: an example utility function $f(x)$ plotted over the domain $D$]

Problem: Maximize an unknown utility function $f : D \to \mathbb{R}$ by

  • sequentially querying $f$ at inputs $x_1, x_2, \ldots, x_T$, and
  • observing noisy function evaluations $y_t = f(x_t) + \epsilon_t$.

Want: Low cumulative regret

  $R_T = \sum_{t=1}^{T} \left( f(x^\star) - f(x_t) \right)$, where $x^\star \in \arg\max_{x \in D} f(x)$.
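To make the protocol concrete, here is a minimal sketch in Python; the toy utility $f$, the discretized domain, and the uniformly random query rule are illustrative assumptions (a real algorithm chooses $x_t$ from past observations, as in the algorithms on the following slides):

```python
import numpy as np

# Minimal sketch of the query/observe/regret protocol above. The toy f,
# the grid over D, and the random query rule are placeholders; only the
# bookkeeping (noisy payoffs, cumulative regret) mirrors the slide.
rng = np.random.default_rng(0)
D = np.linspace(0.0, 1.0, 200)              # discretized domain D
f = lambda x: np.sin(6 * x) * (1 - x)       # hypothetical unknown utility
x_star = D[np.argmax(f(D))]                 # maximizer (unknown to the learner)

T, regret = 50, 0.0
for t in range(T):
    x_t = rng.choice(D)                     # placeholder query rule
    y_t = f(x_t) + rng.standard_t(df=1.5)   # heavy-tailed noise eps_t
    regret += f(x_star) - f(x_t)            # accumulate f(x*) - f(x_t)
print(f"cumulative regret after T={T} rounds: {regret:.2f}")
```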
SLIDE 3

Heavy-tailed noise

  • e.g., Student's-t, Pareto, Cauchy

Motivation: a significant chance of very high or very low values, as in

  • corrupted measurements,
  • bursty traffic flow distributions,
  • price fluctuations in financial and insurance data.

Existing works assume light-tailed noise (e.g., Srinivas et al. '11, Hernandez-Lobato et al. '14, ...).

Question: Can we design Bayesian optimization algorithms with guarantees under heavy-tailed noise?
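For intuition about what "heavy-tailed" means here, a small sketch comparing tail probabilities under Gaussian versus Pareto and Student's-t noise; the distributions, parameters, and threshold are illustrative choices, not from the slides:

```python
import numpy as np

# Compare how often |X| exceeds a large threshold under light-tailed
# (Gaussian) vs heavy-tailed noise. A Student's-t with 1.5 degrees of
# freedom has a finite mean but infinite variance, so it satisfies
# E|X|^{1+alpha} < infinity only for alpha < 0.5.
rng = np.random.default_rng(0)
n, thresh = 10**6, 10.0
samples = {
    "gaussian": rng.normal(size=n),
    "pareto(a=1.5)": rng.pareto(a=1.5, size=n),          # Lomax-form Pareto
    "student-t(df=1.5)": rng.standard_t(df=1.5, size=n),
}
for name, s in samples.items():
    print(f"{name:>18}: P(|X| > {thresh:g}) ~ {np.mean(np.abs(s) > thresh):.2e}")
```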

SLIDES 4–7

Algorithm 1: Truncated GP-UCB (TGP-UCB)

The unknown function $f$ is modeled by a Gaussian process, $f \sim GP(0, k)$. At round $t$:

1. Choose the query point $x_t$ using the current GP posterior and a suitable parameter $\beta_t$:

   $x_t = \arg\max_{x \in D} \; \mu_{t-1}(x) + \beta_t \sigma_{t-1}(x)$

2. Truncate the observed payoff $y_t$ using a suitable threshold $b_t$:

   $\hat{y}_t = y_t \mathbf{1}\{|y_t| \le b_t\}$

3. Update the GP posterior $(\mu_t, \sigma_t)$ with the new observation $(x_t, \hat{y}_t)$:

   $\mu_t(x) = k_t(x)^\top (K_t + \lambda I)^{-1} [\hat{y}_1, \ldots, \hat{y}_t]^\top$
   $\sigma_t^2(x) = k(x, x) - k_t(x)^\top (K_t + \lambda I)^{-1} k_t(x)$
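The following compact sketch runs the three steps above on a one-dimensional toy problem. The RBF kernel, the $\beta_t$ and $b_t$ schedules, and the utility $f$ are placeholder assumptions; the paper derives specific schedules for its regret guarantee, and this is not the authors' code:

```python
import numpy as np

# Sketch of TGP-UCB: UCB query, payoff truncation, GP posterior update.
rng = np.random.default_rng(1)
D = np.linspace(0.0, 1.0, 100)                    # discretized domain
kern = lambda A, B: np.exp(-0.5 * (A[:, None] - B[None, :])**2 / 0.1**2)
f = lambda x: np.sin(6 * x) * (1 - x)             # hypothetical utility
lam = 1.0
X, y_hat = [], []                                 # queries, truncated payoffs

for t in range(1, 31):
    if X:
        Xa = np.array(X)
        K_inv = np.linalg.inv(kern(Xa, Xa) + lam * np.eye(len(Xa)))
        kx = kern(D, Xa)                          # k_t(x) for every x in D
        mu = kx @ K_inv @ np.array(y_hat)         # posterior mean mu_{t-1}
        var = 1.0 - np.einsum('ij,jk,ik->i', kx, K_inv, kx)  # k(x,x)=1 for RBF
        sigma = np.sqrt(np.maximum(var, 0.0))     # posterior std sigma_{t-1}
    else:
        mu, sigma = np.zeros_like(D), np.ones_like(D)
    beta_t = np.sqrt(2.0 * np.log(100.0 * t))     # placeholder schedule
    x_t = D[np.argmax(mu + beta_t * sigma)]       # step 1: UCB query
    y_t = f(x_t) + rng.standard_t(df=1.5)         # heavy-tailed payoff
    b_t = 2.0 * t**0.25                           # placeholder threshold
    X.append(x_t)
    y_hat.append(y_t if abs(y_t) <= b_t else 0.0) # step 2: truncation
    # step 3: the posterior is recomputed from (X, y_hat) next round
```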

SLIDES 8–11

Regret bounds

Assumption on heavy-tailed payoffs: $\mathbb{E}\left[ |y_t|^{1+\alpha} \right] < +\infty$ for $\alpha \in (0, 1]$.

Algorithm                   Payoff        Regret
GP-UCB (Srinivas et al.)    sub-Gaussian  $O(\gamma_T T^{1/2})$
TGP-UCB (this paper)        heavy-tailed  $O(\gamma_T T^{(2+\alpha)/(2(1+\alpha))})$

$\alpha = 1$ ⇒ regret $\tilde{O}(T^{3/4})$.

We also give an $\Omega(T^{1/(1+\alpha)})$ regret lower bound for any algorithm.

Question: Can we achieve $\tilde{O}(T^{1/(1+\alpha)})$ regret scaling?

Answer: YES.
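A quick numeric comparison of the two exponents makes the gap behind this question explicit (illustrative arithmetic only):

```python
# Upper-bound exponent of TGP-UCB, (2+a)/(2(1+a)), vs the lower-bound
# exponent 1/(1+a), for a few values of alpha in (0, 1].
for a in (0.25, 0.5, 0.75, 1.0):
    upper = (2 + a) / (2 * (1 + a))
    lower = 1 / (1 + a)
    print(f"alpha={a:4}: TGP-UCB T^{upper:.3f}  vs  lower bound T^{lower:.3f}")
```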

SLIDES 12–16

Algorithm 2: Adaptively Truncated Approximate GP-UCB (ATA-GP-UCB)

Idea: UCB with kernel approximation + feature adaptive truncation:

  $x_t = \arg\max_{x \in D} \; \tilde{\mu}_{t-1}(x) + \beta_t \tilde{\sigma}_{t-1}(x)$

Kernel approximation: with feature maps $\phi_t$, compute

  $V_t = \sum_{s=1}^{t} \phi_t(x_s) \phi_t(x_s)^\top + \lambda I$   ($m_t$ rows and $m_t$ columns)
  $U_t = V_t^{-1/2} [\phi_t(x_1), \ldots, \phi_t(x_t)]$   ($m_t$ rows and $t$ columns)

Feature adaptive truncation: take the Hadamard (entrywise) product of $U_t$ with the $m_t \times t$ matrix whose every row is $(y_1, \ldots, y_t)$, truncate each entry at threshold $b_t$, and form the truncated row sums $r_1, \ldots, r_{m_t}$:

  $r_i = \sum_{s=1}^{t} u_{is} y_s \mathbf{1}\{|u_{is} y_s| \le b_t\}$   ($u_i$ is the $i$-th row of $U_t$)

Approximate posterior GP:

  $\tilde{\mu}_t(x) = \phi_t(x)^\top V_t^{-1/2} [r_1, \ldots, r_{m_t}]^\top$
  $\tilde{\sigma}_t^2(x) = k(x, x) - \phi_t(x)^\top \phi_t(x) + \lambda \phi_t(x)^\top V_t^{-1} \phi_t(x)$
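The sketch below walks through the kernel-approximation and truncation steps, with random Fourier features standing in for $\phi_t$ (an assumption made here for concreteness; the paper's feature construction may differ) and placeholder sizes and threshold:

```python
import numpy as np

# Feature-adaptive truncation with random Fourier features (RFF) for an
# RBF kernel playing the role of phi_t. Shapes follow the slide:
# V_t is m_t x m_t and U_t is m_t x t. All parameters are placeholders.
rng = np.random.default_rng(2)
m_t, t, lam, b_t = 20, 50, 1.0, 2.0
xs = rng.uniform(0.0, 1.0, size=t)            # queried inputs x_1..x_t
ys = rng.standard_t(df=1.5, size=t)           # heavy-tailed payoffs y_1..y_t

W = rng.normal(scale=1 / 0.1, size=m_t)       # RFF frequencies (lengthscale 0.1)
bias = rng.uniform(0.0, 2 * np.pi, size=m_t)
phi = lambda x: np.sqrt(2.0 / m_t) * np.cos(np.outer(W, x) + bias[:, None])

Phi = phi(xs)                                 # m_t x t feature matrix
V = Phi @ Phi.T + lam * np.eye(m_t)           # V_t
evals, evecs = np.linalg.eigh(V)              # inverse square root of V_t
V_isqrt = evecs @ np.diag(evals**-0.5) @ evecs.T
U = V_isqrt @ Phi                             # U_t, m_t x t

H = U * ys[None, :]                           # Hadamard product u_{is} * y_s
r = np.where(np.abs(H) <= b_t, H, 0.0).sum(axis=1)  # truncated row sums r_i

x = np.array([0.3])                           # approximate posterior at one x
mu = phi(x).T @ V_isqrt @ r                   # tilde-mu_t(x)
var = 1.0 - phi(x).T @ phi(x) + lam * phi(x).T @ np.linalg.inv(V) @ phi(x)
print(f"mu = {mu[0]:.3f}, var = {var[0, 0]:.3f}")   # k(x,x) = 1 for RBF
```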

SLIDE 17

See you at the poster session

Bayesian Optimization under Heavy-tailed Payoffs Poster #11

Tue Dec 10th 05:30 – 07:30 PM @ East Exhibition Hall B + C

Acknowledgements:

1. Tata Trusts travel grant
2. Google India PhD fellowship grant
3. DST Inspire research grant