SLIDE 1

Almost Optimal Algorithms for Linear Stochastic Bandits with Heavy-Tailed Payoffs

Han Shao∗, Xiaotian Yu∗, Irwin King and Michael R. Lyu

Department of Computer Science and Engineering The Chinese University of Hong Kong

NeurIPS, Dec. 2018

SLIDE 2

Linear Stochastic Bandits (LSB)

Previous setting

[Figure: arm vectors x_{1,t}, …, x_{4,t} ∈ R^d; exploration vs. exploitation; the true optimal arm vs. the empirically optimal arm at time t]

Learning setting

▶ 1. Given a set of arms represented by D ⊆ R^d
▶ 2. At time t, select an arm x_t ∈ D, and observe

      y_t(x_t) = ⟨x_t, θ∗⟩ + η_t

▶ 3. The goal is to maximize ∑_{t=1}^T E[y_t(x_t)]
▶ 4. η_t follows a sub-Gaussian distribution (E[η_t^2] < ∞)
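The protocol in steps 1–3 can be sketched in a few lines of Python. The arm set, the hidden parameter θ∗, and the Gaussian stand-in for η_t below are illustrative choices of mine, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative instance (numbers are mine): four arms in R^2 and a
# hidden parameter theta*.
D = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [0.5, -0.5]])
theta_star = np.array([0.3, 0.8])
best_mean = float(np.max(D @ theta_star))   # payoff of the true optimal arm

def observe(x_t, eta_t=0.0):
    """Step 2: y_t(x_t) = <x_t, theta*> + eta_t."""
    return float(x_t @ theta_star) + eta_t

# Steps 2-3 over T rounds with a naive uniform policy; the pseudo-regret
# measures the shortfall against always playing the true optimal arm,
# which is what the goal in step 3 asks a learner to drive down.
T = 100
pseudo_regret = 0.0
for t in range(T):
    x_t = D[rng.integers(len(D))]
    y_t = observe(x_t, eta_t=rng.normal(scale=0.1))
    pseudo_regret += best_mean - float(x_t @ theta_star)
```

A real bandit algorithm would replace the uniform arm choice with a rule that estimates θ∗ from past payoffs and trades off exploration against exploitation.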

SLIDE 3

What Is A Heavy-Tailed Distribution?

Practical scenarios

▶ High-probability extreme returns in financial markets

[Figure: Gaussian density vs. empirical NASDAQ returns]

▶ Many other real cases

  • 1. Delays in communication networks (Liebeherr et al., 2012)
  • 2. Analysis of biological data (Burnecki et al., 2015)
  • 3. ...

SLIDE 4

LSB with Heavy-Tailed Payoffs

Problem definition

▶ Multi-armed bandits (MAB) with heavy-tailed payoffs (Bubeck et al., 2013):

      E[|η_t|^{1+ϵ}] < +∞,   (1)

   where ϵ ∈ (0, 1]

▶ Our setting: LSB with η_t satisfying Eq. (1)
▶ Weaker assumption than sub-Gaussian
▶ Medina and Yang (2016) studied LSB with heavy-tailed payoffs

        sub-Gaussian   heavy-tailed (ϵ = 1)
  MAB   O(T^{1/2})     O(T^{1/2}) by Bubeck et al. (2013)
  LSB   O(T^{1/2})     O(T^{3/4}) by Medina and Yang (2016)

▶ Can we achieve O(T^{1/2})?
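Condition (1) only asks for a finite moment of order 1+ϵ, which is strictly weaker than a variance bound. As an illustration (a standard textbook fact, not from the slides): a Pareto variable with shape α and scale 1 has E[X^p] = α/(α−p) for p < α and an infinite p-th moment otherwise, so shape α = 1.5 satisfies Eq. (1) with ϵ = 0.2 while having infinite variance:

```python
import math

def pareto_moment(alpha, p):
    """p-th raw moment of Pareto(shape=alpha, scale=1):
    E[X^p] = alpha / (alpha - p) if p < alpha, else infinite."""
    return alpha / (alpha - p) if p < alpha else math.inf

# shape alpha = 1.5: the (1 + 0.2)-moment is finite, so Eq. (1) holds
# with epsilon = 0.2 ...
assert pareto_moment(1.5, 1.2) < math.inf
# ... yet the second moment diverges, so this noise is not sub-Gaussian.
assert pareto_moment(1.5, 2.0) == math.inf
```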

SLIDE 5

Algorithm: Median of means under OFU (MENU)

Framework comparison with MoM by Medina and Yang (2016)
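The estimator at the heart of MENU is the median of means: split the observations into k groups, average each group, and report the median of the group means, which a single heavy-tailed outlier can barely move. A minimal sketch (the function name and the contiguous grouping are my own simplification, not the paper's exact construction):

```python
import statistics

def median_of_means(samples, k):
    """Split samples into k contiguous groups, average each group,
    and return the median of the k group means."""
    size = len(samples) // k            # drop any remainder for simplicity
    means = [sum(samples[i * size:(i + 1) * size]) / size
             for i in range(k)]
    return statistics.median(means)

# One huge outlier drags the empirical mean far from the true location,
# but only corrupts one of the five group means, so the median holds.
data = [1.0] * 29 + [1000.0]
print(sum(data) / len(data))        # 34.3 -- ruined by the outlier
print(median_of_means(data, k=5))   # 1.0 -- robust
```

Intuitively, an adversarial payoff has to corrupt a majority of the groups to shift the median, which is exponentially unlikely under condition (1).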

SLIDE 6

Regret Bounds

▶ Upper bounds

  algorithm   regret
  MoM         O(T^{(1+2ϵ)/(1+3ϵ)})
  MENU        O(T^{1/(1+ϵ)})
  CRT         O(T^{1/2 + 1/(2(1+ϵ))})
  TOFU        O(T^{1/(1+ϵ)})

▶ Lower bound: Ω(T^{1/(1+ϵ)})

When ϵ = 1, our algorithms achieve O(T^{1/2})
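As a quick arithmetic check on these bounds (no new results, just evaluating the exponents): at ϵ = 1, MENU and TOFU give T^{1/2} and match the lower bound, while MoM and CRT only give T^{3/4}:

```python
# Regret exponents from the table above, as functions of epsilon in (0, 1].
def mom_exp(eps):   return (1 + 2 * eps) / (1 + 3 * eps)  # MoM
def menu_exp(eps):  return 1 / (1 + eps)                  # MENU and TOFU
def crt_exp(eps):   return 0.5 + 1 / (2 * (1 + eps))      # CRT
def lower_exp(eps): return 1 / (1 + eps)                  # lower bound

# At epsilon = 1 (finite variance): MENU/TOFU recover T^(1/2), matching
# the lower bound; MoM and CRT are stuck at T^(3/4).
assert menu_exp(1) == lower_exp(1) == 0.5
assert mom_exp(1) == crt_exp(1) == 0.75
```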

SLIDE 7

See You at the Poster Session

Time: Dec. 5th, 10:45 AM – 12:45 PM
Location: Room 210 & 230 AB, #158