Bayesian Optimization under Heavy-tailed Payoffs – Sayak Ray Chowdhury



  1. Bayesian Optimization under Heavy-tailed Payoffs. Sayak Ray Chowdhury, joint work with Aditya Gopalan. Department of ECE, Indian Institute of Science. NeurIPS, Dec. 2019.

  2. Black-box optimization.
  Problem: Maximize an unknown utility function $f : D \to \mathbb{R}$ by
  sequentially querying $f$ at inputs $x_1, x_2, \ldots, x_T$, and
  observing noisy function evaluations $y_t = f(x_t) + \epsilon_t$.
  [Plot: a sample utility function $f(x)$ over $x \in [0, 1]$.]
  Want: low cumulative regret $\sum_{t=1}^{T} \big( f(x^\star) - f(x_t) \big)$.
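To make the protocol concrete, here is a minimal runnable sketch of the query loop and the cumulative regret bookkeeping. The utility $f$, the uniform-random query rule, and the Student's t noise are placeholder assumptions for illustration; the algorithms on the following slides choose $x_t$ far more cleverly.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(6 * x)            # stand-in for the unknown utility on D = [0, 1]
D = np.linspace(0, 1, 1000)            # finite grid standing in for the domain D
f_star = f(D).max()                    # f(x*), the best achievable value

cum_regret = 0.0
for t in range(1, 101):
    x_t = rng.choice(D)                # placeholder query rule: pick uniformly at random
    y_t = f(x_t) + rng.standard_t(df=2.5)   # noisy evaluation y_t = f(x_t) + eps_t
    cum_regret += f_star - f(x_t)      # accumulate f(x*) - f(x_t)
print(f"cumulative regret after T=100 rounds: {cum_regret:.2f}")
```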

  3. Heavy-tailed noise.
  Motivation: a significant chance of very high or very low values, e.g. corrupted measurements, bursty traffic flow distributions, and price fluctuations in financial and insurance data. Example distributions: Student's t, Pareto, Cauchy, etc.
  Existing works assume light-tailed noise (e.g. Srinivas et al. '11, Hernandez-Lobato et al. '14, ...).
  Question: Can we design Bayesian optimization algorithms with guarantees under heavy-tailed noise?
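As a quick numerical illustration (my own, not from the slides): for Student's t with $\nu$ degrees of freedom, $\mathbb{E}[|y|^{p}]$ is finite exactly when $p < \nu$, while the Cauchy distribution ($\nu = 1$) has no finite moment of any order $\ge 1$. Empirical moments of heavy-tailed samples therefore fail to stabilize as the sample grows:

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (10_000, 1_000_000):
    gauss = rng.standard_normal(n)          # light-tailed baseline
    t2 = rng.standard_t(df=2, size=n)       # heavy-tailed: variance is infinite
    cauchy = rng.standard_cauchy(n)         # heavier still: even the mean is undefined
    # Empirical second absolute moments E|y|^2; for the heavy-tailed samples these
    # keep growing with n instead of converging, unlike the Gaussian.
    print(n, [float(np.mean(np.abs(y) ** 2)) for y in (gauss, t2, cauchy)])
```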

  4–7. Algorithm 1: Truncated GP-UCB (TGP-UCB).
  The unknown function $f$ is modeled by a Gaussian process, $f \sim GP(0, k)$. At round $t$:
  1. Choose the query point $x_t$ using the current GP posterior and a suitable parameter $\beta_t$:
     $x_t = \arg\max_{x \in D} \; \mu_{t-1}(x) + \beta_t \sigma_{t-1}(x)$
  2. Truncate the observed payoff $y_t$ using a suitable threshold $b_t$:
     $\hat{y}_t = y_t \, \mathbf{1}_{\{|y_t| \le b_t\}}$
  3. Update the GP posterior $(\mu_t, \sigma_t)$ with the new observation $(x_t, \hat{y}_t)$:
     $\mu_t(x) = k_t(x)^T (K_t + \lambda I)^{-1} [\hat{y}_1, \ldots, \hat{y}_t]^T$
     $\sigma_t^2(x) = k(x, x) - k_t(x)^T (K_t + \lambda I)^{-1} k_t(x)$
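Below is a minimal runnable sketch of TGP-UCB on a toy 1-D problem, written against the slide's equations. The squared-exponential kernel, the grid over $D$, and the $\beta_t$ and $b_t$ schedules are illustrative assumptions; the paper specifies these quantities precisely.

```python
import numpy as np

rng = np.random.default_rng(0)

def se_kernel(A, B, ell=0.2):
    """Squared-exponential kernel matrix between rows of A and B (illustrative choice)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2))

f = lambda x: np.sin(6 * x[..., 0])          # toy unknown utility on D = [0, 1]
X_cand = np.linspace(0, 1, 200)[:, None]     # finite grid standing in for D
lam = 1.0
X_obs, y_hat = np.empty((0, 1)), np.empty(0)

for t in range(1, 51):
    beta_t = 2.0 * np.log(t + 1)             # hypothetical schedule for beta_t
    b_t = t ** 0.5                           # hypothetical truncation threshold b_t
    if len(X_obs) == 0:                      # prior: mu = 0, sigma^2 = k(x, x) = 1
        mu, var = np.zeros(len(X_cand)), np.ones(len(X_cand))
    else:                                    # GP posterior built from truncated payoffs
        K_inv = np.linalg.inv(se_kernel(X_obs, X_obs) + lam * np.eye(len(X_obs)))
        k_star = se_kernel(X_cand, X_obs)    # k_t(x) for every candidate x
        mu = k_star @ K_inv @ y_hat          # mu_{t-1}(x)
        var = 1.0 - np.einsum('ij,jk,ik->i', k_star, K_inv, k_star)  # sigma_{t-1}^2(x)
    x_t = X_cand[[np.argmax(mu + beta_t * np.sqrt(np.maximum(var, 0)))]]  # UCB rule
    y_t = f(x_t)[0] + rng.standard_t(df=2.5)     # heavy-tailed payoff
    y_hat_t = y_t if abs(y_t) <= b_t else 0.0    # truncation: y_hat_t = y_t 1{|y_t| <= b_t}
    X_obs, y_hat = np.vstack([X_obs, x_t]), np.append(y_hat, y_hat_t)
```

Setting $b_t = \infty$ recovers standard GP-UCB; the finite $b_t$ is what keeps heavy-tailed outliers from corrupting the posterior mean.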

  8–11. Regret bounds.
  Assumption on heavy-tailed payoffs: $\mathbb{E}\big[|y_t|^{1+\alpha}\big] < +\infty$ for $\alpha \in (0, 1]$.

  Algorithm                   Payoff         Regret
  GP-UCB (Srinivas et al.)    sub-Gaussian   $O\big(\gamma_T T^{1/2}\big)$
  TGP-UCB (this paper)        heavy-tailed   $O\big(\gamma_T T^{(2+\alpha)/(2(1+\alpha))}\big)$

  For $\alpha = 1$, the TGP-UCB bound gives regret $\tilde{O}\big(T^{3/4}\big)$.
  We also give an $\Omega\big(T^{1/(1+\alpha)}\big)$ regret lower bound for any algorithm.
  Question: Can we achieve $\tilde{O}\big(T^{1/(1+\alpha)}\big)$ regret scaling? Answer: YES.
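Written out, the exponent calculation behind the $\alpha = 1$ case (treating $\gamma_T$, the maximum information gain, as polylogarithmic in $T$, which holds e.g. for the squared-exponential kernel):

```latex
\[
\left.\frac{2+\alpha}{2(1+\alpha)}\right|_{\alpha=1} = \frac{3}{4}
\quad\Longrightarrow\quad
O\!\left(\gamma_T\, T^{\frac{2+\alpha}{2(1+\alpha)}}\right) = \tilde{O}\!\left(T^{3/4}\right),
\qquad\text{vs.\ the lower bound } \Omega\!\left(T^{\frac{1}{1+\alpha}}\right) = \Omega\!\left(T^{1/2}\right).
\]
```

The gap between $T^{3/4}$ and $T^{1/2}$ at $\alpha = 1$ is what motivates Algorithm 2 on the next slides.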

  12–16. Algorithm 2: Adaptively Truncated Approximate GP-UCB (ATA-GP-UCB).
  Idea: UCB with kernel approximation + feature-adaptive truncation:
     $x_t = \arg\max_{x \in D} \; \tilde{\mu}_{t-1}(x) + \beta_t \tilde{\sigma}_{t-1}(x)$
  Kernel approximation: compute
     $V_t = \sum_{s=1}^{t} \phi_t(x_s) \phi_t(x_s)^T + \lambda I$  ($m_t \times m_t$)
     $U_t = V_t^{-1/2} [\phi_t(x_1), \ldots, \phi_t(x_t)]$  ($m_t \times t$)
  Feature-adaptive truncation: take the Hadamard (entrywise) product of $U_t = (u_{is})$ with the $m_t \times t$ matrix whose every row is $(y_1, y_2, \ldots, y_t)$, truncate each entry, and find the row sums $r_1, r_2, \ldots, r_{m_t}$:
     $r_i = \sum_{s=1}^{t} u_{is} y_s \, \mathbf{1}_{\{|u_{is} y_s| \le b_t\}}$  (where $u_i$ is the $i$-th row of $U_t$)
  Approximate posterior GP:
     $\tilde{\mu}_t(x) = \phi_t(x)^T V_t^{-1/2} [r_1, \ldots, r_{m_t}]^T$
     $\tilde{\sigma}_t^2(x) = k(x, x) - \phi_t(x)^T \phi_t(x) + \lambda \phi_t(x)^T V_t^{-1} \phi_t(x)$
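A minimal numpy sketch of the feature-adaptive truncation and the approximate posterior, assuming a fixed $m$-dimensional feature map is already in hand (in the paper, $\phi_t$ comes from a kernel approximation, and its dimension $m_t$ may grow with $t$; the fixed map here is a simplification):

```python
import numpy as np

def ata_truncated_posterior(Phi, y, phi_x, k_xx, b_t, lam=1.0):
    """Approximate GP posterior with feature-adaptive truncation (sketch).

    Phi   : (m, t) matrix whose columns are the features phi(x_1), ..., phi(x_t)
    y     : (t,) raw (possibly heavy-tailed) payoffs y_1, ..., y_t
    phi_x : (m,) feature vector of the test point x
    k_xx  : scalar kernel value k(x, x)
    b_t   : truncation threshold
    """
    m, t = Phi.shape
    V = Phi @ Phi.T + lam * np.eye(m)            # V_t = sum_s phi(x_s) phi(x_s)^T + lam I

    # U_t = V_t^{-1/2} [phi(x_1), ..., phi(x_t)], via the eigendecomposition of V_t
    w, Q = np.linalg.eigh(V)                     # V_t is symmetric positive definite
    V_inv_sqrt = Q @ np.diag(w ** -0.5) @ Q.T
    U = V_inv_sqrt @ Phi                         # (m, t)

    # Hadamard product of U_t with the matrix whose rows all equal (y_1, ..., y_t),
    # entrywise truncation, then row sums r_i = sum_s u_is y_s 1{|u_is y_s| <= b_t}
    Z = U * y[None, :]
    r = np.where(np.abs(Z) <= b_t, Z, 0.0).sum(axis=1)

    mu = phi_x @ V_inv_sqrt @ r                  # mu_t(x) = phi(x)^T V_t^{-1/2} r
    V_inv = Q @ np.diag(1.0 / w) @ Q.T
    sigma2 = k_xx - phi_x @ phi_x + lam * phi_x @ V_inv @ phi_x
    return mu, sigma2
```

The UCB rule then maximizes $\tilde{\mu}_{t-1}(x) + \beta_t \tilde{\sigma}_{t-1}(x)$ over candidate points, exactly as in Algorithm 1 but with these approximate quantities.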

  17. See you at the poster session.
  Bayesian Optimization under Heavy-tailed Payoffs, Poster #11, Tue Dec 10th, 05:30 – 07:30 PM, East Exhibition Hall B + C.
  Acknowledgements: (1) Tata Trusts travel grant, (2) Google India PhD fellowship grant, (3) DST Inspire research grant.
