Bayesian Optimization under Heavy-tailed Payoffs
Sayak Ray Chowdhury
Joint work with
Aditya Gopalan
Department of ECE, Indian Institute of Science
NeurIPS, Dec. 2019

Black-box optimization
[Figure: graph of the unknown utility function f(x) over the domain D]
Problem: Maximize an unknown utility function f : D → R by
  - Sequentially querying f at inputs x_1, x_2, ..., x_T, and
  - Observing noisy function evaluations: y_t = f(x_t) + ε_t

Want: Low cumulative regret: R_T = ∑_{t=1}^T (f(x*) − f(x_t)), where x* ∈ argmax_{x∈D} f(x)
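As a concrete reading of this objective, here is a minimal bookkeeping sketch in Python (not from the talk): the learner only ever sees the noisy payoffs y_t, while regret is measured against the noiseless f, assuming the maximizer x_star is known to the evaluator.

```python
# Minimal sketch: cumulative regret of a query sequence x_1, ..., x_T.
# 'f' is the noiseless utility and 'x_star' its maximizer -- both are
# available only for evaluation, never to the learning algorithm.
def cumulative_regret(f, x_star, queries):
    return sum(f(x_star) - f(x_t) for x_t in queries)
```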
Motivation: Significant chance of very high/low values
  - Corrupted measurements
  - Bursty traffic flow distributions
  - Price fluctuations in financial and insurance data

Existing works assume light-tailed noise (e.g., Srinivas et al. '11, Hernández-Lobato et al. '14, ...)

Question: Bayesian optimization algorithms with guarantees under heavy-tailed noise?
Unknown function f modeled by a Gaussian process: f ∼ GP(0, k). At round t:

  1. Choose the query point x_t using the current GP posterior and a suitable parameter β_t:
       x_t = argmax_{x∈D} µ_{t−1}(x) + β_t σ_{t−1}(x)

  2. Truncate the observed payoff y_t using a suitable threshold b_t:
       ŷ_t = y_t · 1{|y_t| ≤ b_t}

  3. Update the GP posterior (µ_t, σ_t) with the new observation (x_t, ŷ_t):
       µ_t(x) = k_t(x)^T (K_t + λI)^{−1} [ŷ_1, ..., ŷ_t]^T
       σ_t^2(x) = k(x, x) − k_t(x)^T (K_t + λI)^{−1} k_t(x)
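A minimal, illustrative implementation of this loop (a sketch, not the authors' code): it uses an RBF kernel over a finite candidate set D, and placeholder schedules for β_t and the truncation threshold b_t, which the actual algorithm sets according to the theory. Here f takes a single candidate point and returns a scalar noisy payoff.

```python
import numpy as np

def rbf_kernel(A, B, ell=0.2):
    """k(x, x') = exp(-||x - x'||^2 / (2 ell^2)); note k(x, x) = 1."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * ell**2))

def truncated_gp_ucb(f, D, T, lam=1.0, beta=lambda t: 2.0, b=lambda t: t**0.25):
    """GP-UCB on a candidate set D (n x d array), truncating each payoff at b(t)."""
    X, y_hat = [], []
    for t in range(1, T + 1):
        if X:
            Xa = np.array(X)
            A = np.linalg.inv(rbf_kernel(Xa, Xa) + lam * np.eye(len(X)))
            Kt = rbf_kernel(D, Xa)                            # rows: k_t(x)^T
            mu = Kt @ A @ np.array(y_hat)                     # posterior mean
            var = 1.0 - np.einsum('ij,jk,ik->i', Kt, A, Kt)   # k(x, x) = 1 here
            sigma = np.sqrt(np.maximum(var, 0.0))
        else:
            mu, sigma = np.zeros(len(D)), np.ones(len(D))     # GP(0, k) prior
        x_t = D[np.argmax(mu + beta(t) * sigma)]              # 1. UCB query
        y_t = f(x_t)                                          # heavy-tailed payoff
        y_hat.append(y_t if abs(y_t) <= b(t) else 0.0)        # 2. truncation
        X.append(x_t)                                         # 3. used in next posterior
    return np.array(X), np.array(y_hat)
```

A heavy-tailed noise model (e.g., Student-t or Pareto) is what makes the truncation step bite; under sub-Gaussian noise the growing threshold b(t) is rarely exceeded.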
Assumption on heavy-tailed payoffs: E[|y_t|^{1+α}] < +∞ for α ∈ (0, 1]

    Algorithm                            Payoff          Regret
    GP-UCB (Srinivas et al.)             sub-Gaussian    Õ(T^{1/2})
    GP-UCB with truncation (above)       heavy-tailed    Õ(T^{(2+α)/(2(1+α))})

⇒ Regret Õ(T^{3/4}) for α = 1 (finite variance), while the lower bound is Ω(T^{1/(1+α)})

Question: Can we achieve Õ(T^{1/(1+α)})?
Ans: YES
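A quick evaluation of the two exponents at the endpoints of the moment range (a worked check, not on the slide) shows where the gap lies:

\[
\alpha = 1:\quad \frac{2+\alpha}{2(1+\alpha)} = \frac{3}{4} \quad\text{vs.}\quad \frac{1}{1+\alpha} = \frac{1}{2},
\qquad\qquad
\alpha \to 0:\quad \text{both exponents} \to 1.
\]

So the gap between truncated GP-UCB and the lower bound is widest exactly in the finite-variance case α = 1.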
Idea: UCB with kernel approximation + feature-adaptive truncation:
    x_t = argmax_{x∈D} µ̃_{t−1}(x) + β_t σ̃_{t−1}(x)

Kernel approximation: Compute
    V_t = ∑_{s=1}^t φ_t(x_s) φ_t(x_s)^T + λI    (m_t rows and m_t columns)
    U_t = V_t^{−1/2} [φ_t(x_1), ..., φ_t(x_t)]    (m_t rows and t columns)

Feature-adaptive truncation: Take the Hadamard (entrywise) product of U_t = [u_{is}] with the m_t × t matrix whose every row is (y_1, y_2, ..., y_t), truncate each entry at the threshold b_t, and find the row sums r_1, r_2, ..., r_{m_t}:
    r_i = ∑_{s=1}^t u_{is} y_s · 1{|u_{is} y_s| ≤ b_t}    (u_i is the i-th row of U_t)

Approximate posterior GP:
    µ̃_t(x) = φ_t(x)^T V_t^{−1/2} [r_1, ..., r_{m_t}]^T
    σ̃_t^2(x) = k(x, x) − φ_t(x)^T φ_t(x) + λ φ_t(x)^T V_t^{−1} φ_t(x)
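A NumPy sketch of this construction (illustrative, not the authors' code): random Fourier features stand in for the feature map φ_t, and m, λ, b_t are placeholder values. The matrix square root is taken via an eigendecomposition, which is valid since V_t is symmetric positive definite.

```python
import numpy as np

def fourier_features(X, W, c):
    """phi(x) = sqrt(2/m) cos(W x + c): m-dim random-feature approximation of an RBF kernel."""
    return np.sqrt(2.0 / W.shape[0]) * np.cos(X @ W.T + c)        # (n, m)

def approx_posterior(X, y, x, W, c, lam=1.0, b_t=10.0):
    Phi = fourier_features(X, W, c)                   # row s holds phi_t(x_s)^T, (t, m)
    m = Phi.shape[1]
    V = Phi.T @ Phi + lam * np.eye(m)                 # V_t (m x m)
    w, Q = np.linalg.eigh(V)                          # V_t is symmetric PD
    V_inv_sqrt = Q @ np.diag(w ** -0.5) @ Q.T         # V_t^{-1/2}
    U = V_inv_sqrt @ Phi.T                            # U_t (m x t)
    Z = U * y[None, :]                                # Hadamard product: u_{is} y_s
    r = np.where(np.abs(Z) <= b_t, Z, 0.0).sum(axis=1)   # truncated row sums r_i
    phi_x = fourier_features(x[None, :], W, c)[0]     # phi_t(x)
    mu = phi_x @ V_inv_sqrt @ r                       # approximate posterior mean
    var = 1.0 - phi_x @ phi_x + lam * phi_x @ np.linalg.solve(V, phi_x)  # k(x, x) = 1 for RBF
    return mu, max(var, 0.0)
```

For an RBF kernel with lengthscale ℓ in d dimensions, W would be drawn once as an (m, d) Gaussian matrix with standard deviation 1/ℓ and c uniformly from [0, 2π]^m, then reused across rounds.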
Acknowledgements:
  1. Tata Trusts travel grant
  2. Google India PhD fellowship grant
  3. DST Inspire research grant