Quantile Stein Variational Gradient Descent for Batch Bayesian Optimization
Chengyue Gong [1], Jian Peng [2], Qiang Liu [1]
[1] The University of Texas at Austin
[2] University of Illinois at Urbana-Champaign
Chengyue Gong, Jian Peng, Qiang Liu | Quantile Stein Variational Gradient Descent | 1 / 9

Bayesian Optimization
Goal: black-box optimization
$$\max_x f(x),$$
where $f(\cdot)$ is an expensive black-box function.
Bayesian optimization iteratively acquires new points based on an acquisition function:
$$x_{\text{new}} \leftarrow \arg\max_x \alpha(x \mid D), \qquad D_{\text{new}} \leftarrow D \cup \{(x_{\text{new}}, f(x_{\text{new}}))\}.$$
[Diagram: acquisition function → new input → black-box function → output]
Acquisition function (UCB):
$$\alpha(x \mid D) := \mathbb{E}_f[f(x) \mid D] + \eta \sqrt{\operatorname{var}_f[f(x) \mid D]}.$$
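The single-point UCB loop above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a zero-mean Gaussian-process surrogate with a squared-exponential kernel (length scale 0.2, unit prior variance), a small jitter `noise=1e-6`, $\eta = 2$, and a finite grid on $[0, 1]$ as the candidate set for the $\arg\max$. The helper names (`rbf`, `gp_posterior`, `ucb`, `bayes_opt`) are hypothetical.

```python
import numpy as np


def rbf(A, B, length_scale=0.2):
    # Squared-exponential kernel between rows of A (n, d) and B (m, d); k(x, x) = 1.
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq / length_scale**2)


def gp_posterior(X, y, Xq, noise=1e-6):
    # Zero-mean GP posterior mean and variance at query points Xq (assumed surrogate).
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xq)                          # (n, m)
    L = np.linalg.cholesky(K)                # K = L L^T
    w = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks.T @ w
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v**2, axis=0)         # prior variance is 1
    return mean, np.maximum(var, 0.0)


def ucb(X, y, Xq, eta=2.0):
    # alpha(x | D) = E[f(x) | D] + eta * sqrt(var[f(x) | D])
    mean, var = gp_posterior(X, y, Xq)
    return mean + eta * np.sqrt(var)


def bayes_opt(f, n_iter=15, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)[:, None]   # candidate set for the arg max
    X = rng.uniform(0.0, 1.0, size=(2, 1))       # two random initial points
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        x_new = grid[np.argmax(ucb(X, y, grid))]  # x_new <- argmax_x alpha(x | D)
        X = np.vstack([X, x_new])                 # D <- D ∪ {(x_new, f(x_new))}
        y = np.append(y, f(x_new))
    return X, y
```

For example, `bayes_opt(lambda x: -(x[0] - 0.3) ** 2)` queries one point per iteration; batch Bayesian optimization replaces that single `argmax` with a set of query points.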

Batch Bayesian Optimization
Find multiple query points $\{x_i\}_{i=1}^m$ in parallel at every iteration. This is much more challenging, with two desiderata:
Diversity: each query point should be unique.
Qualification: each query point should be good.
[Figure: acquisition function → next query points → black-box function; one candidate batch is diverse but low-quality (Diversity ✔, High-Quality ✗), the other is high-quality but not diverse (Diversity ✗, High-Quality ✔)]

Optimizing the distribution $\rho$ of query points $\{x_i\}$ by
$$\max_\rho \; F[\rho] := \mathbb{E}^{\omega}_{\rho}[\alpha(x)] + \eta H[\rho].$$
$H[\rho]$ is the entropy. It encourages diversity.
$\mathbb{E}^{\omega}_{\rho}[\cdot]$ is a quantile-distorted expectation. It enforces qualification:
$$\mathbb{E}^{\omega}_{\rho}[\alpha(x)] = \int_0^1 Q^{\beta}_{f,\rho}\,\omega(\beta)\,d\beta,$$
where $Q^{\beta}_{f,\rho}$ is the $\beta$-th quantile of $\alpha(x)$ when $x \sim \rho$.
$\omega: [0, 1] \to \mathbb{R}_+$ is a distortion function:
Risk neutral: $\omega(\beta) = 1$.
Risk averse: $\omega(\beta)$ is monotonically decreasing.
Risk seeking: $\omega(\beta)$ is monotonically increasing.
We want risk aversion: take $\omega(\beta) = \beta^{-\lambda}$, where $\lambda \ge 0$.
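To make the quantile-distorted expectation concrete, here is a minimal sketch that evaluates $\int_0^1 Q^{\beta}_{f,\rho}\,\omega(\beta)\,d\beta$ exactly when $\rho$ is an empirical distribution of acquisition values and $\omega(\beta) = \beta^{-\lambda}$. The function name is hypothetical, and restricting to $0 \le \lambda < 1$ is an assumption of this sketch: $\int_0^1 \beta^{-\lambda}\,d\beta$ is finite only for $\lambda < 1$.

```python
import numpy as np


def distorted_expectation(values, lam):
    """int_0^1 Q^beta * omega(beta) d beta for the empirical distribution of
    `values`, with omega(beta) = beta ** (-lam) (hypothetical helper)."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("this sketch needs 0 <= lam < 1 for a finite integral")
    v = np.sort(np.asarray(values, dtype=float))  # empirical quantile function
    n = len(v)
    k = np.arange(1, n + 1)
    # Q^beta equals v[k-1] for beta in ((k-1)/n, k/n]; integrate omega exactly there.
    upper = (k / n) ** (1.0 - lam)
    lower = ((k - 1) / n) ** (1.0 - lam)
    weights = (upper - lower) / (1.0 - lam)
    return float(np.sum(weights * v))
```

With values `[1, 2, 3, 4]`, `lam=0.0` recovers the plain mean 2.5, while `lam=0.5` returns about 3.854; dividing by $\int_0^1 \omega(\beta)\,d\beta = 1/(1-\lambda) = 2$ gives about 1.93, below the mean, because the low quantiles receive the largest weight (risk aversion).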

Quantile Stein Variational Gradient Descent [Liu, Wang 16]
Idea: find a particle distribution
$$\rho := \frac{1}{n}\sum_{i=1}^{n} \delta_{x_i}$$
to approximately solve the optimization.
The particles $\{x_i\}_{i=1}^n$ are iteratively moved to maximize the objective by gradient-like updates:
$$x_i' \leftarrow x_i + \epsilon\,\phi^*(x_i), \qquad \phi^* = \arg\max_{\phi \in \mathcal{H}} \left\{ \frac{d}{d\epsilon} F[\rho'] \Big|_{\epsilon=0} \right\} \quad \text{s.t.} \quad \|\phi\|_{\mathcal{H}} \le 1,$$
where $\rho'$ is the distribution of the updated particles $\{x_i'\}$.
$\epsilon$: step size; $\phi^*$: chosen to increase the objective as fast as possible.
$\mathcal{H}$: a reproducing kernel Hilbert space (RKHS) with positive definite kernel $k(x, x')$.

Quantile Stein Variational Gradient Descent [Liu, Wang 16]
Optimization:
$$\max_\rho \; F[\rho] := \mathbb{E}^{\omega}_{\rho}[\alpha(x)] + \eta H[\rho].$$
Algorithm:
$$x_i \leftarrow x_i + \frac{\epsilon}{n} \sum_{j=1}^{n} \Big[ \underbrace{\xi(x_j)\, \nabla_x \alpha(x_j)\, k(x_j, x_i)}_{\text{quantile gradient}} + \underbrace{\eta\, \nabla_{x_j} k(x_j, x_i)}_{\text{repulsive force}} \Big], \qquad \forall i = 1, \dots, n.$$
Here, each particle is assigned a weight to account for the distortion function:
$$\xi(x_j) = \omega\!\left(\frac{\operatorname{rank}(x_j)}{n}\right), \qquad \operatorname{rank}(x_j) = \sum_{\ell=1}^{n} \mathbb{I}[\alpha(x_\ell) \le \alpha(x_j)].$$
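The update rule above can be sketched with NumPy as follows. This is an illustrative sketch, not the authors' code: it assumes an RBF kernel $k(x, x') = \exp(-\|x - x'\|^2 / h)$ with a fixed bandwidth `h` (SVGD implementations often pick $h$ by a median heuristic instead), a toy acquisition $\alpha(x) = -\|x\|^2$ with its analytic gradient, and illustrative values of `eps`, `eta`, and `lam`. All helper names are hypothetical.

```python
import numpy as np


def rbf_kernel(X, h):
    """K[j, i] = exp(-||x_j - x_i||^2 / h); gradK[j, i] = grad_{x_j} K[j, i]."""
    diff = X[:, None, :] - X[None, :, :]         # diff[j, i] = x_j - x_i, shape (n, n, d)
    K = np.exp(-np.sum(diff**2, axis=-1) / h)    # (n, n)
    gradK = (-2.0 / h) * diff * K[:, :, None]    # (n, n, d)
    return K, gradK


def quantile_weights(a, lam):
    """xi(x_j) = omega(rank(x_j) / n) with omega(beta) = beta ** (-lam)."""
    n = len(a)
    rank = np.sum(a[None, :] <= a[:, None], axis=1)  # rank[j] = #{l : a_l <= a_j}
    return (rank / n) ** (-lam)


def qsvgd_step(X, alpha, grad_alpha, eps=0.1, eta=1.0, lam=0.5, h=1.0):
    n = X.shape[0]
    a = np.array([alpha(x) for x in X])          # acquisition values, (n,)
    G = np.array([grad_alpha(x) for x in X])     # acquisition gradients, (n, d)
    xi = quantile_weights(a, lam)                # distortion weights, (n,)
    K, gradK = rbf_kernel(X, h)
    # Quantile gradient: row i = sum_j xi_j * grad alpha(x_j) * k(x_j, x_i)
    drive = (K * xi[:, None]).T @ G
    # Repulsive force: row i = sum_j grad_{x_j} k(x_j, x_i)
    repulse = gradK.sum(axis=0)
    return X + (eps / n) * (drive + eta * repulse)


# Toy acquisition (an assumption): alpha(x) = -||x||^2, maximized at the origin.
def alpha(x):
    return -float(np.sum(x**2))


def grad_alpha(x):
    return -2.0 * x


rng = np.random.default_rng(0)
X0 = rng.normal(3.0, 1.0, size=(10, 1))  # 10 particles started away from the optimum
X = X0.copy()
for _ in range(200):
    X = qsvgd_step(X, alpha, grad_alpha)
```

Because $\omega(\beta) = \beta^{-\lambda}$ is decreasing, the particle with the lowest acquisition value (rank 1) gets the largest weight $\xi = n^{\lambda}$, so the batch is pulled up from its worst member, while the kernel-gradient term keeps the particles from collapsing onto a single point.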

Empirical Results: Standard Benchmarks
Baselines: LP-UCB (Gonzalez et al., 2016), DPP (Kathuria et al., 2016), MACE (Lyu et al., 2018). QSBO-UCB is ours.

Function     LP-UCB    DPP       MACE      QSBO-UCB
Branin       3.28e-4   9.63e-4   2.85e-5   5.14e-5
Eggholder    51.34     82.81     74.14     46.86
Dropwave     0.14      0.13      0.09      0.07
CrossInTray  6.83e-3   7.64e-3   3.78e-4   1.35e-4
gSobol5      1.85      2.34      1.14      0.32
gSobol10     1.04e2    1.07e3    48.92     31.19
gSobol15     2.34e3    5.28e3    6.39e2    3.61e2
Ackley5      3.71      3.74      2.36      2.23
Ackley10     3.87      4.23      3.01      2.41
Alpine2      75.92     73.39     63.29     73.01

Table: Negative rewards.

Empirical Results: Automatic Chemical Design (Gomez-Bombarelli et al., 2018; Griffiths, 2017)

Metric  LP-UCB       DPP          MACE         QSBO-UCB
QED     0.91 ± 0.05  0.91 ± 0.06  0.92 ± 0.03  0.93 ± 0.03
SAS     2.18 ± 0.06  2.29 ± 0.08  2.16 ± 0.04  2.08 ± 0.05
LogP    0.50 ± 0.11  0.47 ± 0.07  0.41 ± 0.06  0.33 ± 0.08

[Figure: search trajectory with QED values 0.459, 0.622, 0.872, 0.923, ..., 0.355, 0.941]
Figure: Illustration of the search process of our QSBO-UCB.

Conclusions
1. A new algorithm (QSVGD) for risk-sensitive objectives.
2. Risk-averse samples for batch Bayesian optimization.
3. Good empirical results on standard benchmarks and automatic chemical design.
Thank you
Poster #239, today, 06:30 PM – 09:00 PM @ Pacific Ballroom
