Adversarial event generator tuning with Bayesian Optimization Maxim Borisyak, Andrey Ustyuzhanin National Research University Higher School of Economics (HSE) July 7, 2018

Event Generator Tuning

Intro
We consider the problem of tuning the parameters of event generators to 'real' data:
• generating samples is expensive;
• the generator is non-differentiable.
Working example: the Pythia 8 generator.

Approach I
• two histograms for each parameter: $\mathrm{data}_i$ and $\mathrm{MC}_i$;
• Bayesian Optimization on the objective:
  $$\chi^2 = \sum_{i=1}^{n_{\mathrm{bins}}} \frac{(\mathrm{data}_i - \mathrm{MC}_i)^2}{\sigma^2_{\mathrm{data},i} + \sigma^2_{\mathrm{MC},i}}$$
• additional assumptions on distributions are required to guarantee convergence.
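The binned objective can be sketched in a few lines of NumPy. This is only an illustration: the function name `chi2_objective` is hypothetical, and the fallback to Poisson variances (sigma^2 equal to the bin count) when no uncertainties are supplied is an assumption, not something stated on the slide.

```python
import numpy as np

def chi2_objective(data_counts, mc_counts, data_sigma=None, mc_sigma=None):
    """Binned chi-square between a data histogram and a Monte Carlo histogram."""
    data_counts = np.asarray(data_counts, dtype=float)
    mc_counts = np.asarray(mc_counts, dtype=float)
    # Assumption: Poisson variances (sigma^2 = counts) when no errors are given.
    data_var = data_counts if data_sigma is None else np.asarray(data_sigma, dtype=float) ** 2
    mc_var = mc_counts if mc_sigma is None else np.asarray(mc_sigma, dtype=float) ** 2
    denom = data_var + mc_var
    mask = denom > 0  # skip bins with zero total variance
    return float(np.sum((data_counts[mask] - mc_counts[mask]) ** 2 / denom[mask]))

data = [10, 20, 30]
mc = [12, 18, 30]
chi2 = chi2_objective(data, mc)
print(chi2)
```

A Bayesian optimizer would call such a function once per generator configuration, so every evaluation costs one full generator run.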

Approach II
• an adversarial objective:
  $$\mathrm{Wasserstein}(F_{\mathrm{real}}, F_\theta) = \sup_{d \in L_1} \left[ \mathbb{E}_{x \sim F_{\mathrm{real}}}\, d(x) - \mathbb{E}_{x \sim F_\theta}\, d(x) \right]$$
• Variational Optimization to search for a distribution over generator parameters.

Assumptions and goals
We consider Adversarial Bayesian Optimization:
• no additional restrictions on distribution shapes;
• there is a configuration of the generator that perfectly matches the 'real' data.
Our primary concern is time complexity:
• sampling from the target event generator is expensive;
• the number of generator calls dominates the overall complexity;
• the goal is to minimize the number of event generator calls.

Adversarial Bayesian Optimization

Adversarial Objective
Jensen-Shannon distance:
$$\mathrm{JS}(P, Q) = \log 2 + \frac{1}{2}\left[ \mathbb{E}_{x \sim P} \log \frac{P(x)}{P(x) + Q(x)} + \mathbb{E}_{x \sim Q} \log \frac{Q(x)}{P(x) + Q(x)} \right] = \log 2 - \min_f \text{cross-entropy}(f, P, Q)$$
• Jensen-Shannon distance can be approximated by a classifier $f$.
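A small numerical check of this identity, as a sketch under stated assumptions: P and Q are taken to be unit-variance Gaussians with means 0 and 1 (a toy choice, not from the slides), the classifier is the analytically optimal one, f(x) = P(x) / (P(x) + Q(x)), and the cross-entropy is averaged with equal weight on both classes. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_pdf(x, mean, std=1.0):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def js_from_classifier(f, x_p, x_q):
    """log 2 minus the balanced cross-entropy of f (label 1 for P, 0 for Q)."""
    eps = 1e-12
    ce = -0.5 * (np.mean(np.log(f(x_p) + eps)) + np.mean(np.log(1.0 - f(x_q) + eps)))
    return float(np.log(2.0) - ce)

mu_p, mu_q = 0.0, 1.0  # toy distributions, assumed for illustration
x_p = rng.normal(mu_p, 1.0, size=100_000)
x_q = rng.normal(mu_q, 1.0, size=100_000)

def optimal_classifier(x):
    p, q = normal_pdf(x, mu_p), normal_pdf(x, mu_q)
    return p / (p + q)

js_estimate = js_from_classifier(optimal_classifier, x_p, x_q)

# Reference value: direct numerical integration of the JS definition.
grid = np.linspace(-10.0, 11.0, 20001)
p, q = normal_pdf(grid, mu_p), normal_pdf(grid, mu_q)
mix = 0.5 * (p + q)
integrand = 0.5 * p * np.log(p / mix) + 0.5 * q * np.log(q / mix)
js_reference = float(np.sum(integrand) * (grid[1] - grid[0]))

print(js_estimate, js_reference)
```

Any classifier other than the optimal one gives a larger cross-entropy and therefore a lower estimate, so a trained classifier yields a lower bound on JS (up to sampling noise).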

Multi-Stage Adversarial Bayesian Optimization
• sequence of classifier models with increasing power:
  $$\mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \dots \subseteq \mathcal{F}_m = \mathcal{F}$$
• classifier family $\mathcal{F}_i$ is associated with a 'pseudo' JS distance:
  $$\mathrm{pJS}_i(P, Q) = \log 2 - \min_{f \in \mathcal{F}_i} \text{cross-entropy}(f, P, Q)$$
  $$\mathrm{pJS}_1(P, Q) \le \mathrm{pJS}_2(P, Q) \le \dots \le \mathrm{pJS}_m(P, Q) = \mathrm{JS}(P, Q);$$
  $$\mathrm{pJS}_i(P, Q) \ge 0 \implies \mathrm{pJS}_{i+1}(P, Q) \ge 0$$

Multi-Stage Adversarial Bayesian Optimization
• 'weak' classifiers tend to require fewer samples;
• 'weak' classifiers can be used to rapidly explore the search space;
• these results are constraints for a more powerful classifier:
  $$\mathrm{pJS}_i(P, Q) \ge 0 \implies \mathrm{pJS}_{i+1}(P, Q) \ge 0$$
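The ordering of pseudo-JS values over nested classifier families can be illustrated with a deliberately simple family: classifiers that are constant on each of n equal-width bins. Doubling n refines the partition, so the families are nested. This is a toy stand-in chosen for illustration (the slides use boosted trees), the Gaussian samples are assumed, and `pjs_histogram` is a hypothetical helper.

```python
import numpy as np

def pjs_histogram(x_p, x_q, n_bins, lo=-4.0, hi=4.0):
    """In-sample pseudo-JS for classifiers constant on n_bins equal-width bins."""
    edges = np.linspace(lo, hi, n_bins + 1)
    # Per-bin sample fractions; out-of-range samples are clipped into the end bins.
    w_p = np.histogram(np.clip(x_p, lo, hi), bins=edges)[0] / len(x_p)
    w_q = np.histogram(np.clip(x_q, lo, hi), bins=edges)[0] / len(x_q)
    total = w_p + w_q
    # The best classifier in the family predicts the per-bin share of P.
    f = np.divide(w_p, total, out=np.full(n_bins, 0.5), where=total > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        term_p = np.where(w_p > 0, w_p * np.log(f), 0.0)
        term_q = np.where(w_q > 0, w_q * np.log(1.0 - f), 0.0)
    cross_entropy = -0.5 * (term_p.sum() + term_q.sum())
    return float(np.log(2.0) - cross_entropy)

rng = np.random.default_rng(1)
x_p = rng.normal(0.0, 1.0, size=50_000)
x_q = rng.normal(0.5, 1.0, size=50_000)

# 1, 2, 4, 8, 16 bins: each partition refines the previous one.
pjs = [pjs_histogram(x_p, x_q, n_bins) for n_bins in (1, 2, 4, 8, 16)]
print(pjs)
```

The values are computed on the training sample itself, so they never decrease as the family grows. Part of that growth is over-fitting, which is why stronger classifiers need more samples for a reliable estimate.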

Multi-Stage Adversarial Bayesian Optimization
model_1 = unconstrained BO on pJS_1(data, generator_θ)
for k = 2, ..., m do
    constraint_k(θ) = P(pJS_{k−1} ≤ 0 | θ, model_{k−1})
    model_k = BO on pJS_k(data, ·) s.t. constraint_j(θ) > τ, j = 0, ..., k − 1
end for
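A two-stage toy version of this procedure might look as follows. Everything here is an assumption made for illustration: the objectives `pjs_weak` and `pjs_strong` are analytic stand-ins for classifier-based estimates, the surrogate is a minimal zero-mean Gaussian process written in NumPy, the stage-1 BO run is replaced by a fixed 8-point design, and the stage-2 BO run is replaced by a coarse scan of the feasible candidates. Only the constraint P(pJS_1 ≤ 0 | θ, model_1) > τ follows the pseudocode directly.

```python
import math
import numpy as np

def rbf(a, b, length=0.3):
    """Squared-exponential kernel with unit signal variance."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-3):
    """Posterior mean and standard deviation of a zero-mean GP surrogate."""
    k_tt = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf(x_query, x_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    var = 1.0 - np.sum(k_qt * np.linalg.solve(k_tt, k_qt.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def prob_not_positive(mean, std):
    """P(value <= 0) under a Gaussian posterior N(mean, std^2)."""
    return np.array([0.5 * (1.0 + math.erf(-m / (s * math.sqrt(2.0))))
                     for m, s in zip(mean, std)])

# Hypothetical stand-ins for the expensive objectives (optimum at theta = 0.6).
def pjs_weak(theta):    # stage 1: weak classifier, nearly flat around the optimum
    return 2.0 * (theta - 0.6) ** 4

def pjs_strong(theta):  # stage 2: stronger classifier, pjs_weak <= pjs_strong on [0, 1]
    return (theta - 0.6) ** 2

tau = 0.1
candidates = np.linspace(0.0, 1.0, 101)

# Stage 1: a fixed design replaces the unconstrained BO run.
theta_1 = np.linspace(0.0, 1.0, 8)
mean_1, std_1 = gp_posterior(theta_1, pjs_weak(theta_1), candidates)
constraint = prob_not_positive(mean_1, std_1)

# Stage 2: only candidates with constraint > tau reach the strong objective.
feasible = candidates[constraint > tau]
theta_2 = feasible[::5]  # coarse scan instead of a constrained BO loop
best_theta = float(theta_2[np.argmin(pjs_strong(theta_2))])
print(len(feasible), best_theta)
```

The point of the sketch is the filtering step: regions where even the weak classifier separates generator output from data are excluded before any expensive stage-2 evaluation.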

Experiments

Experiment
We follow the problem statement from Ilten P, Williams M, Yang Y. Event generator tuning using Bayesian optimization. Journal of Instrumentation. 2017 Apr 27;12(04):P04028.
• $e^+ e^-$ modeled by Pythia 8;
• values of the Monash tune as parameters of the 'real' distribution;
• 2-stage Adversarial Bayesian Optimization;
• the number of samples required to avoid overfitting of the classifier is measured.

Experiment 1
Target generator options:
• alphaSvalue.

Experiment 1: stage 1

Experiment 1: single stage

Experiment 1: results

Experiment 2
Target generator options:
• bLund;
• sigma;
• aExtraSQuark;
• aExtraDiQuark;
• rFactC;
• rFactB.
Second group of variables from Ilten P, Williams M, Yang Y. Event generator tuning using Bayesian optimization. Journal of Instrumentation. 2017 Apr 27;12(04):P04028.

Experiment 2: results

Summary

Summary
• Adversarial Bayesian Optimization is a promising tool for tuning event generators;
• Multi-stage Adversarial Bayesian Optimization uses 'weak' classifiers to incrementally constrain the search space:
  • rapid exploration of the search space in the first stages;
  • late stages search for a solution only among promising candidates;
  • reduction in the overall cost of optimization.

Backup

Bayesian Adversarial Optimization
initialize Bayesian Optimization
while not bored do
    θ ← askBO()
    X^θ_train, X^θ_test ← sample(θ)
    f ← train discriminator on X^θ_train and X^real_train
    L ← −(1 / (2·m)) · [ Σ_{i=1}^{m} log f(X^{θ,i}_test) + Σ_{i=1}^{m} log(1 − f(X^{real,i}_test)) ]
    tellBO(θ, log 2 − L)
end while
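The loop can be made concrete with toy components. The following is a sketch under loud assumptions: the "event generator" is a unit-variance Gaussian with mean θ, the 'real' data is drawn at θ = 1.0, the discriminator is a one-feature logistic regression trained by gradient descent, and `askBO` is replaced by a shuffled grid rather than a real Bayesian optimizer. None of these choices come from the slides, and the cross-entropy is taken with a leading minus sign so that log 2 − L estimates the JS distance.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_generator(theta, n):
    """Hypothetical stand-in for the event generator: N(theta, 1) samples."""
    return rng.normal(theta, 1.0, size=n)

def train_discriminator(x_gen, x_real, steps=200, lr=0.5):
    """Logistic regression on centred x; label 1 for generator samples."""
    x = np.concatenate([x_gen, x_real])
    y = np.concatenate([np.ones(len(x_gen)), np.zeros(len(x_real))])
    mu = x.mean()
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * (x - mu) + b)))
        w -= lr * np.mean((p - y) * (x - mu))
        b -= lr * np.mean(p - y)
    return lambda z: 1.0 / (1.0 + np.exp(-(w * (z - mu) + b)))

def cross_entropy(f, x_gen, x_real):
    """Balanced cross-entropy on held-out samples."""
    eps = 1e-12
    return -0.5 * (np.mean(np.log(f(x_gen) + eps)) + np.mean(np.log(1.0 - f(x_real) + eps)))

m = 2000
x_real_train = rng.normal(1.0, 1.0, size=m)
x_real_test = rng.normal(1.0, 1.0, size=m)

history = []
# Shuffled grid as a stand-in for askBO(); a real run would query the optimizer.
for theta in rng.permutation(np.linspace(-2.0, 4.0, 31)):
    x_train, x_test = sample_generator(theta, m), sample_generator(theta, m)
    f = train_discriminator(x_train, x_real_train)
    loss = cross_entropy(f, x_test, x_real_test)
    history.append((float(theta), float(np.log(2.0) - loss)))  # tellBO(theta, log 2 - L)

best_theta, best_score = min(history, key=lambda item: item[1])
print(best_theta, best_score)
```

Because the score is evaluated on a held-out test sample, it can come out slightly negative near the optimum. This is the noise a surrogate model has to absorb.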

Possible Caveats
• constraints are observed by authors to mess with GP;
• without the assumption $\exists \theta: \mathrm{JS}(\mathrm{generator}(\theta), \mathrm{real}) = 0$:
  • it is likely that the method would still work (with modified constraints) if the classifiers are from the same family of algorithms;
  • it is possible that BO with a weak classifier carries no information about BO with a strong classifier.

Expected Improvement with Constraints
Problem:
$$\mathrm{EI}(x) \to \min; \quad \text{s.t. } g(x) \ge 0.$$
• improvement is impossible if constraints are violated:
  $$\mathrm{CEI}(x) = P(g(x) \ge 0) \cdot \mathrm{EI}(x) + P(g(x) < 0) \cdot 0$$
• constraints in our case: $\mathrm{model}_i(x) \le 0$.
Gelbart, M.A., Snoek, J. and Adams, R.P., 2014. Bayesian optimization with unknown constraints. arXiv preprint arXiv:1403.5607.
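For Gaussian posteriors the constrained acquisition has a closed form. The sketch below assumes independent Gaussian posteriors for the objective and for the constraint function g, and uses the standard expected-improvement formula for a minimization problem. The function names are hypothetical. In the setting of the slides g would be the negated previous-stage model, so that g(x) ≥ 0 corresponds to model_i(x) ≤ 0.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mean, std, best):
    """EI for minimization under a Gaussian posterior N(mean, std^2)."""
    if std <= 0.0:
        return max(best - mean, 0.0)
    z = (best - mean) / std
    return (best - mean) * normal_cdf(z) + std * normal_pdf(z)

def constrained_ei(mean, std, best, g_mean, g_std):
    """CEI(x) = P(g(x) >= 0) * EI(x), with g ~ N(g_mean, g_std^2)."""
    p_feasible = 1.0 - normal_cdf((0.0 - g_mean) / g_std)
    return p_feasible * expected_improvement(mean, std, best)

ei = expected_improvement(mean=0.0, std=1.0, best=0.0)
cei = constrained_ei(mean=0.0, std=1.0, best=0.0, g_mean=0.0, g_std=1.0)
print(ei, cei)
```

With the constraint posterior centred on zero the feasibility probability is one half, so the constrained value is half the unconstrained one.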

Technical details
• the training set is incrementally extended until over-fitting becomes insignificant;
• 2-stage ABO:
  • stage 1: XGBoost with 1 tree and max depth = 3;
  • stage 2: XGBoost with 20 trees and max depth = 6.

Experiment 1
