Multi-fidelity Bayesian Optimisation g ( z, x ) f ( x ) Z x z X - PowerPoint PPT Presentation

Multi-fidelity Bandit Optimisation in 2 Fidelities (1 Approximation) (Kandasamy et al. NIPS 2016b) f (2) f (1) x ⋆ ◮ Optimise f = f (2) . x ⋆ = argmax x f (2) ( x ). ◮ But .. we have an approximation f (1) to f (2) . ◮ f (1) costs λ (1) , f (2) costs λ (2) . λ (1) < λ (2) . “cost”: could be computation time, money etc. ◮ f (1) , f (2) ∼ GP (0 , κ ). ◮ � f (2) − f (1) � ∞ ≤ ζ (1) . ζ (1) is known. 12/30

Multi-fidelity Bandit Optimisation in 2 Fidelities (1 Approximation) (Kandasamy et al. NIPS 2016b) f (2) f (1) x ⋆ At time t : Determine the point x t ∈ X and fidelity m t ∈ { 1 , 2 } for querying. 13/30

Multi-fidelity Bandit Optimisation in 2 Fidelities (1 Approximation) (Kandasamy et al. NIPS 2016b) f (2) f (1) x ⋆ At time t : Determine the point x t ∈ X and fidelity m t ∈ { 1 , 2 } for querying. Maximise f (2) . Don’t care for maximum of f (1) . End Goal: 13/30

Multi-fidelity Bandit Optimisation in 2 Fidelities (1 Approximation) (Kandasamy et al. NIPS 2016b) f (2) f (1) x ⋆ At time t : Determine the point x t ∈ X and fidelity m t ∈ { 1 , 2 } for querying. Maximise f (2) . Don’t care for maximum of f (1) . End Goal: S (Λ) = f (2) ( x ⋆ ) − max t : m t =2 f (2) ( x t ) Simple Regret: Capital Λ ← amount of the resource spent. E.g. seconds or dollars . 13/30

Multi-fidelity Bandit Optimisation in 2 Fidelities (1 Approximation) (Kandasamy et al. NIPS 2016b) f (2) f (1) x ⋆ At time t : Determine the point x t ∈ X and fidelity m t ∈ { 1 , 2 } for querying. Maximise f (2) . Don’t care for maximum of f (1) . End Goal: S (Λ) = f (2) ( x ⋆ ) − max t : m t =2 f (2) ( x t ) Simple Regret: Capital Λ ← amount of the resource spent. E.g. seconds or dollars . No reward for querying f (1) , but use cheap evaluations to guide search for x ⋆ at f (2) . 13/30

Challenges f (2) = f x ⋆ 13/30

Challenges f (2) + ζ (1) − ζ (1) x ⋆ 13/30

Challenges f (2) f (1) x ⋆ 13/30

Challenges f (2) f (1) x ⋆ ◮ f (1) is not just a noisy version of f (2) . 13/30

Challenges f (2) f (1) x ⋆ x (1) ⋆ ◮ f (1) is not just a noisy version of f (2) . x (1) ◮ Cannot just maximise f (1) . is suboptimal for f (2) . ⋆ 13/30

Challenges f (2) f (1) x ⋆ x (1) ⋆ ◮ f (1) is not just a noisy version of f (2) . x (1) ◮ Cannot just maximise f (1) . is suboptimal for f (2) . ⋆ ◮ Need to explore f (2) sufficiently well around the high valued regions of f (1) – but at a not too large region. 13/30

Challenges f (1) f (2) x (1) x ⋆ ⋆ ◮ f (1) is not just a noisy version of f (2) . x (1) ◮ Cannot just maximise f (1) . is suboptimal for f (2) . ⋆ ◮ Need to explore f (2) sufficiently well around the high valued regions of f (1) – but at a not too large region. 13/30

Challenges f (1) f (2) x (1) x ⋆ ⋆ ◮ f (1) is not just a noisy version of f (2) . x (1) ◮ Cannot just maximise f (1) . is suboptimal for f (2) . ⋆ ◮ Need to explore f (2) sufficiently well around the high valued regions of f (1) – but at a not too large region. Key Message: We will explore X using f (1) and use f (2) mostly in a promising region X α . 13/30

MF-GP-UCB (Kandasamy et al. NIPS 2016b) Multi-fidelity Gaussian Process Upper Confidence Bound f (2) f (1) x ⋆ 14/30

MF-GP-UCB (Kandasamy et al. NIPS 2016b) Multi-fidelity Gaussian Process Upper Confidence Bound f (2) f (1) x ⋆ ◮ Construct Upper Confidence Bound ϕ t for f (2) . Choose point x t = argmax x ∈X ϕ t ( x ). 14/30

MF-GP-UCB (Kandasamy et al. NIPS 2016b) Multi-fidelity Gaussian Process Upper Confidence Bound t = 11 f (2) f (1) x ⋆ x t ◮ Construct Upper Confidence Bound ϕ t for f (2) . Choose point x t = argmax x ∈X ϕ t ( x ). ϕ (1) µ (1) t − 1 ( x ) + β 1 / 2 σ (1) t − 1 ( x ) + ζ (1) t ( x ) = t ϕ (2) t ( x ) = µ (2) t − 1 ( x ) + β 1 / 2 σ (2) t − 1 ( x ) t ϕ t ( x ) = min { ϕ (1) t ( x ) , ϕ (2) t ( x ) } 14/30

MF-GP-UCB (Kandasamy et al. NIPS 2016b) Multi-fidelity Gaussian Process Upper Confidence Bound t = 11 f (2) m t = 2 f (1) γ (1) x ⋆ x t ◮ Construct Upper Confidence Bound ϕ t for f (2) . Choose point x t = argmax x ∈X ϕ t ( x ). ϕ (1) µ (1) t − 1 ( x ) + β 1 / 2 σ (1) t − 1 ( x ) + ζ (1) t ( x ) = t ϕ (2) t ( x ) = µ (2) t − 1 ( x ) + β 1 / 2 σ (2) t − 1 ( x ) t ϕ t ( x ) = min { ϕ (1) t ( x ) , ϕ (2) t ( x ) } � if β 1 / 2 σ (1) t − 1 ( x t ) > γ (1) 1 ◮ Choose fidelity m t = t 2 otherwise. 14/30

Theoretical Results for MF-GP-UCB GP-UCB , κ is an SE kernel (Srinivas et al. 2010) � vol ( X ) S (Λ) = f (2) ( x ⋆ ) − max t : m t =2 f (2) ( x t ) w.h.p � Λ 15/30

Theoretical Results for MF-GP-UCB GP-UCB , κ is an SE kernel (Srinivas et al. 2010) � vol ( X ) S (Λ) = f (2) ( x ⋆ ) − max t : m t =2 f (2) ( x t ) w.h.p � Λ MF-GP-UCB , κ is an SE kernel (Kandasamy et al. NIPS 2016b) � � vol ( X α ) vol ( X ) w.h.p ∀ α > 0 , S (Λ) + � Λ 2 − α Λ X α = { x : f (2) ( x ⋆ ) − f (1) ( x ) ≤ C α ζ (1) } Good approximation (small ζ (1) ) = ⇒ vol ( X α ) ≪ vol ( X ). 15/30

MF-GP-UCB with multiple approximations 16/30

MF-GP-UCB with multiple approximations Things work out. 16/30

Experiment: Viola & Jones Face Detection 22 Threshold values for each cascade. ( d = 22) Fidelities with dataset sizes (300 , 3000). ( M = 2) 0.35 0.3 0.25 0.2 0.15 0.1 1000 2000 3000 4000 5000 6000 7000 8000 17/30

Experiment: Cosmological Maximum Likelihood Inference ◮ Type Ia Supernovae Data ◮ Maximum likelihood inference for 3 cosmological parameters: ◮ Hubble Constant H 0 ◮ Dark Energy Fraction Ω Λ ◮ Dark Matter Fraction Ω M ◮ Likelihood: Robertson Walker metric (Robertson 1936) Requires numerical integration for each point in the dataset. 18/30

Experiment: Cosmological Maximum Likelihood Inference 3 cosmological parameters. ( d = 3) Fidelities: integration on grids of size (10 2 , 10 4 , 10 6 ). ( M = 3) 10 5 0 -5 -10 500 1000 1500 2000 2500 3000 3500 19/30

MF-GP-UCB Synthetic Experiment: Hartmann-3 D d = 3 , M = 3 Query frequencies for Hartmann-3D 40 m=1 m=2 35 m=3 Num. of Queries 30 25 20 15 10 5 0 0 0.5 1 1.5 2 2.5 3 3.5 f (3) ( x ) 19/30

Outline 1. A finite number of approximations (Kandasamy et al. NIPS 2016b) - Formalism, intuition and challenges - Algorithm - Theoretical results - Experiments 2. A continuous spectrum of approximations (Kandasamy et al. ICML 2017) - Formalism - Algorithm - Theoretical results - Experiments 20/30

Why continuous approximations? - Use an arbitrary amount of data? 21/30

Why continuous approximations? - Use an arbitrary amount of data? - Iterative algorithms: use arbitrary number of iterations? 21/30

Why continuous approximations? - Use an arbitrary amount of data? - Iterative algorithms: use arbitrary number of iterations? E.g. Train an ML model with N • data and T • iterations. - But use N < N • data and T < T • iterations to approximate cross validation performance at ( N • , T • ). 21/30

Why continuous approximations? - Use an arbitrary amount of data? - Iterative algorithms: use arbitrary number of iterations? E.g. Train an ML model with N • data and T • iterations. - But use N < N • data and T < T • iterations to approximate cross validation performance at ( N • , T • ). Approximations from a continuous 2D “fidelity space” ( N , T ). 21/30

Why continuous approximations? - Use an arbitrary amount of data? - Iterative algorithms: use arbitrary number of iterations? E.g. Train an ML model with N • data and T • iterations. - But use N < N • data and T < T • iterations to approximate cross validation performance at ( N • , T • ). Approximations from a continuous 2D “fidelity space” ( N , T ). Scientific studies: Simulations and numerical computations at varying continuous levels of granularity. 21/30

Multi-fidelity Optimisation with Continuous Approximations (Kandasamy et al. ICML 2017) A fidelity space Z and domain X Z ← all ( N , T ) values. X ← all hyper-parameter values. Z X 22/30

Multi-fidelity Optimisation with Continuous Approximations (Kandasamy et al. ICML 2017) g ( z, x ) A fidelity space Z and domain X Z ← all ( N , T ) values. X ← all hyper-parameter values. g : Z × X → R . g ([ N , T ] , x ) ← cv accuracy when training with N data for T iterations Z at hyper-parameter x . X 22/30

Multi-fidelity Optimisation with Continuous Approximations (Kandasamy et al. ICML 2017) g ( z, x ) A fidelity space Z and domain X Z ← all ( N , T ) values. X ← all hyper-parameter values. f ( x ) g : Z × X → R . g ([ N , T ] , x ) ← cv accuracy when training with N data for T iterations Z at hyper-parameter x . z • X We wish to optimise f ( x ) = g ( z • , x ) where z • ∈ Z . z • = [ N • , T • ]. 22/30

Multi-fidelity Optimisation with Continuous Approximations (Kandasamy et al. ICML 2017) g ( z, x ) A fidelity space Z and domain X Z ← all ( N , T ) values. X ← all hyper-parameter values. f ( x ) g : Z × X → R . g ([ N , T ] , x ) ← cv accuracy when training with N data for T iterations Z x ⋆ at hyper-parameter x . z • X We wish to optimise f ( x ) = g ( z • , x ) where z • ∈ Z . z • = [ N • , T • ]. End Goal: Find x ⋆ = argmax x f ( x ). 22/30

Multi-fidelity Optimisation with Continuous Approximations (Kandasamy et al. ICML 2017) g ( z, x ) A fidelity space Z and domain X Z ← all ( N , T ) values. X ← all hyper-parameter values. f ( x ) g : Z × X → R . g ([ N , T ] , x ) ← cv accuracy when training with N data for T iterations Z x ⋆ at hyper-parameter x . z • X We wish to optimise f ( x ) = g ( z • , x ) where z • ∈ Z . z • = [ N • , T • ]. End Goal: Find x ⋆ = argmax x f ( x ). A cost function, λ : Z → R + . λ ( z ) λ ( z ) = λ ( N , T ) = O ( N 2 T ) . Z z • 22/30

Multi-fidelity Simple Regret (Kandasamy et al. ICML 2017) g ( z, x ) f ( x ) λ ( z ) Z x ⋆ z • X Z z • End Goal: Find x ⋆ = argmax x f ( x ). 23/30

Multi-fidelity Simple Regret (Kandasamy et al. ICML 2017) g ( z, x ) f ( x ) λ ( z ) Z x ⋆ z • X Z z • End Goal: Find x ⋆ = argmax x f ( x ). Simple Regret after capital Λ: S (Λ) = f ( x ⋆ ) − max t : z t = z • f ( x t ) . Λ ← amount of a resource spent, e.g. computation time or money. 23/30

Multi-fidelity Simple Regret (Kandasamy et al. ICML 2017) g ( z, x ) f ( x ) λ ( z ) Z x ⋆ z • X Z z • End Goal: Find x ⋆ = argmax x f ( x ). Simple Regret after capital Λ: S (Λ) = f ( x ⋆ ) − max t : z t = z • f ( x t ) . Λ ← amount of a resource spent, e.g. computation time or money. No reward for maximising low fidelities, but use cheap evaluations at z � = z • to speed up search for x ⋆ . 23/30

BOCA : Bayesian Optimisation with Continuous Approximations (Kandasamy et al. ICML 2017) 24/30

BOCA : Bayesian Optimisation with Continuous Approximations (Kandasamy et al. ICML 2017) Model g ∼ GP (0 , κ ) and com- pute posterior GP : mean µ t − 1 : Z × X → R σ t − 1 : Z × X → R + std-dev 24/30

BOCA : Bayesian Optimisation with Continuous Approximations (Kandasamy et al. ICML 2017) Model g ∼ GP (0 , κ ) and com- pute posterior GP : mean µ t − 1 : Z × X → R σ t − 1 : Z × X → R + std-dev (1) x t ← maximise upper confidence bound for f ( x ) = g ( z • , x ). µ t − 1 ( z • , x ) + β 1 / 2 x t = argmax σ t − 1 ( z • , x ) t x ∈X 24/30

BOCA : Bayesian Optimisation with Continuous Approximations (Kandasamy et al. ICML 2017) Model g ∼ GP (0 , κ ) and com- pute posterior GP : mean µ t − 1 : Z × X → R σ t − 1 : Z × X → R + std-dev (1) x t ← maximise upper confidence bound for f ( x ) = g ( z • , x ). µ t − 1 ( z • , x ) + β 1 / 2 x t = argmax σ t − 1 ( z • , x ) t x ∈X � � (2) Z t ≈ { z • } ∪ z : σ t − 1 ( z , x t ) ≥ γ ( z ) (cheapest z in Z t ) (3) z t = argmin λ ( z ) z ∈Z t 24/30

BOCA : Bayesian Optimisation with Continuous Approximations (Kandasamy et al. ICML 2017) Model g ∼ GP (0 , κ ) and com- pute posterior GP : mean µ t − 1 : Z × X → R σ t − 1 : Z × X → R + std-dev (1) x t ← maximise upper confidence bound for f ( x ) = g ( z • , x ). µ t − 1 ( z • , x ) + β 1 / 2 x t = argmax σ t − 1 ( z • , x ) t x ∈X � λ ( z ) � q � � (2) Z t ≈ { z • } ∪ z : σ t − 1 ( z , x t ) ≥ γ ( z ) = ξ ( z ) λ ( z • ) (cheapest z in Z t ) (3) z t = argmin λ ( z ) z ∈Z t 24/30

Theoretical Results for BOCA κ : ( Z × X ) 2 → R . g ∼ GP ( 0 , κ ), κ ([ z , x ] , [ z ′ , x ′ ]) = κ X ( x , x ′ ) · κ Z ( z , z ′ ) 25/30

Theoretical Results for BOCA κ : ( Z × X ) 2 → R . g ∼ GP ( 0 , κ ), κ ([ z , x ] , [ z ′ , x ′ ]) = κ X ( x , x ′ ) · κ Z ( z , z ′ ) g ( z, x ) g ( z, x ) f ( x ) f ( x ) Z Z x ⋆ x ⋆ z • X z • X “good” “bad” 25/30

Multi-fidelity Bayesian Optimisation g ( z, x ) f ( x ) Z x z X - PowerPoint PPT Presentation

Multi-fidelity Bayesian Optimisation g ( z, x ) f ( x ) Z x z X Kirthevasan Kandasamy Carnegie Mellon University Facebook Inc. Menlo Park, CA September 26, 2017 Slides: www.cs.cmu.edu/ kkandasa/talks/fb-mf-slides.pdf

Medicines optimisation The road to excellence Workshop Overview of meds optimisation Your

BayeHem: Bayesian Optimisation of Genome Assembly 1. Genome Assembly 2. Bayesian Optimisation

Being Bayesian About Being Bayesian About Net work St ruct ure Net work St ruct ure A Bayesian

Outline Intro to RL and Bayesian Learning History of Bayesian RL Model-based Bayesian

Automated and Accurate Geometry Extraction and Shape Optimisation of 3D Topology Optimisation

Introductory Course on Non-smooth Optimisation Lecture 09 - Non-convex optimisation Jingwei Liang

Introduction to program optimisation Michel Schinz (based on Erik Stenmans slides) Advanced

What You Need to Know The importance of Up-Front and Multi-Fidelity simulation

Fidelity Bank Presentation Non-Deal Roadshow May Jun 2018 www.fidelitybank.ng Profile of

Water Resources Water Resources Water Resources Water Resources Geospatial World Forum 2014

ACHIEVING HIGH FIDELITY ASSERTIVE COMMUNITY TREATMENT THROUGH THE IMPLEMENTATION OF FIDELITY

Tier 2 Fidelity Data: Strengthening your Tier 2 PBIS Implementation: Using Fidelity Measures to

Informal/formal sector partnerships: Fidelity Bank Ghana BUSINESS UNIT: MICROFINANCE DATE: 22 nd

Wireless Fidelity with bwfm(4) Patrick Wildt September 22, 2019 Patrick Wildt Wireless Fidelity

Optimizing Email Signups How Fidelity Investments boosted email signups Scott MacMillan

CS440/ECE448 Lecture 15: Bayesian Inference and Bayesian Learning Slides by Svetlana Lazebnik,

Bandit Optimisation with Approximations Kirthevasan Kandasamy Carnegie Mellon University

Image Processing II Computer Vision Fall 2018 Columbia University Convolution Review Cross

Video Analytics Xavier Gir-i-Nieto Motivation 2 Motivation 3 Motivation 4 Outline 1.

Network-based and Client-based DMM solutions using Mobile IP mechanisms

Beyond Memorability: Visualization Recognition and Recall Borkin, M., Bylinskii, Z., Kim, N.W.,

Large gratings for large telescopes Ernesto Oliva, INAF-Arcetri (Florence) Why do astronomers

Lucia Oliva Collaborators: Elena Bratkovskaya, Pierre Moreau, Vadim Voronyuk Dynamics of quarks

Bar recursion over finite partial functions Thomas Powell (joint work with Paulo Oliva)