Myopic Posterior Sampling for Adaptive Goal Oriented Design of - - PowerPoint PPT Presentation



SLIDE 1

Myopic Posterior Sampling for Adaptive Goal Oriented Design of Experiments

Kirthevasan Kandasamy, Willie Neiswanger, Reed Zhang, Akshay Krishnamurthy, Jeff Schneider, Barnabás Póczos

ICML 2019

SLIDE 2

Example 1: Active Learning in Parametric Models

(Expensive) Blackbox System

Goal: Learn the parameter θ in as few experiments as possible.
Algorithms: Active-Set-Select (Chaudhuri et al. 2015)

SLIDE 3

Example 2: Blackbox Optimisation

(Expensive) Blackbox System

Goal: Find argmax_x f_θ(x) in as few experiments as possible.
Algorithms: UCB (Srinivas et al. 2010, Auer 2002), EI (Jones et al. 1998)
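As a concrete illustration of the UCB family of algorithms mentioned above (a sketch, not the paper's or the cited works' code), a minimal UCB1-style selection rule over a finite pool of candidate designs could look like this; `pulls`, `means`, and `t` are assumed bookkeeping that the caller maintains:

```python
import math

def ucb1_choice(pulls, means, t):
    """Pick the next design by a UCB1-style index (sketch in the spirit of Auer 2002).

    pulls[i] -- how many times design i has been tried so far
    means[i] -- empirical mean reward of design i
    t        -- total number of experiments run so far
    Untried designs get an infinite index, so each is tried once first.
    """
    def index(i):
        if pulls[i] == 0:
            return float("inf")
        # Exploitation term plus an exploration bonus that shrinks with pulls[i].
        return means[i] + math.sqrt(2.0 * math.log(t) / pulls[i])
    return max(range(len(pulls)), key=index)
```

GP-UCB (Srinivas et al. 2010) follows the same optimism principle but replaces the empirical mean and bonus with a Gaussian-process posterior mean and standard deviation over a continuous design space.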

SLIDE 4

Adaptive Goal Oriented Design of Experiments

[Diagram: a loop between an Experiment, a (Bayesian) Model, and a Recommendation Algorithm — update the model with results, then choose the next design to test, guided by an Application Specific Goal]

SLIDE 5

Adaptive Goal Oriented Design of Experiments


• Blackbox Optimisation
• Active Learning
• Active Quadrature (Osborne et al. 2012)
• Active Level Set Estimation (Gotovos et al. '13)
• Active Search (Ma et al. '17)
• Active Posterior Estimation (Kandasamy et al. '15)

SLIDE 6

Adaptive Goal Oriented Design of Experiments


Issues:

• New goal/setting ⇒ new algorithm?
• Algorithms tend to depend on the model and vice versa.

SLIDE 7

Adaptive Goal Oriented Design of Experiments

1. System:

• An unknown parameter θ completely specifies the system.
• A prior P(θ) and a likelihood P(Y | X, θ).

SLIDE 8

Adaptive Goal Oriented Design of Experiments


2. Goal:

• Collect data D_n = {(x_t, y_{x_t})}_{t=1}^n to maximise a user-specified reward function λ(θ, D_n).
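To make λ(θ, D_n) concrete, here are two toy instantiations for a hypothetical parametric system f_θ(x) = −a(x − b)² with θ = (a, b); these are illustrative stand-ins, not the paper's exact reward functions:

```python
import math

def f(theta, x):
    """Toy parametric system: f_theta(x) = -a * (x - b) ** 2."""
    a, b = theta
    return -a * (x - b) ** 2

def reward_optimisation(theta, data):
    """Blackbox-optimisation-style reward: true function value of the
    best design queried so far (data is a list of (x, y) pairs)."""
    return max(f(theta, x) for x, _ in data)

def reward_active_learning(theta, data):
    """Active-learning-style reward: log-likelihood of the collected data
    under a unit-variance Gaussian observation model y ~ N(f_theta(x), 1)."""
    return sum(-0.5 * (y - f(theta, x)) ** 2 - 0.5 * math.log(2 * math.pi)
               for x, y in data)
```

The key point is that each goal from the earlier list can be encoded by choosing λ, while the system model P(θ), P(Y | X, θ) stays fixed.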

SLIDE 9

Algorithm: Myopic Posterior Sampling (MPS)

Inspired by Posterior (Thompson) Sampling (Thompson 1933).

At each time step, myopically choose the next action by assuming that a posterior sample θ′ ∼ P(θ | past experiments) is the true parameter.

SLIDE 10

Algorithm: Myopic Posterior Sampling (MPS)

Only requires that we can sample from the posterior.

  • Many probabilistic programming tools available today.
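A minimal sketch of the MPS loop, under an assumed Beta–Bernoulli system where θ is a vector of success probabilities and the reward λ(θ, D_n) counts observed successes. Under this particular goal, the myopic one-step gain of design x under a sampled θ′ is just θ′_x, so the loop coincides with Thompson sampling; all names below are illustrative, not from the paper's code:

```python
import random

def mps_bernoulli(true_theta, n_rounds, seed=0):
    """Myopic Posterior Sampling sketch on a K-design Bernoulli system."""
    rng = random.Random(seed)
    k = len(true_theta)
    # Independent Beta(1, 1) priors over each design's success probability.
    alpha = [1.0] * k   # 1 + observed successes per design
    beta = [1.0] * k    # 1 + observed failures per design
    history = []
    for _ in range(n_rounds):
        # 1. Draw theta' ~ P(theta | past experiments) -- exact here,
        #    thanks to Beta-Bernoulli conjugacy.
        sampled = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # 2. Myopically pick the design with the largest one-step
        #    expected reward gain under theta' (here, just theta'_x).
        x = max(range(k), key=lambda i: sampled[i])
        # 3. Run the (simulated) expensive experiment.
        y = 1 if rng.random() < true_theta[x] else 0
        # 4. Update the posterior with the new result.
        alpha[x] += y
        beta[x] += 1 - y
        history.append((x, y))
    return history, alpha, beta
```

In non-conjugate models, step 1 is where a probabilistic programming tool (e.g. an MCMC sampler) would come in; steps 2–4 are unchanged.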

SLIDE 11

Theory

Theorem (Informal): Under certain conditions, MPS is competitive with a globally optimal oracle that knows θ.

Proof ideas from adaptive submodularity and bandits.

SLIDE 12

Theory

Theorem (Informal): Under certain conditions, MPS is competitive with a globally optimal oracle that knows θ.

Proof ideas from adaptive submodularity and bandits.

Prior work: With adaptive submodularity, myopic planning algorithms are good when the reward is known a priori.

SLIDE 13

Theory

Theorem (Informal): Under certain conditions, MPS is competitive with a globally optimal oracle that knows θ.

Proof ideas from adaptive submodularity and bandits.

Prior work: With adaptive submodularity, myopic planning algorithms are good when the reward is known a priori. This work:

• λ(θ, D_n): reward not known a priori.
• A myopic learning+planning algorithm is good in adaptive submodular environments.

SLIDE 14

Experiments

[Plots: error/reward vs. number of experiments on four tasks, comparing MPS against RAND, an Oracle, and task-specific baselines]

• Active Learning — synthetic example (baseline: Chaudhuri et al. '15)
• Active Level Set Estimation — Luminous Red Galaxies (baseline: Gotovos et al. '13)
• Active Posterior Estimation — Type Ia Supernova (baseline: Kandasamy et al. '15)
• Application Specific Goal — Electrolyte Design

SLIDE 15

Thanks: Willie, Reed, Akshay, Jeff, Barnabas

Code: github.com/kirthevasank/mps

Poster: #262