Parallel Thompson Sampling for Large-scale Accelerated Exploration of Chemical Space


SLIDE 1

Parallel Thompson Sampling for Large-scale Accelerated Exploration of Chemical Space

José Miguel Hernández-Lobato, Department of Engineering, University of Cambridge
http://jmhl.org, jmh233@cam.ac.uk
Joint work with James Requeima, Edward O. Pyzer-Knapp and Alan Aspuru-Guzik.

SLIDES 2-3

Drug and material design

Goal: find novel molecules that optimally fulfill various metrics. About 10^8 compounds in databases; potential ones: 10^20 to 10^60.

Challenges:

  • Evaluating molecular properties is slow and expensive.
  • Chemical space is huge.

Bayesian optimization can accelerate the search.

SLIDES 4-5

Bayesian optimization aims to efficiently optimize black-box functions:

  x⋆ = argmax_{x ∈ X} f(x)

No gradients; observations may be corrupted by noise. Black-box queries are very expensive (time, economic cost, etc.).

Main idea: replace expensive black-box queries with cheaper computations that will save additional queries in the long run.

SLIDES 6-27

Objective / Acquisition Function α(x)

The Bayesian optimization loop (built up incrementally across these slides):

1. Get initial sample.
2. Fit a model to the data: p(y|x, D).
3. Select data collection strategy: α(x) = E_{p(y|x,D)}[U(y|x, D)].
4. Optimize acquisition function α(x).
5. Collect data and update model.
6. Repeat!
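The six-step loop above can be sketched concretely. The snippet below is a minimal illustration, not the talk's implementation: it optimizes over a discrete candidate set, models the data with Bayesian linear regression on random cosine features (a simple stand-in for the GP posterior used later in the talk), and uses an upper-confidence-bound acquisition. The name `bo_loop` and all constants are hypothetical choices.

```python
import numpy as np

def bo_loop(f, candidates, n_init=3, n_iter=15, n_feat=50, s2=1e-4, seed=0):
    """Minimal Bayesian optimization loop over a discrete candidate set.

    Model: Bayesian linear regression on random cosine features.
    Acquisition: upper confidence bound, alpha(x) = mu(x) + 2 * sigma(x).
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=4.0, size=n_feat)         # random frequencies
    b = rng.uniform(0.0, 2 * np.pi, size=n_feat)   # random phases
    phi = lambda x: np.cos(np.outer(np.atleast_1d(x), W) + b)

    # Step 1: get an initial sample.
    idx = list(rng.choice(len(candidates), size=n_init, replace=False))
    X = [candidates[i] for i in idx]
    y = [f(candidates[i]) for i in idx]

    for _ in range(n_iter):
        # Step 2: fit the model p(y|x, D): Gaussian posterior over weights.
        Phi = phi(X)
        S = np.linalg.inv(Phi.T @ Phi / s2 + np.eye(n_feat))  # posterior cov
        m = S @ Phi.T @ np.array(y) / s2                      # posterior mean
        # Steps 3-4: build and optimize the acquisition function.
        Pc = phi(candidates)
        mu = Pc @ m
        sd = np.sqrt(np.sum(Pc @ S * Pc, axis=1) + s2)
        alpha = mu + 2.0 * sd
        alpha[idx] = -np.inf            # do not re-query evaluated points
        i = int(np.argmax(alpha))
        # Step 5: collect data; the model is refit on the next pass.
        idx.append(i)
        X.append(candidates[i])
        y.append(f(candidates[i]))
        # Step 6: repeat!
    best = int(np.argmax(y))
    return X[best], y[best]

# Usage: maximize a toy quadratic on a 1-D grid.
grid = np.linspace(0.0, 1.0, 101)
x_best, y_best = bo_loop(lambda x: -(x - 0.3) ** 2, grid)
```

With roughly 18 evaluations the loop typically locates the maximizer of the toy quadratic near x = 0.3.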

SLIDES 28-32

Discovering new optimal molecules

Library generation: fragments + bonding rules → performance evaluation → interesting molecules.

Bayesian optimization can accelerate the search!

Challenges:

1. Massive libraries with millions of candidate molecules.
2. Need to collect hundreds of thousands of data points.
3. How to collect data in parallel efficiently? E.g. with a computer cluster.

SLIDES 33-38

Parallel Bayesian optimization

Traditional Bayesian optimization is sequential! Computing clusters allow us to collect a batch of data at once! Parallel experiments should be highly informative but also diverse!

SLIDES 39-40

Traditional parallel BO

Parallel BO can be implemented by averaging the sequential acquisition function across data {y_k}_{k=1}^K fantasized at the pending evaluation locations {x_k}_{k=1}^K:

  α_parallel(x|D) = E_{p({y_k}_{k=1}^K | {x_k}_{k=1}^K, D)}[ α_sequential(x | D ∪ {x_k, y_k}_{k=1}^K) ] .

This expectation is approximated by an empirical average across fantasies (samples) of {y_k}_{k=1}^K.
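The fantasy average can be sketched numerically (an illustrative NumPy setup, not code from the talk): draw fantasy outcomes at the pending locations from the current exact 1-D GP posterior, recompute a sequential acquisition (here a UCB) conditioned on each fantasy, and average. For simplicity the fantasies below are drawn independently per pending point; a joint draw would use the posterior covariance between pending points.

```python
import numpy as np

def rbf(a, b, ell=0.2):
    """Unit-amplitude squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def posterior(X, y, grid, s2=1e-6):
    """Exact GP posterior mean and variance at the grid points."""
    K = rbf(X, X) + s2 * np.eye(len(X))
    Ks = rbf(grid, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1) + s2
    return mu, np.maximum(var, 1e-12)

def ucb(mu, var, beta=2.0):
    """Sequential acquisition: upper confidence bound."""
    return mu + beta * np.sqrt(var)

def parallel_acquisition(X, y, pending, grid, n_fantasy=200, seed=0):
    """Average the sequential UCB over fantasies at pending locations."""
    rng = np.random.default_rng(seed)
    mu_p, var_p = posterior(X, y, pending)
    acq = np.zeros(len(grid))
    for _ in range(n_fantasy):
        # Fantasize outcomes y_k at the pending points (independent draws).
        y_fant = mu_p + np.sqrt(var_p) * rng.standard_normal(len(pending))
        Xf = np.concatenate([X, pending])
        yf = np.concatenate([y, y_fant])
        # Sequential acquisition conditioned on D ∪ {x_k, y_k}.
        acq += ucb(*posterior(Xf, yf, grid))
    return acq / n_fantasy
```

Conditioning on a fantasy collapses the predictive variance at a pending location, so the averaged acquisition stops favoring points that are already being evaluated.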

SLIDES 41-42

Traditional parallel BO

Figure source: Snoek et al. 2012. Two pending evaluations, three fantasies. Three acquisition functions, one per fantasy. Average acquisition function.

Challenges:

  • Lack of scalability with large batch sizes and large library sizes.

SLIDES 43-46

Traditional parallel BO

[Figure: timeline of model updates and function evaluations.]

Updating the model and optimizing the acquisition function are done sequentially. This fails to exploit parallelism! There is a need for methods that work fully in a parallel and distributed manner.

SLIDES 47-54

Thompson sampling (TS)

Sequential BO method that collects data by evaluating at x ∼ p(x⋆|D). Implemented by drawing f′ ∼ p(f|D) and then evaluating at x = argmax_{x′} f′(x′). The acquisition function is a sample from the posterior over functions!

A very simple strategy that often works well in practice.

  • Exploitation: achieved because, on average, f′ minimizes the prediction error.
  • Exploration: achieved because f′ ∼ p(f|D) is a random sample.
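On a discrete grid, one TS step can be sketched directly from this description. The kernel, lengthscale and names below are illustrative choices, and the argmax convention matches x⋆ = argmax f from the start of the talk.

```python
import numpy as np

def ts_select(X, y, grid, ell=0.2, s2=1e-6, seed=None):
    """One Thompson sampling step on a grid: draw f' ~ p(f|D),
    then return argmax_x f'(x) along with the sampled function."""
    rng = np.random.default_rng(seed)
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)
    K = k(X, X) + s2 * np.eye(len(X))
    Ks = k(grid, X)
    mu = Ks @ np.linalg.solve(K, y)                       # posterior mean
    cov = k(grid, grid) - Ks @ np.linalg.solve(K, Ks.T)   # posterior cov
    cov += 1e-8 * np.eye(len(grid))                       # jitter for PSD
    f_sample = rng.multivariate_normal(mu, cov)           # f' ~ p(f|D)
    return int(np.argmax(f_sample)), f_sample
```

The sampled f′ interpolates the observed data up to the noise level, which is what makes the argmax of a single draw both exploitative and exploratory.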

SLIDES 55-65

TS as utility maximization and parallel TS

The utility function used by TS is U(y|x, D) = y. TS aims to optimize

  α_TS(x) = E_{p(y|x,D)}[y] ≈ (1/M) Σ_{m=1}^M E_{p(y|x,θ_m)}[y] ,  θ_m ∼ p(θ|D) ,

where p(y|x, D) = ∫ p(y|x, θ) p(θ|D) dθ and θ are the model parameters. TS uses M = 1 since low values of M increase variance and exploration.

We can apply the traditional parallel BO approach to TS:

  α_parallelTS(x|D) = E_{p({y_k}_{k=1}^K | {x_k}_{k=1}^K, D)}[ α_TS(x | D ∪ {x_k, y_k}_{k=1}^K) ]
                    ≈ (1/M) Σ_{m=1}^M α_TS(x | D ∪ {x_k, y_{k,m}}_{k=1}^K)
                    = α_TS(x|D) ,

where {y_{k,m}}_{k=1}^K ∼ p({y_k}_{k=1}^K | {x_k}_{k=1}^K, D) and, as before, M = 1.

Our parallel TS is equivalent to running sequential TS multiple times!
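Parallel TS therefore amounts to K independent repetitions of the sequential selection, one per worker. The sketch below (names and constants illustrative, grid-based GP posterior as before) runs them in a loop, but each draw only needs a copy of D, so in practice each one can run on a different machine.

```python
import numpy as np

def parallel_ts_batch(X, y, grid, K=5, ell=0.2, s2=1e-6, seed=0):
    """Select a batch of K evaluation points with parallel Thompson sampling.

    Each of the K selections draws its own posterior sample f' and
    maximizes it. The draws are independent, so each loop iteration can
    run on a separate worker holding a copy of the data D."""
    rng = np.random.default_rng(seed)
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)
    K_xx = k(X, X) + s2 * np.eye(len(X))
    Ks = k(grid, X)
    mu = Ks @ np.linalg.solve(K_xx, y)
    cov = k(grid, grid) - Ks @ np.linalg.solve(K_xx, Ks.T)
    cov += 1e-8 * np.eye(len(grid))            # jitter for PSD
    batch = []
    for _ in range(K):                         # embarrassingly parallel
        f_sample = rng.multivariate_normal(mu, cov)  # this worker's f'
        batch.append(int(np.argmax(f_sample)))
    return batch
```

Because each worker maximizes a different random sample, the batch tends to be diverse without any explicit diversity penalty.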

SLIDES 66-81

Example

[Figure sequence: the objective function and the points selected by sampled functions, built up across these slides.]

Each optimization problem can be solved independently on a different computer.

SLIDES 82-83

Parallel Thompson sampling

[Figure: timeline of parallel model updates and function evaluations.]

Works in a fully parallel and distributed manner.

SLIDE 84

Thompson sampling with Gaussian processes (GPs)

GPs are non-parametric models, so sampling the model parameters θ and optimizing E_{p(y|x,θ)}[y] is not directly possible. We approximate the objective function f as f(x) ≈ Φ(x)θ with random features

  Φ(x) = C cos(W^T x + b) ,  (1)

where W, b ∼ p(W, b), a distribution specified by the GP covariance function. The prior for θ is N(0, I). The resulting Bayesian linear regression model is a parametric approximation to the original GP model.
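For the squared-exponential covariance k(x, x′) = σf² exp(−(x − x′)² / (2ℓ²)), the construction in equation (1) is explicit: the entries of W are drawn from the kernel's spectral density, N(0, 1/ℓ²) in 1-D, b is uniform on [0, 2π], and C = sqrt(2 σf² / m) for m features (Rahimi and Recht's random Fourier features). A sketch with 1-D inputs and illustrative names:

```python
import numpy as np

def random_features(x, W, b, sf2=1.0):
    """Phi(x) = C cos(W x + b) with C = sqrt(2 * sf2 / m)."""
    m = len(W)
    return np.sqrt(2.0 * sf2 / m) * np.cos(np.outer(np.atleast_1d(x), W) + b)

def sample_gp_approx(x, ell=0.3, m=2000, seed=0):
    """Approximate GP prior draw f(x) = Phi(x) theta with theta ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / ell, size=m)   # spectral density of SE kernel
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)
    theta = rng.standard_normal(m)            # weight prior N(0, I)
    return random_features(x, W, b) @ theta
```

A quick sanity check on the construction is that Φ(x) Φ(x′)ᵀ converges to the squared-exponential kernel as the number of features m grows.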

SLIDE 85

Results with Gaussian process based models

Batch size: 10.

[Figures: log immediate regret vs. number of samples for TS, EI, parallel TS and parallel EI on the Bohachevsky, Branin and Hartmann functions and on GP samples.]

SLIDES 86-88

Results with Bayesian neural networks

Data sets:

  • CEP: Harvard Clean Energy Project data, 2.3M molecules.
  • One-dose: percentage cell growth relative to control, 27,000 molecules.
  • Malaria: drug concentration giving half max response, 19,000 molecules.

Batch sizes: 500 (CEP) and 200 (Malaria and One-dose). Fraction of top 10% (CEP) or 1% (Malaria and One-dose) molecules found:

[Figures comparing Thompson sampling, ignoring uncertainty, and random search.]

BO gives 20× gains over random in CEP. Using uncertainty always helps.

SLIDE 89

Comparison with ε-greedy sampling

Table: Average rank and standard errors obtained by each method.

  Method      Rank
  ε = 0.01    3.42 ± 0.28
  ε = 0.025   3.02 ± 0.25
  ε = 0.05    2.86 ± 0.23
  ε = 0.075   3.20 ± 0.26
  Thompson    2.51 ± 0.20

SLIDE 90

Take home message

Parallel Thompson sampling...

1. Is a batch BO method that runs in a parallel and distributed manner.
2. Can handle large batch sizes and large molecule libraries.
3. Is comparable to non-scalable approaches in small problems with GPs.
4. Outperforms other scalable approaches in large-scale settings.

SLIDE 91

Thanks!