Safe Exploration for Optimization with Gaussian Processes



slide-1
SLIDE 1

Safe Exploration for Optimization with Gaussian Processes

Yanan Sui (Caltech)
Alkis Gotovos (ETH Zurich)
Joel W. Burdick (Caltech)
Andreas Krause (ETH Zurich)

slide-2
SLIDE 2

Better safe than sorry

youtube.com/user/mattessons

Safe Exploration for Optimization with Gaussian Processes Alkis Gotovos 2

slide-3
SLIDE 3

Therapeutic spinal cord stimulation

girardgibbs.com sjm.com

◮ Find electrode configurations that maximize muscle activity
◮ Bad configurations may cause pain or have negative effects on treatment

slide-4
SLIDE 4

Goal

Optimize an unknown reward function via sequential sampling AND remain “safe” throughout the process

slide-8
SLIDE 8

Problem statement

◮ Finite decision set D
◮ Unknown reward function f : D → R
◮ Safety threshold h ∈ R
◮ Seed set S_0 of safe decisions (∀x ∈ S_0, f(x) ≥ h)

slide-11
SLIDE 11

Problem statement

Sequential sampling

◮ For t = 1, 2, ...
    ◮ select x_t ∈ D
    ◮ observe f(x_t) + n_t

Goal

◮ Find x* ∈ argmax_{x ∈ D} f(x)
◮ Remain safe: ∀t ≥ 1, f(x_t) ≥ h

slide-15
SLIDE 15

Related work

◮ Bayesian optimization: function evaluation is expensive
◮ Various proposed criteria, e.g.,
    ◮ Expected improvement [Mockus et al., 1974]
    ◮ UCB [Auer, 2002] [Srinivas et al., 2010]
◮ Related variants
    ◮ Level set estimation [Gotovos et al., 2013]
    ◮ Bayesian optimization with constraints [Gardner et al., 2014]
◮ Gaussian processes popular for modeling the unknown function

slide-21
SLIDE 21

Gaussian process regression

[Figure: GP regression over x, showing the upper confidence bound u_t(x) and the lower confidence bound ℓ_t(x)]
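The confidence bounds u_t(x) and ℓ_t(x) can be illustrated with a minimal NumPy sketch of GP regression. Everything specific here is an assumption for illustration, not taken from the slides: a 1-D input, a unit-variance squared exponential kernel, a fixed noise variance, and bounds of the form mean ± sqrt(beta) · std with a hypothetical scaling parameter `beta`.

```python
import numpy as np

def sq_exp_kernel(a, b, lengthscale=1.0):
    """Unit-variance squared exponential kernel between 1-D arrays a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def gp_confidence_bounds(x_obs, y_obs, x_query, noise_var=1e-4, beta=4.0):
    """Return (lower, upper) = mean -/+ sqrt(beta) * std at each query point."""
    K = sq_exp_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    K_star = sq_exp_kernel(x_query, x_obs)
    mean = K_star @ np.linalg.solve(K, y_obs)
    # Posterior variance k(x, x) - k_*^T K^{-1} k_*; k(x, x) = 1 for this kernel.
    var = 1.0 - np.sum(K_star * np.linalg.solve(K, K_star.T).T, axis=1)
    std = np.sqrt(np.clip(var, 0.0, None))  # clip guards against round-off
    half_width = np.sqrt(beta) * std
    return mean - half_width, mean + half_width

if __name__ == "__main__":
    x_obs = np.array([0.0, 1.0])
    y_obs = np.array([0.5, -0.2])
    lower, upper = gp_confidence_bounds(x_obs, y_obs, np.linspace(-1.0, 2.0, 7))
    print(np.round(lower, 2), np.round(upper, 2))
```

The interval is narrow near observed points and widens away from them, which is the behavior the sampling rules on the later slides rely on.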

slide-24
SLIDE 24

GP-UCB

◮ Use upper confidence bounds for optimistic sampling
◮ x_t = argmax_{x ∈ D} u_t(x)
◮ Sublinear regret under suitable conditions on f [Srinivas et al., 2010]
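The selection rule x_t = argmax u_t(x) can be sketched over a finite candidate set as follows. The form u_t = mean + sqrt(beta) · std and the parameter name `beta` are assumptions made for illustration; the slides only state that an upper confidence bound is used.

```python
import numpy as np

def gp_ucb_select(mean, std, beta=4.0):
    """Return the index of the candidate with the largest upper bound u_t(x)."""
    upper = mean + np.sqrt(beta) * std
    return int(np.argmax(upper))

if __name__ == "__main__":
    mean = np.array([0.0, 1.0, 0.5])
    std = np.array([1.0, 0.1, 0.1])
    # The first candidate has the lowest mean but the largest uncertainty.
    print(gp_ucb_select(mean, std))
```

Note that this rule looks at every point in D and ignores the threshold h, so nothing prevents it from sampling unsafe points.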

slide-25
SLIDE 25

GP-UCB example

[Figures: GP-UCB sampling at t = 0, 5, 10, 20]

slide-30
SLIDE 30

Certifying safety

◮ Assume that f is L-Lipschitz continuous w.r.t. a metric d
◮ If for some safe x we know f(x), then a safety certificate for x′ is

    f(x) − L d(x, x′) ≥ h
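The certificate works because L-Lipschitz continuity gives f(x′) ≥ f(x) − L d(x, x′), so the inequality above implies f(x′) ≥ h. A minimal sketch of the check, assuming d is the Euclidean distance (the slides allow any metric) and using hypothetical argument names:

```python
import numpy as np

def certified_safe(x_safe, f_safe, candidates, lipschitz, h):
    """Boolean mask of candidates x' with f(x) - L * d(x, x') >= h."""
    # Assumption: d is the Euclidean distance between row vectors.
    dist = np.linalg.norm(candidates - x_safe, axis=-1)
    return f_safe - lipschitz * dist >= h

if __name__ == "__main__":
    x_safe = np.array([0.0])
    candidates = np.array([[0.2], [0.6], [-0.5]])
    print(certified_safe(x_safe, 1.0, candidates, lipschitz=2.0, h=0.0))
```

A point that fails the check is not necessarily unsafe; it simply cannot be certified from this particular x.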

slide-31
SLIDE 31

Certifying safety

[Figure sequence: certifying safety starting from the seed set S_0; a later panel is labeled R̄_0(S_0)]

slide-43
SLIDE 43

Reachability

[Figure sequence: reachability; the final panel is labeled R̄_ϵ(S_0)]

slide-49
SLIDE 49

Reconsidering optimization

◮ Initial goal of finding f* = max_{x ∈ D} f(x) is unrealistic
◮ Instead, aim for the ϵ-reachable maximum

    f*_ϵ = max_{x ∈ R̄_ϵ(S_0)} f(x)

◮ Smaller ϵ → stricter goal → need more samples
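The slides do not spell out R̄_ϵ(S_0) in text, so the sketch below assumes one natural definition: a point x′ is added when some already reachable x satisfies f(x) − ϵ − L d(x, x′) ≥ h, and this expansion is repeated until nothing changes. It uses the true f, which is unknown in practice, so it only serves to illustrate what the target f*_ϵ means. The 1-D decision set and all function names are hypothetical.

```python
import numpy as np

def reachable_set(f_vals, points, seed_mask, lipschitz, h, eps):
    """Expand the seed set until no new point can be certified (oracle view)."""
    dist = np.abs(points[:, None] - points[None, :])  # 1-D decision set
    # margin[i, j]: certificate for point j from point i, with eps of slack
    margin = f_vals[:, None] - eps - lipschitz * dist
    safe = seed_mask.copy()
    while True:
        new_safe = safe | np.any(safe[:, None] & (margin >= h), axis=0)
        if np.array_equal(new_safe, safe):
            return safe
        safe = new_safe

def reachable_max(f_vals, points, seed_mask, lipschitz, h, eps):
    """Largest value of f over the assumed eps-reachable set."""
    return f_vals[reachable_set(f_vals, points, seed_mask, lipschitz, h, eps)].max()

if __name__ == "__main__":
    points = np.arange(5.0)
    f_vals = np.array([1.0, 1.5, 0.4, 3.0, 3.0])
    seed = np.array([True, False, False, False, False])
    for eps in (0.0, 0.5):
        print(eps, reachable_max(f_vals, points, seed, 1.0, 0.0, eps))
```

In this toy example the two points with f = 3 are safe but never reachable from the seed, which is why the global maximum is an unrealistic target; increasing ϵ from 0 to 0.5 shrinks the reachable set and lowers the target.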

slide-52
SLIDE 52

First attempt: Safe-UCB

◮ Keep set S_t of certified safe points (starting with S_0)
◮ Use Lipschitz property with GP lower bounds to certify safety
◮ x_t = argmax_{x ∈ S_t} u_t(x)
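One step of this scheme might look like the sketch below. It assumes the safe set grows by applying the Lipschitz certificate with the GP lower bound ℓ_t(x) in place of the unknown f(x); the slides describe this only in words, so the exact update and the argument names are assumptions.

```python
import numpy as np

def safe_ucb_step(lower, upper, dist, safe_mask, lipschitz, h):
    """Grow the certified-safe set, then pick the safe point with the largest u_t."""
    # certified[i, j]: point j is certified from point i via l_t(x_i) - L d >= h
    certified = lower[:, None] - lipschitz * dist >= h
    new_safe = safe_mask | np.any(safe_mask[:, None] & certified, axis=0)
    # Maximize the upper bound over certified-safe points only.
    masked_upper = np.where(new_safe, upper, -np.inf)
    return new_safe, int(np.argmax(masked_upper))

if __name__ == "__main__":
    points = np.arange(4.0)
    dist = np.abs(points[:, None] - points[None, :])
    lower = np.array([0.5, 0.0, 0.0, 0.0])
    upper = np.array([1.0, 1.2, 5.0, 5.0])
    safe = np.array([True, False, False, False])
    print(safe_ucb_step(lower, upper, dist, safe, lipschitz=0.5, h=0.0))
```

Unlike GP-UCB, the points with the largest upper bounds are skipped here because they have not been certified.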

slide-53
SLIDE 53

Safe-UCB example

[Figures: Safe-UCB sampling at t = 0, 5, 10, 20, 50, showing the seed set S_0 and the safe set S_t]

slide-60
SLIDE 60

SafeOpt

◮ Encourage expansion of S_t → keep set G_t ⊆ S_t of potential expanders
◮ Encourage locating the maximum within S_t → keep set M_t ⊆ S_t of potential maximizers
◮ Pick most uncertain point within G_t ∪ M_t

slide-61
SLIDE 61

SafeOpt example

[Figures: SafeOpt sampling at t = 0, 5, 10, 20, 30, 35, 40, 50, showing the seed set S_0, the safe set S_t, the potential expanders G_t, and the potential maximizers M_t]

slide-69
SLIDE 69

SafeOpt pseudocode

Input: sample set D, kernel k, Lipschitz constant L, seed set S_0, safety threshold h
for t = 1, 2, ... do
    Update S_t, G_t, and M_t
    x_t ← argmax_{x ∈ G_t ∪ M_t} (u_t(x) − ℓ_t(x))
    y_t ← f(x_t) + n_t
    Update GP estimates
end for
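The selection line of the pseudocode can be sketched as below. The slides do not give formulas for G_t and M_t, so the definitions used here are assumptions: M_t contains safe points whose upper bound is at least the best lower bound within S_t, and G_t contains safe points that would certify at least one point outside S_t if f were as large as u_t there. The function and argument names are hypothetical.

```python
import numpy as np

def safeopt_select(lower, upper, dist, safe_mask, lipschitz, h):
    """Return (G_t mask, M_t mask, index of x_t) for one SafeOpt-style step."""
    # Potential maximizers: safe points whose upper bound reaches the best
    # lower bound among safe points.
    best_lower = lower[safe_mask].max()
    maximizers = safe_mask & (upper >= best_lower)
    # Potential expanders: safe points that could certify some point outside
    # S_t under the optimistic assumption f(x) = u_t(x).
    optimistic = upper[:, None] - lipschitz * dist >= h
    expanders = safe_mask & np.any(optimistic & ~safe_mask[None, :], axis=1)
    # Most uncertain point (widest confidence interval) within G_t ∪ M_t.
    candidates = expanders | maximizers
    width = np.where(candidates, upper - lower, -np.inf)
    return expanders, maximizers, int(np.argmax(width))

if __name__ == "__main__":
    points = np.arange(4.0)
    dist = np.abs(points[:, None] - points[None, :])
    lower = np.array([0.9, 0.2, -1.0, -1.0])
    upper = np.array([1.0, 0.8, 3.0, 3.0])
    safe = np.array([True, True, False, False])
    print(safeopt_select(lower, upper, dist, safe, lipschitz=0.6, h=0.0))
```

In the toy example the second point cannot be the maximizer, yet it is still sampled because it may expand the safe set, which is the behavior that distinguishes this rule from Safe-UCB.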

slide-71
SLIDE 71

Theorem

Assumptions

◮ f has bounded norm in the RKHS defined by k
◮ f is L-Lipschitz continuous
◮ n_t is a uniformly bounded martingale difference sequence

Under suitable scaling of the GP confidence intervals, the following jointly hold w.h.p.

◮ ∀t ≥ 1, f(x_t) ≥ h
◮ ∀t ≥ t*, f(x̂_t) ≥ f*_ϵ − ϵ

slide-72
SLIDE 72

Experiment 1: Synthetic

◮ Draw 100 random 2-D functions from GP prior (sq. exponential kernel)
◮ Use random singleton seed set S_0 per function
◮ Run 100 iterations of each algorithm

slide-74
SLIDE 74

Experiment 1: Synthetic

r_t := f*_ϵ − max_{τ ≤ t} f(x_τ)

[Figure: r_t versus t for Safe-UCB, SafeOpt, and GP-UCB, followed by one panel per algorithm titled SafeOpt, Safe-UCB, and GP-UCB]
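The quantity r_t is the gap between the ϵ-reachable maximum and the best value sampled so far. A minimal sketch of how it could be computed from a run, with hypothetical argument names and made-up numbers:

```python
import numpy as np

def simple_regret(f_samples, f_eps_star):
    """r_t = f*_eps - max_{tau <= t} f(x_tau) for t = 1, ..., len(f_samples)."""
    best_so_far = np.maximum.accumulate(np.asarray(f_samples, dtype=float))
    return f_eps_star - best_so_far

if __name__ == "__main__":
    # Illustrative values only, not data from the experiment.
    print(simple_regret([0.5, 0.2, 1.0, 0.8], f_eps_star=1.5))
```

Because it tracks the running maximum, r_t never increases with t. It can become negative for an algorithm that samples outside the reachable set.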

slide-75
SLIDE 75

Experiment 2: Spinal cord therapy

◮ Electrode configurations are represented by points in R^4
◮ Fit sq. exponential ARD kernel
◮ Run 300 iterations of each algorithm

slide-78
SLIDE 78

Experiment 2: Spinal cord therapy

[Figure: comparison of SafeOpt, Safe-UCB, and GP-UCB in two panels labeled "stop" and "non-stop", followed by one panel per algorithm titled SafeOpt, Safe-UCB, and GP-UCB]

slide-80
SLIDE 80

Conclusion

Recap

◮ We formulated safe optimization using the concept of reachability
◮ We proposed SafeOpt, an algorithm with theoretical guarantees

What we skipped here

◮ Rigorous theoretical setup and analysis
◮ Another application: safe movie recommendation