Bayesian Probabilistic Numerical Methods: Numerical Disintegration and Pipelines (PowerPoint PPT presentation)


slide-1
SLIDE 1

Bayesian Probabilistic Numerical Methods

Numerical Disintegration and Pipelines

Jon Cockayne June 6, 2017

1

slide-2
SLIDE 2

(Re)introduction

slide-3
SLIDE 3

(Re)introduction

“Prior”: u ∼ µ. “Information Equation”: A(u) = a (“Data”). “Posterior”: Q#µa.

2

slide-4
SLIDE 4

Q1: How can we access µa?

2

slide-5
SLIDE 5

(Re)introduction

Unless probabilistic numerical methods “agree” about what their uncertainty means, they cannot be composed coherently.

3

slide-6
SLIDE 6

Modelling Electro-Mechanics in the Heart

4

slide-7
SLIDE 7

Modelling Electro-Mechanics in the Heart

[Flowchart: model-calibration pipeline. Ca flux during a caffeine Ca transient → fit NCX model. Ca flux during the tail of a field-stimulation Ca transient, less Ca flux through NCX (calculated) → fit SERCA model. ICaL voltage-clamp traces → fit ICaL model. Ca flux during the start of a field-stimulation Ca transient, less Ca flux through NCX, SERCA and ICaL (calculated) → fit RyR model.]

5

slide-8
SLIDE 8

Q2: When is it “legal” to compose Bayesian PNM in pipelines?

5

slide-9
SLIDE 9

Numerical Disintegration

slide-11
SLIDE 11

Numerical Disintegration

Recall, the issue: Xa = {u ∈ X : A(u) = a}, with µ(Xa) = 0

which means…

∄ dµa/dµ

6

slide-13
SLIDE 13

Our Approach

Design an algorithm for approximately sampling µa. Two sources of error

  • Intractability of µa (“Numerical Disintegration”)
  • Intractability of non-Gaussian priors (“prior truncation”)

7

slide-16
SLIDE 16

Three Considerations

Numerical Disintegration Prior Truncation Sampler Convergence

8

slide-19
SLIDE 19

Numerical Disintegration

Introduce the δ-relaxed measure µaδ:

dµaδ/dµ ∝ ϕ(∥A(u) − a∥A / δ)

with ϕ : R+ → R+ a relaxation function chosen so that:

  • ϕ(0) = 1
  • ϕ(r) → 0 as r → ∞.

9
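The requirements on ϕ and the relaxed density can be sketched concretely. A minimal illustration, assuming a finite-dimensional information operator so that ∥·∥A is simply a Euclidean norm; all names are illustrative, not from the slides:

```python
import numpy as np

# Two relaxation functions phi: R+ -> R+ satisfying phi(0) = 1 and
# phi(r) -> 0 as r -> infinity, as required above.
def phi_uniform(r):
    # indicator relaxation: uniform noise over the delta-ball B_delta(a)
    return np.where(np.asarray(r) < 1.0, 1.0, 0.0)

def phi_gaussian(r):
    # Gaussian relaxation: noise with standard deviation proportional to delta
    return np.exp(-np.asarray(r) ** 2)

def relaxed_weight(A_u, a, delta, phi):
    # unnormalised density d(mu^a_delta)/d(mu) evaluated at a draw u,
    # given the forward information A(u)
    return phi(np.linalg.norm(np.asarray(A_u) - np.asarray(a)) / delta)
```

Shrinking δ sharpens the weight around the set A(u) = a, which is exactly what the ladder of the later slides exploits.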

slide-20
SLIDE 20

Numerical Disintegration: Intuition

“Ideal” Radon–Nikodym derivative: “dµa/dµ ∝ I(u ∈ Xa)”

10

slide-22
SLIDE 22

Example Relaxation Functions

ϕ(r) = I(r < 1): uniform noise over Bδ(a). ϕ(r) = exp(−r²): Gaussian noise with s.d. ∝ δ.

11


slide-25
SLIDE 25

Tempering for Sampling µaδ

To sample µaδ we take inspiration from rare event simulation and use tempering schemes to sample the posterior. Set δ0 > δ1 > . . . > δN and consider µaδ0, µaδ1, . . . , µaδN.

  • µaδ0 is the prior and easy to sample.
  • µaδN has δN close to zero and is hard to sample.
  • Intermediate distributions define a “ladder” which takes us from prior to posterior.

12
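The ladder can be sketched as a bare-bones SMC sampler that reweights and resamples between consecutive relaxed measures. This is only a sketch under stated assumptions: a real implementation would add MCMC rejuvenation moves at each level, and the toy problem (A(u) = u², a = 1) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def temper(u0, residual, deltas, phi=lambda r: np.exp(-r ** 2)):
    """Move prior samples down the ladder mu^a_{delta_0}, ..., mu^a_{delta_N}
    by reweighting with the incremental density ratio and resampling.
    residual(u) returns ||A(u) - a|| for each sample."""
    u = np.array(u0, dtype=float)
    r = residual(u)
    n = len(u)
    for d_prev, d_next in zip(deltas[:-1], deltas[1:]):
        w = phi(r / d_next) / phi(r / d_prev)   # incremental weight
        w = w / w.sum()
        idx = rng.choice(n, size=n, p=w)        # multinomial resampling
        u, r = u[idx], r[idx]
    return u

# Toy problem: prior u ~ N(0, 1), information A(u) = u^2, data a = 1,
# so the posterior concentrates near u = -1 and u = +1.
deltas = np.geomspace(10.0, 0.05, 40)
samples = temper(rng.standard_normal(2000),
                 lambda u: np.abs(u ** 2 - 1.0), deltas)
```

Because consecutive δ values are close, each incremental weight is mild, which is the whole point of the ladder: a single jump from prior to δN would collapse onto a handful of particles.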

slide-29
SLIDE 29

Example: Poisson’s Equation

Consider −d²u/dx² = sin(2πx) for x ∈ (0, 1), with u(x) = 0 at x = 0, x = 1.

  • Use a Gaussian prior on u(x).
  • Impose boundary conditions explicitly.
  • Impose interior conditions at x = 1/3, x = 2/3.
  • Construct the posterior using ND with δ ∈ {1.0, 10⁻², 10⁻⁴}.

  • Use ϕ(r) = exp(−r²).

13
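One way to make the ingredients of this example concrete. A hypothetical sketch: the slides do not specify the prior basis, so a sine basis is assumed here precisely because it vanishes at x = 0 and x = 1, imposing the boundary conditions explicitly:

```python
import numpy as np

def forward(coef, xs=(1 / 3, 2 / 3)):
    """Information operator A(u) = (-u''(x) at the interior points) for
    u(x) = sum_i coef[i] * sin((i+1) * pi * x); every basis function
    vanishes at x = 0 and x = 1, so the boundary conditions hold by
    construction."""
    i = np.arange(1, len(coef) + 1)
    # -d^2/dx^2 sin(i pi x) = (i pi)^2 sin(i pi x)
    return np.array([np.sum(coef * (i * np.pi) ** 2 * np.sin(i * np.pi * x))
                     for x in xs])

# "data": the right-hand side sin(2 pi x) at the two interior points
a = np.sin(2 * np.pi * np.array([1 / 3, 2 / 3]))
```

The exact solution u(x) = sin(2πx)/(2π)² corresponds to a single nonzero coefficient, which gives a handy sanity check on A.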

slide-30
SLIDE 30

Example: Poisson’s Equation

In what follows, on the left are samples from the posterior µaδ in X-space. On the right are contours of ϕ(∥A(u) − a∥A / δ) in A-space.

14

slide-31
SLIDE 31

Example: Poisson’s Equation

[Figure: posterior samples in X-space (left) and ϕ contours in A-space (right)], δ = 1.0

15

slide-32
SLIDE 32

Example: Poisson’s Equation

[Figure: posterior samples in X-space (left) and ϕ contours in A-space (right)], δ = 0.01

15

slide-33
SLIDE 33

Example: Poisson’s Equation

[Figure: posterior samples in X-space (left) and ϕ contours in A-space (right)], δ = 0.0001

15

slide-34
SLIDE 34

Three Considerations

Numerical Disintegration Prior Truncation Sampler Convergence

16

slide-36
SLIDE 36

Prior Construction

Assume X has a countable basis {ϕi}, i = 0, . . . , ∞. Then for any u ∈ X,

u(x) = ∑_{i=0}^{∞} γi ξi ϕi(x)

Different ξi require different γ for almost-sure convergence…

  • ξi IID Uniform, γ ∈ ℓ1
  • ξi IID Gaussian, γ ∈ ℓ2
  • ξi IID Cauchy, γ ∈ ℓ2

For practical computation we truncate to N terms: uN(x) = ∑_{i=0}^{N} γi ξi ϕi(x).

17
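Drawing from such a truncated prior can be sketched as follows. The sine basis and the specific decay γi = (i+1)⁻² are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_truncated_prior(N, gamma, xi_dist, xs):
    """One draw of u_N(x) = sum_{i=0}^{N} gamma_i xi_i phi_i(x),
    with phi_i(x) = sin((i+1) pi x) as an illustrative basis."""
    xi = xi_dist(N + 1)                       # IID coefficients xi_i
    i = np.arange(1, N + 2)
    basis = np.sin(np.outer(xs, i * np.pi))   # shape (len(xs), N+1)
    return basis @ (gamma * xi)

N = 50
gamma = 1.0 / np.arange(1, N + 2) ** 2        # decay in l^1, hence also l^2
xs = np.linspace(0.0, 1.0, 101)

u_gauss = sample_truncated_prior(N, gamma, rng.standard_normal, xs)   # xi_i IID Gaussian
u_cauchy = sample_truncated_prior(N, gamma, rng.standard_cauchy, xs)  # xi_i IID Cauchy
```

Swapping `xi_dist` changes the tail behaviour of the prior draws while the decay γ controls their smoothness, matching the bullet list above.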

slide-38
SLIDE 38

Three Considerations

Numerical Disintegration Prior Truncation Sampler Convergence

18

slide-40
SLIDE 40

Convergence, but in what metric?

All results show weak convergence framed in terms of an abstract integral probability metric¹:

dF(ν, ν′) = sup_{∥f∥F ≤ 1} |ν(f) − ν′(f)|

  • Results are generic to A(u), µ.
  • Examples: Total Variation, Wasserstein

1Müller [1997]

19
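For intuition, the 1-Wasserstein distance (the IPM generated by taking F to be the 1-Lipschitz functions) has a simple empirical form in one dimension. A small sketch, not from the slides:

```python
import numpy as np

def wasserstein1(x, y):
    """Empirical 1-Wasserstein distance between two equal-size 1-d
    samples: the IPM d_F with F the 1-Lipschitz functions reduces,
    in 1-d, to the mean absolute difference of order statistics."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    assert x.shape == y.shape
    return float(np.mean(np.abs(x - y)))
```

Total variation is the IPM obtained instead with F the functions bounded by 1; both are special cases of the dF above.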

slide-43
SLIDE 43

Convergence of µa

δ

Theorem Assume that dF(µa, µa′) ≤ Cµ

  • a − a′

α for some Cµ, α constant and A#µ-almost-all a, a′ ∈ A. Then, for small δ dF(µa

δ, µa) ≤ Cµ (1 + Cφ) δα

for A#µ-almost-all a ∈ A

20
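The δ^α rate can be seen in a toy conjugate setting. This worked example is entirely illustrative and not from the slides: take prior u ∼ N(0, 1), A(u) = u, a = 0, and ϕ(r) = exp(−r²). The relaxed density is proportional to exp(−u²/2)·exp(−(u/δ)²), so µaδ = N(0, s²) with 1/s² = 1 + 2/δ², while µa is the point mass at 0; their 1-Wasserstein distance is E|u| = s·√(2/π), which decays linearly in δ (i.e. α = 1 here):

```python
import numpy as np

def w1_to_posterior(delta):
    # mu^a_delta = N(0, s^2) with 1/s^2 = 1 + 2/delta^2 (conjugate toy case);
    # W1(N(0, s^2), point mass at 0) = E|u| = s * sqrt(2/pi)
    s = (1.0 + 2.0 / delta ** 2) ** -0.5
    return s * np.sqrt(2.0 / np.pi)

# the distance shrinks by ~10x per decade of delta, i.e. alpha = 1
rates = [w1_to_posterior(d) for d in (1e-1, 1e-2, 1e-3)]
```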

slide-48
SLIDE 48

Total Error

Denote by µaδ,N the posterior distribution given by

dµaδ,N/dµ ∝ ϕ(∥A ◦ PN(u) − a∥A / δ)

Assume that ∥A(u) − A ◦ PN(u)∥ ≤ exp(m∥u∥X)Φ(N). Then under certain assumptions it can be shown² that:

dF(µa, µaδ,N) ≤ Cµ(1 + Cφ)δ^α + CδΦ(N)

Thus, we have convergence with δ provided CδΦ(N) is controlled.

²Cockayne et al. [2017]

21

slide-49
SLIDE 49

Numerical Disintegration

Numerical Example

slide-50
SLIDE 50

Painlevé’s First Transcendental

u′′(x) − u(x)² = −x,  u(0) = 0,  u(x) → √x as x → ∞

22

slide-51
SLIDE 51

Painlevé’s First Transcendental

u′′(x) − u(x)² = −x,  u(0) = 0,  u(10) = √10

[Figure: solution trajectory over t ∈ (0, 10) (left); coefficient magnitudes |un| against index i on a log scale, marked positive/negative (right)]

22

slide-52
SLIDE 52

Painlevé’s First Transcendental

u′′(x) − u(x)² = −x,  u(0) = 0,  u(10) = √10. We use ϕ(x) = exp(−x²), and define a schedule of 1600 δ from 10 to 10⁻⁴. Following results are based on equi-spaced ti, i = 1, . . . , 15, and generated with an SMC algorithm based upon a Cauchy prior.

22
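One plausible way to realise the schedule of 1600 δ values; the slide does not say how the schedule is spaced, so log-spacing is assumed here purely for illustration:

```python
import numpy as np

# 1600 log-spaced (geometric) delta values from 10 down to 1e-4,
# so consecutive relaxed measures stay close along the whole ladder
deltas = np.logspace(1.0, -4.0, num=1600)
```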

slide-53
SLIDE 53

Painlevé: Posterior Measures

23

slide-57
SLIDE 57

Pipelines

slide-59
SLIDE 59

Example: Split Integration

∫_0^1 u(x)dx = ∫_0^{0.5} u(x)dx + ∫_{0.5}^1 u(x)dx

Observations {u(x1), . . . , u(x2m)}, where x1 = 0, xm = 0.5, x2m = 1.

[Pipeline: u(x1), . . . , u(xm) → B1(µ, ·) → ∫_0^{0.5} u(x)dx; u(xm), . . . , u(x2m) → B2(µ, ·) → ∫_{0.5}^1 u(x)dx; the two integrals → B3(µ, ·) → ∫_0^1 u(x)dx]

When is the output of the pipeline Bayesian?

24

slide-62
SLIDE 62

Dependence Graphs

The abstract structure of the graph allows us to establish a coherence condition. [Figure: dependency graph, nodes 1–6] The dependency graph of a pipeline is obtained by deleting the method nodes and connecting their inputs directly to their outputs.

25

slide-63
SLIDE 63

Coherence

[Figure: dependency graph, nodes 1–6] Definition. A prior is coherent for the dependency graph if Yk is conditionally independent of Yi given Yj. Here i, j < k, the i are non-parent nodes and the j are parent nodes.

26

slide-64
SLIDE 64

Bayesian Pipelines

Theorem A pipeline is Bayesian for its output QoI if:

  1. The prior is coherent for the dependence graph.
  2. The composite PNM are Bayesian.

27

slide-68
SLIDE 68

Split Integration: Coherence

[Pipeline: u(x1), . . . , u(xm) → B1(µ, ·) → ∫_0^{0.5} u(x)dx; u(xm), . . . , u(x2m) → B2(µ, ·) → ∫_{0.5}^1 u(x)dx; the two integrals → B3(µ, ·) → ∫_0^1 u(x)dx]

Is ∫_{0.5}^1 u(x)dx independent of u(x1), . . . , u(xm−1)?

Sometimes - e.g. a Wiener process prior. Sometimes not - e.g. if µ implies a Wiener process on u^(s)(x).

28
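The Wiener-process case can be checked numerically: under the kernel k(s, t) = min(s, t), the covariance between ∫_{0.5}^1 u(x)dx and the early observations vanishes once u(0.5) is conditioned on. A sketch using Riemann-sum quadrature; the grids are illustrative:

```python
import numpy as np

def brownian_cov(s, t):
    # Wiener process covariance kernel k(s, t) = min(s, t)
    return np.minimum.outer(s, t)

left = np.linspace(0.05, 0.45, 9)           # observations before x = 0.5
mid = np.array([0.5])                       # the shared node u(0.5)
grid = np.linspace(0.5, 1.0, 201)           # quadrature grid on [0.5, 1]
w = np.full(grid.size, grid[1] - grid[0])   # Riemann-sum weights

pts = np.concatenate([left, mid, grid])
K = brownian_cov(pts, pts)
nL, nM = left.size, mid.size

K_Lm = K[:nL, nL:nL + nM]                   # Cov(left obs, u(0.5))
K_mm = K[nL:nL + nM, nL:nL + nM]            # Var(u(0.5))
K_LI = K[:nL, nL + nM:] @ w                 # Cov(left obs, integral)
K_mI = K[nL:nL + nM, nL + nM:] @ w          # Cov(u(0.5), integral)

# Gaussian conditioning: Cov(left obs, integral | u(0.5))
cond_cov = K_LI - (K_Lm @ np.linalg.solve(K_mm, K_mI)).ravel()
```

Swapping in a kernel whose derivative process is Wiener (the u^(s) case on the slide) will generally not exhibit this cancellation, which is the “sometimes not” case.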

slide-69
SLIDE 69

Conclusions

slide-70
SLIDE 70

Conclusions

We have seen…

  • A method for approximately sampling from µa.
  • Theoretical results proving asymptotic convergence of that sampler.
  • Coherence conditions for composing Bayesian PNM into a Bayesian pipeline.

29

slide-71
SLIDE 71

More to come

Numerical disintegration is highly inefficient compared to classical numerical methods (and even conjugate PNM). This is not intended to be a practical numerical method, but a framework for comparing other “approximate Bayesian” methods to the ideal. Next steps:

  • Make the algorithm more efficient?
  • Explore more efficient approximations to the posterior

30

slide-74
SLIDE 74

Thanks!

30

slide-75
SLIDE 75

References I

  • J. Cockayne, C. Oates, T. Sullivan, and M. Girolami. Bayesian probabilistic numerical methods, 2017.
  • A. Müller. Integral probability metrics and their generating classes of functions. Adv. in Appl. Probab., 29(2):429–443, 1997. ISSN 0001-8678. doi:10.2307/1428011.

31