Bayesian Probabilistic Numerical Methods: Numerical Disintegration and Pipelines
Jon Cockayne
June 6, 2017
(Re)introduction

“Prior”: u ∼ µ
“Information equation” (the “data”): A(u) = a
“Posterior”: Q#µ^a, the disintegration µ^a pushed forward through the quantity-of-interest map Q

Q1: How can we access µ^a?

Unless probabilistic numerical methods “agree” about what their uncertainty means, they cannot be composed coherently.
Modelling Electro-Mechanics in the Heart

[Figure: a pipeline of cardiac calcium models. The NCX, RyR, SERCA and ICaL models are fitted in sequence, each from a different portion of the data (Ca flux during a caffeine Ca transient, the tail and start of a field-stimulation Ca transient, and ICaL voltage-clamp traces), with fluxes through previously fitted models subtracted at each stage.]

Q2: when is it “legal” to compose Bayesian PNM in pipelines?
Numerical Disintegration

Recall the issue:
X^a = {u ∈ X : A(u) = a},   µ(X^a) = 0
which means the Radon–Nikodym derivative does not exist:
∄ dµ^a/dµ
Our Approach

Design an algorithm for approximately sampling µ^a. Two sources of error:
- Intractability of µ^a (“numerical disintegration”)
- Intractability of non-Gaussian priors (“prior truncation”)
Three Considerations

- Numerical Disintegration
- Prior Truncation
- Sampler Convergence
Numerical Disintegration

Introduce the δ-relaxed measure µ^a_δ:
dµ^a_δ / dµ ∝ ϕ(∥A(u) − a∥_A / δ)
where ϕ : ℝ₊ → ℝ₊ is a relaxation function chosen so that:
- ϕ(0) = 1
- ϕ(r) → 0 as r → ∞.
Numerical Disintegration: Intuition

“Ideal” Radon–Nikodym derivative: “dµ^a/dµ ∝ I(u ∈ X^a)”
Example Relaxation Functions

- ϕ(r) = I(r < 1): uniform noise over B_δ(a)
- ϕ(r) = exp(−r²): Gaussian noise with s.d. ∝ δ
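As a concrete illustration, these relaxations and the resulting unnormalised relaxed density can be written down directly. This is a minimal sketch (not the authors' implementation); the Euclidean norm on the data space and the identity information operator in the final check are assumptions:

```python
import numpy as np

def phi_indicator(r):
    """phi(r) = I(r < 1): uniform noise over the delta-ball B_delta(a)."""
    return float(r < 1)

def phi_gaussian(r):
    """phi(r) = exp(-r^2): Gaussian noise with s.d. proportional to delta."""
    return float(np.exp(-r ** 2))

def relaxed_weight(A, u, a, delta, phi=phi_gaussian):
    """Unnormalised relaxed density d(mu^a_delta)/d(mu) at u.

    A is the information operator; the norm on the data space is
    taken to be Euclidean here (an assumption for this sketch)."""
    r = np.linalg.norm(A(u) - a) / delta
    return phi(r)

# toy check with an identity information operator: phi(0) = 1 at u = a
A = lambda u: u
a = np.array([0.0])
print(relaxed_weight(A, np.array([0.0]), a, delta=0.1))  # 1.0
```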
Tempering for Sampling µ^a_δ

To sample µ^a_δ we take inspiration from rare-event simulation and use tempering schemes to sample the posterior. Set δ₀ > δ₁ > … > δ_N and consider µ^a_{δ_0}, µ^a_{δ_1}, …, µ^a_{δ_N}:
- µ^a_{δ_0} is the prior and easy to sample.
- µ^a_{δ_N} has δ_N close to zero and is hard to sample.
- Intermediate distributions define a “ladder” which takes us from prior to posterior.
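The ladder can be run with a simple sequential Monte Carlo scheme: reweight from one rung to the next, resample, then rejuvenate with Metropolis moves. The sketch below is a toy illustration under assumed choices (one-dimensional prior N(0, 1), identity information operator A(u) = u, data a = 2, geometric δ-schedule), not the algorithm used for the experiments:

```python
import numpy as np

rng = np.random.default_rng(0)
a = 2.0
log_prior = lambda u: -0.5 * u ** 2            # N(0, 1) prior, up to a constant
log_phi = lambda u, d: -((u - a) / d) ** 2     # log phi(|A(u) - a| / delta)

n = 2000
u = rng.standard_normal(n)                     # particles from the prior
deltas = np.geomspace(8.0, 0.05, 40)           # ladder delta_0 > ... > delta_N

logw_prev = np.zeros(n)                        # rung "-1" is the prior: phi = 1
for d in deltas:
    # reweight particles from the previous rung of the ladder to this one
    lw = log_phi(u, d) - logw_prev
    w = np.exp(lw - lw.max())
    w /= w.sum()
    u = u[rng.choice(n, size=n, p=w)]          # multinomial resampling
    # a few random-walk Metropolis moves targeting mu^a_delta
    for _ in range(5):
        prop = u + 0.5 * d * rng.standard_normal(n)
        log_acc = (log_prior(prop) + log_phi(prop, d)) \
                  - (log_prior(u) + log_phi(u, d))
        u = np.where(np.log(rng.random(n)) < log_acc, prop, u)
    logw_prev = log_phi(u, d)

print(u.mean())   # the ladder drives the particles towards A(u) = a, i.e. u ~ 2
```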
Example: Poisson’s Equation

Consider
−d²u/dx²(x) = sin(2πx),  x ∈ (0, 1)
u(x) = 0,  x = 0, x = 1
- Use a Gaussian prior on u(x).
- Impose boundary conditions explicitly.
- Impose interior conditions at x = 1/3, x = 2/3.
- Construct the posterior using ND with δ ∈ {1.0, 10⁻², 10⁻⁴}.
- Use ϕ(r) = exp(−r²).
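To make the relaxed posterior concrete for this example, the sketch below parametrises u with an assumed sine basis (which satisfies the boundary conditions automatically; the slides use a Gaussian process prior instead) and evaluates the relaxation weight at the two interior conditions:

```python
import numpy as np

N = 8                                   # truncation level (assumption)
xs = np.array([1.0 / 3.0, 2.0 / 3.0])   # interior collocation points

def A(c):
    """Information operator: evaluate -u'' at the interior points for
    u(x) = sum_i c_i sin(i pi x), so -u''(x) = sum_i c_i (i pi)^2 sin(i pi x)."""
    i = np.arange(1, N + 1)
    return ((i * np.pi) ** 2 * c) @ np.sin(np.pi * np.outer(i, xs))

a = np.sin(2 * np.pi * xs)              # right-hand side at those points

def weight(c, delta):
    """Unnormalised relaxed density with phi(r) = exp(-r^2)."""
    return np.exp(-(np.linalg.norm(A(c) - a) / delta) ** 2)

# the exact solution u(x) = sin(2 pi x)/(4 pi^2) has c_2 = 1/(4 pi^2)
c_true = np.zeros(N)
c_true[1] = 1.0 / (4 * np.pi ** 2)
print(weight(c_true, delta=1e-4))       # 1.0: the truth has zero residual
```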
Example: Poisson’s Equation

In what follows, on the left are samples from the posterior µ^a_δ in X-space; on the right are contours of ϕ(∥A(u) − a∥_A / δ) in A-space.
[Figure: posterior samples (left) and A-space contours (right) for δ = 1.0]
[Figure: posterior samples and A-space contours for δ = 0.01]
[Figure: posterior samples and A-space contours for δ = 0.0001]
Three Considerations

- Numerical Disintegration
- Prior Truncation
- Sampler Convergence
Prior Construction

Assume X has a countable basis {ϕᵢ}, i = 0, …, ∞. Then any u ∈ X can be written
u(x) = ∑_{i=0}^∞ γᵢ ξᵢ ϕᵢ(x)
Different ξᵢ require different γ for almost-sure convergence:
- ξᵢ IID Uniform: γ ∈ ℓ¹
- ξᵢ IID Gaussian: γ ∈ ℓ²
- ξᵢ IID Cauchy: γ ∈ ℓ²
For practical computation we truncate to N terms:
u_N(x) = ∑_{i=0}^N γᵢ ξᵢ ϕᵢ(x)
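A sketch of sampling such a truncated prior; the cosine basis and the decay γᵢ = (i + 1)⁻² are illustrative assumptions, not the choices made in the slides:

```python
import numpy as np

def sample_prior(x, N=50, dist="gaussian", seed=0):
    """Draw one sample u_N(x) = sum_{i=0}^N gamma_i xi_i phi_i(x)."""
    rng = np.random.default_rng(seed)
    i = np.arange(N + 1)
    gamma = (i + 1.0) ** -2                 # summable (hence square-summable) decay
    if dist == "gaussian":
        xi = rng.standard_normal(N + 1)
    elif dist == "uniform":
        xi = rng.uniform(-1.0, 1.0, N + 1)
    elif dist == "cauchy":
        xi = rng.standard_cauchy(N + 1)
    else:
        raise ValueError(dist)
    basis = np.cos(np.pi * np.outer(i, x))  # assumed basis phi_i(x) = cos(i pi x)
    return (gamma * xi) @ basis

x = np.linspace(0.0, 1.0, 101)
u = sample_prior(x)
print(u.shape)  # (101,)
```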
Three Considerations

- Numerical Disintegration
- Prior Truncation
- Sampler Convergence
Convergence, but in what metric?

All results show weak convergence framed in terms of an abstract integral probability metric¹:
d_F(ν, ν′) = sup_{∥f∥_F ≤ 1} |ν(f) − ν′(f)|
Results are generic to A(u) and µ. Examples: total variation, Wasserstein.

¹Müller [1997]
Convergence of µ^a_δ

Theorem. Assume that d_F(µ^a, µ^{a′}) ≤ C_µ ∥a − a′∥^α for some constants C_µ, α and for A#µ-almost-all a, a′ ∈ A. Then, for small δ,
d_F(µ^a_δ, µ^a) ≤ C_µ (1 + C_ϕ) δ^α
for A#µ-almost-all a ∈ A.
Total Error

Denote by µ^a_{δ,N} the posterior distribution given by
dµ^a_{δ,N} / dµ ∝ ϕ(∥A ∘ P_N(u) − a∥_A / δ)
Assume that
∥A(u) − A ∘ P_N(u)∥ ≤ exp(m ∥u∥_X) Φ(N)
Then under certain assumptions it can be shown² that:
d_F(µ^a, µ^a_{δ,N}) ≤ C_µ(1 + C_ϕ)δ^α + C_δ Φ(N)
Thus, we have convergence with δ provided C_δ Φ(N) is controlled.

²Cockayne et al. [2017]
Numerical Disintegration
Numerical Example
Painlevé’s First Transcendental

u″(x) − u(x)² = −x,  u(0) = 0,  u(x) → √x as x → ∞

In practice the far-field condition is imposed as u(10) = √10.

[Figure: the computed solution x(t) on (0, 10), and the magnitudes of the series coefficients uₙ (positive and negative shown separately) decaying towards machine precision]

We use ϕ(x) = exp(−x²), and define a schedule of 1600 δ from 10 to 10⁻⁴. The following results are based on equispaced tᵢ, i = 1, …, 15, and generated with an SMC algorithm based upon a Cauchy prior.
Painlevé: Posterior Measures

[Figures: posterior measures at successive stages of the tempering schedule]
Pipelines
Example: Split Integration

∫₀¹ u(x) dx = ∫₀^{0.5} u(x) dx + ∫_{0.5}^{1} u(x) dx

Observations {u(x₁), …, u(x_{2m})}, where x₁ = 0, x_m = 0.5 and x_{2m} = 1.
- B₁(µ, ·) takes u(x₁), …, u(x_m) and outputs ∫₀^{0.5} u(x) dx.
- B₂(µ, ·) takes u(x_m), …, u(x_{2m}) and outputs ∫_{0.5}^{1} u(x) dx.
- B₃(µ, ·) combines the two integrals into ∫₀^{1} u(x) dx.

When is the output of the pipeline Bayesian?
Dependence Graphs

The dependency graph of a pipeline is obtained by deleting the method nodes and connecting their inputs directly to their outputs. The abstract structure of this graph allows us to establish a coherence condition.
Coherence

Definition. A prior is coherent for the dependency graph if Y_k is conditionally independent of Y_i given Y_j, where i, j < k, the nodes i are non-parents of k and the nodes j are parents of k.
Bayesian Pipelines

Theorem. A pipeline is Bayesian for its output QoI if:
1. The prior is coherent for the dependence graph.
2. The composite PNM are Bayesian.
Split Integration: Coherence

Is ∫_{0.5}^{1} u(x) dx conditionally independent of u(x₁), …, u(x_{m−1}) given u(x_m)?
Sometimes: e.g. for a Wiener process prior (by the Markov property). Sometimes not: e.g. if µ implies a Wiener process on a derivative u^(s)(x).
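The Wiener-process case can be checked numerically: with Brownian covariance k(s, t) = min(s, t), the partial covariance of ∫_{0.5}^{1} u(t) dt and a left-hand observation u(0.25), given the shared point u(0.5), vanishes. This check is our own illustration, not from the slides:

```python
import numpy as np

k = lambda s, t: np.minimum(s, t)      # Brownian covariance Cov(u(s), u(t))

# Cov(I, u(s)) for I = int_{0.5}^{1} u(t) dt, by a Riemann sum over [0.5, 1]
ts = np.linspace(0.5, 1.0, 2001)

def cov_I(s):
    return float(np.mean(k(ts, s)) * 0.5)

x, z = 0.25, 0.5                       # a left-hand observation and the shared point

# partial covariance of I and u(x) given u(z); zero means (Gaussian)
# conditional independence, matching the Markov property of the Wiener process
partial = cov_I(x) - cov_I(z) * k(x, z) / k(z, z)
print(abs(partial) < 1e-9)             # True
```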
Conclusions

We have seen…
- A method for approximately sampling from µ^a.
- Theoretical results proving asymptotic convergence of that sampler.
- Coherence conditions for composing Bayesian PNM into a Bayesian pipeline.
More to come

Numerical disintegration is highly inefficient compared to classical numerical methods (and even conjugate PNM). This is not intended to be a practical numerical method, but a framework for comparing other “approximate Bayesian” methods to the ideal. Next steps:
- Make the algorithm more efficient?
- Explore more efficient approximations to the posterior.

Thanks!
References

- J. Cockayne, C. Oates, T. Sullivan, and M. Girolami. Bayesian probabilistic numerical methods, 2017.
- A. Müller. Integral probability metrics and their generating classes of functions. Adv. in Appl. Probab., 29(2):429–443, 1997. doi:10.2307/1428011.