Bayesian inference in dynamic modeling of biological systems (a modeling practitioner's perspective)

Karl Thomaseth
National Research Council, Institute of Biomedical Engineering (ISIB-CNR), Padova, Italy

3H-Glucose - 3H-H2O Kinetics

Early Estimation Approach
- Nonlinear least squares

  ϑ̂ = arg min_ϑ Σᵢ₌₁ⁿ wᵢ (yᵢ − f(tᵢ, ϑ))²

- Justification

  yᵢ = f(tᵢ, ϑ) + e(tᵢ);  e(tᵢ) ~ N(0, σ²(tᵢ))

  log p(y|ϑ) = const. − ½ Σᵢ₌₁ⁿ (yᵢ − f(tᵢ, ϑ))² / σ²(tᵢ)

  if wᵢ = σ⁻²(tᵢ) (known!) ⇒ ϑ̂ = ϑ̂(ML), i.e. weighted least squares coincides with maximum likelihood
Computations
- Restricted-step Gauss-Newton, aka Levenberg-Marquardt

  ∇ϑ log p(y|ϑ) = Σᵢ₌₁ⁿ wᵢ (yᵢ − f(tᵢ, ϑ)) ∇ϑ f(tᵢ, ϑ)

  ∇ϑ² log p(y|ϑ) = − Σᵢ₌₁ⁿ wᵢ ∇ϑ f(tᵢ, ϑ) ∇ϑ f(tᵢ, ϑ)′ (+ …)

- Iterations

  ϑ̂ₖ₊₁ = ϑ̂ₖ − [ ∇̃ϑ² log p(y|ϑ̂ₖ) − λI ]⁻¹ ∇ϑ log p(y|ϑ̂ₖ)

- Estimator variance (Cramér-Rao lower bound)

  var(ϑ̂) ≥ E[ −∇ϑ² log p(y|ϑ) ]⁻¹, approximated in practice by −[ ∇̃ϑ² log p(y|ϑ̂) ]⁻¹
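Purely as an editor's illustration (the talk's software targets MATLAB/Fortran, not Python), the damped Gauss-Newton iteration above can be sketched for a hypothetical single-exponential model f(t, ϑ) = A·exp(−α t) with invented data and known weights:

```python
import math

# Sketch of restricted-step Gauss-Newton / Levenberg-Marquardt for
# f(t, th) = A*exp(-alpha*t) with known weights w_i = 1/sigma_i^2.
# All data, starting values and damping constants are illustrative.

def model(t, A, alpha):
    return A * math.exp(-alpha * t)

def cost(ts, ys, ws, A, alpha):
    return sum(w * (y - model(t, A, alpha)) ** 2 for t, y, w in zip(ts, ys, ws))

def lm_fit(ts, ys, ws, A, alpha, lam=1e-3, iters=100):
    c0 = cost(ts, ys, ws, A, alpha)
    for _ in range(iters):
        # Gradient g = J'W r and Gauss-Newton Hessian H = J'W J (2x2, by hand).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for t, y, w in zip(ts, ys, ws):
            e = math.exp(-alpha * t)
            r = y - A * e                     # residual y_i - f(t_i, th)
            j0, j1 = e, -A * t * e            # df/dA, df/dalpha
            g0 += w * r * j0; g1 += w * r * j1
            h00 += w * j0 * j0; h01 += w * j0 * j1; h11 += w * j1 * j1
        while True:                           # restricted step: damp with lam*I
            a, b, c = h00 + lam, h01, h11 + lam
            det = a * c - b * b               # solve (H + lam*I) d = g for 2x2
            dA = (c * g0 - b * g1) / det
            da = (a * g1 - b * g0) / det
            c1 = cost(ts, ys, ws, A + dA, alpha + da)
            if c1 <= c0:                      # step accepted: relax damping
                A, alpha, c0, lam = A + dA, alpha + da, c1, lam * 0.3
                break
            lam *= 10.0                       # step rejected: increase damping
            if lam > 1e12:
                return A, alpha
    return A, alpha

# Noise-free data generated from A = 2, alpha = 0.5 should be recovered.
ts = [0.5 * i for i in range(10)]
ys = [model(t, 2.0, 0.5) for t in ts]
ws = [1.0] * len(ts)
A_hat, alpha_hat = lm_fit(ts, ys, ws, 1.0, 1.0)
```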
Maximum a Posteriori
- Prior and posterior distributions

  p(ϑ) = N(ϑ₀, Σ₀);  p(ϑ|y) ∝ p(y|ϑ) p(ϑ)

  log p(ϑ|y) = −½ [ Σᵢ₌₁ⁿ (yᵢ − f(tᵢ, ϑ))² / σ²(tᵢ) + (ϑ − ϑ₀)′ Σ₀⁻¹ (ϑ − ϑ₀) ] + c

- Implications

  ∇ϑ log p(ϑ|y) = Σᵢ₌₁ⁿ wᵢ (yᵢ − f(tᵢ, ϑ)) ∇ϑ f(tᵢ, ϑ) − Σ₀⁻¹ (ϑ − ϑ₀)

  ∇̃ϑ² log p(ϑ|y) = −[ Σᵢ₌₁ⁿ wᵢ ∇ϑ f(tᵢ, ϑ) ∇ϑ f(tᵢ, ϑ)′ + Σ₀⁻¹ ] < 0
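An editor's sketch of the MAP idea in the simplest case where it can be checked by hand: a scalar linear model yᵢ = ϑ tᵢ + eᵢ, eᵢ ~ N(0, 1/wᵢ), with Gaussian prior ϑ ~ N(ϑ₀, s₀²). Maximizing log p(ϑ|y) is then penalized least squares with a closed form; all numbers below are hypothetical.

```python
# MAP for y_i = theta*t_i + e_i, e_i ~ N(0, 1/w_i), prior theta ~ N(theta0, s0^2).
# Setting the gradient of log p(theta|y) to zero,
#   sum_i w_i (y_i - theta*t_i) t_i - (theta - theta0)/s0^2 = 0,
# gives the closed form computed below. Data are invented for illustration.

ts = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
ws = [1.0, 1.0, 1.0, 1.0]
theta0, s0 = 1.0, 0.5

num = sum(w * t * y for t, y, w in zip(ts, ys, ws)) + theta0 / s0 ** 2
den = sum(w * t * t for t, w in zip(ts, ws)) + 1.0 / s0 ** 2
theta_map = num / den

# The posterior gradient really vanishes at theta_map:
grad = (sum(w * (y - theta_map * t) * t for t, y, w in zip(ts, ys, ws))
        - (theta_map - theta0) / s0 ** 2)
```

Note how the prior term shifts the estimate away from the pure least-squares solution toward ϑ₀, exactly as the extra Σ₀⁻¹(ϑ − ϑ₀) term in the gradient above predicts.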
Dynamic Modeling Classes
- Regression (black box) models, e.g.

  f(t, ϑ) = A e^(−α t) + B e^(−β t)

- Comprehensive models based on physi(ologi)cal principles
  – conservation of mass ⇒ compartmental models
  – conservation of energy (chemical, mechanical, thermal) ⇒ multiphysics models
  – anatomy (circulation), metabolic pathways

- Minimal models
  – highly aggregated physiological models
  – identifiability of parameters
PBPK - Models
Body Fluid Compartments

[Diagram: distribution of body water]
- Plasma: 7.5%
- Interstitial-Lymph: 20%
- Intracellular: 55%
- Dense CT - Cartilage: 7.5%
- Bone: 7.5%
- Transcellular: 2.5%
Two Compartment Approximation

[Diagram: the same fluid spaces lumped into two pools]
- Extracellular: Plasma (7.5%), Interstitial-Lymph (20%), Dense CT - Cartilage (7.5%), Bone (7.5%), Transcellular (2.5%)
- Intracellular: 55%
Model Equivalence

[Figure: equivalent compartmental representations of the same system — a whole-body scheme with cardiac output Qc, regional flows Q1, Q2, Q3, arterial concentration Cart, tissue concentrations C1, C2, C3 and clearances Cl1, Cl2, Cl3 (kidney), reduced to a two-compartment model with rate constants k12, k21, k01, k02, input u, output y, and finally a state-space form (A, B, C, D)]
Bayesian inference: Example
Modeling Population Kinetics of Free Fatty Acids in Isolated Rat Hepatocytes using Markov Chain Monte Carlo
Alessandra Pavan, Karl Thomaseth (Institute of Biomedical Engineering, Padova, Italy)
Anna Valerio (Department of Clinical and Experimental Medicine, University of Padova, Italy)
Cellular Fuels - Glucose

[Diagram: whole-body fuel fluxes between liver, blood, muscle, adipocyte and pancreas — glucose, glycogen, glucose 6-P, pyruvate, acetyl CoA, TCA cycle, fatty acids, ketone bodies, triacylglycerol, glycerol, amino acids and gluconeogenic precursors; insulin (+/−) and ATP signals; HSL in the adipocyte]
Cellular Fuels - Free Fatty Acids

[Diagram: the same fuel-flux scheme with the fatty acid and ketone body pathways highlighted]
Cellular Fuels - Control

[Diagram: the same fuel-flux scheme with the control loop highlighted — glucose stimulates pancreatic insulin release, and insulin (+/−) regulates the fuel fluxes, including HSL in the adipocyte]
Hormonal Control of Ketogenesis

[Diagram: adipose tissue releases FFA from triglyceride into blood, stimulated by epinephrine & glucagon and inhibited by insulin; the liver takes up FFA and partitions them among ketone bodies, triglycerides and phospholipids]
Experimental Protocol

  Protocol       1    2    3    4    5    6
  Oleate (FFA)   no   yes  yes  no   yes  yes
  Epinephrine    no   no   yes  yes  no   yes
  Insulin        no   no   no   no   yes  yes

Oleate dose: 0.25 mmol/L (N=18), 0.5 mmol/L (N=23), 1.0 mmol/L (N=13), mmol/L (N=30)
Ketone Body Production

[Diagram: stepwise model reduction — the full scheme FFA → β-oxidation → AcAc → BOH with rate constants k1, k2, k3, k4 is approximated (≈) by the reduced scheme Dose → FFA → AcAc + BOH with rate constants k1, k2]
Experimental Data - Examples

[Figure: 4×4 grid of concentration-time courses (time in min) for oleate doses D = 0, 0.25, 0.5, 1 mmol/L]
Minimal FFA – KB Kinetic Model

- System dynamics with known + random FFA dose

  ẋ₁(t) = −(k₁ + k₂) x₁(t);  x₁(0) = D + b
  ẋ₂(t) = 4 k₂ x₁(t);  x₂(0) = 0
  y₁(tⱼ) = x₁(tⱼ) + x₁b + ε₁(tⱼ);  t = {tⱼ}₁,…,ₙ
  y₂(t) = x₂(t) + x₂b + pb·t + ε₂(t)

- Constraints and parameterization

  k = k₁ + k₂ (k > 0) ⇒ ϑₖ = log k
  φ = k₂/(k₁ + k₂) (0 ≤ φ ≤ 1) ⇒ ϑ_φ = log(φ/(1 − φ))
  ⇒ ϑ = [log k, logit φ, log pb, log x₁b, log x₂b]

- Effects of covariates

  ϑ[k, φ, pb, x₁b, x₂b] = X · µ[k, φ, pb, x₁b, x₂b];  X = [1, ΔN_cells, D, I_Ins, I_Epi]
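The two-state model above is linear, so it can be simulated from its closed-form solution. The sketch below (an editor's illustration, not the authors' code) assumes x₂(0) = 0 and uses invented parameter values:

```python
import math

# Closed-form simulation of the minimal FFA-KB model:
#   x1' = -(k1+k2) x1, x1(0) = D + b  =>  x1(t) = (D+b) exp(-k t),  k = k1+k2
#   x2' = 4 k2 x1,     x2(0) = 0      =>  x2(t) = 4 k2 (D+b)(1 - exp(-k t))/k
# Parameter values are hypothetical, not taken from the talk.

def simulate(t, D, b, k1, k2, x1b, x2b, pb):
    k = k1 + k2
    x1 = (D + b) * math.exp(-k * t)
    x2 = 4.0 * k2 * (D + b) * (1.0 - math.exp(-k * t)) / k
    y1 = x1 + x1b              # measured FFA with constant baseline
    y2 = x2 + x2b + pb * t     # measured AcAc + BOH with linear baseline drift
    return y1, y2

# Unconstrained reparameterization used for estimation (log / logit transforms).
def theta_of(k1, k2, pb, x1b, x2b):
    k, phi = k1 + k2, k2 / (k1 + k2)
    return [math.log(k), math.log(phi / (1.0 - phi)),
            math.log(pb), math.log(x1b), math.log(x2b)]

y1, y2 = simulate(t=60.0, D=0.5, b=0.05, k1=0.015, k2=0.010,
                  x1b=0.05, x2b=0.02, pb=0.001)
th = theta_of(0.015, 0.010, 0.001, 0.05, 0.02)
```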
Hierarchical Population Kinetic Model

- 1. Within-individual variability

  yᵢⱼ ~ p(y|ϑᵢ, Dᵢ, tᵢⱼ, σ²) = N(f(ϑᵢ, Dᵢ, tᵢⱼ), σ²)

- 2. Between-individual variability

  ϑᵢ ~ p(ϑ|µ, Xᵢ, Σ) = N(Xᵢ · µ, Σ)

- 3. Prior distributions on the hyperparameters

  p(µ, Σ, σ²) = p(µ) p(Σ) p(σ²)

[DAG: yᵢⱼ depends on ϑᵢ, Dᵢ, tᵢⱼ, σ²; ϑᵢ depends on Xᵢ, µ, Σ]
Bayesian Inference

- Joint posterior probability

  π(ϑ, µ, σ², Σ | y) ∝ Πᵢ₌₁ᴺ Πⱼ₌₁ⁿⁱ p(yᵢⱼ|ϑᵢ, σ²) p(ϑᵢ|µ, Σ) × p(µ) p(σ²) p(Σ)

- Paradigm: inference (e.g. mean, SD, median, 95% CI, …) based on marginal distributions π(ϑ|y), π(µ|y), …

  ∫_χ g(x) π(x|y) dx = E_π(g(x))

- Monte Carlo approach: generate iid random samples

  x₁, …, xₙ,  xᵢ ~ π(x|y) ∀i = 1, …, n

  (1/n) Σᵢ₌₁ⁿ g(xᵢ) → E_π(g(x))
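A minimal sketch of the Monte Carlo averaging step above, with a target we can draw from directly (π = N(0,1), g(x) = x², so E_π(g(x)) = 1):

```python
import random

# Plain Monte Carlo: approximate E_pi[g(x)] by the average of g over iid draws.
# Here pi = N(0,1) and g(x) = x^2, so the true expectation is exactly 1.
random.seed(0)
n = 200_000
est = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n
```

The standard error of the average shrinks like 1/√n, which is why MCMC output, once the chain mixes, can be averaged in exactly the same way.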
Markov Chain Monte Carlo

- Problem: the joint posterior distribution is in general impossible to obtain analytically (exception: conjugate priors)
- Solution: generate non-independent Markov chains of random variables sampled from the full conditional distributions using:
  – Gibbs sampler
  – Metropolis-Hastings
  – mixed algorithms: Gibbs + Metropolis
- Computationally intensive; initial samples (burn-in) must be discarded until the chain reaches its steady-state distribution
Gibbs Sampler: x ~ π(x|y)

  xᵗ = (x₁, x₂, …, xₖ)
  x₁ ~ π(x₁ | x₂, x₃, …, xₖ, y)
  x₂ ~ π(x₂ | x₁, x₃, …, xₖ, y)
  x₃ ~ π(x₃ | x₁, x₂, …, xₖ, y)
  …
  xₖ ~ π(xₖ | x₁, x₂, …, xₖ₋₁, y)
  xᵗ⁺¹ = (x₁, x₂, …, xₖ) ~ π(x₁, x₂, …, xₖ | y), and marginally xᵢ ~ π(xᵢ|y), i = 1, …, k
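The sweep above can be sketched for the textbook case of a standard bivariate normal with correlation ρ, whose full conditionals are themselves normal (an editor's illustration, not code from the talk):

```python
import random

# Gibbs sampler for a standard bivariate normal with correlation rho:
# the full conditionals are x1|x2 ~ N(rho*x2, 1-rho^2), and symmetrically for x2.
random.seed(1)
rho = 0.8
s = (1.0 - rho * rho) ** 0.5
x1 = x2 = 0.0
draws = []
for t in range(20_000):
    x1 = random.gauss(rho * x2, s)   # x1 ~ pi(x1 | x2, y)
    x2 = random.gauss(rho * x1, s)   # x2 ~ pi(x2 | x1, y)
    if t >= 2_000:                   # discard burn-in samples
        draws.append((x1, x2))

# The chain's draws should reproduce the target correlation.
n = len(draws)
corr = sum(a * b for a, b in draws) / n   # target has zero means, unit variances
```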
Metropolis-Hastings: xᵢ ~ π(xᵢ | x(−i), y)

- Applicable if full conditional distributions are not standard
  – given xᵢᵗ, generate a candidate value wᵢ ~ q(·|xᵢ)
  – accept (xᵢᵗ⁺¹ = wᵢ) with probability

    α(wᵢ|xᵢ) = min{ 1, [π(wᵢ|…) q(xᵢ|wᵢ)] / [π(xᵢ|…) q(wᵢ|xᵢ)] }

- Random-walk Metropolis (symmetric proposal, q(xᵢ|wᵢ) = q(wᵢ|xᵢ))
  – wᵢ = xᵢᵗ + δᵢ,  δᵢ ~ N(0, Σ)

    α(wᵢ|xᵢ) = min{ 1, π(wᵢ|…) / π(xᵢ|…) }

  – if accepted: xᵢᵗ⁺¹ := xᵢ + δᵢ  (α = 1 if π(wᵢ|…) > π(xᵢ|…))
  – if rejected: xᵢᵗ⁺¹ := xᵢ
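A random-walk Metropolis sketch, targeting a standard normal so the output is easy to check (an editor's illustration; the step size is an arbitrary choice):

```python
import math, random

# Random-walk Metropolis: symmetric proposal w = x + delta, delta ~ N(0, step^2),
# acceptance alpha = min(1, pi(w)/pi(x)), computed on the log scale for stability.
# Target: pi(x) proportional to exp(-x^2/2), i.e. N(0,1).
random.seed(2)

def log_target(x):
    return -0.5 * x * x

x, step = 0.0, 1.0
accepted, draws = 0, []
for t in range(50_000):
    w = x + random.gauss(0.0, step)               # candidate value
    if math.log(random.random()) < log_target(w) - log_target(x):
        x, accepted = w, accepted + 1             # accept (alpha = 1 if pi(w) > pi(x))
    draws.append(x)                               # if rejected, the chain stays put

burn = draws[5_000:]                              # discard burn-in
mean = sum(burn) / len(burn)
var = sum(d * d for d in burn) / len(burn) - mean ** 2
rate = accepted / 50_000                          # acceptance rate diagnostic
```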
Hierarchical Mixed Effects Model

- First level

  yᵢⱼ = f(ϑᵢ, dᵢ, Dᵢ, tᵢⱼ) + εᵢⱼ,  εᵢⱼ ~ N(0, Σ₁)
  ϑᵢ = (log kᵢ, logit φᵢ),  dᵢ = (log pbᵢ, log x₁b, log x₂b)

- Second level

  ϑᵢ ~ Student-t(ν, ϑ̄, Σ₂)  ⇔  ϑᵢ|λᵢ ~ N(ϑ̄, Σ₂/λᵢ),  λᵢ ~ Ga(ν/2, ν/2)
  dᵢ ~ N(d̄, Σ₃)

- Third level

  Σₗ⁻¹ ~ Wishart(ρₗ, (ρₗ Rₗ)⁻¹),  l = 1, …, 3
  µ ~ N(c, C);  η ~ N(M, S)
Hierarchy in FFA and KB Model

[DAG: data y depend on ϑ, d, t, D and residual covariance Σ₁; random effects ϑ depend on µ, X, Σ₂ and λ (degrees of freedom ν), d depends on η and Σ₃; hyperpriors c, C, M, S and Wishart parameters (ρ₁, R₁), (ρ₂, R₂), (ρ₃, R₃)]
Model Validation (Example)
Results (Example)

            mean      variance   95% CI
  k         0.0257    0.00001    (0.0185, 0.0362)
  φ         0.3915    0.0054     (0.2894, 0.5166)
  ϑ(1)     -3.7707    0.0485     (-4.1258, -3.4447)
  ϑ(2)     -0.3262    0.1552     (-0.9248, 0.2946)
  µ₁(0)    -3.6904    0.0022     (-3.7814, -3.6020)
  µ₁(1)    -0.0976    0.0038     (-0.2116, 0.0314)
  µ₂(0)    -0.5948    0.0086     (-0.7713, -0.4223)
  µ₂(1)     0.5749    0.0136     (0.3458, 0.8057)
Modeling Issues

- Traditional
  – selection of structure vs complexity
  – validation
- New topics
  – ad-hoc MCMC implementation
  – choice of sampling method (Gibbs, Metropolis)
  – choice of statistical model (conjugate priors)
  – prior information (informative vs non-informative)
  – burn-in and convergence diagnostics (mixing)
  – multiple chain runs
Software – Aims

- to allow an intuitive definition of hierarchical statistical models of metabolic and pharmacokinetic systems
- to produce efficient simulation and symbolic code for different software environments, e.g. Fortran, C++, R, LaTeX, …
Software – Methods

- dictionary based on standard notations used to describe biological processes and probability distributions
- symbolic generation and manipulation of model equations
- different ad hoc source code formatting procedures
Compartmental Models

- Definitions

  compartment x1, x2;
  inputsignal u1;
  transfer k(x1, 0),   % clearance
           k(x1, x2),  % clearance
           a(x2, x1);  % signal
  q(0, x1) = u1;       % flow
  measure y1 = x1:c;

  [Diagram: compartments x1, x2 with rates k10, k12, a21, input u1, output y1]

- Equations

  ẋ₁ = −(k₁₀ + k₁₂) x₁/v₁ + a₂₁ x₂/v₂ + u₁
  ẋ₂ = k₁₂ x₁/v₁
  y₁ = x₁/v₁
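As an editor's illustration of the simulation code such definitions would generate, the two generated equations can be integrated numerically (RK4; all parameter values are invented):

```python
# Numerical integration of the generated two-compartment equations:
#   x1' = -(k10 + k12) x1/v1 + a21 x2/v2 + u1
#   x2' =   k12 x1/v1
#   y1  =   x1/v1
# Rates, volumes and the bolus amount are illustrative values.

k10, k12, a21 = 0.1, 0.05, 0.02
v1, v2 = 3.0, 10.0

def f(x1, x2, u1):
    dx1 = -(k10 + k12) * x1 / v1 + a21 * x2 / v2 + u1
    dx2 = k12 * x1 / v1
    return dx1, dx2

def rk4(x1, x2, u1, h, steps):
    # Classic 4th-order Runge-Kutta with a constant input u1.
    for _ in range(steps):
        a1, b1 = f(x1, x2, u1)
        a2, b2 = f(x1 + h * a1 / 2, x2 + h * b1 / 2, u1)
        a3, b3 = f(x1 + h * a2 / 2, x2 + h * b2 / 2, u1)
        a4, b4 = f(x1 + h * a3, x2 + h * b3, u1)
        x1 += h * (a1 + 2 * a2 + 2 * a3 + a4) / 6
        x2 += h * (b1 + 2 * b2 + 2 * b3 + b4) / 6
    return x1, x2

# Bolus of 5 units into compartment 1, no infusion (u1 = 0), 100 min at h = 0.1.
x1, x2 = rk4(5.0, 0.0, 0.0, 0.1, 1000)
y1 = x1 / v1
```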
Diffusion

- Definitions

  conductor kd;
  connect kd x1 x2;

  or, equivalently,

  resistor diff r = 1/kd;
  connect diff r x1 x2;

  [Diagram: compartments x1, x2 coupled by conductance kd]

- Equations

  ẋ₁ = kd (x₂/v₂ − x₁/v₁)
  ẋ₂ = kd (x₁/v₁ − x₂/v₂)
Electrical/Hydraulic Circuits

- Definitions

  node n1, n2;
  capacitor c1, c2;
  resistor r1;
  connect r1 n1 n2;
  connect c1 0 n1, c2 0 n2;

  [Diagram: RC network with nodes n1, n2, capacitors c1, c2 and resistor r1]

- Equations

  ẋ₁ = (1/r₁) (x₂/c₂ − x₁/c₁)
  ẋ₂ = (1/r₁) (x₁/c₁ − x₂/c₂)
Chemical Reactions

- Power law approximation

  na A(pa) + nb B(pb) --k1--> nc C
  v = k₁ [A]^pa [B]^pb
  Ȧ = −na·v;  Ḃ = −nb·v;  Ċ = nc·v

- Definitions

  compartment a, b, c;
  parameter k1, pa, pb;
  reaction na a^pa + nb b^pb =k1> nc c

  [Diagram: reaction scheme with stoichiometries {na,pa}, {nb,pb}, {nc} and rate k1]

- Example: reversible reaction

  A + B ⇄ C (forward k1, backward k2) ≡ a + b <k2==k1> c
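A small numerical check of the power-law rule (an editor's sketch, not the generated code) for A + B → C with na = nb = nc = 1 and pa = pb = 1; note that forward Euler preserves the linear mass invariants A + C and B + C exactly:

```python
# Mass-action simulation of A + B --k1--> C with the power-law rate
# v = k1*[A]^pa*[B]^pb, here pa = pb = 1, so A' = -v, B' = -v, C' = +v.
# Rate constant and initial amounts are illustrative.

k1 = 0.5
A, B, C = 1.0, 0.8, 0.0
h = 0.001
for _ in range(20_000):          # forward Euler over 20 time units
    v = k1 * A * B               # reaction velocity
    A, B, C = A - h * v, B - h * v, C + h * v
```

Since B is the limiting reactant (0.8 initial), C approaches 0.8 as the reaction runs to completion.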
Bond Graphs

- Definitions

  capacitor c;
  inertia l;
  resistor r1, r2;
  esource v;
  junction0 j0;
  junction1 j1;
  bond v j1, j1 l, j1 r1, j1 j0, j0 c, j0 r2;

  [Diagram: circuit with source V, inertia L and resistor R1 on the 1-junction, capacitor C and resistor R2 on the 0-junction]

- Equations

  ẋ₁ = x₂/l − (1/r₂) x₁/c
  ẋ₂ = v − r₁ x₂/l − x₁/c
Symbolic Manipulation of Equations

- Automatic generation of model equations

  ẋ(t) = f(x, p, u, t);  x(0) = x₀(p)
  y(t) = g(x, p, u, t)

- Sensitivity differential equations

  ẋᵢ,pⱼ = Σₖ₌₁ⁿ [∂fᵢ(x, p, u, t)/∂xₖ] xₖ,pⱼ + ∂fᵢ(x, p, u, t)/∂pⱼ
  xᵢ,pⱼ(0) = ∂x₀ᵢ(p)/∂pⱼ

- Model output sensitivities

  yᵢ,pⱼ = Σₖ₌₁ⁿ [∂gᵢ(x, p, u, t)/∂xₖ] xₖ,pⱼ + ∂gᵢ(x, p, u, t)/∂pⱼ
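For a scalar model the sensitivity equation can be checked against the analytic derivative. For ẋ = −p x, x(0) = x₀ (with x₀ independent of p), differentiating with respect to p gives ṡ = −x − p s, s(0) = 0, where s = ∂x/∂p (an editor's sketch with invented values):

```python
import math

# Sensitivity equation for x' = -p*x, x(0) = x0:
#   s' = -x - p*s, s(0) = 0, with s = dx/dp.
# The analytic solution x = x0*exp(-p t), s = -t*x0*exp(-p t) checks the result.

p, x0, h, T = 0.4, 2.0, 1e-3, 5.0
x, s = x0, 0.0
for _ in range(int(T / h)):      # forward Euler on the joint (state, sensitivity) ODE
    dx = -p * x
    ds = -x - p * s              # the sensitivity ODE rides along the state ODE
    x, s = x + h * dx, s + h * ds

x_true = x0 * math.exp(-p * T)
s_true = -T * x0 * math.exp(-p * T)
```

This is exactly the structure the symbolic generator exploits: the sensitivity system is linear in xₖ,pⱼ with coefficients ∂f/∂x evaluated along the state trajectory.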
Probability Distributions

- Definitions

  parameter mua=0, rho=2, a=0, b=0.01;
  dimension mua(2, 1);
  dimension xcov(ndata, 2), r(2, 2);
  prior taua ~ wishart( rho, r );
  prior tau ~ gamma( a, b );
  prior mu ~ normal( mua, taua );
  prior y ~ normal( mu, tau, xcov );
  sample y;

  [DAG: y depends on µ, τ and covariates x; µ depends on µa, τa; hyperpriors a, b, ρ, R]

- Equations

  y ~ N(X · µ, τ)
  µ ~ N(µa, τa)
  τ ~ Ga(a, b)
  τa ~ W(ρ, R)
Directed Acyclic Graphs

- DAGs represent variables (nodes) and relationships (links)
- Joint probability distributions: let i ∈ I be a node of a DAG, pa(i) the set of its parent nodes, x(i) and x(pa(i)) the corresponding random variables; then the joint probability of x = {x(i)}ᵢ∈I is

  π(x) = Πᵢ∈I π(x(i) | x(pa(i)))

  Example: π(a, b, c, d) = π(a) π(b) π(c|a, b) π(d|c)

  [DAG: a → c, b → c, c → d]
Full Conditional Distributions & DAGs

- Markov property: each random variable in a DAG is conditionally independent of all other variables given its parents, children, and the children's other parents:

  π(x(i) | x(−i)) ∝ π(x(i) | x(pa(i))) Πⱼ:ᵢ∈pa(j) π(x(j) | x(pa(j)))

  Examples: π(c|a, b, d) ∝ π(c|a, b) π(d|c);  π(a|b, c, d) ∝ π(a) π(c|a, b)

  [DAG: a → c, b → c, c → d]

- DAGs allow the implementation of efficient simulation code and the symbolic analysis of full conditional distributions for selecting Gibbs versus Metropolis
Conjugate Prior Distributions

- A class P of prior distributions is conjugate to a class of likelihood functions π(x|ϑ) if

  π(ϑ) ∈ P, {xᵢ ~ π(x|ϑ)}ᵢ₌₁,…,ₙ ⇒ π(ϑ | x₁, …, xₙ) ∈ P

- Example: consider π(a|b, c, d) ∝ π(a) π(c|a, b); if the prior π(a) belongs to a family P conjugate to π(c|a, b), then π(a | x(−a)) ∈ P.

  [DAG: a → c, b → c, c → d]

- Conjugate families become useful when the expression of the posterior distribution is known analytically. This is the prerequisite for application of the Gibbs sampler.
Normal - Conjugate Distributions

- If x ~ N(ϑ, τ) (τ = σ⁻²) and ϑ ~ N(µ, τµ):

  π(ϑ | τ, x) = N( (τµ µ + τ Σᵢ₌₁ⁿ xᵢ) / (τµ + nτ), τµ + nτ )

- If x ~ N(ϑ, τ) and τ ~ Ga(α, β):

  π(τ | ϑ, x) = Ga( α + n/2, β + ½ Σᵢ₌₁ⁿ (xᵢ − ϑ)² )

- If x ~ N(ϑ, R) (dim(ϑ) = k, R = Σ⁻¹) and ϑ ~ N(µ, T):

  π(ϑ | R, x) = N( (T + nR)⁻¹ (T µ + R Σᵢ₌₁ⁿ xᵢ), T + nR )

- If x ~ N(ϑ, R) and R ~ W(ρ, S):

  π(R | ϑ, x) = W( ρ + n, S + Σᵢ₌₁ⁿ (xᵢ − ϑ)(xᵢ − ϑ)ᵗ )
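A numerical check of the first conjugate pair above (an editor's sketch with invented data): conjugacy means the posterior is again normal, so absorbing the data one point at a time gives exactly the same result as the batch update.

```python
# Conjugate update for x_i ~ N(theta, tau), tau a known precision,
# with prior theta ~ N(mu, tau_mu):
#   posterior = N( (tau_mu*mu + tau*sum(x_i)) / (tau_mu + n*tau), tau_mu + n*tau )

def normal_mean_update(mu, tau_mu, tau, xs):
    n = len(xs)
    post_prec = tau_mu + n * tau
    post_mean = (tau_mu * mu + tau * sum(xs)) / post_prec
    return post_mean, post_prec

xs = [1.2, 0.8, 1.1, 0.9]          # illustrative data
m, p = normal_mean_update(mu=0.0, tau_mu=1.0, tau=4.0, xs=xs)

# Sequential updating, one observation at a time, must land on the same
# posterior because each intermediate posterior is again a normal prior.
m2, p2 = 0.0, 1.0
for x in xs:
    m2, p2 = normal_mean_update(m2, p2, 4.0, [x])
```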
Symbolic Manipulation Algorithm

- Syntax

  prior <name> ~ <distribution>(<parameters...>)
  sample <name>   % if measured

- Build the DAG and determine all parents and children
- Apply rules recursively to check whether children have conjugate distributions
  TRUE: generate the full conditional distribution (Gibbs)
  FALSE: generate the relevant probabilities (Metropolis)

- PROLOG rules

  conj(P,X) :- prior(P, normal), prior(X, normal, [P, _]).
  conj(T,X) :- prior(T, gamma), prior(X, normal, [_, T]).
  conj(R,X) :- prior(R, wishart), prior(X, normal, [_, R]).

- Full conditional distribution

  fcd(T, gamma, [A,B], X, [A+0.5, B+(X-Mu)^2/2]) :- normal(X).
Example 1

- Assumptions: y ~ N(µ, τ), µ ~ N(µa, τa), τ ~ Ga(a, b)
- Model specification:

  prior tau ~ gamma ( a, b );
  prior mu ~ normal ( mua, taua );
  prior y ~ normal ( mu, tau );
  sample y;

  [DAG: y depends on µ and τ; µ on µa, τa; τ on a, b]

- Generated source code (MATLAB):

  % P(tau | a, b, mu, mua, taua)
  % Sampling method: gibbs
  tau = mc2gamrnd(a+0.5, b+(y-mu)^2/2);
  % P(mu | mua, taua, a, b, tau)
  % Sampling method: gibbs
  mu = mc2normrnd(mldivide(taua+tau, taua*mua+tau*y), taua+tau);

  where mldivide(a,b) returns a⁻¹·b.
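For readers without MATLAB, the same two-step Gibbs sweep can be sketched in Python (an editor's illustration; the data value and hyperparameters are invented, and note that Python's `gammavariate` takes a scale, so the rate from the full conditional is inverted):

```python
import random

# Gibbs sampler for y ~ N(mu, tau), mu ~ N(mua, taua), tau ~ Ga(a, b),
# with tau and taua precisions and Ga(a, b) in shape-rate form, matching
# the generated MATLAB code above. Numbers are illustrative.
random.seed(3)
y, mua, taua, a, b = 1.5, 0.0, 1.0, 2.0, 1.0

def gibbs(n_iter):
    mu, tau = 0.0, 1.0
    out = []
    for _ in range(n_iter):
        # P(tau | ...) = Ga(a + 1/2, b + (y - mu)^2/2); gammavariate uses scale = 1/rate.
        tau = random.gammavariate(a + 0.5, 1.0 / (b + (y - mu) ** 2 / 2.0))
        # P(mu | ...) = N((taua*mua + tau*y)/(taua + tau), taua + tau)
        prec = taua + tau
        mu = random.gauss((taua * mua + tau * y) / prec, prec ** -0.5)
        out.append((mu, tau))
    return out

samples = gibbs(30_000)[2_000:]                 # discard burn-in
mu_mean = sum(m for m, _ in samples) / len(samples)
```

The posterior mean of µ lands between the prior mean (0) and the observation (1.5), shrunk according to the sampled precision τ.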
Acknowledgments

- Alberto Salvan (CNR)
- Wolfgang's students (theses for the Laurea degree)
  – Lisa Bolcato (1998)
  – Alessandra Pavan (2000)
  – Maurizio Pennelli (2004)
- Paola Bortot (Univ. Bologna)