

slide-1
SLIDE 1

Embedding Machine Learning in Formal Stochastic Models of Biological Processes

Jane Hillston

Joint work with Anastasis Georgoulas and Guido Sanguinetti, School of Informatics, University of Edinburgh

21st September 2016


Hillston 21/9/2016 1 / 70

slide-2
SLIDE 2

Outline

1. Introduction
2. Probabilistic Programming
3. ProPPA
4. Inference
5. Results
6. Conclusions

Hillston 21/9/2016 2 / 70


slide-4
SLIDE 4

Formal modelling

Formal languages (process algebras, Petri nets, rule-based) provide a convenient interface for describing complex systems.
High-level abstraction makes writing and manipulating models easier.
They can capture different kinds of behaviour: deterministic, stochastic, ...
Formal nature lends itself to automatic, rigorous methods for analysis and verification.
... but what if parts of the system are unknown?

Hillston 21/9/2016 4 / 70

slide-5
SLIDE 5

Alternative perspective


Model creation is data-driven

Hillston 21/9/2016 5 / 70

slide-6
SLIDE 6

Modelling

There are two approaches to model construction:
Machine Learning: extracting a model from the data generated by the system, or refining a model based on system behaviour using statistical techniques.
Mechanistic Modelling: starting from a description or hypothesis, constructing a model that algorithmically mimics the behaviour of the system, validated against data.

Hillston 21/9/2016 6 / 70


slide-8
SLIDE 8

Machine Learning

[Diagram: prior + data → inference → posterior]

Bayes’ Theorem

For the distribution of a parameter θ and observed data D, P(θ | D) ∝ P(θ)P(D | θ)

Hillston 21/9/2016 7 / 70

slide-9
SLIDE 9

Bayesian statistics

Represent belief and uncertainty as probability distributions (prior, posterior). Treat parameters and unobserved variables similarly.

Bayes' Theorem:
P(θ | D) = P(θ) · P(D | θ) / P(D)
posterior ∝ prior · likelihood

Hillston 21/9/2016 8 / 70
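The narrowing posteriors on the next few slides can be reproduced with a small conjugate example (my illustration, not part of the talk): a Beta prior over a probability θ, updated with batches of 20 Bernoulli observations.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior of a Beta(alpha, beta) prior after Bernoulli data.

    Conjugacy makes Bayes' Theorem a one-line update:
    P(theta | D) ∝ P(theta) · P(D | theta)  =>  Beta(alpha + s, beta + f).
    """
    return alpha + successes, beta + failures

# Vague prior, then three batches of 20 observations (14 successes each),
# mimicking the plots: the posterior mean stabilises and the variance shrinks.
a, b = 1.0, 1.0
for _ in range(3):
    a, b = beta_binomial_update(a, b, successes=14, failures=6)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    print(f"Beta({a:.0f},{b:.0f}): mean={mean:.3f}, var={var:.5f}")
```

The shrinking variance after each batch is exactly what the sequence of plots illustrates.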

slide-10
SLIDE 10

Bayesian statistics


Prior distribution (no data) Hillston 21/9/2016 9 / 70

slide-11
SLIDE 11

Bayesian statistics


After 20 data points Hillston 21/9/2016 9 / 70

slide-12
SLIDE 12

Bayesian statistics


After 40 data points Hillston 21/9/2016 9 / 70

slide-13
SLIDE 13

Bayesian statistics


After 60 data points Hillston 21/9/2016 9 / 70


slide-17
SLIDE 17

Mechanistic modelling

Models are constructed reflecting what is known about the components of the biological system and their behaviour.
Several approaches originating in theoretical computer science have been proposed to capture the system behaviour in a high-level way.
These are then compiled into executable models1 which can be run to deepen understanding of the model.
Executing the model generates data that can be compared with biological data.

1. Jasmin Fisher, Thomas A. Henzinger: Executable cell biology. Nature Biotechnology, 2007.

Hillston 21/9/2016 10 / 70

slide-18
SLIDE 18

Optimizing models

The usual process of parameterising a model is iterative and manual.

[Diagram: model → simulate/analyse → data → update → model]

Hillston 21/9/2016 11 / 70


slide-20
SLIDE 20

Comparing the techniques

Data-driven modelling:
+ rigorous handling of parameter uncertainty
− limited or no treatment of stochasticity
− in many cases bespoke solutions are required, which can limit the size of system that can be handled

Mechanistic modelling:
+ general execution "engine" (deterministic or stochastic) can be reused for many models
+ models can be used speculatively to investigate roles of parameters, or alternative hypotheses
− parameters are assumed to be known and fixed, or costly approaches must be used to seek an appropriate parameterisation

Hillston 21/9/2016 12 / 70

slide-21
SLIDE 21

Developing a probabilistic programming approach

What if we could...
include information about uncertainty in the model?
automatically use observations to refine this uncertainty?
do all this in a formal context?
Starting from an existing process algebra (Bio-PEPA), we have developed a new language, ProPPA, that addresses these issues2.

2. Anastasis Georgoulas, Jane Hillston, Dimitrios Milios, Guido Sanguinetti: Probabilistic Programming Process Algebra. QEST 2014: 249-264.

Hillston 21/9/2016 13 / 70

slide-22
SLIDE 22

Outline

1. Introduction
2. Probabilistic Programming
3. ProPPA
4. Inference
5. Results
6. Conclusions

Hillston 21/9/2016 14 / 70


slide-26
SLIDE 26

Probabilistic programming

A programming paradigm for describing incomplete knowledge scenarios, and resolving the uncertainty.
Describe how the data is generated in syntax like a conventional programming language, but leaving some variables uncertain.
Specify observations, which impose constraints on acceptable outputs of the program.
Run the program forwards: generate data consistent with the observations.
Run the program backwards: find values for the uncertain variables which make the output match the observations.

Hillston 21/9/2016 15 / 70

slide-27
SLIDE 27

Outline

1. Introduction
2. Probabilistic Programming
3. ProPPA
4. Inference
5. Results
6. Conclusions

Hillston 21/9/2016 16 / 70

slide-28
SLIDE 28

Molecular processes as concurrent computations

Concurrency | Molecular Biology | Metabolism | Signal Transduction
Concurrent computational processes | Molecules | Enzymes and metabolites | Interacting proteins
Synchronous communication | Molecular interaction | Binding and catalysis | Binding and catalysis
Transition or mobility | Biochemical modification or relocation | Metabolite synthesis | Protein binding, modification or sequestration

A. Regev and E. Shapiro: Cells as computation. Nature 419, 2002.

Hillston 21/9/2016 17 / 70



slide-34
SLIDE 34

Stochastic Process Algebra

In a stochastic process algebra, actions (reactions) have a name or type, but also a stochastic duration or rate.
In systems biology modelling it is these rates that are often unknown.
The language may be used to generate a Markov process (CTMC):

SPA MODEL → (SOS rules) → LABELLED TRANSITION SYSTEM → (state transition diagram) → CTMC (Q)

Q is the infinitesimal generator matrix characterising the CTMC.
Models are typically executed by simulation using Gillespie's Stochastic Simulation Algorithm (SSA) or similar.

Hillston 21/9/2016 19 / 70


slide-38
SLIDE 38

Bio-PEPA modelling

The state of the system at any time consists of the local states of each of its sequential/species components.
The local states of components are quantitative rather than functional, i.e. biological changes to species are represented as distinct components.
A component varying its state corresponds to it varying its amount.
This is captured by an integer parameter associated with the species; the effect of a reaction is to vary that parameter by a number corresponding to the stoichiometry of this species in the reaction.

Hillston 21/9/2016 20 / 70


slide-41
SLIDE 41

The abstraction

Each species i is described by a species component C_i.
Each reaction j is associated with an action type α_j, and its dynamics is described by a specific function f_{α_j}.
The species components (now quantified) are then composed together to describe the behaviour of the system.

Hillston 21/9/2016 21 / 70


slide-43
SLIDE 43

The semantics

The semantics is defined by two transition relations:
a capability relation — is a transition possible?
a stochastic relation — gives the rate of a transition, derived from the parameters of the model.
The labelled transition system generated by the stochastic relation formally defines the underlying CTMC.

Hillston 21/9/2016 22 / 70


slide-45
SLIDE 45

Example — in Bio-PEPA

[Diagram: rumour-spreading model with reactions spread, stop1 and stop2 over species I, S, R]

k_s = 0.5;
k_r = 0.1;
kineticLawOf spread : k_s * I * S;
kineticLawOf stop1 : k_r * S * S;
kineticLawOf stop2 : k_r * S * R;
I = (spread,1) ↓ ;
S = (spread,1) ↑ + (stop1,1) ↓ + (stop2,1) ↓ ;
R = (stop1,1) ↑ + (stop2,1) ↑ ;
I[10] ⊲ S[5] ⊲ R[0]

Hillston 21/9/2016 23 / 70
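Since the underlying CTMC is typically executed with Gillespie's SSA, the rumour model above can be simulated directly. The sketch below is my own Python rendering of the model's kinetic laws and stoichiometries, not ProPPA output; note that stop1 and stop2 have the same net effect (one S becomes one R), so they share a branch.

```python
import random

def gillespie_rumour(i, s, r, k_s=0.5, k_r=0.1, t_end=10.0, seed=1):
    """Gillespie SSA for the rumour model (spread, stop1, stop2)."""
    rng = random.Random(seed)
    t, trace = 0.0, [(0.0, i, s, r)]
    while t < t_end:
        # Kinetic laws exactly as in the Bio-PEPA model
        rates = [k_s * i * s,   # spread: an ignorant meets a spreader
                 k_r * s * s,   # stop1:  two spreaders meet
                 k_r * s * r]   # stop2:  a spreader meets a stifler
        total = sum(rates)
        if total == 0:
            break                              # no reaction can fire
        t += rng.expovariate(total)            # time to next reaction
        u = rng.random() * total               # pick a reaction
        if u < rates[0]:
            i, s = i - 1, s + 1                # spread: I down, S up
        else:
            s, r = s - 1, r + 1                # stop1/stop2: S down, R up
        trace.append((t, i, s, r))
    return trace

trace = gillespie_rumour(10, 5, 0)
```

Each entry of `trace` is a tuple (time, I, S, R); the total population I + S + R is conserved at 15 throughout.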


slide-47
SLIDE 47

A Probabilistic Programming Process Algebra: ProPPA

The objective of ProPPA is to retain the features of the stochastic process algebra:
simple model description in terms of components;
rigorous semantics giving an executable version of the model...
... whilst also incorporating features of a probabilistic programming language:
recording uncertainty in the parameters;
the ability to incorporate observations into models;
access to inference to update uncertainty based on observations.

Hillston 21/9/2016 24 / 70

slide-48
SLIDE 48

Example Revisited

[Diagram: rumour-spreading model with reactions spread, stop1 and stop2 over species I, S, R]

k_s = 0.5;
k_r = 0.1;
kineticLawOf spread : k_s * I * S;
kineticLawOf stop1 : k_r * S * S;
kineticLawOf stop2 : k_r * S * R;
I = (spread,1) ↓ ;
S = (spread,1) ↑ + (stop1,1) ↓ + (stop2,1) ↓ ;
R = (stop1,1) ↑ + (stop2,1) ↑ ;
I[10] ⊲ S[5] ⊲ R[0]

Hillston 21/9/2016 25 / 70

slide-49
SLIDE 49

Additions

Declaring uncertain parameters:
k_s = Uniform(0,1);
k_r = Uniform(0,1);
Providing observations:
observe('trace')
Specifying the inference approach:
infer('ABC')

Hillston 21/9/2016 26 / 70

slide-50
SLIDE 50

Additions

[Diagram: rumour-spreading model with reactions spread, stop1 and stop2]

k_s = Uniform(0,1);
k_r = Uniform(0,1);
kineticLawOf spread : k_s * I * S;
kineticLawOf stop1 : k_r * S * S;
kineticLawOf stop2 : k_r * S * R;
I = (spread,1) ↓ ;
S = (spread,1) ↑ + (stop1,1) ↓ + (stop2,1) ↓ ;
R = (stop1,1) ↑ + (stop2,1) ↑ ;
I[10] ⊲ S[5] ⊲ R[0]

observe('trace')
infer('ABC') // Approximate Bayesian Computation

Hillston 21/9/2016 27 / 70

slide-51
SLIDE 51

Semantics

A Bio-PEPA model can be interpreted as a CTMC; however, CTMCs cannot capture uncertainty in the rates (every transition must have a concrete rate).
ProPPA models include uncertainty in the parameters, which translates into uncertainty in the transition rates.
A ProPPA model should therefore be mapped to something like a distribution over CTMCs.

Hillston 21/9/2016 28 / 70

slide-52
SLIDE 52

[Diagram] parameter k = 2 → model → a single CTMC

Hillston 21/9/2016 29 / 70

slide-53
SLIDE 53

[Diagram] parameter k ∈ [0,5] → model → a set of CTMCs

Hillston 21/9/2016 29 / 70

slide-54
SLIDE 54

[Diagram] parameter k ∼ p → model → a distribution μ over CTMCs

Hillston 21/9/2016 29 / 70

slide-55
SLIDE 55

Constraint Markov Chains

Constraint Markov Chains3 (CMCs) are a generalization of DTMCs, in which the transition probabilities are not concrete, but can take any value satisfying some constraints.

Constraint Markov Chain
A CMC is a tuple (S, o, A, V, φ), where:
S is the set of states, of cardinality k;
o ∈ S is the initial state;
A is a set of atomic propositions;
V : S → 2^(2^A) gives a set of acceptable labellings for each state;
φ : S × [0, 1]^k → {0, 1} is the constraint function.

3. Caillaud et al.: Constraint Markov Chains. Theoretical Computer Science, 2011.

Hillston 21/9/2016 30 / 70

slide-56
SLIDE 56

Constraint Markov Chains

In a CMC, arbitrary constraints are permitted, expressed through the function φ: φ(s, p) = 1 iff p is an acceptable vector of transition probabilities from state s.
However, CMCs are defined only for the discrete-time case, and the constraint function says nothing about how likely a value is to be chosen, only about whether it is acceptable.
To address these shortcomings, we define Probabilistic Constraint Markov Chains.

Hillston 21/9/2016 31 / 70


slide-58
SLIDE 58

Probabilistic CMCs

A Probabilistic Constraint Markov Chain is a tuple (S, o, A, V, φ), where:
S is the set of states, of cardinality k;
o ∈ S is the initial state;
A is a set of atomic propositions;
V : S → 2^(2^A) gives a set of acceptable labellings for each state;
φ : S × [0, ∞)^k → [0, ∞) is the constraint function.
This is applicable to continuous-time systems: φ(s, ·) is now a probability density function on the transition rates from state s.

Hillston 21/9/2016 33 / 70


slide-61
SLIDE 61

Semantics of ProPPA

The semantics definition follows that of Bio-PEPA, which is defined using two transition relations:
a capability relation — is a transition possible?
a stochastic relation — gives the distribution of the rate of a transition.
The distribution over the parameter values induces a distribution over transition rates. Rules are expressed as state-to-function transition systems (FuTS4). This gives rise to the underlying PCMC.

4. De Nicola et al.: A Uniform Definition of Stochastic Process Calculi. ACM Computing Surveys, 2013.

Hillston 21/9/2016 35 / 70


slide-66
SLIDE 66

Simulating Probabilistic Constraint Markov Chains

Probabilistic Constraint Markov Chains are open to two alternative dynamic interpretations:
1. Uncertain Markov Chains: for each trajectory, for each uncertain transition rate, sample once at the start of the run and use that value throughout.
2. Imprecise Markov Chains: during each trajectory, each time a transition with an uncertain rate is encountered, sample a value but then discard it and re-sample whenever this transition is visited again.

Our current work is focused on the Uncertain Markov Chain case.

Hillston 21/9/2016 36 / 70
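The difference between the two interpretations can be sketched in a few lines (my illustration, using a pure birth process whose single rate k is uncertain, k ∼ Uniform(0,1)):

```python
import random

def birth_times(n_events, resample, seed=0):
    """Jump times of a pure birth process with uncertain rate k ~ U(0,1).

    resample=False: Uncertain Markov Chain - draw k once per trajectory.
    resample=True:  Imprecise Markov Chain - draw a fresh k at every jump.
    """
    rng = random.Random(seed)
    k = rng.uniform(1e-9, 1.0)          # one draw for the whole run
    t, times = 0.0, []
    for _ in range(n_events):
        if resample:
            k = rng.uniform(1e-9, 1.0)  # discard and re-sample each visit
        t += rng.expovariate(k)         # exponential holding time at rate k
        times.append(t)
    return times

uncertain = birth_times(5, resample=False)
imprecise = birth_times(5, resample=True)
```

In the uncertain case every holding time in a run shares the same rate, so runs differ systematically; in the imprecise case the rate is forgotten between jumps.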

slide-67
SLIDE 67

Outline

1. Introduction
2. Probabilistic Programming
3. ProPPA
4. Inference
5. Results
6. Conclusions

Hillston 21/9/2016 37 / 70


slide-69
SLIDE 69

Inference

[Diagram] parameter k ∼ p → model → distribution μ over CTMCs; observations → inference → posterior distribution μ*

Hillston 21/9/2016 38 / 70

slide-70
SLIDE 70

Inference

P(θ | D) ∝ P(θ) P(D | θ)
The ProPPA semantics does not define a single inference algorithm, allowing for a modular approach.
Different algorithms can act on different inputs (time-series vs properties) and return different results, or results in different forms.
Exact inference is often impossible, as we cannot calculate the likelihood; we must use approximate algorithms or approximations of the system.

Hillston 21/9/2016 39 / 70


slide-73
SLIDE 73

Inferring likelihood in uncertain CTMCs

Transient probabilities can be expressed as:
dp_i(t)/dt = Σ_{j≠i} p_j(t) · q_{ji} − p_i(t) Σ_{j≠i} q_{ij}

The probability of a single observation (y, t) can then be expressed as:
p(y, t) = Σ_{i∈S} p_i(t) π(y | i)
where π(y | i) is the probability of observing y when in state i.

The likelihood can then be expressed as:
P(D | θ) = Π_{j=1}^{N} Σ_{i∈S} p_i^{(θ)}(t_j) π(y_j | i)

Hillston 21/9/2016 40 / 70

slide-74
SLIDE 74

Calculating the transient probabilities

For finite state-spaces, the transient probabilities can, in principle, be computed as p(t) = p(0) e^{Qt}.
The likelihood is nevertheless hard to compute:
computing e^{Qt} is expensive if the state space is large;
it is impossible directly for infinite state-spaces.

Hillston 21/9/2016 41 / 70
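For a small CTMC, p(t) = p(0) e^{Qt} can be evaluated without forming the matrix exponential explicitly, e.g. by uniformization (a standard technique; the sketch below is my illustration, not tied to the ProPPA implementation):

```python
import math

def transient_probs(Q, p0, t, tol=1e-12):
    """p(t) = p0 · exp(Qt) for a small CTMC, via uniformization.

    Q is a list-of-lists generator matrix (each row sums to zero);
    p0 is the initial distribution.
    """
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n)) or 1.0      # uniformization rate
    # P = I + Q/lam is a stochastic matrix
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    # Poisson-weighted sum: p(t) = sum_k Pois(k; lam*t) · p0 P^k
    term, result = list(p0), [0.0] * n
    weight, acc, k = math.exp(-lam * t), 0.0, 0
    while acc < 1.0 - tol and k < 100_000:
        result = [result[j] + weight * term[j] for j in range(n)]
        acc += weight
        term = [sum(term[i] * P[i][j] for i in range(n)) for j in range(n)]
        k += 1
        weight *= lam * t / k                          # next Poisson weight
    return result

# Two-state CTMC: 0 -> 1 at rate 2, 1 -> 0 at rate 1; stationary dist (1/3, 2/3)
Q = [[-2.0, 2.0], [1.0, -1.0]]
p = transient_probs(Q, [1.0, 0.0], t=5.0)
```

For large t the result approaches the stationary distribution; the cost of the Poisson sum grows with lam·t and with the state-space size, which is exactly the scaling problem the slide points out.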

slide-75
SLIDE 75

Basic Inference

Approximate Bayesian Computation is a simple simulation-based solution:
it approximates the posterior distribution over parameters as a set of samples;
the likelihood of parameters is approximated with a notion of distance.

Hillston 21/9/2016 42 / 70


slide-79
SLIDE 79

Basic Inference

Approximate Bayesian Computation is a simple simulation-based solution:

◮ Approximates posterior distribution over parameters as a set of samples ◮ Likelihood of parameters is approximated with a notion of distance.

[Plot: simulated trace vs observations] Σ(x_i − y_i)² > ε ⇒ rejected

Hillston 21/9/2016 42 / 70

slide-80
SLIDE 80

Basic Inference

Approximate Bayesian Computation is a simple simulation-based solution:

◮ Approximates posterior distribution over parameters as a set of samples ◮ Likelihood of parameters is approximated with a notion of distance.

[Plot: simulated trace vs observations] Σ(x_i − y_i)² < ε ⇒ accepted

Hillston 21/9/2016 42 / 70

slide-81
SLIDE 81

Approximate Bayesian Computation

ABC algorithm
1. Sample a parameter set from the prior distribution.
2. Simulate the system using these parameters.
3. Compare the simulation trace obtained with the observations.
4. If the distance is < ε, accept; otherwise reject.

This results in an approximation to the posterior distribution. As ε → 0, the set of samples converges to the true posterior. We use a more elaborate version based on Markov chain Monte Carlo sampling.

Hillston 21/9/2016 43 / 70
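The four steps on the slide above can be sketched generically; everything below (the deterministic decay model, all names) is my illustration, not the ProPPA implementation:

```python
import random

def abc_rejection(simulate, observed, prior_sample, eps, n_samples, seed=0):
    """Basic ABC rejection sampling, following the four steps above.

    Distance is the sum of squared differences, as on the earlier slides.
    """
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_samples:
        theta = prior_sample(rng)                       # 1. sample prior
        trace = simulate(theta, rng)                    # 2. simulate
        dist = sum((x - y) ** 2                         # 3. compare
                   for x, y in zip(trace, observed))
        if dist < eps:                                  # 4. accept / reject
            accepted.append(theta)
    return accepted

# Toy model: deterministic decay observed at t = 0..4; true parameter 0.3
def simulate(theta, rng):
    return [10.0 * (1.0 - theta) ** t for t in range(5)]

observed = simulate(0.3, None)
posterior = abc_rejection(simulate, observed,
                          prior_sample=lambda r: r.random(),
                          eps=1.0, n_samples=200)
# The accepted samples concentrate around the true value 0.3
```

Shrinking `eps` tightens the approximation but lowers the acceptance rate, which is why the MCMC-based variant mentioned on the slide is used in practice.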

slide-82
SLIDE 82

Inference for infinite state spaces

Various methods become inefficient or inapplicable as the state-space grows. How do we deal with unbounded systems?
Multiple simulation runs.
Large population approximations (diffusion, Linear Noise, ...).
Systematic truncation.
Random truncations.

Hillston 21/9/2016 44 / 70


slide-84
SLIDE 84

Expanding the likelihood

The likelihood can be written as an infinite series:
p(x′, t′ | x, t) = Σ_{N=0}^{∞} p^(N)(x′, t′ | x, t) = Σ_{N=0}^{∞} [ f^(N)(x′, t′ | x, t) − f^(N−1)(x′, t′ | x, t) ]
where:
x* = max{x, x′};
p^(N)(x′, t′ | x, t) is the probability of going from state x at time t to state x′ at time t′ through a path with maximum state x* + N;
f^(N) is the same, except that the maximum state cannot exceed x* + N (but does not have to reach it).
Any finite number of terms can be computed; can the infinite sum be computed or estimated?

Hillston 21/9/2016 45 / 70


slide-87
SLIDE 87

[Figure: grid of states with start state x, end state x′, bound x*, and nested truncation sets S0 ⊂ S1]

Hillston 21/9/2016 46 / 70


slide-89
SLIDE 89

Russian Roulette Truncation

We want to estimate the value of f = Σ_{n=0}^{∞} f_n, where the f_n are computable.
Choose a single term f_k with probability p_k, and estimate f̂ = f_k / p_k.
f̂ is unbiased... but its variance can be high.
Hillston 21/9/2016 47 / 70

slide-90
SLIDE 90

Russian Roulette Truncation

We want to estimate the value of f = Σ_{n=0}^{∞} f_n, where the f_n are computable.
Truncate the sum randomly: stop at term k with probability q_k. Form f̂ as a partial sum of the f_n, n = 0, ..., k, with each term rescaled appropriately:
f̂ = Σ_{n=0}^{k} [ f_n / Π_{j=0}^{n−1} (1 − q_j) ]

Hillston 21/9/2016 47 / 70

slide-91
SLIDE 91

Russian Roulette

f̂ ← f_0
i ← 1
p ← 1
loop
    choose to stop with probability q_i
    if stopping then
        return f̂
    else
        p ← p · (1 − q_i)
        f̂ ← f̂ + f_i / p
        i ← i + 1
    end if
end loop

Hillston 21/9/2016 48 / 70
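The pseudocode above can be made runnable; the sketch below (my illustration) checks unbiasedness empirically on a geometric series with known sum Σ 0.5^n = 2.

```python
import random

def russian_roulette(f_term, q, rng):
    """Unbiased estimate of sum_{n>=0} f_term(n) by random truncation.

    f_term(n): the n-th term of the series.
    q(i):      probability of stopping at step i (must be < 1).
    Each surviving term is rescaled by its survival probability, as in
    the pseudocode on the slide.
    """
    est = f_term(0)
    p, i = 1.0, 1
    while True:
        if rng.random() < q(i):          # choose to stop with probability q_i
            return est
        p *= 1.0 - q(i)                  # probability of having survived
        est += f_term(i) / p
        i += 1

rng = random.Random(0)
estimates = [russian_roulette(lambda n: 0.5 ** n, lambda i: 0.5, rng)
             for _ in range(20000)]
mean = sum(estimates) / len(estimates)
# mean should be close to the true sum, 2
```

Individual estimates vary (here each run returns 1 plus the number of surviving steps), but their average converges to the exact infinite sum, which is the property the pseudo-marginal samplers on the next slide rely on.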

slide-92
SLIDE 92

Russian Roulette

f̂ = Σ_{n=0}^{k} [ f_n / Π_{j=0}^{n−1} (1 − q_j) ]

In our case, f is a probability that we wish to approximate. Using f̂ instead of f leads to an error; however, f̂ is unbiased: E[f̂] = f. f̂ is also guaranteed to be positive. Pseudo-marginal algorithms can use this and still draw samples from the correct distribution. We have developed both Metropolis-Hastings and Gibbs-like sampling algorithms based on this approach5.

5. A. Georgoulas, J. Hillston and G. Sanguinetti: Unbiased Bayesian Inference for Population Markov Jump Processes via Random Truncations. To appear in Statistics & Computing.

Hillston 21/9/2016 49 / 70
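A pseudo-marginal Metropolis-Hastings sampler in miniature (my sketch; a Gaussian toy likelihood with multiplicative noise of mean one stands in for the Russian-roulette estimate):

```python
import math
import random

def pseudo_marginal_mh(log_prior, lik_estimator, proposal, theta0, n, seed=0):
    """Metropolis-Hastings where the exact likelihood is replaced by an
    unbiased, positive estimate; the chain still targets the posterior."""
    rng = random.Random(seed)
    theta, lik = theta0, lik_estimator(theta0, rng)
    samples = []
    for _ in range(n):
        prop = proposal(theta, rng)
        lik_prop = lik_estimator(prop, rng)
        log_a = (log_prior(prop) + math.log(lik_prop)
                 - log_prior(theta) - math.log(lik))
        if log_a >= 0 or rng.random() < math.exp(log_a):
            theta, lik = prop, lik_prop    # keep the estimate with the state
        samples.append(theta)
    return samples

obs = 1.0
def log_prior(th):                 # wide Gaussian prior
    return -th * th / 200.0
def lik_estimator(th, rng):        # unbiased noisy likelihood: E[noise] = 1
    noise = math.exp(rng.gauss(-0.125, 0.5))
    return math.exp(-0.5 * (th - obs) ** 2) * noise
def proposal(th, rng):             # symmetric random-walk proposal
    return th + rng.gauss(0.0, 0.5)

samples = pseudo_marginal_mh(log_prior, lik_estimator, proposal, 0.0, 4000)
```

The key design point is that the likelihood estimate is recomputed only for proposed states and carried along with the accepted state; because the estimate is unbiased and positive, the stationary distribution is the exact posterior.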

slide-93
SLIDE 93

SSA samples

Hillston 21/9/2016 50 / 70

slide-94
SLIDE 94

Posterior samples

Hillston 21/9/2016 51 / 70

slide-95
SLIDE 95

Outline

1. Introduction
2. Probabilistic Programming
3. ProPPA
4. Inference
5. Results
6. Conclusions

Hillston 21/9/2016 52 / 70

slide-96
SLIDE 96

Example model

[Diagram: rumour-spreading model with reactions spread, stop1 and stop2]

k_s = Uniform(0,1);
k_r = Uniform(0,1);
kineticLawOf spread : k_s * I * S;
kineticLawOf stop1 : k_r * S * S;
kineticLawOf stop2 : k_r * S * R;
I = (spread,1) ↓ ;
S = (spread,1) ↑ + (stop1,1) ↓ + (stop2,1) ↓ ;
R = (stop1,1) ↑ + (stop2,1) ↑ ;
I[10] ⊲ S[5] ⊲ R[0]

observe('trace')
infer('ABC') // Approximate Bayesian Computation

Hillston 21/9/2016 53 / 70

slide-97
SLIDE 97

Results

Tested on the rumour-spreading example, giving the two parameters uniform priors.
Approximate Bayesian Computation:
returns the posterior as a set of points (samples);
observations: a time-series (from a single simulation).

Hillston 21/9/2016 54 / 70

slide-98
SLIDE 98

Results: ABC

[Scatter plot: posterior samples of (k_s, k_r)]

Hillston 21/9/2016 55 / 70


slide-100
SLIDE 100

Results: ABC

[Histograms: posterior samples of k_r and k_s vs number of samples]

Hillston 21/9/2016 57 / 70

slide-101
SLIDE 101

Genetic Toggle Switch

Two mutually-repressing genes: promoters (unobserved) and their protein products.
Bistable behaviour: switching induced by environmental changes.
Synthesised in E. coli6.
A stochastic variant7 where switching is induced by noise.

6. Gardner, Cantor & Collins: Construction of a genetic toggle switch in Escherichia coli. Nature, 2000.
7. Tian & Burrage: Stochastic models for regulatory networks of the genetic toggle switch. PNAS, 2006.

Hillston 21/9/2016 58 / 70

slide-102
SLIDE 102

Genetic Toggle Switch

[Diagram: genes G1,on/G1,off and G2,on/G2,off with protein products P1 and P2; proteins degrade (∅); each protein accelerates deactivation of the opposing gene]

Hillston 21/9/2016 59 / 70

slide-103
SLIDE 103

Toggle switch model: species

G1 = activ1 ↑ + deact1 ↓ + expr1 ⊕;
G2 = activ2 ↑ + deact2 ↓ + expr2 ⊕;
P1 = expr1 ↑ + degr1 ↓ + deact2 ⊕;
P2 = expr2 ↑ + degr2 ↓ + deact1 ⊕;
G1[1] <*> G2[0] <*> P1[20] <*> P2[0]

observe(toggle_obs);
infer(rouletteGibbs);

Hillston 21/9/2016 60 / 70

slide-104
SLIDE 104

θ1 = Gamma(3,5); // etc...
kineticLawOf expr1 : θ1 * G1;
kineticLawOf expr2 : θ2 * G2;
kineticLawOf degr1 : θ3 * P1;
kineticLawOf degr2 : θ4 * P2;
kineticLawOf activ1 : θ5 * (1 - G1);
kineticLawOf activ2 : θ6 * (1 - G2);
kineticLawOf deact1 : θ7 * exp(r * P2) * G1;
kineticLawOf deact2 : θ8 * exp(r * P1) * G2;
G1 = activ1 ↑ + deact1 ↓ + expr1 ⊕;
G2 = activ2 ↑ + deact2 ↓ + expr2 ⊕;
P1 = expr1 ↑ + degr1 ↓ + deact2 ⊕;
P2 = expr2 ↑ + degr2 ↓ + deact1 ⊕;
G1[1] <*> G2[0] <*> P1[20] <*> P2[0]

observe(toggle_obs);
infer(rouletteGibbs);

Hillston 21/9/2016 61 / 70

slide-105
SLIDE 105

Experiment

Simulated observations.
Gamma priors on all parameters (required by the algorithm).
Goal: learn the posterior over the 8 parameters.
5000 samples taken using the Gibbs-like random truncation algorithm.

Hillston 21/9/2016 62 / 70

slide-106
SLIDE 106

Promoters


Hillston 21/9/2016 63 / 70

slide-107
SLIDE 107

Proteins


Hillston 21/9/2016 64 / 70

slide-108
SLIDE 108

Observations used


Hillston 21/9/2016 65 / 70

slide-109
SLIDE 109

Results

Hillston 21/9/2016 66 / 70

slide-110
SLIDE 110

Outline

1. Introduction
2. Probabilistic Programming
3. ProPPA
4. Inference
5. Results
6. Conclusions

Hillston 21/9/2016 67 / 70

slide-111
SLIDE 111

Workflow

[Diagram: model + inference algorithm → (compile) → low-level description → (infer) → inference results (samples) → statistics, plotting, prediction, ...]

Hillston 21/9/2016 68 / 70

slide-112
SLIDE 112

Summary

ProPPA is a process algebra that incorporates uncertainty and observations directly in the model, influenced by probabilistic programming.
The syntax remains similar to Bio-PEPA.
The semantics is defined in terms of an extension of Constraint Markov Chains.
Observations can be either time-series or logical properties.
Parameter inference based on random truncations (Russian Roulette) offers new possibilities for inference.

Hillston 21/9/2016 69 / 70

slide-113
SLIDE 113

Thanks

Anastasis Georgoulas and Guido Sanguinetti

Hillston 21/9/2016 70 / 70