Probabilistic Programming of Biology. Jane Hillston, joint work with Anastasis Georgoulas and Guido Sanguinetti. School of Informatics, University of Edinburgh. December 2015.


SLIDE 1

Probabilistic Programming of Biology

Jane Hillston Joint work with Anastasis Georgoulas and Guido Sanguinetti

School of Informatics, University of Edinburgh

December 2015


Hillston Dagstuhl 15491 1 / 29

SLIDE 2

Outline

1. Introduction
2. Probabilistic Programming
3. ProPPA
4. Inference
5. Conclusions

SLIDE 4

Modelling

There are two approaches to model construction:

  • Machine Learning: extracting a model from the data generated by the system, or refining a model based on system behaviour using statistical techniques.
  • Mechanistic Modelling: starting from a description or hypothesis, construct a model that algorithmically mimics the behaviour of the system, validated against data.

SLIDE 6

Machine Learning

[Figure: the prior and the data feed into inference, which yields the posterior.]

Bayesian statistics

  • Represent belief and uncertainty as probability distributions (prior, posterior).
  • Treat parameters and unobserved variables similarly.
  • Bayes' Theorem:

    $$P(\theta \mid D) = \frac{P(\theta) \cdot P(D \mid \theta)}{P(D)}$$

    posterior ∝ prior · likelihood
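As a small illustration of "posterior ∝ prior · likelihood", the sketch below applies Bayes' Theorem on a grid of parameter values. The coin-bias setting, the grid resolution and the name `grid_posterior` are assumptions chosen for illustration; they do not come from the talk.

```python
def grid_posterior(heads, tails, n_grid=101):
    """Posterior over a coin bias theta on a regular grid, with a uniform prior."""
    thetas = [i / (n_grid - 1) for i in range(n_grid)]
    prior = [1.0 / n_grid] * n_grid                               # P(theta)
    likelihood = [t ** heads * (1 - t) ** tails for t in thetas]  # P(D | theta)
    unnormalised = [p * l for p, l in zip(prior, likelihood)]     # prior * likelihood
    evidence = sum(unnormalised)                                  # P(D), the normaliser
    posterior = [u / evidence for u in unnormalised]              # P(theta | D)
    return thetas, posterior


# Hypothetical data: 7 heads and 3 tails.
thetas, posterior = grid_posterior(heads=7, tails=3)

# Grid point with the highest posterior mass.
map_theta = thetas[max(range(len(posterior)), key=posterior.__getitem__)]
```

The evidence P(D) appears only as a normalising constant, which is why the slide can summarise the theorem as a proportionality.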

SLIDE 7

Mechanistic modelling

  • Models are constructed reflecting what is known about the components of the biological system and their behaviour.
  • A variety of formal modelling techniques from theoretical computer science have been proposed to capture the system behaviour.
  • These are then compiled into executable models [1] which can be run to deepen understanding of the model.
  • Executing the model generates data that can be compared with biological data.

[1] Jasmin Fisher, Thomas A. Henzinger: Executable cell biology. Nature Biotechnology, 2007.

SLIDE 10

Comparing the techniques

Data-driven modelling:

  + rigorous handling of parameter uncertainty
  - limited or no treatment of stochasticity
  - in many cases bespoke solutions are required, which can limit the size of system that can be handled

Mechanistic modelling:

  + general execution "engine" (deterministic or stochastic) can be reused for many models
  + models can be used speculatively to investigate roles of parameters, or alternative hypotheses
  - parameters are assumed to be known and fixed

Probabilistic Programming seeks to bring elements of both forms of modelling together.

SLIDE 12

Probabilistic programming

  • A way to express probabilistic models in a high-level language, like software code.
  • Offers automated inference without the need to write bespoke solutions.
  • Platforms: IBAL, Church, Infer.NET, Fun, ...
  • Key actions: specify a distribution, specify observations, infer posterior distribution.

SLIDE 13

Probabilistic Process Algebra

What if we could...

  • include information about uncertainty in the model?
  • automatically use observations to refine this uncertainty?
  • do all this in a formal context?

Starting from an existing process algebra (Bio-PEPA), we have developed a new language, ProPPA, that addresses these issues [2].

[2] Anastasis Georgoulas, Jane Hillston, Dimitrios Milios, Guido Sanguinetti: Probabilistic Programming Process Algebra. QEST 2014: 249-264.

SLIDE 17

Stochastic Process Algebra

In a stochastic process algebra, actions (reactions) not only have a name or type, but also a stochastic duration or rate.

The language may be used to generate a Markov process (CTMC):

SPA MODEL → (SOS rules) → LABELLED TRANSITION SYSTEM → (state transition diagram) → CTMC Q

Q is the infinitesimal generator matrix characterising the CTMC.

Models are typically executed by simulation using Gillespie's Stochastic Simulation Algorithm (SSA) or similar.
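To make the role of Q concrete, here is a minimal sketch that builds the infinitesimal generator matrix of a small CTMC. The birth-death chain, its rates and the name `generator_matrix` are hypothetical and are not taken from the talk; the only property being shown is that off-diagonal entries hold transition rates and each diagonal entry makes its row sum to zero.

```python
def generator_matrix(n_states, birth_rate, death_rate):
    """Build Q for a birth-death CTMC on states 0 .. n_states - 1.

    State s is a molecule count: births occur at a constant rate,
    deaths at a rate proportional to the current count.
    """
    Q = [[0.0] * n_states for _ in range(n_states)]
    for s in range(n_states):
        if s + 1 < n_states:
            Q[s][s + 1] = birth_rate       # transition s -> s + 1
        if s > 0:
            Q[s][s - 1] = death_rate * s   # transition s -> s - 1
        Q[s][s] = -sum(Q[s])               # diagonal: minus the total exit rate
    return Q


Q = generator_matrix(n_states=4, birth_rate=2.0, death_rate=1.0)
```

For realistic models the state space is usually too large to build Q explicitly, which is one reason simulation is the typical way to execute them.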

SLIDE 21

The Bio-PEPA abstraction

  • Each species i is described by a species component C_i.
  • Each reaction j is associated with an action type α_j, and its dynamics is described by a specific function f_{α_j}.
  • The species components are then composed together to describe the behaviour of the system.

The semantics is defined by two transition relations:

  • First, a capability relation: is a transition possible?
  • Second, a stochastic relation: gives the rate of a transition, derived from the parameters of the model.

The result is a Continuous Time Markov Chain (CTMC).

SLIDE 23

A Probabilistic Programming Process Algebra: ProPPA

ProPPA aims to retain the features of the stochastic process algebra:

  • simple model description in terms of components
  • rigorous semantics giving an executable version of the model...

... whilst also incorporating features of a probabilistic programming language:

  • recording uncertainty in the parameters
  • ability to incorporate observations into models
  • access to inference to update uncertainty based on observations

SLIDE 24

Example

[Figure: reaction diagram over species I, S and R, with reactions spread, stop1 and stop2.]

    k_s = 0.5;
    k_r = 0.1;
    kineticLawOf spread : k_s * I * S;
    kineticLawOf stop1 : k_r * S * S;
    kineticLawOf stop2 : k_r * S * R;
    I = (spread,1) ↓;
    S = (spread,1) ↑ + (stop1,1) ↓ + (stop2,1) ↓;
    R = (stop1,1) ↑ + (stop2,1) ↑;
    I[10] ⋈ S[5] ⋈ R[0]
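The model above can be executed with a stochastic simulation. The following is a minimal Python sketch of Gillespie's direct method for these three reactions, using the kinetic laws exactly as written in the model. The function name `simulate_rumour`, the time horizon and the seed are assumptions made for illustration; this is not the ProPPA or Bio-PEPA tool implementation.

```python
import random


def simulate_rumour(k_s, k_r, state=(10, 5, 0), t_end=5.0, rng=None):
    """Gillespie direct-method simulation; returns a list of (t, I, S, R)."""
    if rng is None:
        rng = random.Random(0)             # fixed seed, an arbitrary choice
    I, S, R = state
    t = 0.0
    trace = [(t, I, S, R)]
    while True:
        # Propensities taken from the kineticLawOf lines of the model.
        rates = [k_s * I * S, k_r * S * S, k_r * S * R]
        total = sum(rates)
        if total == 0.0:                   # no reaction can fire any more
            break
        t += rng.expovariate(total)        # exponential waiting time
        if t > t_end:
            break
        u = rng.random() * total           # pick a reaction proportionally to its rate
        if u < rates[0]:
            I, S = I - 1, S + 1            # spread: I is consumed, S is produced
        else:
            S, R = S - 1, R + 1            # stop1 or stop2: S is consumed, R is produced
        trace.append((t, I, S, R))
    return trace


trace = simulate_rumour(k_s=0.5, k_r=0.1, rng=random.Random(1))
```

Both stop reactions have the same effect on the state, so the sketch only needs to distinguish spread from "either stop reaction" when updating the counts.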

SLIDE 25

Additions

Declaring uncertain parameters:

    k_s = Uniform(0,1);
    k_t = Gaussian(0,1);

Providing observations:

    observe('trace')

Specifying inference approach:

    infer('ABC')

SLIDE 26

Additions

[Figure: reaction diagram over species I, S and R, with reactions spread, stop1 and stop2.]

    k_s = Uniform(0,1);
    k_r = Uniform(0,1);
    kineticLawOf spread : k_s * I * S;
    kineticLawOf stop1 : k_r * S * S;
    kineticLawOf stop2 : k_r * S * R;
    I = (spread,1) ↓;
    S = (spread,1) ↑ + (stop1,1) ↓ + (stop2,1) ↓;
    R = (stop1,1) ↑ + (stop2,1) ↑;
    I[10] ⋈ S[5] ⋈ R[0]
    observe('trace')
    infer('ABC') // Approximate Bayesian Computation

SLIDE 27

parameter: k = 2
model: CTMC

SLIDE 28

parameter: k ∈ [0,5]
model: set of CTMCs

SLIDE 29

parameter: k ∼ p
model: distribution over CTMCs, μ

A ProPPA model should be mapped to something like a distribution over CTMCs: a Probabilistic Constraint Markov Chain.
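One informal way to picture a distribution over CTMCs is by sampling: every draw of the parameter fixes one concrete CTMC. The sketch below does this for a hypothetical two-state chain with k ∼ Uniform(0, 5). The chain, the choice of prior and the function names are assumptions for illustration only; this is not the formal Probabilistic Constraint Markov Chain semantics.

```python
import random


def two_state_generator(k):
    """Generator matrix of a two-state CTMC: rate k forwards, rate 1 backwards."""
    return [[-k, k],
            [1.0, -1.0]]


def sample_ctmcs(n, rng):
    """Draw n CTMCs by sampling k ~ Uniform(0, 5) and building Q(k) for each draw."""
    return [two_state_generator(rng.uniform(0.0, 5.0)) for _ in range(n)]


# Each element is the generator matrix of one CTMC drawn from the distribution.
ctmcs = sample_ctmcs(1000, random.Random(1))
```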

SLIDE 30

Probabilistic CMCs

A Probabilistic Constraint Markov Chain is a tuple $\langle S, o, A, V, \phi \rangle$, where:

  • $S$ is the set of states, of cardinality $k$.
  • $o \in S$ is the initial state.
  • $A$ is a set of atomic propositions.
  • $V : S \to 2^{2^A}$ gives a set of acceptable labellings for each state.
  • $\phi : S \times [0, \infty)^k \to [0, \infty)$ is the constraint function.

Similarly to Bio-PEPA, the semantics of ProPPA is defined using two transition relations:

  • Capability relation: is a transition possible?
  • Stochastic relation: gives the distribution of the rate of a transition.

SLIDE 33

Inference

parameter: k ∼ p
model: distribution over CTMCs, μ

[Figure: inference combines μ with the observations to give the posterior distribution μ*.]

SLIDE 34

Inference

  • Exact inference is impossible, as we cannot calculate the likelihood, so we use approximate algorithms or approximations of the system.
  • The ProPPA semantics does not define a single inference algorithm, allowing for a modular approach.
  • In practice, to perform inference, we must specify:
      ◮ The observations, in a text file
      ◮ The inference algorithm to be used
  • Regardless of the specific method chosen, the model is parsed to extract the necessary information: rate laws, stoichiometry and priors on parameters.
  • The chosen method can then be applied automatically using the extracted information.

SLIDE 35

Supported inference techniques

  • Approximate Bayesian Computation: Simulation based, "cheap and cheerful". Can give good results but hard to tune.
  • Truncation-based MCMC: Principled way to deal with infinite state-spaces. Additionally assuming Gamma priors allows more efficient inference (Gibbs sampling).
  • Linear Noise Approximation: Species counts are approximated as continuous variables. Assume state is normally distributed at any time ⇒ solve ODEs for mean and variance.
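One standard reason Gamma priors pair well with Gibbs sampling is conjugacy. The derivation below is a sketch under assumptions that the talk does not state: a rate law of the form $c \cdot h(x)$ with unknown constant $c$, and a fully observed path on $[0, T]$ in which the reaction fires $r$ times. The symbols $c$, $h$, $r$, $a$ and $b$ are introduced here for illustration.

```latex
% Complete-data likelihood, as a function of c only
L(c) \;\propto\; c^{\,r} \exp\!\Big(-c \int_0^T h\big(x(t)\big)\,dt\Big)

% Gamma prior with shape a and rate b
p(c) \;\propto\; c^{\,a-1} e^{-b c}

% Posterior: again a Gamma distribution
p(c \mid \text{path}) \;\propto\; c^{\,a+r-1}
  \exp\!\Big(-c \Big[\, b + \int_0^T h\big(x(t)\big)\,dt \Big]\Big)
\quad\Longrightarrow\quad
c \mid \text{path} \;\sim\; \mathrm{Gamma}\!\Big(a + r,\; b + \int_0^T h\big(x(t)\big)\,dt\Big)
```

Because the conditional posterior of the rate constant is available in closed form, it can be sampled directly inside a Gibbs step rather than by a general accept/reject move.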

SLIDE 36

Results

Tested on the rumour-spreading example, giving the two parameters uniform priors.

  • Method: Approximate Bayesian Computation
  • Returns posterior as a set of points (samples)
  • Observations: time-series (single simulation)
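The experiment described above can be sketched as plain ABC rejection sampling: draw parameters from the priors, simulate, and keep the draw if the simulated time-series is close to the observed one. Everything specific in this sketch is an assumption rather than the talk's actual setup: the observation times, the Euclidean distance, the tolerance `epsilon`, the sample count, and the use of k_s = 0.5, k_r = 0.1 to generate the synthetic observation. The ProPPA tool's ABC variant may differ.

```python
import random


def simulate(k_s, k_r, rng, times=(1.0, 2.0, 3.0, 4.0), state=(10, 5, 0)):
    """Gillespie simulation of the rumour model; returns (I, S, R) at each time."""
    I, S, R = state
    t, out = 0.0, []
    for t_obs in times:
        while True:
            rates = (k_s * I * S, k_r * S * S, k_r * S * R)
            total = sum(rates)
            if total == 0.0:
                break                      # nothing can fire any more
            dt = rng.expovariate(total)
            if t + dt > t_obs:
                break                      # next event falls after t_obs; resampling
                                           # later is valid by memorylessness
            t += dt
            if rng.random() * total < rates[0]:
                I, S = I - 1, S + 1        # spread
            else:
                S, R = S - 1, R + 1        # stop1 or stop2
        t = t_obs
        out.append((I, S, R))
    return out


def distance(a, b):
    """Euclidean distance between two time-series of (I, S, R) tuples."""
    return sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q)) ** 0.5


def abc_rejection(observed, n_samples, epsilon, rng):
    """Keep prior draws whose simulated series lies within epsilon of the data."""
    accepted = []
    for _ in range(n_samples):
        k_s, k_r = rng.uniform(0, 1), rng.uniform(0, 1)   # uniform priors on [0, 1]
        if distance(simulate(k_s, k_r, rng), observed) <= epsilon:
            accepted.append((k_s, k_r))
    return accepted


rng = random.Random(42)
observed = simulate(0.5, 0.1, rng)          # synthetic "data" from a single simulation
posterior_samples = abc_rejection(observed, n_samples=2000, epsilon=6.0, rng=rng)
```

The tolerance trades accuracy against acceptance rate, which is one concrete sense in which the method is hard to tune: a small `epsilon` gives a sharper posterior but discards most simulations.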

SLIDE 37

Results: ABC prior distribution

[Figure: plot of the prior over ks and kr; both axes run from 0 to 1.]

SLIDE 38

Results: ABC posterior distribution

[Figure: plot of the posterior over ks and kr; both axes run from 0 to 1.]

SLIDE 39

Results: ABC posterior distribution (parameters separated)

[Figure: two panels showing the number of samples against kr and against ks; both parameter axes run from 0 to 1.]

SLIDE 41

Summary

  • ProPPA is a process algebra that incorporates uncertainty and observations directly in the model, influenced by probabilistic programming.
  • Semantics defined in terms of an extension of Constraint Markov Chains.
  • Observations can be either time-series or logical properties.
  • Parameter inference results consistent with expectations.
