SLIDE 1

SMT-BASED ANALYSIS OF BIOLOGICAL SYSTEMS

Nicola Paoletti

CS department, Oxford University
Molecular Programming and Biological Computation reading group

8th June 2016

SLIDE 2
Motivation
  • Information processing in biosystems → biological computation
  • Examples:
    • Synthetic biology
    • Molecular computing
  • Applications:
    • Smart therapeutics
    • Biofuels, bioremediation, …

Medicine in 2050: “Doctor in a Cell”

[Figure: programmable computer; molecular input]

  • U. Shapiro, E. Shapiro, “How many computers can fit into a drop of water?”

SLIDE 3
Motivation
  • Understand life
    • Development
    • Disease onset and progression
    • Target discovery, pluripotent stem cells, …

Dunn, S-J., et al., Science 344.6188 (2014): 1156-1160.

SLIDE 5
Motivation
  • Formal computational modelling and analysis
    • Verify the behaviour of synthetic biosystems / DNA circuits
    • Verify known biological hypotheses
    • Derive new hypotheses, suggest lab experiments
    • Design automation of molecular programs
  • SMT-based analysis
    • Expressive framework
    • Supports both verification and synthesis
    • Scalable (can handle models of practical interest)
SLIDE 6
Outline

TODAY
  • Overview of SMT solving
  • Bounded Model Checking
  • Analysis of Chemical Reaction Networks

NOT TODAY
  • SAT and SMT algorithms
  • Analysis of Gene Regulatory Networks
SLIDE 7

SMT SOLVING

SLIDE 8
SAT

Boolean Satisfiability Problem (SAT)
  • IN: Boolean formula φ (in CNF)
  • OUT: is φ satisfiable? If so, a satisfying truth assignment
  • NP-complete problem
  • Modern SAT solvers can handle millions of variables and clauses
  • Established technique for verification and synthesis of hardware and logic circuits
  • Limited expressiveness (cannot directly express, e.g., arithmetic or data types)

Example: φ = (x1 ∨ ¬x2) ∧ x2
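To make the IN/OUT specification concrete, here is a minimal sketch that decides the example formula φ by trying every truth assignment. It is a brute-force stand-in, not how a real SAT solver works (those use CDCL search); the DIMACS-style clause encoding and the name `brute_force_sat` are illustrative assumptions.

```python
from itertools import product

# CNF as a list of clauses; each clause is a list of non-zero integers
# (DIMACS-style): k stands for x_k and -k for ¬x_k.
PHI = [[1, -2], [2]]  # φ = (x1 ∨ ¬x2) ∧ x2


def brute_force_sat(clauses, num_vars):
    """Return a satisfying assignment {var: bool}, or None if UNSAT.

    Tries all 2^num_vars truth assignments: fine for toy formulas, but
    exponential in general (SAT is NP-complete)."""
    for values in product([False, True], repeat=num_vars):
        assignment = {v + 1: values[v] for v in range(num_vars)}
        # A clause is satisfied if at least one of its literals is true.
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assignment
    return None


print(brute_force_sat(PHI, 2))          # {1: True, 2: True}
print(brute_force_sat([[1], [-1]], 1))  # None: x1 ∧ ¬x1 is UNSAT
```

The returned dictionary is the "truth assignment" part of the output; `None` plays the role of UNSAT.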

SLIDE 10
SMT

Satisfiability Modulo Theories Problem (SMT)
  • IN: First-Order Logic (FOL) formula φ over one or more background theories
  • OUT: is φ satisfiable? If so, an interpretation of the free variables in their respective domains

Example: φ = x1 ≤ x2 ⟹ f(x1) ≤ f(x2)

  • In the general case, FOL is undecidable [Church, Turing]
  • But in SMT, background theories fix the interpretation of (function, predicate and constant) symbols
    • With decidable theories, SMT is decidable too
    • With undecidable theories, semi-decision procedures often work well in practice

SLIDE 12
SMT

COMMON (LAZY) APPROACH:
  • Provide ad-hoc, specialised theory solvers able to decide conjunctions of atomic propositions and their negations (theory literals)
  • Integrate them with a SAT solver to handle arbitrary Boolean structure
  • Methods exist for combining multiple theories

APPLICATIONS (in CS):
  • Program verification (Boogie, Spec#, …)
  • Model checking (BLAST, CBMC, nuXmv, …)
  • Symbolic execution
  • Program synthesis
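The lazy approach can be illustrated with a toy, "offline" loop: the SAT solver proposes a model of the Boolean skeleton, a theory solver checks the corresponding conjunction of theory literals, and an inconsistent model is blocked before asking again. This is a sketch under stated assumptions: brute-force enumeration stands in for the SAT solver, and the theory (integer bounds `x <= c`, `x >= c`), the `ATOMS` table and all function names are hypothetical, chosen only to keep the theory solver short.

```python
from itertools import product

# Hypothetical toy theory: integer bound atoms (var, op, const), op in {"<=", ">="}.
# Boolean variable k + 1 abstracts ATOMS[k].
ATOMS = [("x", "<=", 3), ("x", ">=", 5), ("x", ">=", 1)]


def theory_consistent(literals):
    """Theory solver: is a conjunction of (possibly negated) bound atoms satisfiable?"""
    lower, upper = {}, {}
    for index, value in literals:
        var, op, c = ATOMS[index]
        # Over the integers, ¬(x <= c) is x >= c + 1 and ¬(x >= c) is x <= c - 1.
        if op == "<=":
            is_lower, bound = (False, c) if value else (True, c + 1)
        else:
            is_lower, bound = (True, c) if value else (False, c - 1)
        if is_lower:
            lower[var] = max(lower.get(var, bound), bound)
        else:
            upper[var] = min(upper.get(var, bound), bound)
    return all(lower[v] <= upper[v] for v in lower if v in upper)


def sat_solve(clauses, num_vars):
    """Stand-in SAT solver: exhaustive search over the Boolean skeleton."""
    for values in product([True, False], repeat=num_vars):
        if all(any(values[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return values
    return None


def lazy_smt(clauses):
    """Lazy loop: Boolean model -> theory check -> blocking clause -> repeat."""
    clauses = [list(clause) for clause in clauses]
    while True:
        model = sat_solve(clauses, len(ATOMS))
        if model is None:
            return None                      # UNSAT modulo the theory
        literals = list(enumerate(model))
        if theory_consistent(literals):
            return model                     # SAT: model is theory-consistent
        # Exclude exactly this Boolean assignment and ask the SAT solver again.
        clauses.append([-(i + 1) if v else i + 1 for i, v in literals])


# (x <= 3) ∧ ((x >= 5) ∨ (x >= 1))
print(lazy_smt([[1], [2, 3]]))  # (True, False, True)
# (x <= 3) ∧ (x >= 5): propositionally SAT, but inconsistent in the theory
print(lazy_smt([[1], [2]]))     # None
```

Production solvers integrate the two components much more tightly (theory checks on partial assignments, and blocking only a small conflicting subset of literals rather than the whole assignment); the loop above only shows the division of labour.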
SLIDE 13
Some examples

  • (Non-)Linear Integer/Real Arithmetic

a > b + 2 ∧ a = 2 · c + 10 ∧ b + c ≤ 1000
SAT, [a = 10, b = 0, c = 0]

x · x = x + 2.0 ∧ x · y = x ∧ (y − 1.0) · z = 1.0
UNSAT (x · x = x + 2 forces x ∈ {2, −1}; since x ≠ 0, x · y = x forces y = 1, and then (y − 1) · z = 0 ≠ 1)

  • Bit-vectors

Validity of De Morgan’s law: ¬((~x & ~y) = ~(x | y))
UNSAT (no counter-example exists, so the law is valid)

Play with http://rise4fun.com/Z3/tutorial/
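Two of these answers can be double-checked by hand-rolled evaluation. The sketch below, which is not how an SMT solver reasons, checks that the reported model satisfies the linear-arithmetic formula and exhaustively searches all 8-bit pairs for a counter-example to De Morgan’s law. The function names and the choice of an 8-bit width are illustrative assumptions; exhaustive search is only feasible because the width is tiny, whereas a bit-vector solver decides the formula for any width symbolically.

```python
def satisfies_linear(a, b, c):
    """a > b + 2 ∧ a = 2·c + 10 ∧ b + c ≤ 1000, evaluated on integers."""
    return a > b + 2 and a == 2 * c + 10 and b + c <= 1000


def de_morgan_counterexamples(width):
    """All width-bit pairs (x, y) satisfying ¬((~x & ~y) = ~(x | y)).

    An empty list means the negated law is UNSAT at this width, i.e. the law
    is valid for width-bit vectors."""
    mask = (1 << width) - 1   # Python ints are unbounded: mask to emulate bit-vectors
    return [(x, y)
            for x in range(1 << width)
            for y in range(1 << width)
            if ((~x & mask) & (~y & mask)) != (~(x | y) & mask)]


print(satisfies_linear(10, 0, 0))    # True: the model reported on the slide
print(de_morgan_counterexamples(8))  # []: no counter-example, so UNSAT
```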

SLIDE 14
Some examples

  • Strings

a.b = “abc”.a[1]
SAT, [a = ‘bcc’, b = ‘a’]

  • Uninterpreted Functions

f(10) > f(2) ∧ f(10) > f(a)
SAT, [a = 3, f(x) = {0 if x = 10, -1 if x = 2, -1 if x = 3, -1 otherwise}]

Play with http://rise4fun.com/Z3/tutorial/

Less conventional, but useful:
  • Optimisation (http://rise4fun.com/z3opt/tutorial/)
  • Theory of ODEs: delta-satisfiability (http://dreal.github.io/)
SLIDE 15

BOUNDED MODEL CHECKING

SLIDE 17
Bounded Model Checking (BMC) [Biere et al, TACAS 99]
  • A SAT/SMT-based method to verify properties on finite paths
  • Looks for counter-examples (CEs) of finite length by negating the property and searching for a satisfying assignment
  • Based on unrolling the transition relation k times (ρ[i] denotes the i-th state of a path ρ)
  • If SAT, the property DOES NOT hold (the assignment is a CE)
  • If UNSAT, increase the length k until:
    • A CE is found (SAT)
    • The search becomes intractable
    • A fixed bound is reached

INVARIANT/SAFETY PROPERTY φ:

$$ I(\rho[0]) \wedge \bigwedge_{0 \le i < k} T(\rho[i], \rho[i+1]) \wedge \bigvee_{0 \le i \le k} \neg\varphi(\rho[i]) $$
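The BMC loop on this slide can be sketched for a small explicit finite-state system. This is an assumption-laden illustration, not the method of Biere et al.: instead of handing the unrolled formula to a SAT/SMT solver, it enumerates every candidate path ρ[0..k] and evaluates the formula directly, which is only feasible for tiny state spaces. The toy system (x′ = (x + 2) mod 6 from x = 1) and all names are hypothetical.

```python
from itertools import product


def bmc(states, init, trans, prop, max_k):
    """Bounded model checking of the invariant `prop` by explicit enumeration.

    For k = 0, 1, ..., max_k, looks for a path rho[0..k] satisfying
        I(rho[0]) ∧ ⋀_{0<=i<k} T(rho[i], rho[i+1]) ∧ ⋁_{0<=i<=k} ¬prop(rho[i]).
    Enumerating all |states|^(k+1) candidate paths stands in for the SAT/SMT
    solver. Returns (k, path) for the first, hence shortest, CE found, or None."""
    for k in range(max_k + 1):
        for path in product(states, repeat=k + 1):
            if (init(path[0])
                    and all(trans(path[i], path[i + 1]) for i in range(k))
                    and any(not prop(s) for s in path)):
                return k, list(path)   # SAT: the property does not hold
        # UNSAT for this k: safe so far, so unroll one more step
    return None                        # no CE up to max_k (not a proof of safety)


# Hypothetical toy system: x' = (x + 2) mod 6, starting from x = 1.
STATES = range(6)


def init(x):
    return x == 1


def trans(x, y):
    return y == (x + 2) % 6


print(bmc(STATES, init, trans, lambda x: x != 5, max_k=4))      # (2, [1, 3, 5])
print(bmc(STATES, init, trans, lambda x: x % 2 == 1, max_k=4))  # None
```

Because k grows one step at a time, the first CE returned has minimal length; a `None` result only means "safe up to max_k".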
SLIDE 18
Example
  • 2-bit counter
  • xi[j]: value of the j-th bit at the i-th step
  • Initial state (counter set to 00):
      I(x) = ¬x[0] ∧ ¬x[1]
  • At each step, increment the counter:
      T(x, x′) = (x′[1] = x[1] ⊕ x[0]) ∧ (x′[0] = ¬x[0])
  • Verify the (bounded) invariant (it always holds that at least one of the two bits is 0):
      φ(x) = ¬x[0] ∨ ¬x[1]

SLIDE 20
Example

[Figure: unrolled path x0 → x1 → x2 → x3; each state xi has bits xi[0] and xi[1]; consecutive states are linked by T(x0, x1), T(x1, x2), T(x2, x3)]

NEED TO CHECK:

$$ I(x_0) \wedge \bigwedge_{0 \le i < 3} T(x_i, x_{i+1}) \wedge \bigvee_{0 \le i \le 3} \neg\varphi(x_i) $$
SLIDE 24
Example

[Figure: unrolled path x0 → x1 → x2 → x3, linked by T(x0, x1), T(x1, x2), T(x2, x3)]

Step 1: I(x0) ∧ ¬φ(x0)  [UNSAT, safe so far]
Step 2: I(x0) ∧ T(x0, x1) ∧ ¬φ(x1)  [UNSAT, safe so far]
Step 3: I(x0) ∧ T(x0, x1) ∧ T(x1, x2) ∧ ¬φ(x2)  [UNSAT, safe so far]
Step 4: I(x0) ∧ T(x0, x1) ∧ T(x1, x2) ∧ T(x2, x3) ∧ ¬φ(x3)  [SAT, CE found: x3[0] = 1, x3[1] = 1, …]

The counter runs 00 → 01 → 10 → 11 (written x[1]x[0]), so the invariant first fails in x3.
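The four steps above can be reproduced with a propositional encoding that mirrors the slides: each state xi is a pair of Boolean variables (xi[0], xi[1]), and I, T and φ are written exactly as the formulas in slide 18. As a simplifying assumption, brute-force enumeration of the 2·(k+1) Boolean variables stands in for the SAT solver; the names `step_formula`, `I`, `T` and `phi` are illustrative.

```python
from itertools import product


def I(x):
    """Initial state: counter set to 00."""
    return not x[0] and not x[1]


def T(x, nx):
    """Increment: x'[1] = x[1] XOR x[0] and x'[0] = NOT x[0]."""
    return nx[1] == (x[1] != x[0]) and nx[0] == (not x[0])


def phi(x):
    """Invariant: at least one of the two bits is 0."""
    return not x[0] or not x[1]


def step_formula(k):
    """Search for bits x_0..x_k satisfying
    I(x_0) ∧ T(x_0, x_1) ∧ ... ∧ T(x_{k-1}, x_k) ∧ ¬phi(x_k).
    Brute force over the 2 * (k + 1) Boolean variables stands in for SAT."""
    for bits in product([False, True], repeat=2 * (k + 1)):
        xs = [(bits[2 * i], bits[2 * i + 1]) for i in range(k + 1)]  # x_i = (x_i[0], x_i[1])
        if I(xs[0]) and all(T(xs[i], xs[i + 1]) for i in range(k)) and not phi(xs[k]):
            return xs
    return None


for k in range(4):
    ce = step_formula(k)
    print(f"Step {k + 1}:", "UNSAT, safe so far" if ce is None else f"SAT, CE {ce}")
```

Steps 1 to 3 report UNSAT; step 4 returns the path x0 = 00, x1 = 01, x2 = 10, x3 = 11, whose last state has both bits set.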
SLIDE 26
The Length Problem
  • Finite k → incomplete (cannot capture CEs at k′ > k)
  • Complexity of BMC depends on k
  • How to find k such that BMC is complete?
  • A possible solution is to bound k by the (recurrence) diameter of the transition system: the length of the longest loop-free path
  • (…but diameter computation can be very expensive)

In practice…
  • “BMC is normally used for detecting bugs, not for proving their absence.”
  • So when the property doesn’t hold, BMC is effective: it finds a CE, and increasing k one step at a time yields a CE of minimal length
  • Many problems consider bounded properties
  • BMC remains the most effective SAT/SMT-based method in general
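As an illustration of the completeness bound, the sketch below computes the length of the longest loop-free path starting from an initial state, by exhaustive depth-first search over simple paths. Measuring from the initial states (rather than over all state pairs) is an assumption of this sketch, as are the function name and the integer encoding of the 2-bit counter (v = 2·x[1] + x[0]). The search is exponential in general, which is exactly the "can be very expensive" caveat above.

```python
def recurrence_diameter(initial_states, successors):
    """Length (in transitions) of the longest loop-free path from an initial
    state, by exhaustive DFS over simple paths (exponential in general).

    Every reachable state is reachable along a loop-free path from an initial
    state, so unrolling to this depth makes bounded checking of an invariant
    complete."""
    best = 0

    def dfs(state, visited, length):
        nonlocal best
        best = max(best, length)
        for nxt in successors(state):
            if nxt not in visited:          # extend only loop-free paths
                dfs(nxt, visited | {nxt}, length + 1)

    for s in initial_states:
        dfs(s, {s}, 0)
    return best


# 2-bit counter as integers 0..3: the increment maps v to (v + 1) mod 4.
print(recurrence_diameter([0], lambda v: [(v + 1) % 4]))  # 3
```

For the counter example this gives k = 3, which is exactly the unrolling depth at which the CE appeared (Step 4 checks x3).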
SLIDE 27
Alternative SAT/SMT Methods
  • k-induction [Sheeran et al, FMCAD 2000]
  • CEGAR [Chauhan et al, FMCAD 2002]
  • Interpolation [McMillan, CAV 2003]
  • IC3 [Bradley, VMCAI 2011]
SLIDE 29
BMC for Synthesis
  • “Parametric” transition system: (I(p), T(p))
  • Problem: find parameters p such that the property is satisfied
  • “Positive query”: the property is not negated
    • If SAT, the solver finds parameters for which there exists a path (of length k) satisfying the property
    • If UNSAT, no parameters exist for which a path of length k satisfies the property

INVARIANT VERIFICATION

$$ I(\rho[0]) \wedge \bigwedge_{0 \le i < k} T(\rho[i], \rho[i+1]) \wedge \bigvee_{0 \le i \le k} \neg\varphi(\rho[i]) $$

SYNTHESIS WITH INVARIANT PROPERTY

$$ \exists \vec{p}.\; \Big( I(\vec{p}, \rho[0]) \wedge \bigwedge_{0 \le i < k} T(\vec{p}, \rho[i], \rho[i+1]) \wedge \bigwedge_{0 \le i \le k} \varphi(\rho[i]) \Big) $$

SLIDE 30
BMC for Synthesis - Example
  • State space: bit-vectors of length 4
  • Transition relation is an unknown BV arithmetic operation:

$$ T(x, x') = (x' = x \diamond c), \quad \diamond \in \{+, -, |, \&, \dots\} $$

  • Encoding exploits “choice variables” op and c:

$$ T(\mathit{op}, c, x, x') = \begin{cases} x' = x + c & \text{if } \mathit{op} = 0 \\ x' = x - c & \text{if } \mathit{op} = 1 \\ x' = -x & \text{if } \mathit{op} = 2 \\ \quad\vdots & \quad\vdots \\ x' = x \mathbin{\mathrm{NAND}} c & \text{otherwise} \end{cases} $$
SLIDE 32
BMC for Synthesis - Example
  • Property: successor smaller than predecessor
      φ(x, x′) = x > x′
  • Initial state:
      I(x) = x > 6

NEED TO CHECK:

$$ \exists \mathit{op}, c.\; I(\rho[0]) \wedge \bigwedge_{0 \le i < 3} T(\mathit{op}, c, \rho[i], \rho[i+1]) \wedge \bigwedge_{0 \le i < 3} \varphi(\rho[i], \rho[i+1]) $$

[SAT, op = 1, c = 1, …] → T(x, x′) = (x′ = x − 1)
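The synthesis query can be mimicked by enumerating the choice variables directly. This sketch makes several assumptions not fixed by the slides: 4-bit values are treated as unsigned, op codes 0 to 2 follow the slide while codes 3 and 4 (OR, AND) are hypothetical fillers for the elided cases, and NAND is the fallback. Since T is deterministic once op, c and ρ[0] are fixed, enumerating (op, c, ρ[0]) covers every path; this brute force stands in for the SMT solver.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1   # 4-bit bit-vectors, treated as unsigned (assumption)


def apply_op(op, c, x):
    """Choice-variable encoding of T(op, c, x, x'). Codes 0-2 follow the slide;
    codes 3-4 are hypothetical fillers for the elided cases; NAND otherwise."""
    if op == 0:
        return (x + c) & MASK
    if op == 1:
        return (x - c) & MASK
    if op == 2:
        return (-x) & MASK
    if op == 3:
        return x | c
    if op == 4:
        return x & c
    return ~(x & c) & MASK


def synthesise(num_ops=6, k=3):
    """All (op, c) such that ∃ρ. I(ρ[0]) ∧ ⋀ T(op, c, ρ[i], ρ[i+1]) ∧ ⋀ φ(ρ[i], ρ[i+1])."""
    solutions = []
    for op in range(num_ops):
        for c in range(MASK + 1):
            for x0 in range(MASK + 1):
                if not x0 > 6:                      # I(x) = x > 6
                    continue
                path = [x0]
                for _ in range(k):
                    path.append(apply_op(op, c, path[-1]))
                if all(path[i] > path[i + 1] for i in range(k)):  # φ(x, x') = x > x'
                    solutions.append((op, c))
                    break                           # one witness path suffices
    return solutions


sols = synthesise()
print((1, 1) in sols)   # True: x' = x - 1, as on the slide
```

Enumeration also finds, e.g., (op, c) = (0, 15), because x + 15 ≡ x − 1 (mod 16); an SMT solver would simply report one satisfying assignment, such as the op = 1, c = 1 shown on the slide.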
SLIDE 33

CHEMICAL REACTION NETWORKS

SLIDE 34
Chemical Reaction Networks (CRNs)
  • CRN = (S, R): species S and reactions R
  • A reaction r = (Rr, Pr) describes the multisets of species consumed (Rr) and produced (Pr)
  • Qualitative abstraction
    • No reaction rates
    • Preserves reachability

[Yordanov et al, DNA19], [Dalchau et al, DNA21]

SLIDE 38
SMT-based encoding of CRNs
  • Transition system: 𝒯 = (Q, q0, T)
  • Each state q ∈ Q is a multiset of species: q(s) is the count of species s in state q
  • A reaction r is enabled in state q when there are enough reactants:

$$ \mathit{enabled}(r, q) = \bigwedge_{s \in S} q(s) \ge R_r(s) $$

  • A state is terminal if no reactions are enabled:

$$ \mathit{terminal}(q) = \bigwedge_{r \in R} \neg \mathit{enabled}(r, q) $$

  • Transition relation:

$$ T(q, q') = \bigvee_{r \in R} \Big( \mathit{enabled}(r, q) \wedge \bigwedge_{s \in S} q'(s) = q(s) + P_r(s) - R_r(s) \Big) $$
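The three predicates translate almost literally into code. In an SMT encoding the counts q(s) would be integer variables; the sketch below instead evaluates the same predicates on concrete states, using `collections.Counter` so that absent species count as 0. The example network (X + X → Y, Y → Z) and all names are hypothetical, not taken from the cited papers.

```python
from collections import Counter

# Hypothetical CRN (not from the slides): X + X -> Y and Y -> Z.
SPECIES = ["X", "Y", "Z"]
REACTIONS = [
    (Counter({"X": 2}), Counter({"Y": 1})),   # r = (R_r, P_r)
    (Counter({"Y": 1}), Counter({"Z": 1})),
]


def enabled(r, q):
    """enabled(r, q): q(s) >= R_r(s) for every species s."""
    reactants, _ = r
    return all(q[s] >= reactants[s] for s in SPECIES)


def terminal(q):
    """terminal(q): no reaction is enabled in q."""
    return not any(enabled(r, q) for r in REACTIONS)


def T(q, q_next):
    """T(q, q'): some enabled r turns q into q', i.e. q'(s) = q(s) + P_r(s) - R_r(s)."""
    return any(
        enabled((reactants, products), q)
        and all(q_next[s] == q[s] + products[s] - reactants[s] for s in SPECIES)
        for reactants, products in REACTIONS
    )


q0 = Counter({"X": 3})                       # missing species count as 0
print(T(q0, Counter({"X": 1, "Y": 1})))      # True: X + X -> Y fired
print(T(q0, Counter({"X": 2, "Z": 1})))      # False: no reaction yields this
print(terminal(Counter({"X": 1, "Z": 1})))   # True: nothing is enabled
```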
SLIDE 39
Analysis of DNA computation [Yordanov et al, DNA19]
  • BMC-based approach to verify the correctness of DNA Strand Displacement (DSD) circuits
  • DSD is a framework for molecular programming in which computation occurs through DNA strands interacting according to base-pairing rules

SLIDE 40
Analysis of DNA computation [Yordanov et al, DNA19]
  • The evolution of a DSD circuit can be described as a CRN

BMC-BASED ANALYSIS (Intuition)

$$ \bigwedge_{0 \le i < k} T(x_i, x_{i+1}) \wedge \mathit{terminal}(x_k) \wedge P(x_0, x_k) $$

  • P describes the function that the circuit must compute
  • terminal(xk): no further reactions are possible in the last state
  • Approach used to verify DSD circuits for SQRT, FANOUT, AND, OR
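A toy version of this check can be sketched with explicit path exploration. Following the BMC slides, the sketch searches for a violation, i.e. a path satisfying ⋀ T ∧ terminal(xk) ∧ ¬P(x0, xk); finding none up to the bound means every terminal state reached within the bound computes P. Everything here is an illustrative assumption: the CRN X1 + X2 → Y (for min, not one of the DSD circuits above), a deliberately buggy variant, the specification `computes_min`, and explicit enumeration in place of an SMT solver.

```python
from collections import Counter


def successors(q, reactions):
    """All q' with T(q, q'): fire one enabled reaction (R_r, P_r)."""
    for reactants, products in reactions:
        if all(q[s] >= n for s, n in reactants.items()):    # enabled(r, q)
            nxt = Counter(q)
            nxt.subtract(reactants)
            nxt.update(products)
            yield +nxt                                       # drop zero counts


def find_violation(x0, reactions, P, max_k):
    """Bounded search for a path x_0..x_k, k <= max_k, satisfying
    ⋀ T(x_i, x_{i+1}) ∧ terminal(x_k) ∧ ¬P(x_0, x_k).
    Returns such a terminal state x_k, or None if none exists up to max_k."""
    frontier = [x0]
    for _ in range(max_k + 1):
        next_frontier = []
        for q in frontier:
            succ = list(successors(q, reactions))
            if not succ and not P(x0, q):    # terminal state with the wrong output
                return q
            next_frontier.extend(succ)
        frontier = next_frontier
    return None


def computes_min(x0, xk):
    """Hypothetical specification P: the output Y equals min(X1, X2)."""
    return xk["Y"] == min(x0["X1"], x0["X2"])


MIN_CRN = [(Counter({"X1": 1, "X2": 1}), Counter({"Y": 1}))]              # X1 + X2 -> Y
BUGGY_CRN = [(Counter({"X1": 1, "X2": 1}), Counter({"X1": 1, "Y": 1}))]   # X1 + X2 -> X1 + Y

print(all(find_violation(Counter({"X1": a, "X2": b}), MIN_CRN, computes_min, 4) is None
          for a in range(5) for b in range(5)))   # True
# For the buggy network, a terminal state with Y = 2 != min(1, 2) is found:
print(find_violation(Counter({"X1": 1, "X2": 2}), BUGGY_CRN, computes_min, 4))
```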
SLIDE 41
Synthesis of CRNs [Dalchau et al, DNA21]
  • Reaction stoichiometry is left unconstrained (unknown)
  • Synthesis reduces to finding a satisfying assignment to the stoichiometry constants
  • For a fixed number of reactions n, check:

$$ \exists r_1, \dots, r_n.\; \bigwedge_{0 \le i < k} T(x_i, x_{i+1}) \wedge \mathit{terminal}(x_k) \wedge P(x_0, x_k) $$

  • Approach used to synthesise networks for Approximate Majority and Division
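A toy instance of this synthesis query, for Division by 2 with a single reaction (n = 1), can be solved by enumerating candidate stoichiometries. The assumptions are ours, not the cited approach: the reaction shape a·X → b·Y, the coefficient bound `max_coeff`, and the finite set of test inputs, for each of which some path must reach a terminal state with Y = ⌊X/2⌋. Enumeration stands in for the SMT solver, which would instead treat a and b as unknown integer constants.

```python
from collections import Counter


def successors(q, reactions):
    """All q' with T(q, q'): fire one enabled reaction (R_r, P_r)."""
    for reactants, products in reactions:
        if all(q[s] >= n for s, n in reactants.items()):
            nxt = Counter(q)
            nxt.subtract(reactants)
            nxt.update(products)
            yield +nxt


def terminal_states(q0, reactions, max_k):
    """Terminal states reachable from q0 in at most max_k steps (bounded unrolling)."""
    found, frontier = [], [q0]
    for _ in range(max_k + 1):
        next_frontier = []
        for q in frontier:
            succ = list(successors(q, reactions))
            if succ:
                next_frontier.extend(succ)
            else:
                found.append(q)
        frontier = next_frontier
    return found


def synthesise_division(max_coeff=3, inputs=range(7), max_k=10):
    """All stoichiometries (a, b) of a single reaction a X -> b Y such that, for
    every tested input n, some terminal state reached from {X: n} has Y = n // 2."""
    solutions = []
    for a in range(1, max_coeff + 1):          # reactant stoichiometry of X
        for b in range(max_coeff + 1):         # product stoichiometry of Y
            reactions = [(Counter({"X": a}), Counter({"Y": b}))]
            if all(any(q["Y"] == n // 2
                       for q in terminal_states(Counter({"X": n}), reactions, max_k))
                   for n in inputs):
                solutions.append((a, b))
    return solutions


print(synthesise_division())  # [(2, 1)]: the reaction 2X -> Y
```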

SLIDE 42

SUMMARY
  • Overview of SMT solving (from SAT to SMT, examples)
  • Bounded Model Checking (approach, limitations, alternatives, verification and synthesis, examples)
  • Analysis of Chemical Reaction Networks (SMT encoding, DSD circuit verification, CRN synthesis)

SLIDE 43

REFERENCES

SMT overview

De Moura, Leonardo, and Nikolaj Bjørner. "Satisfiability modulo theories: An appetizer." Formal Methods: Foundations and Applications, 2009. 23-36.
Barrett, Clark W., et al. "Satisfiability Modulo Theories." Handbook of Satisfiability 185 (2009): 825-885.

Bounded model checking and others

Biere, Armin, et al. "Symbolic Model Checking without BDDs." Tools and Algorithms for the Construction and Analysis of Systems: 5th International Conference, TACAS'99. Springer, 1999.
Clarke, Edmund, et al. "Computational challenges in bounded model checking." International Journal on Software Tools for Technology Transfer 7.2 (2005): 174-183.
Amla, Nina, et al. "An analysis of SAT-based model checking techniques in an industrial environment." Correct Hardware Design and Verification Methods, 2005. 254-268.

SLIDE 44

REFERENCES

SMT + biological networks

Yordanov, Boyan, et al. "Functional analysis of large-scale DNA strand displacement circuits." DNA Computing and Molecular Programming. Springer International Publishing, 2013. 189-203.
Dalchau, Neil, et al. "Synthesizing and tuning chemical reaction networks with specified behaviours." DNA Computing and Molecular Programming. Springer International Publishing, 2015. 16-33.
Paoletti, Nicola, et al. "Analyzing and synthesizing genomic logic functions." Computer Aided Verification. Springer International Publishing, 2014.

Dunn, S-J., et al. "Defining an essential transcription factor program for naïve pluripotency." Science 344.6188 (2014): 1156-1160.