Mathematical programming techniques applied to biology Fabien - PowerPoint PPT Presentation

Mathematical programming techniques applied to biology Fabien Tarissan 1 Leo Liberti 2 Camilo La Rota 3 1 ISC-PIF (Paris, France) 2 ´ Ecole Polytechnique (Paris, France) 3 IXXI (Lyon, France) October 31, 2008

Context of work Pre-simulation tool for the Morphex european project: Heterogeity at many levels: ◮ organisms ◮ data ◮ reliability ◮ level of details ◮ . . .

Network reconstruction

Network reconstruction Our approach: ◮ Modelisation by means of mathematical programming techniques (constraints) ◮ Reformulation of the models in order to ease the solving Contributions : ◮ Reconstruction of gene regulatory networks: ◮ with continuous dynamics (drosophila) ◮ with discrete dynamics (arabidopsis)

Mathematical Programming � min x f ( x ) subject to g ( x ) ≤ 0 ◮ x ∈ R n are the decision variables ◮ f : R n → R is the objective function ◮ g : R n → R m is the set of constraints + distinction between integer and continuous variables. Let Z ∈ { 1 , . . . , n } such that ∀ i ∈ Z , x i ∈ Z .

Classes of problems � min x f ( x ) subject to g ( x ) ≤ 0 AMPL: A Mathematical Programming Language. Class f , g Z Best solver Best free solver Complexity Θ(10 6 ) LP linear Z = ∅ CPLEX CLP Θ(10 4 ) cNLP convex Z = ∅ SNOPT/FILTER IPOPT Θ(10 3 ) MILP linear Z � = ∅ CPLEX BCP/SYMPHONY Θ(10 2 ) NLP non linear Z = ∅ BARON ? Θ(10 3 ) cMINLP convex Z � = ∅ MINLP bb/FILMINT BONMIN/FILMINT Θ(10 2 ) MINLP non linear Z � = ∅ BARON ?

Application to the drosophila model Continuous regulation of gene products concentrations: dg ia ( t ) = R a Φ( u ia ( t )) − λ a g ia ( t )+ D a ( g i +1 , a ( t ) − 2 g ia ( t )+ g i − 1 , a ( t )) dt ◮ g ia ( t ) is the concentration of gene a in nucleus i at time t ◮ R a is the production rate for gene a ◮ Φ is the sigmoid regulation function ◮ λ a is the decay rate ◮ D a is the diffusion coefficient for gene a

Regulation term The sigmoid definition: Φ( u ) = 1 � u � √ + 1 u 2 + 1 2 Relies on: � W ba g ib ( t ) + m a g bcd u ia ( t ) = + h a i b ∈ N γ ◮ W ba is the weight on the arc ( b , a ) in the GRN ◮ m a is the regulatory influence of the maternal gene bcd ◮ h a is the activation threshold for Φ

The problem Size of the problem: ◮ Network of 6 genes ◮ but missing values for W , R , D , m , λ , h : 66 variables. Confronting the estimation to the observed data: ( t )) 2 + Π R + Π λ + Π D + Π u � � ( g ia ( t ) − g data min ia i ∈ N ι t Penalty function: Π u = e Θ − 1 ) 2 + ( m a v max bcd ) 2 + h 2 � ( W ba v max Θ = Λ( a ) b ( b , a ) ∈ A

Modelling in AMPL 1. Translating the model into AMPL: ◮ Objective function: i ( t )) 2 + max ) 2 + max ) 2 + h 2 X (g a a X (W a b v b X (( m a v bcd i ( t ) − g data min a ) a ∈ N γ a ∈ N γ a ∈ N γ i ∈ N ι b ∈ N γ t ∈ T data ◮ Some penalty functions as constraints: 8 R L ≤ R a ≤ R U > < λ L ≤ λ a ≤ λ U ∀ a ∈ N γ D L ≤ D a ≤ D U > : ◮ PDE as a constraint (discretization): 0 1 u a R a i ( t ) g a i ( t ) − g a + 1) − λ a g a i ( t ) + D a (g a i +1 ( t ) − 2g a i ( t ) + g a i ( t − 1) = ∆ t ( i − 1 ( t )) B C @ 2 q i ( t ) 2 + 1 A u a 2. Other issues: ◮ Mitosis time ◮ Modelling cell division ◮ Updating diffusion coefficient ◮ . . .

Simplifying the model ◮ Driven by biological knowledge: (e.g. boundaries on W , m and h ) ◮ Mathematical reformulating of terms: ◮ exact reformulation: e.g. for u √ u 2 +1 ⇒ z 2 ( u 2 + 1) = 1 = ⇒ ( zu ) 2 + z 2 = 1 1 √ 1. z = u 2 +1 = 2. Let u ′ , u ′′ and z ′ be respectively the uz , u ′ 2 and z 2 . u 2 +1 with u ′ and add constraints: u √ 3. Substitute u ′ = uz 8 > u ′′ + z ′ = 1 > < z ′ = z 2 > u ′′ = u ′ 2 > : ◮ approximative reformulation of z 2

Work achieved so far What is done: 1. the raw model (without any reformulation) 2. various reformulations: ◮ sigmoid (exact): too many variables. ◮ sigmoid (approx): ok. ◮ convex products (approx): ok but feasability issues. 3. run on small data set: good results What will be done: ◮ run on large data set: too heavy for now (need to split the model). ◮ trying other modellisations ( g ia ( t ) = g data ( t )?) ia

Other case of study: Arabidopsis Same approach: ◮ Gene regulatory network ◮ Some knowledge of the network topology ◮ Don’t know the weight on edges Different dynamics: ◮ Descretization of the time � � n ◮ Qualitative activity of gene i : x t +1 α ij w ij x t = H � j − θ i i j =1 • θ i : threshold of activation. � � ( induced production ) • w ij : interaction strength . decay • α ij : Kind of the interaction ( repression = − 1 , activation = +1) Similar problem: Find w ij and θ i

Modelling: defining the GRN Gene Regulatory Network (GRN): ( G , T , α, w , x , ι, θ ) ◮ Sets and Graph: ◮ Functions: α : A → { +1 , − 1 } arc sign ; V : vertexes (genes) w : A → R + arc weight ; A : arcs (interactions) x : V × T → { 0 , 1 } gene activation ; T : = { 1 , 2 , .. } ⊂ N ι : V → { 0 , 1 } initial configuration ; G = ( V , A ) θ : V → R threshold , ◮ Evolution rules x ( v , 1) = ι ( v ) � 1 if � α ( u , v ) w ( u , v ) x ( u , t − 1) ≥ θ ( v ) x ( v , t ) = u ∈ δ − ( v ) 0 otherwise , where δ − ( v ) = { u ∈ V | ( u , v ) ∈ A } for all v ∈ V .

Modelling: defining the problem Given ◮ ( G , T , α ) ◮ S := { 1 .. Smax } : set of stages. ◮ U = { U s } s ∈ S ; U s ⊆ V : nodes of G s (induced subnetworks of G ). ◮ I = { ι s , u } s ∈ S , u ∈ U s ; ι s , u : V → { 0 , 1 } : initial conditions. ◮ Φ = { φ s , u } s ∈ S , u ∈ U s ; φ s , u : V → { 0 , 1 } : expression data. Find w , θ with the property that ∀ � ι s , ( G s , T , α, w ,� x s ,� ι s , θ ) satisfies the evolution rules and has fixed points that collectively minimize the total D H ( ρ, φ ). D H : hamming distance from model fixed points to data. x t for all t ′ > t . fixed points ( � ρ ) : If � x t = � x t − 1 = � ρ then � x t ′ = �

Finding fixed points

Mathematical programming formulation ◮ Objective function X X X ( y s , t − 1 − y s , t ) | x s , u , t − ρ s , u | s ∈ S u ∈ U s t ∈ T \ 1 ◮ Fixed point conditions X | x t s , u − x t − 1 � U s � σ t 1 − y t X σ t s , u | ≤ ≤ s s r u ∈ U s r ≥ t X | x t s , u − x t − 1 σ t X y t σ t s , u | ≥ = 0 s s r u ∈ U s r ≥ t ◮ Evolution rules X α u , v w u , v x t − 1 θ v x t s , v − � V � (1 − x t ≥ s , v ) s , u u ∈ U s :( u , v ) ∈ A X α u , v w u , v x t − 1 ( θ v − ǫ )(1 − x t s , v ) + � V � x t ≤ s , u s , v u ∈ U s :( u , v ) ∈ A

Conclusion on the modelling approach Static modelling of a dynamic system A framework for reconstructing regulatory networks: ◮ of different biological organisms ◮ with different dynamics Drawbacks: ◮ loose of efficiency ◮ might require to introduce new elements Perspectives: ◮ automatization of the reformulations ◮ study more complex qualitative models of GRN ◮ integrating different kind of knowledge (experimental, theoretical, . . . )

Automatic (re)formulation For the modelling part: E.g. 4 “virtual” constraints to express the fixed point (should have been generated!) For the simplification part: Name Nonlinear feasible set Linear feasible set PowBin ( x 1 , x 2 ) ∈ { 0 , 1 } × R : x 2 = x n ( x 1 , x 2 ) ∈ { 0 , 1 } × R : x 2 = x 1 1 exact ( x , x n +1 ) ∈ { 0 , 1 } n × [0 , 1] : ( x , x n +1 ) ∈ { 0 , 1 } n × R : x n +1 = ProdBin x n +1 ≤ x i ∀ i ≤ n Q x i exact x n +1 ≥ 1 − n + P x i i ≤ n i ≤ n 2 ] 2 : ( x 1 , x 2 , x 3 ) ∈ { 0 , 1 } × [ x L 2 , x U x 3 ≤ x U 2 x 1 ProdBin- ( x 1 , x 2 , x 3 ) ∈ { 0 , 1 } × [ x L 2 , x U 2 ] × R : x 3 ≥ x L Cont 2 x 1 x 3 = x 1 x 2 exact x 3 ≤ x 2 + x L 2 (1 − x 1 ) x 3 ≥ x 2 − x U 2 (1 − x 1 ) Leads to Term Rewriting Systems (TRS) properties: ◮ termination ◮ confluence ◮ optimality?

Mathematical programming techniques applied to biology Fabien - PowerPoint PPT Presentation

Mathematical programming techniques applied to biology Fabien Tarissan 1 Leo Liberti 2 Camilo La Rota 3 1 ISC-PIF (Paris, France) 2 Ecole Polytechnique (Paris, France) 3 IXXI (Lyon, France) October 31, 2008 Context of work Pre-simulation tool

Basics of Molecular biology Molecular biology is the study of biology at molecular level.

2019-20 DNA Biology New Products RNA Biology PROTEIN Biology MOLECULAR Biology Plant DNA

Mathematical Modeling and Biology Bo Deng Introduction Examples of Models Bo Deng Consistency

Computational and Mathematical Biology Computational and Mathematical Biology in the Genomics

MATHEMATICS 1 CONTENTS Mathematical programming Linear programming The LP-problem Old exam

Applied Population Biology Brook Milligan Department of Biology New Mexico State University Las

Introduction to Fetal Medicine: Genetics and Embryology Question: What do cancer biology,

connections between cs and biology computing science and biology (1) biology is the science

Standards Synthetic Biology uses new techniques combining biology and engineering to make new or

1/88 Presentation: Advanced Techniques 2/88 Presentation: Advanced Techniques 3/88

Intraday Techniques Intraday Techniques Intraday Techniques Intraday Techniques Combining

Mathematical modeling in biology. D. Salort, LBCQ, Sorbonne University, Paris 03-07 september

Synthetic Biology Considerations in Synthetic Biology Considerations in Synthetic Biology

Biology Majors Information Session Biology Advising Center NHB 2.606 Biology Advising Center

Deep Computing in Biology Challenges and Progress Ajay K. Royyuru Computational Biology Center

Principles of Conservation Biology Biology 462 Brook Milligan Department of Biology New Mexico

INFORMATION VISUALIZATION Alvitta Ottley Washington University in St. Louis Slide Credits:

The Coalescent Evolution backward in time Joachim Hermisson Mathematics and Biosciences Group

Lecture 4 Sequence alignment: how to discover similarities between biological sequences

Bioinformatics for High-Throughput Sequencing Misha Kapushesky St. Petersburg Russia 2010

Variable Dependencies & Q-resolution Friedrich Slivovsky & Stefan Szeider x 1 x 2

Recent Developments at the Brookhaven Source Development Laboratory Brian Sheehy National

17 th April 2012 About the Hallam Centre for Community Justice Evaluation, research, consultancy,

BALTIMORE CITY LEAD BEHAVIORAL HEALTH SYSTEM BALTIMORE Ensures access to a full range of

Mathematical programming techniques applied to biology Fabien - PowerPoint PPT Presentation

Mathematical programming techniques applied to biology Fabien Tarissan 1 Leo Liberti 2 Camilo La Rota 3 1 ISC-PIF (Paris, France) 2 Ecole Polytechnique (Paris, France) 3 IXXI (Lyon, France) October 31, 2008 Context of work Pre-simulation tool

Basics of Molecular biology Molecular biology is the study of biology at molecular level.

2019-20 DNA Biology New Products RNA Biology PROTEIN Biology MOLECULAR Biology Plant DNA

Mathematical Modeling and Biology Bo Deng Introduction Examples of Models Bo Deng Consistency

Computational and Mathematical Biology Computational and Mathematical Biology in the Genomics

MATHEMATICS 1 CONTENTS Mathematical programming Linear programming The LP-problem Old exam

Applied Population Biology Brook Milligan Department of Biology New Mexico State University Las

Introduction to Fetal Medicine: Genetics and Embryology Question: What do cancer biology,

connections between cs and biology computing science and biology (1) biology is the science

Standards Synthetic Biology uses new techniques combining biology and engineering to make new or

1/88 Presentation: Advanced Techniques 2/88 Presentation: Advanced Techniques 3/88

Intraday Techniques Intraday Techniques Intraday Techniques Intraday Techniques Combining

Mathematical modeling in biology. D. Salort, LBCQ, Sorbonne University, Paris 03-07 september

Synthetic Biology Considerations in Synthetic Biology Considerations in Synthetic Biology

Biology Majors Information Session Biology Advising Center NHB 2.606 Biology Advising Center

Deep Computing in Biology Challenges and Progress Ajay K. Royyuru Computational Biology Center

Principles of Conservation Biology Biology 462 Brook Milligan Department of Biology New Mexico

INFORMATION VISUALIZATION Alvitta Ottley Washington University in St. Louis Slide Credits:

The Coalescent Evolution backward in time Joachim Hermisson Mathematics and Biosciences Group

Lecture 4 Sequence alignment: how to discover similarities between biological sequences

Bioinformatics for High-Throughput Sequencing Misha Kapushesky St. Petersburg Russia 2010

Variable Dependencies &amp; Q-resolution Friedrich Slivovsky &amp; Stefan Szeider x 1 x 2

Recent Developments at the Brookhaven Source Development Laboratory Brian Sheehy National

17 th April 2012 About the Hallam Centre for Community Justice Evaluation, research, consultancy,

BALTIMORE CITY LEAD BEHAVIORAL HEALTH SYSTEM BALTIMORE Ensures access to a full range of

Variable Dependencies & Q-resolution Friedrich Slivovsky & Stefan Szeider x 1 x 2