Model Identification: A Key Challenge in Computational Systems Biology



slide-1
SLIDE 1

1

Model Identification: A Key Challenge in Computational Systems Biology

Eberhard O. Voit

Department of Biomedical Engineering
Georgia Institute of Technology and Emory University
Atlanta, Georgia

The 2nd International Symposium on Optimization and Systems Biology (OSB'08)
Lijiang, China, 31 October – 3 November 2008

slide-2
SLIDE 2

2

Overview

  • Systems Biology and Optimization
  • Choice of a Suitable Model
  • Bottom-up and Top-down Model Estimation
  • Technical Issues
  • Dynamic Flux Estimation
  • Open Problems

slide-3
SLIDE 3

3

Systems Biology

[Diagram: Biological System, Data, and Model, connected by the labels "measure", "explain", and "match"]

slide-4
SLIDE 4

4

Optimization

[Diagram: Biological System, Data, and Model, connected by the labels "measure", "explain", and "match", with the added labels "extrapolate", "manipulate", and "optimize"]

slide-5
SLIDE 5

5

Systems Biology and Optimization

[Diagram: Biological System, Data, and Model, connected by the labels "measure", "explain", and "match", with the added labels "extrapolate", "manipulate", and "optimize"]

slide-6
SLIDE 6

6

Systems Biology and Optimization

[Diagram: Biological System, Data, and Model, connected by the labels "measure", "explain", and "match", with the added labels "extrapolate", "manipulate", and "optimize"]

Focus today

slide-7
SLIDE 7

7

Application: Pathway Modeling

[Diagram: sources of information feeding into the model]

  • Model Structure: literature, KEGG, de novo experiments
  • “Local” Data: literature, BRENDA, de novo experiments (enzyme kinetics); local processes
  • “Global” Data: Internet, de novo experiments (microarrays, proteomics, mass spec, NMR, time series); the “inverse problem”
  • Uses of the model: understanding, extrapolation, manipulation, optimization

slide-8
SLIDE 8

8

Overview of Modeling Process

slide-9
SLIDE 9

9

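Formulation of a Dynamical Systems Model

[Diagram: variable $X_i$ with influx $V_i^+$ and efflux $V_i^-$]

$$\dot{X}_i = \frac{dX_i}{dt} = V_i^+ - V_i^-$$

$$V_i^+ = V_i^+(X_1, X_2, \ldots, X_n, X_{n+1}, \ldots, X_{n+m})$$

The variables $X_1, \ldots, X_n$ are inside the system; $X_{n+1}, \ldots, X_{n+m}$ are outside. The functions $V_i^+$ and $V_i^-$ are complicated.

Big Problem: Where do we get functions from?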

slide-10
SLIDE 10

10

Sources of Functions for Complex Systems Models

  • Physics: Functions come from theory
  • Biology: No theory available
  • Solution 1: Educated guesses: growth functions
  • Solution 2: “Partial” theory: Enzyme kinetics
  • Solution 3: Generic approximation

slide-11
SLIDE 11

11

Why not Use “True” Functions?

[Figure, from Schultz (1994): reaction scheme for A + B and P + Q with the enzyme forms E (1), EA (2), EAB/EPQ (3), and EQ (4) and the rate constants k12, k21, k23, k32, k34, k43, k41, k14. The resulting rate law v is a ratio. Its numerator contains terms in (A)(B) and (P)(Q) with the coefficients num.1 and num.2; its denominator consists of a constant plus coefficient terms in (A), (B), (P), (Q), (A)(B), (A)(P), (B)(Q), (P)(Q), (A)(B)(P), and (B)(P)(Q).]

slide-12
SLIDE 12

12

Why not Use Linear Functions?

Example: Heartbeat modeled as stable limit cycle

  • System of linear differential equations: [plot of sin(t) against time]
  • System of non-linear differential equations: [plot of vdP(t) against time]

slide-13
SLIDE 13

13

Formulation of a Nonlinear Model for Complex Systems

Challenge:

  • Linear approximation unsuited
  • Infinitely many nonlinear functions

Solution with Potential:

  • Savageau (1969): Approximate $V_i^+$ and $V_i^-$ in a logarithmic coordinate system, using Taylor theory.
  • Result: Canonical Modeling; Biochemical Systems Theory.

$$\dot{X}_i = \frac{dX_i}{dt} = V_i^+ - V_i^-$$
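The approximation step can be sketched numerically. In logarithmic coordinates, a first-order Taylor expansion of a flux V(X) around an operating point X0 gives a power law V ≈ α X^g, where the kinetic order g is the slope of ln V against ln X at X0 and the rate constant α is chosen so that both functions agree at X0. The sketch below is only an illustration: the Michaelis-Menten rate law, its parameter values, and all function names are assumptions and do not come from the slides.

```python
import math

def michaelis_menten(x, vmax=10.0, km=2.0):
    """Illustrative rate law (an assumption, not taken from the slides)."""
    return vmax * x / (km + x)

def power_law_approximation(rate, x0, rel_step=1e-6):
    """Return (alpha, g) of the power law alpha * x**g that matches
    `rate` in value and in log-log slope at the operating point x0."""
    # Kinetic order: slope of ln(rate) versus ln(x), by central difference.
    log_x0 = math.log(x0)
    upper = math.log(rate(math.exp(log_x0 + rel_step)))
    lower = math.log(rate(math.exp(log_x0 - rel_step)))
    g = (upper - lower) / (2.0 * rel_step)
    # Rate constant: chosen so that both functions agree at x0.
    alpha = rate(x0) / x0 ** g
    return alpha, g

# Operating point at x0 = km, where the rate is half-saturated.
alpha, g = power_law_approximation(michaelis_menten, x0=2.0)
```

For this particular rate law the kinetic order has the closed form g = Km / (Km + X0), so at X0 = Km the sketch should return a value of g close to 0.5.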

slide-14
SLIDE 14

14

Example

[Figure: Adenine Excretion as a Function of Plasma Adenine Concentration. Two panels: Excretion of Adenine Metabolites against Concentration of Plasma Adenine, and Log of Excretion of Adenine Metabolites against Log of Concentration of Plasma Adenine.]

slide-15
SLIDE 15

15

Result: S-system

$$\dot{X}_i = \alpha_i X_1^{g_{i1}} X_2^{g_{i2}} \cdots X_{n+m}^{g_{i,n+m}} - \beta_i X_1^{h_{i1}} X_2^{h_{i2}} \cdots X_{n+m}^{h_{i,n+m}}$$

Each term is represented as a product of power-functions.

Important: Each term contains exactly those variables that have a direct effect; others have exponents of 0 and drop out.

α’s and β’s are rate constants, g’s and h’s kinetic orders.
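As a minimal sketch, the S-system right-hand side can be written as a short function. The function name, the argument layout (nested lists of kinetic orders), and the two-variable example are assumptions made for illustration; only the form of the equation is taken from the slide.

```python
def s_system_rhs(x, alpha, g, beta, h):
    """Right-hand side of an S-system.

    x     : current values of all variables (all positive)
    alpha : production rate constants, one per dependent variable
    g     : g[i][j] is the kinetic order of X_j in the production of X_i
    beta  : degradation rate constants, one per dependent variable
    h     : h[i][j] is the kinetic order of X_j in the degradation of X_i
    """
    def power_term(rate_constant, orders):
        value = rate_constant
        for x_j, order in zip(x, orders):
            if order != 0.0:  # exponent 0: the variable drops out
                value *= x_j ** order
        return value

    return [power_term(alpha[i], g[i]) - power_term(beta[i], h[i])
            for i in range(len(alpha))]

# Hypothetical example: X_1 is produced at a constant rate and converted
# into X_2, which is degraded. All numbers are made up.
alpha = [2.0, 1.0]
beta = [1.0, 0.5]
g = [[0.0, 0.0], [0.5, 0.0]]
h = [[0.5, 0.0], [0.0, 1.0]]
derivatives = s_system_rhs([4.0, 2.0], alpha, g, beta, h)
```

Storing the kinetic orders as matrices mirrors the point made on the slide: a zero entry means that the variable has no direct effect on that term.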

slide-16
SLIDE 16

16

Mapping: Structure and Parameters

[Diagram: pathway with X1, X2, X3, X4, shown twice]

$$\alpha_2 X_1^{g_{21}}$$

$$\alpha_2 X_1^{g_{21}} X_4^{g_{41}}$$

Two cases: $g_{41} < 0$ and $g_{41} = 0$

slide-17
SLIDE 17

17

Alternative Formulations Within BST

S-system Form:

$$\dot{X}_i = \alpha_i X_1^{g_{i1}} X_2^{g_{i2}} \cdots X_{n+m}^{g_{i,n+m}} - \beta_i X_1^{h_{i1}} X_2^{h_{i2}} \cdots X_{n+m}^{h_{i,n+m}}$$

[Diagram: $X_i$ with influxes $V_{i1}^+, \ldots, V_{i,p}^+$ and effluxes $V_{i1}^-, \ldots, V_{i,q}^-$]

$$\dot{X}_i = \frac{dX_i}{dt} = \sum_j V_{ij}^+ - \sum_j V_{ij}^-$$

slide-18
SLIDE 18

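18

Alternative Formulations

S-system Form:

$$\dot{X}_i = \alpha_i X_1^{g_{i1}} X_2^{g_{i2}} \cdots X_{n+m}^{g_{i,n+m}} - \beta_i X_1^{h_{i1}} X_2^{h_{i2}} \cdots X_{n+m}^{h_{i,n+m}}$$

[Diagram: $X_i$ with influxes $V_{i1}^+, \ldots, V_{i,p}^+$ and effluxes $V_{i1}^-, \ldots, V_{i,q}^-$]

$$\dot{X}_i = \frac{dX_i}{dt} = \sum_j V_{ij}^+ - \sum_j V_{ij}^-$$

Generalized Mass Action Form:

$$\dot{X}_i = \sum_k \pm \gamma_{ik} \prod_j X_j^{f_{ijk}}$$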

slide-19
SLIDE 19

19

Example of Canonical Model Design

[Diagram: pathway with X0, X1, X2, X3, X4]

GMA / S: $X_0 = 1.1$ (constant)

GMA: $\dot{X}_1 = 20 X_0 X_3^{-0.9} - 8 X_1^{0.75} - 12 X_1^{0.5} X_4^{-1}$, $X_1(t_0) = 0.8$

S-system: $\dot{X}_1 = 20 X_0 X_3^{-0.9} - 19 X_1^{0.64} X_4^{-0.45}$, $X_1(t_0) = 0.8$

GMA / S: $\dot{X}_2 = 8 X_1^{0.75} - 5 X_2^{0.3}$, $X_2(t_0) = 1$

GMA / S: $\dot{X}_3 = 5 X_2^{0.3} - 5 X_3^{0.5} X_4^{0.2}$, $X_3(t_0) = 0.5$

GMA / S: $\dot{X}_4 = 12 X_1^{0.5} X_4^{-1} - 4 X_4^{0.8}$, $X_4(t_0) = 6$

slide-20
SLIDE 20

20

Example of Canonical Model Design

[Diagram: pathway with X0, X1, X2, X3, X4]

GMA: $\dot{X}_1 = 20 X_0 X_3^{-0.9} - 8 X_1^{0.75} - 12 X_1^{0.5} X_4^{-1}$, $X_1(t_0) = 0.8$

S-system: $\dot{X}_1 = 20 X_0 X_3^{-0.9} - 19 X_1^{0.64} X_4^{-0.45}$, $X_1(t_0) = 0.8$

[Plots: time courses of X1, X2, X3, X4 against time, one panel for the S-system and one for the GMA system]
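Time courses like the ones plotted on this slide could be generated along the following lines. This is a sketch under stated assumptions: the coefficients and exponents are a reading of the garbled slide equations (in particular the factor X0 and the exponent -1 on X4 are inferred), and the fixed-step Runge-Kutta integrator, the step size, and all function names are choices made here, not something the slides specify.

```python
X0 = 1.1  # constant input, as read from the slide

def example_gma_rhs(x):
    """GMA form: every flux is its own power-law term."""
    x1, x2, x3, x4 = x
    return [
        20 * X0 * x3 ** -0.9 - 8 * x1 ** 0.75 - 12 * x1 ** 0.5 * x4 ** -1,
        8 * x1 ** 0.75 - 5 * x2 ** 0.3,
        5 * x2 ** 0.3 - 5 * x3 ** 0.5 * x4 ** 0.2,
        12 * x1 ** 0.5 * x4 ** -1 - 4 * x4 ** 0.8,
    ]

def example_s_system_rhs(x):
    """S-system form: the two effluxes of X1 are merged into one term."""
    x1, _, x3, x4 = x
    rhs = example_gma_rhs(x)  # equations for X2, X3, X4 are identical
    rhs[0] = 20 * X0 * x3 ** -0.9 - 19 * x1 ** 0.64 * x4 ** -0.45
    return rhs

def integrate_rk4(rhs, x, dt, n_steps):
    """Fixed-step classical Runge-Kutta; returns the list of states."""
    trajectory = [list(x)]
    for _ in range(n_steps):
        k1 = rhs(x)
        k2 = rhs([xi + 0.5 * dt * k for xi, k in zip(x, k1)])
        k3 = rhs([xi + 0.5 * dt * k for xi, k in zip(x, k2)])
        k4 = rhs([xi + dt * k for xi, k in zip(x, k3)])
        x = [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        trajectory.append(x)
    return trajectory

initial = [0.8, 1.0, 0.5, 6.0]  # X1(t0), X2(t0), X3(t0), X4(t0)
gma_run = integrate_rk4(example_gma_rhs, initial, dt=0.001, n_steps=10000)
s_run = integrate_rk4(example_s_system_rhs, initial, dt=0.001, n_steps=10000)
```

Only the equation for X1 differs between the two forms, so any difference between `gma_run` and `s_run` comes from merging the two effluxes of X1 into a single power-law term.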

slide-21
SLIDE 21

21

Sphingolipid pathway (purely metabolic)

  • 1. Many metabolites
  • 2. Many reactions
  • 3. Many stimuli and agents regulate several enzymes of lipid metabolism
  • 4. Some in vivo experiments

Doable Size

Alvarez, Sims, Hannun, Voit: JTB, 2004; Nature, 2005

slide-22
SLIDE 22

22

Applications

  • Pathways: purines, glycolysis, citric acid, TCA, red blood cell, trehalose, sphingolipids, ...
  • Genes: circuitry, regulation, ...
  • Genome: explain expression patterns upon stimulus
  • Growth, immunology, pharmaceutical science, forestry, ...
  • Metabolic engineering: optimize yield in microbial pathways
  • Dynamic labeling analyses possible
  • Math: recasting, function classification, bifurcation analysis, ...
  • Statistics: S-system representation, S-distribution, trends; applied to seafood safety, marine mammals, health economics

slide-23
SLIDE 23

23

Advantages of Canonical Models

  • Prescribed model design: Rules for translating diagrams into equations; rules can be automated
  • Direct interpretability of parameters and other features
  • One-to-one relationship between parameters and model structure simplifies parameter estimation and model identification
  • Simplified steady-state computations (for S-systems), including steady-state equations, stability, sensitivities, gains
  • Simplified optimization under steady-state conditions
  • Efficient numerical solutions and time-dependent sensitivities
  • In some sense minimal bias of model choice and minimal model size; easy scalability

slide-24
SLIDE 24

24

Flow Chart of Systems Identification Strategy

[Flow chart with the elements $V_i = R_i(S_i, M_i)$, the parameters $p_1, p_2, p_3, \ldots$, and $dX_j/dt = f_k(X_j, V_i)$]

Voit, Drug Discovery Today, 2004

slide-25
SLIDE 25

25

Problems with Traditional System Identification Strategy

  • Lots of time-consuming work and effort!
  • Very many a priori assumptions
  • What’s important, what isn’t?
  • Topology
  • Regulation
  • Functional forms
  • Seldom consistent experiments
  • Mixing and matching of organisms, strains, conditions
  • Paucity of data for comparisons with documented responses
  • Iterative nature of process time consuming

slide-26
SLIDE 26

26

Alternative to Traditional Modeling: Top-Down Modeling

  • Use information at the “global” level (in vivo time series data) to deduce (per model) structure and regulation at the “local” level (connectivity, signals, …)

slide-27
SLIDE 27

27

Inverse Problems: Sandbox Example

[Diagram: time series plot (concentration against time for X1, X2, X3, X4), VBoMT, and pathway diagram with X0, X1, X2, X3, X4]

VBoMT: Voit’s Box of Magic Tricks

slide-28
SLIDE 28

28

Top-Down “Inverse” Modeling

$$\dot{X} = \alpha \prod X^{g} - \beta \prod X^{h}$$

$$\dot{Y} = \alpha' \prod Y^{g'} - \beta' \prod Y^{h'}$$

$$\dot{Z} = \alpha'' \prod Z^{g''} - \beta'' \prod Z^{h''}$$

BST

slide-29
SLIDE 29

29

Key Step: Parameter Estimation from Time Series Data

  • According to computer scientists: trivial, solved.
  • Many methods
  • Most work sometimes
  • None works always
  • Estimation remains a frustrating topic!
  • Example: Kikuchi et al. 2003
slide-30
SLIDE 30

30

Recent Methods for Parameter Estimation in BST: ~100 papers; no method really good

Flux-Based Estimation

slide-31
SLIDE 31

31

Challenges of Inverse Modeling

slide-32
SLIDE 32

32

Challenges of Inverse Modeling

slide-33
SLIDE 33

33

Challenges of Inverse Modeling

  • Overly noisy data
  • Missing data points
  • Uncertainties about the measurements
  • Non-informative
  • Ill-posed data matrix

slide-34
SLIDE 34

34

Challenges of Inverse Modeling

  • Overly noisy data
  • Missing data points
  • Uncertainties about the measurements
  • Non-informative
  • Ill-posed data matrix
  • Model selection criteria: data dynamics capture ability, mathematical simplicity, tractability, results interpretability
  • Infinite variety of formulations

slide-35
SLIDE 35

35

Challenges of Inverse Modeling

  • Overly noisy data
  • Missing data points
  • Uncertainties about the measurements
  • Non-informative
  • Ill-posed data matrix
  • Model selection criteria: data dynamics capture ability, mathematical simplicity, tractability, results interpretability
  • Infinite variety of formulations
  • Lacking convergence or convergence to local minima
  • Time consuming for integration of differential equations
  • Computational capacity
  • Slow convergence

slide-36
SLIDE 36

36

Challenges of Inverse Modeling

  • Overly noisy data
  • Missing data points
  • Uncertainties about the measurements
  • Non-informative
  • Ill-posed data matrix
  • Model selection criteria: data dynamics capture ability, mathematical simplicity, tractability, results interpretability
  • Infinite variety of formulations
  • Lacking convergence or convergence to local minima
  • Time consuming for integration of differential equations
  • Computational capacity
  • Slow convergence
  • Distinctly different yet equivalent solutions
  • Non-equivalent solutions with similar error
  • Error compensation

slide-37
SLIDE 37

37

Old Trick: Slope Estimation
(at least as old as Voit & Savageau, 1982)

$$S(t_k) \approx \dot{X}\big|_{t_k} = f(X(t_k))$$

$$S_i(t_j) \approx f_i(X_1(t_j), X_2(t_j), \ldots, X_n(t_j); p_{i1}, \ldots, p_{iM_i})$$

S-System:

$$f_i \approx \alpha_i X_1^{g_{i1}} X_2^{g_{i2}} \cdots X_n^{g_{in}} - \beta_i X_1^{h_{i1}} X_2^{h_{i2}} \cdots X_n^{h_{in}}$$

$$S_i \approx \alpha_i X_1^{g_{i1}} X_2^{g_{i2}} \cdots X_n^{g_{in}} - \beta_i X_1^{h_{i1}} X_2^{h_{i2}} \cdots X_n^{h_{in}} \quad \text{at } t_k$$
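A minimal sketch of the slope-estimation idea follows. It replaces the derivative at each measurement time by a finite-difference slope, so that each differential equation turns into one algebraic equation per time point. The finite-difference scheme and both function names are assumptions made here; the slides do not say how the slopes are obtained, and noisy data would normally be smoothed first.

```python
def estimate_slopes(times, values):
    """Finite-difference estimates S(t_k) of dX/dt at every time point."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two matching time/value pairs")
    slopes = []
    for k in range(n):
        lo = max(k - 1, 0)       # one-sided difference at the first point
        hi = min(k + 1, n - 1)   # one-sided difference at the last point
        slopes.append((values[hi] - values[lo]) / (times[hi] - times[lo]))
    return slopes

def s_system_residual(slope, x, alpha, g, beta, h):
    """Mismatch of one decoupled S-system equation at one time point:
    slope - (alpha * prod(x_j**g_j) - beta * prod(x_j**h_j))."""
    production, degradation = alpha, beta
    for x_j, g_j, h_j in zip(x, g, h):
        production *= x_j ** g_j
        degradation *= x_j ** h_j
    return slope - (production - degradation)

# Made-up measurements of one variable at five time points.
slopes = estimate_slopes([0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 4.0, 9.0, 16.0])
```

Because the residual only involves measured values and estimated slopes, it can be evaluated for each equation and each time point separately, without integrating the differential equations.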

slide-38
SLIDE 38

38

Toward a New Trick

$$S_i \approx \alpha_i X_1^{g_{i1}} X_2^{g_{i2}} \cdots X_n^{g_{in}} - \beta_i X_1^{h_{i1}} X_2^{h_{i2}} \cdots X_n^{h_{in}} \quad \text{at } t_k$$

  • $S_i$: estimated from data
  • $X_1, \ldots, X_n$: measured
  • Guess $\beta_i$ and $h_{ij}$: terms become numbers

slide-39
SLIDE 39

39

New Trick: Alternating Regression

$$S_i \approx \alpha_i X_1^{g_{i1}} X_2^{g_{i2}} \cdots X_n^{g_{in}} - \beta_i X_1^{h_{i1}} X_2^{h_{i2}} \cdots X_n^{h_{in}} \quad \text{at } t_k$$

$$S_i + \beta_i X_1^{h_{i1}} X_2^{h_{i2}} \cdots X_n^{h_{in}} = \alpha_i X_1^{g_{i1}} X_2^{g_{i2}} \cdots X_n^{g_{in}} \quad \text{at } t_k$$

$$\text{Number} = \alpha_i X_1^{g_{i1}} X_2^{g_{i2}} \cdots X_n^{g_{in}} \quad \text{at } t_k$$

$$\log(\text{Number}) = \log(\alpha_i) + \sum_j g_{ij} \log(X_j) \quad \text{for all } t_k$$

Linear regression yields $\hat{\alpha}_i$ and $\hat{g}_{ij}$

slide-40
SLIDE 40

40

Alternating Regression (cont’d)

$$S_i \approx \alpha_i X_1^{g_{i1}} X_2^{g_{i2}} \cdots X_n^{g_{in}} - \beta_i X_1^{h_{i1}} X_2^{h_{i2}} \cdots X_n^{h_{in}} \quad \text{at } t_k$$

  • Use $\hat{\alpha}_i$ and $\hat{g}_{ij}$ and compute the “α-term”.
  • Merge the numerical value of the α-term with $S_i$ and compute $\hat{\beta}_i$ and $\hat{h}_{ij}$ per linear regression for all time points.
  • Iterate between α- and β-terms until convergence.
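The alternation can be sketched for the simplest possible case: one equation in which both terms depend on a single variable, slope = α x^g − β x^h. That restriction, the synthetic data, the starting guess, and all function names are assumptions made for illustration; the general method regresses on several variables at once. The sketch raises an error when a term becomes non-positive, since its logarithm is then undefined.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_power_term(x_values, term_values):
    """Fit term = c * x**k by linear regression in log space."""
    if min(term_values) <= 0.0:
        raise ValueError("term must be positive to take logarithms")
    log_c, k = fit_line([math.log(x) for x in x_values],
                        [math.log(t) for t in term_values])
    return math.exp(log_c), k

def alternating_regression(x_values, slopes, beta, h, n_iter=50):
    """Alternate between the alpha-term and the beta-term of
    slope = alpha * x**g - beta * x**h for a single variable x."""
    for _ in range(n_iter):
        # Phase 1: beta-term fixed, regress log(slope + beta-term).
        production = [s + beta * x ** h for x, s in zip(x_values, slopes)]
        alpha, g = fit_power_term(x_values, production)
        # Phase 2: alpha-term fixed, regress log(alpha-term - slope).
        degradation = [alpha * x ** g - s for x, s in zip(x_values, slopes)]
        beta, h = fit_power_term(x_values, degradation)
    return alpha, g, beta, h

# Synthetic, noise-free data from alpha=2, g=0.5, beta=1, h=1 (made up).
x_data = [0.5, 1.0, 2.0, 3.0, 4.0]
slope_data = [2.0 * x ** 0.5 - 1.0 * x ** 1.0 for x in x_data]
# One pass from a nearby guess; in practice the passes are repeated.
estimate = alternating_regression(x_data, slope_data, beta=1.2, h=0.9, n_iter=1)
```

Each phase is an ordinary linear regression, which is why a single pass is cheap; whether repeated passes settle on the true parameters depends on the data and the starting guess.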

slide-41
SLIDE 41

41

Alternating Regression (cont’d)

Results: Extremely fast, if it converges. Convergence issue very complex.

slide-42
SLIDE 42

42

Problems with Traditional Methods

  • Time to (global) convergence
  • Problems with collinear data
  • Problems with models permitting redundancies
  • Problems with compensation of error among terms

slide-43
SLIDE 43

43

Problems with Traditional Methods: Extrapolation

[Diagram: pathway with X0, X1, X2, X3, X4]

Former model; here using GMA form

  • a: Bad parameters, but good fits because of error compensation [plot of X1, X2, X3, X4 against time, time axis to 10]
  • b: Problem with the “misestimated” system during extrapolation [plot of X1, X2, X3, X4 against time, time axis to 50]

slide-44
SLIDE 44

44

Example: Regulation of Glycolysis in Lactococcus lactis

Bacterium involved in dairy, wine, bread, pickle production. Relatively simple organization. Here: study glucose regulation.

[Image: www.hhmi.org/bulletin/winter2005/images/bacteria5.jpg] Bacteria found in yogurt and cheese: Lactococcus lactis (top), Lactobacillus bulgaricus (blue), Streptococcus thermophilus (orange), Bifidobacterium spec. (magenta).

slide-45
SLIDE 45

45

Goals of Modeling

  • Understand pathway; design, operation
  • Allow extrapolation to new situations
  • Allow prediction for manipulation
  • Maximize yield of main product
  • Optimize yield of secondary products
  • Eventually develop a cell-wide model
slide-46
SLIDE 46

46

Experimental Time Series Data

E.O. Voit, J.S. Almeida, S. Marino, R. Lall, G. Goel, A.R. Neves, and H. Santos: IEE Proc. Systems Biol. 2006

slide-47
SLIDE 47

47

Other Information

slide-48
SLIDE 48

48

Lactococcus Data

  • Had modeled these data before
  • First, difficult to find any solutions
  • Combination of methods led to good fit
  • Later, many rather different solutions
  • Question: Is any of these solutions optimal?
  • Question: Is the BST model appropriate?
  • Problems with extrapolation

slide-49
SLIDE 49

49

Dynamic Flux Estimation (DFE)

  • Inspired by Stoichiometric and Flux Balance Analysis
  • Extended to dynamic time courses
  • Study flux balance at each time point
  • Change in variable @ t = all influxes @ t − all effluxes @ t
  • Linear system; solve as far as possible
  • Result: values of each flux @ t
  • Represent fluxes with appropriate models
  • G. Goel et al., Bioinformatics 2008
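The two phases listed above can be sketched for a deliberately simple case. The sketch assumes a hypothetical unbranched chain (input flux v0 into X1, then v1 into X2, and so on) with a known input flux, so that the flux balance S_i = v_(i-1) − v_i can be solved one flux at a time; it then represents one flux as a power law by regression in log space. The pathway, the numbers, and the function names are illustrative assumptions, not the system analyzed in the talk.

```python
import math

def fluxes_linear_chain(slopes, input_flux):
    """Solve the flux balance of an unbranched chain at every time point.

    slopes[k]     : [S_1, ..., S_n], estimated slopes at time point k
    input_flux[k] : known input flux v_0 at time point k
    Returns, per time point, [v_1, ..., v_n] with S_i = v_(i-1) - v_i.
    """
    result = []
    for s_k, v_in in zip(slopes, input_flux):
        fluxes = []
        upstream = v_in
        for s in s_k:
            upstream = upstream - s  # v_i = v_(i-1) - S_i
            fluxes.append(upstream)
        result.append(fluxes)
    return result

def fit_power_law_flux(x_values, flux_values):
    """Represent a flux as gamma * x**f via least squares in log space."""
    log_x = [math.log(x) for x in x_values]
    log_v = [math.log(v) for v in flux_values]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_v = sum(log_v) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxv = sum((a - mean_x) * (b - mean_v) for a, b in zip(log_x, log_v))
    f = sxv / sxx
    gamma = math.exp(mean_v - f * mean_x)
    return gamma, f

# Made-up slopes for two variables at two time points, input flux 3.0.
flux_values = fluxes_linear_chain([[1.0, 0.5], [0.5, 0.25]], [3.0, 3.0])
gamma, f = fit_power_law_flux([1.0, 2.0, 4.0], [3.0, 6.0, 12.0])
```

In a branched pathway the balance equations generally have fewer equations than unknown fluxes, which is the rank problem raised under Open Problems; the sequential solve above only works because the chain has no branches.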
slide-50
SLIDE 50

50

Dynamic Flux Estimation (DFE)

slide-51
SLIDE 51

51

Dynamic Flux Estimation (DFE)

slide-52
SLIDE 52

52

Dynamic Flux Estimation (DFE)

slide-53
SLIDE 53

53

Dynamic Flux Estimation (DFE)

slide-54
SLIDE 54

54

Dynamic Flux Estimation (DFE)

slide-55
SLIDE 55

55

Open Problems

  • Smoothing and Mass conservation: Noise in the data leads to loss or gain of mass
  • Underdetermined Flux Systems: Linear system of fluxes often not of full rank; augment DFE with other methods (e.g., AR or bottom-up estimation)
  • Characterization of Redundancies: Data collinear or non-informative (pooling?); model allows transformation groups (Lie analysis?)

slide-56
SLIDE 56

56

Overriding Challenge

Speed and Convenience

  • Algorithms for parameter estimation from time series must become much faster and more robust
  • They must run reliably and “semi-foolproof” on ordinary PCs without the need for expensive software
slide-57
SLIDE 57

57

Workflow

slide-58
SLIDE 58

58

Summary

Efficiently dealing with inverse problems presents new modeling opportunities:

  • 1. Time series data are coming! They contain a lot of implicit information that must be extracted.
  • 2. Technical challenges abound. Important: Efficient, robust, and fast solutions on PCs needed.
  • 3. Important overlooked issue: Error compensation; extrapolation becomes unreliable. DFE promising.

slide-59
SLIDE 59

59

Acknowledgements

The Current Crew:

Funding: NIH, NSF, DOE, Woodruff Foundation, Georgia Research Alliance

Information: www.bst.bme.gatech.edu

slide-60
SLIDE 60

60

Further Information

www.bst.bme.gatech.edu