Toward comprehensive whole-cell models

SLIDE 1

Toward comprehensive whole-cell models

Jonathan Karr
KarrLab.org, karr@mssm.edu
July 6, 2017

  • Genomic and biochemical data
  • Pathway submodels
  • Rule-based modeling
  • Multi-algorithmic modeling

SLIDE 2

Outline

Introduction to whole-cell (WC) modeling

  • What is a WC model?
  • Motivation
  • Challenges
  • Feasibility

Methodology

  • Data aggregation
  • Data organization
  • Hybrid simulation

New tools to accelerate WC modeling

  • Data aggregation
  • Model representation
  • Parallel simulation
SLIDE 4

Features of whole-cell (WC) models

Karr et al., 2015

  • Whole organism
  • Whole genome, including each gene
  • Whole cell cycle
  • Species-specific
  • Dynamic
  • Stochastic
  • Accurate
  • Mechanistic

SLIDE 6

Genome design requires WC models

  • Tissue engineering
  • Biosensors
  • Biofactories

SLIDE 7

Example: drug biosynthesis

SLIDE 14

Personalized medicine requires WC models

SLIDE 16

WC models are a grand challenge

SLIDE 17

Challenge: multiple time and length scales

(Figure: replication, growth, transcription, and metabolism plotted on time and length axes)

SLIDE 18

Challenge: heterogeneous data

  • Single-cell variation: microscopy
  • Transcription: RNA-seq
  • Protein expression: mass spectrometry, Western blot

SLIDE 19

Challenge: sparse data

SLIDE 20

Challenge: heterogeneous granularity

(Figure: metabolic, signaling, and transcriptional regulatory networks)

SLIDE 22

WC modeling is now feasible

  • Genomic and biochemical data
  • Pathway submodels
  • Rule-based modeling
  • Multi-algorithmic modeling

SLIDE 23

Extensive molecular data is available

SLIDE 24

Numerous predictors are available

  • miRNA targets: TargetScan
  • Operons: OperonPredictor
  • Protein half-lives: N-end rule
  • Protein localization: PSORT
  • Signal sequences: SignalP
  • Transcription start site: Promoter
SLIDE 25

Numerous databases are available

SLIDE 26

Model design tools are available

MetaFlux

SLIDE 27

Model languages are available

SLIDE 28

Numerous pathway models are available

SLIDE 29

Numerous simulators are available

  • Uptake: FBA; composition
  • Metabolism: FBA; composition
  • Transcription: stochastic binding; gene expression
  • Translation: stochastic binding; gene expression
  • Replication: chemical kinetics; DNA sequence

SLIDE 30

Testing tools are available

PRISM

SLIDE 31

Numerous other tools

  • Automated model construction
  • Model refinement
  • Parallel simulation
  • Calibration
  • Analysis and visualization
SLIDE 33

Pathway modeling workflow

  • 1. Choose a system to model
  • 2. Determine the scope and granularity of the model
  • 3. Determine the mathematical representation of the model
  • 4. Reconstruct the species, reactions, rate laws, and rate parameters from the literature
  • 5. Debug and calibrate the model by comparison to data
  • 6. Test the model by comparison to independent data
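
Steps 5 and 6 amount to fitting parameters against one dataset and then scoring the fitted model against data it never saw. The sketch below illustrates this on a hypothetical one-parameter model (first-order mRNA decay) with synthetic data; the model, the grid search, and every name in it are illustrative assumptions rather than part of the workflow described in the talk.

```python
import math

def simulate(k, m0, times):
    """Toy pathway model: first-order mRNA decay, m(t) = m0 * exp(-k * t)."""
    return [m0 * math.exp(-k * t) for t in times]

def sse(predicted, observed):
    """Sum of squared errors between model predictions and data."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))

def calibrate(times, observed, m0, candidates):
    """Step 5: pick the candidate rate constant that best fits the training data."""
    return min(candidates, key=lambda k: sse(simulate(k, m0, times), observed))

# Synthetic stand-in data generated with k = 0.2 per minute (not real measurements).
m0 = 100.0
train_times = [0, 1, 2, 4, 8]
train_data = simulate(0.2, m0, train_times)
test_times = [3, 6, 10]  # held out for step 6
test_data = simulate(0.2, m0, test_times)

grid = [i / 100 for i in range(1, 101)]  # candidate k values from 0.01 to 1.00
k_fit = calibrate(train_times, train_data, m0, grid)
test_error = sse(simulate(k_fit, m0, test_times), test_data)  # step 6: independent check
print(k_fit, test_error)
```

A grid search is used only because it is transparent; any optimizer could take its place in step 5.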
SLIDE 34

Predictive modeling methodologies

(Figure: methodologies arranged by level of detail and scope, shown alongside the WC model)

  • ODE: Shuler, 1970s
  • FBA: Palsson, 1990s
  • Boolean: Bolouri, 2000s
  • Gillespie: Luthey-Schulten, 2011
  • PDE

SLIDE 35

Scaling pathway modeling to whole cells

  • Aggregate more data
    – Accelerate data aggregation through automation
    – Organize input data using pathway/genome databases
  • Build models collaboratively using web-based tools
    – Define the semantic meaning of every model component
    – Track every assumption and data source
  • Describe models clearly
    – Explicitly describe the data used to build models
    – Describe models in terms of rules
  • Describe and simulate hybrid models
SLIDE 36

Scaling pathway modeling to whole cells

  • Genomics, bioinformatics ↔ Mechanistic modeling
  • Pathway/genome databases ↔ Model design tools
  • Polymers, sequences ↔ Rule-based modeling
  • Stochastic modeling ↔ Steady-state modeling (FBA)
  • Numerical simulation ↔ Big data analytics
  • Model design tools ↔ Collaboration tools

SLIDE 37

WC modeling workflow

SLIDE 38

Aggregate data

Fraser et al., 1995; Kühner et al., 2009; Lluch-Senar et al., 2013; Maier et al., 2013; Yus et al., 2012

  • Genome: DNA-seq
  • Epigenome: Meth-seq
  • Transcriptome: RNA-seq
  • Proteome: mass spectrometry
  • Metabolome: mass spectrometry

SLIDE 39

Karr et al., 2013

Organize input data

SLIDE 40

Design submodels

RNA polymerase states: free, bound, promoter bound, active

  • 1. Update RNA polymerase states
  • 2. Calculate promoter affinities
  • 3. Bind RNA polymerase
  • 4. Elongate and terminate transcripts

(Figure: DNA sequence and transcript; transcription factors HcrA, Spx, Fur, GntR, and LuxR and genes glpF, dnaJ, dnaK, gntR, trxB, and polC)
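
The four numbered steps can be read as one pass of a time-stepped loop. The following sketch assumes a hypothetical two-gene system with made-up affinities, gene lengths, and elongation rate, and it folds the slide's separate "bound" (nonspecific) state into "free" for brevity. It illustrates the step ordering only; it is not the transcription submodel of the published model.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Polymerase:
    state: str = "free"          # "free", "promoter_bound", or "active" in this sketch
    gene: Optional[str] = None   # gene whose promoter or body the polymerase occupies
    position: int = 0            # nucleotides transcribed so far

def update_states(pols: List[Polymerase]) -> None:
    """Step 1: promoter-bound polymerases initiate and become active."""
    for p in pols:
        if p.state == "promoter_bound":
            p.state, p.position = "active", 0

def promoter_affinities(genes: Dict[str, Tuple[float, int]],
                        fold_changes: Dict[str, float]) -> Dict[str, float]:
    """Step 2: basal promoter affinity scaled by a transcription-factor fold change."""
    return {g: basal * fold_changes.get(g, 1.0) for g, (basal, _length) in genes.items()}

def bind(pols: List[Polymerase], affinities: Dict[str, float], rng: random.Random) -> None:
    """Step 3: each free polymerase binds a promoter chosen in proportion to affinity."""
    names = list(affinities)
    weights = [affinities[n] for n in names]
    for p in pols:
        if p.state == "free":
            p.gene = rng.choices(names, weights=weights)[0]
            p.state = "promoter_bound"

def elongate(pols: List[Polymerase], genes: Dict[str, Tuple[float, int]],
             nt_per_step: int, transcripts: Dict[str, int]) -> None:
    """Step 4: advance active polymerases and release them at the end of the gene."""
    for p in pols:
        if p.state == "active":
            p.position += nt_per_step
            if p.position >= genes[p.gene][1]:
                transcripts[p.gene] = transcripts.get(p.gene, 0) + 1
                p.state, p.gene, p.position = "free", None, 0

# Hypothetical genes: (basal promoter affinity, length in nt); not real parameters.
genes = {"glpF": (1.0, 300), "dnaK": (3.0, 600)}
fold_changes = {"dnaK": 2.0}   # hypothetical activation of dnaK by a transcription factor
rng = random.Random(0)
pols = [Polymerase() for _ in range(10)]
transcripts: Dict[str, int] = {}
for _ in range(100):           # 100 time steps
    update_states(pols)
    affinities = promoter_affinities(genes, fold_changes)
    bind(pols, affinities, rng)
    elongate(pols, genes, nt_per_step=50, transcripts=transcripts)
print(transcripts)
```

Running the steps in a fixed order each time step keeps the submodel simple to reason about; polymerases that bind in step 3 only start elongating on the next pass through step 1.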

SLIDE 41

Design pathway submodels

  • Uptake: FBA; composition
  • Metabolism: FBA; composition
  • Transcription: stochastic events; gene expression
  • Translation: stochastic events; gene expression
  • Replication: chemical kinetics; DNA sequence

SLIDE 42

Combine submodels

Submodels
  • Uptake: FBA; composition
  • Metabolism: FBA; composition
  • Transcription: stochastic events; gene expression
  • Translation: stochastic events; gene expression
  • Replication: chemical kinetics; DNA sequence

States
  • Mass, shape
  • Metabolite, RNA, protein counts
  • Mammalian host
  • Transcript, polypeptide sequences
  • DNA polymerization, proteins, modifications
  • FtsZ ring

SLIDE 43

Concurrently integrate submodels

(Figure: in each 1 s time step, the uptake, metabolism, transcription, translation, and replication submodels run and then update the shared cell states)
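
One way to let independently formulated submodels share cell states within a time step is to split each shared resource among them, run each submodel on its share, and merge the resulting changes. The sketch below assumes proportional allocation and toy submodels (`Submodel`, `allocate`, and `step` are hypothetical names, and the stoichiometries are invented). It shows why counts cannot go negative when submodels run concurrently, not how any particular WC simulator implements integration.

```python
from typing import Dict, List

State = Dict[str, int]

class Submodel:
    """Hypothetical submodel with fixed stoichiometry and a maximum number of events per step."""

    def __init__(self, name: str, consumes: State, produces: State, max_events: int):
        self.name, self.consumes, self.produces, self.max_events = name, consumes, produces, max_events

    def request(self) -> State:
        return {s: c * self.max_events for s, c in self.consumes.items()}

    def run(self, allocation: State) -> State:
        # The number of events is limited by the scarcest allocated reactant.
        n = min(allocation.get(s, 0) // c for s, c in self.consumes.items())
        delta = {s: -c * n for s, c in self.consumes.items()}
        for s, c in self.produces.items():
            delta[s] = delta.get(s, 0) + c * n
        return delta

def allocate(state: State, requests: Dict[str, State]) -> Dict[str, State]:
    """Split each shared species among submodels in proportion to their requests."""
    allocations: Dict[str, State] = {name: {} for name in requests}
    for s in {s for req in requests.values() for s in req}:
        total = sum(req.get(s, 0) for req in requests.values())
        available = state.get(s, 0)
        for name, req in requests.items():
            want = req.get(s, 0)
            allocations[name][s] = want if total <= available else available * want // total
    return allocations

def step(state: State, submodels: List[Submodel]) -> State:
    """One time step: allocate, run every submodel on its share, then merge the changes."""
    allocations = allocate(state, {m.name: m.request() for m in submodels})
    new_state = dict(state)
    for m in submodels:
        for s, d in m.run(allocations[m.name]).items():
            new_state[s] = new_state.get(s, 0) + d
    return new_state

submodels = [
    Submodel("metabolism", {"glucose": 1}, {"ntp": 2, "aa": 2}, max_events=10),
    Submodel("transcription", {"ntp": 4}, {"rna": 1}, max_events=5),
    Submodel("translation", {"aa": 3}, {"protein": 1}, max_events=5),
]
state: State = {"glucose": 100, "ntp": 10, "aa": 10, "rna": 0, "protein": 0}
for _ in range(60):  # 60 steps of 1 s each
    state = step(state, submodels)
print(state)
```

Because every submodel reads the same snapshot and the allocations for a species never sum to more than is available, the merged state stays non-negative regardless of the order in which submodels run.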

SLIDE 44

Calibrate model

  • 1. Estimate individual parameters
  • 2. Generate reduced models of individual pathways and use them to calibrate each pathway
  • 3. Refine joint parameter values using the full model
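
The three calibration steps can be illustrated on a toy gene-expression model whose steady state is m = k_tx / d_m for mRNA and p = k_tl * m / d_p for protein. The measurements, parameter names, and the multiplicative pattern search used for joint refinement are assumptions for illustration only; a realistic WC calibration involves far more parameters and a more capable optimizer.

```python
import math

# Hypothetical measurements (placeholders, not data from the talk).
mrna_half_life = 5.0       # min
protein_half_life = 600.0  # min
observed_mrna = 10.0       # steady-state copies per cell
observed_protein = 5000.0  # steady-state copies per cell

# Step 1: estimate individual parameters directly from measurements.
d_m = math.log(2) / mrna_half_life
d_p = math.log(2) / protein_half_life

# Step 2: calibrate transcription with a reduced, mRNA-only model (m_ss = k_tx / d_m).
k_tx = observed_mrna * d_m

def full_model(k_tx, k_tl):
    """Steady state of the full two-step model: mRNA, then protein."""
    m_ss = k_tx / d_m
    return m_ss, k_tl * m_ss / d_p

def loss(params):
    """Sum of squared relative errors against both observations."""
    m, p = full_model(*params)
    return ((m - observed_mrna) / observed_mrna) ** 2 + ((p - observed_protein) / observed_protein) ** 2

def refine(params, step=0.5, tol=1e-8):
    """Step 3: jointly refine all parameters with a simple multiplicative pattern search."""
    params = list(params)
    while step > tol:
        improved = False
        for i in range(len(params)):
            for factor in (1 + step, 1 / (1 + step)):
                trial = list(params)
                trial[i] *= factor
                if loss(trial) < loss(params):
                    params, improved = trial, True
        if not improved:
            step /= 2  # shrink the search step once no move helps
    return params

k_tx_fit, k_tl_fit = refine([k_tx, 1.0])  # 1.0 is an arbitrary initial guess for translation
print(full_model(k_tx_fit, k_tl_fit))
```

Starting the joint search from the individually calibrated values (step 2) is what keeps step 3 cheap: only the parameters the reduced models could not pin down move far.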

SLIDE 45

Verify model against known biology

Matches training data
  • Cell mass, volume
  • Biomass composition
  • RNA, protein expression, half-lives
  • Superhelicity

Matches published data
  • Metabolite concentrations
  • DNA-bound protein density
  • Gene essentiality

Matches theory
  • Mass conservation
  • Central dogma
  • Cell theory
  • Evolution

No obvious errors
  • Plot model predictions
  • Manually inspect data
  • Compare to known biology
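
Some of the "matches theory" checks can be automated. As one example of a mass-conservation check, the sketch below tests whether each reaction is element-balanced, using hexokinase with the formulas of its charged species at physiological pH. The function names are hypothetical, the formula parser handles only simple formulas without parentheses, and charge balance is not checked.

```python
import re
from collections import Counter

def element_counts(formula):
    """Parse a simple formula such as 'C6H12O6' (no parentheses or charges) into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def imbalance(reaction, formulas):
    """Net element counts, products minus reactants; an empty dict means the reaction is balanced."""
    net = Counter()
    for species, coefficient in reaction.items():  # negative coefficients are reactants
        for element, n in element_counts(formulas[species]).items():
            net[element] += coefficient * n
    return {element: n for element, n in net.items() if n != 0}

# Formulas of the charged species at physiological pH.
formulas = {
    "glucose": "C6H12O6",
    "atp": "C10H12N5O13P3",
    "glucose_6_phosphate": "C6H11O9P",
    "adp": "C10H12N5O10P2",
    "h": "H",
}
hexokinase = {"glucose": -1, "atp": -1, "glucose_6_phosphate": 1, "adp": 1, "h": 1}
missing_proton = {"glucose": -1, "atp": -1, "glucose_6_phosphate": 1, "adp": 1}

print(imbalance(hexokinase, formulas))      # {}
print(imbalance(missing_proton, formulas))  # {'H': -1}
```

Running such a check over every reaction in a model catches omissions like the missing proton before they show up as slow drifts in simulated cell mass.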

SLIDE 56
Mycoplasmas are small in size and complexity

  • Genome: E. coli 4700 kb; M. genitalium 580 kb
  • Genes: E. coli 4461; M. genitalium 525
  • Size: E. coli 2 μm × 0.5 μm; M. genitalium 0.2-0.3 μm

SLIDE 57

Karr et al., 2012

Model includes 28 pathway submodels

SLIDE 58

Karr et al., 2012

Submodels are integrated through 16 common states

SLIDE 59

Model represents 76% of genes

(Figure: number of genes per modeled process on a log scale from 1 to 100, grouped into DNA, RNA, protein, and other processes. Processes: condensation, segregation, damage, repair, replication, replication initiation, transcriptional regulation, degradation, modification, processing, transcription, aminoacylation, complexation, activation, degradation, folding, modification, processing I, processing II, translocation, ribosome, terminal organelle, translation, shape, cytokinesis, FtsZ, metabolism)

SLIDE 60
SLIDE 61


Karr et al., 2012

Predict energy consumption

SLIDE 62

WC models predict ancestral similarity

SLIDE 63

Optimal gene expression

  • M. genitalium
  • M. mycoides
  • M. pneumoniae
SLIDE 64

Optimal architecture retains robustness

Optimal gene expression retains robustness

SLIDE 65

Purcell et al., 2013

WC models can inform synthetic designs

SLIDE 66

Kazakiewicz et al., 2015

WC models can reposition antibiotics

SLIDE 68

Current limitations

M. genitalium model is limited and inaccurate
  • Ignores several pathways
  • Mispredicts the growth rates of many single-gene disruptions

Methods are not rigorous
  • Data selection
  • Multi-algorithm simulation
  • Parameter estimation
  • Verification

Methods are time-consuming
  • Data aggregation
  • Model design
  • Model verification

Models are hard to understand, reuse, and reproduce

SLIDE 69

Technology development goals

  • Scale to more complex models
  • Accelerate WC model building and simulation
  • Enable more researchers to engage in WC modeling
  • Apply WC modeling to bioengineering and medicine

SLIDE 70

WC modeling process

SLIDE 71

Accelerate data aggregation

  • Chaperones
  • Complex composition
  • DNA binding sites
  • DNA footprints
  • DNA methylation
  • DNA sequence
  • Gene-drug interactions
  • Genome annotation
  • Growth rates
  • Metabolite concentrations
  • Protein cofactors
  • Protein expression
  • Protein half-lives
  • Protein localization
  • Protein modification
  • RNA editing
  • RNA expression
  • RNA half-lives
  • RNA modification
  • RNA maturation
  • Reaction fluxes
  • Reaction kinetics
  • Reaction stoichiometries
  • Signaling pathways
  • DNA mutations
SLIDE 72

Accelerate data aggregation

  • Help modelers quickly get relevant data for a model
  • Enable modelers to aggregate data collaboratively
  • Record the provenance of all data

Yosef Roth

SLIDE 73

Accelerate data aggregation

  • 1. Merge data from as many sources as feasible
  • 2. Find the most relevant data for the model
    – Species: taxonomic distance
    – Environment: temperature, pH, media
  • 3. Normalize data
  • 4. Calculate a weighted consensus of the relevant data
  • 5. Record all provenance
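
Steps 1 to 5 can be sketched as a weighted average in which each measurement's weight falls off with taxonomic distance and with differences in temperature and pH. The `Observation` fields, the exponential weighting kernel, and the unit table are hypothetical choices made for illustration; the talk does not specify how relevance is scored or which consensus statistic is used.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Observation:
    value: float
    unit: str                 # "min" or "h" in this sketch
    taxonomic_distance: int   # hypothetical: ranks separating source organism from the modeled one
    temperature_c: float
    ph: float
    source: str               # provenance of the measurement

TO_MINUTES = {"min": 1.0, "h": 60.0}

def normalize(obs: Observation) -> float:
    """Step 3: express every value in a common unit (minutes)."""
    return obs.value * TO_MINUTES[obs.unit]

def relevance(obs: Observation, temperature_c: float, ph: float) -> float:
    """Step 2: down-weight data from distant species or different conditions (hypothetical kernel)."""
    return (math.exp(-obs.taxonomic_distance)
            * math.exp(-abs(obs.temperature_c - temperature_c) / 10.0)
            * math.exp(-abs(obs.ph - ph)))

def consensus(observations: List[Observation], temperature_c: float,
              ph: float) -> Tuple[float, List[Tuple[str, float]]]:
    """Steps 4-5: weighted mean of normalized values, plus each source's share of the weight."""
    weights = [relevance(o, temperature_c, ph) for o in observations]
    total = sum(weights)
    value = sum(w * normalize(o) for w, o in zip(weights, observations)) / total
    provenance = [(o.source, w / total) for o, w in zip(observations, weights)]
    return value, provenance

# Step 1: observations merged from several (hypothetical) sources.
observations = [
    Observation(30.0, "min", 0, 37.0, 7.4, "source A (same species)"),
    Observation(1.0, "h", 3, 30.0, 7.0, "source B (distant species)"),
]
value, provenance = consensus(observations, temperature_c=37.0, ph=7.4)
print(round(value, 1), provenance)
```

Returning each source's normalized weight alongside the consensus is one simple way to satisfy step 5: every model parameter can then be traced back to the measurements that shaped it.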
SLIDE 74

Accelerate data aggregation

SLIDE 75

Accelerate data aggregation

  • Metabolite concentrations: ECMDB, YMDB
  • RNA expression: ArrayExpress
  • Protein expression: PaxDB
  • Protein complexes: CORUM
  • Protein localization: prediction
  • Protein-DNA interactions: DBD, DBTBS
  • Reaction kinetics: SABIO-RK
SLIDE 76

Organize data for model building

SLIDE 78

Systemize model descriptions

John Sekar

SLIDE 79

Systemize model descriptions

Initiation

    Dna(pos=sample(d.tss.pos)) + RnaPol -> Dna(pos=.).RnaPol.Rna(seq='')
    algorithm: SSA
    rate law:
      constants:
        kcat:
          value = d.tss.rate
          refs: [10.1093/bioinformatics/btw598, …]
          units = 1/s

Elongation

    Dna(pos=<i>).RnaPol.Rna(seq=<j>) + RevComp(DsDna(pos=<i>)) -> DsDna(pos=<i+1>).RnaPol.Rna(seq=<j>+RevComp(DsDna(pos=<i>)))
    x-ref: EC: x.x.x.x

  • Rule-based modeling
  • Genomic data
  • Bioinformatic calculations
  • Annotation, provenance
  • Multi-algorithmic modeling

John Sekar

James Faeder, U Pitt
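
The initiation rule declares `algorithm: SSA` with a first-order constant `kcat` in 1/s. A minimal Gillespie direct-method sketch of such a rule follows; the per-TSS rates are placeholders standing in for `d.tss.rate`, propensities are assumed to be `kcat * free_pol`, and promoter occupancy is ignored. It is not how WC-Lang or any specific simulator executes rules.

```python
import random
from typing import List, Tuple

def simulate_initiation(rates: List[float], free_pol: int, t_end: float,
                        rng: random.Random) -> Tuple[List[Tuple[float, int]], int]:
    """Gillespie direct method for Dna + RnaPol -> Dna.RnaPol.Rna at several TSSs.

    rates: per-TSS rate constants in 1/s (placeholders for d.tss.rate).
    Assumes mass-action propensity k_i * free_pol and ignores promoter occupancy.
    """
    t = 0.0
    events = []
    while free_pol > 0:
        propensities = [k * free_pol for k in rates]
        a0 = sum(propensities)
        if a0 <= 0:
            break
        t += rng.expovariate(a0)  # exponentially distributed waiting time to the next event
        if t > t_end:
            break
        tss = rng.choices(range(len(rates)), weights=propensities)[0]  # which TSS fires
        events.append((t, tss))
        free_pol -= 1  # each initiation consumes one free polymerase
    return events, free_pol

events, remaining = simulate_initiation([0.1, 0.02, 0.5], free_pol=20, t_end=60.0,
                                        rng=random.Random(1))
print(len(events), remaining)
```

Because each event removes a free polymerase, the total propensity shrinks as the simulation proceeds, so initiations slow down as RNA polymerase becomes limiting.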

SLIDE 80

Language enables compact model descriptions

Rules needed to describe initiation, elongation, and termination:

Language     Initiation         Elongation            Termination
SBML         1 per RNA (335)    1 per base (~500k)    1 per RNA (335)
BioNetGen    1 per RNA (335)    1 per RNA (335)       1 per RNA (335)
WC-Lang      1                  1                     1
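
The contrast in the table comes from the fact that a single templated rule stands for many explicit reactions. The sketch below expands an elongation rule, written loosely in the notation of the rule example, into one reaction per base, and counts explicit reactions for a hypothetical set of transcript lengths. For simplicity it treats the input string as the template strand read in the direction of synthesis; all names are hypothetical.

```python
from typing import List

COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}  # DNA template base -> RNA base

def expand_elongation(template: str) -> List[str]:
    """Expand one templated elongation rule into one explicit reaction per base."""
    reactions = []
    for i, base in enumerate(template):
        rna = "".join(COMPLEMENT[b] for b in template[:i])
        ntp = COMPLEMENT[base]
        reactions.append(
            f"Dna(pos={i}).RnaPol.Rna(seq='{rna}') + {ntp}TP"
            f" -> Dna(pos={i + 1}).RnaPol.Rna(seq='{rna}{ntp}')"
        )
    return reactions

def explicit_reaction_count(tu_lengths: List[int]) -> int:
    """One initiation and one termination reaction per RNA plus one elongation reaction per base."""
    return 2 * len(tu_lengths) + sum(tu_lengths)

for reaction in expand_elongation("TAC"):
    print(reaction)
print(explicit_reaction_count([300, 450, 1200]))  # 1956
```

Even three short transcripts require nearly two thousand explicit reactions, whereas the templated rule stays a single line whatever the genome size.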

SLIDE 81

Provenance tracking

SLIDE 82

Submodel design

WC-ML, SBML, CellML

SLIDE 83

Systemize simulation

Goldberg et al., 2016

Arthur Goldberg

SLIDE 84

High-performance simulator

SLIDE 85

Simulation results database

SLIDE 86

Visual analysis of simulation results

SLIDE 87

Future work

  • Online platform for collaborative model design
  • Parallel, rule-based simulator
  • Scalable methods for calibrating and validating large models
  • Community standard for verifying WC models

SLIDE 88

Karr Lab overview

Technology development

Modeling language
  • Programmatic
  • Rule- and sequence-based
  • Multi-algorithmic

Parallel simulation
  • Reusable
  • Multi-algorithmic
  • Parallel discrete event simulation

Pilot models
  • M. pneumoniae
  • Expand scope
  • Improve accuracy
  • Drive genome design

Stem cells
  • Personalized models
  • Precision medicine

WC-Rules
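
Multi-algorithmic simulation can be organized as discrete event simulation, in which fixed-time-step submodels and stochastic submodels both schedule future events on one time-ordered queue. The sketch below is a sequential, hypothetical illustration (the `Simulator` class and both submodels are invented for this example); a parallel discrete event simulator would additionally partition the queue across processors and synchronize them, which is not shown.

```python
import heapq
import itertools
import random
from typing import Callable, List, Tuple

class Simulator:
    """Minimal sequential discrete-event simulator; a parallel one would partition this event queue."""

    def __init__(self) -> None:
        self.time = 0.0
        self.queue: List[Tuple[float, int, Callable]] = []
        self.counter = itertools.count()  # tie-breaker keeps equal-time events in scheduling order
        self.log: List[Tuple[float, str]] = []

    def schedule(self, delay: float, handler: Callable) -> None:
        heapq.heappush(self.queue, (self.time + delay, next(self.counter), handler))

    def run(self, end_time: float) -> None:
        # Always execute the earliest pending event, so simulated time never runs backwards.
        while self.queue and self.queue[0][0] <= end_time:
            self.time, _, handler = heapq.heappop(self.queue)
            handler(self)

def metabolism(sim: Simulator) -> None:
    """Fixed-time-step submodel (e.g. FBA) that reschedules itself every 1 s."""
    sim.log.append((sim.time, "metabolism"))
    sim.schedule(1.0, metabolism)

def make_transcription(rng: random.Random, rate: float) -> Callable:
    """Stochastic submodel whose events occur at exponentially distributed intervals."""
    def transcription(sim: Simulator) -> None:
        sim.log.append((sim.time, "transcription"))
        sim.schedule(rng.expovariate(rate), transcription)
    return transcription

sim = Simulator()
rng = random.Random(42)
transcription = make_transcription(rng, rate=0.5)
sim.schedule(0.0, metabolism)
sim.schedule(rng.expovariate(0.5), transcription)
sim.run(end_time=10.0)
print(sim.log)
```

The counter in each queue entry avoids comparing handler functions when two events share a timestamp, which `heapq` would otherwise attempt and fail on.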

SLIDE 89

Summary

  • Genomic and biochemical data
  • Pathway submodels
  • Rule-based modeling
  • Multi-algorithmic modeling

SLIDE 90

Acknowledgements

  • Methods development: James Faeder, U Pitt
  • M. pneumoniae: Veronica Llorens, CRG; Maria Lluch-Senar, CRG; Samuel Miravet, CRG; Luis Serrano, CRG
  • B. subtilis: Pablo Meyer, IBM

John Sekar, Yosef Roth, Roger Rodriguez, Arthur Goldberg, Yin Hoon Chew, Balazs Szigeti
