Jonathan Karr, karr@mssm.edu, July 17, 2019

SLIDE 1

Jonathan Karr

karr@mssm.edu

July 17, 2019

Toward WC models for predicting cellular phenotypes

SLIDE 2

Join us at Mount Sinai! karr@mssm.edu KarrLab.org

SLIDE 3

Zhouyang Lian Arthur Goldberg Yin Hoon Chew Yassmine Chebaro Bilal Shaikh Balazs Szigeti Yosef Roth

Acknowledgements

John Sekar

SLIDE 4

Outline

Genotype to cellular phenotype

– What is a WC model?
– Why do we need WC models?
– Challenges & feasibility
– Foundational principles and state of the art
– Progress toward comprehensive models

Tips for modeling complex systems


SLIDE 5

What is a WC model?

SLIDE 6


SLIDE 7

Goals of WC modeling

Whole cell
Whole genome
Whole cell cycle
Dynamic
Stochastic
Species-specific
Mechanistic

SLIDE 8

Motivation

SLIDE 9

Synthetic biology requires WC models

Tissue engineering
Biosensors
Biofactories

SLIDE 10

Example: drug biosynthesis


SLIDE 11

Example: drug biosynthesis


SLIDE 12

Example: drug biosynthesis


SLIDE 13

Example: drug biosynthesis


SLIDE 14

Example: drug biosynthesis


SLIDE 15

Example: drug biosynthesis


SLIDE 16

Example: drug biosynthesis


SLIDE 17

Precision medicine requires WC models


SLIDE 18

Challenges

SLIDE 19

Challenge: explain diverse chemistry

Metabolism: FBA
Signaling: ODE, SSA
Transcriptional regulation: Logical

SLIDE 20

Challenge: explain multiple scales

[Figure: replication, growth, transcription, and metabolism plotted by time and length scale]

SLIDE 21

Challenge: capture chemical complexity


SLIDE 22

Challenge: heterogeneous data

Single-cell variation: Microscopy
Transcription: RNA-seq
Protein expression: Mass-spec, Western blot

SLIDE 23

Challenge: incomplete data


SLIDE 24

Feasibility

SLIDE 25

Feasibility: Extensive data


SLIDE 26

Feasibility: Rule-based modeling


SLIDE 27

Feasibility: Multi-algorithm simulation

Uptake: FBA
Metabolism: FBA
Transcription: Stochastic events
Translation: Stochastic events
Replication: Chemical kinetics
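One way to picture the "stochastic events" formalism listed for transcription is a Gillespie-style stochastic simulation (SSA) of RNA synthesis and degradation. The sketch below is illustrative only: the function name, the two-reaction birth-death model, and the rate constants are assumptions, not the submodel used in the talk.

```python
import random


def simulate_transcription(k_syn, k_deg, t_end, rna0=0, seed=0):
    """Gillespie SSA for RNA synthesis (rate k_syn) and degradation (rate k_deg * n).

    All parameter values passed in are hypothetical.
    """
    rng = random.Random(seed)
    t, n = 0.0, rna0
    times, counts = [t], [n]
    while True:
        a_syn = k_syn        # propensity of synthesizing one RNA
        a_deg = k_deg * n    # propensity of degrading one RNA
        a_total = a_syn + a_deg
        if a_total == 0:
            break
        t += rng.expovariate(a_total)  # waiting time to the next event
        if t > t_end:
            break
        # Choose which event fires, in proportion to its propensity.
        if rng.random() * a_total < a_syn:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


times, counts = simulate_transcription(k_syn=0.5, k_deg=0.05, t_end=200.0)
```

Each event changes the RNA count by exactly one copy, which is the behavior that distinguishes this formalism from the ODE and FBA formalisms listed for other processes.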

SLIDE 28

WC modeling is becoming feasible

Genomic and biochemical data
Pathway submodels
Rule-based modeling
Multi-algorithmic simulation

SLIDE 29

Workflow

SLIDE 30

Workflow


SLIDE 31

1. Focus on simple cells

          E. coli          M. genitalium
Genome    4700 kb          580 kb
Genes     4461             525
Size      2 μm × 0.5 μm    0.2-0.3 μm

SLIDE 32

2. Aggregate and integrate data


SLIDE 33

3. Model each process

Metabolism
Signaling
Transcriptional regulation

SLIDE 34

3. Model each process

[Figure: modeling formalisms arranged by detail and scope]

ODE: Shuler, 1970s
FBA: Palsson, 1990s
Boolean: Bolouri, 2000s
Gillespie: Luthey-Schulten, 2011
PDE

SLIDE 35

3. Model each process

Metabolism

Species and reactions: ADP + Pi + 4 H+[p] → ATP + H2O + 3 H+[c]
Catalysis: ATPase = 3 * AtpA + 1 * AtpB + 1 * AtpC + 3 * AtpD + 10 * AtpE + 2 * AtpF + 1 * AtpG + 1 * AtpH
Kinetics: $v = k_\text{cat}\,[\text{ATPase}]\,\frac{[\text{ADP}]}{K_m + [\text{ADP}]}$
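The ATPase example combines a rate law with a complex composition, and both can be sketched in a few lines of Python. The rate function follows the Michaelis-Menten form on the slide. The `max_complexes` helper is an added assumption: it treats the scarcest subunit as limiting how many complete complexes can form. All numeric values in the usage lines are hypothetical.

```python
# Subunit stoichiometry of the ATPase complex, as listed on the slide.
ATPASE_SUBUNITS = {
    "AtpA": 3, "AtpB": 1, "AtpC": 1, "AtpD": 3,
    "AtpE": 10, "AtpF": 2, "AtpG": 1, "AtpH": 1,
}


def atpase_rate(k_cat, atpase, adp, k_m):
    """Michaelis-Menten rate: v = k_cat * [ATPase] * [ADP] / (K_m + [ADP])."""
    return k_cat * atpase * adp / (k_m + adp)


def max_complexes(subunit_counts):
    """Largest number of complete complexes the available subunit copies allow.

    Assumption for illustration: assembly is limited only by stoichiometry.
    """
    return min(
        subunit_counts.get(name, 0) // copies
        for name, copies in ATPASE_SUBUNITS.items()
    )


# Hypothetical values, chosen only to make the arithmetic easy to follow.
v = atpase_rate(k_cat=10.0, atpase=2.0, adp=1.0, k_m=1.0)
n = max_complexes({"AtpA": 30, "AtpB": 10, "AtpC": 10, "AtpD": 30,
                   "AtpE": 95, "AtpF": 20, "AtpG": 10, "AtpH": 10})
```

With these inputs the rate is half of its saturated value because [ADP] equals K_m, and AtpE is the limiting subunit.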

SLIDE 36

4. Merge models into a single model

Submodels

Uptake         FBA                Composition
Metabolism     FBA                Composition
Transcription  Stochastic events  Gene expression
Translation    Stochastic events  Gene expression
Replication    Chemical kinetics  DNA sequence

States

Mass, shape
Metabolite, RNA, protein counts
Mammalian host
Transcript, polypeptide sequences
DNA polymerization, proteins, modifications
FtsZ ring

SLIDE 37

5. Co-simulate models

[Figure: the Uptake, Metabolism, Transcription, Translation, and Replication submodels alternate with shared cell states in 1 s steps]
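The co-simulation idea, several submodels advancing a shared cell state in 1 s steps, can be sketched as a simple loop. Everything below is a toy under stated assumptions: the three submodels, their update rules, the state variables, and the sequential update order are hypothetical and are not the algorithms named in the talk.

```python
def uptake(state, dt):
    # Hypothetical rule: import 10 nutrient units per second.
    state["nutrient"] += 10 * dt


def metabolism(state, dt):
    # Hypothetical rule: convert up to 8 nutrient units per second into NTPs.
    used = min(state["nutrient"], 8 * dt)
    state["nutrient"] -= used
    state["ntp"] += used


def transcription(state, dt):
    # Hypothetical rule: polymerize up to 2 NTPs per second into RNA.
    made = min(state["ntp"], 2 * dt)
    state["ntp"] -= made
    state["rna"] += made


SUBMODELS = [uptake, metabolism, transcription]


def co_simulate(initial_state, t_end, dt=1):
    """Advance every submodel once per time step against one shared state."""
    state = dict(initial_state)  # copy so the caller's dict is untouched
    t = 0
    while t < t_end:
        for submodel in SUBMODELS:
            submodel(state, dt)
        t += dt
    return state


final = co_simulate({"nutrient": 0, "ntp": 0, "rna": 0}, t_end=5)
```

Because each submodel only reads and writes the shared state, a submodel can be replaced by a different formalism without changing the loop.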

SLIDE 38

6. Verify model

Matches training data

– Cell mass, volume
– Biomass composition
– RNA, protein expression, half-lives
– Superhelicity

Matches published data

– Metabolite concentrations
– DNA-bound protein density
– Gene essentiality

Matches theory

– Mass conservation
– Central dogma
– Cell theory
– Evolution

No obvious errors

– Plot model predictions
– Manually inspect data
– Compare to known biology
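Of the checks listed, mass conservation is the easiest to automate: every reaction should have the same element totals on both sides. The following is a minimal sketch, assuming each species comes with an element-count formula. The helper names and the water-formation example are illustrative and do not come from the model.

```python
from collections import Counter


def element_totals(side, formulas):
    """Sum element counts over one side of a reaction ({species: coefficient})."""
    totals = Counter()
    for species, coeff in side.items():
        for element, count in formulas[species].items():
            totals[element] += coeff * count
    return totals


def is_mass_balanced(reactants, products, formulas):
    """True if both sides of the reaction contain the same atoms."""
    return element_totals(reactants, formulas) == element_totals(products, formulas)


# Illustrative species only.
FORMULAS = {"H2": {"H": 2}, "O2": {"O": 2}, "H2O": {"H": 2, "O": 1}}

balanced = is_mass_balanced({"H2": 2, "O2": 1}, {"H2O": 2}, FORMULAS)
unbalanced = is_mass_balanced({"H2": 1, "O2": 1}, {"H2O": 1}, FORMULAS)
```

Running such a check over every reaction in a model turns "matches theory" into a test that can be repeated after each change.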

SLIDE 39

State of the art

SLIDE 40


SLIDE 41

WC models provide novel insights


SLIDE 42

WC models help design cells

lacI


SLIDE 43

WC models help repurpose drugs


SLIDE 44

Limitations of the Mycoplasma model

Represents one of the smallest bacteria
Ignores several processes
Mispredicts several phenotypes
Methods were ad hoc
Hard to understand, reuse, and expand
Time-consuming to build


SLIDE 45

Toward more comprehensive and more accurate models

SLIDE 46

Goal: design precise therapy


SLIDE 47

Challenge: H1 hESCs

Karyotypically normal
Autonomous
Well-characterized

SLIDE 48

Bottlenecks


SLIDE 49

Bottlenecks

Data aggregation: Hard to find relevant data

– Data is incomplete, scattered, and insufficiently annotated

Model design: Hard to capture multiple scales and describe models modularly

– Insufficient abstraction and metadata

Simulation: Hard to simulate multiple scales

– Simulators only support individual formalisms and are slow

Verification: Little formalism or standardization
Collaboration: Difficult to describe the data, assumptions, and decisions that underlie modeling

SLIDE 50

Data needed for WC modeling

$v = k_\text{cat}\,[\text{enzyme}]\,\frac{[\text{substrate}]}{[\text{substrate}] + K_m}$

Metabolite concentrations
Enzyme concentrations
Reaction kinetics

SLIDE 51

Data needed for WC modeling


SLIDE 52

Datanator: data integration & discovery

Aggregate
Find
Reduce
Review

7.3 × 10⁻⁴ mM

Species
Environment

SLIDE 53

Datanator: data aggregation

Metabolites

ChEBI
ECMDB, YMDB
PubChem

DNA

GenBank

RNA

ArrayExpress
MODOMICS
RNALocate
RNA MOD

Protein

COMPARTMENTS
CORUM
Human Protein Ref. DB
PaxDB
PDB
PSORTdb
RESID
UniProt

Pathways

KEGG
Pathway Commons
Reactome
WikiPathways

Rates

BRENDA
SABIO-RK

Interactions

BioCyc
DBTBS
DrugBank
JASPAR
KEGG
SuperTarget

Taxonomy

NCBI


SLIDE 54

Datanator: actionable metadata

Measured entity/property
Measured value, uncertainty, units
Genotype

– Taxon
– Genetic variant
– Cell, tissue type

Environment

– Temperature
– pH
– Growth media

Data generation process

– Experimental design
– Measurement method

Data analysis process

– Software
– Version

Metadata

– Authors
– Curator
– Date
– Citation
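The metadata categories listed on this slide map naturally onto a record type. The dataclass below is a hypothetical sketch of such a record and is not Datanator's actual schema. The class name, field names, and the `missing_metadata` helper are assumptions introduced for illustration, and the example values are placeholders.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Observation:
    """One measured value plus the metadata needed to judge its relevance."""
    # Measured entity/property, value, and units
    entity: str
    measured_property: str
    value: float
    units: str
    uncertainty: Optional[float] = None
    # Genotype
    taxon: Optional[str] = None
    genetic_variant: Optional[str] = None
    cell_type: Optional[str] = None
    # Environment
    temperature_c: Optional[float] = None
    ph: Optional[float] = None
    growth_media: Optional[str] = None
    # Data generation and analysis processes
    experimental_design: Optional[str] = None
    measurement_method: Optional[str] = None
    software: Optional[str] = None
    software_version: Optional[str] = None
    # Provenance
    authors: Optional[str] = None
    curator: Optional[str] = None
    date: Optional[str] = None
    citation: Optional[str] = None

    def missing_metadata(self):
        """Names of the metadata fields that have not been filled in."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Placeholder values, not real measurements.
obs = Observation(
    entity="metabolite X", measured_property="concentration",
    value=1.0, units="mM",
    taxon="Escherichia coli", temperature_c=37.0, ph=7.0,
)
gaps = obs.missing_metadata()
```

Listing the gaps explicitly makes it possible to decide, per observation, whether it is annotated well enough to use.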

SLIDE 55

Datanator: Finding relevant data

Chemical similarity

– Tanimoto index
– Sequence similarity

Genetic similarity

– Whole-genome similarity
– Taxonomic distance

Environmental similarity

– Temperature
– pH
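The Tanimoto index named under chemical similarity is the size of the intersection of two feature sets divided by the size of their union. Here is a minimal sketch, assuming each molecule has already been reduced to a set of structural features. The integer feature identifiers and molecule names below are placeholders, not real fingerprints.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) index of two feature sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


def rank_by_similarity(query, candidates):
    """Sort candidate names from most to least similar to the query."""
    return sorted(candidates, key=lambda name: tanimoto(query, candidates[name]), reverse=True)


# Placeholder feature sets.
query = {1, 2, 3}
candidates = {"mol_a": {2, 3, 4}, "mol_b": {1, 2, 3}, "mol_c": {7, 8}}
ranking = rank_by_similarity(query, candidates)
```

The same ranking pattern would apply to the genetic and environmental similarity measures, each with its own distance function.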

SLIDE 56

WC-Lang: scalable model descriptions

Concretely describe composite multi-algorithmic models
Concrete descriptions of every model element
Capture data and assumptions underlying models
Explicit descriptions of mixed granularity / lumping
Structured description of initial conditions
User interfaces suited to large models

SLIDE 57

WC-Lang: scalable model descriptions

RNA(i, 0) + NTP(i, 1) → RNA(i, 1) + PPi
RNA(i, 1) + NTP(i, 2) → RNA(i, 2) + PPi
RNA(i, 2) + NTP(i, 3) → RNA(i, 3) + PPi
RNA(i, 3) + NTP(i, 4) → RNA(i, 4) + PPi
…

SLIDE 58

WC-Lang: scalable model descriptions

RNA(i, l) + NTP(i, l+1) → RNA(i, l+1) + PPi
RNA(i, l) + H2O → RNA(i, l-1) + NMP(i, l)
Protein(i, l) + AA(i, l+1) → Protein(i, l+1) + H2O
Protein(i, l) + H2O → Protein(i, l-1) + AA(i, l)
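To see why a single rule can stand in for a very large reaction list, the RNA elongation rule can be expanded into one explicit reaction per RNA and per base. The sketch below does that with plain string formatting. It is not WC-Lang syntax, and the function name and RNA lengths are hypothetical.

```python
def expand_elongation(rna_lengths):
    """Expand RNA(i, l) + NTP(i, l+1) -> RNA(i, l+1) + PPi for every RNA i and base l.

    rna_lengths maps an RNA identifier to its length in bases.
    """
    reactions = []
    for rna_id, length in rna_lengths.items():
        for l in range(length):
            reactions.append(
                f"RNA({rna_id}, {l}) + NTP({rna_id}, {l + 1}) -> RNA({rna_id}, {l + 1}) + PPi"
            )
    return reactions


# Hypothetical RNAs with very short lengths, to keep the output readable.
reactions = expand_elongation({"rna_1": 3, "rna_2": 2})
```

The number of explicit reactions equals the total number of bases across all RNAs, so the enumerated form grows with genome size while the rule stays a single line.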

SLIDE 59

WC-Lang: scalable model descriptions

         Initiation        Elongation           Termination
SBML     1 per RNA (335)   1 per base (~500k)   1 per RNA (335)
Rules    1 (1)             1 (1)                1 (1)

SLIDE 60

WC-Sim: scalable co-simulation


SLIDE 61

H1-hESC model

Recon 2.2 H1 model
Kinetic data (SABIO-RK)
Protein abundance (Phanstiel et al., 2011; PaxDB)
Composable model
H1 transcriptomics data (ENCODE)
Cell composition
Media composition

SLIDE 62

Summary

SLIDE 63

Availability

Code: code.karrlab.org (GitHub, PyPI)
Data: data.karrlab.org (Quilt)
Images: DockerHub
Primer and docs: docs.karrlab.org
Tutorials: sandbox.karrlab.org


SLIDE 64

Summary

Bioengineering and medicine need WC models
WC modeling is becoming feasible
New technologies will enable WC modeling
Pilot models will show the feasibility of bacterial and human models

SLIDE 65

Tips & tricks


SLIDE 66

Challenges to g2p2pop

Build models from imperfect data
Capture complexity within and between scales

Systematically link scales
Scalably simulate multiple scales
Collaborate

SLIDE 67

Stretch goals inspire innovation


SLIDE 68

Integration enables great scope and depth

Data aggregation
Model composition
Multi-algorithmic co-simulation
Modular methods and software
Interdisciplinary collaboration

SLIDE 69

Frameworks enable scalable integration


SLIDE 70

Common languages enable frameworks


SLIDE 71

Agent-based modeling can capture complexity


SLIDE 72

Collaboration enables solutions


SLIDE 73

Modularity enables collaboration


SLIDE 74

Sharing promotes collaboration

Quilt: data
GitHub: code
PyPI: packaged code
Docker: computing environments
Google Docs, Overleaf: written documents
Google Drive: other files
GitHub issues: tasks


SLIDE 75

Common practices ease collaboration

Interfaces between modules
Coarse-graining
Package organization
Coding, documentation styles
Software libraries


SLIDE 76

QC inspires trust among collaborators


SLIDE 77

Data integration enables modelers to drive science


SLIDE 78

Summary

SLIDE 79

Integration is enabling WC modeling

Genomic and biochemical data
Pathway models
Rule-based modeling
Multi-algorithmic simulation