Jonathan Karr karr@mssm.edu July 17, 2019 Join us at Mount Sinai! - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Jonathan Karr

karr@mssm.edu

July 17, 2019

Toward WC models for predicting cellular phenotypes

slide-2
SLIDE 2

Join us at Mount Sinai! karr@mssm.edu KarrLab.org

slide-3
SLIDE 3

Acknowledgements

Zhouyang Lian, Arthur Goldberg, Yin Hoon Chew, Yassmine Chebaro, Bilal Shaikh, Balazs Szigeti, Yosef Roth, John Sekar

slide-4
SLIDE 4

Outline

Genotype to cellular phenotype

– What is a WC model?
– Why do we need WC models?
– Challenges & feasibility
– Foundational principles and state of the art
– Progress toward comprehensive models

Tips for modeling complex systems

slide-5
SLIDE 5

What is a WC model?

slide-6
SLIDE 6

6

slide-7
SLIDE 7

Goals of WC modeling

  • Whole cell
  • Dynamic
  • Whole genome
  • Stochastic
  • Whole cell cycle
  • Species-specific
  • Mechanistic

slide-8
SLIDE 8

Motivation

slide-9
SLIDE 9

Synthetic biology requires WC models

  • Tissue engineering
  • Biosensors
  • Biofactories

slide-10
SLIDE 10

Example: drug biosynthesis

10

slide-11
SLIDE 11

Example: drug biosynthesis

11

slide-12
SLIDE 12

Example: drug biosynthesis

12

slide-13
SLIDE 13

Example: drug biosynthesis

13

slide-14
SLIDE 14

Example: drug biosynthesis

14

slide-15
SLIDE 15

Example: drug biosynthesis

15

slide-16
SLIDE 16

Example: drug biosynthesis

16

slide-17
SLIDE 17

Precision medicine requires WC models

17

slide-18
SLIDE 18

Challenges

slide-19
SLIDE 19

Challenge: explain diverse chemistry

  • Metabolism: FBA
  • Signaling: ODE, SSA
  • Transcriptional regulation: Logical

slide-20
SLIDE 20

Challenge: explain multiple scales

[Figure: axes Time and Length; processes shown: Replication, Growth, Transcription, Metabolism]

slide-21
SLIDE 21

Challenge: capture chemical complexity

21

slide-22
SLIDE 22

Challenge: heterogeneous data

  • Single-cell variation: Microscopy
  • Transcription: RNA-seq
  • Protein expression: Mass-spec, Western blot

slide-23
SLIDE 23

Challenge: incomplete data

23

slide-24
SLIDE 24

Feasibility

slide-25
SLIDE 25

Feasibility: Extensive data

25

slide-26
SLIDE 26

Feasibility: Rule-based modeling

26

slide-27
SLIDE 27

Feasibility: Multi-algorithm simulation

  • Uptake: FBA
  • Metabolism: FBA
  • Transcription: Stochastic events
  • Translation: Stochastic events
  • Replication: Chemical kinetics
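The "stochastic events" formalism listed for transcription and translation can be illustrated with Gillespie's direct method (the SSA mentioned earlier in the deck). The sketch below is a minimal, hypothetical birth-death model of a single RNA species; the function name, rate constants, and seed are assumptions for illustration and are not taken from the talk.

```python
import random


def simulate_rna_ssa(k_syn, k_deg, n0, t_end, seed=0):
    """Gillespie direct method for RNA synthesis (0 -> RNA) and decay (RNA -> 0)."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    trajectory = [(t, n)]
    while True:
        # Propensities of the two reactions in the current state.
        a_syn = k_syn
        a_deg = k_deg * n
        a_total = a_syn + a_deg
        if a_total == 0:
            break
        # The waiting time to the next event is exponentially distributed.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_syn:
            n += 1
        else:
            n -= 1
        trajectory.append((t, n))
    return trajectory


# Hypothetical parameters: 0.5 RNA/s synthesis, 0.05 /s decay per RNA.
trajectory = simulate_rna_ssa(k_syn=0.5, k_deg=0.05, n0=0, t_end=200.0)
```

Each entry of `trajectory` is a `(time, copy number)` pair, so the copy number changes by exactly one molecule per event.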

slide-28
SLIDE 28

WC modeling is becoming feasible

  • Genomic and biochemical data
  • Pathway submodels
  • Rule-based modeling
  • Multi-algorithmic simulation

slide-29
SLIDE 29

Workflow

slide-30
SLIDE 30

Workflow

30

slide-31
SLIDE 31
  • 1. Focus on simple cells

            E. coli           M. genitalium
Genome      4700 kb           580 kb
Genes       4461              525
Size        2 μm × 0.5 μm     0.2-0.3 μm

slide-32
SLIDE 32
  • 2. Aggregate and integrate data

32

slide-33
SLIDE 33
  • 3. Model each process

  • Metabolism
  • Signaling
  • Transcriptional regulation

slide-34
SLIDE 34

  • 3. Model each process

[Figure: modeling formalisms plotted by Detail and Scope]

  • ODE (Shuler, 1970s)
  • FBA (Palsson, 1990s)
  • Boolean (Bolouri, 2000s)
  • Gillespie (Luthey-Schulten, 2011)
  • PDE

slide-35
SLIDE 35
  • 3. Model each process

Metabolism

Species and reactions: ADP + Pi + 4 H+[p] → ATP + H2O + 3 H+[c] (catalyzed by ATPase)

Catalysis: ATPase = 3 * AtpA, 1 * AtpB, 1 * AtpC, 3 * AtpD, 10 * AtpE, 2 * AtpF, 1 * AtpG, 1 * AtpH

Kinetics: $v = k_\text{cat}\,[\text{ATPase}]\,\dfrac{[\text{ADP}]}{K_m + [\text{ADP}]}$
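As a rough illustration of how the catalysis and kinetics entries on this slide could be encoded, the sketch below stores the ATPase subunit stoichiometry from the slide and evaluates a Michaelis-Menten rate law. The function names and all numerical inputs in the usage lines are hypothetical; the slide gives no parameter values.

```python
# Subunit stoichiometry of the ATPase complex, as listed on the slide.
ATPASE_SUBUNITS = {
    "AtpA": 3, "AtpB": 1, "AtpC": 1, "AtpD": 3,
    "AtpE": 10, "AtpF": 2, "AtpG": 1, "AtpH": 1,
}


def max_complexes(subunit_counts, stoichiometry):
    """Upper bound on the complexes that the available subunit copies allow."""
    return min(
        subunit_counts.get(name, 0) // copies
        for name, copies in stoichiometry.items()
    )


def michaelis_menten_rate(k_cat, enzyme, substrate, k_m):
    """Rate v = k_cat * [enzyme] * [substrate] / (K_m + [substrate])."""
    return k_cat * enzyme * substrate / (k_m + substrate)


# Hypothetical inputs: 30 copies of every subunit, k_cat = 100, K_m = [ADP] = 2.
n_atpase = max_complexes({name: 30 for name in ATPASE_SUBUNITS}, ATPASE_SUBUNITS)
rate = michaelis_menten_rate(k_cat=100.0, enzyme=n_atpase, substrate=2.0, k_m=2.0)
```

With these inputs AtpE is limiting (30 // 10 = 3 complexes), and because the substrate level equals K_m the rate is half of its maximum.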

slide-36
SLIDE 36

  • 4. Merge models into a single model

Submodels

  • Uptake: FBA; Composition
  • Metabolism: FBA; Composition
  • Transcription: Stochastic events; Gene expression
  • Translation: Stochastic events; Gene expression
  • Replication: Chemical kinetics; DNA sequence

States

  • Mass, shape
  • Metabolite, RNA, protein counts
  • Mammalian host
  • Transcript, polypeptide sequences
  • DNA polymerization, proteins, modifications
  • FtsZ ring

slide-37
SLIDE 37

  • 5. Co-simulate models

[Figure: the shared cell states alternate with the Uptake, Metabolism, Transcription, Translation, and Replication submodels; the time step is labeled 1 s]
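One way to read the co-simulation step is as a loop in which every submodel reads the same snapshot of the cell state, and the resulting changes are merged back before the next step. The sketch below assumes a fixed 1 s step, as labeled on the slide. The three toy submodels, their rates, and the species names are hypothetical stand-ins rather than the submodels of the actual model.

```python
def metabolism(state, dt):
    """Toy submodel: supply NTPs and amino acids at a constant rate."""
    return {"ntp": int(5 * dt), "aa": int(5 * dt)}


def transcription(state, dt):
    """Toy submodel: make RNA from NTPs, limited by the NTPs available."""
    made = min(state["ntp"], int(2 * dt))
    return {"ntp": -made, "rna": made}


def translation(state, dt):
    """Toy submodel: make protein from amino acids when RNA is present."""
    made = min(state["aa"], int(3 * dt)) if state["rna"] > 0 else 0
    return {"aa": -made, "protein": made}


def co_simulate(state, submodels, t_end, dt=1.0):
    """Advance all submodels in lockstep over a shared state dictionary."""
    state = dict(state)
    t = 0.0
    while t < t_end:
        # Every submodel sees the same snapshot of the cell state.
        deltas = [submodel(state, dt) for submodel in submodels]
        # The changes are merged back into the shared state afterwards.
        for delta in deltas:
            for species, change in delta.items():
                state[species] += change
        t += dt
    return state


initial = {"ntp": 0, "aa": 0, "rna": 0, "protein": 0}
final = co_simulate(initial, [metabolism, transcription, translation], t_end=3.0)
```

In this toy version each resource has a single consumer, so counts cannot go negative; a simulator with competing submodels would also need a rule for dividing shared resources among them.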

slide-38
SLIDE 38
  • 6. Verify model

Matches training data

– Cell mass, volume
– Biomass composition
– RNA, protein expression, half-lives
– Superhelicity

Matches published data

– Metabolite concentrations
– DNA-bound protein density
– Gene essentiality

Matches theory

– Mass conservation
– Central dogma
– Cell theory
– Evolution

No obvious errors

– Plot model predictions
– Manually inspect data
– Compare to known biology
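The "mass conservation" check listed under "Matches theory" can be automated for individual reactions. The sketch below is one simple way to do it, assuming formulas without parentheses or charges; the helper names are hypothetical, and the glucose oxidation reaction is only a familiar test case, not a reaction from the talk.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def element_imbalance(reactants, products):
    """Return products minus reactants per element; empty means balanced."""
    total = Counter()
    for formula, coefficient in products.items():
        for element, n in parse_formula(formula).items():
            total[element] += coefficient * n
    for formula, coefficient in reactants.items():
        for element, n in parse_formula(formula).items():
            total[element] -= coefficient * n
    return {element: n for element, n in total.items() if n != 0}


# C6H12O6 + 6 O2 -> 6 CO2 + 6 H2O is balanced.
balanced = element_imbalance({"C6H12O6": 1, "O2": 6}, {"CO2": 6, "H2O": 6})
# Dropping one water leaves the products short of two H and one O.
unbalanced = element_imbalance({"C6H12O6": 1, "O2": 6}, {"CO2": 6, "H2O": 5})
```

Running a check like this over every reaction in a model flags stoichiometry errors before any simulation is run.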

slide-39
SLIDE 39

State of the art

slide-40
SLIDE 40

40

slide-41
SLIDE 41

WC models provide novel insights


slide-42
SLIDE 42

WC models help design cells

lacI

42

slide-43
SLIDE 43

WC models help purpose drugs

43

slide-44
SLIDE 44

Limitations of the Mycoplasma model

  • Represents one of the smallest bacteria
  • Ignores several processes
  • Mispredicts several phenotypes
  • Methods were ad hoc
  • Hard to understand, reuse, and expand
  • Time-consuming to build

44

slide-45
SLIDE 45

Toward more comprehensive and more accurate models

slide-46
SLIDE 46

Goal: design precise therapy

46

slide-47
SLIDE 47
Challenge: H1 hESCs

  • Karyotypically normal
  • Autonomous
  • Well-characterized

slide-48
SLIDE 48

Bottlenecks

48

slide-49
SLIDE 49

Bottlenecks

  • Data aggregation: Hard to find relevant data
    – Data is incomplete, scattered, and insufficiently annotated
  • Model design: Hard to capture multiple scales and describe models modularly
    – Insufficient abstraction and metadata
  • Simulation: Hard to simulate multiple scales
    – Simulators support only individual formalisms and are slow
  • Verification: Little formalism or standardization
  • Collaboration: Difficult to describe the data, assumptions, and decisions that underlie modeling

slide-50
SLIDE 50

Data needed for WC modeling

$v = k_\text{cat}\,[\text{enzyme}]\,\dfrac{[\text{substrate}]}{[\text{substrate}] + K_m}$

  • Metabolite concentrations: [substrate]
  • Enzyme concentrations: [enzyme]
  • Reaction kinetics: k_cat, K_m

slide-51
SLIDE 51

Data needed for WC modeling

51

slide-52
SLIDE 52

Datanator: data integration & discovery

  • Aggregate
  • Find
  • Reduce
  • Review

Example value: 7.3 × 10⁻⁴ mM

  • Species
  • Environment

slide-53
SLIDE 53

Datanator: data aggregation

Metabolites

  • ChEBI
  • ECMDB, YMDB
  • PubChem

DNA

  • GenBank

RNA

  • ArrayExpress
  • MODOMICS
  • RNALocate
  • RNA MOD

Protein

  • COMPARTMENTS
  • CORUM
  • Human Protein Ref. DB
  • PaxDB
  • PDB
  • PSORTdb
  • RESID
  • UniProt

Pathways

  • KEGG
  • Pathway Commons
  • Reactome
  • WikiPathways

Rates

  • BRENDA
  • SABIO-RK

Interactions

  • BioCyc
  • DBTBS
  • DrugBank
  • JASPAR
  • KEGG
  • SuperTarget

Taxonomy

  • NCBI

53

slide-54
SLIDE 54

Datanator: actionable metadata

Measured entity/property

Measured value, uncertainty, units

Genotype

– Taxon
– Genetic variant
– Cell, tissue type

Environment

– Temperature
– pH
– Growth media

Data generation process

– Experimental design
– Measurement method

Data analysis process

– Software
– Version

Metadata

– Authors
– Curator
– Date
– Citation
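The metadata categories on this slide map naturally onto a record type. The sketch below is a hypothetical data structure, not Datanator's actual schema: the class name, field names, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Observation:
    """One measured value plus the metadata categories listed on the slide."""

    # What was measured
    entity: str
    measured_property: str
    value: float
    units: str
    uncertainty: Optional[float] = None
    # Genotype
    taxon: Optional[str] = None
    genetic_variant: Optional[str] = None
    cell_or_tissue_type: Optional[str] = None
    # Environment
    temperature_celsius: Optional[float] = None
    ph: Optional[float] = None
    growth_media: Optional[str] = None
    # Data generation and analysis
    experimental_design: Optional[str] = None
    measurement_method: Optional[str] = None
    software: Optional[str] = None
    software_version: Optional[str] = None
    # Provenance
    authors: Optional[str] = None
    curator: Optional[str] = None
    date: Optional[str] = None
    citation: Optional[str] = None

    def missing_metadata(self) -> List[str]:
        """Names of the optional fields that have not been filled in."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Placeholder record; the values are invented for illustration only.
obs = Observation(
    entity="example metabolite",
    measured_property="concentration",
    value=1.0,
    units="mM",
    taxon="example taxon",
    ph=7.0,
)
gaps = obs.missing_metadata()
```

Listing the gaps explicitly is one way to make the metadata "actionable": a curator can see at a glance which context is still needed before a value is reused in a model.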

slide-55
SLIDE 55

Datanator: Finding relevant data

Chemical similarity

– Tanimoto index
– Sequence similarity

Genetic similarity

– Whole-genome similarity
– Taxonomic distance

Environmental similarity

– Temperature
– pH
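For reference, the Tanimoto index of two feature sets is the size of their intersection divided by the size of their union. The sketch below computes it on plain Python sets and uses it to rank candidates against a query. The feature labels are hypothetical placeholders; in practice the features would come from a cheminformatics fingerprint, which is outside the scope of this sketch.

```python
def tanimoto(features_a, features_b):
    """Tanimoto index |A ∩ B| / |A ∪ B| of two feature collections."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        # Assumption: two empty fingerprints are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def rank_by_similarity(query, candidates):
    """Candidate names sorted from most to least similar to the query."""
    return sorted(
        candidates,
        key=lambda name: tanimoto(query, candidates[name]),
        reverse=True,
    )


# Hypothetical fingerprints, written as sets of feature labels.
query = {"a", "b", "c"}
candidates = {"x": {"a", "b", "c"}, "y": {"a"}, "z": {"a", "b"}}
ranking = rank_by_similarity(query, candidates)
```

The same ranking pattern could be applied to the genetic and environmental criteria by swapping in a different similarity function.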

slide-56
SLIDE 56

WC-Lang: scalable model descriptions

  • Concretely describe composite multi-algorithmic models
  • Concrete descriptions of every model element
  • Capture data and assumptions underlying models
  • Explicit descriptions of mixed granularity / lumping
  • Structured description of initial conditions
  • User interfaces suited to large models

slide-57
SLIDE 57

WC-Lang: scalable model descriptions

RNA(i, 0) + NTP(i, 1) → RNA(i, 1) + PPi
RNA(i, 1) + NTP(i, 2) → RNA(i, 2) + PPi
RNA(i, 2) + NTP(i, 3) → RNA(i, 3) + PPi
RNA(i, 3) + NTP(i, 4) → RNA(i, 4) + PPi
…

slide-58
SLIDE 58

WC-Lang: scalable model descriptions

RNA(i, l) + NTP(i, l+1) → RNA(i, l+1) + PPi
RNA(i, l) + H2O → RNA(i, l-1) + NMP(i, l)
Protein(i, l) + AA(i, l+1) → Protein(i, l+1) + H2O
Protein(i, l) + H2O → Protein(i, l-1) + AA(i, l)
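To make the contrast between a rule and its enumerated reactions concrete, the sketch below expands the RNA elongation rule into one explicit reaction per transcript and position. The function name and the transcript names and lengths are hypothetical, and the output is plain strings rather than WC-Lang or SBML syntax.

```python
def expand_elongation_rule(transcript_lengths):
    """Enumerate RNA(i, l) + NTP(i, l+1) -> RNA(i, l+1) + PPi for all i, l."""
    reactions = []
    for i, length in transcript_lengths.items():
        # One elongation reaction per base of transcript i.
        for l in range(length):
            reactions.append(
                f"RNA({i}, {l}) + NTP({i}, {l + 1}) -> RNA({i}, {l + 1}) + PPi"
            )
    return reactions


# Hypothetical transcripts of 3 and 2 bases.
reactions = expand_elongation_rule({"geneA": 3, "geneB": 2})
```

A single rule therefore stands in for as many reactions as there are bases across all transcripts, which is why the enumerated form grows so quickly with genome size.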

slide-59
SLIDE 59

WC-Lang: scalable model descriptions

         Initiation         Elongation           Termination
SBML     1 per RNA (335)    1 per base (~500k)   1 per RNA (335)
Rules    1 (1)              1 (1)                1 (1)

slide-60
SLIDE 60

WC-Sim: scalable co-simulation

64

slide-61
SLIDE 61

H1-hESC model

Recon 2.2 H1 model

  • Kinetic data (SABIO-RK)
  • Protein abundance (Phanstiel et al., 2011; PaxDB)

Composable model

  • H1 transcriptomics data (ENCODE)
  • Cell composition
  • Media composition

slide-62
SLIDE 62

Summary

slide-63
SLIDE 63

Availability

  • Code: code.karrlab.org (GitHub, PyPI)
  • Data: data.karrlab.org (Quilt)
  • Images: DockerHub
  • Primer and docs: docs.karrlab.org
  • Tutorials: sandbox.karrlab.org

68

slide-64
SLIDE 64

Summary

  • Bioengineering and medicine need WC models
  • WC modeling is becoming feasible
  • New technologies will enable WC modeling
  • Pilot models will show the feasibility of bacterial and human models

slide-65
SLIDE 65

Tips & tricks

70

slide-66
SLIDE 66

Challenges to g2p2pop

  • Build models from imperfect data
  • Capture complexity within and between scales
  • Systematically link scales
  • Scalably simulate multiple scales
  • Collaborate
slide-67
SLIDE 67

Stretch goals inspire innovation

72

slide-68
SLIDE 68

Integration enables great scope and depth

  • Data aggregation
  • Model composition
  • Multi-algorithmic co-simulation
  • Modular methods and software
  • Interdisciplinary collaboration
slide-69
SLIDE 69

Frameworks enable scalable integration

75

slide-70
SLIDE 70

Common languages enable frameworks

76

slide-71
SLIDE 71

Agent-based modeling can capture complexity

77

slide-72
SLIDE 72

Collaboration enables solutions

78

slide-73
SLIDE 73

Modularity enables collaboration

79

slide-74
SLIDE 74

Sharing promotes collaboration

  • Quilt: data
  • GitHub: code
  • PyPI: packaged code
  • Docker: computing environments
  • Google Docs, Overleaf: written documents
  • Google Drive: other files
  • GitHub issues: tasks

80

slide-75
SLIDE 75

Common practices ease collaboration

  • Interfaces between modules
  • Coarse-graining
  • Package organization
  • Coding, documentation styles
  • Software libraries

81

slide-76
SLIDE 76

QC inspires trust among collaborators

82

slide-77
SLIDE 77

Data integration enables modelers to drive science

83

slide-78
SLIDE 78

Summary

slide-79
SLIDE 79

Integration is enabling WC modeling

  • Genomic and biochemical data
  • Pathway models
  • Rule-based modeling
  • Multi-algorithmic simulation