Computational Systems Biology: Deep Learning in the Life Sciences




slide-1
SLIDE 1

Computational Systems Biology Deep Learning in the Life Sciences

6.802 6.874 20.390 20.490 HST.506

David Gifford

Lecture 19

April 16, 2020

Predicting genome editing outcomes with machine learning methods

http://mit6874.github.io

1

slide-2
SLIDE 2

Poll Warm Up:

Where are you located today? How do you prefer to receive lectures?

2

slide-3
SLIDE 3

Predicting the outcomes of genome editing

  • CRISPR (Cas9) genome editing in detail
  • Assays to detect off-target cutting
  • Machine learning models to predict off-target cutting
  • Discovering the necessary genome for Tdgf1
  • Machine learning models of on-target cutting
  • The limitations of base editing

3

slide-4
SLIDE 4

CRISPR (clustered regularly interspaced short palindromic repeats) editing mechanics

4

slide-5
SLIDE 5

Cas9 nuclease engaged in cutting

  • 17-24 nucleotide RNA spacer complementary to target DNA sequence
  • Required PAM sequence limits available cut sites (NGG for Cas9)
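
The spacer-plus-PAM constraint above can be sketched as a simple scan for candidate cut sites. This is a minimal illustration, not a guide-design tool: the function name `find_cas9_sites`, the default 20-nt spacer length (the slide gives a 17-24 nt range), and the restriction to the given strand only are all assumptions made for the example.

```python
import re

def find_cas9_sites(seq, spacer_len=20):
    """Return (start, spacer, pam) for every spacer followed by an NGG PAM.

    Only the strand as written is scanned; a real search would also scan
    the reverse complement. spacer_len=20 is an assumed default.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]

# Toy sequence (made up for illustration) containing a single NGG PAM.
example = "TTGACCTGAAGCTGATCCGTACGTAGGCA"
sites = find_cas9_sites(example)
for start, spacer, pam in sites:
    print(start, spacer, pam)
```

Sequences with no NGG downstream of a full-length spacer simply return an empty list, which is the sense in which the PAM requirement limits the available cut sites.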

5

slide-6
SLIDE 6

Genome cuts resolve in two ways

Desired outcome sequence

6

slide-7
SLIDE 7

https://www.hindawi.com/journals/bmri/2019/1369682/

CRISPR is relevant as a therapeutic tool

7

slide-8
SLIDE 8

CRISPR derivatives can implement many functions

8

slide-9
SLIDE 9

How well does CRISPR find its way to a specific site?

Characterizing off-target effects with a genome-wide assay

9

slide-10
SLIDE 10

GUIDE-seq incorporates a 34-bp phosphorothioate-modified double-stranded DNA oligo (dsODN) into cut sites

10

slide-11
SLIDE 11

GUIDE-seq identifies off-target cuts

11

slide-12
SLIDE 12

https://www-nature-com/articles/nmeth.4278.pdf

CIRCLE-seq reveals CRISPR cut sites genome wide

12

slide-13
SLIDE 13

CIRCLE-seq reveals CRISPR cut sites genome wide (arrow is intended cut site)

https://www-nature-com/articles/nmeth.4278.pdf

13

slide-14
SLIDE 14

How can we predict off-target activity of a CRISPR-based enzyme?

14

slide-15
SLIDE 15

Recall our biological model

What features should we use to predict off-target effects?

  • 17-24 nucleotide RNA spacer complementary to target DNA sequence
  • Required PAM sequence limits available cut sites

15

slide-16
SLIDE 16

Example CRISPR features for off-target prediction

  • CROP-IT grades gRNA sequences by dividing the 23-bp sequence into three regions with different weights, with penalty scores for consecutive mismatched sites
  • The CCTOP and MIT scores consider the positions and counts of mismatches
  • CFD (cutting frequency determination) emulates a large number of single-base mismatches, deletions, and insertions in the gRNA and scores these with reference to validated gRNAs in a cellular assay
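
The scores listed above all start from the same raw features: where the guide and a candidate site disagree, and how often. A minimal sketch of extracting those features follows. The penalty function is hypothetical and is not the CROP-IT, CCTOP, MIT, or CFD formula; it only illustrates position-dependent weighting, assuming sequences are written 5' to 3' with the PAM at the 3' end.

```python
def mismatch_features(guide, site):
    """Positions (0 = PAM-distal end) and count of guide/site mismatches."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have equal length")
    pairs = zip(guide.upper(), site.upper())
    positions = [i for i, (g, s) in enumerate(pairs) if g != s]
    return {"count": len(positions), "positions": positions}

def weighted_mismatch_penalty(positions, length):
    """Hypothetical penalty: mismatches closer to the PAM cost more."""
    return sum((p + 1) / length for p in positions)

# Toy guide and candidate off-target site (made up for illustration).
guide = "GACCTGAAGCTGATCCGTAC"
site = "GACCTGTAGCTGATCCGTAA"
feats = mismatch_features(guide, site)
penalty = weighted_mismatch_penalty(feats["positions"], len(guide))
print(feats, round(penalty, 2))
```

A learned model can consume the same positions and counts directly instead of relying on hand-set weights.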

16

slide-17
SLIDE 17

17

slide-18
SLIDE 18

A deep neural network for classifying CRISPR recognition of a genomic site

[Figure: network diagram. Recoverable labels: ReLU; output; 10 4x1, 10 4x2, 10 4x3, 10 4x5]
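
The extracted slide text does not say how a guide/site pair is presented to the network. If the 4x1 to 4x5 labels are filter shapes, they would fit an input with four rows, one per base. The sketch below shows one such encoding, offered as an assumption rather than the lecture's method: both sequences are one-hot encoded and combined with an element-wise OR, so a column with two 1s marks a mismatch. The names `one_hot` and `encode_pair` are made up for the example.

```python
BASES = "ACGT"

def one_hot(seq):
    """4 x len(seq) matrix (rows A, C, G, T) as nested lists of 0/1."""
    return [[1 if base == row else 0 for base in seq.upper()] for row in BASES]

def encode_pair(guide, site):
    """Element-wise OR of the two one-hot matrices (assumed encoding)."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have equal length")
    g, s = one_hot(guide), one_hot(site)
    return [[gv | sv for gv, sv in zip(g_row, s_row)]
            for g_row, s_row in zip(g, s)]

# Short toy pair with a single mismatch at the third position.
matrix = encode_pair("ACGT", "ACTT")
column_sums = [sum(col) for col in zip(*matrix)]
print(matrix)
print(column_sums)  # a 2 marks the mismatched column
```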

18

slide-19
SLIDE 19

Performance of different architectures under 5-fold cross-validation on the CRISPOR dataset
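
For reference, a minimal sketch of how a 5-fold split is formed is given below. It is a generic illustration and does not reproduce the lecture's actual splits; the function name, the shuffling seed, and the sample count of 23 are arbitrary assumptions.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx); every sample is in exactly one test fold."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

splits = list(k_fold_indices(23, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Each architecture would be trained on the `train` indices and scored on the `test` indices of every fold, with the per-fold scores then averaged.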

19

slide-20
SLIDE 20

Train on CRISPOR dataset, test on GUIDE-seq

20

slide-21
SLIDE 21

What bases are necessary for genome function?

21

slide-22
SLIDE 22

Genome editing allows us to change genome sequence and observe the function of each base in a selected cellular context

22

slide-23
SLIDE 23

Genome editing allows us to change genome sequence and observe the function of each base in a selected cellular context

23

slide-24
SLIDE 24

Genome editing allows us to change genome sequence and observe the function of each base in a selected cellular context

24

slide-25
SLIDE 25

What are the key genomic elements that are necessary for gene expression?

[Figure labels: promoter region, enhancer region, coding sequence]

25

slide-26
SLIDE 26

~650,000 TF motifs; ~50,000 binding sites for a typical TF

Necessary elements will depend upon cell type: the binding sites used by a given factor can depend upon cell type (here Tcf7l2)

Binding sites change across time

26

slide-27
SLIDE 27

An annotation of potential Tdgf1 cis-regulation

27

[Figure tracks: histone modifications, TF density, DNase-I HS]

Predictions of regulatory function based on indirect epigenomic measurements

slide-28
SLIDE 28

Idea – break parts of the genome to see what is essential for gene expression

  • Native context measurement
  • High-throughput
  • Directly observe expression of target gene via GFP
  • Controlled delivery of only 1 gRNA per cell

28

[Figure labels: Cell 1, Cell 2, ..., Cell n]

Green Fluorescent Protein (GFP) lights up the cell when the gene is expressed

slide-29
SLIDE 29

Idea – determine parts of your computer necessary for Zoom by breaking parts

29

slide-30
SLIDE 30

Idea – determine parts of your computer necessary for Zoom by breaking parts

30

slide-31
SLIDE 31

Refinement – need fine-grained resolution on what we break

slide-32
SLIDE 32

32

  • efficient
  • random indels
  • highly heterogeneous
  • impractical beyond gene disruption
  • “genome vandalism”
  • often inefficient
  • designable
  • undesirable byproducts

We can use CRISPR genome editing to make localized genome alterations that are addressed by a guide RNA (gRNA)

slide-33
SLIDE 33

Multiplexed Editing Regulatory Assay (MERA) experimental flow:

33

  1. Put one gRNA in each cell that targets a location of interest
  2. Use CRISPR to ablate the respective location in each cell
  3. Sort cells by expression of GFP
  4. Sequence gRNAs in each population to determine which locations are necessary, and which are not, for GFP expression

slide-34
SLIDE 34

Distribution of the log10 ratio of GFPneg to bulk reads for all integrated gRNAs for Tdgf1
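
The quantity plotted here can be sketched as a per-gRNA enrichment score. The version below is an assumption about the details: it normalizes each population by its total read count and adds a pseudocount to avoid taking the log of zero, neither of which is stated on the slide. The read counts are made up for illustration.

```python
import math

def log10_enrichment(gfp_neg_counts, bulk_counts, pseudocount=1.0):
    """log10 ratio of GFP-negative to bulk read fractions for each gRNA.

    Library-size normalization and the pseudocount are assumptions.
    """
    n = len(bulk_counts)
    neg_total = sum(gfp_neg_counts.values()) + pseudocount * n
    bulk_total = sum(bulk_counts.values()) + pseudocount * n
    scores = {}
    for grna, bulk in bulk_counts.items():
        neg = gfp_neg_counts.get(grna, 0)
        neg_frac = (neg + pseudocount) / neg_total
        bulk_frac = (bulk + pseudocount) / bulk_total
        scores[grna] = math.log10(neg_frac / bulk_frac)
    return scores

# Toy counts: g1 is over-represented among GFP-negative cells.
bulk = {"g1": 100, "g2": 100, "g3": 100}
gfp_neg = {"g1": 90, "g2": 5, "g3": 5}
scores = log10_enrichment(gfp_neg, bulk)
print(scores)
```

A gRNA with a high score is enriched among cells that lost GFP expression, which suggests that the location it targets is necessary for expression.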

34

slide-35
SLIDE 35

MERA enables systematic identification of required cis-regulatory elements for Tdgf1

35

slide-36
SLIDE 36

Testing of individual gRNAs supports required cis-regulatory elements for Tdgf1

36

slide-37
SLIDE 37

Necessary genome goes beyond known annotations (Tdgf1)

37

[Figure label: genomic regions sorted by importance]

slide-38
SLIDE 38

How can we predict the genotypes of on target CRISPR cuts?

38

slide-39
SLIDE 39

39

  • efficient
  • random indels
  • highly heterogeneous
  • impractical beyond gene disruption
  • “genome vandalism”
  • often inefficient
  • designable
  • undesirable byproducts

The state of CRISPR genome editing

slide-40
SLIDE 40

40

  • efficient
  • predictable indels
  • can be homogeneous
  • practical: repair of pathogenic alleles to wild-type
  • “genome art”
  • often inefficient
  • designable
  • predictable byproducts

The state of CRISPR genome editing

slide-41
SLIDE 41

High-throughput genome-integrated assay of Cas9-mediated DNA repair

41

  • 96 target sites in largest previous study
  • Designed 1,872 target sites (55-bp) based on the human genome
  • Observed 1,262 unique genotypes / target site

Cas9 editing without a homology template is predictable, can be precise, and can be practical for disease correction.

slide-42
SLIDE 42

42

Cas9 primarily causes microhomology deletions in genome-integrated and endogenous settings

mESC

A microhomology deletion is a deletion with multiple equal-scoring alignments
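
The definition above can be made concrete: slide a deletion of fixed length across the cut site and group the start positions by the genotype they produce. A genotype reached from more than one start position has multiple equally good alignments, which is the signature of microhomology. This is a minimal sketch with made-up names and a toy sequence, not the inDelphi implementation.

```python
from collections import defaultdict

def deletion_alignments(seq, cut, length):
    """Map each genotype from a deletion of `length` touching `cut`
    to the list of start offsets that produce it.

    `cut` is the index of the first base to the right of the break.
    """
    outcomes = defaultdict(list)
    first = max(0, cut - length)
    last = min(cut, len(seq) - length)
    for start in range(first, last + 1):
        outcomes[seq[:start] + seq[start + length:]].append(start)
    return dict(outcomes)

def microhomology_deletions(seq, cut, length):
    """Keep only genotypes reachable by more than one alignment."""
    return {g: s for g, s in deletion_alignments(seq, cut, length).items()
            if len(s) > 1}

# Toy sequence: "ACG" occurs on both sides of the cut (CCACGTT | GGACGAA).
seq = "CCACGTTGGACGAA"
mh = microhomology_deletions(seq, cut=7, length=7)
print(mh)
```

In this toy case a 3-bp microhomology gives four equivalent alignments of the same 7-bp deletion, while the remaining start positions each yield a distinct genotype.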

slide-43
SLIDE 43

43

Cas9 primarily causes microhomology deletions in genome-integrated and endogenous settings

mESC

slide-44
SLIDE 44

44

Majority of repair products arise from microhomology-mediated end-joining (MMEJ)

mESC

slide-45
SLIDE 45

45

inDelphi predicts 90% of repair products from 3 major repair classes

mESC

slide-46
SLIDE 46

46

1-bp insertions copy the adjacent nucleotide


slide-47
SLIDE 47

47

1-bp insertion frequency depends on local sequence context


slide-48
SLIDE 48

48

inDelphi accurately predicts nearly all repair outcomes

Input: Sequence, cutsite

  • Predicts 90% of observed repair outcomes
  • 70% at single-base resolution

Training and testing on held-out cell types

  • Median r = 0.87 on genotype prediction
  • Median r = 0.84 on indel length prediction
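
The reported values are medians of a per-site Pearson correlation between predicted and observed frequencies. A minimal sketch of that summary statistic follows. The frequency vectors are made up for illustration and are not data from the lecture.

```python
import math
from statistics import median

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    # Undefined (division by zero) if either input is constant.
    return cov / math.sqrt(var_x * var_y)

# Toy (predicted, observed) genotype frequencies for two target sites.
sites = [
    ([0.50, 0.30, 0.20], [0.55, 0.25, 0.20]),
    ([0.10, 0.70, 0.20], [0.20, 0.60, 0.20]),
]
median_r = median([pearson_r(pred, obs) for pred, obs in sites])
print(round(median_r, 3))
```
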
slide-49
SLIDE 49

49

inDelphi accurately predicts frameshifts
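
A frameshift frequency follows directly from a predicted indel-length distribution: any indel whose length is not a multiple of 3 shifts the reading frame. The sketch below assumes the distribution is given as a mapping from signed indel length to frequency among edited products; that input format and the toy numbers are assumptions for illustration.

```python
def frameshift_fraction(indel_length_freqs):
    """Fraction of edited products whose indel length is not a multiple of 3.

    Keys are signed lengths (negative = deletion, positive = insertion);
    unedited products (length 0) are assumed to be excluded by the caller.
    """
    total = sum(indel_length_freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    shifted = sum(f for length, f in indel_length_freqs.items()
                  if length % 3 != 0)
    return shifted / total

# Toy distribution (made up): +1 insertion and several deletion lengths.
toy = {+1: 0.2, -1: 0.1, -2: 0.1, -3: 0.3, -6: 0.1, -7: 0.2}
fs = frameshift_fraction(toy)
print(round(fs, 2))
```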


slide-50
SLIDE 50

50

Target sites yielding a single deletion repair genotype >50% of the time


slide-51
SLIDE 51

51

Target sites yielding a single insertion repair genotype >50% of the time

[Figure labels: weak microhomology, local sequence context]

slide-52
SLIDE 52

How can we use DNA cuts to restore function?

52

slide-53
SLIDE 53

53

inDelphi predicts that 5% of gRNAs yield a single repair genotype the majority of the time
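
A statistic of this kind can be computed from per-gRNA predictions by checking whether the most frequent predicted genotype exceeds 50%. The sketch below uses a hypothetical input format (gRNA id mapped to predicted genotype frequencies) and toy numbers; it does not call inDelphi.

```python
def fraction_precise(predictions, threshold=0.5):
    """Return (fraction, ids) of gRNAs whose top genotype exceeds threshold.

    predictions: gRNA id -> {genotype label: predicted frequency}.
    """
    precise = [g for g, freqs in predictions.items()
               if max(freqs.values()) > threshold]
    return len(precise) / len(predictions), precise

# Toy predictions (made up); only gRNA_a has a majority genotype.
predictions = {
    "gRNA_a": {"del_3": 0.62, "ins_1": 0.20},
    "gRNA_b": {"del_1": 0.30, "ins_1": 0.35},
    "gRNA_c": {"del_7": 0.45, "del_2": 0.40},
    "gRNA_d": {"ins_1": 0.50, "del_5": 0.25},
}
fraction, precise_ids = fraction_precise(predictions)
print(fraction, precise_ids)
```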


slide-54
SLIDE 54

54

Pathogenic microduplications are efficiently repairable to wild-type with simple Cas9 cutting
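
The reason a microduplication is a good target can be sketched directly: the two tandem copies are perfect microhomology for each other, so a deletion that removes exactly one copy restores the wild-type sequence. The function name and the toy sequences below are made up for illustration.

```python
def collapse_tandem_duplication(seq, dup_start, dup_len):
    """Remove one copy of a tandem duplication that begins at dup_start.

    Raises ValueError if seq[dup_start:dup_start + dup_len] is not
    immediately repeated.
    """
    first = seq[dup_start:dup_start + dup_len]
    second = seq[dup_start + dup_len:dup_start + 2 * dup_len]
    if len(first) != dup_len or first != second:
        raise ValueError("no tandem duplication at the given position")
    return seq[:dup_start + dup_len] + seq[dup_start + 2 * dup_len:]

# Toy alleles: the mutant carries a duplicated "TAGC".
wild_type = "ATGGCTAGCTTGA"
mutant = "ATGGCTAGCTAGCTTGA"
repaired = collapse_tandem_duplication(mutant, dup_start=5, dup_len=4)
print(repaired == wild_type)
```

Whether a cell actually produces this outcome at high frequency depends on the repair process, which is what the model is used to predict.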


slide-55
SLIDE 55

55

inDelphi identified 183 pathogenic alleles corrected to wild-type at >50% frequency (r = 0.64)

  • 23,018: ClinVar and HGMD
  • 1,592: Identified and designed with highest repair %
  • 865: Candidates for wild-type repair after quality filtering
  • 682: Candidates for wild-type frame repair after quality filtering

slide-56
SLIDE 56

56

Efficient repair of pathogenic alleles to wild-type with template-free Cas9-nuclease treatment

  • Primary patient-derived fibroblasts
  • Human and mouse cell lines
  • SpCas9 and SaCas9
  • HPS1 71%, LDLR 77%, PORCN 48%, GAA 68%, GLB1 42%


slide-57
SLIDE 57

Correcting Duchenne muscular dystrophy exon 44 deletion with CRISPR

https://advances.sciencemag.org/content/5/3/eaav4324

slide-58
SLIDE 58

58

  • efficient
  • predictable indels
  • can be homogeneous
  • practical: repair of pathogenic alleles to wild-type
  • “genome art”
  • often inefficient
  • designable
  • predictable byproducts

The state of CRISPR genome editing

slide-59
SLIDE 59

59

slide-60
SLIDE 60

60

slide-61
SLIDE 61

61

slide-62
SLIDE 62

The limitations of base editing

62

slide-63
SLIDE 63

Cytosine base editors (CBEs) have issues with untargeted edits

slide-64
SLIDE 64

Adenine base editors (ABEs) have issues with untargeted edits

slide-65
SLIDE 65

Prime editing is guided by pegRNAs – lower off-target rate, but not zero

[C->T, G->A, A->G, T->C] [C->A, C->G, G->C, G->T, A->C, A->T, T->A, T->G]
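
The first bracketed group consists of the four transitions (purine to purine or pyrimidine to pyrimidine) and the second of the eight transversions. The slide text does not label the groups, so this reading is an inference from the substitutions themselves. A minimal classifier:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES

def substitution_class(ref, alt):
    """Classify a single-base substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

group_1 = ["C>T", "G>A", "A>G", "T>C"]
group_2 = ["C>A", "C>G", "G>C", "G>T", "A>C", "A>T", "T>A", "T>G"]
classes_1 = {substitution_class(*s.split(">")) for s in group_1}
classes_2 = {substitution_class(*s.split(">")) for s in group_2}
print(classes_1, classes_2)
```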

slide-66
SLIDE 66

FIN - Thank You