Computational Systems Biology Deep Learning in the Life Sciences
6.802 6.874 20.390 20.490 HST.506
David Gifford Lecture 19 April 16, 2020
Predicting genome editing outcomes with machine learning methods
http://mit6874.github.io
Predicting the outcomes of genome editing
Cas9 nuclease engaged in cutting
17-24 nucleotide RNA spacer complementary to the target DNA sequence. The required PAM sequence (NGG for Cas9) limits the available cut sites.
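A minimal sketch of how the PAM requirement restricts where Cas9 can cut. The helper name is hypothetical; it assumes SpCas9 (NGG PAM), a 20-nt spacer (within the 17-24 nt range above), scans only the forward strand, and reports the blunt cut 3 bp 5' of the PAM. A full scan would also search the reverse complement.

```python
import re

def find_spcas9_sites(seq: str, spacer_len: int = 20):
    """Scan the forward strand of `seq` for NGG PAMs (hypothetical helper).

    Returns (spacer, pam, cut_index) tuples, where cut_index is the 0-based
    index of the first base 3' of the blunt cut. SpCas9 cuts 3 bp upstream
    (5') of the PAM.
    """
    seq = seq.upper()
    sites = []
    # The lookahead lets overlapping PAMs (e.g. in "GGGG") all be reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < spacer_len:
            continue  # too close to the start for a full-length spacer
        spacer = seq[pam_start - spacer_len:pam_start]
        sites.append((spacer, match.group(1), pam_start - 3))
    return sites

print(find_spcas9_sites("ACGTACGTACGTACGTACGTAGGC"))
# [('ACGTACGTACGTACGTACGT', 'AGG', 17)]
```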
Genome cuts resolve in two ways
Desired outcome sequence
https://www.hindawi.com/journals/bmri/2019/1369682/
CRISPR is relevant as a therapeutic tool
CRISPR derivatives can implement many functions
GUIDE-seq incorporates a 34-bp phosphorothioated double-stranded DNA oligo (dsODN) into cut sites
GUIDE-seq identifies off target cuts
https://www-nature-com/articles/nmeth.4278.pdf
CIRCLE-seq reveals CRISPR cut sites genome-wide
CIRCLE-seq reveals CRISPR cut sites genome-wide (arrow marks the intended cut site)
https://www-nature-com/articles/nmeth.4278.pdf
Recall our biological model. What features should we use to predict off-target effects?
17-24 nucleotide RNA spacer complementary to the target DNA sequence. The required PAM sequence limits the available cut sites.
Example CRISPR features for off-target prediction
Sequence divided into three regions with different weights; penalty scores for consecutive mismatched sites
Mismatches
Number of single-base mismatches, deletions, and insertions in the gRNA, scored with reference to gRNAs validated in a cellular assay
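The features above can be illustrated with a toy scoring function. This is a hedged sketch, not any published score: the region boundaries (8, 13), the region weights, and the consecutive-mismatch penalty are hypothetical placeholders, and insertions/deletions (bulges) are ignored.

```python
def mismatch_penalty(guide: str, site: str,
                     boundaries=(8, 13),
                     region_weights=(0.2, 0.5, 1.0),
                     consecutive_penalty=0.5):
    """Toy position-weighted mismatch score for an aligned guide/site pair.

    The spacer is split into three regions at the hypothetical `boundaries`;
    index 0 is the PAM-distal 5' end, so the last region is PAM-proximal.
    Returns (penalty, number_of_mismatches); a higher penalty suggests the
    site is less likely to be cut.
    """
    if len(guide) != len(site):
        raise ValueError("guide and site must be aligned to the same length")
    mismatches = [i for i, (g, s) in enumerate(zip(guide.upper(), site.upper()))
                  if g != s]
    penalty = 0.0
    for i in mismatches:
        region = sum(i >= b for b in boundaries)  # 0, 1 or 2
        penalty += region_weights[region]
    # Extra penalty for every pair of adjacent mismatched positions.
    adjacent_pairs = sum(1 for a, b in zip(mismatches, mismatches[1:]) if b == a + 1)
    penalty += consecutive_penalty * adjacent_pairs
    return penalty, len(mismatches)

guide = "ACGTACGTACGTACGTACGT"
print(mismatch_penalty(guide, guide[:18] + "AA"))  # (2.5, 2)
```

Two adjacent PAM-proximal mismatches score far higher than one PAM-distal mismatch, mirroring the intuition that the seed region matters most.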
A deep neural network for classifying CRISPR recognition of a genomic site
ReLU activation; 10 convolution filters each of sizes 4x1, 4x2, 4x3, and 4x5
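The architecture above can be sketched in plain Python. Assumptions not stated on the slide: the gRNA/target pair is encoded as a 4 x L matrix by OR-ing the two one-hot codes (one common choice for this task), each of the 10 filters per width spans all four rows, and ReLU is followed by global max pooling and a single sigmoid output. Weights are random and untrained, and all function names are hypothetical.

```python
import math
import random

BASES = "ACGT"

def encode_pair(guide: str, target: str):
    """4 x L matrix (rows A, C, G, T). Each column ORs the one-hot codes of the
    guide and target bases, so a match sets one row and a mismatch sets two."""
    if len(guide) != len(target):
        raise ValueError("guide and target must be aligned to the same length")
    return [[1.0 if b in (g, t) else 0.0 for g, t in zip(guide.upper(), target.upper())]
            for b in BASES]

def conv_relu_maxpool(x, kernel):
    """Slide a 4 x w kernel along the sequence axis (valid positions only),
    apply ReLU, and keep the maximum response (global max pooling)."""
    width, length = len(kernel[0]), len(x[0])
    best = 0.0  # ReLU floor: negative responses map to 0
    for start in range(length - width + 1):
        s = sum(kernel[r][c] * x[r][start + c] for r in range(4) for c in range(width))
        best = max(best, s)
    return best

def init_model(widths=(1, 2, 3, 5), n_filters=10, seed=0):
    """Random, untrained weights: n_filters kernels of shape 4 x w per width."""
    rng = random.Random(seed)
    kernels = [[[rng.gauss(0.0, 0.5) for _ in range(w)] for _ in range(4)]
               for w in widths for _ in range(n_filters)]
    dense = [rng.gauss(0.0, 0.5) for _ in kernels]
    return kernels, dense, 0.0

def predict_cut_probability(model, guide: str, target: str) -> float:
    kernels, dense, bias = model
    x = encode_pair(guide, target)
    features = [conv_relu_maxpool(x, k) for k in kernels]  # 40 features
    z = bias + sum(w * f for w, f in zip(dense, features))
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid output

model = init_model()
print(predict_cut_probability(model, "ACGTACGTACGTACGTACGTAGG",
                              "ACGTACGTACGTACGTACGAAGG"))
```

In practice the weights would be fit to labelled off-target data; the sketch only shows how filters of several widths read the encoded pair.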
Performance of different architectures under 5-fold cross-validation on the CRISPOR dataset
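For reference, 5-fold cross-validation partitions the examples into five folds and evaluates each model five times, each time holding out a different fold. A minimal sketch with a hypothetical helper name and a plain random split:

```python
import random

def k_fold_splits(n_examples: int, k: int = 5, seed: int = 0):
    """Yield (train_indices, test_indices) for k-fold cross-validation.
    Every example appears in exactly one test fold."""
    order = list(range(n_examples))
    random.Random(seed).shuffle(order)
    folds = [sorted(order[i::k]) for i in range(k)]
    for i, test in enumerate(folds):
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test

for train, test in k_fold_splits(23, k=5):
    print(len(train), len(test))
```

Off-target datasets are heavily imbalanced and many sites share one gRNA, so a stratified or gRNA-grouped split may be more appropriate; the slide does not say which split was used.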
Train on the CRISPOR dataset, test on GUIDE-seq
Promoter region Enhancer region Coding sequence
~650,000 TF motifs; ~50,000 binding sites for a typical TF
Binding sites change across time
An annotation of potential Tdgf1 cis-regulation
Histone modifications, TF density, DNase I HS
Predictions of regulatory function based on indirect epigenomic measurements
Cell 1 Cell 2 Cell n
Green Fluorescent Protein (GFP) lights up a cell when the gene is expressed
disruption
We can use CRISPR genome editing to make localized genome alterations that are addressed by a guide RNA (gRNA)
Multiplexed Editing Regulatory Assay (MERA) experimental flow:
necessary, what locations are not necessary for GFP expression
Distribution of the log10 ratio of GFPneg to bulk reads for all integrated gRNAs for Tdgf1
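One way to compute the plotted quantity, sketched under assumptions the slide does not state: read counts are normalized to library size and a pseudocount of 1 is added so gRNAs with zero GFP-negative reads still get a finite score. The function name and dict-based inputs are hypothetical.

```python
import math

def log10_gfpneg_vs_bulk(gfpneg_counts: dict, bulk_counts: dict, pseudocount: float = 1.0):
    """Per-gRNA log10 ratio of library-size-normalized GFP-negative reads to
    bulk reads. Positive values mean the gRNA is enriched in GFP-negative
    cells, suggesting its target region is needed for GFP expression."""
    n = len(bulk_counts)
    neg_total = sum(gfpneg_counts.get(g, 0) for g in bulk_counts) + pseudocount * n
    bulk_total = sum(bulk_counts.values()) + pseudocount * n
    scores = {}
    for g, bulk in bulk_counts.items():
        neg_frac = (gfpneg_counts.get(g, 0) + pseudocount) / neg_total
        bulk_frac = (bulk + pseudocount) / bulk_total
        scores[g] = math.log10(neg_frac / bulk_frac)
    return scores

print(log10_gfpneg_vs_bulk({"g1": 99, "g2": 9}, {"g1": 9, "g2": 99}))
```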
MERA enables systematic identification of required cis-regulatory elements for Tdgf1
Testing of individual gRNAs supports required cis-regulatory elements for Tdgf1
Necessary genome goes beyond know
(Tdgf1)
Genomic regions sorted by importance
disruption
The state of CRISPR genome editing
alleles to wild-type
High-throughput genome-integrated assay of Cas9-mediated DNA repair
Cas9 editing without a homology template is predictable, can be precise, and can be practical for disease correction.
Cas9 primarily causes microhomology deletions in genome-integrated and endogenous settings
(mESC) A microhomology deletion is a deletion with multiple equal-scoring alignments.
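The definition above translates directly into code: count how many placements of a deletion of the same length yield the identical product sequence. A minimal sketch with hypothetical helper names:

```python
def equal_scoring_deletion_starts(seq: str, start: int, length: int):
    """All start positions j where deleting seq[j:j+length] gives the same
    product as deleting seq[start:start+length]."""
    product = seq[:start] + seq[start + length:]
    return [j for j in range(len(seq) - length + 1)
            if seq[:j] + seq[j + length:] == product]

def is_microhomology_deletion(seq: str, start: int, length: int) -> bool:
    # More than one equal-scoring alignment means the deleted segment is
    # flanked by a repeated stretch (microhomology).
    return len(equal_scoring_deletion_starts(seq, start, length)) > 1

print(equal_scoring_deletion_starts("GGATCAAAATCTT", 2, 6))  # [2, 3, 4, 5]
```

Here the 3-bp microhomology "ATC" flanks the deleted "ATCAAA", giving 3 + 1 = 4 equal-scoring placements.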
Majority of repair products arise from microhomology-mediated end-joining (MMEJ)
(mESC)
inDelphi predicts 90% of repair products from 3 major repair classes
(mESC)
1-bp insertions copy the adjacent nucleotide
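A sketch of a templated 1-bp insertion, assuming the copied base is the one immediately 5' of the cut (the PAM-distal side). The helper name is hypothetical, and `cut` is the 0-based index of the first base 3' of the cut.

```python
def templated_1bp_insertion(seq: str, cut: int) -> str:
    """Product of a 1-bp insertion that duplicates the base just 5' of the cut.

    seq[cut - 1] is the copied nucleotide (assumption: PAM-distal side).
    """
    if not 0 < cut < len(seq):
        raise ValueError("cut must lie strictly inside seq")
    return seq[:cut] + seq[cut - 1] + seq[cut:]

# 20-nt spacer followed by an AGG PAM; the cut falls 3 bp 5' of the PAM.
site = "ACGTACGTACGTACGTACGTAGG"
print(templated_1bp_insertion(site, cut=17))  # ACGTACGTACGTACGTAACGTAGG
```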
1-bp insertion frequency depends on local sequence context
inDelphi accurately predicts nearly all repair outcomes
Input: sequence, cut site
Training and testing on held-out cell types
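Given only a sequence and a cut site, the space of outcomes in the three major repair classes can be enumerated before any model assigns frequencies. A sketch under stated assumptions: deletions are 1 to `max_del` bp long and must touch the cut, each distinct deletion product is labelled with the equal-scoring-alignment definition of a microhomology deletion, and all four 1-bp insertions are listed. It predicts no frequencies; that is what a trained model adds. The function name is hypothetical.

```python
def candidate_repair_products(seq: str, cut: int, max_del: int = 10) -> dict:
    """Map each distinct candidate product to its repair class.

    Deletions of 1..max_del bp must touch the cut (start <= cut <= end), where
    `cut` is the 0-based index of the first base 3' of the cut. A deletion
    product is labelled "MH deletion" if more than one placement of that
    length anywhere in `seq` yields it (equal-scoring alignments).
    """
    products = {}
    for length in range(1, max_del + 1):
        for j in range(max(0, cut - length), min(cut, len(seq) - length) + 1):
            product = seq[:j] + seq[j + length:]
            if product in products:
                continue
            n_alignments = sum(1 for k in range(len(seq) - length + 1)
                               if seq[:k] + seq[k + length:] == product)
            products[product] = "MH deletion" if n_alignments > 1 else "MH-less deletion"
    for base in "ACGT":
        # All four possible 1-bp insertions at the cut.
        products.setdefault(seq[:cut] + base + seq[cut:], "1-bp insertion")
    return products

for product, label in candidate_repair_products("GGATCAAAATCTT", 5, max_del=3).items():
    print(label, product)
```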
inDelphi accurately predicts frameshifts
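Frameshift frequency follows from a predicted indel-length distribution: an indel shifts the reading frame when its length is not a multiple of 3. A minimal sketch, assuming a dict from signed indel length to frequency (a hypothetical format):

```python
def frameshift_fraction(indel_freqs: dict) -> float:
    """Fraction of outcomes (by frequency) whose net indel length is not a
    multiple of 3. Keys are signed indel lengths (+ insertion, - deletion);
    values are predicted frequencies and need not sum to 1."""
    total = sum(indel_freqs.values())
    shifted = sum(f for length, f in indel_freqs.items() if length % 3 != 0)
    return shifted / total

print(frameshift_fraction({-1: 0.2, 1: 0.3, -3: 0.4, -6: 0.1}))
```

Python's `%` returns a non-negative remainder for negative lengths, so -3 and -6 count as in-frame while -1 does not.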
Target sites yielding a single deletion repair genotype >50% of the time
Target sites yielding a single insertion repair genotype >50% of the time
Weak microhomology; local sequence context
inDelphi predicts that 5% of gRNAs yield a single repair genotype the majority of the time
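Selecting gRNAs with a single majority repair genotype from model output is a simple filter. A sketch assuming predictions are stored as a dict mapping each gRNA to its predicted genotype distribution (a hypothetical format and function name):

```python
def precise_grnas(predictions: dict, threshold: float = 0.5) -> list:
    """gRNAs whose most frequent predicted genotype exceeds `threshold`
    of all predicted outcomes (frequencies are renormalized per gRNA)."""
    return sorted(g for g, dist in predictions.items()
                  if max(dist.values()) / sum(dist.values()) > threshold)

predictions = {
    "g1": {"1-bp del": 0.6, "1-bp ins": 0.4},
    "g2": {"a": 0.5, "b": 0.5},
    "g3": {"x": 0.3, "y": 0.3, "z": 0.4},
}
print(precise_grnas(predictions))  # ['g1']
```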
Pathogenic microduplications are efficiently repairable to wild-type with simple Cas9 cutting
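Why microduplications are good candidates: the duplicated unit is its own microhomology, so a microhomology deletion that removes one copy restores the wild-type sequence. A sketch with hypothetical helper names that locates a tandem duplication and applies that deletion:

```python
def find_microduplication(wild_type: str, mutant: str):
    """If `mutant` is `wild_type` with one tandem duplication of d bases,
    return (i, unit) such that mutant[i:i+d] repeats unit = wild_type[i-d:i].
    Returns None otherwise. Tandem duplications are positionally ambiguous;
    the leftmost valid i is reported."""
    d = len(mutant) - len(wild_type)
    if d <= 0:
        return None
    for i in range(d, len(wild_type) + 1):
        if wild_type[:i] + wild_type[i - d:i] + wild_type[i:] == mutant:
            return i, wild_type[i - d:i]
    return None

def delete_one_copy(mutant: str, i: int, d: int) -> str:
    # The extra copy mutant[i:i+d] is flanked by an identical copy
    # mutant[i-d:i] (a d-bp microhomology), so this is an MH deletion.
    return mutant[:i] + mutant[i + d:]

wild, mutant = "CCGATGTTT", "CCGATGATGTTT"  # "ATG" duplicated (equivalently "GAT")
i, unit = find_microduplication(wild, mutant)
print(i, unit)                                        # 5 GAT
print(delete_one_copy(mutant, i, len(unit)) == wild)  # True
```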
23,018
and HGMD
1,592
designed with highest repair %
865
for wild-type repair after quality filtering
682
Candidates for wild-type frame repair after quality filtering
inDelphi identified 183 pathogenic alleles corrected to wild-type at >50% frequency (r = 0.64)
Efficient repair of pathogenic alleles to wild-type with template-free Cas9-nuclease treatment
Human and mouse cell lines
LDLR 77%, PORCN 48%, GAA 68%, GLB1 42%
Correcting Duchenne muscular dystrophy exon 44 deletion with CRISPR
https://advances.sciencemag.org/content/5/3/eaav4324
Cytosine base editors (CBEs) have issues with untargeted edits
Adenine base editors (ABEs) have issues with untargeted edits
Prime editing is guided by pegRNAs: lower off-target rate, but not zero
Transitions: [C->T, G->A, A->G, T->C]
Transversions: [C->A, C->G, G->C, G->T, A->C, A->T, T->A, T->G]
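The two groups above can be captured in a small lookup. A sketch assuming the standard editor chemistries: CBEs convert C:G to T:A (read as C->T or G->A on a given strand), ABEs convert A:T to G:C (A->G or T->C), and prime editing can in principle install any of the 12 substitutions. The helper name is hypothetical, and PAM availability and editing windows are ignored.

```python
BASES = set("ACGT")
CBE_EDITS = {("C", "T"), ("G", "A")}   # C:G -> T:A, read on either strand
ABE_EDITS = {("A", "G"), ("T", "C")}   # A:T -> G:C, read on either strand
TRANSITIONS = CBE_EDITS | ABE_EDITS

def editor_options(ref: str, alt: str):
    """Classify a single-base substitution and list editor classes that could
    install it (a simplification that ignores PAMs and editing windows)."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in BASES or alt not in BASES:
        raise ValueError("expected two different bases from A, C, G, T")
    kind = "transition" if (ref, alt) in TRANSITIONS else "transversion"
    editors = []
    if (ref, alt) in CBE_EDITS:
        editors.append("CBE")
    if (ref, alt) in ABE_EDITS:
        editors.append("ABE")
    editors.append("prime editor")  # pegRNA templates can encode any substitution
    return kind, editors

print(editor_options("C", "T"))  # ('transition', ['CBE', 'prime editor'])
print(editor_options("A", "C"))  # ('transversion', ['prime editor'])
```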