Computational Systems Biology Deep Learning in the Life Sciences - PowerPoint PPT Presentation


  1. Computational Systems Biology Deep Learning in the Life Sciences 6.802 6.874 20.390 20.490 HST.506 David Gifford Lecture 19 April 16, 2020 Predicting genome editing outcomes with machine learning methods http://mit6874.github.io 1

  2. Poll Warm Up: Where are you located today? How do you prefer to receive lectures? 2

  3. Predicting the outcomes of genome editing • CRISPR (Cas9) genome editing in detail • Assays to detect off target cutting • Machine learning models to predict off target cutting • Discovering the necessary genome for Tdgf1 • Machine learning models of on target cutting • The limitations of base editing 3

  4. CRISPR (clustered regularly interspaced short palindromic repeats) editing mechanics 4

  5. Cas9 nuclease engaged in cutting • Required PAM sequence limits available cut sites (NGG for Cas9) • 17-24 nucleotide RNA spacer complementary to target DNA sequence 5
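The PAM constraint on this slide can be illustrated with a short scan for candidate sites. This is a minimal sketch, not part of the lecture: the function name `find_cas9_sites` is hypothetical, the 20-nt default spacer length is an assumption within the 17-24 nt range given above, only the forward strand is scanned, and the cut position uses the commonly cited SpCas9 offset of 3 bp upstream of the PAM.

```python
import re


def find_cas9_sites(seq, spacer_len=20):
    """Return (spacer, pam, cut_index) for each NGG PAM on the given strand.

    Assumptions: forward strand only, a fixed spacer length, and a blunt
    cut 3 bp upstream of the PAM.
    """
    seq = seq.upper()
    sites = []
    # A lookahead is used so that overlapping PAMs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < spacer_len:
            continue  # not enough upstream sequence for a full spacer
        spacer = seq[pam_start - spacer_len:pam_start]
        cut_index = pam_start - 3  # index of the first base right of the cut
        sites.append((spacer, match.group(1), cut_index))
    return sites


if __name__ == "__main__":
    for site in find_cas9_sites("ACGTACGTACGTACGTACGTAGGTT"):
        print(site)
```

A real guide design tool would also scan the reverse complement and filter candidates by predicted off target activity.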

  6. Genome cuts resolve in two ways Desired outcome sequence 6

  7. CRISPR is relevant as a therapeutic tool 7 https://www.hindawi.com/journals/bmri/2019/1369682/

  8. CRISPR derivatives can implement many functions 8

  9. How well does CRISPR find its way to a specific site? Characterizing off target effects with a genome wide assay 9

  10. GUIDE-seq incorporates a 34-bp phosphorothioated double-stranded DNA oligo (dsODN) into cut sites 10

  11. GUIDE-seq identifies off target cuts 11

  12. CIRCLE-seq reveals CRISPR cut sites genome wide 12 https://www-nature-com/articles/nmeth.4278.pdf

  13. CIRCLE-seq reveals CRISPR cut sites genome wide (arrow is intended cut site) 13 https://www-nature-com/articles/nmeth.4278.pdf

  14. How can we predict off target activity of a CRISPR based enzyme? 14

  15. Recall our biological model: what features should we use to predict off target effects? • Required PAM sequence limits available cut sites • 17-24 nucleotide RNA spacer complementary to target DNA sequence 15

  16. Example CRISPR features for off target prediction • CROP-IT grades gRNA sequences by dividing the 23-bp sequence into three regions with different weights, with penalty scores for consecutive mismatched sites • CCTop and the MIT score consider the positions and counts of mismatches • CFD (cutting frequency determination) emulates a large number of single-base mismatches, deletions, and insertions in the gRNA and scores these with reference to validated gRNAs in a cellular assay 16
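The general idea behind these hand-built scores (position-dependent mismatch weights plus a penalty for consecutive mismatches) can be sketched as follows. This is an illustrative toy only: the weights, the penalty value, and the function names are hypothetical and do not reproduce the published CROP-IT, CCTop, MIT, or CFD formulas.

```python
def mismatch_positions(guide, site):
    """Indices (0 = PAM-distal end) where the guide and the site differ."""
    return [i for i, (g, s) in enumerate(zip(guide, site)) if g != s]


def toy_off_target_score(guide, site, weights=None, consecutive_penalty=0.5):
    """Toy score in [0, 1]; 1.0 means a perfect match.

    Hypothetical scheme: each mismatch multiplies the score by (1 - weight),
    with weights growing toward the PAM-proximal (3') end, and every pair of
    adjacent mismatches applies an extra multiplicative penalty.
    """
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    n = len(guide)
    if weights is None:
        # Assumed weights: mismatches near the PAM cost more.
        weights = [0.9 * (i + 1) / n for i in range(n)]
    mismatches = mismatch_positions(guide, site)
    score = 1.0
    for i in mismatches:
        score *= 1.0 - weights[i]
    adjacent_pairs = sum(1 for a, b in zip(mismatches, mismatches[1:]) if b - a == 1)
    return score * consecutive_penalty ** adjacent_pairs


if __name__ == "__main__":
    guide = "ACGTACGTACGTACGTACGT"
    print(toy_off_target_score(guide, guide))
    print(toy_off_target_score(guide, "C" + guide[1:]))   # PAM-distal mismatch
    print(toy_off_target_score(guide, guide[:-1] + "A"))  # PAM-proximal mismatch
```

Under these assumed weights, a PAM-distal mismatch is tolerated far better than a PAM-proximal one, which is the qualitative behavior such features are meant to capture.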

  17. 17

  18. A deep neural network for classifying CRISPR recognition of a genomic site [Figure: convolution filters of sizes 4x1, 4x2, 4x3, and 4x5 (10 of each) with ReLU, leading to the output] 18
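The figure labels on this slide suggest a first layer of convolution filters of several widths applied to a 4-row one-hot sequence encoding, followed by ReLU. A minimal sketch of that reading is below; it assumes 10 filters at each of the widths 1, 2, 3, and 5, uses random untrained weights, and omits whatever layers follow in the actual network, which the slide text does not specify. All function names are hypothetical.

```python
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a 4 x L matrix (rows A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for base in seq.upper()] for b in BASES]


def conv_relu(x, kernel):
    """Valid 1D convolution of a 4 x L input with one 4 x k kernel, then ReLU."""
    length, k = len(x[0]), len(kernel[0])
    out = []
    for start in range(length - k + 1):
        total = sum(kernel[r][j] * x[r][start + j] for r in range(4) for j in range(k))
        out.append(max(0.0, total))
    return out


def random_filter_bank(widths=(1, 2, 3, 5), n_filters=10, seed=0):
    """Untrained filters: for each width k, n_filters kernels of shape 4 x k."""
    rng = random.Random(seed)
    return {
        k: [[[rng.uniform(-1, 1) for _ in range(k)] for _ in range(4)]
            for _ in range(n_filters)]
        for k in widths
    }


def feature_maps(seq, bank):
    """Apply every kernel in the bank to the one-hot encoded sequence."""
    x = one_hot(seq)
    return {k: [conv_relu(x, kernel) for kernel in kernels] for k, kernels in bank.items()}


if __name__ == "__main__":
    maps = feature_maps("ACGTACGTACGTACGTACGTAGG", random_filter_bank())
    for width, fmaps in maps.items():
        print(width, len(fmaps), len(fmaps[0]))
```

In a trained classifier these feature maps would be pooled and passed to further layers; a deep learning framework would normally replace the explicit loops.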

  19. Performance of different architectures in 5-fold cross-validation on the CRISPOR dataset 19

  20. Train on the CRISPOR dataset, test on GUIDE-seq 20

  21. What bases are necessary for genome function? 21

  22. Genome editing allows us to change genome sequence and observe the function of each base in a selected cellular context 22

  23. Genome editing allows us to change genome sequence and observe the function of each base in a selected cellular context 23

  24. Genome editing allows us to change genome sequence and observe the function of each base in a selected cellular context 24

  25. What are the key genomic elements that are necessary for gene expression? • Enhancer region • Promoter region • Coding sequence 25

  26. Necessary elements will depend upon cell type – the binding sites used by a given factor can depend upon cell type (here Tcf7l2) • Binding sites change across time • ~50,000 binding sites for a typical TF • ~650,000 TF motifs 26

  27. An annotation of potential Tdgf1 cis-regulation • Histone modifications • DNase-I HS • TF density • Predictions of regulatory function based on indirect epigenomic measurements 27

  28. Idea – break parts of the genome to see what is essential for gene expression. Green Fluorescent Protein (GFP) lights up the cell when the gene is expressed (Cell 1, Cell 2, Cell n) • Native context measurement • High-throughput • Directly observe expression of target gene via GFP • Controlled delivery of only 1 gRNA per cell 28

  29. Idea – determine parts of your computer necessary for Zoom by breaking parts 29

  30. Idea – determine parts of your computer necessary for Zoom by breaking parts 30

  31. Refinement – need fine grain resolution on what we break

  32. We can use CRISPR genome editing to make localized genome alterations that are addressed by a guide RNA (gRNA) • often inefficient • efficient • designable • random indels • undesirable byproducts • highly heterogeneous • impractical beyond gene disruption • “genome vandalism” 32

  33. Multiplexed Editing Regulatory Assay (MERA) experimental flow: 1. Put one gRNA in each cell that targets a location of interest 2. Use CRISPR to ablate the respective location in each cell 3. Sort cells by expression of GFP 4. Sequence gRNAs in each population to determine what locations are necessary, and what locations are not necessary, for GFP expression 33

  34. Distribution of the log10 ratio of GFP-negative to bulk reads for all integrated gRNAs for Tdgf1 34
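The per-gRNA statistic named on this slide can be sketched as a log10 ratio of read fractions. The normalization by library size and the pseudocount below are assumptions added to make the toy example well defined; the slide only states that the quantity is the log10 ratio of GFP-negative to bulk reads. The function name and the gRNA labels are hypothetical.

```python
import math


def log10_enrichment(gfp_neg_counts, bulk_counts, pseudocount=1.0):
    """log10 of (GFP-negative read fraction / bulk read fraction) per gRNA.

    Assumptions: counts are normalized by the total reads in each
    population, and a pseudocount avoids taking the log of zero.
    """
    neg_total = sum(gfp_neg_counts.values())
    bulk_total = sum(bulk_counts.values())
    ratios = {}
    for grna, bulk in bulk_counts.items():
        neg = gfp_neg_counts.get(grna, 0)
        neg_frac = (neg + pseudocount) / neg_total
        bulk_frac = (bulk + pseudocount) / bulk_total
        ratios[grna] = math.log10(neg_frac / bulk_frac)
    return ratios


if __name__ == "__main__":
    bulk = {"gRNA_a": 100, "gRNA_b": 100}
    gfp_neg = {"gRNA_a": 190, "gRNA_b": 10}
    for grna, value in log10_enrichment(gfp_neg, bulk).items():
        print(grna, round(value, 3))
```

gRNAs with a clearly positive value are enriched among cells that lost GFP expression, which marks their target locations as candidates for required elements.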

  35. MERA enables systematic identification of required cis-regulatory elements for Tdgf1 35

  36. Testing of individual gRNAs supports required cis-regulatory elements for Tdgf1 36

  37. Necessary genome goes beyond known annotations (Tdgf1) • Genomic regions sorted by importance 37

  38. How can we predict the genotypes of on target CRISPR cuts? 38

  39. The state of CRISPR genome editing • often inefficient • efficient • designable • random indels • undesirable byproducts • highly heterogeneous • impractical beyond gene disruption • “genome vandalism” 39

  40. The state of CRISPR genome editing • often inefficient • efficient • designable • predictable indels • predictable byproducts • can be homogeneous • practical: repair of pathogenic alleles to wild-type • “genome art” 40

  41. High-throughput genome-integrated assay of Cas9-mediated DNA repair • 96 target sites in largest previous study • Designed 1,872 target sites (55-bp) based on the human genome • Observed 1,262 unique genotypes / target site 41 Cas9 editing without a homology template is predictable, can be precise, and can be practical for disease correction.

  42. Cas9 primarily causes microhomology deletions in genome-integrated and endogenous settings • A microhomology deletion is a deletion with multiple equal-scoring alignments (mESC) 42
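The definition on this slide (a deletion whose product can be aligned to the reference in more than one equally good way) can be made concrete by enumerating deletions that span the cut and grouping them by the sequence they produce. This is a minimal sketch under that definition, not the inDelphi implementation; the function names and the `max_len` parameter are hypothetical.

```python
from collections import defaultdict


def deletion_genotypes(seq, cut, max_len=10):
    """Map (length, product) to the deletion start offsets that produce it.

    `cut` is the index of the first base to the right of the cut. Only
    deletions that touch or span the cut are considered.
    """
    products = defaultdict(list)
    for length in range(1, max_len + 1):
        first = max(0, cut - length)
        last = min(cut, len(seq) - length)
        for start in range(first, last + 1):
            product = seq[:start] + seq[start + length:]
            products[(length, product)].append(start)
    return products


def microhomology_deletions(seq, cut, max_len=10):
    """Deletion products reachable from more than one start offset."""
    return {
        key: starts
        for key, starts in deletion_genotypes(seq, cut, max_len).items()
        if len(starts) > 1
    }


if __name__ == "__main__":
    # "ACG" occurs just left of the cut and again 5 bp downstream.
    for key, starts in microhomology_deletions("CCACGTTACGGG", cut=5).items():
        print(key, starts)
```

In the example, the 5-bp deletion that collapses the two ACG copies can be placed at four different offsets and still give the same product, which is the signature of a 3-bp microhomology.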

  43. Cas9 primarily causes microhomology deletions in genome-integrated and endogenous settings (mESC) 43

  44. Majority of repair products arise from microhomology-mediated end-joining (MMEJ) (mESC) 44

  45. inDelphi predicts 90% of repair products from 3 major repair classes (mESC) 45

  46. 1-bp insertions copy the adjacent nucleotide 46

  47. 1-bp insertion frequency depends on local sequence context 47

  48. inDelphi accurately predicts nearly all repair outcomes Input: Sequence, cutsite • Predicts 90% of observed repair outcomes • 70% at single-base resolution Training & testing on held-out cell-types • Median r = 0.87 on genotype prediction • Median r = 0.84 on indel length prediction 48
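The "median r" figures above are summary statistics over target sites. A minimal sketch of how such a number could be computed is shown below, assuming r is the Pearson correlation between predicted and observed outcome frequencies at each site; the data layout (one frequency list per site, in matching genotype order) and the function names are assumptions, not the authors' evaluation code.

```python
import math
from statistics import median


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def median_site_correlation(predicted, observed):
    """Median over sites of the per-site Pearson r.

    Both arguments map a site name to a list of outcome frequencies
    given in the same genotype order.
    """
    return median(pearson_r(predicted[site], observed[site]) for site in predicted)


if __name__ == "__main__":
    predicted = {"site1": [0.6, 0.3, 0.1], "site2": [0.2, 0.5, 0.3]}
    observed = {"site1": [0.5, 0.4, 0.1], "site2": [0.1, 0.6, 0.3]}
    print(round(median_site_correlation(predicted, observed), 3))
```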

  49. inDelphi accurately predicts frameshifts 49
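Given a predicted distribution over indel lengths, a frameshift frequency follows directly: any indel whose length is not a multiple of 3 shifts the reading frame. The sketch below assumes the prediction is available as a mapping from signed indel length (negative for deletions) to frequency; that layout and the function name are hypothetical.

```python
def frameshift_fraction(indel_length_freqs):
    """Fraction of edited outcomes whose indel length is not a multiple of 3.

    `indel_length_freqs` maps a signed indel length (negative for
    deletions, positive for insertions) to its predicted frequency.
    """
    total = sum(indel_length_freqs.values())
    if total == 0:
        raise ValueError("frequencies sum to zero")
    shifted = sum(freq for length, freq in indel_length_freqs.items() if length % 3 != 0)
    return shifted / total


if __name__ == "__main__":
    predicted = {-1: 0.5, -3: 0.2, 1: 0.2, -6: 0.1}
    print(round(frameshift_fraction(predicted), 3))
```

Python's `%` returns a non-negative remainder for negative lengths, so the same test works for deletions and insertions.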

  50. Target sites yielding a single deletion repair genotype >50% of the time 50

  51. Target sites yielding a single insertion repair genotype >50% of the time • Weak microhomology • Local sequence context 51

  52. How can we use DNA cuts to restore function? 52

  53. inDelphi predicts that 5% of gRNAs yield a single repair genotype the majority of the time 53
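A statistic of this kind can be computed from per-gRNA genotype predictions by checking whether the most frequent genotype exceeds 50%. The sketch below assumes predictions are stored as a mapping from gRNA to a genotype-frequency dictionary; the names and the toy data are hypothetical and do not come from the lecture.

```python
def fraction_single_genotype(predictions, threshold=0.5):
    """Fraction of gRNAs whose top predicted genotype exceeds `threshold`.

    `predictions` maps each gRNA to a dict of genotype -> predicted frequency.
    """
    if not predictions:
        return 0.0
    precise = sum(
        1
        for genotypes in predictions.values()
        if genotypes and max(genotypes.values()) > threshold
    )
    return precise / len(predictions)


if __name__ == "__main__":
    toy = {
        "gRNA_a": {"del_5bp": 0.6, "ins_1bp": 0.4},
        "gRNA_b": {"del_2bp": 0.3, "del_7bp": 0.3, "ins_1bp": 0.4},
    }
    print(fraction_single_genotype(toy))
```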

  54. Pathogenic microduplications are efficiently repairable to wild-type with simple Cas9 cutting 54
