Computational Systems Biology Deep Learning in the Life Sciences - PowerPoint PPT Presentation

Computational Systems Biology Deep Learning in the Life Sciences 6.802 6.874 20.390 20.490 HST.506 David Gifford Lecture 19 April 16, 2020 Predicting genome editing outcomes with machine learning methods http://mit6874.github.io 1

Poll Warm Up: Where are you located today? How do you prefer to receive lectures? 2

Predicting the outcomes of genome editing • CRISPR (cas9) genome editing in detail • Assays to detect off target cutting • Machine learning models to predict off target cutting • Discovering the necessary genome for Tdgf1 • Machine learning models of on target cutting • The limitations of base editing 3

CRISPR ( clustered regularly interspaced short palindromic repeats ) editing mechanics 4

Cas9 nuclease engaged in cutting Required PAM sequence limits available cut sites (NGG Cas9) 17-24 nucleotide RNA spacer complementary to target DNA sequence 5
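The PAM constraint on this slide can be illustrated with a short sketch. It scans the forward strand of a DNA string for NGG PAMs and reports the protospacer immediately 5' of each one. The function name, the 20-nt spacer length (one value within the 17-24 range above), and the example sequence are assumptions for illustration; reverse-strand sites are not scanned.

```python
def find_ngg_sites(seq, spacer_len=20):
    """Return (protospacer_start, protospacer, pam) for each forward-strand NGG PAM."""
    seq = seq.upper()
    sites = []
    # The PAM occupies positions i..i+2; the protospacer is the spacer_len bases before it.
    for i in range(spacer_len, len(seq) - 2):
        pam = seq[i:i + 3]
        if pam[1:] == "GG":
            sites.append((i - spacer_len, seq[i - spacer_len:i], pam))
    return sites


# Hypothetical sequence: a 20-nt protospacer followed by an AGG PAM.
example = "ACGTACGTACGTACGTACGTAGGTTTT"
sites = find_ngg_sites(example)
```

Only positions with a full-length protospacer upstream of the PAM are reported, which is why a short sequence yields few candidate sites.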

Genome cuts resolve in two ways Desired outcome sequence 6

CRISPR is relevant as a therapeutic tool 7 https://www.hindawi.com/journals/bmri/2019/1369682/

CRISPR derivatives can implement many functions 8

How well does CRISPR find its way to a specific site? Characterizing off target effects with a genome wide assay 9

GUIDE-seq incorporates a 34-bp phosphorothioate-modified double-stranded DNA oligo (dsODN) into cut sites 10

GUIDE-seq identifies off target cuts 11

CIRCLE-seq reveals CRISPR cut sites genome wide 12 https://www-nature-com/articles/nmeth.4278.pdf

CIRCLE-seq reveals CRISPR cut sites genome wide (arrow is intended cut site) 13 https://www-nature-com/articles/nmeth.4278.pdf

How can we predict off target activity of a CRISPR based enzyme? 14

Recall our biological model What features should we use to predict off target effects? Required PAM sequence limits available cut sites 17-24 nucleotide RNA spacer complementary to target DNA sequence 15

Example CRISPR features for off target prediction • CROP-IT grades gRNA sequences by dividing the 23-bp sequence into three regions with different weights, with penalty scores for consecutive mismatched sites • CCTOP and MIT scores consider the positions and counts of mismatches • CFD (cutting frequency determination) emulates a large number of single-base mismatches, deletions, and insertions in the gRNA and scores these with reference to validated gRNAs in a cellular assay 16
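The mismatch-based features named on this slide (positions, counts, and consecutive mismatches) can be sketched as follows. This is not the CROP-IT, CCTOP, MIT, or CFD formula: the function names are hypothetical, and the linear weight that penalizes PAM-proximal mismatches more heavily is an assumption chosen only to show how position can enter a score.

```python
def mismatch_features(guide, site):
    """Compare a guide with a candidate site of equal length (5'->3', PAM excluded)."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    positions = [i for i, (g, s) in enumerate(zip(guide.upper(), site.upper())) if g != s]
    # Longest run of consecutive mismatched positions.
    longest_run = run = 0
    previous = None
    for p in positions:
        run = run + 1 if previous is not None and p == previous + 1 else 1
        longest_run = max(longest_run, run)
        previous = p
    return {"count": len(positions), "positions": positions, "longest_run": longest_run}


def toy_penalty(guide, site):
    """Hypothetical penalty: mismatches nearer the 3' (PAM-proximal) end weigh more."""
    n = len(guide)
    return sum((p + 1) / n for p in mismatch_features(guide, site)["positions"])


# Hypothetical guide and off-target site with mismatches at positions 0, 18, and 19.
guide = "ACGTACGTACGTACGTACGT"
site = "TCGTACGTACGTACGTACAA"
features = mismatch_features(guide, site)
penalty = toy_penalty(guide, site)
```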

A deep neural network for classifying CRISPR recognition of a genomic site [Figure labels: 10 4x1, 10 4x2, 10 4x3, 10 4x5; ReLU; output] 18
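The 4x1 to 4x5 labels on this slide suggest convolution filters that span all four base channels and one to five sequence positions. A minimal pure-Python sketch of that idea follows, assuming a plain one-hot encoding of a single sequence; how the network on the slide encodes the gRNA and target pair is not stated here, and the kernel weights are hypothetical.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a 4 x L matrix (rows ordered A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for base in seq.upper()] for b in BASES]


def conv_relu(matrix, kernel):
    """Slide a 4 x k kernel along the sequence axis and apply ReLU."""
    length = len(matrix[0])
    k = len(kernel[0])
    out = []
    for start in range(length - k + 1):
        total = sum(kernel[r][c] * matrix[r][start + c]
                    for r in range(4) for c in range(k))
        out.append(max(0.0, total))
    return out


# Hypothetical 4x2 kernel that rewards G and penalizes every other base.
gg_kernel = [[-1, -1], [-1, -1], [1, 1], [-1, -1]]
activations = conv_relu(one_hot("AGGT"), gg_kernel)
```

The kernel fires only on the window containing "GG"; the other two windows sum to zero and are clipped by the ReLU.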

Performance of different architectures on a 5x cross-validation on CRISPOR dataset 19
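The 5x cross-validation mentioned here can be sketched as a plain index split. This is a generic k-fold procedure, not the evaluation code behind the slide; the function name, the seed, and the sample count are assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(23, k=5))
```

Each sample appears in exactly one test fold, so averaging the per-fold metric uses every example once for evaluation.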

Train on CRISPOR dataset Test on GUIDE-seq 20

What bases are necessary for genome function? 21

Genome editing allows us to change genome sequence and observe the function of each base in a selected cellular context 22

What are the key genomic elements that are necessary for gene expression? Enhancer region Promoter region Coding sequence 25

Necessary elements will depend upon cell type – the binding sites used by a given factor can depend upon cell type (here Tcf7l2) Binding sites change across time ~50,000 binding sites for a typical TF ~650,000 TF Motifs 26

An annotation of potential Tdgf1 cis-regulation Histone modifications DNase-I HS TF density Predictions of regulatory function based on indirect epigenomic measurements 27

Idea – break parts of the genome to see what is essential for gene expression Green Fluorescent Protein (GFP) lights up cell when gene is expressed Cell 1 Cell 2 Cell n - Native context measurement - High-throughput - Directly observe expression of target gene via GFP - Controlled delivery of only 1 gRNA per cell 28

Idea – determine parts of your computer necessary for Zoom by breaking parts 29

Idea – determine parts of your computer necessary for Zoom by breaking parts 30

Refinement – need fine grain resolution on what we break

We can use CRISPR genome editing to make localized genome alterations that are addressed by a guide RNA (gRNA) • often inefficient • efficient • designable • random indels • undesirable byproducts • highly heterogeneous • impractical beyond gene disruption • “genome vandalism” 32

Multiplexed Editing Regulatory Assay (MERA) experimental flow: 1. Put one gRNA in each cell that targets a location of interest 2. Use CRISPR to ablate the respective location in each cell 3. Sort cells by expression of GFP 4. Sequence gRNAs in each population to determine what locations are necessary, what locations are not necessary for GFP expression 33

Distribution of the log10 ratio of GFP neg to bulk reads for all integrated gRNAs for Tdgf1 34
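The per-gRNA statistic on this slide can be sketched as follows. The read counts are hypothetical, and normalizing each population to read fractions with a pseudocount is an assumption; the slide gives only the ratio of GFP-negative to bulk reads.

```python
import math


def log10_enrichment(gfp_neg, bulk, pseudocount=1.0):
    """Per-gRNA log10 ratio of GFP-negative to bulk read fractions."""
    grnas = sorted(set(gfp_neg) | set(bulk))
    n = len(grnas)
    # The pseudocount keeps gRNAs with zero reads from producing log(0).
    neg_total = sum(gfp_neg.values()) + pseudocount * n
    bulk_total = sum(bulk.values()) + pseudocount * n
    scores = {}
    for g in grnas:
        neg_frac = (gfp_neg.get(g, 0) + pseudocount) / neg_total
        bulk_frac = (bulk.get(g, 0) + pseudocount) / bulk_total
        scores[g] = math.log10(neg_frac / bulk_frac)
    return scores


# Hypothetical read counts, not data from the lecture.
bulk_reads = {"gRNA_a": 99, "gRNA_b": 99}
gfp_neg_reads = {"gRNA_a": 999, "gRNA_b": 9}
scores = log10_enrichment(gfp_neg_reads, bulk_reads)
```

A positive score means the gRNA is over-represented among GFP-negative cells, which is the signal that its target location may be necessary for expression.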

MERA enables systematic identification of required cis-regulatory elements for Tdgf1 35

Testing of individual gRNAs supports required cis-regulatory elements for Tdgf1 36

Necessary genome goes beyond known annotations (Tdgf1) Genomic regions sorted by importance 37

How can we predict the genotypes of on target CRISPR cuts? 38

The state of CRISPR genome editing • often inefficient • efficient • designable • random indels • undesirable byproducts • highly heterogeneous • impractical beyond gene disruption • “genome vandalism” 39

The state of CRISPR genome editing • often inefficient • efficient • designable • predictable indels • predictable byproducts • can be homogeneous • practical: repair of pathogenic alleles to wild-type • “genome art” 40

High-throughput genome-integrated assay of Cas9-mediated DNA repair • 96 target sites in largest previous study • Designed 1,872 target sites (55-bp) based on the human genome • Observed 1,262 unique genotypes / target site 41 Cas9 editing without a homology template is predictable, can be precise, and can be practical for disease correction.

Cas9 primarily causes microhomology deletions in genome-integrated and endogenous settings A microhomology deletion is a deletion with multiple equal-scoring alignments mESC 42

Cas9 primarily causes microhomology deletions in genome-integrated and endogenous settings mESC 43

Majority of repair products arise from microhomology-mediated end-joining (MMEJ) mESC 44
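The definition of a microhomology deletion as one with multiple equal-scoring alignments can be made concrete with a small enumeration. The sketch below lists every deletion spanning the cut and keeps those whose product sequence arises from more than one placement. The function names and example sequences are hypothetical, and this is not the inDelphi implementation.

```python
from collections import defaultdict


def deletion_products(left, right, max_len=10):
    """Group deletions spanning the cut by (length, resulting sequence).

    left and right are the sequences 5' and 3' of the cut site. A deletion of
    length L removes a bases from the end of left and L - a bases from the
    start of right, for a = 0..L.
    """
    products = defaultdict(list)
    for length in range(1, max_len + 1):
        for a in range(length + 1):
            b = length - a
            if a > len(left) or b > len(right):
                continue
            product = left[:len(left) - a] + right[b:]
            products[(length, product)].append(a)
    return products


def microhomology_deletions(left, right, max_len=10):
    """Keep deletions whose product arises from two or more placements."""
    return {key: placements
            for key, placements in deletion_products(left, right, max_len).items()
            if len(placements) > 1}


# Hypothetical site: "GTC" occurs on both sides of the cut.
mh = microhomology_deletions("AAGTC", "TTGTCCA")
```

In this example the 5-bp deletion has four indistinguishable placements, one more than the length of the shared "GTC" microhomology.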

inDelphi predicts 90% of repair products from 3 major repair classes mESC 45

1-bp insertions copy the adjacent nucleotide 46
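To make the idea of a copied adjacent nucleotide concrete, the sketch below lists the four possible 1-bp insertions at a cut and flags which of them duplicate a neighboring base. The slide does not say which neighbor is copied, so both sides are flagged; the function name and sequences are hypothetical.

```python
def one_bp_insertions(left, right):
    """List the four possible 1-bp insertions at the cut and flag templated ones."""
    results = []
    for base in "ACGT":
        results.append({
            "inserted": base,
            "product": left + base + right,
            # True when the inserted base duplicates the base just 5' of the cut.
            "copies_5prime_neighbor": base == left[-1],
            # True when the inserted base duplicates the base just 3' of the cut.
            "copies_3prime_neighbor": base == right[0],
        })
    return results


# Hypothetical cut between "ACG" and "TCA".
insertions = one_bp_insertions("ACG", "TCA")
templated = [r["inserted"] for r in insertions
             if r["copies_5prime_neighbor"] or r["copies_3prime_neighbor"]]
```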

1-bp insertion frequency depends on local sequence context 47

inDelphi accurately predicts nearly all repair outcomes Input: Sequence, cutsite • Predicts 90% of observed repair outcomes • 70% at single-base resolution Training & testing on held-out cell-types • Median r = 0.87 on genotype prediction • Median r = 0.84 on indel length prediction 48
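The r values quoted on this slide are correlations between predicted and observed quantities. A minimal sketch of the Pearson correlation follows, assuming predicted and observed genotype frequencies are supplied as equal-length lists; the example frequencies are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical genotype frequencies at one target site.
predicted = [0.50, 0.30, 0.15, 0.05]
observed = [0.45, 0.35, 0.10, 0.10]
r = pearson_r(predicted, observed)
```

Computing r per target site and then taking the median across sites is one way to arrive at a summary of the kind reported above.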

inDelphi accurately predicts frameshifts 49
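Given a distribution over indel lengths, the frameshift frequency follows directly: an indel shifts the reading frame when its length is not a multiple of 3. The sketch below assumes signed lengths (negative for deletions) and uses hypothetical frequencies.

```python
def frameshift_fraction(indel_freqs):
    """Fraction of repair outcomes whose indel length is not a multiple of 3.

    indel_freqs maps signed indel length in bp (negative = deletion) to frequency.
    """
    total = sum(indel_freqs.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    shifted = sum(f for length, f in indel_freqs.items() if length % 3 != 0)
    return shifted / total


# Hypothetical indel length distribution for one target site.
lengths = {1: 0.3, -1: 0.1, -3: 0.2, -6: 0.1, -7: 0.3}
fraction = frameshift_fraction(lengths)
```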

Target sites yielding a single deletion repair genotype >50% of the time 50

Target sites yielding a single insertion repair genotype >50% of the time Weak microhomology Local sequence context 51

How can we use DNA cuts to restore function? 52

inDelphi predicts that 5% of gRNAs yield a single repair genotype the majority of the time 53
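The criterion of a single repair genotype the majority of the time can be sketched as a filter over predicted genotype frequencies. The function names, the gRNA labels, and the frequencies below are hypothetical; the 0.5 threshold encodes "majority".

```python
def dominant_genotype(freqs, threshold=0.5):
    """Return the genotype whose share exceeds threshold, or None."""
    total = sum(freqs.values())
    genotype, top = max(freqs.items(), key=lambda item: item[1])
    return genotype if top / total > threshold else None


def fraction_precise(predictions, threshold=0.5):
    """Fraction of gRNAs with one genotype above threshold."""
    hits = sum(1 for freqs in predictions.values()
               if dominant_genotype(freqs, threshold) is not None)
    return hits / len(predictions)


# Hypothetical predicted genotype frequencies for four gRNAs.
predictions = {
    "gRNA_1": {"del_5": 0.62, "ins_A": 0.20, "del_2": 0.18},
    "gRNA_2": {"del_3": 0.40, "ins_T": 0.35, "del_9": 0.25},
    "gRNA_3": {"ins_G": 0.30, "del_1": 0.30, "del_4": 0.40},
    "gRNA_4": {"del_7": 0.45, "del_2": 0.45, "ins_C": 0.10},
}
precise_fraction = fraction_precise(predictions)
```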

Pathogenic microduplications are efficiently repairable to wild-type with simple Cas9 cutting 54
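A microduplication is a short tandem repeat of a wild-type segment, so deleting one copy restores the wild-type sequence. The sketch below shows that collapse on hypothetical sequences; it illustrates the sequence-level outcome only and does not model the repair process or its efficiency.

```python
def collapse_microduplication(allele, unit_start, unit_len):
    """Delete one copy of a tandem duplication; return None if not duplicated."""
    first = allele[unit_start:unit_start + unit_len]
    second = allele[unit_start + unit_len:unit_start + 2 * unit_len]
    if len(second) < unit_len or first != second:
        return None
    # Keep everything through the first copy, then skip the second copy.
    return allele[:unit_start + unit_len] + allele[unit_start + 2 * unit_len:]


# Hypothetical wild-type allele and a 4-bp duplication of "TACG".
wild_type = "ATGGCC" + "TACG" + "TTGA"
pathogenic = "ATGGCC" + "TACG" + "TACG" + "TTGA"
repaired = collapse_microduplication(pathogenic, unit_start=6, unit_len=4)
```

Because the two copies are identical, the deletion that removes one of them is a microhomology deletion in the sense defined earlier in the lecture, which is why such alleles are candidates for repair by cutting alone.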
