A massively parallel approach to understanding genomic information



slide-1
SLIDE 1

A massively parallel approach to understanding genomic information

Alexander Rosenberg, Rupali Patwardhan, Jay Shendure, Georg Seelig

Electrical Engineering and Computer Science & Engineering, University of Washington

slide-2
SLIDE 2

Sequencing genome. Complete.

Interpreting genome … Compiling list of variants.

Complete.

Jay Shendure

slide-3
SLIDE 3

Understanding the impact of variants with machine learning

[Figure: gene schematic with labeled regions: enhancers, promoter, 5’ UTR, intron, exon, 3’ UTR, poly(A); example sequence "Aaatcggagacc c"]

  • Build a sequence-function model using machine learning
  • Models are limited by data (e.g. “only” 50K splice events)

slide-4
SLIDE 4

More data is better

slide-5
SLIDE 5

A massively parallel approach to understanding the genome

[Diagram labels: DNA sequencing; Synthetic biology; Machine learning; Massively parallel experiments; Models for understanding and engineering the genome]

slide-6
SLIDE 6

Overview

  • A massively parallel approach to understanding the sequence-function relationship: 5’ alternative splicing
  • Cell-type-specific effects in alternative splicing
  • Skipped exons: attempt 1
  • Skipped exons and 3’ alternative splicing: exon definition

slide-7
SLIDE 7

RNA-Splicing

[Figure: typical human gene, with exons and introns labeled]

slide-8
SLIDE 8

Core splicing signals

  • Splicing is regulated by cis-regulatory sequence motifs and a trans-acting RNA-protein complex, the spliceosome

[Figure labels: splice donor; branch point; polypyrimidine tract (PPT) + splice acceptor]

slide-9
SLIDE 9

Alternative Splicing

  • Different isoforms can have distinct protein functions
  • 95% of coding genes are alternatively spliced
  • Misregulation of splicing can lead to disease and cancer

[Figure labels: Isoform A; Isoform B]

slide-10
SLIDE 10

Regulation of Alternative Splicing

What are the sequence determinants of alternative splicing?

  • The splice site sequences (splice donors)
  • Sequences around the splice sites

slide-11
SLIDE 11

Effects of Single Nucleotide Polymorphisms (SNPs) on Alternative Splicing in Humans

  • Can we create a model that predicts the effects of nucleotide changes on alternative splicing?

slide-12
SLIDE 12

Massively Parallel Splicing Assay

  • Alternatively spliced plasmid mini-gene with 3 splice donors
  • Introduced degenerate nucleotide sequences between the splice donors
  • How does sequence variation in these positions affect alternative splicing?

slide-13
SLIDE 13

Massively Parallel Splicing Assay

slide-14
SLIDE 14

Let’s give a cell lots of DNA sequences and record what happens

[Figure labels: DNA synthesized in the lab; human cells]

slide-15
SLIDE 15

Massively Parallel Splicing Assay

  • Used RNA-seq to quantify isoform levels
  • For every mRNA molecule that we sequenced we determined:
      • how it spliced
      • which plasmid variant it was transcribed from (barcode in 3’ UTR)
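The bookkeeping described above can be sketched as a per-barcode tally. This is a minimal illustration, not the authors' pipeline: it assumes each sequenced mRNA has already been reduced to a (barcode, isoform) pair, and the function names and toy reads are hypothetical.

```python
from collections import Counter, defaultdict


def tally_isoforms(reads):
    """Count reads per isoform for each plasmid barcode.

    `reads` is an iterable of (barcode, isoform) pairs, one per sequenced mRNA.
    """
    counts = defaultdict(Counter)
    for barcode, isoform in reads:
        counts[barcode][isoform] += 1
    return counts


def isoform_fractions(isoform_counts):
    """Convert one barcode's isoform counts into fractions that sum to 1."""
    total = sum(isoform_counts.values())
    return {iso: n / total for iso, n in isoform_counts.items()}


# Toy reads, made up for illustration only.
reads = [("AACGT", "SD1"), ("AACGT", "SD2"), ("AACGT", "SD2"), ("GGTCA", "SD1")]
counts = tally_isoforms(reads)
fractions = isoform_fractions(counts["AACGT"])
```

Each barcode then carries its own isoform distribution, which is the kind of per-variant table the following slides summarize.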

slide-16
SLIDE 16

Resulting Data

[Figure: read counts per isoform (SD1, SD2, SD3, SDNEW) for each of 267,000 different sequences; example counts shown: 26, 2, 27, 113, 4, 1, …]

slide-17
SLIDE 17

Resulting Data - Summary

[Figure: summary of isoform usage; labels: SD1, SD2, SD3, SDNEW; values shown: 28%, 47%, 6%, 15%]

slide-18
SLIDE 18

Short Sequence Motif Effect Sizes

Effect Size: GTGGGG = +2.37

Introns without GTGGGG (N=264,000): SD1 21%, SD2 79%

TAATCTTCTTAGAGTATCGCCTAGG
TCAAATAGGGAGCTTTGATATCTGC
…
GCGCGCAGATCTGGGTCGAGATAAA

Introns with GTGGGG (N=3000): SD1 59%, SD2 41%

CAATCCCATATTGCGACGTGGGGGG
GGTTCGCAAGTCCCACGTGGGGCGT
…
CAGGTGGGGAAGGCTCAGGTTTCTG
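The slide does not say how the effect size is defined. One common choice, assumed here, is the log2 odds ratio of SD1 usage with versus without the motif. The sketch below applies that assumed definition to the rounded percentages on the slide (59% with GTGGGG, 21% without); the function name is hypothetical.

```python
import math


def log2_odds_ratio(frac_with, frac_without):
    """Log2 odds ratio of isoform usage with vs. without a motif.

    Assumed definition for illustration; the slide does not state its formula.
    """
    odds_with = frac_with / (1.0 - frac_with)
    odds_without = frac_without / (1.0 - frac_without)
    return math.log2(odds_with / odds_without)


# SD1 usage from the slide: 59% with GTGGGG, 21% without (rounded values).
effect = log2_odds_ratio(0.59, 0.21)
```

With these rounded inputs the result is roughly 2.4, in the neighborhood of the +2.37 shown on the slide, which is consistent with this definition but does not confirm it.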

slide-19
SLIDE 19

All 6-mer Effect Sizes

  • 78% of 6-mers have a statistically significant effect on usage of the first splice donor

slide-20
SLIDE 20

Combinatorial Regulation of Alternative Splicing

Two Possible Models of Combinatorial Sequence Regulation:

  • Additive: Sequence motifs act independently of each other
      • Effect Size(GTGG & CTGC) = Effect Size(GTGG) + Effect Size(CTGC)
  • Cooperative: Sequence motifs interact with other motifs
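One way to tell the two models apart is to predict each motif pair's effect as the sum of the single-motif effects and check how well that matches the measured pair effect. The sketch below uses the coefficient of determination for the comparison; the effect values are hypothetical toy numbers, and the exact statistic used in the talk is not stated.

```python
def additive_prediction(effect_a, effect_b):
    """Additive model: the joint effect is the sum of the individual effects."""
    return effect_a + effect_b


def r_squared(predicted, observed):
    """Coefficient of determination of predictions against observations."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical single-motif and pair effect sizes, for illustration only.
single = {"GTGG": 1.2, "CTGC": -0.5, "TTTT": 0.3}
pairs = {("GTGG", "CTGC"): 0.8, ("GTGG", "TTTT"): 1.4, ("CTGC", "TTTT"): -0.1}

predicted = [additive_prediction(single[a], single[b]) for a, b in pairs]
observed = list(pairs.values())
r2 = r_squared(predicted, observed)
```

A value near 1 favors the additive model; systematic deviations for particular pairs would point toward cooperative interactions.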

slide-21
SLIDE 21

Combinatorial Regulation of Alternative Splicing

[Figure: motifs GTGG and CTGC between SD1 and SD2; R² = 0.89]

  • Short motifs act additively and independently of each other

slide-22
SLIDE 22

Building an Additive Model of Splicing

[Figure: sequence ACTGTACGTGTGTGGGCCATGTCCG between SD1 and SD2]

  • Effect Size(ACTGTACGTGTGTGGGCCATGTCCG) = Effect Size(ACTGTA) + Effect Size(CTGTAC) + Effect Size(TGTACG) + … + Effect Size(TGTCCG)
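The sum above runs over every 6-mer in a sliding window across the sequence. A minimal sketch follows, assuming a lookup table of per-6-mer effect sizes; the table values here are hypothetical and 6-mers missing from it are treated as having zero effect.

```python
def kmers(seq, k=6):
    """Return all overlapping k-mers of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def additive_score(seq, effect_sizes, k=6):
    """Sum the effect sizes of every k-mer in seq (missing k-mers count as 0)."""
    return sum(effect_sizes.get(kmer, 0.0) for kmer in kmers(seq, k))


seq = "ACTGTACGTGTGTGGGCCATGTCCG"
windows = kmers(seq)

# Hypothetical effect sizes for two of the 6-mers, for illustration only.
toy_effects = {"ACTGTA": 0.5, "TGTCCG": -0.2}
score = additive_score(seq, toy_effects)
```

For the 25-nucleotide example this gives 20 windows, starting with ACTGTA, CTGTAC, TGTACG and ending with TGTCCG, as on the slide.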

slide-23
SLIDE 23

Individual Contribution of a Nucleotide to Splicing

[Figure: sequence ACTGTACGTGTGTGGGCCATGTCCG between SD1 and SD2]

  • Effect Size(G at position 12) = ( Effect Size(CGTGTG) + Effect Size(GTGTGT) + Effect Size(TGTGTG) + Effect Size(GTGTGG) + Effect Size(TGTGGG) + Effect Size(GTGGGC) ) / 6
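The six terms are exactly the 6-mers that cover position 12 of the example sequence. The sketch below enumerates them and averages their effect sizes. The helper names and the toy effect table are hypothetical, and dividing by the number of covering windows (rather than always by 6) is an assumption made so that positions near the sequence ends are handled.

```python
def overlapping_kmers(seq, position, k=6):
    """Return every k-mer of seq that covers the 1-based `position`."""
    idx = position - 1
    first = max(0, idx - k + 1)
    last = min(idx, len(seq) - k)
    return [seq[s:s + k] for s in range(first, last + 1)]


def position_effect(seq, position, effect_sizes, k=6):
    """Average effect size of the k-mers covering a position (missing = 0)."""
    covering = overlapping_kmers(seq, position, k)
    return sum(effect_sizes.get(kmer, 0.0) for kmer in covering) / len(covering)


seq = "ACTGTACGTGTGTGGGCCATGTCCG"
covering = overlapping_kmers(seq, 12)

# Hypothetical effect size for one covering 6-mer, for illustration only.
contribution = position_effect(seq, 12, {"GTGGGC": 0.6})
```

For an interior position such as 12, six windows cover the nucleotide, so the average matches the division by 6 on the slide.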

slide-24
SLIDE 24

Testing An Additive Model

[Figure: model predictions vs. RNA-seq for SD1, SD2, SD3, and SDNEW]

  • Trained model using multinomial logistic regression
  • Tested the accuracy of model predictions on a test set
  • For each intron variant:
      • Score every potential splice site
      • Convert splice donor scores into splicing probabilities (softmax function)
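The last step, turning per-site scores into probabilities with a softmax, can be sketched as follows. The scores are hypothetical; in the model they would come from the additive 6-mer scores for each candidate splice donor.

```python
import math


def softmax(scores):
    """Map splice-site scores to probabilities that sum to 1."""
    shift = max(scores.values())  # subtract the max for numerical stability
    exps = {site: math.exp(s - shift) for site, s in scores.items()}
    total = sum(exps.values())
    return {site: e / total for site, e in exps.items()}


# Hypothetical splice donor scores for one intron variant.
scores = {"SD1": 1.0, "SD2": 2.0, "SD3": -1.0}
probs = softmax(scores)
```

Higher-scoring donors receive higher predicted usage, and the predicted fractions can be compared directly with the isoform fractions measured by RNA-seq.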

slide-25
SLIDE 25

Effects of Single Nucleotide Polymorphisms (SNPs) on Alternative Splicing in Humans

  • Can our model predict the effects of nucleotide changes on alternative splicing?
slide-26
SLIDE 26

Measuring the Effects of SNPs on Alternative Splicing

  • Started with a list of alternatively spliced human genes
  • Used Thousand Genomes data and RNA-seq data from GEUVADIS to calculate isoform percentage for:
      • Individuals with a SNP
      • Individuals with no SNP
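The comparison between the two groups can be sketched as a difference of mean isoform percentages. This is only an illustration under simple assumptions: each individual is reduced to a (has_snp, isoform_percent) pair, and the values are made up. How genotypes and isoform percentages were actually derived from the Thousand Genomes and GEUVADIS data is not described on the slide.

```python
from statistics import mean


def isoform_shift(samples):
    """Mean isoform percentage of SNP carriers minus that of non-carriers.

    `samples` is a list of (has_snp, isoform_percent) pairs, one per individual.
    """
    with_snp = [pct for has_snp, pct in samples if has_snp]
    without_snp = [pct for has_snp, pct in samples if not has_snp]
    return mean(with_snp) - mean(without_snp)


# Made-up individuals, for illustration only.
samples = [(True, 60.0), (True, 70.0), (False, 40.0), (False, 50.0)]
shift = isoform_shift(samples)
```

The measured shift for each SNP is the quantity a model prediction would be compared against.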

slide-27
SLIDE 27

Predicting Effects of SNPs between Alternative Splice Donors

slide-28
SLIDE 28

Predicting Effects of SNPs in an Alternative Splice Donor

slide-29
SLIDE 29

Overview

  • A massively parallel approach to understanding the sequence-function relationship: 5’ alternative splicing
  • Cell-type-specific effects in alternative splicing
  • Skipped exons: attempt 1
  • Skipped exons and 3’ alternative splicing: exon definition

slide-30
SLIDE 30

RBFOX1/2 Binding Site Differences in HEK293 and MCF7 Cells

Rank  Motif
1     TGCATG
2     GCATGC
3     CGCATG
4     TCGCCT
5     ATGCAT
6     ACGACA
7     ACGACG
8     AGCCCC
9     CTCGGC
10    CATGCA
11    CCCCAC
12    AGCATG
13    AACGAC

slide-31
SLIDE 31

RBFOX2 Expression in HEK293 vs MCF7

[Figure: RBFOX2 expression in HEK293 vs. MCF7; panels: RNA (FPKM) and protein (antibody score). Source: The Human Protein Atlas]

slide-32
SLIDE 32

RBFOX1/2 Binding Site Differences in HEK293 and MCF7 Cells

Ray, Debashish, et al. "A compendium of RNA-binding motifs for decoding gene regulation." Nature 499.7457 (2013): 172-177.

slide-33
SLIDE 33

Overview

  • A massively parallel approach to understanding the sequence-function relationship: 5’ alternative splicing
  • Cell-type-specific effects in alternative splicing
  • Skipped exons: attempt 1
  • Skipped exons and 3’ alternative splicing: exon definition

slide-34
SLIDE 34

Alternative Splicing

  • Alternative 5’ (8%)
  • Alternative 3’ (31%)
  • Skipped exon (59%)

Bradley, R., et al. "Alternative Splicing of RNA Triplets Is Often Regulated and Accelerates Proteome Evolution." PLoS Biol 10 (2012): e1001229.

slide-35
SLIDE 35

Skipped exons

slide-36
SLIDE 36

Skipped exons

  • Exon skipping

slide-37
SLIDE 37

Skipped exons

[Figure labels: mRNA A; mRNA B]

slide-38
SLIDE 38

Massively Parallel Exon Skipping Assay

  • Exon-skipping minigene based on SMN1/2 exon 7
  • Randomized two intronic 25-nucleotide regions
  • Tested ~1 million different sequences (for perspective: ~25,000 genes in the human genome)

[Figure label: SMN1/2 exon 7]

slide-39
SLIDE 39

Short Sequence Effects

GGGGGG?

Introns without GGGGGG (N=973,471)

TAATCTTCTTAGAGTATCGCCTAGG
TCAAATAGGGAGCTTTGATATCTGC
…
GCGCGCAGATCTGGGTCGAGATAAA

Introns with GGGGGG (N=2,087)

CAATCCCATATTGCGACGGGGGGGG
GGTTCGCAAGTCCCACGGGGGGCGT
…
CAGGGGGGGAAGGCTCAGGTTTCTG

[Figure values: 33.3%, 66.7%, 64.2%, 35.8%]

slide-40
SLIDE 40

Effects of Genetic Variation on Alternative Splicing in Humans

slide-41
SLIDE 41

Predicted Effects of SMN2 Mutations

  • Works only for intronic mutations
  • And works only for SMN1/2

[Figure label: SMN1/2 exon 7]

slide-42
SLIDE 42

Overview

  • A massively parallel approach to understanding the sequence-function relationship: 5’ alternative splicing
  • Cell-type-specific effects in alternative splicing
  • Skipped exons: attempt 1
  • Skipped exons and 3’ alternative splicing: exon definition

slide-43
SLIDE 43

Alternative Splicing Libraries

Event type              Library size
Alternative 5’ (8%)     300K
Alternative 3’ (31%)    1.7M
Skipped exon (59%)      1M

slide-44
SLIDE 44

Nearly identical exon definition in 3’ and 5’ alternative splicing

~1.7 million 3’ alternative splice events

slide-45
SLIDE 45

Predicting the Effects of Mutations in Skipped Exons

slide-46
SLIDE 46

Predicting the Effects of Mutations in SMN and CFTR proteins

slide-47
SLIDE 47

Nearly identical exon definition in 3’ and 5’ alternative splicing

SPANR: Alipanahi et al., Science (2015)

slide-48
SLIDE 48

Exon definition

  • Human exons are short: typically 50-250 bp
  • Human introns are long: often 10^5 bp
  • Splice sites are recognized in pairs across exons

slide-49
SLIDE 49

Summary

  • We presented a new approach to learn the regulatory rules governing alternative splice site selection
  • A model that was trained only on synthetic data predicts splice site selection better than any previous model directly trained on the genome
  • A model that was not trained on skipped exons can predict the effect of mutations in skipped exons
  • Our approach makes it possible to identify cell-type-specific differences in splicing

slide-50
SLIDE 50

A broadly applicable method for understanding gene regulation

[Figure: gene schematic with labeled regions (enhancers, promoter, 5’ UTR, intron, exon, 3’ UTR, poly(A)) and regulated processes: Transcription, Alternative Splicing, Translation, Poly-adenylation, …]

slide-51
SLIDE 51

Acknowledgements

Yuan-Jyue Chen, Sergii Pochekailov, Ben Groves, Rebecca Black, Alex Rosenberg, Paul Sample, Gourab Chatterjee, Alex Baryshev, Sumit Mukherjee, Sifang Chen, Nick Bogard, Randolph Lopez, Arjun Khakhar

slide-52
SLIDE 52

Short Sequence Motif Effect Sizes

[Figure labels: SD1; SD2]

Effect Size: GTGGGG = +2.37

Introns with GTGGGG (N=3000)

CAATCCCATATTGCGACGTGGGGGG
GGTTCGCAAGTCCCACGTGGGGCGT
…
CAGGTGGGGAAGGCTCAGGTTTCTG

Introns without GTGGGG (N=264,000)

TAATCTTCTTAGAGTATCGCCTAGG
TCAAATAGGGAGCTTTGATATCTGC
…
GCGCGCAGATCTGGGTCGAGATAAA

slide-53
SLIDE 53

Predicting the Effects of Mutations in Survival Motor Neuron (SMN) protein

  • Mutations in SMN proteins alter RNA splicing and cause spinal muscular atrophy (SMA)
  • SMA can severely affect muscle control
  • SMA affects between 1 in 6,000 and 1 in 10,000 people
  • Can we predict which mutations will alter splicing of SMN proteins?

slide-54
SLIDE 54

A massively parallel approach to studying translation

[Figure: lentiviral reporter construct; labels: BGH pA, mCherry, CMV, Citrine, CMVd1, 5’UTR, BGH pA, 50-mer library, lentiviral backbone]

slide-55
SLIDE 55

Work flow

[Figure: workflow]

  • 500k random 5’UTRs (50-mer library in the 5’UTR upstream of Citrine)
  • Lentiviral delivery
  • Cell sorting by Citrine : mCherry ratio, from low to high (bin1 … bin7)
  • Barcode and sequence
  • Motif discovery (example motif: GYCGGY; count vs. expression, low to high)
  • Predictive model: 5’ UTR features (k-mer motifs, GC-content, 2° structure, uORFs, RBP sites) → expression
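After sorting and sequencing, each 5’UTR has a read count in every sorting bin. One simple way to summarize that as a single expression estimate, assumed here and not stated in the talk, is the read-weighted mean bin index. The function name and counts are hypothetical.

```python
def mean_bin(bin_counts):
    """Read-weighted mean sorting bin for one 5'UTR.

    `bin_counts` maps bin index (1 = lowest Citrine : mCherry ratio) to the
    number of reads for this UTR recovered from that bin.
    """
    total = sum(bin_counts.values())
    if total == 0:
        raise ValueError("no reads for this UTR")
    return sum(b * n for b, n in bin_counts.items()) / total


# Made-up read counts across bins for one library member.
expression_estimate = mean_bin({1: 10, 2: 30, 7: 10})
```

A per-UTR summary like this is the kind of value a motif search or predictive model could be fit against.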

slide-56
SLIDE 56

Flow cytometry results for 7 random and 3 designed 5’UTRs

[Figure: normalized Citrine (axis 0% to 9%) for each 5’UTR]

UTR3

  • Purine at -3
  • No uAUGs
  • Moderate-low 2° structure

UTR9

  • Purine at -3
  • No uAUGs
  • Very High 2° structure

UTR10

  • Purine at -3
  • Two uAUGs
  • Low 2° structure
W. L. Noderer, et al. "Quantitative analysis of mammalian translation initiation sites by FACS-seq." Mol. Sys. Biol. 10, 748 (2014).

slide-57
SLIDE 57

Sequencing confirms random 5’UTR

slide-58
SLIDE 58

Upstream ATGs modulate translation

slide-59
SLIDE 59

Nucleotides at -3:-1 strongly influence translation
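The sequence features named on these slides (upstream ATGs, the nucleotides at -3:-1, a purine at -3) are straightforward to extract from a 5’UTR string. The sketch below assumes the string ends immediately before the start codon, so the last three characters are positions -3 to -1. The function names and the example UTR are hypothetical.

```python
PURINES = {"A", "G"}


def count_upstream_atgs(utr):
    """Count ATG triplets anywhere in the 5'UTR, in any reading frame."""
    utr = utr.upper()
    return sum(1 for i in range(len(utr) - 2) if utr[i:i + 3] == "ATG")


def minus3_to_minus1(utr):
    """Return the three nucleotides directly upstream of the start codon."""
    return utr.upper()[-3:]


def purine_at_minus3(utr):
    """True if position -3 relative to the start codon is A or G."""
    return utr.upper()[-3] in PURINES


# Made-up 5'UTR, assumed to end right before the start codon.
utr = "GGATGCCTTACCATGGCCACC"
n_uatgs = count_upstream_atgs(utr)
context = minus3_to_minus1(utr)
has_purine = purine_at_minus3(utr)
```

Features like these could serve as inputs alongside k-mer counts when fitting a model of translation.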

slide-60
SLIDE 60

Translation Summary

  • We are developing a massively parallel approach to understanding the 5’UTR sequence-function relationship
  • Very large “super-biological” data sets enable predictive models
  • This approach can in principle be applied in the context of your favorite gene and cell type
slide-61
SLIDE 61

Flow cytometry results for 7 random and 3 designed 5’UTRs

slide-62
SLIDE 62

Example Growth Traces for a Few Library Members

slide-63
SLIDE 63

Regulation of Alternative Splicing

What are the sequence determinants of alternative splicing?

  • The splice site sequences
  • Sequences in the introns

slide-64
SLIDE 64

Experimental Methods

[Figure labels: DNA synthesized in the lab; human cells]

slide-65
SLIDE 65

Resulting Data

[Figure: read counts per isoform (mRNA A, mRNA B) for each of ~1 million different sequences; example counts shown: 26, 113, 4, 1, 12, …]

slide-66
SLIDE 66

Predicting the Effects of Mutations in Survival Motor Neuron (SMN) protein

  • Mutations in SMN proteins alter RNA splicing and cause spinal muscular atrophy (SMA)
  • SMA can severely affect muscle control
  • SMA affects between 1 in 6,000 and 1 in 10,000 people
  • Can we predict which mutations will alter splicing of SMN proteins?

slide-67
SLIDE 67

Definition: Percent Spliced In

[Figure labels: mRNA A; mRNA B]
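The formula on this slide did not survive extraction. The standard definition of percent spliced in (PSI) is the share of transcripts that include the exon, which can be sketched as below. Which of mRNA A and mRNA B is the inclusion isoform is not recoverable from the slide, so the argument names are generic, and the read counts are made up.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads):
    """PSI: percentage of reads supporting exon inclusion."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no reads for this variant")
    return 100.0 * inclusion_reads / total


def delta_psi(psi_mutant, psi_reference):
    """Change in PSI caused by a mutation, in percentage points."""
    return psi_mutant - psi_reference


# Made-up read counts, for illustration only.
psi_ref = percent_spliced_in(30, 70)
psi_mut = percent_spliced_in(50, 50)
change = delta_psi(psi_mut, psi_ref)
```

ΔPSI values of this kind are what the mutation dataset on the next slide reports.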

slide-68
SLIDE 68

Dataset: Mutations Tested in Studies on SMN2 Splicing

Position  Mutation  ΔPSI
3         C>G       +21.2%
5         A>T       -20.8%
12        G>A       +3.3%
…
50        A>C       +65.2%

slide-69
SLIDE 69

Uncovering cell type specific splicing

Ray, Debashish, et al. "A compendium of RNA-binding motifs for decoding gene regulation." Nature 499.7457 (2013): 172-177.

[Figure: two panels; logistic regression R² = 0.14 and logistic regression R² = 0.16]