SLIDE 1

6.874, 6.802, 20.390, 20.490, HST.506 Computational Systems Biology Deep Learning in the Life Sciences

Lecture 12: Predicting gene expression and splicing

  • Prof. Manolis Kellis

http://mit6874.github.io

Slides credit: David Gifford, et al

SLIDE 2

Today: Predicting gene expression and splicing

  • 0. Review: Expression, unsupervised learning, clustering
  • 1. Up-sampling: predict 20,000 genes from 1000 genes
  • 2. Compressive sensing: Composite measurements
  • 3. DeepChrome+LSTMs: predict expression from chromatin
  • 4. Predicting splicing from sequence: 1000s of features
  • 5. Unsupervised deep learning: Restricted Boltzmann machines
  • 6. Multi-modal programs: Expr+DNA+miRNA RBMs (Liang et al.)
SLIDE 3

RNA-Seq: De novo transcript reconstruction / quantification

RNA-Seq technology:

  • Sequence short reads from mRNA, map to the genome
  • Variations:
    – Count reads mapping to each known gene
    – Reconstruct the transcriptome de novo in each experiment
  • Advantage: digital measurements (read counts), de novo discovery

Microarray technology:

  • Synthesize DNA probe array, complementary hybridization
  • Variations:
    – One long probe per gene
    – Many short probes per gene
    – Tiled k-mers across genome
  • Advantage: can focus on small regions, even if few molecules / cell

SLIDE 4

Expression Analysis Data Matrix

  • Measure 20,000 genes in 100s of conditions
  • Study the resulting matrix: m genes × n experiments (Condition 1, Condition 2, Condition 3, …)
  • Each experiment measures the expression of thousands of 'spots', typically genes
  • Questions: experiment similarity, gene similarity, the expression profile of a gene

SLIDE 5

Clustering vs. Classification

  • Goal of Clustering (unsupervised learning): group similar items that likely come from the same category, and in doing so reveal hidden structure
  • Goal of Classification (supervised learning): extract features from the data that best assign new elements to ≥1 of several well-defined classes
  • Known classes give independent validation of the groups that emerge
  • Examples (genes × conditions heatmaps; Alizadeh, Nature 2000): proliferation genes in transformed cell lines; B-cell genes in blood cell lines; lymph node genes in diffuse large B-cell lymphoma (DLBCL); chronic lymphocytic leukemia

SLIDE 6

PCA, Dimensionality reduction

SLIDE 7

Geometric interpretation of SVD

M(x) = U(Σ(V*(x))): rotation (V*), then scaling (Σ), then rotation (U)

SLIDE 8
Low-rank Approximation

  • Solution via SVD: set the smallest r − k singular values to zero:
    A_k = U diag(σ_1, …, σ_k, 0, …, 0) V^T
  • Column notation: sum of k rank-1 matrices:
    A_k = Σ_{i=1}^{k} σ_i u_i v_i^T
  • Error (Eckart–Young): min_{rank(X)=k} ‖A − X‖_2 = ‖A − A_k‖_2 = σ_{k+1}; in Frobenius norm, ‖A − A_k‖_F² = σ_{k+1}² + … + σ_r² (numpy sketch below)
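
A minimal numpy sketch of this low-rank approximation (the matrix A here is a random stand-in, not course data):

```python
import numpy as np

# Rank-k approximation of a matrix via truncated SVD.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))      # toy stand-in (e.g. genes x experiments)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 5
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # keep only the top-k singular values

# Eckart-Young: the spectral-norm error of the best rank-k approximation
# is the first discarded singular value.
print(np.linalg.norm(A - A_k, ord=2), s[k])   # these two values agree
```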
SLIDE 9

PCA of MNIST digits

SLIDE 10

t-SNE of MNIST digits

(figure: 2-D t-SNE embedding with a separated cluster for each digit class)
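
A minimal sketch of both embeddings (assuming scikit-learn is available; it uses the small 8×8 digits set rather than full MNIST):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)            # 1797 digit images, 64 features each

X_pca = PCA(n_components=2).fit_transform(X)   # linear projection (SLIDE 9)
X_tsne = TSNE(n_components=2, perplexity=30,   # nonlinear neighbor embedding (SLIDE 10)
              init="pca", random_state=0).fit_transform(X)

# Plotting X_tsne colored by y typically shows tighter, better-separated
# digit clusters than the 2-component PCA projection.
```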

SLIDE 11

t-SNEs of single-cell Brain data

  • scRNA-seq in 48 individuals, 84k cells (Nature, 2019)
  • Brain hippocampus sub-structures: CA1, CA2-4, Subiculum, Dentate Gyrus (DG)
  • 16 Sz / 16 BP / 16 controls, 300k cells
  • scATAC-seq of 262k cells across 7 brain regions

SLIDE 12

Autoencoder: dimensionality reduction with neural net

  • Tricking a supervised learning algorithm into working in an unsupervised fashion
  • Feed the input as the output function to be learned, but constrain model complexity
  • Pretraining with RBMs to learn representations for future supervised tasks: use RBM output as "data" for training the next layer in the stack
  • After pretraining, "unroll" the RBMs to create a deep autoencoder
  • Fine-tune using backpropagation (sketch below)

[Hinton et al., 2006]
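
A minimal PyTorch sketch of the autoencoder idea (layer sizes are illustrative; the original work pre-trained with stacked RBMs rather than training end-to-end from scratch):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, n_in=784, n_hidden=256, n_code=30):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_code))           # low-dimensional bottleneck
        self.decoder = nn.Sequential(
            nn.Linear(n_code, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_in))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(32, 784)                        # stand-in batch of flattened images
loss = nn.functional.mse_loss(model(x), x)     # the input is its own target
loss.backward()
opt.step()
```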

SLIDE 13

Today: Predicting gene expression and splicing

  • 0. Review: Expression, unsupervised learning, clustering
  • 1. Up-sampling: predict 20,000 genes from 1000 genes
  • 2. Compressive sensing: Composite measurements
  • 3. DeepChrome+LSTMs: predict expression from chromatin
  • 4. Predicting splicing from sequence: 1000s of features
  • 5. Unsupervised deep learning: Restricted Boltzmann machines
  • 6. Multi-modal programs: Expr+DNA+miRNA RBMs (Liang et al.)
SLIDE 14
  • 1. Up-sampling gene expression patterns

SLIDE 15

Challenge: Measure few values, infer many values

  • Image up-scaling
    – Inverse of convolution (de-convolution)
    – Transfer learning from a corpus of images
    – Low-dim. re-projection to high-dim. image
    https://arxiv.org/pdf/1902.06068.pdf
  • Digital signal upscaling
    – Interpolating low-pass filter (e.g. FIR, finite impulse response)
    – Low-dim. capture of a higher-dim. signal
    – Nyquist rate (discrete) / frequency (continuous)
  • Gene expression measurements
    – Measure 1000 genes, infer the rest
    – Rapid, cheap, reference assay
    – Apply to millions of conditions
  • Which 1000 genes? Compressed sensing
    – Measure a few combinations of genes
    – Better capture the high-dimensional vector

SLIDE 16

Deep Learning architectures for up-sampling images

Four up-sampling strategies: pre-sampling super-resolution (SR), post-sampling SR, progressive up-sampling, iterative up-and-down sampling

  • Representation / abstraction learning
    – Enables compression, re-upscaling, denoising
    – Example: autoencoder bottleneck, high-low-high
    – Modification: de-compression / up-scaling only, low-high
SLIDE 17

D-GEX - Deep Learning for up-scaling L1000 gene expression

  • Multi-task multi-layer feed-forward neural net (sketch below)
  • Non-linear activation function (hyperbolic tangent)
  • Input: 943 landmark genes; Output: 9520 target genes (partitioned to fit in memory)
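
A hedged PyTorch sketch of the D-GEX idea (hidden-layer width and the 4760-gene output partition are illustrative choices, not necessarily the published configuration):

```python
import torch
import torch.nn as nn

# Multi-task regression: every output unit predicts one target gene.
dgex = nn.Sequential(
    nn.Linear(943, 3000), nn.Tanh(),   # hyperbolic tangent, as on the slide
    nn.Linear(3000, 3000), nn.Tanh(),
    nn.Linear(3000, 4760),             # one half of the 9520 targets per partition
)

landmarks = torch.randn(16, 943)       # stand-in batch of landmark-gene profiles
targets = torch.randn(16, 4760)
loss = nn.functional.mse_loss(dgex(landmarks), targets)
```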
SLIDE 18

D-GEX outperforms Linear Regression or K-nearest-Neighbors

  • Strictly better for nearly all genes
  • Lower error than LR or KNN
  • Training rapidly converges
  • Deeper = better

However: performance is still not great, and computational limitations remain

SLIDE 19

Today: Predicting gene expression and splicing

  • 0. Review: Expression, unsupervised learning, clustering
  • 1. Up-sampling: predict 20,000 genes from 1000 genes
  • 2. Compressive sensing: Composite measurements
  • 3. DeepChrome+LSTMs: predict expression from chromatin
  • 4. Predicting splicing from sequence: 1000s of features
  • 5. Unsupervised deep learning: Restricted Boltzmann machines
  • 6. Multi-modal programs: Expr+DNA+miRNA RBMs (Liang et al.)
SLIDE 20
  • 2. Composite measurements for compressed sensing

SLIDE 21

Key insight: Composite measurements better capture modules

  • Sparse Module Activity Factorization (SMAF): expression profiles as sparse combinations of gene modules (sketch below)
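
A hedged numpy/scikit-learn sketch of the underlying compressed-sensing idea (this is not the published SMAF algorithm; the module dictionary, measurement design, and sizes are all toy assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_genes, n_modules, n_meas = 1000, 50, 100

U = np.abs(rng.normal(size=(n_genes, n_modules)))       # gene-module dictionary
w = np.zeros(n_modules)
w[rng.choice(n_modules, 3, replace=False)] = 1.0        # few active modules
x = U @ w                                               # true expression vector

Phi = rng.normal(size=(n_meas, n_genes)) / np.sqrt(n_meas)  # composite measurements
y = Phi @ x                                             # 100 pooled readouts of 1000 genes

# Recover sparse module activities, then decompress to all genes.
w_hat = Lasso(alpha=0.01, max_iter=50_000).fit(Phi @ U, y).coef_
x_hat = U @ w_hat
print(np.corrcoef(x, x_hat)[0, 1])                      # near 1 when activity is sparse
```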

SLIDE 22

Making composite measurements in practice

  • Combinations of probes + barcodes for measurement
  • More consistent signal-to-noise ratios
SLIDE 23

Today: Predicting gene expression and splicing

  • 0. Review: Expression, unsupervised learning, clustering
  • 1. Up-sampling: predict 20,000 genes from 1000 genes
  • 2. Compressive sensing: Composite measurements
  • 3. DeepChrome+LSTMs: predict expression from chromatin
  • 4. Predicting splicing from sequence: 1000s of features
  • 5. Unsupervised deep learning: Restricted Boltzmann machines
  • 6. Multi-modal programs: Expr+DNA+miRNA RBMs (Liang et al.)
SLIDE 24
  • 3. Predicting Expression from Chromatin
SLIDE 25

Can we predict gene expression from chromatin information?

  • DNA methylation vs. gene expression
  • Promoters: high. Gene body: low
SLIDE 26

Strong enhancers (+H3K27ac) vs. weak enhancers (H3K4me1 only)

SLIDE 27

DeepChrome: positional histone features predictive of expression

  • Positional information for each histone mark (input: binned signal per mark)
  • Outperforms previous methods
  • Meaningful features selected
  • Architecture: convolution, pooling, drop-out, multi-layer perceptron (MLP); alternating linear / non-linear layers (sketch below)
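
A hedged PyTorch sketch of a DeepChrome-style architecture (the mark count, bin count, and layer sizes are illustrative):

```python
import torch
import torch.nn as nn

n_marks, n_bins = 5, 100                     # binned signal for 5 histone marks

deepchrome = nn.Sequential(
    nn.Conv1d(n_marks, 50, kernel_size=10),  # positional filters across all marks
    nn.ReLU(),
    nn.MaxPool1d(5),
    nn.Flatten(),
    nn.Dropout(0.5),
    nn.Linear(50 * 18, 625), nn.ReLU(),      # MLP: alternating linear / non-linear
    nn.Linear(625, 2),                       # high vs. low expression
)

signal = torch.randn(8, n_marks, n_bins)     # batch of 8 genes
logits = deepchrome(signal)
```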

SLIDE 28

AttentiveChrome: Selectively attend to specific marks/positions

  • Attention over LSTM (long short-term memory) modules
  • Hierarchical LSTM modules: interactions across marks
  • Consistent improvement over DeepChrome
  • Attention focuses on specific positions for specific marks

SLIDE 29

Today: Predicting gene expression and splicing

  • 0. Review: Expression, unsupervised learning, clustering
  • 1. Up-sampling: predict 20,000 genes from 1000 genes
  • 2. Compressive sensing: Composite measurements
  • 3. DeepChrome+LSTMs: predict expression from chromatin
  • 4. Predicting splicing from sequence: 1000s of features
  • 5. Unsupervised deep learning: Restricted Boltzmann machines
  • 6. Multi-modal programs: Expr+DNA+miRNA RBMs (Liang et al.)
SLIDE 30
  • 4. Predicting splicing from sequence
SLIDE 31

Deciphering tissue-specific splicing code

  • Input: an alternatively spliced exon (exon1–exon2–exon3), with 300 nt of sequence flanking each splice site, plus the tissue type
  • Feature set: known motifs, transcript structure in the target exon and adjacent exons (RNA feature extraction)
  • 3-class softplus prediction model: q_inc, q_exc, q_nc
  • Targets: exon inclusion (t_inc=1, t_exc=0, t_nc=0), exon exclusion (t_inc=0, t_exc=1, t_nc=0)

[Barash et al., 2010]

SLIDE 32

Bayesian neural network splicing code

  • Data: 1014 RNA features × 3665 exons; 4 mouse tissues, each with 3 classes (i.e., 12 output units)
  • Bayesian neural network:
    – the number of hidden units follows Poisson(λ)
    – network weights follow a spike-and-slab prior, Bern(1 − α)
    – the likelihood is cross-entropy
    – network weights are sampled from the posterior

[Xiong et al., 2011]

SLIDE 33

Predicts disease-causing mutations from the splicing code

[Xiong et al., 2011]

SLIDE 34

Predicts disease-causing mutations from the splicing code

Scoring splicing changes due to a SNP, ∆ψ (sketch below):

  • Train the splicing code model on 10,689 exons to predict the 3 splicing classes over 16 human tissues, using 1393 sequence features (motifs & RNA structures)
  • Score both the reference (ψ_ref) and the alternative (ψ_alt) sequence harboring one of the 658,420 common variants
  • Calculate ∆ψ_t = ψ_t^ref − ψ_t^alt over each tissue t
  • Take the largest absolute or aggregate ∆ψ_t to score the effect of each SNP

[Xiong et al., 2011]
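
A minimal numpy sketch of the final scoring step (`psi_ref`/`psi_alt` stand in for the model's predicted inclusion levels; this helper is hypothetical, not the authors' code):

```python
import numpy as np

def delta_psi_score(psi_ref: np.ndarray, psi_alt: np.ndarray) -> float:
    """Largest-magnitude per-tissue change in predicted inclusion."""
    d = psi_ref - psi_alt                   # delta-psi_t for each tissue t
    return d[np.argmax(np.abs(d))]

psi_ref = np.array([0.80, 0.75, 0.90])      # toy predictions over 3 tissues
psi_alt = np.array([0.78, 0.40, 0.88])      # same exon with the variant allele
print(delta_psi_score(psi_ref, psi_alt))    # 0.35 -> candidate splicing-disrupting SNP
```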

SLIDE 35

Predicted scores are indicative of disease-causing mutations

SLIDE 36

Predicted scores are indicative of disease-causing mutations

SLIDE 37

Predicted mutations in MLH1,2 in nonpolyposis colorectal cancer patients are validated via RT-PCR

SLIDE 38

Splice code goes deep

Architecture of the new network to predict alternative splicing between two tissues. It contains three hidden layers, with hidden variables that jointly represent genomic features and tissue types. [Leung et al., 2014]

SLIDE 39

Limitations of the splice code model

  • Requires a threshold to define discrete splicing targets
  • Does not take into account exon expression level in specific tissue types
  • A fully connected neural network potentially imposes a large number of parameters: (1393 inputs + 13 outputs) × 10 hidden units ≈ 14,000 parameters
  • Although the authors showed that the neural network performs best, a softplus/Dirichlet multivariate linear regression may achieve similar performance
  • The features are pre-defined and thus may not completely reflect the underlying splicing mechanism
  • Interpretation of the importance of features is not trivial
SLIDE 40

Today: Predicting gene expression and splicing

  • 0. Review: Expression, unsupervised learning, clustering
  • 1. Up-sampling: predict 20,000 genes from 1000 genes
  • 2. Compressive sensing: Composite measurements
  • 3. DeepChrome+LSTMs: predict expression from chromatin
  • 4. Predicting splicing from sequence: 1000s of features
  • 5. Unsupervised deep learning: Restricted Boltzmann machines
  • 6. Multi-modal programs: Expr+DNA+miRNA RBMs (Liang et al.)
SLIDE 41
  • 5. Unsupervised deep learning with Restricted Boltzmann Machines (RBMs)

SLIDE 42

How the brain works inspired artificial "neural" networks

  • Biological neuron: dendritic tree, cell body, axon hillock, axon
  • Artificial perceptron: weighted sum z = b + Σ_i x_i w_i, passed through an activation function (sketch below)
  • Neural network: layers of perceptrons (e.g. 4 layers 'deep')
  • Deep multi-layer neural networks can 'learn' almost any function → deep 'unsupervised' learning?
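
A minimal numpy sketch of the perceptron formula above:

```python
import numpy as np

def perceptron(x, w, b):
    z = b + x @ w                        # weighted sum: z = b + sum_i(x_i * w_i)
    return 1.0 / (1.0 + np.exp(-z))      # sigmoid activation (one common choice)

x = np.array([0.5, -1.0, 2.0])           # inputs
w = np.array([0.4, 0.6, -0.1])           # weights
print(perceptron(x, w, b=0.1))
```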

SLIDE 43

General Boltzmann Machine: Unsupervised learning

  • Symmetrically connected network of visible units v and hidden units h (no target 'output')
  • Each binary unit makes a stochastic on/off decision
  • Network weights learn relationships between variables
  • A configuration dictates its 'energy' E(v, h); at equilibrium, configurations follow the Boltzmann distribution (probability ∝ exponentiated negative energy):
    E = −Σ_i s_i b_i − Σ_{i<j} s_i s_j w_ij, with s_i ∈ {0,1} the state of unit i, b_i its bias, w_ij the connection weights, and P(v) the energy-dependent probability of a visible configuration
  • Goal: given v, learn weights w_ij that maximize P(v); the Boltzmann machine becomes a universal approximator of probability mass functions over discrete variables
  • Advantage: local learning rules; infer each variable from its neighbors only. No need for example annotations, no output function
  • Problem: difficult to train, dependencies between hidden units

[Ackley et al., 1985; Le Roux & Bengio, 2008]

SLIDE 44

Restricted Boltzmann Machine (RBM)

  • Bipartite graph. No hh and no vv connections
  • 1 layer of hidden units, 1 layer of visible units.
  • Simple unsupervised learning module
  • Much easier to train than GBM: no circularities

input v2 h3 h2 h1 v1 hidden

[Hinton and Osindero, 2006] However: <v^

i, h^ j> model still too large to estimate.

 apply Markov Chain Monte Carlo (MCMC) (i.e., Gibbs sampling) Objective function:
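
A hedged numpy sketch of RBM training with one step of Gibbs sampling (contrastive divergence, CD-1); the sizes and learning rate are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)            # visible biases
        self.b_h = np.zeros(n_hidden)             # hidden biases
        self.lr = lr

    def cd1_step(self, v0):
        # positive phase: <v_i h_j> under the data distribution
        ph0 = sigmoid(v0 @ self.W + self.b_h)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)   # stochastic on/off
        # negative phase: one Gibbs step stands in for <v_i h_j> under the model
        pv1 = sigmoid(h0 @ self.W.T + self.b_v)
        ph1 = sigmoid(pv1 @ self.W + self.b_h)
        # contrastive-divergence update of weights and biases
        n = len(v0)
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b_v += self.lr * (v0 - pv1).mean(axis=0)
        self.b_h += self.lr * (ph0 - ph1).mean(axis=0)

rbm = RBM(n_visible=784, n_hidden=128)
batch = (rng.random((32, 784)) < 0.5).astype(float)   # stand-in binary data
rbm.cd1_step(batch)
```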

SLIDE 45

Stacking RBMs → Deep belief network

  • 1. First apply an RBM to find a sensible set of weights using unlabelled data
  • 2. Then use the pre-trained weights to initialize backpropagation and classify labelled data, adding a softplus output layer on top of the stacked hidden layers (sketch below)
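
A hedged scikit-learn sketch of the stacking idea: each `BernoulliRBM`'s hidden activities become the "data" for the next RBM, and a supervised layer sits on top. (Note this only trains the top layer on labels; full fine-tuning would backpropagate through the whole stack, and the slide's softplus output is replaced here by logistic regression.)

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import minmax_scale

X, y = load_digits(return_X_y=True)
X = minmax_scale(X)                       # RBM visible units expect [0, 1] values

dbn = Pipeline([
    # greedy layer-wise pretraining on unlabelled inputs
    ("rbm1", BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20, random_state=0)),
    # supervised layer trained on the labels
    ("clf", LogisticRegression(max_iter=1000)),
])
dbn.fit(X, y)
print(dbn.score(X, y))
```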

SLIDE 46

Look into the mind of the network: generative model

1st column: Sample from generative model with each label clamped on. 2nd column: 20 iterations of alternating Gibbs sampling in associative memory. etc… (Figure 9, Hinton et al., 2006).

Interactive visualization of network learning: http://www.cs.toronto.edu/~hinton/digits.html

SLIDE 47

Today: Predicting gene expression and splicing

  • 0. Review: Expression, unsupervised learning, clustering
  • 1. Up-sampling: predict 20,000 genes from 1000 genes
  • 2. Compressive sensing: Composite measurements
  • 3. DeepChrome+LSTMs: predict expression from chromatin
  • 4. Predicting splicing from sequence: 1000s of features
  • 5. Unsupervised deep learning: Restricted Boltzmann machines
  • 6. Multi-modal programs: Expr+DNA+miRNA RBMs (Liang et al.)
SLIDE 48
  • 6. Multimodal unsupervised deep learning for data integration with RBMs

SLIDE 49

RBMs for TCGA cancer integration: Expression, miRNAs, Methylation

Hierarchical model integrates:

  • gene expression (GE)
  • miRNA expression (ME)
  • DNA methylation (DM)

Energy function combines multiple data types

SLIDE 50

Learned patient groups show different survival and drug responses

  • Capture independent variables from molecular data