CSEP 527 Computational Biology Gene Expression Analysis 1 - - PowerPoint PPT Presentation



slide-1
SLIDE 1

CSEP 527
 Computational Biology

Gene Expression Analysis

1

slide-2
SLIDE 2

Assaying Gene Expression

3

slide-3
SLIDE 3

Microarrays

4

slide-4
SLIDE 4

RNAseq

DNA Sequencer
⬇
Millions of reads, say, 100 bp each
⬇
map to genome, analyze

slide-5
SLIDE 5

Goals of RNAseq

#1: Which genes are being expressed?

How? assemble reads (fragments of mRNAs) into (nearly) full-length mRNAs and/or map them to a reference genome

#2: How highly expressed are they?

How? count how many fragments come from each gene; expect more highly expressed genes to yield more reads, after correcting for biases like mRNA length

#3: What’s same/diff between 2 samples

E.g., tumor/normal

#4: ...
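As a rough illustration of goal #2, the sketch below normalizes fragment counts by mRNA length and rescales them to sum to one million (a TPM-style measure, which is a common convention but is not named on the slide). The counts and lengths are made-up values, and the other biases discussed later are ignored here.

```python
def tpm(counts, lengths_bp):
    """Length-normalize fragment counts, then rescale to sum to 1e6."""
    # Fragments per base: longer mRNAs yield more fragments at equal abundance.
    rates = [c / l for c, l in zip(counts, lengths_bp)]
    total = sum(rates)
    return [1e6 * r / total for r in rates]


counts = [100, 200, 100]      # hypothetical fragment counts for three genes
lengths = [1000, 2000, 500]   # hypothetical mRNA lengths in bp
values = tpm(counts, lengths)
print(values)
```

Genes 1 and 2 end up with equal values even though gene 2 has twice the reads, because it is also twice as long; gene 3 has the same count as gene 1 but is half as long, so it scores higher.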

7

slide-6
SLIDE 6

Recall: splicing

[Figure: gene diagram showing a 5’ exon, an intron, and a following exon]

slide-7
SLIDE 7

RNAseq Data Analysis

De novo Assembly

mostly de Bruijn-based, but likely to change with longer reads
more complex than genome assembly due to alt splicing, wide diffs in expression levels; e.g. often multiple “k’s” used
pro: no ref needed (non-model orgs), novel discoveries possible, e.g. very short exons
con: less sensitive to weakly-expressed genes

Reference-based (more later)

pro/con: basically the reverse

Both: subsequent bias correction, quantitation, differential expression calls, fusion detection, etc.
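To make "de Bruijn-based" concrete, here is a minimal sketch of building a de Bruijn graph from reads. It is a toy under simplifying assumptions (error-free reads, a single k, no edge multiplicities), not how any particular assembler is implemented; the function name and reads are illustrative.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


reads = ["ACGTAC", "CGTACG", "GTACGT"]   # toy overlapping reads
graph = de_bruijn(reads, k=4)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Real transcriptome assemblers also track how often each k-mer occurs, which matters because expression levels (and hence k-mer coverage) differ widely between genes.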

8

slide-8
SLIDE 8

“TopHat” (Ref based example)

  • map reads to ref transcriptome (optional)
  • map reads to ref genome
  • unmapped reads remapped as 25mers
  • novel splices = 25mers anchored 2 sides
  • stitch original reads across these
  • remap reads with minimal overlaps
  • Roughly: 10m reads/hr, 4Gbytes (typical data set 100m–1b reads)

BWA

9

slide-9
SLIDE 9 Figure 6

Kim,et al. 2013. “TopHat2: Accurate Alignment of Transcriptomes in the Presence of Insertions, Deletions and Gene Fusions.” Genome Biology 14 (4) (April 25): R36. doi:10.1186/gb-2013-14-4-r36.

slide-10
SLIDE 10

RNAseq Example

[Figure: genome browser view of FCGRT (chr19, hg19, 50,020,000–50,025,000; 5 kb scale bar) with tracks for Day 20 and 1 Year samples]

slide-11
SLIDE 11

RNAseq protocol (approx)

Extract RNA (either polyA ↔ polyT or tot – rRNA)
Reverse-transcribe into DNA (“cDNA”)
Make double-stranded, maybe amplify
Cut into, say, ~300bp fragments
Add adaptors to each end
Sequence ~100-175bp from one or both ends
CAUTIONS: non-uniform sampling, sequence (e.g. G+C), 5’-3’, and length biases

6

slide-12
SLIDE 12

Two Stories:
RNAseq Bias Correction & Isoform Quantification
Let-7 & Cardiomyocyte Maturation

Walter L. (Larry) Ruzzo

Computer Science and Engineering Genome Sciences University of Washington Fred Hutchinson Cancer Research Center Seattle, WA, USA

Copenhagen, 2015-Aug-18 ruzzo@uw.edu

slide-13
SLIDE 13

Story 1

RNAseq:
 Bias Correction & Alt Splicing

slide-14
SLIDE 14

“All High-Throughput Technologies are Crap – Initially”

Q. Morris, 7-20-2015

slide-15
SLIDE 15

RNA seq

RNA → (cDNA, fragment, end repair, A-tail, ligate, PCR, …) → Sequence → (QC filter, trim, map, …) → Count

It’s so easy, what could possibly go wrong?

slide-16
SLIDE 16


What we expect: Uniform Sampling

Uniform sampling of 4000 “reads” across a 200 bp “exon.” Average 20 ± 4.7 per position, min ≈ 9, max ≈33
 I.e., as expected, we see ≈ μ ± 3σ in 200 samples

Count reads starting at each position, not those covering each position
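The uniform expectation is easy to reproduce. The sketch below drops 4000 simulated read starts uniformly on a 200 bp "exon" and summarizes the per-position counts; the seed is arbitrary, so the exact minimum and maximum will differ from the slide's figure.

```python
import random
import statistics

random.seed(0)                      # arbitrary seed, for reproducibility only
exon_len, n_reads = 200, 4000

# Count reads *starting* at each position (not reads covering it).
starts = [0] * exon_len
for _ in range(n_reads):
    starts[random.randrange(exon_len)] += 1

mean = statistics.mean(starts)
sd = statistics.pstdev(starts)
print(f"mean={mean:.1f} sd={sd:.1f} min={min(starts)} max={max(starts)}")
```

The mean is exactly 20 by construction, and the standard deviation should land near √20 ≈ 4.5, so counts far outside roughly 20 ± 3σ would be surprising under uniform sampling.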

slide-17
SLIDE 17

The bad news: random fragments are not so uniform.

What we get: highly non-uniform coverage

[Figure: read-start counts per position across 200 nucleotides of a 3’ exon, Mortazavi data; legend: Uniform, Actual]

E.g., assuming uniform, the 8 peaks above 100 are ≳ +10σ above mean

Count reads starting at each position, not those covering each position

slide-18
SLIDE 18


How to make it more uniform?

A: Math tricks like averaging/smoothing (e.g. “coverage”) or transformations (“log”), …, or

B: Try to model (aspects of) causation (WE DO THIS)

slide-19
SLIDE 19

The bad news: random fragments are not so uniform.

not perfect, but better: 38% reduction in LLR of uniform model; hugely more likely

What we get: highly non-uniform coverage

[Figure: read-start counts per position across 200 nucleotides; legend: Uniform, Actual]

The Good News: we can (partially) correct the bias

slide-20
SLIDE 20

Fitting a model of the sequence surrounding read starts lets us predict which positions have more reads.

Bias is (in part) sequence-dependent

and platform/sample-dependent

slide-21
SLIDE 21

Method Outline

[Figure, panels (a)–(e); labeled steps:]
sample (local) background sequences
sample foreground sequences
train Bayesian network

I.e., learn sequence patterns associated w/ high / low read counts.

slide-22
SLIDE 22
slide-23
SLIDE 23

Modeling Sequence Bias

Want a probability distribution over k-mers, k ≈ 40?
Some obvious choices:
Full joint distribution: $4^k - 1$ parameters
PWM (0-th order Markov): $(4-1) \cdot k$ parameters
Something intermediate: Directed Bayes network
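The gap between these choices is easiest to see numerically. The sketch below counts free parameters for k = 40; the Bayes-net count assumes full conditional probability tables, i.e. a node with m parents needs 3 free parameters for each of its 4^m parent configurations, and the example parent counts are made up for illustration.

```python
def full_joint_params(k):
    """One probability per k-mer, minus one for normalization."""
    return 4 ** k - 1


def pwm_params(k):
    """Three free parameters at each of k independent positions."""
    return (4 - 1) * k


def bayes_net_params(parent_counts):
    """Assumes full tables: 3 free parameters per parent configuration."""
    return sum(3 * 4 ** m for m in parent_counts)


k = 40
print(full_joint_params(k))              # astronomically large
print(pwm_params(k))                     # 120
# Hypothetical sparse network: 30 nodes with no parents, 10 nodes with 2 parents.
print(bayes_net_params([0] * 30 + [2] * 10))
```

A network with no arrows reduces to the PWM count, while each added parent multiplies that node's parameter count by four, so sparse networks stay far below the full joint.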

12

slide-24
SLIDE 24

Form of the models: Directed Bayes nets

One “node” per nucleotide, ±20 bp of read start

  • Filled node means that position is biased
  • Arrow i → j means letter at position i modifies bias at j
  • For both, numeric parameters say how much

How: optimize

$$\ell = \sum_{i=1}^{n} \log \Pr[x_i \mid s_i] = \sum_{i=1}^{n} \log \frac{\Pr[s_i \mid x_i]\,\Pr[x_i]}{\sum_{x \in \{0,1\}} \Pr[s_i \mid x]\,\Pr[x]}$$
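The objective on this slide is a conditional log-likelihood: how well the model predicts each training sequence's label from its sequence. The sketch below evaluates it for a toy case, assuming x = 1 marks a foreground sequence and x = 0 a background one, and using position-independent models in place of a Bayes net to keep it short. All names, sequences and probabilities are illustrative.

```python
import math


def seq_prob(seq, model):
    """Pr[s | x] under a position-independent model: model[j][base]."""
    p = 1.0
    for j, base in enumerate(seq):
        p *= model[j][base]
    return p


def conditional_log_likelihood(seqs, labels, models, prior):
    """Sum over i of log Pr[x_i | s_i], with Pr[x | s] from Bayes' rule."""
    total = 0.0
    for s, x in zip(seqs, labels):
        joint = {c: seq_prob(s, models[c]) * prior[c] for c in (0, 1)}
        total += math.log(joint[x] / (joint[0] + joint[1]))
    return total


uniform = {b: 0.25 for b in "ACGT"}
g_rich = {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1}   # hypothetical bias at position 0
seqs, labels = ["GA", "AT"], [1, 0]                  # toy foreground / background pair
prior = {0: 0.5, 1: 0.5}

empty = conditional_log_likelihood(seqs, labels, {0: [uniform] * 2, 1: [uniform] * 2}, prior)
biased = conditional_log_likelihood(seqs, labels, {0: [uniform] * 2, 1: [g_rich, uniform]}, prior)
print(empty, biased)
```

An "empty" model (foreground identical to background) cannot beat a coin flip, so any structure added during training has to earn its place by raising this score.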

slide-25
SLIDE 25

NB:

  • Not just initial hexamer
  • Span ≥ 19
  • All include negative positions
  • All different, even on same platform

Illumina ABI

slide-26
SLIDE 26

Result – Increased Uniformity

[Figure: Kullback-Leibler divergence on Trapnell data, comparing methods: Jones; Li et al; Hansen et al]

slide-27
SLIDE 27

Result – Increased Uniformity

Fractional improvement in log-likelihood under uniform model across 1000 exons ($R^2 = 1 - L'/L$)

hypothesis test: “Is BN better than X?” (1-sided Wilcoxon signed-rank test)

* = p-value < $10^{-23}$

slide-28
SLIDE 28

“First, do no harm”

Theorem: The probability of “false bias discovery,” i.e., of learning a non-empty model from n reads sampled from unbiased data, declines exponentially with n.

If > 10,000 reads are used, the probability of a non-empty model < 0.0004

[Figure: Prob(non-empty model | unbiased data) vs. number of training reads]

slide-29
SLIDE 29

how different are two distributions?

Given: r-sided die, with probs $p_1, \ldots, p_r$ of each face. Roll it n = 10,000 times; observed frequencies = $q_1, \ldots, q_r$ (the MLEs for the unknown $p_i$’s). How close is $p_i$ to $q_i$?

$$H(Q\|P) = \sum_i q_i \ln\frac{q_i}{p_i}$$

Fancy name, simple idea: H(Q||P) is just the expected per-sample contribution to the log-likelihood ratio test for “was X sampled from H0: P vs H1: Q?”

So, for odds of 1000 : 1 from m samples we need $m\,H(Q\|P) \ge \ln 1000$, i.e.

$$m \ge \frac{\ln 1000}{H(Q\|P)}$$

When $p_i \approx q_i$, H is small and many samples are needed.
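As a worked instance of the sample-size idea, the sketch below computes the relative entropy (in nats) between a hypothetical loaded die and a fair one, then the number of rolls at which the expected log-likelihood ratio reaches ln 1000. The loaded-die probabilities are invented for illustration.

```python
import math


def kl_divergence(q, p):
    """H(Q||P) = sum_i q_i ln(q_i / p_i), taking 0 ln 0 = 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


fair = [1 / 6] * 6
loaded = [0.25] + [0.15] * 5      # hypothetical loaded die

h = kl_divergence(loaded, fair)
m = math.ceil(math.log(1000) / h)  # smallest m with m * H >= ln 1000
print(f"H = {h:.4f} nats per roll; about {m} rolls for 1000:1 odds")
```

The closer the two distributions, the smaller H and the more samples it takes to tell them apart.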

slide-30
SLIDE 30

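Let P be the distribution $p_1, \ldots, p_r$ (e.g., uniform, $p_i = 1/r$, with $r = 4^k$ for k-mers). Let $X_1, X_2, \ldots, X_r$ be the observed counts in $n = \sum_i X_i$ samples from P, and

$$q_i = \frac{X_i}{n} \approx p_i$$

Q is close to P in expectation:

$$E[q_i] = E\left[\frac{X_i}{n}\right] = \frac{E[X_i]}{n} = \frac{n p_i}{n} = p_i$$

(with fluctuations on the order of $1/\sqrt{n}$), and

$$H(Q\|P) = \sum_{i=1}^{r} q_i \ln\frac{q_i}{p_i} = \sum_{i=1}^{r} q_i \ln\left(1 + \frac{q_i - p_i}{p_i}\right)$$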

slide-31
SLIDE 31

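Using the Taylor series $\ln(1 + x) \approx x - \frac{x^2}{2}$:

$$H(Q\|P) \approx \sum_{i=1}^{r} q_i \left[ \frac{q_i - p_i}{p_i} - \frac{1}{2}\left(\frac{q_i - p_i}{p_i}\right)^2 \right] = \sum_{i=1}^{r} \left[ q_i\,\frac{q_i - p_i}{p_i} - \frac{q_i}{2p_i}\,\frac{(q_i - p_i)^2}{p_i} \right]$$

Since $\sum_{i=1}^{r} q_i = \sum_{i=1}^{r} p_i = 1$, we have $\sum_{i=1}^{r} p_i\,\frac{q_i - p_i}{p_i} = 0$, so

$$H(Q\|P) \approx \sum_{i=1}^{r} \left[ q_i\,\frac{q_i - p_i}{p_i} - p_i\,\frac{q_i - p_i}{p_i} - \frac{q_i}{2p_i}\,\frac{(q_i - p_i)^2}{p_i} \right] = \sum_{i=1}^{r} \frac{(q_i - p_i)^2}{p_i}\left(1 - \frac{q_i}{2p_i}\right) \approx \frac{1}{2} \sum_{i=1}^{r} \frac{(q_i - p_i)^2}{p_i}$$

(the last step uses $q_i \approx p_i$). Multiplying by $n^2/n^2$:

$$H(Q\|P) \approx \frac{1}{2n} \sum_{i=1}^{r} \frac{(n q_i - n p_i)^2}{n p_i} = \frac{1}{2n} \sum_{i=1}^{r} \frac{(X_i - E[X_i])^2}{E[X_i]}$$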

slide-32
SLIDE 32

The sum $\sum_{i=1}^{r} (X_i - E[X_i])^2 / E[X_i]$ is the χ² statistic: as n → ∞ it follows a χ² distribution with r − 1 degrees of freedom, which has mean r − 1. So, for Q sampled from P:

$$E[H(Q\|P)] = \frac{r - 1}{2n}$$

[Figure: Relative Entropy, wrt Uniform, of Observed n balls in r bins. x-axis: log2(n); y-axis: log2(relative entropy). Each circle is mean of 100 trials; stars are theoretical estimates for n/r >= 1/4. Curves for r = 2, 16, 64, 256, 1024, 16384.]

slide-33
SLIDE 33

… and after a modicum of algebra:

$$E[H(Q\|P)] = \frac{r - 1}{2n}$$

… which empirically is a good approximation:

[Figure: Relative Entropy, wrt Uniform, of Observed n balls in r bins. x-axis: log2(n); y-axis: log2(relative entropy). Each circle is mean of 100 trials; stars are theoretical estimates for n/r >= 1/4. Curves for r = 2, 16, 64, 256, 1024, 16384.]

LLR of error rises with number of parameters r, declines with size of training set n
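The balls-in-bins experiment is simple to repeat. The sketch below throws n balls into r bins uniformly, averages the relative entropy (in nats) of the observed frequencies against the uniform distribution over 100 trials, and compares it with (r − 1)/(2n). The choices r = 16 and n = 2000 and the seed are arbitrary.

```python
import math
import random


def relative_entropy(counts, n):
    """H(Q||P) in nats for observed counts vs. uniform P, with 0 ln 0 = 0."""
    r = len(counts)
    return sum((c / n) * math.log((c / n) * r) for c in counts if c > 0)


def mean_entropy(n, r, trials=100, seed=0):
    """Average H(Q||P) over repeated draws of n balls into r bins."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        counts = [0] * r
        for _ in range(n):
            counts[rng.randrange(r)] += 1
        total += relative_entropy(counts, n)
    return total / trials


r, n = 16, 2000
observed = mean_entropy(n, r)
predicted = (r - 1) / (2 * n)
print(f"observed={observed:.5f} predicted={predicted:.5f}")
```

Doubling n should roughly halve both numbers, and increasing r should raise them, matching the slide's reading that error grows with parameter count and shrinks with training-set size.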

slide-34
SLIDE 34

… while accuracy and runtime rise with n (empirically)

[Figure: $R^2$ vs. number of training reads]

Training time: $10^4$ reads in minutes; $10^5$ reads in an hour

slide-35
SLIDE 35

does it matter?

Possible objection to the approach:
Typical expts compare gene A in sample 1 to itself in sample 2. Gene A’s sequence is unchanged, “so the bias is the same” & correction is useless/dangerous

Responses:
  • SNPs and/or alternative splicing might have a big effect, if samples are genetically different and/or engender changes in isoform usage
  • Atypical experiments, e.g., imprinting, allele specific expression, xenografts, ribosome profiling, ChIPseq, RAPseq, …
  • Bias is sample-dependent, to an unknown degree
  • Strong control of “false bias discovery” ⇒ little risk

24

slide-36
SLIDE 36

25

Batch Effects? YES!

[Figure, panel a: pairwise proportionality correlation (scale 0.00 to 1.00) among samples AGR 1, AGR 2, BGI 1, BGI 2, CNL 1, CNL 2, MAY 1, MAY 2, NVS 1, NVS 2, one plot per method]

Method      Proportionality Correlation
Kallisto    0.761
Cufflinks   0.778
RSEM/ML     0.784
Salmon      0.800
eXpress     0.805
Sailfish    0.808
RSEM/PM     0.917
BitSeq      0.922
Isolator    0.961

[Figure, panel b: change in correlation due to bias correction (axis −0.04 to 0.04) for Isolator, Salmon, Cufflinks, eXpress, BitSeq, Kallisto]

A: Pairwise proportionality correlation between technical replicates; 1 lane of 2 flowcells each at 5 sites, all HiSeq 2000. B: The absolute change in correlation induced by enabling bias correction (where available). For clarity, BitSeq est. of “MAY 2” excluded; bias correction was extremely detrimental there.

slide-37
SLIDE 37

Associate Editor: Alex Bateman ABSTRACT Motivation: Quantification of sequence abundance in RNA-Seq experiments is often conflated by protocol-specific sequence bias. The exact sources of the bias are unknown, but may be influenced by

These biases may adversely effect transcript discovery, as low level noise may be overreported in some regions, and in others, active transcription may be underreported. They render untrustworthy comparisons of relative abundance between genes

26

slide-38
SLIDE 38

27

Availability

http://bioconductor.org/packages/release/bioc/html/seqbias.html

slide-39
SLIDE 39

Alternate Splicing

slide-40
SLIDE 40

29

[Figure: coverage tracks for PALM (chr19, hg19, 740,000–745,000; 5 kb scale bar), six tracks from Day 20 and 1 Year samples]

slide-41
SLIDE 41

30

[Figure: coverage tracks for MYH14 (chr19, hg19, 50,727,000–50,729,000; 1 kb scale bar), six tracks from Day 20 and 1 Year samples]

slide-42
SLIDE 42

31

[Figure: coverage tracks for FCGRT (chr19, hg19, 50,020,000–50,025,000; 5 kb scale bar), Day 20 and 1 Year samples]

slide-43
SLIDE 43

Is Isoform Quantification Hard?

Sequencing depth per-isoform is lower
Many reads ambiguously mapped to multiple isoforms
Isoform proportions and total expression may both vary
All the previously-mentioned bias issues, including batch effects, affect all measurements
Differences among isoforms may be only a small fraction of nucleotides in transcript, potentially exacerbating bias
Isoform annotation is incomplete/poor

32

slide-44
SLIDE 44

33

Liu, et al. BMC Bioinformatics 15.1 (2014): 364

slide-45
SLIDE 45

Isolator
Soon to be the world’s best isoform quantitation tool
Bayesian hierarchical model + fast MCMC sampler give mean and uncertainty in estimates
Can handle dozens of RNAseq samples per hour

In Progress

When data is lacking, estimates are shrunk towards each other, suppressing spurious changes. 1 read vs. 2 reads is probably not a 2-fold change in transcription!

[Figure: model hierarchy. Experiment → Conditions (1 Year, Day 20) → Replicates (1, 2, 3 per condition)]

slide-46
SLIDE 46

Why a Hierarchical Bayesian Model?

In a nutshell:
A natural assumption is that “nothing has changed,” unless refuted by data. (Most genes don’t change.)
Hierarchical model allows estimation of baseline expression/isoform usage/variability across all samples
This helps compensate for lower per-isoform coverage
Ex: Given 4 isoforms with 1, 1, 2, 2 reads in condition A vs 2, 2, 1, 1 reads in condition B do you think all 4 are 2x different?
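A toy calculation shows why shrinkage tempers "1 read vs. 2 reads". The sketch below is not Isolator's model; it uses the textbook Gamma-Poisson conjugate pair, where a Gamma(shape, rate) prior on a Poisson rate and one observed count give a posterior mean of (shape + count) / (rate + 1). The prior values are arbitrary.

```python
def posterior_mean(reads, prior_shape, prior_rate):
    """Posterior mean of a Poisson rate under a Gamma(shape, rate) prior."""
    return (prior_shape + reads) / (prior_rate + 1)


shape, rate = 3.0, 2.0   # hypothetical prior shared by both conditions

# Low coverage: 1 read vs. 2 reads.
naive_fold = 2 / 1
shrunk_fold = posterior_mean(2, shape, rate) / posterior_mean(1, shape, rate)

# High coverage: 1000 reads vs. 2000 reads.
deep_fold = posterior_mean(2000, shape, rate) / posterior_mean(1000, shape, rate)

print(naive_fold, shrunk_fold, deep_fold)
```

With almost no data the estimated fold change is pulled toward 1, while with thousands of reads the prior barely matters and the estimate stays near 2.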

35

slide-47
SLIDE 47

Why MCMC?

In a nutshell: posterior means are more stable than MLEs

Likelihood surface max often a broad plateau, not a sharp peak

Toy example:
Isoform 1, length 1k
Isoform 2, length 2k

[Figure: the two isoforms drawn as bars, with two candidate read positions marked]

For simple likelihood model, one read here yields MLE expression of Iso1 twice that of Iso2
But one read here gives zero as MLE for Iso1!
OTOH, posterior mean is not zero in either case
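The second case can be checked on a grid. The sketch below makes several assumptions that are not stated on the slide: θ is the probability that a read originates from Iso1, reads fall uniformly along their isoform, the single observed read lies in a part of Iso2 that Iso1 does not contain, and the prior on θ is flat.

```python
def likelihood(theta):
    """Likelihood of one read in the Iso2-only region: only Iso2 can
    produce it, uniformly over its 2000 bp."""
    return (1.0 - theta) / 2000.0


grid = [i / 1000 for i in range(1001)]    # candidate values of theta
weights = [likelihood(t) for t in grid]   # flat prior: posterior ∝ likelihood

mle = max(grid, key=likelihood)
posterior_mean = sum(t * w for t, w in zip(grid, weights)) / sum(weights)
print(f"MLE for Iso1 share = {mle}, posterior mean = {posterior_mean:.3f}")
```

The maximum sits at the boundary θ = 0, whereas the posterior mean averages over everything the single read leaves plausible and comes out near 1/3 under these assumptions.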

slide-48
SLIDE 48

Some Benchmarks

“Sequencing Quality Consortium” (SEQC)
4 RNA samples with spike-ins
They ran RNAseq
They did extensive PCR for “gold standard”
We ran multiple tools (on common alignment)
Evaluated “Proportionality correlation” (2•covariance/sum-of-variances, log-scale; usual -1 … 1 range)
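The parenthetical definition translates directly into code. The sketch below computes 2·cov(log x, log y) / (var(log x) + var(log y)) for two vectors of positive expression estimates; the function name is illustrative, and how zero estimates are handled (for example with a pseudocount) is left open here.

```python
import math
import statistics


def proportionality_correlation(x, y):
    """2 * covariance / sum of variances, computed on log-scale values."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = statistics.mean(lx), statistics.mean(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / len(lx)
    return 2 * cov / (statistics.pvariance(lx) + statistics.pvariance(ly))


x = [1.0, 2.0, 4.0, 8.0]
print(proportionality_correlation(x, [2 * v for v in x]))   # proportional
print(proportionality_correlation(x, [v ** 2 for v in x]))  # correlated, not proportional
print(proportionality_correlation(x, [1 / v for v in x]))   # inversely related
```

Unlike Pearson correlation, a perfectly correlated but non-proportional pair (such as y = x²) scores below 1, which is the point of using it to compare quantification methods.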

slide-49
SLIDE 49

38

Table 2: Proportionality correlation between gene-level quantification of 18353 genes using PrimePCR qPCR and RNA-Seq quantification.
slide-50
SLIDE 50

39

slide-51
SLIDE 51

40

c = 0.75a + 0.25b

d = 0.25a + 0.75b

slide-52
SLIDE 52

41

  • 0.919
slide-53
SLIDE 53

42

[Figure: Correlation (0.6 to 0.9) vs. Number of Reads ($10^5$ to $10^{7.5}$; (100x2)) by Method: Isolator, Cufflinks, RSEM/ML, BitSeq, Salmon, eXpress, Sailfish, Kallisto. RSEM/PM: < 0.6]

slide-54
SLIDE 54

43

Batch Effects? YES!

[Figure repeated from the earlier “Batch Effects? YES!” slide, with the same per-method proportionality correlations (Kallisto 0.761 through Isolator 0.961)]

A: Pairwise proportionality correlation between technical replicates; 1 lane of 2 flowcells each at 5 sites, all HiSeq 2000. B: The absolute change in correlation induced by enabling bias correction (where available). For clarity, BitSeq est. of “MAY 2” excluded; bias correction was extremely detrimental there.

slide-55
SLIDE 55

44

  • Isolator

Time

slide-56
SLIDE 56

45

Let-7 family of microRNA is required for maturation and adult-like metabolism in stem cell-derived cardiomyocytes

Kavitha T. Kuppusamy (a,b), Daniel C. Jones (c), Henrik Sperber (a,b,d), Anup Madan (e), Karin A. Fischer (a,b), Marita L. Rodriguez (f), Lil Pabon (a,g,h), Wei-Zhong Zhu (a,g,h), Nathaniel L. Tulloch (a,g,h), Xiulan Yang (a,g,h), Nathan J. Sniadecki (f,i), Michael A. Laflamme (a,g,h), Walter L. Ruzzo (c,j,k), Charles E. Murry (a,g,h,i,l), and Hannele Ruohola-Baker (a,b,i,j,m,1)

(a) Institute for Stem Cell and Regenerative Medicine, Seattle, WA 98109; Departments of (b) Biochemistry, (c) Computer Science and Engineering, and (d) Chemistry, University of Washington, Seattle, WA 98195; (e) LabCorp Genomic Services, Seattle, WA 98109; (f) Department of Mechanical Engineering, (g) Department of Pathology, (h) Center for Cardiovascular Biology, (i) Department of Bioengineering, and (j) Department of Genome Sciences, University of Washington, Seattle, WA 98195; (k) Fred Hutchinson Cancer Research Center, Seattle, WA 98109; and (l) Department of Medicine/Cardiology and (m) Department of Biology, University of Washington, Seattle, WA 98195

Edited by Eric N. Olson, University of Texas Southwestern Medical Center, Dallas, TX, and approved April 14, 2015 (received for review December 18, 2014)

types and large-scale gene expression studies have demonstrated

Story 2

Published: 2015-05-11

slide-57
SLIDE 57

background

It is possible to grow cardiomyocytes (heart muscle cells) from human embryonic stem cells (hESC-CMs)
Can grow billions of them
Can transplant them into animals after heart attack
Cells integrate/heart function improves (after a few weeks)
BUT – arrhythmias, at least in the early stages
Why? Probably because hESC-CMs were immature.
This will be tried in humans within a few years; ability to lab-culture mature hESC-CMs will greatly improve chances for success. Growing them quickly will greatly improve the economics.
How can we do that?

46

slide-58
SLIDE 58

step 1: find molecular biomarkers for maturity

[Figure, panels C–E. Bar charts of fold change relative to GAPDH, including Day 20-CM and cEHT samples. PCA plot, PC1 (25%) vs. PC2 (21%), for Day20-CM, 1y-CM, HAH, HFV, HFA. Pathway categories: Cardiac maturation markers; Cardiac hypertrophy signaling; Cardiac beta adrenergic signaling; cAMP-mediated signaling; Ca signaling; G protein coupled receptor signaling; Actin-cytoskeleton; Integrin signaling; FA metabolism; Pluripotency associated; Cell cycle; PI3/AKT-insulin signaling.]
slide-59
SLIDE 59

step 1 (cont.): find miRNA biomarkers for maturity, too

48

[Figure, panel A: fold change (log scale), 1yr vs 20day, in 1y-CM for let-7, mir-378, mir-30b, mir-129-5p, mir-502-5p]

miRNAs     Number of miRNA targets in 1y-CM seq data set    p-values
let-7      98                                               3.602148e-05
mir-378    80                                               3.488876e-09
mir-502    47                                               3.28949e-04

slide-60
SLIDE 60

step2a: let-7 is driver, not passenger – it’s necessary

49

[Figure, panels A–N. Conditions: EV, Lin28a OE + SCM, Lin28a OE + let-7g OE; and SCM vs. let-7g KD. Measurements: mRNA levels of Lin28a relative to GAPDH; let-7g fold change relative to RNU66; cell perimeter (µm); cell area (µm²); sarcomere length (µm); circularity index. Micrographs stained for DAPI, α Actinin, α Actin.]
slide-61
SLIDE 61

step2b: let-7 is driver, not passenger – it’s sufficient

50

[Figure, panels A–O. Conditions: EV, let-7i OE, let-7g OE (Day 30-CM); let-7i and let-7g levels also shown for Day 20-CM, cEHT, 1 y-CM, HAH. Measurements: fold change relative to RNU66; fold change relative to GAPDH; cell area (µm²); cell perimeter (µm); circularity index; sarcomere length (µm); twitch force (nN) vs. time (sec); frequency (Hz); # of CMs; APD90 (msec); APD50/APD90; ΔF/F traces. Micrographs stained for DAPI, α Actinin, α Actin.]
slide-62
SLIDE 62

step3: characterization

Pathways Physiology Etc.

51

slide-63
SLIDE 63

Back to Story 1: differential splicing speaks, too

[Figure, panels B–D: PCA plots (PC1 vs. PC2) for H7 Day 20-CM, H7 1y-CM, IMR90 iPSC 1y-CM, EV, let-7g OE, HFV, HFA, HAH; MYH7 shown as an example]

B: gene expression in cardiac pathways (E) tracks maturation (unsurprisingly)
C/D: so does splicing (indp of level) via Isolator-detected probable monotonic changes. (Not easily assessed by MLE-based methods…)

slide-64
SLIDE 64

summary

RNAseq data shows strong technical biases
Of course, compare to appropriate control samples
But that’s not enough, due to: batch effects, SNPs/genetic heterogeneity, alt splicing, … all of which tend to differently bias sample/control
BUT careful modeling can help.

53

slide-65
SLIDE 65

summary

Alternative splicing changes are very hard to quantify: lower coverage, ambiguous mapping, bias, …
BUT careful modeling can help:
  • Bayesian hierarchical model borrows power across all samples
  • Sampling/posterior mean estimation is more robust than MLE
  • Sampling allows novel questions to be addressed, e.g., “is isoform shift probably monotonic in time”
  • It doesn’t have to be slow
AND 90% of genes undergo alt splicing for a reason; you can’t see what it is if you don’t look

54

slide-66
SLIDE 66

summary

Amazing progress in stem cell technology
Ability to study and control cellular developmental pathways is one of the frontiers of modern biology
Multi-faceted, multi-disciplinary problems with rich data
In this study, microRNA let-7 identified as a key driver of cardiomyocyte maturation

Differential splicing of many transcripts clearly implicated; their exact roles remain to be determined.

55

slide-67
SLIDE 67

Acknowledgements

Daniel Jones

Katze Lab

Michael Katze Xinxia Peng

Stem Cell Labs

Tony Blau, Chuck Murry, Hannele Ruohola-Baker, Nathan Palpant, Kavitha Kuppusamy, …

Funding

NIGMS, NHGRI, NIAID