CSEP 527 Computational Biology
Gene Expression Analysis
Assaying Gene Expression
Microarrays
RNAseq
DNA sequencer → millions of reads, say, 100 bp each → map to genome, analyze
Goals of RNAseq
#1: Which genes are being expressed?
How? assemble reads (fragments of mRNAs) into (nearly) full-length mRNAs and/or map them to a reference genome
#2: How highly expressed are they?
How? count how many fragments come from each gene; expect more highly expressed genes to yield more reads, after correcting for biases like mRNA length
#3: What's the same/different between 2 samples?
E.g., tumor/normal
#4: ...
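Goal #2's length correction can be made concrete with a small sketch. It assumes reads have already been assigned to genes and uses TPM (transcripts per million), one common length normalization. It is not necessarily the correction used by the tools in this lecture, and the genes, counts, and lengths are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million (TPM): divide read counts by transcript length,
    then rescale so that all values sum to one million."""
    # Reads per base: at equal expression, a longer mRNA yields more fragments
    per_base = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    total = sum(per_base.values())
    return {gene: 1e6 * rate / total for gene, rate in per_base.items()}


# Hypothetical toy data: B has twice A's reads but is also twice as long
counts = {"A": 100, "B": 200, "C": 50}
lengths_bp = {"A": 1000, "B": 2000, "C": 500}
print(tpm(counts, lengths_bp))  # A, B and C come out (essentially) equal
```

Raw counts would rank B far above C; after length correction all three genes look equally expressed.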
De novo Assembly
Mostly de Bruijn graph-based, but likely to change with longer reads
More complex than genome assembly due to alternative splicing and wide differences in expression levels; e.g., often multiple "k"s are used
Pro: no reference needed (non-model organisms); novel discoveries possible, e.g., very short exons
Con: less sensitive to weakly expressed genes
Reference-based (more later)
pro/con: basically the reverse
Both: subsequent bias correction, quantitation, differential expression calls, fusion detection, etc.
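To illustrate the de Bruijn idea behind most de novo assemblers, here is a minimal sketch: every k-mer in the reads becomes an edge from its (k−1)-mer prefix to its (k−1)-mer suffix, and a contig is read off by walking an unbranched path. The function names and toy reads are hypothetical; real transcriptome assemblers must also resolve the branching that alternative splicing creates, and often repeat the process for several values of k.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each k-mer occurrence adds weight to one edge.

    Returns: prefix (k-1)-mer -> {suffix (k-1)-mer: multiplicity}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1  # edge weight = k-mer count
    return graph


def walk_unbranched(graph, start):
    """Extend from `start` for as long as each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, {})) == 1:  # .get() avoids creating new nodes
        (nxt,) = graph[node]
        if nxt in seen:  # stop rather than loop forever on a cycle
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig


reads = ["ATGGCG", "GGCGTG", "CGTGCA"]  # hypothetical overlapping reads
graph = de_bruijn_graph(reads, k=4)
print(walk_unbranched(graph, "ATG"))  # ATGGCGTGCA
```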
map reads to reference transcriptome (optional)
map reads to reference genome
remap unmapped reads as 25-mers
novel splice junctions = 25-mers anchored on both sides
stitch original reads across these junctions
remap reads with minimal overlaps
Roughly: 10M reads/hr, 4 GB (typical data set: 100M–1B reads)
Kim, et al. 2013. “TopHat2: Accurate Alignment of Transcriptomes in the Presence of Insertions, Deletions and Gene Fusions.” Genome Biology 14 (4) (April 25): R36. doi:10.1186/gb-2013-14-4-r36.
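The junction-discovery steps (split unmapped reads into segments, then look for segments anchored on both sides of a gap) can be sketched with exact matching. This toy assumes error-free reads, uniquely mapping segments, and a junction that falls exactly on a segment boundary. It illustrates the split-read idea only and is not TopHat2's implementation, which uses 25-mers, a real aligner, and stitching across junctions that fall inside segments. All names are hypothetical.

```python
def build_index(genome, seg_len):
    """Map every seg_len-mer of the genome to its list of start positions."""
    index = {}
    for i in range(len(genome) - seg_len + 1):
        index.setdefault(genome[i:i + seg_len], []).append(i)
    return index


def find_junction(read, genome, seg_len):
    """Cut the read into consecutive segments, map each exactly and uniquely,
    and report (donor, acceptor) where two adjacent segments map
    non-contiguously; the putative intron is genome[donor:acceptor]."""
    index = build_index(genome, seg_len)
    hits = []
    for s in range(0, len(read) - seg_len + 1, seg_len):
        positions = index.get(read[s:s + seg_len], [])
        if len(positions) != 1:
            return None  # unmapped or ambiguous segment: give up in this toy
        hits.append(positions[0])
    for left, right in zip(hits, hits[1:]):
        if right != left + seg_len:  # adjacent in the read, not in the genome
            return (left + seg_len, right)
    return None  # read maps contiguously: no junction


# Hypothetical gene: exon 1 + intron (GT...AG) + exon 2
genome = "AACCGGTT" + "GTCCCCCCCCAG" + "TTGGCCAA"
spliced_read = "AACCGGTT" + "TTGGCCAA"
print(find_junction(spliced_read, genome, seg_len=4))  # (8, 20)
```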
RNAseq Example
[Figure: genome browser view of FCGRT, chr19 (hg19), 5 kb scale, 50,020,000–50,025,000; Day 20 vs. 1 Year samples]
Extract RNA (either polyA RNA, captured via polyT, or total RNA minus rRNA)
Reverse-transcribe into DNA ("cDNA")
Make double-stranded, maybe amplify
Cut into, say, ~300 bp fragments
Add adaptors to each end
Sequence ~100–175 bp from one or both ends
CAUTIONS: non-uniform sampling; sequence (e.g., G+C), 5'–3', and length biases
Computer Science and Engineering and Genome Sciences, University of Washington
Fred Hutchinson Cancer Research Center
Seattle, WA, USA
Copenhagen, 2015-Aug-18; ruzzo@uw.edu
cDNA, fragment, end repair, A-tail, ligate, PCR, … QC filter, trim, map, …
Uniform sampling of 4000 “reads” across a 200 bp “exon.” Average 20 ± 4.7 per position, min ≈ 9, max ≈33 I.e., as expected, we see ≈ μ ± 3σ in 200 samples
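The uniform-sampling numbers above are easy to reproduce by simulation. This is a sketch that assumes read starts are drawn uniformly and independently. Under that assumption the per-position count is binomial with mean 20 and standard deviation √(4000 · 1/200 · 199/200) ≈ 4.5, so μ ± 3σ spans roughly 7 to 33.

```python
import random
import statistics


def simulate_uniform_starts(n_reads=4000, exon_len=200, seed=1):
    """Count read *starts* per position when starts are uniform over the exon."""
    rng = random.Random(seed)
    counts = [0] * exon_len
    for _ in range(n_reads):
        counts[rng.randrange(exon_len)] += 1  # one start per simulated read
    return counts


counts = simulate_uniform_starts()
print(statistics.mean(counts), statistics.pstdev(counts), min(counts), max(counts))
```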
Count reads starting at each position, not those covering each position
The bad news: random fragments are not so uniform.
[Figure: read starts per position across 200 nucleotides of a 3' exon (Mortazavi data), uniform vs. actual]
E.g., assuming uniform sampling, the 8 peaks above 100 are more than 10σ above the mean.
A: Math tricks like averaging/smoothing (e.g. “coverage”)
WE DO THIS
Not perfect, but better: 38% reduction in LLR; hugely more likely
[Figure: read starts across 200 nucleotides, uniform vs. actual]
Fitting a model of the sequence surrounding read starts lets us predict which positions have more reads.
[Figure (in part): from mapped reads, sample foreground sequences and (local) background sequences, then train a Bayesian network]
I.e., learn sequence patterns associated with high/low read counts.
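The sampling step, collecting foreground and local background sequences, might look like the sketch below. The ±20 bp window follows the model description; the way background positions are chosen here (a random shift of up to 50 bp from each read start) and all names are assumptions made for illustration.

```python
import random


def window(seq, center, half):
    """seq[center - half : center + half], or None if it runs off either end."""
    if center - half < 0 or center + half > len(seq):
        return None
    return seq[center - half:center + half]


def sample_foreground_background(seq, read_starts, half=20, max_shift=50, seed=0):
    """Foreground: windows around read starts.
    Background (assumed scheme): windows around a randomly shifted nearby
    position, so background sequence comes from the same local region."""
    rng = random.Random(seed)
    fg, bg = [], []
    for start in read_starts:
        w = window(seq, start, half)
        if w is not None:
            fg.append(w)
        shifted = start + rng.randint(-max_shift, max_shift)
        b = window(seq, shifted, half)
        if b is not None:
            bg.append(b)
    return fg, bg


genome = "ACGT" * 100  # hypothetical reference sequence
fg, bg = sample_foreground_background(genome, read_starts=[100, 200, 300])
print(len(fg), len(bg), fg[0])
```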
Modeling Sequence Bias
Want a probability distribution over k-mers, k ≈ 40? Some obvious choices:
Full joint distribution: 4^k − 1 parameters
PWM (0th-order Markov): (4 − 1)·k parameters
Something intermediate: directed Bayes network
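The parameter counts above can be checked directly. In a directed Bayes network over nucleotides, each node needs 3 free probabilities for every configuration of its parents, so a node with p parents costs 3·4^p parameters, and the edge-free network is exactly the PWM. The parent counts in the example are hypothetical.

```python
def full_joint_params(k):
    # One probability per k-mer, minus one because they must sum to 1
    return 4 ** k - 1


def pwm_params(k):
    # Independent positions: 3 free probabilities per position
    return (4 - 1) * k


def bayes_net_params(parents_per_node):
    # Each node: 3 free probabilities for each of the 4**p parent configurations
    return sum(3 * 4 ** p for p in parents_per_node)


k = 40
print(full_joint_params(k))  # 2**80 - 1, about 1.2e24
print(pwm_params(k))         # 120
# Hypothetical DAG: first node has no parents, second has one, the rest two each
print(bayes_net_params([0, 1] + [2] * (k - 2)))  # 1839
```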
One “node” per nucleotide, ±20 bp of read start
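[Figure callouts: position is biased; position i modifies bias at j; parameters say how much]
How? Optimize the conditional log-likelihood over n training sequences s_i with labels x_i (foreground or background):
$$\ell = \sum_{i=1}^{n} \log \Pr[x_i \mid s_i] = \sum_{i=1}^{n} \log \frac{\Pr[s_i \mid x_i]\,\Pr[x_i]}{\sum_{x} \Pr[s_i \mid x]\,\Pr[x]}$$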
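The objective ℓ can be computed directly once Pr[s | x] is specified. This sketch substitutes the simplest choice, a position-independent model (a PWM, i.e. a Bayes network with no edges), for the learned network structure. The pseudocount and the function names are assumptions. The code applies Bayes' rule exactly as in the formula and computes the denominator in log space.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def train_pwm(seqs, pseudocount=1.0):
    """Per-position nucleotide probabilities (positions treated as independent)."""
    pwm = []
    for j in range(len(seqs[0])):
        counts = Counter(s[j] for s in seqs)
        total = len(seqs) + 4 * pseudocount
        pwm.append({c: (counts[c] + pseudocount) / total for c in ALPHABET})
    return pwm


def log_pr_seq(pwm, s):
    """log Pr[s | x] under the position-independent model for class x."""
    return sum(math.log(pwm[j][c]) for j, c in enumerate(s))


def conditional_log_likelihood(examples, pwm_fg, pwm_bg, prior_fg):
    """ell = sum_i log Pr[x_i | s_i]; examples are (s_i, x_i), x_i = 1 for foreground."""
    log_prior = {1: math.log(prior_fg), 0: math.log(1 - prior_fg)}
    ell = 0.0
    for s, x in examples:
        joint = {1: log_pr_seq(pwm_fg, s) + log_prior[1],  # log Pr[s|x] Pr[x]
                 0: log_pr_seq(pwm_bg, s) + log_prior[0]}
        m = max(joint.values())  # log-sum-exp for the denominator
        log_evidence = m + math.log(sum(math.exp(v - m) for v in joint.values()))
        ell += joint[x] - log_evidence
    return ell


fg_pwm = train_pwm(["AAGC", "AAGT", "ATGC"])  # hypothetical foreground windows
bg_pwm = train_pwm(["CCTA", "GCTA", "CGTT"])  # hypothetical background windows
print(conditional_log_likelihood([("AAGC", 1), ("CCTA", 0)], fg_pwm, bg_pwm, 0.5))
```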
[Figure: learned Bayesian network structures for data sets from Jones, Li et al., and Hansen et al. (Illumina, ABI). NB: hexamer; negative positions; even on same platform]
[Figure: fractional improvement in log-likelihood under the uniform model across 1000 exons, R² = 1 − L′/L. Hypothesis test: "Is BN better than X?" (1-sided Wilcoxon signed-rank test); * = p-value < 10^-23]
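Both quantities in the caption are simple to compute. This sketch assumes L and L′ are the (negative) log-likelihoods of an exon's reads under the uniform model and under the fitted model, respectively, and that the test is applied to per-exon paired differences. The Wilcoxon function uses the large-sample normal approximation with average ranks for ties but no tie correction to the variance. For real analyses, a library routine such as `scipy.stats.wilcoxon(..., alternative="greater")` is preferable.

```python
import math


def fractional_improvement(loglik_uniform, loglik_model):
    """R^2 = 1 - L'/L: L under the uniform model, L' under the fitted model.
    Both are negative, so a model closer to 0 gives R^2 > 0."""
    return 1 - loglik_model / loglik_uniform


def wilcoxon_signed_rank_greater(diffs):
    """One-sided signed-rank test of H1: differences tend to be > 0.
    Zeros are dropped; tied |d| get average ranks; normal approximation."""
    d = [x for x in diffs if x != 0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1  # extend the block of tied |d|
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # average of 1-based ranks
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail p-value


print(fractional_improvement(-200.0, -150.0))  # 0.25
diffs = [0.3, 0.1, 0.25, -0.05, 0.4, 0.2]  # hypothetical per-exon improvements
print(wilcoxon_signed_rank_greater(diffs))
```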
If > 10,000 reads are used, the probability of learning a non-empty model from n reads sampled from unbiased data declines exponentially with n.
[Figure: Prob(non-empty model | unbiased data) vs. number of training reads]
How different are two distributions?
Given: an r-sided die with probabilities p1, …, pr of each face. Roll it n = 10,000 times; the observed frequencies q1, …, qr are the MLEs for the unknown pi's. How close is qi to pi?
Fancy name, simple idea: the relative entropy H(Q||P) is just the expected per-sample contribution to the log-likelihood ratio test for "was X sampled from H0: P vs. H1: Q?"
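Relative entropy:
$$H(Q\|P) = \sum_{i=1}^{r} q_i \ln\frac{q_i}{p_i}$$
If X is really sampled from Q, m samples give an expected log-likelihood ratio of m·H(Q||P); e.g., m·H(Q||P) ≥ 1000 requires m ≥ 1000 / H(Q||P). When p_i ≈ q_i, H(Q||P) is small and many samples are needed.

Now suppose P is uniform over k-mers: p_i = 1/r with r = 4^k. Sample n k-mers and let X_1, X_2, …, X_r be the counts, so n = Σ_i X_i and q_i = X_i/n. Then E[q_i] = E[X_i/n] = np_i/n = p_i, with fluctuations on the order of 1/√n, so q_i ≈ p_i. Write
$$H(Q\|P) = \sum_{i=1}^{r} q_i \ln\frac{q_i}{p_i} = \sum_{i=1}^{r} q_i \ln\left(1 + \frac{q_i - p_i}{p_i}\right)$$
and use ln(1 + x) ≈ x − x²/2:
$$H(Q\|P) \approx \sum_{i=1}^{r} q_i\left[\frac{q_i - p_i}{p_i} - \frac{1}{2}\left(\frac{q_i - p_i}{p_i}\right)^2\right]$$
In the first term, write q_i = p_i + (q_i − p_i). Since Σ_i q_i = Σ_i p_i = 1,
$$\sum_{i=1}^{r} p_i\,\frac{q_i - p_i}{p_i} = 0, \qquad\text{so}\qquad \sum_{i=1}^{r} q_i\,\frac{q_i - p_i}{p_i} = \sum_{i=1}^{r}\frac{(q_i - p_i)^2}{p_i}$$
In the second term, q_i ≈ p_i gives q_i/p_i² ≈ 1/p_i. Therefore
$$H(Q\|P) \approx \sum_{i=1}^{r}\frac{(q_i - p_i)^2}{p_i} - \frac{1}{2}\sum_{i=1}^{r}\frac{(q_i - p_i)^2}{p_i} = \frac{1}{2}\sum_{i=1}^{r}\frac{(q_i - p_i)^2}{p_i}$$
Multiplying by n²/n²:
$$H(Q\|P) \approx \frac{1}{2n}\sum_{i=1}^{r}\frac{(nq_i - np_i)^2}{np_i} = \frac{1}{2n}\sum_{i=1}^{r}\frac{(X_i - E[X_i])^2}{E[X_i]}$$
As n → ∞, the sum is Pearson's χ² statistic, which approaches a χ² distribution with r − 1 degrees of freedom and mean r − 1. Hence
$$E[H(Q\|P)] \approx \frac{r - 1}{2n}$$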
… which empirically is a good approximation:
[Figure: relative entropy, w.r.t. uniform, of observed n balls in r bins, for r = 2, 16, 64, 256, 1024, 16384; x-axis log2(n), y-axis log2(relative entropy). Each circle is the mean of 100 trials; stars are theoretical estimates for n/r ≥ 1/4.]
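The approximation E[H(Q||P)] ≈ (r − 1)/(2n) can be checked with a short simulation of n balls thrown uniformly into r bins. This sketch uses natural logarithms, matching the derivation; the trial counts and (r, n) pairs are arbitrary choices.

```python
import math
import random


def relative_entropy(q, p):
    """H(Q||P) = sum_i q_i ln(q_i / p_i); empty bins (q_i = 0) contribute 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def mean_relative_entropy(n, r, trials, rng):
    """Average H(Q||P) when n balls land uniformly at random in r bins."""
    p = [1.0 / r] * r
    total = 0.0
    for _ in range(trials):
        counts = [0] * r
        for _ in range(n):
            counts[rng.randrange(r)] += 1
        total += relative_entropy([c / n for c in counts], p)  # Q = observed freqs
    return total / trials


def predicted_relative_entropy(n, r):
    return (r - 1) / (2 * n)


rng = random.Random(0)
for r, n in [(16, 1024), (64, 2048), (256, 4096)]:
    print(r, n, mean_relative_entropy(n, r, 100, rng), predicted_relative_entropy(n, r))
```

Doubling n roughly halves the simulated relative entropy, and multiplying r by 4 roughly quadruples it, as the formula predicts.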
LLR of error rises with number of parameters r, declines with size of training set n
… while accuracy and runtime rise with n (empirically)
Training time: 10^4 reads in minutes; 10^5 reads in an hour
Does it matter?
Possible objection to the approach: typical experiments compare gene A in sample 1 to itself in sample 2. Gene A's sequence is unchanged, "so the bias is the same," and correction is useless or even dangerous.
Responses:
SNPs and/or alternative splicing might have a big effect if samples are genetically different and/or undergo changes in isoform usage
Atypical experiments, e.g., imprinting, allele-specific expression, xenografts, ribosome profiling, ChIPseq, RAPseq, …
Bias is sample-dependent, to an unknown degree
Strong control of "false bias discovery" ⇒ little risk
[Figure a: proportionality correlation between technical replicates (samples AGR, BGI, CNL, MAY, NVS; 2 each), by method]
Method      Proportionality correlation
Kallisto    0.761
Cufflinks   0.778
RSEM/ML     0.784
Salmon      0.800
eXpress     0.805
Sailfish    0.808
RSEM/PM     0.917
BitSeq      0.922
Isolator    0.961
[Figure b: change in correlation induced by enabling bias correction (where available), for Isolator, Salmon, Cufflinks, eXpress, BitSeq, and Kallisto. For clarity, BitSeq's estimate for "MAY 2" is excluded; bias correction was extremely detrimental there.]
[Excerpt from the published paper (Associate Editor: Alex Bateman). ABSTRACT, Motivation: "Quantification of sequence abundance in RNA-Seq experiments is often conflated by protocol-specific sequence bias. The exact sources of the bias are unknown, but may be influenced by …" … "These biases may adversely effect transcript discovery, as low level noise may be overreported in some regions, and in … untrustworthy comparisons of relative abundance between genes"]
Availability: http://bioconductor.org/packages/release/bioc/html/seqbias.html
[Figure: genome browser coverage tracks, Day 20 vs. 1 Year, chr19 (hg19): PALM, 740,000–745,000 (5 kb scale)]
[Figure: genome browser coverage tracks, Day 20 vs. 1 Year, chr19 (hg19): MYH14, 50,727,000–50,729,000 (1 kb scale)]
[Figure: genome browser coverage tracks, Day 20 vs. 1 Year, chr19 (hg19): FCGRT, 50,020,000–50,025,000 (5 kb scale)]
Is Isoform Quantification Hard?
Sequencing depth per isoform is lower
Many reads map ambiguously to multiple isoforms
Isoform proportions and total expression may both vary
All the previously mentioned bias issues, including batch effects, affect all measurements
Differences among isoforms may be only a small fraction
Isoform annotation is incomplete/poor
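Ambiguous mapping is usually handled with expectation-maximization (EM) in maximum-likelihood quantifiers. This is a minimal sketch that assumes each read is summarized by the set of isoforms it is compatible with, and that reads start uniformly along an isoform. It is a generic EM, not Isolator's sampler, and the names are hypothetical.

```python
def em_isoform_fractions(compatible, lengths, iters=500):
    """EM estimate of the fraction of reads coming from each isoform.

    compatible: one set of isoform indices per read (isoforms the read fits)
    lengths: isoform lengths; a read's position is uniform within its isoform
    """
    k = len(lengths)
    rho = [1.0 / k] * k  # start from equal fractions
    for _ in range(iters):
        expected = [0.0] * k
        for iso_set in compatible:
            # E-step: split the read among compatible isoforms in proportion
            # to rho_j / length_j, the probability of this read under isoform j
            w = {j: rho[j] / lengths[j] for j in iso_set}
            total = sum(w.values())
            for j, wj in w.items():
                expected[j] += wj / total
        # M-step: new fractions are each isoform's expected share of reads
        rho = [e / len(compatible) for e in expected]
    return rho


# Hypothetical: 3 reads fit only isoform 0, 1 fits only isoform 1, 4 fit both
reads = [{0}] * 3 + [{1}] + [{0, 1}] * 4
print(em_isoform_fractions(reads, lengths=[1000, 2000]))
```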
Liu, et al. BMC Bioinformatics 15.1 (2014): 364
Isolator
Soon to be the world's best isoform quantitation tool
Bayesian hierarchical model + fast MCMC sampler give mean and uncertainty in estimates
Can handle dozens of RNAseq samples per hour
In Progress
Why a Hierarchical Bayesian Model?
In a nutshell: a natural assumption is that "nothing has changed," unless refuted by data. (Most genes don't change.)
A hierarchical model allows estimation of baseline expression, isoform usage, and variability across all samples. This helps compensate for lower per-isoform coverage.
Ex: given 4 isoforms with 1, 1, 2, 2 reads in condition A vs. 2, 2, 1, 1 reads in condition B, do you think all 4 are 2× different?
When data is lacking, estimates are shrunk towards each other, suppressing spurious changes: 1 read vs. 2 reads is probably not a 2-fold change in transcription!
[Figure: model hierarchy: Experiment → Conditions (Day 20, 1 Year) → Replicates (1, 2, 3 per condition)]
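The shrinkage idea can be seen in the simplest conjugate case: a Poisson read count with a Gamma prior shared by both conditions. This is a toy stand-in, not Isolator's hierarchical model. The prior mean is set to the pooled count, and the prior strength of 3 is an arbitrary, hypothetical choice.

```python
def shrunk_rate(count, prior_mean, prior_strength):
    """Posterior mean of a Poisson rate under a Gamma prior.

    Prior Gamma(shape=a, rate=b) with a = prior_strength and mean a/b = prior_mean.
    After observing `count` reads (unit exposure) the posterior is
    Gamma(a + count, b + 1); its mean is returned.
    """
    a = prior_strength
    b = prior_strength / prior_mean
    return (a + count) / (b + 1)


# Condition A saw 1 read and condition B saw 2; the prior is centered on
# the pooled mean of 1.5 reads
rate_a = shrunk_rate(1, prior_mean=1.5, prior_strength=3.0)
rate_b = shrunk_rate(2, prior_mean=1.5, prior_strength=3.0)
print(rate_b / rate_a)  # raw ratio 2.0 vs. a shrunk ratio of about 1.25
```

With more reads, the data dominate the prior and the ratio moves back toward the observed one, so only weakly supported changes are damped.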
Why MCMC?
In a nutshell: posterior means are more stable than MLEs
Likelihood surface max often a broad plateau, not a sharp peak
Toy example: Isoform 1, length 1k: Isoform 2, length 2k:
For simple likelihood model, one read here yields MLE expression
But one read here gives zero as MLE for Iso1! On the other hand, the posterior mean is not zero in either case.
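The toy example can be worked out numerically. Where exactly the reads fall is shown only graphically, so this sketch assumes isoform 2 contains all of isoform 1 plus 1 kb of its own sequence. It parameterizes the model by the fraction ρ₁ of reads from isoform 1 and puts a uniform prior on ρ₁. With a single parameter the posterior mean can be integrated directly; MCMC becomes necessary with many isoforms and samples.

```python
LEN1, LEN2 = 1000, 2000  # isoform lengths from the toy example


def likelihood(rho1, read_in_shared_region):
    """Probability of one read's position when a fraction rho1 of reads comes
    from isoform 1 and reads start uniformly within their isoform.
    Assumes isoform 2 = all of isoform 1 + 1 kb of unique sequence."""
    rho2 = 1.0 - rho1
    if read_in_shared_region:
        return rho1 / LEN1 + rho2 / LEN2
    return rho2 / LEN2  # read lies in isoform 2's unique sequence


def mle(read_in_shared_region, grid=10001):
    values = [k / (grid - 1) for k in range(grid)]
    return max(values, key=lambda r: likelihood(r, read_in_shared_region))


def posterior_mean(read_in_shared_region, steps=100000):
    """Uniform prior on rho1; midpoint-rule integration over [0, 1]."""
    num = den = 0.0
    for k in range(steps):
        r = (k + 0.5) / steps
        w = likelihood(r, read_in_shared_region)
        num += r * w
        den += w
    return num / den


for shared in (True, False):
    print(shared, mle(shared), round(posterior_mean(shared), 4))
```

The MLE jumps to a boundary (all isoform 1, or none), while the posterior means are 5/9 and 1/3.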
Proportionality correlation: 2·covariance / sum of variances, on log scale; usual −1 … 1 range
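The proportionality correlation described above can be computed as below; the function name is hypothetical. Unlike Pearson correlation of log values, it reaches 1 only when one vector is an exact multiple of the other, so a method that systematically compresses or stretches expression values is penalized.

```python
import math


def proportionality_correlation(x, y):
    """rho_p = 2 cov(log x, log y) / (var(log x) + var(log y)).

    Equals 1 only when y is exactly proportional to x; lies in [-1, 1].
    Inputs must be positive (e.g., expression estimates plus a pseudocount).
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var_x = sum((a - mx) ** 2 for a in lx)
    var_y = sum((b - my) ** 2 for b in ly)
    return 2 * cov / (var_x + var_y)  # the common 1/n factors cancel


x = [1.0, 2.0, 4.0, 8.0]
print(proportionality_correlation(x, [3 * v for v in x]))   # 1.0: proportional
print(proportionality_correlation(x, [v ** 2 for v in x]))  # 0.8: log-Pearson would be 1
```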
Table 2: Proportionality correlation between gene-level quantification
[Figure panel d: 0.25a + 0.75b]
[Figure: correlation vs. number of reads (10^5 to 10^7.5) for Isolator, Cufflinks, RSEM/ML, BitSeq, Salmon, eXpress, Sailfish, and Kallisto; RSEM/PM: < 0.6]
[Figure, repeated: A: pairwise proportionality correlation between technical replicates, 1 lane (highest Isolator, 0.961; lowest Kallisto, 0.761). B: change in correlation induced by enabling bias correction (where available); BitSeq's "MAY 2" estimate excluded for clarity, as bias correction was extremely detrimental there.]
Let-7 family of microRNA is required for maturation and adult-like metabolism in stem cell-derived cardiomyocytes
Kavitha T. Kuppusamy (a,b), Daniel C. Jones (c), Henrik Sperber (a,b,d), Anup Madan (e), Karin A. Fischer (a,b), Marita L. Rodriguez (f), Lil Pabon (a,g,h), Wei-Zhong Zhu (a,g,h), Nathaniel L. Tulloch (a,g,h), Xiulan Yang (a,g,h), Nathan J. Sniadecki (f,i), Michael A. Laflamme (a,g,h), Walter L. Ruzzo (c,j,k), Charles E. Murry (a,g,h,i,l), and Hannele Ruohola-Baker (a,b,i,j,m,1)
(a) Institute for Stem Cell and Regenerative Medicine, Seattle, WA 98109; Departments of (b) Biochemistry, (c) Computer Science and Engineering, and (d) Chemistry, University of Washington, Seattle, WA 98195; (e) LabCorp Genomic Services, Seattle, WA 98109; (f) Department of Mechanical Engineering, (g) Department of Pathology, (h) Center for Cardiovascular Biology, (i) Department of Bioengineering, and (j) Department of Genome Sciences, University of Washington, Seattle, WA 98195; (k) Fred Hutchinson Cancer Research Center, Seattle, WA 98109; and (l) Department of Medicine/Cardiology and (m) Department of Biology, University
Edited by Eric N. Olson, University of Texas Southwestern Medical Center, Dallas, TX, and approved April 14, 2015 (received for review December 18, 2014)
Published: 2015-05-11
background
It is possible to grow cardiomyocytes (heart muscle cells) from human embryonic stem cells (hESC-CMs)
Can grow billions of them
Can transplant them into animals after heart attack
Cells integrate; heart function improves (after a few weeks)
BUT: arrhythmias, at least in the early stages
Why? Probably because hESC-CMs were immature.
This will be tried in humans within a few years; the ability to lab-culture mature hESC-CMs will greatly improve chances for success. Growing them quickly will greatly improve the economics.
How can we do that?
step 1: find molecular biomarkers for maturity
[Figure: fold change relative to GAPDH (Day 20-CM vs. cEHT); PCA of Day20-CM, 1y-CM, HAH, HFV, HFA (PC1 25%, PC2 21%); pathways: cardiac maturation markers, cardiac hypertrophy signaling, cardiac beta adrenergic signaling, cAMP-mediated signaling, Ca signaling, G protein coupled receptor signaling, actin cytoskeleton, integrin signaling, FA metabolism, pluripotency associated, cell cycle, PI3/AKT-insulin signaling]
step 1 (cont.): find miRNA biomarkers for maturity, too
[Figure: miRNA change in 1y-CM: let-7, mir-378, mir-30b, mir-129-5p, mir-502-5p]
miRNA      Number of miRNA targets in 1y-CM seq data set    p-value
let-7      98                                               3.602148e-05
mir-378    80                                               3.488876e-09
mir-502    47                                               3.28949e-04
step2a: let-7 is driver, not passenger – it’s necessary
[Figure: EV vs. Lin28a OE + SCM vs. Lin28a OE + let-7g OE, and SCM vs. let-7g KD. Panels: Lin28a and let-7g levels (fold change relative to RNU66; mRNA levels relative to GAPDH); cell area (µm²); cell perimeter (µm); sarcomere length (µm); circularity index; staining for α-actinin, α-actin, and DAPI]
step2b: let-7 is driver, not passenger – it's sufficient
[Figure: EV vs. let-7i OE vs. let-7g OE. Panels: let-7i and let-7g fold change relative to RNU66 (compared with Day 20-CM, cEHT, 1y-CM, HAH); cell area (µm²), cell perimeter (µm), sarcomere length (µm), circularity index; twitch force (nN) and frequency (Hz); APD90 (msec), APD50/APD90; staining for α-actinin, α-actin, and DAPI; samples include Day 30-CM]
step3: characterization
Pathways Physiology Etc.
Back to Story 1: differential splicing speaks, too
[Figure: PCA (PC1, PC2) of H7 Day 20-CM, H7 1y-CM, EV, let-7g OE, HFV, HFA, HAH, and IMR90 iPSC 1y-CM; MYH7]
B: gene expression in cardiac pathways (E) tracks maturation (unsurprisingly)
C/D: so does splicing (independent of level), via Isolator-detected probable monotonic
MLE-based methods…)
summary
RNAseq data shows strong technical biases
Of course, compare to appropriate control samples
But that's not enough, due to batch effects, SNPs/genetic heterogeneity, alternative splicing, …, all of which tend to bias sample and control differently
BUT careful modeling can help.
summary
Alternative splicing changes are very hard to quantify: lower coverage, ambiguous mapping, bias, …
BUT careful modeling can help:
Bayesian hierarchical model borrows power across all samples
Sampling/posterior mean estimation is more robust than MLE
Sampling allows novel questions to be addressed, e.g., "is the isoform shift probably monotonic in time?"
It doesn't have to be slow
AND 90% of genes undergo alternative splicing for a reason; you can't see what it is if you don't look
summary
Amazing progress in stem cell technology
Ability to study and control cellular developmental pathways is one of the frontiers of modern biology
Multi-faceted, multi-disciplinary problems with rich data
In this study, microRNA let-7 identified as a key driver
Differential splicing of many transcripts clearly implicated; their exact roles remain to be determined.
Michael Katze, Xinxia Peng
Tony Blau, Chuck Murry, Hannele Ruohola-Baker, Nathan Palpant, Kavitha Kuppusamy, …
NIGMS, NHGRI, NIAID