Introduction to RNA-Seq (David Wood, Winter School in Mathematics and Computational Biology)



slide-1
SLIDE 1

Introduction to RNA-Seq

David Wood Winter School in Mathematics and Computational Biology July 1, 2013

slide-2
SLIDE 2

RNA is...

Central

DNA

RNA

Protein

Epigenetics

Diverse

tRNA mRNA rRNA

Dynamic

[Plot axes: Time, Abundance]

slide-3
SLIDE 3

RNA is...

Quantitative Qualitative Understand the molecular basis of gene function. Classify and transform cellular states Integrative

Central

DNA

RNA

Protein

Epigenetics

Diverse

tRNA mRNA rRNA

Dynamic

[Plot axes: Time, Abundance]

slide-4
SLIDE 4

RNA studies involve...

Biological System Technology Available Resources

Questions

~/bin

Project DB

slide-5
SLIDE 5

RNA studies involve...

Biological System Technology Available Resources

Questions

~/bin

Project DB

This talk: focusing on reference-based mammalian RNA-seq analysis

slide-6
SLIDE 6

Transcriptional Complexity

[Diagram of a gene locus; labels below]

TSS: transcription start site
pA: polyadenylation signal
ATG: translation start site
AAA: polyadenylation

Also marked: protein coding regions, non-coding regions, genomic DNA, microRNAs, spliced intron

slide-7
SLIDE 7

Transcriptional Complexity

[Gene locus diagram with legend: TSS (transcription start site), pA (polyadenylation signal), ATG (translation start site), AAA (polyadenylation)]

PASR miRNA tiRNA

slide-8
SLIDE 8

Transcriptional Complexity

[Gene locus diagram with legend: TSS (transcription start site), pA (polyadenylation signal), ATG (translation start site), AAA (polyadenylation)]

PASR miRNA tiRNA

Alu

slide-9
SLIDE 9

Transcriptional Complexity

[Gene locus diagram with legend: TSS (transcription start site), pA (polyadenylation signal), ATG (translation start site), AAA (polyadenylation)]

PASR miRNA tiRNA

Alu

Mutations, Allelic Expression, RNA Editing

slide-10
SLIDE 10

RNA-seq

[Gene locus diagram; labelled features: TSS, pA, ATG, AAA, PASR, miRNA, tiRNA, Alu, mutations]

non-spliced reads, junction reads, strand specific

Cloonan et al. Nat Methods 2008; 5:613-619

slide-11
SLIDE 11

Advantages of RNA-seq

Discovery: genes, exons, junctions, UTRs, fusions (Present and Future)

slide-12
SLIDE 12

Advantages of RNA-seq

Discovery: genes, exons, junctions, UTRs, fusions (Present and Future)

Dynamic Range

Mortazavi et al. Nat. Methods 2008; 5:621–628

slide-13
SLIDE 13

Advantages of RNA-seq

Discovery: genes, exons, junctions, UTRs, fusions (Present and Future)

Dynamic Range

Mortazavi et al. Nat. Methods 2008; 5:621–628

Nucleotide Specific

slide-14
SLIDE 14

Typical experiment workflow

Design Experiment Sample Acquisition Field / Clinic / Lab Validation Verification Sample Acquisition Run Experiment Obtain RNA Make Library Sequencing Base Calling Mapping Library QC Publish Analysis Interpretation 1° 2° 3° 3° 2° Field / Clinic Wet Lab Dry Lab

slide-18
SLIDE 18

Library Construction

AAAAA AAAAA AAAAA AAAAA A AAA

Fragment ds-cDNA synthesis Ligate adaptors + Amplify Target RNA

rRNA (80%)

tRNA (15%) 5%

cellular RNA

Deplete rRNA Enrich polyA RNA Profile (ribosomes) Capture (tiling arrays)

Sequencing

slide-19
SLIDE 19

Typical experiment workflow

Design Experiment Sample Acquisition Field / Clinic / Lab Validation Verification Sample Acquisition Run Experiment Obtain RNA Make Library Sequencing Base Calling Mapping Library QC Publish Analysis Interpretation 1° 2° 3° 3° 2° Field / Clinic Wet Lab Dry Lab

slide-20
SLIDE 20

RNA-seq Mapping

ATG

AAA

Challenge #1: Introns

slide-21
SLIDE 21

RNA-seq Mapping

ATG

AAA

Challenge #1: Introns

Align to database of junctions or transcriptome

Wood et al. Bioinformatics 2011; 27:580–581

Split Read Alignments

Trapnell et al. Bioinformatics 2009; 25:1105-11

slide-22
SLIDE 22

RNA-seq Mapping

ATG

AAA

Challenge #1: Introns

Challenge #2: Correctness (Sufficient Overlap, Sufficient Evidence)

Align to database of junctions or transcriptome

Wood et al. Bioinformatics 2011; 27:580–581

Split Read Alignments

Trapnell et al. Bioinformatics 2009; 25:1105-11
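
The junction-database idea and the "sufficient overlap" check can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of X-MATE or TopHat: the function names, the exon strings and the `min_overhang` parameter are hypothetical, and every ordered exon pair is treated as a candidate junction.

```python
def build_junction_library(exons, read_length):
    """Build candidate junction sequences for every ordered exon pair.

    Each entry joins the last (read_length - 1) bases of the upstream exon
    to the first (read_length - 1) bases of the downstream exon, so a read
    of read_length bases cannot fit entirely on one side of the junction.
    Returns {(i, j): (sequence, junction_offset)}.
    """
    flank = read_length - 1
    if flank < 1:
        raise ValueError("read_length must be at least 2")
    library = {}
    for i in range(len(exons)):
        for j in range(i + 1, len(exons)):
            left = exons[i][-flank:]
            right = exons[j][:flank]
            library[(i, j)] = (left + right, len(left))
    return library


def has_sufficient_overlap(read_start, read_length, junction_offset, min_overhang):
    """True if the read covers at least min_overhang bases on both sides."""
    left_overhang = junction_offset - read_start
    right_overhang = read_start + read_length - junction_offset
    return left_overhang >= min_overhang and right_overhang >= min_overhang
```

A read that starts one base into a junction sequence built for 5 nt reads has 3 bases on the upstream side and 2 on the downstream side, so it passes a `min_overhang` of 2 but would fail a stricter threshold of 3.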

slide-23
SLIDE 23

RNA-seq Mapping

ATG

AAA

Challenge #1: Introns

Challenge #2: Correctness (Sufficient Overlap, Sufficient Evidence)

Align to the transcriptome

Challenge #3: Multi-mappers (Sequence Similarity)

Align to database of junctions or transcriptome

Wood et al. Bioinformatics 2011; 27:580–581

Split Read Alignments

Trapnell et al. Bioinformatics 2009; 25:1105-11

slide-24
SLIDE 24

RNA-seq Mapping

Data QC (clipping) Align to Filter Set Align to ‘genome’ Align to ‘junctions’ Split read Alignment Choose Alignments, Disambiguate Exclude Flag and Exclude

Tophat: Trapnell et al. Bioinformatics 2009; 25:1105-11

slide-25
SLIDE 25

RNA-seq Mapping

Data QC (clipping) Align to Filter Set Align to ‘genome’ Align to ‘junctions’ Split read Alignment Choose Alignments, Disambiguate Exclude Flag and Exclude BAM BAM BAM Alignment Filtering Analysis Library QC

Tophat: Trapnell et al. Bioinformatics 2009; 25:1105-11

slide-26
SLIDE 26

RNA-seq Mapping

reference? diploid? gene model? ESTs? Algorithm? rRNA, tRNA ?

Data QC (clipping) Align to Filter Set Align to ‘genome’ Align to ‘junctions’ Split read Alignment Choose Alignments, Disambiguate Exclude Flag and Exclude BAM BAM BAM Alignment Filtering Analysis Library QC

Tophat: Trapnell et al. Bioinformatics 2009; 25:1105-11

slide-27
SLIDE 27

Typical experiment workflow

Design Experiment Sample Acquisition Field / Clinic / Lab Validation Verification Sample Acquisition Run Experiment Obtain RNA Make Library Sequencing Base Calling Mapping Library QC Publish Analysis Interpretation 1° 2° 3° 3° 2° Field / Clinic Wet Lab Dry Lab

slide-28
SLIDE 28

Library Quality Control (QC)

AAAAA AAAAA AAAAA AAAAA A AAA

Fragment ds-cDNA synthesis Ligate adaptors + Amplify Target RNA

rRNA (80%)

tRNA (15%) 5%

cellular RNA

Deplete rRNA Enrich polyA RNA Profile (ribosomes) Capture (tiling arrays)

Sequencing

slide-29
SLIDE 29

Library Quality Control (QC)

AAAAA AAAAA AAAAA AAAAA A AAA

Fragment ds-cDNA synthesis Ligate adaptors + Amplify Target RNA

rRNA (80%)

tRNA (15%) 5%

cellular RNA

Deplete rRNA Enrich polyA RNA Profile (ribosomes) Capture (tiling arrays)

Sequencing

Affects RNA content (Expression quantification)

slide-30
SLIDE 30

Library Quality Control (QC)

AAAAA AAAAA AAAAA AAAAA A AAA

Fragment ds-cDNA synthesis Ligate adaptors + Amplify Target RNA

rRNA (80%)

tRNA (15%) 5%

cellular RNA

Deplete rRNA Enrich polyA RNA Profile (ribosomes) Capture (tiling arrays)

Sequencing

Affects RNA content (Expression quantification) Affects Insert Size (transcript identification)

slide-31
SLIDE 31

Library Quality Control (QC)

AAAAA AAAAA AAAAA AAAAA A AAA

Fragment ds-cDNA synthesis Ligate adaptors + Amplify Target RNA

rRNA (80%)

tRNA (15%) 5%

cellular RNA

Deplete rRNA Enrich polyA RNA Profile (ribosomes) Capture (tiling arrays)

Sequencing

Affects RNA content (Expression quantification) Affects Insert Size (transcript identification) Affects Strand Specificity

slide-32
SLIDE 32

Library Quality Control (QC)

AAAAA AAAAA AAAAA AAAAA A AAA

Fragment ds-cDNA synthesis Ligate adaptors + Amplify Target RNA

rRNA (80%)

tRNA (15%) 5%

cellular RNA

Deplete rRNA Enrich polyA RNA Profile (ribosomes) Capture (tiling arrays)

Sequencing

Affects RNA content (Expression quantification) Affects Insert Size (transcript identification) Affects Strand Specificity Affects Library Complexity (Tag uniqueness)

slide-33
SLIDE 33

Library Quality Control (QC)

AAAAA AAAAA AAAAA AAAAA A AAA

Fragment ds-cDNA synthesis Ligate adaptors + Amplify Target RNA

rRNA (80%)

tRNA (15%) 5%

cellular RNA

Deplete rRNA Enrich polyA RNA Profile (ribosomes) Capture (tiling arrays)

Sequencing

Affects RNA content (Expression quantification) Affects Insert Size (transcript identification) Affects Strand Specificity Affects Library Complexity (Tag uniqueness) Affects Mapping Rate Paired-end?
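
The QC quantities named on these slides (rRNA content, strand specificity, library complexity, mapping rate) can be summarised from alignment records. The sketch below is a minimal illustration, assuming a hypothetical simplified `Read` record rather than a real BAM parser; the field names and the definition of a "tag" as (feature, start, strand) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    mapped: bool
    feature: str  # hypothetical label such as "rRNA" or "mRNA"; "" if unmapped
    sense: bool   # True if the read agrees with the annotated strand
    start: int    # alignment start, used here as a crude tag identity


def library_qc(reads):
    """Return simple library QC fractions from a list of Read records."""
    total = len(reads)
    mapped = [r for r in reads if r.mapped]
    n_mapped = len(mapped)
    if total == 0 or n_mapped == 0:
        return None
    distinct_tags = {(r.feature, r.start, r.sense) for r in mapped}
    return {
        "mapping_rate": n_mapped / total,
        "rrna_fraction": sum(r.feature == "rRNA" for r in mapped) / n_mapped,
        "strand_specificity": sum(r.sense for r in mapped) / n_mapped,
        "complexity": len(distinct_tags) / n_mapped,
    }
```

Insert size is left out because it needs paired-end coordinates, which this toy record does not carry.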

slide-34
SLIDE 34

Typical experiment workflow

Design Experiment Sample Acquisition Field / Clinic / Lab Validation Verification Sample Acquisition Run Experiment Obtain RNA Make Library Sequencing Base Calling Mapping Library QC Publish Analysis Interpretation 1° 2° 3° 3° 2° Field / Clinic Wet Lab Dry Lab

slide-35
SLIDE 35

Calculate Gene Expression

ATG

AAA

ATG

Gene A 3500nt (700 reads) Gene B 400nt (160 reads)

AAA

slide-36
SLIDE 36

Mortazavi et al. Nat. Methods 2008; 5:621–628

Calculate Gene Expression

ATG

AAA

ATG

Gene A: 3500 nt (700 reads), RPKM = 2.0
Gene B: 400 nt (160 reads), RPKM = 4.0

RPKM: Reads Per Kilobase per Million

$$\mathrm{RPKM} = \frac{R \times 10^{3} \times 10^{6}}{L \times N}$$

R = Gene Read Count
L = Length of gene
N = Library Size
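
As a worked check of the RPKM definition, the short Python sketch below reproduces the two values on the slide. The library size is not stated on the slide; 100 million reads is an assumption, chosen because it makes both quoted values come out as shown.

```python
def rpkm(read_count, gene_length_nt, library_size):
    """Reads Per Kilobase per Million: R * 10^3 * 10^6 / (L * N)."""
    return read_count * 1e3 * 1e6 / (gene_length_nt * library_size)


# Assumed library size (not given on the slide): 100 million reads.
ASSUMED_LIBRARY_SIZE = 100_000_000

gene_a = rpkm(700, 3500, ASSUMED_LIBRARY_SIZE)  # 700 reads over 3500 nt
gene_b = rpkm(160, 400, ASSUMED_LIBRARY_SIZE)   # 160 reads over 400 nt
```

Gene A has more reads in total, but Gene B has twice the read density per kilobase, which is why its RPKM is higher.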

slide-37
SLIDE 37

Further Normalisation

ATG

AAA

Repeat

Normalise to “mappable” gene length

Koehler et al. Bioinformatics 2010
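
Normalising to "mappable" length amounts to replacing the annotated gene length with the number of positions at which a read can be placed uniquely. A minimal sketch follows, assuming a hypothetical per-base list of booleans as the mappability input; a real analysis would take this from a mappability resource rather than construct it by hand.

```python
def mappable_length(mappable_flags):
    """Count positions flagged as uniquely mappable."""
    return sum(1 for flag in mappable_flags if flag)


def rpkm_mappable(read_count, mappable_flags, library_size):
    """RPKM computed over mappable bases only; None if nothing is mappable."""
    length = mappable_length(mappable_flags)
    if length == 0:
        return None
    return read_count * 1e9 / (length * library_size)
```

For a 3000 nt gene containing a 1000 nt repeat, 200 uniquely mapped reads in a library of 100 million give an RPKM of about 0.67 over the full length but 1.0 over the 2000 mappable bases.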

slide-38
SLIDE 38

Further Normalisation

ATG

AAA

Repeat

Normalise to “mappable” gene length

Koehler et al. Bioinformatics 2010

Robinson et al. Genome Biology 2010; 11:R25

Scale Expression Values by TMM

Cellular RNA

  • Cond. 1
  • Cond. 2
slide-39
SLIDE 39

Further Normalisation

ATG

AAA

Repeat

Normalise to “mappable” gene length

Koehler et al. Bioinformatics 2010

Robinson et al. Genome Biology 2010; 11:R25

Scale Expression Values by TMM

Cellular RNA

  • Cond. 1
  • Cond. 2

RPKM

  • Cond. 1
  • Cond. 2
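
The contrast between cellular RNA and RPKM across conditions is the motivation for TMM: when a few genes dominate one library, dividing by total library size pushes every other gene down. The sketch below is a deliberately simplified version of the trimmed-mean idea, assuming an unweighted mean and trimming on log ratios only; the published method also trims on absolute expression and weights each gene, so this should not be read as the edgeR implementation.

```python
import math


def tmm_factor(sample_counts, reference_counts, trim=0.3):
    """Simplified TMM scaling factor of a sample relative to a reference.

    Computes per-gene log2 ratios of library-size-scaled counts, discards
    the lowest and highest `trim` fraction, and returns 2 ** mean of the rest.
    """
    n_sample = sum(sample_counts)
    n_reference = sum(reference_counts)
    m_values = sorted(
        math.log2((s / n_sample) / (r / n_reference))
        for s, r in zip(sample_counts, reference_counts)
        if s > 0 and r > 0
    )
    if not m_values:
        return 1.0
    k = int(len(m_values) * trim)
    kept = m_values[k:len(m_values) - k] or m_values
    return 2 ** (sum(kept) / len(kept))
```

In the test case used here, nine of ten genes are unchanged and one gene rises elevenfold, doubling the library size. The factor comes out at about 0.5, so multiplying the library size by it restores the effective size to that of the reference.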
slide-40
SLIDE 40

Further Normalisation

ATG

AAA

Repeat

Normalise to “mappable” gene length

Koehler et al. Bioinformatics 2010

Robinson et al. Genome Biology 2010; 11:R25

Scale Expression Values by TMM

Benjamini et al. NAR; 2012

Normalise to GC content of region

slide-41
SLIDE 41

Calculate ‘Feature’ Expression

ATG

AAA

ATG

AAA

slide-42
SLIDE 42

Calculate ‘Feature’ Expression

ATG

AAA

ATG

AAA

Exonic Region

slide-43
SLIDE 43

Calculate ‘Feature’ Expression

ATG

AAA

ATG

AAA

Exonic Region Exon Junction

slide-44
SLIDE 44

Calculate ‘Feature’ Expression

ATG

AAA

ATG

AAA

Exonic Region Exon Junction Intronic Region

slide-45
SLIDE 45

Calculate ‘Feature’ Expression

ATG

AAA

ATG

AAA

Exonic Region Exon Junction Intronic Region Exon Boundary

slide-46
SLIDE 46

Calculate ‘Feature’ Expression

ATG

AAA

ATG

AAA

Exonic Region Exon Junction Intronic Region Exon Boundary Intergenic Region

slide-47
SLIDE 47

Calculate ‘Feature’ Expression

ATG

AAA

ATG

AAA

Exonic Region Exon Junction Intronic Region Exon Boundary Intergenic Region

Calculate RPKM for any feature
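
Because RPKM only needs a read count and a length, the same calculation applies to any interval. A minimal sketch, assuming half-open (start, end) coordinates on a single chromosome and counting only reads that lie wholly inside a feature; the feature names and coordinates are hypothetical, and junction features would need split-read alignments that this toy version does not model.

```python
def feature_rpkm(features, reads, library_size):
    """RPKM per feature, counting reads fully contained in each interval.

    features: {name: (start, end)} with half-open coordinates.
    reads: iterable of (start, end) alignments.
    """
    reads = list(reads)
    result = {}
    for name, (f_start, f_end) in features.items():
        count = sum(
            1 for r_start, r_end in reads
            if f_start <= r_start and r_end <= f_end
        )
        length = f_end - f_start
        result[name] = count * 1e9 / (length * library_size)
    return result
```

With a library of one million reads, 50 reads in a 1000 nt exonic region give an RPKM of 50, while 10 reads in a 2000 nt intronic region give 5.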

slide-48
SLIDE 48

Calculate ‘Feature’ Expression

ATG

AAA

ATG

AAA

Exonic Region Exon Junction Intronic Region Exon Boundary Intergenic Region

Calculate RPKM for any feature Extended 3’ UTR

ATG

AAA

slide-49
SLIDE 49

Calculate ‘Feature’ Expression

ATG

AAA

ATG

AAA

Exonic Region Exon Junction Intronic Region Exon Boundary Intergenic Region

Calculate RPKM for any feature Extended 3’ UTR

ATG

AAA

ATG

AAA

Retained Intron

slide-50
SLIDE 50

Calculate Transcript Expression

ATG

AAA

ATG

AAA

ATG

AAA

ATG

slide-51
SLIDE 51

Calculate Transcript Expression

ATG

AAA

ATG

AAA

ATG

AAA

ATG

diagnostic feature

slide-52
SLIDE 52

Calculate Transcript Expression

ATG

AAA

ATG

AAA

ATG

AAA

ATG

diagnostic feature

Approach #1: Expression calculated using diagnostic features

  • Strong Evidence
  • Excludes Transcripts
  • Sampling Variability
  • Lacks statistical robustness
  • Easy to calculate
  • Dependent on gene model

ALEXA-seq: Griffith et al. Nat. Methods 2010; 7:843-847
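
A diagnostic feature is one that occurs in exactly one transcript of the gene model, so reads on it can be assigned unambiguously. The sketch below finds such features from a hypothetical gene model given as sets of feature names; it illustrates the idea only and is not the ALEXA-seq implementation.

```python
from collections import Counter


def diagnostic_features(transcripts):
    """Map each transcript to the features found in no other transcript.

    transcripts: {transcript_name: iterable of feature names}.
    """
    usage = Counter(
        feature
        for features in transcripts.values()
        for feature in set(features)
    )
    return {
        name: sorted(f for f in set(features) if usage[f] == 1)
        for name, features in transcripts.items()
    }
```

A transcript whose exons and junctions are all shared with other isoforms ends up with an empty list, which is the "excludes transcripts" drawback noted on the slide.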

slide-53
SLIDE 53

Calculate Transcript Expression

ATG

AAA

ATG

AAA

ATG

AAA

ATG

slide-54
SLIDE 54

Calculate Transcript Expression

ATG

AAA

ATG

AAA

ATG

AAA

ATG

Approach #2: Expression estimated

Constructs a bipartite graph, then finds a minimum path cover

Cufflinks: Trapnell et al. Nat. Biotech. 2010, 28:511-515

slide-55
SLIDE 55

Calculate Transcript Expression

ATG

AAA

ATG

AAA

ATG

AAA

ATG

Approach #2: Expression estimated

Constructs a bipartite graph, then finds a minimum path cover

  • Estimates expression for all transcripts
  • Model can fail in complex / highly expressed regions
  • More statistically robust
  • Error rate largely unknown
  • Incorporates ambiguous reads

Cufflinks: Trapnell et al. Nat. Biotech. 2010, 28:511-515
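
The bipartite-graph step can be illustrated with the textbook reduction from minimum path cover in a directed acyclic graph to maximum bipartite matching. This sketch assumes fragments are numbered nodes and that an edge means "these two fragments could come from the same transcript"; it takes the edges as given and does not reproduce how Cufflinks builds or weights its graph.

```python
def min_path_cover(n, edges):
    """Minimum number of vertex-disjoint paths covering a DAG on n nodes.

    Each node is split into a left copy (as an edge source) and a right
    copy (as an edge target); the answer is n minus the size of a maximum
    matching between the two sides.
    """
    adjacency = [[] for _ in range(n)]
    for u, v in edges:
        adjacency[u].append(v)
    match_to = [-1] * n  # match_to[v] == u means edge u -> v is matched

    def try_augment(u, seen):
        # Look for an augmenting path starting at left node u.
        for v in adjacency[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_to[v] == -1 or try_augment(match_to[v], seen):
                match_to[v] = u
                return True
        return False

    matched = sum(1 for u in range(n) if try_augment(u, set()))
    return n - matched
```

Each path in the cover corresponds to one candidate transcript, so the minimum cover is the smallest set of transcripts that explains every fragment.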

slide-56
SLIDE 56

Expressed or not?

ATG

AAA

ATG

AAA

ATG

AAA

  • Cond. 1
  • Cond. 2
  • Cond. 3

[Histogram: frequency against log2 (expression), with regions marked not “expressed” and “expressed”]

Need to determine ‘expression’ cut-off value

slide-57
SLIDE 57

Expressed or not?

1. Expressed if > 1 RPKM

  • Lacks sensitivity
  • Arbitrary
  • Has literature support

slide-58
SLIDE 58

Expressed or not?

1. Expressed if > 1 RPKM

  • Lacks sensitivity
  • Arbitrary
  • Has literature support

2. Expressed if above intergenic background

[Histogram: frequency against log2 expression, 95th percentile marked]
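
The intergenic-background approach can be written as a percentile cut-off. The sketch below is one possible reading, assuming a nearest-rank percentile and a strict "greater than" comparison; the gene names and values are made up for illustration.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]


def call_expressed(gene_rpkm, intergenic_rpkm, pct=95):
    """Return the cut-off and a per-gene expressed / not expressed call."""
    cutoff = percentile_nearest_rank(intergenic_rpkm, pct)
    calls = {gene: value > cutoff for gene, value in gene_rpkm.items()}
    return cutoff, calls
```

The fixed threshold of approach 1 is the special case where the cut-off is set to 1 RPKM regardless of the background.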

slide-59
SLIDE 59

Expressed or not?

1. Expressed if > 1 RPKM

  • Lacks sensitivity
  • Arbitrary
  • Has literature support

2. Expressed if above intergenic background

  • Cut-off based on empirical evidence
  • Still somewhat arbitrary

[Histogram: frequency against log2 expression, 95th percentile marked]

slide-60
SLIDE 60

Expressed or not?

1. Expressed if > 1 RPKM

  • Lacks sensitivity
  • Arbitrary
  • Has literature support

2. Expressed if above intergenic background

  • Cut-off based on empirical evidence
  • Still somewhat arbitrary

[Histogram: frequency against log2 expression, 95th percentile marked]

3. Incorporate replicate information

  • Based on observed reproducibility
  • Requires replicates

[Plot: np-IDR against −log2 (expression) bins; series: Rep 1 vs Rep 2, Rep 2 vs Rep 1, Mean; cut-off marked]

slide-62
SLIDE 62

Expressed or not?

1. Expressed if > 1 RPKM

  • Lacks sensitivity
  • Arbitrary
  • Has literature support

2. Expressed if above intergenic background

  • Cut-off based on empirical evidence
  • Still somewhat arbitrary

[Histogram: frequency against log2 expression, 95th percentile marked]

3. Incorporate replicate information

  • Based on observed reproducibility
  • Requires replicates

[Plot: np-IDR against −log2 (expression) bins; series: Rep 1 vs Rep 2, Rep 2 vs Rep 1, Mean; cut-off marked]

Choose what is reasonable for your experiment, be consistent!
slide-63
SLIDE 63

Nucleotide-Resolution Analysis

ATG

AAA

ATG

AAA ICR

Imprinting

slide-64
SLIDE 64

Nucleotide-Resolution Analysis

ATG

AAA

ATG

AAA

Imprinting

sQTL eQTL

slide-65
SLIDE 65

Nucleotide-Resolution Analysis

ATG

AAA

ATG

AAA

Imprinting

sQTL eQTL

Complex Traits

slide-66
SLIDE 66

Nucleotide-Resolution Analysis

ATG

AAA

ATG

AAA

Imprinting

eQTL

Complex Traits A B C SNPs Allelic Fraction

sQTL

slide-67
SLIDE 67

Nucleotide-Resolution Analysis

ATG

AAA

ATG

AAA

Imprinting

eQTL

Complex Traits A B C SNPs Allelic Fraction

sQTL

[Plot: density of the fraction of RNA-seq reads matching the reference allele; expected mean and observed mean marked]

Degner et al. Bioinformatics 2009

Reference bias
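
The allelic fraction at a heterozygous SNP, and a test of whether it departs from expectation, can be sketched as follows. This is an illustration under assumptions: an exact two-sided binomial test is one common choice rather than the method of the cited paper, and the null proportion `p` defaults to 0.5 even though reference bias means the expected value in real data may sit above that.

```python
from math import comb


def reference_fraction(ref_reads, alt_reads):
    """Fraction of reads at a SNP that match the reference allele."""
    return ref_reads / (ref_reads + alt_reads)


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k reference reads out of n.

    Sums the probabilities of all outcomes no more likely than the
    observed one under a null reference fraction of p.
    """
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    tail = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, tail)
```

Passing a value of `p` estimated from the genome-wide mean, instead of 0.5, is one simple way to account for a systematic shift towards the reference allele.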

slide-68
SLIDE 68

Nucleotide-Resolution Analysis

ATG

AAA

ATG

AAA

Imprinting

eQTL

Complex Traits A B C SNPs Allelic Fraction

sQTL

Map to a diploid genome

AlleleSeq: Rozowsky et al. Mol. Sys. Bio 2011

[Plot: density of the fraction of RNA-seq reads matching the reference allele; expected mean and observed mean marked]

Degner et al. Bioinformatics 2009

Reference bias

slide-69
SLIDE 69

Typical experiment workflow

Design Experiment Sample Acquisition Field / Clinic / Lab Validation Verification Sample Acquisition Run Experiment Obtain RNA Make Library Sequencing Base Calling Mapping Library QC Publish Analysis Interpretation 1° 2° 3° 3° 2° Field / Clinic Wet Lab Dry Lab

slide-70
SLIDE 70

The future of RNA-seq (now)

Single Cell

Shalek, et al. Nature 2013

slide-71
SLIDE 71

The future of RNA-seq (now)

Single Cell

Shalek, et al. Nature 2013

Huge Cohort

900 donors 30,000 RNA-seq data sets! Genotype-Tissue Expression project (GTEx)

Lonsdale, et al. Nature Genetics 2013

slide-72
SLIDE 72

Summary

1. Choose an alignment approach suitable for your experiment, available resources and tools.
2. Assess library quality, specifically rRNA contamination, insert size, strand specificity and library complexity.
3. Gene and ‘Feature’ expression can be calculated using count data, and normalised by length, library size and GC content.
4. Transcript expression calculation requires alternative approaches and algorithms, which although common, are largely unproven.
5. RNA-seq can interrogate nucleotide-specific questions, but be careful of alignment biases (diploid mapping can help here).

slide-73
SLIDE 73

Questions and References

Cloonan et al. Nat. Methods 2008; Stem cell transcriptome profiling via massive-scale mRNA sequencing
Mortazavi et al. Nat. Methods 2008; Mapping and quantifying mammalian transcriptomes by RNA-Seq
Wood et al. Bioinformatics 2011; X-MATE: A flexible system for mapping short read data
Trapnell et al. Bioinformatics 2009; TopHat: discovering splice junctions with RNA-Seq
Koehler et al. Bioinformatics 2010; The Uniqueome: A mappability resource for short-tag sequencing
Robinson et al. Genome Biology 2010; A scaling normalization method for differential expression analysis of RNA-seq data
Benjamini et al. NAR 2012; Summarizing and correcting the GC content bias in high-throughput sequencing
Griffith et al. Nat. Methods 2010; Alternative expression analysis by RNA sequencing
Trapnell et al. Nat. Biotech. 2010; Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform
Degner et al. Bioinformatics 2009; Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing
Rozowsky et al. Mol. Sys. Bio 2011; AlleleSeq: analysis of allele-specific expression and binding in a
Shalek et al. Nature 2013; Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells
Lonsdale et al. Nature Genetics 2013; The Genotype-Tissue Expression (GTEx) project