Transcriptomics 101
Nicole Cloonan
Winter School, 5th July 2011
[Figure: Transcriptional complexity. Legend: TSS = transcription start site; pA = polyadenylation signal; ATG = translation start site; AAA = polyadenylation. Also keyed: protein coding regions, non-coding regions, genomic DNA, microRNAs, spliced intron. Labelled features: PASR, TASR, tiRNA, miRNA. Further sources of complexity: mutations, allelic expression, RNA editing.]
Outline
Introduction: Transcriptional complexity; Surveying transcriptional complexity with microarrays; RNA-seq
RNAseq: RNAseq mapping; Novel exon-junction discovery
RNAseq post-mapping analysis: Uniqueome; How to measure a transcript; Transcript assembly
miRNAseq: isomiRs; Information content; Expression thresholding
Conclusions: Things to consider; Take home messages
[Figure: transcript structure diagram, labelled: SAGE, CAGE, MPSS, PET]
[Figure: transcript structure diagram, labelled: microarray, exon arrays, exon-junction arrays]
[Figure: transcript structure diagram]
Cloonan et al. Nat Methods 2008; 5:613-619
[Figure: transcript structure diagram]
[Figure: transcript structure diagram]
Pros: Computationally easy
Cons: Does not find all novel splicing
[Figure: transcript structure diagram]
Pros: Very sensitive
Cons: Reasonable coverage required; accuracy dependent on insert size distribution; sequencing twice as expensive
Non-matching tags: create consensus read, remove adaptor sequence, BLAT against genome.

Aligned reads:
ACGATATGACACGTACAGTCAAATCGT
ACGATATTACACGTACATTCAAGTCGT
ACGATATTACACGCACAGTCAAGTCGT
 CGATATTACACGTCCAGTCAAGTCGTT
   ATATTTCACGTACAGTCAAGTCGTTCG
   ATATTAAACGTACAGTCAAGTCGTTCG
     ATTGCACGTACAGTCAAGTCGTTCGGA
     ATTACACGTACAGTCACGTCGTTCGGA
         CACGTACAGTCAAGTCGTTCGGAACCT
         CACGTACCTTCAAGTCGTTCGGAACCT
Consensus read:
ACGATATTACACGTACAGTCAAGTCGTTCGGAACCT

Pros: De novo
Cons: Requires high coverage
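The "create consensus read" step can be illustrated with a simple per-position majority vote over the ten reads on this slide. This is only a sketch, not the method used in the talk: the offsets are inferred from how the reads overlap, and the names `PLACED_READS` and `majority_consensus` are hypothetical.

```python
from collections import Counter

# (offset, read) pairs. The offsets are an assumption of this sketch,
# inferred from the overlaps between the reads shown on the slide.
PLACED_READS = [
    (0, "ACGATATGACACGTACAGTCAAATCGT"),
    (0, "ACGATATTACACGTACATTCAAGTCGT"),
    (0, "ACGATATTACACGCACAGTCAAGTCGT"),
    (1, "CGATATTACACGTCCAGTCAAGTCGTT"),
    (3, "ATATTTCACGTACAGTCAAGTCGTTCG"),
    (3, "ATATTAAACGTACAGTCAAGTCGTTCG"),
    (5, "ATTGCACGTACAGTCAAGTCGTTCGGA"),
    (5, "ATTACACGTACAGTCACGTCGTTCGGA"),
    (9, "CACGTACAGTCAAGTCGTTCGGAACCT"),
    (9, "CACGTACCTTCAAGTCGTTCGGAACCT"),
]


def majority_consensus(placed_reads):
    """Return the most common base at every covered position."""
    columns = {}
    for offset, read in placed_reads:
        for i, base in enumerate(read):
            # One Counter per alignment column, keyed by position.
            columns.setdefault(offset + i, Counter())[base] += 1
    # Assumes the reads cover a contiguous stretch with no gaps.
    return "".join(columns[pos].most_common(1)[0][0] for pos in sorted(columns))


print(majority_consensus(PLACED_READS))
# ACGATATTACACGTACAGTCAAGTCGTTCGGAACCT
```

Each sequencing error appears in only one read at its position, so the vote recovers the 36 nt consensus shown on the slide. The last four positions are covered by only two reads, which is one way to see why this approach needs high coverage.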
[Figure: transcript structure diagram]
TopHat: http://tophat.cbcb.umd.edu
Pros: Very sensitive
Cons: Relies on reference
[Figure: RNAseq data for gene symbol GRB7, showing: single nucleotide resolution coverage plot; exon-exon junction usage; known gene structure (exons and introns); alternative splicing; novel exons or novel transcripts]
Koehler et al Bioinformatics 2011 27(2):272-274
The patterns are largely the same, so don't panic... unless you're doing RNAseq.
Correction for unique content improves correlation to microarrays
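One way to picture a unique-content correction is to normalise read counts by the number of uniquely mappable bases in a transcript rather than by its full length. The sketch below assumes an RPKM-style measure. The function names and the example numbers are hypothetical and are not taken from the Uniqueome paper.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def unique_content_rpkm(read_count, unique_bp, total_mapped_reads):
    """RPKM using only the uniquely mappable bases as the length.

    unique_bp is assumed to come from a mappability resource; it is
    an input to this sketch, not something computed here.
    """
    if unique_bp <= 0:
        raise ValueError("transcript has no uniquely mappable bases")
    return rpkm(read_count, unique_bp, total_mapped_reads)


# Illustrative numbers only: a 2000 bp transcript where just
# 1000 bp can be hit by uniquely mapping reads.
naive = rpkm(500, 2000, 10_000_000)
corrected = unique_content_rpkm(500, 1000, 10_000_000)
print(naive, corrected)
```

In this toy case the uncorrected value underestimates expression by half, because half of the transcript could never receive a uniquely mapped read.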
[Figure: transcript structure diagram]
Cloonan et al. Nat Methods 2008; 5:613-619
[Figure: transcript isoform diagrams. Legend: protein coding regions; non-coding regions; spliced intron; AAA = polyadenylation]
217127 diagnostic features covering 160156 individual transcripts from 65254 loci
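As a rough illustration of what a diagnostic feature could be, the sketch below treats it as an exon or exon junction that occurs in exactly one transcript of a locus, so reads hitting it can be assigned to that transcript. That definition is an assumption made for this example, and the locus, feature names and function name are hypothetical.

```python
from collections import Counter


def diagnostic_features(transcripts):
    """Map each transcript to the features found in no other transcript."""
    # Count in how many transcripts each feature occurs.
    counts = Counter(f for feats in transcripts.values() for f in set(feats))
    return {
        name: sorted(f for f in set(feats) if counts[f] == 1)
        for name, feats in transcripts.items()
    }


# Hypothetical locus: isoform_2 skips exon2.
LOCUS = {
    "isoform_1": ["exon1", "exon2", "exon3", "junction1-2", "junction2-3"],
    "isoform_2": ["exon1", "exon3", "junction1-3"],
}

print(diagnostic_features(LOCUS))
```

Here exon1 and exon3 are shared and say nothing about which isoform is expressed, while exon2 and the three junctions each point to a single isoform.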
Accuracy relies on the quality
Different gene models will give different results from the same data. ~80%
Cufflinks, Scripture
Guttman et al., Nat Biotech 2010 28(5):503-10
[Figures: gene symbol MGAT5; gene symbol RAN. Cloonan et al., unpublished]
Oases, Trinity, ABySS
[Figure: miRNA biogenesis and function]
Steps: pri-miRNA, Drosha processing, pre-miRNA, Dicer processing, miRNA duplex, asymmetrical unwinding, RNA-Induced Silencing Complex (RISC), RISC-mRNA interactions.
Outcomes: translational inhibition, mRNA sequestration, mRNA degradation.
Most interactions are thought to occur in the 3' UTR.
[Figure: histogram of miRNA lengths. x-axis: Length of miRNAs (nt), 15 to 25; y-axis: Proportion of miRNAs (%), 10 to 60]
miR-17-5p : CAAAGUGCUUACAGUGCAGGUAGU
miR-20    : UAAAGUGCUUAUAGUGCAGGUAG-
miR-106a  : AAAAGUGCUUACAGUGCAGGUAGC
miR-106b  : UAAAGUGCUGACAGUGCAGAU---
miR-93    :
miR-18    : UAAGGUGCAUCUAGUGCAGAUA--
consensus : AAaGUGCu aGUGCAG Ua
miR-19a   : UGUGCAAAUCUAUGCAAAACUGA-
miR-19b-1 : UGUGCAAAUCCAUGCAAAACUGA-
miR-19b-2 : UGUGCAAAUCCAUGCAAAACUGA-
consensus : UGUGCAAAUCcAUGCAAAACUGA
Map to a subset of the genome instead
miR A: tagcgggatctctcgagagctcgcgat
miR B: tagcgggatctctcgacagctcgcgat

Read tctctcgacagct: 1 mismatch (MM) to miR A, 0 MM to miR B
Read tctctcgagagct: 0 MM to miR A, 1 MM to miR B
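The miR A / miR B example can be reproduced by counting mismatches of each read against each candidate sequence and keeping the best hit. This is a minimal sketch using ungapped placements only. The function names and the `max_mismatches` parameter are hypothetical, and it is not the mapping pipeline used in the talk.

```python
MIRS = {
    "miR A": "tagcgggatctctcgagagctcgcgat",
    "miR B": "tagcgggatctctcgacagctcgcgat",
}


def min_mismatches(read, reference):
    """Fewest mismatches over all ungapped placements of read in reference.

    Assumes len(read) <= len(reference).
    """
    n = len(read)
    return min(
        sum(a != b for a, b in zip(read, reference[i:i + n]))
        for i in range(len(reference) - n + 1)
    )


def assign(read, references, max_mismatches=1):
    """Return per-reference mismatch counts and the best-scoring names."""
    scores = {name: min_mismatches(read, seq) for name, seq in references.items()}
    best = min(scores.values())
    hits = [
        name for name, score in scores.items()
        if score == best and score <= max_mismatches
    ]
    return scores, hits


for read in ("tctctcgacagct", "tctctcgagagct"):
    print(read, assign(read, MIRS))
```

Both reads fall within one mismatch of both miRNAs, so a mapper that accepts any hit with up to one mismatch cannot tell them apart. Only the 0 MM versus 1 MM distinction assigns each read to the right miRNA.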
[Figure: pre-miRNA, 5' to 3']
Cloonan et al. Genome Biol 2011; 12(12):R126