Transcriptomics 101, Nicole Cloonan, Winter School, 5th July 2011



slide-1
SLIDE 1

Transcriptomics 101

Nicole Cloonan

Winter School, 5th July 2011

slide-2
SLIDE 2

Transcriptional Complexity

[Figure: gene schematic annotated with TSS, pA, ATG and AAA sites]

Legend:
TSS: transcription start site
pA: polyadenylation signal
ATG: translation start site
AAA: polyadenylation
Other elements: protein coding regions, non-coding regions, genomic DNA, microRNAs, spliced intron

Small RNAs labelled: PASR, TASR, miRNA, tiRNA

Mutations, Allelic Expression, RNA Editing

slide-3
SLIDE 3

Presentation Outline

Introduction: Transcriptional complexity; Surveying transcriptional complexity with microarrays; RNA-seq
RNAseq: RNAseq Mapping; Novel exon-junction discovery
RNAseq post-mapping analysis: Uniqueome; How to measure a transcript; Transcript assembly
miRNAseq: isomiRs; Information content; Expression Thresholding
Conclusions: Things to consider; Take home messages

slide-4
SLIDE 4

Tag sequencing

[Figure: gene schematic annotated with TSS, pA, ATG and AAA sites]

SAGE, CAGE, MPSS, PET

slide-5
SLIDE 5

Microarrays

[Figure: gene schematic annotated with TSS, pA, ATG and AAA sites]

microarray, exon arrays, exon-junction arrays

slide-6
SLIDE 6

RNAseq

[Figure: gene schematic annotated with TSS, pA, ATG and AAA sites]

Cloonan et al. Nat Methods 2008; 5:613-619

slide-7
SLIDE 7

Presentation Outline

slide-8
SLIDE 8

RNAseq Mapping

The fastest alignment methods are ungapped… but what about junctions?

[Figure: gene schematic annotated with TSS, pA, ATG and AAA sites. Legend: TSS, transcription start site; pA, polyadenylation signal; ATG, translation start site; AAA, polyadenylation; protein coding regions; non-coding regions; genomic DNA; microRNAs; spliced intron]
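A minimal sketch of why ungapped alignment struggles with junctions. The sequences, the helper name `ungapped_hits`, and the read positions are all made up for illustration; real aligners use indexed search rather than this brute-force scan.

```python
def ungapped_hits(read, reference, max_mismatches=0):
    """Return start positions where `read` aligns to `reference` without gaps."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits

# Toy locus: exon 1, an intron, exon 2 (hypothetical sequences).
exon1 = "ATGGCCTTAGCAGGATCC"
intron = "GTAAGTCTCTCTCTTTTCCCCAG"
exon2 = "TCAGGCATTGACCTAGAA"
genome = exon1 + intron + exon2
mrna = exon1 + exon2  # the spliced transcript that is actually sequenced

exonic_read = mrna[2:14]     # lies entirely within exon 1
junction_read = mrna[12:24]  # spans the exon 1 / exon 2 boundary

exonic_genome_hits = ungapped_hits(exonic_read, genome)      # found in the genome
junction_genome_hits = ungapped_hits(junction_read, genome)  # the intron breaks the match
junction_mrna_hits = ungapped_hits(junction_read, mrna)      # found once the intron is removed
```

The junction read only aligns once the intron has been removed from the reference, which is what the junction discovery strategies on the following slides try to achieve in different ways.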

slide-9
SLIDE 9

Novel exon-junction discovery (systematic)

[Figure: gene schematic annotated with TSS, pA and ATG sites. Legend: TSS, transcription start site; pA, polyadenylation signal; ATG, translation start site; AAA, polyadenylation; protein coding regions; non-coding regions; genomic DNA; microRNAs; spliced intron]

Pros: Computationally easy
Cons: Does not find all novel splicing
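One reading of the systematic approach is to join every pair of known exons within a gene into a junction library and then align reads to that library without gaps. The sketch below assumes that reading; the function name `junction_library`, the exon names and the sequences are hypothetical.

```python
from itertools import combinations

def junction_library(exons, read_length):
    """Build all forward exon-exon junction sequences for one gene.

    `exons` is a list of (name, sequence) tuples in genomic order.
    Each junction keeps read_length - 1 bases from either side, so any
    read that matches a junction sequence must overlap both exons.
    """
    flank = read_length - 1
    library = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(exons, 2):
        library[f"{name_a}_{name_b}"] = seq_a[-flank:] + seq_b[:flank]
    return library

exons = [
    ("e1", "ATGGCCTTAGCAGGATCC"),
    ("e2", "TCAGGCATTGACCTAGAA"),
    ("e3", "GGTTACCGATTGACGTAA"),
]
library = junction_library(exons, read_length=8)
```

This is cheap to compute, but a junction that involves an exon missing from the annotation never enters the library, which matches the stated limitation that not all novel splicing is found.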

slide-10
SLIDE 10

Novel exon-junction discovery (Paired End)

[Figure: gene schematic annotated with TSS, pA, ATG and AAA sites. Legend: TSS, transcription start site; pA, polyadenylation signal; ATG, translation start site; AAA, polyadenylation; protein coding regions; non-coding regions; genomic DNA; microRNAs; spliced intron]

Pros: Very sensitive
Cons: Reasonable coverage required; accuracy dependent on insert size distribution; sequencing twice as expensive
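A rough sketch of how paired-end data can hint at splicing, assuming the idea is that mates mapping much further apart on the genome than the library insert size imply an intron between them. The function name, the three-standard-deviation cutoff and all numbers are illustrative assumptions, not values from the talk.

```python
from statistics import mean, stdev

def flag_spliced_pairs(pairs, insert_sizes, n_sd=3.0):
    """Flag read pairs whose genomic span exceeds the expected insert size.

    `pairs` holds (name, left_start, right_end) genomic coordinates.
    `insert_sizes` is a sample of insert sizes from confidently mapped pairs.
    """
    expected = mean(insert_sizes)
    cutoff = expected + n_sd * stdev(insert_sizes)
    flagged = []
    for name, left_start, right_end in pairs:
        span = right_end - left_start
        if span > cutoff:
            implied_intron = round(span - expected)
            flagged.append((name, span, implied_intron))
    return cutoff, flagged

insert_sizes = [180, 200, 210, 190, 220, 200, 195, 205]
pairs = [("pair1", 1000, 1205), ("pair2", 5000, 6450)]
cutoff, flagged = flag_spliced_pairs(pairs, insert_sizes)
```

The wider the insert size distribution, the higher the cutoff and the less precise the implied intron length, which is why accuracy depends on that distribution.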

slide-11
SLIDE 11

Novel exon-junction discovery (de novo)

Non-matching tags (aligned reads):

ACGATATGACACGTACAGTCAAATCGT
ACGATATTACACGTACATTCAAGTCGT
ACGATATTACACGCACAGTCAAGTCGT
 CGATATTACACGTCCAGTCAAGTCGTT
   ATATTTCACGTACAGTCAAGTCGTTCG
   ATATTAAACGTACAGTCAAGTCGTTCG
     ATTGCACGTACAGTCAAGTCGTTCGGA
     ATTACACGTACAGTCACGTCGTTCGGA
         CACGTACAGTCAAGTCGTTCGGAACCT
         CACGTACCTTCAAGTCGTTCGGAACCT

Consensus read:

ACGATATTACACGTACAGTCAAGTCGTTCGGAACCT

Steps: Non-matching tags; create consensus read; remove adaptor sequence; BLAT against genome

Pros: De novo
Cons: Requires high coverage
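The consensus step can be sketched as a per-column majority vote over the aligned reads on this slide. The offsets below are inferred from how the reads overlap, not stated on the slide, and the function name is hypothetical; adaptor removal and the BLAT search are not shown.

```python
from collections import Counter

def consensus_from_reads(placed_reads):
    """Majority-vote consensus from (offset, sequence) pairs."""
    length = max(offset + len(seq) for offset, seq in placed_reads)
    columns = [Counter() for _ in range(length)]
    for offset, seq in placed_reads:
        for i, base in enumerate(seq):
            columns[offset + i][base] += 1
    # Take the most frequent base in each column.
    return "".join(col.most_common(1)[0][0] for col in columns)

# Reads from the slide, with offsets inferred from their overlaps (assumption).
placed_reads = [
    (0, "ACGATATGACACGTACAGTCAAATCGT"),
    (0, "ACGATATTACACGTACATTCAAGTCGT"),
    (0, "ACGATATTACACGCACAGTCAAGTCGT"),
    (1, "CGATATTACACGTCCAGTCAAGTCGTT"),
    (3, "ATATTTCACGTACAGTCAAGTCGTTCG"),
    (3, "ATATTAAACGTACAGTCAAGTCGTTCG"),
    (5, "ATTGCACGTACAGTCAAGTCGTTCGGA"),
    (5, "ATTACACGTACAGTCACGTCGTTCGGA"),
    (9, "CACGTACAGTCAAGTCGTTCGGAACCT"),
    (9, "CACGTACCTTCAAGTCGTTCGGAACCT"),
]
consensus = consensus_from_reads(placed_reads)
```

Individual reads each carry one or two errors, but the vote recovers the slide's consensus read because enough reads cover every column. With thin coverage a single error could win the vote, which is one way to read the high-coverage requirement.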

slide-12
SLIDE 12

Novel exon-junction discovery (TopHat)

[Figure: gene schematic annotated with TSS, pA, ATG and AAA sites. Legend: TSS, transcription start site; pA, polyadenylation signal; ATG, translation start site; AAA, polyadenylation; protein coding regions; non-coding regions; genomic DNA; microRNAs; spliced intron]

http://tophat.cbcb.umd.edu

Pros: Very sensitive
Cons: Relies on reference

slide-13
SLIDE 13

Look at your data!

[Figure: annotated view of the gene GRB7. Labels: gene symbol; single nucleotide resolution coverage plot; exon-exon junction usage; known gene structure (exons and introns); alternative splicing; novel exons or novel transcripts]

slide-14
SLIDE 14

Presentation Outline

slide-15
SLIDE 15

Different aligners give different results

Koehler et al. Bioinformatics 2011; 27(2):272-274

The patterns are largely the same, so don’t panic… unless you’re doing RNAseq

slide-16
SLIDE 16

Uniqueome affects quantitation of RNAseq

Correction for unique content improves correlation to microarrays
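One plausible form of such a correction is to normalise read counts by the number of uniquely mappable positions in a gene rather than by its full length. This is a sketch under that assumption, not necessarily the correction used in the talk; the function name and all numbers are hypothetical.

```python
def reads_per_kb_per_million(read_count, length_bp, total_mapped_reads):
    """Length- and depth-normalised expression value."""
    return read_count / (length_bp / 1_000) / (total_mapped_reads / 1_000_000)

# Hypothetical gene: only a quarter of its positions are uniquely mappable.
gene_length = 2_000
unique_positions = 500
unique_reads = 100
total_mapped = 10_000_000

# Dividing by the full length penalises genes with little unique sequence.
naive = reads_per_kb_per_million(unique_reads, gene_length, total_mapped)
# Dividing by the uniquely mappable length removes that penalty.
corrected = reads_per_kb_per_million(unique_reads, unique_positions, total_mapped)
```

When only uniquely mapping reads are counted, a gene that shares most of its sequence with paralogues looks artificially weakly expressed unless the denominator is adjusted in this way.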

slide-17
SLIDE 17

RNAseq

[Figure: gene schematic annotated with TSS, pA, ATG and AAA sites]

Cloonan et al. Nat Methods 2008; 5:613-619

slide-18
SLIDE 18

How to detect a transcript?

[Figure: transcript models labelled A, B, C and D. Legend: protein coding regions; non-coding regions; spliced intron; AAA, polyadenylation]

slide-19
SLIDE 19

How to detect a transcript?

[Figure: transcript models labelled A, B, C and D. Legend: protein coding regions; non-coding regions; spliced intron; AAA, polyadenylation]

92.6% of known transcripts have diagnostic features (covers 99.8% of loci)

217127 diagnostic features covering 160156 individual transcripts from 65254 loci

Accuracy relies on the quality of the gene models used.

Different gene models will give different results from the same data. ~80%
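A minimal sketch of what a diagnostic feature could look like computationally, assuming it means an exon or exon-exon junction that occurs in exactly one transcript of the gene models. The transcript names, feature names and function name are hypothetical and do not correspond to transcripts A to D on the slide.

```python
from collections import defaultdict

def diagnostic_features(transcripts):
    """Map each transcript to the features found in no other transcript."""
    owners = defaultdict(set)
    for name, features in transcripts.items():
        for feature in features:
            owners[feature].add(name)
    result = {name: set() for name in transcripts}
    for feature, names in owners.items():
        if len(names) == 1:
            result[next(iter(names))].add(feature)
    return result

# Hypothetical locus: exons e1..e3 and junctions such as "e1-e2".
transcripts = {
    "T1": {"e1", "e2", "e3", "e1-e2", "e2-e3"},
    "T2": {"e1", "e3", "e1-e3"},
    "T3": {"e1", "e2", "e1-e2"},
}
diagnostic = diagnostic_features(transcripts)
```

Here T3 has no diagnostic feature because everything in it is shared with T1, so reads alone cannot prove it is expressed. Changing the gene models changes which features are unique, which is why different annotations give different answers from the same data.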

slide-20
SLIDE 20

Reference assisted transcript assembly

Cufflinks, Scripture

Guttman et al. Nat Biotech 2010; 28(5):503-10

slide-21
SLIDE 21

Reference free alignment

  • de novo assembly

[Figure: examples for gene symbols MGAT5 and RAN. Cloonan et al., unpublished]

Oases, Trinity, ABySS

slide-22
SLIDE 22

Presentation Outline

slide-23
SLIDE 23

miRNAs

[Figure: miRNA biogenesis and function. Labelled molecules: pri-miRNA (polyadenylated), pre-miRNA, miRNA duplex, mRNA, RNA-Induced Silencing Complex (RISC)]

Steps: Drosha Processing; Dicer Processing; Asymmetrical Unwinding; RISC-mRNA interactions

Outcomes: Translational Inhibition; mRNA sequestration; mRNA degradation

Most interactions thought to occur in the 3’ UTR

slide-24
SLIDE 24

MicroRNAs are small and closely related

[Figure: histogram of miRNA lengths. x-axis: Length of miRNAs (nt), 15 to 25; y-axis: Proportion of miRNAs (%), 10 to 60]

miR-17-5p : CAAAGUGCUUACAGUGCAGGUAGU
miR-20    : UAAAGUGCUUAUAGUGCAGGUAG-
miR-106a  : AAAAGUGCUUACAGUGCAGGUAGC
miR-106b  : UAAAGUGCUGACAGUGCAGAU---
miR-93    : -AAAGUGCUGUUCGUGCAGGUAG-
miR-18    : UAAGGUGCAUCUAGUGCAGAUA--
consensus :  AAaGUGCu   aGUGCAG Ua

miR-19a   : UGUGCAAAUCUAUGCAAAACUGA-
miR-19b-1 : UGUGCAAAUCCAUGCAAAACUGA-
miR-19b-2 : UGUGCAAAUCCAUGCAAAACUGA-
consensus : UGUGCAAAUCcAUGCAAAACUGA

slide-25
SLIDE 25

Information content in short tags

Map to a subset of the genome instead
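A back-of-the-envelope sketch of why short tags carry little information against a whole genome. It assumes uniformly random bases, a genome of about 3 Gb searched on both strands, and a hypothetical 100 kb subset of known miRNA precursors; none of these numbers come from the talk.

```python
def expected_random_hits(tag_length, target_size_bp, strands=2):
    """Expected number of chance exact matches of one tag in a random target."""
    return strands * target_size_bp / 4 ** tag_length

GENOME_BP = 3_000_000_000  # assumed genome size
SUBSET_BP = 100_000        # assumed size of a miRNA precursor reference

# Expected chance matches for the tag lengths typical of miRNAs.
table = {
    length: (
        expected_random_hits(length, GENOME_BP),
        expected_random_hits(length, SUBSET_BP),
    )
    for length in range(15, 26)
}
```

Under these assumptions the shortest tags are expected to match the genome by chance about once or more, while chance matches against the small subset are very rare, which is the argument for mapping to a subset instead.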

slide-26
SLIDE 26

miR A: tagcgggatctctcgagagctcgcgat
miR B: tagcgggatctctcgacagctcgcgat

Tag tctctcgacagct: 1 MM to miR A, 0 MM to miR B
Tag tctctcgagagct: 0 MM to miR A, 1 MM to miR B

Not allowing mismatches does not solve the problem
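The mismatch (MM) counts on this slide can be reproduced with a small ungapped comparison. The function name is hypothetical; the sequences are the ones shown above.

```python
def best_mismatches(tag, reference):
    """Fewest mismatches over all ungapped placements of `tag` on `reference`."""
    return min(
        sum(a != b for a, b in zip(tag, reference[i:i + len(tag)]))
        for i in range(len(reference) - len(tag) + 1)
    )

mirs = {
    "miR A": "tagcgggatctctcgagagctcgcgat",
    "miR B": "tagcgggatctctcgacagctcgcgat",
}
tags = ["tctctcgacagct", "tctctcgagagct"]

mismatch_table = {
    tag: {name: best_mismatches(tag, seq) for name, seq in mirs.items()}
    for tag in tags
}
```

Because the two miRNAs differ at a single base, a tag from miR A with one sequencing error at that position becomes a perfect match to miR B. Requiring zero mismatches therefore assigns it confidently to the wrong miRNA instead of flagging it as ambiguous.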

slide-27
SLIDE 27

IsomiRs are common and functional

[Figure: pre-miRNA, 5’ to 3’]

Cloonan et al. Genome Biol 2011; 12(12):R126

slide-28
SLIDE 28

Expression Thresholding

Cloonan et al. Genome Biol 2011; 12(12):R126

slide-29
SLIDE 29

Presentation Outline

slide-30
SLIDE 30

Things to consider

Check your data!

• Visualization strategies
    • IGV (brilliant for individual read resolution)
    • UCSC (brilliant for genomic context of expression)
    • Heatmaps, etc. (brilliant for quantification)
• Check your mapping statistics
    • % mapped? % mapped at what length? Redundancy, etc.
• Make sure the controls are doing what they should be

Remember the limitations and parameters of your alignment strategy. Be careful with interpretation!

• E.g. variable alignment strategies that trim starts and ends of tags will overestimate the relative complexity of your library
• E.g. discarding all tags that map to multiple regions will limit your ability to detect closely related gene families, or sequence motifs in repetitive/low complexity areas
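The mapping statistics listed above (% mapped, % mapped at each length, redundancy) can be summarised with a few lines of code. This is a sketch over a hypothetical record format of (tag sequence, mapped length or None); redundancy is taken here to mean mapped tags per distinct mapped sequence, which is an assumption.

```python
from collections import Counter

def mapping_summary(records):
    """Summarise (tag_sequence, mapped_length) records.

    mapped_length is None when the tag did not map.
    """
    total = len(records)
    mapped = [(seq, length) for seq, length in records if length is not None]
    by_length = Counter(length for _, length in mapped)
    distinct = len({seq for seq, _ in mapped})
    return {
        "percent_mapped": 100 * len(mapped) / total,
        "percent_by_length": {
            length: 100 * n / total for length, n in sorted(by_length.items())
        },
        "redundancy": len(mapped) / distinct if distinct else 0.0,
    }

# Hypothetical tags: two identical full-length hits, one trimmed hit, one unmapped.
records = [
    ("ACGTACGTAC", 10),
    ("ACGTACGTAC", 10),
    ("TTGACCA", 7),
    ("GGGCCC", None),
]
summary = mapping_summary(records)
```

Looking at the breakdown by mapped length is a quick way to see how much of a library only maps after trimming, which is where the complexity overestimate mentioned above can creep in.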

slide-31
SLIDE 31

Conclusions

RNAseq and miRNAseq both require special attention to mapping strategies

Choose an alignment strategy that will answer your biological question first and foremost, and then consider available resources

• If your strategy won’t work, it’s better to know BEFORE sequencing rather than afterwards.

Check your mapped data: better to find errors before extensive analysis and validation

Be careful in your interpretation of the data