Institut für Medizinische Informatik, Statistik und Epidemiologie
RNA-Sequencing analysis
Markus Kreuz
25. 04. 2012
RNA-Seq - Overview
Content:
Biological Background
Biological background (I):
Type      Size         Function
miRNA     21-23 nt     regulation of gene expression
siRNA     19-23 nt     antiviral mechanisms
piRNA     26-31 nt     interaction with piwi proteins/spermatogenesis
snRNA     100-300 nt   RNA splicing
snoRNA                 modification of other RNAs
Biological Background (II):
RNA-Seq technology
RNA-Seq technology - Aims:
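Catalogue all species of transcripts, including mRNAs, non-coding RNAs and small RNAs
Determine the transcriptional structure of genes in terms of: start sites, 5’ and 3’ ends, splicing patterns and other post-transcriptional modifications
Quantify the changing expression levels of each transcript (different conditions, tissues, etc.)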
RNA-Seq analysis
RNA-Seq analysis (I):
Long RNAs are first converted into a library of cDNA fragments through either RNA fragmentation or DNA fragmentation.
RNA-Seq analysis (II):
Larger RNAs must be fragmented
Depletion of transcript ends
Bias towards the 5’ end
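The depletion of transcript ends can be illustrated with a deliberately simple model. The sketch below assumes (a simplification, not taken from the slides) that fragments of one fixed length start with equal probability at every possible position of a transcript, and counts how many fragment start positions cover each base. Bases closer to either end than one fragment length are covered by fewer fragments than bases in the middle. The function name is hypothetical.

```python
def expected_fragment_coverage(transcript_length: int, fragment_length: int) -> list[int]:
    """Count, for each base, how many fixed-length fragments cover it.

    Simplifying assumption: exactly one fragment starts at every valid
    position 0 .. transcript_length - fragment_length (uniform fragmentation).
    """
    if not 0 < fragment_length <= transcript_length:
        raise ValueError("fragment_length must be in 1..transcript_length")
    n_starts = transcript_length - fragment_length + 1
    coverage = []
    for pos in range(transcript_length):
        # Fragments starting in [pos - fragment_length + 1, pos] cover pos,
        # clipped to the valid start range [0, n_starts - 1].
        first = max(0, pos - fragment_length + 1)
        last = min(pos, n_starts - 1)
        coverage.append(last - first + 1)
    return coverage


print(expected_fragment_coverage(10, 4))  # [1, 2, 3, 4, 4, 4, 4, 3, 2, 1]
```

The middle of the transcript reaches a plateau equal to the fragment length, while the first and last bases are each covered by a single fragment; real fragmentation biases come on top of this purely geometric effect.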
RNA-Seq analysis (III):
Sequencing adaptors (blue) are subsequently added to each cDNA fragment, and a short sequence is obtained from each cDNA using high-throughput sequencing technology (typical read length: 30-400 bp, depending on the technology).
RNA-Seq analysis (IV):
The resulting sequence reads are aligned with the reference genome or transcriptome and classified as three types: exonic reads, junction reads and poly(A) end-reads.
(de novo assembly also possible => attractive for non-model organisms)
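In SAM alignments each read carries a CIGAR string, in which `N` marks a skipped reference region (an intron for a spliced read) and `S` marks soft-clipped bases that are present in the read but not aligned. The sketch below uses that to sort reads into the three types. The poly(A) rule (a soft-clipped run of A's at the 3' end, or of T's at the 5' end for reverse-strand reads, of at least `min_tail` bases) is a hypothetical heuristic chosen for illustration, as are the function names.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a SAM CIGAR string such as '30M200N20M' into (length, op) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if not ops or "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed or missing CIGAR: {cigar!r}")
    return ops


def classify_read(cigar: str, sequence: str, min_tail: int = 8) -> str:
    """Assign an aligned read to 'junction', 'poly(A) end' or 'exonic'.

    Heuristic (assumption): an 'N' operation marks a junction read; a long
    unaligned (soft-clipped) A-run at the 3' end, or T-run at the 5' end,
    marks a poly(A) end read; everything else counts as exonic.
    """
    ops = parse_cigar(cigar)
    if any(op == "N" for _, op in ops):
        return "junction"
    first_len, first_op = ops[0]
    last_len, last_op = ops[-1]
    if last_op == "S" and last_len >= min_tail and set(sequence[-last_len:]) == {"A"}:
        return "poly(A) end"
    if first_op == "S" and first_len >= min_tail and set(sequence[:first_len]) == {"T"}:
        return "poly(A) end"
    return "exonic"


print(classify_read("30M200N20M", "A" * 50))          # junction
print(classify_read("40M10S", "C" * 40 + "A" * 10))   # poly(A) end
```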
RNA-Seq analysis (V):
These three types are used to generate a base-resolution expression profile for each gene. Example: a yeast ORF with one intron.
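A base-resolution profile is, in essence, per-base read coverage. The sketch below computes it from alignments given as a start position and a SAM-style CIGAR string: aligned bases (`M`, `=`, `X`) add coverage, while skipped regions (`N`, introns in junction reads) and deletions (`D`) advance along the reference without adding coverage. The simplified `(start, cigar)` input format and the function name are hypothetical stand-ins for records read from a SAM/BAM file.

```python
import re


def coverage_profile(alignments: list[tuple[int, str]], region_length: int) -> list[int]:
    """Per-base read coverage over a gene region.

    alignments: (0-based start within the region, CIGAR) pairs.
    """
    coverage = [0] * region_length
    for start, cigar in alignments:
        pos = start
        for length_str, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
            length = int(length_str)
            if op in "M=X":
                # Aligned bases: count them, clipped to the region.
                for i in range(pos, min(pos + length, region_length)):
                    coverage[i] += 1
                pos += length
            elif op in "DN":
                # Deletions and introns consume reference but add no coverage.
                pos += length
            # I, S, H and P consume no reference positions.
    return coverage


# Hypothetical gene region: exon 0-9, intron 10-19, exon 20-29 (0-based).
reads = [(0, "10M"), (5, "5M10N5M"), (20, "10M")]
print(coverage_profile(reads, 30))
```

In this toy example positions 10 to 19 (the intron) stay at zero even though the junction read spans them, while positions 5 to 9 and 20 to 24 are covered twice.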
RNA-Seq - Bioinformatic challenges
RNA-Seq - Bioinformatic challenges (I):
=> FASTQ files
(Alternative: assemble contigs and align them to the genome)
=> SAM/BAM files
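The raw output of sequencing is a FASTQ file: each record is an `@` header line, the read sequence, a `+` separator line and one quality character per base. The sketch below reads such records and computes a mean base quality, assuming the four-line record layout and Phred+33 quality encoding (Sanger / Illumina 1.8+ style); the function names are hypothetical. For real data a dedicated parser such as Biopython's `Bio.SeqIO` (FASTQ) or pysam (SAM/BAM) is the safer choice.

```python
import io
from typing import Iterator, TextIO


def read_fastq(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) for each four-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record starting with {header!r}")
        # The read ID is the header text up to the first space.
        yield header[1:].partition(" ")[0], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean base quality, assuming Phred+33 encoding."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


example = io.StringIO("@read1 length=4\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n")
for read_id, seq, qual in read_fastq(example):
    print(read_id, seq, mean_phred(qual))
```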
RNA-Seq - Bioinformatic challenges (II):
Specific challenges for RNA-Seq:
Specific sequence context: GT-AG dinucleotides at intron boundaries
Low expression of intronic regions
Known or predicted splice sites
Detection of new splice sites (e.g. via split-read mapping)
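Split-read mapping can be illustrated with a naive exact-match search: the read is cut at every possible position, the left piece must end immediately before a GT donor dinucleotide and the right piece must start immediately after an AG acceptor dinucleotide further downstream. This is a sketch under strong simplifying assumptions (exact matches only, forward strand only, hypothetical function name, anchor length and maximum intron size); spliced aligners such as TopHat use genome indexes and tolerate mismatches.

```python
def find_split_alignments(read: str, genome: str, min_anchor: int = 5,
                          max_intron: int = 10_000) -> list[tuple[int, int]]:
    """Return candidate introns (start, end), 0-based and end-exclusive.

    For each split point k, read[:k] must match the genome directly before
    the intron and read[k:] directly after it; the intron must start with
    GT and end with AG.
    """
    hits = []
    for k in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:k], read[k:]
        start = genome.find(prefix)
        while start != -1:
            donor = start + k  # first base of the putative intron
            if genome.startswith("GT", donor):
                # An intron is at least 4 bases long (GT...AG).
                acc = genome.find(suffix, donor + 4)
                while acc != -1 and acc - donor <= max_intron:
                    if genome[acc - 2:acc] == "AG":
                        hits.append((donor, acc))
                    acc = genome.find(suffix, acc + 1)
            start = genome.find(prefix, start + 1)
    return hits


# Hypothetical toy locus: exon 1, a GT...AG intron, exon 2.
exon1, intron, exon2 = "CTGACTGATC", "GTTTTTTTAG", "CCATGCATCA"
genome = "AAAA" + exon1 + intron + exon2 + "AAAA"
read = exon1[-6:] + exon2[:6]  # a junction read spanning the intron
print(find_split_alignments(read, genome))  # [(14, 24)]
```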
RNA-Seq - Coverage
Coverage, sequencing depth and costs:
with sequencing depth (number of analyzed reads)
transcriptome analysis (transcription activity varies)
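A common back-of-the-envelope estimate links coverage and sequencing depth: mean coverage = number of reads * read length / size of the sequenced target. The sketch below applies it with a single target size for the whole transcriptome, which is exactly the simplification the second point warns about: because transcription activity varies, highly expressed transcripts receive far more than the average coverage and weakly expressed ones far less. Function names and example numbers are hypothetical.

```python
import math


def mean_coverage(n_reads: int, read_length: int, target_size: int) -> float:
    """Average depth: total sequenced bases divided by the target size."""
    return n_reads * read_length / target_size


def reads_needed(target_coverage: float, read_length: int, target_size: int) -> int:
    """Reads required to reach a given average coverage (rounded up)."""
    return math.ceil(target_coverage * target_size / read_length)


# Hypothetical numbers: a 50 Mb target sequenced with 50 bp or 100 bp reads.
print(mean_coverage(20_000_000, 50, 50_000_000))   # 20.0
print(reads_needed(30, 100, 50_000_000))           # 15000000
```

Since cost scales roughly with the number of reads, the second function is the quantity one would budget for; for RNA-Seq it should be read as a lower bound driven by the least expressed transcripts of interest.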
RNA-Seq - technology
RNA-Seq - Comparable technologies:
Transcriptome mapping using tiling arrays:
Chip design
Hybridization to tiling array
Interpretation of results
Advantages of RNA-Seq:
Wang Z. et al. 2009
In addition, RNA-Seq can reveal sequence variation, e.g. mutations or SNPs.
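A minimal sketch of how sequence variation can be read off RNA-Seq alignments: at one reference position, count the bases contributed by all overlapping reads and report the most frequent non-reference base if it has enough support. The depth and allele-fraction thresholds and the function name are illustrative assumptions; real variant callers also model base and mapping qualities, and in RNA-Seq data allele-specific expression or RNA editing can shift the observed allele fractions.

```python
from collections import Counter


def call_variant(ref_base: str, observed_bases: str, min_depth: int = 10,
                 min_alt_fraction: float = 0.2):
    """Naive call at one site from the read bases piled up there.

    Returns (alt_base, alt_fraction), or None if no variant passes the
    (hypothetical) depth and allele-fraction thresholds.
    """
    bases = [b for b in observed_bases.upper() if b in "ACGT"]
    if len(bases) < min_depth:
        return None  # too few reads to decide
    counts = Counter(bases)
    alt_counts = {b: n for b, n in counts.items() if b != ref_base.upper()}
    if not alt_counts:
        return None
    alt, n_alt = max(alt_counts.items(), key=lambda item: item[1])
    fraction = n_alt / len(bases)
    return (alt, fraction) if fraction >= min_alt_fraction else None


print(call_variant("A", "AAAAAAGGGG"))  # ('G', 0.4)
```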
Advantages of RNA-Seq (II):
Wang Z. et al. 2009
Background and saturation:
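Saturation is commonly examined by subsampling: count how many genes are detected when only a fraction of the reads is used. If the curve flattens before the full depth is reached, additional sequencing adds few new genes. The sketch below assumes the reads have already been assigned to genes and are given as one gene ID per read (a hypothetical, simplified input); the fractions, detection threshold, seed and function name are arbitrary choices for illustration.

```python
import random
from collections import Counter


def saturation_curve(read_gene_ids, fractions, min_reads=1, seed=0):
    """Genes detected (>= min_reads reads) when only a fraction of reads is used.

    The reads are shuffled once and prefixes of increasing size are taken,
    so every smaller subsample is contained in the larger ones and the
    curve cannot decrease.
    """
    reads = list(read_gene_ids)
    random.Random(seed).shuffle(reads)
    curve = []
    for fraction in sorted(fractions):
        subsample = reads[: round(fraction * len(reads))]
        counts = Counter(subsample)
        detected = sum(1 for n in counts.values() if n >= min_reads)
        curve.append((fraction, detected))
    return curve


# Hypothetical toy data: one gene ID per mapped read.
reads = ["geneA"] * 50 + ["geneB"] * 30 + ["geneC"] * 5 + ["geneD"]
for fraction, detected in saturation_curve(reads, [0.1, 0.25, 0.5, 1.0]):
    print(f"{fraction:.0%} of reads: {detected} genes")
```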
RNA-Seq - New insights
New insights:
for transcripts
influence on transcription and post-transcriptional modification
Expression quantification - ReCount database
Expression quantification:
Preprocessing and construction of count tables:
from the middle position of the read
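The slide states only that reads are counted from the middle position of the read. A minimal sketch of that idea, under assumptions not taken from the source: each read is assigned to the gene whose interval contains the read's middle base, gene intervals are non-overlapping and on a single chromosome, and reads whose midpoint falls outside all genes are discarded. The input format and function name are hypothetical, and the actual ReCount preprocessing may differ in detail.

```python
import bisect


def count_by_midpoint(reads, genes):
    """Count table: assign each read to the gene containing its midpoint.

    reads: (start, end) alignment coordinates, 0-based, end-exclusive.
    genes: {gene_id: (start, end)} non-overlapping intervals, end-exclusive.
    """
    ordered = sorted(genes.items(), key=lambda item: item[1][0])
    starts = [interval[0] for _, interval in ordered]
    counts = {gene_id: 0 for gene_id in genes}
    for start, end in reads:
        mid = (start + end - 1) // 2  # middle base of the read
        i = bisect.bisect_right(starts, mid) - 1  # last gene starting at or before mid
        if i >= 0:
            gene_id, (g_start, g_end) = ordered[i]
            if g_start <= mid < g_end:
                counts[gene_id] += 1
    return counts


genes = {"geneA": (100, 200), "geneB": (300, 400)}
reads = [(90, 140), (170, 220), (190, 260), (350, 400), (250, 270)]
print(count_by_midpoint(reads, genes))  # {'geneA': 2, 'geneB': 1}
```

Assigning by a single position means a read overlapping two adjacent features is counted exactly once, which keeps the table's total equal to the number of assigned reads.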
Example applications (I):
=> similar cell types
Benjamini-Hochberg correction
expressed
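Both example applications use the Benjamini-Hochberg correction for the per-gene p-values of a differential expression test. A plain-Python sketch of the standard step-up procedure: with m tests and p-values sorted as p(1) <= ... <= p(m), the adjusted value at rank i is the minimum over j >= i of min(1, m * p(j) / j). The function name is hypothetical; in practice R's `p.adjust(p, method = "BH")` or statsmodels' `multipletests(..., method="fdr_bh")` compute the same adjustment.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(p))
```

Genes whose adjusted p-value falls below the chosen false discovery rate (e.g. 0.05) would be reported as differentially expressed; in this toy example only the first test stays below 0.05.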
Example applications (II):
different ethnicities
Benjamini-Hochberg correction
expressed