Comparison of RNA sequencing with 19,319 lab validated RT-qPCR assays
Jan Hellemans, PhD
London, UK, October 20-21, 2014
Acknowledgements: Biogazelle team & collaborators
- Biogazelle
- Ghent University
- Steve Lefever
- SEQC consortium
- Christopher Mason
- David Kreil
- Leming Shi
- Bio-Rad
Introduction
- qPCR: reference technology for nucleic acid quantification
- sensitivity and specificity
- wide dynamic range
- speed
- relatively low cost
- conceptual and practical simplicity
- easy to perform ≠ easy to do right
- many steps involved
- all need to be right
Assays & MIQE
- design
- amplicon length
- primer positions (exonic or intron-spanning)
- transcript coverage
- in silico verification
- specificity prediction (retropseudogenes and other homologues)
- secondary structure analysis
- empirical (wet lab) validation
- specificity assessment (gel, melt, amplicon sequencing)
- Cq of NTC (for SYBR assays)
- amplification efficiency determination (slope, E, SE(E), r²)
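All four efficiency metrics listed above (slope, E, SE(E), r²) come from a standard curve: Cq regressed on log10 of the input amount. The sketch below is a minimal illustration, assuming E is expressed as the amplification factor per cycle (2.0 = 100%, as in the efficiency table later in this deck) and propagating SE(E) from the standard error of the slope with the delta method; that propagation choice is ours, not necessarily the one used by the authors.

```python
import math
from dataclasses import dataclass


@dataclass
class StandardCurve:
    slope: float          # Cq change per log10 unit of input
    intercept: float      # Cq extrapolated to an input of 1 unit (log10 = 0)
    r2: float             # coefficient of determination of the linear fit
    efficiency: float     # amplification factor per cycle (2.0 == 100%)
    se_efficiency: float  # standard error of E (delta method, an assumption)


def fit_standard_curve(amounts, cqs):
    """Least-squares fit of Cq against log10(input amount)."""
    if len(amounts) != len(cqs) or len(amounts) < 3:
        raise ValueError("need at least 3 paired (amount, Cq) points")
    x = [math.log10(a) for a in amounts]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(cqs) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cqs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, cqs))
    ss_tot = sum((yi - mean_y) ** 2 for yi in cqs)
    r2 = 1.0 - ss_res / ss_tot
    se_slope = math.sqrt(ss_res / (n - 2) / sxx)
    efficiency = 10 ** (-1.0 / slope)
    # dE/dslope = E * ln(10) / slope**2, so SE(E) ~ |dE/dslope| * SE(slope)
    se_efficiency = efficiency * math.log(10) / slope ** 2 * se_slope
    return StandardCurve(slope, intercept, r2, efficiency, se_efficiency)
```

A perfectly doubling assay has a slope of -1/log10(2), about -3.32 cycles per 10-fold dilution, which this function maps back to E = 2.0.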
The perfect assay: properties
- specific for the gene of interest (no off-target amplification)
- detection of all transcript variants
- detection not affected by polymorphisms (no allelic bias or drop-out)
- amplification efficiency ~100%
- no gDNA co-amplification
- no primer dimer formation
The perfect assay ... or the best possible
- For some genes, there is no perfect assay
- no unique sequence (homology with other genes, e.g. pseudogenes)
- no common sequence among all transcripts
- regions are excluded because of repeats, secondary structures, SNPs, homology, ...
- Make the best possible compromise and report potential issues
- Design → in silico quality control → lab validation
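One way to picture "the best possible compromise" is a cascade that tries the strictest constraint set first, relaxes step by step, and reports which constraints the chosen assay does not meet. This is a minimal sketch: the constraint names, the order of the levels and the pre-scored candidate dictionaries are hypothetical illustrations, not primerXL's actual rules or data model.

```python
from typing import Optional

# Hypothetical constraint levels, ordered from "perfect" to "best possible".
# Names and order are illustrative only.
LEVELS = [
    ("perfect", {"unique", "all_transcripts", "snp_free", "no_structure"}),
    ("allow SNPs", {"unique", "all_transcripts", "no_structure"}),
    ("partial transcript coverage", {"unique", "no_structure"}),
    ("best possible", set()),
]
ALL_CONSTRAINTS = set().union(*(required for _, required in LEVELS))


def design_with_cascade(candidates: list) -> Optional[tuple]:
    """Pick a candidate at the strictest level that any candidate satisfies.

    Each candidate is a dict with a "passes" set naming the constraints it meets.
    Returns (candidate, level name, sorted list of unmet constraints to report),
    or None when there are no candidates at all.
    """
    for level, required in LEVELS:
        hits = [c for c in candidates if required <= c["passes"]]
        if hits:
            chosen = hits[0]  # a real tool would rank hits, e.g. by primer3 penalty
            issues = sorted(ALL_CONSTRAINTS - chosen["passes"])
            return chosen, level, issues
    return None
```

Returning the unmet constraints alongside the assay keeps the "report potential issues" step explicit instead of silently accepting a relaxed design.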
Assay design using primerXL
- database of genomic information (transcripts, SNPs, ...)
- tools for target region selection (maximize transcript coverage)
- primer3 design engine
- analysis of secondary structures and SNPs in primer annealing regions
- specificity prediction (BiSearch)
- relaxation cascade (from perfect to best possible)
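Target region selection that maximizes transcript coverage can be pictured as a sweep over exon intervals: count how many transcripts cover each genomic stretch and keep the stretches covered by the most transcripts. This is a simplified sketch, assuming half-open genomic exon intervals per transcript; it ignores that an amplicon may span an exon junction and is not primerXL's actual implementation.

```python
def max_coverage_regions(transcripts):
    """Genomic stretches covered by the largest number of transcripts.

    transcripts: list of transcripts, each a list of (start, end) exon intervals,
    half-open [start, end) and non-overlapping within one transcript.
    Returns (max transcript count, list of merged (start, end) regions).
    """
    events = []
    for exons in transcripts:
        for start, end in exons:
            events.append((start, +1))
            events.append((end, -1))
    # At equal positions -1 sorts before +1, so an exon ending where another
    # starts is not counted as overlapping (half-open intervals).
    events.sort()

    best, regions = 0, []
    depth, prev = 0, None
    for pos, delta in events:
        if prev is not None and pos > prev and depth > 0:
            # The segment [prev, pos) is covered by `depth` transcripts.
            if depth > best:
                best, regions = depth, [(prev, pos)]
            elif depth == best:
                if regions[-1][1] == prev:
                    regions[-1] = (regions[-1][0], pos)  # extend adjacent region
                else:
                    regions.append((prev, pos))
        depth += delta
        prev = pos
    return best, regions
```

For example, with three transcripts that all share the stretch 200-250 but differ elsewhere, the function returns a count of 3 and that single region as the preferred design target.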
BiSearch specificity prediction
- BiSearch loose
- 1222222222222222
- only the gene of interest (FFAR2)
- BiSearch strict
- 1233333333333
reads | seq | gene_list | official_symbol | location
2843 | CATGGCAGTCACCATCTTCTGCTACTGGCGTTTTGTGTGGATCATGCTCTCCCAGCCCCTTGTGGGGGCCCAGAGGCGGCGCCGAGCCGTGGGGCTGGCTGTGGTGACGCTGCTCAATTTCCTGGTGTGCTTCGGACCTTACAGATCGGAA | ENSG00000126262 | FFAR2 | 19:35940617-35942667
1897 | GTAAGGTCCGAAGCACACCAGGAAATTGAGCAGCGTCACCACAGCCAGCCCCACGGCTCGGCGCCGCCTCTGGGCCCCCACAAGGGGCTGGGAGAGCATGATCCACACAAAACGCCAGTAGCAGAAGATGGTGACTGCCATGAGATCGGAA | ENSG00000126262 | FFAR2 | 19:35940617-35942667
1535 | GTAAGGTCCGAAGCACACCGAGAGCTGGGAGCAGGAGCTACACAGTCTGCTGGCCTCACTGCACACCCTGCTGGGGGCCCTGTACGAGGGAGCAGAGACTGCTCCTGTGCAGAATGAAGGCCCTGGGGTGGAGATGCTGCTGTCCTCAGAA | ENSG00000141456 | AC091153.1 | 17:4574680-4607632
1097 | CATGGCAGTCACCATCTTCTGAGGACAGCAGCATCTCCACCCCAGGGCCTTCATTCTGCACAGGAGCAGTCTCTGCTCCCTCGTACAGGGCCCCCAGCAGGGTGTGCAGTGAGGCCAGCAGACTGTGTAGCTCCTGCTCCCAGCTCTCGG | ENSG00000141456 | AC091153.1 | 17:4574680-4607632
1091 | CATGGCAGTCACCATCTTCTGAGGACAGCAGCATCTCCACCCCAGGGCCTTCATTCTGCACAGGAGCAGTCTCTGCTCCCTCGTACAGGGCCCCCAGCAGGGTGTGCAGTGAGGCCAGCAGACTGTGTAGCTCCTGCTCCCAGCTCTCGGT | ENSG00000141456 | AC091153.1 | 17:4574680-4607632
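Specificity prediction comes down to asking which templates both primers can anneal to, in the right orientation and within PCR distance. The toy sketch below mimics the loose/strict idea with a plain mismatch count plus an exact-match 3′ end; the thresholds and toy sequences are hypothetical, and this is not BiSearch's algorithm or scoring.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def binding_sites(primer: str, strand: str, max_mismatches: int, exact_3prime: int) -> list:
    """Start positions where `primer` (5'->3') matches `strand` within the limits."""
    k = len(primer)
    sites = []
    for i in range(len(strand) - k + 1):
        window = strand[i:i + k]
        # The 3'-terminal bases must match exactly for polymerase extension.
        if window[k - exact_3prime:] != primer[k - exact_3prime:]:
            continue
        mismatches = sum(a != b for a, b in zip(primer, window))
        if mismatches <= max_mismatches:
            sites.append(i)
    return sites


def predict_products(fwd, rev, template, max_mismatches=2, exact_3prime=3, max_len=500):
    """Lengths of predicted amplicons with the forward primer on the + strand.

    The reverse primer has the sequence of the - strand, so it is searched in
    revcomp(template). Call again with revcomp(template) to cover the other
    orientation of a genomic template.
    """
    n = len(template)
    fwd_sites = binding_sites(fwd, template, max_mismatches, exact_3prime)
    rev_sites = binding_sites(rev, revcomp(template), max_mismatches, exact_3prime)
    products = []
    for f in fwd_sites:
        for j in rev_sites:
            end = n - j  # exclusive + strand coordinate of the reverse primer's 5' end
            length = end - f
            if len(fwd) + len(rev) <= length <= max_len:
                products.append(length)
    return sorted(products)
```

Running the same primer pair with a "loose" setting (for example `max_mismatches=3`) and a "strict" one (`max_mismatches=1`) shows how a homologous template can be predicted as an off-target product only by the looser search.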
Wet lab validation: setup
- PCR composition
- total volume: 5 µl
- instrument: CFX384 (with automation)
- mastermix: SsoAdvanced SYBR
- primer conc: 250 nM each
- PCR program
- default cycling protocol for SsoAdvanced SYBR (Ta = 60°C)
- Samples
- cDNA: 25 ng total RNA equivalents (Agilent Universal Human Reference RNA = MAQC A)
- gDNA: 2.5 ng (Roche)
- NTC: water + carrier (5 ng/µl yeast transfer RNA)
- synthetic template (pooled 60-mers, concentration range: 20 million to 20 copies)
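The 5 µl composition above translates into pipetting volumes with C1·V1 = C2·V2 per component. This is a minimal sketch, assuming a 2× supermix and 10 µM primer stocks (both assumptions; check the actual kit and stock concentrations). The remaining volume is left for template and water.

```python
def component_volume(final_conc: float, stock_conc: float, total_volume: float) -> float:
    """V1 = C2 * V2 / C1 (final and stock concentrations in the same unit)."""
    return final_conc * total_volume / stock_conc


def reaction_plan(total_ul=5.0, mastermix_x=2.0, primer_nm=250.0, primer_stock_um=10.0):
    """Per-reaction volumes in microlitres (stock concentrations are assumptions)."""
    primer_stock_nm = primer_stock_um * 1000.0
    plan = {
        "mastermix": component_volume(1.0, mastermix_x, total_ul),  # to 1x final
        "forward primer": component_volume(primer_nm, primer_stock_nm, total_ul),
        "reverse primer": component_volume(primer_nm, primer_stock_nm, total_ul),
    }
    plan["template + water"] = total_ul - sum(plan.values())
    return plan
```

With these assumptions each primer is only 0.125 µl per reaction, which argues for a premixed primer working stock when dispensing with automation.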
Wet lab validation: some numbers
- lab validation of 103 053 assays (human, mouse and rat coding genes)
- 1 456 142 reactions
- 3 822 PCR plates (384-well)
- equivalent to 15 288 PCR plates (96-well)
305 m
Amplification efficiency: synthetic templates
- initial publication: Vermeulen et al., Nucleic Acids Research, 2009
- Biogazelle approach (easy & cost-effective)
- 60-mer
- no modifications, standard desalted
- 7-point dilution series: 20 000 000 → 20 molecules
- equivalent to full-length double-stranded template
- limitation: the behavior of the first cycles, when amplifying from cDNA, is not evaluated
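[Figure: synthetic 60-mer template; segments labeled 30 nt 3′ and 30 nt 5′]
metric | ds template | ss oligo
assays with r² < 0.99 | 1 | 1
median E | 2.00 | 2.01
average E | 2.00 | 2.01
assays with E outside [1.90, 2.10] | 1 | 3
paired t-test p-value: 0.14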
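The 7-point series is a 10-fold dilution from 20 000 000 down to 20 molecules, and the ds template vs ss oligo comparison is a paired test on per-assay efficiencies. A minimal standard-library sketch: it computes the paired t statistic only, since the p-value needs the t distribution (for example `scipy.stats.ttest_rel` returns both). Any efficiency values passed in are hypothetical.

```python
import math
import statistics


def dilution_series(top=20_000_000, points=7, fold=10):
    """Copy numbers of a serial dilution, highest first."""
    return [top / fold ** i for i in range(points)]


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched measurements."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least 2 matched pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

Pairing by assay matters here: each assay is measured on both template types, so between-assay variation in E cancels out of the differences.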
Amplification efficiency: distribution (n = 50 133)
[Figure: distribution of amplification efficiencies; annotations: 89%, "redesign" (two regions)]
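Assuming the 90-110% acceptance window from the conclusions, which corresponds to E between 1.90 and 2.10 when E is the amplification factor per cycle, flagging assays for redesign is a simple filter. A sketch; the inclusive boundaries are an assumption.

```python
def efficiency_percent(e: float) -> float:
    """Amplification factor per cycle (2.0) expressed as percent efficiency (100%)."""
    return (e - 1.0) * 100.0


def triage(efficiencies: dict, low: float = 90.0, high: float = 110.0):
    """Split assays into those inside [low, high] % efficiency and those to redesign."""
    keep, redesign = {}, {}
    for assay, e in efficiencies.items():
        pct = round(efficiency_percent(e), 6)  # avoid float noise at the edges
        (keep if low <= pct <= high else redesign)[assay] = e
    return keep, redesign
```

Rounding before the comparison keeps an assay with E = 1.90 on the accepted side; without it, floating-point error would put 1.90 at 89.999...%.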
Specificity: NGS for increased sensitivity
- amplicon sizing (+ melt analysis for SYBR assays)
- limited sensitivity for detecting low-level non-specific co-amplification
- failure to observe non-specific amplification of sequences with similar size and/or Tm, e.g. expressed pseudogenes or homologous genes
- Next level of specificity assessment
- in silico specificity predictions by BiSearch
- massively parallel sequencing of pooled PCR products
- average coverage > 1000-fold → lab specificity > 99.9%
- 50-200 times more sensitive than size analysis and Sanger sequencing
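Why coverage in the thousands matters for a >99.9% specificity call can be seen from a simple binomial sampling model: an off-target product making up a fraction f of the amplicon pool is seen at least once among n reads with probability 1 - (1 - f)^n. This model is our illustration, not the analysis used in the presentation; it ignores sequencing errors and amplification bias.

```python
import math


def detection_probability(fraction: float, reads: int) -> float:
    """P(>= 1 read) for a product making up `fraction` of the amplicon pool."""
    return 1.0 - (1.0 - fraction) ** reads


def reads_needed(fraction: float, confidence: float = 0.95) -> int:
    """Smallest read count that sees the product at least once with `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction))
```

Under this model, 1 000 reads give roughly a 63% chance of seeing a 0.1% product at least once, and about 3 000 reads are needed for 95%; 1 000-fold coverage therefore sets the resolution near 0.1% rather than guaranteeing detection at that level.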
Specificity: most assays are 100% on-target
Specificity: 2/3 of non-specific assays may go unnoticed without NGS
[Figure: distribution of assays by % on-target (0% to 100%), and percentage of assays (0% to 60%) per on-target fraction bin, from 0 < x < 0.1 to 0.9 < x < 1]
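Per assay, the % on-target is the share of amplicon reads assigned to the intended gene, as in the FFAR2 read table; non-specific assays can then be binned in 0.1-wide classes as in the histogram. A sketch, assuming reads are already assigned to genes and that bins are half-open at their lower edge; the gene names in any example are placeholders.

```python
from collections import Counter

# Bin edges as printed on the histogram axis (0 < x < 0.1 ... 0.9 < x < 1).
EDGES = ["0"] + [f"0.{i}" for i in range(1, 10)] + ["1"]


def on_target_fraction(reads_per_gene: dict, target: str) -> float:
    """Share of amplicon reads assigned to the intended gene."""
    total = sum(reads_per_gene.values())
    if total == 0:
        raise ValueError("assay has no reads")
    return reads_per_gene.get(target, 0) / total


def bin_label(fraction: float):
    """0.1-wide bin label for a non-specific assay; None if 0% or 100% on-target."""
    if fraction <= 0.0 or fraction >= 1.0:
        return None
    i = min(int(fraction * 10), 9)
    return f"{EDGES[i]} < x < {EDGES[i + 1]}"


def histogram(fractions) -> Counter:
    """Count non-specific assays per on-target bin."""
    return Counter(label for label in map(bin_label, fractions) if label is not None)
```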
Specificity: the power of in silico verification
category | assays | share
perfect | 60 293 | 86%
acceptable (<10% non-specific) | 5 866 | 8%
predicted non-specificity (no specific design found) | 1 204 | 2%
failing specificity QC criteria | 2 467 | 4%
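The shares in the specificity table follow directly from the assay counts (69 830 assays in total). The sketch recomputes them and adds a hypothetical mapping from a lab-measured on-target fraction to the read-based categories; the exact QC rules behind "failing specificity QC criteria" may include more than the 10% threshold assumed here.

```python
# Assay counts as given in the specificity table.
SPECIFICITY_COUNTS = {
    "perfect": 60_293,
    "acceptable (<10% non-specific)": 5_866,
    "predicted non-specificity (no specific design found)": 1_204,
    "failing specificity QC criteria": 2_467,
}


def percentages(counts: dict) -> dict:
    """Share of each category in percent."""
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


def classify_lab_result(on_target: float) -> str:
    """Hypothetical mapping of a lab-validated assay to the table's categories.

    Only covers the read-based categories; "predicted non-specificity" comes
    from the in silico step and is decided before any lab work.
    """
    if on_target >= 1.0:
        return "perfect"
    if 1.0 - on_target < 0.10:
        return "acceptable (<10% non-specific)"
    return "failing specificity QC criteria"
```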
MIQE-compliant PrimePCR assay validation data sheet
Dynamic range: > 10 000 000-fold
[Figure: gene count (0 to 5 000) per expression level, copies per cell from 0.001 to 16 777.216 in 2-fold steps; human, mouse, rat]
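The expression axis runs from 0.001 to 16 777.216 copies per cell in 2-fold steps, i.e. 2^24 (about 1.7 × 10^7) fold, consistent with the >10 000 000-fold claim. In Cq terms a fold difference F corresponds to log_E(F) cycles; the sketch assumes E = 2 (100% efficiency) unless another value is given.

```python
import math


def fold_range(low: float, high: float) -> float:
    """Ratio between the highest and lowest expression level."""
    return high / low


def cycles_for_fold(fold: float, efficiency: float = 2.0) -> float:
    """Cq difference between two inputs that differ `fold`-fold."""
    return math.log(fold) / math.log(efficiency)
```

At E = 2, a 10 000 000-fold range spans about 23.3 cycles, which fits within a standard 40-cycle qPCR run.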
SEQC
- Focus on RNA sequencing quality control (SEQC); 2 Biogazelle co-authors
- multisite, cross-platform analysis of RNAseq
- FDA-sponsored and guided MAQC-III
- Nature Biotechnology, Sept 2014
- MAQC samples: reference RNA with built-in controls (known truths)
- > 100 billion reads
- compared against qPCR (PrimePCR)
RNAseq vs PrimePCR: differential expression
platform | 454 | ILMN | PGM | PRO
differential expression vs PrimePCR | 0.83 | 0.89 | 0.86 | 0.89
genes | 13,190 | 16,264 | 14,981 | 16,242
qPCR (PrimePCR) vs RNAseq (Illumina)
- r² = 75% for genes detected by both platforms
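The r² is computed only over genes detected by both platforms. A minimal sketch, assuming expression values are compared on a log2 scale, that qPCR values have already been converted to relative quantities (for example from Cq), and that "detected" means a positive value on each platform; the slide does not state these criteria, and the gene names are placeholders.

```python
import math


def r_squared_detected(qpcr: dict, rnaseq: dict) -> float:
    """Pearson r² of log2 expression over genes detected (> 0) on both platforms."""
    genes = [g for g in qpcr if qpcr[g] > 0 and rnaseq.get(g, 0) > 0]
    if len(genes) < 3:
        raise ValueError("need at least 3 genes detected by both platforms")
    x = [math.log2(qpcr[g]) for g in genes]
    y = [math.log2(rnaseq[g]) for g in genes]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```

Restricting to jointly detected genes avoids having to assign an arbitrary log value to zeros, but it also means r² says nothing about genes detected by only one platform.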
Saturation analysis: ABRF-NGS dataset
preparation | sample | libraries | reads | GENCODE12 mapping | PrimePCR mapping
ribo-depleted | MAQC A | 22 | 5 304 M | 1 955 M (37%) | 1 692 M (32%)
ribo-depleted | MAQC B | 17 | 3 370 M | 1 447 M (43%) | 1 193 M (35%)
poly-A-enriched | MAQC A | 4 | 427 M | 291 M (68%) | 278 M (65%)
poly-A-enriched | MAQC B | 4 | 446 M | 323 M (72%) | 297 M (67%)
Saturation analysis: ribo-depletion RNAseq, % of GENCODE12
[Figure: % of GENCODE12 genes (0% to 100%) vs number of reads (125 000 to 4 096 000 000, 2-fold steps); series: MAQC A and MAQC B, detection and quantification]
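Saturation curves of this kind are usually produced by subsampling reads. Instead of random subsampling, the sketch below uses an expected-value approximation: a gene with read share p is detected at depth d with probability 1 - (1 - p)^d, and "quantified" is modelled as reaching a minimum read count under a Poisson approximation. Both the approximation and the 10-read quantification threshold are assumptions for illustration; the presentation does not state its criteria.

```python
import math


def expected_detected(counts, depth):
    """Expected number of genes with >= 1 read when `depth` reads are sampled."""
    total = sum(counts)
    return sum(1.0 - (1.0 - c / total) ** depth for c in counts if c > 0)


def poisson_at_least(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    term, cdf = math.exp(-lam), 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)
    return 1.0 - cdf


def expected_quantified(counts, depth, min_reads=10):
    """Expected number of genes reaching `min_reads` (hypothetical threshold)."""
    total = sum(counts)
    return sum(poisson_at_least(min_reads, depth * c / total) for c in counts if c > 0)


def saturation_curve(counts, depths, n_genes):
    """(depth, % detected, % quantified) relative to a reference set of n_genes."""
    return [
        (d,
         100.0 * expected_detected(counts, d) / n_genes,
         100.0 * expected_quantified(counts, d) / n_genes)
        for d in depths
    ]


# Read depths matching the figure axis: 125 000 to 4 096 000 000 in 2-fold steps.
DEPTHS = [125_000 * 2 ** i for i in range(16)]
```

Here `counts` would be per-gene read counts from the full-depth data and `n_genes` the size of the reference gene set (GENCODE12 or PrimePCR).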
Saturation analysis: poly-A RNAseq, % of GENCODE12
[Figure: % of GENCODE12 genes (0% to 100%) vs number of reads (125 000 to 4 096 000 000, 2-fold steps); series: MAQC A and MAQC B, detection and quantification]
Saturation analysis: ribo-depletion RNAseq for MAQC A, GENCODE12 vs PrimePCR
[Figure: % (0% to 100%) vs number of reads (125 000 to 4 096 000 000, 2-fold steps); series: GENCODE12 and PrimePCR, detection and quantification]
Saturation analysis: MAQC A, % of PrimePCR
[Figure: % of PrimePCR genes (0% to 100%) vs number of reads (125 000 to 4 096 000 000, 2-fold steps); series: ribo-depletion RNAseq, poly-A RNAseq and qPCR, each for detection and quantification]
Confirmation rate of novel junctions
junction prediction | junctions | confirmed | confirmation rate
multiple algorithms (Cstar + Magic + Subread) | 136 | 136 | 100%
single algorithm | 24 | 20 | 83%
- novel exon: one of the primers in the novel exon
- novel junction: one of the primers overlapping the novel junction, with ≥ 5 bases at either side of the junction
- size analysis to confirm the expected size for novel transcripts
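The design rules for novel exons and junctions are easy to check in transcript coordinates. A minimal sketch, assuming 0-based half-open primer coordinates on the predicted transcript and a junction given as the index of the first base of the downstream exon; the function names and example coordinates are hypothetical.

```python
def spans_junction(primer_start: int, primer_end: int, junction: int, min_overhang: int = 5) -> bool:
    """True if primer [start, end) has >= min_overhang bases on each side of the junction."""
    upstream = junction - primer_start
    downstream = primer_end - junction
    return upstream >= min_overhang and downstream >= min_overhang


def primer_in_exon(primer_start: int, primer_end: int, exon_start: int, exon_end: int) -> bool:
    """True if the primer lies entirely within the (novel) exon [exon_start, exon_end)."""
    return exon_start <= primer_start and primer_end <= exon_end
```

A 20-nt primer at positions 90-110 across a junction at 100 passes with 10 bases on each side, whereas one starting at 97 has only 3 upstream bases and fails the 5-base rule.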
Conclusions - I
- Assay design and in silico verification
- Transcript coverage
- SNPs and secondary structures
- Specificity prediction
- Empirical assay validation
- Efficiency in 90-110% range
- Stringent specificity analysis by massively parallel amplicon sequencing
- PrimePCR: validated assays for human, mouse & rat coding genes
Conclusions - II
- qPCR based transcriptome profiling
- Samples from MAQC/SEQC study
- qPCR data as benchmark for evaluation of RNAseq
- qPCR benefits: high sensitivity and large dynamic range
- good correlation with RNAseq results
- for individual genes, RNAseq with ≤ 100 M reads gives lower sensitivity than qPCR
- the majority of novel junctions identified by RNAseq can be