Comparison of RNA sequencing with 19,319 lab validated RT-qPCR assays

SLIDE 1

Comparison of RNA sequencing with 19,319 lab validated RT-qPCR assays

Jan Hellemans, PhD London, UK October 20-21, 2014

SLIDE 2
SLIDE 3

Acknowledgements

Biogazelle team & collaborators

  • Biogazelle
  • Ghent University
    • Steve Lefever
  • SEQC consortium
    • Christopher Mason
    • David Kreil
    • Leming Shi
  • Bio-Rad

SLIDE 4
Introduction

  • qPCR: reference technology for nucleic acid quantification
    • sensitivity and specificity
    • wide dynamic range
    • speed
    • relatively low cost
    • conceptual and practical simplicity
  • easy to perform ≠ easy to do right
    • many steps involved
    • each one must be done right

SLIDE 5

Assays & MIQE

  • design
    • amplicon length
    • primer positions (exonic or intron-spanning)
    • transcript coverage
  • in silico verification
    • specificity prediction (retropseudogenes and other homologues)
    • secondary structure analysis
  • empirical (wet lab) validation
    • specificity assessment (gel, melt, amplicon sequencing)
    • Cq of NTC (for SYBR assays)
    • amplification efficiency determination (slope, E, SE(E), r²)
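
The efficiency parameters listed above (slope, E, SE(E), r²) all come from a linear standard curve of Cq against log10 input quantity. A minimal Python sketch, assuming an ordinary least-squares fit and a first-order (delta-method) approximation for SE(E); the function name `standard_curve` is illustrative, not part of any named tool:

```python
import math
from statistics import mean


def standard_curve(log10_quantity, cq):
    """Fit Cq = intercept + slope * log10(quantity) by ordinary least squares.

    Returns (slope, E, SE(E), r²), where E is the fold increase per cycle
    (E = 2.0 means perfect doubling) and SE(E) is propagated from SE(slope)
    with a first-order (delta-method) approximation.
    """
    n = len(cq)
    x_bar, y_bar = mean(log10_quantity), mean(cq)
    sxx = sum((x - x_bar) ** 2 for x in log10_quantity)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log10_quantity, cq))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Goodness of fit
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(log10_quantity, cq))
    sst = sum((y - y_bar) ** 2 for y in cq)
    r_squared = 1 - sse / sst

    # Standard error of the slope (needs at least 3 points)
    se_slope = math.sqrt(sse / (n - 2) / sxx)

    # E = 10^(-1/slope); dE/dslope = E * ln(10) / slope^2
    efficiency = 10 ** (-1 / slope)
    se_efficiency = efficiency * math.log(10) / slope ** 2 * se_slope
    return slope, efficiency, se_efficiency, r_squared


# Ideal 7-point, 10-fold series (20 000 000 to 20 copies) with perfect doubling
quantities = [20_000_000 / 10 ** i for i in range(7)]
x = [math.log10(q) for q in quantities]
cq = [40 - math.log2(q) for q in quantities]
slope, e, se_e, r2 = standard_curve(x, cq)
print(round(slope, 3), round(e, 3), round(r2, 3))  # -3.322 2.0 1.0
```

With real data, r² drops below 1 and SE(E) becomes non-zero; the ideal series only serves as a sanity check of the formulas.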
SLIDE 6

Assays & MIQE

SLIDE 7

The perfect assay

properties

  • specific for the gene of interest (no off-target amplification)
  • detection of all transcript variants
  • detection not affected by polymorphisms (no allelic bias or dropout)
  • amplification efficiency ~100%
  • no gDNA co-amplification
  • no primer-dimer formation

SLIDE 8

The perfect assay

SLIDE 9

The perfect assay

... or the best possible

  • For some genes, there is no perfect assay:
    • no unique sequence (homology with other genes, e.g. pseudogenes)
    • no sequence common to all transcripts
    • regions excluded because of repeats, secondary structures, SNPs, homology, ...
  • Make the best possible compromise and report potential issues
  • Design → in silico quality control → lab validation

SLIDE 10

Assay design using primerXL

  • database of genomic information (transcripts, SNPs, ...)
  • tools for target region selection (maximize transcript coverage)
  • primer3 design engine
  • analysis of secondary structures and SNPs in primer annealing regions
  • specificity prediction (BiSearch)
  • relaxation cascade (from perfect to best possible)
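
The relaxation cascade can be pictured as a loop over progressively looser criteria that stops at the first level yielding a design. A hypothetical Python sketch: the `Candidate` fields and the thresholds in `LEVELS` are illustrative assumptions, not primerXL's actual criteria:

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one candidate primer pair (fields are illustrative)."""
    name: str
    transcript_coverage: float  # fraction of annotated transcripts detected (0-1)
    snps_in_primers: int        # known SNPs in primer annealing regions
    off_target_hits: int        # predicted off-target products (e.g. from BiSearch)


# Relaxation levels from "perfect" to "best possible"; thresholds are assumptions.
LEVELS = [
    {"min_coverage": 1.0, "max_snps": 0, "max_off_target": 0},
    {"min_coverage": 0.8, "max_snps": 0, "max_off_target": 0},
    {"min_coverage": 0.5, "max_snps": 1, "max_off_target": 0},
    {"min_coverage": 0.0, "max_snps": 1, "max_off_target": 1},
]


def passes(c: Candidate, level: dict) -> bool:
    return (c.transcript_coverage >= level["min_coverage"]
            and c.snps_in_primers <= level["max_snps"]
            and c.off_target_hits <= level["max_off_target"])


def relaxation_cascade(candidates):
    """Return (level index, best candidate) at the strictest level that yields a design.

    Returns (None, None) when no level is satisfied, i.e. no acceptable design.
    """
    for i, level in enumerate(LEVELS):
        passing = [c for c in candidates if passes(c, level)]
        if passing:
            return i, max(passing, key=lambda c: c.transcript_coverage)
    return None, None


pairs = [Candidate("pair_1", 0.6, 0, 0), Candidate("pair_2", 1.0, 1, 0)]
level, best = relaxation_cascade(pairs)
print(level, best.name)  # 2 pair_2
```

Returning the level index alongside the design makes it easy to report how far the criteria had to be relaxed, in line with the "report potential issues" principle.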
SLIDE 11

BiSearch specificity prediction

  • BiSearch loose
  • 1222222222222222
  • BiSearch strict
  • 1233333333333
SLIDE 12

BiSearch specificity prediction

  • BiSearch loose
  • 1222222222222222
  • only the gene of interest (FFAR2)
  • BiSearch strict
  • 1233333333333

reads | seq | gene_list | official_symbol | location
2843 | CATGGCAGTCACCATCTTCTGCTACTGGCGTTTTGTGTGGATCATGCTCTCCCAGCCCCTTGTGGGGGCCCAGAGGCGGCGCCGAGCCGTGGGGCTGGCTGTGGTGACGCTGCTCAATTTCCTGGTGTGCTTCGGACCTTACAGATCGGAA | ENSG00000126262 | FFAR2 | 19:35940617-35942667
1897 | GTAAGGTCCGAAGCACACCAGGAAATTGAGCAGCGTCACCACAGCCAGCCCCACGGCTCGGCGCCGCCTCTGGGCCCCCACAAGGGGCTGGGAGAGCATGATCCACACAAAACGCCAGTAGCAGAAGATGGTGACTGCCATGAGATCGGAA | ENSG00000126262 | FFAR2 | 19:35940617-35942667
1535 | GTAAGGTCCGAAGCACACCGAGAGCTGGGAGCAGGAGCTACACAGTCTGCTGGCCTCACTGCACACCCTGCTGGGGGCCCTGTACGAGGGAGCAGAGACTGCTCCTGTGCAGAATGAAGGCCCTGGGGTGGAGATGCTGCTGTCCTCAGAA | ENSG00000141456 | AC091153.1 | 17:4574680-4607632
1097 | CATGGCAGTCACCATCTTCTGAGGACAGCAGCATCTCCACCCCAGGGCCTTCATTCTGCACAGGAGCAGTCTCTGCTCCCTCGTACAGGGCCCCCAGCAGGGTGTGCAGTGAGGCCAGCAGACTGTGTAGCTCCTGCTCCCAGCTCTCGG | ENSG00000141456 | AC091153.1 | 17:4574680-4607632
1091 | CATGGCAGTCACCATCTTCTGAGGACAGCAGCATCTCCACCCCAGGGCCTTCATTCTGCACAGGAGCAGTCTCTGCTCCCTCGTACAGGGCCCCCAGCAGGGTGTGCAGTGAGGCCAGCAGACTGTGTAGCTCCTGCTCCCAGCTCTCGGT | ENSG00000141456 | AC091153.1 | 17:4574680-4607632

SLIDE 13

Wet lab validation

setup

  • PCR composition
    • total volume: 5 µl
    • instrument: CFX384 (with automation)
    • mastermix: SsoAdvanced SYBR
    • primer concentration: 250 nM each
  • PCR program
    • default cycling protocol for SsoAdvanced SYBR (Ta = 60 °C)
  • Samples
    • cDNA: 25 ng (total RNA equivalents; Agilent Universal Human Reference RNA = MAQC A)
    • gDNA: 2.5 ng (Roche)
    • NTC: water + carrier (5 ng/µl yeast transfer RNA)
    • synthetic template (pooled 60-mers, concentration range from 20 000 000 to 20 copies)

SLIDE 14

Wet lab validation

some numbers

  • lab validation of 103 053 assays (human, mouse and rat coding genes)
  • 1 456 142 reactions
  • 3 822 PCR plates (384-well)
  • equivalent to 15 288 PCR plates (96-well)

305 m
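
The plate figures on this slide are internally consistent, as a quick check shows (a 384-well plate has four times the wells of a 96-well plate). A minimal Python sketch using only the numbers above; the derived ratios (reactions per assay, reactions per plate) are computed here, not stated on the slide:

```python
assays = 103_053
reactions = 1_456_142
plates_384 = 3_822

# A 384-well plate has four times as many wells as a 96-well plate
plates_96_equivalent = plates_384 * 384 // 96
reactions_per_assay = reactions / assays
reactions_per_plate = reactions / plates_384

print(plates_96_equivalent)           # 15288
print(round(reactions_per_assay, 1))  # 14.1
print(round(reactions_per_plate))     # 381
```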

SLIDE 15

Amplification efficiency

synthetic templates

  • initial publication: Vermeulen et al., Nucleic Acids Research, 2009
  • Biogazelle approach (easy & cost effective)
    • 60-mer
    • no modifications, standard desalted
    • 7-point dilution series: from 20 000 000 to 20 molecules
    • equivalent to a full-length double-stranded template
  • limitation: the behavior of the first cycles, when amplifying from cDNA, is not evaluated

[Figure: synthetic template; labels: 30 nt 3’, 30 nt 5’]

                            | ds template | ss oligo
r² < 0.99 (count)           | 1           | 1
median E                    | 2.00        | 2.01
average E                   | 2.00        | 2.01
count E outside [1.90-2.10] | 1           | 3
paired t-test p-value: 0.14
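
A 7-point series spanning 20 000 000 to 20 molecules implies a constant 10-fold step, and at E = 2 each 10-fold step should shift Cq by log2(10) ≈ 3.32 cycles. A Python sketch of that arithmetic (the helper name `delta_cq` is illustrative):

```python
import math

TOP, BOTTOM, POINTS = 20_000_000, 20, 7

# Constant fold step that spans TOP..BOTTOM in (POINTS - 1) dilutions
fold = (TOP / BOTTOM) ** (1 / (POINTS - 1))
series = [TOP / fold ** i for i in range(POINTS)]


def delta_cq(fold_step, efficiency=2.0):
    """Expected Cq difference between neighbouring dilution points.

    `efficiency` is the fold increase per cycle (2.0 = perfect doubling).
    """
    return math.log(fold_step) / math.log(efficiency)


print(round(fold, 6))              # 10.0
print([round(c) for c in series])  # [20000000, 2000000, 200000, 20000, 2000, 200, 20]
print(round(delta_cq(fold), 2))    # 3.32
```

A lower efficiency stretches the spacing: at E = 1.9, for example, `delta_cq(10, 1.9)` is larger than 3.32, which is what makes the slope of the standard curve informative.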

SLIDE 16

Amplification efficiency

distribution (n = 50 133)

[Figure: distribution of amplification efficiencies; annotation: 89%]

SLIDE 17

Amplification efficiency

distribution (n = 50 133)

[Figure: distribution of amplification efficiencies; annotations: 89%, "redesign" (twice)]
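
The slides express E both as a fold value (the 1.90-2.10 window on the synthetic-template slide) and as a percentage (the 90-110% range in the conclusions); the two are related by %E = (E - 1) × 100. A minimal Python triage sketch under that convention; it compares on the fold scale so that floating-point rounding does not reject E = 1.90 exactly. The function names are illustrative:

```python
def efficiency_percent(e_fold: float) -> float:
    """Convert E as fold increase per cycle (2.0 = doubling) to percent (100%)."""
    return (e_fold - 1.0) * 100.0


def triage(e_fold: float, low: float = 1.90, high: float = 2.10) -> str:
    """Flag an assay for redesign when E falls outside [low, high] (90-110%)."""
    return "accept" if low <= e_fold <= high else "redesign"


efficiencies = [2.00, 1.95, 2.10, 1.85, 2.25]
for e in efficiencies:
    print(f"E = {e:.2f} ({efficiency_percent(e):.0f}%): {triage(e)}")
```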

SLIDE 18

Specificity

NGS for increased sensitivity

  • amplicon sizing (+ melt analysis for SYBR assays)
    • limited sensitivity for detecting low-level non-specific co-amplification
    • fails to detect non-specific amplification of sequences with similar size and/or Tm, e.g. expressed pseudogenes or homologous genes
  • next level of specificity assessment
    • in silico specificity predictions by BiSearch
    • massively parallel sequencing of pooled PCR products
    • average coverage > 1000-fold → lab specificity > 99.9%
    • 50-200 times more sensitive than size analysis and Sanger sequencing
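
Amplicon sequencing turns specificity into a read-count ratio: the on-target fraction is the share of reads mapping to the intended gene, and with N reads per amplicon the smallest off-target share a single read can reveal is roughly 1/N (so > 1000-fold coverage resolves contamination below 0.1%). A hedged Python sketch of that reasoning; the gene names and counts are hypothetical:

```python
def on_target_fraction(read_counts: dict[str, int], target: str) -> float:
    """Share of reads that map to the intended target gene."""
    total = sum(read_counts.values())
    return read_counts.get(target, 0) / total if total else 0.0


def detection_limit(coverage: int) -> float:
    """Smallest off-target share a single read can reveal at this coverage."""
    return 1 / coverage


# Hypothetical amplicon: 1 980 reads on target, 20 reads from a homologous gene
counts = {"GENE_A": 1980, "HOMOLOG_B": 20}
print(on_target_fraction(counts, "GENE_A"))  # 0.99
print(detection_limit(1000))                 # 0.001
```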

SLIDE 19

Specificity

most assays are 100% on-target

SLIDE 20

Specificity

2/3 of non-specific assays may go unnoticed without NGS

[Figure: assays by % on-target (axis 0% to 100%), with non-specific assays binned by on-target fraction from 0 < x < 0.1 to 0.9 < x < 1 (axis 0% to 60%)]

SLIDE 21

Specificity

the power of in silico verification

category | assays | share
perfect | 60 293 | 86%
acceptable (<10% non-specific) | 5 866 | 8%
predicted non-specificity (no specific design found) | 1 204 | 2%
failing specificity QC criteria | 2 467 | 4%
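
The percentages in this table follow directly from the assay counts. A short Python check, together with a hypothetical rule that maps an assay's on-target fraction onto the "perfect" and "acceptable (<10% non-specific)" categories; the cut-off for "failing" is an assumption, and the "predicted non-specificity" category comes from the design step rather than from a read-count threshold:

```python
counts = {
    "perfect": 60_293,
    "acceptable (<10% non-specific)": 5_866,
    "predicted non-specificity (no specific design found)": 1_204,
    "failing specificity QC criteria": 2_467,
}
total = sum(counts.values())
for category, n in counts.items():
    print(f"{category}: {n} ({100 * n / total:.0f}%)")


def specificity_class(on_target: float) -> str:
    """Hypothetical mapping of an on-target read fraction to a category."""
    if on_target >= 1.0:
        return "perfect"
    if on_target > 0.9:
        return "acceptable (<10% non-specific)"
    return "failing specificity QC criteria"
```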

SLIDE 22

MIQE compliant PrimePCR assay

validation data sheet

SLIDE 23

Dynamic range

> 10 000 000 fold

[Figure: number of genes per expression level in copies per cell (two-fold steps from 0.001 to 16 777.216) for human, mouse and rat; gene count axis up to 5000]
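
Assuming the "> 10 000 000 fold" figure refers to the span of the copies-per-cell axis, the arithmetic is a chain of 24 doublings:

```latex
\frac{16\,777.216}{0.001} = 16\,777\,216 = 2^{24} \approx 1.68 \times 10^{7} > 10^{7}
```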

SLIDE 24

SEQC

Focus on RNA sequencing quality control (SEQC)

  • multisite, cross-platform analysis of RNAseq
  • FDA-sponsored and -guided MAQC-III
  • Nature Biotechnology, Sept 2014 (2 Biogazelle co-authors)
  • MAQC samples: reference RNA with built-in controls (known truths)
  • > 100 billion reads
  • compared against qPCR (PrimePCR)

SLIDE 25

RNAseq vs PrimePCR

Differential expression

platform | 454 | ILMN | PGM | PRO
value | 0.83 | 0.89 | 0.86 | 0.89
genes | 13,190 | 16,264 | 14,981 | 16,242

SLIDE 26

qPCR (PrimePCR) vs RNAseq (Illumina)

r² = 75% for genes detected by both platforms
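
One common way to obtain an r² like this is to correlate log-scale expression values over the genes that both platforms detect. A minimal Python sketch, assuming Pearson correlation on log2 values and inputs already converted to linear relative quantities; the slide does not state the exact statistic or normalization, and the gene names and values are hypothetical:

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def shared_r_squared(qpcr: dict[str, float], rnaseq: dict[str, float]) -> float:
    """r² of log2 expression over genes detected (> 0) by both platforms."""
    shared = [g for g in qpcr if qpcr[g] > 0 and rnaseq.get(g, 0) > 0]
    x = [math.log2(qpcr[g]) for g in shared]
    y = [math.log2(rnaseq[g]) for g in shared]
    return pearson_r(x, y) ** 2


qpcr = {"G1": 1.0, "G2": 8.0, "G3": 64.0, "G4": 0.0}      # G4 not detected by qPCR
rnaseq = {"G1": 3.0, "G2": 20.0, "G3": 150.0, "G4": 5.0}
print(round(shared_r_squared(qpcr, rnaseq), 3))
```

Restricting to genes detected by both platforms avoids taking the logarithm of zero, but it also means the r² says nothing about genes only one platform can see.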

SLIDE 27

qPCR (PrimePCR) vs RNAseq (Illumina)

SLIDE 28

Saturation analysis

ABRF-NGS dataset

preparation | sample | libraries | reads | GENCODE12 mapping | PrimePCR mapping
ribo-depleted | MAQC A | 22 | 5 304 M | 1 955 M (37%) | 1 692 M (32%)
ribo-depleted | MAQC B | 17 | 3 370 M | 1 447 M (43%) | 1 193 M (35%)
poly-A enriched | MAQC A | 4 | 427 M | 291 M (68%) | 278 M (65%)
poly-A enriched | MAQC B | 4 | 446 M | 323 M (72%) | 297 M (67%)
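
Saturation curves of this kind are commonly obtained by in silico downsampling: reads are thinned to a series of depths and the fraction of annotated genes that still reach a read threshold is recorded. A minimal Python sketch using binomial thinning; the thresholds for "detection" and "quantification" (`detect_min`, `quant_min`) are hypothetical placeholders, since the slides do not state them, and the per-read loop is only practical for toy counts:

```python
import random


def thin(counts: list[int], fraction: float, rng: random.Random) -> list[int]:
    """Binomial thinning: keep each read independently with probability `fraction`."""
    return [sum(1 for _ in range(c) if rng.random() < fraction) for c in counts]


def saturation_curve(counts, depths, detect_min=1, quant_min=10, seed=0):
    """Return (depth, fraction detected, fraction quantifiable) per depth.

    `counts` holds one read count per annotated gene, including zeros, so the
    fractions are relative to the whole annotation (e.g. "% of GENCODE12").
    """
    rng = random.Random(seed)
    total = sum(counts)
    curve = []
    for depth in depths:
        sub = thin(counts, min(1.0, depth / total), rng)
        detected = sum(c >= detect_min for c in sub) / len(counts)
        quantified = sum(c >= quant_min for c in sub) / len(counts)
        curve.append((depth, detected, quantified))
    return curve


toy_counts = [5000, 800, 120, 30, 4, 1, 0, 0]
depths = [sum(toy_counts) // 2 ** k for k in range(6, -1, -1)]
for depth, det, quant in saturation_curve(toy_counts, depths):
    print(f"{depth:>5} reads: {det:.0%} detected, {quant:.0%} quantifiable")
```

The doubling depths mirror the two-fold read-count axis of the saturation plots.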

SLIDE 29

Saturation analysis

ribo-depletion RNAseq - % of GENCODE12

[Figure: % of GENCODE12 genes vs. number of reads (125 000 to 4 096 000 000, two-fold steps); series: MAQC A - detection, MAQC B - detection]

SLIDE 30

Saturation analysis

ribo-depletion RNAseq - % of GENCODE12

[Figure: % of GENCODE12 genes vs. number of reads (125 000 to 4 096 000 000, two-fold steps); series: MAQC A - detection, MAQC B - detection]

SLIDE 31

Saturation analysis

ribo-depletion RNAseq - % of GENCODE12

[Figure: % of GENCODE12 genes vs. number of reads (125 000 to 4 096 000 000, two-fold steps); series: MAQC A - detection, MAQC B - detection, MAQC A - quantification, MAQC B - quantification]

SLIDE 32

Saturation analysis

poly-A RNAseq - % of GENCODE12

[Figure: % of GENCODE12 genes vs. number of reads (125 000 to 4 096 000 000, two-fold steps); series: MAQC A - detection, MAQC B - detection, MAQC A - quantification, MAQC B - quantification]

SLIDE 33

Saturation analysis

ribo-depletion RNAseq for MAQC A - GENCODE12 vs PrimePCR

[Figure: % of genes vs. number of reads (125 000 to 4 096 000 000, two-fold steps); series: GENCODE12 detection, PrimePCR detection, GENCODE12 quantification, PrimePCR quantification]

SLIDE 34

Saturation analysis

MAQC A - % of PrimePCR

[Figure: % of PrimePCR genes vs. number of reads (125 000 to 4 096 000 000, two-fold steps); series: ribo-depletion RNAseq, poly-A RNAseq and qPCR, each for detection and quantification]

SLIDE 35

Confirmation rate of novel junctions

junction prediction | junctions | confirmed | confirmation rate
multiple algorithms (Cstar + Magic + Subread) | 136 | 136 | 100%
single algorithm | 24 | 20 | 83%

  • novel exon
    • one of the primers in the novel exon
  • novel junction
    • one of the primers overlapping the novel junction, with ≥ 5 bases on either side of the junction
  • size analysis to confirm the expected size for novel transcripts
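
The junction criterion above (a primer overlapping the novel junction by at least 5 bases on either side) reduces to a simple coordinate test. A hypothetical Python sketch, assuming 0-based, half-open transcript coordinates with the junction placed at the first base of the downstream exon:

```python
def spans_junction(primer_start: int, primer_end: int, junction: int,
                   min_overhang: int = 5) -> bool:
    """True if the primer covers the junction with >= min_overhang bases per side.

    The primer occupies [primer_start, primer_end) on the transcript and the
    junction lies between positions junction - 1 (upstream exon) and junction
    (downstream exon).
    """
    upstream = junction - primer_start    # primer bases on the upstream exon
    downstream = primer_end - junction    # primer bases on the downstream exon
    return upstream >= min_overhang and downstream >= min_overhang


# A 20-nt primer at positions 100-119
print(spans_junction(100, 120, 110))  # True: 10 + 10 bases
print(spans_junction(100, 120, 103))  # False: only 3 bases upstream
```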
SLIDE 36

Conclusions - I

  • Assay design and in silico verification
    • Transcript coverage
    • SNPs and secondary structures
    • Specificity prediction
  • Empirical assay validation
    • Efficiency in 90-110% range
    • Stringent specificity analysis by massively parallel amplicon sequencing
  • validated assays for human, mouse & rat coding genes

PrimePCR

SLIDE 37

Conclusions - II

  • qPCR-based transcriptome profiling
  • samples from the MAQC/SEQC study
  • qPCR data as benchmark for the evaluation of RNAseq
  • qPCR benefits: high sensitivity and large dynamic range
  • good correlation with RNAseq results
  • for individual genes, RNAseq with ≤ 100 M reads gives lower sensitivity than qPCR
  • the majority of novel junctions identified by RNAseq can be confirmed by qPCR

SLIDE 38