Digital PCR for copy number analysis
Jo Vandesompele, PhD
Biogazelle CSO, UGent professor
EMBL Advanced Course Digital PCR, Heidelberg, Germany
October 22, 2015
Acknowledgements (A-Z)
Lieven Clement, Els Goetghebeur, Bart Jacobs, Peter Pipelers, Olivier Thas, Matthijs Vynck
Steve Lefever, Björn Menten, Katrien Vanderheyden, Kimberly Verniers, Nurten Yigit
Ariane De Ganck, Nele Nijs
Xavier Alba, Jen Berman, Frank Bizouarn, Viresh Pattel, Svilen Tzonev
Agenda
- introduction
- experiment design
- power analysis
- sensitivity vs. inhibition vs. availability of input
- CNV use cases
- advanced data-analysis
- droplet classification
- combining replicates & multigene normalization
- tips & tricks
Full-text papers are available on the Biogazelle website
http://www.biogazelle.com > Knowledge center > publications
Biogazelle blog on dPCR vs. qPCR
http://www.biogazelle.com/knowledge-center/blog
Digital PCR is emerging as the gold standard method for CNV analysis
- Biogazelle is a reference lab for Bio-Rad’s QX100/200 droplet digital PCR technology
- Scalable precision and relative sensitivity (needle in the haystack): "more is better"
- High accuracy (without calibration)
- Excels in quantification of small differences and rare events
Application domains
- in principle, any nucleic acid quantification study (within cost/throughput constraints)
- focus on those areas where dPCR excels
- small differences
- CNV analysis (high copy number range, transgene stability testing, cell-free DNA (NIPT, oncogene amplification))
- gene expression (microRNA, splice variants)
- rare events
- pathogens (e.g. viral load in body fluids such as urine)
- mutant cancer cells (tissue, circulating cells or cell-free DNA)
- circulating RNA biomarkers (cell-free RNA)
dMIQE guidelines for digital PCR
- Clinical Chemistry, 2013
- co-authored by Biogazelle founders
dMIQE guidelines have 3 goals
1. Design, perform, and report dPCR experiments that have greater scientific integrity
2. Facilitate replication of published experiments adhering to the guidelines
3. Provide critical information that allows reviewers and editors to assess the technical quality of manuscripts
Power analysis is a crucial aspect of experiment design
- Ensure a proper setup to detect a true difference with statistical significance
- Often ignored
- Limitations of dPCR power analysis in the literature
- no or few details on the methods
- no incorporation of replicate variability (instead, reactions are (naively) pooled over replicates)
- not all variables taken into account (e.g. replicates, fraction of negative droplets, …)
- use of meta-analysis methods (instead of a dedicated statistical method)
Digital PCR power analysis is a function of
- true difference you want to detect
- number of partitions
- fraction of negative partitions
- number of replicates
- significance level α (type I error or false positive rate, typically 5%)
- e.g. 97% power to detect a 10% difference in copy number using 3 replicate reactions of 14,000 partitions each, with 30% negative partitions
- only 53% power for a 5% difference under the same conditions
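The slides list the inputs of a dPCR power calculation but not the model behind it. As a rough illustration only (this is not the GLMM-based method of Vynck et al. used by the interactive tool), the sketch below approximates power with a two-sided Wald test on the log concentration ratio of two conditions. It assumes Poisson partitioning, gives both conditions the same variance, and adds a hypothetical between-replicate SD (`between_rep_sd`, on the log scale) to show why replicate variability cannot be ignored.

```python
from math import log, sqrt
from statistics import NormalDist


def log_conc_variance(neg_fraction: float, n_partitions: int) -> float:
    """Delta-method variance of log(lambda_hat) for a single reaction.

    lambda_hat = -ln(negatives / partitions), so
    var(lambda_hat) ~ (1 - p0) / (N * p0) and var(log lambda_hat) ~ that / lambda^2.
    """
    lam = -log(neg_fraction)
    return (1.0 - neg_fraction) / (n_partitions * neg_fraction * lam ** 2)


def dpcr_power(rel_difference: float, n_partitions: int, neg_fraction: float,
               n_replicates: int, alpha: float = 0.05,
               between_rep_sd: float = 0.0) -> float:
    """Approximate power of a two-sided Wald test on the log concentration ratio
    of two conditions, each measured in n_replicates reactions.

    Simplifications (assumptions): both conditions get the variance of the first
    condition, and between_rep_sd is a hypothetical extra SD between replicates.
    """
    var_one_reaction = log_conc_variance(neg_fraction, n_partitions) + between_rep_sd ** 2
    # Mean over n replicates per condition, two independent conditions
    se_diff = sqrt(2.0 * var_one_reaction / n_replicates)
    z_effect = log(1.0 + rel_difference) / se_diff
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)


if __name__ == "__main__":
    for diff in (0.05, 0.10):
        p = dpcr_power(diff, n_partitions=14000, neg_fraction=0.30,
                       n_replicates=3, between_rep_sd=0.03)
        print(f"{diff:.0%} difference: power = {p:.2f}")
```

The value 0.03 for `between_rep_sd` is purely illustrative; with `between_rep_sd=0` the sketch reduces to naive pooling and will overstate power.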
Interactive tool to determine power in digital PCR experiments
- power for a given condition
- power as a function of number of replicates, fraction of negative partitions, number of partitions, or copy number difference
- optimal negative fraction (for maximal power) as a function of copy number difference
- Vynck et al., in preparation
http://vandesompelelab.ugent.be/power/
Power as a function of fraction of negative partitions
http://vandesompelelab.ugent.be/power/
- difference of 10%
- 14,000 partitions
- 3 replicates
Power as a function of number of replicates
http://vandesompelelab.ugent.be/power/
- difference of 10%
- 14,000 partitions
- 95% negatives
Power as a function of number of partitions
http://vandesompelelab.ugent.be/power/
- difference of 15%
- 1 replicate
- 30% negatives
What is determining the sensitivity of dPCR?
- Both qPCR and dPCR can detect a single molecule (precision is higher for dPCR at low concentrations)
- Sensitivity is therefore determined by the input amount of nucleic acids
- more cDNA to detect a low-abundance transcript (e.g. long non-coding RNA)
- more circulating cell-free DNA to detect a low-frequency mutation
Intended sensitivity | ng of DNA needed
10% | 0.229
1% | 2.286
0.1% | 22.857
0.01% | 228.571
0.001% | 2285.714
Assumptions: at least 5 positive droplets are needed for confident calling, the assay discriminates perfectly between wild type and mutant, and 14,000 droplets are recovered out of 20,000 formed.
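The table follows from simple arithmetic. A minimal sketch, assuming about 3.2 pg of DNA per haploid human genome (our assumption; it reproduces the table's values) and one mutant copy per positive droplet:

```python
def ng_dna_needed(sensitivity: float, min_positive: int = 5,
                  droplets_formed: int = 20000, droplets_recovered: int = 14000,
                  pg_per_haploid_genome: float = 3.2) -> float:
    """ng of genomic DNA to load so that a mutant present at `sensitivity`
    (mutant allele fraction) yields at least `min_positive` positive droplets.

    Assumes each mutant copy lands in its own droplet and a perfectly
    discriminating assay; 3.2 pg per haploid genome is an assumption.
    """
    alleles_analyzed = min_positive / sensitivity  # total alleles needed in recovered droplets
    alleles_loaded = alleles_analyzed * droplets_formed / droplets_recovered  # correct for droplet loss
    return alleles_loaded * pg_per_haploid_genome / 1000.0  # pg -> ng


if __name__ == "__main__":
    for s in (0.1, 0.01, 0.001, 0.0001, 0.00001):
        print(f"{s:.3%}: {ng_dna_needed(s):.3f} ng")
```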
Large dynamic range, high precision and accuracy
- Correlation between expected and measured concentrations in a gDNA dilution series (from 100,000 to 5 copies per reaction, i.e. 320 ng to 16 pg DNA)
[Figure: log10 measured vs. log10 expected concentration (copies/ddPCR reaction); linear fit y = 0.9781x + 0.0695, R² = 0.99877]
Unpurified digested genomic DNA inhibits ddPCR if > 30 v/v%
[Figure: log10 measured concentration (copies/reaction) vs. log10 gDNA input (v/v%), data points labeled 5, 7.5, 10, 15, 20, 25 and 30 v/v%; linear fit y = 1.143x + 3.224, R² = 0.990]
cDNA inhibits ddPCR if > 25 v/v%
- Influence of cDNA input amount (from 5 to 45 v/v%) on measured concentration
[Figure: log10 measured concentration (copies/reaction) vs. log10 cDNA input (v/v%), data points labeled 5, 10, 15, 20 and 25 v/v%; linear fit y = 0.921x + 3.306, R² = 0.999]
Case 1 – genetic characterization of cell banks
- Therapeutic protein production in the biopharmaceutical industry
- Transgene copy number influences expression level
- Need for a cell line that is genetically stable throughout the biopharmaceutical manufacturing process
- Genetic characterization of Master Cell Bank (MCB) and Working Cell Bank (WCB)
- Traditionally done by Southern blot analysis, which is laborious and time-consuming
- → qPCR method for transgene copy number determination
Case 1 – struggling with qPCR
- Transgene copy number analysis
- Limited accuracy at higher copy numbers
- Compensated by including more PCR replicates and calibrators
(D’haene et al., Methods, 2010)
- Pilot study: synthetic CN series (1-10 copies) measured with 16 qPCR replicates
- Resampling to investigate the impact of an increased number of replicates & calibrator samples
- Conclusion
- 8 qPCR replicates and 3 calibrator samples are required for CN analysis at increased copy numbers
- Still a relatively large deviation from the expected copy number in the proof of concept study
Case 1 – proof of concept 1
- Copy numbers from duplex assay – gene 1 (performed in triplicate)
- Observed normalized copy numbers agree closely with the expected integer copy numbers
[Figure: copy number for samples S1-S8; expected CN: 0, 0, 1, 2, 3, 4, 5, 5]
Case 1 – proof of concept 2
- Copy numbers from duplex assay – gene 2 (performed in triplicate)
- Deviation from the expected integer copy numbers for samples S3 and S4
[Figure: copy number for samples S1-S7; expected CN: 1, 1, 4, 4, 3, 0, 1]
Case 1 – getting integer copy numbers with ddPCR
- Copy numbers from duplex assay – gene 2 (XbaI restriction digest)
- A restriction digest is required to properly count linked loci (here: tandem repeats)
[Figure: copy number for samples S1-S7 after restriction digest; expected CN: 1, 1, 4, 4, 3, 0, 1]
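Why the digest matters can be shown with a toy simulation. The sketch below uses a hypothetical model: a single-copy reference always sits on its own fragment, while without digestion all tandem copies of the target stay on one fragment and therefore fall into the same droplet. Undigested, the droplets count fragments rather than copies, so a 4-copy tandem repeat reads as about 1 copy.

```python
import math
import random


def copies_per_droplet(counts: list) -> float:
    """Poisson estimate of mean copies per droplet from the negative fraction."""
    negatives = sum(1 for c in counts if c == 0)
    return -math.log(negatives / len(counts))


def simulate_tandem_cn(n_genomes: int, tandem_copies: int, n_droplets: int,
                       digested: bool, seed: int = 1) -> float:
    """Simulated target/reference copy-number ratio per haploid genome.

    Hypothetical model: the single-copy reference always sits on its own
    fragment; without digestion all tandem target copies stay on one fragment
    (one droplet), with digestion each copy is on a separate fragment.
    """
    rng = random.Random(seed)
    target = [0] * n_droplets
    reference = [0] * n_droplets
    for _ in range(n_genomes):
        reference[rng.randrange(n_droplets)] += 1
        if digested:
            # Each tandem copy partitions independently
            for _ in range(tandem_copies):
                target[rng.randrange(n_droplets)] += 1
        else:
            # All linked copies end up in the same droplet
            target[rng.randrange(n_droplets)] += tandem_copies
    return copies_per_droplet(target) / copies_per_droplet(reference)


if __name__ == "__main__":
    for digested in (False, True):
        cn = simulate_tandem_cn(10000, 4, 20000, digested)
        print(f"digested={digested}: estimated copies per genome = {cn:.2f}")
```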
Case 1 - ddPCR versus qPCR
- ddPCR has higher accuracy than qPCR
- 3.1× lower standard deviation on log2 copy numbers
- 2.3× smaller fold change between maximum and minimum copy number
- Fewer reactions required for ddPCR than for qPCR
- ddPCR requires no external standard or calibrator sample with known copy number
[Figure: copy number estimates, qPCR vs. ddPCR]
Case 1 – ddPCR based genetic characterization of cell banks
- Copy number
- 24 samples – WCB
- Duplex assay – gene 1
- Expected CN: 5
- Deviation from expected CN
- Average: 0.11
- Standard deviation: 0.078
[Figure: copy number and deviation from expected CN for samples 01_WCB to 24_WCB]
Case 1 – ddPCR based genetic characterization of cell banks
- ddPCR is very well suited for transgene copy number determination
- Genetic characterization of cell banks for therapeutic protein production
- Transgene copy number analysis in genetically modified (GM) crop research
- Transgenic animal models
- Remark: qPCR is the standard approach in the biopharmaceutical industry; adopting ddPCR will take some time
Case 2 – clinical genetics application
- Detection of chromosomal aneuploidies
- Proof of concept on post-natal samples
- Future: non-invasive prenatal testing (NIPT)
- Challenge: achieving the accuracy and precision required to quantify fetal copy numbers in prenatal samples, given the low level of fetal cfDNA in maternal blood (median fetal fraction of 10%)
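The size of the challenge follows from a mixing argument. A minimal sketch, assuming a diploid mother, a trisomic fetus and a known fetal fraction f: the cfDNA copy number of the affected chromosome is (1 - f)·2 + f·3, so at f = 10% the dosage rises by only 5%, the range where the power example quoted earlier (53% for a 5% difference) becomes limiting.

```python
def expected_dosage_ratio(fetal_fraction: float, fetal_cn: int = 3,
                          maternal_cn: int = 2) -> float:
    """Relative dosage of a locus in maternal plasma cfDNA versus a euploid sample.

    cfDNA is modeled as a mix of maternal (maternal_cn copies) and fetal
    (fetal_cn copies) genomes in proportion 1 - f : f.
    """
    mixed_cn = (1.0 - fetal_fraction) * maternal_cn + fetal_fraction * fetal_cn
    return mixed_cn / maternal_cn


if __name__ == "__main__":
    for f in (0.05, 0.10, 0.20):
        print(f"fetal fraction {f:.0%}: dosage ratio {expected_dosage_ratio(f):.3f}")
```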
Case 2 – assay design and validation
- Design of assays for a number of loci on chromosomes for which copy number variations are most often found
- Chromosome 21 (e.g. trisomy 21 or Down syndrome)
- Chromosome 13 (e.g. trisomy 13 or Patau syndrome)
- Chromosome 18 (e.g. trisomy 18 or Edwards syndrome)
- Chromosome X & Y (e.g. Turner syndrome)
- Empirical validation using qPCR
- Standard curve (dilution series) → efficiency QC
- Gel electrophoresis → specificity QC
- ddPCR
- Chromosome-specific assays (hydrolysis probes, FAM)
- Reference assay (RPP30, VIC)
- Gradient PCR → standard protocol is suitable
- gDNA dilution series
- CNV duplex – 3 replicates
Case 2 – copy numbers of control samples
[Figure: copy numbers (scale 0.5-2.5) per assay (A-13q, B-13q, A-18p, A-18q, B-18q, A-21q, B-21q, A-Xp, A-Xq, B-Xq, A-Yp, B-Yp) for control 1 (female), control 2 (male), control 3 (female) and control 4 (male)]
Case 2 – copy numbers of cases
[Figure: copy numbers (scale 0.5-3.5) per assay (as for the controls, plus C-21q) for case 5 (female, Turner), case 9 (male, trisomy 21) and case 18 (male, trisomy 18)]
Case 2 – proof of concept on post-natal samples
- ddPCR is great for copy number analysis in the majority of samples
- Non-integer copy numbers may be observed in difficult samples
- Accuracy and precision need to improve to allow NIPT, e.g. through
- ultrashort amplicons
- an improved cell-free DNA isolation method (300-1000 alleles from 2 ml of plasma)
- multigene normalization (also for gene expression!)
Case 2 – optimization experiment design
- Standard CNV protocol – duplex normalization
- Triplicate ddPCR reactions
- 14 duplex reactions
- Each reaction contains one locus of interest (FAM), normalized against a reference locus (VIC)
- Normalization against the reference locus copy number in the same reaction
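Duplex normalization can be sketched in a few lines. A minimal version, assuming Poisson partitioning and a diploid reference locus (e.g. RPP30): because both channels are read from the same accepted droplets, droplet volume cancels and the copy number is 2 × λ_target / λ_reference.

```python
from math import log


def copies_per_partition(n_negative: int, n_accepted: int) -> float:
    """Poisson estimate lambda = -ln(fraction of negative partitions)."""
    return -log(n_negative / n_accepted)


def duplex_copy_number(target_negative: int, reference_negative: int,
                       n_accepted: int, reference_copies: int = 2) -> float:
    """Copy number of the FAM target relative to the VIC reference in the same well.

    Both channels share the same accepted partitions, so partition volume cancels.
    reference_copies = 2 assumes a diploid reference locus (e.g. RPP30).
    """
    target = copies_per_partition(target_negative, n_accepted)
    reference = copies_per_partition(reference_negative, n_accepted)
    return reference_copies * target / reference
```

With hypothetical counts of 6,613 target-negative and 8,491 reference-negative droplets out of 14,000, `duplex_copy_number(6613, 8491, 14000)` returns roughly 3.0.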
Case 2 – optimization experiment design
- Improved CNV protocol – multigene normalization
- Triplicate ddPCR reactions
- 7 duplex reactions
- Each reaction contains a FAM-labeled assay and a HEX-labeled assay (HEX as an alternative to VIC; ZEN/Iowa Black double-quenched probes from IDT)
- No a priori selection of a reference gene locus
- Normalization against all other autosomal chromosomes with normal diploid copy number
geNorm - multigene normalization
- geNorm – cited more than 8000 times
Vandesompele et al., Genome Biology, 2002
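The slides do not spell out how the other autosomal loci are combined. A natural choice, in the spirit of geNorm, is to divide each locus by the geometric mean of the concentrations of all other loci assumed to be diploid; the sketch below does exactly that. The function name and the locus labels in the usage line are hypothetical.

```python
from statistics import geometric_mean


def multigene_copy_numbers(conc: dict, diploid_refs: list, ploidy: int = 2) -> dict:
    """Copy number per locus, normalized to the geometric mean of the other
    reference loci (assumed diploid), excluding the locus itself.

    conc maps locus name -> copies per partition (lambda).
    """
    result = {}
    for locus, lam in conc.items():
        refs = [conc[r] for r in diploid_refs if r != locus]
        if not refs:
            raise ValueError(f"no reference loci left to normalize {locus}")
        result[locus] = ploidy * lam / geometric_mean(refs)
    return result


if __name__ == "__main__":
    print(multigene_copy_numbers({"13q": 0.5, "18q": 0.5, "21q": 0.75}, ["13q", "18q"]))
```

Using a geometric rather than arithmetic mean keeps the normalization factor on the ratio scale, so one locus with a slightly high count does not dominate.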
Case 2 – multigene normalization
- Average deviation from integer copy numbers for different normalization strategies
[Figure: deviation from integer CN per sample (cases 5, 6, 9, 16, 19, 20; controls 1-4), multigene normalization vs. RPP30 normalization]
Normalization | Average deviation from integer CN | SD
multigene normalization | 0.015 | 0.008
RPP30 normalization | 0.037 | 0.025
Case 2 – optimization experiment design
- Results show that normalization using other autosomes improves the accuracy of copy numbers
- Normalization based on absolute autosomal counts reduces running cost by 50% (7 instead of 14 duplex reactions)
Advanced digital PCR data-analysis
- Vynck et al., submitted
- GLMM framework (R and Shiny web app)
- handles replicate wells
- multiple reference gene normalization
- automatic selection and application of stable reference genes
[Figure: 20 samples, 3 replicates each, ~14,000 droplets, negative fraction 80-90%, 95% CI]
Results from oncogene detection in cell-free DNA from plasma
- in 8/10 samples, there was perfect agreement on oncogene amplification status
- in 2/10 samples, there was no agreement
- fresh frozen tumor DNA is only marginally elevated (tumor heterogeneity)
- tumor DNA: 2.068 (95% CI 2.017-2.121) → elevated
- cfDNA: 2.009 (95% CI 1.933-2.089) → normal
Comparison of plasma cfDNA and fresh frozen tumor DNA
Narrower CIs with proper statistical processing of replicates
[Figure: 95% CIs, meta-analysis vs. GLMM, on a log2 axis from 0.25 to 1024]
Narrower CIs with GLMM statistical processing of replicates
[Figure: oncogene:reference gene ratio (axis 1-4, 3:4 copies marked), meta-analysis vs. GLMM; (tumor) cfDNA without evidence of oncogene amplification vs. cfDNA with signs of oncogene amplification (p < 0.05)]
- Jacobs et al., BMC Bioinformatics, 2014
Partition misclassification has the largest impact on accuracy and precision
Interactive tool to inspect sources of variance on absolute quantification
http://users.ugent.be/~bkjacobs/dPCR_VarComp/index.html
- Stochastic clustering approach that matches intuition
- Using the raw data from the QX100
- Multistep approach
- cluster center location (expectation maximization)
- remove the rotation
- univariate projection on each channel
- robustly fit a normal null distribution on the negative peak
- calculate, for each droplet, the posterior probability of being negative with respect to that channel
- combine both channels
Development of a framework for objective partition classification
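To make the "posterior probability of being negative" step concrete, here is a deliberately simplified single-channel stand-in: a two-component Gaussian mixture fitted by expectation maximization, whose responsibility for the lower-amplitude component plays the role of the red/black density ratio. It does not implement the robust null fit, rotation removal or channel combination of Jacobs et al., and it handles rain poorly; all function names are hypothetical.

```python
import math


def _log_normal_pdf(x: float, mu: float, sd: float) -> float:
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)


def _sigmoid(z: float) -> float:
    # Numerically stable 1 / (1 + exp(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def posterior_negative(amplitudes: list, n_iter: int = 200) -> list:
    """P(negative | amplitude) per droplet from a two-component Gaussian mixture (EM)."""
    xs = sorted(amplitudes)
    n = len(xs)
    # Initialise the two cluster centres at the lower and upper quartiles
    mu = [xs[n // 4], xs[(3 * n) // 4]]
    sd = [max((xs[-1] - xs[0]) / 4.0, 1e-6)] * 2
    w = [0.5, 0.5]
    resp = [0.5] * n
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each droplet (log scale for stability)
        resp = [
            _sigmoid((math.log(w[0]) + _log_normal_pdf(x, mu[0], sd[0]))
                     - (math.log(w[1]) + _log_normal_pdf(x, mu[1], sd[1])))
            for x in amplitudes
        ]
        # M-step: update weights, means and SDs
        n0 = sum(resp)
        n1 = n - n0
        if n0 < 1e-9 or n1 < 1e-9:
            break  # one component collapsed; keep the last estimates
        mu[0] = sum(r * x for r, x in zip(resp, amplitudes)) / n0
        mu[1] = sum((1 - r) * x for r, x in zip(resp, amplitudes)) / n1
        sd[0] = max(math.sqrt(sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, amplitudes)) / n0), 1e-6)
        sd[1] = max(math.sqrt(sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, amplitudes)) / n1), 1e-6)
        w = [n0 / n, n1 / n]
    # Negatives are the lower-amplitude component
    if mu[0] > mu[1]:
        resp = [1.0 - r for r in resp]
    return resp
```

Droplets would then be labeled negative when the posterior exceeds 0.5; the slides instead combine the per-channel posteriors and label clusters by maximum probability.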
Find cluster centers and remove rotation
Fit the null distribution and calculate posterior probability of the negatives
[Figure panels: no rain; rain]
- red = fitted distribution of the negatives
- black = entire distribution
- probability that a droplet is negative = red/black density ratio at the droplet's projected position
Combine channels and label clusters based on max probability
Gene copy number quantification on digested high-quality DNA
Inhibition due to cDNA carryover
Oncogene amplification in cfDNA
Single channel data for low concentration target
- better handling of outlier droplets with lower-than-negative amplitude (deviating droplet volumes?)
- use the combined estimated distribution of no-template control reactions instead of a theoretical normal distribution
Work in progress
General conclusions (1)
- ddPCR is a great tool for copy number analysis
- no need for reference sample with known copy number
- better accuracy and precision compared to qPCR
- Points of attention
- restriction digest is required to quantify linked loci (e.g. tandem repeats)
- Remaining challenges
- non-integer copy numbers for difficult samples
- further improve accuracy and precision to meet NIPT requirements (for instance, with smaller amplicons)
General conclusions (2)
- Power analysis is important (and easy)
- interactive tool
- Mathematical framework for combining replicates, selecting reference genes, and multigene normalization
- latent variable, complementary log-log link, GLMM
- Vynck et al., submitted
- Statistical framework for automated (objective) droplet classification
- Jacobs et al., work in progress
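Why a complementary log-log link fits dPCR: under Poisson partitioning a partition is positive with probability 1 - exp(-λ), so cloglog(P(positive)) = log λ exactly. Log concentrations, and hence log copy-number ratios, then become linear on the link scale, which is what lets a GLMM add sample, target and replicate effects. The sketch shows only the link and its inverse; the GLMM itself (random replicate effects, reference gene selection) is not reproduced, and the positive fractions in the usage lines are hypothetical.

```python
import math


def cloglog(p_positive: float) -> float:
    """Complementary log-log link: log(-log(1 - p)) = log(lambda) under Poisson partitioning."""
    return math.log(-math.log(1.0 - p_positive))


def inverse_cloglog(eta: float) -> float:
    """Inverse link: P(positive) = 1 - exp(-exp(eta)), with eta = log(lambda)."""
    return 1.0 - math.exp(-math.exp(eta))


if __name__ == "__main__":
    # Hypothetical positive fractions for a target and a reference in one well
    p_target, p_reference = 0.6, 0.4
    log_ratio = cloglog(p_target) - cloglog(p_reference)
    print(f"target/reference concentration ratio: {math.exp(log_ratio):.3f}")
```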
Tips & tricks
Template input (1)
- aim for ~1 copy per droplet (CPD); precision is highest at 1.59 CPD
- range of 1-100,000 copies per 20 µl ddPCR reaction (0.00005-5 CPD)
[Figure: 95% confidence interval (as a fraction, 0.05-0.15) vs. fraction of positive droplets, for 1 well (20,000 droplets) and 3 wells merged]
Fraction of positive droplets → copies per droplet: 0.1 → 0.11, 0.2 → 0.22, 0.3 → 0.36, 0.4 → 0.51, 0.5 → 0.69, 0.6 → 0.92, 0.7 → 1.20, 0.8 → 1.61, 0.9 → 2.30
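Both numbers on this slide follow from Poisson statistics. Copies per droplet are λ = -ln(1 - fraction positive), and by the delta method var(λ̂) ≈ (e^λ - 1)/N, so the relative CV is proportional to sqrt(e^λ - 1)/λ. Minimizing it gives (λ - 2)e^λ + 2 = 0, i.e. λ ≈ 1.59 regardless of N. The sketch below checks this numerically with a golden-section search; it ignores droplet volume variation and pipetting error.

```python
import math


def cpd_from_positive_fraction(p_positive: float) -> float:
    """Poisson: copies per droplet = -ln(1 - fraction of positive droplets)."""
    return -math.log(1.0 - p_positive)


def relative_cv(cpd: float, n_droplets: int = 20000) -> float:
    """Delta-method CV of the estimated copies per droplet.

    var(lambda_hat) ~ (exp(lambda) - 1) / N for N droplets.
    """
    return math.sqrt((math.exp(cpd) - 1.0) / n_droplets) / cpd


def optimal_cpd(lo: float = 0.05, hi: float = 5.0, tol: float = 1e-6) -> float:
    """Golden-section search for the CPD that minimises the relative CV."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if relative_cv(c) < relative_cv(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0


if __name__ == "__main__":
    print(f"optimal copies per droplet: {optimal_cpd():.2f}")
    for p in (0.1, 0.5, 0.9):
        print(f"{p:.1f} positive -> {cpd_from_positive_fraction(p):.2f} CPD")
```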
Template input (2)
- maximum 25 v/v% unpurified digested gDNA or undiluted cDNA to prevent inhibition (test using your own reagents)
- DNA digest is required for gene copy number analysis, especially for linked loci (not required for FFPE and cell-free DNA, which are already fragmented)
- integrity of DNA/RNA is as important for dPCR as for qPCR
- Vermeulen et al., Nucleic Acids Research, 2011
ddPCR assay design guidelines
- in-house primerXL design pipeline
- Primer3-based
- avoid SNPs (Lefever et al., Clinical Chemistry, 2013)
- avoid secondary structures (UNAFold)
- assess specificity (BiSearch / Bowtie)
- target: FAM-IBFQ, reference HEX-IBFQ
- amplicon length <70 nt if possible
- primer Tm: 61-63 °C
- probe Tm: 64-68 °C (65 opt)
- probe length: 14-25 nt (18 opt)
- HaeIII-compatible amplicons (no HaeIII site within the amplicon)
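The numeric guidelines above lend themselves to an automated check. The sketch below is a hypothetical helper, not part of primerXL: Tm values are supplied by the caller rather than computed, and "HaeIII-compatible" is interpreted as the amplicon containing no HaeIII recognition site (GGCC).

```python
def check_assay(amplicon: str, primer_tms: tuple, probe_tm: float,
                probe_length: int) -> list:
    """Return the design guidelines that a candidate ddPCR assay violates.

    Tm values are supplied by the caller (e.g. from the design tool); this
    function does not compute them.
    """
    issues = []
    seq = amplicon.upper()
    if len(seq) >= 70:
        issues.append(f"amplicon is {len(seq)} nt (aim for < 70 nt)")
    for tm in primer_tms:
        if not 61.0 <= tm <= 63.0:
            issues.append(f"primer Tm {tm} °C outside 61-63 °C")
    if not 64.0 <= probe_tm <= 68.0:
        issues.append(f"probe Tm {probe_tm} °C outside 64-68 °C")
    if not 14 <= probe_length <= 25:
        issues.append(f"probe length {probe_length} nt outside 14-25 nt")
    # HaeIII cuts GG^CC; the site is palindromic, so scanning one strand suffices
    if "GGCC" in seq:
        issues.append("amplicon contains a HaeIII site (GGCC)")
    return issues
```

An empty list means the candidate passes these checks; SNP avoidance, secondary structure and specificity still need the dedicated tools listed above.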
Separation of positive and negative droplets depends on amplicon and probe length
- for amplicons >100 bp, positive droplet intensities drop
- negative droplet intensities rise as probe length increases (> 25 nt)
Gradient PCR allows selection of the optimal annealing temperature
- gradient from 55-65 °C
- optimal Ta, specificity check
Duplex test validation
- same quantification result as in singleplex
- orthogonal droplet clusters in the 2D plot
- orthogonality of the duplex assay can be improved by
- Tm matching between target and reference assay
- Droplet PCR Supermix (#186-3023) (more reaction resources)
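The first validation criterion, the same quantification result as in singleplex, can be checked per well with Poisson confidence intervals. A minimal sketch, assuming a delta-method standard error on log λ for each well and ignoring between-replicate variability (the counts in the usage lines are hypothetical): the duplex assay is considered consistent if the CI of the duplex/singleplex concentration ratio contains 1.

```python
import math
from statistics import NormalDist


def log_conc_and_se(n_negative: int, n_accepted: int) -> tuple:
    """log(lambda) and its delta-method standard error for one well."""
    p0 = n_negative / n_accepted
    lam = -math.log(p0)
    se_log = math.sqrt((1.0 - p0) / (n_accepted * p0)) / lam
    return math.log(lam), se_log


def duplex_matches_singleplex(single: tuple, duplex: tuple, alpha: float = 0.05) -> tuple:
    """single, duplex: (n_negative, n_accepted) for the same target.

    Returns (ratio, ci_low, ci_high, consistent) for the duplex/singleplex ratio.
    """
    l_single, se_single = log_conc_and_se(*single)
    l_duplex, se_duplex = log_conc_and_se(*duplex)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    diff = l_duplex - l_single
    se = math.sqrt(se_single ** 2 + se_duplex ** 2)
    lo, hi = math.exp(diff - z * se), math.exp(diff + z * se)
    return math.exp(diff), lo, hi, lo <= 1.0 <= hi


if __name__ == "__main__":
    print(duplex_matches_singleplex((8491, 14000), (8520, 14100)))
```

In practice several replicate wells per mode would be compared, since a single-well CI understates the real variability.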