Errors, biases and Quality control in Next Gen Sequencing
Dr David Humphreys
d.humphreys@victorchang.edu.au
- Lab scientist : Bioinformatician
- RNA biologist
- small RNAs (miRNA)
Victor Chang Cardiac Research Institute, Sydney, Australia
Errors/Biases:
[Figure: time line, 1994 to 2013; labels: Data points, HTS/NGS; 2009: "ME! ???"; 2013: "You???"]
Next generation sequencing:
Image source: Wikipedia
Anscombe F.J (1973) American Statistician
Sample preparation Library preparation Clonal amplification Sequencing Bioinformatics
Quantification Purity
(1) Awareness Community Literature Network; (2) QC considerations Time Cost
Gels Stains Absorbance Molarity Titrations Fluorescence CPU Cores Scripts Command line RAM Threads
Consumption Throughput
Genes Genome SNPs
Sensitivity/specificity Cumulative Error
http://www.nanodrop.com/Library/CVStech_17_11_FINAL.pdf
WARNING
Contaminants:
- 230nm: EDTA, carbohydrates, sodium acetate*, tris*
- 270nm: Phenol (plus at 230nm*)
- 280nm: DTT
* http://seqanswers.com/forums/showthread.php?t=21280
WARNING
enzymatic reactions
Ratios
- 260/280: 1.8 (DNA), 2.0 (RNA)
- 260/270: 1.2 – 1.3?
- 260/230: 2.0 – 2.2
(10 – 10,000ng/ul)
Solution: Re-precipitate/buffer exchange
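The ratio guidelines above can be turned into a quick screening check. This is a minimal sketch, not a validated QC rule: the function name `purity_flags` and the 0.2 tolerance on the 260/280 ratio are assumptions made for illustration, while the target values (1.8 for DNA, 2.0 for RNA, 260/230 of at least 2.0) come from the slide.

```python
def purity_flags(a230, a260, a280, kind="DNA"):
    """Return absorbance ratios and warnings for a nucleic acid sample."""
    # Targets from the slide: 260/280 of 1.8 (DNA) or 2.0 (RNA), 260/230 >= 2.0.
    target_280 = 1.8 if kind.upper() == "DNA" else 2.0
    tolerance = 0.2  # assumed, illustrative cut-off (not from the slide)

    r280 = a260 / a280
    r230 = a260 / a230

    warnings = []
    if r280 < target_280 - tolerance:
        warnings.append("low 260/280")
    if r230 < 2.0:
        # Consistent with contaminants that absorb at 230nm (EDTA, carbohydrates, ...)
        warnings.append("low 260/230")
    return {"260/280": round(r280, 2), "260/230": round(r230, 2), "warnings": warnings}


clean_rna = purity_flags(a230=0.5, a260=1.0, a280=0.5, kind="RNA")
salty_dna = purity_flags(a230=1.0, a260=1.0, a280=0.5, kind="DNA")
print(clean_rna)
print(salty_dna)
```

A flagged sample is a candidate for re-precipitation or buffer exchange before it goes into enzymatic reactions.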
WARNING
WARNING
* RNA integrity number (RIN)
Schroeder et al (2006) BMC Mol Bio.
Chip / Application / Quantitative range
- Total RNA*: 5-500 ng/ul; mRNA: 25-250 ng/ul
- Total RNA*: 50-5000 pg/ul; mRNA: 250-5000 pg/ul
- dsDNA: 5-500 pg/ul (50-7000bp)
Criteria / RNA / DNA / QC
- High complexity. RNA: Trizol vs column based. DNA: Phenol:chloroform vs column based. QC: qPCR, Northern blotting??
- High quality. RNA: RIN > 8. DNA: Unfragmented. QC: Bioanalyzer, gel electrophoresis
- Accurate Quantification. RNA: pg - ng - ug. DNA: pg - ng - ug. QC: Qubit/Nanodrop, Agilent Bioanalyser
- Contamination (salts, organics). RNA: A260/280 = 2, A260/230 >2. DNA: A260/280 = 1.8, A260/230 >2. QC: Qubit, Nanodrop
- Enrichment. RNA: Deplete ribosomes. DNA: Exome capture. QC: qPCR/Agilent
- Fragment. Uniform peaks better than broad. QC: Agilent
Sample preparation Library preparation Clonal amplification Sequencing Bioinformatics
1) Library manual as provided by the manufacturer
2) http://nxseq.bitesizebio.com/articles/
Sample preparation Library preparation Clonal amplification Sequencing Bioinformatics
miRNAs:
decreased in cells grown at low confluence/loss of adhesion
Library prep + Sequence
Cell number (L) = 200,000 (H) = 800,000
1mL Trizol
Kim et al. (2011) Molecular Cell 43, 1005-1014
Cell number Low = 500,000 High = 800,000 Ratio 141/200c
Kim et al. (2012) Molecular Cell 46, 893-895
Low GC content, secondary structure
Ligation biases
Sample preparation Library preparation Clonal amplification Sequencing Bioinformatics
Hafner et al. (2011) “RNA-ligase-dependent biases in miRNA ….. cDNA libraries” RNA 17(9), 1-16
Input:
Reverse Transcription bias
Not a significant source of sequence-specific biases
Pool A = Equimolar
Pool B = 10-fold serial dilution
PCR bias
Dilute 1:10000 10 PCR cycles
5 x WARNING
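Why PCR cycles matter for bias can be illustrated with simple arithmetic: if two templates amplify with slightly different per-cycle efficiencies, their ratio drifts exponentially with cycle number. The sketch below is illustrative only; the function name `relative_bias` and the efficiency values are hypothetical and are not measurements from the experiment on the slide.

```python
def relative_bias(eff_a, eff_b, cycles):
    """Fold change in the A:B template ratio after `cycles` of PCR.

    eff_a and eff_b are per-cycle efficiencies between 0 and 1,
    where 1.0 means perfect doubling every cycle.
    """
    return ((1 + eff_a) / (1 + eff_b)) ** cycles


# Hypothetical efficiencies: template A doubles perfectly, template B at 90%.
for n_cycles in (10, 20, 50):
    print(n_cycles, round(relative_bias(1.0, 0.9, n_cycles), 2))
```

With these assumed values, ten cycles already skew an equimolar pair by roughly 1.7-fold, and the skew compounds with every additional round of amplification.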
Sample preparation Library preparation Clonal amplification Sequencing Bioinformatics
Sequencing platforms
Ion Torrent, Illumina, Complete Genomics
Kapa Biosystems
Standard reagents
Flowcell/lane variations do occur, but they are smaller than those observed between platforms.
Sample preparation Library preparation Clonal amplification Sequencing Bioinformatics
Raw sequencing files
Assessing sequence quality Align (pipeline) Assessing alignment data
Quality values: Phred score
Numerical: 0 to 40 (scale marks at 10, 20, 30, 40)
Phred+33: !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHI
Phred+64: @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefgh
File types: fastq, csfasta, qual, fasta, xsq
Sequence: A T C G N/. Header: Coordinates/other
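The two quality encodings differ only in the ASCII offset, so decoding is a one-line subtraction. The sketch below shows the conversion, the standard Phred relationship Q = -10 log10(P), and a simple offset guess. The function names are made up for illustration, and `guess_offset` is only a heuristic: a Phred+33 file containing nothing but high scores would be misread as Phred+64.

```python
def decode_quality(qual, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def guess_offset(qual_strings):
    """Heuristic: characters below '@' (ASCII 64) only occur in Phred+33 data."""
    lowest = min(min(q) for q in qual_strings if q)
    return 33 if ord(lowest) < 64 else 64


print(decode_quality("!5I"))        # Phred+33 characters
print(decode_quality("@Th", 64))    # the same scores written as Phred+64
print(error_probability(20))        # chance that a Q20 base call is wrong
```

Decoding a file with the wrong offset shifts every score by 31, which is worth ruling out before trusting any quality plot.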
VCCRI
Raw sequencing files Assessing sequence quality Align (pipeline) Assessing alignment data
Raw sequencing files Align (pipeline) Assessing alignment data
[Figure: quality score box plot showing mean, median, 25%-75% and 10%-90% ranges; quality bands: Very good, Reasonable, Poor]
Identify adaptors and primers
Raw sequencing files
Align (pipeline) Assessing alignment data
- Identifies if a subset is of low quality
- May identify cycles that are unreliable
- Helps assess raw data files prior to mapping
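A per-cycle quality summary of this kind can be computed directly from the quality strings of a FASTQ file. The following is a minimal sketch, assuming Phred+33 encoding and an arbitrary median cut-off of 20 for calling a cycle unreliable; the function names and the cut-off are illustrative assumptions, not the behaviour of any particular QC tool.

```python
def percentile(sorted_scores, pct):
    """Nearest-rank percentile (integer pct) of a sorted, non-empty list."""
    rank = max(1, (pct * len(sorted_scores) + 99) // 100)  # integer ceiling
    return sorted_scores[rank - 1]


def per_cycle_summary(qual_strings, offset=33):
    """Summarise Phred scores at each cycle (read position, 1-based)."""
    longest = max(len(q) for q in qual_strings)
    summary = []
    for i in range(longest):
        # Reads shorter than this cycle do not contribute to it.
        scores = sorted(ord(q[i]) - offset for q in qual_strings if len(q) > i)
        summary.append({
            "cycle": i + 1,
            "mean": sum(scores) / len(scores),
            "p10": percentile(scores, 10),
            "p25": percentile(scores, 25),
            "median": percentile(scores, 50),
            "p75": percentile(scores, 75),
            "p90": percentile(scores, 90),
        })
    return summary


def unreliable_cycles(summary, min_median=20):
    """Cycles whose median score falls below an assumed cut-off."""
    return [row["cycle"] for row in summary if row["median"] < min_median]


quals = ["II5", "II#", "II#"]  # toy quality strings, Phred+33
stats = per_cycle_summary(quals)
print(unreliable_cycles(stats))
```

In the toy input only the last cycle has low scores, so it is the only one reported; on real data, a run of flagged cycles at the 3' end is a common reason to trim before mapping.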
Reference
- Choose a suitable reference. Include mitochondrial sequence.
- Design a filter set to capture repeated sequences (rRNA, tRNA).
- Be aware of the default options.
Raw sequencing files Assessing sequence quality
Assessing alignment data
Different aligners can give different results.
"Benchmarking short sequence mapping tools", Hatem et al. (BMC Bioinformatics, 2013)
Raw sequencing files Assessing sequence quality Align (pipeline)
Pass Questionable Alignment feature statistics
Test Filter raw data
Important
Include a filter % mapped % mapped at what length
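The two alignment statistics named here, % mapped and % mapped at each read length, are straightforward to tabulate once each read is reduced to a length and a mapped/unmapped flag. The sketch below assumes that reduction has already been done by the reader's own pipeline; the function name `mapped_by_length` and the input format are hypothetical.

```python
from collections import Counter


def mapped_by_length(reads):
    """reads: iterable of (read_length, is_mapped) pairs.

    Returns the overall % mapped and a dict of % mapped per read length.
    """
    total = Counter()
    mapped = Counter()
    for length, is_mapped in reads:
        total[length] += 1
        if is_mapped:
            mapped[length] += 1

    n_reads = sum(total.values())
    overall = 100 * sum(mapped.values()) / n_reads if n_reads else 0.0
    by_length = {length: 100 * mapped[length] / total[length]
                 for length in sorted(total)}
    return overall, by_length


# Toy input: three 22 nt reads (two mapped) and one unmapped 16 nt read.
overall, by_length = mapped_by_length([(22, True), (22, True), (22, False), (16, False)])
print(overall, by_length)
```

Breaking the mapping rate down by length helps separate a genuinely poor library from one dominated by short adaptor-derived or degraded fragments.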
Humphreys D.T., and Suter C.M. Nucleic Acids Research 2013. http://miRspring.victorchang.edu.au
i) 5’ isomiRs
ii) 3’ isomiRs
iii) Non-canonical
iv) Arm bias
v) miRNA length
vi) RNA editing
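The first two categories can be illustrated by comparing where a read starts and ends relative to the annotated mature miRNA on its precursor. This is a simplified sketch under stated assumptions: the function name `classify_isomir` is hypothetical, all four coordinates are assumed to use the same convention on the same precursor, and it does not reproduce how miRspring itself assigns categories.

```python
def classify_isomir(read_start, read_end, ref_start, ref_end):
    """Classify a read against the annotated mature miRNA coordinates."""
    five_prime_differs = read_start != ref_start
    three_prime_differs = read_end != ref_end

    if five_prime_differs and three_prime_differs:
        return "5' and 3' isomiR"
    if five_prime_differs:
        return "5' isomiR"  # the seed region shifts with the 5' end
    if three_prime_differs:
        return "3' isomiR"
    return "reference"


print(classify_isomir(10, 31, 10, 31))
print(classify_isomir(11, 31, 10, 31))
print(classify_isomir(10, 30, 10, 31))
```

5’ isomiRs deserve particular attention because a shifted 5’ end changes the seed sequence.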
miRNA clusters
Mono-cistronic Poly-cistronic
miRNA Seed analysis
miR-196a UAGGUAGUUUCCUGUUGUUGGG (seed: AGGUAGU)
let-7a UGAGGUAGUAGGUUGUAUAGUUU (seed: GAGGUAG)
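Seed extraction is simple enough to sketch: the seed is taken here as nucleotides 2-8 from the 5' end, which matches the two seeds shown on the slide. The function names are made up for illustration. The last line reflects one reading of the slide, namely that a let-7a read starting one nucleotide downstream carries the same seed as miR-196a.

```python
from collections import defaultdict


def seed(mirna_seq, start=2, end=8):
    """Return the seed: nucleotides start..end counted from the 5' end (1-based)."""
    return mirna_seq[start - 1:end]


def group_by_seed(mirnas):
    """Group miRNA names by shared seed sequence."""
    families = defaultdict(list)
    for name, seq in mirnas.items():
        families[seed(seq)].append(name)
    return dict(families)


mirnas = {
    "miR-196a": "UAGGUAGUUUCCUGUUGUUGGG",
    "let-7a": "UGAGGUAGUAGGUUGUAUAGUUU",
}
print(group_by_seed(mirnas))

# A let-7a read shifted by one nucleotide at the 5' end (a 5' isomiR):
print(seed(mirnas["let-7a"][1:]))
```

Grouping by seed after, rather than before, accounting for 5’ isomiRs can therefore change which family a read is counted in.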
Genomic Genomic
Sampling bias!
Tissue Atlas: Heart, Kidney, Liver, Lung, Ovary, Spleen, Testes, Thymus, Brain, Placenta
AGO IP: THP-1
ENCODE: HeLa S3, A549, Ag04450, Bj, Gm1287, H1hesc, HepG2, Huvec, K562, MCF7, NheK, Sknshra
Top 100 miRNAs typically:
miRspring provides a quick, easy way to analyse QC parameters of your data set
[Figure axis label: Centile Rank]
Example miRspring documents can be found at http://miRspring.victorchang.edu.au
Cath Suter, Paul Young, Rupert Shuttleworth, Diane Fatkin, Monique Ohanian, Djordje Djordjevic, Chris Hayward, Kavitha Muthiah, Richard Harvey, Mirana Ramialison, Ashley Waardenberg
IT: Timothy Kersten, Pardeep Dhiman
Thomas Preiss (VCCRI/ANU)
Matthias Hentze (EMBL)
Funding bodies: ARC, NHMRC, Viertel Charitable Foundation, Perpetual Trust, VCCRI