Errors, biases and Quality control in Next Gen Sequencing

Dr David Humphreys (d.humphreys@victorchang.edu.au)
Victor Chang Cardiac Research Institute, Sydney, Australia
- Lab scientist : Bioinformatician
- RNA biologist
- small RNAs (miRNA)

Testing hypotheses and theories
Errors/biases:
- Present in all experiments
- Be aware/informed
- Minimise
- Test
Next generation sequencing (HTS/NGS):
- Series of experiments
- Biases/errors accumulate!
[Figure: time line of data points marked 1994, 2009 and 2013, labelled "ME!" and "You???"]

Anscombe's Quartet
Anscombe F.J. (1973) American Statistician. Image source: Wikipedia
• Maths is a tool for analysis.
• If you rely on summary statistics alone, you can blindly ignore biases and errors in data sets.
  - The mean, stdev, variance and correlation of the four data sets are the same!
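
The point of the quartet can be checked numerically. The sketch below uses the four data sets as published by Anscombe (1973) and computes the summary statistics named on the slide; the helper names (`pearson`, `summarise`, `SUMMARIES`) are illustrative only and not part of the presentation.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet: sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def summarise(xs, ys):
    """Summary statistics that hide the differences between the four sets."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),  # sample variance
        "var_y": variance(ys),
        "r": pearson(xs, ys),
    }


SUMMARIES = {name: summarise(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four printed rows agree to about two decimal places, even though a plot of each set looks completely different. This is why the data should be plotted and inspected rather than summarised blindly.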

High Throughput Sequencing
Workflow: Sample preparation -> Library preparation -> Clonal amplification -> Sequencing -> Bioinformatics
Wet lab: Molarity, Fluorescence, Absorbance, Gels, Stains, Titrations, Quantification, Purity
Bioinformatics: Genes, Genome, SNPs, Cores, CPU, RAM, Threads, Scripts, Command line
Cumulative error
Challenges:
(1) Awareness: Community, Network, Literature
(2) QC considerations: Time, Cost, Throughput, Consumption, Sensitivity/specificity

Quantification: Nanodrop spectrophotometer
• Quick
• Consumes 1-2 ul sample
• Large dynamic range (10 - 10,000 ng/ul)
• Can identify contaminations
• Ratios:
  260/280 : 1.8 (DNA), 2.0 (RNA)
  260/270 : 1.2 - 1.3?
  260/230 : 2.0 - 2.2
Contaminants:
  230nm: EDTA, carbohydrates, sodium acetate *, tris *
  270nm: Phenol (plus at 230nm *)
  280nm: DTT
Solution: Re-precipitate/buffer exchange
! WARNING
• Careful of accuracy < 50 ng/ul
• Careful of concentrations > 1 ug/ul
• Does not assess quality!!
! WARNING
• Contaminants can impact on downstream enzymatic reactions
* http://seqanswers.com/forums/showthread.php?t=21280
http://www.nanodrop.com/Library/CVStech_17_11_FINAL.pdf
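
The ratio and concentration guidelines on this slide can be turned into a simple check. The following is a minimal sketch only: the function name `assess_purity` and the `tolerance` of 0.15 around the expected 260/280 ratio are assumptions made for illustration, not values from the presentation.

```python
def assess_purity(a260_280, a260_230, conc_ng_per_ul, nucleic_acid="DNA", tolerance=0.15):
    """Return a list of warnings for one spectrophotometer reading.

    Thresholds follow the slide: 260/280 of 1.8 (DNA) or 2.0 (RNA),
    260/230 of 2.0-2.2, and reduced accuracy below 50 ng/ul or above 1 ug/ul.
    The tolerance around the 260/280 target is a hypothetical choice.
    """
    expected_280 = {"DNA": 1.8, "RNA": 2.0}[nucleic_acid]
    warnings = []
    if abs(a260_280 - expected_280) > tolerance:
        warnings.append(f"260/280 = {a260_280} (expected about {expected_280} for {nucleic_acid})")
    if not 2.0 <= a260_230 <= 2.2:
        warnings.append(f"260/230 = {a260_230} (expected 2.0-2.2)")
    if conc_ng_per_ul < 50:
        warnings.append("concentration < 50 ng/ul: reading may be inaccurate")
    if conc_ng_per_ul > 1000:
        warnings.append("concentration > 1 ug/ul: reading may be inaccurate")
    return warnings


# Example readings (made-up numbers for illustration).
clean_dna = assess_purity(1.8, 2.1, 200, "DNA")
suspect_rna = assess_purity(1.5, 1.2, 20, "RNA")
print(clean_dna)
for message in suspect_rna:
    print("WARNING:", message)
```

A sample that raises warnings here would be a candidate for the re-precipitation or buffer exchange suggested on the slide. Passing the check says nothing about integrity, since the instrument does not assess quality.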

Quantification: Qubit fluorimeter
• More sensitive than Nanodrop
• Consumes small amount of sample
• Specific assays
! WARNING
• Known biases in quantifying ssRNA < 50 ng/ul
• Cannot quantitate ssDNA in presence of dsDNA

Quantification: Agilent Bioanalyzer
• Consumes small amount of sample
• Quantification
• Estimating nucleic acid size
Chip applications and quantitative ranges:
  Total RNA *          5-500 ng/ul
  mRNA                 25-250 ng/ul
  Total RNA *          50-5000 pg/ul
  mRNA                 250-5000 pg/ul
  dsDNA (50-7000bp)    5-500 pg/ul
* RNA integrity index (RIN), Schroeder et al (2006) BMC Mol Bio.
  - Use at least 50ng for meaningful RIN
! WARNING
• Each chip has a quantitative range
• Limitations on size range
• Sensitive to salts
• Not accurate quantitating broad smears

Sample Purification/Assessment/Processing
Criteria | RNA | DNA | QC
High complexity | Trizol vs column based | Phenol:chloroform vs column based | qPCR, Northern blotting??
High quality | RIN > 8 | Unfragmented | Bioanalyzer, gel electrophoresis
Accurate quantification | pg - ng - ug | pg - ng - ug | Qubit/Nanodrop, Agilent Bioanalyser
Contamination (salts, organics) | A260/280 = 2, A260/230 > 2 | A260/280 = 1.8, A260/230 > 2 | Qubit, Nanodrop
Enrichment | Deplete ribosomes | Exome capture | qPCR/Agilent
Fragment | Uniform peaks better than broad | | Agilent
1) Library manual as provided by the manufacturer
2) http://nxseq.bitesizebio.com/articles/
GOAL: to have a final sample with high complexity

Purification biases
Kim et al., (2011) Molecular Cell 43, 1005-1014
Kim et al., (2012) Molecular Cell 46, 893-895
Ratio 141/200c by cell number: Low = 500,000, High = 800,000 (1mL Trizol)
• Small RNA ppt with longer RNA
• Most susceptible: Low GC content, 2ndary structure
Library prep + Sequence by cell number: (L) = 200,000, (H) = 800,000
miRNAs -141, -29b, -21, -106b, -15a, -34a decreased in cells grown at low confluence/loss of adhesion

miRNA library biases
Hafner et al., (2011) "RNA-ligase-dependent biases in miRNA ….. cDNA libraries" RNA 17(9), 1-16
Input:
  Pool A = Equimolar
    - 770 synthetic miRNAs
    - 45 designed RNAs
  Pool B = 10 fold serial dilution
Ligation biases:
  - Enzyme
  - Temperature
  - Sequence
PCR bias: Dilute 1:10000, 5 x 10 PCR cycles
  - No appreciable distortion!
Reverse Transcription bias: Not a significant source of sequence specific biases
! WARNING
• Don't compare NGS data sets from different library preps
• Be consistent with incubation times/temperatures

Sequencing platforms
Ion Torrent, Illumina, Complete Genomics
[Figure labels: Kapa Biosystems; Standard reagents]
• Flowcell/lane variations do occur
• Smaller than those observed between platforms
References:
• Ross et al., Characterizing and measuring bias in sequence data. Genome Biology 2013
• Bragg et al., Shining a light on Dark sequencing characterising errors. PLoS Comp Biol 2013
• Loman et al., Performance comparison of benchtop HTS platforms. Nature Biotech 2012
• Quail et al., Tale of three NGS platforms. BMC Genomics, 2012
• Lam et al., Performance comparison of whole genome sequencing platforms. Nat Biotech 2012

Bioinformatics
Raw sequencing files -> Assessing sequence quality -> Align (pipeline) -> Assessing alignment data

The Basics: raw sequencing files
File types : fastq, csfasta, qual, fasta, xsq
Header : Coordinates/other
Sequence : A T C G N/.
Quality values : Phred score
Numerical : 0 10 20 30 40
Phred+33 : ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I
Phred+64 : @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h
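
The two encodings above differ only in the ASCII offset subtracted from each character. A minimal sketch of the decoding follows; the function names `decode_quality` and `error_probability` are illustrative, and the probability formula is the standard Phred definition, Q = -10 log10(P).

```python
def decode_quality(quality_string, offset=33):
    """Convert a quality string into Phred scores.

    offset is 33 for Phred+33 and 64 for Phred+64.
    """
    scores = [ord(ch) - offset for ch in quality_string]
    if any(score < 0 for score in scores):
        # A negative score means the string was encoded with a smaller offset.
        raise ValueError("character below the offset: wrong encoding assumed?")
    return scores


def error_probability(phred_score):
    """Probability that a base call is wrong, from Q = -10 * log10(P)."""
    return 10 ** (-phred_score / 10)


# The characters that mark 0, 10, 20, 30 and 40 in each encoding.
phred33_scores = decode_quality("!+5?I", offset=33)
phred64_scores = decode_quality("@JT^h", offset=64)
print(phred33_scores, phred64_scores)
print([error_probability(q) for q in (10, 20, 30, 40)])
```

Decoding a Phred+33 string with an offset of 64 usually yields negative scores, which is why the sketch raises an error instead of returning them. A string with no characters below `@` decodes without error under either offset, so the check cannot catch every mix-up.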

Assessing sequence quality
• Free java utility that can assess QC metrics of HTS data sets.
  - GUI
  - Command line
  - Can create html output
• fastq (standard, gzip, colorspace, casava), SAM/BAM
Not all data sets require full complement of green ticks!!

[Figure: quality box plot showing the 10%, 25%, median, 75% and 90% values and the mean, against bands of very good, reasonable and poor quality]
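
The values drawn in such a quality plot (10%, 25%, median, 75%, 90% and mean) can be computed per sequencing cycle from the quality strings. The sketch below assumes Phred+33 encoding and a nearest-rank percentile; the names `percentile` and `per_cycle_quality` are hypothetical, and a QC tool may use a different percentile method.

```python
from math import ceil
from statistics import mean, median


def percentile(sorted_values, fraction):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, ceil(fraction * len(sorted_values)))
    return sorted_values[rank - 1]


def per_cycle_quality(quality_strings, offset=33):
    """Summarise Phred scores at each cycle (read position) across reads."""
    cycles = {}
    for quality_string in quality_strings:
        for position, ch in enumerate(quality_string):
            cycles.setdefault(position, []).append(ord(ch) - offset)
    summary = []
    for position in sorted(cycles):
        values = sorted(cycles[position])
        summary.append({
            "cycle": position + 1,
            "p10": percentile(values, 0.10),
            "p25": percentile(values, 0.25),
            "median": median(values),
            "p75": percentile(values, 0.75),
            "p90": percentile(values, 0.90),
            "mean": mean(values),
        })
    return summary


# Three made-up reads whose quality drops towards the 3' end.
example_summary = per_cycle_quality(["IIII", "II5!", "I?5!"])
for row in example_summary:
    print(row)
```

A cycle whose median or lower quartile falls into the poor band is a candidate for trimming before alignment.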

Assessing sequence quality
• Identifies if subset of sequences have low quality
• May identify cycles that are unreliable
• Identify adaptors and primers
Helps assess raw data files prior to mapping:
- low quality data may cause incorrect alignments
- low quality data may incorrectly call variations
- Sequence with trailing adaptor sequences will not map
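
Because reads with trailing adaptor sequence will not map, the adaptor is normally removed before alignment. The following is a deliberately simple sketch of exact-match 3' adaptor trimming; the function name, the `min_overlap` of 3 and the example adaptor are assumptions for illustration, and dedicated trimming tools additionally tolerate sequencing errors.

```python
# Example adaptor only: substitute the 3' adaptor used in your own library prep.
EXAMPLE_ADAPTOR = "TGGAATTCTCGGGTGCCAAGG"


def trim_3prime_adaptor(read, adaptor=EXAMPLE_ADAPTOR, min_overlap=3):
    """Remove a 3' adaptor from a read using exact matching only.

    First looks for the complete adaptor anywhere in the read, then for a
    partial adaptor (its first k bases, k >= min_overlap) at the read's end.
    """
    full_hit = read.find(adaptor)
    if full_hit != -1:
        return read[:full_hit]
    # Try the longest possible partial adaptor first.
    for k in range(min(len(adaptor), len(read)), min_overlap - 1, -1):
        if read.endswith(adaptor[:k]):
            return read[:-k]
    return read


insert = "TAGCTTATCAGACTGATGTTGA"  # a 22 nt small RNA insert (DNA alphabet)
full_case = trim_3prime_adaptor(insert + EXAMPLE_ADAPTOR + "AAAA")
partial_case = trim_3prime_adaptor(insert + EXAMPLE_ADAPTOR[:10])
untouched = trim_3prime_adaptor("ACGTACGTAC")
print(full_case, partial_case, untouched)
```

A small `min_overlap` removes more adaptor remnants but also trims a few genuine bases by chance, so the value is a trade-off rather than a fixed rule.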

Aligners
• Be aware of the default options
  - Accepted errors
  - Multimappers
• Different aligners can give different results.
  Benchmarking short sequence mapping tools, Hatem et al (BMC Bioinformatics, 2013)
Reference
• Choose a suitable reference.
• Include mitochondrial sequence
• Design a filter set to capture repeated sequences (rRNA, tRNA)

Assessing alignment data
Mapping statistics (include a filter):
- % mapped
- % mapped at what length
Pass -> Alignment feature statistics:
- Coverage
- Expression
- Discovery
Questionable -> Filter raw data, then test:
- Filter sequence
- Trim quality
! Important
• Know your mapping statistics
• Know what to expect from your data sets
• Test on existing data set
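
The two mapping statistics named above (% mapped, and % mapped at each read length) are straightforward to tabulate once each read is known to be mapped or not. This sketch assumes the alignment has already been reduced to `(read_length, is_mapped)` pairs; that input format and the function name `mapping_statistics` are hypothetical, and parsing an actual SAM/BAM file is left out.

```python
from collections import defaultdict


def mapping_statistics(reads):
    """Return (overall % mapped, {read length: % mapped}).

    reads is an iterable of (read_length, is_mapped) pairs.
    """
    totals = defaultdict(int)
    mapped = defaultdict(int)
    for length, is_mapped in reads:
        totals[length] += 1
        if is_mapped:
            mapped[length] += 1
    n_reads = sum(totals.values())
    overall = 100.0 * sum(mapped.values()) / n_reads if n_reads else 0.0
    by_length = {
        length: 100.0 * mapped[length] / totals[length]
        for length in sorted(totals)
    }
    return overall, by_length


# Made-up example: five reads of three different lengths.
example_reads = [(22, True), (22, True), (22, False), (18, False), (30, True)]
overall_pct, pct_by_length = mapping_statistics(example_reads)
print(f"{overall_pct:.1f}% mapped overall")
for length, pct in pct_by_length.items():
    print(f"length {length}: {pct:.1f}% mapped")
```

Running the same tabulation on an existing data set gives a baseline, so that an unexpectedly low percentage at a particular length stands out as questionable.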

Take home messages
• NGS is a collection of experiments
• Biases/errors can/will occur at all steps of a high throughput sequencing study
• QC measures should be applied at all steps of a high throughput sequencing study
• Don't be alarmed, stay informed
• Be familiar with existing data sets

miRNA sequencing profiling: miRspring
Humphreys D.T. and Suter C.M., Nucleic Acids Research 2013. http://miRspring.victorchang.edu.au
• Small (<2MB) HTML document that replicates the miRNA aligned sequencing data.
• Needs NO internet connectivity.
• Provides visualization of sequence data
• Reports on miRNA processing
• Complete transparency.

microRNAs
• Small non-coding RNAs (22nt)
• Bind to 3'UTRs -> decay and/or translational repression
• Biogenesis: Derived from longer stem loop precursors
miRspring reporting tools
i) 5' isomiRs
ii) 3' isomiRs
iii) Non-canonical
iv) Arm bias
v) miRNA length
vi) RNA editing (A -> G, C -> T)
[Figure: stem loop precursor, 5' to 3', annotated with features i to vi]

miRspring
miRNA clusters: Mono-cistronic (genomic), Poly-cistronic (genomic)
Seed analysis:
  miRNA      Sequence                   Seed
  miR-196a   UAGGUAGUUUCCUGUUGUUGGG     AGGUAGU
  let-7a     UGAGGUAGUAGGUUGUAUAGUUU    GAGGUAG
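
The seeds listed above correspond to nucleotides 2-8 of each mature miRNA. A minimal sketch of that extraction follows, assuming this 2-8 definition (it reproduces both seeds on the slide); the function name `seed` is illustrative and is not taken from miRspring.

```python
MIR_196A = "UAGGUAGUUUCCUGUUGUUGGG"
LET_7A = "UGAGGUAGUAGGUUGUAUAGUUU"


def seed(mature_sequence, start=2, end=8):
    """Return the seed, nucleotides start..end (1-based, inclusive)."""
    if len(mature_sequence) < end:
        raise ValueError("sequence is shorter than the seed region")
    return mature_sequence[start - 1:end]


seeds = {"miR-196a": seed(MIR_196A), "let-7a": seed(LET_7A)}
print(seeds)

# A hypothetical 5' isomiR of let-7a that starts one nucleotide later.
shifted_let_7a_seed = seed(LET_7A[1:])
print(shifted_let_7a_seed)
```

In this sketch the shifted let-7a read carries the same seed as miR-196a, which illustrates why 5' isomiRs matter: a one-nucleotide shift at the 5' end changes the seed and, potentially, the set of targets.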
