
Olga Vinnere Pettersson, PhD National Genomics Infrastructure hosted - PowerPoint PPT Presentation



  1. Olga Vinnere Pettersson, PhD National Genomics Infrastructure hosted by ScilifeLab, Uppsala Node (UGC) Version 6.3

  2. Outline
     • A bit of history
     • NGS technologies
     • NGS applications
       – De Novo
       – RNA-seq
       – Targeted enrichment (hybridization & amplicon-Seq)
     • National Genomics Infrastructure – Sweden
     • Auxiliary technologies (10x Chromium, BioNano)
     • Sample prep for NGS

  3. What is sequencing? Phosphate group; Fluorophore; Proton. https://figures.boundless-cdn.com

  4. Once upon a time…
     • Frederick Sanger and Alan Coulson: Chain Termination Sequencing (1977); Nobel prize 1980
     • Principle: SYNTHESIS of DNA is randomly TERMINATED at different points
     • Separation of fragments that differ in size by 1 nucleotide
     • Termination is caused by the lack of an OH-group at the 3’ position of deoxyribose
     • 1 molecule sequenced at a time = 1 read
     • Capillary sequencer: 384 reads per run
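
The chain-termination principle on this slide can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: every possible termination point is assumed to occur exactly once, and the function names `terminated_fragments` and `read_sequence` are hypothetical, chosen for illustration.

```python
import random


def terminated_fragments(strand, seed=0):
    """Return every possible terminated copy of the synthesized strand, in random order."""
    # Synthesis can stop after any base, so each prefix is one possible fragment.
    fragments = [strand[:n] for n in range(1, len(strand) + 1)]
    # Shuffle to mimic the unordered mixture in the reaction tube.
    random.Random(seed).shuffle(fragments)
    return fragments


def read_sequence(fragments):
    """Recover the sequence by sorting fragments by size and reading each terminal base."""
    # Separation by size orders fragments that differ by one nucleotide.
    ordered = sorted(fragments, key=len)
    # The last base of each fragment is the labelled terminator.
    return "".join(fragment[-1] for fragment in ordered)


mixture = terminated_fragments("ATGCGTTA")
print(read_sequence(mixture))
```

Sorting by length stands in for electrophoretic separation; the read is simply the terminal base of each fragment, from shortest to longest.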

  5. 2006: REVOLUTION. Thousands of molecules sequenced in parallel; 1 million reads sequenced per run (Roche 454 GS FLX)

  6. Technologies

  7. Differences between platforms
     • Technology: chemistry + signal detection
     • Run times vary from hours to days
     • Production ranges from Mb to Gb
     • Error rate per base from 0.1% to 15%
     • Cost per base
     • Library construction
     • Read length: from <100 bp to >20 kb

  8. Read length [figure]

  9. Illumina
     Instrument: yield and run time; read length; error rate; error type
     • HiSeq2500: 120 Gb – 600 Gb (27 h or standard run); 100x100 (250x250); 0.1%; substitutions
     • MiSeq: 540 Mb – 15 Gb (4 – 48 hours); up to 350x350; 0.1%; substitutions
     • HiSeqXten: 800 Gb – 1.8 Tb (3 days); 150x150; as above; as above
     Main applications
     • Whole genome, exome and targeted reseq
     • Transcriptome analyses
     • Methylome and ChIP-seq
     • Rapid targeted resequencing (MiSeq)
     • Human genome seq (Xten)

  10. Illumina : bridge amplification • 200M fragments per lane • Bridge amplification • Ends with blocking of free 3’-ends and hybridisation of sequencing primer

  11. Illumina : ExAmp = black box Affected platforms : HiSeqXten, HiSeq 3000 and 4000, NovaSeq

  12. Ion
      Chip: yield; run time; read length
      • 314, 316, 318 (PGM): 0.1 – 1 Gb; 3 hrs; 200 – 400 bp
      • P-I (Proton): 10 Gb; 4 hrs; 200 bp
      • 520, 530, 540 (S5): 1 Gb – 10 Gb; 3 hrs (except 540); 200 – 600 bp
      Main applications
      • Microbial and metagenomic sequencing
      • Targeted re-sequencing (gene panels)
      • Clinical sequencing

  13. Ion Torrent: H+ ion-sensitive field-effect transistors; bead

  14. PacBio: Single-Molecule, Real-Time DNA sequencing
      Instrument: yield/cell and run time; read length; error rate; error type
      • RS II: 250 Mb – 1.8 Gb, 30 – 600 min; 250 bp – 30 kb (78 kb); 15% (single pass), 0.0001% (circular consensus); insertions, random
      • SEQUEL: 2 – 6 Gb, 30 – 600 min; 250 bp – 25 kb; as RSII; as RSII

  15. PacBio: SMRT - technology SMRT = Single Molecule Real Time

  16. SMRT sequencing: common misconceptions
      • High error rate? Irrelevant, because errors are random. Depending on coverage. Examples:
        – 8 Mb genome, 8 SNPs detected
        – 65 kb construct: 100% correct sequence
        – Detection of low frequency mutations
      • High price? Not for small genomes.
        – Bioinfo-time to assemble short reads
        – Bioinfo-time to assemble long reads
        – Better assembly quality
        – Single-molecule reads without PCR-bias

  17. Oxford Nanopore MinION: reads up to 800k; 10-15% error rate; life time 5 days

  18. Main types of equipment
      • PacBio RSII, PacBio Sequel: ultra-long reads, FAST throughput
      • Illumina HiSeq, Illumina Xten, Illumina MiSeq: short paired reads, HIGH throughput
      • Ion Torrent PGM, Ion Proton, Ion S5 XL: short single-end reads, FAST throughput

  19. Applications

  20. NGS/MPS applications
      • Whole genome sequencing:
        – De novo sequencing
        – Re-sequencing
      • Transcriptome sequencing:
        – mRNA-seq
        – miRNA
        – Isoform discovery
      • Target re-sequencing
        – Exome
        – Large portions of a genome
        – Gene panels
        – Amplicons

  21. De novo sequencing • Used to create a reference genome without previous reference

  22. De novo vs re-sequencing
      • De novo: no bias towards a reference; no template to adapt to; fewer contigs; works best for large-scale events
      • Re-seq: finding similarities to a reference; easier to identify SNPs and minor events; many contigs; novel events are lost

  23. De novo – do it with long reads!

  24. Example: de novo PacBio; Crow
      Sequencing results
      • Number of SMRT cells: 70
      • Total bases per SMRT: 1.39 Gb
      • Total reads per SMRT: 106 833
      Assembly results, FALCON (PRIMARY / ALTERNATIVE)
      • N50: 8.5 Mb / 23 kb
      • N75: 3.9 Mb / 18 kb
      • Nr contigs: 4375 / 2614
      • Longest contig: 36 Mb / 121 kb
      • Total length: 1.09 Gb / 45 Mb
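
The N50 and N75 figures on this slide are standard assembly statistics. As a minimal sketch (the function name `nxx` and the toy contig lengths are assumptions for illustration, not data from the crow assembly), they can be computed from a list of contig lengths like this:

```python
def nxx(contig_lengths, fraction=0.5):
    """Return the contig length at which the cumulative sum, taken from the
    longest contig downwards, first reaches `fraction` of the total assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Toy contig lengths in kb, invented for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20]
print("N50:", nxx(toy_contigs, 0.5))
print("N75:", nxx(toy_contigs, 0.75))
```

A higher N50 means that half of the assembled bases sit in longer contigs, which is why the primary and alternative assemblies on the slide differ so much on this metric.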

  25. Re-sequencing. Population studies: Illumina HiSeq is the best. [Figure labels: England and Southern Scotland, Central Sweden, Northern Sweden, Italy, Finland, Spain]

  26. Transcriptome sequencing (RNA-seq) TOTAL RNA mRNA Splice isoforms Dif.ex. • miRNA Non-codingRNA Annotation • Transcriptional regulation •

  27. mRNA: rRNA depletion vs polyA selection
      • rRNA depletion
        – Pros: captures on-going transcription; picks up non-coding RNA
        – Cons: does not get rid of all rRNA; messy Dif.Ex. profile
        – Recommended: 20-40 mln reads (single or PE)
      • polyA selection
        – Pros: gives a clean Dif.Ex. profile
        – Cons: does not pick non-coding RNA
        – Recommended: 5-20 mln reads
      • Alternative for human RNA-seq: AmpliSeq Human Transcriptome panel
        – faster, cheaper, works fine with FFPE
        – input: 50 ng total RNA
        – dif.ex. ONLY

  28. RNA-seq experimental setup • mRNA only: any kit • mRNA and miRNA: only specialized kits • Always use DNase! • RIN value above 8. • CONTROL vs experimental conditions • Biological replicates: 4 strongly recommended

  29. RNA-seq experimental setup PacBio Iso-seq : full-length transcriptome seq

  30. Targeted re-sequencing
      Suitable applications
      • Metagenomics
      • Resolving complex regions
      • Low frequency mutations
      • Human re-sequencing
      • Clinical diagnostics
      • ….
      Approaches for target-seq
      • Hybridization capture (Agilent, NimbleGen, MyBaits)
      • PCR (Amplicon sequencing)
        – Long-range
        – Conventional
        – Multiplex
      • Experimental (TLA, Samplix, CRISPR-Cas9)

  31. Amplicon sequencing: SIZE MATTERS…
      • Example 1: tight peak, OK FOR ANY NGS TECHNOLOGY
      • Example 2: several sizes, fractionation is needed
      • Example 3: broad peak; size selection is needed
      • => we HAVE to make several libraries
      • Size difference among fragments must not exceed 80 bp (or 20% in length)
      • Reason: preferential amplification of short fragments
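
The size rule on this slide can be checked before pooling amplicons into one library. The sketch below is an illustration only: the function name `amplicon_size_spread` is hypothetical, and it assumes that the 20% limit is measured relative to the shortest fragment, which the slide does not specify.

```python
def amplicon_size_spread(lengths_bp):
    """Compare the size spread of a set of amplicons against the 80 bp and 20% limits."""
    shortest = min(lengths_bp)
    longest = max(lengths_bp)
    spread = longest - shortest
    return {
        "spread_bp": spread,
        # Absolute limit from the slide: at most 80 bp difference.
        "within_80_bp": spread <= 80,
        # Relative limit; measuring it against the shortest fragment is an assumption.
        "within_20_percent": spread <= 0.20 * shortest,
    }


print(amplicon_size_spread([400, 420, 450]))
print(amplicon_size_spread([300, 500]))
```

A pool that fails these checks would, following the slide, need fractionation or size selection into several libraries, because shorter fragments amplify preferentially.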

  32. Size-related bias in amplicon-seq Courtesy Mikael Brandström Durling, Forest Mycology and Pathology, SLU

  33. Amplicon sequencing: Technologies
      • Illumina MiSeq: paired-end reads (FW read, RW read)
      • Ion S5XL: single-end reads
      • PacBio RSII: circular consensus reads

  34. Amplicon sequencing: Barcoding strategies Illumina and Ion PacBio USER NGI

  35. Main types of equipment & applications
      • Illumina (HiSeq, NextSeq, X10, MiSeq, MiniSeq, NovaSeq): short paired reads, HIGH throughput
      • Ion Torrent (PGM, Ion Proton, Ion S5 XL): short single-end reads, FAST throughput
      • PacBio (RSII, SEQUEL): ultra-long reads, FAST throughput
      Applications: Human WGS mRNA and miRNA Long amplicons Re-sequencing 30x Exome Re-sequencing mRNA and miRNA ChIP-seq De novo sequencing De novo transcriptome Short amplicons Novel isoform discovery Exome Gene panels Fusion transcript analysis ChIP-seq Clinical samples Haplotype phasing Short amplicons Clinical samples Methylation

  36. But there is more!

  37. 10x Genomics (Chromium) Fragment length: 50 kb – 100+ Kb

  38. BioNano Genomics (Irys) Fragment length: 100 kb – 3 Mb

  39. SAMPLE QUALITY REQUIREMENTS

  40. Sample prep: take home message PCR-quality sample and NGS-quality sample are two completely different things

  41. Making an NGS library
      • DNA QC: paramount importance
      • Shearing & size selection
      • Ligation of sequencing adaptors, technology specific
      • Amplification

  42. NGS library
      • DNA QC: paramount importance
      • Shearing & size selection

  43. Library complexity: suboptimal sample vs good sample (source: https://www.kapabiosystems.com)

  44. DNA quality requirements
      Gel
      • Some DNA left in the well
      • Sharp band of 20+ kb
      • No sign of proteins
      • No smear of degraded DNA
      • No sign of RNA
      NanoDrop
      • 260/280 = 1.8 – 2.0
      • 260/230 = 2.0 – 2.2
      Qubit or Picogreen
      • 10 kb insert libraries: 3-5 ug
      • 20 kb insert libraries: 10-20 ug

  45. Example:

  46. What do absorption ratios tell us?
      Pure DNA 260/280: 1.8 – 2.0
      • < 1.8: too little DNA compared to other components of the solution; presence of organic contaminants (proteins and phenol; glycogen), which absorb at 280 nm
      • > 2.0: high share of RNA
      Pure DNA 260/230: 2.0 – 2.2
      • < 2.0: salt contamination, humic acids, peptides, aromatic compounds, polyphenols, urea, guanidine, thiocyanates (the latter three are common kit components), which absorb at 230 nm
      • > 2.2: high share of RNA, very high share of phenol, high turbidity, dirty instrument, wrong blank
      Photometrically active contaminants: phenol, polyphenols, EDTA, thiocyanate, protein, RNA, nucleotides (fragments below 5 bp)
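
The thresholds on this slide lend themselves to a simple triage helper. The following is a minimal sketch, not a validated QC tool: the function name `interpret_absorbance` is hypothetical, and the messages only paraphrase the slide's list of likely causes.

```python
def interpret_absorbance(ratio_260_280, ratio_260_230):
    """Return warnings for ratios outside the pure-DNA ranges given on the slide."""
    notes = []
    # Pure DNA 260/280: 1.8 to 2.0
    if ratio_260_280 < 1.8:
        notes.append("260/280 < 1.8: possible protein or phenol contamination")
    elif ratio_260_280 > 2.0:
        notes.append("260/280 > 2.0: possible high share of RNA")
    # Pure DNA 260/230: 2.0 to 2.2
    if ratio_260_230 < 2.0:
        notes.append("260/230 < 2.0: possible salts, guanidine or other 230 nm absorbers")
    elif ratio_260_230 > 2.2:
        notes.append("260/230 > 2.2: possible RNA, phenol, turbidity or wrong blank")
    return notes


print(interpret_absorbance(1.9, 2.1))  # both ratios in range, so no warnings
print(interpret_absorbance(1.6, 1.5))
```

An empty list means both ratios fall inside the pure-DNA ranges; it says nothing about fragment length or concentration, which the gel and Qubit or Picogreen checks cover.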
