Olga Vinnere Pettersson, PhD National Genomics Infrastructure hosted - PowerPoint PPT Presentation

Olga Vinnere Pettersson, PhD National Genomics Infrastructure hosted by ScilifeLab, Uppsala Node (UGC) Version 6.3

Outline: www.robustpm.com
• A bit of history
• NGS technologies & sample prep
• NGS applications
• National Genomics Infrastructure – Sweden

What is sequencing? https://figures.boundless-cdn.com

Once upon a time…
• Frederick Sanger and Alan Coulson: Chain Termination Sequencing (1977), Nobel Prize 1980
• Principle: SYNTHESIS of DNA is randomly TERMINATED at different points, because the terminating nucleotide lacks the OH-group at the 3’ position of deoxyribose
• Separation of fragments that differ in size by 1 nucleotide!
• 1 molecule sequenced at a time = 1 read
• Capillary sequencer: 384 reads per run
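The chain-termination principle can be sketched as a toy simulation: synthesis stops after every possible position, the products are sorted by size, and the last base of each product spells out the read. This is a simplified illustration only; the function names and the example template are hypothetical, and strand orientation, dye chemistry and signal detection are ignored.

```python
import random

# Watson-Crick pairing used to build the newly synthesised strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminated_fragments(template: str) -> list[str]:
    """Return one terminated product for every position of the copied strand."""
    new_strand = "".join(COMPLEMENT[base] for base in template)
    # Random termination across many molecules yields every length from 1 to N.
    return [new_strand[:length] for length in range(1, len(new_strand) + 1)]


def read_by_size(fragments: list[str]) -> str:
    """Sort fragments by length (the job of electrophoresis) and read each last base."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))


fragments = terminated_fragments("ATGCCGTA")  # hypothetical template
random.Random(0).shuffle(fragments)           # products come out of the tube unordered
sanger_read = read_by_size(fragments)         # size separation restores the order
```

Because the fragments differ by exactly one nucleotide, sorting them by size is enough to recover the base order, which is why the separation step matters as much as the termination step.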

2006 REVOLUTION
• Thousands of molecules sequenced in parallel
• 1 mln reads sequenced per run
• Roche 454 GS FLX

Technologies

NGS technologies
Company: platform; amplification; sequencing method
• Roche: 454 (until 2016); emPCR; Pyrosequencing
• Illumina: HiSeq, MiSeq, NextSeq, X10; Bridge PCR; Synthesis
• Life Technologies (Thermo Fisher): Ion Torrent, Ion Proton, S5; emPCR; Synthesis (pH)
• Pacific Biosciences: RSII, SEQUEL; None; Synthesis (SMRT)
• Complete Genomics: Nanoballs; None; Ligation
• Oxford Nanopore*: MinION, GridION; None; Flow
RIP technologies: Helicos, Polonator, SOLiD, 454 etc.
In development: Tunneling currents, nanopores, etc.

Differences between platforms
• Technology: chemistry + signal detection
• Run times vary from hours to days
• Production ranges from Mb to Gb
• Read length from <100 bp to >20 kbp
• Error rate per base from 0.1% to 15%
• Cost per base

Illumina
Instrument: yield and run time; read length; error rate; error type
• HiSeq2500: 120 Gb – 600 Gb, 27 h or standard run; 100x100 (250x250); 0.1%; substitutions
• MiSeq: 540 Mb – 15 Gb (4 – 48 hours); up to 350x350; 0.1%; substitutions
• HiSeqXten: 800 Gb – 1.8 Tb (3 days); 150x150; error rate and type as above
Main applications
• Whole genome, exome and targeted reseq
• Transcriptome analyses
• Methylome and ChIP-seq
• Rapid targeted resequencing (MiSeq)
• Human genome seq (Xten)

Illumina: bridge amplification
• 200M fragments per lane
• Bridge amplification
• Ends with blocking of free 3’-ends and hybridisation of sequencing primer

Ion Torrent
Chip: yield and run time; read length
• 314, 316, 318 (PGM): 0.1 – 1 Gb, 3 hrs; 200 – 400 bp
• P-I (Proton): 10 Gb, 4 hrs; 200 bp
• 520, 530, 540 (S5): 1 Gb – 10 Gb, 3 hrs; 400 (600) bp (except 540)
Main applications
• Microbial and metagenomic sequencing
• Targeted re-sequencing (gene panels)
• Clinical sequencing

Ion Torrent: H+ ion-sensitive field effect transistors

Ion PGM
• 314: 250 000 reads, 400 bp, 100 Mb
• 316: 4 mln reads, 400 bp, 500 Mb
• 318: 9 mln reads, 400 bp, 1 Gb
Ion S5XL
• 520: 8 mln reads, 400 bp, 1 Gb
• 530: 15-20 mln reads, 400 bp, 5 Gb
• 540: 90 mln reads, 200 bp, 10 Gb
Ion Proton
• PI: 90 mln reads, 200 bp, 10-18 Gb

PacBio SMRT-technology (Single-Molecule, Real-Time DNA sequencing)
Instrument: yield and run time; read length; error rate; error type
• RS II: 250 Mb – 1.3 Gb per SMRTCell, 30 – 360 min; 250 bp – 30 kb (74 kb); 15% (on a single passage!); insertions, random
• SEQUEL: 2-6 Gb per SMRTCell, 30 – 360 min; 250 bp – 25 kb; error rate as RSII; error type as RSII

PacBio SMRT - technology Single Molecule Real Time

SMRT sequencing: common misconceptions
High error rate?
• Irrelevant, because errors are random
• Depending on coverage
• Examples:
– 8 Mb genome, 8 SNPs detected
– 65 kb construct: 100% correct sequence
– Detection of low frequency mutations
High price?
• Not for small genomes
• Bioinfo-time to assemble short reads vs. bioinfo-time to assemble long reads
• Better assembly quality
• Single-molecule reads without PCR-bias
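Why random errors wash out with coverage can be illustrated with a simple binomial model. The sketch below assumes that every read is wrong at a given position independently with the same probability, and it treats the consensus as wrong whenever more than half of the reads are wrong, which is a pessimistic bound. The function name is hypothetical, and real SMRT errors are mostly insertions that need alignment-aware consensus calling, so this is an illustration of the argument, not a model of any actual pipeline.

```python
from math import comb


def majority_error(per_read_error: float, coverage: int) -> float:
    """Probability that more than half of `coverage` independent reads are wrong."""
    wrong_needed = coverage // 2 + 1
    return sum(
        comb(coverage, k)
        * per_read_error**k
        * (1 - per_read_error) ** (coverage - k)
        for k in range(wrong_needed, coverage + 1)
    )


# 15% is the single-pass error rate quoted for the RS II.
for depth in (1, 5, 15, 30):
    print(f"{depth:>2}x coverage: {majority_error(0.15, depth):.2e}")
```

Under these assumptions the chance of a wrong majority falls steeply as depth increases, which is the reasoning behind "irrelevant, because errors are random". A systematic (non-random) error would not average out this way.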

Oxford Nanopore MinION
• Reads up to 100k
• 1D and 2D reads
• 15-40% error rate
• Life time 5 days

Main types of equipment
• PacBio RSII: ultra-long reads, FAST throughput
• Illumina HiSeq, Illumina Xten, Illumina MiSeq: short paired reads, HIGH throughput
• Ion Torrent PGM, Ion Proton, Ion S5 XL: short single-end reads, FAST throughput

Applications

NGS/MPS applications
• Whole genome sequencing:
– De novo sequencing
– Re-sequencing
• Transcriptome sequencing:
– mRNA-seq
– miRNA
– Isoform discovery
• Target re-sequencing:
– Exome
– Large portions of a genome
– Gene panels
– Amplicons

De novo sequencing • Used to create a reference genome without previous reference

De novo vs re-sequencing
De novo
• No bias towards a reference
• No template to adapt to
• Many contigs
• Works best for large-scale events
Re-seq
• Finding similarities to a reference
• Easier to identify SNPs and minor events
• Fewer contigs
• Novel events are lost

De novo sequencing
Illumina strategy
Sequencing:
• PE library with 350 bp
• PE library with 600 bp
• MP library with 2 kb
• MP library with 5-8-20 kb
• PE: 50-100x, MP 10-15x
Analysis:
• ALLPATHS
PacBio strategy
Sequencing:
• 10-20 kb library
• 50-80x (where 30x are reads above 10 kb)
Analysis:
• HGAP (haploid)
• FALCON (diploid)
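The coverage targets translate into a sequencing budget with one line of arithmetic: required bases = genome size × coverage. The sketch below applies this to the 50-80x PacBio recommendation. The 1 Gb genome size and the 1 Gb yield per SMRT cell are assumed example values (the yield lies inside the RS II range quoted earlier in the deck), and the function name is hypothetical.

```python
from math import ceil


def smrt_cells_needed(
    genome_size_bp: float, target_coverage: float, yield_per_cell_bp: float
) -> int:
    """Number of SMRT cells needed to reach the target coverage, rounded up."""
    required_bases = genome_size_bp * target_coverage
    return ceil(required_bases / yield_per_cell_bp)


GENOME_SIZE_BP = 1.0e9     # assumed example genome, not from the slides
YIELD_PER_CELL_BP = 1.0e9  # assumed yield per SMRT cell

cells_for_50x = smrt_cells_needed(GENOME_SIZE_BP, 50, YIELD_PER_CELL_BP)
cells_for_80x = smrt_cells_needed(GENOME_SIZE_BP, 80, YIELD_PER_CELL_BP)
```

The same function works for any platform if the per-cell yield is replaced by the per-run or per-lane yield of that instrument.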

Example: de novo PacBio; Crow
Sequencing results
• Number of SMRT cells: 70
• Total bases per SMRT: 1.39 Gb
• Total reads per SMRT: 106 833
Assembly results, FALCON (primary; alternative)
• N50: 8.5 Mb; 23 kb
• N75: 3.9 Mb; 18 kb
• Nr contigs: 4375; 2614
• Longest contig: 36 Mb; 121 kb
• Total length: 1.09 Gb; 45 Mb
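For readers unfamiliar with the N50 and N75 statistics reported here, the sketch below shows how they are commonly computed: contigs are sorted from longest to shortest, and Nxx is the length of the contig at which the running total first reaches xx% of the assembly. The contig lengths are made-up toy values, not the crow assembly, and the function name is hypothetical.

```python
def nxx(contig_lengths: list[int], fraction: float = 0.5) -> int:
    """Contig length at which the longest contigs together reach `fraction` of the total."""
    threshold = fraction * sum(contig_lengths)
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= threshold:
            return length
    raise ValueError("contig_lengths must not be empty")


toy_contigs = [80, 70, 50, 40, 30, 20, 10]  # hypothetical lengths, total 300
n50 = nxx(toy_contigs, 0.50)
n75 = nxx(toy_contigs, 0.75)
```

N75 can never exceed N50, because more (and therefore shorter) contigs are needed to cover three quarters of the assembly than to cover half of it.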

Transcriptome sequencing (RNA-seq)
TOTAL RNA: mRNA, miRNA, non-coding RNA
• Dif.ex.
• Splice isoforms
• Annotation
• Transcriptional regulation

mRNA: rRNA depletion vs polyA selection
rRNA depletion
• Pros: captures on-going transcription; picks up non-coding RNA
• Cons: does not get rid of all rRNA; messy Dif.Ex. profile
• Recommended: 20-40 mln reads (single or PE)
polyA selection
• Pros: gives a clean Dif.Ex. profile
• Cons: does not pick up non-coding RNA
• Recommended: 5-20 mln reads
Alternative for human RNA-seq: AmpliSeq Human Transcriptome panel
• faster, cheaper, works fine with FFPE
• input: 50 ng total RNA
• dif.ex. ONLY

RNA-seq experimental setup
• mRNA only: any kit
• mRNA and miRNA: only specialized kits
• Always use DNase!
• RIN value above 8
• CONTROL vs experimental conditions
• Biological replicates: 4 strongly recommended
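Combining the replicate recommendation with the read depths given for polyA selection (5-20 mln reads) and rRNA depletion (20-40 mln reads) gives a rough read budget for an experiment. The sketch below assumes the simplest design, one control and one experimental condition, which is an illustrative assumption; the function name is hypothetical.

```python
def total_reads_mln(
    conditions: int, replicates: int, reads_per_sample_mln: float
) -> float:
    """Total reads (in millions) needed for a balanced design."""
    return conditions * replicates * reads_per_sample_mln


CONDITIONS = 2  # assumed: control vs. one experimental condition
REPLICATES = 4  # biological replicates, as recommended

polya_budget = (
    total_reads_mln(CONDITIONS, REPLICATES, 5),
    total_reads_mln(CONDITIONS, REPLICATES, 20),
)
depletion_budget = (
    total_reads_mln(CONDITIONS, REPLICATES, 20),
    total_reads_mln(CONDITIONS, REPLICATES, 40),
)
```

Comparing the budget with the output of a run or lane shows how many samples can share it; adding conditions or replicates scales the budget linearly.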

RNA-seq experimental setup
• PacBio Iso-seq: full-length transcriptome seq

Amplicon sequencing
Used a lot in metagenomics
• Community analysis
– rRNA genes & spacers (16S, ITS)
– Functional genes
• Genotyping by sequencing

Amplicon sequencing: SIZE MATTERS…
• Example 1: tight peak, OK FOR ANY NGS TECHNOLOGY
• Example 2: several sizes, fractionation is needed
• Example 3: broad peak; size selection is needed
=> for examples 2 and 3 we HAVE to make several libraries
Size difference among fragments must not exceed 80 bp (or 20% in length)
Reason – preferential amplification of short fragments
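The size rule can be turned into a quick check before pooling amplicons into one library. The sketch below reads the rule conservatively: the spread between the shortest and longest fragment must stay within both 80 bp and 20% of the shortest fragment. That reading of "80 bp (or 20% in length)" is an assumption, as are the function names and the example sizes.

```python
def can_pool(
    sizes_bp: list[int], max_diff_bp: int = 80, max_rel_diff: float = 0.20
) -> bool:
    """True if the size spread is within both the absolute and the relative limit."""
    shortest, longest = min(sizes_bp), max(sizes_bp)
    spread = longest - shortest
    return spread <= max_diff_bp and spread <= max_rel_diff * shortest


def split_into_libraries(sizes_bp: list[int]) -> list[list[int]]:
    """Greedily group sorted amplicon sizes so that each group passes can_pool."""
    libraries: list[list[int]] = []
    for size in sorted(sizes_bp):
        if libraries and can_pool(libraries[-1] + [size]):
            libraries[-1].append(size)
        else:
            libraries.append([size])  # too different: start a new library
    return libraries


tight_peak = [400, 420, 450]            # hypothetical sizes, one library is fine
several_sizes = [250, 280, 400, 450, 700]  # hypothetical sizes, must be split
libraries = split_into_libraries(several_sizes)
```

Fragments that fail the check go into separate libraries, which is the "we HAVE to make several libraries" case; the limits are parameters so that a different reading of the rule can be tried.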

Size-related bias in amplicon-seq Courtesy Mikael Brandström Durling, Forest Mycology and Pathology, SLU

When you sequence an amplicon …
• On MiSeq: FW read and RW read
• On Ion: FW read

Main types of equipment & applications
• Illumina (HiSeq, NextSeq, X10, MiSeq, MiniSeq, NovaSeq): short paired reads, HIGH throughput
• Ion Torrent (PGM, Ion Proton, Ion S5 XL): short single-end reads, FAST throughput
• PacBio (RSII, SEQUEL): ultra-long reads, FAST throughput
Applications listed: Human WGS; mRNA and miRNA; Long amplicons; Re-sequencing; 30x; Exome; Re-sequencing; mRNA and miRNA; ChIP-seq; De novo sequencing; De novo transcriptome; Short amplicons; Novel isoform discovery; Exome; Gene panels; Fusion transcript analysis; ChIP-seq; Clinical samples; Haplotype phasing; Short amplicons; Clinical samples; Methylation

Other technologies for scaffolding of genomes
• 10x Chromium -> Illumina sequencing
• BioNano Irys, optical mapping

What is “The BEST”?

SAMPLE QUALITY REQUIREMENTS

Sample prep: take home message PCR-quality sample and NGS-quality sample are two completely different things

Making an NGS library
• DNA QC – paramount importance
• Shearing & size selection
• Ligation of sequencing adaptors, technology specific
• Amplification

Library complexity: suboptimal sample vs good sample (source: https://www.kapabiosystems.com)

DNA quality requirements
Gel:
• Some DNA left in the well
• Sharp band of 20+ kb
• No sign of proteins
• No smear of degraded DNA
• No sign of RNA
NanoDrop:
• 260/280 = 1.8 – 2.0
• 260/230 = 2.0 – 2.2
Qubit or Picogreen:
• 10 kb insert libraries: 3-5 ug
• 20 kb insert libraries: 10-20 ug
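The numeric part of these requirements can be written as a small pre-submission check. The sketch below flags NanoDrop ratios outside the listed ranges and DNA amounts below the lower end of the listed amount for each insert size; treating the lower end as a hard minimum is an assumption, and the function name and example values are hypothetical. The gel-based criteria are not covered.

```python
# Lower end of the amounts listed per insert size (kb -> micrograms);
# using it as a hard minimum is an assumption of this sketch.
MINIMUM_UG = {10: 3.0, 20: 10.0}


def dna_qc_issues(
    a260_280: float, a260_230: float, amount_ug: float, insert_kb: int
) -> list[str]:
    """Return a list of QC problems; an empty list means all numeric checks pass."""
    if insert_kb not in MINIMUM_UG:
        raise ValueError(f"no amount requirement listed for {insert_kb} kb inserts")
    issues = []
    if not 1.8 <= a260_280 <= 2.0:
        issues.append("260/280 outside 1.8-2.0")
    if not 2.0 <= a260_230 <= 2.2:
        issues.append("260/230 outside 2.0-2.2")
    if amount_ug < MINIMUM_UG[insert_kb]:
        issues.append(f"less than {MINIMUM_UG[insert_kb]} ug for {insert_kb} kb library")
    return issues


good_sample = dna_qc_issues(1.85, 2.1, 12.0, 20)  # hypothetical measurements
poor_sample = dna_qc_issues(1.6, 1.2, 2.0, 10)    # hypothetical measurements
```

A sample that passes these checks can still fail on the gel (degradation smear, RNA, protein), so the numbers are a first filter rather than a full QC.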

Example:
