Next Generation Sequencing The basics Wilfred van IJcken Erasmus - - PowerPoint PPT Presentation



SLIDE 1

Next Generation Sequencing The basics

Wilfred van IJcken
Erasmus MC Center for Biomics
Biomedical Research Techniques (XVIth ed.), Nov 6

SLIDE 2

Learning objectives

Next generation sequencing (NGS): The basics

  • Background
  • Illumina sequencing technology
  • Terminology

Next presentation

  • Research applications
  • Diagnostic applications
  • Future directions

SLIDE 3

What is next generation sequencing?

  • Sequencing technology developed after Sanger
  • Millions of reads in parallel (MPS)
  • Shorter (<400bp) sequencing reads
  • Enables analysis of complex mixtures of DNA or RNA
  • Enables genome wide approach
  • Different vendors with different approaches
  • MPS = massively parallel sequencing
SLIDE 4

NGS systems on the market

  • High throughput
  • Special
  • Desktop

Different characteristics

  • Sequencing technology
  • Read length
  • Speed
  • Output
  • Applications
  • Run cost

SLIDE 5

Illumina systems

[Figure: Illumina systems compared by data amount, purchase cost and run costs. Systems shown: MiSeq, HiSeq 2500, NextSeq 500, HiSeq X Ten, HiSeq 4000, MiniSeq, NovaSeq 6000. Labelled data amounts: 8 Gb; 6 Tb per run.]

SLIDE 6

NGS flow

Steps: Intake, Isolate, Library, Sequence, Report

  • Intake: ID, amount, sex, disease
  • Isolate: DNA or RNA from blood, plasma, saliva, FFPE, cells
  • Library: select region of interest (PCR, capture)
  • Sequence: chemistry, enzymes, detection, signal
  • Report: yield, quality, variation, match phenotype?

SLIDE 7

DNA library prep

SLIDE 8

Sequencing by synthesis: cluster generation

[Figure labels: flowcell, lane]

SLIDE 9

Bridge amplification

SLIDE 10

Sequencing

incorporated

SLIDE 11

Sequencing and basecalling: Read 1

[Figure: image acquisition at each cycle (1 to 9), followed by base calling (A, T, G, C). Example read: C A A G T A A C …]

SLIDE 12

Single-end, paired-end, index read

  • Single read = sequence from one side of the fragment
  • Paired-end read = sequence from both sides of the fragment
  • Index read (example: GATCG)

SLIDE 13

Indexing enables sample multiplexing

  • Index = different nucleic acid code per sample
  • Introduced during sample prep
  • Read during the index read
  • Enables multiple samples in one flowcell lane

[Figure: four indexed samples, Patient 1 to Patient 4, with example indices GATCG, CGTGA, ATCGG and TCTCT.]
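Sample multiplexing can be illustrated with a small demultiplexing sketch: each read is assigned to a sample by looking up its index read. The index sequences are the ones on the slide, but which index belongs to which patient is an assumption made here, and the reads themselves are invented for the example.

```python
from collections import defaultdict

# Index sequences from the slide. The index-to-patient assignment is an
# assumption for this example; the slide does not state it explicitly.
SAMPLE_BY_INDEX = {
    "GATCG": "Patient 1",
    "CGTGA": "Patient 2",
    "ATCGG": "Patient 3",
    "TCTCT": "Patient 4",
}


def demultiplex(reads):
    """Group (index_read, sequence_read) pairs by sample name."""
    per_sample = defaultdict(list)
    for index_read, sequence in reads:
        # Reads whose index is not in the sample sheet are kept apart.
        sample = SAMPLE_BY_INDEX.get(index_read, "undetermined")
        per_sample[sample].append(sequence)
    return dict(per_sample)


# Invented reads from one flowcell lane: (index read, sequence read).
lane_reads = [
    ("GATCG", "CAAGTAAC"),
    ("TCTCT", "TGCTACGA"),
    ("GATCG", "TTAGCCTT"),
    ("AAAAA", "ACGTACGT"),  # index not present in the sample sheet
]

demultiplexed = demultiplex(lane_reads)
for sample, sequences in demultiplexed.items():
    print(sample, sequences)
```

Real demultiplexing software usually also tolerates a sequencing error in the index read; this sketch only accepts exact matches.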

SLIDE 14

Sequence Index 1

SLIDE 15

Sequence Index 2

SLIDE 16

Sequence Read 2

[Figure: image acquisition at each cycle (1 to 9). Example read: C A A G T A A C …]

SLIDE 17

Summary sequencing technology

  • Read 1
  • Index 1
  • Read 2
  • Index 2

SLIDE 18

Simplified RNA sample preparation

[Figure labels: RNA, reverse transcriptase, DNA, Adaptor 1, Adaptor 2]

SLIDE 19

Output file from basecalling

  • Many file types: qseq, fastq, etc.
  • Each system has its own format.
  • Large file sizes: >400 million reads per lane

Fields of the example record: Instrument, Run ID, Lane, Tile, X-coord, Y-coord, Index #, Read #, Sequence, Q-score (ASCII character), PF (0,1)

Example sequence: C A A G T A A C …
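To make the record layout concrete, here is a minimal sketch of a parser for one qseq-style line. It assumes eleven tab-separated fields in the order instrument, run ID, lane, tile, X, Y, index, read number, sequence, quality string and PF flag; the class name, function name and the example line are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QseqRecord:
    instrument: str
    run_id: str
    lane: int
    tile: int
    x: int
    y: int
    index_sequence: str
    read_number: int
    sequence: str
    quality: str          # one ASCII character per base
    passed_filter: bool   # PF flag: 1 = passed, 0 = failed


def parse_qseq_line(line: str) -> QseqRecord:
    """Parse one tab-separated line (assumed field order, see above)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated fields, got {len(fields)}")
    (instrument, run_id, lane, tile, x, y,
     index_sequence, read_number, sequence, quality, pf) = fields
    return QseqRecord(
        instrument, run_id, int(lane), int(tile), int(x), int(y),
        index_sequence, int(read_number), sequence, quality, pf == "1",
    )


# Invented example line, not taken from a real run.
example_line = "SEQ01\t7\t3\t1101\t1482\t2015\tGATCG\t1\tCAAGTAAC\thhhhgggf\t1\n"
record = parse_qseq_line(example_line)
print(record.lane, record.sequence, record.passed_filter)
```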

SLIDE 20

Data analysis not trivial due to data volumes and complexity

Data volume                    Total         Final     Comment

HiSeq 2000, 200G run
  Image Data                   32 TB         -         -
  Intensity Data               2 TB          -         Optionally transferred
  Base Call / Quality Score    0.25 TB       0.25 TB   1 byte/base (raw), assuming qseq generation offline
  Alignment Output             6 TB (3 TB)   1.2 TB    Remove intermediate files

GAIIx, 50G run
  Image Data                   6.9 TB        -         Optionally transferred
  Intensity Data               0.93 TB       0.93 TB   -
  Base Call / Quality Score    0.17 TB       0.17 TB   -
  Alignment Output             1.2 TB        1.2 TB    -

150 M reads x 8 lanes x 100 bp x 2 (paired end) = 240 Gbp

  • Storage and compute needed
  • Core facilities
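The yield calculation on this slide can be reproduced in a few lines. The numbers come from the slide; the variable names are chosen for this sketch, and the storage estimate simply applies the "1 byte/base (raw)" figure from the table comment.

```python
reads_per_lane = 150_000_000   # 150 M reads
lanes = 8
read_length_bp = 100
reads_per_fragment = 2         # paired end: both sides of the fragment

total_bases = reads_per_lane * lanes * read_length_bp * reads_per_fragment
total_gbp = total_bases / 1e9
print(f"Yield: {total_gbp:g} Gbp")

# Raw base-call storage at 1 byte per base, as in the table comment.
bytes_per_base = 1
raw_storage_tb = total_bases * bytes_per_base / 1e12
print(f"Raw base calls: {raw_storage_tb:g} TB")
```

At 1 byte per base this gives about 0.24 TB, which is in line with the 0.25 TB listed for base call and quality score data of a HiSeq 2000 run.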

SLIDE 21

Terminology

  • Next generation sequencing, AKA:
      - Deep sequencing
      - MPS = massively parallel sequencing
  • Number of sequencing cycles = read length

[Figure: cluster and read; cycles 1 to 9 give the read T G C T A C G A T …]

SLIDE 22

Alignment, Mapping

AAAACGCGCTTAGCCTTTTTTCGACTGTCGAGTGGAACGCCGCTAGCTAGGCGC
AAAACGCGCTTAGCCTTTTTTCGACTGTCGAGTGGATCGCCGCTAGCTAGGCGC
          TAGCCTTTTTTCGACTGTCGAGTGGATCGCCG
           AGCCTTTTTTCGACTGTCGAGTGGATCGCCGC
            GCCTTTGTTCGACTGTCGAGTGGATCGCCGCT
             CCTTTGTTCGACTGTCGAGTGGATCGCCGCTA

Labels: consensus sequence, reference sequence, heterozygous SNP, mismatch
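As a rough illustration of mapping, the sketch below places each read at the ungapped position with the fewest mismatches and then counts the bases seen at a given position. It takes the first long sequence on the slide as the reference, which is an assumption, and the brute-force search is a teaching device, not how production aligners work.

```python
from collections import Counter

# First long sequence on the slide, used here as the reference (assumption).
REFERENCE = "AAAACGCGCTTAGCCTTTTTTCGACTGTCGAGTGGAACGCCGCTAGCTAGGCGC"
READS = [
    "TAGCCTTTTTTCGACTGTCGAGTGGATCGCCG",
    "AGCCTTTTTTCGACTGTCGAGTGGATCGCCGC",
    "GCCTTTGTTCGACTGTCGAGTGGATCGCCGCT",
    "CCTTTGTTCGACTGTCGAGTGGATCGCCGCTA",
]


def align_ungapped(reference, read):
    """Return (offset, mismatch_positions) for the best ungapped placement."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        mismatches = [
            offset + i
            for i, base in enumerate(read)
            if reference[offset + i] != base
        ]
        if best is None or len(mismatches) < len(best[1]):
            best = (offset, mismatches)
    return best


def pileup(position, placed_reads):
    """Count the read bases covering one 0-based reference position."""
    return Counter(
        read[position - offset]
        for offset, read in placed_reads
        if offset <= position < offset + len(read)
    )


alignments = [align_ungapped(REFERENCE, read) for read in READS]
placed = [(offset, read) for (offset, _), read in zip(alignments, READS)]

print(REFERENCE)
for (offset, mismatches), read in zip(alignments, READS):
    print(" " * offset + read, mismatches)

# One position where the reads disagree with each other, and one where
# all reads disagree with the reference.
print(pileup(18, placed), pileup(36, placed))
```

A position where roughly half of the reads carry a different base is what the slide labels a heterozygous SNP; a position where every read differs from the reference is a different situation.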

SLIDE 23

Read depth

  • Read depth is also known as depth of coverage
  • The average read depth can differ a lot from the read depth at an individual position!

AAAACGCGCTTAGCCTTTTTTCGACTGTCGAGTGGATCGCCGCTAGCTAGGCGC
          TAGCCTTTTTTCGACTGTCGAGTGGATCGCCG
           AGCCTTTTTTCGACTGTCGAGTGGATCGCCGC
            GCCTTTGTTCGACTGTCGAGTGGATCGCCGCT
             CCTTTGTTCGACTGTCGAGTGGATCGCCGCTA
                      GACTGTCGAGTGGATCGCCGCTAGCTAGG
                        CTGTCGAGTGGATCGCCGCTAGCTAGG

Depth labels in the figure: 5, 7, 1
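The difference between per-position depth and average depth can be computed directly. This sketch uses the reads from the slide with start offsets worked out by lining them up against the long sequence; those offsets are an assumption of the example, not values given in the source.

```python
REFERENCE = "AAAACGCGCTTAGCCTTTTTTCGACTGTCGAGTGGATCGCCGCTAGCTAGGCGC"

# (0-based start offset, read); offsets are assumed, found by lining the
# reads up against the long sequence above.
PLACED_READS = [
    (10, "TAGCCTTTTTTCGACTGTCGAGTGGATCGCCG"),
    (11, "AGCCTTTTTTCGACTGTCGAGTGGATCGCCGC"),
    (12, "GCCTTTGTTCGACTGTCGAGTGGATCGCCGCT"),
    (13, "CCTTTGTTCGACTGTCGAGTGGATCGCCGCTA"),
    (22, "GACTGTCGAGTGGATCGCCGCTAGCTAGG"),
    (24, "CTGTCGAGTGGATCGCCGCTAGCTAGG"),
]

# Read depth = number of reads covering each reference position.
depth = [0] * len(REFERENCE)
for offset, read in PLACED_READS:
    for position in range(offset, offset + len(read)):
        depth[position] += 1

average_depth = sum(depth) / len(depth)
print("depth per position:", depth)
print(f"average {average_depth:.2f}, min {min(depth)}, max {max(depth)}")
```

In this toy example some positions are not covered at all while others are covered by every read, so the average alone says little about any single position.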

SLIDE 24

Accuracy, error rate, quality score

  • Single base error rate = total number of mismatched bases found in mapped sequence reads from a sequencing run, divided by the mappable yield.
  • Quality scores (Q scores / Phred scores)
      - derived from an examination of the intensity peaks around each base
      - range from 0 to 41; higher corresponds to higher quality
      - Q = -10 log10(p), where p is the base-call error probability

Quality score   Probability of incorrect base call   Base call accuracy
10 (Q10)        1 in 10                              90%
20 (Q20)        1 in 100                             99%
30 (Q30)        1 in 1000                            99.9%
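The relation between Q score and error probability can be checked with a short sketch. The function names are chosen for this example, and the ASCII offset of 33 used for decoding quality characters is an assumption; other encodings use a different offset.

```python
import math


def phred_from_error_probability(p):
    """Q = -10 * log10(p), with p the base-call error probability."""
    return -10 * math.log10(p)


def error_probability_from_phred(q):
    """Inverse of the formula above: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_quality_string(quality, offset=33):
    """Convert ASCII quality characters to Q scores (offset is assumed)."""
    return [ord(character) - offset for character in quality]


for q in (10, 20, 30):
    p = error_probability_from_phred(q)
    print(f"Q{q}: error probability {p:g}, accuracy {100 * (1 - p):g}%")
```

Each step of 10 in Q corresponds to a tenfold lower error probability, which is the pattern in the table.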

SLIDE 25

Traditional vs NextGen Sequencing

  • Sanger sequencing: 1 sequence read per basepair
  • NGS: multiple sequence reads per basepair

SLIDE 26

LNA Genomics core facility at ErasmusMC www.biomics.nl w.vanijcken@erasmusmc.nl

Erasmus Center for Biomics