
SLIDE 1

Next Generation Sequencing in Molecular Diagnostics

Wilfred van IJcken, PhD
Erasmus MC Center for Biomics
Nov 2 2017
Molecular Diagnostics Course XI


SLIDE 2

Learning objectives

Next generation sequencing (NGS):
- The basics
- Illumina sequencing technology
- Terminology
- Enrichment technology
Clinical applications
- Targeted gene panels vs exome vs whole genome
- NIPT
- Future directions

SLIDE 3

Next next next generation sequencing…

1st generation sequencing technique: amplified multiple molecule seq
Sanger sequencing
2nd generation sequencing techniques: amplified single molecule seq
454 sequencing - Roche
SBS sequencing - Illumina
SOLiD sequencing - Applied Biosystems/Life Technologies
Ion Torrent - Life Technologies
3rd generation sequencing techniques: Single molecule seq
Helicos tSMS
PacBio SMRT (real time DNA seq)
NanoPore Technologies

SLIDE 4

NGS systems on the market

[Figure: systems grouped as High Throughput, Special and Desktop]

SLIDE 5

Sequence technology dynamics

[Figure: systems grouped as High Throughput, Special and Desktop]

SLIDE 6

What is next generation sequencing?

Sequencing technology developed after Sanger
Millions of reads in parallel (MPS)
Shorter (<400bp) sequencing reads
Enables analysis of complex mixtures of DNA or RNA
Enables genome wide approach
Different vendors with different approaches
MPS = massively parallel sequencing

SLIDE 7

NGS flow

Intake: ID, amount, sex, disease
Isolate: DNA or RNA; blood, plasma, saliva, FFPE, cells
Library: select region of interest; PCR, capture
Sequence: chemistry, enzymes, detection, signal, yield, quality
Report: variation; match phenotype?

SLIDE 8

Illumina systems

[Figure: Illumina systems (MiniSeq, MiSeq, NextSeq 500, HiSeq 2500, HiSeq 4000, HiSeq X Ten, NovaSeq 6000) compared by data amount, purchase cost and run costs; output ranges from 8 Gb to 6 Tb per run]

SLIDE 9

Simplified sample preparation

[Figure: DNA, or RNA converted with reverse transcriptase, with Adaptor 1 and Adaptor 2]

SLIDE 10

Bridge amplification

Each DNA molecule hybridizes at a different location in the flowcell lane

SLIDE 11

Clustering and Sequencing

Cluster growth

[Figure: clusters 1 to 9 are imaged in every cycle (image acquisition); base calling turns the signal of each cluster into a read, e.g. T G C T A C G A T …]

Sequencing: each base is coupled to a different fluorescent dye

SLIDE 12

Output file from basecalling

Many file types: qseq, fastq, etc…
Each system has its own format.
Large file sizes: ~150 million reads per lane

Fields per read: Instrument, Run ID, Lane, Tile, X-coord, Y-coord, Index #, Read #, Sequence, Q-score (ASCII character), PF (0,1)
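The per-read fields listed above can be pulled apart with a few lines of code. The sketch below assumes a qseq-style line: tab-separated, the eleven fields in the order instrument, run ID, lane, tile, X, Y, index, read number, sequence, quality, PF, and quality characters encoded with an offset of 64. The field order, the offset and the FASTQ header layout are assumptions for illustration, not taken from the slide.

```python
from typing import NamedTuple


class QseqRecord(NamedTuple):
    instrument: str
    run_id: str
    lane: int
    tile: int
    x: int
    y: int
    index: str
    read_number: int
    sequence: str
    quality: str
    passed_filter: bool


def parse_qseq_line(line: str) -> QseqRecord:
    """Split one tab-separated line into named fields (assumed field order)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    inst, run, lane, tile, x, y, index, read_no, seq, qual, pf = fields
    return QseqRecord(inst, run, int(lane), int(tile), int(x), int(y),
                      index, int(read_no), seq, qual, pf == "1")


def to_fastq(rec: QseqRecord, offset_in: int = 64) -> str:
    """Render a record as a four-line FASTQ entry with Phred+33 qualities."""
    seq = rec.sequence.replace(".", "N")  # assumed: '.' marks a no-call
    qual = "".join(chr(ord(c) - offset_in + 33) for c in rec.quality)
    header = (f"@{rec.instrument}_{rec.run_id}:{rec.lane}:{rec.tile}:"
              f"{rec.x}:{rec.y}#{rec.index}/{rec.read_number}")
    return f"{header}\n{seq}\n+\n{qual}\n"
```

Usage: call `parse_qseq_line` on each line of a file and keep only records where `passed_filter` is true before converting with `to_fastq`.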

SLIDE 13

Data analysis not trivial due to data volumes and complexity

Data Volume                      Total         Final     Comment

HiSeq 2000, 200G run
Image Data                       32 TB
Intensity Data                   2 TB                    Optionally transferred
Base Call / Quality Score Data   0.25 TB       0.25 TB   1 byte/base (raw) assuming qseq generation offline
Alignment Output                 6 TB (3 TB)   1.2 TB    Remove intermediate files

GAIIx, 50G run
Image Data                       6.9 TB                  Optionally transferred
Intensity Data                   0.93 TB       0.93 TB
Base Call / Quality Score Data   0.17 TB       0.17 TB
Alignment Output                 1.2 TB        1.2 TB

Data storage and compute are needed to handle up to petabytes of data
Core facilities needed

SLIDE 14

Terminology

Next generation sequencing, AKA:
- Deep sequencing
- MPS = massively parallel sequencing

[Figure: one cluster followed over successive imaging cycles gives one read, e.g. T G C T A C G A T …]

Read
Cluster
# of sequencing cycles = read length

SLIDE 15

Single-end, paired-end, index read

[Figure: single read, paired-end read and index read (index e.g. GATCG) drawn on a fragment]

Single read = sequence from one side of the fragment
Paired end = sequence from both sides of the fragment

SLIDE 16

Indexing enables sample multiplexing

Index = different nucleic acid code per sample
- introduced during sample prep
- read during index read
Enables multiple samples in one flowcell lane

[Figure: four index sequences (GATCG, CGTGA, ATCGG, TCTCT), one per sample, for Patient 1 to Patient 4]
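A minimal sketch of how index reads could be used to sort pooled reads back to their samples. The pairing of each index with a patient is assumed from the order on the slide, and the function names and the `max_mismatches` parameter are hypothetical.

```python
from collections import defaultdict

# Assumed pairing of the four slide indexes with the four patients.
SAMPLE_INDEXES = {
    "GATCG": "Patient 1",
    "CGTGA": "Patient 2",
    "ATCGG": "Patient 3",
    "TCTCT": "Patient 4",
}


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("index sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, max_mismatches: int = 0) -> str:
    """Return the sample whose index matches, or 'undetermined'."""
    hits = [sample for index, sample in SAMPLE_INDEXES.items()
            if hamming(index, index_read) <= max_mismatches]
    # An ambiguous or missing match is not assigned to any sample.
    return hits[0] if len(hits) == 1 else "undetermined"


def demultiplex(reads):
    """Group (index_read, sequence) pairs by sample."""
    bins = defaultdict(list)
    for index_read, sequence in reads:
        bins[assign_sample(index_read)].append(sequence)
    return dict(bins)
```

Reads whose index matches no sample, or more than one, end up in the `undetermined` bin rather than being guessed.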

SLIDE 17

Alignment, Mapping

AAAACGCGCTTAGCCTTTTTTCGACTGTCGAGTGGAACGCCGCTAGCTAGGCGC
AAAACGCGCTTAGCCTTTTTTCGACTGTCGAGTGGATCGCCGCTAGCTAGGCGC
          TAGCCTTTTTTCGACTGTCGAGTGGATCGCCG
           AGCCTTTTTTCGACTGTCGAGTGGATCGCCGC
            GCCTTTGTTCGACTGTCGAGTGGATCGCCGCT
             CCTTTGTTCGACTGTCGAGTGGATCGCCGCTA

Consensus sequence Reference sequence Heterozygous SNP mismatch
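The slide's example can be reproduced with a toy pileup. This sketch assumes the first sequence on the slide is the reference (the labels do not survive extraction well enough to be sure), places each read without gaps at the offset with the fewest mismatches, and uses a naive rule for zygosity. Real pipelines use dedicated aligners and variant callers; the function names here are hypothetical.

```python
from collections import Counter

REFERENCE = "AAAACGCGCTTAGCCTTTTTTCGACTGTCGAGTGGAACGCCGCTAGCTAGGCGC"
READS = [
    "TAGCCTTTTTTCGACTGTCGAGTGGATCGCCG",
    "AGCCTTTTTTCGACTGTCGAGTGGATCGCCGC",
    "GCCTTTGTTCGACTGTCGAGTGGATCGCCGCT",
    "CCTTTGTTCGACTGTCGAGTGGATCGCCGCTA",
]


def best_offset(reference: str, read: str) -> int:
    """Brute-force ungapped placement: offset with the fewest mismatches."""
    def mismatches(offset: int) -> int:
        window = reference[offset:offset + len(read)]
        return sum(r != q for r, q in zip(window, read))
    return min(range(len(reference) - len(read) + 1), key=mismatches)


def pileup(reference: str, reads):
    """Count the bases observed at every reference position."""
    columns = [Counter() for _ in reference]
    for read in reads:
        start = best_offset(reference, read)
        for i, base in enumerate(read):
            columns[start + i][base] += 1
    return columns


def call_variants(reference: str, reads):
    """Report positions where reads disagree with the reference."""
    calls = {}
    for pos, counts in enumerate(pileup(reference, reads)):
        depth = sum(counts.values())
        alt = {b: n for b, n in counts.items() if b != reference[pos]}
        if not alt:
            continue
        alt_base, alt_n = max(alt.items(), key=lambda item: item[1])
        # Naive rule: every read carries the alternative base -> homozygous.
        zygosity = "homozygous" if alt_n == depth else "heterozygous"
        calls[pos] = (reference[pos], alt_base, alt_n, depth, zygosity)
    return calls
```

Under these assumptions `call_variants(REFERENCE, READS)` reports two positions: one where half of the reads carry G instead of T (the heterozygous SNP) and one where all reads carry T instead of A (the mismatch with the reference).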

SLIDE 18

Read depth

Average read depth can differ a lot from the read depth at a given position!

AAAACGCGCTTAGCCTTTTTTCGACTGTCGAGTGGATCGCCGCTAGCTAGGCGC
          TAGCCTTTTTTCGACTGTCGAGTGGATCGCCG
           AGCCTTTTTTCGACTGTCGAGTGGATCGCCGC
            GCCTTTGTTCGACTGTCGAGTGGATCGCCGCT
             CCTTTGTTCGACTGTCGAGTGGATCGCCGCTA
                      GACTGTCGAGTGGATCGCCGCTAGCTAGG
                        CTGTCGAGTGGATCGCCGCTAGCTAGG

Read depth labels in the figure: 5, 7, 1

Aka depth of coverage
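The difference between average depth and per-position depth can be computed directly from the example on this slide. The sketch assumes ungapped placement of each read at the offset with the fewest mismatches; `depth_per_position` is a hypothetical helper, not part of any sequencing toolkit.

```python
REFERENCE = "AAAACGCGCTTAGCCTTTTTTCGACTGTCGAGTGGATCGCCGCTAGCTAGGCGC"
READS = [
    "TAGCCTTTTTTCGACTGTCGAGTGGATCGCCG",
    "AGCCTTTTTTCGACTGTCGAGTGGATCGCCGC",
    "GCCTTTGTTCGACTGTCGAGTGGATCGCCGCT",
    "CCTTTGTTCGACTGTCGAGTGGATCGCCGCTA",
    "GACTGTCGAGTGGATCGCCGCTAGCTAGG",
    "CTGTCGAGTGGATCGCCGCTAGCTAGG",
]


def best_offset(reference: str, read: str) -> int:
    """Brute-force ungapped placement: offset with the fewest mismatches."""
    def mismatches(offset: int) -> int:
        window = reference[offset:offset + len(read)]
        return sum(r != q for r, q in zip(window, read))
    return min(range(len(reference) - len(read) + 1), key=mismatches)


def depth_per_position(reference: str, reads):
    """Number of reads covering each reference position."""
    depth = [0] * len(reference)
    for read in reads:
        start = best_offset(reference, read)
        for pos in range(start, start + len(read)):
            depth[pos] += 1
    return depth


depth = depth_per_position(REFERENCE, READS)
average_depth = sum(depth) / len(depth)
print(f"average {average_depth:.2f}, min {min(depth)}, max {max(depth)}")
```

In this toy example the average hides that the first and last bases of the reference are not covered at all, while the middle is covered by every read.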

SLIDE 19

Accuracy, error rate, quality score

Single base error rate = total number of mismatched bases found in mapped sequence reads from a sequencing run, divided by the mappable yield

Quality scores (Q scores / phred scores)
- derived from an examination of the intensity peaks around each base
- range from 0 – 41, higher corresponds to higher quality
- Q = -10 log10(p), where p is the base call error probability

Quality score   Probability of incorrect base call   Base call accuracy
10 (Q10)        1 in 10                              90%
20 (Q20)        1 in 100                             99%
30 (Q30)        1 in 1000                            99.9%
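The relation between Q score and error probability, and the error rate definition from this slide, written out as a short sketch. The function names are chosen here for illustration.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Invert Q = -10 log10(p): p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Q = -10 log10(p) for a base call error probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


def single_base_error_rate(mismatched_bases: int, mappable_yield: int) -> float:
    """Mismatched bases in mapped reads divided by the mappable yield (in bases)."""
    return mismatched_bases / mappable_yield


for q in (10, 20, 30):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:g}, accuracy {100 * (1 - p):g}%")
```

The loop reproduces the three rows of the table above.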

SLIDE 20

NGS systems on the market

[Figure: systems grouped as High Throughput, Special and Desktop]

Different characteristics
- Sequencing technology
- Read length
- Speed
- Output
- Applications
- Run cost

SLIDE 21

NGS Applications

- Whole genome de novo sequencing
- Epigenetic profiling (DNA methylation)
- Gene expression analysis
- Discovery of novel transcripts, splice variants, miRNAs
- Protein-DNA/RNA interactions (ChIPSeq)
- Genomic DNA interactions (3C, 4C, 5C Seq)
- Targeted DNA sequencing
- Exome sequencing
- Whole genome re-sequencing

Clinical use

SLIDE 22

Diagnostic applications

Targeted sequencing: cardiomyopathies, ciliopathies, cancer hotspot panel, Noonan, neurodegenerative diseases, …

Exome sequencing: unknown disease, de novo

Whole genome sequencing: unknown disease, non-exonic

Non-invasive diagnostics: prenatal plasma, T21 testing (NIPT)

Cancer sequencing: germline mutations, therapy

HLA typing: transplantation

SLIDE 23

Enrichment technology

Exome = all coding regions (~ exons) of genome

SLIDE 24

Choose your baits

Agilent, Nimblegen (Roche), Illumina, IDT, …

exome, panel or other targets

CRE: boosted coverage for ~5000 clinically relevant genes

Exome performance
- Target coverage: >20X coverage for 95% of genes
- Even coverage: read depth distribution
- Specificity of capture
- False pos / neg variants
- High homology genes

[Figure labels: gene; V4, CRE, halo]

SLIDE 25

Exome data analysis overview

Mapping, Coverage, Sanger + Copy number, Variants + Filtering

Exome depth
Mapping %, on/off target
% >20x, min, max, bases not sequenced
bases <20x add Sanger amplicons
low frequency variants + indel
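The coverage checks in the list above (% at 20x, min, max, bases not sequenced, and regions below 20x that need Sanger amplicons) can be sketched as follows. The input is a list of per-base depths over the target. Treating "covered" as depth >= 20 and "needs Sanger" as depth < 20 is an assumption, and both function names are hypothetical.

```python
def coverage_summary(depths, threshold=20):
    """Summary statistics for per-base read depths over a target region."""
    if not depths:
        raise ValueError("no depths given")
    n = len(depths)
    covered = sum(1 for d in depths if d >= threshold)
    return {
        "mean": sum(depths) / n,
        "min": min(depths),
        "max": max(depths),
        "pct_at_threshold": 100.0 * covered / n,
        "not_sequenced": sum(1 for d in depths if d == 0),
    }


def low_coverage_intervals(depths, threshold=20):
    """Half-open (start, end) intervals with depth below the threshold.

    These are the candidate regions for additional Sanger amplicons.
    """
    intervals = []
    start = None
    for pos, d in enumerate(depths):
        if d < threshold and start is None:
            start = pos
        elif d >= threshold and start is not None:
            intervals.append((start, pos))
            start = None
    if start is not None:
        intervals.append((start, len(depths)))
    return intervals
```

For the toy depths `[0, 5, 25, 30, 40, 19, 18, 22, 50, 0]` the mean is above 20 although half of the bases fall below the threshold, which is why the per-base metrics are reported next to the average.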

GATK: SNP + indel
Annotation: >100 databases, function

Inheritance

Dominant, recessive, etc

SLIDE 26

Quality

High throughput
ISO 15189/17025 accreditation needed for clinical use in NL
Sample swap is a real possibility
Spike-in to uniquely identify each sample after sequencing

[Figure: samples A1, B1, C1 pass through Shear, Capture and Sequencing, with QC steps and a spike-in]

SLIDE 27

What does a targeted sequencing result look like?

SLIDE 28

Zoom in on the sequence result

SLIDE 29

Variation is not only SNP

SNPs
GATTTAGATCGCGATAGAG
GATTTAGATCTCGATAGAG
~0.1% of the genome differs between any two individuals due to SNPs

Short InDels
GATT------------GAG
GATTTAGATCTCGATAGAG
More difficult to detect than SNPs

Structural variants (SVs)
[e.g. kb-Mb-sized deletions, insertions, inversions, fusion genes]
presumably >0.1% of the genome

SLIDE 30

Recent Case report

2005: 5-week-old girl hospitalized with RS virus, on artificial respiration
2008: Developmental delay, possibly due to brain damage by hypoxia
2011: Re-evaluation by clinical geneticist: possibly Sotos syndrome. SNP array, Sanger NSD1, PTEN, AOA, fraX, metabolism: negative
2015: Re-evaluation: speech affected. WES trio, filter for ID genes: de novo c.1216C>T, p.Gln406* mutation in MECP2
> atypical form of Rett syndrome
2016: Rett specialist: 5 other girls found with atypical Rett syndrome with C-terminal frameshift mutations in MECP2 (unpublished)

WES helps to solve previously unsolved cases
Evidence is increasing for using WES as first-tier care

SLIDE 31

Human and disease, what to sequence?

Most Mendelian diseases are caused by mutations in the exome
Exome is only ~1.6% of the human genome (50 Mbp)
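The ~1.6% figure follows from simple arithmetic. The sketch below assumes a human genome size of about 3.1 Gbp, which is not stated on the slide. The mean depths in the second calculation are illustrative only.

```python
GENOME_BP = 3.1e9  # assumption: approximate size of the human genome
EXOME_BP = 50e6    # 50 Mbp, from the slide


def fraction_of_genome(target_bp: float, genome_bp: float = GENOME_BP) -> float:
    """Share of the genome covered by a target of the given size."""
    return target_bp / genome_bp


def bases_to_sequence(target_bp: float, mean_depth: float) -> float:
    """On-target bases needed to reach a mean depth over the target."""
    return target_bp * mean_depth


exome_pct = 100 * fraction_of_genome(EXOME_BP)
print(f"exome is {exome_pct:.1f}% of the genome")

# Illustrative depths: 100x over the exome versus 30x over the whole genome.
print(f"exome at 100x: {bases_to_sequence(EXOME_BP, 100):.2e} bases")
print(f"genome at 30x: {bases_to_sequence(GENOME_BP, 30):.2e} bases")
```

The comparison shows why restricting sequencing to the exome needs far fewer bases, even at a higher depth.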

                 Panel     Exome    Whole genome
Genome           >0.01%    1.6%     95%
Sequencing       1/400x    1x       60x
Interpretation   ++        +        +/-
Validation       ++        +        +/-
Speed            ++        +
Cost (est.)      € 500     € 700    € 3000

SLIDE 32

Whole genome sequencing

X Ten: $1000 genome, 30x
Outsource: $1000 genome, 40x
?

SLIDE 33

Comparison of exome and genome sequencing

SLIDE 34

Non invasive trisomy testing (NIPT)

DNA isolation Prepare NGS Analysis Trisomy Report

At 10 weeks of pregnancy: 5% fetal DNA

SLIDE 35

NIPT: determine fetal chromosomal copy number

[Figure: fetal and maternal cfDNA from Chr 21, comparing a euploid pregnancy with a fetal trisomy]
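Why a 5% fetal fraction still allows trisomy detection can be worked out with a small calculation. In a euploid pregnancy both mother and fetus contribute two copies of Chr 21; with a fetal trisomy the fetal part contributes three, so the Chr 21 share of reads rises by a factor 1 + f/2 for fetal fraction f. The z-score comparison against euploid reference samples is one commonly used approach and is an assumption here, not necessarily the method behind the slide. All read fractions below are made-up illustration values.

```python
from statistics import mean, stdev


def expected_chr21_ratio(fetal_fraction: float) -> float:
    """Chr 21 read share in a trisomy 21 pregnancy relative to euploid."""
    maternal_copies = (1 - fetal_fraction) * 2
    fetal_copies = fetal_fraction * 3
    return (maternal_copies + fetal_copies) / 2


def chr21_z_score(sample_fraction: float, reference_fractions) -> float:
    """Deviation of a sample from euploid references, in standard deviations."""
    return (sample_fraction - mean(reference_fractions)) / stdev(reference_fractions)


# Hypothetical Chr 21 read fractions of euploid reference samples.
reference = [0.01300, 0.01305, 0.01295, 0.01300, 0.01310, 0.01290]

# Hypothetical trisomy sample at 5% fetal fraction: a 2.5% relative increase.
sample = 0.01300 * expected_chr21_ratio(0.05)
z = chr21_z_score(sample, reference)
print(f"expected ratio {expected_chr21_ratio(0.05):.3f}, z-score {z:.2f}")
```

Because the expected increase is only 2.5%, the call depends on counting enough reads to keep the spread among reference samples small; a cutoff such as z > 3 is often used, but the threshold is a lab-specific choice.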

SLIDE 36

Future of NGS

SLIDE 37

MinION

USB sized sequencer
One time use
$900
500 nanopores
> 1 Gbp
User defined runtime
Electrode lifetime is limiting (days)

No sample prep
Measure directly from blood

SLIDE 38

Longer read length, e.g. Sequel

Enables first pass de novo assembly, phasing, epigenetics
Single molecule real time (SMRT) technology
1M ZMWs, 500 Mb – 1 Gb
No amplification bias
Read length max 60 kbp
High accuracy due to sequencing the same molecule multiple times
Epigenetic characterization

Pathogens, mobile elements, plant/animal, polyploid genomes, phased chromosomes, missing heritability

SLIDE 39

Personalized medicine

consent, ethical, juridical, psychological, reimbursement me

Doctor here is my genome and variation…

Short term
- Becomes routine for cancer
Long term
- Routine workup of a child, like the heel stick
- Every genome sequenced
- Baseline for health
- Genome gives knowledge of susceptibility
- Needs to be combined with conventional info

SLIDE 40