Next Generation Sequencing in Molecular Diagnostics



slide-1
SLIDE 1

Next Generation Sequencing in Molecular Diagnostics

Wilfred van IJcken, PhD Erasmus MC Center for Biomics Nov 2 2017 Molecular Diagnostics Course XI

Center for Biomics

slide-2
SLIDE 2

Learning objectives

Next generation sequencing (NGS):

  • The basics
  • Illumina sequencing technology
  • Terminology
  • Enrichment technology

Clinical applications:

  • Targeted gene panels vs exome vs whole genome
  • NIPT
  • Future directions

slide-3
SLIDE 3

Next next next generation sequencing…

  • 1st generation sequencing technique: amplified multiple molecule sequencing
  • - Sanger sequencing
  • 2nd generation sequencing techniques: amplified single molecule sequencing
  • - 454 sequencing (Roche)
  • - SBS sequencing (Illumina)
  • - SOLiD sequencing (Applied Biosystems / Life Technologies)
  • - Ion Torrent (Life Technologies)
  • 3rd generation sequencing techniques: single molecule sequencing
  • - Helicos tSMS
  • - PacBio SMRT (real-time DNA sequencing)
  • - Oxford Nanopore Technologies
slide-4
SLIDE 4

NGS systems on the market

[Figure: sequencers grouped as High Throughput, Special and Desktop]

slide-5
SLIDE 5

Sequence technology dynamics

[Figure: sequencers grouped as High Throughput, Special and Desktop]

slide-6
SLIDE 6

What is next generation sequencing?

  • Sequencing technology developed after Sanger
  • Millions of reads in parallel (MPS = massively parallel sequencing)
  • Shorter (<400 bp) sequencing reads
  • Enables analysis of complex mixtures of DNA or RNA
  • Enables genome-wide approach
  • Different vendors with different approaches
slide-7
SLIDE 7

NGS flow

  • Intake: ID, amount, sex, disease
  • Isolate: DNA or RNA from blood, plasma, saliva, FFPE, cells
  • Library: select region of interest (PCR, capture)
  • Sequence: chemistry, enzymes, detection, signal, yield, quality
  • Report: variation, match phenotype?

slide-8
SLIDE 8

Illumina systems

[Figure: Illumina instruments (MiniSeq, MiSeq, NextSeq 500, HiSeq 2500, HiSeq 4000, HiSeq X Ten, NovaSeq 6000) compared by data amount, purchase cost and run costs; output labels range from 8 Gb to 6 Tb per run]

slide-9
SLIDE 9

Simplified sample preparation

[Figure: RNA is converted to DNA by reverse transcriptase; Adaptor 1 and Adaptor 2 are added to the two ends of each DNA fragment]

slide-10
SLIDE 10

Bridge amplification

Each DNA molecule hybridizes at a different location in the flowcell lane

slide-11
SLIDE 11

Clustering and Sequencing

Cluster growth

[Figure: clusters are imaged once per sequencing cycle (cycles 1 to 9); image acquisition followed by base calling yields the read T G C T A C G A T …]

Sequencing: each base has a different fluorescent dye coupled
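The idea that every base carries its own dye can be illustrated with a toy base caller. This is only a sketch under strong assumptions: one signal value per dye per cycle, and the brightest dye wins. The intensity values are made up, and real base callers also correct for effects such as dye crosstalk, which are ignored here.

```python
BASES = "ACGT"  # assumed channel order: one dye (channel) per base


def call_base(signals):
    """Return the base whose dye channel is brightest in one cycle."""
    brightest = max(range(len(BASES)), key=lambda channel: signals[channel])
    return BASES[brightest]


def call_read(cycles):
    """Call one base per sequencing cycle for a single cluster."""
    return "".join(call_base(signals) for signals in cycles)


# Made-up (A, C, G, T) intensities for one cluster over nine cycles.
cluster = [
    (0.1, 0.0, 0.2, 0.9),
    (0.1, 0.1, 0.8, 0.1),
    (0.0, 0.9, 0.1, 0.1),
    (0.2, 0.1, 0.0, 0.7),
    (0.9, 0.1, 0.1, 0.0),
    (0.1, 0.8, 0.2, 0.0),
    (0.0, 0.2, 0.9, 0.1),
    (0.7, 0.1, 0.1, 0.2),
    (0.1, 0.1, 0.0, 0.8),
]

read = call_read(cluster)
print(read)
```

Nine cycles give a nine-base read, matching the read shown on the slide.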

slide-12
SLIDE 12

Output file from basecalling

  • Many file types: qseq, fastq, etc.
  • Each system has its own format
  • Large file sizes: ~150 million reads per lane

Fields of a qseq record: Instrument, Run ID, Lane, Tile, X-coord, Y-coord, Index #, Read #, Sequence, Q-score (one ASCII character per base), PF (0,1)
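The fields listed above can be read with a few lines of Python. This is a minimal sketch, assuming the usual tab-separated qseq layout with eleven columns in this order; the example record (instrument name, coordinates, quality string) is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class QseqRecord:
    instrument: str
    run_id: str
    lane: int
    tile: int
    x: int
    y: int
    index: str
    read_number: int
    sequence: str
    quality: str          # one ASCII character per base
    passed_filter: bool   # PF flag: 1 = passed, 0 = failed


def parse_qseq_line(line):
    """Split one tab-separated qseq line into its eleven fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, found {len(fields)}")
    return QseqRecord(
        instrument=fields[0],
        run_id=fields[1],
        lane=int(fields[2]),
        tile=int(fields[3]),
        x=int(fields[4]),
        y=int(fields[5]),
        index=fields[6],
        read_number=int(fields[7]),
        sequence=fields[8],
        quality=fields[9],
        passed_filter=fields[10] == "1",
    )


# Invented example record, not taken from a real run.
example = "M1\t7\t1\t12\t1043\t2210\tGATCG\t1\tTGCTACGAT\tIIIIIIIII\t1\n"
record = parse_qseq_line(example)
print(record.lane, record.sequence, record.passed_filter)
```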

slide-13
SLIDE 13

Data analysis not trivial due to data volumes and complexity

Data                              Volume Total   Final     Comment

HiSeq 2000, 200G run
  Image Data                      32 TB
  Intensity Data                  2 TB                     Optionally transferred
  Base Call / Quality Score Data  0.25 TB        0.25 TB   1 byte/base (raw), assuming qseq generation offline
  Alignment Output                6 TB (3 TB)    1.2 TB    Remove intermediate files

GAIIx, 50G run
  Image Data                      6.9 TB                   Optionally transferred
  Intensity Data                  0.93 TB        0.93 TB
  Base Call / Quality Score Data  0.17 TB        0.17 TB
  Alignment Output                1.2 TB         1.2 TB

Need data storage and compute to handle up to petabytes of data. Core facilities needed.

slide-14
SLIDE 14

Terminology

  • Next generation sequencing, AKA:
  • - Deep sequencing
  • - MPS = massively parallel sequencing

[Figure: a cluster imaged over sequencing cycles 1 to 9 yields the read T G C T A C G A T …]

Number of sequencing cycles = read length

slide-15
SLIDE 15

Single-end, paired-end, index read

  • Single read = sequence from one side of the fragment
  • Paired end = sequence from both sides of the fragment
  • Index read = separate short read of the sample index (e.g. GATCG)

slide-16
SLIDE 16

Indexing enables sample multiplexing

Index = different nucleic acid code per sample

  • introduced during sample prep
  • read during index read

Enables multiple samples in one flowcell lane

[Figure: four indexes (GATCG, CGTGA, ATCGG, TCTCT) for four samples (Patient 1, Patient 2, Patient 3, Patient 4)]
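Sample multiplexing can be sketched as a lookup of each index read in a table of known indexes. The pairing of indexes with patients below follows the order in which they appear on the slide and is an assumption; the read sequences are invented. Only exact index matches are accepted here, whereas production demultiplexers usually tolerate a mismatch.

```python
from collections import defaultdict

# Assumed pairing of index sequence to sample, in slide order.
SAMPLE_INDEXES = {
    "GATCG": "Patient 1",
    "CGTGA": "Patient 2",
    "ATCGG": "Patient 3",
    "TCTCT": "Patient 4",
}


def demultiplex(reads, sample_indexes):
    """Group (index_read, sequence) pairs by the sample the index belongs to."""
    bins = defaultdict(list)
    for index_read, sequence in reads:
        sample = sample_indexes.get(index_read, "undetermined")
        bins[sample].append(sequence)
    return dict(bins)


# Invented reads from one flowcell lane: (index read, sequence read).
lane_reads = [
    ("GATCG", "TAGCCTTTT"),
    ("TCTCT", "AGCCTTTTT"),
    ("GATCG", "GCCTTTGTT"),
    ("AAAAA", "CCTTTGTTC"),  # index not in the table
]

by_sample = demultiplex(lane_reads, SAMPLE_INDEXES)
for sample, sequences in sorted(by_sample.items()):
    print(sample, sequences)
```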

slide-17
SLIDE 17

Alignment, Mapping

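    AAAACGCGCTTAGCCTTTTTTCGACTGTCGAGTGGAACGCCGCTAGCTAGGCGC
    AAAACGCGCTTAGCCTTTTTTCGACTGTCGAGTGGATCGCCGCTAGCTAGGCGC
              TAGCCTTTTTTCGACTGTCGAGTGGATCGCCG
               AGCCTTTTTTCGACTGTCGAGTGGATCGCCGC
                GCCTTTGTTCGACTGTCGAGTGGATCGCCGCT
                 CCTTTGTTCGACTGTCGAGTGGATCGCCGCTA

Figure labels: consensus sequence, reference sequence, heterozygous SNP, mismatch.

The four reads are placed where they match the full-length sequences. Two of the four reads carry G instead of T at one position (heterozygous SNP), and the two full-length sequences differ from each other at one position (A vs T, mismatch).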
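The mapping step can be sketched as sliding each read along the reference and keeping the placement with the fewest mismatches, then counting the bases seen at each position. This is a toy, ungapped illustration rather than how production aligners work, and it assumes the second full-length sequence on the slide is the reference.

```python
from collections import Counter

# Assumption: the second full-length sequence on the slide is the reference.
REFERENCE = "AAAACGCGCTTAGCCTTTTTTCGACTGTCGAGTGGATCGCCGCTAGCTAGGCGC"
READS = [
    "TAGCCTTTTTTCGACTGTCGAGTGGATCGCCG",
    "AGCCTTTTTTCGACTGTCGAGTGGATCGCCGC",
    "GCCTTTGTTCGACTGTCGAGTGGATCGCCGCT",
    "CCTTTGTTCGACTGTCGAGTGGATCGCCGCTA",
]


def best_offset(reference, read):
    """Return (offset, mismatches) of the ungapped placement with fewest mismatches."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        mismatches = sum(a != b for a, b in zip(reference[offset:], read))
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


def pileup(reference, reads):
    """Count the bases observed at every reference position."""
    columns = [Counter() for _ in reference]
    for read in reads:
        offset, _ = best_offset(reference, read)
        for i, base in enumerate(read):
            columns[offset + i][base] += 1
    return columns


def variant_positions(reference, columns):
    """Report 0-based positions where at least one read disagrees with the reference."""
    return {
        pos: dict(counts)
        for pos, counts in enumerate(columns)
        if any(base != reference[pos] for base in counts)
    }


variants = variant_positions(REFERENCE, pileup(REFERENCE, READS))
print(variants)
```

With these four reads, the only disagreeing column is the one where half the reads show T and half show G, which is the pattern expected for a heterozygous SNP.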

slide-18
SLIDE 18

Read depth

  • Read depth (aka depth of coverage) = number of reads covering a position
  • Average read depth can differ a lot from the read depth at a single position!

    AAAACGCGCTTAGCCTTTTTTCGACTGTCGAGTGGATCGCCGCTAGCTAGGCGC
              TAGCCTTTTTTCGACTGTCGAGTGGATCGCCG
               AGCCTTTTTTCGACTGTCGAGTGGATCGCCGC
                GCCTTTGTTCGACTGTCGAGTGGATCGCCGCT
                 CCTTTGTTCGACTGTCGAGTGGATCGCCGCTA
                          GACTGTCGAGTGGATCGCCGCTAGCTAGG
                            CTGTCGAGTGGATCGCCGCTAGCTAGG

Depth values marked in the original figure: 5, 7, 1
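The difference between average and per-position read depth can be made concrete with a short calculation. This sketch uses only the six reads that survive in this transcript, with 0-based start positions worked out by matching each read against the top sequence; these are assumptions, so the numbers need not match the depth labels of the original figure.

```python
REFERENCE = "AAAACGCGCTTAGCCTTTTTTCGACTGTCGAGTGGATCGCCGCTAGCTAGGCGC"

# (assumed 0-based start on the reference, read sequence)
ALIGNED_READS = [
    (10, "TAGCCTTTTTTCGACTGTCGAGTGGATCGCCG"),
    (11, "AGCCTTTTTTCGACTGTCGAGTGGATCGCCGC"),
    (12, "GCCTTTGTTCGACTGTCGAGTGGATCGCCGCT"),
    (13, "CCTTTGTTCGACTGTCGAGTGGATCGCCGCTA"),
    (22, "GACTGTCGAGTGGATCGCCGCTAGCTAGG"),
    (24, "CTGTCGAGTGGATCGCCGCTAGCTAGG"),
]


def read_depth(reference_length, aligned_reads):
    """Count how many reads cover each reference position."""
    depth = [0] * reference_length
    for start, read in aligned_reads:
        for pos in range(start, start + len(read)):
            depth[pos] += 1
    return depth


depth = read_depth(len(REFERENCE), ALIGNED_READS)
average_depth = sum(depth) / len(depth)

print("per-position depth:", depth)
print("average depth:", round(average_depth, 2))
print("min / max depth:", min(depth), "/", max(depth))
```

Here the average is between 3 and 4, while individual positions range from uncovered to six reads deep, which is why a single average can hide poorly covered bases.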

slide-19
SLIDE 19

Accuracy, error rate, quality score

  • Single base error rate = total number of mismatched bases found in mapped sequence reads from a sequencing run, divided by the mappable yield
  • Quality scores (Q scores / Phred scores)
  • - derived from an examination of the intensity peaks around each base
  • - range from 0 to 41; higher corresponds to higher quality
  • - Q = -10 · log10(p), where p is the base call error probability

Quality score   Probability of incorrect base call   Base call accuracy
10 (Q10)        1 in 10                              90%
20 (Q20)        1 in 100                             99%
30 (Q30)        1 in 1000                            99.9%
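The relation between Q score and error probability is easy to check numerically. The sketch below converts in both directions and decodes a quality string; the ASCII offset of 33 is an assumption (it is the common FASTQ convention, while some older Illumina files used 64), and the example quality string is invented.

```python
import math


def phred_to_error_probability(q):
    """p = 10^(-Q/10)"""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Q = -10 * log10(p)"""
    return -10 * math.log10(p)


def decode_quality_string(quality, offset=33):
    """Turn one ASCII character per base into integer Q scores (offset is an assumption)."""
    return [ord(char) - offset for char in quality]


for q in (10, 20, 30):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:g}, accuracy {100 * (1 - p):g}%")

scores = decode_quality_string("I5+")
print(scores)
```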

slide-20
SLIDE 20

NGS systems on the market

[Figure: sequencers grouped as High Throughput, Special and Desktop]

Different characteristics: sequencing technology, read length, speed, output, applications, run cost

slide-21
SLIDE 21

NGS Applications

  • Whole genome de novo sequencing
  • Epigenetic profiling (DNA methylation)
  • Gene expression analysis
  • Discovery of novel transcripts, splice variants, miRNAs
  • Protein-DNA/RNA interactions (ChIP-Seq)
  • Genomic DNA interactions (3C, 4C, 5C Seq)
  • Targeted DNA sequencing
  • Exome sequencing
  • Whole genome re-sequencing

Clinical use

slide-22
SLIDE 22

Diagnostic applications

  • Targeted sequencing: cardiomyopathies, ciliopathies, cancer hotspot panel, Noonan, neurodegenerative diseases, …
  • Exome sequencing: unknown disease, de novo
  • Whole genome sequencing: unknown disease, non-exonic
  • Non-invasive diagnostics: prenatal plasma, T21 testing (NIPT)
  • Cancer sequencing: germline mutations, therapy
  • HLA typing: transplantation

slide-23
SLIDE 23

Enrichment technology

Exome = all coding regions (~ exons) of the genome

slide-24
SLIDE 24

Choose your baits

  • Agilent, Nimblegen (Roche), Illumina, IDT, …: exome, panel or other targets
  • CRE: boosted coverage for ~5000 clinically relevant genes
  • Exome performance
  • - Target coverage: >20x coverage for 95% of genes
  • - Even coverage: read depth distribution
  • - Specificity of capture
  • - False positive / negative variants
  • - High homology genes

[Figure labels: gene; V4, CRE, Halo]

slide-25
SLIDE 25

Exome data analysis overview

Steps: Mapping; Coverage; Sanger + Copy number; Variants + Filtering

  • Exome depth
  • Mapping %, on/off target
  • % >20x, min, max, bases not sequenced
  • Bases <20x: add Sanger amplicons
  • Low frequency variants + indel
  • GATK: SNP + indel
  • Annotation: >100 databases, function
  • Inheritance: dominant, recessive, etc.
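The coverage checks in this overview (% at 20x, min, max, bases not sequenced, and regions below 20x that need Sanger amplicons) can be sketched on a list of per-base depths. The depth values are invented, and "at 20x" is taken to mean a depth of 20 or more, which is an assumption about how the threshold is applied.

```python
def coverage_summary(depths, threshold=20):
    """Basic coverage QC metrics for one target region."""
    n = len(depths)
    return {
        "mean": sum(depths) / n,
        "min": min(depths),
        "max": max(depths),
        # Assumption: a base counts as covered when depth >= threshold.
        "pct_at_threshold": 100.0 * sum(d >= threshold for d in depths) / n,
        "not_sequenced": sum(d == 0 for d in depths),
    }


def low_coverage_intervals(depths, threshold=20):
    """Half-open (start, end) intervals below the threshold: candidates for Sanger amplicons."""
    intervals = []
    start = None
    for pos, depth in enumerate(depths):
        if depth < threshold and start is None:
            start = pos
        elif depth >= threshold and start is not None:
            intervals.append((start, pos))
            start = None
    if start is not None:
        intervals.append((start, len(depths)))
    return intervals


# Invented per-base depths for a ten-base target.
depths = [35, 28, 22, 19, 12, 0, 0, 21, 40, 18]

summary = coverage_summary(depths)
gaps = low_coverage_intervals(depths)
print(summary)
print(gaps)
```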
slide-26
SLIDE 26

Quality

  • High throughput
  • ISO 15189/17025 accreditation needed for clinical use in NL
  • Sample swap is a real possibility
  • Spike-in to uniquely identify each sample after sequencing

[Figure: samples A1, B1, C1; workflow steps Spike-in, Shear, Capture, Sequencing, with two QC checkpoints]
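The spike-in check amounts to comparing, per sample, the spike-in that was added with the one found in the sequencing data. A minimal sketch follows; the spike-in names and the detected values are hypothetical, and how a spike-in is actually detected in the reads is outside this example.

```python
def find_sample_swaps(expected, detected):
    """Return the sample IDs whose detected spike-in differs from the one that was added."""
    return sorted(
        sample
        for sample, spike_in in expected.items()
        if detected.get(sample) != spike_in
    )


# Hypothetical spike-in assignment at intake.
expected = {"A1": "spike_01", "B1": "spike_02", "C1": "spike_03"}
# Hypothetical spike-ins recovered from the sequencing data.
detected = {"A1": "spike_01", "B1": "spike_03", "C1": "spike_02"}

swapped = find_sample_swaps(expected, detected)
print(swapped)
```

In this invented run, B1 and C1 carry each other's spike-in, which flags a swap between those two samples.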

slide-27
SLIDE 27

What does a targeted sequencing result look like?

slide-28
SLIDE 28

Zoom in on the sequence result

slide-29
SLIDE 29

Variation is not only SNP

SNPs

    GATTTAGATCGCGATAGAG
    GATTTAGATCTCGATAGAG

~0.1% of the genomes of any two individuals differ due to SNPs

Short InDels

    GATT------------GAG
    GATTTAGATCTCGATAGAG

More difficult to detect than SNPs

Structural variants (SVs) [e.g. kb-Mb-sized deletions, insertions, inversions, fusion genes]

Presumably >0.1% of the genome

slide-30
SLIDE 30

Recent Case report

  • 2005: 5-week-old girl hospitalized with RS virus, on artificial respiration
  • 2008: Developmental delay, possibly due to brain damage by hypoxia
  • 2011: Re-evaluation by clinical geneticist: possibly Sotos syndrome. SNP array, Sanger NSD1, PTEN, AOA, fraX, metabolism: negative
  • 2015: Re-evaluation: speech affected. WES trio, filtered for ID genes: de novo c.1216C>T, p.Gln406* mutation in MECP2 -> atypical form of Rett syndrome
  • 2016: Rett specialist: 5 other girls found with atypical Rett syndrome with C-terminal frameshift mutations in MECP2 (unpublished)

WES helps to solve previously unsolved cases

Evidence is increasing for the use of WES as first-tier care

slide-31
SLIDE 31

Human and disease, what to sequence?

  • Most Mendelian diseases are caused by exome mutations
  • Exome is only ~1.6% of the human genome (50 Mbp)

                 Panel     Exome    Whole genome
Genome           >0.01%    1.6%     95%
Sequencing       1/400x    1x       60x
Interpretation   ++        +        +/-
Validation       ++        +        +/-
Speed            ++        +
Cost (est.)      € 500     € 700    € 3000
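The exome size and the relative sequencing effort in this comparison follow from simple proportions. The sketch below reproduces them, assuming a human genome length of about 3.1 Gbp (a figure not stated on the slide) and taking sequencing effort as proportional to the fraction of the genome targeted at equal depth.

```python
GENOME_BP = 3.1e9        # assumed human genome length in base pairs
EXOME_FRACTION = 0.016   # exome share of the genome, from the slide
WGS_FRACTION = 0.95      # share covered by whole genome sequencing, from the slide

# Size of the exome in base pairs.
exome_bp = EXOME_FRACTION * GENOME_BP

# Sequencing effort of a whole genome relative to an exome at equal depth.
wgs_vs_exome = WGS_FRACTION / EXOME_FRACTION

print(f"exome size: {exome_bp / 1e6:.1f} Mbp")
print(f"whole genome vs exome sequencing: about {wgs_vs_exome:.0f}x")
```

Both results land close to the rounded slide values of 50 Mbp and 60x.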

slide-32
SLIDE 32

Whole genome sequencing

  • X Ten: $1000 genome, 30x
  • Outsource: $1000 genome, 40x

?

slide-33
SLIDE 33

Comparison of exome and genome sequencing

slide-34
SLIDE 34

Non-invasive prenatal testing (NIPT) for trisomy

Workflow: DNA isolation, Prepare, NGS, Analysis, Trisomy report

10 weeks pregnancy: 5% fetal DNA

slide-35
SLIDE 35

NIPT: determine fetal chromosomal copy number

[Figure: fetal and maternal cfDNA fragments from chr 21 in a euploid pregnancy versus a pregnancy with fetal trisomy]
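Why NIPT needs deep counting becomes clear from the size of the expected signal. With a fetal fraction f, a fetal trisomy 21 raises the chr 21 share of cfDNA by a factor of only 1 + f/2. The sketch below computes that factor and a z-score against euploid reference samples, which is one common way to flag trisomy but not necessarily the analysis used in this course; all chr 21 read fractions are invented.

```python
from statistics import mean, stdev


def expected_chr21_ratio(fetal_fraction):
    """Relative chr 21 signal for fetal trisomy 21 versus a euploid pregnancy.

    Euploid: every genome contributes 2 copies of chr 21.
    Trisomy: the fetal share contributes 3 copies, so the total is
    (1 - f) * 2 + f * 3 = 2 + f, i.e. a ratio of 1 + f / 2.
    """
    return 1 + fetal_fraction / 2


def z_score(sample_fraction, reference_fractions):
    """Standard deviations between the sample and the euploid reference mean."""
    return (sample_fraction - mean(reference_fractions)) / stdev(reference_fractions)


# Invented chr 21 read fractions of euploid reference pregnancies.
reference = [0.01295, 0.01305, 0.01300, 0.01297, 0.01303]

fetal_fraction = 0.05  # 5% fetal DNA, as on the slide
ratio = expected_chr21_ratio(fetal_fraction)
trisomy_sample = mean(reference) * ratio

z = z_score(trisomy_sample, reference)
print(f"expected chr 21 increase: {100 * (ratio - 1):.1f}%")
print(f"z-score of simulated trisomy sample: {z:.1f}")
```

At 5% fetal DNA the chr 21 signal rises by only 2.5%, so the call depends on counting enough reads to keep the spread between euploid samples well below that.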

slide-36
SLIDE 36

Future of NGS

slide-37
SLIDE 37

MinION

  • USB sized sequencer
  • One time use
  • $900
  • 500 nanopores
  • > 1 Gbp
  • User defined runtime
  • Lifetime of electrodes is limiting (days)
  • No sample prep
  • Measure directly from blood

slide-38
SLIDE 38

Longer read length, e.g. Sequel

  • Enables first pass de novo assembly, phasing, epigenetics
  • Single molecule real time (SMRT) technology
  • 1M ZMWs, 500 Mb – 1 Gb
  • No amplification bias
  • Readlength max 60 kbp
  • High accuracy due to multiple sequencing passes of the same molecule
  • Epigenetic characterization

Applications: pathogens, mobile elements, plant/animal genomes, polyploid genomes, phased chromosomes, missing heritability

slide-39
SLIDE 39

Personalized medicine

Issues: consent, ethical, legal, psychological, reimbursement

"Doctor, here is my genome and variation…"

Short term

  • Becomes routine for cancer

Long term

  • Routine workup of a child, like the heel stick
  • Every genome sequenced
  • Baseline for health
  • Genome gives knowledge of susceptibility
  • Needs to be combined with conventional info

slide-40
SLIDE 40

LNA Genomics core facility at ErasmusMC www.biomics.nl w.vanijcken@erasmusmc.nl

Erasmus Center for Biomics