SLIDE 1 DepthOfCoverage
NGS technologies SNPs and Human Diseases XV
Robert Kraaij Department of Internal Medicine r.kraaij@erasmusmc.nl
SLIDE 2
What will NGS bring us?
- RFLP
- TaqMan
- Array
- Array and Imputation
- Regional Sequencing
- Full Genome Sequencing
SLIDE 3
- First Generation: a bit of history
- Next (Second) Generation
- Third Generation
SLIDE 4 1977: Maxam & Gilbert Sequencing
Walter Gilbert from wikipedia.org
SLIDE 5
Maxam & Gilbert Sequencing
G G+A C+T C
SLIDE 6
SLIDE 7 1977: Sanger Sequencing
Frederick Sanger from wikipedia.org
SLIDE 8
G A T C
Sanger Sequencing
SLIDE 9 Sanger sequencing landmarks
from wikipedia.org
bacteriophage φX174 5.4 kb
Epstein-Barr virus 170 kb
Haemophilus influenzae 1.8 Mb
Human 3 Gb
SLIDE 10
- June 26th, 2000: working draft, 95% sequenced
- April 14th, 2003: finished, 99% sequenced
- Costs: $2.7 billion (instead of $3 billion)
- Timing: 1990 - 2003 (instead of 2005)
Bill Clinton, Tony Blair, Craig Venter, Francis Collins
The Human Genome Project
SLIDE 11
- First Generation: a bit of history
- Next (Second) Generation
- Third Generation
SLIDE 12
Next Generation: Illumina
SLIDE 13 Sequencing Workflow
DNA isolation Library preparation Sequencing Data analysis
SLIDE 14 Sequencing Workflow
DNA isolation Library preparation Sequencing Data analysis
SLIDE 15 Sequencing Workflow
DNA isolation Library preparation Sequencing Data analysis
SLIDE 16 Illumina sequencing
- fragment DNA
- clonal amplification on flowcell by bridge PCR
- sequencing-by-synthesis
SLIDE 17
Bridge amplification
SLIDE 18 Illumina sequencing
- fragment DNA
- clonal amplification on flowcell by bridge PCR
- sequencing-by-synthesis
SLIDE 19
Sequencing by synthesis
SLIDE 20
Sequencing by synthesis
SLIDE 21
Per Cycle Imaging
SLIDE 22
G A T C
Per Cycle Imaging
SLIDE 23
G good quality G poor quality
Per Cycle Base Calling
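A minimal sketch of per-cycle base calling, assuming each cluster yields one intensity value per base channel in every cycle. The function `call_base` and the intensity values are hypothetical illustrations, not Illumina's actual algorithm: the brightest channel gives the base, and its share of the total signal serves as a rough indicator of good versus poor quality.

```python
def call_base(intensities):
    """Pick the brightest channel and report its share of the total signal.

    `intensities` maps each base (A, C, G, T) to a signal value for one
    cluster in one cycle. The numbers used below are made up for illustration.
    """
    total = sum(intensities.values())
    if total <= 0:
        return "N", 0.0  # no usable signal, so no call
    base = max(intensities, key=intensities.get)
    purity = intensities[base] / total
    return base, purity


# A clean cluster: nearly all signal sits in the G channel.
good = {"A": 20.0, "C": 15.0, "G": 950.0, "T": 15.0}
# A mixed cluster: G is still brightest, but A contributes a lot of signal.
poor = {"A": 300.0, "C": 50.0, "G": 400.0, "T": 50.0}

for label, signal in (("good", good), ("poor", poor)):
    base, purity = call_base(signal)
    print(label, base, round(purity, 2))
```

Both clusters are called as G, but the lower purity of the second call is what a quality score would penalise.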
SLIDE 24
Phred Score   Incorrect base   Accuracy
10            1 in 10          90 %
20            1 in 100         99 %
30            1 in 1000        99.9 %
40            1 in 10000       99.99 %
50            1 in 100000      99.999 %
Phred 0 to 93 = ASCII 33 to 126 = single character
Quality Scoring
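The Phred table follows from Q = -10 * log10(P), where P is the probability that the base call is incorrect. A short sketch of that relation and of the ASCII offset of 33 (the function names are illustrative):

```python
import math


def phred_to_error(q):
    """Probability that a base call with Phred score q is incorrect."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred score for an error probability p."""
    return -10 * math.log10(p)


def encode_phred33(q):
    """Encode a Phred score (0 to 93) as a single ASCII character (33 to 126)."""
    if not 0 <= q <= 93:
        raise ValueError("Phred+33 covers scores 0 to 93 only")
    return chr(q + 33)


def decode_phred33(ch):
    """Decode a single quality character back to its Phred score."""
    return ord(ch) - 33


for q in (10, 20, 30, 40, 50):
    p = phred_to_error(q)
    print(q, f"1 in {round(1 / p)}", f"{100 * (1 - p):.3f} %", repr(encode_phred33(q)))
```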
SLIDE 25
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTC
+SEQ_ID
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>
FASTQ File
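A FASTQ record has four lines: header, bases, separator and one quality character per base. A minimal parser sketch for the record shown on this slide, assuming Phred+33 encoding and unwrapped four-line records (`parse_fastq` is an illustrative helper, not a library function):

```python
def parse_fastq(text):
    """Yield (name, sequence, phred_scores) for each 4-line FASTQ record."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ records must have exactly four lines each")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Phred+33: subtract 33 from each character code to get the score.
        yield header[1:], seq, [ord(c) - 33 for c in qual]


RECORD = "\n".join([
    "@SEQ_ID",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTC",
    "+SEQ_ID",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>",
])

records = list(parse_fastq(RECORD))
for name, seq, quals in records:
    print(name, len(seq), "bases, lowest Q:", min(quals), "highest Q:", max(quals))
```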
SLIDE 26 Alignment or Mapping of Reads
read          TACGGTACTTGCATA
reference  GATTACGGTACTTGCATAGCT
REFERENCE GENOME (HG19)
chromosome + position + strand
sample.bam
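Mapping assigns each read a chromosome, a position and a strand. The sketch below is a toy exact-match search on the example reference from this slide. It is not how production aligners work (they use genome indexes and tolerate mismatches and gaps), and the name `chrToy` is a made-up placeholder.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def map_read(read, reference, chrom="chrToy"):
    """Return (chromosome, 1-based position, strand), or None if unmapped.

    Exact matching only; a toy stand-in for a real aligner.
    """
    pos = reference.find(read)
    if pos != -1:
        return chrom, pos + 1, "+"
    pos = reference.find(reverse_complement(read))
    if pos != -1:
        return chrom, pos + 1, "-"
    return None


REFERENCE = "GATTACGGTACTTGCATAGCT"

forward_read = "TACGGTACTTGCATA"
# Simulate a read sequenced from the opposite strand.
reverse_read = reverse_complement("GGTACTTGCATAGCT")

print(map_read(forward_read, REFERENCE))
print(map_read(reverse_read, REFERENCE))
print(map_read("CCCCCCCC", REFERENCE))
```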
SLIDE 27 Run QC and filtering
sample.bam
SLIDE 28 sample.bam
- both reads
- quality scores
- chromosome
- position
- quality flag
- duplicate flag
- off target flag
sortedBAM file
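In SAM/BAM files, several of these per-read properties are packed into one integer FLAG field. A sketch of decoding it with the bit values from the SAM specification; the filter in `keep_read` is an illustrative choice, and on-target versus off-target status is assumed here to come from a separate comparison with the target regions rather than from a FLAG bit.

```python
# FLAG bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def keep_read(flag):
    """Illustrative filter: drop unmapped, QC-failed and duplicate reads."""
    return not flag & (0x4 | 0x200 | 0x400)


for flag in (99, 1123):
    print(flag, decode_flag(flag), "keep" if keep_read(flag) else "drop")
```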
SLIDE 29
Coverage
reference  GATTACGGTACTTGCATAGCT
read          TACGGTACTTGCATA
read           ACGGTACTTGCATAG
read       GATTACGGTACTTGC
read             GGTACTTGCATAGCT
read         TTACGGTACTTGCAT
5x coverage
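Coverage (depth) at a position is the number of reads overlapping it. A sketch that recomputes it for the five reads on this slide, placing each read by exact match (a simplification that only works because these example reads match the reference perfectly):

```python
REFERENCE = "GATTACGGTACTTGCATAGCT"
READS = [
    "TACGGTACTTGCATA",
    "ACGGTACTTGCATAG",
    "GATTACGGTACTTGC",
    "GGTACTTGCATAGCT",
    "TTACGGTACTTGCAT",
]


def depth_per_position(reference, reads):
    """Count how many reads cover each reference position."""
    depth = [0] * len(reference)
    for read in reads:
        start = reference.find(read)  # exact match; fine for this toy example
        if start == -1:
            continue  # read does not map, so it adds no coverage
        for i in range(start, start + len(read)):
            depth[i] += 1
    return depth


depth = depth_per_position(REFERENCE, READS)
print(depth)
print("max depth:", max(depth), "mean depth:", round(sum(depth) / len(depth), 2))
```

Only the middle of the region reaches 5x; the edges are covered by fewer reads, which is why mean and per-position coverage differ.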
SLIDE 30
Variant Calling
reference  GATTACGGTACTTGCATAGCT
read          TACGGTGCTTGCATA
read           ACGGTGCTTGCATAG
read       GATTACGGTGCTTGC
read             GGTGCTTGCATAGCT
read         TTACGGTGCTTGCAT
G = homozygous alternative
G A T T A C G G T G C C G G T G C T T G C A T A G C T G C A T A G C T - A T T A C G G T G C T T G C A
SLIDE 31
Variant Calling
reference  GATTACGGTACTTGCATAGCT
read          TACGGTGCTTGCATA
read           ACGGTGCTTGCATAG
read       GATTACGGTACTTGC
read             GGTGCTTGCATAGCT
read         TTACGGTACTTGCAT
A/G = heterozygous
G A T T A C G G T A C C G G T G C T T G C A T A G C T G C A T A G C T - A T T A C G G T G C T T G C A
SLIDE 32
Variant Calling
reference  GATTACGGTACTTGCATAGCT
read          TACGGTGCTTGCATA
read           ACGGTGCTTGCATAG
read       GATTACGGTACTTGC
A/G = heterozygous?
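The three variant-calling slides can be mimicked with a naive allele-fraction rule. This is only a sketch: the reads are rebuilt from the slide pileups, and the thresholds (`min_depth=5`, heterozygous between 20 % and 80 % alternative alleles) are arbitrary assumptions. Real callers also weigh base and mapping quality.

```python
REFERENCE = "GATTACGGTACTTGCATAGCT"
VARIANT_POS = 9   # 0-based position of the A/G site in the toy reference
READ_LEN = 15


def make_read(start, allele):
    """Rebuild a slide read: reference bases with `allele` at the variant site."""
    bases = list(REFERENCE[start:start + READ_LEN])
    bases[VARIANT_POS - start] = allele
    return start, "".join(bases)


def pileup_column(reads, pos):
    """Collect the base each overlapping read shows at reference position pos."""
    return [seq[pos - start] for start, seq in reads if start <= pos < start + len(seq)]


def call_genotype(bases, ref_base, min_depth=5, het_range=(0.2, 0.8)):
    """Return (genotype, enough_depth) from the fraction of non-reference bases."""
    depth = len(bases)
    if depth == 0:
        return "no call", False
    alt_fraction = sum(b != ref_base for b in bases) / depth
    low, high = het_range
    if alt_fraction < low:
        genotype = "homozygous reference"
    elif alt_fraction > high:
        genotype = "homozygous alternative"
    else:
        genotype = "heterozygous"
    return genotype, depth >= min_depth


slide30 = [make_read(s, "G") for s in (3, 4, 0, 6, 2)]
slide31 = [make_read(s, a) for s, a in ((3, "G"), (4, "G"), (0, "A"), (6, "G"), (2, "A"))]
slide32 = slide31[:3]  # same site, but only three reads

for name, reads in (("slide 30", slide30), ("slide 31", slide31), ("slide 32", slide32)):
    column = pileup_column(reads, VARIANT_POS)
    print(name, "".join(column), call_genotype(column, REFERENCE[VARIANT_POS]))
```

With only three reads the call is still "heterozygous", but the depth check fails, which is the question mark on the slide.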
SLIDE 33
Variant Calling
reference  GATTACGGTACTTGCATAGCT
read          TACGGTGCTTGCATA
read           ACGGTGCTTGCATAG
read       GATTACGGTACTTGC
G
sequencing quality good poor
SLIDE 34
                   MiniSeq       MiSeq         NextSeq500    HiSeq2500
Read length        2 x 150 b     2 x 300 b     2 x 150 b     2 x 125 b
Output             6.6 Gb        13 Gb         100 Gb        450/900 Gb
Clusters           22M           22M           0.4B          2B/4B
Run time           1 day         3 days        1 day         6 days
Instrument price   50k$          100k€         250k€         700k$
Cost per whole genome (WG), two values given: 4250 $/WG, 3500 $/WG
Illumina: Normal flow cell technology
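The output figures translate into expected coverage as output divided by genome size. A back-of-the-envelope sketch, assuming a 3 Gb human genome and a 30x target per genome (the 30x target is an assumption, and losses from duplicates, off-target reads and quality filtering are ignored):

```python
HUMAN_GENOME_GB = 3.0   # human genome size, as on the Sanger landmarks slide
TARGET_COVERAGE = 30    # assumed target depth per genome, not from the slides

# Output per run in Gb; for the HiSeq2500 the larger of 450/900 Gb is used.
OUTPUT_GB = {"MiniSeq": 6.6, "MiSeq": 13, "NextSeq500": 100, "HiSeq2500": 900}


def mean_coverage(output_gb, genome_gb=HUMAN_GENOME_GB):
    """Average depth if all output from one run came from a single genome."""
    return output_gb / genome_gb


def genomes_per_run(output_gb, target=TARGET_COVERAGE):
    """Whole genomes per run at the target depth, rounded down."""
    return int(mean_coverage(output_gb) // target)


for instrument, output in OUTPUT_GB.items():
    print(instrument, f"{mean_coverage(output):.1f}x", genomes_per_run(output), "genome(s) per run")
```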
SLIDE 35
                   HiSeq4000     HiSeqX Five   HiSeqX Ten    NovaSeq6000
Read length        2 x 150 b     2 x 150 b     2 x 150 b     2 x 150 b
Output             0.65/1.3 Tb   0.8/1.6 Tb    0.8/1.6 Tb    0.85/1.7 Tb
Clusters           2/4 B         2.5/5 B       2.5/5 B       2.8/5.6 B
Run time           4 days        3 days        3 days        2 days
Instrument price   900k$         5 x 1.2M$     10 x 1M€      1M€
Cost per WG        2500 $/WG     1500 $/WG     1000 $/WG     1200 $/WG
Illumina: Patterned flow cell technology
SLIDE 36 Illumina: Patterned flow cell technology
- Patterned flowcell
- Billions of nanowells
- Extreme high density
- No overlapping clusters
- Special polymerase?
- ExAmp clustering
- primer swaps
SLIDE 37
- First Generation: a bit of history
- Next (Second) Generation
- Third Generation
SLIDE 38
Next Generation: Roche 454
SLIDE 39 Roche 454
- fragment DNA
- clonal amplification on bead by emPCR
- load beads in PicoTiterPlate
- sequencing-by-synthesis
SLIDE 40
Ion Torrent
SLIDE 41 Ion Torrent
- fragment DNA
- clonal amplification on bead by emPCR
- load beads on chip
- sequencing-by-synthesis
SLIDE 42
- First Generation: a bit of history
- Next (Second) Generation
- Third Generation
SLIDE 43
Third generation sequencing = single molecule sequencing
SLIDE 44 Third Generation: PacBio
- update from last week: acquisition by Illumina announced
RS Sequel
SLIDE 45 SMRT technology
- Library prep
- Circular DNA
- SMRT cell
SLIDE 46
- no DNA amplification
- real-time imaging of DNA polymerase synthesis
PacBio
SLIDE 47 SMRT technology
- >10kb reads
- 1 Gb output
- Better chemistry
- De novo assembly
- Haplotyping
- Variant calling
Posted February 10, 2014, The Genomics Resource Center, University of Maryland, http://www.igs.umaryland.edu
SLIDE 48
Oxford Nanopore
SLIDE 49
Oxford Nanopore
SLIDE 50
Oxford Nanopore
SLIDE 51 Oxford Nanopore
ACCCGTCCG
- 6 bases in pore
- 6x base calling
- Caller development Community
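With about 6 bases in the pore at once, the signal at any moment reflects a 6-mer, and each base in the interior of a strand contributes to 6 consecutive 6-mers as the strand moves through. This is one way to read "6x base calling" and is an interpretation, not a description of the actual base caller. A sketch using the sequence on the slide:

```python
def kmers(seq, k=6):
    """All overlapping k-mers, in the order they would pass through the pore."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def windows_per_base(seq, k=6):
    """For each base, count how many k-mer windows include it."""
    counts = [0] * len(seq)
    for start in range(len(seq) - k + 1):
        for i in range(start, start + k):
            counts[i] += 1
    return counts


SLIDE_SEQ = "ACCCGTCCG"
print(kmers(SLIDE_SEQ))
print(windows_per_base(SLIDE_SEQ))

# On a longer strand, interior bases reach the full 6 observations.
longer = "ACGT" * 5
print(max(windows_per_base(longer)))
```

The slide sequence is only 9 bases long, so no base reaches 6 windows there; the full count only appears on longer strands.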
SLIDE 52 Oxford Nanopore
- High error rate, but major improvement in 2017…
SLIDE 53