SNPs and Human Diseases XV Robert Kraaij Department of Internal - - PowerPoint PPT Presentation



slide-1
SLIDE 1

DepthOfCoverage

NGS technologies SNPs and Human Diseases XV

Robert Kraaij Department of Internal Medicine r.kraaij@erasmusmc.nl

slide-2
SLIDE 2

What will NGS bring us?

  • RFLP
  • TaqMan
  • Array
  • Array and Imputation
  • Regional Sequencing
  • Full Genome Sequencing

slide-3
SLIDE 3
  • First Generation: a bit of history
  • Next (Second) Generation
  • Third Generation
slide-4
SLIDE 4

1977: Maxam & Gilbert Sequencing

Walter Gilbert from wikipedia.org

slide-5
SLIDE 5

Maxam & Gilbert Sequencing

G G+A C+T C
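The four lanes are read in combination: a band in both G and G+A means G, a band in G+A only means A, and likewise for C and C+T. A minimal sketch of that logic, assuming each gel position is summarized as the set of lanes showing a band (the function names `call_base_from_lanes` and `read_gel` are hypothetical):

```python
def call_base_from_lanes(lanes_with_band):
    """Infer one base from the Maxam-Gilbert lanes that show a band."""
    lanes = set(lanes_with_band)
    if lanes == {"G", "G+A"}:
        return "G"  # cleaved in both the G and the purine (G+A) reaction
    if lanes == {"G+A"}:
        return "A"  # purine, but not G
    if lanes == {"C", "C+T"}:
        return "C"  # cleaved in both the C and the pyrimidine (C+T) reaction
    if lanes == {"C+T"}:
        return "T"  # pyrimidine, but not C
    return "N"      # inconsistent or missing bands


def read_gel(positions):
    """Read a sequence from a list of per-position lane sets."""
    return "".join(call_base_from_lanes(p) for p in positions)
```

For example, `read_gel([{"G", "G+A"}, {"G+A"}, {"C+T"}, {"C", "C+T"}])` gives `"GATC"`.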

slide-6
SLIDE 6
slide-7
SLIDE 7

1977: Sanger Sequencing

Frederick Sanger from wikipedia.org

slide-8
SLIDE 8

G A T C

Sanger Sequencing

slide-9
SLIDE 9

Sanger sequencing landmarks

from wikipedia.org

  • 1977

bacteriophage φX174 5.4 kb

  • 1984

Epstein-Barr virus 170 kb

  • 1995

Haemophilus influenzae 1.8 Mb

  • 2001

Human 3 Gb

slide-10
SLIDE 10

  • June 26th, 2000: working draft, 95% sequenced
  • April 14th, 2003: finished, 99% sequenced
  • Costs: $2.7 billion (instead of $3 billion)
  • Timing: 1990 - 2003 (instead of 2005)

Bill Clinton, Tony Blair, Craig Venter, Francis Collins

The Human Genome Project

slide-11
SLIDE 11
  • First Generation: a bit of history
  • Next (Second) Generation
  • Third Generation
slide-12
SLIDE 12

Next Generation: Illumina

slide-13
SLIDE 13

Sequencing Workflow

  • DNA isolation
  • Library preparation
  • Sequencing
  • Data analysis

slide-14
SLIDE 14

Sequencing Workflow

  • DNA isolation
  • Library preparation
  • Sequencing
  • Data analysis

slide-15
SLIDE 15

Sequencing Workflow

  • DNA isolation
  • Library preparation
  • Sequencing
  • Data analysis

slide-16
SLIDE 16

Illumina sequencing

  • fragment DNA
  • clonal amplification on flowcell by bridge PCR
  • sequencing-by-synthesis
slide-17
SLIDE 17

Bridge amplification

slide-18
SLIDE 18

Illumina sequencing

  • fragment DNA
  • clonal amplification on flowcell by bridge PCR
  • sequencing-by-synthesis
slide-19
SLIDE 19

Sequencing by synthesis

slide-20
SLIDE 20

Sequencing by synthesis

slide-21
SLIDE 21

Per Cycle Imaging

slide-22
SLIDE 22

G A T C

Per Cycle Imaging

slide-23
SLIDE 23

G good quality G poor quality

Per Cycle Base Calling
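The difference between a good and a poor quality call can be illustrated with a toy base caller. This is a minimal sketch, not the instrument's actual algorithm: it assumes one intensity value per base channel for a single cluster in a single cycle, calls the brightest channel, and uses a simple purity ratio (brightest signal divided by the sum of the two brightest) as a stand-in for call quality. The function name and the intensity values are illustrative assumptions.

```python
def call_cluster_base(intensities):
    """Call a base from per-channel intensities, e.g. {"G": 900, "A": 50, ...}.

    Returns (base, purity), where purity is between 0.5 (ambiguous)
    and 1.0 (one clean signal).
    """
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (best_base, best), (_, second) = ranked[0], ranked[1]
    total = best + second
    purity = best / total if total > 0 else 0.0
    return best_base, purity


# One dominant channel: a confident G.
good_call = call_cluster_base({"G": 900, "A": 50, "T": 30, "C": 20})
# Two channels of similar strength: still called G, but with low purity.
poor_call = call_cluster_base({"G": 400, "A": 350, "T": 30, "C": 20})
```

Both clusters are called G, but only the first has a purity close to 1.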

slide-24
SLIDE 24

Phred Score   Incorrect base   Accuracy
10            1 in 10          90 %
20            1 in 100         99 %
30            1 in 1000        99.9 %
40            1 in 10000       99.99 %
50            1 in 100000      99.999 %

Scores 0 to 93 are stored as ASCII 33 to 126 = a single character per base

Quality Scoring
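The table follows from the definition Q = -10 log10(P), where P is the probability that the base call is wrong. A short sketch of the conversions, including the ASCII encoding with offset 33 (the function names are illustrative):

```python
import math


def phred_to_error_prob(q):
    """Probability that a base with Phred score q is incorrect."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def phred_to_char(q, offset=33):
    """Encode a Phred score (0 to 93) as one printable ASCII character."""
    if not 0 <= q <= 93:
        raise ValueError("Phred score must be between 0 and 93")
    return chr(q + offset)


def char_to_phred(c, offset=33):
    """Decode one quality character back to its Phred score."""
    return ord(c) - offset
```

With this encoding, score 0 is written as `!` (ASCII 33) and score 93 as `~` (ASCII 126).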

slide-25
SLIDE 25

@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTC
+SEQ_ID
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>

FASTQ File
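Each FASTQ record is four lines: a header starting with `@`, the sequence, a separator starting with `+`, and one quality character per base. A minimal parser sketch, assuming the common offset-33 quality encoding and unwrapped (single-line) sequences; `parse_fastq` is an illustrative name, not a library function:

```python
from io import StringIO


def parse_fastq(handle, offset=33):
    """Yield (name, sequence, phred_scores) for each 4-line FASTQ record."""
    while True:
        header = handle.readline().strip()
        if not header:
            return  # end of input
        sequence = handle.readline().strip()
        separator = handle.readline().strip()
        quality = handle.readline().strip()
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], sequence, [ord(c) - offset for c in quality]


# The record shown on the slide.
EXAMPLE = (
    "@SEQ_ID\n"
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTC\n"
    "+SEQ_ID\n"
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>\n"
)

records = list(parse_fastq(StringIO(EXAMPLE)))
```

In this record the first base has quality character `!`, which decodes to Phred 0, so the read starts with a very unreliable call.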

slide-26
SLIDE 26

Alignment or Mapping of Reads

[Figure: sequencing reads aligned to the reference genome (hg19)]

chromosome + position + strand

sample.bam
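Mapping assigns each read a chromosome, a position and a strand. The idea can be sketched with exact string matching against a toy reference; real aligners use indexed data structures and tolerate mismatches, so this is only an illustration. The reference string and the name `chrTest` are made up for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def map_read(read, reference, chrom="chrTest"):
    """Return (chromosome, 1-based position, strand), or None if unmapped."""
    pos = reference.find(read)
    if pos != -1:
        return chrom, pos + 1, "+"
    # Try the opposite strand: the read may come from the reverse strand.
    pos = reference.find(reverse_complement(read))
    if pos != -1:
        return chrom, pos + 1, "-"
    return None


TOY_REFERENCE = "GATTACGGTACTTGCATAGCT"  # hypothetical reference fragment
forward_hit = map_read("TACGGTACTTGC", TOY_REFERENCE)
reverse_hit = map_read("GCAAGTACC", TOY_REFERENCE)
```

Here the first read maps to the forward strand at position 4 and the second to the reverse strand at position 7.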

slide-27
SLIDE 27

Run QC and filtering

sample.bam

slide-28
SLIDE 28

sample.bam

  • both reads
  • quality scores
  • chromosome
  • position
  • quality flag
  • duplicate flag
  • off target flag

sorted BAM file
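In a BAM file several of these properties are packed into one integer FLAG field, one bit per property. The sketch below decodes the standard SAM flag bits and applies a simple filter. The off-target flag from the list above is not one of the standard flag bits and is assumed to be handled separately; the function names and the choice of which reads to drop are illustrative.

```python
# Standard SAM FLAG bits.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def keep_read(flag):
    """Keep reads that are mapped, pass QC and are not marked as duplicates."""
    return not (flag & (0x4 | 0x200 | 0x400))
```

For example, `decode_flag(99)` describes the first read of a properly paired fragment whose mate lies on the reverse strand.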

slide-29
SLIDE 29

Coverage

[Figure: overlapping reads stacked over the same region of the reference]

5x coverage
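Coverage at a position is the number of reads that overlap it. A small sketch that computes per-position depth from read intervals, assuming 0-based, end-exclusive coordinates (the interval values in the usage note are made up):

```python
def depth_per_position(intervals, ref_length):
    """Count how many reads cover each reference position.

    intervals: iterable of (start, end) pairs, 0-based, end exclusive.
    """
    depth = [0] * ref_length
    for start, end in intervals:
        for pos in range(max(start, 0), min(end, ref_length)):
            depth[pos] += 1
    return depth


def mean_depth(depth):
    """Average coverage over the region."""
    return sum(depth) / len(depth) if depth else 0.0
```

Three reads at `(0, 5)`, `(2, 8)` and `(4, 10)` on a 10-base region give a depth of 3 only at position 4, where all three overlap.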

slide-30
SLIDE 30

Variant Calling

[Figure: reads aligned to the reference; the reads carry G at a position where the reference has A]

G = homozygous alternative

slide-31
SLIDE 31

Variant Calling

[Figure: reads aligned to the reference; some reads carry A and others G at the variant position]

A/G = heterozygous
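The two cases above (all reads showing G, or a mix of A and G) can be told apart by counting the bases observed at the position. This is a deliberately naive sketch, not how production variant callers work: it assumes an arbitrary minimum allele fraction of 0.2 to ignore rare bases, and the function name is hypothetical.

```python
from collections import Counter


def call_genotype(bases, ref, min_fraction=0.2):
    """Call a genotype from the bases observed at one position.

    bases: string of read bases covering the position, e.g. "GAGAG".
    ref: the reference base at that position.
    """
    counts = Counter(bases)
    depth = sum(counts.values())
    if depth == 0:
        return "no call"
    # Keep alleles seen in at least min_fraction of the reads.
    alleles = sorted(b for b, n in counts.items() if n / depth >= min_fraction)
    if alleles == [ref]:
        return "homozygous reference"
    if len(alleles) == 1:
        return f"{alleles[0]} = homozygous alternative"
    if len(alleles) == 2 and ref in alleles:
        return "/".join(alleles) + " = heterozygous"
    return "complex"
```

With reference base A, five reads `"GGGGG"` are called homozygous alternative and `"GAGAG"` is called A/G heterozygous.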

slide-32
SLIDE 32

Variant Calling

[Figure: only a few reads cover the variant position; they show a mix of A and G]

A/G = heterozygous?
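The question mark reflects low coverage. As a worked example, assume a truly heterozygous site where each read shows either allele with probability 0.5 and there are no sequencing errors. The chance that all n reads happen to show the same allele, so that the site looks homozygous, is 2 × 0.5^n = 0.5^(n-1). The function name below is illustrative.

```python
def prob_het_looks_homozygous(depth):
    """Probability that all reads at a true heterozygous site show one allele.

    Assumes unbiased sampling of the two alleles and no sequencing errors.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 0.5 ** (depth - 1)
```

Under these assumptions the probability is 0.25 at 3 reads and 0.0625 at 5 reads, and it keeps halving with every additional read.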

slide-33
SLIDE 33

Variant Calling

[Figure: the same low-coverage reads, with each G call marked as good or poor sequencing quality]
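One simple way to account for sequencing quality is to ignore poor-quality base calls before counting alleles. A minimal sketch, assuming per-base Phred scores are available and using an arbitrary threshold of 20 (both the threshold and the function name are assumptions for illustration):

```python
def filter_by_quality(bases, quals, min_q=20):
    """Keep only the bases whose Phred score is at least min_q."""
    return "".join(b for b, q in zip(bases, quals) if q >= min_q)
```

For three reads showing `"GAG"` with scores `[35, 38, 8]`, only `"GA"` remains, so a single good-quality G supports the variant instead of two.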

slide-34
SLIDE 34

                   MiniSeq       MiSeq         NextSeq500    HiSeq2500
Read length        2 x 150 b     2 x 300 b     2 x 150 b     2 x 125 b
Output             6.6 Gb        13 Gb         100 Gb        450/900 Gb
Clusters           22M           22M           0.4B          2B/4B
Run time           1 day         3 days        1 day         6 days
Instrument price   50k$          100k€         250k€         700k$

Cost per whole genome (WG), as listed: 4250 $/WG, 3500 $/WG

Illumina: Normal flow cell technology
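The output figures are roughly the number of clusters times the number of bases read per cluster. A quick check of that arithmetic, assuming paired-end runs (two reads per cluster) and that every cluster yields full-length reads, which overestimates real output somewhat:

```python
def expected_output_gb(clusters, read_length_b, reads_per_cluster=2):
    """Upper-bound run output in gigabases (Gb)."""
    return clusters * reads_per_cluster * read_length_b / 1e9


miniseq_gb = expected_output_gb(22e6, 150)   # 22M clusters, 2 x 150 b
hiseq2500_gb = expected_output_gb(2e9, 125)  # 2B clusters, 2 x 125 b
```

This gives 6.6 Gb for the MiniSeq figures and 500 Gb for the HiSeq2500 figures, close to the 6.6 Gb and 450 Gb listed.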

slide-35
SLIDE 35

                   HiSeq4000     HiSeqX Five   HiSeqX Ten    NovaSeq6000
Read length        2 x 150 b     2 x 150 b     2 x 150 b     2 x 150 b
Output             0.65/1.3 Tb   0.8/1.6 Tb    0.8/1.6 Tb    0.85/1.7 Tb
Clusters           2/4 B         2.5/5 B       2.5/5 B       2.8/5.6 B
Run time           4 days        3 days        3 days        2 days
Instrument price   900k$         5 x 1.2M$     10 x 1M€      1M€
Cost per WG        2500 $/WG     1500 $/WG     1000 $/WG     1200 $/WG

Illumina: Patterned flow cell technology
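Run output translates into a number of whole genomes per run once a target depth is chosen. A rough sketch, assuming a 3 Gb human genome (as on the Sanger landmarks slide) and a target of 30x coverage; the 30x figure is an assumption, not from the slides, and the calculation ignores duplicates and unmapped reads:

```python
import math


def genomes_per_run(output_tb, genome_gb=3.0, target_depth=30):
    """Whole genomes that fit in one run at the target depth."""
    bases_needed_gb = genome_gb * target_depth  # 90 Gb per genome at 30x
    return math.floor(output_tb * 1000 / bases_needed_gb)
```

Under these assumptions a 1.7 Tb run holds 18 genomes and a 1.6 Tb run holds 17.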

slide-36
SLIDE 36

Illumina: Patterned flow cell technology

  • Patterned flowcell
  • Billions of nanowells
  • Extreme high density
  • No overlapping clusters
  • Special polymerase?
  • ExAmp clustering
  • primer swaps
slide-37
SLIDE 37
  • First Generation: a bit of history
  • Next (Second) Generation
  • Third Generation
slide-38
SLIDE 38

Next Generation: Roche 454

slide-39
SLIDE 39

Roche 454

  • fragment DNA
  • clonal amplification on bead by emPCR
  • load beads in PicoTiterPlate
  • sequencing-by-synthesis

slide-40
SLIDE 40

Ion Torrent

slide-41
SLIDE 41

Ion Torrent

  • fragment DNA
  • clonal amplification on bead by emPCR
  • load beads on chip
  • sequencing-by-synthesis

slide-42
SLIDE 42
  • First Generation: a bit of history
  • Next (Second) Generation
  • Third Generation
slide-43
SLIDE 43

Third generation sequencing = single molecule sequencing

slide-44
SLIDE 44

Third Generation: PacBio

  • last week update: bought by Illumina

RS, Sequel

slide-45
SLIDE 45

SMRT technology

  • Library prep
  • Circular DNA
  • SMRT cell
slide-46
SLIDE 46
  • no DNA amplification
  • real-time imaging of DNA polymerase
  • sequencing-by-synthesis

PacBio

slide-47
SLIDE 47

SMRT technology

  • >10kb reads
  • 1 Gb output
  • Better chemistry
  • De novo assembly
  • Haplotyping
  • Variant calling

Posted February 10, 2014 The Genomics Resource Center University of Maryland http://www.igs.umaryland.edu

slide-48
SLIDE 48

Oxford Nanopore

slide-49
SLIDE 49

Oxford Nanopore

slide-50
SLIDE 50

Oxford Nanopore

slide-51
SLIDE 51

Oxford Nanopore

ACCCGTCCG

  • 6 bases in pore
  • 6x base calling
  • Caller development: community
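One reading of "6 bases in pore, 6x base calling" is that the signal at any moment reflects a 6-base window, so each base contributes to six consecutive windows as the strand moves through the pore. A small sketch of that sliding window, using the sequence from the slide (the function names are illustrative, and k = 6 is taken from the bullet above):

```python
def kmers(seq, k=6):
    """All overlapping k-base windows of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmers_covering(seq, index, k=6):
    """The windows that include the base at the given 0-based index."""
    return [km for i, km in enumerate(kmers(seq, k)) if i <= index < i + k]


windows = kmers("ACCCGTCCG")
possible_states = 4 ** 6  # distinct 6-mers a base caller must tell apart
```

The 9-base example yields four windows, and in a longer read every interior base falls in six windows. A base caller has to distinguish 4^6 = 4096 possible 6-mers.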
slide-52
SLIDE 52

Oxford Nanopore

  • High error rate, but major improvement in 2017…
slide-53
SLIDE 53