Next Generation Sequencing: an overview of technologies and applications



SLIDE 1

Next Generation Sequencing: an overview of technologies and applications

Matthew Tinning Australian Genome Research Facility

July 2012

SLIDE 2

History of Sequencing: Where have we been?

  • 1869 – Discovery of DNA
  • 1909 – Chemical characterisation
  • 1953 – Structure of DNA solved
  • 1977 – Sanger sequencing invented; first genome sequenced, ΦX174 (5 kb)
  • 1986 – First automated sequencing machine
  • 1990 – Human Genome Project started
  • 1992 – First “sequencing factory” at TIGR

SLIDE 3

History of Sequencing: Where have we been?

  • 1995 – First bacterial genome, H. influenzae (1.8 Mb)
  • 1998 – First animal genome, C. elegans (97 Mb)
  • 2003 – Completion of Human Genome Project (3 Gb): 13 years, $2.7 bn
  • 2005 – First “next-generation” sequencing instrument
  • 2008 – 2356 genome sequences in NCBI database

SLIDE 4

Origin of DNA Sequencing

  • 1977

– First genome (ΦX174)
– Sequencing by synthesis (Sanger)
– Sequencing by degradation (Maxam–Gilbert)

SLIDE 5

Sanger sequencing

  • Uses DNA polymerase
  • All four deoxynucleotides, plus one dideoxynucleotide
  • Termination at random positions, but always at one specific base (wherever the dideoxynucleotide is incorporated)
  • Fragments separated by size using gel electrophoresis
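The termination logic above can be sketched as a toy model of one Sanger reaction that contains a single dideoxynucleotide. The function names are hypothetical, the template is assumed to be written 3'→5' left to right (so the new strand reads 5'→3' in the same order), and fragment lengths are counted from the end of the primer, which is otherwise ignored.

```python
# Toy model of one Sanger reaction containing all four dNTPs plus one ddNTP.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template: str) -> str:
    """Strand made by DNA polymerase: the base-by-base complement.

    Assumes the template is written 3'->5' left to right, so the new
    strand reads 5'->3' in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in template)


def terminated_fragment_lengths(template: str, ddntp: str) -> list[int]:
    """Fragment lengths (counted from the primer) that end in `ddntp`.

    At every position where the new strand needs `ddntp`, some copies
    randomly incorporate the dideoxy form and stop, so each such
    position contributes one fragment length to the population."""
    product = synthesized_strand(template)
    return [i + 1 for i, base in enumerate(product) if base == ddntp]


template = "TACGGATC"  # hypothetical template
for dd in "ACGT":
    print(dd, terminated_fragment_lengths(template, dd))
```

Running the four reactions gives A [1, 7], C [4, 5], G [3, 8] and T [2, 6]; reading fragment lengths 1 to 8 in order recovers the new strand, ATGCCTAG.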

SLIDE 6

Chain extension and termination

SLIDE 7

Chain extension and termination

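  • deoxynucleotide (dNTP): has a 3’-OH, so the chain can be extended
  • dideoxynucleotide (ddNTP): lacks the 3’-OH, so the chain terminates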

SLIDE 8

Fragment identification

SLIDE 9

Gel electrophoresis

SLIDE 10

1986: 4 Reactions to 1 Lane

[Figure: sequencing reaction products; progression of sequencing reaction]
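Running all four terminator reactions in one lane works because each terminating base carries its own dye, so the detector reports both the size order and the base. A minimal sketch, assuming each fragment is reduced to a hypothetical (length, terminal base) pair with the base standing in for the dye colour:

```python
def read_single_lane(fragments: list[tuple[int, str]]) -> str:
    """Rebuild the synthesized strand from dye-labelled fragments.

    Each fragment is (length, terminal_base); the terminal base stands in
    for the dye colour the detector reports. Shorter fragments reach the
    detector first, so ordering by length gives the order of the bases."""
    lengths = [length for length, _ in fragments]
    if len(set(lengths)) != len(lengths):
        raise ValueError("each fragment length should end in exactly one base")
    return "".join(base for _, base in sorted(fragments))


# Products of all four terminator reactions, run together in one lane.
lane = [(4, "C"), (1, "A"), (7, "A"), (3, "G"), (2, "T"), (5, "C"), (8, "G"), (6, "T")]
print(read_single_lane(lane))  # ATGCCTAG
```

Real electropherograms also need mobility and spacing corrections between dyes; the sketch assumes perfectly resolved sizes.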

SLIDE 11

ABI377

SLIDE 12

ABI3730xl

SLIDE 13

Electropherograms

SLIDE 14

Sanger Sequencing

  • Maximum read length: ~900 bases
  • Maximum yield/day: < 2.1 million bases (rapid mode, 500 bp reads)

– < 0.1% of the human genome
– > 1000 days of sequencing for 1-fold coverage
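The two comparisons above follow from dividing the daily yield by the genome size. A small sketch, assuming a ~3 Gb human genome (the figure used elsewhere in this deck) and ignoring run setup and failed reads; the helper name is hypothetical:

```python
HUMAN_GENOME_BP = 3_000_000_000  # ~3 Gb, the genome size used in this deck


def days_for_coverage(yield_per_day_bp: float, coverage: float = 1.0,
                      genome_bp: int = HUMAN_GENOME_BP) -> float:
    """Days of sequencing needed to generate `coverage` x the genome."""
    return coverage * genome_bp / yield_per_day_bp


sanger_bp_per_day = 2_100_000  # rapid mode, 500 bp reads
print(f"{sanger_bp_per_day / HUMAN_GENOME_BP:.3%} of the genome per day")  # 0.070%
print(f"{days_for_coverage(sanger_bp_per_day):.0f} days for 1-fold coverage")  # 1429
```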

SLIDE 15

Human Genome Project

SLIDE 16

Human Genome Project

  • Launched in 1989, expected to take 15 years

– Competing Celera project launched in 1998

  • Genome estimated to be 92% complete

– 1st draft released in 2000
– “Complete” genome released in 2003
– Sequence of last chromosome published in 2006

  • Cost: ~$3 billion

– Celera ~$300 million

SLIDE 17

Shotgun Sequencing Approach

SLIDE 18

Next Gen Sequencing Technologies

SLIDE 19

Next Gen Sequencing Technologies

  • Four platforms, four technologies
  • All massively parallel sequencing

– Sequencing by synthesis
– Sequencing by ligation

  • Read lengths vary from ~36 bp to >400 bp
  • Read numbers vary from ~1 million to ~1 billion per run

SLIDE 20

Next Gen Sequencing Technologies

  • Life Technologies SOLiD
  • Roche GS-FLX
  • Illumina HiSeq
  • Life Technologies Ion Torrent

SLIDE 21

Next Gen Sequencing Library Preparation

SLIDE 22

Roche GS FLX

SLIDE 23

Workflow

Sample Fragmentation → Library Preparation → emPCR Setup → emPCR Amplification → Pyrosequencing → Data Analysis

SLIDE 24

Pyrosequencing

SLIDE 25

emPCR

Emulsion PCR is a method of clonal amplification which allows for millions of unique PCRs to be performed at once through the generation of microreactors.
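For each microreactor to be clonal, most occupied droplets should receive only one template molecule. A common way to reason about this, assumed here rather than stated on the slide, is Poisson loading: if templates are diluted to a mean of λ per microreactor (a hypothetical parameter), the fractions of empty, clonal and mixed reactors follow directly.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a microreactor receives exactly k templates."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def loading_profile(lam: float) -> dict[str, float]:
    """Split microreactors into empty, clonal (1 template) and mixed (2+)."""
    empty = poisson_pmf(0, lam)
    clonal = poisson_pmf(1, lam)
    return {"empty": empty, "clonal": clonal, "mixed": 1.0 - empty - clonal}


for lam in (0.1, 0.3, 1.0):
    p = loading_profile(lam)
    occupied = 1.0 - p["empty"]
    print(f"lambda={lam}: {p['clonal']:.3f} clonal, "
          f"{p['mixed']:.3f} mixed, "
          f"{p['clonal'] / occupied:.1%} of occupied reactors are clonal")
```

The trade-off it shows: lower λ keeps occupied reactors more purely clonal, at the cost of leaving more of them empty.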

SLIDE 26

emPCR

The Water-in-Oil Emulsion

SLIDE 27

Massively Parallel Sequencing

SLIDE 28

Data Analysis

Raw image files (T, A, C and G base flows) → Image processing → Base calling → Quality filtering → SFF file
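In pyrosequencing, each flow delivers one nucleotide, and the light signal scales with how many of that base are incorporated, so a homopolymer run appears as one larger signal. A minimal base-calling sketch, assuming the T, A, C, G flow order shown on the slide and signals already normalised so that one base gives ~1.0 (real 454 software applies further corrections that are not modelled here):

```python
FLOW_ORDER = "TACG"  # base-flow order shown on the slide


def flowgram_to_sequence(signals: list[float], flow_order: str = FLOW_ORDER) -> str:
    """Call bases from normalised per-flow light signals.

    Assumes each flow's signal is scaled so that one incorporated base
    gives ~1.0, a two-base homopolymer ~2.0, and so on; the call is the
    nearest whole number of bases for the nucleotide being flowed."""
    called = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        called.append(nucleotide * max(0, round(signal)))
    return "".join(called)


flows = [1.02, 0.05, 0.97, 2.10, 0.10, 1.00, 0.00, 0.90]  # hypothetical signals
print(flowgram_to_sequence(flows))  # TCGGAG
```

Rounding becomes less reliable as homopolymers get longer, because the gap between, say, 6 and 7 bases is small relative to signal noise.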

SLIDE 29

454 Platform Updates

  • GS20: 100 bp reads, ~20 Mbp/run
  • GSFLX: 250 bp reads, ~100 Mbp/run (7.5 hrs)
  • GSFLX Titanium: 400 bp reads, ~400 Mbp/run (10 hrs)
  • GSFLX Titanium Plus: 700 bp reads, ~700 Mbp/run (18 hrs)
  • GS Junior: 400 bp reads, ~35 Mbp/run (10 hrs)

SLIDE 30

Illumina HiSeq

SLIDE 31

Illumina Sequencing Technology

Robust Reversible Terminator Chemistry Foundation

[Figure: DNA (0.1-1.0 µg) → sample preparation → cluster growth → sequencing → image acquisition → base calling]
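Because a reversible terminator blocks further extension until it is removed, each cycle adds exactly one base per cluster, and the base can be called from the four-channel image for that cycle. A minimal sketch, assuming intensities are already extracted per cluster in a hypothetical [A, C, G, T] channel order; the purity score (brightest over brightest plus second brightest) is a simplification in the spirit of Illumina's chastity filter, not its exact implementation:

```python
BASES = "ACGT"  # channel order assumed for this sketch


def call_cluster(cycles: list[list[float]]) -> tuple[str, list[float]]:
    """Call one base per cycle from four-channel intensities.

    cycles[i] holds the [A, C, G, T] intensities for cycle i. Each cycle
    yields exactly one call: the brightest channel. The second value is
    brightest / (brightest + second brightest) for each cycle."""
    calls, purity = [], []
    for channels in cycles:
        ranked = sorted(range(4), key=lambda i: channels[i], reverse=True)
        top, second = channels[ranked[0]], channels[ranked[1]]
        calls.append(BASES[ranked[0]])
        purity.append(top / (top + second) if top + second > 0 else 0.0)
    return "".join(calls), purity


intensities = [
    [900, 50, 30, 20],   # cycle 1: A bright
    [40, 60, 800, 100],  # cycle 2: G bright
    [20, 30, 40, 700],   # cycle 3: T bright
    [300, 280, 20, 10],  # cycle 4: ambiguous between A and C
]
sequence, purity = call_cluster(intensities)
print(sequence, [round(p, 2) for p in purity])
```

A low purity, as in cycle 4, flags a call that a quality filter might reject.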

SLIDE 32

Platform Updates

  • Solexa 1G: 18 bp reads, ~1 Gbp/run
  • Illumina GA: 36 bp reads, ~3 Gbp/run
  • Illumina GAII: 75 bp paired reads, ~10 Gbp/run (8 days)
  • Illumina GAIIx: 75 bp paired reads, ~40 Gbp/run (8 days)
  • Illumina HiSeq 2000: 100 bp paired reads, ~200 Gbp/run (10 days)
  • Illumina HiSeq, v3 SBS: 100 bp paired reads, ~600 Gbp/run (12 days)
  • MiSeq: 150 bp paired reads, ~1.5 Gbp/run (27 hrs)

Maximum yield/day: ~50 Gbp, ~16x the human genome
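The maximum daily yield comes from the platform with the best ratio of run yield to run time. A small sketch using the listed entries that state a run time, assuming a ~3 Gb human genome; the names are hypothetical:

```python
HUMAN_GENOME_GBP = 3.0  # ~3 Gb

# (platform, Gbp per run, run time in days) for entries with a stated run time
ILLUMINA_RUNS = [
    ("Illumina GAII", 10, 8),
    ("Illumina GAIIx", 40, 8),
    ("Illumina HiSeq 2000", 200, 10),
    ("Illumina HiSeq, v3 SBS", 600, 12),
    ("MiSeq", 1.5, 27 / 24),
]


def gbp_per_day(gbp_per_run: float, run_days: float) -> float:
    """Average yield per day over a whole run."""
    return gbp_per_run / run_days


best_name, best_gbp, best_days = max(ILLUMINA_RUNS, key=lambda r: gbp_per_day(r[1], r[2]))
per_day = gbp_per_day(best_gbp, best_days)
print(f"{best_name}: {per_day:.0f} Gbp/day, "
      f"{per_day / HUMAN_GENOME_GBP:.1f}x the human genome")
```

This prints 50 Gbp/day and 16.7x, which the slide rounds to ~16x.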

SLIDE 33

Applied Biosystems SOLiD

SLIDE 34

Sequencing by Ligation

SLIDE 35

Base Interrogations

SLIDE 36

2 Base encoding
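In 2 base encoding, each colour reports a pair of adjacent bases rather than a single base. The standard SOLiD colour chart can be reproduced by giving each base a 2-bit code (A=0, C=1, G=2, T=3) and XOR-ing neighbouring codes. That formulation is an assumption of this sketch, not something stated on the slide. Decoding needs a known first base:

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def encode(sequence: str) -> list[int]:
    """One colour (0-3) for every overlapping pair of bases."""
    return [CODE[a] ^ CODE[b] for a, b in zip(sequence, sequence[1:])]


def decode(first_base: str, colours: list[int]) -> str:
    """Translate colours back to bases, starting from a known first base."""
    bases = [first_base]
    for colour in colours:
        bases.append(BASES[CODE[bases[-1]] ^ colour])
    return "".join(bases)


reference = "ATGGCA"
snp = "ATGACA"  # single base change at position 3
print(encode(reference))             # [3, 1, 0, 3, 1]
print(encode(snp))                   # [3, 1, 2, 1, 1]: two adjacent colours change
print(decode("A", [3, 2, 0, 3, 1]))  # ATCCGT: one colour error garbles the rest
```

This is why colour-space analysis is usually done against a reference: a true SNP changes two adjacent colours, while an isolated colour change is more likely a measurement error.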

SLIDE 37

emPCR and Enrichment

3’ modification of the template beads allows covalent bonding to the slide surface

SLIDE 38

Platform Updates

  • SOLiD 3: 50 bp paired reads, ~50 Gbp/run (12 days)
  • SOLiD 4: 50 bp paired reads, ~100 Gbp/run (12 days)
  • 5500xl: 75 bp paired reads, ~300 Gbp/run (14 days)

Maximum yield/day: ~21 Gbp (21,000,000,000 bp), ~7x the human genome; ~3.5 hours of sequencing for 1-fold coverage

SLIDE 39

Applied Biosystems: Ion Torrent PGM

SLIDE 40

Ion Torrent

  • Ion semiconductor sequencing
  • Detection of hydrogen ions released during DNA polymerization
  • Sequencing occurs in microwells with ion sensors
  • No modified nucleotides
  • No optics

SLIDE 41

Ion Torrent

  • DNA → Ions → Sequence

– Nucleotides flow sequentially over the Ion semiconductor chip
– One sensor per well per sequencing reaction
– Direct detection of natural DNA extension
– Millions of sequencing reactions per chip
– Fast cycle time, real time detection

[Figure: dNTP incorporation releases H+; ∆pH → ∆Q → ∆V at the well sensor. Labels: sensing layer, sensor plate, silicon substrate, drain, source, bulk, to column receiver]
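Since nucleotides flow one at a time and every incorporated base releases a hydrogen ion, the ideal signal in each flow is the number of bases added, with whole homopolymer runs added in a single flow. A sketch of that idealised forward model, assuming a hypothetical repeating flow order of TACG and a noise-free sensor:

```python
def expected_flowgram(sequence: str, flow_order: str, n_flows: int) -> list[int]:
    """Idealised count of bases incorporated (and so H+ released) per flow.

    `sequence` is the strand being synthesised. During each flow only the
    nucleotide being flowed can be added, and a homopolymer run is added
    in full, so the signal scales with the run length."""
    counts, pos = [], 0
    for i in range(n_flows):
        nucleotide = flow_order[i % len(flow_order)]
        run = 0
        while pos < len(sequence) and sequence[pos] == nucleotide:
            run += 1
            pos += 1
        counts.append(run)
    return counts


print(expected_flowgram("TTCAG", "TACG", 8))  # [2, 0, 1, 0, 0, 1, 0, 1]
```

Measured signals scatter around these integers, so calling the sequence back is the inverse problem of rounding each flow to a whole number of bases.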

SLIDE 42

Ion Torrent: System Updates

  • 314 Chip: 100 bp reads, ~10 Mbp/run (1.5 hrs)
  • 316 Chip: 100 bp reads, ~100 Mbp/run (2 hrs); 200 bp reads, ~200 Mbp/run (3 hrs)
  • 318 Chip: 200 bp reads, ~1 Gbp/run (4.5 hrs)

SLIDE 43

Summary of NGS Platforms

  • Clonal amplification of sequencing template

– emPCR (454, SOLiD and Ion Torrent)
– Bridge amplification (Illumina)

  • Sequencing by synthesis

– 454
– Illumina
– Ion Torrent

  • Sequencing by ligation

– SOLiD (2 base encoding)

  • Dramatic reduction in cost of sequencing

– GSFLX provides > 100x decrease in costs compared to Sanger sequencing
– HiSeq and SOLiD provide a further > 100x decrease in costs over GSFLX

SLIDE 44

Rapid Innovation Driving Cost Down

[Figure: Evolution of NGS system output (throughput, GB, 2007 to 2010) and cost per human genome]

SLIDE 45

NGS Library Preparation

  • Library preparation

SLIDE 46

Library Preparation

SLIDE 47

Sample preparation

[Figure: sample preparation workflow. DNA: fragmentation (mechanical or chemical). mRNA: fragmentation and cDNA synthesis. Then ligation of amplification/sequencing adaptors, fragment size selection, library]

SLIDE 48

Applications

  • DNA

– Whole genome
– Hybridization capture
– Amplicon
– ChIP-seq

  • RNA

– mRNA
– Small RNA

SLIDE 49

Whole Genome Sequencing

  • Assembly
  • Reference mapping

– SNVs, rearrangements

  • Comparative genomics
  • E. coli assembly from MiSeq data (Illumina application notes)
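Reference mapping followed by a pileup is the basic route from reads to SNVs. The sketch below is a deliberately tiny, hypothetical version: exact-match seeding on the first bases of each read, mismatch counting with no indels, and a majority vote per position. Production aligners use indexed data structures and statistical variant callers instead.

```python
from collections import Counter, defaultdict


def map_read(read: str, reference: str, seed_len: int = 8,
             max_mismatches: int = 2):
    """Best start position of `read` on `reference`, or None if unmapped.

    Toy seed-and-extend: exact-match the first `seed_len` bases, then count
    mismatches across the whole read. Indels are not handled."""
    seed = read[:seed_len]
    best = None  # (start, mismatches)
    start = reference.find(seed)
    while start != -1:
        if start + len(read) <= len(reference):
            window = reference[start:start + len(read)]
            mismatches = sum(r != g for r, g in zip(read, window))
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (start, mismatches)
        start = reference.find(seed, start + 1)
    return None if best is None else best[0]


def call_snvs(reads: list[str], reference: str, min_depth: int = 3,
              min_fraction: float = 0.8) -> list[tuple[int, str, str]]:
    """Pile up mapped reads and report (position, ref_base, alt_base)."""
    pileup: dict[int, Counter] = defaultdict(Counter)
    for read in reads:
        start = map_read(read, reference)
        if start is None:
            continue
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1
    snvs = []
    for pos in sorted(pileup):
        depth = sum(pileup[pos].values())
        base, count = pileup[pos].most_common(1)[0]
        if base != reference[pos] and depth >= min_depth and count / depth >= min_fraction:
            snvs.append((pos, reference[pos], base))
    return snvs


reference = "AACCGGTTACGTAGCTAGGATCCA"      # hypothetical reference
sample = "AACCGGTTACGTCGCTAGGATCCA"         # A->C at position 12
reads = [sample[s:s + 12] for s in range(1, 5)]  # four overlapping 12-mers
print(call_snvs(reads, reference))  # [(12, 'A', 'C')]
```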

SLIDE 50

Targeted Resequencing: sequence capture

  • Enrichment for specific targets via capture with oligonucleotide baits

– Exome capture
– Custom capture: up to 50 Mb of target sequences

SLIDE 51

Targeted Resequencing: Amplicons

  • Preparation of amplicons tagged with sequencing adapters

– Well suited for 454 and bench top sequencers
– Deep sequencing for detection of somatic mutations
– 16S sequencing for microbial diversity
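Deep sequencing matters for somatic mutations because a variant may be present in only a small fraction of the DNA. A minimal sketch, assuming a simple binomial model that ignores sequencing errors, a hypothetical 5% variant allele fraction, and a hypothetical caller rule of at least 5 supporting reads:

```python
from math import comb


def detection_probability(depth: int, allele_fraction: float,
                          min_alt_reads: int) -> float:
    """P(at least `min_alt_reads` variant reads) at a given depth.

    Binomial model: each read independently carries the variant with
    probability `allele_fraction`; sequencing errors are ignored."""
    p = allele_fraction
    below = sum(comb(depth, k) * p**k * (1 - p) ** (depth - k)
                for k in range(min_alt_reads))
    return 1.0 - below


for depth in (50, 200, 1000):
    print(depth, round(detection_probability(depth, 0.05, 5), 3))
```

At 50x the variant is usually missed, while at 1000x it is detected almost every time. In practice the minimum read count also has to be set high enough to stay above the sequencing error rate.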

SLIDE 52

RNA applications

  • mRNA

– Gene expression analysis
– Transcriptome analysis

  • Small RNA

– Small RNA discovery
– Gene regulation
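For gene expression analysis, raw read counts are usually normalised for gene length and sequencing depth before genes or samples are compared. One common choice, assumed here rather than named on the slide, is RPKM (reads per kilobase of gene per million mapped reads). The counts and lengths below are hypothetical:

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


counts = {"geneA": 400, "geneB": 400}       # hypothetical read counts
lengths = {"geneA": 2_000, "geneB": 8_000}  # hypothetical gene lengths (bp)
total = 20_000_000                          # hypothetical mapped reads in library
for gene in counts:
    print(gene, rpkm(counts[gene], lengths[gene], total))
```

Both genes receive the same number of reads, but geneB is four times longer, so its RPKM is a quarter of geneA's (2.5 versus 10.0).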

SLIDE 53