

SLIDE 1

Next-Generation Sequencing (NGS): An Overview

Francesca D Ciccarelli

BITS 2009, Mar 20th 2009

NGS in the Literature

Keywords: Next generation sequencing; Massive parallel sequencing; Ultra-deep sequencing; Pyro-sequencing

Figure annotations: NIH Grant; 454; Solexa; SOLiD; Next-Next

The $1000 Genome Project

Feb 2004: NHGRI launched a grant application to develop next generation sequencing technologies.
NIH Genome Centers spend > $120 million/year on genome sequences.

  • “$100,000 Genome” and “$10,000 Genome”: sequencing of tumor genome collections; SNP and disease-associated mutations; signs of natural selection within a population; raw measure of human genetic variation
  • “$1,000 Genome”: personal genome

Nat Rev Genet 5 (2004), pp. 335–344. Curr Opin Genet Dev. 2006 16(6):545-52.

SLIDE 2

Traditional Genome Sequencing

Based on the protocol used at JGI (http://www.jgi.doe.gov/)

  • I. Library Preparation
      • 1. Shearing of DNA
      • 2. Insertion of Fragments into a Plasmid
      • 3. Transformation
      • 4. Subcloning of Sheared Fragment
      • 5. Colony Picking
  • II. Sequencing
      • 6. Cell Lysing
      • 7. Rolling-Circle Amplification
      • 8. Capillary Sequencing
  • III. Assembly and QA
      • 9. Assembly
      • 10. Quality Assessment

Limitations of Traditional Sequencing

PROBLEM #1: in vivo cloning

Clonal Bias and Unclonable DNA
  • Hard Stops (hairpins, triple helices, stem loops, high GC content)
  • Polymerase (long stretches of polyA)

PROBLEM #2: timing and workload
  • 10,000 instrument days/human genome
  • Only affordable by genome centers

PROBLEM #3: costs
  • Human Genome Reference Sequence (2001; 2003): $1 billion (99.995% accurate; 99% complete)
  • Estimated Cost for Individual Genome: $10 million (1 year using >30 instruments)
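The workload and timing figures on this slide can be cross-checked with simple arithmetic. The Python sketch below spreads the quoted 10,000 instrument days evenly over a fleet of instruments. The fleet size of 30 is an assumption (the slide only says ">30"), and the function name is illustrative.

```python
def wall_clock_days(instrument_days: float, instruments: int) -> float:
    """Elapsed days when the workload is spread evenly over the instruments."""
    if instruments <= 0:
        raise ValueError("instruments must be positive")
    return instrument_days / instruments

INSTRUMENT_DAYS_PER_GENOME = 10_000  # figure quoted on the slide
ASSUMED_FLEET_SIZE = 30              # assumption: the slide says ">30 instruments"

days = wall_clock_days(INSTRUMENT_DAYS_PER_GENOME, ASSUMED_FLEET_SIZE)
print(f"{days:.0f} days, about {days / 365:.1f} years")
```

With 30 instruments the result is roughly 333 days, which is consistent with the slide's "1 year using >30 instruments".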

Sequencing Approaches

Journal of Experimental Biology 210, 1518-1525 (2007)

SLIDE 3

NGS on the Market

Resolve inherent biases of in vivo cloning

454 (Roche), first instrument in Oct 2005
  • emulsion PCR
  • pyrosequencing
  • read lengths ca. 400 bp

Solexa (Illumina), first instrument in July 2006
  • PCR on solid support
  • reversible terminator sequencing
  • read lengths ca. 35 bp

SOLiD (ABI), first instrument in June 2007
  • emulsion PCR
  • sequencing by ligation
  • read lengths ca. 35 bp

Library Preparation

454, Solexa, SOLiD

http://solid.appliedbiosystems.com
Shendure et al. (2005) Science 309 (1728-1732)
Margulies et al. (2005) Nature 437 (376-380)

Clonal Amplification

Solexa
  • DNA attached to the surface
  • Bridge Amplification
  • Double Strand Denaturation
  • Repeat Cycles

454
  • Anneal sstDNA
  • Emulsion in water-in-oil microreactors
  • Clonal Amplification
  • Enrichment for DNA-positive beads

SLIDE 4

Clonal Amplification

SOLiD
  • Emulsion PCR
  • Clonal Amplification
  • Enrichment for DNA-positive beads
  • Transfer onto solid array

Sequencing by Synthesis

454 (Pyrosequencing)

Reaction with:
  • DNA polymerase
  • ATP sulfurylase
  • luciferase
  • apyrase
  • APS
  • luciferin

Addition of dNTPs one at a time

Solexa (Reversible Terminators)

Reaction with:
  • DNA polymerase
  • primers
  • 4 labelled reversible terminators

Determine the first base using laser light; wash off and repeat for the whole sequence; sequence read
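Because the 454 chemistry adds one dNTP species at a time, the light signal in each flow reflects how many identical bases were incorporated in a row. The Python sketch below illustrates this under idealized assumptions: the signal is noise-free and exactly equal to the homopolymer length. The flow order "TACG" and both function names are illustrative choices, not taken from the slides.

```python
from itertools import cycle

def simulate_flowgram(read, flow_order="TACG"):
    """Return one (nucleotide, signal) pair per flow for an idealized run.

    `read` is the strand being synthesized; the signal is the number of
    identical bases incorporated during that flow (0 if none).
    """
    unknown = set(read) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not covered by the flow order: {sorted(unknown)}")
    flows = []
    pos = 0
    for nucleotide in cycle(flow_order):
        if pos >= len(read):
            break
        run = 0
        # The polymerase extends through a whole homopolymer within one flow.
        while pos < len(read) and read[pos] == nucleotide:
            run += 1
            pos += 1
        flows.append((nucleotide, run))
    return flows

def call_bases(flows):
    """Turn the per-flow signals back into a base sequence."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)

flows = simulate_flowgram("TTAGC")
print(flows)
print(call_bases(flows))
```

On a real instrument the signal is an analog light intensity rather than an exact integer, so long homopolymers are harder to call than this sketch suggests.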

Sequencing by Ligation

SOLiD: 1st Cycle Sequencing

Reaction with:
  • Universal Primers
  • Ligase
  • Probes

Probe annealing; ligation; washing off; visualization; cleavage [x 5 times]

5 base read

SLIDE 5

Sequencing by Ligation

SOLiD: Following Cycles

annealing; ligation; washing; visualization; cleavage [x 5 times]
Reset [x 5 times]

Two Bases Encoding

Capillary Electrophoresis: 1be
Sequencing by Ligation: 2be

Deconvolution matrix
  • A single color does not indicate one single base
  • Each read contains information for 2 bases
  • To decode the bases you have to know one of them

SNP Detection: Real SNP vs. Miscall
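The decoding rule on this slide (each color covers two bases, and one base must be known to decode the rest) can be sketched in Python. The XOR-based color table below is the scheme commonly described for two-base encoding; the 2-bit labelling A=0, C=1, G=2, T=3 and the function names are assumptions of this sketch, not something the slide spells out.

```python
BASES = "ACGT"  # assumed 2-bit labelling: A=0, C=1, G=2, T=3

def encode_colors(sequence):
    """One color (0-3) per pair of adjacent bases."""
    return [BASES.index(a) ^ BASES.index(b) for a, b in zip(sequence, sequence[1:])]

def decode_colors(first_base, colors):
    """Rebuild the bases from the colors, given the known first base."""
    bases = [first_base]
    for color in colors:
        # Each color plus the previous base determines the next base.
        bases.append(BASES[BASES.index(bases[-1]) ^ color])
    return "".join(bases)

reference = "ATGGC"
ref_colors = encode_colors(reference)
snp_colors = encode_colors("ATCGC")  # real SNP at the third base

# Positions where the color calls differ from the reference.
changed = [i for i, (r, s) in enumerate(zip(ref_colors, snp_colors)) if r != s]
print(ref_colors, decode_colors("A", ref_colors))
print(changed)
```

In this scheme a real SNP changes two adjacent colors, while an isolated single-color difference is more likely a miscall, which is the distinction drawn under "SNP Detection".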

Massive Parallelization

Figures are given as: present (6 months/1 year)

454
  • Sequencing reaction within the PicoTiterPlate device
  • 1.6 million wells/plate
  • ~420 kreads/run (1.2 Mreads/run)

Solexa
  • Sequencing reaction on a planar, optically transparent surface
  • > 10 million clusters
  • ~50 Mreads/run (220 Mreads/run)

SOLiD
  • ~20 million beads (1 µm diameter)
  • ~95 Mreads/run (220 Mreads/run)
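A rough yield per run follows from multiplying reads per run by read length. The Python sketch below combines the present read counts on this slide with the present read lengths listed later in the talk (250 bp for 454, 35 bp for Solexa, and 35 bp as the upper end for SOLiD). Pairing those numbers is an assumption made for illustration, and the function name is hypothetical.

```python
def yield_bp(reads_per_run, read_length_bp):
    """Approximate bases per run, ignoring filtered or failed reads."""
    return reads_per_run * read_length_bp

# (reads per run, assumed read length in bp)
platforms = {
    "454": (420_000, 250),
    "Solexa": (50_000_000, 35),
    "SOLiD": (95_000_000, 35),
}

estimates = {name: yield_bp(*values) for name, values in platforms.items()}
for name, bp in estimates.items():
    print(f"{name}: ~{bp / 1e6:,.0f} Mbp/run")
```

For 454 this gives about 105 Mbp per run, close to the 100 Mbp/run quoted in the talk's throughput figures.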

SLIDE 6


Timing and Throughput

Figures are given as: today (6 months-1 year)

Sanger
  • Timing: 10 runs/day
  • Throughput: 96 capillaries: 76.8 kbp/run; 384 capillaries: 0.3 Mbp/run

454
  • Timing: 21 h
  • Throughput: 100 Mbp/run (450 Mbp-1 Gbp)

Solexa
  • Timing: 4 days (single fragment); 4.5 days (paired-end)
  • Throughput: 1.5 Gbp/run (5-10 Gbp); 3 Gbp/run (10-20 Gbp)

SOLiD
  • Timing: 3 days (single fragment); 6 days (paired-end)
  • Throughput: 6 Gbp/run (17 Gbp); 10 Gbp/run (26 Gbp)

Sources: 454: Roche-Italy, pers. comm.; Solexa: Illumina, pers. comm.; SOLiD: AppliedBS, pers. comm.
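To compare platforms on a common scale, run time and yield can be folded into an approximate daily throughput. The Python sketch below assumes the following reading of the table: Sanger 10 runs/day at 0.3 Mbp/run (384 capillaries), 454 a 21 h run at 100 Mbp, Solexa a 4-day single-fragment run at 1.5 Gbp, and SOLiD a 3-day single-fragment run at 6 Gbp. Matching each timing value to the first throughput value of its platform is an assumption, as is the function name.

```python
HOURS_PER_DAY = 24

def mbp_per_day(mbp_per_run, run_hours):
    """Average daily output of one instrument running back to back."""
    return mbp_per_run * HOURS_PER_DAY / run_hours

rates = {
    "Sanger": 0.3 * 10,                       # 0.3 Mbp/run x 10 runs/day
    "454": mbp_per_day(100, 21),              # 21 h run
    "Solexa": mbp_per_day(1_500, 4 * 24),     # 4-day single-fragment run
    "SOLiD": mbp_per_day(6_000, 3 * 24),      # 3-day single-fragment run
}

for name, rate in rates.items():
    print(f"{name}: ~{rate:,.0f} Mbp/day")
```

Under these assumptions the daily output spans roughly three orders of magnitude, from about 3 Mbp/day for capillary sequencing to about 2,000 Mbp/day for SOLiD.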


SLIDE 7

Sequencing Costs

Sanger (x96, x384)
  • Costs/kbp: $1 (raw kbp); $7 (consensus kbp, error = 4x10^-6)

454
  • Costs/kbp: ~€0.07/kbp; ~$9 (consensus kbp, error = 4x10^-5)
  • Costs/run: €7,195 (100 Mb)

Solexa
  • Costs/kbp: ~€0.002/kbp
  • Costs/run: €3,000 (1.5 Gbp)

SOLiD
  • Costs/kbp: ~€0.0004/kbp
  • Costs/run: €2,478 (6 Gbp)

Sources: 454: Roche-Italy, pers. comm.; Solexa: Illumina, pers. comm.; SOLiD: AppliedBS, pers. comm.; consensus costs and error rates: G. Church, Nat Biotec 24, 139 (2006)
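The per-kbp costs on this slide follow from dividing the cost of a run by its yield. The Python sketch below reproduces that arithmetic, reading the run costs as €7,195, €3,000 and €2,478 (the slide uses a dot as thousands separator). The function name is illustrative.

```python
def cost_per_kbp(run_cost_eur, yield_bp):
    """Cost per kilobase pair for one run."""
    return run_cost_eur / (yield_bp / 1_000)

# (cost per run in EUR, yield per run in bp), as quoted on the slide
runs = {
    "454": (7_195, 100e6),
    "Solexa": (3_000, 1.5e9),
    "SOLiD": (2_478, 6e9),
}

per_kbp = {name: cost_per_kbp(*values) for name, values in runs.items()}
for name, value in per_kbp.items():
    print(f"{name}: ~EUR {value:.4f}/kbp")
```

The results (about 0.07, 0.002 and 0.0004 EUR/kbp) match the per-kbp figures given on the slide.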

Limitations of NGS

PROBLEM #1: length of sequencing reads
  • Much shorter than Sanger

PROBLEM #2: (huge) amount of data production
  • Difficult data handling and analysis

PROBLEM #3: sequencing accuracy
  • No reliable standard available yet
  • Difficult to compare different methods

Length of Sequencing Reads

Read Length (bp)
  • Sanger: 450-850
  • 454: 250 (450)
  • Solexa: 35 (75)
  • SOLiD: 25-35 (50)

  • Resequencing (454): overlapping amplicons needed
  • De novo Sequencing: difficult assembly
  • Metagenomics: difficult assignment

SLIDE 8

Paired-End Sequencing

Figure labels: tag1; insert; tag2

  • Increase the Read Length
  • Help in Assembly Reconstruction
  • Find Structural Variation (CNV, Rearrangements, etc.)

INSERT LENGTH
  • 454: 1.5-3 kbp (16 kbp)
  • Solexa: 200-300 bp (2 kbp)
  • SOLiD: 600-10,000 bp
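One way paired-end data can point to structural variation is by comparing the distance between the two mapped tags with the expected insert length of the library. The Python sketch below is a simplified illustration of that idea, assuming both tags map uniquely to a reference and using the 200-300 bp Solexa insert range from the slide. The function name and the three labels are hypothetical.

```python
def classify_pair(mapped_span_bp, min_insert_bp, max_insert_bp):
    """Compare the reference distance between two tags with the insert range."""
    if mapped_span_bp > max_insert_bp:
        # Tags lie further apart on the reference than in the sample.
        return "possible_deletion"
    if mapped_span_bp < min_insert_bp:
        # Tags lie closer on the reference than in the sample.
        return "possible_insertion"
    return "concordant"

SOLEXA_INSERT_RANGE = (200, 300)  # bp, from the slide

for span in (250, 1_200, 90):
    print(span, classify_pair(span, *SOLEXA_INSERT_RANGE))
```

A single discordant pair is weak evidence; practical analyses usually require several independent pairs supporting the same event and also consider tag orientation.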

(Huge) Amount of Data Production

  • 454: 1 Gb (10-15 Gb after sequencing)
  • Solexa: 10-20 Gb (0.5-2 Tb after sequencing)
  • SOLiD: 40 Gb (7-8 Tb after sequencing)

Need ad hoc tool development for data analysis

Sequencing Accuracy

Difficult to compare because based on different technologies

Sanger
  • Consensus: 99.995% (10x)
  • Raw: 99.5%

454
  • 99.99%; 99.96%
  • 99.5% (no homopolymers); 97.0% (with homopolymers; n>7; 0.7% human genome)

Solexa
  • 99.8%-98.5%

SOLiD
  • Consensus: 99.999% (15x) (in principle, outstanding accuracy for SNP detection)
  • Raw: 99.94%

Sources: 454: Roche-Italy, pers. comm.; Solexa: Illumina, pers. comm.; SOLiD: AppliedBS, pers. comm.; homopolymer figures: Nat Med 12, 852-855 (2006)

SLIDE 9

Comparison of Sequencing Accuracy

  • Generation of a mutant strain of Pichia stipitis (haploid yeast, genome size = 15.4 Mb, 14 mutations compared to reference)
  • Whole-genome mutational profiling using 454, SOLiD, Illumina
  • Comparative assessment of the sequence coverage needed to optimize sensitivity and specificity

Genome Res. (2008) 18:1638-1642

Human Genome Resequencing

  • NOV 2008: Genome of an Asian Individual (Illumina)
      • ~$1 million
      • few weeks
      • Nature (2008) 456: 60-65
  • MAY 2007: James Watson's genome (454)
      • less than $1 million
      • few months
      • Nature (2008) 452, 872-876
  • SEP 2007: Diploid Genome of Craig Venter (Sanger)
      • ~$1 million
      • few months
      • PLoS Biology (2007) 5(10): e254
  • NOV 2008: Genome of a male Yoruba (Illumina)
      • US$100,000
      • few weeks
      • Nature (2008) 456: 53-59

Next-Next Generation Sequencing

Single-molecule Sequencing

  • LIBRARY PREPARATION: DNA fragmentation and addition of labeled polyA
  • SURFACE BINDING: through hybridization with complementary polyT
  • IMAGING: to establish the starting sites of sequencing
  • SEQUENCING: after adding a labeled nucleotide and polymerase, followed by washing and imaging
  • CHEMICAL CLEAVAGE: after washing, to remove the dye
  • SECOND CYCLE: with another labeled nucleotide

Harris et al Science 320 (2008)

SLIDE 10

Next-Next Generation Sequencing

  • At each cycle, the incorporated nucleotides emit light upon illumination
  • The sequencer captures images for each strand up to 25 nucleotides

Tracking nucleotide incorporation on each strand determines the exact sequence of each individual DNA molecule

Figure caption: individual single molecules that incorporated a fluorescent “G” nucleotide in this cycle

Helicos Web Site
Next-Next Generation Sequencing

  • NO PCR
      • PCR introduces an uncontrolled bias in template representation, since amplification efficiency varies as a function of template properties
      • Errors of polymerase
  • LOWER RUNNING COSTS
  • HIGHER THROUGHPUT

Helicos Web Site

Single Molecule Sequencing

Visigen
  • Engineered DNA polymerases
  • Fluorescent nucleotide
  • During chain extension, when a nucleotide is incorporated, there is energy transfer from the polymerase to the nucleotide, which emits light

http://visigenbio.com/technology_movie_streaming.html