Inferring Linkage Information from Sequencing Chromatograms, Bastian Beggel



SLIDE 1

Inferring Linkage Information from Sequencing Chromatograms

Bastian Beggel Max-Planck-Institute for Informatics Saarbrücken

SLIDE 2

At the limits of Sanger sequencing data

YMDD motif of HBV: two resistance mutations (AG, GT)

Are the resistance mutations on the same strain?

SLIDE 3

What is the mixture composition?

YMDD motif of HBV

Clone  Position 610  Position 612  Comment    Proportion
1      WT            WT            Wild type  25.0%
2      WT            RAM           M204V      25.0%
3      RAM           WT            M204I      32.5%
4      RAM           RAM           M204V      17.5%
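A small worked example may help show why the composition question is hard. The sketch below sums the clone proportions from the table per position, then does the same for a second composition. The second composition is purely hypothetical (it is not from the slides) and is chosen so that both positions end up with the same mutant percentages. The helper name `ram_percentage` is also an illustrative assumption.

```python
def ram_percentage(clones, position_index):
    """Sum the proportions (in %) of all clones carrying a RAM at one position."""
    return sum(c[2] for c in clones if c[position_index] == "RAM")

# (state at position 610, state at position 612, proportion in %), as in the table.
slide_mixture = [
    ("WT", "WT", 25.0),
    ("WT", "RAM", 25.0),
    ("RAM", "WT", 32.5),
    ("RAM", "RAM", 17.5),
]

# Hypothetical alternative composition, not from the slides.
alternative_mixture = [
    ("WT", "WT", 50.0),
    ("WT", "RAM", 0.0),
    ("RAM", "WT", 7.5),
    ("RAM", "RAM", 42.5),
]

for name, mixture in [("slide", slide_mixture), ("alternative", alternative_mixture)]:
    print(name, ram_percentage(mixture, 0), ram_percentage(mixture, 1))
# slide 50.0 42.5
# alternative 50.0 42.5
```

Both compositions give 50.0% mutant signal at position 610 and 42.5% at position 612, so the per-position percentages alone cannot tell them apart.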

SLIDE 4

Outline

1. Introduction
2. Related Work
3. Model
4. Model Evaluation

SLIDE 5

Polymerases show context-dependent incorporation of dideoxy-nucleotides

Source: http://www.daviddarling.info/encyclopedia/D/DNA_sequencing.htm

  • PCR creates fragments of different lengths by incorporating chain terminators
  • The four dideoxy-nucleotide chain terminators are labeled with different fluorescent dyes

Figure labels: separation by size (e.g. capillary electrophoresis), laser, detector, chromatogram

Sanger Sequencing (chain terminator sequencing)

SLIDE 6

Early chromatograms showed unequal peak heights

Chromatograms from Kwok et al. 1994

SLIDE 7

Sequence context-dependent incorporation of dideoxy-nucleotides

DNA Polymerase

http://spine.rutgers.edu/cellbio/assets/flash/dnapol.htm

SLIDE 8

The peak heights of a mixture are the proportion-weighted sum of the peak heights of the underlying clonal variants

Data from Carr et al. 2009: peak heights of a dilution series
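The linear mixing statement can be illustrated with a minimal sketch. All peak heights below are made-up values in arbitrary units (they are not the Carr et al. data), and `mix_profiles` is a hypothetical helper name.

```python
def mix_profiles(profiles, proportions):
    """Return the proportion-weighted sum of clonal peak height profiles."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    n_positions = len(profiles[0])
    return [
        sum(a * p[i] for a, p in zip(proportions, profiles))
        for i in range(n_positions)
    ]

# Hypothetical peak heights at three positions for two clonal variants.
clone_a = [1000.0, 400.0, 800.0]
clone_b = [600.0, 900.0, 800.0]

# 25% of clone A mixed with 75% of clone B.
mixture = mix_profiles([clone_a, clone_b], [0.25, 0.75])
print(mixture)  # [700.0, 775.0, 800.0]
```

Positions where the two variants have the same peak height (the third one here) carry no information about the proportions, while positions where they differ do.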

SLIDE 9

Peak height profiles vary significantly for different mixtures

Artificial experiment: mixture 1 + 4 (left), mixture 2 + 3 (right)

SLIDE 10

Outline

1. Introduction
2. Related Work
3. Model
4. Model Evaluation

SLIDE 11

The observed chromatogram is the sum of all single-molecule fluorescence impulses

Source: http://spine.rutgers.edu/cellbio/assets/flash/dnapol.htm

The polymerase processes DNA at the single-molecule level

Figure: observed chromatogram for a mixture of C1 (50%), C2 (50%), C3 (0%), C4 (0%)

SLIDE 12

The linear model assumption is combined with a Gaussian error model

Conditional Distribution Assumption

  • $\gamma_{B[i]} \cdot h_i \mid M, \alpha, \gamma_{B[i]} \sim \mathcal{N}\big(m_i = \sum_j \alpha_j \, p_{ji},\ \sigma\big)$
  • $\sigma = \text{const.}$

Input Data

  • Peak heights of the query chromatogram: $h_i$
  • Peak heights of the clonal variants: $p_{ji}$

Marginal Likelihood for Model Selection

$$P(D \mid M) = \int P(D \mid M, \alpha) \cdot P(\alpha) \, d\alpha$$

With uniform priors $P(\alpha)$ and $P(M)$.
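The model selection step can be sketched numerically. This is a simplified illustration under several assumptions and not the presented implementation: the scaling factors γ are set to 1, each candidate model contains exactly two clonal variants so that α = (a, 1 − a) with a uniform prior on a in [0, 1], the integral is approximated with the trapezoidal rule, and all peak heights and σ are made-up values. The function names are hypothetical.

```python
import math

def log_likelihood(h, profiles, alpha, sigma):
    """Gaussian log-likelihood of query peak heights h, with mean m_i = sum_j alpha_j * p_ji."""
    total = 0.0
    for i, h_i in enumerate(h):
        m_i = sum(a * p[i] for a, p in zip(alpha, profiles))
        total += -0.5 * ((h_i - m_i) / sigma) ** 2
        total -= math.log(sigma * math.sqrt(2.0 * math.pi))
    return total

def marginal_likelihood(h, profiles, sigma, steps=1000):
    """Trapezoidal approximation of P(D | M) for a two-variant model, uniform prior on a."""
    values = []
    for k in range(steps + 1):
        a = k / steps
        values.append(math.exp(log_likelihood(h, profiles, (a, 1.0 - a), sigma)))
    width = 1.0 / steps
    return width * (sum(values) - 0.5 * (values[0] + values[-1]))

# Hypothetical clonal peak height profiles at four positions (arbitrary units).
clone_1 = [1000.0, 500.0, 900.0, 700.0]
clone_2 = [1000.0, 700.0, 500.0, 800.0]
clone_3 = [600.0, 600.0, 900.0, 600.0]
clone_4 = [600.0, 800.0, 500.0, 900.0]

# Hypothetical query: exactly 60% clone 1 plus 40% clone 4, without noise.
query = [840.0, 620.0, 740.0, 780.0]
sigma = 30.0  # assumed constant standard deviation

ml_linked = marginal_likelihood(query, [clone_1, clone_4], sigma)    # mixture 1 + 4
ml_unlinked = marginal_likelihood(query, [clone_2, clone_3], sigma)  # mixture 2 + 3

# With a uniform prior P(M), posteriors are the normalized marginal likelihoods.
post_linked = ml_linked / (ml_linked + ml_unlinked)
post_unlinked = ml_unlinked / (ml_linked + ml_unlinked)
print(f"P(1+4 | D) = {post_linked:.3f}, P(2+3 | D) = {post_unlinked:.3f}")
```

In this toy setting the model that generated the query receives nearly all of the posterior mass. With real chromatograms, noise and the scaling factors would need to be handled as well.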

SLIDE 13

Outline

1. Introduction
2. Related Work
3. Model
4. Model Evaluation

SLIDE 14

The accuracy of the model predictions depends on the distance between the two ambiguous positions

Bases in between  Correct  Incorrect  Uncertain
1                 93%      0%         7%
3                 55%      19%        26%

SLIDE 15

Conclusions

Predictions

  • Proportion estimates
  • Reconstruction of linkage

Findings

  • Sequence context-dependent incorporation provides linkage information
  • Peak height profiles of mixtures can be computed

Limitations

  • Limited accuracy
  • Limited range
  • Profiles of the clonal variants are required
SLIDE 16

Thank you for your attention

  • Thomas Lengauer
  • Alex Thielen
  • Rolf Kaiser
  • Maria Neumann-Fraune
SLIDE 17

End

SLIDE 18

Subsequent improvements led to almost equal peak heights

Polymerase, chain terminators, protocol and sequencer

ABI 3100 sequencer; Lee LG et al. 1992; Ying Li et al. 1999

  • Chain terminator sequencing has required 15 years of research
  • Different companies use different materials/protocols
  • Context-dependent incorporation of dideoxy-nucleotides was seen as a burden