Inferring Linkage Information from Sequencing Chromatograms
Bastian Beggel
Max-Planck-Institute for Informatics, Saarbrücken
Slide 2 Bastian Beggel
At the limits of Sanger sequencing data
YMDD Motif of HBV: two resistance mutations (A→G, G→T)
Are the resistance mutations on the same strain?
Slide 3
What is the mixture composition?
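YMDD Motif of HBV: clonal composition (WT = wild type, RAM = resistance-associated mutation)
Clone | Position 610 | Position 612 | Comment | Proportion
1 | WT | WT | Wild type | 25.0%
2 | WT | RAM | M204V | 25.0%
3 | RAM | WT | M204I | 32.5%
4 | RAM | RAM | M204V | 17.5%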
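Why linkage is the hard part: a per-position readout only tells how much of the mixture carries the mutation at each site. The minimal sketch below uses the clone table to compute those per-position fractions. The two 50:50 mixtures `MIX_1_4` and `MIX_2_3` are hypothetical additions (not from the slides) showing that opposite linkage can yield identical per-position fractions.

```python
# Clone table from the slide: clone -> (state at 610, state at 612, proportion).
# WT = wild type, RAM = resistance-associated mutation.
CLONES = {
    1: ("WT", "WT", 0.250),
    2: ("WT", "RAM", 0.250),
    3: ("RAM", "WT", 0.325),
    4: ("RAM", "RAM", 0.175),
}

# Hypothetical 50:50 mixtures with opposite linkage (clones 1 + 4 vs. clones 2 + 3).
MIX_1_4 = {1: ("WT", "WT", 0.5), 4: ("RAM", "RAM", 0.5)}
MIX_2_3 = {2: ("WT", "RAM", 0.5), 3: ("RAM", "WT", 0.5)}


def mutant_fraction(mixture, pos):
    """Share of the mixture carrying the RAM at one position (0 = 610, 1 = 612).

    This per-position summary says nothing about which mutations
    sit together on the same clone.
    """
    return sum(prop for *states, prop in mixture.values() if states[pos] == "RAM")


if __name__ == "__main__":
    for name, mix in [("table", CLONES), ("1+4", MIX_1_4), ("2+3", MIX_2_3)]:
        print(name, [round(mutant_fraction(mix, pos), 3) for pos in (0, 1)])
```

For the table mixture this gives 50% mutant at 610 and 42.5% at 612. Both hypothetical mixtures give 50% at each position, so per-position frequencies alone cannot distinguish them.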
Slide 4
Outline
1. Introduction
2. Related Work
3. Model
4. Model Evaluation
Slide 5
Polymerases show context-dependent incorporation of dideoxy-nucleotides
Source: http://www.daviddarling.info/encyclopedia/D/DNA_sequencing.htm
- PCR creates fragments of different lengths by incorporating chain terminators
- The four dideoxy-nucleotide chain terminators are labeled with different fluorescent dyes
[Figure: separation by size (e.g. capillary electrophoresis), laser, detector, chromatogram]
Sanger Sequencing (chain terminator sequencing)
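The chain-termination principle on this slide can be simulated in a few lines. This is a toy sketch, not the sequencer's chemistry. It assumes a constant, context-independent termination probability (`p_terminate`) and a hypothetical synthesized strand, which is exactly the idealization that the following slides say real polymerases violate. All function names are illustrative.

```python
import random


def terminated_fragments(strand, n_molecules=2000, p_terminate=0.05, seed=0):
    """Toy Sanger reaction: (length, terminal base) of every chain-terminated molecule.

    Assumption: at every extension step a ddNTP is incorporated with the same
    probability p_terminate, independent of the sequence context.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        for i, base in enumerate(strand):
            if rng.random() < p_terminate:
                # The dye on the terminal ddNTP identifies the base.
                fragments.append((i + 1, base))
                break
    return fragments


def read_sequence(fragments):
    """Order fragments by size (as in capillary electrophoresis) and call one base per length."""
    calls = {}
    for length, base in fragments:
        calls.setdefault(length, base)
    return "".join(calls[length] for length in sorted(calls))


def peak_counts(fragments):
    """Molecules per fragment length, a crude stand-in for peak height."""
    counts = {}
    for length, _ in fragments:
        counts[length] = counts.get(length, 0) + 1
    return [counts[length] for length in sorted(counts)]


if __name__ == "__main__":
    strand = "ATGGACGTTCAG"  # hypothetical newly synthesized strand
    frags = terminated_fragments(strand)
    print(read_sequence(frags))
    print(peak_counts(frags))
```

In this toy, the counts differ between positions only by sampling noise and a geometric decay with length. Base-specific, context-dependent incorporation is deliberately left out.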
Slide 6
Early chromatograms showed unequal peak heights
Chromatograms from Kwok et al. 1994
Slide 7
Sequence context-dependent incorporation of dideoxy-nucleotides
[Animation: DNA polymerase]
Source: http://spine.rutgers.edu/cellbio/assets/flash/dnapol.htm
Slide 8
The peak heights of a mixture are the proportion-weighted sum of the peak heights of the underlying clonal variants
[Figure: peak heights of a dilution series; data from Carr et al. 2009]
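The linear relation on this slide reads $h = \sum_j \alpha_j p_j$. The minimal sketch below computes a two-clone dilution series under that relation. `CLONE_A` and `CLONE_B` are invented peak heights, not the Carr et al. data, and `mixture_profile` is a hypothetical helper.

```python
def mixture_profile(profiles, proportions):
    """Proportion-weighted sum of clonal peak-height profiles."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    if len({len(p) for p in profiles}) != 1:
        raise ValueError("all profiles must cover the same positions")
    n_positions = len(profiles[0])
    return [
        sum(alpha * profile[i] for alpha, profile in zip(proportions, profiles))
        for i in range(n_positions)
    ]


# Hypothetical peak heights (arbitrary units) of two clonal variants at three positions.
CLONE_A = [1200.0, 800.0, 0.0]
CLONE_B = [0.0, 900.0, 1000.0]

if __name__ == "__main__":
    # Dilution series: share of clone A from 0% to 100%.
    for share_a in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(share_a, mixture_profile([CLONE_A, CLONE_B], [share_a, 1.0 - share_a]))
```

Under this assumption every peak height moves linearly with the mixing proportion, which is what a dilution series makes visible.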
Slide 9
Peak height profiles vary significantly for different mixtures
[Figure: artificial experiment, Mixture 1 + 4 (left) and Mixture 2 + 3 (right)]
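A toy illustration of why these two mixtures can be told apart. Both contain 50% of each base at 610 and at 612. They still produce different peaks if a clone's peak height at 612 depends on its base at 610. The sketch assumes the bases are A/G at 610 and G/T at 612, reading the "A→G, G→T" mutations that way. Every height in `PROFILES` is invented, and `mixture_peaks` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical clonal peak-height profiles (arbitrary units), keyed by (position, base).
# Assumption: WT/RAM bases are A/G at 610 and G/T at 612.
# The heights at 612 deliberately depend on the base at 610 (sequence context).
PROFILES = {
    1: {(610, "A"): 1000.0, (612, "G"): 600.0},   # WT, WT
    2: {(610, "A"): 1000.0, (612, "T"): 900.0},   # WT, RAM
    3: {(610, "G"): 800.0, (612, "G"): 1100.0},   # RAM, WT
    4: {(610, "G"): 800.0, (612, "T"): 500.0},    # RAM, RAM
}


def mixture_peaks(weights):
    """Proportion-weighted sum of the clonal profiles (linear mixture model)."""
    peaks = defaultdict(float)
    for clone, alpha in weights.items():
        for key, height in PROFILES[clone].items():
            peaks[key] += alpha * height
    return dict(peaks)


MIX_1_4 = mixture_peaks({1: 0.5, 4: 0.5})
MIX_2_3 = mixture_peaks({2: 0.5, 3: 0.5})

if __name__ == "__main__":
    for key in sorted(set(MIX_1_4) | set(MIX_2_3)):
        print(key, MIX_1_4.get(key, 0.0), MIX_2_3.get(key, 0.0))
```

The peaks at 610 are identical in both mixtures, but those at 612 differ. Suppose the 612 heights were context-free, for example by setting clone 3's G to 600 and clone 2's T to 500. Both mixtures would then give identical peaks. In this sketch, the linkage signal therefore comes from context dependence.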
Slide 10
Outline
1. Introduction
2. Related Work
3. Model
4. Model Evaluation
Slide 11
The observed chromatogram is the sum of all single-molecule fluorescence impulses
Source: http://spine.rutgers.edu/cellbio/assets/flash/dnapol.htm
The polymerase processes DNA at the single-molecule level
[Figure: observed chromatogram as the sum of clones C1, C2, C3, C4 with proportions 50%, 50%, 0%, 0%]
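How summing single-molecule impulses leads to a linear mixture model: in the toy Monte Carlo sketch below, each terminated molecule adds a unit Gaussian impulse to the trace. Each clone has a hypothetical probability (`efficiencies`) of terminating at the position considered, which stands in for context-dependent incorporation. The summed trace at the peak center then approaches $N \sum_j \alpha_j e_j$, a proportion-weighted sum of clone-specific peak heights. The impulse shape, the efficiencies and all names are assumptions.

```python
import math
import random


def impulse(t, center, width=1.0):
    """Fluorescence impulse of one dye-labelled fragment (Gaussian shape, unit height)."""
    return math.exp(-0.5 * ((t - center) / width) ** 2)


def observed_trace(t, centers, width=1.0):
    """Observed signal at time t: the sum of all single-molecule impulses."""
    return sum(impulse(t, c, width) for c in centers)


def simulate_peak(proportions, efficiencies, n_molecules=20_000, center=10.0, seed=1):
    """Peak height at one position for a mixture of clonal templates.

    proportions[j]: share of template molecules from clone C_j.
    efficiencies[j]: hypothetical probability that a C_j molecule terminates here.
    """
    rng = random.Random(seed)
    clones = rng.choices(range(len(proportions)), weights=proportions, k=n_molecules)
    centers = [center for c in clones if rng.random() < efficiencies[c]]
    return observed_trace(center, centers)


proportions = [0.5, 0.5, 0.0, 0.0]        # C1..C4 as on the slide
efficiencies = [0.10, 0.30, 0.20, 0.20]   # hypothetical, context-dependent
simulated = simulate_peak(proportions, efficiencies)
expected = 20_000 * sum(a * e for a, e in zip(proportions, efficiencies))

if __name__ == "__main__":
    print(round(simulated), round(expected))
```

Because every fragment of the same length shares one migration time, its impulses stack exactly. Only the number of terminated molecules per clone, which is the context-dependent part, shapes the peak.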
Slide 12
The linear model assumption is combined with a Gaussian error model
Conditional Distribution Assumption
- $\gamma_{B[i]}\, h_i \mid M, \alpha, \gamma_{B[i]} \sim \mathcal{N}\big(m_i = \sum_j \alpha_j\, p_{ji},\ \sigma\big)$
- $\sigma$ = const.
Input Data
- Peak heights of the query chromatogram $h_i$
- Peak heights of the clonal variants $p_{ji}$
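The slides treat the proportions $\alpha$ in a Bayesian way. A simpler point-estimate alternative follows from the same assumptions, and it is swapped in here as an illustration, not taken from the slides. With Gaussian noise of constant $\sigma$, maximizing the likelihood is ordinary least squares. For a two-clone model with proportions $(\alpha, 1-\alpha)$ this has a closed form. The sketch assumes $\gamma_{B[i]} = 1$ and uses invented profiles. `fit_two_clone_proportion` is a hypothetical name.

```python
def fit_two_clone_proportion(h, p1, p2):
    """Least-squares estimate of alpha in h_i ~ alpha * p1_i + (1 - alpha) * p2_i.

    Under Gaussian noise with constant sigma this is the maximum-likelihood
    estimate; it is clipped to the valid range [0, 1].
    """
    num = sum((h_i - b) * (a - b) for h_i, a, b in zip(h, p1, p2))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    if den == 0:
        raise ValueError("identical clonal profiles: alpha is not identifiable")
    return min(1.0, max(0.0, num / den))


if __name__ == "__main__":
    p1 = [4.0, 0.0, 2.0]   # hypothetical clonal profile p_1i
    p2 = [0.0, 4.0, 2.0]   # hypothetical clonal profile p_2i
    h = [1.0, 3.0, 2.0]    # hypothetical query: 25% clone 1, 75% clone 2
    print(fit_two_clone_proportion(h, p1, p2))
```

The closed form comes from setting the derivative of $\sum_i (h_i - p_{2i} - \alpha(p_{1i} - p_{2i}))^2$ to zero.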
Marginal likelihood for model selection:
$$P(D \mid M) = \int P(D \mid M, \alpha)\, P(\alpha)\, d\alpha$$
with uniform priors $P(\alpha)$ and $P(M)$.
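The marginal likelihood can be approximated by averaging $P(D \mid M, \alpha)$ over draws of $\alpha$ from its uniform prior, i.e. a Dirichlet(1, ..., 1) distribution on the simplex. The sketch below uses plain Monte Carlo, which may not be the integration method used in this work. It assumes $\gamma_{B[i]} = 1$, a fixed $\sigma$, and invented clonal profiles, and it compares the candidate models {1, 4} and {2, 3}. All names are hypothetical.

```python
import math
import random

# Hypothetical clonal profiles over four peaks: (610 A, 610 G, 612 G, 612 T).
# Invented numbers; assumes WT/RAM = A/G at 610 and G/T at 612.
CLONE_VECTORS = {
    1: [1000.0, 0.0, 600.0, 0.0],
    2: [1000.0, 0.0, 0.0, 900.0],
    3: [0.0, 800.0, 1100.0, 0.0],
    4: [0.0, 800.0, 0.0, 500.0],
}
H = [500.0, 400.0, 300.0, 250.0]  # hypothetical query: a 50:50 mix of clones 1 and 4
SIGMA = 50.0                      # assumed constant noise level


def gaussian_logpdf(x, mean, sigma):
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * ((x - mean) / sigma) ** 2


def log_likelihood(h, profiles, alpha, sigma):
    """log P(D | M, alpha) with m_i = sum_j alpha_j p_ji; gamma is fixed to 1 here."""
    total = 0.0
    for i, h_i in enumerate(h):
        m_i = sum(a * p[i] for a, p in zip(alpha, profiles))
        total += gaussian_logpdf(h_i, m_i, sigma)
    return total


def sample_uniform_simplex(k, rng):
    """Uniform prior on proportions (Dirichlet(1, ..., 1)) via normalized Exp(1) draws."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def log_marginal_likelihood(h, profiles, sigma, n_samples=5000, seed=0):
    """Monte Carlo estimate of log P(D | M) = log E_{alpha ~ P(alpha)}[P(D | M, alpha)]."""
    rng = random.Random(seed)
    logs = [
        log_likelihood(h, profiles, sample_uniform_simplex(len(profiles), rng), sigma)
        for _ in range(n_samples)
    ]
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in logs) / n_samples)


if __name__ == "__main__":
    for model in [(1, 4), (2, 3)]:
        profiles = [CLONE_VECTORS[j] for j in model]
        print(model, round(log_marginal_likelihood(H, profiles, SIGMA), 2))
```

With a uniform $P(M)$, posterior model probabilities are proportional to $\exp(\log P(D \mid M))$. In this toy setting the correctly linked model {1, 4} wins by a wide margin.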
Slide 13
Outline
1. Introduction
2. Related Work
3. Model
4. Model Evaluation
Slide 14
The accuracy of the model predictions depends on the distance between the two ambiguous positions
Distance between ambiguous positions | Correct | Incorrect | Uncertain
1 base in between | 93% | 0% | 7%
3 bases in between | 55% | 19% | 26%
Slide 15
Conclusions
Predictions
- Proportion estimates
- Reconstruction of linkage
Findings
- Sequence-context-dependent incorporation provides linkage information
- Peak height profiles of mixtures can be computed
Limitations
- Limited accuracy
- Limited range
- Profiles of the clonal variants are required
Slide 16
Thank you for your attention
- Thomas Lengauer
- Alex Thielen
- Rolf Kaiser
- Maria Neumann-Fraune
Slide 17
End
Slide 18
Subsequent improvements led to almost equal peak heights
Improvements in polymerase, chain terminators, protocol and sequencer (ABI 3100 sequencer; Lee LG et al. 1992; Ying Li et al. 1999)
- Chain terminator sequencing required 15 years of research
- Different companies use different materials and protocols
- Context-dependent incorporation of dideoxy-nucleotides was