Effects of Sequencing Errors on geno2pheno[coreceptor]
Alejandro Pironti, Saleta Sierra, Rolf Kaiser, Thomas Lengauer and Nico Pfeifer
Computational Biology and Applied Algorithmics Max Planck Institute for Informatics April 18, 2013
April 18, 2013 Alejandro Pironti
– Non-duplicated V3 regions of the env gene
– Los Alamos National Laboratory Sequence Database
– All sequences in a dataset
– Build 6 datasets containing 1000 sequences each
– Choose sequences at random
– Replace the original nucleotide with another nucleotide or an IUPAC ambiguity code
– Evaluate with geno2pheno[coreceptor]
– Position(s) chosen at random
– Differentiate between nucleotides and ambiguity codes
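The alteration procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `alter_sequence`, `build_datasets` and `EXPERIMENTS` are hypothetical, changed positions are assumed to be distinct, ambiguity codes are assumed to be drawn uniformly from the eleven IUPAC codes, and the mapping of the six datasets to the six change settings is a guess. The geno2pheno[coreceptor] scoring step is not reproduced.

```python
import random

NUCLEOTIDES = "ACGT"
# The eleven IUPAC nucleotide ambiguity codes (each stands for 2+ bases).
AMBIGUITY_CODES = "RYSWKMBDHVN"

# Assumption: the six datasets correspond to 1, 2 or 3 changes,
# made either with nucleotides (False) or with ambiguity codes (True).
EXPERIMENTS = [(n, amb) for amb in (False, True) for n in (1, 2, 3)]


def alter_sequence(seq: str, n_changes: int, use_ambiguity: bool,
                   rng: random.Random) -> str:
    """Hypothetical helper: change n_changes distinct random positions."""
    chars = list(seq)
    for pos in rng.sample(range(len(chars)), n_changes):
        if use_ambiguity:
            # Assumption: any ambiguity code may replace the original base.
            chars[pos] = rng.choice(AMBIGUITY_CODES)
        else:
            # Replace with a nucleotide different from the original one.
            chars[pos] = rng.choice([b for b in NUCLEOTIDES if b != chars[pos]])
    return "".join(chars)


def build_datasets(sequences, n_datasets=6, size=1000, seed=0):
    """Hypothetical helper: draw n_datasets random samples of `size` sequences."""
    rng = random.Random(seed)
    return [rng.sample(sequences, size) for _ in range(n_datasets)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy pool of random sequences standing in for the LANL V3 data.
    toy_pool = ["".join(rng.choice(NUCLEOTIDES) for _ in range(105))
                for _ in range(2000)]
    for data, (n, amb) in zip(build_datasets(toy_pool), EXPERIMENTS):
        altered = [alter_sequence(s, n, amb, rng) for s in data]
        # Each altered sequence would then be scored with geno2pheno[coreceptor].
        print(n, "ambiguity" if amb else "nucleotide", len(altered))
```

Sampling positions without replacement guarantees that "3 changes" really alters three different positions rather than possibly overwriting the same one twice.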
Figure: sequence logo of the 5 most frequent amino acids per position (letter height proportional to frequency; color: average FPR, see key). Histogram of the original FPRs.
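The quantities behind such a logo can be computed roughly as below. This is a sketch assuming equal-length, aligned amino acid sequences with one geno2pheno FPR value per sequence, and assuming the color key shows the average FPR of the sequences carrying a given residue; the function names are hypothetical, and gap characters, if present, are simply counted like residues.

```python
from collections import Counter, defaultdict


def top_residues(aligned, k=5):
    """Per column: the k most frequent residues and their relative
    frequencies (the letter heights of the logo)."""
    columns = []
    for pos in range(len(aligned[0])):
        counts = Counter(seq[pos] for seq in aligned)
        total = sum(counts.values())
        columns.append([(aa, n / total) for aa, n in counts.most_common(k)])
    return columns


def mean_fpr_by_residue(aligned, fprs, pos):
    """Average FPR of the sequences carrying each residue at column `pos`
    (assumed to be what the logo's color key encodes)."""
    sums, counts = defaultdict(float), Counter()
    for seq, fpr in zip(aligned, fprs):
        sums[seq[pos]] += fpr
        counts[seq[pos]] += 1
    return {aa: sums[aa] / counts[aa] for aa in counts}
```

For example, with the toy alignment `["CTR", "CTK", "CSR", "CTR"]`, the second column yields `[("T", 0.75), ("S", 0.25)]`.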
Comparison of the FPR histograms for the unchanged and the altered sequences.
Figure panels: amino acid position 11; amino acid position 25.
On average:
Data                 X4                  Intermediate        R5
Original sequences   10,484 (16%)        6,157 (9%)          48,668 (75%)
Altered sequences    16,538,450 (17%)    12,109,891 (13%)    66,396,041 (70%)
Data                 X4                  R5
Original sequences   13,181 (20%)        52,128 (80%)
Altered sequences    23,625,020 (25%)    71,419,362 (75%)

Data                 X4                  R5
Original sequences   20,385 (31%)        44,924 (69%)
Altered sequences    34,047,123 (36%)    60,997,259 (64%)
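The percentages in these tables are each class's share of its row total. The short check below uses only the counts reported in the three tables (labelled here simply "table 1" to "table 3", since the slides do not name the prediction settings) to recompute the rounded shares and to confirm that every table covers the same 65,309 original and 95,044,382 altered sequences.

```python
# Counts copied from the three tables above.
TABLES = {
    "table 1": {
        "original": {"X4": 10_484, "Intermediate": 6_157, "R5": 48_668},
        "altered": {"X4": 16_538_450, "Intermediate": 12_109_891,
                    "R5": 66_396_041},
    },
    "table 2": {
        "original": {"X4": 13_181, "R5": 52_128},
        "altered": {"X4": 23_625_020, "R5": 71_419_362},
    },
    "table 3": {
        "original": {"X4": 20_385, "R5": 44_924},
        "altered": {"X4": 34_047_123, "R5": 60_997_259},
    },
}


def shares(row):
    """Percentage of each class in a row, rounded to whole percent."""
    total = sum(row.values())
    return {label: round(100 * n / total) for label, n in row.items()}


if __name__ == "__main__":
    for name, table in TABLES.items():
        for kind, row in table.items():
            print(name, kind, sum(row.values()), shares(row))
```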
Figure panels: 1 nucleotide change, 2 nucleotide changes, 3 nucleotide changes, 1 ambiguity change, 2 ambiguity changes, 3 ambiguity changes.
Figure: % switches.
Each pair of bars is one experiment with 1000 sequences. In each sequence, one, two or three positions were changed at random; the replacement was either another nucleotide or an IUPAC ambiguity code.
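A switch can be read as a change in the predicted coreceptor class between a sequence and its altered copy. The sketch below counts switches under that reading. The function names are hypothetical, the cutoff is a free parameter (the slides do not state which FPR cutoff was used), and treating an FPR exactly at the cutoff as X4 is an assumption about the boundary; only the direction (low FPR means an X4 call) reflects how geno2pheno[coreceptor] reports its results.

```python
def predicted_class(fpr: float, cutoff: float) -> str:
    """Two-class call: low FPR means X4 (boundary handling is an assumption)."""
    return "X4" if fpr <= cutoff else "R5"


def percent_switches(original_fprs, altered_fprs, cutoff):
    """Share (in %) of sequences whose call changes after alteration."""
    if len(original_fprs) != len(altered_fprs):
        raise ValueError("need one altered FPR per original FPR")
    switches = sum(
        predicted_class(o, cutoff) != predicted_class(a, cutoff)
        for o, a in zip(original_fprs, altered_fprs)
    )
    return 100.0 * switches / len(original_fprs)
```

With a hypothetical cutoff of 10, original FPRs `[2, 50, 8, 90]` and altered FPRs `[12, 40, 9, 5]` give two switches out of four (X4 to R5 and R5 to X4), that is 50%.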