
COMET: A Novel approach to HIV-1 subtype prediction (Context-based Modeling for Expeditious Typing)



  1. COMET: A Novel approach to HIV-1 subtype prediction (Context-based Modeling for Expeditious Typing) Daniel Struck CRP-SANTÉ Laboratory of Retrovirology (daniel.struck@crp-sante.lu) comet.retrovirology.lu

  2. Background
     • HIV-1 subtype is often used for epidemiological studies
     • Many different subtyping tools exist:
       – jpHMM, RIP (LANL), NCBI genotyping, STAR, REGA Subtyping Tool, …
     • Subtyping remains a controversial topic → compare the results from different approaches

  3. COMET HIV-1 subtyping tool
     • Context-based modeling for classification of HIV-1 sequences, adapted from the PPM compression algorithm (prediction by partial match)
       – takes ambiguities from population sequencing into consideration
     • Software written in Java (Linux, Windows, Apple, …)
     • Core algorithm fits in approx. 300 lines of code
     • Does not require any external analysis tool (muscle / mafft / clustal, paup / raxml / phyml)
     • Multi-threaded (takes advantage of all available CPU cores)

  4. Algorithm
     • Train the model with the subtype reference sequences from Los Alamos National Lab (LANL) from 2008 and 30 additional near-full-length sequences from LANL.
     • Slide over the sequence and determine the probabilities for each subtype. Simplified example with a model of order 4 on the sequence C T A G C A A C A:

       Subtype A   0.5   0.5   0.1   0.2   0.3
       Subtype B   0.5   0.5   0.4   0.6   0.8
       Subtype C   0.3   0.2   0.1   0.2   0.1

     • Determine the most probable subtype.
     • Then slide over the table of probabilities with a window size of 250 bp and a step size of 2 bp to detect possible recombination events.
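The steps on this slide can be illustrated with a minimal Python sketch. It is not the COMET implementation (which is written in Java and adapted from PPM): the function names are hypothetical, add-one smoothing stands in for PPM's fallback to shorter contexts, summed log-probabilities are an assumed way of picking "the most probable subtype", and ambiguity codes are not handled.

```python
import math
from collections import defaultdict

ORDER = 4  # context length, matching the slide's simplified example


def train(sequences, order=ORDER):
    """Count which base follows each context of `order` bases."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1
    return counts


def position_probs(model, query, order=ORDER, alphabet="ACGT"):
    """Probability of each query base given its preceding context.

    Add-one smoothing is an assumption made for this sketch.
    """
    probs = []
    for i in range(order, len(query)):
        followers = model.get(query[i - order:i], {})
        total = sum(followers.values())
        probs.append((followers.get(query[i], 0) + 1) / (total + len(alphabet)))
    return probs


def best_subtype(log_rows, start=0, end=None):
    """Subtype with the highest summed log-probability over [start, end)."""
    return max(log_rows, key=lambda s: sum(log_rows[s][start:end]))


def classify(models, query, window=250, step=2):
    """Return the overall best subtype and the best subtype per window."""
    log_rows = {
        subtype: [math.log(p) for p in position_probs(model, query)]
        for subtype, model in models.items()
    }
    n = len(next(iter(log_rows.values())))
    overall = best_subtype(log_rows)
    windows = [
        (start, best_subtype(log_rows, start, start + window))
        for start in range(0, max(n - window, 0) + 1, step)
    ]
    return overall, windows


# Toy reference sequences, invented for illustration only.
models = {
    "A": train(["ACGTACGTACGTACGT"]),
    "B": train(["AAAACCCCGGGGTTTT"]),
}
overall, windows = classify(models, "ACGTACGTACGT", window=4, step=2)
```

A change of the winning subtype between consecutive windows is what would hint at a recombination breakpoint in this sketch.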

  5. Analysis of 27017 prot-RT sequences from LANL
     • Dataset for analysis:
       – 27017 prot-RT sequences downloaded from LANL.
     • Query parameters:
       – HXB2 start point: 2253, end point: 3450 (prot-RT region)
       – Sequence length < 1700 bp
     • Download subtype results from the STAR and REGA subtyping tools.
       – STAR: all PURE, CRF: 01_AE - 02_AG
       – REGA v2: all PURE, CRF: 01_AE - 14_BG

  6. Subtype distribution of the dataset (27017 prot-RT sequences)

     Subtype     STAR    REGA    COMET
     B           19988   19722   20282
     C           1329    1334    1329
     A           672     1200    1194
     D           555     186     499
     G           246     441     393
     F           205     206     193
     H           19      21      19
     J           3       6       2
     CRF02_AG    867     787     829
     CRF01_AE    414     419     416
     other CRF   0       653     806
     unassigned  2719    2042    1055

  7. Comparison of STAR, REGA & COMET (27017 prot-RT sequences)
     • All 3 tools agreed in 88.3% of cases (23854)
       – 22352 PURE
       – 777 CRF
       – 725 unassigned
     • All 3 tools disagreed in only 0.1% of cases (30).
     • COMET & REGA agreed in 6.4% of cases (1722); STAR disagreed
       – 1034 PURE, 582 CRF, 106 unassigned
     • COMET & STAR agreed in 4.0% of cases (1090); REGA disagreed
       – 910 PURE, 40 CRF, 140 unassigned
     • REGA & STAR agreed in 1.2% of cases (321); COMET disagreed
       – 77 PURE, 8 CRF, 236 unassigned
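As a quick consistency check, the five agreement categories on this slide should partition the 27017 sequences, and each percentage should follow from its count. The short Python sketch below recomputes them; the dictionary keys are labels chosen for this example, and the counts are taken from the slide.

```python
TOTAL = 27017

# Counts as reported on the slide; keys are illustrative labels.
agreement_counts = {
    "all three agree": 23854,
    "all three disagree": 30,
    "COMET & REGA agree, STAR differs": 1722,
    "COMET & STAR agree, REGA differs": 1090,
    "REGA & STAR agree, COMET differs": 321,
}


def percentages(counts, total):
    """Share of each category in percent, rounded to one decimal."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


shares = percentages(agreement_counts, TOTAL)
covered = sum(agreement_counts.values())
```

The categories sum to the full dataset, so every sequence falls into exactly one of them.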

  8. Comparison of REGA & COMET to LANL
     Of the 27017 sequences in the dataset, 24735 had a subtype (PURE, CRF, URF) assigned in the LANL database. For the comparison, 24576 sequences were analyzed (PURE, CRF: 01_AE → 14_BG, URF).
     • REGA & LANL agreed in 93.9% of cases (23077) and disagreed in 6.1% of cases (1499). Fleiss kappa = 0.84
     • COMET & LANL agreed in 96.9% of cases (23818) and disagreed in 3.1% of cases (758). Fleiss kappa = 0.92
     “The Fleiss kappa measure calculates the degree of agreement in classification over that which would be expected by chance and is scored as a number between 0 and 1.”
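To make "agreement over that which would be expected by chance" concrete, here is a minimal Python sketch of a chance-corrected agreement score for two raters. It implements Cohen's kappa, not the Fleiss variant quoted on the slide (Fleiss' kappa extends the same idea to any fixed number of raters), and the label lists are toy data invented for this example, not values from the dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long lists of category labels."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the two raters' marginal frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy labels (hypothetical): the raters agree on 3 of 4 sequences.
tool_labels = ["B", "B", "C", "A"]
lanl_labels = ["B", "B", "C", "C"]
kappa = cohen_kappa(tool_labels, lanl_labels)  # (0.75 - 0.375) / 0.625

# Raw agreement percentages from the slide, recomputed from the counts.
rega_agreement = round(100 * 23077 / 24576, 1)
comet_agreement = round(100 * 23818 / 24576, 1)
```

Raw agreement and kappa can differ noticeably when one category dominates, as subtype B does in this dataset, which is why the slide reports both.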

  9. Cohen Kappa

     Subtype   REGA ↔ LANL   COMET ↔ LANL   training set
     01_AE     0.98          0.98           5
     02_AG     0.92          0.93           6
     03_AB     0             0              2
     04_CPX    0.86          0.86           4
     05_DF     1             0              3
     06_CPX    0.83          0.77           5
     07_BC     1             0.98           4
     08_BC     0.97          0.97           2
     09_CPX    0             0.8            4
     10_CD     -1.09E-04     0              2
     11_CPX    0.64          0.64           3
     12_BF     0.65          0.61           5
     13_CPX    0.8           0.8            3
     14_BG     0             0              2
     A         0.96          0.96           7 A1, 2 A2
     B         0.92          0.98           7
     C         0.99          0.99           6
     D         0.41          0.94           6
     F         0.94          0.92           6 F1, 2 F2
     G         0.9           0.91           4
     H         0.97          0.91           4
     J         0.5           0.5            3
     K         0             0              2
     URF       0.38          0.55

  10. Benchmark
      Analysis of the 27017 prot-RT sequences:
      • 392 ± 2 seconds (6 ½ minutes) on an Opteron server (2 x quad-core, 2.5 GHz) => 68 prot-RT sequences / second
      • 144 ± 0 seconds (2 ½ minutes) on a new Intel server (2 x quad-core, newest generation, 2.93 GHz) => 187 prot-RT sequences / second
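The throughput figures follow directly from the run times. A small Python sketch of the arithmetic, assuming the slide rounds down to whole sequences per second (the function name is hypothetical):

```python
def sequences_per_second(n_sequences, seconds):
    """Whole sequences processed per second (rounded down)."""
    return n_sequences // seconds


opteron_rate = sequences_per_second(27017, 392)
intel_rate = sequences_per_second(27017, 144)
```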

  11. Ultra-deep sequencing (UDS) application
      In-house UDS (454) software:
      • alignment, trimming
      • filtering
      • compressing
      • automatic correction of homopolymer count & “carry forward” errors
      • …
      • added adapted COMET module with bootstrap analysis (100 values per sequence, threshold 75%)
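The slide does not say what the bootstrap resamples. One plausible reading, sketched below in Python, resamples the positions of a read with replacement (as in phylogenetic bootstrapping), re-picks the best subtype for each of the 100 replicates, and keeps the call only if at least 75% of replicates agree. The function name, the input layout, and the resampling scheme are all assumptions of this sketch.

```python
import random
from collections import Counter


def bootstrap_support(log_rows, replicates=100, threshold=0.75, seed=0):
    """Bootstrap a subtype call for one read.

    log_rows maps each subtype to its per-position log-probabilities
    (assumed input layout). Returns (call, support), where call is
    "unassigned" if support falls below the threshold.
    """
    rng = random.Random(seed)
    subtypes = list(log_rows)
    n = len(log_rows[subtypes[0]])
    votes = Counter()
    for _ in range(replicates):
        # Resample positions with replacement (assumed scheme).
        sample = [rng.randrange(n) for _ in range(n)]
        winner = max(subtypes, key=lambda s: sum(log_rows[s][i] for i in sample))
        votes[winner] += 1
    subtype, hits = votes.most_common(1)[0]
    support = hits / replicates
    return (subtype if support >= threshold else "unassigned"), support


# Toy log-probabilities (hypothetical) that clearly favour A1.
rows = {"A1": [-0.1] * 50, "C": [-2.0] * 50}
call, support = bootstrap_support(rows)
```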

  12. UDS application, dataset: 64 patients from Rwanda AMATA study
      • 454
        – Sequence length: 333 bp (454, RT, AA 88 → 198)
        – Total sequences analyzed: 267749 (seq. with frameshifts excluded)
        – Time needed for analysis (100 bootstraps / seq.): 5 ½ minutes
      • Sanger (prot-RT)
        – URF: 2 AC, 5 CA, 1 CAC, 1 AD, 2 CD, 1 DC, 1 GH

  13. UDS application, results: COMET subtype confirmation

      patient  major subtype  number  minor subtype  number  unassigned  minority %  REGA   STAR   jpHMM  man. align. insp.  Sanger
      5        A1             4312    C              1       0           0.02        ok     A1/u   ok     ok                 URF_CA
      8        D              6853    A1             1       57          0.01        ok     ok     ok     ok                 D
      9        C              6603    A1             14      28          0.21        u/A1   u/A1   H/A1   C-H?/A1            URF_GH
      17       A1             5727    C              3       0           0.05        ok     ok     ok     ok                 A1
      18       C              3279    A1             5       0           0.15        ok     ok     ok     ok                 C
      21       A1             2856    C              4       0           0.14        u/u    ok     ok     ok                 A1
      22       C              5995    A1             5       0           0.08        u/A1   ok     ok     ok                 C
      24       A1             6361    C              13      0           0.2         u/C    ok     ok     ok                 A1
      25       C              6412    A1             15      0           0.23        C/u    ok     ok     ok                 URF_CD
      26       A1             7350    C              1       0           0.01        u/C    ok     ok     ok                 C
      32       C              6094    A1             11      0           0.18        C/u    ok     ok     ok                 URF_DC
      33       A1             2226    C              1       0           0.04        ok     ok     ok     ok                 A1
      35       A1             4864    C              4       0           0.08        A1/u   ok     ok     ok                 A1
      36       A1             670     C              1       0           0.15        ok     ok     ok     ok                 A1
      47       A1             3290    C              2       0           0.06        u/C    ok     ok     ok                 A1
      48       A1             4120    C              1       0           0.02        u/C    ok     ok     ok                 A1
      49       C              5279    A1             58      0           1.09        ok     ok     ok     ok                 C
      64       C              1695    A1             9       0           0.53        ok     ok     ok     ok                 URF_CA
      65       A1             6346    C              8       0           0.13        A1/u   ok     ok     ok                 A1
      73       C              3335    A1             1       0           0.03        ok     ok     ok     ok                 C
      79       A1             3244    C              3       0           0.09        ok     ok     ok     ok                 A1

      21 out of 64 patients (32.81%) seem to be dually infected by two different subtypes
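The "minority %" column appears to be the minor-subtype read count as a share of all reads for the patient. The Python sketch below assumes the denominator includes the unassigned reads, which the slide does not state; for the rows checked here the rounded value matches the slide. The function name is hypothetical.

```python
def minority_percent(major, minor, unassigned=0):
    """Minor-subtype reads as a percentage of all reads for a patient.

    Including unassigned reads in the denominator is an assumption.
    """
    return round(100 * minor / (major + minor + unassigned), 2)


# (major, minor, unassigned) read counts taken from the table.
patient_5 = minority_percent(4312, 1, 0)
patient_9 = minority_percent(6603, 14, 28)
patient_49 = minority_percent(5279, 58, 0)
patient_64 = minority_percent(1695, 9, 0)
```

With minor-subtype shares this small (mostly well under 1%), the bootstrap threshold and the confirmation by other tools matter for separating real minority variants from sequencing noise.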

  14. Summary
      • Reliable prediction of HIV-1 subtype
      • Generally it is best to compare the results of different approaches to define the subtype of a sequence
      • High performance and scalability
        – suitable for deep sequencing (454) analysis
      • In preparation: stand-alone desktop version with the possibility to inspect the recombination pattern

  15. http://comet.retrovirology.lu
      • Subtype results can be downloaded in CSV format

  16. Acknowledgements
      • CRP-Santé, Laboratory of Retrovirology: Jean-Claude Schmit, Carole Devaux, Danielle Perez Bercoff, Jean-Claude Karasi
      • CRP-Santé, Laboratory of Cardiovascular Research: Francisco Azuaje
