Learning Drug Resistance from Therapeutic History (Alejandro Pironti)



SLIDE 1

Learning Drug Resistance from Therapeutic History

Alejandro Pironti

Computational Biology and Applied Algorithmics Max-Planck-Institut für Informatik May 12, 2015

SLIDE 2

Motivation

  • Genotypic drug-resistance determination:
    – Rules-based systems
    – Data-driven systems

  • Data-driven genotypic interpretation systems are trained on genotype-phenotype pairs (GPPs)
    – GPPs are hard to get
    – Combined regression of GPPs derived with different assays is problematic

Goals:

  • Development of a data-driven genotypic drug-resistance interpretation system requiring minimal expert supervision

  • Exploitation of both genotype-phenotype pairs from different assays and therapy-history data from routine clinical practice

  • Regular, automatic updates

Figure 1: The genotype codes for the phenotype.

Figure 2: Data-driven or rules-based? The benefits and disadvantages of each approach render them complementary.

SLIDE 3

Datasets: Drug Exposure

Drug      DPRRT    DIN   TPRRT   TIN   HIVdb
ABC       7,862    256   1,661   136     363
AZT      20,923    332   3,796   214   1,190
d4T      15,172    209   2,705   124   1,101
ddC       4,928     40   1,189    56     339
ddI      13,836    176   2,552   119     817
FTC       4,699    261     788   152      83
3TC      23,063    406   3,954   242       0
TDF       9,873    349   1,636   200     272
DLV         142      8      90    23       0
EFV      10,311    221   1,922   155     454
ETR         272     49     130    59       2
NVP       9,094    167   1,635   132     570
APV       1,369     50     375    36     206
ATV       3,549    198     738   102      76
DRV         936    121     313    90       7
FPV       1,139     72     274    52      30
IDV      10,760    155   2,053    93     812
LPV       8,951    246   1,837   175     197
NFV       8,407    126   1,520    77     764
SQV       7,371    126   1,764    97     493
TPV         673     73     217    49      11
RAL         694    132     209    95       0
Naïve    37,408  2,188   2,453   184       0
Total    63,593  2,674   6,886   461   1,517

EVG and RPV list only three values each in the source (EVG: 10, 3, 0; RPV: 5, 2, 0); their column assignment is not recoverable.

  • PRRT: 40,473 EuResist sequences (exposed and naïve) + 20,020 Los Alamos sequences (naïve only), including 9,690 TCEs.

  • IN: 1,524 EuResist sequences (exposed and naïve) + 4,111 Los Alamos sequences (naïve only)

  • HIVdb: 1,804 protease and reverse-transcriptase sequences from the HIVdb TCE repository (exposed only), including 1,512 TCEs. Reserved for testing.

Table: Numbers of sequences by dataset and drug exposure. DPRRT: training PRRT dataset; DIN: training IN dataset; TPRRT: test PRRT dataset; TIN: test integrase dataset.

SLIDE 4

Datasets: Genotype-Phenotype Pairs

Drug   AV Train   PS Train   Resist. Train   Total Train   AV Test   PS Test   Resist. Test   Total Test   Total
3TC         912       1537            1623          2449       108       175            184          283    2732
ABC         851       1468             902          2319        96       171             96          267    2586
AZT         859       1555            1234          2414       103       177            137          280    2694
d4T         898       1562            1026          2460       101       179            104          280    2740
ddC         833        448             139          1281        93        49             15          142    1423
ddI         900       1563             167          2463       102       180             17          282    2745
TDF         648       1224             696          1872        72       142             75          214    2086
DLV        1036       1621            1055          2657       106       186            109          292    2949
EFV        1133       1636            1362          2769       114       187            135          301    3070
ETR         374        460             268           834        32        68             35          100     934
NVP        1194       1640            1477          2834       122       188            156          310    3144
RPV          93        173              93           266        12        24             15           36     302
ATV         773       1156             975          1929        86       109             99          195    2124
DRV         270        648             349           918        34        60             33           94    1012
FPV        1086       1705            1413          2791       112       183            138          295    3086
IDV        1144       1739            1409          2883       132       189            159          321    3204
LPV        1041       1485            1486          2526       112       155            150          267    2793
NFV        1178       1783            1646          2961       134       196            180          330    3291
SQV        1177       1743            1187          2920       133       193            134          326    3246
TPV         742        880             584          1622        80        80             55          160    1782
EVG         106        589             206           695         8        70             26           78     773
RAL         106        622             220           728         8        73             30           81     809

Table: Numbers of Antivirogram (AV) and PhenoSense (PS) genotype-phenotype pairs.

  • Genotype-phenotype pairs downloaded from HIVdb

  • Gaussian-mixture model used for resistant-susceptible cutoff determination

  • Only resistant genotype-phenotype pairs used for training, but both types for testing.
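
The cutoff step can be illustrated with a minimal sketch. The slides do not say how the mixture was fitted or on which scale, so the code below makes several assumptions: a two-component, one-dimensional Gaussian mixture, fitted by plain EM on log resistance factors, with the cutoff taken as the point between the two component means where both components are equally likely. The function names and the simulated data are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def fit_two_gaussians(values, iterations=100):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization."""
    ordered = sorted(values)
    half = len(ordered) // 2
    # Start from the lower and upper halves of the sorted data.
    means = [sum(ordered[:half]) / half,
             sum(ordered[half:]) / (len(ordered) - half)]
    sds = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: probability that each value belongs to the upper component.
        upper = []
        for x in values:
            p0 = weights[0] * normal_pdf(x, means[0], sds[0])
            p1 = weights[1] * normal_pdf(x, means[1], sds[1])
            upper.append(p1 / (p0 + p1) if p0 + p1 > 0.0 else 0.5)
        # M-step: re-estimate weights, means and standard deviations.
        n1 = sum(upper)
        n0 = len(values) - n1
        means = [sum((1.0 - u) * x for u, x in zip(upper, values)) / n0,
                 sum(u * x for u, x in zip(upper, values)) / n1]
        var0 = sum((1.0 - u) * (x - means[0]) ** 2 for u, x in zip(upper, values)) / n0
        var1 = sum(u * (x - means[1]) ** 2 for u, x in zip(upper, values)) / n1
        # A small floor keeps a component from collapsing onto a single point.
        sds = [max(math.sqrt(var0), 1e-3), max(math.sqrt(var1), 1e-3)]
        weights = [n0 / len(values), n1 / len(values)]
    return weights, means, sds

def equal_posterior_cutoff(weights, means, sds, steps=60):
    """Bisect between the two means for the point where both components are equally likely."""
    lo, hi = means[0], means[1]
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        diff = (weights[0] * normal_pdf(mid, means[0], sds[0])
                - weights[1] * normal_pdf(mid, means[1], sds[1]))
        if diff > 0.0:
            lo = mid  # the lower (susceptible) component still dominates here
        else:
            hi = mid
    return (lo + hi) / 2.0

# Simulated log resistance factors: a susceptible and a resistant cluster.
random.seed(0)
log_rf = ([random.gauss(0.0, 0.4) for _ in range(200)]
          + [random.gauss(2.0, 0.4) for _ in range(200)])
weights, means, sds = fit_two_gaussians(log_rf)
cutoff = equal_posterior_cutoff(weights, means, sds)
```

Values above `cutoff` would be called resistant and values below it susceptible. The bisection assumes the two components are separated well enough that each one dominates at its own mean.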

SLIDE 5

Prediction and Cutoff Determination

  • Linear support vector machines trained for discriminating between:
    – Sequences exposed to or resistant to a certain drug
    – Therapy-naïve sequences and those exposed to other drugs

  • Features:
    – Amino acids, insertions and deletions
    – Protease: positions 4-99
    – Reverse transcriptase: positions 40-230
    – All integrase positions

  • Determination of upper and lower cutoffs by maximization of AUC in training set when predicting drug exposure

Schematic Representation of a Support Vector Machine

Distance to hyperplane is a linear score for predicting drug exposure and resistance: the drug-exposure score.
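
As a rough illustration of the feature representation and the linear score, the sketch below encodes the symbol observed at each position as a binary indicator and computes w·x + b, which is proportional to the signed distance to the hyperplane. Everything specific in it is an assumption made for illustration: the symbols '#' and '~' for insertions and deletions, the function names, and the weights, which in practice would come from a trained linear SVM.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# '#' for an insertion and '~' for a deletion are symbols assumed for this sketch.
SYMBOLS = frozenset(AMINO_ACIDS + "#~")

def encode(sequence, first_position, last_position):
    """Return sparse indicator features {(position, symbol): 1.0}.

    `sequence` maps a 1-based position to the symbol observed there;
    positions outside the requested range are ignored.
    """
    features = {}
    for position in range(first_position, last_position + 1):
        symbol = sequence.get(position)
        if symbol in SYMBOLS:
            features[(position, symbol)] = 1.0
    return features

def drug_exposure_score(features, weights, bias):
    """Linear score w.x + b; its sign gives the side of the hyperplane."""
    return sum(weights.get(key, 0.0) * value for key, value in features.items()) + bias

# Hypothetical protease example: positions 4-99 as on the slide, made-up weights.
sequence = {3: "V", 30: "N", 46: "I", 90: "M"}
weights = {(3, "V"): 5.0, (30, "D"): -0.2, (46, "I"): 0.8, (90, "M"): 0.6}
features = encode(sequence, 4, 99)
score = drug_exposure_score(features, weights, bias=-0.5)
```

Position 3 lies outside the encoded range, so its weight never contributes. Because the model is linear, each weight can be read directly as the contribution of one symbol at one position.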

SLIDE 6

Performance: Drug-Exposure Prediction

Drug                    DES   DES After Cutoffs   HIVdb Rule Set
3TC/FTC                0.84                0.81             0.73
ABC                    0.76                0.72             0.68
AZT                    0.84                0.81             0.74
d4T                    0.85                0.82             0.77
ddI                    0.86                0.83             0.77
TDF                    0.73                0.69             0.61
EFV                    0.77                0.74              0.7
ETR                    0.78                0.75             0.72
NVP                    0.77                0.74              0.7
APV/FPV                 0.8                0.73             0.74
ATV                    0.61                0.58             0.56
DRV                    0.65                0.62             0.62
IDV                    0.79                0.76              0.7
LPV                     0.7                0.67             0.65
NFV                    0.79                 0.5             0.74
SQV                    0.81                0.78             0.72
TPV                    0.83                0.79              0.8
Mean CD (SD)    0.77 (0.07)         0.73 (0.09)       0.7 (0.06)

Rows with only two values in the source (column assignment not recoverable): ddC 0.84, 0.5; RAL 0.75, 0.69; Naïve PRRT 0.88, 0.83; Naïve IN 0.65, 0.64; Mean AM (SD) 0.78 (0.07), 0.71 (0.1).

Table 1: Drug-Exposure Prediction Performance (AUC) on EuResist test set.

Drug                    DES   DES After Cutoffs   HIVdb Rule Set
3TC/FTC                0.73                0.66             0.76
ABC                     0.7                0.66             0.66
AZT                    0.62                 0.6             0.67
d4T                    0.65                0.62             0.65
ddI                    0.73                0.69             0.68
TDF                    0.57                0.54             0.55
EFV                    0.83                0.79             0.79
ETR                    0.58                0.63             0.64
ATV                    0.61                0.57             0.54
DRV                    0.88                0.89             0.89
IDV                    0.76                0.73             0.73
LPV                    0.67                0.64             0.64
NFV                    0.76                 0.5             0.71
SQV                    0.76                0.73             0.74
TPV                    0.79                0.74             0.78

Row with only two values in the source (column assignment not recoverable): Mean CD (SD) 0.72 (0.09), 0.67 (0.1).

Table 1: Drug-Exposure Prediction Performance (AUC) on HIVdb test set.

DES: drug-exposure score; CD: Common drugs; AM: All Models
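
For reference, an AUC can be computed as the probability that a randomly chosen positive sequence scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below also shows one possible reading of the "after cutoffs" columns, which is an assumption rather than something the slides spell out: scores are mapped to three levels by a lower and an upper cutoff before the AUC is computed, and the cutoff pair is picked by exhaustive search on training scores. All names and data are hypothetical.

```python
from itertools import combinations

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def discretize(score, lower, upper):
    """Map a score to 0 (below lower), 1 (between the cutoffs) or 2 (at or above upper)."""
    if score < lower:
        return 0
    if score < upper:
        return 1
    return 2

def best_cutoffs(scores, labels):
    """Try every pair of observed scores as (lower, upper); keep the first pair with the highest AUC."""
    best = None
    for lower, upper in combinations(sorted(set(scores)), 2):
        levels = [discretize(s, lower, upper) for s in scores]
        value = auc(levels, labels)
        if best is None or value > best[0]:
            best = (value, lower, upper)
    return best

# Made-up training scores; label 1 = exposed to the drug, 0 = not exposed.
scores = [-1.2, -0.8, -0.3, 0.1, 0.4, 0.9, 1.5, 2.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
raw_auc = auc(scores, labels)
best_auc, lower_cutoff, upper_cutoff = best_cutoffs(scores, labels)
```

The exhaustive search is quadratic in the number of distinct scores, which is fine for a sketch but would need a coarser candidate grid on datasets of the size reported here.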

SLIDE 7

Performance: Resistance Prediction and Therapy Success Prediction

Drug      Antivirogram log RF Correlation   PhenoSense log RF Correlation   Resistant vs. Naïve after cutoffs AUC
3TC/FTC   0.75                              0.76                            0.99
ABC       0.65                              0.73                            1
AZT       0.27                              0.5                             1
d4T       0.38                              0.55                            0.99
ddI       0.49                              0.45                            0.98
TDF       0.26                              0.24                            0.99
EFV       0.71                              0.74                            0.99
ETR       0.71                              0.65                            0.99
RPV       0.75                              0.7                             1
NVP       0.75                              0.6                             0.99
APV/FPV   0.85                              0.88                            1
ATV       0.84                              0.89                            0.99
DRV       0.72                              0.89                            1
IDV       0.82                              0.84                            1
LPV       0.88                              0.92                            1
NFV       0.79                              0.85                            1
SQV       0.78                              0.8                             1
TPV       0.48                              0.64                            0.99
RAL       0.62                              0.71                            0.96
EVG       0.71                              0.67                            0.99
Mean (SD) 0.66 (0.19)                       0.7 (0.17)                      0.99 (0.01)

Table 1: Correlation of drug-exposure scores with log resistance factors (RF). Additionally, cutoffs were applied to drug-exposure scores and the capability of discriminating between resistant genotypes and therapy-naïve genotypes was assessed.
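
The slides do not state which correlation coefficient was used for the comparison with log resistance factors. The sketch below assumes Pearson's r and a base-10 logarithm, and uses made-up scores and resistance factors purely to show the calculation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

# Hypothetical drug-exposure scores and measured resistance factors.
scores = [-0.9, -0.2, 0.3, 1.1, 1.8]
resistance_factors = [0.8, 1.5, 4.0, 20.0, 95.0]
log_rf = [math.log10(rf) for rf in resistance_factors]
r = pearson(scores, log_rf)
```

Taking the logarithm first matters because resistance factors span orders of magnitude, while the drug-exposure score is linear.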

                                     EuResist TCEs   HIVdb TCEs
Drug Exposure Scores After Cutoffs            0.68         0.63
HIVdb Rule Set                                0.67         0.66

Table 2: Therapy-success prediction performance (AUC).

SLIDE 8

Examples

  • http://bioinf.mpi-inf.mpg.de/g2p_r

SLIDE 9

Concluding Remarks

  • Novel approach:
    – Data-driven genotypic drug-resistance interpretation derived from therapy history and genotype-phenotype pairs

  • Training of the tool without resistant genotypes:
    – Yields good, albeit decreased, performance

  • Linear weights of the models provide interpretation for prediction

SLIDE 10


Acknowledgements

Max-Planck-Institut für Informatik

Thomas Lengauer, Nico Pfeifer, Joachim Büch, Prabhav Kalaghatgi

University of Düsseldorf

Björn Jensen

University of Cologne

Rolf Kaiser, Mark Oette, Saleta Sierra Aragon, Elena Knops, Maria Neumann-Fraune, Eugen Schülter, Eva Heger, Claudia Müller, Nadine Lübcke

Medizinisches Labor Berg

Hauke Walter, Martin Obermeier

Institut für Immunologie und Genetik Kaiserslautern

Martin Däumer, Alexander Thielen, Bernhard Thiele

EuResist

Francesca Incardona, Maurizio Zazzi, Mattia Prosperi

Robert-Koch-Institut

Claudia Kücherer