University of Cologne Institute of Virology
ES / Bonn Apr 2008 1
Arevir 2008: Analysis of HIV resistance mutations
Institute of Virology, University of Cologne - Center of Advanced European Studies and Research, Bonn
Bioinformatics analysis of relations between mutations of the HIV genome and phenotypic drug resistance for the …
1999 The Arevir project1) was founded by Daniel Hoffmann, Rolf Kaiser and Joachim Selbig. Aim: to develop computer-based methods that enhance the interpretation …
2000 Niko Beerenwinkel created the basis for Arevir in his dissertation "Computational Analysis of HIV Drug Resistance Data"
2001 Barbara Schmidt, Hauke Walter and Klaus Korn provided ~650 genotype-phenotype pairs
2001 First version of geno2pheno available online, predicting drug resistance from genotype
2006 Collaboration with EuResist (www.euresist.org)
2008 New version of the Arevir DB and user interface
1) The project was funded by the German Research Foundation (Grants HO 1582/1-1 to -3 and KA 1569/1-1 to -3)
Why develop computer-based methods?
Genotype: 41L, 67N, 68G, 70R, 86DE, 88S, 90I, 102Q, 103S, 118IV, 135T, 162H, 190A, 203DE, 210W, 211K, 214F, 215Y, 219E, 228H, 248D, 277K, 283I, 326V, 329IV, 334L
[Figure: viral load (VL) over time (t), with the point of the resistance test marked]
Therapy should be switched
We have to deal with: … VL, CD4, side effects, preferences, resistance
But: selecting the optimal therapy remains a hard task! We need a database to learn more about mutations and resistance, to correlate therapies with clinical outcome, etc.
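The genotype above lists each mutation as a position followed by the observed amino acid(s); we assume that two letters (86DE, 118IV) denote a mixture at that position. A minimal Python sketch of turning such a list into a lookup structure under that assumption, also tolerating a leading wild-type letter as in K65R. `parse_genotype` is a hypothetical helper, not part of Arevir or geno2pheno:

```python
import re

# Assumed token format: optional wild-type letter, position, one or more observed amino acids.
MUTATION = re.compile(r"([A-Z]?)(\d+)([A-Z]+)")


def parse_genotype(text: str) -> dict[int, set[str]]:
    """Map each position to the set of observed amino acids (hypothetical helper)."""
    genotype: dict[int, set[str]] = {}
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing commas
        match = MUTATION.fullmatch(token)
        if match is None:
            raise ValueError(f"cannot parse mutation {token!r}")
        _wild_type, position, amino_acids = match.groups()
        # Several letters (e.g. 86DE) are treated as a mixture at one position.
        genotype.setdefault(int(position), set()).update(amino_acids)
    return genotype


slide_genotype = parse_genotype("41L, 67N, 68G, 70R, 86DE, 88S, 90I, 102Q, 103S, 118IV")
print(sorted(slide_genotype[86]))  # ['D', 'E']
```

Keying by position makes it easy to ask whether a given position is mutated at all, which is the kind of question a resistance rule or a statistical model needs to answer.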
Patient-related data (demographic data and diagnoses)
Therapy data
Isolate-related data (serology, clinical chemistry)
Genotypic data (mutations)
Administrative data (access rights etc.)
Restricted access to the data
Only people involved in patient care (e.g. those responsible for diagnostic findings) have access to all data
Bioinformaticians receive a copy of the anonymized data
Use of pseudonyms
When a new patient record is added, a pseudonym is generated with the SHA-1 algorithm, so that data can be exchanged without disclosing names
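A minimal sketch of SHA-1 pseudonym generation with Python's `hashlib`. The slides do not say which fields are hashed; the sketch assumes first name, last name and birth date, normalized for whitespace and case. `make_pseudonym` is a hypothetical name:

```python
import hashlib


def make_pseudonym(first_name: str, last_name: str, birth_date: str) -> str:
    """Derive a pseudonym as an upper-case SHA-1 hex digest.

    Hypothetical: the slides do not say which fields are hashed;
    name and birth date (DD.MM.YYYY) are assumed here.
    """
    # Normalize whitespace and case so trivial typing differences give the same pseudonym.
    fields = (first_name, last_name, birth_date)
    key = "|".join(" ".join(part.split()).upper() for part in fields)
    return hashlib.sha1(key.encode("utf-8")).hexdigest().upper()


print(make_pseudonym("Max", "Muster", "17.03.1963"))
```

The same input always yields the same 40-character digest, so records from different sites can be linked without exchanging names. Because names and birth dates can be guessed, a keyed hash such as `hmac.new(secret, key_bytes, hashlib.sha1)` would be harder to reverse by brute force. Whether Arevir uses a secret key is not stated.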
Connection to Arevir is only possible:
With SSH (Secure Shell)
Using public/private key authentication
From a computer whose IP address is in a subnet known to Arevir
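The third condition is a subnet whitelist. The sketch below shows the membership test with Python's standard `ipaddress` module. The subnets are hypothetical documentation ranges, not Arevir's real ones. In a real deployment this check would typically live in the firewall or in sshd's configuration (for example a `Match Address` block), not in application code:

```python
import ipaddress

# Hypothetical whitelist: documentation-only ranges (RFC 5737), not Arevir's real subnets.
KNOWN_SUBNETS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def from_known_subnet(client_ip: str) -> bool:
    """Return True if the client address lies inside one of the known subnets."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in KNOWN_SUBNETS)


for ip in ("192.0.2.17", "198.51.100.200", "203.0.113.5"):
    print(ip, from_known_subnet(ip))
```

A /25 network covers only the lower half of the /24. So 198.51.100.200 is rejected even though it shares the first three octets with the whitelisted range.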
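Arevir security
[Figure: the user PC connects via the Internet through an SSH tunnel to the SSH daemon behind the firewall; components shown: Arevir server, Arevir DB, Rosie; other services (HTTP, mail, FTP, Telnet, …) outside the SSH tunnel]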
For a meaningful correlation, a large amount of data is needed!
[Figure: filter funnel Patients → Therapies → CD4 → VL → Sequences]
The result stays small despite an increasing amount of data:
2001: data from ~500 patients, ~150 records suitable for evaluation
2005: data from ~4500 patients, ~350 records suitable for evaluation
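The funnel shows that a record is only usable when every kind of data is present. A minimal sketch of that filter, assuming (from the funnel labels) that "suitable" means therapy, CD4, VL and sequence data all exist for a patient. The real Arevir selection criteria may be stricter, for example on timing:

```python
# Assumed criterion, read off the funnel figure: a patient record is usable only
# if therapy, CD4, VL and sequence data are all present.
REQUIRED_KINDS = frozenset({"therapy", "CD4", "VL", "sequence"})


def suitable_records(available: dict[int, set[str]]) -> list[int]:
    """Return the IDs of patients whose available data covers every required kind."""
    return sorted(pid for pid, kinds in available.items() if REQUIRED_KINDS <= kinds)


def yield_percent(suitable: int, patients: int) -> float:
    """Share of patients that end up as evaluable records, in percent."""
    return 100.0 * suitable / patients


example = {
    1: {"therapy", "CD4", "VL", "sequence"},
    2: {"therapy", "CD4", "VL"},  # no sequence: dropped
    3: {"therapy", "CD4", "VL", "sequence", "diagnosis"},
}
print(suitable_records(example))           # [1, 3]
print(round(yield_percent(150, 500), 1))   # 30.0
print(round(yield_percent(350, 4500), 1))  # 7.8
```

With the slide's figures, the evaluable share falls from about 30% in 2001 to about 7.8% in 2005, which is the "steady small result" the slide describes.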
Arevir in June 2005 and February 2007
                 June 2005    February 2007
Patients            ~4,500            2,444
Diagnoses           ~2,900            4,412
Therapies          ~12,000            6,154
CD4 values         ~52,000           53,031
VL values          ~41,000           25,972
Sequences           ~3,000            2,180
Small errors can have quite big effects:
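[Figure: treatment change episode (TCE) with therapy A and therapy B; dates 15.03.2002, 12 weeks (15.06.2002) and 20.06.200…; viral load (VL) values 4.0 and 1.6 plotted over time t]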
Arevir in May 2008
Patients           ~5,600
Diagnoses          ~9,200
Therapies          ~9,800
Isolate values   ~230,000
Sequences          ~5,100
By supporting therapy optimization algorithms
By enhancing our understanding of HIV
André Ramirez 23.03.1969         /  Andre Ramires 23.03.1969
Hans-Peter Schmidt 05.11.1982    /  Hans Schmidt 05.11.1982
Georgios Koehler 15.04.1958      /  Georgious Koeler 15.04.1958
Anna Meier 29.12.1978            /  Anna da Silveira 29.12.1978
John Miller 16.03.1970           /  John Miller 10.03.1970
Mgabe Osamba 12.04.1974          /  Ossamba Mgabe 11.04.1974
Patient names were cleansed and a unique patient ID was assigned using new fuzzy indices (name aliases allowed)
Examples are fictitious!
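The slides do not describe how the fuzzy indices work. As a stand-in technique, the sketch below flags candidate duplicates for manual review. It combines a string-similarity ratio from Python's `difflib.SequenceMatcher` on accent-stripped, token-sorted names with a tolerance on the birth date. Both thresholds (0.75 similarity, 7 days) are hypothetical:

```python
import unicodedata
from datetime import date
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Lower-case, drop accents (é -> e) and sort name tokens so swapped names line up."""
    decomposed = unicodedata.normalize("NFKD", name.lower())
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(sorted(plain.split()))


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] of the normalized names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def possible_duplicate(rec_a: tuple[str, date], rec_b: tuple[str, date],
                       min_similarity: float = 0.75, max_day_diff: int = 7) -> bool:
    """Flag a pair of (name, birth date) records for manual review; thresholds are hypothetical."""
    (name_a, born_a), (name_b, born_b) = rec_a, rec_b
    close_dates = abs((born_a - born_b).days) <= max_day_diff
    return close_dates and name_similarity(name_a, name_b) >= min_similarity


pairs = [
    (("André Ramirez", date(1969, 3, 23)), ("Andre Ramires", date(1969, 3, 23))),
    (("Anna Meier", date(1978, 12, 29)), ("Anna da Silveira", date(1978, 12, 29))),
    (("Mgabe Osamba", date(1974, 4, 12)), ("Ossamba Mgabe", date(1974, 4, 11))),
]
for a, b in pairs:
    print(a[0], "/", b[0], possible_duplicate(a, b))
```

String similarity catches spelling variants, swapped names and birth-date typos. It does not catch a change of surname such as Meier / da Silveira, which is where an explicit alias table ("name aliases allowed") is needed.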
Several checks are applied to uncover suspicious data:
Genotypes without sampling date
Duplicate genotypes
Overlapping therapies
Date checks (e.g. infection < first positive test < first treatment, etc.)
Therapies with 'forbidden' drug combinations
More than one isolate value (of one kind) within a period of 14 days
Same isolate value for different dates
Lab values out of the specified range (e.g. HIVRNA > 10,000,000 copies/ml)
…
Data cleansing is hard, time-consuming and tedious work, but absolutely necessary!
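A few of the listed checks can be sketched in plain Python. Assumptions: records are simple tuples rather than database rows, a therapy that starts on the day the previous one ends counts as a switch rather than an overlap, and "within 14 days" means fewer than 14 days apart. All function names are hypothetical:

```python
from datetime import date
from itertools import combinations

MAX_HIVRNA = 10_000_000  # copies/ml; plausibility limit quoted on the slide
MIN_GAP_DAYS = 14        # assumed reading of "more than one value of a kind in 14 days"


def overlapping_therapies(therapies):
    """therapies: (therapy_id, start, end) for one patient; a switch on the end date is no overlap."""
    return [(a[0], b[0]) for a, b in combinations(therapies, 2)
            if a[1] < b[2] and b[1] < a[2]]


def crowded_values(values):
    """values: (kind, sample_date, value); same-kind pairs sampled less than 14 days apart."""
    return [(a, b) for a, b in combinations(values, 2)
            if a[0] == b[0] and abs((a[1] - b[1]).days) < MIN_GAP_DAYS]


def out_of_range(values):
    """HIVRNA values above the plausibility limit."""
    return [v for v in values if v[0] == "HIVRNA" and v[2] > MAX_HIVRNA]


def dates_in_order(*dates):
    """E.g. infection < first positive test < first treatment."""
    return all(earlier < later for earlier, later in zip(dates, dates[1:]))


therapies = [(1, date(2002, 1, 1), date(2002, 6, 30)),
             (2, date(2002, 6, 1), date(2003, 1, 1)),
             (3, date(2003, 1, 1), date(2003, 12, 31))]
values = [("HIVRNA", date(2002, 6, 2), 165_000),
          ("HIVRNA", date(2002, 6, 10), 170_000),
          ("CD4", date(2002, 6, 5), 350),
          ("HIVRNA", date(2002, 9, 1), 12_000_000)]
print(overlapping_therapies(therapies))  # [(1, 2)]
print(len(crowded_values(values)))       # 1
print(out_of_range(values))
```

The pairwise comparisons are quadratic, but they run per patient, where the number of therapies and lab values is small.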
patients
2342 1963 'M' 'IVDA'
identity
2342 Max Muster 17.03.1963 'FA3E359D1..'
diagnoses
7431 2342 'R75'
therapies
4288 2342 2002-06-23 2003-11-05
therapycomponents
4288 '3TC' 150 mg
isolates
12630 2342 2002-06-02
sequences
2966 12630 'CCTCAGATC...' 3247583203
isolate_values
12630 'HIVRNA' 'bDNA' 165000
mutations
2966 K65R, L74V...
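The example rows show how the tables link through their IDs: patient 2342 → therapy 4288 → its components, and isolate 12630 → sequence 2966 → mutations. A minimal SQLite sketch of that structure, using Python's built-in `sqlite3`. The column names are hypothetical because the slide shows only sample rows. The identity and diagnoses tables and the last, unexplained number in the sequences row are left out, and the nucleotide sequence is truncated as on the slide:

```python
import sqlite3

# Hypothetical column names: the slide shows only example rows, not the schema.
SCHEMA = """
CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, birth_year INTEGER, sex TEXT, risk_group TEXT);
CREATE TABLE therapies (therapy_id INTEGER PRIMARY KEY,
                        patient_id INTEGER REFERENCES patients, start_date TEXT, end_date TEXT);
CREATE TABLE therapycomponents (therapy_id INTEGER REFERENCES therapies, drug TEXT, dose_mg INTEGER);
CREATE TABLE isolates (isolate_id INTEGER PRIMARY KEY,
                       patient_id INTEGER REFERENCES patients, sample_date TEXT);
CREATE TABLE isolate_values (isolate_id INTEGER REFERENCES isolates, kind TEXT, method TEXT, value REAL);
CREATE TABLE sequences (sequence_id INTEGER PRIMARY KEY,
                        isolate_id INTEGER REFERENCES isolates, nucleotides TEXT);
CREATE TABLE mutations (sequence_id INTEGER REFERENCES sequences, mutation TEXT);
"""


def build_example_db() -> sqlite3.Connection:
    """In-memory database filled with the example rows from the slide."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO patients VALUES (2342, 1963, 'M', 'IVDA')")
    con.execute("INSERT INTO therapies VALUES (4288, 2342, '2002-06-23', '2003-11-05')")
    con.execute("INSERT INTO therapycomponents VALUES (4288, '3TC', 150)")
    con.execute("INSERT INTO isolates VALUES (12630, 2342, '2002-06-02')")
    con.execute("INSERT INTO isolate_values VALUES (12630, 'HIVRNA', 'bDNA', 165000)")
    con.execute("INSERT INTO sequences VALUES (2966, 12630, 'CCTCAGATC')")  # truncated as on the slide
    con.executemany("INSERT INTO mutations VALUES (2966, ?)", [("K65R",), ("L74V",)])
    return con


def mutations_with_viral_load(con: sqlite3.Connection, patient_id: int):
    """Link each mutation of a patient to the sampling date and HIVRNA value of its isolate."""
    query = """
        SELECT i.sample_date, m.mutation, v.value
        FROM isolates i
        JOIN sequences s ON s.isolate_id = i.isolate_id
        JOIN mutations m ON m.sequence_id = s.sequence_id
        JOIN isolate_values v ON v.isolate_id = i.isolate_id AND v.kind = 'HIVRNA'
        WHERE i.patient_id = ?
        ORDER BY m.mutation
    """
    return con.execute(query, (patient_id,)).fetchall()


print(mutations_with_viral_load(build_example_db(), 2342))
```

Joining genotype and lab values through the isolate is what lets mutations be correlated with viral load, the kind of analysis the earlier slides motivate.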
Backup I
Silent Mutations Subtype B
[Figure: silent mutations in subtype B, 1999 to 2006; series PRO (treated), RT (treated), PRO (naive), RT (naive); y-axis 4.0 to 10.0]