Arevir University of Cologne Institute of Virology Analysis of - - PowerPoint PPT Presentation

Arevir University of Cologne Institute of Virology Analysis of resistance mutations of HI-Virus Bioinformatics analysis of relations between mutations of the HIV genome and phenotypical drug resistance for the optimization of anti-retroviral


SLIDE 1

University of Cologne Institute of Virology ES / Bonn Apr 2007

Arevir

Eugen Schülter

Institute of Virology, University of Cologne (Cologne); center of advanced european studies and research (Bonn)

Analysis of resistance mutations of HI-Virus

Bioinformatics analysis of relations between mutations of the HIV genome and phenotypic drug resistance for the optimization of anti-retroviral therapies
SLIDE 2

Outline

Background
Database scheme
Requirements of data protection
Problems resulting from 'too much protection'
Data cleansing
Comparison between 'old' and 'new' Arevir DB
Basic statistics / samples of derived information
Collaborations / What next

SLIDE 3

Background

1999 The Arevir project 1) was founded by Daniel Hoffmann, Rolf Kaiser and Joachim Selbig. Aim: develop computer-based methods to enhance the interpretation of genotypic resistance tests.
2000 Niko Beerenwinkel created the basis for Arevir in his dissertation: Computational Analysis of HIV Drug Resistance Data
2001 Barbara Schmidt, Hauke Walter and Klaus Korn provided ~650 genotype-phenotype pairs
2001 First version of geno2pheno available online, predicting drug resistance from genotype

1) The project was funded by the German Research Foundation (Grants HO 1582/1-1 to -3 and KA 1569/1-1 to -3)

SLIDE 4

Background

Why develop computer-based methods?

[Figure: viral load (VL) over time (t), with the resistance test marked]

Therapy must be changed!

Genotype: 41L, 67N, 68G, 70R, 86DE, 88S, 90I, 102Q, 103S, 118IV, 135T, 162H, 190A, 203DE, 210W, 211K, 214F, 215Y, 219E, 228H, 248D, 277K, 283I, 326V, 329IV, 334L

Question: How will the current virus population change, faced with a new therapy? Which new therapy is the optimal choice? (~1500 drug combinations seen!)
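Before a genotype list such as the one on this slide can be correlated with anything, it has to be turned into structured data. A minimal Python sketch of that step follows. The function name `parse_genotype` is hypothetical, and reading multi-letter entries such as 86DE as a mixture of both residues at that position is an assumption, not something the slides state.

```python
import re

# One entry: a position followed by one or more amino-acid letters,
# e.g. "41L" or "86DE".
ENTRY = re.compile(r"^(\d+)([A-Z]+)$")

def parse_genotype(text):
    """Map each position to the set of amino acids listed for it.

    Assumption: several letters after a position (e.g. "86DE") denote a
    mixture, i.e. both residues were detected at that position.
    """
    genotype = {}
    for raw in text.split(","):
        entry = raw.strip()
        if not entry:
            continue  # tolerate a trailing comma, as in the slide text
        match = ENTRY.match(entry)
        if match is None:
            raise ValueError(f"unrecognized genotype entry: {entry!r}")
        position, residues = int(match.group(1)), set(match.group(2))
        genotype.setdefault(position, set()).update(residues)
    return genotype

example = parse_genotype("41L, 67N, 86DE, 118IV, 215Y,")
```

A dictionary keyed by position makes it easy to ask whether a given resistance-associated position is affected, independent of the order in which the entries were reported.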

SLIDE 5

Background

To give assistance with both questions, we tried to correlate genotype, VL, CD4 and therapy. For this we needed a database.

The Arevir DB is a MySQL database and consists mainly of 50 tables organized in 5 groups:

Patient related data (demographic data)
Therapy data
Isolate related data (serology, clinical chemistry)
Genotypic data (mutations)
Administrative data (access rights etc.)

SLIDE 6

Arevir DB 'Scheme'

Table             | Columns
patients          | patID | year_of_birth | gender | transmission
pseudonyms        | patID | pseudonym
diagnoses         | diagID | patID | ICD_code
therapies         | therapyID | patID | therapy_start | therapy_stop
therapycomponents | therapyID | compound | dosage
isolates          | isoID | patID | sampling_date
sequences         | seqID | isoID | nt_seq | nt_crc
isolate_values    | isoID | propID | methodID | value
mutations         | seqID | mutations
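How the keys in this scheme link a patient to the mutations found in their isolates can be sketched as follows. The sketch uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. Only the table and column names come from the slide; the column types, the foreign-key declarations and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL server
conn.executescript("""
CREATE TABLE patients (
    patID INTEGER PRIMARY KEY,
    year_of_birth INTEGER,
    gender TEXT,
    transmission TEXT
);
CREATE TABLE isolates (
    isoID INTEGER PRIMARY KEY,
    patID INTEGER REFERENCES patients(patID),
    sampling_date TEXT
);
CREATE TABLE sequences (
    seqID INTEGER PRIMARY KEY,
    isoID INTEGER REFERENCES isolates(isoID),
    nt_seq TEXT,
    nt_crc INTEGER
);
CREATE TABLE mutations (
    seqID INTEGER REFERENCES sequences(seqID),
    mutations TEXT
);
""")

# Hypothetical sample rows, for illustration only.
conn.execute("INSERT INTO patients VALUES (1, 1970, 'F', 'unknown')")
conn.execute("INSERT INTO isolates VALUES (10, 1, '2004-05-17')")
conn.execute("INSERT INTO sequences VALUES (100, 10, 'CCTCAG', 0)")
conn.execute("INSERT INTO mutations VALUES (100, 'M184V')")

# Follow the keys patID -> isoID -> seqID to list the mutations per patient.
rows = conn.execute("""
    SELECT p.patID, i.sampling_date, m.mutations
    FROM patients p
    JOIN isolates i ON i.patID = p.patID
    JOIN sequences s ON s.isoID = i.isoID
    JOIN mutations m ON m.seqID = s.seqID
""").fetchall()
```

Because the pseudonym sits in its own table, a query like this one never has to touch it, which matches the idea of granting read access to the data without access to the pseudonyms.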

SLIDE 7

Arevir DB 'Scheme'

Table             | Example row
patients          | 2342 | 1963 | 'M' | 'IVDA'
pseudonyms        | 2342 | 'FA3E359D1...'
diagnoses         | 7431 | 2342 | 'R75'
therapies         | 4288 | 2342 | 2002-06-23 | 2003-11-05
therapycomponents | 4288 | '3TC' | 150 mg
isolates          | 12630 | 2342 | 2002-06-02
sequences         | 2966 | 12630 | 'CCTCAGATC...' | 3247583203
isolate_values    | 12630 | 'HIVRNA' | 'bDNA' | 165000
mutations         | 2966 | K65R, L74V...

SLIDE 8

Data Protection

Requirements from data protection officials:

Tracing back to the identity of a patient must be impossible
Restricted access to the data (each physician can see 'her/his' data, bioinformaticians have read access but not to pseudonyms)

Names are not stored in Arevir at all

When a new patient record is added, a pseudonym is generated on the fly using the SHA-1 algorithm from the fields first name, last name and birthday.

The only way to connect to the server is:

SSH (secure shell)
Using public/private key authentication
From a computer whose IP address is known to Arevir
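The pseudonym generation described on this slide can be sketched with Python's standard hashlib module. The slide names only the algorithm (SHA-1) and the three input fields; the '|' separator, the UTF-8 encoding, the upper-case hex output and the function name `pseudonym` are assumptions.

```python
import hashlib

def pseudonym(first_name, last_name, birthday):
    """Derive a pseudonym from the three fields named on the slide.

    Assumptions: the fields are joined with '|' and encoded as UTF-8;
    the concatenation actually used by Arevir is not given in the slides.
    """
    record = "|".join([first_name, last_name, birthday])
    return hashlib.sha1(record.encode("utf-8")).hexdigest().upper()

# Hypothetical inputs: the same person entered with two spellings.
a = pseudonym("André", "Ramirez", "23.03.1969")
b = pseudonym("Andre", "Ramires", "23.03.1969")
```

The two calls return unrelated 40-character strings although the inputs differ in only two characters. A plain hash gives stable pseudonyms only as long as the name is always typed in exactly the same way.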

SLIDE 9

Internet Access to Arevir

[Figure: access paths to the Arevir server over the Internet (TCP/IP)]

Login with private key protected by pass phrase
VNC login with password
VNC tunneled via SSH
Arevir web application login with password

SLIDE 10

Web Interface

Arevir's web interface showing the input form for personal data

SLIDE 11

Selbig's Data Funnel

For a meaningful correlation a large amount of data is needed!

[Figure: data funnel narrowing from Patients through Therapies, CD4 and VL to Sequences]

Steady small result despite an increasing amount of data:

2001 Data from ~500 patients: ~150 records suitable for evaluation
2005 Data from ~4500 patients: ~350 records suitable for evaluation

SLIDE 12

The data funnel

[Figure: Δ n / time for clinical data and genotypes, 2000 to 2005]

While the lab was producing more genotypes per month, the amount of follow-up data was decreasing steadily.

Bad patient identification was identified as the main cause of the funnel effect:

Arevir pseudonym algorithm was dependent on the exact spelling of the patient name
Maintenance of the patient ↔ genotype list at the laboratory was done with an Excel sheet

Solution:

'New' pseudonym algorithm using some kind of fuzzy preprocessing
Standalone application (Rosie) for the Institute of Virology
Migration to new MySQL version with some alterations in the scheme
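The slides do not describe the 'new' pseudonym algorithm beyond "some kind of fuzzy pre processing". Purely as an illustration of the idea, the sketch below normalizes a name before hashing it. All rules in `normalize_name` (dropping accents, lower-casing, collapsing doubled letters, sorting the name parts) and both function names are assumptions, not the Arevir implementation.

```python
import hashlib
import re
import unicodedata

def normalize_name(name):
    """Reduce a name to a spelling-tolerant key (hypothetical rules)."""
    # Decompose accented characters and drop the combining marks: "é" -> "e".
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Lower-case and treat everything that is not a-z as a separator.
    parts = re.split(r"[^a-z]+", unaccented.lower())
    # Collapse doubled letters ("ossamba" -> "osamba") and sort the parts,
    # so that swapped first and last names yield the same key.
    collapsed = [re.sub(r"(.)\1+", r"\1", p) for p in parts if p]
    return " ".join(sorted(collapsed))

def fuzzy_pseudonym(first_name, last_name, birthday):
    """SHA-1 over the normalized name plus the birthday."""
    key = normalize_name(first_name + " " + last_name) + "|" + birthday
    return hashlib.sha1(key.encode("utf-8")).hexdigest().upper()
```

With these rules, a missing accent, a doubled consonant or swapped name fields no longer change the pseudonym. Substituted letters and mistyped birthdays still do, so normalization of this kind can only complement a manual review of near matches.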

SLIDE 13

Data cleansing I

André Ramirez 23.03.1969      | Andre Ramires 23.03.1969
Hans-Peter Schmidt 05.11.1982 | Hans Schmidt 05.11.1982
Georgios Koehler 15.04.1958   | Georgious Koeler 15.04.1958
Anna Meier 29.12.1978         | Anna da Silveira 29.12.1978
John Miller 16.03.1970        | John Miller 10.03.1970
Mgabe Osamba 12.04.1974       | Ossamba Mgabe 11.04.1974

Cleansing of patient names and assignment of a unique patient ID was done with new fuzzy indices (name aliases allowed)

Examples are fictitious!
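How record pairs like the ones above might be flagged for review can be sketched with Python's standard difflib module. The 'fuzzy indices' actually used are not described in the slides; the similarity threshold of 0.8, the rule that birthdays may differ in at most one character, and all function names are assumptions.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Similarity in [0, 1] that ignores case and the order of name parts."""
    key_a = " ".join(sorted(a.lower().replace("-", " ").split()))
    key_b = " ".join(sorted(b.lower().replace("-", " ").split()))
    return SequenceMatcher(None, key_a, key_b).ratio()

def date_mismatches(d1, d2):
    """Number of differing characters between two DD.MM.YYYY strings."""
    if len(d1) != len(d2):
        return max(len(d1), len(d2))
    return sum(c1 != c2 for c1, c2 in zip(d1, d2))

def likely_same_patient(rec_a, rec_b, threshold=0.8):
    """Flag two (name, birthday) records as candidates for manual review."""
    (name_a, born_a), (name_b, born_b) = rec_a, rec_b
    return (name_similarity(name_a, name_b) >= threshold
            and date_mismatches(born_a, born_b) <= 1)

typo = likely_same_patient(("John Miller", "16.03.1970"),
                           ("John Miller", "10.03.1970"))
swapped = likely_same_patient(("Mgabe Osamba", "12.04.1974"),
                              ("Ossamba Mgabe", "11.04.1974"))
renamed = likely_same_patient(("Anna Meier", "29.12.1978"),
                              ("Anna da Silveira", "29.12.1978"))
```

The first two pairs are flagged. The third is not, because a changed surname leaves too little of the string in common; cases like this need an explicit alias rather than a similarity score.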

SLIDE 14

Data cleansing II

Several checks applied to uncover suspicious data:

Genotypes without sampling date
Duplicate genotypes
Overlapping therapies
Date checks (e.g. infection < first positive test < first treatment etc.)
Therapies with 'forbidden' drug combinations
More than one isolate value (of a kind) in a period of 7 days
Lab values out of specified range (e.g. HIVRNA > 10,000,000 copies/ml)
. . .

Data cleansing is hard work, time consuming, tedious but absolutely necessary!
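Three of the checks on this slide (overlapping therapies, date order, lab values out of range) can be sketched as plain Python functions. The tuple layout of the records, the function names, the lower bound of 0 and the choice to accept equal dates are assumptions; the upper bound of 10,000,000 copies/ml is the example given on the slide.

```python
from datetime import date

def overlapping_therapies(therapies):
    """Return pairs of therapy IDs whose date ranges overlap.

    `therapies` is a list of (therapy_id, start, stop) tuples; a stop of
    None means the therapy is still running.
    """
    ordered = sorted(therapies, key=lambda t: t[1])
    conflicts = []
    for i, (id_a, _start_a, stop_a) in enumerate(ordered):
        for id_b, start_b, _stop_b in ordered[i + 1:]:
            # b starts no earlier than a, so the two overlap if b starts
            # before a stops. A start on the stop day is not flagged.
            if stop_a is None or start_b < stop_a:
                conflicts.append((id_a, id_b))
    return conflicts

def dates_in_order(*dates):
    """True if the known dates are non-decreasing; None entries are skipped."""
    known = [d for d in dates if d is not None]
    return all(earlier <= later for earlier, later in zip(known, known[1:]))

def hivrna_in_range(copies_per_ml, upper=10_000_000):
    """Range check for a viral load value."""
    return 0 <= copies_per_ml <= upper

therapies = [
    (1, date(2002, 1, 1), date(2002, 6, 1)),
    (2, date(2002, 5, 1), date(2002, 9, 1)),
    (3, date(2002, 9, 1), None),
]
conflicts = overlapping_therapies(therapies)
```

Checks like these only mark records as suspicious. Whether an overlap is a data entry error or a real change of therapy still has to be decided by someone who can consult the source documents.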

SLIDE 15

Data cleansing III

Small errors can have quite big effects:

[Figure: viral load (VL) over time (t) for a TCE with a change from Therapy A to Therapy B. Labels: 15.03.2002, 12W (15.06.2002), 20.06.2002; VL values 4,0 and 1,6]

SLIDE 16

Rosie

Two entry forms of Rosie

Data quality and consistency were improved by early checks

Pseudonym is generated in Rosie

SLIDE 17

Old and new Arevir versions

                | Arevir in June 2005 | Arevir in February 2007
Patients        | ~ 4,500             | 2,444
Diagnoses       | ~ 2,900             | 4,412
Therapies       | ~ 12,000            | 6,154
CD4 values      | ~ 52,000            | 53,031
VL values       | ~ 41,000            | 25,972
Sequences       | ~ 3,000             | 2,180
Complete TCEs   | ~ 350               | ~ 750

SLIDE 18

Basic statistics

[Figure: Transmission quotas for male and female patients (axis 0,0 to 45,0), by category: blood transfusion, haemophiliac, heterosexual, homosexual, IVDA, pattern II, unknown]

SLIDE 19

Derived Data

Percentage of silent mutations per genotype

[Figure: percentage of silent mutations per genotype (axis 5,0 to 14,0) by year, 1999 to 2006; series: PRO (treated), RT (treated), PRO (naïve), RT (naïve)]

n naïve = 1013
n pretreated = 2327

SLIDE 20

Collaborations / What Next

At the moment Arevir receives data mainly from the University Clinics of Bonn, Köln (through Medeora, www.medeora.com) and Düsseldorf.

The system is open and any contribution is welcome!
There is a close collaboration with EuResist (www.euresist.org)

Coming soon:

Integrase Genotypes
Hepatitis B and Hepatitis C data

SLIDE 21

Conclusions

Databases can help to improve the health of HIV-infected patients:

By supporting therapy optimization algorithms
By enhancing our understanding of HIV

But to extract useful information from a database, a large amount of high-quality data is needed!

SLIDE 22

Bernd Kupfer, Christian von Behren, Claudia Müller, Clemens Kühn, Daniel Hoffmann, Dörte Hammerschmidt, Gerd Fätkenheuer, Hauke Walter, Heike Krause, Joachim Selbig, Jürgen Klein, Jürgen Rockstroh, Mark Oette, M i Dä

Thank you!