Arevir 2008 (University of Cologne, ES / Bonn, Apr 2008)

SLIDE 1

University of Cologne Institute of Virology

ES / Bonn Apr 2008 1

Arevir 2008

SLIDE 2

Arevir

Eugen Schülter

Institute of Virology, University of Cologne (Cologne); center of advanced european studies and research (Bonn)

Analysis of resistance mutations of HIV

Bioinformatics analysis of relations between mutations of the HIV genome and phenotypic drug resistance for the optimization of antiretroviral therapies
SLIDE 3

Background

1999 The Arevir project 1) was founded by Daniel Hoffmann, Rolf Kaiser and Joachim Selbig. Aim: develop computer-based methods to enhance the interpretation of genotypic resistance tests.

2000 Niko Beerenwinkel created the basis for Arevir in his dissertation "Computational Analysis of HIV Drug Resistance Data"

2001 Barbara Schmidt, Hauke Walter and Klaus Korn provided ~ 650 genotype-phenotype pairs

2001 First version of geno2pheno available online, predicting drug resistance from genotype

2006 Collaboration with EuResist (www.euresist.org)

2008 New Arevir DB and user interface version

1) The project was funded by the German Research Foundation (Grants HO 1582/1-1 to -3 and KA 1569/1-1 to -3)

SLIDE 4

Background

Why develop computer-based methods?

[Figure: VL over t, with the labels "Resistance Test" and "Therapy should be switched"]

Genotype: 41L, 67N, 68G, 70R, 86DE, 88S, 90I, 102Q, 103S, 118IV, 135T, 162H, 190A, 203DE, 210W, 211K, 214F, 215Y, 219E, 228H, 248D, 277K, 283I, 326V, 329IV, 334L

We have to deal with: … VL, CD4, side effects, preferences, resistance

But: Selection of the optimal therapy remains a hard task! We need a database to find out more about mutations and resistance, correlate therapies with clinical outcome etc.
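
As a small illustration of computer-based handling of such data, the genotype list on this slide can be split into structured entries. The sketch below assumes that each entry is a position followed by one or more amino acid letters, and that several letters (e.g. 86DE) denote a mixture; the names `Mutation` and `parse_genotype` are hypothetical and not part of Arevir.

```python
import re
from typing import List, NamedTuple


class Mutation(NamedTuple):
    position: int      # position in the protein sequence
    amino_acids: str   # one letter, or several for an assumed mixture


# One entry: digits followed by uppercase letters, e.g. "41L" or "86DE".
_ENTRY = re.compile(r"^(\d+)([A-Z]+)$")


def parse_genotype(text: str) -> List[Mutation]:
    """Parse a comma-separated genotype list into Mutation tuples."""
    mutations = []
    for raw in text.split(","):
        entry = raw.strip()
        if not entry:
            continue  # tolerate trailing commas
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError(f"unrecognised genotype entry: {entry!r}")
        mutations.append(Mutation(int(match.group(1)), match.group(2)))
    return mutations


parsed = parse_genotype("41L, 67N, 86DE, 118IV,")
mixtures = [m for m in parsed if len(m.amino_acids) > 1]
```

Here `mixtures` holds the entries with more than one amino acid letter, which is one way such a list could be screened before further analysis.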

SLIDE 5

Arevir DB basics

The Arevir DB is a relational database (MySQL version 5.0x) consisting of 54 tables and 14 views organized in 5 groups:

  • Patient related data (demographic data and diagnoses)
  • Therapy data
  • Isolate related data (serology, clinical chemistry)
  • Genotypic data (mutations)
  • Administrative data (access rights etc.)

SLIDE 6

Arevir Protection/Security

Restricted access to the data

  • Only people involved in patient care (e.g. responsible for diagnostic findings) have access to all data
  • Bioinformatics receive a copy of anonymized data

Use of pseudonyms

  • When a new patient record is added, a pseudonym is generated using the SHA-1 algorithm to facilitate data exchange without name disclosure

Connection to Arevir is only possible:

  • With SSH (secure shell)
  • Using public/private key authentication
  • From a computer whose IP address is in a subnet known to Arevir
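
The pseudonym step can be sketched as follows. The slide only states that SHA-1 is used; which fields are hashed and how they are normalised is not given, so the choice of last name, first name and date of birth, the separator, and the function name `make_pseudonym` are all assumptions made for illustration.

```python
import hashlib
import unicodedata


def make_pseudonym(first_name: str, last_name: str, birth_date: str) -> str:
    """Return an uppercase hex SHA-1 digest of the identifying fields.

    Assumed inputs: names as free text, birth_date as an ISO string
    (YYYY-MM-DD). The field selection is hypothetical.
    """
    parts = (last_name, first_name, birth_date)
    # Trim and upper-case so that spelling variants in case or spacing
    # map to the same pseudonym.
    raw = "|".join(part.strip().upper() for part in parts)
    canonical = unicodedata.normalize("NFC", raw)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest().upper()


pseudonym = make_pseudonym("Max", "Muster", "1963-03-17")
```

The same input always yields the same 40-character value, so two sites can exchange records keyed by pseudonym without exchanging names.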

SLIDE 7

Arevir security overview

Security concept

[Diagram components: User PC, Internet, SSH tunnel, Firewall, SSH-Daemon, Arevir-Server, Rosie, Arevir-DB; other traffic shown: HTTP, Mail, FTP, Telnet, …]
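
One element of this concept, accepting connections only from known subnets, could be checked as in the minimal sketch below. The subnet values are placeholders taken from the reserved documentation ranges, and `ALLOWED_SUBNETS` and `is_known_client` are hypothetical names; the actual Arevir configuration is not described in the slides.

```python
import ipaddress

# Placeholder subnets (documentation ranges), NOT the real Arevir subnets.
ALLOWED_SUBNETS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_known_client(ip: str) -> bool:
    """Return True if the address lies in one of the allowed subnets."""
    address = ipaddress.ip_address(ip)
    return any(address in subnet for subnet in ALLOWED_SUBNETS)


inside = is_known_client("192.0.2.17")
outside = is_known_client("203.0.113.5")
```

In practice such a restriction would normally be enforced by the firewall or the SSH daemon configuration rather than in application code; the function only illustrates the membership test.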

SLIDE 8

Selbig's Data Funnel

For a meaningful correlation a large amount of data is needed!

[Figure: funnel fed by Patients, Therapies, CD4, VL and Sequences]

Consistently small result despite an increasing amount of data:

  • 2001: Data from ~ 500 patients, ~ 150 records suitable for evaluation
  • 2005: Data from ~ 4500 patients, ~ 350 records suitable for evaluation
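
The funnel effect can be made explicit with a quick calculation on the two data points above. The counts are the approximate values from the slide, and relating evaluable records to the number of patients is a simplification chosen for this sketch.

```python
# year -> (patients, records suitable for evaluation), approximate slide values
funnel = {2001: (500, 150), 2005: (4500, 350)}


def usable_share(patients: int, usable: int) -> float:
    """Evaluable records per patient in the database."""
    return usable / patients


shares = {year: usable_share(*counts) for year, counts in funnel.items()}
for year, share in shares.items():
    print(f"{year}: {share:.1%}")
# prints "2001: 30.0%" and "2005: 7.8%"
```

While the number of patients grew ninefold, the number of evaluable records only roughly doubled, so the share dropped from about 30% to about 8%.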

SLIDE 9

Arevir before 2008

                 June 2005    February 2007
Patients         ~ 4,500      2,444
Diagnoses        ~ 2,900      4,412
Therapies        ~ 12,000     6,154
CD4 values       ~ 52,000     53,031
VL values        ~ 41,000     25,972
Sequences        ~ 3,000      2,180

SLIDE 10

Data cleansing

Small errors can have quite big effects:

[Figure: VL over t around a therapy change (TCE) between Therapy A and Therapy B; dates shown: 15.03.2002, 12W (15.06.2002), 20.06.2002; VL values shown: 4.0 and 1.6]

SLIDE 11

Rosie

SLIDE 12

Arevir 2008

                 June 2005    February 2007    May 2008
Patients         ~ 4,500      2,444            ~ 5,600
Diagnoses        ~ 2,900      4,412            ~ 9,200
Therapies        ~ 12,000     6,154            ~ 9,800
CD4 values       ~ 52,000     53,031           -
VL values        ~ 41,000     25,972           -
Isolate values   -            -                ~ 230,000
Sequences        ~ 3,000      2,180            ~ 5,100

SLIDE 13

Conclusions

Databases can help to improve the health of people infected with HIV:

  • By supporting therapy optimization algorithms
  • By enhancing our understanding of HIV

But to extract useful information from a database, a large amount of high-quality data is needed!

SLIDE 14

Alexander Thielen, Andre Altmann, Bernd Kupfer, Bettina Jaster, Christian von Behren, Claudia Müller, Clemens Kühn, Daniel Hoffmann, Daniel Gillor, Dörte Hammerschmidt, Elena Knops, EuResist, Gerd Fätkenheuer, Hauke Walter, …

Thank you!

SLIDE 15

SLIDE 16

Data cleansing I

Record A                          Record B
André Ramirez       23.03.1969    Andre Ramires      23.03.1969
Hans-Peter Schmidt  05.11.1982    Hans Schmidt       05.11.1982
Georgios Koehler    15.04.1958    Georgious Koeler   15.04.1958
Anna Meier          29.12.1978    Anna da Silveira   29.12.1978
John Miller         16.03.1970    John Miller        10.03.1970
Mgabe Osamba        12.04.1974    Ossamba Mgabe      11.04.1974

Cleansing of patient names and assignment of a unique patient ID were done with new fuzzy indices (name aliases allowed)

Examples are fictitious!
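
How the fuzzy indices work is not described in the slides. As a stand-in, the sketch below flags candidate duplicates by comparing normalised names with Python's `difflib.SequenceMatcher` and allowing a small difference in the date of birth. The thresholds and the function names are assumptions, and a pair like Anna Meier / Anna da Silveira would still need an explicit alias entry.

```python
import unicodedata
from datetime import date
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Strip accents, lower-case, split on hyphens and sort the name parts."""
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    tokens = plain.lower().replace("-", " ").split()
    return " ".join(sorted(tokens))  # sorting ignores swapped first/last name


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalised names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def possible_duplicate(name_a: str, born_a: date, name_b: str, born_b: date,
                       min_similarity: float = 0.8,
                       max_day_diff: int = 0) -> bool:
    """Flag two records as candidates for the same patient."""
    close_dates = abs((born_a - born_b).days) <= max_day_diff
    return close_dates and name_similarity(name_a, name_b) >= min_similarity


same_day = possible_duplicate("André Ramirez", date(1969, 3, 23),
                              "Andre Ramires", date(1969, 3, 23))
swapped = possible_duplicate("Mgabe Osamba", date(1974, 4, 12),
                             "Ossamba Mgabe", date(1974, 4, 11),
                             max_day_diff=1)
```

Flagged pairs would still be reviewed by hand; the function only narrows down which records to look at.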

SLIDE 17

Data cleansing II

Several checks are applied to uncover suspicious data:

  • Genotypes without sampling date
  • Duplicate genotypes
  • Overlapping therapies
  • Date checks (e.g. infection < first positive test < first treatment etc.)
  • Therapies with 'forbidden' drug combinations
  • More than one isolate value (of a kind) in a period of 14 days
  • Same isolate value for different dates
  • Lab values out of specified range (e.g. HIVRNA > 10,000,000 copies/ml)
  • . . .

Data cleansing is hard, time-consuming and tedious work, but absolutely necessary!
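
A few of the checks listed above can be sketched in plain Python. The data layout (tuples of dates), the function names, and the treatment of boundary cases (a therapy starting on the day the previous one ends is not counted as overlapping; equal dates pass the order check) are assumptions; only the 14-day window and the HIVRNA limit come from the slide.

```python
from datetime import date
from typing import List, Optional, Tuple

Interval = Tuple[date, date]   # (start, stop) of one therapy
HIVRNA_MAX = 10_000_000        # copies/ml, upper limit quoted on the slide


def overlapping_therapies(therapies: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return consecutive therapies (sorted by start) that overlap."""
    ordered = sorted(therapies)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[0] < a[1]]


def dates_in_order(*dates: Optional[date]) -> bool:
    """Check e.g. infection <= first positive test <= first treatment.

    Unknown dates are passed as None and skipped.
    """
    known = [d for d in dates if d is not None]
    return all(a <= b for a, b in zip(known, known[1:]))


def values_too_close(sample_dates: List[date], min_gap_days: int = 14) -> List[Tuple[date, date]]:
    """Return consecutive sampling dates of one kind of isolate value
    that are fewer than min_gap_days apart."""
    ordered = sorted(sample_dates)
    return [(a, b) for a, b in zip(ordered, ordered[1:])
            if (b - a).days < min_gap_days]


def out_of_range(value: float, lower: float = 0, upper: float = HIVRNA_MAX) -> bool:
    """True if a lab value lies outside the specified range."""
    return not (lower <= value <= upper)


suspicious = overlapping_therapies([
    (date(2002, 6, 23), date(2003, 11, 5)),
    (date(2003, 10, 1), date(2004, 1, 1)),
])
```

Each function returns the offending records rather than correcting them, since deciding which value is right usually requires going back to the source documents.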

SLIDE 18

Arevir DB 'Schema'

Table               Example row
patients            2342  1963  'M'  'IVDA'
identity            2342  Max Muster  17.03.1963  'FA3E359D1..'
diagnoses           7431  2342  'R75'
therapies           4288  2342  2002-06-23  2003-11-05
therapycomponents   4288  '3TC'  150 mg
isolates            12630  2342  2002-06-02
sequences           2966  12630  'CCTCAGATC...'  3247583203
isolate_values      12630  'HIVRNA'  'bDNA'  165000
mutations           2966  K65R, L74V...
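
To show how these tables link together through their IDs, the sketch below rebuilds a small part of the schema in an in-memory SQLite database and looks up the mutations of one patient. SQLite is used only as a stand-in for MySQL. The slide gives table names and example values but no column names, so all column names here are hypothetical, as is the assumption that mutations are stored one per row.

```python
import sqlite3

# Hypothetical column names; only the table names and values are from the slide.
SCHEMA = """
CREATE TABLE patients  (patient_id INTEGER PRIMARY KEY, birth_year INTEGER,
                        sex TEXT, risk_group TEXT);
CREATE TABLE isolates  (isolate_id INTEGER PRIMARY KEY,
                        patient_id INTEGER REFERENCES patients(patient_id),
                        sampled TEXT);
CREATE TABLE sequences (sequence_id INTEGER PRIMARY KEY,
                        isolate_id INTEGER REFERENCES isolates(isolate_id),
                        nucleotides TEXT);
CREATE TABLE mutations (sequence_id INTEGER REFERENCES sequences(sequence_id),
                        mutation TEXT);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the tables and insert the example rows shown on the slide."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO patients VALUES (2342, 1963, 'M', 'IVDA')")
    conn.execute("INSERT INTO isolates VALUES (12630, 2342, '2002-06-02')")
    conn.execute("INSERT INTO sequences VALUES (2966, 12630, 'CCTCAGATC...')")
    conn.executemany("INSERT INTO mutations VALUES (?, ?)",
                     [(2966, "K65R"), (2966, "L74V")])
    return conn


def mutations_for_patient(conn: sqlite3.Connection, patient_id: int) -> list:
    """Follow patient -> isolate -> sequence -> mutation."""
    rows = conn.execute(
        """
        SELECT m.mutation
        FROM isolates i
        JOIN sequences s ON s.isolate_id = i.isolate_id
        JOIN mutations m ON m.sequence_id = s.sequence_id
        WHERE i.patient_id = ?
        ORDER BY m.mutation
        """,
        (patient_id,),
    ).fetchall()
    return [row[0] for row in rows]


found = mutations_for_patient(build_demo_db(), 2342)
```

The query never touches the identity table, which matches the idea that analyses run on patient IDs while names stay separate.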

SLIDE 19

Backup I

[Figure: Silent Mutations, Subtype B; years 1999 to 2006; value axis from 4.0 to 10.0; series: PRO (treated), RT (treated), PRO (naive), RT (naive)]