[PPT] - Mattia CF Prosperi ahnven@yahoo.it University of Roma TRE Faculty PowerPoint Presentation

SLIDE 1

Mattia CF Prosperi ahnven@yahoo.it University of “Roma TRE” Faculty of Computer Science Engineering Dept of Computer Science and Automation (DIA) via della vasca navale, 79 – 00149 – Rome, ITALY

SLIDE 2

Summary

The EuResist project

– Project partners
– Aims
– Collaboration with other projects

The Integrated Data Base

– Technologies

Therapy optimisation issues

– Theoretical models, validation, comparison with state of the art

Web-service development

– User interface
– Expert validation

SLIDE 3

The EuResist Project

Funded by the EU under the 6th Framework Programme
Partners

– Machine learning and data bases: IBM (Isr), MPI (Ger), Roma TRE (Ita), RMKI (Hun)
– Statistical analyses: Kingston University (UK)
– Clinical and genomic data collection, virology and clinical expertise: University of Siena (Ita), Karolinska Inst (Swe), University of Cologne (Ger)
– Coordination and administration: Informa CRO (Ita)

Collaboration with the “Virolab” project (also funded by the EU), exchanging data

SLIDE 4

Aims

Collect and integrate clinical and genomic data of HIV+ patients
Perform retrospective statistical studies
Develop prediction models for therapy optimisation

SLIDE 5

Data sources

ARCA (Italy)
AREVIR (Germany)
Karolinska (Sweden)
Luxembourg cohort
Probably the largest collection of information about HIV+ patients (in terms of sequences and clinical markers) in Europe, or in the world (only EuroSIDA is comparable)

SLIDE 6

Data base technologies

IBM used a centralised approach

– The data are replicated from the individual sources into a new data base
– This is an older data integration technique: the federated approach, where the data stay in the local data bases and are accessed virtually, is now preferred. The centralised approach still has some practical advantages, especially with heterogeneous data sources

SLIDE 7

Data base technologies (2)

Local sources are mapped to the central DB
Reliable server
Quality controls
Interface for statistical studies and model development

HL7 compliance

SLIDE 8

Data base schema

Normalised schema (important issue from an IT point of view)

SLIDE 9

Data base size

SLIDE 10

Therapy optimisation

Objective: to determine the optimal Combined Anti-Retroviral Therapy (CART) given the patient’s baseline (demographic, genomic, clinical) and historical characteristics when experiencing a Treatment Change Episode (TCE) or starting a first-line therapy

SLIDE 11

Study Design

SLIDE 12

State of the art

Phenotype (in-vitro)

– VIRCO, Virologic, virtual Geno2pheno

Rule based methods (in-vivo)

– Stanford hivdb, REGA, ANRS, HIV-GRADE, various scores for specific drugs (Marcelin, Bertoli…)

Based on literature evidence, expert opinion and statistical studies
Not cross-validated, but shown to be significantly associated with virological outcomes through linear multivariable analysis
Give predictions based only on genotype, without accounting for other variables (e.g. viral load, CD4, demographics), even if their significance is sometimes adjusted for such covariates
Do not work on combination therapies (CART)
Data driven approaches (in-vivo)

– RDI (Artificial Neural Networks)

Biased study design, not properly validated

SLIDE 13

The EuResist approach

Data driven models
Large sample size
Robust cross-validation
Comparison with state of the art
Comparison with expert opinions

SLIDE 14

Exploring the feature space

Usage of all available information in addition to the baseline genotype and treatment

– Demographics, treatment history, baseline markers, past genotypes…
– Derived features

Mutagenetic trees (genetic barrier)
Bayesian networks for past combination treatments
Higher order interactions
Only a minimal feature set (genotype and treatment) is required to perform a prediction

– Treatment history or past genotypes are not always available
– But the usage of additional information can enhance performance

SLIDE 15

Modelling techniques

Three independent engines developed by IBM, RM3 and MPI
The engines are combined in a meta-engine
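
As an illustration of what a meta-engine can look like, the sketch below averages the success probabilities returned by the individual engines. This is only the simplest possible combination rule and an assumption made here: the slides do not say which combination technique was finally adopted, and the function name `combine_mean` is hypothetical.

```python
from statistics import mean


def combine_mean(engine_probs):
    """Average the predicted success probabilities of several engines.

    engine_probs: dict mapping an engine name to its predicted
    probability of virological success for one treatment change episode.
    NOTE: plain averaging is an illustrative assumption, not the
    combination technique documented for the project.
    """
    if not engine_probs:
        raise ValueError("at least one engine prediction is required")
    return mean(list(engine_probs.values()))


# Hypothetical predictions for a single episode
combined = combine_mean({"IBM": 0.6, "MPI": 0.7, "RM3": 0.8})
```

Averaging needs no extra training data, which is why it is a common baseline before trying weighted or learned combinations.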

SLIDE 16

Modelling techniques (2)

All engines use Logistic Regression (LR)

– IBM uses additional features, training a Bayesian network on past treatments
– MPI uses additional features, estimating the genetic barrier through mutagenetic trees
– RM3 uses higher-order interactions

mutation x mutation
drug x drug (x drug)
drug x mutation
drug x past drug
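
The interaction types listed above can be sketched as follows, assuming every mutation, drug and past drug is a binary indicator that is "on" for a given treatment change episode. The function name and the feature naming scheme are hypothetical and chosen only for illustration; third-order drug x drug x drug terms are left out for brevity.

```python
from itertools import combinations, product


def interaction_features(mutations, drugs, past_drugs):
    """Return the names of the active second-order interaction terms.

    Each argument is a collection of indicator names that are active
    for one treatment change episode (hypothetical encoding).
    """
    feats = set()
    # mutation x mutation
    for a, b in combinations(sorted(mutations), 2):
        feats.add(f"{a} and {b}")
    # drug x drug
    for a, b in combinations(sorted(drugs), 2):
        feats.add(f"{a} and {b}")
    # drug x mutation
    for d, m in product(sorted(drugs), sorted(mutations)):
        feats.add(f"{m} and {d}")
    # drug x past drug
    for d, p in product(sorted(drugs), sorted(past_drugs)):
        feats.add(f"{d} and {p} experience")
    return feats


example = interaction_features(
    mutations={"RT_184_V", "RT_67_N"},
    drugs={"3TC", "EFV"},
    past_drugs={"EFV"},
)
```

Because the number of pairs grows quadratically with the number of base features, a step like this is what makes feature selection necessary afterwards.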

SLIDE 17

Modelling techniques (3)

A lot of features!!!

– Hundreds of mutations (not only literature reported)
– Hundreds of different CART
– Other covariates
– All higher order interactions (thousands!!!)

Several feature selection techniques used

– AIC selection
– Correlation-based Feature Selection (CFS)
– SVM z-scores
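
For reference, the Akaike Information Criterion is AIC = 2k − 2 ln(L), where k is the number of fitted parameters and L the maximised likelihood; a lower value indicates a better trade-off between fit and model size. The sketch below computes it for a binary outcome model from its predicted probabilities. How the criterion was embedded in the engines' selection procedure (for example stepwise search) is not stated in the slides, and the function names are hypothetical.

```python
import math


def log_likelihood(y_true, p_pred):
    """Bernoulli log-likelihood of binary outcomes given predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, p_pred)
    )


def aic(y_true, p_pred, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood(y_true, p_pred)


# Toy comparison: the second model fits slightly better but uses more parameters
outcomes = [1, 0, 1, 1]
aic_small = aic(outcomes, [0.7, 0.4, 0.6, 0.7], n_params=2)
aic_large = aic(outcomes, [0.8, 0.3, 0.6, 0.7], n_params=6)
```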

SLIDE 18

Results

Individual prediction engines perform similarly
Combination of engines enhances performance

– Several combination techniques explored

Usage of additional information enhances performance

SLIDE 19

Results (2)

Comparison with state of the art:

– The combined engine outperforms Stanford hivdb
– The single engines do as well, though by a smaller margin

SLIDE 20

Results (3)

Example of logistic model with higher-order interactions
Variable importance is assessed easily

Variable (success prediction)  odds.ratio  p.value   sign.
Number of drugs in CART        1.9         2.00E-16  ***
HIV RNA baseline LOG cp/ml     0.6         2.66E-12  ***
PR_IAS_54_V                    0.2         1.31E-06  ***
EFV and EFV experience         0.2         2.00E-05  ***
RT_184_V and 3TC               0.5         2.79E-05  ***
SQV and AZT experience         0.4         0.000146  ***
NFV and PI experience          0.5         0.000224  ***
RT_184_V and NVP               0.4         0.000344  ***
RT_39_A and RT_211_K           0.4         0.000378  ***
(Intercept)                    4.8         0.000399  ***
RT_67_N and RT_184_V           2           0.00056   ***
RTV experience                 0.5         0.00061   ***
TDF and EFV experience         0.5         0.000633  ***
PR_63_P and PR_90_M            0.6         0.00082   ***
PR_89_M and PR_93_L            3.8         0.000873  ***
PR_IAS_20_M                    0.2         0.001149  **
EFV                            1.8         0.001223  **
PR_IAS_10_I                    0.6         0.002524  **
RT_177_E and RT_207_A          2.3         0.007537  **
PR_IAS_54_L                    0.2         0.007575  **
APV experience                 0.5         0.008579  **
LPV and DDC experience         1.9         0.0087    **
PI_boosted and LPV experience  0.5         0.009403  **
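
To read odds ratios like the ones above: in a logistic model the odds of success equal the intercept odds multiplied by each term's odds ratio raised to the value of that term, and the probability is odds / (1 + odds). The sketch below applies this rule to three of the listed terms. It is an illustration only: the slide shows a subset of the model terms, so the resulting number is not an actual EuResist prediction, and the patient values are made up.

```python
def success_probability(intercept_odds, terms):
    """Convert odds ratios into a predicted probability.

    intercept_odds: odds ratio reported for the intercept.
    terms: iterable of (odds_ratio, value) pairs, where value is 0/1 for
           an indicator or the numeric value of a continuous covariate.
    """
    odds = intercept_odds
    for odds_ratio, value in terms:
        odds *= odds_ratio ** value
    return odds / (1.0 + odds)


# Hypothetical patient: 3 drugs in the CART, baseline HIV RNA of 5 log cp/ml,
# EFV included in the regimen; all other listed terms are taken as inactive.
p = success_probability(
    4.8,
    [
        (1.9, 3),    # Number of drugs in CART
        (0.6, 5.0),  # HIV RNA baseline LOG cp/ml
        (1.8, 1),    # EFV
    ],
)
```

An odds ratio below 1 (for example a resistance mutation paired with the drug it affects) pulls the probability down, while one above 1 pushes it up.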

SLIDE 21

Comparison with experts’ opinion

The “EVE” (Expert Vs Engine) study

– Aim: assess the performance of the EuResist prediction engine and its agreement with expert opinion
– Design: a set of TCEs with complete information is defined, and physicians give their opinion on the probability of virological success
– Evaluation: kappa statistic (measure of agreement among experts), accuracy, AUC
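
The three evaluation measures can be sketched in a few lines. The kappa shown here is Cohen's kappa for two raters, which is an assumption: the slides do not say which kappa variant was used, and with more than two experts a multi-rater version such as Fleiss' kappa would be needed. All function names and the toy data are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the observed outcome."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    expected = sum(
        (rater_a.count(lab) / n) * (rater_b.count(lab) / n) for lab in labels
    )
    if expected == 1.0:
        return 1.0  # both raters always give the same single label
    return (observed - expected) / (1.0 - expected)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a randomly chosen success is scored
    higher than a randomly chosen failure (ties count as one half).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: two experts' success/failure calls and one engine's probabilities
expert_1 = [1, 1, 0, 0]
expert_2 = [1, 0, 0, 0]
agreement = cohen_kappa(expert_1, expert_2)
engine_auc = auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
```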

SLIDE 22

Web service

Technology: Ruby on Rails

– open source web framework
– large developer community
– well documented
– very good for web-service development

SLIDE 23

Web service (2)

The user inserts

– Baseline viral sequence (FASTA or mutation list)
– Optional covariates

Baseline markers (CD4 and HIV RNA)
Age, sex, risk group
Previously experienced treatments

– A suitable CART to be evaluated

The user gets

– Sequence mutations and subtype match
– Probability of success (with CI) for the chosen CART
– A ranking of other suitable therapies (over a set of CART allowed by international guidelines)
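
The ranking step described above can be sketched as scoring every candidate CART with a prediction function and sorting by predicted probability. The sketch is written in Python for brevity even though the service itself is built on Ruby on Rails; the `predict` callable, the function name and the toy probabilities are hypothetical stand-ins, not the service's actual interface.

```python
def rank_therapies(candidates, predict):
    """Rank candidate CARTs by predicted probability of success.

    candidates: iterable of drug combinations (each a sequence of drug names).
    predict:    callable returning a success probability for one combination;
                a stand-in for the prediction engine (assumption).
    Returns a list of (combination, probability), best first.
    """
    scored = [(tuple(cart), predict(cart)) for cart in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy stand-in for the engine: fixed, made-up probabilities
toy_scores = {
    ("AZT", "3TC", "EFV"): 0.70,
    ("TDF", "FTC", "LPV"): 0.80,
}
ranking = rank_therapies(toy_scores.keys(), lambda cart: toy_scores[tuple(cart)])
```

Restricting `candidates` to guideline-approved combinations before scoring keeps the ranking clinically meaningful and the number of predictions small.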

SLIDE 24

Web service (3)

SLIDE 25