SLIDE 1
Bias-aware learning for HIV therapy screening
Jasmina Bogojeska, Max-Planck-Institut für Informatik, Saarbrücken
Arevir 2009
SLIDE 2
Overview
- Problem setting
- Transfer learning with distribution matching
- Conclusions
SLIDE 3
Problem setting
Bias-aware prediction of the outcome of combination therapies given to AIDS patients
Develop methods that can deal with:
- different trends in treating patients
- evolution of the viral sequence under drug pressure over time
- uneven therapy representation
(Figure: treatment data along a time axis, from older to recent)
SLIDE 4
Multi-Task Learning
Multi-task learning:
- several related tasks
- learn a model for each task from all available data by exploiting their similarity
Our approach in the HIV setting:
- very scarce training data
- drug combinations (tasks) have similar but not identical effects on the virus
- each combination therapy is a separate task
- learn a model for each therapy by using all available data with proper weights
SLIDE 5
Multi-Task Learning
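Goal: Fit the best model for a specific target therapy t
$$\text{Model}_{\sim\text{Target}}(\mathbf{x}, y) = \text{Model}_{\sim\text{Training}}\left[w_{\text{target}}(\mathbf{x}, y)\,(\mathbf{x}, y)\right]$$
$$w_{\text{target}}(\mathbf{x}, y) = \frac{p(\mathbf{x}, y \mid t)}{\sum_{z} p(z)\, p(\mathbf{x}, y \mid z)} \qquad \left(\frac{\text{target distribution}}{\text{training distribution}}\right)$$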
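A small numerical check of the weighting idea, assuming a toy setting with two therapies over four discrete (x, y) outcomes; all probabilities and the per-example quantity `loss` are made up. With w_target(x, y) = p(x, y | t) / Σ_z p(z) p(x, y | z), averaging w_target · loss over the pooled training distribution gives the same value as averaging loss over the target distribution. This is why a model fitted on reweighted training data behaves like one fitted on target data. The weight is only defined where the pooled training density is positive.

```python
from fractions import Fraction as F

# Hypothetical toy setting: two therapies (tasks) z in {"A", "B"}.
p_task = {"A": F(1, 4), "B": F(3, 4)}  # p(z): share of each therapy in the training data
p_xy_given_task = {                    # p(x, y | z) over four discrete outcomes
    "A": {("x1", 1): F(1, 2), ("x1", 0): F(1, 4), ("x2", 1): F(1, 4), ("x2", 0): F(0)},
    "B": {("x1", 1): F(1, 6), ("x1", 0): F(1, 6), ("x2", 1): F(1, 3), ("x2", 0): F(1, 3)},
}

def training_density(xy):
    """Pooled training distribution: sum_z p(z) p(x, y | z)."""
    return sum(p_task[z] * p_xy_given_task[z][xy] for z in p_task)

def weight(xy, target):
    """Resampling weight w_target(x, y) = p(x, y | t) / sum_z p(z) p(x, y | z)."""
    return p_xy_given_task[target][xy] / training_density(xy)

def loss(xy):
    """Any per-example quantity, e.g. a model's loss on (x, y); arbitrary here."""
    x, y = xy
    return F(3) if (x, y) == ("x2", 0) else F(1 + y)

support = list(p_xy_given_task["A"])
target = "A"
# Expectation under the target distribution ...
expected_on_target = sum(p_xy_given_task[target][xy] * loss(xy) for xy in support)
# ... equals the weighted expectation under the pooled training distribution.
weighted_on_training = sum(training_density(xy) * weight(xy, target) * loss(xy) for xy in support)
print(expected_on_target, weighted_on_training)  # prints: 7/4 7/4
```

Exact fractions are used so the equality holds without rounding noise.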
SLIDE 6
Multi-Task learning with distribution matching
Simple logistic regression model: data points closer to the decision boundary get higher weights, since they look more like the target data than points far out on the rest side.
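(Figure: logistic regression (LOGREG) separating the target data from the rest)
$$\text{Model}_{\sim\text{Target}}(\mathbf{x}, y) = \text{Model}_{\sim\text{Training}}\left[\frac{\text{target distr.}}{\text{training distr.}}\,(\mathbf{x}, y)\right]$$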
How to estimate the resampling (similarity) weights?
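One standard way to answer this, sketched here as our own derivation rather than a formula taken from the slides, is Bayes' rule. Write p(x, y) = Σ_z p(z) p(x, y | z) for the pooled training distribution and p(t | x, y) for the probability that a training example with features x and label y comes from the target therapy t:

```latex
w_{\text{target}}(\mathbf{x}, y)
  = \frac{p(\mathbf{x}, y \mid t)}{\sum_{z} p(z)\, p(\mathbf{x}, y \mid z)}
  = \frac{p(\mathbf{x}, y \mid t)}{p(\mathbf{x}, y)}
  = \frac{p(t \mid \mathbf{x}, y)\, p(\mathbf{x}, y) / p(t)}{p(\mathbf{x}, y)}
  = \frac{p(t \mid \mathbf{x}, y)}{p(t)}
```

So a probabilistic classifier that separates target data from the rest estimates the weight without estimating either density; p(t) can be taken as the fraction of training examples that come from the target therapy.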
SLIDE 7
Therapy similarity kernels
Prior information on therapy similarity, defined with a similarity kernel:
- Drug feature kernel: based on the drugs used in the therapy
- Mutation table kernel: based on the similarity of resistance-relevant mutations
(Figure: for drug group 1, binary mutation vectors muts(th1) = 1 0 1 1 0 and muts(th2) = 1 0 1 0 1 with importance values 0.8, 0.4, 0.6 and 0.5, 0.7, 0.6 yield sim(gr1); sim(th1, th2) = avg(sim(gr1), sim(gr2), sim(gr3)))
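The mutation table kernel can be sketched as below. The slide shows only the ingredients, so several choices here are assumptions: each importance value is placed on one mutation marked 1 (three 1s, three values), the per-group similarity is the cosine of the importance-weighted vectors, and the numbers for drug groups 2 and 3 are invented. All function names are hypothetical.

```python
import math

def weighted_vector(mask, importances):
    """Place one importance value on each mutation marked 1 in the binary mask (assumed layout)."""
    if sum(mask) != len(importances):
        raise ValueError("need one importance value per mutation marked 1")
    values = iter(importances)
    return [next(values) if bit else 0.0 for bit in mask]

def group_similarity(v1, v2):
    """Hypothetical per-group similarity: cosine of two importance-weighted mutation vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    return dot / (n1 * n2) if n1 > 0 and n2 > 0 else 0.0

def therapy_similarity(groups1, groups2):
    """sim(th1, th2) = average over drug groups of sim(gr)."""
    sims = [group_similarity(g1, g2) for g1, g2 in zip(groups1, groups2)]
    return sum(sims) / len(sims)

# Drug group 1 uses the numbers from the slide; groups 2 and 3 are made up.
th1 = [weighted_vector([1, 0, 1, 1, 0], [0.8, 0.4, 0.6]),
       weighted_vector([1, 1, 0], [0.5, 0.5]),
       weighted_vector([1, 0, 0], [0.9])]
th2 = [weighted_vector([1, 0, 1, 0, 1], [0.5, 0.7, 0.6]),
       weighted_vector([1, 1, 0], [0.5, 0.5]),
       weighted_vector([0, 0, 1], [0.3])]
sim_gr1 = group_similarity(th1[0], th2[0])
sim_th = therapy_similarity(th1, th2)
```

With these inputs sim_gr1 comes out at about 0.60; identical groups contribute 1 and groups with no shared mutations contribute 0 to the average.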
SLIDE 8
Algorithm
Step 1: Learn the importance weights with a logistic regression model that separates the target therapy data from the rest of the data in the training set.
Step 2: Train a target model: a logistic regression model fitted on the training data weighted with the similarity weights obtained in Step 1.
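A minimal pure-Python sketch of the two steps. Assumptions not taken from the slides: the Step 1 classifier sees the label y as an extra feature (the weights are defined on (x, y)); its output is turned into a weight via p(t | x, y) / p(t) by Bayes' rule; both logistic regressions use plain batch gradient descent with a small L2 penalty; and the toy data and the names `fit_logreg`, `bias_aware_fit`, T1 to T3 are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logreg(X, y, sample_weight=None, lr=0.5, epochs=2000, l2=1e-3):
    """Weighted, L2-regularised logistic regression by batch gradient descent. Returns (coef, intercept)."""
    n, d = len(X), len(X[0])
    sw = sample_weight if sample_weight is not None else [1.0] * n
    total = sum(sw)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [l2 * wj for wj in w], 0.0  # gradient of the L2 penalty
        for xi, yi, si in zip(X, y, sw):
            # weighted gradient of the mean log-loss
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += si * err * xi[j] / total
            gb += si * err / total
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def bias_aware_fit(X, y, therapy, target):
    # Step 1: separate target examples from the rest; the separator sees (x, y).
    is_target = [1.0 if t == target else 0.0 for t in therapy]
    xy = [list(xi) + [float(yi)] for xi, yi in zip(X, y)]
    separator = fit_logreg(xy, is_target)
    p_target = sum(is_target) / len(is_target)                      # p(t)
    weights = [predict_proba(separator, v) / p_target for v in xy]  # p(t | x, y) / p(t)
    # Step 2: target model on all training data, weighted with the Step 1 weights.
    return fit_logreg(X, y, sample_weight=weights), weights

# Toy data (hypothetical): features = [mutation A present, mutation B present], label 1 = success.
X = [[1, 0], [1, 0], [0, 0], [0, 0],   # target therapy T1
     [1, 0], [0, 0], [1, 0], [0, 0],   # T2: same feature/label pattern as T1
     [0, 1], [0, 1], [1, 1], [1, 1]]   # T3: mutation B only occurs here
y = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
therapy = ["T1"] * 4 + ["T2"] * 4 + ["T3"] * 4
model, weights = bias_aware_fit(X, y, therapy, target="T1")
```

In this toy run, examples from T2, which share T1's feature/label pattern, receive larger weights than those from T3, whose mutation pattern never occurs under T1, so the target model is driven mostly by T1 and T2.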
SLIDE 9
Results
(Figure: results for initial response and sustained response; methods compared: separate, one-size-fits-all, hier. Bayes, kernel hier. Bayes, Gauss. Proc., distribution matching)
Distribution matching performs best.
80% training, 20% test
Special evaluation setup to address evolving trends in treatments over time
Bickel, S. et al., 2008. In Proceedings of ICML.
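One way to realise a time-aware 80%/20% split, assuming the split is chronological and that each sample carries the start date of its therapy (a hypothetical field): the oldest 80% of samples are used for training and the most recent 20% for testing.

```python
from datetime import date

def time_aware_split(samples, train_fraction=0.8):
    """Chronological split: oldest `train_fraction` of samples for training, most recent for testing.
    `samples` is a list of (therapy_start_date, record) pairs (hypothetical layout)."""
    ordered = sorted(samples, key=lambda s: s[0])
    cut = round(train_fraction * len(ordered))
    return ordered[:cut], ordered[cut:]

# Ten made-up samples whose start years are given out of order.
samples = [(date(2000 + k, 1, 1), f"sample-{k}") for k in [5, 2, 9, 0, 7, 1, 8, 3, 6, 4]]
train, test = time_aware_split(samples)
```

Unlike a random split, this never lets a model train on therapies started after the ones it is tested on, so the test set reflects newer treatment practice.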
SLIDE 10
Conclusions
- Using the transfer learning paradigm for predicting outcomes of HIV therapies
- Learning separate models for each therapy by using all available data with proper weights
- Time-aware evaluation setup to account for changing trends in HIV treatment and virus evolution over time
SLIDE 11
Acknowledgements
Thomas Lengauer
Steffen Bickel, Tobias Scheffer
HIV group @ MPII Saarbrücken
Andre Altmann, Alexander Thielen, Kasia Bozek, Joachim Büch
EuResist
SLIDE 12
Thank You!
SLIDE 13
Data
Data representation:
- Viral genotype: occurrence of resistance mutations (binary, e.g. 0 1 0 0 1 …)
- Current treatment: drugs used in the current treatment (binary, e.g. 1 1 0 0 1 …)
- Treatment history: drugs used in all previous treatments (binary, e.g. 1 0 0 1 0 …)
- Label: success or failure
Data sets: sustained response (2252 samples), initial response (3385 samples)
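A sketch of how one sample could be turned into the concatenated binary vector described above. The mutation and drug vocabularies are illustrative placeholders, not the feature set used in the study, and the 1 = success label coding is an assumption; with these placeholders the example matches the first five entries of each vector fragment shown on the slide.

```python
# Hypothetical vocabularies for illustration only; the study's actual feature lists are not on the slide.
MUTATIONS = ["M41L", "K65R", "K103N", "M184V", "T215Y"]
DRUGS = ["AZT", "3TC", "TDF", "EFV", "LPV"]

def indicator(present, vocabulary):
    """Binary vector: 1 where the vocabulary item occurs in `present`, else 0."""
    present = set(present)
    return [1 if item in present else 0 for item in vocabulary]

def encode_sample(mutations, current_drugs, previous_drugs):
    """Viral genotype | current treatment | treatment history, concatenated."""
    return (indicator(mutations, MUTATIONS)
            + indicator(current_drugs, DRUGS)
            + indicator(previous_drugs, DRUGS))

x = encode_sample(["K65R", "T215Y"], ["AZT", "3TC", "LPV"], ["AZT", "EFV"])
label = 1  # 1 = success, 0 = failure (assumed coding)
```

The history block is a union over all previous regimens, so it records which drugs the virus has been exposed to, not in which order.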
SLIDE 14