Bias-aware learning for HIV therapy screening, Jasmina Bogojeska



SLIDE 1

Jasmina Bogojeska, Max-Planck Institut für Informatik, Saarbrücken

Bias-aware learning for HIV therapy screening

Arevir 2009

SLIDE 2

Overview

  • Problem setting
  • Transfer learning with distribution matching
  • Conclusions

SLIDE 3

Problem setting

Bias-aware prediction of the outcome of combination therapies given to AIDS patients

Develop methods that can deal with:

  • Different trends in treating patients
  • Evolution of the viral sequence under drug pressure over time
  • Uneven therapy representation

[Figure: time axis running from older to recent]

SLIDE 4

Multi-Task Learning

Multi-task learning

  • several related tasks
  • learn a model for each task from all available data by utilizing their similarity

Our approach in the HIV setting

  • very scarce training data
  • drug combinations (tasks) have similar but not identical effects on the virus
  • each combination therapy is a separate task
  • learn a model for each therapy by using all available data with proper weights

SLIDE 5

Multi-Task Learning

Goal: Fit the best model for a specific target therapy t

Target distribution: $p(x, y \mid t)$

Training distribution: $\sum_z p(z)\, p(x, y \mid z)$

$$E_{(x,y) \sim \text{Target}}\big[\ell(\text{Model}(x), y)\big] = E_{(x,y) \sim \text{Training}}\big[w_{\text{target}}(x, y)\, \ell(\text{Model}(x), y)\big]$$
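The equality between the target and the weighted training expectation is the standard importance-weighting identity. A short sketch of why it holds, assuming a generic loss $\ell$ and writing $p_{\text{train}}$ for the training mixture (both symbols are notation introduced here, not taken from the slide):

```latex
\begin{align*}
p_{\text{train}}(x, y) &= \sum_z p(z)\, p(x, y \mid z), \qquad
w_{\text{target}}(x, y) = \frac{p(x, y \mid t)}{p_{\text{train}}(x, y)} \\[4pt]
E_{(x,y) \sim p(x, y \mid t)}\big[\ell(\text{Model}(x), y)\big]
  &= \sum_y \int p(x, y \mid t)\, \ell(\text{Model}(x), y)\, dx \\
  &= \sum_y \int p_{\text{train}}(x, y)\,
     \frac{p(x, y \mid t)}{p_{\text{train}}(x, y)}\,
     \ell(\text{Model}(x), y)\, dx \\
  &= E_{(x,y) \sim p_{\text{train}}}\big[w_{\text{target}}(x, y)\, \ell(\text{Model}(x), y)\big]
\end{align*}
```

The ratio is well defined whenever $p(t) > 0$, because the target therapy is then one component of the training mixture, so $p_{\text{train}}(x, y) > 0$ wherever $p(x, y \mid t) > 0$.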

SLIDE 6

Multi-Task learning with distribution matching

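$$E_{(x,y) \sim \text{Target}}\big[\ell(\text{Model}(x), y)\big] = E_{(x,y) \sim \text{Training}}\left[\frac{\text{target distr.}}{\text{training distr.}}\, \ell(\text{Model}(x), y)\right]$$

How to estimate the resampling (similarity) weights?

Simple logistic regression model (LOGREG) that separates the target data from the rest: data points closer to the decision boundary will get higher weights

[Figure: LOGREG decision boundary between target and rest]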
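One way to see why a classifier that separates target from rest can supply these weights is Bayes' rule applied to the density ratio. The sketch below assumes the training distribution is the mixture $\sum_z p(z)\, p(x, y \mid z)$ over therapies $z$ and that the classifier estimates $p(t \mid x, y)$; conditioning the classifier on the label $y$ as well as on $x$ is an assumption made here, not something the slide states:

```latex
\begin{align*}
\frac{p(x, y \mid t)}{\sum_z p(z)\, p(x, y \mid z)}
  = \frac{p(x, y \mid t)}{p(x, y)}
  = \frac{p(t \mid x, y)\, p(x, y)}{p(t)\, p(x, y)}
  = \frac{p(t \mid x, y)}{p(t)}
\end{align*}
```

Since $p(t)$ is constant for a fixed target therapy, the weights are proportional to the classifier's output, so no separate density estimates are needed.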

SLIDE 7

Therapy similarity kernels

Prior information on therapy similarity, defined with a similarity kernel:

  • Drug feature kernel: based on the drugs used in the therapy
  • Mutation table kernel: based on similarity of resistance-relevant mutations

[Figure: mutation table kernel. Within each drug group, the binary mutation vectors muts(th1) and muts(th2) are compared using per-mutation importance values to give sim(gr1); the therapy similarity is sim(th1, th2) = avg(sim(gr1), sim(gr2), sim(gr3)).]
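The averaging step of the mutation table kernel can be sketched in a few lines of Python. The slide does not give the formula for the per-group similarity, so `group_similarity` below uses an assumed form (importance-weighted agreement between two binary mutation vectors); the function names and the example vectors are hypothetical as well. Only the final average over drug groups follows the slide.

```python
def group_similarity(muts_a, muts_b, importance):
    """Assumed form: importance-weighted share of positions where the two
    binary mutation vectors agree. Not the formula from the talk."""
    if not (len(muts_a) == len(muts_b) == len(importance)):
        raise ValueError("vectors must have equal length")
    total = sum(importance)
    agree = sum(w for a, b, w in zip(muts_a, muts_b, importance) if a == b)
    return agree / total


def therapy_similarity(groups):
    """sim(th1, th2) = avg(sim(gr1), sim(gr2), ...), as on the slide.

    `groups` holds one (muts_th1, muts_th2, importance) triple per drug group.
    """
    sims = [group_similarity(a, b, w) for a, b, w in groups]
    return sum(sims) / len(sims)


# Made-up example with two drug groups.
example_groups = [
    ([1, 0, 1], [1, 1, 1], [0.8, 0.4, 0.6]),
    ([0, 1, 0], [0, 1, 0], [0.5, 0.7, 0.6]),
]
example_sim = therapy_similarity(example_groups)
```

Any other per-group similarity (for example a weighted overlap of the mutations that are present) could be swapped in without changing the averaging step.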

SLIDE 8

Algorithm

Step 1. Learn the importance weights: a logistic regression model that separates the target therapy data from the rest of the data in the training set

Step 2. Train a target model: a logistic regression model fitted on the training data, weighted with the similarity weights obtained in step 1
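The two steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation behind the talk: the gradient-descent fitter, its hyperparameters, the choice to feed the label `y` to the step 1 classifier, the division by the target prior, and the toy data are all assumptions introduced here.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, sample_weight=None, lr=0.5, epochs=500, l2=1e-3):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    sw = [1.0] * n if sample_weight is None else list(sample_weight)
    total = sum(sw)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi, si in zip(X, y, sw):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += si * err * xi[j]
            grad_b += si * err
        w = [wj - lr * (gj / total + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b


def predict_proba(model, x):
    """Probability of the positive class under a fitted (w, b) model."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_for_target(X, y, therapy, target):
    # Step 1: separate target-therapy samples from the rest of the training set.
    # Assumption: the separator sees the features plus the label.
    is_target = [1 if t == target else 0 for t in therapy]
    Xy = [list(xi) + [yi] for xi, yi in zip(X, y)]
    separator = fit_logreg(Xy, is_target)
    prior = sum(is_target) / len(is_target)
    weights = [predict_proba(separator, row) / prior for row in Xy]
    # Step 2: fit the target model on all data, weighted by the step 1 weights.
    model = fit_logreg(X, y, sample_weight=weights)
    return model, weights


# Toy data (made up): two binary features, two therapies "A" and "B".
X = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 1], [1, 1], [0, 0]]
y = [1, 1, 0, 0, 1, 0, 0, 1]
therapy = ["A", "A", "A", "B", "B", "B", "B", "B"]
model, weights = train_for_target(X, y, therapy, target="A")
```

Samples from other therapies that resemble the target data receive larger weights and therefore contribute more to the target model, which is how all available data is reused despite the small number of samples per therapy.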

SLIDE 9

Results

Distribution matching performs best

Special evaluation setup to address evolving trends in treatments over time: 80% training, 20% test

[Figure: results for initial response and sustained response, comparing separate, one-size-fits-all, hier. Bayes kernel, hier. Bayes Gauss. Proc., and distribution matching]

Bickel, S. et al., 2008. In Proceedings of ICML.
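A time-aware 80%/20% split can be sketched as below. The slides do not spell out how the split is made, so this assumes the oldest 80% of samples form the training set and the most recent 20% the test set; the `start` field and the function name are hypothetical.

```python
from datetime import date


def time_aware_split(samples, train_fraction=0.8):
    """Sort samples by therapy start date; the oldest part is for training,
    the most recent part for testing (assumed reading of the setup)."""
    ordered = sorted(samples, key=lambda s: s["start"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Made-up samples: one per year, deliberately out of order.
samples = [{"id": i, "start": date(1999 + i, 1, 1)} for i in (3, 0, 7, 1, 9, 4, 2, 8, 5, 6)]
train, test = time_aware_split(samples)
```

Compared with a random split, this keeps every test sample later than every training sample, so the evaluation reflects how a model trained on past therapies would perform on newer ones.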

SLIDE 10

Conclusions

  • Using the transfer learning paradigm for predicting outcomes of HIV therapies
  • Learning separate models for each therapy by using all available data with proper weights
  • Time-aware evaluation setup to account for changing trends in HIV treatment and virus evolution over time

SLIDE 11

Acknowledgements

Thomas Lengauer
Steffen Bickel, Tobias Scheffer

HIV group @ MPII Saarbrücken

Andre Altmann, Alexander Thielen, Kasia Bozek, Joachim Büch

EuResist

SLIDE 12

Thank You!

SLIDE 13

Data

Data representation

  • Viral genotype: occurrence of resistance mutations (0 1 0 0 1 …)
  • Current treatment: drugs used in current treatment (1 1 0 0 1 …)
  • Treatment history: drugs used in all previous treatments (1 0 0 1 0 …)
  • Label (success or failure)

Data sets

  • Sustained response (2252 samples)
  • Initial response (3385 samples)
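The binary representation above can be illustrated with a short Python sketch that concatenates three indicator vectors. The mutation and drug vocabularies below are small illustrative lists chosen here, not the feature set used in the talk, and the function names are hypothetical.

```python
# Illustrative vocabularies (assumptions, far smaller than a real feature set).
MUTATIONS = ["M184V", "K103N", "L90M"]
DRUGS = ["AZT", "3TC", "EFV", "LPV"]


def indicator(items, vocabulary):
    """Binary vector with a 1 for every vocabulary entry present in items."""
    present = set(items)
    return [1 if v in present else 0 for v in vocabulary]


def encode_sample(mutations, current_drugs, previous_drugs):
    """Concatenate genotype, current treatment, and treatment history."""
    return (
        indicator(mutations, MUTATIONS)
        + indicator(current_drugs, DRUGS)
        + indicator(previous_drugs, DRUGS)
    )


features = encode_sample(["M184V"], ["AZT", "3TC"], ["EFV"])
```

The label (success or failure) is kept separately from this feature vector.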

SLIDE 14

Labeling

Sustained response labeling (2252 labeled samples, 454 different therapies)

success

Initial response labeling (3385 labeled samples, 538 different therapies)

success