An Improved Matrix Completion Algorithm For Categorical Variables


An Improved Matrix Completion Algorithm For Categorical Variables: Application to Active Learning of Drug Responses. Huangqingbo Sun, Robert F. Murphy. Workshop on Real World Experiment Design and Active Learning at ICML 2020.


SLIDE 1

An Improved Matrix Completion Algorithm For Categorical Variables: Application to Active Learning of Drug Responses

Huangqingbo Sun, Robert F. Murphy

Workshop on Real World Experiment Design and Active Learning at ICML 2020

SLIDE 2

Drug Discovery Funnel

Source: PhRMA

  • Failures in clinical trials (and even after FDA approval) are typically due to side effects that were not tested for earlier (e.g., Vioxx).
  • It is better to test early both for the desired effect and for the absence of undesired effects, but there are too many combinations (10^4 targets x 10^7 compounds).

SLIDE 3
Active Learning - Multiple Phenotypes

  • The solution is active learning of a predictive model of all compound effects on all targets.
  • A compound can have many possible effects on a given target, so effects are categorical variables.
  • Assume that there are some similarities in effects among compounds and among targets.
  • Predictive model: completion (imputation) of a very sparse categorical matrix (only a few observed entries).
  • For active learning, uncertainty sampling is adopted, with 3 query strategies.
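To make uncertainty sampling concrete, here is a minimal sketch in Python. The slide does not define its three query strategies, so the two scores below (entropy and least confidence) are standard uncertainty criteria chosen for illustration, not necessarily the ones used in the talk. The function names, the `predictions` layout, and the example probabilities are all hypothetical.

```python
import math

def entropy_score(probs):
    """Shannon entropy of a categorical distribution; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def least_confidence_score(probs):
    """One minus the largest class probability; higher means more uncertain."""
    return 1.0 - max(probs)

def select_batch(predictions, score_fn, batch_size):
    """Return the unobserved cells with the highest uncertainty score.

    predictions: dict mapping (compound, target) -> list of class probabilities
    (hypothetical layout, assumed for this sketch).
    """
    ranked = sorted(predictions, key=lambda cell: score_fn(predictions[cell]),
                    reverse=True)
    return ranked[:batch_size]

# Illustrative, made-up predicted distributions for three unobserved cells.
predictions = {
    ("drug_a", "target_1"): [0.98, 0.01, 0.01],
    ("drug_a", "target_2"): [0.40, 0.35, 0.25],
    ("drug_b", "target_1"): [0.70, 0.20, 0.10],
}

batch = select_batch(predictions, entropy_score, batch_size=2)
print(batch)
```

In this toy example both scores rank the cells in the same order; on real data they can disagree, which is one reason a method might combine several strategies.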

SLIDE 4

Experiment on Synthetic Data

  • How fast is active learning compared to random selection?

Performance was measured as the difference in the number of batches to achieve 100% (right) or 90% (left) accuracy between active and random selection.
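The measure described above can be sketched as follows. This is an illustrative Python reading of it, with two assumptions that the slide does not state: the difference is taken as random minus active (so a positive value means active learning needed fewer batches), and batches are counted from 1. The accuracy curves are made-up numbers, not results from the talk.

```python
def batches_to_reach(accuracy_per_batch, threshold):
    """Return the 1-based batch index where accuracy first meets the threshold.

    Returns None if the threshold is never reached.
    """
    for index, accuracy in enumerate(accuracy_per_batch, start=1):
        if accuracy >= threshold:
            return index
    return None

def batch_savings(active_curve, random_curve, threshold):
    """Batches needed by random selection minus batches needed by active learning.

    The sign convention (random minus active) is an assumption of this sketch.
    """
    active = batches_to_reach(active_curve, threshold)
    rand = batches_to_reach(random_curve, threshold)
    if active is None or rand is None:
        return None
    return rand - active

# Hypothetical accuracy after each batch, for illustration only.
active_curve = [0.50, 0.80, 0.92, 1.00]
random_curve = [0.40, 0.60, 0.75, 0.85, 0.93, 1.00]

savings_90 = batch_savings(active_curve, random_curve, 0.90)
savings_100 = batch_savings(active_curve, random_curve, 1.00)
print(savings_90, savings_100)
```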

SLIDE 5

Experiment Using Microscope Images for Many Drugs and Targets

[Figure: accuracy (0% to 100%) versus round (1 to 101). Series: Naik et al. Active Model; Our Active Model - Hybrid Query; Our Active Model - Least Score; Our Active Model - Entropy; Our Random Model; Naik et al. Random Model.]

Image source: Naik et al.

Learn the effects of 92 drugs on 94 GFP-tagged proteins with the help of active learning, without running experiments for every drug-protein pair.

SLIDE 6
Conclusions

  • Improved clustering-based, “lazy learning” matrix completion algorithm for categorical matrices.
  • Results in improved active learning performance over previous methods.
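The slides do not spell out the completion algorithm, so the following is only a generic illustration of the "lazy learning" idea in Python: nothing is trained in advance, and a missing entry is predicted at query time from rows that look similar on their shared observed entries. It uses a similarity-weighted vote as a stand-in and is not the authors' clustering-based method. The data layout, function names, and category labels are hypothetical.

```python
from collections import Counter

def row_similarity(row_a, row_b):
    """Fraction of co-observed columns on which two rows agree (0.0 if none)."""
    shared = [col for col in row_a if col in row_b]
    if not shared:
        return 0.0
    return sum(row_a[col] == row_b[col] for col in shared) / len(shared)

def impute(matrix, row, col):
    """Predict a missing categorical entry by a similarity-weighted vote.

    matrix: dict mapping row name -> {column name: observed category}.
    Returns (predicted category, vote shares), or (None, {}) when no row with
    positive similarity to `row` has an observation in `col`.
    """
    votes = Counter()
    for other, entries in matrix.items():
        if other == row or col not in entries:
            continue
        votes[entries[col]] += row_similarity(matrix[row], entries)
    total = sum(votes.values())
    if total == 0:
        return None, {}
    shares = {label: weight / total for label, weight in votes.items()}
    return max(shares, key=shares.get), shares

# Toy sparse matrix: rows are drugs, columns are targets, values are effects.
matrix = {
    "drug_a": {"t1": "up", "t2": "none"},
    "drug_b": {"t1": "up", "t2": "none", "t3": "down"},
    "drug_c": {"t1": "down", "t2": "none", "t3": "up"},
}

label, shares = impute(matrix, "drug_a", "t3")
print(label, shares)
```

The vote shares double as a rough predicted distribution over categories, which is the kind of quantity an uncertainty-sampling query strategy can score.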