slide-1
SLIDE 1

The Rational Exuberance of Deep Learning:

Can the breakthroughs in imaging and natural languages be replicated in chemoinformatics?

Vijay S. Pande

Departments of Chemistry, Structural Biology, and Computer Science, Stanford University General Partner, Andreessen Horowitz

1

slide-2
SLIDE 2

Moore’s law: compute cost falls exponentially → $0

Compute, Storage, Genomics, Mobile Sensors, … → $0

slide-3
SLIDE 3

Impact of Moore’s law

slide-4
SLIDE 4

Impact of Moore’s law

1997 CM-5 (0.03 Petaflop)

slide-5
SLIDE 5

Impact of Moore’s law

1997 CM-5 (0.03 Petaflop) 2007 FAH (1 Petaflop)

slide-6
SLIDE 6

Impact of Moore’s law

1997 CM-5 (0.03 Petaflop) 2007 FAH (1 Petaflop) 2017 1 AWS PFLOP = $100/day

slide-7
SLIDE 7

Impact of Moore’s law

1997 CM-5 (0.03 Petaflop) 2007 FAH (1 Petaflop) 2017 1 AWS PFLOP = $100/day

Impossible

slide-8
SLIDE 8

Impact of Moore’s law

1997 CM-5 (0.03 Petaflop) 2007 FAH (1 Petaflop) 2017 1 AWS PFLOP = $100/day

Impossible World Record

slide-9
SLIDE 9

Impact of Moore’s law

1997 CM-5 (0.03 Petaflop) 2007 FAH (1 Petaflop) 2017 1 AWS PFLOP = $100/day

Impossible World Record Free

slide-10
SLIDE 10

Deep Learning for drug design?

Jeff Dean (Google), Andrew Ng (Stanford), et al http://arxiv.org/pdf/1112.6209.pdf

slide-11
SLIDE 11

Deep Learning for drug design?

Jeff Dean (Google), Andrew Ng (Stanford), et al http://arxiv.org/pdf/1112.6209.pdf

slide-12
SLIDE 12

Deep Learning for drug design?

Jeff Dean (Google), Andrew Ng (Stanford), et al http://arxiv.org/pdf/1112.6209.pdf

slide-13
SLIDE 13

Deep Learning for drug design?

Jeff Dean (Google), Andrew Ng (Stanford), et al http://arxiv.org/pdf/1112.6209.pdf

slide-14
SLIDE 14

Deep Learning for drug design?

Jeff Dean (Google), Andrew Ng (Stanford), et al. http://arxiv.org/pdf/1112.6209.pdf

drug-like

slide-15
SLIDE 15

Deep Learning for drug design?

Jeff Dean (Google), Andrew Ng (Stanford), et al. http://arxiv.org/pdf/1112.6209.pdf

GPCR-agonist, drug-like

slide-16
SLIDE 16

Deep Learning for drug design?

Jeff Dean (Google), Andrew Ng (Stanford), et al. http://arxiv.org/pdf/1112.6209.pdf

Kinase inhibitor, GPCR-agonist, drug-like

slide-17
SLIDE 17
slide-18
SLIDE 18

Will these methods be successful in drug design?

slide-19
SLIDE 19

Outline for today’s talk

slide-20
SLIDE 20

Outline for today’s talk

slide-21
SLIDE 21

Outline for today’s talk

slide-22
SLIDE 22

Outline for today’s talk

slide-23
SLIDE 23

Outline for today’s talk

slide-24
SLIDE 24

Outline for today’s talk

slide-25
SLIDE 25

Combining the best of both worlds Adding ML to Physical Simulation

slide-26
SLIDE 26

Molecular simulation is very appealing

slide-27
SLIDE 27

Molecular simulation is very appealing

slide-28
SLIDE 28

But is limited by long time scales

10⁻¹⁵ (femto)   10⁻¹² (pico)   10⁻⁹ (nano)   10⁻⁶ (micro)   10⁻³ (milli)   10⁰ seconds

Timescale landmarks: MD step, bond vibration, bond isomerization, water dynamics, helix formation, fast conformational change, 1 day of simulation, slow conformational change

slide-29
SLIDE 29

10⁻⁹ (nano)   10⁻⁶ (micro)   10⁻³ (milli)   10⁰ seconds

Timescale landmarks: water dynamics, helix formation, fast conformational change, 1 day of simulation, slow conformational change

But is limited by long time scales

slide-30
SLIDE 30

10⁻⁹ (nano)   10⁻⁶ (micro)   10⁻³ (milli)   10⁰ seconds

Timescale landmarks: water dynamics, helix formation, fast conformational change, 1 day of simulation, slow conformational change

Folding@home (FAH, ~100 petaflops): ~10⁵× speedup

But is limited by long time scales

slide-31
SLIDE 31

10⁻⁹ (nano)   10⁻⁶ (micro)   10⁻³ (milli)   10⁰ seconds

Timescale landmarks: water dynamics, helix formation, fast conformational change, 1 day of simulation, slow conformational change

Folding@home (FAH, ~100 petaflops): ~10⁵× speedup
GPUs (FAH + GPUs): an additional ~10²× speedup

But is limited by long time scales

slide-32
SLIDE 32

10⁻⁹ (nano)   10⁻⁶ (micro)   10⁻³ (milli)   10⁰ seconds

Timescale landmarks: water dynamics, helix formation, fast conformational change, 1 day of simulation, slow conformational change

Folding@home (FAH, ~100 petaflops): ~10⁵× speedup
GPUs (FAH + GPUs): an additional ~10²× speedup
Markov State Model (MSM) technology (FAH + GPUs + MSMs): a further ~10²× speedup

But is limited by long time scales

slide-33
SLIDE 33

10⁻⁹ (nano)   10⁻⁶ (micro)   10⁻³ (milli)   10⁰ seconds

Timescale landmarks: water dynamics, helix formation, fast conformational change, 1 day of simulation, slow conformational change

Folding@home (FAH, ~100 petaflops): ~10⁵× speedup
GPUs (FAH + GPUs): an additional ~10²× speedup
Markov State Model (MSM) technology (FAH + GPUs + MSMs): a further ~10²× speedup

But is limited by long time scales

http://omnia.md
http://openmm.org
http://msmbuilder.org
http://leeping.github.io/forcebalance/doc/html
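To make the MSM idea concrete, here is a minimal sketch (not the actual Folding@home/MSMBuilder pipeline, which also handles featurization, clustering, and lag-time selection) of estimating a transition matrix from a trajectory that has already been discretized into conformational states. The toy trajectory, lag time, and pseudocount below are illustrative assumptions.

```python
# Minimal Markov State Model sketch: count transitions between discrete states
# at a fixed lag time and row-normalize into a transition probability matrix.
import numpy as np

def estimate_msm(state_traj, n_states, lag=1):
    """Return T where T[i, j] = P(state j at time t+lag | state i at time t)."""
    counts = np.zeros((n_states, n_states))
    for t in range(len(state_traj) - lag):
        counts[state_traj[t], state_traj[t + lag]] += 1
    counts += 1e-8  # small pseudocount so rows with no observations stay valid
    return counts / counts.sum(axis=1, keepdims=True)

# Toy usage: a trajectory already clustered into 3 conformational states.
traj = np.array([0, 0, 1, 1, 2, 2, 1, 0, 0, 1, 2, 2])
T = estimate_msm(traj, n_states=3, lag=1)
print(T)  # long-timescale kinetics come from the eigenvalues/eigenvectors of T
```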

slide-34
SLIDE 34

MSM approach finds new states

(Feinberg, Farimani, VSP)

slide-35
SLIDE 35

MSM approach finds new states

(Feinberg, Farimani, VSP)

slide-36
SLIDE 36

MSM approach finds new states

(Feinberg, Farimani, VSP)

slide-37
SLIDE 37

… and improves predictions

(Feinberg, Farimani, VSP)

slide-38
SLIDE 38
slide-39
SLIDE 39

“Deeper” role for ML

slide-40
SLIDE 40

Our solution: Drug repurposing

slide-41
SLIDE 41

Our solution: Drug repurposing

Many drugs can perform multiple functions.

slide-42
SLIDE 42

Our solution: Drug repurposing

Many drugs can perform multiple functions. 
 
 
 
 The question is how can we find an existing drug for our desired new function?

slide-43
SLIDE 43

Our solution: Drug repurposing

Many drugs can perform multiple functions. 
 
 
 
 The question is how can we find an existing drug for our desired new function? We’re developing and applying Machine Learning approaches.

slide-44
SLIDE 44

Classic approach: molecular similarity

Given a lead compound, we can computationally find compounds which are chemically similar.
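As a concrete, hedged illustration of this classic similarity search, the sketch below computes a Tanimoto similarity between Morgan (ECFP-like) fingerprints with RDKit; the example SMILES strings, radius, and bit count are illustrative choices, not ones prescribed in the talk.

```python
# Classic ligand-based similarity: compare two molecules by the Tanimoto
# coefficient of their 1024-bit Morgan fingerprints.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def tanimoto(smiles_a, smiles_b, radius=2, n_bits=1024):
    """Tanimoto similarity between Morgan (ECFP-like) bit-vector fingerprints."""
    fp_a = AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles_a), radius, nBits=n_bits)
    fp_b = AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles_b), radius, nBits=n_bits)
    return DataStructs.TanimotoSimilarity(fp_a, fp_b)

print(tanimoto("CC(C)Cc1ccc(cc1)C(C)C(=O)O",    # ibuprofen, as a hypothetical lead
               "CC(C)Cc1ccc(cc1)C(C)C(=O)OC"))  # a close analog (methyl ester)
```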

slide-45
SLIDE 45

Primary challenges to using ML

slide-46
SLIDE 46

Primary challenges to using ML

  • 1. Small Datasets: 10K-30K compounds which are known to work or fail


slide-47
SLIDE 47

Primary challenges to using ML

  • 1. Small Datasets: 10K-30K compounds which are known to work or fail
  • 2. Imbalanced Datasets: 30-50 active drugs vs. thousands of inactive drugs for a target


slide-48
SLIDE 48

Primary challenges to using ML

  • 1. Small Datasets: 10K-30K compounds which are known to work or fail
  • 2. Imbalanced Datasets: 30-50 active drugs vs. thousands of inactive drugs for a target (one common weighting workaround is sketched below)
  • 3. Challenging Featurizations: what are effective features? (Without them, ML can't help.)
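A generic illustration of one way to cope with the imbalance in point 2 above is inverse-frequency class weighting, so that the rare actives count more in the training loss. This is a common workaround, not necessarily the scheme used in the papers that follow, and the counts below are made-up placeholders.

```python
# Inverse-frequency class weighting for a 40-active vs. 5000-inactive dataset.
import numpy as np

labels = np.array([1] * 40 + [0] * 5000)          # 40 actives, 5000 inactives (placeholder)
n_classes = 2
weights = len(labels) / (n_classes * np.bincount(labels))
sample_weights = weights[labels]                   # per-example weights for training
print(dict(enumerate(weights)))                    # e.g. {0: ~0.50, 1: ~63.0}
```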

slide-49
SLIDE 49

How well does this work?

arXiv:1502.02072

(Ramsundar, Kearns, Riley, …, Google, VSP)

Massively Multitask Networks for Drug Discovery

Bharath Ramsundar*,†,

RBHARATH@STANFORD.EDU

Steven Kearnes*,†

KEARNES@STANFORD.EDU

Patrick Riley

PFR@GOOGLE.COM

Dale Webster

DRW@GOOGLE.COM

David Konerding

DEK@GOOGLE.COM

Vijay Pande†

PANDE@STANFORD.EDU

(*Equal contribution, †Stanford University, Google Inc.)

Abstract

Massively multitask neural architectures provide a learning framework for drug discovery that synthesizes information from many distinct biological sources. To train these architectures at scale, we gather large amounts of data from public sources to create a dataset of nearly 40 million measurements across more than 200 biological targets. We investigate several aspects of the multitask framework by performing a series of empirical studies and obtain some interesting results: (1) massively multitask networks obtain predictive accuracies significantly better than single-task methods, (2) the predictive power of multitask networks improves as additional tasks and data are added, (3) the total amount of data and the total number of tasks both contribute significantly to multitask improvement, and (4) multitask networks afford limited transferability to tasks not in the training set. Our results underscore the need for greater data sharing and further algorithmic innovation to accelerate the drug discovery process.

After a suitable target has been identified, the first step in the drug discovery process is “hit finding.” Given some druggable target, pharmaceutical companies will screen millions of drug-like compounds in an effort to find a few attractive molecules for further optimization. These screens are often automated via robots, but are expensive to perform. Virtual screening attempts to replace or augment the high-throughput screening process by the use of computational methods (Shoichet, 2004). Machine learning methods have frequently been applied to virtual screening by training supervised classifiers to predict interactions between targets and small molecules. There are a variety of challenges that must be overcome to achieve effective virtual screening. Low hit rates in experimental screens (often only 1-2% of screened compounds are active against a given target) result in imbalanced datasets that require special handling for effective learning. For instance, care must be taken to guard against unrealistic divisions between active and inactive compounds (“artificial enrichment”) and against information leakage due to strong similarity between active compounds (“analog bias”) (Rohrer & Baumann, 2009). Furthermore, the paucity of experimental data means that overfitting is a perennial thorn.

slide-50
SLIDE 50

How well does this work?

arXiv:1502.02072

(Ramsundar, Kearns, Riley, …, Google, VSP)

Massively Multitask Networks for Drug Discovery

Bharath Ramsundar*,†,

RBHARATH@STANFORD.EDU

Steven Kearnes*,†

KEARNES@STANFORD.EDU

Patrick Riley

PFR@GOOGLE.COM

Dale Webster

DRW@GOOGLE.COM

David Konerding

DEK@GOOGLE.COM

Vijay Pande†

PANDE@STANFORD.EDU

(*Equal contribution, †Stanford University, Google Inc.)

Abstract

Massively multitask neural architectures provide a learning framework for drug discovery that synthesizes information from many distinct biological sources. To train these architectures at scale, we gather large amounts of data from public sources to create a dataset of nearly 40 million measurements across more than 200 biological targets. We investigate several aspects of the multitask framework by performing a series of empirical studies and obtain some interesting results: (1) massively multitask networks obtain predictive accuracies significantly better than single-task methods, (2) the predictive power of multitask networks improves as additional tasks and data are added, (3) the total amount of data and the total number of tasks both contribute significantly to multitask improvement, and (4) multitask networks afford limited transferability to tasks not in the training set. Our results underscore the need for greater data sharing and further algorithmic innovation to accelerate the drug discovery process.

After a suitable target has been identified, the first step in the drug discovery process is “hit finding.” Given some druggable target, pharmaceutical companies will screen millions of drug-like compounds in an effort to find a few attractive molecules for further optimization. These screens are often automated via robots, but are expensive to perform. Virtual screening attempts to replace or augment the high-throughput screening process by the use of computational methods (Shoichet, 2004). Machine learning methods have frequently been applied to virtual screening by training supervised classifiers to predict interactions between targets and small molecules. There are a variety of challenges that must be overcome to achieve effective virtual screening. Low hit rates in experimental screens (often only 1-2% of screened compounds are active against a given target) result in imbalanced datasets that require special handling for effective learning. For instance, care must be taken to guard against unrealistic divisions between active and inactive compounds (“artificial enrichment”) and against information leakage due to strong similarity between active compounds (“analog bias”) (Rohrer & Baumann, 2009). Furthermore, the paucity of experimental data means that overfitting is a perennial thorn.

http://DeepChem.io
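For readers who want to reproduce this style of experiment, below is a hedged sketch using the open-source DeepChem package linked above. The loader and model names (load_tox21, MultitaskClassifier) and the hyperparameters follow recent DeepChem releases and may differ across versions, so treat them as assumptions rather than the exact pipeline from the paper.

```python
# Hedged DeepChem sketch: load Tox21 as 1024-bit ECFP features and train a
# multitask classifier, then report mean ROC-AUC on the validation split.
import numpy as np
import deepchem as dc

tasks, (train, valid, test), transformers = dc.molnet.load_tox21(featurizer="ECFP")
model = dc.models.MultitaskClassifier(n_tasks=len(tasks), n_features=1024,
                                      layer_sizes=[2000, 100], dropouts=0.25)
model.fit(train, nb_epoch=10)
metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean)
print(model.evaluate(valid, [metric], transformers))
```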

slide-51
SLIDE 51

Deep Learning Approaches to Drug Design

slide-52
SLIDE 52

Deep Learning Approaches to Drug Design

  • Why Deep Learning?
  • Merck Kaggle contest: multi-task deep neural networks that combine small datasets together, increasing the effective amount of data
  • Great at extracting features from rich, “natural” data sources (images, video, speech)


slide-53
SLIDE 53

Deep Learning Approaches to Drug Design

  • Why Deep Learning?
  • Merck Kaggle contest: multi-task deep neural networks that combine small datasets together, increasing the effective amount of data
  • Great at extracting features from rich, “natural” data sources (images, video, speech)

  • Outstanding questions
  • Can we devise rich, natural featurizations of molecules that can be fed to deep networks?
  • What architectures will provide the best performance?

slide-54
SLIDE 54

Multi-task Learning is important

[Figure: input features feed a deep network (DBN).]

slide-55
SLIDE 55

Multi-task Learning is important

[Figure: input features feed a deep network (DBN) that learns a shared representation, which feeds task-specific outputs (Task 1, Task 2, …, Task n).]

slide-56
SLIDE 56

Questions to address

(Ramsundar, Kearns, Riley, …, Google, VSP)

slide-57
SLIDE 57

Questions to address

  • 1. Do massively multitask networks provide a performance boost over simple machine learning methods?
 


(Ramsundar, Kearns, Riley, …, Google, VSP)

slide-58
SLIDE 58

Questions to address

  • 1. Do massively multitask networks provide a performance boost over simple machine learning methods?

  • 2. How does the performance of a multitask network depend on the number of tasks?
 


(Ramsundar, Kearns, Riley, …, Google, VSP)

slide-59
SLIDE 59

Questions to address

  • 1. Do massively multitask networks provide a performance boost over simple machine learning methods?

  • 2. How does the performance of a multitask network depend on the number of tasks?

  • 3. Do massively multitask networks extract generalizable information about chemical space?

(Ramsundar, Kearns, Riley, …, Google, VSP)

slide-60
SLIDE 60

Protocol

Input layer: 1024 binary nodes
Hidden layers: 1-4 layers with 50-3000 nodes each, fully connected to the layer below, rectified linear activation
Output layer: softmax nodes, one per dataset

(Ramsundar, Kearns, Riley, …, Google, VSP)
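A minimal sketch of the architecture just described: a 1024-bit fingerprint input, shared fully connected ReLU hidden layers, and one two-class softmax head per dataset. PyTorch, the task count, and the layer sizes here are illustrative assumptions, not the original training setup.

```python
# Illustrative multitask network: shared hidden layers plus one output head per task.
import torch
import torch.nn as nn

class MultitaskNet(nn.Module):
    def __init__(self, n_tasks, n_features=1024, hidden_sizes=(2000, 100)):
        super().__init__()
        layers, prev = [], n_features
        for width in hidden_sizes:                 # e.g. the "pyramidal (2000, 100)" variant
            layers += [nn.Linear(prev, width), nn.ReLU()]
            prev = width
        self.shared = nn.Sequential(*layers)
        self.heads = nn.ModuleList([nn.Linear(prev, 2) for _ in range(n_tasks)])

    def forward(self, x):
        z = self.shared(x)
        # One two-class output block per task/dataset; softmax is applied in the loss.
        return [head(z) for head in self.heads]

net = MultitaskNet(n_tasks=200)                    # on the order of the >200 targets above
logits = net(torch.rand(8, 1024).round())          # batch of 8 binary fingerprints
print(len(logits), logits[0].shape)                # 200 heads, each of shape (8, 2)
```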

slide-61
SLIDE 61

Results: AUC for various models

Model                                                      PCBA (n=128)   MUV (n=17)   Tox21 (n=12)
Logistic Regression (LR)                                       .801           .752          .738
Random Forest (RF)                                             .800           .774          .790
Single-Task Neural Net (STNN)                                  .795           .732          .714
Max{LR, RF, STNN}                                              .821           .781          .790
1-Hidden (1200) Layer Multitask Neural Net                     .852           .816          .789
4-Hidden (1000) Layer Multitask Neural Net                     .858           .836          .810
Pyramidal (2000, 100) Multitask Neural Net, .75 Dropout        .872           .837          .802
Pyramidal (2000, 100) Multitask Neural Net                     .860           .862          .824

Table 2. Median 5-fold-average AUCs for various models. The last column is the result of a sign test vs. the Pyramidal (2000, 100) network (last row) on the 5-fold-average AUCs for all datasets except those in the DUD-E group (we remove DUD-E datasets for reasons discussed in the text). For each model, the sign test estimates the fraction of datasets for which that model is superior to the Pyramidal (2000, 100) network (bottom row). We use the Wilson score interval to derive a 95% confidence interval for this fraction. Non-neural network methods were trained using scikit-learn (Pedregosa et al., 2011) implementations and basic hyperparameter optimization. We also include results for a hypothetical “best” single-task model (Max{LR, RF, STNN}) to provide a stronger baseline. Details for our cross-validation and training procedures are given in the Appendix.

(Ramsundar, Kearns, Riley, …, Google, VSP)

slide-62
SLIDE 62

Multitask DNN does better across the board

Model                                                      PCBA (n=128)   MUV (n=17)   Tox21 (n=12)
Logistic Regression (LR)                                       .801           .752          .738
Random Forest (RF)                                             .800           .774          .790
Single-Task Neural Net (STNN)                                  .795           .732          .714
Max{LR, RF, STNN}                                              .821           .781          .790
1-Hidden (1200) Layer Multitask Neural Net                     .852           .816          .789
4-Hidden (1000) Layer Multitask Neural Net                     .858           .836          .810
Pyramidal (2000, 100) Multitask Neural Net, .75 Dropout        .872           .837          .802
Pyramidal (2000, 100) Multitask Neural Net                     .860           .862          .824

Table 2. Median 5-fold-average AUCs for various models. The last column is the result of a sign test vs. the Pyramidal (2000, 100) network (last row) on the 5-fold-average AUCs for all datasets except those in the DUD-E group (we remove DUD-E datasets for reasons discussed in the text). For each model, the sign test estimates the fraction of datasets for which that model is superior to the Pyramidal (2000, 100) network (bottom row). We use the Wilson score interval to derive a 95% confidence interval for this fraction. Non-neural network methods were trained using scikit-learn (Pedregosa et al., 2011) implementations and basic hyperparameter optimization. We also include results for a hypothetical “best” single-task model (Max{LR, RF, STNN}) to provide a stronger baseline. Details for our cross-validation and training procedures are given in the Appendix.

(Ramsundar, Kearns, Riley, …, Google, VSP)
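The confidence intervals quoted with the sign tests above are Wilson score intervals on a binomial fraction. For reference, a standard 95% Wilson interval can be computed as in the sketch below; this is the textbook formula with made-up example numbers, not the paper's code.

```python
# Wilson score interval for a binomial proportion k successes out of n trials.
from math import sqrt

def wilson_interval(k, n, z=1.96):
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half

# Toy example: a model beats the baseline on 20 of 37 datasets (hypothetical counts).
print(wilson_interval(20, 37))
```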

slide-63
SLIDE 63

Room to grow: more data will help

(Ramsundar, Kearns, Riley, …, Google, VSP)

slide-64
SLIDE 64

Challenges of interpretation

slide-65
SLIDE 65

Challenges of interpretation

  • Going beyond the Deep Neural Net as a black box

slide-66
SLIDE 66

Challenges of interpretation

  • Going beyond the Deep Neural Net as a black box
  • How can we systematically interpret the features learned by the network?

slide-67
SLIDE 67

Challenges of interpretation

  • Going beyond the Deep Neural Net as a black box
  • How can we systematically interpret the features learned by the network?
  • Can neurons be matched to functional groups like carboxylates or amines?

slide-68
SLIDE 68

Challenges of interpretation

  • Going beyond the Deep Neural Net as a black box
  • How can we systematically interpret the features learned by the network?
  • Can neurons be matched to functional groups like carboxylates or amines?
  • If so, can we argue that the network “thinks” like an organic chemist?

slide-69
SLIDE 69

Questions to address + some answers

(Ramsundar, Kearns, Riley, …, Google, VSP)

slide-70
SLIDE 70

Questions to address + some answers

  • 1. Do massively multitask networks provide a performance boost over simple machine learning methods?
     Yes: significantly, with room to grow


(Ramsundar, Kearns, Riley, …, Google, VSP)

slide-71
SLIDE 71

Questions to address + some answers

  • 1. Do massively multitask networks provide a performance boost over simple machine learning methods?
     Yes: significantly, with room to grow

  • 2. How does the performance of a multitask network depend on the number of tasks?
     We haven't saturated yet and are working to get much more data.


(Ramsundar, Kearns, Riley, …, Google, VSP)

slide-72
SLIDE 72

Questions to address + some answers

  • 1. Do massively multitask networks provide a performance boost over simple machine learning methods?
     Yes: significantly, with room to grow

  • 2. How does the performance of a multitask network depend on the number of tasks?
     We haven't saturated yet and are working to get much more data.

  • 3. Do massively multitask networks extract generalizable information about chemical space?
     Appears this may be possible – simpler ML on DNN features works.

(Ramsundar, Kearns, Riley, …, Google, VSP)

slide-73
SLIDE 73
slide-74
SLIDE 74

But how does this do in the “real world”?

slide-75
SLIDE 75

Steven Kearnes goes to Vertex to test

(Kearnes, Goldman, VSP)

Modeling Industrial ADMET Data with Multitask Networks

Steven Kearnes (Stanford University, kearnes@stanford.edu), Brian Goldman (Vertex Pharmaceuticals Inc., brian_goldman@vrtx.com), Vijay Pande (Stanford University, pande@stanford.edu)

Abstract: Deep learning methods such as multitask neural networks have recently been applied to ligand-based virtual screening and other drug discovery applications. Using a set of industrial ADMET datasets, we compare neural networks to standard baseline models and analyze multitask learning effects with both random cross-validation and a more relevant temporal validation scheme. We confirm that multitask learning can provide modest benefits over single-task models and show that smaller datasets tend to benefit more than larger datasets from multitask learning. Additionally, we find that adding massive amounts of side information is not guaranteed to improve performance relative to simpler multitask learning. Our results emphasize that multitask effects are highly dataset-dependent, suggesting the use of dataset-specific models to maximize overall performance.

1 Introduction

[Figure 1: Abstract neural network architecture. The input vector is a binary molecular fingerprint with 1024 bits. All connections between layers are dense, meaning that every unit in layer n is connected to every unit in layer n + 1. Each output block is a task-specific two-class softmax layer; dashed lines indicate that models can be either single-task or multitask.]

arXiv:1606.08793

slide-76
SLIDE 76

How did we do?

(Kearnes, Goldman, VSP)

slide-77
SLIDE 77

How did we do?

(Kearnes, Goldman, VSP)

Table 1: Proprietary datasets used for model evaluation. Each data point is associated with an experiment date used for temporal validation.

Dataset   Actives   Inactives     Total
A          20,247       9,652    29,899
B          32,806      23,936    56,742
C          40,136      27,703    67,839
D          24,379       2,374    26,753
E          21,722       2,746    24,468
F          25,202       2,034    27,236
G           2,003       3,226     5,229
H             500         526     1,026
I             669         344     1,013
J             883         399     1,282
K             845         357     1,202
L             489         164       653
M             820         357     1,177
N           1,420         740     2,160
O             670       1,417     2,087
P           3,861       4,107     7,968
Q           1,056       2,658     3,714
R             215       2,760     2,975
S             987         582     1,569
T           1,454       5,935     7,389
U           3,998       2,790     6,788
V           2,795         896     3,691
Total     187,157      95,703   282,860

slide-78
SLIDE 78

How did we do?

(Kearnes, Goldman, VSP)

Table 1: Proprietary datasets used for model evaluation. Each data point is associated with an experiment date used for temporal validation.

Dataset   Actives   Inactives     Total
A          20,247       9,652    29,899
B          32,806      23,936    56,742
C          40,136      27,703    67,839
D          24,379       2,374    26,753
E          21,722       2,746    24,468
F          25,202       2,034    27,236
G           2,003       3,226     5,229
H             500         526     1,026
I             669         344     1,013
J             883         399     1,282
K             845         357     1,202
L             489         164       653
M             820         357     1,177
N           1,420         740     2,160
O             670       1,417     2,087
P           3,861       4,107     7,968
Q           1,056       2,658     3,714
R             215       2,760     2,975
S             987         582     1,569
T           1,454       5,935     7,389
U           3,998       2,790     6,788
V           2,795         896     3,691
Total     187,157      95,703   282,860

[Figure 2: Box plots showing ∆AUC values between multitask (MTNN or W-MTNN) and STNN models with the same core architecture. Each box plot summarizes 10 ∆AUC values, one for each combination of model architecture (e.g. (2000, 1000)) and task weighting strategy (MTNN or W-MTNN).]

slide-79
SLIDE 79

How did we do?

(Kearnes, Goldman, VSP)

Table 2: Median test set AUC values for random forest, logistic regression, single-task neural network (STNN), and multitask neural network (MTNN) models. W-MTNN models are task-weighted models, meaning that the cost for each task is weighted inversely proportional to the amount of training data for that task. We also report median ∆AUC values and sign test 95% confidence intervals for comparisons between each model and random forest or logistic regression (see Section 2.3). Bold values indicate confidence intervals that do not include 0.5.

                                              vs. Random Forest               vs. Logistic Regression
Model                            Median AUC   Median ∆AUC   Sign Test 95% CI   Median ∆AUC   Sign Test 95% CI
Random Forest                       0.719                                        −0.016       (0.20, 0.57)
Logistic Regression                 0.758        0.016       (0.43, 0.80)
STNN (1000)                         0.748        0.043       (0.47, 0.84)         0.007       (0.39, 0.77)
STNN (4000)                         0.761        0.052       (0.52, 0.87)         0.015       (0.52, 0.87)
STNN (2000, 100)                    0.749        0.039       (0.47, 0.84)         0.007       (0.35, 0.73)
STNN (2000, 1000)                   0.759        0.038       (0.47, 0.84)         0.008       (0.35, 0.73)
STNN (4000, 2000, 1000, 1000)       0.736        0.041       (0.43, 0.80)        −0.011       (0.27, 0.65)
MTNN (1000)                         0.792        0.049       (0.67, 0.95)         0.029       (0.52, 0.87)
MTNN (4000)                         0.768        0.057       (0.61, 0.93)         0.031       (0.57, 0.90)
MTNN (2000, 100)                    0.797        0.044       (0.61, 0.93)         0.023       (0.43, 0.80)
MTNN (2000, 1000)                   0.800        0.071       (0.67, 0.95)         0.040       (0.52, 0.87)
MTNN (4000, 2000, 1000, 1000)       0.809        0.059       (0.72, 0.97)         0.024       (0.43, 0.80)
W-MTNN (1000)                       0.793        0.059       (0.78, 0.99)         0.040       (0.67, 0.95)
W-MTNN (4000)                       0.773        0.055       (0.72, 0.97)         0.036       (0.67, 0.95)
W-MTNN (2000, 100)                  0.769        0.050       (0.61, 0.93)         0.022       (0.43, 0.80)
W-MTNN (2000, 1000)                 0.821        0.077       (0.78, 0.99)         0.041       (0.67, 0.95)
W-MTNN (4000, 2000, 1000, 1000)     0.800        0.071       (0.61, 0.93)         0.035       (0.47, 0.84)

slide-80
SLIDE 80

How did we do?

(Kearnes, Goldman, VSP)

Table 3: Comparisons between neural network models. Differences between STNN, MTNN, and W-MTNN models with the same core (hidden layer) architecture are reported as median ∆AUC values and sign test 95% confidence intervals. Bold values indicate confidence intervals that do not include 0.5.

                                               vs. STNN                          vs. MTNN
Model                            Median ∆AUC   Sign Test 95% CI    Median ∆AUC   Sign Test 95% CI
MTNN (1000)                         0.010       (0.43, 0.80)
MTNN (4000)                         0.012       (0.43, 0.80)
MTNN (2000, 100)                    0.015       (0.39, 0.77)
MTNN (2000, 1000)                   0.026       (0.47, 0.84)
MTNN (4000, 2000, 1000, 1000)       0.023       (0.43, 0.80)
W-MTNN (1000)                       0.017       (0.52, 0.87)          0.002       (0.37, 0.76)
W-MTNN (4000)                       0.007       (0.47, 0.84)          0.002       (0.35, 0.73)
W-MTNN (2000, 100)                  0.004       (0.39, 0.77)         −0.002       (0.28, 0.68)
W-MTNN (2000, 1000)                 0.032       (0.57, 0.90)          0.005       (0.43, 0.80)
W-MTNN (4000, 2000, 1000, 1000)     0.033       (0.43, 0.80)          0.004       (0.43, 0.80)

slide-81
SLIDE 81

Bharath Ramsundar goes to Pfizer (virtually)

(Subramanian, Ramsundar, VSP, Denny)

Computational Modeling of β‑Secretase 1 (BACE-1) Inhibitors Using Ligand Based Approaches

Govindan Subramanian,*,§ Bharath Ramsundar,‡ Vijay Pande,⊥ and Rajiah Aldrin Denny†

§VMRD Global Discovery, Zoetis, 333 Portage Street, Kalamazoo, Michigan 49007, United States ‡Department of Computer Science and ⊥Department of Chemistry, Stanford University, 318 Campus Drive, Stanford, California

94305, United States

†Worldwide Medicinal Chemistry, Pfizer Inc., 610 Main Street, Cambridge, Massachusetts 02139, United States


ABSTRACT: The binding affinities (IC50) reported for diverse structural and chemical classes of human β-secretase 1 (BACE-1) inhibitors in the literature were modeled using multiple in silico ligand based modeling approaches and statistical techniques. The descriptor space encompasses simple binary molecular fingerprint, one- and two-dimensional constitutional, physicochemical, and topological descriptors, and sophisticated three-dimensional molecular fields that require appropriate structural alignments of varied chemical scaffolds in one universal chemical space. The affinities were modeled using qualitative classification or quantitative regression schemes involving linear, nonlinear, and deep neural network (DNN) machine-learning methods used in the scientific literature for quantitative structure−activity relationships (QSAR). In a departure from tradition, ∼20% of the chemically diverse data set (205 compounds) was used to train the model, with the remaining ∼80% of the structural and chemical analogs used as part of an external validation (1273 compounds) and prospective test (69 compounds) sets respectively to ascertain the model performance. The machine-learning methods investigated herein performed well in both the qualitative classification (∼70% accuracy) and quantitative IC50 predictions (RMSE ∼ 1 log). The success of the 2D descriptor based machine learning approach when compared against the 3D field based technique pursued for hBACE-1 inhibitors provides a strong impetus for systematically applying such methods during the lead identification and optimization efforts for other protein families as well.

slide-82
SLIDE 82

The target: BACE-1

(Subramanian, Ramsundar, VSP, Denny)

Scheme 1. Depiction of BACE-1 Binding Site (left) Using the Ligand from PDB Code 3UQP along with the Protein−Ligand Interaction (right)

slide-83
SLIDE 83

Workflow

(Subramanian, Ramsundar, VSP, Denny)

Scheme 2. Workflow for the Training, Test, and Validation Set Compound Alignment Used for 3D-Field Based Approaches

slide-84
SLIDE 84

Results

(Subramanian, Ramsundar, VSP, Denny)

Table 1. Statistical Measures for the Various Classification Models Developed in This Work

(a) Training set (205): experimentally active (102) with IC50 ≤ 100 nM; experimentally inactive (103). (b) Validation set (1273): experimentally active (551) with IC50 ≤ 100 nM; experimentally inactive (722). (c) Fingerprint and descriptors as implemented within the Canvas modeling suite from Schrödinger. (d) (TP + TN)/total no. of molecules, where TP and TN correspond to true positives and true negatives. (e) TP/(TP + FN), where FN corresponds to false negatives. (f) TN/(TN + FP), where FP corresponds to false positives. (g) Matthews correlation coefficient, MCC = (TP·TN − FP·FN)/√((TP + FP)(TP + FN)(TN + FP)(TN + FN)). (h) Model developed using the Bayesian approach as implemented within the Canvas modeling suite from Schrödinger. (i) Constitutional, physicochemical, and topological descriptors as implemented within the Canvas modeling suite from Schrödinger. (j) Model developed using recursive partitioning (RP) within the Canvas modeling suite from Schrödinger. (k) Random forest (RF) model developed using the DEEPCHEM package. (l) Deep neural net (DNN) model developed using the DEEPCHEM package. (m) Reverse split (yellow highlight). Training set (1180): experimentally active (521) with IC50 ≤ 100 nM; experimentally inactive (659). Validation set (295): experimentally active (130) with IC50 ≤ 100 nM; experimentally inactive (165).
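The footnotes above define the reported statistics; as a compact restatement (an illustrative helper, not the authors' code), they can be computed directly from confusion-matrix counts. The counts in the example are hypothetical, chosen only to be consistent with the 1273-compound validation set (551 actives, 722 inactives) described above.

```python
# Classification statistics from confusion-matrix counts: accuracy,
# sensitivity, specificity, and Matthews correlation coefficient (MCC).
from math import sqrt

def classification_stats(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    sens = tp / (tp + fn)            # true positive rate
    spec = tn / (tn + fp)            # true negative rate
    mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return acc, sens, spec, mcc

# Hypothetical counts (450 + 101 = 551 actives, 500 + 222 = 722 inactives).
print(classification_stats(tp=450, tn=500, fp=222, fn=101))
```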

slide-85
SLIDE 85

Results, part 2

(Subramanian, Ramsundar, VSP, Denny)

Table 2. Statistical Parameters for the Various Quantitative Models Developed in This Work

(a) Training set (205): experimentally active (102) with IC50 ≤ 100 nM; experimentally inactive (103). (b) Validation set (1273): experimentally active (551) with IC50 ≤ 100 nM; experimentally inactive (722). (c) Statistical technique employed; see the Abbreviations section for definitions. (d) Coefficient of the fit of a linear regression. (e) Root-mean-square error. (f) Mean absolute error. (g) Standard error. (h) 1D and 2D constitutional, physicochemical, and topological descriptors as implemented within the Canvas modeling suite from Schrödinger. (i) 3D-grid based field descriptors utilizing hydrophobic, H-bond donor, and acceptor probes as implemented by the individual approaches within the Schrödinger and Sybyl modeling packages. (j) Reverse split (yellow highlighting). Training set (1180): experimentally active (521) with IC50 ≤ 100 nM; experimentally inactive (659). Validation set (295): experimentally active (130) with IC50 ≤ 100 nM; experimentally inactive (165).

slide-86
SLIDE 86
slide-87
SLIDE 87

“One shot” to get it right

slide-88
SLIDE 88

Siamese Neural Networks for One-shot Image Recognition

Gregory Koch

GKOCH@CS.TORONTO.EDU

Richard Zemel

ZEMEL@CS.TORONTO.EDU

Ruslan Salakhutdinov

RSALAKHU@CS.TORONTO.EDU

Department of Computer Science, University of Toronto. Toronto, Ontario, Canada.

Abstract

The process of learning good features for machine learning applications can be very computationally expensive and may prove difficult in cases where little data is available. A prototypical example of this is the one-shot learning setting, in which we must correctly make predictions given only a single example of each new class. In this paper, we explore a method for learning siamese neural networks which employ a unique structure to naturally rank similarity between inputs. Once a network has been tuned, we can then capitalize on powerful discriminative features to generalize the predictive power of the network not just to new data, but to entirely new classes from unknown distributions. Using a convolutional architecture, we are able to achieve strong results which exceed those of other deep learning models with near state-of-the-art performance on one-shot classification tasks.

[Figure 1: Example of a 20-way one-shot classification task using the Omniglot dataset. The lone test image is shown above the grid of 20 images representing the possible unseen classes that we can choose for the test image. These 20 images are our only known examples of each of those classes.]

Siamese neural network for one-shot learning

slide-89
SLIDE 89

Figure 2. Our general strategy. 1) Train a model to discriminate between a collection of same/different pairs. 2) Generalize to evaluate new categories based on learned feature mappings for verification.

Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015. JMLR: W&CP volume 37. Copyright 2015 by the author(s).

Siamese neural network for one-shot learning

slide-90
SLIDE 90

Figure 2. Our general strategy. 1) Train a model to discriminate between a collection of same/different pairs. 2) Generalize to evaluate new categories based on learned feature mappings for verification.

Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015. JMLR: W&CP volume 37. Copyright 2015 by the author(s).

Siamese neural network for one-shot learning

slide-91
SLIDE 91

Beyond molecular fingerprints: conv nets on graphs

Convolutional Networks on Graphs for Learning Molecular Fingerprints

David Duvenaud†, Dougal Maclaurin†, Jorge Aguilera-Iparraguirre, Rafael Gómez-Bombarelli, Timothy Hirzel, Alán Aspuru-Guzik, Ryan P. Adams (Harvard University)

Abstract

We introduce a convolutional neural network that operates directly on graphs. These networks allow end-to-end learning of prediction pipelines whose inputs are graphs of arbitrary size and shape. The architecture we present generalizes standard molecular feature extraction methods based on circular fingerprints. We show that these data-driven features are more interpretable, and have better predictive performance on a variety of tasks.

These neural graph fingerprints offer several advantages over fixed fingerprints:

  • Predictive performance. By using data adapted to the task at hand, machine-optimized fingerprints can provide substantially better predictive performance than fixed fingerprints. We show that neural graph fingerprints match or beat the predictive performance of standard fingerprints on solubility, drug efficacy, and organic photovoltaic efficiency datasets.
  • Parsimony. Fixed fingerprints must be extremely large to encode all possible substructures without overlap. For example, [28] used a fingerprint vector of size 43,000, after having removed rarely-occurring features. Differentiable fingerprints can be optimized to encode only relevant features, reducing downstream computation and regularization requirements.
  • Interpretability. Standard fingerprints encode each possible fragment completely distinctly, with no notion of similarity between fragments. In contrast, each feature of a neural graph fingerprint can be activated by similar but distinct molecular fragments, making the feature representation more meaningful.

Molecular Graph Convolutions: Moving Beyond Fingerprints

Steven Kearnes (Stanford University, kearnes@stanford.edu), Kevin McCloskey (Google Inc., mccloskey@google.com), Marc Berndl (Google Inc., marcberndl@google.com), Vijay Pande (Stanford University, pande@stanford.edu), Patrick Riley (Google Inc., pfr@google.com)

Abstract: Molecular “fingerprints” encoding structural information are the workhorse of cheminformatics and machine learning in drug discovery applications. However, fingerprint representations necessarily emphasize particular aspects of the molecular structure while ignoring others, rather than allowing the model to make data-driven decisions. We describe molecular graph convolutions, a machine learning architecture for learning from undirected graphs, specifically small molecules. Graph convolutions use a simple encoding of the molecular graph (atoms, bonds, distances, etc.) which allows the model to take greater advantage of information in the graph structure. Although graph convolutions do not outperform all fingerprint-based methods, they (along with other graph-based methods) represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement.

1 Introduction

Computer-aided drug design requires representations of molecules that can be related to biological activity or other experimental endpoints. These representations encode structural features, physical properties, or activity in other assays [Todeschini and Consonni, 2009; Petrone et al., 2012]. The recent advent of “deep learning” has enabled the use of very raw representations that are less application-specific when building machine learning models [LeCun et al., 2015]. For instance, image recognition models that were once based on complex features extracted from images are now trained exclusively on the pixels themselves; deep architectures can “learn” appropriate representations for input data. Consequently, deep learning systems for drug screening or design should benefit from molecular representations that are as complete and general as possible rather than relying on application-specific features or encodings. First-year chemistry students quickly become familiar with a common representation for small molecules: the molecular graph. Figure 1 gives an example of the molecular graph for ibuprofen, an over-the-counter non-steroidal anti-inflammatory drug. The atoms and bonds between atoms form the nodes and edges, respectively, of the graph. Both atoms and bonds have associated properties, such as atom type and bond order. Although the basic molecular graph representation does not capture the quantum mechanical structure of molecules or necessarily express all of the information that it might suggest to an expert medicinal chemist, its ubiquity in academia and industry makes it a desirable starting point for machine learning on chemical information. Here we describe molecular graph convolutions, a deep learning system using a representation of small molecules as undirected graphs of atoms.

[Figure 1: Molecular graph for ibuprofen. Unmarked vertices represent carbon atoms, and bond order is indicated by the number of lines used for each edge.]

(Kearnes, McCloskey, Berndl, VSP, Riley)

slide-92
SLIDE 92

Siamese neural network for one-shot learning

Low Data Drug Discovery with One-shot Learning

Han Altae-Tran,†,§ Bharath Ramsundar,‡,§ Aneesh S. Pappu,‡ and Vijay Pande∗,¶

Department of Biological Engineering, Massachusetts Institute of Technology; Department of Computer Science, Stanford University; and Department of Chemistry, Stanford University. E-mail: pande@stanford.edu

Abstract: Recent advances in machine learning have made significant contributions to drug discovery. Deep neural networks in particular have been demonstrated to provide significant boosts in predictive power when inferring the properties and activities of small-molecule compounds [1]. However, the applicability of these techniques has been limited by the requirement for large amounts of training data. In this work, we demonstrate how one-shot learning can be used to significantly lower the amounts of data required to make meaningful predictions in drug discovery applications. We introduce a new architecture, the residual LSTM embedding, that, when combined with graph convolutional neural networks, significantly improves the ability to learn meaningful distance metrics over small molecules. We open source all models introduced in this work as part of DeepChem, an open-source framework for deep learning in drug discovery [2].

∗To whom correspondence should be addressed †Department of Biological Engineering, Massachusetts Institute of Technology ‡Department of Computer Science, Stanford University ¶Department of Chemistry, Stanford University §equal contribution

arXiv:1611.03199

slide-93
SLIDE 93

A new architecture: Residual LSTM

(Altae-Tran, Ramsundar, Pappu, VSP)

[Figure: a small labeled support set (dopamine, tosylate, caffeine, ethanol, lithium ion) and a new query compound (styrene oxide); embeddings of the query and support compounds are compared, and the prediction is a similarity-weighted combination of the support labels.]

slide-94
SLIDE 94

A new architecture: Residual LSTM

(Altae-Tran, Ramsundar, Pappu, VSP)

[Figure: a small labeled support set (dopamine, tosylate, caffeine, ethanol, lithium ion) and a new query compound (styrene oxide); embeddings of the query and support compounds are compared, and the prediction is a similarity-weighted combination of the support labels.]

conv net → graph conv net

slide-95
SLIDE 95

One-step iterative refinement of embeddings

(Altae-Tran, Ramsundar, Pappu, VSP)

slide-96
SLIDE 96

One-step iterative refinement of embeddings

(Altae-Tran, Ramsundar, Pappu, VSP)

Initialize (query side / support side):   r = g′(S),   δz = 0
Repeat:
  e = k(f′(x) + δz, r)          e = k(r + δz, g′(S))        (similarity measures)
  a_j = e_j / Σ_j e_j           A_ij = e_ij / Σ_j e_ij      (attention mechanism)
  r = aᵀ r                      r = A g′(S)                 (expected feature map)
  δz = LSTM([δz, r])            δz = LSTM([δz, r])          (generate updates)
Return:
  f(x) = f′(x) + δz             g(S) = g′(S) + δz           (evolve embeddings)

slide-97
SLIDE 97

The major graph operations

(Altae-Tran, Ramsundar, Pappu, VSP)

Graph Convolution: for a node v with local topology features, take each neighbor u with (u, v) an edge, using the edge distance and node degree, apply a learned linear transform plus bias to the features, sum over the neighbors, and apply a nonlinearity σ(·) to produce new features for v.

Graph Pool: take an elementwise max over the features of v and its neighbors to produce new features for v.

Graph Gather: sum the feature vectors of all nodes (a molecular featurization) to produce global topology features.

For each of the operations, the nodes being operated on are shown in blue, with unchanged nodes shown in light blue. For graph convolution and graph pool, the operation is shown for a single node v; however, these operations are performed on all nodes v in the graph simultaneously.
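A toy numpy rendering of the three operations just described; the update rule and shapes are deliberately simplified assumptions (the actual layers also condition on bond distances and atom degrees), so this is a sketch of the idea rather than the paper's parameterization.

```python
# Simplified graph convolution / pool / gather over a small molecular graph.
import numpy as np

def graph_conv(node_feats, adjacency, weights, bias):
    """Each node sums linearly transformed features of itself and its neighbors (ReLU)."""
    neighborhood = adjacency + np.eye(len(adjacency))
    return np.maximum(neighborhood @ node_feats @ weights + bias, 0.0)

def graph_pool(node_feats, adjacency):
    """Each node takes an elementwise max over itself and its neighbors."""
    neighborhood = (adjacency + np.eye(len(adjacency))) > 0
    return np.stack([node_feats[mask].max(axis=0) for mask in neighborhood])

def graph_gather(node_feats):
    """Sum all node features into one molecule-level vector."""
    return node_feats.sum(axis=0)

# Tiny 4-atom "molecule": a chain 0-1-2-3 with 8 features per atom.
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
x = np.random.rand(4, 8)
h = graph_conv(x, adj, weights=np.random.rand(8, 16), bias=np.zeros(16))
h = graph_pool(h, adj)
print(graph_gather(h).shape)   # (16,) molecule-level featurization
```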

slide-98
SLIDE 98

Tests of this method

(Altae-Tran, Ramsundar, Pappu, VSP)

SIDER contains information on marketed medicines and their recorded adverse drug reactions. The goal of the Tox21 challenge is to "crowdsource" data analysis by independent researchers.

slide-99
SLIDE 99

Significant improvement in AUC over RF

(Altae-Tran, Ramsundar, Pappu, VSP)

Table 1: Accuracies of models on held-out tasks for Tox21. Numbers reported are median on test-tasks. Numbers for each task are averaged for 20 random choices of support sets.

Tox21            RF (50 trees)   RF (100 trees)   Siamese   AttnLSTM   ResLSTM
10 pos, 10 neg       0.537           0.563         0.831      0.834      0.840
5 pos, 10 neg        0.537           0.579         0.790      0.820      0.837
1 pos, 10 neg        0.537           0.584         0.710      0.687      0.757
1 pos, 5 neg         0.571           0.572         0.689      0.595      0.815
1 pos, 1 neg         0.536           0.542         0.668      0.652      0.784

Table 2: Accuracies of models on held-out tasks for SIDER. Numbers reported are median on test-tasks. Numbers for each task are averaged for 20 random choices of support sets.

SIDER            RF (50 trees)   RF (100 trees)   Siamese   AttnLSTM   ResLSTM
10 pos, 10 neg       0.551           0.546         0.660      0.671      0.752
5 pos, 10 neg        0.534           0.541         0.674      0.671      0.750
1 pos, 10 neg        0.537           0.533         0.542      0.543      0.602
1 pos, 5 neg         0.536           0.535         0.544      0.539      0.639
1 pos, 1 neg         0.504           0.501         0.506      0.505      0.623

slide-100
SLIDE 100

Significant improvement in AUC over RF

(Altae-Tran, Ramsundar, Pappu, VSP)

Table 1: Accuracies of models on held-out tasks for Tox21. Numbers reported are median on test-tasks. Numbers for each task are averaged for 20 random choices of support sets.

Tox21            RF (50 trees)   RF (100 trees)   Siamese   AttnLSTM   ResLSTM
10 pos, 10 neg       0.537           0.563         0.831      0.834      0.840
5 pos, 10 neg        0.537           0.579         0.790      0.820      0.837
1 pos, 10 neg        0.537           0.584         0.710      0.687      0.757
1 pos, 5 neg         0.571           0.572         0.689      0.595      0.815
1 pos, 1 neg         0.536           0.542         0.668      0.652      0.784

Table 2: Accuracies of models on held-out tasks for SIDER. Numbers reported are median on test-tasks. Numbers for each task are averaged for 20 random choices of support sets.

SIDER            RF (50 trees)   RF (100 trees)   Siamese   AttnLSTM   ResLSTM
10 pos, 10 neg       0.551           0.546         0.660      0.671      0.752
5 pos, 10 neg        0.534           0.541         0.674      0.671      0.750
1 pos, 10 neg        0.537           0.533         0.542      0.543      0.602
1 pos, 5 neg         0.536           0.535         0.544      0.539      0.639
1 pos, 1 neg         0.504           0.501         0.506      0.505      0.623

slide-101
SLIDE 101

But with limits

(Altae-Tran, Ramsundar, Pappu, VSP)

Table 3: Accuracies of models on held-out tasks for MUV. Numbers reported are median on test-tasks. Numbers for each task are averaged for 20 random choices of support sets.

MUV              RF (50 trees)   RF (100 trees)   Siamese   AttnLSTM   ResLSTM
10 pos, 10 neg       0.710           0.741         0.501      0.683      0.712
5 pos, 10 neg        0.723           0.751         0.708      0.674      0.672
1 pos, 10 neg        0.586           0.624         0.567      0.583      0.619
1 pos, 5 neg         0.561           0.579         0.546      0.565      0.634
1 pos, 1 neg         0.558           0.573         0.498      0.501      0.512

Table 4: Accuracies of models trained on Tox21 and evaluated on SIDER. Random forest numbers are copied over from the SIDER table for comparative purposes. Numbers reported are median on SIDER tasks. Numbers for each task are averaged for 20 random choices of support sets.

SIDER from Tox21   RF (50 trees)   RF (100 trees)   Siamese   AttnLSTM   ResLSTM
10 pos, 10 neg         0.551           0.546         0.504      0.510      0.509

slide-102
SLIDE 102

But with limits

(Altae-Tran, Ramsundar, Pappu, VSP)

Table 3: Accuracies of models on held-out tasks for MUV. Numbers reported are median on test-tasks. Numbers for each task are averaged for 20 random choices of support sets.

MUV              RF (50 trees)   RF (100 trees)   Siamese   AttnLSTM   ResLSTM
10 pos, 10 neg       0.710           0.741         0.501      0.683      0.712
5 pos, 10 neg        0.723           0.751         0.708      0.674      0.672
1 pos, 10 neg        0.586           0.624         0.567      0.583      0.619
1 pos, 5 neg         0.561           0.579         0.546      0.565      0.634
1 pos, 1 neg         0.558           0.573         0.498      0.501      0.512

Table 4: Accuracies of models trained on Tox21 and evaluated on SIDER. Random forest numbers are copied over from the SIDER table for comparative purposes. Numbers reported are median on SIDER tasks. Numbers for each task are averaged for 20 random choices of support sets.

SIDER from Tox21   RF (50 trees)   RF (100 trees)   Siamese   AttnLSTM   ResLSTM
10 pos, 10 neg         0.551           0.546         0.504      0.510      0.509

1-shot fails to generalize on diverse scaffolds found in MUV?
slide-103
SLIDE 103

Where to find the code?

(Altae-Tran, Ramsundar, Pappu, VSP)

http://deepchem.io

slide-104
SLIDE 104

Deep Learning for Chemoinformatics

Registration/Welcome: 9:15-9:45am

Morning Session: 9:45am-12:15pm
9:45-10:30am  The Rational Exuberance of Deep Learning: The breakthroughs in imaging and natural languages can be replicated in chemoinformatics (Vijay Pande)
10:30-11:15am  The Thicket of Challenges in GPCR Molecular Pharmacology: Paradigm shifts bring new opportunities and new hurdles (Ryan Strachan, UNC Chapel Hill)
11:15-11:30am  Atomwise Sponsored Coffee Break
11:30am-12:15pm  The Holographic Pose: Docked conformations as Tensor Images (Abraham Heifets, Atomwise)

Lunch: 12:15-1:15pm

Afternoon Session: 1:15pm-4:45pm
1:15-2:00pm  Chemoinformatics for Pharmacology: Why it shouldn't work, why it does work, and opportunities for predicting new biology (Brian Shoichet)
2:00-2:45pm  Tutorial: Democratizing Drug Discovery with the DeepChem Platform (Bharath Ramsundar, Pande Group)
2:45-3:30pm  Break
3:30-4:00pm  Mastering variant calling of SNPs and small indels with deep neural networks (Mark DePristo, Verily Life Sciences)
4:00-4:45pm  Tutorial: Virtual Screening Step-by-Step: Combining computational methods and expert judgement (Magdalena Korczynska, Shoichet Group)

Cocktail hour: 5:00pm-6:30pm, Mudd Building (just across Campus Drive from the Clark Center), 333 Campus Drive (main floor lobby and patio)

slide-105
SLIDE 105

Conclusions

slide-106
SLIDE 106

Conclusions

  • 1. It’s still early, but we are seeing signals that DNN methods are superior to traditional ML.


slide-107
SLIDE 107

Conclusions

  • 1. It’s still early, but we are seeing signals that DNN methods are superior to traditional ML.

  • 2. We will need domain-specific ML advances in order to push the boundaries further.

slide-108
SLIDE 108

Conclusions

  • 1. It’s still early, but we are seeing signals that DNN methods are superior to traditional ML.

  • 2. We will need domain-specific ML advances in order to push the boundaries further.

  • 3. Academia, startups, and pharma will likely need to all collaborate closely to make this happen.

slide-109
SLIDE 109

Acknowledgements

Improving Docking AUCs with MSMs: Evan Feinberg (Pande Lab), Amir Farimani (Pande Lab)

Multitask DNNs for drug discovery (arXiv:1502.02072): Bharath Ramsundar (Pande Lab, Stanford; intern, Google), Steven Kearnes (Pande Lab, Stanford; now Google), Patrick Riley (Google), Dale Webster (Google), David Konerding (Google)

Vertex comparison (arXiv:1606.08793): Steven Kearnes (Pande Lab, Stanford), Brian Goldman (Vertex)

Pfizer comparison: Govindan Subramanian (Pfizer), Bharath Ramsundar (Pande Lab, Stanford), Rajiah Denny (Pfizer)

One-shot learning (arXiv:1611.03199): Han Altae-Tran (Pande Lab, Stanford; now MIT), Bharath Ramsundar (Pande Lab, Stanford), Aneesh Pappu (Pande Lab, Stanford)

Infectious Disease and Alzheimer's Disease: Paul Novick (Pande Lab, now at Globavir), Kim Branson (Pande Lab, now at Lumiata)