GPU-Accelerated Convolutional Neural Networks For Protein-Ligand Scoring

SLIDE 1

GPU-Accelerated Convolutional Neural Networks For Protein-Ligand Scoring

David Koes

GPU Technology Conference May 8, 2017

@david_koes

SLIDE 2

University of Pittsburgh Computational and Systems Biology

THE BIOPHARMACEUTICAL RESEARCH AND DEVELOPMENT PROCESS

[Figure: drug development pipeline. Stages: basic research, drug discovery, pre-clinical, clinical trials (Phase I, Phase II, Phase III, with tens, hundreds, and thousands of volunteers respectively), FDA review, and post-approval research and monitoring (Phase IV). Milestones: IND submitted, NDA/BLA submitted, FDA approval. Many potential new medicines narrow to 1 FDA-approved medicine, at a cost of $2.6 billion.]

Source: Pharmaceutical Research and Manufacturers of America (http://phrma.org)

SLIDE 3

"If you stop failing so often you massively reduce the cost of drug development." (Sir Andrew Witty, CEO, GlaxoSmithKline)

SLIDE 5

  1. Does the compound do what you want it to?
  2. Does the compound not do what you don’t want it to?
  3. Is what you want it to do the right thing?

SLIDE 6

Protein Structures

sequence → structure → function

SLIDE 8

Structure Based Drug Design

Unlike ligand-based approaches, structure-based drug design generalizes to new targets. It requires a molecular target with a known structure and binding site.

SLIDE 11

Structure Based Drug Design

Virtual Screening, Lead Optimization
Pose Prediction, Binding Discrimination, Affinity Prediction

SLIDE 13

Protein-Ligand Scoring

AutoDock Vina

[Figure: two atoms with radii r1 and r2, separated by distance d]

  • O. Trott, A. J. Olson, AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading, Journal of Computational Chemistry 31 (2010) 455-461

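The diagram's r1, r2 and d are the inputs to Vina's pairwise terms. The following minimal Python sketch evaluates a Vina-style score for atom pairs, assuming d is the surface distance (interatomic distance minus both radii) and using the five term shapes and weights published in the Vina paper cited above. Vina's atom typing, precomputed grids, and intramolecular terms are left out, and the radii and pairs in the example are illustrative only.

```python
import math
from typing import Iterable, Tuple

# Weights of the Vina terms as published by Trott & Olson (2010).
W_GAUSS1 = -0.0356
W_GAUSS2 = -0.00516
W_REPULSION = 0.840
W_HYDROPHOBIC = -0.0351
W_HBOND = -0.587
W_ROT = 0.0585   # per rotatable bond, used to normalize the total
CUTOFF = 8.0     # Angstrom; pairs at or beyond this distance are ignored


def surface_distance(r_ij: float, R_i: float, R_j: float) -> float:
    """d = r_ij - R_i - R_j: the gap between the two atomic surfaces."""
    return r_ij - R_i - R_j


def pair_energy(r_ij: float, R_i: float, R_j: float,
                both_hydrophobic: bool = False,
                donor_acceptor: bool = False) -> float:
    """Weighted sum of the five Vina-style terms for one atom pair."""
    if r_ij >= CUTOFF:
        return 0.0
    d = surface_distance(r_ij, R_i, R_j)
    gauss1 = math.exp(-(d / 0.5) ** 2)
    gauss2 = math.exp(-((d - 3.0) / 2.0) ** 2)
    repulsion = d * d if d < 0.0 else 0.0
    # 1 below 0.5 A, 0 above 1.5 A, linear in between
    hydrophobic = min(1.0, max(0.0, 1.5 - d)) if both_hydrophobic else 0.0
    # 1 below -0.7 A, 0 above 0 A, linear in between
    hbond = min(1.0, max(0.0, -d / 0.7)) if donor_acceptor else 0.0
    return (W_GAUSS1 * gauss1 + W_GAUSS2 * gauss2 + W_REPULSION * repulsion
            + W_HYDROPHOBIC * hydrophobic + W_HBOND * hbond)


def vina_like_score(pairs: Iterable[Tuple], n_rot: int) -> float:
    """Sum pair energies, then divide by (1 + W_ROT * n_rot)."""
    total = sum(pair_energy(*pair) for pair in pairs)
    return total / (1.0 + W_ROT * n_rot)


# Hypothetical ligand-receptor pairs: (r_ij, R_i, R_j, hydrophobic?, hbond?)
pairs = [(4.0, 1.9, 1.9, True, False), (2.9, 1.7, 1.8, False, True)]
print(round(vina_like_score(pairs, n_rot=3), 4))
```

Because every term depends only on d and a few atom-type flags, a scoring function of this kind is fast to evaluate but cannot learn interactions beyond its hand-chosen terms.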
SLIDE 15

Can we do better?

Accurate pose prediction, binding discrimination, and affinity prediction without sacrificing performance?

Key Idea: Leverage “big data”

  • 231,655,275 bioactivities in PubChem
  • 125,526 structures in the PDB
  • 16,179 annotated complexes in PDBbind

SLIDE 16

Deep Learning

SLIDE 18

Image Recognition

Convolutional Neural Networks

Image source: https://devblogs.nvidia.com

SLIDE 19

Convolutional Neural Networks

[Figure: an input image passes through alternating convolution layers and feature maps, then through fully connected layers (a traditional neural network) that output class scores such as Dog: 0.99, Cat: 0.02]

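The convolution step in the figure can be shown on a tiny example. This minimal pure-Python sketch applies one hand-picked 3×3 filter to a single-channel image, then ReLU and 2×2 max pooling; a real CNN layer learns many filters over many channels, so the kernel and image here are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (no padding, stride 1).

    Like most deep learning frameworks, this computes cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(image[i + a][j + b] * kernel[a][b]
                            for a in range(kh) for b in range(kw))
    return out


def relu(fmap: Matrix) -> Matrix:
    """Zero out negative responses."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap: Matrix) -> Matrix:
    """Keep the largest value in each non-overlapping 2x2 window.

    An odd trailing row or column is dropped.
    """
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


# A vertical-edge filter applied to a 5x5 image whose right side is bright.
# (The row list is shared by reference, which is fine because it is never mutated.)
image = [[0, 0, 1, 1, 1]] * 5
kernel = [[-1, 0, 1]] * 3
feature_map = relu(conv2d_valid(image, kernel))
pooled = max_pool2x2(feature_map)
```

The feature map responds (value 3) only where the filter window straddles the dark-to-bright edge, which is the sense in which each convolution layer produces a map of where a learned feature occurs.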
SLIDE 22

CNNs for Protein-Ligand Scoring

CNN tasks: Pose Prediction, Binding Discrimination, Affinity Prediction

  • Input representation
  • Training
  • Model optimization
  • Visualization and evaluation

SLIDE 24

Protein-Ligand Representation

[Figure: a protein-ligand complex drawn on a grid, each cell labeled with the atom type it contains (C, O)]

(R,G,B) pixel → (Carbon, Nitrogen, Oxygen,…) voxel

The only parameters for this representation are the choice of grid resolution, atom density, and atom types.

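The pixel-to-voxel analogy can be made concrete: each atom type gets its own channel (the analogue of R, G and B), and every atom adds density to the grid points around it. This is a brute-force Python sketch under assumed names and parameters (a cubic grid centered on a chosen point, a pluggable `density` function, illustrative 0.5 Å spacing); it is not the MolGridDataLayer code.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

# Assumed atom record: (x, y, z) coordinates, channel index, atomic radius
Atom = Tuple[Tuple[float, float, float], int, float]


def voxelize(atoms: Sequence[Atom], n_channels: int,
             center: Sequence[float], dim: int, resolution: float,
             density: Callable[[float, float], float]) -> List[List[float]]:
    """Return grid[channel][flat index] with each atom's density summed in.

    This brute-force version visits every grid point for every atom.
    """
    grid = [[0.0] * dim ** 3 for _ in range(n_channels)]
    # Coordinate of grid index 0 on each axis, so the box is centered.
    origin = [c - resolution * (dim - 1) / 2.0 for c in center]
    for xyz, channel, radius in atoms:
        for i, j, k in itertools.product(range(dim), repeat=3):
            point = (origin[0] + i * resolution,
                     origin[1] + j * resolution,
                     origin[2] + k * resolution)
            grid[channel][(i * dim + j) * dim + k] += density(
                math.dist(xyz, point), radius)
    return grid


def boolean_density(dist: float, radius: float) -> float:
    """Simplest possible density: 1 inside the atom, 0 outside."""
    return 1.0 if dist < radius else 0.0


# Hypothetical example: one atom in channel 0 and one in channel 1,
# on a 5 x 5 x 5 grid with 0.5 A spacing centered on the origin.
atoms = [((0.0, 0.0, 0.0), 0, 1.0), ((1.0, 0.0, 0.0), 1, 1.0)]
grid = voxelize(atoms, n_channels=2, center=(0.0, 0.0, 0.0), dim=5,
                resolution=0.5, density=boolean_density)
```

The three parameters named on the slide map onto `resolution`, the `density` function, and the rule that assigns atoms to channels.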
SLIDE 25

Atom Density

Gaussian

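The slide names only a Gaussian density. One smooth choice, sketched below as an assumption rather than the exact function used in this work, is a Gaussian exp(−2d²/r²) inside the atomic radius r, continued by a quadratic that matches its value and slope at r and falls to zero at 1.5r so the density has finite extent. The Boolean alternative listed under Model Optimization is included for comparison.

```python
import math


def gaussian_density(d: float, r: float) -> float:
    """Assumed smooth atom density at distance d from a center of radius r.

    Gaussian exp(-2 d^2 / r^2) inside r, then a quadratic that matches
    its value and slope at r and falls to zero (with zero slope) at 1.5 r.
    """
    if d < r:
        return math.exp(-2.0 * d * d / (r * r))
    if d < 1.5 * r:
        # Equals 4 / (e^2 r^2) * (d - 1.5 r)^2; at d = r this is e^-2.
        return (2.0 * (d - 1.5 * r) / (math.e * r)) ** 2
    return 0.0


def boolean_density(d: float, r: float) -> float:
    """Alternative listed under Model Optimization: 1 inside r, else 0."""
    return 1.0 if d < r else 0.0


# Compare the two densities for an illustrative 1.8 A radius.
for d in (0.0, 0.9, 1.8, 2.7):
    print(d, round(gaussian_density(d, 1.8), 4), boolean_density(d, 1.8))
```

A smooth density changes gradually as an atom moves relative to the grid, whereas the Boolean density flips grid points on and off abruptly.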
SLIDE 26

Atom Types

Ligand (18): AliphaticCarbonXSHydrophobe, AliphaticCarbonXSNonHydrophobe, AromaticCarbonXSHydrophobe, AromaticCarbonXSNonHydrophobe, Bromine, Chlorine, Fluorine, Iodine, Nitrogen, NitrogenXSAcceptor, NitrogenXSDonor, NitrogenXSDonorAcceptor, Oxygen, OxygenXSAcceptor, OxygenXSDonorAcceptor, Phosphorus, Sulfur, SulfurAcceptor

Receptor (16): AliphaticCarbonXSHydrophobe, AliphaticCarbonXSNonHydrophobe, AromaticCarbonXSHydrophobe, AromaticCarbonXSNonHydrophobe, Calcium, Iron, Magnesium, Nitrogen, NitrogenXSAcceptor, NitrogenXSDonor, NitrogenXSDonorAcceptor, OxygenXSAcceptor, OxygenXSDonorAcceptor, Phosphorus, Sulfur, Zinc

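One way to turn these type lists into input channels is to give every (molecule, type) pair its own channel index. The ligand-first ordering below is an assumption for illustration; the 18 ligand and 16 receptor types add up to the 34 Vina types listed under Model Optimization.

```python
from typing import Dict, Tuple

LIGAND_TYPES = [
    "AliphaticCarbonXSHydrophobe", "AliphaticCarbonXSNonHydrophobe",
    "AromaticCarbonXSHydrophobe", "AromaticCarbonXSNonHydrophobe",
    "Bromine", "Chlorine", "Fluorine", "Iodine",
    "Nitrogen", "NitrogenXSAcceptor", "NitrogenXSDonor", "NitrogenXSDonorAcceptor",
    "Oxygen", "OxygenXSAcceptor", "OxygenXSDonorAcceptor",
    "Phosphorus", "Sulfur", "SulfurAcceptor",
]

RECEPTOR_TYPES = [
    "AliphaticCarbonXSHydrophobe", "AliphaticCarbonXSNonHydrophobe",
    "AromaticCarbonXSHydrophobe", "AromaticCarbonXSNonHydrophobe",
    "Calcium", "Iron", "Magnesium",
    "Nitrogen", "NitrogenXSAcceptor", "NitrogenXSDonor", "NitrogenXSDonorAcceptor",
    "OxygenXSAcceptor", "OxygenXSDonorAcceptor",
    "Phosphorus", "Sulfur", "Zinc",
]


def build_channel_map() -> Dict[Tuple[str, str], int]:
    """Map ("ligand" | "receptor", type name) to a channel index.

    Assumed ordering: all ligand channels first, then receptor channels.
    """
    channels: Dict[Tuple[str, str], int] = {}
    for name in LIGAND_TYPES:
        channels[("ligand", name)] = len(channels)
    for name in RECEPTOR_TYPES:
        channels[("receptor", name)] = len(channels)
    return channels


CHANNELS = build_channel_map()
```

Because a type such as AliphaticCarbonXSHydrophobe gets separate ligand and receptor channels, the network can tell which molecule an atom belongs to.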
SLIDE 27

Training Data

Pose Prediction

337 protein-ligand complexes
  • curated for electron density
  • diverse targets
  • <10µM affinity
  • generate poses with AutoDock Vina
  • 745 <2Å RMSD (actives)
  • 3,251 >4Å RMSD (decoys)

12,484 protein-ligand complexes
  • diverse targets
  • wide range of affinities
  • generate poses with AutoDock Vina
  • include minimized crystal pose
  • 24,727 <2Å RMSD (actives)
  • 244,192 >4Å RMSD (decoys)
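The active/decoy split is a labeling rule on pose RMSD. A minimal sketch, assuming a plain RMSD over matched atoms with no symmetry correction or realignment (docked poses share the receptor frame with the crystal pose); poses between 2 Å and 4 Å fall in neither class and are skipped. The coordinates in the example are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD between matched atoms, without realignment or symmetry handling."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and list the same atoms")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(pose, reference))
    return math.sqrt(sq / len(pose))


def label_pose(pose_rmsd: float, active_cutoff: float = 2.0,
               decoy_cutoff: float = 4.0) -> Optional[int]:
    """1 for actives (< 2 A), 0 for decoys (> 4 A), None otherwise."""
    if pose_rmsd < active_cutoff:
        return 1
    if pose_rmsd > decoy_cutoff:
        return 0
    return None


# Hypothetical two-atom ligand: one near-native pose and one far-off pose.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
docked = [[(0.5, 0.0, 0.0), (2.0, 0.0, 0.0)],
          [(3.0, 3.0, 0.0), (4.5, 3.0, 0.0)]]
labeled = [(pose, label_pose(rmsd(pose, crystal))) for pose in docked]
training = [(pose, y) for pose, y in labeled if y is not None]
```

Dropping the ambiguous 2 to 4 Å band keeps the two classes well separated, at the cost of discarding some poses.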
SLIDE 28

Model Evaluation

Cross-validation folds:
  • CSAR: targets that are >90% similar are kept in the same fold
  • PDBbind: targets that are >80% similar are kept in the same fold

Metric: AUC

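Keeping similar targets in the same fold means folds are assigned per cluster of targets rather than per complex, so no test target has a near-duplicate in training. A minimal sketch: the similarity clustering is assumed to be done already (only cluster sizes are passed in), each cluster goes greedily to the currently smallest fold, and AUC is computed as the probability that a randomly chosen active outscores a randomly chosen decoy, counting ties as half. All names are hypothetical.

```python
from typing import Dict, Sequence


def assign_folds(cluster_sizes: Dict[str, int], k: int) -> Dict[str, int]:
    """Put each whole cluster in one fold, largest clusters first."""
    fold_sizes = [0] * k
    assignment: Dict[str, int] = {}
    for cluster in sorted(cluster_sizes, key=cluster_sizes.get, reverse=True):
        fold = min(range(k), key=fold_sizes.__getitem__)  # smallest fold so far
        assignment[cluster] = fold
        fold_sizes[fold] += cluster_sizes[cluster]
    return assignment


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P(score of a random active > score of a random decoy), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

The pairwise AUC loop is quadratic in the number of poses; for large sets, a rank-based computation after sorting gives the same value faster.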
SLIDE 29

Model Training

Caffe, with a custom MolGridDataLayer that generates the input grids on the GPU:
  1. Parallelize over atoms to obtain a mask of the atoms that overlap each grid region.
  2. Use an exclusive scan to obtain a list of atom indices from the mask.
  3. Parallelize over grid points, using the reduced atom list to avoid an O(Natoms) check at every point.

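The three gridding steps can be simulated on the CPU to show how the data flows. In this Python sketch each loop or comprehension stands in for a parallel GPU launch (one thread per atom, then one thread per grid point). It assumes Boolean density, a single axis-aligned grid region, and hypothetical function names, so it illustrates the mask, exclusive-scan, and compaction pattern rather than reproducing MolGridDataLayer.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def exclusive_scan(values: Sequence[int]) -> List[int]:
    """out[i] = sum(values[:i]); a GPU would use a parallel prefix sum."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out


def overlaps_box(center: Point, radius: float, lo: Point, hi: Point) -> bool:
    """True if a sphere reaches the axis-aligned box [lo, hi]."""
    # Per-axis distance from the center to the box (0 if inside on that axis).
    gaps = [max(lo_c - c, 0.0, c - hi_c) for c, lo_c, hi_c in zip(center, lo, hi)]
    return math.sqrt(sum(g * g for g in gaps)) < radius


def compact(mask: Sequence[int]) -> List[int]:
    """Turn a 0/1 mask into the list of indices where it is 1."""
    if not mask:
        return []
    offsets = exclusive_scan(mask)
    reduced = [0] * (offsets[-1] + mask[-1])
    for i, keep in enumerate(mask):  # independent writes: one thread each
        if keep:
            reduced[offsets[i]] = i
    return reduced


def grid_region(atoms: Sequence[Point], radii: Sequence[float],
                points: Sequence[Point], lo: Point, hi: Point) -> List[float]:
    """Boolean occupancy for the grid points of one region."""
    # Step 1: one "thread" per atom builds the overlap mask.
    mask = [1 if overlaps_box(a, r, lo, hi) else 0 for a, r in zip(atoms, radii)]
    # Step 2: exclusive scan plus scatter gives the reduced atom list.
    reduced = compact(mask)
    # Step 3: one "thread" per grid point checks only the reduced atoms.
    return [1.0 if any(math.dist(p, atoms[i]) < radii[i] for i in reduced) else 0.0
            for p in points]


# Hypothetical region [0, 1]^3 sampled every 0.5 A; the atom at (5, 5, 5) is skipped.
atoms = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0), (1.2, 0.0, 0.0)]
radii = [1.0, 1.0, 0.5]
points = [(x * 0.5, y * 0.5, z * 0.5)
          for x in range(3) for y in range(3) for z in range(3)]
values = grid_region(atoms, radii, points, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
```

Each grid point only tests atoms in the reduced list, so the per-point work scales with the number of atoms near the region rather than with all atoms in the complex.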
SLIDE 30

Data Augmentation

SLIDE 32

Model Optimization

Atom types
  • Vina (34)
  • element-only (18)
  • ligand-protein (2)

Atom density type
  • Boolean
  • Gaussian

Other hyperparameters: radius multiple, resolution, pooling (max), depth, width, fully connected layers

SLIDE 34

Model Optimization

[Figure: network architecture. data (48^3 grid) → unit1_pool → unit1_conv1 (32 x 24^3) → unit2_pool → unit2_conv1 (64 x 12^3) → unit3_pool → unit3_conv1 (128 x 6^3) → output_fc (2 outputs) → output; a loss layer compares the output with the label]

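The tensor sizes in the diagram follow from halving the grid at every pooling step. The sketch below recomputes them and counts parameters, assuming 2×2×2 pooling, 3×3×3 convolutions padded to preserve grid size (the kernel size is not given on the slide and affects only the parameter counts), and the 34 Vina atom-type channels as input.

```python
from typing import List, Tuple


# Assumptions: 34 input channels (Vina atom types), pooling that halves each
# dimension, and 3x3x3 size-preserving convolutions.
def layer_shapes(in_channels: int = 34, grid: int = 48,
                 widths: Tuple[int, ...] = (32, 64, 128),
                 kernel: int = 3,
                 n_outputs: int = 2) -> List[Tuple[str, tuple, int]]:
    """Return (layer name, output shape, parameter count) for each layer."""
    layers = [("data", (in_channels, grid, grid, grid), 0)]
    channels = in_channels
    for unit, width in enumerate(widths, start=1):
        grid //= 2  # unitN_pool halves every spatial dimension
        layers.append((f"unit{unit}_pool", (channels, grid, grid, grid), 0))
        params = channels * width * kernel ** 3 + width  # weights + biases
        channels = width
        layers.append((f"unit{unit}_conv1", (channels, grid, grid, grid), params))
    flat = channels * grid ** 3  # with the defaults, 128 x 6^3 values
    layers.append(("output_fc", (n_outputs,), flat * n_outputs + n_outputs))
    return layers


for name, shape, params in layer_shapes():
    print(f"{name:12s} {str(shape):20s} {params:>9,d} parameters")

# float32 input volume for one example: 4 bytes x 34 channels x 48^3 points
input_bytes = 4 * 34 * 48 ** 3
```

With these defaults the input grid alone is 15,040,512 bytes (about 15 MB) per example, before any intermediate feature maps.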
SLIDE 35

Cross-Validation Evaluation

SLIDE 37

Pose Prediction (CSAR)

[Figure panels: inter-target ranking, intra-target ranking]

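The two panels rank the same scored poses in two different ways. The slide does not define the metrics, so this minimal sketch assumes "inter-target" means pooling every pose from every target into one ranking (scored with ROC AUC) and "intra-target" means ranking each target's poses separately and asking whether the top-ranked pose is an active. The scores are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pose = Tuple[str, int, float]  # (target id, 1 = active / 0 = decoy, CNN score)


def inter_target_auc(poses: Sequence[Pose]) -> float:
    """ROC AUC with every pose from every target in one ranking."""
    pos = [s for _, y, s in poses if y == 1]
    neg = [s for _, y, s in poses if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one decoy")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def intra_target_top1(poses: Sequence[Pose]) -> float:
    """Fraction of targets whose highest-scoring pose is an active."""
    by_target: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for target, label, score in poses:
        by_target[target].append((score, label))
    hits = sum(1 for ranked in by_target.values()
               if max(ranked, key=lambda t: t[0])[1] == 1)
    return hits / len(by_target)


# Hypothetical scores: target A scores high overall, target B low overall.
poses = [("A", 1, 0.9), ("A", 0, 0.8), ("B", 1, 0.3), ("B", 0, 0.2)]
print(inter_target_auc(poses), intra_target_top1(poses))
```

Here every target's best pose is an active, so the intra-target measure is perfect, yet the pooled AUC is only 0.75 because target B's active scores below target A's decoy.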
SLIDE 39

Pose Prediction (PDBbind)

[Figure panels: inter-target ranking, intra-target ranking]

SLIDE 40

Visualization

SLIDE 41

Examples

[Figure panels: 3COY, 2QMJ, 3OZT; partially aligned poses]

SLIDE 42

Beyond Scoring

SLIDE 45

Beyond Scoring

[Figure: 2Q89, with regions annotated "More Oxygen Here" and "Less Oxygen Here"]

SLIDE 53

The Future

Iterative Training

[Diagram labels: Pose Selection, Pose Generation, Iterative Training, Virtual Screening, Lead Optimization, Compound Generation]

SLIDE 55

Acknowledgements

Matt Ragoza, Josh Hochuli, Elisa Idrobo, Jocelyn Sunseri

Group members: Jocelyn Sunseri, Matt Ragoza, Josh Hochuli, Roosha Mandal, Alec Helbling, Lily Turner, Aaron Zheng, Sara Amato, Gibran Biswas

Grant R01GM108340

Department of Computational and Systems Biology

SLIDE 56

Questions?

Binding Determination, Affinity Prediction, Relevance Propagation
