Deep Learning for Molecular Docking


slide-1
SLIDE 1

Deep Learning for Molecular Docking

David Koes

GPU Technology Conference San Jose, CA March 26, 2018

@david_koes

slide-2
SLIDE 2

University of Pittsburgh Computational and Systems Biology

THE BIOPHARMACEUTICAL RESEARCH AND DEVELOPMENT PROCESS

[Figure: drug development pipeline. Stages: BASIC RESEARCH, DRUG DISCOVERY, PRE-CLINICAL, CLINICAL TRIALS (PHASE I, PHASE II, PHASE III), FDA REVIEW, POST-APPROVAL RESEARCH & MONITORING (PHASE IV). Milestones: IND SUBMITTED, NDA/BLA SUBMITTED, FDA APPROVAL. NUMBER OF VOLUNTEERS: TENS, HUNDREDS, THOUSANDS. POTENTIAL NEW MEDICINES narrow to 1 FDA-APPROVED MEDICINE. $2.6 BILLION.]

Source: Pharmaceutical Research and Manufacturers of America (http://phrma.org)

slide-3
SLIDE 3

If you stop failing so often you massively reduce the cost of drug development.

— Sir Andrew Witty CEO, GlaxoSmithKline

slide-5
SLIDE 5

  1. Does the compound do what you want it to?
  2. Does the compound not do what you don’t want it to?
  3. Is what you want it to do the right thing?
slide-6
SLIDE 6

Protein Structures

sequence → structure → function

slide-8
SLIDE 8

Structure Based Drug Design

  • Unlike ligand based approaches, generalizes to new targets
  • Requires molecular target with known structure and binding site

slide-11
SLIDE 11

Structure Based Drug Design

  • Virtual Screening
  • Lead Optimization
  • Pose Prediction
  • Binding Discrimination
  • Affinity Prediction

slide-13
SLIDE 13

Protein-Ligand Scoring

[Figure labels: r1, r2, d]

  • O. Trott, A. J. Olson, AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading, Journal of Computational Chemistry 31 (2010) 455-461

AutoDock Vina
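
As a rough illustration of the kind of scoring the cited paper describes, the sketch below writes out the three steric terms of the Vina functional form as functions of the surface distance d (center distance minus the two atomic radii). The function names and the example radii are illustrative assumptions, not part of any Vina API. The hydrophobic and hydrogen-bond terms, the term weights, and the distance cutoff are left out.

```python
import math

def surface_distance(center_distance, radius1, radius2):
    """Distance between atom surfaces: center distance minus both radii."""
    return center_distance - radius1 - radius2

def gauss1(d):
    """Narrow attractive Gaussian centered at contact (d = 0), width 0.5 A."""
    return math.exp(-(d / 0.5) ** 2)

def gauss2(d):
    """Broad attractive Gaussian centered at d = 3 A, width 2 A."""
    return math.exp(-((d - 3.0) / 2.0) ** 2)

def repulsion(d):
    """Quadratic penalty applied only when the surfaces overlap (d < 0)."""
    return d * d if d < 0 else 0.0

# Example pair: centers 4.0 A apart, illustrative radii of 1.9 A and 1.7 A.
d = surface_distance(4.0, 1.9, 1.7)
print(d, gauss1(d), gauss2(d), repulsion(d))
```

Each term depends only on a pairwise distance, which is what makes this style of scoring cheap to precompute on a grid.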

slide-14
SLIDE 14

Can we do better?

Accurate pose prediction, binding discrimination, and affinity prediction without sacrificing performance?

slide-15
SLIDE 15

Can we do better?

Key Idea: Leverage “big data”

  • 231,655,275 bioactivities in PubChem
  • 125,526 structures in the PDB
  • 16,179 annotated complexes in PDBbind

slide-16
SLIDE 16

Deep Learning

https://devblogs.nvidia.com

Convolutional Neural Networks

slide-18
SLIDE 18

CNNs for Protein-Ligand Scoring

[Figure: CNN → Pose Prediction, Binding Discrimination, Affinity Prediction]

slide-19
SLIDE 19

Protein-Ligand Representation

[Figure: 2D grid of atoms labeled C and O, shown next to the same grid as image pixels labeled G and R]

(R,G,B) pixel

slide-20
SLIDE 20

Protein-Ligand Representation

[Figure: 2D grid of atoms labeled C and O]

(R,G,B) pixel → (Carbon, Nitrogen, Oxygen,…) voxel

The only parameters for this representation are the choice of grid resolution, atom density, and atom types.
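
To make the pixel-to-voxel analogy concrete, here is a minimal sketch of a voxelizer with one channel per atom type. The three parameters named on the slide appear as `resolution`, the density function, and `atom_types`. The Gaussian density, the value `sigma=1.0`, the 0.5 Å default resolution, and the function name `voxelize` are assumptions made for illustration and may differ from what the authors' software does.

```python
import math

def voxelize(atoms, atom_types, dim=48, resolution=0.5, sigma=1.0):
    """Map atoms to grid[channel][i][j][k] of summed Gaussian densities.

    atoms: list of (type_name, x, y, z), coordinates relative to the grid center.
    atom_types: ordered list of type names; one channel per type.
    """
    channel = {name: c for c, name in enumerate(atom_types)}
    grid = [[[[0.0] * dim for _ in range(dim)] for _ in range(dim)]
            for _ in atom_types]
    half = (dim - 1) * resolution / 2.0              # grid spans [-half, +half]
    reach = int(math.ceil(3 * sigma / resolution))   # visit points within 3 sigma per axis
    for name, x, y, z in atoms:
        g = grid[channel[name]]
        # Index of the grid point nearest to the atom along each axis.
        ci, cj, ck = (int(round((v + half) / resolution)) for v in (x, y, z))
        for i in range(max(0, ci - reach), min(dim, ci + reach + 1)):
            for j in range(max(0, cj - reach), min(dim, cj + reach + 1)):
                for k in range(max(0, ck - reach), min(dim, ck + reach + 1)):
                    dx = i * resolution - half - x
                    dy = j * resolution - half - y
                    dz = k * resolution - half - z
                    d2 = dx * dx + dy * dy + dz * dz
                    g[i][j][k] += math.exp(-d2 / (2 * sigma * sigma))
    return grid

# Small 9x9x9 example with three channels.
grid = voxelize([("Carbon", 0.0, 0.0, 0.0), ("Oxygen", 1.2, 0.0, 0.0)],
                ["Carbon", "Nitrogen", "Oxygen"], dim=9)
print(len(grid), len(grid[0]), grid[0][4][4][4])
```

Nested lists keep the sketch dependency-free; a real pipeline would more likely fill an array on the GPU.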

slide-21
SLIDE 21

Training Data

Pose Prediction

  • 4056 protein-ligand complexes
  • diverse targets
  • wide range of affinities
  • generate poses with AutoDock Vina
  • include minimized crystal pose

Affinity Prediction

  • 8,688 low RMSD poses
  • assign known affinity
  • regression problem
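
The labeling described above can be sketched as follows: each generated pose is compared to the crystal pose by RMSD, and only low RMSD poses receive the known affinity as a regression target. The 2.0 Å cutoff and the names `rmsd` and `make_examples` are hypothetical choices for this sketch; the slide does not state the threshold.

```python
import math

def rmsd(pose, reference):
    """Root-mean-square deviation over matched atoms (no alignment)."""
    assert len(pose) == len(reference)
    total = sum((a - b) ** 2
                for p, r in zip(pose, reference)
                for a, b in zip(p, r))
    return math.sqrt(total / len(pose))

def make_examples(poses, crystal, affinity, low_rmsd_cutoff=2.0):
    """Attach a pose label and, for low RMSD poses only, the known affinity."""
    examples = []
    for pose in poses:
        is_low = rmsd(pose, crystal) < low_rmsd_cutoff  # cutoff is an assumption
        examples.append({
            "pose": pose,
            "label": int(is_low),                       # pose prediction target
            "affinity": affinity if is_low else None,   # regression target
        })
    return examples

crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
near = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]   # shifted 1 A along x
far = [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]    # shifted 5 A along x
examples = make_examples([near, far], crystal, affinity=6.5)
print([(e["label"], e["affinity"]) for e in examples])
```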
slide-22
SLIDE 22

Data Augmentation
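
The slide only names the technique. One common form of augmentation for 3D grid inputs, assumed here and not confirmed by the source, is to apply a random rotation and a small random translation to the atom coordinates before gridding. The `max_shift` value and function names are illustrative.

```python
import math
import random

def random_rotation(rng):
    """Uniform random 3x3 rotation matrix from a random unit quaternion."""
    w, x, y, z = (rng.gauss(0.0, 1.0) for _ in range(4))
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]

def augment(coords, rng, max_shift=2.0):
    """Rotate coords about the origin (the grid center), then shift them."""
    rot = random_rotation(rng)
    shift = [rng.uniform(-max_shift, max_shift) for _ in range(3)]
    return [tuple(sum(rot[i][j] * p[j] for j in range(3)) + shift[i]
                  for i in range(3))
            for p in coords]

coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
augmented = augment(coords, random.Random(0))
print(augmented)
```

Because rotation and translation are rigid motions, interatomic distances are unchanged, so the augmented example describes the same complex seen from a different frame.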

slide-24
SLIDE 24

Model

Input: 48x48x48x35
2x2 Max Pooling → 24x24x24x35
3x3x3 Convolution + Rectified Linear Unit → 24x24x24x32
2x2 Max Pooling → 12x12x12x32
3x3x3 Convolution + Rectified Linear Unit → 12x12x12x64
2x2 Max Pooling → 6x6x6x64
3x3x3 Convolution + Rectified Linear Unit → 6x6x6x128
Fully Connected → Pose Score (Softmax+Logistic Loss)
Fully Connected → Affinity (Pseudo-Huber Loss)
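
The tensor sizes listed on this slide can be checked with a small shape trace. This sketch assumes the convolutions use stride 1 with padding 1 (so spatial size is preserved) and that pooling halves each spatial dimension; both are inferred from the listed sizes, not stated on the slide. The `trace` helper is hypothetical.

```python
# Layer sequence read off the slide: pooling alternates with 3x3x3 convolutions.
LAYERS = [("pool", None), ("conv", 32), ("pool", None), ("conv", 64),
          ("pool", None), ("conv", 128)]

def trace(channels=35, size=48):
    """Return per-layer (kind, spatial size, channels), conv parameter count,
    and the number of features fed to the fully connected layers."""
    rows, conv_params = [], 0
    for kind, out_channels in LAYERS:
        if kind == "pool":
            size //= 2                      # window 2, stride 2
        else:
            # 27 weights per input/output channel pair, plus one bias per output
            conv_params += channels * out_channels * 27 + out_channels
            channels = out_channels         # padding 1 keeps the spatial size
        rows.append((kind, size, channels))
    return rows, conv_params, channels * size ** 3

rows, conv_params, features = trace()
for kind, size, channels in rows:
    print(f"{kind}: {size}x{size}x{size}x{channels}")
print(conv_params, features)
```

Most of the spatial reduction comes from pooling, so the convolutional layers stay small relative to the flattened feature vector that feeds the two output heads.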

slide-25
SLIDE 25

Results

Trained on PDBbind refined; tested on CSAR

slide-27
SLIDE 27

Results

Trained on PDBbind refined; tested on CSAR

Clustered Cross-Validation

RMSE = 1.69, R = 0.57, AUC = 0.90
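
For reference, the three reported metrics can be computed as in the sketch below. RMSE and Pearson R presumably apply to affinity regression and AUC to pose classification, but the slide does not say so explicitly. The toy inputs are made up for illustration and are unrelated to the reported values.

```python
import math

def rmse(pred, true):
    """Root-mean-square error between predictions and targets."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def auc(scores, labels):
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    in which the positive scores higher; ties count as half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(rmse([1.0, 2.0], [1.0, 4.0]))
print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```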

slide-28
SLIDE 28

Visualization

  • masking
  • gradients
  • layer-wise relevance

1UGX Score: 0.62
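
Of the three visualization approaches listed, masking is the simplest to sketch: remove one atom at a time, rescore, and attribute the change in score to that atom. The `score` callable below is a hypothetical stand-in for a trained model, and the toy weights exist only to make the example runnable.

```python
def masking_contributions(atoms, score):
    """Score change caused by deleting each atom in turn.

    atoms: list of atom records; score: callable taking a list of atoms.
    A positive value means the complex scores higher with the atom present.
    """
    full = score(atoms)
    return [full - score(atoms[:i] + atoms[i + 1:]) for i in range(len(atoms))]

# Toy stand-in for a trained model: each atom adds a fixed amount to the score.
TOY_WEIGHTS = {"C": 0.1, "O": 0.4, "N": -0.2}

def toy_score(atoms):
    return sum(TOY_WEIGHTS[a] for a in atoms)

contributions = masking_contributions(["C", "O", "N"], toy_score)
print(contributions)
```

Masking needs one forward pass per atom, which is one reason gradient-based and relevance-based alternatives are attractive for larger complexes.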

slide-29
SLIDE 29

Visualizing Empty Space

slide-30
SLIDE 30

Beyond Scoring

slide-33
SLIDE 33

Beyond Scoring

https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html

Deep Dreams

slide-34
SLIDE 34

Beyond Scoring

2Q89

[Figure labels: More Oxygen Here; Less Oxygen Here]

slide-36
SLIDE 36
slide-37
SLIDE 37
slide-38
SLIDE 38
slide-39
SLIDE 39
slide-40
SLIDE 40

Optimizing Low RMSD Poses

[Figure legend: better, worse]

slide-41
SLIDE 41

Iterative Refinement

[Figure legend: better, worse]

slide-43
SLIDE 43

Docking

Sampling

  • N (50) independent Monte Carlo chains
  • Scored with grid-accelerated Vina
  • Best identified pose retained

[Figure: parallel MCMC chains (Sampling) → Refinement (Vina; vina/smina/gnina) → best poses → CNN Rescoring (CNN pose, affinity)]
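
The sampling stage can be sketched as independent Metropolis Monte Carlo chains that each retain their best pose, followed by rescoring of the retained poses. This is a simplified illustration under stated assumptions: a pose is a plain vector, `toy_energy` stands in for the grid-accelerated Vina score (lower is better), `toy_rescore` stands in for the CNN (higher is better), and the local optimization that real docking programs run at each step is omitted.

```python
import math
import random

def monte_carlo_chain(score, start, steps, rng, step_size=0.5, temperature=1.0):
    """One Metropolis chain; returns the lowest-scoring pose it visited."""
    pose, energy = list(start), score(start)
    best_pose, best_energy = list(pose), energy
    for _ in range(steps):
        trial = [v + rng.gauss(0.0, step_size) for v in pose]
        trial_energy = score(trial)
        delta = trial_energy - energy
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            pose, energy = trial, trial_energy
            if energy < best_energy:
                best_pose, best_energy = list(pose), energy
    return best_pose, best_energy

def dock(sample_score, rescore, n_chains=50, steps=200, dim=3, seed=0):
    """Run independent chains, keep each chain's best pose, then rank by rescore."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_chains):
        start = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
        candidates.append(monte_carlo_chain(sample_score, start, steps, rng)[0])
    return sorted(candidates, key=rescore, reverse=True)

def toy_energy(pose):
    return sum(v * v for v in pose)      # minimum at the origin

def toy_rescore(pose):
    return -toy_energy(pose)

poses = dock(toy_energy, toy_rescore)
print(len(poses), toy_energy(poses[0]))
```

Keeping sampling and rescoring as separate callables mirrors the pipeline on the slide, where a fast function drives the search and a slower model ranks the survivors.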

slide-44
SLIDE 44

Full CNN Docking

slide-45
SLIDE 45

GPU Performance

[Figure: bar chart of Average Time (ms) on Xeon 4110 2.1GHz, i9-7920X 2.9GHz, GTX 1070 Ti, and V100, broken down into Molecular Grid, CNN Forward, CNN Backward, and Atom Gradients]

slide-46
SLIDE 46

Prospective Evaluation: D3R

slide-47
SLIDE 47

Grand Challenge 3

Spearman Correlation

target     cnn_docked_affinity  cnn_rescore_affinity  cnn_docked_scoring  cnn_rescore_scoring  vina
cat        0.0701               0.154                 -0.0351             0.178                0.179
p38a       -0.0784              -0.116                -0.329              -0.305               -0.0631
vegfr2     0.366                0.484                 0.434               0.448                0.414
jak2       0.428                0.338                 0.39                0.27                 0.106
jak2_sub3  0.68                 0.369                 -0.372              0.159                -0.633
tie2       0.648                0.835                 0.136               -0.078               0.561
abl1       0.634                0.745                 0.005               0.182                0.713
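
Spearman correlation measures how well predicted scores rank compounds relative to the experimental ranking, regardless of scale. A minimal dependency-free sketch, with tied values given average ranks, is shown below; the function names are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

print(spearman([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 400.0]))
print(spearman([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

A negative value means the method ranks compounds in roughly the opposite order to the measured affinities.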

slide-48
SLIDE 48

Grand Challenge 3: The Good

slide-49
SLIDE 49

Grand Challenge 3: The Good

slide-50
SLIDE 50

Grand Challenge 3: The Good

slide-51
SLIDE 51

Grand Challenge 3: The Good

slide-52
SLIDE 52

Grand Challenge 3: The Bad

slide-53
SLIDE 53

Grand Challenge 3: The Ugly

slide-54
SLIDE 54

and now for something completely different…

slide-55
SLIDE 55

Context Encoding

http://people.eecs.berkeley.edu/~pathak/context_encoder/

slide-56
SLIDE 56

Molecular Context Encoding

slide-57
SLIDE 57

Acknowledgements

Matt Ragoza, Josh Hochuli, Jocelyn Sunseri, Lily Turner

R01GM108340

Group Members: Jocelyn Sunseri, Jonathan King, Paul Francoeur, Matt Ragoza, Josh Hochuli, Lily Turner, Pulkit Mittal, Alec Helbling, Gibran Biswas, Sharanya Bandla, Faiha Khan

Department of Computational and Systems Biology

slide-58
SLIDE 58

github.com/gnina http://bits.csb.pitt.edu

@david_koes
