Neural Networks Designing New Drugs Mariya Popova, Olexandr Isayev, - PowerPoint PPT Presentation

Neural Networks Designing New Drugs
Mariya Popova, Olexandr Isayev, Alexandr Tropsha

Drug Discovery Timeline

Conventional Virtual Screening Pipeline
[Figure: chemical structures → chemical descriptors → predictive models → property/activity. A chemical database of ~10^6 to 10^9 molecules passes through virtual screening and is split into actives and inactives.]

Why Do We Need Generative Models?
• The biggest database of molecules has ~10^9 compounds
• Estimates of the size of chemical space go up to 10^60
• Searching for new drug candidates in existing databases leads to observation bias

Generative Models Overview
Sanchez-Lengeling, Benjamin, and Alán Aspuru-Guzik. "Inverse molecular design using machine learning: Generative models for matter engineering." Science 361.6400 (2018): 360-365.

Our Approach
• Generative model for SMILES, $G$
• Predictive model for the desired property, $P$
• $G$ and $P$ are combined with RL in one pipeline to bias the property of generated molecules
Popova et al. "Deep reinforcement learning for de novo drug design." Science Advances 4.7 (2018): eaap7885.

SMILES-based Generative Model
• SMILES (simplified molecular-input line-entry system) is a sequence of characters that encodes the molecular graph
• One sequence = one molecule
• SMILES has its own alphabet
Use a language model to produce novel SMILES strings: $p(s_t \mid s_1 \ldots s_{t-1}; \theta) = f(s_1 \ldots s_{t-1} \mid \theta)$
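The chain-rule factorization above can be sketched in plain Python. This is a minimal illustration, not the authors' model: it treats every character as one token (practical SMILES tokenizers usually merge two-character atoms such as `Cl` and `Br`), and the hypothetical `uniform_model` stands in for the trained network $f(s_1 \ldots s_{t-1} \mid \theta)$.

```python
import math
from typing import Callable, Dict, List, Sequence

START, END = "<START>", "<END>"


def build_alphabet(smiles_list: Sequence[str]) -> List[str]:
    """Collect the alphabet A from a SMILES corpus, treating each character as one token."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return chars + [END]


def sequence_log_prob(
    smiles: str,
    next_char_probs: Callable[[List[str]], Dict[str, float]],
) -> float:
    """Chain rule: log p(S) = sum over t of log p(s_t | s_1 ... s_{t-1}), including <END>."""
    prefix: List[str] = [START]
    total = 0.0
    for token in list(smiles) + [END]:
        total += math.log(next_char_probs(prefix)[token])
        prefix.append(token)
    return total


corpus = ["CCO", "c1ccccc1", "CC(=O)O"]
alphabet = build_alphabet(corpus)


def uniform_model(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for the trained network: every token is equally likely."""
    return {tok: 1.0 / len(alphabet) for tok in alphabet}


print(alphabet)
print(sequence_log_prob("CCO", uniform_model))
```

Any function that returns a normalized distribution over the alphabet for a given prefix can be plugged in as `next_char_probs`.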

Generative Model: training mode
• Trained on 1.5 million drug-like compounds from ChEMBL in a supervised manner
[Figure: the input characters <START>, N, C, O are fed to a network of GRU cells augmented with a stack; at each step the network outputs a probability vector over the next character, and a softmax loss compares it with the target character, the final target being <END>.]
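In training mode the network sees the true prefix at every step (teacher forcing), and the softmax loss is the mean negative log-probability of each true next character. The sketch below assumes per-character tokens and uses hypothetical logits in place of real network outputs; it shows only how inputs and targets are shifted and how the loss is computed, not the stack-augmented GRU itself.

```python
import math
from typing import Dict, List, Tuple

START, END = "<START>", "<END>"


def teacher_forcing_pairs(smiles: str) -> Tuple[List[str], List[str]]:
    """Inputs are <START> + s_1..s_n; targets are the same string shifted by one, + <END>."""
    tokens = list(smiles)
    return [START] + tokens, tokens + [END]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def softmax_loss(prob_rows: List[Dict[str, float]], targets: List[str]) -> float:
    """Mean negative log-probability of the true next character (cross-entropy)."""
    if len(prob_rows) != len(targets):
        raise ValueError("one probability row per target is required")
    return -sum(math.log(row[t]) for row, t in zip(prob_rows, targets)) / len(targets)


vocab = ["N", "C", "O", END]
inputs, targets = teacher_forcing_pairs("NCO")
print(inputs)   # ['<START>', 'N', 'C', 'O']
print(targets)  # ['N', 'C', 'O', '<END>']

# Hypothetical network outputs: logit 2.0 for the true next character, 0.0 elsewhere.
rows = [dict(zip(vocab, softmax([2.0 if tok == t else 0.0 for tok in vocab]))) for t in targets]
loss = softmax_loss(rows, targets)
print(loss)
```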

Generative Model: inference mode
The model takes its own predictions as the next input character: $p(s_t \mid s_1 \ldots s_{t-1}; \theta) = f(s_1 \ldots s_{t-1} \mid \theta)$
[Figure: a growing SMILES string; at each step the next character is sampled from the predicted probabilities and fed back in through its embedding vector.]
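Inference can be sketched as a sampling loop that feeds each sampled character back in as the next input until `<END>` is drawn or a length cap is reached. The `toy_model` below is a hypothetical stand-in for the trained network, and `max_len` is an assumed cap playing the role of the maximum length $T$.

```python
import random
from typing import Callable, Dict, List, Optional

START, END = "<START>", "<END>"


def sample_smiles(
    next_char_probs: Callable[[List[str]], Dict[str, float]],
    max_len: int = 100,
    rng: Optional[random.Random] = None,
) -> str:
    """Autoregressive sampling: each sampled character becomes part of the next input."""
    if rng is None:
        rng = random.Random()
    prefix: List[str] = [START]
    while len(prefix) - 1 < max_len:
        probs = next_char_probs(prefix)
        tokens = list(probs)
        token = rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
        if token == END:
            break
        prefix.append(token)
    return "".join(prefix[1:])


def toy_model(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in: always start with C, then extend the chain or stop."""
    if len(prefix) == 1:
        return {"C": 1.0}
    return {"C": 0.7, "O": 0.1, END: 0.2}


rng = random.Random(42)
samples = [sample_smiles(toy_model, max_len=20, rng=rng) for _ in range(5)]
print(samples)
```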

RL formulation for SMILES generation
• Action: generate symbol $s_t$ with probability $p(s_t \mid s_1 \ldots s_{t-1}; \theta) = f(s_1 \ldots s_{t-1} \mid \theta)$
• Set of actions: the SMILES alphabet $A$
• State: the generated prefix $s_1 s_2 \ldots s_{t-1}$
• Set of states: all possible strings over the SMILES alphabet $A$ with lengths from 0 to $T$, $\mathbb{A} = \{A^t,\ t = 0 \ldots T\}$
• Environment: the set of states $\mathbb{A}$, the set of actions $A$, and the transition probabilities $p(s_t = a \mid s_1 \ldots s_{t-1}; \theta),\ a \in A$
• Reward function: $R(S_t)$
• Objective: maximize the expected reward, $\mathbb{E}[R(S_t) \mid \theta] = \sum_{S \in \mathbb{A}} p(S \mid \theta)\, R(S) \to \max_\theta$
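For a tiny alphabet and a short maximum length, the objective $\sum_{S \in \mathbb{A}} p(S \mid \theta)\, R(S)$ can be evaluated exactly by enumerating every string, which makes the definition concrete. This is a toy assumption: the policy is a hypothetical state-independent distribution over the actions plus an end token, strings that reach length $T$ stop without `<END>`, and the reward `count_oxygens` is invented for illustration. With a real alphabet and realistic $T$ the sum is intractable and has to be estimated from sampled molecules.

```python
import itertools
import math
from typing import Callable, Dict, Sequence

END = "<END>"


def string_prob(s: str, probs: Dict[str, float], max_len: int) -> float:
    """p(S | theta) under a state-independent policy; strings of length T stop without <END>."""
    p = math.prod(probs[ch] for ch in s)
    return p if len(s) == max_len else p * probs[END]


def expected_reward(
    alphabet: Sequence[str],
    probs: Dict[str, float],
    reward: Callable[[str], float],
    max_len: int,
) -> float:
    """Exact E[R | theta] = sum over all strings S of length 0..T of p(S | theta) * R(S)."""
    total = 0.0
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            total += string_prob(s, probs, max_len) * reward(s)
    return total


def count_oxygens(s: str) -> float:
    """Hypothetical reward: number of 'O' characters in the string."""
    return float(s.count("O"))


alphabet = ["C", "O"]
probs = {"C": 0.5, "O": 0.3, END: 0.2}  # hypothetical policy p(a | theta)
print(expected_reward(alphabet, probs, lambda s: 1.0, max_len=2))  # probabilities sum to 1
print(expected_reward(alphabet, probs, count_oxygens, max_len=2))
```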

RL Pipeline for Molecule Generation
• The generative model is the policy network
• The predictive model is a simulator of the real world
• The reward is assigned based on the property prediction and the researcher's objective
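The pipeline can be sketched as a REINFORCE-style loop: the policy samples strings, a predictor scores them, the reward is computed from the prediction, and the parameters move along $R(S)\,\nabla_\theta \log p(S \mid \theta)$. This is a minimal sketch under strong assumptions: the policy is a hypothetical state-independent softmax over a two-letter alphabet plus `<END>`, `predict_property` is a stand-in that counts `O` characters, and REINFORCE with a mean-reward baseline is used as a generic policy-gradient method rather than a reproduction of the authors' code.

```python
import math
import random
from typing import Dict, List

END = "<END>"
ACTIONS = ["C", "O", END]


def softmax(theta: Dict[str, float]) -> Dict[str, float]:
    m = max(theta.values())
    exps = {a: math.exp(v - m) for a, v in theta.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}


def sample_episode(theta: Dict[str, float], rng: random.Random, max_len: int = 10) -> List[str]:
    """Generative model as policy: sample actions until <END> or max_len characters."""
    probs = softmax(theta)
    actions: List[str] = []
    for _ in range(max_len):
        a = rng.choices(ACTIONS, weights=[probs[x] for x in ACTIONS], k=1)[0]
        actions.append(a)
        if a == END:
            break
    return actions


def predict_property(smiles: str) -> float:
    """Hypothetical stand-in for the predictive model P: counts 'O' characters."""
    return float(smiles.count("O"))


def reward(smiles: str) -> float:
    """Researcher's objective: here, simply reward a higher predicted property."""
    return predict_property(smiles)


def grad_log_prob(actions: List[str], theta: Dict[str, float]) -> Dict[str, float]:
    """Gradient of sum_t log softmax(theta)[a_t], i.e. sum_t (onehot(a_t) - probs)."""
    probs = softmax(theta)
    grad = {a: 0.0 for a in theta}
    for act in actions:
        for a in theta:
            grad[a] += (1.0 if a == act else 0.0) - probs[a]
    return grad


def train(steps: int = 200, batch: int = 32, lr: float = 0.05, seed: int = 0) -> Dict[str, float]:
    rng = random.Random(seed)
    theta = {a: 0.0 for a in ACTIONS}
    for _ in range(steps):
        episodes = [sample_episode(theta, rng) for _ in range(batch)]
        rewards = [reward("".join(a for a in ep if a != END)) for ep in episodes]
        baseline = sum(rewards) / batch  # mean-reward baseline to reduce variance
        update = {a: 0.0 for a in ACTIONS}
        for ep, r in zip(episodes, rewards):
            g = grad_log_prob(ep, theta)
            for a in ACTIONS:
                update[a] += (r - baseline) * g[a] / batch
        for a in ACTIONS:
            theta[a] += lr * update[a]  # gradient ascent on the expected reward
    return theta


theta = train()
print(softmax(theta))
```

After training, the policy should put much more probability on `O` than the initial 1/3, because strings with more `O` characters earn higher rewards.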

Results: optimizing lipophilicity
• Lipophilicity is possibly the most important physicochemical property of a potential drug
• It plays a role in solubility, absorption, membrane penetration, etc.
• Log P is a quantitative measure of lipophilicity: the logarithm of the partition coefficient P, the ratio of the concentrations of a compound in two immiscible phases (typically octanol and water) at equilibrium
• Log P is a component of Lipinski's Rule of 5, a rule of thumb for predicting drug-likeness
• Drug-like molecules are expected to have log P between 0 and 5 (Lipinski's rule itself sets the upper limit of 5)
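The definition of log P and the drug-likeness window can be written out directly. The concentrations below are made-up numbers for illustration, octanol/water is assumed as the conventional pair of phases, and the 0 to 5 window is the range quoted on this slide.

```python
import math


def log_p(conc_octanol: float, conc_water: float) -> float:
    """log10 of the partition coefficient P = [compound]_octanol / [compound]_water."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("equilibrium concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


def in_drug_like_range(value: float, low: float = 0.0, high: float = 5.0) -> bool:
    """Check the 0 to 5 log P window quoted on the slide."""
    return low <= value <= high


# Made-up concentrations (same units in both phases), for illustration only.
for c_oct, c_wat in [(1000.0, 1.0), (1.0, 10.0), (1e6, 1.0)]:
    value = log_p(c_oct, c_wat)
    print(f"log P = {value:.2f}, drug-like: {in_drug_like_range(value)}")
```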

Predictive Model for log P
• SMILES-based RNN
• Dataset of 14k compounds with log P measurements
• 5-fold cross-validation
• RMSE = 0.57
• R² = 0.90
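The evaluation metrics on this slide can be computed as below. This is a generic sketch: `kfold_indices` is a hypothetical helper that shuffles indices once and splits them into 5 folds, R² is assumed to be the coefficient of determination, and the small arrays are invented rather than taken from the 14k-compound dataset.

```python
import math
import random
from typing import List, Sequence, Tuple


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1 once and split into k (train, test) folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for m, f in enumerate(folds) if m != i for j in f], folds[i])
        for i in range(k)
    ]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
print(rmse(y_true, y_pred), r2(y_true, y_pred))
```

In cross-validation, the model is trained on each `train` split, scored on the matching `test` split, and the per-fold metrics are averaged.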

Results: optimizing lipophilicity
Reward value: $R(S) = \begin{cases} 11, & \text{if } \log P(S) \in [0.5, 4.5] \\ 0, & \text{otherwise} \end{cases}$
[Figure: reward value as a function of the log P value.]
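The piecewise reward translates directly into code. In the pipeline, `log_p` would be the predictive model's output for a generated SMILES string; the closed interval follows the bracket notation of the formula, and `mean_reward` is a hypothetical helper for averaging over a generated batch.

```python
from typing import Sequence


def lipophilicity_reward(log_p: float, low: float = 0.5, high: float = 4.5, value: float = 11.0) -> float:
    """R(S) = 11 if the predicted log P of S lies in [0.5, 4.5], else 0."""
    return value if low <= log_p <= high else 0.0


def mean_reward(predicted_log_p: Sequence[float]) -> float:
    """Average reward over a batch of generated molecules."""
    return sum(lipophilicity_reward(v) for v in predicted_log_p) / len(predicted_log_p)


print([lipophilicity_reward(v) for v in (0.2, 0.5, 2.7, 4.5, 6.1)])
print(mean_reward([0.2, 2.7, 3.1, 6.1]))
```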

Results: optimizing lipophilicity
[Figure: values of the reward function during training (y-axis: reward value; x-axis: training iteration).]

Results: optimizing lipophilicity
[Figure: distribution of predicted log P values for the unbiased and optimized models.]
• Statistics are calculated from 10,000 randomly generated SMILES
• 100% of the optimized SMILES were predicted to have log P within the drug-like region

Limitations
This approach worked well for a relatively simple physical property. But what if a molecule with a high reward is a rare event? It could take a very long time before the model receives a high or even non-zero reward.

Tricks
• Flexible reward
  – First give a high reward to worse molecules, then gradually increase the threshold
• Fine-tuning on a dataset of "good" molecules in a supervised manner
  – Fine-tune on generated molecules with high rewards
  – Fine-tune on experimental ground-truth data
  – High exploitation, low exploration
• Experience replay for policy gradient optimization
  – Remember generated molecules with high rewards and replay them
  – Replay experimental ground-truth data
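The experience-replay trick can be sketched as a buffer that keeps the highest-reward molecules seen so far and mixes some of them into each new training batch. `ReplayBuffer`, `capacity`, `mixed_batch`, and `replay_fraction` are hypothetical names and choices; the slides do not say how the buffer is sized or sampled. Experimental ground-truth molecules could be added to the same buffer with their own rewards.

```python
import heapq
import random
from typing import List, Set, Tuple


class ReplayBuffer:
    """Keeps the top-`capacity` (reward, smiles) pairs; repeated SMILES are ignored."""

    def __init__(self, capacity: int = 1000, seed: int = 0) -> None:
        self.capacity = capacity
        self._heap: List[Tuple[float, str]] = []  # min-heap: lowest reward at index 0
        self._seen: Set[str] = set()
        self._rng = random.Random(seed)

    def add(self, smiles: str, reward: float) -> None:
        if smiles in self._seen:
            return
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (reward, smiles))
            self._seen.add(smiles)
        elif reward > self._heap[0][0]:
            # Evict the lowest-reward molecule to make room for a better one.
            _, evicted = heapq.heapreplace(self._heap, (reward, smiles))
            self._seen.discard(evicted)
            self._seen.add(smiles)

    def sample(self, k: int) -> List[str]:
        k = min(k, len(self._heap))
        return [s for _, s in self._rng.sample(self._heap, k)]

    def __len__(self) -> int:
        return len(self._heap)


def mixed_batch(generated: List[str], buffer: ReplayBuffer, replay_fraction: float = 0.25) -> List[str]:
    """Replace part of a freshly generated batch with replayed high-reward molecules."""
    replayed = buffer.sample(int(len(generated) * replay_fraction))
    return generated[: len(generated) - len(replayed)] + replayed
```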

More results: EGFR
Epidermal growth factor receptor (EGFR)
• Associated with cancer and inflammatory disease
• Has ~10k experimental measurements for molecules

More results: EGFR
• Built a binary classification (active/inactive) predictive model for EGFR (F1 score 0.9)
• Took the generative network pretrained on ChEMBL
• Generated 10k random molecules and predicted the probability of class "active" for each
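The F1 score reported for the EGFR classifier is the harmonic mean of precision and recall on the "active" class. A minimal computation follows; the 0.5 probability cut-off in the hypothetical `labels_from_probs` helper and the labels themselves are invented for illustration.

```python
from typing import List, Sequence


def labels_from_probs(p_active: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn predicted probabilities of class 'active' into 0/1 labels (0.5 cut-off assumed)."""
    return [1 if p >= threshold else 0 for p in p_active]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive ('active') class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [1, 1, 1, 0, 0]
y_pred = labels_from_probs([0.9, 0.7, 0.2, 0.6, 0.1])
print(y_pred, f1_score(y_true, y_pred))
```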

More results: EGFR
[Figure: distribution of the predicted probability of class "active".]

More results: EGFR
• Flexible reward: $R(S) = \begin{cases} 10, & \text{if } P(S) > \text{threshold} \\ 0, & \text{otherwise} \end{cases}$
• Initial threshold = 0.05
• After every update we generate 10k compounds
• If 15% of them are predicted to have property > threshold, we increase the threshold by 0.05
• Fine-tuning on generated molecules with high rewards
• Experience replay on experimental measurements and on generated molecules with high rewards
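The flexible-reward schedule can be sketched as follows. The reward value 10, the initial threshold 0.05, the step 0.05, and the 15% criterion come from the slide; reading "15%" as "at least 15%" and capping the threshold at 0.95 are assumptions, and in practice the probabilities would come from the EGFR classifier applied to each batch of generated compounds.

```python
from typing import Sequence


def flexible_reward(p_active: float, threshold: float, value: float = 10.0) -> float:
    """R(S) = 10 if the predicted probability of 'active' exceeds the threshold, else 0."""
    return value if p_active > threshold else 0.0


def update_threshold(
    p_active_batch: Sequence[float],
    threshold: float,
    step: float = 0.05,
    min_fraction: float = 0.15,
    max_threshold: float = 0.95,
) -> float:
    """Raise the threshold once at least `min_fraction` of generated molecules clear it."""
    above = sum(1 for p in p_active_batch if p > threshold)
    if above / len(p_active_batch) >= min_fraction:
        # round() keeps repeated 0.05 steps from accumulating float error
        return min(round(threshold + step, 10), max_threshold)
    return threshold


threshold = 0.05  # initial threshold from the slide
batch = [0.01] * 85 + [0.9] * 15  # hypothetical predictions for a generated batch
threshold = update_threshold(batch, threshold)
print(threshold)
```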

More results: EGFR
[Figure: predicted probability of class "active" for the unbiased and maximized models.]

More results: EGFR
Experimental validation:
• Selected several commercially available compounds and validated our results experimentally
• Found 4 active compounds

Future work
• Develop graph-based generative models
  – SMILES-based models generate some fraction of invalid molecules
• Develop lead optimization methods
  – Start from a given scaffold/structure
  – Impossible to do with SMILES
• Develop models for predicting synthesis routes
  – To be able to perform custom synthesis

Code Links
RL for de novo drug design: https://github.com/isayev/ReLeaSE

Acknowledgements
University of North Carolina at Chapel Hill: Olexandr Isayev, Alexandr Tropsha
