Representation and Generation of Molecular Graphs (Wengong Jin, MIT) - PowerPoint Presentation



SLIDE 1

Representation and Generation of Molecular Graphs

Wengong Jin MIT CSAIL

in collaboration with Tommi Jaakkola, Regina Barzilay, Kevin Yang, Kyle Swanson

SLIDE 2

Why are molecules interesting for ML?

  • E.g., antibiotic (cephalosporin)

(figure annotations: 3D information, node labels, edge labels, substructures/motifs)

SLIDE 3

Why are molecules interesting for ML?

  • E.g., antibiotic (cephalosporin)

(figure annotations: 3D information, node labels, edge labels, substructures/motifs)

Together these give rise to various chemical properties (e.g., solubility, toxicity, …)

SLIDE 4

Why are molecules interesting for ML?

  • Properties may depend on intricate structures;
  • The key challenges are to automatically predict chemical properties and to generate molecules with desirable characteristics

(figure: Daptomycin, an antibiotic)

SLIDE 5

Interesting ML Problems

  • Deeper into known chemistry
  • extract chemical knowledge from journals, notebooks (NLP)
  • Deeper into drug design
  • molecular property prediction (graph representation)
  • (multi-criteria) lead optimization (graph generation)
  • Deeper into reactions
  • forward reaction prediction (structured prediction)
  • forward reaction optimization (combinatorial optimization)
  • Deeper into synthesis
  • retrosynthesis planning (reinforcement learning)
SLIDE 7

Automating Drug Design

  • Key challenges:
  • 1. representation and prediction: learn to predict molecular properties
  • 2. generation and optimization: realize target molecules with better properties programmatically
  • 3. understanding: uncover principles (or diagnose errors) underlying complex predictions

SLIDE 8

GNNs for property prediction?

  • Are GNN models operating on molecular graphs sufficiently expressive for predicting molecular properties (in the presence of “property cliffs”)?
  • A number of recent results pertain to the power of GNNs (e.g., Xu et al. 2018, Sato et al. 2019, Maron et al. 2019, …)

(figure: molecule → GNN embedding → aggregation → prediction of solubility, toxicity, bioactivity, etc.)
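The embedding → aggregation → prediction pipeline above can be sketched in a few lines. This is a minimal illustrative toy, not the model from the talk; `W_msg` and `W_out` stand in for learned weights:

```python
import numpy as np

def gnn_predict(adj, feats, W_msg, W_out, layers=3):
    """Sketch of a GNN property predictor: neighbor message passing,
    a permutation-invariant sum readout, then a linear head."""
    h = feats
    for _ in range(layers):
        # each node aggregates its neighbors' embeddings, then transforms
        h = np.tanh((adj @ h) @ W_msg)
    graph_vec = h.sum(axis=0)        # aggregation (readout)
    return float(graph_vec @ W_out)  # predicted property (e.g., solubility)
```

Because the readout sums over nodes, relabeling the atoms leaves the prediction unchanged, which is exactly the permutation invariance the next slide's theorem is about.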

SLIDE 9

Are basic GNNs sufficiently expressive?

  • Theorem [Garg et al., 2019]: GNNs with permutation-invariant readout functions cannot “decide”
  • girth (length of the shortest cycle)
  • circumference (length of the longest cycle)
  • diameter, radius
  • presence of a conjoint cycle
  • total number of cycles
  • presence of a c-clique
  • etc.
  • (most results also apply to MPNNs)
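A concrete instance of this limitation, as an assumed but standard example (not one from the slide): a 6-cycle and two disjoint triangles differ in girth and in number of cycles, yet every node in both graphs is degree-2 with indistinguishable neighborhoods, so message passing with identical initial features plus a sum readout assigns them the same representation. A minimal numeric check with fixed stand-in weights:

```python
import numpy as np

def readout(adj, layers=3):
    # all nodes start with the same feature, as in an unlabeled graph
    h = np.ones((adj.shape[0], 4))
    W = np.full((4, 4), 0.1)  # fixed weights: the same "model" on both graphs
    for _ in range(layers):
        h = np.tanh((adj @ h) @ W)
    return h.sum(axis=0)      # permutation-invariant readout

def cycle(n):
    a = np.zeros((n, n))
    for i in range(n):
        a[i, (i + 1) % n] = a[(i + 1) % n, i] = 1
    return a

# C6 (girth 6, one cycle) vs. two disjoint C3s (girth 3, two cycles)
hexagon = cycle(6)
two_triangles = np.block([[cycle(3), np.zeros((3, 3))],
                          [np.zeros((3, 3)), cycle(3)]])
assert np.allclose(readout(hexagon), readout(two_triangles))
```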

SLIDE 10

Beyond simple GNNs: substructures

  • Learning to view molecules at multiple levels [Jin et al., 2019]
  • 1. original molecular graph

Hierarchical Graph-to-Graph Translation for Molecules (2019). W. Jin, R. Barzilay, and T. Jaakkola


SLIDE 12

Beyond simple GNNs: substructures

  • Learning to view molecules at multiple levels
  • 1. original molecular graph

(figure: a dictionary of substructures)

SLIDE 13

Beyond simple GNNs: substructures

  • Learning to view molecules at multiple levels
  • 1. original molecular graph
  • 2. substructure graph

(figure: a dictionary of substructures; pooling)

SLIDE 14

Beyond simple GNNs: substructures

  • Learning to view molecules at multiple levels
  • 1. original molecular graph
  • 2. substructure graph with attachments

(figure: a dictionary of substructures)


SLIDE 16

Beyond simple GNNs: substructures

  • Learning to view molecules at multiple levels
  • 1. original molecular graph
  • 2. substructure graph with attachments

Propagate atom embeddings

SLIDE 17

Beyond simple GNNs: substructures

  • Learning to view molecules at multiple levels
  • 1. original molecular graph
  • 2. substructure graph with attachments
  • 3. substructure graph

SLIDE 18

Multi-resolution representations

  • Learning to view molecules at multiple levels
  • 1. original molecular graph
  • 2. substructure graph with attachments
  • 3. substructure graph
  • Related to graph pooling (Ying et al., 2018, …)

Hierarchical message passing
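The hierarchical scheme (atom-level passing, pooling atom embeddings into substructure nodes, then substructure-level passing) might look roughly like the toy below. The `assign` lists and the weight-free updates are simplifications for illustration, not the paper's architecture:

```python
import numpy as np

def hierarchical_embed(adj_atoms, feats, assign, adj_subs, layers=2):
    """assign[s] lists the atom indices belonging to substructure s
    (e.g., a ring or bond motif from the dictionary)."""
    h = feats
    for _ in range(layers):                       # atom-level message passing
        h = np.tanh(adj_atoms @ h)
    # pooling: each substructure node sums the embeddings of its atoms
    s = np.stack([h[idx].sum(axis=0) for idx in assign])
    for _ in range(layers):                       # substructure-level passing
        s = np.tanh(adj_subs @ s + s)
    return s.sum(axis=0)                          # multi-resolution readout
```

The point of the design is that cycle- and motif-level information a flat GNN cannot decide is made explicit as nodes of the coarser graph.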

SLIDE 19

Experiments on solubility

  • ESOL dataset (averaged over 5 folds)

ESOL RMSE (lower is better): GNN 1.11, GNN-Feature 0.69, Hier-MPNN 0.65

SLIDE 20

Experiments on solubility

  • ESOL dataset (averaged over 5 folds)

ESOL RMSE (lower is better): GNN 1.11, GNN-Feature 0.69, Hier-MPNN 0.65

Raw GNN:
  • atom feature: only atom type label
SLIDE 21

Experiments on solubility

  • ESOL dataset (averaged over 5 folds)

ESOL RMSE (lower is better): GNN 1.11, GNN-Feature 0.69, Hier-MPNN 0.65

Raw GNN:
  • atom feature: only atom type label

GNN with features:
  • atom type label
  • degree
  • valence
  • whether an atom is in a cycle
  • whether an atom is in an aromatic ring (cycle information)
  • …

SLIDE 22

Experiments on solubility

  • ESOL dataset (averaged over 5 folds)

ESOL RMSE (lower is better): GNN 1.11, GNN-Feature 0.69, HierGNN 0.65

Hierarchical GNN:
  • atom features: still just atom type
  • but extra substructure information is built into the architecture

SLIDE 23

New Antibiotic Discovery

  • If we can accurately predict molecular properties, we can screen (select and repurpose) molecules from a large candidate set
  • Antibiotic Discovery [Stokes et al., 2019]
  • trained a model to predict growth inhibition of E. coli
  • data: ~2,000 measured compounds from the Broad Institute at MIT
  • screened ~100 million compounds in total
  • biologists tested 15 molecules (top predictions, structurally diverse) in the lab
  • 7 of them were validated to be inhibitory in vitro
  • 1 of them demonstrates strong inhibition against other bacteria (e.g., A. baumannii)
  • all of them are new antibiotics distinct from existing ones!

Learning to Discover Novel Antibiotics from Vast Chemical Spaces (2019), J. Stokes, K. Yang, K. Swanson, W. Jin, R. Barzilay, T. Jaakkola et al.
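The screening recipe above (rank an enormous library by predicted score, then keep a structurally diverse top set) can be sketched as follows. `predict`, `fingerprint`, and the set-based Tanimoto similarity are illustrative stand-ins for the trained property model and real chemical fingerprints, not the pipeline from the paper:

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints represented as sets."""
    return len(a & b) / len(a | b) if a | b else 1.0

def screen(candidates, predict, fingerprint, top_k=15, max_sim=0.5):
    """Rank a candidate library by predicted inhibition score, then keep
    only structurally diverse hits via a greedy similarity filter."""
    ranked = sorted(candidates, key=predict, reverse=True)
    hits, fps = [], []
    for mol in ranked:
        fp = fingerprint(mol)
        # accept a hit only if it is dissimilar to everything kept so far
        if all(tanimoto(fp, f) <= max_sim for f in fps):
            hits.append(mol)
            fps.append(fp)
        if len(hits) == top_k:
            break
    return hits
```

The greedy diversity filter mirrors the "top prediction, structurally diverse" selection of the 15 lab-tested molecules.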

SLIDE 24

Automating Drug Design

  • Key challenges:
  • 1. representation and prediction: learn to predict molecular properties
  • 2. generation and optimization: realize target molecules with better properties programmatically
  • 3. understanding: uncover principles (or diagnose errors) underlying complex predictions

SLIDE 25

De novo molecule optimization

  • Goal: We aim to programmatically turn precursor molecules into molecules that satisfy given design specifications

SLIDE 26

De novo molecule optimization

  • Goal: We aim to programmatically turn precursor molecules into molecules that satisfy given design specifications
  • Similar but … better drug-likeness
SLIDE 27

De novo molecule optimization

  • Goal: We aim to programmatically turn precursor molecules into molecules that satisfy given design specifications
  • Similar but … better drug-likeness
  • Similar but … better solubility
SLIDE 28

De novo molecule optimization

  • Goal: We aim to programmatically turn precursor molecules into molecules that satisfy given design specifications
  • Similar but … better drug-likeness
  • Similar but … better solubility
  • Need to learn a molecule-to-molecule mapping (i.e., graph-to-graph)
SLIDE 29

Molecule optimization as Graph Translation

  • Goal: We aim to programmatically turn precursor molecules into molecules that satisfy given design specifications

(figure: source X → Encode → Decode → target Y)

SLIDE 30

Molecule optimization as Graph Translation

  • Goal: We aim to programmatically turn precursor molecules into molecules that satisfy given design specifications
  • The training set consists of (source, target) molecular pairs

(figure: source X → Encode → Decode → target Y, with example source/target pairs)

SLIDE 31

Molecule optimization as Graph Translation

  • Goal: We aim to programmatically turn precursor molecules into molecules that satisfy given design specifications
  • The training set consists of (source, target) molecular pairs
  • Key challenges: graph generation, diversity, multi-criteria optimization

(figure: source X → Encode → Decode → target Y, with example source/target pairs)

SLIDE 32

Graph-to-Graph Translation (Decoder)

  • Modifying a precursor to meet target specifications

Hierarchical GNN encoder (more expressive power) → source vectors + target specs

SLIDE 33

Graph-to-Graph Translation (Decoder)

  • Modifying a precursor to meet target specifications

Hierarchical GNN encoder (more expressive power) → source vectors → hierarchical graph decoder (the reverse of the encoding process)

SLIDE 34

Graph-to-Graph Translation (Decoder)

  • Modifying a precursor to meet target specifications

(figure: original molecular graph; substructure graph; substructure graph with attachments)

SLIDE 36

Graph-to-Graph Translation (Decoder)

  • Modifying a precursor to meet target specifications

(figure: a dictionary of substructures)

SLIDE 37

Graph-to-Graph Translation (Decoder)

  • Modifying a precursor to meet target specifications

Predict the next substructure

SLIDE 38

Graph-to-Graph Translation (Decoder)

  • Modifying a precursor to meet target specifications

Expand the substructure graph

SLIDE 39

Graph-to-Graph Translation (Decoder)

  • Modifying a precursor to meet target specifications

How to attach them? The substructures are still “disconnected”

SLIDE 40

Graph-to-Graph Translation (Decoder)

  • Modifying a precursor to meet target specifications

Predict attaching points


SLIDE 42

Graph-to-Graph Translation (Decoder)

  • Modifying a precursor to meet target specifications

Predict attaching points in the neighbor substructure

SLIDE 43

Graph-to-Graph Translation (Decoder)

  • Modifying a precursor to meet target specifications

Update the molecular graph

SLIDE 44

Graph-to-Graph Translation (Decoder)

  • Modifying a precursor to meet target specifications

Update atom / substructure representations
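The decoding steps walked through on these slides (predict the next substructure, predict attaching points, expand the graph, refresh representations) form an autoregressive loop. A structural sketch, where `predict_next`, `predict_attach`, and `update` stand in for the learned networks:

```python
def decode(source_vecs, dictionary, predict_next, predict_attach, update,
           max_steps=20):
    """Sketch of the hierarchical decoding loop; the three callables are
    placeholders for trained models, not real APIs."""
    graph = []                                 # partial molecule
    state = update(graph, source_vecs)         # initial representations
    for _ in range(max_steps):
        sub = predict_next(state, dictionary)  # next substructure, or stop
        if sub is None:
            break
        site_old, site_new = predict_attach(state, sub)  # attaching points
        graph.append((sub, site_old, site_new))          # expand the graph
        state = update(graph, source_vecs)     # refresh atom/substructure reps
    return graph
```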


SLIDE 46

De novo molecule optimization: diversity

  • Goal: We aim to programmatically turn precursor molecules into versions that satisfy given design specifications

(figure: input X → Encode → Decode with latent z ~ P(z) for diversity → Target 1; variational inference)
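Diversity comes from the latent variable: different draws of z ~ P(z) decode the same precursor into different targets. A minimal sketch, where `decode` is a stand-in for the learned hierarchical decoder:

```python
import numpy as np

def translate(x_vec, decode, n_samples=3, z_dim=8, seed=0):
    """Diverse translation: each latent draw z ~ N(0, I) yields a
    (potentially different) target for the same precursor X."""
    rng = np.random.default_rng(seed)
    outputs = []
    for _ in range(n_samples):
        z = rng.standard_normal(z_dim)      # diversity comes from z
        outputs.append(decode(np.concatenate([x_vec, z])))
    return outputs
```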


SLIDE 48

De novo molecule optimization: specs

  • Goal: We aim to programmatically turn precursor molecules into versions that satisfy given design specifications

(figure: input X → Encode → Decode with latent z ~ P(z) and design specs g, e.g., drug-like & DRD2-active → Target 1)

SLIDE 49

De novo molecule optimization: specs

(figure: the same model with design specs g = drug-like & DRD2-inactive yields different targets)

SLIDE 50

Example results (DRD2)

  • Single-property optimization: DRD2 success % (from inactive to active)

MMPA 46.4, Seq2Seq 75.9, JT-G2G 77.8, AtomG2G 75.8, HierG2G 85.9

SLIDE 51

Example results (DRD2)

  • Single-property optimization: DRD2 success % (from inactive to active)

MMPA 46.4, Seq2Seq 75.9, JT-G2G 77.8, AtomG2G 75.8 (node-by-node), HierG2G 85.9 (hierarchical)

SLIDE 52

Example results (drug-likeness)

  • Single-property optimization: drug-likeness (QED) success % (QED > 0.9)

MMPA 32.9, Seq2Seq 58.5, JT-G2G 59.9, AtomG2G 73.6, HierG2G 76.9

SLIDE 53

Example results (multiple design specs)

  • Multi-criteria success % (design-spec-driven generation)
  • Challenge: only 1.6% of training pairs are both drug-like and DRD2-active

Drug-like and DRD2-active: Seq2Seq 67.8, AtomG2G 74.5, HierG2G 78.5
Drug-like but DRD2-inactive: Seq2Seq 5, AtomG2G 12.5, HierG2G 13

SLIDE 54

Disentangling what’s important

  • Models are complicated; it is important to assess how individual parts contribute to performance

Method                 QED     DRD2
HierG2G                76.9%   85.9%
· atom-based decoder   76.1%   75.0%
· two-layer encoder    75.8%   83.5%
· one-layer encoder    67.8%   74.1%
· GRU MPN              72.6%   83.7%

SLIDE 55

Still many ways to improve

  • Generating complex objects (e.g., molecules) is hard
  • Assessing the quality of the object (property prediction) is substantially easier

Translate: hard to realize; check/predict: easier

Constraints:
  • molecular similarity: sim(X, Y) ≥ 0.4
  • drug-likeness: QED(Y) ≥ 0.9

SLIDE 56

Property-guided generation

  • Generating complex objects (e.g., molecules) is hard
  • Assessing the quality of the object (property prediction) is substantially easier
  • Target augmentation: we can use the property predictor to generate additional (self-supervised) data for the generative model

Constraints:
  • molecular similarity: sim(X, Y) ≥ 0.4
  • drug-likeness: QED(Y) ≥ 0.9

Iterative Target Augmentation for Effective Conditional Generation (2019). K. Yang, W. Jin, K. Swanson, R. Barzilay, and T. Jaakkola

SLIDE 57

Target augmentation = stochastic EM

  • Objective: maximize the log-probability that generated candidates satisfy the properties of interest (the structure is now a latent variable)

    Σ_{X ∈ source set}  log Σ_Y  P(target specs | Y) · P(Y | X; θ)

  • E-step: generate candidates from the current model; filter/reweight by the property predictor (~ posterior samples)
  • M-step: maximize the log-probability of new (weighted) targets
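The alternating E/M procedure above can be sketched as a short loop. `sample`, `accept`, and `fit` are stand-ins for the generative model, the property predictor used as a filter, and the training step, respectively:

```python
def target_augmentation(sources, sample, accept, fit, rounds=3, k=5):
    """Sketch of iterative target augmentation (stochastic EM).
    sample(x) draws a candidate Y from the current model; accept(x, y)
    is the property-predictor filter; fit(pairs) retrains the translator."""
    pairs = []
    for _ in range(rounds):
        pairs = []
        for x in sources:
            # E-step: draw candidates, keep those the predictor accepts
            pairs += [(x, y) for y in (sample(x) for _ in range(k))
                      if accept(x, y)]
        # M-step: refit on the accepted (pseudo-)target pairs
        fit(pairs)
    return pairs
```

Because the predictor only has to check candidates rather than generate them, even a noisy predictor supplies useful extra supervision, which is the robustness point made on slide 64.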


SLIDE 62

Example results: gains

  • Substantial gains in translation/optimization success %

DRD2 success: HierG2G 85.9 → HierG2G++ 95.6
QED success: HierG2G 76.6 → HierG2G++ 87.9

SLIDE 63

Example results: gains

  • Consistently improving across iterations

(figure: HierG2G validation-set success rate (80–100%) over augmentation iterations 1–18)

SLIDE 64

Example results: robustness

  • The gains are robust to errors in the property predictor
  • Note: curves are for a weaker seq2seq model; its baseline performance is much lower, but its final performance with augmentation is comparable to HierG2G

(figure: QED and DRD2 success (25–100%) vs. property-predictor RMSE, with and without augmentation)

SLIDE 65

Summary

  • Molecules as structured objects provide a rich domain for developing ML tools; the key underlying challenges are shared with other areas involving generation/manipulation of diverse objects
  • ML molecular design methods are rapidly becoming viable tools for drug discovery

  • Several key challenges remain, however:
  • effective multi-criteria optimization
  • incorporating 3D features, physical constraints
  • generalizing to new, unexplored chemical spaces (domain transfer)
  • explainability, etc.