Hierarchical Generation of Molecular Graphs using Structural Motifs

Wengong Jin, Regina Barzilay, Tommi Jaakkola (MIT CSAIL)

Drug Discovery via Generative Models
‣ Drug discovery: finding molecules with desired chemical properties
‣ The primary challenge: large search space (10^30)
[Figure: searching the potential candidates for a molecule that meets the criterion (safe, cures COVID); candidate found: Remdesivir?]

Drug Discovery via Generative Models
‣ Generative models can be used to search the chemical space efficiently
‣ Given a specified criterion, the model generates a molecule with the desired properties.
[Figure: a generative model conditioned on the criterion (safe, cures COVID) generates Remdesivir]

Molecular Graph Generation
‣ Consider connected graphs…
‣ Different types of graphs require different generation methods.
‣ What kind of generation method is suitable for molecules?
[Figure: graph types ordered by complexity: line graph (text), low tree-width graph (molecule), grid graph (images), fully connected graph]

Previous Methods for Molecule Generation
‣ Atom based methods: CG-VAE (Liu et al. 2018), DeepGMG (Li et al. 2018), GraphRNN (You et al. 2018), and more
‣ Substructure based methods: JT-VAE (Jin et al., 2018)
- Incorporates an inductive bias (i.e., low tree-width) into generation
- Each step generates a cycle or an edge
[Figure: atom based vs. substructure based generation of an example molecule]

Previous methods: limitation
‣ Atom based methods: CG-VAE (Liu et al. 2018), DeepGMG (Li et al. 2018), GraphRNN (You et al. 2018), and more
‣ Substructure based methods: JT-VAE (Jin et al., 2018)
[Figure: Reconstruction Accuracy w.r.t. Molecule Size. Curves: JT-VAE, CG-VAE. y-axis: accuracy (0 to 80); x-axis: molecule size (20 to 100). The large-molecule region (e.g., peptides, polymers) is highlighted.]

Failure in Generating Large Molecules
‣ Atom based methods: CG-VAE (Liu et al. 2018), DeepGMG (Li et al. 2018), GraphRNN (You et al. 2018), and more
[Figure: a large example molecule. CG-VAE: 70 atom predictions + 70 bond predictions]
‣ Many generation steps: vanishing gradients + error accumulation

Failure in Generating Large Molecules
‣ Atom based methods: CG-VAE (Liu et al. 2018), DeepGMG (Li et al. 2018), GraphRNN (You et al. 2018), and more
‣ Substructure based methods: JT-VAE (Jin et al., 2018)
[Figure: a large example molecule. JT-VAE: 35 substructure (ring/bond) predictions]
‣ The JT-VAE decoder requires each substructure neighborhood to be assembled in one go, which makes large substructures combinatorially challenging to handle.

Larger Building Blocks: Motifs
‣ JT-VAE only considered single rings and bonds as building blocks
‣ How about using larger building blocks: motifs with flexible structures, not restricted to rings and bonds?
‣ Large molecules such as polymers exhibit clear hierarchical structure, being built from repeated structural motifs.
[Figure: example polymer structure]
‣ Only 11 steps to generate this polymer structure.

NLP Analogy
‣ Atom-based generation == character-based generation
‣ Substructure-based generation == word-based generation
‣ Motif-based generation == phrase-based generation
[Figure: example substructures shown next to example motifs]
‣ Substructures (ring and bond only): word-based generation
‣ Motifs (structures can be flexible): phrase-based generation

Our New Architecture: HierVAE
‣ Generates molecules motif by motif
- Faster and more efficient
- Much higher reconstruction accuracy for large molecules
[Figure: Reconstruction Accuracy w.r.t. Molecule Size. Curves: Motif (Ours), Substructure, Atom. y-axis: accuracy (0 to 90); x-axis: molecule size (20 to 100).]

Our New Architecture: HierVAE
‣ Motif extraction from data
- Motif extraction is based on heuristics
- Later I will discuss how motifs can be learned (based on given properties).
‣ Hierarchical Graph Encoder
- Represents molecules at both the motif level and the atom level.
- Designed to match the decoding process
‣ Hierarchical Graph Decoder
- Each generation step needs to resolve:
1. What is the next motif?
2. How should it be attached to the current graph?

Motif Extraction Algorithm
‣ A molecule is decomposed into disconnected components as follows:
1. Find all the bridge bonds (u, v) such that either u or v is part of a ring.
2. Detach all bridge bonds from their neighbors.
3. Select a component as a motif if it occurs frequently in the training set.
4. If a component is not selected, further decompose it into basic rings and bonds.
[Figure: an example molecule with its bridge bonds marked and detached. A component that occurs frequently is selected as a motif; the unselected components are broken into three bonds (motifs) and two bonds (motifs).]
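
Steps 1 and 2 above can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the molecule is assumed to be a bare list of bonds between integer atom indices, "detach" is read as simply cutting the bridge bond, and the names `find_bridges` and `extract_components` are hypothetical.

```python
from collections import defaultdict

def find_bridges(n_atoms, bonds):
    """Return indices of bonds whose removal disconnects the graph (Tarjan-style DFS)."""
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(bonds):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    disc, low, bridges = {}, {}, set()
    timer = 0

    def dfs(u, parent_bond):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v, idx in adj[u]:
            if idx == parent_bond:
                continue
            if v in disc:                      # back edge: u lies on a cycle
                low[u] = min(low[u], disc[v])
            else:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:           # nothing below v reaches u or above
                    bridges.add(idx)

    for atom in range(n_atoms):
        if atom not in disc:
            dfs(atom, -1)
    return bridges

def extract_components(n_atoms, bonds):
    """Cut every bridge bond that touches a ring atom; return the resulting components."""
    bridges = find_bridges(n_atoms, bonds)
    # A bond lies on a ring exactly when it is not a bridge.
    ring_atoms = {a for i, bond in enumerate(bonds) if i not in bridges for a in bond}
    cut = {i for i in bridges
           if bonds[i][0] in ring_atoms or bonds[i][1] in ring_atoms}
    adj = defaultdict(list)
    for i, (u, v) in enumerate(bonds):
        if i not in cut:
            adj[u].append(v)
            adj[v].append(u)
    seen, components = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(sorted(comp))
    return components

# Toy molecule: a six-membered ring (atoms 0-5) with a two-atom chain (6, 7) on atom 0.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 7)]
components = extract_components(8, bonds)
print(components)
```

In this toy molecule only the bond between atoms 0 and 6 is cut: the bond between 6 and 7 is also a bridge, but neither endpoint is in a ring, so the chain stays together as one component. Steps 3 and 4 need frequency counts over a training set and are not covered here.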

Mark attaching points
‣ Motif decomposition loses atom-level connectivity information
‣ For ease of reconstruction, we propose to mark attaching points in each motif.
[Figure: the decomposed motifs with their attaching points marked]

Motif Vocabulary
‣ We can construct a motif vocabulary from a training set (usually <500 motifs)
[Figure: example motifs from the vocabulary]
‣ Each motif also has a vocabulary of possible attaching point configurations.
- Usually fewer than 10, because motifs have regular attachment patterns.
- The attachment vocabulary covers >97% of the molecules in the test set.
[Figure: attaching point configurations of an example motif]
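
A frequency-based vocabulary of this kind can be sketched with a counter. The sketch below is illustrative only: the component labels are made-up strings, the threshold `min_count` is an assumed parameter (the slides do not state what counts as "frequently"), and `build_motif_vocab` is a hypothetical name.

```python
from collections import Counter

def build_motif_vocab(decomposed_molecules, min_count):
    """Keep components seen at least `min_count` times; return the rest separately."""
    counts = Counter(c for mol in decomposed_molecules for c in mol)
    vocab = {c for c, n in counts.items() if n >= min_count}
    rare = set(counts) - vocab        # these would be broken into basic rings and bonds
    return vocab, rare

# Toy training set: each molecule is the list of components it was decomposed into.
training = [
    ["c1ccccc1", "C(F)(F)F"],
    ["c1ccccc1", "CC"],
    ["c1ccccc1", "C(F)(F)F", "CO"],
]
vocab, rare = build_motif_vocab(training, min_count=2)
print(sorted(vocab), sorted(rare))
```

The same counting idea could be applied per motif to collect its attaching point configurations, which is one way to arrive at the small per-motif attachment vocabulary described above.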

Generation Process
‣ During generation, we maintain all possible positions to which new motifs will be attached.
[Figure: current state of the partially generated molecule]

Generation Process
‣ Step 1: Motif Prediction
[Figure: current state; the next motif is chosen from the motif vocabulary]

Generation Process
‣ Step 2: Attachment Prediction
[Figure: current state; the attaching points of the new motif are chosen from its attachment vocabulary]

Generation Process
‣ Step 3: Graph Prediction
[Figure: current state with the new motif being attached]

Generation Process
[Figure: current state and next state of the molecule]
‣ JT-VAE assembles each neighborhood (multiple motifs) in one go.
‣ HierVAE decomposes the assembly process into multiple “baby steps”
- First predict attaching points, then matching atoms.
- Assembles one motif at a time, not the entire neighborhood.
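
The control flow of the three steps can be sketched as a loop over open attachment positions. Everything here is a stand-in: the three `predict_*` callables replace the learned predictors, returning `None` from `predict_motif` is an assumed way to leave a position empty, the stack order of expansion is an assumption, and the toy vocabulary and point names are invented for illustration.

```python
def generate(root, root_config, vocab, predict_motif, predict_attachment, predict_match):
    """Grow a molecule one motif at a time; returns placed motifs and motif-level edges."""
    placed = [root]                                   # motifs in the current graph
    edges = []                                        # (parent, parent_point, child, child_point)
    open_positions = [(0, p) for p in root_config]    # positions still awaiting a motif
    while open_positions:
        parent, parent_point = open_positions.pop()
        motif = predict_motif(placed, parent, parent_point)        # step 1: which motif?
        if motif is None:                             # assumed "nothing attaches here" signal
            continue
        config = predict_attachment(motif, vocab[motif])           # step 2: attaching points
        child_point = predict_match(parent_point, config)          # step 3: matching atom
        placed.append(motif)
        child = len(placed) - 1
        edges.append((parent, parent_point, child, child_point))
        # Remaining attaching points of the new motif become open positions.
        open_positions += [(child, p) for p in config if p != child_point]
    return placed, edges

# Toy vocabulary: motif -> list of attaching point configurations (hypothetical names).
vocab = {"ring": [("c1", "c4"), ("c1",)], "CF3": [("c",)]}

placed, edges = generate(
    root="ring",
    root_config=vocab["ring"][0],
    vocab=vocab,
    predict_motif=lambda placed, parent, point: "CF3" if parent == 0 else None,
    predict_attachment=lambda motif, configs: configs[0],
    predict_match=lambda parent_point, config: config[0],
)
print(placed, edges)
```

Each loop iteration attaches exactly one motif, which mirrors the "one motif at a time" point above; a neighborhood with several motifs is assembled over several iterations rather than in one combinatorial step.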

Hierarchical Graph Encoder (bottom up)
‣ Motif layer is designed for motif prediction (step 1)
‣ Attachment layer is designed for attachment prediction (step 2)
‣ Atom layer is designed for graph prediction (step 3)
[Figure: the molecule represented at three levels, from top to bottom: motif layer, attachment layer, atom layer]

Hierarchical Graph Encoder (bottom up)
‣ Run the atom layer message passing network to obtain the atom vectors
- Propagate messages to the corresponding nodes in the attachment layer
‣ Run the attachment layer message passing network to obtain the attachment vectors
- Propagate messages to the corresponding nodes in the motif layer
‣ Run the motif layer message passing network to obtain the motif vectors
[Figure: the three layers stacked, with the atom layer at the bottom and the motif layer at the top]
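
The bottom-up order of computation can be illustrated with a toy example. This is only a sketch of the data flow: the real layers are learned message passing networks over vectors, whereas here each node holds a single number, an update just adds the neighbors' values, and "propagate to the corresponding node" is assumed to be a sum over the lower-level nodes that belong to it. The names `message_passing` and `lift` are hypothetical.

```python
def message_passing(values, edges, rounds=1):
    """Toy update: each node adds the previous-round values of its neighbors."""
    for _ in range(rounds):
        updated = list(values)
        for u, v in edges:
            updated[u] += values[v]
            updated[v] += values[u]
        values = updated
    return values

def lift(values, groups):
    """Pass lower-layer values up: each upper node sums the nodes it corresponds to."""
    return [sum(values[i] for i in members) for members in groups]

# Toy molecule: a path of four atoms, split into two motifs of two atoms each.
atom_edges = [(0, 1), (1, 2), (2, 3)]
motif_atoms = [[0, 1], [2, 3]]          # atoms belonging to each attachment node
motif_edges = [(0, 1)]                  # the two motifs are connected

atom_vecs = message_passing([1.0, 1.0, 1.0, 1.0], atom_edges)          # atom layer
attach_vecs = message_passing(lift(atom_vecs, motif_atoms), motif_edges)  # attachment layer
motif_vecs = message_passing(lift(attach_vecs, [[0], [1]]), motif_edges)  # motif layer
print(atom_vecs, attach_vecs, motif_vecs)
```

The point of the ordering is that each higher layer starts from what the layer below has already computed, so the motif vectors summarize both the coarse structure and the atom-level detail.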

Hierarchical Graph Decoder (top down)
‣ Step 1: Motif Prediction (from the motif vectors)
- Classification: predict the right motif in the vocabulary
‣ Step 2: Attachment Prediction (from the attachment vectors)
- Classification: predict the right attachment in the vocabulary
[Figure: the motif vectors, attachment vectors and atom vectors of the current graph]
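
Both predictions are classification problems over a fixed vocabulary, which can be sketched as a softmax over scores. The scoring rule (a dot product between a hidden vector and one embedding per vocabulary entry), the embeddings, and the names below are assumptions made for illustration; the slides do not specify how the scores are computed.

```python
import math

def softmax(scores):
    """Turn a dict of raw scores into probabilities that sum to 1."""
    top = max(scores.values())
    exps = {k: math.exp(s - top) for k, s in scores.items()}   # shift for stability
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def classify(hidden, embeddings):
    """Score every vocabulary entry against `hidden`; return the best entry and all probabilities."""
    probs = softmax({k: dot(hidden, e) for k, e in embeddings.items()})
    return max(probs, key=probs.get), probs

hidden = [1.0, 0.0]                                    # stand-in for an encoder/decoder state

# Step 1: choose a motif from the motif vocabulary.
motif_emb = {"ring": [2.0, 0.0], "CF3": [0.5, 1.0], "bond": [-1.0, 0.0]}
best_motif, motif_probs = classify(hidden, motif_emb)

# Step 2: choose among the attaching point configurations of the chosen motif only.
attach_emb = {"ring": {("c1",): [0.0, 1.0], ("c1", "c4"): [1.0, 0.0]}}
best_attach, attach_probs = classify(hidden, attach_emb[best_motif])
print(best_motif, best_attach)
```

Restricting step 2 to the chosen motif's own configurations keeps that classification small, consistent with the earlier note that each motif usually has fewer than 10 attachment configurations.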
