
Hierarchical Generation of Molecular Graphs using Structural Motifs - PowerPoint PPT Presentation



  1. Hierarchical Generation of Molecular Graphs using Structural Motifs
  Wengong Jin, Regina Barzilay, Tommi Jaakkola (MIT CSAIL)

  2. Drug Discovery via Generative Models
  ‣ Drug discovery: finding molecules with desired chemical properties
  ‣ The primary challenge: a large search space of roughly 10^30 potential candidates
  [Figure: searching candidates against a criterion (safe, cures COVID) to find a molecule such as Remdesivir]

  3. Drug Discovery via Generative Models
  ‣ Generative models can be used to search the chemical space efficiently
  ‣ Given a specified criterion, the model generates a molecule with the desired properties.
  [Figure: conditioning a generative model on a criterion (safe, cures COVID) to generate a molecule such as Remdesivir]

  4. Molecular Graph Generation
  ‣ Consider connected graphs…
  ‣ Different types of graphs require different generation methods.
  ‣ What kind of generation method is suitable for molecules?
  [Figure: graph families ordered by complexity: line graphs (text), grid graphs (images), low tree-width graphs (molecules), fully connected graphs]

  5. Previous Methods for Molecule Generation
  ‣ Atom-based methods: CG-VAE (Liu et al. 2018), DeepGMG (Li et al. 2018), GraphRNN (You et al. 2018), and more
  [Figure: a molecule constructed atom by atom]

  6. Previous Methods for Molecule Generation
  ‣ Atom-based methods: CG-VAE (Liu et al. 2018), DeepGMG (Li et al. 2018), GraphRNN (You et al. 2018), and more
  ‣ Substructure-based methods: JT-VAE (Jin et al., 2018)
    - Incorporate an inductive bias (i.e., low tree-width) into generation
    - Generate one ring or edge at a time
  [Figure: atom-based vs. substructure-based construction of the same molecule]

  7. Previous Methods: Limitation
  ‣ Atom-based methods: CG-VAE (Liu et al. 2018), DeepGMG (Li et al. 2018), GraphRNN (You et al. 2018), and more
  ‣ Substructure-based methods: JT-VAE (Jin et al., 2018)
  [Chart: reconstruction accuracy vs. molecule size (20 to 100 atoms) for JT-VAE and CG-VAE; accuracy falls as molecules grow]

  8. Previous Methods: Limitation
  ‣ Atom-based methods: CG-VAE (Liu et al. 2018), DeepGMG (Li et al. 2018), GraphRNN (You et al. 2018), and more
  ‣ Substructure-based methods: JT-VAE (Jin et al., 2018)
  [Chart: reconstruction accuracy vs. molecule size (20 to 100 atoms) for JT-VAE and CG-VAE; accuracy falls sharply in the large-molecule regime (e.g., peptides, polymers)]

  9. Failure in Generating Large Molecules
  ‣ Atom-based methods: CG-VAE (Liu et al. 2018), DeepGMG (Li et al. 2018), GraphRNN (You et al. 2018), and more
  ‣ CG-VAE needs 70 atom predictions + 70 bond predictions to build this 70-atom molecule.
  [Figure: a large molecule generated atom by atom with CG-VAE]
  ‣ Many generation steps: vanishing gradients + error accumulation

  10. Failure in Generating Large Molecules
  ‣ Atom-based methods: CG-VAE (Liu et al. 2018), DeepGMG (Li et al. 2018), GraphRNN (You et al. 2018), and more
  ‣ Substructure-based methods: JT-VAE (Jin et al., 2018)
  ‣ JT-VAE: 35 substructure (ring/bond) predictions for the same molecule
  [Figure: the same large molecule generated substructure by substructure with JT-VAE]
  ‣ The JT-VAE decoder requires each substructure neighborhood to be assembled in one go, making it combinatorially challenging to handle large substructures.

  11. Larger Building Blocks: Motifs
  ‣ JT-VAE only considered single rings and bonds as building blocks
  ‣ How about using larger building blocks: motifs with flexible structures, not restricted to rings and bonds?
  ‣ Large molecules such as polymers exhibit a clear hierarchical structure, being built from repeated structural motifs.
  ‣ Only 11 steps are needed to generate this polymer structure.
  [Figure: a polymer decomposed into repeated structural motifs]

  12. NLP Analogy
  ‣ Atom-based generation == character-based generation
  ‣ Substructure-based generation == word-based generation
  ‣ Motif-based generation == phrase-based generation
  [Figure: substructures (rings and bonds only; like word-based generation) vs. motifs (flexible structures; like phrase-based generation)]

  13. Our New Architecture: HierVAE
  ‣ Generates molecules motif by motif
    - Faster and more efficient
    - Much higher reconstruction accuracy for large molecules
  [Chart: reconstruction accuracy vs. molecule size (20 to 100 atoms); motif-based (ours) stays highest, followed by substructure-based and atom-based]

  14. Our New Architecture: HierVAE
  ‣ Motif extraction from data
    - Motif extraction is based on heuristics
    - Later I will discuss how motifs can be learned (based on given properties).
  ‣ Hierarchical graph encoder
    - Represents molecules at both the motif and the atom level
    - Designed to match the decoding process
  ‣ Hierarchical graph decoder
    - Each generation step needs to resolve:
      1. What is the next motif?
      2. How should it be attached to the current graph?

  15. Motif Extraction Algorithm
  ‣ A molecule is decomposed into disconnected motifs as follows:
    1. Find all bridge bonds (u, v) such that either u or v is part of a ring.
  [Figure: molecule with its bridge bonds highlighted]

  16. Motif Extraction Algorithm
  ‣ A molecule is decomposed into disconnected motifs as follows:
    1. Find all bridge bonds (u, v) such that either u or v is part of a ring.
    2. Detach all bridge bonds from their neighbors.
  [Figure: bridge bonds being detached from the molecule]

  17. Motif Extraction Algorithm
  ‣ A molecule is decomposed into disconnected components as follows:
    1. Find all bridge bonds (u, v) such that either u or v is part of a ring.
    2. Detach all bridge bonds from their neighbors.
    3. Select a component as a motif if it occurs frequently in the training set.
  [Figure: a frequently occurring component selected as a motif]

  18. Motif Extraction Algorithm
  ‣ A molecule is decomposed into disconnected components as follows:
    1. Find all bridge bonds (u, v) such that either u or v is part of a ring.
    2. Detach all bridge bonds from their neighbors.
    3. Select a component as a motif if it occurs frequently in the training set.
    4. If a component is not selected, further decompose it into basic rings and bonds.
  [Figure: one unselected component broken into three bond motifs, another into two bond motifs]
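Steps 1 and 2 of the extraction can be sketched in plain Python, assuming a toy molecule representation (integer atom ids, bond pairs, and precomputed ring-membership sets); the authors' implementation works on RDKit molecule objects, so this is only illustrative:

```python
def find_bridge_bonds(bonds, ring_bonds, ring_atoms):
    """Step 1: bonds that are not part of any ring but have at least one
    endpoint inside a ring."""
    return [(u, v) for (u, v) in bonds
            if (u, v) not in ring_bonds
            and (u in ring_atoms or v in ring_atoms)]

def detach_and_split(atoms, bonds, bridge_bonds):
    """Step 2: delete the bridge bonds, then return the connected
    components (each a frozenset of atom ids) as motif candidates."""
    cut = set(bridge_bonds) | {(v, u) for (u, v) in bridge_bonds}
    adj = {a: set() for a in atoms}
    for u, v in bonds:
        if (u, v) not in cut:
            adj[u].add(v)
            adj[v].add(u)
    seen, components = set(), []
    for a in atoms:
        if a in seen:
            continue
        stack, comp = [a], set()
        while stack:  # depth-first traversal of one component
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x] - comp)
        seen |= comp
        components.append(frozenset(comp))
    return components
```

Steps 3 and 4 (keeping only frequently occurring components, and falling back to basic rings and bonds otherwise) then operate on the returned components.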

  19. Mark Attaching Points
  ‣ Motif decomposition loses atom-level connectivity information.
  ‣ For ease of reconstruction, we propose to mark attaching points in each motif.
  [Figure: extracted motifs with their attaching atoms marked]
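Continuing the toy representation used for extraction, marking attaching points amounts to recording, for each component, which of its atoms were endpoints of a detached bridge bond (the function name and representation are illustrative, not the paper's code):

```python
def mark_attaching_points(component, bridge_bonds):
    """Atoms of this component that touched a detached bridge bond; these
    are the positions where neighboring motifs re-attach at generation time."""
    return {a for (u, v) in bridge_bonds for a in (u, v) if a in component}
```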

  20. Motif Vocabulary
  ‣ We can construct a motif vocabulary given a training set (usually <500 motifs)
  ‣ Each motif also has a vocabulary of possible attaching-point configurations.
    - Usually fewer than 10, because motifs have regular attachment patterns.
    - The attachment vocabulary covers >97% of the molecules in the test set.
  [Figure: example motifs from the vocabulary and their attachment configurations]
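Building both vocabularies is a counting exercise over the training set. A minimal sketch, assuming each extracted component has already been reduced to a canonical string key (the real system canonicalizes fragments with RDKit) and paired with its attaching-point configuration:

```python
from collections import Counter

def build_vocabularies(fragments, attachments, min_count=2):
    """fragments: one canonical key per extracted component across the
    training set; attachments: the matching attaching-point configuration
    for each component. Returns the motif vocabulary and, per motif, its
    vocabulary of observed attachment configurations."""
    motif_vocab = {f for f, n in Counter(fragments).items() if n >= min_count}
    attach_vocab = {}
    for frag, att in zip(fragments, attachments):
        if frag in motif_vocab:
            attach_vocab.setdefault(frag, set()).add(att)
    return motif_vocab, attach_vocab
```

The `min_count` threshold is a stand-in for the frequency heuristic from the extraction algorithm; rare fragments fall through to the ring/bond fallback.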

  21. Generation Process
  ‣ During generation, we maintain all possible positions to which new motifs can be attached.
  [Figure: current partial graph with its open attachment positions]

  22. Generation Process
  ‣ Step 1: Motif Prediction (select the next motif from the motif vocabulary)
  [Figure: current state; candidate motifs from the motif vocabulary]

  23. Generation Process
  ‣ Step 2: Attachment Prediction (select the motif's attaching-point configuration from the attachment vocabulary)
  [Figure: current state; candidate configurations from the attachment vocabulary]

  24. Generation Process
  ‣ Step 3: Graph Prediction (decide which atoms of the new motif bind to which atoms of the current graph)
  [Figure: current state with the new motif's atoms matched to the graph]

  25. Generation Process
  ‣ The new motif is attached, producing the next state.
  [Figure: current state and next state]

  26. Generation Process
  [Figure: current state and next state]
  ‣ JT-VAE assembles each neighborhood (multiple motifs) in one go.
  ‣ HierVAE decomposes the assembly process into multiple “baby steps”:
    - First predict attaching points, then the matching atoms.
    - Assemble one motif at a time, not the entire neighborhood.
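The three-step loop from slides 21-25 can be sketched as a generic decoding procedure. The three callables stand in for the learned motif, attachment, and graph predictors; all names and the list-based graph are hypothetical, not the paper's code:

```python
def generate(predict_motif, predict_attachment, attach, max_steps=50):
    """Sketch of motif-by-motif decoding: repeatedly pick an open attachment
    position, predict the next motif (step 1), predict its attachment
    configuration (step 2), then realize the attachment (step 3)."""
    graph = []          # toy graph: a list of (motif, site, config) records
    frontier = [None]   # open positions; None marks the empty-graph root
    steps = 0
    while frontier and steps < max_steps:
        site = frontier.pop()
        motif = predict_motif(graph, site)                # step 1
        if motif is None:                                 # stop at this site
            continue
        config = predict_attachment(graph, site, motif)   # step 2
        new_sites = attach(graph, site, motif, config)    # step 3
        frontier.extend(new_sites)
        steps += 1
    return graph
```

The frontier makes the slide's point concrete: each iteration commits exactly one motif, rather than assembling a whole neighborhood at once.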

  27. Hierarchical Graph Encoder (bottom up)
  ‣ The atom layer serves graph prediction (step 3).
  [Figure: atom layer over the molecular graph]

  28. Hierarchical Graph Encoder (bottom up)
  ‣ The attachment layer serves attachment prediction (step 2).
  ‣ The atom layer serves graph prediction (step 3).
  [Figure: attachment layer stacked on top of the atom layer]

  29. Hierarchical Graph Encoder (bottom up)
  ‣ The motif layer is designed for motif prediction (step 1).
  ‣ The attachment layer is designed for attachment prediction (step 2).
  ‣ The atom layer is designed for graph prediction (step 3).
  [Figure: motif, attachment, and atom layers of the hierarchical encoder]

  30. Hierarchical Graph Encoder (bottom up)
  ‣ Run the atom-layer message passing network, then propagate its messages to the corresponding attachment nodes.
  ‣ Run the attachment-layer message passing network, then propagate its messages to the corresponding motif nodes.
  ‣ Run the motif-layer message passing network to obtain the motif vectors.
  [Figure: bottom-up flow from atom vectors to attachment vectors to motif vectors]
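The bottom-up flow can be sketched as three stacked message passing runs with a pooling step in between. A toy sum-and-average update stands in for the model's learned message passing networks, so only the wiring (atom layer, then attachment layer, then motif layer) reflects the slides:

```python
def message_pass(feats, edges, rounds=2):
    """Toy message passing: each round, a node's new vector is the average
    of its current vector and the sum of its neighbors' vectors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    h = {n: list(vec) for n, vec in feats.items()}
    for _ in range(rounds):
        nxt = {}
        for n, vec in h.items():
            msg = [0.0] * len(vec)
            for m in adj.get(n, []):
                for i, x in enumerate(h[m]):
                    msg[i] += x
            nxt[n] = [(a + b) / 2 for a, b in zip(vec, msg)]
        h = nxt
    return h

def pool_up(child_vecs, parent_to_children):
    """Feed a layer's output upward: each parent node starts from the sum
    of its children's output vectors."""
    out = {}
    for parent, children in parent_to_children.items():
        acc = [0.0] * len(next(iter(child_vecs.values())))
        for c in children:
            for i, x in enumerate(child_vecs[c]):
                acc[i] += x
        out[parent] = acc
    return out

def encode(atom_feats, atom_edges, atom_of_attach, attach_edges,
           attach_of_motif, motif_edges):
    """Bottom-up hierarchical encoding: atom layer -> attachment layer ->
    motif layer, pooling vectors upward between layers."""
    atom_vecs = message_pass(atom_feats, atom_edges)
    attach_vecs = message_pass(pool_up(atom_vecs, atom_of_attach), attach_edges)
    motif_vecs = message_pass(pool_up(attach_vecs, attach_of_motif), motif_edges)
    return atom_vecs, attach_vecs, motif_vecs
```

Each layer's output feeds both the next layer up and, in the decoder, the prediction step it was designed for (atom vectors for graph prediction, attachment vectors for attachment prediction, motif vectors for motif prediction).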

  31. Hierarchical Graph Decoder (top down)
  ‣ Motif prediction
    - Classification: predict the right motif in the vocabulary
  [Figure: decoder reading motif, attachment, and atom vectors]

  32. Hierarchical Graph Decoder (top down)
  ‣ Motif prediction
    - Classification: predict the right motif in the vocabulary
  ‣ Attachment prediction
    - Classification: predict the right attachment in the vocabulary
  [Figure: decoder over motif, attachment, and atom vectors]
