Improving Molecular Design by Stochastic Iterative Target - PowerPoint PPT Presentation

Improving Molecular Design by Stochastic Iterative Target Augmentation Kevin Yang, Wengong Jin, Kyle Swanson, Regina Barzilay, Tommi Jaakkola

15-Second Overview Data augmentation approach: improve molecular optimization SOTA by > 10% Broadly useful for structured generation tasks, e.g. program synthesis (shown later)

Context: Pharmaceutical Drug Discovery Suppose: have a promising drug candidate, e.g., for COVID-19 Want to make it more potent (higher property score) [Figure: the molecule we have vs. the molecule we want]

Task: Molecular Optimization “Translate” input molecule to a similar molecule with better property score. Dataset: collection of input-target pairs

Why is Molecular Optimization Hard? Real-world ground truth evaluation: lab assay - Slow + expensive! Key Problem: Small Datasets

Stochastic Iterative Target Augmentation Data augmentation meta-algorithm on top of existing model

Results: Molecular Optimization - Over 10% absolute gain over SOTA on both datasets

Results: Program Synthesis

Stochastic Iterative Target Augmentation Data augmentation meta-algorithm on top of existing model - Sample input-output pairs from generator: new “data”, some good, some bad - How to filter for only the good pairs? [Figure: new “data” → ? → filtered good “data” only]

Idea: Filter with Property Predictor Predicting a property is easier than generation! Program synthesis analogue: it is hard to write a program, but easier to run test cases

Stochastic Iterative Target Augmentation Data augmentation meta-algorithm on top of existing model - Sample input-output pairs from generator - Filter with property predictor, add good pairs to training data [Figure: new “data” (some good, some bad) → property predictor → filtered good “data” only]

Stochastic Iterative Target Augmentation Data augmentation meta-algorithm on top of existing model - Sample input-output pairs from generator - Filter with property predictor, add good pairs to training data - Train generator, repeat

Outline Setup + Evaluation Detailed Method More Empirical Analysis Program Synthesis Experiments + Results

Real World Molecular Optimization Real-world ground truth evaluation: lab assay - Slow + expensive! (→ small datasets) - Only use at final test time Use fast + cheap in silico (i.e., computational) predictor for model validation [Figure: Lab Assay → (data used to train) → in silico; labels: test time only, can use anytime]

Evaluation Setup (Lab assay, in silico predictor) become (in silico predictor, proxy predictor) - Just train proxy on property values of molecular optimization training pairs [Figure: Lab Assay → (data used to train) → in silico → (data used to train) → Proxy; labels: test time only, can use anytime]

Metric “Success” if at least 1 of 20 tries passes the ground truth evaluator. Molecular optimization is hard...
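The metric above can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the names `success_rate`, `candidates`, and `passes` are hypothetical, and `passes` stands in for whatever ground truth evaluator is available.

```python
from typing import Callable, Dict, List


def success_rate(
    candidates: Dict[str, List[str]],
    passes: Callable[[str, str], bool],
    max_tries: int = 20,
) -> float:
    """Fraction of inputs for which at least one of the first
    `max_tries` generated outputs passes the evaluator.

    `candidates` maps each input to its generated outputs;
    `passes(x, y)` is a stand-in for the ground truth evaluator.
    """
    if not candidates:
        return 0.0
    successes = sum(
        any(passes(x, y) for y in ys[:max_tries])
        for x, ys in candidates.items()
    )
    return successes / len(candidates)
```

An input counts as a success as soon as a single try passes, so the score rewards finding any correct target rather than getting every sample right.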

Stochastic Iterative Target Augmentation Goal: Somehow Target augmentation: Augment the set of correct targets for a given input.

Stochastic Iterative Target Augmentation 1. Given inputs, sample input-target pairs from current generative model 2. Filter candidate input-output pairs using property predictor Target augmentation: Augment the set of correct targets for a given input.

Stochastic Iterative Target Augmentation 1. Given inputs, sample input-target pairs from current generative model 2. Filter candidate input-output pairs using property predictor 3. Add good pairs to training data, train model, repeat
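The three steps above can be sketched as a short loop. This is a toy illustration under stated assumptions, not the authors' implementation: integers stand in for molecules, and `toy_train` and `toy_is_good` are hypothetical placeholders for a real generative model and property predictor.

```python
import random
from typing import Callable, List, Set, Tuple

Pair = Tuple[int, int]  # (input, target); integers stand in for molecules


def iterative_target_augmentation(
    pairs: List[Pair],
    train: Callable[[List[Pair]], Callable[[int], int]],
    is_good: Callable[[int, int], bool],
    rounds: int = 3,
    samples_per_input: int = 20,
) -> Tuple[Callable[[int], int], List[Pair]]:
    """Sample, filter, add good pairs to the training data, retrain, repeat."""
    data: Set[Pair] = set(pairs)
    inputs = sorted({x for x, _ in pairs})
    generator = train(sorted(data))
    for _ in range(rounds):
        for x in inputs:
            for _ in range(samples_per_input):
                y = generator(x)      # 1. sample a candidate target
                if is_good(x, y):     # 2. filter with the property predictor
                    data.add((x, y))  # 3. keep only the good pairs
        generator = train(sorted(data))  # retrain on the augmented data
    return generator, sorted(data)


def toy_train(data: List[Pair]) -> Callable[[int], int]:
    """Hypothetical 'generator': replays observed offsets plus small noise."""
    offsets = [y - x for x, y in data]
    rng = random.Random(0)

    def generate(x: int) -> int:
        return x + rng.choice(offsets) + rng.choice([-1, 0, 1])

    return generate


def toy_is_good(x: int, y: int) -> bool:
    """Hypothetical predictor: the target is 'similar but better'."""
    return 1 <= y - x <= 3


seed_pairs = [(0, 1), (10, 12)]
model, augmented = iterative_target_augmentation(seed_pairs, toy_train, toy_is_good)
```

Because bad samples are discarded before retraining, the training set only ever grows with pairs the predictor accepts; the quality of the loop therefore depends on the quality of the predictor.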

Results: Molecular Optimization - Over 10% absolute gain over SOTA on both datasets

Observations - View as Stochastic EM - Why iterative? Better generator → easier to find new correct targets - May as well use proxy to filter samples at test time too
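Filtering at test time might look like the following. It is a sketch only: `generate` and `proxy_accepts` are hypothetical callables, and the cap on attempts is an assumption added so the loop always terminates.

```python
from typing import Callable, List, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def filter_at_test_time(
    x: X,
    generate: Callable[[X], Y],
    proxy_accepts: Callable[[X, Y], bool],
    num_outputs: int = 20,
    max_attempts: int = 200,
) -> List[Y]:
    """Keep sampling until `num_outputs` candidates pass the proxy
    predictor or `max_attempts` samples have been drawn."""
    kept: List[Y] = []
    for _ in range(max_attempts):
        if len(kept) >= num_outputs:
            break
        y = generate(x)
        if proxy_accepts(x, y):
            kept.append(y)
    return kept
```

The returned list may be shorter than `num_outputs` when the proxy rejects most samples; how to handle that case is left to the caller.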

Fréchet ChemNet Distance Analysis FCD (embedding distance) is the molecular analogue of the Fréchet Inception Distance (FID) for images. Lower is better.

Improved Diversity Diversity: average distance between different correct outputs for the same input
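The diversity definition above can be made concrete as an average pairwise distance. In this sketch the distance is a Tanimoto (Jaccard) distance over sets of features; that choice, and the names `tanimoto_distance` and `diversity`, are assumptions for illustration, since the slide does not name the distance used.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence


def tanimoto_distance(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """1 - |a ∩ b| / |a ∪ b| over two feature sets (assumed stand-in
    for molecular fingerprints)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def diversity(
    outputs: Sequence[FrozenSet[int]],
    distance: Callable[[FrozenSet[int], FrozenSet[int]], float] = tanimoto_distance,
) -> float:
    """Average pairwise distance between the correct outputs
    generated for one input; 0.0 if there are fewer than two."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 0.0
    return sum(distance(a, b) for a, b in pairs) / len(pairs)
```

A dataset-level score would then average `diversity` over all inputs that have at least two correct outputs.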

Robustness to Predictor Quality Far left point is oracle (ground truth); second-from left is learned proxy predictor. Blue line indicates baseline performance.

Program Synthesis Task: Karel Dataset Inputs: Test Cases Outputs: Programs Evaluate correctness using held-out test cases
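For program synthesis, the filter is simply running the candidate on test cases. The sketch below uses plain Python callables as stand-ins for Karel programs, which is an assumption made for illustration; `passes_all` is a hypothetical helper name.

```python
from typing import Any, Callable, Sequence, Tuple

TestCase = Tuple[Tuple[Any, ...], Any]  # (arguments, expected output)


def passes_all(program: Callable[..., Any], test_cases: Sequence[TestCase]) -> bool:
    """Return True only if the candidate program produces the expected
    output on every test case; a crash counts as a failure."""
    for args, expected in test_cases:
        try:
            if program(*args) != expected:
                return False
        except Exception:
            return False
    return True
```

Candidates that pass the given test cases can be added as new targets, while correctness is still judged on the held-out test cases.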

Program Synthesis Target Augmentation

Results: Program Synthesis

Summary Data augmentation meta-algorithm for improving performance on structured generation tasks Significantly improves over SOTA in molecular optimization: > 10% Applicable to other domains: program synthesis Thanks for Watching!
