Conditioning by adaptive sampling for robust design

David Brookes, Biophysics Graduate Group, University of California, Berkeley
Jennifer Listgarten, EECS and Center for Computational Biology, University of California, Berkeley

Motivating problem: design protein sequences
• Proteins are made up of sequences of amino acids (20 possibilities)
• Huge variety of proteins whose function we would like to improve
Examples: proteins that fluoresce; that fixate carbon in the atmosphere; that act as drugs; that deliver gene-editing tools to tissues

How to map sequence to function?
A law of molecular biology: Sequence → Structure → Function (ex: fluorescence)
Hughes A, Mort M, Carlisle F, et al. B04 Alternative Splicing In Htt. Journal of Neurology, Neurosurgery & Psychiatry 2014; 85: A10.
http://www.rcsb.org/structure/6FWW

Bypassing the structure relationships
A law of molecular biology: Sequence → Structure → Function
High-throughput experiments (& ML): Sequence → Function

Can we solve the inverse problem?
A law of molecular biology: Sequence → Structure → Function
Design problem: Given a model, find sequences with desired function

Why is protein design difficult?
• Huge, rugged search space ⟹ size scales as $20^L$ for sequences of length $L$
• Discrete search space (no gradients)
• Uncertainty in predictor (figure: https://livingthing.danmackinlay.name/gaussian_processes.html)
[Figure: size of the search space compared with the number of atoms in the universe and grains of sand on earth]
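To make the scale concrete, the number of sequences of a given length over 20 amino acids can be compared with the two reference quantities named in the figure. The numeric values used for atoms in the universe and grains of sand are rough, commonly quoted order-of-magnitude estimates, assumed here for illustration only; they are not taken from the slides.

```python
# Count amino-acid sequences of a given length and compare with two
# reference scales. Both reference values are assumed rough estimates.
NUM_AMINO_ACIDS = 20
ATOMS_IN_UNIVERSE = 10**80       # assumed order-of-magnitude estimate
GRAINS_OF_SAND = 75 * 10**17     # assumed estimate, about 7.5e18


def search_space_size(length: int) -> int:
    """Number of distinct amino-acid sequences of the given length."""
    return NUM_AMINO_ACIDS ** length


def shortest_length_exceeding(bound: int) -> int:
    """Smallest sequence length whose search space is larger than `bound`."""
    length = 0
    while search_space_size(length) <= bound:
        length += 1
    return length


sand_length = shortest_length_exceeding(GRAINS_OF_SAND)
atoms_length = shortest_length_exceeding(ATOMS_IN_UNIVERSE)
print(sand_length, atoms_length)
```

Under these assumed values the crossover lengths are 15 and 62 residues, so even short proteins already sit in a space too large to enumerate.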

Possible solution: model-based optimization (MBO)
Idea: replace the standard (hard) objective, e.g. over the space of sequences, with a potentially easier one over a model on sequence space
Solution approach is to iterate:
1. Sample from “search model” $p(x \mid \theta)$
2. Evaluate samples on $f(x)$
3. Adjust $\theta$ so the model favors sequences with large function evaluations
✓ Model can sample broad areas of sequence space
✓ Does not require gradients of $f$
✓ Can incorporate uncertainty
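The three-step iteration above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes an independent-site categorical search model, a toy objective `f` that counts matches to a hypothetical `TARGET`, and a cross-entropy-method style update (refit the model to the top-scoring samples) as one possible way to adjust the model parameters.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 amino acids
TARGET = "MKVLA"                   # hypothetical optimum, for illustration only


def f(x: str) -> int:
    """Toy black-box objective: number of positions matching TARGET."""
    return sum(a == b for a, b in zip(x, TARGET))


def sample(theta, rng):
    """Draw one sequence from the search model (independent sites)."""
    return "".join(rng.choices(ALPHABET, weights=site)[0] for site in theta)


def refit(elite, pseudocount=0.1):
    """Re-estimate per-site letter probabilities from the elite samples."""
    theta = []
    for pos in range(len(TARGET)):
        counts = [pseudocount + sum(x[pos] == a for x in elite) for a in ALPHABET]
        total = sum(counts)
        theta.append([c / total for c in counts])
    return theta


def mbo(iterations=30, n_samples=200, elite_frac=0.2, seed=0):
    rng = random.Random(seed)
    # Start from a uniform distribution at every site.
    theta = [[1 / len(ALPHABET)] * len(ALPHABET) for _ in TARGET]
    for _ in range(iterations):
        xs = [sample(theta, rng) for _ in range(n_samples)]  # 1. sample
        xs.sort(key=f, reverse=True)                         # 2. evaluate
        theta = refit(xs[: int(elite_frac * n_samples)])     # 3. adjust theta
    return theta


theta = mbo()
# Most probable letter at each site under the final search model.
best = "".join(ALPHABET[max(range(20), key=site.__getitem__)] for site in theta)
print(best, f(best))
```

No gradient of `f` is used anywhere; the objective is only ever evaluated on sampled sequences.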

First attempt at MBO for protein design: Design by Adaptive Sampling (DbAS)
Our aim is to solve the MBO objective:
where
• $p(x \mid \theta)$ is the search model (VAE, HMM…)
• $S$ is the desired set of property values, e.g. fluorescence > $\alpha$
• $P(S \mid x)$ is a stochastic predictive model (“oracle”) that maps sequences to property
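The objective itself was an image on this slide and is not present in the text. A form consistent with the three definitions above is sketched below, writing the search model as $p(x \mid \theta)$, the desired set as $S$, and the oracle as $P(S \mid x)$. The logarithm is an assumption; it does not change the maximizer.

```latex
\hat{\theta}
  = \arg\max_{\theta} \log P(S \mid \theta)
  = \arg\max_{\theta} \log \mathbb{E}_{p(x \mid \theta)}\!\left[ P(S \mid x) \right],
\qquad
P(S \mid \theta) = \sum_{x} P(S \mid x)\, p(x \mid \theta).
```

Read this way, the search model is tuned so that sequences drawn from it are as likely as possible, according to the oracle, to have the desired property.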

Design by Adaptive Sampling (cont.)
Two issues:
1. $\theta$ is in the expectation distribution. ⟹ maximize a lower bound (≥)
2. MC estimates for rare events. ⟹ anneal a sequence of relaxations: $S^{(t)} \to S$, where $S^{(t)} \supset S^{(t+1)}$
Result: Anneal and MC
Assumes oracle is unbiased and has good uncertainty estimates
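A sketch of how the two fixes might fit together: each iteration draws Monte Carlo samples from the current search model, relaxes the desired set to a threshold reachable by those samples, and refits the model by weighted maximum likelihood, which is what maximizing a Monte Carlo estimate of such a lower bound typically reduces to. The oracle `oracle_mean`, its constant `ORACLE_STD`, the quantile schedule, and the independent-site search model are all assumptions made for illustration.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
LENGTH = 6
ORACLE_STD = 0.5  # assumed constant predictive standard deviation


def oracle_mean(x: str) -> float:
    """Hypothetical oracle mean: counts alanines (stand-in for a trained model)."""
    return float(x.count("A"))


def prob_in_set(x: str, gamma: float) -> float:
    """P(S | x) for S = {y >= gamma} under a Gaussian oracle."""
    z = (gamma - oracle_mean(x)) / ORACLE_STD
    return 0.5 * math.erfc(z / math.sqrt(2))


def weighted_mle(xs, weights, pseudocount=1e-3):
    """Maximize sum_i w_i * log p(x_i | theta) for an independent-site model."""
    theta = []
    for pos in range(LENGTH):
        counts = [pseudocount + sum(w for x, w in zip(xs, weights) if x[pos] == a)
                  for a in ALPHABET]
        total = sum(counts)
        theta.append([c / total for c in counts])
    return theta


def dbas(iterations=20, n_samples=300, quantile=0.8, seed=0):
    rng = random.Random(seed)
    theta = [[1 / 20] * 20 for _ in range(LENGTH)]
    gamma = -math.inf
    for _ in range(iterations):
        xs = ["".join(rng.choices(ALPHABET, weights=s)[0] for s in theta)
              for _ in range(n_samples)]
        means = sorted(oracle_mean(x) for x in xs)
        # Anneal: tighten the relaxed set to the current sample quantile,
        # never loosening it again.
        gamma = max(gamma, means[int(quantile * n_samples)])
        weights = [prob_in_set(x, gamma) for x in xs]
        theta = weighted_mle(xs, weights)
    return theta, gamma


theta, gamma = dbas()
p_alanine = [site[ALPHABET.index("A")] for site in theta]
print(gamma, [round(p, 2) for p in p_alanine])
```

Because the weights come straight from the oracle, the search follows the oracle wherever it points, which is exactly the assumption flagged on the slide.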

How pathological oracles lead you astray
[Figure panels: “Acceptable” (many training examples) and “Pathological” (fewer training examples)]
Idea: estimate training distribution of x conditioned on high values of oracle

Fixing pathological oracles w/ conditioning
Idea: estimate training distribution of x conditioned on high values of oracle
Don’t have access to training distribution, but can build a model $p(x \mid \theta^{(0)})$ to approximate it
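Conditioning the modeled training distribution on the desired property can be written with Bayes' rule. The notation below is an assumed reading of the slide's symbols: $p(x \mid \theta^{(0)})$ for the model of the training distribution, $S$ for the desired set of property values, and $P(S \mid x)$ for the oracle.

```latex
p(x \mid S, \theta^{(0)})
  = \frac{P(S \mid x)\, p(x \mid \theta^{(0)})}{P(S \mid \theta^{(0)})},
\qquad
P(S \mid \theta^{(0)}) = \sum_{x} P(S \mid x)\, p(x \mid \theta^{(0)}).
```

In this form, a sequence far from the training data has small $p(x \mid \theta^{(0)})$, so an overconfident oracle prediction there contributes little to the conditional distribution.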

Conditioning by Adaptive Sampling (CbAS)
Previous formulation: maximize a lower bound, then anneal and MC
New formulation: $p(x \mid \theta^{(0)})$ models the training distribution
• Can’t anneal when sampling dist. doesn’t change!
• Rewrite with an importance sampling proposal dist., then anneal and MC
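One way the importance-sampling step might look in code: samples are still drawn from the current search model (the proposal), but each is reweighted by the ratio of its probability under the fixed training-distribution model to its probability under the proposal, times the oracle's probability of the relaxed set. The sketch below assumes an independent-site categorical model for both distributions, a hypothetical Gaussian oracle, and a quantile-based annealing schedule; `THETA0` is an invented prior that rarely places alanine at the last three sites.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
LENGTH = 6
ORACLE_STD = 0.5  # assumed constant predictive standard deviation


def oracle_mean(x):
    """Hypothetical oracle mean: counts alanines (stand-in for a trained model)."""
    return float(x.count("A"))


def prob_in_set(x, gamma):
    """P(S | x) for S = {y >= gamma} under a Gaussian oracle."""
    return 0.5 * math.erfc((gamma - oracle_mean(x)) / (ORACLE_STD * math.sqrt(2)))


def log_prob(x, theta):
    """log p(x | theta) for an independent-site categorical model."""
    return sum(math.log(site[ALPHABET.index(a)]) for a, site in zip(x, theta))


def weighted_mle(xs, weights, pseudocount=1e-3):
    """Maximize sum_i w_i * log p(x_i | theta) for an independent-site model."""
    theta = []
    for pos in range(LENGTH):
        counts = [pseudocount + sum(w for x, w in zip(xs, weights) if x[pos] == a)
                  for a in ALPHABET]
        total = sum(counts)
        theta.append([c / total for c in counts])
    return theta


def site(p_a):
    """Per-site distribution with probability p_a on alanine, uniform elsewhere."""
    return [p_a if a == "A" else (1 - p_a) / 19 for a in ALPHABET]


# Invented model of the training distribution: alanine is plausible at the
# first three sites and almost never seen at the last three.
THETA0 = [site(0.3)] * 3 + [site(0.001)] * 3


def cbas(theta0, iterations=20, n_samples=300, quantile=0.8, seed=0):
    rng = random.Random(seed)
    theta, gamma = theta0, -math.inf
    for _ in range(iterations):
        xs = ["".join(rng.choices(ALPHABET, weights=s)[0] for s in theta)
              for _ in range(n_samples)]
        means = sorted(oracle_mean(x) for x in xs)
        gamma = max(gamma, means[int(quantile * n_samples)])  # anneal
        # Importance weight p(x | theta0) / p(x | theta) times P(S | x).
        weights = [math.exp(log_prob(x, theta0) - log_prob(x, theta))
                   * prob_in_set(x, gamma) for x in xs]
        theta = weighted_mle(xs, weights)
    return theta, gamma


theta, gamma = cbas(THETA0)
p_alanine = [s[ALPHABET.index("A")] for s in theta]
print(gamma, [round(p, 3) for p in p_alanine])
```

In this toy setup the oracle rewards alanine everywhere, yet the importance weights should keep the search model from concentrating on alanine at sites where the training-distribution model says it is implausible.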

Testing is fundamentally different
• We don’t trust our oracle and generally can’t query the ground truth
• We can’t hold out a test set of good sequences
   • Near-zero chance of any of these sequences being found by the method
• We can’t use some canonical test function as the oracle
   • In our problem it is untrustworthy

Testing strategy
• Simulate a ground truth based on real data
   → “Ground truth” is a GP mean function
• Ground truth values are sampled from the GP for given sequences
• Use these input-output pairs to train oracles
• Coerce training set so these oracles exhibit pathologies
[Figure: ground truth GP → training data → oracles]
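The pipeline above can be illustrated on a toy problem. Everything concrete here is an assumption: one-dimensional inputs stand in for sequences, the "real data" are points on a sine curve, the GP uses an RBF kernel with hand-picked hyperparameters, values are sampled independently from the GP's pointwise predictive distribution, the oracle is a least-squares line, and the training set is coerced simply by restricting inputs to a narrow interval.

```python
import math
import random


def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


# Hypothetical "real data": 1-D inputs stand in for sequences.
X_REAL = [i / 10 for i in range(11)]
Y_REAL = [math.sin(2 * math.pi * x) for x in X_REAL]
NOISE_VAR = 0.01

K = [[rbf(a, b) + (NOISE_VAR if i == j else 0.0) for j, b in enumerate(X_REAL)]
     for i, a in enumerate(X_REAL)]
ALPHA = solve(K, Y_REAL)


def ground_truth(x):
    """GP posterior mean, used as the simulated ground truth."""
    return sum(a * rbf(x, xi) for a, xi in zip(ALPHA, X_REAL))


def predictive_std(x):
    """Pointwise GP predictive standard deviation (including noise)."""
    k_x = [rbf(x, xi) for xi in X_REAL]
    v = solve(K, k_x)
    var = rbf(x, x) + NOISE_VAR - sum(ki * vi for ki, vi in zip(k_x, v))
    return math.sqrt(max(var, 0.0))


def fit_linear_oracle(xs, ys):
    """Least-squares line, a deliberately simple stand-in for an oracle."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


rng = random.Random(0)
# Coerce the training inputs into a narrow region so the oracle must extrapolate.
train_x = [rng.uniform(0.0, 0.25) for _ in range(50)]
train_y = [rng.gauss(ground_truth(x), predictive_std(x)) for x in train_x]
oracle = fit_linear_oracle(train_x, train_y)


def mean_abs_error(lo, hi, n=50):
    """Mean absolute gap between oracle and ground truth on a grid."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return sum(abs(oracle(x) - ground_truth(x)) for x in grid) / n


err_inside = mean_abs_error(0.0, 0.25)
err_outside = mean_abs_error(0.5, 1.0)
print(round(err_inside, 3), round(err_outside, 3))
```

Because the ground truth is known by construction, the oracle's error can be measured both inside and outside its training region, which is not possible with a real, untrusted oracle.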

Results
