SLIDE 1
Donovan N. Chin & R. Aldrin Denny
SLIDE 2 Traditional Drug Discovery (insert graph) In Silico Prediction of ADME (insert graph)
- Potency
- Absorption
- Lead
- Drug
- Toxicity
- Excretion
- Metabolism
- Distribution
SLIDE 3
Target IVY (brute-force virtual screening of very large compound libraries) Lead Discovery IVY (utilize predictive models from Biogen data for more efficient virtual screening) Lead Optimization Candidate
SLIDE 4 (insert graph)
- Potency
- Lead
- Drug
- Toxicity
- Excretion
- Metabolism
- Distribution
- Absorption
SLIDE 5
Goal: Identify the crystallographic binding mode and rank-order ligands by their binding to the protein
(insert graph) Receptor Docking Ligand Shape
Generate plausible trial binding modes using a docking function, then re-rank the modes with a scoring function
SLIDE 6
(insert graph) 341 Active 47 Non-Active
SLIDE 7
(insert graph) After filtering by Pharmacophore Feature
SLIDE 8
(insert graph)
SLIDE 9 (insert functions for)
- F_Score*
- D_Score
- G_Score
- PMF_Score
- Chem_Score
- ICM_Score*
SLIDE 10 Cell Adhesion Assay (50% Serum)
Biochemical Adhesion Assay
Scoring Functions Are Poor More Often Than Not
SLIDE 11
Receptor Site View Library Design FlexX Score
Consensus Score >= 3
e.g. Contact Map, CLogP, MW, HBOND, Rotatable bonds
Consensus = 5? If yes, substructure exists? If yes, Pharmacophore < 4.2 Å? If yes, Publish Hit Report
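The yes/no chain at the end of this workflow can be sketched as a simple filter. This is only an illustration: the `Candidate` class, its field names, and the reading of the pharmacophore criterion as a single distance in Å are assumptions, not taken from the slide.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one docked compound (all field names are assumptions)."""
    name: str
    consensus: int               # consensus score, assumed to range from 0 to 5
    has_substructure: bool       # whether the required substructure is present
    pharmacophore_dist: float    # assumed pharmacophore distance, in angstroms


def passes_cascade(c: Candidate, required_consensus: int = 5,
                   max_dist: float = 4.2) -> bool:
    """Apply the three yes/no checks in order; a compound is reported only if all pass."""
    if c.consensus != required_consensus:
        return False
    if not c.has_substructure:
        return False
    return c.pharmacophore_dist < max_dist


# Made-up compounds, purely for illustration.
candidates = [
    Candidate("cmpd_a", 5, True, 3.9),
    Candidate("cmpd_b", 4, True, 3.9),
    Candidate("cmpd_c", 5, True, 4.5),
]
hit_report = [c.name for c in candidates if passes_cascade(c)]
```

Ordering the checks from cheapest to most expensive keeps such a cascade fast on large libraries, although the slide does not say how each check is computed.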
SLIDE 12
(insert graph)
SLIDE 13 Goal: Predict hit/miss class based on presence of features (fingerprints)
Method
- Given a set of N samples
- Given that some subset A of them are good ("active")
Then we estimate for a new compound: P(good) ~ A/N
- Given a set of binary features F
For a given feature F:
It appears in N_F samples
It appears in A_F good samples
Can we estimate: P(good | F) ~ A_F/N_F
(Problem: Error gets worse as N_F becomes small)
- P'(good | F) = (A_F + P(good)·K)/(N_F + K)
P'(good | F) → P(good) as N_F → 0
P'(good | F) → A_F/N_F as N_F becomes large
- (If K = 1/P(good), this is the Laplacian correction)
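The corrected estimate can be checked numerically with a minimal sketch. The function and argument names are illustrative: `a_f` and `n_f` stand for the number of good samples and the total number of samples that contain the feature, and `p_good` is the baseline A/N.

```python
def laplacian_corrected(a_f, n_f, p_good, k=None):
    """Corrected estimate of P(good | F) = (a_f + p_good * k) / (n_f + k).

    a_f    : number of good samples containing the feature
    n_f    : number of all samples containing the feature
    p_good : baseline probability of a good sample (A / N)
    k      : weight of the prior; defaults to 1 / p_good (the Laplacian correction)
    """
    if k is None:
        k = 1.0 / p_good
    return (a_f + p_good * k) / (n_f + k)


p_good = 0.1

# Feature never seen: the estimate falls back to the baseline P(good).
unseen = laplacian_corrected(0, 0, p_good)

# Feature seen once, in a good sample: the raw ratio would be 1.0,
# while the corrected value is 2/11, much closer to the baseline.
rare = laplacian_corrected(1, 1, p_good)

# Feature seen often: the estimate approaches the raw ratio 900/1000.
common = laplacian_corrected(900, 1000, p_good)
```

The three cases correspond to the two limits stated on the slide plus the small-sample situation the correction is meant to handle.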
Descriptors (insert) Advantages
- Can describe a huge number of features (up to 4 billion; MDL 1024; Leadscope 27,000)
- Contains tertiary and stereochemistry information
- Fast
SLIDE 14 Classification Analysis
- Developing Non-Linear Scoring Functions to classify actives and non-actives
- (insert graphs)
- Cost Function to Minimize: Gini Impurity i(N) = 1 - Σ_j P^2(ω_j), where P(ω_j) is the fraction of samples at node N that belong to class ω_j
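As a small illustration of the Gini impurity cost, the sketch below computes it for a list of class labels and for a candidate two-way split. The helper names and the size-weighted average used to score a split are the usual textbook convention for classification trees, assumed here rather than taken from the slide.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_j P(class_j)^2 for the samples at one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(left, right):
    """Size-weighted average impurity of the two child nodes of a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# An evenly mixed node has impurity 0.5; a split that separates the classes has impurity 0.
mixed = gini_impurity(["active", "active", "inactive", "inactive"])
perfect = split_impurity(["active", "active"], ["inactive", "inactive"])
```

A tree builder would evaluate `split_impurity` for each candidate descriptor threshold and keep the split with the lowest value.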
SLIDE 15
Training Set Prediction Success (insert table)
10-fold cross validation
Randomly split training and test sets
Significant Improvement in Separating Actives from Non-Actives
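For reference, a random 10-fold split can be sketched as follows. This is a generic illustration of k-fold cross validation, not the tool used for the results on this slide; the function name, the seed, and the sample count are arbitrary.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation.

    The indices are shuffled once, dealt into k folds, and each fold
    serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example with an arbitrary sample count: every sample is tested exactly once.
splits = list(k_fold_indices(25, k=10))
tested = sorted(idx for _, test in splits for idx in test)
```

In each round the model is fitted on `train` and evaluated on `test`, and the ten results are averaged.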
SLIDE 16
(insert graph) Significant Improvement in Finding Hits Using New SF
SLIDE 17
Optimal tree identified (insert graph) No random effects (insert graph)
SLIDE 18
(insert cluster) Able to identify different molecular property criteria that lead to hits
SLIDE 19
(insert graph)
SLIDE 20
(insert graph) Size = magnitude of OBA; OBA values cover the range of descriptor space
SLIDE 21
(insert graph) Choose 1D & 2D descriptors for ease of interpretation and lower "noise"
SLIDE 22
Build Model (insert graphs) Apply Model
SLIDE 23
Features found in high OBA
Features found in low OBA
Would be nice if CART offered a similar view
SLIDE 24
Improved scoring functions for separating hits from non-hits in structure-based drug design developed with CART and Bayesian models
Identified key differences in molecular physical properties that led to hits
Built reasonably predictive OBA model (cannot expect method to extend to other systems given complexity of OBA, however)
SLIDE 25 Biogen IDEC Modeling
- Rajiah Denny
- Claudio Chuaqui
- Juswinder Singh
- Herman van Vlijmen
- Norman Wang
- Anuj Patel
- Zhan Deng
Chemistry
- Kevin Guckian
- Dan Scott
- Thomas Durand-Reville
- Pat Conlon
- Charlie Hammond
- Chuck Jewell
Pharmacology