  1. Bayesian matrix factorization for drug-target activity prediction
     Yves Moreau, University of Leuven – ESAT-STADIUS, SymBioSys Center for Computational Biology

  2. [Chart: number of new drugs per billion US$ of R&D spending, on a log scale from 0.1 to 100, over 1950–2010. Scannell et al. 2012]

  3. The curse of attrition …
     • Phase success rates (Hay et al. 2014): Phase 1 to Phase 2: 64%; Phase 2 to Phase 3: 32%; Phase 3 to NDA/BLA: 60%; NDA/BLA to approval: 83%

  4. … mainly due to safety and efficacy issues
     • [Chart: causes of failure between Phase 2 and submission in 2011 and 2012 – efficacy, safety, other. Arrowsmith & Miller 2013]

  5. Chemoinformatics?
     • Goal: estimate the interaction between compounds (e.g., Viagra) and protein targets (e.g., the enzyme ACE2)
     • Activity measured by high-throughput screening
     • Activity depends on the match between the shape of the compound and the shape of the protein
     • 3D modeling is challenging

  6. Drug–target activities
     • IC50 – concentration of compound needed for half-maximal inhibition
     • pIC50 = -log10(IC50)
     • EC50 – concentration of compound needed for half-maximal effect
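     As a worked example (not on the original slide): the 200 nM activity threshold used later in the deck corresponds to pIC50 = -log10(2 × 10^-7 M) ≈ 6.7.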

  7. High-throughput screening
     • Hit discovery in early drug discovery
     • Identify compounds active against a protein drug target of interest
     • Activity measured by high-throughput screening
     • Activity = “scarce” data: an IC50 matrix of millions of compounds by thousands of targets with a 1–2% fill rate

  8. Molecular fingerprints
     • High-dimensional fingerprints of 2D compound structures
     • Sparse vectors
     • Key-based fingerprints (FP2 & MACCS): a bit string represents the presence or absence of particular substructures
     • Circular fingerprints (MNA & MPD & ECFP): each fingerprint represents a central atom and its neighbors
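     A minimal sketch (not from the slides) of computing a circular, ECFP-like fingerprint with RDKit; the molecule, radius, and bit length are illustrative choices.

```python
# Sketch: Morgan (ECFP-like) circular fingerprint as a sparse 0/1 bit vector.
# The SMILES string, radius and bit length are illustrative, not from the deck.
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")    # aspirin, as an example
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)  # radius 2 ~ ECFP4

arr = np.zeros((2048,), dtype=np.int8)
DataStructs.ConvertToNumpyArray(fp, arr)              # dense 0/1 feature vector
print(int(arr.sum()), "bits set out of", arr.size)    # typically only a few dozen
```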

  9. Quantitative Structure–Activity Relationship (QSAR)
     • Finds the optimal model α based on predictive features
     • IC50(x) = α1·x1 + α2·x2 + … + αF·xF
     • Minimize an error loss
     • PLS, ridge regression
     • Good performance if there are enough training examples
     • Does not share information across tasks!
     • [Schematic: compound fingerprint bit strings alongside the compound × target IC50 matrix]
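     A minimal per-target QSAR sketch with ridge regression in scikit-learn; the toy fingerprints and activities stand in for real data and are not from the slides.

```python
# Sketch: single-target QSAR by ridge regression on fingerprint bits.
# X (compounds x bits) and y (pIC50 for one protein target) are toy data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 1024)).astype(float)             # toy fingerprints
y = X @ rng.normal(scale=0.05, size=1024) + rng.normal(size=500)   # toy activities

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = Ridge(alpha=10.0).fit(X_tr, y_tr)     # one such model per protein target
print("held-out R^2:", round(model.score(X_te, y_te), 3))
```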

  10. Multitask learning
     • From fingerprints and available activities, predict missing activities
     • Data: ~3M compounds × ~1,500 targets, IC50 with a 1–2% fill rate, plus compound features (6K–4M fingerprint bits)
     • Approaches: 1. supervised learning per target (QSAR); 2. matrix factorization (Netflix style); 3. matrix factorization + supervised learning (Macau)

  11. The Netflix Challenge
     • Goal: predict user movie ratings
     • 480K users, 18K movies
     • 100 million ratings, ~1% fill rate
     • ⇒ predict the 99% of entries that are missing
     • How can this work?
     • [Schematic: sparsely filled users × movies rating matrix]

  12. Factor analysis
     • Low-rank approximation of the full matrix: Y ≈ U · V, with loadings U and factors V

  13. Factor analysis
     • Row-wise view: Y_i· ≈ U_i· · V (row i of Y is its loading vector U_i· times the factor matrix V)

  14. Factor analysis
     • Each individual response (= row of Y) is modeled as an individual mixture (= loadings) of a small number of latent responses (= factors)

  15. Alternating Least Squares
     • [Schematic: row Y_i· ≈ U_i· · V, with the loadings U_i· unknown]

  16. Alternating Least Squares
     • If V were known, U could be found by linear regression

  17. Alternating Least Squares
     • If U were known, V could also be found by linear regression

  18. Scarce matrix factorization
     • Only the observed values are used in the regressions
     • Loss: min over U, V of || W ∘ (Y − U·V) ||², where W is the binary mask of observed entries and ∘ is the entry-wise product (see the sketch below)
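     A minimal numpy sketch of this masked alternating-least-squares scheme; the rank, regularization, and iteration count are illustrative choices, not values from the slides.

```python
# Sketch: masked Alternating Least Squares for Y ≈ U @ V.
# W is the binary mask of observed entries; rank, lam and iters are
# illustrative choices, not values from the slides.
import numpy as np

def masked_als(Y, W, rank=8, lam=0.1, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(rank, m))
    I = lam * np.eye(rank)
    for _ in range(iters):
        # Fix V: each loading row U[i] is a ridge regression on the entries
        # observed in row i
        for i in range(n):
            obs = W[i] > 0
            Vo = V[:, obs]
            U[i] = np.linalg.solve(Vo @ Vo.T + I, Vo @ Y[i, obs])
        # Fix U: each factor column V[:, j] is a ridge regression on the
        # entries observed in column j
        for j in range(m):
            obs = W[:, j] > 0
            Uo = U[obs]
            V[:, j] = np.linalg.solve(Uo.T @ Uo + I, Uo.T @ Y[obs, j])
    return U, V

# Once U and V are fitted, missing entries are predicted by Y_hat = U @ V.
```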

  19. Scarce matrix factorization
     • Once the loadings and factors (U*, V*) are obtained, the other entries can be predicted: Ŷ = U* · V*

  20. Uncertainty
     • Given scarce data, is a single solution (U*, V*) meaningful?

  21. Bayesian modeling
     • Given the uncertainty that comes with scarce data, Bayesian inference is desirable
     • Instead of the point estimate (U*, V*) = argmin over U, V of || W ∘ (Y − U·V) ||², we want to consider the Bayesian posterior distribution p(U, V | Y)
     • The posterior predictive distribution p(Ŷ | Y) is more informative than any single optimal estimator

  22. Ordinary least squares
     • ALS involves successive regressions, each solved by OLS

  23. Ordinary least squares
     • Model: y = X'β + ε
     • Solution: β = (XX')⁻¹ X y
     • Setup = transposed version of the previous notation
     • If the noise is Gaussian, then the OLS solution is the maximum likelihood estimate

  24. Block Gibbs sampler
     • The Gibbs sampler is a Markov Chain Monte Carlo (MCMC) method
     • MCMC generates samples from complex posterior distributions of model parameters by iteratively sampling from simpler conditional distributions
     • The following scheme is a block Gibbs sampler: U^(i+1) ~ p(U | V^(i), Y), then V^(i+1) ~ p(V | U^(i+1), Y)
     • Under mild ergodicity conditions, after burn-in the (dependent) samples are drawn from the joint posterior: for i sufficiently large, (U^(i), V^(i)) ~ p(U, V | Y)
     • Similar in structure to alternating least squares, but it explores the full posterior instead of converging to a single optimum
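     A simplified, runnable sketch of this two-block scheme with fixed Gaussian priors and fixed noise precision (the full BPMF sampler on later slides also samples the hyperparameters); all parameter values are illustrative.

```python
# Sketch: block Gibbs sampler for masked matrix factorization with fixed
# priors N(0, (1/lam) I) on the latent vectors and fixed noise precision alpha.
# It illustrates the alternating U | V, Y and V | U, Y blocks only.
import numpy as np

def gibbs_mf(Y, W, rank=8, alpha=2.0, lam=1.0, n_samples=200, burn_in=100, seed=0):
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(rank, m))
    draws = []
    for it in range(burn_in + n_samples):
        # Block 1: sample every row U[i] from its Gaussian conditional p(U | V, Y)
        for i in range(n):
            obs = W[i] > 0
            Vo = V[:, obs]
            cov = np.linalg.inv(lam * np.eye(rank) + alpha * (Vo @ Vo.T))
            U[i] = rng.multivariate_normal(cov @ (alpha * Vo @ Y[i, obs]), cov)
        # Block 2: sample every column V[:, j] from p(V | U, Y)
        for j in range(m):
            obs = W[:, j] > 0
            Uo = U[obs]
            cov = np.linalg.inv(lam * np.eye(rank) + alpha * (Uo.T @ Uo))
            V[:, j] = rng.multivariate_normal(cov @ (alpha * Uo.T @ Y[obs, j]), cov)
        if it >= burn_in:
            draws.append(U @ V)        # one posterior sample of the full matrix
    return np.array(draws)             # shape: (n_samples, n, m)
```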

  25. Markov Chain Monte Carlo
     • We do not get the posterior distribution analytically, only samples from it
     • The samples are sufficient to characterize the posterior distribution
     • e.g., average the sampled solutions to get the posterior mean estimate
     • e.g., use the marginal variance of individual predictions to characterize their uncertainty
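     A small sketch of how such samples would be summarized, assuming `draws` is the array of posterior draws of the reconstructed matrix (e.g., as returned by the `gibbs_mf` sketch above):

```python
# Posterior mean prediction and per-entry uncertainty from MCMC draws.
import numpy as np

def summarize(draws):
    post_mean = draws.mean(axis=0)   # point estimate for every matrix entry
    post_std = draws.std(axis=0)     # predictive uncertainty for every entry
    return post_mean, post_std
```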

  26. Bayesian linear regression
     • The distribution of β as a function of the data X and y can be modeled as a multivariate Gaussian distribution over β
     • Model: y = X'β + ε, with ε ~ N(0, σ²I)
     • Assume a Gaussian prior β ~ N(μ₀, σ²Λ₀⁻¹) for β and an inverse-gamma prior for the noise variance ρ

  27. Bayesian linear regression
     • Then, by application of Bayes’ rule, the posterior distribution of β is also Gaussian: β | X, y ~ N(μ_n, σ²Λ_n⁻¹), with Λ_n = Λ₀ + XX' and μ_n = Λ_n⁻¹(Λ₀μ₀ + Xy)
     • If Λ₀ = 0 and μ₀ = 0, then the solution for μ_n is identical to OLS!
     • The posterior mean μ_n is similar to the ridge regression solution
     • The precision matrix Λ_n characterizes the variance of the solution

  28. GAMBLR trick
     • Executing the Gibbs sampler requires sampling repeatedly from posterior Gaussian distributions (which change every time U and V change)
     • Sampling from a multivariate Gaussian: take ε ~ N(0, I); if A is such that Σ = AA', then z = μ + Aε ~ N(μ, Σ)
     • For Bayesian linear regression, fold the prior into the data: with Λ₀ = L₀L₀', augment X ← [X  L₀] and y ← [y ; L₀'μ₀], so that μ_n = (XX')⁻¹Xy and Λ_n = XX'
     • It can be shown that z = (XX')⁻¹X(y + σ·ε) ~ N(μ_n, σ²Λ_n⁻¹)
     • This has the same form as OLS!
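     A runnable numpy sketch of this noise-injection idea for a single Bayesian regression; the dimensions, prior, and noise level are illustrative choices, not values from the slides.

```python
# Sketch: draw posterior samples of beta by solving an ordinary regression on
# prior-augmented data with Gaussian noise injected into the targets.
import numpy as np

rng = np.random.default_rng(0)
d, n, sigma = 5, 50, 0.3
X = rng.normal(size=(d, n))                  # transposed notation: features x samples
beta_true = rng.normal(size=d)
y = X.T @ beta_true + sigma * rng.normal(size=n)

Lambda0 = np.eye(d)                          # prior precision (illustrative)
L0 = np.linalg.cholesky(Lambda0)
mu0 = np.zeros(d)

X_aug = np.hstack([X, L0])                   # the prior acts as d extra observations
y_aug = np.concatenate([y, L0.T @ mu0])

def sample_beta():
    eps = rng.normal(size=n + d)             # fresh standard-normal noise
    # one noisy regression solve = one draw from N(mu_n, sigma^2 * inv(Lambda_n))
    return np.linalg.solve(X_aug @ X_aug.T, X_aug @ (y_aug + sigma * eps))

samples = np.array([sample_beta() for _ in range(2000)])
mu_n = np.linalg.solve(X_aug @ X_aug.T, X_aug @ y_aug)
print("sample mean minus mu_n:", np.round(samples.mean(axis=0) - mu_n, 3))
```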

  29. GAMBLR trick
     • This means that we can sample from the posterior Gaussian distribution by solving a linear regression on the original data plus injected noise!
     • Running the Gibbs sampler then only amounts to solving a sequence of linear regressions with variable noise injection
     • Linear regression is one of the best-studied problems in numerical analysis: fast algorithms, scalable code
     • One multivariate regression per row or column of Y at each iteration step, hence easy parallelization

  30. Matrix factorization
     • One of the best approaches for the Netflix challenge
     • Prediction of ratings for viewer–movie pairs
     • Does not use features, only matrix values
     • Two popular versions: Probabilistic Matrix Factorization (PMF) = maximum likelihood; Bayesian PMF (BPMF) = Bayesian inference

  31. Netflix comparison (PMF vs. BPMF)
     • Data: 100M ratings from 480K users, 18K movies
     • BPMF has an advantage for users with few ratings

  32. Motivation for Bayesian PMF
     • PMF gives point estimates
     • Problematic for compounds that have only a few measurements
     • We are interested in the uncertainty of the estimates
     • [Example: IC50 data set from ChEMBL with 15K compounds]

  33. Bayesian PMF

  34. Gibbs sampling
     • Iteratively samples each parameter block
     • Obtains posterior samples of the model
     • e.g., keep 200 sampled models after burn-in
     • Using the samples, one can also measure uncertainty
     • Related to alternating least squares
     • Blocked Gibbs sampler with large blocks, which gives good sampling behavior

  35. ChEMBL: PMF vs. Bayesian PMF
     • ChEMBL public data set of assay activities
     • 15,118 compounds, 344 proteins, 59,451 IC50 values
     • IC50 classified by discretization at 200 nM; 20% of the values held out as a test set
     • BPMF outperforms PMF on test classification error
     • Does not use features, only matrix values

  36. ChEMBL: BPMF vs. ridge regression
     • 15K compounds, 344 proteins, 200 nM threshold, 20% test set
     • The number of latent dimensions is varied
     • Matrix factorization is not as good as QSAR, but it does capture information

  37. BPMF (relation view)
     • Model: 2 entities (compound, protein), 1 relation (IC50)
     • The latent variables of both entities are learned from the IC50 data

  38. Macau
     • Can we get the best of both worlds?
     • Model: 2 entities (compound, protein), 1 relation (IC50), plus fingerprint features for the compounds
     • The latent variables U (compounds) and V (proteins) are learned together with β_comp, which links the compound fingerprints to the compound latent variables (see the sketch below)
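     An illustrative generative sketch of this kind of side-information model, under my reading of the slide (compound latents centered on a linear map of the fingerprints); it is not the authors' Macau code, and all sizes are toy values.

```python
# Sketch of a Macau-style model: compound latent vectors are centered on a
# linear map (beta_comp) of the fingerprints, protein latents have a plain
# Gaussian prior, and observed activities are noisy inner products.
import numpy as np

rng = np.random.default_rng(0)
n_comp, n_prot, n_bits, rank = 100, 20, 256, 8

X_comp = rng.integers(0, 2, size=(n_comp, n_bits)).astype(float)   # fingerprints
beta_comp = rng.normal(scale=0.05, size=(n_bits, rank))            # link matrix

U = X_comp @ beta_comp + rng.normal(scale=0.1, size=(n_comp, rank))  # compound latents
V = rng.normal(scale=0.3, size=(n_prot, rank))                       # protein latents

Y = U @ V.T + rng.normal(scale=0.1, size=(n_comp, n_prot))           # activities
mask = rng.random((n_comp, n_prot)) < 0.02                           # ~2% observed
Y_obs = np.where(mask, Y, np.nan)                                    # scarce training matrix
# Inference (not shown) would sample U, V and beta_comp jointly by Gibbs, so
# that fingerprint information transfers to compounds with few or no measurements.
```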

  39. Results on ChEMBL
     • 15K compounds, 344 proteins, 200 nM threshold, 20% test set
     • Compound features improve performance
     • Multitask modeling improves performance
