Network-Driven Drug Discovery: An Application of In-Memory - PowerPoint PPT Presentation

Network-Driven Drug Discovery: An Application of In-Memory Distributed Processing
Jonny Wray, PhD
Head of Discovery Informatics
jonny.wray@etherapeutics.co.uk

Who We Are: Pioneers of the next frontier in drug discovery
• A unique drug discovery company headquartered in Oxford, UK, and listed on the AIM market in London (ETX.L)
• Achieve diverse and high-performing drug hits quickly and cost efficiently
• Demonstrated success in 12 diverse areas of biology, from oncology to immunology and neurodegeneration
• Architects of an original, proprietary Network-Driven Drug Discovery platform
• A suite of powerful, custom computational tools that tap into large-scale, proprietary databases
• Applies network science to tackle complex diseases
• Employs data mining, machine learning, artificial intelligence, optimisation and network analysis
• A professional business partner: collaborations or out-licensing self-discovered assets
• Current focus on preclinical discovery programmes in immuno-oncology
• Offering a Hedgehog pathway modulation programme for out-licensing
• Seeking collaborations to apply our Network-Driven Drug Discovery platform to disease areas of mutual interest

Drug Discovery and Development: Where e-therapeutics Operates
[Figure: the drug discovery and development process, with the e-therapeutics stage labelled]

Drug Discovery Process Analysis: An Industry Ripe for Innovation
• Industry productivity is decreasing (Eroom’s law)
• Costs are massive and increasing
• Late stage failures due to efficacy
Source: DiMasi et al., Journal of Health Economics 47, 20-33 (2016)
Source: Cook et al., Nature Reviews Drug Discovery 13, 419-431 (2014)

Network Biology: The Cell as a Network
• Protein-Protein Interaction Network
• Metabolic Network
• Signal Transduction Pathways
• Gene Regulatory Network

Network Biology: Disease Behavior is an Emergent Property of Molecular Networks
• Dysregulated network module identification. Source: Schadt, E., et al. Nature Reviews Drug Discovery (2009)
• Pathological interaction identification in Huntington’s disease. Source: Tourette, C., et al. Journal of Biological Chemistry (2014)

Network Biology: Drugs Need to Alter Phenotype
[Figure: levels from genotype to phenotype. Levels: GENOTYPE, PROTEOME, INTERACTOME, PHENOTYPE. Labels: DNA, RNA, Protein, Protein-Protein Interaction Networks, Pathway, Pathway-Pathway Interaction Networks, Network, Networks of Higher Order, Trait. Annotations: "Intervening here…" and "…to change this".]
• Phenotype is an emergent property of cellular networks
• Networks can be viewed as the mechanistic bridge between the molecular and the phenotype

Network-driven Drug Discovery Process: From Hypothesis to Compound Testing in 9 Months
[Figure: process flow with stages numbered 01 to 05. Labels: Gaps in available treatment for disease; in silico Discovery Engine; Identification of intervention strategies; Network model construction; Network analysis; Compound Mapping; Phenotypic screening; Hit to Lead Optimisation]

Disease Network Perturbation Analysis: Core Foundation of Discovery Process
• Networks are robust to random perturbation…
• …but susceptible to targeted perturbation
[Videos: Random Perturbation; Targeted Perturbation]

Network Model Construction: Biological Inverse Problem
[Figure labels: Cells (Healthy vs Diseased); Measurements; Network Model of Disease]

Network Model Construction: Computational Issues
‘Active Module’ Detection: integration of molecular profiles with cellular interactions
• Formulated as an optimization problem: find a high-scoring sub-network
• Heuristic approaches: greedy search
• Exact approach: prize-collecting Steiner tree formulated as a linear programming problem
• Computationally expensive to solve: we use IBM CPLEX Optimizer
• Multiple optimal, and suboptimal, solutions: Steiner forests
• Future challenges: move from gene-based (22k) to protein-based (250k – 1.5M) networks
[Figure labels: Prize-collecting Steiner tree problem; Maximum weight connected subgraph problem]
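The greedy heuristic named on this slide can be sketched as follows. This is a minimal illustration and not the exact CPLEX-based formulation: the function name `greedy_module`, the toy network, and the node scores are all assumptions made for the example. Positive scores stand for genes whose molecular profile suggests involvement in the disease, negative scores for the rest.

```python
def greedy_module(adjacency, scores, seed):
    """Grow a connected sub-network from `seed`, adding the best-scoring
    neighbour for as long as doing so increases the total score."""
    module = {seed}
    total = scores[seed]
    while True:
        # Candidate nodes: neighbours of the module that are not yet in it.
        frontier = {n for m in module for n in adjacency[m]} - module
        if not frontier:
            break
        best = max(frontier, key=lambda n: (scores[n], n))
        if scores[best] <= 0:
            break  # no neighbour improves the score: local optimum reached
        module.add(best)
        total += scores[best]
    return module, total


# Toy interaction network (hypothetical node names and scores).
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}
scores = {"A": 2.0, "B": 1.5, "C": -1.0, "D": -0.5, "E": 3.0}

module, total = greedy_module(adjacency, scores, seed="A")
```

On this toy network the search stops at the module A, B because the only remaining neighbours have negative scores, even though passing through D to reach E would give a higher-scoring connected module. That kind of local optimum is the usual motivation for an exact formulation.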

Compound Mapping: Data Augmentation With Machine Learning
[Figure: machine learning pipeline. Labels: Bioactivity Footprint Database; Platform Services; Natural Language Processing; Feature Engineering; Naïve Bayes; Classifiers with Compound Features (Gradient Boosted Machines); Classifiers with Protein Features (Gradient Boosted Machines); Matrix Completion (Intellegens Neural Networks); Model Ensembling; Sparse Experimental Data Augmented with Predictions]

Compound Mapping: Computational Issues
Requirements
 - Heterogeneous data: hard to make results from a sampled data set generalize to the full data set
 - Speed: slow training times kill exploratory development of machine learning solutions
 - In-memory requirements
    - Full matrix: 15M (compounds) x 20k (proteins)
    - ~1200G with Java float
    - Sensible data filtering: ~300G
Solution Used
 - H2O.ai:
    - “H2O is an open source, in-memory, distributed, fast, and scalable machine learning and predictive analytics platform that allows you to build machine learning models on big data and provides easy productionalization of those models in an enterprise environment.”
 - Can deal with machine learning on full data set in-memory on our hardware (distributed 512G grid)
 - Required algorithms implemented
 - Data scientists prefer the environment over Spark
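The memory figure quoted for the full matrix can be checked with a few lines of arithmetic. The sketch assumes a dense matrix with one 4-byte Java float per compound-protein cell and decimal gigabytes (10^9 bytes); the variable names are illustrative.

```python
COMPOUNDS = 15_000_000
PROTEINS = 20_000
BYTES_PER_FLOAT = 4  # a Java float is a 32-bit value

cells = COMPOUNDS * PROTEINS
full_matrix_gb = cells * BYTES_PER_FLOAT / 1e9  # assumes 1 GB = 10^9 bytes
print(f"{cells:.1e} cells, about {full_matrix_gb:.0f} GB as dense floats")
```

Under the same assumptions, the ~300G figure after filtering corresponds to roughly a quarter of the full matrix, which fits within the 512G grid mentioned on the slide while the full matrix does not.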

Network Analysis, Error vs Attack Tolerance: Biological Networks are Robust
$\text{Impact} = \Delta\,\text{Avg. Shortest Path}$
• Attack: targeted by degree
• Error: targeted randomly
• Albert, R., H. Jeong, and A. L. Barabasi. 2000. “Error and Attack Tolerance of Complex Networks.” Nature 406 (6794): 378–82.
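A small sketch of the attack versus error comparison, taking impact as the change in average shortest path after node removal. The wheel-shaped toy network, the function names, and the choice to average only over node pairs that remain connected are assumptions for illustration. The "error" value is computed as the exact expectation over a uniformly random single-node removal rather than by sampling.

```python
from collections import deque


def average_shortest_path(adjacency):
    """Mean breadth-first distance over all ordered pairs of connected nodes."""
    total, pairs = 0, 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def remove_nodes(adjacency, removed):
    """Build a new network without `removed`; the input is left untouched."""
    return {n: {m for m in nbrs if m not in removed}
            for n, nbrs in adjacency.items() if n not in removed}


def impact(adjacency, removed):
    """Impact = change in average shortest path after removing `removed`."""
    return (average_shortest_path(remove_nodes(adjacency, removed))
            - average_shortest_path(adjacency))


# Toy network: a hub (node 0) joined to every node of an 8-node ring.
ring = list(range(1, 9))
wheel = {0: set(ring)}
for i, node in enumerate(ring):
    wheel[node] = {0, ring[i - 1], ring[(i + 1) % len(ring)]}

# Attack: remove the highest-degree node.
hub = max(wheel, key=lambda n: len(wheel[n]))
attack_impact = impact(wheel, {hub})

# Error: expected impact of removing one node chosen uniformly at random.
error_impact = sum(impact(wheel, {n}) for n in wheel) / len(wheel)
```

In this toy case removing the hub lengthens paths far more than a random removal does on average, which is the qualitative pattern the slide attributes to Albert et al.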

Network Analysis: Algorithms
Core algorithms used in drug discovery process
• All can be formulated as embarrassingly parallel problems
• Perturbation Analysis
  • Sequentially remove nodes from a network and measure change in network structure
  • Generate data for random vs targeted comparison
  • Used to calibrate other analyses for specific networks: identifies region of random effect
• Impact Maximization
  • Find the optimal set of nodes (proteins) that maximally disrupts a network
• Compound Impact Ranking
  • Rank all entries in our compound database by their impact on a network
GridGain (Ignite) compute grid
• Infrastructure for parallel distributed compute
• Map-reduce or fork-join extended from multiple threads to multiple JVMs and physical machines
• Hadoop:
  • Standard map-reduce framework (when we implemented)
  • Focused on massive data sets, not in-memory, which isn’t our situation
  • Batch focused: key requirement was for on-line, user-triggered processing

Distributed Fork-Join or Simple Map-Reduce: Generic Algorithm
• Master node
  • Compute task: divide into multiple jobs; collate results from multiple jobs
• Worker nodes, distributed across multiple machines
  • Compute jobs: perform calculations on isolated data
• Multiple concurrent analysis runs from multiple users
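The task and job split can be illustrated on a single machine with a thread pool standing in for the worker nodes. This is only an analogy under that assumption: it uses Python's standard `concurrent.futures` module rather than the GridGain (Ignite) API, and `run_task` and `node_set_size` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(inputs, job, collate, max_workers=4):
    """Split a task into one job per input, run the jobs on a pool of
    workers, then collate the job results on the calling ('master') side."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(job, inputs))  # map: one job per input
    return collate(results)                    # reduce: combine the results


# Hypothetical job: each one works only on its own isolated piece of data.
def node_set_size(node_set):
    return len(node_set)


node_sets = [{"P1"}, {"P1", "P2"}, {"P2", "P3", "P4"}]
largest = run_task(node_sets, job=node_set_size, collate=max)
```

Because each job touches only its own input, several tasks from different users could run side by side on the same pool without coordination beyond the final collation.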

Network Analysis: Perturbation
Goal: characterize network robustness behavior via perturbation
• One compute task per repeat
• One compute job: calculate impact for a specific node set size
• All jobs: impact calculations for node sets of all sizes
• Example below
  • 300 network calculations per repeat
  • Total repeats
• Error bars generated by repeats
[Figure: generated data]
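How repeats turn into error bars can be sketched serially as below. The impact measure here is a deliberately simple stand-in (fraction of edges lost), not the measure used in the talk, and all names, the toy ring network, and the chosen sizes are assumptions. Each call to `perturbation_task` plays the role of one compute task, and each list element it produces corresponds to one compute job.

```python
import random
import statistics


def toy_impact(adjacency, removed):
    """Stand-in impact measure (an assumption): fraction of edges lost."""
    edges = {frozenset((a, b)) for a in adjacency for b in adjacency[a]}
    lost = {e for e in edges if e & removed}
    return len(lost) / len(edges)


def perturbation_task(adjacency, sizes, rng):
    """One repeat: one job per node set size, each on a fresh random set."""
    nodes = sorted(adjacency)
    return [toy_impact(adjacency, set(rng.sample(nodes, size)))
            for size in sizes]


def perturbation_curve(adjacency, sizes, repeats, seed=0):
    """Run all repeats and summarise each size as (mean, standard deviation)."""
    rng = random.Random(seed)
    runs = [perturbation_task(adjacency, sizes, rng) for _ in range(repeats)]
    per_size = zip(*runs)  # regroup the results by node set size
    return [(statistics.mean(v), statistics.stdev(v)) for v in per_size]


# Toy network: a ring of six nodes.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
curve = perturbation_curve(ring, sizes=[0, 1, 6], repeats=5)
```

The standard deviation across repeats is what would be drawn as the error bar at each node set size.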

Network Analysis: Impact Maximization
Goals:
• Find protein sets that have a large effect on network structural coherence, and so on the targeted biological process
• Robustness properties of biological networks mean the vast majority of protein sets have little effect
• Compound mapping to those protein sets finds potential therapeutics
Algorithmic Approach
• Exhaustive approach infeasible due to combinatorial explosion: $\binom{8777}{67} \approx 3.4 \times 10^{169}$
• Stochastic approximation or metaheuristics
  • Stochastic aspect facilitates the exploration of solution space: more likely to find global maxima
• Genetic algorithm
  • Specific, population-based stochastic approximation approach
  • Based (very loosely) on natural selection
  • Population based ⇒ embarrassingly parallel
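The size of the search space can be computed directly, reading the slide's formula as the number of ways to choose 67 proteins from a network of 8777. That reading of the two numbers, and the constant names, are assumptions of this sketch.

```python
import math

NETWORK_PROTEINS = 8777  # network size, as read from the slide's formula
SET_SIZE = 67            # proteins per candidate set, as read from the slide

candidate_sets = math.comb(NETWORK_PROTEINS, SET_SIZE)
exponent = len(str(candidate_sets)) - 1       # power of ten of the count
mantissa = candidate_sets / 10 ** exponent    # leading digits
print(f"C({NETWORK_PROTEINS}, {SET_SIZE}) ≈ {mantissa:.1f} × 10^{exponent}")
```

A count of this magnitude rules out evaluating every candidate set, which is why the slide turns to stochastic search.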

Network Analysis: Impact Maximization via Genetic Algorithm
Goal: find protein set(s) that maximize network impact
• One compute task per “generation”
  • Generates population of potential solutions (nodes to remove)
  • Initially randomly
  • Then by “breeding” best solutions of previous generation
• Compute job: evaluation of one member of population
• All jobs: evaluation of whole population
• Evaluation: quantification of the effect of node removal
[Figure label: asymptotic convergence]
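The generation loop can be sketched as below. This is an illustrative genetic algorithm, not the production one: the fitness function is a stand-in (number of edges touching the removed set), the breeding and mutation rules are arbitrary choices, a local thread pool stands in for the compute grid, and every name is hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def edges_lost(adjacency, removed):
    """Stand-in fitness (an assumption): number of edges touching `removed`."""
    return sum(1 for a in adjacency for b in adjacency[a]
               if a < b and (a in removed or b in removed))


def breed(parent_a, parent_b, nodes, set_size, rng):
    """Child = random subset of the parents' union, with occasional mutation."""
    child = set(rng.sample(sorted(parent_a | parent_b), set_size))
    if rng.random() < 0.2:  # mutation: drop one node, refill at random below
        child.discard(rng.choice(sorted(child)))
    while len(child) < set_size:
        child.add(rng.choice(nodes))
    return frozenset(child)


def maximise_impact(adjacency, set_size, population=20, generations=30, seed=0):
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    members = [frozenset(rng.sample(nodes, set_size)) for _ in range(population)]
    with ThreadPoolExecutor() as pool:
        for _ in range(generations):  # one compute task per generation
            # One compute job per member of the population.
            scores = list(pool.map(lambda m: edges_lost(adjacency, m), members))
            ranked = [m for _, m in sorted(zip(scores, members),
                                           key=lambda p: p[0], reverse=True)]
            elite = ranked[:population // 2]  # best solutions survive
            children = [breed(rng.choice(elite), rng.choice(elite),
                              nodes, set_size, rng)
                        for _ in range(population - len(elite))]
            members = elite + children
    return max(members, key=lambda m: edges_lost(adjacency, m))


# Toy network: two hubs (0 and 6), five leaves each, hubs joined by an edge.
graph = {n: set() for n in range(12)}
for hub, leaves in ((0, range(1, 6)), (6, range(7, 12))):
    for leaf in leaves:
        graph[hub].add(leaf)
        graph[leaf].add(hub)
graph[0].add(6)
graph[6].add(0)

best = maximise_impact(graph, set_size=2)
```

Keeping the elite half unchanged between generations means the best score never decreases, which is one simple way to obtain the convergence behaviour such searches aim for. Being stochastic, the search is not guaranteed to reach the global optimum.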

Implementation Lessons: 1. Minimize Data Distribution
Naïve (first) implementation
• Master node generates population of perturbed networks
• Networks are distributed to worker nodes
• Worker nodes perform network calculations (e.g. shortest path analysis)
• Parallel distributed implementation was slower than serial
• Cost of data distribution swamped gain due to parallel calculations
Current Solution
• Full, intact network is distributed to all worker nodes once at the start
• Master node generates population of bit vectors indicating which nodes to remove
• Bit vectors are distributed to worker nodes
• Intact network is shared between worker nodes and multiple threads on each worker node
• Immutable data structure for network
• Percolation operation is construction of a new network, not removal of nodes from the intact network
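The bit-vector scheme can be sketched as follows. The sketch assumes a Python integer as the bit vector and a read-only mapping of frozensets as the immutable network; the helper names `freeze` and `percolate`, and the toy network, are hypothetical.

```python
from types import MappingProxyType


def freeze(adjacency):
    """Read-only network: safe to share between threads on a worker."""
    return MappingProxyType({n: frozenset(nbrs) for n, nbrs in adjacency.items()})


def percolate(network, node_order, remove_bits):
    """Construct the perturbed network described by a bit vector.

    Bit i set means node_order[i] is removed; `network` is never modified.
    """
    removed = {n for i, n in enumerate(node_order) if remove_bits >> i & 1}
    return {n: nbrs - removed for n, nbrs in network.items() if n not in removed}


# The intact network is built once and shared; only bit vectors travel.
network = freeze({"A": {"B", "C"}, "B": {"A", "C"},
                  "C": {"A", "B", "D"}, "D": {"C"}})
node_order = tuple(sorted(network))  # fixed node indexing known to all workers
remove_bits = 0b0100                 # bit 2 set: remove node_order[2], i.e. "C"

perturbed = percolate(network, node_order, remove_bits)
```

Only the small integer needs to be sent for each candidate solution, and because the shared network cannot be modified, concurrent jobs can read it without locking.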
