SLIDE 1

Anshu Bhardwaj

Scientist & Community Builder OSDD, CSIR India

Open Source Drug Discovery (OSDD) Connecting Minds & Machines

A CSIR led team India consortium with global partnership for affordable healthcare for all

National Knowledge Network “First Annual Workshop” “The e-Infrastructure of India” 31st Oct – 1st Nov 2012

SLIDE 2

First Disease Target : Tuberculosis; Now extended to Malaria

Tuberculosis (TB) is one of the leading causes of fatality, ranking second only to HIV as the killer infectious disease of adults worldwide.

Source: http://www.globalhealthfacts.org/data/topic/map.aspx?ind=12

OSDD Focus : Tropical Neglected Diseases

At least one person in the world is newly infected with TB bacilli every second

Over 1000 deaths a day or 3 deaths every 2 mins

New TB cases 2010

No New TB Drugs past 50 years

SLIDE 3

Research Spending Per New Drug

Company | Number of drugs approved | R&D Spending Per Drug ($Mil) | Total R&D Spending 1997-2011 ($Mil)
AstraZeneca | 5 | 11,790.93 | 58,955
GlaxoSmithKline | 10 | 8,170.81 | 81,708
Sanofi | 8 | 7,909.26 | 63,274
Roche Holding AG | 11 | 7,803.77 | 85,841
Pfizer Inc. | 14 | 7,727.03 | 108,178
Johnson & Johnson | 15 | 5,885.65 | 88,285
Eli Lilly & Co. | 11 | 4,577.04 | 50,347
Abbott Laboratories | 8 | 4,496.21 | 35,970
Merck & Co Inc | 16 | 4,209.99 | 67,360
Bristol-Myers Squibb Co. | 11 | 4,152.26 | 45,675
Novartis AG | 21 | 3,983.13 | 83,646
Amgen Inc. | 9 | 3,692.14 | 33,229
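The per-drug figures on this slide appear to be the total R&D spending divided by the number of approved drugs. A minimal Python sketch of that arithmetic, using three rows copied from the slide (the function name `spending_per_drug` is illustrative only; small differences in the last digits are presumably because the totals shown are rounded):

```python
# (drugs approved, total R&D spending 1997-2011 in $ million), copied from the slide
RND = {
    "AstraZeneca": (5, 58955),
    "Pfizer Inc.": (14, 108178),
    "Amgen Inc.": (9, 33229),
}

def spending_per_drug(approved, total_musd):
    """Average R&D spending per approved drug, in $ million."""
    return total_musd / approved

for company, (approved, total) in RND.items():
    print(f"{company}: {spending_per_drug(approved, total):,.2f} $Mil per drug")
```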

Slate’s Bad Math: $55 million on each new drug

Source: http://www.forbes.com/sites/matthewherper/2012/02/10/the-truly-staggering-cost-of-inventing-new-drugs/

SLIDE 4

Drug Discovery is a Long Risky process with Low Probability of Success

http://www.bayerpharma.com/en/research-and-development/processes/index.php

SLIDE 5

Prediction of non-toxic targets & inhibitors

Efficacy

Inhibitor should target the right protein in the pathogen (Mycobacterium tuberculosis)

Toxicity

Inhibitor should not target any crucial protein in host (Human)

SLIDE 6

From a mathematical point of view, creating an accurate model of a single mammalian cell may require generating and then solving somewhere between 100,000 and one million equations

Biology is complex !!

http://news.vanderbilt.edu/2011/10/robot-biologist/

The human brain can only process seven pieces of data at a time!!!

Need automation & new technology to address the complexity

SLIDE 7

Predictive Science in the Drug Discovery (DD) Process

Predicting toxicity and metabolism of drugs

Prediction tools and models to prioritize candidate molecules

HPC for OSDD Community by Garuda/ CMMACS

Systems Level Models for DD

Target Identification
Pharmacomodeling
Off-target binding predictions

Virtual Screening for selected targets & Models for predicting anti-TB and mutagenic properties

Systems Biology for predicting - Drug-targets MOA

SLIDE 8

Why Open Source Drug discovery ?

Many eyeballs make the bug shallow!
Lack of market incentive for TB
Successful Open Source Models
Human Genome Sequencing Initiative
Open Source Software Initiative (eg: Linux OS)
Android
The WWW

SLIDE 9

Real Innovation lies in “Innovating how we innovate”…

“We cannot solve our problems with the same thinking we used when we created them.” Albert Einstein

SLIDE 10

Open TB Drug Discovery Platform

Informatics to Experimental Validation to Clinical Trials

Target Validation of in silico targets
Systems Biology
Cheminformatics
Mtb Strain and Clone Repository
Screening Facility
Assay Development
OSDD Chem and Directed Synthesis
Lead Identification
Lead Optimization
Target Identification for Leads
DMPK
In vivo efficacy
Safety Pharmacology
Pre-Clinical Candidate
Phase I-III
Pharmacogenomics

SLIDE 11

OSDD portal Virtual Lab

Computer Scientists
Mathematical modeling
Data upload
Disease experts
Gene/Protein Expression Analysis
Pharmacogenomics expert
Administrator (manages server)
Virtual Screening

Unconventional Collaborative Network

SLIDE 12

Shaping Science 2.0

OSDD Semantic Web Architecture

SLIDE 13

OSDD Platform

System Architecture

“Collaborative tools to accelerate neglected diseases research” in the book “Collaborative Computational Technologies for Biomedical Research”. Wiley and Sons. May 2011

Released : April 2010

SLIDE 14

Scientific Workflow Management Systems

http://www.tavaxy.org/
http://www.taverna.org.uk/
https://kepler-project.org/
http://galaxyproject.org/

Experimental data from biology and chemistry need to be managed and analyzed systematically

Large datasets and compute-intensive analyses need compute infrastructure

SLIDE 15

Weka Workflow

a. Convert CSV to test and train files
b. Convert both CSVs to ARFF files: output_file1 is always the train file and output_file2 is the test file.
c. Select two input files for Classifier. Change the parameters in the right side panel for each tool
d. Evaluate model file: Classifier will be Misc -> SerializedClassifier
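Steps a and b of the workflow can be sketched outside Galaxy in plain Python. This is only an illustration under stated assumptions, not the OSDD tool itself: the sample data, the column names, the 80/20 split and the helper names (`read_csv`, `split_rows`, `to_arff`) are all hypothetical, every column except the last is assumed to be numeric, and the last column is assumed to hold the class label.

```python
import csv
import io
import random

# Hypothetical sample data: two numeric descriptors and a class label in the last column.
CSV_TEXT = """mol_weight,logp,activity
180.2,1.2,active
310.4,3.5,inactive
250.1,2.1,active
410.9,4.8,inactive
199.7,0.9,active
"""

def read_csv(text):
    """Return (header, data rows) from CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1:]

def split_rows(rows, test_fraction=0.2, seed=0):
    """Step a: shuffle the rows and split them into train and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def to_arff(relation, header, rows, class_values):
    """Step b: write rows as ARFF text with a nominal class attribute."""
    lines = ["@relation " + relation, ""]
    lines += ["@attribute " + name + " numeric" for name in header[:-1]]
    lines.append("@attribute " + header[-1] + " {" + ",".join(class_values) + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"

header, rows = read_csv(CSV_TEXT)
# Class values come from the full dataset so both files declare the same header.
class_values = sorted({row[-1] for row in rows})
train_rows, test_rows = split_rows(rows)
train_arff = to_arff("train", header, train_rows, class_values)
test_arff = to_arff("test", header, test_rows, class_values)
print(train_arff)
```

Taking the class values from the whole dataset keeps the train and test ARFF headers identical, which matters when a model trained on one file is evaluated on the other.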

SLIDE 16

http://sysborg2.osdd.net

Electronic lab notebooks
APIs to submit workflow method to lab notebook
APIs to submit results to lab notebook
APIs to extract files from lab notebooks
More than 250 applications integrated

Customized workflow with grid infrastructure & applications

Jobs are invoked from Customized Galaxy and submitted to Gridway

Diagram labels: Input file + parameters, Gridway meta scheduler, LRM, Torque, Clusters, Programs, Gridway runner, Job template, PBS, Customized

Job status may be checked using the DRMAA API
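To make the "input file + parameters to job template" step concrete, here is a minimal Python sketch that renders a job template as text. The key names (EXECUTABLE, ARGUMENTS, INPUT_FILES, OUTPUT_FILES, STDOUT_FILE, STDERR_FILE) follow the GridWay job template format as commonly documented and should be checked against the installed GridWay version. The function `render_job_template`, the script `weka_classify.sh` and the file names are hypothetical and not taken from the OSDD setup.

```python
def render_job_template(executable, arguments, input_files, output_files):
    """Render a GridWay-style job template as text (illustrative sketch only)."""
    fields = [
        ("EXECUTABLE", executable),
        ("ARGUMENTS", " ".join(arguments)),
        ("INPUT_FILES", ", ".join(input_files)),
        ("OUTPUT_FILES", ", ".join(output_files)),
        # ${JOB_ID} is left literal so the scheduler can substitute it per job.
        ("STDOUT_FILE", "stdout.${JOB_ID}"),
        ("STDERR_FILE", "stderr.${JOB_ID}"),
    ]
    return "\n".join(key + " = " + value for key, value in fields) + "\n"

# Hypothetical job: a wrapper script run on a train/test file pair.
template = render_job_template(
    executable="weka_classify.sh",
    arguments=["train.arff", "test.arff"],
    input_files=["weka_classify.sh", "train.arff", "test.arff"],
    output_files=["model.out"],
)
print(template)
```

In a setup like the one on this slide, such a file would be handed to the metascheduler and the job state polled afterwards, for example through DRMAA; neither step is shown here.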

SLIDE 17

Get data customized for extracting files from open lab notebook

Custom APIs for importing input files from OSDD’s open lab notebook into Galaxy

SLIDE 18

Workflows and the results of the workflows are stored as separate lab notebooks
Lab notebook has details of the experiments performed
Results of one experiment may be invoked for analysis in another experiment
All versions of the workflow and the results are stored
Flexibility to execute nested workflows

Custom APIs for exporting results to OSDD’s open lab notebook
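The notebook behaviour described on this slide (separate entries for workflows and results, every version kept, results reusable as inputs) can be illustrated with a small in-memory Python sketch. The class `LabNotebook` and its `submit` and `extract` methods are hypothetical stand-ins and do not represent the actual OSDD lab notebook APIs.

```python
class LabNotebook:
    """Hypothetical in-memory stand-in for an open lab notebook (not the OSDD API)."""

    def __init__(self):
        self._versions = {}  # entry name -> list of contents, oldest first

    def submit(self, name, content):
        """Store a new version of an entry and return its 1-based version number."""
        self._versions.setdefault(name, []).append(content)
        return len(self._versions[name])

    def extract(self, name, version=None):
        """Return a given version of an entry (the latest when version is None)."""
        history = self._versions[name]
        return history[-1] if version is None else history[version - 1]


notebook = LabNotebook()
# Workflows and results are stored as separate entries; every version is kept.
notebook.submit("workflow/weka-classifier", "steps: csv2arff -> classify")
notebook.submit("workflow/weka-classifier", "steps: csv2arff -> classify -> evaluate")
notebook.submit("results/weka-classifier", "accuracy table, run 1")

# Results of one experiment can be pulled in as input to another.
input_for_next_experiment = notebook.extract("results/weka-classifier")
```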

SLIDE 19

List of >250 modules integrated as web services by OSDD Community

S. No | Resources | Clients
1 | KEGG: Kyoto Encyclopedia of Genes and Genomes | 60
2 | GetEntry: DDBJ sequence search by accession ID | 43
3 | GPSR: tools | 33
4 | PDB: Protein Data Bank | 30
5 | BioModel: mathematical models of biological DB | 25
6 | Gtps: Gene Trek in Prokaryote Space | 8
7 | WSDbfetch: retrieve entries from biological dbs using entry identifiers or accession no. | 7
8 | Gibv: Genome Information Broker for Viruses | 7
9 | DDBJ: DNA Data Bank of Japan | 7
10 | Mafft: a multiple sequence alignment program | 4
11 | Fasta: DDBJ database | 4
12 | Ensembl: maintains automatic annotation | 4
13 | VecScreen: vector contamination | 4
14 | OMIM: Online Mendelian Inheritance in Man | 4
15 | Gtop: Gene-product Informatics | 3
16 | GO: Gene Ontology | 3
17 | SPS: Splicing Profile based Score | 2
18 | GIBIS: Genome Information Broker for Insertion Sequence | 1
19 | RefSeq: database of sequence | 1
20 | GIB: Genome Information Broker | 1
21 | GIBEnv: DDBJ database | 1
22 | TxSearch: Database indexing & searching | 1

SLIDE 20

Ongoing: Cheminformatics

Curated molecule datasets Cheminformatics Models Data Mining and Analysis

HT Virtual screening

PubChem ChEMBL DrugBank

Experimental Assays

Community of about 400

Other active communities:

OSDD Women Scientists Forum
OSDD Junior Scientists Forum

SLIDE 21

Background and Premise

SLIDE 22

Why are we doing this?

SLIDE 23

Crowd-Sourcing Large-Scale Data-Driven Cheminformatics Analysis

Machine Learning based Computational Models
Bioassay Datasets
Computational Tools and Resources
People
Standard reusable models/Publications

SLIDE 24

PubChem Bioassay data (approx. 1 lakh, i.e. 100,000, molecules/dataset; 6000 descriptors/molecule)

Successful Models

Screen PubChem (30 million)

Data amplification in Cheminformatics

Potential Hits

Downsizing and random validation require multiple calculations for validation of results
Cross-validation up to 50+ times for each experiment
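The scale implied by this slide can be sketched in Python. The sketch assumes that "downsizing" means randomly undersampling the larger class to the size of the smaller one, and that each of the 50 repeats runs a 5-fold cross-validation; both are assumptions for illustration, as are the function names and the toy molecule IDs. No model is trained here; the code only builds the train/test partitions to show how the number of runs multiplies.

```python
import random

def downsize(actives, inactives, rng):
    """Randomly undersample both classes to the size of the smaller class."""
    n = min(len(actives), len(inactives))
    return rng.sample(actives, n), rng.sample(inactives, n)

def k_fold_indices(n_items, k, rng):
    """Shuffle indices 0..n_items-1 and deal them into k folds."""
    idx = list(range(n_items))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def repeated_cv_plan(actives, inactives, repeats=50, k=5, seed=0):
    """Return one (train, test) pair per fold per repeat."""
    rng = random.Random(seed)
    plan = []
    for _ in range(repeats):
        sampled_actives, sampled_inactives = downsize(actives, inactives, rng)
        data = sampled_actives + sampled_inactives
        for test_idx in k_fold_indices(len(data), k, rng):
            held_out = set(test_idx)
            train = [data[j] for j in range(len(data)) if j not in held_out]
            test = [data[j] for j in test_idx]
            plan.append((train, test))
    return plan

# Toy molecule IDs: 10 actives against 100 inactives.
plan = repeated_cv_plan(list(range(10)), list(range(100, 200)))
print(len(plan), "model fits for a single experiment")

# Descriptor values per dataset: 1 lakh molecules x 6000 descriptors each.
values_per_dataset = 100_000 * 6000
print(values_per_dataset, "descriptor values per dataset")
```

Each pair in the plan corresponds to one model fit, so repeats times folds fits are needed per experiment, before any screening of the full PubChem collection.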

SLIDE 25

The Problem

SLIDE 26

C-DAC’s Garuda Grid – Indian Grid Computing Initiative

C-DAC is an R&D organization under the Ministry of Communication & Information Technology, India

C-DAC’s Garuda Grid is targeted at providing a facility for the scientific community, which would enable them to seamlessly access the distributed resources

Compute power of GARUDA: ~70 TFLOPS (6000 CPUs)

Currently there are 55 Garuda Partners
Has NKN (National Knowledge Network) connectivity at 10 Gbps

SLIDE 27

Internet/NKN Results NKN

OSDD-Garuda Interface

SLIDE 28

Weka in Galaxy

SLIDE 29

OSDD – Garuda Activities

Created OSDD Virtual Organization (VO); 70 users registered under this VO
Garuda Portal customized to support OSDD requirements
Galaxy, a biology workbench, has been customized as per OSDD requirements
JNU head node was set up for hosting Galaxy
Common data has been uploaded to Data Location for accessibility through Galaxy and Portal by all OSDD users
Three cluster resources have been provided for OSDD activities
– Hyderabad Cluster with 320 CPUs
– Chennai Cluster with 304 CPUs
– Param Yuva at Pune with 4368 CPUs
Hand-holding users from the community & resolving their queries

SLIDE 30

SLIDE 31

OSDD Cheminformatics Programme Present Status

Models for anti-tubercular activity
Periwal et al (2012) BMC Pharmacology
Periwal et al (2011) BMC Res Notes

Models for anti-malarial activity
Periwal et al (2012) under review

Models for drug toxicity
Seal et al (2012) Journal of Cheminformatics

Models for specific drug targets (GlmU, Kinases, DAP)
Singla et al (2011) BMC Pharmacology
Garg et al (2010) BMC Bioinformatics
Garg et al (2010) BMC Bioinformatics

Models for drug metabolism
Mishra et al (2010) BMC Pharmacology

Databases and Datasets for Cheminformatics
Singh et al (2012) Nucleic Acids Research
Singla et al (2010) BMC Pharmacology

TRAINING
Collaboration on cheminformatics training and research
Trained ~50 students in advanced cheminformatics data analysis methods
Training for students on parallel data analysis environments

SLIDE 32

OSDD Cheminformatics Programme Overview

ANALYTICS
Models for anti-tubercular activity
Models for anti-malarial activity
Models for drug toxicity
Models for drug metabolism
Models for specific drug targets

DATA
Public repositories of chemical data (PubChem/ChEMBL/DrugBank)
OSDD Chemical Repository (OSDDChem)
OSDD Chemistry Outreach Programme

RESOURCES
Computational Resources for Drug Discovery (CRDD)
NKN + CDAC-Garuda

OUTCOMES
Prioritization of biologically active molecules for assays
Predictive modeling of drug metabolism and toxicity (predictive in silico pre-clinical trial)

SLIDE 33

Anshu Bhardwaj

Council of Scientific & Industrial Research (CSIR), India

Chintalapati Janaki,

Center for Development of Advanced Computing (C-DAC), India

www.osdd.net 25-26 May 2011

Customized Galaxy with applications as Web Services and on the Grid for Open Source Drug Discovery (OSDD)

A CSIR led team India consortium with global partnership for affordable healthcare

SLIDE 34

Literature Annotation Tools Genomic Databases

Curated Annotations
Raw Annotations
OSDD C2D Community
800+ Student Researchers
Collaborative Curation

Pathway/Interactome | Gene Ontology | Protein Structure/Fold | Glycomics| Immunome

The “Connect to Decode” Programme

SLIDE 35

Community Curation!!

Wrong (mark in red)

Right (mark in green)

Online discussion

Working on the cloud..

SLIDE 36

OSDD Community Effort to Understand Mtb Biology

SLIDE 37

The largest Mtb Interactome
54 Authors
29 Institutions

More than 2500 views and 350 downloads to date

Published: July 11, 2012

SLIDE 38

Knowledge Discovery Systems

S. no. | Resource | Description | URL
1 | SysBorg* | Community interaction portal | http://sysborg2.osdd.net
2 | OSDDChem* | Portal for submission/proposal of synthetic compounds for screening | http://crdd.osdd.net/osddchem
3 | OSDDChemDesign | Portal for submission/proposal for compounds for screening | http://180.149.49.37/servers/osddchemdesign
4 | Tbrowse* | Genome browser for Mtb | http://tbrowse.osdd.net
5 | IPW* | Interacting partners database | http://crdd.osdd.net/servers/ipw
6 | curateTB | Curated data on TB from literature | http://180.149.49.37/servers/ctb
7 | Structural Annotation* | Structural proteome of Mtb | http://proline.physics.iisc.ernet.in/Tbstructuralannotation
8 | ccPDB* | Compilation and creation of data sets from Protein Data Bank | http://crdd.osdd.net/raghava/ccpdb
9 | GDoQ* | Predicting novel/potent inhibitors against GlmU | http://crdd.osdd.net/raghava/gdoq
10 | KiDoQ* | Predicting novel/potent inhibitors against DHDPS | http://crdd.osdd.net/raghava/kidoq
11 | MbtA* | QSAR and combinatorial library for MbtA |
12 | MetaPred* | Prediction of cytochrome P450 isoform responsible for metabolizing a drug molecule | http://crdd.osdd.net/raghava/metapred
13 | Anti-tubercular models* | Predictive models for anti-tubercular molecules using machine learning on high throughput biological screening data sets |
14 | Mutagenicity models* | In silico predictive mutagenicity model generation using supervised learning approaches |
15 | Natural product database (β version) | Database of biologically active phytomolecules and plant extracts with anti-mycobacterial activity | http://crdd.osdd.net/osddchem/biophytmol
16 | Pharmacomodeling predictions* | Modeling metabolic adjustment in Mtb upon treatment with isoniazid |
17 | Galaxy workflow engine | Workflow engine to plug in applications for generating computational pipelines | http://sysborg2.osdd.net

* Published

Available

SLIDE 39

Within weeks, 830 volunteered to re-annotate the entire M. tuberculosis genome. The work started in December 2009 and was completed by April 2010, packing nearly 300 man-years into 4 months!

Source: Munos B. Can Open-Source Drug R&D Repower Pharmaceutical Innovation? Clin Pharmacol Ther 2010;87:534–536

Source: Hiroaki Kitano. Social engineering for virtual 'big science' in systems biology. Nature Chemical Biology 7, 323–326 (2011)
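The "nearly 300 man-years" figure can be sanity-checked with simple arithmetic. The Python sketch below assumes, purely for illustration, that all 830 volunteers contributed for the full four months; the function name `person_years` is not from the source.

```python
def person_years(volunteers, months):
    """Total effort in person-years if every volunteer works the whole period."""
    return volunteers * months / 12

effort = person_years(830, 4)
print(f"{effort:.1f} person-years")
```

Under that assumption the total comes to roughly 277 person-years, which is in line with the slide's "nearly 300".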

SLIDE 40

OSDD : A Global Community - More than 6500 members from over 130 countries

Statistics as of October 2012

SLIDE 41

Together we can … .. and we should !

http://www.osdd.net
http://c2d.osdd.net
info@osdd.net
anshub@osdd.net
anshu.bhardwaj

Report of the CEWG of WHO Recognised OSDD as an Open Innovation Model

5 April 2012 | Geneva

How Open Source Drug Discovery Is Helping India Develop New Drugs Apr 9, 2012

DNDi POLICY BRIEF recognised OSDD as part of Global Landscape for Neglected Diseases R&D April 2012

Crowd Sourcing Innovation: CSIR portal for OSDD, 2011

Crowd-Sourcing Drug Discovery, 24 February 2012, Vol. 335 no. 6071 p. 909