Big Data in Drug Discovery
David J. Wild
Assistant Professor & Director, Cheminformatics Program Indiana University School of Informatics and Computing djwild@indiana.edu - http://djwild.info
Epochs in drug discovery Empirical up until
Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.
David Wild, December 2009. Page 3 http://djwild.info
http://video.google.com/videoplay?docid=-2351549868099343381&hl=en#
Figure: two growth charts on a monthly time axis, 2005-01 to 2010-07. One has a linear count axis (10,000,000 to 80,000,000) with data labels 2,824,265; 35,379,748; 56,774,950; 69,088,100. The other has a logarithmic count axis (1 to 1,000,000).
Panel title: PubChem Bioassays 2005-2010
Annotations: Addition of ChemSpider 434635; Addition of ChEMBL
http://www.genome.jp/en/db_growth.html
Range of ROCV values from different classes of BioAssay data set.
Range of ROCV values from three different classes of BioAssay data set for original models and models built with additional inactive compounds (“improved”).
Chen, B. and Wild, D.J. PubChem BioAssays as a data source for predictive models, Journal of Molecular Graphics and Modelling. 2010; 28, 420-426.
Predicting new molecular targets for known drugs. Nature 462, 175-181 (12 November 2009)
Cloud computing allows processing and data mining on a vast scale
Integrative cheminformatics & bioinformatics connects compounds, targets, genes, pathways, diseases and side effects
Health informatics (PHRs and EHRs) allows integration and patient models (QP)
Semantic technologies and complex systems tools allow seamless integration and human-scale data mining
Visualization, projection, data mining, hypothesis generation, network tools
RDF, XML, Triple Stores
Ontologies, SPARQL, Graph algorithms
Web Services, RPC
Information extraction
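The idea behind the triple store and SPARQL layer can be sketched in a few lines of Python. This is only an illustration of matching graph patterns over subject-predicate-object triples, not the system used in the talk: the identifiers (compound:C1, target:T1, gene:G1, disease:D1), the predicate names and the functions match and query are all hypothetical, and a real deployment would use an RDF store with a SPARQL endpoint.

```python
# Hypothetical triples linking compounds, targets, genes and diseases.
TRIPLES = [
    ("compound:C1", "binds", "target:T1"),
    ("compound:C2", "binds", "target:T2"),
    ("target:T1", "encoded_by", "gene:G1"),
    ("gene:G1", "associated_with", "disease:D1"),
]


def match(triples, pattern):
    """Yield variable bindings for one (s, p, o) pattern.

    Terms that start with '?' are variables; all other terms must match exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated within one pattern must bind consistently.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def query(triples, patterns):
    """Join several patterns on shared variables (a basic graph pattern)."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            # Substitute variables that earlier patterns have already bound.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(triples, bound):
                extended.append({**solution, **binding})
        solutions = extended
    return solutions


if __name__ == "__main__":
    # Which compounds reach disease:D1 through a target and its gene?
    for row in query(TRIPLES, [
        ("?c", "binds", "?t"),
        ("?t", "encoded_by", "?g"),
        ("?g", "associated_with", "disease:D1"),
    ]):
        print(row)
```

Each pattern plays the role of one triple pattern in a SPARQL WHERE clause, and the join on shared variables is what lets a single query cross from chemical data into biological data.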
Dong, X., Gilbert, K.E., Guha, R., Heiland, R., Kim, J., Pierce, M.E., Fox, G.C. and Wild, D.J. Web service infrastructure for chemoinformatics, J. Chem. Inf. Model., 2007; 47(4) pp 1303-1307.
Chen, B., Dong, X., Jiao, D., Wang, H., Zhu, Q., Ding, Y., Wild, D.J. Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data. BMC Bioinformatics 2010, 11, 255
Choi, J.Y., Bae, S.H., Qiu, J., Fox, G., Chen, B., Wild, D.J. Browsing Large Scale Cheminformatics Data with Dimension Reduction. Emerging Computational Methods for the Life Sciences Workshop, ACM Symposium for High Performance Distributed Computing, Jun 21-25, 2010, Chicago IL
Dexamethasone, Triamcinolone, NFKB1, Glucocorticoid Receptor
Virtuoso runs Chem2Bio2RDF queries on the cloud
Cytoscape plugins give access to Chem2Bio2RDF, LPG and chemical structure visualization
Dynamic exploration in Cytoscape
ChemBioScape. Drugbank interaction contains information about every drug’s target. In this case, DB00741 and DB01234 share common targets through several different Drugbank interaction IDs.
interaction 2348 and 1962. Also, the two drugs appear in PubMed articles 8119326 and 8223912 via their CID (Compound ID)
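Finding the targets that two drugs share through an interaction table amounts to a group-and-intersect step, sketched below. The rows are made up for illustration: the interaction IDs (1 to 4) and target names (T1 to T3) are hypothetical and are not the Drugbank records shown on the slide, and shared_targets is an assumed helper name. Only the two drug IDs come from the text above.

```python
from collections import defaultdict

# Hypothetical rows of (interaction_id, drug_id, target_id).
INTERACTIONS = [
    (1, "DB00741", "T1"),
    (2, "DB01234", "T1"),
    (3, "DB00741", "T2"),
    (4, "DB01234", "T3"),
]


def shared_targets(interactions, drug_a, drug_b):
    """Return {target: {drug: [interaction ids]}} for targets hit by both drugs."""
    by_target = defaultdict(lambda: defaultdict(list))
    for interaction_id, drug, target in interactions:
        if drug in (drug_a, drug_b):
            by_target[target][drug].append(interaction_id)
    # Keep only the targets that have at least one interaction for each drug.
    return {
        target: dict(drugs)
        for target, drugs in by_target.items()
        if drug_a in drugs and drug_b in drugs
    }


if __name__ == "__main__":
    print(shared_targets(INTERACTIONS, "DB00741", "DB01234"))
```

Keeping the interaction IDs in the result preserves the provenance of each link, so a shared target can be traced back to the records that support it.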
., Um, K., Wilson, T., et al.: inhA, a gene encoding a target for isoniazid and ethionamide in Mycobacterium tuberculosis. Science, 263(5144), 227-230 (1994).
Doxorubicin (anthracycline antibiotic)
Figure (network diagram).
Nodes: QUERY; CID 86427; CID 8642; AID 328; PubMed 12856; Breast Cancer; Breast Cancer; HER2; Breast Cancer
Edge labels: similar_to; similar_to; active_against; contains_term; contained_in; contains_term; contains_term; predicted_ inactive_against
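A network of this kind can be explored by searching for a chain of labelled edges from a query node to a term of interest. The sketch below uses breadth-first search over a small directed graph. The nodes and connections are hypothetical placeholders: the extracted slide text does not say which node each edge joins, so only the edge label vocabulary is borrowed from it, and find_path is an assumed helper name.

```python
from collections import deque

# Hypothetical directed edges of (source, label, target).
EDGES = [
    ("query", "similar_to", "compound:A"),
    ("query", "similar_to", "compound:B"),
    ("compound:A", "active_against", "assay:1"),
    ("assay:1", "contains_term", "term:X"),
    ("compound:B", "contained_in", "article:1"),
    ("article:1", "contains_term", "term:X"),
]


def find_path(edges, start, goal):
    """Breadth-first search over labelled edges.

    Returns a shortest path as a list of (source, label, target) hops,
    or None when the goal cannot be reached from the start node.
    """
    adjacency = {}
    for source, label, target in edges:
        adjacency.setdefault(source, []).append((label, target))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for label, target in adjacency.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [(node, label, target)]))
    return None


if __name__ == "__main__":
    for hop in find_path(EDGES, "query", "term:X") or []:
        print(" ".join(hop))
```

Returning the hops with their labels keeps the evidence chain readable, which matters when a path is meant to suggest a hypothesis rather than prove one.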
Jiao, D. and Wild, D.J. Extraction of CYP Chemical Interactions from Biomedical Literature Using Natural Language Processing Methods, Journal of Chemical Information and Modeling, 49(2); pp 263-269
Topic 26: cell, expression, cancer, tumor, …
Related Disease: DNA Damage, Melanoma, Glioblastoma, …
Legend: Un-proved link; proved link by c2b2r_chemogenomics; Target; Drug