
A large-scale chemical data integration system
Gaia Paolini, Pfizer


  1. A large-scale chemical data integration system
     Gaia Paolini, Pfizer (Confidential)

  2. Large-Scale Chemical Data Integration: Summary
     - Current situation
     - Business case
     - Aims
     - The design process
     - Functionality
     - Applications

  3. The Project
     - A large chemical data warehouse to store and integrate Pfizer and third-party information, using chemical structure as the natural entry point
     - Millions of chemical structures
     - Based on the DayCart Oracle Cartridge

  4. Why Integrate?
     - The need to integrate and mine disparate sources of data:
       - Mergers & acquisitions
       - In-licensing
       - In-house data
       - 3rd-party databases

  5. Why Integrate? (continued)
     - Data available to buy and integrate from external sources
     - Need for an active chemoinformatics research repository
     - Opportunity to highlight connections:
       - Chemical properties
       - Structural similarities

  6. Aims of the Data Warehouse
     - Enable chemical/pharmaceutical data mining and knowledge discovery
     - Store chemical structures and properties together with related entities:
       - Biology
       - Portfolio
       - Inventory

  7. Scope
     - Data warehouse
     - Common consolidated set of data
     - Repository of selected fields from Pfizer and third-party data
     - Source independent
     - Chemo-centric: indexed on structure, not compound ID
     - Emphasis on data integration rather than a front-end client application
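
To make "indexed on structure, not compound ID" concrete, here is a minimal sketch of a structure-keyed index. It assumes each incoming record already carries a canonical SMILES string; the source names, compound IDs and the `build_structure_index` helper are hypothetical and not taken from the slides.

```python
from collections import defaultdict

# Hypothetical records: (source, compound_id, canonical_smiles).
# The SMILES are assumed to have been canonicalised upstream.
RECORDS = [
    ("in_house", "CMPD-0001", "CC(=O)Oc1ccccc1C(=O)O"),
    ("vendor_a", "VA-778", "CC(=O)Oc1ccccc1C(=O)O"),
    ("vendor_b", "VB-12", "CCO"),
]


def build_structure_index(records):
    """Group source-specific compound IDs under a single structure key."""
    index = defaultdict(list)
    for source, compound_id, smiles in records:
        index[smiles].append((source, compound_id))
    return dict(index)


index = build_structure_index(RECORDS)
for smiles, ids in index.items():
    print(smiles, ids)
```

Two sources that register the same structure under different IDs end up under one key, which is the property a compound-ID index would not give.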

  8. Requirements
     - Unique chemical structure indexing
     - Multiple and hierarchical tautomeric and stereochemical indexing
     - Integrate internal and external data
     - Indexed by chemical structure
     - Integrate chemo- and bio-informatics communities
     - Fit-for-purpose model architecture
     - Uses corporate dictionaries to standardise entities
     - Create connections and synonym tables

  9. What do we want from our data?
     - Data should be easy to:
       - access
       - compare
       - exchange
       - manipulate
     - [Diagram labels: phase, launched, target, compound]

  10. Why data integration?

  11. Database Design Decisions
      - Central data warehouse
      - Selective data integration
      - Focus on chemical structure
      - SMILES representation in DayCart
      - Flexible compound wiring

  12. Central Data Warehouse
      - Data is decoded, loaded, cleaned and mapped
      - [Diagram labels: Pfizer DataMart, External DataMart, External Structures, ETL, Staging Tables, Chemical Structure Integration, DrugStore warehouse]
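
The "decoded, loaded, cleaned and mapped" sequence can be sketched as a small pipeline. This is only an illustration under assumptions: the CSV input, the field names in `FIELD_MAP` and the in-memory staging list are all hypothetical stand-ins, not the ETL tooling the slides refer to.

```python
import csv
import io

# Hypothetical raw extract from one source; the second row has no structure.
RAW = "id,smiles,mw\nA1, CCO ,46.07\nA2,,18.02\nA3,c1ccccc1,78.11\n"

# Hypothetical mapping from source field names to warehouse field names.
FIELD_MAP = {"id": "source_id", "smiles": "structure", "mw": "mol_weight"}


def decode(raw_text):
    """Decode the raw text into one dict per row."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def clean(rows):
    """Trim whitespace, drop rows without a structure, parse numeric fields."""
    cleaned = []
    for row in rows:
        smiles = row["smiles"].strip()
        if not smiles:
            continue
        cleaned.append({"id": row["id"], "smiles": smiles, "mw": float(row["mw"])})
    return cleaned


def map_fields(rows):
    """Rename source fields to the warehouse's field names."""
    return [{FIELD_MAP[key]: value for key, value in row.items()} for row in rows]


def run_pipeline(raw_text):
    staging = decode(raw_text)  # "loaded" here means held in a staging list
    return map_fields(clean(staging))


warehouse_rows = run_pipeline(RAW)
print(warehouse_rows)
```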

  13. Selective data integration
      - [Diagram: sources (Pfizer marts …, 3rd-party …, Contributed research …) and the Drug Store database; Pipeline Pilot and Spotfire as a flexible UI for ad-hoc queries and data mining]

  14. Data Integration
      - Consolidated, homogeneous set of data:
        - One index for every entity
        - One unit of measure for every property
      - We can:
        - Highlight connections between entities
        - Create new connections
        - Filter on properties
        - Interface to other databases
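
"One unit of measure for every property" implies a conversion step at load time. A minimal sketch for concentration values follows, assuming nanomolar as the single stored unit; that choice, the unit labels and the `to_nanomolar` helper are illustrative assumptions, not details from the slides.

```python
# Conversion factors to nanomolar (nM), the unit assumed for storage here.
TO_NANOMOLAR = {"M": 1e9, "mM": 1e6, "uM": 1e3, "nM": 1.0, "pM": 1e-3}


def to_nanomolar(value, unit):
    """Convert a concentration to nM, rejecting units we do not recognise."""
    try:
        factor = TO_NANOMOLAR[unit]
    except KeyError:
        raise ValueError(f"unknown concentration unit: {unit!r}") from None
    return value * factor


# Hypothetical measurements reported in mixed units by different sources.
measurements = [(0.5, "uM"), (250.0, "nM"), (2.0, "mM")]
normalised = [to_nanomolar(value, unit) for value, unit in measurements]
print(normalised)
```

Rejecting unknown units rather than passing them through keeps a mislabelled value from silently entering the consolidated data.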

  15. Chemo-centric design
      - Every entity and property is connected to a chemical structure
      - Seamless integration of different data sources
      - Can measure how a data source enriches chemical space
      - Consistent modelling of tautomers and stereoisomers
      - Easy to apply hierarchical order (e.g. parent-child)
      - Any (and multiple) grouping of structures allowed
      - Intuitive application of chemo-informatics methods
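
One simple reading of "measure how a data source enriches chemical space" is to count the structures a new source contributes that the warehouse does not already hold. The sketch below takes that reading; it assumes structures are compared as canonical SMILES strings, and the `enrichment` function and sample data are hypothetical. Similarity-based measures would be another option and are not shown.

```python
def enrichment(existing, candidate):
    """Return the novel structures in `candidate` and their share of it."""
    existing, candidate = set(existing), set(candidate)
    novel = candidate - existing
    fraction = len(novel) / len(candidate) if candidate else 0.0
    return novel, fraction


# Hypothetical canonical SMILES already in the warehouse and in a new source.
warehouse = {"CCO", "c1ccccc1", "CC(=O)O"}
new_source = {"CCO", "CCN", "c1ccncc1", "CC(=O)O"}

novel, fraction = enrichment(warehouse, new_source)
print(sorted(novel), fraction)
```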

  16. DayCart Oracle Cartridge
      - SMILES chemical representation
      - Structure comparison, transformation, manipulation
      - Fast data retrieval

  17. DayCart: Chemical Representation
      - SMILES syntax support
      - Compact, linear representation
      - Self-contained language
      - Computer friendly & searchable
      - No proprietary data types!
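
Because SMILES is plain text, it can be stored in ordinary character columns and inspected with ordinary string tools. As a rough illustration, the sketch below counts heavy atoms in a SMILES string with a regular expression. It is a simplification that only recognises bracket atoms and the organic-subset symbols, it is not a validating parser, and `heavy_atom_count` is a hypothetical helper rather than a DayCart function.

```python
import re

# Bracket atoms first, then two-letter halogens, then one-letter symbols.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Element symbol inside a bracket atom, after an optional isotope number.
BRACKET_SYMBOL = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def heavy_atom_count(smiles):
    """Count non-hydrogen atoms in a SMILES string (simplified tokeniser)."""
    count = 0
    for token in ATOM_PATTERN.findall(smiles):
        if token.startswith("["):
            match = BRACKET_SYMBOL.match(token)
            if match and match.group(1) == "H":
                continue  # explicit hydrogen such as [H] or [2H]
        count += 1
    return count


for smiles in ["CCO", "c1ccccc1", "CC(=O)O", "[Na+].[Cl-]"]:
    print(smiles, heavy_atom_count(smiles))
```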

  18. DayCart: Functions for Chemical Information
      - Exact match
      - Substructure
      - Similarity
      - Tautomers
      - Salts
      - Stereochemistry
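
Similarity searching is commonly based on comparing structural fingerprints with the Tanimoto coefficient: the number of bits two fingerprints share, divided by the number of bits set in either. The slides do not say how DayCart computes similarity, so the sketch below is a generic illustration; fingerprints are represented as sets of "on" bit positions and the example values are made up.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of set bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Hypothetical fingerprints: positions of the bits that are switched on.
query = {1, 4, 7, 9}
candidate = {1, 4, 8, 9, 12}

print(tanimoto(query, candidate))
```

A value of 1.0 means the two fingerprints are identical, and 0.0 means they share no bits.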

  19. DayCart: Indexes
      - Four (domain) indexes:
        - DDBLOB: substructure, similarity
        - DDGRAPH: tautomers, stereochemistry
        - DDROLE: salts
        - DDEXACT: exact match
      - Essential for performance
      - Trade-off between data load and index building
      - Partitioning? (Next version)

  20. DayCart: Indexes
      - [Chart: index build time (in mins, axis 0 to 700) against number of records being indexed (1,000,000 to 5,000,000) for DDBLOB, DDGRAPH and DDROLE]

  21. DayCart: VCS_normalize
      - Transform structures according to database rules encoded in SMIRKS
      - Apply internal business rules
      - Standardize structures
      - Performance?
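
Applying SMIRKS transformations needs a cheminformatics toolkit and is not reproduced here. As a much simpler stand-in for one kind of standardisation rule, the sketch below strips counter-ions by keeping the largest dot-separated fragment of a SMILES string. This is an assumed example rule, not the behaviour of VCS_normalize, and `largest_fragment` is a hypothetical helper.

```python
import re

# Simplified atom tokeniser: bracket atoms plus the organic-subset symbols.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def largest_fragment(smiles):
    """Keep the dot-separated fragment with the most atom tokens.

    Ties go to the first fragment. Explicit [H] atoms are counted like any
    other atom, which is good enough for this illustration.
    """
    fragments = smiles.split(".")
    return max(fragments, key=lambda frag: len(ATOM_PATTERN.findall(frag)))


for smiles in ["CC(=O)[O-].[Na+]", "Cl.CCN", "CCO"]:
    print(smiles, "->", largest_fragment(smiles))
```

The result still carries any charge left by the removed counter-ion; neutralising it would be a separate rule.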

  22. Applications
      - Perform large-scale data mining
      - Accelerate exploration of new ideas at project inception
      - Repository for chemo-informatics knowledge
      - Advanced research database for computational chemists

  23. Example Query: chemical toolbox
      - Find all screens and compounds tested against each target
      - Find all activity results & rank compounds
      - Filter out non-druggable compounds
      - Select available compounds
      - Filter out non-selective compounds
      - Select top ten representative diverse structures
      - [Diagram labels: Filter, target, compound, activity]
      - “Show me the most potent, selective tools for each target, available in-house”
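
The query steps above can be sketched over in-memory records. Everything in this example is an assumption made for illustration: the data, the field names, potency expressed as IC50 in nM (lower is more potent), and a 100-fold margin as the selectivity criterion. The final diversity selection is not implemented; the sketch simply keeps the top-ranked compounds per target.

```python
from collections import defaultdict

# Hypothetical activity results: (target, compound, ic50_nM).
ACTIVITY = [
    ("T1", "A", 5.0), ("T2", "A", 900.0),
    ("T1", "B", 8.0), ("T2", "B", 12.0),
    ("T1", "C", 2.0),
    ("T2", "D", 20.0), ("T1", "D", 5000.0),
]
# Hypothetical compound annotations.
COMPOUNDS = {
    "A": {"druggable": True, "available": True},
    "B": {"druggable": True, "available": True},
    "C": {"druggable": False, "available": True},
    "D": {"druggable": True, "available": True},
}


def is_selective(compound, target, by_compound, fold=100.0):
    """True if every other tested target is at least `fold` times weaker.

    A compound tested on a single target passes trivially.
    """
    own = by_compound[compound][target]
    others = [v for t, v in by_compound[compound].items() if t != target]
    return all(value >= fold * own for value in others)


def chemical_toolbox(activity, compounds, top_n=10):
    """Per target, list up to `top_n` compounds, most potent first."""
    by_compound = defaultdict(dict)
    for target, compound, ic50 in activity:
        by_compound[compound][target] = ic50

    toolbox = defaultdict(list)
    # Rank by potency so each target's list fills in best-first order.
    for target, compound, ic50 in sorted(activity, key=lambda row: row[2]):
        info = compounds[compound]
        if not (info["druggable"] and info["available"]):
            continue
        if not is_selective(compound, target, by_compound):
            continue
        if len(toolbox[target]) < top_n:
            toolbox[target].append(compound)
    return dict(toolbox)


tools = chemical_toolbox(ACTIVITY, COMPOUNDS)
print(tools)
```

In the warehouse itself this would presumably be expressed as a query across the target, compound and activity entities; the Python version only shows the order of the filters.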

  24. Acknowledgements

  25. Thank you!
