

  1. A Common Data Model: Why? Strengths and limitations of a common data approach
     Patrick Ryan, PhD, Janssen Research and Development / Columbia University Medical Center

  2. Odyssey (noun): \oh-d-si\
     1. A long journey full of adventures
     2. A series of experiences that give knowledge or understanding to someone
     http://www.merriam-webster.com/dictionary/odyssey

  3. The journey to real-world evidence
     [Diagram: the journey from patient-level data in a source system/schema to reliable evidence, with some steps performed one time and others repeated.]

  4. The journey to real-world evidence
     Different types of observational data:
     • Populations: pediatric vs. elderly, socioeconomic disparities
     • Care setting: inpatient vs. outpatient, primary vs. secondary care
     • Data capture process: administrative claims, electronic health records, clinical registries
     • Health system: insured vs. uninsured, country policies

  5. The journey to real-world evidence
     Types of evidence desired:
     • Cohort identification: clinical trial feasibility and recruitment
     • Clinical characterization: treatment utilization, disease natural history, quality improvement
     • Population-level effect estimation: safety surveillance, comparative effectiveness
     • Patient-level prediction: precision medicine, disease interception

  6. Opportunities for standardization in the evidence generation journey
     • Data structure: tables, fields, data types
     • Data conventions: set of rules that govern how data are represented
     • Data vocabularies: terminologies to codify clinical domains
     • Cohort definition: algorithms for identifying the set of patients who meet a collection of criteria for a given interval of time
     • Covariate construction: logic to define variables available for use in statistical analysis
     • Analysis: collection of decisions and procedures required to produce aggregate summary statistics from patient-level data
     • Results reporting: series of aggregate summary statistics presented in tabular and graphical form
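To make the "data structure" and "cohort definition" items concrete, here is a minimal sketch assuming an OMOP-style schema. The table names (condition_occurrence, observation_period), the concept id, and the 365-day prior-observation rule are illustrative placeholders, not definitions from the presentation.

```python
import sqlite3

# Toy in-memory stand-in for two OMOP-style CDM tables (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE condition_occurrence (
    person_id            INTEGER,
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
CREATE TABLE observation_period (
    person_id                     INTEGER,
    observation_period_start_date TEXT,
    observation_period_end_date   TEXT
);
INSERT INTO condition_occurrence VALUES (1, 201826, '2019-03-01'), (2, 201826, '2020-07-15');
INSERT INTO observation_period VALUES (1, '2018-01-01', '2021-01-01'), (2, '2020-06-01', '2021-06-01');
""")

# Cohort definition expressed once against the standard structure: first
# occurrence of a target condition (placeholder concept id) with at least
# 365 days of prior observation in the containing observation period.
cohort_sql = """
SELECT co.person_id,
       MIN(co.condition_start_date) AS cohort_start_date
FROM condition_occurrence co
JOIN observation_period op
  ON op.person_id = co.person_id
 AND co.condition_start_date <= op.observation_period_end_date
 AND JULIANDAY(co.condition_start_date)
     - JULIANDAY(op.observation_period_start_date) >= 365
WHERE co.condition_concept_id = 201826
GROUP BY co.person_id
"""
for person_id, cohort_start_date in conn.execute(cohort_sql):
    print(person_id, cohort_start_date)
```

Because the query is written against the standardized tables rather than any one source's layout, the same definition can be reused wherever the structure and conventions are shared.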

  7. Desired attributes for reliable evidence

     Desired attribute | Question           | Researcher        | Data              | Analysis  | Result
     Repeatable        | Identical          | Identical         | Identical         | Identical | Identical
     Reproducible      | Identical          | Different         | Identical         | Identical | Identical
     Replicable        | Identical          | Same or different | Similar           | Identical | Similar
     Generalizable     | Identical          | Same or different | Different         | Identical | Similar
     Robust            | Identical          | Same or different | Same or different | Different | Similar
     Calibrated        | Similar (controls) | Identical         | Identical         | Identical | Statistically consistent

  8. Minimum requirements to achieve reproducibility
     Reproducible: identical question, different researcher, identical data, identical analysis = identical result
     • Complete documented specification that fully describes all data manipulations and statistical procedures
     • Original source data, no staged intermediaries
     • Full analysis code that executes end-to-end (from source to results) without manual intervention
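As a sketch of what "full analysis code that executes end-to-end without manual intervention" can look like in practice; the file names, column names, and summary statistic are hypothetical:

```python
"""Hypothetical end-to-end analysis script: every step from source data to
reported results is code, so a different researcher can rerun it unchanged."""
import csv
import json
import statistics
from pathlib import Path

SOURCE = Path("source_extract.csv")   # illustrative path to the source data
RESULTS = Path("results.json")        # aggregate results written at the end


def read_source(path: Path) -> list[dict]:
    # Step 1: read the raw source extract; no hand-edited intermediate files.
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


def analyze(rows: list[dict]) -> dict:
    # Step 2: all data manipulations and statistics are specified in code.
    ages = [float(r["age"]) for r in rows if r.get("age")]
    return {"n": len(rows), "mean_age": statistics.mean(ages) if ages else None}


def main() -> None:
    results = analyze(read_source(SOURCE))
    RESULTS.write_text(json.dumps(results, indent=2))  # Step 3: reported output


if __name__ == "__main__":
    main()
```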

  9. How a common data model + common analytics can support reproducibility
     Reproducible: identical question, different researcher, identical data, identical analysis = identical result
     [Diagram: patient-level data in a source system/schema is transformed into patient-level data in the CDM, and then into reliable evidence.]
     • Use of a common data model splits the journey into two segments: 1) data standardization, 2) analysis execution
     • ETL specification and source code can be developed and evaluated separately from analysis design
     • The CDM creates the opportunity for re-use of the data step and the analysis step

  10. Challenges to achieve replication
      Replicable: identical question, same or different researcher, similar data, identical analysis = similar result
      [Diagram: Source 1 … Source i … Source n, each producing similar evidence.]
      • If the analysis procedure is not identical across sources, how do you determine whether any differences observed are due to data vs. analysis?

  11. How a common data model + common analytics can support replication
      Replicable: identical question, same or different researcher, similar data, identical analysis = similar result
      [Diagram: Source 1 … Source i … Source n, each transformed into its own CDM instance and each producing similar evidence through the same analysis.]
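A minimal sketch of the pattern this enables: the analysis is written once against CDM table names and the identical code is executed against each source. The database paths, the drug_exposure table usage, and the concept id are hypothetical placeholders.

```python
import sqlite3

# Hypothetical connection targets for several CDM instances; in practice these
# would point at different databases that all hold the same CDM tables.
CDM_SOURCES = {
    "source_1": "source1_cdm.db",
    "source_i": "sourcei_cdm.db",
    "source_n": "sourcen_cdm.db",
}

# The analysis is expressed once against CDM table/column names, so the
# identical query runs unchanged on every source (concept id is a placeholder).
ANALYSIS_SQL = """
SELECT COUNT(DISTINCT person_id) AS n_exposed
FROM drug_exposure
WHERE drug_concept_id = 1124300
"""


def run_everywhere() -> dict[str, int]:
    results = {}
    for name, db in CDM_SOURCES.items():
        with sqlite3.connect(db) as conn:
            results[name] = conn.execute(ANALYSIS_SQL).fetchone()[0]
    return results  # per-source results can then be compared or pooled
```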

  12. How a common data model + common analytics can support robustness
      Robust: identical question, same or different researcher, same or different data, different analysis = similar result
      • Sensitivity analyses can be systematically conducted with parameterized analysis procedures using a common input
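One way to read "parameterized analysis procedures using a common input" is sketched below; the parameter names (washout_days, time_at_risk_days), the record fields, and the grid of values are assumptions for illustration only.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class AnalysisSettings:
    """Hypothetical analysis parameters varied across sensitivity analyses."""
    washout_days: int       # required prior observation before cohort entry
    time_at_risk_days: int  # required follow-up window after cohort entry


def run_analysis(cohort: list[dict], settings: AnalysisSettings) -> dict:
    # Stand-in for the real estimation procedure: the point is that the same
    # function is applied to the same CDM-derived input for every setting.
    eligible = [
        p for p in cohort
        if p["prior_obs_days"] >= settings.washout_days
        and p["followup_days"] >= settings.time_at_risk_days
    ]
    return {"settings": settings, "n_eligible": len(eligible)}


def sensitivity_grid(cohort: list[dict]) -> list[dict]:
    # Systematically sweep the parameter space (illustrative values).
    grid = product([180, 365], [365, 730])
    return [run_analysis(cohort, AnalysisSettings(w, t)) for w, t in grid]
```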

  13. How a common data model + common analytics can support calibration
      Calibrated: similar question (controls), identical researcher, identical data, identical analysis = statistically consistent result
      [Diagram: known inputs and known outputs flow through the same source-to-CDM-to-evidence process.]
      • With a defined, reproducible process, you can measure a system's performance and learn how to properly interpret the system's outputs
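The "known inputs / known outputs" idea can be illustrated with negative controls, i.e. question-outcome pairs believed to have no true effect: run them through the same reproducible pipeline and check how often the resulting confidence intervals cover the null. This sketch and its example intervals are purely illustrative and not taken from the presentation.

```python
def negative_control_coverage(estimates: list[tuple[float, float]]) -> float:
    """Fraction of negative-control confidence intervals that include the null
    (relative risk = 1). For controls with no expected effect this should be
    close to the nominal 95%; a much lower value signals systematic error that
    must be accounted for when interpreting the system's outputs."""
    covered = sum(1 for lower, upper in estimates if lower <= 1.0 <= upper)
    return covered / len(estimates)


# Illustrative confidence intervals for a handful of negative controls.
print(negative_control_coverage([(0.8, 1.3), (0.9, 1.6), (1.1, 1.9), (0.7, 1.1)]))
```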

  14. Flavors of validation throughout the evidence generation journey
      Validation: "the action of checking or proving the accuracy of something"
      • Data validation: are the data completely captured, with plausible values, in a manner that is conformant to the agreed structure and conventions?
      • Software validation: does the software do what it is expected to do?
      • Clinical validation: to what extent does the analysis conducted match the clinical intention?
      • Statistical validation: do the estimates generated in an analysis measure what they purport to?

  15. Structuring the journey from source to a common data model
      Patient-level data in source system/schema → ETL design → ETL implement → ETL test → Patient-level data in the Common Data Model
      Types of 'validation' required: data validation, software validation (ETL)
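A toy sketch of the ETL design → implement → test loop, assuming an OMOP-style PERSON target table; the source field names and the mapping table are hypothetical, and a real ETL is driven by a full written specification covering every table and vocabulary mapping.

```python
# Illustrative mapping of source codes to standard concept ids (assumed values).
GENDER_MAP = {"F": 8532, "M": 8507}


def map_person(src: dict) -> dict:
    # ETL 'implement' step: one documented transformation from a source
    # patient record into an OMOP-style PERSON row (fields are illustrative).
    return {
        "person_id": int(src["patient_id"]),
        "gender_concept_id": GENDER_MAP.get(src["sex"], 0),
        "year_of_birth": int(src["birth_date"][:4]),
    }


def test_map_person() -> None:
    # ETL 'test' step: unit tests document and check the agreed conventions.
    row = map_person({"patient_id": "42", "sex": "F", "birth_date": "1980-05-17"})
    assert row == {"person_id": 42, "gender_concept_id": 8532, "year_of_birth": 1980}


test_map_person()
```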

  16. Structuring the journey from a common data model to evidence
      From patient-level data in the CDM to reliable evidence, via three paths:
      • Single study: write protocol → develop code → execute analysis → compile result
      • Real-time query: develop app → design query → submit job → review result
      • Large-scale analytics: develop app → execute script → explore results
      Types of 'validation' required: software validation (analytics), clinical validation, statistical validation

  17. Motivations for developing different common data models

      Model           | Collaboration type           | Data type(s)                                     | Analytic use cases
      i2b2            | Grant -> open-source project | EHR, 'omics cohorts                              | Cohort identification; translational research
      Sentinel        | Contract                     | US private-payer claims                          | Clinical characterization; safety surveillance
      PCORnet         | Grant                        | US EHR                                           | Cohort identification; comparative effectiveness
      EU-ADR (Jerboa) | Grant                        | European EHR, claims                             | Clinical characterization; safety surveillance
      OHDSI (OMOP)    | Open-science community       | International claims, EHR, hospital, registries  | Cohort identification; clinical characterization; population-level estimation (safety + effectiveness); patient-level prediction

  18. Balancing tradeoffs in data management vs. analysis complexity
      [Plot: complexity for data management (source data → input format for analysis) on one axis and complexity for the analyst (input format for analysis → final analysis results) on the other, each ranging from easier to harder. Plotted approaches: common protocol; common protocol + common structure for 1 study; common protocol + common structure + common conventions; common protocol + common structure for N studies + common conventions + common vocabularies.]

  19. Common data model + common analytics provides improved efficiency and reliability
      [Plot: on the same complexity axes as the previous slide, cohort identification, clinical characterization, population-level effect estimation, and patient-level prediction for N studies are shown alongside the common-protocol approach.]

  20. Concluding thoughts
      • On the journey from source data to reliable evidence, think about where you are starting and where you want to end up
      • A common data model + common analytics can help standardize parts of the journey
      • The decision of whether (and which) CDM to apply to an EU network should be driven by the requirements around the reliability of the evidence and the efficiency of the evidence generation process

  21. Questions? Join the journey! ryan@ohdsi.org
