Data Quality: finding data and evaluating its quality for your research
David Dorr, MD, MS
Professor and Vice Chair, Medical Informatics and Clinical Epidemiology
Professor, Medicine
Oregon Health & Science University
AcademyHealth 2017
model
National Library of Medicine)
Example: ADMINISTRATIVE / CLAIMS data
Completeness: Can be made 100% complete for concepts provided; MISSING many clinical concepts (e.g., results)
Correctness: Variable by use; encounter correctness GOOD; diagnosis correctness MODERATE
Currency: POOR (lag)
Granularity: Fine grained for diagnosis
Integration*: Challenging; often de-identified or different identifiers
Fitness for use (examples): Utilization rates for an at-risk population
http://www.oregon.gov/oha/HPA/ANALYTICS/APAC%20Page%20Docs/APAC-Overview.pdf
e.g., ALL PAYER ALL CLAIMS: links beneficiaries across insurances; de-identified; restricted
* Or interoperability; here I mean whether it can be combined with other data sources.
Adapted from https://medinform.jmir.org/2014/1/e5/
https://ecqi.healthit.gov/qdm
Sensitivity was highest in encounters (.55), and specificity in the Problem List (.82). Combining all information led to sensitivity of .95 and specificity
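As a rough illustration of how sensitivity and specificity figures of this kind are computed, the sketch below scores two EHR sources against a gold standard and then combines them with a logical OR. The patient flags, the source names, and the helper `sens_spec` are all made up for illustration; they are not the data behind the numbers quoted above.

```python
def sens_spec(predicted, truth):
    """Return (sensitivity, specificity) for two equal-length lists of booleans."""
    pairs = list(zip(predicted, truth))
    tp = sum(p and t for p, t in pairs)              # flagged, truly has condition
    fn = sum((not p) and t for p, t in pairs)        # missed, truly has condition
    tn = sum((not p) and (not t) for p, t in pairs)  # not flagged, truly negative
    fp = sum(p and (not t) for p, t in pairs)        # flagged, truly negative
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical gold standard (e.g., chart review) for 8 patients.
truth        = [True, True, True, True, False, False, False, False]
# Hypothetical flags from two EHR sources.
encounters   = [True, True, False, False, False, False, False, True]
problem_list = [False, True, True, False, False, False, False, False]

# A patient counts as a case if EITHER source flags them.
combined = [e or p for e, p in zip(encounters, problem_list)]

for name, flags in [("encounters", encounters),
                    ("problem list", problem_list),
                    ("combined", combined)]:
    sens, spec = sens_spec(flags, truth)
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Combining sources with OR can only keep or raise sensitivity, and can only keep or lower specificity, which is why a combined definition needs to be checked on both measures.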
EHR data into standard EHR data warehouse
Completeness: Often more complete by DOMAIN; temporal completeness varies
Correctness: HIGHLY variable
Currency: EXCELLENT (with careful constraints)
Granularity: Fine grained for many domains; with narrative notes, can get extensively fine grained
Integration: Frequent foreign keys (multiple identifiers), limited by policy
Fitness for use (examples): Episode-based care; setting-based care, including specialties (e.g., ambulatory primary care); workflow / operations (time-stamped observations)
e.g., ‘Clarity’ data warehouse for Epic (or newer Caboodle); analytic data warehouses; ‘back end access’ to EHR data
PheKB Phenotype
PhenX Protocol Name: Global Mental Status Screener - Adult
PhenX ID: PX130701
LOINC Name: Global mental status adult proto
LOINC Code: 62769-5
CDE Name: Adult Cognitive Assessment Score
CDE ID: 3076130
… subvariables under this level with logic
Human Phenotype Ontology
How do I define ‘Dementia’ for my study?
Shivade et al, JAMIA
Database: (Mini)Sentinel
Description: Database for active surveillance of regulated products; maintained by FDA and other network notes; claims and pharmacy
Size / Use: 178 million members; search Sentinel FDA

Database: PCORNet
Description: Distributed network with common data model
Size / Use: 122 million; http://www.pcornet.org/

Database: I2b2 / SHRINE
Description: Distributed open source software and common data model with deployed networks
Size / Use: 23 million; i2b2.org

Database: OHDSI OMOP CDM
Description: Common data model intended to facilitate precision medicine
Size / Use: 600 million; ohdsi.org
PheKB Phenotype: Dementia (excerpt)
PhenX Protocol Name: Global Mental Status Screener - Adult
PhenX ID: PX130701
LOINC Name: Global mental status adult proto
LOINC Code: 62769-5
CDE Name: Adult Cognitive Assessment Score
CDE ID: 3076130
… subvariables under this level with logic
Human Phenotype Ontology: Dementia
COMPUTABLE PHENOTYPE ASSESSMENT tool
Atlas
www.ohdsi.org
Evaluation: availability, feasibility, accuracy, currency, completeness, and representativeness
Model Domain: Table Names
Standardized Clinical Data Tables: PERSON, OBSERVATION_PERIOD, SPECIMEN, DEATH, VISIT_OCCURRENCE, PROCEDURE_OCCURRENCE, DRUG_EXPOSURE, DEVICE_EXPOSURE, CONDITION_OCCURRENCE, MEASUREMENT, NOTE, OBSERVATION, FACT_RELATIONSHIP
Standardized Health System Data Tables: LOCATION, CARE_SITE, PROVIDER
Standardized Health Economics Data Tables: PAYER_PLAN_PERIOD, VISIT_COST, PROCEDURE_COST, DRUG_COST, DEVICE_COST
Standardized Derived Elements: COHORT, COHORT_ATTRIBUTE, DRUG_ERA, DOSE_ERA, CONDITION_ERA
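To make the table list concrete, here is a minimal sketch of a cohort query that joins PERSON to CONDITION_OCCURRENCE, run against a toy in-memory SQLite database. It assumes only a small subset of the OMOP columns (person_id, year_of_birth, condition_concept_id, condition_start_date); the rows are invented, and the concept id 9001 is a placeholder, not a real OMOP concept id.

```python
import sqlite3

# Toy in-memory database with a tiny, simplified subset of two OMOP tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    year_of_birth INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
INSERT INTO person VALUES (1, 1940), (2, 1955), (3, 1948);
INSERT INTO condition_occurrence VALUES
    (10, 1, 9001, '2016-03-01'),
    (11, 1, 9001, '2016-09-15'),
    (12, 2, 9002, '2016-05-20'),
    (13, 3, 9001, '2017-01-10');
""")

DEMENTIA_CONCEPT_ID = 9001  # placeholder, NOT a real OMOP concept id

# One row per person with the condition, with their earliest recorded date.
rows = conn.execute(
    """
    SELECT p.person_id, p.year_of_birth, MIN(c.condition_start_date) AS first_dx
    FROM person AS p
    JOIN condition_occurrence AS c ON c.person_id = p.person_id
    WHERE c.condition_concept_id = ?
    GROUP BY p.person_id, p.year_of_birth
    ORDER BY p.person_id
    """,
    (DEMENTIA_CONCEPT_ID,),
).fetchall()

for person_id, year_of_birth, first_dx in rows:
    print(person_id, year_of_birth, first_dx)
```

In a real network the concept ids would come from the standardized vocabulary, and a phenotype definition would usually involve a set of concepts rather than a single id.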
[Figure: Errors / patient (using minimum database size); x-axis labels A through M; y-axis: error/patient, ticks from 0.02 to 0.12]
Huser et al GEMS
Clinical datasets
Completeness: Often incomplete; ETLs should define what data is included and allow you to assess completeness for your need
Correctness: May be validated and improved
Currency: Moderate
Granularity: Transformation may reduce some granularity, especially for nuanced concepts
Integration: Already integrated, but expanding to new data sources is hard
Fitness for use (examples): Hypothesis generation / cohort discovery on large scale studies; basic observational studies
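Assessing completeness for your own need can start very simply. The sketch below reports, for each field you care about, the fraction of records with a non-missing value. The records, field names, and the helper `completeness` are hypothetical and stand in for whatever extract the ETL produces.

```python
def completeness(records, fields):
    """Return {field: fraction of records where the field is present and not None}."""
    total = len(records)
    return {
        field: sum(rec.get(field) is not None for rec in records) / total
        for field in fields
    }

# Hypothetical extract: a missing key and an explicit None both count as missing.
records = [
    {"person_id": 1, "diagnosis": "dementia", "lab_result": 7.1},
    {"person_id": 2, "diagnosis": "dementia", "lab_result": None},
    {"person_id": 3, "diagnosis": None, "lab_result": None},
    {"person_id": 4, "diagnosis": "diabetes"},
]

report = completeness(records, ["person_id", "diagnosis", "lab_result"])
for field, fraction in report.items():
    print(f"{field}: {fraction:.0%} complete")
```

This only measures whether a value is present; whether the value is correct or current has to be checked separately.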
e.g., PCORNet, i2b2, OHDSI
How to find
WHERE TO LOOK? The LIBRARY!
Try the National Library of Medicine: https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html
Open source aggregation / metadata
Found datasets
Completeness: Complete solely for its purpose
Correctness: Oddly poor
Currency: Frozen in time
Granularity: VARIES
Integration: Extremely difficult
Fitness for use (examples): Replication, or focused exploratory data analysis for pilot data