Data Quality: finding data and evaluating its quality for your - - PowerPoint PPT Presentation





slide-1
SLIDE 1

Data Quality: finding data and evaluating its quality for your research

David Dorr, MD, MS
Professor and Vice Chair, Medical Informatics and Clinical Epidemiology
Professor, Medicine
Oregon Health & Science University
AcademyHealth 2017!

slide-2
SLIDE 2

Overview

  • How can you find data for your problem that is appropriate for your methods?
  • Choices
    • Operational database, e.g., from your Electronic Health Record system
    • Clinical Quality Measure or Registry-based database
    • Database cluster / linkage: e.g., PCORnet, i2b2 SHRINE
    • Standardized database for observational studies: OHDSI OMOP common data model
    • Specific datasets: FigShare, published articles, the LIBRARY (including the National Library of Medicine)

slide-3
SLIDE 3

Data quality overview for evaluation

Example: ADMINISTRATIVE / CLAIMS data

  • Completeness: Can be made 100% complete for concepts provided; MISSING many clinical concepts (e.g., results)
  • Correctness: Variable by use; encounter correctness GOOD; diagnosis correctness MODERATE
  • Currency: POOR (lag)
  • Granularity: Fine grained for diagnosis
  • Integration*: Challenging; often de-identified or different identifiers
  • Fitness for use (examples): Utilization rates for an at-risk population

e.g., ALL PAYER ALL CLAIMS

  • Links beneficiaries across insurances
  • De-identified
  • Restricted

http://www.oregon.gov/oha/HPA/ANALYTICS/APAC%20Page%20Docs/APAC-Overview.pdf

* Or Interoperability: but I mean can it be combined with other data sources.

slide-4
SLIDE 4

Operational databases

Adapted from https://medinform.jmir.org/2014/1/e5/

slide-5
SLIDE 5

Quality Data Model

  • All clinical quality measures
  • Certain DOMAINS
  • Certain taxonomies (LOINC, ICD-10, RxNorm)
  • ONLY mapped if CQM uses it

https://ecqi.healthit.gov/qdm

slide-6
SLIDE 6

EHR data source dramatically affects data quality and interpretation

Sensitivity was highest in encounters (.55), and specificity in the Problem List (.82). Combining all information led to sensitivity of .95 and specificity of .19.

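The trade-off above can be made concrete with a small worked sketch. The patient flags below are invented toy data (not the figures from the study on this slide), and `sens_spec` is a hypothetical helper. It shows how treating a patient as positive when any source flags them tends to raise sensitivity while lowering specificity.

```python
from typing import Iterable, Tuple


def sens_spec(predicted: Iterable[bool], truth: Iterable[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of predicted flags against a gold standard."""
    tp = fn = tn = fp = 0
    for p, t in zip(predicted, truth):
        if t and p:
            tp += 1
        elif t and not p:
            fn += 1
        elif not t and p:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy gold standard: first four patients have the condition, last four do not.
truth        = [True, True, True, True, False, False, False, False]
# Toy flags from two EHR sources (assumed values, for illustration only).
encounters   = [True, True, False, False, True, False, False, False]
problem_list = [False, True, True, False, False, True, False, False]

# "Combining all information": positive if ANY source flags the patient.
combined = [e or p for e, p in zip(encounters, problem_list)]

for name, flags in [("encounters", encounters),
                    ("problem list", problem_list),
                    ("combined", combined)]:
    sensitivity, specificity = sens_spec(flags, truth)
    print(f"{name}: sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

In this toy data each single source finds half the true cases, while the combined rule finds more of them but also picks up the false positives of both sources.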
slide-7
SLIDE 7

Data quality for EHR extracts

EHR data into standard EHR data warehouse

  • Completeness: Often more complete by DOMAIN; temporal completeness varies
  • Correctness: HIGHLY variable
  • Currency: EXCELLENT (with careful constraints)
  • Granularity: Fine grained for many domains; with narrative notes, can get extensively fine grained
  • Integration: Frequent foreign keys (multiple identifiers), limited by policy
  • Fitness for use (examples): Episode-based care; setting-based care, including specialties (e.g., ambulatory primary care); workflow / operations (time-stamped observations)

e.g.,

  • ‘Clarity’ Data warehouse for Epic (or newer Caboodle)
  • Analytic data warehouses
  • ‘Back end access’ to EHR data

slide-8
SLIDE 8

Definitions of observed conditions / symptom sets – PHENOTYPES – are increasingly required to improve quality but vary across sources

PheKB Phenotype

PhenX

  • PhenX Protocol Name: Global Mental Status Screener - Adult
  • PhenX ID: PX130701
  • LOINC Name: Global mental status adult proto
  • LOINC Code: 62769-5
  • CDE Name: Adult Cognitive Assessment Score
  • CDE ID: 3076130
  • … subvariables under this level with logic

Human Phenotype Ontology

How do I define ‘Dementia’ for my study?
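One reason the answer varies by source is that a computable phenotype is, in the end, a rule over coded data, and different groups pick different rules. The sketch below is a deliberately naive illustration and not a validated PheKB, PhenX, or Human Phenotype Ontology definition. The ICD-10 prefixes chosen, the record layout, and the helper `has_dementia_code` are all assumptions made for the example.

```python
# Illustrative ICD-10 code prefixes often associated with dementia.
# Assumption: a real study would use a reviewed, validated value set instead.
DEMENTIA_ICD10_PREFIXES = ("F01", "F02", "F03", "G30")


def has_dementia_code(diagnosis_codes):
    """True if any diagnosis code starts with one of the illustrative prefixes."""
    return any(
        code.upper().startswith(DEMENTIA_ICD10_PREFIXES)
        for code in diagnosis_codes
    )


# Toy patients with invented diagnosis lists.
patients = {
    "p1": ["I10", "G30.9"],
    "p2": ["E11.9"],
    "p3": ["F03.90", "I10"],
}

cohort = sorted(pid for pid, codes in patients.items() if has_dementia_code(codes))
print(cohort)
```

Changing the prefix list, or requiring two codes instead of one, changes who is in the cohort, which is exactly why phenotype definitions need to be stated explicitly.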

slide-9
SLIDE 9

The variation amongst phenotypes extends across domains

Shivade et al, JAMIA

slide-10
SLIDE 10

Clinical databases for observational studies in use

  • (Mini)Sentinel
    Description: Database for active surveillance of regulated products; maintained by FDA and other network notes; claims and pharmacy
    Size / Use: 178 million members; search Sentinel FDA
  • PCORNet
    Description: Distributed network with common data model
    Size / Use: 122 million; http://www.pcornet.org/
  • I2b2 / SHRINE
    Description: Distributed open source software and common data model with deployed networks
    Size / Use: 23 million; i2b2.org
  • OHDSI OMOP CDM
    Description: Common data model intended to facilitate observational studies; used in All of Us precision medicine
    Size / Use: 600 million; ohdsi.org

slide-11
SLIDE 11

Distributed models can facilitate collaboration / spread, but also require external resources

slide-12
SLIDE 12

Improving data quality: encouraging better mapping

PheKB Phenotype: Dementia (excerpt)

PhenX

  • PhenX Protocol Name: Global Mental Status Screener - Adult
  • PhenX ID: PX130701
  • LOINC Name: Global mental status adult proto
  • LOINC Code: 62769-5
  • CDE Name: Adult Cognitive Assessment Score
  • CDE ID: 307613
  • … subvariables under this level with logic

Human Phenotype Ontology: Dementia

COMPUTABLE PHENOTYPE ASSESSMENT tool

Atlas

www.ohdsi.org

Evaluation: Availability, Feasibility, Accuracy, Currency, Completeness, and Representativeness

slide-13
SLIDE 13

OHDSI OMOP common data model

Model Domain: Table Names

  • Standardized Clinical Data Tables: PERSON, OBSERVATION_PERIOD, SPECIMEN, DEATH, VISIT_OCCURRENCE, PROCEDURE_OCCURRENCE, DRUG_EXPOSURE, DEVICE_EXPOSURE, CONDITION_OCCURRENCE, MEASUREMENT, NOTE, OBSERVATION, FACT_RELATIONSHIP
  • Standardized Health System Data Tables: LOCATION, CARE_SITE, PROVIDER
  • Standardized Health Economics Data Tables: PAYER_PLAN_PERIOD, VISIT_COST, PROCEDURE_COST, DRUG_COST, DEVICE_COST
  • Standardized Derived Elements: COHORT, COHORT_ATTRIBUTE, DRUG_ERA, DOSE_ERA, CONDITION_ERA

slide-14
SLIDE 14

OHDSI OMOP related open source software

slide-15
SLIDE 15

Achilles Heel for OHDSI can automatically detect data errors
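To give a feel for what automated error detection over a common data model can look like, here is a minimal sketch of rule-based checks. It is not Achilles Heel's actual rule set or code. It builds a tiny in-memory SQLite database with two simplified OMOP-style tables (only a few columns, with invented rows), runs two hypothetical checks, and divides the error count by the number of persons.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    year_of_birth INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    condition_start_date TEXT
);
INSERT INTO person VALUES (1, 1950), (2, 1980), (3, 2001);
INSERT INTO condition_occurrence VALUES
    (10, 1, '2015-03-02'),
    (11, 2, '1975-06-30'),  -- starts before the person's birth year
    (12, 9, '2016-01-15'),  -- refers to a person that does not exist
    (13, 3, '2017-02-01');
""")

# Hypothetical checks: each query counts the rows that violate one rule.
CHECKS = {
    "condition before birth year": """
        SELECT COUNT(*)
        FROM condition_occurrence c
        JOIN person p ON p.person_id = c.person_id
        WHERE CAST(substr(c.condition_start_date, 1, 4) AS INTEGER) < p.year_of_birth
    """,
    "condition with unknown person": """
        SELECT COUNT(*)
        FROM condition_occurrence c
        LEFT JOIN person p ON p.person_id = c.person_id
        WHERE p.person_id IS NULL
    """,
}

results = {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}
n_persons = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
errors_per_patient = sum(results.values()) / n_persons

for name, count in results.items():
    print(f"{name}: {count}")
print(f"errors per patient: {errors_per_patient:.2f}")
```

Because every site shares the same table and column names, the same checks can be run unchanged at each site and the resulting rates compared.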

slide-16
SLIDE 16

Error rates per patient for OHDSI OMOP

[Figure: errors per patient (using minimum database size) for databases A through M; vertical axis from 0.02 to 0.12 errors per patient]

Huser et al GEMS

slide-17
SLIDE 17

Data quality for Clinical datasets

Clinical datasets (e.g., PCORNet, i2b2, OHDSI)

  • Completeness: Often incomplete; ETLs should define what data is in there and allow for assessment of completeness for your need
  • Correctness: May be validated and improved
  • Currency: Moderate
  • Granularity: Transformation may reduce some granularity, especially for nuanced concepts
  • Integration: Already integrated; but expanding to new data sources hard
  • Fitness for use (examples): Hypothesis generation / cohort discovery on large scale studies; basic observational studies

http://www.oregon.gov/oha/HPA/ANALYTICS/APAC%20Page%20Docs/APAC-Overview.pdf

How to find

  • Your local Clinical and Translational Science Institute
  • core websites with forums

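Assessing completeness "for your need" can start as simply as counting how often the fields your study depends on are populated. The sketch below assumes a hypothetical extract already loaded as a list of dictionaries. The field names, the toy rows, and the `completeness` helper are illustrative and do not belong to any of the networks named above.

```python
def completeness(records, fields):
    """Fraction of records with a non-missing value for each requested field.

    A value counts as missing if the key is absent, None, or an empty string.
    Assumes `records` is non-empty.
    """
    total = len(records)
    return {
        field: sum(1 for r in records if r.get(field) not in (None, "")) / total
        for field in fields
    }


# Toy extract with invented values.
records = [
    {"person_id": 1, "hba1c": 7.2, "smoking_status": "never"},
    {"person_id": 2, "hba1c": None, "smoking_status": "current"},
    {"person_id": 3, "hba1c": 6.1, "smoking_status": ""},
    {"person_id": 4, "smoking_status": "former"},
]

report = completeness(records, ["hba1c", "smoking_status"])
for field, fraction in report.items():
    print(f"{field}: {fraction:.0%} complete")
```

A field-level count like this says nothing about temporal completeness or correctness, so it is a first screen rather than a full data quality assessment.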
slide-18
SLIDE 18

Finding OTHER data sources to test hypotheses

  • The majority of de-identified data is not in any of these standards, but their own.
  • Multiple efforts to make these FAIR
  • Findable
  • Accessible
  • Interoperable
  • Reusable

WHERE TO LOOK? The LIBRARY!

  • Try the National Library of Medicine: https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html
  • Open source aggregation / metadata

slide-19
SLIDE 19

DataCite

slide-20
SLIDE 20

FigShare

slide-21
SLIDE 21

Data quality for found datasets

Found datasets

  • Completeness: Complete solely for its purpose
  • Correctness: Oddly poor
  • Currency: Frozen in time
  • Granularity: VARIES
  • Integration: Extremely difficult
  • Fitness for use (examples): Replication or focused exploratory data analysis for pilot data

slide-22
SLIDE 22

Thanks !

  • Lots of help from Nicole Weiskopf, PhD, who actually knows something about data quality

  • dorrd@ohsu.edu