WORKING WITH EHR DATA FROM DUKE UNIVERSITY HEALTH SYSTEM: WHAT IS IT
AND HOW DO I DO IT?
Benjamin A. Goldstein PhD, MPH ben.goldstein@duke.edu
Department of Biostatistics & Bioinformatics School of Medicine Duke University
May 13th, 2020
https://dashboard.healthit.gov/quickstats/quickstats.php
https://dashboard.healthit.gov/quickstats/pages/FIG-Vendors-of-EHRs-to-Participating-Professionals.php
Chronicles - ~95,000 Data Elements; Data stored immediately
Clarity - ~17,000 Tables & 125,000 columns; Data stored overnight
Caboodle - 19 Tables & 76 Dimensions
Disease Specific Registries
Analytic Tools
Data are collected for the study                         Data are collected for clinical care
Pre-planned study visits                                 Random clinical encounters
Research staff enter into CRFs                           Entered by clinicians
Same data for all patients                               Only information deemed important by clinician
Statistician pulls from REDCap                           Informaticist extracts data
Top-down - start with study and collect relevant data    Bidirectional design - start with study but assess available data
◮ Returning to Hospital 30 days after discharge    ◮ Implement alert for Readmissions Risk
◮ Compare Medical vs Surgical Treatment            ◮ Point of Care Randomization
◮ Experience of Incident Diabetics                 ◮ Screening for Diabetes
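As an illustration of how a question such as "returning to hospital 30 days after discharge" can be turned into a variable, the sketch below flags each encounter that is followed by another admission for the same patient within 30 days of discharge. The field names (`patient_id`, `admit`, `discharge`), the function name, and the example records are hypothetical and are not taken from any Duke data source; the exact readmission definition used in a real study would need to be agreed on by the study team.

```python
from collections import defaultdict
from datetime import date


def flag_readmissions(encounters, window_days=30):
    """Flag encounters followed by a new admission within window_days of discharge.

    `encounters` is a list of dicts with hypothetical keys
    'patient_id', 'admit', and 'discharge' (datetime.date values).
    """
    by_patient = defaultdict(list)
    for enc in encounters:
        by_patient[enc["patient_id"]].append(enc)

    flagged = []
    for encs in by_patient.values():
        # Order each patient's encounters by admission date.
        encs.sort(key=lambda e: e["admit"])
        for current, following in zip(encs, encs[1:]):
            gap = (following["admit"] - current["discharge"]).days
            flagged.append({**current, "readmitted": 0 <= gap <= window_days})
        # The last encounter has no later admission in the data.
        flagged.append({**encs[-1], "readmitted": False})
    return flagged


# Made-up example records.
encounters = [
    {"patient_id": "A", "admit": date(2019, 1, 1), "discharge": date(2019, 1, 5)},
    {"patient_id": "A", "admit": date(2019, 1, 20), "discharge": date(2019, 1, 22)},
    {"patient_id": "B", "admit": date(2019, 3, 1), "discharge": date(2019, 3, 4)},
    {"patient_id": "B", "admit": date(2019, 6, 1), "discharge": date(2019, 6, 2)},
]
results = flag_readmissions(encounters)
```

Note that an EHR only records admissions to the health system itself, so a patient readmitted elsewhere would be flagged as not readmitted under this sketch.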
[Figure: Clinical Research. Roles: Biostatistician, Informaticist, Epidemiologist/Clinician. Tasks: Data Manipulation, Study Design, Data Extraction, Analysis, Research Question, Variable Definition]
Richesson RL, Rusincovitch SA, Wixted D, Batch BC, Feinglos MN, Miranda ML, Hammond WE, Califf RM, Spratt SE. A Comparison of Phenotype Definitions for Diabetes Mellitus. J Am Med Inf Assoc 2013 (epub ahead of print). http://www.ncbi.nlm.nih.gov/pubmed/24026307
Criteria (columns): ICD-9 250.xx; ICD-9 250.x0 & 250.x2 (exclude type I); ICD-9 249.xx, 357.2, 362.0x, 366.41; Abnormal HbA1c, Glucose, OGTT; Diabetes Meds
Definitions (rows), with criteria marks as extracted:
ICD-9 250.xx: X
CMS CCW: X* X*
NYC A1c Registry: X
Meds: X
DDC: X X X X X X
SUPREME-DM: X* X* X X X X
(row label missing): X* X X X
* Distinction between Inpatient and Outpatient Visits
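To make two of the code-based criteria above concrete, here is a minimal sketch that checks a patient's diagnosis codes against "ICD-9 250.xx" and against "250.x0 & 250.x2 (exclude type I)". It is an illustration only, not the logic used by any of the listed definitions: the function name, the patient IDs, and the code lists are made up, and the sketch assumes codes are stored as five-digit strings with a decimal point.

```python
import re

# ICD-9 250.xx: any diabetes mellitus code.
ANY_DIABETES = re.compile(r"^250\.\d{2}$")
# ICD-9 250.x0 and 250.x2: fifth digit 0 or 2, which excludes the
# type I fifth digits (1 and 3).
EXCLUDING_TYPE_1 = re.compile(r"^250\.\d[02]$")


def classify(codes):
    """Return which of the two code-based criteria a list of codes meets."""
    return {
        "icd9_250_xx": any(ANY_DIABETES.match(c) for c in codes),
        "icd9_250_x0_x2": any(EXCLUDING_TYPE_1.match(c) for c in codes),
    }


# Hypothetical patients and their diagnosis codes.
patients = {
    "p1": ["250.00", "401.9"],
    "p2": ["250.01"],
    "p3": ["428.0"],
}
phenotypes = {pid: classify(codes) for pid, codes in patients.items()}
```

The same patient can meet one definition and fail another (here, `p2`), which is why the choice of phenotype definition matters for cohort size and composition.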
[Figure: Diabetes Validation Results faceted by Endpoint (panels: ANY, TYPE2, TYPE2unsp). x-axis: 1−Specificity (FPF); y-axis: Sensitivity (TPF). Legend, Authoritative Source: 250, A1C, CCW, DDC4, MED, NW, SUP, A1C_OR_MED]
Spratt et al. JAMIA 2017
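The two quantities plotted in a validation like this, Sensitivity (TPF) and 1−Specificity (FPF), can be computed from a phenotype's flags and a reference standard as in the sketch below. The function name and the example vectors are hypothetical; they are not data from the cited study.

```python
def sensitivity_and_fpf(predicted, truth):
    """Return (sensitivity, false positive fraction) for paired boolean lists."""
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)          # flagged, truly diabetic
    fn = sum(1 for p, t in pairs if not p and t)      # missed, truly diabetic
    fp = sum(1 for p, t in pairs if p and not t)      # flagged, not diabetic
    tn = sum(1 for p, t in pairs if not p and not t)  # not flagged, not diabetic
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one true case and one true non-case")
    sensitivity = tp / (tp + fn)  # TPF
    fpf = fp / (fp + tn)          # 1 - specificity
    return sensitivity, fpf


# Made-up example: 3 true cases and 5 true non-cases.
truth = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]
sens, fpf = sensitivity_and_fpf(predicted, truth)
```

Each phenotype definition yields one (FPF, TPF) point per endpoint, so definitions can be compared on the same axes.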
Weiskopf et al. EGEMS 2017
They share an EHR system with DUHS but special permission is needed to access their data.
PCORnet Data Model
Electronic Health Records Data
All patients
Structured data elements, curated to match a common data model
Metadata
Code base
Project 1
Project 2
Cohort
Department/Specialty Datamarts (e.g., Stork, Transplant, Cardiology)
Other datasets (e.g., claims data)
Data Sidecars
IRB approval
Geospatial Data
Access:
    DEDUCE: GUI Interface
    CRDM: Direct SQL Query
    ACE FFS: Work with Informaticist
Reproducibility:
    DEDUCE: Moderate (need to query in same way)
    CRDM: High
    ACE FFS: High (need to reengage informaticist)
Automatable Queries:
    DEDUCE: No
    CRDM: Yes
    ACE FFS: Yes
Data Elements:
    DEDUCE: Most Structured
    CRDM: PCORnet CDM + Duke Specific Side Cars
    ACE FFS: All Data Elements
Data Refresh:
    DEDUCE: Daily
    CRDM: Quarterly (Working towards daily)
    ACE FFS: Daily
Cost:
    DEDUCE: Free
    CRDM: Currently Free (Creating Cost Recovery Model)
    ACE FFS: $
Time to Access Data:
    DEDUCE: Always Available
    CRDM: Always Available
    ACE FFS: Variable
Ideal use:
    DEDUCE: Cohort creation w/out needing to code
    CRDM: Reproducible data extraction
    ACE FFS: Need to access “harder” data elements
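The "Direct SQL Query" route is what makes an extraction reproducible and automatable: the query is saved as code and can be rerun after each data refresh. The sketch below shows the idea against an in-memory SQLite database standing in for the real data. The table and column names (`diagnosis`, `patid`, `dx`, `dx_type`) are loosely modeled on the PCORnet CDM but are assumptions here, as are the example rows; the actual CRDM schema, database engine, and connection details are not described in these slides.

```python
import sqlite3

# Saved query: patients with any ICD-9 250.xx diagnosis.
# Table and column names are assumptions for illustration.
COHORT_QUERY = """
SELECT DISTINCT patid
FROM diagnosis
WHERE dx_type = '09' AND dx LIKE '250.%'
ORDER BY patid
"""


def extract_cohort(conn):
    """Run the saved cohort query and return the patient IDs as a list."""
    return [row[0] for row in conn.execute(COHORT_QUERY)]


# Stand-in database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diagnosis (patid TEXT, dx TEXT, dx_type TEXT)")
conn.executemany(
    "INSERT INTO diagnosis VALUES (?, ?, ?)",
    [
        ("P001", "250.00", "09"),
        ("P001", "401.9", "09"),
        ("P002", "428.0", "09"),
        ("P003", "250.02", "09"),
        ("P004", "E11.9", "10"),
    ],
)
cohort = extract_cohort(conn)
conn.close()
```

Keeping the query text under version control alongside the analysis code means the same cohort definition is applied every time, rather than relying on someone repeating the same steps in a GUI.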