Working with EHR Data from Duke University Health System: What Is It and How Do I Do It? Benjamin A. Goldstein PhD, MPH



slide-1
SLIDE 1

WORKING WITH EHR DATA FROM DUKE UNIVERSITY HEALTH SYSTEM: WHAT IS IT

AND HOW DO I DO IT?

Benjamin A. Goldstein PhD, MPH ben.goldstein@duke.edu

Department of Biostatistics & Bioinformatics School of Medicine Duke University

May 13th, 2020

1 / 74

slide-2
SLIDE 2

TALK AGENDA

What are Electronic Health Records
What are EHR data elements
Types of studies we can do with EHR data
Some analytic considerations with EHR data
A case study in an EHR based study
Options for accessing Duke EHR data

2 / 74

slide-3
SLIDE 3

WHAT ARE ELECTRONIC HEALTH RECORDS?

“An Electronic Health Record (EHR) is an electronic version of a patient’s medical history, that is maintained by the provider over time” (Centers for Medicare & Medicaid Services (CMS) website)
The HITECH Act was part of the 2009 stimulus and was geared to incentivize the use of EHRs
Synonyms: Electronic Medical Record (EMR), Patient Health Record (PHR)

3 / 74

slide-4
SLIDE 4

GROWTH OF EHR USAGE

https://dashboard.healthit.gov/quickstats/quickstats.php

4 / 74

slide-5
SLIDE 5

EHR VENDORS

https://dashboard.healthit.gov/quickstats/pages/FIG-Vendors-of-EHRs-to-Participating-Professionals.php

5 / 74

slide-6
SLIDE 6

FRONT END OF EHRS

6 / 74

slide-7
SLIDE 7

BACK END OF EHRS

7 / 74

slide-8
SLIDE 8

COMPLEXITY OF EPIC BACKEND

Chronicles - ~95,000 Data Elements
  • Data stored immediately
Clarity - ~17,000 Tables & 125,000 columns
  • Data stored overnight
Caboodle - 19 Tables & 76 Dimensions

Disease Specific Registries

  • Diabetes
  • Afib
  • Etc

Analytic Tools

  • Provider Dashboards
  • Predictive models
  • Etc

8 / 74

slide-9
SLIDE 9

DATA ELEMENTS

Patient Demographics
Encounters (Outpatient/Inpatient)
Diagnoses
Procedures
Lab Results
Vital Signs
Medications
Social History
Provider Information
Radiological Results
Doctor Notes

9 / 74

slide-10
SLIDE 10

DEMOGRAPHICS

Patients have a single ID that follows them across all encounters

  • medical record number (MRN)

Basic information that is typically static: age, sex, race/ethnicity
Time-varying elements include payer and address

10 / 74

slide-11
SLIDE 11

ENCOUNTER TYPE

Encounters have an encounter ID that links the encounter context to what happened (diagnoses, tests, etc.)
Three basic encounter types:
  • Outpatient (AV - Ambulatory Visit)
  • Inpatient (IP)
  • Emergency Department (ED)

Other types of encounters can include telephone consults, emails etc.
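As a rough illustration of how an encounter ID ties diagnoses back to a patient and an encounter type, here is a minimal sketch. The records and field names (`mrn`, `encounter_id`, `enc_type`, `icd10`) are made up for illustration and are not Duke's actual schema.

```python
from collections import defaultdict

# Hypothetical records; the field names are illustrative, not an actual EHR schema.
encounters = [
    {"encounter_id": "E1", "mrn": "P1", "enc_type": "AV"},
    {"encounter_id": "E2", "mrn": "P1", "enc_type": "ED"},
    {"encounter_id": "E3", "mrn": "P2", "enc_type": "IP"},
]
diagnoses = [
    {"encounter_id": "E1", "icd10": "E11.9"},
    {"encounter_id": "E2", "icd10": "I21.01"},
    {"encounter_id": "E2", "icd10": "E11.9"},
]


def diagnoses_by_patient(encounters, diagnoses):
    """Attach each diagnosis to its encounter, then group by patient (MRN)."""
    enc_lookup = {e["encounter_id"]: e for e in encounters}
    grouped = defaultdict(list)
    for dx in diagnoses:
        enc = enc_lookup.get(dx["encounter_id"])
        if enc is None:
            # A diagnosis with no matching encounter cannot be linked; skip it.
            continue
        grouped[enc["mrn"]].append((enc["enc_type"], dx["icd10"]))
    return dict(grouped)


linked = diagnoses_by_patient(encounters, diagnoses)
```

Patients with encounters but no recorded diagnoses (here `P2`) do not appear in the result, which is one small example of patients having different pieces of information.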

11 / 74

slide-12
SLIDE 12

CONTEXTUALIZING INFORMATION FOR ENCOUNTERS

When someone was seen (i.e. time stamps for arrival and departure)
Who the patient saw (i.e. provider specialty, provider type)
Where the patient was seen (i.e. clinic location, facility type)
What happened (i.e. vital signs, labs taken, diagnoses made)
We don’t have good information on why: diagnoses often don’t relate to the “chief complaint”

12 / 74

slide-13
SLIDE 13

INTERNATIONAL CLASSIFICATION OF DISEASES (ICD) CODES

Hierarchical system to code all diagnoses that are made during a health encounter
In 2015, the US switched to the ICD-10 system (previously ICD-9)
ICD-9 had ∼13,000 unique codes, ICD-10 has ∼68,000
Since these are used as billing codes, the codes can be manipulated to increase billing
Codes don’t always represent the primary concern

13 / 74

slide-14
SLIDE 14

STRUCTURE OF ICD-10 CODES

Myocardial Infarction: I21 ⇒ Acute Myocardial Infarction
  • Subsequent numbers designate location of event, e.g. I21.01 ⇒ MI of left main coronary artery
I22 ⇒ Subsequent MI
I23 ⇒ Complications of MI
https://www.icd10data.com/ICD10CM/Codes/I00-I99/I20-I25
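The hierarchy can be made concrete with a small sketch that splits a code into its three-character category and the remaining characters. The helper names `split_icd10` and `same_category` are hypothetical, and the function does no validation beyond length.

```python
def split_icd10(code):
    """Split an ICD-10-CM code into (3-character category, remaining characters)."""
    cleaned = code.strip().upper().replace(".", "")
    if len(cleaned) < 3:
        raise ValueError(f"too short to be an ICD-10 code: {code!r}")
    return cleaned[:3], cleaned[3:]


def same_category(code_a, code_b):
    """True when two codes share the same 3-character category (e.g. I21)."""
    return split_icd10(code_a)[0] == split_icd10(code_b)[0]


category, detail = split_icd10("I21.01")
```

Here `category` is `"I21"` and `detail` is `"01"`, so I21.01 and any other I21.x code fall in the same category while I22.x codes do not.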

14 / 74

slide-15
SLIDE 15

ROLLING UP ICD-CODES

Dealing with 68,000 unique codes is not realistic or efficient
Agency for Healthcare Research and Quality (AHRQ) developed the Clinical Classifications Software (CCS) system
Allows researchers to roll codes up to appropriate levels
https://www.hcup-us.ahrq.gov/toolssoftware/ccs/AppendixCMultiDX.txt
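Rolling up can be sketched as a prefix lookup. The `ROLLUP` table below is a tiny hypothetical stand-in, not the actual CCS mapping; a real analysis would load the AHRQ file instead.

```python
from collections import Counter

# Hypothetical roll-up table; a real analysis would load the AHRQ CCS mapping file.
ROLLUP = {
    "I21": "Acute myocardial infarction",
    "I22": "Acute myocardial infarction",
    "E11": "Diabetes mellitus",
}


def roll_up(code, table=ROLLUP, default="Other"):
    """Map a code to a broader group by trying its longest prefix first."""
    key = code.strip().upper().replace(".", "")
    # Prefixes shorter than 3 characters are never tried.
    for n in range(len(key), 2, -1):
        if key[:n] in table:
            return table[key[:n]]
    return default


counts = Counter(roll_up(c) for c in ["I21.01", "I22.0", "E11.9", "J45.909"])
```

Trying the longest prefix first means a more specific entry, if the table has one, takes precedence over its parent category.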

15 / 74

slide-16
SLIDE 16

CURRENT PROCEDURAL TERMINOLOGY (CPT) CODES

CPT is a coding system for what happened during an encounter, e.g., surgeries, x-rays, etc.
∼10,000 codes in use
Also tied to reimbursements
Similar systems exist for organizing CPTs as for ICDs

16 / 74

slide-17
SLIDE 17

MEDICATIONS

EPIC has > 100 medication-related tables
Medications are often organized as:
  • Prescribed
  • Administered
  • Reconciliation
For prescribed medications, dosages may be messy
Like diagnoses and procedures, medications can become overly granular
RxNorm is a system for rolling up medications into hierarchies
https://mor.nlm.nih.gov/RxNav/search?searchBy=String&searchTerm=acetaminophen

17 / 74

slide-18
SLIDE 18

LABORATORY TEST RESULTS

Laboratory tests (along with vitals) differentiate EHR data from administrative data
There may be multiple test panels in use, which can be labeled differently
Modern systems have standardized the nomenclature of laboratory tests
  • Duke has a catalog of the laboratory tests used: https://testcatalog.duke.edu/
Typically will see time stamps for when a test was ordered and resulted
An analytic concern is that these measurements are irregularly captured across encounters

18 / 74

slide-19
SLIDE 19

VITAL SIGNS

Most encounters will capture blood pressure, weight and temperature
In the hospital, vitals may be documented every couple of hours
ICU monitors can capture very dense data: minute-by-minute or even waveform
Data will typically be stored in long running tallies called “flowsheets”

19 / 74

slide-20
SLIDE 20

SOCIAL HEALTH

Data such as smoking status, drug and alcohol use, employment status, marital status, etc., may be reported but are frequently unreliable
Socioeconomic status typically doesn’t exist but proxies can be used via primary payer or neighborhood address
There is a growing emphasis on capturing patient reported outcomes (PROs)
  • Food insecurity, PROMIS, pain, depression inventories

20 / 74

slide-21
SLIDE 21

OTHER DATA ELEMENTS

Problem Lists

Date stamped indicators for when someone has different conditions

  • not always reliable

Admission-Discharge-Transfer (ADT) Data

Time stamps are recorded every time a patient moves in the hospital

Provider Data

Information on who a patient saw and interacted with

User Data

Every time someone signs into EPIC a log is generated

21 / 74

slide-22
SLIDE 22

UNSTRUCTURED DATA

Structured Data refer to quantitative data in a ready-to-analyze format
Growing emphasis on incorporating unstructured data, which require some processing
Examples include:
  • Notes
  • Images
  • Genetic data

22 / 74

slide-23
SLIDE 23

ORGANIZING DATA

DATA LAKES

Loose organization of data
Able to maintain all data elements
No explicit linkage between data elements
Can be complicated to work with

DATA MARTS

Structured data in a relational format
Easier to access data
Designed for particular use case(s)
Results in loss of information
Higher maintenance cost

23 / 74

slide-24
SLIDE 24

NEED FOR DATA MODELS

24 / 74

slide-25
SLIDE 25

PCORNET DATA MODEL

25 / 74

slide-26
SLIDE 26

WORKING WITH DATA MODELS

ADVANTAGES

Simpler data organization, making it easier to access
Uniform set of decisions so that data are consistent across institutions

DISADVANTAGES

A general loss of granularity
  • Not all data elements fit within the data model
  • Many measures are grouped together

26 / 74

slide-27
SLIDE 27

WHY WE WANT TO USE EHR DATA FOR CLINICAL RESEARCH

Data Readily Available
Often 100,000’s of Patients
Information collected over a variety of fields
Ability to study many different clinical questions
Representative population

27 / 74

slide-28
SLIDE 28

WHY WE MAY Not WANT TO USE EHR DATA FOR CLINICAL RESEARCH

DATA ARE NOT ORGANIZED FOR RESEARCH

Data exist in disparate places
All patients have different pieces of information
Observational Data

28 / 74

slide-29
SLIDE 29

EHR VS CLINICAL TRIALS DATA

| | RCT Data | EHR Data |
|---|---|---|
| Why are data collected? | Data are collected for the study | Data are collected for clinical care |
| When are data collected? | Pre-planned study visits | Random clinical encounters |
| Who/Where are data entered? | Research staff enter into CRFs | Entered by clinicians |
| What data are entered? | Same data for all patients | Only information deemed important by clinician |
| How are data extracted? | Statistician pulls from RedCap | Informaticist extracts data |
| How are studies designed? | Top-down - start with study and collect relevant data | Bidirectional design - start with study but assess available data |

29 / 74

slide-30
SLIDE 30

WHAT WE CAN DO WITH ELECTRONIC HEALTH RECORDS

1. Risk Prediction
  • Near term prediction - Risk of in-hospital mortality
  • Long(er) term risk - 30 Day Revisit
2. Population Health
  • Health Service Utilization - Assessment of high utilizers
  • Disease Epidemiology - Experience of incident diabetes in Durham County
3. Comparative Effectiveness Research (CER)
  • Retrospective Studies - Assessment of community intervention for diabetics
  • Prospective Studies - Point of care randomization, Pragmatic Trials
4. Association Analyses
  • Risk factors for disease - Phenome Wide Association Studies
  • Data mining - Drug-Drug interactions

30 / 74

slide-31
SLIDE 31

EHR BASED STUDIES

| | Retrospective | Prospective |
|---|---|---|
| Risk Prediction | Returning to Hospital 30 days after discharge | Implement alert for Readmissions Risk |
| Intervention Assessment | Compare Medical vs Surgical Treatment | Point of Care Randomization |
| Population Health | Experience of Incident Diabetics | Screening for Diabetes |

31 / 74

slide-32
SLIDE 32

ADVANTAGES OF STUDIES BASED ON EHR DATA

POPULATION HEALTH

EHRs often capture data on particular communities of interest
  • Estimated that ∼80% of Durham County residents receive health service at Duke University Health System affiliated providers

COMPARATIVE EFFECTIVENESS

Opportunity to see real-world usage of medical treatments
Can assess both adoption of therapies and effectiveness & safety of therapies

RISK PREDICTION

Contains granular information capturing patients’ clinical state
Direct pipeline to implement models into clinical practice

32 / 74

slide-33
SLIDE 33

WHO IS NEEDED FOR AN EHR BASED STUDY?

Diagram: Collaborative Clinical Research
  • Roles: Biostatistician, Informaticist, Epidemiologist/Clinician
  • Tasks: Data Manipulation, Study Design, Data Extraction, Analysis, Research Question, Variable Definition

33 / 74

slide-34
SLIDE 34

SUMMARY POINTS

EHR systems are complex database systems to store patient health data
For clinical researchers there are a lot of appealing reasons to want to work with EHR data
The raw data elements often need to be processed - typically through hierarchical structures
Organizing the data into data models can aid analytics

34 / 74

slide-35
SLIDE 35

Analytic Challenges with EHR Data

35 / 74

slide-36
SLIDE 36

CREATING ANALYTIC DATA SETS

Don’t work directly off datamarts but create analytic data sets
Creating analytic datasets requires many decisions, most of which can’t be tested via sensitivity analyses
Need to define granularity of the analysis (encounter, patient, etc.)
Larger projects may need datamart sub-extracts to generate multiple analytic data sets

36 / 74

slide-37
SLIDE 37

THREE WAYS EHR DATA DIFFER FROM TRADITIONAL CLINICAL DATA

1. We don’t have everything we want
2. Outcomes are not defined - need to phenotype data
3. Data are irregularly and potentially densely captured

37 / 74

slide-38
SLIDE 38

CHALLENGE 1: WE DON’T HAVE EVERYTHING WE WANT

Patients may seek care at multiple facilities
Changes in standard of care
Missing information on when individuals are healthy
Most social health information is not recorded or reliable
Cannot expect death to be reliably captured
  • Most people don’t die in the hospital
  • EHRs have only 20-50% sensitivity for capturing death

38 / 74

slide-39
SLIDE 39

LINKING EHR DATA

Data from other facilities (PCORNet)
Claims: Centers for Medicare & Medicaid Services (CMS)
Mortality: National Death Index (NDI) & Social Security Death Index (SSDI)
Genetic Data
Geocoded Information: housing, environmental, census
Personal Tracking Data: wearables, sensors

39 / 74

slide-40
SLIDE 40

MISSING DATA IN EHR DATA SETS

Data in EHRs are typically not missing but not collected
Data are Not Missing At Random (NMAR)
  • Typical imputation strategies would not be appropriate
  • We’ve termed this Informed Presence
Inclusion of proxy measures:
  • Missingness categories
  • Number of previous encounters
  • Number of times a lab was tested or code was documented
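One way to read the proxy-measure idea is to carry an explicit "was this ever measured" indicator and a measurement count alongside the value, rather than imputing. The sketch below is illustrative only; the function name and output keys are hypothetical.

```python
def lab_features(values):
    """Summarize one patient's results for a single lab test.

    `values` is a chronologically ordered list of numeric results (possibly
    empty). When the lab was never ordered, no value is imputed; the
    indicator and count record that fact instead.
    """
    n = len(values)
    return {
        "n_measured": n,             # how often the lab was tested
        "ever_measured": int(n > 0), # missingness category
        "last_value": values[-1] if n else None,
    }


never_tested = lab_features([])
tested_twice = lab_features([5.9, 7.1])
```

The count and indicator can then enter a model as covariates in their own right, since how often a clinician ordered a test may itself carry information about the patient's state.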

40 / 74

slide-41
SLIDE 41

ADDRESSING INCOMPLETENESS VIA DESIGN

Inpatient analyses are typically well contained but outpatient analyses can lead to loss to follow-up
Define local patient population
  • Live in the catchment of the health system
  • Require a certain number of primary care appointments before eligible for study
Contextual and proxy information can be linked in
  • Neighborhood for SES
  • Claims data for additional encounters
  • NDI/SSDI for death

41 / 74

slide-42
SLIDE 42

CHALLENGE 2: NEED FOR COMPUTABLE PHENOTYPES

EHR data do not have direct information on disease states
“Problem Lists” exist but are not always reliable
We can develop algorithms to define disease states
The algorithms typically have high positive predictive value and specificity but not always high sensitivity
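A computable phenotype is, at its simplest, a rule over coded data. The sketch below is an illustrative rule only and is not one of the published diabetes definitions: it flags a patient with any ICD-9 250.xx code, any HbA1c at or above 6.5% (the usual diagnostic cut-off), or any medication from a short, deliberately incomplete list. The function name and medication set are hypothetical.

```python
# Illustrative and deliberately incomplete; not a validated medication list.
DIABETES_MEDS = {"metformin", "insulin glargine"}


def has_diabetes(icd9_codes, hba1c_values, medications, a1c_threshold=6.5):
    """Flag a patient who meets at least one of three simple criteria."""
    has_code = any(code.startswith("250") for code in icd9_codes)
    has_lab = any(value >= a1c_threshold for value in hba1c_values)
    has_med = any(med.lower() in DIABETES_MEDS for med in medications)
    return has_code or has_lab or has_med


flag = has_diabetes(icd9_codes=["401.9"], hba1c_values=[7.2], medications=[])
```

Combining criteria with "or" tends to raise sensitivity at the cost of specificity (metformin, for example, is also prescribed for other conditions), while requiring several criteria at once does the reverse.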

42 / 74

slide-43
SLIDE 43

ISSUES OF DATA DEFINITION: WHAT IS A DIABETIC?

Richesson RL, Rusincovitch SA, Wixted D, Batch BC, Feinglos MN, Miranda ML, Hammond WE, Califf RM, Spratt SE. A Comparison of Phenotype Definitions for Diabetes Mellitus. J Am Med Inf Assoc 2013 (epub ahead of print). http://www.ncbi.nlm.nih.gov/pubmed/24026307

43 / 74

slide-44
SLIDE 44

ISSUES OF DATA DEFINITION: WHAT IS A DIABETIC?

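Table: criteria used by each diabetes phenotype definition
Criteria (columns): ICD-9 250.xx; ICD-9 250.x0 & 250.x2 (exclude type I); Expand. ICD-9 (249.xx, 357.2, 362.0x, 366.41); HbA1c; Glucose; Abnormal OGTT; Diabetes Meds
Definitions (rows), with marks in the order extracted from the slide:
  • ICD-9 250.xx: X
  • CMS CCW: X* X*
  • NYC A1c Registry: X
  • Meds: X
  • DDC: X X X X X X
  • SUPREME-DM: X* X* X X X X
  • eMERGE: X* X X X
* Distinction between Inpatient and Outpatient Visits

44 / 74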

slide-45
SLIDE 45

DEFINITION DIFFERENCES

Figure: Diabetes Validation Results faceted by Endpoint (panels: ANY, TYPE2, TYPE2unsp)
  • x-axis: 1−Specificity (FPF); y-axis: Sensitivity (TPF)
  • Legend (Authoritative Source): 250, A1C, CCW, DDC4, MED, NW, SUP, A1C_OR_MED

Spratt et al. JAMIA 2017
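The axes in a validation plot of this kind come straight from a 2x2 comparison of the phenotype against an authoritative source. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
def validation_metrics(predicted, truth):
    """Compare phenotype flags against a reference standard.

    Both arguments are equal-length sequences of booleans. A metric whose
    denominator is zero is returned as NaN.
    """
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    tn = sum(1 for p, t in pairs if not p and not t)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),          # true positive fraction (TPF)
        "specificity": ratio(tn, tn + fp),
        "false_positive_fraction": ratio(fp, fp + tn),  # 1 - specificity (FPF)
        "ppv": ratio(tp, tp + fp),
    }


metrics = validation_metrics(
    predicted=[True, True, False, False, False],
    truth=[True, False, True, False, False],
)
```

In this toy input there is one true positive, one false positive, one false negative and two true negatives, so sensitivity and PPV are both 0.5 and specificity is 2/3.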

45 / 74

slide-46
SLIDE 46

ADDITIONAL PHENOTYPING CHALLENGES

Death: Need to include external data
Disease Incidence: Need to apply ‘burn-in’ periods
Censoring: Need to apply ‘burn-out’ periods
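One possible reading of burn-in and burn-out periods is sketched below. The 365-day defaults and both function names are assumptions made for illustration; the slides do not specify window lengths or exact rules.

```python
from datetime import date


def is_incident(first_encounter, first_diagnosis, burn_in_days=365):
    """Treat a diagnosis as incident only if the patient was already under
    observation for at least `burn_in_days` before it was first recorded."""
    return (first_diagnosis - first_encounter).days >= burn_in_days


def censor_date(last_encounter, data_end, burn_out_days=365):
    """If the patient had no contact during the final `burn_out_days` of the
    extract, censor at the last encounter; otherwise follow to the data end."""
    if (data_end - last_encounter).days > burn_out_days:
        return last_encounter
    return data_end


incident = is_incident(date(2014, 1, 1), date(2016, 3, 1))
censored_at = censor_date(date(2017, 1, 1), date(2019, 12, 31))
```

Without the burn-in check, a patient whose first-ever visit carries a diabetes code would be counted as a new case even though the condition may be long-standing.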

46 / 74

slide-47
SLIDE 47

CHALLENGE 3: DATA ARE BOTH LONGITUDINAL AND CROSS-SECTIONAL

EHR data consist of cross-sections of longitudinal data
  • Most data are stored in datamarts that cover fixed periods of time
Need to use methods for longitudinal data to model updating exposures
  • We most often use time-varying Cox Models
  • Most analyses don’t account for a patient’s trajectory - just most recent value
Since data are a cross-section, there is no notion of time 0
  • Define “burn-in” periods to define eligibility
  • Use “burn-out” periods to define censoring
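Time-varying Cox models are usually fit to data in a start/stop ("counting process") layout, where each row covers the interval over which a covariate value was the most recent one. The sketch below builds that layout from irregular measurements; it assumes times are days since time 0 and are distinct, and the function name is hypothetical.

```python
def to_intervals(measurements, end_time, event):
    """Convert (time, value) measurements into (start, stop, value, event) rows.

    Each value is carried forward until the next measurement. Only the final
    interval can carry the event; measurements at or after `end_time` are
    ignored.
    """
    ordered = sorted(m for m in measurements if m[0] < end_time)
    rows = []
    for i, (start, value) in enumerate(ordered):
        is_last = i + 1 == len(ordered)
        stop = end_time if is_last else ordered[i + 1][0]
        rows.append((start, stop, value, int(event and is_last)))
    return rows


rows = to_intervals([(0, 120), (30, 135), (90, 150)], end_time=100, event=True)
```

This carries only the most recent value forward, which mirrors the point above: the rows say nothing about the trajectory that led to each value.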

47 / 74

slide-48
SLIDE 48

MULTIPLE MEASUREMENTS PER PERSON

OPPORTUNITIES

Get to observe patient’s evolving health status
More frequent visits than a typical longitudinal study
Denser visit information

CHALLENGES

Visits are irregularly spaced
Different ways to aggregate
You don’t know which data are not captured in your dataset
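To illustrate that there are different ways to aggregate, the sketch below summarizes the same irregular measurements several ways within a lookback window. The function name, the choice of summaries and the window length are all hypothetical.

```python
from statistics import mean


def aggregate(measurements, index_time, lookback):
    """Summarize (time, value) pairs with index_time - lookback <= time <= index_time."""
    window = sorted(
        (t, v) for t, v in measurements
        if index_time - lookback <= t <= index_time
    )
    if not window:
        return {"n": 0, "last": None, "mean": None, "max": None}
    values = [v for _, v in window]
    return {
        "n": len(values),
        "last": values[-1],   # most recent value in the window
        "mean": mean(values),
        "max": max(values),
    }


summary = aggregate([(1, 7.0), (50, 8.0), (400, 6.0)], index_time=60, lookback=90)
```

The measurement at day 400 falls after the index time and is excluded, so the summary rests on two values; whether "last", "mean" or "max" is the right covariate depends on the clinical question.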

48 / 74

slide-49
SLIDE 49

ASSESSING DATA QUALITY

Weiskopf et al. EGEMS 2017

49 / 74

slide-50
SLIDE 50

TAKE HOMES

Turning EHR data into analytic data is an involved process that requires many choices
When analyzing data we may sit downstream of these choices and not get to test their impact
The temporal structure of EHR data opens up the need for different analytic methods

50 / 74

slide-51
SLIDE 51

CASE STUDY: ENVIRONMENTAL IMPACTS ON ASTHMA EXACERBATION IN CHILDREN

Research Question: How do the built and natural environments impact children with asthma?
Study Population: Children (ages 5-18) who have a diagnosis of asthma and live in Durham County
Exposure: Weather, air quality, distance to highway, etc.
Outcome: Asthma exacerbations
Data Source: Duke EHR data linked with publicly available temporal-spatial data

51 / 74

slide-52
SLIDE 52

ANALYTIC STEPS

1. Abstract all children with asthma within Duke EHR
  • Defined computable phenotype based on diagnosis codes and medication prescription
2. Identify period of time children had address in Durham County
3. Capture asthma exacerbations based on encounter type (e.g., outpatient, urgent care, ED, inpatient), diagnosis code and prescription for rescue medication
4. Link in exposure data based on patient address (spatial factors) and dates of service (temporal factors)
5. Extract other clinical data such as comorbidities, BMI, and laboratory values

Data elements used: Demographics, Diagnoses, Medications, Laboratory Test, Vitals, Address, Service Location, Geo-Coded Data
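The linkage in step 4 can be sketched as a two-part lookup: find where the patient lived on the date of service, then look up the exposure for that place and date. Everything below (tract labels, the `pm25` field, the data values, the function names) is hypothetical and only illustrates the shape of the join, not the study's actual pipeline.

```python
from datetime import date

# Hypothetical daily exposure values keyed by (census tract, date).
exposure = {
    ("tract_A", date(2019, 7, 1)): {"pm25": 12.1},
    ("tract_A", date(2019, 7, 2)): {"pm25": 30.4},
}

# Hypothetical address history per patient: (tract, first day, last day).
address_history = {
    "P1": [("tract_A", date(2019, 1, 1), date(2019, 12, 31))],
}


def tract_on(patient_id, day):
    """Return the tract the patient lived in on `day`, or None if unknown."""
    for tract, start, end in address_history.get(patient_id, []):
        if start <= day <= end:
            return tract
    return None


def exposure_for(patient_id, day):
    """Spatial link (address to tract) followed by temporal link (date of service)."""
    tract = tract_on(patient_id, day)
    if tract is None:
        return None
    return exposure.get((tract, day))


linked = exposure_for("P1", date(2019, 7, 2))
```

Keeping the address history as dated intervals matters because address is a time-varying element: a child who moved during follow-up should pick up the exposure of the tract they lived in at the time of each encounter.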

52 / 74

slide-53
SLIDE 53

DISTANCE TO HIGHWAY AND ASTHMA EXACERBATION

53 / 74

slide-54
SLIDE 54

PREDICTING DAILY EXACERBATIONS BASED ON ENVIRONMENTAL FACTORS

54 / 74

slide-55
SLIDE 55

OTHER ANALYSES

Protective effect of Well-Child visits
Factors associated with medication escalation
Built environment’s impact on asthma exacerbation
Impact of COVID exposure and stay at home order on kids with asthma

55 / 74

slide-56
SLIDE 56

Accessing EHR Data at Duke

56 / 74

slide-57
SLIDE 57

EHR DATA AT DUKE

Duke University Health System (DUHS) consists of 3 hospitals (2 in Durham, 1 in Raleigh) and a network of outpatient and specialty clinics
The EHR system is managed by Duke Health Technology Solutions (DHTS)
  • They are responsible for meeting both operations and research data needs
Duke finished switching to an integrated EPIC based system in August 2013
Before this, different departments had their own EHR systems
Legacy (pre-2014) data are available but may be less reliable

57 / 74

slide-58
SLIDE 58

EHR DATA ARE Big DATA

Since 2014 there have been:
  • >1.7 million Unique Patients
  • >400,000 Inpatient Encounters
  • >27 million Outpatient Encounters
These patients each have diagnoses, vital signs, labs, medication orders, etc.

58 / 74

slide-59
SLIDE 59

POPULATION HEALTH AT DUKE

DUHS is the primary provider in Durham County
Estimated that ∼80% of Durham County residents get health services at DUHS
One “hole” is Lincoln Community Health Clinics, which serves an underserved population
  • They share an EHR system with DUHS but special permission is needed to access their data.
Durham Neighborhood Compass is a CTSI initiative to use Duke EHR data to inform about public health in Durham County
https://compass.durhamnc.gov/en/compass/DIABETES_TOTAL/tract

59 / 74

slide-60
SLIDE 60

THE NEED FOR PRE-REQUIREMENTS
With Great Power Comes Great Responsibility

Most forms of EHR data contain Protected Health Information (PHI)
While data are available for minimal risk research purposes, it is important to protect the identity of patients, many of whom are members of the local community
Data can exist as:
  • Fully Identified - names, DOB, Address, etc.
  • Limited Data - minimal PHI (e.g. Dates of Service)
  • Deidentified - no PHI

One should use the minimal amount of PHI necessary for research purposes

60 / 74

slide-61
SLIDE 61

SOME PRE-REQUIREMENTS

CITI IRB Training
An active IRB protocol approved by the School of Medicine
Data stored in a secured server (PACE)
Depending on degree of direct access, a DHE (Duke Health) Account

61 / 74

slide-62
SLIDE 62

PROTECTED ANALYTICS COMPUTING ENVIRONMENT (PACE)

HIPAA/FISMA Compliant Virtual Machine (Windows and Linux environments)
You can easily get data in but need to go through an honest broker to get data out
Cannot connect to the internet but is preloaded with R, Python, SAS, GIS, etc.
Each user has resources equivalent to a laptop
  • Can connect to GPU Machines through Microsoft Azure
Directory structure is based off the approved IRB protocol - effective for project teams to share data and code
https://pace.ori.duke.edu/

62 / 74

slide-63
SLIDE 63

3 POINTS OF ACCESS (IN REALITY MORE)

Self Service

  • GUI Based Tools (DEDUCE)
  • Code Based Tools (CRDM)

Expert Assisted Access - ACE Fee For Service

63 / 74


slide-65
SLIDE 65

DEDUCE

GUI based system to query data - easiest way to access data
Build “cohort” via hierarchical queries and indicate which elements to extract
Order of operations can impact the way the cohort is built
Because it is GUI based, jobs cannot be “run” repeatedly
Data go back to the ’90s but data pre-2014 (pre-EPIC data) may be less reliable

64 / 74

slide-66
SLIDE 66

CLINICAL RESEARCH DATAMART (CRDM)

Newer offering - launched earlier this year with support from CHDI, TDH, CTSI, DHTS
https://sites.duke.edu/crdm
Designed to support clinical research and development of clinical registries
Meets the principles that:
  • Data pulls need to be reproducible
  • Code based access should be available to a broader range of analysts
  • Most data queries don’t need the most up-to-date data
  • Most studies use many of the same data elements (ICDs, Labs, Medications, etc.)
  • Data should be linkable to other data assets such as department specific datamarts and publicly available contextual data (e.g. neighborhood data, weather etc.)
Active development of new data tables to meet researcher needs, e.g. family linkage table

65 / 74

slide-67
SLIDE 67

ORGANIZATION OF CRDM

Data organized into an extension of the PCORnet Common Data Model (CDM)
  • PCORnet CDM contains most standard data elements (labs, medication, diagnoses, etc.)
  • Added additional tables such as providers, encounter details etc.
  • Most useful when you don’t need details of the hospital encounters (e.g. no patient bed flow)
Only contains data starting in 2014 (when Duke switched to EPIC system)
Data are refreshed and QA’d quarterly (working on daily refresh)
Data are in an Oracle database that can be accessed from within PACE via R/Python/SAS
  • Because the system is code based, data pulls are reproducible
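What "reproducible, code based" access can look like is sketched below with a parameterized query kept in a function. This is not the CRDM's actual interface: an in-memory SQLite database stands in for the Oracle database, the table and column names (`ENCOUNTER`, `DIAGNOSIS`, `PATID`, `ENCOUNTERID`, `ENC_TYPE`, `DX`) follow PCORnet CDM naming as an assumption and should be checked against the CRDM documentation, and the rows are made up.

```python
import sqlite3

# Table and column names are assumed to follow PCORnet CDM conventions.
COHORT_SQL = """
SELECT DISTINCT d.PATID
FROM DIAGNOSIS d
JOIN ENCOUNTER e ON e.ENCOUNTERID = d.ENCOUNTERID
WHERE d.DX LIKE ? AND e.ENC_TYPE = ?
ORDER BY d.PATID
"""


def pull_cohort(conn, dx_prefix, enc_type):
    """Return patient IDs with a diagnosis code starting with `dx_prefix`
    recorded at an encounter of type `enc_type`."""
    cur = conn.execute(COHORT_SQL, (dx_prefix + "%", enc_type))
    return [row[0] for row in cur.fetchall()]


# In-memory stand-in with made-up rows, so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ENCOUNTER (ENCOUNTERID TEXT, PATID TEXT, ENC_TYPE TEXT);
CREATE TABLE DIAGNOSIS (ENCOUNTERID TEXT, PATID TEXT, DX TEXT);
INSERT INTO ENCOUNTER VALUES ('E1', 'P1', 'ED'), ('E2', 'P2', 'AV'), ('E3', 'P3', 'ED');
INSERT INTO DIAGNOSIS VALUES ('E1', 'P1', 'J45.901'), ('E2', 'P2', 'J45.20'), ('E3', 'P3', 'I21.01');
""")

asthma_ed = pull_cohort(conn, "J45", "ED")
```

Because the query lives in version-controllable code with its parameters passed explicitly, the same pull can be rerun after each data refresh. The placeholder syntax (`?` here) depends on the database driver in use.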

66 / 74

slide-68
SLIDE 68

CRDM STRUCTURE

Diagram components:
  • Electronic Health Records Data: all patients
  • PCORnet Data Model: structured data elements, curated to match a common data model
  • Metadata
  • Code base
  • Data Sidecars
  • Department/Specialty Datamarts (e.g., Stork, Transplant, Cardiology)
  • Other datasets (e.g., claims data)
  • Geospatial Data
  • IRB approval
  • Cohort
  • Project 1, Project 2

67 / 74

slide-69
SLIDE 69

ACCESSING THE CRDM

1. Obtain IRB/RPRR approval to access EHR data
2. Get PACE account
3. Fill out RedCap to register project and get initial access: https://redcap.duke.edu/redcap/surveys/?s=CFKLE9EKLY
4. Set up your initial Database Connection within PACE

68 / 74

slide-70
SLIDE 70

SOME CRDM USE CASES

Population Health

  • Mapping pediatric diseases in Durham County
  • Developing a phenotype for patients with NASH & NAFLD

Comparative Effectiveness

  • Evaluation of opioid prescriptions in post-surgical setting
  • Use and effectiveness of biologic therapies for patients with asthma

Epidemiological Studies

  • Multifactorial analysis of pediatric asthma outcomes
  • Lipoprotein A testing in patients with cardiovascular disease risk

Predictive Modeling

  • Developing a risk score for 30-day readmissions
  • Prediction of healthcare utilization for patients with Type 2 diabetes

Registries

  • Health outcomes and healthcare utilization patterns in pediatric patients with epilepsy
  • Growth trajectories in pediatric patients who are obese

69 / 74

slide-71
SLIDE 71

ACE FEE-FOR-SERVICE

Component of Duke Health Technology Solutions (DHTS)
Custom data pulls and data services (e.g. building out datamarts, APIs etc.)
Cost and time are largely dependent on the complexity of the data domains requested
Most useful when you need unstructured data elements or detailed hospital flow data

70 / 74

slide-72
SLIDE 72

COMPARING SERVICES

| | DEDUCE | CRDM | ACE FFS |
|---|---|---|---|
| Access | GUI Interface | Direct SQL Query | Work with Informaticist |
| Reproducibility | Moderate (need to query in same way) | High | High (need to reengage informaticist) |
| Automatable Queries | No | Yes | Yes |
| Data Elements | Most Structured | PCORnet CDM + Duke Specific Side Cars | All Data Elements |
| Data Refresh | Daily | Quarterly (Working towards daily) | Daily |
| Cost | Free | Currently Free (Creating Cost Recovery Model) | $ |
| Time to Access Data | Always Available | Always Available | Variable |
| Ideal use | Cohort creation w/out needing to code | Reproducible data extraction | Need to access “harder” data elements |

71 / 74

slide-73
SLIDE 73

NAVIGATING THE SYSTEM: DUKE DATASHARE CATALOG

https://medschool.duke.edu/research/data-science-information-technology/data/data-services-catalog

72 / 74

slide-74
SLIDE 74

NAVIGATING THE SYSTEM: EHR ENABLED RESEARCH SUPPORT GROUP (EERS)

Investigators can request a free consultation to review the investigator’s request in order to:
  • Define a technical approach for a project timeline
  • Connect the investigator with Duke resources
  • Refine budget estimates
https://ctsi.duke.edu/ehr-enabled-research-support-eers

73 / 74

slide-75
SLIDE 75

CONCLUSIONS

The Duke EHR system contains a wealth of information on patient health
The data present opportunities to study population health within Durham County
EHR data can be quite complex both structurally and epidemiologically
There are multiple ways to access EHR data at Duke

74 / 74