Environmental Health Science Data Streams: Health Data - PowerPoint PPT Presentation


Brian S. Schwartz, MD, MS. January 10, 2013. When is a data stream not a data stream? When it is health data.


SLIDE 1

Environmental Health Science Data Streams

Health Data

Brian S. Schwartz, MD, MS
January 10, 2013

SLIDE 2

When is a data stream not a data stream? When it is health data.

EHR data = PHI of health system

“Data stream”

IRB approval, data pull (by IT), data transfer (to researchers), data cleaning, variable creation (“phenotyping” of patients), data merging, environmental metrics, data analysis (computationally intensive – person, place, time)

SLIDE 3

Using EHR Data: An Example

  • Using longitudinal EHR data, how do we know if a patient has diabetes?
  • When does observation of the patient begin?

– With EHR data cannot determine “enrollment” (health plan data)

  • When did the patient’s diabetes begin?
  • How do we distinguish type 1 from type 2?
  • What does it mean if a HbA1c level exists, or diabetes treatment began, before any ICD-9 code for diabetes?
  • How do we define diabetes severity?
  • How do we avoid confounding by indication?

– In observational studies of drug effects, drugs are not assigned randomly; indication for treatment may be related to risk of future health outcomes
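
One way to make the onset question concrete is to look for the earliest piece of diabetes evidence in a patient's record and compare it with the first ICD-9 diabetes code. The sketch below is illustrative only and is not the presenter's phenotyping method: the `Event` record, the 6.5% HbA1c cut-off, and the two-drug list are assumptions chosen for the example, and medication evidence in particular is crude because such drugs can be prescribed for other conditions. It does not address type 1 versus type 2, severity, or when observation begins.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

HBA1C_CUTOFF = 6.5                         # percent; assumed threshold
DIABETES_DRUGS = {"metformin", "insulin"}  # hypothetical, incomplete list


@dataclass
class Event:
    day: date
    kind: str    # "dx" = ICD-9 code, "lab" = HbA1c result, "rx" = medication
    value: str   # the code, the HbA1c percent, or the drug name


def is_diabetes_evidence(event: Event) -> bool:
    """Return True if this single event suggests diabetes."""
    if event.kind == "dx":
        return event.value.startswith("250")  # ICD-9 250.xx, diabetes mellitus
    if event.kind == "lab":
        return float(event.value) >= HBA1C_CUTOFF
    if event.kind == "rx":
        return event.value.lower() in DIABETES_DRUGS
    return False


def earliest(events: List[Event], kind: Optional[str] = None) -> Optional[Event]:
    """Earliest diabetes evidence, optionally restricted to one event kind."""
    hits = [e for e in events
            if is_diabetes_evidence(e) and (kind is None or e.kind == kind)]
    return min(hits, key=lambda e: e.day) if hits else None


# Made-up record: lab and treatment both precede the first diagnosis code.
record = [
    Event(date(2008, 3, 1), "lab", "7.2"),
    Event(date(2008, 4, 15), "rx", "Metformin"),
    Event(date(2008, 9, 10), "dx", "250.00"),
]
first_any = earliest(record)
first_code = earliest(record, kind="dx")
lag_days = (first_code.day - first_any.day).days
print(first_any.kind, lag_days)
```

In this made-up record the elevated HbA1c precedes the first diagnosis code by several months, which is the situation the slide asks about: a code-only definition would date onset too late.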

SLIDE 4

The Natural History of Diabetes

[Figure: timeline of the natural history of diabetes]

Stages: Healthy, Pre-diabetes, Diabetes, Complications

Timeline markers: 100 mg/dL ≤ FBS ≤ 125 mg/dL; HbA1c (screening); ICD-9 code; HbA1c (monitoring); Rx

Mean durations shown: 159 d, 117 d, 1534 d

HbA1c measurement points: HbA1c (pre-therapeutic); 1st ICD-9 diabetes code; HbA1c (post-ICD-9); HbA1c (last-ever)

n = 7337, mean = 7.51%; n = 17,959, mean = 7.64%; mean duration = 1732 days

SLIDE 5

NIH Research Collaboratory

Council of Councils Meeting, July 1, 2010

NIH-HMORN Collaboratory: Common Fund Proposal

Purpose: The NIH-HMORN Collaboratory will enhance and strengthen a research platform to accelerate large epidemiology studies, pragmatic clinical trials, and EHR-enabled health care delivery research by leveraging the HMORN's scientific, data and operational infrastructure.

  • Limited competition U54 RFP released 2-17-2011, then changed.
  • Duke Clinical Research Institute awarded $9M from NIH to serve as the Coordinating Center for NIH’s Health Care Systems Research Collaboratory – 9/25/12 press release

– “The goal of the Collaboratory is to involve clinicians and patients in the design and interpretation of trials, provide the education needed to enhance the value of their participation, and use the data collected during healthcare delivery as the core data source for the full spectrum of clinical research, from registries to observational studies and pragmatic randomized controlled trials.”

SLIDE 6

Virtual Data Warehouse

  • NIH-HMORN Collaboratory included a goal to develop a VDW

– “The objectives of this initiative are to improve data quality; enable cross-site and cross project synergies; balance site-, project-, and network-level priorities; and reduce the preparatory work needed to assemble cohorts, count events, and capture exposure and co-morbidity data, all in support of an array of different types of studies.”

  • HMORN members have been working on a VDW
  • An internal website provides metadata (years, variable descriptions, labels, formats, definitions, specification, coding)
  • HMORN has developed guidelines and policies to facilitate research, but control resides at each member site
  • Efforts to write programs to extract & convert variables stored in legacy information systems to common standards; test standardized data for consistency & accuracy; standardize methods by providing macros & programs that are used across sites; provide instructions on how to use VDW to create analytic files for research
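
The extract, convert, and test steps in the last bullet can be pictured with a very small sketch. Everything here is hypothetical: the legacy codes, the target coding ("M", "F", "U"), and the function names are invented for illustration and are not taken from the VDW specification.

```python
# Hypothetical mapping from site-local legacy codes to an assumed common coding.
LEGACY_GENDER_MAP = {"1": "M", "2": "F", "male": "M", "female": "F"}
ALLOWED = frozenset({"M", "F", "U"})  # "U" = unknown (assumed convention)


def standardize_gender(raw) -> str:
    """Convert one legacy value to the common coding; unmapped values become 'U'."""
    key = str(raw).strip().lower()
    return LEGACY_GENDER_MAP.get(key, "U")


def inconsistent_rows(rows):
    """Consistency test: return rows whose standardized value is not allowed."""
    return [r for r in rows if r["gender"] not in ALLOWED]


legacy_rows = [{"id": "p1", "gender": " Male"}, {"id": "p2", "gender": 2},
               {"id": "p3", "gender": "?"}]
standard_rows = [{**r, "gender": standardize_gender(r["gender"])}
                 for r in legacy_rows]
print(standard_rows, inconsistent_rows(standard_rows))
```

Sharing one conversion function and one consistency check across sites is the point of the "macros & programs" mentioned above: each site keeps its own legacy data but applies the same rules.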
SLIDE 7

The VDW

  • Not a centralized data warehouse; it consists of parallel, identical databases at each HMORN site, to facilitate merging across sites
  • It is not an analytic dataset, but does facilitate creation of such
  • As of March 2011, VDW data domains include:

– Demographics: date of birth, gender, race and ethnicity
– Enrollment: health plan membership enrollment, with insurance types, benefits, effective dates of coverage
– Encounters: OPT, IPT, with associated diagnosis and procedure codes, type of encounter, provider seen, facility and discharge disposition
– Procedures: performed procedures (e.g., surgery, lab, radiology, immunization); various coding systems (CPT, HCPCS, ICD-9, insurance claims Revenue Codes)
– Diagnoses: dates, diagnosis codes, provider
– Providers: specialty, age, gender, race and year graduated
– Cancer/Tumor Registry: Surveillance, Epidemiology and End Results (SEER) program standards – most complex domain of VDW
– Pharmacy Dispensing: date, National Drug or GPI code, therapeutic class, days supply, and amount dispensed
– Vital Signs: height, weight, blood pressure, tobacco use and type
– Laboratory Values: originally HbA1c, S-Cr, INR, FBG, serum K; values are being added through a timed priority list of 57 types of lab tests
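
Because every site holds a database with the same structure, one query can be run unchanged at each site and only the aggregate results combined. The sketch below illustrates that idea with in-memory SQLite databases standing in for two sites; the `diagnoses` table, its columns, and the rows are invented for the example and are not the actual VDW schema.

```python
import sqlite3

# Assumed, simplified table definition shared by every site.
SCHEMA = "CREATE TABLE diagnoses (patient_id TEXT, dx_date TEXT, dx_code TEXT)"


def make_site_db(rows):
    """Build one site's database with the shared schema and its own rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO diagnoses VALUES (?, ?, ?)", rows)
    return conn


def count_patients_with_code(conn, prefix):
    """Same query at every site: distinct patients with a matching code."""
    cur = conn.execute(
        "SELECT COUNT(DISTINCT patient_id) FROM diagnoses WHERE dx_code LIKE ?",
        (prefix + "%",),
    )
    return cur.fetchone()[0]


# Made-up rows; each site keeps its own patient-level data.
sites = {
    "site_a": make_site_db([
        ("a1", "2010-01-05", "250.00"),
        ("a1", "2010-06-01", "250.02"),
        ("a2", "2010-02-11", "401.9"),
    ]),
    "site_b": make_site_db([
        ("b1", "2011-03-09", "250.00"),
        ("b2", "2011-04-20", "250.40"),
    ]),
}
per_site = {name: count_patients_with_code(conn, "250")
            for name, conn in sites.items()}
total = sum(per_site.values())
print(per_site, total)
```

Only the per-site counts leave each database here, which mirrors the slide's point that the VDW is not centralized while still making cross-site merging straightforward.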

SLIDE 8

In multisite studies, site-level differences in disease incidence, predictive variables, and health outcomes can represent:

  • True “small area” variation in practice patterns & outcomes
  • Variability in data collection methods across sites

Data quality assessments across sites are a critical first step in multisite studies

Kahn, et al., Medical Care, 2012
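
A first-pass data quality check of the kind described above might compare each site's rate with the pooled rate and flag large departures for review. This is a minimal sketch, not the method of the cited paper: the counts are invented, and the 0.5 to 2.0 ratio band is an arbitrary assumption.

```python
def site_rates(counts):
    """counts: {site: (cases, patients)} -> {site: cases per 1,000 patients}."""
    return {site: 1000 * cases / n for site, (cases, n) in counts.items()}


def flag_outliers(counts, low=0.5, high=2.0):
    """Flag sites whose rate, relative to the pooled rate, falls outside [low, high]."""
    total_cases = sum(cases for cases, _ in counts.values())
    total_patients = sum(n for _, n in counts.values())
    pooled = 1000 * total_cases / total_patients
    flags = {}
    for site, rate in site_rates(counts).items():
        ratio = rate / pooled
        if ratio < low or ratio > high:
            flags[site] = round(ratio, 2)
    return flags


# Made-up counts: (diabetes cases, patients) per site.
counts = {"site_a": (80, 1000), "site_b": (90, 1000), "site_c": (10, 1000)}
print(flag_outliers(counts))
```

A flag says only that a site differs; it cannot tell true small-area variation from a data collection artifact, so flagged sites still need manual review.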

SLIDE 9

1) Type of data

  • HEALTH data from electronic health records
  • At Geisinger, 400K+ primary care patients, hundreds of millions of records; many kinds of health information

2) What is the current status of data collection/archiving?

  • Most patient health information will be collected electronically in the coming years
  • There is no single repository for US health data
  • Health systems most often use programs such as Epic; they then export data from Epic to a data warehouse for easier access; and export from the warehouse for analysis
  • There is no centralized warehouse; there are mechanisms for gaining access; there is no centralized catalog

3) Non-technical aspects of sharing

  • These data are not public; they can be accessed after agreements are in place, most often in collaborative research relationships; there is no routine sharing
  • Creating a single national repository of EHR data would be a daunting task

SLIDE 10

4) Standardization in description of the data

  • Many types of data: dates, encounters, diagnoses, ICD-9 and CPT codes, laboratory test codes with results, procedure test codes sometimes with results, physician orders, medications, imaging
  • Variation across providers, clinics, health systems
  • I do not believe there are as yet many ontology or metadata standards; text searching is necessary and natural language processing is in early development

5) Movement and ability to combine with other data

  • The health data are for INDIVIDUAL patients; individual patients cannot be directly linked to family members
  • Health data can be linked to other data by location (generally residential address) and date (space and time)
  • In general, approaches to analysis of EHR data are on a study-by-study basis
  • Data have to be accessed, exported, used to create analytic variables, merged with other data, analyzed
  • Epic has some analysis tools; in general we export data and use biostatistical software programs

6) Specific example: scientific question limited by integration challenges

  • As long as other data have meaning in space and time there should not be obstacles to integration
  • Have to acknowledge we may not be able to get what we actually want – we use surrogates for exposure
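
The linkage by space and time described under item 5 amounts to a join on a (location, date) key. The sketch below assumes each encounter has already been assigned an area code from the residential address; the area codes, dates, and exposure values are all made up for illustration.

```python
from datetime import date

# Made-up encounters, each already geocoded to a hypothetical area code.
encounters = [
    {"patient_id": "p1", "area": "A", "day": date(2012, 7, 1)},
    {"patient_id": "p2", "area": "B", "day": date(2012, 7, 1)},
    {"patient_id": "p3", "area": "A", "day": date(2012, 7, 2)},
]

# Made-up environmental metric keyed by (area, day).
exposure = {
    ("A", date(2012, 7, 1)): 41.0,
    ("B", date(2012, 7, 1)): 18.5,
}


def link(encounters, exposure):
    """Attach the area-level metric to each encounter; None when no match exists."""
    linked = []
    for enc in encounters:
        key = (enc["area"], enc["day"])
        linked.append({**enc, "exposure": exposure.get(key)})
    return linked


linked = link(encounters, exposure)
print([row["exposure"] for row in linked])
```

The third encounter has no matching metric, and every matched value describes the area rather than the person, which is the surrogate-for-exposure limitation noted under item 6.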

SLIDE 11

Thank you for listening

Second Presentation ENDS HERE