If it is in the EHR it must be true: Using EHR data for research


slide-1
SLIDE 1 The Clinical and Translational Science Awards (CTSA) is a registered trademark of DHHS.

If it is in the EHR it must be true

Using EHR data for research

Keith Marsolo, PhD Jareen Meinzen-Derr, PhD Bin Huang, PhD

slide-2
SLIDE 2

Outline of Discussion

  • Jareen Meinzen-Derr - Epidemiologist

– Introduction to using EHR in research, advantages and methodologic limitations/challenges

  • Keith Marsolo – Informaticist

– Overview of data abstraction and challenges, introduction to large network EHR-based registry (PCORnet)

  • Bin Huang - Biostatistician

– More in-depth look at the challenges and implications from the analysis perspective along with potential solutions and considerations

slide-3
SLIDE 3

Large-scale electronic health record-based research is more challenging than traditional retrospective studies

slide-4
SLIDE 4

A Primer

  • My target population of interest is all children with autism spectrum disorder (ASD) seen at Cincinnati Children’s

– How do I define ASD?

  • ICD-9? ICD-10?
  • Age? ASD diagnostic assessments?

– Where is my population?

  • Specific divisions?
  • Any clinic? Inpatient vs. outpatient?
  • With or without follow-up?
slide-5
SLIDE 5

A Primer

MRN    ICD-9    Clinic     Date
0001   299.0    Dev Peds   01/01/2015
0001   299.0    Dev Peds   10/01/2015
0002   299.0    Optho      02/01/2013
0003   315.31   Dev Peds   03/01/2012
0003   299.8    Dev Peds   03/01/2013
0004   348.39   Psych      01/01/2009
0004   348.39   Dev Peds   01/01/2010

  • Only record in chart
  • Expressive language disorder
  • Do you include previous visit?
  • Static encephalopathy
  • Notes state ASD assessments indicate ASD
  • If I include this code in search, I will receive thousands of records for patients who do not have ASD
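A minimal sketch of how the cohort changes with the definition, using the rows from the table above. The function name `asd_cohort`, the rule "any ICD-9 code starting with 299." and the `min_visits` threshold are illustrative assumptions, not the presenters' actual case definition.

```python
from collections import defaultdict

# Rows from the slide's table: (MRN, ICD-9 code, clinic, visit date).
VISITS = [
    ("0001", "299.0", "Dev Peds", "01/01/2015"),
    ("0001", "299.0", "Dev Peds", "10/01/2015"),
    ("0002", "299.0", "Optho", "02/01/2013"),
    ("0003", "315.31", "Dev Peds", "03/01/2012"),
    ("0003", "299.8", "Dev Peds", "03/01/2013"),
    ("0004", "348.39", "Psych", "01/01/2009"),
    ("0004", "348.39", "Dev Peds", "01/01/2010"),
]


def asd_cohort(visits, code_prefix="299.", min_visits=1):
    """Return MRNs with at least `min_visits` visits coded under `code_prefix`.

    Both parameters are hypothetical knobs for illustration only.
    """
    coded_visits = defaultdict(int)
    for mrn, icd9, _clinic, _date in visits:
        if icd9.startswith(code_prefix):
            coded_visits[mrn] += 1
    return sorted(mrn for mrn, n in coded_visits.items() if n >= min_visits)


# Any single 299.x code: picks up 0002, whose only record is from one clinic visit.
print(asd_cohort(VISITS))
# Requiring two coded visits drops 0002 and 0003.
print(asd_cohort(VISITS, min_visits=2))
```

Neither rule finds patient 0004, whose ASD is documented only in the notes. This is the point of the slide: the code-based definition determines who is in the study.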

slide-6
SLIDE 6

Electronic Health Records

  • A longitudinal collection of electronic health information for and about persons
  • Immediate electronic access to person- and population-level information by authorized, and only authorized, users
  • Provision of knowledge and decision support that enhance the quality, safety, and efficacy of patient care
  • Support of efficient processes for healthcare delivery

IOM

slide-7
SLIDE 7

EHR use in research

  • Surge in the use of EHRs (12.2% in 2009 to 75.5% in 2014)

– EHR-based outcomes research studies have increased >6-fold

  • Accommodate collection of structured, coded, electronically available data

– Can be used to build longitudinal histories

  • Allow access to health records from multiple locations

– Electronic transmission of records

  • More efficient/less expensive alternative to clinical trials
  • Can be used to populate databases for both clinical and research purposes

slide-8
SLIDE 8

Great Opportunities

  • Quality improvement purposes

– Facilitate data sharing, decision-making, efficient administrative operations

  • Recruiting for prospective studies/clinical trials
  • Public health initiatives

– Facilitate surveillance of infectious diseases, disease outbreaks, chronic illnesses

  • Replicating results of randomized controlled trials
  • Conduct “Big Data” research

– Rich data to study disease progress, health disparities, clinical outcomes, treatment effectiveness, efficacy of public health interventions

slide-9
SLIDE 9

How do I Begin?

JUST LIKE YOU WOULD ANY OTHER OBSERVATIONAL RESEARCH STUDY THAT USES SECONDARY DATA

slide-10
SLIDE 10

Shifts in primary responsibility

Manual abstraction workflow:

  • Study design protocol
  • Create data collection tool
  • Manual data abstraction & entry
  • Manual verification of missing & erroneous data
  • Data management
  • Data analysis

Electronic abstraction workflow:

  • Study design protocol
  • Electronic data abstraction
  • Data management (verify data)
  • Manual verification
  • Data analysis

Responsibility for each step shifts between the clinical researcher and the methodologist. Methodologist should be engaged throughout.

slide-11
SLIDE 11

How do I Begin?

  • What is your research question?

– Is it descriptive or analytic?
– Does it have a clear testable hypothesis?

  • What are the appropriate study designs?
  • Is the information needed to answer the question present, accessible, & reliable in the EHR?
  • How will you extract and analyze the information?

– What are additional data management and methods needs?

slide-12
SLIDE 12

Before you begin

  • Crucial to develop criteria for identifying patients who have the condition to be studied

– Data may need to be searched from problem lists, billing codes, medication lists, physical exam results across any/all possible clinic sources
– Must identify how long a patient has had a problem
– Develop processes for solving issues such as identification of first diagnosis

  • Study subjects are patients, not participants

– Part of an “open cohort” and can enter or leave at any time

slide-13
SLIDE 13

Have an Awareness

  • Known limitations of EHR data must be considered

– In the study design
– In the data collection/abstraction
– In the data analysis
– In the interpretation

  • Consequences can include:

– Flawed conclusions
– Altered policy decisions or clinical practice

slide-14
SLIDE 14

EHRs are designed for clinical care, not research

  • Not structured in a way that facilitates research

– Providers decide where to put information
– Information may be entered as free text (not structured or from a finite list)
– Providers use different terms for the same information
– Information not always stored in a way that is readily searchable
– Data not important in clinical care may be missing

slide-15
SLIDE 15

Awareness: Poor Data Quality

  • Quality variable due to differences in measurement, recording, information systems, and clinical focus
  • Serious threat to validity and generalizability of clinical research findings
  • Context dependent

– Same elements deemed high quality for one use and poor quality for a different use

  • Presence of extreme values may be irrelevant when determining a median or a rough estimate of the number of patients eligible for a study
  • Same extreme values may have significant undue influence on results of algorithms, or analytic methods sensitive to extreme values
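The context dependence described above can be shown with a small sketch. The weights below are hypothetical values invented for illustration; the last one simulates a misplaced decimal point at data entry.

```python
from statistics import mean, median

# Hypothetical weights in kg; 201.0 stands in for a data-entry error (20.1 mistyped).
clean = [18.2, 19.5, 20.1, 21.0, 22.4]
with_error = clean + [201.0]

# How far each summary statistic moves when the erroneous value is included.
median_shift = median(with_error) - median(clean)
mean_shift = mean(with_error) - mean(clean)

print(f"median shift: {median_shift:.2f} kg")
print(f"mean shift:   {mean_shift:.2f} kg")
```

The median barely moves, so the same data could be adequate for a rough descriptive summary, while any method built on the mean is pulled far from the plausible range.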

slide-16
SLIDE 16

Incomplete Data

  • Due to fragmentation of healthcare systems

– Patients moving between systems for special referrals or emergency care

  • Due to “poor”/inaccurate documentation (on the part of patients and healthcare providers)

– Lack essential information such as treatment outcomes

  • Sick patients often have more data

– Non-random missingness

  • Complete information about patient vs. complete information about patient’s encounter

slide-17
SLIDE 17

Examples in the literature

  • 30-40% of patients have clinical visits across multiple institutions
  • 55% of clinical research studies supplemented with non-EHR sources of data

– 40% supplemented with patient-reported data

  • 49% of patients with an ICD-9 code for pancreatic cancer did not have corresponding pathology documentation (incomplete or incorrect)

Bourgeois 2010; Finnell 2011; Thiru 2003; Dean 2009; Botsis 2010

slide-18
SLIDE 18

“Sicker” Have More Data

Figure 5. Average number of days with data per patient by ASA class. For both medication orders and laboratory results, all ASA Classes are significantly different except for Classes 1 and 2.

Rusanov 2014

slide-19
SLIDE 19

Sicker have more complete data

Figure 4. Complete records by ASA Class where complete records are those having at least seven values in each of the two categories (medication orders and laboratory results).

slide-20
SLIDE 20

Data Quality

  • Data entry errors

– Reported as high as 26.9% (Goldberg 2008)
– Medication discrepancies common

  • Data coding, standardization, extraction

– Free text narrative
– Inconsistent terms, phrases, abbreviations
– Billing purposes
– Diagnostic codes may be recorded for detection or “rule out” purposes

Meredith L et al. 2008

slide-21
SLIDE 21

Study Design Still Matters

  • Errors can occur in selecting a cohort and characterizing that cohort
  • Errors in a small number of cases can have a relatively large effect on outcomes
  • Manual review of cases or a sample of cases is invaluable in improving the sample
  • May be difficult to find “healthy” patients with sufficient data (comparison cohorts)
  • Requires special methodologic approaches to selecting complete patient records from EHR databases while avoiding bias

Hripcsak 2011; Weiskopf 2013

slide-22
SLIDE 22

Impact of Data Errors

Hripcsak 2011

slide-23
SLIDE 23

Bias Challenges

Selection bias: Subset of individuals studied is not representative of the population of interest

– Selection is not random

  • Can distort assessments of measures (e.g. disease prevalence or exposure risk)
  • Estimates not as generalizable

– Ex: Including only patients with complete data
– Ex: Generalizing findings from a hospital-based study to all who may have a condition

slide-24
SLIDE 24

Bias Challenges

Measurement bias: Errors in measurement and/or data collection

  • Instrument calibration
  • Data collection variability: depending on the field, clinician judgement plays a role
  • Patient’s ability to complete assessment/provide history (recall)
  • Use of certain codes/data to measure exposure
  • Clinician decides how long to “follow” patient

– Impacts calculation of prevalence, incidence, risk ratios

slide-25
SLIDE 25

Confounding

  • Distortion of the estimated effect of exposure on outcome caused by the presence of an extraneous factor associated with both exposure and outcome

– SES factors, lifestyle choices, age

  • Without consideration, estimated effect of treatment may actually be caused by some other factor
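The definition above can be illustrated with a small worked sketch. All counts are hypothetical and chosen so the arithmetic is easy to follow: disease severity is associated with both treatment (less sick patients are treated more often) and death, while treatment has no effect within either stratum. The stratified estimate uses the standard Mantel-Haenszel risk ratio, which is one conventional way to adjust for a measured confounder and is not taken from the slides.

```python
# Hypothetical counts per severity stratum:
# (treated deaths, treated total, untreated deaths, untreated total)
STRATA = {
    "less sick": (40, 800, 10, 200),
    "sicker": (60, 200, 240, 800),
}


def crude_risk_ratio(strata):
    """Risk ratio ignoring severity: pool all patients, then compare risks."""
    treated_deaths = sum(s[0] for s in strata.values())
    treated_total = sum(s[1] for s in strata.values())
    untreated_deaths = sum(s[2] for s in strata.values())
    untreated_total = sum(s[3] for s in strata.values())
    return (treated_deaths / treated_total) / (untreated_deaths / untreated_total)


def mantel_haenszel_risk_ratio(strata):
    """Severity-adjusted risk ratio (Mantel-Haenszel weighting across strata)."""
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n0 in strata.values():
        stratum_total = n1 + n0
        numerator += a * n0 / stratum_total
        denominator += c * n1 / stratum_total
    return numerator / denominator


print(f"crude RR:    {crude_risk_ratio(STRATA):.2f}")
print(f"adjusted RR: {mantel_haenszel_risk_ratio(STRATA):.2f}")
```

In this toy data the crude estimate makes treatment look strongly protective, while the adjusted estimate shows no effect. Adjustment of this kind only works for confounders that were actually measured.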
slide-26
SLIDE 26

Confounding Can Hurt

  • In an EHR study of hospitalized patients >65 years, NSAID use was associated with a 32% mortality risk reduction
  • However, after including additional specific confounders and analytical techniques, NSAID use was associated with a 6% mortality risk increase

– Addressed unmeasured confounding

Sturmer 2005

slide-27
SLIDE 27

Confounding

  • Confounding by indication

– Treatment choices influenced by severity or duration of patient’s disease
– Also influences outcome of treatment
– Sicker patients receive different treatments
– Sicker patients have different (worse) outcomes

  • Cannot be adjusted for in conventional regression analyses

slide-28
SLIDE 28

From EHR to Clinical Evidence

EHR recorded at the point of care → Data Extractions → Data Wrangling → Data Curation → Data Analyses → Causal Inference → Decision Theory → Evidence Based Decision

slide-29
SLIDE 29

EHR data – entry to extract

slide-30
SLIDE 30

Sources of variability – data entry

*partial list

slide-31
SLIDE 31

Sources of variability – ETL

*partial list

slide-32
SLIDE 32

Sources of variability – User request

*partial list

slide-33
SLIDE 33

Sources of variability - analyst

*partial list

slide-34
SLIDE 34

Sources of variability – self-service tools

*partial list

slide-35
SLIDE 35

Why is this so complicated?

  • Conceptual idea of clinical process does not translate to how data are captured in the EHR
  • Many different ways to document the same piece of information

– Workflow used to collect data often dictates where those elements are stored in the reporting database
– Most researchers lack understanding of these workflows

  • Quality of results then depends on how the question is asked and on the skill of the analyst

slide-37
SLIDE 37

Example – encounters (CCHMC FY14)

  • Annual Report

– Total patient encounters: ~1.2 million
– ED visits: ~100K
– Admissions (including short stay): ~31K
– Outpatient: ~1 million

  • EHR

– Total patient encounters: ~3 million
– ED admissions to inpatient: ~145K
– Inpatient: ~28K
– Ambulatory: ~2.8 million

  • Encounter != encounter
slide-38
SLIDE 38

Just pull data from “ambulatory” encounters…

EEG EXERCISE CARDIOLOGY TESTING PUMP/CGM INITIATION ORDERS MED TAPER SCHEDULE GENETIC COUNSELOR NEONATOLOGY TESTING CARE CONFERENCE - PATIENT/FAMILY PRESENT HOME VISIT - PALLIATIVE CARE ABUSE REPORTING CARE COORDINATOR SPECIAL NEEDS SUMMARY EARLY INTERVENTION HI NEURODEVELOPMENTAL CLINIC TRACKING INFUSION ORDERS ENT CLINIC VISITS FEES/VOICE HEPATOBLASTOMA LIVER TRANSPLANT FOLLOW UP PRE-ADOPTION ENCOUNTER EB PLANNING FEES CLINIC VPI - ENT/SPEECH INTAKE HVMC PLANNING PRE-OP PHYSICAL PLAN OF CARE ENT INPATIENT VISIT HOSPITAL TO HOSPITAL TRANSFER DEVELOPMENTAL TESTING BIOETHICS CONSULT ENDO STIM TESTING HIM INTERFACE CREATED SURGICAL SITE INFECTION DERM PATCH TESTING INTAKE CONSULT ADEC INTAKE CPST-PSY ENCOUNTER ECONSULT TELEMEDICINE ROADMAP HOSPITAL ENCOUNTER UPDATE PCP/CLINIC CHANGE WAIT LIST CLERICAL ORDERS MOTHER BABY LINK LACTATION ENCOUNTER CANCELED APPOINTMENT SURGERY ANESTHESIA ANESTHESIA EVENT UNMERGE HEALTH MAINTENANCE LETTER PATIENT EMAIL E-VISIT MOBILE ORDER ONLY QUESTIONNAIRE SERIES SUBMISSION PATIENT OUTREACH CONTACT MOVED NURSE TRIAGE E-CONSULT E-CONSULT COMMUNITY ORDER TELEMEDICINE EXTERNAL CONTACT OPHTH EXAM HOSPICE ADMISSION HOME HEALTH ADMISSION HOME CARE VISIT HOME CARE UPDATE PATIENT WEB UPDATE COMMUNITY ORDERS COMMITTEE REVIEW POST MORTEM DOCUMENTATION BILLING ENCOUNTER HOSPITAL CONFIDENTIAL OPH TESTING EDUCATOR VOICE CLINIC TELEPHONE REGISTRATION EMPTY LAB REQUISITION INITIAL CONSULT ANTI-COAG VISIT PROCEDURE VISIT OFFICE VISIT CONSENT FORM SCREENING FORM EXTERNAL HOSPITAL ADMISSION LETTER (OUT) REFILL IMMUNIZATION HISTORY RESEARCH ENCOUNTER REFERRAL ORDERS ONLY RX REFILL AUTHORIZE MEDS ONLY (WEB) MEDS VOID (WEB) RESOLUTE PROFESSIONAL BILLING HOSPITAL PROF FEE EPISODE CHANGES ANCILLARY ORDERS PHARMACY VISIT BPA ROUTINE PRENATAL INITIAL PRENATAL OPHTH OFFICE VISIT ABSTRACT WALK-IN TREATMENT PLAN ALLIED HEALTH NURSE ONLY SOCIAL WORK NUTRITION PHYSICAL THERAPY OCCUPATIONAL THERAPY SPEECH THERAPY RESPIRATORY THERAPY CASE MANAGEMENT EDUCATION SURGICAL 
H&P CLINICAL SUPPORT MEDS ONLY / E - PRESCRIBE PFT ONLY TRANSPLANT PRE-EVALUATION TRANSPLANT EVALUATION TRANSPLANT FOLLOW-UP TRANSPLANT RESULTS ENTRY IMMUNOTHERAPY ALLERGY TESTING SPECIMEN COLLECTION AUTO RELEASE ORDERS URODYNAMIC TESTING PRE-NATAL CONSULT CHECKLIST BOWEL MANAGEMENT CARE CONFERENCE INTAKE/TRIAGE VNS REPROGRAM/SHUTOFF CLINICAL NOTE GENETICS PASTORAL THERAPY VISIT INTAKE - NEW PATIENT HIM SCANS PRE-VISIT PLANNING TRANSCRIBED ORDERS SCHOOL TEACHER/INTERVENTION CHILD LIFE THERAPY PROGRESS SUMMARY BRONCHOSCOPY REQUEST HEMONC SOCIAL WORK AUD CONSULT OPH CONSULT ALG CONSULT UROLOGY COMPLEX INTAKE
slide-39
SLIDE 39

Give me all data for element X…

slide-40
SLIDE 40

Why is this so complicated?

  • Conceptual idea of clinical process does not translate to how data are captured in the EHR
  • Many different ways to document the same piece of information

– Workflow used to collect data often dictates where those elements are stored in the reporting database
– Most researchers lack understanding of these workflows

  • Quality of results then depends on how the question is asked and on the skill of the analyst

slide-41
SLIDE 41

EHRs are constantly evolving

  • New functionality is released & workflows change over time

– Clinician-entered
– Patient entry via welcome kiosk
– Patient entry via web-based questionnaire

  • These workflows are typically additive, not substitutive

– Need to remember this history
– Will otherwise result in gaps in population

slide-42
SLIDE 42

Example – Has a HEALTH RELATED QUALITY OF LIFE (QOL) ASSESSMENT been documented?

  • Flowsheet RHE PEDS QL #129, Measure RHE PARENT #3757
  • Flowsheet RHE PEDS QL #129, Measure RHE PATIENT #1799
  • Flowsheet RHE PEDS QL #129, Measure GEN PATIENT #3758
  • Flowsheet RHE PEDS QL #129, Measure GEN PARENT #3759
  • Questionnaire RHE PEDSQL 13-18 TEEN REPORT #20702, Question RHE PEDSQL 13-18 CHILD TOTAL SCORE #400411
  • Questionnaire RHE PEDSQL 13-18 PARENT REPORT FOR TEENS #20703, Question: RHE PEDSQL 13-18 PARENT TOTAL SCORE #20544
  • Questionnaire RHE PEDSQL 2-4 PARENT REPORT FOR TODDLERS #20699, Question: RHE PEDSQL 2-4 PARENT TOTAL SCORE #400415
  • Questionnaire RHE PEDSQL 5-7 PARENT REPORT FOR YOUNG CHILDREN #20700, Question: RHE PEDSQL 5-7 PARENT TOTAL SCORE #400421
  • Questionnaire RHE PEDSQL 5-7 YOUNG CHILD REPORT #20701, Question: RHE PEDSQL 5-7 CHILD TOTAL SCORE #400427
  • Questionnaire RHE PEDSQL 8-12 PARENT REPORT FOR CHILDREN #20706, Question: RHE PEDSQL 8-12 PARENT TOTAL SCORE #400439
  • Questionnaire RHE PEDSQL 8-12 CHILD REPORT #20705, Question: RHE PEDSQL 8-12 CHILD TOTAL SCORE #400433
  • Questionnaire PEDSQL GENERIC 1-12MOS PARENT REPORT FOR INFANTS #20758, Question: PEDSQL 1-12MOS TOTAL SCORE #400280
  • Questionnaire PEDSQL GENERIC 13-18 TEEN REPORT #20745, Question: PEDSQL 13-18C TOTAL SCORE #400163
  • Questionnaire PEDSQL GENERIC 13-18 PARENT REPORT FOR TEENS #20686, Question: PEDSQL 13-18P TOTAL SCORE #400158
  • Questionnaire PEDSQL GENERIC 13-24MOS PARENT REPORT FOR INFANTS #20759, Question: PEDSQL 13-24MOS TOTAL SCORE #100857
  • Questionnaire PEDSQL GENERIC 18-25 YOUNG ADULT REPORT #20684, Question: PEDSQL 18-25C TOTAL SCORE #400183
  • Questionnaire PEDSQL GENERIC 2-4 PARENT REPORT FOR TODDLERS #20688, Question: PEDSQL 2-4P TOTAL SCORE #400188
  • Questionnaire PEDSQL GENERIC 5-7 PARENT REPORT FOR YOUNG CHILDREN #20689, Question: PEDSQL 5-7P TOTAL SCORE #400153
  • Questionnaire PEDSQL GENERIC 5-7 YOUNG CHILD REPORT #20683, Question: PEDSQL 5-7C TOTAL SCORE #400178
  • Questionnaire PEDSQL GENERIC 8-12 PARENT REPORT FOR CHILDREN #20687, Question: PEDSQL 8-12P TOTAL SCORE #400173
  • Questionnaire PEDSQL GENERIC 8-12 CHILD REPORT #20685, Question: PEDSQL 8-12C TOTAL SCORE #400168
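A minimal sketch of why the list above matters: answering "has a QOL assessment been documented?" means checking every workflow that can record one. The identifiers below are a subset copied from the list; the record layout `(source, item_id)` and the function name `has_qol_assessment` are hypothetical and do not reflect any real EHR reporting schema.

```python
# A few identifiers from the list above; a real query would need all of them.
QOL_FLOWSHEET_MEASURES = {3757, 1799, 3758, 3759}
QOL_QUESTIONNAIRE_QUESTIONS = {400411, 400415, 400421, 400427, 400433}


def has_qol_assessment(patient_records):
    """Return True if any record matches a known QOL item in either workflow.

    `patient_records` is an iterable of (source, item_id) pairs, where source
    is "flowsheet" or "questionnaire" (a made-up layout for illustration).
    """
    for source, item_id in patient_records:
        if source == "flowsheet" and item_id in QOL_FLOWSHEET_MEASURES:
            return True
        if source == "questionnaire" and item_id in QOL_QUESTIONNAIRE_QUESTIONS:
            return True
    return False


# A patient documented only through the newer questionnaire workflow.
newer_workflow = [("questionnaire", 400433)]
# A patient documented only through the older flowsheet workflow.
older_workflow = [("flowsheet", 3757)]

print(has_qol_assessment(newer_workflow), has_qol_assessment(older_workflow))
```

Searching only one of the two sets would silently drop one of these patients, which is the kind of gap that additive workflows create.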
slide-43
SLIDE 43

Are there any solutions?

  • Engagement with operational reporting groups / data stewards

– Often serve as source of truth for a given area
– Deal with much higher request volume
– However, different priorities and funding models can make it difficult to keep activities aligned

  • Quality checks / Data characterization

– Should help identify if there is a problem
– But not necessarily where to look for the solution
– Difficult to communicate/disseminate findings

slide-45
SLIDE 45

Assessing Data Quality within PCORnet

slide-46
SLIDE 46
slide-47
SLIDE 47

Enabling Research at a National Scale

How do you ask a research question at hundreds of institutions and get back results you can trust?

Option 1: Write a description and have everyone create a local implementation to run on their data

Option 2: Create an algorithm that can run against a single, common data model

slide-48
SLIDE 48

PCORnet Data Strategy

  • Standardize data into a common data model
  • Focus on data quality: data curation
  • Operate a secure distributed query infrastructure

– Develop re-usable tools to query the data
– Send questions to the data and only return required information

  • Learn by doing and repeat

slide-49
SLIDE 49

Loading the Common Data Model (easy)

Same data are represented differently at different institutions (e.g., Race)

In order to be able to trust results of an analysis, we need to have consistent representations

Common Data Model Value Set

  • 01 = American Indian or Alaska Native
  • 02 = Asian
  • 03 = Black or African American
  • 04 = Native Hawaiian or Other Pacific Islander
  • 05 = White
  • 06 = Multiple Race
  • 07 = Refuse to Answer
  • NI = No Information
  • UN = Unknown
  • OT = Other

SITE 1: Caucasian, African American, Asian, Multiple Race, Blank

SITE 2: 101, 201, 300, 401, 500, 600

SITE 3: African American, American Indian, Asian American, White, Other, Unknown
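A minimal sketch of the mapping step for Site 1, using only codes from the value set above. The dictionary contents are an assumed local mapping (for example, that a blank source value should become "NI"), and `map_race` is a hypothetical helper, not part of any PCORnet tooling. Values with no mapping are reported rather than guessed, since Site 2's numeric codes cannot be interpreted without that site's own documentation.

```python
# Assumed mapping from Site 1's local labels to CDM race codes.
SITE1_RACE_TO_CDM = {
    "caucasian": "05",
    "african american": "03",
    "asian": "02",
    "multiple race": "06",
    "": "NI",  # assumption: a blank source value means "No Information"
}


def map_race(values, site_map):
    """Map local race values to CDM codes.

    Returns (mapped, unmapped): `mapped` has one entry per input (None when no
    mapping exists) and `unmapped` lists the raw values needing human review.
    """
    mapped = []
    unmapped = []
    for raw in values:
        key = raw.strip().lower()
        if key in site_map:
            mapped.append(site_map[key])
        else:
            mapped.append(None)
            unmapped.append(raw)
    return mapped, unmapped


codes, needs_review = map_race(
    ["Caucasian", " asian ", "", "Pacific Islander"], SITE1_RACE_TO_CDM
)
print(codes, needs_review)
```

Returning the unmapped values explicitly keeps the decision about them with someone who knows the source system, instead of hiding it inside a default.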

slide-50
SLIDE 50

Loading the Common Data Model (less easy)

Same data are represented differently at different institutions (e.g., Type of Encounter)

In order to be able to trust results of an analysis, we need to have consistent representations

Common Data Model

  • Ambulatory Visit (AV)
  • Emergency Department (ED)
  • ED Admit to Inpatient (EI)
  • Inpatient Hospital (IP)
  • Non-Acute Inst. Stay (IS)
  • Other Ambulatory (OA)
  • Other (OT)
  • Unknown (UN)
  • No Information (NI)

SITE 1: Social Work Visit, Allied Health, Office Visit, Nurse Visit, Procedure Visit, Employee Health, Vascular Lab, Sleep Study Visit, Social Work Visit

SITE 2: Office Visit, Specimen, Postpartum Visit, Clinical Support, Initial Prenatal

SITE 3: Home Care Visit, Office Visit, Therapy Visit, Orders Only, Cardiology Testing, Hospital Encounter

slide-51
SLIDE 51

Factors that increase complexity

  • People interpret the CDM specification differently, resulting in variability in how the CDM is populated
  • Different health systems, with different EHRs, implemented at different times
  • Clinical workflows differ across institutions & impact availability of data
  • Understanding of EHR / claims data sources differs across institutions, which may impact what gets loaded from source systems

All of these issues are present when doing research with EHR data, even within a single center

slide-52
SLIDE 52

We have tools/processes to address this!

Data Curation assesses and improves global data quality

  • Characterize the contents of the PCORnet CDM
  • Evaluate global data quality and fitness-for-use across a broad research portfolio

For a given study, still need to consider data characterization specific to the aims

  • Assess data on the intended cohort
  • Ensure that outcomes / variables of interest are available & complete
  • Determine whether partners actually have enough data / patients to participate
  • Requires upfront investment, but can save significant time overall

slide-53
SLIDE 53

Data Curation

slide-54
SLIDE 54

Data curation

Step 1: Network partner plans DataMart refresh
Step 2: Network partner responds to the data characterization query package
Step 3: Coordinating Center approves the DataMart
Step 4: Coordinating Center analyzes results and solicits more information as needed
Step 5: Coordinating Center holds Data Characterization and Implementation Forums and updates Implementation Guidance

slide-55
SLIDE 55

Cycle 2 Required Data Checks

Category: Data Model Conformance

  • DC 1.01: Required tables are not present
  • DC 1.02: Expected tables are not populated
  • DC 1.03: Required fields are not present
  • DC 1.04: Fields do not conform to CDM specifications for data type, length, or name
  • DC 1.05: Tables have primary key definition errors
  • DC 1.06: Fields contain values outside of CDM specifications
  • DC 1.07: Fields have non-permissible missing values
  • DC 1.08: Tables contain orphan PATIDs (PATIDs not in DEMOGRAPHIC)
  • DC 1.10: Replication errors between the ENCOUNTER, PROCEDURES and DIAGNOSIS tables

Category: Data Completeness

  • DC 3.04: Less than 50% of patients with encounters have DIAGNOSIS records
  • DC 3.05: Less than 50% of patients with encounters have PROCEDURES records
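A minimal sketch of two of the required checks, DC 1.08 (orphan PATIDs) and DC 3.04 (diagnosis coverage). Tables are represented as plain lists of PATID values; the function names and the toy data are hypothetical, and the real checks are run by the network's own query packages rather than by code like this.

```python
def orphan_patids(demographic_patids, table_patids):
    """DC 1.08: PATIDs that appear in a table but not in DEMOGRAPHIC."""
    return set(table_patids) - set(demographic_patids)


def diagnosis_coverage(encounter_patids, diagnosis_patids):
    """DC 3.04: fraction of patients with encounters who have DIAGNOSIS records."""
    patients_with_encounters = set(encounter_patids)
    if not patients_with_encounters:
        return 0.0
    with_diagnosis = patients_with_encounters & set(diagnosis_patids)
    return len(with_diagnosis) / len(patients_with_encounters)


# Toy data: one PATID per row, so repeats are expected.
DEMOGRAPHIC = ["P1", "P2", "P3", "P4"]
ENCOUNTER = ["P1", "P1", "P2", "P3", "P5"]
DIAGNOSIS = ["P1", "P9"]

orphans = orphan_patids(DEMOGRAPHIC, ENCOUNTER)
coverage = diagnosis_coverage(ENCOUNTER, DIAGNOSIS)

print(f"orphan PATIDs in ENCOUNTER: {sorted(orphans)}")
print(f"diagnosis coverage: {coverage:.0%} (check fails below 50%)")
```

Both checks reduce to set comparisons on patient identifiers, which is why they can be run uniformly across sites once the data are in a common model.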

slide-56
SLIDE 56

Cycle 2 Investigative Data Checks

Category: Data Model Conformance

  • DC 1.09: Tables have orphan ENCOUNTERIDs for more than 5% of records

Category: Data Plausibility

  • DC 2.01: More than 5% of records have future dates
  • DC 2.02: More than 10% of records fall into the lowest or highest categories of age, height, weight, diastolic blood pressure, systolic blood pressure, prescribed days supply, or dispensed days supply
  • DC 2.03: More than 5% of records have illogical date relationships
  • DC 2.04: The average number of encounters per visit is > 2.0 for inpatient (IP), emergency department (ED), or ED to inpatient (EI) encounters

Category: Data Completeness

  • DC 3.01: The average number of diagnoses records with known diagnosis types per encounter is below threshold [1.0 for ambulatory (AV), inpatient (IP), emergency department (ED), or ED to inpatient (EI) encounters]
  • DC 3.02: The average number of procedure records with known procedure types per encounter is below threshold [0.75 for ambulatory (AV) encounters, 0.75 for emergency department (ED) encounters, 1.00 for ED to inpatient (EI) encounters, and 1.00 for inpatient (IP) encounters]
  • DC 3.03: More than 10% of records have missing or unknown values for the following fields: BIRTH_DATE, SEX, DISCHARGE_DISPOSITION (IP/EI encounters only), DISCHARGE_DATE (IP/EI encounters only), PX_DATE, LOINC, RX_NORM_CUI, RX_ORDER_DATE, RX_DAYS_SUPPLY, or DISPENSE_SUP
  • DC 3.06: More than 10% of inpatient (IP) or ED to inpatient (EI) encounters with a diagnosis don't have a principal diagnosis

Data partners are asked to investigate and comment on any exceptions in their Annotated Data Dictionary, and to classify these exceptions as follows: feature/limitation of source data; could be improved in the near future; may be improved in the future; or warrants further investigation.
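A minimal sketch of one investigative check, DC 2.01 (more than 5% of records have future dates). The function names and the `as_of` parameter are hypothetical; the reference date is passed in explicitly so the result does not depend on when the code is run.

```python
from datetime import date


def future_date_fraction(dates, as_of):
    """Fraction of records dated after the reference date `as_of`."""
    dates = list(dates)
    if not dates:
        return 0.0
    return sum(d > as_of for d in dates) / len(dates)


def flags_dc_2_01(dates, as_of, threshold=0.05):
    """True when the share of future-dated records exceeds the 5% threshold."""
    return future_date_fraction(dates, as_of) > threshold


# Toy data: 18 plausible dates plus 2 that are clearly in the future.
REFERENCE = date(2016, 1, 1)
plausible = [date(2015, 6, day) for day in range(1, 19)]
records = plausible + [date(2051, 6, 1), date(2105, 6, 2)]

print(f"future-dated: {future_date_fraction(records, REFERENCE):.0%}")
print("flagged for investigation:", flags_dc_2_01(records, REFERENCE))
```

A flag here only says that something needs explaining. As the slide notes, the data partner still has to classify the exception, for example as a limitation of the source data.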
slide-57
SLIDE 57

Resources for Network Partners

Empirical Data Characterization Report Excerpt


slide-59
SLIDE 59

Study-specific data quality

slide-60
SLIDE 60

Antibiotics study overview

Study Aims: To evaluate the comparative effects of different types, timing, and amount of antibiotics prescribed during the first 2 years of life on:

  • Body mass index and risk of obesity at 5 and 10 years
  • Growth trajectories from infancy onwards

And how these effects differ according to:

  • Child sex, race/ethnicity, geography
  • Use of other medications
  • Maternal BMI, antibiotics during pregnancy, C-section (analysis at 7 sites)

Conducted study-specific data characterization to assess site eligibility:

  • Findings for prescriptions
  • RxNorm considerations

slide-61
SLIDE 61

Study-specific data characterization findings

Lower number of children ≤ 2 years old with an antibiotics prescription

Start minus end date

  • Low percent missing (~5%)
  • Note: This is very different from global measures (highly missing)
  • May be useful: 50th percentile = 10 days
  • Huge range (5th percentile = 0 days; 95th percentile = 108 days)

Quantity

  • Varying interpretations of quantity (pill, mg, ml, etc.)
  • Large range (5th percentile = 11.00; 95th percentile = 225.50)
  • Missing in 52% of ABX prescriptions

Refills: not consistently populated (60% missing)

Days supply: only populated in 4% of ABX prescribing records

slide-62
SLIDE 62

Study-specific data characterization findings

Initial query only included RxNorm Dose Form and Clinical Drug or Pack

  • Specific codes that allow identification of all aspects of the prescription (>2000 codes)
  • Did not include less specific codes: RxNorm Ingredient, Precise Ingredient, or Drug Component

Learned that several network partners had not mapped to the specific codes

  • Had to ask network partners to map to the specific codes
  • Assess whether to include ingredient-level records in the analysis

slide-63
SLIDE 63

What RXCUI term types are used?

Categorization of term types

Category-1 (Ingredient + Strength + Dose Form)

  • 1. Semantic Clinical Drug (SCD)
  • 2. Semantic Branded Drug (SBD)
  • 3. Generic Pack (GPCK)
  • 4. Branded Name Pack (BPCK)

Category-2 (Ingredient; Ingredient + Strength; Ingredient + Dose Form)

  • 1. Semantic Clinical Drug Form (SCDF)
  • 2. Semantic Branded Drug Form (SBDF)
  • 3. Semantic Clinical Dose Form Group (SCDG)
  • 4. Multiple Ingredients (MIN)
  • 5. Precise Ingredient (PIN)
  • 6. Ingredient (IN)
  • 7. Semantic Branded Drug Component (SBDC)
  • 8. Semantic Clinical Drug Component (SCDC)

Category-3 (Brand Name; Dose Form)

  • 1. Branded Name (BN)
  • 2. Semantic Branded Dose Form Group (SBDG)
  • 3. Dose Form Group (DFG)
  • 4. Dose Form (DF)
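A minimal sketch of the categorization above as a lookup table, followed by a count of how a site's prescribing records fall into each category. The term-type abbreviations and category numbers come from the slide; the function name `category_distribution`, the "unmapped" label, and the sample input are hypothetical.

```python
from collections import Counter

# Term type (TTY) abbreviation -> category number, as listed on the slide.
TTY_CATEGORY = {
    "SCD": 1, "SBD": 1, "GPCK": 1, "BPCK": 1,
    "SCDF": 2, "SBDF": 2, "SCDG": 2, "MIN": 2,
    "PIN": 2, "IN": 2, "SBDC": 2, "SCDC": 2,
    "BN": 3, "SBDG": 3, "DFG": 3, "DF": 3,
}


def category_distribution(term_types):
    """Count records per category; unknown term types are counted as 'unmapped'."""
    return Counter(TTY_CATEGORY.get(tty.upper(), "unmapped") for tty in term_types)


# Hypothetical term types attached to one site's prescribing records.
site_records = ["SCD", "SCD", "IN", "in", "BN", "XYZ"]
print(category_distribution(site_records))
```

A site dominated by Category-2 or Category-3 records has mapped to less specific codes, which is the situation the study team had to resolve with network partners.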

slide-64
SLIDE 64

RXCUI Term Types Distribution by Category and DataMart

Figure labels: Network ID, Data Mart ID

slide-65
SLIDE 65

From EHR to Clinical Evidences

EHR recorded at the point of care → Data Extractions → Data Wrangling → Data Curation → Data Analyses → Causal Inference → Decision Theory → Evidence Based Decision