If it is in the EHR it must be true
Using EHR data for research
Keith Marsolo, PhD; Jareen Meinzen-Derr, PhD; Bin Huang, PhD
The Clinical and Translational Science Awards (CTSA) is a registered trademark of DHHS.
Outline of Discussion
- Jareen Meinzen-Derr - Epidemiologist
– Introduction to using EHR in research, advantages and methodologic limitations/challenges
- Keith Marsolo – Informaticist
– Overview of data abstraction and challenges, introduction to large network EHR- based registry (PCORNet)
- Bin Huang - Biostatistician
– More in-depth look at the challenges and implications from the analysis perspective along with potential solutions and considerations
Large-scale electronic health record-based research is more challenging than traditional retrospective studies
A Primer
- My target population of interest is all children
with autism spectrum disorder (ASD) seen at Cincinnati Children’s
– How do I define ASD?
- ICD9? ICD10?
- Age? ASD diagnostic assessments?
– Where is my population?
- Specific divisions?
- Any clinic? Inpatient vs. outpatient?
- With or without follow-up?
A Primer
MRN   ICD-9   Clinic    Date
0001  299.0   Dev Peds  01/01/2015
0001  299.0   Dev Peds  10/01/2015
0002  299.0   Optho     02/01/2013
0003  315.31  Dev Peds  03/01/2012
0003  299.8   Dev Peds  03/01/2013
0004  348.39  Psych     01/01/2009
0004  348.39  Dev Peds  01/01/2010
- MRN 0002: Only record in chart
- MRN 0003: 315.31 is expressive language disorder. Do you include the previous visit?
- MRN 0004: 348.39 is static encephalopathy. Notes state ASD assessments indicate ASD
– If I include this code in the search, I will receive thousands of records for patients who do not have ASD
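The cohort-definition problem above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' method: the `cohort` helper and its `min_visits` parameter are assumptions introduced here, and the rows are the seven example visits from the table.

```python
# Visit rows from the example table: (MRN, ICD-9 code, clinic, visit date).
VISITS = [
    ("0001", "299.0", "Dev Peds", "01/01/2015"),
    ("0001", "299.0", "Dev Peds", "10/01/2015"),
    ("0002", "299.0", "Optho", "02/01/2013"),
    ("0003", "315.31", "Dev Peds", "03/01/2012"),
    ("0003", "299.8", "Dev Peds", "03/01/2013"),
    ("0004", "348.39", "Psych", "01/01/2009"),
    ("0004", "348.39", "Dev Peds", "01/01/2010"),
]

ASD_CODES = {"299.0", "299.8"}


def cohort(visits, codes, min_visits=1):
    """Return sorted MRNs with at least `min_visits` visits carrying a code in `codes`.

    Hypothetical helper for illustration only.
    """
    counts = {}
    for mrn, code, _clinic, _date in visits:
        if code in codes:
            counts[mrn] = counts.get(mrn, 0) + 1
    return sorted(mrn for mrn, n in counts.items() if n >= min_visits)


# One ASD-coded visit is enough: picks up 0002 (single record) and 0003.
any_asd_code = cohort(VISITS, ASD_CODES)
# Require a repeat ASD-coded visit: only 0001 remains.
repeated_asd_code = cohort(VISITS, ASD_CODES, min_visits=2)
# Adding 348.39 captures 0004, but at scale it would also pull in
# every encephalopathy patient without ASD.
with_encephalopathy = cohort(VISITS, ASD_CODES | {"348.39"})
```

Each definition returns a different cohort from the same seven rows, which is why the case definition has to be settled, and checked against chart review, before any data are pulled.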
Electronic Health Records
- A longitudinal collection of electronic health
information for and about persons
- Immediate electronic access to person- and population-level information by authorized, and only authorized, users
- Provision of knowledge and decision support
that enhance the quality, safety, and efficacy of patient care
- Support of efficient processes for healthcare
delivery
IOM
EHR use in research
- Surge in the use of EHRs (12.2% in 2009 to 75.5% in 2014)
– EHR-based outcomes research studies have increased >6-fold
- Accommodate collection of structured, coded,
electronically available data
– Can be used to build longitudinal histories
- Allow access to health records from multiple locations
– Electronic transmission of records
- More efficient/less expensive alternative to clinical trials
- Can be used to populate databases for both clinical and
research purposes
Great Opportunities
- Quality improvement purposes
– Facilitate data sharing, decision-making, efficient administrative operations
- Recruiting for prospective studies/clinical trials
- Public health initiatives
– Facilitate surveillance of infectious diseases, disease outbreaks, chronic illnesses
- Replicating results of randomized controlled trials
- Conduct “Big Data” research
– Rich data to study disease progress, health disparities, clinical outcomes, treatment effectiveness, efficacy of public health interventions
How do I Begin?
JUST LIKE YOU WOULD ANY OTHER OBSERVATIONAL RESEARCH STUDY THAT USES SECONDARY DATA
Shifts in primary responsibility
Study design protocol → Create data collection tool → Manual data abstraction & entry → Manually verify missing & erroneous data → Data management → Data analysis
Study design protocol → Electronic data abstraction → Data management, verify data → Manual verification → Data analysis
Roles: Clinical researcher, Methodologist
Methodologist should be engaged throughout
How do I Begin?
- What is your research question?
– Is it descriptive vs. analytic?
– Does it have a clear testable hypothesis?
- What are the appropriate study designs?
- Is the information needed to answer question
present, accessible, & reliable in the EHR?
- How will you extract and analyze the
information?
– What are additional data management and methods needs?
Before you begin
- Crucial to develop criteria for identifying
patients who have condition to be studied
– Data may need to be searched from problem lists, billing codes, medication lists, physical exam results across any/all possible clinic sources
– Must identify how long a patient has had a problem
– Develop processes for solving issues such as identification of first diagnosis
- Study subjects are patients, not participants
– Part of an “open-cohort” and enter or leave at any time
Have an Awareness
- Known limitations of EHR data must be
considered
– In the study design
– In the data collection/abstraction
– In the data analysis
– In the interpretation
- Consequences can include:
– Flawed conclusions
– Altered policy decision or clinical practice
EHRs are designed for clinical care, not research
- Not structured in a way that facilitates research
– Providers decide where to put information
– Information may be entered free-text (not structured or finite list)
– Providers use different terms for same info
– Information not always stored in a way that is readily searchable
– Data not important in clinical care may be missing
Awareness: Poor Data Quality
- Quality variable due to differences in measurement,
recording, information systems, and clinical focus
- Serious threat to validity and generalizability of clinical
research findings
- Context dependent
– Same elements deemed high quality for one use and poor quality for different use
- Presence of extreme values may be irrelevant in determining a median or a rough estimate of the number of eligible patients for a study
- The same extreme values may have significant undue influence on the results of algorithms or analytic methods sensitive to extreme values
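A small sketch of the context-dependence point, using only the Python standard library. The weights and the single erroneous entry are invented for illustration (assumed here to be a value typed in grams instead of kilograms); they are not from any study cited in this talk.

```python
from statistics import mean, median

# Hypothetical weights in kg for five children.
weights_kg = [18.2, 19.5, 20.1, 21.0, 22.4]
# Same data plus one hypothetical entry error.
with_error = weights_kg + [2100.0]

median_clean = median(weights_kg)
median_dirty = median(with_error)   # barely moves
mean_clean = mean(weights_kg)
mean_dirty = mean(with_error)       # dominated by the single bad value
```

The median shifts by less than half a kilogram, while the mean grows more than tenfold. The same record can therefore be acceptable for a count or a median and unusable for a mean-based or regression analysis.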
Incomplete Data
- Due to fragmentation of healthcare systems
– Patients moving between systems for special referrals or emergency care
- Due to “poor”/inaccurate documentation (on the part of patients and healthcare providers)
– Lack essential information such as treatment outcomes
- Sick patients often have more data
– Non-random missing
- Complete information about patient vs. complete
information about patient’s encounter
Examples in the literature
- 30-40% of patients have clinical visits across
multiple institutions
- 55% of clinical research studies supplemented
with non-EHR sources of data
– 40% supplemented with patient-reported data
- 49% of patients with an ICD-9 code for pancreatic cancer did not have corresponding pathology documentation (incomplete or incorrect)
Bourgeois 2010; Finnell 2011; Thiru 2003; Dean 2009; Botsis 2010
“Sicker” Have More Data
Figure 5. Average number of days with data per patient by ASA class. For both medication orders and laboratory results, all ASA Classes are significantly different except for Classes 1 and 2. Rusanov 2014
Sicker have more complete data
Figure 4. Complete records by ASA Class where complete records are those having at least seven values in each of the two categories (medication orders and laboratory results).
Data Quality
- Data entry errors
– Reported as high as 26.9% (Goldberg 2008)
– Medication discrepancies common
- Data coding, standardization, extraction
– Free text narrative
– Inconsistent terms, phrases, abbreviations
– Billing purposes
– Diagnostic codes may be recorded for detection or “rule out” purposes
Meredith L et al. 2008
Study Design Still Matters
- Errors can occur in selecting a cohort and
characterizing that cohort
- Errors in a small number of cases can have a
relatively large effect on outcomes
- Manual review of cases or a sample of cases is
invaluable in improving the sample
- May be difficult to find “healthy” patients with
sufficient data (comparison cohorts)
- Requires special methodologic approaches to
selecting complete patient records from EHR databases while avoiding bias
Hripcsak 2011; Weiskopf 2013
Impact of Data Errors
Hripcsak 2011
Bias Challenges
Selection bias: Subset of individuals studied is not representative of the population of interest
– Selection is not random
- Can distort assessments of measures (e.g.
disease prevalence or exposure risk)
- Estimates not as generalizable
– Ex: Including only patients with complete data
– Ex: Generalizing findings from a hospital-based study to all who may have a condition
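The first example, restricting to patients with complete data, can be illustrated with a toy population. All counts below are hypothetical and chosen only so that sicker patients are more likely to have complete records, as described earlier in the talk.

```python
# Hypothetical population: 20 sick and 80 healthy patients.
# 18 of the 20 sick patients have complete records; only 20 of the 80 healthy do.
patients = (
    [{"sick": True, "complete": i < 18} for i in range(20)]
    + [{"sick": False, "complete": i < 20} for i in range(80)]
)


def prevalence(rows):
    """Share of rows flagged as sick."""
    return sum(r["sick"] for r in rows) / len(rows)


overall = prevalence(patients)
complete_only = prevalence([r for r in patients if r["complete"]])
```

In this sketch the true prevalence is 20%, but the complete-record subset suggests about 47%, because selection into the analysis depends on how sick the patient is.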
Bias Challenges
Measurement bias: Errors in measurement and/or data collection
- Instrument calibration
- Data collection variability – depending on the field,
clinician judgement plays a role
- Patient’s ability to complete assessment/provide
history (recall)
- Use of certain codes/data to measure exposure
- Clinician decides how long to “follow” patient
– Impact calculation of prevalence, incidence, risk ratios
Confounding
- Distortion of the estimated effect of exposure on outcome caused by the presence of an extraneous factor associated with both exposure and outcome
– SES factors, lifestyle choices, age
- Without consideration, estimated effect of treatment may be actually caused by some other factor
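A worked sketch of how a confounder can reverse an apparent effect. The counts are hypothetical, disease severity stands in for the extraneous factor, and the Mantel-Haenszel pooled risk ratio is one standard stratified estimator chosen here for illustration. It only helps when the confounder is actually recorded, which is often not the case in EHR data.

```python
# Hypothetical (deaths, patients) by severity stratum and treatment status.
STRATA = {
    "severe": {"treated": (24, 80), "untreated": (8, 20)},
    "mild": {"treated": (1, 20), "untreated": (8, 80)},
}


def crude_risk_ratio(strata):
    """Risk ratio ignoring severity (all strata pooled naively)."""
    t_deaths = sum(s["treated"][0] for s in strata.values())
    t_total = sum(s["treated"][1] for s in strata.values())
    u_deaths = sum(s["untreated"][0] for s in strata.values())
    u_total = sum(s["untreated"][1] for s in strata.values())
    return (t_deaths / t_total) / (u_deaths / u_total)


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel risk ratio, pooled across severity strata."""
    num = den = 0.0
    for s in strata.values():
        a, n1 = s["treated"]      # treated deaths, treated total
        c, n0 = s["untreated"]    # untreated deaths, untreated total
        total = n1 + n0
        num += a * n0 / total
        den += c * n1 / total
    return num / den


crude_rr = crude_risk_ratio(STRATA)        # treatment looks harmful
adjusted_rr = mantel_haenszel_rr(STRATA)   # treatment looks protective
```

Within each stratum the treated group does better (risk ratios of 0.75 and 0.5), yet the crude ratio exceeds 1 because sicker patients are far more likely to be treated.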
Confounding Can Hurt
- In an EHR study of hospitalized patients >65 years, NSAID use was associated with a 32% mortality risk reduction
- However, after including additional specific confounders and analytical techniques, NSAID use was associated with a 6% mortality risk increase
– Addressed unmeasured confounding
Sturmer 2005
Confounding
- Confounding by indication
– Treatment choices influenced by severity or duration of patient’s disease
– Also influences outcome of treatment
– Sicker patients receive different treatments
– Sicker patients have different (worse) outcomes
- Cannot be adjusted for in conventional
regression analyses
From EHR to Clinical Evidence
EHR recorded at the point of care → Data Extractions → Data Wrangling → Data Curation → Data Analyses → Causal Inference → Decision Theory → Evidence Based Decision
EHR data – entry to extract
Sources of variability (*partial lists)
- Data entry
- ETL
- User request
- Analyst
- Self-service tools
Why is this so complicated?
- Conceptual idea of clinical process does not
translate to how data are captured in the EHR
- Many different ways to document same piece of
information
– Workflow used to collect data often dictates where those elements are stored in reporting database
– Most researchers lack understanding of these workflows
- Quality of results then depends on how the question is asked and the skill of the analyst
Example – encounters (CCHMC FY14)
- Annual Report
– Total patient encounters: ~1.2 million
– ED visits: ~100K
– Admissions (including short stay): ~31K
– Outpatient: ~1 million
- EHR
– Total patient encounters: ~3 million
– ED admissions to inpatient: ~145K
– Inpatient: ~28K
– Ambulatory: ~2.8 million
- Encounter != encounter
Just pull data from “ambulatory” encounters…
EEG EXERCISE CARDIOLOGY TESTING PUMP/CGM INITIATION ORDERS MED TAPER SCHEDULE GENETIC COUNSELOR NEONATOLOGY TESTING CARE CONFERENCE - PATIENT/FAMILY PRESENT HOME VISIT - PALLIATIVE CARE ABUSE REPORTING CARE COORDINATOR SPECIAL NEEDS SUMMARY EARLY INTERVENTION HI NEURODEVELOPMENTAL CLINIC TRACKING INFUSION ORDERS ENT CLINIC VISITS FEES/VOICE HEPATOBLASTOMA LIVER TRANSPLANT FOLLOW UP PRE-ADOPTION ENCOUNTER EB PLANNING FEES CLINIC VPI - ENT/SPEECH INTAKE HVMC PLANNING PRE-OP PHYSICAL PLAN OF CARE ENT INPATIENT VISIT HOSPITAL TO HOSPITAL TRANSFER DEVELOPMENTAL TESTING BIOETHICS CONSULT ENDO STIM TESTING HIM INTERFACE CREATED SURGICAL SITE INFECTION DERM PATCH TESTING INTAKE CONSULT ADEC INTAKE CPST-PSY ENCOUNTER ECONSULT TELEMEDICINE ROADMAP HOSPITAL ENCOUNTER UPDATE PCP/CLINIC CHANGE WAIT LIST CLERICAL ORDERS MOTHER BABY LINK LACTATION ENCOUNTER CANCELED APPOINTMENT SURGERY ANESTHESIA ANESTHESIA EVENT UNMERGE HEALTH MAINTENANCE LETTER PATIENT EMAIL E-VISIT MOBILE ORDER ONLY QUESTIONNAIRE SERIES SUBMISSION PATIENT OUTREACH CONTACT MOVED NURSE TRIAGE E-CONSULT E-CONSULT COMMUNITY ORDER TELEMEDICINE EXTERNAL CONTACT OPHTH EXAM HOSPICE ADMISSION HOME HEALTH ADMISSION HOME CARE VISIT HOME CARE UPDATE PATIENT WEB UPDATE COMMUNITY ORDERS COMMITTEE REVIEW POST MORTEM DOCUMENTATION BILLING ENCOUNTER HOSPITAL CONFIDENTIAL OPH TESTING EDUCATOR VOICE CLINIC TELEPHONE REGISTRATION EMPTY LAB REQUISITION INITIAL CONSULT ANTI-COAG VISIT PROCEDURE VISIT OFFICE VISIT CONSENT FORM SCREENING FORM EXTERNAL HOSPITAL ADMISSION LETTER (OUT) REFILL IMMUNIZATION HISTORY RESEARCH ENCOUNTER REFERRAL ORDERS ONLY RX REFILL AUTHORIZE MEDS ONLY (WEB) MEDS VOID (WEB) RESOLUTE PROFESSIONAL BILLING HOSPITAL PROF FEE EPISODE CHANGES ANCILLARY ORDERS PHARMACY VISIT BPA ROUTINE PRENATAL INITIAL PRENATAL OPHTH OFFICE VISIT ABSTRACT WALK-IN TREATMENT PLAN ALLIED HEALTH NURSE ONLY SOCIAL WORK NUTRITION PHYSICAL THERAPY OCCUPATIONAL THERAPY SPEECH THERAPY RESPIRATORY THERAPY CASE MANAGEMENT EDUCATION SURGICAL 
H&P CLINICAL SUPPORT MEDS ONLY / E - PRESCRIBE PFT ONLY TRANSPLANT PRE-EVALUATION TRANSPLANT EVALUATION TRANSPLANT FOLLOW-UP TRANSPLANT RESULTS ENTRY IMMUNOTHERAPY ALLERGY TESTING SPECIMEN COLLECTION AUTO RELEASE ORDERS URODYNAMIC TESTING PRE-NATAL CONSULT CHECKLIST BOWEL MANAGEMENT CARE CONFERENCE INTAKE/TRIAGE VNS REPROGRAM/SHUTOFF CLINICAL NOTE GENETICS PASTORAL THERAPY VISIT INTAKE - NEW PATIENT HIM SCANS PRE-VISIT PLANNING TRANSCRIBED ORDERS SCHOOL TEACHER/INTERVENTION CHILD LIFE THERAPY PROGRESS SUMMARY BRONCHOSCOPY REQUEST HEMONC SOCIAL WORK AUD CONSULT OPH CONSULT ALG CONSULT UROLOGY COMPLEX INTAKE
Give me all data for element X…
EHRs are constantly evolving
- New functionality is released & workflows
change over time
– Clinician-entered
– Patient entry via welcome kiosk
– Patient entry via web-based questionnaire
- These workflows are typically additive, not substitutive
– Need to remember this history
– Will otherwise result in gaps in population
Example – Has a HEALTH RELATED QUALITY OF LIFE (QOL) ASSESSMENT been documented?
- Flowsheet RHE PEDS QL #129, Measure RHE PARENT #3757
- Flowsheet RHE PEDS QL #129, Measure RHE PATIENT #1799
- Flowsheet RHE PEDS QL #129, Measure GEN PATIENT #3758
- Flowsheet RHE PEDS QL #129, Measure GEN PARENT #3759
- Questionnaire RHE PEDSQL 13-18 TEEN REPORT #20702, Question: RHE PEDSQL 13-18 CHILD TOTAL SCORE #400411
- Questionnaire RHE PEDSQL 13-18 PARENT REPORT FOR TEENS #20703, Question: RHE PEDSQL 13-18 PARENT TOTAL SCORE #20544
- Questionnaire RHE PEDSQL 2-4 PARENT REPORT FOR TODDLERS #20699, Question: RHE PEDSQL 2-4 PARENT TOTAL SCORE #400415
- Questionnaire RHE PEDSQL 5-7 PARENT REPORT FOR YOUNG CHILDREN #20700, Question: RHE PEDSQL 5-7 PARENT TOTAL SCORE #400421
- Questionnaire RHE PEDSQL 5-7 YOUNG CHILD REPORT #20701, Question: RHE PEDSQL 5-7 CHILD TOTAL SCORE #400427
- Questionnaire RHE PEDSQL 8-12 PARENT REPORT FOR CHILDREN #20706, Question: RHE PEDSQL 8-12 PARENT TOTAL SCORE #400439
- Questionnaire RHE PEDSQL 8-12 CHILD REPORT #20705, Question: RHE PEDSQL 8-12 CHILD TOTAL SCORE #400433
- Questionnaire PEDSQL GENERIC 1-12MOS PARENT REPORT FOR INFANTS #20758, Question: PEDSQL 1-12MOS TOTAL SCORE #400280
- Questionnaire PEDSQL GENERIC 13-18 TEEN REPORT #20745, Question: PEDSQL 13-18C TOTAL SCORE #400163
- Questionnaire PEDSQL GENERIC 13-18 PARENT REPORT FOR TEENS #20686, Question: PEDSQL 13-18P TOTAL SCORE #400158
- Questionnaire PEDSQL GENERIC 13-24MOS PARENT REPORT FOR INFANTS #20759, Question: PEDSQL 13-24MOS TOTAL SCORE #100857
- Questionnaire PEDSQL GENERIC 18-25 YOUNG ADULT REPORT #20684, Question: PEDSQL 18-25C TOTAL SCORE #400183
- Questionnaire PEDSQL GENERIC 2-4 PARENT REPORT FOR TODDLERS #20688, Question: PEDSQL 2-4P TOTAL SCORE #400188
- Questionnaire PEDSQL GENERIC 5-7 PARENT REPORT FOR YOUNG CHILDREN #20689, Question: PEDSQL 5-7P TOTAL SCORE #400153
- Questionnaire PEDSQL GENERIC 5-7 YOUNG CHILD REPORT #20683, Question: PEDSQL 5-7C TOTAL SCORE #400178
- Questionnaire PEDSQL GENERIC 8-12 PARENT REPORT FOR CHILDREN #20687, Question: PEDSQL 8-12P TOTAL SCORE #400173
- Questionnaire PEDSQL GENERIC 8-12 CHILD REPORT #20685, Question: PEDSQL 8-12C TOTAL SCORE #400168
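Answering "has a QOL assessment been documented?" means checking every one of these locations. The sketch below uses the flowsheet, measure, and question IDs listed above; the record layout (`source`, `flowsheet_id`, `measure_id`, `question_id`) is a made-up stand-in for whatever the reporting database actually exposes.

```python
# (flowsheet ID, measure ID) pairs from the list above.
QOL_FLOWSHEET_MEASURES = {(129, 3757), (129, 1799), (129, 3758), (129, 3759)}

# Total-score question IDs from the questionnaire entries above.
QOL_QUESTION_IDS = {
    400411, 20544, 400415, 400421, 400427, 400439, 400433,
    400280, 400163, 400158, 100857, 400183, 400188, 400153,
    400178, 400173, 400168,
}


def has_qol_assessment(records):
    """True if any record matches a known QOL flowsheet measure or question.

    `records` is a list of dicts in a hypothetical layout, for illustration only.
    """
    for rec in records:
        if rec["source"] == "flowsheet":
            if (rec["flowsheet_id"], rec["measure_id"]) in QOL_FLOWSHEET_MEASURES:
                return True
        elif rec["source"] == "questionnaire":
            if rec["question_id"] in QOL_QUESTION_IDS:
                return True
    return False


# A patient documented only through the newer questionnaire workflow.
kiosk_patient = [{"source": "questionnaire", "question_id": 400168}]
# A patient with an unrelated flowsheet row only.
other_patient = [{"source": "flowsheet", "flowsheet_id": 129, "measure_id": 9999}]

kiosk_found = has_qol_assessment(kiosk_patient)
other_found = has_qol_assessment(other_patient)
```

A query that checked only the four flowsheet measures would miss the first patient entirely, which is the "gap in population" that additive workflows create.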
Are there any solutions?
- Engagement with operational reporting groups /
data stewards
– Often serve as source of truth for a given area
– Deal with much higher request volume
– However, different priorities and funding models can make it difficult to keep activities aligned
- Quality checks / Data characterization
– Should help identify if there is a problem
– But not necessarily where to look for the solution
– Difficult to communicate/disseminate findings
Assessing Data Quality within PCORnet
Enabling Research at a National Scale
How do you ask a research question at hundreds of institutions and get back results you can trust?
Option 1: Write a description and have everyone create a local implementation to run on their data
Option 2: Create an algorithm that can run against a single, common data model
PCORnet Data Strategy
Standardize data into a common data model
Focus on data quality: data curation
Operate a secure distributed query infrastructure
- Develop re-usable tools to query the data
- Send questions to the data and only return required information
Learn by doing and repeat
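"Send questions to the data" can be pictured with a toy example. This is not PCORnet's query infrastructure; the site names, rows, and the `run_distributed` helper are all hypothetical. The point is only that the same query function runs locally at every site against a shared layout, and only an aggregate leaves the site.

```python
# Hypothetical patient-level rows held locally at each site, in a common layout.
SITES = {
    "site_a": [{"patid": "a1", "age": 4}, {"patid": "a2", "age": 11}],
    "site_b": [
        {"patid": "b1", "age": 3},
        {"patid": "b2", "age": 2},
        {"patid": "b3", "age": 15},
    ],
}


def count_under_five(rows):
    """The 'question': returns a single count, never patient-level rows."""
    return sum(1 for r in rows if r["age"] < 5)


def run_distributed(query, sites):
    """Run the same query at each site and collect only the aggregate results."""
    return {name: query(rows) for name, rows in sites.items()}


results = run_distributed(count_under_five, SITES)
network_total = sum(results.values())
```

Because every site stores `age` the same way, one query definition works everywhere, which is the practical argument for Option 2 over locally re-implemented descriptions.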
Loading the Common Data Model (easy)
Same data are represented differently at different institutions (e.g., Race)
In order to be able to trust results of an analysis, we need to have consistent representations
Common Data Model Value Set
- 01 = American Indian or Alaska Native
- 02 = Asian
- 03 = Black or African American
- 04 = Native Hawaiian or Other Pacific Islander
- 05 = White
- 06 = Multiple Race
- 07 = Refuse to Answer
- NI = No Information
- UN = Unknown
- OT = Other
SITE 1
Caucasian African American Asian Multiple Race Blank
SITE 2
101 201 300 401 500 600
SITE 3
African American American Indian Asian American White Other Unknown
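Loading a local race field into the common value set is essentially a lookup with an audit trail. In the sketch below the CDM codes come from the value set above (with UN for Unknown, as in the encounter-type value set), while the Site 1 mapping is an assumption made for illustration; a real mapping would be agreed with the site's data stewards.

```python
# CDM race value set from the slide.
CDM_RACE = {
    "01": "American Indian or Alaska Native",
    "02": "Asian",
    "03": "Black or African American",
    "04": "Native Hawaiian or Other Pacific Islander",
    "05": "White",
    "06": "Multiple Race",
    "07": "Refuse to Answer",
    "NI": "No Information",
    "UN": "Unknown",
    "OT": "Other",
}

# Hypothetical mapping for Site 1's local values (keys are lower-cased).
SITE1_TO_CDM = {
    "caucasian": "05",
    "african american": "03",
    "asian": "02",
    "multiple race": "06",
    "": "NI",  # blank in the source system
}


def map_race(values, mapping):
    """Map local values to CDM codes; collect anything unmapped for review."""
    codes, unmapped = [], []
    for value in values:
        key = (value or "").strip().lower()
        code = mapping.get(key)
        if code is None:
            unmapped.append(value)
        codes.append(code)
    return codes, unmapped


codes, unmapped = map_race(["Caucasian", "Asian", "", "Hispanic"], SITE1_TO_CDM)
```

Returning the unmapped values, rather than silently coding them as Other, keeps the decision about new local values with a person instead of the load script.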
Loading the Common Data Model (less easy)
Same data are represented differently at different institutions (e.g., Type of Encounter)
In order to be able to trust results of an analysis, we need to have consistent representations
Common Data Model
- Ambulatory Visit (AV)
- Emergency Department (ED)
- ED Admit to Inpatient (EI)
- Inpatient Hospital (IP)
- Non-Acute Inst. Stay (IS)
- Other Ambulatory (OA)
- Other (OT)
- Unknown (UN)
- No Information (NI)
SITE 1
Social Work Visit Allied Health Office Visit Nurse Visit Procedure Visit Employee Health Vascular Lab Sleep Study Visit Social Work Visit
SITE 2
Office Visit Specimen Postpartum Visit Clinical Support Initial Prenatal
SITE 3
Home Care Visit Office Visit Therapy Visit Orders Only Cardiology Testing Hospital Encounter
Factors that increase complexity
- People interpret the CDM specification differently, resulting in variability in how the CDM is populated
- Different health systems, with different EHRs, implemented at different times
- Clinical workflows differ across institutions & impact availability of data
- Understanding of EHR / claims data sources differs across institutions – may impact what gets loaded from source systems
All of these issues are present when doing research with EHR data, even within a single center
We have tools/processes to address this!
Data Curation assesses and improves global data quality
- Characterize the contents of the PCORnet CDM
- Evaluate global data quality and fitness-for-use across a broad
research portfolio
For a given study, still need to consider data characterization specific to the aims
- Assess data on the intended cohort
- Ensure that outcomes / variables of interest are available & complete
- Determine whether partners actually have enough data / patients to
participate
- Requires upfront investment, but can save significant time overall
Data Curation
- Step 1: Network partner plans DataMart refresh
- Step 2: Network partner responds to the data characterization query package
- Step 3: Coordinating Center approves the DataMart
- Step 4: Coordinating Center analyzes results and solicits more information as needed
- Step 5: Coordinating Center holds Data Characterization and Implementation Forums and updates Implementation Guidance
Cycle 2 Required Data Checks
Data Model Conformance
- DC 1.01: Required tables are not present
- DC 1.02: Expected tables are not populated
- DC 1.03: Required fields are not present
- DC 1.04: Fields do not conform to CDM specifications for data type, length, or name
- DC 1.05: Tables have primary key definition errors
- DC 1.06: Fields contain values outside of CDM specifications
- DC 1.07: Fields have non-permissible missing values
- DC 1.08: Tables contain orphan PATIDs (PATIDs not in DEMOGRAPHIC)
- DC 1.10: Replication errors between the ENCOUNTER, PROCEDURES and DIAGNOSIS tables
Data Completeness
- DC 3.04: Less than 50% of patients with encounters have DIAGNOSIS records
- DC 3.05: Less than 50% of patients with encounters have PROCEDURES records
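Two of the required checks are simple enough to sketch directly. The functions below are an illustrative reading of DC 1.08 and DC 3.04 as described above, not the PCORnet query package; the tables are tiny hypothetical lists of rows keyed by PATID.

```python
# Hypothetical miniature CDM tables.
DEMOGRAPHIC = [{"PATID": "P1"}, {"PATID": "P2"}, {"PATID": "P3"}]
ENCOUNTER = [{"PATID": "P1"}, {"PATID": "P2"}, {"PATID": "P3"}, {"PATID": "P9"}]
DIAGNOSIS = [{"PATID": "P1"}]


def orphan_patids(demographic, table):
    """DC 1.08 (as read here): PATIDs in `table` that are absent from DEMOGRAPHIC."""
    known = {row["PATID"] for row in demographic}
    return sorted({row["PATID"] for row in table} - known)


def pct_encounter_patients_with(encounter, table):
    """Percent of patients with encounters who also appear in `table`."""
    enc_patients = {row["PATID"] for row in encounter}
    if not enc_patients:
        return 0.0
    table_patients = {row["PATID"] for row in table}
    return 100.0 * len(enc_patients & table_patients) / len(enc_patients)


orphans = orphan_patids(DEMOGRAPHIC, ENCOUNTER)
pct_with_dx = pct_encounter_patients_with(ENCOUNTER, DIAGNOSIS)
fails_dc_3_04 = pct_with_dx < 50.0   # DC 3.04 threshold from the list above
```

Here P9 is an orphan, and only 25% of encounter patients have a diagnosis record, so this toy DataMart would trip both checks.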
Cycle 2 Investigative Data Checks
Data Model Conformance
- DC 1.09: Tables have orphan ENCOUNTERIDs for more than 5% of records
Data Plausibility
- DC 2.01: More than 5% of records have future dates
- DC 2.02: More than 10% of records fall into the lowest or highest categories of age, height, weight, diastolic blood pressure, systolic blood pressure, prescribed days supply, or dispensed days supply
- DC 2.03: More than 5% of records have illogical date relationships
- DC 2.04: The average number of encounters per visit is > 2.0 for inpatient (IP), emergency department (ED), or ED to inpatient (EI) encounters
Data Completeness
- DC 3.01: The average number of diagnoses records with known diagnosis types per encounter is below threshold [1.0 for ambulatory (AV), inpatient (IP), emergency department (ED), or ED to inpatient (EI) encounters]
- DC 3.02: The average number of procedure records with known procedure types per encounter is below threshold [0.75 for ambulatory (AV) encounters, 0.75 for emergency department (ED) encounters, 1.00 for ED to inpatient (EI) encounters, and 1.00 for inpatient (IP) encounters]
- DC 3.03: More than 10% of records have missing or unknown values for the following fields: BIRTH_DATE, SEX, DISCHARGE_DISPOSITION (IP/EI encounters only), DISCHARGE_DATE (IP/EI encounters only), PX_DATE, LOINC, RX_NORM_CUI, RX_ORDER_DATE, RX_DAYS_SUPPLY, or DISPENSE_SUP
- DC 3.06: More than 10% of inpatient (IP) or ED to inpatient (EI) encounters with a diagnosis don't have a principal diagnosis
Data partners are asked to investigate and comment on any exceptions in their Annotated Data Dictionary, and to classify these exceptions as follows: feature/limitation of source data; could be improved in the near future; may be improved in the future; or warrants further investigation.
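The investigative checks are threshold tests on simple percentages. The sketch below illustrates DC 2.01 and DC 3.03 on invented data; treating NI, UN, and OT as "missing or unknown" is an assumption made here, and the reference date is fixed so the example is reproducible.

```python
from datetime import date

# Assumption for this sketch: these values count as missing or unknown.
MISSING_CODES = {None, "", "NI", "UN", "OT"}


def pct_future(dates, today):
    """Percent of non-null dates that fall after `today`."""
    present = [d for d in dates if d is not None]
    if not present:
        return 0.0
    return 100.0 * sum(d > today for d in present) / len(present)


def pct_missing(values):
    """Percent of values that are missing or unknown."""
    if not values:
        return 0.0
    return 100.0 * sum(v in MISSING_CODES for v in values) / len(values)


# Hypothetical PX_DATE and SEX columns.
px_dates = [date(2015, 1, 1)] * 18 + [date(2099, 1, 1)] * 2
sex = ["F", "M", "NI", "F", "M", "F", "M", "F", "M", "F"]

reference_day = date(2016, 1, 1)
flag_dc_2_01 = pct_future(px_dates, reference_day) > 5.0   # "more than 5%"
flag_dc_3_03 = pct_missing(sex) > 10.0                     # "more than 10%"
```

The date column is flagged at 10% future dates, while the SEX column sits exactly at 10% missing and is not flagged, since the check is worded as "more than".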
Resources for Network Partners
Empirical Data Characterization Report Excerpt
Study-specific data quality
Antibiotics study overview
Study Aims: To evaluate the comparative effects of different types, timing, and amount of antibiotics prescribed during the first 2 years of life on:
- Body mass index and risk of obesity at 5 and 10 years
- Growth trajectories from infancy onwards
And how these effects differ according to:
- Child sex, race/ethnicity, geography
- Use of other medications
- Maternal BMI, antibiotics during pregnancy, C-section (analysis at 7 sites)
Conducted study-specific data characterization to assess site eligibility:
- Findings for prescriptions
- RxNorm considerations
Study-specific data characterization findings
Lower number of children ≤ 2 with an antibiotics prescription
Start minus end date
- Low percent missing (~5%)
- Note: This is very different than global measures (highly missing)
- May be useful: 50th percentile = 10 days
- Huge range (5th percentile = 0 days ; 95th percentile = 108 days)
Quantity
- Varying interpretations of quantity (pill, mg, ml, etc.)
- Large range (5th percentile = 11.00; 95th percentile = 225.50)
- Missing in 52% of ABX prescriptions
Refills
- Not consistently populated (60% missing)
Days supply
- Only populated in 4% of ABX prescribing records
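The contrast between global and study-specific missingness can be sketched as follows. All rows are hypothetical and were constructed to echo the pattern described above (a field that is mostly missing overall but well populated in the study cohort); the nearest-rank percentile is one simple definition among several.

```python
import math


def pct_missing(rows, field):
    """Percent of rows where `field` is None."""
    return 100.0 * sum(1 for r in rows if r[field] is None) / len(rows)


def nearest_rank_percentile(values, p):
    """Nearest-rank percentile (p in 0-100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical prescription durations (days) for the study cohort.
cohort_durations = [0, 5, 7, 7] + [10] * 9 + [14, 14, 14, 21, 30, 108]
cohort_rows = [{"in_cohort": True, "duration_days": d} for d in cohort_durations]
cohort_rows.append({"in_cohort": True, "duration_days": None})

# Hypothetical remaining prescriptions, where the field is mostly empty.
other_rows = (
    [{"in_cohort": False, "duration_days": None} for _ in range(70)]
    + [{"in_cohort": False, "duration_days": 30} for _ in range(10)]
)

all_rows = cohort_rows + other_rows
global_missing = pct_missing(all_rows, "duration_days")
cohort_missing = pct_missing(cohort_rows, "duration_days")
median_days = nearest_rank_percentile(cohort_durations, 50)
```

In this toy data the field is 71% missing globally but only 5% missing in the cohort, so a global report alone would have wrongly ruled the variable out.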
Study-specific data characterization findings
Initial query only included RxNorm Dose Form and Clinical Drug or Pack
- Specific codes that allow identification of all aspects of the
prescription (>2000 codes)
- Did not include less specific codes: RxNorm Ingredient, Precise Ingredient, or Drug Component
Learned that several network partners had not mapped to the specific codes
- Had to ask network partners to map to the specific codes
- Assess whether to include ingredient-level records in the analysis
What RXCUI term types are used?
Categorization of term types
Category-1
- 1. Semantic Clinical Drug-SCD
- 2. Semantic Branded Drug-SBD
- 3. Generic Pack-GPCK
- 4. Branded Name Pack-BPCK
Category-2
- 1. Semantic Clinical Drug Form-SCDF
- 2. Semantic Branded Drug Form-SBDF
- 3. Semantic Clinical Dose Form Group-SCDG
- 4. Multiple Ingredients-MIN
- 5. Precise Ingredient-PIN
- 6. Ingredient-IN
- 7. Semantic Branded Drug Component-SBDC
- 8. Semantic Clinical Drug Component-SCDC
Category-3
- 1. Branded Name-BN
- 2. Semantic Branded Dose Form Group-SBDG
- 3. Dose Form Group-DFG
- 4. Dose Form-DF
Category-1
(Ingredient + Strength + Dose Form)
Category-2
- 1. Ingredient
- 2. Ingredient + Strength
- 3. Ingredient + Dose Form
Category-3
- 1. Brand Name
- 2. Dose Form
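The categorization above translates directly into a lookup table. The term-type-to-category assignments below are taken from the lists above; the sample of term types and the `category_shares` helper are hypothetical, and category 0 is used here for any term type not on the lists.

```python
from collections import Counter

# RxNorm term type (TTY) -> category, as listed above.
TTY_CATEGORY = {
    "SCD": 1, "SBD": 1, "GPCK": 1, "BPCK": 1,
    "SCDF": 2, "SBDF": 2, "SCDG": 2, "MIN": 2,
    "PIN": 2, "IN": 2, "SBDC": 2, "SCDC": 2,
    "BN": 3, "SBDG": 3, "DFG": 3, "DF": 3,
}


def category_shares(term_types):
    """Share of prescribing records in each category (0 = not categorized)."""
    counts = Counter(TTY_CATEGORY.get(tty, 0) for tty in term_types)
    total = len(term_types)
    return {cat: counts[cat] / total for cat in sorted(counts)}


# Hypothetical term types for one DataMart's antibiotic prescribing records.
sample = ["SCD", "SCD", "SBD", "IN", "SCDC", "BN", "SCD", "GPCK"]
shares = category_shares(sample)
```

A DataMart with most records in Category-1 can support dose-level analyses, while one dominated by Category-2 can only identify the ingredient, which is the question the study team had to assess site by site.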
RXCUI Term Types Distribution by Category and DataMart
[Figure: distribution of RXCUI term types by category for each DataMart; labels: Network ID, Data Mart ID]
From EHR to Clinical Evidence
EHR recorded at the point of care → Data Extractions → Data Wrangling → Data Curation → Data Analyses → Causal Inference → Decision Theory → Evidence Based Decision