ASSESSMENT FRAMEWORK LISA SCHILLING, MD, MSPH ACADEMY HEALTH ANNUAL - - PowerPoint PPT Presentation

SLIDE 1

DATA QUALITY ASSESSMENT FRAMEWORK

LISA SCHILLING, MD, MSPH
ACADEMYHEALTH ANNUAL RESEARCH MEETING, JUNE 27, 2017

SLIDE 2

LOTS OF ACKNOWLEDGMENTS

  • Funding
    • AHRQ 1R01HS019908 (SAFTINet, PI Schilling)
    • AHRQ 1R01HS019912 (SPAN, PI Steiner)
    • AHRQ U13 HS19564-01 AcademyHealth / EDM Forum (PI Holve)
    • NCATS UL1 TR000154 (University of Colorado CTSA) (PI Sokel)
    • PCORI CER Methods Award 5581 (PI Kahn)
  • Slides
    • Michael Kahn, MD, PhD, University of Colorado
    • Tiffany Callahan, University of Colorado
    • Maggie Massery, Children’s Hospital Colorado
SLIDE 3

Weber, G. M., Mandl, K. D. & Kohane, I. S. Finding the missing link for big biomedical data. JAMA 311, 2479–2480 (2014).

SLIDE 4

DATA QUALITY IN EHRs

  • Data collection tools optimized for efficiency and billing
    • Text templates
    • Copy/paste
  • Minimal data validation checks
    • Min/Max limits
    • Pick lists
    • Required fields
SLIDE 5

THE SIMPLE STUFF…

SLIDE 6

BP – ANY TIME

Measure Name | Patients Used On | Times Used
BLOOD PRESSURE | 538,647 | 13,869,327
R AN NIBP | 63,576 | 2,949,877
CARD BP 3 | 14,631 | 26,889
ABP INVASIVE PRESSURE | 9,031 | 3,382,825
BLOOD PRESSURE (ED SEDATION) | 7,402 | 41,498
EDU STAND BP | 6,950 | 33,876
EDU LYING BP | 6,941 | 32,609
CARD BP 2 | 6,878 | 9,934
ED PRE HOSP BP | 6,323 | 7,117
BP #2 | 5,529 | 40,592
EDU SIT BP | 4,957 | 6,152
CARD BP 4 | 4,452 | 6,806
R AN IBP ART | 4,430 | 1,181,368
BP #3 | 4,330 | 24,675
BP - STANDING | 4,098 | 6,120
BP - LYING | 4,068 | 5,753
BP - SITTING | 3,920 | 5,292
BP #4 | 3,477 | 15,898
BP PRE SEDATION | 1,831 | 2,246
PAP | 1,793 | 218,931
BLOOD PRESSURE (CS) | 1,322 | 8,290
ART PRESSURE #2 | 404 | 136,579
R AN IBP PAP | 71 | 6,488
R AN IBP P1 | 60 | 4,562
ECMO BLOOD PRESSURE | 57 | 85,037
CARD BP 1 | 55 | 56
R AN IBP AO | 53 | 4,129
RV PRESSURE | 50 | 124
R AN IBP FAP | 37 | 3,997
R AN IBP UAP | 27 | 1,021
R AN IBP P2 | 13 | 339
R AN IBP P | 11 | 282
R AN IBP LAP | 11 | 634
BP #2 | 8 | 8
R AN IBP P4 | 2 | 2
R AN IBP BAP | 1 | 1

Children’s Hospital Colorado (CHCO) slides from Maggie Massery. Used with permission.

SLIDE 7

EHR WORKFLOWS MEET DATA QUALITY

  • Core vital signs: Blood Pressure, Height & Weight
  • Blood Pressure: 113 unique BP names
    • 15 have been deleted
    • 45 are hidden
    • 52 are available
      • 37 are in use (have values)
        • 29 have been used more than a thousand times
        • 14 have been used on fewer than 71 patients
        • 23 have been used on more than 371 patients

SLIDE 8

LET’S TALK ABOUT DATA QUALITY (DQ) AND THE WAY WE DESCRIBE IT…

  • Six-year-olds whose EMR records say they:
    • Are married (53)
    • Have significant others (18)
    • Are divorced (2) / legally separated (3)
  • What term would you use to describe this issue?
    • Data validity
    • Data accuracy
    • Trueness / truthiness
    • Believability
    • Consistency (age versus marital status)
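Whatever term is chosen, the underlying issue is a rule that two variables must agree. A minimal Python sketch of such a rule-based check follows; the `Patient` record, the status labels, and the age cutoff of 12 are all hypothetical choices for illustration, not values from the presentation or from any particular EHR.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative, not a real EHR schema.
    mrn: str
    age_years: int
    marital_status: str


# Assumed value set: statuses that presuppose an adult relationship history.
ADULT_ONLY_STATUSES = {"Married", "Significant Other", "Divorced", "Legally Separated"}

# Assumed age cutoff, chosen only for illustration.
MIN_PLAUSIBLE_AGE = 12


def flag_age_marital_conflicts(patients):
    """Return MRNs of patients younger than the cutoff whose marital status implies adulthood."""
    return [
        p.mrn
        for p in patients
        if p.age_years < MIN_PLAUSIBLE_AGE and p.marital_status in ADULT_ONLY_STATUSES
    ]


if __name__ == "__main__":
    records = [
        Patient("A1", 6, "Married"),
        Patient("A2", 6, "Single"),
        Patient("A3", 34, "Divorced"),
        Patient("A4", 6, "Legally Separated"),
    ]
    print(flag_age_marital_conflicts(records))  # ['A1', 'A4']
```

In the harmonized terminology presented later, a rule like this is a plausibility verification check (logical constraints between variables), because the expectation comes from common knowledge rather than from an external data source.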

SLIDE 9

WHY STANDARDIZE DATA QUALITY TERMINOLOGY?

  • Standardizing DQ terminology is a first step in…
  • Standardizing DQ assessment methods…
  • Supports sharable and reusable DQ methods
  • Supports common understanding of DQ issues
  • Supports increased transparency and trust in analytic methods & findings

SLIDE 10

DIVERSITY IN THE USE OF DQ TERMS

SLIDE 11

COMMUNITY-DRIVEN CONSENSUS RECOMMENDATIONS FOR DQ REPORTING

SLIDE 12

20 ITEMS IN 5 DOMAINS

  • Original Data Source
  • Data Steward
  • Data Processing/Provenance
  • Data Element Characterization
  • Analysis-specific Data Quality Specifications
SLIDE 13

COMMUNITY-DRIVEN CONSENSUS RECOMMENDATIONS FOR DQ REPORTING

SLIDE 14
SLIDE 15

THE HARMONIZED DATA QUALITY TERMINOLOGY

  • Divides the DQ “world” into two “contexts”
    • Verification: What you can do with just the data (and knowledge) you have on hand.
      • Expectations are derived internally
    • Validation: Brings in external resources – relative gold standards, recognized benchmarks/comparators
      • Expectations are derived externally

SLIDE 16

THREE DQ CATEGORIES THAT BUILD ON EACH OTHER

  • Completeness: Are data values present?
    • Doesn’t evaluate whether the values make sense, just “Are values there or not?”
  • Fidelity: Are the data dependable?
    • Doesn’t evaluate whether the values are believable, just “Do values align together as expected?”
  • Plausibility: Are the data believable?
    • Doesn’t depend on the existence of an absolute truth

SLIDE 17

COMPLETENESS: ARE THE DATA PRESENT?

Density
  • Verification definitions:
    • a. Atemporal: Measures of data density against a denominator are expected based on internal knowledge.
    • b. Temporal: Measures of data density against a time-oriented denominator are expected based on internal knowledge. Includes total missingness measures.
  • Verification examples:
    • a. Similar counts of missing patient observations between ETLs.
    • b. Counts of monthly emergency room visits during flu season.
  • Validation definitions:
    • a. Atemporal: Measures of data density against a denominator are expected based on external knowledge.
    • b. Temporal: Measures of data density against a time-oriented denominator are expected based on external knowledge. Includes total missingness measures.
  • Validation examples:
    • a. Similar counts of missing patient observations across network data partners.
    • b. Changes in counts of monthly emergency room visits during flu season are similar to health department reports.
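The two density measures in the Verification column can be sketched as simple counts. The Python below is a minimal sketch, assuming rows arrive as dictionaries and visits as `datetime.date` values; the function names and the 5% tolerance in `similar` are hypothetical, chosen only to illustrate comparing one ETL run against another.

```python
from collections import Counter
from datetime import date


def missing_count(rows, field):
    """Atemporal density: count rows where `field` is absent or None."""
    return sum(1 for row in rows if row.get(field) is None)


def monthly_counts(visit_dates):
    """Temporal density: count visits in each (year, month) bucket."""
    return Counter((d.year, d.month) for d in visit_dates)


def similar(a, b, rel_tol=0.05):
    """True if two non-negative counts differ by at most rel_tol of the larger (assumed tolerance)."""
    if a == b:
        return True
    return abs(a - b) / max(a, b) <= rel_tol


if __name__ == "__main__":
    # Hypothetical extracts of the same table from two ETL runs.
    etl_run_1 = [{"mrn": "1", "weight_kg": 20.1}, {"mrn": "2", "weight_kg": None}, {"mrn": "3"}]
    etl_run_2 = [{"mrn": "1", "weight_kg": 20.1}, {"mrn": "2"}, {"mrn": "3", "weight_kg": None}]
    print(similar(missing_count(etl_run_1, "weight_kg"), missing_count(etl_run_2, "weight_kg")))  # True

    # Hypothetical emergency room visit dates.
    visits = [date(2017, 1, 3), date(2017, 1, 20), date(2017, 2, 1)]
    print(sorted(monthly_counts(visits).items()))  # [((2017, 1), 2), ((2017, 2), 1)]
```

Turning the same counts into validation checks would mean replacing the second ETL run with an external comparator, such as a network data partner’s counts or health department reports.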

SLIDE 18

FIDELITY: ARE THE DATA DEPENDABLE?

Metadata
  • Verification definitions:
    • a. Data elements conform to internal formatting constraints.
    • b. Data elements conform to relational constraints.
  • Verification examples:
    • a. Sex is only one ASCII character.
    • a. Sex only has values ‘M’, ‘F’, or ‘U’.
    • b. Patient MRNs link to other tables as required.
  • Validation definitions:
    • a. Data elements conform to representational constraints based on external standards.
  • Validation examples:
    • a. Formatting for the primary language variable in the demographics table conforms to ISO standards.

Measure
  • Verification definitions:
    • a. Repeated measurements of the same fact show expected variability.
  • Verification examples:
    • a. Patient height measurements are similar when taken by two separate nurses within the same facility.
  • Validation definitions:
    • a. Two dependent databases (e.g., database 1 abstracted from database 2) yield similar results for identical measurements.
  • Validation examples:
    • a. Recorded date of birth is consistent between EHR data and registry data for the same facility.

Derivation
  • Verification definitions:
    • a. Derived values conform to computational or programming specifications.
  • Verification examples:
    • a. Database- and hand-calculated Body Mass Index values are identical.
  • Validation definitions:
    • a. Two programmers provided with identical specifications and identical data sets report identical results for derived values.
  • Validation examples:
    • a. Data transformations implemented in SAS and R yield identical results on the same data set.

Uniqueness
  • Verification definitions:
    • a. The database is free of duplicate measurements.
    • b. Within a database, merged objects are counted only once.
  • Verification examples:
    • a. Each patient is registered under a single MRN.
    • b. Person records obtained via EHR and claims data are counted only once.
  • Validation definitions:
    • a. An object represented in a source database is uniquely represented in a target database.
    • b. An object represented in a source database is represented by its components in a target database.
  • Validation examples:
    • a. A single charge in a claims database represents a single encounter in the EHR.
    • b. A single drug order in an EHR database is represented by its ingredients in a pharmacy database.
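Several of the Fidelity verification examples translate directly into checks. The Python below is a minimal sketch, assuming patient rows are dictionaries with hypothetical keys (`mrn`, `sex`, `last_name`, `first_name`, `dob`, `weight_kg`, `height_m`, `bmi`); the 0.05 BMI tolerance and the exact name-plus-birth-date match used for the uniqueness check are assumptions, and real record linkage is usually far more sophisticated.

```python
from collections import defaultdict

# Value set taken from the slide's example: sex is one of 'M', 'F', or 'U'.
ALLOWED_SEX_VALUES = {"M", "F", "U"}


def sex_value_violations(patients):
    """Metadata (formatting): MRNs whose sex code is not one of the allowed single characters."""
    return [p["mrn"] for p in patients if p.get("sex") not in ALLOWED_SEX_VALUES]


def orphan_mrns(child_rows, patient_mrns):
    """Metadata (relational): child-table MRNs with no matching row in the patient table."""
    return [r["mrn"] for r in child_rows if r["mrn"] not in patient_mrns]


def bmi(weight_kg, height_m):
    """Derivation: Body Mass Index = weight (kg) / height (m) squared."""
    return weight_kg / height_m ** 2


def bmi_mismatches(rows, tol=0.05):
    """Derivation: MRNs whose stored BMI differs from a recomputed BMI by more than tol (assumed)."""
    return [
        r["mrn"]
        for r in rows
        if abs(r["bmi"] - bmi(r["weight_kg"], r["height_m"])) > tol
    ]


def patients_with_multiple_mrns(patients):
    """Uniqueness: people (naively matched on name + DOB) registered under more than one MRN."""
    mrns_by_person = defaultdict(set)
    for p in patients:
        mrns_by_person[(p["last_name"], p["first_name"], p["dob"])].add(p["mrn"])
    return {person: sorted(mrns) for person, mrns in mrns_by_person.items() if len(mrns) > 1}


if __name__ == "__main__":
    patients = [
        {"mrn": "1", "sex": "F", "last_name": "Doe", "first_name": "Ann", "dob": "2011-02-03"},
        {"mrn": "2", "sex": "female", "last_name": "Doe", "first_name": "Ann", "dob": "2011-02-03"},
        {"mrn": "3", "sex": "M", "last_name": "Roe", "first_name": "Bo", "dob": "2010-05-06"},
    ]
    print(sex_value_violations(patients))  # ['2']
    print(orphan_mrns([{"mrn": "1"}, {"mrn": "9"}], {p["mrn"] for p in patients}))  # ['9']
    print(patients_with_multiple_mrns(patients))  # {('Doe', 'Ann', '2011-02-03'): ['1', '2']}
```

The validation-side examples on this slide (for example, SAS versus R implementations) would instead compare the outputs of two independent implementations or databases, rather than checking data against a single internal rule.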

SLIDE 19

PLAUSIBILITY: ARE THE DATA BELIEVABLE?

Measure
  • Verification definitions:
    • a. Data values and distributions agree with an internal measurement or local knowledge.
    • b. Independent measurements of the same fact are in agreement.
    • c. Logical constraints between variables and subgroups agree with local or common knowledge (includes "expected" missingness).
  • Verification examples:
    • a. All patients have positive values for height and weight.
    • b. Serum glucose measurement is similar to finger-stick glucose measurement.
    • b. Oral and axillary temperatures are similar.
    • c. Sex agrees with sex-specific contexts (pregnancy, prostate cancer).
    • c. Inpatient diagnoses are not associated with outpatient encounters.
  • Validation definitions:
    • a. Data values and distributions (including subgroup distributions) agree with trusted reference standards or external knowledge.
    • b. Similar results for identical measurements are obtained from two independent databases representing the same observations with equal credibility.
  • Validation examples:
    • a. HbA1c values from the hospital and a national reference lab are statistically similar under the same conditions.
    • b. Diabetes ICD-9 and CPT codes are similar between two independent claims databases serving similar populations.

Time
  • Verification definitions:
    • a. Observed or derived values conform to expected temporal properties.
    • b. Sequences or state transitions conform to expected properties.
  • Verification examples:
    • a. Length of stay for outpatient procedures per year conforms to expectations.
    • b. An initial immunization precedes a booster immunization.
  • Validation definitions:
    • a. Observed or derived values have similar temporal properties across one or more external comparators or gold standards.
    • b. Sequences or state transitions are similar to external comparators or gold standards.
  • Validation examples:
    • a. Length of stay for outpatient procedures conforms to Medicare data for similar populations.
    • b. Immunization sequences match the state immunization registry sequence.
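The Plausibility verification examples can be sketched the same way. This minimal Python sketch assumes hypothetical inputs: measurement rows with `height_cm` and `weight_kg`, diagnoses as `(mrn, condition)` pairs using plain-text condition labels rather than ICD codes, and immunizations as `(mrn, dose_type, date)` tuples where `dose_type` is `"initial"` or `"booster"`.

```python
from datetime import date

# Hypothetical mapping from a condition label to the only sex in which it is expected.
SEX_SPECIFIC_CONDITIONS = {"pregnancy": "F", "prostate cancer": "M"}


def nonpositive_anthropometrics(rows):
    """Measure (a): MRNs with a recorded height or weight that is zero or negative."""
    return [
        r["mrn"]
        for r in rows
        if any(r.get(f) is not None and r[f] <= 0 for f in ("height_cm", "weight_kg"))
    ]


def sex_context_conflicts(diagnoses, sex_by_mrn):
    """Measure (c): sex-specific diagnoses recorded for a patient of known, contradicting sex."""
    conflicts = []
    for mrn, condition in diagnoses:
        expected = SEX_SPECIFIC_CONDITIONS.get(condition)
        sex = sex_by_mrn.get(mrn)
        # Only flag when the sex is known ('M' or 'F') and differs from the expected sex.
        if expected is not None and sex in ("M", "F") and sex != expected:
            conflicts.append((mrn, condition))
    return conflicts


def booster_before_initial(doses):
    """Time (b): MRNs with a booster dated before the earliest initial dose, or with no initial dose."""
    first_initial = {}
    for mrn, kind, when in doses:
        if kind == "initial":
            first_initial[mrn] = min(when, first_initial.get(mrn, when))
    flagged = set()
    for mrn, kind, when in doses:
        if kind == "booster" and (mrn not in first_initial or when < first_initial[mrn]):
            flagged.add(mrn)
    return sorted(flagged)


if __name__ == "__main__":
    doses = [
        ("1", "initial", date(2016, 1, 1)),
        ("1", "booster", date(2016, 6, 1)),
        ("2", "booster", date(2016, 1, 1)),
        ("2", "initial", date(2016, 3, 1)),
        ("3", "booster", date(2016, 2, 1)),
    ]
    print(booster_before_initial(doses))  # ['2', '3']
```

Whether a booster with no recorded initial dose should be flagged is itself a local judgment; this sketch flags it because the expected sequence cannot be confirmed.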

SLIDE 20

DQ CODE-A-THON

  • Four teams
  • All workshop artifacts available on public GitHub:
    • https://github.com/DQCode-A-Thon/

SLIDE 21

www.pcori.org

http://dododas.github.io/dqa-viz/dashboards.html

SLIDE 22

http://dododas.github.io/dqa-viz/dashboards.html

SLIDE 23

https://sigfried.github.io/parcoords/

SLIDE 24

JOIN US! Data Quality Collaboration

  • Second Thursday of the month @ 3 PM ET; next is July 13th
  • Email MICHAEL.KAHN@UCDENVER.EDU

THANK YOU!