MASTER PATIENT INDEX AND DATA LINKAGES, August 2020, Kathy Hines



SLIDE 1

IMPACT OF DE-IDENTIFICATION ON MASTER PATIENT INDEX AND DATA LINKAGES

CENTER FOR HEALTH INFORMATION AND ANALYSIS

August 2020 Kathy Hines, Senior Director of Partner Operations & Data Compliance Scott Curley, Manager of Privacy & Compliance

SLIDE 2

OVERVIEW

SLIDE 3
  • Rising external cybersecurity threats to healthcare data
  • Internal risks of accidental or intentional data exposure
  • Specific to the APCD: federal law 42 CFR Part 2

Motivation for Change

3 Impact of De-identification on MPI and Data Linkages | Scott Curley, Kathy Hines| August 2020

SLIDE 4
  • Outright removing PII would prevent CHIA and our external community of data users from connecting health care encounters across carriers and to other datasets.
  • CHIA set an objective to dramatically decrease the risk of exposure of collected PII while retaining the ability to connect data together.

Analytic Challenge

SLIDE 5

CHIA’s Solution

1. Software

  • CHIA’s File Secure software is deployed at the site of data submission (insurance carriers and hospitals), where it replaces key PII fields with pseudonymized equivalents.

2. Internal Architecture

  • CHIA never receives PII “in the clear”; the pseudonymized identifiers are stored separately from the data warehouse and are not released to internal users or external data applicants.

3. Submission Guide Updates

  • CHIA stopped collecting certain fields.

4. Master Patient Index

  • One ID for each person, regardless of insurance carrier, with the ability to link to external data.

SLIDE 6

DE-IDENTIFICATION USING EXPERT DETERMINATION

SLIDE 7

HIPAA De-Identification


Safe Harbor

Pros
  • Easy to implement and maintain

Cons
  • 18 data elements redacted or removed entirely
  • More restrictive than statistical de-identification with respect to birth dates, service dates, and geographic data

Expert Determination

Pros
  • Methodology tailored to the data set in question
  • Lower overall risk of re-identification

Cons
  • No single method for implementation
  • Routine reassessment
  • More restrictive than Safe Harbor with respect to some individual claim lines

*Slide courtesy of ONPOINT Health Data
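
The Safe Harbor restrictions on dates and geography can be made concrete with a minimal Python sketch. It applies two Safe Harbor generalizations from 45 CFR 164.514(b)(2): birth dates reduced to the year (with ages over 89 collapsed into a single 90-or-older category) and ZIP codes truncated to three digits, or to 000 when the three-digit area has 20,000 or fewer people. For contrast it also produces the MM/YYYY birth value that CHIA’s approach retains. The function names are illustrative, and the three-digit ZIP population table is a caller-supplied assumption (in practice it would come from Census data).

```python
from datetime import date
from typing import Dict

# Safe Harbor keeps the first three ZIP digits only when the area formed by all
# ZIP codes sharing those digits contains more than 20,000 people.
SAFE_HARBOR_ZIP3_MIN_POPULATION = 20_000


def age_on(dob: date, as_of: date) -> int:
    """Whole years of age on the given date."""
    had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
    return as_of.year - dob.year - (0 if had_birthday else 1)


def safe_harbor_birth(dob: date, as_of: date) -> str:
    """Keep only the birth year; ages over 89 are aggregated into '90+'."""
    if age_on(dob, as_of) > 89:
        return "90+"
    return str(dob.year)


def safe_harbor_zip(zip5: str, zip3_population: Dict[str, int]) -> str:
    """Truncate to 3 digits, or '000' for sparsely populated 3-digit areas."""
    zip3 = zip5[:3]
    if zip3_population.get(zip3, 0) > SAFE_HARBOR_ZIP3_MIN_POPULATION:
        return zip3
    return "000"


def chia_birth_month_year(dob: date) -> str:
    """The MM/YYYY birth value left in the clear in the MA APCD, for comparison."""
    return f"{dob.month:02d}/{dob.year}"
```

For example, a 15 March 1980 birth date becomes 1980 under Safe Harbor but 03/1980 in the MA APCD, which is the extra analytic detail the statistical approach preserves.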

SLIDE 8
  • Established the variables to be considered for a formal re-identification risk analysis
  • Catalogued all direct identifiers and quasi-identifiers
  • Determined acceptable risk levels: minimum cell size, maximum risk, average risk
  • Assumed an adversarial environment where the recipients of the data have knowledge of quasi-identifying values for the individual
  • Established annual re-assessments

OnPoint Worked with CHIA to Define Approach


* Slide courtesy of ONPOINT Health Data
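
The three acceptable-risk measures listed above can be computed from equivalence classes, meaning groups of records that share identical quasi-identifier values. This minimal Python sketch uses the common convention that a record’s re-identification risk is 1 divided by the size of its class; the quasi-identifier tuples and the threshold values a caller passes in are assumptions for illustration, not the variables or levels OnPoint and CHIA chose.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

QuasiIds = Tuple[str, ...]  # e.g. (birth MM/YYYY, ZIP, gender); hypothetical columns


def risk_metrics(rows: Sequence[QuasiIds]) -> Dict[str, float]:
    """Cell-size and re-identification risk metrics over quasi-identifier tuples."""
    if not rows:
        raise ValueError("no records")
    class_sizes = Counter(rows)  # equivalence class -> number of records in it
    smallest = min(class_sizes.values())
    return {
        "min_cell_size": smallest,
        # A record's risk is 1 / size of its class, so the worst case is the smallest class.
        "max_risk": 1.0 / smallest,
        # Averaging 1/k over all records equals (number of classes) / (number of records).
        "avg_risk": len(class_sizes) / len(rows),
    }


def within_thresholds(
    metrics: Dict[str, float], min_cell: int, max_risk: float, avg_risk: float
) -> bool:
    """Check the metrics against acceptable levels supplied by the caller."""
    return (
        metrics["min_cell_size"] >= min_cell
        and metrics["max_risk"] <= max_risk
        and metrics["avg_risk"] <= avg_risk
    )
```

Running the same function on each year’s extract is one simple way to compare risk levels across data years.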

SLIDE 9

Applied the Data Strategy

  • The risk mitigation model was applied to multiple years of data (MA APCD data set years 2012–2017) to assess the stability of the risk over time and to project a solution for the following year.

* Slide courtesy of ONPOINT Health Data

SLIDE 10

FILE SECURE

SLIDE 11
  • Data submitters prepare files that include PII at their location
  • File Secure replaces key fields with pseudonymized values (128 characters long) while the files are still at the submitter’s location:
  • Name
  • SSN
  • Full DOB (MM/YYYY is left in the clear for analytics)
  • “In the clear” versions of Name, SSN, and DOB never leave the data submitter’s site

CHIA’s File Secure

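
The slide states only that pseudonymized values are 128 characters long. One construction that produces values of that length is a keyed hash (HMAC) with SHA-512, whose 64-byte digest is 128 hexadecimal characters; the Python sketch below assumes that construction, and its function names and key handling are hypothetical rather than File Secure’s actual design. A key matters because an unkeyed hash of a 9-digit SSN could be reversed by hashing every possible SSN, and every submitter would need the same key so that the same person hashes identically across carriers.

```python
import hashlib
import hmac
from datetime import date
from typing import Tuple


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Keyed SHA-512 hash (HMAC) of a normalized value, as 128 hex characters."""
    # Normalize first so trivial formatting differences do not change the pseudonym.
    normalized = " ".join(value.upper().split())
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha512).hexdigest()


def pseudonymize_dob(dob: date, secret_key: bytes) -> Tuple[str, str]:
    """Return (hashed full DOB, MMYYYY left in the clear for analytics)."""
    hashed = pseudonymize(dob.isoformat(), secret_key)
    clear = f"{dob.month:02d}{dob.year:04d}"
    return hashed, clear
```

Normalizing before hashing (here, uppercasing and collapsing whitespace) is what lets " smith" and "SMITH" produce the same pseudonym; any real normalization rules would have to be identical at every submitter site.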

SLIDE 12
  • Zip code processing
  • Flag if invalid zip code
  • Retain MA Zip codes only
  • Map MA Zip codes to mask small areas in MA APCD
  • State code processing
  • Flag if invalid state
  • Retain only New England and New York state codes
  • File Secure encrypts the file with NIST-compliant encryption before the data is sent to CHIA

CHIA’s File Secure

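
The ZIP and state rules above can be sketched as two small validation functions. This Python sketch assumes Massachusetts ZIP codes are those with three-digit prefixes 010 to 027 or 055, treats a ZIP as valid if it has 5 digits (optionally followed by a ZIP+4 extension, which is dropped), and takes the small-area masking table as a caller-supplied mapping, since the slide does not say how small areas are masked. Each function returns the retained value (or None) together with the invalid flag.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed Massachusetts three-digit ZIP prefixes: 010-027, plus 055.
MA_ZIP3 = {f"{n:03d}" for n in range(10, 28)} | {"055"}

# New England states plus New York.
RETAINED_STATES = {"CT", "MA", "ME", "NH", "RI", "VT", "NY"}

# 50 states plus DC; territories omitted for brevity.
VALID_STATES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO "
    "MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY DC".split()
)

# Five digits, optionally followed by a 4-digit extension (with or without a hyphen).
ZIP_PATTERN = re.compile(r"^(\d{5})(?:-?\d{4})?$")


def process_zip(raw: str, small_area_map: Dict[str, str]) -> Tuple[Optional[str], bool]:
    """Return (retained ZIP or None, invalid_flag)."""
    match = ZIP_PATTERN.match(raw.strip())
    if not match:
        return None, True
    zip5 = match.group(1)  # limit to 5 digits
    if zip5[:3] not in MA_ZIP3:
        return None, False  # valid but outside MA: not retained
    # Small-area ZIPs are replaced by a masked value from the caller's mapping.
    return small_area_map.get(zip5, zip5), False


def process_state(raw: str) -> Tuple[Optional[str], bool]:
    """Return (retained state code or None, invalid_flag)."""
    code = raw.strip().upper()
    if code not in VALID_STATES:
        return None, True
    if code not in RETAINED_STATES:
        return None, False  # valid but not New England or New York
    return code, False
```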

SLIDE 13

SUBMISSION GUIDE

SLIDE 14
  • Claims
  • First/Last names
  • Social Security numbers (SSNs)
  • Address information
  • Eligibility
  • Street/City address information
  • Zip code limited to 5 digits
  • Race/Ethnicity indicators
  • Disability/Marital/Student/Family size indicators
  • Language (list abbreviated)
  • Date of Death

Submission Guide Changes – Data Removal

SLIDE 15

Insurance Carrier Submissions

Insurance Carrier Site

APCD Submission Files (“in the clear”)
  • Eligibility Removed: Street/City, Marital Status, Race/Ethnicity, Employee Status, Student Status, Date of Death
  • Medical Claims Removed: PII
  • Product
  • Provider
  • Dental Removed: PII
  • Rx Claims Removed: PII

CHIA File Secure Software
  • USPS Nickname Table
  • NYSIIS (First and Last name)
  • HASH Function
  • Remove known dummy values
  • Re-map 1% of population (small ZIP codes)

Eligibility Final

HASH
  • First Name
  • Last Name
  • SSN
  • DOB

Clear
  • State
  • Zip
  • Insurance ID
  • Org ID
  • Month / Year of Birth
  • Gender

Encrypted for Transport to CHIA

CHIA Landing Zone
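
Part of the File Secure preparation shown above can be sketched as name and SSN cleaning before hashing: standardizing case, dropping known dummy values, and mapping nicknames to a canonical name. The tables below are tiny hypothetical stand-ins for the nickname table and dummy-value lists, whose contents are not given here, and the sketch deliberately omits the NYSIIS phonetic step that the slide lists for first and last names.

```python
import re
from typing import Optional

# Hypothetical excerpt of a nickname table; a production table would be far larger.
NICKNAMES = {"BILL": "WILLIAM", "BOB": "ROBERT", "LIZ": "ELIZABETH", "PEGGY": "MARGARET"}

# Hypothetical placeholder values a submitter might use for unknown data.
DUMMY_NAMES = {"UNKNOWN", "BABY", "NONE", "TEST"}
DUMMY_SSNS = {"000000000", "111111111", "123456789", "999999999"}


def clean_name(raw: Optional[str], use_nicknames: bool = False) -> Optional[str]:
    """Uppercase, keep A-Z letters only, drop dummies, optionally map nicknames."""
    if raw is None:
        return None
    name = re.sub(r"[^A-Z]", "", raw.upper())
    if not name or name in DUMMY_NAMES:
        return None
    return NICKNAMES.get(name, name) if use_nicknames else name


def clean_ssn(raw: Optional[str]) -> Optional[str]:
    """Keep exactly nine digits; drop known dummy SSNs."""
    if raw is None:
        return None
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 9 or digits in DUMMY_SSNS:
        return None
    return digits
```

Returning None for dummy values, rather than hashing them, keeps placeholder entries such as a shared fake SSN from linking unrelated people.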

SLIDE 16

MASTER PATIENT INDEX (MPI)

SLIDE 17
  • CHIA creates a master patient index (MPI) using a probabilistic matching algorithm with pseudonymized identifiers. The ID connects all records that are very likely the same person and assigns them a key that is not based in any way on PII or on any other attributes of a person’s data.
  • Example of what an APCD data user might have access to:
  • MPI – CHIA’s randomly generated unique ID for a person
  • MM/YYYY of birth
  • 5-digit ZIP code for ZIP codes with large populations
  • CHIA has deployed a service to connect external data to the APCD or Case Mix using a combination of CHIA’s File Secure software and CHIA’s probabilistic matching engine

MPI and Record Linking

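
The idea of a key that is not based on PII can be sketched as clustering followed by random ID assignment. In this minimal Python sketch, records sharing an Org ID and Insurance ID are grouped (the deterministic rule CHIA describes for its MPI hub), record pairs accepted by a probabilistic matcher are merged with a union-find structure, and each resulting group receives a random ID from the secrets module. The function and field names are hypothetical, and the matcher itself is assumed to run upstream.

```python
import secrets
from typing import Dict, Iterable, Set, Tuple


class DisjointSet:
    """Union-find over record IDs, with path halving."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def assign_mpi(
    records: Dict[str, Tuple[str, str]],
    matched_pairs: Iterable[Tuple[str, str]],
) -> Dict[str, str]:
    """Map record ID -> random MPI ID.

    records: record ID -> (Org ID, Insurance ID).
    matched_pairs: record-ID pairs accepted by the probabilistic matcher.
    """
    groups = DisjointSet()
    first_seen: Dict[Tuple[str, str], str] = {}
    for record_id, member_key in records.items():
        groups.find(record_id)  # register every record, even unmatched ones
        if member_key in first_seen:
            # Same Org ID and Insurance ID: treated as the same person.
            groups.union(first_seen[member_key], record_id)
        else:
            first_seen[member_key] = record_id
    for a, b in matched_pairs:
        groups.union(a, b)

    mpi_by_root: Dict[str, str] = {}
    issued: Set[str] = set()
    result: Dict[str, str] = {}
    for record_id in records:
        root = groups.find(record_id)
        if root not in mpi_by_root:
            new_id = secrets.token_hex(8)  # random; derived from nothing in the record
            while new_id in issued:
                new_id = secrets.token_hex(8)
            issued.add(new_id)
            mpi_by_root[root] = new_id
        result[record_id] = mpi_by_root[root]
    return result
```

A production MPI would also need to keep IDs stable when data are reloaded; this sketch issues fresh IDs on every call.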

SLIDE 18


CHIA Master Patient Index

  • CHIA Landing Zone
  • Data Preparation: Filter Known Data Issues
  • CHIA APCD Algorithm: probabilistic matching method using HASH Fname, HASH Lname, HASH SSN, HASH DOB, Zip, and Gender. Links records within and across carriers.
  • Data Load into the CHIA Master Patient Index Hub (MEID): records where the Insurance ID and Org ID are the same are considered the same person.

The last 5 valid values of each input field are stored to capture name changes, people moving, etc.

Example MPI Hub records (hashed values shown as placeholders; where a cell lists several values, these are retained values for that field):

CHIA MPI | Org ID | Insurance ID | First Name | Last Name | DOB | Gender | SSN | ZIP Code
111111 | 30 | BBY00002211 | ABCD | QRSTUVWXYZ | POIUYT | F | HFHDSFH, KDFGJKDFKFK | 02116, 02461, 02090
112233 | 22 | HVD00000122 | QWDD | DGFGDFFGFG | GFGDFF | M | FGDDDFG | 02118
112233 | 30 | BBY000034234 | QWDD | DGFGDFFGFG | GFGDFF | M | FGDDDFG | 01056, 13116, 01025
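
The rule that the hub keeps the last 5 valid values of each input field can be sketched with a bounded deque per field. In this hypothetical Python structure, empty values are treated as invalid and skipped, a repeated value is moved to the most recent position rather than stored twice (a design choice the slide does not specify), and a candidate value counts as agreeing if it matches any retained value, which is how a former ZIP code or surname can still produce a match.

```python
from collections import deque
from typing import Deque, Dict, List, Optional

HISTORY_DEPTH = 5  # "last 5 valid values of each input field"


class HubEntry:
    """Per-person field history for an MPI hub entry (hypothetical structure)."""

    def __init__(self) -> None:
        self._history: Dict[str, Deque[str]] = {}

    def observe(self, field: str, value: Optional[str]) -> None:
        """Record a new valid value; empty or missing values are skipped."""
        if not value:
            return
        history = self._history.setdefault(field, deque(maxlen=HISTORY_DEPTH))
        if value in history:
            history.remove(value)  # move a repeated value to the most recent slot
        history.append(value)  # a full deque(maxlen=5) drops its oldest value

    def values(self, field: str) -> List[str]:
        """Retained values, oldest first."""
        return list(self._history.get(field, ()))

    def agrees(self, field: str, value: Optional[str]) -> bool:
        """A candidate value agrees if it matches any retained value."""
        return bool(value) and value in self._history.get(field, ())
```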

SLIDE 19

CHIA MATCHING SERVICE

SLIDE 20

CHIA Matching Service (Master Data Management)

Customer File
  • First Name
  • Last Name
  • SSN
  • Gender
  • DOB
  • Zip Code
  • Insurance ID
  • Study ID

The more complete the file, the better the match results; however, not every field is needed in every record for a confident match.

Customer Site: CHIA File Secure Software produces the prepped Linking File

HASH
  • First Name
  • Last Name
  • SSN
  • DOB

In the Clear
  • Zip
  • Insurance ID
  • Gender
  • Study ID

SFTP to CHIA

CHIA APCD Algorithm

Scores each input record against likely candidates in the MPI Hub. Results are split into High Score Matches and Lower Score Matches using a custom match threshold based on how accurate the matches need to be.
  • For example: Higher = all fields present and up to 1 mismatch
  • Any number of additional match scenarios can be added and separated from the High Score Matches based on study need. For example: Lower = SSN missing and up to 1 mismatch

CHIA uses the Master Enterprise ID to identify the corresponding claims; this ID is then replaced with the project’s unique Study ID, and the claims are returned to the customer.
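
The two example scenarios on this slide, plus the final swap of the Master Enterprise ID for the Study ID, can be sketched in Python as follows. The field names, the treatment of empty values as missing, and the claim record layout are assumptions for illustration; exact equality stands in for comparing hashed values, and the real service scores candidates with a probabilistic algorithm rather than only these two fixed rules.

```python
from typing import Dict, List, Optional, Tuple

FIELDS = ("first_name", "last_name", "ssn", "dob", "zip", "gender")
Record = Dict[str, Optional[str]]


def compare(input_row: Record, candidate: Record) -> Tuple[int, int, List[str]]:
    """Count agreeing and disagreeing fields; fields empty on either side are missing."""
    matches, mismatches, missing = 0, 0, []
    for field in FIELDS:
        a, b = input_row.get(field), candidate.get(field)
        if not a or not b:
            missing.append(field)
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, missing


def scenario(input_row: Record, candidate: Record) -> Optional[str]:
    """Apply the two example scenarios; None means the candidate is not linked."""
    _, mismatches, missing = compare(input_row, candidate)
    if not missing and mismatches <= 1:
        return "Higher"  # all fields present, up to 1 mismatch
    if missing == ["ssn"] and mismatches <= 1:
        return "Lower"  # SSN missing, up to 1 mismatch
    return None


def claims_for_study(
    meid_to_study_id: Dict[str, str], claims: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Keep claims for matched MEIDs and replace the MEID with the project's Study ID."""
    released = []
    for claim in claims:
        study_id = meid_to_study_id.get(claim["meid"])
        if study_id is None:
            continue  # no accepted match: claim is not returned
        row = {k: v for k, v in claim.items() if k != "meid"}
        row["study_id"] = study_id
        released.append(row)
    return released
```

Dropping the MEID before release means the customer receives claims keyed only by its own Study ID.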

SLIDE 21

CHIA Linkage Service (MPI Search)


Input Row from Customer (Hashed Equivalent; SSN not provided)

Study ID | First Name | Last Name | DOB | SSN | Zip Code | Gender
8888 | ABCD | QRSTUVWXYZ | POIUYT |  | 02116 | F

APCD Linking Scenarios

CHIA ID (MPI) | First Name | Last Name | DOB | SSN | Zip Code | Gender | Match Result | Match Score

Input row links to these APCD records:
4455544 | ABCD | QRSTUVWXYZ | POIUYT |  | 02116 | F | 5 matches, 0 mismatches | Highest
4455544 | ABCD | QRSTUVWXYZ | POIUYT |  | 02119 | F | 4 matches, 1 mismatch | Higher
4455544 | ABCD | HIJKLMNOPQ | POIUYT |  | 02116 | F | 4 matches, 1 mismatch | Higher
4455544 | ABCD | QRSTUVWXYZ | POIUYT |  | 02116 | M | 4 matches, 1 mismatch | Higher
4455544 | MNOP | QRSTUVWXYZ | POIUYT |  | 02116 | F | 4 matches, 1 mismatch | Higher

Based on study requirements, input row may link to these APCD records:
2332332 | ABCD | QRSTUVWXYZ | LKJHGD |  | 02116 | F | 4 matches, 1 mismatch (DOB weighted stronger) | Lower
4455544 | ABCD | HIJKLMNOPQ | POIUYT |  | 02116 | M | 3 matches, 2 mismatches | Lower

Input row does not link to these APCD records:
5755542 | ABCD | MNBCDVSWX | LKJHGD |  | 02119 | F | 2 matches, 3 mismatches | Too Low
7886655 | MNOP | HIJKLMNOPQ | POIUYT |  | 02116 | M | 2 matches, 3 mismatches | Too Low
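
The dispositions in the table can be reproduced with a simple additive score: each compared field adds a weight when the values agree and subtracts one when they disagree, and fields missing on either side (SSN here) are skipped. This is the basic shape of Fellegi–Sunter-style probabilistic matching, where the weights would be derived from estimated agreement probabilities; the integer weights and cut-offs in this Python sketch are hypothetical, chosen only so that the nine example rows land in the tiers shown, and are not CHIA’s parameters.

```python
from typing import Dict, Optional, Tuple

# Hypothetical (agree, disagree) weights. DOB outweighs the name fields,
# echoing "DOB weighted stronger" in the table; not CHIA's actual values.
WEIGHTS: Dict[str, Tuple[int, int]] = {
    "first_name": (4, -2),
    "last_name": (4, -2),
    "dob": (6, -6),
    "ssn": (8, -8),
    "zip": (2, -1),
    "gender": (1, -1),
}

# Hypothetical tier cut-offs, checked from highest to lowest.
TIERS = ((16, "Highest"), (10, "Higher"), (5, "Lower"))

DISPOSITION = {
    "Highest": "links",
    "Higher": "links",
    "Lower": "may link, based on study requirements",
    "Too Low": "does not link",
}


def match_score(input_row: Dict[str, Optional[str]], candidate: Dict[str, Optional[str]]) -> int:
    """Sum agreement/disagreement weights; a field missing on either side adds 0."""
    score = 0
    for field, (agree, disagree) in WEIGHTS.items():
        a, b = input_row.get(field), candidate.get(field)
        if a and b:
            score += agree if a == b else disagree
    return score


def tier(score: int) -> str:
    """Map a score to the first tier whose cut-off it reaches."""
    for cutoff, label in TIERS:
        if score >= cutoff:
            return label
    return "Too Low"
```

With these weights the exact-match row scores 17, a ZIP-only mismatch scores 14, and a DOB-only mismatch scores 5, which is why a single DOB disagreement drops a candidate into the study-dependent tier while a single ZIP disagreement does not.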

SLIDE 22

Successful data linkage projects leveraging pseudonymized identifiers

  • Dept. of Public Health study linking to opioid data (Ch. 55)
  • Dept. of Public Health Public Health Data Warehouse (included linking of 21 datasets)
  • Dept. of Elder Affairs study linking long-term services and support data and federal Housing & Urban Development housing data
  • Dept. of Public Health study linking to birth certificate records to study postpartum depression
  • Dept. of Public Health study linking to assisted reproductive technology data

In Progress

  • Dept. of Public Health study linking public housing and smoking cessation data
  • U.S. Dept. of VA study linking to VA hospital data
  • Brigham and Women’s study linking to cardiac data

Example Matching Projects

SLIDE 23

23

For questions, please contact:

  • Kathy Hines
  • Kathy.Hines@state.ma.us (617) 701-8275
  • Scott Curley
  • Scott.Curley@state.ma.us (617) 701-8255

Contact Information
