28 October 2014
NEC METHODS: MATCHING, DEDUPLICATION, ANALYSIS & RESPONSE RATES
Matching & Deduplication
Purpose of the Merged Analytic Cross-Region Datasets
PIF-ER Merged Dataset
Analyses on types of trainees who attended particular
PIF-ER-ACRE Merged Dataset
Analyses on outcomes of AETC training programs related
1. Collect regional process and evaluation data
2. Convert data in submitted format (Excel, CSV, SPSS) to SAS
3. Reformat regional datasets to match expected data file specifications (e.g., character/numeric type)
   Process data: HRSA data manual
   Evaluation data: ACRE implementation manual
4. Create all-region ER, PIF, ACRE IP, ACRE FUP, and FTCC PIF datasets by concatenating/appending regional files of the same type
5. Create analytic PIF-ER merged dataset
6. Create analytic PIF-ER-ACRE datasets
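Steps 3 and 4 can be sketched as follows. The NEC workflow runs in SAS; this Python version is only an illustration, and the three-field `PIF_SPEC`, the function names, and the sample rows are hypothetical placeholders rather than the actual file specifications.

```python
from typing import Dict, Iterable, List

# Hypothetical specification: variable name -> expected type.
# The real specifications are in the HRSA data manual and the
# ACRE implementation manual.
PIF_SPEC = {"AETC": int, "PROG_ID": str, "PIF_ID": int}


def reformat(record: Dict[str, object], spec: Dict[str, type]) -> Dict[str, object]:
    """Coerce each field to the type in the spec; blanks and '.' become None."""
    out = {}
    for name, expected in spec.items():
        raw = record.get(name)
        text = "" if raw is None else str(raw).strip()
        out[name] = None if text in ("", ".") else expected(text)
    return out


def concatenate(regional_files: Iterable[List[dict]], spec: Dict[str, type]) -> List[dict]:
    """Append reformatted regional files of the same type into one dataset."""
    combined = []
    for rows in regional_files:
        combined.extend(reformat(row, spec) for row in rows)
    return combined


# Two regions submit the same file type with different column types.
region_a = [{"AETC": "13", "PROG_ID": "A001", "PIF_ID": "04151234"}]
region_b = [{"AETC": 4, "PROG_ID": 17, "PIF_ID": ""}]
all_region_pif = concatenate([region_a, region_b], PIF_SPEC)
```

Coercing types before appending is what lets the later sorts and merges compare key variables across regions without type mismatches.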
Steps 1, 2, 3, 4: Collect, convert, reformat
PIF, ACRE IP and FUP datasets.
Step 5: Create analytic ER-PIF dataset
Step 6: Create analytic ER-PIF-ACRE dataset
Check to see which regions have repeats on
Merge PIF and ER
For 1-2 regions with repeated PROG_ID, sort and
For all other regions that have distinct PROG_ID, sort
PROG_ID AETC LPS Bottom of PIF:
Select eligible ACRE IP data
  Check to see which regions have repeats on PROG_ID by LPS
  Exclude records where all 4 IP questions are missing/blank
  Exclude records where the PIF_ID is . [missing], 0, or 99999999
  De-duplicate IP records by AETC, LPS (if applicable), PROG_ID, PIF_ID, AIP1, AIP2
Select eligible records from the previously created ER-PIF dataset
  Include only records where there is at least 1 PIF record included (e.g., there are some ERs without any PIFs)
  Exclude records where the PIF_ID is . [missing], 0, or 99999999
Sort the ER-PIF and the ACRE IP data by AETC, LPS (if applicable), PROG_ID, PIF_ID. The ER-PIF dataset is further sorted by PIFDATE
Merge the ER-PIF-IP by AETC, LPS, PROG_ID, PIF_ID
De-duplicate the data based on the key variables AETC, LPS (if applicable), PROG_ID, PIF_ID [*Note: this deletes <200 records]
Sort the all-region ACRE FUP by AETC, LPS (if applicable), PROG_ID, PIF_ID
Sort the previously created ER-PIF-IP dataset by AETC, LPS (if applicable), PROG_ID, PIF_ID
Merge the ER-PIF-IP with the ACRE FUP by these key variables
Restrict the analytic dataset to records with a valid, non-missing PIF_ID with a PIF available [Note: approx. 20K records removed]
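The selection, de-duplication, and merge steps can be sketched in Python as below. This is an illustration, not the NEC SAS code. Several choices are assumptions: the item names `AIP3` and `AIP4`, the `HAS_IP` flag, keeping the first record seen for each key, and merging as a left join onto the ER-PIF records.

```python
MISSING_PIF_CODES = {None, 0, 99999999}
IP_QUESTIONS = ("AIP1", "AIP2", "AIP3", "AIP4")  # AIP3/AIP4 names are assumed
KEY = ("AETC", "LPS", "PROG_ID", "PIF_ID")


def eligible_ip(rec):
    """Keep IP records with at least one answered question and a usable PIF_ID."""
    answered = any(rec.get(q) is not None for q in IP_QUESTIONS)
    return answered and rec.get("PIF_ID") not in MISSING_PIF_CODES


def dedupe(records, key_vars):
    """Keep the first record seen for each combination of key variables."""
    seen, kept = set(), []
    for rec in records:
        key = tuple(rec.get(v) for v in key_vars)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


def merge_er_pif_ip(er_pif, ip):
    """Attach IP answers to ER-PIF records that share the key variables."""
    ip_by_key = {tuple(r.get(v) for v in KEY): r for r in ip}
    merged = []
    for rec in er_pif:
        match = ip_by_key.get(tuple(rec.get(v) for v in KEY))
        row = dict(rec, HAS_IP=match is not None)
        if match is not None:
            row.update({q: match.get(q) for q in IP_QUESTIONS})
        merged.append(row)
    return merged


base = {"AETC": 13, "LPS": 1, "PROG_ID": "P1"}
blank = {"AIP1": None, "AIP2": None, "AIP3": None, "AIP4": None}
ip_raw = [
    dict(base, PIF_ID=4151234, AIP1=4, AIP2=5, AIP3=None, AIP4=None),
    dict(base, PIF_ID=4151234, AIP1=4, AIP2=5, AIP3=None, AIP4=None),  # duplicate
    dict(base, PIF_ID=99999999, AIP1=3, AIP2=3, AIP3=3, AIP4=3),       # missing code
    dict(base, PIF_ID=6011111, **blank),                               # all blank
]
ip_clean = dedupe([r for r in ip_raw if eligible_ip(r)], KEY + ("AIP1", "AIP2"))

er_pif = [
    dict(base, PIF_ID=4151234, PIFDATE="2012-03-01"),
    dict(base, PIF_ID=7021111, PIFDATE="2012-03-01"),
]
er_pif_ip = merge_er_pif_ip(er_pif, ip_clean)
```

In this toy input only one of the four IP records survives selection, and it links to the first ER-PIF record.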
PIF ID is available on the PIF, ACRE IP
Though not on the ER form, the Program ID on the PIF
PIF ID used for matching
  Across training events (repeat trainees)
  Across evaluation forms (ACRE IP and FUP)
month of birth + day of birth + last 4 digits of SSN = PIF_ID
Valid PIF ID contains:
  Valid month of birth (1-12)
  Valid day of birth (1-31)
  Valid last 4 digits of SSN (≥1 and not 9999)
Valid PIF ID is a numeric value <99999999
Examples of invalid PIF IDs:
  99999999, 0, . [missing], 12345678, 04049999, 1122420932
Records with invalid PIF IDs are excluded from regression analyses
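A minimal sketch of these validity rules in Python follows. It assumes the PIF ID is stored as a number laid out as MMDDSSSS (month, day, last 4 SSN digits), which matches the order given for the ID components. The function name is hypothetical and this is not the NEC algorithm itself.

```python
from typing import Optional


def is_valid_pif_id(pif_id: Optional[int]) -> bool:
    """Apply the stated rules to a numeric PIF ID assumed to be MMDDSSSS."""
    if pif_id is None:                      # . [missing]
        return False
    if not 0 < pif_id < 99999999:           # 0, 99999999, or too many digits
        return False
    month = pif_id // 1_000_000             # leading 1-2 digits
    day = (pif_id // 10_000) % 100          # next 2 digits
    ssn4 = pif_id % 10_000                  # last 4 digits
    return 1 <= month <= 12 and 1 <= day <= 31 and 1 <= ssn4 <= 9998


# The invalid examples from the slide (04049999 is written without its
# leading zero because it is stored as a number).
invalid_examples = [99999999, 0, None, 12345678, 4049999, 1122420932]
results = [is_valid_pif_id(x) for x in invalid_examples]
```

Under this layout, 12345678 fails because its day component is 34, and 04049999 fails because its SSN component is 9999. The sketch checks month and day ranges independently, as the rules are stated, so it does not reject impossible dates such as February 31.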
For overall ACRE regression analyses:
  ER-PIF-ACRE dataset restricted to records with a valid PIF ID and with a linked PIF
  Restricted dataset sorted by combined AETC region, PIF ID, eligibility for ACRE IP, having associated IP record, and PIF date
  Last record is output
For MAI ACRE regression analyses, similar:
  ER-PIF-ACRE dataset restricted to records with a valid PIF ID and with a linked PIF
  Restricted ER-PIF-ACRE dataset sorted by combined AETC region, PIF ID, having an MAI training record, eligibility for ACRE IP, having associated IP record, and PIF date
  Last record is output
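The sort-then-keep-last step for the overall analysis can be sketched as follows. The field names `REGION`, `IP_ELIGIBLE`, `HAS_IP`, and `EVENT` are hypothetical, and dates are assumed to be ISO strings so that they sort chronologically.

```python
from itertools import groupby


def sort_key(rec):
    # False sorts before True, so eligible records with a linked IP record
    # sort last within each trainee; PIF date breaks remaining ties.
    return (rec["REGION"], rec["PIF_ID"], rec["IP_ELIGIBLE"],
            rec["HAS_IP"], rec["PIFDATE"])


def last_record_per_trainee(records):
    """Sort the restricted dataset and output the last record per trainee."""
    ordered = sorted(records, key=sort_key)
    by_trainee = groupby(ordered, key=lambda r: (r["REGION"], r["PIF_ID"]))
    return [list(group)[-1] for _, group in by_trainee]


trainee = {"REGION": "PAMA", "PIF_ID": 4151234}
records = [
    dict(trainee, EVENT="a", IP_ELIGIBLE=True, HAS_IP=True, PIFDATE="2012-01-10"),
    dict(trainee, EVENT="b", IP_ELIGIBLE=True, HAS_IP=False, PIFDATE="2012-06-01"),
    dict(trainee, EVENT="c", IP_ELIGIBLE=False, HAS_IP=False, PIFDATE="2012-08-01"),
]
retained = last_record_per_trainee(records)
```

Because eligibility and IP linkage come before the date in the sort order, event "a" is retained here even though it is the earliest. For an MAI analysis, an MAI indicator would be added to the sort key ahead of IP eligibility, as described above.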
Last eligible record among repeat trainees is used
“Eligible” means the PIF_ID is not an invalid code according to the NEC algorithm and there is truly an associated PIF in the linked dataset
Analytic population includes:
  For IP: targeted IP trainee (i.e., attended Level 1, 2, or 3 training) who has an associated PIF and IP record and is a direct HIV provider (PIF13=1)
  For FUP: targeted FUP trainee (i.e., attended Level 2 training and topic included clinical management [ER4_1-16] or prevention and behavior change [ER4_29-31] topics) who has an associated PIF and FUP record and is a direct HIV provider (PIF13=1)
ACRE immediate post questions are asked immediately after the training event
  Target on the Event Record form: ANY of ER9_1>0, ER9_2>0, ER9_3>0
ACRE follow-up is asked 6 weeks after training through a web-based survey
  Target on the Event Record form: ER9_2>0 and (ER4_1=1 or ER4_2=1 or … or ER4_31=1)
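The IP and FUP targeting rules can be sketched as simple flags on an Event Record. Two assumptions are made here: ER9_1 through ER9_3 are taken to correspond to training Levels 1 through 3, and the FUP topic list follows the ranges ER4_1-16 and ER4_29-31 given for the analytic population rather than every item up to ER4_31. All function names are hypothetical.

```python
# Topic items taken from the analytic population definition.
CLINICAL_TOPICS = range(1, 17)      # ER4_1 .. ER4_16 (clinical management)
PREVENTION_TOPICS = range(29, 32)   # ER4_29 .. ER4_31 (prevention/behavior change)


def is_ip_target(er: dict) -> bool:
    """IP target: any of ER9_1, ER9_2, ER9_3 is greater than 0."""
    return any((er.get(f"ER9_{level}") or 0) > 0 for level in (1, 2, 3))


def is_fup_target(er: dict) -> bool:
    """FUP target: ER9_2 > 0 and at least one qualifying topic is checked."""
    level2 = (er.get("ER9_2") or 0) > 0
    topics = list(CLINICAL_TOPICS) + list(PREVENTION_TOPICS)
    return level2 and any(er.get(f"ER4_{i}") == 1 for i in topics)


def in_analytic_population(is_target: bool, has_pif: bool,
                           has_eval_record: bool, pif13) -> bool:
    """Targeted trainee with linked PIF and evaluation record, direct provider."""
    return is_target and has_pif and has_eval_record and pif13 == 1


event = {"ER9_1": 0, "ER9_2": 1.5, "ER9_3": 0, "ER4_30": 1}
ip_flag = is_ip_target(event)
fup_flag = is_fup_target(event)
```

If the intended FUP rule accepts any topic from ER4_1 to ER4_31, only the `topics` list would need to change.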
Data source: cross-region ER-PIF and ACRE IP FY11-12.
N = 72,642 ACRE IP records received by NEC
N = 108,687 FY 11-12 trainees (based on linked AETC PIF and ER)
  n = 45,452 linked ER-PIF-ACRE IP
    n = 42,465 linked records and a targeted IP training
    n = 2,987 linked records and NOT a targeted IP training
  n = 30,331 linked records, IP targeted, and trainee’s last record in FY 11-12
N = 108,687 excludes n = 2,459 event records without an associated PIF and n = 5,736 records with an invalid PIF ID. This number includes repeat trainees. Though n = 93,756 records fulfilled the IP target criteria, only n = 42,465 (45.3%) ER-PIF-IP records linked and fulfilled the target. Of the n = 30,331 last records among these, n = 15,979 (52.7%) indicated they were direct HIV providers on the PIF.
Data source: cross-region ER-PIF and ACRE FUP FY11-12.
N = 3,847 ACRE FUP records received by NEC
N = 108,687 FY 11-12 trainees (based on linked AETC PIF and ER)
  n = 2,620 linked ER-PIF-ACRE FUP
    n = 2,018 linked records and a targeted FUP training
    n = 602 linked records and NOT a targeted FUP training
  n = 1,707 linked records, FUP targeted, and trainee’s last record in FY 11-12
N = 108,687 excludes n = 2,459 event records without an associated PIF and n = 5,736 records with an invalid PIF ID. This number includes repeat trainees. Though n = 61,647 records fulfilled the FUP target criteria, only n = 2,018 (3.3%) ER-PIF-FUP records linked and fulfilled the target. Of the n = 1,707 last records among these, n = 1,014 (59.4%) indicated they were direct HIV providers on the PIF and FUP survey.
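The counts and percentages on the two flow slides can be checked with a few lines of arithmetic. The denominators for the direct-provider percentages are an inference: 52.7% and 59.4% reproduce only when the numerator is divided by the last-record counts (30,331 and 1,707), which the slides do not state explicitly.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Immediate post (IP) flow, FY 11-12
ip_linked_total = 42_465 + 2_987          # targeted + not targeted
ip_link_rate = pct(42_465, 93_756)        # linked among IP-targeted records
ip_direct_rate = pct(15_979, 30_331)      # direct providers among last records

# Follow-up (FUP) flow, FY 11-12
fup_linked_total = 2_018 + 602            # targeted + not targeted
fup_link_rate = pct(2_018, 61_647)        # linked among FUP-targeted records
fup_direct_rate = pct(1_014, 1_707)       # direct providers among last records

print(ip_linked_total, ip_link_rate, ip_direct_rate)
print(fup_linked_total, fup_link_rate, fup_direct_rate)
```

The linked totals match the reported 45,452 and 2,620, and the four percentages match the reported 45.3%, 52.7%, 3.3%, and 59.4%.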
Regression models have included the following predictors:
  Big 6
  Worked in Ryan White funded setting
  Minority provider
  Minority serving
  Provider experience
  HIV+ clients per month
  Repeat trainee
All of the above predictors come directly from the PIF except for Repeat trainee status, which is based on the linked PIF-ER
Regression models are restricted to direct providers of HIV+
ACRE FUP web survey is targeted to direct providers
Comes from PIF question 3
Clinical providers encompass 7 professional categories, though we often refer to them as the “big 6”
All other non-missing responses are coded as non-clinical providers
Participant Information Form: PIF3 (mutually exclusive)
From the RWFUND administrative variable on the Participant Information Form
Exceptions apply: some regions have advised the NEC
[Figure: Participant Information Form items RWFUND and PIF8A with response codes =1, =0, =9]
A minority provider is
A non-minority provider is a
Those without any race
[Figure: Participant Information Form items PIF9 (=0, =1) and PIF10_1 to PIF10_5; labels: mutually exclusive, not mutually exclusive]
Among providers with direct service experience to
“Minority serving” (i.e., serves greater than half
Not minority serving (i.e., serves fewer than half
[Figure: Participant Information Form item PIF12_2, response codes =0 to =4]
Skip pattern: This question should only be answered if PIF12_1=1 and PIF13=1
Among providers with direct service experience to
  Novice: 0 to <2 years of experience
  New: 2 to <3 years of experience
  Experienced: 3 or more years of experience
PIF14 = continuous numeric variable
Skip pattern: This question should only be answered if PIF12_1=1 and PIF13=1
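The experience categories can be written as a small recode of the continuous PIF14 value. This is a sketch; treating missing or negative values as missing is an assumption, and the function name is hypothetical.

```python
from typing import Optional


def experience_category(pif14: Optional[float]) -> Optional[str]:
    """Recode years of experience (PIF14) into the three categories."""
    if pif14 is None or pif14 < 0:   # assumed: negative values are invalid
        return None
    if pif14 < 2:
        return "Novice"
    if pif14 < 3:
        return "New"
    return "Experienced"


categories = [experience_category(y) for y in (0, 1.9, 2, 2.99, 3, 15)]
```

The boundaries are half-open, so exactly 2 years is "New" and exactly 3 years is "Experienced".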
Categories for HIV+ clients per month:
0/month: PIF13 = 0 (No direct HIV+ services provided)
1-19/month: PIF15 = 1 or 2
20+/month: PIF15 = 3 or 4
[Figure: Participant Information Form item PIF15, response codes =0 to =4]
Skip pattern: This question should only be answered if PIF12_1=1 and PIF13=1
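A sketch of the clients-per-month recode follows. Requiring PIF13=1 for the two non-zero categories enforces the skip pattern and is an assumption, as is returning missing for any other combination (for example PIF15=0 with PIF13=1, which the categories above do not cover).

```python
from typing import Optional


def clients_per_month(pif13: Optional[int], pif15: Optional[int]) -> Optional[str]:
    """Recode PIF13 and PIF15 into the HIV+ clients per month categories."""
    if pif13 == 0:
        return "0/month"                 # no direct HIV+ services provided
    if pif13 == 1 and pif15 in (1, 2):
        return "1-19/month"
    if pif13 == 1 and pif15 in (3, 4):
        return "20+/month"
    return None                          # assumed: everything else is missing


examples = [clients_per_month(0, None), clients_per_month(1, 2),
            clients_per_month(1, 3), clients_per_month(1, 0)]
```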
Repeat trainee status is relative to the last eligible record during the analysis period
An individual who attended multiple AETC trainings with only 1 MAI training would not be categorized as a repeat trainee in an MAI analysis, since the last eligible MAI training record is the first and only MAI training
However, this same individual would be considered a repeat trainee for a cross-region analysis during this time period
A trainee is considered non-unique if s/he has the same PIF ID within a combined AETC region (e.g., AETC 13, 39, and 51 are considered the combined PAMA region)
Assumption: An individual took trainings within one region only. For example, a trainee who moved from CA to NY with training records in both regions would be counted as two separate individuals in the cross-site data.
Regional AETC Name    Combined AETC Codes
Delta                 1, 30
Florida/Caribbean     2, 31, 57, 61
Midwest               4, 32
Mountain Plains       5, 33, 56
New England           8, 35
NY/NJ                 10, 36
Northwest             11, 37, 52
Pacific               12, 38, 50, 68
PAMA                  13, 39, 51
Southeast             15, 40, 58
TX/OK                 16, 41
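The combined-region table translates directly into a lookup. The codes below are copied from the table; the names `COMBINED_REGIONS`, `CODE_TO_REGION`, and `combined_region` are hypothetical.

```python
from typing import Optional

# Regional AETC name -> combined AETC codes, as listed in the table.
COMBINED_REGIONS = {
    "Delta": (1, 30),
    "Florida/Caribbean": (2, 31, 57, 61),
    "Midwest": (4, 32),
    "Mountain Plains": (5, 33, 56),
    "New England": (8, 35),
    "NY/NJ": (10, 36),
    "Northwest": (11, 37, 52),
    "Pacific": (12, 38, 50, 68),
    "PAMA": (13, 39, 51),
    "Southeast": (15, 40, 58),
    "TX/OK": (16, 41),
}

# Invert to AETC code -> regional name for use as a grouping key.
CODE_TO_REGION = {code: name
                  for name, codes in COMBINED_REGIONS.items()
                  for code in codes}


def combined_region(aetc_code: int) -> Optional[str]:
    """Return the combined region for an AETC code, or None if unlisted."""
    return CODE_TO_REGION.get(aetc_code)


same_region = combined_region(13) == combined_region(39) == "PAMA"
```

Grouping by `combined_region(AETC)` and PIF ID, rather than by the raw AETC code, is what makes records under codes 13, 39, and 51 count as the same trainee.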
If PIF_ID 12345678 were truly a valid ID and the records below are all event data for this trainee in the fiscal year, sorted by event date:
PIF_ID      AETC   Funding Type   Training event (any type) during analytic period
12345678    13     MAI            1
12345678    13     Base           2
12345678    39     CDC testing    3
12345678    13     Base           4
In an MAI analysis, the latest MAI record would be retained. This trainee is not a repeat trainee in the MAI analysis.
In an overall analysis, this trainee is a repeat trainee. The fourth training record is retained for the analysis.
Notes: AETC=39 is grouped with AETC=13 for the region PA/MA. Repeat trainee analyses are coupled with the de-duplication process.
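The worked example can be reproduced with a short sketch that couples de-duplication with the repeat-trainee flag. It simplifies the real process by taking the records as already sorted by event date and ignoring the IP eligibility sort keys. The field names `FUNDING`, `EVENT`, `N_EVENTS`, and `REPEAT` are hypothetical.

```python
PAMA_CODES = {13, 39, 51}  # combined PA/MA region


def region_of(aetc: int) -> str:
    """Map an AETC code to its combined region (only PAMA is needed here)."""
    return "PAMA" if aetc in PAMA_CODES else f"AETC{aetc}"


def last_records(records, funding=None):
    """Keep each trainee's last record and flag repeat trainees.

    Records are assumed to be sorted by event date. When `funding` is
    given, only records of that funding type are considered.
    """
    result = {}
    for rec in records:
        if funding is not None and rec["FUNDING"] != funding:
            continue
        key = (region_of(rec["AETC"]), rec["PIF_ID"])
        count = result[key]["N_EVENTS"] + 1 if key in result else 1
        result[key] = dict(rec, N_EVENTS=count, REPEAT=count > 1)
    return list(result.values())


records = [
    {"PIF_ID": 12345678, "AETC": 13, "FUNDING": "MAI", "EVENT": 1},
    {"PIF_ID": 12345678, "AETC": 13, "FUNDING": "Base", "EVENT": 2},
    {"PIF_ID": 12345678, "AETC": 39, "FUNDING": "CDC testing", "EVENT": 3},
    {"PIF_ID": 12345678, "AETC": 13, "FUNDING": "Base", "EVENT": 4},
]
overall = last_records(records)
mai = last_records(records, funding="MAI")
```

The overall analysis retains event 4 with the repeat flag set, while the MAI analysis retains event 1 with the repeat flag unset, matching the description of the example.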
We identified data to include by limiting records to
[Figure: Event Record form item ER5_3=1 (not mutually exclusive)]
We identified data to include by limiting records to
AETC = 30-41 (CDC testing code)
[Figure: Event Record form items ER4_7=1 and ER4_31=1]
For ease of interpretation, all outcome responses were rescaled from 1-5 to 0-100 so that the results could be interpreted as percent change:
Original Scale   IP Meanings                                FUP Meanings                   New Scale
1                “Novice”, “Poor”, “Disagree Strongly”      “Strongly Disagree”            0
2                                                           “Disagree”                     25
3                                                           “Neither Agree or Disagree”    50
4                                                           “Agree”                        75
5                “Expert”, “Excellent”, “Agree Strongly”    “Strongly Agree”               100
Original scale values of 0 or >5 are recoded to missing. Decimal values between 1-5 are rounded down to a whole number.
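The rescaling amounts to new = (floor(original) - 1) × 25. A minimal sketch follows; treating any value below 1 (not only 0) as missing is an assumption, since the rule above names only 0 and values above 5.

```python
import math
from typing import Optional


def rescale(value: Optional[float]) -> Optional[int]:
    """Map a 1-5 response onto 0-100 in steps of 25."""
    if value is None or value < 1 or value > 5:
        return None                      # recoded to missing
    whole = math.floor(value)            # decimals are rounded down
    return (whole - 1) * 25


rescaled = [rescale(v) for v in (1, 2, 3, 4, 5, 4.7, 0, 6)]
```

With this mapping a one-point change on the original scale corresponds to a 25-point change on the new scale.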
Over a wide range of disciplines, email response
Factors hypothesized to influence response rates
  Number of questions
  Pre-notification
  Follow-up
  Salience
2013 response rates: 5% - 64%, avg: 30%
Top responders:
  University of Hawaii (Pacific) – 64.1%
  YVFWC (Northwest) – 47.7%
  UNC Chapel Hill (SEATEC) – 42.5%
  AARTH (Northwest) – 42.1%
  Indiana (MATEC) – 38.5%
*Response rates by LPS for VF users with a minimum of 20 total participants
2014 response rates: 11% - 61%, avg: 27%
Top responders:
  UK (SEATEC) – 60.5%
  USC (SEATEC) – 55.8%
  AZ AIDS ETC (Pacific) – 47.4%
  SPIPA (Northwest) – 44.0%
  Pittsburgh (PA/MA) – 41.0%
*Response rates by LPS for VF users with a minimum of 20 total participants
**Response rates through October 1, 2014
LPS with >1 events per year have higher response
Average response rate for LPS with 10+ events: 35% Average response rate for LPS with 50+
Average response rate for LPS with <20
Email comments from top responders:
  Online registration (UK & USC)
  Participant buy-in (UK, USC, SPIPA)
  Cultural awareness of participants (SPIPA)
  Monthly audits from central office (UK & USC)
Additional comments? Questions/concerns?