The Statistical Administrative Records System and Administrative Records Experiment 2000: System Design, Successes, and Challenges
Dean H. Judson Planning, Research and Evaluation Division U.S. Census Bureau
11/20/2000 U.S. CENSUS BUREAU 2
– Presentation (query results and displays)
– Database
– Recorded Events and Objects (administrative record)
– Observed Events and Objects ("sampling frame")
– Events and Objects (population)

– Policy changes which change the definition of events and objects
– “Ontologies” and thresholds for observation
– Data entry errors and coding schemes
– Data management issues
– Query structure and spurious structure
– Data collection
[Figure: four panels mapping real-world states to information system states: Proper Representation, Incomplete Representation, Ambiguous Representation, Meaningless States. Source: Wand and Wang, 1996:90]
[Flowchart 5: boxes listed in step-number order; connecting arrows are not recoverable. Connector labels on the chart: "Return to 4", 7, 9.]
– 5.05 Person Characteristic File (PCF) (aka 14.100)
– 5.10 Concatenate, sort, and unduplicate
– 5.15 SSS Person Edited File
– 5.20 IRS 1040 Person Edited File
– 5.25 IRS 1099 Person Edited File
– 5.30 Medicare Person Edited File
– 5.35 HUD-TRACS Person Edited File
– 5.40 IHS Person Edited File
– 5.45 Medicaid Person Edited File (future possibility)
– 5.50 CHUMS Person Edited File (future possibility)
– 5.55 FAFSA Person Edited File (future possibility)
– 5.60 Composite Person Output
– 5.65 Original Address Pointers
– 5.70 Address Output (aka 4.25)
– 5.75 Unduplicate & Reset Address Pointers
– 5.80 Updated Address Pointers
– 5.85 Merge
– 5.90 Person Output
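The "Concatenate, sort, and unduplicate" step (5.10) can be illustrated with a minimal sketch. It assumes each person edited file is a list of records keyed by an `ssn` field and that the first record seen for an SSN is the one kept; the field names and the tie-breaking rule are assumptions for illustration, not the rule StARS actually applies.

```python
from itertools import groupby
from operator import itemgetter

def concatenate_sort_unduplicate(*files):
    """Stack several person files, sort by SSN, keep one record per SSN."""
    # Concatenate: flatten all input files into one list, in the order given.
    combined = [record for person_file in files for record in person_file]
    # Sort: Python's sort is stable, so input order is preserved within an SSN.
    combined.sort(key=itemgetter("ssn"))
    # Unduplicate: keep the first record of each run of equal SSNs (assumed rule).
    return [next(group) for _, group in groupby(combined, key=itemgetter("ssn"))]

# Hypothetical toy inputs standing in for two person edited files.
irs_1040 = [{"ssn": "222", "source": "IRS 1040"}, {"ssn": "111", "source": "IRS 1040"}]
medicare = [{"ssn": "111", "source": "Medicare"}, {"ssn": "333", "source": "Medicare"}]

people = concatenate_sort_unduplicate(irs_1040, medicare)
for person in people:
    print(person["ssn"], person["source"])
```

Because the sort is stable, the order in which files are passed acts as a simple source-priority ranking in this sketch.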
– Tax year data; April, 2000 refers to “tax year” 1999
– TY ‘99 file arrives October, 2000
– Business entities, estates, other institutions included
– 120 million records/year
– Households below the filing threshold do not need to file
– Czajka, 2000: 10-20% of addresses are PO Boxes, business addresses, tax preparers
– TY95+: SSNs of dependents requested, recorded
– Czajka, 2000: 1987 study: 0.5% of primary filers’, 1.6% of secondary filers’, 3.4% of dependents’ SSNs in error
– Age, race, sex, Hispanic origin microdata not available
– Current and historical Medicare enrollment
– “Active” and “Inactive” cases
– 35-40 million records at any one point in time; September ‘93: 77 million records (active + inactive)
– Proxy recipients listed on the file (e.g., John Doe’s benefits c/o Jane Doe; John Doe’s benefits c/o nursing home)
– A small portion of records at any point in time are probably deceased (Kim and Sater, 2000)
– Used in population estimates system for 65+ household population estimates
65+ population than “non-snowbird” states
– 750 million transaction records → 400 million individual SSN records
– Post 1985: Enumeration at birth
– For each SSN: Date of birth, gender, race, place of birth
as such
etc.)
20% either “unknown” or “other”
– For person data: One output record per person, assigned to an individual residence corresponding as closely as possible to Census residence definitions, in a household structure corresponding as closely as possible to Census household structure, containing microdata corresponding as closely as possible to Census short form microdata, and excluding persons who are not in the population of interest.
– For address data: One output record per individual housing unit at a Basic Street Address, geocoded to Census TIGER geography, with address microdata and concepts corresponding as closely as possible to DMAF address fields and concepts, and excluding locations which are not in the population of interest.
[Flowchart 15: boxes listed in step-number order; connecting arrows are not recoverable. A number in square brackets is the bare numeral printed with the box. Other labels on the chart: "End", "Go To End15a".]
– 15.05 Process file this cycle? (Yes / No; decision shown for each of three branches)
– 15.10 Hold for next cycle
– 15.15 Program Development: Address Processing Program [8]
– 15.20 Address Data Processing [10]
– 15.25 Address Output
– 15.30 Program Development: Person Editing Program [8]
– 15.35 Person Editing [15]
– 15.40 Edited IHS File
– 15.45 Program Development: SSN Verification Program [8]
– 15.50 Social Security Number (SSN) Verification [13]
– 15.55 Verified IHS File
– 15.60 Is current year’s PCF available? (No / Yes)
– 15.65 Create Person Characteristic File (PCF) [14]
– 15.70 Person Characteristic File (PCF)
– 15.75 Process Person Data [16]
– 15.80 Person Output
– 15.85 Program Development: Household Processing Program [8]
– 15.90 Household Data Processing [17]
– 15.95 Household Output
– 15.100 Program Development: Final Output Program [8]
– 15.105 Final StARS Processing [18]
– 15.110 Final StARS Output
– 15.115 Data Delivery [5]
– MD: Baltimore city, Baltimore county
– CO: El Paso county, Douglas county, Jefferson county
– Group Quarters survey
– Coverage measurement survey
– Request for physical address (PO boxes/RR’s)
– MAFGOR Geocoding
– Field verification of addresses not matched to DMAF
[Flowchart 17: boxes listed in step-number order; connecting arrows are not recoverable. Lane labels on the chart: "Methods 1 and 2" and "Method 2 Only (Bottom-Up)". The responsible unit is shown in parentheses; a number in square brackets is the bare numeral printed with the box. The chart begins at "Start".]
– 17.05 Planning & OMB Approval (PRED)
– 17.10 Acquire National Administrative Records File (PRED)
– 17.15 National Administrative Address Records File
– 17.20 Computer geocode the National File (GEO)
– 17.25 Maryland & Colorado (MD&CO) Geocoded Files (with test site records flagged)
– 17.30 Receive MD&CO Files from GEO (PRED)
– 17.35 Create StARS 1999 from MD&CO Files (PRED)
– 17.40 StARS 1999 Master Housing File (MHF) for MD&CO
– 17.45 Perform Exploratory Data Analysis (EDA) (PRED)
– 17.50 Additional Geocoded Test Site Records
– 17.55 Additional Ungeocoded Test Site Records
– 17.60 Copy test site records to create AREX Address File (PRED)
– 17.65 AREX Address File
– 17.700 Extract test site records from MD&CO Files (GEO)
– 17.75 Extract ungeocoded city-style records (GEO)
– 17.80 Clerical Resolution Addresses (MAFGOR) (GEO/FLD/RCCs) [3]
– 17.85 Update AREX Address File with MAFGOR results (PRED)
– 17.90 Geocoded City-style AREX Addresses
– 17.95 Copy P.O. Box and rural-style addresses (PRED)
– 17.100 AREX P.O. Box and rural-style addresses (aka 2.40)
– 17.110 Request for Physical Addresses Mailout & Processing (DSCMO/NPC/GEO/RCCs) [2]
– 17.115 Update AREX Address File with (PRED)
– 17.120 DMAF
– 17.125 Obtain DMAF from DSCMO (PRED)
– 17.130 Pull off address records from DMAF by AREX test site counties (PRED)
– 17.135 Match Geocoded City-style AREX Addresses to DMAF (PRED)
– 17.140 Perform clerical review (PRED)
– 17.145 Unmatched Admin. Record Addresses
– 17.150 Field Address Verification & Processing (FLD/DSCMO/NPC) [4]
– 17.155 Update AREX Address File with (PRED)
– 17.160 Unmatched DMAF Addresses
– 17.165 Obtain person data from Census 2000 (DSCMO)
– 17.170 GQ Person Data from Census
– 17.175 StARS Person Data
– 17.180 AREX Address File (after MAFGOR, Request for Physical Addresses, and Field Address Verification updates)
– 17.185 Matched Addresses
– 17.190 Census 2000 Person Data
– 17.195 Post-Processing. For details, see AREX 2000: Administrative Records Research File Processing Flowcharts.
– Evaluation 1: Comparison of both methods’ site and block level counts of population by race, Hispanic origin, age groups and gender, with comparable decennial census counts
– Evaluation 2: Analyzing selected components of the AREX implementation processing
– Evaluation 3: Comparison of “bottom up” housing unit and household level information with comparable Census 2000 housing unit and household information
– Evaluation 4: Assessing the feasibility of using administrative records in lieu
– A delivery address suitable for receiving a payment check may not suffice for putting individuals at a street address
– Difficult to distinguish individual units within the Basic Street Address
– Race coding: Hispanic Origin is a separate race on NUMIDENT
– Transaction data ≠ person data
– How many names does a person have (and in what order)?

Example:
JOHN WILSON
C/O MARY SMITH
1004 LAUREL LANE
ROCKMONT, MD 22345
The address is for Mary Smith. John Wilson may or may not live there.
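The "C/O" problem in the John Wilson example can be made concrete with a small sketch that separates the named person from the proxy at whose address the mail is delivered. The function name and the assumption that the first line is always the named person are hypothetical; real files will have many more layouts than this handles.

```python
def split_care_of(address_lines):
    """Return (named_person, proxy, remaining_lines) for a mailing address.

    Assumes the first line names the person on the record and that a proxy,
    if any, appears on a later line starting with "C/O ".
    """
    named_person = address_lines[0].strip()
    proxy = None
    remaining = []
    for line in address_lines[1:]:
        text = line.strip()
        if text.upper().startswith("C/O "):
            proxy = text[4:].strip()  # drop the "C/O " prefix
        else:
            remaining.append(text)
    return named_person, proxy, remaining

person, proxy, street_lines = split_care_of(
    ["JOHN WILSON", "C/O MARY SMITH", "1004 LAUREL LANE", "ROCKMONT, MD 22345"]
)
# A non-empty proxy flags that the address may not be the person's residence.
residence_uncertain = proxy is not None
print(person, "|", proxy, "|", street_lines, "|", residence_uncertain)
```

Flagging such records, rather than dropping them, keeps open the choice of how to place the person later.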
– Huang and Kim, 2000: About 10% of addresses are rural style
– PO Boxes: 45% for IHS, 9.5% for Medicare, 7.5% for IRS 1040, 6.8% for SSS, 3.8% for IRS 1099, 0.4% for HUD-TRACS
– Sater, 1995 IRS/CPS match: 86.5% of tax return cases had the same address as residence address, 94% coded to same county
– Addresses with both business and residential components
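Separating PO Box and rural-style addresses from city-style ones is a prerequisite for geocoding. A rough sketch of such a classifier follows; the regular expressions and category names are illustrative assumptions, not the rules used in StARS or AREX processing.

```python
import re

# Illustrative patterns only; production address standardizers are far richer.
PO_BOX = re.compile(r"\s*(P\.?\s*O\.?\s*BOX|POST OFFICE BOX)\b", re.IGNORECASE)
RURAL = re.compile(r"\s*(RR|RURAL ROUTE|HC)\s*\d+", re.IGNORECASE)
CITY_STYLE = re.compile(r"\s*\d+\s+\S+")  # house number followed by a street name

def address_style(line):
    """Classify one delivery line as po_box, rural, city_style, or other."""
    if PO_BOX.match(line):
        return "po_box"
    if RURAL.match(line):
        return "rural"
    if CITY_STYLE.match(line):
        return "city_style"
    return "other"

for sample in ["PO BOX 12", "RR 2 BOX 15", "486 MAIN STREET", "C/O MARY SMITH"]:
    print(sample, "->", address_style(sample))
```

Only the `city_style` group would be a candidate for automated geocoding in this sketch; the others would need a physical address from another source.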
– When addresses or personal characteristics are measured with substantial variation, it is often not obvious whether a particular pair of records represent a duplicate or not. Yet, with multiple files, unduplication decisions must be made.
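One common way to handle uncertain pairs is a three-way decision: clear duplicates, clear non-duplicates, and a middle band sent to clerical review. The sketch below uses a simple string-similarity score from Python's `difflib`; the fields, the equal weighting, and both thresholds are assumptions chosen for illustration and are not the decision rule used in StARS.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()

def compare(rec_a, rec_b, upper=0.9, lower=0.6):
    """Classify a record pair using the mean similarity of name and address.

    The thresholds `upper` and `lower` are hypothetical tuning parameters.
    """
    fields = ("name", "address")
    score = sum(similarity(rec_a[f], rec_b[f]) for f in fields) / len(fields)
    if score >= upper:
        return "duplicate", score
    if score >= lower:
        return "clerical review", score
    return "distinct", score

a = {"name": "SAM SMITH", "address": "486 MAIN STREET"}
b = {"name": "SAM SMITH", "address": "486 MAIN ST"}
c = {"name": "QQQQ", "address": "ZZZZ"}

print(compare(a, a))
print(compare(a, b))
print(compare(a, c))
```

Widening the gap between the two thresholds trades automated decisions for more clerical work.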
CHUMS-enhanced IMH File (example records A-Q):
A  Banana St
B  17 Banana St
C  19 Banana St Apt 5
D  44 MLK, Jr. Blvd
E  100 Route 4
F  7 Marie Ln
G  Wife Mrs. Smith
H  5 Apple St
I  27 Apple St
J  Apple St
K  9999 Apple St
L  3 Apple St Apt 5
M  1 Apple St
N  3 Apple St Apt A
O  3 Apple St ZZ
P  3 Apple St
Q  3 Apple St Apt 1

MAF:
1 Apple St
3 Apple St Apt 1
3 Apple St Apt 2
3 Apple St Apt 3
3 Apple St Apt 4
7 Apple St
9 Apple St
# Apple St
# Martin Luther King, Jr. Blvd
# Pennsylvania Ave
7 Maria Ln
Outcome of "CHUMS-enhanced IMH File" / MAF Match, with Possible Explanations and Examples

MATCH on Street: NO; BSA: N/A; BSA+Unit: N/A
1. Street is not in MAF, either it was just missing or it's a new street (examples A, B, C)
2. Different, but valid representation of street name (examples D, E)
3. Misspelling of street name (example F)
4. Erroneous street name (example G)

MATCH on Street: YES; BSA: NO; BSA+Unit: N/A
1. BSA is not in MAF, either it was just missing or it's a new BSA - There is a "hole" in MAF (example H)
2. BSA is not in MAF, either it was just missing or it's a new BSA - A missing "street extension" (example I)
3. Existing street with no incoming street number (example J)
4. Erroneous street number (example K)

MATCH on Street: YES; BSA: YES; BSA+Unit: NO
1. Unit not in MAF, either it was just missing or it's a new unit (example L)
2. Valid match - a BSA without separate units (example M)
3. Different representation of a unit (example N)
4. Erroneous unit information (example O)
5. Missing unit information (example P)

MATCH on Street: YES; BSA: YES; BSA+Unit: YES
1. Valid match (example Q)
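The Street / BSA / BSA+Unit hierarchy in the match outcomes can be sketched as a cascade of exact lookups. This assumes addresses have already been parsed into `(street, number, unit)` tuples, which is itself a hard step; the tuple layout and the returned labels are assumptions for illustration. Exact comparison only finds the level at which a match fails. It cannot choose among the possible explanations listed for that level.

```python
def match_level(addr, maf):
    """Return the deepest level at which `addr` matches the MAF.

    `addr` is a (street, number, unit) tuple; `maf` is a set of such tuples.
    """
    streets = {street for street, _, _ in maf}
    bsas = {(street, number) for street, number, _ in maf}
    street, number, unit = addr
    if street not in streets:
        return "Street: NO"
    if (street, number) not in bsas:
        return "Street: YES, BSA: NO"
    if addr not in maf:
        return "Street: YES, BSA: YES, BSA+Unit: NO"
    return "Street: YES, BSA: YES, BSA+Unit: YES"

# A few MAF entries from the example, parsed by hand.
maf = {
    ("APPLE ST", "1", None),
    ("APPLE ST", "3", "APT 1"),
    ("APPLE ST", "3", "APT 2"),
    ("APPLE ST", "7", None),
    ("MARIA LN", "7", None),
}

print(match_level(("BANANA ST", "17", None), maf))    # example B
print(match_level(("APPLE ST", "9999", None), maf))   # example K
print(match_level(("APPLE ST", "3", "APT 5"), maf))   # example L
print(match_level(("APPLE ST", "3", "APT 1"), maf))   # example Q
```

Cases such as F ("7 Marie Ln" against "7 Maria Ln") fall out at the street level here, which is why the explanations per level still need approximate matching or clerical review.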
– Distinct problem from “point in time” data collection
– Information states change over time/over databases
SAM SMITH
486 MAIN STREET
FAIRFIELD, VA 33412
(From TY97 IRS file, filed sometime in 1998)
matching can be performed
decision logic at each step
Center Meeting, San Francisco, CA, April 4, 1995.
Census, December 15, 1999.
from the U.S. Bureau of the Census, February 10, 2000.
Unpublished document.
Population Estimates Program. Paper presented at the Southern Demographic Association meetings, New Orleans, LA, August 29, 2000.
the Population Association of America, San Francisco, CA, April 6, 1995.
Communications of the ACM, 39: 86-95.
administrative records, and sampled nonresponse followup. Presentation to the U.S. Bureau of the Census, August 6, 1996.
the International Conference on Survey Nonresponse, Portland, OR., October 29, 1999.