SLIDE 1 Keypunch operators, 1940 Census
Catherine Fitch and Steven Ruggles
Family History Technology Workshop
February 2018
SLIDE 2 Each person in the world creates a Book of Life. This Book starts with birth and ends with death. Record linkage is the name of the process of assembling the pages of this Book into a volume.
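To make the idea of assembling pages concrete, here is a minimal Python sketch of linking person records across two sources. It is illustrative only and is not the method used by any project named in these slides. The field names (`id`, `name`, `birth_year`), the thresholds, and the sample records are all hypothetical. Candidate pairs must have birth years within one year of each other and similar names, and a link is kept only when exactly one candidate survives.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity score between two names, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def link_records(source_a, source_b, min_name_score=0.85, max_year_gap=1):
    """Link each record in source_a to at most one record in source_b.

    Thresholds are hypothetical defaults chosen for illustration.
    """
    links = []
    for rec_a in source_a:
        candidates = []
        for rec_b in source_b:
            # Block on birth year first: skip pairs that are too far apart.
            if abs(rec_a["birth_year"] - rec_b["birth_year"]) > max_year_gap:
                continue
            score = name_similarity(rec_a["name"], rec_b["name"])
            if score >= min_name_score:
                candidates.append((score, rec_b["id"]))
        # Keep only unambiguous matches; drop records with 0 or 2+ candidates.
        if len(candidates) == 1:
            links.append((rec_a["id"], candidates[0][1]))
    return links


# Hypothetical sample records, invented for this sketch.
census_1930 = [
    {"id": "a1", "name": "Mary Olson", "birth_year": 1921},
    {"id": "a2", "name": "John Smith", "birth_year": 1905},
]
census_1940 = [
    {"id": "b1", "name": "Mary Olsen", "birth_year": 1921},
    {"id": "b2", "name": "John Smith", "birth_year": 1904},
    {"id": "b3", "name": "John Smith", "birth_year": 1906},
]

links = link_records(census_1930, census_1940)
print(links)
```

In this toy data the spelling variant "Olson"/"Olsen" still links, while "John Smith" is left unlinked because two equally good candidates exist. Rejecting ambiguous matches trades coverage for fewer false links.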
SLIDE 3 Big Data
Transactional or “Organic” Data
- Administrative
  - Social Security
  - Medicare
  - Military
  - Taxes
- Commercial
  - Credit ratings
  - Phone records
  - Social Media
Designed Data
- Censuses
- Surveys
- Remote sensing
  - satellite imagery
  - weather stations
SLIDE 4 The biggest payoff will lie in new combinations of designed data and organic data, not in one type alone
SLIDE 5 Organic/Transactional data is voluminous, but
- shallow (few variables) and
- non-representative
Both problems can be overcome by linking to Designed data
SLIDE 6
SLIDE 7 National Longitudinal Research Infrastructure
Life histories for each person
- Censuses
- Social Security
- Military records (draft, enlistment)
- Vital records (birth, death, marriage, divorce)
- Health (Medicare, Medicaid)
- Surveys
SLIDE 8
Link across 5+ generations, 1850-2020
National Longitudinal Research Infrastructure
SLIDE 9
Cover, 1960 Census Microdata Codebook
Distributed on 13 Univac tapes (or 18,000 punchcards)
The First Microdata: The 1960 Census Samples
SLIDE 10
Historical Data
SLIDE 11
1991: Eight Census Years 1850-1980
All Incompatible (except 1960 and 1970)
SLIDE 12
1991 IPUMS proposal: An integrated database for
1880, 1900, 1910, 1940, 1950, 1960, 1970, 1980, 1990
- Harmonized codes
- Consistent record layout
- Integrated documentation
- No loss of information
SLIDE 13 Percent Female; Scientists and Engineers
Chart: percent female (y-axis, 5 to 40) by year (x-axis, 1900 to 2005); series: Engineers, Scientists
IPUMS Graph from “A Century of Women in Science and Engineering,” History Day project by Abby Norling-Ruggles, age 12
SLIDE 14
IPUMS Data Dissemination, 1995-2017
Chart: gigabytes per week (y-axis, 500 to 5,000) by year
Five terabytes distributed each week
SLIDE 15
Registered IPUMS data users, 1995-2017
Chart: number of users (y-axis, 25,000 to 200,000) by year
189,000 registered IPUMS users
SLIDE 16
Annual citations (Google Scholar)
Chart: annual citations (y-axis, 500 to 2,000) by year (axis ticks 1995 to 2015)
A new paper every four hours
SLIDE 17
usa.ipums.org
SLIDE 18
U.S. public use microdata available for research, 1973-2018 (number of person records)
Chart: person records by year (y-axis up to 1,000,000,000); series: Microdata digitized from historical manuscripts, Public-use microdata from Census Bureau; labels: 1880, 1940
SLIDE 19
Integrated U.S. microdata available for research, 1970-2018 (number of person records)
Chart: person records by year (y-axis up to 2,000,000,000; axis ticks 1970 to 2010); series: IPUMS-Format Microdata in the Census Research Data Centers, IPUMS Microdata digitized from historical manuscripts, Public-use IPUMS data from Census
We are here
SLIDE 20
Federal Statistical Research Data Centers
30 locations and growing
SLIDE 21
- Census Longitudinal Infrastructure Project
- IPUMS Multigenerational Longitudinal Panel
SLIDE 22
The Census Longitudinal Infrastructure Project (CLIP)
Sanders, Ferrie, O’Hara, Alexander
1940 Linking Meeting, Minneapolis, February 10-11, 2014
SLIDE 23
CLIP Linking Strategy
- 1940 Census
- 2000, 2010 Censuses
- Medicare
- Medicaid
- 1940-2020 Deaths
- HUD
- Selective Service
- Private Vendors
- SSA Numident
- Federal Surveys
- WW II Military
SLIDE 24
Capturing names in the 1990 census through OCR
SLIDE 25
Multigenerational Longitudinal Panel
Ruggles, Warren, Fitch, Hacker, Roberts, Sobek, Bailey, Goeken, Price
SLIDE 26
IPUMS Linked Representative Samples
- 100% 1880 Census
- 1850 IPUMS Sample
- 1860 IPUMS Sample
- 1870 IPUMS Sample
- 1900 IPUMS Sample
- 1910 IPUMS Sample
- 1920 IPUMS Sample
- 1930 IPUMS Sample
Final version June 2010
SLIDE 27
Multigenerational Longitudinal Panel
- 1850 Census
- 1860 Census
- 1870 Census
- 1880 Census
- 1900 Census
- 1910 Census
- 1920 Census
- 1930 Census
- 1940 Census
SLIDE 28
MLP Linking Strategy
- 1850-1930 Censuses
- 1940 Census
- CLIP
- WW I Military
- 1881-1930 Births, Marriages
- 1881-2020 Deaths
- Numident
- Genealogies?
SLIDE 29 National Longitudinal Research Infrastructure
Life histories for each person
- Impact of early life conditions on later health and well-being
- Social, Economic, Geographic Mobility
- Life course transitions
SLIDE 30 Link across 5+ generations
- Impact of forebears on health and well-being
- Socioeconomic mobility across generations: Do we have dynasties?
National Longitudinal Research Infrastructure
SLIDE 31
Understanding the great transformations:
demographic transition, family transition, urbanization, immigration, industrialization
National Longitudinal Research Infrastructure
SLIDE 32
Higher prior exposure to water-borne lead among male World War Two U.S. Army enlistees was associated with lower intelligence test scores. Exposure was proxied by urban residence and the water pH levels of the cities where enlistees lived in 1930.
SLIDE 33 National Longitudinal Research Infrastructure
- Impact of lead exposure on Alzheimer’s disease
- Effect of early-life cognitive capacity on later economic success
- Transmission of health and well-being over multiple generations
- Effects of early-life income support on later outcomes
SLIDE 34
Thank You.