Catherine Fitch and Steven Ruggles Family History Technology - - PowerPoint PPT Presentation



SLIDE 1

Keypunch operators, 1940 Census

Catherine Fitch and Steven Ruggles
Family History Technology Workshop
February 2018

SLIDE 2

Each person in the world creates a Book of Life. This Book starts with birth and ends with death. Record linkage is the name of the process of assembling the pages of this Book into a volume.

  • Halbert L. Dunn, 1946
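
A minimal sketch of what "assembling the pages" can look like in code. It is not the method used by any project named in these slides. The field names, toy records, similarity threshold, and age tolerance are all assumptions made for illustration. Two records are linked only when exactly one candidate agrees on sex and birthplace, has a plausible age, and has a similar name.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Return a 0..1 similarity score between two names, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def link_records(earlier, later, years_apart, min_name_score=0.85, max_age_gap=2):
    """Link each earlier record to a later one only when the match is unambiguous."""
    links = []
    for a in earlier:
        candidates = []
        for b in later:
            # Sex and birthplace should not change between sources.
            if a["sex"] != b["sex"] or a["birthplace"] != b["birthplace"]:
                continue
            # Reported age should advance by roughly the interval between sources.
            if abs((b["age"] - a["age"]) - years_apart) > max_age_gap:
                continue
            if name_similarity(a["name"], b["name"]) >= min_name_score:
                candidates.append(b["id"])
        # Zero candidates or several candidates both mean "no link".
        if len(candidates) == 1:
            links.append((a["id"], candidates[0]))
    return links


# Toy records, invented for this example.
earlier_source = [
    {"id": "a1", "name": "Mary Olson", "sex": "F", "age": 12, "birthplace": "Minnesota"},
    {"id": "a2", "name": "John Smith", "sex": "M", "age": 30, "birthplace": "Ohio"},
]
later_source = [
    {"id": "b1", "name": "Mary Olsen", "sex": "F", "age": 22, "birthplace": "Minnesota"},
    {"id": "b2", "name": "John Smith", "sex": "M", "age": 41, "birthplace": "Ohio"},
    {"id": "b3", "name": "John Smith", "sex": "M", "age": 39, "birthplace": "Ohio"},
]

links = link_records(earlier_source, later_source, years_apart=10)
```

Here the misspelled "Olsen" is still linked, while "John Smith" is left unlinked because two later records fit equally well. Refusing ambiguous matches trades coverage for fewer false links.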
SLIDE 3

Big Data

Transactional or “Organic” Data

  • Administrative
      • Social Security
      • Medicare
      • Military
      • Taxes
  • Commercial
      • Credit ratings
      • Phone records
      • Social Media

Designed Data

  • Censuses
  • Surveys
  • Remote sensing
      • satellite imagery
      • weather stations
SLIDE 4

The biggest payoff will lie in new combinations of designed data and organic data, not in one type alone

  • Robert Groves, 2011
SLIDE 5

Organic/Transactional data is voluminous, but

  • shallow (few variables) and
  • non-representative

Both problems can be overcome by linking to Designed data

SLIDE 6
SLIDE 7

National Longitudinal Research Infrastructure

Life histories for each person

  • Censuses
  • Social Security
  • Military records (draft, enlistment)
  • Vital records (birth, death, marriage, divorce)
  • Health (Medicare, Medicaid)
  • Surveys
SLIDE 8

Link across 5+ generations, 1850-2020

National Longitudinal Research Infrastructure

SLIDE 9

Cover, 1960 Census Microdata Codebook

Distributed on 13 Univac tapes (or 18,000 punchcards)

The First Microdata: The 1960 Census Samples

SLIDE 10

Historical Data

SLIDE 11

1991: Eight Census Years 1850-1980

All Incompatible (except 1960 and 1970)

SLIDE 12

1991 IPUMS proposal: An integrated database for

1880, 1900, 1910, 1940, 1950, 1960, 1970, 1980, 1990

Harmonized codes Consistent record layout Integrated documentation No loss of information

.
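
One way to picture harmonized codes with no loss of information is a per-year crosswalk that maps each source code to a common code while keeping the original alongside it. The sketch below is illustrative only. The variable, the codes, and the labels are invented and are not actual IPUMS values.

```python
# Hypothetical crosswalk: every code and label here is invented for illustration.
HARMONIZED_LABELS = {1: "married", 2: "widowed", 3: "divorced", 4: "never married"}

# Each census year has its own source coding scheme for the same concept.
CROSSWALK = {
    1880: {"M": 1, "W": 2, "D": 3, "S": 4},
    1990: {"1": 1, "2": 1, "3": 2, "4": 3, "5": 4},  # two source codes collapse to 1
}


def harmonize(year, source_code):
    """Return one record in a layout shared by every year."""
    code = CROSSWALK[year][source_code]
    return {
        "year": year,
        "marst": code,                          # harmonized code
        "marst_label": HARMONIZED_LABELS[code],
        "marst_source": source_code,            # original code kept, so detail is not lost
    }


rows = [harmonize(1880, "S"), harmonize(1990, "2")]
```

Both rows share one layout and one code scheme, and the retained source code lets a user recover the finer 1990 distinction that the harmonized code collapses.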

SLIDE 13

Percent Female; Scientists and Engineers

[Chart: percent female (y-axis, 5 to 40) by year (x-axis, 1900 to 2005); series: Engineers, Scientists]

IPUMS Graph from “A Century of Women in Science and Engineering,” History Day project by Abby Norling-Ruggles, age 12

SLIDE 14

IPUMS Data Dissemination, 1995-2017

Five terabytes distributed each week

[Chart: gigabytes per week (y-axis, 500 to 5,000) by year (x-axis, 1995 to 2015)]

SLIDE 15

Registered IPUMS data users, 1995-2017

189,000 registered IPUMS users

[Chart: number of users (y-axis, 25,000 to 200,000) by year (x-axis, 1995 to 2015)]

SLIDE 16

Annual citations of IPUMS Data (Google Scholar)

A new paper every four hours

[Chart: annual citations (y-axis, 500 to 2,000) by year (x-axis, 1995 to 2015)]

SLIDE 17

usa.ipums.org

SLIDE 18

U.S. public use microdata available for research, 1973-2018

(number of person records)

[Chart: y-axis from 200,000,000 to 1,000,000,000 person records; series: Microdata digitized from historical manuscripts, Public-use microdata from Census Bureau; labeled points: 1880, 1940]

SLIDE 19

Integrated U.S. microdata available for research, 1970-2018

(number of person records)

[Chart: y-axis from 500,000,000 to 2,000,000,000 person records; x-axis from 1970 to 2010; series: IPUMS-Format Microdata in the Census Research Data Centers, IPUMS Microdata digitized from historical manuscripts, Public-use IPUMS data from Census; annotation: "We are here"]

SLIDE 20

Federal Statistical Research Data Centers

30 locations and growing

SLIDE 21
  • Census Longitudinal Infrastructure Project
  • IPUMS Multigenerational Longitudinal Panel
SLIDE 22

The Census Longitudinal Infrastructure Project (CLIP)

1940 Linking Meeting, Minneapolis, February 10-11, 2014

Sanders, Ferrie, O’Hara, Alexander

SLIDE 23

CLIP Linking Strategy

  • 1940 Census
  • Medicare
  • Medicaid
  • 2000, 2010 Censuses
  • 1940-2020 Deaths
  • HUD
  • Selective Service
  • Private Vendors
  • SSA Numident
  • Federal Surveys
  • WW II Military

SLIDE 24

Capturing names in the 1990 census through OCR

SLIDE 25

Multigenerational Longitudinal Panel

Ruggles, Warren, Fitch, Hacker, Roberts, Sobek, Bailey, Goeken, Price

SLIDE 26

IPUMS Linked Representative Samples

Final version June 2010

[Diagram: 100% 1880 Census, with the 1850, 1860, 1870, 1900, 1910, 1920, and 1930 IPUMS Samples]

SLIDE 27

Multigenerational Longitudinal Panel

[Diagram: 1850, 1860, 1870, 1880, 1900, 1910, 1920, 1930, and 1940 Censuses]

SLIDE 28

MLP Linking Strategy

  • 1940 Census
  • CLIP
  • WW I Military
  • 1881-1930 Births
  • Marriages
  • 1881-2020 Deaths
  • 1850-1930 Censuses
  • Numident
  • Genealogies?

SLIDE 29

National Longitudinal Research Infrastructure

Life histories for each person

  • Impact of early life conditions on later health and well-being

  • Social, Economic, Geographic Mobility
  • Life course transitions
SLIDE 30

Link across 5+ generations

  • Impact of forebears on health and well-being
  • Socioeconomic mobility across generations: Do we have dynasties?

National Longitudinal Research Infrastructure

SLIDE 31

Understanding the great transformations:

demographic transition, family transition, urbanization, immigration, industrialization

National Longitudinal Research Infrastructure

SLIDE 32

Higher prior exposure to water-borne lead among male World War Two U.S. Army enlistees was associated with lower intelligence test scores. Exposure was proxied by urban residence and the water pH levels of the cities where enlistees lived in 1930.

SLIDE 33

National Longitudinal Research Infrastructure

  • Impact of lead exposure on Alzheimer’s disease
  • Effect of early-life cognitive capacity on later economic success
  • Transmission of health and well-being over multiple generations
  • Effects of early-life income support on later outcomes
SLIDE 34

Thank You.