DataCamp Analyzing US Census Data in R

Working with tables and summary variables in tidycensus
Kyle Walker, Instructor

Tables in the ACS

    library(tidycensus)
    library(tidyverse)

    wa_income <- get_acs(geography = "county",
                         state = "WA",
                         table = "B19001")

    # A tibble: 663 x 5
       GEOID NAME                     variable   estimate   moe
       <chr> <chr>                    <chr>         <dbl> <dbl>
     1 53001 Adams County, Washington B19001_001     5733   124
     2 53001 Adams County, Washington B19001_002      400   100
     3 53001 Adams County, Washington B19001_003      252    87
     4 53001 Adams County, Washington B19001_004      373   126
     5 53001 Adams County, Washington B19001_005      456   133
     6 53001 Adams County, Washington B19001_006      396   103
     7 53001 Adams County, Washington B19001_007      250   105
     8 53001 Adams County, Washington B19001_008      342    82
     9 53001 Adams County, Washington B19001_009      273   107
    10 53001 Adams County, Washington B19001_010      283   112
    # ... with 653 more rows
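
The row count in the output above can be sanity-checked without calling the Census API. The sketch below assumes table B19001 contains 17 variables (`_001` through `_017`), which is consistent with 663 rows spread across Washington's 39 counties. The name `b19001_vars` is illustrative only.

```r
# Build the variable IDs that a table = "B19001" request is assumed to return
b19001_vars <- sprintf("B19001_%03d", 1:17)

# First and last IDs in the table
head(b19001_vars, 1)
tail(b19001_vars, 1)

# 663 rows in the slide output divided by 17 variables per county
counties <- 663 / length(b19001_vars)
counties
```

Passing `table = "B19001"` therefore saves typing out each ID in a `variables` vector.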

Summary variables in tidycensus

    race_vars <- c(White = "B03002_003", Black = "B03002_004",
                   Native = "B03002_005", Asian = "B03002_006",
                   HIPI = "B03002_007", Hispanic = "B03002_012")

    tx_race <- get_acs(geography = "county",
                       state = "TX",
                       variables = race_vars,
                       summary_var = "B03002_001")

    tx_race

    # A tibble: 1,524 x 7
       GEOID NAME                   variable estimate   moe summary_est summary_moe
       <chr> <chr>                  <chr>       <dbl> <dbl>       <dbl>       <dbl>
     1 48001 Anderson County, Texas White       34680     5       57772          NA
     2 48001 Anderson County, Texas Black       12246   146       57772          NA
     3 48001 Anderson County, Texas Native        206    58       57772          NA
     4 48001 Anderson County, Texas Asian         336    71       57772          NA
     5 48001 Anderson County, Texas HIPI            8    14       57772          NA
     6 48001 Anderson County, Texas Hispanic     9799    NA       57772          NA
     7 48003 Andrews County, Texas  White        7250    20       17215          NA
     8 48003 Andrews County, Texas  Black         256   154       17215          NA
     9 48003 Andrews County, Texas  Native         15    25       17215          NA
    10 48003 Andrews County, Texas  Asian          36    62       17215          NA
    # ... with 1,514 more rows
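
To make the role of `summary_var` concrete, the sketch below rebuilds the same layout by hand with a join. It uses a small hand-typed data frame holding the Anderson County values shown above rather than a live API call. The names `long_acs`, `denominators`, and `with_summary` are placeholders, and the code illustrates the resulting shape, not how tidycensus produces it internally.

```r
library(dplyr)

# Hand-typed stand-in for a long-format result in which the total
# (B03002_001) arrives as an ordinary row instead of a summary column
long_acs <- data.frame(
  GEOID = "48001",
  NAME = "Anderson County, Texas",
  variable = c("B03002_001", "B03002_003", "B03002_004"),
  estimate = c(57772, 34680, 12246)
)

# Pull the total rows out as a per-county denominator
denominators <- long_acs %>%
  filter(variable == "B03002_001") %>%
  select(GEOID, summary_est = estimate)

# Attach the denominator to every remaining row of the same county
with_summary <- long_acs %>%
  filter(variable != "B03002_001") %>%
  left_join(denominators, by = "GEOID")

with_summary
```

Requesting `summary_var` in `get_acs()` avoids this extra filter-and-join step.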

Calculating percentages

    tx_race_pct <- tx_race %>%
      mutate(pct = 100 * (estimate / summary_est)) %>%
      select(NAME, variable, pct)

    tx_race_pct

    # A tibble: 1,524 x 3
       NAME                   variable     pct
       <chr>                  <chr>      <dbl>
     1 Anderson County, Texas White    60.0
     2 Anderson County, Texas Black    21.2
     3 Anderson County, Texas Native    0.357
     4 Anderson County, Texas Asian     0.582
     5 Anderson County, Texas HIPI      0.0138
     6 Anderson County, Texas Hispanic 17.0
     7 Andrews County, Texas  White    42.1
     8 Andrews County, Texas  Black     1.49
     9 Andrews County, Texas  Native    0.0871
    10 Andrews County, Texas  Asian     0.209
    # ... with 1,514 more rows
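
One quick check on percentages like these is to add them up within each county. The sketch below does this for Anderson County with hand-typed values from the output above (no API call; `anderson` and `covered` are illustrative names). The six groups requested do not cover every category in table B03002, so the total is expected to fall slightly below 100.

```r
library(dplyr)

# Anderson County rows copied from the slide output
anderson <- data.frame(
  NAME = "Anderson County, Texas",
  variable = c("White", "Black", "Native", "Asian", "HIPI", "Hispanic"),
  estimate = c(34680, 12246, 206, 336, 8, 9799),
  summary_est = 57772
)

# Same percentage calculation as on the slide
anderson_pct <- anderson %>%
  mutate(pct = 100 * (estimate / summary_est))

# Share of the county's population covered by the six groups
covered <- anderson_pct %>%
  group_by(NAME) %>%
  summarize(total_pct = sum(pct))

covered
```

A total well above 100, or far below it, would suggest a wrong denominator or a missing variable.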

Let's practice!

Census data wrangling with tidy tools
Kyle Walker, Instructor

The tidyverse

Group-wise Census data analysis

    tx_largest <- tx_race %>%
      group_by(GEOID) %>%
      filter(estimate == max(estimate)) %>%
      select(NAME, variable, estimate)

    tx_largest

    # A tibble: 254 x 4
    # Groups:   GEOID [254]
       GEOID NAME                    variable estimate
       <chr> <chr>                   <chr>       <dbl>
     1 48001 Anderson County, Texas  White       34680
     2 48003 Andrews County, Texas   Hispanic     9360
     3 48005 Angelina County, Texas  White       54060
     4 48007 Aransas County, Texas   White       16836
     5 48009 Archer County, Texas    White        7751
     6 48011 Armstrong County, Texas White        1601
     7 48013 Atascosa County, Texas  Hispanic    30094
     8 48015 Austin County, Texas    White       18573
     9 48017 Bailey County, Texas    Hispanic     4401
    10 48019 Bandera County, Texas   White       16636
    # ... with 244 more rows
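
The `filter(estimate == max(estimate))` step can also be written with `slice_max()`, assuming dplyr 1.0.0 or later is installed. The sketch below uses hand-typed rows for two counties taken from the outputs above; the Andrews County HIPI row is left out because its value is not shown. `tx_sample` and `largest` are illustrative names.

```r
library(dplyr)

# Two counties copied from the slide outputs (Andrews HIPI row omitted)
tx_sample <- data.frame(
  GEOID = rep(c("48001", "48003"), times = c(6, 5)),
  NAME = rep(c("Anderson County, Texas", "Andrews County, Texas"),
             times = c(6, 5)),
  variable = c("White", "Black", "Native", "Asian", "HIPI", "Hispanic",
               "White", "Black", "Native", "Asian", "Hispanic"),
  estimate = c(34680, 12246, 206, 336, 8, 9799,
               7250, 256, 15, 36, 9360)
)

# Keep the row with the largest estimate in each county;
# like the filter() version, ties are kept by default
largest <- tx_sample %>%
  group_by(GEOID) %>%
  slice_max(estimate, n = 1) %>%
  ungroup()

largest
```

Calling `ungroup()` at the end drops the grouping, so a later `select()` no longer carries `GEOID` along automatically.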

Group-wise Census data analysis

    tx_largest %>%
      group_by(variable) %>%
      tally()

    # A tibble: 2 x 2
      variable     n
      <chr>    <int>
    1 Hispanic    67
    2 White      187

Recoding variables for group-wise analysis

    wa_grouped <- wa_income %>%
      filter(variable != "B19001_001") %>%
      mutate(incgroup = case_when(
        variable < "B19001_008" ~ "below35k",
        variable < "B19001_013" ~ "35kto75k",
        TRUE ~ "above75k"
      )) %>%
      group_by(NAME, incgroup) %>%
      summarize(group_est = sum(estimate))

    wa_grouped

    # A tibble: 117 x 3
       NAME                      incgroup group_est
       <chr>                     <chr>        <dbl>
     1 Adams County, Washington  35kto75k      2124
     2 Adams County, Washington  above75k      1482
     3 Adams County, Washington  below35k      2127
     4 Asotin County, Washington 35kto75k      3054
     5 Asotin County, Washington above75k      2533
     6 Asotin County, Washington below35k      3710
     7 Benton County, Washington 35kto75k     22106
     8 Benton County, Washington above75k     27525
     9 Benton County, Washington below35k     18787
    10 Benton County, Washington 35kto75k      9549
    # ... with 107 more rows
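
The `case_when()` recode above compares variable IDs as strings, which works because the numeric suffixes are zero-padded to the same width. A more explicit variant, sketched below, extracts the suffix as an integer first. It uses only the Adams County rows visible in the earlier output (`_002` through `_010`), so the `35kto75k` total here is partial; `adams` and `code` are illustrative names.

```r
library(dplyr)

# Adams County rows B19001_002 to B19001_010 copied from the slide output
adams <- data.frame(
  NAME = "Adams County, Washington",
  variable = sprintf("B19001_%03d", 2:10),
  estimate = c(400, 252, 373, 456, 396, 250, 342, 273, 283)
)

adams_grouped <- adams %>%
  mutate(
    # Numeric suffix of the variable ID, e.g. "B19001_008" -> 8
    code = as.integer(sub(".*_", "", variable)),
    incgroup = case_when(
      code < 8  ~ "below35k",
      code < 13 ~ "35kto75k",
      TRUE      ~ "above75k"
    )
  ) %>%
  group_by(NAME, incgroup) %>%
  summarize(group_est = sum(estimate), .groups = "drop")

adams_grouped
```

The `below35k` total matches the 2127 reported for Adams County above, and `.groups = "drop"` (dplyr 1.0.0 or later) returns an ungrouped result.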

Iterating through years with purrr

    mi_cities <- map_df(2012:2016, function(x) {
      get_acs(geography = "place",
              variables = c(totalpop = "B01003_001"),
              state = "MI",
              survey = "acs1",
              year = x) %>%
        mutate(year = x)
    })

    mi_cities %>% arrange(NAME, year)

    # A tibble: 80 x 6
       GEOID   NAME                     variable estimate   moe  year
       <chr>   <chr>                    <chr>       <dbl> <dbl> <int>
     1 2603000 Ann Arbor city, Michigan totalpop   116128    35  2012
     2 2603000 Ann Arbor city, Michigan totalpop   117034    43  2013
     3 2603000 Ann Arbor city, Michigan totalpop   117759    44  2014
     4 2603000 Ann Arbor city, Michigan totalpop   117070    33  2015
     5 2603000 Ann Arbor city, Michigan totalpop   120777    33  2016
     6 2621000 Dearborn city, Michigan  totalpop    96470    28  2012
     7 2621000 Dearborn city, Michigan  totalpop    95888    35  2013
     8 2621000 Dearborn city, Michigan  totalpop    95546    48  2014
     9 2621000 Dearborn city, Michigan  totalpop    95180    40  2015
    10 2621000 Dearborn city, Michigan  totalpop    94430    52  2016
    # ... with 70 more rows
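
Once the yearly results are stacked with a `year` column, change over time is a short group-wise calculation. The sketch below works on hand-typed values for the two cities shown above instead of a live API call; `mi_sample` and `mi_change` are illustrative names, and the calculation ignores margins of error.

```r
library(dplyr)

# Ann Arbor and Dearborn estimates copied from the slide output
mi_sample <- data.frame(
  NAME = rep(c("Ann Arbor city, Michigan", "Dearborn city, Michigan"),
             each = 5),
  year = rep(2012:2016, times = 2),
  estimate = c(116128, 117034, 117759, 117070, 120777,
               96470, 95888, 95546, 95180, 94430)
)

# Difference between the last and first year for each city
mi_change <- mi_sample %>%
  arrange(NAME, year) %>%
  group_by(NAME) %>%
  summarize(
    change = last(estimate) - first(estimate),
    pct_change = 100 * change / first(estimate)
  )

mi_change
```

Sorting by `NAME` and `year` before grouping ensures `first()` and `last()` pick up 2012 and 2016 respectively.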

Let's practice!

Working with margins of error in tidycensus
Kyle Walker, Instructor

ACS data vs. Census data

- Decennial Census provides official counts
- ACS provides population characteristics
- Learn more from the ACS handbook

Margins of error in the ACS

    get_acs(geography = "county",
            variables = c(median_age = "B01002_001"),
            state = "OR")

    # A tibble: 36 x 5
       GEOID NAME                     variable   estimate   moe
       <chr> <chr>                    <chr>         <dbl> <dbl>
     1 41001 Baker County, Oregon     median_age     48.2   0.4
     2 41003 Benton County, Oregon    median_age     32.6   0.3
     3 41005 Clackamas County, Oregon median_age     41.4   0.2
     4 41007 Clatsop County, Oregon   median_age     43.7   0.4
     5 41009 Columbia County, Oregon  median_age     43.3   0.4
     6 41011 Coos County, Oregon      median_age     48.2   0.3
     7 41013 Crook County, Oregon     median_age     48.3   0.7
     8 41015 Curry County, Oregon     median_age     55.1   0.4
     9 41017 Deschutes County, Oregon median_age     42     0.3
    10 41019 Douglas County, Oregon   median_age     47     0.3
    # ... with 26 more rows
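
The `moe` column defines a range around each estimate; by default the ACS margins returned here correspond to a 90 percent confidence level. As a minimal sketch, the bounds can be computed directly from the two columns. The data frame below holds three hand-typed rows from the output above, and `or_sample`, `lower`, and `upper` are illustrative names.

```r
library(dplyr)

# Three Oregon counties copied from the slide output
or_sample <- data.frame(
  NAME = c("Baker County, Oregon", "Benton County, Oregon",
           "Curry County, Oregon"),
  estimate = c(48.2, 32.6, 55.1),
  moe = c(0.4, 0.3, 0.4)
)

# Lower and upper bounds of the range implied by the margin of error
or_bounds <- or_sample %>%
  mutate(lower = estimate - moe,
         upper = estimate + moe)

or_bounds
```

For Baker County this gives a median age somewhere between roughly 47.8 and 48.6.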

Inspecting margins of error

    vt_eldpov <- get_acs(geography = "tract",
                         variables = c(eldpovm = "B17001_016",
                                       eldpovf = "B17001_030"),
                         state = "VT")

    vt_eldpov

    # A tibble: 368 x 5
       GEOID       NAME                 variable estimate   moe
       <chr>       <chr>                <chr>       <dbl> <dbl>
     1 50001960100 Census Tract 9601... eldpovm        0.    9.
     2 50001960100 Census Tract 9601... eldpovf        5.    5.
     3 50001960200 Census Tract 9602... eldpovm        0.    9.
     4 50001960200 Census Tract 9602... eldpovf        0.    9.
     5 50001960300 Census Tract 9603... eldpovm       16.   14.
     6 50001960300 Census Tract 9603... eldpovf        5.    7.
     7 50001960400 Census Tract 9604... eldpovm       11.    7.
     8 50001960400 Census Tract 9604... eldpovf       18.    9.
     9 50001960500 Census Tract 9605... eldpovm        0.    9.
    10 50001960500 Census Tract 9605... eldpovf        0.    9.
    # ... with 358 more rows
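
With small geographies such as tracts, the margin of error often exceeds the estimate itself. A simple way to inspect this, sketched below on the ten hand-typed rows shown above (no API call; `vt_sample` and `flagged` are illustrative names), is to filter for rows where `moe` is larger than `estimate`.

```r
library(dplyr)

# First ten rows copied from the slide output
vt_sample <- data.frame(
  GEOID = rep(c("50001960100", "50001960200", "50001960300",
                "50001960400", "50001960500"), each = 2),
  variable = rep(c("eldpovm", "eldpovf"), times = 5),
  estimate = c(0, 5, 0, 0, 16, 5, 11, 18, 0, 0),
  moe = c(9, 5, 9, 9, 14, 7, 7, 9, 9, 9)
)

# Rows whose margin of error is larger than the estimate
flagged <- vt_sample %>%
  filter(moe > estimate)

nrow(flagged)
flagged
```

In this sample most rows are flagged, which signals that the tract-level estimates for these variables should be treated with caution.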