Working with tables and summary variables in tidycensus

  1. DataCamp Analyzing US Census Data in R: Working with tables and summary variables in tidycensus. Kyle Walker, Instructor

  2. Tables in the ACS

          library(tidycensus)
          library(tidyverse)

          wa_income <- get_acs(geography = "county",
                               state = "WA",
                               table = "B19001")

          # A tibble: 663 x 5
             GEOID NAME                     variable   estimate   moe
             <chr> <chr>                    <chr>         <dbl> <dbl>
           1 53001 Adams County, Washington B19001_001     5733   124
           2 53001 Adams County, Washington B19001_002      400   100
           3 53001 Adams County, Washington B19001_003      252    87
           4 53001 Adams County, Washington B19001_004      373   126
           5 53001 Adams County, Washington B19001_005      456   133
           6 53001 Adams County, Washington B19001_006      396   103
           7 53001 Adams County, Washington B19001_007      250   105
           8 53001 Adams County, Washington B19001_008      342    82
           9 53001 Adams County, Washington B19001_009      273   107
          10 53001 Adams County, Washington B19001_010      283   112
          # ... with 653 more rows
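The 663 rows follow from the long layout, with one row per county per table variable. A minimal sketch of that arithmetic, assuming Washington's 39 counties and the 17 variables of table B19001 (B19001_001 through B19001_017); the grid is built by hand for illustration and no Census API call is made.

```r
# Hypothetical stand-in for the shape of wa_income; nothing is downloaded.
n_counties <- 39                        # counties in Washington
vars <- sprintf("B19001_%03d", 1:17)    # "B19001_001" ... "B19001_017"

# One row per county/variable combination, as in the long output above.
long_shape <- expand.grid(county = seq_len(n_counties),
                          variable = vars,
                          stringsAsFactors = FALSE)

nrow(long_shape)   # 39 * 17 rows
```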

  3. Summary variables in tidycensus

          race_vars <- c(White = "B03002_003", Black = "B03002_004",
                         Native = "B03002_005", Asian = "B03002_006",
                         HIPI = "B03002_007", Hispanic = "B03002_012")

          tx_race <- get_acs(geography = "county",
                             state = "TX",
                             variables = race_vars,
                             summary_var = "B03002_001")

          tx_race

          # A tibble: 1,524 x 7
             GEOID NAME                   variable estimate   moe summary_est summary_moe
             <chr> <chr>                  <chr>       <dbl> <dbl>       <dbl>       <dbl>
           1 48001 Anderson County, Texas White       34680     5       57772          NA
           2 48001 Anderson County, Texas Black       12246   146       57772          NA
           3 48001 Anderson County, Texas Native        206    58       57772          NA
           4 48001 Anderson County, Texas Asian         336    71       57772          NA
           5 48001 Anderson County, Texas HIPI            8    14       57772          NA
           6 48001 Anderson County, Texas Hispanic     9799    NA       57772          NA
           7 48003 Andrews County, Texas  White        7250    20       17215          NA
           8 48003 Andrews County, Texas  Black         256   154       17215          NA
           9 48003 Andrews County, Texas  Native         15    25       17215          NA
          10 48003 Andrews County, Texas  Asian          36    62       17215          NA
          # ... with 1,514 more rows
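The summary_var argument attaches a denominator to every row. A rough by-hand equivalent is sketched here on a few rows copied from the output above; the join only illustrates the resulting shape and is not a description of how tidycensus builds it internally. The table names `race` and `totals` are made up for the example.

```r
library(dplyr)

# Rows copied from the tx_race output above (estimate column only).
race <- tribble(
  ~GEOID,  ~variable, ~estimate,
  "48001", "White",   34680,
  "48001", "Black",   12246,
  "48003", "White",    7250,
  "48003", "Black",     256
)

# Total population (B03002_001) per county, as shown in summary_est.
totals <- tribble(
  ~GEOID,  ~summary_est,
  "48001", 57772,
  "48003", 17215
)

# Every race row picks up the total for its own county.
race_with_total <- left_join(race, totals, by = "GEOID")
race_with_total
```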

  4. Calculating percentages

          tx_race_pct <- tx_race %>%
            mutate(pct = 100 * (estimate / summary_est)) %>%
            select(NAME, variable, pct)

          tx_race_pct

          # A tibble: 1,524 x 3
             NAME                   variable     pct
             <chr>                  <chr>      <dbl>
           1 Anderson County, Texas White       60.0
           2 Anderson County, Texas Black       21.2
           3 Anderson County, Texas Native     0.357
           4 Anderson County, Texas Asian      0.582
           5 Anderson County, Texas HIPI      0.0138
           6 Anderson County, Texas Hispanic    17.0
           7 Andrews County, Texas  White       42.1
           8 Andrews County, Texas  Black       1.49
           9 Andrews County, Texas  Native    0.0871
          10 Andrews County, Texas  Asian      0.209
          # ... with 1,514 more rows
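The percentage step can be checked offline. A small sketch using three Anderson County rows copied from the tx_race output, with no API call; the table name `anderson` is made up for the example.

```r
library(dplyr)

# Estimates and county total copied from the tx_race output.
anderson <- tribble(
  ~variable,  ~estimate, ~summary_est,
  "White",    34680,     57772,
  "Black",    12246,     57772,
  "Hispanic",  9799,     57772
)

# Same expression as the slide: share of the county total, in percent.
anderson_pct <- anderson %>%
  mutate(pct = 100 * (estimate / summary_est))

round(anderson_pct$pct, 1)   # 60.0, 21.2, 17.0, as in the slide output
```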

  5. Let's practice!

  6. Census data wrangling with tidy tools. Kyle Walker, Instructor

  7. The tidyverse

  8. Group-wise Census data analysis

          tx_largest <- tx_race %>%
            group_by(GEOID) %>%
            filter(estimate == max(estimate)) %>%
            select(NAME, variable, estimate)

          tx_largest

          # A tibble: 254 x 4
          # Groups:   GEOID [254]
             GEOID NAME                    variable estimate
             <chr> <chr>                   <chr>       <dbl>
           1 48001 Anderson County, Texas  White       34680
           2 48003 Andrews County, Texas   Hispanic     9360
           3 48005 Angelina County, Texas  White       54060
           4 48007 Aransas County, Texas   White       16836
           5 48009 Archer County, Texas    White        7751
           6 48011 Armstrong County, Texas White        1601
           7 48013 Atascosa County, Texas  Hispanic    30094
           8 48015 Austin County, Texas    White       18573
           9 48017 Bailey County, Texas    Hispanic     4401
          10 48019 Bandera County, Texas   White       16636
          # ... with 244 more rows
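The group-wise filter can be tried on a hand-built table. A minimal sketch using estimates copied from the slides for two counties (only three groups per county are included, and `two_counties` is a made-up name); it also shows why the output above has four columns although select() names only three.

```r
library(dplyr)

# Estimates copied from the slides for two Texas counties.
two_counties <- tribble(
  ~GEOID,  ~NAME,                    ~variable,  ~estimate,
  "48001", "Anderson County, Texas", "White",    34680,
  "48001", "Anderson County, Texas", "Black",    12246,
  "48001", "Anderson County, Texas", "Hispanic",  9799,
  "48003", "Andrews County, Texas",  "White",     7250,
  "48003", "Andrews County, Texas",  "Black",      256,
  "48003", "Andrews County, Texas",  "Hispanic",  9360
)

largest <- two_counties %>%
  group_by(GEOID) %>%
  filter(estimate == max(estimate)) %>%  # max() is computed within each county
  select(NAME, variable, estimate)       # GEOID is kept: it is the grouping column

largest
```

If two groups tie for the maximum in a county, this filter keeps both rows.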

  9. Group-wise Census data analysis

          tx_largest %>%
            group_by(variable) %>%
            tally()

          # A tibble: 2 x 2
            variable     n
            <chr>    <int>
          1 Hispanic    67
          2 White      187
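The tally step counts rows per group. A short sketch on five rows copied from the tx_largest output above; the table name `largest` is made up for the example.

```r
library(dplyr)

# Largest group per county, from the first rows of the tx_largest output.
largest <- tribble(
  ~NAME,                    ~variable,
  "Anderson County, Texas", "White",
  "Andrews County, Texas",  "Hispanic",
  "Angelina County, Texas", "White",
  "Aransas County, Texas",  "White",
  "Atascosa County, Texas", "Hispanic"
)

# Number of counties in which each group is the largest.
counts <- largest %>%
  group_by(variable) %>%
  tally()

counts

# count() is a shorthand that gives the same per-group counts.
count(largest, variable)
```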

  10. Recoding variables for group-wise analysis

          wa_grouped <- wa_income %>%
            filter(variable != "B19001_001") %>%
            mutate(incgroup = case_when(
              variable < "B19001_008" ~ "below35k",
              variable < "B19001_013" ~ "35kto75k",
              TRUE ~ "above75k")) %>%
            group_by(NAME, incgroup) %>%
            summarize(group_est = sum(estimate))

          wa_grouped

          # A tibble: 117 x 3
             NAME                      incgroup group_est
             <chr>                     <chr>        <dbl>
           1 Adams County, Washington  35kto75k      2124
           2 Adams County, Washington  above75k      1482
           3 Adams County, Washington  below35k      2127
           4 Asotin County, Washington 35kto75k      3054
           5 Asotin County, Washington above75k      2533
           6 Asotin County, Washington below35k      3710
           7 Benton County, Washington 35kto75k     22106
           8 Benton County, Washington above75k     27525
           9 Benton County, Washington below35k     18787
          10 Chelan County, Washington 35kto75k      9549
          # ... with 107 more rows
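The recode compares variable codes as strings. That works here because the codes share a prefix and a zero-padded suffix, so alphabetical order matches numeric order, and case_when() uses the first condition that is true. A sketch on the nine Adams County rows shown earlier (B19001_002 to B19001_010), so the middle group is only partial; `adams` is a made-up name.

```r
library(dplyr)

# Adams County rows B19001_002 to B19001_010, copied from the wa_income output.
adams <- tibble(
  variable = sprintf("B19001_%03d", 2:10),
  estimate = c(400, 252, 373, 456, 396, 250, 342, 273, 283)
)

adams_grouped <- adams %>%
  mutate(incgroup = case_when(
    variable < "B19001_008" ~ "below35k",  # codes 002 to 007
    variable < "B19001_013" ~ "35kto75k",  # codes 008 to 012 (only 008 to 010 here)
    TRUE ~ "above75k")) %>%
  group_by(incgroup) %>%
  summarize(group_est = sum(estimate))

adams_grouped
```

The below35k total comes to 2127, matching the Adams County row in the output above.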

  11. Iterating through years with purrr

          mi_cities <- map_df(2012:2016, function(x) {
            get_acs(geography = "place",
                    variables = c(totalpop = "B01003_001"),
                    state = "MI",
                    survey = "acs1",
                    year = x) %>%
              mutate(year = x)
          })

          mi_cities %>% arrange(NAME, year)

          # A tibble: 80 x 6
             GEOID   NAME                     variable estimate   moe  year
             <chr>   <chr>                    <chr>       <dbl> <dbl> <int>
           1 2603000 Ann Arbor city, Michigan totalpop   116128    35  2012
           2 2603000 Ann Arbor city, Michigan totalpop   117034    43  2013
           3 2603000 Ann Arbor city, Michigan totalpop   117759    44  2014
           4 2603000 Ann Arbor city, Michigan totalpop   117070    33  2015
           5 2603000 Ann Arbor city, Michigan totalpop   120777    33  2016
           6 2621000 Dearborn city, Michigan  totalpop    96470    28  2012
           7 2621000 Dearborn city, Michigan  totalpop    95888    35  2013
           8 2621000 Dearborn city, Michigan  totalpop    95546    48  2014
           9 2621000 Dearborn city, Michigan  totalpop    95180    40  2015
          10 2621000 Dearborn city, Michigan  totalpop    94430    52  2016
          # ... with 70 more rows
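The iteration pattern is: call a function once per year, tag each result with its year, and row-bind the pieces. A sketch that runs offline, with a hypothetical `fake_get_acs()` standing in for get_acs() and returning the Ann Arbor estimates shown above.

```r
library(dplyr)
library(purrr)

# Ann Arbor estimates copied from the output above, keyed by year.
ann_arbor_pop <- c(`2012` = 116128, `2013` = 117034, `2014` = 117759,
                   `2015` = 117070, `2016` = 120777)

# Hypothetical stand-in for get_acs(): one row for the requested year.
fake_get_acs <- function(year) {
  tibble(NAME = "Ann Arbor city, Michigan",
         variable = "totalpop",
         estimate = ann_arbor_pop[[as.character(year)]])
}

# Same pattern as the slide: one call per year, add the year, bind the rows.
by_year <- map_df(2012:2016, function(x) {
  fake_get_acs(x) %>% mutate(year = x)
})

by_year
```

Recent purrr releases mark map_df() as superseded in favour of map() followed by list_rbind(); the slide's form still runs.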

  12. Let's practice!

  13. Working with margins of error in tidycensus. Kyle Walker, Instructor

  14. ACS data vs. Census data: the decennial Census provides official counts; the ACS provides population characteristics. Learn more from the ACS handbook.

  15. Margins of error in the ACS

          get_acs(geography = "county",
                  variables = c(median_age = "B01002_001"),
                  state = "OR")

          # A tibble: 36 x 5
             GEOID NAME                     variable   estimate   moe
             <chr> <chr>                    <chr>         <dbl> <dbl>
           1 41001 Baker County, Oregon     median_age     48.2   0.4
           2 41003 Benton County, Oregon    median_age     32.6   0.3
           3 41005 Clackamas County, Oregon median_age     41.4   0.2
           4 41007 Clatsop County, Oregon   median_age     43.7   0.4
           5 41009 Columbia County, Oregon  median_age     43.3   0.4
           6 41011 Coos County, Oregon      median_age     48.2   0.3
           7 41013 Crook County, Oregon     median_age     48.3   0.7
           8 41015 Curry County, Oregon     median_age     55.1   0.4
           9 41017 Deschutes County, Oregon median_age       42   0.3
          10 41019 Douglas County, Oregon   median_age       47   0.3
          # ... with 26 more rows
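The moe column gives the margin of error around each estimate, which the ACS publishes at a 90 percent confidence level by default. A minimal sketch of turning it into a range, using three rows copied from the output above; the column names `lower` and `upper` are made up for the example.

```r
library(dplyr)

# Rows copied from the Oregon median age output.
or_age <- tribble(
  ~NAME,                   ~estimate, ~moe,
  "Baker County, Oregon",  48.2,      0.4,
  "Benton County, Oregon", 32.6,      0.3,
  "Crook County, Oregon",  48.3,      0.7
)

# Range implied by the margin of error: estimate minus/plus moe.
or_age_bounds <- or_age %>%
  mutate(lower = estimate - moe,
         upper = estimate + moe)

or_age_bounds
```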

  16. Inspecting margins of error

          vt_eldpov <- get_acs(geography = "tract",
                               variables = c(eldpovm = "B17001_016",
                                             eldpovf = "B17001_030"),
                               state = "VT")

          vt_eldpov

          # A tibble: 368 x 5
             GEOID       NAME                 variable estimate   moe
             <chr>       <chr>                <chr>       <dbl> <dbl>
           1 50001960100 Census Tract 9601... eldpovm        0.    9.
           2 50001960100 Census Tract 9601... eldpovf        5.    5.
           3 50001960200 Census Tract 9602... eldpovm        0.    9.
           4 50001960200 Census Tract 9602... eldpovf        0.    9.
           5 50001960300 Census Tract 9603... eldpovm       16.   14.
           6 50001960300 Census Tract 9603... eldpovf        5.    7.
           7 50001960400 Census Tract 9604... eldpovm       11.    7.
           8 50001960400 Census Tract 9604... eldpovf       18.    9.
           9 50001960500 Census Tract 9605... eldpovm        0.    9.
          10 50001960500 Census Tract 9605... eldpovf        0.    9.
          # ... with 358 more rows
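At tract level the margins of error are often as large as the estimates themselves. One way to inspect this is to flag rows where the margin exceeds the estimate. A sketch on five rows copied from the output above; the names `vt` and `moe_exceeds_est` are made up for the example, and this flag is only one possible check.

```r
library(dplyr)

# Rows copied from the vt_eldpov output.
vt <- tribble(
  ~GEOID,        ~variable, ~estimate, ~moe,
  "50001960100", "eldpovm",  0,         9,
  "50001960100", "eldpovf",  5,         5,
  "50001960300", "eldpovm", 16,        14,
  "50001960300", "eldpovf",  5,         7,
  "50001960400", "eldpovf", 18,         9
)

# TRUE where the margin of error is larger than the estimate itself.
vt_flagged <- vt %>%
  mutate(moe_exceeds_est = moe > estimate)

vt_flagged
```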
