
Show the Right Numbers



  1. Show the Right Numbers

  2. ggplot’s FLOW OF ACTION

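  3. [Diagram of ggplot's flow of action, from bottom to top: Data, Aesthetic Mappings, Geoms and/or Stats, Coordinates and Scales, Guides, Themes. We always have to specify the first three (data, aesthetic mappings, and geoms and/or stats) to draw a plot. Coordinates and scales, guides, and themes will be handled automatically unless we say otherwise.]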

  4. Grouped Data and the group aesthetic

  5. library(ggplot2)
     library(gapminder)

     p <- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap))
     p + geom_line(color = "gray70", mapping = aes(group = country)) +
         geom_smooth(linewidth = 1.1, method = "loess", se = FALSE) +  # ggplot2 < 3.4.0 uses size = 1.1
         scale_y_log10(labels = scales::dollar) +
         facet_wrap(~ continent, ncol = 5) +
         labs(x = "Year", y = "GDP per capita",
              title = "GDP per capita on Five Continents")

     The labs() function lets you name the axis labels, title, subtitle, caption, and so on.
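To see what the group aesthetic does, here is a minimal sketch that uses ggplot2's layer_data() to inspect the data ggplot actually draws. The three-country panel is hypothetical toy data standing in for gapminder, so the example runs without extra packages.

```r
library(ggplot2)

# Hypothetical toy panel data: three countries, each observed in three years
panel <- data.frame(
  country   = rep(c("A", "B", "C"), each = 3),
  year      = rep(c(2000, 2005, 2010), times = 3),
  gdpPercap = c(1000, 1500, 2100, 800, 900, 1200, 5000, 5200, 6100)
)

p <- ggplot(data = panel, mapping = aes(x = year, y = gdpPercap))

# No group mapping: all nine rows form a single group,
# so geom_line() joins every point into one line
ungrouped <- layer_data(p + geom_line())

# group = country: one line per country
grouped <- layer_data(p + geom_line(mapping = aes(group = country)))

length(unique(ungrouped$group))  # 1
length(unique(grouped$group))    # 3
```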

  6. geoms CAN TRANSFORM DATA

  7. gss_sm: a subset of General Social Survey questions from 2016

  8. library(socviz)  # provides the gss_sm data

     with(gss_sm, table(religion))
     ##
     ## Protestant   Catholic     Jewish       None      Other
     ##       1371        649         51        619        159

  9. p <- ggplot(data = gss_sm, mapping = aes(x = bigregion))
     p + geom_bar()

     Just the one aesthetic mapping, to x.

  10. The y-axis variable, count, is not in the data. Instead, ggplot has calculated it for us, using stat_count(), the default stat_ function associated with geom_bar(). This function computes two new variables, count and prop (short for proportion). geom_bar() plots count by default.
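A minimal sketch of those computed variables, assuming a hypothetical six-row toy table in place of gss_sm. layer_data() returns the data frame ggplot builds for a layer, so the count and prop columns added by stat_count() can be inspected directly.

```r
library(ggplot2)

# Hypothetical toy stand-in for gss_sm: one row per respondent
toy <- data.frame(
  bigregion = c("Northeast", "Midwest", "Midwest", "South", "South", "South")
)

p <- ggplot(data = toy, mapping = aes(x = bigregion))

# layer_data() returns the data frame ggplot draws for the layer,
# including the count and prop columns computed by stat_count()
built <- layer_data(p + geom_bar())
built[, c("x", "count", "prop")]
```

Every prop comes out as 1 here, because each bar counts as its own group; that is the problem slide 11 runs into.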

  11. p <- ggplot(data = gss_sm, mapping = aes(x = bigregion))
      p + geom_bar(mapping = aes(y = after_stat(prop)))  # older code writes this as ..prop..

  12. p <- ggplot(data = gss_sm, mapping = aes(x = bigregion))
      p + geom_bar(mapping = aes(y = after_stat(prop), group = 1))
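A hypothetical toy table shows what group = 1 changes: with every bar in one group, stat_count() divides each count by the total, so the proportions sum to 1. This sketch assumes ggplot2 3.3.0 or later for after_stat().

```r
library(ggplot2)

# Hypothetical toy stand-in for gss_sm
toy <- data.frame(
  bigregion = c("Northeast", "Midwest", "Midwest", "South", "South", "South")
)

p <- ggplot(data = toy, mapping = aes(x = bigregion))

# group = 1 puts all the bars in one group, so prop is each
# region's count divided by the total number of rows
built <- layer_data(p + geom_bar(mapping = aes(y = after_stat(prop), group = 1)))
built$y        # the plotted proportions: 2/6, 1/6, 3/6 (Midwest, Northeast, South)
sum(built$y)   # 1
```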

  13. p + geom_bar()
      p + stat_count()

      geom_ functions call their default stat_ functions behind the scenes, and vice versa, so these two lines draw the same chart.
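One way to check the equivalence, again with a hypothetical toy table: compare the layer data that geom_bar() and stat_count() produce.

```r
library(ggplot2)

# Hypothetical toy stand-in for gss_sm
toy <- data.frame(
  bigregion = c("Northeast", "Midwest", "Midwest", "South", "South", "South")
)
p <- ggplot(data = toy, mapping = aes(x = bigregion))

via_geom <- layer_data(p + geom_bar())    # geom_bar() uses stat = "count"
via_stat <- layer_data(p + stat_count())  # stat_count() uses geom = "bar"

identical(via_geom$count, via_stat$count)  # TRUE
identical(via_geom$ymax, via_stat$ymax)    # TRUE: same bar heights
```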

  14. p <- ggplot(data = gss_sm, mapping = aes(x = religion))
      p + geom_bar()

      p <- ggplot(data = gss_sm, mapping = aes(x = religion, color = religion))
      p + geom_bar()

      p <- ggplot(data = gss_sm, mapping = aes(x = religion, fill = religion))
      p + geom_bar()

      p <- ggplot(data = gss_sm, mapping = aes(x = religion, fill = religion))
      p + geom_bar() + guides(fill = "none")  # older code writes guides(fill = FALSE)

  15. p <- ggplot(data = gss_sm, mapping = aes(x = religion, color = religion))
      p + geom_bar()

      p <- ggplot(data = gss_sm, mapping = aes(x = religion, fill = religion))
      p + geom_bar() + guides(fill = "none")
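A minimal sketch of the difference between the color and fill mappings, using layer_data() on a hypothetical five-respondent toy table rather than gss_sm. guides(fill = "none") only hides the legend, so it does not change these columns.

```r
library(ggplot2)

# Hypothetical toy stand-in for gss_sm$religion
toy <- data.frame(
  religion = c("Protestant", "Protestant", "Catholic", "None", "Jewish")
)

# color = religion: only the bar outlines vary
outlined <- layer_data(ggplot(toy, aes(x = religion, color = religion)) + geom_bar())
# fill = religion: the bar interiors vary
filled <- layer_data(ggplot(toy, aes(x = religion, fill = religion)) + geom_bar())

length(unique(outlined$fill))    # 1: every bar keeps the default fill
length(unique(outlined$colour))  # 4: one outline color per religion
length(unique(filled$fill))      # 4: one fill color per religion
```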

  16. HISTOGRAMS & KERNEL DENSITIES

  17. midwest: county-level census data for Midwestern states

  18. p <- ggplot(data = midwest, mapping = aes(x = area))
      p + geom_histogram()
      ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

      The default stat for this geom has to make a choice, and is letting us know we might want to override it.

  19. p <- ggplot(data = midwest, mapping = aes(x = area))
      p + geom_histogram(bins = 10)
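The message suggests binwidth as the alternative to bins. A minimal sketch using the midwest data that ships with ggplot2; the width of 0.01 (in the units of area) is an illustrative choice, not a recommendation.

```r
library(ggplot2)  # midwest ships with ggplot2

p <- ggplot(data = midwest, mapping = aes(x = area))

# binwidth is given in the units of x; 0.01 is an illustrative choice
by_width <- layer_data(p + geom_histogram(binwidth = 0.01))

nrow(by_width)                        # number of bins this width produces
sum(by_width$count) == nrow(midwest)  # TRUE: every county falls in one bin
```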

  20. oh_wi <- c("OH", "WI")
      # subset() subsets our data on the fly; %in% is a convenient, built-in operator
      p <- ggplot(data = subset(midwest, state %in% oh_wi),
                  mapping = aes(x = percollege, fill = state))
      # position = "identity": just plot x by its values on the scale; don't stack or dodge
      p + geom_histogram(position = "identity", alpha = 0.4, bins = 20)
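A sketch of what position = "identity" changes, comparing its layer data with the default "stack" position. It uses the midwest data from ggplot2 and the same subset as the slide.

```r
library(ggplot2)

oh_wi <- c("OH", "WI")
oh_wi_data <- subset(midwest, state %in% oh_wi)
unique(oh_wi_data$state)  # "OH" "WI"

p <- ggplot(data = oh_wi_data, mapping = aes(x = percollege, fill = state))

overlaid <- layer_data(p + geom_histogram(position = "identity", alpha = 0.4, bins = 20))
stacked  <- layer_data(p + geom_histogram(bins = 20))  # default position: "stack"

all(overlaid$ymin == 0)  # TRUE: every bar starts at zero, so the two states overlap
any(stacked$ymin > 0)    # TRUE: some bars sit on top of the other state's bars
```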

  21. p <- ggplot(data = midwest, mapping = aes(x = area))
      p + geom_density()

      geom_density() is the continuous counterpart of geom_histogram().

  22. p <- ggplot(data = midwest, mapping = aes(x = area, fill = state, color = state))
      p + geom_density(alpha = 0.3)
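A minimal sketch of how the fill and color mappings split the density estimate, again inspecting the layer with layer_data(). The 512-point grid is stat_density()'s default n.

```r
library(ggplot2)

p <- ggplot(data = midwest, mapping = aes(x = area, fill = state, color = state))
dens <- layer_data(p + geom_density(alpha = 0.3))

# Mapping fill and color to state gives one group per state,
# and stat_density() estimates a separate curve for each
length(unique(dens$group))  # 5: IL, IN, MI, OH, WI

# Each curve is evaluated on a grid of n = 512 points by default
table(dens$group)
```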

  23. AVOIDING TRANSFORMATIONS WHEN NECESSARY

  24. titanic
      ##       fate    sex    n percent
      ## 1 perished   male 1364    62.0
      ## 2 perished female  126     5.7
      ## 3 survived   male  367    16.7
      ## 4 survived female  344    15.6

      No counting up required? Then use stat = "identity".

  25. p <- ggplot(data = titanic, mapping = aes(x = fate, y = percent, fill = sex))
      p + geom_bar(stat = "identity", position = "dodge") +
          theme(legend.position = "top")

      The theme() function controls parts of the plot that don’t belong to its “grammatical” structure.

  26. p <- ggplot(data = titanic, mapping = aes(x = fate, y = percent, fill = sex))
      p + geom_col(position = "dodge") +
          theme(legend.position = "top")

      Even better: when there is nothing to count up, just use geom_col() for convenience.
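A sketch confirming that geom_col() and geom_bar(stat = "identity") draw the same bar heights. The data frame re-enters the four rows of the titanic table from slide 24, using the sex column name that the plotting code expects.

```r
library(ggplot2)

# The four rows of the titanic summary table from slide 24
titanic <- data.frame(
  fate    = c("perished", "perished", "survived", "survived"),
  sex     = c("male", "female", "male", "female"),
  n       = c(1364, 126, 367, 344),
  percent = c(62.0, 5.7, 16.7, 15.6)
)

p <- ggplot(data = titanic, mapping = aes(x = fate, y = percent, fill = sex))
with_bar <- layer_data(p + geom_bar(stat = "identity", position = "dodge"))
with_col <- layer_data(p + geom_col(position = "dodge"))

# Both plot the percent column as-is: nothing gets counted
all.equal(with_bar$y, with_col$y)  # TRUE
sort(with_col$y)                   # 5.7 15.6 16.7 62.0
```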

  27. oecd_sum
      ## # A tibble: 57 x 5
      ## # Groups:   year [57]
      ##     year other   usa  diff hi_lo
      ##    <int> <dbl> <dbl> <dbl> <chr>
      ##  1  1960  68.6  69.9 1.30  Below
      ##  2  1961  69.2  70.4 1.20  Below
      ##  3  1962  68.9  70.2 1.30  Below
      ##  4  1963  69.1  70.0 0.900 Below
      ##  5  1964  69.5  70.3 0.800 Below
      ##  6  1965  69.6  70.3 0.700 Below
      ##  7  1966  69.9  70.3 0.400 Below
      ##  8  1967  70.1  70.7 0.600 Below
      ##  9  1968  70.1  70.4 0.300 Below
      ## 10  1969  70.1  70.6 0.500 Below
      ## # ... with 47 more rows

  28. p <- ggplot(data = oecd_sum, mapping = aes(x = year, y = diff, fill = hi_lo))
      p + geom_col() + guides(fill = "none") +
          labs(x = NULL, y = "Difference in Years",
               title = "The US Life Expectancy Gap",
               subtitle = "Difference between US and OECD average life expectancies, 1960-2015",
               caption = "Data: OECD. After a chart by Christopher Ingraham, Washington Post, December 27th 2017.")

  29. CROSSTABULATION THE AWKWARD WAY

  30. WARNING! There’s nothing wrong with the code on the next few slides. If you search online for how to make a proportional bar chart with ggplot, you’ll see answers like this. But doing it this way is confusing, and I find it much easier to work a slightly different way. So I won’t cover this approach in class. I’m including it here so you can see why it’s awkward.

  31. p <- ggplot(data = gss_sm, mapping = aes(x = religion, color = religion))
      p + geom_bar()

      p <- ggplot(data = gss_sm, mapping = aes(x = religion, fill = religion))
      p + geom_bar() + guides(fill = "none")

  32. p <- ggplot(data = gss_sm, mapping = aes(x = bigregion, fill = religion))
      p + geom_bar()

      Counts are easy.

  33. Position adjustments don’t give us the view we want:

      p <- ggplot(data = gss_sm, mapping = aes(x = bigregion, fill = religion))
      p + geom_bar(position = "fill")
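A minimal sketch of what position = "fill" computes, with a hypothetical toy table in place of gss_sm: each region's stack is rescaled to run from 0 to 1, so the bars show the religious mix within each region but no longer any counts.

```r
library(ggplot2)

# Hypothetical toy stand-in for gss_sm
toy <- data.frame(
  bigregion = c("South", "South", "South", "West", "West"),
  religion  = c("Protestant", "Protestant", "Catholic", "None", "Catholic")
)

p <- ggplot(data = toy, mapping = aes(x = bigregion, fill = religion))
filled <- layer_data(p + geom_bar(position = "fill"))

# Each region's stack is rescaled to run from 0 to 1:
# the top segment of every bar ends at ymax = 1
filled[, c("fill", "count", "ymin", "ymax")]
```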

  34. p <- ggplot(data = gss_sm, mapping = aes(x = bigregion, fill = religion))
      p + geom_bar(position = "dodge", mapping = aes(y = after_stat(prop)))

  35. Nope

  36. p <- ggplot(data = gss_sm, mapping = aes(x = bigregion, fill = religion))
      p + geom_bar(position = "dodge", mapping = aes(y = after_stat(prop), group = religion))

  37. Still not right! Also: hard to read

  38. Time to take a step back:

      p <- ggplot(data = gss_sm, mapping = aes(x = religion))
      p + geom_bar(position = "dodge", mapping = aes(y = after_stat(prop), group = bigregion)) +
          facet_wrap(~ bigregion, ncol = 2)

  39. SURELY THINGS CAN BE EASIER THAN THIS?

  40. TRANSFORM AND SUMMARIZE FIRST, THEN SEND CLEAN TABLES TO ggplot

  41. CROSSTABULATION

  42. Row percents, with row marginals:

                  Protestant  Catholic  Jewish  None  Other    NA  Total
      Northeast         32.4      33.2     5.5  23.0    5.7   0.2    100
      Midwest           46.8      24.7     0.4  22.6    4.7   0.7    100
      South             61.8      15.2     1.0  16.2    4.8   1.0    100
      West              37.7      24.5     1.6  28.5    7.6   0.2    100

      Column percents, with column marginals:

                  Protestant  Catholic  Jewish  None  Other    NA
      Northeast         11.5      25.0    52.9  18.1   17.6   5.6
      Midwest           23.7      26.5     5.9  25.4   20.8  27.8
      South             47.4      24.7    21.6  27.5   31.4  61.1
      West              17.4      23.9    19.6  29.1   30.2   5.6
      Total              100       100     100   100    100   100

      Total percents:

                  Protestant  Catholic  Jewish  None  Other
      Northeast          5.5       5.7     0.9   3.9    1.0
      Midwest           11.3       6.0     0.1   5.5    1.2
      South             22.7       5.6     0.4   5.9    1.7
      West               8.3       5.4     0.3   6.3    1.7
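In base R, the three kinds of percentage table above come from prop.table() with different margins. A minimal sketch on a hypothetical eight-respondent toy table (not the gss_sm figures shown):

```r
# Hypothetical toy data: region and religion for eight respondents
toy <- data.frame(
  bigregion = c("South", "South", "South", "West", "West", "Midwest", "Midwest", "Northeast"),
  religion  = c("Protestant", "Protestant", "Catholic", "None", "Catholic",
                "Protestant", "None", "Catholic")
)
counts <- table(toy$bigregion, toy$religion)

row_pct   <- 100 * prop.table(counts, margin = 1)  # each row sums to 100
col_pct   <- 100 * prop.table(counts, margin = 2)  # each column sums to 100
total_pct <- 100 * prop.table(counts)              # all cells together sum to 100

round(row_pct, 1)
```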

  43. dplyr lets you manipulate tables in a series of steps, or pipeline

  44. dplyr draws on the logic of database queries, where the focus is managing and summarizing tables

  45. group_by(): Group the data at the level we want, such as "Religion by Region" or "Authors by Publications by Year".
      filter() rows, select() columns: Filter or select pieces of the data. This gets us the subset of the table we want to work on.
      mutate(): Mutate the data by creating new variables at the current level of grouping. Mutating adds new columns to the table.
      summarize(): Summarize the grouped data. This creates new variables at a higher level of grouping. For example, we might calculate means with mean() or counts with n(). This results in a smaller, summary table, which we might do more things with if we want.
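A minimal sketch of these verbs on a hypothetical toy table. mutate() keeps one row per respondent, while summarize() collapses the table to one row per group.

```r
library(dplyr)

# Hypothetical toy table: one row per respondent
toy <- data.frame(
  bigregion = c("South", "South", "South", "West", "West", "West"),
  religion  = c("Protestant", "Protestant", "Catholic", "None", "Catholic", "None"),
  age       = c(45, 60, 33, 28, 51, 39)
)

grouped <- toy %>%
  filter(age >= 30) %>%        # keep the rows we want (drops the 28-year-old)
  select(bigregion, age) %>%   # keep the columns we want
  group_by(bigregion)          # work at the level of region

# mutate(): a new column at the current level of grouping; still 5 rows
with_centered <- grouped %>% mutate(age_centered = age - mean(age))

# summarize(): one row per region
per_region <- grouped %>% summarize(n = n(), mean_age = mean(age))
per_region
```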

  46. Create a pipeline of tabular transformations with the pipe operator %>%
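A minimal sketch of what the pipe does: each step's result becomes the first argument of the next function, so a nested call can be read left to right. dplyr re-exports %>% from the magrittr package.

```r
library(dplyr)  # re-exports %>% from magrittr

x <- c(1, 4, 9, 16)

# Nested calls read inside-out...
nested <- round(mean(sqrt(x)), 1)

# ...while the pipe reads left to right
piped <- x %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)  # TRUE
```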

  47. REORGANIZING TABLES WITH dplyr

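  48. rel_by_region <- gss_sm %>%
          group_by(bigregion, religion) %>%  # one group per region-religion combination
          summarize(n = n()) %>%              # count rows; the result stays grouped by bigregion
          mutate(freq = n / sum(n),           # share of each religion within its region
                 pct = round((freq * 100), 1))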
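A sketch of the same pipeline on a hypothetical toy table, so the result can be checked by hand. Because summarize() drops the last grouping variable, mutate() computes freq within each bigregion; .groups = "drop_last" is dplyr's default here (dplyr 1.0.0 or later) and is written out only to make that explicit and silence the grouping message.

```r
library(dplyr)

# Hypothetical toy stand-in for gss_sm
toy <- data.frame(
  bigregion = c("South", "South", "South", "South", "West", "West"),
  religion  = c("Protestant", "Protestant", "Protestant", "Catholic", "None", "Catholic")
)

rel_by_region <- toy %>%
  group_by(bigregion, religion) %>%
  summarize(n = n(), .groups = "drop_last") %>%  # result stays grouped by bigregion
  mutate(freq = n / sum(n),                      # sum(n) is the total within each region
         pct = round((freq * 100), 1))

rel_by_region  # pct: 25, 75 (South); 50, 50 (West)
```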
