show the right numbers ggplot implements a grammar of
play

Show the Right Numbers ggplot IMPLEMENTS A GRAMMAR OF GRAPHICS - PowerPoint PPT Presentation

Show the Right Numbers ggplot IMPLEMENTS A GRAMMAR OF GRAPHICS The grammar is a set of rules for how produce graphics from data, taking pieces of data and mapping them to geometric objects (like points and lines) that have aesthetic attributes


  1. rel_by_region <- gss_sm %>% group_by (bigregion, religion) %>% summarize (N = n ())

  2. rel_by_region <- gss_sm %>% group_by (bigregion, religion) %>% summarize (N = n ()) %>% mutate (freq = N / sum (N), pct = round ((freq*100), 1))

  3. rel_by_region <- gss_sm %>% group_by (bigregion, religion) %>% summarize (N = n ()) %>% mutate (freq = N / sum (N), pct = round ((freq*100), 1)) Objects in a pipeline carry forward some assumptions about context

  4. rel_by_region <- gss_sm %>% group_by (bigregion, religion) %>% summarize (N = n ()) %>% mutate (freq = N / sum (N), pct = round ((freq*100), 1)) Grouping with group_by() carries forward; summary calculations are applied to the innermost group

  5. rel_by_region <- gss_sm %>% group_by (bigregion, religion) %>% summarize (N = n ()) %>% mutate (freq = N / sum (N), pct = round ((freq*100), 1)) mutate() doesn’t change the grouping level

  6. rel_by_region <- gss_sm %>% group_by (bigregion, religion) %>% summarize (N = n ()) %>% mutate (freq = N / sum (N), pct = round ((freq*100), 1)) Notice how we can create variables on the fly and use them immediately

  7. rel_by_region ## Source: local data frame [24 x 5] ## Groups: bigregion [4] ## ## # A tibble: 24 x 5 ## bigregion religion N freq pct ## <fctr> <fctr> <int> <dbl> <dbl> ## 1 Northeast Protestant 158 0.32377049 32.4 ## 2 Northeast Catholic 162 0.33196721 33.2 ## 3 Northeast Jewish 27 0.05532787 5.5 ## 4 Northeast None 112 0.22950820 23.0 ## 5 Northeast Other 28 0.05737705 5.7 ## 6 Northeast NA 1 0.00204918 0.2 ## 7 Midwest Protestant 325 0.46762590 46.8 ## 8 Midwest Catholic 172 0.24748201 24.7 ## 9 Midwest Jewish 3 0.00431655 0.4 ## 10 Midwest None 157 0.22589928 22.6 ## # ... with 14 more rows

  8. rel_by_region <- gss_sm %>% group_by (bigregion, religion) %>% summarize (n = n ()) %>% mutate (freq = n / sum (n), pct = round ((freq*100), 1)) Some Shorthand for this …

  9. gss_sm %>% gss_sm %>% gss_sm %>% group_by(bigregion, religion) %>% group_by(bigregion, religion) %>% count (bigregion, religion) summarize (n = n ()) tally () # A tibble: 24 x 3 # A tibble: 24 x 3 # A tibble: 24 x 3 # Groups: bigregion [4] # Groups: bigregion [4] bigregion religion n bigregion religion n bigregion religion n <fct> <fct> <int> <fct> <fct> <int> <fct> <fct> <int> 1 Northeast Protestant 158 1 Northeast Protestant 158 1 Northeast Protestant 158 2 Northeast Catholic 162 2 Northeast Catholic 162 2 Northeast Catholic 162 3 Northeast Jewish 27 3 Northeast Jewish 27 3 Northeast Jewish 27 4 Northeast None 112 4 Northeast None 112 4 Northeast None 112 5 Northeast Other 28 5 Northeast Other 28 5 Northeast Other 28 6 Northeast NA 1 6 Northeast NA 1 6 Northeast NA 1 7 Midwest Protestant 325 7 Midwest Protestant 325 7 Midwest Protestant 325 8 Midwest Catholic 172 8 Midwest Catholic 172 8 Midwest Catholic 172 9 Midwest Jewish 3 9 Midwest Jewish 3 9 Midwest Jewish 3 10 Midwest None 157 10 Midwest None 157 10 Midwest None 157 # … with 14 more rows # … with 14 more rows # … with 14 more rows n() tally() count()

  10. Use pipelines to create summary table objects, then graph them

  11. rel_by_region %>% group_by (bigregion) %>% summarize (total = sum (pct)) ## # A tibble: 4 x 2 ## bigregion total ## <fctr> <dbl> ## 1 Northeast 100.0 ## 2 Midwest 99.9 ## 3 South 100.0 ## 4 West 100.1 Pipelined tables are easier to check for errors

  12. p <- ggplot (data = rel_by_region, mapping = aes (x = bigregion, y = pct, fill = religion)) p + geom_col (position = "dodge") + labs (x = "Region", y = "Percent", fill = "Religion") + theme (legend.position = "top")

  13. But is this an e ff ective graph? Not Really!

  14. p <- ggplot (data = rel_by_region, mapping = aes (x = religion, y = pct, fill = religion)) p + geom_col (position = "dodge") + labs (x = NULL, y = "Percent", fill = "Religion") + guides (fill = FALSE) + coord_flip () + facet_wrap (~ bigregion, nrow = 1)

  15. WHAT WE’VE NOW BUILT UP

  16. p <- ggplot (data = <DATA>, mapping=aes(<MAPPINGS>)) + <GEOM_FUNCTION>( mapping = aes(<MAPPINGS>), stat = <STAT>, position = <POSITION>) + <SCALE_FUNCTION> + <COORDINATE_FUNCTION> + <FACET_FUNCTION> + <THEME_FUNCTION>

  17. p <- ggplot (data = gapminder, mapping = aes (x = year, y = gdpPercap)) p + geom_line ( aes (group = country)) + scale_y_log10 () + coord_cartesian () + facet_wrap (~ continent)

  18. geom_point () geom_line () geom_smooth () geom_bar () geom_histogram () geom_density () geom_boxplot ()

  19. THE ORGAN DONATION DATA

  20. organdata %>% select (1:6) %>% sample_n (size = 10) ## # A tibble: 10 x 6 ## country year donors pop pop_dens gdp ## <chr> <date> <dbl> <int> <dbl> <int> ## 1 Switzerland NA NA NA NA NA ## 2 Switzerland 1997-01-01 14.3 7089 17.2 27675 ## 3 United Kingdom 1997-01-01 13.4 58283 24.0 22442 ## 4 Sweden NA NA 8559 1.90 18660 ## 5 Ireland 2002-01-01 21.0 3932 5.60 32571 ## 6 Germany 1998-01-01 13.4 82047 23.0 23283 ## 7 Italy NA NA 56719 18.8 17430 ## 8 Italy 2001-01-01 17.1 57894 19.2 25359 ## 9 France 1998-01-01 16.5 58398 10.6 24044 ## 10 Spain 1995-01-01 27.0 39223 7.75 15720 Everyday use of dplyr and pipes

  21. p <- ggplot (data = organdata, mapping = aes (x = year, y = donors)) p + geom_point () ## Warning: Removed 34 rows containing missing values ## (geom_point). p <- ggplot (data = organdata, mapping = aes (x = year, y = donors)) p + geom_line ( aes (group = country)) + facet_wrap (~ country)

  22. Continuous Variables by Categories

  23. p <- ggplot (data = organdata, mapping = aes (x = country, y = donors)) p + geom_boxplot ()

  24. p <- ggplot (data = organdata, mapping = aes (x = country, y = donors)) p + geom_boxplot () + coord_flip () Explicit use of a coordinate system transformation

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend