getting started we want to draw good data graphics
play

GETTING STARTED WE WANT TO DRAW GOOD DATA GRAPHICS REPRODUCIBLY - PowerPoint PPT Presentation

GETTING STARTED WE WANT TO DRAW GOOD DATA GRAPHICS REPRODUCIBLY Abstraction in Software Less More Easy things are awkward Easy things are trivial Hard things are straightforward Hard things are really awkward Really hard things are


  1. library (gapminder) gapminder # A tibble: 1,704 x 6 country continent year lifeExp pop gdpPercap <fctr> <fctr> <int> <dbl> <int> <dbl> 1 Afghanistan Asia 1952 28.801 8425333 779.4453 2 Afghanistan Asia 1957 30.332 9240934 820.8530 3 Afghanistan Asia 1962 31.997 10267083 853.1007 4 Afghanistan Asia 1967 34.020 11537966 836.1971 5 Afghanistan Asia 1972 36.088 13079460 739.9811 6 Afghanistan Asia 1977 38.438 14880372 786.1134 7 Afghanistan Asia 1982 39.854 12881816 978.0114 8 Afghanistan Asia 1987 40.822 13867957 852.3959 9 Afghanistan Asia 1992 41.674 16317921 649.3414 10 Afghanistan Asia 1997 41.763 22227415 635.3414 # ... with 1,694 more rows

  2. Named thing gets … … using these arguments p <- ggplot (data = gapminder, mapping = aes (x = gdpPercap, y = lifeExp)) … the output of this function … Objects created by ggplot() are unusual in that you can add p things to them, and they will work as p + geom_point () though you wrote all the code at once.

  3. p <- ggplot (data = gapminder, mapping = aes (x = gdpPercap, y = lifeExp)) p + geom_point ()

  4. 80 60 lifeExp 40 0 30000 60000 90000 gdpPercap

  5. Make Some Graphs

  6. ggplot wants you to feed it TIDY DATA

  7. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909 81 80 Euro 126 40 20 Asia

  8. country year cases population 1 Afghanistan 1999 745 19987071 2 Afghanistan 2000 2666 20595360 3 Brazil 1999 37737 172006362 4 Brazil 2000 80488 174504898 5 China 1999 212258 1272915272 6 China 2000 213766 1280428583

  9. country year key value 1 Afghanistan 1999 cases 745 2 Afghanistan 1999 population 19987071 3 Afghanistan 2000 cases 2666 4 Afghanistan 2000 population 20595360 5 Brazil 1999 cases 37737 6 Brazil 1999 population 172006362 7 Brazil 2000 cases 80488 8 Brazil 2000 population 174504898 9 China 1999 cases 212258 10 China 1999 population 1272915272 11 China 2000 cases 213766 12 China 2000 population 1280428583

  10. country year key value 1 Afghanistan 1999 cases 745 2 Afghanistan 1999 population 19987071 3 Afghanistan 2000 cases 2666 4 Afghanistan 2000 population 20595360 5 Brazil 1999 cases 37737 6 Brazil 1999 population 172006362 7 Brazil 2000 cases 80488 8 Brazil 2000 population 174504898 9 China 1999 cases 212258 10 China 1999 population 1272915272 11 China 2000 cases 213766 12 China 2000 population 1280428583

  11. country year rate 1 Afghanistan 1999 745/19987071 2 Afghanistan 2000 2666/20595360 3 Brazil 1999 37737/172006362 4 Brazil 2000 80488/174504898 5 China 1999 212258/1272915272 6 China 2000 213766/1280428583

  12. country 1999 2000 1 Afghanistan 745 2666 2 Brazil 37737 80488 3 China 212258 213766 country 1999 2000 1 Afghanistan 19987071 20595360 2 Brazil 172006362 174504898 3 China 1272915272 1280428583

  13. GETTING YOUR DATA INTO R

  14. my_data <- read_csv (file = “data/organdonation.csv") Field delimiter is , read_csv2 (file = "data/my_csv_file.csv") Field delimiter is ; read_dta (file = "data/my_stata_file.dta") read_spss (file = "data/my_spss_file.sav") read_sas (data_file = "<NAME>", catalog_file = "<NAME>") read_table (file = "<NAME>") Structured but not delimited

  15. Local File Path organs <- read_csv (file = "data/organdonation.csv") Remote URL url <- "https://cdn.rawgit.com/kjhealy/viz- organdata/master/organdonation.csv" organs <- read_csv (file = url)

  16. engmort <- read_table (file = "data/mortality.txt", skip = 2, na = ".")

  17. HOW ggplot WORKS

  18. ggplot’s FLOW OF ACTION

  19. gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909 81 80 Euro 126 40 20 Asia

  20. A Gapminder Plot Continent Asia Euro Amer Life Expectancy Population 0-35 36-100 >100 log GDP

  21. 2. Mapping 1. Tidy Data 3. Geom ggplot(data = gapminder) ggplot(mapping = aes(x = …)) geom_point() gdp lifexp pop continent 340 65 31 Euro 227 51 200 Amer 909 81 80 Euro 126 40 20 Asia x=gdp y=lifexp size=pop color=continent

  22. 4. Coordinate 5. Scales 6. Labels System & Guides A Gapminder Plot Asia Euro Continent Amer 0-35 Population Life Expectancy 36-100 y y >100 x log10 x log GDP

  23. A Gapminder Plot Continent Asia Euro Amer Life Expectancy Population 0-35 36-100 >100 log GDP

  24. PIECE BY PIECE

  25. head (gapminder) ## # A tibble: 6 × 6 ## country continent year lifeExp pop gdpPercap ## <fctr> <fctr> <int> <dbl> <int> <dbl> ## 1 Afghanistan Asia 1952 28.801 8425333 779.4453 ## 2 Afghanistan Asia 1957 30.332 9240934 820.8530 ## 3 Afghanistan Asia 1962 31.997 10267083 853.1007 ## 4 Afghanistan Asia 1967 34.020 11537966 836.1971 ## 5 Afghanistan Asia 1972 36.088 13079460 739.9811 ## 6 Afghanistan Asia 1977 38.438 14880372 786.1134 dim (gapminder) ## [1] 1704 6

  26. p <- ggplot (data = gapminder) Create a ggplot object Data is gapminder table

  27. p <- ggplot (data = gapminder, mapping = aes (x = gdpPercap, y = lifeExp)) mapping: tell ggplot the variables you want represented by features of the plot

  28. • The mapping = aes( ��../ ) instruction links variables to things you will see on the plot. • The x and y values are the most obvious ones. • Other aesthetic mappings can include, e.g., color , shape , and size .

  29. Mappings do not directly specify the particular , e.g., colors, shapes, or line styles that will appear on the plot. Rather they establish which variables in the data will be represented by which visible features on the plot.

  30. p + geom_point () Add a geom layer to the plot

  31. p + geom_smooth () Try a di ff erent geom

  32. p + geom_point () + geom_smooth () + scale_x_log10 (labels = scales::dollar) This process is literally additive

  33. p + geom_point () + geom_smooth (method = "lm") Every geom is a function. Functions take arguments.

  34. p <- ggplot (data = gapminder, mapping = aes (x = gdpPercap, y = lifeExp)) p + geom_point () + geom_smooth (method = "lm") + scale_x_log10 (label = scales::dollar) Keep Layering

  35. p + geom_point () + geom_smooth (method = "gam") + scale_x_log10 (labels = scales::dollar) + labs (x = "GDP Per Capita", y = "Life Expectancy in Years", title = "Economic Growth and Life Expectancy", subtitle = "Data points are country-years", caption = "Data source: Gapminder")

  36. MAPPING vs SETTING AESTHETICS

  37. p <- ggplot (data = gapminder, mapping = aes (x = gdpPercap, y = lifeExp, color = "purple")) p + geom_point () + geom_smooth (method = "loess") + scale_x_log10 ()

  38. What has gone wrong here?

  39. p <- ggplot (data = gapminder, mapping = aes (x = gdpPercap, y = lifeExp) p + geom_point (color = "purple") + geom_smooth (method = "loess")) + scale_x_log10 ()

  40. p <- ggplot (data = gapminder, mapping = aes (x = gdpPercap, y = lifeExp)) p + geom_point (alpha = 0.3) + geom_smooth (color = "orange", se = FALSE, size = 2, method = "lm") + scale_x_log10 () Here, some aesthetics are mapped, and some are set

  41. p <- ggplot (data = gapminder, mapping = aes (x = gdpPercap, y = lifeExp, color = continent, fill = continent)) p + geom_point () + geom_smooth (method = "loess") + scale_x_log10 ()

  42. MAP or SET AESTHETICS per geom

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend