Coding Lab: Visualizing data with ggplot2
Ari Anisfeld, Summer 2020


How to use ggplot
◮ How to map data to aesthetics with aes() (and what that means)
◮ How to visualize the mappings with geoms
◮ How to get more out of your data by using multiple aesthetics
◮ How to use facets to add dimensionality
There are whole books on how to use ggplot. This is a quick introduction!

Understanding ggplot()
By itself, ggplot() tells R to prepare to make a plot.

    texas_annual_sales <- texas_housing_data %>%
      group_by(year) %>%
      summarize(total_volume = sum(volume, na.rm = TRUE))

    ggplot(data = texas_annual_sales)
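A minimal runnable sketch of this slide. The slides do not say where texas_housing_data comes from; here it is assumed to be ggplot2's built-in txhousing data set. The check at the end shows that a bare ggplot() call holds the data but has no layers yet, which is why nothing is drawn.

```r
library(ggplot2)
library(dplyr)

# Assumption: the slides' texas_housing_data is ggplot2's txhousing data set
texas_housing_data <- txhousing

# One row per year: total sales volume summed over all cities
texas_annual_sales <- texas_housing_data %>%
  group_by(year) %>%
  summarize(total_volume = sum(volume, na.rm = TRUE))

# ggplot() alone only creates a plot object that holds the data
p <- ggplot(data = texas_annual_sales)

# No geom has been added, so there are no layers to draw
length(p$layers)
```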

Adding a mapping
Adding mapping = aes() says how the data will map to “aesthetics”.
◮ For example, tell R to make the x-axis year and the y-axis total_volume.
◮ Each row of the data has (year, total_volume).
◮ R will map that to the coordinate pair (x, y).
◮ Look at the data before moving on!

    ggplot(data = texas_annual_sales, mapping = aes(x = year, y = total_volume))

(Figure: empty panel with year (2000 to 2015) on the x-axis and total_volume on the y-axis; no geom has been added yet.)
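One way to see that aes() only records a mapping: inspect the plot object. This sketch again assumes texas_housing_data is ggplot2's txhousing; the mapping names x and y to columns, but the plot still has no layers.

```r
library(ggplot2)
library(dplyr)

# Assumption: texas_housing_data is ggplot2's txhousing data set
texas_annual_sales <- txhousing %>%
  group_by(year) %>%
  summarize(total_volume = sum(volume, na.rm = TRUE))

p <- ggplot(data = texas_annual_sales,
            mapping = aes(x = year, y = total_volume))

# The aesthetics that have been mapped
names(p$mapping)
# The column expression behind each aesthetic
rlang::as_label(p$mapping$x)
rlang::as_label(p$mapping$y)

# The mapping exists, but no geom visualizes it yet
length(p$layers)
```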

Visualizing the mapping with a geom
geom_<name> tells R what type of visualization to produce. Here we see points.
◮ Each row of the data has (year, total_volume).
◮ R will map that to the coordinate pair (x, y).

    ggplot(data = texas_annual_sales, mapping = aes(x = year, y = total_volume)) +
      geom_point()

(Figure: scatter plot of total_volume against year.)
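To check that each row really becomes one (x, y) pair, layer_data() returns the data a layer will draw. A sketch, assuming texas_housing_data is ggplot2's txhousing.

```r
library(ggplot2)
library(dplyr)

# Assumption: texas_housing_data is ggplot2's txhousing data set
texas_annual_sales <- txhousing %>%
  group_by(year) %>%
  summarize(total_volume = sum(volume, na.rm = TRUE))

p <- ggplot(data = texas_annual_sales,
            mapping = aes(x = year, y = total_volume)) +
  geom_point()

# What the point layer will draw: one (x, y) pair per row of the data
points <- layer_data(p)
head(points[, c("x", "y")])
```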

Visualizing the mapping with a geom
Here we see bars.
◮ Each row of the data has (year, total_volume).
◮ R will map that to the coordinate pair (x, y).

    ggplot(data = texas_annual_sales, mapping = aes(x = year, y = total_volume)) +
      geom_col()

(Figure: bar chart of total_volume by year; the y-axis starts at 0.)
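The bar chart's y-axis runs down to 0, while the point chart's starts near 4e+10. A sketch of why (again assuming texas_housing_data is ggplot2's txhousing): geom_col() draws each bar as a rectangle from ymin = 0 up to the data value, so zero has to be on the axis.

```r
library(ggplot2)
library(dplyr)

# Assumption: texas_housing_data is ggplot2's txhousing data set
texas_annual_sales <- txhousing %>%
  group_by(year) %>%
  summarize(total_volume = sum(volume, na.rm = TRUE))

p_col <- ggplot(texas_annual_sales, aes(x = year, y = total_volume)) +
  geom_col()

# Each bar spans from ymin (0) to ymax (that year's total_volume)
bars <- layer_data(p_col)
head(bars[, c("x", "ymin", "ymax")])
```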

Visualizing the mapping with a geom
Here we see a line connecting each (x, y) pair.

    ggplot(data = texas_annual_sales, mapping = aes(x = year, y = total_volume)) +
      geom_line()

(Figure: line plot of total_volume against year.)

Visualizing the mapping with a geom
Here we see a smooth line. R does a statistical transformation!
◮ Now R doesn’t draw each (year, total_volume) row as its own (x, y) pair.
◮ Instead, it fits a model to the (x, y) pairs and then plots the fitted “smooth” line.

    ggplot(data = texas_annual_sales, mapping = aes(x = year, y = total_volume)) +
      geom_smooth()
    ## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

(Figure: smoothed curve of total_volume against year.)
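The statistical transformation is visible in the layer's data: the smooth layer does not draw the input rows, it draws a grid of fitted values (80 points by default, set by the n argument of stat_smooth()). A sketch, assuming texas_housing_data is ggplot2's txhousing; naming method and formula explicitly is a choice made here to silence the message shown above.

```r
library(ggplot2)
library(dplyr)

# Assumption: texas_housing_data is ggplot2's txhousing data set
texas_annual_sales <- txhousing %>%
  group_by(year) %>%
  summarize(total_volume = sum(volume, na.rm = TRUE))

p_smooth <- ggplot(texas_annual_sales, aes(x = year, y = total_volume)) +
  # Naming method and formula explicitly suppresses the informational message
  geom_smooth(method = "loess", formula = y ~ x)

smoothed <- layer_data(p_smooth)
nrow(texas_annual_sales)  # rows going in: one per year
nrow(smoothed)            # rows drawn: the fitted grid, not the data
head(smoothed[, c("x", "y", "ymin", "ymax")])
```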

Visualizing the mapping with a geom
We can overlay several geoms.

    ggplot(data = texas_annual_sales, mapping = aes(x = year, y = total_volume)) +
      geom_smooth() +
      geom_point()

(Figure: smoothed curve of total_volume against year with the yearly points overlaid.)
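Each geom added with + becomes its own layer with its own data, drawn in the order it was added. A sketch, assuming texas_housing_data is ggplot2's txhousing.

```r
library(ggplot2)
library(dplyr)

# Assumption: texas_housing_data is ggplot2's txhousing data set
texas_annual_sales <- txhousing %>%
  group_by(year) %>%
  summarize(total_volume = sum(volume, na.rm = TRUE))

p_both <- ggplot(texas_annual_sales, aes(x = year, y = total_volume)) +
  geom_smooth(method = "loess", formula = y ~ x) +
  geom_point()

# Two layers: the smooth is drawn first, the points on top of it
length(p_both$layers)
nrow(layer_data(p_both, 1))  # smooth layer: the fitted grid
nrow(layer_data(p_both, 2))  # point layer: one row per year
```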

Visualizing the mapping with a geom
◮ We saw that we can visualize a relationship between two variables by mapping data to x and y.
◮ The data can be visualized with different geoms, which can be composed (+) together.
◮ We can even calculate new variables with statistics and plot them on the fly.
Next: aesthetics that go beyond the x and y axes.

Using aesthetics to explore data
We’ll use the midwest data and start by mapping only to x and y.

    midwest %>%
      ggplot(aes(x = percollege, y = percbelowpoverty)) +
      geom_point()

(Figure: scatter plot of percbelowpoverty against percollege.)

Using aesthetics to explore data
◮ color maps data to the color of points or lines.
◮ Each state is assigned a color.
◮ This works with both discrete and continuous data.

    midwest %>%
      ggplot(aes(x = percollege, y = percbelowpoverty, color = state)) +
      geom_point()

(Figure: scatter plot of percbelowpoverty against percollege, colored by state: IL, IN, MI, OH, WI.)
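A sketch contrasting the two cases the slide mentions: a discrete variable gets one color per category, while a continuous one gets a gradient with many distinct shades. The object names are illustrative.

```r
library(ggplot2)

# Discrete variable: one color per state
p_state <- ggplot(midwest, aes(x = percollege, y = percbelowpoverty, color = state)) +
  geom_point()

# Continuous variable: colors drawn from a gradient
p_known <- ggplot(midwest, aes(x = percollege, y = percbelowpoverty,
                               color = percpovertyknown)) +
  geom_point()

# Count the distinct colors each layer will actually use
length(unique(layer_data(p_state)$colour))
length(unique(layer_data(p_known)$colour))
```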

Using aesthetics to explore data
◮ shape maps data to the shape of points.
◮ Each state is assigned a shape.
◮ This works with discrete data only.

    midwest %>%
      ggplot(aes(x = percollege, y = percbelowpoverty, shape = state)) +
      geom_point()

(Figure: scatter plot of percbelowpoverty against percollege, with point shape by state: IL, IN, MI, OH, WI.)

Using aesthetics to explore data
◮ alpha maps data to the transparency of points.
◮ Here we map the total population of each county (poptotal) to alpha.

    midwest %>%
      ggplot(aes(x = percollege, y = percbelowpoverty, alpha = poptotal)) +
      geom_point()

(Figure: scatter plot of percbelowpoverty against percollege, with transparency by poptotal.)

Using aesthetics to explore data
◮ size maps data to the size of points and the width of lines.
◮ Here we map the total population of each county (poptotal) to size.

    midwest %>%
      ggplot(aes(x = percollege, y = percbelowpoverty, size = poptotal)) +
      geom_point()

(Figure: scatter plot of percbelowpoverty against percollege, with point size by poptotal.)
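A quick check that the size scale follows the data: the county with the largest poptotal should get the largest point. A sketch using the slide's mapping; pts is an illustrative name.

```r
library(ggplot2)

p_size <- ggplot(midwest, aes(x = percollege, y = percbelowpoverty, size = poptotal)) +
  geom_point()

# One row per county, with the point size the layer will draw
pts <- layer_data(p_size)

# The county drawn with the largest point
midwest$county[which.max(pts$size)]
```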

Using aesthetics to explore data
We can combine any and all aesthetics, and even map the same variable to multiple aesthetics.

    midwest %>%
      ggplot(aes(x = percollege, y = percbelowpoverty,
                 alpha = percpovertyknown,
                 size = poptotal,
                 color = state)) +
      geom_point()

(Figure: scatter plot of percbelowpoverty against percollege, with alpha mapped to percpovertyknown, size to poptotal, and color to state.)
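The slide's example maps different variables to each aesthetic. A sketch of the other case it mentions, mapping the same variable (state) to both color and shape: each state gets its own color and shape, and ggplot2 merges the two into a single legend because they share a title and categories.

```r
library(ggplot2)

p_twice <- ggplot(midwest, aes(x = percollege, y = percbelowpoverty,
                               color = state, shape = state)) +
  geom_point()

# Each distinct (colour, shape) combination corresponds to one state
twice <- layer_data(p_twice)
unique(twice[, c("colour", "shape")])
```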

Using aesthetics to explore data
Different geoms have specific aesthetics that go with them.
◮ Use ? to see which aesthetics a geom accepts (e.g., ?geom_point).
◮ The aesthetics in bold are required.
◮ The ggplot2 cheatsheet shows all the geoms with their associated aesthetics.
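The required aesthetics listed in the help pages can also be read from the geom objects themselves. A sketch using the exported ggproto objects GeomPoint and GeomVline.

```r
library(ggplot2)

# Aesthetics geom_point() cannot do without
GeomPoint$required_aes

# geom_vline() needs an xintercept rather than x and y
GeomVline$required_aes
```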

Facets
Facets provide an additional tool to explore multidimensional data.

    midwest %>%
      ggplot(aes(x = log(poptotal), y = percbelowpoverty)) +
      geom_point() +
      facet_wrap(vars(state))

(Figure: five panels, IL, IN, MI, OH and WI, each a scatter plot of percbelowpoverty against log(poptotal).)
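Faceting adds a PANEL column that says which small multiple each row is drawn in. A sketch using the slide's facet plot; facet_data is an illustrative name.

```r
library(ggplot2)

p_facet <- ggplot(midwest, aes(x = log(poptotal), y = percbelowpoverty)) +
  geom_point() +
  facet_wrap(vars(state))

# PANEL assigns each county to one of the state panels
facet_data <- layer_data(p_facet)
table(facet_data$PANEL)
```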

discrete vs continuous data

| aes | discrete | continuous |
|---|---|---|
| | limited number of classes (usually chr or lgl) | unlimited number of classes (numeric) |
| x, y | yes | yes |
| color, fill | yes | yes |
| shape | yes (6 or fewer categories) | no |
| size, alpha | not advised | yes |
| facet | yes | not advised |

Here, “discrete” and “continuous” mean something different than in math.
◮ For ggplot, the meaning is more fluid.
◮ If you group_by the variable and get only about 6 to 10 groups or fewer, discrete visualizations can work.
◮ If your “discrete” data is numeric, use as.character() or as_factor() to enforce the decision.
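A rough sketch of the "count the groups" heuristic on midwest, followed by the as.character() conversion. Treating inmetro (a 0/1 numeric column) as discrete is an illustrative choice, not something the slides do.

```r
library(ggplot2)
library(dplyr)

# Count distinct values to judge whether a variable can be treated as discrete
midwest %>%
  summarize(across(c(state, inmetro, percollege), n_distinct))

# inmetro is numeric (0/1); converting it makes ggplot2 use a discrete color scale
midwest_discrete <- midwest %>%
  mutate(inmetro = as.character(inmetro))

p_metro <- ggplot(midwest_discrete,
                  aes(x = percollege, y = percbelowpoverty, color = inmetro)) +
  geom_point()
```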

color can be continuous

    midwest %>%
      ggplot(aes(x = percollege, y = percbelowpoverty, color = percpovertyknown)) +
      geom_point()

(Figure: scatter plot of percbelowpoverty against percollege, colored on a gradient by percpovertyknown.)

shape does not play well with many categories
◮ By default, shape only maps to 6 categories; the rest become NA, and those points are dropped.
◮ We can override this behavior and get up to 25 distinct shapes.

    midwest %>%
      ggplot(aes(x = percollege, y = percbelowpoverty, shape = county)) +
      geom_point() +
      # legend off, otherwise it overwhelms the plot
      theme(legend.position = "none")

(Figure: scatter plot of percbelowpoverty against percollege with shape mapped to county and no legend.)
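One way to override the 6-shape limit is scale_shape_manual(), supplying one R shape code (0 to 25) per category. This is a sketch under an assumption: state_metro is a hypothetical 10-category variable built for illustration. county itself has far more than 25 values, so it cannot be fixed this way.

```r
library(ggplot2)
library(dplyr)

# Hypothetical variable for illustration: state crossed with metro status (10 categories)
midwest_groups <- midwest %>%
  mutate(state_metro = paste(state, inmetro))

p_shapes <- ggplot(midwest_groups,
                   aes(x = percollege, y = percbelowpoverty, shape = state_metro)) +
  geom_point() +
  # The default palette stops at 6 shapes; supply one shape code per category instead
  scale_shape_manual(values = seq_len(n_distinct(midwest_groups$state_metro)))

# With the manual scale, no category is left without a shape
shapes <- layer_data(p_shapes)
anyNA(shapes$shape)
```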

alpha and size can be misleading with discrete data

    midwest %>%
      ggplot(aes(x = percollege, y = percbelowpoverty, alpha = state)) +
      geom_point()
    ## Warning: Using alpha for a discrete variable is not advised.

(Figure: scatter plot of percbelowpoverty against percollege with transparency by state.)

Adding vertical lines

    texas_annual_sales %>%
      ggplot(aes(x = year, y = total_volume)) +
      geom_point() +
      # a constant goes outside aes(); inside aes() it would draw one line per row
      geom_vline(xintercept = 2007, linetype = "dotted")

(Figure: scatter plot of total_volume against year with a dotted vertical line at 2007.)
◮ Add horizontal lines with geom_hline().
◮ Add any straight line, such as a linear fit, with geom_abline() by providing a slope and an intercept.
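A sketch combining all three reference-line geoms, assuming texas_housing_data is ggplot2's txhousing. The horizontal line at the mean and the lm() fit passed to geom_abline() are illustrative choices, not from the slides.

```r
library(ggplot2)
library(dplyr)

# Assumption: texas_housing_data is ggplot2's txhousing data set
texas_annual_sales <- txhousing %>%
  group_by(year) %>%
  summarize(total_volume = sum(volume, na.rm = TRUE))

# A straight-line fit whose coefficients we hand to geom_abline()
fit <- lm(total_volume ~ year, data = texas_annual_sales)

p_lines <- ggplot(texas_annual_sales, aes(x = year, y = total_volume)) +
  geom_point() +
  geom_vline(xintercept = 2007, linetype = "dotted") +
  geom_hline(yintercept = mean(texas_annual_sales$total_volume), linetype = "dashed") +
  geom_abline(intercept = coef(fit)[[1]], slope = coef(fit)[[2]])
```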

Key takeaways
◮ ggplot starts by mapping data to “aesthetics”.
◮ e.g., what data shows up on the x and y axes, and how color, size and shape appear on the plot.
◮ We need to be aware of ‘continuous’ vs. ‘discrete’ variables.
◮ Then, we use geoms to create a visualization based on the mapping.
◮ Again, we need to be aware of ‘continuous’ vs. ‘discrete’ variables.
◮ Making quick plots helps us understand data and makes us aware of data issues.
Resources: R for Data Science chap. 3 (r4ds.had.co.nz); RStudio’s ggplot cheatsheet.

Appendix: Some graphs you made along the way

lab 0: a map
geom_path is like geom_line, but connects (x, y) pairs in the order they appear in the data set.

    storms %>%
      group_by(name, year) %>%
      # keep storms that reached category 5; na.rm = TRUE skips rows with no category
      filter(any(category == 5, na.rm = TRUE)) %>%
      ggplot(aes(x = long, y = lat, color = name)) +
      geom_path() +
      # borders() needs the maps package installed
      borders("world") +
      coord_quickmap(xlim = c(-130, -60), ylim = c(20, 50))

(Figure: tracks of category 5 storms over a world map, long on the x-axis and lat on the y-axis, colored by name: Dean, Emily, Felix, Gilbert, Hugo, Isabel, Ivan, Katrina, Mitch.)
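The difference between geom_line() and geom_path() can be checked without the storms data. A sketch, assuming texas_housing_data is ggplot2's txhousing: with the rows put in reverse year order, geom_line() still connects the points left to right because it sorts by x, while geom_path() keeps the row order.

```r
library(ggplot2)
library(dplyr)

# Assumption: texas_housing_data is ggplot2's txhousing data set
texas_annual_sales <- txhousing %>%
  group_by(year) %>%
  summarize(total_volume = sum(volume, na.rm = TRUE))

# Put the rows in reverse year order
reversed <- texas_annual_sales %>% arrange(desc(year))

p_line <- ggplot(reversed, aes(x = year, y = total_volume)) + geom_line()
p_path <- ggplot(reversed, aes(x = year, y = total_volume)) + geom_path()

# geom_line() sorts by x before connecting; geom_path() keeps the data's row order
layer_data(p_line)$x
layer_data(p_path)$x
```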
