BootcampR
AN INTRODUCTION TO R
Jason A. Heppler, PhD University of Nebraska at Omaha March 3, 2020 @jaheppler
Hi. I'm Jason.
I like to gesture at screens.
Digital Engagement Librarian, University of Nebraska at Omaha
Mentor, Mozilla Open Leaders
Researcher, Humanities+Design, Stanford University
Schedule
March 10: 1:30-3 Making Networks in CL 232
March 17: 1:30-3 Making Maps in CL 112
March 31: 1:30-3 Clustering and Classifying in CL 112
Open up RStudio. We'll start doing a few things together soon.
"The bad news is that whenever you learn a new skill you’re going to suck. It’s going to be frustrating. The good news is that is typical and happens to everyone and it is only temporary. You can’t go from knowing nothing to becoming an expert without going through a period of great frustration and great suckiness."
—Hadley Wickham
ggplot is part of the tidyverse
A highly functional package for reasoning about the creation of statistical charts and graphics.
Edward Tufte suggests that graphical excellence is defined by "that which gives the viewer the greatest number of ideas, in the shortest time, with the least ink, the smallest space, and which tells the truth about data."
Edward Tufte, The Visual Display of Quantitative Information (Graphics Press, 1983)
Types of Visualization
charts and graphs to represent data
that has close ties to real-world objects with spatial properties
and visualizations with narrative
Cognitive and Social Aspects
Gestalt Principles of Data Visualization
Gestalt psychology is an old practice of understanding how humans perceive patterns. The principles of Gestalt psychology attempt to explore how we view separate visual elements as a whole.
e.g.) are perceived as part of the same group.
e.g.) are perceived as part of the same group.
e.g.) are perceived as part of the same group.
part of a single group.
visual element.
missing information. When viewing a shape with missing segments, we perceive it as a single unit.
Effective design of complex visualizations must consider these principles and the intentional and unintentional signals our graphics send to our readers.
In the mid-1980s, statisticians William Cleveland and Robert McGill ran experiments with human volunteers to study the perception of quantitative information encoded by different cues.
Let's get started.
ggplot2 is a data visualization package that uses the grammar of graphics.
Supplementary packages for ggplot are available for more customization and functionality, for example:
...and many more.
Grammar of graphics
ggplot has three essential components:
columns to map to x and y), and assigning variables to visual elements (color, shape, size, etc.)
bars, maps, etc.)
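As a minimal sketch of how those components fit together, the snippet below builds a bar chart from a tiny data frame. The `sites` data frame and its values are hypothetical and invented for illustration only; they are not from the superfundr package.

```r
library(ggplot2)

# Hypothetical toy data, invented for illustration only
sites <- data.frame(
  state = c("NE", "NE", "IA", "KS", "KS", "KS")
)

# 1. data: the data frame we want to plot
# 2. aesthetics: map the state column to the x axis
# 3. geometry: draw bars (geom_bar() counts the rows per state by default)
p <- ggplot(data = sites, aes(x = state)) +
  geom_bar()

# Printing the object draws the chart
p
```

Swapping the geometry (for example, a different geom_*() function) changes the chart type while the data and aesthetic mapping stay the same.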
Grammar of graphics
A visualization concept created by Leland Wilkinson (1999) to define the elements of statistical graphics:
"... describes the meaning of what we do when we construct statistical graphics ... More than a taxonomy ... Computational system based on the underlying mathematics of representing statistical functions of data."
Adapted by the creator of ggplot, Hadley Wickham, in 2009. ggplot offers a:
See Hadley Wickham, "A Layered Grammar of Graphics," Journal of Computational and Graphical Statistics vol. 19, no. 1 (2010): 3-28, http://vita.had.co.nz/papers/layered-grammar.pdf
Grammar of graphics
library(tidyverse)
# devtools::install_github("hepplerj/superfundr")
library(superfundr)
data(superfunds)

# Let's look at the first five rows
superfunds %>% head(5)
# A tibble: 5 x 20
  site_name epa_id city  county state zipcode region npl_status superfund_agree…
  <chr>     <chr>  <chr> <chr>  <chr> <chr>    <dbl> <chr>      <chr>
1 ATLAS TA… MAD00… FAIR… BRIST… MA    02719        1 Currently… N
2 ATLAS TA… MAD00… FAIR… BRIST… MA    02719        1 Currently… N
3 ATLAS TA… MAD00… FAIR… BRIST… MA    02719        1 Currently… N
4 ATLAS TA… MAD00… FAIR… BRIST… MA    02719        1 Currently… N
5 ATLAS TA… MAD00… FAIR… BRIST… MA    02719        1 Currently… N
# … with 11 more variables: federal_facility <chr>, op_unit_no <dbl>, seq_id <dbl>,
#   decision_type <chr>, completion_date <dttm>, fiscal_year <dbl>, media <chr>,
#   contaminant <chr>, address <fct>, latitude <dbl>, longitude <dbl>
Grammar of graphics
# distinct() keeps one row for each unique value of site_name,
# and .keep_all = TRUE retains every other column for the row that is kept
superfunds_subset <- superfunds %>%
  distinct(site_name, .keep_all = TRUE)
ggplot(data = superfunds_subset, aes(x = state)) +
  geom_bar() +
  labs(title = "U.S. Superfund Sites", x = "State", y = "Count")
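To see what distinct() with .keep_all does on a small scale, here is a sketch using a hypothetical three-row table. The `records` tibble and its values are invented for illustration; only the column names echo the superfunds data.

```r
library(dplyr)

# Hypothetical records: "SITE A" appears twice, once per decision
records <- tibble(
  site_name = c("SITE A", "SITE A", "SITE B"),
  state     = c("NE", "NE", "IA"),
  media     = c("Soil", "Groundwater", "Soil")
)

# Without .keep_all, only the columns named in distinct() are returned
names_only <- records %>%
  distinct(site_name)

# With .keep_all = TRUE, the first row for each site_name is kept
# along with all of its other columns
one_row_per_site <- records %>%
  distinct(site_name, .keep_all = TRUE)

one_row_per_site
```

Because only the first matching row survives, the other columns describe that first row, not the site as a whole. That is fine for counting sites by state, but worth remembering for columns that vary within a site.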
Grammar of graphics
Let's break down what we did with the ggplot code. The code for our previous bar chart looked like:
ggplot(data = superfunds_subset, aes(x = state)) +
  geom_bar() +
  labs(title = "U.S. Superfund Sites", x = "State", y = "Count")
ggplot needs:
data = superfunds_subset (the data frame to plot)
aes(x = state) (the aesthetic mapping, here the column placed on the x axis)
geom_bar() (the geometric object, here bars)
See the ggplot geom_bar() documentation for the differences in the stat argument. By default, geom_bar() uses stat = "count", which makes the height of each bar proportional to the number of cases in each group. That is what we want here, since each row is one site. If we instead wanted the height of the bars to represent values already in the data, we would use stat = "identity" (or geom_col()) and map a variable to the y aesthetic.
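The difference between the two stats can be sketched with two small data frames. Both `raw` and `counted` are hypothetical and invented for illustration: one has a row per site, the other has already been tallied.

```r
library(ggplot2)

# Hypothetical raw data: one row per site
raw <- data.frame(state = c("NE", "NE", "IA"))

# Hypothetical pre-counted data: one row per state, with the tally in n
counted <- data.frame(state = c("IA", "NE"), n = c(1, 2))

# Default stat = "count": ggplot tallies the rows in each group for us
p_count <- ggplot(raw, aes(x = state)) +
  geom_bar()

# stat = "identity": bar heights come straight from the y column
p_identity <- ggplot(counted, aes(x = state, y = n)) +
  geom_bar(stat = "identity")

# geom_col() is shorthand for geom_bar(stat = "identity")
p_col <- ggplot(counted, aes(x = state, y = n)) +
  geom_col()
```

All three plots should show the same bar heights; which form to use depends on whether the data arrive as individual cases or as totals.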
Grammar of graphics
There are several geometric objects (think: chart types) available in ggplot.
sites were proposed for Superfund.
# the distinct() function lets me grab just unique names in
# the data frame so we're not counting duplicate entries.
superfunds %>%
  distinct(site_name, .keep_all = TRUE) %>%
  ggplot(aes(x = fiscal_year)) +
  geom_histogram() +
  labs(title = "U.S. Superfund Sites", x = "Fiscal Year", y = "Count")
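Called with no arguments, geom_histogram() falls back to 30 bins and prints a message suggesting you choose a binwidth. For yearly data, one bin per year is often a sensible starting point. The sketch below assumes a hypothetical `years` data frame invented for illustration.

```r
library(ggplot2)

# Hypothetical fiscal years, invented for illustration only
years <- data.frame(
  fiscal_year = c(1983, 1983, 1984, 1986, 1986, 1986)
)

# binwidth = 1 groups the values into bins one year wide
p_hist <- ggplot(years, aes(x = fiscal_year)) +
  geom_histogram(binwidth = 1) +
  labs(x = "Fiscal Year", y = "Count")

p_hist
```

Trying a few different binwidth values is a quick way to check whether a pattern in the histogram is real or an artifact of the binning.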
Grammar of graphics
R makes the production of these graphics
barchart and histogram with just two lines of code.
Let's do some more hands-on
Head on over to https://tinyurl.com/unobootcamp and select the worksheet for this week.
Questions? Troubleshooting?
Next workshop: March 10, 1:30p-3p: Making Networks (CL 232)