Show the Right Numbers

ggplot’s FLOW OF ACTION

Will be handled automatically unless we say otherwise:
  Themes
  Guides
  Coordinates and Scales

We always have to specify these to draw a plot:
  Geoms and/or Stats
  Aesthetic Mappings
  Data

Grouped Data and the group aesthetic

p + geom_line(color = "gray70", mapping = aes(group = country)) +
    geom_smooth(size = 1.1, method = "loess", se = FALSE) +
    scale_y_log10(labels = scales::dollar) +
    facet_wrap(~ continent, ncol = 5) +
    labs(x = "Year", y = "GDP per capita",
         title = "GDP per capita on Five Continents")

The labs() function lets you name labels, title, subtitle, etc.
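To see what the group aesthetic changes, here is a minimal sketch on a made-up data frame. The toy data, its values, and the names p_toy, ungrouped, and grouped are illustrative assumptions, not part of the slides. It uses ggplot2's layer_data() to count how many separate lines each version would draw.

```r
library(ggplot2)

# Hypothetical toy data: three countries observed in three years.
toy <- data.frame(
  country   = rep(c("A", "B", "C"), each = 3),
  year      = rep(c(2000, 2005, 2010), times = 3),
  gdpPercap = c(1000, 1200, 1500, 3000, 3300, 3900, 800, 850, 900)
)

p_toy <- ggplot(data = toy, mapping = aes(x = year, y = gdpPercap))

# Without a grouping variable, all nine points are joined as one line.
ungrouped <- p_toy + geom_line()

# With group = country, each country gets its own line.
grouped <- p_toy + geom_line(mapping = aes(group = country))

# Count the distinct groups ggplot computed for each layer.
n_ungrouped <- length(unique(layer_data(ungrouped, 1)$group))
n_grouped   <- length(unique(layer_data(grouped, 1)$group))
```

Printing ungrouped and grouped side by side should show the familiar zig-zag in the first and three clean lines in the second.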

geoms CAN TRANSFORM DATA

gss_sm: A subset of General Social Survey questions from 2016

with(gss_sm, table(religion))
##
## Protestant   Catholic     Jewish       None      Other
##       1371        649         51        619        159

p <- ggplot(data = gss_sm, mapping = aes(x = bigregion))
p + geom_bar()

Just the one aesthetic mapping, to x.

The y-axis variable, count, is not in the data. Instead, ggplot has calculated it for us. It does this using the default stat_ function associated with geom_bar(), stat_count(). This function can compute two new variables, count and prop (short for proportion). The count statistic is the default one used.

p <- ggplot(data = gss_sm, mapping = aes(x = bigregion))
p + geom_bar(mapping = aes(y = ..prop..))

p <- ggplot(data = gss_sm, mapping = aes(x = bigregion))
p + geom_bar(mapping = aes(y = ..prop.., group = 1))
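Why does group = 1 matter? A small sketch, using a made-up four-row data frame (an assumption for illustration, not the gss_sm data): stat_count() computes prop within each group, and by default every category on the x-axis is its own group. The sketch uses after_stat(prop), the newer spelling of ..prop.., which assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical toy data: region "a" appears twice, "b" and "c" once each.
toy <- data.frame(region = c("a", "a", "b", "c"))
p_toy <- ggplot(data = toy, mapping = aes(x = region))

# after_stat(prop) is the newer spelling of ..prop.. (ggplot2 >= 3.3.0).
by_bar  <- p_toy + geom_bar(mapping = aes(y = after_stat(prop)))
overall <- p_toy + geom_bar(mapping = aes(y = after_stat(prop), group = 1))

# Inspect the values stat_count() computed for each version.
prop_by_bar  <- layer_data(by_bar, 1)$prop   # each bar is its own group
prop_overall <- layer_data(overall, 1)$prop  # one group: shares of the total
```

In the first version every bar is 100% of its own group, so all bars have height 1. Setting group = 1 puts all rows in a single group, so each bar becomes its share of the whole.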

p + geom_bar()
p + stat_count()

geom_ functions call their default stat_ functions behind the scenes (and vice versa).

p <- ggplot(data = gss_sm, mapping = aes(x = religion))
p + geom_bar()

p <- ggplot(data = gss_sm, mapping = aes(x = religion, color = religion))
p + geom_bar()

p <- ggplot(data = gss_sm, mapping = aes(x = religion, fill = religion))
p + geom_bar()

p <- ggplot(data = gss_sm, mapping = aes(x = religion, fill = religion))
p + geom_bar() + guides(fill = FALSE)

HISTOGRAMS & KERNEL DENSITIES

midwest: County-level Census data for Midwestern states

p <- ggplot(data = midwest, mapping = aes(x = area))
p + geom_histogram()
## `stat_bin()` using `bins = 30`.
## Pick better value with `binwidth`.

The default stat for this geom has to make a choice, and is letting us know we might want to override it.

p <- ggplot(data = midwest, mapping = aes(x = area))
p + geom_histogram(bins = 10)
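The message from stat_bin() names two ways to override its choice: the number of bins, or the width of each bin. A rough sketch of both, on simulated data (the toy data frame and the binwidth of 0.01 are assumptions chosen for illustration):

```r
library(ggplot2)

# Hypothetical toy data: 200 simulated areas between 0 and 0.1.
set.seed(1)
toy <- data.frame(area = runif(200, min = 0, max = 0.1))
p_toy <- ggplot(data = toy, mapping = aes(x = area))

# Override the default by asking for a number of bins ...
by_bins <- p_toy + geom_histogram(bins = 10)

# ... or by fixing the width of each bin, in the units of x.
by_width <- p_toy + geom_histogram(binwidth = 0.01)

# Either way, every observation lands in exactly one bin.
counts_bins  <- layer_data(by_bins, 1)$count
counts_width <- layer_data(by_width, 1)$count
```

Choosing binwidth is often easier to reason about, because it is stated in the same units as the variable being plotted.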

oh_wi <- c("OH", "WI")
p <- ggplot(data = subset(midwest, state %in% oh_wi),
            mapping = aes(x = percollege, fill = state))
p + geom_histogram(position = "identity", alpha = 0.4, bins = 20)

%in% is a convenient, built-in operator. subset() lets us subset our data on the fly. position = "identity" means: just plot x by its values on the scale, don’t stack or dodge.
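What %in% and subset() do can be checked without drawing anything. A minimal base R sketch, on a made-up six-row data frame standing in for the midwest counties (the rows and values are assumptions for illustration):

```r
# Hypothetical toy data standing in for the midwest counties.
toy <- data.frame(
  state      = c("OH", "WI", "IL", "OH", "MI", "WI"),
  percollege = c(19.6, 22.1, 25.0, 17.3, 20.4, 18.9)
)

oh_wi <- c("OH", "WI")

# %in% returns one TRUE/FALSE per row: is this row's state in oh_wi?
keep <- toy$state %in% oh_wi

# subset() keeps only the rows where the condition is TRUE.
toy_oh_wi <- subset(toy, state %in% oh_wi)
```

Because subset() returns an ordinary data frame, it can be passed straight to the data argument of ggplot() without saving an intermediate object.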

p <- ggplot(data = midwest, mapping = aes(x = area))
p + geom_density()

geom_histogram()’s continuous counterpart, geom_density()

p <- ggplot(data = midwest, mapping = aes(x = area, fill = state, color = state))
p + geom_density(alpha = 0.3)

AVOIDING TRANSFORMATIONS WHEN NECESSARY

titanic
##       fate    sex    n percent
## 1 perished   male 1364    62.0
## 2 perished female  126     5.7
## 3 survived   male  367    16.7
## 4 survived female  344    15.6

No counting up required? Then use stat = "identity".

p <- ggplot(data = titanic, mapping = aes(x = fate, y = percent, fill = sex))
p + geom_bar(stat = "identity", position = "dodge") +
    theme(legend.position = "top")

The theme() function controls parts of the plot that don’t belong to its “grammatical” structure.

p <- ggplot(data = titanic, mapping = aes(x = fate, y = percent, fill = sex))
p + geom_col(position = "dodge") +
    theme(legend.position = "top")

Even better: for convenience when not counting up, just use geom_col().
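A quick way to convince yourself that the two spellings draw the same bars is to compare what each layer computes. This sketch types in the four rows of the titanic table by hand, with the grouping column named sex as the plotting code expects. The object names p_t, with_bar, and with_col are illustrative.

```r
library(ggplot2)

# The four rows of the titanic table, typed in by hand.
titanic <- data.frame(
  fate    = c("perished", "perished", "survived", "survived"),
  sex     = c("male", "female", "male", "female"),
  n       = c(1364, 126, 367, 344),
  percent = c(62.0, 5.7, 16.7, 15.6)
)

p_t <- ggplot(data = titanic,
              mapping = aes(x = fate, y = percent, fill = sex))

# Two ways of saying "use the y values as given, do not count rows".
with_bar <- p_t + geom_bar(stat = "identity", position = "dodge")
with_col <- p_t + geom_col(position = "dodge")

# Bar heights as computed by each layer.
heights_bar <- layer_data(with_bar, 1)$y
heights_col <- layer_data(with_col, 1)$y
```

In both cases the bar heights are just the percent column, untouched by any counting.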

oecd_sum
## # A tibble: 57 x 5
## # Groups:   year [57]
##     year other   usa  diff hi_lo
##    <int> <dbl> <dbl> <dbl> <chr>
##  1  1960  68.6  69.9 1.30  Below
##  2  1961  69.2  70.4 1.20  Below
##  3  1962  68.9  70.2 1.30  Below
##  4  1963  69.1  70.0 0.900 Below
##  5  1964  69.5  70.3 0.800 Below
##  6  1965  69.6  70.3 0.700 Below
##  7  1966  69.9  70.3 0.400 Below
##  8  1967  70.1  70.7 0.600 Below
##  9  1968  70.1  70.4 0.300 Below
## 10  1969  70.1  70.6 0.500 Below
## # ... with 47 more rows

p <- ggplot(data = oecd_sum, mapping = aes(x = year, y = diff, fill = hi_lo))
p + geom_col() + guides(fill = FALSE) +
    labs(x = NULL, y = "Difference in Years",
         title = "The US Life Expectancy Gap",
         subtitle = "Difference between US and OECD average life expectancies, 1960-2015",
         caption = "Data: OECD. After a chart by Christopher Ingraham, Washington Post, December 27th 2017.")

CROSSTABULATION THE AWKWARD WAY

WARNING! There’s nothing wrong with the code on the next few slides. If you go searching online for how to make a proportional bar chart with ggplot, you’ll see answers like this. But doing it this way is confusing, and I find it much easier to work in a slightly different way. So I won’t cover this approach in class. I’m including it here so you can see why it’s awkward.

p <- ggplot(data = gss_sm, mapping = aes(x = religion, color = religion))
p + geom_bar()

p <- ggplot(data = gss_sm, mapping = aes(x = religion, fill = religion))
p + geom_bar() + guides(fill = FALSE)

p <- ggplot(data = gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar()

Counts are easy.

Position adjustments don’t give us the view we want.

p <- ggplot(data = gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar(position = "fill")

p <- ggplot(data = gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar(position = "dodge", mapping = aes(y = ..prop..))

p <- ggplot(data = gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar(position = "dodge", mapping = aes(y = ..prop.., group = religion))

Still not right! Also: hard to read

Time to take a step back.

p <- ggplot(data = gss_sm, mapping = aes(x = religion))
p + geom_bar(position = "dodge", mapping = aes(y = ..prop.., group = bigregion)) +
    facet_wrap(~ bigregion, ncol = 2)

SURELY THINGS CAN BE EASIER THAN THIS?

TRANSFORM AND SUMMARIZE FIRST THEN SEND CLEAN TABLES TO ggplot

CROSSTABULATION

Row percents / Row Marginals

           Protestant  Catholic  Jewish  None  Other    NA  Total
Northeast        32.4      33.2     5.5  23.0    5.7   0.2    100
Midwest          46.8      24.7     0.4  22.6    4.7   0.7    100
South            61.8      15.2     1.0  16.2    4.8   1.0    100
West             37.7      24.5     1.6  28.5    7.6   0.2    100

Column percents / Column Marginals

           Protestant  Catholic  Jewish  None  Other    NA
Northeast        11.5      25.0    52.9  18.1   17.6   5.6
Midwest          23.7      26.5     5.9  25.4   20.8  27.8
South            47.4      24.7    21.6  27.5   31.4  61.1
West             17.4      23.9    19.6  29.1   30.2   5.6
Total             100       100     100   100    100   100

Total percents

           Protestant  Catholic  Jewish  None  Other
Northeast         5.5       5.7     0.9   3.9      1
Midwest          11.3         6     0.1   5.5    1.2
South            22.7       5.6     0.4   5.9    1.7
West              8.3       5.4     0.3   6.3    1.7
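The three kinds of table differ only in what each cell is divided by. A small base R sketch with made-up counts (the two-by-two matrix is an assumption for illustration, not the GSS data) shows the three denominators using prop.table():

```r
# Hypothetical counts, two regions by two religions.
counts <- matrix(
  c(30, 10,
    20, 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(region   = c("Northeast", "South"),
                  religion = c("Protestant", "Catholic"))
)

# Row percents: divide each cell by its row total.
row_pct <- round(100 * prop.table(counts, margin = 1), 1)

# Column percents: divide each cell by its column total.
col_pct <- round(100 * prop.table(counts, margin = 2), 1)

# Total percents: divide each cell by the grand total.
total_pct <- round(100 * prop.table(counts), 1)
```

Row percents answer "within this region, how is religion distributed?", while column percents answer "within this religion, how is region distributed?". The same counts give different numbers depending on which question is being asked.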

dplyr lets you manipulate tables in a series of steps, or pipeline

dplyr draws on the logic of database queries, where the focus is managing and summarizing tables

group_by(): Group the data at the level we want, such as "Religion by Region" or "Authors by Publications by Year".

filter() rows, select() columns: Filter or select pieces of the data. This gets us the subset of the table we want to work on.

mutate(): Mutate the data by creating new variables at the current level of grouping. Mutating adds new columns to the table.

summarize(): Summarize the grouped data. This creates new variables at a higher level of grouping. For example, we might calculate means with mean() or counts with n(). This results in a smaller, summary table, which we might do more things with if we want.

Create a pipeline of tabular transformations with the pipe operator %>%

REORGANIZING TABLES WITH dplyr

rel_by_region <- gss_sm %>%
    group_by(bigregion, religion) %>%
    summarize(n = n()) %>%
    mutate(freq = n / sum(n),
           pct = round((freq * 100), 1))
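The step that makes this pipeline work is easy to miss: summarize() peels off the last grouping level, so the table that reaches mutate() is still grouped by bigregion, and sum(n) is taken within each region. A minimal sketch on a made-up six-respondent table (toy, rel_toy, and check are illustrative names and data, not from the slides):

```r
library(dplyr)

# Hypothetical toy survey: one row per respondent.
toy <- data.frame(
  bigregion = c("South", "South", "South", "South", "West", "West"),
  religion  = c("Protestant", "Protestant", "Protestant", "Catholic",
                "Protestant", "None")
)

rel_toy <- toy %>%
  group_by(bigregion, religion) %>%  # group at the region-by-religion level
  summarize(n = n()) %>%             # count rows; result stays grouped by bigregion
  mutate(freq = n / sum(n),          # sum(n) is taken within each bigregion
         pct  = round(freq * 100, 1))

# Check: within each region the percentages should add up to 100.
check <- rel_toy %>% summarize(total = sum(pct))
```

Recent versions of dplyr print a message about the remaining grouping after summarize(); that message is describing exactly this behavior. With real data, rounding can leave a region's total a tenth of a point off 100.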
