Working with pipes: Computational Pipelines, R.W. Oldford
Pipes
French surrealist painter Rene Magritte’s 1929 painting “The Treachery of Images” “The famous pipe. How people reproached me for it! And yet, could you stuff my pipe? No, it’s just a representation, is it not? So if I had written on my picture ‘This is a pipe’, I’d have been lying!”
Pipes
Plumbing: these too are representations of pipes, and pipe connectors
Pipes, connectors, pipelines
Put them together and you get pipelines, with a variety of different connectors. The resulting pipelines can be large and complex.
Computational pipes
Have some function/module which takes some input, performs some actions on it (transformations, summarizing, adding information, etc.) and produces output.

If we have several of these, we can connect them one to another in sequence to produce a “pipeline” of modules or steps in the processing of the original input.

The connected components form a “pipeline” through which the original input “flows”, with some processing/transformation of the data occurring at each step.
Computational pipelines
A simple metaphor (viz. that of laying pipes end to end):
◮ data passes through and is processed by a set of computational steps serially linked so that the output of one becomes the input of the next
◮ E.g. the Unix operator | is called a “pipe”: ls -R Notes | grep ".pdf" | sort | more
◮ E.g. a graphics rendering pipeline (from Kaufman, Fan and Petkov (2009), “Implementing the lattice Boltzmann model on commodity graphics hardware”, J. Stat. Mech.)
magrittr
The (CRAN) package authored by Stefan Milton Bache (and later joined by Hadley Wickham).

“The magrittr (to be pronounced with a sophisticated french accent) is a package with two aims: to decrease development time and to improve readability and maintainability of code. Or even shortr: to make your code smokin’ (puff puff)!”

(See outrageous French accent.)

“To achieve its humble aims, magrittr (remember the accent) provides a new pipe-like operator, %>%, with which you may pipe a value forward into an expression or function call; something along the lines of x %>% f, rather than f(x).”

    library(magrittr)
magrittr - a pipe changes program control to program flow
The basic idea is simple. Instead of writing f(x), write x %>% f

In magrittr, %>% is a binary operator which takes the value of its first operand and provides it as the first argument of its second operand.

So, instead of

    head(mtcars, n = 3)
    ##                mpg cyl disp  hp drat    wt  qsec vs am gear carb
    ## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    ## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    ## Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1

With magrittr we can use the pipe %>% to do the same thing

    mtcars %>% head(n = 3)
    ##                mpg cyl disp  hp drat    wt  qsec vs am gear carb
    ## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    ## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    ## Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
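The equivalence between the piped and the ordinary call can be checked directly. The following is a minimal sketch; the variable names piped and direct are chosen here for illustration only.

```r
library(magrittr)

# x %>% f(args) is evaluated as f(x, args)
piped  <- mtcars %>% head(n = 3)
direct <- head(mtcars, n = 3)
identical(piped, direct)

# a bare function name on the right-hand side works as well
roots <- c(1, 4, 9) %>% sqrt
roots
```

Both forms produce the same object; the pipe only changes how the call is written.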
magrittr - the payoff
By joining pipes end to end in a pipeline, through which data (output) flows and is treated by different operations along the way, a more natural program flow can often be had. For example, Hadley Wickham likes to illustrate this point using the nursery rhyme “Little Bunny foo foo” (sung to the tune of the traditional Canadian children’s song “Alouette”).

Little bunny foo foo
hopping through the forest
scooping up the field mice
bopping them on the head.
magrittr - the payoff
How do we take this natural language (English) expression
Little bunny foo foo / hopping through the forest / scooping up the field mice / bopping them on the head
and represent it in code?
    # Start with a little bunny; use forward assignment to name it
    little_bunny() -> foo_foo

Which is a more natural expression? This standard procedural version?

    # Without pipes:
    bop_on(
      scoop_up(
        hop_through(foo_foo, forest),
        field_mouse),
      head)

Or this pipelined version?

    # Or with pipes:
    foo_foo %>%
      hop_through(forest) %>%
      scoop_up(field_mouse) %>%
      bop_on(head)
Note: the assignment little_bunny() -> foo_foo appeared in neither expression.
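The bunny functions are not defined anywhere in these slides. To make the comparison runnable, here is a sketch with hypothetical stand-ins that simply build up a sentence; every function body below is an assumption made for illustration.

```r
library(magrittr)

# Hypothetical stand-ins: each step appends a description of what was done.
# deparse(substitute(...)) captures the unquoted name (forest, field_mouse, head).
little_bunny <- function() "Little bunny foo foo"
hop_through  <- function(who, where) paste(who, "hopping through the", deparse(substitute(where)))
scoop_up     <- function(who, what)  paste(who, "scooping up the", deparse(substitute(what)))
bop_on       <- function(who, where) paste(who, "bopping them on the", deparse(substitute(where)))

little_bunny() -> foo_foo

# Without pipes: read from the inside out
nested <- bop_on(scoop_up(hop_through(foo_foo, forest), field_mouse), head)

# With pipes: read from left to right
piped <- foo_foo %>%
  hop_through(forest) %>%
  scoop_up(field_mouse) %>%
  bop_on(head)

identical(nested, piped)
```

The two versions compute the same value; only the reading order differs.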
magrittr - simplified program control with pipes
An example adapted from the magrittr package vignette:
    mtcars %>%
      subset(am == 0) %>%
      aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2)) %>%
      transform(kpl = mpg %>% multiply_by(0.4251))
    ##   cyl   mpg   disp     hp drat   wt  qsec vs am gear carb      kpl
    ## 1   4 22.90 135.87  84.67 3.77 2.94 20.97  1  0 3.67 1.67 9.734790
    ## 2   6 19.12 204.55 115.25 3.42 3.39 19.21  1  0 3.50 2.50 8.127912
    ## 3   8 15.05 357.62 194.17 3.12 4.10 17.14  0  0 3.00 3.08 6.397755
Note (adapted from vignette):
1. By default the left-hand side (LHS) will be piped in as the first argument of the function appearing on the right-hand side (RHS). This is the case in the subset and transform expressions.
2. %>% may be used in a nested fashion, e.g. it may appear in expressions within arguments. This is used in the mpg to kpl conversion.
3. When the LHS is needed at a position other than the first, one can use the dot (.) as placeholder. This is used in the aggregate expression.
magrittr - simplified program control with pipes
    # Again
    mtcars %>%
      subset(am == 0) %>%
      aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2)) %>%
      transform(kpl = mpg %>% multiply_by(0.4251))
Note (continued from previous slide):
4. Note that if a dot appears naturally as part of a normal R expression (e.g. in a formula), it is not confused with marking where the data will enter the pipe component (e.g. as it appears in the aggregate expression).
5. If the dot (.) appears as the LHS of a pipeline it creates a single argument function around the pipeline. E.g. this defines the aggregator function for FUN = above.
6. Note that magrittr has built in some functions like multiply_by() to get around purely binary operators like *, though this is not strictly necessary (e.g. `*`(2, 3) is the same as 2 * 3).
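The notes above can be tried in isolation on small inputs. This is a minimal sketch; the names repeated, mean_rounded and kpl are introduced here for illustration.

```r
library(magrittr)

# Note 3: the dot places the LHS somewhere other than the first argument
repeated <- 3 %>% rep("ab", times = .)     # same as rep("ab", times = 3)

# Note 5: a pipeline starting with a dot defines a one-argument function
mean_rounded <- . %>% mean %>% round(2)
mean_rounded(c(1.234, 2.345))              # same as round(mean(c(1.234, 2.345)), 2)

# Note 6: multiply_by() is a named alias for the binary operator *
kpl <- 10 %>% multiply_by(0.4251)          # same as 10 * 0.4251 or `*`(10, 0.4251)
```

Defining the function once with . %>% ... makes it reusable, which is how the aggregator passed to FUN = is built.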
magrittr - forward assignment -> at end
The result can be assigned to another variable using forward assignment
    mtcars %>%
      subset(am == 0) %>%
      aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2)) %>%
      transform(kpl = mpg %>% multiply_by(0.4251)) ->   # FORWARD assignment
      new_mtcars

which preserves the direction of the data flow along the pipe. Unfortunately, the pipe flow ends with the forward assignment ->. E.g.

    mtcars %>%
      subset(am == 0) %>%
      aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2)) %>%
      transform(kpl = mpg %>% multiply_by(0.4251)) ->   # forward assignment
      new_mtcars %>%                                    # pipe cannot continue after assignment
      head

    Error in new_mtcars %>% head <- mtcars %>% subset(am == 0) %>% aggregate(. ~ :
      could not find function "%>%<-"

The error arises because %>% binds more tightly than ->, so R reads the statement as an assignment to the expression new_mtcars %>% head. It is a consequence of R's operator precedence rather than something magrittr itself can change.
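One possible workaround, sketched here as a stylistic suggestion rather than anything prescribed by magrittr, is to wrap the pipeline and its forward assignment in parentheses. The parenthesised assignment returns the assigned value, which can then be piped onward. The names auto_cars and first_two are hypothetical.

```r
library(magrittr)

# Parentheses force the assignment to be completed before the next pipe
(mtcars %>% subset(am == 0) -> auto_cars) %>%
  head(n = 2) ->
  first_two

first_two
```

Here auto_cars holds the full subset while first_two holds only its first two rows.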
magrittr - forward assignment ends the pipe
Instead, use the assign() function (with the dot .) to maintain a sense of flow.

    mtcars %>%
      subset(am == 0) %>%
      aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2)) %>%
      transform(kpl = mpg %>% multiply_by(0.4251)) %>%   # pipe
      assign("new_mtcars", . ) %>%                       # use assign and dot
      head
    ##   cyl   mpg   disp     hp drat   wt  qsec vs am gear carb      kpl
    ## 1   4 22.90 135.87  84.67 3.77 2.94 20.97  1  0 3.67 1.67 9.734790
    ## 2   6 19.12 204.55 115.25 3.42 3.39 19.21  1  0 3.50 2.50 8.127912
    ## 3   8 15.05 357.62 194.17 3.12 4.10 17.14  0  0 3.00 3.08 6.397755

Little bunny foo-foo again:

    # Or with pipes:
    little_bunny() %>%
      assign("foo_foo", . ) %>%
      hop_through(forest) %>%
      scoop_up(field_mouse) %>%
      bop_on(head)
magrittr - usual back assignment <- at start
Alternatively, one could reserve the variable name at the beginning:

    new_mtcars <-   # use the assignment operator <- up front
      mtcars %>%
      subset(am == 0) %>%
      aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2)) %>%
      transform(kpl = mpg %>% multiply_by(0.4251))
    head(new_mtcars)
    ##   cyl   mpg   disp     hp drat   wt  qsec vs am gear carb      kpl
    ## 1   4 22.90 135.87  84.67 3.77 2.94 20.97  1  0 3.67 1.67 9.734790
    ## 2   6 19.12 204.55 115.25 3.42 3.39 19.21  1  0 3.50 2.50 8.127912
    ## 3   8 15.05 357.62 194.17 3.12 4.10 17.14  0  0 3.00 3.08 6.397755

This comes at the conceptual cost of breaking the pipeline flow metaphor (a bit).
magrittr - compound assignment
Or, perhaps most perversely, a different pipe connector could be used: the so-called compound assignment pipe-operator %<>%. N.B. this will have the side-effect that the original data will be changed. For illustration, first copy mtcars

    new_mtcars <- mtcars   # make a copy
    head(new_mtcars, 2)
    ##               mpg cyl disp  hp drat    wt  qsec vs am gear carb
    ## Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
    ## Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4
Now use the compound assignment %<>% on new_mtcars
    new_mtcars %<>%   # Use compound assignment
      subset(am == 0) %>%
      aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2)) %>%
      transform(kpl = mpg %>% multiply_by(0.4251))
    # what has happened to new_mtcars?
    head(new_mtcars, 2)
    ##   cyl   mpg   disp     hp drat   wt  qsec vs am gear carb      kpl
    ## 1   4 22.90 135.87  84.67 3.77 2.94 20.97  1  0 3.67 1.67 9.734790
    ## 2   6 19.12 204.55 115.25 3.42 3.39 19.21  1  0 3.50 2.50 8.127912
It’s as if the data got passed through the pipe and bounced back at the end of the flow!
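The side-effect is easiest to see on a small vector. A minimal sketch follows, with variable names chosen here for illustration.

```r
library(magrittr)

x <- c(1, 4, 9)

# Standard pipe: the result is returned, x itself is untouched
y <- x %>% sqrt
x_after_standard_pipe <- x

# Compound assignment pipe: the result is written back into x
x %<>% sqrt

x_after_standard_pipe   # still the original values
x                       # now holds the square roots
```

After the compound assignment x and y hold the same values, which is the “bounce back” described above.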
magrittr - simplified program control with pipes
%>% works with any function provided it accepts the output from the pipe as its first argument. For example, we could also create a plot using pipes
    mtcars %>%
      subset(am == 0) %>%
      transform(kpl = mpg %>% multiply_by(0.4251)) %>%
      data.frame(lp100k = 100/.$kpl) %>%
      with(plot(x = wt, y = lp100k, col = "firebrick",
                xlab = "Weight", ylab = "litres per 100 km."))

[Figure: scatterplot of litres per 100 km. against Weight]
Note data.frame() here appended the column lp100k to its input data. (e.g. try data.frame(mtcars, mtcars))
magrittr - pipes and with()
Note the use of with() to move things along through the pipe. For example, will the following work?
    mtcars %>%
      subset(am == 0) %>%
      transform(kpl = mpg %>% multiply_by(0.4251)) %>%
      data.frame(lp100k = 100/.$kpl) %>%
      with(plot(x = wt, y = lp100k, col = "firebrick",
                xlab = "Weight", ylab = "litres per 100 km.")) %>%
      with(lines(x = range(wt), y = range(lp100k),
                 col = "steelblue", lwd = 2))

The penultimate piece of pipe passed NULL, the output of with(plot(...)), on to the final with(lines(...)). This does NOT work: with NULL as its data, lines() cannot determine the values of wt and of lp100k.
magrittr - pipes and with()
Note the use of with() to move things along through the pipe. For example, will the following work?

    mtcars %>%
      subset(am == 0) %>%
      transform(kpl = mpg %>% multiply_by(0.4251)) %>%
      data.frame(lp100k = 100/.$kpl) %>%
      with(plot(x = wt, y = lp100k, col = "firebrick",
                xlab = "Weight", ylab = "litres per 100 km.")) %>%
      axis(side = 3)

The penultimate piece of pipe passed on NULL as the output of the with(plot(...)). This works because axis() will happily accept (and ignore) NULL: since the argument side was named and specified, the piped NULL is matched to the next argument, at, whose default is NULL anyway (otherwise it would have failed).
magrittr - pipes and with()
Note the use of with() to move things along through the pipe. For example, will the following work?

    mtcars %>%
      subset(am == 0) %>%
      transform(kpl = mpg %>% multiply_by(0.4251)) %>%
      data.frame(lp100k = 100/.$kpl) %>%
      with({
        plot(x = wt, y = lp100k, col = "firebrick",
             xlab = "Weight", ylab = "litres per 100 km.")
        lines(x = range(wt), y = range(lp100k),
              col = "steelblue", lwd = 2)
      }) %>%
      axis(side = 3)

The last piece of pipe passed on NULL, the output of with({...}), to axis(). This works because axis(side = 3) is called on the active plot, wherever it might be, which is pretty bad programming style.
magrittr - pipes and with()
The preferred use of with():

    mtcars %>%
      subset(am == 0) %>%
      transform(kpl = mpg %>% multiply_by(0.4251)) %>%
      data.frame(lp100k = 100/.$kpl) %>%
      with({
        plot(x = wt, y = lp100k, col = "firebrick",
             xlab = "Weight", ylab = "litres per 100 km.")
        lines(x = range(wt), y = range(lp100k),
              col = "steelblue", lwd = 2)
        axis(side = 3, col = "blue", col.ticks = "forestgreen")
      })

This time the value of the with(...) would be that of axis(), which returns the locations of the axis tick marks. You really need to know what each piece of a pipe passes on to the next!
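What each piece passes on can be inspected by capturing the value of with() instead of piping it further. The sketch below draws to a null PDF device so that no file is written; the variable names are illustrative only.

```r
library(magrittr)

pdf(NULL)   # null device: nothing is written to disk

# with(plot(...)) passes on the value of plot(), which is NULL
plot_value <- mtcars %>% with(plot(wt, mpg))

# with({...}) passes on the value of its last expression, here axis()
axis_value <- mtcars %>% with({
  plot(wt, mpg)
  axis(side = 3)
})

invisible(dev.off())

is.null(plot_value)      # TRUE
is.numeric(axis_value)   # TRUE: the tick locations
```

Checking values this way before extending a pipeline avoids surprises such as piping NULL into the next step.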
magrittr - the exposition pipe %$%
Instead of using with(), an exposition pipe %$% will do nearly the same. It exposes the names of the data on the LHS for use in the RHS expression. For example, the following allows plot() to refer to the names wt and lp100k of its input data.frame.

    mtcars %>%
      subset(am == 0) %>%
      transform(kpl = mpg %>% multiply_by(0.4251)) %>%
      data.frame(lp100k = 100/.$kpl) %$%   # exposition pipe
      plot(wt, lp100k, col = "firebrick",
           xlab = "Weight", ylab = "litres per 100 km.")

[Figure: scatterplot of litres per 100 km. against Weight]
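The correspondence between %$% and with() can be checked on a computation that returns a value. A minimal sketch, using the correlation of wt and mpg purely as an illustrative choice:

```r
library(magrittr)

# Exposition pipe: column names of the LHS are visible on the RHS
r_exposed <- mtcars %$% cor(wt, mpg)

# The same computation with the standard pipe and with()
r_with <- mtcars %>% with(cor(wt, mpg))

identical(r_exposed, r_with)
```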
magrittr - the exposition pipe %$%
Note however, that the pipe has ended with the plot (no further piping). We cannot, for example, now pipe to lines() or to axis(), and expect to continue to refer to the names of the dataset.
    mtcars %>%
      subset(am == 0) %>%
      transform(kpl = mpg %>% multiply_by(0.4251)) %>%
      data.frame(lp100k = 100/.$kpl) %$%   # exposition pipe
      plot(wt, lp100k, col = "firebrick",
           xlab = "Weight", ylab = "litres per 100 km.") %>%
      # N.B. mtcars itself has no lp100k column
      lines(x = range(mtcars$wt), y = range(mtcars$lp100k),
            col = "steelblue", lwd = 2) %>%
      axis(side = 3, col = "blue", col.ticks = "forestgreen")

[Figure: scatterplot of litres per 100 km. against Weight, with an additional axis along the top]
Instead, as above, we had to reintroduce the data mtcars which breaks the pipe metaphor.
magrittr - the exposition pipe %$%
Fortunately, this problem is easily resolved using braces {}.
    mtcars %>%
      subset(am == 0) %>%
      transform(kpl = mpg %>% multiply_by(0.4251)) %>%
      data.frame(lp100k = 100/.$kpl) %$%   # exposition pipe
      {
        plot(x = wt, lp100k, col = "firebrick",
             xlab = "Weight", ylab = "litres per 100 km.")
        lines(x = range(wt), y = range(lp100k),
              col = "steelblue", lwd = 2)
        axis(side = 3, col = "blue", col.ticks = "forestgreen")
      }

[Figure: scatterplot of litres per 100 km. against Weight, with a line and an additional axis along the top]
magrittr - the exposition pipe %$% and braces {}
Can the piping continue? Of course. Just be mindful of the last output . . . it might not be what you need. Remember, the value passed on from the braces is that of their last expression.

    mtcars %>%
      subset(am == 0) %>%
      transform(kpl = mpg %>% multiply_by(0.4251)) %>%
      data.frame(lp100k = 100/.$kpl) %$%   # exposition pipe
      {
        plot(x = wt, lp100k, col = "firebrick",
             xlab = "Weight", ylab = "litres per 100 km.")
        lines(x = range(wt), y = range(lp100k),
              col = "steelblue", lwd = 2)
        axis(side = 3, col = "blue", col.ticks = "forestgreen")
      } %>%
      print

[Figure: scatterplot of litres per 100 km. against Weight, with a line and an additional axis along the top]

    ## [1] 2.5 3.0 3.5 4.0 4.5 5.0 5.5

which are just . . . the numeric tick locations from axis().
magrittr - the tee pipe %T>%
The tee pipe %T>% is like the usual pipe %>% except that it returns the value of the LHS instead of the value of the RHS. This can be very handy. For example,
    library(knitr)
    mtcars %$%               # exposition pipe
      lm(mpg ~ wt) %T>%      # tee pipe, fit is passed on through the next piece
      {
        plot(x = .$model$wt, y = .$model$mpg, col = "firebrick",
             main = "1974 Motor Trend magazine",
             xlab = "Weight", ylab = "miles per US gallon")
        abline(.$coef, col = "steelblue")
      } %>%                  # standard pipe
      coefficients %>%       # standard pipe
      round %>%              # standard pipe
      kable

[Figure: “1974 Motor Trend magazine”, miles per US gallon against Weight, with fitted line]

                 x
    (Intercept) 37
    wt          -5
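The difference between %T>% and %>% shows up whenever a step is used only for its side effect. In the sketch below, report() is a hypothetical helper that prints a message and returns NULL.

```r
library(magrittr)

# Hypothetical side-effect step: reports the length, returns NULL
report <- function(x) {
  message("length: ", length(x))
  NULL
}

values <- c(1, 2, 3)

# Tee pipe: report() runs, but values itself is passed on to sum()
teed <- values %T>% report %>% sum

# Standard pipe: the NULL returned by report() is passed on to sum()
not_teed <- values %>% report %>% sum
```

With the tee pipe the sum is that of values; with the standard pipe sum() only ever sees NULL.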
magrittr - the tee pipe %T>% and loon
The tee pipe %T>% is especially handy with loon plots. For example,
    set.seed(314159)
    library(loon)
    mtcars %$%   # exposition pipe
      l_plot(x = wt, y = mpg, color = cyl,
             glyph = c("ocircle", "ccircle")[am+1],
             size = hp/5,
             itemLabel = rownames(.),
             title = "1974 Motor Trend magazine",
             xlabel = "Weight (1000s of lbs)",
             ylabel = "miles per US gallon") %>%   # standard pipe, plot passed on
      l_configure("selected" = sample(c(TRUE, rep(FALSE, 5)),
                                      length(.["x"]),
                                      replace = TRUE)) %T>%
      # tee pipe, because l_scaleto_selected returns NULL, and plot needs to be passed on
      l_scaleto_selected %>%
      # standard pipe since l_configure returns plot
      l_configure("showGuides" = TRUE, "showScales" = TRUE) %>%   # standard pipe
      l_configure("showItemLabels" = TRUE) %T>%
      # tee pipe because a layer would be returned by l_layer_line
      l_layer_line(x = sort(.["x"]),
                   y = predict(lm(mpg ~ wt,
                                  data = data.frame(wt = .["x"], mpg = .["y"])),
                               newdata = data.frame(wt = sort(.["x"]))),
                   color = l_getOption("select-color"),
                   linewidth = 4) ->   # forward assignment of plot to p
      p
magrittr - the tee pipe %T>% and loon
produces

    plot(p)

[Figure: loon scatterplot “1974 Motor Trend magazine”, miles per US gallon against Weight (1000s of lbs)]
magrittr - lots of ways to write the code
For example,
    set.seed(314159)
    library(loon)
    mtcars %$% {   # exposition pipe to several statements
      l_hist(disp,
             xlabel = "Displacement (cubic inches)",
             linkingGroup = "mtcars") ->> h   # global assignment
      l_plot(x = wt, y = mpg,
             linkingGroup = "mtcars",
             color = cyl,
             glyph = c("ocircle", "ccircle")[am+1],
             size = hp/5,
             itemLabel = rownames(.),
             title = "1974 Motor Trend magazine",
             xlabel = "Weight (1000s of lbs)",
             ylabel = "miles per US gallon",
             showGuides = TRUE, showScales = TRUE, showItemLabels = TRUE,
             selected = disp < median(disp))
    } %T>%   # tee pipe because next expression returns NULL
      l_scaleto_selected %T>%   # tee pipe again ... why?
      l_layer_line(x = sort(.["x"][.["selected"]]),
                   y = predict(loess(mpg ~ wt,
                                     data = data.frame(wt = .["x"], mpg = .["y"]),
                                     # fit only the selected observations
                                     subset = .["selected"]),
                               newdata = data.frame(wt = sort(.["x"][.["selected"]]))),
                   color = l_getOption("select-color"),
                   linewidth = 4, dash = c(10,4),
                   index = "end"   # bottommost layer
      ) ->   # forward assignment
      p
magrittr - the tee pipe %T>% and loon
From the previous plots we can still make adjustments and then export the results to a grid graphic.
    h["showStackedColors"] <- TRUE
    l_scaleto_world(p)
    # Get the grobs necessary for grid.arrange
    gh <- loonGrob(h)
    gp <- loonGrob(p)
    library(gridExtra)   # contains some friendly grid functions like grid.arrange
    grid.arrange(gh, gp, nrow = 1)

[Figure: histogram of Displacement (cubic inches) beside the scatterplot “1974 Motor Trend magazine”, miles per US gallon against Weight (1000s of lbs)]
magrittr - same approach (sort of) with base graphics
For example,
    set.seed(314159)
    mtcars %$% {   # exposition pipe to several statements
      savePar <- par(mfrow = c(1,2))
      hist(disp, xlab = "Displacement (cubic inches)")
      plot(x = wt, y = mpg, col = cyl,
           pch = c(19, 21)[am+1],
           cex = hp/50,   # divide by 50 now
           main = "1974 Motor Trend magazine",
           xlab = "Weight (1000s of lbs)",
           ylab = "miles per US gallon")
      orderX <- order(wt)
      lines(x = sort(wt),
            # fit all observations
            y = predict(loess(mpg ~ wt,
                              data = data.frame(wt = wt[orderX], mpg = mpg[orderX]))),
            col = "grey", lwd = 4, lty = 2)
      par(savePar)
    }   # no assignment, no tee pipe

[Figure: “Histogram of disp” against Displacement (cubic inches), beside “1974 Motor Trend magazine”, miles per US gallon against Weight (1000s of lbs)]
On using magrittr pipes
1. Pipes connect a left hand side expression, LHS, to a right hand side expression, RHS, as in

    LHS %pipe% RHS

where the result of the LHS expression is passed as the first argument to the RHS expression. The result of the LHS can be referenced as a dot . in the RHS.

2. Pipelines are a series of connected pipes:

    expr_1 %pipe% expr_2 %pipe% expr_3 %pipe% ... %pipe% expr_k

These are evaluated left to right in pairs as in

    ((...((expr_1 %pipe% expr_2) %pipe% expr_3) %pipe% ...) %pipe% expr_k)
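The left-to-right pairing can be confirmed by adding the parentheses explicitly. A small sketch with illustrative variable names:

```r
library(magrittr)

x <- c(16, 81)

chained <- x %>% sqrt %>% sqrt %>% sum         # as usually written
grouped <- ((x %>% sqrt) %>% sqrt) %>% sum     # explicit pairwise grouping
nested  <- sum(sqrt(sqrt(x)))                  # ordinary nested calls

c(chained, grouped, nested)
```

All three expressions evaluate the same sequence of function calls and so return the same value.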
On using magrittr pipes
3. There are four different pipes: %>%, %T>%, %$%, and %<>%
◮ Standard: LHS %>% RHS
   ◮ result of RHS is passed on
◮ Tee: LHS %T>% RHS
   ◮ result of LHS is passed on (the result of RHS is discarded)
◮ Exposition: LHS %$% RHS
   ◮ names of result of LHS are exposed to RHS
   ◮ result of RHS is passed on
◮ Compound assignment: LHS %<>% RHS
   ◮ result of RHS is passed on
   ◮ result of pipeline is finally assigned to LHS
   ◮ changes LHS by side-effect (e.g. try iris[,1:4] %<>% sqrt)
On using magrittr pipes
4. Pipelines are most easily understood when it is essentially the same object being pushed through the pipes.

◮ Example: data construction pipeline

    mtcars %>%
      subset(am == 0) %>%
      transform(lp100k = 100 /(mpg * 0.4251)) ->
      autoTransData

◮ Example: model pipeline

    autoTransData %$%
      lm(lp100k ~ wt) %>%
      predict(interval = "prediction") ->
      autoTransFit
On using magrittr pipes
4. Continued

◮ Example: plot pipeline (mainly implicit pipeline)

    data.frame(autoTransData, autoTransFit)[order(autoTransData[, "wt"]),] %T>%
      with({
        plot(wt, lp100k,
             ylim = extendrange(c(lwr, fit, upr)),
             xlab = "Weight (1000s of lbs)",
             ylab = "Litres per 100 kilometres",
             main = "Cars with automatic transmissions",
             col = "firebrick", pch = 19)
        lines(wt, fit, col = "steelblue")
        lines(wt, lwr, col = "firebrick", lty = 2)
        lines(wt, upr, col = "firebrick", lty = 2)
      }) ->
      autoTransDataFit

[Figure: “Cars with automatic transmissions”, Litres per 100 kilometres against Weight (1000s of lbs), with fitted line and prediction limits]
On using magrittr pipes
4. Continued

◮ Example: loon plot pipeline (more obvious pipeline)

    data.frame(autoTransData, autoTransFit)[order(autoTransData[, "wt"]),] %>%
      with({
        l_plot(x = wt, y = lp100k,
               linkingGroup = "automatic transmissions",
               xlabel = "Weight (1000s of lbs)",
               ylabel = "Litres per 100 kilometres",
               title = "Cars with automatic transmissions",
               color = "firebrick", glyph = "circle") %T>%
          l_layer_line(x = wt, y = fit, color = "steelblue", index = "end") %T>%
          l_layer_line(x = wt, y = lwr, dash = c(5,5), color = "firebrick", index = "end") %T>%
          l_layer_line(x = wt, y = upr, dash = c(5,5), color = "firebrick", index = "end") %T>%
          l_scaleto_world
      }) -> p
    plot(p)

[Figure: loon plot “Cars with automatic transmissions”, Litres per 100 kilometres against Weight (1000s of lbs)]
On using magrittr pipes
When should you use pipelines?
◮ Not as a general programming style.
◮ More for data analysis, data wrangling, . . .
   ◮ track your analysis in easy to read pieces
   ◮ use many sets of small pipelines
   ◮ helps you understand and identify chunks of analysis
   ◮ interrupt the pipe anywhere to make sure you are getting what you intended
   ◮ edit and re-run (true for any commands in a file)
   ◮ provides reusable chunks that might be adapted and applied to different data and analyses
   ◮ if the pipeline becomes too difficult to read, it probably needs to be separated into different pieces
A pipeline model for statistical graphics
Lee Wilkinson’s monumental The Grammar of Graphics begins with a pipeline model for constructing statistical graphics: Each step in the pipeline transforms its input to produce output for the next step. The order of steps is essential, though not all need be there for every plot. Because the pipeline consists of separate components, the final graphic that is rendered can be simply and sometimes dramatically changed by making changes to a single component in the pipeline.
ggplot2 – a grammar of graphics for R
library(ggplot2)
Inspired by Wilkinson’s “Grammar of Graphics”, ggplot2 is a “layered” grammar of graphics.
Much like Wilkinson’s original grammar, ggplot2 uses a pipeline model for its graphics construction in that a plot is built in an ordered series of steps, where each step operates on the output of its immediate predecessor in the line. Departing from the grammar, ggplot2 slightly mixes metaphors in that each step in the pipeline can (typically) be thought of as adding a layer to all that preceded it. From the ggplot2 book:
"The layered grammar of graphics (Wickham 2009) builds on Wilkinson’s grammar, focussing on the primacy of layers and adapting it for embedding within R. In brief, the grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data and is drawn on a specific coordinate system. Facetting can be used to generate the same plot for different subsets of the dataset. It is the combination of these independent components that make up a graphic."
Notationally, the components of the pipeline appear in sequence connected one to the next via an intervening + sign, thus emphasizing each as an addition of a layer (or of some further processing of the plot). Note that the + sign mixes the concepts of layer and of pipeline.
Example - South African heart disease
Consider the SAheart data from the package ElemStatLearn. This is a sample from a retrospective study of heart disease in males from a high-risk region of the Western Cape, South Africa. There are 462 cases and 10 variates. The first few observations (cases) are shown below.

    sbp tobacco  ldl adiposity famhist typea obesity alcohol age chd
    160   12.00 5.73     23.11 Present    49   25.30   97.20  52   1
    144    0.01 4.41     28.61  Absent    55   28.87    2.06  63   1
    118    0.08 3.48     32.28 Present    52   29.14    3.81  46   0
    170    7.50 6.41     38.03 Present    51   31.99   24.26  58   1
    134   13.60 3.50     27.78 Present    60   25.99   57.34  49   1
    132    6.20 6.47     36.21 Present    62   30.77   14.14  45   0

For example, sbp denotes “systolic blood pressure”, ldl “low density lipoprotein cholesterol”, famhist “family history of heart disease”, age “age at onset” (in years), and chd indicates whether the patient has coronary heart disease or not (a response).
(see help(SAheart, package="ElemStatLearn") for details)
Constructing a plot - the pipeline
In the grammar of graphics, a plot processes each component in turn
ggplot(data = SAheart)
First the data
Constructing a plot - pipeline
In the grammar of graphics, a plot processes each component in turn
ggplot(data = SAheart) + aes( x = age, y = chd)
[Figure: empty plot with axes age and chd]
Then the mapping of the data to plot “aesthetics”
Constructing a plot - pipeline
In the grammar of graphics, a plot processes each component in turn
ggplot(data = SAheart) + aes( x = age, y = chd) + geom_point()
[Figure: points, chd against age]
Then the geometry.
Constructing a plot - pipeline
In the grammar of graphics, a plot processes each component in turn
ggplot(data = SAheart) + aes( x = age, y = chd) + geom_point() + geom_smooth()
[Figure: points and smooth, chd against age]
Which can have several further steps in the pipeline
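Each step of such a + pipeline returns a plot object that can be stored and extended. The sketch below uses the built-in mtcars data in place of SAheart (an assumption made so that the code runs without ElemStatLearn), and counts the layers after each step.

```r
library(ggplot2)

# mtcars stands in for SAheart here
step1 <- ggplot(data = mtcars)               # first the data
step2 <- step1 + aes(x = wt, y = mpg)        # then the mapping to aesthetics
step3 <- step2 + geom_point()                # then the geometry
step4 <- step3 + geom_smooth(method = "lm")  # and a further step

# number of layers held by the plot after each step
layer_counts <- sapply(list(step1, step2, step3, step4),
                       function(p) length(p$layers))
layer_counts
```

Only the geom_ steps add layers; the data and aes() steps change what the plot knows without drawing anything.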
Constructing a plot
Alternatively, in the grammar of ggplot2, a plot is also a sum of component layers.
ggplot(data = SAheart, mapping = aes(x = age, y = chd))
[Figure: empty plot with axes age and chd]
The base display with mapping.
Constructing a plot
Alternatively, in the grammar of ggplot2, a plot is also a sum of component layers.
ggplot(data = SAheart, mapping = aes(x = age, y = chd)) + geom_point()
[Figure: points, chd against age]
Here the + is adding layers.
Constructing a plot
Alternatively, in the grammar of ggplot2, a plot is also a sum of component layers.
ggplot(data = SAheart, mapping = aes(x = age, y = chd)) + geom_point() + geom_smooth()
[Figure: points and smooth, chd against age]
Here the + is adding layers.
Constructing a plot - separate mappings
Alternatively, we could deliberately associate only the data with the plot, forcing the mapping of the data to aesthetics within each individual component layer: ggplot(data = SAheart) + geom_point(mapping = aes(x = age, y = chd))
[Figure: points, chd against age]
The mapping is explicit for each layer.
Constructing a plot - separate mappings
What would the following plot look like?
ggplot(data = SAheart) + geom_point(mapping = aes(x = age, y = chd)) + geom_smooth()
It fails . . . why? How could it be fixed?

Cautionary note: the ggplot2 grammar mixes the two metaphors of “layers” and “pipes”. Just because an aesthetic precedes a component in the pipeline does not mean that it is available for use.
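One way to see why is to inspect where the mappings are stored. In the sketch below (again with mtcars standing in for SAheart, an assumption), the plot is constructed but not drawn, and the mapping held by the plot and by each layer is examined.

```r
library(ggplot2)

# mtcars stands in for SAheart; the plot is built but never printed
p <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg)) +
  geom_smooth()

names(p$mapping)               # plot-level mapping: nothing was supplied
names(p$layers[[1]]$mapping)   # geom_point() layer: its own x and y
names(p$layers[[2]]$mapping)   # geom_smooth() layer: nothing to work with
```

The x and y given inside geom_point() belong to that layer alone, so geom_smooth() finds neither its own mapping nor a plot-level one to inherit.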
Constructing a plot - separate mappings
Solution 1: explicitly, give the mapping for each layer:
ggplot(data = SAheart) + geom_point(mapping = aes(x = age, y = chd)) + geom_smooth(mapping = aes(x = age, y = chd))
[Figure: points and smooth, chd against age]
Constructing a plot - separate mappings
Solution 2: provide aesthetics upstream:
ggplot(data = SAheart) + geom_point(mapping = aes(x = age, y = chd)) + aes(x = age, y = chd) + geom_smooth()
[Figure: points and smooth, chd against age]
Constructing a plot - separate mappings
ggplot(data = SAheart) + geom_point(mapping = aes(x = age, y = chd, col = famhist)) + geom_smooth(mapping = aes(x = age, y = chd))
[Figure: chd against age; legend famhist (Absent, Present)]
Constructing a plot - shared and separate mappings
ggplot(data = SAheart) + aes(group = famhist) + geom_point(mapping = aes(x = age, y = chd)) + geom_smooth(mapping = aes(x = age, y = chd))
[Figure: chd against age]
Constructing a plot - shared and separate mappings
ggplot(data = SAheart, mapping = aes(group = famhist)) + geom_point(mapping = aes(x = age, y = chd, col = famhist)) + geom_smooth(mapping = aes(x = age, y = chd))
[Figure: chd against age; legend famhist (Absent, Present)]
Constructing a plot - shared and separate mappings
ggplot(data = SAheart, mapping = aes(group = famhist, col = famhist)) + geom_point(mapping = aes(x = age, y = chd)) + geom_smooth(mapping = aes(x = age, y = chd))
[Figure: chd against age; legend famhist (Absent, Present)]
Constructing a plot
Alternatively, we could split the plot into two pieces by facetting:
ggplot(data = SAheart, mapping = aes(x = age, y = chd)) + geom_point(col="steelblue", size = 3, alpha = 0.4) + geom_smooth(method = "loess", col = "steelblue") + facet_wrap(~famhist)
[Figure: chd against age in two facets, Absent and Present]
Components of the layered grammar
In the grammar of ggplot2, a plot is a structured combination of:
◮ a dataset,
◮ a set of mappings from variates to aesthetics,
◮ one or more layers, each composed of
   ◮ a geometric object,
   ◮ a statistical transformation,
   ◮ a position adjustment, and
   ◮ (optionally) its own dataset and aesthetic mappings
◮ a scale for each aesthetic mapping,
◮ a coordinate system,
◮ a facetting specification
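Most of these components are usually filled in by defaults. As a sketch of what a fully spelled-out plot might look like, the following names each component explicitly, using mtcars as a stand-in dataset (an assumption) and ggplot2's layer() function in place of the usual geom_point() shortcut.

```r
library(ggplot2)

p <- ggplot(data = mtcars,                      # a dataset
            mapping = aes(x = wt, y = mpg)) +   # mappings from variates to aesthetics
  layer(geom = "point",                         # a layer: geometric object,
        stat = "identity",                      #   statistical transformation,
        position = "identity") +                #   position adjustment
  scale_x_continuous() +                        # a scale for each aesthetic mapping
  scale_y_continuous() +
  coord_cartesian() +                           # a coordinate system
  facet_null()                                  # a facetting specification (none)

length(p$layers)   # one layer
```

Writing geom_point() instead of the layer() call would give the same single layer with the same defaults.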
Geometric objects
There are a variety of geometric objects that can be added to a plot
◮ geom_abline(), geom_hline(), geom_vline(), geom_curve(), geom_segment(), geom_step()
◮ geom_label(), geom_text()
◮ geom_point(), geom_smooth(), geom_crossbar(), geom_errorbar(), geom_errorbarh(), geom_linerange(), geom_pointrange()
◮ geom_rect(), geom_raster(), geom_area(), geom_ribbon(), geom_tile()
◮ geom_bar(), geom_col()
◮ geom_dotplot(), geom_boxplot(), geom_histogram(), geom_freqpoly(), geom_density(), geom_violin(), geom_quantile(), geom_qq()
◮ geom_bin2d(), geom_density2d(), geom_hex()
◮ geom_contour()
◮ geom_map(), geom_polygon()
Each of these will have their own arguments including mapping, data, stat, et cetera.
Geometric objects - adding to plots
Beginning with a plot, different geometric objects may be added. For example:

    p <- ggplot(data = SAheart, mapping = aes(x = tobacco, y = sbp))
    p

[Figure: empty plot with axes tobacco and sbp]
Geometric objects - points and density
Beginning with a plot, different geometric objects may be added. For example:
p + geom_point() + geom_density_2d(lwd = 1.5, col = "steelblue")
[Figure: sbp versus tobacco, points with 2d density contours]
Geometric objects - histogram
h <- ggplot(data = SAheart, mapping = aes(x = adiposity))
h + geom_histogram(bins = 10, fill = "steelblue", col = "black", alpha = 0.5)
[Figure: histogram of adiposity, vertical axis count]
Note that had we tried to layer the histogram on top of p, it would have inherited from p a y aesthetic (namely y = sbp) which does not make sense for a histogram.
Geometric objects - histogram
h + geom_histogram(mapping = aes(y = ..density..), bins = 10,
                   fill = "steelblue", col = "black", alpha = 0.5)
[Figure: histogram of adiposity, vertical axis density]
A y aesthetic that does make sense for a histogram is ..density.. which forces the scaling of the vertical axis so that the histogram has unit area. Note that the x aesthetic was inherited from h.
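The unit-area claim can be checked numerically. The sketch below is an illustration under assumptions: it uses the built-in faithful data rather than SAheart, and writes the mapping as after_stat(density), the spelling recent versions of ggplot2 use for ..density...

```r
library(ggplot2)

# Assumption: the built-in faithful data stand in for SAheart.
hd <- ggplot(data = faithful, mapping = aes(x = waiting)) +
  geom_histogram(mapping = aes(y = after_stat(density)), bins = 10)

# Each row of the layer data is one bar: xmin and xmax give its width,
# density gives its height on the density scale.
hist_bars <- layer_data(hd, 1)
total_area <- sum(hist_bars$density * (hist_bars$xmax - hist_bars$xmin))
total_area
```

Summing height times width over the bars should give 1, up to floating point error.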
Geometric objects - density scale histogram
Provided we supply a y aesthetic mapping, a histogram could therefore be added to p as well.
p + geom_histogram(mapping = aes(x = adiposity, y = ..density..), bins = 10,
                   fill = "steelblue", col = "black", alpha = 0.5)
[Figure: density-scale histogram of adiposity drawn on p; the axis labels still read tobacco and sbp]
Note:
◮ the change in vertical scale matches the histogram
◮ the axis labels do not match the aesthetics of the histogram (though the tick marks and values happen to)
Because this is only a grammar, it is as easy to make silly visualizations as it is silly sentences.
Geometric objects - layering effect
The order of layering (on top of h now) matters:
h + geom_histogram(mapping = aes(y = ..density..), bins = 10,
                   fill = "steelblue", col = "black", alpha = 0.5) +
  geom_density(mapping = aes(y = ..density..), fill = "grey", alpha = 0.5)
[Figure: histogram of adiposity with the density estimate layered on top, vertical axis density]
Note that the y aesthetic had to be repeated here . . .
Geometric objects - layering effect
Switch the order of addition:
h + geom_density(mapping = aes(y = ..density..), fill = "grey", alpha = 0.5) +
  geom_histogram(mapping = aes(y = ..density..), bins = 10,
                 fill = "steelblue", col = "black", alpha = 0.5)
[Figure: density estimate of adiposity with the histogram layered on top, vertical axis density]
Note that the aesthetics need to be repeated here . . .
Geometric objects - bar charts
ggplot(SAheart) +
  geom_bar(mapping = aes(x = factor(chd), fill = famhist)) +
  labs(x = "chd", title = "South African heart disease") +
  coord_flip()
[Figure: "South African heart disease", horizontal bar chart of count by chd, filled by famhist (Absent, Present)]
Which makes you wonder how the data were collected.
Geometric objects
A different scatterplot
p2 <- ggplot(data = SAheart, mapping = aes(x = sqrt(age), y = sbp))
p2 + geom_point()
[Figure: sbp versus sqrt(age)]
Geometric objects
Note that each geometric object has its own arguments and properties that can be set.
p2 + geom_point(col = "red", size = 3, pch = 21, fill = "yellow", alpha = 0.5) +
  geom_smooth(method = "loess", col = "steelblue", lty = 2, lwd = 1.5, alpha = 0.2)
[Figure: sbp versus sqrt(age) with a loess smooth]
Geometric objects
Aesthetics apply to every point individually.
p2 + geom_point(mapping = aes(size = obesity),
                fill = "steelblue", col = "black", pch = 21, alpha = 0.4) +
  geom_smooth(method = "loess", col = "yellow", lty = 2, lwd = 1.5, alpha = 0.2)
[Figure: sbp versus sqrt(age), point size by obesity, with a loess smooth]
Geometric objects
Aesthetics apply to every point individually.
p2 + geom_point(mapping = aes(size = obesity, fill = tobacco),
                col = "black", pch = 21, alpha = 0.4) +
  geom_smooth(method = "loess", col = "yellow", lty = 2, lwd = 1.5, alpha = 0.2)
[Figure: sbp versus sqrt(age), point size by obesity and fill by tobacco, with a loess smooth]
Geometric objects
The data may change with each layer
heartAttack <- SAheart[, "chd"] == 1
hAplot <- p2 + geom_point(data = SAheart[heartAttack, ],
                          mapping = aes(size = obesity),
                          alpha = 0.4, col = "black", pch = 21, fill = "steelblue")
hAplot
[Figure: sbp versus sqrt(age), point size by obesity]
Geometric objects
The data may change with each layer
qboth <- hAplot + geom_point(data = SAheart[!heartAttack, ],  # Not heartAttack
                             mapping = aes(size = obesity),
                             alpha = 0.4, col = "black", pch = 21, fill = "firebrick")
qboth
[Figure: sbp versus sqrt(age), point size by obesity]
Geometric objects
The data may change with each layer
qboth +
  geom_smooth(data = SAheart[heartAttack, ], method = "loess",
              col = "steelblue", alpha = 0.4) +
  geom_smooth(data = SAheart[!heartAttack, ], method = "loess",
              col = "firebrick", alpha = 0.4)
[Figure: sbp versus sqrt(age), point size by obesity, with two loess smooths]
Geometric objects
The data may change with each layer
qboth + geom_smooth(method = "loess")
[Figure: sbp versus sqrt(age), point size by obesity, with a single loess smooth]
Note smooth is using all of the data here.
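Which data each layer uses can be checked directly. The following is a minimal sketch, assuming the built-in mtcars data in place of SAheart and am == 1 in place of chd == 1; one layer is given its own subset while the other inherits the plot's data.

```r
library(ggplot2)

# Assumption: mtcars stands in for SAheart and am == 1 for chd == 1.
manual <- mtcars[, "am"] == 1
lp <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point(data = mtcars[manual, ], size = 3) +  # layer 1: its own subset
  geom_point(pch = 3)                              # layer 2: inherits all of mtcars

# Number of rows each layer draws.
n_layer1 <- nrow(layer_data(lp, 1))
n_layer2 <- nrow(layer_data(lp, 2))
c(layer1 = n_layer1, layer2 = n_layer2)
```

A layer with no data argument falls back on the data given to ggplot(), which is why the smooth above is fitted to all of the observations.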
Geometric objects
The data may change with each layer
qboth + geom_smooth(mapping = aes(color = factor(chd)), method = "loess")
[Figure: sbp versus sqrt(age), point size by obesity, smooths coloured by factor(chd)]
Here the smooth is separate for each colour given by chd as factor. Note ggplot’s default colours.
Geometric objects
The colours can be coordinated by relying on the original data and using chd as a factor:
p2 + geom_point(mapping = aes(size = obesity, fill = factor(chd)),
                col = "black", pch = 21, alpha = 0.4) +
  geom_smooth(mapping = aes(col = factor(chd)),
              method = "loess", lwd = 1.5, alpha = 0.2)
[Figure: sbp versus sqrt(age), point size by obesity, fill and smooth colour by factor(chd)]
Here the smooth is separate for each colour given by chd as factor.
Scales
A map from the domain of data values to the range of some aesthetic (e.g. colour, size, axis ranges, . . . ).
p2 + geom_point(mapping = aes(size = obesity, fill = factor(chd)),
                col = "black", pch = 21, alpha = 0.4) +
  geom_smooth(mapping = aes(col = factor(chd)),
              method = "loess", lwd = 1.5, alpha = 0.2) +
  scale_fill_manual("chd", values = c("steelblue", "firebrick")) +
  scale_color_manual("chd", values = c("steelblue", "firebrick"))
[Figure: sbp versus sqrt(age), point size by obesity, fill and smooth colour by chd]
. . . gets your own “scale” values for colour and for fill.
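To see a scale acting as a map from data values to aesthetic values, one can look at the layer data after the scale has been applied. This is a small sketch under assumptions: mtcars replaces SAheart and am replaces chd.

```r
library(ggplot2)

# Assumption: mtcars stands in for SAheart and am for chd.
sp <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point(mapping = aes(fill = factor(am)), pch = 21, size = 3) +
  scale_fill_manual("am", values = c("steelblue", "firebrick"))

# After the scale is applied, the fill column holds colour names
# rather than the original 0/1 data values.
fills <- layer_data(sp, 1)$fill
table(fills)
```

The values are matched to the factor levels in order, so the first level ("0") is drawn in the first colour.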
Scales
A map from the domain of data values to the range of some aesthetic (e.g. colour, size, axis ranges, . . . ).
p2 + geom_point(mapping = aes(size = obesity, fill = factor(chd)),
                col = "black", pch = 21, alpha = 0.4) +
  geom_smooth(mapping = aes(col = factor(chd)),
              method = "loess", lwd = 1.5, alpha = 0.2) +
  scale_fill_manual("chd", values = c("steelblue", "firebrick")) +
  scale_color_manual("chd", values = c("steelblue", "firebrick")) +
  scale_size("obesity", breaks = seq(0, 100, 5))
[Figure: sbp versus sqrt(age), fill and smooth colour by chd, point size by obesity with legend breaks every 5 units]
. . . additionally gets your own “scale” values for point size (which is proportional to area).
Scales
A map from the domain of data values to the range of some aesthetic (e.g. colour, size, axis ranges, . . . ).
p2 + geom_point(mapping = aes(size = obesity, fill = factor(chd)),
                col = "black", pch = 21, alpha = 0.4) +
  geom_smooth(mapping = aes(col = factor(chd)),
              method = "loess", lwd = 1.5, alpha = 0.2) +
  scale_fill_manual("chd", values = c("steelblue", "firebrick")) +
  scale_color_manual("chd", values = c("steelblue", "firebrick")) +
  scale_size_area("obesity", breaks = seq(0, 100, 5))
[Figure: sbp versus sqrt(age), fill and smooth colour by chd, point size by obesity with legend breaks every 5 units]
. . . Now a zero value gives a zero area.
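The difference between scale_size() and scale_size_area() shows up when the data contain a zero. The sketch below uses a small made-up data frame (toy and its column v are hypothetical, not from the slides) and compares the sizes each scale assigns.

```r
library(ggplot2)

# Hypothetical toy data: v includes a zero so the two scales can be compared.
toy <- data.frame(x = 1:3, y = 1:3, v = c(0, 1, 4))
base_plot <- ggplot(data = toy, mapping = aes(x = x, y = y, size = v)) +
  geom_point()

# Sizes assigned to the points under each scale.
size_default <- layer_data(base_plot + scale_size(), 1)$size
size_area    <- layer_data(base_plot + scale_size_area(), 1)$size

data.frame(v = toy$v, scale_size = size_default, scale_size_area = size_area)
```

With scale_size() the smallest observed value still gets a visible point, whereas scale_size_area() anchors a data value of zero at size zero.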
Position scales
There are two position scales: horizontal (x) and vertical (y).
p + geom_point(alpha = 0.5, size = 1) +
  scale_x_continuous(limits = c(0, 40)) +
  scale_y_continuous(limits = c(75, 225))
[Figure: sbp versus tobacco]
Position scales
There are two position scales: horizontal (x) and vertical (y).
p + geom_point(alpha = 0.5, size = 1) + xlim(0,40) + ylim(75,225)
[Figure: sbp versus tobacco]
Position scales
There are two position scales: horizontal (x) and vertical (y).
p + aes(x = tobacco + 1) + geom_point(alpha = 0.5, size = 1) + scale_x_log10()
[Figure: sbp versus tobacco + 1, horizontal axis on a log10 scale]
Coordinates
This is the coordinate system in which the positions are to be plotted. We have already seen coord_flip() which swaps the x and y axes. There are many others; the aspect ratio, for example, is fixed using coord_fixed():
ggplot(SAheart, aes(obesity, adiposity)) + geom_point() + coord_fixed(ratio = 1)
[Figure: adiposity versus obesity]
Here the aspect ratio is fixed so that one unit on the x axis is drawn the same length as one unit on the y axis.
Coordinates
This is the coordinate system in which the positions are to be plotted. We have already seen coord_flip() which swaps the x and y axes. There are many others; the aspect ratio, for example, is fixed using coord_fixed():
ggplot(SAheart, aes(obesity, adiposity)) + geom_point() + coord_fixed(ratio = 0.5)
[Figure: adiposity versus obesity]
Here the aspect ratio is fixed so that one unit on the y axis is drawn half as long as one unit on the x axis.
Coordinates
One coordinate system that is used is coord_polar() which, contrary to what its name suggests, does not calculate a polar coordinate transformation but rather treats one of the two positions as defining an angle and the other as defining the radius.
ggplot(SAheart, aes(obesity, adiposity)) +
  geom_point() +
  geom_smooth() +
  coord_polar(theta = "x")
[Figure: adiposity versus obesity in polar coordinates, obesity as angle]
which, arguably, is a pretty weird plot but does demonstrate how coordinate systems are abstracted out as part of the grammar. Consequently coord_polar() should be used with considerable caution.
Coordinates
Arguably overly complicated, one use of coord_polar() is to construct a pie chart. This is just a bar chart expressed using coord_polar(). First the bar chart
barChart <- ggplot(SAheart, aes(x = factor(1), fill = famhist)) +
  geom_bar(width = 1) +
  xlab("")
barChart
[Figure: single stacked bar of count, filled by famhist (Absent, Present)]
Coordinates
Arguably overly complicated, one use of coord_polar() is to construct a pie chart. This is just a bar chart expressed using coord_polar(). Now the pie chart
barChart + coord_polar(theta = "y")
[Figure: pie chart of count, filled by famhist (Absent, Present)]
Be careful with coord_polar() and bar charts; it is easy to produce some very silly pointless charts.
Positions
A bar chart with two variates. Default position is “stack”
barChart2 <- ggplot(SAheart, aes(x = factor(chd), fill = famhist)) +
  geom_bar(position = "stack") +
  xlab("chd")
barChart2
[Figure: stacked bar chart of count by chd, filled by famhist (Absent, Present)]
which stacks the two colours in the same bar.
Positions
To place the colours beside each other rather than stack them, change the position to “dodge”
barChart3 <- ggplot(SAheart, aes(x = factor(chd), fill = famhist)) +
  geom_bar(position = "dodge") +
  xlab("chd")
barChart3
[Figure: dodged bar chart of count by chd, filled by famhist (Absent, Present)]
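What the position adjustment changes can be read off the bar coordinates in the layer data. The sketch below is illustrative only and assumes mtcars in place of SAheart, with am standing in for chd and vs for famhist.

```r
library(ggplot2)

# Assumption: mtcars stands in for SAheart, am for chd and vs for famhist.
bar_base <- ggplot(data = mtcars,
                   mapping = aes(x = factor(am), fill = factor(vs)))

stacked <- layer_data(bar_base + geom_bar(position = "stack"), 1)
dodged  <- layer_data(bar_base + geom_bar(position = "dodge"), 1)

# Stacked bars start where the previous colour ends;
# dodged bars all start at zero and sit side by side.
stacked[, c("x", "xmin", "xmax", "ymin", "ymax")]
dodged[, c("x", "xmin", "xmax", "ymin", "ymax")]
```

The counts are the same in both cases; only the rectangle coordinates (ymin and ymax for "stack", xmin and xmax for "dodge") are adjusted.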
Statistical transformations - stat
These transformations often summarize data in some manner (e.g. by counting, by averaging, etc.). Some statistical functions operate “behind the scenes”:
◮ stat_count() in geom_bar()
◮ stat_bin() in geom_freqpoly(), geom_histogram()
◮ stat_bin2d() in geom_bin2d()
◮ stat_bindot() in geom_dotplot()
◮ stat_binhex() in geom_hex()
◮ stat_boxplot() in geom_boxplot()
◮ stat_contour() in geom_contour()
◮ stat_quantile() in geom_quantile()
◮ stat_smooth() in geom_smooth()
◮ stat_sum() in geom_count()
but may also be called directly (outside the geom_)
Statistical transformations - stat
Other stats have no corresponding geom_ function:
◮ stat_ecdf(): compute an empirical cumulative distribution plot.
◮ stat_function(): compute y values from a function of x values.
◮ stat_summary(): summarise y values at distinct x values.
◮ stat_summary2d(), stat_summary_hex(): summarise binned values.
◮ stat_qq(): perform calculations for a quantile-quantile plot.
◮ stat_spoke(): convert angle and radius to position.
◮ stat_unique(): remove duplicated rows.
Statistical transformations - example
Adding some statistical summary information to the scatterplot p2
p2 + geom_point(mapping = aes(size = obesity, fill = factor(chd)),
                col = "black", pch = 21, alpha = 0.4) +
  stat_summary(geom = "point", fun = "median", orientation = "x",
               col = "yellow", size = 2, pch = 19)
[Figure: sbp versus sqrt(age), point size by obesity, fill by factor(chd), with median points]
Adds the median of the ys at each observed x.
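The summary that stat_summary() computes can be compared with the same calculation done by hand. This is a sketch under assumptions: mtcars replaces SAheart, with cyl as the x variate and mpg as the y variate.

```r
library(ggplot2)

# Assumption: mtcars stands in for SAheart, with cyl as x and mpg as y.
mp <- ggplot(data = mtcars, mapping = aes(x = cyl, y = mpg)) +
  stat_summary(geom = "point", fun = "median", orientation = "x")

# Medians computed by the stat: one row per distinct x value.
from_stat <- layer_data(mp, 1)
from_stat <- from_stat[order(from_stat$x), ]

# The same medians computed directly in base R.
by_hand <- tapply(mtcars$mpg, mtcars$cyl, median)

cbind(x = from_stat$x,
      stat_median = from_stat$y,
      base_median = as.numeric(by_hand))
```

The two median columns should agree, since the stat simply applies the function named in fun to the y values at each distinct x.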
Statistical transformations - example
Alternatively use stat = "summary" in geom_point(). Also add connecting lines to the scatterplot p2
p2 + geom_point(mapping = aes(size = obesity, fill = factor(chd)),
                col = "black", pch = 21, alpha = 0.4) +
  geom_point(stat = "summary", fun = "median", orientation = "x",
             col = "yellow", size = 2, pch = 19) +
  stat_summary(geom = "line", fun = "median", orientation = "x",
               col = "yellow", size = 1, pch = 19)
[Figure: sbp versus sqrt(age), point size by obesity, fill by factor(chd), with median points joined by lines]
Adds the median of the ys at each observed x.
Miscellaneous
◮ Can also facet in a matrix grid using facet_grid()
◮ position can also be “jitter” (best for scatterplots)
◮ there is a function called theme() which is how the appearance of all non-data plot components is changed.
  ◮ E.g. it is possible to turn that grey background grid off via theme() (though it seems a lot of work)
◮ there is a function qplot() or quickplot() which is more like a base graphics plot() call and so may be easier to use than following the ggplot2 grammar via ggplot() + ...
◮ ggsave() will save the most recent ggplot.
Miscellaneous
Note: to plot time series (objects of class ts) you need the ggfortify package and then use autoplot().
library(ggfortify)
autoplot(sunspots)
[Figure: time series plot of sunspots]
Similarly, library(ggmap) for raster maps from get_map()
Working with magrittr pipes
The grammar model of ggplot has + behaving much like a pipe in magrittr and can be used with the pipes of magrittr.
library(magrittr)
mtcars %>%
  subset(am == 0) %>%
  transform(kpl = mpg %>% multiply_by(0.4251)) %>%
  data.frame(lp100k = 100/.$kpl) %>%
  ggplot(aes(x = wt, y = lp100k)) +
  geom_point(mapping = aes(col = vs)) +
  ylab("Litres per 100 kilometres") +
  ggtitle("Gas usage") -> p
p
[Figure: "Gas usage", Litres per 100 kilometres versus wt, points coloured by vs]
Working with magrittr pipes
Note that unlike the base graphics plots, but like grid and loon plots, ggplots are structures that can be passed on with the pipes.
library(magrittr)
mtcars %>%
  subset(am == 0) %>%
  transform(kpl = mpg %>% multiply_by(0.4251)) %>%
  data.frame(lp100k = 100/.$kpl) %>%
  ggplot(aes(x = wt, y = lp100k)) +
  geom_point(mapping = aes(col = vs)) +
  ylab("Litres per 100 kilometres") +
  ggtitle("Gas usage") %>% print
## $title
## [1] "Gas usage"
##
## attr(,"class")
## [1] "labels"
[Figure: "Gas usage", Litres per 100 kilometres versus wt, points coloured by vs]
Note that typically a ggplot data structure is not completely constructed until it has been printed (or forced).
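The printed output above shows only the title labels because, in R, user-defined operators such as %>% bind more tightly than +, so only the last term of the sum is piped into print. A minimal sketch of this precedence, and of parenthesising the whole sum so that the plot itself is piped, follows; it assumes the built-in mtcars data and uses layer_data() merely as a convenient function to pipe the plot into.

```r
library(magrittr)
library(ggplot2)

# %>% binds more tightly than +, so only the last term is piped.
a <- 1 + 2 %>% sqrt      # evaluated as 1 + sqrt(2)
b <- (1 + 2) %>% sqrt    # evaluated as sqrt(3)

# The same holds for ggplot layers: wrap the whole sum in parentheses
# so that the complete plot, not just its last component, is piped.
n_rows <- (ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()) %>%
  layer_data(1) %>%
  nrow()

c(a = a, b = b, n_rows = n_rows)
```

Assigning the plot first (as with -> p earlier) and then piping p avoids the issue as well.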
Interactive ggplots via loon.ggplot package
Coming soon to CRAN!! ggplots can be made interactive using loon.ggplot() Github: “https://github.com/great-northern-diver/loon.ggplot”
A different piping package pipeR
An alternative pipeline package is pipeR which has a single pipe operator %>>%
◮ simplifies syntax (no looking for the one or two symbols that are different between %)
◮ pipes to first argument
◮ pipes to dot . (handles formula as well)
◮ single piece of new syntax ~ allows s
◮ “tee pipe” by identifying pipe components as only for side effect
◮ allows assignment within the pipeline
◮ allows simple printing of comment strings