ggmap
Recent Advances in Spatial Visualization
David J. Kahle, Ph.D.
Department of Statistical Science, Baylor University
Available on CRAN!
components of this talk
1. ggplot2 and the layered grammar of graphics
2. ggmap = ggplot2 + online mapping sources
3. Visualizing spatial interpolation
4. Updates to geocode
David J. Kahle, Ph.D. 45th Symposium on the Interface (2015): Data Science, Baylor University
ggplot2
and the layered grammar of graphics
the diamonds dataset
head(diamonds)
##   carat       cut color clarity depth table price    x    y    z
## 1  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
## 2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
## 3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
## 4  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
## 5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
## 6  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48
Color (chart image; see Source)
Clarity
  Clarity                        Grade
  Flawless                       FL
  Internally Flawless            IF
  Very Very Slightly Included    VVS1, VVS2
  Very Slightly Included         VS1, VS2
  Slightly Included              SI1, SI2
  Included                       I1, I2, I3
Cut
  Ideal > Premium > Very Good > Good > Fair
Everything else
  carat, price, x, y, z, table width
  also depth = z / diameter
       table = table width / x
Source : http://www.affordablediamondsonline.com/diamondshapes.jpg, http://beaufortsjeweler.com/images/diamond_color_chart.gif, http://www.am-diamonds.com/images/anatomy_diamond.gif
the diamonds dataset is pretty big, so let’s thin it down:
library(dplyr)
set.seed(1)
d <- diamonds %>% sample_n(100)
the ggplot2 framework
plot(d$carat, d$price)
library(ggplot2)
theme_set(theme_bw())
qplot(d$carat, d$price)
qplot(carat, price, data = d)
qplot(carat, price, data = d, geom = "point")
qplot(carat, price, data = d, geom = "smooth")
qplot(carat, price, data = d, geom = "density2d")
qplot(carat, price, data = d, geom = c("density2d", "point"))
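The qplot() calls above can also be written with ggplot() and explicit geom functions, which may be useful because qplot() has been deprecated since ggplot2 3.4.0. The following is a minimal sketch, not the talk's own code: the object names are made up for illustration, and the subsample is drawn with base R, so its rows may differ from the sample_n() result on the slides.

```r
library(ggplot2)

set.seed(1)
# thin the data to 100 rows, as in the talk (base R sampling, an assumption)
d <- diamonds[sample(nrow(diamonds), 100), ]

# one geom per plot
p_point   <- ggplot(d, aes(carat, price)) + geom_point()
p_smooth  <- ggplot(d, aes(carat, price)) + geom_smooth()
p_density <- ggplot(d, aes(carat, price)) + geom_density_2d()

# two geoms: the counterpart of geom = c("density2d", "point")
p_both <- ggplot(d, aes(carat, price)) + geom_density_2d() + geom_point()
```

Printing any of these objects (for example `p_both`) draws the plot.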
list of geoms
abline bar bin2d blank boxplot contour crossbar density density2d dotplot errorbar freqpoly hex histogram hline jitter linerange map path point pointrange polygon quantile raster rect ribbon rug segment smooth step text tile violin vline
The geom you use depends on your situation!
See docs.ggplot2.org/current/ for details
list of geoms, by variable type
1d discrete geom
1d continuous geoms
discrete-continuous geoms
continuous-continuous geoms
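As a rough illustration of the four categories, here is one geom for each, applied to the full diamonds data. Which geoms the slides placed in each category is not recoverable from the text, so the pairings below are an assumption chosen from common usage.

```r
library(ggplot2)

# 1d discrete: counts of a categorical variable
p_1d_discrete   <- ggplot(diamonds, aes(cut)) + geom_bar()

# 1d continuous: distribution of a numeric variable
p_1d_continuous <- ggplot(diamonds, aes(price)) + geom_histogram(bins = 30)

# discrete-continuous: a numeric variable within each category
p_disc_cont     <- ggplot(diamonds, aes(cut, price)) + geom_boxplot()

# continuous-continuous: two numeric variables
p_cont_cont     <- ggplot(diamonds, aes(carat, price)) + geom_point()
```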
aesthetics
qplot(carat, price, data = d)               # x y
qplot(carat, price, data = d, color = cut)  # x y color
qplot(carat, price, data = d, size = cut)   # x y size
aesthetic mappings and the grammar of graphics
Variables in dataset: carat, cut, color, clarity, price, table, depth, x measurement, y measurement, z measurement
Visual elements of graphic (aesthetics): x axis, y axis, z axis?, size, shape, color, fill, alpha
adj alpha angle bg cex col color colour fg fill group hjust label linetype lower lty lwd max middle min pch radius sample shape size srt upper vjust weight width x xend xmax xmin xintercept y yend ymax ymin yintercept z
different aesthetics apply to different geoms
aesthetic scales
A scale is a function that maps levels of a variable to aesthetic values
[Diagram: the levels of cut (Fair, Good, Very Good, Premium, Ideal) -> scale -> aesthetic values]
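To make the idea of a scale concrete, the sketch below supplies an explicit mapping from the levels of cut to colors and then inspects the colors ggplot2 actually assigned. The particular colors and the name cut_colors are arbitrary choices for illustration, not taken from the talk.

```r
library(ggplot2)

# hypothetical level -> color lookup; any valid colors would do
cut_colors <- c(
  "Fair" = "red", "Good" = "orange", "Very Good" = "gold",
  "Premium" = "green", "Ideal" = "blue"
)

p <- ggplot(diamonds, aes(carat, price, color = cut)) +
  geom_point() +
  scale_color_manual(values = cut_colors)

# ggplot_build() exposes the data after the scale has been applied
built <- ggplot_build(p)
assigned <- built$data[[1]]$colour
table(diamonds$cut, assigned)
```

The cross-tabulation shows each level of cut landing on a single color, which is what the scale does.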
setting aesthetics
for when you don’t want to map them to a variable
qplot(carat, price, data = d, size = cut, alpha = I(.4))
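In the ggplot() interface the same distinction shows up as whether an aesthetic appears inside aes() (mapped to a variable) or outside it (set to a constant). A minimal sketch, with the color "steelblue" and the object names chosen arbitrarily:

```r
library(ggplot2)

# mapped: color varies with the cut variable and goes through a scale
p_mapped <- ggplot(diamonds, aes(carat, price)) +
  geom_point(aes(color = cut))

# set: every point gets the same color and transparency, no scale involved
p_set <- ggplot(diamonds, aes(carat, price)) +
  geom_point(color = "steelblue", alpha = 0.4)

n_mapped <- length(unique(ggplot_build(p_mapped)$data[[1]]$colour))
n_set    <- length(unique(ggplot_build(p_set)$data[[1]]$colour))
```

Here n_mapped counts one color per level of cut, while n_set is 1.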
aesthetic scales
Aesthetic effect by variable type
Aesthetic    Discrete                      Continuous
Color/Fill   Rainbow of colors             Color gradient
Size         Discrete size steps           Continuous size gradient (Bubble Charts)
Shape        Different shape for points
Alpha        Discrete transparency steps   Continuous transparency gradient
layered ggplot specifications
Both of the following make the same graphic
qplot(carat, price, data = d)
ggplot() +   # lays down a background and axes
  layer(     # lays down a layer with only the geom
    data = d,
    mapping = aes(x = carat, y = price),
    geom = "point",
    stat = "identity", position = "identity"  # recent ggplot2 releases require stat and position
  )
qplot(...) = ggplot() + layer(...)
ggplot2 demands the aesthetic scales stay consistent across layers
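# recent ggplot2 releases require stat and position in each layer() call
ggplot() +
  layer(
    data = d, mapping = aes(x = carat, y = price),
    geom = "point", stat = "identity", position = "identity"
  ) +
  layer(
    data = d, mapping = aes(x = carat, y = price),
    geom = "density2d", stat = "density2d", position = "identity"
  )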
# the two-layer graphic, written with geom_*() functions
ggplot() +
  geom_point(aes(carat, price), data = d) +
  geom_density2d(aes(carat, price), data = d)
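One way to check that a layer() call and a geom_*() call describe the same layer is to compare the data each plot produces when built. This is a sketch under assumptions: the subsample is drawn with base R rather than sample_n(), and stat and position are passed explicitly because recent ggplot2 releases require them in layer().

```r
library(ggplot2)

set.seed(1)
d <- diamonds[sample(nrow(diamonds), 100), ]

# explicit layer specification
p_layer <- ggplot() +
  layer(
    data = d, mapping = aes(x = carat, y = price),
    geom = "point", stat = "identity", position = "identity"
  )

# geom shortcut
p_geom <- ggplot() + geom_point(aes(carat, price), data = d)

built_layer <- ggplot_build(p_layer)$data[[1]]
built_geom  <- ggplot_build(p_geom)$data[[1]]

# TRUE when both plots place the same points at the same positions
same_points <- identical(as.numeric(built_layer$x), as.numeric(built_geom$x)) &&
  identical(as.numeric(built_layer$y), as.numeric(built_geom$y))
```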
ggmap
ggmap – the key ideas
the meuse dataset
From the meuse dataset documentation (in sp)
This data set gives locations and topsoil heavy metal concentrations, along with a number of soil and landscape variables at the flood plain of the river Meuse, near the village of Stein (NL). Heavy metal concentrations are from composite samples of an area of approximately 15 m x 15 m.
Field data were collected by Ruud van Rijn and Mathieu Rikken; compiled for R by Edzer Pebesma; description extended by David Rossiter
head(m)
##        lon      lat cadmium copper lead zinc
## 1 5.758536 50.99156    11.7     85  299 1022
## 2 5.757863 50.99109     8.6     81  277 1141
## 3 5.759855 50.99089     6.5     68  199  640
## 4 5.761746 50.99041     2.6     81  116  257
## 5 5.761863 50.98903     2.8     48  117  269
## 6 5.763040 50.98839     3.0     61  137  281
+ other variables
library(sp)
library(rgdal)  # rgdal has since been retired from CRAN; recent sp versions rely on sf for spTransform()
data(meuse)
coordinates(meuse) <- c("x", "y")
proj4string(meuse) <- CRS("+init=epsg:28992")
meuse <- spTransform(meuse, CRS("+proj=longlat +datum=WGS84"))
when meuse is a SpatialPointsDataFrame –
plot(meuse)
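Since rgdal is no longer available from CRAN, the same reprojection can be sketched with the sf package. This is an alternative route, not the talk's code; it assumes sf is installed, and the names meuse_sf, meuse_ll, and lonlat are made up for the example. It yields a plain data.frame with lon and lat columns.

```r
library(sp)  # only used here for the meuse data
library(sf)

data(meuse)  # plain data.frame; x and y are in the Dutch RD system (EPSG:28992)

# attach the coordinate reference system, then reproject to longitude/latitude
meuse_sf <- st_as_sf(meuse, coords = c("x", "y"), crs = 28992)
meuse_ll <- st_transform(meuse_sf, 4326)

# pull the coordinates back out into ordinary columns
lonlat <- st_coordinates(meuse_ll)
m <- data.frame(
  lon = lonlat[, "X"],
  lat = lonlat[, "Y"],
  st_drop_geometry(meuse_ll)
)
head(m[, c("lon", "lat", "cadmium", "copper", "lead", "zinc")])
```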
qmplot
# after the code from the last slide
m <- data.frame(meuse@coords, meuse@data)
names(m)[1:2] <- c("lon", "lat")
when meuse is a data.frame –
qmplot(lon, lat, data = m)
## Using zoom = 15...
## Map from URL : http://tile.stamen.com/toner-lite/15/16904/10970.png
qmplot(lon, lat, data = m, zoom = 14)
qmplot(lon, lat, data = m,
  size = zinc, zoom = 14,
  source = "google", maptype = "satellite",
  alpha = I(.75), color = I("green"),
  legend = "topleft", darken = .2
) + scale_size("Zinc (ppm)")
qmplot with an additional layer
qmplot(lon, lat, data = m,
  size = zinc, zoom = 14,
  source = "google", maptype = "satellite",
  alpha = I(.75), color = I("green"),
  legend = "topleft", darken = .2
) + scale_size("Zinc (ppm)") +
  geom_density2d(color = "lightblue", size = .25)
Not helpful, but illustrative…
the base layer formerly the ggmap()
Long form
ggmap(get_map(...), ...)
Short form
qmap(...)
Longest form
ggmap(get_googlemap(...), ...)
ggmap(get_openstreetmap(...), ...)
ggmap(get_stamenmap(...), ...)
ggmap(get_cloudmademap(...), ...)
ggmap(get_navermap(...), ...)
[Diagram: get_map() returns a "ggmap" object (raster with meta data); ggmap() draws it in a graphics window; qmap() does both steps]
ggmap and the base layer: the Stamen Design maptypes
list of geoms useful in ggmap
abline bar bin2d blank boxplot contour crossbar density density2d dotplot errorbar freqpoly hex histogram hline jitter linerange map path point pointrange polygon quantile raster rect ribbon rug segment smooth step text tile violin vline
See docs.ggplot2.org/current/ for details
some modeling
the meuse.grid dataset
meuse.grid (in sp) is a grid over the same space
library(gstat)
data(meuse.grid)
coordinates(meuse.grid) <- c("x", "y")
proj4string(meuse.grid) <- CRS("+init=epsg:28992")
meuse.grid <- spTransform(meuse.grid, CRS("+proj=longlat +datum=WGS84"))
plot(meuse.grid)
# mg is not defined on the slides; assumed here to be built the same way as m
mg <- data.frame(meuse.grid@coords, meuse.grid@data)
names(mg)[1:2] <- c("lon", "lat")
qmplot(lon, lat, data = mg, zoom = 14, legend = "topleft") +
  geom_point(
    aes(size = zinc), shape = I(15),
    data = m, color = "red"
  ) +
  scale_size("Zinc (ppm)")
What are the zinc levels at
inverse distance weighted interpolation
idw_mod <- idw(log(zinc) ~ 1, meuse, meuse.grid, idp = 2.5)
mg$idw <- exp(idw_mod@data$var1.pred)
scale <- scale_color_gradient("Predicted\nZinc (ppm)",
  low = "green", high = "red", limits = c(100, 1850)
)
qmplot(lon, lat, data = mg, shape = I(15), color = idw,
  zoom = 14, legend = "topleft", alpha = I(.75), darken = .4
) + scale
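For intuition about what inverse distance weighting computes, here is a small base R sketch. It is not gstat's implementation: idw_predict is a hypothetical helper, it uses plain Euclidean distance and every observation for every prediction, and the three observation points are made up. Each prediction is a weighted average of the observed values with weights d^(-idp), so nearer observations count more, and a larger idp makes the surface more local.

```r
# Hypothetical helper: inverse distance weighted prediction at new locations.
# coords, newcoords: matrices with one row per location; values: observed values.
idw_predict <- function(coords, values, newcoords, idp = 2.5) {
  coords <- as.matrix(coords)
  newcoords <- as.matrix(newcoords)
  apply(newcoords, 1, function(s0) {
    # Euclidean distance from the target location to every observation
    d <- sqrt(rowSums(sweep(coords, 2, s0)^2))
    # a target that coincides with an observation takes that observed value
    if (any(d == 0)) return(mean(values[d == 0]))
    w <- d^(-idp)
    sum(w * values) / sum(w)
  })
}

# made-up toy data: three observations on the corners of a right triangle
obs <- cbind(x = c(0, 1, 0), y = c(0, 0, 1))
z   <- c(10, 20, 30)

# first target sits on an observation; second is equidistant from all three
pred <- idw_predict(obs, z, rbind(c(0, 0), c(0.5, 0.5)))
```

In the talk the interpolation is applied to log(zinc) and the result is exponentiated; the same pattern would apply here by passing log values and calling exp() on the predictions.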
linear regression
lin <- krige(log(zinc) ~ 1, meuse, meuse.grid)
mg$lin <- exp(lin@data$var1.pred)
qmplot(lon, lat, data = mg, shape = I(15), color = lin,
  zoom = 14, legend = "topleft", alpha = I(.75), darken = .4
) + scale
trend surface analysis
tsa <- krige(log(zinc) ~ 1, meuse, meuse.grid, degree = 2)
mg$tsa <- exp(tsa@data$var1.pred)
qmplot(lon, lat, data = mg, shape = I(15), color = tsa,
  zoom = 14, legend = "topleft", alpha = I(.75), darken = .4
) + scale
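ordinary kriging
vgram <- variogram(log(zinc) ~ 1, meuse)
vgramFit <- fit.variogram(vgram, vgm(1, "Exp", .2, .1))
# the krige() call and the prediction slot are reconstructed by analogy with the universal kriging slide
ordKrige <- krige(log(zinc) ~ 1, meuse, meuse.grid, vgramFit)
mg$ordKrige <- exp(ordKrige@data$var1.pred)
qmplot(lon, lat, data = mg, shape = I(15), color = ordKrige,
  zoom = 14, legend = "topleft", alpha = I(.75), darken = .4
) + scale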
universal kriging
vgram <- variogram(log(zinc) ~ 1, meuse)
vgramFit <- fit.variogram(vgram, vgm(1, "Exp", .2, .1))
univKrige <- krige(log(zinc) ~ sqrt(dist), meuse, meuse.grid, vgramFit)
mg$univKrige <- exp(univKrige@data$var1.pred)
qmplot(lon, lat, data = mg, shape = I(15), color = univKrige,
  zoom = 14, legend = "topleft", alpha = I(.75), darken = .4
) + scale
faceting
library(dplyr); library(tidyr)
mg2 <- mg %>% select(lon, lat, idw, tsa, ordKrige, lin, univKrige)
names(mg2)[3:7] <- c("IDW", "TSA", "Ord. Krig.", "Lin. Reg.", "Uni. Krig.")
mg2 <- mg2 %>% gather("Method", "Prediction", 3:7)
qmplot(lon, lat, data = mg2, shape = I(15), color = Prediction,
  zoom = 14, alpha = I(.75), darken = .4
) + scale + facet_wrap(~ Method)
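Faceting needs the predictions in long format: one row per grid cell and method, with the method name in its own column. The sketch below shows that reshaping on a made-up two-row table using tidyr's pivot_longer(), the current counterpart of gather(); the numbers are invented purely to show the shape of the result.

```r
library(tidyr)

# made-up wide table: one column of predictions per method
wide <- data.frame(
  lon = c(5.75, 5.76),
  lat = c(50.99, 50.98),
  IDW = c(300, 450),
  TSA = c(320, 430)
)

# long table: one row per (location, method) pair
long <- pivot_longer(
  wide,
  cols = c(IDW, TSA),
  names_to = "Method",
  values_to = "Prediction"
)
long
```

With the data in this shape, facet_wrap(~ Method) draws one panel per method on a shared color scale.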
utility functions
geocode()
Geocoding: Address -> Map coordinates
geocode("one bear place, waco, texas")
Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=one+bear+place,+waco,+texas&sensor=false
##         lon      lat
## 1 -97.16013 31.41688
geocode(
  c("baylor university", "the white house"),
  source = "google"
)
##         lon      lat
## 1 -97.11844 31.54822
## 2 -77.03653 38.89768
big12
##    university      address                                           wins
##  1 baylor          One Bear Place, Waco, TX 76798                    8
##  2 iowa state      Ames, IA 50011                                    0
##  3 kansas          1450 Jayhawk Boulevard, Lawrence, KS 66045        1
##  4 kansas state    Manhattan, KS 66506                               7
##  5 oklahoma        660 Parrington Oval, Norman, OK 73019             5
##  6 oklahoma state  Stillwater, OK 74074                              4
##  7 texas           Austin, TX 78712                                  5
##  8 texas christian 2800 South University Drive, Fort Worth, TX 76129 8
##  9 texas tech      2500 Broadway, Lubbock, TX 79409                  2
## 10 west virginia   Morgantown, WV 26506                              5
geocode(address, data = big12)
big12
##    university      address                                           wins        lon      lat
##  1 baylor          One Bear Place, Waco, TX 76798                    8     -97.16013 31.41688
##  2 iowa state      Ames, IA 50011                                    0     -93.63645 42.02403
##  3 kansas          1450 Jayhawk Boulevard, Lawrence, KS 66045        1     -95.24935 38.95830
##  4 kansas state    Manhattan, KS 66506                               7     -96.58388 39.19603
##  5 oklahoma        660 Parrington Oval, Norman, OK 73019             5     -97.44528 35.21036
##  6 oklahoma state  Stillwater, OK 74074                              4     -97.06906 36.10152
##  7 texas           Austin, TX 78712                                  5     -97.73100 30.28217
##  8 texas christian 2800 South University Drive, Fort Worth, TX 76129 8     -97.36044 32.71171
##  9 texas tech      2500 Broadway, Lubbock, TX 79409                  2    -101.87063 33.58444
## 10 west virginia   Morgantown, WV 26506                              5     -79.96267 39.64528
qmplot()
route()
legs_df <- route(
  'marrs mclean science, baylor university',
  '220 south 3rd street, waco, tx 76701',
  alternatives = TRUE
)
##       m    km miles seconds minutes  hours  startLon startLat    endLon   endLat leg route
## 1    64 0.064 0.039       6    0.10 0.0016 -97.11969 31.54889 -97.12016 31.54849   1     A
## 2  1183 1.183 0.735     184    3.06 0.0511 -97.12016 31.54849 -97.12902 31.55596   2     A
## 3   129 0.129 0.080      19    0.31 0.0052 -97.12902 31.55596 -97.12805 31.55678   3     A
## 4    35 0.035 0.021      21    0.35 0.0058 -97.12805 31.55678 -97.12831 31.55700   4     A
## 5    70 0.070 0.043       6    0.10 0.0016 -97.11969 31.54889 -97.11916 31.54933   1     B
## 6   208 0.208 0.129      51    0.85 0.0141 -97.11916 31.54933 -97.12072 31.55065   2     B
## 7   224 0.224 0.139      25    0.41 0.0069 -97.12072 31.55065 -97.11915 31.55214   3     B
## 8  1003 1.003 0.623     126    2.10 0.0350 -97.11915 31.55214 -97.12612 31.55845   4     B
## 9   260 0.260 0.161      42    0.70 0.0116 -97.12612 31.55845 -97.12805 31.55678   5     B
## 10   35 0.035 0.021      14    0.23 0.0038 -97.12805 31.55678 -97.12831 31.55700   6     B
qmap('424 clay avenue, waco, tx', zoom = 15, maptype = 'hybrid',
  base_layer = ggplot(aes(x = startLon, y = startLat), data = legs_df)
) +
  geom_leg(
    aes(
      x = startLon, y = startLat,
      xend = endLon, yend = endLat,
      colour = route
    ),
    alpha = 3/4, size = 2, data = legs_df
  ) +
  facet_wrap(~ route, ncol = 2)
future directions
mutate_geocode(), more sources and failsafes
something like geom_krige()
david.kahle@gmail.com
http://sites.google.com/site/davidkahle/
Stamen Maps Watercolor