DATA VISUALIZATION WITH GGPLOT2
Case Study I Bag Plot
Data Visualization with ggplot2
ggplot2 2.0
- Write your own extensions
- Extremely flexible
- Create bag plot
- John Tukey (box plots)
- 2D box plot
Data Visualization with ggplot2
data set
> dim(df)
[1] 202   2
> head(df)
  type     Value
1    1  99.43952
2    1  99.76982
3    1 101.55871
4    1 100.07051
5    1 100.12929
6    1 101.71506
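The slides show only `dim()` and `head()` of `df`, not how it was built. A minimal sketch, assuming two simulated groups of 101 values centred near 100 and 150 (the seed, means, and default standard deviation are illustrative assumptions, not the author's code), gives a data frame of the same shape:

```r
# Hypothetical reconstruction of a data frame shaped like df.
set.seed(123)                 # assumed seed, for reproducibility only
n <- 101                      # 202 rows split over 2 types
df <- data.frame(
  type  = rep(c(1, 2), each = n),
  Value = c(rnorm(n, mean = 100), rnorm(n, mean = 150))
)
dim(df)
head(df)
```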
Data Visualization with ggplot2
2 box plots
> ggplot(df, aes(x = type, Value)) +
    geom_boxplot() +
    facet_wrap(~type, ncol = 2, scales = "free")
[Figure: box plots of Value, one panel per type (1 and 2), free scales]
Data Visualization with ggplot2
slope plot
> df$ID <- seq_len(nrow(df) / 2)
> ggplot(df, aes(x = type, Value, group = ID)) +
    geom_line(alpha = 0.3)
[Figure: slope plot of Value against type, one line per ID]
Data Visualization with ggplot2
Distribution of slope
[Figure: distribution of slope, x-axis: slope]
Box plot?
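One way to obtain the slope of each line, sketched on simulated stand-in data: because the x positions are the type codes 1 and 2, the run is 1 and the slope reduces to the difference between the two values of each ID. The names `wide` and `slope` are illustrative.

```r
# Simulated stand-in for df (assumption: the real data are not shown).
set.seed(123)
n <- 101
df <- data.frame(type  = rep(c(1, 2), each = n),
                 Value = c(rnorm(n, 100), rnorm(n, 150)))
df$ID <- seq_len(nrow(df) / 2)   # recycled: pairs row i with row i + n

# One row per ID, with the type-1 and type-2 values side by side.
wide <- merge(df[df$type == 1, ], df[df$type == 2, ],
              by = "ID", suffixes = c("_1", "_2"))

# Rise over run for each ID.
wide$slope <- (wide$Value_2 - wide$Value_1) / (wide$type_2 - wide$type_1)
summary(wide$slope)
```

A one-dimensional box plot of `wide$slope` summarises the pairs, which is the question the slide raises before moving to two dimensions.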
Data Visualization with ggplot2
2 distinct variables
> head(dat)
     group1   group2
1  99.43952 149.2896
2  99.76982 150.2569
3 101.55871 149.7533
4 100.07051 149.6525
5 100.12929 149.0484
6 101.71506 149.9550
Data Visualization with ggplot2
Scatter plot
> ggplot(dat, aes(x = group1, y = group2)) +
    geom_point()
[Figure: scatter plot of group2 against group1]
Data Visualization with ggplot2
2D density plot
> library(viridis)
> ggplot(dat, aes(x = group1, y = group2)) +
    stat_density_2d(geom = "tile", aes(fill = ..density..), contour = FALSE) +
    scale_fill_viridis()
[Figure: 2D density of group2 against group1, fill: density]
Data Visualization with ggplot2
Bag plot
> library(aplpack)
> bagplot(dat[1:2])
[Figure: bag plot of group2 against group1, annotated with hull, loop, and bag]
Data Visualization with ggplot2
aplpack
> library(aplpack)
> plot_data <- compute.bagplot(x = dat$group1, y = dat$group2)
> names(plot_data)
 [1] "center"      "hull.center" "hull.bag"    "hull.loop"
 [5] "pxy.bag"     "pxy.outer"   "pxy.outlier" "hdepths"
 [9] "is.one.dim"  "prdata"      "xy"          "xydata"
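The components returned by `compute.bagplot()` can be drawn with ordinary ggplot2 layers. The sketch below assumes that `hull.loop` and `hull.bag` are two-column coordinate matrices (x first, y second) and uses simulated data in place of `dat`; the helper `as_polygon()` and the fill colours are illustrative.

```r
library(aplpack)
library(ggplot2)

# Simulated stand-in for dat (assumption: the real data are not shown).
set.seed(123)
dat <- data.frame(group1 = rnorm(100, mean = 100),
                  group2 = rnorm(100, mean = 150))

plot_data <- compute.bagplot(x = dat$group1, y = dat$group2)

# Assumed layout: column 1 holds x coordinates, column 2 holds y coordinates.
as_polygon <- function(m) data.frame(x = m[, 1], y = m[, 2])
loop <- as_polygon(plot_data$hull.loop)
bag  <- as_polygon(plot_data$hull.bag)

# Loop first, bag on top, raw points last.
p <- ggplot(dat, aes(x = group1, y = group2)) +
  geom_polygon(data = loop, aes(x = x, y = y), fill = "grey80") +
  geom_polygon(data = bag,  aes(x = x, y = y), fill = "grey50") +
  geom_point()
p
```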
Data Visualization with ggplot2
ggplot2
> ggplot(dat, aes(x = group1, y = group2)) +
    geom_point()
[Figure: scatter plot of group2 against group1]
Data Visualization with ggplot2
ggplot2
> ggplot(dat, aes(x = group1, y = group2)) +
    stat_bag(alpha = 0.2)
[Figure: bag plot of group2 against group1 drawn with ggplot2]
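`stat_bag()` is not part of ggplot2 itself, and the slides do not show its code. As a rough sketch of how such an extension could be written with `ggproto()`, the hypothetical `StatBagSketch` below delegates to `aplpack::compute.bagplot()` and returns only the loop polygon. It assumes `hull.loop` is a two-column coordinate matrix and is not the `stat_bag()` used on the slide.

```r
library(ggplot2)

# Hypothetical stat: returns the loop polygon of a bag plot for each group.
StatBagSketch <- ggproto("StatBagSketch", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    hull <- aplpack::compute.bagplot(x = data$x, y = data$y)$hull.loop
    data.frame(x = hull[, 1], y = hull[, 2])  # assumed column order: x, y
  }
)

# Hypothetical constructor, following the usual stat_*() pattern.
stat_bag_sketch <- function(mapping = NULL, data = NULL, geom = "polygon",
                            position = "identity", na.rm = FALSE,
                            show.legend = NA, inherit.aes = TRUE, ...) {
  layer(stat = StatBagSketch, data = data, mapping = mapping, geom = geom,
        position = position, show.legend = show.legend,
        inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...))
}

# Simulated stand-in for dat.
set.seed(123)
dat <- data.frame(group1 = rnorm(100, mean = 100),
                  group2 = rnorm(100, mean = 150))

p <- ggplot(dat, aes(x = group1, y = group2)) +
  stat_bag_sketch(alpha = 0.2) +
  geom_point()
p
```

Keeping the computation in a stat and the drawing in an existing geom (`"polygon"`) means the layer inherits grouping, faceting, and aesthetics such as `alpha` without extra code.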
Data Visualization with ggplot2
Remarks
- Useful but not popular
- Poorly understood
- Learn to use ggplot2 extensions
DATA VISUALIZATION WITH GGPLOT2
Let’s practice!
DATA VISUALIZATION WITH GGPLOT2
Case Study II Weather (Part 1)
Data Visualization with ggplot2
Weather
Source: http://www.edwardtufte.com/
Data Visualization with ggplot2
present
> dim(present)
[1] 153   5
> head(present, n = 4)
  month day year temp new_day
1     1   1 2016   41       1
2     1   2 2016   37       2
3     1   3 2016   40       3
4     1   4 2016   33       4
> tail(present, n = 4)
    month day year temp new_day
148     5  28 2016   79     148
149     5  29 2016   80     149
150     5  30 2016   73     150
151     5  31 2016   76     151
Data Visualization with ggplot2
Time series
> ggplot(present, aes(x = new_day, y = temp)) +
    geom_line()
[Figure: line plot of temp against new_day for the current year]
Data Visualization with ggplot2
past
> str(past)
'data.frame':   7645 obs. of  11 variables:
 $ month    : num  1 1 1 1 1 1 1 1 1 1 ...
 $ day      : num  1 2 3 4 5 6 7 8 9 10 ...
 $ year     : num  1995 1995 1995 1995 1995 ...
 $ temp     : num  44 41 28 31 21 27 42 35 34 29 ...
 $ new_day  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ upper    : num  51 48 57 55 56 62 52 57 54 47 ...
 $ lower    : num  17 15 16 15 21 14 14 12 21 8.5 ...
 $ avg      : num  35.6 35.4 34.9 35.1 35.9 ...
 $ se       : num  2.19 1.83 2.46 2.53 1.92 ...
 $ avg_upper: num  40.2 39.2 40 40.5 39.9 ...
 $ avg_lower: num  31 31.5 29.7 29.8 31.9 ...
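The slides do not show how the summary columns of `past` were derived. A plausible sketch, assuming `upper` and `lower` are the per-day maximum and minimum across years, `avg` and `se` the per-day mean and standard error, and `avg_upper`/`avg_lower` a t-based 95% interval around the mean (all of this is inferred from the column names and printed values, and the temperatures below are simulated):

```r
# Simulated stand-in for the raw temperatures (assumption).
set.seed(1)
past <- expand.grid(new_day = 1:365, year = 1995:2015)
past$temp <- 50 - 25 * cos(2 * pi * past$new_day / 365) +
  rnorm(nrow(past), sd = 8)

# One summary row per day of the year, computed across years.
summ <- do.call(rbind, lapply(split(past, past$new_day), function(d) {
  n    <- nrow(d)
  se   <- sd(d$temp) / sqrt(n)
  half <- qt(0.975, df = n - 1) * se        # half-width of the 95% CI
  data.frame(new_day   = d$new_day[1],
             upper     = max(d$temp),
             lower     = min(d$temp),
             avg       = mean(d$temp),
             se        = se,
             avg_upper = mean(d$temp) + half,
             avg_lower = mean(d$temp) - half)
}))

# Attach the per-day summaries to every observation.
past <- merge(past, summ, by = "new_day")
str(past)
```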
Data Visualization with ggplot2
Each year separately
> ggplot(past, aes(x = new_day, y = temp, group = year)) +
    geom_line(alpha = 0.2)
[Figure: temp against new_day, one line per past year]
Data Visualization with ggplot2
present + past
> ggplot(past, aes(x = new_day, y = temp, group = year)) +
    geom_line(alpha = 0.4) +
    geom_line(data = present, aes(group = 1), col = "red")
[Figure: temp against new_day, past years in the background and the current year in red]
Data Visualization with ggplot2
Linerange
[Figure: temp against new_day, past temperatures drawn as line ranges]
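The code behind the linerange slide is not shown. A minimal sketch of the idea, assuming one vertical range per day that spans the past minimum and maximum, with the current year drawn on top (data, colours, and the `ranges` data frame are illustrative):

```r
library(ggplot2)

# Simulated stand-ins for past and present (assumption).
set.seed(1)
past <- expand.grid(new_day = 1:365, year = 1995:2015)
past$temp <- 50 - 25 * cos(2 * pi * past$new_day / 365) +
  rnorm(nrow(past), sd = 8)
present <- data.frame(new_day = 1:151)
present$temp <- 50 - 25 * cos(2 * pi * present$new_day / 365) +
  rnorm(nrow(present), sd = 8)

# Past minimum and maximum for each day of the year.
ranges <- data.frame(
  new_day = sort(unique(past$new_day)),
  lower   = as.numeric(tapply(past$temp, past$new_day, min)),
  upper   = as.numeric(tapply(past$temp, past$new_day, max))
)

p <- ggplot(ranges, aes(x = new_day)) +
  geom_linerange(aes(ymin = lower, ymax = upper), colour = "wheat2") +
  geom_line(data = present, aes(y = temp), colour = "red")
p
```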
Data Visualization with ggplot2
Records
[Figure: temp against new_day, with the current year's record days marked as points]
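Record days can be found by comparing the current year with the past extremes for the same day. The sketch below uses simulated data; the `extremes`, `cmp`, and `records` names and the `"high"`/`"low"` labels are illustrative, not taken from the slides.

```r
# Simulated stand-ins for past and present (assumption).
set.seed(2)
past <- expand.grid(new_day = 1:365, year = 1995:2015)
past$temp <- 50 - 25 * cos(2 * pi * past$new_day / 365) +
  rnorm(nrow(past), sd = 8)
present <- data.frame(new_day = 1:151)
present$temp <- 50 - 25 * cos(2 * pi * present$new_day / 365) +
  rnorm(nrow(present), sd = 8)

# Past extremes for each day of the year.
extremes <- data.frame(
  new_day = sort(unique(past$new_day)),
  lower   = as.numeric(tapply(past$temp, past$new_day, min)),
  upper   = as.numeric(tapply(past$temp, past$new_day, max))
)

# Label each current-year day that beats a past extreme.
cmp <- merge(present, extremes, by = "new_day")
cmp$record <- ifelse(cmp$temp > cmp$upper, "high",
              ifelse(cmp$temp < cmp$lower, "low", NA))
records <- cmp[!is.na(cmp$record), ]
records
```

The resulting `records` data frame can then be added as its own layer, for example with `geom_point(data = records, aes(y = temp, colour = record))`.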
Data Visualization with ggplot2
Custom legend
[Figure: temp against new_day with a custom legend: 95% CI range, Current year, Past record high, Past record low, New record high, New record low]
DATA VISUALIZATION WITH GGPLOT2
Let’s practice!
DATA VISUALIZATION WITH GGPLOT2
Case Study II Weather (Part 2)
Data Visualization with ggplot2
Up to now
[Figure: temp against new_day with a custom legend: 95% CI range, Current year, Past record high, Past record low, New record high, New record low]
Data Visualization with ggplot2
Situation
- Many data frames
- Plot summary data frame as a layer
- stat_summary()
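As an illustration of the last bullet, `stat_summary()` can compute a per-day summary inside the layer, so no separate summary data frame is needed. The sketch uses simulated data, and `range_fun` is an illustrative helper that returns the columns the linerange geom expects.

```r
library(ggplot2)

# Simulated stand-in for the past temperatures (assumption).
set.seed(3)
past <- expand.grid(new_day = 1:365, year = 1995:2015)
past$temp <- 50 - 25 * cos(2 * pi * past$new_day / 365) +
  rnorm(nrow(past), sd = 8)

# Summary function: receives the y values at one x, returns ymin and ymax.
range_fun <- function(y) data.frame(ymin = min(y), ymax = max(y))

p <- ggplot(past, aes(x = new_day, y = temp)) +
  stat_summary(fun.data = range_fun, geom = "linerange")
p
```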
Data Visualization with ggplot2
stat_historical()
> ggplot(my_data, aes(x = new_day, y = temp, fill = year)) +
    stat_historical()
[Figure: temp against new_day, historical ranges only]
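`stat_historical()` is a custom stat whose code is not shown on the slide. A rough sketch of how such a stat might be built, assuming `my_data` stacks all years, the most recent year counts as the present, and the layer draws a t-based 95% interval of the past years for each day. Unlike the slide, this hypothetical `StatHistoricalSketch` receives the year through an invented `year` aesthetic rather than `fill`.

```r
library(ggplot2)

# Hypothetical stat: NOT the stat_historical() used on the slide.
StatHistoricalSketch <- ggproto("StatHistoricalSketch", Stat,
  required_aes = c("x", "y", "year"),
  dropped_aes  = c("y", "year"),   # consumed by the summary, not passed on
  compute_group = function(data, scales) {
    past <- data[data$year != max(data$year), ]   # drop the present year
    ci <- lapply(split(past$y, past$x), function(y) {
      half <- qt(0.975, df = length(y) - 1) * sd(y) / sqrt(length(y))
      data.frame(ymin = mean(y) - half, ymax = mean(y) + half)
    })
    out <- do.call(rbind, ci)
    out$x <- as.numeric(names(ci))                # one row per day
    out
  }
)

# Hypothetical constructor, following the usual stat_*() pattern.
stat_historical_sketch <- function(mapping = NULL, data = NULL,
                                   geom = "linerange", position = "identity",
                                   na.rm = FALSE, show.legend = NA,
                                   inherit.aes = TRUE, ...) {
  layer(stat = StatHistoricalSketch, data = data, mapping = mapping,
        geom = geom, position = position, show.legend = show.legend,
        inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...))
}

# Simulated stand-in for my_data: 1995 to 2015 in full, 2016 up to day 151.
set.seed(4)
my_data <- expand.grid(new_day = 1:365, year = 1995:2016)
my_data <- my_data[!(my_data$year == 2016 & my_data$new_day > 151), ]
my_data$temp <- 50 - 25 * cos(2 * pi * my_data$new_day / 365) +
  rnorm(nrow(my_data), sd = 8)

p <- ggplot(my_data, aes(x = new_day, y = temp, year = year)) +
  stat_historical_sketch()
p
```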
Data Visualization with ggplot2
stat_present()
> ggplot(my_data, aes(x = new_day, y = temp, fill = year)) +
    stat_historical() +
    stat_present()
[Figure: temp against new_day, historical ranges plus the current year]
Data Visualization with ggplot2
stat_extremes()
> ggplot(my_data, aes(new_day, temp, fill = year)) +
    stat_historical() +
    stat_present() +
    stat_extremes(aes(colour = ..record..))
[Figure: temp against new_day, historical ranges, current year, and record points]
Data Visualization with ggplot2
Specific layers
> ggplot(my_data, aes(new_day, temp, fill = year)) +
    stat_historical() +
    # stat_present() +
    stat_extremes(aes(colour = ..record..))
[Figure: temp against new_day, historical ranges and record points, without the current-year line]
Data Visualization with ggplot2
Faceting
[Figure: temp against new_day, one panel per city: PARIS, REYKJAVIK, NEW YORK, LONDON]
DATA VISUALIZATION WITH GGPLOT2
Let’s practice!
DATA VISUALIZATION WITH GGPLOT2
Wrap-up
Data Visualization with ggplot2
- Design
- Graphical Data Analysis
- Communication & Perception
- Statistics
Data Visualization with ggplot2
Explain
Inform and Persuade
Explore
Confirm and Analyse
Data Visualization with ggplot2
Element      Description
Data         The dataset being plotted.
Aesthetics   The scales onto which we map our data.
Geometries   The visual elements used for our data.
Data Visualization with ggplot2
Element      Description
Data         The dataset being plotted.
Aesthetics   The scales onto which we map our data.
Geometries   The visual elements used for our data.
Facets       Plotting small multiples.
Statistics   Representations of our data to aid understanding.
Coordinates  The space on which the data will be plotted.
Themes       All non-data ink.
Data Visualization with ggplot2
[Figure: Yield (bushels/acre) against Year (1931, 1932), by Site: Waseca, Crookston, Morris, University Farm, Duluth, Grand Rapids]
[Figure: Total sleep time (h) against Eating habits: Carnivore, Herbivore, Insectivore, Omnivore]
Data Visualization with ggplot2
[Figure: categories Healthy-weight, Obese, Over-weight, Under-weight for values 18 to 84, fill: residual]
Data Visualization with ggplot2
[Figure: plot with axes Silt, Sand, and Clay]
[Figure: eruptions against waiting, fill: density]
[Figure: Unemployment (%)]
Data Visualization with ggplot2
[Figure: Iris Sepals (Anderson, 1936): Width against Length, by Species: setosa, versicolor, virginica]
Data Visualization with ggplot2
[Figure: bag plot of group2 against group1]
[Figure: temp against new_day with a custom legend: 95% CI range, Current year, Past record high, Past record low, New record high, New record low]
DATA VISUALIZATION WITH GGPLOT2