
Stats with geoms: Intermediate Data Visualization with ggplot2

Stats with geoms. Intermediate Data Visualization with ggplot2. Rick Scavetta, Founder, Scavetta Academy. ggplot2, course 2: Statistics, Coordinates, Facets, Data Visualization Best Practices.


  1. Stats with geoms. Intermediate Data Visualization with ggplot2. Rick Scavetta, Founder, Scavetta Academy

  2. ggplot2, course 2: Statistics, Coordinates, Facets, Data Visualization Best Practices

  3. Statistics layer. Two categories of stat_ functions: those called from within a geom, and those called independently.

  4. geom_ <-> stat_
        p <- ggplot(iris, aes(x = Sepal.Width))
        p + geom_histogram()

  5. geom_ <-> stat_
        p <- ggplot(iris, aes(x = Sepal.Width))
        p + geom_histogram()
        p + geom_bar()

  6. geom_ <-> stat_
        p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(am)))
        p + geom_bar()
        p + stat_count()

  7. The geom_/stat_ connection
        stat_bin() <-> geom_histogram(), geom_freqpoly()
        stat_count() <-> geom_bar()
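
The pairing on slides 6 and 7 can be checked directly. The following is a minimal sketch, assuming ggplot2 is installed; `layer_data()` is ggplot2's helper for extracting the data a layer computes, and the names `bar_data` and `count_data` are chosen here for illustration only.

```r
library(ggplot2)

# Same base plot as slide 6.
p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(am)))

# geom_bar() uses stat_count() by default, and stat_count() draws bars by default.
bar_data <- layer_data(p + geom_bar())
count_data <- layer_data(p + stat_count())

# Both layers compute the same per-group counts.
identical(bar_data$count, count_data$count)
```

Either call can therefore be used to build the same layer; which one reads better depends on whether the emphasis is on the shape drawn or on the statistic computed.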

  8. stat_smooth()
        ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
          geom_point() +
          geom_smooth()
        Console message: geom_smooth() using method = 'loess' and formula 'y ~ x'

  9. stat_smooth(se = FALSE)
        ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
          geom_point() +
          geom_smooth(se = FALSE)
        Console message: geom_smooth() using method = 'loess' and formula 'y ~ x'

  10. geom_smooth(span = 0.4)
        ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
          geom_point() +
          geom_smooth(se = FALSE, span = 0.4)
        Console message: geom_smooth() using method = 'loess' and formula 'y ~ x'

  11. geom_smooth(method = "lm")
        ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
          geom_point() +
          geom_smooth(method = "lm", …

  12. geom_smooth(fullrange = TRUE)
        ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
          geom_point() +
          geom_smooth(method = "lm", fullrange = TRUE…

  13. The geom_/stat_ connection
        stat_bin() <-> geom_histogram(), geom_freqpoly()
        stat_count() <-> geom_bar()
        stat_smooth() <-> geom_smooth()
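
The smoothing options from slides 8 to 12 can be combined as in the sketch below. It assumes ggplot2 is installed; `method` and `formula` are set explicitly only to suppress the console message, `se = FALSE` on the linear fit is a choice made here rather than something the slides confirm, and the object names are illustrative.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()

# LOESS smooth without the confidence ribbon, using a narrower span.
p_loess <- p + geom_smooth(method = "loess", formula = y ~ x,
                           se = FALSE, span = 0.4)

# Linear fit per species, extended across the full x range of the plot.
p_lm <- p + geom_smooth(method = "lm", formula = y ~ x,
                        se = FALSE, fullrange = TRUE)

# Layer 2 is the smooth; each species (group) gets its own fitted line.
lm_fit <- layer_data(p_lm, 2)
table(lm_fit$group)
```

With `fullrange = TRUE` every fitted line starts and ends at the same x values, so the lines extrapolate beyond the data of the individual species.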

  14. Other stat_ functions
        stat_boxplot() <-> geom_boxplot()

  15. Other stat_ functions
        stat_boxplot() <-> geom_boxplot()
        stat_bindot() <-> geom_dotplot()
        stat_bin2d() <-> geom_bin2d()
        stat_binhex() <-> geom_hex()

  16. Other stat_ functions
        stat_boxplot() <-> geom_boxplot()
        stat_bindot() <-> geom_dotplot()
        stat_bin2d() <-> geom_bin2d()
        stat_binhex() <-> geom_hex()
        stat_contour() <-> geom_contour()
        stat_quantile() <-> geom_quantile()
        stat_sum() <-> geom_count()

  17. Let's practice!

  18. Stats: sum and quantile. Rick Scavetta, Founder, Scavetta Academy

  19. Recall from course 1. Causes of over-plotting and their solutions:
        1. Large datasets: alpha-blending, hollow circles, point size
        2. Aligned values on a single axis: as above, plus change position
        3. Low-precision data: position: jitter
        4. Integer data: position: jitter

  20. Plot counts to overcome over-plotting. Causes of over-plotting, their solutions, and the solution covered here:
        1. Large datasets: alpha-blending, hollow circles, point size
        2. Aligned values on a single axis: as above, plus change position
        3. Low-precision data: position: jitter; here: geom_count()
        4. Integer data: position: jitter; here: geom_count()

  21. Low precision (& integer) data
        p <- ggplot(iris, aes(Sepal.Length, Sepal.Width))
        p + geom_point()

  22. Jittering may give the wrong impression
        p + geom_jitter(alpha = 0.5, width = 0.1, height = 0.1)

  23. geom_count()
        p + geom_count()

  24. The geom/stat connection
        geom_count() <-> stat_sum()

  25. stat_sum()
        p + stat_sum()
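
To see what geom_count() and stat_sum() compute on slides 23 to 25, the layer data can be inspected. This is a small sketch assuming ggplot2 is installed; `counts` is an illustrative name, and `n` is the computed variable that stat_sum() reports for each location.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width))

# geom_count() uses stat_sum(), which counts the observations at each
# (x, y) location and maps that count to point size.
counts <- layer_data(p + geom_count())

nrow(counts)   # number of distinct (x, y) locations
sum(counts$n)  # total number of observations across all locations
```

Because the measurements are recorded to one decimal place, many rows share a location, which is why there are fewer points than observations.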

  26. Over-plotting can still be a problem!
        ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
          geom_count(alpha = 0.4)

  27. geom_quantile()
        ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
          geom_count(alpha = 0.4)

  28. Dealing with heteroscedasticity
        library(AER)
        data(Journals)
        p <- ggplot(Journals, aes(log(price/citations), log(subs))) +
          geom_point(alpha = 0.5) +
          labs(...)
        p

  29. Using geom_quantile()
        p + geom_quantile(quantiles = c(0.05, 0.50…

  30. The geom/stat connection
        geom_count() <-> stat_sum()
        geom_quantile() <-> stat_quantile()
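
A self-contained sketch of quantile lines on heteroscedastic data follows. It uses simulated data as a stand-in for the Journals dataset (an assumption made here so that AER is not required), and the quantile values 0.05, 0.5 and 0.95 are illustrative, since the slide's own vector is truncated. Rendering the plot requires the quantreg package, which geom_quantile() relies on.

```r
library(ggplot2)

# Simulated data whose spread grows with x (hypothetical stand-in for Journals).
set.seed(42)
sim <- data.frame(x = runif(200))
sim$y <- sim$x + rnorm(200, sd = sim$x)

# Points plus fitted lines for the 5th, 50th and 95th percentiles of y given x.
q <- ggplot(sim, aes(x, y)) +
  geom_point(alpha = 0.5) +
  geom_quantile(quantiles = c(0.05, 0.5, 0.95), formula = y ~ x)
```

When the outer quantile lines fan out from the median line, the variance of y changes with x, which a single mean regression line would hide.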

  31. Ready for exercises!

  32. Stats outside geoms. Rick Scavetta, Founder, Scavetta Academy

  33. Basic plot
        ggplot(iris, aes(x = Species, y = Sepal.Length)) +
          geom_jitter(width = 0.2)

  34. Calculating statistics
        set.seed(123)
        xx <- rnorm(100)
        mean(xx)
        [1] 0.09040591
        mean(xx) + (sd(xx) * c(-1, 1))
        [1] -0.822410  1.003222

  35. Calculating statistics
        set.seed(123)
        xx <- rnorm(100)
        # Hmisc
        library(Hmisc)
        smean.sdl(xx, mult = 1)
               Mean       Lower       Upper
         0.09040591 -0.82240997  1.00322179
        # ggplot2
        mean_sdl(xx, mult = 1)
                   y     ymin     ymax
        1 0.09040591 -0.82241 1.003222

  36. stat_summary()
        ggplot(iris, aes(x = Species, y = Sepal.Length)) +
          stat_summary(fun.data = mea…, fun.args = l…
        Uses geom_pointrange() by default

  37. stat_summary()
        ggplot(iris, aes(x = Species, y = Sepal.Length)) +
          stat_summary(fun.y = mean, geom = "point") +
          stat_summary(fun.data = me…, fun.args = li…, geom = "error…, width = 0.1)
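
The two-layer pattern on slide 37 can be sketched without Hmisc by supplying a small summary function. Everything named `mean_sd` and `err` below is hypothetical and introduced only for illustration; the sketch also uses `fun` rather than `fun.y`, which assumes ggplot2 3.3.0 or later, where `fun.y` was deprecated in favour of `fun`.

```r
library(ggplot2)

# Hypothetical helper: mean plus/minus `mult` standard deviations, returned
# in the y / ymin / ymax layout that the fun.data argument expects.
mean_sd <- function(x, mult = 1) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - mult * s, ymax = m + mult * s)
}

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun.data = mean_sd, fun.args = list(mult = 1),
               geom = "errorbar", width = 0.1)

# Inspect the limits computed for the error bars (layer 2), one row per species.
err <- layer_data(p, 2)
err[, c("x", "ymin", "ymax")]
```

Any function that returns a data frame with `y`, `ymin` and `ymax` can be passed to `fun.data` in the same way.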

  38. Not recommended!

  39. 95% confidence interval
        ERR <- qt(0.975, length(xx) - 1) * (sd(xx) / sqrt(length(xx)))
        mean(xx)
        0.09040591
        mean(xx) + (ERR * c(-1, 1)) # 95% CI
        -0.09071657 0.27152838
        mean_cl_normal(xx)
        y ymin ymax
        0.09040591 -0.09071657 0.2715284
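
The confidence interval calculation on slide 39 can be wrapped in a function that returns the same y / ymin / ymax layout. The sketch below uses base R only; `mean_ci` and its `conf` argument are hypothetical names introduced here, not part of ggplot2 or Hmisc.

```r
# Hypothetical helper: sample mean with a t-based confidence interval.
mean_ci <- function(x, conf = 0.95) {
  n <- length(x)
  # Half-width: t quantile times the standard error of the mean.
  err <- qt(1 - (1 - conf) / 2, df = n - 1) * sd(x) / sqrt(n)
  data.frame(y = mean(x), ymin = mean(x) - err, ymax = mean(x) + err)
}

set.seed(123)
xx <- rnorm(100)
mean_ci(xx)
```

With the same seed as the slides, this should reproduce the mean and limits shown for mean_cl_normal(xx).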

  40. Other stat_ functions
        stat_summary(): summarize y values at distinct x values.
        stat_function(): compute y values from a function of x values.
        stat_qq(): perform calculations for a quantile-quantile plot.

  41. MASS::mammals

  42. Normal distribution
        mam.new <- data.frame(body = log10(mam…
        ggplot(mam.new, aes(x = body)) +
          geom_histogram(aes(y = ..density..)) +
          geom_rug() +
          stat_function(fun = dnorm, color = "…,
                        args = list(mean = mea…, sd = sd(ma…

  43. QQ plot
        ggplot(mam.new, aes(sample = …
          stat_qq() +
          geom_qq_line(col = "red")
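
A runnable version of the two normality checks might look like the sketch below. The slide code is truncated, so several details are assumptions: that `body` is the log10 of `MASS::mammals$body`, that the curve is drawn in red, and that the QQ plot samples the same `body` column. It also uses `after_stat(density)` in place of `..density..`, which assumes ggplot2 3.3.0 or later, and sets `bins = 30` explicitly to avoid the default-bins message.

```r
library(ggplot2)

# Assumption: the truncated data step takes log10 of mammals$body.
mam.new <- data.frame(body = log10(MASS::mammals$body))

# Histogram on the density scale with a normal curve that uses the
# sample mean and standard deviation.
p_hist <- ggplot(mam.new, aes(x = body)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30) +
  geom_rug() +
  stat_function(fun = dnorm, color = "red",
                args = list(mean = mean(mam.new$body),
                            sd = sd(mam.new$body)))

# QQ plot against the normal distribution with a reference line.
p_qq <- ggplot(mam.new, aes(sample = body)) +
  stat_qq() +
  geom_qq_line(col = "red")

# Bars on the density scale have a total area of 1, like the curve.
bars <- layer_data(p_hist, 1)
sum(bars$density * (bars$xmax - bars$xmin))
```

Putting the histogram on the density scale is what makes the overlay meaningful: counts and a probability density would otherwise sit on different y axes.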

  44. Your turn!
