Comparing Distributions: Nick Strayer, Instructor, DataCamp

  1. DataCamp, Visualization Best Practices in R: Comparing Distributions. Nick Strayer, Instructor

  2. Why compare distributions?
      - To verify that groups are balanced
      - For comparison's sake

  3. Why not facet histograms?

        library(ggplot2)

        # md_speeding is the course's data set
        ggplot(md_speeding, aes(x = speed_over)) +
          geom_histogram() +
          facet_grid(vehicle_color ~ .)

  4. The box plot

  5. Box plot pros
      - Familiar
      - Lots of good summary statistics
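
As a sketch of those summary statistics, the hypothetical helper below (not a ggplot2 function) computes the numbers a box plot draws, assuming the convention documented for ggplot2's geom_boxplot(): the hinges are the first and third quartiles, and each whisker reaches the most extreme observation within 1.5 × IQR of its hinge. The input values are made up.

```r
# Hypothetical helper (not a ggplot2 function): the numbers a box plot draws
box_stats <- function(x, coef = 1.5) {
  # Q1, median and Q3 via R's default quantile() (type 7)
  q <- unname(quantile(x, c(0.25, 0.5, 0.75)))
  iqr <- q[3] - q[1]
  # Observations more than coef * IQR beyond a hinge are flagged as outliers
  is_outlier <- x < q[1] - coef * iqr | x > q[3] + coef * iqr
  inside <- x[!is_outlier]
  list(
    lower_whisker = min(inside),  # whiskers stop at the most extreme non-outliers
    q1 = q[1],
    median = q[2],
    q3 = q[3],
    upper_whisker = max(inside),
    outliers = x[is_outlier]
  )
}

# Made-up values with one extreme observation
str(box_stats(c(1:9, 50)))
```

Here the value 50 lies beyond the upper fence, so a box plot would draw it as an individual outlier point while the whisker stops at 9.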

  6. Box plot cons
      - Show me the data! The box summarizes the distribution but hides the individual observations.
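
A small made-up example of why the raw data matter: two samples with identical box-plot statistics but clearly different shapes. With no outliers, the default output of quantile() (minimum, Q1, median, Q3, maximum) is, in this case, exactly the five numbers the box and whiskers show.

```r
# Two made-up samples of nine values each
spread_out <- c(0, 1, 2, 3, 4, 5, 6, 7, 8)  # evenly spaced
two_clumps <- c(0, 2, 2, 2, 4, 6, 6, 6, 8)  # piled up around 2 and 6

# Minimum, Q1, median, Q3 and maximum: identical for both samples (0, 2, 4, 6, 8)
quantile(spread_out)
quantile(two_clumps)

# Neither sample has outliers (IQR = 4, so the fences sit at -4 and 12),
# so both get the same box and whiskers, yet the raw values differ:
table(spread_out)
table(two_clumps)
```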

  7. A simple addition
      - geom_jitter() shows the raw points, jostled to avoid overlap.
      - Layer it under your geom_boxplot().

        library(dplyr)
        library(ggplot2)

        md_speeding %>%
          filter(vehicle_color == 'BLUE') %>%
          ggplot(aes(x = gender, y = speed)) +
          # Draw the points first so they sit behind the boxes
          geom_jitter(alpha = 0.3, color = 'steelblue') +
          # alpha = 0 makes the box fill transparent
          geom_boxplot(alpha = 0) +
          labs(title = 'Distribution of speed for blue cars by gender')
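
A runnable sketch of the same layering, assuming simulated data in place of md_speeding (which is not included here). Two small changes are judgment calls rather than part of the original code: geom_jitter() by default jitters vertically as well as horizontally, so height = 0 keeps each point's measured speed, and outlier.shape = NA stops geom_boxplot() from drawing outliers a second time on top of the jittered points.

```r
library(ggplot2)

# Simulated stand-in for md_speeding (made-up numbers, not the course data)
set.seed(42)
speeds <- data.frame(
  gender = rep(c("F", "M"), each = 100),
  speed  = round(c(rnorm(100, mean = 60, sd = 8), rnorm(100, mean = 63, sd = 10)))
)

p <- ggplot(speeds, aes(x = gender, y = speed)) +
  # Drawn first, so the points sit behind the boxes;
  # width = 0.2 spreads them sideways, height = 0 leaves the speeds untouched
  geom_jitter(width = 0.2, height = 0, alpha = 0.3, color = "steelblue") +
  # alpha = 0: transparent box fill; outlier.shape = NA: outliers are not drawn twice
  geom_boxplot(alpha = 0, outlier.shape = NA) +
  labs(title = "Distribution of speed by gender (simulated data)")

p
```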

  9. Let's compare some distributions

  10. Box plot alternatives

  11. Limitations of the box plot with jitter
      - Jostling points can only deal with so much overlap.
      - It is hard to get an idea of data density.

  12. What are some other options?
      - Beeswarm plots
      - Violin plots

  13. Beeswarm plots
      - 'Smart' jittering: individual points are clumped together as close to the axis as possible.
      - Handily included as geom_beeswarm() in the ggbeeswarm package.

        library(ggplot2)
        library(ggbeeswarm)

        # data: any data frame with a numeric column y and a grouping column group
        ggplot(data, aes(y = y, x = group)) +
          geom_beeswarm(color = 'steelblue')
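
The 'smart jittering' idea can be sketched in base R. The helper below is hypothetical and deliberately simplified: a greedy layout that places points in order of their y value, each at the horizontal offset closest to the axis (tried in increments of `step`) that keeps it at least `size` units from every point already placed. It measures distance in data units rather than on screen, and it is not the algorithm ggbeeswarm implements.

```r
# Hypothetical helper: simplified greedy swarm layout (not ggbeeswarm's algorithm)
swarm_offsets <- function(y, size = 1, step = size / 4) {
  x <- numeric(length(y))
  placed <- integer(0)
  for (i in order(y)) {
    k <- 0
    repeat {
      # Candidate offsets, nearest to the axis first: 0, then +k*step and -k*step
      candidates <- if (k == 0) 0 else c(k, -k) * step
      free <- vapply(candidates, function(cx) {
        # A candidate is free if it keeps at least `size` from every placed point
        all(sqrt((x[placed] - cx)^2 + (y[placed] - y[i])^2) >= size)
      }, logical(1))
      if (any(free)) {
        x[i] <- candidates[which(free)[1]]
        break
      }
      k <- k + 1
    }
    placed <- c(placed, i)
  }
  x
}

# Made-up values: several nearly equal points get pushed sideways
y <- c(5.0, 5.1, 5.1, 5.2, 7.0, 5.05)
offsets <- swarm_offsets(y, size = 0.3)
cbind(offset = offsets, y = y)
```

Isolated points (like 7.0) stay on the axis; only crowded values are pushed outward, which is what gives a beeswarm its distributional shape.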

  15. Beeswarm pros
      - Individual data points
      - Distributional shape

  16. Beeswarm cons
      - Gets hard to read with lots of data
      - Arbitrary stacking

  17. Violin plots
      - A kernel density estimate (KDE), reflected to be symmetric.
      - Just replace geom_boxplot() with geom_violin().

        library(ggplot2)

        # data: any data frame with a numeric column y and a grouping column group
        ggplot(data, aes(y = y, x = group)) +
          geom_violin(fill = 'steelblue')
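
A rough sketch of what "KDE reflected to be symmetric" means, using base R's density() (Gaussian kernel, default bandwidth rule "nrd0"). The hypothetical helper violin_outline() returns the outline of one violin: the density curve drawn to the right of the group's x position and mirrored to the left. Scaling the widest point to a fixed half-width is an assumption of this sketch; geom_violin() offers several scaling rules through its scale argument.

```r
# Hypothetical helper: outline of a single violin from a kernel density estimate
violin_outline <- function(values, center = 1, max_half_width = 0.45) {
  kde <- density(values)  # Gaussian kernel, bandwidth rule "nrd0", 512 grid points by default
  # Half-width of the violin at each grid point, scaled so the widest point is max_half_width
  half <- kde$y / max(kde$y) * max_half_width
  data.frame(
    x = c(center + half, rev(center - half)),  # right edge going up, mirrored left edge coming down
    y = c(kde$x, rev(kde$x))
  )
}

set.seed(1)
outline <- violin_outline(rnorm(200, mean = 60, sd = 8), center = 1)
# To draw it with base graphics:
# plot(outline$x, outline$y, type = "n"); polygon(outline$x, outline$y, col = "steelblue")
```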

  19. Violin pros
      - Every data point is heard: each one contributes to the density estimate.
      - Not every data point is seen, so violins work well for lots of data.

  20. Violin cons
      - The kernel width (bandwidth) must be chosen
      - Not every data point is seen
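
The kernel-width choice can change the story a violin tells. In this made-up example, the same two-cluster sample is smoothed with base R's density() at a narrow and a wide bandwidth; count_modes() is a hypothetical helper that counts the peaks of each estimate. In ggplot2, geom_violin() exposes the same choice through its adjust argument, a multiplier on the default bandwidth.

```r
# Hypothetical helper: count the peaks of a density estimate,
# ignoring bumps smaller than rel_tol times the highest peak
count_modes <- function(kde, rel_tol = 1e-3) {
  peaks <- which(diff(sign(diff(kde$y))) == -2) + 1
  sum(kde$y[peaks] > rel_tol * max(kde$y))
}

# Made-up sample: 60 values around 0 and 40 values around 5
y <- c(seq(-0.5, 0.5, length.out = 60), seq(4.5, 5.5, length.out = 40))

narrow <- density(y, bw = 0.3)  # small kernel width
wide   <- density(y, bw = 3)    # large kernel width

count_modes(narrow)  # the two clusters stay separate
count_modes(wide)    # the clusters are smoothed into a single hump
```

A wide kernel hides the gap between the clusters entirely, so a violin drawn from it would look like a single-peaked distribution.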

  21. Let's try some more advanced comparisons!

  22. Comparing spatially related distributions

  23. What are 'spatially connected axes'?
      - There is an underlying ordering of the classes.
      - For example, months of the year: Jan < Feb < Mar < ...
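
In R, that ordering has to be declared. ggplot2 places the categories of a discrete axis in factor-level order, while a character column is ordered alphabetically, so months stored as text would come out as Apr, Aug, Dec, and so on. A small base R sketch with made-up observations; month.abb is base R's built-in vector of abbreviated month names.

```r
# Made-up observations, stored as plain text
obs_month <- c("Mar", "Jan", "Dec", "Feb", "Jan")

# As character strings, months sort alphabetically: "Dec" "Feb" "Jan" "Mar"
sort(unique(obs_month))

# Declaring the calendar order with an ordered factor
month_f <- factor(obs_month, levels = month.abb, ordered = TRUE)
sort(unique(month_f))   # Jan Feb Mar Dec, i.e. calendar order
month_f[1] < month_f[3] # "Mar" < "Dec" is TRUE
```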

  24. The ridgeline plot

        library(ggplot2)
        library(ggridges) # gives us geom_density_ridges()

        ggplot(md_speeding, aes(x = speed_over, y = month)) +
          geom_density_ridges(bandwidth = 2) +
          xlim(1, 35)
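
A self-contained version of this ridgeline, assuming simulated speeding data in place of md_speeding. Two details are judgment calls rather than part of the original code: month is built as a factor with calendar-ordered levels, and coord_cartesian(xlim = c(1, 35)) replaces xlim(1, 35). xlim() sets the scale limits, so ggplot2 drops values outside 1 to 35 before the densities are estimated; coord_cartesian() only zooms the view.

```r
library(ggplot2)
library(ggridges)

# Simulated stand-in for md_speeding: speed over the limit by month (made-up numbers)
set.seed(7)
months <- factor(month.abb[1:6], levels = month.abb[1:6])
speeding <- data.frame(
  month = rep(months, each = 150),
  speed_over = rgamma(900, shape = 3, rate = 0.25)
)

p <- ggplot(speeding, aes(x = speed_over, y = month)) +
  # bandwidth fixes one kernel width for every ridge, so all ridges are smoothed alike;
  # scale = 1 lets each ridge's tallest point just touch the baseline above it
  geom_density_ridges(bandwidth = 2, scale = 1) +
  # Zoom without removing out-of-range values from the density estimates
  coord_cartesian(xlim = c(1, 35))

p
```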

  25. Ridgeline pros

  26. Ridgeline cons

  28. Let's make some ridgelines!

  29. Congratulations!

  34. Going further
      - FlowingData: a curated list of data visualizations and R-based tutorials.
      - Datawrapper Blog: articles that dig deep into visualization techniques and mistakes.
      - Twitter (#datavis): an ongoing stream of cool projects and inspiration.
      - Books!
          - Data Visualization, Andy Kirk
          - The Functional Art and The Truthful Art, by Alberto Cairo

  35. Thank you!
