201ab Quantitative methods Visualization E D V UL | UCSD Psychology

• Visualization failure modes
• Cool vs informative visualizations
• Ways graphs can mislead
• Making a graph pretty
• ggplot: grammar of graphics

Entirely made up.

Nonsense variables.

Graph independent of data.

Multiple variables graphed as one.

Credit: xkcd

Not labeled (or mislabeled).

Misleading or useless axis scales.

Misleading binning.

Illegible.

Visualization failure modes
• Completely made up.
• Nonsense variables/relationships.
• Graph independent of data.
• Multiple variables treated as one.
• Not labeled, or mislabeled.
• Misleading / unusable scales.
• Misleading binning.
• Illegible.
• Crazy mapping from variables to visual properties.
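A misleading axis scale is one of the easiest of these failure modes to produce by accident. Below is a minimal ggplot2 sketch that draws the same bar chart twice, once honestly and once with a truncated y axis. The group names and values are made up for illustration.

```r
library(ggplot2)

# Made-up summary values for illustration: two groups whose means differ by 3%.
means <- data.frame(group = c("A", "B"), score = c(97, 100))

# Honest version: bars start at zero, so bar length is proportional to the value.
p_honest <- ggplot(means, aes(x = group, y = score)) +
  geom_col() +
  labs(x = "Group", y = "Mean score")

# Misleading version: zooming the y axis to 96-100 makes the visible bar
# heights roughly 1 vs 4, so a 3% difference looks like a fourfold one.
# coord_cartesian() zooms the view without discarding any data.
p_truncated <- p_honest + coord_cartesian(ylim = c(96, 100))
```

With bars, length encodes magnitude, so a bar chart whose y axis does not start at zero misrepresents the ratio between groups.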

• Visualization failure modes
• Cool vs scientific visualizations
• Making a graph pretty
• ggplot: grammar of graphics
• How to graph common data types.

From dynamicdiagrams.com

The first graph:
- Looks cooler!
- Provides a visual puzzle.
- Misrepresents magnitudes.
- Does not adhere to (modern!) convention.
- Makes it difficult to make quantitative comparisons, or extract numbers.
This is a bad scientific data display, but it is a cool visualization.

The second graph:
- Looks a bit more boring.
- Is much easier to parse and understand.
- Accurately, quantitatively represents magnitudes.
- Adheres to modern convention.
- Makes it easy to make quantitative comparisons, and extract numbers.
This is a good scientific data display, but it might not be as interesting a visualization.

• Visualization failure modes
• Cool vs scientific visualizations
• Making a graph pretty
• ggplot: grammar of graphics
• How to graph common data types.

This one may have gone a bit overboard into “visualization” territory. It looks good, but it starts violating some conventions:
- No Y axis.
- Y axis label used as title.
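The sketch below polishes a plot without crossing those lines: it uses a clean theme, adds a title alongside the y-axis label rather than in place of it, and keeps the axis lines. It is a ggplot2 sketch on simulated data. The condition names, the colour, and the font size are arbitrary choices.

```r
library(ggplot2)

set.seed(1)
# Simulated data (hypothetical): reaction times in two conditions.
d <- data.frame(
  condition = rep(c("Control", "Treatment"), each = 50),
  rt = c(rnorm(50, 500, 60), rnorm(50, 540, 60))
)

p <- ggplot(d, aes(x = condition, y = rt)) +
  geom_jitter(width = 0.15, height = 0, alpha = 0.4) +            # raw data
  stat_summary(fun.data = mean_se, geom = "pointrange",           # mean +/- 1 SE
               colour = "firebrick") +
  labs(
    title = "Reaction time by condition",  # a title in addition to, not instead of, the y label
    x = "Condition",
    y = "Reaction time (ms)"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    axis.line = element_line(colour = "grey30"),  # keep a visible x and y axis
    panel.grid.major.x = element_blank()          # drop gridlines that carry no information
  )
```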

• Visualization failure modes
• Cool vs informative visualizations
• Making a graph pretty
• ggplot: grammar of graphics
• Graphs for common types of data.

library(ggplot2)
Fig <- ggplot(data = ..., mapping = aes(...)) +
  facet_*() +
  geom_*() +
  stat_*() +
  scale_*() +
  theme*()

Basic operation: take a tidy data frame, map variables onto different aesthetic variables (e.g., x, y, color, fill, size, shape, alpha, group), and draw some geom(etric entity) according to that mapping (e.g., point, line, tile, area, ribbon, etc.).
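Here is one concrete instance of the template, filling each layer slot once, on a small simulated tidy data frame. The column names `dose`, `group`, `site`, and `response` are hypothetical. `scale_x_log10()` and `theme_bw()` are just one possible choice for the `scale_*()` and `theme*()` slots.

```r
library(ggplot2)

set.seed(2)
# Simulated tidy data: one row per observation, one column per variable.
tidy <- data.frame(
  dose  = rep(c(1, 2, 4, 8), times = 30),
  group = rep(c("A", "B", "C"), each = 40),
  site  = rep(c("Site 1", "Site 2"), each = 4, length.out = 120)
)
tidy$response <- 2 * log2(tidy$dose) + as.numeric(factor(tidy$group)) + rnorm(120)

fig <- ggplot(data = tidy, mapping = aes(x = dose, y = response, color = group)) +
  facet_wrap(~ site) +                          # facet_*(): one panel per site
  geom_point(alpha = 0.6) +                     # geom_*(): draw every observation
  stat_summary(fun = mean, geom = "line") +     # stat_*(): compute and draw group means
  scale_x_log10(breaks = c(1, 2, 4, 8)) +       # scale_*(): log-scaled x axis
  theme_bw()                                    # theme*(): overall look
```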

• Visualization failure modes
• Cool vs informative visualizations
• Making a graph pretty
• ggplot: grammar of graphics
• Graphs for common types of data.
• Practice in R.
• More exotic graph types / considerations

Goal: show how the response/dependent variable(s) change with the explanatory/independent variable(s).
What kind of variables? Categorical? Numerical?
It helps to think of the design as an abstract formula of sorts. For example, how does height (numerical response) vary across sex (categorical), nationality (categorical), and parents’ income (numerical)?
numerical ~ 2*categorical + numerical
This abstraction helps you pick starting points for graphs.
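The abstraction can be made mechanical. Below is a sketch of a hypothetical helper, not part of any package. It treats numeric columns as numerical and everything else (factor, character, logical) as categorical. The example data frame is made up purely to reproduce the slide's height example.

```r
# Hypothetical helper: classify a column as "numerical" or "categorical".
var_type <- function(v) if (is.numeric(v)) "numerical" else "categorical"

# Hypothetical helper: describe a design in the slide's abstract-formula notation.
abstract_formula <- function(df, response, explanatory) {
  lhs <- var_type(df[[response]])
  if (length(explanatory) == 0) return(paste(lhs, "~ 0"))
  types <- vapply(explanatory, function(v) var_type(df[[v]]), character(1))
  counts <- table(factor(types, levels = c("categorical", "numerical")))
  terms <- character(0)
  for (t in names(counts)) {
    k <- counts[[t]]
    if (k == 1) terms <- c(terms, t)                    # e.g. "numerical"
    if (k > 1) terms <- c(terms, paste0(k, "*", t))     # e.g. "2*categorical"
  }
  paste(lhs, "~", paste(terms, collapse = " + "))
}

# Made-up rows, only to mirror the slide's example variables.
people <- data.frame(
  height = c(170, 182, 165),
  sex = c("F", "M", "F"),
  nationality = c("US", "MX", "CA"),
  parents_income = c(55000, 72000, 61000)
)

abstract_formula(people, "height", c("sex", "nationality", "parents_income"))
# returns "numerical ~ 2*categorical + numerical"
```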

categorical ~ 0 (1 categorical response variable, with 0 explanatory variables)
Stacked bar plot:
  + Easy-ish comparisons
  + Easy-ish proportion
  + Socially acceptable pie chart
Histogram (bar plot of counts):
  ++ Easiest comparisons
  - Hardest proportion
Pie chart:
  - Hardest comparisons
  ++ Easiest proportion
  - Waste of ink
  - Considered tacky
Data: http://vulstats.ucsd.edu/data/spsp.demographics.cleaned.csv
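The sketch below draws all three displays in ggplot2 from one simulated categorical variable. It does not load the slide's data file. The column name `field`, its levels, and the probabilities are hypothetical. The pie chart is the stacked bar wrapped into polar coordinates.

```r
library(ggplot2)

set.seed(3)
# Simulated categorical variable (hypothetical levels and probabilities).
d <- data.frame(field = sample(c("Social", "Cognitive", "Clinical", "Developmental"),
                               size = 300, replace = TRUE,
                               prob = c(0.4, 0.3, 0.2, 0.1)))

# Bar plot of counts: easiest comparisons between categories.
p_bar <- ggplot(d, aes(x = field)) + geom_bar()

# Stacked bar: one bar split by category (the "socially acceptable pie chart").
p_stacked <- ggplot(d, aes(x = "", fill = field)) + geom_bar() + labs(x = NULL)

# Pie chart: the same stacked bar in polar coordinates.
p_pie <- p_stacked + coord_polar(theta = "y")
```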

categorical ~ 0 (1 categorical response variable, with 0 explanatory variables)
Counts: highlight the sample size (useful when n is small).
Proportions: easier interpretation.
Data: http://vulstats.ucsd.edu/data/spsp.demographics.cleaned.csv
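Switching between counts and proportions in ggplot2 only changes the y mapping. Below is a sketch on a small simulated variable; the name `answer` and its levels are hypothetical. `after_stat()` refers to the `count` that `geom_bar()` computes, and `sum(count)` here is taken over the whole layer, so the bars sum to 1.

```r
library(ggplot2)

set.seed(5)
d <- data.frame(answer = sample(c("Yes", "No", "Unsure"), size = 40, replace = TRUE))

# Counts on the y axis keep the sample size visible.
p_counts <- ggplot(d, aes(x = answer)) + geom_bar() + labs(y = "Count")

# Proportions: rescale the computed counts so they sum to 1.
p_props <- ggplot(d, aes(x = answer, y = after_stat(count / sum(count)))) +
  geom_bar() + labs(y = "Proportion")
```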

numerical ~ 0 (1 numerical response variable, with 0 explanatory variables)
Histogram:
  + Portrays noisiness.
  - Impression sensitive to bins.
Smoothed density:
  - Obscures noisiness.
  + Not too sensitive to a reasonable kernel width.
Data: http://vulstats.ucsd.edu/data/cal1020.cleaned.Rdata
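The next sketch uses a simulated bimodal sample in place of the slide's data. It draws two histograms that differ only in their number of bins, plus two densities whose bandwidths differ by a factor of two. The specific bin counts and `adjust` values are arbitrary. In `geom_density()`, `adjust` multiplies the default bandwidth.

```r
library(ggplot2)

set.seed(6)
d <- data.frame(x = c(rnorm(150, 0, 1), rnorm(50, 3, 0.5)))  # simulated bimodal sample

# Histograms: the impression can change with the number of bins.
p_hist_coarse <- ggplot(d, aes(x)) + geom_histogram(bins = 8)
p_hist_fine   <- ggplot(d, aes(x)) + geom_histogram(bins = 40)

# Smoothed densities at the default bandwidth and at half of it.
p_dens <- ggplot(d, aes(x)) +
  geom_density(adjust = 1) +
  geom_density(adjust = 0.5, linetype = "dashed")
```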

numerical ~ categorical (1 numerical response variable, with 1 categorical explanatory variable)
• Mean + error: easy statistical comparison.
• Jitter: useful when n is small.
• Violin: useful when n is large.
• Boxplot.
• Densities and empirical CDF (shown with coords flipped): best when coords are not flipped; best for few categories (<4?).
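The first four displays share one base plot and differ only in the layer added. Below is a sketch on simulated data; the group labels and distributions are hypothetical. `mean_se` is ggplot2's mean ± 1 standard error summary.

```r
library(ggplot2)

set.seed(7)
d <- data.frame(
  group = rep(c("A", "B", "C"), each = 60),
  y = c(rnorm(60, 0), rnorm(60, 0.5), rexp(60) - 0.5)   # group C is skewed
)

base <- ggplot(d, aes(x = group, y = y))
p_mean   <- base + stat_summary(fun.data = mean_se, geom = "pointrange")  # mean +/- 1 SE
p_jitter <- base + geom_jitter(width = 0.2, height = 0)                   # every data point
p_violin <- base + geom_violin()                                          # smoothed distribution
p_box    <- base + geom_boxplot()                                         # median, IQR, outliers
```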

numerical ~ categorical (1 numerical response variable, with 1 categorical explanatory variable)
- Always put error bars on bar charts (standard error or CI are fine).
- Look at rawer data (e.g., strip charts) before going to more compressed plots.
- By removing the solid bar from a bar chart, you can add a good visualization of the data distribution. This is better.
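The sketch below first builds a bar chart with standard-error bars. It then builds the "remove the solid bar" version, where the same mean ± SE summary sits on top of the jittered raw data. The data, condition names, and colours are hypothetical, and standard errors stand in for CIs.

```r
library(ggplot2)

set.seed(8)
d <- data.frame(
  condition = rep(c("Control", "Drug"), each = 25),
  score = c(rnorm(25, 10, 3), rnorm(25, 12, 3))
)

# Bar chart with standard-error bars.
p_bar <- ggplot(d, aes(x = condition, y = score)) +
  stat_summary(fun = mean, geom = "col", fill = "grey70") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2)

# Same summary without the solid bar, so the raw data can be shown underneath.
p_points <- ggplot(d, aes(x = condition, y = score)) +
  geom_jitter(width = 0.1, height = 0, alpha = 0.5) +
  stat_summary(fun.data = mean_se, geom = "pointrange", colour = "red")
```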

numerical ~ categorical (my suggestions)
With small n: show all the data points with jitter (here, the data are subsampled to generate a low-n scenario).
With large n: show the distribution with a violin or density plot.
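Here is a base-R plus ggplot2 sketch of the same subsampling idea. It takes a large simulated data set, draws 15 rows per group for the low-n jitter plot, and shows the full data with violins. The sample sizes and distributions are assumptions; 15 per group is arbitrary.

```r
library(ggplot2)

set.seed(9)
big <- data.frame(
  group = rep(c("A", "B"), each = 2000),
  y = c(rnorm(2000, 0, 1), rgamma(2000, shape = 2) - 2)
)

# Low-n scenario: subsample 15 rows per group, then show every point.
small <- do.call(rbind, lapply(split(big, big$group),
                               function(g) g[sample(nrow(g), 15), ]))
p_small <- ggplot(small, aes(x = group, y = y)) +
  geom_jitter(width = 0.15, height = 0)

# Large-n scenario: individual points overplot, so show the distribution instead.
p_large <- ggplot(big, aes(x = group, y = y)) +
  geom_violin()
```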

numerical ~ categorical (eclectic plots, useful with large n and weird distributional differences)
Cumulative distribution functions: highlight differences in the tails. Only useful with really large n (so the tails aren’t just noise).
Overlaid densities/histograms: with large n, can show weird differences.
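The next sketch uses two large simulated groups with similar centres but different tails: a normal and a t distribution with 3 degrees of freedom. It draws empirical CDFs and overlaid semi-transparent densities. The group names and the choice of 5,000 observations per group are assumptions.

```r
library(ggplot2)

set.seed(10)
d <- data.frame(
  group = rep(c("Normal", "Heavy-tailed"), each = 5000),
  y = c(rnorm(5000), rt(5000, df = 3))
)

# Empirical CDFs: tail differences appear as gaps where the curves approach 0 and 1.
p_ecdf <- ggplot(d, aes(x = y, colour = group)) + stat_ecdf()

# Overlaid densities: transparency keeps both curves visible.
p_dens <- ggplot(d, aes(x = y, fill = group)) + geom_density(alpha = 0.4)
```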

numerical ~ numerical (1 numerical response variable, with 1 numerical explanatory variable); also 2 x numerical ~ 0
Scatterplot: best option with small n. Hard to make legible with large n.
2D histogram heatmap: useless for small n. Best option with large n.
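The sketch below compares the two options on simulated correlated data. The scatterplot uses a 100-point subsample as the small-n case. The 2D histogram uses all 50,000 points; it counts observations in rectangular bins and maps the count to fill colour. Both the sample sizes and the 40 bins are arbitrary.

```r
library(ggplot2)

set.seed(11)
n <- 50000
d <- data.frame(x = rnorm(n))
d$y <- 0.6 * d$x + rnorm(n, sd = 0.8)

# Scatterplot: fine for small n (here a 100-point subsample).
p_scatter <- ggplot(d[sample(n, 100), ], aes(x, y)) + geom_point()

# 2D histogram heatmap: counts per bin stay legible with large n.
p_bin2d <- ggplot(d, aes(x, y)) + geom_bin_2d(bins = 40)
```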

numerical ~ numerical (1 numerical response variable, with 1 numerical explanatory variable)
Fitted conditional means: generally, use method = "lm" rather than loess.
Conditional means: this will require binning by x.
Very rarely should you show these on their own, without the raw data.
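Conditional means need an explicit binning step. This sketch uses ggplot2's `cut_number()` to make 8 bins with roughly equal counts. It then computes each bin's mean x, mean y, and standard error of y, and draws them over the raw data. The simulated data and the choice of 8 bins are assumptions.

```r
library(ggplot2)

set.seed(12)
d <- data.frame(x = runif(400, 0, 10))
d$y <- 1 + 0.5 * d$x + rnorm(400)

# Bin x into 8 bins with roughly equal numbers of observations.
d$x_bin <- cut_number(d$x, n = 8)

# Per-bin mean of x (for placement), mean of y, and standard error of y.
bins <- do.call(rbind, lapply(split(d, d$x_bin), function(b) data.frame(
  x = mean(b$x), y = mean(b$y), se = sd(b$y) / sqrt(nrow(b))
)))

p_cond <- ggplot(d, aes(x, y)) +
  geom_point(alpha = 0.2) +                      # keep the raw data visible
  geom_pointrange(data = bins, aes(ymin = y - se, ymax = y + se), colour = "red")
```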

numerical ~ numerical (my recommendation) My recommendation: Show data, show fit.
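"Show data, show fit" is two layers in ggplot2: points for the data and `geom_smooth(method = "lm")` for a least-squares line with its confidence band. Below is a minimal sketch on simulated data. The 95% level is `geom_smooth()`'s default, written out here for clarity.

```r
library(ggplot2)

set.seed(13)
d <- data.frame(x = rnorm(150))
d$y <- 2 + 0.8 * d$x + rnorm(150)

# Show data (points) and show fit (linear fit with a 95% confidence band).
p <- ggplot(d, aes(x, y)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, level = 0.95)
```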

numerical ~ numerical (1 numerical response variable, with 1 numerical explanatory variable)
Normalization by x (so that each x bin's distribution over y sums to 1) is useful when you don't care about the distribution over x.
Note: you are unlikely to luxuriate in this much data.
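Normalization by x works as follows: bin both variables, then divide each x bin's counts by that bin's total. The fill then shows p(y bin | x bin), and the uneven distribution over x drops out. The sketch below is base R plus ggplot2 on simulated, deliberately skewed x. The 30 bins per axis are an arbitrary choice, and empty x bins are dropped because 0/0 is undefined.

```r
library(ggplot2)

set.seed(4)
n <- 20000
x <- rexp(n)                              # skewed: most data at small x
y <- sin(2 * x) + rnorm(n, sd = 0.3)

xb <- cut(x, breaks = 30)
yb <- cut(y, breaks = 30)
tab <- table(xb, yb)                      # raw 2D counts
cond <- prop.table(tab, margin = 1)       # each x bin (row) now sums to 1

cond_df <- as.data.frame.table(cond, responseName = "p")
cond_df <- cond_df[is.finite(cond_df$p), ]   # drop empty x bins (0/0 = NaN)

p_cond <- ggplot(cond_df, aes(x = xb, y = yb, fill = p)) +
  geom_tile() +
  labs(x = "x (binned)", y = "y (binned)", fill = "p(y | x)") +
  theme(axis.text = element_blank())
```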

numerical ~ numerical + categorical (1 numerical response, with numerical & categorical explanatory variables)
Color-coded scatterplot: hard to parse with lots of data.
Fitted lines / conditional means: show error bars. If y is smooth in x, show conditional means (as here). Bin width matters.
Note the importance of which explanatory variable goes on the x axis!
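The sketch below shows the three displays on simulated data. The `age`, `group`, and `score` variables and the 10-year bin width are assumptions for illustration. Fitted lines get confidence bands as their error display. The conditional means use `stat_summary_bin()` with `mean_se`, so changing `binwidth` changes the picture.

```r
library(ggplot2)

set.seed(14)
d <- data.frame(
  age = runif(600, 20, 70),
  group = rep(c("Control", "Patient"), each = 300)
)
# Simulated scores: patients decline faster with age.
d$score <- 50 - 0.2 * d$age -
  ifelse(d$group == "Patient", 0.15 * (d$age - 20), 0) + rnorm(600, sd = 4)

# Color-coded scatterplot.
p_scatter <- ggplot(d, aes(age, score, colour = group)) + geom_point(alpha = 0.5)

# Fitted lines per group, with confidence bands.
p_lines <- ggplot(d, aes(age, score, colour = group, fill = group)) +
  geom_smooth(method = "lm", formula = y ~ x)

# Conditional means per 10-year age bin, with standard-error bars.
p_means <- ggplot(d, aes(age, score, colour = group)) +
  stat_summary_bin(fun.data = mean_se, binwidth = 10, geom = "pointrange")
```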

numerical ~ numerical + categorical (1 numerical response, with numerical & categorical explanatory variables)
If the scatterplots are important, split them into facets (with large n).
If the line comparison is important, keep the lines in the same panel.
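In ggplot2 the choice between these two layouts is one line: adding `facet_wrap()` or leaving it out. Below is a sketch on simulated data with three hypothetical groups whose slopes differ.

```r
library(ggplot2)

set.seed(15)
d <- data.frame(
  x = rnorm(3000),
  group = sample(c("A", "B", "C"), 3000, replace = TRUE)
)
slope <- c(A = 0.5, B = 0, C = -0.5)
d$y <- slope[d$group] * d$x + rnorm(3000)

# Scatterplots matter: one facet per group, so large clouds don't overplot each other.
p_facets <- ggplot(d, aes(x, y)) +
  geom_point(alpha = 0.2, size = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_wrap(~ group)

# Line comparison matters: keep the fitted lines together in one panel.
p_lines <- ggplot(d, aes(x, y, colour = group)) +
  geom_smooth(method = "lm", formula = y ~ x)
```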

General pointers

General pointers
• Label your axes.
• Follow conventions:
  - Explanatory variable on the x axis.
  - Don't get creative: respect variable types.
  - Don't make visualization puzzles.
• Convey information clearly, numerically.
• Represent uncertainty! (distribution, error, confidence)
• Be wary of binning artifacts / thresholding.
• Cool visualizations are not good science graphs.
