Making data analysis easier
Feature Hierarchy in Graphical Displays
Heike Hofmann*, Susan VanderPlas Iowa State University
*currently visiting Monash
Making data analysis easier to communicate
Cognitive principles for grouping: Proximity, Similarity, Continuity
[Figure: three scatterplots of y against x]
visual tasks: comparisons along common axis, lengths, area, …
1999): color, shape, angle, …
translate to understanding charts … need more direct validation
natural habitat’
color and shape and additional features (lines, ellipses) influence pattern detection
data embedded among a set of ‘null’ plots
nulls are generated by the same mechanism”
is the most different?”
against the null hypothesis
[Figure: lineup of 20 numbered plots]
Which of these plots is the most different?
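The lineup idea above (the data plot hidden among null plots, with viewers asked to pick the most different one) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `make_lineup` and `visual_p_value` are hypothetical, the null plots are generated by permuting y (one common null mechanism, assumed here), and the p-value uses a simple binomial model in which each viewer independently picks one of the m panels at random under the null hypothesis.

```python
import random
from math import comb


def make_lineup(data, m=20, seed=None):
    """Return (panels, target) for a lineup of m panels.

    data is a list of (x, y) pairs. Null panels permute y, which breaks
    any x-y association (an assumed null mechanism for this sketch).
    """
    rng = random.Random(seed)
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    target = rng.randrange(m)  # 0-based position of the real data
    panels = []
    for i in range(m):
        if i == target:
            panels.append(list(data))
        else:
            shuffled = ys[:]
            rng.shuffle(shuffled)
            panels.append(list(zip(xs, shuffled)))
    return panels, target


def visual_p_value(picks, evaluations, m=20):
    """P(X >= picks) for X ~ Binomial(evaluations, 1/m).

    Assumes independent viewers who each pick the data panel with
    probability 1/m when the null hypothesis holds.
    """
    p = 1.0 / m
    return sum(
        comb(evaluations, k) * p**k * (1 - p) ** (evaluations - k)
        for k in range(picks, evaluations + 1)
    )
```

A small p-value from `visual_p_value` would then count as evidence against the null hypothesis, because viewers found the data plot more often than chance alone would suggest.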
Fleck et al 2010)
[Figure: simulated data (y axis shown) for K : 3, with panels λ : 0, λ : 0.25, λ : 0.5, λ : 0.75, λ : 1]
trend target: Model MT with parameter σT
cluster target: Model MC with parameter σC
nulls: mixture
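To make the mixture idea concrete, here is a rough Python sketch of drawing data that blends a trend model and a cluster model. Everything specific in it is an assumption for illustration: the function name `simulate_mixture`, the trend line y = x, uniformly placed cluster centers, and the convention that λ = 1 means pure clustering and λ = 0 means pure trend. The slides do not state the exact generating models.

```python
import random


def simulate_mixture(n, k, sigma_t, sigma_c, lam, seed=None):
    """Draw n points (x, y, group) from an assumed trend/cluster mixture.

    With probability lam a point comes from the cluster model (normal
    scatter with sd sigma_c around its group's center); otherwise it
    comes from the trend model (y = x plus normal noise with sd sigma_t).
    """
    rng = random.Random(seed)
    # Hypothetical cluster centers, placed uniformly in the unit square.
    centers = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(k)]
    points = []
    for i in range(n):
        group = i % k  # balanced group labels
        if rng.random() < lam:
            cx, cy = centers[group]
            x = cx + rng.gauss(0, sigma_c)
            y = cy + rng.gauss(0, sigma_c)
        else:
            x = rng.uniform(-1, 1)
            y = x + rng.gauss(0, sigma_t)
        points.append((x, y, group))
    return points
```

For example, `simulate_mixture(45, 3, 0.25, 0.2, 0.5, seed=1)` would give one data set with K = 3 groups halfway between the two models under these assumptions.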
[Figure: Simulated Distribution of Test Statistic (density); panels: Statistic: R squared, Statistic: Cluster Measure; legend (Distribution): Data, Most Extreme of 18 Null Dists]
(b) Cluster cohesion statistic C.
[Figure: panels for K : 3 and K : 5, and for σC : 0.1 to σC : 0.4 in steps of 0.05; x axis: variability along the trend, σT; legend (Distribution): Data, Max(18 Nulls)]
Interquartile intervals of the Max(18) null distribution (blue) and the target distribution (red) of the amount of clustering.
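The comparison in these figures, a target's statistic against the most extreme statistic among 18 null data sets, can be sketched as follows. This is an illustrative sketch only: `r_squared` is the standard squared Pearson correlation, while `max_null_statistic` and its `draw_null` argument are hypothetical names, and the definition of the cluster statistic C is not given in the slides, so it is left out. The count of 18 presumably reflects a 20-panel lineup holding two targets.

```python
def r_squared(points):
    """Squared Pearson correlation of x and y, i.e. R^2 of a simple linear fit."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy * sxy / (sxx * syy)


def max_null_statistic(statistic, draw_null, n_nulls=18):
    """Largest value of `statistic` over n_nulls freshly drawn null data sets.

    draw_null is any zero-argument function returning one null data set
    (a list of (x, y) pairs); how nulls are drawn is up to the caller.
    """
    return max(statistic(draw_null()) for _ in range(n_nulls))
```

Repeating `max_null_statistic` many times gives a simulated distribution of the most extreme null, which can be set beside the distribution of the same statistic on target data.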
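Plot types by emphasis strength (rows: cluster emphasis; columns: trend emphasis)
Cluster emphasis | Trend emphasis: none | Trend emphasis: 1 | Trend emphasis: 2
none | None | Trend | Trend + Error
1 | Color; Shape | Color + Trend |
2 | Color + Shape; Color + Ellipse | | Color + Ellipse + Trend + Error
3 | Color + Shape + Ellipse | |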
lineup evaluations that identified one of the targets (9959 out of 12010 evaluations)
random intercept for data set difficulty
Odds of selecting Cluster over Trend Target
[Figure: odds (on log scale) of selecting the Cluster over the Trend Target, with 95% Wald intervals; reference level: Plain plot. Plot types shown: Trend + Error, Color + Ellipse + Trend + Error, Plain, Trend, Color, Shape, Color + Shape, Color + Ellipse, Color + Trend, Color + Shape + Ellipse. Axis runs from 1/2 (toward Trend Target) through 1 to 2 (toward Cluster Target).]
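For readers unfamiliar with Wald intervals on the odds scale, the sketch below shows the usual computation: form the interval on the log-odds scale from an estimate and its standard error, then exponentiate. The function name `odds_with_wald_interval` is hypothetical, and any inputs passed to it are the caller's own; no estimates from the study are reproduced here.

```python
from math import exp


def odds_with_wald_interval(log_odds, std_error, z=1.96):
    """Return (odds, lower, upper) for a 95% Wald interval by default.

    The interval is log_odds +/- z * std_error on the log scale, then
    mapped to the odds scale with exp, so it is symmetric on a log axis.
    """
    lower = exp(log_odds - z * std_error)
    upper = exp(log_odds + z * std_error)
    return exp(log_odds), lower, upper
```

Relative to the reference level (odds of 1), an interval lying entirely above 1 would indicate a shift toward the cluster target, and one entirely below 1 a shift toward the trend target.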
cluster target
[Figure: lineup of 20 plots]
target not as strong???
strong signal (single missing ellipse cuts probability by 44%)
[Figure panels: (a) Plain, neither target; (b) Plain, cluster target; (c) Plain, trend target]
[Figure panels: (j) Color + Ellipse, neither; (k) Color + Ellipse, cluster; (l) Color + Ellipse, trend]
trends follow the expectation: color, shape, and ellipses emphasize clustering; trend lines and predictions emphasize trends
signal
missing groups, if they expected them.