https://www.cs.ubc.ca/~tmm/courses/436V-20
Information Visualization Aggregate & Filter
Tamara Munzner Department of Computer Science University of British Columbia
Lect 17, 10 Mar 2020
Upcoming Foundations 5: out Thu Mar 12, due Wed
–(with update announce last week, schedule status component)
–different attributes different items (different condition keys, same gene keys), same attributes: expression values for node colors –(same network layout for nodes=genes)
[Cerebral: Visualizing Multiple Experimental Conditions on a Graph with Biological Context. Barsky, Munzner, Gardy, and Kincaid. IEEE Trans. Visualization and Computer Graphics (Proc. InfoVis 2008) 14:6 (2008), 1253–1260.]
– Ch 1. What's Vis, and Why Do It?
– Ch 2. What: Data Abstraction
– Ch 4. Analysis: Four Levels for Validation
– Ch 3. Why: Task Abstraction
– Ch 5. Marks and Channels
– Ch 7. Arrange Tables
Views
– Ch 11. Manipulate View
– Ch 12. Facet into Multiple Views
– Ch 8. Arrange Spatial Data (only 8.1-8.3)
– Ch 10. Map Color and Other Channels
– Ch 9. Arrange Networks and Trees
– Ch 13. Reduce Items and Attributes
– Ch 14. Embed: Focus+Context
Thumb (upcoming)
– Ch 6. Rules of Thumb
Visualization Analysis & Design, free through library: catalog page EZProxy direct link
–MPG quantitative
–Cylinders ordinal
–Horsepower quantitative
–Weight quantitative
–Acceleration quantitative
–Model Year ordinal
–Origin categorical
– [8 min] –Socrative: true when done
[Figure: idiom design choices. Derive; Manipulate (Change, Select, Navigate); Facet (Juxtapose, Partition, Superimpose); Reduce (Filter, Aggregate, Embed)]
[Figure: How? Encode: Arrange (Express, Separate, Order, Align, Use); Map from categorical and ordered attributes (Color: Hue, Saturation, Luminance; Size, Angle, Curvature, ...; Shape; Motion: Direction, Rate, Frequency, ...). Manipulate: Change, Select, Navigate. Facet: Juxtapose, Partition, Superimpose. Reduce: Filter, Aggregate, Embed]
Reducing Items and Attributes Filter Items Attributes Aggregate Items Attributes
–pro: straightforward and intuitive
–con: out of sight, out of mind
–pro: inform about whole set
–con: difficult to avoid losing signal
–combine filter, aggregate
–combine reduce, change, facet
–either items or attributes
–any possible function that partitions dataset into two sets
–query: start with nothing, add in elements
–filters: start with everything, remove elements
–best approach depends on dataset size
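A minimal sketch of the query vs filter distinction above. The `cars` records and the `partition` helper are hypothetical, made up for illustration. A filter is any predicate that splits the dataset into a kept set and a removed set; the two workflows differ only in where they start.

```python
from typing import Callable, Iterable

Item = dict  # one table row; the keys used below are hypothetical


def partition(items: Iterable[Item], keep: Callable[[Item], bool]):
    """Split items into (kept, removed) using any boolean predicate."""
    kept, removed = [], []
    for item in items:
        (kept if keep(item) else removed).append(item)
    return kept, removed


cars = [
    {"name": "a", "mpg": 18.0, "cylinders": 8},
    {"name": "b", "mpg": 31.5, "cylinders": 4},
    {"name": "c", "mpg": 24.0, "cylinders": 6},
    {"name": "d", "mpg": 36.0, "cylinders": 4},
]

# Filter style: start with everything, remove elements that fail each test.
visible = list(cars)
visible = [c for c in visible if c["mpg"] >= 20]       # drops "a"
visible = [c for c in visible if c["cylinders"] <= 4]  # drops "c"

# Query style: start with nothing, add in the elements that match.
result = []
result += [c for c in cars if c["cylinders"] == 4 and c["mpg"] >= 20]

# Any predicate partitions the dataset into two sets.
kept, removed = partition(cars, lambda c: c["mpg"] >= 20)
```

Both styles end with the same visible items here; which one is more convenient depends on how much of the dataset the user is expected to keep.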
–tightly coupled interaction and visual encoding idioms, so user can immediately see results of action
[Ahlberg & Shneiderman, Visual Information Seeking: Tight Coupling of Dynamic Query Filters with Starfield Displays. CHI 1994.]
[http://square.github.io/crossfilter/]
–new table: keys are bins, values are counts
–pattern can change dramatically depending on discretization
–opportunity for interaction: control bin size on the fly
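A minimal sketch of the derived table described above, where keys are bins and values are counts. The `histogram` helper and the sample values are hypothetical. Calling it with two bin widths shows how the apparent pattern depends on the discretization.

```python
import math
from collections import Counter


def histogram(values, bin_width, origin=0.0):
    """Derive a new table: key = start of bin, value = count of items in it."""
    counts = Counter()
    for v in values:
        index = math.floor((v - origin) / bin_width)
        counts[origin + index * bin_width] += 1
    return dict(sorted(counts.items()))


values = [1, 2, 3, 7, 8, 9, 11, 12, 13, 19]  # made-up data

coarse = histogram(values, bin_width=10)  # two bins: looks like a gentle decline
fine = histogram(values, bin_width=5)     # four bins: flat, then a sharp drop
```

In an interactive view, `bin_width` would be bound to a slider so the user can change it on the fly.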
[Figure: histogram, x-axis Weight Class (lbs)]
http://tinlizzie.org/histograms/
–make it interactive when possible
–# bins = sqrt(n)
–# bins = log2(n)+1
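The two rules of thumb above, written out as a short sketch. Rounding up to a whole number of bins is an assumption; the slide does not say how to round. The second rule is commonly attributed to Sturges.

```python
import math


def bins_sqrt(n: int) -> int:
    """Square-root rule: number of bins grows with sqrt(n)."""
    return math.ceil(math.sqrt(n))


def bins_log2(n: int) -> int:
    """log2(n) + 1 rule: number of bins grows much more slowly."""
    return math.ceil(math.log2(n)) + 1


for n in (100, 1024):
    print(n, bins_sqrt(n), bins_log2(n))
```

The two rules diverge quickly as n grows, which is one more reason to let the user adjust the bin count interactively.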
[Figure: histograms of # passengers by age, with 20 bins and 10 bins]
–better cues for information foraging: show whether value in drilling down further vs looking elsewhere
[Scented Widgets: Improving Navigation Cues with Embedded Visualizations. Willett, Heer, and Agrawala. IEEE TVCG (Proc. InfoVis 2007) 13:6 (2007), 1129–1136.]
[Multivariate Network Exploration and Presentation: From Detail to Overview via Selections and Aggregations. van den Elzen, van Wijk, IEEE TVCG 20(12): 2014 (Proc. InfoVis 2014).]
[ICLIC: Interactive categorization of large image collections. van der Corput and van
–also: interaction speed w/ scatterplot vs list view
https://keshif.me/gallery/olympics
–visual representation of static legends w/ –interaction mechanisms of widgets
Riche 2010
–5 quant attribs
– values beyond which items are outliers
–outliers beyond fence cutoffs explicitly shown
–unlimited number of items!
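A minimal sketch of the boxplot aggregation above. It assumes the five derived quantitative attributes are the median, the two quartiles, and the two fences, and it assumes the common 1.5 x IQR fence rule; neither is spelled out on the slide. The function name and sample data are hypothetical.

```python
import statistics


def boxplot_summary(values, k=1.5):
    """Reduce any number of items to five derived values plus explicit outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr  # items below this are outliers
    upper_fence = q3 + k * iqr  # items above this are outliers
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
```

The summary has the same size whether the input has ten items or ten million, which is why the idiom scales to an unlimited number of items.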
[40 years of boxplots. Wickham and Stryjewski. 2012. had.co.nz]
–show outliers as points
[wikipedia]
http://stat.mq.edu.au/wp-content/uploads/2014/05/Can_the_Box_Plot_be_Improved.pdf
https://towardsdatascience.com/violin-plots-explained-fb1d115e023d
–smoothed, continuous version of a histogram estimated from data
–continuous curve (the kernel, usually Gaussian bell curve) drawn at each data point
–add curves together for single smooth density estimation
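A minimal sketch of kernel density estimation as described above: one Gaussian curve per data point, summed and normalized. The `bandwidth` parameter (the width of each bell curve) and the sample values are assumptions for illustration.

```python
import math


def gaussian_kernel(u: float) -> float:
    """Standard normal bell curve."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(data, bandwidth):
    """Return a density function built by adding one kernel per data point."""
    n = len(data)

    def density(x: float) -> float:
        total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in data)
        return total / (n * bandwidth)

    return density


samples = [1.0, 1.5, 2.0, 6.0, 6.5]  # made-up data
f = kde(samples, bandwidth=0.5)

# Evaluate on a grid; the curve should integrate to roughly 1.
step = 0.1
grid = [i * step for i in range(-20, 101)]
area = sum(f(x) for x in grid) * step
```

As with histogram bin width, the bandwidth controls how smooth the result looks, so it is a natural candidate for interactive control.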
https://towardsdatascience.com/histograms-and-density-plots-in-python-f6bda88f5ac0
KDE wikipedia
https://observablehq.com/@d3/kernel-density-estimation
– key attribs x,y for pixels
– quant attrib: overplot density
– no limits on overplotting: millions of items
[Continuous Scatterplots. Bachthaler and
IEEE TVCG (Proc. Vis 08) 14:6 (2008), 1428–1435. 2008. ]
–Plan: I livestream with video + audio + screenshare, will also try recording.
–You'll be able to just join the session
–Please connect audio-only, no video, to avoid congestion
–You'll be auto-muted. If you have a question use the Show Hand (click on Participants, button is at the bottom of the popup window), I'll unmute you myself
–Please do connect with video if possible, in addition to audio
–I'll use the Waiting Room feature, where I will individually allow you in
it's your turn.
–different Zoom URL for each TA, stay tuned
–you can sign up for reserved slots in advance, or check for availability on the fly
–more details soon
–but will not be in person
–you are free to leave campus when you want (but are not required to do so)
–M2 due Wed Mar 25
–M3 due Wed Apr 8
–will go out Thu Mar 26, due Wed Apr 1
–Gradescope has detailed breakdown, note stats are wrt total of 75
–Canvas has percentages, mean was 79%
–solutions have detailed rubric w/ answer alternatives & explanations
–we specifically suggested to several teams that they meet with us to discuss during labs or office hrs
–bimodal distribution
–changing boundaries of cartographic regions can yield dramatically different results
–zone effects
–scale effects
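A toy sketch of the zone and scale effects above, using a made-up row of nine voters for two hypothetical parties, A and B. The individual items never change; only the aggregation boundaries do.

```python
from collections import Counter


def zone_winners(votes, boundaries):
    """Aggregate individual votes into contiguous zones and report each winner.

    boundaries lists the start index of every zone followed by the end index.
    """
    winners = []
    for start, end in zip(boundaries, boundaries[1:]):
        tally = Counter(votes[start:end])
        winners.append(tally.most_common(1)[0][0])
    return winners


votes = ["A", "A", "B", "A", "A", "B", "B", "B", "B"]  # A: 4 votes, B: 5 votes

equal_thirds = zone_winners(votes, [0, 3, 6, 9])  # zone effect: A takes 2 of 3
shifted = zone_winners(votes, [0, 4, 7, 9])       # same data, B takes 2 of 3
one_zone = zone_winners(votes, [0, 9])            # scale effect: a single zone
```

The zones in this example are chosen so that no zone is tied; tie handling is left out of the sketch.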
[http://www.e-education.psu/edu/geog486/l4_p7.html, Fig 4.cg.6]
https://blog.cartographica.com/blog/2011/5/19/the-modifiable-areal-unit-problem-in-gis.html
https://www.washingtonpost.com/news/wonk/wp/2015/03/01/this-is-the-best-explanation-of-gerrymandering-you-will-ever-see/
A real district in Pennsylvania: Democrats won 51% of the vote but only 5 out of 18 house seats
https://www.nytimes.com/interactive/2018/01/17/upshot/pennsylvania-gerrymandering.html
https://www.nytimes.com/interactive/2018/11/29/us/politics/north-carolina-gerrymandering.html?action=click&module=Top%20Stories&pgtype=Homepage
–based on similarity measure
–partitioning algorithms
–hierarchical algorithms
–cluster more homogeneous than whole dataset
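A minimal sketch of one partitioning algorithm, k-means, with Euclidean distance standing in as the similarity measure. Both choices are assumptions for illustration; the slide does not name a specific algorithm or measure. The points and starting centers are made up.

```python
import math


def kmeans(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and recentering."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


def spread(points):
    """Mean distance to the centroid: lower means more homogeneous."""
    centroid = tuple(sum(coord) / len(points) for coord in zip(*points))
    return sum(math.dist(p, centroid) for p in points) / len(points)


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centers, clusters = kmeans(points, centers=[(0, 0), (10, 10)])
```

Comparing `spread(cluster)` for each cluster against `spread(points)` checks the last bullet: each cluster should be more homogeneous than the whole dataset.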
–network
–cluster hierarchy atop it
–connection marks for network links
–containment marks for hierarchy
–point marks for nodes
–select individual metanodes in hierarchy to expand/contract
[GrouseFlocks: Steerable Exploration of Graph Hierarchy Space. Archambault, Munzner, and Auber. IEEE TVCG 14(4): 900-913, 2008.]
[http://www.cs.umd.edu/hcil/hce/]
–cluster band with variable transparency, line at mean, width by min/max values
–color by proximity in hierarchy
[Hierarchical Parallel Coordinates for Exploration of Large Datasets. Fua, Ward, and Rundensteiner. Proc. IEEE Visualization Conference (Vis ’99), pp. 43–50, 1999.]
–derive low-dimensional target space from high-dimensional measured space
–use when you can’t directly measure what you care about
[Figure: dimensionality reduction task sequence. Task 1: derive 2D data from high-dimensional data. Task 2: encode the 2D data as a scatterplot; navigate and select to discover, explore, and identify clusters & points. Task 3: annotate the scatterplot's clusters & points to produce labels for clusters.]
–improve performance of downstream algorithm
–data analysis
– dimension-oriented tasks
– cluster-oriented tasks
[Visualizing Dimensionally-Reduced Data: Interviews with Analysts and a Characterization of Task
[A global geometric framework for nonlinear dimensionality reduction. Tenenbaum, de Silva, and Langford. Science, 290(5500):2319–2323, 2000.]
[Figure panels: no discernible clusters; clearly discernible clusters; partial match cluster/class; clear match cluster/class; no match cluster/class]
[Visualizing Dimensionally-Reduced Data: Interviews with Analysts and a Characterization of Task
–finding axes: first with most variance, second with next most, …
–describe location of each point as linear combination of weights for each axis
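A minimal sketch of the two steps above, restricted to 2D data so that the axes can be found in closed form from the 2x2 covariance matrix. The restriction, the function name, and the sample points are assumptions; real data would use a general eigendecomposition.

```python
import math


def pca_2d(points):
    """Return (mean, (axis1, axis2), weights) for a list of 2D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Direction of greatest variance; the second axis is perpendicular to it.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    axis1 = (math.cos(theta), math.sin(theta))
    axis2 = (-math.sin(theta), math.cos(theta))

    # Each point becomes a pair of weights, one per axis.
    weights = [
        (x * axis1[0] + y * axis1[1], x * axis2[0] + y * axis2[1])
        for x, y in centered
    ]
    return (mean_x, mean_y), (axis1, axis2), weights


mean, (axis1, axis2), weights = pca_2d([(0, 0), (1, 1), (2, 2), (3, 3)])
```

For these points, which all lie on the line y = x, the first axis points along the diagonal and every second-axis weight is approximately zero. Each original point can be rebuilt as the mean plus w1 times axis1 plus w2 times axis2.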
[http://en.wikipedia.org/wiki/File:GaussianScatterPCA.png]
–new dimensions often cannot be easily related to originals
– mapping synthesized dims to original dims is a difficult task
–many literatures: visualization, machine learning, optimization, psychology, ...
–techniques: t-SNE, MDS (multidimensional scaling), charting, isomap, LLE,…
–t-SNE: excellent for clusters
– but some trickiness remains: http://distill.pub/2016/misread-tsne/
–MDS: confusingly, entire family of techniques, both linear and nonlinear
– minimize stress or strain metrics
– early formulations equivalent to PCA
–both emphasize cluster structure
https://pair-code.github.io/understanding-umap/ https://distill.pub/2016/misread-tsne/ https://colah.github.io/posts/2014-10-Visualizing-MNIST/
[Figure panels: MDS, PCA, t-SNE, UMAP]
–goal: simulate how light bounces off materials to make realistic pictures
–idea: measure what light does with real materials
[Fig 2. Matusik, Pfister, Brand, and McMillan. A Data-Driven Reflectance Model. SIGGRAPH 2003]
–each image 4M pixels
–simulate completely new materials
–104 materials * 4M pixels = 400M dims
–want concise model with meaningful knobs
[Figs 5/6. Matusik et al. A Data-Driven Reflectance Model. SIGGRAPH 2003]
–scree plots: error vs number of dimensions in lowD projection
–specular highlights cannot have holes!
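A minimal sketch of the numbers behind a scree plot, assuming a linear technique where each dimension captures a known share of variance and error is measured as the fraction of variance left out. The variance values are made up, not taken from the paper.

```python
def scree(variances):
    """Residual error after keeping the top k dimensions, for k = 1..len(variances)."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    errors = []
    kept = 0.0
    for v in ordered:
        kept += v
        errors.append((total - kept) / total)
    return errors


errors = scree([4.0, 2.0, 1.0, 0.5, 0.5])
for k, e in enumerate(errors, start=1):
    print(k, e)
```

Plotting error against k and looking for where the curve flattens gives an estimate of how many dimensions are worth keeping.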
[Figs 6/7. Matusik et al. A Data-Driven Reflectance Model. SIGGRAPH 2003]
–scree plot suggests 10-15 dims
–note: dim estimate depends on technique used!
[Fig 10/11. Matusik et al. A Data-Driven Reflectance Model. SIGGRAPH 2003]
–synthetic dims created by algorithm but named by human analysts
–points represent real-world images (spheres)
–people inspect images corresponding to points to decide if axis could have meaningful name
–arrows show simulated images (teapots) made from model
–check if those match dimension semantics
[Fig 12/16. Matusik et al. A Data-Driven Reflectance Model. SIGGRAPH 2003]
[Fig 13/14/16. Matusik et al. A Data-Driven Reflectance Model. SIGGRAPH 2003]
Specular-Metallic Diffuseness-Glossiness
–selectively filter and aggregate
–local lens
–region shape: radial, rectilinear, complex
–how many regions: one, many
–region extent: local, global
–interaction metaphor
[Figure: Embed. Elide Data; Superimpose Layer; Distort Geometry]
–some items dynamically filtered out
–some items dynamically aggregated together
–some items shown in detail
[DOITrees Revisited: Scalable, Space-Constrained Visualization of Hierarchical Data. Heer and Card. Proc. Advanced Visual Interfaces (AVI), pp. 421–424, 2004.]
–shape: radial
–focus: single extent
–extent: local
–metaphor: draggable lens
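A minimal sketch of a radial, local, single-focus lens with the design choices listed above. The distortion function g(t) = (d+1)t / (dt+1) follows the Sarkar and Brown style of graphical fisheye and is an assumption here; the Tulip examples linked on this slide may use a different function. Parameter names are hypothetical.

```python
import math


def fisheye(point, focus, radius, distortion=3.0):
    """Push points inside the lens away from the focus; leave the rest untouched."""
    dx, dy = point[0] - focus[0], point[1] - focus[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist >= radius:
        return point  # local extent: nothing outside the lens moves
    t = dist / radius  # normalized distance from focus, in (0, 1)
    g = (distortion + 1.0) * t / (distortion * t + 1.0)
    scale = g * radius / dist
    return (focus[0] + dx * scale, focus[1] + dy * scale)


inside = fisheye((1.0, 0.0), focus=(0.0, 0.0), radius=4.0)
outside = fisheye((5.0, 0.0), focus=(0.0, 0.0), radius=4.0)
```

Dragging the lens amounts to calling the function again with a new `focus`. Because g(t) approaches 1 as t approaches 1, displaced points stay inside the lens and the geometry is continuous at its border.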
http://tulip.labri.fr/TulipDrupal/?q=node/351 http://tulip.labri.fr/TulipDrupal/?q=node/371
–shape: rectilinear
–foci: multiple
–impact: global
–metaphor: stretch and squish, borders fixed
[TreeJuxtaposer: Scalable Tree Comparison Using Focus+Context With Guaranteed
Tasiran, Zhang, and Zhou. ACM Transactions on Graphics (Proc. SIGGRAPH) 22:3 (2003), 453–462.]
–combine focus and context information in single view
–length comparisons impaired
comparisons unaffected: connection, containment
–effects of distortion unclear if
–object constancy/tracking may be impaired
[Living Flows: Enhanced Exploration of Edge-Bundled Graphs Based on GPU-Intensive Edge Rendering. Lambert, Auber, and Melançon. Proc. Intl. Conf. Information Visualisation (IV), pp. 523–530, 2010.]
[Figure panels: fisheye lens, magnifying lens, neighborhood layering, Bring and Go]