
I590 Interactive Visual Analytics Week 13 | Nov 16, 2016 Filtering and Aggregation Models in Visual Analytics Khairi Reda | redak@iu.edu School of Informatics & Computing, IUPUI

SLIDE 1

Khairi Reda | redak@iu.edu School of Informatics & Computing, IUPUI

Week 13 | Nov 16, 2016 Filtering and Aggregation Models in Visual Analytics

I590 Interactive Visual Analytics

SLIDE 2

http://www.michelecoscia.com/wp-content/uploads/2012/08/demon2.png

SLIDE 3

Filtering & Aggregation

Too much data can overwhelm the visualization
Sometimes we need to show fewer data points
Filter: eliminate irrelevant items
Aggregate: group similar items

SLIDE 4

SLIDE 5

Filter

Any function that partitions the data into two sets based on attributes
Larger / smaller than X
Within a specified geographic extent
Noisy / significant readings
Filtering can also be applied to attributes, as opposed to the data points themselves

Based on a slide by Alex Lex
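
A filter in the sense above can be sketched as a predicate that splits items into two sets. This is a minimal illustration only: the records, the attribute names (`value`, `lat`), and the thresholds are hypothetical and do not come from the slides.

```python
def partition(items, predicate):
    """Split items into (kept, rejected) according to a boolean predicate."""
    kept, rejected = [], []
    for item in items:
        (kept if predicate(item) else rejected).append(item)
    return kept, rejected


# Hypothetical records; the attribute names are made up for illustration.
readings = [
    {"id": 1, "value": 12.0, "lat": 39.8},
    {"id": 2, "value": 3.5, "lat": 41.2},
    {"id": 3, "value": 27.1, "lat": 39.1},
]

# "Larger / smaller than X", with an assumed X = 10
large, small = partition(readings, lambda r: r["value"] > 10)

# "Within a specified geographic extent" (here simplified to a latitude band)
inside, outside = partition(readings, lambda r: 39.0 <= r["lat"] <= 40.0)

# Filtering attributes rather than items: keep only some columns
keep_columns = {"id", "value"}
projected = [{k: v for k, v in r.items() if k in keep_columns} for r in readings]
```

The same `partition` helper serves every item-level filter; only the predicate changes. Attribute filtering is a different operation, since it drops columns instead of rows.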

SLIDE 6

Filtering with Dynamic Queries

Shneiderman

SLIDE 7

Filtering with menus

SLIDE 8

Scented Widgets

Willett 2007, Via Alex Lex

Provide cues (scent) to users to aid in filtering and exploration
Usually come in the form of small visual representations bound to interface elements

SLIDE 9

Interactive Legends

Provide filtering controls from the legend

Riche 2010, Via Alex Lex

SLIDE 10

Aggregation

SLIDE 11

Histogram

Aggregate items into bins
Display the number of items (i.e., frequency) in each bin
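
The binning step can be sketched as follows. This is a minimal illustration assuming equal-width bins over the observed range; the `ages` list is made-up sample data, not the passenger data shown in the lecture.

```python
def histogram(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins  # assumes hi > lo
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it
        # into the last bin.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts


# Made-up sample data for illustration
ages = [2, 5, 18, 21, 22, 25, 30, 34, 41, 58, 62, 80]

coarse = histogram(ages, 2)
fine = histogram(ages, 4)
```

Calling `histogram` with different values of `num_bins` on the same data gives different count profiles, which is why the bin count is a design decision and not a neutral default.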

SLIDE 12

Histogram

Number of bins can affect the shape of the histogram

Figure: distribution of passengers by age, shown with 10 bins and with 20 bins

Based on a slide by Alex Lex

SLIDE 13

http://web.stanford.edu/~mwaskom/software/seaborn/tutorial/plotting_distributions.html

Density plots

SLIDE 14

Box plots (aka box-and-whisker plots)

First quartile: splits off the lowest 25% of the data
Median: splits the data in half
Third quartile: splits off the highest 25% of the data

http://image.mathcaptain.com/cms/images/106/box-plot.png

SLIDE 15

Box plots (aka box-and-whisker plots)

Wikipedia

An alternative to drawing the whiskers at the min/max is to scale them by the interquartile range (Q3 - Q1)
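
The quantities behind a box plot can be sketched as below. Two choices here are assumptions and are not stated on the slides: the whisker scale of 1.5 times the interquartile range (a common convention) and the "inclusive" quantile method of Python's `statistics` module. Other tools may compute quartiles slightly differently.

```python
import statistics


def box_plot_summary(values, whisker_scale=1.5):
    """Return quartiles, IQR-scaled whisker ends, and outliers.

    whisker_scale=1.5 is an assumed convention, not taken from the slides.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker_scale * iqr
    upper_fence = q3 + whisker_scale * iqr
    # Whiskers extend to the most extreme data points inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Made-up data with one extreme value
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 50])
```

With min/max whiskers the upper whisker would stretch all the way to 50. With IQR-scaled whiskers it stops at 8 and the value 50 is reported separately as an outlier.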

SLIDE 16

One box plot, four distributions

http://stat.mq.edu.au/wp-content/uploads/2014/05/Can_the_Box_Plot_be_Improved.pdf

SLIDE 17

Distribution, error bars, and box plots

Streit & Gehlenborg, PoV, Nature Methods, 2014 Via Alex Lex

SLIDE 18

Violin plots

http://web.stanford.edu/~mwaskom/software/seaborn/tutorial/plotting_distributions.html

SLIDE 19

Heatmaps

Aggregate 2D points into 2D bins
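
A minimal sketch of 2D binning, assuming a regular grid over a fixed extent. The function name, the grid size, and the sample points are illustrative only. Each cell count would then be mapped to a color to draw the heatmap.

```python
def bin_2d(points, x_range, y_range, nx, ny):
    """Count points per cell of an nx-by-ny grid; returns grid[row][col]."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore points outside the extent
        # Clamp so that points exactly on the upper edge fall in the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * nx), nx - 1)
        row = min(int((y - y_min) / (y_max - y_min) * ny), ny - 1)
        grid[row][col] += 1
    return grid


# Made-up scatterplot points in the unit square
points = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.1), (0.6, 0.8), (1.0, 1.0)]
grid = bin_2d(points, (0.0, 1.0), (0.0, 1.0), nx=2, ny=2)
```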

SLIDE 20

Heatmaps (for scatterplots)

SLIDE 21

Spatial Aggregation

Changing the boundaries / structure of the aggregation bins yields different results

Based on a slide by Alex Lex

SLIDE 22

Spatial Aggregation

Gerrymandering

Based on a slide by Alex Lex
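
The effect of bin boundaries on an aggregate can be illustrated with a toy election. Everything in this sketch is hypothetical: nine voters, two parties labeled "A" and "B", and two made-up district plans that group the same voters differently.

```python
from collections import Counter


def district_winners(votes, districts):
    """Return the winning party of each district (a list of voter indices)."""
    winners = []
    for members in districts:
        tally = Counter(votes[i] for i in members)
        winners.append(tally.most_common(1)[0][0])
    return winners


# Hypothetical electorate: 5 votes for A, 4 votes for B
votes = "AAAAABBBB"

# Two ways of aggregating the same voters into three districts of three
plan_1 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
plan_2 = [[0, 1, 2], [3, 5, 6], [4, 7, 8]]

winners_1 = district_winners(votes, plan_1)
winners_2 = district_winners(votes, plan_2)
```

The individual votes are identical in both plans, yet party A wins two districts under `plan_1` and only one under `plan_2`. Only the aggregation boundaries changed.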

SLIDE 23

Clustering

Classification of items into “similar” bins
Typically based on a similarity measure
Euclidean distance, Pearson correlation, etc…
Many different clustering algorithms, each with weaknesses and strengths
K-Means
Hierarchical clustering
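
The two similarity measures named above can be sketched as follows. The function names and sample vectors are illustrative. The example vectors are chosen so that the two measures disagree.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.dist(a, b)


def pearson_correlation(a, b):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)  # assumes neither vector is constant


# Made-up items: same profile shape, very different magnitudes
item_1 = [1.0, 2.0, 3.0]
item_2 = [10.0, 20.0, 30.0]

distance = euclidean_distance(item_1, item_2)
correlation = pearson_correlation(item_1, item_2)
```

Here the items are far apart by Euclidean distance but perfectly correlated, so the choice of measure decides whether a clustering algorithm treats them as similar.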

SLIDE 24

K-Means

Pick K starting points as centroids. These will eventually define the clusters
Calculate the distance of every point to each centroid, assigning the point to the closest centroid
Update each centroid to the average of its cluster’s members
Repeat
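
The steps above can be sketched in plain Python. Several details are assumptions that the slide leaves open: the first K points are used as starting centroids (random selection is also common), distance is Euclidean, and the loop stops when assignments no longer change or after a fixed number of iterations.

```python
import math


def k_means(points, k, iterations=100):
    """Minimal k-means; returns (centroids, assignment per point)."""
    # Assumption: use the first k points as starting centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assign every point to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the average of its cluster's members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignment


# Made-up data: two well-separated groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = k_means(points, k=2)
```

Even though both starting centroids lie in the same group, the repeated assign-and-update loop pulls one of them over to the second group within a few iterations.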

SLIDE 25

K-Means

Limitations

Have to pick K
Assumptions about the data: roughly “circular” clusters of equal size

http://stats.stackexchange.com/questions/133656/how-to-understand-the-drawbacks-of-k-means

SLIDE 26

K-Means

Limitations

http://stats.stackexchange.com/questions/133656/how-to-understand-the-drawbacks-of-k-means

SLIDE 27

Dimensionality Reduction

Item  Attr 1  Attr 2  Attr 3  Attr 4  Attr 5  Attr 6  Attr 7  Attr 8  Attr 9  Attr 10  Attr 11  …
A
B
C
…

High-dimensional data: large number of attributes
Dimensionality reduction: reduce the number of dimensions (attributes) while keeping as much variation as possible

SLIDE 28

Dimensionality Reduction

Item  Attr 1  Attr 2  Attr 3  Attr 4  Attr 5  Attr 6  Attr 7  Attr 8  Attr 9  Attr 10  Attr 11  …
A
B
C
…

Principal component analysis
Multidimensional scaling
And other techniques…

SLIDE 29

Principal Component Analysis (PCA)

Find a new set of dimensions (axes) that explains the majority of the variance in the data
Order the new dimensions by variance
The first principal component accounts for the most variance
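
For intuition, PCA can be sketched for the two-attribute case, where the eigen-decomposition of the 2x2 covariance matrix has a closed form. This is a simplified illustration, not a general implementation: real data usually has many attributes and would use a linear algebra library. The function name and sample points are hypothetical.

```python
import math


def pca_2d(points):
    """Return [(variance, axis), ...] for 2D points, largest variance first."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    var_1, var_2 = half_trace + spread, half_trace - spread
    # Direction of the axis with the largest variance
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    axis_1 = (math.cos(angle), math.sin(angle))
    axis_2 = (-math.sin(angle), math.cos(angle))  # perpendicular to axis_1
    return [(var_1, axis_1), (var_2, axis_2)]


# Made-up points lying exactly on the line y = x
components = pca_2d([(1, 1), (2, 2), (3, 3)])
(var_1, axis_1), (var_2, axis_2) = components
```

For these points all of the variance lies along the diagonal, so the first component points along y = x and the second carries essentially no variance. Dropping the second component would therefore lose almost nothing.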

SLIDE 30

Principal Component Analysis (PCA)

http://setosa.io/ev/principal-component-analysis/

SLIDE 31

Multidimensional scaling (MDS)

Project the high-dimensional space onto a much lower-dimensional space (e.g., 2D)
Relies on similarity between points (usually have to compute the similarity between every pair of points)
Non-linear transformation: more difficult to interpret than PCA, but can maintain structures better in some cases
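
The pairwise computation that MDS relies on can be sketched as below, together with one way to score a candidate low-dimensional layout. The scoring function is an assumption: it uses raw stress (the sum of squared differences between original and layout distances), which is one common MDS objective and is not named on the slide. The sketch only evaluates a layout; it does not search for one.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points, keyed by (i, j)."""
    return {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }


def raw_stress(high_dim_points, layout):
    """Sum of squared differences between original and layout distances."""
    target = pairwise_distances(high_dim_points)
    actual = pairwise_distances(layout)
    return sum((actual[pair] - target[pair]) ** 2 for pair in target)


# Made-up 3D items that happen to lie in a plane
items = [(0, 0, 0), (3, 0, 0), (0, 4, 0)]

good_layout = [(0, 0), (3, 0), (0, 4)]  # preserves every pairwise distance
poor_layout = [(0, 0), (1, 0), (0, 1)]  # distorts the distances

stress_good = raw_stress(items, good_layout)
stress_poor = raw_stress(items, poor_layout)
```

The number of pairs grows quadratically with the number of items, which is why computing all pairwise similarities becomes expensive on large datasets.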

SLIDE 32

Models in Visual Analytics

Adapted from: http://slideplayer.com/slide/4659134/ and from Remo Chang, 2010

SLIDE 33

Models in Visual Analytics

Abstractions of how visualization works:
Provide a way of talking about how humans interact with visualizations
Language for describing different parts of the visual analytic process
Every model is (overly) simplified: beware!

SLIDE 34

Terminology / Assumptions

Sensemaking: the act of processing incomplete information in order to improve one’s understanding of a situation and/or to make decisions
A person’s decision making is bounded by [1]
incomplete information
the amount of time they have to decide
the finite processing power of their brain
Mental model: an abstracted version of the real world that is more tractable

[1] H. Simon 1957. “A Behavioral Model of Rational Choice”

SLIDE 35

Models in Visual Analytics

SLIDE 36

Information Visualization Reference Model

Card, Mackinlay, and Shneiderman. Readings in Information Visualization: Using Vision to Think, Morgan Kaufmann, 1999, p. 17

SLIDE 37

Van Wijk’s Model

Van Wijk, J. “The value of visualization”, 2005

D=Data V=Visualization S=Specification I=Image P=Perception K=Knowledge E=Exploration

SLIDE 38

Keim’s Visual Analytics Model

Keim, D et al. “Visual Analytics: Definition, process, and challenges”, 2008

SLIDE 39

Pirolli and Card Sensemaking model

Pirolli, P and Card, S. “The sensemaking process and leverage points for analyst technology as identified through cognitive task analysis”, 2005


Next week

Class canceled — Happy Thanksgiving!

Week 15 - Nov 30

Time series and temporal data
Inference and uncertainty in visualization