I590 Interactive Visual Analytics Week 13 | Nov 16, 2016 Filtering - - PowerPoint PPT Presentation

I590 Interactive Visual Analytics Week 13 | Nov 16, 2016 Filtering and Aggregation Models in Visual Analytics Khairi Reda | redak@iu.edu School of Informatics & Computing, IUPUI


slide-1
SLIDE 1

Khairi Reda | redak@iu.edu School of Informatics & Computing, IUPUI

Week 13 | Nov 16, 2016 Filtering and Aggregation Models in Visual Analytics

I590 Interactive Visual Analytics

slide-2
SLIDE 2

http://www.michelecoscia.com/wp-content/uploads/2012/08/demon2.png

slide-3
SLIDE 3

Filtering & Aggregation

  • Too much data can overwhelm the visualization
  • Sometimes we need to show fewer data points
  • Filter: eliminate irrelevant items
  • Aggregate: group similar items
slide-4
SLIDE 4
slide-5
SLIDE 5

Filter

  • Any function that partitions the data into two sets based on attributes
  • Larger / smaller than X
  • Within a specified geographic extent
  • Noisy / significant readings
  • Filtering can also be applied to attributes, as opposed to the data points themselves

Based on a slide by Alex Lex
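
The filter definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the `Reading` record, the `partition` helper, the threshold of 10.0, and the bounding-box coordinates are all made-up assumptions.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    # Hypothetical record type used only for this illustration.
    name: str
    value: float
    lat: float
    lon: float


def partition(items, predicate):
    """Split items into (kept, rejected) according to a predicate."""
    kept, rejected = [], []
    for item in items:
        (kept if predicate(item) else rejected).append(item)
    return kept, rejected


# Made-up sample data.
readings = [
    Reading("a", 12.0, 39.77, -86.16),
    Reading("b", 3.5, 41.88, -87.63),
    Reading("c", 27.1, 39.10, -84.51),
]

# Attribute threshold: "larger / smaller than X" with X = 10.0.
large, small = partition(readings, lambda r: r.value > 10.0)


# Geographic extent: a latitude/longitude bounding box.
def in_box(r, south=39.0, north=40.0, west=-87.0, east=-84.0):
    return south <= r.lat <= north and west <= r.lon <= east


inside, outside = partition(readings, in_box)

# Filtering attributes rather than items: keep only selected fields.
projected = [{"name": r.name, "value": r.value} for r in readings]
```

Each filter is just a predicate, so an interactive control such as a slider or a map brush only has to produce a new predicate and re-run the partition.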

slide-6
SLIDE 6

Filtering with Dynamic Queries

Shneiderman

slide-7
SLIDE 7

Filtering with menus

slide-8
SLIDE 8

Scented Widgets

Willett 2007, Via Alex Lex

  • Provide cues (scent) to the users to aid in filtering and exploration
  • Usually come in the form of small visual representations that bind to interface elements

slide-9
SLIDE 9

Interactive Legends

  • Provide filtering controls from the legend

Riche 2010, Via Alex Lex

slide-10
SLIDE 10

Aggregation

slide-11
SLIDE 11

Histogram

  • Aggregate items into bins
  • Display the number of items (i.e., frequency) in each bin
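
As a rough sketch of the binning step, the following Python computes equal-width bins and their frequencies. The `histogram` function and the list of ages are illustrative assumptions, not data from the slides, and the sketch assumes the values are not all identical.

```python
def histogram(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins  # assumes hi > lo
    counts = [0] * num_bins
    for v in values:
        # The maximum value sits on the right edge; put it in the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


# Made-up ages, for illustration only.
ages = [2, 5, 19, 22, 24, 25, 29, 31, 35, 38, 41, 47, 52, 60, 71, 80]

edges4, counts4 = histogram(ages, 4)
edges8, counts8 = histogram(ages, 8)
```

Calling the function with a different `num_bins` on the same data gives a different set of counts, which is why the choice of bin count matters when reading a histogram.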

slide-12
SLIDE 12

Histogram

Number of bins can affect the shape of the histogram

[Figure: distribution of passengers by age, drawn with 10 bins and with 20 bins]

Based on a slide by Alex Lex

slide-13
SLIDE 13

http://web.stanford.edu/~mwaskom/software/seaborn/tutorial/plotting_distributions.html

Density plots
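
One common way to draw a density plot is a Gaussian kernel density estimate. The sketch below is an assumption about how such a curve could be computed by hand; the slide does not say which estimator is used, and the samples and bandwidth are made up.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of samples at any x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, centered on that sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Made-up samples and bandwidth, for illustration only.
samples = [1.0, 1.2, 1.9, 2.1, 2.3, 5.0, 5.4]
density = gaussian_kde(samples, bandwidth=0.5)

# Evaluate the curve on a grid from -2.0 to 9.0 in steps of 0.1.
curve = [(x / 10.0, density(x / 10.0)) for x in range(-20, 91)]
```

The bandwidth plays a role similar to the bin width of a histogram: a smaller value shows more local detail, a larger value gives a smoother curve.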

slide-14
SLIDE 14

Box plots (aka box-and-whisker plots)

  • First quartile: splits off the lowest 25% of the data
  • Median: splits the data in half
  • Third quartile: splits off the highest 25% of the data

http://image.mathcaptain.com/cms/images/106/box-plot.png

slide-15
SLIDE 15

Box plots (aka box-and-whisker plots)

Wikipedia

  • An alternative representation to the min/max is to scale the whiskers by the Interquartile Range (Q3-Q1)
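
The quartiles and the IQR-scaled whiskers can be computed with Python's standard `statistics` module. This is a sketch under stated assumptions: the multiplier of 1.5 is the common convention rather than something given on the slide, the "inclusive" quartile method is one of several in use, and the data and the `box_plot_stats` name are made up.

```python
import statistics


def box_plot_stats(values, whisker_scale=1.5):
    """Compute the numbers a box plot draws, with IQR-scaled whiskers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_scale * iqr
    high_fence = q3 + whisker_scale * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Made-up data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
stats = box_plot_stats(data)
```

With min/max whiskers the upper whisker would stretch to 40; with IQR scaling it stops at 9 and 40 is reported as an outlier.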

slide-16
SLIDE 16

One box plot, four distributions

http://stat.mq.edu.au/wp-content/uploads/2014/05/Can_the_Box_Plot_be_Improved.pdf

slide-17
SLIDE 17

Distribution, error bars, and box plots

Streit & Gehlenborg, PoV, Nature Methods, 2014 Via Alex Lex

slide-18
SLIDE 18

Violin plots

http://web.stanford.edu/~mwaskom/software/seaborn/tutorial/plotting_distributions.html

slide-19
SLIDE 19

Heatmaps

  • Aggregate 2D points into 2D bins
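
A minimal sketch of 2D binning, assuming a regular grid over a known rectangular range. The `bin_2d` function, the grid size, and the points are illustrative assumptions.

```python
def bin_2d(points, x_range, y_range, nx, ny):
    """Count points per cell of an nx-by-ny grid; grid[row][col] holds the count."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # skip points outside the plotted extent
        # Points on the upper edge are placed in the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * nx), nx - 1)
        row = min(int((y - y_min) / (y_max - y_min) * ny), ny - 1)
        grid[row][col] += 1
    return grid


# Made-up scatterplot points in the unit square.
points = [(0.1, 0.2), (0.15, 0.25), (0.8, 0.9), (0.5, 0.5), (0.95, 0.05)]
grid = bin_2d(points, (0.0, 1.0), (0.0, 1.0), nx=2, ny=2)
```

Each cell count would then be mapped to a color, so dense regions remain readable even when individual points overlap.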
slide-20
SLIDE 20

Heatmaps (for scatterplots)

slide-21
SLIDE 21

Spatial Aggregation

Changing the boundaries / structure of the aggregation bins yields different results

Based on a slide by Alex Lex
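
The effect of moving bin boundaries can be shown with a toy example: the same nine precinct results, aggregated under two different district plans. Everything here (the precinct votes, both plans, and the `district_winners` helper) is made up for illustration.

```python
from collections import Counter


def district_winners(precincts, districts):
    """Return the party that wins the most precincts in each district."""
    winners = []
    for district in districts:
        tally = Counter(precincts[i] for i in district)
        winners.append(tally.most_common(1)[0][0])
    return winners


# Hypothetical precinct results: 4 precincts for party A, 5 for party B.
precincts = ["A", "A", "A", "B", "B", "B", "B", "B", "A"]

# Two hypothetical ways of grouping the same precincts into 3 districts.
plan_a = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
plan_b = [[0, 1, 3], [2, 8, 4], [5, 6, 7]]

result_a = district_winners(precincts, plan_a)
result_b = district_winners(precincts, plan_b)
```

The underlying data never changes, yet party B wins two districts under the first plan and party A wins two under the second.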

slide-22
SLIDE 22

Spatial Aggregation

Gerrymandering

Based on a slide by Alex Lex

slide-23
SLIDE 23

Clustering

  • Classification of items into “similar” bins
  • Typically based on a similarity measure
  • Euclidean distance, Pearson correlation, etc.
  • Many different clustering algorithms, with weaknesses and strengths
  • K-Means
  • Hierarchical clustering
slide-24
SLIDE 24

K-Means

  • Pick K starting points as centroids; these will eventually define the clusters
  • Calculate the distance of every point to each centroid, assigning the point to the closest centroid
  • Update each centroid to the average of its cluster’s members
  • Repeat until the assignments stop changing
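
The steps above can be sketched in plain Python. This is a minimal illustration rather than a production implementation: it assumes Euclidean distance, picks the starting centroids at random from the data, and the `k_means` name, seed, and sample points are made up.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Cluster points (tuples of numbers) into k groups."""
    rng = random.Random(seed)
    # Step 1: pick K starting points as centroids.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Step 2: assign every point to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing
        assignment = new_assignment
        # Step 3: move each centroid to the average of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return centroids, assignment


# Made-up data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(points, k=2)
```

On well-separated, compact groups like these the result is stable. With other data the outcome can depend on the starting centroids, so the algorithm is commonly run several times.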
slide-25
SLIDE 25

K-Means

  • Have to pick K
  • Assumptions about the data: roughly “circular” clusters of equal size

http://stats.stackexchange.com/questions/133656/how-to-understand-the-drawbacks-of-k-means

Limitations

slide-26
SLIDE 26

K-Means

Limitations

http://stats.stackexchange.com/questions/133656/how-to-understand-the-drawbacks-of-k-means

slide-27
SLIDE 27

Dimensionality Reduction

[Table: rows are items A, B, C, …; columns are Attr 1 through Attr 11, …]

  • High-dimensional data: large number of attributes
  • Dimensionality reduction: reduce the number of dimensions (attributes) while keeping as much variation as possible

slide-28
SLIDE 28

Dimensionality Reduction

[Table: rows are items A, B, C, …; columns are Attr 1 through Attr 11, …]

  • Principal component analysis
  • Multidimensional scaling
  • And other techniques…
slide-29
SLIDE 29

Principal Component Analysis (PCA)

  • Find a new set of dimensions (axes) that explains the majority of the variance in the data
  • Order the new dimensions by variance
  • The first principal component accounts for the most variance
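
As a hedged sketch of the idea, the code below finds only the first principal component, using power iteration on the covariance matrix. Libraries normally compute all components with an eigendecomposition or SVD. The function name and the data are assumptions made for illustration.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, share of total variance) of the first component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiplying by cov pulls the vector toward
    # the eigenvector with the largest eigenvalue. This assumes the start
    # vector is not orthogonal to that eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    total = sum(cov[i][i] for i in range(d))
    return v, variance / total


# Made-up 2D data lying close to the line y = 2x.
rows = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]
direction, explained = first_principal_component(rows)
```

For this data the returned direction points roughly along the line y = 2x, and nearly all of the variance lies along that single axis, so one dimension is enough to describe the data.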

slide-30
SLIDE 30

Principal Component Analysis (PCA)

http://setosa.io/ev/principal-component-analysis/

slide-31
SLIDE 31

Multidimensional scaling (MDS)

  • Project the high-dimensional space onto a much lower-dimensional space (e.g., 2D)
  • Relies on similarity between points (usually requires computing the similarity between every pair of points)
  • Non-linear transformation: more difficult to interpret than PCA, but can maintain structures better in some cases
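
A full MDS solver is beyond a short example, but the two ingredients it relies on can be sketched: the pairwise distance matrix and a "stress" score that measures how well a low-dimensional layout preserves those distances. The raw (unnormalized) stress used here is one common choice and is an assumption, as are the function names and the points.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Distance between every pair of points, as a full n x n matrix."""
    n = len(points)
    return [
        [math.dist(points[i], points[j]) for j in range(n)] for i in range(n)
    ]


def raw_stress(target, embedding):
    """Sum of squared gaps between target and embedded pairwise distances."""
    return sum(
        (target[i][j] - math.dist(embedding[i], embedding[j])) ** 2
        for i, j in combinations(range(len(embedding)), 2)
    )


# Made-up 3D points; they happen to lie on a plane, so a perfect 2D layout exists.
high_dim = [(0, 0, 0), (3, 0, 0), (0, 4, 0), (3, 4, 0)]
target = pairwise_distances(high_dim)

good_2d = [(0, 0), (3, 0), (0, 4), (3, 4)]  # preserves every distance
poor_2d = [(0, 0), (1, 0), (2, 0), (3, 0)]  # squashes the points onto a line

stress_good = raw_stress(target, good_2d)
stress_poor = raw_stress(target, poor_2d)
```

An MDS algorithm searches for the layout with the lowest stress. Because the distance matrix has one entry per pair of points, its size grows quadratically with the number of items.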

slide-32
SLIDE 32

Models in Visual Analytics

Adapted from: http://slideplayer.com/slide/4659134/ and from Remo Chang, 2010

slide-33
SLIDE 33

Models in Visual Analytics

  • Abstractions of how visualization works:
  • Provide a way of talking about how humans interact with visualizations
  • Language for describing different parts of the visual analytic process
  • Every model is (overly) simplified: beware!

slide-34
SLIDE 34

Terminology / Assumptions

  • Sense making: The act of processing incomplete information in order to improve one’s understanding of a situation and/or to make decisions
  • A person’s decision making is bounded by [1]
  • incomplete information
  • the amount of time they have to decide
  • the finite processing power of their brain
  • Mental model: An abstracted version of the real world that is more tractable

[1] H. Simon 1957. “A Behavioral Model of Rational Choice”

slide-35
SLIDE 35

Models in Visual Analytics

slide-36
SLIDE 36

Information Visualization Reference Model

Card, Mackinlay, and Shneiderman. Readings in Information Visualization: Using Vision to Think, Morgan Kaufmann, 1999, p. 17

slide-37
SLIDE 37

Van Wijk’s Model

Van Wijk, J. “The value of visualization”, 2005

D=Data, V=Visualization, S=Specification, I=Image, P=Perception, K=Knowledge, E=Exploration

slide-38
SLIDE 38

Keim’s Visual Analytics Model

Keim, D et al. “Visual Analytics: Definition, process, and challenges”, 2008

slide-39
SLIDE 39

Pirolli and Card Sensemaking model

Pirolli, P and Card, S. “The sense making process and leverage points for analyst technology as identified through cognitive task analysis”, 2005

slide-40
SLIDE 40

Next week

Class canceled — Happy Thanksgiving!

Week 15 - Nov 30

Time series and temporal data
Inference and uncertainty in visualization