SLIDE 1

Courtesy of Prof. Shixia Liu @Tsinghua University

SLIDE 2

Outline

  • Introduction
  • Classification of Techniques
    – Table
    – Scatter Plot Matrices
    – Projections
    – Parallel Coordinates
  • Summary

SLIDE 3

Motivation

  • Real-world data contain multiple dimensions

SLIDE 4

Multivariate/Multidimensional Data Visualization

  • Multivariate data visualization is a specific type of information visualization that deals with multivariate/multidimensional data
  • The data to be visualized are of high dimensionality, and the correlations between these many attributes are of interest

SLIDE 5

Dimensionality

  • Refers to the number of attributes present in the data
    – 1: one-dimensional (1D) / univariate
    – 2: two-dimensional (2D) / bivariate
    – 3: three-dimensional (3D) / trivariate
    – >3: multidimensional / hypervariate / multivariate
  • The boundary between high and low dimensionality is not clear-cut; generally, high-dimensional data have >4 variables

SLIDE 6

Terminology

  • Dimensions (independent) vs. variables (dependent)
  • Multidimensional: the dimensionality of the independent dimensions
  • Multivariate: the dimensionality of the dependent variables

SLIDE 7

Outline

  • Introduction
  • Classification of Techniques

    – Projections
    – Parallel Coordinates
    – Table
    – Scatter Plot Matrices

  • Summary

SLIDE 8

Classification of Techniques

  • Projection
  • Parallel Coordinates Plot
  • Table
  • Scatter Plot Matrix
SLIDE 9

  • What if we have too many dimensions?
  • An intuitive way is to project to a low-dimensional space
  • Linear projections
  • Nonlinear projections

A projection (X -> Y) maps points {x1, x2, …, xm} in an n-dimensional space to points {y1, y2, …, ym} in a p-dimensional space (p << n) while preserving distance measures between data items.
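Such a linear projection is just a matrix product. A minimal NumPy sketch (illustrative only; here the projection matrix W is random rather than chosen by PCA or t-SNE):

```python
import numpy as np

rng = np.random.default_rng(0)

# m points {x1, ..., xm} in an n-dimensional space, projected to p dimensions.
m, n, p = 100, 10, 2
X = rng.normal(size=(m, n))

# Each column of W is a direction in the n-dimensional space; the rows of Y
# are the p-dimensional images {y1, ..., ym}.
W = rng.normal(size=(n, p))
Y = X @ W

print(Y.shape)  # (100, 2)
```

A PCA or t-SNE mapping would choose the projection to preserve distance measures between data items, which a random W does not.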

SLIDE 10

Classification

  • Linear projection
    – Example: PCA (principal component analysis)
  • Non-linear projection
    – Example: t-SNE (t-distributed stochastic neighbor embedding)

SLIDE 11

PCA

  • Seeks a space of lower dimensionality (magenta)
  • Such that the orthogonal projection of the data points (red) onto this subspace maximizes the variance of the projected points (green)

SLIDE 12

Maximizes Variance

  • To begin with, consider the projection onto a one-dimensional space

  • The direction of this space
  • Variance
  • How to maximize this?

Trick:
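The equations this slide refers to were lost in extraction; following the standard derivation in Pattern Recognition and Machine Learning, Chapter 12 (which the deck cites later), they can be reconstructed as:

```latex
% Let u_1 be the unit direction of the one-dimensional space (u_1^T u_1 = 1).
% The variance of the projected data is
\frac{1}{N}\sum_{n=1}^{N}\left(u_1^{T}x_n - u_1^{T}\bar{x}\right)^{2} = u_1^{T} S u_1

% Trick: enforce the unit-norm constraint with a Lagrange multiplier and maximize
u_1^{T} S u_1 + \lambda_1\left(1 - u_1^{T} u_1\right)
```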

SLIDE 13

Maximizes Variance (cont’d)

  • Eigenvalue
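The eigenvalue condition this bullet alludes to follows by setting the derivative of the Lagrangian to zero (standard PCA result, reconstructed here):

```latex
S u_1 = \lambda_1 u_1
\qquad\Rightarrow\qquad
u_1^{T} S u_1 = \lambda_1
```

So the projected variance equals the eigenvalue, and it is maximized by taking u1 to be the eigenvector of S with the largest eigenvalue (the first principal component).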
SLIDE 14

One Example

SLIDE 15

Extension to M Dimensions

  • Define additional principal components in an incremental fashion (for details, refer to Chapter 12 of Pattern Recognition and Machine Learning)
  • Conclusion for M dimensions: the M eigenvectors u1, ..., uM of the data covariance matrix S corresponding to the M largest eigenvalues λ1, ..., λM

SLIDE 16

Covariance Matrix
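As a sketch of how the covariance matrix is used (illustrative NumPy code, not from the slides): compute S, take its eigendecomposition, and confirm that the largest eigenvalue is the variance along the first principal direction:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 200 points in 3-D, with most variance along the first axis.
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])

# Covariance matrix S = (1/N) * sum_n (x_n - mean)(x_n - mean)^T
Xc = X - X.mean(axis=0)
S = (Xc.T @ Xc) / len(X)

# np.linalg.eigh returns eigenvalues of the symmetric S in ascending order,
# so the last column of eigvecs is the first principal direction u1.
eigvals, eigvecs = np.linalg.eigh(S)
u1 = eigvecs[:, -1]

# The largest eigenvalue equals the variance of the data projected onto u1.
print(np.isclose(eigvals[-1], np.var(Xc @ u1)))  # True
```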

SLIDE 17

Fit an n-d Ellipsoid to the Data

SLIDE 18

t-SNE

SLIDE 19

t-SNE

  • Particularly well-suited for embedding high-dimensional data into a space of two or three dimensions, which can then be visualized in a scatter plot

SLIDE 20

Major Goal

  • t-distributed stochastic neighbor embedding (t-SNE) minimizes the divergence between two distributions: a distribution that measures pairwise similarities of the input objects, and a distribution that measures pairwise similarities of the corresponding low-dimensional points in the embedding

SLIDE 21

Two Main Stages

  • First, t-SNE constructs a probability distribution over pairs of high-dimensional objects
    – Similar objects have a high probability of being picked
    – Dissimilar points have an extremely small probability of being picked
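This first-stage distribution is defined via Gaussian kernels; the formulas below are reconstructed from the t-SNE paper cited in the references (σi is set per point through a user-chosen perplexity, m is the number of data points):

```latex
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^{2} / 2\sigma_i^{2}\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^{2} / 2\sigma_i^{2}\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2m}
```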

SLIDE 22

Example: Step 1

SLIDE 23

Two Main Stages (cont’d)

  • Second, t-SNE defines a probability distribution over the points in the low-dimensional map
    – Similar to the one in the high-dimensional space
    – Minimizes the Kullback–Leibler divergence between the two distributions with respect to the locations of the points in the map

Heavy-tailed Student t-distribution
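The map similarities use this heavy-tailed Student t kernel, and the objective is the Kullback–Leibler divergence; the formulas are reconstructed from the t-SNE paper cited in the references:

```latex
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^{2}\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^{2}\right)^{-1}},
\qquad
C = \mathrm{KL}(P \parallel Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```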

SLIDE 24

Example: Step Two

SLIDE 25

Example: Step Two

Before optimization

SLIDE 26

Example: Final Result

Student t-distribution vs. Gaussian distribution

SLIDE 27

The Student t-distribution

  • The volume of an N-dimensional ball of radius r scales as r^N
  • When N is large, if we pick random points uniformly in the ball, most points will be close to the surface, and very few will be near the center

SLIDE 28

The Student t-distribution

  • If the same Gaussian distribution is used for the low-dimensional map points, not enough space is available in the low-dimensional space
    – The crowding problem
  • Use a Student t-distribution with one degree of freedom (i.e., a Cauchy distribution) for the map points instead
    – It has a much heavier tail than the Gaussian distribution, which compensates for the original imbalance

SLIDE 29

Comparison

SLIDE 30

The Distribution Model

  • Probability model for the high-dimensional data points
  • Probability model for the low-dimensional map points
  • The difference between the two distributions
SLIDE 31

The Solution

  • To minimize this score, we perform gradient descent. The gradient can be computed analytically:
  • Update yi iteratively
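The analytic gradient referred to here (reconstructed from the t-SNE paper cited in the references) is:

```latex
\frac{\partial C}{\partial y_i}
= 4 \sum_{j}\left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)
  \left(1 + \lVert y_i - y_j \rVert^{2}\right)^{-1}
```

Each yi is then moved iteratively against this gradient (with a learning rate and, in practice, a momentum term).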
SLIDE 32

One Example

SLIDE 33

Example: MNIST

  • Handwritten digits (0-9)
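An experiment of this kind can be reproduced approximately with scikit-learn (assuming `sklearn` is installed; the subset size and parameters here are illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# 8x8 grayscale images of handwritten digits 0-9, flattened to 64-D vectors.
digits = load_digits()
X, y = digits.data[:300], digits.target[:300]

# Embed the 64-dimensional points into 2-D for a scatter plot.
emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X)

print(emb.shape)  # (300, 2)
```

Coloring the resulting scatter plot by the digit label y typically reveals one cluster per digit class.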
SLIDE 34

Package

  • Laurens van der Maaten: https://lvdmaaten.github.io/tsne/
    – L.J.P. van der Maaten. Accelerating t-SNE using Tree-Based Algorithms. Journal of Machine Learning Research 15(Oct): 3221-3245, 2014.
    – L.J.P. van der Maaten and G.E. Hinton. Visualizing Non-Metric Similarities in Multiple Maps. Machine Learning 87(1): 33-55, 2012.
    – L.J.P. van der Maaten. Learning a Parametric Embedding by Preserving Local Structure. In Proceedings of the Twelfth International Conference on Artificial Intelligence & Statistics (AISTATS), JMLR W&CP 5: 384-391, 2009.
    – L.J.P. van der Maaten and G.E. Hinton. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9(Nov): 2579-2605, 2008.

SLIDE 35

Comparison

  • PCA, MDS
    – Linear techniques
    – Keep the low-dimensional representations of dissimilar data points far apart
  • t-SNE
    – Non-linear technique
    – Captures much of the local structure of the high-dimensional data very well, while also revealing global structure such as the presence of clusters at multiple scales

SLIDE 36

Comparison

SLIDE 37

  • Inselberg, "Multidimensional detective" (parallel coordinates), 1997

SLIDE 38

Parallel Coordinates: Visual Design

  • Dimensions as parallel axes
  • Data items as line segments
  • Intersections on the axes indicate the values of the corresponding attributes

(Figure: axes dim1, dim2, dim3, …, dimn, each scaled from Min: 0 to Max: 1; an example polyline crosses them at 0.8, 0.6, 0.8, 0.3, 0.25)
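The construction above can be sketched numerically: min-max normalize each attribute to [0, 1] (the Min/Max labels on the axes), so that each data item becomes a polyline crossing axis j at its normalized value. A hypothetical NumPy helper (a plotting library would draw the actual line segments):

```python
import numpy as np

def parallel_coords_heights(X):
    """Min-max normalize each column of X to [0, 1]. Row i of the result
    holds the heights at which data item i crosses axes dim1..dimn."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Guard against constant columns (hi == lo) to avoid division by zero.
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

X = np.array([[1.0, 10.0, 3.0],
              [3.0, 30.0, 1.0],
              [2.0, 20.0, 2.0]])
H = parallel_coords_heights(X)
# Data item 0 crosses the three axes at heights 0, 0, and 1.
```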

SLIDE 39

Parallel Coordinates: Pros and Cons

Pros:
  • Correlations among attributes can be studied by spotting the locations of the intersection points
  • Effective for revealing data distributions and functional dependencies

Cons:
  • Visual clutter due to the limited space available for each parallel axis
  • Axes are packed very closely when dimensionality is high

SLIDE 40
  • Clustering and filtering approaches
  • Dimension reordering approaches
  • Visual enhancement approaches

Out5d dataset (5 dimensions, 16384 data items)

SLIDE 41

Star Coordinates

  • Scatterplots for higher dimensions: each attribute is an axis on a circle; each data item is a point
  • Changing the length of an axis alters the contribution of that attribute
  • Changing the direction of an axis (angles need not be equal) adjusts correlations between attributes
  • Useful for gaining insight into hierarchically clustered datasets and for multi-factor analysis in decision-making
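The mapping can be sketched as follows (a hypothetical helper, not from the slides): each attribute j gets an axis vector on a circle, and a data item's 2-D position is the sum of its normalized attribute values times the axis vectors; shortening an axis shrinks that attribute's contribution, and rotating it changes how attributes interact.

```python
import numpy as np

def star_coordinates(X, angles=None, lengths=None):
    """Map each row of X (m items, n attributes) to a 2-D point:
    position = sum_j value_ij * a_j, where axis vector a_j has the given
    angle and length. Attribute values are min-max normalized first."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)
    V = (X - lo) / np.where(hi > lo, hi - lo, 1.0)
    if angles is None:                       # evenly spaced axes by default
        angles = 2 * np.pi * np.arange(n) / n
    if lengths is None:                      # equal contributions by default
        lengths = np.ones(n)
    A = np.asarray(lengths)[:, None] * np.stack(
        [np.cos(angles), np.sin(angles)], axis=1)
    return V @ A                             # (m, 2) screen positions

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
P = star_coordinates(X)
```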

SLIDE 42

Table Lens

  • Represents rows as data items and columns as attributes
  • Each column can be viewed as a histogram or plot
  • Information along rows or columns is interrelated
  • Uses the familiar concept of a “table”

Rao and Card, "The table lens: merging graphical and symbolic representations in an interactive focus+context visualization for tabular information"

SLIDE 43

Scatterplot Matrix

  • Scatterplot: 2 attributes projected along the x- and y-axes
  • The collection of scatterplots is organized in a matrix
  • Straightforward
  • Important patterns in higher dimensions are barely recognizable
  • Chaotic when the number of data items is too large
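A scatterplot matrix can be produced directly with pandas (a sketch assuming pandas and matplotlib are installed; the data here is synthetic):

```python
import matplotlib
matplotlib.use("Agg")          # headless backend, so no display is needed
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(2)

# Three attributes, with "b" correlated to "a"; every attribute pair
# gets one scatterplot, and the diagonal shows per-attribute histograms.
df = pd.DataFrame({"a": rng.normal(size=50)})
df["b"] = df["a"] * 0.8 + rng.normal(scale=0.3, size=50)
df["c"] = rng.normal(size=50)

axes = scatter_matrix(df, diagonal="hist")
print(axes.shape)  # (3, 3): an n x n grid of pairwise plots
```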

SLIDE 44

Outline

  • Introduction
  • Classification of Techniques

    – Table
    – Scatter Plot Matrices
    – Projections
    – Parallel Coordinates
    – Pixel-Oriented Techniques
    – Iconography

  • Summary

SLIDE 45

Visualizations: Advantages / Disadvantages

  • Projections. Advantages: clear visual patterns. Disadvantages: 1. obscured semantics; 2. loss of information; 3. visual clutter
  • Parallel coordinates. Advantages: clear visual patterns. Disadvantages: visual clutter
  • Table. Advantages: uses the familiar concept of a “table”. Disadvantages: supports a limited number of dimensions
  • Scatterplot matrix. Advantages: simple. Disadvantages: 1. visual clutter; 2. unclear patterns
SLIDE 46

Further Reading

  • Survey

    – Dos Santos, Selan, and Ken Brodlie. "Gaining understanding of multivariate and multidimensional data through visualization." Computers & Graphics 28.3 (2004): 311-325.

  • Website
    – http://www.sci.utah.edu/~shusenl/highDimSurvey/website/

SLIDE 47

Further Reading

  • Evaluation

    – Rubio-Sánchez, Manuel, et al. "A comparative study between RadViz and Star Coordinates." IEEE Transactions on Visualization and Computer Graphics 22.1 (2016): 619-628.

SLIDE 48

References

  • Rao, Ramana, and Stuart K. Card. "The table lens: merging graphical and symbolic representations in an interactive focus+context visualization for tabular information." Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1994.
  • Gratzl, Samuel, et al. "LineUp: Visual analysis of multi-attribute rankings." IEEE Transactions on Visualization and Computer Graphics 19.12 (2013): 2277-2286.
  • van Wijk, Jarke J., and Robert van Liere. "HyperSlice: visualization of scalar functions of many variables." Proceedings of the 4th Conference on Visualization '93. IEEE Computer Society, 1993.
  • Kim, Hannah, et al. "InterAxis: Steering Scatterplot Axes via Observation-Level Interaction." IEEE Transactions on Visualization and Computer Graphics 22.1 (2016): 131-140.

SLIDE 49

References

  • Maaten, Laurens van der, and Geoffrey Hinton. "Visualizing data using t-SNE." Journal of Machine Learning Research 9.Nov (2008): 2579-2605.
  • Zhou, Hong, et al. "Visual clustering in parallel coordinates." Computer Graphics Forum. Vol. 27, No. 3, 2008.
  • Ferdosi, Bilkis J., and Jos B.T.M. Roerdink. "Visualizing High-Dimensional Structures by Dimension Ordering and Filtering using Subspace Analysis." Computer Graphics Forum. Vol. 30, No. 3, 2011.
  • Novotny, Matej, and Helwig Hauser. "Outlier-preserving focus+context visualization in parallel coordinates." IEEE Transactions on Visualization and Computer Graphics 12.5 (2006): 893-900.

SLIDE 50

References

  • Keim, Daniel A., and H-P. Kriegel. "Visualization techniques for mining large databases: A comparison." IEEE Transactions on Knowledge and Data Engineering 8.6 (1996): 923-938.