SLIDE 1

Courtesy of Prof. Shixia Liu @Tsinghua University

SLIDE 2

Outline

  • Introduction
  • Classification of Techniques
    – Table
    – Scatter Plot Matrices
    – Projections
    – Parallel Coordinates
  • Summary

SLIDE 3

Motivation

  • Real-world data contain multiple dimensions

SLIDE 4

Multivariate/Multidimensional Data Visualization

  • Multivariate data visualization is a specific type of information visualization that deals with multivariate/multidimensional data
  • The data to be visualized are of high dimensionality, and the correlations between these many attributes are of interest

SLIDE 5

Dimensionality

  • Refers to the number of attributes present in the data
    – 1: one-dimensional (1D) / univariate
    – 2: two-dimensional (2D) / bivariate
    – 3: three-dimensional (3D) / trivariate
    – >3: multidimensional / hypervariate / multivariate
  • The boundary between high and low dimensionality is not clear-cut; generally, high-dimensional data have >4 variables

SLIDE 6

Terminology

  • Dimensions (independent) vs. variables (dependent)
  • Multidimensional: the dimensionality of the independent dimensions
  • Multivariate: the dimensionality of the dependent variables

SLIDE 7

Outline

  • Introduction
  • Classification of Techniques

    – Projections
    – Parallel Coordinates
    – Table
    – Scatter Plot Matrices

  • Summary

SLIDE 8

Classification of Techniques

  • Projection
  • Parallel Coordinates Plot
  • Table
  • Scatter Plot Matrix
SLIDE 9

  • What if we have too many dimensions?
  • An intuitive way is to project to a low-dimensional space
  • Linear projections
  • Nonlinear projections

A projection (X -> Y) maps points {x1, x2, …, xm} in an n-dimensional space to points {y1, y2, …, ym} in a p-dimensional space (p << n) while preserving distance measures between data items.
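Such a linear projection is just a matrix product. A minimal NumPy sketch (illustrative only; here the projection matrix W is random rather than chosen by PCA or t-SNE):

```python
import numpy as np

rng = np.random.default_rng(0)

# m points {x1, ..., xm} in an n-dimensional space, projected to p dimensions.
m, n, p = 100, 10, 2
X = rng.normal(size=(m, n))

# Each column of W is a direction in the n-dimensional space; the rows of Y
# are the p-dimensional images {y1, ..., ym}.
W = rng.normal(size=(n, p))
Y = X @ W

print(Y.shape)  # (100, 2)
```

A PCA or t-SNE mapping would choose the projection to preserve distance measures between data items, which a random W does not.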

SLIDE 10

Classification

  • Linear projection
    – Example: PCA (principal component analysis)
  • Non-linear projection
    – Example: t-SNE (t-distributed stochastic neighbor embedding)

SLIDE 11

PCA

  • Seeks a space of lower dimensionality (magenta)
  • Such that the orthogonal projection of the data points (red) onto this subspace maximizes the variance of the projected points (green)

SLIDE 12

Maximizes Variance

  • To begin with, consider the projection onto a one-dimensional space

  • The direction of this space
  • Variance
  • How to maximize this?

Trick:
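The equations this slide refers to were lost in extraction; following the standard derivation in Pattern Recognition and Machine Learning, Chapter 12 (which the deck cites later), they can be reconstructed as:

```latex
% Let u_1 be the unit direction of the one-dimensional space (u_1^T u_1 = 1).
% The variance of the projected data is
\frac{1}{N}\sum_{n=1}^{N}\left(u_1^{T}x_n - u_1^{T}\bar{x}\right)^{2} = u_1^{T} S u_1

% Trick: enforce the unit-norm constraint with a Lagrange multiplier and maximize
u_1^{T} S u_1 + \lambda_1\left(1 - u_1^{T} u_1\right)
```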

SLIDE 13

Maximizes Variance (cont’d)

  • Eigenvalue
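The eigenvalue condition this bullet alludes to follows by setting the derivative of the Lagrangian to zero (standard PCA result, reconstructed here):

```latex
S u_1 = \lambda_1 u_1
\qquad\Rightarrow\qquad
u_1^{T} S u_1 = \lambda_1
```

So the projected variance equals the eigenvalue, and it is maximized by taking u1 to be the eigenvector of S with the largest eigenvalue (the first principal component).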
SLIDE 14

One Example

SLIDE 15

Extension to M Dimensions

  • Define additional principal components in an incremental fashion (for details, refer to Chapter 12 of Pattern Recognition and Machine Learning)
  • Conclusion for M dimensions: the M eigenvectors u1, ..., uM of the data covariance matrix S corresponding to the M largest eigenvalues λ1, ..., λM

SLIDE 16

Covariance Matrix
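As a sketch of how the covariance matrix is used (illustrative NumPy code, not from the slides): compute S, take its eigendecomposition, and confirm that the largest eigenvalue is the variance along the first principal direction:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 200 points in 3-D, with most variance along the first axis.
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])

# Covariance matrix S = (1/N) * sum_n (x_n - mean)(x_n - mean)^T
Xc = X - X.mean(axis=0)
S = (Xc.T @ Xc) / len(X)

# np.linalg.eigh returns eigenvalues of the symmetric S in ascending order,
# so the last column of eigvecs is the first principal direction u1.
eigvals, eigvecs = np.linalg.eigh(S)
u1 = eigvecs[:, -1]

# The largest eigenvalue equals the variance of the data projected onto u1.
print(np.isclose(eigvals[-1], np.var(Xc @ u1)))  # True
```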

SLIDE 17

Fit an n-d Ellipsoid to the Data

SLIDE 18

t-SNE

SLIDE 19

t-SNE

  • Particularly well-suited for embedding high-dimensional data into a space of two or three dimensions, which can then be visualized in a scatter plot

SLIDE 20

Major Goal

  • t-distributed stochastic neighbor embedding (t-SNE) minimizes the divergence between two distributions: a distribution that measures pairwise similarities of the input objects, and a distribution that measures pairwise similarities of the corresponding low-dimensional points in the embedding

SLIDE 21

Two Main Stages

  • First, t-SNE constructs a probability distribution over pairs of high-dimensional objects
    – Similar objects have a high probability of being picked
    – Dissimilar points have an extremely small probability of being picked
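This first-stage distribution is defined via Gaussian kernels; the formulas below are reconstructed from the t-SNE paper cited in the references (σi is set per point through a user-chosen perplexity, m is the number of data points):

```latex
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^{2} / 2\sigma_i^{2}\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^{2} / 2\sigma_i^{2}\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2m}
```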

SLIDE 22

Example: Step 1

SLIDE 23

Two Main Stages (cont’d)

  • Second, t-SNE defines a probability distribution over the points in the low-dimensional map
    – Similar to the one in the high-dimensional space
    – Minimizes the Kullback–Leibler divergence between the two distributions with respect to the locations of the points in the map

Heavy-tailed Student t-distribution
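The map similarities use this heavy-tailed Student t kernel, and the objective is the Kullback–Leibler divergence; the formulas are reconstructed from the t-SNE paper cited in the references:

```latex
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^{2}\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^{2}\right)^{-1}},
\qquad
C = \mathrm{KL}(P \parallel Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```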

SLIDE 24

Example: Step Two

SLIDE 25

Example: Step Two

Before optimization

SLIDE 26

Example: Final Result

Student t-distribution vs. Gaussian distribution

SLIDE 27

The Student t-distribution

  • The volume of an N-dimensional ball of radius r scales as r^N
  • When N is large, if we pick random points uniformly in the ball, most points will be close to the surface, and very few will be near the center

SLIDE 28

The Student t-distribution

  • If the same Gaussian distribution is used for the low-dimensional map points, not enough space is available in the low-dimensional space
    – The crowding problem
  • Use a Student t-distribution with one degree of freedom (i.e., a Cauchy distribution) for the map points instead
    – It has a much heavier tail than the Gaussian distribution, which compensates for the original imbalance

SLIDE 29

Comparison

SLIDE 30

The Distribution Model

  • Probability model for the high-dimensional data points
  • Probability model for the low-dimensional map points
  • The difference between the two distributions
SLIDE 31

The Solution

  • To minimize this score, we perform gradient descent. The gradient can be computed analytically:
  • Update yi iteratively
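The analytic gradient referred to here (reconstructed from the t-SNE paper cited in the references) is:

```latex
\frac{\partial C}{\partial y_i}
= 4 \sum_{j}\left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)
  \left(1 + \lVert y_i - y_j \rVert^{2}\right)^{-1}
```

Each yi is then moved iteratively against this gradient (with a learning rate and, in practice, a momentum term).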
SLIDE 32

One Example

SLIDE 33

Example: MNIST

  • Handwritten digits (0-9)
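An experiment of this kind can be reproduced approximately with scikit-learn (assuming `sklearn` is installed; the subset size and parameters here are illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# 8x8 grayscale images of handwritten digits 0-9, flattened to 64-D vectors.
digits = load_digits()
X, y = digits.data[:300], digits.target[:300]

# Embed the 64-dimensional points into 2-D for a scatter plot.
emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X)

print(emb.shape)  # (300, 2)
```

Coloring the resulting scatter plot by the digit label y typically reveals one cluster per digit class.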
SLIDE 34

Package

  • Laurens van der Maaten: https://lvdmaaten.github.io/tsne/
    – L.J.P. van der Maaten. Accelerating t-SNE using Tree-Based Algorithms. Journal of Machine Learning Research 15(Oct): 3221-3245, 2014.
    – L.J.P. van der Maaten and G.E. Hinton. Visualizing Non-Metric Similarities in Multiple Maps. Machine Learning 87(1): 33-55, 2012.
    – L.J.P. van der Maaten. Learning a Parametric Embedding by Preserving Local Structure. In Proceedings of the Twelfth International Conference on Artificial Intelligence & Statistics (AISTATS), JMLR W&CP 5: 384-391, 2009.
    – L.J.P. van der Maaten and G.E. Hinton. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9(Nov): 2579-2605, 2008.

SLIDE 35

Comparison

  • PCA, MDS
    – Linear techniques
    – Keep the low-dimensional representations of dissimilar data points far apart
  • t-SNE
    – Non-linear technique
    – Captures much of the local structure of the high-dimensional data very well, while also revealing global structure such as the presence of clusters at multiple scales

SLIDE 36

Comparison

SLIDE 37

  • Inselberg, "Multidimensional detective" (parallel coordinates), 1997

SLIDE 38

Parallel Coordinates: Visual Design

  • Dimensions as parallel axes
  • Data items as line segments
  • Intersections on the axes indicate the values of the corresponding attributes

(Figure: axes dim1, dim2, dim3, …, dimn, each scaled from Min: 0 to Max: 1; an example polyline crosses them at 0.8, 0.6, 0.8, 0.3, 0.25)
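The construction above can be sketched numerically: min-max normalize each attribute to [0, 1] (the Min/Max labels on the axes), so that each data item becomes a polyline crossing axis j at its normalized value. A hypothetical NumPy helper (a plotting library would draw the actual line segments):

```python
import numpy as np

def parallel_coords_heights(X):
    """Min-max normalize each column of X to [0, 1]. Row i of the result
    holds the heights at which data item i crosses axes dim1..dimn."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Guard against constant columns (hi == lo) to avoid division by zero.
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

X = np.array([[1.0, 10.0, 3.0],
              [3.0, 30.0, 1.0],
              [2.0, 20.0, 2.0]])
H = parallel_coords_heights(X)
# Data item 0 crosses the three axes at heights 0, 0, and 1.
```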

SLIDE 39

Parallel Coordinates: Pros and Cons

Pros:
  • Correlations among attributes can be studied by spotting the locations of the intersection points
  • Effective for revealing data distributions and functional dependencies

Cons:
  • Visual clutter due to the limited space available for each parallel axis
  • Axes are packed very closely when dimensionality is high

SLIDE 40
  • Clustering and filtering approaches
  • Dimension reordering approaches
  • Visual enhancement approaches

Out5d dataset (5 dimensions, 16384 data items)

SLIDE 41

Star Coordinates

  • Scatterplots for higher dimensions: each attribute is an axis on a circle; each data item is a point
  • Changing the length of an axis alters the contribution of that attribute
  • Changing the direction of an axis (angles need not be equal) adjusts correlations between attributes
  • Useful for gaining insight into hierarchically clustered datasets and for multi-factor analysis in decision-making
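The mapping can be sketched as follows (a hypothetical helper, not from the slides): each attribute j gets an axis vector on a circle, and a data item's 2-D position is the sum of its normalized attribute values times the axis vectors; shortening an axis shrinks that attribute's contribution, and rotating it changes how attributes interact.

```python
import numpy as np

def star_coordinates(X, angles=None, lengths=None):
    """Map each row of X (m items, n attributes) to a 2-D point:
    position = sum_j value_ij * a_j, where axis vector a_j has the given
    angle and length. Attribute values are min-max normalized first."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)
    V = (X - lo) / np.where(hi > lo, hi - lo, 1.0)
    if angles is None:                       # evenly spaced axes by default
        angles = 2 * np.pi * np.arange(n) / n
    if lengths is None:                      # equal contributions by default
        lengths = np.ones(n)
    A = np.asarray(lengths)[:, None] * np.stack(
        [np.cos(angles), np.sin(angles)], axis=1)
    return V @ A                             # (m, 2) screen positions

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
P = star_coordinates(X)
```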

SLIDE 42

Table Lens

  • Represents rows as data items and columns as attributes
  • Each column can be viewed as a histogram or plot
  • Information along rows or columns is interrelated
  • Uses the familiar concept of a “table”

Rao and Card, "The table lens: merging graphical and symbolic representations in an interactive focus+context visualization for tabular information"

SLIDE 43

Scatterplot Matrix

  • Scatterplot: 2 attributes projected along the x- and y-axes
  • The collection of scatterplots is organized in a matrix
  • Straightforward
  • Important patterns in higher dimensions are barely recognizable
  • Chaotic when the number of data items is too large
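A scatterplot matrix can be produced directly with pandas (a sketch assuming pandas and matplotlib are installed; the data here is synthetic):

```python
import matplotlib
matplotlib.use("Agg")          # headless backend, so no display is needed
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(2)

# Three attributes, with "b" correlated to "a"; every attribute pair
# gets one scatterplot, and the diagonal shows per-attribute histograms.
df = pd.DataFrame({"a": rng.normal(size=50)})
df["b"] = df["a"] * 0.8 + rng.normal(scale=0.3, size=50)
df["c"] = rng.normal(size=50)

axes = scatter_matrix(df, diagonal="hist")
print(axes.shape)  # (3, 3): an n x n grid of pairwise plots
```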

SLIDE 44

Outline

  • Introduction
  • Classification of Techniques

    – Table
    – Scatter Plot Matrices
    – Projections
    – Parallel Coordinates
    – Pixel-Oriented Techniques
    – Iconography

  • Summary

SLIDE 45

Visualizations: Advantages / Disadvantages

  • Projections. Advantages: clear visual patterns. Disadvantages: 1. obscured semantics; 2. loss of information; 3. visual clutter
  • Parallel coordinates. Advantages: clear visual patterns. Disadvantages: visual clutter
  • Table. Advantages: uses the familiar concept of a “table”. Disadvantages: supports a limited number of dimensions
  • Scatterplot matrix. Advantages: simple. Disadvantages: 1. visual clutter; 2. unclear patterns
SLIDE 46

Further Reading

  • Survey

    – Dos Santos, Selan, and Ken Brodlie. "Gaining understanding of multivariate and multidimensional data through visualization." Computers & Graphics 28.3 (2004): 311-325.

  • Website
    – http://www.sci.utah.edu/~shusenl/highDimSurvey/website/

SLIDE 47

Further Reading

  • Evaluation

    – Rubio-Sánchez, Manuel, et al. "A comparative study between RadViz and Star Coordinates." IEEE Transactions on Visualization and Computer Graphics 22.1 (2016): 619-628.

SLIDE 48

References

  • Rao, Ramana, and Stuart K. Card. "The table lens: merging graphical and symbolic representations in an interactive focus+context visualization for tabular information." Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1994.
  • Gratzl, Samuel, et al. "LineUp: Visual analysis of multi-attribute rankings." IEEE Transactions on Visualization and Computer Graphics 19.12 (2013): 2277-2286.
  • van Wijk, Jarke J., and Robert van Liere. "HyperSlice: visualization of scalar functions of many variables." Proceedings of the 4th Conference on Visualization '93. IEEE Computer Society, 1993.
  • Kim, Hannah, et al. "InterAxis: Steering Scatterplot Axes via Observation-Level Interaction." IEEE Transactions on Visualization and Computer Graphics 22.1 (2016): 131-140.

SLIDE 49

References

  • Maaten, Laurens van der, and Geoffrey Hinton. "Visualizing data using t-SNE." Journal of Machine Learning Research 9.Nov (2008): 2579-2605.
  • Zhou, Hong, et al. "Visual clustering in parallel coordinates." Computer Graphics Forum. Vol. 27, No. 3, 2008.
  • Ferdosi, Bilkis J., and Jos B.T.M. Roerdink. "Visualizing High-Dimensional Structures by Dimension Ordering and Filtering using Subspace Analysis." Computer Graphics Forum. Vol. 30, No. 3, 2011.
  • Novotny, Matej, and Helwig Hauser. "Outlier-preserving focus+context visualization in parallel coordinates." IEEE Transactions on Visualization and Computer Graphics 12.5 (2006): 893-900.

SLIDE 50

References

  • Keim, Daniel A., and H-P. Kriegel. "Visualization techniques for mining large databases: A comparison." IEEE Transactions on Knowledge and Data Engineering 8.6 (1996): 923-938.