Robust cartogram visualization of outliers in manifold learning



SLIDE 1

Introduction Methods Experiments

Robust cartogram visualization of outliers in manifold learning

Alessandra Tosi and Alfredo Vellido

atosi@lsi.upc.edu - avellido@lsi.upc.edu

LSI Department - UPC, Barcelona

SLIDE 2

1. Introduction
  • Goals

2. Methods
  • NLDR methods: Generative Topographic Mapping
  • Distortion measures in NLDR: Magnification Factor
  • Cartogram-based representation

3. Experiments
  • Cartogram representations for GTM and its variants
  • Results

SLIDE 8

Goals

PROBLEM: An increasing number of high-dimensional data sets are available, with different levels of complexity and a growing diversity of characteristics.

CHALLENGE: Translating raw data into useful information that can be acted upon in practical terms.

Nonlinear Dimensionality Reduction: Nonlinear techniques are applied to reduce the dimensionality of multivariate data (MVD) in order to explore it. It is almost impossible to completely avoid geometrical distortions while reducing dimensionality.

Distortion Measures: Quantify and visualize this distortion itself in order to interpret the data more faithfully.

Visualization: Explicitly reintroduce the local distortion created by NLDR models into the low-dimensional representation of the MVD that they produce for visualization.

SLIDE 11

NLDR methods for MVD visualization

To successfully analyse real data, more complex models are often required: Nonlinear Dimensionality Reduction (NLDR) models.

Manifold learning attempts to describe MVD through nonlinear low-dimensional manifolds embedded in the observed data space. The aim is to discover the underlying geometry of the data and to generate a low-dimensional model, while preserving the topology rather than pairwise distances.

SLIDE 12

NLDR methods for MVD visualization

Latent Variable Models attempt to provide a set of additional variables (latent or hidden variables) that complement the observed ones.

SLIDE 13

NLDR methods for MVD visualization

Vector quantization reduces the number of observations by replacing the original data with a smaller set of vectors of the same dimension, called prototypes (units, neurons, centroids, weight vectors).

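As a rough illustration of vector quantization, the following Python sketch uses plain Lloyd (k-means) iterations to replace a data set with a few prototypes of the same dimension. The function name `vector_quantize`, the toy data and all parameter values are assumptions made for this example; they do not come from the presentation.

```python
import numpy as np

def vector_quantize(X, n_prototypes, n_iter=20, seed=0):
    """Lloyd-style vector quantization: returns prototypes and assignments."""
    rng = np.random.default_rng(seed)
    # Initialise the prototypes with distinct observations chosen at random.
    prototypes = X[rng.choice(len(X), size=n_prototypes, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every observation to every prototype.
        d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(n_prototypes):
            members = X[labels == k]
            if len(members) > 0:  # an empty prototype is left where it is
                prototypes[k] = members.mean(axis=0)
    # Final assignment, consistent with the returned prototypes.
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return prototypes, d2.argmin(axis=1)

# Toy data (assumed): two well-separated groups of 50 points in 3 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 3)), rng.normal(5.0, 0.1, (50, 3))])
prototypes, labels = vector_quantize(X, n_prototypes=2)
```

Here 100 observations are summarised by 2 prototypes that live in the same 3-dimensional space as the data.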
SLIDE 16

Generative Topographic Mapping (GTM)

The Generative Topographic Mapping (GTM) is a nonlinear Latent Variable Model developed by Bishop, Svensén and Williams in the late nineties.

Basic GTM defines a Gaussian probability distribution in the latent space, in order to induce the corresponding probability distribution in the observed data space, using concepts of Bayesian inference. Images of sampled data points, or prototypes, are defined according to the following rule:

$$y_k = W \Phi(u_k)$$

The basic GTM model has some limitations when dealing with atypical data or outliers, as they are likely to bias the estimation of its parameters. More robust formulations of GTM have been proposed using a mixture of Student's t-distributions (t-GTM).

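The rule y_k = W Φ(u_k) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the basis functions are taken to be Gaussian radial basis functions (a common choice for GTM, not specified on the slide), the weight matrix W is a random stand-in for trained parameters, and the grid sizes and the name `rbf_basis` are hypothetical.

```python
import numpy as np

def rbf_basis(U, centres, sigma):
    """Gaussian basis functions evaluated at latent points U; shape (K, M)."""
    d2 = ((U[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Square grid of K = 10 x 10 latent points u_k in [-1, 1]^2.
g = np.linspace(-1.0, 1.0, 10)
U = np.array([(a, b) for a in g for b in g])        # (K, 2)

# M = 4 x 4 basis-function centres in the same latent square (assumed layout).
c = np.linspace(-1.0, 1.0, 4)
centres = np.array([(a, b) for a in c for b in c])  # (M, 2)

D = 3                                               # data-space dimension
rng = np.random.default_rng(0)
W = rng.normal(size=(D, centres.shape[0]))          # (D, M), untrained stand-in

Phi = rbf_basis(U, centres, sigma=0.5)              # (K, M)
Y = Phi @ W.T                                       # row k is y_k = W Phi(u_k)
```

Each row of `Y` is the image, in the observed data space, of one latent grid point.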
SLIDE 18

Magnification Factor

$$\frac{dA'}{dA} = \sqrt{\det(J J^T)}$$

J is the Jacobian (of dimension 2 × d) of the mapping transformation.

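A minimal numerical sketch of the Magnification Factor follows. It assumes a generic smooth mapping from a 2-dimensional latent space to a d-dimensional space and estimates the Jacobian by central differences; the helper names and the paraboloid test mapping are hypothetical and serve only to check the formula on a case that can be worked out by hand.

```python
import numpy as np

def magnification_factor(J):
    """sqrt(det(J J^T)) for a 2 x d Jacobian J."""
    J = np.asarray(J, dtype=float)
    return float(np.sqrt(np.linalg.det(J @ J.T)))

def numerical_jacobian(f, u, eps=1e-6):
    """Central-difference Jacobian of f: R^2 -> R^d at u, shape (2, d)."""
    u = np.asarray(u, dtype=float)
    rows = []
    for i in range(u.size):
        step = np.zeros_like(u)
        step[i] = eps
        rows.append((f(u + step) - f(u - step)) / (2.0 * eps))
    return np.vstack(rows)

# Hypothetical mapping: a paraboloid embedded in 3 dimensions.
def paraboloid(u):
    return np.array([u[0], u[1], u[0] ** 2 + u[1] ** 2])

mf_flat = magnification_factor(numerical_jacobian(paraboloid, [0.0, 0.0]))
mf_steep = magnification_factor(numerical_jacobian(paraboloid, [1.0, 1.0]))
```

For this mapping det(J J^T) = 1 + 4 u_1^2 + 4 u_2^2, so the factor is 1 at the origin, where the surface is flat, and 3 at (1, 1), where a unit latent area is stretched threefold.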
SLIDE 20

Cartograms

$$\frac{\partial(T_{x_1}, T_{x_2})}{\partial(x_1, x_2)} \propto d$$

SLIDE 25

Cartogram representations for NLDR methods

We propose a cartogram-based method, in which:

  • political borders of geographic maps are replaced by the square grid of latent points $u_k$ in the visualization space;
  • map-underlying quantities such as density of population are replaced by the Magnification Factor;
  • the level of distortion within each of the squares associated to $u_k$ is assumed to be uniform;
  • the level of distortion in the space beyond this square grid is assumed to be uniform and equal to the mean distortion over the complete map, that is $\frac{1}{K}\sum_{k=1}^{K} J(u_k)$, where J is the Jacobian of the transformation of the considered method.

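The assumptions listed above can be sketched as the construction of a density grid: one uniform value per latent square, surrounded by a margin filled with the mean value over the map. The Python below is only an illustration; the function name `cartogram_density`, the padding width and the toy values are hypothetical, and the diffusion step that would actually deform the map (for example a density-equalizing algorithm) is not implemented here.

```python
import numpy as np

def cartogram_density(mf, pad):
    """Embed a grid of per-square distortion values in a margin set to their mean."""
    mf = np.asarray(mf, dtype=float)
    rows, cols = mf.shape
    # Space beyond the grid: uniform and equal to the mean over the K squares.
    density = np.full((rows + 2 * pad, cols + 2 * pad), mf.mean())
    # Inside the grid: one uniform value per square associated to u_k.
    density[pad:pad + rows, pad:pad + cols] = mf
    return density

# Toy 3 x 3 map (assumed values) with one strongly distorted central square.
mf = np.array([[1.0, 1.0, 1.0],
               [1.0, 9.0, 1.0],
               [1.0, 1.0, 1.0]])
density = cartogram_density(mf, pad=2)
```

Filling the margin with the mean keeps the overall average unchanged, so a density-equalizing transformation would only redistribute area inside the map rather than expand or shrink it as a whole.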
SLIDE 27

Cartogram representations for NLDR methods

GOAL: To better visualize the embedded model manifold, so that inter-point distances in the observed data space are more faithfully reflected in the low-dimensional representation space.

An advantage of this cartogram-based method is its portability: it should be easy to implement for different representation architectures and with alternative NLDR visualization techniques for which distortion can be quantified.

SLIDE 29

Cartogram representations for t-GTM

In the following experiments we investigate the impact of outliers on the visualization using both basic GTM and t-GTM.

SLIDE 31

Cartogram representations for GTM

Calculate, over the continuum, the Jacobian J of the mapping transformation in the basic GTM algorithm in terms of the derivatives of the basis functions Φ, and apply the Magnification Factor (MF) formula:

$$\frac{dA'}{dA} = \sqrt{\det(J J^T)}$$

Basic GTM:

$$\frac{dA'}{dA} = \sqrt{\det(\Psi^T W^T W \Psi)}$$

where $\Psi$ is an $M \times 2$ matrix with elements $\psi_{mi} = \partial\phi_m/\partial u_i$, $m = 1, \ldots, M$, $i = 1, 2$.

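The basic GTM expression can be evaluated directly once the derivatives of the basis functions are available. The sketch below assumes Gaussian basis functions φ_m(u) = exp(-||u - µ_m||² / (2σ²)), which the slide does not specify, and uses a random W in place of trained weights; the function names, centres and parameter values are hypothetical.

```python
import numpy as np

def gaussian_basis_derivatives(u, centres, sigma):
    """Psi: (M, 2) matrix with elements d(phi_m)/d(u_i) for Gaussian bases."""
    diff = u[None, :] - centres                                   # (M, 2)
    phi = np.exp(-(diff ** 2).sum(axis=1) / (2.0 * sigma ** 2))   # (M,)
    return -(diff / sigma ** 2) * phi[:, None]

def gtm_magnification_factor(u, W, centres, sigma):
    """sqrt(det(Psi^T W^T W Psi)) at the latent point u."""
    Psi = gaussian_basis_derivatives(np.asarray(u, dtype=float), centres, sigma)
    A = W @ Psi                                                   # (D, 2)
    return float(np.sqrt(np.linalg.det(A.T @ A)))

# Assumed setup: 3 x 3 centres in the latent square, D = 4, random weights.
c = np.linspace(-1.0, 1.0, 3)
centres = np.array([(a, b) for a in c for b in c])                # (M, 2), M = 9
sigma = 0.8
rng = np.random.default_rng(0)
W = rng.normal(size=(4, centres.shape[0]))                        # (D, M)

u0 = np.array([0.3, -0.2])
mf = gtm_magnification_factor(u0, W, centres, sigma)
```

Evaluating `gtm_magnification_factor` at every latent grid point yields the MF map that the cartogram then uses as its underlying density.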
SLIDE 33

Cartogram representations for t-GTM

The conditional distribution of the observed data variables given the latent variables, $p(x \mid u)$, takes the following form for t-GTM:

$$p(x \mid u, W, \beta, \nu) = \frac{\Gamma\left(\frac{\nu+D}{2}\right)\beta^{D/2}}{\Gamma\left(\frac{\nu}{2}\right)(\nu\pi)^{D/2}}\left(1 + \frac{\beta}{\nu}\lVert x - y(u)\rVert^{2}\right)^{-\frac{\nu+D}{2}} \qquad (1)$$

To implement the Magnification Factor, we explicitly calculate the Jacobian $J = \Psi^T W^T$, where $\Psi$ is an $M \times 2$ matrix with elements $\psi_{mi} = \partial\phi_m/\partial u_i$, defined for t-GTM as:

$$\frac{\partial \phi_m}{\partial u_i} = \frac{\Gamma\left(\frac{\nu+D}{2}\right)(-\nu-D)\,\beta^{\frac{D+2}{2}}}{\Gamma\left(\frac{\nu}{2}\right)\pi^{D/2}\,\nu^{\frac{D+2}{2}}}\left(u_i - \mu_m^i\right)\left(1 + \frac{\beta}{\nu}\lVert u - \mu_m\rVert^{2}\right)^{-\frac{\nu+D+2}{2}} \qquad (2)$$

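To make the role of the Student's t-distribution concrete, the following Python sketch evaluates the t-GTM conditional density of equation (1) and compares it with an isotropic Gaussian of inverse variance β at a distant point. The function names and all numerical values are assumptions chosen for illustration; the Gaussian form is the standard isotropic one and is not taken from the slides.

```python
import math
import numpy as np

def t_gtm_density(x, y, beta, nu):
    """Student's t density of equation (1), centred at y, in D = len(x) dimensions."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    D = x.size
    r2 = float(((x - y) ** 2).sum())
    # Logarithm of the normalising constant, computed with lgamma for stability.
    log_norm = (math.lgamma((nu + D) / 2.0) - math.lgamma(nu / 2.0)
                + 0.5 * D * math.log(beta) - 0.5 * D * math.log(nu * math.pi))
    return math.exp(log_norm - 0.5 * (nu + D) * math.log1p(beta * r2 / nu))

def gaussian_density(x, y, beta):
    """Isotropic Gaussian with inverse variance beta, used here for comparison."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    D = x.size
    r2 = float(((x - y) ** 2).sum())
    return (beta / (2.0 * math.pi)) ** (D / 2.0) * math.exp(-0.5 * beta * r2)

centre = [0.0, 0.0]
outlier = [10.0, 0.0]          # a point far from the prototype (assumed)
p_t = t_gtm_density(outlier, centre, beta=1.0, nu=3.0)
p_gauss = gaussian_density(outlier, centre, beta=1.0)
```

The t density assigns many orders of magnitude more probability to the distant point than the Gaussian does, which is why outliers pull less strongly on the parameters of a model built on t-distributions.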
SLIDE 34

Cartogram representations for GTM

[Figure: two panels, GTM and t-GTM] Representation of data together with the manifold grid (GTM on the left, t-GTM on the right).

SLIDE 35

Cartogram representations for GTM

[Figure] Representation of MF maps and corresponding cartograms (GTM on the left, t-GTM on the right).

SLIDE 37

Cartogram representations for GTM

[Figure: two panels, GTM and t-GTM] Representation of data together with the manifold grid (GTM on the left, t-GTM on the right).

SLIDE 38

Cartogram representations for GTM

[Figure] Representation of MF maps and corresponding cartograms (GTM on the left, t-GTM on the right).

SLIDE 40

Useful Links

  • Cartograms Software
  • Somtoolbox for Matlab
  • Netlab 3.3 for Matlab

SLIDE 41

A short bibliography

  • M. Aupetit, Visualizing distortions and recovering topology in continuous projection techniques, Neurocomputing, 70(7-9), pp.1304-1330, 2007.
  • C.M. Bishop, M. Svensén and C.K.I. Williams, Magnification factors for the SOM and GTM algorithms, Proceedings of the Workshop on Self-Organizing Maps (WSOM'97), pp.333-338, June 4-6, Helsinki (Finland), 1997.
  • M.T. Gastner and M.E.J. Newman, Diffusion-based method for producing density-equalizing maps, Proceedings of the National Academy of Sciences of the United States of America, 101(20), pp.7499-7504, National Academy of Sciences, 2004.
  • A. Tosi, A. Vellido, Cartogram representation of the batch-SOM magnification factor, Proceedings of European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium, pp.203-207, 2012.
  • A. Vellido, Missing data imputation through GTM as a mixture of t-distributions, Neural Networks, 19(10), pp.1624-1635, 2006.
  • A. Vellido, Assessment of an Unsupervised Feature Selection Method for Generative Topographic Mapping, 16th International Conference on Artificial Neural Networks (ICANN), Athens, Greece, LNCS Vol.4132, pp.361-370, 2006.
  • A. Vellido, P.J.G. Lisboa, D. Vicente, Robust analysis of MRS brain tumour data using t-GTM, Neurocomputing, 69(7-9), pp.754-768, 2006.
  • A. Vellido, J.D. Martín, F. Rossi, P.J.G. Lisboa, Seeing is believing: The importance of visualization in real-world machine learning applications, in M. Verleysen, editor, Proceedings of European Symposium on Artificial Neural Networks (ESANN), pp.219-226, Bruges, Belgium, 2011.
  • A. Vellido, J.D. Martín-Guerrero, P.J.G. Lisboa, Making machine learning models interpretable, Proceedings of European Symposium on Artificial Neural Networks (ESANN), pp.163-172, 2012.

SLIDE 42

THANK YOU - QUESTIONS?

Alessandra Tosi - http://www.lsi.upc.edu/~atosi/ - atosi@lsi.upc.edu
Alfredo Vellido - http://www.lsi.upc.edu/~avellido/ - avellido@lsi.upc.edu