Visual Displays. Some evidence through artificial and real data


SLIDE 1

Presentation & Motivation Matrix associated with a symmetric graph Case of actual data: number of employees in economic sectors

Visual Displays. Some evidence through artificial and real data

K. Fernández-Aguirre, M.A. Garín-Martín, J.I. Modroño-Herrán

karmele.fernandez@ehu.es

University of the Basque Country (UPV/EHU), Bilbao, Spain

CARME 2011, Rennes, February 8-11

SLIDE 2

Contents

1. Presentation & Motivation

2. Matrix associated with a symmetric graph
  • Analytic study
  • Experimental study

3. Case of actual data: number of employees in economic sectors
  • Matrix of quantitative variables: PCA
  • Contingency or frequency table: two-way CA
  • Clustering

SLIDE 3

Presentation

Displaying and exploring data

  • Principal Component Analysis (PCA), for quantitative variables, and simple or multiple Correspondence Analysis (CA), for categorical variables, help identify structures in the data through graphical displays.
  • However, some kinds of data sets can be treated by either PCA or CA.
  • For both methods, clustering complements the exploration of the data.
  • These methods are applied in almost all areas of knowledge, although the preference for each of them varies from one area to another.

SLIDE 4

Motivation I

Displaying and exploring data

  • In certain areas it is still common to treat categorical variables as if they were continuous, owing to the great influence of the classical school; see Gifi (1990).
  • A data matrix containing the number of employees in different economic sectors for the countries of the European Union can be treated by either PCA or two-way CA.
  • There are other examples (survey data, textual data...) that can be treated by two or more methods.
  • The results differ depending on the characteristics and properties of each method.

SLIDE 5

Motivation II

Displaying and exploring data

Our emphasis in the following discussion is on methods, such as PCA and CA, and on visual displays. This paper has two parts:

  • In the first part, we analytically study the case of a binary matrix M associated with a symmetric graph G (octagon); the study is also valid for graphs of higher dimensionality. Lebart et al. (1998) show the case of a chessboard (square lattice grid).
  • In the second part, we present a case of actual data on the distribution of employees in different economic sectors for the countries of the European Union.

SLIDE 6

Symmetric graphs

Symmetric undirected graphs with a central vertex

We analytically study the octagonal graph and reach a conclusion that is also valid for the dodecagonal and hexadecagonal graphs. We show that CA is superior to PCA for the reconstitution and visualization of a matrix M associated with a symmetric graph G.

1. Octagonal graph
2. Dodecagonal graph
3. Hexadecagonal graph

SLIDE 7

Octagonal undirected graph

$m_{ext} + m_{int} = 8 + 5 = 13$ vertices and 23 edges

[Figure: octagonal graph with vertices labelled 1 to 13]

SLIDE 8

Octagonal undirected graph

Numerical coding of the graph

  • Two vertices are adjacent if there is an edge joining them.
  • We consider each vertex to be adjacent to itself.
  • The associated binary matrix M contains the value 1 in position (i, j) if vertices i and j are adjacent, and 0 otherwise.
  • Since the graph is undirected, each pair of adjacent vertices appears twice.
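The coding rules above can be sketched in a few lines of Python. This is only an illustration: the function name `adjacency_matrix` and the small hub-and-ring graph are assumptions made for the example, not the octagonal graph of the talk, whose edge list is not given in the slides.

```python
def adjacency_matrix(n_vertices, edges):
    """Binary matrix M with M[i][j] = 1 when i and j are adjacent (or i == j)."""
    m = [[0] * n_vertices for _ in range(n_vertices)]
    for i in range(n_vertices):
        m[i][i] = 1  # each vertex is considered adjacent to itself
    for i, j in edges:
        m[i][j] = 1
        m[j][i] = 1  # undirected graph: each adjacent pair is recorded twice
    return m


# Hypothetical example: a hub (vertex 0) joined to a ring of 4 vertices.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 3), (3, 4), (4, 1)]
M = adjacency_matrix(5, edges)

# Adjacency degrees (row sums), with the self-adjacency included.
degrees = [sum(row) for row in M]
```

The row sums computed at the end are the adjacency degrees that serve as weights in the CA of M.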

SLIDE 9

Octagonal graph

Associated binary matrix M of order $(m_{ext} + m_{int}) \times (m_{ext} + m_{int}) = 13 \times 13$

[Figure: octagonal graph with vertices labelled 1 to 13]

SLIDE 10

CA of the $(m_{ext} + m_{int}) \times (m_{ext} + m_{int}) = 13 \times 13$ binary matrix M

Eigenvalues and eigenvectors:

$$(M N^{-1})^2 u_s = \lambda_s u_s$$

where N is a diagonal matrix of the same order as M with the adjacency degrees $n_{ii}$ as diagonal elements, and $N^{-1} M$ is the row (or column) profile matrix.

$$\operatorname{tr}\left[(N^{-1} M)^2\right] = \frac{m_{ext} + m_{int} + 1}{5} = \sum_{s \ge 1} \lambda_s$$

In the analysis relative to the center of gravity:

$$\sum_{s > 1} \lambda_s = \frac{m_{ext} + m_{int} - 4}{5}$$
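The eigenvalue relations above can be checked numerically. The sketch below is a hedged illustration with NumPy: the helper `ca_eigenvalues` and the hub-and-ring graph are assumptions for the example, so the closed-form trace of the octagonal graph does not apply to it. What carries over is that the trace equals the sum of the eigenvalues and that the trivial eigenvalue is 1, so the centred inertia is the trace minus 1.

```python
import numpy as np


def ca_eigenvalues(M):
    """Eigenvalues of (M N^-1)^2 for a symmetric binary matrix M, in decreasing order."""
    M = np.asarray(M, dtype=float)
    n = M.sum(axis=1)  # adjacency degrees n_ii
    # M N^-1 is similar to the symmetric matrix N^-1/2 M N^-1/2,
    # so both share the same real eigenvalues.
    S = M / np.sqrt(np.outer(n, n))
    mu = np.linalg.eigvalsh(S)
    return np.sort(mu ** 2)[::-1]


# Hypothetical graph: a hub (vertex 0) joined to a ring of 8 vertices.
m = 9
M = np.eye(m)  # every vertex is adjacent to itself
for k in range(1, m):
    nxt = k + 1 if k < m - 1 else 1
    M[0, k] = M[k, 0] = 1      # hub-to-ring edges
    M[k, nxt] = M[nxt, k] = 1  # ring edges

lam = ca_eigenvalues(M)
N_inv = np.diag(1.0 / M.sum(axis=1))
total = np.trace(np.linalg.matrix_power(N_inv @ M, 2))  # tr[(N^-1 M)^2]
centred = lam[1:].sum()  # inertia relative to the centre of gravity
```

Here `lam[0]` is the trivial eigenvalue and `centred` corresponds to the sum over s > 1.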

SLIDE 11

Octagonal graph: CA of the $(m_{ext} + m_{int}) \times (m_{ext} + m_{int})$ binary matrix

The inertia rate explained by the s-th axis:

$$\tau_s = \frac{\lambda_s}{\sum_{t > 1} \lambda_t} = \frac{5}{m_{ext} + m_{int} - 4}\,\lambda_s, \qquad 0 < \lambda_s < 1$$

Conclusion: the inertia explained by the subspace that approximates the initial structure can be made as small as needed, simply by increasing the number of vertices of the graph.
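As a worked instance of the formulas above, substituting the octagonal case $m_{ext} + m_{int} = 13$ gives the following. This is plain arithmetic on the stated expressions; the bound uses only $0 < \lambda_s < 1$.

```latex
\sum_{s \ge 1} \lambda_s = \frac{13 + 1}{5} = 2.8, \qquad
\sum_{s > 1} \lambda_s = \frac{13 - 4}{5} = 1.8, \qquad
\tau_s = \frac{5}{9}\,\lambda_s < \frac{5}{9} \approx 0.556
```

More generally, $\tau_s < 5 / (m_{ext} + m_{int} - 4)$, a bound that decreases as vertices are added, provided the same expression for the centred inertia holds for the larger graph.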

SLIDE 12

Octagonal undirected graph

$m_{ext} + m_{int} = 8 + 5 = 13$ vertices and 23 edges

[Figure: octagonal graph with vertices labelled 1 to 13]

SLIDE 13

Octagonal graph

Reconstitution and visualization: PCA versus CA

[Figures: PCA display and CA display of the octagonal graph]

SLIDE 14

Dodecagonal undirected graph

edges and vertices

[Figure: dodecagonal graph with vertices labelled 1 to 25]

SLIDE 15

Dodecagonal graph

Reconstitution and visualization: PCA versus CA

[Figures: PCA display and CA display of the dodecagonal graph]

SLIDE 16

Hexadecagonal undirected graph

edges and vertices

[Figure: hexadecagonal graph with vertices labelled 1 to 41]

SLIDE 17

Hexadecagonal graph

Reconstitution and visualization: PCA versus CA

[Figures: PCA display and CA display of the hexadecagonal graph]

SLIDE 18

Distribution of employees in different economic sectors

Countries of the European Union

  • The data measure the number of employees in the Primary, Industrial and Services economic sectors for 25 countries of the European Union in 2001.
  • The data matrix can be considered a matrix of quantitative variables, and a PCA applied to it,
  • OR it can be considered a frequency table, and a two-way CA applied to it.
SLIDE 19

Brief reminder of Principal Component Analysis

  • In PCA, the Euclidean distance between individuals is used.
  • The smaller the distance between two individuals, computed on the centered and usually standardized variables, the closer they are.
  • The analysis evaluates the correlations between the variables. If they are all positive, the first factor reflects a size effect.
  • Size effect: the first factor separates individuals that take high values on the variables from those that take low values.
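A minimal sketch of a normed PCA, assuming NumPy, may help to see the size effect. The function name `normed_pca` and the small table of counts are hypothetical and are not the EU data of the talk. Because all pairwise correlations in this toy table are positive, the loadings of the first axis share one sign, and the first-axis scores order the rows roughly by overall size.

```python
import numpy as np


def normed_pca(X):
    """Normed PCA: centre and standardize the columns, then diagonalize the correlation matrix."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # centred, standardized variables
    R = Z.T @ Z / len(Z)                      # correlation matrix
    eigval, eigvec = np.linalg.eigh(R)
    order = np.argsort(eigval)[::-1]          # decreasing eigenvalues
    eigval, eigvec = eigval[order], eigvec[:, order]
    return eigval, eigvec, Z @ eigvec         # eigenvalues, axes, coordinates of individuals


# Hypothetical counts (rows: individuals, columns: three variables).
X = np.array([[10, 50, 120],
              [20, 90, 260],
              [40, 210, 500],
              [80, 390, 900],
              [15, 60, 180]])
eigval, axes, scores = normed_pca(X)

# Size effect: first-axis scores follow the row totals (up to the sign of the axis).
size_corr = np.corrcoef(scores[:, 0], X.sum(axis=1))[0, 1]
```

The sign of each axis is arbitrary, so only the absolute value of `size_corr` is meaningful.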

SLIDE 20

Normed PCA of raw data: number of employees

Summary of the correlations between the variables

SLIDE 21

Normed PCA of raw data: number of employees

Position of the countries

SLIDE 22

Normed PCA of percentages of employees: row profiles

Position of the country profiles and normed eigenvectors

SLIDE 23

Brief reminder of Correspondence Analysis

  • CA treats the rows (individuals) and columns (variables) of a table symmetrically.
  • It has many equivalent definitions; it can be regarded as a double PCA.
  • Chi-squared distance between row profiles and between column profiles: property of distributional equivalence, stability of results.
  • In CA, the row points (or, alternatively, the column points) are weighted by their relative frequency.
  • The centered and non-centered analyses are equivalent.
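The points above can be illustrated with a short two-way CA computed through the singular value decomposition of the standardized residuals, one of the equivalent definitions. This is a sketch under stated assumptions: it uses NumPy, the function name `correspondence_analysis` is hypothetical, and the table is invented, not the EU data.

```python
import numpy as np


def correspondence_analysis(table):
    """Two-way CA via the SVD of the standardized residuals."""
    F = np.asarray(table, dtype=float)
    P = F / F.sum()                        # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)    # row and column masses (relative frequencies)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_principal = (U * sv) / np.sqrt(r)[:, None]  # rows in principal coordinates
    col_standard = Vt.T / np.sqrt(c)[:, None]       # columns in standard coordinates
    return sv ** 2, r, c, row_principal, col_standard


# Hypothetical counts (rows: regions, columns: three sectors).
table = np.array([[10, 50, 120],
                  [20, 90, 260],
                  [40, 210, 500],
                  [80, 250, 900],
                  [45, 60, 180]])
inertia, r, c, rows, cols = correspondence_analysis(table)

# Chi-squared distance (squared) between the profiles of rows 0 and 1.
profiles = table / table.sum(axis=1, keepdims=True)
d2 = (((profiles[0] - profiles[1]) ** 2) / c).sum()
```

The squared Euclidean distance between two rows in full principal coordinates reproduces the chi-squared distance `d2` between their profiles, and the mass-weighted mean of the row points is the origin. Scaling `cols` by the square root of the column masses would give the kind of column coordinates named in the non-symmetric biplot with the ×√weight scaling.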

SLIDE 24

Countries x economic sectors

Country profiles and economic sector profiles: simultaneous representation

SLIDE 25

Countries x economic sectors: Non-symmetric biplot

Countries in principal coordinates and Economic Sectors in standard coordinates ×weight

[Figure: biplot. Countries shown: Austria, Belgium, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Poland, Portugal, Slovakia, Slovenia, Spain, Sweden, United Kingdom, Euskal Herria. Sectors shown: Primary.Sector, Industrial.Sector, Services.Sector.]

SLIDE 26

Countries x economic sectors: Non-symmetric biplot

Countries in principal coordinates and Economic Sect. in standard coordinates ×√weight

[Figure: biplot. Countries shown: Austria, Belgium, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Poland, Portugal, Slovakia, Slovenia, Spain, Sweden, United Kingdom, Euskal Herria. Sectors shown: Primary.Sector, Industrial.Sector, Services.Sector.]

SLIDE 27

Hierarchical clustering (Ward’s criterion) over the factors

[Figure: dendrogram of the hierarchical clustering; the only legible label is Euskal Herria]
SLIDE 28

Hierarchical clustering over the factors
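As a rough illustration of hierarchical clustering with Ward's criterion on factor coordinates, the sketch below repeatedly merges the pair of clusters whose fusion increases the within-cluster inertia the least. It is a naive implementation written for clarity, assuming NumPy; the function name `ward_clustering`, the coordinates and the masses are hypothetical and are not those of the talk.

```python
import numpy as np


def ward_clustering(points, weights, n_clusters):
    """Agglomerative clustering with Ward's criterion on weighted points."""
    clusters = [[i] for i in range(len(points))]
    centres = [np.asarray(p, dtype=float) for p in points]
    masses = [float(w) for w in weights]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Increase in within-cluster inertia if clusters a and b are merged.
                gap = centres[a] - centres[b]
                cost = masses[a] * masses[b] / (masses[a] + masses[b]) * (gap @ gap)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        total = masses[a] + masses[b]
        # The merged cluster sits at the weighted centre of gravity of its parts.
        centres[a] = (masses[a] * centres[a] + masses[b] * centres[b]) / total
        masses[a] = total
        clusters[a] += clusters[b]
        del clusters[b], centres[b], masses[b]
    return clusters


# Hypothetical factor coordinates and masses for five points.
coords = np.array([[-0.9, 0.1], [-0.8, 0.0], [0.7, 0.2], [0.8, 0.3], [0.0, -1.0]])
masses = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
groups = ward_clustering(coords, masses, n_clusters=3)
```

Passing the CA row masses as `weights` keeps the clustering consistent with the weighting of the points in the factor space; a library routine would normally replace this quadratic search on larger data.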

SLIDE 29
Cluster 1 / 4

Characteristic frequencies  % of frequency in set  % of frequency in cluster  % of cluster in frequency  Test-value  Probability  Weight
Industrial Sector           29,64                  32,44                      34,88                      17,90       0,000        54704
Primary Sector              5,25                   5,89                       35,77                      8,35        0,000        9691
Services Sector             65,11                  61,67                      30,20                      -21,10      0,000        120166

Cluster 2 / 4

Characteristic frequencies  % of frequency in set  % of frequency in cluster  % of cluster in frequency  Test-value  Probability  Weight
Industrial Sector           29,64                  34,27                      20,42                      19,96       0,000        54704
Services Sector             65,11                  64,27                      17,43                      -3,49       0,000        120166
Primary Sector              5,25                   1,46                       4,90                       -38,67      0,000        9691

Cluster 3 / 4

Characteristic frequencies  % of frequency in set  % of frequency in cluster  % of cluster in frequency  Test-value  Probability  Weight
Services Sector             65,11                  72,48                      42,12                      52,27       0,000        120166
Industrial Sector           29,64                  24,84                      31,71                      -35,49      0,000        54704
Primary Sector              5,25                   2,68                       19,32                      -40,46      0,000        9691

Cluster 4 / 4

Characteristic frequencies  % of frequency in set  % of frequency in cluster  % of cluster in frequency  Test-value  Probability  Weight
Primary Sector              5,25                   16,64                      40,02                      70,35       0,000        9691
Industrial Sector           29,64                  30,49                      12,99                      3,02        0,001        54704
Services Sector             65,11                  52,87                      10,26                      -41,19      0,000        120166

SLIDE 30

Concluding comments

  • Analytical and experimental study of the CA of the binary matrix associated with a symmetric graph: octagonal, dodecagonal and hexadecagonal graphs.
  • Particular empirical evidence of the suitability of CA for visualizing symmetric graphs through their associated binary matrices.
  • We have illustrated the implications of applying PCA or CA to the same data from an applied point of view.
  • PCA: the first factor opposes the Services Sector to the rest, while CA opposes the Primary Sector and the Services Sector.
  • In several cases the main interest is clustering. The results are complemented with hierarchical cluster analysis over the factors of the CA; see Lebart (1994) on the links between CA and HCA.
