Biplots: Taking Stock
John Gower
Mathematics Department, The Open University, Milton Keynes, U.K.
Scope
Biplots simultaneously display two kinds of information; typically, the variables (categorical and numerical) and sample units described by a multivariate data matrix or the items labeling the rows and columns of a two-way table. Approximation is important. Biplots are useful for visualizing multidimensional analyses, e.g., principal component analysis, canonical variate analysis, multidimensional scaling, multiplicative interactions and correspondence analysis – and many more.
Ptolemy from Wikipedia
- Geography
- Ptolemy's other main work is his Geographia. This also is a compilation of what was known about the world's geography in the Roman Empire during his time. He relied somewhat on the work of an earlier geographer, Marinos of Tyre, and on gazetteers of the Roman and ancient Persian Empire, but most of his sources beyond the perimeter of the Empire were unreliable.
- The first part of the Geographia is a discussion of the data and of the methods he used. As with the model of the solar system in the Almagest, Ptolemy put all this information into a grand scheme. Following Marinos, he assigned coordinates to all the places and geographic features he knew, in a grid that spanned the globe. Latitude was measured from the equator, as it is today, but Ptolemy preferred in book 8 to express it as the length of the longest day rather than degrees of arc (the length of the midsummer day increases from 12h to 24h as one goes from the equator to the polar circle). In books 2 through 7, he used degrees and put the meridian of 0 longitude at the most western land he knew, the "Blessed Islands", probably the Cape Verde islands (not the Canary Islands, as long accepted) as suggested by the location of the six dots labelled the "FORTUNATA" islands near the left extreme of the blue sea of Ptolemy's map here reproduced.
Ptolemy and Biplots
- Ptolemy’s maps have all the ingredients of a biplot:
(a) Points representing cities
(b) Lines representing latitude/longitude coordinates
(c) Approximation
N.B. Ptolemy seems to have recognised that the world is a globe.
Descartes from Wikipedia
- The Cartesian coordinate system was developed in 1637 in two writings by Descartes and, independently, by Pierre de Fermat (unpublished). Descartes introduces the new idea of specifying the position of a point or object on a surface, using two intersecting axes as measuring guides. In La Géométrie, he further explores the above-mentioned concepts.
- Some note that the master artists of the Renaissance used a grid, in the form of a wire mesh, as a tool for breaking up the component parts of the subjects they painted. This may have influenced Descartes. Nicole Oresme, a French philosopher of the 14th century, used constructions similar to Cartesian coordinates.
Types of approximation
1. Through the least-squares properties of the singular-value decomposition.
2. Representing “cases” by any form of MDS and then superimposing the “variables” either by:
(i) the regression method, or
(ii) superimposing nonlinear trajectories (nonlinear biplots).
Singular Value Decomposition
- SVD: $X = U\Sigma V'$. Beltrami (1873), Jordan (1874), Sylvester (1889)
- Approximation: $\min \lVert X - \hat{X} \rVert^2$ where $\operatorname{rank}(\hat{X}) = r$, solved by $\hat{X} = U\Sigma J V'$. Eckart and Young (1936)
- Ruben Gabriel (1971): biplots for PCA and two-way tables, diagnostic biplots, canonical biplots
- Paul Horst ??
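The least-squares property of the SVD can be checked numerically. The following is a minimal sketch, assuming that J simply retains the r largest singular values and that NumPy is available; the data matrix and the function name `low_rank_approximation` are illustrative only.

```python
import numpy as np

def low_rank_approximation(X, r):
    """Rank-r least-squares approximation of X from its SVD (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the r largest singular values (assumed role of J).
    X_hat = (U[:, :r] * s[:r]) @ Vt[:r, :]
    return X_hat, s

rng = np.random.default_rng(0)      # arbitrary illustrative data
X = rng.normal(size=(8, 4))
X_hat, s = low_rank_approximation(X, r=2)

# The residual sum of squares equals the sum of the discarded
# squared singular values.
residual_ss = np.sum((X - X_hat) ** 2)
discarded_ss = np.sum(s[2:] ** 2)
```

Here `residual_ss` and `discarded_ss` agree up to rounding, which is the sense in which the truncated SVD is the best rank-r fit.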
Paul Horst: Willem Heiser
- What I recall is that I said Paul Horst had written about direct decomposition of data matrices, rather than defining factor analysis in the usual way, as decomposition of secondary structures such as the correlation matrix. He was a man of numbers, not plots, so I don't believe he actually plotted the coordinates obtained (at least, I couldn't find that back).
- See his book "Factor Analysis of Data Matrices" (Holt, Rinehart & Winston, 1965), and note the "Data Matrices" in the title. In chapter 3 he introduces the singular value decomposition, and in chapter 4 the decomposition of the data matrix as the basic one from which other methods follow. Chapter 22 gives an interesting treatment of homogeneity analysis, 10 years before Gifi started to make a whole system out of it. To my own surprise, it also contains a method to eliminate the horseshoe effect in multiple correspondence analysis!
Analysis and Presentation
- Biplots are not a method of analysis, merely a way of presenting (graphically) the results of a variety of multidimensional methods of analysis.
- Although I shall not be discussing methods of analysis, I note that the main difference between methods is often the initial transformation of the data X.
Initial transformations
Transformation of X: method
- Centre and scale: PCA
- Remove “main effects”: Biadditive
- Pearson residuals: CA
- Row/Col chi-square distance: CA
- Within-group dispersion, spectral decomposition: CVA
- Dispersion: FA
- Dissimilarity: MDS
- Constrained regression: Rank, CANOCO
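Two of these initial transformations are easy to state in code. The sketch below assumes the usual textbook definitions (column centring with scaling to unit standard deviation for PCA, and Pearson residuals of a contingency table for CA); the function names and the small data sets are hypothetical.

```python
import numpy as np

def centre_and_scale(X):
    """Column-centre X and scale each column to unit standard deviation."""
    Xc = X - X.mean(axis=0)
    return Xc / Xc.std(axis=0, ddof=1)

def pearson_residuals(N):
    """Pearson residuals of a two-way table of counts."""
    P = N / N.sum()                       # correspondence matrix
    r = P.sum(axis=1, keepdims=True)      # row masses
    c = P.sum(axis=0, keepdims=True)      # column masses
    E = r @ c                             # expected proportions under independence
    return (P - E) / np.sqrt(E)

X = np.array([[1.0, 10.0], [2.0, 14.0], [3.0, 12.0], [6.0, 20.0]])
Z = centre_and_scale(X)

N = np.array([[10.0, 5.0], [4.0, 12.0], [6.0, 8.0]])
S = pearson_residuals(N)

# The total inertia times the grand total is the chi-square statistic.
chi_square = N.sum() * np.sum(S ** 2)
```

Either transformed matrix (`Z` or `S`) would then be passed to the SVD; only the initial transformation differs.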
Types of “Scale”
[Figure: panels (a) to (g) showing different types of axis scale, with numeric markers 1 to 7 and the ordered categories small, medium, big]
Tools of interpretation
- The inner-product
- Distance
- Area
- Angle
- Orthogonal Projection
- Nearness
- Convex regions
- Content
How to represent inner-products
- Two sets of points (or vectors): $a_i b_j \cos(\theta_{ij})$
- One set of points plus calibrated axes
- Areas: $a_i b_j \cos(\theta_{ij}) = a_i b_j \sin(\theta_{ij} + \tfrac{1}{2}\pi)$, i.e. rotate one set of points through 90 degrees.
Published biplots need a clear indication of which choice has been made (cartouche).
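The three representations give the same number, as a small two-dimensional check shows. This is only a sketch with made-up coordinates; the rotation is taken anticlockwise, which is an assumption about the convention.

```python
import numpy as np

a = np.array([2.0, 1.0])    # a row point (illustrative values)
b = np.array([1.0, 3.0])    # a column point (illustrative values)

# 1. Two sets of vectors: |a| |b| cos(theta).
inner = a @ b

# 2. Calibrated axis: project a orthogonally onto the direction of b;
#    the signed projection length times |b| recovers the inner product.
projection_length = (a @ b) / np.linalg.norm(b)
via_projection = projection_length * np.linalg.norm(b)

# 3. Area: rotate b anticlockwise through 90 degrees, (x, y) -> (-y, x);
#    the signed area of the parallelogram spanned by a and the rotated b
#    is |a| |b| sin(theta + pi/2) = |a| |b| cos(theta).
b_rotated = np.array([-b[1], b[0]])
area = a[0] * b_rotated[1] - a[1] * b_rotated[0]
```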
PCA biplot: Process Quality Control
[Figure: PCA biplot with calibrated axes A1, A2, A3, A4, A5, B5, C4, C5, C6, C7, C8, D4, D6, D7 and E5; monthly points from Jan00 to Mar01; a target point; regions of poor, satisfactory and good quality]
Orthogonal Parallel Translation of Axes
[Figure: two calibrated axes with markers 1 to 6, the origin O and the point (a, b)]
PCA biplot of 4 variables and 23 cases
[Figure: PCA biplot with calibrated axes SPR, RGF, PLF and SLF; cases labelled a to w]
Orthogonal Parallel Shift and Rotation
[Figure: the same biplot with the axes SPR, RGF, PLF and SLF shifted and rotated; cases labelled a to w]
Σ-Scaling
Plots based on the inner product given by the SVD $X = U\Sigma V'$ give row points ${}_pA_r = U\Sigma^{\alpha}$ and column points ${}_qB_r = V\Sigma^{\beta}$ with $\alpha + \beta = 1$, usually. Choices are:
(i) α = 1, β = 0 or β = 1, α = 0: preserves the inner product; suitable for showing approximations to row or column distances.
(ii) α = 1, β = 1: inner product not preserved; suitable for showing simultaneous row and column distances.
(iii) α = ½, β = ½: preserves the inner product; suitable for showing symmetric approximations to $\hat{X}$.
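The effect of the three choices can be sketched as follows, assuming the row points are the first r columns of UΣ^α and the column points the first r columns of VΣ^β. The function name `sigma_scaled_points` and the random data are illustrative, not part of the original slides.

```python
import numpy as np

def sigma_scaled_points(X, r=2, alpha=1.0, beta=0.0):
    """Row points A = U Sigma^alpha and column points B = V Sigma^beta (first r dimensions)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    A = U[:, :r] * s[:r] ** alpha
    B = Vt[:r, :].T * s[:r] ** beta
    return A, B

rng = np.random.default_rng(1)          # arbitrary illustrative data
X = rng.normal(size=(6, 3))
X = X - X.mean(axis=0)

# Rank-2 approximation, for comparison with the inner products A B'.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_hat = (U[:, :2] * s[:2]) @ Vt[:2, :]

A1, B1 = sigma_scaled_points(X, alpha=1.0, beta=0.0)   # choice (i)
A2, B2 = sigma_scaled_points(X, alpha=1.0, beta=1.0)   # choice (ii)
A3, B3 = sigma_scaled_points(X, alpha=0.5, beta=0.5)   # choice (iii)

# True where the inner products A B' reproduce the rank-2 approximation.
preserved = [np.allclose(A @ B.T, X_hat) for A, B in ((A1, B1), (A2, B2), (A3, B3))]
```

With α + β = 1 the product AB' reproduces the rank-2 approximation; with α = β = 1 it does not, which is the price paid for showing both sets of distances at once.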
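λ-Scaling
When plotting points given by the rows of ${}_pA_r$ and ${}_qB_r$, one set may have much greater dispersion than the other. This can be remedied as follows: first observe that $AB' = (A\lambda)(B'/\lambda)$ leaves the inner product unchanged. This simple fact may be used to improve the look of the display. One way of choosing λ is to arrange that the average squared distances of the points in $A\lambda$ and $B/\lambda$ are the same. This requires:
$\lambda^2 \lVert A \rVert^2 / p = \lVert B \rVert^2 / (q\lambda^2)$, i.e. $\lambda^4 = \dfrac{p}{q}\,\dfrac{\lVert B \rVert^2}{\lVert A \rVert^2}$,
where $\lVert \cdot \rVert^2$ denotes the sum of squared elements.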
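A minimal sketch of this choice of λ, assuming "average squared distance" means the sum of squared coordinates divided by the number of points, so that λ⁴ = p‖B‖² / (q‖A‖²) with p rows in A and q rows in B. The helper name `lambda_scale` and the data are hypothetical.

```python
import numpy as np

def lambda_scale(A, B):
    """Rescale A and B so both sets of points have the same average squared length."""
    p, q = A.shape[0], B.shape[0]
    lam = ((p * np.sum(B ** 2)) / (q * np.sum(A ** 2))) ** 0.25
    return A * lam, B / lam, lam

rng = np.random.default_rng(2)             # arbitrary illustrative data
A = 50.0 * rng.normal(size=(10, 2))        # widely dispersed row points
B = rng.normal(size=(4, 2))                # tightly clustered column points

A_scaled, B_scaled, lam = lambda_scale(A, B)

mean_sq_A = np.sum(A_scaled ** 2) / A.shape[0]
mean_sq_B = np.sum(B_scaled ** 2) / B.shape[0]
```

After rescaling, `mean_sq_A` and `mean_sq_B` coincide, while `A_scaled @ B_scaled.T` is unchanged from `A @ B.T`.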
General Scaling
- Note that λ could be replaced by any nonsingular matrix L, but this does not seem to be helpful, except perhaps when L is diagonal.
Nonlinear Biplot (Clarke’s distance)
[Figure: nonlinear biplot with calibrated trajectories SPR, RGF, PLF and SLF; cases labelled a to w]
Circular Prediction
[Figure: circular prediction on the nonlinear biplot with trajectories SPR, RGF, PLF and SLF; cases labelled a to w]
Category Level Points (CLPs)
Quantitative variables define a continuum of values represented by continuous, often linear, axes. Every dummy variable of an indicator matrix takes 0/1 values, so the idea of coordinate axes requires modification. A categorical variable takes a finite number of levels; the $L_k$ levels of a categorical variable may be represented by $L_k$ points, termed the category level points (CLPs). Just as a sample with a value $x_k$ is closest to the marker for $x_k$ on the kth axis, so must a sample with a particular category level be closest to the CLP for that category level. For the EMC the CLPs for all L category levels are given by the rows of the unit matrix I. For MCA with category frequencies L, the CLPs are given by the rows of the matrix $p^{-1}L^{-1/2}$; for CVA the CLPs are the coordinates of the canonical means.
Prediction Regions
- The CLPs give an exact representation in a high-dimensional space.
- In biplot approximations the points nearest the different CLPs define convex neighbour-regions called prediction regions.
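One reading of this construction is that a point of the biplot plane is assigned to the category whose CLP is nearest in the full space, so that each prediction region is the intersection of a neighbour region with the plane. The sketch below follows that reading; the plane, the use of the rows of I as CLPs (the EMC case) and the name `predict_levels` are assumptions made for illustration.

```python
import numpy as np

def predict_levels(Y, V_r, clps):
    """Index of the nearest CLP for each point of the biplot plane.

    Y    : (n, r) coordinates of points in the biplot plane
    V_r  : (m, r) orthonormal basis of the plane within the full space
    clps : (L, m) category level points in the full space
    """
    full = Y @ V_r.T                                   # embed display points in the full space
    d2 = ((full[:, None, :] - clps[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

clps = np.eye(3)                                       # three levels, CLPs as rows of I
V_r = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # plane spanned by the first two axes
Y = np.array([[0.9, 0.1], [0.1, 0.8], [-1.0, -1.0]])   # hypothetical display points

levels = predict_levels(Y, V_r, clps)
```

The third level has its CLP off the plane, yet it still owns a region of the display: the point (-1, -1) is predicted as level 2 (zero-based).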
Prediction regions for MCA
[Figure: five panels (Gender, Hair, Region, Education, Work) showing the cases George, Alisdair, Jane, Ivor, Myfanwy, Harriet and Jeremy, with prediction regions for the levels F/M; Fair/Grey/Brown/Dark; England/Scotland/Wales; School/University/Postgrad; Manual/Clerical/Professional]
Based on EMC, not Chi2
[Figure: the same five panels (Gender, Hair, Region, Work, Education) with the same cases and category levels]
Interesting combinatorial problem
- Can we devise a configuration of the cases and a set of convex prediction regions that maximises the number of correct predictions?
CVA biplot with prediction regions
[Figure: CVA biplot with calibrated axes X1 to X8 and prediction regions]
Linear axes (3) and Category Regions(3)
[Figure: four panels, each with the linear axes BuyPrevious, HouseholdSize and NumCars, and with category regions for Q1 answers, AgeCat, IncomeGrp and EducLevel respectively]
Nonlinear PCA - ordered categorical variables, rotations and shifts
[Figure: nonlinear PCA biplot with axes Nointegrate, Support, Crime, Economy, Nojobs, Improve, Costs, Citizenborn, Citizenparents, Rights and Illegal]
Accuracy of presentation
- Overall fit: percent variance accounted for.
- Adequacy: percent V accounted for per variable (circle diagrams).
- Predictivity: SS of the difference between observed and fitted values for each variable, as a proportion of the true value.
- Proportion of correct category levels appearing in prediction regions.
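The first three measures can be sketched for a PCA biplot as below. The definitions are one reading of the bullets above: overall fit as the share of squared singular values retained, adequacy as the squared length of each variable's row of V within the first r dimensions, and the per-variable residual sum of squares as a proportion of the observed sum of squares. The function name `accuracy_measures` and the data are hypothetical.

```python
import numpy as np

def accuracy_measures(X, r=2):
    """Overall fit, per-variable adequacy and per-variable residual proportion."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_hat = (U[:, :r] * s[:r]) @ Vt[:r, :]
    overall_fit = np.sum(s[:r] ** 2) / np.sum(s ** 2)
    # Share of each (unit-length) row of V that lies in the first r dimensions.
    adequacy = np.sum(Vt[:r, :] ** 2, axis=0)
    # Residual SS per variable as a proportion of the observed SS.
    residual = np.sum((X - X_hat) ** 2, axis=0) / np.sum(X ** 2, axis=0)
    return overall_fit, adequacy, residual

rng = np.random.default_rng(3)       # arbitrary illustrative data
X = rng.normal(size=(10, 4))
X = X - X.mean(axis=0)               # column-centred, as for PCA

overall_fit, adequacy, residual = accuracy_measures(X, r=2)

# Weighting the per-variable residual proportions by each variable's SS
# gives back the overall lack of fit.
weights = np.sum(X ** 2, axis=0) / np.sum(X ** 2)
overall_lack_of_fit = np.sum(weights * residual)
```

One minus `residual` gives the proportion of each variable's sum of squares that the display reproduces.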
Adequacy Circle Diagram
[Figure: adequacy circle diagrams for the variables RMOF, RMP, BMSWA, BMCFF, BMCFD and EXFFBB, each on a scale from 0.2 to 1]
CVA showing axis predictivities
[Figure: CVA biplot with axis predictivities in brackets: X1 (0.95), X2 (0.92), X3 (0.999), X4 (0.99), X5 (0.99), X6 (0.99), X7 (0.99), X8 (0.49). Group sizes: group 1, n = 55; group 2, n = 96; group 3, n = 39; group 4, n = 210; group 5, n = 97]
Triplots for Triadditive models??
This determinant represents the content of a tetrahedron with the origin and given row-coordinates as vertices:
$\det\begin{pmatrix} a_{i1} & a_{i2} & 0 \\ 0 & b_{j1} & b_{j2} \\ c_{k2} & 0 & c_{k1} \end{pmatrix} = a_{i1}b_{j1}c_{k1} + a_{i2}b_{j2}c_{k2}$
Triadditive rank 2 geometry: $a_{i1}b_{j1}c_{k1} + a_{i2}b_{j2}c_{k2}$
[Figure: the points $(a_{i1}, a_{i2})$, $(b_{j1}, b_{j2})$ and $(c_{k1}, c_{k2})$]
Rank 3 triadditive geometry – Sum of three tetrahedra!
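[Figure: three tetrahedra with common vertex O, built from the points (a1, a2, a3), (b2, b3, b1) and (c3, c1, c2) together with the points (a1, a2, 0), (a1, 0, a3), (0, a2, a3), (b2, b3, 0), (b2, 0, b1), (0, b3, b1), (c3, c1, 0), (c3, 0, c2) and (0, c1, c2)]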