Biplots: Taking Stock John Gower Mathematics Department The Open - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Biplots: Taking Stock

John Gower Mathematics Department The Open University Milton Keynes, U.K.

slide-2
SLIDE 2

Scope

Biplots simultaneously display two kinds of information: typically, the variables (categorical and numerical) and the sample units described by a multivariate data matrix, or the items labeling the rows and columns of a two-way table. Approximation is central, because the display usually approximates higher-dimensional structure in two or three dimensions. Biplots are useful for visualizing multidimensional analyses, e.g., principal component analysis, canonical variate analysis, multidimensional scaling, multiplicative interactions and correspondence analysis, and many more.

slide-3
SLIDE 3
slide-4
SLIDE 4

Ptolemy from Wikipedia

The first part of the Geographia is a discussion of the data and of the methods he used. Ptolemy put all this information into a grand scheme. Following Marinos, he assigned coordinates to all the places and geographic features he knew, in a grid that spanned the globe. Latitude was measured from the equator, as it is today, but Ptolemy preferred in book 8 to express it as the length of the longest day rather than degrees of arc (the length of the midsummer day increases from 12h to 24h as one goes from the equator to the polar circle). In books 2 through 7, he used degrees and put the meridian of 0 longitude at the most western land he knew, the "Blessed Islands", probably the Cape Verde islands (not the Canary Islands, as long accepted) as suggested by the location of the six dots labeled the "FORTUNATA" islands near the left extreme of the blue sea of Ptolemy's map here reproduced.

slide-5
SLIDE 5
  • Geography
  • Ptolemy's other main work is his Geographia. This also is a compilation of what was known about the world's geography in the Roman Empire during his time. He relied somewhat on the work of an earlier geographer, Marinos of Tyre, and on gazetteers of the Roman and ancient Persian Empire, but most of his sources beyond the perimeter of the Empire were unreliable.
  • The first part of the Geographia is a discussion of the data and of the methods he used. As with the model of the solar system in the Almagest, Ptolemy put all this information into a grand scheme. Following Marinos, he assigned coordinates to all the places and geographic features he knew, in a grid that spanned the globe. Latitude was measured from the equator, as it is today, but Ptolemy preferred in book 8 to express it as the length of the longest day rather than degrees of arc (the length of the midsummer day increases from 12h to 24h as one goes from the equator to the polar circle). In books 2 through 7, he used degrees and put the meridian of 0 longitude at the most western land he knew, the "Blessed Islands", probably the Cape Verde islands (not the Canary Islands, as long accepted) as suggested by the location of the six dots labelled the "FORTUNATA" islands near the left extreme of the blue sea of Ptolemy's map here reproduced.

slide-6
SLIDE 6
slide-7
SLIDE 7

Ptolemy and Biplots

  • Ptolemy’s maps have all the ingredients of a biplot:
    (a) points representing cities;
    (b) lines representing latitude/longitude coordinates;
    (c) approximation.

N.B. Ptolemy seems to have recognised that the world is a globe.

slide-8
SLIDE 8
slide-9
SLIDE 9

Descartes from Wikipedia

The Cartesian coordinate system was developed in 1637 in two writings by Descartes and independently by Pierre de Fermat (unpublished). Descartes introduces the new idea of specifying the position of a point or object on a surface, using two intersecting axes as measuring guides. In La Géométrie, he further explores the above-mentioned concepts. Some note that the master artists of the Renaissance used a grid, in the form of a wire mesh, as a tool for breaking up the component parts of the subjects they painted. This may have influenced Descartes. Nicole Oresme, a French philosopher of the 14th century, used constructions similar to Cartesian coordinates.

slide-10
SLIDE 10

Types of approximation

  • 1. Through the least-squares properties of the singular-value decomposition.
  • 2. Representing “cases” by any form of MDS and then superimposing the “variables” either by (i) the regression method or (ii) superimposing nonlinear trajectories (nonlinear biplots).
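The regression method in 2(i) can be sketched in Python with numpy. This is a minimal illustration under stated assumptions: classical (Torgerson) scaling stands in for "any form of MDS", Euclidean distances are used only so the result is easy to check, and the function names are hypothetical. Each variable's axis direction is taken from the coefficients of regressing that (centred) variable on the MDS coordinates.

```python
import numpy as np

def classical_mds(D, r=2):
    """Classical (Torgerson) scaling of an n x n distance matrix D into r dimensions."""
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    B = -0.5 * C @ (D ** 2) @ C              # double-centred squared distances
    evals, evecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:r]        # indices of the r largest eigenvalues
    return evecs[:, top] * np.sqrt(np.clip(evals[top], 0.0, None))

def regression_axes(Z, X):
    """Regression method: regress each centred variable of X on the configuration Z.

    Row k of the result gives the direction of variable k's linear axis;
    Z is centred, so no intercept column is needed.
    """
    Xc = X - X.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Z, Xc, rcond=None)   # coef has shape (r, p)
    return coef.T

# Usage with hypothetical data
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))  # Euclidean distances
Z = classical_mds(D, r=2)
axes = regression_axes(Z, X)
```

With a non-Euclidean dissimilarity the same regression step still applies; only the configuration Z changes.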

slide-11
SLIDE 11

Singular Value Decomposition

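  • SVD: $X = U\Sigma V'$

Beltrami (1873), Jordan (1874), Sylvester (1889)

  • Approximation: $\min \|X - \hat{X}\|^2$ subject to $\operatorname{rank}(\hat{X}) = r$ is attained by $\hat{X} = U\Sigma J V'$, where $J$ is diagonal with ones in its first $r$ positions and zeros elsewhere: Eckart and Young (1936)

  • Ruben Gabriel (1971)

Biplots: PCA and two-way tables, diagnostic biplots, canonical biplots

  • Paul Horst ??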
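The Eckart and Young result can be checked numerically. Below is a minimal numpy sketch with a hypothetical function name: zeroing all but the first r singular values plays the role of J, and the residual sum of squares of the best rank-r fit equals the sum of the discarded squared singular values.

```python
import numpy as np

def best_rank_r(X, r):
    """Least-squares rank-r approximation X_hat = U Sigma J V' (Eckart and Young)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_r = np.where(np.arange(s.size) < r, s, 0.0)   # J keeps only the first r singular values
    return (U * s_r) @ Vt                           # scale column j of U by s_r[j]

# Usage with hypothetical data
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 5))
X_hat = best_rank_r(X, 2)
residual_ss = ((X - X_hat) ** 2).sum()
```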

slide-12
SLIDE 12

Paul Horst: Willem Heiser

  • What I recall is that I said Paul Horst had written about direct decomposition of data matrices, rather than defining factor analysis in the usual way, as decomposition of secondary structures such as the correlation matrix. He was a man of numbers, not plots, so I don't believe he actually plotted the coordinates obtained (at least, I couldn't find that back).

  • See his book "Factor Analysis of Data Matrices" (Holt, Rinehart & Winston, 1965), and note the "Data Matrices" in the title. In chapter 3 he introduces the singular value decomposition, and in chapter 4 the decomposition of the data matrix as the basic one from which other methods follow. Chapter 22 gives an interesting treatment of homogeneity analysis, 10 years before Gifi started to make a whole system out of it. To my own surprise, it also contains a method to eliminate the horseshoe effect in multiple correspondence analysis!

slide-13
SLIDE 13

Analysis and Presentation

  • Biplots are not a method of analysis; they are merely a way of presenting (graphically) the results of a variety of multidimensional methods of analysis.

  • Although I shall not be discussing methods of analysis, I note that the main difference between methods is often the initial transformation of the data X.

slide-14
SLIDE 14

Initial transformations

         

Transformation of X and the resulting method:

  • Centre and scale → PCA
  • Remove “main effects” → Biadditive models
  • Pearson residuals → CA
  • Row/column chi-square distance → CA
  • Within-group dispersion, spectral decomposition (canonical variables) → CVA
  • Dispersion → FA
  • Dissimilarity → MDS
  • Constrained regression → Rank, CANOCO
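Two of these initial transformations can be sketched with numpy using their textbook definitions: column centring and scaling for PCA, and Pearson residuals $(n_{ij} - e_{ij})/\sqrt{e_{ij}}$, with $e_{ij} = r_i c_j / n$, for CA. The exact weighting matrices used on the slide are assumed rather than reproduced, and the function names are hypothetical.

```python
import numpy as np

def centre_and_scale(X):
    """Centre each column and scale it to unit variance (usual PCA preprocessing)."""
    Xc = X - X.mean(axis=0)
    return Xc / Xc.std(axis=0, ddof=1)

def pearson_residuals(N):
    """Pearson residuals (n_ij - e_ij) / sqrt(e_ij), with e_ij = r_i c_j / n."""
    N = np.asarray(N, dtype=float)
    E = np.outer(N.sum(axis=1), N.sum(axis=0)) / N.sum()   # expected counts under independence
    return (N - E) / np.sqrt(E)

# Usage: the squared residuals sum to Pearson's chi-squared statistic
R = pearson_residuals([[10, 20], [30, 40]])
chi2 = (R ** 2).sum()
```

A table whose rows are proportional (exact independence) gives residuals of zero, so CA of such a table has nothing to display.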

slide-15
SLIDE 15

Types of “Scale”

[Figure: panels (a) to (g) showing different types of scale, with numerical markers 1 to 5 and the ordered category levels small, medium, big.]

slide-16
SLIDE 16

Tools of interpretation

  • The inner-product
  • Distance
  • Area
  • Angle
  • Orthogonal Projection
  • Nearness
  • Convex regions
  • Content
slide-17
SLIDE 17

How to represent inner-products

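  • Two sets of points (or vectors): $a_i b_j \cos\theta_{ij}$, where $a_i$ and $b_j$ are the vector lengths and $\theta_{ij}$ is the angle between them
  • One set of points plus calibrated axes
  • Areas: $a_i b_j \cos\theta_{ij} = a_i b_j \sin(\theta_{ij} + \tfrac{1}{2}\pi)$, i.e. rotate one set of points through 90 degrees

Published biplots need clear indications of what choice has been made (cartouche).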
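The equivalence between an inner product and an area can be checked in two dimensions. This is a minimal pure-Python sketch with hypothetical names: rotating b anticlockwise through 90 degrees turns $a_i b_j \cos\theta_{ij}$ into the signed area of the parallelogram spanned by a and the rotated b.

```python
import math

def inner_product(a, b):
    """Ordinary inner product a'b of two 2-D vectors."""
    return a[0] * b[0] + a[1] * b[1]

def via_angle(a, b):
    """The same quantity written as |a| |b| cos(theta)."""
    theta = math.atan2(b[1], b[0]) - math.atan2(a[1], a[0])
    return math.hypot(*a) * math.hypot(*b) * math.cos(theta)

def rotate90(v):
    """Rotate a 2-D vector anticlockwise through 90 degrees."""
    return (-v[1], v[0])

def signed_area(a, b):
    """Signed area of the parallelogram spanned by a and b."""
    return a[0] * b[1] - a[1] * b[0]

a, b = (3.0, 1.0), (1.0, 2.0)
print(inner_product(a, b), signed_area(a, rotate90(b)))   # 5.0 5.0
```

Note that the inner product is the area of the parallelogram, i.e. twice the area of the triangle with vertices at the origin, a and the rotated b.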

slide-18
SLIDE 18

[Figure: PCA biplot: Process Quality Control. Calibrated axes for variables A1 to A5, B5, C4 to C8, D4, D6, D7 and E5; samples labelled by month from Jan00 to Mar01; a Target point; regions marked Poor quality, Satisfactory quality and Good quality.]

slide-19
SLIDE 19

Orthogonal Parallel Translation of Axes

[Figure: two calibrated axes with markers 1 to 6, the origin O and a point (a,b).]
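Reading a calibrated axis by orthogonal projection, and why translating the axis parallel to itself (in a direction orthogonal to it) leaves the readings unchanged, can be sketched as follows. Assumptions: a linear, PCA-type biplot in which the value of a variable at point z is the inner product of z with the axis direction v, measured in centred units; all names and numbers are hypothetical.

```python
import math

def marker_position(mu, v, shift=(0.0, 0.0)):
    """Where the marker for (centred) value mu sits on an axis with direction v.

    `shift` translates the whole axis; it is assumed orthogonal to v.
    """
    scale = mu / (v[0] ** 2 + v[1] ** 2)
    return (shift[0] + scale * v[0], shift[1] + scale * v[1])

def read_axis(z, v, shift=(0.0, 0.0)):
    """Value read off the axis by orthogonally projecting the sample point z onto it."""
    return (z[0] - shift[0]) * v[0] + (z[1] - shift[1]) * v[1]

v = (0.6, 0.8)                    # hypothetical axis direction (a row of V_r)
z = (2.0, 1.0)                    # hypothetical sample point
t = (-0.8 * 3.0, 0.6 * 3.0)       # translation orthogonal to v
print(math.isclose(read_axis(z, v), read_axis(z, v, t)))   # True
```

The reading on the translated axis is $(z - t)'v = z'v - t'v$, which equals $z'v$ exactly when $t'v = 0$; a translation with a component along v would shift every reading by the same constant.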

slide-20
SLIDE 20

PCA biplot of 4 variables and 23 cases

[Figure: calibrated axes for SPR, RGF, PLF and SLF; cases labelled a to w.]

slide-21
SLIDE 21

Orthogonal Parallel Shift and Rotation

[Figure: calibrated axes for SPR, RGF, PLF and SLF after orthogonal parallel shift and rotation; cases labelled a to w.]

slide-22
SLIDE 22

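Σ-Scaling

Plots based on the inner product given by $\hat{X} = U\Sigma J V'$, where $X = U\Sigma V'$ is the SVD, give row points ${}_pA_r = U\Sigma^{\alpha}$ and column points ${}_qB_r = V\Sigma^{\beta}$ (first $r$ columns), usually with $\alpha + \beta = 1$. The choices are:
  • (i) $\alpha = 1, \beta = 0$ or $\alpha = 0, \beta = 1$: preserves the inner product; suitable for showing approximations to row or column distances.
  • (ii) $\alpha = 1, \beta = 1$: the inner product is not preserved; suitable for showing row and column distances simultaneously.
  • (iii) $\alpha = \beta = \tfrac{1}{2}$: preserves the inner product; suitable for showing symmetric approximations to …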
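A minimal numpy sketch of Σ-scaling, assuming a plain matrix X and a hypothetical function name. It computes row points $U_r\Sigma_r^{\alpha}$ and column points $V_r\Sigma_r^{\beta}$, so that $\alpha + \beta = 1$ reproduces the rank-r inner products while $\alpha = \beta = 1$ does not.

```python
import numpy as np

def sigma_scaled_points(X, r=2, alpha=0.5, beta=0.5):
    """Row points U_r Sigma_r^alpha and column points V_r Sigma_r^beta from the SVD of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    A = U[:, :r] * s[:r] ** alpha      # scale column j of U_r by sigma_j^alpha
    B = Vt[:r].T * s[:r] ** beta       # scale column j of V_r by sigma_j^beta
    return A, B

# Usage with hypothetical data: choice (i), a row-distance biplot
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
A, B = sigma_scaled_points(X, alpha=1.0, beta=0.0)
```

With $\alpha = 1$ the columns of $V_r$ are orthonormal, so distances between rows of A equal distances between rows of the rank-r approximation, which is why choice (i) suits row distances.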

slide-23
SLIDE 23

λ-Scaling

When plotting points given by the rows of ${}_pA_r$ and ${}_qB_r$, one set may have much greater dispersion than the other. This can be remedied as follows. First observe that $AB' = (\lambda A)(B/\lambda)'$ for any $\lambda \neq 0$, so rescaling leaves the inner products unchanged. This simple fact may be used to improve the look of the display.

One way of choosing $\lambda$ is to arrange that the average squared distances (from the origin) of the points in $\lambda A$ and $B/\lambda$ are the same. Since $A$ has $p$ rows and $B$ has $q$ rows, this requires $\lambda^2\|A\|^2/p = \|B\|^2/(\lambda^2 q)$, that is

$$\lambda^4 = \frac{\|B\|^2/q}{\|A\|^2/p}.$$
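The choice of λ can be sketched with numpy (hypothetical function name; $\|\cdot\|^2$ is taken as the sum of squared entries). The check confirms both properties: the inner products $AB'$ are unchanged, and the two sets end up with equal mean squared distance from the origin.

```python
import numpy as np

def lambda_scale(A, B):
    """Return (lam * A, B / lam, lam) with lam**4 = (||B||^2 / q) / (||A||^2 / p).

    The product (lam * A)(B / lam)' equals AB', so the inner products are unchanged,
    while the two point sets get equal mean squared distance from the origin.
    """
    p, q = A.shape[0], B.shape[0]
    lam = ((np.sum(B ** 2) / q) / (np.sum(A ** 2) / p)) ** 0.25
    return lam * A, B / lam, lam

# Usage: row points far more dispersed than column points (hypothetical data)
rng = np.random.default_rng(3)
A = 10 * rng.normal(size=(20, 2))
B = 0.1 * rng.normal(size=(5, 2))
A2, B2, lam = lambda_scale(A, B)
```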

  

slide-24
SLIDE 24

General Scaling

  • Note that $\lambda$ could be replaced by any nonsingular matrix $L$, since $AB' = (AL)\,(B(L^{-1})')'$, but this does not seem to be helpful, except perhaps when $L$ is diagonal.

slide-25
SLIDE 25

Nonlinear Biplot (Clarke’s distance)

[Figure: nonlinear biplot based on Clarke's distance, with nonlinear trajectories for SPR, RGF, PLF and SLF; cases labelled a to w.]
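A nonlinear biplot starts from a distance matrix between cases. Below is a minimal pure-Python sketch, assuming the usual definition of Clarke's distance, $d_{ij} = \sqrt{\sum_k \left((x_{ik} - x_{jk})/(x_{ik} + x_{jk})\right)^2}$, which needs strictly positive data; the function names are hypothetical. The resulting matrix would then be passed to an MDS routine such as classical scaling.

```python
import math

def clarke_distance(x, y):
    """Clarke's distance sqrt(sum(((x_k - y_k) / (x_k + y_k))**2)) for positive values."""
    return math.sqrt(sum(((a - b) / (a + b)) ** 2 for a, b in zip(x, y)))

def distance_matrix(rows, dist=clarke_distance):
    """Symmetric matrix of pairwise distances, ready to pass to an MDS routine."""
    return [[dist(r, s) for s in rows] for r in rows]

print(clarke_distance([1.0, 2.0], [3.0, 2.0]))   # 0.5
```

Because each term is a ratio rather than a difference, a change in one variable contributes nonlinearly to the distance, which is why its trajectory in the display is curved rather than a straight axis.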

slide-26
SLIDE 26

Circular Prediction

[Figure: circular prediction, with trajectories for SPR, RGF, PLF and SLF; cases labelled a to w.]

slide-27
SLIDE 27

Category Level Points (CLPs)

Quantitative variables define a continuum of values, represented by continuous, often linear, axes. Every dummy variable of an indicator matrix takes only the values 0 and 1, so the idea of coordinate axes requires modification. A categorical variable takes a finite number of levels: the $L_k$ levels of the $k$th categorical variable may be represented by $L_k$ points, termed the category level points (CLPs). Just as a sample with value $x_k$ is nearest to the marker for $x_k$ on the $k$th axis, so a sample with a particular category level must be nearest to the CLP for that category level. For the EMC (extended matching coefficient), the CLPs for all $L$ category levels are given by the rows of the unit matrix $I$. For MCA with category frequencies $L$ (a diagonal matrix), the CLPs are given by the rows of a matrix proportional to $L^{-1/2}$, with a scaling factor involving the number of variables $p$; for CVA the CLPs are the coordinates of the canonical means.

slide-28
SLIDE 28

Prediction Regions

  • The CLPs give an exact representation in a high-dimensional space.

  • In biplot approximations, the sets of points nearest to each of the different CLPs form convex nearest-neighbour regions, called prediction regions.
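Prediction by nearest CLP can be sketched in pure Python. The CLP coordinates below are hypothetical two-dimensional values for a three-level variable, and the function names are assumptions. The set of points assigned to one level is an intersection of half-planes (a Voronoi cell), which is why prediction regions are convex. The second function gives the proportion of samples whose observed level is the predicted one.

```python
import math

def predict_category(point, clps):
    """Predict the category level whose CLP is nearest (Euclidean) to `point`."""
    return min(clps, key=lambda level: math.dist(point, clps[level]))

def proportion_correct(points, observed, clps):
    """Proportion of samples whose observed level is the one predicted by its region."""
    hits = sum(predict_category(z, clps) == obs for z, obs in zip(points, observed))
    return hits / len(points)

# Hypothetical 2-D CLP coordinates for a three-level variable
clps = {"England": (-1.0, 0.0), "Scotland": (1.0, 0.5), "Wales": (0.0, -1.2)}
print(predict_category((0.8, 0.1), clps))   # Scotland
```

`math.dist` requires Python 3.8 or later.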

slide-29
SLIDE 29

Prediction regions for MCA

[Figure: MCA prediction regions for Gender (F, M), Hair (Fair, Grey, Brown, Dark), Region (England, Scotland, Wales), Education (School, University, Postgrad) and Work (Manual, Clerical, Professional); cases George, Alisdair, Jane, Ivor, Myfanwy, Harriet and Jeremy.]
slide-30
SLIDE 30

Based on EMC, not Chi2

[Figure: prediction regions based on the EMC for Gender (F, M), Hair (Fair, Grey, Brown, Dark), Region (England, Scotland, Wales), Work (Manual, Clerical, Professional) and Education (School, University, Postgrad); cases George, Alisdair, Jane, Ivor, Myfanwy, Harriet and Jeremy.]
slide-31
SLIDE 31

Interesting combinatorial problem

  • Can we devise a configuration of the cases and a set of convex prediction regions that maximises the number of correct predictions?

slide-32
SLIDE 32

CVA biplot with prediction regions

[Figure: CVA biplot with calibrated axes X1 to X8 and prediction regions.]

slide-33
SLIDE 33

Linear axes (3) and Category Regions (3)

[Figure: panels with linear axes BuyPrevious, HouseholdSize and NumCars, each overlaid with the category regions of one categorical variable; legends for Q1 answers, AgeCat, IncomeGrp and EducLevel.]

slide-34
SLIDE 34

Nonlinear PCA - ordered categorical variables, rotations and shifts

[Figure: nonlinear PCA biplot with axes for Nointegrate, Support, Crime, Economy, Nojobs, Improve, Costs, Citizenborn, Citizenparents, Rights and Illegal.]

slide-35
SLIDE 35

Accuracy of presentation

  • Overall fit: percentage of variance accounted for.
  • Adequacy: percentage of V accounted for per variable (circle diagrams).
  • Predictivity: based on the SS of the differences between observed and fitted values for each variable, as a proportion of the SS of the observed values.
  • Proportion of correct category levels appearing in prediction regions.
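The first three measures can be sketched for a PCA biplot with numpy. The exact definitions are assumptions following common conventions: overall fit is the share of the squared singular values carried by the first r dimensions; adequacy of variable k is the squared length of its row of $V_r$; axis predictivity is one minus the residual SS of the (centred) variable divided by its total SS. The function name is hypothetical.

```python
import numpy as np

def pca_fit_measures(X, r=2):
    """Overall fit, per-variable adequacy and per-variable predictivity of a rank-r PCA biplot."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Xh = (U[:, :r] * s[:r]) @ Vt[:r]                       # rank-r fitted values
    overall = (s[:r] ** 2).sum() / (s ** 2).sum()
    adequacy = (Vt[:r] ** 2).sum(axis=0)                   # squared length of row k of V_r
    predictivity = 1 - ((Xc - Xh) ** 2).sum(axis=0) / (Xc ** 2).sum(axis=0)
    return overall, adequacy, predictivity

# Usage with hypothetical data
rng = np.random.default_rng(4)
X = rng.normal(size=(30, 4))
overall, adequacy, predictivity = pca_fit_measures(X, r=2)
```

With these definitions the overall fit is the average of the axis predictivities weighted by each variable's total SS, and all three measures reach 1 when r equals the number of variables.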

slide-36
SLIDE 36

Adequacy Circle Diagram

[Figure: adequacy circle diagrams for RMOF, RMP, BMSWA, BMCFF, BMCFD and EXFFBB.]

slide-37
SLIDE 37

CVA showing axis predictivities

[Figure: CVA biplot with calibrated axes labelled by their predictivities: X1 (0.95), X2 (0.92), X3 (0.999), X4 (0.99), X5 (0.99), X6 (0.99), X7 (0.99), X8 (0.49); groups 1 (n = 55), 2 (n = 96), 3 (n = 39), 4 (n = 210) and 5 (n = 97).]

slide-38
SLIDE 38

Triplots for Triadditive models??

This determinant represents the content of a tetrahedron with the origin and the given row coordinates as vertices:

$$\det\begin{pmatrix} a_{i1} & a_{i2} & 0 \\ 0 & b_{j1} & b_{j2} \\ c_{k2} & 0 & c_{k1} \end{pmatrix} = a_{i1}b_{j1}c_{k1} + a_{i2}b_{j2}c_{k2}$$
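The link between a determinant and the rank-2 triadditive term $a_{i1}b_{j1}c_{k1} + a_{i2}b_{j2}c_{k2}$ can be checked numerically. This pure-Python sketch assumes one arrangement of the nine entries, with vertices $(a_{i1}, a_{i2}, 0)$, $(0, b_{j1}, b_{j2})$ and $(c_{k2}, 0, c_{k1})$, plus the origin; that arrangement and the function names are hypothetical. The content (volume) of the tetrahedron is $|\det|/6$.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as three rows (cofactor expansion)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def tetra_rows(a, b, c):
    """Hypothetical vertices (a1, a2, 0), (0, b1, b2), (c2, 0, c1); the fourth vertex is the origin."""
    return [(a[0], a[1], 0.0), (0.0, b[0], b[1]), (c[1], 0.0, c[0])]

def triadditive_rank2(a, b, c):
    """a_i1 b_j1 c_k1 + a_i2 b_j2 c_k2."""
    return a[0] * b[0] * c[0] + a[1] * b[1] * c[1]

a, b, c = (1.0, 2.0), (3.0, 4.0), (5.0, 6.0)
print(det3(tetra_rows(a, b, c)), triadditive_rank2(a, b, c))   # 63.0 63.0
print(abs(det3(tetra_rows(a, b, c))) / 6)                      # 10.5
```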

slide-39
SLIDE 39

Triadditive rank 2 geometry: $a_{i1}b_{j1}c_{k1} + a_{i2}b_{j2}c_{k2}$

[Figure: points $(a_{i1}, a_{i2})$, $(b_{j1}, b_{j2})$ and $(c_{k1}, c_{k2})$.]

slide-40
SLIDE 40

Rank 3 triadditive geometry – Sum of three tetrahedra!

[Figure: rank-3 triadditive geometry, with the origin O and tetrahedra whose vertices are built from $(a_1, a_2, a_3)$, $(b_1, b_2, b_3)$ and $(c_1, c_2, c_3)$.]

slide-41
SLIDE 41

John Gower and Son's wagon, 1910. Even then the Gower wagon was drawn by TWO horses