R package ca: functions ca and mjca




slide-1
SLIDE 1

Michael Greenacre

Universitat Pompeu Fabra Barcelona

Simple correspondence analysis (CA), multiple correspondence analysis (MCA), joint correspondence analysis (JCA), as well as all subset versions of these, using R package ca.

Oleg Nenadić & Michael Greenacre

University of Göttingen & Universitat Pompeu Fabra

Assos, venue for CARME. View of the Aegean Sea and the island of Lesbos. Turkey, August 2010.

slide-2
SLIDE 2

Package ca

  • function ca: (simple) correspondence analysis (CA); subset version: subset CA
  • function mjca: multiple correspondence analysis (MCA), adjusted MCA, joint correspondence analysis (JCA); subset versions: subset MCA, adjusted subset MCA, subset JCA

slide-3
SLIDE 3

Contribution coordinates

[Figure: two maps of the same data set, with points labelled 14:0, 14:1(n-5), …, 24:1(n-9).]

  • asymmetric map: map="rowprincipal" (all points labelled)
  • contribution coordinates: map="rowgreen" (labelled points: 16:1(n-7), 18:0, 18:4(n-3), 20:1(n-9), 20:5(n-3), 22:1(n-11))

See Biplots in Practice (Greenacre 2010), www.multivariatestatistics.org

slide-4
SLIDE 4

Problem of variance explained

> summary(mjca(wg93[,1:4], lambda="indicator"))

Principal inertias (eigenvalues):
 dim    value      %   cum%   scree plot
 1      0.457379  11.4  11.4  *************************
 2      0.430966  10.8  22.2  ***********************
 3      0.321926   8.0  30.3  ***************
 :      :          :     :

> summary(mjca(wg93[,1:4], lambda="Burt"))

Principal inertias (eigenvalues):
 dim    value      %   cum%   scree plot
 1      0.209196  18.6  18.6  *************************
 2      0.185732  16.5  35.0  **********************
 3      0.103636   9.2  44.2  ***********
 :      :          :     :

> summary(mjca(wg93[,1:4], lambda="adjusted"))   # DEFAULT

Principal inertias (eigenvalues):
 dim    value      %   cum%   scree plot
 1      0.076455  44.9  44.9  *************************
 2      0.058220  34.2  79.1  *******************
 3      0.009197   5.4  84.5  ***
 :      :          :     :

> summary(mjca(wg93[,1:4], lambda="JCA"))

Percentage explained by JCA in 2 dimensions: 85.7%
(Eigenvalues are not nested)
[Iterations in JCA: 44 , epsilon = 9.91e-05]

Inertia explained in two dimensions increases from one version to the next: 22.2% (indicator), 35.0% (Burt), 79.1% (adjusted), 85.7% (JCA).

slide-5
SLIDE 5

Same problem for individual points

> summary(mjca(wg93[,1:4], lambda="Burt"))

Principal inertias (eigenvalues):
 dim    value      %   cum%   scree plot
 1      0.209196  18.6  18.6  *************************
 2      0.185732  16.5  35.0  **********************
 3      0.103636   9.2  44.2  ***********
 :      :          :     :

     name   mass  qlt  inr     k=1  cor  ctr     k=2  cor  ctr
 1 |   A1 |   34  445   55 |  -840  391   53 |  -314   54    8 |
 2 |   A2 |   92  169   38 |  -250  136   13 |   123   33    3 |
 3 |   A3 |   59  344   47 |   204   47    5 |   517  298   36 |
 4 |   A4 |   51  350   50 |   533  258   32 |  -318   92   12 |
 5 |   A5 |   14  401   60 |   913  170   25 | -1064  231   36 |
 6 |   B1 |   20  621   62 | -1338  519   80 |  -590  101   16 |
 7 |   B2 |   50  158   47 |  -293   80    9 |   287   77   10 |
 8 |   B3 |   59  227   45 |  -158   29    3 |   415  198   24 |
 9 |   B4 |   81  210   41 |   327  185   19 |   121   25    3 |
10 |   B5 |   40  722   60 |   619  229   34 |  -908  493   77 |
11 |   C1 |   44  732   60 |  -987  632   93 |  -392  100   16 |
12 |   C2 |   91  164   38 |  -113   27    3 |   255  137   14 |
13 |   C3 |   57  296   48 |   283   84   10 |   450  212   27 |
14 |   C4 |   44  345   52 |   617  289   37 |  -274   57    8 |
15 |   C5 |   15  471   60 |   671   99   15 | -1300  372   59 |
16 |   D1 |   17  251   56 |  -551   83   11 |  -785  168   25 |
17 |   D2 |   67   14   42 |   101   14    1 |     3    0    0 |
18 |   D3 |   58  303   48 |   176   33    4 |   499  269   34 |
19 |   D4 |   65   25   43 |   101   14    1 |    91   11    1 |
20 |   D5 |   43  272   50 |  -324   81   10 |  -496  191   25 |

slide-6
SLIDE 6

Same problem for individual points

> summary(mjca(wg93[,1:4], lambda="Burt"))

Principal inertias (eigenvalues):
 dim    value      %   cum%   scree plot
 1      0.209196  18.6  18.6  *************************
 2      0.185732  16.5  35.0  **********************
 3      0.103636   9.2  44.2  ***********

> mjca(wg93[,1:4])$Burt

    A1  A2  A3  A4  A5  B1  B2  B3  B4  B5  C1  C2  C3  C4  C5  D1  D2  D3  D4  D5
A1 119   0   0   0   0  27  28  30  22  12  49  40  18   7   5  15  25  17  34  28
A2   0 322   0   0   0  38  74  84  96  30  67 142  60  41  12  22 102  76  68  54
A3   0   0 204   0   0   3  48  63  73  17  18  75  70  34   7  10  44  68  58  24
A4   0   0   0 178   0   3  21  23  79  52  16  50  40  56  16   9  52  28  54  35
A5   0   0   0   0  48   0   3   5  11  29   2   9   9  16  12   4   9  13  12  10
B1  27  38   3   3   0  71   0   0   0   0  43  19   4   3   2   9  17  10  10  25
B2  28  74  48  21   3   0 174   0   0   0  36  88  34  15   1  16  51  42  45  20
B3  30  84  63  23   5   0   0 205   0   0  37  90  57  19   2  10  53  63  51  28
B4  22  96  73  79  11   0   0   0 281   0  27  88  75  74  17   6  66  70  92  47
B5  12  30  17  52  29   0   0   0   0 140   9  31  27  43  30  19  45  17  28  31
C1  49  67  18  16   2  43  36  37  27   9 152   0   0   0   0  25  24  15  38  50
C2  40 142  75  50   9  19  88  90  88  31   0 316   0   0   0  15  97  67  89  48
C3  18  60  70  40   9   4  34  57  75  27   0   0 197   0   0   5  51  83  41  17
C4   7  41  34  56  16   3  15  19  74  43   0   0   0 154   0   6  44  30  51  23
C5   5  12   7  16  12   2   1   2  17  30   0   0   0   0  52   9  16   7   7  13
D1  15  22  10   9   4   9  16  10   6  19  25  15   5   6   9  60   0   0   0   0
D2  25 102  44  52   9  17  51  53  66  45  24  97  51  44  16   0 232   0   0   0
D3  17  76  68  28  13  10  42  63  70  17  15  67  83  30   7   0   0 202   0   0
D4  34  68  58  54  12  10  45  51  92  28  38  89  41  51   7   0   0   0 226   0
D5  28  54  24  35  10  25  20  28  47  31  50  48  17  23  13   0   0   0   0 151

slide-7
SLIDE 7

Joint correspondence analysis

> mjca(wg93[,1:4], lambda="JCA")$Burt.upd

    A1  A2  A3  A4  A5  B1  B2  B3  B4  B5  C1  C2  C3  C4  C5  D1  D2  D3  D4  D5
A1  31  53  19  14   3  27  28  30  22  12  49  40  18   7   5  15  25  17  34  28
A2  53 131  77  52  10  38  74  84  96  30  67 142  60  41  12  22 102  76  68  54
A3  19  77  63  39   7   3  48  63  73  17  18  75  70  34   7  10  44  68  58  24
A4  14  52  39  54  20   3  21  23  79  52  16  50  40  56  16   9  52  28  54  35
A5   3  10   7  20   9   0   3   5  11  29   2   9   9  16  12   4   9  13  12  10
B1  27  38   3   3   0  21  20  18   8   3  43  19   4   3   2   9  17  10  10  25
B2  28  74  48  21   3  20  46  54  50   4  36  88  34  15   1  16  51  42  45  20
B3  30  84  63  23   5  18  54  65  64   4  37  90  57  19   2  10  53  63  51  28
B4  22  96  73  79  11   8  50  64 104  55  27  88  75  74  17   6  66  70  92  47
B5  12  30  17  52  29   3   4   4  55  74   9  31  27  43  30  19  45  17  28  31
C1  49  67  18  16   2  43  36  37  27   9  82  55   4   3   7  25  24  15  38  50
C2  40 142  75  50   9  19  88  90  88  31  55 126  79  46   9  15  97  67  89  48
C3  18  60  70  40   9   4  34  57  75  27   4  79  66  41   6   5  51  83  41  17
C4   7  41  34  56  16   3  15  19  74  43   3  46  41  45  18   6  44  30  51  23
C5   5  12   7  16  12   2   1   2  17  30   7   9   6  18  11   9  16   7   7  13
D1  15  22  10   9   4   9  16  10   6  19  25  15   5   6   9   9  15   5  13  18
D2  25 102  44  52   9  17  51  53  66  45  24  97  51  44  16  15  62  56  61  38
D3  17  76  68  28  13  10  42  63  70  17  15  67  83  30   7   5  56  64  56  21
D4  34  68  58  54  12  10  45  51  92  28  38  89  41  51   7  13  61  56  60  36
D5  28  54  24  35  10  25  20  28  47  31  50  48  17  23  13  18  38  21  36  38

  • default: two-dimensional solution
  • at convergence, the diagonal blocks are perfectly fitted

slide-8
SLIDE 8

Joint correspondence analysis

> summary(mjca(wg93[,1:4], lambda="JCA"))

Principal inertias (eigenvalues):
 dim    value
 1      0.099091
 2      0.065033
 :      :
 Total: 0.182425

Diagonal inertia discounted from eigenvalues: 0.0547405
Percentage explained by JCA in 2 dimensions: 85.7%
(Eigenvalues are not nested)
[Iterations in JCA: 44 , epsilon = 9.91e-05]

\[ \frac{(0.099091 + 0.065033) - 0.0547405}{0.182425 - 0.0547405} = 0.857 \]

Subset version of JCA available in new version: i.e., a subset of the categories is specified, and the analysis fits these optimally, using the original margins of the Burt matrix, omitting the (subsets of) categories in the diagonal blocks.
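The percentage reported for JCA can be reproduced by hand from the figures in the summary: the diagonal inertia is subtracted from both the sum of the two eigenvalues and the total. The following is a minimal sketch in plain R; the helper name `jca_explained` is hypothetical and is not a function of the ca package.

```r
# Hypothetical helper (not part of the ca package): share of inertia explained
# by a JCA solution once the diagonal-block inertia has been discounted.
jca_explained <- function(eigenvalues, total, diagonal) {
  (sum(eigenvalues) - diagonal) / (total - diagonal)
}

# Values copied from the summary output on this slide.
pct <- jca_explained(eigenvalues = c(0.099091, 0.065033),
                     total       = 0.182425,
                     diagonal    = 0.0547405)
print(round(100 * pct, 1))
```

The result rounds to the 85.7% printed by summary().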

slide-9
SLIDE 9

Adjusted MCA

[Figure: two maps of the 20 categories A1 to D5. In the Burt analysis the axes run from about -1.5 to 1.0; in the adjusted analysis from about -0.8 to 0.6.]

Burt: λ_1, λ_2, … 35% explained
Adjusted: λ_1*, λ_2*, … 79% explained

\[ \lambda_i^{*} = \left(\frac{Q}{Q-1}\right)^{2} \left(\sqrt{\lambda_i} - \frac{1}{Q}\right)^{2} \]

where λ_i is the i-th principal inertia of the Burt matrix and Q is the number of variables (here Q = 4).
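The relation between the Burt and the adjusted inertias can be checked numerically. The sketch below assumes the usual adjustment λ* = (Q/(Q-1))² (√λ - 1/Q)², applied only to dimensions with √λ > 1/Q, where λ is a principal inertia of the Burt matrix and Q the number of variables. The function name `adjust_inertia` is hypothetical and not part of the ca package; the input values are the Burt inertias printed in the summaries of this presentation.

```r
# Hypothetical helper (not part of the ca package): adjusted principal inertias
# computed from the principal inertias of the Burt matrix.
adjust_inertia <- function(lambda_burt, Q) {
  s <- sqrt(lambda_burt)          # singular values = indicator-matrix inertias
  s <- s[s > 1 / Q]               # only these dimensions are adjusted
  (Q / (Q - 1))^2 * (s - 1 / Q)^2
}

# First three Burt inertias for wg93[,1:4], copied from the summary output.
burt <- c(0.209196, 0.185732, 0.103636)
adj  <- adjust_inertia(burt, Q = 4)
print(round(adj, 6))
```

Up to rounding of the printed inputs, the values agree with the adjusted inertias 0.076455, 0.058220 and 0.009197 reported by the default analysis, and sqrt(burt) reproduces the indicator-matrix inertias.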

    

slide-10
SLIDE 10

Adjusted MCA – nullifying the Burt matrix

    A1  A2  A3  A4  A5  B1  B2  B3  B4  B5  C1  C2  C3  C4  C5  D1  D2  D3  D4  D5
A1 119   0   0   0   0  27  28  30  22  12  49  40  18   7   5  15  25  17  34  28
A2   0 322   0   0   0  38  74  84  96  30  67 142  60  41  12  22 102  76  68  54
A3   0   0 204   0   0   3  48  63  73  17  18  75  70  34   7  10  44  68  58  24
A4   0   0   0 178   0   3  21  23  79  52  16  50  40  56  16   9  52  28  54  35
A5   0   0   0   0  48   0   3   5  11  29   2   9   9  16  12   4   9  13  12  10
B1  27  38   3   3   0  71   0   0   0   0  43  19   4   3   2   9  17  10  10  25
B2  28  74  48  21   3   0 174   0   0   0  36  88  34  15   1  16  51  42  45  20
B3  30  84  63  23   5   0   0 205   0   0  37  90  57  19   2  10  53  63  51  28
B4  22  96  73  79  11   0   0   0 281   0  27  88  75  74  17   6  66  70  92  47
B5  12  30  17  52  29   0   0   0   0 140   9  31  27  43  30  19  45  17  28  31
C1  49  67  18  16   2  43  36  37  27   9 152   0   0   0   0  25  24  15  38  50
C2  40 142  75  50   9  19  88  90  88  31   0 316   0   0   0  15  97  67  89  48
C3  18  60  70  40   9   4  34  57  75  27   0   0 197   0   0   5  51  83  41  17
C4   7  41  34  56  16   3  15  19  74  43   0   0   0 154   0   6  44  30  51  23
C5   5  12   7  16  12   2   1   2  17  30   0   0   0   0  52   9  16   7   7  13
D1  15  22  10   9   4   9  16  10   6  19  25  15   5   6   9  60   0   0   0   0
D2  25 102  44  52   9  17  51  53  66  45  24  97  51  44  16   0 232   0   0   0
D3  17  76  68  28  13  10  42  63  70  17  15  67  83  30   7   0   0 202   0   0
D4  34  68  58  54  12  10  45  51  92  28  38  89  41  51   7   0   0   0 226   0
D5  28  54  24  35  10  25  20  28  47  31  50  48  17  23  13   0   0   0   0 151

slide-11
SLIDE 11

Adjusted MCA – nullified Burt matrix

B0 =

    A1  A2  A3  A4  A5  B1  B2  B3  B4  B5  C1  C2  C3  C4  C5  D1  D2  D3  D4  D5
A1 119   0   0   0   0  27  28  30  22  12  49  40  18   7   5  15  25  17  34  28
A2   0 322   0   0   0  38  74  84  96  30  67 142  60  41  12  22 102  76  68  54
A3   0   0 204   0   0   3  48  63  73  17  18  75  70  34   7  10  44  68  58  24
A4   0   0   0 178   0   3  21  23  79  52  16  50  40  56  16   9  52  28  54  35
A5   0   0   0   0  48   0   3   5  11  29   2   9   9  16  12   4   9  13  12  10
B1  27  38   3   3   0  71   0   0   0   0  43  19   4   3   2   9  17  10  10  25
B2  28  74  48  21   3   0 174   0   0   0  36  88  34  15   1  16  51  42  45  20
B3  30  84  63  23   5   0   0 205   0   0  37  90  57  19   2  10  53  63  51  28
B4  22  96  73  79  11   0   0   0 281   0  27  88  75  74  17   6  66  70  92  47
B5  12  30  17  52  29   0   0   0   0 140   9  31  27  43  30  19  45  17  28  31
C1  49  67  18  16   2  43  36  37  27   9 152   0   0   0   0  25  24  15  38  50
C2  40 142  75  50   9  19  88  90  88  31   0 316   0   0   0  15  97  67  89  48
C3  18  60  70  40   9   4  34  57  75  27   0   0 197   0   0   5  51  83  41  17
C4   7  41  34  56  16   3  15  19  74  43   0   0   0 154   0   6  44  30  51  23
C5   5  12   7  16  12   2   1   2  17  30   0   0   0   0  52   9  16   7   7  13
D1  15  22  10   9   4   9  16  10   6  19  25  15   5   6   9  60   0   0   0   0
D2  25 102  44  52   9  17  51  53  66  45  24  97  51  44  16   0 232   0   0   0
D3  17  76  68  28  13  10  42  63  70  17  15  67  83  30   7   0   0 202   0   0
D4  34  68  58  54  12  10  45  51  92  28  38  89  41  51   7   0   0   0 226   0
D5  28  54  24  35  10  25  20  28  47  31  50  48  17  23  13   0   0   0   0 151

  • Perform eigendecomposition on B0 (suitably centred & normalized, as in MCA)
  • The POSITIVE eigenvalues are exactly the adjusted inertias
  • Adjustments for each category obtained in same way

slide-12
SLIDE 12

Results for new version (default is "adjusted")

> summary(mjca(wg93[,1:4]))

Principal inertias (eigenvalues):
 dim    value      %   cum%   scree plot
 1      0.076455  44.9  44.9  *************************
 2      0.058220  34.2  79.1  *******************
 3      0.009197   5.4  84.5  ***
 :      :          :     :

     name   mass  qlt  inr     k=1  cor  ctr     k=2  cor  ctr
 1 |   A1 |   34  963   55 |   508  860  115 |  -176  103   18 |
 2 |   A2 |   92  659   38 |   151  546   28 |    69  113    7 |
 3 |   A3 |   59  929   47 |  -124  143   12 |   289  786   84 |
 4 |   A4 |   51  798   50 |  -322  612   69 |  -178  186   28 |
 5 |   A5 |   14  799   60 |  -552  369   55 |  -596  430   84 |
 6 |   B1 |   20  911   62 |   809  781  174 |  -331  131   38 |
 7 |   B2 |   50  631   47 |   177  346   21 |   161  285   22 |
 8 |   B3 |   59  806   45 |    96  117    7 |   233  690   55 |
 9 |   B4 |   81  620   41 |  -197  555   41 |    68   65    6 |
10 |   B5 |   40  810   60 |  -374  285   74 |  -509  526  179 |
11 |   C1 |   44  847   60 |   597  746  203 |  -219  101   36 |
12 |   C2 |   91  545   38 |    68  101    6 |   143  444   32 |
13 |   C3 |   57  691   48 |  -171  218   22 |   252  473   62 |
14 |   C4 |   44  788   52 |  -373  674   80 |  -153  114   18 |
15 |   C5 |   15  852   60 |  -406  202   32 |  -728  650  136 |
16 |   D1 |   17  782   56 |   333  285   25 |  -440  497   57 |
17 |   D2 |   67  126   42 |   -61  126    3 |     2    0    0 |
18 |   D3 |   58  688   48 |  -106   87    9 |   280  601   78 |
19 |   D4 |   65  174   43 |   -61  103    3 |    51   71    3 |
20 |   D5 |   43  869   50 |   196  288   22 |  -278  581   57 |

Subset version also available, using nullified Burt matrix as before

slide-13
SLIDE 13

Packages with CA

  • ca
  • FactoMineR
  • vegan
  • ade4
  • MASS
  • caGUI
  • BiplotGUI
slide-14
SLIDE 14

Correspondence analysis with ca

Tutorial presented at CARME 2011 in Rennes, France, February 8, 2011

  • M. Greenacre, O. Nenadić
slide-15
SLIDE 15

Introduction

In the practical part of this tutorial we demonstrate how to apply the ca package for simple, multiple and joint correspondence analysis in R. R is a freely available statistical software environment. Since its introduction by R. Ihaka and R. Gentleman (1996) it has gained much popularity in the statistical community. One advantage of R is its extension system, which allows R's capabilities to be extended by so-called packages. Further information on R is available at the official R website: http://www.R-project.org .

slide-16
SLIDE 16

The ca package, an overview

The ca package offers functions for the computation and visualization of correspondence analysis. The core computations are done by the functions ca() (simple correspondence analysis) and mjca() (multiple and joint correspondence analysis). Each function has corresponding print, summary and plot methods, which present the numerical results of the analysis and the graphical display. The package also contains auxiliary functions that are usually not called directly by users (such as iterate.mjca(), which is used in a joint correspondence analysis).

slide-17
SLIDE 17

The ca package, an overview

The core functions in ca and its methods:

                         Simple                   Multiple and joint
                         correspondence analysis  correspondence analysis
  • Computation:         ca()                     mjca()
  • Numerical output:    print.ca()               print.mjca()
                         summary.ca()             summary.mjca()
  • Graphical display:   plot.ca()                plot.mjca()
                         plot3d.ca()              (plot3d.mjca())

Where applicable, the functions for simple and for multiple / joint correspondence analysis share the same structure of arguments.

slide-18
SLIDE 18

Simple correspondence analysis

Simple correspondence analysis is performed with the function ca():

> ca(smoke)

 Principal inertias (eigenvalues):
           1        2        3
Value      0.074759 0.010017 0.000414
Percentage 87.76%   11.76%   0.49%

 Rows:
               SM        JM        SE       JE        SC
Mass     0.056995  0.093264  0.264249 0.455959  0.129534
ChiDist  0.216559  0.356921  0.380779 0.240025  0.216169
Inertia  0.002673  0.011881  0.038314 0.026269  0.006053
Dim. 1  -0.240539  0.947105 -1.391973 0.851989 -0.735456
Dim. 2  -1.935708 -2.430958 -0.106508 0.576944  0.788435

 Columns:
             none    light   medium     heavy
Mass     0.316062 0.233161 0.321244  0.129534
ChiDist  0.394490 0.173996 0.198127  0.355109
Inertia  0.049186 0.007059 0.012610  0.016335
Dim. 1  -1.438471 0.363746 0.718017  1.074445
Dim. 2  -0.304659 1.409433 0.073528 -1.975960
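To see where the numbers printed by ca(smoke) come from, the analysis can be retraced in base R with a singular value decomposition of the standardized residuals. This is a sketch only, not the package's internal code. The counts below are typed in by hand on the assumption that they match the smoke data set shipped with ca (their margins agree with the masses shown on this slide); the object names are illustrative.

```r
# Assumed contents of the smoke data: 5 staff groups x 4 smoking categories.
N <- matrix(c( 4,  2,  3,  2,
               4,  3,  7,  4,
              25, 10, 12,  4,
              18, 24, 33, 13,
              10,  6,  7,  2),
            nrow = 5, byrow = TRUE,
            dimnames = list(c("SM", "JM", "SE", "JE", "SC"),
                            c("none", "light", "medium", "heavy")))

P         <- N / sum(N)                 # correspondence matrix
row_mass  <- rowSums(P)
col_mass  <- colSums(P)

# Standardized residuals: (P - r c') scaled by 1/sqrt(r) and 1/sqrt(c).
S   <- (P - outer(row_mass, col_mass)) / sqrt(outer(row_mass, col_mass))
dec <- svd(S)

inertias <- dec$d^2                     # principal inertias (last one is ~0)
row_std  <- dec$u / sqrt(row_mass)      # standard row coordinates
row_pri  <- sweep(row_std, 2, dec$d, "*")  # principal row coordinates

print(round(inertias[1:3], 6))
print(round(sum(inertias), 6))
```

The squared singular values correspond to the "Principal inertias" line, and the first column of row_std matches the "Dim. 1" row up to an arbitrary sign, which the SVD does not fix.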
slide-19
SLIDE 19

Simple correspondence analysis

Additional details are given with the summary method:

> summary(ca(smoke))

Principal inertias (eigenvalues):
 dim    value      %   cum%   scree plot
 1      0.074759  87.8  87.8  *************************
 2      0.010017  11.8  99.5  ***
 3      0.000414   0.5 100.0
        -------- -----
 Total: 0.085190 100.0

Rows:
    name   mass  qlt  inr    k=1  cor  ctr    k=2  cor  ctr
1 |   SM |   57  893   31 |  -66   92    3 | -194  800  214 |
2 |   JM |   93  991  139 |  259  526   84 | -243  465  551 |
3 |   SE |  264 1000  450 | -381  999  512 |  -11    1    3 |
4 |   JE |  456 1000  308 |  233  942  331 |   58   58  152 |
5 |   SC |  130  999   71 | -201  865   70 |   79  133   81 |

Columns:
    name   mass  qlt  inr    k=1  cor  ctr    k=2  cor  ctr
1 | none |  316 1000  577 | -393  994  654 |  -30    6   29 |
2 | lght |  233  984   83 |   99  327   31 |  141  657  463 |
3 | medm |  321  983  148 |  196  982  166 |    7    1    2 |
4 | hevy |  130  995  192 |  294  684  150 | -198  310  506 |

slide-20
SLIDE 20

Simple correspondence analysis

Extensions to simple correspondence analysis include supplementary rows and/or columns as well as a subset analysis. These extensions are handled by the optional arguments supcol / suprow and subsetcol / subsetrow:

# Considering the first column (non-smokers) as supplementary:
> ca(smoke, supcol = 1)
# Considering the subset of smokers (i.e. columns 2, 3 and 4):
> ca(smoke, subsetcol = 2:4)
# Adding a supplementary column to a subset analysis:
> ca(smoke, subsetcol = 2:4, supcol = 1)

slide-21
SLIDE 21

Simple correspondence analysis

The visualization of simple correspondence analysis is done with the corresponding plot method:

> plot(ca(smoke, supcol = 1))

slide-22
SLIDE 22

Simple correspondence analysis

As with the core function, additional options are provided by optional arguments. For example, different map scaling options are available with the option map:

Option           Description
"symmetric"      Rows and columns in principal coordinates (default)
"rowprincipal"   Rows in principal and columns in standard coordinates
"colprincipal"   Rows in standard and columns in principal coordinates
"symbiplot"      Row and column coordinates are scaled to have variances equal to the singular values
"rowgab"         Rows in principal coordinates and columns in standard coordinates times mass
"colgab"         Columns in principal coordinates and rows in standard coordinates times mass
                 (according to a proposal by Gabriel and Odoroff, 1990)
"rowgreen"       Rows in principal coordinates and columns in standard coordinates times the square root of the mass
"colgreen"       Columns in principal coordinates and rows in standard coordinates times the square root of the mass
                 (according to a proposal by Greenacre, 2006)
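The "rowgreen" scaling can be illustrated with the column results of ca(smoke) given earlier in this tutorial. The sketch below multiplies the printed standard coordinates by the square root of the printed masses; the squares of the resulting contribution coordinates are the contributions (ctr) of each column to the first axis. The numbers are copied from the printed output, and the variable names are illustrative, not objects of the ca package.

```r
# Column masses and standard coordinates on dimension 1, from ca(smoke).
col_mass <- c(none = 0.316062, light = 0.233161,
              medium = 0.321244, heavy = 0.129534)
col_std1 <- c(none = -1.438471, light = 0.363746,
              medium = 0.718017, heavy = 1.074445)

# Contribution coordinates as used for the columns in map = "rowgreen".
col_green1 <- col_std1 * sqrt(col_mass)

# Squared contribution coordinates, in permills, give the ctr column.
ctr1 <- round(1000 * col_green1^2)
print(ctr1)
```

These values agree with the k=1 ctr column of summary(ca(smoke)) (654, 31, 166, 150), and the unrounded squares sum to 1 along the axis.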

slide-23
SLIDE 23

Simple correspondence analysis

In addition, three-dimensional maps can be displayed using the rgl package (D. Murdoch, D. Adler):

> plot3d(ca(smoke))

slide-24
SLIDE 24

Multiple and joint correspondence analysis

Multiple and joint correspondence analysis is computed with the function mjca(). The approach to MCA is determined by the option lambda:

lambda="indicator"   Multiple correspondence analysis based on the indicator matrix
lambda="Burt"        Multiple correspondence analysis based on the Burt matrix
lambda="adjusted"    Adjusted multiple correspondence analysis
lambda="JCA"         Joint correspondence analysis

By default, an adjusted MCA is performed, i.e. lambda="adjusted".

slide-25
SLIDE 25

Multiple and joint correspondence analysis

The input data for mjca() is a data frame comprising factors as the columns (response pattern matrix). Internally, computations are performed on the Burt matrix (B), which is obtained from the indicator matrix (Z).
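The step from a response pattern matrix to the indicator matrix Z and the Burt matrix B = Z'Z can be sketched in base R. The small data frame and the helper `indicator` below are hypothetical and only illustrate the construction; mjca() builds these matrices internally.

```r
# Hypothetical toy data: 4 respondents, Q = 2 categorical variables.
df <- data.frame(A = factor(c("a1", "a2", "a1", "a3")),
                 B = factor(c("b1", "b1", "b2", "b2")))

# Hypothetical helper: dummy-code one factor (one column per level).
indicator <- function(f) {
  z <- outer(as.integer(f), seq_along(levels(f)), "==") * 1
  colnames(z) <- levels(f)
  z
}

# Indicator matrix Z: the dummy-coded variables side by side.
Z <- do.call(cbind, unname(lapply(df, indicator)))

# Burt matrix B: all two-way cross-tabulations, with the category
# frequencies of each variable on the diagonal blocks.
B <- crossprod(Z)
print(B)
```

Each row of Z sums to Q, so the entries of B sum to n·Q² (here 4·2² = 16), and the diagonal blocks of B are diagonal matrices, as in the wg93 Burt matrix shown earlier.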
slide-26
SLIDE 26

Multiple and joint correspondence analysis

An example: A multiple correspondence analysis on the wg93 dataset (i.e. four questions on attitude towards science with responses on a five-point scale):

> mjca(wg93[,1:4])

 Eigenvalues:
           1        2       3        4       5        6
Value      0.076455 0.05822 0.009197 0.00567 0.001172 7e-06
Percentage 44.91%   34.2%   5.4%     3.33%   0.69%    0%

 Columns:
               A1       A2        A3        A4        A5        B1       B2       B3
Mass     0.034156 0.092423  0.058553  0.051091  0.013777  0.020379 0.049943 0.058840
ChiDist  1.343394 0.676433  0.947274  1.049164  2.214898  1.856041 1.034203 0.933288
Inertia  0.061642 0.042289  0.052542  0.056238  0.067588  0.070203 0.053417 0.051252
Dim. 1   1.836627 0.546240 -0.446797 -1.165903 -1.995217  2.924321 0.641516 0.346050
Dim. 2  -0.727459 0.284443  1.199439 -0.736782 -2.470026 -1.370078 0.666938 0.963918

               B4        B5        C1       C2        C3        C4        C5        D1
Mass     0.080654  0.040184  0.043628 0.090700  0.056544  0.044202  0.014925  0.017222
ChiDist  0.760011  1.294006  1.241063 0.688137  0.977789  1.148345  2.132827  1.915937
Inertia  0.046587  0.067286  0.067197 0.042950  0.054060  0.058289  0.067895  0.063217
Dim. 1  -0.714126 -1.353725  2.157782 0.246828 -0.618996 -1.348858 -1.467582  1.203782
Dim. 2   0.280071 -2.107677 -0.908553 0.591611  1.044412 -0.634647 -3.016588 -1.821975

...

slide-27
SLIDE 27

Multiple and joint correspondence analysis

As in simple CA, a more detailed output is given with the summary method:

> summary(mjca(wg93[,1:4]))

Principal inertias (eigenvalues):
 dim    value      %   cum%   scree plot
 1      0.076455  44.9  44.9  *************************
 2      0.058220  34.2  79.1  *******************
 3      0.009197   5.4  84.5  ***
 4      0.005670   3.3  87.8  **
 5      0.001172   0.7  88.5
 6      0.000007   0.0  88.5
        -------- -----
 Total: 0.170246

Columns:
    name   mass  qlt  inr    k=1  cor  ctr    k=2  cor  ctr
1 |   A1 |   34  963   55 |  508  860  115 | -176  103   18 |
2 |   A2 |   92  659   38 |  151  546   28 |   69  113    7 |
3 |   A3 |   59  929   47 | -124  143   12 |  289  786   84 |
4 |   A4 |   51  798   50 | -322  612   69 | -178  186   28 |
5 |   A5 |   14  799   60 | -552  369   55 | -596  430   84 |
6 |   B1 |   20  911   62 |  809  781  174 | -331  131   38 |

...

slide-28
SLIDE 28

Multiple and joint correspondence analysis

The different approaches to MCA are specified with the optional argument lambda:

# MCA based on the indicator matrix:
> mjca(wg93[,1:4], lambda = "indicator")
# MCA based on the Burt matrix:
> mjca(wg93[,1:4], lambda = "Burt")
# MCA based on the adjusted approach:
> mjca(wg93[,1:4], lambda = "adjusted")
# lambda = "adjusted" is the default, hence the following
# gives the same result:
> mjca(wg93[,1:4])
# Joint correspondence analysis:
> mjca(wg93[,1:4], lambda = "JCA")

slide-29
SLIDE 29

Multiple and joint correspondence analysis

As with simple CA, supplementary variables are specified with the option supcol. In mjca() only supplementary variables (i.e. columns) are considered. Columns 5 to 7 of the wg93 dataset contain additional demographic information (sex, age and education). These are included as supplementary variables as follows:

> mjca(wg93, supcol = 5:7)

slide-30
SLIDE 30

Multiple and joint correspondence analysis

The option subsetcol in mjca() refers to the column indexes of the subset categories (i.e. the levels of the variables). For example, excluding the middle categories in the analysis of the wg93 dataset is done as follows:

> si <- (1:20)[-seq(3,18,5)]
> si
 [1]  1  2  4  5  6  7  9 10 11 12 14 15 16 17 19 20
> mjca(wg93[,1:4], subsetcol = si)
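The subset index above relies on every variable having five levels. As a sketch of how the same index could be derived when the numbers of levels are given explicitly, the category positions can be computed from cumulative level counts. The names `n_levels`, `offsets` and `middle` are illustrative only and are not arguments of mjca().

```r
# Number of categories per variable (four questions, five-point scale).
n_levels <- c(A = 5, B = 5, C = 5, D = 5)

# Position of the first category of each variable, minus one.
offsets <- cumsum(c(0, n_levels[-length(n_levels)]))

# Column indexes of the middle (third) category of every variable.
middle <- offsets + 3

# Subset index: all categories except the middle ones.
si <- setdiff(seq_len(sum(n_levels)), middle)
print(si)
```

For the wg93 questions this gives the same sixteen indexes as (1:20)[-seq(3,18,5)]; with unequal numbers of levels only n_levels and the definition of the excluded positions would need to change.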

slide-31
SLIDE 31

Multiple and joint correspondence analysis

Both options, subsetcol and supcol, can be combined, i.e. supplementary variables can be included in a subset analysis:

> mjca(wg93, subsetcol = si, supcol = 5:7)

 Eigenvalues:
           1        2        3        4        5
Value      0.070422 0.034998 0.007176 0.000875 0.00044
Percentage 53.96%   26.81%   5.5%     0.67%    0.34%

 Columns:
               A1        A2        A4        A5       B1        B2        B4        B5
Mass     0.034156  0.092423  0.051091  0.013777 0.020379  0.049943  0.080654  0.040184
ChiDist  1.343394  0.676433  1.049164  2.214898 1.856041  1.034203  0.760011  1.294006
Inertia  0.061642  0.042289  0.056238  0.067588 0.070203  0.053417  0.046587  0.067286
Dim. 1   1.706316  0.544095 -1.307329 -2.435074 2.759360  0.850833 -0.569441 -1.710689
Dim. 2   1.275991 -0.343625  0.201719  2.794810 2.003836 -0.658112 -0.533170  2.100918

...
          sex1(*)  sex2(*)   age1(*)   age2(*)   age3(*)   age4(*)   age5(*)   age6(*)
Mass           NA       NA        NA        NA        NA        NA        NA        NA
ChiDist        NA       NA        NA        NA        NA        NA        NA        NA
Inertia        NA       NA        NA        NA        NA        NA        NA        NA
Dim. 1  -0.341876 0.328786 -0.405213 -0.243592 -0.033779 -0.030832  0.025808  0.666671
Dim. 2  -0.130770 0.125763 -0.319599  0.305108  0.075773 -0.016810 -0.190774 -0.146837

...

slide-32
SLIDE 32

Multiple and joint correspondence analysis

The plotting method gives the graphical representation of the result as a map:

> plot(mjca(wg93[,1:4]))

slide-33
SLIDE 33

Summary

  • The computation is done with two functions, ca() for simple CA and mjca() for multiple and joint CA.
  • The input data is a table of frequencies for simple CA and a response pattern matrix (i.e. a data frame with factors) for multiple and joint CA.
  • In mjca() the type of analysis is controlled by the option lambda.
  • Subsets and supplementary variables are specified with subsetcol and supcol (in simple CA also subsetrow and suprow).
  • Output (numerical and graphical) is managed by the corresponding methods (print, summary and plot).
  • All available options are listed in the manual / help files.

slide-34
SLIDE 34

The End

The package is available from the CARME-N website (Correspondence Analysis and Related Methods Network): http://www.carme-n.org

The package is currently at version 0.50. This version includes a major revision of the mjca part, where all computations have been rewritten to follow a unified approach. The next update will focus on the graphical output.

Feedback and suggestions are highly welcome: michael@upf.edu ; onenadi@uni-goettingen.de