
Correspondence Analysis of Surveys with Conditioned and Multiple Response Questions

Amaya Zárraga and Beatriz Goitisolo, Department of Econometrics and Statistics, University of the Basque Country, Spain


Contents

1 Introduction: Surveys with closed questions with a finite number of response categories
2 How to analyze surveys
3 Possible solution: Creation of the CDT
  3.1 Effects of forcing the creation of a CDT
4 Another possible solution: CA of the PDT
  4.1 Problems resulting from the application of CA to the PDT: effect on distances
5 Suggested approach: CA of the PDT with a modified marginal
  5.1 Computation of axes
6 Illustrative example
7 Conclusions


1. Introduction: Surveys with closed questions with a finite number of response categories

  • 1. Multiple Choice Questions: individuals choose one and only one response category
  • Gender

– Male – Female

  • Have you ever taken a course on computers?

– Yes, in the last year – Yes, more than a year ago – No, never

  • Use of computers every day?

– Yes – No

  • 2. Multiple Response Questions: individuals can choose more than one category
  • Have you ever, even once, used the following?

– Tobacco – Alcohol – Marijuana – Cocaine – Crack – Heroin – Hallucinogens – Inhalants – Pain Relievers – Tranquilizers – Stimulants – Sedatives

  • 3. Conditioned Response Questions: individuals must answer a question or not, depending on their answer to a previous one.
  • Use of computers every day?

– Yes – No (go to question 16)

  • Purpose of computer use: Leisure

– Yes – No

  • Purpose of computer use: Music

– Yes – No

  • Purpose of computer use: Games

– Yes – No

  • 4. Conditioned Multiple Response Questions:
  • Is the number of children you have the desired one?

– Yes (go to question 26) – No

  • Which of the following are the reasons for this discrepancy?

– Desire to continue studying – Health problems – It implies a loss of freedom . . .


2. How to analyze surveys

⇒ The study and visualization of the relationships among response categories

  • 1. Multiple Choice:

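Classical analysis: MCA ⇒ create the Complete Disjunctive Table (CDT), coding 1 for the chosen response category and 0 for the categories not chosen ⇒ create Burt's table.

Layout of the CDT: one row per individual i = 1, …, n and one column per response category, grouped by question (Gender: M, F; Course: < 1, > 1, No; …). Each row contains exactly one 1 per question, so every row total is Q, every question block sums to n and the grand total is nQ:

$$z_{ij} = \begin{cases} 1 & \text{for 1 value} \\ 0 & \text{for } J_q - 1 \text{ values} \end{cases} \qquad \forall q \in Q$$

$$z_i^q = 1 \;\; \forall q \in Q,\ \forall i \in I \qquad z^q = n \;\; \forall q \in Q \qquad z_{i.} = Q \;\; \forall i \in I \qquad z = nQ$$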
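The coding step and its margins can be checked with a minimal NumPy sketch. It assumes that each individual's answers are stored as one 0-based category index per question; the helper name `complete_disjunctive_table` and the data are hypothetical. Burt's table is formed as $Z^T Z$, the matrix of all two-way cross-tabulations of the questions.

```python
import numpy as np

def complete_disjunctive_table(answers, n_categories):
    """Build the CDT Z (n x J) from single-choice answers.

    answers[i][q] is the 0-based index of the one category chosen by
    individual i in question q; n_categories[q] is J_q.
    """
    answers = np.asarray(answers)
    n, Q = answers.shape
    # index of the first column of each question block
    offsets = np.concatenate(([0], np.cumsum(n_categories)[:-1]))
    Z = np.zeros((n, sum(n_categories)), dtype=int)
    for q in range(Q):
        Z[np.arange(n), offsets[q] + answers[:, q]] = 1
    return Z, offsets

# Hypothetical answers: Gender (M, F), Course (<1, >1, No), daily use (Yes, No)
answers = [[0, 0, 0],
           [1, 1, 0],
           [0, 2, 1],
           [1, 2, 1]]
Jq = [2, 3, 2]
Z, offsets = complete_disjunctive_table(answers, Jq)
n, Q = Z.shape[0], len(Jq)

assert (Z.sum(axis=1) == Q).all()                  # z_i. = Q for every i
for q, start in enumerate(offsets):
    assert Z[:, start:start + Jq[q]].sum() == n    # z^q = n for every q
assert Z.sum() == n * Q                            # z = nQ

B = Z.T @ Z   # Burt's table
```

The diagonal blocks of `B` are diagonal matrices holding the category counts, which is why MCA on the CDT and on Burt's table share the same category axes.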

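  • 2. Multiple choice, multiple response and conditioned questions together:

Example layout: one row per individual i; columns grouped by question: Gender (M, F), Course (< 1, > 1, No), Drugs (Tobacco, …, Sedatives), Computer (Y, N), Purpose (Music, Games, …). The row totals z_{i.} and the question totals z^q, z^{q′} are no longer fixed.

$$z_{ij} = \begin{cases} 0 \text{ for all } J_q \text{ values, for some } i \text{ and some } q & \text{(conditioned questions)} \\ 1 \text{ for more than one of the } J_q \text{ values, for some } i \text{ and some } q & \text{(multiple response questions)} \end{cases}$$

$$z_i^q \neq 1 \text{ for some } i \text{ and some } q \qquad z^q \neq n \text{ for some } q \qquad z_{i.} \neq Q \text{ for some } i \qquad z \neq nQ$$

⇒ Partial Disjunctive Table (PDT)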
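The PDT margins listed above can be checked on a small table. In this sketch the column layout mirrors the slide, but the four individuals and their answers are hypothetical.

```python
import numpy as np

# Hypothetical PDT following the slide's layout (one row per individual):
# Gender: M, F | Drugs (MRQ): Tobacco, Alcohol, Marijuana |
# Computer: Y, N | Purpose (MRQ, only asked if Computer = Y): Music, Games
blocks = {"Gender": [0, 1], "Drugs": [2, 3, 4],
          "Computer": [5, 6], "Purpose": [7, 8]}
Z = np.array([
    [1, 0, 1, 1, 0, 1, 0, 1, 1],   # two drugs, two purposes
    [0, 1, 0, 0, 0, 1, 0, 0, 1],   # no drug chosen
    [1, 0, 0, 1, 0, 0, 1, 0, 0],   # Computer = N, so Purpose is skipped
    [0, 1, 1, 1, 1, 1, 0, 1, 0],
])
n, Q = Z.shape[0], len(blocks)

z_iq = {q: Z[:, cols].sum(axis=1) for q, cols in blocks.items()}  # z_i^q
z_q = {q: Z[:, cols].sum() for q, cols in blocks.items()}           # z^q
row_totals = Z.sum(axis=1)                                          # z_i.

print(row_totals)        # [6 3 3 6]: z_i. is no longer constant
print(z_iq["Drugs"])     # [2 0 1 3]: z_i^q differs from 1
print(Z.sum(), n * Q)    # 18 16: z differs from nQ
```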


3. Possible Solution: Creation of the CDT

⇒ Advantage: MCA can be applied. ⇒ For each response category of a multiple response question (MRQ), a new category that negates it (a fictitious or dummy category, D) has to be created. ⇒ m original categories ⇒ m questions ⇒ 2m final categories

Example layout: under Drugs, each original category (Tobacco, …, Sedatives) becomes a question with two columns, Y (chosen) and D (dummy, not chosen); each individual i = 1, 2, … has exactly one 1 in each pair.

⇒ For questions conditioned by a previous one, a new category indicating Not Required to Answer (NRA) is created for each question. ⇒ For conditioned MRQ, both types of artificial category, D and NRA, have to be created for each original category. ⇒ m original categories ⇒ m questions ⇒ 3m final categories

Example layout: Gender (M, F), …, Children (Yes, No), followed by one triple of columns (Yes, D, NRA) for each reason, e.g. C. Studying and C. Health.
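The D/NRA expansion for one conditioned MRQ can be sketched as follows, assuming the PDT columns of the question and a boolean "had to answer" flag derived from the conditioning question are already available. `expand_conditioned_mrq` is a hypothetical helper, not part of any library.

```python
import numpy as np

def expand_conditioned_mrq(chosen, required):
    """Expand m original 0/1 columns of a conditioned MRQ into 3m CDT columns.

    chosen:   (n, m) 0/1 array, 1 if individual i chose original category k
    required: (n,) booleans, True if i had to answer the question
    Returns (n, 3m) columns ordered Yes, D, NRA for each original category.
    """
    chosen = np.asarray(chosen, dtype=int)
    req = np.asarray(required, dtype=bool)[:, None]            # shape (n, 1)
    yes = chosen * req                                         # original category chosen
    dummy = (1 - chosen) * req                                 # D: asked but not chosen
    nra = np.repeat(~req, chosen.shape[1], axis=1).astype(int)  # NRA: not asked
    # stack to (n, m, 3) and flatten so each category gives Yes, D, NRA
    return np.stack([yes, dummy, nra], axis=2).reshape(chosen.shape[0], -1)

# Hypothetical answers to the reasons question (C. Studying, C. Health),
# asked only when the number of children is not the desired one.
required = [True, True, False]
chosen = [[1, 0],
          [1, 1],
          [0, 0]]
cdt_block = expand_conditioned_mrq(chosen, required)
print(cdt_block.shape)   # (3, 6): m = 2 original categories -> 3m columns
```

Every Yes/D/NRA triple contains exactly one 1 per individual, which is what restores the complete disjunctive structure (and what inflates the number of categories).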


3.1. Effects of forcing the creation of a CDT

  • Increase in the number of response categories ⇒

– Increase in the variability (inertia, in CA terms) – All the categories (original + fictitious) contribute to the creation of the factorial axes – Planes crowded with points (complicating the interpretation)

  • Dummy categories may genuinely reflect the negation of the original category, but they can also hide an unwillingness to answer and/or ignorance of the response, whereas the aim is to study the relationships among the original categories.
  • In the "pick k/m" case (k < m), (m − k) dummy categories are created that only represent the restriction of choosing k among the original m.
  • Dummy categories may have similar response patterns and can even create the first factorial axes (as happens with conditioned questions).


Complete Disjunctive Table with Not Required to Answer (NRA) categories

  • 1. Advantage: MCA can be applied
  • 2. Disadvantage: the NRA categories can create the first axes

[Figure: Analysis of the CDT. Factorial plane: Factor 1 (86.13%) × Factor 2 (10.47%). Categories plotted: Computer-Yes/No; CLeisure, CSchool, COther, CPHome, CPFriends, CPSchool, CPPublic, CPcibercafe (each -Yes/-No/-NRA); Internet-Yes/No; ILeisure, ISchool, IOther, IPHome, IPFriends, IPSchool, IPPublic, IPCibercafe (each -Yes/-No/-NRA); Mobile-Yes/No. Percentages annotated on the plot: 74.37% (Factor 1), 73.18% (Factor 2).]

  • Survey on Equipment and Use of Information and Communication Technologies in the Home (Spanish Institute of Statistics, 2007)
  • Block: Use of computers and the Internet by children (aged 10 to 15) (18 questions)

4. Another possible solution: CA of the PDT

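Frequencies and profiles

Relative and marginal frequencies:

$$p_{ij} = \frac{z_{ij}}{z}, \qquad p_{i.} = \sum_{j\in J} p_{ij} = \frac{z_{i.}}{z}, \qquad p_{.j} = \sum_{i\in I} p_{ij} = \frac{z_{.j}}{z}$$

Row profiles $i$, $i \in I$:

$$\frac{p_{ij}}{p_{i.}} = \frac{z_{ij}}{z_{i.}} \quad \forall j \in J \;\Rightarrow\; N(I) \subset \mathbb{R}^J$$

Column profiles $j$, $j \in J$:

$$\frac{p_{ij}}{p_{.j}} = \frac{z_{ij}}{z_{.j}} \quad \forall i \in I \;\Rightarrow\; N(J) \subset \mathbb{R}^n$$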
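These definitions translate directly into array operations. The sketch below assumes a small hypothetical PDT in which every row and column has at least one 1; in a real PDT an individual who chose nothing would give $p_{i.} = 0$ and an undefined row profile.

```python
import numpy as np

def frequencies_and_profiles(Z):
    """Relative and marginal frequencies and the row/column profiles of Z.

    Assumes z_i. > 0 and z_.j > 0 for every row and column.
    """
    Z = np.asarray(Z, dtype=float)
    z = Z.sum()
    P = Z / z                         # p_ij = z_ij / z
    p_row = P.sum(axis=1)             # p_i. = z_i. / z
    p_col = P.sum(axis=0)             # p_.j = z_.j / z
    row_prof = P / p_row[:, None]     # p_ij / p_i. = z_ij / z_i.  (points of R^J)
    col_prof = P / p_col[None, :]     # p_ij / p_.j = z_ij / z_.j  (points of R^n)
    return P, p_row, p_col, row_prof, col_prof

# Hypothetical PDT: 3 individuals, 4 categories, unequal row totals 3, 2, 1
Z = [[1, 0, 1, 1],
     [0, 1, 1, 0],
     [1, 0, 0, 0]]
P, p_row, p_col, row_prof, col_prof = frequencies_and_profiles(Z)
# p_row holds 3/6, 2/6, 1/6: the row marginal now differs between individuals
```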


4.1. Problems Resulting from the Application of CA to the PDT: Effect on Distances

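In CA, the similarity between any pair of row profiles and between any pair of column profiles is measured by the χ² distance. The χ² distance between two row profiles i and i′:

$$d^2(i,i') = \sum_{j\in J}\frac{1}{p_{.j}}\left(\frac{p_{ij}}{p_{i.}} - \frac{p_{i'j}}{p_{i'.}}\right)^2 = \sum_{j\in J}\frac{z}{z_{.j}}\left(\frac{z_{ij}}{z_{i.}} - \frac{z_{i'j}}{z_{i'.}}\right)^2$$

In the CDT, where $z_{i.} = Q$ for all i and $z = nQ$, for two individuals who both chose category CY:

$$d^2(i=1,i'=2) = \frac{nQ}{z_{.M}}\left(\frac{1}{Q}-\frac{0}{Q}\right)^2 + \frac{nQ}{z_{.F}}\left(\frac{0}{Q}-\frac{1}{Q}\right)^2 + \cdots + \underbrace{\frac{nQ}{z_{.CY}}\left(\frac{1}{Q}-\frac{1}{Q}\right)^2}_{=0} + \cdots$$

In the PDT, $z_{1.} \neq z_{2.}$:

$$d^2(i=1,i'=2) = \frac{z}{z_{.M}}\left(\frac{1}{z_{1.}}-\frac{0}{z_{2.}}\right)^2 + \frac{z}{z_{.F}}\left(\frac{0}{z_{1.}}-\frac{1}{z_{2.}}\right)^2 + \cdots + \underbrace{\frac{z}{z_{.CY}}\left(\frac{1}{z_{1.}}-\frac{1}{z_{2.}}\right)^2}_{\neq 0} + \cdots$$

so a category chosen by both individuals still increases the distance between them.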
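The effect on row distances can be reproduced numerically. The sketch below uses hypothetical tables and a hypothetical helper `chi2_row_terms` that returns the per-category terms of $d^2(i,i')$, so the contribution of the shared category CY can be inspected directly.

```python
import numpy as np

def chi2_row_terms(Z, i, k):
    """Per-category terms of the chi-square distance d^2(i, k) between rows."""
    Z = np.asarray(Z, dtype=float)
    z, z_row, z_col = Z.sum(), Z.sum(axis=1), Z.sum(axis=0)
    return (z / z_col) * (Z[i] / z_row[i] - Z[k] / z_row[k]) ** 2

# Hypothetical CDT: Gender (M, F), Computer (CY, CN); every z_i. = Q = 2
Z_cdt = [[1, 0, 1, 0],
         [0, 1, 1, 0],
         [1, 0, 0, 1]]
# Hypothetical PDT: M, F, CY, Tobacco, Alcohol; row totals 4, 2, 2
Z_pdt = [[1, 0, 1, 1, 1],
         [0, 1, 1, 0, 0],
         [0, 1, 0, 1, 0]]

cdt_terms = chi2_row_terms(Z_cdt, 0, 1)
pdt_terms = chi2_row_terms(Z_pdt, 0, 1)
print(cdt_terms[2])   # 0.0: CY, chosen by both, adds nothing in the CDT
print(pdt_terms[2])   # 0.25: in the PDT it adds (z/z_.CY)(1/4 - 1/2)^2
```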


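The χ² distance between two column profiles j and j′:

$$d^2(j,j') = \sum_{i\in I}\frac{1}{p_{i.}}\left(\frac{p_{ij}}{p_{.j}} - \frac{p_{ij'}}{p_{.j'}}\right)^2 = \sum_{i\in I}\frac{z}{z_{i.}}\left(\frac{z_{ij}}{z_{.j}} - \frac{z_{ij'}}{z_{.j'}}\right)^2$$

In the CDT, $z_{i.} = Q \;\; \forall i \in I$:

$$d^2(j,j') = \sum_{i\in I}\frac{nQ}{Q}\left(\frac{z_{ij}}{z_{.j}} - \frac{z_{ij'}}{z_{.j'}}\right)^2 = \sum_{i\in I} n\left(\frac{z_{ij}}{z_{.j}} - \frac{z_{ij'}}{z_{.j'}}\right)^2$$

In the PDT, $z_{i.} \neq z_{i'.}$:

$$d^2(j,j') = \sum_{i\in I}\frac{z}{z_{i.}}\left(\frac{z_{ij}}{z_{.j}} - \frac{z_{ij'}}{z_{.j'}}\right)^2$$

so each individual is weighted by $z/z_{i.}$: those who choose fewer categories weigh more in the distances between categories.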
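The unequal weighting of individuals is easy to see in a small sketch. `chi2_col_distance` is a hypothetical helper and the tables are invented; it returns the column distance together with the weights $1/p_{i.} = z/z_{i.}$.

```python
import numpy as np

def chi2_col_distance(Z, j, k):
    """Chi-square distance d^2(j, k) between column profiles, plus the row weights."""
    Z = np.asarray(Z, dtype=float)
    z, z_row, z_col = Z.sum(), Z.sum(axis=1), Z.sum(axis=0)
    weights = z / z_row                               # 1 / p_i. = z / z_i.
    diff = Z[:, j] / z_col[j] - Z[:, k] / z_col[k]    # difference of the two profiles
    return float((weights * diff ** 2).sum()), weights

Z_cdt = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]            # z_i. = Q = 2
Z_pdt = [[1, 0, 1, 1, 1], [0, 1, 1, 0, 0], [0, 1, 0, 1, 0]]   # z_i. = 4, 2, 2

_, w_cdt = chi2_col_distance(Z_cdt, 0, 2)
_, w_pdt = chi2_col_distance(Z_pdt, 0, 2)
print(w_cdt)   # [3. 3. 3.]: nQ/Q = n for every individual
print(w_pdt)   # [2. 4. 4.]: individuals choosing fewer categories weigh more
```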


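The χ² distance between column profile j and the average profile $G_J$:

$$d^2(j, G_J) = \sum_{i}\frac{1}{p_{i.}}\left(\frac{p_{ij}}{p_{.j}} - p_{i.}\right)^2$$

For a category j chosen by all individuals, $\dfrac{p_{ij}}{p_{.j}} = \dfrac{1}{n} \;\; \forall i \in I$.

In the CDT, $p_{i.} = \dfrac{1}{n} \;\; \forall i$:

$$d^2(j, G_J) = \sum_{i} n\left(\frac{1}{n} - \frac{1}{n}\right)^2 = 0$$

In the PDT, $p_{i.} = \dfrac{z_{i.}}{z}$:

$$d^2(j, G_J) = \sum_{i}\frac{z}{z_{i.}}\left(\frac{1}{n} - \frac{z_{i.}}{z}\right)^2 \neq 0$$

whenever the row totals $z_{i.}$ are not all equal, so such a category does not lie at the centre of gravity and influences the analysis.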
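A quick numerical check of this result, using a hypothetical helper `distance_to_centroid` and two invented tables in which the first category was chosen by every individual:

```python
import numpy as np

def distance_to_centroid(Z, j):
    """d^2(j, G_J): chi-square distance of column profile j to the average profile."""
    Z = np.asarray(Z, dtype=float)
    p_row = Z.sum(axis=1) / Z.sum()      # p_i., the average column profile
    profile = Z[:, j] / Z[:, j].sum()    # p_ij / p_.j
    return float(((profile - p_row) ** 2 / p_row).sum())

# Column 0 is chosen by all three individuals in both hypothetical tables
Z_cdt = [[1, 1, 0], [1, 0, 1], [1, 1, 0]]   # z_i. = 2 for everyone
Z_pdt = [[1, 1, 0], [1, 0, 0], [1, 1, 1]]   # z_i. = 2, 1, 3
print(distance_to_centroid(Z_cdt, 0))   # 0.0
print(distance_to_centroid(Z_pdt, 0))   # about 0.222 (= 2/9), not 0
```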


5. Suggested Approach: CA of PDT with a modified marginal

The marginal of the PDT, $p_{i.} = z_{i.}/z$, is replaced by $r_{i.} = 1/n$.

  • It is the natural centre of gravity of the categories of multiple choice questions and of non-conditioned questions in the survey
  • It gives equal weight or importance to all individuals
  • The distances between individuals are due only to differences between their response patterns:

$$d^2(i,i') = \sum_{j\in J}\frac{1}{p_{.j}}\left(\frac{p_{ij}}{r_{i.}} - \frac{p_{i'j}}{r_{i'.}}\right)^2 = \frac{n^2}{z}\sum_{j\in J}\frac{1}{z_{.j}}\left(z_{ij} - z_{i'j}\right)^2$$

$$d^2(i=1,i'=2) = \frac{n^2}{z}\left[\frac{1}{z_{.M}}(1-0)^2 + \frac{1}{z_{.F}}(0-1)^2 + \cdots + \underbrace{\frac{1}{z_{.CY}}(1-1)^2}_{=0} + \cdots\right]$$

  • In the distances between two categories, all individuals have the same importance, regardless of whether they answer all the questions or not:

$$d^2(j,j') = \sum_{i\in I}\frac{1}{r_{i.}}\left(\frac{p_{ij}}{p_{.j}} - \frac{p_{ij'}}{p_{.j'}}\right)^2 = \sum_{i\in I} n\left(\frac{z_{ij}}{z_{.j}} - \frac{z_{ij'}}{z_{.j'}}\right)^2$$

  • The categories chosen by all individuals, if this situation arises, do not influence the analysis: their profile, $p_{ij}/p_{.j} = 1/n$, coincides with the centre $r_{i.} = 1/n$
  • It is not necessary to introduce dummy categories in the analysis.
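The simplified distance formulas can be checked against the general ones. This is a minimal sketch with hypothetical helper names and invented tables; it computes $d^2(i,i')$ both ways, the column distance with constant weight $n$, and the distance of a category chosen by everybody to the imposed centre $1/n$.

```python
import numpy as np

def modified_margin_distances(Z, i, k, j, l):
    """Row and column chi-square distances with p_i. replaced by r_i. = 1/n.

    Returns d^2(i, k) from the general formula, d^2(i, k) from the
    simplified form n^2/z * sum_j (z_ij - z_kj)^2 / z_.j, and d^2(j, l).
    """
    Z = np.asarray(Z, dtype=float)
    n, z, z_col = Z.shape[0], Z.sum(), Z.sum(axis=0)
    P, p_col, r = Z / z, z_col / z, 1.0 / n
    general = float((((P[i] / r - P[k] / r) ** 2) / p_col).sum())
    simple = float(n ** 2 / z * (((Z[i] - Z[k]) ** 2) / z_col).sum())
    # column distance: every individual gets the same weight 1 / r_i. = n
    d_cols = float((n * (Z[:, j] / z_col[j] - Z[:, l] / z_col[l]) ** 2).sum())
    return general, simple, d_cols

def distance_to_imposed_centre(Z, j):
    """Distance of column profile j to the imposed centre r_i. = 1/n."""
    Z = np.asarray(Z, dtype=float)
    n = Z.shape[0]
    profile = Z[:, j] / Z[:, j].sum()
    return float((n * (profile - 1.0 / n) ** 2).sum())

# Hypothetical PDT (M, F, CY, Tobacco, Alcohol) with row totals 4, 2, 2
Z_pdt = [[1, 0, 1, 1, 1],
         [0, 1, 1, 0, 0],
         [0, 1, 0, 1, 0]]
general, simple, d_cols = modified_margin_distances(Z_pdt, 0, 1, 0, 2)
print(np.isclose(general, simple))   # True; the shared CY term is (1 - 1)^2 = 0

# Hypothetical PDT whose first category was chosen by everybody
Z_all = [[1, 1, 0], [1, 0, 0], [1, 1, 1]]
print(distance_to_imposed_centre(Z_all, 0))   # 0.0
```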

5.1. Computation of Axes

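CA of a CDT (SVD of S):

$$S = D_r^{-1/2}\left(P - D_r\mathbf{1}\mathbf{1}^T D_c\right)D_c^{-1/2}$$

where $P = \{p_{ij} = z_{ij}/nQ\}$, $D_r = \mathrm{diag}\{p_{i.} = 1/n\}$, $D_c = \mathrm{diag}\{p_{.j} = z_{.j}/nQ\}$.

CA of a PDT with the imposed marginal (SVD of S):

$$S = D_r^{-1/2}\left(P - D_r\mathbf{1}\mathbf{1}^T D_c\right)D_c^{-1/2} = \sqrt{n}\left(\frac{Z^*}{z} - \frac{1}{n}\mathbf{1}\mathbf{1}^T D_c\right)D_c^{-1/2}$$

where $P = \{p_{ij} = z_{ij}/z\}$, $D_r = \mathrm{diag}\{r_{i.} = 1/n\}$, $D_c = \mathrm{diag}\{z_{.j}/z\}$.

$$S = U\Sigma V^T \quad \text{where} \quad U^TU = V^TV = I, \qquad \Sigma = \mathrm{diag}\{\sigma_1, \sigma_2, \ldots, \sigma_S\}$$

$\sigma_s^2 = \lambda_s$ are called the principal inertias, and $\sum_s \lambda_s$ = total inertia.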
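The matrix $S$ with the imposed marginal can be computed and decomposed with NumPy's SVD. This is a sketch under stated assumptions: `ca_imposed_margin` and `standard_ca_matrix` are hypothetical helpers, and the second one (classical CA with the observed row marginal) is included only to show that, on a CDT, both analyses coincide because $p_{i.} = 1/n$ there.

```python
import numpy as np

def ca_imposed_margin(Z):
    """SVD of S = D_r^{-1/2} (P - D_r 1 1^T D_c) D_c^{-1/2} with D_r = diag(1/n)."""
    Z = np.asarray(Z, dtype=float)
    n, z = Z.shape[0], Z.sum()
    P = Z / z                                 # p_ij = z_ij / z
    c = P.sum(axis=0)                         # D_c = diag(z_.j / z)
    r = np.full(n, 1.0 / n)                   # imposed marginal r_i. = 1/n
    S = (P - np.outer(r, c)) / np.sqrt(r)[:, None] / np.sqrt(c)[None, :]
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    return S, U, sigma ** 2, Vt.T             # lambda_s = sigma_s^2

def standard_ca_matrix(Z):
    """Classical CA matrix S, using the observed row marginal p_i. = z_i./z."""
    Z = np.asarray(Z, dtype=float)
    P = Z / Z.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    return (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# Hypothetical PDT with unequal row totals
Z_pdt = [[1, 0, 1, 1, 1], [0, 1, 1, 0, 0], [0, 1, 0, 1, 0], [1, 0, 0, 0, 1]]
S, U, lam, V = ca_imposed_margin(Z_pdt)
print(np.isclose(lam.sum(), (S ** 2).sum()))   # True: sum of lambda_s = total inertia

# Hypothetical CDT: every z_i. = Q, so p_i. = 1/n and both analyses coincide
Z_cdt = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
S_cdt, _, lam_cdt, _ = ca_imposed_margin(Z_cdt)
print(np.allclose(S_cdt, standard_ca_matrix(Z_cdt)))   # True
```

Because the imposed row masses sum to one and the column masses are the observed ones, the columns of $S$ are still centred ($\mathbf{1}^T S = 0$); only the rows lose their usual centring.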


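Projections of rows, $f_s$, and columns, $g_s$, on the $s$th axis are calculated as principal coordinates:

$$f_s = \sqrt{n}\,u_s\sqrt{\lambda_s}, \qquad g_s = D_c^{-1/2}\,v_s\sqrt{\lambda_s}$$

Transition relationships:

$$f_s = n\left(\frac{Z^*}{z} - \frac{1}{n}\mathbf{1}\mathbf{1}^T D_c\right)g_s\,\frac{1}{\sqrt{\lambda_s}}, \quad s \in S \qquad (1)$$

$$g_s = D_c^{-1}\left(\frac{Z^*}{z} - \frac{1}{n}\mathbf{1}\mathbf{1}^T D_c\right)^T f_s\,\frac{1}{\sqrt{\lambda_s}}, \quad s \in S \qquad (2)$$

or, for each row and column:

$$f_s(i) = \frac{1}{\sqrt{\lambda_s}}\left(n\sum_{j\in J}\frac{z_{ij}}{z}\,g_s(j) - \sum_{j\in J}\frac{z_{.j}}{z}\,g_s(j)\right) \qquad (3)$$

$$g_s(j) = \frac{1}{\sqrt{\lambda_s}}\left(\sum_{i\in I}\frac{z_{ij}}{z_{.j}}\,f_s(i) - \sum_{i\in I}\frac{1}{n}\,f_s(i)\right) \qquad (4)$$

$$f_s^*(i) = \frac{1}{\sqrt{\lambda_s}}\,n\sum_{j\in J}\frac{z_{ij}}{z}\,g_s(j) \qquad (5)$$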
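The principal coordinates and transition relationships (3) to (5) can be verified numerically. This minimal sketch assumes a hypothetical PDT and a hypothetical helper `principal_coordinates`; it rebuilds $f_s$ from $g_s$ and $g_s$ from $f_s$ and checks that both round trips agree with the SVD.

```python
import numpy as np

def principal_coordinates(Z, n_axes=2):
    """Row/column principal coordinates f_s, g_s of the CA with r_i. = 1/n."""
    Z = np.asarray(Z, dtype=float)
    n, z = Z.shape[0], Z.sum()
    P, c = Z / z, Z.sum(axis=0) / z
    # S = sqrt(n) (Z/z - (1/n) 1 1^T D_c) D_c^{-1/2}
    S = np.sqrt(n) * (P - c[None, :] / n) / np.sqrt(c)[None, :]
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    lam = sigma[:n_axes] ** 2
    F = np.sqrt(n) * U[:, :n_axes] * np.sqrt(lam)             # f_s = sqrt(n) u_s sqrt(lambda_s)
    G = Vt[:n_axes].T / np.sqrt(c)[:, None] * np.sqrt(lam)    # g_s = D_c^{-1/2} v_s sqrt(lambda_s)
    return F, G, lam

# Hypothetical PDT with unequal row totals
Z = np.array([[1, 0, 1, 1, 1],
              [0, 1, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [1, 0, 0, 0, 1],
              [0, 1, 1, 1, 0],
              [1, 0, 1, 0, 0]], dtype=float)
F, G, lam = principal_coordinates(Z)
n, z, c = Z.shape[0], Z.sum(), Z.sum(axis=0) / Z.sum()

# (3): rows from columns
F3 = (n * (Z / z) @ G - c @ G) / np.sqrt(lam)
# (4): columns from rows
G4 = ((Z / Z.sum(axis=0)).T @ F - F.mean(axis=0)) / np.sqrt(lam)
# (5): drops the second term of (3), i.e. shifts every row by the same amount
F_star = n * (Z / z) @ G / np.sqrt(lam)

print(np.allclose(F3, F), np.allclose(G4, G))   # True True
```

Since $\sum_i f_s(i) = 0$ here, the second term of (4) vanishes, whereas $\sum_j (z_{.j}/z)\,g_s(j)$ in (3) is generally non-zero under the imposed marginal, which is why $f_s^*$ in (5) differs from $f_s$ by a constant per axis.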


6. Illustrative Example

  • Survey on Households and the Environment. Year 2008
  • INE (Spanish Institute of Statistics)
  • Sample size: 26689 people aged 16 and over
  • Question: Would you agree with the following measures for the protection of the environment?
  • Making separation of household waste obligatory
  • Restricting abusive consumption of water
  • Establishing a tax on fuel
  • Establishing restrictions on private transport
  • Establishing a tax on tourism
  • Installing renewable energy parks (wind farms, solar power) in your town in spite of the effect on the landscape
  • Paying more for alternative energy
  • Reducing traffic noise

Labels used in the analysis: Oblige separate waste, Restrict water, Tax fuel, Restrict private transport, Tax tourism, Install energy park, Pay alternative energy, Reduce traffic noise.
Original categories ⇒ PDT; original categories + fictitious categories (suffix -F) ⇒ CDT

[Figure: Analysis of the CDT. Factorial plane: Factor 1 (32.453%) × Factor 2 (12.581%). Categories plotted: the eight original categories (Oblige_separate_waste, Restrict_water, Tax_fuel, Restrict_private_transport, Tax_tourism, Install_energy_park, Pay_alternative_energy, Reduce_traffic_noise) and their fictitious counterparts (suffix -F), labelled as the groups "Originals" and "Fictitious". Percentages annotated on the plot: 60.56% (Factor 1), 67.74% (Factor 2).]

[Figure: Analysis of the CDT. Factorial plane: Factor 3 (11.307%) × Factor 4 (9.812%). Categories plotted: the eight original categories and their fictitious counterparts (suffix -F).]


[Figure: Analysis of the PDT. Factorial plane: Factor 1 (36.113%) × Factor 2 (16.837%). Categories plotted: the eight original categories, with groups labelled "Pay" and "No pay".]

[Figure: Analysis of the PDT. Factorial plane: Factor 3 (12.572%) × Factor 4 (10.695%). Categories plotted: the eight original categories.]


7. Conclusions

Surveys with multiple choice, multiple response and/or conditioned questions
⇒ Partial Disjunctive Table (PDT)
⇒ Problems with direct CA of the PDT
⇒ Complete Disjunctive Table (PDT + dummies)
⇒ Problems with CA of this CDT
⇒ Solution: CA of the PDT with an imposed marginal
