
Correspondence Analysis of Surveys with Conditioned and Multiple Response Questions

Amaya Zárraga and Beatriz Goitisolo. Department of Econometrics and Statistics, University of the Basque Country, Spain.


  1. Correspondence Analysis of Surveys with Conditioned and Multiple Response Questions
     Amaya Zárraga and Beatriz Goitisolo
     Department of Econometrics and Statistics. University of the Basque Country. Spain

  2. Contents
     1 Introduction: Surveys with closed questions with a finite number of response categories (slide 3)
     2 How to analyze surveys (slide 7)
     3 Possible Solution: Creation of the CDT (slide 9)
       3.1 Effects of forcing the creation of a CDT (slide 10)
     4 Another possible solution: CA of the PDT (slide 12)
       4.1 Problems Resulting from the Application of CA to the PDT: Effect on Distances (slide 13)
     5 Suggested Approach: CA of PDT with a modified marginal (slide 16)
       5.1 Computation of Axes (slide 17)
     6 Illustrative Example (slide 19)
     7 Conclusions (slide 24)

  3. 1. Introduction: Surveys with closed questions with a finite number of response categories
     1. Multiple Choice Questions: individuals choose one and only one response category
        • Gender
          – Male
          – Female
        • Have you ever taken a course on computers?
          – Yes, in the last year
          – Yes, more than a year ago
          – No, never
        • Use of computers every day?
          – Yes
          – No

  4. 2. Multiple Response Questions: individuals can choose more than one category
        • Have you ever, even once, used the following?
          – Tobacco
          – Alcohol
          – Marijuana
          – Cocaine
          – Crack
          – Heroin
          – Hallucinogens
          – Inhalants
          – Pain Relievers
          – Tranquilizers
          – Stimulants
          – Sedatives

  5. 3. Conditioned Response Questions: whether individuals must answer a question depends on their answer to a previous one.
        • Use of computers every day?
          – Yes
          – No (go to question 16)
        • Purpose of computer use: Leisure
          – Yes
          – No
        • Purpose of computer use: Music
          – Yes
          – No
        • Purpose of computer use: Games
          – Yes
          – No

  6. 4. Conditioned Multiple Response Questions:
        • Is the number of children you have the desired one?
          – Yes (go to question 26)
          – No
        • Which of the following are the reasons for this discrepancy?
          – Desire to continue studying
          – Health problems
          – It would mean a loss of freedom
          ...

  7. 2. How to analyze surveys
     ⇒ The study and visualization of the relationships among response categories
     1. Multiple Choice: Classical analysis: MCA
        ⇒ Create the Complete Disjunctive Table (CDT), coding 1 for the chosen category and 0 for the categories not chosen
        ⇒ Create Burt's Table

             Gender    Course            ...
        i    M    F    <1    >1    No    .    .    .    z_i.
        1    1    0    1     0     0     1    0    0    Q
        2    0    1    0     1     0     0    1    0    Q
        3    1    0    0     0     1     1    0    0    Q
        ...
        Column totals: the categories of each question sum to n; the grand total is nQ.

        Within each question $q$, exactly one category is coded 1:
        $$z_{ij} = \begin{cases} 1 & \text{for one value of } j \in J_q \\ 0 & \text{for the other } J_q - 1 \text{ values} \end{cases} \qquad \forall q \in Q$$
        $$z_i^q = 1 \quad \forall q \in Q,\ \forall i \in I \qquad z^q = n \quad \forall q \in Q$$
        $$z_{i.} = Q \quad \forall i \in I \qquad z = nQ$$
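A minimal sketch of the CDT coding for single-choice questions, using only the Gender and Course questions from the table. The function name `complete_disjunctive_table` and the input layout (one dict per individual) are assumptions made for illustration, not part of the original slides.

```python
def complete_disjunctive_table(responses, categories):
    """Code single-choice answers as a 0/1 table.

    responses:  list of dicts, one per individual (question -> chosen category)
    categories: dict (question -> ordered list of its categories)
    """
    table = []
    for answer in responses:
        row = []
        for question, cats in categories.items():
            # exactly one 1 per question, J_q - 1 zeros
            row.extend(1 if answer[question] == c else 0 for c in cats)
        table.append(row)
    return table


categories = {"Gender": ["M", "F"], "Course": ["<1", ">1", "No"]}
responses = [
    {"Gender": "M", "Course": "<1"},
    {"Gender": "F", "Course": ">1"},
    {"Gender": "M", "Course": "No"},
]

cdt = complete_disjunctive_table(responses, categories)
n, Q = len(responses), len(categories)
row_totals = [sum(row) for row in cdt]   # z_i. for every individual
grand_total = sum(row_totals)            # z
```

With this coding every row total equals Q and the grand total equals nQ, which is the property the CDT relies on.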

  8. 2. Multiple Choice, Multiple Response, Conditioned:

             Gender    Course            Drugs                        Computer    Purpose
        i    M    F    <1    >1    No    Tobacco   ...   Sedatives    Y     N     Music   ...   Games    z_i.
        1    1    0    1     0     0     1         ...   1            1     0     1       ...   0        ? = z_i.
        2    0    1    0     1     0     1         ...   0            0     1     0       ...   0        ? = z_i'.
        3    1    0    0     0     1     0         ...   1            0     1     0       ...   0        ?
        ...
        Column totals: the question totals z^q, z^q' and the grand total z are no longer fixed in advance.

        $$z_{ij} = \begin{cases} 0 & \text{for all } J_q \text{ values, for some } i \text{ and some conditioned question } q \\ 1 & \text{for possibly all } J_q \text{ values, for some } i \text{ and some multiple response question } q \end{cases}$$
        $$z_i^q \neq 1 \ \text{for some } i \text{ and some } q \qquad z^q \neq n \ \text{for some } q$$
        $$z_{i.} \neq Q \ \text{for some } i \qquad z \neq nQ$$
        ⇒ Partial Disjunctive Table (PDT)
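The following sketch checks the PDT margins on a toy table. The data, the reduced set of columns and the helper `block_sums` are hypothetical; they only mimic the structure of the table above (a single-choice question, a multiple response question, a filter question and a conditioned question).

```python
# Toy PDT (hypothetical data): column ranges grouped by question.
blocks = {
    "Gender":   (0, 2),   # M, F (single choice)
    "Drugs":    (2, 5),   # Tobacco, Alcohol, Sedatives (multiple response)
    "Computer": (5, 7),   # Yes, No (filter question)
    "Purpose":  (7, 9),   # Music, Games (asked only if Computer = Yes)
}
pdt = [
    [1, 0, 1, 1, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 1, 0, 0],
]


def block_sums(table, blocks):
    """Per-individual totals z_i^q over the categories of each question."""
    return [{q: sum(row[a:b]) for q, (a, b) in blocks.items()} for row in table]


sums = block_sums(pdt, blocks)
row_totals = [sum(row) for row in pdt]   # z_i.
n, Q = len(pdt), len(blocks)
grand_total = sum(row_totals)            # z
```

Here individual 1 has a total of 3 on the Drugs question and individual 2 has a total of 0 on Purpose, so the row totals differ across individuals and the grand total is not nQ.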

  9. 3. Possible Solution: Creation of the CDT
     ⇒ Advantage: MCA
     ⇒ For each response category of a multiple response question (MRQ), a new category negating it (a fictitious or dummy category, D) has to be created.
     ⇒ m original categories ⇒ m questions ⇒ 2m final categories

             Drugs
             Tobacco    ...    Sedatives
        i    Y    D            Y    D
        1    1    0            1    0
        2    1    0            0    1
        ...

     ⇒ For questions conditioned on a previous one, a new category indicating Not Required to Answer (NRA) is created for each question.
     ⇒ For conditioned MRQ, both types of artificial category, (D) and (NRA), have to be created for each original category.
     ⇒ m original categories ⇒ m questions ⇒ 3m final categories

             Gender    Children    C. Studying        C. Health          ...
        i    M    F    Yes   No    Yes   D    NRA     Yes   D    NRA
        1    1    0    0     1     1     0    0       1     0    0
        2    0    1    0     1     1     0    0       0     1    0
        ...
             1    0    1     0     0     0    1       0     0    1
             1    0    1     0     0     0    1       0     0    1
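As a rough illustration of this completion step, the sketch below codes each category of a conditioned multiple response question as a (Yes, D, NRA) triple. The helper `complete_item` and the record layout are assumptions made for the example; the three records reproduce the patterns of the second table.

```python
def complete_item(chosen, required=True):
    """Return the (Yes, D, NRA) coding for one original category."""
    if not required:
        return (0, 0, 1)               # individual was not asked the question
    return (1, 0, 0) if chosen else (0, 1, 0)


# Hypothetical records: reasons are asked only when the number of
# children is NOT the desired one.
records = [
    {"children_desired": False, "reasons": {"Studying": 1, "Health": 1}},
    {"children_desired": False, "reasons": {"Studying": 1, "Health": 0}},
    {"children_desired": True,  "reasons": {"Studying": 0, "Health": 0}},
]

coded = []
for rec in records:
    required = not rec["children_desired"]
    line = []
    for reason in ("Studying", "Health"):
        line.extend(complete_item(rec["reasons"][reason], required))
    coded.append(line)
```

Each original category becomes a three-category question whose codes sum to 1, which restores the CDT structure at the cost of tripling the number of categories.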

  10. 3.1. Effects of forcing the creation of a CDT
      • Increase in the number of response categories ⇒
        – Increase in the variability (inertia, in CA terms)
        – All the categories (original + fictitious) contribute to the creation of the factorial axes
        – Planes crowded with points (complicating the interpretation)
      • Dummy categories may genuinely correspond to the negation of the original category, but they can also hide an unwillingness to answer and/or ignorance of the response.
        Aim: study of the relationships among the original categories
      • In the "pick k/m" case (k < m), (m − k) dummy categories are created that only represent the restriction of choosing k among the original m.
      • Dummy categories may have similar response patterns and can even generate the first factorial axes (as happens with conditioned questions).

  11. Completed Disjunctive Table with Not Required to Answer (NRA) categories
      1. Advantage: MCA
      2. Disadvantage: the NRA categories could create the first axes
      [Figure: Analysis of the CDT. Factorial plane of the response categories, Factor 1 (86.13 %) against Factor 2 (10.47 %). The Computer-related NRA categories (CLeisure-NRA, CSchool-NRA, ...) cluster with Computer-No, and the Internet-related NRA categories (ILeisure-NRA, ISchool-NRA, ...) cluster with Internet-No, apart from the Yes/No categories. Plot annotations: "74.37% Factor 1" and "73.18% Factor 2".]
      • Survey on Equipment and Use of Information and Communication Technologies in the Home (Spanish Institute of Statistics, 2007)
      • Block: Use of computers and the Internet by children (aged 10 to 15) (18 questions)

  12. 4. Another possible solution: CA of the PDT
      Frequencies and Profiles
      Relative and marginal frequencies:
      $$p_{ij} = \frac{z_{ij}}{z} \qquad p_{i.} = \sum_{j \in J} p_{ij} = \frac{z_{i.}}{z} \qquad p_{.j} = \sum_{i \in I} p_{ij} = \frac{z_{.j}}{z}$$
      Row profiles $i$, $i \in I$:
      $$\frac{p_{ij}}{p_{i.}} = \frac{z_{ij}}{z_{i.}} \quad \forall j \in J \ \Rightarrow\ N(I) \subset \mathbb{R}^J$$
      Column profiles $j$, $j \in J$:
      $$\frac{p_{ij}}{p_{.j}} = \frac{z_{ij}}{z_{.j}} \quad \forall i \in I \ \Rightarrow\ N(J) \subset \mathbb{R}^n$$
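The profile definitions above can be sketched in a few lines of plain Python. The toy table and the function names are hypothetical, and the sketch assumes that no row or column of the table is entirely zero.

```python
def margins(table):
    """Return row totals z_i., column totals z_.j and the grand total z."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    return row, col, sum(row)


def row_profiles(table):
    """z_ij / z_i. for every individual i (one list per row)."""
    row, _, _ = margins(table)
    return [[v / zi for v in r] for r, zi in zip(table, row)]


def column_profiles(table):
    """z_ij / z_.j for every category j (one list per column)."""
    _, col, _ = margins(table)
    return [[r[j] / col[j] for r in table] for j in range(len(col))]


# Toy PDT (hypothetical data): row totals are 3, 2 and 1.
pdt = [
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
]
rows = row_profiles(pdt)
cols = column_profiles(pdt)
```

Every profile sums to 1 by construction; what changes between a CDT and a PDT is the denominator z_i., which is constant (Q) in the former and varies across individuals in the latter.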

  13. 4.1. Problems Resulting from the Application of CA to the PDT: Effect on Distances
      In CA, similarity between any pair of row profiles and between any pair of column profiles is calculated by means of the $\chi^2$ distance.
      The $\chi^2$ distance between two row profiles $i$ and $i'$:
      $$d^2(i, i') = \sum_{j \in J} \frac{1}{p_{.j}} \left( \frac{p_{ij}}{p_{i.}} - \frac{p_{i'j}}{p_{i'.}} \right)^2 = \sum_{j \in J} \frac{z}{z_{.j}} \left( \frac{z_{ij}}{z_{i.}} - \frac{z_{i'j}}{z_{i'.}} \right)^2$$
      In CDT:
      $$d^2(i = 1, i' = 2) = \underbrace{\frac{nQ}{z_{.M}} \left( \frac{1}{Q} - \frac{0}{Q} \right)^2}_{\neq 0} + \underbrace{\frac{nQ}{z_{.F}} \left( \frac{0}{Q} - \frac{1}{Q} \right)^2}_{\neq 0} + \cdots + \underbrace{\frac{nQ}{z_{.CY}} \left( \frac{1}{Q} - \frac{1}{Q} \right)^2}_{= 0} + \ldots$$
      In PDT, $z_{1.} \neq z_{2.}$:
      $$d^2(i = 1, i' = 2) = \frac{z}{z_{.M}} \left( \frac{1}{z_{1.}} - \frac{0}{z_{2.}} \right)^2 + \frac{z}{z_{.F}} \left( \frac{0}{z_{1.}} - \frac{1}{z_{2.}} \right)^2 + \cdots + \underbrace{\frac{z}{z_{.CY}} \left( \frac{1}{z_{1.}} - \frac{1}{z_{2.}} \right)^2}_{\neq 0} + \ldots$$
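A small numerical check of this effect, as a sketch on hypothetical toy tables: `chi2_row_terms` (an illustrative name) returns the contribution of each category to the distance between two row profiles. In both tables individuals 1 and 2 share the third category.

```python
def chi2_row_terms(table, i, k):
    """Per-category terms of the chi-square distance between rows i and k."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    z = sum(row)
    return [
        (z / col[j]) * (table[i][j] / row[i] - table[k][j] / row[k]) ** 2
        for j in range(len(col))
    ]


# Toy CDT: two questions with two categories each, so every row total is Q = 2.
cdt = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
# Toy PDT: row totals are 3, 2 and 1.
pdt = [
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
]

cdt_terms = chi2_row_terms(cdt, 0, 1)
pdt_terms = chi2_row_terms(pdt, 0, 1)
```

In the CDT the shared category contributes nothing to the distance, while in the PDT it contributes (6/2)(1/3 − 1/2)² = 1/12, purely because the two individuals have different row totals.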

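  14. The $\chi^2$ distance between two column profiles $j$ and $j'$:
      $$d^2(j, j') = \sum_{i \in I} \frac{1}{p_{i.}} \left( \frac{p_{ij}}{p_{.j}} - \frac{p_{ij'}}{p_{.j'}} \right)^2 = \sum_{i \in I} \frac{z}{z_{i.}} \left( \frac{z_{ij}}{z_{.j}} - \frac{z_{ij'}}{z_{.j'}} \right)^2$$
      In CDT, $z_{i.} = Q \ \forall i \in I$, so every individual has the same weight:
      $$d^2(j, j') = \sum_{i \in I} \frac{nQ}{Q} \left( \frac{z_{ij}}{z_{.j}} - \frac{z_{ij'}}{z_{.j'}} \right)^2$$
      In PDT, $z_{i.} \neq z_{i'.}$, so the weights differ across individuals:
      $$d^2(j, j') = \sum_{i \in I} \frac{z}{z_{i.}} \left( \frac{z_{ij}}{z_{.j}} - \frac{z_{ij'}}{z_{.j'}} \right)^2$$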
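To make the role of the individual weights z / z_i. concrete, here is a sketch on hypothetical toy tables. Both tables have identical first and second columns, so any difference in the distance between those two columns comes from the weights alone; the function names are illustrative.

```python
def row_weights(table):
    """Weights z / z_i. applied to each individual in column distances."""
    row = [sum(r) for r in table]
    z = sum(row)
    return [z / zi for zi in row]


def chi2_column_distance(table, j, k):
    """Chi-square distance between column profiles j and k."""
    col = [sum(c) for c in zip(*table)]
    weights = row_weights(table)
    return sum(
        w * (r[j] / col[j] - r[k] / col[k]) ** 2
        for r, w in zip(table, weights)
    )


cdt = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]   # every row total is 2
pdt = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 0, 0, 0]]   # row totals 3, 2, 1

w_cdt, w_pdt = row_weights(cdt), row_weights(pdt)
d_cdt = chi2_column_distance(cdt, 0, 1)
d_pdt = chi2_column_distance(pdt, 0, 1)
```

In the toy CDT all weights equal n = 3, whereas in the toy PDT they are 2, 3 and 6, so individuals who chose fewer categories weigh more in the distance between columns.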

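  15. The $\chi^2$ distance between column profile $j$ and the average profile $G_J$:
      $$d^2(j, G_J) = \sum_{i} \frac{1}{p_{i.}} \left( \frac{p_{ij}}{p_{.j}} - p_{i.} \right)^2$$
      Consider a column with $\frac{p_{ij}}{p_{.j}} = \frac{1}{n} \ \forall i \in I$, i.e. a category chosen by every individual.
      In CDT, $p_{i.} = \frac{1}{n} \ \forall i$:
      $$d^2(j, G_J) = \sum_{i} n \left( \frac{1}{n} - \frac{1}{n} \right)^2 = 0$$
      In PDT, $p_{i.} = \frac{z_{i.}}{z}$:
      $$d^2(j, G_J) = \sum_{i} \frac{z}{z_{i.}} \left( \frac{1}{n} - \frac{z_{i.}}{z} \right)^2 \neq 0$$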
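The same comparison can be checked numerically. This is a sketch on hypothetical toy tables in which the first category is chosen by every individual; `distance_to_average` is an illustrative name.

```python
def distance_to_average(table, j):
    """Chi-square distance between column profile j and the average profile."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    z = sum(row)
    return sum(
        (z / row[i]) * (table[i][j] / col[j] - row[i] / z) ** 2
        for i in range(len(table))
    )


# Toy CDT: two questions, two categories each; everyone picks category 0.
cdt = [[1, 0, 1, 0], [1, 0, 0, 1], [1, 0, 1, 0]]
# Toy PDT: everyone picks category 0, but row totals are 4, 2 and 1.
pdt = [[1, 1, 1, 1], [1, 0, 1, 0], [1, 0, 0, 0]]

d_cdt = distance_to_average(cdt, 0)
d_pdt = distance_to_average(pdt, 0)
```

In the CDT the category chosen by everyone sits at the centre of gravity (distance 0). In the PDT it is pulled away from it because the row margins z_i. / z are no longer uniform.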
