Screening the Data for Detecting Methodologically Induced Variation (PowerPoint presentation)



slide-1
SLIDE 1

Screening the Data for Detecting Methodologically Induced Variation

Jörg Blasius, University of Bonn, Germany
Victor Thiessen, Dalhousie University, Halifax, Canada
6th CARME Conference, Rennes, France, February 9-11, 2011

slide-2
SLIDE 2

Substantive and Non-Substantive Variation

Non-substantive variation, produced by

  • Response styles, such as acquiescence, disacquiescence, extreme response styles, midpoint responding, wide range responding, …

  • Hidden non-responses (using the midpoint, random responses, …)
  • Misunderstanding of questions
  • Translations and coding errors (in cross-national surveys)
  • Different field work standards (in cross-national surveys)
  • Missing data (item non-response)
  • Social desirability
  • Primacy and recency effects
  • Fatigue effects
  • Biased samples (unit non-response)
  • Faked interviews

… which is often summarized as “Measurement Error”

slide-3
SLIDE 3

Substantive variation, produced by individual attributes and depending on cognitive competencies (which have an effect on the dimensionality of the solution; Thiessen/Blasius, 2008)

Quality of data: The higher the share of substantive variation (or, equivalently, the lower the share of non-substantive variation), the higher the quality of the data. But how can the quality of data be assessed?

slide-4
SLIDE 4

Canadian Nationwide Election Study 1984: “Political Trust and Efficacy Data” (N=3,377)

Item (row percentages)
a) Generally, those elected to Parliament soon lose touch with the people.
   SA 26.6, AS 44.5, NN 3.5, DS 16.1, SD 4.8, NO 4.5
b) I don't think the (Federal) Government cares much about what people like me think.
   SA 26.9, AS 32.9, NN 3.8, DS 24.2, SD 9.0, NO 3.2
c) Sometimes, (Federal) Politics and Government seem so complicated that a person like me can't really understand what's going on.
   SA 30.8, AS 33.1, NN 2.5, DS 19.1, SD 12.6, NO 1.9
d) People like me don't have any say about what the Government in (Ottawa) does.
   SA 33.4, AS 28.3, NN 2.2, DS 20.0, SD 14.0, NO 2.1
e) So many other people vote in (Federal) elections that it does not matter very much whether I vote or not.
   SA 7.8, AS 9.9, NN 1.8, DS 16.0, SD 62.8, NO 1.7
f) Many people in the (Federal) Government are dishonest.
   SA 10.5, AS 25.1, NN 10.1, DS 24.6, SD 18.2, NO 11.5
g) People in the (Federal) Government waste a lot of the money we pay in taxes.
   SA 46.3, AS 33.2, NN 3.9, DS 9.0, SD 3.6, NO 4.1

slide-5
SLIDE 5

Item (row percentages)
h) Most of the time we can trust people in the (Federal) Government to do what is right.
   SA 10.4, AS 46.0, NN 6.2, DS 23.5, SD 9.7, NO 4.2
i) Most of the people running the (Federal) Government are smart people who usually know what they are doing.
   SA 15.9, AS 45.5, NN 5.9, DS 21.0, SD 8.2, NO 3.6

slide-6
SLIDE 6

Subset Multiple Correspondence Analysis (SMCA)

SMCA concentrates on just some of the response categories, while excluding the others from the solution (Greenacre and Pardo 2006; Greenacre 2007). For example, with SMCA the structure of the subset of NO responses can be analyzed separately, or these responses can be excluded from the solution while concentrating only on the substantive responses.

Suppose we have five variables with four categories each, ranging from SA to SD. Since the row sums of the indicator matrix are 5, the row profile values are 0.2 and zero, and SMCA maintains this equal weighting of all respondents. If we concentrate on SA, respondents with five SA answers have five profile values of 0.2 (and a row sum of 1.0), respondents with four SA answers have four profile values of 0.2 (and a row sum of 0.8), and respondents with two SA answers have two profile values of 0.2 (and a row sum of 0.4). If the other categories were simply omitted from the table, these respondents would instead have four profile values of 0.25 (or two values of 0.5) and a row sum of one.
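The row-profile arithmetic above can be checked with a short NumPy sketch. It reproduces only the profile argument (subset columns divided by the row sums of the full indicator matrix, versus re-normalized rows after omitting the other categories); it is not a complete SMCA, and the helper names are hypothetical.

```python
import numpy as np

def indicator_matrix(responses: np.ndarray, n_categories: int) -> np.ndarray:
    """0/1 indicator coding, one column per variable x category.

    `responses` holds codes 1..n_categories; columns are ordered variable by
    variable (v1 cat1, v1 cat2, ..., v2 cat1, ...).
    """
    n, q = responses.shape
    z = np.zeros((n, q * n_categories))
    for var in range(q):
        z[np.arange(n), var * n_categories + responses[:, var] - 1] = 1.0
    return z

def subset_row_profiles(z: np.ndarray, cols: list) -> np.ndarray:
    """SMCA-style profiles: subset columns divided by the FULL row sums."""
    return z[:, cols] / z.sum(axis=1, keepdims=True)

def renormalized_row_profiles(z: np.ndarray, cols: list) -> np.ndarray:
    """Omitting the other categories and re-normalizing each row (not SMCA).

    A respondent with no answer in the subset would produce 0/0 here.
    """
    sub = z[:, cols]
    return sub / sub.sum(axis=1, keepdims=True)

# Five variables, four categories (1 = SA ... 4 = SD); three respondents
# with five, four and two SA answers.
responses = np.array([[1, 1, 1, 1, 1],
                      [1, 1, 1, 1, 3],
                      [1, 1, 2, 3, 4]])
z = indicator_matrix(responses, n_categories=4)
sa_cols = [var * 4 for var in range(5)]        # the SA column of each variable

smca_profiles = subset_row_profiles(z, sa_cols)       # row sums approx. 1.0, 0.8, 0.4
naive_profiles = renormalized_row_profiles(z, sa_cols)  # every row sums to 1
```

Keeping the full row sums as denominators is what preserves the equal weight of every respondent; the re-normalized version inflates the profiles of respondents with few SA answers.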

slide-7
SLIDE 7

SMCA, Burt-Table

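Burt table with rows and columns ordered into Set 1 (a1 a2 a4 a5 b1 b2 ... i1 i2 i4 i5) and Set 2 (a3 a9 b3 b9 c3 c9 ... i3 i9):
  • Set 1 rows × Set 1 columns: Subset MCA, Set 1
  • Set 1 rows × Set 2 columns: Interaction, Set 1 × Set 2
  • Set 2 rows × Set 1 columns: Interaction, Set 2 × Set 1
  • Set 2 rows × Set 2 columns: Subset MCA, Set 2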
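The block structure of this Burt table can be reproduced with NumPy. The sketch assumes the coding 1 to 5 for the substantive categories and 9 for non-substantive responses, and uses hypothetical random responses; the point is only the reordering into Set 1 (categories 1, 2, 4, 5) and Set 2 (categories 3 and 9).

```python
import numpy as np

rng = np.random.default_rng(0)
codes = [1, 2, 3, 4, 5, 9]                  # assumed: SA, AS, NN, DS, SD, non-substantive
items = "abcdefghi"                         # the nine trust/efficacy items
resp = rng.choice(codes, size=(500, len(items)))   # hypothetical random responses

# Indicator matrix Z with columns labelled "a1", ..., "i9"; Burt table B = Z'Z.
labels = [f"{item}{code}" for item in items for code in codes]
Z = np.column_stack([resp[:, k] == code
                     for k in range(len(items)) for code in codes]).astype(float)
B = Z.T @ Z

# Reorder rows and columns: Set 1 = categories 1, 2, 4, 5; Set 2 = categories 3, 9.
set1 = [i for i, lab in enumerate(labels) if lab[1] in "1245"]
set2 = [i for i, lab in enumerate(labels) if lab[1] in "39"]
order = set1 + set2
B_ord = B[np.ix_(order, order)]

k1 = len(set1)
subset_set1 = B_ord[:k1, :k1]   # Subset MCA, Set 1           (36 x 36)
inter_12 = B_ord[:k1, k1:]      # Interaction, Set 1 x Set 2  (36 x 18)
inter_21 = B_ord[k1:, :k1]      # Interaction, Set 2 x Set 1  (18 x 36)
subset_set2 = B_ord[k1:, k1:]   # Subset MCA, Set 2           (18 x 18)
```

Because B is symmetric, the two interaction blocks are transposes of each other.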

slide-8
SLIDE 8

Constructing a two-dimensional Map by Means of (Subset) Multiple Correspondence Analysis

  • Best method to detect different kinds of methodologically induced variation (for example, response sets) and to distinguish between methodologically induced and substantive variation

  • In MCA and SMCA, similarities between variable categories (or between respondents) are reflected by short (Euclidean) distances, and dissimilarities by large distances

  • If the quality of the data is high, the first MCA/SMCA dimension should mainly capture substantive variation due to political efficacy and trust, with the second dimension reflecting the horseshoe.

  • The items associated with the first dimension should retain their ordinality on this dimension.
slide-9
SLIDE 9
  • If respondents did not pay attention to the direction of the questions, the responses to the negatively formulated items will not conform to an ordinal scale.
  • The horseshoe might also appear on the first dimension (a large amount of non-substantive variation) or between dimensions 1 and 2 (a two-dimensional solution; the data might be of high quality).

  • If the non-substantive responses are highly intercorrelated, the first or second MCA dimension will merely reflect the difference between substantive and non-substantive responses. In SMCA, the non-substantive responses can be excluded from the solution without losing any information on the respondents, which is not the case with listwise deletion.
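To make the construction of such maps concrete, here is a minimal NumPy sketch of MCA and SMCA computed from the SVD of the standardized residuals of the indicator matrix. SMCA keeps only the columns of the chosen categories while retaining the row and column masses of the full matrix, which is our reading of Greenacre and Pardo (2006), not the authors' own code. The function name `smca` and the random data are hypothetical.

```python
import numpy as np

def smca(Z: np.ndarray, subset: np.ndarray, n_dims: int = 2):
    """Subset MCA of an indicator matrix (ordinary MCA if `subset` selects all columns).

    Returns principal coordinates of the selected categories and the principal
    inertias (squared singular values). Masses come from the FULL matrix, so every
    respondent keeps the same weight. Assumes every category occurs at least once.
    """
    P = Z / Z.sum()
    r = P.sum(axis=1)                                    # row masses
    c = P.sum(axis=0)                                    # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    _, sv, Vt = np.linalg.svd(S[:, subset], full_matrices=False)
    coords = Vt[:n_dims].T * sv[:n_dims] / np.sqrt(c[subset])[:, None]
    return coords, sv ** 2

# Hypothetical data: nine items coded 1..5 plus 9 (non-substantive).
rng = np.random.default_rng(1)
codes = np.array([1, 2, 3, 4, 5, 9])
resp = rng.choice(codes, size=(500, 9))
Z = np.column_stack([resp[:, k] == c for k in range(9) for c in codes]).astype(float)

substantive = np.tile(np.isin(codes, [1, 2, 3, 4, 5]), 9)   # SMCA (1,2,3,4,5)
all_cats = np.ones(Z.shape[1], dtype=bool)                  # ordinary MCA

coords_mca, inertia_mca = smca(Z, all_cats)
coords_sub, inertia_sub = smca(Z, substantive)
coords_nr, inertia_nr = smca(Z, ~substantive)
```

Two sanity checks follow from the construction: the total inertia of the full indicator matrix is J/Q - 1 (here 54/9 - 1 = 5), and the inertias of complementary subsets add up to that total.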

slide-10
SLIDE 10

[Figure: MCA map of all categories (a1 to i9); Fedgov, N = 3,377; all respondents]

slide-11
SLIDE 11

[Figure: SMCA (1,2,3,4,5) map of categories a1 to i5; Fedgov, N = 3,377]

slide-12
SLIDE 12

[Figure: SMCA (3,9) map of categories a3, a9 to i3, i9; Fedgov, N = 3,377]

slide-13
SLIDE 13

[Figure: Fedgov, SMCA (1), N = 3,377; categories a1 to g1, h5, i5]

slide-14
SLIDE 14

[Figure: Fedgov, SMCA (2), N = 3,377; categories a2 to g2, h4, i4]

slide-15
SLIDE 15

[Figure: Fedgov, SMCA (3), N = 3,377; categories a3 to i3]

slide-16
SLIDE 16

[Figure: Fedgov, SMCA (4), N = 3,377; categories a4 to g4, h2, i2]

slide-17
SLIDE 17

[Figure: Fedgov, SMCA (5), N = 3,377; categories a5 to g5, h1, i1]

slide-18
SLIDE 18

[Figure: Fedgov, SMCA (9), N = 3,377; categories a9 to i9]

slide-19
SLIDE 19

Decomposition of inertia, SMCA, Federal Government

Model               K   Dimension 1     Dimension 2     Total
                        Abs.    In %    Abs.    In %    Abs.    In %
All categories     45   0.1118  15.8    0.1036  14.6    0.7083  100.0
Subset(1,2,3,4,5)  45   0.1107  20.3    0.0625  11.5    0.5441   76.8
Subset(9)           9   0.0929  62.7    0.0117   7.9    0.1481   20.9
Interaction         9   0.0046  57.8    0.0010  12.7    0.0080    1.1
Subset(1,2,4,5)    36   0.1095  25.9    0.0599  14.2    0.4225   59.6
Subset(3,9)        18   0.0934  36.2    0.0327  12.6    0.2583   36.5
Interaction        18   0.0066  47.9    0.0017  12.2    0.0137    1.9
Subset(1)           9   0.0543  56.7    0.0128  13.3    0.0959   13.5
Subset(2)           9   0.0150  24.1    0.0114  18.3    0.0622    8.8
Subset(3)           9   0.0326  30.0    0.0140  12.9    0.1086   15.3
Subset(4)           9   0.0229  32.4    0.0093  13.2    0.0705   10.0
Subset(5)           9   0.0442  45.5    0.0138  14.2    0.0972   13.7
Subset(9)           9   0.0929  62.7    0.0117   7.9    0.1481   20.9

Subset(1): first category of items “a” to “g”, last category of items “h” and “i”, and so on. The "In %" values for dimensions 1 and 2 are shares of the respective model's total inertia; the "In %" values of the total are shares of the total inertia of all categories. Example: 76.8 + 20.9 + 2 × 1.1 = 100.0
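The example 76.8 + 20.9 + 2 × 1.1 = 100.0 follows from the symmetry of the Burt table: if every block keeps the masses of the full table, the two interaction blocks contribute equally to the total inertia. A minimal sketch, assuming the inertias are chi-square inertias of the Burt table (the exact normalization used for the slides is not stated), with a hypothetical function name and random data:

```python
import numpy as np

def burt_inertia_blocks(B: np.ndarray, in_set1: np.ndarray) -> dict:
    """Split the total inertia of a Burt table into two subset blocks and their interaction.

    Every cell keeps the masses of the full table, so
    total = subset1 + subset2 + 2 * interaction (the Burt table is symmetric).
    """
    P = B / B.sum()
    m = P.sum(axis=1)                   # category masses (rows = columns)
    E = np.outer(m, m)
    chi = (P - E) ** 2 / E              # cell contributions to the total inertia
    s1, s2 = in_set1, ~in_set1
    return {
        "total": chi.sum(),
        "subset1": chi[np.ix_(s1, s1)].sum(),
        "subset2": chi[np.ix_(s2, s2)].sum(),
        "interaction": chi[np.ix_(s1, s2)].sum(),   # equals the (set 2 x set 1) block
    }

# Hypothetical random data: nine items, codes 1..5 plus 9 (non-substantive).
rng = np.random.default_rng(2)
codes = np.array([1, 2, 3, 4, 5, 9])
resp = rng.choice(codes, size=(400, 9))
Z = np.column_stack([resp[:, k] == c for k in range(9) for c in codes]).astype(float)
B = Z.T @ Z

parts = burt_inertia_blocks(B, np.tile(np.isin(codes, [1, 2, 3, 4, 5]), 9))
shares = {key: 100 * value / parts["total"] for key, value in parts.items()}
```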

slide-20
SLIDE 20

Understanding of questions, subdivided by political interest (PI): low PI, N = 1,935; high PI, N = 1,441 (row percentages; χ2 per item)

a) Generally, those elected to Parliament soon lose touch with the people. (χ2 = 85.2)
   Low PI:  SA 28.1, AS 44.4, NN 3.9, DS 13.5, SD 3.5, NO 6.6
   High PI: SA 24.6, AS 44.8, NN 3.0, DS 19.5, SD 6.5, NO 1.7
b) I don't think the (Federal) Government cares much about what people like me think. (χ2 = 69.5)
   Low PI:  SA 30.0, AS 33.0, NN 4.1, DS 22.2, SD 6.5, NO 4.2
   High PI: SA 22.7, AS 32.8, NN 3.4, DS 26.9, SD 12.3, NO 1.9
c) Sometimes, (Federal) Politics and Government seem so complicated that a person like me can't really understand what's going on. (χ2 = 249.7)
   Low PI:  SA 38.4, AS 34.5, NN 2.8, DS 15.1, SD 6.8, NO 2.3
   High PI: SA 20.7, AS 31.4, NN 1.9, DS 24.4, SD 20.3, NO 1.3
d) People like me don't have any say about what the Government in (Ottawa) does. (χ2 = 111.1)
   Low PI:  SA 38.0, AS 28.8, NN 2.8, DS 16.8, SD 10.7, NO 2.9
   High PI: SA 27.1, AS 27.8, NN 1.5, DS 24.3, SD 18.5, NO 0.9
e) So many other people vote in (Federal) elections that it does not matter very much whether I vote or not. (χ2 = 179.2)
   Low PI:  SA 10.4, AS 12.5, NN 2.2, DS 19.2, SD 53.4, NO 2.4
   High PI: SA 4.3, AS 6.5, NN 1.3, DS 11.8, SD 75.4, NO 0.7
f) Many people in the (Federal) Government are dishonest. (χ2 = 118.2)
   Low PI:  SA 11.4, AS 26.0, NN 11.0, DS 23.3, SD 13.4, NO 15.0
   High PI: SA 9.2, AS 23.9, NN 9.0, DS 26.4, SD 24.7, NO 6.8

slide-21
SLIDE 21

g) People in the (Federal) Government waste a lot of the money we pay in taxes. (χ2 = 54.2)
   Low PI:  SA 46.4, AS 33.3, NN 4.5, DS 7.7, SD 2.5, NO 5.5
   High PI: SA 46.1, AS 33.0, NN 3.0, DS 10.7, SD 5.1, NO 2.0
h) Most of the time we can trust people in the (Federal) Government to do what is right. (χ2 = 43.9)
   Low PI:  SA 8.8, AS 47.1, NN 7.4, DS 22.4, SD 8.9, NO 5.4
   High PI: SA 12.6, AS 44.6, NN 4.6, DS 24.8, SD 10.8, NO 2.6
i) Most of the people running the (Federal) Government are smart people who usually know what they are doing. (χ2 = 51.5)
   Low PI:  SA 14.5, AS 46.8, NN 7.1, DS 19.1, SD 7.6, NO 4.9
   High PI: SA 17.8, AS 43.7, NN 4.2, DS 23.5, SD 9.0, NO 1.7

One case is missing because one respondent did not answer the political interest items.
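The χ2 column can be approximated from the published percentages and group sizes. This sketch assumes a Pearson chi-square on the 2 × 6 table including the NO category; since the percentages are rounded, it reproduces the reported values only approximately. The function name is hypothetical.

```python
import numpy as np

def chi_square_from_percentages(row_percentages, row_ns):
    """Pearson chi-square and degrees of freedom for a groups x categories table.

    Counts are rebuilt as percentage / 100 * group size, so rounding in the
    published percentages makes the result approximate.
    """
    counts = np.asarray(row_percentages, float) / 100.0 * np.asarray(row_ns, float)[:, None]
    expected = counts.sum(axis=1, keepdims=True) * counts.sum(axis=0, keepdims=True) / counts.sum()
    chi2 = float(((counts - expected) ** 2 / expected).sum())
    dof = (counts.shape[0] - 1) * (counts.shape[1] - 1)
    return chi2, dof

# Item a): low PI (N = 1,935) and high PI (N = 1,441); SA, AS, NN, DS, SD, NO.
chi2_a, dof_a = chi_square_from_percentages(
    [[28.1, 44.4, 3.9, 13.5, 3.5, 6.6],
     [24.6, 44.8, 3.0, 19.5, 6.5, 1.7]],
    [1935, 1441],
)
```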

slide-22
SLIDE 22

[Figure: Fedgov, high political interest; map of all categories (a1 to i9)]

slide-23
SLIDE 23

[Figure: Fedgov, SMCA (1,2,3,4,5), high political interest; categories a1 to i5]

slide-24
SLIDE 24

[Figure: Fedgov, MCA, low political interest; categories a1 to i9]

slide-25
SLIDE 25

[Figure: Fedgov, SMCA (1,2,3,4,5), low political interest; categories a1 to i5]

slide-26
SLIDE 26

[Figure: Single items: high (solid lines) versus low (dashed lines) political interest. Panels: “Generally, those elected to Parliament soon lose touch with the people”; “I don't think the Federal Government cares much about what people like me think”; “Sometimes, Federal Politics and Government seem so complicated that a person like me can't really understand what's going on”]

slide-27
SLIDE 27

[Figure: single-item plots. Panels: “People like me don't have any say about what the Government in Ottawa does”; “So many other people vote in Federal elections that it does not matter very much whether I vote or not”; “Many people in the Federal Government are dishonest”]

slide-28
SLIDE 28

[Figure: single-item plots. Panels: “People in the Federal Government waste a lot of the money we pay in taxes”; “Most of the time we can trust people in the Federal Government to do what is right”; “Most of the people running the Federal Government are smart people who usually know what they are doing”]

slide-29
SLIDE 29

Decomposition of inertia, low and high political interest, SMCA of single categories

             Low PI, 9 items    High PI, 9 items   Low PI, 7 items    High PI, 7 items
Model        Inertia  D1, %     Inertia  D1, %     Inertia  D1, %     Inertia  D1, %
Subset(1)    0.0915   57.7      0.1010   54.7      0.0629   63.9      0.0733   61.0
Subset(2)    0.0619   24.2      0.0631   24.4      0.0451   31.3      0.0475   29.6
Subset(3)    0.1086   31.6      0.1088   26.2      0.0842   34.1      0.0842   29.0
Subset(4)    0.0741   36.0      0.0664   27.9      0.0655   38.1      0.0576   30.8
Subset(5)    0.0996   40.0      0.0899   46.7      0.0755   44.3      0.0651   49.9
Subset(9)    0.1412   61.0      0.1478   59.4      0.1025   60.6      0.1071   58.7

slide-30
SLIDE 30

Fatigue effect: Federal Government, N = 3,377, versus Provincial Government, N = 3,346 (row percentages)

a) Generally, those elected to Parliament soon lose touch with the people.
   Federal:    SA 26.6, AS 44.5, NN 3.5, DS 16.1, SD 4.8, NO 4.5
   Provincial: SA 24.2, AS 42.0, NN 2.7, DS 19.8, SD 6.2, NO 5.1
b) I don't think the (Federal) Government cares much about what people like me think.
   Federal:    SA 26.9, AS 32.9, NN 3.8, DS 24.2, SD 9.0, NO 3.2
   Provincial: SA 24.7, AS 31.7, NN 2.2, DS 28.4, SD 9.3, NO 3.8
c) Sometimes, (Federal) Politics and Government seem so complicated that a person like me can't really understand what's going on.
   Federal:    SA 30.8, AS 33.1, NN 2.5, DS 19.1, SD 12.6, NO 1.9
   Provincial: SA 25.9, AS 36.6, NN 1.6, DS 18.5, SD 14.6, NO 2.8
d) People like me don't have any say about what the Government in (Ottawa) does.
   Federal:    SA 33.4, AS 28.3, NN 2.2, DS 20.0, SD 14.0, NO 2.1
   Provincial: SA 25.1, AS 29.7, NN 2.0, DS 25.1, SD 15.1, NO 3.0
e) So many other people vote in (Federal) elections that it does not matter very much whether I vote or not.
   Federal:    SA 7.8, AS 9.9, NN 1.8, DS 16.0, SD 62.8, NO 1.7
   Provincial: SA 6.7, AS 9.0, NN 1.6, DS 17.2, SD 62.5, NO 2.9
f) Many people in the (Federal) Government are dishonest.
   Federal:    SA 10.5, AS 25.1, NN 10.1, DS 24.6, SD 18.2, NO 11.5
   Provincial: SA 8.9, AS 22.9, NN 9.3, DS 26.7, SD 18.3, NO 14.0
g) People in the (Federal) Government waste a lot of the money we pay in taxes.
   Federal:    SA 46.3, AS 33.2, NN 3.9, DS 9.0, SD 3.6, NO 4.1
   Provincial: SA 35.2, AS 39.3, NN 3.7, DS 11.7, SD 3.8, NO 6.3

slide-31
SLIDE 31

h) Most of the time we can trust people in the (Federal) Government to do what is right.
   Federal:    SA 10.4, AS 46.0, NN 6.2, DS 23.5, SD 9.7, NO 4.2
   Provincial: SA 11.0, AS 49.2, NN 5.7, DS 18.4, SD 10.2, NO 5.4
i) Most of the people running the (Federal) Government are smart people who usually know what they are doing.
   Federal:    SA 15.9, AS 45.5, NN 5.9, DS 21.0, SD 8.2, NO 3.6
   Provincial: SA 14.1, AS 49.3, NN 5.1, DS 18.2, SD 7.5, NO 5.7

slide-32
SLIDE 32

[Figure: Provgov, MCA, all respondents; categories a1 to i9]

slide-33
SLIDE 33

[Figure: Provgov, SMCA (1,2,3,4,5), all respondents; categories a1 to i5]

slide-34
SLIDE 34

Decomposition of inertia, SMCA, Provincial and Federal Government

                        Provincial Government           Federal Government
                        D1             Total            D1             Total
Model              K    Abs.    In %   Abs.    In %     Abs.    In %   Abs.    In %
All categories     45   0.2222  24.9   0.8914  100.0    0.1118  15.8   0.7083  100.0
Subset(1-5)        45   0.1568  24.9   0.6309   70.8    0.1107  20.3   0.5441   76.8
Subset(9)           9   0.1972  85.0   0.2320   26.0    0.0929  62.7   0.1481   20.9
Interaction         9   0.0113  79.3   0.0143    1.6    0.0046  57.8   0.0080    1.1
Subset(1,2,4,5)    36   0.1559  31.2   0.4989   56.0    0.1095  25.9   0.4225   59.6
Subset(3,9)        18   0.1976  56.0   0.3531   39.6    0.0934  36.2   0.2583   36.5
Interaction        18   0.0137  69.6   0.0197    2.2    0.0066  47.9   0.0137    1.9
Subset(1)           9   0.0865  68.2   0.1268   14.2    0.0543  56.7   0.0959   13.5
Subset(2)           9   0.0239  34.7   0.0689    7.7    0.0150  24.1   0.0622    8.8
Subset(3)           9   0.0487  40.7   0.1197   13.4    0.0326  30.0   0.1086   15.3
Subset(4)           9   0.0340  46.1   0.0737    8.3    0.0229  32.4   0.0705   10.0
Subset(5)           9   0.0700  60.9   0.1150   12.9    0.0442  45.5   0.0972   13.7
Subset(9)           9   0.1972  85.0   0.2320   26.0    0.0929  62.7   0.1481   20.9

slide-37
SLIDE 37

Constructing Scales using Categorical Principal Component Analysis

  • An ordering of the response categories is required, for example, categories running from “strongly agree” to “strongly disagree”
  • The distances between successive categories are recalculated within an iterative procedure (Gifi 1990)
  • Since the order of the categories is retained, the minimum distance between two successive categories is zero (ties)
  • The number of dimensions must be set in advance, often k = 2
  • Variables can be visualized with the help of biplots (Gower and Hand 1996)
  • CatPCA is used especially to exclude methodologically induced variation and to visualize the substantive information in ordinal data.
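The ordinal restriction described above (order retained, ties allowed) is typically imposed by monotone regression inside CatPCA's alternating least squares loop (Gifi 1990). As an illustration only, the sketch below shows that single step with the pool-adjacent-violators algorithm, followed by standardization to mean 0 and standard deviation 1; it is not a complete CatPCA. The function names and the unrestricted category means are hypothetical; the weights are the FG_1 category frequencies from the worked example later in this deck.

```python
import numpy as np

def ordinal_quantification(category_means, weights):
    """Weighted monotone (isotonic) regression by pool-adjacent-violators.

    Returns non-decreasing category values closest, in weighted least squares,
    to the unrestricted category means. Categories that violate the order are
    pooled into one value, which yields ties (zero distance between them).
    """
    blocks = []  # each block: [weighted mean, total weight, number of categories]
    for mean, weight in zip(category_means, weights):
        blocks.append([float(mean), float(weight), 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    return np.array([mean for mean, _, count in blocks for _ in range(count)])

def standardize(values, weights):
    """Rescale quantifications to weighted mean 0 and standard deviation 1."""
    values, weights = np.asarray(values, float), np.asarray(weights, float)
    mean = np.average(values, weights=weights)
    sd = np.sqrt(np.average((values - mean) ** 2, weights=weights))
    return (values - mean) / sd

# Hypothetical unrestricted means for SA..SD; categories 2 and 3 violate the order.
freqs = [736, 1224, 88, 444, 132]
q = standardize(ordinal_quantification([-1.2, 0.3, 0.1, 0.9, 1.4], freqs), freqs)
```

Here the second and third categories are pooled, so they receive the same quantification.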

slide-38
SLIDE 38

CatPCA Quantifications, divided by Political Interest

(SA = strongly agree, AS = agree somewhat, NN = neither nor, DS = disagree somewhat, SD = disagree strongly)

Generally, those elected to parliament soon lose touch with the people.
   High PI: SA -1.618, AS 0.197, NN 0.730, DS 0.956, SD 1.444
   Low PI:  SA -1.406, AS 0.372, NN 1.030, DS 1.249, SD 1.249
I don't think the Fed. Gov. cares much about what people like me think.
   High PI: SA -1.569, AS -0.119, NN 0.302, DS 0.785, SD 1.413
   Low PI:  SA -1.366, AS 0.216, NN 0.858, DS 1.072, SD 1.097
Sometimes, Federal politics and government seem so complicated … .
   High PI: SA -1.773, AS -0.130, NN 0.353, DS 0.612, SD 1.128
   Low PI:  SA -1.224, AS 0.380, NN 0.940, DS 1.199, SD 1.253
People like me don't have any say about what the government in Ottawa does.
   High PI: SA -1.524, AS 0.039, NN 0.551, DS 0.722, SD 1.108
   Low PI:  SA -1.187, AS 0.393, NN 0.945, DS 1.129, SD 1.129
So many other people vote in Federal elections that it does not matter … .
   High PI: SA -4.137, AS -1.360, NN -1.081, DS -0.244, SD 0.400
   Low PI:  SA -2.730, AS -0.828, NN 0.099, DS 0.407, SD 0.516
Many people in the Federal Government are dishonest.
   High PI: SA -2.142, AS -0.674, NN -0.172, DS 0.281, SD 1.232
   Low PI:  SA -2.133, AS -0.303, NN 0.170, DS 0.566, SD 1.306
People in the Federal government waste a lot of the money we pay in taxes.
   High PI: SA -1.024, AS 0.660, NN 1.121, DS 1.387, SD 1.472
   Low PI:  SA -0.965, AS 0.880, NN 1.136, DS 1.371, SD 1.484
Most of the time we can trust people in the Fed. Gov. to do what is right.
   High PI: SA -1.648, AS -0.439, NN -0.118, DS 0.844, SD 1.823
   Low PI:  SA -1.247, AS -0.591, NN -0.234, DS 0.837, SD 2.352
Most of the people running the Federal Government are smart people … .
   High PI: SA -1.212, AS -0.453, NN -0.114, DS 0.904, SD 2.203
   Low PI:  SA -0.892, AS -0.550, NN -0.102, DS 0.969, SD 2.542

slide-39
SLIDE 39

An Idea for Measuring the Quality of Ordered Categorical Data: The Dirty Data Index (DDI)

CatPCA quantification values are standardized to a mean of zero and a standard deviation of one. The closer the data are to metric, the smaller the difference between the PCA and CatPCA solutions, and the closer the distribution of the quantification values is to the standard normal distribution. Comparing the quantification areas (derived from the standard normal distribution) with the cumulative frequencies (midpoints) therefore provides an indicator of the “quality” of the respective set of items.

Limitation: Missing data have to be excluded, either by listwise deletion or by some kind of imputation technique.

slide-40
SLIDE 40
slide-41
SLIDE 41

Notation
N = number of cases
K = number of items, with k = item index
$J_K$ = number of categories in each item, with j = category index
$f_{jk}$ = frequency of category j of item k
$m_{jk}$ = mass (relative frequency) of category j of item k
$c_{(1,j)k}$ = cumulative relative frequencies of the categories of item k
$q_{jk}$ = quantification of category j of item k (provided by CatPCA)

With:
Relative frequencies (masses) for each category: $m_j = f_j / N$
Cumulative masses: $c_{(1,j)} = m_j + c_{(1,j-1)}$, with $c_{(1,0)} = 0$

slide-42
SLIDE 42

First step: Compute the midpoints of the categories of each item. Start with the relative frequency (mass) of the first category and divide it by 2: $g_1 = m_1 / 2$, the first midpoint on the scale of cumulative relative frequencies. Add half the mass of the second category to the mass of the first category: $g_2 = m_1 + m_2 / 2$, the second midpoint. Then add the first two masses plus half the mass of the third category, and so on, up to the first ($J_K - 1$) masses, $c_{(1,J_K-1)}$, plus half the mass of the last category. Note: the number of midpoints equals the number of categories ($J_K$). For each item k and $j = 1, \dots, J_K$:

$g_j = c_{(1,j-1)} + m_j / 2$, with $c_{(1,0)} = 0$
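The first step can be written in a few lines of Python. `category_midpoints` is a hypothetical helper; the frequencies are those of item FG_1 in the worked example of this deck (N = 2,624).

```python
from itertools import accumulate

def category_midpoints(frequencies):
    """Midpoints g_j = c(1, j-1) + m_j / 2 on the cumulative-relative-frequency scale."""
    n = sum(frequencies)
    masses = [f / n for f in frequencies]                 # m_j
    cum_before = [0.0] + list(accumulate(masses))[:-1]    # c(1, j-1), with c(1, 0) = 0
    return [c + m / 2 for c, m in zip(cum_before, masses)]

# Frequencies of the five categories of item FG_1.
g = category_midpoints([736, 1224, 88, 444, 132])
```

The result (about 0.140, 0.514, 0.764, 0.865, 0.975) matches the midpoint column of the worked example.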

slide-43
SLIDE 43

Second step: Compute the areas to the left of the quantification values ($q_{jk}$) under the standard normal distribution.

Third step: Compute the absolute differences between the midpoint areas and the quantification areas, and add them up.

Last step: From simulation studies, the upper bound of the value from the third step is $\frac{k}{k-1}$, with k = number of categories; the DDI has to be standardized by this value. The lower bound of the DDI is 0 (a theoretical value). Using random data, the DDI fluctuates between 0.5 and 0.7, depending on the given distribution (“normal”, “u-shaped”, …) (simulation studies).
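Steps two to four can be sketched with the standard library only, using `math.erf` for the standard normal distribution function. The function names are hypothetical, and the standardization by k / (k - 1) follows the upper bound stated above. With the midpoints and quantifications of item FG_1 from the worked example, it returns about 0.203.

```python
from math import erf, sqrt

def standard_normal_cdf(x: float) -> float:
    """Area to the left of x under the standard normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def dirty_data_index(midpoints, quantifications):
    """Steps 2-4: normal areas of the quantifications, summed absolute
    differences to the midpoints, standardized by k / (k - 1)."""
    k = len(quantifications)                                   # number of categories
    areas = [standard_normal_cdf(q) for q in quantifications]  # second step
    total = sum(abs(g - a) for g, a in zip(midpoints, areas))  # third step
    return total / (k / (k - 1))                               # last step

# Midpoints (column E) and CatPCA quantifications (column B) of item FG_1.
ddi_fg1 = dirty_data_index([0.140, 0.514, 0.764, 0.865, 0.975],
                           [-1.508, 0.306, 0.717, 1.122, 1.322])
```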

slide-44
SLIDE 44

FG_1: Generally, those elected to parliament soon lose touch with the people

Cat.  A) Freq.  B) Quantif.  C) Mass  D) Cum. mass  E) Mpts area  F) Qtf. area  G) Diff.
1        736     -1.508      0.280    0.280         0.140         0.066         0.075
2       1224      0.306      0.466    0.747         0.514         0.620         0.106
3         88      0.717      0.034    0.780         0.764         0.763         0.001
4        444      1.122      0.169    0.950         0.865         0.869         0.004
5        132      1.322      0.050    1.000         0.975         0.907         0.068
Sum     2624                 1.000                                              0.253

With C(1) = A(1) / N = 736 / 2624 = 0.280; D = cumulative values of C; E(1) = D(1) / 2 = 0.280 / 2 = 0.140, E(2) = D(1) + C(2) / 2 = 0.280 + 0.466 / 2 = 0.514; …; E(5) = D(4) + C(5) / 2 = 0.950 + 0.050 / 2 = 0.975; F(1) = STANDNORMDIS(B1) = STANDNORMDIS(-1.508) = 0.066; G(1) = ABS(E1 - F1). Note: the sum has to be standardized by k / (k - 1) = 5 / 4 = 1.25.

DDI(FG_1): 0.253 / 1.25 = 0.203

slide-45
SLIDE 45

Calculation of DDI, Fedgov, divided by political interest

High PI   Low PI    Item
(N=1,243) (N=1,381)
0.1993    0.2639    Generally, those elected to parliament soon lose touch with the people.
0.2014    0.3854    I don't think the Federal Government cares much about what people like me think.
0.3967    0.3368    Sometimes, Federal politics and government seem so complicated that a person like me … .
0.3643    0.3965    People like me don't have any say about what the government in Ottawa does.
0.3581    0.6530    So many other people vote in Federal elections that it does not matter very much … .
0.1188    0.2281    Many people in the Federal Government are dishonest.
0.2566    0.2807    People in the Federal government waste a lot of the money we pay in taxes.
0.2219    0.5204    Most of the time we can trust people in the Federal Government to do what is right.
0.5615    0.5194    Most of the people running the Federal Government are smart people who usually know … .
0.2976    0.3982    Average distance
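The "Average distance" row is consistent with the unweighted mean of the nine item-level DDIs. Assuming that definition, it can be checked as follows; the same computation gives 0.3409 and 0.4671 for the fatigue-effect table.

```python
from statistics import mean

# Item-level DDIs from the political-interest table (items in the order listed).
ddi_high_pi = [0.1993, 0.2014, 0.3967, 0.3643, 0.3581, 0.1188, 0.2566, 0.2219, 0.5615]
ddi_low_pi = [0.2639, 0.3854, 0.3368, 0.3965, 0.6530, 0.2281, 0.2807, 0.5204, 0.5194]

average_high = round(mean(ddi_high_pi), 4)   # unweighted mean over the nine items
average_low = round(mean(ddi_low_pi), 4)
```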

slide-46
SLIDE 46

Calculation of DDI, fatigue effect: Federal Government (FG) and Provincial Government (PG)

FG        PG        Item
(N=2,624) (N=2,579)
0.2026    0.3481    Generally, those elected to parliament soon lose touch with the people.
0.3002    0.4245    I don't think the Federal Government cares much about what people like me think.
0.4194    0.4896    Sometimes, Federal politics and government seem so complicated that a person like me can’t really … .
0.3959    0.4864    People like me don't have any say about what the government in Ottawa does.
0.4618    0.7136    So many other people vote in Federal elections that it does not matter very much whether I vote or not.
0.1459    0.2662    Many people in the Federal Government are dishonest.
0.2172    0.3172    People in the Federal government waste a lot of the money we pay in taxes.
0.3750    0.5432    Most of the time we can trust people in the Federal Government to do what is right.
0.5504    0.6148    Most of the people running the Federal Government are smart people who usually know what they are doing.
0.3409    0.4671    Average distance

slide-47
SLIDE 47

Conclusion

  • MCA and SMCA make minimal demands on the data. Both are particularly suited for distinguishing between substantive and methodologically induced variation.

  • Comparing the inertias and the explained variances of dimension 1 from SMCA indicates differences in the amount of methodologically induced variation; in the given example, acquiescence and the understanding of questions.

  • For measuring the “quality of ordinal data”, we constructed the Dirty Data Index (DDI), based on the differences between the quantification areas from CatPCA (from the standard normal distribution) and the empirical cumulative frequencies (midpoint areas).

slide-48
SLIDE 48
  • Using this index to assess the quality of ordinal data, we could show that in the CNES 1984 respondents with high political interest provide “better data” than those with low political interest. Furthermore, comparing the DDI computed for the same items collected at different points in time (and on only slightly different subjects, here federal and provincial government), we could show a clear fatigue effect in the data: the responses given roughly half an hour earlier are of better quality.

  • While SMCA is based on the single categories across all items, the DDI is based on the single items across all (ordered categorical) categories.