Screening the Data for Detecting Methodologically-Induced Variation
Jörg Blasius, University of Bonn, Germany
Victor Thiessen, Dalhousie University, Halifax, Canada
6th CARME Conference, Rennes, France, February 9-11, 2011
Substantive and Non-Substantive Variation
Non-substantive variation, produced by
- Response styles, such as acquiescence, disacquiescence, extreme response styles, midpoint responding, wide range responding, …
- Hidden non-responses (using the midpoint, random responses, …)
- Misunderstanding of questions
- Translations and coding errors (in cross-national surveys)
- Different field work standards (in cross-national surveys)
- Missing data (item non-response)
- Social desirability
- Primacy and recency effects
- Fatigue effects
- Biased samples (unit non-response)
- Faked interviews
… which is often summarized as “Measurement Error”
Substantive variation, produced by individual attributes and depending on cognitive competencies (which have an effect on the dimensionality of the solution; Thiessen/Blasius, 2008)

Quality of Data: The higher the share of substantive variation (or the lower the share of non-substantive variation), the higher the quality of the data. But: How can the quality of data be assessed?
Canadian Nationwide Election Study 1984: “Political Trust and Efficacy Data” (N=3,377)
Percentages; SA = strongly agree, AS = agree somewhat, NN = neither nor, DS = disagree somewhat, SD = disagree strongly, NO = code 9.

| Item | SA | AS | NN | DS | SD | NO |
|---|---|---|---|---|---|---|
| a) Generally, those elected to Parliament soon lose touch with the people. | 26.6 | 44.5 | 3.5 | 16.1 | 4.8 | 4.5 |
| b) I don't think the (Federal) Government cares much about what people like me think. | 26.9 | 32.9 | 3.8 | 24.2 | 9.0 | 3.2 |
| c) Sometimes, (Federal) Politics and Government seem so complicated that a person like me can't really understand what's going on. | 30.8 | 33.1 | 2.5 | 19.1 | 12.6 | 1.9 |
| d) People like me don't have any say about what the Government in (Ottawa) does. | 33.4 | 28.3 | 2.2 | 20.0 | 14.0 | 2.1 |
| e) So many other people vote in (Federal) elections that it does not matter very much whether I vote or not. | 7.8 | 9.9 | 1.8 | 16.0 | 62.8 | 1.7 |
| f) Many people in the (Federal) Government are dishonest. | 10.5 | 25.1 | 10.1 | 24.6 | 18.2 | 11.5 |
| g) People in the (Federal) Government waste a lot of the money we pay in taxes. | 46.3 | 33.2 | 3.9 | 9.0 | 3.6 | 4.1 |
| h) Most of the time we can trust people in the (Federal) Government to do what is right. | 10.4 | 46.0 | 6.2 | 23.5 | 9.7 | 4.2 |
| i) Most of the people running the (Federal) Government are smart people who usually know what they are doing. | 15.9 | 45.5 | 5.9 | 21.0 | 8.2 | 3.6 |
Subset Multiple Correspondence Analysis (SMCA)
SMCA concentrates on just some of the response categories while excluding the others from the solution (Greenacre and Pardo 2006, Greenacre 2007). For example, with SMCA the structure of the subset of NOs can be analyzed separately, or these responses can be excluded from the solution while concentrating only on the substantive responses. Suppose we have five variables with four categories each, ranging from SA to SD. Since every row of the indicator matrix sums to 5, and SMCA maintains the equal weighting of all respondents, the row profile values are 0.2 and zero. If we concentrate on SA, respondents with five SA answers will have five profile values of 0.2 (and a row sum of 1.0), respondents with four SA answers will have four profile values of 0.2 (and a row sum of 0.8), and respondents with two SA answers will have two profile values of 0.2 (and a row sum of 0.4). If the other categories were simply omitted and the profiles recomputed, the respondents with four (or two) SA answers would instead have four profile values of 0.25 (or two values of 0.5), each with a row sum of one.
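The row-profile argument above can be checked numerically. The following is a minimal NumPy sketch, assuming a hypothetical toy data set of three respondents answering five items with four categories (SA, AS, DS, SD, coded 0 to 3); `indicator` is a helper written for this illustration, not a function of any SMCA package. It contrasts SMCA, which keeps the row profiles of the full indicator matrix, with simply deleting the other columns and re-normalizing.

```python
import numpy as np

def indicator(responses: np.ndarray, n_cats: int) -> np.ndarray:
    """Build the indicator (complete disjunctive) matrix, item by item."""
    n, q = responses.shape
    z = np.zeros((n, q * n_cats))
    for k in range(q):
        z[np.arange(n), k * n_cats + responses[:, k]] = 1.0
    return z

# Hypothetical toy data: 5 items, categories 0=SA, 1=AS, 2=DS, 3=SD
responses = np.array([
    [0, 0, 0, 0, 0],  # five SA answers
    [0, 0, 0, 0, 3],  # four SA answers
    [0, 0, 1, 2, 3],  # two SA answers
])
n_items, n_cats = responses.shape[1], 4

z = indicator(responses, n_cats)
# Each row of the indicator matrix sums to the number of items (5),
# so every chosen category gets a row-profile value of 1/5 = 0.2.
profiles = z / z.sum(axis=1, keepdims=True)

sa_cols = [k * n_cats for k in range(n_items)]  # the SA column of each item

# SMCA: keep the full-matrix profiles and analyse only the SA columns;
# row sums stay proportional to the number of SA answers.
smca_profiles = profiles[:, sa_cols]

# Naive alternative: drop the other columns first, then re-normalize,
# which forces every row sum back to one.
z_sa = z[:, sa_cols]
naive_profiles = z_sa / z_sa.sum(axis=1, keepdims=True)

print(smca_profiles.sum(axis=1))
print(naive_profiles.sum(axis=1))
```

The naive version would also divide by zero for a respondent without any SA answer, which is another reason SMCA keeps the original margins.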
SMCA, Burt-Table
| | Set 1: a1 a2 a4 a5 b1 b2 ... i1 i2 i4 i5 | Set 2: a3 a9 b3 b9 c3 c9 ... i3 i9 |
|---|---|---|
| Set 1: a1 a2 a4 a5 b1 b2 ... i4 i5 | Subset MCA, Set 1 | Interaction, Set 1 × Set 2 |
| Set 2: a3 a9 b3 b9 ... i3 i9 | Interaction, Set 2 × Set 1 | Subset MCA, Set 2 |
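The Burt-table partition above implies that the total inertia splits into the two subset inertias plus the interaction block counted twice. A minimal NumPy sketch, assuming random hypothetical data (9 items, codes 0 to 4 for categories 1 to 5 and code 5 for category 9) and defining the inertia of a block as the sum of its squared standardized residuals while keeping the masses of the full table, as subset CA does (Greenacre and Pardo 2006). The helper names are illustrative, not the authors' software.

```python
import numpy as np

def indicator(responses, n_cats):
    """Indicator matrix: one 0/1 column per category of every item."""
    n, q = responses.shape
    z = np.zeros((n, q * n_cats))
    for k in range(q):
        z[np.arange(n), k * n_cats + responses[:, k]] = 1.0
    return z

def burt_std_residuals(z):
    """Standardized residuals of the Burt table B = Z'Z (a symmetric matrix)."""
    b = z.T @ z
    p = b / b.sum()
    c = p.sum(axis=0)              # category masses of the FULL table
    expected = np.outer(c, c)
    return (p - expected) / np.sqrt(expected)

# Hypothetical data: 9 items (a to i), 6 codes per item
rng = np.random.default_rng(0)
n_cases, n_items, n_cats = 500, 9, 6
responses = rng.integers(0, n_cats, size=(n_cases, n_items))
responses[:n_cats, :] = np.arange(n_cats)[:, None]  # every category occurs at least once

s = burt_std_residuals(indicator(responses, n_cats))
total = (s ** 2).sum()

# Set 2 = categories 3 and 9 of every item (codes 2 and 5); Set 1 = all others
in_set2 = np.isin(np.arange(s.shape[0]) % n_cats, [2, 5])
set1, set2 = ~in_set2, in_set2

subset1 = (s[np.ix_(set1, set1)] ** 2).sum()      # Subset MCA, Set 1
subset2 = (s[np.ix_(set2, set2)] ** 2).sum()      # Subset MCA, Set 2
interaction = (s[np.ix_(set1, set2)] ** 2).sum()  # Set 1 x Set 2 (equals Set 2 x Set 1)

print(f"{subset1 / total:.1%} + {subset2 / total:.1%} + 2 x {interaction / total:.1%}")
```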
Constructing a two-dimensional Map by Means of (Subset) Multiple Correspondence Analysis
- Best method for revealing different kinds of methodologically-induced variation (for example, response sets) and for distinguishing between methodologically-induced and substantive variation
- In MCA and SMCA, similarities between variable categories (or between respondents) are reflected by short (Euclidean) distances, dissimilarities by large distances
- If the quality of the data is high, the first MCA/SMCA dimension should capture mainly substantive variation due to political efficacy and trust, with the second dimension reflecting the horseshoe.
- The items associated with the first dimension should retain their ordinality on this dimension.
- If people did not pay attention to the direction of the questions, the responses to the negatively-formulated items will not conform to an ordinal scale.
- The horseshoe might also appear on the first dimension (a large amount of non-substantive variation) or between dimensions 1 and 2 (a two-dimensional solution; the data might be of high quality).
- If the non-substantive responses are highly intercorrelated, the first or second MCA dimension will simply reflect the difference between substantive and non-substantive responses. In SMCA, the non-substantive responses can be excluded without losing any information, unlike listwise deletion, which discards every respondent with at least one non-substantive answer.
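One way to apply the ordinality criterion above is to compute an MCA directly and check whether each item's category coordinates on dimension 1 are monotone. A minimal NumPy sketch, assuming hypothetical data simulated from a single latent trait on a five-point scale; `indicator`, `mca_indicator` and `is_ordinal` are illustrative helpers, not functions from an MCA package, and the MCA is computed as a correspondence analysis of the indicator matrix via SVD.

```python
import numpy as np

def indicator(responses, n_cats):
    """Indicator (complete disjunctive) matrix: one 0/1 column per category."""
    n, q = responses.shape
    z = np.zeros((n, q * n_cats))
    for k in range(q):
        z[np.arange(n), k * n_cats + responses[:, k]] = 1.0
    return z

def mca_indicator(z, n_dims=2):
    """Correspondence analysis of an indicator matrix via SVD.

    Returns all principal inertias and the principal coordinates of the
    categories on the first `n_dims` dimensions. Assumes every category
    occurs at least once (otherwise a column mass is zero).
    """
    p = z / z.sum()
    r, c = p.sum(axis=1), p.sum(axis=0)                 # row and column masses
    s = (p - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    _, sv, vt = np.linalg.svd(s, full_matrices=False)
    coords = vt[:n_dims].T * sv[:n_dims] / np.sqrt(c)[:, None]
    return sv ** 2, coords

def is_ordinal(dim1_coords):
    """True if an item's category coordinates are monotone in either direction."""
    d = np.diff(dim1_coords)
    return bool(np.all(d >= 0) or np.all(d <= 0))

# Hypothetical data: 5 items driven by one latent trait, cut into 5 ordered categories
rng = np.random.default_rng(1)
n, q, n_cats = 1000, 5, 5
trait = rng.normal(size=n)
latent = trait[:, None] + rng.normal(scale=0.7, size=(n, q))
responses = np.digitize(latent, [-1.0, -0.3, 0.3, 1.0])  # codes 0..4

inertias, coords = mca_indicator(indicator(responses, n_cats))
# The total inertia of an indicator matrix is (J - Q) / Q, here (25 - 5) / 5 = 4
print("total inertia:", round(float(inertias.sum()), 6))
for k in range(q):
    item_coords = coords[k * n_cats:(k + 1) * n_cats, 0]
    print(f"item {k}: ordinal on dimension 1 = {is_ordinal(item_coords)}")
```

With real survey data, an item whose categories come out of order on dimension 1 would be a candidate for the kind of misunderstanding described above.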
[Figure: Fedgov, all respondents, N = 3,377 (all categories, a1 to i9)]
[Figure: SMCA (1,2,3,4,5), Fedgov, N = 3,377]
[Figure: SMCA (3,9), Fedgov, N = 3,377]
[Figure: Fedgov, SMCA (1), N = 3,377 (categories a1 to g1, h5, i5)]
[Figure: Fedgov, SMCA (2), N = 3,377 (categories a2 to g2, h4, i4)]
[Figure: Fedgov, SMCA (3), N = 3,377 (categories a3 to i3)]
[Figure: Fedgov, SMCA (4), N = 3,377 (categories a4 to g4, h2, i2)]
[Figure: Fedgov, SMCA (5), N = 3,377 (categories a5 to g5, h1, i1)]
[Figure: Fedgov, SMCA (9), N = 3,377 (categories a9 to i9)]
Decomposition of inertia, SMCA, Federal Government
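| Model | K | Dim. 1 Abs. | Dim. 1 In % | Dim. 2 Abs. | Dim. 2 In % | Total Abs. | Total In % |
|---|---|---|---|---|---|---|---|
| All categories | 45 | 0.1118 | 15.8 | 0.1036 | 14.6 | 0.7083 | 100.0 |
| Subset(1,2,3,4,5) | 45 | 0.1107 | 20.3 | 0.0625 | 11.5 | 0.5441 | 76.8 |
| Subset(9) | 9 | 0.0929 | 62.7 | 0.0117 | 7.9 | 0.1481 | 20.9 |
| Interaction | 9 | 0.0046 | 57.8 | 0.0010 | 12.7 | 0.0080 | 1.1 |
| Subset(1,2,4,5) | 36 | 0.1095 | 25.9 | 0.0599 | 14.2 | 0.4225 | 59.6 |
| Subset(3,9) | 18 | 0.0934 | 36.2 | 0.0327 | 12.6 | 0.2583 | 36.5 |
| Interaction | 18 | 0.0066 | 47.9 | 0.0017 | 12.2 | 0.0137 | 1.9 |
| Subset(1) | 9 | 0.0543 | 56.7 | 0.0128 | 13.3 | 0.0959 | 13.5 |
| Subset(2) | 9 | 0.0150 | 24.1 | 0.0114 | 18.3 | 0.0622 | 8.8 |
| Subset(3) | 9 | 0.0326 | 30.0 | 0.0140 | 12.9 | 0.1086 | 15.3 |
| Subset(4) | 9 | 0.0229 | 32.4 | 0.0093 | 13.2 | 0.0705 | 10.0 |
| Subset(5) | 9 | 0.0442 | 45.5 | 0.0138 | 14.2 | 0.0972 | 13.7 |
| Subset(9) | 9 | 0.0929 | 62.7 | 0.0117 | 7.9 | 0.1481 | 20.9 |

Subset(1): first category of items "a" to "g", last category of items "h" and "i" (which are worded in the opposite direction), and so on. The percentages for dimensions 1 and 2 refer to the total inertia of the respective subset; the total percentages refer to the inertia of all categories. Example: 76.8 + 20.9 + 2 × 1.1 = 100.0 (the interaction block appears twice in the Burt table).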
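The example in the note (76.8 + 20.9 + 2 × 1.1 = 100.0) can be reproduced from the absolute inertias in the table. A short Python check using only the published values; the variable names are illustrative.

```python
# Published absolute inertias, Fedgov, N = 3,377 (table above)
total_all = 0.7083
subset_1_to_5, subset_9, interaction = 0.5441, 0.1481, 0.0080

# Share of each block in the total inertia of all categories
shares = {name: value / total_all for name, value in [
    ("Subset(1,2,3,4,5)", subset_1_to_5),
    ("Subset(9)", subset_9),
    ("Interaction", interaction),
]}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")

# The interaction block occurs twice in the Burt table (Set 1 x Set 2 and Set 2 x Set 1)
reconstructed = subset_1_to_5 + subset_9 + 2 * interaction
print(f"sum with interaction counted twice: {reconstructed:.4f} (table: {total_all})")

# Dimension-1 percentages are relative to each subset's own total inertia
print(f"Subset(9), dimension 1: {0.0929 / subset_9:.1%}")
```

The small gap between the reconstructed sum and 0.7083 comes from rounding the published values to four decimals.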
Understanding of questions, subdivision by political interest (PI): each cell shows low PI (N = 1,935) / high PI (N = 1,441), in percent

| Item | SA | AS | NN | DS | SD | NO | χ2 |
|---|---|---|---|---|---|---|---|
| a) Generally, those elected to Parliament soon lose touch with the people. | 28.1 / 24.6 | 44.4 / 44.8 | 3.9 / 3.0 | 13.5 / 19.5 | 3.5 / 6.5 | 6.6 / 1.7 | 85.2 |
| b) I don't think the (Federal) Government cares much about what people like me think. | 30.0 / 22.7 | 33.0 / 32.8 | 4.1 / 3.4 | 22.2 / 26.9 | 6.5 / 12.3 | 4.2 / 1.9 | 69.5 |
| c) Sometimes, (Federal) Politics and Government seem so complicated that a person like me can't really understand what's going on. | 38.4 / 20.7 | 34.5 / 31.4 | 2.8 / 1.9 | 15.1 / 24.4 | 6.8 / 20.3 | 2.3 / 1.3 | 249.7 |
| d) People like me don't have any say about what the Government in (Ottawa) does. | 38.0 / 27.1 | 28.8 / 27.8 | 2.8 / 1.5 | 16.8 / 24.3 | 10.7 / 18.5 | 2.9 / 0.9 | 111.1 |
| e) So many other people vote in (Federal) elections that it does not matter very much whether I vote or not. | 10.4 / 4.3 | 12.5 / 6.5 | 2.2 / 1.3 | 19.2 / 11.8 | 53.4 / 75.4 | 2.4 / 0.7 | 179.2 |
| f) Many people in the (Federal) Government are dishonest. | 11.4 / 9.2 | 26.0 / 23.9 | 11.0 / 9.0 | 23.3 / 26.4 | 13.4 / 24.7 | 15.0 / 6.8 | 118.2 |
| g) People in the (Federal) Government waste a lot of the money we pay in taxes. | 46.4 / 46.1 | 33.3 / 33.0 | 4.5 / 3.0 | 7.7 / 10.7 | 2.5 / 5.1 | 5.5 / 2.0 | 54.2 |
| h) Most of the time we can trust people in the (Federal) Government to do what is right. | 8.8 / 12.6 | 47.1 / 44.6 | 7.4 / 4.6 | 22.4 / 24.8 | 8.9 / 10.8 | 5.4 / 2.6 | 43.9 |
| i) Most of the people running the (Federal) Government are smart people who usually know what they are doing. | 14.5 / 17.8 | 46.8 / 43.7 | 7.1 / 4.2 | 19.1 / 23.5 | 7.6 / 9.0 | 4.9 / 1.7 | 51.5 |

One case is missing because one respondent did not answer the political interest items.
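The χ2 column can be roughly reproduced from the table. A minimal NumPy sketch, assuming the reported value is the Pearson statistic of the 2 × 6 cross-tabulation of political interest by response; counts are rebuilt from the rounded percentages, so the result is only close to, not identical with, the published 85.2 for item a). `chi_square` is an illustrative helper.

```python
import numpy as np

def chi_square(observed):
    """Pearson chi-square statistic for a two-way table of counts."""
    observed = np.asarray(observed, dtype=float)
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    return float(((observed - expected) ** 2 / expected).sum())

# Item a), percentages for SA, AS, NN, DS, SD, NO (table above)
pct = np.array([
    [28.1, 44.4, 3.9, 13.5, 3.5, 6.6],  # low political interest
    [24.6, 44.8, 3.0, 19.5, 6.5, 1.7],  # high political interest
])
group_n = np.array([1935, 1441])

# Approximate counts recovered from the rounded row percentages
counts = pct / 100 * group_n[:, None]
chi2 = chi_square(counts)
dof = (counts.shape[0] - 1) * (counts.shape[1] - 1)
print(f"chi2 = {chi2:.1f} with {dof} degrees of freedom")
```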
[Figure: Fedgov, High Political Interest (all categories, a1 to i9)]
[Figure: Fedgov, SMCA (1,2,3,4,5), High Political Interest]
[Figure: Fedgov, MCA, Low Political Interest]
[Figure: Fedgov, SMCA (1,2,3,4,5), Low Political Interest]
Single Items: high (solid lines) versus low (dashed lines) political interest
[Figure panels: Generally, those elected to Parliament soon lose touch with the people | I don't think the Federal Government cares much about what people like me think | Sometimes, Federal Politics and Government seem so complicated that a person like me can't really understand what's going on]
[Figure panels: People like me don't have any say about what the Government in Ottawa does | So many other people vote in Federal elections that it does not matter very much whether I vote or not | Many people in the Federal Government are dishonest]
[Figure panels: People in the Federal Government waste a lot of the money we pay in taxes | Most of the time we can trust people in the Federal Government to do what is right | Most of the people running the Federal Government are smart people who usually know what they are doing]
Decomposition of inertia, low and high political interest, SMCA of single categories
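| Model | Low PI, 9 items: Inertia | Low PI, 9 items: D1, in % | High PI, 9 items: Inertia | High PI, 9 items: D1, in % | Low PI, 7 items: Inertia | Low PI, 7 items: D1, in % | High PI, 7 items: Inertia | High PI, 7 items: D1, in % |
|---|---|---|---|---|---|---|---|---|
| Subset(1) | 0.0915 | 57.7 | 0.1010 | 54.7 | 0.0629 | 63.9 | 0.0733 | 61.0 |
| Subset(2) | 0.0619 | 24.2 | 0.0631 | 24.4 | 0.0451 | 31.3 | 0.0475 | 29.6 |
| Subset(3) | 0.1086 | 31.6 | 0.1088 | 26.2 | 0.0842 | 34.1 | 0.0842 | 29.0 |
| Subset(4) | 0.0741 | 36.0 | 0.0664 | 27.9 | 0.0655 | 38.1 | 0.0576 | 30.8 |
| Subset(5) | 0.0996 | 40.0 | 0.0899 | 46.7 | 0.0755 | 44.3 | 0.0651 | 49.9 |
| Subset(9) | 0.1412 | 61.0 | 0.1478 | 59.4 | 0.1025 | 60.6 | 0.1071 | 58.7 |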
Fatigue Effect: each cell shows Federal Government (N = 3,377) / Provincial Government (N = 3,346), in percent

| Item | SA | AS | NN | DS | SD | NO |
|---|---|---|---|---|---|---|
| a) Generally, those elected to Parliament soon lose touch with the people. | 26.6 / 24.2 | 44.5 / 42.0 | 3.5 / 2.7 | 16.1 / 19.8 | 4.8 / 6.2 | 4.5 / 5.1 |
| b) I don't think the (Federal) Government cares much about what people like me think. | 26.9 / 24.7 | 32.9 / 31.7 | 3.8 / 2.2 | 24.2 / 28.4 | 9.0 / 9.3 | 3.2 / 3.8 |
| c) Sometimes, (Federal) Politics and Government seem so complicated that a person like me can't really understand what's going on. | 30.8 / 25.9 | 33.1 / 36.6 | 2.5 / 1.6 | 19.1 / 18.5 | 12.6 / 14.6 | 1.9 / 2.8 |
| d) People like me don't have any say about what the Government in (Ottawa) does. | 33.4 / 25.1 | 28.3 / 29.7 | 2.2 / 2.0 | 20.0 / 25.1 | 14.0 / 15.1 | 2.1 / 3.0 |
| e) So many other people vote in (Federal) elections that it does not matter very much whether I vote or not. | 7.8 / 6.7 | 9.9 / 9.0 | 1.8 / 1.6 | 16.0 / 17.2 | 62.8 / 62.5 | 1.7 / 2.9 |
| f) Many people in the (Federal) Government are dishonest. | 10.5 / 8.9 | 25.1 / 22.9 | 10.1 / 9.3 | 24.6 / 26.7 | 18.2 / 18.3 | 11.5 / 14.0 |
| g) People in the (Federal) Government waste a lot of the money we pay in taxes. | 46.3 / 35.2 | 33.2 / 39.3 | 3.9 / 3.7 | 9.0 / 11.7 | 3.6 / 3.8 | 4.1 / 6.3 |
| h) Most of the time we can trust people in the (Federal) Government to do what is right. | 10.4 / 11.0 | 46.0 / 49.2 | 6.2 / 5.7 | 23.5 / 18.4 | 9.7 / 10.2 | 4.2 / 5.4 |
| i) Most of the people running the (Federal) Government are smart people who usually know what they are doing. | 15.9 / 14.1 | 45.5 / 49.3 | 5.9 / 5.1 | 21.0 / 18.2 | 8.2 / 7.5 | 3.6 / 5.7 |
[Figure: Provgov, MCA, all respondents]
[Figure: Provgov, SMCA (1,2,3,4,5), all respondents]
Decomposition of inertia, SMCA, Provincial and Federal Government
| Model | K | Provincial D1 Abs. | Provincial D1 In % | Provincial Total Abs. | Provincial Total In % | Federal D1 Abs. | Federal D1 In % | Federal Total Abs. | Federal Total In % |
|---|---|---|---|---|---|---|---|---|---|
| All categories | 45 | 0.2222 | 24.9 | 0.8914 | 100.0 | 0.1118 | 15.8 | 0.7083 | 100.0 |
| Subset(1-5) | 45 | 0.1568 | 24.9 | 0.6309 | 70.8 | 0.1107 | 20.3 | 0.5441 | 76.8 |
| Subset(9) | 9 | 0.1972 | 85.0 | 0.2320 | 26.0 | 0.0929 | 62.7 | 0.1481 | 20.9 |
| Interaction | 9 | 0.0113 | 79.3 | 0.0143 | 1.6 | 0.0046 | 57.8 | 0.0080 | 1.1 |
| Subset(1,2,4,5) | 36 | 0.1559 | 31.2 | 0.4989 | 56.0 | 0.1095 | 25.9 | 0.4225 | 59.6 |
| Subset(3,9) | 18 | 0.1976 | 56.0 | 0.3531 | 39.6 | 0.0934 | 36.2 | 0.2583 | 36.5 |
| Interaction | 18 | 0.0137 | 69.6 | 0.0197 | 2.2 | 0.0066 | 47.9 | 0.0137 | 1.9 |
| Subset(1) | 9 | 0.0865 | 68.2 | 0.1268 | 14.2 | 0.0543 | 56.7 | 0.0959 | 13.5 |
| Subset(2) | 9 | 0.0239 | 34.7 | 0.0689 | 7.7 | 0.0150 | 24.1 | 0.0622 | 8.8 |
| Subset(3) | 9 | 0.0487 | 40.7 | 0.1197 | 13.4 | 0.0326 | 30.0 | 0.1086 | 15.3 |
| Subset(4) | 9 | 0.0340 | 46.1 | 0.0737 | 8.3 | 0.0229 | 32.4 | 0.0705 | 10.0 |
| Subset(5) | 9 | 0.0700 | 60.9 | 0.1150 | 12.9 | 0.0442 | 45.5 | 0.0972 | 13.7 |
| Subset(9) | 9 | 0.1972 | 85.0 | 0.2320 | 26.0 | 0.0929 | 62.7 | 0.1481 | 20.9 |
Constructing Scales using Categorical Principal Component Analysis
- An ordering of the response categories is required, for example, categories running from "strongly agree" to "strongly disagree"
- The distances between successive categories are recalculated within an iterative procedure (Gifi 1990)
- Since the order of the categories is retained, the minimum distance between two successive categories is zero (ties)
- The number of dimensions must be set in advance, often k=2
- Variables can be visualized with the help of biplots (Gower and Hand 1996)
- CatPCA is used especially to exclude methodologically-induced variation and to visualize the substantive information in ordinal data.
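With ordinal scaling, the retained category order is generally enforced by monotone regression inside the alternating least squares iterations (Gifi 1990). The sketch below shows only that single optimal-scaling step, using the weighted pool-adjacent-violators algorithm: categories that come out of order are pooled into ties, which is where the zero distances come from. It is a hypothetical NumPy illustration, not the CATPCA implementation of any statistics package; the raw quantifications are made up.

```python
import numpy as np

def pava(values, weights):
    """Weighted pool-adjacent-violators algorithm: the best non-decreasing
    least-squares fit to `values` with the given `weights`."""
    blocks = []  # each block: [weighted mean, total weight, number of categories]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge backwards while the order constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    return np.concatenate([np.full(n, m) for m, _, n in blocks])

def ordinal_quantification(raw, freqs):
    """Monotone category quantifications, standardized over cases to mean 0
    and standard deviation 1. Assumes the fitted values do not all tie."""
    freqs = np.asarray(freqs, dtype=float)
    q = pava(raw, freqs)
    mean = np.average(q, weights=freqs)
    sd = np.sqrt(np.average((q - mean) ** 2, weights=freqs))
    return (q - mean) / sd

# Hypothetical unconstrained quantifications for SA..SD; AS > NN violates the order
raw = [-1.5, 0.4, 0.2, 1.0, 1.3]
freqs = [736, 1224, 88, 444, 132]
print(np.round(ordinal_quantification(raw, freqs), 3))  # AS and NN become tied
```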
CatPCA Quantifications, divided by Political Interest (each cell: high PI / low PI)

| Item | Strongly agree | Agree somewhat | Neither nor | Disagree somewhat | Disagree strongly |
|---|---|---|---|---|---|
| Generally, those elected to parliament soon lose touch with the people. | -1.618 / -1.406 | 0.197 / 0.372 | 0.730 / 1.030 | 0.956 / 1.249 | 1.444 / 1.249 |
| I don't think the Fed. Gov. cares much about what people like me think. | -1.569 / -1.366 | -0.119 / 0.216 | 0.302 / 0.858 | 0.785 / 1.072 | 1.413 / 1.097 |
| Sometimes, Federal politics and government seem so complicated … . | -1.773 / -1.224 | -0.130 / 0.380 | 0.353 / 0.940 | 0.612 / 1.199 | 1.128 / 1.253 |
| People like me don't have any say about what the government in Ottawa does. | -1.524 / -1.187 | 0.039 / 0.393 | 0.551 / 0.945 | 0.722 / 1.129 | 1.108 / 1.129 |
| So many other people vote in Federal elections that it does not matter … . | -4.137 / -2.730 | -1.360 / -0.828 | -1.081 / 0.099 | -0.244 / 0.407 | 0.400 / 0.516 |
| Many people in the Federal Government are dishonest. | -2.142 / -2.133 | -0.674 / -0.303 | -0.172 / 0.170 | 0.281 / 0.566 | 1.232 / 1.306 |
| People in the Federal government waste a lot of the money we pay in taxes. | -1.024 / -0.965 | 0.660 / 0.880 | 1.121 / 1.136 | 1.387 / 1.371 | 1.472 / 1.484 |
| Most of the time we can trust people in the Fed. Gov. to do what is right. | -1.648 / -1.247 | -0.439 / -0.591 | -0.118 / -0.234 | 0.844 / 0.837 | 1.823 / 2.352 |
| Most of the people running the Federal Government are smart people … . | -1.212 / -0.892 | -0.453 / -0.550 | -0.114 / -0.102 | 0.904 / 0.969 | 2.203 / 2.542 |
An Idea for Measuring the Quality of Ordered Categorical Data: The Dirty Data Index (DDI)
CatPCA quantification values are standardized to a mean of zero and a standard deviation of one. The closer the data are to metric, the smaller the difference between the PCA and CatPCA solutions, and the closer the distribution of the quantification values is to the standard normal distribution. Comparing the quantification areas (derived from the standard normal distribution) with the cumulative frequencies (midpoints) therefore provides an indicator of the "quality" of the respective set of items.
Limitation: Missing data have to be excluded either by listwise deletion or by some kind of imputation technique.
Notation
N = number of cases
K = number of items, with k = item index
$J_k$ = number of categories of item k, with j = category index
$f_{jk}$ = frequency of category j of item k
$m_{jk}$ = mass (relative frequency) of category j of item k
$c_{(1,j)k}$ = cumulative relative frequency of categories 1 to j of item k
$q_{jk}$ = quantification of category j of item k (provided by CatPCA)
With (item index k dropped):
relative frequencies (masses) for each category: $m_j = f_j / N$
cumulative masses: $c_{(1,j)} = m_j + c_{(1,j-1)}$, with $c_{(1,0)} = 0$
First step: compute the midpoints of the item's categories on the scale of cumulative relative frequencies. Start with the mass of the first category and divide it by 2: $g_1 = m_1 / 2$ (the first midpoint). For the second category, add half of its mass to the full mass of the first category: $g_2 = m_1 + m_2 / 2$ (the second midpoint). Continue in the same way: add the masses of the first two categories plus half the mass of the third category, ..., and finally the first $J_k - 1$ masses, $c_{(1,J_k-1)}$, plus half the mass of the last category. Note: the number of midpoints equals the number of categories ($J_k$). For each item k:

$$g_j = c_{(1,j-1)} + \frac{m_j}{2}, \qquad j = 1, \dots, J_k, \qquad c_{(1,0)} = 0$$

or, recursively, $g_j = g_{j-1} + (m_{j-1} + m_j)/2$ with $g_0 = m_0 = 0$.
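Second step: compute the area to the left of each quantification value $q_{jk}$ under the standard normal distribution, $\Phi(q_{jk})$, where $\Phi$ is the standard normal cumulative distribution function.
Third step: compute the absolute differences between the midpoint areas and the quantification areas and add them up: $\sum_{j=1}^{J_k} \lvert g_{jk} - \Phi(q_{jk}) \rvert$, with $g_{jk}$ the midpoint of category j of item k.
Last step: from simulation studies, the upper bound of the sum from the third step is $J_k / (J_k - 1)$, with $J_k$ the number of categories; the DDI has to be standardized by this value:

$$\mathrm{DDI}_k = \frac{J_k - 1}{J_k} \sum_{j=1}^{J_k} \left\lvert g_{jk} - \Phi(q_{jk}) \right\rvert$$

The lower bound of the DDI is 0 (a theoretical value). Using random data, depending on the given distribution ("normal", "u-shaped", …), the DDI fluctuates between 0.5 and 0.7 (simulation studies).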
FG_1: Generally, those elected to parliament soon lose touch with the people

| Category | A) Freq. | B) Quantif. | C) Mass | D) Cum. Mass | E) Mpts Area | F) Qtf. Area | G) Diff. |
|---|---|---|---|---|---|---|---|
| 1 | 736 | -1.508 | 0.280 | 0.280 | 0.140 | 0.066 | 0.075 |
| 2 | 1224 | 0.306 | 0.466 | 0.747 | 0.514 | 0.620 | 0.106 |
| 3 | 88 | 0.717 | 0.034 | 0.780 | 0.764 | 0.763 | 0.001 |
| 4 | 444 | 1.122 | 0.169 | 0.950 | 0.865 | 0.869 | 0.004 |
| 5 | 132 | 1.322 | 0.050 | 1.000 | 0.975 | 0.907 | 0.068 |
| Sum | 2624 | | 1.000 | | | | 0.253 |

With C(1) = A(1) / N = 736 / 2624 = 0.280; D = cumulative values of C; E(1) = D(1) / 2 = 0.280 / 2 = 0.140, E(2) = D(1) + C(2) / 2 = 0.280 + 0.466 / 2 = 0.514; …; E(5) = D(4) + C(5) / 2 = 0.950 + 0.050 / 2 = 0.975.
F(1) = STANDNORMDIS(B(1)) = STANDNORMDIS(-1.508) = 0.066, i.e. the area under the standard normal curve to the left of B(1).
G(1) = ABS(E(1) - F(1)). Note: the sum of G has to be standardized by $J_k / (J_k - 1) = 5/4 = 1.25$.
DDI(FG_1) = 0.253 / 1.25 = 0.203
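The DDI steps can be written as a short function. A minimal plain-Python sketch, assuming one item with complete data (missing values already removed) and the standardization of the summed differences by (number of categories) / (number of categories - 1), i.e. 5/4 = 1.25 for a five-point item; `ddi_item` and its helpers are illustrative names, not the authors' code. With the FG_1 frequencies and quantifications it reproduces the worked example.

```python
import math

def std_normal_cdf(x):
    """Area under the standard normal density to the left of x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def midpoint_areas(freqs):
    """g_j = cumulative mass of categories 1..j-1 plus half the mass of category j."""
    n = sum(freqs)
    mids, cum = [], 0.0
    for f in freqs:
        m = f / n
        mids.append(cum + m / 2.0)
        cum += m
    return mids

def ddi_item(freqs, quants):
    """Dirty Data Index of one item from its category frequencies and CatPCA quantifications."""
    if len(freqs) != len(quants) or len(freqs) < 2:
        raise ValueError("need one quantification per category and at least two categories")
    n_cats = len(freqs)
    raw = sum(abs(g - std_normal_cdf(q))
              for g, q in zip(midpoint_areas(freqs), quants))
    return raw / (n_cats / (n_cats - 1))

# Worked example FG_1 (frequencies and quantifications from the table above)
fg1_freqs = [736, 1224, 88, 444, 132]
fg1_quants = [-1.508, 0.306, 0.717, 1.122, 1.322]
print(f"DDI(FG_1) = {ddi_item(fg1_freqs, fg1_quants):.3f}")  # 0.203, as in the worked example
```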
Calculation of DDI, Fedgov, divided by Political Interest

| Item | High PI, N = 1,243 | Low PI, N = 1,381 |
|---|---|---|
| Generally, those elected to parliament soon lose touch with the people. | 0.1993 | 0.2639 |
| I don't think the Federal Government cares much about what people like me think. | 0.2014 | 0.3854 |
| Sometimes, Federal politics and government seem so complicated that a person like me … . | 0.3967 | 0.3368 |
| People like me don't have any say about what the government in Ottawa does. | 0.3643 | 0.3965 |
| So many other people vote in Federal elections that it does not matter very much … . | 0.3581 | 0.6530 |
| Many people in the Federal Government are dishonest. | 0.1188 | 0.2281 |
| People in the Federal government waste a lot of the money we pay in taxes. | 0.2566 | 0.2807 |
| Most of the time we can trust people in the Federal Government to do what is right. | 0.2219 | 0.5204 |
| Most of the people running the Federal Government are smart people who usually know … . | 0.5615 | 0.5194 |
| Average Distance | 0.2976 | 0.3982 |
Calculation of DDI, Fatigue Effect

| Item | FG; N = 2,624 | PG; N = 2,579 |
|---|---|---|
| Generally, those elected to parliament soon lose touch with the people. | 0.2026 | 0.3481 |
| I don't think the Federal Government cares much about what people like me think. | 0.3002 | 0.4245 |
| Sometimes, Federal politics and government seem so complicated that a person like me can't really … . | 0.4194 | 0.4896 |
| People like me don't have any say about what the government in Ottawa does. | 0.3959 | 0.4864 |
| So many other people vote in Federal elections that it does not matter very much whether I vote or not. | 0.4618 | 0.7136 |
| Many people in the Federal Government are dishonest. | 0.1459 | 0.2662 |
| People in the Federal government waste a lot of the money we pay in taxes. | 0.2172 | 0.3172 |
| Most of the time we can trust people in the Federal Government to do what is right. | 0.3750 | 0.5432 |
| Most of the people running the Federal Government are smart people who usually know what they are doing. | 0.5504 | 0.6148 |
| Average Distance | 0.3409 | 0.4671 |
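The "Average Distance" rows are the means of the nine item-level DDIs. A short Python check using only the published values from the two tables above; the dictionary keys and variable names are illustrative.

```python
from statistics import mean

# Item-level DDI values (items a to i) copied from the two tables above
ddi = {
    "High PI":    [0.1993, 0.2014, 0.3967, 0.3643, 0.3581, 0.1188, 0.2566, 0.2219, 0.5615],
    "Low PI":     [0.2639, 0.3854, 0.3368, 0.3965, 0.6530, 0.2281, 0.2807, 0.5204, 0.5194],
    "Federal":    [0.2026, 0.3002, 0.4194, 0.3959, 0.4618, 0.1459, 0.2172, 0.3750, 0.5504],
    "Provincial": [0.3481, 0.4245, 0.4896, 0.4864, 0.7136, 0.2662, 0.3172, 0.5432, 0.6148],
}

# "Average Distance" = unweighted mean over the nine items
averages = {group: mean(values) for group, values in ddi.items()}
for group, avg in averages.items():
    print(f"{group}: {avg:.4f}")

# Item indices (0 = a, ..., 8 = i) where low-PI respondents have the lower DDI
low_better = [i for i, (h, l) in enumerate(zip(ddi["High PI"], ddi["Low PI"])) if l < h]

# Number of items where the provincial DDI exceeds the federal one
provincial_higher = sum(p > f for f, p in zip(ddi["Federal"], ddi["Provincial"]))
print("low PI better on items:", low_better)
print("provincial DDI higher on", provincial_higher, "of 9 items")
```

On these values, the provincial DDI is higher than the federal one on all nine items, while the low-PI group has the higher DDI on seven of nine items (the exceptions are items c and i).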
Conclusion
- MCA and SMCA make minimal demands on the data. Both are particularly suited for distinguishing between substantive and methodologically-induced variation.
- Comparing the inertias and the explained variances of dimension 1 from SMCA indicates differences in the amount of methodologically-induced variation; in the given example, acquiescence and the understanding of questions.
- For measuring the "quality of ordinal data", we constructed the Dirty Data Index (DDI), based on the differences between the quantification areas from CatPCA (from the standard normal distribution) and the empirical cumulative frequencies (midpoint areas).
- Using this index for assessing the quality of ordinal data, we could show that in the CNES 1984, respondents with high political interest provide "better data" than those with low political interest. Furthermore, comparing the DDI computed on the same items collected at different points in time (with only slightly different subjects, here the federal and the provincial government) shows a clear fatigue effect in the data: the responses given roughly half an hour earlier are of better quality.
- While SMCA is based on the single categories over all items, DDI