
SLIDE 1

How Item Response Theory can solve problems of ipsative data

17th International Meeting of the Psychometric Society

Hong Kong 2011

Doctoral student: Anna Brown
Advisor: Prof. Alberto Maydeu-Olivares

SLIDE 2

Motivation for this work

Practical problem that desperately needed a solution

– Old
– Very widespread (mainly workplace selection and assessment tools, millions of administrations per year)
– Thousands of pages in journals have been devoted to the problem over the years
– Psychometrics, with all its sophisticated methodologies, had failed to provide a solution

SLIDE 3

Forced-choice response format

Curse of ipsative data

SLIDE 4


SLIDE 5


SLIDE 6


SLIDE 7


SLIDE 8


SLIDE 9

Forced-choice format

Multidimensional Forced-Choice (MFC) format

– Rank-order two or more items from different dimensions

A. I manage to relax easily
B. I am careful over detail
C. I enjoy working with others

SLIDE 10

Advantages of comparative formats

– Designed to reduce response biases

Direct comparison overcomes problems with interpretation of the rating scale

Impossible to endorse all items, thus:

– Eliminates uniform biases / response sets (acquiescence, extreme/central tendency responding)
– Reduces halo/horn effects
– May reduce effects of socially desirable responding

SLIDE 11

Classical scoring of FC formats

Inverted rank orders of items are added to their respective scales

– The same number of points in each block is allocated for any individual
– The total score on the test is constant for each individual (ipsative data)

Example block (slide columns: Most, Least, Classical score)

A. I manage to relax easily (classical score 1)
B. I am careful over detail (classical score 2)
C. I enjoy working with others

SLIDE 12

Problems of ipsative data

1. Scores are relative

– Impossible to get all high/low scores
– Inter-individual comparisons are problematic

2. Construct validity is distorted

– Variance of the total test score is zero
– Negative average scale inter-correlation: $\bar{r} = -\frac{1}{d-1}$ for $d$ scales

3. Criterion-related validity is distorted

– Correlations with an external criterion must sum to zero
– Compensatory correlations

4. Reliability estimates are distorted

– Basic assumptions are violated (Cronbach’s alpha and other coefficients)
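The first two problems can be reproduced with a small simulation. The sketch below is illustrative only: the function names, the design (4 scales, 20 blocks with one item per scale, purely random rankings) and the sample size are assumptions, not taken from the presentation. It scores each block by inverted ranks, then checks that every respondent receives the same total and that the average scale inter-correlation sits near -1/(d-1).

```python
import random
from itertools import combinations
from statistics import mean


def classical_scores(block_rankings, n_scales):
    """Add inverted ranks to their scales (rank 1 = most like me)."""
    totals = [0] * n_scales
    for ranks in block_rankings:
        # ranks[s] is the rank given to the item that measures scale s
        for scale, rank in enumerate(ranks):
            totals[scale] += n_scales - rank
    return totals


def pearson(x, y):
    """Plain Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


random.seed(1)
N_SCALES, N_BLOCKS, N_PEOPLE = 4, 20, 500  # hypothetical design

people = []
for _ in range(N_PEOPLE):
    blocks = [random.sample(range(1, N_SCALES + 1), N_SCALES)
              for _ in range(N_BLOCKS)]
    people.append(classical_scores(blocks, N_SCALES))

totals = {sum(scores) for scores in people}   # distinct total test scores
columns = list(zip(*people))                  # one tuple of scores per scale
avg_r = mean(pearson(columns[i], columns[j])
             for i, j in combinations(range(N_SCALES), 2))

print(totals)  # {120}
print(round(avg_r, 3), "vs", round(-1 / (N_SCALES - 1), 3))
```

Every block hands out 3 + 2 + 1 + 0 = 6 points, so 20 blocks always give a total of 120 regardless of how the respondent ranks the items, which is why the total score carries no variance.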

SLIDE 13

Inadequate scoring

Classical scoring methodology is inadequate for forced-choice items

– Rankings are treated as ratings (i.e. relative scores are treated as absolute)
– Items within each block are NOT assessed independently

Need to radically depart from classical scoring schemes

– Modelling the psychological process of responding to forced-choice items is the key to making sense of comparative data

Suitable psychological models for such data have existed for a long time, and they are well known

SLIDE 14

Thurstonian IRT model

Modelling the decision process behind responding to forced-choice items

SLIDE 15

Psychological value

Louis Thurstone (1927-1929) introduced the notion of a psychological value or utility

– Describes “the affect that the object calls forth”
– Varies across individuals for the same object, and across objects within the same individual
– Can be placed on a psychological continuum
– Assumed normally distributed across individuals

The notion of utility maximisation

– When confronted with choosing between two items, respondents will choose the item with the higher psychological value (utility)

SLIDE 16

Law of comparative judgement (1927)

A respondent prefers item i to item k if her or his utility $t_i$ is larger than $t_k$:

$$y_l = \begin{cases} 1, & \text{if } t_i \ge t_k \\ 0, & \text{if } t_i < t_k \end{cases}$$

The difference of two utilities is normally distributed:

$$y_l^* = t_i - t_k$$

The binary outcome of comparison $y_l$ is linked to $y_l^*$ through a threshold process:

$$y_l = \begin{cases} 1, & \text{if } y_l^* \ge 0 \\ 0, & \text{if } y_l^* < 0 \end{cases}$$

SLIDE 17

Binary coding of ranking data

Any ordering or ranking of n choice alternatives requires n(n-1)/2 separate comparison judgements:

– Rank-ordering of 2 items elicits 1 comparison: {A,B}
– Rank-ordering of 3 items elicits 3 comparisons: {A,B} {A,C} {B,C}
– Rank-ordering of 4 items elicits 6 comparisons: {A,B} {A,C} {A,D} {B,C} {B,D} {C,D}
– Etc.
– “Most” – “least” format with 4 or more alternatives is a partial ranking

Any comparison {i, k} is coded as 1 if i is preferred to k, and 0 otherwise

Ordering {B, A, C} can be equivalently presented as 3 outcomes:

{A, B} = 0, {A, C} = 1, {B, C} = 1
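The coding rule is mechanical, so it can be written as a few lines of code. The following is a minimal sketch, not the author's implementation; the function name `ranking_to_binary` and the convention that a ranking is a list with the most preferred item first are assumptions made for illustration.

```python
from itertools import combinations


def ranking_to_binary(ranking, items=None):
    """Code a full ranking (most preferred first) as pairwise outcomes.

    Returns {(i, k): 1 if i is preferred to k else 0} for every pair,
    with pairs taken in the fixed order of `items` (default: sorted labels).
    """
    items = sorted(ranking) if items is None else list(items)
    position = {item: place for place, item in enumerate(ranking)}
    return {(i, k): int(position[i] < position[k])
            for i, k in combinations(items, 2)}


outcomes = ranking_to_binary(["B", "A", "C"])
print(outcomes)  # {('A', 'B'): 0, ('A', 'C'): 1, ('B', 'C'): 1}

# A block of n items always yields n(n-1)/2 binary outcomes.
print(len(ranking_to_binary(["D", "C", "B", "A"])))  # 6
```

Keeping the pairs in a fixed item order (rather than in ranked order) means every respondent produces the same set of outcome variables, which is what a model fitted across respondents needs.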

SLIDE 18

Thurstonian factor models

Maydeu-Olivares (1999); Maydeu-Olivares & Böckenholt (2005)

Outcomes of comparisons are determined by the difference in utilities (no error terms)

Second-order factors (traits) can be modelled

Identification constraints are needed

– Fixing uniqueness of one utility per block
– Factor variances are fixed to 1

Special identification cases

– Pairs of items

[Figure: path diagram of the Thurstonian second-order factor model for nine items in three triplets, linking the latent differences y*(1,2), ..., y*(8,9) to the utilities t1 to t9 and three correlated traits]

SLIDE 19

IRT reparameterization

Thurstonian second-order models cannot be used directly in person-centric applications

– We are interested in persons’ traits (second-order factors), not the utilities (first-order factors)
– But the latent traits cannot be estimated

Re-parameterization as an IRT model (first-order)

Utilities of items i and k are functions of underlying factors (traits) a and b:

$$t_i = \mu_i + \lambda_i \eta_a + \varepsilon_i, \qquad t_k = \mu_k + \lambda_k \eta_b + \varepsilon_k$$

Latent difference of utilities $y_l^* = t_i - t_k$ is a function of the traits:

$$y_l^* = t_i - t_k = (\mu_i - \mu_k) + (\lambda_i \eta_a - \lambda_k \eta_b) + (\varepsilon_i - \varepsilon_k) = -\gamma_l + (\lambda_i \eta_a - \lambda_k \eta_b) + (\varepsilon_i - \varepsilon_k)$$

where $\gamma_l = -(\mu_i - \mu_k)$

SLIDE 20

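Item response function

The IRF for the binary outcome variable $y_l$, which is the result of comparison between items i and k measuring traits a and b, is

$$\Pr(y_l = 1 \mid \eta_a, \eta_b) = \Phi\left(\frac{-\gamma_l + \lambda_i \eta_a - \lambda_k \eta_b}{\sqrt{\psi_i^2 + \psi_k^2}}\right)$$

In intercept / slope form:

$$\Pr(y_l = 1 \mid \eta_a, \eta_b) = \Phi(\alpha_l + \beta_i \eta_a - \beta_k \eta_b)$$

Special case of same-trait (one-dimensional) comparisons:

$$\Pr(y_l = 1 \mid \eta) = \Phi\left(\alpha_l + (\beta_i - \beta_k)\eta\right)$$

[Figure: response surface of P(y=1) plotted against trait 1 and trait 2]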
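The two forms of the item response function can be checked against each other numerically. The sketch below assumes the normal-ogive form Φ((-γ + λ_i η_a - λ_k η_b) / sqrt(ψ_i² + ψ_k²)) and obtains the intercept and slopes by dividing the threshold and loadings by the same square root; the function names and all parameter values are hypothetical and chosen only for illustration.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def irf(eta_a, eta_b, gamma_l, lam_i, lam_k, psi2_i, psi2_k):
    """P(y_l = 1 | eta_a, eta_b) in threshold / loading form."""
    scale = sqrt(psi2_i + psi2_k)
    return normal_cdf((-gamma_l + lam_i * eta_a - lam_k * eta_b) / scale)


def irf_intercept_slope(eta_a, eta_b, alpha_l, beta_i, beta_k):
    """The same probability in intercept / slope form."""
    return normal_cdf(alpha_l + beta_i * eta_a - beta_k * eta_b)


# Hypothetical item parameters (not taken from the presentation).
gamma_l, lam_i, lam_k, psi2_i, psi2_k = 0.2, 1.0, 0.8, 0.3, 0.6

scale = sqrt(psi2_i + psi2_k)
alpha_l = -gamma_l / scale
beta_i = lam_i / scale
beta_k = lam_k / scale

p_loading = irf(1.0, -0.5, gamma_l, lam_i, lam_k, psi2_i, psi2_k)
p_slope = irf_intercept_slope(1.0, -0.5, alpha_l, beta_i, beta_k)
print(p_loading, p_slope)  # the two forms agree up to rounding
```

With both items positively keyed, the probability of preferring item i rises with the trait measured by i and falls with the trait measured by k, which is the shape the response surface on the slide displays.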

SLIDE 21

Thurstonian IRT model

Outcomes of comparisons are indicators of common factors (traits)

Special features for blocks of 3 or more items

– Factor loadings are structured
– Uniquenesses are structured
– Structured local dependencies

Identification constraints are the same as for the second-order model

[Figure: path diagram of the Thurstonian IRT model for nine items in three triplets, with the latent differences y*(1,2), ..., y*(8,9) loading directly on three correlated traits]
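One way to see where the structure in a triplet comes from is to write the three latent differences as contrasts of the three utilities. The sketch below is an illustration under stated assumptions: each item loads on exactly one trait, unique factors are uncorrelated, and the loadings and unique variances are made-up numbers. The helper names `matmul` and `transpose` are hypothetical utilities, not part of any library.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


# Contrasts for one triplet: rows are the pairs {1,2}, {1,3}, {2,3}.
A = [[1, -1, 0],
     [1, 0, -1],
     [0, 1, -1]]

# Hypothetical loadings (items x traits) and unique variances (diagonal).
loadings = [[0.9, 0.0, 0.0],
            [0.0, 0.7, 0.0],
            [0.0, 0.0, 0.8]]
unique_var = [[0.4, 0.0, 0.0],
              [0.0, 0.5, 0.0],
              [0.0, 0.0, 0.6]]

# Loadings of the three binary outcomes on the three traits.
pair_loadings = matmul(A, loadings)
# Residual covariance matrix of the three latent differences.
pair_resid_cov = matmul(matmul(A, unique_var), transpose(A))

for row in pair_loadings:
    print(row)
for row in pair_resid_cov:
    print([round(v, 2) for v in row])
```

In this example each item's loading appears in two outcomes (with opposite sign when the item is second in the pair), each residual variance is the sum of two unique variances, and outcomes that share an item have non-zero residual covariance. That is the sense in which loadings, uniquenesses and local dependencies are structured rather than free.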



SLIDE 22

Estimation and scoring

Estimated with general-purpose SEM software Mplus (Muthén & Muthén, 1998-2010)

Limited information methods are the only option for most applications

– When partial ranking format is used, Bayesian multiple imputations (MI) are recommended

Respondents' trait levels are estimated by the maximum a posteriori (MAP) method

– Computationally efficient and unaffected by the number of latent traits

SLIDE 23

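Item information function

Direction of information is considered

Information in direction of trait a for one binary outcome:

$$I_l^a(\eta_a, \eta_b) = \frac{\left[\beta_i - \beta_k \,\mathrm{corr}(\eta_a, \eta_b)\right]^2 \left[\phi(\alpha_l + \beta_i \eta_a - \beta_k \eta_b)\right]^2}{P_l(\eta_a, \eta_b)\left[1 - P_l(\eta_a, \eta_b)\right]}$$

– Smaller when traits are positively correlated

One-dimensional case:

$$I_l(\eta) = \frac{(\beta_i - \beta_k)^2 \left[\phi(\alpha_l + (\beta_i - \beta_k)\eta)\right]^2}{P_l(\eta)\left[1 - P_l(\eta)\right]}$$

[Figure: surface of information in direction of trait 1 plotted against trait 1 and trait 2]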
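The effect of the trait correlation on information can be illustrated numerically. The sketch below assumes the directional information has the form [β_i - β_k corr(η_a, η_b)]² φ(z)² / (P(1 - P)) with z = α_l + β_i η_a - β_k η_b, where φ is the standard normal density and P = Φ(z); the function name and the parameter values are hypothetical.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def info_direction_a(eta_a, eta_b, alpha_l, beta_i, beta_k, corr_ab):
    """Information from one binary outcome in the direction of trait a."""
    z = alpha_l + beta_i * eta_a - beta_k * eta_b
    p = normal_cdf(z)
    directional_slope = beta_i - beta_k * corr_ab
    return directional_slope ** 2 * normal_pdf(z) ** 2 / (p * (1.0 - p))


# Two positively keyed items with equal slopes, evaluated at the trait means.
info_uncorrelated = info_direction_a(0.0, 0.0, 0.0, 1.0, 1.0, corr_ab=0.0)
info_positive = info_direction_a(0.0, 0.0, 0.0, 1.0, 1.0, corr_ab=0.5)
info_negative = info_direction_a(0.0, 0.0, 0.0, 1.0, 1.0, corr_ab=-0.5)

print(info_positive, info_uncorrelated, info_negative)
```

For two items keyed in the same direction, a positive trait correlation shrinks the directional slope and so lowers the information, while a negative correlation raises it. Setting the correlation to 1 and both traits equal gives the one-dimensional case.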

SLIDE 24

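Test information and reliability

Test information in direction of trait a:

$$I^a(\eta) = \sum_l I_l^a(\eta)$$

When using posterior latent trait estimator:

$$I_P^a(\eta) = I^a(\eta) - \frac{\partial^2 \ln \phi(\eta)}{\partial \eta_a^2}$$

Standard error for trait a:

$$SE(\hat{\eta}_a) = \frac{1}{\sqrt{I_P^a(\hat{\eta})}}$$

Empirical reliability (for estimated scores in a sample):

$$\bar{\sigma}^2_{error} = \frac{1}{N} \sum_{j=1}^{N} \frac{1}{I_P^a(\hat{\eta}_j)}, \qquad \rho = \frac{\sigma_P^2 - \bar{\sigma}^2_{error}}{\sigma_P^2}$$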
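Given estimated scores and their posterior information values, the standard error and empirical reliability are simple to compute. The sketch below assumes that the error variance is the sample average of 1 / I_P and that σ²_P is the sample variance of the estimated scores (the n - 1 version is used here, which is an assumption). Function names and the toy numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standard_error(posterior_info):
    """SE of a trait estimate from its posterior test information."""
    return 1.0 / sqrt(posterior_info)


def empirical_reliability(estimated_scores, posterior_infos):
    """(score variance - mean error variance) / score variance."""
    score_var = variance(estimated_scores)
    error_var = mean(1.0 / info for info in posterior_infos)
    return (score_var - error_var) / score_var


# Toy data for three respondents on one trait.
scores = [-1.0, 0.0, 1.0]
infos = [4.0, 4.0, 4.0]

print(standard_error(infos[0]))              # 0.5
print(empirical_reliability(scores, infos))  # 0.75
```

Because information varies with the trait level, each respondent gets an individual standard error, and the reliability is a sample-level summary of those errors.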

SLIDE 25

Application – CCSQ

Empirical applications

SLIDE 26

CCSQ instrument and sample

Customer Contact Styles Questionnaire (CCSQ)

– Measures 16 work-related traits, used in assessment for customer service roles (published by SHL)
– Forced-choice format

128 items grouped into 32 quads (7 to 10 items per scale)

– All items are also administered with a 5-point rating scale

Sample

– N=610
– Paper & Pencil UK standardisation sample from 2001
– 39% female
– Half of the sample applicants, half job incumbents

SLIDE 27

CCSQ model estimation

Missing data problem arises

– Missing at random (MAR) but not missing completely at random (MCAR)
– Limited information methods would produce distorted parameter estimates (Asparouhov & Muthén, 2010)

Multiple imputations in Mplus are performed

Model parameters are estimated on the 10 imputed datasets (ULS estimator)

Person parameters are estimated on the original dataset (MAP estimator)

[Figure: a partial ranking of items A, B, C, D (“most” / “least”) recoded as binary outcomes {A,B} {A,C} {A,D} {B,C} {B,D} {C,D}, with the unobserved comparison shown as “.”]
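The source of the missing data can be shown by coding a “most” / “least” response directly. The sketch below is illustrative: the function name `most_least_to_binary` and the use of `None` for an unobserved comparison are assumptions, and the example response (A as most, D as least) is hypothetical rather than the one shown on the slide.

```python
from itertools import combinations


def most_least_to_binary(items, most, least):
    """Code a most/least choice as pairwise outcomes.

    1 means the first item of the pair is preferred, 0 the second,
    and None marks a comparison the format does not reveal.
    """
    outcomes = {}
    for i, k in combinations(items, 2):
        if i == most or k == least:
            outcomes[(i, k)] = 1
        elif k == most or i == least:
            outcomes[(i, k)] = 0
        else:
            outcomes[(i, k)] = None  # both items were left unmarked
    return outcomes


quad = most_least_to_binary(["A", "B", "C", "D"], most="A", least="D")
print(quad)

n_missing = sum(value is None for value in quad.values())
print(n_missing)  # 1
```

In a quad, only the comparison between the two unmarked items is unobserved. Whether it is observed depends on the respondent's other choices, which is consistent with the data being missing at random but not completely at random.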

SLIDE 28

Example profile

[Figure: example profile of standardized scale scores on the 16 CCSQ scales (Persuasive, Self-control, Empathic, Modest, Participative, Sociable, Analytical, Innovative, Flexible, Structured, Detail conscious, Conscientious, Resilience, Competitive, Results orientated, Energetic), comparing four sets of scores: Normative, Ipsative, IRT Single-Stimulus and IRT Forced-Choice]

SLIDE 29

No constraint on overall test score

Can have all high/low scores

– Scores are directly interpretable for comparison between individuals

[Figure: two histograms of the average profile score (average of all trait scores) against percentage of respondents; one compares CTT ipsative with CTT normative scores, the other IRT-FC with IRT-SS scores]

SLIDE 30

Scores are no longer ipsative

Absolute trait locations are recovered well

– Mahalanobis distances between rating and ranking scores are much smaller with the IRT method

Construct validity undistorted

– Average scale inter-correlation for ratings r = 0.21
– For rankings scored with IRT r = 0.12
– And for classically scored rankings (ipsative) r = -0.07
– Identical factors extracted from rankings and ratings

Reliabilities can now be estimated, as well as SE for each individual set of scores

SLIDE 31

Optimal forced-choice designs

Simulation studies

SLIDE 32

Simulation designs

Conditions crossed (1000 replications with 1000 cases):

Number of traits | 2 | 5
Correlations between traits | 0, +0.5, -0.5 | as in FFM
Number of items | 12 and 24 items per trait | 12 items per trait
Block sizes | pairs | pairs, triplets, quads
Keyed direction of items | positively worded / positively and negatively worded | positively worded / positively and negatively worded

For all reasonable designs

– Good parameter recovery, including correlations between traits
– Good latent trait recovery
– Empirical chi-square rejection rates for more complex models are too high
– Empirical reliabilities are sufficiently close to actual reliabilities

SLIDE 33

Forced-choice design rules

Given a sufficient number of good-quality items, the following are important factors:

Positively and negatively keyed items

– When approximately the same number of binary outcomes come from comparing items keyed in the same direction and in opposite directions, trait recovery can be good with any number of traits and any trait correlations

Number of traits assessed

– When the number of traits is large, and traits are not strongly positively correlated overall, any forced-choice design will reliably locate trait scores

Correlations between traits

– Comparing items keyed in the same direction is more effective the lower the correlations between the latent traits

Block size

– The same items provide more information when combined in larger blocks

SLIDE 34

Ipsative problem solved

Conclusions

SLIDE 35

Thurstonian IRT model in perspective

Other IRT models have been suggested for creating new FC questionnaires

– McCloy et al. (2005)
– Stark, Chernyshenko & Drasgow (2005)
– These models do not provide a solution for the existing forced-choice tests

Thurstonian IRT model can be readily applied to any forced-choice data, with the objective of estimating

– item parameters,
– relationships between the latent traits,
– and persons’ parameters.

It works with any existing tests using ranking format

– Any number of items per block
– Any number of traits
– Multi- and one-dimensional comparisons

SLIDE 36

Growing area of research

Benefits of the forced-choice format can be enjoyed without the disadvantages of ipsative data

– Reducing halo effects in research
– Cross-cultural research free of response sets
– Exploration of factor structures without method factors
– Etc.

Embedding the model in an SEM framework allows further latent variable modeling

There should be no more ipsative data – the problem of ipsative data has been effectively solved

SLIDE 37

Acknowledgements

This work would not have been possible without

– the extraordinary support of my advisor Alberto Maydeu-Olivares
– moral support given to me by Simon

Thanks to my former employer SHL Group and my research director Dave Bartram

Thanks to the SMEP for the Dissertation support award

Thanks to the Psychometric Society for the 2011 Dissertation Award

Thank you for listening!