How Item Response Theory can solve problems of ipsative data

17th International Meeting of the Psychometric Society, Hong Kong 2011. Doctoral student: Anna Brown


slide-1
SLIDE 1

How Item Response Theory can solve problems of ipsative data

17th International Meeting of the Psychometric Society

Hong Kong 2011

Doctoral student: Anna Brown
Advisor: Prof. Alberto Maydeu-Olivares

slide-2
SLIDE 2
Motivation for this work

  • Practical problem that desperately needed a solution

– Old
– Very widespread (mainly workplace selection and assessment tools, millions of administrations per year)
– Thousands of pages in journals have been devoted to the problem over the years
– Psychometrics, with all its sophisticated methodologies, had failed to provide a solution

slide-3
SLIDE 3

Forced-choice response format

Curse of ipsative data

slide-4
SLIDE 4

slide-5
SLIDE 5

slide-6
SLIDE 6

slide-7
SLIDE 7

slide-8
SLIDE 8

slide-9
SLIDE 9
Forced-choice format

  • Multidimensional Forced-Choice (MFC) format

– Rank-order two or more items from different dimensions

  • A. I manage to relax easily
  • B. I am careful over detail
  • C. I enjoy working with others

slide-10
SLIDE 10

Advantages of comparative formats

– Designed to reduce response biases

  • Direct comparison overcomes problems with interpretation of the rating scale
  • Impossible to endorse all items, thus:

– Eliminates uniform biases / response sets (acquiescence, extreme/central tendency responding)
– Reduces halo/horn effects
– May reduce effects of socially desirable responding

slide-11
SLIDE 11
Classical scoring of FC formats

  • Inverted rank orders of items are added to their respective scales

– The same number of points in each block is allocated for any individual
– The total score on the test is constant for each individual (ipsative data)

Classical score

  • A. I manage to relax easily: 1
  • B. I am careful over detail: 2
  • C. I enjoy working with others

[Response columns on the slide: Most / Least; the response marks are not recoverable]

slide-12
SLIDE 12
Problems of ipsative data

  • 1. Scores are relative

– Impossible to get all high/low scores
– Inter-individual comparisons are problematic

  • 2. Construct validity is distorted

– Variance of the total test score is zero
– Negative average scale inter-correlation: $\bar{r} = -\frac{1}{d-1}$ for d scales of equal variance

  • 3. Criterion-related validity is distorted

– Correlations with an external criterion must sum to zero
– Compensatory correlations

  • 4. Reliability estimates are distorted

– Basic assumptions are violated (Cronbach’s alpha and other coefficients)
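The properties listed on this slide can be checked numerically. The sketch below is only an illustration under stated assumptions: a hypothetical test with one item per scale in every block, respondents who rank at random, and a made-up external criterion. The function name `classical_scores` and all parameter values are invented for the example. It shows covariances (rather than correlations) with the criterion summing to zero, which is the exact form of the constraint when scale variances differ.

```python
import random
from itertools import combinations

def classical_scores(rankings):
    """Sum inverted ranks per scale for one respondent.

    rankings[b][s] is the rank (1 = most like me) given in block b to the
    item measuring scale s; every block is assumed to hold one item per scale.
    """
    d = len(rankings[0])
    return [sum(d - block[s] for block in rankings) for s in range(d)]

def mean(xs):
    return sum(xs) / len(xs)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def corr(x, y):
    return cov(x, y) / (cov(x, x) * cov(y, y)) ** 0.5

rng = random.Random(1)
d, n_blocks, n_people = 4, 20, 2000   # hypothetical design

# Hypothetical respondents who rank every block at random
scores = [
    classical_scores([rng.sample(range(1, d + 1), d) for _ in range(n_blocks)])
    for _ in range(n_people)
]
scales = [[row[s] for row in scores] for s in range(d)]
criterion = [rng.gauss(0, 1) for _ in range(n_people)]  # made-up external criterion

totals = {sum(row) for row in scores}   # one value only: the total score is constant
avg_r = mean([corr(scales[a], scales[b]) for a, b in combinations(range(d), 2)])
cov_sum = sum(cov(scale, criterion) for scale in scales)  # zero up to rounding error

print(totals, round(avg_r, 3), round(cov_sum, 6))
```

With equal scale variances the average inter-correlation would be exactly -1/(d-1); in a finite sample it is only close to that value.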

slide-13
SLIDE 13
Inadequate scoring

  • Classical scoring methodology is inadequate for forced-choice items

– Rankings are treated as ratings (i.e. relative scores are treated as absolute)
– Items within each block are NOT assessed independently

  • Need to radically depart from classical scoring schemes

– Modelling the psychological process of responding to forced-choice items is the key to making sense of comparative data

  • Suitable psychological models for such data have existed for a long time, and they are well known

slide-14
SLIDE 14

Thurstonian IRT model

Modelling the decision process behind responding to forced-choice items

slide-15
SLIDE 15
Psychological value

  • Louis Thurstone (1927-1929) introduced the notion of a psychological value or utility

– Describes “the affect that the object calls forth”
– Varies across individuals for the same object, and across objects within the same individual
– Can be placed on a psychological continuum
– Assumed to be normally distributed across individuals

  • The notion of utility maximisation

– When confronted with choosing between two items, respondents will choose the item with the higher psychological value (utility)

slide-16
SLIDE 16
Law of comparative judgement (1927)

  • A respondent prefers item i to item k if her or his utility $t_i$ is larger than $t_k$:

$$y_l = \begin{cases} 1, & \text{if } t_i \ge t_k \\ 0, & \text{if } t_i < t_k \end{cases}$$

  • The difference of two utilities is normally distributed:

$$y_l^* = t_i - t_k$$

  • The binary outcome of the comparison, $y_l$, is linked to $y_l^*$ through a threshold process:

$$y_l = \begin{cases} 1, & \text{if } y_l^* \ge 0 \\ 0, & \text{if } y_l^* < 0 \end{cases}$$
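As a rough illustration of the threshold process, the sketch below simulates many comparisons of two items and compares the observed preference rate with the normal-theory probability. It assumes, for simplicity, that the two utilities are independent normal variables; the slide only states that their difference is normal. The means, standard deviations and function names are hypothetical.

```python
import math
import random

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def preference_probability(mu_i, mu_k, sd_i=1.0, sd_k=1.0):
    """P(y_l = 1) when t_i and t_k are independent normal utilities (assumption)."""
    return normal_cdf((mu_i - mu_k) / math.sqrt(sd_i ** 2 + sd_k ** 2))

def simulate_choice(mu_i, mu_k, rng, sd_i=1.0, sd_k=1.0):
    """Draw both utilities and apply the threshold process to their difference."""
    t_i = rng.gauss(mu_i, sd_i)
    t_k = rng.gauss(mu_k, sd_k)
    y_star = t_i - t_k              # latent difference of utilities
    return 1 if y_star >= 0 else 0  # binary outcome of the comparison

rng = random.Random(0)
draws = [simulate_choice(0.5, 0.0, rng) for _ in range(20000)]
observed = sum(draws) / len(draws)
expected = preference_probability(0.5, 0.0)
print(round(observed, 3), round(expected, 3))
```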

slide-17
SLIDE 17
Binary coding of ranking data

  • Any ordering or ranking of n choice alternatives requires n(n-1)/2 separate comparison judgements:

– Rank-ordering of 2 items elicits 1 comparison: {A,B}
– Rank-ordering of 3 items elicits 3 comparisons: {A,B} {A,C} {B,C}
– Rank-ordering of 4 items elicits 6 comparisons: {A,B} {A,C} {A,D} {B,C} {B,D} {C,D}
– Etc.
– The “most”/“least” format with 4 or more alternatives is a partial ranking

  • Any comparison {i, k} is coded as 1 if i is preferred to k, and 0 otherwise
  • Ordering {B, A, C} can be equivalently presented as 3 outcomes:

{A, B} = 0, {A, C} = 1, {B, C} = 1
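The coding rule can be written in a few lines. This is a minimal sketch; the function name `ranking_to_binary` and the dictionary output format are choices made for the illustration, not part of the original method description.

```python
from itertools import combinations

def ranking_to_binary(order):
    """Code a full ranking as n(n-1)/2 binary outcomes.

    order: items listed from most to least preferred, e.g. ["B", "A", "C"].
    Returns {(i, k): 1 if i is preferred to k else 0} for every pair with i < k.
    """
    position = {item: rank for rank, item in enumerate(order)}
    items = sorted(order)
    return {(i, k): int(position[i] < position[k]) for i, k in combinations(items, 2)}

outcomes = ranking_to_binary(["B", "A", "C"])
print(outcomes)  # {('A', 'B'): 0, ('A', 'C'): 1, ('B', 'C'): 1}
```

A ranking of four items passed to the same function yields six outcomes, in line with n(n-1)/2.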

slide-18
SLIDE 18
Thurstonian factor models

  • Maydeu-Olivares (1999); Maydeu-Olivares & Böckenholt (2005)
  • Outcomes of comparisons are determined by the difference in utilities (no error terms)
  • Second-order factors (traits) can be modelled
  • Identification constraints are needed

– Fixing the uniqueness of one utility per block
– Factor variances are fixed to 1

  • Special identification cases

– Pairs of items

[Figure: path diagram of the second-order model for three blocks of three items. The latent outcomes y*(1,2), y*(1,3), y*(2,3), y*(4,5), y*(4,6), y*(5,6), y*(7,8), y*(7,9), y*(8,9) load on the utilities t1 to t9 with loadings fixed to 1 and -1; the utilities have uniquenesses and load on three correlated traits.]
slide-19
SLIDE 19
IRT reparameterization

  • Thurstonian second-order models cannot be used directly in person-centric applications

– We are interested in persons’ traits (second-order factors), not the utilities (first-order factors)
– But the latent traits cannot be estimated

  • Re-parameterization as an IRT model (first-order)
  • Utilities of items i and k are functions of the underlying factors (traits) a and b:

$$t_i = \mu_i + \lambda_i \eta_a + \varepsilon_i, \qquad t_k = \mu_k + \lambda_k \eta_b + \varepsilon_k$$

  • The latent difference of utilities $y_l^* = t_i - t_k$ is a function of the traits:

$$y_l^* = t_i - t_k = (\mu_i - \mu_k) + (\lambda_i \eta_a - \lambda_k \eta_b) + (\varepsilon_i - \varepsilon_k) = -\gamma_l + (\lambda_i \eta_a - \lambda_k \eta_b) + (\varepsilon_i - \varepsilon_k)$$

with threshold $\gamma_l = -(\mu_i - \mu_k)$

slide-20
SLIDE 20
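Item response function

  • The IRF for the binary outcome variable $y_l$, which is the result of comparing items i and k measuring traits a and b, is

$$\Pr(y_l = 1 \mid \eta_a, \eta_b) = \Phi\left(\frac{-\gamma_l + \lambda_i \eta_a - \lambda_k \eta_b}{\sqrt{\psi_i^2 + \psi_k^2}}\right)$$

where $\psi_i^2$ and $\psi_k^2$ are the uniquenesses of the two utilities

  • In intercept / slope form:

$$\Pr(y_l = 1 \mid \eta_a, \eta_b) = \Phi(\alpha_l + \beta_i \eta_a - \beta_k \eta_b)$$

  • Special case of same-trait (one-dimensional) comparisons:

$$\Pr(y_l = 1 \mid \eta) = \Phi\big(\alpha_l + (\beta_i - \beta_k)\,\eta\big)$$

[Figure: response surface showing P(y=1) as a function of trait 1 and trait 2, each ranging from -3 to 3]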
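The two forms of the item response function can be compared numerically. The sketch below assumes the normal-ogive reading of the slide: threshold `gamma_l`, loadings `lam_i` and `lam_k`, and uniquenesses `psi2_i` and `psi2_k`, with intercept and slopes obtained by dividing by the square root of the summed uniquenesses. All function names and parameter values are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def irf(eta_a, eta_b, gamma_l, lam_i, lam_k, psi2_i, psi2_k):
    """P(y_l = 1 | eta_a, eta_b) in threshold / loading form."""
    scale = math.sqrt(psi2_i + psi2_k)
    return normal_cdf((-gamma_l + lam_i * eta_a - lam_k * eta_b) / scale)

def to_intercept_slope(gamma_l, lam_i, lam_k, psi2_i, psi2_k):
    """Convert threshold / loadings to the intercept alpha_l and slopes beta_i, beta_k."""
    scale = math.sqrt(psi2_i + psi2_k)
    return -gamma_l / scale, lam_i / scale, lam_k / scale

def irf_intercept_slope(eta_a, eta_b, alpha_l, beta_i, beta_k):
    """P(y_l = 1 | eta_a, eta_b) in intercept / slope form."""
    return normal_cdf(alpha_l + beta_i * eta_a - beta_k * eta_b)

# Hypothetical item pair
alpha_l, beta_i, beta_k = to_intercept_slope(0.3, 0.8, 1.1, 0.5, 0.7)
p1 = irf(0.4, -0.2, 0.3, 0.8, 1.1, 0.5, 0.7)
p2 = irf_intercept_slope(0.4, -0.2, alpha_l, beta_i, beta_k)
print(round(p1, 4), round(p2, 4))  # the two forms agree
```

For a same-trait comparison, passing the same value for both traits reduces the second form to an intercept plus (beta_i - beta_k) times the trait.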

slide-21
SLIDE 21
Thurstonian IRT model

  • Outcomes of comparisons are indicators of common factors (traits)
  • Special features for blocks of 3 or more items

– Factor loadings are structured
– Uniquenesses are structured
– Structured local dependencies

  • Identification constraints are the same as for the second-order model

[Figure: path diagram of the Thurstonian IRT model for three blocks of three items. The nine latent outcomes y*(1,2) to y*(8,9) load directly on three correlated traits; the uniqueness of each outcome is the sum of the two item uniquenesses (for example, that of items 1 and 2 for y*(1,2)), and the errors of outcomes that share an item are correlated.]

slide-22
SLIDE 22
Estimation and scoring

  • Estimated with the general-purpose SEM software Mplus (Muthén & Muthén, 1998-2010)
  • Limited information methods are the only option for most applications

– When a partial ranking format is used, Bayesian multiple imputations (MI) are recommended

  • Respondents' trait levels are estimated by the MAP (maximum a posteriori) method

– Computationally efficient and unaffected by the number of latent traits

slide-23
SLIDE 23
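Item information function

  • Direction of information is considered
  • Information in direction of trait a for one binary outcome:

$$I_l^a(\eta_a, \eta_b) = \frac{\left[\beta_i - \beta_k\,\mathrm{corr}(\eta_a, \eta_b)\right]^2 \left[\phi(\alpha_l + \beta_i \eta_a - \beta_k \eta_b)\right]^2}{P_l(\eta_a, \eta_b)\left[1 - P_l(\eta_a, \eta_b)\right]}$$

– Smaller when traits are positively correlated

  • One-dimensional case:

$$I_l(\eta) = \frac{(\beta_i - \beta_k)^2 \left[\phi\big(\alpha_l + (\beta_i - \beta_k)\,\eta\big)\right]^2}{P_l(\eta)\left[1 - P_l(\eta)\right]}$$

[Figure: information in direction of trait 1 (from 0.0 to 0.6) plotted over trait 1 and trait 2, each ranging from -3 to 3]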
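A small numerical sketch of the directional information for one binary outcome follows. It assumes the intercept / slope parameterization (`alpha_l`, `beta_i`, `beta_k`) and a trait correlation `corr_ab`; the function name and all parameter values are hypothetical. The example compares two positively keyed items when the traits are uncorrelated and when they correlate at 0.5.

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def item_information(eta_a, eta_b, alpha_l, beta_i, beta_k, corr_ab):
    """Information in direction of trait a provided by one binary outcome.

    Note: for extreme trait values P can round to 0 or 1, in which case the
    ratio is undefined; this sketch does not guard against that.
    """
    z = alpha_l + beta_i * eta_a - beta_k * eta_b
    p = normal_cdf(z)
    return (beta_i - beta_k * corr_ab) ** 2 * normal_pdf(z) ** 2 / (p * (1.0 - p))

# Two positively keyed items with equal slopes (hypothetical values)
info_uncorrelated = item_information(0.0, 0.0, 0.0, 1.0, 1.0, corr_ab=0.0)
info_correlated = item_information(0.0, 0.0, 0.0, 1.0, 1.0, corr_ab=0.5)
print(round(info_uncorrelated, 4), round(info_correlated, 4))
```

With both slopes positive, the squared term shrinks as the trait correlation grows, which is the effect the slide refers to.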

slide-24
SLIDE 24
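Test information and reliability

  • Test information in direction of trait a:

$$I^a(\boldsymbol{\eta}) = \sum_l I_l^a(\boldsymbol{\eta})$$

  • When using a posterior latent trait estimator:

$$I_P^a(\boldsymbol{\eta}) = I^a(\boldsymbol{\eta}) - \frac{\partial^2 \ln \phi(\boldsymbol{\eta})}{\partial \eta_a^2}$$

where $\phi(\boldsymbol{\eta})$ is the prior (multivariate normal) density of the traits

  • Standard error for trait a:

$$SE(\hat{\eta}_a) = \frac{1}{\sqrt{I_P^a(\hat{\boldsymbol{\eta}})}}$$

  • Empirical reliability (for estimated scores in a sample):

$$\bar{\sigma}^2_{error} = \frac{1}{N} \sum_{j=1}^{N} \frac{1}{I_P^a(\hat{\boldsymbol{\eta}}_j)}, \qquad \rho = \frac{\sigma_P^2 - \bar{\sigma}^2_{error}}{\sigma_P^2}$$

where $\sigma_P^2$ is the sample variance of the estimated scores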
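To make the reliability computation concrete, here is a minimal sketch. It assumes that the standard error is the inverse square root of the posterior test information, that the error variance is the sample average of the squared standard errors, and that reliability is (score variance - error variance) / score variance; this reading of the slide's formulas is itself an assumption. The scores and information values are invented, and the function names are hypothetical.

```python
import math
from statistics import mean, variance

def standard_errors(posterior_info):
    """SE of each respondent's trait estimate from the posterior test information."""
    return [1.0 / math.sqrt(info) for info in posterior_info]

def empirical_reliability(map_scores, posterior_info):
    """(score variance - average error variance) / score variance."""
    error_var = mean([1.0 / info for info in posterior_info])
    score_var = variance(map_scores)  # sample variance of the estimated scores
    return (score_var - error_var) / score_var

# Invented MAP scores and posterior information values for five respondents
map_scores = [-1.2, -0.4, 0.1, 0.6, 1.3]
posterior_info = [4.0, 5.0, 5.0, 5.0, 4.0]

ses = standard_errors(posterior_info)
rho = empirical_reliability(map_scores, posterior_info)
print([round(se, 3) for se in ses], round(rho, 3))
```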

slide-25
SLIDE 25

Application – CCSQ

Empirical applications

slide-26
SLIDE 26
CCSQ instrument and sample

  • Customer Contact Styles Questionnaire (CCSQ)

– Measures 16 work-related traits, used in assessment for customer service roles (published by SHL)
– Forced-choice format

  • 128 items grouped into 32 quads (7 to 10 items per scale)

– All items are also administered with a 5-point rating scale

  • Sample

– N = 610
– Paper-and-pencil UK standardisation sample from 2001
– 39% female
– Half of the sample are applicants, half are job incumbents

slide-27
SLIDE 27
CCSQ model estimation

  • A missing data problem arises with partial rankings

– MAR but not MCAR
– Limited information methods would produce distorted parameter estimates (Asparouhov & Muthén, 2010)

  • Multiple imputations are performed in Mplus
  • Model parameters are estimated on the 10 imputed datasets (ULS estimator)
  • Person parameters are estimated on the original dataset (MAP estimator)

[Table: a partial ranking of items A, B, C, D (one marked “most”, one marked “least”) coded as the binary outcomes {A,B} {A,C} {A,D} {B,C} {B,D} {C,D}; the outcome that is not observed is coded as missing (.)]
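The source of the missing data can be illustrated with a short sketch that codes a "most"/"least" response to a block as binary outcomes. The function name `most_least_to_binary` and the use of `None` for a missing outcome are choices made for this illustration; the example response is hypothetical.

```python
from itertools import combinations

def most_least_to_binary(items, most, least):
    """Code a 'most'/'least' response as binary outcomes.

    An outcome {i, k} is 1 if i is preferred to k, 0 if k is preferred to i,
    and None (missing) if the response does not determine the preference.
    """
    outcomes = {}
    for i, k in combinations(items, 2):
        if i == most or k == least:
            outcomes[(i, k)] = 1
        elif k == most or i == least:
            outcomes[(i, k)] = 0
        else:
            outcomes[(i, k)] = None  # comparison between two unmarked items
    return outcomes

# Hypothetical response to a quad: B is "most like me", C is "least like me"
quad = most_least_to_binary(["A", "B", "C", "D"], most="B", least="C")
print(quad)
```

In a quad, one of the six outcomes is always undetermined, whereas in a triplet the same format identifies the full ranking.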

slide-28
SLIDE 28

Example profile

[Figure: example profile of standardized scale scores (axis from -3.5 to 3.0) on the 16 CCSQ scales: Persuasive, Self-control, Empathic, Modest, Participative, Sociable, Analytical, Innovative, Flexible, Structured, Detail conscious, Conscientious, Resilience, Competitive, Results orientated, Energetic. Four sets of scores are plotted: Normative, Ipsative, IRT Single-Stimulus, IRT Forced-Choice.]

slide-29
SLIDE 29
No constraint on overall test score

  • Can have all high/low scores

– Scores are directly interpretable for comparison between individuals

[Figure: two histograms of the average profile score (average of all trait scores) against the percentage of respondents. Left: CTT ipsative vs CTT normative. Right: IRT-FC vs IRT-SS.]

slide-30
SLIDE 30
Scores are no longer ipsative

  • Absolute trait locations are recovered well

– Mahalanobis distances between rating and ranking scores are much smaller with the IRT method

  • Construct validity undistorted

– Average scale inter-correlation for ratings: r = 0.21
– For rankings scored with IRT: r = 0.12
– For classically scored rankings (ipsative): r = -0.07
– Identical factors extracted from rankings and ratings

  • Reliabilities can now be estimated, as well as the SE for each individual set of scores

slide-31
SLIDE 31

Optimal forced-choice designs

Simulation studies

slide-32
SLIDE 32

Simulation designs

  • Conditions crossed (1000 replications with 1000 cases):

– Number of traits: 2 or 5
– Correlations between traits: 0, +0.5 or -0.5 (2 traits); as in the FFM (5 traits)
– Number of items: 12 and 24 items per trait (2 traits); 12 items per trait (5 traits)
– Block sizes: pairs (2 traits); pairs, triplets, quads (5 traits)
– Keyed direction of items: positively worded only, or positively and negatively worded (2 and 5 traits)

  • For all reasonable designs

– Good parameter recovery, including correlations between traits
– Good latent trait recovery
– Empirical chi-square rejection rates for more complex models are too high
– Empirical reliabilities are sufficiently close to actual reliabilities

slide-33
SLIDE 33
Forced-choice design rules

  • Given a sufficient number of good-quality items, the following are important factors:
  • Positively and negatively keyed items

When approximately the same number of binary outcomes come from comparing items keyed in the same direction and in opposite directions, trait recovery can be good with any number of traits and any trait correlations

  • Number of traits assessed

When the number of traits is large, and the traits are not strongly positively correlated overall, any forced-choice design will reliably locate trait scores

  • Correlations between traits

Comparing items keyed in the same direction is more effective the lower the correlations between the latent traits

  • Block size

The same items provide more information if combined in larger blocks

slide-34
SLIDE 34

Ipsative problem solved

Conclusions

slide-35
SLIDE 35
Thurstonian IRT model in perspective

  • Other IRT models have been suggested for creating new FC questionnaires

– McCloy et al. (2005)
– Stark, Chernyshenko & Drasgow (2005)
– These models do not provide a solution for the existing forced-choice tests

  • The Thurstonian IRT model can be readily applied to any forced-choice data, with the objective of estimating

– item parameters,
– relationships between the latent traits,
– and persons’ parameters.

  • It works with any existing tests using a ranking format

– Any number of items per block
– Any number of traits
– Multi- and one-dimensional comparisons

slide-36
SLIDE 36
Growing area of research

  • Benefits of the forced-choice format can be enjoyed without the disadvantages of ipsative data

– Reducing halo effects in research
– Cross-cultural research free of response sets
– Exploration of factor structures without method factors
– Etc.

  • Embedding the model in an SEM framework allows further latent variable modelling
  • There should be no more ipsative data: the problem of ipsative data has been effectively solved

slide-37
SLIDE 37
Acknowledgements

  • This work would not have been possible without

– the extraordinary support of my advisor Alberto Maydeu-Olivares
– the moral support given to me by Simon

  • Thanks to my former employer SHL Group and my research director Dave Bartram
  • Thanks to the SMEP for the Dissertation Support Award
  • Thanks to the Psychometric Society for the 2011 Dissertation Award
  • Thank you for listening!