SLIDE 1

Scale construction

Michelle Mazurek (some material from Bilge Mutlu)

SLIDE 2

About scales

Bridging from qual. to quant
Using (typically) ordinal questions

– Sometimes nominal categorical

Using them in a repeatable way
That is validated!
For construct validity

SLIDE 3

Thinking about construct validity

How to measure something complicated / hard to define

– Risk taking
– Privacy concern
– Sociability
– Etc.

In a validated way!

SLIDE 4

What do we want to validate?

Items -> latent factors
Reliability: internal consistency, test-retest
Reflects something in the real world

SLIDE 5

Overall procedure

Generate items and review for wording, match to intended construct, etc.

– Expert review; cognitive interview

Refine items

– Check for range effects
– Do exploratory factor analysis
– Get rid of ones that don’t work
– Set up subscales
– Repeat
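
A minimal sketch of one way to check for range effects, assuming the term refers to floor/ceiling pile-ups (many answers at the lowest or highest scale point). The function name, the report keys, and the sample responses are hypothetical and only for illustration.

```python
from collections import Counter

def range_effect_report(responses, scale_min, scale_max):
    """Share of answers at each end of the scale, plus how many
    distinct scale points were actually used."""
    n = len(responses)
    counts = Counter(responses)
    return {
        "floor": counts[scale_min] / n,      # fraction at the lowest point
        "ceiling": counts[scale_max] / n,    # fraction at the highest point
        "distinct_values": len(counts),      # how much of the range is used
    }

# Made-up answers to one 1-10 item; most respondents picked the top value.
answers = [10, 10, 9, 10, 10, 8, 10, 10, 10, 9]
report = range_effect_report(answers, scale_min=1, scale_max=10)
print(report)
```

An item where most answers sit at one end carries little variance, so it has little to contribute to a factor analysis.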

SLIDE 6

Overall procedure (2)

Validate

– Against other scales, real-world behavior
– That subscales still intra-correlate and load
– Test-retest
– Different populations? Modes (internet)?
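
A small sketch of the test-retest check, assuming it is summarized as a Pearson correlation between the same participants' scale scores at two time points. The scores are invented for illustration.

```python
import numpy as np

# Made-up total scale scores for the same 8 participants, two sessions apart.
time1 = np.array([12, 18, 25, 30, 22, 15, 27, 20], dtype=float)
time2 = np.array([14, 17, 24, 31, 20, 16, 28, 19], dtype=float)

# Pearson correlation between the two administrations.
test_retest_r = np.corrcoef(time1, time2)[0, 1]
print(round(test_retest_r, 3))
```

A value near 1 suggests the scale orders participants consistently over time; what counts as high enough is a judgment the slides do not specify.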

SLIDE 7

EXPLORATORY FACTOR ANALYSIS

SLIDE 8

Often, multiple components

Risk perception: different kinds of risk
Privacy: ideas about collection vs. unauthorized sharing, etc.

…
Subscales!

SLIDE 9

Observable vs. latent

Observable: answers to items, test scores, other measurements

Latent: underlying factor that correlates with (governs?) multiple measurable components

Factor analysis: reduce large number of observables to smaller number of latent factors

– Resultant factors hopefully (mostly) independent

SLIDE 10

Factor analysis model

X1-Xn: measured variables
F1-Fm: latent factors
b11-bnm: factor loadings

X1 = b11F1 + b12F2 + … + b1mFm + e1
X2 = b21F1 + b22F2 + … + b2mFm + e2
…
Xn = bn1F1 + bn2F2 + … + bnmFm + en
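
To make the model above concrete, here is a hedged sketch that simulates data from it with NumPy. The sizes, the loading values in `B`, and the normal distributions are all assumptions chosen for illustration; none of them come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

n_obs, n_vars, n_factors = 500, 6, 2  # illustrative sizes only

# Loading matrix B (n_vars x n_factors): row i holds b_i1 ... b_im.
B = np.array([
    [0.8, 0.0],
    [0.7, 0.1],
    [0.9, 0.0],
    [0.0, 0.8],
    [0.1, 0.7],
    [0.0, 0.9],
])

F = rng.standard_normal((n_obs, n_factors))            # latent factor scores
unique_sd = np.sqrt(1.0 - np.sum(B**2, axis=1))        # gives each X_i variance ~1
E = rng.standard_normal((n_obs, n_vars)) * unique_sd   # error terms e_i

# X_i = b_i1*F_1 + ... + b_im*F_m + e_i, computed for all observations at once.
X = F @ B.T + E

# Items that share a factor correlate; items on different factors barely do.
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))
```

Factor analysis works in the opposite direction: it starts from a correlation matrix like `corr` and tries to recover something like `B`.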

SLIDE 11

Factor analysis model

Loadings: -1 to 1, where 0 = no loading
Like to end up w/ mostly 1s and 0s
All based on correlation / covariance matrices among the measured variables

SLIDE 12

Assumptions:

Measurement error has constant variance, mean = 0
No assoc. btwn errors
No assoc. btwn factor + measurement error
Local/conditional independence:

– Meas. vars are independent (given the factor)

In practice: everything is standardized

– Subtract the mean (center at 0) and divide by std. dev. (var = 1)
– Total variance = # of meas. variables
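
A short sketch of the standardization step, showing why the total variance ends up equal to the number of measured variables. The random 1-10 responses stand in for real data and are purely illustrative.

```python
import numpy as np

def standardize(X):
    """Center each column at 0 and scale it to unit (sample) variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

rng = np.random.default_rng(1)
raw = rng.integers(1, 11, size=(120, 9))  # made-up answers: 120 obs., 9 items

Z = standardize(raw)

# Each column now has variance 1, so the variances sum to the item count.
total_variance = Z.var(axis=0, ddof=1).sum()
print(total_variance)
```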

SLIDE 13

Requires large samples

Rule of thumb: 10 observations per variable in the list (so if 30 item scale, n=300)

SLIDE 14

Running example

Teaching reviews (from “Real Statistics with Excel” website)

120 obs. of 9 questions

– All on 1-10 Likerts
– E.g. is entertaining, communicates well, has expertise in the subject, passion for teaching, etc.

SLIDE 15

Overall procedure

Collect + explore data

Extract initial factors; choose how many to retain
Choose and use estimation method
Rotate
Interpret, adjust, repeat

SLIDE 16

Explore data

Check for range effects
Check for applicability of factor analysis

– KMO sampling adequacy (> 0.6)
– Bartlett’s sphericity

Null: correlation matrix is the identity matrix (everything is uncorrelated). You want to reject it (p < 0.05), but in practice it is almost always rejected.
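
Both checks can be computed directly from the correlation matrix. The sketch below uses the standard textbook formulas (Bartlett's chi-square statistic and the overall KMO measure); the function names and the simulated one-factor data are assumptions for illustration. It stops at the test statistic and degrees of freedom; the p-value would come from a chi-square distribution, for example via `scipy.stats.chi2.sf`, which is not used here.

```python
import numpy as np

def bartlett_sphericity(X):
    """Chi-square statistic and degrees of freedom for the null
    hypothesis that the correlation matrix is the identity."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)          # log|R|, at most 0 for a correlation matrix
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return chi2, df

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)         # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)     # mask for off-diagonal entries
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)

# Made-up data: 6 items driven by one common factor plus noise.
rng = np.random.default_rng(2)
factor = rng.standard_normal((300, 1))
X = 0.7 * factor + np.sqrt(0.51) * rng.standard_normal((300, 6))

chi2, df = bartlett_sphericity(X)
kmo_value = kmo(X)
print(round(chi2, 1), df, round(kmo_value, 3))
```

KMO is high when pairwise correlations are large relative to partial correlations, which is the pattern shared latent factors produce.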

SLIDE 17

Overall procedure

Collect + explore data
Extract initial factors; choose how many to retain

Choose and use estimation method
Rotate
Interpret, adjust, repeat

SLIDE 18

How many factors?

Theoretical / predicted answer
Guess and check
Use PCA to find out

– Start with factors = # of variables
– Decide how many to retain based on results

Too many: some may have zero loadings; not parsimonious

Too few: may have incorrect loadings (worse!)

SLIDE 19

Using PCA to retain factors

Each factor has an associated eigenvalue; retain based on eigenvalues
All with eigenvalue > 1 (Kaiser)

– Factor contributes more than a single measured variable to the total variance (each meas. var has var = 1)
– This is obviously arbitrary; can retain too many

Scree plot (Cattell): Plot, keep left of inflection

– Subjective

Min. number of factors whose cumulative variance is > 70% (80%) of total variance
Others
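
The eigenvalue-based rules above can be sketched as follows. The function name, the 70% default, and the simulated two-factor data are assumptions for illustration; the scree plot is left out because reading the inflection point is subjective.

```python
import numpy as np

def retention_summary(X, variance_target=0.70):
    """Eigenvalues of the correlation matrix plus two retention counts:
    the Kaiser rule and the cumulative-variance rule."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]      # largest first
    kaiser = int(np.sum(eigvals > 1.0))                 # eigenvalue > 1
    cumulative = np.cumsum(eigvals) / eigvals.sum()     # share of total variance
    n_for_target = int(np.argmax(cumulative > variance_target) + 1)
    return eigvals, kaiser, n_for_target

# Made-up data with two underlying factors and six items.
rng = np.random.default_rng(3)
B = np.array([[0.8, 0.0], [0.7, 0.1], [0.9, 0.0],
              [0.0, 0.8], [0.1, 0.7], [0.0, 0.9]])
F = rng.standard_normal((500, 2))
E = rng.standard_normal((500, 6)) * np.sqrt(1.0 - np.sum(B**2, axis=1))
X = F @ B.T + E

eigvals, kaiser, n_for_target = retention_summary(X)
print(np.round(eigvals, 2), kaiser, n_for_target)
```

Because the variables are standardized, the eigenvalues sum to the number of items, which is what makes "eigenvalue > 1" read as "more than one item's worth of variance".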

SLIDE 20

Overall procedure

Collect + explore data
Extract initial factors; choose how many to retain
Choose and use estimation method

Rotate
Interpret, adjust, repeat
Confirm: collect new data and fit to model

– Evaluate adequacy; compare to other models

SLIDE 21

Main estimation methods

Maximum likelihood

– Max. likelihood of seeing this corr. matrix (more CFA)

Principal axis

– Put as many vars as possible on first factor, etc.

Principal components (ish)

– Account for max. variance with first factor, etc.

SLIDE 22

Overall procedure

Collect + explore data
Extract initial factors; choose how many to retain
Choose and use estimation method
Rotate

Interpret, adjust items, repeat

SLIDE 23

Rotation factor loadings

There are infinitely many equally good solutions to the factor loadings (matrix math)

Think of these as rotations

– Factors are axes/vectors, variables “load” onto close by axes, can “rotate” them infinitely

Goal: loadings that are close to either 1 or 0

– Distribute items among factors
– Clearly distinguish “on” or “off”
– Does not improve fit!
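
A small numeric illustration of why rotation does not change fit, assuming orthogonal factors: multiplying the loading matrix by any rotation matrix leaves the implied common variance and covariance (loadings times their transpose) untouched. The loading values are made up so that a 45-degree rotation yields a clean pattern.

```python
import numpy as np

# Made-up unrotated loadings: every item loads on both factors.
L = np.array([
    [0.6,  0.6],
    [0.5,  0.5],
    [0.6, -0.6],
    [0.5, -0.5],
])

theta = np.pi / 4  # rotate the factor axes by 45 degrees
T = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)],
])

L_rot = L @ T

# Same implied structure before and after rotation ...
same_fit = np.allclose(L @ L.T, L_rot @ L_rot.T)

# ... but each item now loads on only one factor.
print(same_fit)
print(np.round(L_rot, 3))
```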

SLIDE 24

Rotation methods

Orthogonal: factors independent

– Varimax: max sq. loading variance across vars

Most common

– Quartimax: max. it across factors

Oblique: not independent

– Oblimin, promax
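
For the orthogonal case, a hedged sketch of varimax using the widely circulated SVD-based iteration. This is one common way to implement it, not necessarily what any particular statistics package does; the function names, tolerances, and example loadings are assumptions for illustration.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Return (rotated loadings, rotation matrix) for an orthogonal
    varimax rotation, via repeated SVD updates of the rotation."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    prev = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Gradient-like target matrix for the varimax criterion.
        target = L.T @ (Lr**3 - Lr @ np.diag(np.sum(Lr**2, axis=0)) / p)
        u, s, vt = np.linalg.svd(target)
        R = u @ vt                      # closest orthogonal matrix
        crit = s.sum()
        if prev != 0.0 and crit < prev * (1 + tol):
            break                       # no meaningful improvement
        prev = crit
    return L @ R, R

def varimax_criterion(L):
    """Sum over factors of the variance of squared loadings across variables."""
    return float(np.sum(np.var(np.asarray(L) ** 2, axis=0)))

# Made-up simple structure, deliberately blurred by a 30-degree rotation.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.9, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.9]])
a = np.deg2rad(30)
blur = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
unrotated = simple @ blur

rotated, R = varimax(unrotated)
print(np.round(rotated, 2))
```

The recovered columns may come back in a different order or with flipped signs; that is expected, since factor order and sign are arbitrary.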

SLIDE 25

Choosing rotation

Maybe not super important
Orthogonal: simple to interpret

– Is independence reasonable for your construct?

Oblique: maybe simpler structure, but interactions are confusing

– Loading not interpretable as correlation between variable and factor

SLIDE 26

Overall procedure

Collect + explore data
Extract initial factors; choose how many to retain
Choose and use estimation method
Rotate
Interpret, adjust items, repeat

SLIDE 27

Detour: FA vs. clustering

Clustering: Group observations

– Find and profile subgroups

FA: Group variables

– Data reduction
– Latent factors

SLIDE 28

Detour: FA vs. PCA

Meta-analysis study
CFA: underlying construct

– Best for correlations of variables, structure of data

PCA: increased factor loadings

– Best for summarizing, reducing variables

(Kim 2008)

SLIDE 29

Detour: Communality vs. uniqueness

Communality: Variance in the measured variable explained by the factors

Uniqueness: variance explained by the e term
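
A brief sketch of how the two quantities relate, assuming standardized variables (variance 1) and orthogonal factors, in which case communality is the row sum of squared loadings and uniqueness is what is left over. The loading values are invented.

```python
import numpy as np

# Made-up loadings: rows are items, columns are factors.
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.2],
    [0.3, 0.2],
    [0.1, 0.9],
])

communality = np.sum(loadings**2, axis=1)  # variance explained by the factors
uniqueness = 1.0 - communality             # variance left to the e term

print(np.round(communality, 2))
print(np.round(uniqueness, 2))
```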

SLIDE 30

Choosing items

Drop anything with uniqueness > 0.5

– Not well mapped to factors

Keep things that load > 0.3 (or 0.5)
Avoid cross-loading items

– Anything that doesn’t load at least 2x on “main” factor (“Saucier”)
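
The three screening rules can be combined into one pass over a loading matrix. This is a sketch under assumptions: orthogonal factors and standardized items (so uniqueness = 1 minus the sum of squared loadings), the 0.3 loading cutoff, and hypothetical item names and loadings.

```python
import numpy as np

def screen_items(loadings, names, max_uniqueness=0.5, min_loading=0.3, ratio=2.0):
    """Return the names of items that pass all three rules."""
    L = np.abs(np.asarray(loadings, dtype=float))
    uniqueness = 1.0 - np.sum(L**2, axis=1)
    kept = []
    for i, name in enumerate(names):
        ordered = np.sort(L[i])[::-1]                 # strongest loading first
        primary = ordered[0]
        secondary = ordered[1] if len(ordered) > 1 else 0.0
        well_mapped = uniqueness[i] <= max_uniqueness  # drop uniqueness > 0.5
        loads_enough = primary > min_loading           # keep loadings > 0.3
        not_cross = primary >= ratio * secondary       # main factor at least 2x
        if well_mapped and loads_enough and not_cross:
            kept.append(name)
    return kept

# Made-up loadings for four items on two factors.
example = [
    [0.8, 0.1],  # clean item
    [0.6, 0.5],  # cross-loads on both factors
    [0.3, 0.2],  # mostly unique variance
    [0.1, 0.9],  # clean item on the second factor
]
kept = screen_items(example, ["q1", "q2", "q3", "q4"])
print(kept)
```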

SLIDE 31

Interpreting a subscale

Is there a coherent explanation for why these particular questions fit together?

Do the subscale items have high reliability?

– Cronbach alpha > 0.6 for each, 0.7 for majority of the subscales (McKinley)
– Item-total correlation (Pearson btwn item and subscale average) > 0.2 (Everitt)
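
Both reliability checks are short to compute. The sketch below uses the standard Cronbach alpha formula and, following the wording above, correlates each item with the subscale average (including the item itself). The response matrix is made up.

```python
import numpy as np

def cronbach_alpha(items):
    """items: rows are respondents, columns are the items of one subscale."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

def item_total_correlations(items):
    """Pearson correlation between each item and the subscale average."""
    items = np.asarray(items, dtype=float)
    average = items.mean(axis=1)
    return np.array([
        np.corrcoef(items[:, j], average)[0, 1]
        for j in range(items.shape[1])
    ])

# Made-up answers from 6 respondents to a 3-item subscale.
responses = np.array([
    [7, 8, 7],
    [3, 4, 2],
    [9, 9, 8],
    [5, 5, 6],
    [2, 3, 3],
    [8, 7, 9],
])

alpha = cronbach_alpha(responses)
item_total = item_total_correlations(responses)
print(round(alpha, 3), np.round(item_total, 3))
```

Some authors instead correlate each item with the average of the remaining items, which avoids inflating the correlation; that variant is not shown here.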

SLIDE 32

Validating the scale

Get a new sample, check validity
Does PCA produce same # of factors?
Do items load as predicted?
Test-retest: same participants, over time
Validate against real-world data: