Scale construction Michelle Mazurek (some material from Bilge - - PowerPoint PPT Presentation



slide-1
SLIDE 1

1

Scale construction

Michelle Mazurek (some material from Bilge Mutlu)

slide-2
SLIDE 2

3

About scales

  • Bridging from qual. to quant
  • Using (typically) ordinal questions

– Sometimes nominal categorical

  • Using them in a repeatable way
  • That is validated!
  • For construct validity
slide-3
SLIDE 3

4

Thinking about construct validity

  • How to measure something complicated / hard to define

– Risk taking
– Privacy concern
– Sociability
– Etc.

  • In a validated way!

slide-4
SLIDE 4

5

What do we want to validate?

  • Items -> latent factors
  • Reliability: internal consistency, test-retest
  • Reflects something in the real world
slide-5
SLIDE 5

6

Overall procedure

  • Generate items and review for wording, match to intended construct, etc.

– Expert review; cognitive interview

  • Refine items

– Check for range effects
– Do exploratory factor analysis
– Get rid of ones that don’t work
– Set up subscales
– Repeat

slide-6
SLIDE 6

7

Overall procedure (2)

  • Validate

– Against other scales, real-world behavior
– That subscales still intra-correlate and load
– Test-retest
– Different populations? Modes (internet)?

slide-7
SLIDE 7

8

EXPLORATORY FACTOR ANALYSIS

slide-8
SLIDE 8

9

Often, multiple components

  • Risk perception: different kinds of risk
  • Privacy -- ideas about collection vs. unauthorized sharing, etc.

  • Subscales!
slide-9
SLIDE 9

10

Observable vs. latent

  • Observable: answers to items, test scores, other measurements
  • Latent: underlying factor that correlates with (governs?) multiple measurable components
  • Factor analysis: reduce large number of observables to smaller number of latent factors

– Resultant factors hopefully (mostly) independent

slide-10
SLIDE 10

11

Factor analysis model

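  • $X_1, \dots, X_n$: measured variables
  • $F_1, \dots, F_m$: latent factors
  • $b_{11}, \dots, b_{nm}$: factor loadings
  • Model, one equation per measured variable:

$X_1 = b_{11}F_1 + b_{12}F_2 + \dots + b_{1m}F_m + e_1$
$X_2 = b_{21}F_1 + b_{22}F_2 + \dots + b_{2m}F_m + e_2$
$\vdots$
$X_n = b_{n1}F_1 + b_{n2}F_2 + \dots + b_{nm}F_m + e_n$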

slide-11
SLIDE 11

12

Factor analysis model

  • Loadings: -1 to 1, where 0 = no loading
  • Like to end up w/ mostly 1s and 0s
  • All based on correlation / covariance matrices among the measured variables
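
A minimal sketch of why the model is "all based on" the correlation matrix. It assumes standardized variables and uncorrelated, unit-variance factors (assumptions added for this illustration); under them, the correlation implied between two items is the sum over factors of the products of their loadings. The loading values are hypothetical.

```python
def implied_correlations(loadings):
    """Correlation matrix implied by a loading matrix (rows = items, cols = factors).

    Assumes standardized items and uncorrelated unit-variance factors, so
    corr(X_i, X_j) = sum_k b_ik * b_jk for i != j, and 1 on the diagonal.
    """
    n = len(loadings)
    corr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                corr[i][j] = 1.0  # communality + uniqueness of a standardized item
            else:
                corr[i][j] = sum(bi * bj for bi, bj in zip(loadings[i], loadings[j]))
    return corr


# Hypothetical loadings: items 0-1 load mainly on F1, items 2-3 mainly on F2.
loadings = [
    [0.8, 0.0],
    [0.7, 0.1],
    [0.0, 0.9],
    [0.1, 0.6],
]
corr = implied_correlations(loadings)
print(round(corr[0][1], 2))  # two items sharing F1
print(round(corr[0][2], 2))  # items on different factors
```

Estimation works in the other direction: it starts from the observed correlation matrix and looks for loadings that reproduce it as closely as possible.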

slide-12
SLIDE 12

13

Assumptions:

  • Measurement error has constant variance, avg=0
  • No assoc. btwn errors
  • No assoc. btwn factor + measurement error
  • Local/conditional independence:

– Meas. vars are independent (given the factor)

  • In practice: everything is standardized

– Subtract the mean (center at 0) and div by StD (var=1)
– Total variance = # of meas. variables
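
The standardization step can be sketched as follows. The item names and responses are made up for illustration, and the population standard deviation is used (an assumption; the sample version works equally well if applied consistently).

```python
import statistics


def standardize(values):
    """Center at 0 and scale to variance 1 (population SD)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical responses: 3 items, 5 respondents each.
items = {
    "entertaining": [7, 9, 4, 8, 6],
    "communicates": [6, 9, 5, 7, 8],
    "expertise": [10, 8, 9, 7, 9],
}
z = {name: standardize(vals) for name, vals in items.items()}

# Each standardized item has variance 1, so total variance = number of items.
total_variance = sum(statistics.pvariance(v) for v in z.values())
print(round(total_variance, 6))
```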

slide-13
SLIDE 13

14

Requires large samples

  • Rule of thumb: 10 observations per variable in the list (so if 30 item scale, n=300)

slide-14
SLIDE 14

15

Running example

  • Teaching reviews (from “Real Statistics with Excel” website)
  • 120 obs. of 9 questions

– All on 1-10 Likerts
– E.g. is entertaining, communicates well, has expertise in the subject, passion for teaching, etc.

slide-15
SLIDE 15

16

Overall procedure

  • Collect + explore data

  • Extract initial factors; choose how many to retain
  • Choose and use estimation method
  • Rotate
  • Interpret, adjust, repeat
slide-16
SLIDE 16

17

Explore data

  • Check for range effects
  • Check for applicability of factor analysis

– KMO sampling adequacy (> 0.6)
– Bartlett’s sphericity

  • Null: correlation matrix is identity matrix (everything is uncorrelated). You want to reject it (p < 0.05). But it is basically always rejected.
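
A rough sketch of Bartlett's test statistic, using the standard form chi2 = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom. The 3x3 correlation matrix is hypothetical, the sample size is borrowed from the running example, and the statistic is compared against the 5% critical value instead of computing a p-value; KMO is not shown.

```python
import math


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi-square statistic, degrees of freedom)."""
    p = len(corr)
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(determinant(corr))
    dof = p * (p - 1) // 2
    return chi2, dof


# Hypothetical correlation matrix for 3 items, n = 120 respondents.
corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
chi2, dof = bartlett_sphericity(corr, n_obs=120)
CRITICAL_5PCT_DF3 = 7.815  # chi-square critical value for 3 df at alpha = 0.05
print(dof, chi2 > CRITICAL_5PCT_DF3)
```

With an identity matrix the determinant is 1, its log is 0, and the statistic is 0, which is the null hypothesis case.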

slide-17
SLIDE 17

18

Overall procedure

  • Collect + explore data
  • Extract initial factors; choose how many to retain

  • Choose and use estimation method
  • Rotate
  • Interpret, adjust, repeat
slide-18
SLIDE 18

19

How many factors?

  • Theoretical / predicted answer
  • Guess and check
  • Use PCA to find out

– Start with factors = # of variables
– Decide how many to retain based on results

  • Too many: some may have zero loadings; not parsimonious

  • Too few: may have incorrect loadings (worse!)
slide-19
SLIDE 19

20

Using PCA to retain factors

  • Each factor has an associated eigenvalue; retain based on eigenvalues
  • All with eigenvalue > 1 (Kaiser)

– Factor contributes more than a single measured variable to the total variance (each meas. var has var=1)
– This is obviously arbitrary; can retain too many

  • Scree plot (Cattell): Plot, keep left of inflection

– Subjective

  • Min factors where sum > 70% (80%) of total variance
  • Others
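
The Kaiser and cumulative-variance rules can be sketched as below. The eigenvalues are hypothetical (chosen to sum to 9, as they would for 9 standardized variables) and are not results from the running example.

```python
def kaiser_count(eigenvalues):
    """Number of factors with eigenvalue > 1 (Kaiser criterion)."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def min_factors_for_variance(eigenvalues, threshold):
    """Smallest number of factors whose eigenvalues sum to > threshold of the total."""
    total = sum(eigenvalues)
    running = 0.0
    for count, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += ev
        if running / total > threshold:
            return count
    return len(eigenvalues)


# Hypothetical eigenvalues for 9 standardized variables (sum = 9).
eigenvalues = [4.0, 1.8, 1.1, 0.7, 0.5, 0.4, 0.25, 0.15, 0.1]

print(kaiser_count(eigenvalues))                    # factors with eigenvalue > 1
print(min_factors_for_variance(eigenvalues, 0.70))  # factors needed for > 70%
print(min_factors_for_variance(eigenvalues, 0.80))  # factors needed for > 80%
```

The rules need not agree: here the 80% threshold keeps one more factor than Kaiser or the 70% threshold, which is one reason the choice stays a judgment call.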
slide-20
SLIDE 20

21

Overall procedure

  • Collect + explore data
  • Extract initial factors; choose how many to retain
  • Choose and use estimation method

  • Rotate
  • Interpret, adjust, repeat
  • Confirm: collect new data and fit to model

– Evaluate adequacy; compare to other models

slide-21
SLIDE 21

22

Main estimation method

  • Maximum likelihood

– Max. likelihood of seeing this corr. matrix (more CFA)

  • Principal axis

– Put as many vars as possible on first factor, etc.

  • Principal components (ish)

– Account for max. variance with first factor, etc.

slide-22
SLIDE 22

23

Overall procedure

  • Collect + explore data
  • Extract initial factors; choose how many to retain
  • Choose and use estimation method
  • Rotate

  • Interpret, adjust items, repeat
slide-23
SLIDE 23

24

Rotation factor loadings

  • There are infinitely many equally good solutions to the factor loadings (matrix math)
  • Think of these as rotations

– Factors are axes/vectors, variables “load” onto close-by axes, can “rotate” them infinitely

  • Goal: loadings that are close to either 1 or 0

– Distribute items among factors
– Clearly distinguish “on” or “off”
– Does not improve fit!
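
A small illustration of why rotation "does not improve fit", assuming two orthogonal factors and hypothetical loadings: rotating the axes changes individual loadings but leaves each item's communality and every implied correlation unchanged.

```python
import math


def rotate(loadings, degrees):
    """Orthogonal rotation of two-factor loadings by the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]


def communalities(loadings):
    """Sum of squared loadings per item."""
    return [a * a + b * b for a, b in loadings]


def implied_corr(loadings, i, j):
    """Correlation between items i and j implied by the loadings."""
    return sum(x * y for x, y in zip(loadings[i], loadings[j]))


# Hypothetical unrotated loadings: every item loads on both factors.
loadings = [(0.6, 0.5), (0.5, 0.6), (0.7, -0.3), (0.6, -0.4)]
rotated = rotate(loadings, 40)

for before, after in zip(communalities(loadings), communalities(rotated)):
    print(round(before, 4), round(after, 4))  # identical pairs
```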

slide-24
SLIDE 24

25

Rotation methods

  • Orthogonal: factors independent

– Varimax: max sq. loading variance across vars

  • Most common

– Quartimax: max. it across factors

  • Oblique: not independent

– Oblimin, promax
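
A toy sketch of the varimax idea for the two-factor case only: try many rotation angles and keep the one that maximizes the variance of the squared loadings within each factor. This is a brute-force grid search without Kaiser normalization, not the iterative algorithm that statistics packages use, and the loadings are hypothetical.

```python
import math


def rotate(loadings, theta):
    """Orthogonal rotation of two-factor loadings by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]


def varimax_criterion(loadings):
    """Sum over factors of the variance (across items) of squared loadings."""
    total = 0.0
    for k in range(2):
        squared = [row[k] ** 2 for row in loadings]
        mean = sum(squared) / len(squared)
        total += sum((v - mean) ** 2 for v in squared) / len(squared)
    return total


def best_angle(loadings, steps=900):
    """Grid search over [0, 90) degrees; the criterion repeats every 90 degrees."""
    angles = [i * (math.pi / 2) / steps for i in range(steps)]
    return max(angles, key=lambda t: varimax_criterion(rotate(loadings, t)))


# Hypothetical simple structure, deliberately rotated away by 30 degrees.
simple = [(0.8, 0.0), (0.7, 0.0), (0.6, 0.0), (0.0, 0.8), (0.0, 0.7), (0.0, 0.6)]
unrotated = rotate(simple, math.radians(30))

theta = best_angle(unrotated)
rotated = rotate(unrotated, theta)
for a, b in rotated:
    print(round(abs(a), 2), round(abs(b), 2))  # one loading per item is near 0
```

Signs and factor order can flip under rotation, so only the absolute loadings are printed.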

slide-25
SLIDE 25

26

Choosing rotation

  • Maybe not super important
  • Orthogonal: simple to interpret

– Is independence reasonable for your construct?

  • Oblique: maybe simpler structure, but interactions are confusing

– Loading not interpretable as correlation btwn var + factor

slide-26
SLIDE 26

27

Overall procedure

  • Collect + explore data
  • Extract initial factors; choose how many to retain
  • Choose and use estimation method
  • Rotate
  • Interpret, adjust items, repeat

slide-27
SLIDE 27

28

Detour: FA vs. clustering

  • Clustering: Group observations

– Find and profile subgroups

  • FA: Group variables

– Data reduction
– Latent factors

slide-28
SLIDE 28

29

Detour: FA vs. PCA

  • Meta-analysis study
  • CFA: underlying construct

– Best for correlations of variables, structure of data

  • PCA: increased factor loadings

– Best for summarizing, reducing variables

  • (Kim 2008)
slide-29
SLIDE 29

30

Detour: Communality vs. uniqueness

  • Communality: Variance in the measured variable explained by the factors

  • Uniqueness: variance explained by the e term
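
As a sketch, assuming standardized items and an orthogonal solution (so each item's total variance is 1), communality is the sum of the item's squared loadings and uniqueness is whatever is left over. The loadings are hypothetical.

```python
def communality(row):
    """Variance explained by the factors: sum of squared loadings."""
    return sum(b * b for b in row)


def uniqueness(row):
    """Variance left to the error term for a standardized item."""
    return 1.0 - communality(row)


# Hypothetical loadings of one item on two factors.
item_loadings = [0.8, 0.3]
print(round(communality(item_loadings), 2))
print(round(uniqueness(item_loadings), 2))
```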
slide-30
SLIDE 30

31

Choosing items

  • Drop anything with uniqueness > 0.5

– Not well mapped to factors

  • Keep things that load > 0.3 (or 0.5)
  • Avoid cross-loading items

– Anything that doesn’t load at least 2x on “main” factor (“Saucier”)
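
The three screening rules can be combined into one filter, sketched below. Item names and loadings are hypothetical, the cutoffs are the ones listed above, and uniqueness is computed as 1 minus the sum of squared loadings, which assumes standardized items and an orthogonal rotation.

```python
def keep_item(row, max_uniqueness=0.5, min_loading=0.3, cross_ratio=2.0):
    """Apply the uniqueness, minimum-loading, and cross-loading rules to one item."""
    strengths = sorted((abs(b) for b in row), reverse=True)
    primary = strengths[0]
    secondary = strengths[1] if len(strengths) > 1 else 0.0
    uniqueness = 1.0 - sum(b * b for b in row)
    if uniqueness > max_uniqueness:
        return False  # not well mapped to the factors
    if primary <= min_loading:
        return False  # does not load enough on any factor
    if primary < cross_ratio * secondary:
        return False  # cross-loads: main loading is under 2x the next one
    return True


# Hypothetical two-factor loadings per item.
candidates = {
    "q1": [0.82, 0.10],   # clean loading on factor 1
    "q2": [0.55, 0.50],   # cross-loads on both factors
    "q3": [0.35, 0.05],   # mostly unique variance
    "q4": [0.75, -0.20],  # clean, with a small negative secondary loading
}
kept = [name for name, row in candidates.items() if keep_item(row)]
print(kept)
```

The cutoffs are parameters on purpose, since the thresholds above (0.3 or 0.5) vary between sources.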

slide-31
SLIDE 31

32

Interpreting a subscale

  • Is there a coherent explanation for why these

particular questions fit together?

  • Do the subscale items have high reliability?

– Cronbach alpha > 0.6 for each, 0.7 for majority of the subscales (McKinley)
– Item-total correlation (Pearson btwn item and subscale average) > 0.2 (Everitt)
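
Both reliability checks can be sketched with the standard library. The responses are made up, alpha uses the usual formula k/(k-1) * (1 - sum of item variances / variance of totals), and the item-total correlation follows the definition above (item vs. subscale average, with the item itself included in the average).

```python
import math
import statistics


def cronbach_alpha(items):
    """items: list of per-item response lists, one entry per respondent."""
    k = len(items)
    totals = [sum(responses) for responses in zip(*items)]
    item_variance = sum(statistics.variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / statistics.variance(totals))


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def item_total_correlations(items):
    """Correlation of each item with the subscale average."""
    averages = [statistics.fmean(responses) for responses in zip(*items)]
    return [pearson(item, averages) for item in items]


# Hypothetical subscale: 3 items, 6 respondents, 1-10 responses.
subscale = [
    [7, 8, 5, 9, 4, 6],
    [6, 9, 5, 8, 5, 7],
    [7, 7, 4, 9, 3, 6],
]
alpha = cronbach_alpha(subscale)
item_totals = item_total_correlations(subscale)
print(alpha > 0.6, all(r > 0.2 for r in item_totals))
```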

slide-32
SLIDE 32

33

Validating the scale

  • Get a new sample, check validity
  • Does PCA produce same # of factors?
  • Do items load as predicted?
  • Test-retest: same participants, over time
  • Validate against real-world data:

– SEBIS vs. measured security behavior
– DOSPERT vs. risk behaviors